Why does Java use a mediocre hashCode implementation for strings?
-
The Java hashCode() implementation for strings (and, via java.util.Arrays.hashCode, for arrays of primitive types) is quite simple:

    int h = 0;
    for (int i = 0; i < input.length(); i++) {
        h = 31 * h + input.charAt(i);
    }

This hash function isn't particularly good, especially in the higher bits [1]. There are much better string hash functions (e.g. Jenkins hash, FNV hash [2]) that are roughly as fast to execute but have a better distribution. Why doesn't Java implement a more elaborate hashing scheme?

[1] http://www.javamex.com/tutorials/collections/hash_function_technical_2.shtml
[2] http://eternallyconfuzzled.com/tuts/algorithms/jsw_tut_hashing.aspx
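As a quick sanity check, the loop from the question can be wrapped in a method and compared against the JDK's own String.hashCode(). This is a minimal sketch; the class name PolynomialStringHash is just for illustration. Note that int overflow simply wraps around, which is also what String.hashCode() relies on.

```java
import java.util.List;

// Illustrative class (not JDK API): the question's loop as a reusable method.
final class PolynomialStringHash {

    // h = 31 * h + c for every UTF-16 char; int overflow wraps around silently.
    static int hash(String input) {
        int h = 0;
        for (int i = 0; i < input.length(); i++) {
            h = 31 * h + input.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        // Print our result next to the JDK's for a few inputs, including one long enough to overflow.
        for (String s : List.of("", "a", "hello", "a much longer string that overflows int")) {
            System.out.printf("%-45s %11d %11d%n", '"' + s + '"', hash(s), s.hashCode());
        }
    }
}
```

The two columns should agree for every input, since this is the same multiply-by-31 recurrence.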
-
Answer:
Referring to the documentation of Object's hashCode: "This method is supported for the benefit of hashtables such as those provided by java.util.Hashtable." http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/lang/Object.java#Object.hashCode%28%29

The Java language designers added hashCode primarily for hash-based collections such as Hashtable and HashMap, and String is no different. If you look at HashMap, you will notice that it uses a bitwise AND to assign objects to buckets; essentially, the low-order bits of the hash determine the bucket index:

    /**
     * Returns index for hash code h.
     */
    static int indexFor(int h, int length) {
        return h & (length-1);
    }

From your school days, you may recall using myString.hashCode() % numOfBuckets to assign an object to a bucket. In Java's HashMap (starting in 1.4, I believe), Josh Bloch et al. changed this to myString.hashCode() & (numOfBuckets - 1). Because HashMap always uses a power-of-two number of buckets, this mask gives the same result as taking the hash modulo the table size (and, unlike %, it never yields a negative index). It is also much cheaper for the CPU, since modulo (division) is more expensive than a bitwise AND or a shift. As long as the low-order bits of myString.hashCode() are well distributed, this scheme gives a uniform object-to-bucket distribution.

However, since anyone can override hashCode for an object, Josh et al. needed to provide some safeguards. This is accomplished by a supplemental hash method in the HashMap class:

    /**
     * Applies a supplemental hash function to a given hashCode, which
     * defends against poor quality hash functions. This is critical
     * because HashMap uses power-of-two length hash tables, that
     * otherwise encounter collisions for hashCodes that do not differ
     * in lower bits. Note: Null keys always map to hash 0, thus index 0.
     */
    static int hash(int h) {
        // This function ensures that hashCodes that differ only by
        // constant multiples at each bit position have a bounded
        // number of collisions (approximately 8 at default load factor).
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

To summarize, a key aim of Java's hashCode was to support hash-based collections. Hash-based collections like HashMap compensate for some bit bias and other weaknesses of poor hash functions, so a mediocre hashCode implementation in String is not a terribly big deal.
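Both HashMap helpers quoted in this answer can be exercised directly. The sketch below copies their bodies (renaming hash to supplementalHash; BucketDemo and distinctBuckets are illustrative names, not JDK API) and assumes a 16-bucket table. It shows that the mask behaves like a non-negative modulo, and that hash codes differing only in high bits all land in one bucket unless the supplemental hash mixes those bits down first.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative class reproducing the two JDK 6 HashMap helpers quoted above.
final class BucketDemo {

    // Same body as the quoted indexFor; only correct when length is a power of two.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    // Same bit mixing as the quoted HashMap.hash(int), under a different name.
    static int supplementalHash(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    // Counts how many distinct buckets the given hash codes occupy.
    static int distinctBuckets(int[] hashes, int length, boolean mix) {
        Set<Integer> buckets = new HashSet<>();
        for (int h : hashes) {
            buckets.add(indexFor(mix ? supplementalHash(h) : h, length));
        }
        return buckets.size();
    }

    public static void main(String[] args) {
        int length = 16;

        // % keeps the sign of a negative hash; the mask (like floorMod) does not.
        int h = -17;
        System.out.println(h % length);               // -1
        System.out.println(Math.floorMod(h, length)); // 15
        System.out.println(indexFor(h, length));      // 15

        // Hash codes that differ only in bits 16..19, so their low four bits are all zero.
        int[] highBitsOnly = new int[16];
        for (int i = 0; i < highBitsOnly.length; i++) {
            highBitsOnly[i] = i << 16;
        }
        System.out.println(distinctBuckets(highBitsOnly, length, false)); // 1
        System.out.println(distinctBuckets(highBitsOnly, length, true));  // 16
    }
}
```

Without mixing, all sixteen keys collide in bucket 0; with the supplemental hash they spread across all sixteen buckets in this particular case.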
Siddharth Anand at Quora
Other answers
Because it's cheap. If you need something more sophisticated, a different general-purpose default probably wouldn't suit you either, since any default also has to be cheap. There is a reason why hashCode() is easy to override (for String itself, which is final, that means wrapping the key or hashing it yourself). And however tempting it may be, changing it would break any application that stores hashes.
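A minimal sketch of the wrapping approach, assuming you want FNV-1a (one of the alternatives mentioned in the question) computed over the UTF-8 bytes of the key. Fnv1aKey is a hypothetical name; the constants are the standard 32-bit FNV offset basis and prime. HashMap still applies its own mixing on top of whatever hashCode() returns.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical wrapper: String is final, so a custom hash needs a key class of its own.
final class Fnv1aKey {
    private final String value;

    Fnv1aKey(String value) {
        this.value = value;
    }

    // 32-bit FNV-1a over the UTF-8 bytes of the string.
    static int fnv1a32(String s) {
        int h = 0x811C9DC5;                  // FNV-1a 32-bit offset basis
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xFF);                 // xor in the byte as an unsigned value...
            h *= 0x01000193;                 // ...then multiply by the FNV prime (wraps mod 2^32)
        }
        return h;
    }

    @Override
    public int hashCode() {
        return fnv1a32(value);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof Fnv1aKey && ((Fnv1aKey) o).value.equals(value);
    }

    @Override
    public String toString() {
        return value;
    }

    public static void main(String[] args) {
        // Word count keyed by the wrapper instead of by String.
        Map<Fnv1aKey, Integer> counts = new HashMap<>();
        for (String word : "to be or not to be".split(" ")) {
            counts.merge(new Fnv1aKey(word), 1, Integer::sum);
        }
        System.out.println(counts.get(new Fnv1aKey("be"))); // 2
    }
}
```

This recomputes the hash (and re-encodes the string) on every call; a real wrapper would probably cache the value, as String does for its own hash.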
Toby Thain
Compatibility, mostly. It was a lot worse in JDK 1.0 and 1.1, where for long strings it only sampled some of the characters instead of all of them. JDK 1.2 and above multiply the running hash by an odd prime (31) before adding each character, to reduce collisions. That seems like a decent tradeoff between speed and collision reduction.
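To illustrate why sampling is risky, here is a deliberately simplified, hypothetical sampler (not the real JDK 1.0/1.1 code) next to the full hash, plus the well-known identity that makes the multiplier 31 cheap: 31 * h equals (h << 5) - h for every int, and modern JVMs may well perform that rewrite themselves. SamplingVsFullHash, sampledHash and times31 are illustrative names.

```java
// Illustrative class: a simplified sampling hash (hypothetical, NOT the JDK 1.0/1.1 code).
final class SamplingVsFullHash {

    // Same 31-multiplier recurrence, but only reads every `step`-th character.
    static int sampledHash(String s, int step) {
        int h = 0;
        for (int i = 0; i < s.length(); i += step) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    // 31 * h as a shift and a subtraction; equal to 31 * h for every int, overflow included.
    static int times31(int h) {
        return (h << 5) - h;
    }

    public static void main(String[] args) {
        // These keys differ only at index 5, which a step-2 sampler never reads.
        String a = "user1234";
        String b = "user1334";
        System.out.println(sampledHash(a, 2) == sampledHash(b, 2)); // true: collision
        System.out.println(a.hashCode() == b.hashCode());           // false
        System.out.println(times31(12345) == 31 * 12345);           // true
    }
}
```

With step 1 the sampler reads every character and reduces to the same recurrence as String.hashCode().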
Chris Longo
The hash code algorithm is specified in the documentation for String.hashCode(), so they are "not allowed" to change it (it would break backwards compatibility). Some people may have written programs that depend on this specific algorithm.
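The documented formula is s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1], computed with int arithmetic. The sketch below evaluates that sum directly (deliberately quadratic, for clarity; DocumentedStringHash is an illustrative name) and checks it against String.hashCode(). It also shows the kind of concrete value programs can end up depending on: "Aa" and "BB" both hash to 2112. As one example of such a dependency, javac compiles a switch on strings into a lookup on hashCode() values computed at compile time, so those values end up baked into class files.

```java
// Illustrative class: evaluates the documented String.hashCode() formula term by term.
final class DocumentedStringHash {

    // sum of s[i] * 31^(n-1-i); int arithmetic wraps mod 2^32, as the spec's int math does.
    static int closedForm(String s) {
        int n = s.length();
        int sum = 0;
        for (int i = 0; i < n; i++) {
            int power = 1;
            for (int k = 0; k < n - 1 - i; k++) {
                power *= 31;                 // builds 31^(n-1-i), modulo 2^32
            }
            sum += s.charAt(i) * power;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(closedForm("hello") == "hello".hashCode()); // true
        // Values fixed by the specification, which stored or compiled-in hashes rely on.
        System.out.println("Aa".hashCode()); // 2112
        System.out.println("BB".hashCode()); // 2112
    }
}
```

Because wrapping int arithmetic is arithmetic modulo 2^32, this direct sum and the incremental h = 31 * h + c loop always produce the same value.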
Jonathan Paulson