Hash functions

A hash is a mathematical function that assigns a number to an element. Say you work at a university administration and you have to tell if Wilkinson is a student at your class. You can store the names on small papers in envelopes one for each starting letter. Instead of searching through the 10 thousand students, you can look at the papers in the envelope titled W. This very simple hash function assigns the first letter of the name to the name (or the ordinal number of the letter, as we said that a hash function results a number). This is not really a good hash function because it puts only a few elements, if any, into the envelope denoted X and many to A for example.

A good hash function results each possible ordinal number with similar probability. In hash tables, we usually have more buckets (envelopes in the previous example) than the number of elements to be stored. Therefore, when an element is searched for, it is likely that there is only one element there. At least that is what we would like to have. If there are multiple elements in a single bucket, it is called collision. A good hash function has as little collisions as possible.

For backward compatibility, there is a Hashtable class in the JDK. This was one of the first hash table implementations in Java right in the very first version, and as Java is backward compatible, it was not thrown away. The Map interface was introduced in version 1.2 only. Hashtable has many drawbacks and its use is not recommended. (Even the name is violating the Java naming conventions.) We do not discuss this class in this book. Whenever we talk about hash tables, it is referring to the actual array that is inside the implementation of HashSet, HashMap, or any other collection that uses some hash indexed table.

Hash tables are arrays that use the result of the hash function to index the array. Usually, linked lists manage collisions. Hash table implementations also implement a strategy to resize the array when the number of elements to be stored becomes too high and the likelihood of collisions increase. This operation may take considerable time and, during this, the individual elements are moved between the buckets.

During this operation, the hash table cannot reliably be used and this may be some source of issues in a multithread environment. In single thread code, you do not meet this problem. When you call the add method, the hash table (set or map) decides that the table has to be resized. The add method calls the resizing method and does not return until it is finished. Single thread code has no possibility to use the hash table during this period: the one and single thread is executing the resizing itself. In a multithread environment, however...

HashSet and HashMap use the hash function provided by the Object that is stored in the collection. The Object class implements the hashCode and equals methods. You can override them and if you do, you should override both in a consistent manner. First, we will see what they are and then how to override them consistently.

Table of Contents for Hash functions

Create new playlist

Sign In

Sign Up

Table of Contents for
Hash functions