Remainder and Multiplication Hash Functions

For hash tables, a hash function maps a specific key's space into a smaller number range. More formally, a hash function, f, maps keys of a specific data type to integers in a fixed interval [0,..., N - 1]. We say f(x) hashes the value of x.

The hash function can accept only numeric data types. To allow us to use hash tables on more complex data types, we usually need to convert all these types into numeric representations. This translation varies, depending on the type of data. For example, a character can be changed into its UTF-8 (or ASCII) numeric equivalent. Converting a full string can be done by converting each character separately and then using a strategy to combine the characters into one value.

In Java, the hashCode() method converts an object into a numeric representation, which is ready to be used by a hash function. It is present in the object class and can be overridden using a custom implementation.

There are many techniques on how we can map keys from a wide range into smaller ones. An ideal hash function is one that reduces collisions to a minimum. In other words, when a good hash function is used, each key has the same probability of filling any of the slots in our array. In practice, finding an ideal hash function is very difficult unless we know the input distribution.

A simple technique to implement a hash function is what is known as the remainder method. The hash function simply takes in any numeric key, divides it by the table size (size of the array), and uses the resultant remainder as the hash value. This value can then be used as an index on the array.

The following code shows how the remainder hashing method can be implemented in Java using the modulus operator:

public int hashKey(Integer key, int tableSize) {
return key % tableSize;
}
Snippet 3.5: The remainder method. Source class name: RemainderHashing

Go to https://goo.gl/wNyWWX to access this code.

The reminder method might result in many collisions if care is not taken when choosing an appropriate table size. Once again, consider the example given in the beginning of this section where we are using the student's passport or national ID number to identify a student in the school. To demonstrate the problem, we use an array-based hash table with a size of 1,000 elements. It just so happens that in the country where the school is based, the last four digits of the passport numbers represent the year of birth of the passport holder.

When using the remainder method in this scenario, all the students with the same year of birth will hash to the same value, causing a lot of collisions on the hash table.

A better choice of a table size is to use a prime number, ideally not too close to the power of 2. For example, the value of 1,447 is a good choice in our example, since it's not too close to 1,024 or 2,048 (210 and 211) and is also prime. Using this value as a table size for our example would reduce collisions.

Using the remainder method restricts us on the choice of size for our hash table (to reduce the chance of collisions). To address this, we can use a different hashing technique, called the multiplication method. In this method, we multiply the key by a constant double value, k, in the range 0 < k < 1. We then extract the fractional part from the result and multiply it by the size of our hash table.

The hash value is then the floor of this result:

Where:

  • k is a decimal in the range between 0 and 1
  • s is the size of the hash table
  • x is the key
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.191.176.5