Unlike arrays, which strictly use integer indices, hashes can use any data type as their index. What Ruby calls a “hash” is really a clever way of using a key, most often a string, to map quickly to a specific element inside an array.
The string is referred to as a hash key. Some kind of function must exist to map a string to a number. For example, a simple hash function could add up the ASCII codes of the letters in the key and take that sum modulo the number of slots available in the underlying array. A hash collision occurs when the hash function returns the same number for two different keys; collisions can be handled with various collision resolution algorithms. A simple collision resolution algorithm places all keys that collide into a bucket, and the bucket is scanned sequentially for the requested key. A detailed discussion of hashing is beyond the scope of this book, but we wanted to illustrate the differences between a hash table and an array.
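The hashing scheme described above can be sketched in a few lines of Ruby. This is only an illustration of the idea, not how Ruby's own Hash is implemented; the names simple_hash and ToyHashTable and the choice of eight buckets are assumptions made for this sketch.

```ruby
# Toy hash function: add up the character codes of the key, then take
# the remainder modulo the number of buckets. Illustrative only.
def simple_hash(key, num_buckets)
  key.bytes.sum % num_buckets
end

# Hypothetical hash table that resolves collisions with buckets.
class ToyHashTable
  def initialize(num_buckets = 8)
    # One (initially empty) bucket per slot in the underlying array.
    @buckets = Array.new(num_buckets) { [] }
  end

  # Store a value, replacing any existing entry that has the same key.
  def store(key, value)
    bucket = @buckets[simple_hash(key, @buckets.size)]
    pair = bucket.find { |k, _| k == key }
    if pair
      pair[1] = value
    else
      bucket << [key, value]
    end
  end

  # Scan the key's bucket sequentially; return nil if the key is absent.
  def fetch(key)
    bucket = @buckets[simple_hash(key, @buckets.size)]
    pair = bucket.find { |k, _| k == key }
    pair && pair[1]
  end
end

table = ToyHashTable.new
table.store("ab", 1)
table.store("ba", 2) # same letters, same sum: collides with "ab"

puts simple_hash("ab", 8) == simple_hash("ba", 8)
puts table.fetch("ab")
puts table.fetch("ba")
```

Because "ab" and "ba" contain the same letters, they land in the same bucket, yet each key still retrieves its own value because the bucket is searched by key.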
In most cases, strings are used as the keys that map to values. For example, instead of using a two-dimensional array, we can use a hash to store student test scores by name, as seen in Example 6-8. Line 1 creates a new hash, much as we would create a new array. Likewise, element assignment in lines 2–4 follows the same syntax used for arrays.
    1 scores = Hash.new
    2 scores["Geraldo"] = [98, 95, 93, 96]
    3 scores["Brittany"] = [74, 90, 84, 92]
    4 scores["Michael"] = [72, 87, 68, 54, 10]
To access Brittany’s scores, we can simply write scores["Brittany"]. Of course, the string "Brittany" can also be replaced by a variable that holds that string.
    1 scores = Hash.new
    2 scores["Geraldo"] = [98, 95, 93, 96]
    3 scores["Brittany"] = [74, 90, 84, 92]
    4 scores["Michael"] = [72, 87, 68, 54, 10]
    5 name = "Brittany"
    6 puts name + " first score is: " + scores[name][0].to_s
In line 5 of Example 6-9, we assigned “Brittany” to the variable name; so, assuming that the code of Example 6-9 is stored in file hash_2.rb, executing the code should display Brittany’s first score on the screen:
    $ ruby hash_2.rb
    Brittany first score is: 74
It is possible to get an array of all the keys by calling scores.keys. We can then go through each key by using a for loop. We can now rewrite the maximum score example to work for any number of students, no matter what their names are or how many scores each student has.
Note that in our example, the number of individual scores varies among the students. That is, in Example 6-9, both “Geraldo” and “Brittany” have four scores each, while “Michael” has five. The ability to have varying numbers of entries provides great flexibility.
     1 scores = Hash.new
     2
     3 scores["Geraldo"] = [98, 95, 93, 96]
     4 scores["Brittany"] = [74, 90, 84, 92]
     5 scores["Michael"] = [72, 87, 68, 54, 10]
     6
     7 maxscore = 0
     8 for name in scores.keys
     9   column = 0
    10   while (column < scores[name].size)
    11
    12     if (scores[name][column] > maxscore)
    13       maxname = name
    14       maxscore = scores[name][column]
    15     end
    16     column = column + 1
    17   end
    18 end
    19
    20 puts maxname + " has the highest score."
    21 puts "The highest score is: " + maxscore.to_s
We see that running the code from Example 6-10, stored in file find_max_hash.rb, will output the following result:
    $ ruby find_max_hash.rb
    Geraldo has the highest score.
    The highest score is: 98
Note that the entries in this hash differ from the entries used in the array example.
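For comparison, the same search can be written more compactly with Ruby's built-in enumeration methods. The following is a minimal sketch rather than a replacement for the loop-based version; it relies on max_by and max, which are standard Ruby methods, and the block variable names are chosen here only for illustration.

```ruby
scores = Hash.new
scores["Geraldo"] = [98, 95, 93, 96]
scores["Brittany"] = [74, 90, 84, 92]
scores["Michael"] = [72, 87, 68, 54, 10]

# max_by visits each key-value pair and returns the pair whose block
# result is largest; here the block result is the student's best score.
maxname, best_scores = scores.max_by { |_name, student_scores| student_scores.max }
maxscore = best_scores.max

puts maxname + " has the highest score."
puts "The highest score is: " + maxscore.to_s
```

This version makes no assumption about the number of students or the number of scores per student, just like the nested loops it mirrors.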
Hashes cannot replace arrays outright. Because their elements are addressed by key rather than by position, hashes have no natural numeric sequence for their elements. Hashes and arrays serve separate but similar roles. Hashes excel at lookup. A hash keyed on name with a phone number as a value is much easier to work with than a multidimensional array of names and phone numbers.
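The phone number comparison can be made concrete with a short sketch. The names and phone numbers below are placeholders invented for illustration, as is the helper lookup_in_rows.

```ruby
# Multidimensional array: each row holds a name and a phone number.
phone_rows = [
  ["Geraldo", "555-0101"],
  ["Brittany", "555-0102"],
  ["Michael", "555-0103"]
]

# Finding a number means scanning the rows until the name matches.
def lookup_in_rows(rows, name)
  rows.each do |row|
    return row[1] if row[0] == name
  end
  nil # no row had the requested name
end

# Hash keyed on name: the lookup is a single expression.
phone_book = Hash.new
phone_book["Geraldo"] = "555-0101"
phone_book["Brittany"] = "555-0102"
phone_book["Michael"] = "555-0103"

puts lookup_in_rows(phone_rows, "Brittany")
puts phone_book["Brittany"]
```

Both lookups print the same number, but the hash version needs no search code of its own.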
Arrays refer to a sequence of variables where each variable does not have a name; instead, it is referenced by an integer index. That is, arr[i] refers to the ith element in the sequence, remembering that indices start at 0. In contrast, a hash table uses a key-value pairing to identify a particular entry. In the earlier example, we wished to access test scores based on a person’s name. That is, scores["Geraldo"] identifies Geraldo’s test scores even though "Geraldo" is not an integer. Such referencing supports both efficient access and logical correlations.