Searching a list of names or numbers is another very common computer science task. There are many search algorithms, but the key in developing a search algorithm is to determine which type of candidate search process matches the particular need. The following are the two questions (parameters) that affect our search algorithm selection:
Is the list we are searching sorted or unsorted?
Are the searched list elements unique, or are there duplicate values within the list?
For simplicity, we illustrate the search process using only a unique element list. That is, our implementation assumes that there are no duplicate values. We then discuss what needs to be modified in the algorithm and corresponding Ruby implementation to support duplicate values. Given the level of programming sophistication you now possess, we forgo presenting the only slightly modified implementation that supports duplicate values and leave it as an exercise for you. Once again, we revisit the final exam grade example we used in the sections on sorting.
We now discuss two types of searches. The first is for an unsorted list called a linear search, and the second is for an ordered or sorted list; it is called a binary search.
Consider the problem of finding a number or a name, or more accurately, its position, in an unsorted list of unique elements. The simplest means to accomplish this is to visit each element in the list and check whether the element in the list matches the sought-after value. This is called a linear or sequential search since, in the worst case, the entire list must be searched in a linear fashion (one item after another). This occurs when the sought-after value either is in the last position or is absent from the list. Obviously, the average case requires searching half the list since the sought-after value can be found equally likely anywhere in the list, and the best case occurs when the sought-after value is the first element in the list. The algorithm is as follows:
For every element in the list, check whether the element is equal to the value to be found.
If the element looked for is found, then the position where the element is found is returned. Otherwise, continue to the next element in the list.
Continue the search until either the element looked for is found or the end of the list is reached.
A Ruby implementation for unique element linear search is provided in Example 7-5.
1
# Code for linear search
2
NUM_STUDENTS
=
35
3
MAX_GRADE
=
100
4
arr
=
Array
.
new
(
NUM_STUDENTS
)
5
value_to_find
=
8
6
i
=
1
7
found
=
false
8
9
# Randomly put some student grades into arr
10
for
i
in
(
0
.
.
NUM_STUDENTS
-
1
)
11
arr
[
i
]
=
rand
(
MAX_GRADE
+
1
)
12
end
13
14
puts
"Input List:"
15
for
i
in
(
0
.
.
NUM_STUDENTS
-
1
)
16
puts
"arr["
+
i
.
to_s
+
"] ==> "
+
arr
[
i
].
to_s
17
end
18
i
=
0
19
# Loop over the list until it ends or we have found our value
20
while
((
i
<
NUM_STUDENTS
)
and
(
not
found
))
21
# We found it :)
22
if
(
arr
[
i
]
==
value_to_find
)
23
puts
"Found "
+
value_to_find
.
to_s
+
" at position "
+
i
.
to_s
+
" of
the list."
24
found
=
true
25
end
26
i
=
i
+
1
27
end
28
29
# If we haven't found the value at this point, it doesn't exist in our list
30
if
(
not
found
)
31
puts
"There is no "
+
value_to_find
.
to_s
+
" in the list."
32
end
Consider now the case of an unsorted list with potentially duplicate elements. In this case, it is necessary to check each and every element in the list, since an element matching the sought-after value does not imply completion of the search process. Thus, the only difference between this algorithm and the unique element linear search algorithm is that we continue through the entire list without terminating the loop if a matching element is found.
In line 5, we initialize the sought-after value. Clearly the
user would be prompted with some nice box to fill in, but we do not
want to get distracted with user-interface issues. Ultimately the
user fills in the nice box, pulls down a value list, or clicks on a
radio button, and a variable such as value_to_find
will be initialized.
Next, a flag called found
is set to false
on line 7. This
is used so that the search will terminate when the sought-after
value is indeed found.
In lines 10–12, the list is initialized and filled with some random values.
The key loop starts at line 20, where the list is traversed
one item at a time. Each time, the comparison in line 22 tests to
determine if the element in the array matches the sought-after value
the user is trying to find. If the value matches, a message is
displayed, the flag is set to true
, and the loop terminates.
Finally, on line 30, a check is made to determine if the
element was not found—in other words, is absent from the list. If
this is the case, the value in the found
flag will remain false
. If it is still false
after traversing the entire list,
this means that no value in the list matched the sought-after value,
and the user is notified.
Binary search operates on an ordered set of numbers. The idea is to search the list precisely how you might automatically perfectly search a phone book. A phone book is ordered from A to Z. Ideally, you initially search the halfway point in the phone book. For example, if the phone book had 99 names, ideally you would initially look at name number 50. Let’s say it starts with an M, since M is the 13th letter in the alphabet. If the name we are looking for starts with anything from A to M, for example, “Fred,” we find the halfway point between those beginning with A and those beginning with M. If, on the other hand, we are searching for a name farther down the alphabet than those names that start with M, for example, “Sam,” we find the halfway point between those beginning with M and those beginning with Z. Each time, we find the precise middle element of those elements left to be searched and repeat the process. We terminate the process once the sought-after value is found or when the remaining list of elements to search consists of only one value.
By following this process, each time we compare, we reduce half of the search space. This halving process results in a tremendous savings in terms of the number of comparisons made. A linear search must compare each element in the list; a binary search reduces the search space in half each time. Thus, 2x = n is the equation needed to determine how many comparisons (x) are needed to find a sought-after value in an n element list. Solving for x, we obtain x = log2(n).
Binary search is one of the finest examples of computer science helping to make software work smart instead of just working hard. A linear search of 1 million elements takes on average half a million comparisons. A binary search takes 20. That is an average savings of 499,980 comparisons! So think before you code.
Instead of an algorithm needed to find an element using a linear search, we now have an search algorithm. Now, for example, consider a sorted list with 1,048,576 elements. For a linear search, on average, we would need to compare 524,288 elements against the sought-after value, but we may need to perform a total of 1,048,576 comparisons.
In contrast, in a binary search we are guaranteed to search using only log2(1,048,576) = 20 comparisons. Instead of 524,288 comparisons in the average case, the binary search algorithm requires only 20. That is, the number of comparisons required by the binary search algorithm is less than 0.004% of the expected number of comparisons needed by the linear search algorithm. As an aside, and are equivalent, since they differ strictly by a constant. Hence, generally speaking, the notation is preferred. Of course, binary search is possible only if the original list is sorted. The binary search explanation given earlier is for a unique element list only. However, before presenting the modification needed for potential duplicate elements, a few remarks regarding the use of binary search must be made:
First, binary search assumes an ordered list. If the list is unordered, it must be sorted prior to the search, or a binary search won’t work. Sorting involves a greater time complexity than searching. Thus, if the search will occur rarely, it might not be wise to sort the list. On the other hand, searches often occur frequently and the updating of the list occurs infrequently. Thus, in such cases, always sort (order) the list and then use binary search.
The average- and worst-case search times for binary search are , while the average- and worst-case search times for linear search are . What is interesting, however, is that unlike for linear search, where, in practice, the worst-case search time is double that of the average case, for binary search both times are roughly identical in practice.
In Example 7-6, we present a Ruby implementation of unique element binary search. Note that we introduce on line 16 a built-in Ruby feature to check if a value is already present within an array and on line 22 to sort an array in place.
1
# Code for binary search
2
NUM_STUDENTS
=
30
3
MAX_GRADE
=
100
4
arr
=
Array
.
new
(
NUM_STUDENTS
)
5
# The value we are looking for
6
value_to_find
=
7
7
low
=
0
8
high
=
NUM_STUDENTS
-
1
9
middle
=
(
low
+
high
)
/
2
10
found
=
false
11
12
# Randomly put some exam grades into the array
13
for
i
in
(
0
.
.
NUM_STUDENTS
-
1
)
14
new_value
=
rand
(
MAX_GRADE
+
1
)
15
# make sure the new value is unique
16
while
(
arr
.
include?
(
new_value
))
17
new_value
=
rand
(
MAX_GRADE
+
1
)
18
end
19
arr
[
i
]
=
new_value
20
end
21
# Sort the array (with Ruby's built-in sort)
22
arr
.
sort!
23
24
"Input List: "
25
for
i
in
(
0
.
.
NUM_STUDENTS
-
1
)
26
puts
"arr["
+
i
.
to_s
+
"] ==> "
+
arr
[
i
].
to_s
27
end
28
29
while
((
low
<=
high
)
and
(
not
found
))
30
middle
=
(
low
+
high
)
/
2
31
# We found it :)
32
if
arr
[
middle
]
==
value_to_find
33
puts
"Found grade "
+
value_to_find
.
to_s
+
"% at position "
+
middle
.
to_s
34
found
=
true
35
end
36
37
# If the value should be lower than middle, search the lower half,
38
# otherwise, search the upper half
39
if
(
arr
[
middle
]
<
value_to_find
)
40
low
=
middle
+
1
41
else
42
high
=
middle
-
1
43
end
44
end
45
46
if
(
not
found
)
47
puts
"There is no grade of "
+
value_to_find
.
to_s
+
"% in the list."
48
end
We use these features to simplify the code illustrated. Note that this type of abstraction, namely, the use of the built-in encapsulated feature, simplifies software development, increases readability, and simplifies software maintenance. Its use is paramount in practice. It should be understood that this book uses Ruby as a tool for learning concepts of computer science and basic programming, and not as an attempt to teach all the capabilities of the Ruby interpreter. See the additional reading list in Appendix A if you are interested in exploring additional built-in features of Ruby.
Line 9 computes the first middle range.
Lines 29–44 implement the key loop that keeps cutting the search space down by half.
Line 32 is the comparison to the sought-after value. If we
find the element, we are done and we update the same found
flag that was used for the linear
search. If the value is less than the middle we update the high side
of the range, and if it is greater we update the low side of the
range. At the end, we verify that the sought-after value was indeed
found.
As with the linear search algorithm, the modification required to support possible duplicate values is relatively minimal for the binary search algorithm. Since binary search requires an ordered list, if a sought-after value is found, then all duplicates must be adjacent to the position just found.
Thus, to find all duplicates, positions immediately preceding and following the current position are checked for as long as the sought-after value is found. That is, in succession, adjacent positions earlier and earlier in the list are checked while the stored element value equals that which is sought after, and similarly for later and later positions. Again, the implementation of this change is left as an exercise to the reader.
3.17.162.214