As you have already seen in Table 11, List Methods, Python lists have a method called index that searches for a particular item:

 index(...)
  L.index(value, [start, [stop]]) -> integer -- return first index of value

List method index starts at the front of the list and examines each item in turn. For reasons that will soon become clear, this technique is called linear search. Linear search is used to find an item in an unsorted list. If there are duplicate values, our algorithms will find the leftmost one:

 >>>​​ ​​[​​'d'​​,​​ ​​'a'​​,​​ ​​'b'​​,​​ ​​'a'​​].index(​​'a'​​)
 1

We’re going to write several versions of linear search in order to demonstrate how to compare different algorithms that all solve the same problem.

After we do this analysis, we will see that we can search a sorted list much faster than we can search an unsorted list.

An Overview of Linear Search

Linear search starts at index 0 and looks at each item one by one. At each index, we ask this question: Is the value we are looking for at the current index? We’ll show three variations of this. All of them use a loop of some kind, and they are all implementations of this function:

 from​ typing ​import​ Any
 
 def​ linear_search(lst: list, value: Any) -> int:
 """Return the index of the first occurrence of value in lst, or return
  -1 if value is not in lst.
 
  >>> linear_search([2, 5, 1, -3], 5)
  1
  >>> linear_search([2, 4, 2], 2)
  0
  >>> linear_search([2, 5, 1, -3], 4)
  -1
  >>> linear_search([], 5)
  -1
  """
 
 # examine the items at each index i in lst, starting at index 0:
 # is lst[i] the value we are looking for? if so, stop searching.

The algorithm in the function body describes what every variation will do to look for the value.

We’ve found it to be helpful to have a picture of how linear search works. (We will use pictures throughout this chapter for both searching and sorting.)

Because our versions examine index 0 first, then index 1, then 2, and so on, that means that partway through our searching process we have this situation:

images/searchsort/linearsearch_abstract.png

There is a part of the list that we’ve examined and another part that remains to be examined. We use variable i to mark the current index.

Here’s a concrete example of where we are searching for a value in a list that starts like this: [2, -3, 5, 9, 8, -6, 4, 15, …]. We don’t know how long the list is, but let’s say that after six iterations we have examined items at indices 0, 1, 2, 3, 4, and 5. Index 6 is the index of the next item to examine:

images/searchsort/linearsearch_concrete.png

That vertical line divides the list in two: the part we have examined and the part we haven’t. Because we stop when we find the value, we know that the value isn’t in the first part:

images/searchsort/linearsearch_invariant.png

This picture is sometimes called an invariant of linear search. An invariant is something that remains unchanged throughout a process. But variable i is changing—how can that picture be an invariant?

Here is a text version of the picture:

 lst[0:i] doesn't contain value, and 0 <= i <= len(lst)

This word version says that we know that value wasn’t found before index i and that i is somewhere between 0 and the length of the list. If our code matches that word version, that word version is an invariant of the code, and so is the picture version.

We can use invariants to come up with the initial values of our variables. For example, with linear search, at the very beginning the entire list is unknown—we haven’t examined anything:

images/searchsort/linearsearch_invariant_start.png

Variable i refers to 0 at the beginning, because then the section with the label value not here is empty; further, list[0:0] is an empty list, which is exactly what we want according to the word version of the invariant. So the initial value of i should be 0 in all of our versions of linear search.

The while Loop Version of Linear Search

Let’s develop our first version of linear search. We need to refine our comments to get them closer to Python:

 Examine every index i in lst, starting at index 0:
  Is lst[i] the value we are looking for? if so, stop searching

Here’s a refinement:

 i = 0 # The index of the next item in lst to examine
 
 While the unknown section isn't empty, and lst[i] isn't
 the value we are looking for:
  add 1 to i

That’s easier to translate. The unknown section is empty when i == len(lst), so it isn’t empty as long as i != len(lst). Here is the code:

 from​ typing ​import​ Any
 
 def​ linear_search(lst: list, value: Any) -> int:
 """Return the index of the first occurrence of value in lst, or return
  -1 if value is not in lst.
 
  >>> linear_search([2, 5, 1, -3], 5)
  1
  >>> linear_search([2, 4, 2], 2)
  0
  >>> linear_search([2, 5, 1, -3], 4)
  -1
  >>> linear_search([], 5)
  -1
  """
 
  i = 0 ​# The index of the next item in lst to examine.
 
 # Keep going until we reach the end of lst or until we find value.
 while​ i != len(lst) ​and​ lst[i] != value:
  i = i + 1
 
 # If we fell off the end of the list, we didn't find value.
 if​ i == len(lst):
 return​ -1
 else​:
 return​ i

This version uses variable i as the current index and marches through the values in lst, stopping in one of two situations: when we have run out of values to examine or when we find the value we are looking for.

The first check in the loop condition, i != len(lst), makes sure that we still have values to look at; if we were to omit that check, then if value isn’t in lst, we would end up trying to access lst[len(lst)]. This would result in an IndexError.

The second check, lst[i] != value, causes the loop to exit when we find value. The loop body increments i; we enter the loop when we haven’t reached the end of lst, and when lst[i] isn’t the value we are looking for.

After the loop terminates, if i == len(lst) then value wasn’t in lst, so we return -1. Otherwise, the loop terminated because we found value at index i.

The for Loop Version of Linear Search

The first version evaluates two Boolean subexpressions each time through the loop. But the first check, i != len(lst), is almost unnecessary; it evaluates to True almost every time through the loop, so the only effect it has is to make sure we don’t attempt to index past the end of the list. We can instead exit the function as soon as we find the value:

 i = 0 # The index of the next item in lst to examine
 
 For each index i in lst:
  If lst[i] is the value we are looking for:
  return i
 
 If we get here, value was not in lst, so we return -1

In this version, we use Python’s for loop to examine each index.

 from​ typing ​import​ Any
 
 def​ linear_search(lst: list, value: Any) -> int:
 """… Exactly the same docstring goes here …
  """
 
 for​ i ​in​ range(len(lst)):
 if​ lst[i] == value:
 return​ i
 
 return​ -1

With this version, we no longer need the first check because the for loop controls the number of iterations. This for loop version is significantly faster than our first version; we’ll see in a bit how much faster.

Sentinel Search

The last linear search we will study is called sentinel search. (A sentinel is a guard whose job it is to stand watch.) Remember that one problem with the while loop linear search is that we check i != len(lst) every time through the loop even though it can never evaluate to False except when value is not in lst. So we’ll play a trick: we’ll add value to the end of lst before we search. That way we are guaranteed to find it! We also need to remove it before the function exits so that the list looks unchanged to whoever called this function:

 Set up the sentinel: append value to the end of lst
 
 i = 0 # The index of the next item in lst to examine
 
 While lst[i] isn't the value we are looking for:
  Add 1 to i
 
 Remove the sentinel
 
 return i

Let’s translate that to Python:

 from​ typing ​import​ Any
 
 def​ linear_search(lst: list, value: Any) -> int:
 """… Exactly the same docstring goes here …
  """
 
 # Add the sentinel.
  lst.append(value)
 
  i = 0
 
 # Keep going until we find value.
 while​ lst[i] != value:
  i = i + 1
 
 # Remove the sentinel.
  lst.pop()
 
 # If we reached the end of the list we didn't find value.
 if​ i == len(lst):
 return​ -1
 else​:
 return​ i

All three of our linear search functions are correct. Which one you prefer is largely a matter of taste: some programmers dislike returning in the middle of a loop, so they won’t like the second version. Others dislike modifying parameters in any way, so they won’t like the third version. Still others will dislike that extra check that happens in the first version.

Timing the Searches

Here is a program that we used to time the three searches on a list with about ten million values:

 import​ time
 import​ linear_search_1
 import​ linear_search_2
 import​ linear_search_3
 
 from​ typing ​import​ Callable, Any
 
 def​ time_it(search: Callable[[list, Any], Any], L: list, v: Any) -> float:
 """Time how long it takes to run function search to find
  value v in list L.
  """
 
  t1 = time.perf_counter()
  search(L, v)
  t2 = time.perf_counter()
 return​ (t2 - t1) * 1000.0
 
 def​ print_times(v: Any, L: list) -> None:
 """Print the number of milliseconds it takes for linear_search(v, L)
  to run for list.index, the while loop linear search, the for loop
  linear search, and sentinel search.
  """
 
 # Get list.index's running time.
  t1 = time.perf_counter()
  L.index(v)
  t2 = time.perf_counter()
  index_time = (t2 - t1) * 1000.0
 
 # Get the other three running times.
  while_time = time_it(linear_search_1.linear_search, L, v)
  for_time = time_it(linear_search_2.linear_search, L, v)
  sentinel_time = time_it(linear_search_3.linear_search, L, v)
 
 print​(​"{0}​​ ​​{1:.2f}​​ ​​{2:.2f}​​ ​​{3:.2f}​​ ​​{4:.2f}"​.format(
  v, while_time, for_time, sentinel_time, index_time))
 
 L = list(range(10000001)) ​# A list with just over ten million values
 
 print_times(10, L) ​# How fast is it to search near the beginning?
 print_times(5000000, L) ​# How fast is it to search near the middle?
 print_times(10000000, L) ​# How fast is it to search near the end?

This program makes use of function perf_counter in built-in module time. Function time_it will call whichever search function it’s given on v and L and returns how long that search took. Function print_times calls time_it with the various linear search functions we have been exploring and prints those search times.

Linear Search Running Time

The running times of the three linear searches with that of Python’s list.index are compared in Table 18. This comparison used a list of 10,000,001 items and three test cases: an item near the front, an item roughly in the middle, and the last item. Except for the first case, where the speeds differ by very little, our while loop linear search takes about thirteen times as long as the one built into Python, and the for loop search and sentinel search take about five and seven times as long, respectively.


Table 18. Running Times for Linear Search (in milliseconds)

Case

while

for

sentinel

list.index

First

0.01

0.01

0.01

0.01

Middle

1261

515

697

106

Last

2673

1029

1394

212


What is more interesting is the way the running times of these functions increase with the number of items they have to examine. Roughly speaking, when they have to look through twice as much data, every one of them takes twice as long. This is reasonable because indexing a list, adding 1 to an integer, and evaluating the loop control expression require the computer to do a fixed amount of work. Doubling the number of times the loop has to be executed therefore doubles the total number of operations, which in turn should double the total running time. This is why this kind of search is called linear: the time to do it grows linearly with the amount of data being processed.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.21.248.162