Binary Search

Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

Binary Search

Consider a list of 1 million sorted values. Linear search starts at the beginning of the list and asks, “Is this value what I’m looking for?” If it isn’t, the same is asked about the second value, and then the third. Up to 1 million questions are asked. This algorithm doesn’t take advantage of the list being sorted.

Here’s a new algorithm, called binary search, that relies on the list being sorted: look at the middle value and ask, “Is this value bigger than or smaller than the one I’m looking for?” With that one question, we can eliminate 500,000 values! That leaves a list of 500,000 values to search. We’ll do it again: look at the middle value, ask the same question, and eliminate another 250,000 values. We have eliminated 3/4 of the list with only two questions! Asking only 20 questions, we can locate a particular value in a list of 1 million sorted values.

The logarithm of a number is how many times that number can be divided until we get to 1. We’ll need to know what number we are dividing by—we’ll call that the base. For binary search, we use base 2, because we divide the list in half each iteration.

The logarithm base 2 of 1, which we’ll write as log₂ 1, is 0: we don’t need to divide 1 at all in order to reach 1.

log₂ 2 is 1, because 2/2 is 1.

log₂ 4 is 2: 4/2 is 2, and 2/2 is 1, so we divided by 2 twice to reach 1.

log₂ 8 is 3: 8/2 is 4, 4/2 is 2, and 2/2 is 1. Every time we double the number, the logarithm base 2 increases by 1.

Here’s a table of base 2 logarithms:

Table 19. Logarithmic Growth

N (the # of items)	N as a power of 2	log₂ N
1	2⁰	0
2	2¹	1
4	2²	2
8	2³	3
16	2⁴	4
32	2⁵	5
64	2⁶	6
128	2⁷	7
256	2⁸	8
512	2⁹	9
1024	2¹⁰	10

To figure out how fast it is, we’ll think about how big a list we can search with a certain number of questions. With only one question, we can determine whether a list of length 1 contains a value. With two questions, we can search a list of length 2. With three questions, we can search a list of length 4. Four questions, length 8. Five questions, length 16. Every time we get to ask another question, we can search a list twice as big.

Using logarithmic notation, N sorted values can be searched in ceiling(log₂ N) steps, where ceiling() is the ceiling function that rounds a value up to the nearest integer. As shown in Table 20, this increases much less quickly than the time needed for linear search.

Table 20. Logarithmic Growth

Searching N Items	Worst Case—Linear Search	Worst Case—Binary Search
100	100	7
1000	1000	10
10,000	10,000	14
100,000	100,000	17
1,000,000	1,000,000	20
10,000,000	10,000,000	24

The key to binary search is to keep track of three parts of the list: the left part, which contains values that are smaller than the value we are searching for; the right part, which contains values that are equal to or larger than the value we are searching for; and the middle part, which contains values that we haven’t yet examined—the unknown section. If there are duplicate values, we will return the index of the leftmost one, which is why the “equal to” section belongs on the right.

We’ll use two variables to keep track of the boundaries: i will mark the index of the first unknown value, and j will mark the index of the last unknown value:

images/searchsort/binarysearch_invariant.png

At the beginning of the algorithm, the unknown section makes up the entire list, so we will set i to 0 and j to the length of the list minus one as shown in the figure.

images/searchsort/binarysearch_invariant_start.png

We are done when that unknown section is empty—when we’ve examined every item in the list. This happens when i == j + 1—when the values cross. (When i == j, there is still one item left in the unknown section.) Here is a picture of what the values are when the unknown section is empty:

images/searchsort/binarysearch_invariant_finish.png

To make progress, we will set either i or j to near the middle of the range between them. Let’s call this index m, which is at (i + j) // 2. (Notice the use of integer division: we are calculating an index, so we need an integer.)

Think for a moment about the value at m. If it is less than v, we need to move i up, while if it is greater than v, we should move j down. But where exactly do we move them?

When we move i up, we don’t want to set it to the midpoint exactly, because L[m] isn’t included in the range; instead, we set it to one past the middle—in other words, to m + 1.

images/searchsort/binarysearch_invariant_start2.png

Similarly, when we move j down, we move it to m - 1:

images/searchsort/binarysearch_invariant_start3.png

The completed function is as follows:

	from typing import Any

	def binary_search(L: list, v: Any) -> int:
	"""Return the index of the first occurrence of value in L, or return
	-1 if value is not in L.

	>>> binary_search([1, 3, 4, 4, 5, 7, 9, 10], 1)
	0
	>>> binary_search([1, 3, 4, 4, 5, 7, 9, 10], 4)
	2
	>>> binary_search([1, 3, 4, 4, 5, 7, 9, 10], 5)
	4
	>>> binary_search([1, 3, 4, 4, 5, 7, 9, 10], 10)
	7
	>>> binary_search([1, 3, 4, 4, 5, 7, 9, 10], -3)
	-1
	>>> binary_search([1, 3, 4, 4, 5, 7, 9, 10], 11)
	-1
	>>> binary_search([1, 3, 4, 4, 5, 7, 9, 10], 2)
	-1
	>>> binary_search([], -3)
	-1
	>>> binary_search([1], 1)
	0
	"""

	# Mark the left and right indices of the unknown section.
	i = 0
	j = len(L) - 1

	while i != j + 1:
	m = (i + j) // 2
	if L[m] < v:
	i = m + 1
	else:
	j = m - 1

	if 0 <= i < len(L) and L[i] == v:
	return i
	else:
	return -1

	if __name__ == '__main__':
	import doctest
	doctest.testmod()

There are a lot of tests because the algorithm is quite complicated and we wanted to test pretty thoroughly. Our tests cover these cases:

The value is the first item.
The value occurs twice. We want the index of the first one.
The value is in the middle of the list.
The value is the last item.
The value is smaller than everything in the list.
The value is larger than everything in the list.
The value isn’t in the list, but it is larger than some and smaller than others.
The list has no items.
The list has one item.

In Chapter 15, Testing and Debugging, you’ll learn a different testing framework that allows you to write tests in a separate Python file (thus making docstrings shorter and easier to read; only a couple of examples are necessary), and you’ll learn strategies for coming up with your own test cases.

Binary Search Running Time

Binary search is much more complicated to write and understand than linear search. Is it fast enough to make the extra effort worthwhile? To find out, we can compare it to list.index. As before, we search for the first, middle, and last items in a list with about ten million elements as shown in Table 21.

Table 21. Running Times for Binary Search

Case	list.index	binary_search	Ratio
First	0.007	0.02	0.32
Middle	105	0.02	5910
Last	211	0.02 (Wow!)	11661

The results are impressive. Binary search is up to several thousand times faster than its linear counterpart when searching ten million items. Most importantly, if we double the number of items, binary search takes only one more iteration, whereas the time for list.index nearly doubles.

Note also that although the time taken for linear search grows in step with the index of the item found, there is no such pattern for binary search. No matter where the item is, it takes the same number of steps.

Built-In Binary Search

The Python standard library’s bisect module includes binary search functions that are slightly faster than our binary search. Function bisect_left returns the index where an item should be inserted in a list to keep it in sorted order, assuming it is sorted to begin with. insort_left actually does the insertion.

The word left in the name signals that these functions find the leftmost (lowest index) position where they can do their jobs; the complementary functions bisect_right and insort_right find the rightmost.

There is one minor drawback to binary search: the algorithm assumes that the list is sorted, and sorting is time and memory intensive. We’ll look at that next.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

Table of Contents for Binary Search

Create new playlist

Sign In

Sign Up

Binary Search

Binary Search Running Time

Built-In Binary Search

Table of Contents for
Binary Search