Algorithms are designed to solve common problems that arise frequently in software applications. You can improve the efficiency of your programs by learning about a wide range of algorithms. This chapter introduces the terminology and notation used by theoreticians and practitioners alike to document the behavior of algorithms in terms of computational performance and resource usage.
When multiple algorithms solve the exact same problem, it is sometimes possible to identify which one will be the most efficient simply by classifying its performance. First, I want to start with a few examples showing how the behavior of an algorithm is revealed when solving problem instances of increasing size.
A library has one thousand books on five double-entry shelving stacks, where books are stored on shelves on either side. The books are arranged alphabetically by author’s last name and then by first name. If an author has multiple books, they are arranged alphabetically by title. Libraries routinely order books in this way to make it easy to locate a book when you know the author’s name.
Come up with a strategy for completing each of the following tasks and estimate (a) how many words you have to read; and (b) how many steps you have to take. These are the “key operations” for any algorithm in the library. You are alone in the library and you have no smart phone or even a pen to write with. I also want you to estimate the time you need to perform each task.
T1. Check whether the library has a book written by Robert Heinlein.
T2. Find the book named “Dinosaur in a Haystack.”
T3. Find two books in the library whose first sentence is identical.
Don’t peek at my results! Take a few minutes to come up with your own estimates.
| T1 (by author) | T2 (by title) | T3 (matching first sentences) |
|---|---|---|
| Walk along the edge of the stacks until you find the one that contains authors starting with "H". Then walk along one side to find "He" and slow down to inspect names more closely to find "Heinlein." This task may take longer if the "H" authors wrap around a stack. | Start at the first stack and read the spine of each book, spending no more than one second on each book as you walk past all shelves on all stacks. Since the book's author is Stephen Jay Gould, the book would be found on a shelf on the fourth stack. | Open the first book on the first shelf of the first stack to its first page and read its first sentence. For each of the remaining 999 books, check whether its first sentence matches; then repeat with the second book against the 998 books after it, and so on, until a matching pair is found or all 1,000 × 999 / 2 = 499,500 pairs have been checked. |
What if there were 2,000 books on ten book stacks, twice as many as before? How would your estimates change? T1 should still be completed in less than a minute, while T2 doubles in time to 30 minutes. T3 must check 2,000 × 1,999 / 2 = 1,999,000 possible book pairs, now requiring 552 hours to complete, a four-fold increase. It seems tasks can be affected differently when the problem instance doubles. Pay attention to how the information is organized; tasks like T1 are easier to perform, while other tasks, such as T3, are simply harder.
If T3 were more common, then a librarian might consider organizing the books in the stacks alphabetically by first sentence. Storing books in this way would allow all three tasks to complete in about 15 minutes. T1 and T2 might force you to read the spine of every book, but now you can complete T3 by making a single pass through the stacks, checking neighboring books to find a pair whose first sentence is identical.
When you understand which operations are to be performed, you can structure your data to increase the efficiency of these operations.
Often an algorithm can speed up its performance by increasing the extra information it stores. This “space vs. time” tradeoff is common when designing an algorithm.
If you had extra storage where you could write down some additional information, then you could solve T3 more efficiently even when the books are stored alphabetically by author’s last name. For example, you could create a spreadsheet where each row stores a book’s title and first sentence. Let’s assume it takes 10 seconds to enter each book’s information into the spreadsheet. Once done, sort the spreadsheet by sentence and then check whether two neighboring sentences are identical. This reduces the number of words to read, since you only need to read each sentence exactly once, plus you only need to walk through the stacks once, eliminating countless steps. For 1,000 books, T3 now should take less than 3 hours – a significant improvement.
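The same sort-then-scan idea is easy to express in code. Here is a minimal sketch, assuming the book data has already been collected into (title, first sentence) pairs; the function and variable names are illustrative, not from the text.

```python
def find_matching_first_sentences(books):
    """books: list of (title, first_sentence) pairs gathered in one pass.

    Sort by first sentence, then compare neighboring entries, mirroring
    the sorted-spreadsheet strategy for T3.
    """
    rows = sorted(books, key=lambda book: book[1])   # sort by first sentence
    for prev, curr in zip(rows, rows[1:]):
        if prev[1] == curr[1]:                       # identical first sentences
            return prev[0], curr[0]                  # titles of the matching pair
    return None                                      # no matching pair found
```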
The problem instance size, N, for each of these tasks is the total number of books in the library. If you can fit 100 books on the shelves on each side of a stack, then the library needs N/200 stacks to hold the books. Based solely on N, you can come up with reasonable estimates for each of these tasks.
We’re now ready to tackle the concept of Asymptotic Analysis, which is central to understanding how to analyze algorithms. This technique allows us to estimate (in terms of N) the time required by an algorithm to process a problem instance of size N.
Let’s begin this discussion with a real-world problem, depicted in Figure 2-2, that can be solved in different ways, to discuss the concept of performance classes. If you want to travel from the Prudential Center in Boston, Massachusetts to Rockefeller Center in New York City, how would you do it?
Certainly flying a plane involves the quickest speed, but this doesn’t take into account (a) the pre-flight time to take an Uber to Boston Logan airport; (b) any unexpected delays at the airport because of weather, security, or flight traffic; (c) the post-flight time to take an Uber from LaGuardia Airport to Rockefeller Center. Would it be faster to just drive a car from Boston to Manhattan, which otherwise has no additional pre- or post- travel times? Should you just grab your bicycle and see how far you get? What if you jumped on an Amtrak train at Back Bay station in Boston and walked to Rockefeller Center from Grand Central Station in New York City? Clearly there are a number of options you could take.
If the destination changed to the Golden Gate Bridge in San Francisco, California, then all reasonable travel strategies would involve some airplane travel, because the distance is simply too great. If you had the option to walk, bike, drive, and/or fly as part of this trip, how would you choose to travel?
I am going to group each potential means of transportation into different speed performance classes. For example, the foot speed class represents walking or running, while the bicycle speed class represents riding a bicycle. Some people are faster runners than others¹, but no matter how fast a runner is, in head-to-head competition with a bicyclist, the runner will always eventually fall behind, as shown in Figure 2-3. For short distances (say 100 meters), the runner will win, but for longer distances, the bicyclist will always win. In other words, even the slowest bicyclist will eventually outperform even the fastest runner.
Let’s look at different transportation types in Figure 2-4. The y-axis scale is logarithmic, which means that each horizontal line is ten times faster than the horizontal line below it. For each speed class, the world speed record is plotted; for example, the record speed for a car² is 763 mph, faster than any train. Each new type of transportation represents a speed class, introducing different opportunities for speed records.
Once you choose your mode of transportation, that determines the speed you can travel and ultimately the distance you can achieve in “reasonable time.” Although people have walked across the United States³, one would naturally prefer flying in a plane! To make another point, even though the rocket achieves the fastest speed, you can’t use it for travel within the United States; it just isn’t practical except for very long distances (and besides, where would you park?).
To bring this discussion back to algorithms, the time complexity class is an example of a performance class used to analyze algorithms. We have seen two performance classes so far. When an algorithm belongs to the linear time complexity class, it means that as the problem size, N, doubles, the algorithm should require twice as much time. When an algorithm belongs to the quadratic time complexity class, it means that as N doubles, the time required quadruples. Algorithm analysis is concerned with classifying algorithms into the appropriate performance class so you can accurately predict how they will behave in the long run.
Up next is an algorithm that belongs to a third, logarithmic time complexity class, which means that as N doubles, the time required increases only by a fixed, constant amount. These algorithms are extremely efficient. Let’s see how this works.
I have written a sequence of seven different numbers in increasing order from left to right, and hidden each one behind a door. Try this challenge: what is the fewest number of doors you need to open – one at a time – to either find a target value of 643 or prove it is not hidden behind one of these doors? You could start from the left and open each door – one at a time – until you find 643 or a larger number (which means it wasn’t behind a door in the first place). But with bad luck, you might have to open all seven doors.
Instead, you can solve the challenge by opening no more than three doors. Start with door 4 in the middle and open it. Rats! The number it was hiding is 173, but this means you can ignore all of the doors to the left of door 4 (since the numbers behind those doors will all be smaller than 173). Now open door 6 to reveal the number 900. OK, so now you know that you can ignore the doors to the right of door 6. Only door 5 can be hiding 643, so open it now to determine whether the original series contained 643. I leave it to your imagination whether the number was behind that door.
If you repeat this process on any ascending list of seven numbers using any target value, you will never need to open more than three doors. Did you notice that 2³ – 1 = 7? What if you had 1,000,000 doors covering an ascending list of numbers? Would you accept a $10,000 challenge to find out whether a specific number were hidden behind some door if you were only able to open twenty doors? You should! Since 2²⁰ – 1 = 1,048,575, you can always locate a number in an ascending list of 1,048,575 numbers after opening 20 or fewer doors. Even better, if there were suddenly about twice as many doors, 2,097,151 in fact, you would never need to open more than 21 doors to find a number; that’s just one additional door to open. That seems astonishingly efficient! You have just discovered Binary Array Search.
Binary Array Search is a fundamental algorithm in computer science because of its time complexity. Let’s first review a Python implementation below that searches for target in an ordered list, A.
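A minimal sketch of such an implementation, consistent with the description that follows (the function name binary_array_search is my own choice):

```python
def binary_array_search(A, target):
    """Return True if target is found in the sorted list A, otherwise False."""
    lo = 0
    hi = len(A) - 1
    while lo <= hi:                  # there is still a sublist to explore
        mid = (lo + hi) // 2         # midpoint via integer division
        if target < A[mid]:
            hi = mid - 1             # continue in the left sublist A[lo .. mid-1]
        elif target > A[mid]:
            lo = mid + 1             # continue in the right sublist A[mid+1 .. hi]
        else:
            return True              # target found at index mid
    return False                     # lo crossed over hi: target is not present
```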
Initially lo and hi are set to the leftmost and rightmost indices of A. While there is a sublist to explore, find the midpoint, mid, using integer division. If A[mid] is target, your search is over; otherwise you have learned whether to search the sublist to the left, A[lo .. mid-1], or to the right, A[mid+1 .. hi].
This algorithm determines whether a value exists in a sorted list of N elements. As the loop iterates, eventually either the target will be found or hi crosses over to become smaller than lo, which ends the loop.
The notation A[lo .. mid] means the sublist from lo up to and including mid.
Consider using Binary Array Search to find a target value of 53 in the list shown in Figure 2-7.
A[mid] = 19, which is smaller than the target, 53, so take the elif case and focus the search on the sublist of A from mid+1 up to and including hi. The grayed-out elements in Figure 2-8 are no longer in consideration. The size of the sublist being explored after this first iteration is reduced by half (from 7 elements to 3).
A[mid] = 53, which is the target value, so the function returns True.
Now use Binary Array Search to search for the target value of 17 in Figure 2-7.
A[mid] = 19, which is larger than the target, 17, so take the if case and focus the search on the sublist from lo up to and including mid-1. The grayed-out elements in Figure 2-9 are no longer in consideration.
The target, 17, is greater than A[mid] = 14, so take the elif case and try the sublist from mid+1 to hi.
Finally, A[mid] = 15, which is smaller than the target value of 17. Once again, take the elif case, which results in the “crossover” where the value of lo is greater than hi, as shown in Figure 2-11.
At this point, the condition of the while loop is false, and so False is returned to state that the target value is not contained within the list.
Before conducting Binary Array Search, is it worth checking that target ≥ A[0] and target ≤ A[-1]? Doing so would prevent a fruitless search for a target that couldn’t possibly be present in an ordered list. What is your opinion?
What if you want to know the exact location of target in A? Currently Binary Array Search returns True or False. Modify the code, as shown in Figure 2-12, to return the index position, mid, where target is found.
What should be returned when target is not in A? You could just return -1 (which is an invalid index location), but there is an opportunity to return more information. What if we could tell the caller “target is not in A, but if you wanted to insert target into A, it would go in this location?”
Look at Figure 2-7 again. When searching for a target value of 17 (which doesn’t exist in A), the final value of lo is actually where 17 would be inserted. You could return –lo as the result, and this would work for all index locations except for the first one, which is zero. Instead, return the opposite of (lo+1). The calling function that receives a negative value, x, can easily determine that target would be placed at location –(x+1). When a non-negative value is returned, that is the location of target in A.
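Putting the two changes together, here is a sketch of the modified search, assuming the same lo/hi loop structure as before; the function name and the example list are mine, not taken from the figures:

```python
def binary_array_search_location(A, target):
    """Return index of target in sorted list A, or -(lo+1) if it is absent.

    A negative return value x tells the caller that target belongs at
    index -(x+1) if it were inserted into A.
    """
    lo = 0
    hi = len(A) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if target < A[mid]:
            hi = mid - 1
        elif target > A[mid]:
            lo = mid + 1
        else:
            return mid                # non-negative: exact location of target
    return -(lo + 1)                  # negative: encodes the insertion point

# Illustrative list (not the one from Figure 2-7): searching for 17 returns -4,
# so the caller computes -(-4 + 1) = 3, the index where 17 would be inserted.
print(binary_array_search_location([2, 14, 15, 19, 53, 71, 88], 17))
```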
One final optimization remains. In the above code, you see that there are two comparisons between target and A[mid]. When both values are numeric, compute their difference instead to invoke this key operation only once.
```python
rc = target - A[mid]
if rc < 0:
    hi = mid - 1
elif rc > 0:
    lo = mid + 1
else:
    return mid
```
If target is smaller than A[mid], then rc < 0, which is logically equivalent to the earlier comparison. If rc is positive, then you know target was greater than A[mid]. Even when the values are not numeric, some languages offer a compareTo function that returns a negative number, zero, or a positive number based on the relative ordering of two values.
What if the values in a list appear in descending order? What would you have to change in the above code?
How efficient is BinaryArraySearch on a problem instance of size N? To answer this question I have to compute, in the worst case, how many times the while loop is forced to execute. The mathematical concept of logarithms⁴ will tell us the answer. Let’s show how a logarithm works by asking a simple question: how many times do you need to double the number 1 until the value equals 33,554,432? Well, you could start computing this manually: 1, 2, 4, 8, 16, and so on, but this is truly tedious. Mathematically, you are looking for a value, x, such that 2ˣ = 33,554,432. Note that 2ˣ involves exponentiation of a base (the value 2) and an exponent (the value x). A logarithm is the opposite of exponentiation, in the same way that division is the opposite of multiplication. To find an x such that 2ˣ = 33,554,432, compute log₂(33,554,432) using a base of 2, which equals 25.0. If you type 2²⁵ into a calculator, the result is 33,554,432. This computation also shows that you can repeatedly divide 33,554,432 by two 25 times until you get to 1. A logarithm can produce a floating-point result; for example, log₂(137) is about 7.098032. This makes sense, since 2⁷ = 128 and 137 would require a slightly higher exponent for base 2.
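You can check these values directly in Python; a quick sketch using the standard math module:

```python
>>> import math
>>> math.log2(33554432)
25.0
>>> 2 ** 25
33554432
>>> round(math.log2(137), 6)
7.098032
```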
The BinaryArraySearch algorithm will repeat the while loop as long as lo ≤ hi, or in other words, while there are elements to search. The first iteration starts with N elements to search, and in the second iteration this is reduced to no more than N/2 (if N is odd, it is (N-1)/2). So to determine the maximum number of successive iterations, you need to know how many times you can divide N by 2 until you reach 1. This quantity is exactly k = log₂(N), so the total number of times through the while loop is 1 + k: 1 for the first pass through the loop with N elements, plus k for the successive iterations. Because log can return a floating-point value, and we need an integer number of iterations, we use the floor(x) mathematical operation, which computes the largest integer that is smaller than or equal to x.
For BinaryArraySearch, the while loop iterates no more than floor(log₂(N)) + 1 times. This behavior is truly extraordinary! With one million elements in sorted order, you can locate any element in just 20 passes through the while loop.
To provide some quick evidence for this formula, count the number of iterations through the while loop for problem instances whose size, N, ranges from 8 to 15: you only need 4 in all cases. For example, starting with 15 elements in the first iteration, the second iteration explores a sublist with 7 elements, the third iteration explores a sublist with 3 elements, and the fourth and final iteration explores a sublist with just 1 element. If you started with 10 elements, the number of explored elements in each iteration would be 10 → 5 → 2 → 1, which also means four iterations in total.
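A short sketch can confirm this count against the formula; the helper below simply halves the worst-case sublist size until it is exhausted (the function name is mine):

```python
import math

def iterations(n):
    """Worst-case while-loop iterations of Binary Array Search on n elements."""
    count = 0
    while n >= 1:
        count += 1
        n = n // 2            # the larger remaining sublist has n // 2 elements
    return count

for n in range(8, 16):
    assert iterations(n) == math.floor(math.log2(n)) + 1   # always 4 for 8..15
```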
We can now tackle asymptotic analysis, an approach that lets us eliminate any knowledge about the actual computer when evaluating the performance of an algorithm.
The concept of an additive constant is common in many real-world scenarios; it’s what we mean when we say “I’ll be there in 40 minutes, give or take 5 minutes.” Asymptotic Analysis takes this idea further and introduces the notion of a multiplicative constant to analyze algorithms. If you’ve heard of Moore’s Law, you will find this constant familiar. Gordon Moore, the CEO and co-founder of Intel corporation, predicted in 1965 that the number of components per integrated circuit would double every year for a decade; in 1975 he revised this prediction to doubling every year and a half. This prediction was valid for over forty years and explains why the speed of computers essentially doubles every eighteen months. A multiplicative constant applied to computing means you can find an older computer where the same program runs one thousand times slower (or even worse) than on a modern computer.
Consider two algorithms that solve the same problem. Algorithm A requires 3*N operations on problems of size N, and B requires 2020*log(N) operations. You have two computers on which you execute implementations of these algorithms; computer C2 is two times faster than C1. In Figure 2-13, the table on the left reports (a) the # of operations for each algorithm on a problem instance of size N; and (b) the performance of A and B on C1 and the performance of A on C2.
You must choose whether A or B is more efficient; while initially A requires more operations than its counterpart B on problem instances of the same size, once N is 8,192 or larger, B requires far fewer operations. The graphs on the right of Figure 2-13 present this information visually, where you can clearly see the crossover point between 8,192 and 16,384 when B begins to outperform A in terms of the number of operations required. However, when you run the exact same implementation of A on two different computers, you can see that A2 (running on C2) outperforms A1 (running on C1).
If you found a computer C3 that was 500 times faster than C1, you could eventually find a problem instance size for which B1, running on C1, outperforms A3, running on the super-powerful C3 computer. In this specific case, the crossover occurs between 4,194,304 and 8,388,608. The lesson of this story is that you can throw advanced computing hardware at a problem, but the more efficient algorithm will eventually be faster once the problem size becomes large enough.
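A rough sketch to locate those crossover points numerically; the constants 3, 2020, and the 500× speedup come from the discussion above, while the helper function and the choice of log base 2 are my assumptions:

```python
import math

def crossover(cost_a, cost_b):
    """Return the first power-of-two N at which cost_b(N) <= cost_a(N)."""
    n = 2
    while cost_b(n) > cost_a(n):
        n *= 2
    return n

# B (2020*log N operations) vs. A (3*N operations): prints 16384,
# matching the crossover between 8,192 and 16,384.
print(crossover(lambda n: 3 * n, lambda n: 2020 * math.log2(n)))

# A running on a 500x-faster machine vs. B on the original machine:
# prints 8388608, matching the crossover between 4,194,304 and 8,388,608.
print(crossover(lambda n: 3 * n / 500, lambda n: 2020 * math.log2(n)))
```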
Define T(N) to be the time required for an algorithm to process a problem instance of size N. There can be different T(N) defined for the best case and worst case problem instances for the same algorithm. The time unit does not matter (whether milliseconds or seconds).
Define S(N) to be the storage required for an algorithm to process a problem instance of size N. There can be different S(N) defined for the best case and worst case problem instances for the same algorithm. The space unit does not matter (whether bytes or megabytes).
Computer scientists use a “Big O” notation to classify algorithms based on how runtime performance grows as the problem instance grows. The letter O is used because the growth rate of a function is called the “order of a function.” For example, the formula 4x² + 3x – 5 is an “order 2” function, also called quadratic, since the largest exponent for x is 2.
We start by trying to count the number of operations that will execute
based on the size of the problem instance N. Because each operation takes
a fixed amount of time to execute, this turns into an estimate for the
runtime performance.
The “Big O” notation has a precise, mathematical definition that can classify an algorithm’s runtime performance. Figure 2-14 shows the performance of Binary Array Search on sorted lists of random integers. The size of each list varies from 32 to 2,097,152. The second column, T(N), reports the time (in seconds) to perform 50,000 executions of Binary Array Search on a problem of size N. The third column is the value of the function 0.0125*log(N).
As you can see, once N ≥ 128, T(N) is always smaller than or equal to 0.0125*log(N). For small problem instances, the true performance of the algorithm has not yet stabilized, but for large enough N, the value in the third column is a good model of T(N) and is always greater.
The goal is to find a function f(N) (e.g., f(N) = log(N)) and a constant c > 0 (e.g., c = 0.0125) such that for some N₀ (e.g., N₀ = 128), T(N) ≤ c*f(N) for all N ≥ N₀. When this happens, you can state that T(N) ∈ O(f(N)), or more colloquially, that T(N) is O(f(N)). In this example, we have demonstrated that T(N) is O(log(N)).
Okay, this material can be a lot to take in. And there are plenty of online resources that provide more detailed mathematical explanation for this notation.
O(f(N)) is meant to provide an assurance that you can model the worst-case performance T(N) of an algorithm on a problem of size N. As you can see from the definition of O(f(N)), c*f(N) is always greater than T(N) once N is large enough.
Let’s return (briefly) to Figure 1-2 which showed the difficulty in counting the number of operations executed by an algorithm. Recall that the overall goal is to estimate the time for an algorithm to process a problem instance of size N.
First you identify and compute K(N), the count of how many times a key operation executes on a problem instance of size N. Next, you estimate that the number of machine instructions executed in total would be a multiple of this count, that is, c*K(N). This is a safe assumption because modern programming languages can be compiled into hundreds or thousands of machine instructions. You don’t even have to compute c; rather, you can determine it empirically, based on the observed performance on a particular computer. Figure 2-14 empirically computes the suitable constant c = 0.0125, given the timings measured on a specific computer. Running the same code on a different computer would result in a different constant.
The notation clearly classifies the trajectory of the performance (or storage) as a function of N. Each performance class is described by some O(f(N)). We have seen three performance classes so far:

- O(log N) is the logarithmic time complexity class, where f(N) = log(N)
- O(N) is the linear time complexity class, where f(N) = N
- O(N²) is the quadratic time complexity class, where f(N) = N²
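To get a feel for how differently these three classes grow, here is a tiny, purely illustrative sketch that prints f(N) for doubling values of N:

```python
import math

# Compare the growth of log N, N, and N^2 as N grows by powers of two.
print(f"{'N':>10} {'log N':>8} {'N^2':>16}")
for exponent in range(5, 25, 5):          # N = 32, 1024, 32768, 1048576
    n = 2 ** exponent
    print(f"{n:>10} {math.log2(n):>8.0f} {n * n:>16}")
```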
We now have the ability to inspect the implementation of an algorithm - or
even just its description - and identify its performance class. In the
following code example, how many times does the key operation ct = ct + 1
execute?
```python
for i in range(100):
    for j in range(N):
        ct = ct + 1
```
The outer i loop executes 100 times, and for each of these loops, the inner j loop executes N times. In total, ct = ct + 1 executes 100*N times. The total time T(N) to execute the above code (that is, a problem instance of size N) is smaller than c*N for some suitably chosen c. If you execute this code on an actual computer, you will be able to determine the exact c. More precisely, using the “Big O” notation, we can state that the performance of this code is O(N). Run this code thousands of times on different computing systems, and each time you would be able to compute a different c; this fact remains true, and is the reason we can classify the code performance as O(N).
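If you want to estimate c empirically, the following is a quick sketch using time.perf_counter; the specific sizes are arbitrary:

```python
import time

def fragment(N):
    ct = 0
    for i in range(100):
        for j in range(N):
            ct = ct + 1
    return ct

for N in (1_000, 2_000, 4_000, 8_000):
    start = time.perf_counter()
    fragment(N)
    elapsed = time.perf_counter() - start
    print(N, elapsed, elapsed / N)   # elapsed / N approximates the constant c
```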
You can perform a similar analysis to determine the space complexity required for an algorithm on a problem instance of size N. The first fragment below, F1, uses a fixed amount of space because in Python 3 range produces its values one at a time on demand, regardless of the value of N. The second fragment, F2, constructs a list storing N integers. The size of the required memory grows in direct proportion to N.
```python
# Fragment F1
for i in range(N):
    ...

# Fragment F2
for i in list(range(N)):
    ...
```
Quantifying the space for F1 and F2 is hard because there is no universally agreed-upon unit for space. Should we count the bytes of memory used? Or bits? Does it matter if an integer requires 32 bits of storage or 64 bits of storage? Imagine a future computer that allowed for 128-bit representations of integers. Has the space complexity changed?
The goal is to find a function f(N) (e.g., f(N) = log(N)) and a constant c > 0 (e.g., c = 11) such that for some N₀ (e.g., N₀ = 100), S(N) ≤ c*f(N) for all N ≥ N₀. When this happens, you can state that S(N) ∈ O(f(N)), or more colloquially, that S(N) is O(f(N)).
Here S(N) is a function that measures the total space required by an algorithm. In Python, sys.getsizeof(...) determines the size in bytes of an object. Because Python 3’s range() produces its values lazily rather than storing them all, it significantly reduces the storage needs for Python programs. If you type the following statements into a Python interpreter, you will see the corresponding storage requirements:
```python
>>> import sys
>>> sys.getsizeof(range(100))
48
>>> sys.getsizeof(range(10000))
48
>>> sys.getsizeof(list(range(100)))
1008
>>> sys.getsizeof(list(range(1000)))
9112
>>> sys.getsizeof(list(range(10000)))
90112
>>> sys.getsizeof(list(range(100000)))
900112
```
This report shows that the byte storage for list(range(10000)) is about one hundred times larger than for list(range(100)). And when you review the other numbers, you can classify the storage requirements for F2 as O(N): choose c = 11 and you can see that each result is ≤ 11*N.
In contrast, the number of bytes required for range(100) and range(10000) is identical (48 bytes). Since the storage for F1 is constant, we need to introduce another complexity class, known as the constant time complexity class.
O(1) is the constant time complexity class, where f(N) = 1.

In this case, set c = 50 and you can state that the space complexity for F1 is O(1) because 48 ≤ 50.
We have covered a lot of ground in these first two chapters. I presented a number of examples so I could describe the way algorithms behave independent of the problem being solved. Fifty years ago, while researchers were discovering new algorithms, advances in computing technology were simultaneously increasing the performance of the computers executing those algorithms. Asymptotic Analysis provides the foundation for independently assessing the performance of algorithms in a way that eliminates any dependence on the computing platform. We have defined four different time (or storage) complexity classes to understand the behavior of an algorithm in terms of the size of the input problem instance. These complexity classes will appear throughout the book as a form of shorthand notation to quickly summarize these insights.
Assess the time complexity of the following code fragments
```python
# Fragment-1
for i in range(100):
    for j in range(N):
        for k in range(10000):
            ...

# Fragment-2
for i in range(N):
    for j in range(N):
        for k in range(100):
            ...

# Fragment-3
for i in range(0, N, 2):
    for j in range(0, N, 2):
        ...

# Fragment-4
ct = N
while ct >= 1:
    ...
    ct = ct // 2

# Fragment-5
for i in range(2, N, 3):
    for j in range(3, N, 2):
        ...
```