Algorithms are designed to solve common problems that arise frequently in software applications. You can improve the efficiency of your programs by learning about a wide range of algorithms. This chapter introduces the terminology and notation used by theoreticians and practitioners alike in documenting the performance of algorithms in terms of computational performance and resource usage.
When multiple algorithms solve the exact same problem, it is sometimes possible to identify which one will be the most efficient simply by classifying its performance. First, I want to start with a few examples showing how the behavior of an algorithm is revealed when solving problem instances of increasing size.
A library has one thousand books on five double-entry shelving stacks, where books are stored on shelves on either side. The books are arranged alphabetically by author’s last name and then by first name. If an author has multiple books, they are arranged alphabetically by title. Libraries routinely order books in this way to make it easy to locate a book when you know the author’s name.
Come up with a strategy for completing each of the following tasks and estimate (a) how many words you have to read; and (b) how many steps you have to take. These are the “key operations” for any algorithm in the library. You are alone in the library and you have no smart phone or even a pen to write with. I also want you to estimate the time you need to perform each task.
T1. Check whether the library has a book written by Robert Heinlein.
T2. Find the book named “Dinosaur in a Haystack.”
T3. Find two books in the library whose first sentence is identical.
Don’t peek at my results! Take a few minutes to come up with your own estimates.
|T1 (by author)||T2 (by title)||T3 (matching first sentences)|
Walk along edge of the stacks until you find the one that contains
authors starting with “H”. Then walk along one side to find “He” and then
slow down to inspect names closer to find “Heinlein.” This task may take
longer if “H” authors wrap around a stack.
Start at the first stack and read the binder of each book, spending no
more than one second on each book as you walk past all shelves on all
stacks. Since book author is Stephen J. Gould, the book would be found on a
shelf on the fourth stack.
Open first book on first shelf of first stack to its first page. For
each of remaining
What if there were
2,000 books on ten book stacks, or twice as many? How
would your estimates change? T1 should still be completed in less than a
minute while T2 doubles in time to
30 minutes. T3 must check
1,999,000 possible book pairs, now requiring
to complete, a four-fold increase. It seems tasks can be affected
differently when the problem instance doubles. Pay attention to how the
information is organized; tasks like T1 are easier to perform while other
tasks, such as T3, are simply harder.
If T3 were more common, then a librarian might consider organizing the books in the stacks alphabetically by first sentence. Storing books in this way would allow all three tasks to complete in about 15 minutes. T1 and T2 might force you to read the binder of every book, but now you can complete T3 by making a single pass through the stacks, checking neighboring books to find a pair whose first sentence is identical.
When you understand which operations are to be performed, you can structure your data to increase the efficiency of these operations.
Often an algorithm can speed up its performance by increasing the extra information it stores. This “space vs. time” tradeoff is common when designing an algorithm.
If you had extra storage where you could write down some additional
information, then you could solve T3 more efficiently even when the books
are stored alphabetically by author’s last name. For example, you could
create a spreadsheet where each row stores a book’s title and first
sentence. Let’s assume it takes
10 seconds to enter each book’s
information into the spreadsheet. Once done, sort the spreadsheet by
sentence and then check whether two neighboring sentences are identical.
This reduces the number of words to read, since you only need to read each
sentence exactly once plus you only need to walk through the stacks once,
eliminating countless steps. For
1,000 books, T3 now should take less
3 hours – a significant improvement.
The problem instance size, N, for each of these tasks is the number of
total books in the library. If you can fit
100 books on the shelves on
each side of a stack, then the library needs to build N/
200 stacks to
hold the books. Based solely on N, you can come up with reasonable
estimates for each of these tasks.
We’re now ready to tackle the concept of Asymptotic Analysis which is central to understand how to analyze algorithms. This technique allows us to estimate (in terms of N) the time required by an algorithm to process a problem instance of size N.
Let’s begin this discussion with a real-world problem, depicted in Figure 2-2, that can be solved in different ways, to discuss the concept of performance classes. If you want to travel from the Prudential Center in Boston, Massachusetts to Rockefeller Center in New York City, how would you do it?
Certainly flying a plane involves the quickest speed, but this doesn’t take into account (a) the pre-flight time to take an Uber to Boston Logan airport; (b) any unexpected delays at the airport because of weather, security, or flight traffic; (c) the post-flight time to take an Uber from LaGuardia Airport to Rockefeller Center. Would it be faster to just drive a car from Boston to Manhattan, which otherwise has no additional pre- or post- travel times? Should you just grab your bicycle and see how far you get? What if you jumped on an Amtrak train at Back Bay station in Boston and walked to Rockefeller Center from Grand Central Station in New York City? Clearly there are a number of options you could take.
If the destination changed to the Golden Gate Bridge in San Francisco, California, then all reasonable travel strategies would involve some air plane travel, because the distance is simply too great. If you had the option to walk, bike, drive and/or fly as part of this trip, how would you choose to travel?
I am going to group each potential means of transportation into different
speed performance classes. For example, the foot speed class represents
walking or running while the bicycle speed class represents riding a
bicycle. Some people are faster runners than others1 but no matter how fast a runner is, in head-to-head
competition with a bicyclist, the runner will always eventually fall behind
as shown in Figure 2-3. For short distances (say
meters), the runner will win, but for longer distances, the bicyclist will
always win. In other words, even the slowest bicyclist will eventually
outperform even the fastest runner.
Let’s look at different transportation types in Figure 2-4.
The y-axis scale is logarithmic, which means that each horizontal line is
ten times faster than the horizontal line below it. For each speed class,
the world speed record is plotted; for example the record speed for a
763 mph, faster than any
train. Each new type of transportation represents a speed class,
introducing different opportunities for speed records.
Once you choose your mode of transportation, that determines the speed you can travel and ultimately the distance you can achieve in “reasonable time.” Although people have walked across the United States3, one would naturally prefer flying in a plane! To make another point, even though the rocket achieves the fastest speed, you can’t use it for travel within the United States; it just isn’t practical except for very long distances (and besides, where would you park?).
To bring this discussion back to algorithms, the time complexity class is an example of a performance class used to analyze algorithms. We have seen two performance classes so far. When an algorithm belongs to the linear time complexity class, it means that as the problem size, N, doubles, the algorithm should require twice as much time. When an algorithm belongs to the quadratic time complexity class, it means that as N doubles, the performance quadruples in time. Algorithm analysis is concerned with classifying algorithms into the appropriate performance class so you can accurately determine how it will behave in the long run.
Up next is an algorithm that belongs to a third logarithmic time complexity class, which means that as N doubles, the performance increases by a fixed, constant amount. These algorithms are extremely efficient. Let’s see how this works.
I have written a sequence of seven different numbers in increasing order
from left to right, and hidden each one behind a door. Try this challenge:
what is the fewest number of doors you need to open – one at a time – to
either find a target value of
643 or prove it is not hidden behind one of
these doors? You could start from the left and open each door – one at a
time – until you find
643 or a larger number (which means it wasn’t
behind a door in the first place). But with bad luck, you might have to
open all seven doors.
Instead, you can solve the challenge by opening no more than three doors. Start with door 4 in the middle and open it. Rats!
The number it was hiding is
173, but this means you can ignore all of the
doors to the left of door 4 (since the numbers behind those doors will
all be smaller than
173). Now open door 6 to reveal the number
Ok, so now you know that you can ignore the doors to the right of door
6. Only door 5 can be hiding
643, so open it now to determine whether
the original series contained
643. I leave it to your imagination whether
the number was behind that door.
If you repeat this process on any ascending list of seven numbers using any
target value, you will never need to open more than three doors. Did you
7? What if you had
covering an ascending list of numbers? Would you accept a
challenge to find out whether a specific number were hidden behind some
door if you were only able to open twenty doors? You should! Since
1,048,575, you can always locate a number in an
ascending list of
1,048,575 numbers after opening 20 or fewer doors. Even
better, if there were suddenly twice as many doors,
2,097,152 in fact,
you would never need to open more than 21 doors to find a number; that’s
just one additional door to open. That seems astonishingly efficient! You
have just discovered Binary Array Search.
Binary Array Search is a fundamental algorithm in computer science because
of its time complexity. Let’s first review a Python implementation below
that searches for
target in an ordered list,
hi are set to the leftmost and rightmost indices of
A. While there is a sublist to explore, find the midpoint,
integer division. If
target your search is over; otherwise
you have learned whether to search the sublist to the left,
.. mid-1], or to the right,
A[mid+1 .. hi].
This algorithm determines whether a value exists in a sorted list of N
elements. As the loop iterates, eventually either the
target will be
hi crosses over to become smaller than
lo, which ends the
A[lo .. mid] means the sublist from
lo up to and
Consider using Binary Array Search to find a target value of
the list shown in Figure 2-7.
19 which is smaller than the target,
53, so take the elif
case and focus the search on the sublist of
mid+1 up to and
hi. The grayed-out elements in Figure 2-8 are no
longer in consideration. The size of the sublist being explored after this
first iteration is reduced by half (from
7 elements to
53 which is the target value, so the function returns
Now use Binary Array Search to search for the target value of
19 which is larger than the target,
17, so take the if
case and focus the search on the sublist from
lo up to and including
mid-1. The grayed-out elements in Figure 2-9 are no longer in
17, is greater than
14, so take the elif case
and try sublist from
15 which is smaller than target value of
again, take the if case, which results in the “crossover” where the value
lo is greater than
hi, as shown in Figure 2-11.
At this point, the condition of the while loop is false, and so
is returned to state that the target value is not contained within the
Before conducting Binary Array Search, is it worth checking that
A[-1]? Doing so would prevent a
fruitless search for a target that couldn’t possibly be present in an
ordered list. What is your opinion?
What if you want to know the exact location of
target in an
Currently Binary Array Search returns
False. Modify the code,
as shown in Figure 2-12, to return the index position,
target is found.
What should be returned when
target is not in
A? You could just return
-1 (which is an invalid index location) but there is an opportunity to
return more information. What if we could tell the caller “
target is not
A but if you wanted to insert
A, it would go in this
Look at Figure 2-7 again. When searching for a target value of
17 (which doesn’t exist in
A), the final value of
lo is actually
17 would be inserted. You could return
–lo as the result, and
this would work for all index locations except for the first one which is
zero. Instead return the opposite of
(lo+1). The calling function that
receives a negative value,
x, can easily determine that
target would be
placed at location
–(x+1). When a non-negative value is returned, that is
the location of
One final optimization remains. In the above code, you see that there are
two comparisons between
A[mid]. When both values are
numeric, compute their difference instead to only invoke this key operation
target is smaller than
rc < 0, which is equivalent
logically to above. If
rc is positive, then you know
target was greater
A[mid]. Even when the values are not numeric, some languages offer a
compareTo function that returns a negative number, zero, or a positive
number based on the relative ordering of two values.
What if the values in a list appear in descending order? What would you have to change in the above code?
How efficient is
BinaryArraySearch on a problem instance of size N? To
answer this question I have to compute, in the worst case, how many times
the while loop is forced to execute. The mathematical concept of
logarithms 4 will tell us the answer. Let’s show how
logarithm works by asking a simple question: how many times do you need to
double the number
1 until the value equals
33,554,432? Well, you could
start computing this manually:
16 and so on, but this
is truly tedious. Mathematically, you are looking for a value,
33554432. Note that
x involves exponentiation of
a base (the value
2) and an exponent (the value
x). A logarithm is the
opposite of exponentiation, in the same way that division is the opposite
of multiplication. To find an
x such that
33554432) using a base of
2 which equals
you type the equation
25 into a calculator, the result is
33554432. This computation also shows that you can repeatedly divide
33554432 by two
25 times until you get to
1. Logarithm computes a
floating point result, for example,
137) is about
this makes sense since
137 would require a slightly
higher exponent for base
BinaryArraySearch algorithm will repeat the while loop as long as
hi, or in other words, while there are elements to
search. The first iteration starts with
N elements to search, and in the
second iteration this is reduced to no more than
N is odd it is
(N-1)/2). So to determine the maximum number of successive iterations,
you need to know how many times you can divide
2 until you reach
1. This quantity is exactly
k = log
N), so the total number of
times through the while loop is
1 + k, counting
1 for the first time
N elements plus
k for the successive iterations). Because
return a floating point value, and we need an integer number for the number
of iterations, we use the
floor(x) mathematical operation, which computes
the largest integer that is smaller than
BinaryArraySearch, the while loop iterates no more than
2 (N)) +
1 times. This behavior is truly extraordinary!
With one million elements in sorted order, you can locate any element in
21 passes through the while loop.
To provide some quick evidence for this formula, count the number of
iterations through the while loop for problem instances whose size, N,
15: you only need
4 in all cases. For example,
15 elements in the first iteration, the second explores a
7, the third iteration explores a sublist with
and the fourth and final iteration explores a sublist with just
element. If you started with
10 elements, the number of explored elements
in each iteration would be
also means four iterations in total.
We can now tackle asymptotic analysis, an approach that lets us eliminate any knowledge about the actual computer when evaluating the performance of an algorithm.
The concept of an additive constant is common in many real-world
scenarios; it’s what we mean when we say “I’ll be there in
give or take
5 minutes.” Asymptotic Analysis takes this idea further
and introduces the notion of a multiplicative constant to analyze
algorithms. If you’ve heard of Moore’s Law, you will find this constant
familiar. Gordon Moore, the CEO and co-founder of Intel corporation,
predicted in 1965 that the number of components per integrated circuit
would double every year for a decade; in 1975 he revised this prediction to
doubling every year and a half. This prediction was valid for over forty
years and explains why the speed of computers essentially doubles every
eighteen months. A multiplicative constant applied to computing means you
can find an older computer where the same program runs one thousand-times
slower (or even worse) than on a modern computer.
Consider two algorithms that solve the same problem. Algorithm
3*N operations on problems of size N, and B requires
2020*log(N) operations. You have two computers on which you execute
implementations of these algorithms; computer C1 is two
times faster than C2. In Figure 2-13, the table on the
left reports (a) the # of operations for each algorithm on a problem
instance of size N; and (b) the performance of A and B on C1 and the
performance of A on C2.
You must choose whether A or B is more efficient; while initially A
requires more operations than its counterpart B on problem instances of
the same size, once N is
8,192 or larger, B requires far fewer
operations. The graphs on the right of Figure 2-13
present this information visually, where you can clearly see the
crossover point between
16,384 when B begins to
outperform A in terms of the number of operations required. However, when
you run the exact same implementation of A on two different computers,
you can see that A2 (running on C2) outperforms A1 (running on C1).
If you found a computer C3 that was 500 times faster than C1, you
could eventually find a problem instance size for which B1, running on
C1, outperforms A3, running on the super-powerful C3 computer. In
this specific case, the crossover occurs between
8,388,608. The lesson of this story is that you can throw advanced
computing hardware at a problem, but eventually the more efficient
algorithm will eventually be faster once the problem size becomes large
T(N) to be the time required for an algorithm to process a problem
instance of size N. There can be different
T(N) defined for best case
and worst case problem instances for the same algorithm. The time unit
does not matter (whether milliseconds, or seconds).
S(N) to be the storage required for an algorithm to process a
problem instance of size N. There can be different
S(N) defined for best
case and worst case problem instances for the same algorithm. The space
unit does not matter (whether bytes or megabytes).
Computer Scientists use a “Big O” notation to classify algorithms based
on how run time performance grows as the problem instance grows. The letter
O is used because the growth rate of a function is called the “order of a
function”. For example, the formula
5 is an “order 2”
function, also called quadratic, since the largest exponent for
We start by trying to count the number of operations that will execute
based on the size of the problem instance N. Because each operation takes
a fixed amount of time to execute, this turns into an estimate for the
The “Big O” notation has a precise, mathematical definition that can
classify an algorithm’s runtime
performance. Figure 2-14 shows the
performance on random lists of sorted integers. The size of each list varies
2,097,152. The second column,
T(N) reports the time
(in seconds) to perform
50,000 executions of Binary Array Search on a
problem of size N. The third column is the value of the function
As you can see, once N ≥
T(N) is always smaller than or
0.0125*log(N). For small problem instances, the true performance
of the algorithm has not yet stabilized. But for large enough N, the value
in the third column is a good model of
T(N) and is always greater.
The goal is to find a function
log(N) ) and
c > 0 (e.g.,
0.0125) such that for some N0 (e.g.,
f(N) for all N ≥
N0. When this happens, you can state that
f(N)) or more colloquially
T(N) is O(
In this example, we have demonstrated that
T(N) is O(
Okay, this material can be a lot to take in. And there are plenty of online resources that provide more detailed mathematical explanation for this notation.
f(N)) is meant to provide an assurance that you can model the worst
T(N) of an algorithm on a problem of size N. As you
can see from the definition of O(
f(N) is always
T(N) once N is large enough.
Let’s return (briefly) to Figure 1-2 which showed the difficulty in counting the number of operations executed by an algorithm. Recall that the overall goal is to estimate the time for an algorithm to process a problem instance of size N.
First you identify and compute
K(N), the count of how many times a key
operation executes on a problem instance of size N. Next, you estimate
that the number of machine instructions executed in total would be a
multiple of this count, that is
c * K(N). This is a safe assumption
because modern programming languages can be compiled into hundreds or
thousands of machine instructions. You don’t even have to compute
rather you can empirically determine it (based on the individual
performance on a computer). Figure 2-14
empirically computes the suitable constant
c = 0.0125, given the
empirical for a specific computer. Running the same code on a different
computer would result in a different constant.
The notation clearly classifies the trajectory of the performance (or
storage) as a function of N. Each performance class is described by some
f(N)). We have seen three performance classes so far:
log N) is the logarithmic time complexity class where
O(N) is the linear time complexity class where
f(N) = N
2) is the quadratic time complexity class where
f(N) = N
We now have the ability to inspect the implementation of an algorithm - or
even just its description - and identify its performance class. In the
following code example, how many times does the key operation
ct = ct + 1
i loop executes
100 times, and for each of these loops, the
j loop executes N times. In total,
ct = ct + 1 executes
times. The total time
T(N) to execute the above code (that is, a problem
instance of size N) is smaller than
c * N for some suitably chosen
If you execute this code on an actual computer, you will be able to
determine the exact
c. More precisely using the “Big O” notation, we
can state that the performance of this code is
Run this code thousands of times on different computing systems, and each
time you would be able to compute a different
c; this fact remains true,
and is the reason we can classify the code performance as
You can perform a similar analysis to determine the space complexity
required for an algorithm on a problem instance of size N. The first
F1, uses a fixed amount of space because in Python 3
range is a generator that produces the numbers one at a time, regardless
of the value of N. The second fragment,
F2, constructs a list storing N
integers. The size of required memory grows larger in direct proportion to
# Fragment F1
# Fragment F2
Quantifying the space for
F2 is hard because there is no
universally agreed-upon unit for space. Should we count the bytes of
memory used? or bits? Does it matter if an integer requires 32-bits of
storage of 64-bits of storage? Imagine a future computer that allowed for
128-bit representations of integers. Has the space complexity changed?
The goal is to find a function
log(N) ) and a
c > 0 (e.g.,
11) such that for some N0 (e.g., N0 =
f(N) for all N ≥ N0. When this
happens, you can state that
S(N) ∈ O(
f(N)) or more colloquially
S(N) is O(
S(N) is a function to measure total space required by an algorithm.
sys.getsizeof(...) determines the size in bytes for an
object. Python version 3 uses generators for
range() which significantly
reduces the storage needs for Python programs. If you type the following
statements into a Python interpreter, you will see the corresponding
This report shows that the byte storage for
list(range(10000)) is about
one hundred times larger than for
list(range(100)). And when you review
the other numbers, you can classify the storage requirements for
c = 11 and you can see that each result is ≤
In contrast, the number of bytes required for
range(10000) is identical (48 bytes). Since the storage for
constant, we need to introduce another complexity class, known as constant
time complexity class.
1) is the constant time complexity class where
f(N) = 1.
In this case, set
c = 50 and you can state the space complexity for
1) because 48 ≤ 50.
We have covered a lot of ground in these first two chapters. I presented a number of examples so I could describe the way algorithms behave independent of the problem being solved. Fifty years ago, while researchers were discovering new algorithms, advances in computing technology was increasing the very performance of the computers executing these algorithms. Asymptotic Analysis provides the foundation for independently assessing the performance of algorithms in a way that eliminates any dependence on the computing platform. We have defined four different time (or storage) complexity classes to understand the behavior of an algorithm in terms of the size of the input problem instance. These complexity classes will appear throughout the book as a form of shorthand notation to quickly summarize these insights.
Assess the time complexity of the following code fragments
+ 2. TBA
+ 3. TBA