Column 8: Algorithm Design Techniques

Column 2 describes the everyday effect of algorithm design on programmers: algorithmic insights can make a program simpler. In this column we’ll see a less frequent but more dramatic contribution of the field: sophisticated algorithms sometimes give extreme performance improvements.

This column studies four different algorithms for one small problem, with an emphasis on the techniques used to design them. Some of the algorithms are a little complicated, but with justification. While the first program we’ll study takes fifteen days to solve a problem of size 100,000, the final program solves the same problem in five milliseconds.

8.1 The Problem and a Simple Algorithm

The problem arose in one-dimensional pattern recognition; we’ll see its history later. The input is a vector x of n floating-point numbers; the output is the maximum sum found in any contiguous subvector of the input. For instance, if the input vector contains these ten elements

index:   0    1    2    3    4    5    6    7    8    9
x:      31  -41   59   26  -53   58   97  -93  -23   84

then the program returns the sum of x[2..6], or 187. The problem is easy when all the numbers are positive; the maximum subvector is the entire input vector. The rub comes when some of the numbers are negative: should we include a negative number in hopes that positive numbers on either side will compensate for it? To complete the problem definition, we’ll say that when all inputs are negative the maximum-sum subvector is the empty vector, which has sum zero.

The obvious program for this task iterates over all pairs of integers i and j satisfying 0≤i≤j<n; for each pair it computes the sum of x[i..j] and checks whether that sum is greater than the maximum sum so far. The pseudocode for Algorithm 1 is

maxsofar = 0
for i = [0, n)
    for j = [i, n)
        sum = 0
        for k = [i, j]
            sum += x[k]
        /* sum is sum of x[i..j] */
        maxsofar = max(maxsofar, sum)

This code is short, straightforward and easy to understand. Unfortunately, it is also slow. On my computer, for instance, the program takes about 22 minutes if n is 10,000 and fifteen days if n is 100,000; we’ll see the timing details in Section 8.5.
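For concreteness, the pseudocode might be filled out as the following C function. This is only a sketch under my own conventions: the name maxsum1, the parameters x and n, and the use of fmaxf from the standard math library are choices of this illustration, not part of the original.

#include <math.h>   /* fmaxf */

/* Algorithm 1 as a C sketch: try every pair (i, j) and sum x[i..j] afresh. */
float maxsum1(float x[], int n)
{   float maxsofar = 0;
    for (int i = 0; i < n; i++)
        for (int j = i; j < n; j++) {
            float sum = 0;
            for (int k = i; k <= j; k++)
                sum += x[k];                 /* sum is sum of x[i..j] */
            maxsofar = fmaxf(maxsofar, sum);
        }
    return maxsofar;
}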

Those times are anecdotal; we get a different kind of feeling for the algorithm’s efficiency using the big-oh notation described in Section 6.1. The outermost loop is executed exactly n times, and the middle loop is executed at most n times in each execution of the outer loop. Multiplying those two factors of n shows that the code in the middle loop is executed O(n²) times. The innermost loop within the middle loop is never executed more than n times, so its cost is O(n). Multiplying the cost per inner loop times its number of executions shows that the cost of the entire program is proportional to n cubed. We’ll therefore refer to this as a cubic algorithm.

This example illustrates the technique of big-oh analysis and many of its strengths and weaknesses. Its primary weakness is that we still don’t really know the amount of time the program will take for any particular input; we just know that the number of steps is O(n³). That weakness is often compensated for by two strong points of the method. Big-oh analyses are usually easy to perform (as above), and the asymptotic run time is often enough for a back-of-the-envelope calculation to decide whether a program is fast enough for a given application.

The next several sections use asymptotic run time as the only measure of program efficiency. If that makes you uncomfortable, peek ahead to Section 8.5, which shows that such analyses are extremely informative for this problem. Before you read further, though, take a minute to try to find a faster algorithm.

8.2 Two Quadratic Algorithms

Most programmers have the same response to Algorithm 1: “There’s an obvious way to make it a lot faster.” There are two obvious ways, however, and if one is obvious to a given programmer then the other often isn’t. Both algorithms are quadratic — they take O(n²) steps on an input of size n — and both achieve their run time by computing the sum of x[i..j] in a constant number of steps rather than in the j – i + 1 additions of Algorithm 1. But the two quadratic algorithms use very different methods to compute the sum in constant time.

The first quadratic algorithm computes the sum quickly by noticing that the sum of x[i..j] is intimately related to the sum previously computed (that of x[i..j – 1]). Exploiting that relationship leads to Algorithm 2.

maxsofar = 0
for i = [0, n)
    sum = 0
    for j = [i, n)
        sum += x[j]
        /* sum is sum of x[i..j] */
        maxsofar = max(maxsofar, sum)

The statements inside the first loop are executed n times, and those inside the second loop are executed at most n times on each execution of the outer loop, so the total run time is O(n²).
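Rendered in the same sketchy C style as before (maxsum2 and the parameter conventions are mine), Algorithm 2 carries the running sum from one value of j to the next:

#include <math.h>   /* fmaxf */

/* Algorithm 2 as a C sketch: the sum of x[i..j] is updated in constant time. */
float maxsum2(float x[], int n)
{   float maxsofar = 0;
    for (int i = 0; i < n; i++) {
        float sum = 0;
        for (int j = i; j < n; j++) {
            sum += x[j];                     /* sum is sum of x[i..j] */
            maxsofar = fmaxf(maxsofar, sum);
        }
    }
    return maxsofar;
}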

An alternative quadratic algorithm computes the sum in the inner loop by accessing a data structure built before the outer loop is ever executed. The ith element of cumarr contains the cumulative sum of the values in x[0..i], so the sum of the values in x[i..j] can be found by computing cumarr[j] – cumarr[i – 1]. This results in the following code for Algorithm 2b.

cumarr[-1] = 0
for i = [0, n)
    cumarr[i] = cumarr[i-1] + x[i]
maxsofar = 0
for i = [0, n)
    for j = [i, n)
        sum = cumarr[j] - cumarr[i-1]
        /* sum is sum of x[i..j] */
        maxsofar = max(maxsofar, sum)

(Problem 5 addresses how we might access cumarr[–1].) This code takes O(n²) time; the analysis is exactly the same as that of Algorithm 2.
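Here is one possible C rendering of Algorithm 2b. To make the reference to cumarr[-1] legal, this sketch allocates n+1 cells and points cumarr one element past the start of the block; that is just one way of handling the issue that Problem 5 raises, and error checking is omitted for brevity.

#include <math.h>    /* fmaxf */
#include <stdlib.h>  /* malloc, free */

/* Algorithm 2b as a C sketch: precompute cumulative sums, then use them. */
float maxsum2b(float x[], int n)
{   float maxsofar = 0;
    float *realarr = malloc((n + 1) * sizeof(float));
    float *cumarr = realarr + 1;             /* so cumarr[-1] is realarr[0] */
    cumarr[-1] = 0;
    for (int i = 0; i < n; i++)
        cumarr[i] = cumarr[i-1] + x[i];
    for (int i = 0; i < n; i++)
        for (int j = i; j < n; j++) {
            float sum = cumarr[j] - cumarr[i-1];   /* sum of x[i..j] */
            maxsofar = fmaxf(maxsofar, sum);
        }
    free(realarr);
    return maxsofar;
}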

The algorithms we’ve seen so far inspect all possible pairs of starting and ending positions of subvectors and evaluate the sum of each such subvector. Because there are O(n²) subvectors, any algorithm that inspects them all must take at least quadratic time. Can you think of a way to sidestep this problem and achieve an algorithm that runs in less time?

8.3 A Divide-and-Conquer Algorithm

Our first subquadratic algorithm is complicated; if you get bogged down in its details, you won’t lose much by skipping to the next section. It is based on the following divide-and-conquer recipe:

To solve a problem of size n, recursively solve two subproblems of size approximately n/2, and combine their solutions to yield a solution to the complete problem.

In this case the original problem deals with a vector of size n, so the most natural way to divide it into subproblems is to create two subvectors of approximately equal size, which we’ll call a and b.

[Figure: the input vector divided into two halves, a and b.]

We then recursively find the maximum-sum subvectors in a and b, which we’ll call ma and mb.

[Figure: the maximum-sum subvectors ma (within a) and mb (within b).]

It is tempting to think that we have now solved the problem because the maximum-sum subvector of the entire vector must be either ma or mb. That is almost right. In fact, the maximum is either entirely in a, entirely in b, or it crosses the border between a and b; we’ll call that mc for the maximum crossing the border.

[Figure: a subvector mc that crosses the boundary between a and b.]

Thus our divide-and-conquer algorithm will compute ma and mb recursively, compute mc by some other means, and then return the maximum of the three.

That description is almost enough to write code. All we have left to describe is how we’ll handle small vectors and how we’ll compute mc. The former is easy: the maximum of a one-element vector is the only value in the vector (or zero if that number is negative), and the maximum of a zero-element vector was defined to be zero. To compute mc we observe that its left side is the largest subvector starting at the boundary and reaching into a, and similarly for its right side in b. Putting these facts together leads to the following code for Algorithm 3:

float maxsum3(l, u)
    if (l > u)  /* zero elements */
        return 0
    if (l == u)  /* one element */
        return max(0, x[l])
    m = (l + u) / 2
    /* find max crossing to left */
    lmax = sum = 0
    for (i = m; i >= l; i--)
        sum += x[i]
        lmax = max(lmax, sum)
    /* find max crossing to right */
    rmax = sum = 0
    for i = (m, u]
        sum += x[i]
        rmax = max(rmax, sum)

    return max(lmax+rmax, maxsum3(l, m), maxsum3(m+1, u))

Algorithm 3 is initially invoked by the call

answer = maxsum3(0, n-1)

The code is subtle and easy to get wrong, but it solves the problem in O(n log n) time. We can prove that fact in several ways. An informal argument observes that the algorithm does O(n) work on each of O(log n) levels of recursion. The argument can be made more precise by the use of recurrence relations. If T(n) denotes the time to solve a problem of size n, then T(1) = O(1) and

T(n) = 2T(n/2) + O(n).
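For intuition, we can unroll that recurrence, writing cn for the O(n) term and assuming n is a power of two: T(n) = 2T(n/2) + cn = 4T(n/4) + 2cn = 8T(n/8) + 3cn = ...; each level of recursion contributes about cn, and there are about log n levels (logarithms base 2) before the subproblems reach size one.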

Problem 15 shows that this recurrence has the solution T(n) = O(n log n).
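For readers who want to run the divide-and-conquer code, here is one way to flesh the pseudocode out in C. Passing x as a parameter rather than relying on a global array, and using fmaxf, are choices of this sketch, not of the original.

#include <math.h>   /* fmaxf */

/* Algorithm 3 as a C sketch: divide, conquer, and combine across the middle. */
float maxsum3(float x[], int l, int u)
{   if (l > u)                               /* zero elements */
        return 0;
    if (l == u)                              /* one element */
        return fmaxf(0, x[l]);
    int m = (l + u) / 2;
    float lmax = 0, sum = 0;                 /* max subvector ending at m, reaching left */
    for (int i = m; i >= l; i--) {
        sum += x[i];
        lmax = fmaxf(lmax, sum);
    }
    float rmax = 0;                          /* max subvector starting at m+1, reaching right */
    sum = 0;
    for (int i = m+1; i <= u; i++) {
        sum += x[i];
        rmax = fmaxf(rmax, sum);
    }
    return fmaxf(lmax + rmax,
                 fmaxf(maxsum3(x, l, m), maxsum3(x, m+1, u)));
}
/* invoked as: answer = maxsum3(x, 0, n-1) */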

8.4 A Scanning Algorithm

We’ll now use the simplest kind of algorithm that operates on arrays: it starts at the left end (element x[0]) and scans through to the right end (element x[n – 1]), keeping track of the maximum-sum subvector seen so far. The maximum is initially zero. Suppose that we’ve solved the problem for x[0..i – 1]; how can we extend that to include x[i]? We use reasoning similar to that of the divide-and-conquer algorithm: the maximum-sum subarray in the first i elements is either in the first i – 1 elements (which we’ll store in maxsofar), or it ends in position i (which we’ll store in maxendinghere).

[Figure: the vector x[0..i], showing the subvector with sum maxsofar within x[0..i-1] and the subvector ending at position i with sum maxendinghere.]

Recomputing maxendinghere from scratch using code like that in Algorithm 3 yields yet another quadratic algorithm. We can get around this by using the technique that led to Algorithm 2: instead of computing the maximum subvector ending in position i from scratch, we’ll use the maximum subvector that ends in position i – 1. This results in Algorithm 4.

maxsofar = 0
maxendinghere = 0
for i = [0, n)
    /* invariant: maxendinghere and maxsofar
       are accurate for x[0..i-1] */
    maxendinghere = max(maxendinghere + x[i], 0)
    maxsofar = max(maxsofar, maxendinghere)

The key to understanding this program is the variable maxendinghere. Before the first assignment statement in the loop, maxendinghere contains the value of the maximum subvector ending in position i – 1; the assignment statement modifies it to contain the value of the maximum subvector ending in position i. The statement increases it by the value x[i] so long as doing so keeps it positive; when it goes negative, it is reset to zero (because the maximum subvector ending at i is now the empty vector). Although the code is subtle, it is short and fast: its run time is O(n), so we’ll refer to it as a linear algorithm.
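A complete C version of the scan might look like this sketch (again, the function name maxsum4 and the parameter passing are my own choices):

#include <math.h>   /* fmaxf */

/* Algorithm 4 as a C sketch: one linear scan, two running maxima. */
float maxsum4(float x[], int n)
{   float maxsofar = 0;
    float maxendinghere = 0;
    for (int i = 0; i < n; i++) {
        /* invariant: maxendinghere and maxsofar are accurate for x[0..i-1] */
        maxendinghere = fmaxf(maxendinghere + x[i], 0);
        maxsofar = fmaxf(maxsofar, maxendinghere);
    }
    return maxsofar;
}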

8.5 What Does It Matter?

So far we’ve played fast and loose with big-ohs; it’s time to come clean and study the run times of the programs. I implemented the four primary algorithms in C on a 400MHz Pentium II, timed them, and extrapolated the observed run times to produce the following table. (The run time of Algorithm 2b was typically within ten percent of Algorithm 2, so it is not included.)

[Table: run times of Algorithms 1, 2, 3 and 4 — the extrapolated run-time formulas, the time each algorithm takes at several problem sizes, and how the times grow as n grows.]

This table makes a number of points. The most important is that proper algorithm design can make a big difference in run time; that point is underscored by the middle rows. The last two rows show how increases in problem size are related to increases in run time.

Another important point is that when we’re comparing cubic, quadratic and linear algorithms with one another, the constant factors of the programs don’t matter much. (The discussion of the O(n!) algorithm in Section 2.4 shows that constant factors matter even less in functions that grow faster than polynomially.) To underscore this point, I conducted an experiment to make the constant factors of two algorithms differ by as much as possible. To achieve a huge constant factor I implemented Algorithm 4 on a Radio Shack TRS-80 Model III (a 1980 personal computer with a Z-80 processor running at 2.03MHz). To slow that poor old beast even further, I used an interpreted Basic that is an order of magnitude or two slower than compiled code. For the other end of the spectrum, I implemented Algorithm 1 on an Alpha 21164 running at 533MHz. I got the disparity I wanted: the run time of the cubic algorithm was measured as 0.58n³ nanoseconds, while the run time of the linear algorithm was 19.5n milliseconds, or 19,500,000n nanoseconds (that is, it processes about 50 elements per second). This table shows how those expressions translate to times for various problem sizes.

[Table: run times of the cubic algorithm (0.58n³ nanoseconds on the Alpha) and the linear algorithm (19.5n milliseconds on the TRS-80) at various problem sizes.]

The difference in constant factors of thirty-three million allowed the cubic algorithm to start off faster, but the linear algorithm was bound to catch up. The break-even point for the two algorithms is around 5,800, where each takes just under two minutes of run time.
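As a quick check on that figure: setting 0.58n³ nanoseconds equal to 19,500,000n nanoseconds gives n² ≈ 33.6 million, so n ≈ 5,800; at that size either expression comes to roughly 1.1×10¹¹ nanoseconds, or about 110 seconds — just under two minutes.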

[Figure: run times of the two programs plotted against problem size; the curves cross near n = 5,800.]

8.6 Principles

The history of the problem sheds light on the algorithm design techniques. The problem arose in a pattern-matching problem faced by Ulf Grenander at Brown University; the original problem was in the two-dimensional form described in Problem 13. In that version, the maximum-sum subarray was the maximum likelihood estimator of a certain kind of pattern in a digitized picture. Because the two-dimensional problem required too much time to solve, Grenander simplified it to one dimension to gain insight into its structure.

Grenander observed that the cubic time of Algorithm 1 was prohibitively slow, and derived Algorithm 2. In 1977 he described the problem to Michael Shamos, who overnight designed Algorithm 3. When Shamos showed me the problem shortly thereafter, we thought that it was probably the best possible; researchers had just shown that several similar problems require time proportional to n log n. A few days later Shamos described the problem and its history at a Carnegie Mellon seminar attended by statistician Jay Kadane, who sketched Algorithm 4 within a minute. Fortunately, we know that there is no faster algorithm: any correct algorithm must inspect all n inputs, and therefore needs time proportional to n (see Problem 6).

Even though the one-dimensional problem is completely solved, Grenander’s original two-dimensional problem remains open two decades after it was posed, as the second edition of this book goes to press. Because of the computational expense of all known algorithms, Grenander had to abandon that approach to his pattern-matching problem. Readers who feel that the linear-time algorithm for the one-dimensional problem is “obvious” are therefore urged to find an “obvious” algorithm for Problem 13!

The algorithms in this story illustrate important algorithm design techniques.

Save state to avoid recomputation. This simple form of dynamic programming was used in Algorithms 2 and 4. By using space to store results, we avoid using time to recompute them.

Preprocess information into data structures. The cumarr structure in Algorithm 2b allows the sum of a subvector to be quickly computed.

Divide-and-conquer algorithms. Algorithm 3 uses a simple form of divide-and-conquer; textbooks on algorithm design describe more advanced forms.

Scanning algorithms. Problems on arrays can often be solved by asking “how can I extend a solution for x[0..i – 1] to a solution for x[0..i]?” Algorithm 4 stores both the old answer and some auxiliary data to compute the new answer.

Cumulatives. Algorithm 2b uses a cumulative table in which the ith element contains the sum of the first i values of x; such tables are common when dealing with ranges. Business analysts, for instance, find the sales from March to October by subtracting the February year-to-date sales from the October year-to-date sales.

Lower bounds. Algorithm designers sleep peacefully only when they know their algorithms are the best possible; for this assurance, they must prove a matching lower bound. The linear lower bound for this problem is the subject of Problem 6; more complex lower bounds can be quite difficult.

8.7 Problems

1. Algorithms 3 and 4 use subtle code that is easy to get wrong. Use the program verification techniques of Column 4 to argue the correctness of the code; specify the loop invariants carefully.

2. Time the four algorithms on your machine to build a table like that in Section 8.5.

3. Our analysis of the four algorithms was done only at the big-oh level of detail. Analyze the number of max functions used by each algorithm as exactly as possible; does this exercise give any insight into the running times of the programs? How much space does each algorithm require?

4. If the elements in the input array are random real numbers chosen uniformly from [–1, 1], what is the expected value of the maximum subvector?

5. For simplicity, Algorithm 2b accessed cumarr[–1]. How would you deal with that issue in C?

6. Prove that any correct algorithm for computing maximum subvectors must inspect all n inputs. (Algorithms for some problems may correctly ignore some inputs; consider Saxe’s algorithm in Solution 2.2 and Boyer and Moore’s substring searching algorithm.)

7. When I first implemented the algorithms, my scaffolding always compared the answer produced by the various algorithms to that produced by Algorithm 4. I was distraught to see the scaffolding report errors in Algorithms 2b and 3, but when I looked closely at the numerical answers, I found that although not identical, they were very close. What was going on?

8. Modify Algorithm 3 (the divide-and-conquer algorithm) to run in linear worst-case time.

9. We defined the maximum subvector of an array of negative numbers to be zero, the sum of the empty subvector. Suppose that we had instead defined the maximum subvector to be the value of the largest element; how would you change the various programs?

10. Suppose that we wished to find the subvector with the sum closest to zero rather than that with maximum sum. What is the most efficient algorithm you can design for this task? What algorithm design techniques are applicable? What if we wished to find the subvector with the sum closest to a given real number t?

11. A turnpike consists of n – 1 stretches of road between n toll stations; each stretch has an associated cost of travel. It is trivial to tell the cost of going between any two stations in O(n) time using only an array of the costs or in constant time using a table with O(n2) entries. Describe a data structure that requires O(n) space but allows the cost of any route to be computed in constant time.

12. After the array x[0..n – 1] is initialized so that every element is zero, n of the following operations are performed

for i = [l, u]
    x[i] += v

where l, u and v are parameters of each operation (l and u are integers satisfying 0≤l≤u<n, and v is a real number). After the n operations, the values of x[0..n – 1] are reported in order. The method just sketched requires O(n²) time. Can you find a faster algorithm?

13. In the maximum subarray problem we are given an n×n array of reals, and we must find the maximum sum contained in any rectangular subarray. What is the complexity of this problem?

14. Given integers m and n and the real vector x[n], find the integer i (0≤i<n–m) such that the sum x[i] + ... + x[i+m] is nearest zero.

15. What is the solution of the recurrence T(n) = 2T(n/2) + cn when T(1) = 0 and n is a power of two? Prove your result by mathematical induction. What if T(1) = c?

8.8 Further Reading

Only extensive study and practice can put algorithm design techniques at your fingertips; most programmers will get this only from a course or textbook on algorithms. Data Structures and Algorithms by Aho, Hopcroft and Ullman (published by Addison-Wesley in 1983) is an excellent undergraduate text. Chapter 10 on “Algorithm Design Techniques” is especially relevant to this column.

Cormen, Leiserson and Rivest’s Introduction to Algorithms was published by MIT Press in 1990. This thousand-page volume provides a thorough overview of the field. Parts I, II and III cover foundations, sorting and searching. Part IV on “Advanced Design and Analysis Techniques” is particularly relevant to the theme of this column. Parts V, VI and VII survey advanced data structures, graph algorithms and other selected topics.

Those books and seven others were collected onto a CD-ROM as Dr. Dobb’s Essential Books on Algorithms and Data Structures. The CD was published in 1999 by Miller Freeman, Inc. It is an invaluable reference work for any programmer interested in algorithms and data structures. As this book goes to press, the complete electronic set could be ordered from the Dr. Dobb’s web site at www.ddj.com for about the price of one physical book.
