Column 11: Sorting

How should you sort a sequence of records into order? The answer is usually easy: Use a library sort function. We took this approach in Solution 1.1 and twice in the anagram program in Section 2.8. Unfortunately, this plan doesn’t always work. Existing sorts may be cumbersome to employ or too slow to solve a particular problem (as we saw in Section 1.1). In such cases, a programmer has no choice but to write a sort function.

11.1 Insertion Sort

Insertion Sort is the method most card players use to sort their cards. They keep the cards dealt so far in sorted order, and as each new card arrives they insert it into its proper relative position. To sort the array x[n] into increasing order, start with the first element as the sorted subarray x[0..0] and then insert the elements x[1], ..., x[n – 1], as in this pseudocode.

for i = [1, n)
    /* invariant: x[0..i-1] is sorted */
    /* goal: sift x[i] down to its
         proper place in x[0..i] */

The following four lines show the progress of the algorithm on a four-element array. The “|” represents the variable i; elements to its left are sorted, while elements to its right are in their original order.

3|1 4 2
1 3|4 2
1 3 4|2
1 2 3 4|

The sifting is accomplished by a right-to-left loop that uses the variable j to keep track of the element being sifted. The loop swaps the element with its predecessor in the array as long as there is a predecessor (that is, j > 0) and the element hasn’t reached its final position (that is, it is out of order with its predecessor). Thus the entire sort is isort1:

for i = [1, n)
    for (j = i; j > 0 && x[j-1] > x[j]; j--)
        swap(j-1, j)

For those rare times when I need to write my own sort, that’s the first function I try; it’s just three straightforward lines.
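For concreteness, here is a minimal C sketch of isort1. The global array x, the bound n passed as a parameter, and the swap helper are assumptions made for this sketch; the pseudocode leaves them implicit.

    int x[10000];            /* the array being sorted; the size is arbitrary */

    void swap(int i, int j)  /* exchange x[i] and x[j] */
    {   int t = x[i];
        x[i] = x[j];
        x[j] = t;
    }

    void isort1(int n)       /* sort x[0..n-1] into increasing order */
    {   int i, j;
        for (i = 1; i < n; i++)
            for (j = i; j > 0 && x[j-1] > x[j]; j--)
                swap(j-1, j);
    }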

Programmers inclined to tuning code might be uncomfortable with the swap function call in the inner loop. We can perhaps speed up the code by writing the function body inline, though many optimizing compilers will do this for us. We replace the function with this code, which uses the variable t to exchange x[j] and x[j-1].

t = x[j];  x[j] = x[j-1];  x[j-1] = t

On my machine, isort2 takes about one third the time of isort1.
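Under the same assumptions as the sketch above, isort2 in C differs only in its inner loop:

    void isort2(int n)       /* isort1 with the swap body written inline */
    {   int i, j, t;
        for (i = 1; i < n; i++)
            for (j = i; j > 0 && x[j-1] > x[j]; j--) {
                t = x[j];  x[j] = x[j-1];  x[j-1] = t;
            }
    }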

This change opens the door for another speedup. Because the variable t is assigned the same value over and over (the value originally in x[i]), we can move the assignments to and from t out of the loop, and change the comparison to yield isort3:

for i = [1, n)
    t = x[i]
    for (j = i; j > 0 && x[j-1] > t; j--)
        x[j] = x[j-1]
    x[j] = t

This code shifts elements right by one position so long as t is less than the array value, and finally moves t into its correct position. This five-line function is a little more subtle than its predecessors, but on my system it runs about fifteen percent faster than the isort2 function.
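A C sketch of isort3, again under the same assumptions; the inner loop leaves j pointing at exactly the position where t belongs:

    void isort3(int n)       /* hoist t out of the inner loop */
    {   int i, j, t;
        for (i = 1; i < n; i++) {
            t = x[i];
            for (j = i; j > 0 && x[j-1] > t; j--)
                x[j] = x[j-1];   /* shift larger elements right */
            x[j] = t;            /* drop t into its final place */
        }
    }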

On random data as well as in the worst case, the run time of Insertion Sort is proportional to n². Here are the run times of the three programs when their inputs are n random integers:

[Table: run times of isort1, isort2 and isort3 for n random integers]

The third sort takes a few milliseconds to sort n = 1000 integers, a third of a second to sort n = 10,000 integers, and almost an hour to sort a million integers. We’ll soon see code to sort a million integers in under a second. If the input array is already almost sorted, though, Insertion Sort is much faster because each element sifts down just a short distance. An algorithm in Section 11.3 exploits exactly this property.

11.2 A Simple Quicksort

This algorithm was described by C. A. R. Hoare in his classic paper “Quicksort” in the Computer Journal 5, 1, April 1962, pp. 10-15. It uses the divide-and-conquer approach of Section 8.3: to sort an array we divide it into two smaller pieces and sort those recursively. For instance, to sort the eight-element array

[Figure: an unsorted eight-element array whose first element is 55]

we partition it around the first element (55) so that all elements less than 55 are to the left of it, while all greater elements are to its right:

[Figure: the partitioned array, with elements less than 55 in positions 0..2, 55 in position 3, and the greater elements in positions 4..7]

If we then recursively sort the subarray from 0 to 2 and the subarray from 4 to 7, independently, the entire array is sorted.

The average run time of this algorithm is much less than the O(n²) time of Insertion Sort, because a partitioning operation goes a long way towards sorting the sequence. After a typical partitioning of n elements, about half the elements are above the partition value and half the elements are below it. In a similar amount of run time, the sift operation of Insertion Sort manages to get just one more element into the right place.

We now have a sketch of a recursive function. We represent the active portion of the array by the two indices l and u, for the lower and upper limits. The recursion stops when we come to an array with fewer than two elements. The code is

void qsort(l, u)
    if l >= u then
        /* at most one element, do nothing */
        return
    /* goal: partition array around a particular value,
       which is eventually placed in its correct position p
     */
    qsort(l, p-1)
    qsort(p+1, u)

To partition the array around the value t we’ll start with a simple scheme that I learned from Nico Lomuto. We will see a faster program for this job in the next section, but this function is so easy to understand that it’s hard to get wrong, and it is by no means slow. Given the value t, we are to rearrange x[a..b] and compute the index m (for “middle”) such that all elements less than t are to one side of m, while all other elements are on the other side. We’ll accomplish the job with a simple for loop that scans the array from left to right, using the variables i and m to maintain the following invariant in array x.

[Diagram of the invariant: x[a..m] < t, x[m+1..i-1] >= t, x[i..b] not yet examined]

Most presentations of Quicksort use the two-sided partitioning in the next section. Although the basic idea of that code is simple, I have always found the details tricky — I once spent the better part of two days chasing down a bug hiding in a short partitioning loop. A reader of a draft of this column complained that the standard method is in fact simpler than Lomuto’s, and sketched some code to make his point; I stopped looking after I found two bugs.

When the code inspects the ith element it must consider two cases. If x[i] >= t then all is fine; the invariant is still true. On the other hand, when x[i] < t, we can regain the invariant by incrementing m (which will index the new location of the small element), and then swapping x[i] and x[m]. The complete partitioning code is

m = a-1
for i = [a, b]
    if x[i] < t
        swap(++m, i)

In Quicksort we’ll partition the array x[l..u] around the value t = x[l]; a will therefore be l + 1 and b will be u. Thus the invariant of the partitioning loop is depicted as

[Diagram of the invariant: x[l] = t, x[l+1..m] < t, x[m+1..i-1] >= t, x[i..u] not yet examined]

When the loop terminates we have

[Diagram: x[l] = t, x[l+1..m] < t, x[m+1..u] >= t]

We then swap x[l] with x[m] to give

[Diagram: x[l..m-1] < t, x[m] = t, x[m+1..u] >= t]

It is tempting to ignore this step and to recur with parameters (l, m) and (m + 1, u); unfortunately, this gives an infinite loop when t is the strictly greatest element in the subarray. I would have caught the bug had I tried to verify termination, but the reader can probably guess how I really discovered it. Miriam Jacob gave an elegant proof of incorrectness: since x[l] is never moved, the sort can only work if the minimum element in the array starts in x[0].

We can now recursively call the function twice with the parameters (l, m – 1) and (m + 1, u).

The final code is our first complete Quicksort, qsort1. To sort the array x[n], we call the function qsort1(0, n-1).

void qsort1(l, u)
    if (l >= u)
        return
    m = l
    for i = [l+1, u]
        /* invariant: x[l+1..m]   <  x[l] &&
                      x[m+1..i-1] >= x[l] */
        if (x[i] < x[l])
            swap(++m, i)
    swap(l, m)
    /* x[l..m-1] < x[m] <= x[m+1..u] */
    qsort1(l, m-1)
    qsort1(m+1, u)
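Rendered in C under the same assumptions as the earlier sketches (the global array x and the swap function), qsort1 might read:

    void qsort1(int l, int u)
    {   int i, m;
        if (l >= u)        /* at most one element; do nothing */
            return;
        m = l;
        for (i = l+1; i <= u; i++)
            /* invariant: x[l+1..m] < x[l] &&
                          x[m+1..i-1] >= x[l] */
            if (x[i] < x[l])
                swap(++m, i);
        swap(l, m);        /* x[l..m-1] < x[m] <= x[m+1..u] */
        qsort1(l, m-1);
        qsort1(m+1, u);
    }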

Problem 2 describes Bob Sedgewick’s modification to this partitioning code, which results in the slightly faster qsort2.

Most of the proof of correctness of this program was given in its derivation (which is, of course, its proper place). The proof proceeds by induction: the outer if statement correctly handles empty and single-element arrays, and the partitioning code correctly sets up larger arrays for the recursive calls. The program can’t make an infinite sequence of recursive calls because the element x[m] is excluded at each invocation; this is exactly the same argument that we used in Section 4.3 to show that binary search terminates.

This Quicksort runs in O(n log n) time and O(log n) stack space on the average, for an input array that is a random permutation of distinct elements. The mathematical arguments are similar to those in Section 8.3. Most algorithms texts analyze Quicksort’s run time, and also prove the lower bound that any comparison-based sort must use Ω(n log n) comparisons; Quicksort is therefore close to optimal.

The qsort1 function is the simplest Quicksort I know, and it illustrates many important attributes of the algorithm. The first is that it is indeed quick: on my system, this function sorts a million random integers in just over a second, about twice as fast as the finely-tuned C library qsort. (That function has an expensive general-purpose interface.) This sort may be appropriate for a few well-behaved applications, but it has another property of many Quicksorts: it can degrade to quadratic time on some common inputs. The next section studies more robust Quicksorts.

11.3 Better Quicksorts

The qsort1 function quickly sorts an array of random integers, but how does it perform on nonrandom inputs? As we saw in Section 2.4, programmers often sort to bring together equal elements. We should therefore consider an extreme case: an array of n identical elements. Insertion Sort performs very well on this input: each element is sifted down zero positions, so the total run time is O(n). The qsort1 function, on the other hand, blows this case badly. Each of n-1 partitioning passes uses O(n) time to peel off a single element, so the total run time is O(n²). The run time for n = 1,000,000 jumps from a second to two hours.

We can avoid this problem with careful two-sided partitioning code, using this loop invariant:

[Diagram of the invariant: x[l] = t, x[l+1..i-1] <= t, x[i..j] not yet examined, x[j+1..u] >= t]

The indices i and j are initialized to the two extremes of the array to be partitioned. The main loop has two inner loops. The first inner loop moves i up over small elements and stops on a large element; the second inner loop moves j down over large elements and stops on a small element. The main loop then tests the indices for crossing and swaps the two values.

But how does the code respond to equal elements? The first temptation is to scan right over them to avoid doing excess work, but this leads to quadratic behavior when all inputs are equal. We will instead stop each scan on equal elements, and swap them. Although this approach performs more swaps than necessary, it transforms the worst case of all equal elements into a best case that requires almost exactly n log₂ n comparisons. This code implements that partitioning:

void qsort3(l, u)
    if l >= u
        return
    t = x[l]; i = l; j = u+1
    loop
        do i++ while i <= u && x[i] < t
        do j-- while x[j] > t
        if i > j
            break
        swap(i, j)
    swap(l, j)
    qsort3(l, j-1)
    qsort3(j+1, u)

In addition to taming the degenerate case of all equal elements, this code also performs fewer swaps on the average than qsort1.
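A C sketch of qsort3, under the same global-array assumptions as the earlier sketches:

    void qsort3(int l, int u)
    {   int i, j, t;
        if (l >= u)
            return;
        t = x[l];  i = l;  j = u+1;
        for (;;) {
            do i++; while (i <= u && x[i] < t);
            do j--; while (x[j] > t);   /* x[l] == t stops this scan */
            if (i > j)
                break;
            swap(i, j);
        }
        swap(l, j);
        qsort3(l, j-1);
        qsort3(j+1, u);
    }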

The Quicksorts that we have seen so far always partition around the first element in the array. This is fine for random inputs, but it can require excessive time and space for some common inputs. If the array is already sorted in increasing order, for instance, it will partition around the smallest element, then the second smallest, and so on through the entire array in order, in O(n²) time. We do far better to choose a partitioning element at random; we accomplish this by swapping x[l] with a random entry in x[l..u]:

swap(l, randint(l, u));

If you don’t have the randint function handy, Problem 12.1 investigates how you might make your own. But whatever code you use, be careful that randint returns a value in the range [l, u] — a value out of range is an insidious bug. Combining the random partition element and the two-way partitioning code, the expected run time of the Quicksort is now proportional to n log n, for any input array of n elements. The random performance bound is a result of calling the random number generator, rather than an assumption about the distribution of inputs.
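If your library lacks randint, one quick stand-in built on the standard rand() might look like this; the name and the modulo approach are this sketch's assumptions, and Problem 12.1 discusses doing the job more carefully:

    #include <stdlib.h>

    /* Return a random integer in [l, u]. The modulo introduces a
       slight bias when u-l+1 does not evenly divide RAND_MAX+1,
       which is usually harmless for choosing a pivot. */
    int randint(int l, int u)
    {   return l + rand() % (u - l + 1);
    }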

Our Quicksorts spend a great deal of time sorting very small subarrays. It would be faster to sort those using a simple method like Insertion Sort rather than firing up all the machinery of Quicksort. Bob Sedgewick developed a particularly clever implementation of this idea. When Quicksort is called on a small subarray (that is, when l and u are near), we do nothing. We implement this by changing the first if statement in the function to

if u-l < cutoff
    return

where cutoff is a small integer. When the program finishes, the array will not be sorted, but it will be grouped into small clumps of randomly ordered values such that the elements in one clump are less than elements in any clump to its right. We must clean up within the clumps by another sort method. Because the array is almost sorted, Insertion Sort is just right for the job. We sort the entire array by the code

qsort4(0, n-1)
isort3()

Problem 3 investigates the best choice for cutoff.

As a final piece of code tuning, we may expand the code for the swap function in the inner loop (because the other two calls to swap aren’t in the inner loop, writing them inline would have a negligible impact on the speed). Here is our final Quicksort, qsort4:

void qsort4(l, u)
    if u-l < cutoff
        return
    swap(l, randint(l, u))
    t = x[l]; i = l; j = u+1
    loop
        do i++ while i <= u && x[i] < t
        do j-- while x[j] > t
        if i > j
            break
        temp = x[i]; x[i] = x[j]; x[j] = temp
    swap(l, j)
    qsort4(l, j-1)
    qsort4(j+1, u)
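Putting the pieces together, here is a C sketch of qsort4 and a two-phase driver. The globals x, swap, randint and isort3 are carried over from the earlier sketches; the cutoff value 50 and the driver name sortints are this sketch's guesses (Problem 3 asks how to tune cutoff).

    int cutoff = 50;   /* tuning parameter; see Problem 3 */

    void qsort4(int l, int u)
    {   int i, j, t, temp;
        if (u-l < cutoff)          /* leave small clumps for isort3 */
            return;
        swap(l, randint(l, u));    /* random partitioning element */
        t = x[l];  i = l;  j = u+1;
        for (;;) {
            do i++; while (i <= u && x[i] < t);
            do j--; while (x[j] > t);
            if (i > j)
                break;
            temp = x[i];  x[i] = x[j];  x[j] = temp;  /* inlined swap */
        }
        swap(l, j);
        qsort4(l, j-1);
        qsort4(j+1, u);
    }

    void sortints(int n)   /* sort x[0..n-1]: clump, then clean up */
    {   qsort4(0, n-1);
        isort3(n);
    }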

Problems 4 and 11 mention further ways of improving Quicksort’s performance.

The versions of Quicksort are summarized in this table. The right column gives the average run time in nanoseconds to sort n random integers; many of the functions can degrade to quadratic behavior on some inputs.

[Table: summary of the Quicksort versions and their average run times for n random integers]

The qsort4 function uses 15 lines of C, and also the 5 lines of isort3. For a million random integers, the run times range from 0.6 seconds for the C++ sort to 2.7 seconds for the C qsort. In Column 14, we’ll see an algorithm that is guaranteed to sort n integers in O(n log n) time, even in the worst case.

11.4 Principles

This exercise teaches several important lessons about both sorting in particular and programming in general.

The C library qsort is easy and relatively fast; it is slower than the hand-made Quicksorts only because its general and flexible interface uses a function call for each comparison. The C++ library sort has the simplest interface: we sort the array x with the call sort(x, x + n); it also has a particularly efficient implementation. If a system sort can meet your needs, don’t even consider writing your own code.
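For reference, a call to the C library qsort looks like this; the per-comparison function call is the general-purpose overhead mentioned above. The names intcmp and librarysort are illustrative only.

    #include <stdlib.h>

    int intcmp(const void *a, const void *b)
    {   int u = *(const int *)a, v = *(const int *)b;
        return (u > v) - (u < v);   /* avoids the overflow risk of u - v */
    }

    void librarysort(int *a, int n)
    {   qsort(a, n, sizeof(int), intcmp);
    }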

Insertion Sort is simple to code and may be fast enough for small sorting jobs. Sorting 10,000 integers with isort3 requires just a third of a second on my system.

For large n, the O(n log n) run time of Quicksort is crucial. The algorithm design techniques of Column 8 gave us the basic idea for the divide-and-conquer algorithm, and the program verification techniques of Column 4 helped us implement the idea in succinct and efficient code.

Even though the big speedups are achieved by changing algorithms, the code tuning techniques of Column 9 speed up Insertion Sort by a factor of 4 and Quicksort by a factor of 2.

11.5 Problems

1. Like any other powerful tool, sorting is often used when it shouldn’t be and not used when it should be. Explain how sorting could be overused or underused when calculating the following statistics of an array of n floating point numbers: minimum, maximum, mean, median and mode.

2. [R. Sedgewick] Speed up Lomuto’s partitioning scheme by using x[l] as a sentinel. Show how this scheme allows you to remove the swap after the loop.

3. How could you experiment to find the best value of cutoff on a particular system?

4. Although Quicksort uses only O(log n) stack space on the average, it can use linear space in the worst case. Explain why, then modify the program to use only logarithmic space in the worst case.

5. [M. D. McIlroy] Show how to use Lomuto’s partitioning scheme to sort varying-length bit strings in time proportional to the sum of their lengths.

6. Use the techniques of this column to implement other sorting algorithms. Selection sort first places the smallest value in x[0], then the smallest remaining value in x[1], and so forth. Shell sort (or “diminishing increment sort”) is like Insertion Sort, but moves elements down h positions rather than just one position. The value of h starts large, and shrinks.

7. Implementations of the sorting programs in this column can be found on this book’s web site. Time them on your system and summarize them in a table like that in Section 11.3.

8. Sketch a one-page guide to show a user of your system how to select a sort. Make sure that your method considers the importance of run time, space, programmer time (development and maintenance), generality (what if I want to sort character strings that represent Roman numerals?), stability (items with equal keys should retain their relative order), special properties of the input data, etc. As an extreme test of your method, try feeding it the sorting problem described in Column 1.

9. Write a program for finding the kth-smallest element in the array x[0..n-1] in O(n) expected time. Your algorithm may permute the elements of x.

10. Gather and display empirical data on the run time of a Quicksort program.

11. Write a “fat pivot” partitioning function with the postcondition

[Diagram of the postcondition: a region of elements < t, then a region of elements = t, then a region of elements > t, with indices marking the two boundaries]

How would you incorporate the function into a Quicksort program?

12. Study sorting methods used in non-computer applications (such as mail rooms and change sorters).

13. The Quicksort programs in this column choose a partitioning element at random. Study better choices, such as the median element of a sample from the array.

14. The Quicksorts in this column represented a subarray with two integer indices; this is necessary in a language like Java, which does not have pointers into arrays. In C or C++, though, it is possible to sort an array of integers using a function like

void qsort(int x[], int n)

for the original call and for all recursive calls. Modify the algorithms in this column to use this interface.

11.6 Further Reading

Since the first edition was published by Addison-Wesley in 1973, Don Knuth’s Art of Computer Programming, Volume 3: Sorting and Searching has been the definitive reference on the topic. He describes all important algorithms in detail, analyzes their run times mathematically, and implements them in assembly code. The exercises and references describe many important variations of the primary algorithms. Knuth updated and revised the book for a second edition in 1998. The MIX assembly language that he uses shows its age, but the principles revealed in the code are timeless.

Bob Sedgewick gives a more modern treatment of sorting and searching in the monumental third edition of his Algorithms. Parts 1 through 4 cover Fundamentals, Data Structures, Sorting and Searching. Algorithms in C was published by Addison-Wesley in 1997, Algorithms in C++ (with C++ consulting by Chris Van Wyk) appeared in 1998, and Algorithms in Java (with Java consulting by Tim Lindholm) appeared in 1999. He emphasizes the implementation of useful algorithms (in the language of your choice), and intuitively explains their performance.

These two books are the primary references for this column on sorting, for Column 13 on searching, and for Column 14 on heaps.
