11
Sorting
Objectives
The concept and implementation of standard sorting algorithms—bubblesort, heapsort, insertionsort, mergesort, and quicksort
Analysis of the worst-case running times of the standard sorting algorithms
Analysis of the average-case running times of the standard sorting algorithms
The difference between worst-case and average-case running times of algorithms and how to perform an average-case analysis
Practical considerations of sorting: auxiliary space, disk, the extent to which the input data is already sorted, multilevel sorting, and stability
The lower bound on sorting by comparisons
* Sorting on a distributed memory parallel computer
Key Terms
average-case running time
best-case running time
bubblesort
bucketsort
distributed memory parallel computer
in-place algorithm
insertionsort
key
mergesort
out-of-place algorithm
pivot element
quicksort
shared memory parallel computer
stable sort
worst-case running time
Introduction
It has been asserted by Knuth that perhaps a third of all CPU cycles are spent in
sorting data. This is a huge fraction, but the problem of sorting data comes up in
many circumstances when it might not be so obvious that sorting was important.
Sorting and searching go hand in hand (indeed, Volume 3 of The Art of Computer
Programming by Knuth bears Sorting and Searching as the title), and creating and
maintaining sorted lists, or searchable trees like a priority queue with a heap or a
binary search tree, is enormously important. For example, when a Java program
is compiled, the compiler must do a lookup every time a symbol or variable name
appears. Every time Google does a search, it must process index files. We have
already seen that in the absence of some means for creating a sorted or searchable list, the time needed to find a single record among N records is, on average, N/2 comparisons, which is O(N) and expensive. With binary search on a sorted or indexed file, as in
our FlatFile example, or with a binary search tree, that cost can be reduced to
O(lg N). With a heap used as a priority queue, we can extract the most important
entry and then rebuild the heap structure in O(lg N) time.
As is customary, we will refer to sorting keys under the assumption that our data
record, which might be an entire student transcript, is to be sorted based on the
value of some key, like the student’s ID number. Although it sometimes happens
that there is an assigned key for a record, like an ID number or a Social Security
number, it is often the case that the key is constructed from data in the record.
For example, one might try to construct as unambiguous a key as possible from
last name, first name, and middle name (probably in that order) when sorting an
address book. This involves some art rather than science, and perhaps also some
rules for deciding how to alphabetize. We will gloss over somewhat the process of
creating the key and simply assume that keys can be created based on a set of rules.
Our interest is in how to use the keys in sorting algorithms.
11.1 Worst-Case, Best-Case, and Average-Case
In this chapter we will, for the first time, explore more deeply the issue of worst-case running time versus average-case running time and best-case running time. For
the purpose of doing asymptotic analysis, we always concentrate on the worst-case
running time, and we would like to discover algorithms that are fast even in the
worst case. However, as we shall see with quicksort, an algorithm that has a good
average-case behavior might be preferable if its implementation takes advantage of
other factors beyond simple asymptotics. The usual goal, after all, is to sort, not
to prove theorems about how one might sort.
Under some circumstances we might also look at the best-case behavior of an
algorithm. We can’t write code as if we are going to encounter the best case in
practice, but we can learn sometimes from best-case behavior. For example, there
are some sorting algorithms that are fixed-cost algorithms, taking always the same
length of time regardless of the nature of the input data. Some algorithms, though,
are much more effective on almost-sorted data than they are on random data. The
algorithm chosen to sort a large file that is known to be nearly sorted to begin with
will probably be different from the algorithm chosen to sort a file of data about
which we know nothing.
11.2 Bubblesort
The simple version of code for a bubblesort, as used in the IndexedFlatFile
example, is given in Figure 11.1.
public void bubblesort()
{
    for(int length = this.getSize(); length > 1; --length)
    {
        for(int i = 0; i < length-1; ++i)
        {
            if(record[i] > record[i+1])
            {
                CODE TO SWAP record[i] WITH record[i+1]
            }
        }
    }
}
FIGURE 11.1
Code fragment for a bubblesort.
As should be clear by now, this is an O(N²) computation. The outer loop iterates N − 1 times. In the first iteration of the outer loop, the inner loop iterates N − 1 times. In the next outer iteration, the inner loop iterates N − 2 times, and so forth. This is a total of

(N − 1) + (N − 2) + ··· + 1 = (N − 1)N/2 = O(N²)
iterations of the inner loop code to compare and perhaps swap two records based
on comparing the values of their keys.
This algorithm works because in each iteration of the outer loop we exchange
pairs, bubbling the largest element down the list into the current “last” position,
and by the time we have finished the ith iteration, the element in the ith location
from the end is the ith largest of all elements. With each outer loop iteration, then,
we fix only one more element into its proper place.
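For concreteness, here is one way the fragment of Figure 11.1 might be filled in for a plain array of int keys: the IndexedFlatFile record array and the swap placeholder are replaced by a local int array and an explicit three-statement swap. This is a sketch, not the IndexedFlatFile implementation itself.

// Standalone sketch of the bubblesort of Figure 11.1 for a plain int[].
public class BubblesortSketch
{
    public static void bubblesort(int[] record)
    {
        for(int length = record.length; length > 1; --length)
        {
            for(int i = 0; i < length-1; ++i)
            {
                if(record[i] > record[i+1])
                {
                    // swap record[i] with record[i+1]
                    int temp = record[i];
                    record[i] = record[i+1];
                    record[i+1] = temp;
                }
            }
        }
    }

    public static void main(String[] args)
    {
        int[] data = {97, 11, 90, 81, 34, 40, 65, 72, 51, 23, 57};
        bubblesort(data);
        System.out.println(java.util.Arrays.toString(data));
    }
}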
11.3 Insertionsort
An insertionsort is not asymptotically better than a bubblesort, but it can be better
in practice under some circumstances. The insertionsort from Chapter 4 is given in
Figure 11.2.
public void insertionsort()
{
    for(int i = 1; i <= n; ++i)
    {
        insertSub = i;
        for(int j = insertSub-1; j >= 0; --j)
        {
            if(key[insertSub] > key[j])
            {
                CODE TO SWAP record[insertSub] WITH record[j]
                insertSub = j;
            }
            else
                break;
        }
    }
}
FIGURE 11.2
Code fragment for an insertionsort.
This is really very similar to a bubblesort, in that we are fixing only one entry into
position with each iteration of the outer loop. In the first iteration, we compare key
1 with key 0 and swap if the two records are out of order. After this first iteration,
the records at subscripts 0 and 1 are in the proper order. In the second iteration, we
“pull” record 2 forward through records 1 and 0 (if necessary) and swap if necessary
so that when we finish this iteration, the three records at subscripts 0 through 2
will be in the proper order. In the worst case, then, when the records are presented
in exactly reverse sorted order, this requires
1 + 2 + ··· + (N − 1) = (N − 1)N/2 = O(N²)
comparisons.
In the best case, however, insertionsort could run in linear time. If the records
are presented in correct sorted order, then we will test the new value against the
last of the old values. These two will be in the correct order already, so we know
that the entire list with the new value included is in the correct order, and we break
out of the loop with every iteration after only one comparison.
In the average case of insertionsort, we have to progress halfway through the current list with every iteration, so we only cut a factor of 1/2 off the worst-case total comparison count, and that's still O(N²) comparisons.
Finally, we point out that insertionsort as presented here runs an outer loop
on subarrays of lengths 2, 3,..., and in the inner loop pulls the last entry in the
subarray forward until it drops into place. Our code actually does a swap of two
entries if they are out of place. This isn’t necessary; if we save off the entry we are
trying to position, we can avoid half the stores into the array by simply moving the
one that is out of place down one location, and then eventually just dropping the
entry to be positioned into its correct location. Whether this will actually save any
time will depend on how the computer actually does the storing back into memory.
Although a program that reads a data item can’t progress until it has that data
item, a program that stores a data item can continue if the lower-level code and
hardware of the machine carry out the store in the background.
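A sketch of that shift-based variant, written for a plain array of int keys sorted into ascending order (rather than the record-and-key arrays of Figure 11.2), might look like the following; the entry being positioned is saved once, the larger entries are each moved up one slot, and the saved entry is then dropped into the hole that remains.

// Insertionsort that shifts entries rather than swapping pairs.
// Sketch for a plain int[] sorted into ascending order.
public static void insertionsortByShifting(int[] key)
{
    for(int i = 1; i < key.length; ++i)
    {
        int saved = key[i];          // the entry we are trying to position
        int j = i - 1;
        while(j >= 0 && key[j] > saved)
        {
            key[j+1] = key[j];       // move the larger entry up one slot
            --j;
        }
        key[j+1] = saved;            // drop the saved entry into place
    }
}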
11.4 Improving Bubblesort
In both the bubblesort and insertionsort presented here, the effect of the outer loop
is to move exactly one element into its proper position with every iteration. We seize
upon the one element to be placed into position and compare and exchange only
with that element. In the case of bubblesort, the element is the “current largest”
(or smallest); in the case of insertionsort, the element is the new element at the end
of the list. We also saw that insertionsort could run in linear time, but bubblesort
as described previously will always take worst-case time.
There are variations of these sorts that still have worst-case O(N²) running time but could finish much faster.
In an improved version of bubblesort, we keep track of whether there have been
any swaps in a given outer loop. If there have been no swaps, then there were no
pairs of elements out of place, which means that the list is sorted. The code for
this version could look like Figure 11.3. After the ith iteration of the outer loop, we know that the last i elements of the array are in their
correct locations. However, in this variation we compare all adjacent elements and
swap any pairs that are out of order, and we keep track of whether any swaps in
the current inner iteration have been necessary. If there have been no swaps in the
inner iteration, then there were no pairs out of order and we can exit early.
public void bubblesort()
{
    boolean didSwap = true;
    int length = this.getSize();
    while(didSwap)
    {
        didSwap = false;
        for(int i = 0; i < length-1; ++i)
        {
            if(record[i] > record[i+1])
            {
                CODE TO SWAP record[i] WITH record[i+1]
                didSwap = true;
            }
        }
        --length;
    }
}
FIGURE 11.3
Code fragment for an improved bubblesort.
11.5 Heapsort
We will learn later, in Section 11.12.1, that any sort that is done with comparisons,
and indeed any sort of a large number of data records, cannot be accomplished in
fewer than O(N lg N) comparisons in the worst case. The heapsort is one of a small
number of sorts that is guaranteed to run in exactly O(N lg N) comparisons for the
worst case and the average case, so it is a sort that is guaranteed to run in the
asymptotically best possible time. Algorithmically, heapsort is accomplished with
the following algorithm. On an array of N records,
// Run code to build a max-heap
for (i = N; i > 0; --i)
{
    Exchange the root data item with the item at location i
    Run fixHeapDown(*) on the root of the tree/heap/array
}
What this does is the following. After building the max-heap, the largest value
in the array is stored at the root. We exchange this largest value with the value
stored at location N, so the maximal value is now at location N. We shorten the
array by one so that this maximal value will never be looked at again, and then we
recreate the heap structure using fixHeapDown() to push the new value stored at
the root (and that used to be at location N) down into its proper place. After this
step, the second-largest value in the entire array, which is the largest value in the
newly shortened array, is now stored at the root. We exchange this with the value
at location N − 1, shorten the array again, and rebuild the heap.
We have N items to extract from the root and place at the end of the array,
and for each of these extractions, we pay O(lg N) comparisons to rebuild the heap.
Thus the entire process requires, for extracting the values in sorted order, O(N lg N)
comparisons. We combine this with the O(N lg N) running time for creating the
heap in the first place to get an overall running time of O(N lg N).
(Remember also that we said back in Chapter 7 that there was an O(N) algorithm
to build the heap from scratch. One of the reasons we weren’t so worried about the
O(N lg N) cost is that the heapsort takes O(N lg N) time to sort after having built
the heap, so we couldn’t get an asymptotically better sort even if we did build the
heap in O(N) time.)
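A compact sketch of the whole heapsort on a plain int array might look like the following. It uses zero-based subscripts, so the children of the node at subscript i are at 2i + 1 and 2i + 2 rather than the 2n and 2n + 1 of a one-based array, and the fixHeapDown here is a stand-in for the method of Chapter 7 rather than that method itself.

// Heapsort sketch on an int[] using zero-based subscripts.
public class HeapsortSketch
{
    public static void heapsort(int[] a)
    {
        int n = a.length;
        // Build a max-heap bottom-up.
        for(int i = n/2 - 1; i >= 0; --i)
        {
            fixHeapDown(a, i, n);
        }
        // Repeatedly move the largest remaining value to the end of the
        // array and rebuild the heap on the shortened array.
        for(int end = n - 1; end > 0; --end)
        {
            int temp = a[0];         // exchange the root with location 'end'
            a[0] = a[end];
            a[end] = temp;
            fixHeapDown(a, 0, end);
        }
    }

    // Push the value at subscript i down until the max-heap property holds
    // for the first heapSize entries of the array.
    private static void fixHeapDown(int[] a, int i, int heapSize)
    {
        while(true)
        {
            int largest = i;
            int left = 2*i + 1;
            int right = 2*i + 2;
            if(left < heapSize && a[left] > a[largest]) largest = left;
            if(right < heapSize && a[right] > a[largest]) largest = right;
            if(largest == i) return;  // heap property restored
            int temp = a[i];
            a[i] = a[largest];
            a[largest] = temp;
            i = largest;
        }
    }
}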
11.6 Worst-Case, Best-Case, and Average-Case (Again)
Determining best-case, average-case, and worst-case running times is important for
sorting because sorting is important. It is also important because there are a number
of situations in which we know that data will be partly sorted or that data will have
specific characteristics, so if we can exploit what we know about the data we can
have algorithms that run faster than worst case. In the case of insertionsort, for
example, we have O(N) best case if the data is already sorted correctly, but O(N²)
average-case and worst-case running times. Similarly, our variant of bubblesort, in
which we keep track of the number of swaps that have been necessary, and if there
are no swaps we can terminate early, provides a best-case O(N) running time on a
list that is already sorted. But for both insertionsort and bubblesort, if the data are
sorted in exactly backward order, we have to pull every new entry all the way to the
beginning of the array or push every entry to the end. Unfortunately, the average
case is no better asymptotically than the worst case: If the data are randomly
presented to the insertionsort, then on average we save only half the worst-case
time because on average we have to move an entry halfway into the list before it
can be placed in its proper location.
In the case of heapsort, the best-case, worst-case, and average-case running times
are all O(N lg N), making it at least a totally predictable algorithm to implement.
Quicksort, on the other hand, to be presented later in the chapter, has become the
de facto standard sorting method because, although its worst-case time is O(N²),
its average-case running time is O(N lg N) and its operating characteristics make it
highly suited for a variety of standard computing platforms. Further, the constant
out in front of the big Oh seems, based on experimentation, to be better than that
of heapsort. Our own tables and graph in Figures 11.5–11.7 bear out this fact.
One feature of sorting that makes it somewhat easier to analyze the algorithms
is that we know exactly what the possible data inputs are. Given three data items
a, b, c to be sorted, they can be presented to the sort in exactly 6 = 3! possible
permutations (orderings): abc, acb, bac, bca, cab, cba. With four data items, there are
4! = 24 possible permutations:
abcd, abdc, acbd, acdb, adbc, adcb,
bacd, badc, bcad, bcda, bdac, bdca,
cabd, cadb, cbad, cbda, cdab, cdba,
dabc, dacb, dbac, dbca, dcab, dcba.
And so forth. If we have N data values, then we can have N! possible permutations of those data values, as is easily seen: We can choose the first value in N ways. For the second value, we have N − 1 unused values, and we can thus choose the second value to be any of these. For the third value, we have N − 2 choices, and so forth. All these choices are independent, so the number of possibilities is the product of the number of individual choices, and we have

N × (N − 1) × ··· × 2 × 1 = N!
total possibilities. As stated in Chapter 6, we can use Stirling’s formula to get
an approximation for the number of possible different permutations as data inputs.
This approximation has (N/e)^N as its main term, and grows faster than an ordinary exponential function e^N.
For examining the average-case behavior of an algorithm, the symmetry inher-
ent in the set of possible permutations helps to make analysis possible. For every
instance of insertionsort, for example, in which the new element to be inserted is
smaller than all the existing elements, and thus needs to be pulled all the way to the
front, there is a corresponding permutation in which the new element is larger than
all the existing elements and will only need to be compared against the element
at the end. These pairs of input permutations will be mutually exclusive and will
exhaust the possible inputs. In this instance, then, the average case can be found
as the average of best case and worst case.
The fact that the possible inputs to a sorting algorithm include all possible
permutations also makes harder our analysis and choice of algorithm, partly because
the number of permutations grows rapidly and partly because having all possible
inputs as legal means that we have no a priori information that we can use to our
advantage.
Unless we have additional knowledge of what the data will look like, we will not be
able to assume anything except the worst case for input data in analyzing worst-case
running time, and therefore we have to prepare for the worst. If, however, we have
reason to believe that the data do not appear in entirely random order, then it is
conceivable that a less-than-optimal sorting algorithm might still be appropriate. If,
for example, one has an array of N records already sorted, and it is necessary to add
only another m records, then an insertionsort will run in only m · N time. If m is
much smaller than N, this may not actually be so bad. If the situation is that a very
large file needs to be updated only occasionally, then the insertion of new records
with an insertionsort is likely to be preferable to even a good sorting algorithm if
the “good” algorithm would require resorting a large and almost-sorted file.
11.7 Mergesort
Mergesort is a divide-and-conquer algorithm that requires 2N space in order to
sort N records but completes a sort in O(N lg N) time. The 2N space includes the
original N-sized space to hold the array to be sorted, but the downside of mergesort
is that it cannot be done in place but must be done out of place using auxiliary
space equal to the size of the original data. This need for extra space is in significant
contrast to bubblesort, insertionsort, heapsort, and quicksort, each of which requires
only one extra storage location (as temporary space for the swap) beyond the space
for the original data array.
Mergesort can be described top-down or bottom-up. Either way, the motivation
for mergesort comes from the following steady state merge operation. Given a sorted
array A of length N and a sorted array B of length N, we create the sorted array
of length 2N containing all the elements of arrays A and B with the pseudocode
in Figure 11.4 (this isn’t correct Java, but the intent of the pseudocode should be
obvious and we want to emphasize the array lengths).
What this code fragment does is the following. We have a pointer into the array
A, a pointer into the array B, and a pointer into the array C, each of which is
initialized to point to the beginning of the arrays.
We now repeat:
If the value pointed to in the A array is smaller than the value pointed to in
the B array, then we copy the value from the A array, bump the A pointer by
one, and bump the C pointer by one.
If the value pointed to in the B array is the smaller, then we copy that value
into C and bump the pointers for B and C instead of those for A and C.
As long as we have data in both arrays, we copy the smaller of the two values into
the result array C, and thus in general merge the two sorted arrays into one sorted
array. At some point we run out of one or the other of A or B.
public int[2*N] merge(int[N] A, int[N] B)
{
    int[2*N] C;
    ASubscript = 0;
    BSubscript = 0;
    CSubscript = 0;
    while((ASubscript < N) && (BSubscript < N))
    {
        if(A[ASubscript] <= B[BSubscript])
        {
            C[CSubscript] = A[ASubscript];
            ++ASubscript;
        }
        else
        {
            C[CSubscript] = B[BSubscript];
            ++BSubscript;
        }
        ++CSubscript;
    }
    // Note that only one of the following two loops actually executes
    while(ASubscript < N)
    {
        C[CSubscript] = A[ASubscript];
        ++ASubscript;
        ++CSubscript;
    }
    while(BSubscript < N)
    {
        C[CSubscript] = B[BSubscript];
        ++BSubscript;
        ++CSubscript;
    }
    return C;
}
FIGURE 11.4
Pseudocode for a merge.
If it happens that the B array still contains data, then all the keys remaining in B are larger than any of
the keys in A, and since they are in sorted order already, we can simply copy them
in place into the result array. If it happens that the A array still contains data, we
do the corresponding copy from A.
We note three basic facts about this central operation.
First, the question of equal keys doesn’t affect anything. If there are equal
keys, then we can (unless there are other conditions imposed) choose entries
from either array without affecting the fact that the eventual array is in
sorted order.
Second, there is no way to avoid using extra storage space. In the best case,
we are interleaving entries from A and B, and we would need only fixed extra
storage. But in the worst case, all the entries of one array (say A) are smaller
than all the entries of B and thus we have to copy all of A into C and then copy
all of B. The only way to do this is by having space in the result array for both
subarrays.
Finally, this merge runs in linear time. With two arrays of length N, we can perform the merge in worst-case 2N comparisons (in the case of perfectly interleaved data).
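A legal-Java rendering of the merge of Figure 11.4, generalized slightly so that the two sorted input arrays need not have the same length, might look like this sketch; its three loops correspond one-for-one to the three loops of the pseudocode.

// Merge two sorted arrays A and B into a new sorted array of length
// A.length + B.length.
public static int[] merge(int[] A, int[] B)
{
    int[] C = new int[A.length + B.length];
    int aSub = 0, bSub = 0, cSub = 0;
    // While both arrays still have data, copy the smaller front entry.
    while(aSub < A.length && bSub < B.length)
    {
        if(A[aSub] <= B[bSub])
        {
            C[cSub++] = A[aSub++];
        }
        else
        {
            C[cSub++] = B[bSub++];
        }
    }
    // At most one of the following two loops actually copies anything.
    while(aSub < A.length)
    {
        C[cSub++] = A[aSub++];
    }
    while(bSub < B.length)
    {
        C[cSub++] = B[bSub++];
    }
    return C;
}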
The mergesort algorithm can be viewed either top-down as a recursive algorithm
or bottom-up as an algorithm that creates with each phase a sorted list of 2N
items by merging two lists of N items each. It is perhaps better understood in the
bottom-up form.
Consider an array of 2^n items arranged as 2^(n−1) pairs of data items. For example, if n = 4, we would have 16 items in all that could be arranged as 8 pairs as shown here.
(1 3)  (7 4)  (0 8)  (2 1)  (9 2)  (8 3)  (9 8)  (7 8)
In the first pass, we use one comparison per pair, eight times over, to sort the
eight pairs into proper order. (This is in fact the same algorithmic step as the merge
phases later on, but it’s a degenerate merge when we have only two items.)
(1 3)  (4 7)  (0 8)  (1 2)  (2 9)  (3 8)  (8 9)  (7 8)
We now apply the preceding merge operation on the four pairs of pairs.
(1 3 4 7)  (0 1 2 8)  (2 3 8 9)  (7 8 8 9)
We now have four quadruples, which we can merge in pairs into two sorted arrays of length 8.
(0 1 1 2 3 4 7 8)  (2 3 7 8 8 8 9 9)
Finally, we merge the pair of arrays of length 8.
(0 1 1 2 2 3 3 4 7 7 8 8 8 8 9 9)
We have described mergesort from the bottom up, which is arguably the more
intuitive approach. If we were to code this, the outer loop of the program would
look something like the following.
// N = 2^n items to be sorted
for(int phase = 1; phase <= n; ++phase)
{
    // merge 2^(n-phase) pairs of arrays of length 2^(phase-1) into sorted arrays of length 2^phase
}
Note that the code inside the loop requires O(N) comparisons. Our earlier
example merge code would perform a merge of two N-long arrays in 2N compar-
isons. Indeed we can merge all the pairs of arrays in time equal to the total length.
If we are merging pairs of arrays of length 8 into arrays of length 16, for example,
our loop would look something like the following.
pointer1 = 0
pointer2 = 8
while((pointer1 < N) and (pointer2 < N))
{
    merge entries at pointer1,...,pointer1+7 with entries at pointer2,...,pointer2+7
    store the merged set as entries pointer1,...,pointer2+7
    pointer1 = pointer1 + 8
    pointer2 = pointer2 + 8
}
Note that we are walking through the array exactly once, even though we are
using a merge loop nested inside the while loop.
Writing a mergesort bottom up like this is not especially difficult, but it is slightly
tedious. The subscripting is a bit complicated and we must get the subscripting
exactly correct. Mergesort can often be written recursively, and the recursive version
is slightly less complicated. In either case, the critical factor becomes the fact that
extra storage is necessary. Done wrong, a program would need additional storage
with every new phase of the sort. (Think back to the extra storage in the Jumble
puzzle code.) Done right, mergesort would be written to ping-pong from one data
array to another, taking data from array X in phase 1 and storing the result in
array Y, then taking data from array Y in phase 2 and storing the result in array X,
and so forth. If one were to implement mergesort recursively, one would absolutely
not want to implement the code as in Figure 11.4 that passed in data arrays as
parameters but would rather have the data arrays stored outside the method and
pointers (subscripts) passed as parameters to the method.
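One way to write such a ping-ponging, bottom-up mergesort is sketched below for a plain int array whose length need not be a power of two. Each phase merges runs of the current width from the source array into the destination array, the two arrays then trade roles, and a final copy puts the sorted data back into the original array if the last phase left it in the auxiliary one.

// Bottom-up mergesort that ping-pongs between the input array and one
// auxiliary array of the same size.
public static void mergesort(int[] a)
{
    int n = a.length;
    int[] src = a;
    int[] dst = new int[n];
    for(int width = 1; width < n; width *= 2)
    {
        // Merge each pair of adjacent runs of length 'width'.
        for(int lo = 0; lo < n; lo += 2*width)
        {
            int mid = Math.min(lo + width, n);
            int hi = Math.min(lo + 2*width, n);
            int i = lo, j = mid, k = lo;
            while(i < mid && j < hi)
            {
                dst[k++] = (src[i] <= src[j]) ? src[i++] : src[j++];
            }
            while(i < mid) dst[k++] = src[i++];
            while(j < hi) dst[k++] = src[j++];
        }
        int[] temp = src;   // ping-pong: the merged runs become next phase's source
        src = dst;
        dst = temp;
    }
    if(src != a)            // sorted data ended up in the auxiliary array
    {
        System.arraycopy(src, 0, a, 0, n);
    }
}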
11.7.1 Analysis of Mergesort
The easiest way to analyze mergesort is to look at the bottom-up algorithm. For simplicity, we consider the problem of sorting N = 2^n items. In the first merge phase, we use O(N) comparisons to put 2^(n−1) pairs into sorted order. In the second merge phase, we use O(N) comparisons to merge 2^(n−2) quadruples into sorted order. In the third merge phase, we use O(N) comparisons to merge 2^(n−3) octuples into sorted order. This continues until the entire list is sorted. If N = 2^n, we will need to continue for n = lg N phases in order to sort the entire list, which means that overall we will be doing O(Nn) = O(N lg N) comparisons.
11.7.2 Disk-Based Sorting
We would be derelict in our duty if we did not make one final observation with regard
to mergesort. As will be taught in computer architecture and operating systems
courses (at least), modern computers are built so as to maximize the appearance
of an ideal computer to the user and to minimize for the user the complexity of the
physical device. This is done in part by trying to match what can be done efficiently
on the hardware with the choice of what could be wanted by the user. Since the cost
of physically moving a read head on a disk to a particular location on a disk is high,
compared with the cost of actually doing the read of data, streaming N sequential
items off a disk into memory is much faster than searching for and accessing N
specific items located randomly throughout a file. One slow positioning followed
by N fast reads takes much less time than N slow positionings each followed by a
single fast read.
If the data to be sorted are on disk instead of in memory (and this will be
the case for extremely large files to be sorted), there will be a huge advantage in
implementation for a sorting algorithm that can take advantage of streaming data
from a disk, and in this regard, mergesort excels. The mergesort algorithm streams
data sequentially from two source locations (files), performs simple (and thus fast)
comparisons, and then streams the result data back to the disk to be stored (in
sequential order). Its execution characteristics thus match the characteristics of the
underlying hardware. Although this doesn’t show up in asymptotic analysis (except
in the constant out front), it can make a huge difference in practice.
11.8 Quicksort
We now come to the last of the sort methods we will examine, which has become
the de facto standard as the best compromise between worst-case asymptotics,
average-case asymptotics, and execution characteristics on extant machines.
To understand the notion of quicksort, we recall one aspect of each of two previous
sorting algorithms. First, we recall the recursive nature of mergesort when run on an
array of N = 2^n data items: by splitting the sorting problem in half for each phase, we run n = lg N phases. Each phase k comprises 2^(n−k) subsorts, each of which is done on 2^k data items, and each phase requires N = 2^n = 2^k · 2^(n−k) comparisons
and exactly one traversal of the entire data set. This gives us a running time of
N lg N, which is good (we will see in a moment that it’s the best we can do). The
negative feature of mergesort is the need to do the merge using extra storage space.
Quicksort accomplishes this recursive sorting without needing extra space.
On the other side, we remember that both bubblesort and insertionsort fixed, in
a single iteration of the outer loop, one element in the array into its proper place.
Since only one element is properly positioned with each outer iteration, and such a
positioning takes N comparisons, these sorts run in N² time.
In contrast, what we will be able to accomplish with quicksort is to fix one
element in phase 1, then two elements in phase 2 (one in each of the two half-arrays
passed recursively to the sort), then four elements in phase 3 (one in each of the
quarter-arrays), and so forth. If we can do this, then in n = lg N phases, the kth of which fixes 2^(k−1) elements, we will be able to fix

1 + 2 + 2² + ··· + 2^(n−1) = 2^n − 1 = N − 1

elements. Fudging N and N − 1 for convenience, this means we will have an
O(N lg N) algorithm, with lg N phases, each of which takes N time.
The Achilles heel of quicksort, when it comes to worst-case asymptotics, is that
we cannot guarantee that our recursive breakdown of the array will divide the array
into equal halves at each step. If we are lucky, and we get roughly equal halves, then
we will run lg N recursive phases just as with mergesort, and on average we will
get the behavior we need for an efficient algorithm. Each of the recursive phases of
quicksort will run in O(N) time. In the worst case, however, it is possible that when we split a subarray of length m into two "halves," we will in fact have one "half" consisting of a single element and the other half containing m − 1 elements. If this happens with every phase, then we will have N phases and not lg N phases, and the total running time will be O(N²). Referring to the previous paragraph, in this worst case we will not be fixing multiple elements into their proper location but in fact fixing only a single new element with each phase.
11.8.1 The Quicksort Algorithm
The basic quicksort algorithm is as follows.
1. We choose a pivot element whose key is k. This can be any element in the array,
but most implementations will choose either the first or the last element.
2. We divide the full array into three subarrays: subarray A contains the elements
whose keys are less than k; subarray B contains the elements whose keys are
equal to k; and subarray C contains the elements whose keys are greater than k.
3. We recursively sort subarrays A and C, and return the elements of A, B,and
C in that order.
In general, dealing with the case of equal keys adds a certain complexity but
no further understanding to the problem of sorting, and we will assume in our
examples that all keys are distinct.
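A recursive sketch of this scheme for a plain int array, pivoting on the last element of each subarray as in the walkthrough that follows, might look like the code below; it is not tuned for equal keys or for the pivot-choice and cache issues discussed later in the chapter.

// Recursive quicksort sketch for an int[], pivoting on the last element.
public class QuicksortSketch
{
    public static void quicksort(int[] a)
    {
        quicksort(a, 0, a.length - 1);
    }

    private static void quicksort(int[] a, int lo, int hi)
    {
        if(lo >= hi) return;             // zero or one element: already sorted
        int pivot = a[hi];               // pivot on the last element
        int left = lo;
        int right = hi - 1;
        while(left <= right)
        {
            // Skip entries that are already in the correct half.
            while(left <= right && a[left] < pivot) ++left;
            while(left <= right && a[right] > pivot) --right;
            if(left < right)
            {
                int temp = a[left];      // both out of place: exchange them
                a[left] = a[right];
                a[right] = temp;
                ++left;
                --right;
            }
            else if(left == right)
            {
                break;
            }
        }
        // 'left' is now the first element of the "larger-than" subarray;
        // exchange it with the pivot to drop the pivot into place.
        int temp = a[left];
        a[left] = a[hi];
        a[hi] = temp;
        quicksort(a, lo, left - 1);
        quicksort(a, left + 1, hi);
    }
}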
11.8.1.1 Quicksort: The Magic
Clearly, if we can divide the array into equal subarrays in every phase, then we
would have lg N phases in the computation. If each phase were to take N steps,
the running time would be N lg N. The crucial issue is then the choice of the pivot
element. A good choice splits the array in half; a bad choice might leave either
subarray A or C empty with all the elements in the other subarray.
Let’s assume for the moment that we have a magical algorithm for choosing a
pivot element. We must still show how to perform each phase of the array-splitting
of step 1 in linear time so as to hope for N lg N overall running time, and we
must show how to perform this splitting so as to do better than mergesort by not
requiring extra space. Consider the following array.
97 11 90 81 34 40 65 72 51 23 57
We choose to pivot on the last element 57, and we set up pointers at the beginning
of the array and at the penultimate entry of the array.
97 11 90 81 34 40 65 72 51 23 57
(the left pointer starts on 97 and the right pointer on 23)
We now move forward from the right pointer. As long as the entry pointed to
by the right pointer is larger than the pivot, we keep moving the right pointer to
the left and checking again. In this case, 23 < 57 so we switch to moving the left
pointer. We test and immediately determine that 57 < 97, so our left pointer and
our right pointer are pointing to elements that are in the wrong half of the array.
We exchange these two values.
97 11 90 81 34 40 65 72 51 23 57
23 11 90 81 34 40 65 72 51 97 57
We now move the right pointer once and find that 51 < 57, so we stop moving
the right pointer and switch again to the left pointer. We find that 11 is in the
correct subarray, move the left pointer once, and find that 90 is not in the correct
subarray.
23 11 90 81 34 40 65 72 51 97 57
(the right pointer now rests on 51 and the left pointer on 90)
We exchange 90 and 51 and continue, moving first the right and then the left
pointer until we encounter entries in the wrong locations and then exchanging.
23 11 51 81 34 40 65 72 90 97 57
(the right pointer moves past 72 and 65 and stops on 40; the left pointer stops on 81; we exchange them)
23 11 51 40 34 81 65 72 90 97 57
(the pointers then meet at 34 and the scan of this subarray is finished)
We now exchange the pivot element with the first element of the “larger-than”
array. The pivot is now in its proper location and we have two subarrays that we
hope are of roughly equal size.
(23 11 51 40 34) 57 (65 72 90 97 81)
We make the recursive calls to sort the subarrays.
(23 11 51 40 34) 57 (65 72 90 97 81)

In the left subarray we pivot on 34. The pointers move without finding any pair to exchange, and we then exchange the pivot with the first element of its "larger-than" subarray:

(23 11 34 40 51) 57 (65 72 90 97 81)
(23 11) 34 (40 51) 57 (65 72 90 97 81)

In the right subarray we pivot on 81, and the same thing happens:

(23 11) 34 (40 51) 57 (65 72 81 97 90)
(23 11) 34 (40 51) 57 (65 72) 81 (97 90)
At this point, as with mergesort, we have recursed down to subarrays of length
two, and we can simply compare and exchange. The array has been sorted.
(11 23) 34 (40 51) 57 (65 72) 81 (90 97)
This was an example created to be a best-case example. We split 11 elements
into a pivot and two 5-element subarrays, and each subarray was further split into
a pivot and two 2-element subarrays.
What allows us to get a linear time for the array-splitting, however, is that we
have traversed the entire array only once for each splitting phase of the algorithm.
We move pointers and compare elements once in each subarray, and all the subarrays
are nonoverlapping, so we never actually look at any particular element more than
once. This gives us worst-case linear time for each phase.
To see how a worst-case quicksort would execute, consider the following sorted
array.
1 2 3 4 5 6 7 8 9 10 11
When we start moving pointers in the first phase, we find that the pivot element
is larger than the rest of the array, and one of our two subarrays is empty. After
the first phase, we have produced the unsatisfying tableau
(1 2 3 4 5 6 7 8 9 10) 11 ( ).
In the next phase, this continues to be true. If we choose 10, the last element,
as our pivot, then after the entire second phase we have
(1 2 3 4 5 6 7 8 9) 10 ( ) 11.
After the third phase we have
(1 2 3 4 5 6 7 8) 9 ( ) 10 11
and so on. Since we are fixing only one element per phase into its proper location,
we will need nine phases (assuming that when only two elements remain we can do
a comparison and return).
The difference between the best-case O(N lg N) and worst-case O(N²) running
times clearly lies in our choice of the pivot element. What we would like to have is
an oracle that would tell us which element chosen as pivot would split the array in
half, but if we were that good at predicting the future, we’d be working on Wall
Street instead of learning how to write programs and design algorithms. There are
variations of quicksort that work to mitigate the performance degradation from
pathologically bad input data by choosing a better pivot element. For example,
instead of choosing the last element in the array every time, one could sample three
values from throughout the array and take the middle as the pivoting value. In
the case of already-sorted data, a random sample would be more likely to return
a better pivot; if we did this in the preceding array, we would get a pivot value
of 6 for the first phase, which would be clearly superior. On the other hand, for
any such sampling scheme, there will be some pathological worst case, so no pivot
choice method will be able to overcome the fact that quicksort has O(N²) running
time when presented with the worst case of data input for that choice.
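A median-of-three variation might choose its pivot with a helper along these lines (a hypothetical routine, not code from the text): sample the first, middle, and last entries of the subarray and move their median into the last position, where the partitioning code expects to find the pivot.

// Median-of-three pivot choice for the subarray a[lo..hi].
private static void medianOfThreeToEnd(int[] a, int lo, int hi)
{
    int mid = lo + (hi - lo)/2;
    // Order the three samples so that a[lo] <= a[mid] <= a[hi].
    if(a[mid] < a[lo]) swap(a, lo, mid);
    if(a[hi] < a[lo]) swap(a, lo, hi);
    if(a[hi] < a[mid]) swap(a, mid, hi);
    // The median of the three is now at a[mid]; make it the pivot at a[hi].
    swap(a, mid, hi);
}

private static void swap(int[] a, int i, int j)
{
    int temp = a[i];
    a[i] = a[j];
    a[j] = temp;
}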
11.8.2 Average-Case Running Time for Quicksort
The proof of the following important theorem is beyond the scope of this book, but
the statement of the theorem is not.
Theorem 11.1
The average-case running time of quicksort is O(N lg N).
We will show later in this chapter that no sorting algorithm that sorts by
making comparisons can sort asymptotically faster than O(N lg N). That result
coupled with this one is tremendously important. Together we thus know that
although the worst-case behavior of quicksort is as bad as the “naive” bubblesort
and insertionsort, the quicksort algorithm actually does as well as any algorithm
could possibly do when presented with a random set of input data.
We will also discuss later why quicksort is a preferred sort in terms of using the
physical hardware efficiently. It is the efficient use of hardware plus the best-possible
average-case running time that make quicksort the standard sorting algorithm in
most implementations.
11.9 Sorting Without Comparisons
All the sorting methods we have just described require comparisons. Under normal
circumstances, this is what we count when we consider the asymptotics for sorting
data. It is possible, though, under certain very restricted circumstances, to be able
to sort without comparisons. For example, if we happen to know that all the keys
have values in the range 1 through 100, we can sort the records using a bucketsort.
In such a bucketsort, we would set up 100 buckets of variable size, and instead
of comparing keys, we would simply look at the key value and store the record in
the appropriate bucket for that key.
Dealing with duplicate keys in a bucketsort requires further knowledge of what
is actually going on. In some cases, we don’t care whether there is one element or
many with a given key, because we need only know that some element occurred
with that key. For such an application, we need only keep a boolean value in
the array.
For other applications, we might be able to assume that all records with the same
key are identical. If we are sorting mail by zip code for a particular metropolitan
area so as to get bulk mail discounts, then this is a safe assumption because we
would probably be physically sorting mail into actual bins. Even if it is not a
physical sort, it may well be the case that it does not matter in which order records
with the same key value are stored. If it does matter, then it might still be the case
that the number of elements with the same key is small and we could use a linked
list with an insertionsort to store the elements in each bucket.
Unfortunately, this kind of sort is usually not possible because it requires that
we know in advance all the possible values and that we be able to set up a bucket
of arbitrary size for each possible key value. In the cases when it can be used, it
runs in linear time O(N) because we need only look at the record once and place
it in the correct bucket.
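As a sketch, if the keys are known to be integers in the range 1 through 100 and records with equal keys may be kept in arrival order, the buckets can simply be variable-sized lists indexed by key; no key is ever compared against another key.

import java.util.ArrayList;
import java.util.List;

// Bucketsort sketch for integer keys known to lie in the range 1..100.
public class BucketsortSketch
{
    public static int[] bucketsort(int[] keys)
    {
        final int MAX_KEY = 100;
        List<List<Integer>> buckets = new ArrayList<>();
        for(int k = 0; k <= MAX_KEY; ++k)
        {
            buckets.add(new ArrayList<>());
        }
        // No comparisons: each key tells us directly which bucket it goes in.
        for(int key : keys)
        {
            buckets.get(key).add(key);
        }
        // Read the buckets back in key order.
        int[] result = new int[keys.length];
        int out = 0;
        for(int k = 1; k <= MAX_KEY; ++k)
        {
            for(int key : buckets.get(k))
            {
                result[out++] = key;
            }
        }
        return result;
    }
}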
For the most part, though, we will be looking at sorting algorithms that require
comparisons, and in our asymptotic analysis we will measure the quality of an
algorithm by counting the number of comparisons.
11.10 Experimental Results
In Volume 3 of The Art of Computer Programming, Knuth presents empirical data
on the performance of various sorting algorithms as he had implemented them. We
have done our own simple implementations of bubblesort, insertionsort, heapsort, and quicksort in Java, and have counted the number of comparisons and the number of swaps when run on data arrays of lengths 127, 511, 2,047, 8,191, 32,767, and 131,071. We generated "reasonably random" arrays of these lengths of integers less than 99,991 (the largest prime smaller than 100,000). (Our generation method would almost certainly not hold up under a rigorous test of randomnicity, but we believe it is sufficient for this illustrative purpose.)
The raw data from these tests are presented in Figure 11.5. We show the number
of comparisons made and the number of swaps for each of the four sorting methods
on our six (ostensibly) random data sets.
Bubblesort Insertionsort
Data Comps Swaps Comps Swaps
127 8,001 4,024 4,146 4,024
511 130,305 64,184 64,690 64,184
2,047 2,094,081 1,058,276 1,060,314 1,058,276
8,191 33,542,145 16,804,307 16,812,488 16,804,307
32,767 536,821,761 268,317,072 268,349,824 268,317,072
131,071 8,589,737,985 4,054,936,240 4,296,967,076 4,296,836,017
Heapsort Quicksort
Data Comps Swaps Comps Swaps
127 1,423 836 930 215
511 7,826 4,416 4,883 1,100
2,047 39,431 21,707 24,596 5,430
8,191 190,910 103,572 121,762 25,365
32,767 894,810 479,714 594,064 116,024
131,071 4,103,982 2,181,680 2,665,459 523,552
FIGURE 11.5 Raw counts of comparisons and swaps.
We have theorems that claim that the worst-case running times for these four sorts are O(N²) and O(N lg N). If our "random" data produced running times like this, then we would have some reason to believe (but not a proof) that perhaps these are not just worst-case running times but also average-case running times.
Bubblesort/N²          Insertionsort/N²
Data Comps Swaps Comps Swaps
127 0.496 0.249 0.257 0.249
511 0.499 0.246 0.248 0.246
2,047 0.500 0.253 0.253 0.253
8,191 0.500 0.250 0.251 0.250
32,767 0.500 0.250 0.250 0.250
131,071 0.500 0.236 0.250 0.250
Heapsort/(N lg N) Quicksort/(N lg N)
Data Comps Swaps Comps Swaps
127 1.601 0.941 1.046 0.242
511 1.702 0.960 1.062 0.239
2,047 1.751 0.964 1.092 0.241
8,191 1.793 0.973 1.143 0.238
32,767 1.821 0.976 1.209 0.236
131,071 1.842 0.979 1.196 0.235
FIGURE 11.6 Normalized counts of comparisons and swaps.
FIGURE 11.7 Comparison counts for the four sorting algorithms: bubblesort comps/N², insertionsort comps/N², heapsort comps/(N lg N), and quicksort comps/(N lg N), plotted against data sizes from 127 to 131,071.
In fact, that's exactly what we see. In Figure 11.6 we divide these counts by N² and by N lg N as appropriate so as to estimate the constant C in the asymptotic estimate. The data of Figure 11.6 is graphed in Figure 11.7. It is reassuring to notice that as the data size increases, the "constant" is in fact reasonably constant, although there is some upward creep in both heapsort and quicksort. Our earlier estimate was that heapsort as we implemented it would require 2N lg N comparisons, so we would expect the constant to stabilize at no larger than 2.0.
One obvious observation is that quicksort is indeed faster than heapsort on this
(allegedly) random data. In addition to the implementation advantage that quicksort has, it would appear that on random data it also has a smaller constant factor in its O(N lg N) running time. If we accept the truth of Theorem 11.1, we should not be surprised at
the graph for quicksort, but it’s nonetheless reassuring to have observations fit
the theory.
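The counting itself requires nothing more elaborate than a pair of counters incremented inside the sort. The sketch below shows the idea with an instrumented bubblesort and a simple random-array generator; it is illustrative only and is not the code that produced Figures 11.5 through 11.7.

// Sketch of an instrumented sort that counts comparisons and swaps.
public class CountingBubblesort
{
    long comparisons = 0;
    long swaps = 0;

    public void sort(int[] record)
    {
        for(int length = record.length; length > 1; --length)
        {
            for(int i = 0; i < length-1; ++i)
            {
                ++comparisons;
                if(record[i] > record[i+1])
                {
                    ++swaps;
                    int temp = record[i];
                    record[i] = record[i+1];
                    record[i+1] = temp;
                }
            }
        }
    }

    public static void main(String[] args)
    {
        int n = 2047;
        int[] data = new int[n];
        java.util.Random rng = new java.util.Random(42);
        for(int i = 0; i < n; ++i)
        {
            data[i] = rng.nextInt(99991);   // keys less than 99,991
        }
        CountingBubblesort sorter = new CountingBubblesort();
        sorter.sort(data);
        System.out.println("comparisons = " + sorter.comparisons
                           + ", swaps = " + sorter.swaps);
    }
}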
11.11 Auxiliary Space and Implementation Issues
When analyzing a sorting algorithm, the most prominent issue is the asymptotic
running time. In theory, this is the most important aspect of a choice of sorting
methods. Asymptotic running time, however, is not the only criterion to be consid-
ered. Indeed, if one is actually going to be sorting data (as opposed to proving theorems about how one might sort data), then all of the real-world
considerations and characteristics of the problem to be solved must be brought into
the discussion. The U.S. Internal Revenue Service, for example, has a very different
problem when sorting some 200 million tax records that live on disk than does an
Internet server that has to take in packets in real time and sort all the packets for a
particular transmission in order to deliver them to the eventual recipient. We don’t
expect the IRS to be sorting constantly, but when it does it will sort a large number
of items. The packet server, in contrast, is sorting a small number of items, but it’s
doing a new sort on a new message all the time.
Further, data that is known to be almost sorted a priori should probably be
sorted differently from data about which nothing is known. Data to be sorted on a
large mainframe might well be sorted differently from data on a desktop or laptop.
Unless one is an algorithmist or theoretical computer scientist, the goal in imple-
menting a sorting algorithm is probably to be able to sort data, and the real world,
however messy, must intrude.
11.11.1 Space Considerations
The first implementation issue to be considered is whether extra space is needed for
the sorting algorithm. In the case of bubblesort, insertionsort, heapsort, and quick-
sort, no extra space is needed other than the one extra location necessary for tempo-
rary storage in doing a swap of two records. The array in which the data are stored
at the beginning of the sort is the array in which the data are stored at the end.
In the case of mergesort, extra space is required, and the amount of extra space is
O(N), equal to the size of the original array. On the one hand, this is a major burden
on mergesort if all the data must be kept in memory. On the other hand, mergesort is
the one sorting algorithm, of the several we have discussed, that is highly suitable for
disk-to-disk sorting of very large files. One characteristic of mergesort that is highly
desirable for this purpose is that all the data are accessed in a purely sequential
way. Two (large) files are brought in from disk sequentially for comparisons and
the result array is written exactly in sequence to the output file. Since a disk is
at its best when streaming data to the processor, this match between best use of
the hardware and the needs of the sorting algorithm make mergesort a favorite for
large-file sorts. This is in sharp contrast to heapsort, whose access pattern goes all
over the array, pulling up entries n, 2n, and 2n + 1 that will be located in very
different places on disk. Even quicksort is likely to perform poorly in a disk-to-disk
implementation because it requires accessing and then rewriting (or not) at specific
locations that may or may not be contiguous on disk.
11.11.2 Memory
Of the three O(N lg N) sorting algorithms we have discussed—mergesort, heapsort,
and quicksort—we can generally decide that mergesort is not the best general-
purpose sort because it requires O(N) extra space. Between heapsort with guaran-
teed O(N lg N) running time and quicksort that is only guaranteed on average to
run in O(N lg N) time, it is somewhat surprising at first that quicksort has become
the standard sort found in software libraries, but that is in fact true.
Part of the reason for quicksort's success is that the memory access patterns for quicksort are far superior to those for heapsort given the constraints of standard
computer architectures. We normally only care about the running time of sorts when
we have large arrays to sort, so we assume, in dealing with real-world constraints,
that substantial memory space is needed for our array. The feature of quicksort
that makes it superior to heapsort for general applications is that quicksort usually
runs largely in cache, not in main memory.
Cache, which these days is usually on the processor chip, is a smaller, inter-
mediate memory space between the processor’s registers and logic units and the
RAM that lives elsewhere on the motherboard. For various reasons including the
fact that cache is actually on the processor chip, access to cache is much faster
than access to main memory. This comes at a price in real dollars, though, and
while many desktops are configured with some number of gigabytes of RAM, cache
is still measured in megabytes, with as much as 8 MB showing up only in current
high-end processors. For the most part below the level that any programmer can
control (even in assembly language), the processor loads blocks of data from the
main memory RAM into the cache either as the data is requested or in anticipation
of requests for the data. As long as the processor’s requests for data can be met
with values from cache instead of main memory, execution is faster.
It is here, with respect to cache, that heapsort’s memory access patterns work
against the algorithm. Records accessed at subscripts n, 2n, and 2n + 1 are likely
to be in different blocks of memory for large values of n, so they are unlikely to be
in cache because they are only accessed once every so often. In contrast, quicksort
accesses sequential locations in the array with its moving pointers, and it rapidly
recurses down to working on subarrays that will fit in cache. Once the subarray fits
entirely in cache, all the processing can be done by moving data in cache, not in
main memory, with a concomitant increase in speed.
11.11.3 Multilevel Sorts
It is not uncommon for a large general-purpose sort to be implemented as a mul-
tilevel sort with more than one sorting method. Although a mergesort is effective
on large files, its effectiveness is greater in the later phases when merging large,
sorted arrays than in the early phases. One might well implement a sort of large
data by first bringing into memory blocks of data in as large a block as would fit
conveniently and efficiently, sorting that data in memory with a quicksort, and then
writing that block of data back out to disk. By doing this subsorting in memory,
one gets the advantage of a (faster) memory-based sort on blocks of data as large
as can be managed, and one can follow this with a disk-to-disk merge of the large
blocks of sorted data.
We will see a variation of this idea later in this chapter when we discuss sorting
on parallel computers.
11.11.4 Stability
A final consideration for sorting data is that some sorting methods are stable and
some are not stable. A sort is stable if two records that have the same key end up
being stored in the sorted array in the same order in which they are presented in the
original array. If record i precedes record j in the original array, and if both records
have the same key, then record i should still precede record j when the array is
eventually sorted. Stability of sorting algorithms is important, for example, when
one is adding new data to an already sorted list or when one is sorting on multiple
keys. If one is sorting on last name and is adding data to an existing list, one usually
wants the new data simply inserted where it ought to go in the augmented list. It
could be especially annoying to find that the original list of last name “Smith”
entries came out in a different order from what they had been before just because
a new “Smith” entry was added.
Not all sorts are stable. Heapsort, for example, is pretty much guaranteed not to
be stable. Bubblesort and insertionsort are stable provided one gets the “less-than”
and “less-than-or-equal” comparisons done properly. In the preceding implementa-
tion, two records are swapped if the key of one is greater than the key of the other.
If the keys are equal, the records are not swapped, so records of equal key end up
being stored in their original order, and the sort is stable. If one wrote the sort
comparison as if(key[i] >= key[j]), the records would still end up sorted, but
the sort would not be stable.
Quicksort is stable in the same way that bubblesort and insertionsort are stable,
that is, provided one takes care not to force the exchange of records with equal keys.
In all three instances we are moving records only because they are out of order,
and we do not move records that are locally in the correct order.
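A small sketch makes the point concrete. With the strict greater-than test, the two records below that share the key 2 come out of the sort in the same relative order in which they went in.

// Stability illustration: records with equal keys keep their original order
// when the swap test is a strict comparison.
public class StabilitySketch
{
    static class Person
    {
        int key;
        String label;
        Person(int key, String label) { this.key = key; this.label = label; }
        public String toString() { return key + ":" + label; }
    }

    static void bubblesort(Person[] record)
    {
        for(int length = record.length; length > 1; --length)
        {
            for(int i = 0; i < length-1; ++i)
            {
                if(record[i].key > record[i+1].key)   // strictly greater, not >=
                {
                    Person temp = record[i];
                    record[i] = record[i+1];
                    record[i+1] = temp;
                }
            }
        }
    }

    public static void main(String[] args)
    {
        Person[] people = { new Person(2, "Smith-first"), new Person(1, "Jones"),
                            new Person(2, "Smith-second"), new Person(1, "Adams") };
        bubblesort(people);
        // Prints 1:Jones 1:Adams 2:Smith-first 2:Smith-second
        for(Person p : people) System.out.print(p + " ");
        System.out.println();
    }
}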
11.12 The Asymptotics of Sorting
We have seen three sorting algorithms that run in O(N lg N) time. What we will
now prove is that this is asymptotically the fastest possible running time for any
sorting algorithm that sorts by comparing keys. This is a very important result. If
the asymptotic running time of heapsort, mergesort, and quicksort (and any other
sort we can come up with that has this running time) is as good as is possible, then
the best we can do is to improve the constant out front and to improve the imple-
mentation; no further theoretical improvement is possible. As we have seen from
Figures 11.5 and 11.6, quicksort appears to have a better constant than heapsort, at
least on our experimental data. As we have discussed previously, it also has better
characteristics when implemented on modern computers. Any algorithm that would
replace quicksort as the standard would have to have an O(N lg N) running time
and a better constant on observed execution instances or better implementation
characteristics.
11.12.1 Lower Bounds on Sorting
What we are going to prove in the next few pages is a lower bound on the cost of sort-
ing by comparisons. That lower bound will be O(N lg N). Since that is the worst-case
running time for heapsort and mergesort, we can conclude that no sorting algo-
rithm that uses comparisons will be asymptotically faster than either of these. If one
algorithm is to be faster than another, it will be either because the constant out front
is smaller (remember Figure 6.1) or because the algorithm just happens to be better
suited to the way certain computers (possibly most computers) are constructed.
Consider an algorithm to sort three elements {a, b, c} by comparison of keys as a
decision tree, as in Figure 11.8. What we are going to do is show that there is a lower
bound on the number of comparisons needed to sort by comparisons. Specifically,
we will show that no comparison-based sort can possibly sort all possible input
permutations in fewer than O(N lg N) comparisons.
FIGURE 11.8 A decision tree for sorting three elements a, b, c. The root asks whether a < b. On the "yes" side we next ask whether b < c (yes gives the order a, b, c) and, if not, whether a < c (yes gives a, c, b; no gives c, a, b). On the "no" side we ask whether a < c (yes gives b, a, c) and, if not, whether b < c (yes gives b, c, a; no gives c, b, a).
What we have in the decision tree is a tree of comparisons to be applied in order
going down from the root, so that after the sequence of comparisons down any given
path, we know that the leaf provides the order of the input data.
For example, for three inputs {a, b, c}, with the six possible ways in which those
values could appear in sorted order, we could first compare a against b. If a is less than b, then we have one of the three orders a < b < c, a < c < b, or c < a < b. And that's all we know after that one comparison. If we next compare a against c, and find that a is in fact less than c, we can exclude the case c < a < b and we know we have one of the two possibilities a < b < c or a < c < b. A third comparison of b against c is necessary to resolve this uncertainty.
If we can prove a lower bound for sorting using decision trees, then clearly the
number of levels in the tree is that lower bound because each level corresponds to
one comparison. We point out that there are many decision trees, all of which result
in being able to uniquely identify which correct sorted order is found in the data.
If we are to prove a lower bound, we have to show that no decision tree has fewer
than some minimum number of levels.
Let’s just hammer away with what we know:
There are N! total permutations of N elements, so a decision tree for sorting
N elements must have at least N! leaves. If we don’t have a path from the root
(the first comparison) down to every possible input permutation, then we can’t
actually sort.
For a binary tree of k leaves and height h, we have k ≤ 2^h, that is, h ≥ lg k.
Thus a decision tree with N! leaves must have height at least lg(N!).
The lower bound for number of comparisons is the lower bound for the height
of the decision tree.
So the lower bound for sorting by comparing keys is lg(N!).
It may take a moment to understand why this guarantees a lower bound and not
just a lower bound for this instance or this representation of the problem. We have
chosen to compare a with b in the top level, and then made other choices as well.
If we chose to compare a with c at the top level, we could produce a decision tree
that was merely the result of exchanging b for c throughout the tree. Any other
decision tree is merely the result of permuting the input of the three elements a, b,
and c, not a change to the algorithm itself.
Next, we can observe that all these comparisons are necessary. With each com-
parison, we have a set of possible permutations for either of the two outcomes of the
comparison. If we take the right path for the first two levels, we know that a is not
less than b and that a is not less than c. We have at this point no information as to
the relative position of b and c. Both b < c < a and c < b < a are possible, and since
this is a sorting algorithm, any possible ordering of the three inputs is a possible
input to the algorithm. Unless and until we compare b against c, we cannot deter-
mine which of the two remaining possibilities is correct. Unless we have a decision
tree with at least as many leaves as there are possible input possibilities, we will
not have a path of decisions that will uniquely identify each of the possible inputs.
11.12.2 A Proof of Lower Bounds
We now know that it takes at least O(lg(N!)) comparisons to sort N items. The
question is, exactly how big is lg(N!)? The answer, at least for today’s purposes,
comes from Stirling’s formula that we introduced in Chapter 6.
Theorem 11.2
Sorting N items by comparisons takes at least O(N lg N) comparisons.
Proof By what we have just done in the preceding section, what we need to show
is that
lg(N!) = O(N lg N).
From Stirling's formula we have

    N! = √(2πN) (N/e)^N e^(α_N).

This means that

    lg(N!) = (1/2)(1 + lg π + lg N) + N(lg N − lg e) + α_N lg e
           = N lg N − N lg e + (1/2)(1 + lg π + lg N) + α_N lg e
           ≥ N lg N − N lg e
           ≥ (1/2) N lg N      (for large N)
           = O(N lg N)

comparisons are needed to sort N items for large N. The first line becomes the
second line by simply rearranging terms; the second line becomes the third because
we are throwing away terms that are known to be positive; and the final inequality
holds because we know that N lg e = o(N lg N), so for large N we have N lg e ≤
(N lg N)/2, and the inequality holds. Sorting cannot be done in fewer than O(N lg N)
comparisons.
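For readers who want to see how tight this estimate is, the following short Java program (our own illustration; the class and method names are not from the text) computes lg(N!) exactly as a sum of logarithms and compares it with N lg N and with N lg N − N lg e:

public class LgFactorialCheck {
    // lg(N!) computed exactly as lg 2 + lg 3 + ... + lg N.
    static double lgFactorial(int n) {
        double sum = 0.0;
        for (int i = 2; i <= n; i++) {
            sum += Math.log(i);
        }
        return sum / Math.log(2.0);
    }

    public static void main(String[] args) {
        double lgE = 1.0 / Math.log(2.0);                       // lg e
        for (int n : new int[] {16, 256, 4096, 65536}) {
            double exact = lgFactorial(n);
            double upper = n * (Math.log(n) / Math.log(2.0));   // N lg N
            double lower = upper - n * lgE;                     // N lg N - N lg e
            System.out.printf("N=%6d   lg(N!)=%12.1f   N lg N - N lg e=%12.1f   N lg N=%12.1f%n",
                              n, exact, lower, upper);
        }
    }
}

Running it shows lg(N!) sitting between N lg N − N lg e and N lg N, just as the derivation above predicts.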
11.12.3 Lower Bounds for the Average Case
We have shown that the lower bound for sorting is O(N lg N) by showing that in
the worst case the maximal path down a decision tree has length at least lg(N!).
We can actually prove something somewhat stronger, which is that the average
path length down the decision tree must be at least lg(N!).
What we can conclude from this is that we don’t have pathological worst cases that
might be few and far between. Quite the contrary, if we have best cases that take
less time, it is the best cases that are exceptional and few and far between.
Consider the general decision tree for sorting N items. The tree has N! leaves,
so the average path length from the root to a leaf is

    (sum of all path lengths to leaves) / N!.
We start with a lemma about the numerator.
Lemma 11.3
The sum of all path lengths to leaves is minimized when the tree is completely
balanced.
Proof If we disconnect any subtree from a balanced binary tree and move that
subtree so as to connect it somewhere else in a binary tree, we necessarily have
increased the sum of the path lengths.
We can now prove the theorem.
Theorem 11.4
The average case of any sort by comparison of keys is at least lg(N!) comparisons.
Proof By Lemma 11.3, the minimal sum of all path lengths to leaves is at least
(N!) · (lg(N!)),
so the average path length is bounded below by lg(N!).
Thus, sorting either in the worst case or in the average case takes at least
O(N lg N) comparisons. What this means is that we cannot improve upon sorting
methods such as heapsort and mergesort in the worst case, and we cannot improve
upon heapsort, mergesort, or quicksort in the average case.
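As a small check of Theorem 11.4 (our own arithmetic, not from the text), take N = 3, so the decision tree has 3! = 6 leaves. The most balanced binary tree with six leaves has two leaves at depth 2 and four at depth 3, so

\[
\text{average path length} \;\ge\; \frac{2\cdot 2 + 4\cdot 3}{6} \;=\; \frac{16}{6} \;\approx\; 2.67 \;\ge\; \lg 6 \;\approx\; 2.585,
\]

consistent with the worst-case observation that three comparisons are needed.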
11.13 Sorting in Parallel
We will close this chapter with a section on parallel computing, specifically on
how to sort on a distributed memory parallel machine. The canonical version
of a distributed memory parallel computer is a collection of standard computers
connected by as fast an interconnect as one can afford. This is the model of
computation adopted by the very common Beowulf cluster computers. Each node
is an independent computer and can only access its own memory locations. If
processor A wants to get data stored in the memory of processor B, for example,
then A must send a message to B requesting data, and B must respond by sending
back the data in another message. This makes the communication of data the
bottleneck to efficient computation because the overhead of sending a message
across a slow connection (possibly just Ethernet) is very large when compared
with the cost of accessing a memory location local to the computer.
In a cluster computer, a head node computer farms out identical tasks to a
collection of compute nodes, each of which is an independent computer. The com-
pute nodes process for as long as they can using the data they have available to
them locally. At some point in the computation, the compute nodes will probably
have to exchange data. In the cheapest version of a Beowulf cluster, the inter-
connect would be simply an Ethernet connection; although faster interconnects
exist, they are still much slower than is a simple memory reference local to the
computer itself.
Invariably, it is this communication of data from one processor to another that is
the bottleneck that slows down the parallel computation. In the worst-case scenario,
if one processor is collecting data from all the other processors, then the incom-
ing data flood will overwhelm the bandwidth. Ideally, when we have to exchange
data, we would like to arrange things so that all processors are sending and receiv-
ing equal amounts of data because this will distribute the data equally across the
communication paths.
We note that the other kind of parallel computer is a shared memory parallel
computer in which all processors have access to all memory locations in the machine.
Clearly, this is a more powerful model of computation because it eliminates the need
for moving large quantities of data from one machine to another. However, this
power comes at a price—it is not unreasonable for the switch between processors
and memory to be the most expensive single part of the computer. It just isn’t
very easy to have lots of processors all with the ability to access huge amounts of
memory at the very high speeds and very fine granularity of loading and storing
entries in memory.
The contrast between the two different kinds of parallel computer is made very
clear when considering how to sort. Let's consider what happens when we use four
processors labeled P0 through P3 to sort 32 integers, and assume that the data
is initially distributed as shown in Figure 11.9. Remember that computation by a
processor on its own memory locations is fast, but communication of data from one
processor to another is slow.

P0   P1   P2   P3
25   27   26    3
29    4   24    5
 7    8   19    9
 2    1    2    2
10    3   12   13
15   14    5    4
18   21   20    3
22   30   28   31
FIGURE 11.9 Initial configuration of data for a parallel sort.
The first step in the parallel sorting process is for each processor to sort its
own data, using whatever is the fastest sort for that processor. (Probably this is a
quicksort.) This results in the data shown in Figure 11.10.
P0   P1   P2   P3
 2    1    2    2
 7    3    5    3
10    4   12    3
15    8   19    4
18   14   20    5
22   21   24    9
25   27   26   13
29   30   28   31
FIGURE 11.10 Data after the local sort.
Now the problem is to extract the sorted lists from the processors and merge
them together to create a single completely sorted list. So far, our computation
has resembled a mergesort, and if this were a single processor doing a mergesort
on disk, we would just continue with the merge. But the rules are different with
parallel machines, especially with Beowulf clusters. The overhead cost of the head
node's sending messages to P0 and P1, say, asking for the top two elements and
merging the two, is very high. If we are going to send messages and move data,
we want to move blocks of data; we absolutely do not want to pull off entries from
each processor one at a time. Further, if the head node is performing a merge from
P0 and P1, then P2 and P3 are sitting idle, and we don't want half of the machine
doing nothing; we want all processors working in parallel as much as possible.
that if we had a truly shared memory machine, then this would be a simple task
even in parallel; although there can be some contention in the lower levels of the
hardware by having multiple processors reading from the same hardware memory
banks, there is nothing algorithmic that prevents all processors in a shared memory
computer from reading different locations in memory simultaneously.
The solution is as follows. Given that we have n = 4 processors, we let each
processor select the n − 1 = 3 breakpoint elements that break up its local memory
into four pieces. In Figure 11.11, we read the marked breakpoint entries to mean,
for example, that entries for processor P2 that are less than or equal to 5 are in
the first quarter, entries greater than 5 but less than or equal to 19 are in the
second quarter, and so forth.
P0    P1    P2    P3
 2     1     2     2
 7*    3*    5*    3*
10     4    12     3
15*    8*   19*    4*
18    14    20     5
22*   21*   24*    9*
25    27    26    13
29    30    28    31
FIGURE 11.11 Data after determining local breakpoints (breakpoint entries marked with *).
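As a concrete illustration of this sampling step, here is a minimal sketch (the method name localBreakpoints is ours, not from the text); given a processor's already-sorted local array and the number of processors n, it returns the n − 1 values that end each of the first n − 1 equal-sized pieces:

// Hypothetical helper: pick n-1 evenly spaced samples from sorted local data.
// For 8 local entries and n = 4, this returns the 2nd, 4th, and 6th entries,
// matching the marked breakpoints of Figure 11.11.
static int[] localBreakpoints(int[] sortedLocal, int n) {
    int[] samples = new int[n - 1];
    int chunk = sortedLocal.length / n;               // size of each local piece
    for (int i = 1; i < n; i++) {
        samples[i - 1] = sortedLocal[i * chunk - 1];  // last entry of piece i
    }
    return samples;
}

Applied to P2's sorted column (2, 5, 12, 19, 20, 24, 26, 28) with n = 4, it returns 5, 19, and 24, exactly the breakpoints shown above.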
Each processor now sends its samples to the head node, so the head node has
the data in Figure 11.12.
7 15 22 3 8 21 5 19 24 3 4 9
FIGURE 11.12 Head node sample data before sorting.
The head node now sorts these sampled values, and then it splits its sample data
into four quarters, as indicated by the marked entries in Figure 11.13.

 3   3   4*   5   7   8*   9   15   19*   21   22   24
FIGURE 11.13 Head node sample data after sorting (quartile breakpoints marked with *).
The head node now broadcasts the three global breakpoint values (4, 8, and 19) to all
processors. Each processor now sends all entries less than or equal to 4 to processor
P0, all entries greater than 4 but less than or equal to 8 to processor P1, all entries
greater than 8 but less than or equal to 19 to processor P2, and all entries greater
than 19 to processor P3. This results in the situation illustrated in Figure 11.14. In
this tableau we have listed the “local” data first and then the entries that would
have been received in processor subscript order, but this is arbitrary, since we cannot
usually rely on a cluster computer to send data in predictable ways. For example,
for P3, the entry 31 was already local, the entries 22, 25, and 29 come from P0,
the entries 21, 27, and 30 come from P1, and the entries 20, 24, 26, and 28 come
from P2.
Now, each processor sorts its own data locally, using whatever is its fastest sorting
algorithm, to produce the final tableau in Figure 11.15.
P0    P1    P2    P3
 2     8    12    31
 1     7    19    22
 3     5    10    25
 4     5    15    29
 2          18    21
 2          14    27
 3           9    30
 3          13    20
 4                24
                  26
                  28
FIGURE 11.14 Data after the global sends.
P0    P1    P2    P3
 1     5     9    20
 2     5    10    21
 2     7    12    22
 2     8    13    24
 3          14    25
 3          15    26
 3          18    27
 4          19    28
 4                29
                  30
                  31
FIGURE 11.15 Final tableau of data.
We can now stream the data in sorted order to the head node in processor
subscript order.
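The entire walkthrough can be simulated sequentially on one machine. The sketch below is our own illustration (on a real cluster the arrays would be exchanged as messages, for instance via MPI, rather than copied between Java arrays); it follows the same steps: local sort, local sampling, global breakpoints at the head node, redistribution, and a final local sort.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sequential simulation of the parallel sample sort described above.
// data[p] plays the role of processor p's local memory.
public class SampleSortSketch {
    public static int[][] sampleSort(int[][] data) {
        int n = data.length;                      // number of "processors"

        // Step 1: each processor sorts its own data (Figure 11.10).
        for (int[] local : data) Arrays.sort(local);

        // Step 2: each processor contributes n-1 local breakpoints (Figures 11.11, 11.12).
        int[] samples = new int[n * (n - 1)];
        int k = 0;
        for (int[] local : data) {
            int chunk = local.length / n;
            for (int i = 1; i < n; i++) samples[k++] = local[i * chunk - 1];
        }

        // Step 3: the head node sorts the samples and picks n-1 global
        // breakpoints (the marked values of Figure 11.13).
        Arrays.sort(samples);
        int[] global = new int[n - 1];
        for (int i = 1; i < n; i++) global[i - 1] = samples[i * (n - 1) - 1];

        // Step 4: every entry is routed to the processor whose range contains it
        // (Figure 11.14), and each processor sorts what it received (Figure 11.15).
        List<List<Integer>> buckets = new ArrayList<>();
        for (int p = 0; p < n; p++) buckets.add(new ArrayList<>());
        for (int[] local : data) {
            for (int value : local) {
                int p = 0;
                while (p < n - 1 && value > global[p]) p++;
                buckets.get(p).add(value);
            }
        }
        int[][] result = new int[n][];
        for (int p = 0; p < n; p++) {
            result[p] = buckets.get(p).stream().mapToInt(Integer::intValue).toArray();
            Arrays.sort(result[p]);
        }
        return result;    // concatenating result[0..n-1] gives the fully sorted order
    }
}

On the data of Figure 11.9, this produces exactly the columns of Figure 11.15.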
11.13.1 Analysis
Let’s step back a moment and think about what we have done, what will make this
algorithm work, and why this is a better approach than other ideas for sorting on
distributed memory computers.
First, we have been able to use the parallelism of the computer. Except for the
time spent by the head node in sorting the samples, all the processors were working
and were working in parallel. That’s a good thing.
Second, the time spent by the head node in sorting, when the other processors
are sitting idle, is probably relatively short. Each of n processors sends n − 1 val-
ues to the head node. For a 1024-processor machine (which is large but not huge),
this results in the head node’s having to sort about a million items. Given that
each compute node probably had gigabytes of storage, each local node was prob-
ably sorting much more than a million items, so the head node’s sort is small by
comparison.
The third consideration is the data movement. If there were n processors, then
each processor had to send nearly all its data to other processors, so in essence we
were moving all the data exactly once. But we were moving all the data simultane-
ously to all the other processors. We can’t do much about the fact that the overall
network might have been bandwidth limited, but we didn’t create any hot spots by
moving all the data on a single wire to a single machine.
Finally, let’s think about whether this actually works in practice. If the data get
distributed randomly to the various processors, then each processor’s data will be
similar. This means the breakpoints sent to the head node will be similar. When
the head node sorts to produce a set of global breakpoints, these breakpoints sent
back to the local processors will be fairly similar to what the local processors
already had. This means that we will in fact be sending about 1/nth of the data to
each of the other processors, thus balancing the load on the network. There will be
some imbalance in the size (look again at Figure 11.15, in which P1 is underloaded
and P3 is overloaded), but it ought not to be too bad if the data are essentially
randomly distributed.
Contrast this with the worst-case situation (at least for the load balancing of the
data movement) in which the data are already sorted when they get distributed to
the local processors, but the distribution is in reverse order. If the original configura-
tion, for example, is as in Figure 11.16, then each processor winds up having to send
all its data to a single other processor, and there is no real balancing of the load.
P0   P1   P2   P3
25   17    9    1
26   18   10    2
27   19   11    3
28   20   12    4
29   21   13    5
30   22   14    6
31   23   15    7
32   24   16    8
FIGURE 11.16 Worst-case parallel sort tableau.
In general, sorting is the worst possible kind of computation to be done on a
parallel machine. No matter where we initially place a data element on a local
processor, there is some input data set that would require the element eventually
to wind up at any other processor. In many parallel computations in physics, for
example, there is a known locality of reference. Seattle’s weather, for example,
has little effect on Miami, so one can safely task one processor with computing
Seattle weather and one with computing Miami weather and not expect there to be
much data movement between the two. But in sorting, all possible configurations
could happen. This makes controlling the worst case of data movement the primary
problem for a cluster computation.
As a general rule, the methodology used in this parallel sorting algorithm is the
norm for cluster computing. If we cannot a priori force data and computation to
be localized, then we rely on statistics to try to balance the load. If we suspect that
our data might already be partially sorted, then we would not choose to pivot in a
quicksort on the last element in the array but would instead do a quick sampling
of the array to pick a value we hope would be closer to the middle. Similarly, we
sample locally in this parallel sort and then sort the samples, in hopes of producing
global breakpoints that will break the entire data set into equal-sized chunks. If we
cannot rely on deterministic predictions of what the data will look like, then we
must rely on statistical analysis and hope for reasonably random inputs.
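For instance, the pivot-sampling idea mentioned above might look like the following sketch (our illustration, not the book's quicksort code): rather than always pivoting on the last element, we examine the first, middle, and last entries of the subarray and pivot on whichever holds the middle value.

// Median-of-three pivot selection: sample the first, middle, and last
// entries of the subarray and return the index of the median value.
// This guards against already-sorted input producing a worst-case pivot.
static int medianOfThreeIndex(int[] array, int low, int high) {
    int mid = low + (high - low) / 2;
    int a = array[low], b = array[mid], c = array[high];
    if ((a <= b && b <= c) || (c <= b && b <= a)) return mid;
    if ((b <= a && a <= c) || (c <= a && a <= b)) return low;
    return high;
}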
11.14 Summary
We have in this chapter provided a thorough introduction to the standard sort-
ing algorithms of bubblesort, heapsort, insertionsort, mergesort, and quicksort. We
have done both a worst-case running time analysis and an average-case running
time analysis of these algorithms, and discussed the nature of average-case analysis
compared to the (usually easier) worst-case analysis.
Having done an asymptotic analysis, we have gone further to consider the prac-
tical aspects of sorting, such as the use of auxiliary space, disk-based sorting for
very large data sets, the extent to which the input data is already sorted, multi-
level sorting, and stability. These considerations often lead to the choice of a sorting
algorithm other than the theoretically best one. Indeed, the common use of
quicksort as the default algorithm is not because its worst-case running time is opti-
mal, but because its average-case running time is O(N lg N), the pathological cases
seem very rare, and it can be implemented more efficiently on modern computers
than the worst-case asymptotically better heapsort algorithm.
Finally, in a starred section, we have discussed sorting on a distributed memory
parallel computer. This is an example of the use of the assumption of statistically
random data in an algorithm that can be shown to be efficient if that assumption
holds.
11.15 Exercises
1. Given the following sequence of integers, show the first three levels of the
mergesort from the bottom up (that is, first comparing adjacent pairs).
33, 55, 2, 19, 92, 18, 45, 74, 88, 27, 51, 6, 91, 22, 44, 13
2. If we choose the last element in the (sub)array as the pivot element, show
the first two complete phases of quicksort. The first phase should use the last
element as the pivot and result in two subarrays. The second phase should use
the last element in each subarray as the pivot and result in four subarrays.
33, 55, 2, 19, 92, 18, 45, 74, 88, 27, 51, 6, 91, 22, 44, 13
3. Repeat the previous exercise but choose the first element as the pivot instead
of the last.
4. In the best possible case, what is the minimum number of comparisons neces-
sary in order to determine the proper order of n integers?
What does this tell you about the minimum distance from the root to a leaf
in a decision tree for sorting?
5. Consider the possible input sequence of data items:
Freshman Tyrone
Sophomore Tyrone
Freshman Amanda
Sophomore Amanda
We would like this to be sorted first by class, then by name, so our desired
output is
Freshman Amanda
Freshman Tyrone
Sophomore Amanda
Sophomore Tyrone
We have said that our bubblesort as written is stable with the strict “greater
than” of Figure 11.1, and we have said that stability of sorting is important
when one is doing a multilevel sort first on one key, then on a second key.
With the strict >, we pull the first occurrence of the least value to the top; if
we wrote ≥ instead of >, we would pull the last occurrence of the least value
to the top.
Trace the execution of a bubblesort with ≥ replacing the >, sorting first on
class (freshman is lower, of course) and then alphabetically on name. Demon-
strate that we don’t get a stable sort by showing that we don’t get what we
want as the output.
(In the case of bubblesort, we write > more naturally because there is no
reason to do the extra work of an exchange for an equal key. In the case of
heapsort, for example, it is not at all obvious how we might be able to force
stability.)
6. In the preceding problem, we sorted first on class and then on name. Some
people might have argued that it would be easier to concatenate class and
name and do a single sort.
Under what circumstances might this not be such a simple thing to do
or perhaps not a desirable thing to do? (Hint: think Java, object-oriented
programming, reusable code, etc.)
7. The table in Figure 11.6 suggests that our insertionsort takes about 0.25N^2
comparisons and our quicksort takes about 1.1N lg N comparisons for small
N. If we accept these constants as the actual values, how large does N have
to be for quicksort to take fewer comparisons? Given that quicksort is a more
complicated program, does this suggest that we should switch to insertionsort
when the subarrays in quicksort get sufficiently small?
8. Implement a complete mergesort from the bottom up. Include with your sorted
output a counter of the number of comparisons.
9. Implement a complete mergesort recursively from the top down. Include with
your sorted output a counter of the number of comparisons.
10. Implement a complete quicksort as a recursive computation. Make sure that
you do not create new arrays and thus waste memory but rather are working
on subscripts pointing into an array that is declared as an instance variable
for the class. Include with your sorted output a counter of the number of
comparisons and compare this to Figure 11.6.
11. Implement quicksort, but this time, if the subarray is of length at least 9,
sample the first element, the last element, and the middle element, and choose
the middle value as the pivot. Count the comparisons and see if you do any
better than either your previous implementation or Figure 11.6. (Don’t worry
too much about rounding up or down for the pivot or about rounding up or
down to get the middle subscript. Statistically this shouldn’t matter, although
you might see if it does on the sample data you use.)
12. Implement a sort on an ArrayList of Record instances (from Figure 3.1) that
will allow you to sort either by name, by course being taught, or by office
number. The office number is formatted with the leading character as the
floor number, the second as the building wing, and then the room number,
and you want to sort by floor, then by wing, then by number. Use an interface
to allow a comparison method to accept as parameters both a Record value
and a constant to direct which field is to be used for sorting.