Entire books have been written on sorting and searching with computers. We introduce the topic here only to stress, once again, that writing programs is not the target of computer science; solving problems efficiently and effectively with the limited resources found in a computer is the real goal.

It turns out that computers spend a tremendous amount of time sorting. Just as we discussed different algorithms for computing prime numbers, we will now discuss three basic, comparison-based sorting algorithms. None of these are truly efficient. Efficient comparison-based sorting is beyond the scope of this introductory text. Additionally, we introduce a radix sort, one that capitalizes on the nature of the elements stored, rather than individual comparison between elements.

The sorting problem is described as follows:

Given a list of elements provided as input in any arbitrary order, these elements having an established ordinal value, namely a collating sequence, reorder them so that they appear according to their ordinal value from lowest to highest.

Gem of Wisdom

A common conjecture is that computers around the world spend the majority of their time sorting. Hence, it is difficult to talk much about computer science without talking about sorting. There are many sorting approaches.

For example, consider the following list of numbers as input: 5, 3, 7, 5, 2, 9. A sorted output corresponding to this input is: 2, 3, 5, 5, 7, 9.

In the following subsections, we will describe three comparison-based sorting algorithms and briefly compare them to demonstrate how to determine which one is the best. The implementation of each sorting algorithm will be presented in the context of grades for a final exam in a programming class. We want to provide a sorted list of the final scores, shown as percentages, to the students given an unsorted list. In each case, we describe the algorithm in plain language and then provide a corresponding Ruby implementation. Once the algorithms are presented, we discuss how we measure the notion of “best.”

7.1.1 Selection Sort

Selection sort is the simplest to explain and the most intuitive. Imagine you have a deck of cards in your hand, and they have numbers on them. If you wanted to sort them, one easy way is to just select the smallest number in the deck and bring it to the top. Now repeat the process for all cards other than the one that you just did. If you repeated this process until the entire deck was selected, you would end up with a sorted deck of cards. The algorithm just described is selection sort.

The selection sort algorithm is formally defined as follows:

  1. Start with the entire list marked as unprocessed.

  2. Find the smallest element in the yet unprocessed list; swap it with the element that is in the first position of the unprocessed list; reset the unprocessed list starting with the second element.

  3. Repeat step 2 for an additional n – 2 times for the remaining n – 1 numbers in the list. After n – 1 iterations, the n th element, by definition, is the largest and is in the correct location.

    We’ve already discussed arrays, so our Ruby code will first initialize an array and populate it with randomly generated numbers. The rand(x) function, where x is an integer, returns a randomly generated integer in the range [0, x].

The Ruby code for the selection sort is given in Example 7-1.

Example 7-1. Code for selection sort
     1 # Code for selection sort
     2 # 35 students in our class
     3 NUM_STUDENTS = 35
     4 # Max grade of 100%
     5 MAX_GRADE = 100
     6 num_compare = 0 
     7 arr = Array.new(NUM_STUDENTS)
     8 
     9 # Randomly populate arr
    10 for i in (0..NUM_STUDENTS - 1)
    11   # Maximum possible grade is 100%, keep in mind that rand(5) returns 
           possible values 0-4, so 
           we must add 1 to MAX_GRADE
    12   arr[i] = rand(MAX_GRADE + 1)
    13 end
    14 
    15 # Output current values of arr
    16 puts "Input list:"
    17 for i in (0..NUM_STUDENTS - 1)
    18   puts "arr[" + i.to_s + "] ==> " + arr[i].to_s
    19 end
    20 
    21 # Now let's use a selection sort. We first find the lowest number in the 
    22 # array and then we move it to the beginning of the list
    23 for i in (0..NUM_STUDENTS - 2)
    24   min_pos = i
    25   for j in (i + 1)..(NUM_STUDENTS - 1)
    26     num_compare = num_compare + 1
    27     if (arr[j] < arr[min_pos])
    28       min_pos = j
    29     end
    30   end
    31   # Knowing the min, swap with current first element (at position i)
    32   temp = arr[i]
    33   arr[i] = arr[min_pos]
    34   arr[min_pos] = temp
    35 end
    36 
    37 # Now output the sorted array
    38 puts "Sorted list:"
    39 for i in (0..NUM_STUDENTS - 1)
    40   puts "arr[" + i.to_s + "] ==> " + arr[i].to_s
    41 end
    42 
    43 puts "Number of Comparisons ==> " + num_compare.to_s  
  • Lines 3 and 5 declare important constants that represent the problem. If the number of students in the class changes, we have to change only one constant.

  • Line 7 initializes an array called arr that will hold the randomly generated numbers and ultimately the sorted list.

  • Lines 10–13 step through the array arr and initialize each element to a randomly generated number in the range [0, MAX_GRADE].

  • Lines 17–19 output the initial list so that you can examine its contents. Comment this out if you want to try a large set of numbers to sort.

  • Line 23 is where the real work begins.

  • Lines 23–35, the outer loop, ensure that we repeat the core of step 1 n – 2 times.

  • Line 24 is the first line of finding the minimum value in the list. We set the first position of the unprocessed list to min_pos.

  • Lines 25–30 iterate through the rest of the unprocessed list to find a value smaller than the item located at position min_pos. If we find such a value, as in line 27, we update the value of min_pos as in line 28. Once we have found the minimum value, we perform the latter part of step 2 and swap it with the first position in the unprocessed list. The outer loop repeats until the entire list is sorted.

  • Line 26 counts the number of comparisons performed and is simply here for pedagogical purposes to determine the best sorting algorithm.

  • Lines 38–43 output the sorted list and the number of comparisons.

Gem of Wisdom

Selection sort works by repeatedly finding the lowest remaining number and bringing it to the top. Selection sort is explained first since intuitively it is the easiest to understand. If you are confused by Example 7-1, come back to it after a break. Please do not just skip past it and hope that the rest of the chapter gets easier. It does not.

7.1.2 Insertion Sort

Insertion sort is a little trickier than selection sort. Imagine once again that you have a deck of cards and that you are given an additional card to add to this deck. You could start at the top of your deck and look for the right place to insert your new card. If you started with only one card and gradually built the deck, you would always have a sorted deck.

The insertion sort algorithm is formally defined as follows:

Step 1: Consider only the first element, and thus, our list is sorted.
Step 2: Consider the next element; insert that element into the proper position in the already-sorted list.
Step 3: Repeat this process of adding one new number for all n numbers.

The Ruby code for an insertion sort is given in Example 7-2.

Gem of Wisdom

Insertion sort works by leaving the first element alone and declaring it as a sorted list of size 1. The next element is inserted into the right position in our newly sorted list (either above or below the element we started with). We continue by taking each new element and inserting it in the right position in our list. By the end, all of our insertions result in a single sorted list.

Example 7-2. Code for insertion sort
     1 # Code for insertion sort
     2 # Declare useful constants
     3 NUM_STUDENTS = 35
     4 MAX_GRADE = 100
     5 num_compare = 0 
     6 arr = Array.new(NUM_STUDENTS)
     7 
     8 # Randomly populate arr
     9 
    10 for i in (0..NUM_STUDENTS - 1)
    11   arr[i] = rand(MAX_GRADE + 1)
    12 end
    13 
    14 # Output randomly generated array
    15 puts "Input array:"
    16 for i in (0..NUM_STUDENTS - 1)
    17   puts "arr[" + i.to_s + "] ==> " + arr[i].to_s
    18 end
    19 
    20 # Now let's use an insertion sort
    21 # Insert lowest number in the array at the right place in the array
    22 for i in (0..NUM_STUDENTS - 1)
    23   # Now start at current bottom and move toward arr[i]
    24   j = i
    25   done = false
    26   while ((j > 0) and (! done))
    27     num_compare = num_compare + 1
    28     # If the bottom value is lower than values above it, swap it until it
    29     # lands in a place where it is not lower than the next item above it
    30     if (arr[j] < arr[j - 1]) 
    31       temp = arr[j - 1]
    32       arr[j - 1] = arr[j]
    33       arr[j] = temp
    34     else
    35       done = true
    36     end
    37     j = j - 1
    38   end 
    39 end
    40 
    41 # Now output the sorted array
    42 puts "Sorted array:"
    43 for i in (0..NUM_STUDENTS - 1)
    44   puts "arr[" + i.to_s + "] ==> " + arr[i].to_s
    45 end
    46 puts "Number of Comparisons ==> " + num_compare.to_s
  • Lines 22–39 contain the core outer loop that inserts the next number in the list into the right place.

  • Lines 26–38 contain the inner loop that swaps numbers starting at the beginning of the unsorted list until the number falls into the right place.

  • Once the number is in the right place, the flag done is set to true in line 35.

7.1.3 Bubble Sort

Bubble sort is based on percolation; that is, elements successively percolate to the right order by swapping neighboring elements. This is like continuously and repetitively comparing adjacent pairs of cards within your deck.

The bubble sort uses two relatively straightforward loops. The outer loop ensures that the core process in the inner loop is repeated n – 1 times. The core process is to loop through the list and, for any successive elements in the list, check the following: if the value we are currently examining is larger than the next member of the list, simply swap those two values. Thus, each value will fall down the list into its proper place. Essentially, small values “bubble” to the top of the list, hence the name “bubble sort.”

The bubble sort algorithm is formally defined as follows:

Step 1: Loop through all entries of the list.
Step 2: Compare each entry to all successive entries and swap entries if they are out of order.
Step 3: Repeat this process a total of n – 1 times.

The Ruby code is given in Example 7-3. An efficiency optimization (not shown) terminates the processing once no swaps occur. This conceptually does not affect the efficiency of the sort, but typically does so in practice.

Gem of Wisdom

Bubble sort is a little tricky. It is not how people would likely sort. The premise is that if we repeatedly place successive elements in order, eventually the smallest element will bubble up to the top. It is clever and sometimes is more efficient than the other algorithms we have discussed. So it is worth knowing. Take some time and step through this code.

Example 7-3. Code for bubble sort
     1 # Code for bubble sort
     2 NUM_STUDENTS = 35
     3 # Max grade of 100%
     4 MAX_GRADE = 100
     5 num_compare = 0 
     6 arr = Array.new(NUM_STUDENTS)
     7 
     8 # Randomly put some final exam grades into arr
     9 
    10 for i in (0..NUM_STUDENTS - 1)
    11   arr[i] = rand(MAX_GRADE + 1)
    12 end
    13 
    14 # Output randomly generated array
    15 puts "Input array:"
    16 for i in (0..NUM_STUDENTS - 1)
    17   puts "arr[" + i.to_s + "] ==> " + arr[i].to_s
    18 end
    19 
    20 # Now let's use bubble sort. Swap pairs iteratively as we loop through the 
    21 # array from the beginning of the array to the second-to-last value
    22 for i in (0..NUM_STUDENTS - 2)
    23   # From arr[i + 1] to the end of the array
    24   for j in ((i + 1)..NUM_STUDENTS - 1)
    25     num_compare = num_compare + 1
    26     # If the first value is greater than the second value, swap them
    27     if (arr[i] > arr[j])
    28       temp = arr[j]
    29       arr[j] = arr[i]
    30       arr[i] = temp
    31     end
    32   end
    33 end
    34 
    35 # Now output the sorted array
    36 puts "Sorted array:"
    37 for i in (0..NUM_STUDENTS - 1)
    38   puts "arr[" + i.to_s + "] ==> " + arr[i].to_s
    39 end
    40 puts "Number of Comparisons ==> " + num_compare.to_s
  • Lines 22–33 contain the core algorithm.

  • Lines 24–32 contain the inner loop that swaps all elements that are larger than their next successive element.

7.1.4 Radix Sort

The radix sort is very different from the others. The sorting algorithms we have discussed compare the entire number with other numbers in the list and ultimately make a decision as to where an element belongs based on its number. The radix sort works by sorting the list by each successive digit. The idea is that if we first sort all the units or ones digits in a list and then sort all the tens digits and so on, ultimately, when we run out of digits, we will have a sorted list. Sorting by a single digit can be done by running one of the three sorting algorithms we have discussed. It can also be done by storing all values that match the digit in an array. We use this method so that the algorithm ends up looking very different from the algorithms we have already discussed.

Let’s make sure we are clear about the idea of sorting values one digit at a time.

Consider a list of values:

47
21
90

Now let’s sort them based only on their rightmost digit. The rightmost digits are 7, 1, 0. We can sort these as 0, 1, 7. Now let’s look at our list:

90
21
47

It is clearly not in sorted order (but at least the rightmost digit is nicely sorted).

Now we move on to the next digit. It is 9, 2, 4. Sorting this, we obtain 2, 4, 9. Here is the list:

21
47
90

It is now sorted. You may wonder why we start at the rightmost digit. The reason is that we know every number has at least one digit, so we can start there. Some numbers may be bigger or smaller than others, so we have to start at the right and work our way to the left. Now let’s consider the use of a hash. For the same example, we start with:

47
21
90

Gem of Wisdom

In radix sort, unlike the other sorting algorithms discussed, no comparison of elements is made. Instead, radix sort repeatedly sorts elements digit by digit, commencing from the rightmost digit until the last digit is done. Radix sort illustrates that there are numerous unique approaches to the sorting problem; thus, investigate alternatives rather than simply selecting the first solution that comes to mind.

Now let’s make a hash bucket for each possible digit, so we have a bucket for 0, a bucket for 1, and finally a bucket for 9.

We read the rightmost digit and put it into the correct bucket. This results in:

0 90
1 21
7 47

We now read the buckets in order from 0 to 9 and output all values in the bucket to continue the sort. This yields:

90
21
47

This is the same place we were at when we sorted the rightmost digit. This works because we process the hash buckets in order from 0 to 9. Now we repopulate our hash buckets with the tens digit. We obtain:

2 21
4 47
9 90

Reading the buckets in order gives us our sorted result of 21, 47, 90.

To review, we are building a hash of the following form:

0 [Array of matching values for the digit 0]
1 [Array of matching values for the digit 1]
...
9 [Array of matching values for the digit 9]

It can be seen that our hash of 10 entries (one for each digit) points to an array of matches for that specific digit. Note that this works because we know we are sorting only one digit at a time, and we know the full set of valid digits.

The code for radix sort is shown in Example 7-4.

Example 7-4. Code for radix sort
     1 # Code for radix sort
     2 NUM_STUDENTS = 35
     3 MAX_GRADE = 100
     4 arr = Array.new(NUM_STUDENTS)
     5 
     6 # Randomly put some grades into the array *as strings*
     7 for i in (0..NUM_STUDENTS - 1)
     8   arr[i] = rand(MAX_GRADE + 1).to_s
     9 end
    10 
    11 # Output array and find the maximum number of digits in the generated array
    12 puts "Input array: "
    13 max_length = 0
    14 for i in (0..NUM_STUDENTS - 1)
    15   puts "arr[" + i.to_s + "] ==> " + arr[i]
    16   if arr[i].length > max_length
    17     max_length = arr[i].length
    18   end
    19 end
    20 puts "Max length ==> " + max_length.to_s
    21 
    22 # Add 0 padding based on the max length, simplifying the sort algorithm
    23 for i in (0..NUM_STUDENTS - 1)
    24   arr[i] = arr[i].rjust(max_length, "0")
    25 end
    26 
    27 # Now let's use a radix sort. Go through each digit and 
    28 # add each element to an array corresponding to the digits.
    29 for i in (0..max_length - 1)
    30   # Clear out and reset the bucket
    31   buckets = Hash.new()
    32   for j in 0..9
    33     buckets[j.to_s] = Array.new()
    34   end
    35 
    36   # Add each number to its respective digit bucket
    37   for j in 0..NUM_STUDENTS - 1
    38     num = arr[j]
    39     digit = num[max_length - 1 - i]
    40     buckets[digit].push(num)
    41   end
    42   # Flatten the buckets into a one-dimensional array
    43   arr = buckets.values.flatten
    44 end
    45 
    46 # Now output the sorted array
    47 puts "Sorted array:"
    48 for i in (0..NUM_STUDENTS - 1)
    49   puts "arr[" + i.to_s + "] ==> " + arr[i].to_s
    50 end
  • Lines 2–9 initialize the list to be sorted as we have described in all the other sorts. Notice that we are storing the numbers as strings, rather than integers, so we can easily access the individual digits of the number. The list is initialized with random values.

  • One addition is a loop, on lines 23–25, that right-justifies the array elements in the list using the Ruby rjust function. This pads the numbers in the list with leading zeros.

    Since we are going to loop through the entries in the list digit by digit, it is crucial that all numbers contain the same number of digits. Padding with zeros in the front of the number is the best way to ensure that all numbers are of the same length.

  • Lines 29–44 contain the outer loop that processes the list one digit at a time. For each digit, the entire list will be traversed.

  • On lines 31–34, we reset the hash named bucket that we discussed in the description of the algorithm.

  • Lines 37–41 are the inner loop that adds each number to its corresponding bucket.

  • Line 43 uses two functions, values and flatten, to give a new array representing the values sorted according to the current digit we are processing. The values function returns an array of all the values in a hash, which is analogous to the keys function discussed in Section 6.3, “Hashes.” The flatten function takes a two-dimensional array and returns a one-dimensional array with the same elements, as shown in the following irb session:

    irb(main):001:0> arr = [[1, 2], [3, 4], [5, 6]]
    => [[1, 2], [3, 4], [5, 6]]
    irb(main):002:0> arr.flatten
    => [1, 2, 3, 4, 5, 6]
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.12.107.31