A brief overview of various sorting algorithms

There are many different sorting algorithms. As I said, there are simpler and more complex algorithms and, in many cases, the more complex algorithms are the ones that run faster. In this chapter, we will implement bubble sort and quick sort. We have already implemented bubble sort for strings in the previous chapter, so here the implementation will mainly focus on recoding it so that it can sort general sortable objects. Implementing quick sort will involve a bit more algorithmic interest.
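To make the idea of sorting "general sortable objects" concrete, here is a minimal sketch of a generic bubble sort that delegates all comparisons to a `Comparator`. The class and method names are illustrative only, not the book's actual code:

```java
import java.util.Arrays;
import java.util.Comparator;

public class BubbleSort {
    // Sorts the array in place; the element type E can be anything,
    // as long as the comparator knows how to compare two elements.
    public static <E> void sort(E[] array, Comparator<E> comparator) {
        boolean swapped = true;
        while (swapped) {
            swapped = false;
            for (int i = 0; i < array.length - 1; i++) {
                if (comparator.compare(array[i], array[i + 1]) > 0) {
                    E tmp = array[i];          // swap the neighbors that
                    array[i] = array[i + 1];   // are in the wrong order
                    array[i + 1] = tmp;
                    swapped = true;
                }
            }
        }
    }

    public static void main(String[] args) {
        Integer[] numbers = {3, 1, 2};
        sort(numbers, Comparator.naturalOrder());
        System.out.println(Arrays.toString(numbers));
    }
}
```

The point of passing a `Comparator` is exactly the generalization discussed above: the algorithm itself never looks inside the objects; it only compares and swaps.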

Be warned that this section is here to give you only a taste of algorithmic complexity. It is far from precise, and I write it in the vain hope that no mathematician reads it and puts a curse on me. Some of the explanations are vague. If you want to learn computer science in depth, then after reading this book, find some other books or visit online courses.

When we talk about the general sorting problem, we think about some general set of objects that can be compared, and any two of them can be swapped while we sort. We will also assume that this is an in-place sort; thus, we do not create another list or array to collect the original objects in sorted order. When we talk about the speed of an algorithm, we are talking about an abstract measure and not milliseconds. To talk about milliseconds, actual real-world duration, we would already need an implementation in some programming language running on a real computer.

Algorithms, in their abstract form, do not run anywhere without an implementation. Still, it is worth talking about the time and memory needs of an algorithm. When we do that, we will usually investigate how the algorithm behaves for a large set of data. For a small set of data, most algorithms are just fast. Sorting two numbers is usually not an issue, is it?

In the case of sorting, we will usually examine how many comparisons are needed to sort a collection of n elements. Bubble sort needs approximately n² (n times n) comparisons. We cannot say that this is exactly n² because in the case of n=2 the result is 1, for n=3 it is 3, for n=4 it is 6, and so on (the exact worst-case number is n*(n-1)/2). However, as n gets larger, the actual number of comparisons needed and n² grow at the same rate, apart from a constant factor. We say that the algorithmic complexity of bubble sort is O(n²). This is the so-called big-O notation. If you have an algorithm that is O(n²) and it works just fine for 1,000 elements, finishing in a second, then for 1 million elements you should expect the same algorithm to need a million times as many comparisons and finish in around ten days. If the algorithm is linear, say O(n), then finishing 1,000 elements in one second should make you expect 1 million to be finished in 1,000 seconds. That is a bit longer than a coffee break and too short for lunch.
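The small numbers quoted above can be checked with the n*(n-1)/2 formula, and the same formula shows how the work explodes as n grows. This is a sketch for illustration; the names are mine, not the book's:

```java
public class ComparisonCount {
    // Worst-case number of comparisons for bubble sort:
    // every pair of elements is compared, which is n*(n-1)/2.
    public static long worstCase(int n) {
        return (long) n * (n - 1) / 2;
    }

    public static void main(String[] args) {
        System.out.println(worstCase(2));         // 1
        System.out.println(worstCase(3));         // 3
        System.out.println(worstCase(4));         // 6
        // Going from 1,000 to 1,000,000 elements multiplies
        // the comparison count by roughly a million:
        System.out.println(worstCase(1_000_000) / worstCase(1_000));
    }
}
```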

This makes it clear that if we want to do some serious business sorting objects, we will need something better than bubble sort. All those unnecessary comparisons waste not only our time but also CPU power, consuming energy and polluting the environment. The question, however, is: how fast can a sort be? Is there a provable minimum that we cannot overcome?

The answer is yes.

When we implement any sorting algorithm, the implementation will execute comparisons and element swaps. That is the only way a general sorting algorithm, which knows nothing about the objects except how to compare them, can do its job. The outcome of a comparison can have two values; say these values are 0 and 1. This is one bit of information. If the result of the comparison is 1, then we swap; if the result is 0, then we do not swap.

We can have the objects in different orders before we start sorting, and the number of different orders is n! (n factorial); that is, the numbers from 1 to n multiplied together: n! = 1*2*3*...*(n-1)*n.
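The factorial grows quickly enough to overflow `long` already around n=21, so a quick sketch of the definition is best done with `BigInteger` (the class and method names here are my own, for illustration):

```java
import java.math.BigInteger;

public class Factorial {
    // n! = 1 * 2 * 3 * ... * n, the number of different
    // orders in which n objects can be arranged.
    public static BigInteger factorial(int n) {
        BigInteger result = BigInteger.ONE;
        for (int i = 2; i <= n; i++) {
            result = result.multiply(BigInteger.valueOf(i));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(factorial(5));   // 120
        System.out.println(factorial(10));  // 3628800
    }
}
```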

Let's assume that, for each possible input of the sort, we stored the results of the individual comparisons as a series of bits in a number. Now, if we reverse the execution of the sort, starting from the sorted collection and controlling the swaps using the bits that described the results of the comparisons, we should get back the original order of the objects. To do that, we use the bits the other way around: the swap that was done last during the sorting is undone first, and the one that was done first is undone last. This way, each original order is uniquely tied to a number expressed as an array of bits.

Now, we can express the original question this way: how many bits are needed to distinguish n factorial different numbers? That is a lower bound on the number of comparisons needed to sort n elements. The number of bits is log2(n!). Using some mathematics, we know that log2(n!) is the same as log2(1)+log2(2)+...+log2(n). If we look at this expression's asymptotic value, we can say that it is O(n*log n). We should not expect any general sorting algorithm to be faster than that.
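We can check numerically how close the sum log2(1)+log2(2)+...+log2(n) stays to n*log2(n). A small sketch (names are illustrative):

```java
public class SortLowerBound {
    // log2(n!) computed as the sum log2(1) + log2(2) + ... + log2(n);
    // summing avoids computing the huge number n! itself.
    public static double log2Factorial(int n) {
        double sum = 0;
        for (int i = 2; i <= n; i++) {
            sum += Math.log(i) / Math.log(2);
        }
        return sum;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        // Both values are on the order of 20 million bits,
        // confirming that log2(n!) grows like n*log2(n).
        System.out.printf("log2(n!)   ~ %.0f%n", log2Factorial(n));
        System.out.printf("n*log2(n)  ~ %.0f%n", n * Math.log(n) / Math.log(2));
    }
}
```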

For special cases, there are faster algorithms. For example, if we want to sort 1 million numbers that are each between 1 and 10, then we only need to count how many times each value occurs and then create a collection that contains that many ones, twos, and so on. This is an O(n) algorithm, but it is not applicable in the general case.
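The counting trick described above (commonly known as counting sort) can be sketched like this; the range 1 to 10 is hard-coded to match the example, and the names are mine:

```java
public class CountingSort {
    // Sorts an array whose elements are all between 1 and 10
    // (inclusive) in O(n) time by counting each value and then
    // writing the values back in order.
    public static void sort(int[] array) {
        int[] count = new int[11];      // count[v] = occurrences of v, for v in 1..10
        for (int value : array) {
            count[value]++;
        }
        int index = 0;
        for (int value = 1; value <= 10; value++) {
            for (int j = 0; j < count[value]; j++) {
                array[index++] = value;
            }
        }
    }
}
```

Note that no elements are ever compared to each other, which is why the O(n*log n) lower bound for comparison-based sorting does not apply here.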

Again, this was not a formal mathematical proof.
