Sorting ranges efficiently

Phobos' std.algorthm includes sorting algorithms. Let's look at how they are used, what requirements they have, and the dangers of trying to implement range primitives without minding their efficiency requirements.

Getting ready

Let's make a linked list container that exposes an appropriate view, a forward range, and an inappropriate view, a random access range that doesn't meet its efficiency requirements. A singly-linked list can only efficiently implement forward iteration due to its nature; the only tool it has is a pointer to the next element. Implementing any other range primitives will require loops, which is not recommended. Here, however, we'll implement a fully functional range, with assignable elements, length, bidirectional iteration, random access, and even slicing on top of a linked list to see the negative effects this has when we try to use it. The code for this can be found at Chapter03Code3 of your code bundle.

We'll also create a skeleton main program that initializes the list. You can find the code for this at Chapter03Code3 of your code bundle.

How to do it…

We're going to both sort and benchmark this program.

To sort

Let's sort ranges by executing the following steps:

  1. Import std.algorithm.
  2. Determine the predicate you need. The default is (a, b) => a < b, which results in an ascending order when the sorting is complete (for example, [1,2,3]). If you want ascending order, you don't have to specify a predicate at all. If you need descending order, you can pass a greater-than predicate instead, as shown in the following line of code:
    auto sorted = sort!((a, b) => a > b)([1,2,3]); // results: [3,2,1]
  3. When doing string comparisons, the functions std.string.cmp (case-sensitive) or std.string.icmp (case-insensitive) may be used, as is done in the following code:
    auto sorted = sort!((a, b) => cmp(a, b) < 0)(["b", "c", "a"]); // results: a, b, c
  4. Your predicate may also be used to sort based on a struct member, as shown in the following code:
    auto sorted = sort!((a, b) => a.value < b.value)(structArray);
  5. Pass the predicate as the first compile-time argument. The range you want to sort is passed as the runtime argument.
  6. If your range is not already sortable (if it doesn't provide the necessary capabilities), you can convert it to an array using the array function from std.range, as shown in the following code:
    auto sorted = sort(fibanocci().take(10)); // won't compile, not enough capabilities
    auto sorted = sort(fibanocci().take(10).array); // ok, good
  7. Use the sorted range. It has a unique type from the input to signify that it has been successfully sorted. Other algorithms may use this knowledge to increase their efficiency.

To benchmark

Let's sort objects using benchmark by executing the following steps:

  1. Put our range and skeleton main function from the Getting ready section of this recipe into a file.
  2. Use std.datetime.benchmark to test the sorting of an array from the appropriate walker against the slow walker and print the results at the end of main. The code is as follows:
        auto result = benchmark!(
            {    auto sorted = sort(list.walker.array);    },
            {    auto sorted = sort(list.slowWalker);      }
        )(100);
        writefln("Emulation resulted in a sort that was %d times slower.",
            result[1].hnsecs / result[0].hnsecs);
  3. Run it. Your results may vary slightly, but you'll see that the emulated, inappropriate range functions are consistently slower. The following is the output:
    Emulation resulted in a sort that was 16 times slower.
  4. Tweak the size of the list by changing the initialization loop. Instead of 1000 entries, try 2000 entries. Also, try to compile the program with inlining and optimization turned on (dmd –inline –O yourfile.d) and see the difference. The emulated version will be consistently slower, and as the list becomes longer, the gap will widen.

On my computer, a growing list size led to a growing slowdown factor, as shown in the following table:

List size

Slowdown factor

500

13

1000

16

2000

29

4000

73

How it works…

The interface to Phobos' main sort function hides much of the complexity of the implementation. As long as we follow the efficiency rules when writing our ranges, things either just work or fail, telling us we must call an array in the range before we can sort it. Building an array has a cost in both time and memory, which is why it isn't performed automatically (std.algorithm prefers lazy evaluation whenever possible for best speed and minimum memory use). However, as you can see in our benchmark, building an array is much cheaper than emulating unsupported functions.

The sort algorithms require a full-featured range and will modify the range you pass to it instead of allocating memory for a copy. Thus, the range you pass to it must support random access, slicing, and either assignable or swappable elements. The prime example of such a range is a mutable array. This is why it is often necessary to use the array function when passing data to sort.

Our linked list code used static if with a compile-time parameter as a configuration tool. The implemented functions include opSlice and properties that return ref. The ref value can only be used on function return values or parameters. Assignments to a ref value are forwarded to the original item. The opSlice function is called when the user tries to use the slice syntax: obj[start .. end].

Inside the beSlow condition, we broke the main rule of implementing range functions: avoid loops. Here, we see the consequences of breaking that rule; it ruined algorithm restrictions and optimizations, resulting in code that performs very poorly. If we follow the rules, we at least know where a performance problem will arise and can handle it gracefully.

Tip

For ranges that do not implement the fast length property, std.algorithm includes a function called walkLength that determines the length by looping through all items (like we did in the slow length property). The walkLength function has a longer name than length precisely to warn you that it is a slower function, running in O(n) (linear with length) time instead of O(1) (constant) time. Slower functions are OK, they just need to be explicit so that the user isn't surprised.

See also

  • The std.algorithm module also includes other sorting algorithms that may fit a specific use case better than the generic (automatically specialized) function. See the documentation at http://dlang.org/phobos/std_algorithm.html for more information.
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.218.151.44