Ranges in use

The key concept to understand ranges in the general case is that, unless they are infinite, they are consumable. In idiomatic usage, they aren't intended to be kept around, adding and removing elements to and from them as if they were some sort of container. A range is generally created only when needed, passed to an algorithm as input, then ultimately consumed, often at the end of a chain of algorithms. Even forward ranges and output ranges with their save and put primitives usually aren't intended to live beyond an algorithm.

That's not to say it's forbidden to keep a range around; some might even be designed for long life. For example, the random number generators in std.random are all ranges that are intended to be reused. However, idiomatic usage in D generally means lazy, fire-and-forget ranges that allow algorithms to operate on data from any source and minimize memory allocations.

For most programs, the need to deal with ranges directly should be rare; most code will be passing ranges to algorithms, then either converting the result to a container or iterating it with a foreach loop. Only when implementing custom containers and range-based algorithms is it necessary to implement a range or call a range interface directly. Still, understanding what's going on under the hood helps in understanding the algorithms in Phobos, even if you never need to implement a range or algorithm yourself. That's the focus of the remainder of this chapter.

Custom ranges

When implementing custom ranges, some thought should be given to the primitives that need to be supported and how to implement them. Since arrays support a number of primitives out of the box, it might be tempting to return a slice from a custom type, rather than a struct wrapping an array or something else. While that might be desirable in some cases, keep in mind that a slice is also an output range and has assignable elements (unless it's qualified with const or immutable, but those can be cast away). In many cases, what's really wanted is an input range that can never be modified; one that allows iteration and prevents unwanted allocations.

A custom range should be as lightweight as possible. If a container uses an array or pointer internally, the range should operate on a slice of the array, or a copy of the pointer, rather than a copy of the data. This is especially true for the save primitive of a forward iterator; it could be called more than once in a chain of algorithms, so an implementation that requires deep copying would be extremely suboptimal (not to mention problematic for a range that requires ref return values from front).

Now we're going to implement two actual ranges that demonstrate two different scenarios. One is intended to be a one-off range used to iterate a container, and one is suited to sticking around for as long as needed. Both can be used with any of the algorithms and range operations in Phobos.

Getting a range from a stack

Here's a barebones, simple stack implementation with the common operations push, pop, top, and isEmpty (so named to avoid confusion with the input range interface). It uses an array to store its elements, appending them in the push operation and decreasing the array length in the pop operation. The top of the stack is always _array[$-1]:

struct Stack(T) {
    private T[] _array;
    void push(T element) {
        _array ~= element;
    }
    void pop() {
        assert(!isEmpty);
        _array.length -= 1;
    }
    ref T top() {
        assert(!isEmpty);
        return _array[$-1];
    }
    bool isEmpty() { return _array.length == 0; }
}

Rather than adding an opApply to iterate a stack directly, we want to create a range to do the job for us so that we can use it with all of those algorithms we'll look at in the next chapter. Additionally, we don't want the stack to be modified through the range interface, so we should declare a new range type internally. That might look like this:

private struct Range {
    T[] _elements;
    bool empty() { return _elements.length == 0; }
    T front() { return _elements[$-1]; }
    void popFront() { _elements.length -= 1; }
}

Add this anywhere you'd like inside the stack declaration. Note the implementation of front. Effectively, this range will iterate the elements backwards. Since the end of the array is the top of the stack, that means it's iterating the stack from the top to the bottom. We could also add back and popBack primitives that iterate from the bottom to the top, which would require adding a save primitive since bidirectional ranges must also be forward ranges.

Now, all we need is a function to return a Range instance:

auto elements() { return Range(_array); }

Again, add this anywhere inside the Stack declaration. A real implementation might also add the ability to get a range instance from slicing a stack. Now, test it out:

Stack!int stack;
foreach(i; 0..10)
    stack.push(i);
writeln("Iterating...");
foreach(i; stack.elements)
    writeln(i);
stack.pop();
stack.pop();
writeln("Iterating...");
foreach(i; stack.elements)
    writeln(i);

One of the great side-effects of this sort of range implementation is that you can modify the container behind the range's back and the range doesn't care:

foreach(i; stack.elements) {
        stack.pop();
        writeln(i);
}
writeln(stack.top);

This will still print exactly what was in the stack at the time the range was created, but the writeln outside the loop will cause an assertion failure because the stack will be empty by then. Of course, it's still possible to implement a container that can cause its ranges not just to become stale, but to become unstable and lead to an array bounds error, an access violation, or something similar. However, D's slices used in conjunction with structs give a good deal of flexibility.

A name generator range

Imagine that we're working on a game and need to generate fictional names. For this example, let's say it's a music group simulator and the names are those of group members. We'll need a data structure to hold the list of possible names. To keep the example simple, we'll implement one that holds both first and last names:

struct NameList {
private:
    string[] _firstNames;
    string[] _lastNames;
    struct Generator {
        private string[] _first;
        private string[] _last;
        private string _next;
        enum empty = false;
        this(string[] first, string[] last) {
            _first = first;
            _last = last;
            popFront();
        }
        string front() {
            return _next;
        }
        void popFront() {
            import std.random : uniform;
            auto firstIdx = uniform(0, _first.length);
            auto lastIdx = uniform(0, _last.length);
            _next = _first[firstIdx] ~ " " ~ _last[lastIdx];
        }
    }
public:
    auto generator() {
        return Generator(_firstNames, _lastNames);
    }
}

The custom range is in the highlighted block. It's a struct called Generator that stores two slices, _first and _last, which are both initialized in its only constructor. It also has a field called _next, which we'll come back to in a minute. The goal of the range is to provide an endless stream of randomly generated names, which means it doesn't make sense for its empty property to ever return true. As such, it is marked as an infinite range by the manifest constant implementation of empty that is set to false.

This range has a constructor because it needs to do a little work to prepare itself before front is called for the first time. All of the work is done in popFront, which the constructor calls after the member variables are set up. Inside popFront, you can see that we're using the std.random.uniform function. By default, this function uses a global random number generator and returns a value in the range specified by the parameters—in this case 0 and the length of each array. The first parameter is inclusive and the second is exclusive. Two random numbers are generated, one for each array, and then used to combine a first name and a last name to store in the _next member, which is the value returned when front is called. Remember, consecutive calls to front without any calls to popFront should always return the same value.

Note

std.random.uniform can be configured to use any instance of one of the random number generator implementations in Phobos. It can also be configured to treat the bounds differently. For example, both could be inclusive, exclusive, or the reverse of the defaults. See the documentation at http://dlang.org/phobos/std_random.html for details.

The generator property of NameList returns an instance of Generator. Presumably, the names in a NameList would be loaded from a file on disk or, from a database, or perhaps even imported at compile-time. It's perfectly fine to keep a single Generator instance handy for the life of the program as implemented. However, if the NameList instance backing the range supported reloading or appending, not all changes would be reflected in the range. In that scenario, it's better to go through generator every time new names need to be generated.

Now, let's see how our custom range might be used:

auto nameList = NameList(
    ["George", "John", "Paul", "Ringo", "Bob", "Jimi",
     "Waylon", "Willie", "Johnny", "Kris", "Frank", "Dean",
     "Anne", "Nancy", "Joan", "Lita", "Janice", "Pat",
     "Dionne", "Whitney", "Donna", "Diana"],
    ["Harrison", "Jones", "Lennon", "Denver", "McCartney",
     "Simon", "Starr", "Marley", "Dylan", "Hendrix", "Jennings",
     "Nelson", "Cash", "Mathis", "Kristofferson", "Sinatra",
     "Martin", "Wilson", "Jett", "Baez", "Ford", "Joplin",
     "Benatar", "Boone", "Warwick", "Houston", "Sommers",
     "Ross"]
);
import std.range : take;
auto names = nameList.generator.take(4);
writeln("These artists want to form a new band:");
foreach(artist; names)
    writeln(artist);

First up, we initialize a NameList instance with two array literals, one of first names and one of last names. Next, the highlighted line is where the range is used. We call nameList.generator and then, using UFCS, pass the returned Generator instance to std.range.take. This function creates a new lazy range containing a number of elements, four in this case, from the source range. In other words, the result is the equivalent of calling front and popFront four times on the range returned from nameList.generator. However, since it's lazy, the popping doesn't occur until the foreach loop. That loop produces four randomly generated names that are each written to standard output. One iteration yielded the following names for me:

These artists want to form a new band:
Dionne Wilson
Johnny Starr
Ringo Sinatra
Dean Kristofferson

Other considerations

The Generator range is infinite, so it doesn't need length. There should never be a need to index it, iterate it in reverse, or assign any values to it. It has exactly the interface it needs. But it's not always so obvious where to draw the line when implementing a custom range. Consider the interface for a range from a queue data structure.

A basic queue implementation allows two operations to add and remove items—enqueue and dequeue (or push and pop if you prefer). It provides the self-describing properties empty and length. What sort of interface should a range from a queue implement?

An input range with a length property is perhaps the most obvious, reflecting the interface of the queue itself. Would it make sense to add a save property? Should it also be a bidirectional range? Should the range be random-access? There are queue implementations out there in different languages that allow indexing, either through an operator overload or a function such as getElementAt. Does that make sense? Maybe. More importantly, if a queue doesn't allow indexing, does it make sense for a range produced from that queue to allow it? What about slicing? Or assignable elements? For our queue type at least, there are no clear-cut answers to these questions.

A variety of factors come into play when choosing which range primitives to implement, including the internal data structure used to implement the queue, the complexity requirements of the primitives involved (indexing should be a O(1) operation), whether the queue was implemented to meet a specific need or is a more general-purpose data structure, and so on. A good rule of thumb is that, if a range can be made a forward range, then it should be. Other than that, which range options should be applied is wholly dependent on context.

In the next chapter, we'll add a range interface to the custom Array type in MovieMan. It's an example of drawing the line at a minimal interface that meets a specific need.

Custom algorithms

When implementing custom, range-based algorithms, it's not enough to just drop an input range interface onto the returned range type and be done with it. Some consideration needs to be given to the type of range used as input to the function and how its interface should affect the interface of the returned range. Consider the FilteredRange we implemented earlier, which provides the minimal input range interface. Given that it's a wrapper range, what happens when the source range is an infinite range? Let's look at it step by step.

First, an infinite range is passed in to filter. Next, it's wrapped up in a FilteredRange instance that's returned from the function. The returned range is going to be iterated at some point, either directly by the caller or somewhere in a chain of algorithms. There's one problem, though: with a source range that's infinite, the FilteredRange instance can never be consumed. Because its empty property simply wraps that of the source range, it's always going to return false if the source range is infinite. However, since FilteredRange does not implement empty as a compile-time constant, it will never match the isInfiniteRange predicate itself. This will cause any algorithm that makes that check to assume it's dealing with a finite range and, if iterating it, enter into an infinite loop. Imagine trying to track down that bug.

One option is to prohibit infinite ranges with a template constraint, but that's too restrictive. A better way around this potential problem is to check the source range against the isInfinite predicate inside the FilteredRange implementation. Then, the appropriate form of the empty primitive of FilteredRange can be configured with conditional compilation:

import std.range : isInfinite;
static if(isInfinite!T)
  enum empty = false;
else
  bool empty(){ return _source.empty; }

With this, FilteredRange will satisfy the isInfinite predicate when it wraps an infinite range, avoiding the infinite loop bug.

Another good rule of thumb is that a wrapper range should implement as many of the primitives provided by the source range as it reasonably can. If the range returned by a function has fewer primitives than the one that went in, it is usable with fewer algorithms. But not all ranges can accommodate every primitive.

Take FilteredRange as an example again. It could be configured to support the bidirectional interface, but that would have a bit of a performance impact as the constructor would have to find the last element in the source range that satisfies the predicate in addition to finding the first, so that both front and back are primed to return the correct values. Rather than using conditional compilation, std.algorithm provides two functions, filter and filterBidirectional, so that users must explicitly choose to use the latter version. A bidirectional range passed to filter will produce a forward range, but the latter maintains the interface.

The random-access interface, on the other hand, makes no sense on FilteredRange. Any value taken from the range must satisfy the predicate, but if users can randomly index the range, they could quite easily get values that don't satisfy the predicate. It could work if the range were made eager rather than lazy. In that case, it would allocate new storage and copy all the elements from the source that satisfies the predicate, but that defeats the purpose of using lazy ranges in the first place.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.220.202.209