Navigating Phobos

Although there are range-based functions in many modules in Phobos, the focus of this section is on those found in std.range, std.algorithm, and std.array. The functions in these modules are often central to composable pipelines. We aren't going to cover all of them, just enough to give you a taste of what is possible and where to find the functions you might need. An in-depth treatment of these modules would span more pages than we have room for.

Note

Note that while many of the functions in these modules are usable in function pipelines, some of them are not, either because they have a return type of void, or they do not take a range as the first parameter. Although this chapter is focused on composable pipelines, we will still take a look at a few of these misfit functions as they can be quite useful to initialize a range before pipelining or to do something with the result of a pipeline.

std.range

The std.range package exposes two modules, std.range.interfaces and std.range.primitives, and a number of functions are located in its package.d file.

The interfaces module provides a number of object-based interfaces that can be used for special cases when runtime polymorphism is required for ranges. For example, if it were necessary to store instance of a number of different input range types in an array, then the InputRange interface could be used as the array type. We can safely ignore this module for now, as it plays no special role in composable pipelines.

The templates for testing range properties that we saw in the previous chapter, such as isInputRange and hasLength, all live in the primitives module, along with some utility functions for directly manipulating ranges. These are useful for implementing range-based algorithms, but are irrelevant to our purposes in this chapter.

The bulk of the functions in the std.range package that typically play a role in composable pipelines are found in the std.range module itself (package.d). The documentation at http://dlang.org/phobos/std_range.html says the following:

"… this module provides a rich set of range creation and composition templates that let you construct new ranges out of existing ranges…"

Actually, most range-based functions create new ranges from existing ranges, including many of those in std.algorithm. We saw this sort of thing first-hand when we implemented our own custom filter function in Chapter 6, Understanding Ranges. The key difference is that the only purpose of these functions in std.range is to create a new range. They aren't used to apply any sort of transformation to the elements, or otherwise manipulate the elements, in any way. Any work carried out inside the range is done to produce values, not transform them.

At the time of writing, the documentation does not divide the std.range functions into different categories, but it's helpful to do so for our purposes. It's also helpful to think of them not in terms of functions, but rather in terms of the ranges they produce. We can roughly categorize them into three groups: generative ranges, selective ranges, and compositional ranges. Not all of the functions from std.range will be covered here, but enough to serve as a foundation for learning more from the documentation.

Generative ranges

The functions in this group do not get their elements from other ranges, but rather generate them on the fly. These are used to create ranges from scratch, often to contain a sequence of numbers. It's much more convenient to call one of these functions than to implement a new range, or fill out an array literal, when all that is needed is a quick, one-off sequence.

iota

This function is the simplest in this category. The word iota is the name of a letter from the Greek alphabet. It has been used in some programming languages to represent a consecutive sequence of integers. The range returned by this function is a consecutive sequence, but it need not be of integers. All of the built-in numeric types are supported, as well as any user-defined type that overloads the ++, < and == operators. It takes up to three parameters. The following lines demonstrate this function:

import std.range : iota;
// Prints 0 - 9
foreach(i; iota(10)) writeln(i);
// Prints 1 - 10
foreach(i; iota(1, 11)) writeln(i);
// Prints all even numbers from 2 - 20
foreach(i; iota(2, 21, 2)) writeln(i);

The two-argument version takes a start value and an end value as parameters. The single-argument version is equivalent to iota(0, arg), where arg is 10 in the example. The third parameter represents the step value, or the amount by which to increment the current element on each iteration. The step value is 1 when it isn't specified. As with the slice operator, the start value is inclusive and the end value is exclusive, hence iota(1, 3) yields a range that starts with 1 and ends with 2. The type of range returned by iota depends on the type of the inputs. For built-in types, the return type is a random access range, but for user-defined types it is a basic input range.

recurrence

A recurrence equation recursively defines a sequence of numbers. Given an initial number to begin the sequence, subsequent numbers are each defined as a function of the preceding numbers. A well-known example of this is the Fibonacci sequence.

std.range.recurrence takes one or more function arguments that represent the initial state of the new range. As a template parameter, it accepts the recurrence relation in the form of a string, delegate or other callable type. The returned range is a forward range. This example generates the first 20 Fibonacci numbers.

import std.range : recurrence, take;
auto r1 = recurrence!("a[n-1] + a[n-2]")(0, 1).take(20);

When using a string as the recurrence equation, a represents the current state and n represents the current index. The following example uses a delegate to achieve the same result as iota(2, 21, 2).

auto r2 = recurrence!((a,n) => a[n-1] + 2)(2).take(10);

sequence

This function is similar to recurrence, but the numbers are generated differently. Instead of an equation where the value of n depends on the previous values, sequence uses a closed form expression, where the nth value is a function of the initial value and n itself. The following example generates a sequence of the first 10 even numbers.

import std.range : sequence, take, dropOne;
auto r = sequence!("n*2").dropOne.take(10);

The call to dropOne (the description of which is coming up shortly) ensures the generated sequence doesn't start with zero. sequence returns a random-access range.

Selective ranges

A selective range is produced from a source range to selectively iterate the source elements. This group includes the take and drop families of ranges, stride, and retro.

take

The take function and a variation called takeExactly both accept two arguments, a range and the number of elements to take from it. They both return a new range containing the elements taken. It is the equivalent of calling front and popFront on the source range n times and storing the result of front each call to in the new range. The difference between the two is that take will happily succeed if the source range does not contain enough elements, while the same circumstance in takeExactly will trigger an assertion failure.

import std.range : iota, take, takeExactly;
// OK -- returned range has 10 elements
auto r1 = iota(10).take(12);
// Assertion failure -- takeExactly requires 12 elements
auto r2 = iota(10).takeExactly(12);

The type of range returned by take is dependent on the properties of the source range. The documentation only specifies that if the source range has a length property, so does the returned range, but a look at the source shows that other properties will be configured based on the type of the source range. The same holds true for takeExactly, with the exception that the returned range will always support .length, even when the source range does not.

There are two other variations of this function. takeOne returns a range that contains, at the most, one element. If the source range is empty, the returned range will also be empty, otherwise it contains the first element of the source. The returned range is always a random-access range with .length, no matter the properties of the source range. This sets it apart from take. takeNone will always return an empty range. It's possible to call takeNone with no source range at all, so it can be used to create an empty range from scratch. The returned range is always a random-access range with .length.

drop

The drop family of selective ranges are the inverse of the take family. drop, dropExactly and dropOne, given a source range, will return a range that contains all of the elements of the source except the first n elements. It's the equivalent of popFront on the source range n times to discard n elements. For example, the following snippet fills a range with 20 elements, then discards the first 10. It prints the numbers 1120.

import std.range : iota, drop;
auto r = iota(1, 21).drop(10);
foreach(i; r) writeln(i);

There are three other variants called dropBack, dropBackExactly, and dropBackOne that work with the back and popBack primitives of bidirectional ranges.

stride

This function produces a range that iterates its elements according to a step value. Given a step of n, each call to popFront is the equivalent of calling popFront on the source range n times. If the source range is random-access, iteration is performed via indexing. The following prints all even numbers from 0 to 20.

import std.range : iota, stride;
auto r1 = iota(21).stride(2);
foreach(i; r) writeln(i);

Multiple calls to stride on the same range results in a step that is a product of the two steps all of the arguments. For example, the following prints all the numbers divisible by 32 from 0 to 256:

auto r2 = iota(257).stride(2).stride(16);
foreach(i; r2) writeln(i);

The returned range is of the same type as the source.

retro

A significant number of range-based functions iterate their arguments based on the assumption that they are only input ranges. This means they always iterate a source range from front to back. Few functions are designed to iterate bidirectional ranges from back to front. retro allows any range that supports the bidirectional range interface to be iterated in reverse with any range-based function. Consider the following:

import std.range : iota, retro;
auto r = iota(1, 21).retro();
foreach(i; r) writeln(i);

Remember that iota returns a random-access range when its arguments are built-in types. Since random-access ranges are bidirectional, we can use retro on the returned range to iterate it in reverse without the need for foreach_reverse. If the source is a random-access range, so will be the returned range. Note that two consecutive calls to retro on the same range cancel each other out; the returned range will be the source range.

Compositional ranges

This category of ranges takes any number of ranges and returns a range that treats them all as a single range. Calling front on the range returned from one of these functions will result in front being called on one of the composed ranges, though which range is selected and the pattern used for multiple calls to front is different for each function. These functions can accept ranges with different properties. The return value of each is dependent on the properties of the source ranges. When in doubt, always consult the source.

chain

This function takes multiple ranges and produces a range that iterates each in sequence, effectively treating them as a single, concatenated range. Once front/popFront have gone through the first range, they pick up with the second, and so on. The following chains three ranges, each containing ten integers, then prints them all from 0 to 29:

import std.range : iota, chain;
auto r = chain(iota(10), iota(10, 20), iota(20, 30));
foreach(i; r) writeln(i);

The returned range is a simple input range unless the sources are all random-access ranges with .length, in which case the returned range will have the same properties.

roundRobin

This function takes any number of ranges and alternates between them on each successive call to front. Given the ranges r1 and r2, the first call to front returns the first element of r1, the second returns the first element of r2, the third returns the second element of r1, and so on. The following takes the sequences [0, 1, 2], [3, 4, 5], and [6, 7, 8], and prints 0, 3, 6, 1, 4, 7, 2, 5, 8.

auto rr = roundRobin(iota(3), iota(3, 6), iota(6, 9));
foreach(i; rr) writeln(i);

If the sources are all forward ranges, the returned range will be, too. If all of the source ranges have .length, so will the returned range. Otherwise, it's a simple input range.

transposed

Consider the following array of int[]:

int[][] arr = [[1, 2, 3], [4, 5, 6]];

We can refer to this as an array of arrays, but since std.array exposes the range interface for dynamic arrays, we can also refer to it as a range of ranges. transposed takes a range of ranges and transforms it into a new range of ranges that looks like [[1, 4], [2, 5], [3, 6]]. In other words, the ith range in the result contains the ith elements of each range in the source.

import std.range : transposed;
auto arr = [[1, 2, 3], [4, 5, 6]];
auto r = transposed(arr);
foreach(a; r) writeln(a);

The source range must be a forward range with assignable elements. The returned range is a forward range that supports slicing.

zip

Given multiple ranges, zip allows them to be iterated in lockstep. Rather than returning an individual element, each call to front on a zipped range returns a tuple containing the corresponding elements from each source range. This allows transformations to be applied to the elements of each source range in parallel. The following iterates a zipped range and prints each tuple whose first item is even:

import std.range : iota, zip;
import std.algorithm : filter;
auto r = zip(iota(1, 11), iota(11, 21))
  .filter!(t => (t[0] & 1) == 0);
foreach(i; r) writeln(i);

The output looks like the following:

Tuple!(int, int)(2, 12)
Tuple!(int, int)(4, 14)
Tuple!(int, int)(6, 16)
Tuple!(int, int)(8, 18)
Tuple!(int, int)(10, 20)

The returned range shares the common properties of all the source ranges. For example, given three ranges, if one is a forward range, one is bidirectional, and one is random-access, the returned range will be a forward range, since it is the lowest common property among them all. This holds true for optional primitives like .length.

zip supports an optional template parameter that is expected to be a value from the StoppingPolicy enumeration. This is used to specify how to handle ranges of different length. StoppingPolicy.shortest will stop iteration as soon as empty returns true from any of its source ranges, while StoppingPolicy.longest will continue until all ranges are consumed. StoppingPolicy.requireSameLength will cause an exception to be thrown if any range is consumed while any of the others has elements remaining. The default is StoppingPolicy.shortest.

lockstep

This function works like zip, except that it is intended to be used in foreach loops and not as part of a pipeline. As such, its return value is not a range, but an object with an opApply implementation. This allows for an index variable to be used in the loop, something not normally possible with ranges by default (std.range.enumerate returns a range that allows index variables in foreach loops).

import std.range : iota, lockstep;
foreach(i, x, y; lockstep(iota(1, 11), iota(11, 21)))
  writefln("%s. x = %s and j = %s", i, x, y);

This results in the following output:

0. x = 1 and j = 11
1. x = 2 and j = 12
2. x = 3 and j = 13
3. x = 4 and j = 14
4. x = 5 and j = 15
5. x = 6 and j = 16
6. x = 7 and j = 17
7. x = 8 and j = 18
8. x = 9 and j = 19
9. x = 10 and j = 20

lockstep also accepts an optional StoppingPolicy template argument, with StoppingPolicy.shortest as the default. However, StoppingPolicy.longest is not supported and will cause an exception to be thrown if used.

std.algorithm

It is probably accurate to call this package the workhorse of Phobos. The range-based functions found here are used to solve a variety of problems in the D world and make up the bulk of the composable pipelines you'll see in the wild. The algorithms are split among several modules called comparison, iteration, mutation, searching, setops, and sorting. Again, there's no room for us to cover all of the functions these modules expose, but by the end of this subsection you'll have a good feel for the layout of the package and a better understanding of where to find what you need. All of the functions described here, and many more, can be found in the documentation at http://dlang.org/phobos/std_algorithm.html.

Comparison

The comparison algorithms take two or more ranges and return a result based on a one-by-one comparison of the elements. The result could be a range, a numeric value, a bool, or a tuple, depending on the algorithm. Here, we take a look at four of these functions: equal, cmp, mismatch, and levenshteinDistance. It's also worth noting that there are a few functions in this module that are not range-specific. For example, min, max, and clamp can be used with any built-in or user-defined type that supports opCmp comparisons.

equal

This function iterates two ranges in lockstep and applies a predicate to the corresponding elements of each, the default being "a == b". It returns bool, making it convenient to use as a predicate to other algorithms. It's also useful for unit testing functions that return ranges.

import std.range : iota, take;
import std.algorithm : equal;
assert(equal(iota(51).take(20), iota(20)));

cmp

Given two ranges, this function compares the corresponding element of each range using the given predicate. The default predicate is "a < b". Given an element e1 from the first range and e2 from the second, if predicate(e1, e2) is true, then the function returns a negative value; if predicate(e2, e1) is true, the function returns a positive value. If one of the ranges is consumed, the return will be negative if the first range has fewer elements than the second, positive for the opposite case, and 0 if both ranges are the same length.

Recall that strings are iterated by Unicode code point, so comparison will happen one code point at a time after the strings have been decoded.

import std.algorithm : cmp;
auto s1 = "Michael";
auto s2 = "Michel";
assert(cmp(s1, s2) < 0);

mismatch

This function iterates two ranges, comparing corresponding elements using the given predicate. When two elements are found that fail the predicate, the function returns a tuple of the two reduced ranges. The ranges begin with the elements that failed the predicate.

import std.algorithm : mismatch, equal;
auto s1 = "Michael";
auto s2 = "Michel";
auto t1 = mismatch(s1, s2);
assert(equal(t1[0], "ael"));
assert(equal(t1[1], "el"));

If all elements match, the tuple will contain empty ranges.

auto arr = [1, 2, 3];
auto t2 = mismatch(arr, arr);
assert(t2[0].length == 0);
assert(t2[1].length == 0);

levenshteinDistance

Given two forward ranges r1 and r2, this function returns the number of edits necessary to transform r1 into r2. This is most useful with strings, but can be used with any forward range. The following example uses levenshteinDistance to suggest a command to the user when an incorrect command is entered:

import std.algorithm : levenshteinDistance;
auto commands = ["search", "save", "delete", "exit"];
auto input = "safe";
size_t shortest = size_t.max, shortestIndex;
foreach(i, s; commands) {
  auto distance = levenshteinDistance(input, s);
  if(distance < shortest) {
    shortest = distance;
    shortestIndex = i;
  }
}
if(shortest != size_t.max)
  writefln(""%s" is an unknown command. Did you mean "%s"?", input, commands[shortestIndex]);

Iteration

While most of the comparison algorithms may be best suited to the end of a pipeline or to be used in isolation, the iteration algorithms tend to form the bulk of a function chain. It is not uncommon to see the same algorithm more than once in the same pipeline. Here, we look at three of them: group, map, and reduce. The filter and splitter functions we saw earlier also live in this module.

group

This function iterates a range, looks at the number of times an element appears consecutively in sequence, and returns a range containing a tuple of each element and the number of times it appears in the sequence. If you're familiar with run-length encoding, then you should easily understand group.

import std.algorithm : group;
auto arr = [1, 1, 1, 1, 2, 3, 3, 6, 6, 4, 3, 3, 3, 3];
auto r = arr.group();
foreach(val, count; r)
  writefln("%s appears %s times", val, count);

This snippet produces the following output:

1 appears 4 times
2 appears 1 times
3 appears 2 times
6 appears 2 times
4 appears 1 times
3 appears 4 times

map

Given a range as a function argument and a string or callable as a template argument, map applies the given callable to each element of the range. The return value is a range containing the transformed elements. The following example outputs the even numbers from 2 to 40.

import std.algorithm : map;
import std.range : iota;
auto r = iota(1, 21).map!(x => x * 2);
foreach(n; r) writeln(n);

map can accept multiple template arguments as callables. Instantiating it in this manner causes it to return a tuple containing one range for each callable. The following returns a tuple whose first element is a range containing all of the elements of the source range multiplied by 2 and the second a range containing all of the elements of the source range divided by 2.

import std.algorithm : map;
  import std.range : iota;
  auto r = iota(100, 111).map!("a", "a * 2", "a / 2");
  foreach(t; r ) {
    writefln("%s * 2 = %s", t[0], t[1]);
    writefln("%s / 2 = %s", t[0], t[2]);
  }

Although map is most commonly used to directly transform the elements of a range, it can be used in other ways. The English Open Word List we used earlier is distributed as a set of separate text files, one for each letter of the alphabet. The name format follows the pattern A Words.txt. I used the following script to combine all of them into a single file:

void main() {
  import std.stdio : File;
  import std.range : iota;
  import std.algorithm : map;
  import std.file : readText;
  import std.string : chomp;
  auto file = File("words.txt", "w");
  auto text = 
  iota('A', cast(char)('Z'+1))
    .map!(c => readText(c ~ " Words.txt")
  .chomp());
  foreach(s; text)
    file.writeln(s);
}

The lines most relevant to this chapter are highlighted. iota is used to generate a range of characters from 'A' to 'Z'. I used 'Z' + 1 rather than '[' (which follows Z in the ASCII chart) as the end of the range because it's more readable. However, adding an int and a char results in an int, which would cause the element type of the range returned by iota to be int; hence the cast.

It's important to understand that map performs its transformations in front and never caches the result. When using it to read files or perform other potentially expensive operations, care should be taken that the operations are not performed multiple times. Remember that due to the way the foreach loop is rewritten by the compiler, text in the above example is never actually consumed. It's possible to iterate it once more, reading all of the files into memory a second time.

reduce

reduce takes a string or a callable as a template argument, and a range and optional initial seed value as function arguments. The seed value is used to initialize an accumulator; if no seed is specified, then the first element of the range is used. Given the call range.reduce!(fun), for every element e of range, the following evaluation is made: accumulator = fun(accumulator, e). Once all elements have been iterated, the value of accumulator is returned. Unfortunately, the two-argument form of the function accepts the seed value as the first parameter, rather than the range, which precludes it from being used with UFCS call chains in which each step returns a range. The single argument form fits right in, though. The following example sums all of the numbers in the range 05:

import std.algorithm : reduce;
import std.range : iota;
assert(iota(6).reduce!("a + b") == 15);

Like map, reduce can accept multiple template arguments that can be applied to the range elements. In that case, it returns a tuple containing the accumulated results of each.

Mutation

Functions in this module modify a range in some way, either by adding or removing elements. There are many properties that allow a range to be modified. All output ranges are modifiable, some ranges may allow modification through opIndexAssign, and others may have an assignable front property. The functions in this module look for one or more of these properties in order to carry out their work. We'll look at three: copy, fill, and remove.

copy

This function takes an input range as its first parameter and an output range as its second, copying all of the elements from the former into the latter. The copy happens eagerly, so it's great to use at the end of a pipeline and avoid the need to allocate an array, for which it was used earlier in the words2 example. The target range must have enough room to hold all of the elements of the source range, or else an assertion failure will result. The function returns the remaining, unfilled portion of the target range when the copy is complete, as demonstrated by the following example:

import std.algorithm : copy;
import std.range : iota;
int[20] sink;
auto r = iota(10).copy(sink[]);
iota(10, 20).copy(r);
writeln(sink);

Tip

Static arrays and templates

When passing a static array to a normal function, slicing happens implicitly. When passing it to a template function where the types of the parameters are not explicitly specified in the instantiation, the slice needs to be explicit.

fill

Given an input range that has assignable elements and a single value, this function fills the range with the value. The following example fills a static array of 20 elements with the value 100:

import std.algorithm : fill;
int[20] sink;
fill(sink[], 100);
foreach(n; sink)
  writeln(n);

remove

This function removes one or more elements from a bidirectional range that has assignable elements. It is the recommended way to remove items from an array. Elements are specified for removal by offset, not by value. In other words, r.remove(0, 2) removes the first and third elements of the range. The function returns the shortened range. The length of the source array is untouched, though its contents can be shuffled or stomped. In order for the original to reflect the shortened length, it must be assigned the return value of remove. The following prints [2, 4, 5]:

import std.algorithm : remove;
auto arr = [1, 2, 3, 4, 5];
arr = arr.remove(0, 2);
writeln(arr);

The function accepts a template parameter to specify the strategy to use regarding the order of the remaining elements in the range. The default, SwapStrategy.stable, maintains the order of the remaining elements. Specifying SwapStrategy.unstable allows a more performant version of the algorithm, in which the last element in the range is swapped to the newly emptied location. The following prints [5, 2, 4]:

import std.algorithm : remove, SwapStrategy;
auto arr = [1, 2, 3, 4, 5];
arr = arr.remove!(SwapStrategy.unstable)(0, 2);
writeln(arr);

Searching

std.algorithm.searching contains a number of functions that provide different ways to find elements or ranges inside other ranges. We are going to discuss three of them: find, count, and any.

find

This function searches for a single element in any input range. It takes a template argument of a string or callable to use as a predicate, the default being "a == b". There are five overloads of the function, all of which take a range in which to search (the haystack), as the first function argument and a second argument that specifies what to search for (the needle). If a match is found, the haystack is returned, advanced to the point where the match is in the front position. If no match is found, the returned range is empty. The following example shows the most basic form, looking for an element in a range of elements. It prints [7, 8, 9]:

import std.algorithm : find;
import std.range : iota;
auto r = iota(10).find(7);
writeln(r);

Another form searches for a range of elements inside a range of elements. It can serve as a D equivalent of the C standard library function strstr.

import std.algorithm : find;
auto s = "Like Frankie said I did it my way.";
auto r = s.find("Frankie");
if(!r.empty)    // s contains "Frankie"
  writeln(r);

count

This function is all about counting occurrences of needles in haystacks. If the needle is a single element, the haystack must be an input range and cannot be infinite. If the needle is a forward range, the haystack must be a forward range and cannot be infinite. In both cases, a predicate specified as a template argument is applied to every element of the haystack. The default predicate is "a == b", where a is an element from the haystack and b is either the needle or an element from the needle (when the needle is a range). If the predicate passes, the count is incremented. If no needle is specified, the predicate is applied to each element in isolation. In that case, the default predicate is simply "true".

import std.algorithm : count;
// How many 2s?
auto arr1 = [1, 2, 3, 5, 2, 6, 3, 2];
assert(arr1.count(2) == 3);
// How many occurrences of "ke"?
auto s = "Mike Parker";
assert(s.count("ke") == 2);
// How many even numbers?
auto arr2 = [2, 3, 1, 4, 5, 10, 7];
assert(arr2.count!("(a & 1) == 0") == 3);

any

This function returns true if any elements of a range satisfy the given predicate and false if none of them do. The default predicate, simply "a", results in the use of each element's implicit Boolean value. The following checks if any of the elements multiplied by 2 is greater than 50:

import std.algorithm : any;
import std.range : iota;
assert(iota(50).any!("(a * 2) > 50"));

Set operations

In mathematics, a set is a collection of objects which is treated as a distinct object itself. For example, the set {1, 2, 3} consists of three objects, 1, 2, and 3, but we can refer to it as a set of size three. A set operation (setop) is used to combine two sets in a particular way to produce a new set. This module provides a number of such operations, where ranges serve as the sets. We are going to look at three, based on common set operations: setIntersection, setDifference, and setUnion.

Each of these functions requires their source ranges to have the same element type. Each function also assumes each range is sorted by the predicate "a < b". A different predicate can be specified as a template parameter.

setIntersection

This function accepts any number of source ranges and returns a lazy range containing every element that is common between all of the source ranges. The following prints [1, 2, 5, 8, 10]:

import std.algorithm : setIntersection;
auto a1 = [0, 1, 2, 5, 6, 8, 9, 10];
auto a2 = [1, 2, 3, 4, 5, 7, 8, 10, 11, 12];
auto a3 = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12];
writeln("Intersection: ", setIntersection(a1, a2, a3));

setDifference

This function takes two source ranges and returns a lazy range containing elements that only appear in one, but not in the other. The following prints [0, 6, 9]:

import std.algorithm : setDifference;
auto a1 = [0, 1, 2, 5, 6, 8, 9, 10];
auto a2 = [1, 2, 3, 4, 5, 7, 8, 10, 11, 12];
writeln(setDifference(a1, a2));

setUnion

This function takes any number of ranges and returns a lazy range containing every element of every source range, sorted according to the predicate. Elements in the returned range are not unique. This is a divergence from the mathematical notion of sets, where members are expected to be locally unique. The following prints [0, 0, 1, 1, 1, 2, 2, 3, 3, 4, 4, 4, 6, 6]:

import std.algorithm : setUnion;
auto a1 = [0, 1, 2, 4, 6];
auto a2 = [1, 3, 4, 6];
auto a3 = [0, 1, 2, 3, 4];
writeln(setUnion(a1, a2, a3));

Sorting

This module contains several functions used to manipulate the order of a range. We are only going to examine two of them: sort and partition.

sort

This is an eager function that modifies the original range. It requires a random-access range and sorts its elements according to a template predicate, the default being "a < b". The function expects the predicate to behave in a particular manner. If predicate(a, b) and predicate(b, c) are true, then predicate(a, c) should also be true. Furthermore, if the former two are false, so should be the third. In many cases, such transitivity comes naturally, but care should be taken when using floating point types. Any user-defined types should have properly behaving opCmp and opEquals overloads when using the default predicate, otherwise a custom predicate should be provided.

sort supports stable and unstable sorting, with SwapStrategy.unstable being the default. In that case, Introsort is used as the sorting algorithm. When SwapStrategy.stable is specified, the Timsort algorithm is used. The former makes no allocations, while the latter will make one or more per call.

The following loads the words.txt file from an earlier example, randomly shuffles the words, sorts them with the default stable sort, and prints the first ten words from the list:

import std.algorithm : sort;
import std.range : take;
import std.array : split;
import std.file : readText;
import std.random : randomShuffle;
auto wordList = readText("words.txt").split();
wordList.randomShuffle();
writeln(wordList.sort().take(10));

For this specific example, an alternative would be to use std.algorithm.sorting.topN, which would replace sort().take(10) with a single call.

Tip

An error of omission

It's not uncommon for D programmers to omit the parentheses on functions called in a pipeline. Be careful about that when using std.algorithm.sort. As a remnant from the D1 days, arrays still have a built in sort property that uses quicksort. It internally uses a function pointer to make comparisons, calls to which cannot be inlined. If the parameter to std.algorithm.sort is an array and the parentheses are omitted, the compiler will pick up the old array property instead of the algorithm. This could be bad news for performance.

partition

Given a predicate as a template argument and a range as a function argument, this function reorders the range such that all elements that pass the predicate come first, followed by all that fail. The default swap strategy is SwapStrategy.unstable.

Let's revisit the robot health example from earlier in this chapter. Remember that we wanted to take two squads and reorganize them such that the five bots with the highest health are in one squad and those with the lowest in the other. Let's change that up a bit and say that we want five bots with health no lower than 30. Here's one approach:

void healthBasedSwap(Robot[] squad1, Robot[] squad2) {
  import std.algorithm : partition;
  import std.range : chain;
  squad1.chain(squad2).partition!("a.health >= 30");
}

This won't guarantee that we'll always have the five bots with the highest health, or that the result will be sorted, but it still works as a nice variation for one of the game AIs. With this implementation, the following prints [94, 76, 33, 67, 55]:

auto squad1 = [10, 76, 22, 67, 55];
auto squad2 = [33, 94, 17, 27, 16];
healthBasedSwap(squad1, squad2);
foreach(robot; squad1)
  writeln(robot.health);

std.array

This module provides several functions that operate on arrays and associative arrays. We've already seen array and split. Now we'll look at just a few more: Appender, assocArray, and join. The docs at http://dlang.org/phobos/std_array.html describe them all.

Appender

While D's array appending operator is quite convenient, it can unfortunately be inefficient if used often. Appender is a struct that can be used to efficiently append elements to an array. An Appender can be constructed directly, or one can be created via the appender convenience function. Once created, the put member function, or the append operator, is used to append individual items or entire ranges to the backing array.

There are a small number of member functions and properties that can be used on the Appender instance, such as reserve to preallocate additional space, or capacity to determine how many elements can be appended before triggering a reallocation. Appender also exposes the backing array via the data property, so it can be accessed directly to initiate a pipeline. Another potential use is to append the result of a pipeline to an Appender, rather than allocating a new array. When an instance is used multiple times, this approach can cut down on memory allocations. The following example creates a range of 50 integers and appends all of the even numbers to an Appender instance:

import std.array : appender;
import std.algorithm : filter, copy;
import std.range : iota;
auto app = appender!(int[]);
iota(50).filter!("(a & 1) == 0").copy(app);
foreach(n; app.data) writeln(n);

The put member function makes Appender an output range, allowing it to be used as the target in the call to copy.

assocArray

This function allocates a new associative array and populates it from an input range of key/value tuples. The zip algorithm is good for this, but the following example uses the range returned from group:

import std.array : assocArray;
import std.algorithm : group;
auto arr = [1, 1, 1, 2, 2, 3, 3, 3, 3, 4];
auto aa = arr.group().assocArray();
writefln("The number 3 appears %s times.", aa[3]);

join

This function is like a cousin of the array function. It takes a range of ranges and concatenates all of their elements into a single array, with the option to specify a separator to place between each element. The optional separator can be a range or an element. Let's do something with the word list example again. This time, rather than sorting it, we'll pull 10 strings from the list and join them all into a single string where each word is partitioned by a pipe character.

import std.range : take;
import std.array : split, join;
import std.file : readText;
import std.random : randomShuffle;
auto wordList = readText("words.txt").split();
wordList.randomShuffle();
auto arr = wordList.take(10).join("|");
writeln(arr);

In the std.algorithm.iteration module there is a function called joiner which is a lazy version of join. The following would achieve the same result:

auto arr = wordList.take(10).joiner("|").array;

Where to look for more

We've only covered a small fraction of the range-based functions in std.range, std.algorithm and std.array. In addition, there are a number of range-based functions scattered throughout Phobos. As I write, new range-based versions of older functions are being added, and new APIs are expected to provide a range-centric interface. Some of these functions can be used directly in a pipeline, others cannot. Some can be used with any kind of range, others require specific features. Some return a range configured based on the source range, others always return a specific type of range. In short, look for range-based functions throughout Phobos and always pay careful attention to the function signature and the documentation to learn how to make the most of them. Don't be afraid to look directly at the source to learn how a range-based function is implemented. Not only will it help you to better understand the function, but it will also help you learn techniques to implement your own composable, range-based functions.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.16.79.147