Chapter 5. Data and Functions. Functional Style

Talking about data and functions today when even objects are old news—that’s so 1970s. Yet, sadly, the days when we’ll just tell the computer what we want to do and let it figure out ways to do it are still ahead of us. Until then, functions are an essential component of all major programming disciplines. Fundamentally, any program consists of computations pushing data around; all of the elaborate scaffolding we build—types, objects, modules, frameworks, design patterns—just adds interesting properties to the computation, such as modularity, error isolation, or ease of maintenance. A good language allows its user to hit a golden ratio of code that’s there for “doing” to code that’s there for “being.” The ideal ratio depends on a number of factors, program size being an obvious one: a short script should be dedicated mostly to doing things, whereas a large application will inevitably be concerned with specifying non-executable things such as interfaces, protocols, and modular constraints.

D allows the creation of sizable programs, so it offers powerful modeling abilities; at the same time, it strives to reduce, within reason, code dedicated to “being,” thus allowing you to concentrate on the “doing” part. Well-written D functions tend to be compact and general, packing a sometimes disconcerting power-to-weight ratio. So get ready to burn some rubber.

5.1 Writing and unittesting a Simple Function

It could reasonably be argued that what computers do most of the time (save for uninteresting things such as waiting for input) is searching. Database programs search. Artificial Intelligence programs search. (That annoying automated telephone bank teller making chitchat with you? Search.) Internet search engines . . . well, you know. And as you no doubt know from direct experience, at their core, many programs that have ostensibly nothing to do with searching actually do quite a bit of it. Wherever there’s a problem to be solved, searching is involved in the solution. Conversely, many ingenious solutions to problems hinge on searching intelligently and on setting things up such that searching is easy to do. Naturally, the computing world is full of searching-related memes: pattern matching, relational algebra, binary searching, hashtables, binary trees, tries, red-black trees, skip lists . . . well, we can’t pretend to look at them all here, so for the moment let’s aim for a more modest goal—defining a few simple searching functions in D, starting with linear search, the simplest search there is. So without further ado, let’s write a function that tells whether a slice of ints contains some particular int.


bool find(int[] haystack, int needle) {
   foreach (v; haystack) {
      if (v == needle) return true;
   }
   return false;
}


Great. Now since this is the first D function under our belt, let’s describe what it really does in minute detail. When the compiler sees the definition of find, it compiles the function down to binary code. At runtime, when find is invoked, haystack and needle are passed into the function by value. Now, “by value” does not mean that if you pass in a million-element array that will be copied; as discussed in Chapter 4, the type int[] (aka a slice of an array of integers) is what’s called a fat pointer, really a pointer + length pair or a pointer + pointer pair that stores only the limits of a given portion of an array. Passing a slice of a million-element array into find really means passing enough information to get to the beginning and the end of the slice, as explained in § 4.1.4 on page 98. (Dealing with a container through a small, bounded representative that knows how to crawl the container is pervasive in the D language and standard library. That representative is generally called a range.) So at the end of the day, three machine words need to make it from the caller side into find. Once find gets control, it does its deed and returns a Boolean value (usually in a processor register), at which point the caller is ready to pick it up. And—as they encouragingly say at the end of a home improvement show after having completed an incredibly complicated task—that’s all there is to it.
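
To make the three-machine-word arithmetic concrete, here is a quick check (a sketch, not part of the original walkthrough) confirming that a slice occupies exactly two words, the int needle accounting for the third:

unittest {
   int[] haystack = [ 1, 2, 3 ];
   // A fat pointer: one word for the data pointer, one for the length
   static assert(haystack.sizeof == 2 * size_t.sizeof);
}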

To be frank, find’s design is quite a bit lacking. Returning bool is terribly uninformative; often, position information is also needed, for example, for continuing the search. We could return an integer (and some special value such as -1 for “not found”), but although integers are great for referring to elements in contiguous storage, they are terribly inefficient for most other containers (such as linked lists). Getting to the nth element of a linked list after find returns n requires walking the list node by node all the way from its head—almost as much work as the find operation itself! So returning an integer would port terribly to most anything but arrays.

One idea that would work for a variety of haystacks—arrays, linked lists, and even files and sockets—is to have find simply nibble one element (straw?) off haystack until the searched value is found and return what’s left of haystack. (If the value is never found, find naturally returns the emptied haystack.) That’s a simple and general specification: “find(haystack, needle) reduces haystack from the left until it starts with needle or until haystack is exhausted and then returns the balance of haystack.” Let’s implement that design for int[].


int[] find(int[] haystack, int needle) {
   while (haystack.length > 0 && haystack[0] != needle) {
      haystack = haystack[1 .. $];
   }
   return haystack;
}


Note how find accesses only the first element of haystack and repeatedly reassigns haystack to a shorter subset of it. These primitives should be easily replaced later with, for example, list-specific primitives, but let’s worry about that generalization in a few minutes. For now let’s kick find’s tires a bit.

Recent years have seen increased attention paid to proper software testing in virtually all development methodologies. That’s a good trend because proper testing does have a huge positive impact on bug management. Let us, then, be in keeping with the times and write a short unit test for find. Simply follow (if you’re like yours truly) or precede (if you’re a true test-driven developer) find’s definition with this:


unittest {
   int[] a = [];
   assert(find(a, 5) == []);
   a = [ 1, 2, 3 ];
   assert(find(a, 0) == []);
   assert(find(a, 1).length == 3);
   assert(find(a, 2).length == 2);
   assert(a[0 .. $ - find(a, 3).length] == [ 1, 2 ]);
}


All we have to do to obtain a working module is to put together the function and the unit test in a file searching.d and then run


$ rdmd --main -unittest searching.d


If you pass the -unittest flag, unit tests will be compiled and set up to run before the main program. Otherwise, the compiler ignores the unittest blocks, which is useful if you’re interested in running already tested code without startup latencies. The --main flag instructs rdmd to add a do-nothing main function. (If you forget about --main, don’t worry; the linker will fluently and baroquely remind you of that in its native language, encrypted Klingon.) The surrogate main is useful to us because we’re interested in running only the unit test, not an actual program. Presumably there will be hordes of interested developers who will take our little file and use it in their projects, each defining its own main.

5.2 Passing Conventions and Storage Classes

As mentioned, the two arguments to find (one int plus one fat pointer representing an int[]) are copied into find’s private state. When find returns, a fat pointer is copied back to the caller. This is easily recognizable as straight call by value. In particular, changes to the arguments are not “seen” by the caller after the function returns. But beware of indirection: given that the content of the slice is not copied, changing individual elements of the slice will be seen by the caller. Consider:


void fun(int x) { x += 42; }
void gun(int[] x) { x = [ 1, 2, 3 ]; }
void hun(int[] x) { x[0] = x[1]; }
unittest {
   int x = 10;
   fun(x);
   assert(x == 10);             // Unchanged
   int[] y = [ 10, 20, 30 ];
   gun(y);
   assert(y == [ 10, 20, 30 ]); // Unchanged
   hun(y);
   assert(y == [ 20, 20, 30 ]); // Changed!
}


What happened? In the first two cases, fun and gun changed only their own copies of the parameters. In particular, the second case rebinds the fat pointer to point elsewhere, leaving the original array intact. In the third case, however, hun chose to change one element of the array, a change reflected in the original array. It is easy to see why when we recall that the slice y itself sits in a different place from the three integers that y manages. So if you assign a slice wholesale à la x = [ 1, 2, 3 ], then the slice that x previously held is left alone as x starts a new life; but if you change some particular element x[i] of a slice, other slices seeing that element (in our case, the caller of hun) will see the change.

5.2.1 ref Parameters and Returns

Sometimes we do want to make a change visible in the caller. In that case, the ref storage class comes to the rescue:


void bump(ref int x) { ++x; }
unittest {
   int x = 1;
   bump(x);
   assert(x == 2);
}


If a function expects a ref, it accepts only “real” data, not temporaries; anything that’s not an lvalue is rejected during compilation; for example:


bump(5); // Error! Cannot bind an rvalue to a ref parameter


This preempts silly mistakes when people believe work is being done but in fact the call has no visible effect.

You can also attach ref to the result of a function. In that case, the result of the function is an lvalue itself. For example, let’s modify bump as follows:


ref int bump(ref int x) { return ++x; }
unittest {
   int x = 1;
   bump(bump(x)); // Two increments
   assert(x == 3);
}


The inner call to bump returns an lvalue, which is then a legit argument for the outer call. Had the definition of bump looked like this:


int bump(ref int x) { return ++x; }


the compiler would have rejected the call bump(bump(x)) as an illegal attempt to bind the rvalue resulting from bump(x) to the ref parameter of the outer call to bump.

5.2.2 in Parameters

If you specify in with a parameter, then that parameter is considered read-only so you cannot change it in any way. For example:


void fun(in int x) {
   x = 42; // Error! Cannot modify 'in' parameter
}


The code above does not compile, revealing that in is quite strict. Even though fun already owns a copy of its argument, it is unable to change it.

Having an effectively constant parameter inside a function may certainly be helpful when reviewing its implementation, but the more interesting effect is outside the function. An in parameter disallows even indirect changes to the parameter, changes that would be reflected in the object after the function has returned. This makes in parameters extremely useful because they give guarantees to the caller, not only to the function implementation itself. Consider:


void fun(in int[] data) {
   data = new int[10]; // Error! Cannot modify 'in' parameter
   data[5] = 42;       // Error! Cannot modify 'in' parameter
}


The first error is not surprising because it is of the same kind as the error involving an int alone. The second error is much more interesting. Somehow the compiler has magically extended the reach of in from data itself to each of data’s slots—in is sort of “deep.”

The restriction actually goes to any depth, not only one level. Let’s exemplify that with a multidimensional array:


// An array of arrays of int has two indirections
void fun(in int[][] data) {
   data[5] = data[0];       // Error! Cannot modify 'in' parameter
   data[5][0] = data[0][5]; // Error! Cannot modify 'in' parameter
}


So in protects its data from modification transitively, all the way down through indirections. This behavior is not specific to arrays and applies to all of D’s data types. In fact, in in a parameter context is a synonym for the type qualifier const. Chapter 8 discusses in detail how const works.
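
Given that synonymy, the definition above could have been written equivalently with const; a sketch:

// Same restrictions as the 'in' version above
void fun(const int[][] data) {
   data[5] = data[0];       // Error! Cannot modify 'const' parameter
   data[5][0] = data[0][5]; // Error! Cannot modify 'const' parameter
}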

5.2.3 out Parameters

Sometimes a ref parameter is meant only for the function to deposit something in. In such cases, you may want to use the out storage class, which is much like ref, except that it initializes the argument to its default value upon entry into the function:


// Computes the quotient and remainder of a and b
// Returns the quotient by value, the remainder in the 'rem' parameter
int divrem(int a, int b, out int rem) {
   assert(b != 0);
   rem = a % b;
   return a / b;
}
unittest {
   int r;
   int d = divrem(5, 2, r);
   assert(d == 2 && r == 1);
}


The code above could have used ref instead of out without a hitch; however, using out clarifies to the caller that the function divrem does not expect rem to contain anything interesting upon entry.
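
One consequence worth a quick sketch (not in the original example): an out parameter is wiped on entry, so whatever the caller passed in is gone before the function body runs.

void fill(out int x) {
   // x has already been reset to int.init, i.e., 0
   assert(x == 0);
   x = 7;
}
unittest {
   int n = 42;
   fill(n);  // n's old value 42 is discarded upon entry
   assert(n == 7);
}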

5.2.4 static Data

Although static is not related to passing arguments to functions, discussing it here is appropriate because, just like ref, static applied to data is a storage class, that is, an indication of how the data is stored.

Any variable declaration may be amended with static, which means that the data has one copy per execution thread. This is different from the C-established tradition of allocating one copy of the data for the entire application, and Chapter 13 discusses the rationale and consequences of D’s decision.

Because static data is stored independently of any function call, it preserves its value across calls, whether it sits inside or outside the function. The choice of placing static data inside various scopes concerns only visibility, not storage. At module level, static is really handled the same way private is.


static int zeros; // Practically the same as private int zeros;

void fun(int x) {
   static int calls;
   ++calls;
   if (!x) ++zeros;
   ...
}


Static data must be initialized with compile-time constants. To initialize function-level static data upon first pass through a function, you may want to use a simple trick that uses a companion bool static variable:


void fun(double x) {
   static double minInput;
   static bool minInputInitialized;
   if (!minInputInitialized) {
      minInput = x;
      minInputInitialized = true;
   } else {
      if (x < minInput) minInput = x;
   }
   ...
}


5.3 Type Parameters

Let’s get back to the find function defined on page 132, because it has more than a few issues. For starters, find has a pretty narrow usefulness, so it’s worth looking into ways to make it more general. Let’s start with a simple observation. The presence of int in find is hard coding, pure and simple. There’s nothing different in the code shape when it comes to finding double values in double[] slices or string values in string[] slices. What we’d like, then, is to transform int into a placeholder—a parameter of find that describes the type, not the value, of the entities involved. To do so, we need to change our definition to


T[] find(T)(T[] haystack, T needle) {
   while (haystack.length > 0 && haystack[0] != needle) {
      haystack = haystack[1 .. $];
   }
   return haystack;
}


As expected, there is no change in the body of find, only in its signature. The signature now has two pairs of parenthesized items. The first one lists the type parameters of the function, and the second one is the regular parameter list, which now can make use of the just-defined type parameters. Now we can handle not only slices of int, but slices of everything, be they other built-in types or user-defined types. To top it off, our previous unittest continues to work because the compiler deduces T automatically from the argument types. Neat. But instead of resting on our laurels let’s add a unit test to confirm these extraordinary claims:


unittest {
   // Testing generic capabilities
   double[] d = [ 1.5, 2.4 ];
   assert(find(d, 1.0) == null);
   assert(find(d, 1.5) == d);
   string[] s = [ "one", "two" ];
   assert(find(s, "two") == [ "two" ]);
}


Now what happens when the compiler sees the improved definition of find? The compiler faces a tougher challenge compared to the int[] case because now T is not known yet—it could be just about any type. And different types are stored differently, passed around differently, and sport different definitions of ==. Dealing with this challenge is important because type parameters really open up possibilities and multiply reusability of code. When it comes to generating code for type parameterization, two schools of thought are prevalent today [43]:

  • Homogeneous translation: Bring all data to a common format, which allows compiling only one version of find that will work for everybody.
  • Heterogeneous translation: Invoking find with various type arguments (e.g., int versus double versus string) prompts the compiler to generate as many specialized versions of find as there are distinct type arguments.

In homogeneous translation, the language must offer a uniform access interface to data as a prerequisite to presenting it to find. Heterogeneous translation is pretty much as if you had an assistant writing one special find for each data format you may come up with, all built from the same mold. Clearly the two approaches have relative advantages and disadvantages, which are often the subject of passionate debates in various languages’ communities. Homogeneous translation favors uniformity, simplicity, and compact generated code. For example, traditional functional languages favor putting everything in list format, and many traditional object-oriented languages favor making everything an object that offers uniform access to its features. However, the disadvantages of homogeneous translation may include rigidity, lack of expressive power, and inefficiency. In contrast, heterogeneous translation favors specialization, expressive power, and speed of generated code. The costs may include bloating of generated code, increases in language complexity, and an awkward compilation model (a frequently aired argument against heterogeneous approaches is that they’re glorified macros [gasp]; and ever since C gave such a bad reputation to macros, the label evokes quite a powerful negative connotation).

A detail worth noting is an inclusion relationship: heterogeneous translation includes homogeneous translation for the simple reason that “many formats” includes “one format,” and “many implementations” includes “one implementation.” Therefore it can be argued (all other issues left aside) that heterogeneous translation is more powerful than homogeneous translation. If you have heterogeneous translation means at your disposal, at least in principle there’s nothing stopping you from choosing one unified data format and one unified function when you so wish. The converse option is simply not available under a homogeneous approach. However, it would be oversimplifying to conclude that heterogeneous approaches are “better” because aside from expressive power there are, again, other arguments that need to be taken into consideration.

D uses heterogeneous translation with (warning, incoming technical terms flak) statically scoped symbol lookup and deferred typechecking. This means that when the D compiler sees the generic find definition, it parses and saves the body, remembers where the function was defined, and does nothing else until find gets called. At that point, the compiler fetches the parsed definition of find and attempts to compile it with the type that the caller chose in lieu of T. When the function uses symbols, they are looked up in the context in which the function was defined.
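
The practical consequence of deferred typechecking is worth a quick sketch: a generic function whose body mentions an undefined symbol is accepted as long as nobody instantiates it.

// Parses and is saved without complaint; symbol lookup and
// typechecking happen only upon instantiation
void oops(T)(T x) {
   undefinedSymbol(x);
}
// oops(42); // Error! 'undefinedSymbol' is undefined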

Should the compiler fail to generate find for your particular type, an error message is generated. This can actually be annoying because the error may be due to a bug in find that went undetected. This, in fact, provides the perfect motivation for the next section because find has two bugs—not functional bugs, but generality bugs: as it stands now, find is at the same time too general and not general enough. Let’s see how that Zen claim works.

5.4 Signature Constraints

Say we have an array of double and we want to look for an integer in it. It should go over rather smoothly, right?


double[] a = [ 1.0, 2.5, 2.0, 3.4 ];
a = find(a, 2); // Error! 'find(double[], int)' undefined


Yes, you were ambushed. What happens is that the function find expects a T[] in the first position and a T in the second. However, find receives a double[] and an int, which are claiming T = double and T = int, respectively. If we squint hard enough, we do see that the intent of the caller in this case was to have T = double and benefit from the nice implicit conversion from int to double. However, having the language attempt implicit conversions and type deduction combinatorially, at the same time, is a dicey proposition in the general case, so D does not attempt it. If you said T[] and T, you can’t pass a double[] and an int.
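
As an aside (a sketch, not pursued further by this chapter), one escape hatch is to specify T explicitly; with deduction out of the picture, the ordinary implicit conversion from int to double applies:

double[] a = [ 1.0, 2.5, 2.0, 3.4 ];
a = find!(double)(a, 2); // Fine: T = double given explicitly, 2 converts to 2.0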

It seems like our implementation of find lacks generality because it asks for the type of the slice to be identical to the type of the searched value. In fact, for a given slice type, we need to accept any value that can be compared using == against a slice element.

Well, if one type parameter is good, two can only be better:


T[] find(T, E)(T[] haystack, E needle) {
   while (haystack.length > 0 && haystack[0] != needle) {
      haystack = haystack[1 .. $];
   }
   return haystack;
}


Now the test passes with flying colors. However, find is technically lying because it declares it accepts any T and E, including pairings that make no sense! To illustrate why that is a problem, consider the call


assert(find([1, 2, 3], "Hello")); // Error!
   // Comparison haystack[0] != needle is invalid
   //    for int[] and string


The compiler does find a problem; however, it finds it in the comparison situated inside find’s body. This can get rather confusing to the unwitting user, because it’s unclear whether the error lies at the call site or in the implementation of find. (In particular, the file and line reported by the compiler point straight inside find’s definition.) If the problem is at the end of a long chain of invocations, it gets all the more confusing, so we’d like to fix this. Now, what is the root of the problem? If you allow a little wordplay, find puts its signature on checks that its body can’t cash.

In its signature (i.e., the part just before the first {), find solemnly states it accepts a slice of any type T and a value of any type E. The compiler gladly acknowledges that, dispatches the nonsensical arguments to find, deduces T = int and E = string, and is about to call it a day. However, as soon as find’s body comes into the discussion, the compiler embarrassingly realizes it is unable to generate sensible code for the comparison haystack[0] != needle and reports an error tantamount to “find bit off more than it can chew.” Only a few combinations of all possible Ts and Es are really accepted by find’s body, namely, those that accept comparison for equality.

Building some sort of back-off mechanism would be possible. Another solution, chosen by D, is to allow find’s implementor to systematically limit the applicability of the function, and the right place to specify that constraint is find’s signature, right where T and E appear for the first time. D allows that via a signature constraint:


T[] find(T, E)(T[] haystack, E needle)
   if (is(typeof(haystack[0] != needle) == bool))
{
   ... // Implementation remains the same
}


The if clause in the signature advertises that find accepts haystack of type T[] and needle of type E only if the type of the expression haystack[0] != needle is bool. This has several important consequences. First, the if clause clarifies to the writer, the compiler, and the reader what expectations find has of its parameters, without having to inspect the body at all (which most of the time is longer than what we now have). Second, with the if clause in tow, find is now able to elegantly decline commitment when incompatible arguments are passed to it, which in turn allows other features like function overloading to work smoothly. Third, the new definition makes for better compiler error messages because the error becomes evident in the call, not the body, of find.

Note that the expression under typeof is never evaluated at runtime; its purpose is merely to tell what type it would have if the expression compiled. (If the expression under typeof does not compile, that is no compilation error, just mere information that it has no type at all, and “no type at all” is not bool.) In particular, you shouldn’t be worried that haystack[0] is under the test even if haystack’s length is zero. Conversely, you cannot put runtime-evaluated conditions in a signature constraint; for example, you could not specify that you limit the charter of find to needle > 0.
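
To see that no evaluation takes place, consider this little sketch: the expression indexes an empty array, yet nothing bad happens because the compiler merely computes the type the expression would have.

unittest {
   int[] empty;
   // No out-of-bounds access: typeof never runs the expression
   static assert(is(typeof(empty[0] != 5) == bool));
}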

5.5 Overloading

We defined find for a slice and an element. Let’s now set out to write a version of find that tells whether a slice can be found within another slice. A common approach is a brute-force search using two nested loops. That algorithm is not very efficient—its runtime is proportional to the product of the two slices’ lengths. Let’s not worry about algorithmic efficiency for the time being; for now let’s focus on defining a good signature for the newly added function. The previous section equipped us with most everything we need. Indeed, the implementation writes itself:


T1[] find(T1, T2)(T1[] longer, T2[] shorter)
   if (is(typeof(longer[0 .. 1] == shorter) : bool))
{
   while (longer.length >= shorter.length) {
      if (longer[0 .. shorter.length] == shorter) break;
      longer = longer[1 .. $];
   }
   return longer;
}


Aha! You see, this time we didn’t fall again into the trap of making the function unduly particular. An inferior definition would have been


// No! This signature is severely limiting!
bool find(T)(T[] longer, T[] shorter) {
    ...
}


which, agreed, is a bit terser but plenty more limiting. Our implementation can tell whether a slice of int includes a slice of long, or whether a slice of double includes a slice of float, without copying any data around. These options were simply inaccessible to the simplistic signature. You would have to either copy data around to ensure the right types are in place or give up on using the function altogether and roll your search by hand. And what kind of function is one that looks cute in toy examples and fails for serious use?

As far as the implementation goes, note (in the outer loop) the now familiar reduction of longer by one element from the left end. The inner loop is implicit in the bulk comparison longer[0 .. shorter.length] == shorter, which compares the first shorter.length elements of longer with shorter.

D supports function overloading—several functions may share the same name as long as they differ in the number of parameters or the type of at least one parameter. Language rules decide during compilation where actual calls should go. Overloading builds on our innate linguistic ability to disambiguate the meaning of words by using context and is very helpful for providing ample functionality without a corresponding growth in the vocabulary that callers must remember. The converse risk is that if the call disambiguation rules are too lax, people may think they are calling one function and call another instead, or if said rules are too tight, people would need to contort their code to explain to the compiler which version they meant to call. D strives to keep the rules simple, and in this particular case the rule in effect is a no-brainer: if a function’s signature constraint (the if clause) evaluates to false, the function simply vanishes from the overload set—it’s not considered as a candidate at all. For our two find versions, the corresponding if clauses are never true for the same arguments. So for any possible call to find, at least one of the two overloads will make itself invisible; there’s never an ambiguity to solve. So let’s follow these thoughts with a unittest.


unittest {
   // Test the newly introduced overload
   double[] d1 = [ 6.0, 1.5, 2.4, 3 ];
   float[] d2 = [ 1.5, 2.4 ];
   assert(find(d1, d2) == d1[1 .. $]);
}


The two versions of find may live in the same file or in different files; there is no competition between them because their if clauses are never true at the same time. Now, to discuss overloading rules further, let’s assume we work with int[] a lot so we want to define an optimized find for that type:


int[] find(int[] longer, int[] shorter) {
   ...
}


As written, this overload of find has no type parameters. Also, it is rather clear that some competition is going on between the general find we defined before and the specialized find for integers. What is the relative position of the two functions in the overloading food chain, and which of them will succeed in grabbing the call below?


int[] ints1 = [ 1, 2, 3, 5, 2 ];
int[] ints2 = [ 3, 5 ];
auto test = find(ints1, ints2); // Correct or error?
                                // General or specialized?


D’s approach to the matter is very simple: its choice is consistently biased in favor of the more specialized function. Now, in the general case the notion of “more specialized” demands some explanation; it suggests there’s a sort of specialization order, a “less-than” for functions. Indeed there is, and that relation is called partial ordering of functions.

5.5.1 Partial Ordering of Functions

Although it sounds like it would demand a lot of math-fu, partial ordering is a very simple notion. Think of generalizing the familiar numeric relation ≤ to other sets, in our case functions. Given two functions foo1 and foo2, we want to tell whether foo1 is any less fit than foo2 for a call (let’s denote “foo1 is less fit than foo2” as foo1 ≼ foo2). If we define such a relation, we then have a criterion for deciding which of them should be called in an overloading contest: upon a call to foo, sort the possible foos by ≼ and choose the “largest” foo found. To be worth its salt, a partial order must be reflexive (a ≼ a), antisymmetric (if a ≼ b and b ≼ a then a and b are considered equivalent), and transitive (if a ≼ b and b ≼ c then a ≼ c).

D defines a simple partial ordering relation for functions: if foo1 can be called with the parameter types of foo2, then foo1 ≼ foo2. It is possible that foo1 ≼ foo2 and foo2 ≼ foo1 simultaneously, in which case we say that the two are equally specialized. For example


// Three equally specialized functions: any could be called
//     with another's parameter type
void sqrt(real);
void sqrt(double);
void sqrt(float);


are equally specialized because any of them could be called with either a float, a double, or a real (paradoxically sensible in spite of the lossy conversion, as discussed in § 2.3.2 on page 42).

It’s also possible that neither foo1 ≼ foo2 nor foo2 ≼ foo1 holds, in which case we say that foo1 and foo2 are unordered. There are plenty of examples, such as


// Two unordered functions: neither could be called with
//     the other's parameter type
void print(double);
void print(string);


In the most interesting case, exactly one of foo1 ≼ foo2 and foo2 ≼ foo1 is true—for example, the first, in which case we say that foo1 is less specialized than foo2. To wit:


// Two ordered functions: write(double) is less specialized than
//     write(int) because the former can be called with an int, but
//     the latter cannot be called with a double
void write(double);
void write(int);


Using this partial order, D performs the following simple algorithm for making a decision in an overloaded call foo(arg1, ..., argn):

  1. If there’s one exact match (same types and same number of parameters as the arguments), take that.
  2. Select a set of candidates { foo1, ..., fook } that would accept the call if no other overloads were present. This is where type deduction takes place and if clauses are evaluated.
  3. If the set has size zero, issue “no match” error.
  4. If the functions in the set are not all in the same module, issue “attempt at cross-module overloading” error.
  5. From that set eliminate all functions that are less specialized than any others in the set, that is, keep only the most specialized functions.
  6. If the remaining set has size greater than one, issue “ambiguous call” error.
  7. The sole element of the set is the winner.

That’s about it. Consider a first example:


void transmogrify(uint) {}
void transmogrify(long) {}

unittest {
   transmogrify(42); // Calls transmogrify(uint)
}


There is no exact match and both functions could apply, so partial ordering kicks in. It dictates that, although both functions would accept the call, the first is more specialized so it is entitled to win. (For better or worse, int converts automatically to uint.) Now let’s throw a generic function into the mix:


// As above, plus ...
void transmogrify(T)(T value) {}

unittest {
   transmogrify(42);      // Still calls transmogrify(uint)
   transmogrify("hello"); // Calls transmogrify(T), T=string
   transmogrify(1.1);     // Calls transmogrify(T), T=double
}


Now, what happens when transmogrify(uint) is compared for ordering against the generic function transmogrify(T)(T)? Well, even though the call deduces T = int, when comparing for ordering, T is not replaced with int but preserved in all its genericity. Could transmogrify(uint) accept some arbitrary type T? It couldn’t. Could transmogrify(T)(T) accept a uint? Sure it could. It follows that transmogrify(T)(T) is less specialized than transmogrify(uint), so it is eliminated from the candidate set. So non-generic functions are generally preferred to generic functions, even when the non-generic functions need an implicit conversion.

5.5.2 Cross-Module Overloading

Step 4 in the overloading algorithm on the preceding page deserves particular attention. Consider a slightly modified example with overloads for uint and long, just with more files involved:


// In module calvin.d
void transmogrify(long) { ... }

// In module hobbes.d
void transmogrify(uint) { ... }

// Module client.d
import calvin, hobbes;
unittest {
   transmogrify(42);
}


The transmogrify(uint) overload in hobbes.d is more specialized; however, the compiler refuses to call it by claiming ambiguity. D staunchly refuses to overload across different modules. If such overloading were allowed, then the meaning of the call would depend on the interplay of the various modules imported (and in general there could be many modules, many overloads, and more complicated calls in the fray). Imagine adding one new import to a working program and having its behavior change in unpredictable ways! Moreover, if cross-module overloading were allowed, the burden on the code reader would increase enormously: in order to figure out where a call goes, they would need to know not just what one module contains, but what all imported modules contain, because one of them might define a better match. Worse, if the order of top-level declarations mattered, the call transmogrify(5) could actually end up calling different functions depending on its position in the file. This is an endless source of problems because it essentially means that the reader of a piece of code must keep a large moving context in mind at all times.

A module can define a bunch of overloads that implement functionality for a variety of types. Another module cannot just barge in with its own addenda to that functionality. However, the second module can define its own bunch of overloads. As long as a function in one module does not hijack calls that would otherwise go to a function in another module, there’s no ambiguity. The decision on whether there’s a conflict or not is made on a per-call basis. Consider:


// In module calvin.d
void transmogrify(long) { ... }
void transmogrify(uint) { ... }

// In module hobbes.d
void transmogrify(double) { ... }

// In module susie.d
void transmogrify(int[]) { ... }
void transmogrify(string) { ... }

// Module client.d
import calvin, hobbes, susie;

unittest {
   transmogrify(5);        // Error! cross-module overloading
                           //     across calvin and hobbes
   calvin.transmogrify(5); // Fine, explicit resolution
                           // calvin.transmogrify(uint) called
   transmogrify(5.5);      // Fine, only hobbes could take it
   transmogrify("hi");     // Hello from Susie
}


Calvin, Hobbes, and Susie interact in interesting ways. Note how ambiguities are very fine-grained; the fact that there’s a conflict between calvin.d and hobbes.d in the first call does not render the modules mutually incompatible—the third call still goes through because no function in other modules was able to take it. Finally, susie.d defines its own overloads and is never in conflict with the other two (unlike the eponymous comic strip characters).

Guiding Overloading

Whenever there is ambiguity between modules, there are two main ways in which you can guide the workings of overloading. One is to qualify the function name with the module name, as shown in the second call calvin.transmogrify(5). Doing this restricts lookup to only calvin.d. Inside that module, overloading rules are still at work. A more transparent way is to use a local alias for the symbol in question, which goes like this:


// Inside calvin.d
alias hobbes.transmogrify transmogrify;


This directive does something very interesting: it wheelbarrows all of the overloads of transmogrify in module hobbes.d into the current module, calvin.d. So if calvin.d contains the directive above, it’s as if it defined all overloads of transmogrify that hobbes.d defined, in addition to its own overloads. That’s very nice of calvin.d—it democratically consults hobbes.d whenever the important decision of calling transmogrify is to be made. Alternatively, if calvin.d and hobbes.d have had a misadventure and choose to ignore each other’s existence, client.d can still call transmogrify, taking all overloads into account by aliasing both calvin.transmogrify and hobbes.transmogrify:


// Inside client.d
alias calvin.transmogrify transmogrify;
alias hobbes.transmogrify transmogrify;


Now any call to transmogrify issued from client.d will resolve overloads as if the definitions in both calvin.d and hobbes.d were present in client.d.
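
With both aliases in place, overload resolution in client.d sees one combined set; a sketch of the effect (assuming the calvin.d and hobbes.d definitions from the earlier example):

// Inside client.d, after the two alias directives
unittest {
   transmogrify(5);   // OK now: one local overload set;
                      //    transmogrify(uint) is most specialized
   transmogrify(5.5); // Calls hobbes' transmogrify(double)
}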

5.6 Higher-Order Functions. Function Literals

So far we know how to find an element or a slice inside a slice. However, finding is not always about searching a given item. Consider a task such as “Find the first negative element in an array of numbers.” For all its might, our find library cannot solve that problem.

Fundamentally, find looks to fulfill some Boolean condition, a predicate; so far, the predicate has always been a comparison using == against a given value. A more flexible find would receive a predicate from the user and assemble the linear search logic around it. If it could amass such power, find would become a higher-order function, meaning a function that can take other functions as arguments. This is a quite powerful approach to doing things because higher-order functions compound their own functionality with the functionality provided by their arguments, reaching a range of behaviors inaccessible to simple functions. To have find take a predicate, we can use an alias parameter.


T[] find(alias pred, T)(T[] input)
   if (is(typeof(pred(input[0])) == bool))
{
   for (; input.length > 0; input = input[1 .. $]) {
      if (pred(input[0])) break;
   }
   return input;
}


This new overload of find takes only one “classic” parameter but adds the mysterious alias pred parameter. An alias parameter can match any argument: a value, a type, a function name—anything that can be expressed symbolically. Let’s now see how to invoke this new overload of find.


unittest {
   int[] a = [ 1, 2, 3, 4, -5, 3, -4 ];
   // Find the first negative number
   auto b = find!(function bool(int x) { return x < 0; })(a);
}


This time find takes two argument lists. The first list is distinguished by the syntax !(...) and consists of generic arguments. The second list consists of the classic arguments. Note that although find declares two generic parameters (alias pred and T), the calling code specifies only one. That’s because deduction works as usual by binding T = int. In our use of find so far, we’ve never needed to specify any generic arguments because the compiler deduced them for us. This time around, there’s no deduction for pred so we specified it as a function literal. The function literal is


function bool(int x) { return x < 0; }


where function is a keyword and the rest is a regular function definition, just without a name.

Function literals (also known as anonymous functions or lambda functions) turn out to be very useful in a variety of situations, but their syntax is a bit heavy. The literal used above is 41 characters long, of which only about 5 do actual work. To help with that, D allows you to trim the syntax quite a bit. The first shortcut is to eliminate either or both return type and parameter types; the compiler is smart enough to infer them all because by its very definition, the body of the anonymous function is right there.


auto b = find!(function(x) { return x < 0; })(a);


The second shortcut is to simply eliminate the keyword function itself. You can combine the two shortcuts like this, leading to a rather terse notation:


auto b = find!((x) { return x < 0; })(a);


That looks easily recognizable to the initiated, which you just became as of a couple of seconds ago.

5.6.1 Function Literals versus Delegate Literals

One important requirement of a lambda function facility is to allow access to the context in which the lambda was defined. Consider a slightly modified invocation:


unittest {
   int[] a = [ 1, 2, 3, 4, -5, 3, -4 ];
   int z = -2;
   // Find the first number less than z
   auto b = find!((x) { return x < z; })(a);
   assert(b == a[4 .. $]);
}


This modified example works, which apparently answers the question. However, if, just for kicks, we prepend function to the literal, the code mysteriously stops working!


auto b = find!(function(x) { return x < z; })(a);
// Error! function cannot access frame of caller function!


What’s happening, and what’s with that complaint about a frame? Clearly there must be some underlying mechanism through which the function literal gets access to z—it can’t divine its location from thin air. That mechanism is encoded as a hidden parameter, called a frame pointer, that the literal takes. The compiler uses the frame pointer to wire access to outer variables such as z. However, a function literal that does not use any local variable wouldn’t need that extra parameter. D being statically typed, it must distinguish between the two, and indeed it does. Aside from function literals, there are also delegate literals, which can be created like this:


unittest {
   int[] a = [ 1, 2, 3, 4, -5, 3, -4 ];
   int z = 3;
   auto b = find!(delegate(x) { return x < z; })(a); // OK
}


Delegates have access to the enclosing frame, whereas functions do not. If both function and delegate are absent from the literal, the compiler automatically detects which is necessary. Type deduction comes to the rescue again by making the tersest, most convenient code also do the right thing automagically.


auto f = (int i) {};
// The literal uses no variables from the enclosing scope, so the
// compiler infers a function, not a delegate
static assert(is(typeof(f) : void function(int)));


5.7 Nested Functions

We can now invoke find with an arbitrary function literal, which is quite neat. However, if the literal grows pretty large or if we want to reuse it, it becomes clumsy to write its body at the invocation place (potentially several times, too). We’d like to invoke find with a named function, as opposed to an anonymous one; furthermore, we’d want to preserve the right to access local variables if we so wish. For that kind of activity and many others, D has nested functions.

A nested function definition looks exactly like a regular one, except that it appears inside another function. Consider:


void transmogrify(int[] input, int z) {
   // Nested function
   bool isTransmogrifiable(int x) {
      if (x == 42) {
         throw new Exception("42 cannot be transmogrified");
      }
      return x < z;
   }
   // Find the first transmogrifiable element in the input
   input = find!(isTransmogrifiable)(input);
   ...
   // ... and again
   input = find!(isTransmogrifiable)(input);
   ...
}


Nested functions can come in very handy in a variety of situations. Although they don’t do anything that regular functions can’t do, nested functions enhance convenience and modularity because they sit right inside the function using them, and they have intimate access to the context of the nesting function. The latter advantage is very important; in the example above, if nesting were not an option, accessing z would have been much more problematic.

The nested function isTransmogrifiable uses the same trick (a hidden parameter) to get access to its parent’s stack frame, and z in particular. Sometimes you may want to actively prevent that from happening and make isTransmogrifiable just an absolutely regular function, save for its definition sitting inside transmogrify. To effect that, simply prepend static (what else?) to isTransmogrifiable’s definition:


void transmogrify(int[] input, int z) {
   static int w = 42;
   // Nested regular function
   static bool isTransmogrifiable(int x) {
      if (x == 42) {
         throw new Exception("42 cannot be transmogrified");
      }
      return x < w; // Accessing z would be an error
   }
   ...
}


With static in tow, isTransmogrifiable can access only module-level and static data defined inside transmogrify (as shown in the example with w). Any transitory data such as function parameters or non-static variables is not accessible (but of course could be explicitly passed).

5.8 Closures

As mentioned, alias is a purely symbolic device; all it does is make a symbol mean the same thing as another. In our examples above, pred is not a real value, any more than any function name is a value; you can’t assign something to pred. If you want to create an array of functions (e.g., a sequence of commands), alias won’t help you a bit. Definitely something extra is needed here, and that something is the ability to have a palpable function object that can be stored and retrieved, much like a pointer to a function in C.

Consider, for example, the following challenge: “Given a value x of type T, return a function that finds the first value equal to x in an array of Ts.” This alembicated, indirect definition is typical of a higher-order function: you don’t do something, you just return something that will do it. So we need to write a function that in turn (careful here) returns a function that in turn (we’re on a roll!) takes a T[] and returns a T[]. So the type returned is T[] delegate(T[]). Why delegate and not function? Well, as discussed above, the delegate gets to access its own state in addition to its arguments, whereas function cannot. And our function must have some state because it has to save that value x.

This is a very important point, so it is worth emphasizing. Imagine the type T[] function(T[]) as the sheer address of a function: one machine word. That function has access only to its parameters and the program globals. If you take two pointers to the same function and pass them the same arguments, they will have access to the same program state. Anyone who has tried to deal with C callbacks—for example, for windowing systems or thread launching—knows about this perennial problem: pointers to functions can’t have access to private state. The way C callbacks get around it is usually by taking a void* (an untyped address) that passes state information around. Other callback systems—such as the old and crotchety MFC library—store additional state in a global associative array, and yet others, such as the Active Template Library (ATL), use assembler code to create new functions dynamically. The cross-platform Qt library uses an advanced mechanism called signals/slots to effect the same thing. Wherever there’s interfacing with C callbacks, there’s some involved solution that allows callbacks to access state; it’s not a simple problem.

With delegate, all of these issues vanish. Delegates achieve that by taking a size penalty: a delegate holds a pointer to a function and a pointer to that function’s environment. That may be larger and sometimes slower, but it also is considerably more powerful, so you should prefer using delegate to using function objects for your own designs. (Of course, function is irreplaceable when interfacing with C via callbacks.)
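
The size penalty is easy to observe; a minimal sketch (assuming the usual layout of one machine word per pointer):

unittest {
   int z = 0;
   auto fp = function(int x) { return x + 1; }; // bare function pointer
   auto dg = delegate(int x) { return x + z; }; // function pointer + frame pointer
   static assert(dg.sizeof == 2 * fp.sizeof);
}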

That being said, let’s take a stab at writing the new finder function. Remember that we need to return a T[] delegate(T[]).


import std.algorithm;

T[] delegate(T[]) finder(T)(T x)
    if (is(typeof(x == x) == bool))
{
   return delegate(T[] a) { return find(a, x); };
}

unittest {
   auto d = finder(5);
   assert(d([1, 3, 5, 7, 9]) == [ 5, 7, 9 ]);
   d = finder(10);
   assert(d([1, 3, 5, 7, 9]) == []);
}


Agreed, things like two return statements on the same line are bound to look odd to the uninitiated, but then many higher-order functions are liable to seem bizarre at first. So, line by line: finder is parameterized by a type T, takes a T and returns a T[] delegate(T[]); finder imposes a condition on T: you must be able to compare two values of type T and get a bool. (Again, the silly comparison x == x is there just for the sake of types, not particular values.) Then, finder cleverly does its deed by returning a delegate literal. That literal has a short body that calls our previously defined find, which fulfills the contract. The returned delegate is called a closure.

Usage is as expected—calling finder returns a delegate that you can later call or reassign. The variable d defined by the unittest has type T[] delegate(T[]), but thanks to auto there’s no need for us to write that type explicitly. In fact, to be completely honest, auto can serve as a shortcut in defining finder as well; the presence of all types was intended as training wheels and exposition helpers. A considerably briefer definition of finder could look like this:


auto finder(T)(T x) if (is(typeof(x == x) == bool)) {
   return (T[] a) { return find(a, x); };
}


Note the use of auto as the result type of the function and also the omission of the delegate keyword; the compiler will gladly take care of those for us. However, the T[] in front of delegate parameter a must be specified. This is because the compiler must have a starting point to make the auto magic happen: the type returned by the delegate is inferred from the type of find(a, x), which in turn is inferred from the type of a and that of x; from there, the type of the delegate is inferred as T[] delegate(T[]), which is also the return type of finder. Without knowing the type of a, all this chain of reasoning could not be carried out.

5.8.1 OK, This Works. Wait, It Shouldn’t. Oh, It Does!

The unittest supplied probes the behavior of finder, but of course that’s not a proof that it works correctly. There’s one advanced question that lurks in the background: the delegate returned by finder uses x. After finder has returned, where does x sit? In fact, given that D uses the regular call stack for calling functions, the question even suggests that something very dangerous is going on: a caller invokes finder, x is deposited onto the call stack, finder returns, the stack is popped back to its position just before the call . . . which implies that the delegate returned by finder accesses a defunct stack location!

Persistence of the local environment (in our case the environment consists only of x, but it could be arbitrarily large) is a classic problem in implementing closures, and each language that supports closures must address it somehow. D uses the following approach: Ordinarily, all calls use a simple stack. When the compiler detects the presence of a closure, it automatically copies the used context into heap-allocated memory and wires the delegate to use the heap-allocated data. That heap is garbage-collected.

The disadvantage is that every call to finder will issue a memory allocation request. However, closures are very expressive and enable a host of interesting programming paradigms, so in most cases the cost is more than justified.
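
To see the heap-allocated environment at work, here is a short sketch using finder as defined above; each closure hangs on to its own x long after finder has returned:

unittest {
   auto d1 = finder(1);
   auto d2 = finder(2);
   // Each delegate still sees its own copy of x
   assert(d1([ 2, 1 ]) == [ 1 ]);
   assert(d2([ 2, 1 ]) == [ 2, 1 ]);
}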

5.9 Beyond Arrays. Ranges. Pseudo Members

The end of § 5.3 on page 138 enigmatically claimed: “find is at the same time too general and not general enough.” We then saw how find is too general and fixed that by restricting its accepted types. It is time now to figure out why find is still not general enough.

What’s the essence of linear search? You go through the searched elements one by one looking for a specific value, or to fulfill a predicate. The problem is that so far we’ve been working on contiguous arrays (slices aka T[]) exclusively, but contiguity is never, ever of relevance to the notion of linear searching. (It is relevant only to the mechanics of carrying out the walk.) By limiting ourselves to T[], we robbed find of access to a host of other data structures with various topologies that could be searched linearly. A language that, for example, makes find a method of some type Array rightly deserves your scornful eye. That’s not to say you can’t get work done in that language; you’d sure be busy doing work, actually quite a bit more than you should.

It’s time to clean the slate and reevaluate our basic find implementation, copied below for convenience:


T[] find(T)(T[] haystack, T needle) {
   while (haystack.length > 0 && haystack[0] != needle) {
      haystack = haystack[1 .. $];
   }
   return haystack;
}


What are the primitive operations we’re using against haystack, and what are their respective meanings?

  1. haystack.length > 0 tells whether there are elements left in haystack.
  2. haystack[0] accesses the first element of haystack.
  3. haystack = haystack[1 .. $] eliminates the first element of haystack.

The particular manner in which arrays implement these operations is not easy to generalize to other containers. For example, it would be Darwin-awards-worthy to ask whether a singly linked list has elements in it by evaluating haystack.length > 0. Unless the list consistently caches its length (problematic in more ways than one), evaluating a list’s length takes time proportional to the actual length of the list, whereas a quick look at the head node in the list takes only a few machine instructions. Using indexed access with lists is an equally losing proposition. So let’s distill the primitives as three named functions and leave it to haystack’s type to figure out their implementation. The syntax of the primitives could be

  1. haystack.empty for testing whether haystack is done
  2. haystack.front for getting the first element of haystack
  3. haystack.popFront() for eliminating the first element of haystack

Note how the first two operations do not modify haystack so they don’t use parentheses, whereas the third does affect haystack, which is reinforced syntactically by the use of (). Let’s redefine find to use the new shiny syntax:



R find(R, T)(R haystack, T needle)
   if (is(typeof(haystack.front != needle) == bool))
{
   while (!haystack.empty && haystack.front != needle) {
      haystack.popFront();
   }
   return haystack;
}


It would be great now to bask a little in the glow of this generous definition, but the failing unittests are an unpleasant reality check. Well, of course: the built-in slice type T[] has no idea that we were suddenly enlightened to opt for a new set of primitives bearing arbitrary names such as empty, front, and popFront. We need to define them for all T[]s. Sure, they will have trivial implementations, but we need them for our nice abstraction to work again with the type we started from.

5.9.1 Pseudo Members and the @property Attribute

One syntactic problem is that function invocations so far have looked like fun(argument), whereas now we’d like to define calls that look like argument.fun() and argument.fun. The latter syntaxes are called method invocation syntax and property access syntax, respectively. We’ll learn in the next chapter that they’re rather easy to define for user-defined types, but T[] is a built-in type. What to do?

D recognizes this as a purely syntactic issue and allows pseudo-member notation: if a.fun(b, c, d) is seen but fun is not a member of a’s type, D rewrites that as fun(a, b, c, d) and tries that as well. (The opposite path is never taken, though: if you write fun(a, b, c, d) and it does not make sense, a.fun(b, c, d) is not tried.) The intent of pseudo methods is to allow you to call regular functions with the send-message-to-object syntax that is familiar to some of us. Without further ado, let’s implement empty, front, and popFront for built-in arrays. Three lines and we’re done:



@property bool empty(T)(T[] a) { return a.length == 0; }
@property ref T front(T)(T[] a) { return a[0]; }
void popFront(T)(ref T[] a) { a = a[1 .. $]; }


The @property notation introduces an attribute called “property.” Attributes, always introduced with @, are simple adornments specifying certain features for the symbol being defined. Some attributes are recognized by the compiler; some are defined and used by the programmer alone. In particular, “property” is recognized by the compiler and signals the fact that the function bearing such an attribute must be called without the trailing ().

Also note the use of ref (§ 5.2.1 on page 135) in two places. One is front's return type, the intent being to allow you to modify elements in the array if you want to. Also, popFront uses ref to make sure it effects its change on the slice directly.

With the help of the three simple definitions, the modified find compiles and runs smoothly, which is deeply satisfactory; we generalized find to work against any type that defines empty, front, and popFront and then completed the circle having the general version work with the concrete case that motivated generalization in the first place. If the three primitives are inlined, the general find stays just as efficient as the previous implementation that was crippled to work only for slices.
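
As a quick sanity check (assuming the three definitions above and the generalized find are in scope), the following unittest exercises both the found and not-found paths:

unittest {
   int[] a = [ 1, 5, 7, 3 ];
   auto r = find(a, 7);
   assert(!r.empty && r.front == 7);
   assert(find(a, 42).empty); // Not found: an empty range remains
}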

Now, if empty, front, and popFront were useful only for defining find, the abstraction would be distinctly underwhelming. All right, we pulled it off with find, but when we define another function, will the empty-front-popFront troika be of any use, or will we have to start all over again with some different primitives? Fortunately, an extensive body of experience shows that there is something distinctly fundamental about the notion of one-pass access to a collection of data. The notion is so useful, it has been enshrined as the Iterator pattern in the famous Design Patterns book [27]; C++'s STL [51] refines the notion further and defines a conceptual hierarchy of iterators: input, forward, bidirectional, and random-access iterators.

In D’s nomenclature, the abstract data type that allows traversal of a collection of elements is called a range. (Iterator would have been a good name choice as well, but different preexisting libraries ascribe different meanings to the term, leading to possible confusion.) D’s ranges are more akin to the Iterator pattern than STL iterators (a D range can be modeled roughly by a pair of STL iterators) but do inherit STL’s categorical taxonomy. The troika empty-front-popFront in particular defines an input range, so our quest for a good find function revealed the inextricable relationship between linear searching and input ranges: you can’t linearly search anything less capable than an input range, and it would be a mistake to gratuitously require more than input range capabilities from your collection (e.g., you shouldn’t need arrays with indexed access). A near-identical implementation of our find can be found in the std.algorithm module of the standard library.

5.9.2 reduce—Just Not ad Absurdum

How about a challenging task that uses only input ranges: Define a function reduce that, given an input range r, an operation fun, and a seed x, repeatedly computes x = fun(x, e) for each element e in r, and then returns x. The reduce higher-order function is mightily powerful because it can express a variety of interesting accumulations. Many languages capable of higher-order functions define reduce as a staple facility, possibly under names such as accumulate, compress, inject, or foldl. Let’s start to define reduce in the undying spirit of test-driven development with a few unit tests:



unittest {
   int[] r = [ 10, 14, 3, 5, 23 ];
   // Compute the sum of all elements
   int sum = reduce!((a, b) { return a + b; })(0, r);
   assert(sum == 55);
   // Compute minimum
   int min = reduce!((a, b) { return a < b ? a : b; })(r[0], r);
   assert(min == 3);
}


As you can see, reduce is quite flexible and useful, if we ignore, of course, the minor detail that it doesn't exist yet. Let's set out to implement reduce such that it fulfills the tests above. We now have the knowledge to write a true industry-strength reduce from scratch: we know from § 5.3 on page 138 how to pass type parameters into a function; § 5.4 on page 140 taught us how to restrict reduce such that it accepts only sensible arguments; we saw in § 5.6 on page 148 how function literals can be passed into a function as alias parameters; and we just zeroed in on a nice, simple input range interface.



V reduce(alias fun, V, R)(V x, R range)
   if (is(typeof(x = fun(x, range.front)))
      && is(typeof(range.empty) == bool)
      && is(typeof(range.popFront())))
{
   for (; !range.empty; range.popFront()) {
      x = fun(x, range.front);
   }
   return x;
}


Compile, run unittests, and all pass. The definition of reduce would be, however, a bit more likable if the constraints weren’t about as bulky as the implementation itself. Besides, nobody wants to write the tedious tests that ensure R is an input range. Such verbose constraints are a subtle form of duplication. Fortunately, we have the tests for ranges nicely packaged in the standard module std.range, which simplifies reduce’s implementation to



import std.range;

V reduce(alias fun, V, R)(V x, R range)
   if (isInputRange!R && is(typeof(x = fun(x, range.front))))
{
   for (; !range.empty; range.popFront()) {
      x = fun(x, range.front);
   }
   return x;
}


which is distinctly more palatable. With reduce you can compute not only sum and minimum, but a variety of other aggregate functions such as the closest number to a given one, the largest number in absolute value, and standard deviation. The standard library defines reduce in std.algorithm pretty much as above, except that it accepts multiple functions to compute; that makes for very efficient computation of multiple aggregate functions because it makes only one pass through the input.
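
For instance, here is a sketch of one of those aggregates, the largest element in absolute value, expressed with the reduce just defined:

import std.math;

unittest {
   int[] r = [ 10, -14, 3, 5, -23 ];
   // Largest element in absolute value; seed with 0
   int maxAbs = reduce!((a, b) { return abs(b) > abs(a) ? b : a; })(0, r);
   assert(maxAbs == -23);
}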

5.10 Variadic Functions

The traditional “Hello, world!” program on page 1 used the standard library function writeln to print its greeting to the standard output. One interesting detail about writeln is that it accepts any number and types of arguments. There are several ways to define variadic functions in D, catering to different needs; let’s start with the simplest.

5.10.1 Homogeneous Variadic Functions

A homogeneous variadic function accepts any number of arguments of the same type and is defined like this:



import std.algorithm, std.array;

// Computes the average of a set of numbers,
//    passable directly or via an array.
double average(double[] values...) {
   if (values.empty) {
      throw new Exception("Average of zero elements is undefined");
   }
   return reduce!((a, b) { return a + b; })(0.0, values)
      / values.length;
}

unittest {
   assert(average(0) == 0);
   assert(average(1, 2) == 1.5);
   assert(average(1, 2, 3) == 2);
   // Passing arrays and slices works as well
   double[] v = [1, 2, 3];
   assert(average(v) == 2);
}


(Notice the reuse of reduce at its best.) The distinguishing trait of average is the presence of the ellipsis ... following the values parameter, in conjunction with the fact that values has slice type. (If it didn’t, or if values were not the last parameter of average, the ellipsis would have been in error.)

If you invoke average with a slice of double, it’s business as usual, as shown in the unittest’s last line. But thanks to the ellipsis, you can also invoke average with any number of arguments, as long as each is convertible to double. The compiler will automatically arrange all arguments in a slice and will pass it to average.

You could consider that this feature is little more than syntactic sugar rewriting average(a, b, c) into average([a, b, c]). But because of its invocation syntax, a homogeneous variadic function overloads others in its scope. For example:



// For the sake of argument
double average() { return 0; }
double average(double x) { return x; }
// Homogeneous variadic function
double average(double[] values...) { /* as above */ ... }
unittest {
   average(); // Error! Ambiguous call to overloaded function!
}


The presence of the first two overloads of average effectively renders zero- and one-argument calls to the variadic version of average ambiguous. You can disambiguate the calls by explicitly passing a slice to average, for example, average([1, 2]).

If a non-variadic function and a variadic one coexist in the same scope for the same slice type, the non-variadic function is preferred for calls that pass an actual slice:



import std.stdio;

void average(double[]) { writeln("non-variadic"); }
void average(double[]...) { writeln("variadic"); }
void main() {
   average(1, 2, 3);   // Writes "variadic"
   average([1, 2, 3]); // Writes "non-variadic"
}


5.10.2 Heterogeneous Variadic Functions

Getting back to writeln, clearly it must be doing something other than what average did because writeln accepts arguments of different types. To match an arbitrary number and types of arguments, you’d use a heterogeneous variadic function, which is defined like this:



import std.conv, std.stdio;

void writeln(T...)(T args) {
   foreach (arg; args) {
      stdout.rawWrite(to!string(arg));
   }
   stdout.rawWrite("\n");
   stdout.flush();
}


This implementation is a bit crude and inefficient, but it does work. Inside writeln, T is a parameter type tuple—a type that packs together several types—and args is a parameter tuple. The foreach statement detects that args is a parameter tuple, so it generates radically different code from that of a usual foreach invocation (such as one for iterating an array). For example, consider the call



writeln("Writing integer: ", 42, " and array: ", [ 1, 2, 3 ]);


For this call, the code generated by foreach would look like this:



// Approximation of generated code
void writeln(string a0, int a1, string a2, int[new] a3) {
   stdout.rawWrite(to!string(arg0));
   stdout.rawWrite(to!string(arg1));
   stdout.rawWrite(to!string(arg2));
   stdout.rawWrite(to!string(arg3));
   stdout.rawWrite(' '),
   stdout.flush();
}


The module std.conv defines to!string for all types (including string itself, for which to!string is the identity function), so the function works by converting each argument in turn to a string and writing it as raw bytes to the standard output.
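
A few representative conversions, for illustration:

import std.conv;

unittest {
   assert(to!string(42) == "42");
   assert(to!string(4.2) == "4.2");
   assert(to!string("hi") == "hi"); // Identity on strings
}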

You don’t need to use foreach to access the types or the values in a parameter tuple. If n is a compile-time constant integral, T[n] yields the nth type and args[n] yields the nth value in the parameter tuple. To get the number of arguments, use T.length or args.length (both are compile-time constants). If you noticed a resemblance to arrays, you won’t be surprised to find out that T[$ - 1] accesses the last type in T (and args[$ - 1] is an alias for the last value in args). For example:



import std.stdio;

void testing(T...)(T values) {
   writeln("Called with ", values.length, " arguments.");
   // Access each index and each value
   foreach (i, value; values) {
      writeln(i, ": ", typeid(T[i]), " ", value);
   }
}
void main() {
   testing(5, "hello", 4.2);
}


The program prints



Called with 3 arguments.
0: int 5
1: immutable(char)[] hello
2: double 4.2


5.10.2.1 The Type without a Name

The writeln function does too many specific things to be general—it always writes "\n" at the end and then flushes the stream. We'd like to define writeln in terms of a primitive write that just writes each argument in turn:



import std.conv, std.stdio;

void write(T...)(T args) {
   foreach (arg; args) {
      stdout.rawWrite(to!string(arg));
   }
}

void writeln(T...)(T args) {
   write(args, "\n");
   stdout.flush();
}


Note how writeln forwards args plus "\n" to write. When a parameter tuple is forwarded, it is automatically expanded, so the call writeln(1, "2", 3) forwards four, not two, arguments to write. This behavior is a tad irregular and potentially puzzling because pretty much anywhere else in D mentioning one symbol means one value. Here's an example that could surprise even the prepared:



void fun(T...)(T args) {
   gun(args);
}

void gun(T)(T value) {
   writeln(value);
}
unittest {
   fun(1);      // Fine
   fun(1, 2.2); // Error! Cannot find 'gun' taking two arguments!
}


The first call goes through, but the second call does not. You’d expect it would, because any value has a type, so args must have some type that is then inferred by gun. What is happening?

The answer is that indeed all values do have a type that is correctly tracked by the compiler. The culprit is the call gun(args) because wherever a parameter tuple is passed to a function as an argument, the compiler expands it automatically. So even though you wrote gun(args), the compiler always expands that to gun(args[0], args[1], ..., args[$ - 1]). In the second call that means gun(args[0], args[1]), which looks for a two-parameter gun and doesn’t find it—hence the error.

To investigate the matter further, let’s have fun print out the type of args:



void fun(T...)(T args) {
   writeln(typeof(args).stringof);
}


The typeof construct is not a function call; it just yields the type of args, so we needn’t worry about automatic expansion. The .stringof property of any type yields its name, so let’s compile and run the program again. It prints



(int)
(int, double)


So indeed it seems like the compiler tracks the type of parameter tuples and has a string representation for them. However, you cannot explicitly define a parameter tuple—there is no type called (int, double):



// No avail
(int, double) value = (1, 4.2);


This is because tuples are unique that way: they are types that the compiler uses internally but cannot be expressed. There is no way you can sit down and write a parameter tuple type. For that matter, there is no literal value that would yield a parameter tuple (if there were, auto would obviate the need for a type name).

5.10.2.2 Tuple and tuple

Types without a name and values without a literal may seem an interesting concept for a thrill seeker, but quite crippling for a practical programmer. Fortunately (finally! a “fortunately” was due sooner or later), that’s not really a limitation as much as a way to save on syntax. You can eminently express parameter tuple types by using the standard library type Tuple, and parameter tuple values with tuple. Both are to be found in the standard module std.typecons. So a parameter tuple containing an int and a double could be written as



import std.typecons;
unittest {
   Tuple!(int, double) value = tuple(1, 4.2); // Whew
}


or equivalently, given that tuple(1, 4.2) returns a value of type Tuple!(int, double):



auto value = tuple(1, 4.2); // Double whew


Tuple!(int, double) being a type like any other, it doesn’t do the automatic expansion trick, so if you want to expand it into its constituents you must do so explicitly by using Tuple’s expand property. For example, let’s scrap our fun/gun program and rewrite it like this:



import std.stdio, std.typecons;

void fun(T...)(T args) {
   // Create a Tuple to pack all arguments together
   gun(tuple(args));
}

void gun(T)(T value) {
   // Expand the tuple back
   writeln(value.expand);
}

void main() {
   fun(1);      // Fine
   fun(1, 2.2); // Fine
}


Notice how fun packs all of its arguments inside a Tuple and passes it to gun, which expands the received tuple into whatever it contains. The expression value.expand is automatically rewritten into an arguments list containing whatever you put in the Tuple.

The implementation of Tuple has a couple of subtleties but uses means available to any programmer. Checking its definition in the standard library is a useful exercise.
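
Before moving on, it's worth knowing that Tuple offers more than expand. Two of its amenities, indexed access and optional field names, are sketched below; the particular values are made up for this illustration:

import std.typecons;

unittest {
   auto t = tuple(1, 4.2);
   assert(t[0] == 1 && t[1] == 4.2); // Indexed access
   // Fields may also be given names when the type is defined
   Tuple!(int, "count", double, "mean") stats;
   stats.count = 3;
   stats.mean = 2.5;
   assert(stats.count == 3 && stats.mean == 2.5);
}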

5.11 Function Attributes

D functions allow attachment of a number of attributes—specific features that advertise certain characteristics of a function to the programmer and the compiler. Function attributes are checked, so all the user has to do to figure out important information about a function’s behavior is to look at its signature, with the assurance that the guarantee is much more than just a comment or a convention.

5.11.1 Pure Functions

Purity of functions is a notion borrowed from mathematics that has quite a few theoretical and practical advantages. In D, a function is considered pure if returning a result is its only effect and the result depends only on the function’s arguments.

In classic mathematics, all functions are pure because classic mathematics does not use state and mutation. What's √2? Well, 1.4142 something, same as yesterday, tomorrow, or really at any point in time. It could even be argued that √2 had the same value even before humankind discovered roots, algebra, numbers, or before there even was a humankind to appreciate the beauty of mathematics, and will be the same long after the universe dies of heat exhaustion. Mathematical results are forever.

Purity is good for functions, at least sometimes and with qualifications, just as in life. (Also as in life, functional purity is not easy to achieve. Plus, at least according to some, too much of either kind of purity could get really annoying.) But looking at the bright side, a pure function is easier to reason about. You know you can call it at any time and only look at the call to see what effect it’s supposed to have. You can substitute equivalent function calls for values and values for equivalent function calls. You know that bugs in pure functions never cause “shrapnel”—bugs can never affect more than the result of the function itself.

Also, pure functions can run literally in parallel because they don’t interact with the rest of the program except through their result. In contrast, mutation-intensive functions are prone to step on each other’s toes when running in parallel. Even when being run in sequence, their effect may subtly depend on the order in which you call them. Many of us got so used to this way of doing things, we consider the difficulties a part of the job so we hardly raise an eyebrow. But it can be very refreshing and very useful if at least parts of an application obey purity.

You can define a pure function by prefixing its definition with pure:



pure bool leapYear(uint y) {
   return (y % 4) == 0 && (y % 100 || (y % 400) == 0);
}


The function’s signature is



pure bool leapYear(uint y);


and gives the user a guarantee that, for example, leapYear doesn't write to the standard output. In addition, just by seeing the signature it's clear that leapYear(2020) returns the same value now and forever.

The compiler is keenly aware of pure, too, and it actually guards against anything that would render leapYear less than pristine. Consider the following change:



pure bool leapYear(uint y) {
   auto result = (y % 4) == 0 && (y % 100 || (y % 400) == 0);
   if (result) writeln(y, " is a leap year!"); // Error!
      // Cannot call impure function writeln from pure function!
   return result;
}


The writeln function is not, and cannot be, pure. If it claimed to be, the compiler would have disabused it of such a pretense. The compiler makes sure a pure function never calls an impure one. That’s why the modified leapYear does not compile. On the other hand, the compiler can successfully verify a function such as daysInYear:



// Certified pure
pure uint daysInYear(uint y) {
   return 365 + leapYear(y);
}


5.11.1.1 pure Is as pure Does

Traditionally, functional languages require absolutely no mutation in order to allow programs to claim purity. D relaxes that requirement by allowing functions to mutate their own private and transitory state. That way, even if mutation is present inside, from the outside the function is still spotless.

Let’s see how that relaxation works. Consider, for example, a naïve implementation of the functional-style Fibonacci function:



ulong fib(uint n) {
   return n < 2 ? n : fib(n - 1) + fib(n - 2);
}


No computer science teacher should ever teach such an implementation of Fibonacci. fib takes exponential time to complete and as such promotes nothing but ignorance of complexity and of the costs of computation, a "cute excuses sloppy" attitude, and SUV driving. You know how bad exponential is? fib(10) and fib(20) take negligible time on a contemporary machine, whereas fib(50) takes 19 minutes. In all likelihood, evaluating fib(1000) will outlast humankind (just not in a good way like √2 does).

Fine, so then what does a “green” functional Fibonacci implementation look like?



ulong fib(uint n) {
   ulong iter(uint i, ulong fib_1, ulong fib_2) {
      return i == n
         ? fib_2
         : iter(i + 1, fib_1 + fib_2, fib_1);
   }
   return iter(0, 1, 0);
}


The revised version takes negligible time to compute fib(50). The implementation now takes O(n) time, and tail call elimination (§ 1.4.2 on page 12) takes care of the space complexity. (It should be mentioned that there are in fact O(log n)-time algorithms that compute Fibonacci; a sketch appears at the end of this section.)

The problem is that the new fib kind of lost its glory. Essentially the revised implementation maintains two state variables in the disguise of function parameters, so we might as well come clean and write the straight loop that iter made unnecessarily obscure:



ulong fib(uint n) {
   ulong fib_1 = 1, fib_2 = 0;
   foreach (i; 0 .. n) {
      auto t = fib_1;
      fib_1 += fib_2;
      fib_2 = t;
   }
   return fib_2;
}


But, alas, this is not functional anymore. Look at all that mutation going on in the loop. One mistaken step, and we fell all the way from the peak of mathematical purity down to the unsophisticatedness of the unwashed masses.

But if we sit for a minute and think, the iterative fib is not that unwashed. If you think of it as a black box, fib always outputs the same thing for a given input, and after all, pure is as pure does. The fact that it uses private mutable state may make it less functional in letter, but not in spirit. Pulling carefully on that thread, we reach a very interesting conclusion: as long as the mutable state in a function is entirely transitory (i.e., allocated on the stack) and private (i.e., not passed along by reference to functions that may taint it), then the function can be considered pure.

And that’s how D defines functional purity: you can use mutation in the implementation of a pure function, as long as it’s transitory and private. You can then put pure in that function’s signature and the compiler will compile it without a hitch:



pure ulong fib(uint n) {
    ... // Iterative implementation
}


The way D relaxes purity is pretty cool because you’re getting the best of both worlds: ironclad functional purity guarantees, and comfortable implementation when mutation is the preferred method.
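
As an aside, here is a sketch of one of those O(log n) algorithms, fast doubling, which relies on the identities fib(2k) = fib(k) * (2 * fib(k + 1) - fib(k)) and fib(2k + 1) = fib(k)^2 + fib(k + 1)^2. All of its mutation is transitory and private, so it qualifies for pure as well; the name fibFast is made up for this sketch:

pure ulong fibFast(uint n) {
   ulong a = 0, b = 1; // Invariant: a == fib(k), b == fib(k + 1)
   // Consume the bits of n from most significant to least
   for (int i = 31; i >= 0; --i) {
      auto t = a * (2 * b - a); // fib(2 * k)
      auto u = a * a + b * b;   // fib(2 * k + 1)
      a = t;
      b = u;                    // k has now doubled
      if ((n >> i) & 1) {       // Bit set: advance one position
         t = a + b;
         a = b;
         b = t;
      }
   }
   return a;
}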

5.11.2 The nothrow Function Attribute

The nothrow attribute attached to a function conveys the information that the function will never throw an exception. Just like pure, nothrow is compile-time-checked. For example:



import std.stdio;

nothrow void tryLog(string msg) {
   try {
      stderr.writeln(msg);
   } catch (Exception) {
      // Ignore exception
   }
}


The tryLog function does a best-effort attempt at logging a message. If an exception is thrown, it is silently ignored. This makes tryLog usable in critical sections of code. In some circumstances it would be silly for some important transaction to fail just because a log message couldn’t be written. Code with transactional semantics relies critically on certain portions never throwing, and nothrow is the way you can ensure such facts statically.

The semantic checking of nothrow functions ensures that an exception can never leak out of the function: essentially, every statement in the function must either never throw (e.g., be a call to another nothrow function) or be nested inside a try statement that swallows the exception. To illustrate the former case:



nothrow void sensitive(Widget w) {
   tryLog("Starting sensitive operation");
   try {
      w.mayThrow();
      tryLog("Sensitive operation succeeded");
   } catch (Exception) {
      tryLog("Sensitive operation failed");
   }
}


The first call to tryLog needn’t be inside a try statement because the compiler already knows it can’t throw. Similarly, the call inside the catch clause does not need to be protected by an extra try statement.

What is the relationship between pure and nothrow? It might appear that they are entirely orthogonal, but there may be a certain degree of correlation. At least in the standard library, many mathematical functions are both pure and nothrow, for example, most transcendental functions (exp, sin, cos, etc.).
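
By way of illustration, a function can carry both attributes, and the compiler checks each guarantee independently; this particular function is made up for the example:

pure nothrow uint sumOfSquares(uint a, uint b) {
   // No I/O, no global state, nothing thrown: both attributes check out
   return a * a + b * b;
}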

5.12 Compile-Time Evaluation

In keeping with the saying that good things come to those who wait (or read patiently), this last section discusses a very interesting feature of D. The best part is, you don’t even need to learn much to use this feature gainfully.

Let’s use an example that’s large enough to be meaningful. Suppose you want to define the ultimate random number generator library. There are many random number generators out there, among them the fast and well-studied linear congruential generators [35, § 3.2.1, pages 10–26]. Such generators have three integral parameters: the modulus m > 0, the multiplier 0 < a < m, and the increment3 0 < c < m. Starting from an arbitrary seed 0 x0 < m, the linear congruential generator yields pseudorandom numbers using the following recurrence formula:

x_{n+1} = (a * x_n + c) mod m

Coding such an algorithm is simple—all you need is to keep the state defined by m, a, c, and x_n and define a getNext function that changes x_n into x_{n+1} and returns it.
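
In code, such a generator might look like the following sketch (the struct name and getNext are illustrative only; the real deal from the standard library appears at the end of this section):

// Minimal sketch of a linear congruential generator
struct LinearCongruential {
   ulong m, a, c, x; // x holds the current state x_n
   ulong getNext() {
      x = (a * x + c) % m;
      return x;
   }
}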

But there is a rub. Not all combinations of m, a, and c lead to good random number generators. For starters, if a = 1 and c = 1, the generator gives the sequence 0, 1, ..., m - 1, 0, 1, ..., m - 1, 0, 1, ..., which is admittedly quite non-random.

With larger values of a and c such obvious risks are avoided, but a subtler problem appears: periodicity. Because of the modulus operator the generated number is always between 0 and m - 1, so it's good to make m as large as possible (usually it's a power of 2 to match the machine word size, in which case the mod comes for free). The problem is that the generated sequence may have a period much smaller than m. Say we operate with uint and choose m = 2^32 so we don't even need a modulus operation; then a = 210, c = 123, and some crazy value for x_0, such as 1,780,588,661. Let's run this program:



import std.stdio;

void main() {
   enum uint a = 210, c = 123, x0 = 1_780_588_661;
   auto x = x0;
   foreach (i; 0 .. 100) {
      x = a * x + c;
      writeln(x);
   }
}


Instead of a colorful random display of digits, we see something rather surprising:



261464181
3367870581
2878185589
3123552373
3110969461
468557941
3907887221
317562997
2263720053
2934808693
2129502325
518889589
1592631413
3740115061
3740115061
3740115061
...


The generator starts with great aplomb. At least to the untrained eye, it does a good job of generating random numbers. But it doesn't take more than 14 steps to stall at a fixed point: through one of those strange coincidences that only math is capable of, 3740115061 is (and was and will be) exactly equal to (3740115061 * 210 + 123) mod 2^32. That's a period of one, the worst possible!

So we need to make sure that m, a, and c are chosen such that the generator has a large period. Investigating the matter further, it turns out that the conditions for generating a sequence of period m (the largest period possible) are the following:

  1. c and m are relatively prime.
  2. a – 1 is divisible by all prime factors of m.
  3. If m is a multiple of 4, then a - 1 is a multiple of 4, too.

The relative primality of c and m can be easily checked by comparing their greatest common divisor against 1. To compute the greatest common divisor, we use Euclid's algorithm:



// Implementation of Euclid's algorithm
ulong gcd(ulong a, ulong b) {
   while (b) {
      auto t = b;
      b = a % b;
      a = t;
   }
   return a;
}


Euclid expressed his algorithm by using subtraction instead of modulus. The modulus version takes fewer iterations, but on today’s machines, % can be quite slow, something that Euclid might have had in mind.
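
For comparison, here is a sketch of the subtraction-based formulation; the helper name gcdSub is made up for this illustration:

// Euclid's original formulation, with subtraction instead of %
ulong gcdSub(ulong a, ulong b) {
   if (a == 0) return b;
   if (b == 0) return a;
   while (a != b) {
      if (a > b) a -= b;
      else b -= a;
   }
   return a;
}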

The second test is a tad more difficult to implement. We could write a function factorize that returns all prime factors of a number together with their powers and then use it, but factorize would do more than strictly needed. Going with the simplest design that could possibly work, a good choice is to write a function primeFactorsOnly(n) that returns the product of n's distinct prime factors, without the powers. Then the requirement boils down to checking (a - 1) % primeFactorsOnly(m) == 0. So let's implement primeFactorsOnly.

There are many ways to go about getting the prime factors of some number n. A simple one would be to generate prime numbers p1, p2, p3, . . . and check in turn whether pk divides n, in which case pk is multiplied to an accumulator r. When pk has become greater than n, the accumulator contains the desired answer: the product of all of n’s prime factors, each taken once.

(I know you are asking yourself what this has to do with compile-time evaluation. It does. Please bear with me.)

A simpler version would be to do away with generating prime numbers and simply evaluate n mod k for increasing values of k starting at 2: 2, 3, 5, 7, 9, . . . Whenever k divides n, the accumulator is multiplied by k and then n is “depleted” of all powers of k, that is, n is reassigned n/k for as long as k divides n. That way, we recorded the divisor k, and we also reduced n until it became irreducible by k. That seems like a wasteful method, but note that generating prime numbers would probably entail comparable work, at least in a straightforward implementation. An implementation of this idea would look like this:



ulong primeFactorsOnly(ulong n) {
   ulong accum = 1;
   ulong iter = 2;
   for (; n >= iter * iter; iter += 2 - (iter == 2)) {
      if (n % iter) continue;
      accum *= iter;
      do n /= iter; while (n % iter == 0);
   }
   return accum * n;
}


The update iter += 2 - (iter == 2) bumps iter by 2 except when iter is 2, in which case the update brings iter to 3. That way iter spans 2, 3, 5, 7, 9, and so on. It would be wasteful to check any even number such as 4 because 2 has been tested already and all powers of 2 have been extracted out of n.

Why does the iteration go on while n >= iter * iter as opposed to n >= iter? The answer is a bit subtle. If iter is greater than √n and different from n itself, then we can be sure iter can't be a divisor of n: if it were, there would have to be some multiplier k such that n == k * iter, but all divisors smaller than iter have already been tried, so k would have to be greater than iter, and consequently k * iter would be greater than n, which makes the equality impossible.

Let’s unittest the primeFactorsOnly function:



unittest {
   assert(primeFactorsOnly(100) == 10);
   assert(primeFactorsOnly(11) == 11);
   assert(primeFactorsOnly(7 * 7 * 11 * 11 * 15) == 7 * 11 * 15);
   assert(primeFactorsOnly(129 * 2) == 129 * 2);
}


To conclude, we need a small wrapper that performs the three checks against the three candidate linear congruential generator parameters:



bool properLinearCongruentialParameters(ulong m, ulong a, ulong c) {
   // Bounds checking
   if (m == 0 || a == 0 || a >= m || c == 0 || c >= m) return false;
   // c and m are relatively prime
   if (gcd(c, m) != 1) return false;
   // a - 1 is divisible by all prime factors of m
   if ((a - 1) % primeFactorsOnly(m)) return false;
   // If m is a multiple of 4, then a - 1 is a multiple of 4, too
   if (m % 4 == 0 && (a - 1) % 4 != 0) return false;
   // Passed all tests
   return true;
}


Let’s unittest a few popular values of m, a, and c:



unittest {
   // Our broken example
   assert(!properLinearCongruentialParameters(
      1UL << 32, 210, 123));
   // Numerical Recipes book [48]
   assert(properLinearCongruentialParameters(
      1UL << 32, 1664525, 1013904223));
   // Borland C/C++ compiler
   assert(properLinearCongruentialParameters(
      1UL << 32, 22695477, 1));
   // glibc
   assert(properLinearCongruentialParameters(
      1UL << 32, 1103515245, 12345));
   // ANSI C
   assert(properLinearCongruentialParameters(
      1UL << 32, 134775813, 1));
   // Microsoft Visual C/C++
   assert(properLinearCongruentialParameters(
      1UL << 32, 214013, 2531011));
}


It looks like properLinearCongruentialParameters works like a charm, so we’re done with all the details of testing the soundness of a linear congruential generator. It’s about time to stop stalling and fess up. What does all that primality and factorization stuff have to do with compile-time evaluation? Where’s the meat? Where are the templates, macros, or whatever they call them? The clever static ifs? The mind-blowing code generation and expansion?

Well, here’s the truth: you just saw everything there is to be seen about compile-time function evaluation. Given any constant numbers m, a, and c, you can evaluate properLinearCongruentialParameters during compilation without any change in that function or the functions it calls. The D compiler embeds an interpreter that evaluates D functions during compilation—with arithmetic, loops, mutation, early returns, arrays, and even transcendental functions.

All you need to do is to clarify to the compiler that the evaluation must be performed at compile time. There are several ways to do that:



unittest {
   enum ulong m = 1UL << 32, a = 1664525, c = 1013904223;
   // Method 1: use static assert
   static assert(properLinearCongruentialParameters(m, a, c));
   // Method 2: assign the result to an enum
   enum proper1 = properLinearCongruentialParameters(m, a, c);
   // Method 3: assign the result to a static value
   static proper2 = properLinearCongruentialParameters(m, a, c);
}


We haven’t looked closely at structs and classes yet, but just to anticipate a little, the typical way you’d use properLinearCongruentialParameters would be inside a struct or a class that defines a linear congruential generator, for example:



struct LinearCongruentialEngine(UIntType,
      UIntType a, UIntType c, UIntType m) {
   static assert(properLinearCongruentialParameters(m, a, c),
      "Incorrect instantiation of LinearCongruentialEngine");
   ...
}


In fact, the lines above were copied from the eponymous struct found in the standard module std.random.

There are two interesting consequences of moving the test from runtime to compile time. First, LinearCongruentialEngine could have deferred the test to runtime, for example, by placing it in its constructor. As a general rule, however, it is better to discover errors sooner rather than later, particularly in a library that has little control over how it is being used. The static test does not make erroneous instantiations of LinearCongruentialEngine signal the error; it makes them nonexistent. Second, the code using compile-time constants has a good chance to be faster than code that uses regular variables for m, a, and c. On most of today's processors, literal constants can be made part of the instruction stream so loading them causes no extra memory access at all. And let's face it—linear congruential engines aren't the most random out there, so the primary reason you'd want to use one is speed.

Interpretation is a couple of orders of magnitude slower than running compiled code, but it is still much faster and more scalable than traditional metaprogramming carried out with C++ templates. Besides, within reason, compile-time activity is in a way "free."

At the time of this writing, the interpreter has certain limitations. Allocating class objects and memory in general is not allowed (though built-in arrays work). Static data, inline assembler, and unsafe features such as unions and certain casts are also disallowed. But the limits of what can be done during interpretation form an envelope under continuous pressure. The plan is to allow everything in the safe subset of D to be interpretable during compilation. All in all, the ability to interpret code during compilation is recent and opens very exciting possibilities that deserve further exploration.
