Chapter 34. Nested Function

Compose functions by nesting function calls as arguments of other calls.

image

34.1 How It Works

By representing a DSL clause as a Nested Function, you’re able to reflect the hierarchic nature of the language in a way that’s mirrored in the host language, not just in a formatting convention.

A notable property of Nested Function is the way it affects the evaluation order of its arguments. Function Sequence and Method Chaining both evaluate the functions in a left-to-right sequence. Nested Function evaluates the arguments of a function before the enclosing function itself. I find this most memorable with the “Old MacDonald” example: To sing the chorus, you type o(i(e(i(e())))). This evaluation order has an impact on both how to use Nested Function and when to choose it instead of alternatives.
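To make the evaluation order concrete, here is a minimal Java sketch. The class name, the log, and the dummy int return values are assumptions made purely for illustration: each function records its own letter when it runs, so the log shows the order in which the calls are actually evaluated.

```java
// Hypothetical demo: every function appends its letter when it executes,
// which happens only after its argument has already been evaluated.
class OldMacDonald {
    static final StringBuilder log = new StringBuilder();

    static int e()        { log.append("E"); return 0; }
    static int e(int arg) { log.append("E"); return 0; }
    static int i(int arg) { log.append("I"); return 0; }
    static int o(int arg) { log.append("O"); return 0; }

    static String chorus() {
        log.setLength(0);
        // Written outermost-first (o, i, e, i, e), evaluated innermost-first.
        o(i(e(i(e()))));
        return log.toString();
    }
}
```

Calling OldMacDonald.chorus() yields the letters in the sung order, even though the source text reads in the reverse order.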

Evaluating the enclosing function last can be very handy, in that it provides a built-in context to work with the arguments. Consider defining a computer processor configuration:

processor(cores(2), speed(2500), i386())

The nice thing here is that the argument functions can return fully formed values which the processor function can then assemble into its return value. Since the processor function evaluates last, we don’t need to worry about the stopping problem of Method Chaining, nor do we need to have the Context Variable that we need for Function Sequence.

With mandatory elements in the grammar, along the lines of parent::= first second, Nested Function works particularly well. A parent function can define exactly the arguments required in the child functions and, with a statically typed language, can also define the return types, which enables IDE autocompletion.

One issue with function arguments is how to label them so as to make them readable. Consider indicating the size and speed of a disk. The natural programming response is disk(150, 7200) but this isn’t terribly readable as there’s no indication what the numbers mean, unless you have a language with keyword arguments. A way to deal with this is to use a wrapping function that does nothing other than provide a name: disk(size(150), speed(7200)). In the simplest form of this, the wrapping function just returns the argument value, representing pure syntactic sugar. It also means that there’s no enforcement of the meaning of these functions—a call to disk(speed(7200), size(150)) could easily result in a very slow disk. You can avoid this by making the nested functions return intermediate data, such as a builder or token—although that is more effort to set up.
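A small Java sketch of the simplest form may help here. The class names PlainDisk and DiskSugar are hypothetical; the wrappers return their argument unchanged, so the parent still tells its arguments apart by position alone.

```java
// Hypothetical model object holding the two values.
class PlainDisk {
    final int size;
    final int speed;
    PlainDisk(int size, int speed) { this.size = size; this.speed = speed; }
}

class DiskSugar {
    // Pure syntactic sugar: each wrapper just hands back its argument.
    static int size(int value)  { return value; }
    static int speed(int value) { return value; }

    // The parent only sees two ints, so the labels are not enforced.
    static PlainDisk disk(int size, int speed) { return new PlainDisk(size, speed); }

    static PlainDisk intended() { return disk(size(150), speed(7200)); }

    // Compiles without complaint, but size becomes 7200 and speed 150.
    static PlainDisk swapped()  { return disk(speed(7200), size(150)); }
}
```

The swapped call is accepted by the compiler because both wrappers return a plain int, which is exactly the weakness that intermediate data such as tokens is meant to address.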

Optional arguments can also present problems. If the base language supports default arguments for functions, you can use these for the optional case. If you don’t have this, one approach is to define different functions for each combination of the optional arguments. If you only have a couple of cases, this is tedious but reasonable. As the number of optional arguments increases, so does the tediousness (but not the reasonableness). One way out of this problem is to use intermediate data again—tokens can be a particularly effective choice.

If your language supports it, a Literal Map is often a good way out of these quandaries. In this case, you get just the right data structure to deal with the issue. The only problem is that C-like languages don’t usually support Literal Map.

With multiple arguments in the same call, a varargs parameter is the best choice if the host language supports it. You can also think of this as a nested Literal List. Multiple arguments of different kinds end up being like optional arguments, with the same complications.

The worst case of this is a grammar like parent::= (this | that)*. The issue here is that, unless you have keyword arguments, the only way to identify the arguments is through their position and type. This can make picking out which argument is which messy—and downright impossible if this and that have the same types. Once this happens, you are forced into either returning intermediate results, or using a Context Variable. Using a Context Variable is particularly difficult here since the parent function isn’t evaluated till the end, forcing you to use the broader context of the language to properly set up the Context Variable.

In order to keep the DSL readable, you usually want Nested Functions to be bare function calls. This implies you either need to make them global functions, or use Object Scoping. Since global functions are problematic, I usually try to use Object Scoping if I can. However, global functions can often be much less problematic in Nested Function, because the biggest problem with global functions is when they come with a global parsing state. A global function that just returns a value, or a static constant like DayOfWeek.MONDAY, is often a good choice.

34.2 When to Use It

One of the great strengths—and weaknesses—of Nested Function is the order of evaluation. With Nested Function, the arguments are evaluated before the parent function (unless you use Closures for arguments). This is very useful for building up a hierarchy of values because you can have the arguments create fully formed model objects to be assembled by the parent function. This can avoid much of the mucking about with replacements and intermediate data that you get with Function Sequence and Method Chaining.

Conversely, this evaluation order causes problems in a sequence of commands, leading to the Old MacDonald problem: o(i(e(i(e())))). So, for a sequence that you want to read left to right, Function Sequence or Method Chaining are usually a better bet. For precise control of when to evaluate multiple arguments, use Nested Closure.

Nested Function also often struggles with optional arguments and multiple varied arguments. Nested Function very much expects you to say what you want and in the precise order you want, so if you need greater flexibility you’ll need to look to Method Chaining or a Literal Map. A Literal Map is often a good choice as it allows you to get the arguments sorted out before calling the parent while giving you the flexibility of ordering and optionality of the arguments, particularly with a hash argument.

Another disadvantage of Nested Function is the punctuation, which usually relies on matching brackets and putting commas in the right place. At its worst, this can look like a disfigured Lisp, with all the parentheses and added warts. This is less of an issue for DSLs aimed at programmers, who get more used to these warts.

Name clashes are less of a problem here than with Function Sequence, since the parent function provides the context to interpret the nested function call. As a result, you can happily use “speed” for processor speed and disk speed and use the same function as long as the types are compatible.

34.3 The Simple Computer Configuration Example (Java)

Here’s the common example of stating the configuration of a simple computer:

image

For this case, each clause in the script returns a Semantic Model object, so I can use the nested evaluation order to build up the entire expression without using Context Variables. I’ll start from the bottom, looking at the processor clause.

image

I’ve defined the builder elements as static methods and constants on a builder class. By using Java’s static import feature, I can use bare calls to use them in the script. (Is it only me who finds it confusing that we call them “static imports” but have to declare them with import static?)

The cores and speed methods are pure syntactic sugar—only there to help readability (particularly if you skipped dessert). I toy with calling something that’s pure syntactic sugar a “sucratic” function, but maybe that is a step too far for even my neologizing habits. In this case, the sugar also helps with the disk speed—if they needed different return types this could be a problem, but it isn’t in this case.

The disk clause has two optional arguments. Since there’s only a couple, I’ll nap for a while as I write out the combination of functions.

image

For the top-level computer clause, I use a varargs parameter to handle the multiple disks.

image
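Since the listings for this example are not reproduced here, the following is a rough Java sketch of how the pieces described above could fit together. Every class, enum, and constant name is an assumption for illustration (including UNKNOWN_SPEED and the choice of speed and interface as the two optional disk arguments); in a real script the builder members would be brought in with import static so the calls appear bare.

```java
import java.util.Arrays;
import java.util.List;

// Minimal stand-ins for the Semantic Model (all names hypothetical).
enum ProcessorType { i386, amd64 }
enum DiskInterface { SATA, IDE }

class Processor {
    final int cores, speed; final ProcessorType type;
    Processor(int cores, int speed, ProcessorType type) {
        this.cores = cores; this.speed = speed; this.type = type;
    }
}

class Disk {
    static final int UNKNOWN_SPEED = -1;
    final int size, speed; final DiskInterface iface;
    Disk(int size, int speed, DiskInterface iface) {
        this.size = size; this.speed = speed; this.iface = iface;
    }
}

class Computer {
    final Processor processor; final List<Disk> disks;
    Computer(Processor processor, List<Disk> disks) {
        this.processor = processor; this.disks = disks;
    }
}

class ComputerBuilder {
    // Constants and pass-through sugar functions used as bare names in a script.
    static final ProcessorType i386 = ProcessorType.i386;
    static final DiskInterface SATA = DiskInterface.SATA;
    static int cores(int value) { return value; }
    static int speed(int value) { return value; }   // shared by processor and disk
    static int size(int value)  { return value; }

    static Processor processor(int cores, int speed, ProcessorType type) {
        return new Processor(cores, speed, type);
    }

    // One overload per combination of the two optional disk arguments.
    static Disk disk(int size) { return disk(size, Disk.UNKNOWN_SPEED, null); }
    static Disk disk(int size, int speed) { return disk(size, speed, null); }
    static Disk disk(int size, DiskInterface iface) { return disk(size, Disk.UNKNOWN_SPEED, iface); }
    static Disk disk(int size, int speed, DiskInterface iface) { return new Disk(size, speed, iface); }

    // Varargs collects any number of disks after the mandatory processor.
    static Computer computer(Processor processor, Disk... disks) {
        return new Computer(processor, Arrays.asList(disks));
    }

    // What a script could look like once the static members are in scope.
    static Computer example() {
        return computer(
            processor(cores(2), speed(2500), i386),
            disk(size(150)),
            disk(size(75), speed(7200), SATA));
    }
}
```

Each nested call returns a finished value, so the parent functions only assemble their arguments and no Context Variable is needed.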

I’m usually a big fan of using Object Scoping to avoid littering the code with global functions and Context Variables. However, with static imports and Nested Function, I can use static elements without introducing global trash.

34.4 Handling Multiple Different Arguments with Tokens (C#)

One of the trickier areas to use Nested Function is where you have multiple arguments of different kinds. Consider a language for defining properties of an onscreen box:

image

In this situation, we can have any number of a wide variety of properties to set. There’s no strong reason to force an order in declaring the properties, so the usual style of argument identification in C# (position) doesn’t work too well. For this example, I’ll explore using tokens to identify the arguments to compose them into the structure.

Here’s the target model object:

image

The various contained functions all return the token data type, which looks like this:

image

I’m using Object Scoping and define the clauses of the DSL as functions on the builder supertype.

image

I’m only showing a couple of them, but I’m sure you can deduce from these what the rest look like.

The parent function now just runs through the argument results and assembles a box.

image
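The token approach can also be sketched in Java, the chapter's other host language. This is not the C# listing: the property names, the BoxToken class, and the builder below are assumptions chosen to illustrate the idea of identifying arguments by token kind rather than by position.

```java
// Hypothetical target model with a handful of settable properties.
class Box {
    int topBorder, bottomBorder, leftMargin;
    boolean transparent;
}

// A token pairs a kind with a value so the parent can tell arguments apart.
class BoxToken {
    enum Kind { TOP_BORDER, BOTTOM_BORDER, LEFT_MARGIN, TRANSPARENT }
    final Kind kind;
    final int value;
    BoxToken(Kind kind, int value) { this.kind = kind; this.value = value; }
}

// Object Scoping: the DSL clauses live on a builder supertype,
// so a script subclass can use them as bare calls.
abstract class BoxBuilder {
    protected final BoxToken transparent = new BoxToken(BoxToken.Kind.TRANSPARENT, 1);
    protected BoxToken topBorder(int width)    { return new BoxToken(BoxToken.Kind.TOP_BORDER, width); }
    protected BoxToken bottomBorder(int width) { return new BoxToken(BoxToken.Kind.BOTTOM_BORDER, width); }
    protected BoxToken leftMargin(int width)   { return new BoxToken(BoxToken.Kind.LEFT_MARGIN, width); }

    // The parent function walks the tokens in whatever order they arrive.
    protected Box box(BoxToken... tokens) {
        Box result = new Box();
        for (BoxToken token : tokens) {
            switch (token.kind) {
                case TOP_BORDER:    result.topBorder = token.value; break;
                case BOTTOM_BORDER: result.bottomBorder = token.value; break;
                case LEFT_MARGIN:   result.leftMargin = token.value; break;
                case TRANSPARENT:   result.transparent = true; break;
            }
        }
        return result;
    }
}

class BoxScript extends BoxBuilder {
    Box build() {
        // Any order, any subset of properties.
        return box(leftMargin(3), transparent, topBorder(2));
    }
}
```

Because each argument carries its own kind, the order of the clauses no longer matters and omitted properties simply keep their defaults.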

34.5 Using Subtype Tokens for IDE Support (Java)

Most languages differentiate between function arguments by their position. So we might set the size and speed of a disk with a function like disk(150, 7200). That bare function isn’t too readable, so in the computer configuration example I wrapped the numbers with simple functions to get disk(size(150), speed(7200)). In that code, the functions just return their arguments, which aids readability but doesn’t prevent someone typing the erroneous disk(speed(7200), size(150)).

Using simple tokens, like in the Box example, provides a mechanism for error checking. By returning a token of [size, 150] you can use the token type to check that you have the right argument in the right position, or indeed make the arguments work in any order.

Checking is all very well, but in a statically typed language with a modern IDE, you want to go further. You want autocompletion popups to force you to put size before speed. By using subclasses, you can pull this off.

In the tokens above, the token type was a property of the token. An alternative is to create a different subtype for each token; I can then use the subtype in the parent function definition.

Here’s the short script I want to support:

image

Here’s the target model object:

image

To handle size and speed, I create a general integer token with subclasses for the two kinds of clauses.

image

I can then define static functions in a builder, with the right arguments.

image
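As a hedged illustration of the subtype idea, here is one way it could look in Java. The names IntegerToken, SizeToken, SpeedToken, CheckedDisk, and CheckedDiskBuilder are assumptions for this sketch rather than the names used in the original listing.

```java
// General integer token; the subclasses exist only to carry type information.
class IntegerToken {
    final int value;
    IntegerToken(int value) { this.value = value; }
}
class SizeToken extends IntegerToken {
    SizeToken(int value) { super(value); }
}
class SpeedToken extends IntegerToken {
    SpeedToken(int value) { super(value); }
}

// Hypothetical target model object.
class CheckedDisk {
    final int size, speed;
    CheckedDisk(int size, int speed) { this.size = size; this.speed = speed; }
}

class CheckedDiskBuilder {
    static SizeToken size(int value)   { return new SizeToken(value); }
    static SpeedToken speed(int value) { return new SpeedToken(value); }

    // The parameter types pin each clause to its position, so
    // disk(speed(7200), size(150)) is rejected by the compiler.
    static CheckedDisk disk(SizeToken size, SpeedToken speed) {
        return new CheckedDisk(size.value, speed.value);
    }
}
```

The parent signature now documents and enforces the grammar, which is what lets an IDE offer only the valid clause at each argument position.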

With this setup, the IDE will suggest the right functions in the right places, and I’ll see comforting red squigglies should I do any reckless typing.

(Another way to approach adding static typing is to use generics, but I’ll leave that as an exercise for the reader.)

34.6 Using Object Initializers (C#)

If you’re using C#, then the most natural way to handle a pure hierarchy of data is to use object initializers.

image

This can work with a simple set of model classes.

image

You can think of object initializers as Nested Functions that can take keyword arguments (like a Literal Map) which are restricted to object construction. You can’t use them for everything, but they can come in handy for situations like this.

34.7 Recurring Events (C#)

I used to live in the South End of Boston. There was much to like about living in a downtown area of the city, close to restaurants and other ways to pass the time and spend my money. There were irritations, however, and one of them was street cleaning. On the first and third Monday of the month between April and October, they would clean the streets near my apartment and I had to be sure I didn’t leave my car there. Often I forgot and I got a ticket.

The rule for my street was that the cleaning occurred on the first and third Monday of the month between April and October. I could write a DSL expression for this.

image

This example combines Method Chaining with Nested Function. Usually when I use Nested Function, I prefer to combine it with Object Scoping, but in this case the functions that I’m nesting just return a value so I don’t really feel a strong need to use Object Scoping.

34.7.1 Semantic Model

Recurring events are a recurring event in software systems. You often want to schedule things on particular combinations of dates. The way I think of them these days is that they are a Specification [Evans DDD] of dates. We want code that can tell us if a given date is included on a schedule. We do this by defining a general specification interface—which we can make generic, as specifications are useful in all sorts of situations.

image

When building a specification model for a particular type, I like to identify small building blocks that I can combine together. One small building block is the notion of a particular period in a year, such as between April and October.

image

Another element is the notion of the first Monday in the month. This class is a little more tricky as I have to walk through sample dates in the month to see which one is the first.

image

To walk through the days in a month, this specification makes use of a special enumerator. I set the enumerator with a particular month and year.

image

It implements the IEnumerator methods.

image

And also implements IEnumerable to allow it to be used in a for-each loop.

image

Finally, we have a very simple Month class, which also acts as a specification.

image

These are useful building blocks, but they can’t do much on their own. To really make them sing and dance, I need to be able to combine them into logical expressions, which I do with a couple more specifications.

image
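For readers who want to see the shape of this model in code, here is a compact Java sketch of the building blocks described above. It is an approximation under stated assumptions: the class and method names are mine, java.time.LocalDate stands in for the C# date type, and a plain loop replaces the dedicated month enumerator.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.Month;

// Generic specification: does the candidate satisfy the rule?
interface Specification<T> {
    boolean includes(T candidate);
}

// A period within any year, such as April through October (inclusive).
class PeriodInYear implements Specification<LocalDate> {
    private final Month start, end;
    PeriodInYear(Month start, Month end) { this.start = start; this.end = end; }
    public boolean includes(LocalDate date) {
        return date.getMonth().compareTo(start) >= 0 && date.getMonth().compareTo(end) <= 0;
    }
}

// The n-th given weekday of a month, such as the first Monday.
class DayInMonth implements Specification<LocalDate> {
    private final int index;            // 1 = first, 3 = third
    private final DayOfWeek dayOfWeek;
    DayInMonth(int index, DayOfWeek dayOfWeek) { this.index = index; this.dayOfWeek = dayOfWeek; }
    public boolean includes(LocalDate date) {
        if (date.getDayOfWeek() != dayOfWeek) return false;
        // Walk from the 1st of the month up to the date, counting matching weekdays.
        int count = 0;
        for (LocalDate d = date.withDayOfMonth(1); !d.isAfter(date); d = d.plusDays(1)) {
            if (d.getDayOfWeek() == dayOfWeek) count++;
        }
        return count == index;
    }
}

// Logical combinators over any two specifications of the same type.
class AndSpecification<T> implements Specification<T> {
    private final Specification<T> left, right;
    AndSpecification(Specification<T> left, Specification<T> right) { this.left = left; this.right = right; }
    public boolean includes(T candidate) { return left.includes(candidate) && right.includes(candidate); }
}

class OrSpecification<T> implements Specification<T> {
    private final Specification<T> left, right;
    OrSpecification(Specification<T> left, Specification<T> right) { this.left = left; this.right = right; }
    public boolean includes(T candidate) { return left.includes(candidate) || right.includes(candidate); }
}
```

Combining an OrSpecification of two DayInMonth instances with a PeriodInYear through an AndSpecification gives the street cleaning rule.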

I trust you can figure out how to implement a NotSpecification.

One thing I don’t like about this model is my usage of the DateTime class. The problem is that DateTime has subsecond precision, but I’m only working at day precision. Using overprecise temporal data types is very common, because usually libraries push us in that direction. However, they can easily result in awkward bugs when you compare two DateTimes that are different below the level of precision you care about. If I were doing this on a real project, I’d make a proper Date class with the correct precision.

34.7.2 The DSL

Here’s the DSL text for my old street cleaning schedule:

image

Like most realistic DSLs, it uses a combination of internal DSL techniques, namely a mix of Method Chaining and Nested Function. I’m not going to worry too much about the Method Chaining here; instead, I’ll concentrate on the way Nested Function is used. Since each Nested Function returns a simple value, I don’t have a strong need for Object Scoping as they won’t require any Context Variables. As a result, I’ll use static methods. As I’m in C#, all the static methods need to be prefixed with their class name. This reads pretty well, although it does add noise compared to an Object Scoping approach.

Two of the Nested Functions are calls to return a simple value. DayOfWeek.Monday is actually built into the .NET libraries. I added Month.April and friends myself.

image

The calls on Schedule are a bit different. The initial use of Schedule.First is an example of a common feature in these languages—using a bare function to create a starting object to begin the chaining. Schedule here is an Expression Builder. It’s not called “builder” because I think it reads better as just “schedule.”

image

Like most Expression Builders, the schedule builds up a content, which is a specification.

image

Notice how the initial call returns a schedule that wraps the first element in the specification. The later call to Third is the same (except for the parameter). I would usually argue against writing different methods for something that would be better handled as a parameter, but this is yet another example where you have different rules of good programming when you use an Expression Builder.

It’s the Method Chaining that actually builds up the composite structure. Here’s the interestingly named And method:

image

We say “first and third Monday” in our language, but in terms of the specification, it’s the first or third Monday that matches the Boolean condition. It’s an interesting example of where the DSL is opposite to the model in order for both to read naturally.

The period at the end is similarly assembled using Method Chaining calls.

image

Here I use a Context Variable to properly build up the period.
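The builder described in this section could be approximated in Java as follows. This is a sketch under several assumptions: the method names from and till are my guesses for the period clauses, java.util.function.Predicate stands in for the specification interface, and the weekday test uses day-of-month arithmetic instead of walking the month.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.Month;
import java.util.function.Predicate;

// Expression Builder: accumulates a date predicate as its content.
class Schedule {
    private Predicate<LocalDate> content;
    private Month periodStart;   // Context Variable: set by from(), consumed by till()

    private Schedule(Predicate<LocalDate> content) { this.content = content; }

    // Static functions create the starting object for the chain.
    static Schedule first(DayOfWeek day) { return new Schedule(nthInMonth(1, day)); }
    static Schedule third(DayOfWeek day) { return new Schedule(nthInMonth(3, day)); }

    // "first and third Monday" reads as AND but is an OR in the model.
    Schedule and(Schedule other) {
        content = content.or(other.content);
        return this;
    }

    Schedule from(Month start) {
        periodStart = start;
        return this;
    }

    Schedule till(Month end) {
        final Month start = periodStart;
        content = content.and(d ->
            d.getMonth().compareTo(start) >= 0 && d.getMonth().compareTo(end) <= 0);
        return this;
    }

    boolean includes(LocalDate date) { return content.test(date); }

    // The n-th occurrence of a weekday falls on days 7(n-1)+1 through 7n.
    private static Predicate<LocalDate> nthInMonth(int n, DayOfWeek day) {
        return d -> d.getDayOfWeek() == day && (d.getDayOfMonth() - 1) / 7 == n - 1;
    }
}
```

With that in place, the schedule can be written as Schedule.first(DayOfWeek.MONDAY).and(Schedule.third(DayOfWeek.MONDAY)).from(Month.APRIL).till(Month.OCTOBER). The nested calls only return values, while the chained calls do the assembling.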

This example uses simple static methods for the Nested Functions. Would it benefit by getting rid of the class names? I think it would read better to say Monday rather than DayOfWeek.Monday. Object Scoping would provide this at the cost of requiring the inheritance relationship. In Java, I could use static imports. The gain isn’t huge but would probably be worthwhile.
