Chapter 15. Macro

Transform input text into a different text before language processing using Templated Generation.


A language has a fixed set of forms and structure that it can process. At times, we see a way to add abstraction to a language by manipulating its input text with a purely textual transformation before that text is parsed by the compiler or interpreter for that language. Since we know the final form we’d like to see, it makes sense to describe the transformation by writing the desired output, with callouts for any parametrizable values.

A Macro allows you to define these transformations, either in a purely textual form or as a syntactic macro that understands the syntax of the underlying language.

15.1 How It Works

Macros are one of the oldest techniques for building abstractions in programming languages. In the early days of programming, macros were as prevalent as functions. Since then, they’ve largely fallen out of favor, mostly for good reasons. But there are still places where they do appear in internal DSLs, particularly in the Lisp community.

I like to separate macros into two main varieties: textual macros and syntactic macros. Textual macros are more familiar and easy to understand—they treat text as text. Syntactic macros are aware of the syntactic structure of the host language, thus making it easier to ensure that they operate on syntactically sensible units of text and produce syntactically valid results. A textual macro processor can operate with any language that’s represented as text—which means pretty much any language. A syntactic macro processor is designed to work with only a single language; it is often baked into the tooling for that language, or even into the language specification itself.

To understand how Macros work, I think it’s easiest to understand textual macros first, to get a hold of the basic concepts, even if you’re more interested in syntactic macros.

15.1.1 Textual Macros

Most modern languages don’t support textual macros and most developers avoid them. However, you can use textual macros with any language by using a generic macro processor such as the classic Unix m4 macro processor. Template engines, such as Velocity, are very simple macro processors and can be used for some of the techniques. And although most modern languages shy away from macros, C (and thus C++) has a macro preprocessor built into the basic tooling. C++ gurus mostly tell people to avoid the preprocessor, with good reason, but it’s still there.

The simplest form of macro processing is substitution of one string for another. A good example of this being useful is avoiding duplication when specifying colors in CSS documents. Say you have a website, and there’s a particular color that you use repeatedly—for table borders, line colors, text highlighting, etc. With basic CSS, you’d have to repeat the color code every time you use it.

div.leftbox { border-bottom-color: #FFB595 }
p.head { background-color: #FFB595 }

This duplication makes it harder to update the color, and the use of a raw code makes it harder to understand what’s happening. With a macro processor, you can define a special word for your color and use that instead.

div.leftbox { border-bottom-color: MEDIUM_SHADE }
p.head { background-color: MEDIUM_SHADE }

Essentially, the macro processor goes through the CSS file and replaces MEDIUM_SHADE with the color code to produce the same text as in the first example above. The CSS file you edit therefore isn’t proper CSS; CSS of this vintage has no way to define symbolic constants (custom properties arrived later), so you’ve enhanced the CSS language with a macro processor.

For this example, you could do the substitution using a simple search-and-replace on the input text, essentially using Textual Polishing. Although text substitution is staggeringly simple, it is a common use of macros in C programming, specifically for symbolic constants. You can use the same mechanism to introduce common elements to files, like common headers and footers to web pages. Define a marker in your pre-HTML file, run the substitution over it, and get your actual HTML file. A simple trick like this is remarkably handy for small websites that want a common header and footer without duplicating it on every page.
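The same substitution carries over directly to C. What follows is a minimal sketch, not part of any real stylesheet toolchain: the rules array, the table rule, and the rules_using_shade function are hypothetical names introduced for illustration, while MEDIUM_SHADE mirrors the CSS example above. The preprocessor swaps in the string literal before compilation, so the compiled program only ever contains the raw color code.

```c
#include <stdio.h>
#include <string.h>

/* Symbolic constant: every use is replaced by the literal before compilation. */
#define MEDIUM_SHADE "#FFB595"

/* Hypothetical rules; adjacent string literals are joined after substitution. */
static const char *const rules[] = {
    "div.leftbox { border-bottom-color: " MEDIUM_SHADE " }",
    "table { border-color: " MEDIUM_SHADE " }",
};

/* Counts the rules whose text contains the expanded color code. */
int rules_using_shade(void)
{
    int count = 0;
    for (size_t i = 0; i < sizeof rules / sizeof rules[0]; i++) {
        if (strstr(rules[i], "#FFB595") != NULL)
            count++;
    }
    return count;
}
```

Changing the shade now means editing the single #define line; every rule picks up the new value on the next build.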

More interesting textual macros are those that allow you to parametrize them. Consider the case where you frequently want to determine the maximum of two numbers, so you repeatedly write the C expression a > b ? a : b. You can write this in the C preprocessor as a macro:

#define max(x,y) x > y ? x : y

int a = 5, b = 7, c = 0;
c = max(a,b);

The difference between a macro and a function call is that the macro is expanded at compile time, before the compiler proper sees the code. The preprocessor does a textual search-and-replace for the max expression, substituting the arguments as it goes. The compiler never sees max.

(I should mention here that some environments use the term “macro” for a subroutine. Annoying, but such is life.)

So a macro gives you an alternative to a function call. It has the bonus of avoiding all the overhead of invoking a function—which C programmers often worried about, particularly in the early years. The trouble with macros is that they have a lot of subtle problems, particularly if they use parameters. Consider this macro for squaring a number:

#define sqr(x) x * x

Seems simple and should work. But try invoking it like this:

int a = 5, b = 1, c = 0;
c = sqr(a + b);

In this case, the value of c is 11. This is because the macro expansion resulted in the expression a + b * a + b. Since * binds tighter than +, you get a + (b * a) + b rather than (a + b) * (a + b). This is one example where a macro’s expansion results in something other than what the programmer was expecting, so I call it a mistaken expansion. Such expansions may work most of the time but only break down in particular cases, leading to surprising bugs that are hard to find.

You can avoid that case by using more parentheses than a Lisper.

#define betterSqr(x) ((x) * (x))
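To see the two expansions side by side, here is a small sketch. The wrapper functions naive_result and better_result are names of my own, introduced only to exercise the two macros from the text with the same values of a and b used above.

```c
/* The two macros from the text, unchanged. */
#define sqr(x) x * x
#define betterSqr(x) ((x) * (x))

/* Expands to a + b * a + b, which is 5 + (1 * 5) + 1. */
int naive_result(void)
{
    int a = 5, b = 1;
    return sqr(a + b);
}

/* Expands to ((a + b) * (a + b)), which is 6 * 6. */
int better_result(void)
{
    int a = 5, b = 1;
    return betterSqr(a + b);
}
```

The first function returns 11 and the second 36. The extra parentheses fix operator precedence, but they do nothing about the other problems described next.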

Syntactic macros avoid much of this because they operate with a knowledge of the host language. However, there are other macro problems that they share. I’ll illustrate these first with textual macros.

Let’s go back to the max macro and watch me mess this one up.

image

This is an example of multiple evaluation: we pass in an argument that has a side effect, and the macro body mentions the argument more than once and thus evaluates it more than once. With the max macro, the comparison evaluates both arguments once, and whichever one is selected is then evaluated a second time, so its side effect happens twice. Again, this is a good example of a bug that can be hard to find. It’s particularly frustrating because it’s hard to predict the various ways macro expansions can go wrong. You have to think differently than you do with function calls, and it’s harder to think through the consequences, particularly when you start nesting macros.
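A hedged sketch of multiple evaluation follows. It is my own illustration and may differ from the original listing: MAX_OF, max_int, and struct outcome are hypothetical names, and MAX_OF is fully parenthesized so that multiple evaluation is the only problem left on display.

```c
/* Fully parenthesized, yet each argument still appears twice in the body. */
#define MAX_OF(x, y) ((x) > (y) ? (x) : (y))

/* A function evaluates each argument exactly once. */
static inline int max_int(int x, int y) { return x > y ? x : y; }

struct outcome { int a, b, c; };

/* Expands to ((++a) > (++b) ? (++a) : (++b)): the winner is bumped twice. */
struct outcome with_macro(void)
{
    int a = 5, b = 7;
    int c = MAX_OF(++a, ++b);
    return (struct outcome){ a, b, c };
}

/* Same call through the function: one increment per argument. */
struct outcome with_function(void)
{
    int a = 5, b = 7;
    int c = max_int(++a, ++b);
    return (struct outcome){ a, b, c };
}
```

With the macro, a ends at 6 while b and c both end at 9; with the function, a is 6 and b and c are 8. This is one reason an inline function is usually the safer choice when the host language offers one.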

For some more snakes, consider the following macro. It takes three arguments: an array of five integers, a cap, and a slot for the result. It adds up the numbers in the array and puts either that sum or the cap, whichever is smaller, into the result slot.

image

We’d call it like this:

int arr1[5] = {1,2,3,4,5};
int amount = 0;
cappedTotal (arr1, 10, amount);

This works quite nicely (although it would be better as a function). Now, look at this slight variation in usage:

int total = 0;
cappedTotal (arr1, 10, total);

After this code runs, total is 0. The problem is that the name total was substituted into the macro body, where it was interpreted as the variable defined within the macro itself. As a result, the variable passed into the macro is ignored. This error is called variable capture.
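The following sketch reproduces the effect. It is a reconstruction from the prose description, not the original macro: the name CAPPED_TOTAL, the do/while wrapper, and the two wrapper functions are my assumptions. The one detail taken from the text is that the macro body declares its own local variable called total.

```c
/* Sums five integers and stores min(sum, cap) in result.
   The local named total is what makes capture possible. */
#define CAPPED_TOTAL(input, cap, result)                \
    do {                                                \
        int i, total = 0;                               \
        for (i = 0; i < 5; i++)                         \
            total += (input)[i];                        \
        (result) = (total > (cap)) ? (cap) : total;     \
    } while (0)

/* The result slot has a name the macro does not use: works as intended. */
int capture_free(void)
{
    int arr1[5] = {1, 2, 3, 4, 5};
    int amount = 0;
    CAPPED_TOTAL(arr1, 10, amount);
    return amount;
}

/* The result slot is called total: inside the expansion that name refers
   to the macro's own local, so the caller's variable is never assigned. */
int captured(void)
{
    int arr1[5] = {1, 2, 3, 4, 5};
    int total = 0;
    CAPPED_TOTAL(arr1, 10, total);
    return total;
}
```

Here capture_free returns 10 (the sum of 15, capped), while captured returns 0. Nothing at the call site hints at the difference, which is what makes this kind of bug so unpleasant to track down.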

There is also a reverse of this problem, which doesn’t happen in C but does in languages that don’t force you to declare variables. To illustrate this, I’ll do some textual macros in Ruby—an exercise that’s almost too pointless even by the standards of book examples. For our macro processor, we’ll use Velocity, which is a fairly well-known tool for generating web pages. Velocity has a macro feature, which I can press into service for this illustration.

We’ll use the cappedTotal example again, just as with C. Here is the Velocity macro on the Ruby code:

image

It’s not very idiomatic Ruby, to put it mildly, but it’s conceivable that a new Ruby programmer, fresh from C, might do it this way. Within the macro body, the variables $input, $cap, and $result refer to the arguments when the macro is called. Our hypothetical programmer might use the macro in a Ruby program like this:

array = [1,2,3,4,5]
#cappedTotal('array' 10 'amount')
puts "amount is: #{amount}"

If you now use Velocity to process the Ruby program before running it and run the resulting file, it all seems to work fine. Here’s what it expands to:

image

Now, our programmer went off for a cup of tea, then came back and wrote this code:

image

He would be surprised. The code works, in that it sets amount correctly. However, he’ll sooner or later run into a bug because the variable total is altered behind the scenes when the macro runs. This is because the body of the macro mentions total, so when it’s expanded, the expansion changes the value of the variable. The total variable has been captured by the macro. The consequences of the capture may be different, indeed worse, than the earlier form of variable capture, but both of them stem from the same basic problem.

15.1.2 Syntactic Macros

As a result of all of these issues, macro processing, particularly textual macros, has fallen out of favor in most programming environments. You still run into it in C, but most modern languages avoid macros entirely.

There are two notable exceptions, languages that use and encourage syntactic macros: C++ and Lisp. In C++, the syntactic macros are templates, which have spawned many fascinating approaches to generating code at compile time. I’m not going to talk about C++ templates any more here. Partly, this is because I’m not very familiar with templates, as my C++ work predates them becoming common. C++ is also not a language noted for internal DSLs; usually, DSLs in the C/C++ world are external. After all, C++ is a complex tool to use even for experienced programmers, which doesn’t encourage internal DSL usage. (As Ron Jeffries puts it: It’s a long time since I did C++ . . . but not long enough!)

Lisp, however, is another matter. Lispers have been talking about doing internal DSLs in Lisp since the dawn of Lisp, which is a long time, since Lisp is one of the oldest programming languages still in active use. This is no surprise, for Lisp is all about symbolic processing, that is, about the manipulation of language.

Macros have penetrated deeper into Lisp’s heart than almost any other programming language. Many core features of Lisp are done through macros, so even a beginning Lisp programmer will use them—usually without realizing they are macros. As a result, when people are talking about language features for internal DSLs, Lispers will always talk about the importance of macros. When the inevitable language comparison arguments surface, Lispers can be counted on to belittle any language that doesn’t have macros.

(This also puts me in a somewhat awkward pose. Although I’ve done plenty of dabbling with Lisp, I’d not call myself a serious Lisper and am not active in the Lisp community.)

Syntactic macros do have some powerful abilities, and Lispers do use them. However, much, perhaps most, use of macros in Lisp is to polish the syntax for handling Closures. Here’s a simple and silly example of a closure for an Execute-Around Method [Beck SBPP] in Ruby:

image

The open method is implemented like this:

image

The key point here is that the content of the closure isn’t evaluated until the receiver calls yield. This ensures that the receiver can open the safe before running the passed-in code. Compare this approach:

puts aSafe.open(aSafe.contents)

This doesn’t work because the code in the parameter is evaluated before the call to open. Passing the code in a closure enables you to defer the evaluation of that code. Deferred evaluation means that the receiving method to a call chooses when, or indeed if, to execute the code that’s been passed in.
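The same contrast can be sketched in a language without closures by passing a function instead of a value. The C below is only an analogy to the Ruby example, under my own assumptions: struct safe, safe_contents, safe_open, and safe_open_eager are hypothetical names, and a locked safe is modeled as returning NULL for its contents.

```c
#include <stdbool.h>
#include <stddef.h>

struct safe { bool locked; const char *secret; };

/* Yields the contents only while the safe is unlocked. */
const char *safe_contents(const struct safe *s)
{
    return s->locked ? NULL : s->secret;
}

/* Execute-around with deferred evaluation: the receiver unlocks the safe,
   then decides when to run the action it was handed, then locks up again. */
const char *safe_open(struct safe *s,
                      const char *(*action)(const struct safe *))
{
    s->locked = false;
    const char *result = action(s);
    s->locked = true;
    return result;
}

/* Eager variant: the caller evaluated the argument before this body ran,
   so unlocking here comes too late to help. */
const char *safe_open_eager(struct safe *s, const char *already_evaluated)
{
    s->locked = false;
    s->locked = true;
    return already_evaluated;
}
```

Calling safe_open_eager(&s, safe_contents(&s)) on a locked safe yields NULL, because the contents were read before the safe was opened, whereas safe_open(&s, safe_contents) yields the secret. A function pointer is a much clumsier vehicle than a closure, but the point about who controls the timing of evaluation is the same.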

It makes sense to do the same thing in Lisp. The equivalent call would be:

(openf-safe aSafe (read-contents aSafe))

We might expect that this can be implemented using a function call like this:

image

But this doesn’t defer evaluation. In order to defer evaluation, you need to call it like this:

(openf-safe aSafe (lambda() (read-contents aSafe)))

But this looks way too messy. To get the clean style of call to work, you need a macro.

image

This macro avoids the need to wrap functions in lambdas, so we can call this with a clearer syntax.

(openm-safe aSafe (read-contents aSafe))

A large part (perhaps the majority) of the use of Lisp macros is to provide a clear syntax for the mechanism of delayed evaluation. A language with a cleaner closure syntax doesn’t need macros for this.

The macro above will work almost all of the time, but that “almost” indicates problems—for example, if we call it like this:

(let (result)
  (setq result (make-safe "secret"))
  (openm-safe result (read-contents result)))

This problem is variable capture, causing an error if we use a symbol named result as an argument. Variable capture is an endemic problem for Lisp macros; as a result, Lisp dialects have worked hard to come up with ways to avoid it. Some, like Scheme, have a hygienic macro system, where the system avoids any variable capture by redefining symbols behind the scenes. Common Lisp has a different mechanism: gensyms, essentially an ability to generate symbols for these local variables guaranteeing that they won’t collide with anything else. Gensyms are more trouble to use, but they give the programmer the ability to deliberately use variable capture, and there are some situations when deliberate variable capture is useful, although I’ll leave that discussion to Paul Graham [Graham].

Apart from variable capture, there is also the potential problem of multiple evaluation, as the parameter safe is used at several points in the expansion definition. To avoid this, I need to bind the parameter to another local variable, which also needs a gensym, which results in this:

image

Avoiding such issues makes macros a lot harder to write than they might seem at first sight. Despite this, the deferred evaluation with a convenient syntax is used heavily in Lisp, because closures are important for creating new control abstractions and alternative computational models—which is the kind of thing Lispers like doing.

Despite the fact that a large proportion of Lisp macros are written for deferred evaluation, there are other useful things you can do with Lisp macros that are beyond what can be done with syntactically convenient closures alone. In particular, macros provide a mechanism for Lispers to do Parse Tree Manipulation.

Lisp syntax seems quirky on first glance, but as you get used to it, you realize that it’s a good representation of the parse tree of the program. With each list, the first element is the type of the parse tree node, and the remaining elements are its children. Lisp programs use Nested Functions heavily, and the result is a parse tree. By using macros to manipulate the Lisp code before evaluation, Lispers can do Parse Tree Manipulation.

Few programming environments support Parse Tree Manipulation at the moment, so Lisp’s support for it is a distinguishing feature of the language. In addition to supporting DSL elements, it also allows for more fundamental manipulations in the language. A good example of this is the standard Common Lisp macro setf.

Although Lisp is often used as a functional language (that is, one that doesn’t have side effects on data), it does have ways to store data in variables. The basic form for this is setq, which can set a variable like this:

(setq var 5)

Lisp forms lots of different data structures out of nested lists, and it may be that you want to update data in these structures. You can access the first item in a list with car and update it with rplaca. But there are lots of ways to access various bits of data structures, and valuable brain cells are spent on remembering an access function and an update function for each one. So, to help matters, Lisp has setf which, given an access function, will automatically calculate and apply its corresponding update. Thus we can use (car (cdr aList)) to access the second element in the list and (setf (car (cdr aList)) 8) to update it.

image

This is an impressive trick, which can seem almost magical. There are limitations on it, which reduce the magic. You can’t do this on just any expression, only on expressions that are made up of invertible functions. Lisp keeps a record of inverse functions, such as rplaca being the inverse of car. The macro analyzes its first argument expression and computes the update expression by finding the inverse function. As you define new functions, you can tell Lisp their inverses, and then use setf to do updates.

I’m waving my arms a little here, as setf is more complicated than my brief description implies. But the important fact for this discussion is that, to define setf, you do need macros because setf depends on the ability to parse the input expression. This ability to parse its arguments is the key advantage of Lisp macros.

Macros work well for Parse Tree Manipulation in Lisp because Lisp’s syntactic structure is so close to the parse tree. However, macros aren’t the only way to do Parse Tree Manipulation. C# is an example of a language that supports Parse Tree Manipulation by providing the ability to get the parse tree for an expression and a library for the program to manipulate it.

15.2 When to Use It

On a first encounter, textual macros are quite appealing. They can be used with any language that uses text, they do all their manipulation at compile time, and they can implement very impressive behaviors that are beyond the abilities of the host language.

But textual macros come with many problems. Subtle bugs like mistaken expansions, variable capture, and multiple evaluation are often intermittent and hard to track down. The fact that macros don’t appear in downstream tools means the abstractions they provide leak like a sieve without the wires, and you get no support from debuggers, intelligent IDEs, or anything else that relies on the expanded code. Most people also find it much harder to reason about nested macro expansion than about nested function calls. That could be a lack of practice in dealing with macros, but I suspect it’s something more fundamental.

To sum up, I don’t recommend using textual macros in anything but the very simplest cases. I think that for Templated Generation they work acceptably, providing you avoid trying to be too clever with them—in particular, avoiding nesting the expansions. But otherwise they are simply not worth the trouble.

How much of this reasoning applies to syntactic macros? I’m inclined to say that most of it does. While you are less likely to get mistaken expansions, the other problems still crop up. This makes me very wary of them.

A counterexample to this is the heavy use of syntactic macros in the Lisp community. As an outsider to this world, I feel a certain reluctance to make too much of a judgment. My overall sense is that they do make sense for Lisp, but I’m not convinced that the logic of using them there makes sense for other language environments.

And that, in the end, is the nub of the choice on whether to use syntactic macros. Most language environments don’t support them, so there’s no choice to worry about. Where you do have them, for example in Lisp and C++, they are often necessary to do useful things, so you have to become at least a little familiar with them. That means that the choice on using syntactic macros is really made for you by your language environment.

The only choice this leaves is whether syntactic macros are a reason to choose a language that has them. For the moment, I see macros as a worse choice than available alternatives, and thus a point deducted from those environments that use them—but with the rider that I haven’t worked closely enough with those languages to be completely sure of my judgment.
