Chapter 12. Macros and Metaprogramming

What Is Metaprogramming?

Metaprogramming is the use of code to modify or create other code. It is primarily a developer tool and acts as a force multiplier, allowing large amounts of predictable code to be generated from just a few statements in the host language (or "metalanguage"). It is extremely useful for automating repetitive, boilerplate code.

Most programming languages support some form of metaprogramming. C has a preprocessor and C++ has templates. Java has annotations and aspect-oriented programming extensions. Scripting languages have "eval" statements. Most languages have some sort of API that can be used to introspect or modify the core language features (such as classes and methods). As a last resort, any language can be used to build source code using string manipulation and then feed it to a compiler.

Code vs. Data

Whatever the implementation, metaprogramming systems have one feature in common: they manipulate code as data. Conceptually, programs execute code and consume or produce data as input and output. By definition, metaprogramming inverts this relationship. Programs consume or produce code (as their data), so when the generated program runs, it is executing data (as its code).

For most languages, treating code as data or data as code is a more or less cumbersome process, depending on the type of data that represents the code.

One common strategy is to treat code as a textual string. Code can be created by concatenating keywords, variable names, and textual symbols, with the resulting text fed back to the language's parser or evaluator. Needless to say, this can be quite messy and confusing for all but the simplest metaprogramming tasks.

Another strategy is to provide a set of APIs that expose the concepts of a programming language as objects within the language, allowing the programmer to make calls such as createClass() or addMethod() to build code structures programmatically. This is much more effective than writing and parsing strings, and is used extensively in many object-oriented languages. In this case, the data is objects, which have a special relationship with the language runtime.

Homoiconicity

Clojure (and other Lisps) provide a third way of handling the code/data distinction: there is no distinction. In Clojure, all code is data and all data is code.

This property is called homoiconicity, which means that the language's code is represented in terms of the language's data structures. For example, this is a line of code in Clojure:

(println "Hello, world")

And this is a sequence (data):

'(println "Hello, world")

There is only one slight difference: the leading single quote. This is simply an instruction to Clojure that it should only read the list, instead of reading it and immediately evaluating it, as it would in the first snippet. Forms like this (called quoted forms) stop after reading, rather than going on to be evaluated.

Figure 12-1. How Clojure code is loaded

The key point is that Clojure source code isn't fundamentally composed of strings: Clojure source code is composed of data structure literals: vectors, maps, and sequences of symbols, literals, and other sequences. In Clojure, data structures are very, very easy to work with, thanks to the sequence abstraction. Metaprogramming is no more difficult than creating a list.
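To make this concrete, here is a short REPL-style sketch (the names code, sum-form, and product-form are purely illustrative) showing that a quoted form is an ordinary list that can be inspected, and that code assembled with everyday list functions can be handed to eval and run:

```clojure
;; A quoted form is plain list data that can be inspected...
(def code '(println "Hello, world"))
(list? code)            ;=> true
(first code)            ;=> println  (a symbol, not the function itself)
(symbol? (first code))  ;=> true

;; ...and new code can be built with ordinary list functions
(def sum-form (list '+ 2 3 4))
sum-form                ;=> (+ 2 3 4)
(eval sum-form)         ;=> 9

;; Swapping the operator produces a different program
(def product-form (cons '* (rest sum-form)))
product-form            ;=> (* 2 3 4)
(eval product-form)     ;=> 24
```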

Macros

Macros are the primary means of metaprogramming in Clojure. A Clojure macro is a construct which can be used to transform or replace code before it is compiled. Syntactically, they look a lot like functions, but with several crucial distinctions:

  • Macros return a form (code to be compiled), rather than a value.

  • Arguments to macros are passed in without being evaluated. They can then be altered, ignored, or added to the macro's output.

  • Macros are expanded only at compile time.
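The difference in argument handling is easy to see side by side. In this minimal sketch, show-value and show-form are hypothetical names: the function receives the already-computed value, while the macro receives the raw form and simply hands it back as quoted data.

```clojure
;; A function receives the value of its argument...
(defn show-value [x] x)
(show-value (+ 1 2))   ;=> 3

;; ...while a macro receives the unevaluated form.
;; Expanding to (quote form) returns that form as data.
(defmacro show-form [form]
  (list 'quote form))
(show-form (+ 1 2))    ;=> (+ 1 2)
```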

When you use a macro in your code, what you are really telling Clojure to do is to replace your macro expression with the expression returned by the macro. This is a powerful means of abstraction, and is very useful for implementing control structures or eliminating boilerplate or "wrapper" code.

For example, it is possible to define a macro called triple-do which takes one expression as an argument and replaces it with a do form which evaluates the expression three times. The programmer would only type the following expression:

(triple-do (println "Hello"))

However, this would actually be compiled as this expression:

(do (println "Hello") (println "Hello") (println "Hello"))

Aside from debugging, the programmer never needs to see or worry about this intermediate form. They can use the macro directly in their programs, without worrying about the complexity tucked underneath:

user=> (triple-do (println "Hello"))
Hello
Hello
Hello
nil

Working with Macros

To create a macro, use the defmacro macro. This defines a function and registers it as a macro with the Clojure compiler. From then on, when the compiler encounters the macro, it will call the function and use the return value instead of the original expression.

defmacro takes basically the same arguments as defn: a name, an optional documentation string, a vector of arguments, and a body. As previously mentioned, the body should evaluate to a valid Clojure form. If the form returned by the macro function is syntactically invalid, it will cause an error wherever it is used.
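A small sketch of the docstring slot, using a hypothetical twice-do macro: defmacro stores the docstring in the var's metadata just as defn does, and also marks the var with :macro true, which is how the compiler knows to expand calls to it rather than invoke it at runtime.

```clojure
(defmacro twice-do
  "Evaluates form two times."
  [form]
  (list 'do form form))

;; defmacro flags the var as a macro in its metadata...
(:macro (meta #'twice-do))  ;=> true
;; ...and keeps the docstring just as defn would
(:doc (meta #'twice-do))    ;=> "Evaluates form two times."
```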

For example, the following code defines the very simple triple-do macro already mentioned:

(defmacro triple-do [form]
    (list 'do form form form))

This simply uses the built-in list function to create a list of four items: the do special form and three repetitions of the provided form. Note that do is quoted, so it is added to the resultant list as a symbol, rather than being evaluated in place in the body of the macro. If the provided form is (println "test"), this list will be (do (println "test") (println "test") (println "test")). This list is valid Clojure syntax, and so the macro works:

user=> (triple-do (println "test"))
test
test
test
nil

As another example of the possibilities of macros, it is possible to write a macro that rewrites an infix mathematical expression as a standard Clojure prefix expression, so it can be evaluated. For example, it might transform (1 + 1) into the more standard (in Clojure) (+ 1 1). Prefix notation is the Lisp standard and is preferable for all programming tasks, so don't use something like this in your main Clojure code. However, this type of functionality could be useful for writing Domain Specific Languages (DSLs) for people who don't know Lisp.

When developing a macro, it's helpful first to have a clear idea of what you want the input and output expressions to be. For this macro, you want to convert expressions like:

(infix (2 + 3))

to:

(+ 2 3)

The macro definition is:

(defmacro infix [form]
    (cons (second form) (cons (first form) (nnext form))))

It introspects the provided form, and uses cons to build a new expression, starting with the second item (the operator), then the first item (the first number), then any additional items. You can verify that it works using the following code:

user=> (infix (2 + 3))
5

Again, in general, it's bad form to go around redefining the standard way forms are evaluated. Typically, users should get consistent behavior whether their expression is within a macro or not. Still, this example demonstrates the power of macros, and occasionally there are good reasons to do such drastic transformations on expressions.

Debugging Macros

Using macros can be somewhat mind-bending, since you have to keep in mind not only the code you're writing, but the code you're generating. Clojure provides two functions that help debug macros as you write them: macroexpand and macroexpand-1. They both take a single quoted form as an argument. If the form is a macro expression, they return the expanded result of the macro without evaluating it, making it possible to inspect and see exactly what a macro is doing. macroexpand-1 expands the expression only once. macroexpand expands the given form repeatedly until the result is no longer a macro expression. Neither of them expands macro calls nested inside the form's arguments; only the outermost form is expanded.
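The difference between the two is easiest to see with a macro that expands into another macro. In this sketch, my-unless is a hypothetical macro built on the standard when-not macro; the symbols ready?, armed?, and launch are only data here and are never evaluated.

```clojure
;; Hypothetical macro that expands into another macro (when-not)
(defmacro my-unless [test & body]
  `(when-not ~test ~@body))

;; One step: only my-unless is expanded
(macroexpand-1 '(my-unless ready? (launch)))
;=> (clojure.core/when-not ready? (launch))

;; Repeated: expansion continues while the result is still a macro call
(macroexpand '(my-unless ready? (launch)))
;=> (if ready? nil (do (launch)))

;; Macro calls nested inside the arguments are left untouched
(macroexpand '(my-unless ready? (when armed? (launch))))
;=> (if ready? nil (do (when armed? (launch))))
```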

The following example shows macroexpand applied to the macros defined in the previous section:

user=> (macroexpand '(triple-do (println "test")))
(do (println "test") (println "test") (println "test"))

user=> (macroexpand '(infix (2 + 3)))
(+ 2 3)

You can use different expressions with macroexpand to see what the output looks like for any arguments to your macro, although it can quickly become complicated:

user=> (macroexpand '(triple-do (do (println "a") (println "b"))))
(do (do (println "a") (println "b")) (do (println "a") (println "b")) (do (println "a") (println "b")))

Sometimes, you can see errors before they occur. For example, if you pass an expression to the infix macro that is already prefixed, it will actually reverse the process and infix the result, which is:

user=> (macroexpand '(infix (+ 1 2)))
(1 + 2)

Using macroexpand gives an opportunity to see potential problems before you actually try evaluating them. You can also run unit tests against the output of macroexpand to verify that your macros are behaving as expected.
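As a sketch of that idea, the following uses clojure.test (which ships with Clojure) to pin down both the expansion and the evaluated result of the infix macro from earlier; the test name infix-expands-to-prefix is just an illustrative choice. Note that macroexpand resolves the macro name in the current namespace when the test runs.

```clojure
(require '[clojure.test :refer [deftest is run-tests]])

(defmacro infix [form]
  (cons (second form) (cons (first form) (nnext form))))

(deftest infix-expands-to-prefix
  ;; Check the generated code, not just its value
  (is (= '(+ 2 3) (macroexpand '(infix (2 + 3)))))
  ;; Record the known pitfall: already-prefixed input gets flipped
  (is (= '(1 + 2) (macroexpand '(infix (+ 1 2)))))
  ;; And confirm the expansion evaluates as intended
  (is (= 5 (infix (2 + 3)))))

(run-tests)
```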

Code Templating

Manually creating forms to return from macro functions can sometimes be tedious. Worse, with complex macros it can be difficult to determine what the output form will actually be.

To alleviate this problem, Clojure provides a code templating system. Effectively, it allows macro developers to enter the return forms of macros as literals, splicing in values where necessary.

The templating system is based around the syntax-quote character, a backquote: `. Syntax quoting works almost exactly the same as regular quoting with single-quote, with one major exception: you can use the unquote symbol (the tilde, ~) to insert a value at any point within the syntax-quoted expression. Also, symbols directly referenced within a syntax quote are assumed to be top level, namespace-qualified symbols and will be expanded as such.
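A quick REPL sketch contrasts the two kinds of quoting, assuming the current namespace is user and a hypothetical var x:

```clojure
(def x 42)

'(inc x)    ;=> (inc x)                     plain quote leaves symbols alone
`(inc x)    ;=> (clojure.core/inc user/x)   syntax-quote namespace-qualifies them
`(inc ~x)   ;=> (clojure.core/inc 42)       unquote inserts the current value of x
```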

For example, take the macro body of triple-do. It explicitly uses the list function to construct a list for return. Of course, the easier way to represent a list in code is to enter it as a literal, using the single quote. However, it's then impossible to modify it. By using syntax-quote, and by using unquote within it to insert values, it is possible.

The templated version of the triple-do macro looks like the following:

(defmacro template-triple-do [form]
    `(do ~form ~form ~form))

The do expression is written as a list literal and serves as the return value of the macro function. It uses the syntax-quote character to ensure that it is treated as a literal and not evaluated right away. Inside the syntax-quote are three unquotes; each one inserts the value of the form parameter at that point inside the literal value.

The expansion of template-triple-do is identical to the original version:

user=> (macroexpand '(template-triple-do (println "test")))
(do (println "test") (println "test") (println "test"))

Splicing Unquotes

Unquoting sequences within a syntax-quote doesn't always work out quite as intended. Sometimes, it is desirable to insert the contents of a sequence into the templated list, rather than the list itself. To see why, try implementing the infix macro described previously, using templating:

(defmacro template-infix [form]
    `(~(second form) ~(first form) ~(nnext form)))

It looks like it should work fine. But try expanding it:

user=> (macroexpand '(template-infix (1 + 3)))
(+ 1 (3))

There's an extra set of parentheses around the 3, which will cause problems. The reason is that the ~(nnext form) expression evaluates to a list, not an individual value. In this case, you want to insert the contents of the sequence returned by (nnext form), not the sequence itself.

To insert the contents of a list, use the splicing unquote, denoted by ~@. ~@ inserts the values of a sequence consecutively into a parent sequence. Using it instead of the normal unquote in the template-infix macro yields the correct results:

(defmacro template-infix [form]
    `(~(second form) ~(first form) ~@(nnext form)))

user=> (macroexpand '(template-infix (1 + 3)))
(+ 1 3)
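The same contrast can be seen in isolation. In this sketch, nums is a hypothetical vector: plain unquote inserts it as a single item, while splicing unquote inserts its elements one by one.

```clojure
(def nums [1 2 3])

`(+ ~nums)          ;=> (clojure.core/+ [1 2 3])   the vector goes in as one item
`(+ ~@nums)         ;=> (clojure.core/+ 1 2 3)     its elements are spliced in
(eval `(+ ~@nums))  ;=> 6
```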

Generating Symbols

One very important rule of Clojure macros is that while it is possible to create and bind local symbols in macro-generated code, the names of such locals must not conflict with any existing symbols. But this is problematic: when writing a macro, it is impossible to know all of the potential contexts in which the macro might later be used. So Clojure enforces the rule: don't bind plain named symbols in macros. Inside a syntax-quoted form, a plain symbol such as result is namespace-qualified (for example, to user/result), and the compiler rejects any attempt to bind a qualified symbol in a let.

Still, sometimes it's necessary to define local symbols in a macro. To get around this restriction, Clojure provides a feature called auto gensym within syntax-quoted forms. Within any syntax-quoted form (forms using the backquote, `), you can append the # character to the end of any local symbol name. When the syntax-quoted form is read, Clojure replaces that symbol with a uniquely generated symbol that will not conflict with the symbols in ordinary code, and every occurrence of the same name# within that syntax-quoted form is replaced by the same generated symbol. As long as you use the auto gensym feature on them, you can define as many local symbols as you like within your macros.

To see an example of this, consider a macro called debug-println which performs the same function as println, but instead of returning nil, it returns the value of the expression. This allows it to be used inside expressions to debug them. You want to be able to use it like this:

(+ 5 (* 4 (debug-println (/ 4 3))))

First, determine what you want the generated code to look like. In this case, it's as follows:

(let [result (/ 4 3)]
    (println (str "Value is: " result))
    result)

Then build the macro definition. Note how the result symbol is using the auto gensym feature:

(defmacro debug-println [expr]
    `(let [result# ~expr]
         (println (str "Value is: " result#))
         result#))

Calling macroexpand-1 shows the generated symbol name:

user=> (macroexpand-1 '(debug-println (/ 4 3)))
(clojure.core/let [result__2349__auto__ (/ 4 3)]
    (clojure.core/println (clojure.core/str "Value is: " result__2349__auto__))
    result__2349__auto__)

With the exception of the alternate name for the result symbol, and the fully qualified function names, it looks exactly like what we originally wanted. And it works!

user=> (+ 5 (* 4 (debug-println (/ 4 3))))
Value is: 4/3
31/3
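To see auto gensym on its own, the following sketch (the var name expansion is illustrative, and the generated number will differ on every machine) pulls apart a syntax-quoted let and checks that both occurrences of result# became the same symbol, then contrasts it with a plain symbol, which comes out namespace-qualified and therefore could not be bound by let.

```clojure
;; Every result# inside ONE syntax-quoted form becomes the same generated symbol
(def expansion `(let [result# 42] result#))

(let [[_ [binding-sym] body-sym] expansion]
  (= binding-sym body-sym))   ;=> true

;; The expansion is ordinary code that evaluates normally
(eval expansion)              ;=> 42

;; Without the #, the symbol is namespace-qualified (e.g. user/result);
;; evaluating such a let would fail to compile
(namespace (first (second `(let [result 42] result))))  ;=> "user" in the user namespace
```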

When to Use Macros

Macros are extremely powerful and allow you to control and abstract code in ways that would not be otherwise possible. However, using them does come at a cost. They operate at a higher level of abstraction, and so they are significantly more difficult to reason about than normal code. If a problem occurs, it can be much trickier to debug, since there's an extra level of indirection between where the problem actually is and where the error message originates.

Therefore, the best way to use macros is to use them as little as possible. A few macros go a long way. Most things you need macros for (including some of the examples in this chapter) could also be accomplished with first-class functions. When you can, do that instead, and don't use macros.
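For instance, two of this chapter's macros have straightforward function counterparts. In this sketch, triple-call and debug-value are hypothetical names: the first takes the work to repeat as a zero-argument function, and the second needs no macro at all, because a function's argument is already evaluated before the call.

```clojure
;; triple-do as a function: pass the work as a zero-argument function
(defn triple-call [f]
  (f) (f) (f))

(triple-call #(println "Hello"))   ; prints Hello three times, returns nil

;; debug-println as a function: the argument arrives already evaluated
(defn debug-value [x]
  (println (str "Value is: " x))
  x)

(+ 5 (* 4 (debug-value (/ 4 3))))  ; prints Value is: 4/3, returns 31/3
```

The trade-off is that callers must wrap the work in #(...) for triple-call, which is exactly the kind of small syntactic cost a macro would hide.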

That said, there are certain situations where using a macro is the best, easiest, or the only way to accomplish a given task. Usually, they fall into one of the following categories:

  • Implement control structures: One of the main differences between macros and functions is that the arguments of macros are not evaluated. If you need to write a control structure that might not evaluate some of its parameters, it has to be a macro.

  • Wrap def or defn: Usually, you only want to call def or defn at compile time. Calling them programmatically while a program is running is usually a recipe for disaster. So, if you need to wrap their behavior in additional logic, the best place to do it is usually a macro.

  • Performance: Because they are expanded at compile time, using a macro can be faster than calling a function. Usually, this doesn't make much of a difference, but in extremely tight loops, you can sometimes eke out performance by eliminating a function call or two and using macros instead.

  • Codify recurring patterns: Macros can be used to formalize any commonly occurring pattern in your code. In essence, macros are your means of modifying the language itself to suit your needs. Macros aren't the only way to do this, but they can sometimes do it in a way that is least invasive to other parts of your code.
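As a small illustration of the "wrap def" case, here is a hypothetical defcounter macro (not part of Clojure) that defines a var holding an atom initialized to zero. It has to be a macro because the name must reach def unevaluated; a function would try to evaluate the not-yet-defined symbol first.

```clojure
;; Hypothetical macro: define a var holding a counter that starts at zero
(defmacro defcounter [counter-name]
  `(def ~counter-name (atom 0)))

(defcounter clicks)
(swap! clicks inc)   ;=> 1
(swap! clicks inc)   ;=> 2
```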

Using Macros

Understanding macros and knowing when to use them can be a daunting proposition, so it is helpful to look at a range of examples to gain a sense of what macros can be used for. Unfortunately, no selection of examples can entirely cover the types of things you can do with macros: macros represent no less than an ability to change the language itself, and the potential ways one might want to do so are limitless. However, there are some common patterns that are often implemented with macros and being familiar with them can give you a head start in understanding when they can be useful.

Implementing a Control Structure

As mentioned, one of the important distinctions between macros and functions is that since macros are expanded before compilation, rather than at runtime, it is possible that their arguments might not be evaluated at all. This is an essential component of control structures, where it is necessary that only some of the provided expressions actually evaluate, not all of them.

Consider a control form which takes two expressions and executes only one of them randomly. This might be used in a game, or in an artificial intelligence implementation. You want it to look something like the following:

(rand-expr (println "A") (println "B"))

This cannot be implemented as a function, since both println statements are evaluated as arguments before rand-expr is even called. But you want only one of the expressions to evaluate at random. This can only be accomplished with a macro.

The first thing to do is to plan out the form to which you want the macro to expand. In this case, it has to include the logic for picking an expression at random from those provided. The expansion should look something like this:

(let [n (rand-int 2)]
         (if (zero? n) (println "A") (println "B")))

First, the macro needs to pick a random integer, either 0 or 1. Then, if the number is 0, it executes the first expression; otherwise, it executes the second.

The macro for this is fairly straightforward, given the syntax described:

(defmacro rand-expr [form1 form2]
    `(let [n# (rand-int 2)]
         (if (zero? n#) ~form1 ~form2)))

And, it works as expected, with the same expression sometimes evaluating (println "A") and sometimes (println "B"), never both.

user=> (rand-expr (println "A") (println "B"))
B
nil
user=> (rand-expr (println "A") (println "B"))
B
nil
user=> (rand-expr (println "A") (println "B"))
A
nil
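To see why the function approach fails, this sketch compares a hypothetical function version, rand-fn, with the rand-expr macro (repeated here so the block stands alone), recording side effects in an atom instead of printing:

```clojure
;; Hypothetical function version: both arguments are evaluated before the call
(defn rand-fn [a b]
  (if (zero? (rand-int 2)) a b))

(defmacro rand-expr [form1 form2]
  `(let [n# (rand-int 2)]
     (if (zero? n#) ~form1 ~form2)))

;; The function always runs both side effects
(let [calls (atom [])]
  (rand-fn (swap! calls conj :a) (swap! calls conj :b))
  @calls)            ;=> [:a :b]

;; The macro runs exactly one of them
(let [calls (atom [])]
  (rand-expr (swap! calls conj :a) (swap! calls conj :b))
  (count @calls))    ;=> 1
```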

Implementing a Macro with Variadic Arguments

Macros can take variable numbers of arguments. An example of this would be the preceding macro, but with the requirement that it randomly evaluate one of any number of expressions, rather than just one of two.

(rand-expr-multi (println "A") (println "B") (println "C"))

Creating a macro which takes a variable number of forms as "arguments" is easily done, the same way as it is for a function:

(defmacro rand-expr-multi [& forms] ...)

What about the macro body? How should it handle the variable number of arguments? Obviously, since you don't know how many there are, you can't just reference them by name and slot them into place in an if expression as was done in the first draft of rand-expr. You might be tempted to use something like the nth function to select a random expression from the list, but consider: at macro-expansion time, when you're building the structure, you don't have access to the random value. It has to be generated within the expansion, at runtime. If you generate it at compile time, it will effectively become a constant. Without access to the random value at expansion time, you need to list all the possible expressions as options in one of Clojure's more primitive control structures. Macro expansion is purely a process of code transformation; keeping that fact firmly in mind will help avoid a lot of confusion about what is available at expansion time as opposed to runtime.
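The constant-choice trap is easy to demonstrate. In this sketch, rand-expr-broken and pick are hypothetical names: the macro calls rand-int while expanding, so the function that uses it is compiled with one fixed branch and returns the same keyword on every call.

```clojure
;; Hypothetical broken version: the random pick happens during expansion
(defmacro rand-expr-broken [& exprs]
  (nth exprs (rand-int (count exprs))))

;; pick is compiled once, so the macro expands once and the choice is frozen
(defn pick []
  (rand-expr-broken :a :b :c))

(distinct (repeatedly 20 pick))  ; a single keyword, e.g. (:b)
```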

One viable solution would be to try and generate an expansion of something along these lines:

(let [ct <number of expressions>]
    (case (rand-int ct)
        0 (println "A")
        1 (println "B")
        2 (println "C")))

The most succinct way is to use splicing unquote to splice in the list of forms that constitute the body of the case. Noticing that these forms are alternating indexes and expressions lets you use the interleave function to generate the list to splice in, which shortens the code considerably:

(defmacro rand-expr-multi [& exprs]
    `(let [ct# ~(count exprs)]
         (case (rand-int ct#)
             ~@(interleave (range (count exprs)) exprs))))

It generates the expected expansion:

user=> (macroexpand-1 '(rand-expr-multi (println "A") (println "B") (println "C")))
(clojure.core/let [ct__2188__auto__ 3]
    (clojure.core/case (clojure.core/rand-int ct__2188__auto__)
         0 (println "A")
         1 (println "B")
         2 (println "C")))

Upon testing, it works as expected:

user=> (rand-expr-multi (println "A") (println "B") (println "C"))
B
nil
user=> (rand-expr-multi (println "A") (println "B") (println "C"))
A
nil
user=> (rand-expr-multi (println "A") (println "B") (println "C"))
C
nil
user=> (rand-expr-multi (println "A") (println "B") (println "C"))
B
nil
user=> (rand-expr-multi (println "A") (println "B") (println "C"))
B
nil
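A quick way to check that the choice really happens at runtime is to call the macro many times from a single compiled function and collect the results. This sketch repeats the macro definition so it stands alone and uses keywords in place of println calls:

```clojure
(defmacro rand-expr-multi [& exprs]
  `(let [ct# ~(count exprs)]
     (case (rand-int ct#)
       ~@(interleave (range (count exprs)) exprs))))

;; The anonymous function is compiled once, but rand-int runs on every call,
;; so over many calls every branch shows up
(set (repeatedly 100 #(rand-expr-multi :a :b :c)))  ; almost certainly #{:a :b :c}
```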

Implementing a Macro Using Recursion

Macros can also be applied recursively. As an example, consider a custom macro, ++, which can be used instead of +, and which automatically replaces multiargument addition expressions with nested binary expressions which perform slightly better in Clojure (see Chapter 14 for a more comprehensive discussion of this issue). In other words, it takes easy-to-read expressions such as (++ 1 2 3 4 5) and transforms them to slightly better performing, but more complex expressions like (+ 1 (+ 2 (+ 3 (+ 4 5)))).

Like recursive functions, recursive macros must have a base case at which they no longer recur, or else they will continue recursing forever and cause a stack overflow error, though at compile time instead of runtime. For the ++ macro, the base case is when it is passed only one or two arguments. In that scenario, it merely emits a standard + expression. When given three or more arguments, it applies itself recursively to its argument list, emitting an additional nested expression with each level of recursion.

It's easiest to look at the code:

(defmacro ++ [& exprs]
    (if (>= 2 (count exprs))
        `(+ ~@exprs)
        `(+ ~(first exprs) (++ ~@(rest exprs)))))

It is very straightforward. There is one if condition, which differentiates between the base and recursive case. In the base case, it simply splices the provided expressions into a straightforward application of the + function. In the recursive case, it also creates a + function application and unquotes the first expression as the first argument. For the second argument, it recursively inserts ++, splicing in the rest of the expressions as its arguments.

When the macro is expanded, the first layer is unwrapped and shows that it is correct, at least so far.

user=> (macroexpand '(++ 1 2 3 4))
(clojure.core/+ 1 (user/++ 2 3 4))

To see the entire recursive expansion, you can use Stuart Sierra's clojure.walk library, which is packaged with Clojure. It includes a macroexpand-all function which, unlike macroexpand or macroexpand-1, recursively expands all the macros it can find until there are none left. Requiring clojure.walk and running macroexpand-all gives the complete, final expansion:

user=> (clojure.walk/macroexpand-all '(++ 1 2 3 4))
(clojure.core/+ 1 (clojure.core/+ 2 (clojure.core/+ 3 4)))

Actually using the macro shows it has the same semantics as +. It should be ever so slightly faster, as well, although the difference isn't detectable without an elaborate benchmark.

user=> (++ 1 2 3 4)
10
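Putting the pieces together, this self-contained sketch requires clojure.walk, defines ++ as above, and checks a five-argument expansion along with the edge cases that fall into the base case (one argument, and no arguments at all, where (+) returns 0):

```clojure
(require 'clojure.walk)

(defmacro ++ [& exprs]
  (if (>= 2 (count exprs))
    `(+ ~@exprs)                                ; base case: 0, 1, or 2 arguments
    `(+ ~(first exprs) (++ ~@(rest exprs)))))   ; recursive case

(clojure.walk/macroexpand-all '(++ 1 2 3 4 5))
;=> (clojure.core/+ 1 (clojure.core/+ 2 (clojure.core/+ 3 (clojure.core/+ 4 5))))

(++ 1 2 3 4 5)  ;=> 15
(++ 7)          ;=> 7
(++)            ;=> 0
```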

Using Macros to Create DSLs

One common use of macros is to generate custom DSLs. Using macros, a few simple, intuitive expressions can generate much more bulky, complex code without exposing it to the user.

The potential use for DSLs in Clojure is unlimited. Compojure (the web framework for Clojure currently in vogue) allows the user to define web application paths and RESTful APIs using a simple, immediately understandable DSL syntax. Another Clojure project, Incanter, provides a DSL based on the R programming language that is incredibly succinct and useful for doing statistics and building charts.

Clojure's DSLs are particularly effective because there is no sharp distinction between an API and a DSL. Every well-designed Clojure API automatically ends up looking a lot like a DSL, and as Clojure programs get more complex they tend to evolve high-level functions and macros that are extremely easy to read.

The following macro demonstrates a very rudimentary Clojure DSL, one that uses Clojure expressions to build something very similar to XML (minus complexities such as attributes and namespaces).

The xml macro shown here is slightly different from the previous examples of macros; its expansion is a string, rather than a collection of forms. A macro is used instead of a function because the DSL works by overriding the normal processing of the provided forms, rendering them to a string instead of evaluating them. It isn't the best way to process XML in Clojure, by a long shot; for that, look at the clojure.xml, clojure.zip, and Stuart Sierra's clojure.contrib.prxml libraries. This is just a small, manageable example that will show some of the versatility that macros provide.

The input of the macro is just a series of nested forms. The forms don't have to resolve: they will be transformed into a string by the macro without ever being evaluated. The macro transforms input like this:

(xml
    (book
        (authors
            (author "Luke")
            (author "Stuart"))))

Into output like this:

<book><authors><author>Luke</author><author>Stuart</author></authors></book>

The code itself is as follows:

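(defn xml-helper [form]
    (if (not (seq? form))
        ;; Base case: anything that isn't a list (string, number, symbol) becomes text
        (str form)
        ;; Recursive case: the first item is the element name, the rest are children
        (let [tag (first form)
              children (rest form)]
            (str "<" tag ">"
                 (apply str (map xml-helper children))
                 "</" tag ">"))))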

(defmacro xml [form]
    (xml-helper form))

The macro is very lightweight. It is passed a single form which it immediately passes off to a helper function. Macro helper functions are a common idiom. Often, as in this case, the macro itself doesn't do any work at all, but only serves to obtain the original form as a sequence. From there, functions can do all the actual work of transformation. When this is possible, it is usually desirable, since functions are often much easier to reason about than macros. Just remember, the function will be evaluated at compile time, as the macro is expanded, so it will not have access to the full runtime state of your program.

The helper function is a simple recursive function. The base case is when the provided form is a primitive (not a sequence). It simply returns it as a string. When the form is a sequence, it creates and returns an XML string, using the first item as the element name and the rest of the items as children which it processes recursively.

Running the macro shows that it is working:

user=> (xml (book (authors (author "Luke") (author "Stuart"))))
"<book><authors><author>Luke</author><author>Stuart</author></authors></book>"

From an XML processing perspective, it is terribly primitive and should not be used for any real work. As a demonstration of the power of macros, it is beautiful. The conversion from nested expressions to XML string happens at compile time. Because xml is a macro which returns a string, a program using it will actually "see" the xml expression as a string literal! The mini-XML DSL shown here is now an extension of the Clojure compiler itself.
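You can confirm that the work happens at expansion time by asking for the expansion directly. This sketch repeats the helper and macro so it stands alone; the element names p and b are arbitrary and are never evaluated.

```clojure
(defn xml-helper [form]
  (if (not (seq? form))
    (str form)                                 ; leaf: render as text
    (let [tag (first form)
          children (rest form)]
      (str "<" tag ">"
           (apply str (map xml-helper children))
           "</" tag ">"))))

(defmacro xml [form]
  (xml-helper form))

;; The expansion is already a finished string literal
(macroexpand '(xml (p (b "bold") " text")))
;=> "<p><b>bold</b> text</p>"
(string? (macroexpand '(xml (p "hi"))))  ;=> true
```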

Obviously such power can be abused, and it is possible to use macros to build incredibly obtuse and convoluted expressions. When used correctly, they provide nearly unlimited power to change the language to suit any need.

Summary

Through macros, Clojure provides powerful, elegant metaprogramming facilities. In Clojure, code and data are interchangeable, and macros are compile-time functions which emit data that becomes code.

Macros can either build code directly, or use syntax-quoting to template their output. They are hygienic, in that symbols bound by macros must use the auto gensym feature to avoid potential collisions with existing symbols.

Although they can add complexity to a program, when used judiciously macros provide the means to eliminate nearly all repeated and boilerplate code. They allow the developer to create language-level control structures and abstractions, extending the language exactly as needed to fit the problem domain. Tasteful and restrained use of macros, along with Clojure's other dynamic features such as first-class functions, allows developers to create custom DSLs, organically adapting their systems to fit a problem domain, rather than being forced to restate their problems just to meet the demands of an inflexible system.
