Chapter 3. Controlling Program Flow

Functions

Because Clojure is a functional language, functions are the beginning and end of every Clojure program. The "shape" of any Clojure program is like a tree, each function branching out and calling other functions. Understanding a Clojure program means understanding its functions and the patterns in which they are called. Use functions carelessly and your Clojure programs will be incomprehensible spaghetti. Use them thoughtfully and your Clojure programs will be fast, elegant, and a genuine joy both to write and to read.

First-Class Functions

In Clojure, all functions are first-class objects. This means the following:

  • They can be dynamically created at any point during the execution of the program.

  • They aren't intrinsically named, but can be bound to symbols or to more than one symbol.

  • They can be stored as values in any data structure.

  • They can be passed to, and returned from, other functions.

Contrast this with functions in more static languages, such as Java or C. In these languages, functions must always be defined and named up-front, before compilation. It is a tremendous advantage of Clojure (and other functional languages) to be able to define new functions on-the-fly and to store them in arbitrary data structures.
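As a brief sketch of these four properties, consider the following. The names operations, make-adder, and add-five are invented here for illustration; fn and defn are explained in the sections that follow, and maps are covered in Chapter 4.

```clojure
;; Store functions as values in a data structure (here, a map).
(def operations {:add + :multiply *})

;; Look a function up in the map and call it.
((operations :multiply) 3 4)      ; => 12

;; Create and return a new function while the program runs.
(defn make-adder [n]
  (fn [x] (+ x n)))

;; Bind the returned function to a symbol.
(def add-five (make-adder 5))
(add-five 10)                     ; => 15

;; Pass a function as an argument to another function.
(map add-five [1 2 3])            ; => (6 7 8)
```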

Defining Functions with fn

The most basic way to define a function is with the fn special form, which returns a new first-class function when evaluated. In its simplest form, it takes two arguments: a vector (a bracketed list) of argument symbols and an expression which will be evaluated when the function is called.

Note

Vectors, delimited by left and right square brackets, have not yet been discussed. For a detailed explanation of their characteristics, see Chapter 4. For now, you can think of them as an alternate way of expressing a list. Unlike lists delimited by parentheses, they don't denote a function call when evaluated, so they are suitable for quickly and easily expressing literal data structures in code.

For example, at the REPL, you can define an extremely simple function which takes two arguments and multiplies them.

user=> (fn [x y] (* x y))

This form may look slightly complicated, but it is really very simple: it is a form consisting of just three other forms: fn, [x y] and (* x y). fn is called with the other two as arguments—the vector [x y] defines that the new function has two arguments, x and y, while (* x y) is the body of the function, with x and y bound to their respective arguments. There is no need to use any kind of explicit return statement—the function always returns the evaluation of the provided expression.

However, this isn't much use on its own. It just returns the function, which then gets translated to a string to be printed by the REPL. The string view of a function isn't particularly pretty or useful:

#<user$eval__43$fn__45 user$eval__43$fn__45@ac06d4>

What's more, you now can't use this function, because you didn't bind it to any symbol or put it in any data structure. The JVM might have garbage collected it right away, because it was of no more use. Typically, it's more useful to bind a function to a var, like this:

user=> (def my-mult (fn [x y] (* x y)))

You can now use the new function in any context where you have access to that var:

user=> (my-mult 3 4)
12

And, it works as advertised. The expression (fn [x y] (* x y)) is evaluated to a first-class function, which is then bound to the symbol my-mult. To call my-mult, you evaluate a list with a function as the first element. my-mult resolves to the new function, which is then called with 3 and 4 as arguments.

Note, however, that binding a function to a symbol is only one way to use it. As long as something which resolves to a function is used as the first element of a form, it will be called, whether it is a symbol or not. For example, it is entirely possible to define a function and use it within the same form:

user=> ((fn [x y] (* x y)) 3 4)
12

In this form, notice that the entire function definition, (fn [x y] (* x y)), is used as the first item in the form. When it is evaluated, it resolves to a function and is passed 3 and 4 as arguments, the same as when it was bound to a symbol and the symbol was evaluated.

The important thing to remember is that functions are not the same as the symbols to which they are bound. In the previous example, my-mult is not the function; it is only a symbol bound to the function. When it is called, Clojure is not calling my-mult itself. It resolves my-mult to obtain a function and calls that in turn.
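The distinction can be seen in a small sketch, assuming a fresh REPL session. The second name, times, is hypothetical, and identical? is a built-in function that tests whether two values are the same object.

```clojure
(def my-mult (fn [x y] (* x y)))

;; Bind a second symbol to the very same function.
(def times my-mult)

(times 3 4)                 ; => 12
(identical? my-mult times)  ; => true

;; Rebinding my-mult does not change the function that times refers to.
(def my-mult (fn [x y] (+ x y)))
(my-mult 3 4)               ; => 7
(times 3 4)                 ; => 12
```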

Defining Functions with defn

Although functions are distinct from the symbols to which they may be bound, it is by far the most common case that functions are named and bound to particular symbols for later use. For this purpose, Clojure provides the defn form as a shortcut for defining a function and binding it to a symbol. defn is semantically equivalent to using def and fn together, but shorter and more convenient. It also offers the ability to add a documentation string to a function, explaining how it is used.

The defn form takes the following arguments: a symbol name, a documentation string (optional), a vector of arguments, and an expression for the function body. For example, the following code defines a function which squares a single argument:

user=> (defn sq
          "Squares the provided argument"
          [x]
          (* x x))

You can then call the function using the assigned name:

user=> (sq 5)
25

You can check the doc-string of any function using the built-in doc function, which prints information on a function (including its doc-string) to the standard system output.

user=> (doc sq)
---------------------
user/sq
([x])
   Squares the provided argument
nil

Tip

The doc function is very useful for exploratory programming. All the built-in Clojure functions (as well as practically all libraries) provide good documentation, and using doc it is all easily accessible from the REPL. Make it your practice to document your functions with doc-strings as well, even if nobody else ever reads your code. You will be surprised how much of an aid it is to your own memory after a week or two. Making it easy to remember exactly what your functions do is very helpful.

Functions of Multiple Arities

Arity refers to the number of arguments that a function accepts. In Clojure, it is possible to define alternate implementations of a function based on arity.

This uses the same fn or defn forms previously discussed, but with a slight modification in the arguments. Instead of passing a single vector for arguments and expression for the implementation, you can pass multiple vector/expression pairs, each enclosed in parentheses. This is easier to demonstrate rather than explain:

user=> (defn square-or-multiply
         "squares a single argument, multiplies two arguments"
         ([] 0)
         ([x] (* x x))
         ([x y] (* x y)))

This defines a function with three alternate implementations. The first is an empty vector and will be applied when the function is called with no arguments. The implementation just returns the constant 0. The second implementation takes a single argument, and returns that argument multiplied by itself. The third implementation takes two arguments, and returns their product. This can be verified in the REPL:

user=> (square-or-multiply)
0
user=> (square-or-multiply 5)
25
user=> (square-or-multiply 5 2)
10
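One common use of multiple arities is to supply default values, with the shorter arity calling the longer one. The following is a minimal sketch; the function greet and its arguments are hypothetical, and str is the built-in function that concatenates its arguments into a string.

```clojure
(defn greet
  "Builds a greeting; the one-argument version supplies a default greeting."
  ([person] (greet "Hello" person))
  ([greeting person] (str greeting ", " person "!")))

(greet "Ada")             ; => "Hello, Ada!"
(greet "Welcome" "Ada")   ; => "Welcome, Ada!"

;; (greet) would throw an exception, because no zero-argument
;; implementation has been defined.
```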

Functions with Variable Arguments

Often, it is necessary to have a function that takes any number of arguments. This is referred to as variable arity. Clojure accommodates this requirement by providing the special symbol & in the argument definition vector for function definitions. It works in both fn and defn.

To use it, just add a & and a symbol name after any normal argument definitions in your argument definition vector. When the function is called, any additional arguments will be added to a seq (similar to a list), and the seq will be bound to the provided symbol. For example, the following code:

user=> (defn add-arg-count
         "Returns the first argument + the number of additional arguments"
         [first & more]
         (+ first (count more)))

count is simply a built-in function which returns the length of a list. Try it out, using the following code:

user=> (add-arg-count 5)
5
user=> (add-arg-count 5 5)
6
user=> (add-arg-count 5 5 5 5 5 5)
10

In the first call, the single argument 5 is bound to first, and because there are no additional arguments, more is bound to nil. (count more) treats nil as empty and returns 0, and so the result is simply the first argument. In the second and third calls, however, more is bound to the lists (5) and (5 5 5 5 5), the lengths of which are 1 and 5, respectively. These are added to 5 and returned.

Chapter 4 discusses lists and some common functions for reading and extracting values from them. These will all work on the list bound to the more argument.
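To see exactly what ends up bound to the symbol after &, here is a small sketch. The functions describe-args and sum-all are hypothetical; apply is a built-in function that calls a function with the elements of a sequence as its arguments. Note that when no extra arguments are passed, the symbol is bound to nil, which count and apply treat as empty.

```clojure
(defn describe-args
  "Returns a vector of the first argument and whatever was collected in more."
  [x & more]
  [x more])

(describe-args 1)        ; => [1 nil]
(describe-args 1 2 3)    ; => [1 (2 3)]

(defn sum-all
  "Adds up any number of arguments."
  [& numbers]
  (apply + numbers))

(sum-all)                ; => 0
(sum-all 1 2 3 4)        ; => 10
```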

Shorthand Function Declaration

As succinct as fn can be when defining functions, there are still cases where it can be cumbersome to type it out in its entirety. Typically, these are cases where a function is declared and used inline, rather than bound to a top-level symbol.

Clojure provides a shorthand form for declaring a function, in the form of a reader macro. To declare a function in shorthand, use the pound sign, followed by an expression. The expression becomes the body of the function, and any percent signs in the body are interpreted as arguments to the function.

Note

Reader macros are specialized, shorthand syntax and can usually be identified because they are just about the only forms in Clojure that are not contained by matched parentheses, brackets, or braces. They are resolved as the first step when parsing Clojure code and are transformed into their long form before the code is actually compiled. The shorthand function form #(* %1 %2) is actually identical to the longer form (fn [x y] (* x y)) before it is even seen by the compiler. Reader macros are provided for a few extremely common tasks, and they can't be defined by users. The rationale behind this limitation is that overuse of reader macros makes code impossible to read unless the reader is very familiar with the macro in question. Preventing users from creating custom reader macros lowers the barriers to sharing code and helps to keep Clojure consistent as a language. Still, they can be very useful for certain extremely common forms, so Clojure provides a small set that are available by default.

For example, here is the square function implemented in shorthand:

user=> (def sq #(* % %))
#'user/sq
user=> (sq 5)
25

The percent sign implies that the function takes a single argument and is bound to the argument within the function body. To declare shorthand functions with multiple arguments, use the percent sign followed by a numeral 1 through 20:

user=> (def multiply #(* %1 %2))
#'user/multiply
user=> (multiply 5 3)
15

%1 or % refers to the first argument, %2 to the second, etc. It can be readily seen that the shorthand function is much more compact, especially for functions declared inline:

user=> (#(* % %) 5)
25

The only downside to shorthand functions is that they can be difficult to read, so use them judiciously and only when they are very short. Also, be aware that shorthand function declarations cannot be nested.
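Shorthand functions are most at home when passed inline to other functions. The sketch below assumes the built-in sequence functions map, filter, and reduce, which have not been introduced at this point in the chapter.

```clojure
;; Square every element of a vector.
(map #(* % %) [1 2 3 4])              ; => (1 4 9 16)

;; Keep only the numbers greater than 10.
(filter #(> % 10) [5 12 8 20])        ; => (12 20)

;; Two arguments: %1 is the running total, %2 the next element.
(reduce #(+ %1 (* 2 %2)) 0 [1 2 3])   ; => 12

;; The first call again, spelled out with fn for comparison.
(map (fn [x] (* x x)) [1 2 3 4])      ; => (1 4 9 16)
```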

Conditional Expressions

It is an essential characteristic of any program that it must be able to alter its behavior depending on the situation. Clojure, of course, provides a full set of simple conditional forms.

The most basic conditional form is the if form. It takes a test expression as its first argument. If the test expression evaluates to true, it returns the result of evaluating the second argument (the "then" clause). If the test expression evaluates to logical false (including nil), it evaluates and returns the third argument (the "else" clause), if one is provided, and nil if it is not. For example, the following code:

user=> (if (= 1 1)
           "Math still works.")
"Math still works."

Another example with an "else" expression:

user=> (if (= 1 2)
           "Math is broken!"
           "Math still works.")
"Math still works."

Clojure also provides an if-not form. This functions exactly the same way as if, except its behavior is reversed. It evaluates the second argument if the test expression is logically false, and the third only when logically true.

user=> (if-not (= 1 1)
           "Math is broken!"
           "Math still works.")
"Math still works."

Sometimes, it is useful to choose not just between true and false but between several different options. You could do this with nested ifs, but it's much cleaner to use the cond form. cond takes as its arguments any number of test/expression pairs. It evaluates the first test, and, if true, returns the result of the first expression. If the first test evaluates to false, it tries the next test expression, and so on. If none of the test expressions evaluate to true, it returns nil, unless you provide an :else keyword as the last test, which serves as a catch-all because a keyword is always logically true. For an example, let's define a function that uses cond to comment on the weather:

(defn weather-judge
    "Given a temperature in degrees centigrade, comments on the weather."
    [temp]
    (cond
        (< temp 20) "It's cold"
        (> temp 25) "It's hot"
        :else "It's comfortable"))

Try it out with the following code:

user=> (weather-judge 15)
"It's cold"
user=> (weather-judge 22)
"It's comfortable"
user=> (weather-judge 30)
"It's hot"

Tip

cond can be useful, but be careful: large cond statements can be difficult to maintain, especially as the range of possible behaviors in your program grows. Instead, consider using polymorphic dispatch by means of multimethods, discussed in Chapter 9. Multimethods allow conditional logic, similar to cond, but are much more extensible.
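The rules for the conditional forms in this section can be summarized in a short sketch. The name weather-judge-nested is hypothetical; it is simply the weather function written with nested if forms, to show what cond saves you from typing.

```clojure
;; Only nil and false are logically false.
(if nil   "truthy" "falsey")      ; => "falsey"
(if false "truthy" "falsey")      ; => "falsey"

;; Everything else is logically true, including 0 and the empty string.
(if 0  "truthy" "falsey")         ; => "truthy"
(if "" "truthy" "falsey")         ; => "truthy"

;; The weather function rewritten with nested if forms.
(defn weather-judge-nested
  [temp]
  (if (< temp 20)
    "It's cold"
    (if (> temp 25)
      "It's hot"
      "It's comfortable")))

(weather-judge-nested 20)         ; => "It's comfortable"

;; Without a catch-all, cond returns nil when no test succeeds.
(cond (< 5 3) "five is smaller")  ; => nil
```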

Local Bindings

In a functional language, new values are obtained by function composition—nesting multiple function calls. Sometimes, however, it is necessary to assign a name to the result of a computation, both for clarity and, if the value might be used more than once, for efficiency.

Clojure provides the let form for this purpose. let allows you to specify bindings for multiple symbols, and a body expression within which those symbols will be bound. The symbols are local in scope—they are only bound within the body of the let. They are also immutable; once they are bound, they are guaranteed to refer to the same value throughout the body of the let and cannot be changed.

The let form consists of a vector of bindings and a body expression. The binding vector consists of a number of name-value pairs. For example, the following let-expression binds a to 2, b to 3, and then adds them:

user=> (let [a 2 b 3] (+ a b))
5

This is the simplest possible way to use let. However, it is fairly trivial and let adds more complexity than it provides value. For a more compelling example of when to use let, consider the following function:

(defn seconds-to-weeks
    "Converts seconds to weeks"
    [seconds]
    (/ (/ (/ (/ seconds 60) 60) 24) 7))

It works fine, but it's not very clear. The nested calls to the division function are a bit confusing, and although most people would be able to figure out the code without too much trouble, it is more work than it should be for this seemingly simple functionality. Also, one can easily imagine a similar function, with values and operations that are much less familiar. Such a function, written like this, might never be deciphered.

We can use let to clean up this definition:

(defn seconds-to-weeks
    "Converts seconds to weeks"
    [seconds]
    (let [minutes (/ seconds 60)
          hours (/ minutes 60)
          days (/ hours 24)
          weeks (/ days 7)]
        weeks))

This is longer, but you can see what's going on at each step of the calculation. You bind intermediary symbols to minutes, hours, days, and weeks, and then return weeks rather than doing the calculation all in one go. This example demonstrates mostly a stylistic choice. It makes the code clearer, but also longer. When and how to use it is up to you, but the bottom line is simple: use let to make your code clearer and to store the results of calculations, so you don't have to perform them multiple times.
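The efficiency argument is easiest to see when a bound value is used more than once. In the following sketch (the function name is hypothetical), each square is computed a single time and then used twice. The second form shows that a later binding may refer to an earlier one, just as hours refers to minutes above.

```clojure
(defn sum-and-product-of-squares
  "Returns a vector holding the sum and the product of the squares of a and b."
  [a b]
  (let [a2 (* a a)     ; computed once, used twice below
        b2 (* b b)]
    [(+ a2 b2) (* a2 b2)]))

(sum-and-product-of-squares 2 3)   ; => [13 36]

;; Bindings are established in order, so b can be defined in terms of a.
(let [a 2
      b (* a 10)]
  (+ a b))                         ; => 22
```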

Looping and Recursion

It will probably come as a minor shock to users of imperative programming languages that Clojure provides no direct looping syntax. Instead, like other functional languages, it uses recursion in scenarios where it is necessary to execute the same code multiple times. Because Clojure encourages the use of immutable data structures, recursion provides a much better conceptual fit than typical, imperative iteration.

Thinking recursively is one of the largest challenges coming from imperative to functional languages, but it is surprisingly powerful and elegant, and you will soon learn how to easily express any repeated computation using recursion.

Most programmers have some notion of recursion in its simplest form—a function calling itself. This is accurate, but does not carry any idea of how useful recursion can actually be or how to use it effectively and understand how it works in a variety of scenarios.

For effective recursion in Clojure (or any other functional language, for that matter), you only need to keep these guidelines in mind:

  • Use a recursive function's arguments to store and modify the progress of a computation. In imperative programming languages, loops usually work by repeatedly modifying a single variable. In Clojure, there are no variables to modify. Instead, make full use of a function's arguments. Don't think about recursion as repeatedly modifying anything, but as a chain of function calls. Each call needs to contain all the information required for the computation to continue. Any values or results that are modified in the course of a recursive computation should be passed as arguments to the next invocation of the recursive function, so it can continue operating on them.

  • Make sure the recursion has a base case or base condition. Within every recursive function, there needs to be a test to see if some goal or condition has been reached, and if it has, to finish recurring and return a value. This is similar to protecting against infinite loops in an imperative language. If there isn't a case where the code is directed to stop recurring, it never will. Obviously, this causes problems.

  • With every iteration, the recursion must make at least some progress towards the base condition. Otherwise, there is no guarantee that it would ever end. Typically, this is achieved by making some numeric value larger or smaller, and testing that it has reached a certain threshold as the base condition.
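As a minimal sketch of the three guidelines, here is a factorial function (a hypothetical example, not one of the chapter's listings) that carries its running result in an extra argument named acc.

```clojure
(defn factorial
  "Computes n! for a non-negative integer n."
  ([n] (factorial n 1))
  ([n acc]                              ; progress is carried in the arguments
   (if (zero? n)                        ; base case: nothing left to multiply
     acc
     (factorial (- n 1) (* acc n)))))   ; n moves one step closer to zero

(factorial 0)   ; => 1
(factorial 5)   ; => 120
```

Nothing is modified here: each call receives a smaller n and a larger acc, and the chain of calls ends when n reaches zero.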

As an example, the following Clojure program uses Newton's algorithm to recursively calculate the square root of any number. It is a full, albeit small, Clojure program with one main function and several helper functions that demonstrate all these features of recursion (see Example 3-1).

Example 3-1. Calculating Square Roots

(defn abs
    "Calculates the absolute value of a number"
    [n]
    (if (< n 0)
        (* -1 n)
        n))

(defn avg
    "returns the average of two arguments"
    [a b]
    (/ (+ a b) 2))

(defn good-enough?
    "Tests if a guess is close enough to the real square root"
    [number guess]
    (let [diff (- (* guess guess) number)]
        (if (< (abs diff) 0.001)
            true
            false)))

(defn sqrt
    "returns the square root of the supplied number"
    ([number] (sqrt number 1.0))
    ([number guess]
    (if (good-enough? number guess)
        guess
        (sqrt number (avg guess (/ number guess))))))

Let's try it out. After loading this file into the Clojure runtime, try the following at the REPL:

user=> (sqrt 25)
5.000023178253949
user=> (sqrt 10000)
100.00000025490743

As advertised, this code returns a guess whose square is within .001 of the original number.

The first three functions defined in this file, abs, avg, and good-enough?, are straightforward helper functions. You don't need to observe them too closely at this point, unless you want to. The meat of the algorithm happens in the fourth, the sqrt function.

The most obvious thing about the sqrt function is that it has two implementations. The first can be thought of as the "public" interface. It's easy to call, and takes only a single argument: the number for which you are trying to find the square root. The second is the recursive implementation, which takes both the number and your best guess so far. The first implementation merely calls the second, with an initial guess of 1.0.

The recursive implementation itself is simple. It first checks the base condition, defined by the good-enough? function, which returns true if your guess is close enough to the actual square root. If the base condition is met, the function doesn't recur any more, but simply returns the guess as the answer.

If the base condition is not met, however, it continues the recursion by calling itself. It passes the guess and the number to itself as arguments, as those are all it needs to continue the calculation. This fulfills the first characteristic of recursive functions defined above.

Finally, note the expression provided as the value of guess for the next iteration: (avg guess (/ number guess)). It always passes the average of the current guess and the number divided by the current guess. The mathematical properties of square roots guarantee that this number will always be closer to the square root of the number than the previous guess. This fulfills the last requirement for a good recursive function. With each iteration, it makes progress and gets closer to the result. Each time the function is run, guess gets a little closer to the actual square root, and eventually it is guaranteed to get close enough that good-enough? can return true and the calculation will end.

Example 3-2 shows another function, which uses recursion to calculate exponents.

Example 3-2. Calculating Exponents

(defn power
    "Calculates a number to the power of a provided exponent."
    [number exponent]
    (if (zero? exponent)
        1
        (* number (power number (- exponent 1)))))

Try it out with the following code:

user=> (power 5 3)
125

This function uses recursion differently than the square root function. Here, you use the mathematical observation that x^n = x * x^(n-1). This can be seen in the recursive call: the function returns the number, multiplied by the number raised to one less than the initial power. You have a base case: it checks if the exponent is zero, and if so, returns 1, since x^0 is always 1. Since you subtract 1 from the exponent on each iteration, you can be sure that you will eventually reach it (as long as you don't give the function a negative exponent). The function always makes progress towards the base condition.

Note

Of course, there are easier ways to get square roots and powers than implementing these functions. Both exist in Java's standard math library, which is extremely easy to call from Clojure. These are merely presented as clean examples of recursive logic. See the chapter on Java Interoperability for instructions on how to call Java library functions.
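For reference, the corresponding calls to the static methods of java.lang.Math look like this. Both methods return floating-point values.

```clojure
(Math/sqrt 25)   ; => 5.0
(Math/pow 5 3)   ; => 125.0
```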

Tail Recursion

One practical problem with recursion is that, due to the hardware limitations of physical computers, there is a limit on the number of nested function calls (the size of the stack). On the JVM, this varies and can be quite large. On the machine on which I write this, it's about 5000. Nevertheless, no matter how large the stack size is, it does force a major issue: there is a strict limit on the number of times a function can recur. For small functions, this rarely matters. But if recursion is a generic and complete replacement for loops, it becomes an issue. There are many situations in which it is necessary to iterate or recur indefinitely.

Historically, functional languages resolve this issue through tail-call optimization. Tail-call optimization means that, if certain conditions are met, the compiler can optimize the recursive calls in such a way that they do not consume stack. Under the covers, they're implemented as iterations in the compiled machine code.

The only requirement for a recursive call to be optimized in most functional languages is that the call occurs in tail position. There are several formal definitions of tail position, but the easiest to remember, and the most important, is that it is the last thing a function does before returning. If the return value of the "outer" function is wholly delegated to the "inner" function, the call is in tail position. If the "outer" function does anything with the value returned from the inner function except just return it, it is not tail recursive and cannot be optimized. This makes sense when the nature of the call stack is considered; if a call is in tail position, then the program can effectively "forget" that it was called recursively at all and delegate the entire program flow to the result of the inner function. If there is additional processing to do, the compiler can't throw away the outer function. It has to keep it around in order to finish computing its result.

For example, in the preceding examples, the recursive call in the power function is not in tail position, because the function doesn't simply return the value of the recursive call, but takes it and multiplies it by number before returning. This cannot be optimized.

On the other hand, the recursive call in sqrt is in tail position, because all the function does with the call is to return the value—no extra processing required.

Clojure's recur

In some functional languages, such as Scheme, tail call optimization happens automatically whenever a recursive call is in tail position. Clojure does not do this. In order to have tail recursion in Clojure, it is necessary to indicate it explicitly using the recur form.

To use recur, just call it instead of the function name whenever you want to make a recursive call. It will automatically call the containing function with tail-call optimization enabled.

Example 3-3, for instance, is a recursive function written without recur that adds up all the numbers up to a given limit, e.g., (add-up 3) = 1 + 2 + 3 = 6.

Example 3-3. Adding Up Numbers without Tail Recursion

(defn add-up
    "adds all the numbers up to a given limit"
    ([limit] (add-up limit 0 0))
    ([limit current sum]
        (if (< limit current)
            sum
            (add-up limit (+ 1 current) (+ current sum)))))

This works fine and is valid according to the rules of recursion. It passes the current number, the sum so far, and the limit as arguments. It checks for a base case (when the current number is greater than the limit), and each iteration gets closer to the base case. It works great for small and moderate values:

user=> (add-up 3)
6
user=> (add-up 500)
125250

But if you try to use it on a really large number, it chokes:

user=> (add-up 5000)
java.lang.StackOverflowError

This is where you need tail call optimization. Just redefine it, replacing the recursive call to add-up with a call to recur, as shown in Example 3-4.

Example 3-4. Adding up Numbers Correctly with Tail-recursion

(defn add-up
    "adds all the numbers up to a limit"
    ([limit] (add-up limit 0 0))
    ([limit current sum]
        (if (< limit current)
            sum
            (recur limit (+ 1 current) (+ current sum)))))

Now you can give it a try:

user=> (add-up 5000)
12502500

It works with no problems. Using recur, the only limit to how much recursion you can use is how long you are willing to wait for the processing to finish.
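The same technique can make the earlier power function tail recursive. The version below is a sketch under an assumed name, power-tail: it moves the multiplication into an accumulator argument, acc, so that the recursive call is the last thing the function does and recur can be used.

```clojure
(defn power-tail
  "Raises number to a non-negative integer exponent using an accumulator."
  ([number exponent] (power-tail number exponent 1))
  ([number exponent acc]
   (if (zero? exponent)
     acc
     ;; Nothing remains to be done after this call, so recur is allowed.
     (recur number (- exponent 1) (* acc number)))))

(power-tail 5 3)        ; => 125
(power-tail 1 100000)   ; => 1, after 100000 iterations with no stack growth
```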

Note

Clojure has come under fire from some quarters for not doing tail-call optimization by default, whenever possible, without the need for the recur special form. Although the invention of recur was spurred by the limitations of the JVM that make it difficult to do automatic tail optimization, many members of the Clojure community find that having explicit tail recursion is much clearer and more convenient than having it implicitly assumed. With Clojure, you can tell at a glance if a function is tail recursive or not, and it's impossible to make a mistake. If something uses recur, it's guaranteed never to run out of stack space due to recursion. And if you try to use recur somewhere other than in correct tail position, the compiler will complain. You are never left wondering whether a call is actually in tail position or not.

Using loop

The loop special form, used in conjunction with recur, provides the capability to make tail recursion even simpler by providing the means to declare and call a function at the same time. Logically, loop is no different from defining and then immediately calling an anonymous recursive function, but it makes it much easier to "read" the logical flow and see how iterative looping and tail-recursion are actually the same thing.

To define a loop construct, use the loop form. It in turn takes two forms: a vector of initial argument bindings (in name/value pairs), followed by an expression for the body. Whenever recur is used within the body of the loop, it will recursively "call" the loop again with any passed arguments rebound to the same names as in the loop definition.

For example, the following is a very simple loop that establishes an initial binding of the symbol i to 0, recursively increments it up to ten and then returns:

(loop [i 0]
    (if (= i 10)
        i
        (recur (+ i 1))))

Note that, like any recursive function, the loop body has a base case (when i = 10) and makes progress towards the base case with every iteration. Unlike a recursive function, however, there isn't any need to define a function by itself. loop sets up your functions and assigns initial values, and then provides the point that the program execution "comes back" to when recur is called. You can look at it equally well as a recursive call, or an iterative loop with a set of values that changes each time around.
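A loop may carry more than one binding, in which case recur must supply a new value for each, in order. As a sketch, here is the number-summing computation from earlier written with loop; the name sum-to is hypothetical.

```clojure
(defn sum-to
  "Adds the integers from 1 up to and including limit."
  [limit]
  (loop [current 1
         sum 0]
    (if (> current limit)
      sum
      ;; The first argument rebinds current, the second rebinds sum.
      (recur (+ current 1) (+ sum current)))))

(sum-to 3)      ; => 6
(sum-to 5000)   ; => 12502500
```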

This is extremely useful, to the point where almost all uses of recur in practice are coupled with a loop. One extremely common idiom when writing recursive functions in other functional languages is to have two versions of the function—one recursive, one not. Typically, the non-recursive version sets up some initial values and then calls the recursive function. This is a natural outcome of good recursive style—the recursive function may need a lot of arguments to keep track of its computational state, but those don't always need to be exposed to the end caller of the function. loop provides the capability to do this much more compactly. To see an example of this, look at the square root function introduced earlier in this chapter (modified to use recur instead of direct recursion).

(defn sqrt
    "returns the square root of the supplied number"
    ([number] (sqrt number 1.0))
    ([number guess]
    (if (good-enough? number guess)
        guess
        (recur number (avg guess (/ number guess))))))

Notice the two implementations of the function—the non-recursive version sets the initial value of guess, and then kicks off the recursion. You can refactor this to use loop and to do both of these things in a single step:

(defn loop-sqrt
    "returns the square root of the supplied number"
    [number]
    (loop [guess 1.0]
        (if (good-enough? number guess)
            guess
            (recur (avg guess (/ number guess))))))

This version only has one function implementation. The loop sets the initial value of guess and immediately executes its body. When recur is called, it "calls" the loop statement again, not the top-level function. The argument to recur is matched up with the binding in the loop, so with each iteration the new guess value is bound to guess. The code meant to repeat is neatly packaged between loop and recur.

Deliberate Side Effects

As discussed in Chapter 2, Clojure avoids side effects wherever possible, preferring a purely functional style. Some tasks, however, such as I/O, explicit state management, and Java interaction, are side effects by their very nature. These cannot be expressed in purely functional code, so Clojure provides constructs to run side effects explicitly.

Using do

The most important and basic way to run a side effect is to use the do special form. do is very simple. It takes multiple expressions, evaluates them all and returns the value of the last one. This means that from a functional standpoint, all expressions but the last are ignored; they are present only as a means to execute side effects.

For example, take the println function. println is called for its side effect, since it performs output. It returns nil, so it doesn't fit well in functional programs (which rely heavily on meaningful return values). The following code, entered at the REPL, uses do to call println several times for its side effects and then returns a distinct value.

user=> (do
           (println "hello")
           (println "from")
           (println "side effects")
           (+ 5 5))

The following output is produced:

hello
from
side effects
10

The first three lines are output produced as a result of calling println. The final value, 10, is the return value of the do form itself, printed by the REPL; it is not a side effect. The side effects occur whenever the do form is evaluated, whether at the REPL or not.
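
In practice, do is most often needed in places that accept exactly one expression, such as a branch of an if form. The following is a minimal sketch of that usage; the function name clamp-negative is hypothetical and chosen only for illustration.

```clojure
(defn clamp-negative
  "Returns x, or 0 if x is negative, printing a warning in that case.
  clamp-negative is a hypothetical name used only for this sketch."
  [x]
  (if (neg? x)
    (do
      (println "Warning: negative input" x) ; evaluated only for its side effect
      0)                                    ; value of the do form, and so of the branch
    x))

(clamp-negative 7)  ; returns 7 and prints nothing
(clamp-negative -3) ; prints a warning, then returns 0
```

Without do, the if form would treat the println call as the "then" branch and the 0 as the "else" branch, which is not what is intended here.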

Side Effects in Function Definitions

If you have a function that needs to perform side effects, Clojure also lets you run them directly from the body of a function definition (using either fn or defn) or from the body of a loop, without needing an explicit do form. This is accomplished quite simply by providing multiple expressions, instead of just one, as the body of the function or loop. The last expression will be evaluated, as usual, for the return value. All the other expressions are evaluated solely for their side effects.

For example, here is a function definition for a function which squares a number. From a functional standpoint, it is identical to the one at the beginning of this chapter. However, it runs two side effects (specifically, calls to println) in addition to returning the value.

(defn square
    "Squares a number, with side effects."
    [x]
    (println "Squaring" x)
    (println "The return value will be" (* x x))
    (* x x))

As with do, only the last expression in the function body supplies the return value. But when you run the function at the REPL, you see the side effects as well:

user=> (square 5)
Squaring 5
The return value will be 25
25

The same construct also works for fn: just add additional expressions before the one that returns the value. This can be very useful, for example, for adding logging to track when functions are called.
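
As a small sketch of the fn case, the following anonymous function logs each call before returning its result. The name logged-inc is hypothetical; any function body with more than one expression behaves the same way.

```clojure
;; logged-inc is an illustrative name, not part of the chapter's code.
(def logged-inc
  (fn [x]
    (println "logged-inc called with" x) ; evaluated only for its side effect
    (inc x)))                            ; last expression: the return value

(logged-inc 41) ; prints "logged-inc called with 41" and returns 42
```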

Functional Programming Techniques

The preceding sections described the mechanical basics of how to declare functions and control program flow within a Clojure program. These are the most fundamental components from which Clojure programs are built. Most of the rest of Clojure's standard library is expressible in terms of these basic constructs (with the exception of macro-based forms, discussed in Chapter 12).

However, to write a good Clojure program, you must not only know these forms but also understand some of the techniques for using them effectively, and be aware of everything that Clojure allows you to do. Most of these techniques are by no means exclusive to Clojure, but are common to all functional languages.

First-Class Functions

Functions can themselves be values and passed to and returned from other functions. This is an important feature of functional programming. It isn't just a way of doing clever tricks with code, but a key way to structure programs. By passing blocks of functionality around as functions, it is possible to write code that can be extremely generic and nearly eliminate code duplication.

There are two aspects to using first-class functions: taking them as arguments and calling them, and creating and returning them. The former is somewhat more common, as it is conceptually "easier," although the latter can be extremely powerful as well.

Consuming First-Class Functions

Functions that take other functions as arguments are extremely common. These are known as higher-order functions. Most of the sequence manipulation library (see Chapter 5) is based around this technique.

The primary motivation for allowing a function to take other functions as arguments is to make it more generic. By delegating specific behaviors to the provided functions, the outer function can be much more general, and therefore, suitable for use in a much wider range of scenarios.

For example, the following function calculates the result of a function applied to two arguments, and also the result when the order of the arguments is reversed. The key point to notice is that it works for any function that takes two arguments. Perhaps you designed this function with one function in mind, but it works equally well for anything else.

(defn arg-switch
    "Applies the supplied function to the arguments in both possible orders."
    [fun arg1 arg2]
    (list (fun arg1 arg2) (fun arg2 arg1)))

The function constructs a list of two items. The first is the result of calling the function with the parameters in the original order and the second is the result of calling them in reverse order. Test it at the REPL:

user=> (arg-switch / 2 3)
(2/3 3/2)

Here, you pass arg-switch three distinct parameters: the division function, the number two, and the number three. It returns a list with two items: the first is two divided by three and the second is three divided by two. Both are presented as fractions, because that is Clojure's default numerical representation for rational numbers.

arg-switch works equally well when passed other functions:

user=> (arg-switch > 2 3)
(false true)

When passed the greater-than function, it returns (false true), the respective results of (> 2 3) and (> 3 2). It also works for non-numeric functions. Here you try it with the string concatenation function str:

user=> (arg-switch str "Hello" "World")
("HelloWorld" "WorldHello")

You can even pass it a custom function, defined inline:

user=> (arg-switch (fn [a b]
                       (/ a (* b b)))
                   2 3)
(2/9 3/4)

As you can see, by allowing your function to take another function as an argument, you have with no extra work created an extremely generic, flexible function that can be used in a wide variety of scenarios (assuming you needed this sort of function to begin with). Defining it using a first-class function is infinitely preferable to having to write it again and again for each type of operation. When programs become more complex, this is even more of an advantage. Functions can concentrate entirely on their own logic and delegate all other operations.

Producing First-Class Functions

Not only can functions take other functions as arguments, but they can construct them and return them as values. This has the potential to be rather mind-bending, if not kept clean and understandable, but is also an extraordinarily powerful feature.

This is one of the main reasons Lisp has historically been associated with artificial intelligence. It was thought that functions creating other functions would allow a machine to evolve and define its own behavior. Although self-modifying programs never quite lived up to expectations, the ability to define functions on-the-fly is nevertheless extremely powerful and useful for many everyday programming tasks.

As one example, here is a very simple function that creates and returns another function which checks that a number is in a given range:

(defn rangechecker
    "Returns a function that determines if a number is in a provided range."
    [min max]
    (fn [num]
        (and (<= num max)
             (<= min num))))

To use this function, you can call it and save the result in the REPL:

user=> (def myrange (rangechecker 5 10))
#'user/myrange

Then call your new function, myrange, like any other function:

user=> (myrange 7)
true
user=> (myrange 11)
false

If you only needed one range check, it would probably be easier just to write it directly. But in a program where there may be dynamically generated ranges or thousands of different ranges required, creating a "function factory" function like rangechecker is very useful. For functions that are more complicated than just checking a range, it is a huge win, since any functions that can be generated dynamically are functions that don't have to be written manually with lots of complicated logic.
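
To illustrate the "dynamically generated ranges" case, here is a minimal sketch that builds one checker per range from a collection of ranges. The rangechecker definition is repeated so the example is self-contained; the ranges data and the checkers name are hypothetical, and map (covered in Chapter 5) is used ahead of its full introduction.

```clojure
(defn rangechecker
  "Returns a function that determines if a number is in a provided range."
  [min max]
  (fn [num]
    (and (<= num max)
         (<= min num))))

;; Hypothetical data: [min max] pairs that might come from configuration.
(def ranges [[0 9] [10 19] [20 29]])

;; One checker function per range, created at run time.
(def checkers
  (map (fn [r] (rangechecker (first r) (second r))) ranges))

;; Ask every checker about the number 15.
(map (fn [check] (check 15)) checkers) ; => (false true false)
```

None of the three checker functions was written by hand; each was produced by the same factory from a different pair of values.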

Closures

As might be gathered from its very name, closures are a central feature in Clojure. But what, exactly, is a closure? And why do they matter so much?

Briefly stated, closures are first-class functions that contain values as well as code. These values are those in scope at function declaration, preserved along with the function. Whenever a function is declared, the values locally bound to symbols it references are stored along with it. They are "closed over" (hence the name) and maintained along with the function itself. This means that they are then available for the function's entire lifespan and the function can be referred to as a closure.

For example, the rangechecker function defined previously actually returns a closure. The inner function definition refers to the min and max symbols. If these values were not closed over and made available as part of the function, they would be well out of scope by the time the function was called. Instead, the generated function carries them with it, so they are available wherever and whenever it is called.

A closed-over value can't change after the function is created, so it becomes, in essence, a constant for that function.
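
A short sketch makes this concrete. The make-adder function below is a hypothetical example, not part of the earlier code; each function it returns keeps its own fixed copy of n.

```clojure
(defn make-adder
  "Returns a function that adds n to its argument.
  make-adder is a hypothetical name used only for this sketch."
  [n]
  (fn [x] (+ x n))) ; n is closed over by the returned function

(def add-five (make-adder 5))
(def add-ten (make-adder 10))

(add-five 1) ; => 6
(add-ten 1)  ; => 11
(add-five 1) ; => 6, creating add-ten did not affect add-five
```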

One interesting property of closures is that due to their dual nature—both behavior and data—they can fulfill some roles that are assumed by objects in object-oriented languages. Just as anonymous classes with one method are used to simulate first-class functions in Java, closures can be viewed as an object with a single method. If you implement this method as a generic dispatcher for "messages" sent to the closure, it can have the beginnings of a full object system (although this is overkill for most programs). It is very common to create closures in which the data they hold is just as important as the behavior they embody.
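
The following is a minimal sketch of the "generic dispatcher" idea, under the assumption that messages are represented as keywords. The make-account function and its message names are hypothetical and exist only for illustration; this is a toy, not a full object system.

```clojure
(defn make-account
  "Returns a closure that acts like a tiny read-only object.
  make-account and its messages are hypothetical names for this sketch."
  [owner balance]
  (fn [message]
    (cond
      (= message :owner)   owner
      (= message :balance) balance
      (= message :summary) (str owner " has " balance)
      :else                (str "Unknown message: " message))))

(def acct (make-account "Ada" 100))

(acct :owner)   ; => "Ada"
(acct :balance) ; => 100
(acct :summary) ; => "Ada has 100"
```

The closed-over owner and balance play the role of fields, and the single function plays the role of a method that dispatches on the message it receives.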

Currying and Composing Functions

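Currying, first invented by Moses Schönfinkel but named after Haskell Curry, refers here to the process of transforming a function into a function with fewer arguments by wrapping it in a closure. (Strictly speaking, fixing some arguments ahead of time is partial application, and currying proper turns a multi-argument function into a chain of single-argument functions; the two terms are often used interchangeably, as they are in this chapter.) Manipulating functions in this way is extremely useful, as it allows for the creation of new, customized functions without having to write explicit definitions for each one.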

Using partial to Curry Functions

In Clojure, any function can be curried using the partial function. partial takes a function as its first argument and any number of additional arguments. It returns a function that is similar to the provided function, but with fewer arguments; it uses the additional arguments to partial instead.

For example, the multiplication function * normally takes at least two arguments to be useful. But if you need a single-argument version, you can use partial to curry it, combining it with a specific value to create a single-argument function that suits your needs:

user=> (def times-pi (partial * 3.14159))
#'user/times-pi

Now, you can call times-pi with a single argument, which it will multiply by PI:

user=> (times-pi 2)
6.28318

Notice that (times-pi 2) is exactly equivalent to (* 3.14159 2). All you've done is to create a version of * with some of its parameters already defined. You could have done the same thing by manually defining a function:

(defn times-pi
    "Multiplies a number by PI"
    [n]
    (* 3.14159 n))

This is quite cumbersome, however: the entire function definition is basically a wrapper for the multiplication function, supplying a specific value. This is where currying shines: it eliminates the need to explicitly write this type of simple wrapper function. For a single argument, the function returned by partial behaves identically to the manually defined version of times-pi, but by using partial you can leverage the fact that times-pi is defined exclusively in terms of the multiplication function and a particular value. This makes the code much easier to keep track of, and it mirrors the abstract logic of what is happening more accurately.
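
partial is not limited to fixing a single argument. As a brief sketch (the names greet and add-3 are hypothetical), the supplied arguments are always placed first, and whatever is passed later is appended after them:

```clojure
;; Fix one leading argument of str.
(def greet (partial str "Hello, "))
(greet "World") ; => "Hello, World"

;; Fix two leading arguments of +.
(def add-3 (partial + 1 2))
(add-3 10)    ; => 13, equivalent to (+ 1 2 10)
(add-3 10 20) ; => 33, equivalent to (+ 1 2 10 20)
```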

Using comp to Compose Functions

Another powerful tool to use in conjunction with currying is function composition. In one sense, every function is a composition, since all functions must use other functions in their definitions. However, it is also possible to succinctly create new functions by combining existing functions, using the comp function instead of specifying an actual function body.

comp takes any number of parameters, each of which is a function. It returns a function that calls all of its argument functions, from right to left. Starting with the rightmost, it calls the function and passes the result as the argument to the next function, and so on. Therefore, the function returned by comp will have the same arity as the rightmost argument to comp, and all the functions passed to comp except for the rightmost must accept a single argument. The final return value is the return value of the leftmost function.

To see this in action, consider the following example entered at the REPL:

user=> (def my-fn (comp - *))
#'user/my-fn

This defines my-fn as a function which takes any number of arguments, multiplies them together, negates the product, and returns the result. Try it out using the following code:

user=> (my-fn 5 3)
-15

As expected, the result is −(5 * 3), or −15. First, the rightmost argument function is called on the parameters. In this case, it is multiplication, which returns 15. Fifteen is passed to the negation function, giving −15. Since this is the leftmost argument function, this is the return value of the composed function as a whole. You can use comp, in this case, because the logic of my-fn can be expressed solely in terms of the multiplication and negation functions. Of course, it is possible to write my-fn out longhand:

(defn my-fn
    "Returns −(x * y)"
    [x y]
    (- (* x y)))

However, since it does nothing but compose the multiplication and negation functions anyway, it is much simpler as well as more expressive to use comp.

Because the functions passed to comp are required to accept a single argument, they are particularly good candidates for currying with partial. Say, for example, that you need a function similar to the one defined above, but that carries out an additional step: multiplying the final product by ten. In conventional mathematical notation, you want to write a function that calculates 10 * −(x * y).

Normally, this could not be expressed using comp alone: each argument to comp (excepting the rightmost) is called with a single argument, and multiplication needs a second operand to do anything useful. But by passing the result of partial as one of the arguments to comp, you can get around this restriction:

user=> (def my-fn (comp (partial * 10) - *))
#'user/my-fn
user=> (my-fn 5 3)
-150

It works as expected. First, 5 and 3 are multiplied. That result, 15, is passed to the negation function. That result, −15, is passed to the function created by partial, which multiplies it by 10 and returns the final value as the result: −150.

This example should demonstrate how it is possible to use function composition and currying to create arbitrarily complex functions, as long as they are definable in terms of existing functions. Using currying and composition will make the intent of your code clear and keep things very succinct. Often, complex multiline function definitions can be replaced with a single line of composed or curried functions.
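
As one more sketch of a composed, curried pipeline, the following function adds its arguments, doubles the sum, and formats the result as a string. The name describe-double-sum is hypothetical; only comp, partial, +, *, and str are used.

```clojure
;; Read right to left: add the arguments, double the sum, then format it.
;; describe-double-sum is an illustrative name for this sketch.
(def describe-double-sum
  (comp (partial str "Result: ") (partial * 2) +))

(describe-double-sum 1 2 3) ; => "Result: 12"

;; The longhand equivalent, for comparison.
(defn describe-double-sum-longhand
  [& nums]
  (str "Result: " (* 2 (apply + nums))))
```

Only the rightmost function, +, receives multiple arguments; each of the others receives the single value produced by the function to its right.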

Putting It All Together

This chapter has covered the most basic elements of a Clojure program: functions, recursion, and conditional logic. To use Clojure effectively, it is very important to be completely comfortable with these constructs.

However, unlike most other languages, Clojure doesn't stop with these basic control structures. They are intended to be built upon as well as used directly. It is certainly possible to write a program of any size or complexity using just basic structures. Conditionals, loops, and function calls go a long way, and, indeed, they are the only tools available in some languages. But this can be seen as growing a program "horizontally"—piling on more and more conditions, more functions, more complex looping, or recursion. The cost of modifying or extending the program is linear; small changes or additions take a little bit of work, and big changes or additions require lots of work.

Clojure encourages you to program "vertically" by building up your own control structures on top of the provided primitives, rather than using them directly. First-class functions and closures are extremely powerful ways to do this. By recognizing patterns particular to your program or problem domain, it is possible to build your own controls that are far more powerful than the primitive structures could ever be. Your program can be expanded and modified with sub-linear effort—making small changes is still easy, but making larger changes can be easy too, since the language itself is now customized to the problem domain.

For example, it is entirely possible to do processing on a collection by recursing through it manually. But this is such a common task that Clojure provides a powerful suite of higher-order collection-processing functions: map, reduce, filter, etc. These are all discussed in Chapter 5 and allow operations on collections to be expressed, often in a single line, without coding entirely new recursive functions for each occasion. The same principle applies to any problem domain. Clojure includes functions for collections, since they are used in almost every program, but you can take the same approach with problems and structures specific to your own domain. Don't just build out functionality, but use higher-order functions (and later on, macros) to build up the tools that will help deal with that type of problem.
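
To make the contrast concrete, here is a minimal sketch that computes the sum of squares of a collection twice: once by manual recursion with loop and recur, and once with map and reduce. Both function names are hypothetical, and map and reduce are used ahead of their full treatment in Chapter 5.

```clojure
(defn sum-squares-manual
  "Sums the squares of the numbers in coll by recursing manually."
  [coll]
  (loop [remaining coll
         total 0]
    (if (empty? remaining)
      total
      (recur (rest remaining)
             (+ total (* (first remaining) (first remaining)))))))

(defn sum-squares
  "The same computation expressed with higher-order functions."
  [coll]
  (reduce + (map (fn [x] (* x x)) coll)))

(sum-squares-manual [1 2 3 4]) ; => 30
(sum-squares [1 2 3 4])        ; => 30
```

The second version states what is computed (square each item, then add them up) and leaves the traversal to map and reduce.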

By the time any Clojure program reaches a certain level of complexity, if it's well designed, you should find that it looks very much like a highly customized domain-specific language (DSL). This is no extra work; it comes naturally, and will actually make the program much smaller and more lightweight than using the primitive structures repeatedly. loop, recur, and cond are useful, but they should be the building blocks, not the substance of a program. Once a project is underway, it can be very surprising how little they are needed.
