Chapter 14. Performance

In principle, Clojure can be just as fast as Java: both are compiled to Java bytecode instructions, which are executed by a Java Virtual Machine. Clojure's design is careful to avoid features—such as continuations or a Common Lisp-like condition system—that would severely compromise performance on the JVM. But Clojure is still a young language, and has not had the benefit of hundreds of thousands of programmer-hours spent optimizing the compiler. As a result, Clojure code will generally run slower than equivalent Java code. However, with some minor adjustments, Clojure performance can usually be brought near Java performance. Don't forget that Java is always available as a fallback for performance-critical sections of code.

Profiling on the JVM

The number one rule when evaluating performance of any programming language or algorithm is: test! Do not assume that one technique will necessarily be faster because it appears to have fewer steps or use fewer variables. This is especially true on modern JVMs such as Hotspot, which constantly measure code performance and dynamically recompile critical sections while your application is running.

So-called microbenchmarks that measure a single operation in isolation are meaningless in this environment. Also meaningless are benchmarks where the start-up time of the JVM dominates the measurement (this is a frequent error in comparisons between Java and C++). Modern JVMs are typically optimized for throughput, maximizing the total number of operations that can be performed over a long period of time.

General Tips for Java Performance

Java Virtual Machines have a number of options that affect performance. First, for JVMs that distinguish between "client" and "server" modes, the "server" mode will always offer better overall performance (at the expense of longer start-up time).

Second, the size of the Java heap space and the choice of garbage collection strategy impact performance. This is especially true for Clojure, which because of its use of immutable data, tends to use more heap space and put more stress on the garbage collector than Java.

There are many more tuning parameters in modern JVMs that can affect performance. Make sure you are familiar with the "knobs" offered by your VM and experiment to see how they affect your particular application.

Simple Profiling with Time

Clojure has a very simple profiling tool built-in, the time macro. time takes a single expression, evaluates it, and prints how long it took in milliseconds:

user=> (time (reduce + (range 100)))
"Elapsed time: 1.005 msecs"
4950

As previously noted, such microbenchmarks are all but meaningless in the context of the JVM. A slightly better measurement can be obtained by repeating the same calculation thousands of times in a tight loop:

user=> (time (dotimes [i 100000]
               (reduce + (range 100))))
"Elapsed time: 252.594 msecs"
nil

However, this still does not present the whole picture, as the JVM might reoptimize the calculation between executions of the loop. A more accurate result can be obtained by repeating the loop several times:

user=> (dotimes [j 5]
         (time (dotimes [i 100000]
                 (reduce + (range 100)))))
"Elapsed time: 355.759 msecs"
"Elapsed time: 239.404 msecs"
"Elapsed time: 217.362 msecs"
"Elapsed time: 221.168 msecs"
"Elapsed time: 217.753 msecs"

As you can see, in this example, the time bounces around for a couple of iterations before converging around 220 milliseconds. This pattern is typical of the JVM.

However, even with this information, you cannot predict exactly how the calculation (reduce + (range 100)) will perform in the context of a large application. Only further testing will tell.

Also, be aware of the impact of lazy sequences. If the expression you are testing uses lazy sequences (for example, using map), the time macro may only report the time to initialize the sequence. To measure the time to realize the entire sequence, you must use doall, which can be difficult to do in a complex data structure and is probably not representative of how the structure will actually be used.

Using Java Profiling Tools

Since Clojure compiles to Java bytecode, Java profiling tools will work on Clojure, but a discussion of such tools is outside the scope of this book.

The best rule-of-thumb is this: write your code in the simplest, most direct way possible, then test to see if it meets your performance expectations. If it does not, use profiling tools to identify the critical sections that matter most to performance, and tweak or rewrite those sections until they meet your performance goals. The following pages describe some techniques for optimizing critical sections of Clojure code.

Memoization

One simple technique for speeding up large, complex functions is memoization, which is a form of caching. Each time it is called, a memoized function will store its return value in a table, along with the input arguments. If that function is called again with the same arguments, it can return the value stored in the table without repeating the calculation.

Clojure has built-in support for memoization with the memoize function, which takes a function as its argument and returns a memoized version of that function.

(defn really-slow-function [x y z] ...)
(def faster-function (memoize really-slow-function))

Memoization is a classic example of trading increased memory usage for faster execution time. If a function takes longer to calculate its result than a hash table lookup, and it will be called frequently with the same inputs, it is a good candidate for memoization. Only pure functions—that is, functions that always return the same output for a particular input—can be memoized.

Reflection and Type Hints

As you know, Java is a statically typed language: it knows the types of all objects at compile time. Clojure is dynamically typed, meaning the types of some objects may not be known until runtime.

To implement dynamically-typed function calls on top of statically-typed Java, Clojure uses a Java feature called reflection. Reflection allows code to inspect Java classes at runtime and call methods by name. However, reflective method calls are much slower than compiled method calls.

Clojure allows you to add type hints to symbols and expressions to help the compiler avoid reflective method calls. Type hints are indicated through read-time metadata (see Chapter 8) using the :tag keyword. A type-hinted symbol would be written as #^{:tag hint} symbol, this is usually abbreviated as #^hint symbol. The type hint is a Java class name. (The class name may also be a string, which is only rarely needed to handle obscure Java interoperability problems.)

To find out whether a method call is reflective or not, set the compiler flag *warn-on-reflection* to true. After that, evaluating any code that contains a reflective call will cause the Clojure compiler to print a warning message.

user=> (set! *warn-on-reflection* true)
true
user=> (defn nth-char [s n]
         (.charAt s n))
Reflection warning - call to charAt can't be resolved.

The warning can usually be eliminated by adding a type hint to the symbol on which you are calling the method. This works for both function parameters and locals bound with let.

;; No reflection warnings:
user=> (defn nth-char [#^String s n]
         (.charAt s n))
#'user/nth-char
user=> (defn nth-char [s n]
         (let [#^String st s]
(.charAt st n)))
#'user/nth-char

In the case of Java methods overloaded on different argument types, further type hints may be needed. For example, the String.replace method accepts either char or CharSequence arguments. You have to type hint all three arguments to avoid reflection.

user=> (defn str-replace [#^String s a b]
         (.replace s a b))
Reflection warning - call to replace can't be resolved.
#'user/str-replace
user=> (defn str-replace [#^String s
                          #^CharSequence a
                          #^CharSequence b]
         (.replace s a b))
#'user/str-replace

Note that type hints are not type coercions, they cannot convert one type into another. Calling a type-hinted method with the wrong types will result in a runtime error:

user=> (str-replace "Hello" H J)
java.lang.ClassCastException: java.lang.Character cannot
be cast to java.lang.CharSequence

Also, note that incorrect type hints will cause a reflection warning:

user=> (defn str-replace [#^String s #^Integer a #^Integer b]
         (.replace s a b))
Reflection warning - call to replace can't be resolved.
#'user/str-replace

You can type-hint the return value of a function by adding a type tag to its Var when it is defined. This works for any Var, such as those used as global values.

user=> (defn greeting [] "Hello, World!")  ;; no type hint
#'user/greeting
user=> (.length (greeting))
Reflection warning - reference to field length can't be resolved.
13
user=> (defn #^String greeting [] "Hello, World!")
#'user/greeting
user=> (.length (greeting))  ;; no reflection warning
13
user=> (defn greeting {:tag String} [] "Hello, World!") ;; same as above

In rare cases, type hinting symbols will not be sufficient to avoid reflection. In that case, you can type-hint an entire expression:

user=> (.length (identity "Hello, World!"))
Reflection warning - reference to field length can't be resolved.
13
user=> (.length #^String (identity "Hello, World!"))
13

The Clojure compiler is pretty clever about tracking the types of objects. For example, the return types of Java methods are always known and never need to be hinted. Given just a few hints, the compiler can usually infer most of the other type information it needs. In general, you should write your code first without any type hints, then set *warn-on-reflection* and add them only where necessary for performance.

Working with Primitives

Java's type system is not 100% object-oriented; it supports the primitive types boolean, char, byte, short, int, long, float, and double. These primitive types do not fit into the standard Java class hierarchy. When used with methods that expect an Object, primitives must be boxed in the classes Boolean, Character, Byte, Short, Integer, Long, Float, and Double. Starting with Java 1.5, the Java compiler automatically boxes and unboxes primitives as needed.

In Clojure, everything is an Object, so numbers are always boxed. This can be seen by inspecting the results of simple arithmetic:

user=> (class (+ 1 1))
java.lang.Integer

However, in the JVM, operations on boxed numbers are slower than operations on unboxed primitives. So for math-intensive applications, Clojure code with boxed numbers will be slower than Java code that works directly with primitives.

Loop Primitives

Clojure supports primitive types where it matters most: in the body of a loop. In the bindings vector of loop (or let) you can coerce values to primitive types with the functions boolean, char, byte, short, int, float, and double. Here is an example of Euclid's algorithm for computing the greatest common denominator of two integers:

(defn gcd [a b]
  (loop [a (int a), b (int b)]
    (cond (zero? a) b
          (zero? b) a
          (> a b) (recur (- a b) b)
          :else (recur a (- b a)))))

The primitive coercions to int happen in the initialization vector of the loop. This version is about twelve times faster than the non-primitive version. But, be careful! Suppose you had chosen to implement the same algorithm using the mod (modulo) function. That code would be slower using primitives, because arguments to Clojure functions (except arithmetic) are always boxed. Therefore, when using loop primitives, you should only call the following primitive-aware functions:

  • Arithmetic functions +, -, *, and /

  • Comparison functions ==, <, >, <=, and >=

  • Predicate functions pos?, neg?, and zero?

  • Java methods with primitive argument and return types

  • Unchecked arithmetic functions (described in the following section)

Notice that the general-purpose = function is not on this list. Instead, use the == function, which only works on numbers. Full primitive support for all Clojure functions, including user-defined functions, is planned for a future release.

Numeric literals must also be coerced to primitive types to use primitive operations. For example, the following code computes the sum of the integers from 1 to 100:

(let [max (int 100)]
  (loop [sum (int 0)
         i (int 1)]
    (if (> i max)
      sum
      (recur (+ sum i) (inc i)))))

The initial values of the loop variables sum and i must be coerced into primitives with int. The primitive coercion of max is outside the loop because it only needs to be done once. If you used the literal number 100 instead of the local variable max, the code would still work, but it would not be quite as fast.

Unchecked Integer Arithmetic

Clojure's primitive arithmetic operations are defined to be safe, meaning they will throw an error if the result of an operation is too big for the result type. For example, this loop, designed to calculate 2^64, throws an exception:

user=> (let [max (int 64)
             two (int 2)]
         (loop [total (int 1), n (int 0)]
           (if (== n max)
               total
             (recur (* total two) (inc n)))))
java.lang.ArithmeticException: integer overflow

However, there are certain algorithms (such as hashing) where the silent overflow behavior of integer arithmetic is desirable. For these cases, Clojure provides a set of functions that perform integer arithmetic exactly like Java's arithmetic operators. They all accept Integer or Long arguments, and are primitive-aware.

The following functions are subject to integer overflow: unchecked-add, unchecked-subtract, unchecked-multiply, unchecked-negate, unchecked-inc, and unchecked-dec.

user=> (unchecked-inc Integer/MAX_VALUE)
-2147483648
user=> (unchecked-negate Integer/MIN_VALUE)
-2147483648

The unchecked-divide and unchecked-remainder functions are subject to lossy truncation.

user=> (unchecked-divide 403 100)
4

The unchecked operations will be slightly faster than the standard arithmetic functions when used with loop primitives. However, make certain you can accept the loss of safety before switching to unchecked arithmetic.

Primitive Arrays

Starting with Clojure 1.1, you can type-hint arrays of primitives: Java boolean[], char[], byte[], short[], int[], long[], float[], and double[] can be hinted as #^booleans, #^chars, #^bytes, #^shorts, #^ints, #^longs, #^floats, and #^doubles, respectively. (There is also #^objects for Object[].)

Type-hinted arrays support primitive operations using aget and aset. There is no need to use the type-specific setter functions such as aset-int and aset-double; in fact, those functions will be slower than aset for type-hinted primitive arrays. For aset, the new value must also be the correct primitive type. For both aget and aset, the array index must be a primitive int. The amap and areduce macros (described in Chapter 10) are an excellent way to perform fast operations on primitive arrays while retaining a functional style.

Transients

As you know by now, all of Clojure's built-in data structures are immutable and persistent to ensure safe concurrent access from multiple threads. But what if you have a data structure that you know will only be used by a single thread? Should you still have to pay the immutable/persistent performance penalty? Clojure's answer is: No!

Clojure 1.1 introduced transients, temporary mutable data structures, as a performance optimization. They are useful when you are building up a large data structure through a series of steps.

The key feature of transients is that they do not change the functional style of your code. Importantly, they do not give you a truly mutable data structure (like Java's collection classes) that you can bash at with imperative code. The mutable nature of transients is largely an implementation detail.

Transients are best explained by an example. The following code creates a map from ASCII characters to their decimal values:

(loop [m {}, i 0]
  (if (> i 127)
    m
    (recur (assoc m (char i) i) (inc i))))

Here is the same loop using transients:

(loop [m (transient {}), i 0]
  (if (> i 127)
    (persistent! m)
    (recur (assoc! m (char i) i) (inc i))))

Notice that very little changes. This example shows the three code modifications required to use transients:

  1. Initialize a transient version of the data structure with the transient function. Vectors, hash maps, and hash sets are supported.

  2. Replace all uses of conj, assoc, dissoc, disj, and pop with their transient versions: conj!, assoc!, dissoc!, disj!, and pop!.

  3. Call persistent! on the result to return a normal persistent data structure.

One important feature of transients is that the transient and persistent! functions run in constant time, regardless of the size of the input. Therefore, it is very efficient to call transient on a large data structure, manipulate it using transient-specific functions, and then call persistent! before returning the structure.

Remember, transients are not mutable data structures like Java collections. Just like persistent data structures, you must use the return value of any function that modifies the structure. The following imperative-style code will not work:

;; bad code!
(let [m (transient {})]
  (dotimes [i 127]
    (assoc! m (char i) i))
  (persistent! m))

The dotimes macro creates an imperative loop; on each iteration, the return value of assoc! is discarded. The exact results of this code are unpredictable, but always wrong.

Generally, transients are used within a single function or loop/recur block. They can be passed around to other functions, but they impose several restrictions:

  • Thread isolation is enforced. Accessing the transient structure from another thread will throw an exception.

  • After calling persistent!, the transient version of the structure is gone. Attempts to access it will throw an exception.

  • Intermediate versions of the transient structure cannot be stored or used; only the latest version is available (unlike persistent data structures).

The advantage to transients is that their modifying operations are much faster than those of persistent data structures. In general, anywhere you are building up a large data structure recursively, transients will offer a performance boost. But use of transients should almost always be limited to the body of a single function, not spread across different sections of code.

Var Lookups

Every time you use a Var, Clojure has to look up the Var's value. If you are repeatedly using the same Var in a loop, those lookups can slow down the code. To avoid this performance penalty, use let to bind the Var's value to a local for the duration of the loop, as in this example:

(def *var* 100)
(let [value *var*]
  ... loop using value instead of *var* ...)

Use this technique with caution: it does not always yield a performance improvement and may become unnecessary in a future Clojure release.

Inlining

Inlining—replacing a function call with the compiled body of the function—is a common optimization technique in compilers. The Hotspot JVM does extensive inlining of performance-critical code.

Clojure automatically inlines operations on primitives. However, arithmetic functions that take a variable number of arguments, such as +, will only be inlined in the two-argument case. That means this code:

(+ a b c d)

will be faster when written as:

(+ a (+ b (+ c d)))

especially when the values involved are primitives. This may also become unnecessary in a future release.

Macros and definline

In a sense, macros are a kind of inlining, because they are expanded at compile time. However, macros cannot be used as arguments to higher-order functions like map and reduce.

As an alternative, Clojure offers the definline form. definline looks, and works, like defmacro: its body should return a data structure representing the code to be compiled. Unlike defmacro, it creates a real function that can be used anywhere a normal function can. Here's an example.

;; a normal function:
(defn square [x] (* x x))
;; an inlined function:
(definline square2 [x] `(* ~x ~x))

definline is labeled "experimental" in Clojure. It exists primarily to work around the problem that functions cannot receive primitive arguments or return primitive values. When that feature is added, the JVM will do the inlining for you and definline will be unnecessary.

Summary

As Donald Knuth famously said, "Premature optimization is the root of all evil." The key word is premature. Trying to optimize a function before you have tested it is pointless. Trying to optimize an application before you have identified the performance-critical sections is worse than useless. In a just-in-time compiled, self-optimizing runtime such as the JVM, the situation is even more precarious, because it is difficult to look at a piece of code and predict how fast it will run.

The best approach is to step back and consider performance from two different angles. First, high-level performance considerations, such as avoiding bottlenecks or unnecessary I/O, should be considered during the design phase of an application. Low-level performance optimizations should be postponed until after the first draft of the code, when performance-critical sections can be identified through profiling.

Clojure provides many tools to optimize code without sacrificing its functional style. When those tools are insufficient, you can always fall back on imperative programming techniques, either in Clojure code (using arrays) or by writing Java code and calling it from Clojure.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.137.178.9