Optimizing for Performance

In Clojure, it’s idiomatic to call Java using the techniques described in Calling Java. The resulting code will be fast enough for 90 percent of scenarios. When you need to, though, you can make localized changes to boost performance. These changes will not change how outside callers invoke your code, so you’re free to make your code work and then make it fast.

One of the most common ways to make Java interop faster is by adding type hints to remove reflective calls. We’ll look at that first, and then we’ll consider how to use Java primitives and arrays as a way to optimize memory use and numeric calculation performance.

Adding Type Hints

Clojure supports adding type hints to function parameters, let bindings, variable names, and expressions. These type hints serve two purposes:

  • Optimizing critical performance paths
  • Documenting the required type

For example, consider the following function, which returns information about a Java class:

 (​defn​ describe-class [c]
  {:name (.getName c)
  :final (java.lang.reflect.Modifier/isFinal (.getModifiers c))})

You can ask Clojure how much type information it can infer, by setting the special variable *warn-on-reflection* to true:

 (set! *warn-on-reflection* true)
 -> true

The exclamation point on the end of set! is an idiomatic indication that set! changes mutable state. set! is described in detail in Working with Java Callback APIs. With *warn-on-reflection* set to true, compiling describe-class will produce the following warnings:

 Reflection warning, line​:​ 87
 - reference to field getName can​'t​ be resolved.
 
 Reflection warning, line​:​ 88
 - reference to field getModifiers can​'t​ be resolved.

These warnings indicate that Clojure has no way of knowing the type of c. You can provide a type hint to fix this, using the metadata syntax ^Class:

 (​defn​ describe-class [^Class c]
  {:name (.getName c)
  :final (java.lang.reflect.Modifier/isFinal (.getModifiers c))})

With the type hint in place, the reflection warnings will disappear. The compiled Clojure code will be exactly the same as compiled Java code. Further, attempts to call describe-class with something other than a Class will fail with a ClassCastException:

 (describe-class StringBuffer)
 -> {:name ​"java.lang.StringBuffer"​, :final true}
 
 (describe-class ​"foo"​)
 | IllegalArgumentException No matching field found​:​ getName
 | ​for​ class java.lang.String

When you provide a type hint, Clojure will insert an appropriate class cast to avoid making slow, reflective calls to Java methods. But if your function doesn’t actually call any Java methods on a hinted object, then Clojure will not insert a cast. Consider this wants-a-string function:

 (​defn​ wants-a-string [^String s] (println s))
 -> #​'user/wants-a-string

You might expect that wants-a-string would complain about nonstring arguments. In fact, it’ll be perfectly happy:

 (wants-a-string ​"foo"​)
 | foo
 
 (wants-a-string 0)
 | 0

Clojure can tell that wants-a-string never actually uses its argument as a string (println will happily try to print any kind of argument). Since no string methods need to be called, Clojure doesn’t attempt to cast s to a string.

When you need speed, type hints will let Clojure code compile down to the same code Java will produce. But you won’t need type hints that often. Make your code work first, and then worry about making it fast.

Integer Math

Clojure provides three different sets of operations for integer types:

  • The default operators
  • The promoting operators
  • The unchecked operators

The following table gives a sampling of these operator types.

Default

Promoting

Unchecked

+

+’

unchecked-add

-

-’

unchecked-subtract

*

*’

unchecked-multiply

inc

inc’

unchecked-inc

dec

dec’

unchecked-dec

The unchecked operators correspond exactly with primitive math in Java. They are fast but dangerous, in that they can overflow silently and give incorrect answers. In Clojure, the unchecked operators should be used only in the rare situation that overflow is the desired behavior (like hashing) or when performance is paramount, and you’re certain overflow is impossible or irrelevant.

 (unchecked-add 9223372036854775807 1)
 -> -9223372036854775808

The default operators use Java primitives where possible for performance but always make overflow checks and throw an exception.

 (+ 9223372036854775807 1)
 -> ArithmeticException integer overflow

The promoting operators automatically promote from primitives to big numbers on overflow. This makes it possible to handle an arbitrary range but at significant performance cost. Because primitives and big numbers share no common base type, math with the promoting operators precludes the use of primitives as return types.

 (+' 9223372036854775807 1)
 -> 9223372036854775808N

Clojure relies on Java’s BigDecimal class for arbitrary-precision decimal numbers. See the online documentation[43] for details. BigDecimals provide arbitrary precision but at a price: BigDecimal math is significantly slower than Java’s floating-point primitives.

Clojure has its own BigInt class to handle BigInteger conversions. Clojure’s BigInt has some performance improvements over using Java’s BigInteger directly. It also wraps some of the rough edges of BigInteger. In particular, it properly implements hashCode. This makes equality take precedence over representation, which you’ll see in almost every abstraction in the language.

Under the hood, Clojure uses Java’s BigInteger. The performance difference comes in how BigInt treats its values. A BigInt consists of a Long part and a BigInteger part. When the value passed into a BigInt is small enough to be treated as a Long, it is. When numerical operations are performed on BigInts, if their result is small enough to be treated as a Long, it is. This gives the user the ability to add the overflow hint (N) without paying the BigInteger cost until it’s absolutely necessary.

Using Primitives for Performance

In the previous sections, function parameters carry no type information. Clojure simply does the right thing. Depending on your perspective, this is either a strength or a weakness. It’s a strength, because your code is clean and simple. But it’s also a weakness, because a reader of the code can’t be certain of datatypes, and because doing the right thing carries some performance overhead. Consider a function that calculates the sum of the numbers from 1 to n:

 ; performance demo only, don​'t​ write code like this
 (​defn​ sum-to [n]
  (loop [i 1 sum 0]
  (​if​ (<= i n)
  (recur (inc i) (+ i sum))
  sum)))

You can verify that this function works with a small input value:

 (sum-to 10)
 => 55

Let’s see how sum-to performs. To time an operation, you can use the time function. When benchmarking, you’ll tend to want to take several measurements so that you can eliminate startup overhead plus any outliers; therefore, you can call time from inside a dotimes macro:

 (dotimes bindings & body)

dotimes will execute its body repeatedly, with the name bound to integers from zero to n-1. Using dotimes, you can collect five timings of sum-to as follows:

 (dotimes [_ 5] (time (sum-to 100000)))
 | ​"Elapsed time: 0.397831 msecs"
 | ​"Elapsed time: 0.420645 msecs"
 | ​"Elapsed time: 0.363732 msecs"
 | ​"Elapsed time: 0.365856 msecs"
 -> ​"Elapsed time: 0.368997 msecs"

To speed things up, you can hint the argument and return type as long. Clojure’s type inference will flow this hint to all the internal operations and function calls inside the function.

 (​defn​ integer-sum-to ^long [^long n]
  (loop [i 1 sum 0]
  (​if​ (<= i n)
  (recur (inc i) (+ i sum))
  sum)))

The integer-sum-to is indeed faster:

 (dotimes [_ 5] (time (integer-sum-to 100000)))
 | ​"Elapsed time: 0.152525 msecs"
 | ​"Elapsed time: 0.112546 msecs"
 | ​"Elapsed time: 0.112313 msecs"
 | ​"Elapsed time: 0.112196 msecs"
 -> ​"Elapsed time: 0.112155 msecs"

Clojure’s primitive math is still correct, in that it will check for overflow and throw an exception. Is that as fast as things can get? Java programmers have access to super-fast busted math: arithmetic operations that have the maximum possible performance but can silently overflow and corrupt data.

Clojure provides access to Java’s arithmetic semantics through the unchecked family of functions. Maybe you can get an even faster function by using the unchecked version of +, unchecked-add:

 (​defn​ unchecked-sum-to ^long [^long n]
  (loop [i 1 sum 0]
  (​if​ (<= i n)
  (recur (inc i) (unchecked-add i sum))
  sum)))

The unchecked-sum-to is not significantly faster:

 (dotimes [_ 5] (time (unchecked-sum-to 100000)))
 | ​"Elapsed time: 0.112321 msecs"
 | ​"Elapsed time: 0.075186 msecs"
 | ​"Elapsed time: 0.075046 msecs"
 | ​"Elapsed time: 0.075116 msecs"
 -> ​"Elapsed time: 0.093338 msecs"

Orders of magnitude are important! Primitive hinting can make certain operations significantly faster. However, switching to Java’s unchecked semantics is generally a losing proposition. You get a trivial performance gain on average, with the possibility of data corruption tomorrow.

Clojure provides these operations for the relatively rare cases where they’re needed (like hash computations) and for cases where you need to interoperate with other libraries that expect this behavior. Additionally, there are some performance-sensitive cases where you’re willing to give up safety for maximum performance.

Prefer accuracy first and then optimize for speed only where necessary. integer-sum-to will throw an exception on overflow. This is bad, but the problem is easily detected:

 (integer-sum-to 10000000000)
 -> java.lang.ArithmeticException​:​ integer overflow

unchecked-sum-to will fail silently on overflow. In a program setting, it can quietly but catastrophically corrupt data:

 (unchecked-sum-to 10000000000)
 -> -5340232216128654848 ​; WRONG!!

Given the competing concerns of correctness and performance, you should normally prefer simple, undecorated code such as the original sum-to. If profiling identifies a bottleneck, you can force Clojure to use a primitive type in just the places that need it.

The sum-to example is deliberately simple to demonstrate the various options for integer math in Clojure. In a real Clojure program, it would be more expressive to implement sum-to using reduce. Summing a sequence is the same as summing the first two items, adding that result to the next item, and so on. That is exactly the loop that (reduce + ...) provides. With reduce, you can rewrite sum-to as a one-liner:

 (​defn​ better-sum-to [n]
  (reduce + (range 1 (inc n))))

The example also demonstrates an even more general point: pick the right algorithm to begin with. The sum of numbers from 1 to n can be calculated directly as follows:

 (​defn​ best-sum-to [n]
  (/ (* n (inc n)) 2))

Even without performance hints, this is faster than implementations based on repeated addition:

 (dotimes [_ 5] (time (best-sum-to 100000)))
 | ​"Elapsed time: 0.043821 msecs"
 | ​"Elapsed time: 0.004646 msecs"
 | ​"Elapsed time: 0.003991 msecs"
 | ​"Elapsed time: 0.004111 msecs"
 -> ​"Elapsed time: 0.003898 msecs"

Performance is a tricky subject. Don’t write ugly code in search of speed. Start by choosing appropriate algorithms and getting your code to work correctly. If you have performance issues, profile to identify the problems. Then, introduce only as much complexity as you need to solve those problems.

Using Java Arrays

Clojure’s collections supplant the Java collections for most purposes. Clojure’s collections are concurrency safe, have good performance characteristics, and implement the appropriate Java collection interfaces. So you should generally prefer Clojure’s own collections when you’re working in Clojure and even pass them back into Java when convenient.

If you do choose to use the Java collections, nothing in Clojure will stop you. From Clojure’s perspective, the Java collections are classes like any other, and all the various Java interop forms will work. But the Java collections are designed for lock-based concurrency. They will not provide the concurrency guarantees that Clojure collections do and won’t work well with Clojure’s software transactional memory.

One place where you’ll need to deal with Java collections is the special case of Java arrays. In Java, arrays have their own syntax and their own bytecode instructions. Java arrays don’t implement any Java interface. Clojure collections cannot masquerade as arrays. (Java collections can’t either!) The Java platform makes arrays a special case in every way, so Clojure does, too.

Clojure provides make-array to create Java arrays:

 (make-array class length)
 (make-array class dim & more-dims)

make-array takes a class and a variable number of array dimensions. For a one-dimensional array of strings, you might say this:

 (make-array String 5)
 -> #object[​"[Ljava.lang.String;"​ 0x6a129a7d ​"[Ljava.lang.String;@6a129a7d"​]

The odd output is courtesy of Java’s implementation of toString() for arrays: [Ljava.lang.String; is the JVM specification’s encoding for “one-dimensional array of strings.” That’s not very useful at the REPL, so you can use Clojure’s seq to wrap any Java array as a Clojure sequence so that the REPL can print the individual array entries:

 (seq (make-array String 5))
 -> (nil nil nil nil nil)

Clojure also includes a family of functions with names such as int-array for creating arrays of Java primitives. You can issue the following command at the REPL to review the documentation for these and other array functions:

 (find-doc ​"-array"​)

Clojure provides a set of low-level operations on Java arrays, including aset, aget, and alength:

 (aset java-array index value)
 (aset java-array index-dim1 index-dim2 ... value)
 (aget java-array index)
 (aget java-array index-dim1 index-dim2 ...)
 (alength java-array)

Use make-array to create an array and then experiment with using aset, aget, and alength to work with the array:

 (​defn​ painstakingly-create-array []
  (​let​ [arr (make-array String 5)]
  (aset arr 0 ​"Painstaking"​)
  (aset arr 1 ​"to"​)
  (aset arr2 ​"fill"​)
  (aset arr 3​" in"​)
  (aset arr 4 ​"arrays"​)
  arr))
 
 (aget (paintakingly-create-array) 0)
 -> ​"Painstaking"
 
 (alength (painstakingly-create-array))
 -> 5

Most of the time, you’ll find it simpler to use higher-level functions such as to-array, which creates an array directly from any collection:

 (to-array sequence)

to-array always creates an Object array:

 (to-array [​"Easier"​ ​"array"​ ​"creation"​])
 -> (to-array [​"Easier"​ ​"array"​ ​"creation"​])

to-array is also useful for calling Java methods that take a variable argument list, such as String/format:

 ; example. prefer clojure.core/format (String/format ​"Training Week: %s Mileage: %d"
 (String/format ​"Training Week: %s Mileage: %d"
  (to-array [2 26]))
 -> ​"Training Week: 2 Mileage: 26"

to-array’s cousin into-array can create an array with a more specific type than Object.

 (into-array type? seq)

You can pass an explicit type as an optional first argument to into-array:

 (into-array String [​"Easier"​, ​"array"​, ​"creation"​])
 -> #object[​"[Ljava.lang.String;"​ 0x21072f13 ​"[Ljava.lang.String;@21072f13"​]

If you omit the type argument, into-array will guess the type based on the first item in the sequence:

 (into-array [​"Easier"​ ​"array"​ ​"creation"​])
 -> #object[​"[Ljava.lang.String;"​ 0x88821c2 ​"[Ljava.lang.String;@88821c2"​]

As you can see, the array contains Strings, not Objects. If you want to transform every element of a Java array without converting to a Clojure sequence, you can use amap:

 (amap a idx ret expr)

amap creates a clone of the array a, binding that clone to the name you specify in ret. It then executes expr once for each element in a, with idx bound to the index of the element. Finally, amap returns the cloned array. You could use amap to uppercase every string in an array of strings:

 (​def​ strings (into-array [​"some"​ ​"strings"​ ​"here"​]))
 -> #​'user/strings
 
 (seq (amap strings idx _ (.toUpperCase (aget strings idx))))
 -> (​"SOME"​ ​"STRINGS"​ ​"HERE"​)

The ret parameter is set to _ to indicate that it’s not needed in the map expression, and the wrapping seq is simply for convenience in printing the result at the REPL. Similar to amap is areduce:

 (areduce a idx ret init expr)

Where amap produces a new array, areduce produces anything you want. The ret is initially set to init and later set to the return value of each subsequent invocation of expr. areduce is normally used to write functions that “tally up” a collection in some way. For example, the following call finds the length of the longest string in the strings array:

 (areduce strings idx ret 0 (max ret (.length (aget strings idx))))
 -> 7

amap and areduce are special-purpose macros for interoperating with Java arrays.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.146.221.149