Using type hints

Most problems that are good targets for parallelization involve doing calculations in tight loops. These places are good for all kinds of performance optimizations, from hoisting conditionals out of them to fine-tuning compiler hints, which we will do here.

Being a dynamic language, Clojure doesn't require type declarations. However, if we know what types we are using, we can get better performance by including type hints in our code. This is helpful for object types, where Clojure can then resolve method calls at compile time, and also for primitive types, where Clojure can generate well-tuned code that doesn't include boxing or wrapping the primitive type in a heavier Java object.

Note

For more information about type hints and working with Java primitives, see the documentation on interacting with Java from Clojure at http://clojure.org/java_interop.

Type hints are expressed as metadata tags for return types and object types. We'll see examples of both in this recipe. For this recipe, we'll revisit the Monte Carlo simulation to estimate the value of pi. We first saw this in the Paritioning Monte Carlo simulations for better pmap performance recipe. This example has enough low-level math which makes it a good candidate for this kind of optimization.

Getting ready

We'll use the Criterium library for benchmarking, so include these in our Leiningen project.clj file:

(defproject parallel-data ""0.1.0""
  :dependencies [[org.clojure/clojure ""1.6.0""]
                 [criterium ""0.4.3""]])

Use it and the java.lang.Math class in our script or REPL:

(use 'criterium.core)
(import [java.lang Math])

From the Partitioning Monte Carlo simulations for better pmap performance recipe, we'll use the rand-point and center-dist functions.

How to do it…

We'll implement the Monte Carlo simulation without type hints, benchmark it, and then reimplement it with type hints to see whether we can get better performance.

  1. The simulation itself places n random points in the unit square and tests how many of them fall within the upper-right quadrant of a circle centered at 0, 0. The ratio of those that fall within the circle, multiplied by 4, should approximate the value of pi:
    (defn mc-pi [n]
      (let [in-circle (->> (repeatedly n rand-point)
                           (map center-dist)
                           (filter #(<= % 1.0))
                           count)]
          (* 4.0 (/ in-circle n))))
  2. If we benchmark the existing implementation with Criterium, it will give us something to shoot for. This shows us that the average runtime is 45.7 milliseconds:
    user=> (bench (mc-pi 100000))
    WARNING: Final GC required 1.476534955514225 % of runtime
    Evaluation count : 1320 in 60 samples of 22 calls.
                 Execution time mean : 45.774244 ms
        Execution time std-deviation : 621.709776 µs
       Execution time lower quantile : 44.878862 ms ( 2.5%)
       Execution time upper quantile : 46.787552 ms (97.5%)
                       Overhead used : 1.780493 ns
    
    Found 3 outliers in 60 samples (5.0000 %)
            low-severe       2 (3.3333 %)
            low-mild         1 (1.6667 %)
     Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
    nil
  3. Now, we'll add casts for the primitive doubles in center-dist and add type hint metadata for the return type too:
    (defn center-dist-hint
      (^double [[x y]]
         (Math/sqrt (+ (Math/pow (double x) (double 2.0))
                       (Math/pow (double y)
                                 (double 2.0))))))
  4. Next, change mc-pi to call center-dist-hint and add a type hint to it:
    (defn mc-pi-hint
      (^double [n]
         (let [in-circle (double
                           (->>
                             (repeatedly n rand-point)
                             (map center-dist-hint)
                             (filter #(<= % (double 1.0)))
                             count))]
           (double
             (* (double 4.0) (/ in-circle (double n)))))))

Finally, let's see if this helps:

user=> (bench (mc-pi-hint 100000))
Evaluation count : 1380 in 60 samples of 23 calls.
             Execution time mean : 46.317966 ms
    Execution time std-deviation : 1.237294 ms
   Execution time lower quantile : 44.981042 ms ( 2.5%)
   Execution time upper quantile : 49.716563 ms (97.5%)
                   Overhead used : 1.780493 ns

Found 5 outliers in 60 samples (8.3333 %)
        low-severe       2 (3.3333 %)
        low-mild         3 (5.0000 %)
 Variance from outliers : 14.1789 % Variance is moderately inflated by outliers
nil

In this case, it resulted in a slight slowdown. However, I have seen simple changes such as this result in good performance improvements for a minimal amount of work.

How it works…

The first type hint we used is the ^double, which indicates the functions' return types and parameters. These allow the Clojure compiler to generate a bytecode that's well-optimized for these return types and for working with the Java primitives.

The other use of type hints allow the Clojure compiler to resolve methods at compile time. The clojure.string namespace has a good example of this:

user=> (require '[clojure.string :as string])
user=> (source string/trim)
(defn ^String trim
  ""Removes whitespace from both ends of string.""
  {:added ""1.2""}
  [^CharSequence s]
(.. s toString trim))
nil

We can see that the parameter to the string/trim function is a CharSequence, and the compiler can use this information to know which toString to dispatch to.

See also

  • For more information on the pros and cons of different methods of benchmarking and what the Criterium library does, see the Benchmarking with Criterium recipe
  • For more information on Monte Carlo methods, see the Partitioning Monte Carlo simulations for Better pmap Performance recipe
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.117.231.15