Pattern 17: Memoization
To cache the results of a pure function call to avoid performing the same computation more than once
Since pure functions always return the same value for given arguments, it’s possible to replace a pure function call with cached results.
We can do this manually by writing a function that keeps track of its previous arguments. When it’s called, it first checks its cache to see if it has already been called with the passed-in arguments. If it has, it returns the cached value. Otherwise, it performs the computation.
Some languages provide first-class support for Memoization using higher-order functions. Clojure, for instance, has a function called memoize that takes a function and returns a new one that will cache results. Scala doesn’t have a built-in memoization function, so we’ll use a simple manual implementation.
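To see what such first-class support looks like, here is a minimal sketch of a generic memoizing higher-order function in Scala, roughly what Clojure's memoize does behind the scenes. The name memoize and the mutable-Map cache are our own choices for illustration; this function is not part of the Scala standard library:

```scala
import scala.collection.mutable

// A minimal sketch of a generic memoizing higher-order function.
// Given any one-argument pure function f, it returns a new function
// that caches each result in a mutable Map keyed by the argument.
def memoize[A, B](f: A => B): A => B = {
  val cache = mutable.Map[A, B]()
  // Compute f(a) only on a cache miss, then remember the result.
  (a: A) => cache.getOrElseUpdate(a, f(a))
}
```

Because the cache lives inside the closure, each call to memoize produces an independently cached function, which is exactly the behavior we'll build by hand below.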
One use for Memoization is as a simple cache for expensive or time-consuming functions, especially when the function is called multiple times with the same argument. In this example, we’ll simulate the time-consuming operation by having it sleep the thread.
Let’s get started with a look at our simulated expensive function call. As an example, we’re using a lookup by ID from some (presumably slow) datastore. To fake it out here, we sleep the thread for a second before returning a value from a static map. We also print the ID we’re looking up to the console to demonstrate when the function is being executed:
ScalaExamples/src/main/scala/com/mblinn/mbfpp/functional/memoization/Examples.scala
| def expensiveLookup(id: Int) = { |
| Thread.sleep(1000) |
| println(s"Doing expensive lookup for $id") |
| Map(42 -> "foo", 12 -> "bar", 1 -> "baz").get(id) |
| } |
Just as we’d expect, the expensive function is executed each time we call it, as we can see from the console output:
| scala> expensiveLookup(42) |
| Doing expensive lookup for 42 |
| res0: Option[String] = Some(foo) |
| |
| scala> expensiveLookup(42) |
| Doing expensive lookup for 42 |
| res1: Option[String] = Some(foo) |
Now let’s take a look at a simple memoized version of expensiveLookup. To create it, we’ll use memoizeExpensiveLookup, which initializes a cache and returns a new function that wraps calls to expensiveLookup.
The new function first checks the cache to see if it holds a result from a previous call with the same argument. If it does, it returns the cached result. Otherwise, it calls the expensive lookup and caches the result before returning it.
Finally, we call memoizeExpensiveLookup and store a reference to the function it returns in a val. The full solution is in the following code:
ScalaExamples/src/main/scala/com/mblinn/mbfpp/functional/memoization/Examples.scala
| def memoizeExpensiveLookup() = { |
| var cache = Map[Int, Option[String]]() |
| (id: Int) => |
| cache.get(id) match { |
| case Some(result) => result |
| case None => { |
| val result = expensiveLookup(id) |
| cache += id -> result |
| result |
| } |
| } |
| } |
| val memoizedExpensiveLookup = memoizeExpensiveLookup |
As we can see from the following REPL output, the expensive function is only called the first time for a given argument. After that, it returns the cached copy:
| scala> memoizedExpensiveLookup(42) |
| Doing expensive lookup for 42 |
| res2: Option[String] = Some(foo) |
| |
| scala> memoizedExpensiveLookup(42) |
| res3: Option[String] = Some(foo) |
One quirk with this example is in the last line:
| val memoizedExpensiveLookup = memoizeExpensiveLookup |
Here, we’re having memoizeExpensiveLookup return a new function, and we’re storing a reference to it. This allows us to wrap the cache up in a closure so that only the function has a reference to it. If we needed another cache, we could create it like so:
| scala> val memoizedExpensiveLookup2 = memoizeExpensiveLookup |
| memoizedExpensiveLookup2: Int => Option[String] = <function1> |
| |
| scala> memoizedExpensiveLookup2(42) |
| Doing expensive lookup for 42 |
| res4: Option[String] = Some(foo) |
Our Scala solution is a bit clumsy since we’ve done it manually for a single, specific case, but it serves as a good model for how memoization works behind the scenes. Let’s take a look at how we can use Clojure’s memoize function to solve the same problem.
In Clojure, we’ll start with a similar simulated expensive function. However, we won’t manually memoize it. Instead, we’ll use Clojure’s memoize function to automatically return a memoized version of the function, as this code shows:
ClojureExamples/src/mbfpp/functional/memoization/examples.clj
| (defn expensive-lookup [id] |
| (Thread/sleep 1000) |
| (println (str "Lookup for " id)) |
| ({42 "foo" 12 "bar" 1 "baz"} id)) |
| |
| (def memoized-expensive-lookup |
| (memoize expensive-lookup)) |
As we can see from the following REPL output, it behaves similarly to the Scala version and only performs the expensive operation once:
| => (memoized-expensive-lookup 42) |
| Lookup for 42 |
| "foo" |
| => (memoized-expensive-lookup 42) |
| "foo" |
Behind the scenes, the memoize function creates a new function that’s much like the manual example we saw in Scala that uses a map as a cache.
One use of Memoization we didn’t cover here is in solving dynamic programming problems, one of its original applications. These are problems that can be broken down recursively into simpler subproblems. A classic, easy-to-understand example is computing a Fibonacci number.
The formula for calculating the nth Fibonacci number adds together the previous two numbers in the sequence. A simple Clojure function to calculate a Fibonacci number using this definition follows:
ClojureExamples/src/mbfpp/functional/memoization/examples.clj
| (def slow-fib |
| (fn [n] |
| (cond |
| (<= n 0) 0 |
| (< n 2) 1 |
| :else (+ (slow-fib (- n 1)) (slow-fib (- n 2)))))) |
The nice thing about this function is that it mirrors the mathematical definition. However, it recomputes the same subproblems over and over, so its running time grows exponentially and becomes terrible for even moderately large inputs. If we memoize the function, as we do in the following code, the subproblems are cached and the function performs reasonably well:
ClojureExamples/src/mbfpp/functional/memoization/examples.clj
| (def mem-fib |
| (memoize |
| (fn [n] |
| (cond |
| (<= n 0) 0 |
| (< n 2) 1 |
| :else (+ (mem-fib (- n 1)) (mem-fib (- n 2))))))) |
Running the two functions shows the drastic difference in performance:
| => (time (slow-fib 40)) |
| "Elapsed time: 6689.204 msecs" |
| 102334155 |
| => (time (mem-fib 40)) |
| "Elapsed time: 0.402 msecs" |
| 102334155 |
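The same trick carries over to the Scala side. Here is a hedged sketch of a memoized Fibonacci in Scala, mirroring mem-fib. The name memFib and the mutable-Map cache are our own choices; the recursive helper consults the cache before recursing, so each subproblem is computed only once:

```scala
import scala.collection.mutable

// A sketch of a memoized Fibonacci in Scala, mirroring mem-fib.
// Each computed value is stored in a mutable Map keyed by n, so
// the recursion touches each subproblem at most once.
val memFib: Int => BigInt = {
  val cache = mutable.Map[Int, BigInt]()
  def fib(n: Int): BigInt = cache.get(n) match {
    case Some(cached) => cached
    case None =>
      val result =
        if (n <= 0) BigInt(0)
        else if (n < 2) BigInt(1)
        else fib(n - 1) + fib(n - 2)
      cache(n) = result
      result
  }
  fib
}
```

Calling memFib(40) returns 102334155 almost instantly, matching the Clojure result above.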
Dynamic programming problems are rich and fascinating; however, they pop up in only a limited number of domains. I’ve generally seen Memoization used as a simple, convenient cache for expensive or long-running operations rather than as a dynamic programming tool.