Pattern 18Lazy Sequence

Intent

To create a sequence whose members are computed only when needed—this allows us to easily stream results from a computation and to work with infinitely long sequences

Overview

We often deal with elements of a sequence one at a time. Since this is so, we generally don’t need to have the entire sequence realized before we start processing it. For instance, we may wish to stream lines of a file off of disk and process them without ever holding the entire file in memory. We could use Pattern 12, Tail Recursion, to seek through the file, but Lazy Sequence provides a much cleaner abstraction for this sort of streaming computation.

Lazy Sequence does so by only creating an element in a sequence when it’s asked for. In the file-reading example, the lines are only read off of disk when asked for, and they can be garbage-collected when we’re done processing them, though we need to take a bit of care to ensure that they are.

When we create an element, we call that realizing the element. Once realized, elements are put into a cache using Pattern 17, Memoization, which means we only need to realize each element in the sequence once. This is demonstrated in Figure 13, Lazy Sequence.

images/LazySeqOne.png

Figure 13. Lazy Sequence. An instance of Lazy Sequence before and after the third element has been realized.

Lazy Sequence also lets us create an extremely useful abstraction: an infinitely long sequence. This may not seem useful at first blush, but since the entire sequence isn’t realized at once, we can work with the beginning of the sequence and defer creation of the rest. This allows us to create, say, an infinitely long string of pseudorandom numbers of which we realize only a portion.

Sample Code: Built-In Lazy Sequences

Let’s start with a couple of simple examples from the built-in library. In the first example, we’ll show how to work with an infinitely long list of integers. In the second, we’ll show how to use Lazy Sequence to generate a series of random test data.

Let’s get started with a dive into the Scala code.

In Scala

Scala’s has built-in support for Lazy Sequence in its Stream library. Perhaps the simplest thing we can do with a lazy sequence is to create an infinite sequence of all integers. Scala’s Stream library has a method that does just that, called from. According to the ScalaDoc, it will “create an infinite stream starting at start and incrementing by step.”

Here, we use from to create a sequence of all integers, starting at 0:

ScalaExamples/src/main/scala/com/mblinn/mbfpp/functional/ls/LazySequence.scala
 
val​ integers = Stream.from(0)

This may seem a strange thing to do, but we can use another method, take, to work with the first few numbers in the sequence. Here we’re using it to take the first five integers from our infinitely long list and then print them:

 
scala>​ val someints = integers take 5
 
someints: scala.collection.immutable.Stream[Int] = Stream(0, ?)
 
 
scala>​ someints foreach println
 
0
 
1
 
2
 
3
 
4

Let’s take a look at a slightly fancier instance of Lazy Sequence that uses another method in Scala’s Sequence library. The continually method creates an infinitely long sequence by repeatedly evaluating the expression passed into here.

Let’s use this to create an infinitely long sequence of pseudorandom numbers. To do so, we create a new random number generator in the val generate, and then we pass generate.nextInt in the continually method, as illustrated in the following code:

ScalaExamples/src/main/scala/com/mblinn/mbfpp/functional/ls/LazySequence.scala
 
val​ generate = ​new​ Random()
 
val​ randoms = Stream.continually(generate.nextInt)

We can now take a few random numbers from our infinite list:

 
scala>​ val aFewRandoms = randoms take 5
 
aFewRandoms: scala.collection.immutable.Stream[Int] = Stream(326862669, ?)
 
 
scala>​ aFewRandoms foreach println
 
326862669
 
-473217479
 
-1619928859
 
785666088
 
1642217833

If we want a few more random numbers, we can use take again with a larger number:

 
scala>​ val aFewMoreRandoms = randoms take 6
 
aFewMoreRandoms: scala.collection.immutable.Stream[Int] = Stream(326862669, ?)
 
 
scala>​ aFewMoreRandoms foreach println
 
326862669
 
-473217479
 
-1619928859
 
785666088
 
1642217833
 
1819425161

Notice how the first five numbers here are repeated. This is because the Stream library relies on Pattern 17, Memoization, to cache copies it’s already seen. The first five values were realized when we originally printed aFewRandoms, the sixth only once we printed aFewMoreRandoms.

In Clojure

Lazy Sequence is built into Clojure as well, but it’s not focused in a single library. Rather, most of Clojure’s core sequence manipulation functions work in a lazy manner. Clojure’s normal range function, for instance, works with Lazy Sequence. The following code generates a list of all the positive integers that fit into an Integer:

ClojureExamples/src/mbfpp/functional/ls/examples.clj
 
(​def​ integers (​range​ Integer/MAX_VALUE))

We can then use the take function to take a few integers from the start of our long list:

 
=> (take 5 integers)
 
(0 1 2 3 4)

To generate our list of random integers, we can use Clojure’s repeatedly function. This takes a function of one argument and repeats it an infinite number of times, as the following code shows:

ClojureExamples/src/mbfpp/functional/ls/examples.clj
 
(​def​ randoms (​repeatedly​ (​fn​ [] (​rand-int​ Integer/MAX_VALUE))))

To take a few, we can use take again:

 
=> (take 5 randoms)
 
(1416806782 956363594 262805953 1830450442 834342645)

If we want some more, we use take with a bigger argument. Again, the first five random integers won’t be recomputed, they’ll be pulled from a memoized cache:

 
=> (take 6 randoms)
 
(1416806782 956363594 262805953 1830450442 834342645 1793189704)

Scala and Clojure’s treatments of Lazy Sequence have a few key differences. Most of Clojure’s sequence-handling functions are lazy, but they recognize the sequence in chunks of thirty-two. If we take a single number from a lazy sequence of integers, Clojure will recognize the first thirty-two integers even though we only asked for one.

We can see this if we add a side effect into the lazy sequence generation. Here, we can see that take recognizes thirty-two integers, even though it only returns the first one:

 
=> (defn print-num [num] (print (str num " ")))
 
#'mbfpp.functional.ls.examples/print-num
 
=> (take 1 (map print-num (range 100)))
 
(0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
 
19 20 21 22 23 24 25 26 27 28 29 30 31 nil)

Another, more subtle difference comes into play when using Lazy Sequence in the REPL. When the Scala REPL comes across an instance of Lazy Sequence in the form of a Stream, it does not attempt to realize the whole thing.

This is easiest to see when we’ve got an obvious side effect. In the following Scala code, we use continually to print "hello" to the console and store a reference to the produced Stream in printHellos. As we can see, the first "hello" is printed when we call continually, which indicates that the method realizes the first element in the stream:

 
scala>​ val printHellos = Stream.continually(println("hello"))
 
hello
 
printHellos: scala.collection.immutable.Stream[Unit] = Stream((), ?)

If we now call take on printHellos, we don’t get any further "hello"s printed to the console, which means the REPL isn’t trying to realize the returned Stream.

 
scala>​ printHellos take 5
 
res0: scala.collection.immutable.Stream[Unit] = Stream((), ?)

If we want to force the remainder of our "hello"s to be realized, we can use any method that iterates over Stream, or we can just use the force:

 
scala>​ printHellos take 5 force
 
hello
 
hello
 
hello
 
hello
 
res1: scala.collection.immutable.Stream[Unit] = Stream((), (), (), (), ())

This isn’t something you generally need to do, but it’s important to understand when the elements of Lazy Sequence are realized.

In contrast, Clojure’s REPL will attempt to realize an instance of Lazy Sequence; however, defining an instance of Lazy Sequence may not realize the first element! Here we define a print-hellos much like the Scala version. Notice how "hello" isn’t printed to the console.

 
(def print-hellos (repeatedly (fn [] (println "hello"))))

However, if we take five elements, the REPL evaluating the resulting instance of Lazy Sequence will force it to print to the console.

 
=> (take 5 print-hellos)
 
(hello
 
hello
 
nil hello
 
nil hello
 
nil hello
 
nil nil)

This reflects the difference in how Scala and Clojure’s REPL evaluate Lazy Sequence. It also highlights something to watch out for when using Lazy Sequence. Since you can create infinite sequences, we need to ensure that we don’t attempt to realize an entire infinite sequence at once. For instance, if we had forgotten to use take in the Clojure example and had just evaluated (repeatedly (fn [] (println "hello"), we would have attempted to realize an infinitely long sequence of printing "hello"!

Sample Code: Paged Response

In our first example, we looked at a couple of higher-order functions that let us create an instance of Lazy Sequence. Now let’s take a look at how we’d make one from scratch.

The example we’ll use here is a lazy sequence that lets us go through a set of paged data. In our simple example, we’ll simulate the paged data with a local function call, though in a real program this would probably come from an external source such as a web service. Let’s get started with a look at the Scala code.

In Scala

Our Scala solution has two parts: the sequence itself, pagedSequence, and a method to generate some sample paged data, getPage.

We need to define the solution to our problem recursively, much as we would in Pattern 12, Tail Recursion. However, instead of passing our sequence through the call stack, we add to it in each recursive call using the #:: operator.

The following code is the full solution to our paged data problem:

ScalaExamples/src/main/scala/com/mblinn/mbfpp/functional/ls/LazySequence.scala
 
def​ pagedSequence(pageNum: ​Int​): Stream[​String​] =
 
getPage(pageNum) ​match​ {
 
case​ Some(page: ​String​) => page #:: pagedSequence(pageNum + 1)
 
case​ None => Stream.Empty
 
}
 
 
def​ getPage(page: ​Int​) =
 
page ​match​ {
 
case​ 1 => Some(​"Page1"​)
 
case​ 2 => Some(​"Page2"​)
 
case​ 3 => Some(​"Page3"​)
 
case​ _ => None
 
}

Let’s dig into pagedSequence a bit more, starting with the #:: operator, which allows us to prepend a value to a Stream. Here we use it to append the strings "foo" and "bar" to a new Stream:

 
scala>​ val aStream = "foo" ​​#:: "bar" #:: Stream[String]()
 
aStream: scala.collection.immutable.Stream[String] = Stream(foo, ?)

We can get at the head and tail of our Stream, just as we could with other sequences:

 
scala>​ aStream.head
 
res0: String = foo
 
 
scala>​ aStream.tail
 
res1: scala.collection.immutable.Stream[String] = Stream(bar, ?)

Let’s take a closer look at the heart of our solution in the following code snippet:

 
getPage(pageNum) ​match​ {
 
case​ Some(page: ​String​) => page #:: pagedSequence(pageNum + 1)
 
case​ None => Stream.Empty
 
}

We call getPage and match on the result. If we match a Some, then we know that we got back a valid page. We prepend it to our sequence and then recursively call the method generating the sequence, passing in the next page we’re trying to fetch.

If we get a None, we know we’ve gone through all our pages, and we append the empty stream, Stream.Empty, to our lazy sequence. This signals the end of the sequence.

Now we can work with pagedSequence just like we worked with some of the sequences we saw in the previous example. Here we take two pages from the sequence, starting at the first element:

 
scala>​ pagedSequence(1) take 2 force
 
res2: scala.collection.immutable.Stream[String] = Stream(Page1, Page2)

Here we force the whole thing to be realized, which is safe since this sequence, while lazy, isn’t infinite:

 
scala>​ pagedSequence(1) force
 
res3: scala.collection.immutable.Stream[String] = Stream(Page1, Page2, Page3)

That wraps up our Scala paged sequence example. Now let’s take a look at how to do it in Clojure.

In Clojure

In Clojure, we can construct an instance of Lazy Sequence from scratch using lazy-sequence and add to it with cons, as shown in the following snippet:

 
=> (cons 1 (lazy-seq [2]))
 
(1 2)

We can then use recursive function calls to build up useful sequences. To write our paged sequence example in Clojure, we first define a get-page function to mock up our paged data. The core of our solution is in the paged-sequence function.

The paged-sequence function is called with the start page, and it recursively builds up a lazy sequence by fetching that page, appending it to the sequence, and then calling itself with the number of the next page. The entire solution follows:

ClojureExamples/src/mbfpp/functional/ls/examples.clj
 
(​defn​ get-page [page-num]
 
(​cond
 
(​=​ page-num 1) ​"Page1"
 
(​=​ page-num 2) ​"Page2"
 
(​=​ page-num 3) ​"Page3"
 
:default nil))
 
 
(​defn​ paged-sequence [page-num]
 
(​let​ [page (get-page page-num)]
 
(​when​ page
 
(​cons​ page (​lazy-seq​ (paged-sequence (​inc​ page-num)))))))

Now we can work with our lazy sequence like any other. If we call paged-sequence in the REPL, we get the entire sequence:

 
=> (paged-sequence 1)
 
("Page1" "Page2" "Page3")

If we use take, we can get a portion of it:

 
=> (take 2 (paged-sequence 1))
 
("Page1" "Page2")

This can give us a very clean way of working with streaming data.

Discussion

One thing to watch out for when using lazy sequences is accidentally holding on to the head of the sequence when you don’t mean to, as Figure 14, Holding on to the Head demonstrates.

images/HoldingHead.png

Figure 14. Holding on to the Head. Holding on to the head of a lazy sequence will keep the entire sequence in memory.

In Scala, it’s easy to accidentally do this simply by assigning our lazy sequence into a val, as we do in the following code:

ScalaExamples/src/main/scala/com/mblinn/mbfpp/functional/ls/LazySequence.scala
 
val​ holdsHead = {
 
def​ pagedSequence(pageNum: ​Int​): Stream[​String​] =
 
getPage(pageNum) ​match​ {
 
case​ Some(page: ​String​) => {
 
println(​"Realizing "​ + page)
 
page #:: pagedSequence(pageNum + 1)
 
}
 
case​ None => Stream.Empty
 
}
 
pagedSequence(1)
 
}

If we try to force the sequence more than once, we can see that the second time uses the cached copy, as the following REPL output demonstrates:

 
scala>​ holdsHead force
 
Realizing Page1
 
hello
 
Realizing Page2
 
Realizing Page3
 
res0: scala.collection.immutable.Stream[String] = Stream(Page1, Page2, Page3)
 
scala>​ holdsHead force
 
res1: scala.collection.immutable.Stream[String] = Stream(Page1, Page2, Page3)

If we don’t want to hold on to the head of the sequence, we can use def instead of val, as we do in the following code:

ScalaExamples/src/main/scala/com/mblinn/mbfpp/functional/ls/LazySequence.scala
 
def​ doesntHoldHead = {
 
def​ pagedSequence(pageNum: ​Int​): Stream[​String​] =
 
getPage(pageNum) ​match​ {
 
case​ Some(page: ​String​) => {
 
println(​"Realizing "​ + page)
 
page #:: pagedSequence(pageNum + 1)
 
}
 
case​ None => Stream.Empty
 
}
 
pagedSequence(1)
 
}

This forces the sequence to be realized fresh each time it’s forced and does not hold onto the head:

 
scala>​ doesntHoldHead force
 
Realizing Page1
 
Realizing Page2
 
Realizing Page3
 
res2: scala.collection.immutable.Stream[String] = Stream(Page1, Page2, Page3)
 
 
scala>​ doesntHoldHead force
 
Realizing Page1
 
Realizing Page2
 
Realizing Page3
 
res3: scala.collection.immutable.Stream[String] = Stream(Page1, Page2, Page3)

Holding on to the head of a sequence by accident is really no more mysterious than holding on to a reference to any object when you don’t mean to, but it can be surprising if you’re not watching out for it.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.15.237.164