The Ultimately Lazy Streams

Placing a lazy view on a strict collection doesn’t change the collection; it helps delay executing operations to the last possible moment. In other words, with a lazy view the values may all be in there; you simply won’t process them urgently or eagerly. A Stream is lazy to the bone—it produces values only on demand.

That may sound odd at first and may raise a number of questions: how many values would it hold, when does it create it, how do we get the values? A stream may be finite or infinite—that’s right, an infinite series. Wait a minute, if it is infinite in size, then how could we possibly store it on our machines, with the limited amount of memory we have? Well, now you know what this cloud computing is all about, with such abundant storage—just kidding. An infinite series generates values only when you ask it to, and produces only the values you ask for. In essence it says, “I’m infinite, but I dare you to ask for all the values.”

We’ll work through two examples so you can get the hang of streams and infinite series. Let’s first create a number generator, much like the ones you’ll find in offices that serve a large number of customers. With each pull a new number is generated, the one next in sequence. Here’s the code that starts generating values, given a starting number.

Parallel/numberGenerator.scala
 
def​ generate(starting: ​Int​) : Stream[​Int​] = {
 
starting #:: generate(starting + 1)
 
}
 
 
println(generate(25))

The generate function takes an integer starting as its parameter and returns a Stream of integers. Its implementation uses a special function #:: to concatenate the value in the variable starting and the result of a recursive call to the generate function. In concept, the function #:: of Stream is much like the function :: of List; they both concatenate or prefix a value to the respective collection. However, the function on the Stream is lazy; it performs the concatenation only on demand, postponing the execution until the final result is requested.

Thus, unlike a regular recursion, the program won’t eagerly pounce on that call. Instead, it lazily postpones calling that function. You must be curious to know the result of calling this function. We’ll quench that thirst by calling the function and printing the result of that call.

 
Stream(25, ?)

The output tells us that we have a stream with the initial value 25 followed by a yet-to-be-computed value. That seems like the stream is throwing a challenge: “If you want the next one, come and get it.” If you don’t make any call on the stream, it won’t do any real work or take up any space for elements.

There’s only one way to make the stream generate values and do some work: you’ve got to force a result out of it. You could use a call to the force method for that, but be careful not to call force on a stream that is infinite—you’ll end up running out of memory, even on the cloud! Alternately you can call a method that will force a result that is a non-stream or non-lazy, like a call to toList, for example. Again, make sure to perform this on a finite stream.

That begs the question, how can we convert an infinite stream to a finite stream? The take method can help us with that. This method also results in a stream, but unlike the original stream you call it on, the resulting stream is finite in size. Let’s take a look at the code to get some data out of the stream we created.

Parallel/numberGenerator.scala
 
println(generate(25).take(10).force)
 
println(generate(25).take(10).toList)

The code shows examples of both force and toList. The first method will force the stream to generate values. The second method will do the same and, in addition, convert the result to a strict collection, a list. Let’s take a look at the result in the output.

 
Stream(25, 26, 27, 28, 29, 30, 31, 32, 33, 34)
 
List(25, 26, 27, 28, 29, 30, 31, 32, 33, 34)

The force method kept the result as a stream but finite in size as limited by the take method. The toList method helped transform that into a familiar list.

The take method helped to constraint the stream to a finite number of elements. You may also ask the stream to keep generating—on force, of course—while a certain condition is not met. Let’s ask the stream to generate values until it reaches a number, say 40.

Parallel/numberGenerator.scala
 
println(generate(25).takeWhile{ _ < 40 }.force)

We replaced the call to take with takeWhile, which takes a function value as a parameter. As long as the expression in this function value keeps returning true, the force method will continue to generate values. Once the function value returns false the generation will terminate. We can see this from the output:

 
Stream(25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39)

We created an infinite series and then bound it. The series doesn’t have to be continuous. Let’s look at another example that shows how to create a series of prime numbers.

Parallel/primes.scala
 
def​ isDivisibleBy(number: ​Int​, divisor: ​Int​) = number % divisor == 0
 
 
def​ isPrime(number: ​Int​) =
 
number > 1 && !(2 to number - 1).exists { isDivisibleBy(number, _) }
 
 
def​ primes(starting: ​Int​) : Stream[​Int​] = {
 
println(s​"computing for $starting"​)
 
if​(isPrime(starting))
 
starting #:: primes(starting + 1)
 
else
 
primes(starting + 1)
 
}

The first two functions are self-explanatory. The interesting parts are in the primes function. In this function we first print a message to display its call. If the given number is prime, we return that number with a request to lazily fetch all the primes that follow. If the given number isn’t prime, we immediately proceed to search for the prime that follows this number.

This example is not much different from the number generator, except that the generation isn’t contiguous. We’ll use this example to learn about one other feature of streams: they memoize the values they generate. That’s nothing but caching, but in our field we prefer to give weird names that no one understands. When the stream produces a new value on demand, it will cache it—I mean memoize it—before returning. Then if you ask for the same value again, it doesn’t have to repeat the computations. Let’s take a look at this by making two calls on the stream we create from the primes function.

Parallel/primes.scala
 
val​ primesFrom100 = primes(100)
 
 
println(primesFrom100.take(3).toList)
 
println(​"Let's ask for more..."​)
 
println(primesFrom100.take(4).toList)

We store the stream returned by the call to primes in the primesFrom100 variable. We use this variable twice: once to get the first three values and again to get the first four values. The first call to take, with 3 as the argument, creates a finite stream that’s backed by the original infinite stream. The call to toList triggers the computations and places the result in a list. The second call to take is back on the original infinite stream, but this time we ask for four values. It should give us the three values already generated and, in addition, produce a new value. That’s because the values that the stream produced already are safely memoized and reused, as we see from the output—only the new value is generated:

 
computing for 100
 
computing for 101
 
computing for 102
 
computing for 103
 
computing for 104
 
computing for 105
 
computing for 106
 
computing for 107
 
List(101, 103, 107)
 
Let's ask for more...
 
computing for 108
 
computing for 109
 
List(101, 103, 107, 109)

Streams are one of the most charming features in the Scala library. They can come in quite handy to implement a variety of algorithms where we can conceptualize problems into a series that can be lazily generated and executed on demand. You already saw one such example, back in Trampoline Calls. The tail recursion can be seen as an infinite series problem. The execution of a recursion may either produce another recursion, which can be lazily evaluated, or terminate a recursion, yielding a result. Now that you have the hang of infinite series, you’ll soon be looking to see if streams are a fit for the problems you encounter.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.188.10.246