A powerful feature of Python is its iterator protocol (which we will get to shortly). This capability is only loosely connected to functional programming per se, since Python does not quite offer lazy data structures in the sense of a language like Haskell. However, use of the iterator protocol—and Python’s many built-in or standard library iterables—accomplishes much the same effect as an actual lazy data structure.
Let us explain the contrast here in slightly more detail. In a language like Haskell, which is inherently lazily evaluated, we might define a list of all the prime numbers in a manner like the following:
```haskell
-- Define a list of ALL the prime numbers
primes = sieve [2..]
  where sieve (p:xs) = p : sieve [x | x <- xs, (x `rem` p) /= 0]
```
This report is not the place to try to teach Haskell, but you can see a comprehension in there, which is in fact the model that Python used in introducing its own comprehensions. There is also deep recursion involved, which is not going to work in Python.
Apart from syntactic differences, or even the ability to recurse to
indefinite depth, the significant difference here is that the Haskell
version of primes
is an actual (infinite) sequence, not just an object
capable of sequentially producing elements (as was the primes
object
we demonstrated in the chapter entitled “Callables”). In
particular, you can index into an arbitrary element of the infinite list
of primes in Haskell, and the intermediate values will be produced
internally as needed based on the syntactic construction of the list
itself.
Mind you, one can replicate this in Python too; it just isn’t in the
inherent syntax of the language and takes more manual construction.
Given the get_primes()
generator function discussed earlier, we might
write our own container to simulate the same thing, for example:
```python
from collections.abc import Sequence

class ExpandingSequence(Sequence):
    def __init__(self, it):
        self.it = it
        self._cache = []

    def __getitem__(self, index):
        while len(self._cache) <= index:
            self._cache.append(next(self.it))
        return self._cache[index]

    def __len__(self):
        return len(self._cache)
```
This new container can be both lazy and also indexable:
```python
>>> primes = ExpandingSequence(get_primes())
>>> for _, p in zip(range(10), primes):
...     print(p, end=" ")
...
2 3 5 7 11 13 17 19 23 29
>>> primes[10]
31
>>> primes[5]
13
>>> len(primes)
11
>>> primes[100]
547
>>> len(primes)
101
```
Of course, there are other custom capabilities we might want to engineer
in, since lazy data structures are not inherently intertwined into
Python. Maybe we’d like to be able to slice this special sequence. Maybe
we’d like a prettier representation of the object when printed. Maybe we
should report the length as inf
if we somehow signaled it was meant to
be infinite. All of this is possible, but it takes a little bit of code
to add each behavior rather than simply being the default assumption of
Python data structures.
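As one illustration of the kind of code each such behavior takes, here is a sketch (the class name and details are my own, not the author's) of how slice support might be bolted onto an ExpandingSequence-style container:

```python
from collections.abc import Sequence
from itertools import count

class SliceableExpandingSequence(Sequence):
    """Hypothetical variant of ExpandingSequence that also accepts slices."""
    def __init__(self, it):
        self.it = it
        self._cache = []

    def _fill_to(self, index):
        # Consume the underlying iterator just far enough
        while len(self._cache) <= index:
            self._cache.append(next(self.it))

    def __getitem__(self, index):
        if isinstance(index, slice):
            # Only a bounded slice makes sense for a lazy sequence
            if index.stop is None:
                raise ValueError("cannot slice to the end of a lazy sequence")
            self._fill_to(index.stop - 1)
            return self._cache[index]
        self._fill_to(index)
        return self._cache[index]

    def __len__(self):
        return len(self._cache)

evens = SliceableExpandingSequence(count(0, 2))
print(evens[2:5])  # → [4, 6, 8]
```

Note that slicing still only realizes as many elements as the slice's stop bound requires; everything before that point is cached exactly as in the unsliceable version.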
The easiest way to create an iterator—that is to say, a lazy sequence—in
Python is to define a generator function, as was discussed in the chapter entitled “Callables.” Simply use the yield
statement within the body of
a function to define the places (usually in a loop) where values are
produced.
Or, technically, the easiest way is to use one of the many iterable objects already produced by built-ins or the standard library rather than programming a custom one at all. Generator functions are syntactic sugar for defining a function that returns an iterator.
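The get_primes() generator function itself is defined in an earlier chapter rather than reproduced here; purely to keep this chapter self-contained, a minimal sketch of what such a generator might look like (my own version, not necessarily the author's) is:

```python
def get_primes():
    """Hypothetical sketch of a generator yielding primes indefinitely
    (trial division against primes found so far; not optimized)."""
    candidate = 2
    found = []  # primes discovered so far
    while True:
        if all(candidate % p != 0 for p in found):
            found.append(candidate)
            yield candidate  # execution pauses here until next() is called
        candidate += 1

gen = get_primes()
print([next(gen) for _ in range(5)])  # → [2, 3, 5, 7, 11]
```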
Many objects have a method named .__iter__()
, which will return an
iterator when it is called, generally via the iter()
built-in
function, or even more often simply by looping over the object (e.g.,
for item in collection: ...
).
An iterator is the object returned by a call to
iter(something); it has a method named .__iter__() that
simply returns the iterator itself, and another method named
.__next__(). The reason the iterator still has an .__iter__()
method is to make iter() idempotent. That is, this identity should
always hold (or the call should raise TypeError("object is not iterable")):
```python
iter_seq = iter(sequence)
iter(iter_seq) == iter_seq
```
The above remarks are a bit abstract, so let us look at a few concrete examples:
```python
>>> lazy = open('06-laziness.md')  # iterate over lines of file
>>> '__iter__' in dir(lazy) and '__next__' in dir(lazy)
True
>>> plus1 = map(lambda x: x + 1, range(10))
>>> plus1  # iterate over deferred computations
<map at 0x103b002b0>
>>> '__iter__' in dir(plus1) and '__next__' in dir(plus1)
True
>>> def to10():
...     for i in range(10):
...         yield i
...
>>> '__iter__' in dir(to10)
False
>>> '__iter__' in dir(to10()) and '__next__' in dir(to10())
True
>>> l = [1, 2, 3]
>>> '__iter__' in dir(l)
True
>>> '__next__' in dir(l)
False
>>> li = iter(l)  # iterate over concrete collection
>>> li
<list_iterator at 0x103b11278>
>>> li == iter(li)
True
```
In a functional programming style—or even just generally for readability—writing custom iterators as generator functions is most natural. However, we can also create custom classes that obey the protocol; often these will have other behaviors (i.e., methods) as well, but most such behaviors necessarily rely on statefulness and side effects to be meaningful. For example:
```python
from collections.abc import Iterable

class Fibonacci(Iterable):
    def __init__(self):
        self.a, self.b = 0, 1
        self.total = 0

    def __iter__(self):
        return self

    def __next__(self):
        self.a, self.b = self.b, self.a + self.b
        self.total += self.a
        return self.a

    def running_sum(self):
        return self.total

# >>> fib = Fibonacci()
# >>> fib.running_sum()
# 0
# >>> for _, i in zip(range(10), fib):
# ...     print(i, end=" ")
# ...
# 1 1 2 3 5 8 13 21 34 55
# >>> fib.running_sum()
# 143
# >>> next(fib)
# 89
```
This example is trivial, of course, but it shows a class that both implements the iterator protocol and also provides an additional method to return something stateful about its instance. Since statefulness is for object-oriented programmers, in a functional programming style we will generally avoid classes like this.
The module itertools
is a collection of very powerful—and carefully
designed—functions for performing iterator algebra. That is, these
allow you to combine iterators in sophisticated ways without having to
concretely instantiate anything more than is currently required. As well
as the basic functions in the module itself, the
module documentation
provides a number of short, but easy to get subtly wrong, recipes for
additional functions that each utilize two or three of the basic
functions in combination. The third-party module more_itertools
mentioned in the Preface provides additional functions that are likewise
designed to avoid common pitfalls and edge cases.
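As a taste of what those recipes look like, the take() recipe from the itertools documentation is little more than a thin wrapper over islice(), yet saves a reader from the subtle details of bounding an infinite iterator:

```python
from itertools import islice, count

def take(n, iterable):
    "Return first n items of the iterable as a list (recipe from the itertools docs)."
    return list(islice(iterable, n))

print(take(5, count(10)))  # → [10, 11, 12, 13, 14]
```

Even a one-liner like this is easy to get subtly wrong by hand (e.g., by calling list() on the infinite iterable first), which is exactly why the recipes and more_itertools exist.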
The basic goal of using the building blocks inside itertools
is to
avoid performing computations before they are required, to avoid the
memory requirements of a large instantiated collection, to avoid
potentially slow I/O until it is strictly required, and so on. Iterators
are lazy sequences rather than realized collections, and when combined
with functions or recipes in itertools
they retain this property.
Here is a quick example of combining a few things. Rather than the
stateful Fibonacci
class to let us keep a running sum, we might simply
create a single lazy iterator to generate both the current number and
this sum:
```python
>>> def fibonacci():
...     a, b = 1, 1
...     while True:
...         yield a
...         a, b = b, a + b
...
>>> from itertools import tee, accumulate
>>> s, t = tee(fibonacci())
>>> pairs = zip(t, accumulate(s))
>>> for _, (fib, total) in zip(range(7), pairs):
...     print(fib, total)
...
1 1
1 2
2 4
3 7
5 12
8 20
13 33
```
Figuring out exactly how to use functions in itertools
correctly and optimally often requires careful thought, but once
combined they provide remarkable power for dealing with large, or even
infinite, iterators in ways that would be impossible with concrete
collections.
The documentation for the itertools
module contains details on its
combinatorial functions as well as a number of short recipes for
combining them. This report does not have space to repeat those
descriptions, so exhibiting a few of them above will suffice. Note
that for practical purposes, zip()
, map()
, filter()
, and range()
(which is, in a sense, just a terminating itertools.count()
) could
well live in itertools
if they were not built-ins. That is, all of
those functions lazily generate sequential items (mostly based on
existing iterables) without creating a concrete sequence. Built-ins like
all()
, any()
, sum()
, min()
, max()
, and functools.reduce()
also act on iterables, but all of them, in the general case, need to
exhaust the iterator rather than remain lazy. The function
itertools.product()
might be out of place in its module since it also
creates concrete cached sequences, and cannot operate on infinite
iterators.
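The contrast between the lazy and the eager functions is easy to demonstrate (the lambda and the small pools here are purely illustrative):

```python
from itertools import count, islice, product

# map() stays lazy: applying it to an infinite iterator returns
# immediately, and only islice() forces any elements to be computed
squares = map(lambda n: n * n, count())
print(list(islice(squares, 5)))  # → [0, 1, 4, 9, 16]

# product() caches its input pools up front before yielding anything,
# so an infinite iterator would hang it (do NOT try product(count(), repeat=2))
print(list(product([0, 1], repeat=2)))  # → [(0, 0), (0, 1), (1, 0), (1, 1)]
```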
The itertools.chain()
and itertools.chain.from_iterable()
functions
combine multiple iterables. Built-in zip()
and
itertools.zip_longest()
also do this, of course, but in manners that
allow incremental advancement through the iterables. A consequence of
this is that while chaining infinite iterables is valid syntactically
and semantically, no actual program will exhaust the earlier iterable.
For example:
```python
from itertools import chain, count
thrice_to_inf = chain(count(), count(), count())
```
Conceptually, thrice_to_inf
will count to infinity three times, but in
practice once would always be enough. However, for merely large
iterables—not for infinite ones—chaining can be very useful and
parsimonious:
```python
def from_logs(fnames):
    yield from (open(file) for file in fnames)

lines = chain.from_iterable(from_logs(['huge.log', 'gigantic.log']))
```
Notice that in the example given, we didn’t even need to pass in a concrete list of files—that sequence of filenames itself could be a lazy iterable per the API given.
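For instance, Path.glob() from the standard library returns a lazy generator of matching paths, so the whole pipeline—from filename discovery to individual lines—can remain lazy end to end (the '*.log' pattern here is purely illustrative):

```python
from itertools import chain
from pathlib import Path

def from_logs(fnames):
    # as above: lazily open each named file
    yield from (open(f) for f in fnames)

# No concrete list of filenames is ever built; files are only
# discovered, opened, and read as lines are actually consumed
lines = chain.from_iterable(from_logs(Path('.').glob('*.log')))
```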
Besides the chaining with itertools
, we should mention
collections.ChainMap()
in the same breath. Dictionaries (or generally
any collections.abc.Mapping
) are iterable (over their keys). Just as
we might want to chain multiple sequence-like iterables, we sometimes
want to chain together multiple mappings without needing to create a
single larger concrete one. ChainMap()
is handy, and does not alter
the underlying mappings used to construct it.
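A small sketch of its behavior (the settings dictionaries are hypothetical, used only for illustration):

```python
from collections import ChainMap

defaults = {'color': 'blue', 'user': 'guest'}
overrides = {'color': 'red'}

# Lookup searches the maps left to right; the first hit wins
settings = ChainMap(overrides, defaults)
print(settings['color'])  # → red
print(settings['user'])   # → guest

# Iteration covers the keys of all underlying maps (deduplicated),
# and those maps themselves are left unmodified
print(sorted(settings))   # → ['color', 'user']
print(defaults['color'])  # → blue
```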