A powerful feature of Python is its iterator protocol (which we will get to shortly). This capability is only loosely connected to functional programming per se, since Python does not quite offer lazy data structures in the sense of a language like Haskell. However, use of the iterator protocol—and Python’s many built-in or standard library iterables—accomplishes much the same effect as an actual lazy data structure.
Let us explain the contrast here in slightly more detail. In a language like Haskell, which is inherently lazily evaluated, we might define a list of all the prime numbers in a manner like the following:
```haskell
-- Define a list of ALL the prime numbers
primes = sieve [2..]
  where sieve (p:xs) = p : sieve [x | x <- xs, (x `rem` p) /= 0]
```
This report is not the place to try to teach Haskell, but you can see a comprehension in there, which is in fact the model that Python used in introducing its own comprehensions. There is also deep recursion involved, which is not going to work in Python.
Apart from syntactic differences, or even the ability to recurse to
indefinite depth, the significant difference here is that the Haskell
version of primes
is an actual (infinite) sequence, not just an object
capable of sequentially producing elements (as was the primes
object
we demonstrated in the chapter entitled “Callables”). In
particular, you can index into an arbitrary element of the infinite list
of primes in Haskell, and the intermediate values will be produced
internally as needed based on the syntactic construction of the list
itself.
Mind you, one can replicate this in Python too; it just isn’t in the
inherent syntax of the language and takes more manual construction.
Given the get_primes()
generator function discussed earlier, we might
write our own container to simulate the same thing, for example:
```python
from collections.abc import Sequence

class ExpandingSequence(Sequence):
    def __init__(self, it):
        self.it = it
        self._cache = []

    def __getitem__(self, index):
        while len(self._cache) <= index:
            self._cache.append(next(self.it))
        return self._cache[index]

    def __len__(self):
        return len(self._cache)
```
This new container can be both lazy and also indexable:
```python
>>> primes = ExpandingSequence(get_primes())
>>> for _, p in zip(range(10), primes):
...     print(p, end=" ")
...
2 3 5 7 11 13 17 19 23 29
>>> primes[10]
31
>>> primes[5]
13
>>> len(primes)
11
>>> primes[100]
547
>>> len(primes)
101
```
Of course, there are other custom capabilities we might want to engineer
in, since lazy data structures are not inherently intertwined into
Python. Maybe we’d like to be able to slice this special sequence. Maybe
we’d like a prettier representation of the object when printed. Maybe we
should report the length as inf
if we somehow signaled it was meant to
be infinite. All of this is possible, but it takes a little bit of code
to add each behavior rather than simply being the default assumption of
Python data structures.
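As one illustration of the kind of code each such behavior takes, here is a sketch (the class name and details are my own, not the author's) of how slice support might be bolted onto an ExpandingSequence-style container:

```python
from collections.abc import Sequence
from itertools import count

class SliceableExpandingSequence(Sequence):
    """Hypothetical variant of ExpandingSequence that also accepts slices."""
    def __init__(self, it):
        self.it = it
        self._cache = []

    def _fill_to(self, index):
        # Consume the underlying iterator just far enough
        while len(self._cache) <= index:
            self._cache.append(next(self.it))

    def __getitem__(self, index):
        if isinstance(index, slice):
            # Only a bounded slice makes sense for a lazy sequence
            if index.stop is None:
                raise ValueError("cannot slice to the end of a lazy sequence")
            self._fill_to(index.stop - 1)
            return self._cache[index]
        self._fill_to(index)
        return self._cache[index]

    def __len__(self):
        return len(self._cache)

evens = SliceableExpandingSequence(count(0, 2))
print(evens[2:5])  # → [4, 6, 8]
```

Note that slicing still only realizes as many elements as the slice's stop bound requires; everything before that point is cached exactly as in the unsliceable version.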
The easiest way to create an iterator—that is to say, a lazy sequence—in
Python is to define a generator function, as was discussed in the chapter entitled “Callables.” Simply use the yield
statement within the body of
a function to define the places (usually in a loop) where values are
produced.
Or, technically, the easiest way is to use one of the many iterable objects already produced by built-ins or the standard library rather than programming a custom one at all. Generator functions are syntactic sugar for defining a function that returns an iterator.
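The get_primes() generator function itself is defined in an earlier chapter rather than reproduced here; purely to keep this chapter self-contained, a minimal sketch of what such a generator might look like (my own version, not necessarily the author's) is:

```python
def get_primes():
    """Hypothetical sketch of a generator yielding primes indefinitely
    (trial division against primes found so far; not optimized)."""
    candidate = 2
    found = []  # primes discovered so far
    while True:
        if all(candidate % p != 0 for p in found):
            found.append(candidate)
            yield candidate  # execution pauses here until next() is called
        candidate += 1

gen = get_primes()
print([next(gen) for _ in range(5)])  # → [2, 3, 5, 7, 11]
```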
Many objects have a method named .__iter__()
, which will return an
iterator when it is called, generally via the iter()
built-in
function, or even more often simply by looping over the object (e.g.,
for item in collection: ...
).
An iterator is the object returned by a call to
iter(something); it has a method named .__iter__() that
simply returns the iterator itself, and another method named
.__next__(). The reason the iterator still has an .__iter__()
method is to make iter() idempotent. That is, this identity should
always hold (or the call should raise TypeError("object is not iterable")):
```python
iter_seq = iter(sequence)
iter(iter_seq) == iter_seq
```
The above remarks are a bit abstract, so let us look at a few concrete examples:
```python
>>> lazy = open('06-laziness.md')  # iterate over lines of file
>>> '__iter__' in dir(lazy) and '__next__' in dir(lazy)
True
>>> plus1 = map(lambda x: x + 1, range(10))
>>> plus1  # iterate over deferred computations
<map at 0x103b002b0>
>>> '__iter__' in dir(plus1) and '__next__' in dir(plus1)
True
>>> def to10():
...     for i in range(10):
...         yield i
...
>>> '__iter__' in dir(to10)
False
>>> '__iter__' in dir(to10()) and '__next__' in dir(to10())
True
>>> l = [1, 2, 3]
>>> '__iter__' in dir(l)
True
>>> '__next__' in dir(l)
False
>>> li = iter(l)  # iterate over concrete collection
>>> li
<list_iterator at 0x103b11278>
>>> li == iter(li)
True
```
In a functional programming style—or even just generally for readability—writing custom iterators as generator functions is most natural. However, we can also create custom classes that obey the protocol; often these will have other behaviors (i.e., methods) as well, but most such behaviors necessarily rely on statefulness and side effects to be meaningful. For example:
```python
from collections.abc import Iterable

class Fibonacci(Iterable):
    def __init__(self):
        self.a, self.b = 0, 1
        self.total = 0

    def __iter__(self):
        return self

    def __next__(self):
        self.a, self.b = self.b, self.a + self.b
        self.total += self.a
        return self.a

    def running_sum(self):
        return self.total

# >>> fib = Fibonacci()
# >>> fib.running_sum()
# 0
# >>> for _, i in zip(range(10), fib):
# ...     print(i, end=" ")
# ...
# 1 1 2 3 5 8 13 21 34 55
# >>> fib.running_sum()
# 143
# >>> next(fib)
# 89
```
This example is trivial, of course, but it shows a class that both implements the iterator protocol and also provides an additional method to return something stateful about its instance. Since statefulness is for object-oriented programmers, in a functional programming style we will generally avoid classes like this.
The module itertools
is a collection of very powerful—and carefully
designed—functions for performing iterator algebra. That is, these
allow you to combine iterators in sophisticated ways without having to
concretely instantiate anything more than is currently required. As well
as the basic functions in the module itself, the
module documentation
provides a number of short, but easy to get subtly wrong, recipes for
additional functions that each utilize two or three of the basic
functions in combination. The third-party module more_itertools
mentioned in the Preface provides additional functions that are likewise
designed to avoid common pitfalls and edge cases.
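As a taste of what those recipes look like, the take() recipe from the itertools documentation is little more than a thin wrapper over islice(), yet saves a reader from the subtle details of bounding an infinite iterator:

```python
from itertools import islice, count

def take(n, iterable):
    "Return first n items of the iterable as a list (recipe from the itertools docs)."
    return list(islice(iterable, n))

print(take(5, count(10)))  # → [10, 11, 12, 13, 14]
```

Even a one-liner like this is easy to get subtly wrong by hand (e.g., by calling list() on the infinite iterable first), which is exactly why the recipes and more_itertools exist.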
The basic goal of using the building blocks inside itertools
is to
avoid performing computations before they are required, to avoid the
memory requirements of a large instantiated collection, to avoid
potentially slow I/O until it is strictly required, and so on. Iterators
are lazy sequences rather than realized collections, and when combined
with functions or recipes in itertools
they retain this property.
Here is a quick example of combining a few things. Rather than the
stateful Fibonacci
class to let us keep a running sum, we might simply
create a single lazy iterator to generate both the current number and
this sum:
```python
>>> def fibonacci():
...     a, b = 1, 1
...     while True:
...         yield a
...         a, b = b, a + b
...
>>> from itertools import tee, accumulate
>>> s, t = tee(fibonacci())
>>> pairs = zip(t, accumulate(s))
>>> for _, (fib, total) in zip(range(7), pairs):
...     print(fib, total)
...
1 1
1 2
2 4
3 7
5 12
8 20
13 33
```
Figuring out exactly how to use functions in itertools
correctly and optimally often requires careful thought, but once
combined they provide remarkable power for dealing with large, or even
infinite, iterators in ways that would be impossible with concrete
collections.
The documentation for the itertools
module contains details on its
combinatorial functions as well as a number of short recipes for
combining them. This report does not have space to repeat those
descriptions, so exhibiting a few of them above will suffice. Note
that for practical purposes, zip()
, map()
, filter()
, and range()
(which is, in a sense, just a terminating itertools.count()
) could
well live in itertools
if they were not built-ins. That is, all of
those functions lazily generate sequential items (mostly based on
existing iterables) without creating a concrete sequence. Built-ins like
all()
, any()
, sum()
, min()
, max()
, and functools.reduce()
also act on iterables, but all of them, in the general case, need to
exhaust the iterator rather than remain lazy. The function
itertools.product()
might be out of place in its module since it also
creates concrete cached sequences, and cannot operate on infinite
iterators.
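The contrast between the lazy and the eager functions is easy to demonstrate (the lambda and the small pools here are purely illustrative):

```python
from itertools import count, islice, product

# map() stays lazy: applying it to an infinite iterator returns
# immediately, and only islice() forces any elements to be computed
squares = map(lambda n: n * n, count())
print(list(islice(squares, 5)))  # → [0, 1, 4, 9, 16]

# product() caches its input pools up front before yielding anything,
# so an infinite iterator would hang it (do NOT try product(count(), repeat=2))
print(list(product([0, 1], repeat=2)))  # → [(0, 0), (0, 1), (1, 0), (1, 1)]
```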
The itertools.chain()
and itertools.chain.from_iterable()
functions
combine multiple iterables. Built-in zip()
and
itertools.zip_longest()
also do this, of course, but in manners that
allow incremental advancement through the iterables. A consequence of
this is that while chaining infinite iterables is valid syntactically
and semantically, no actual program will exhaust the earlier iterable.
For example:
```python
from itertools import chain, count
thrice_to_inf = chain(count(), count(), count())
```
Conceptually, thrice_to_inf
will count to infinity three times, but in
practice once would always be enough. However, for merely large
iterables—not for infinite ones—chaining can be very useful and
parsimonious:
```python
def from_logs(fnames):
    yield from (open(file) for file in fnames)

lines = chain.from_iterable(from_logs(['huge.log', 'gigantic.log']))
```
Notice that in the example given, we didn’t even need to pass in a concrete list of files—that sequence of filenames itself could be a lazy iterable per the API given.
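For instance, Path.glob() from the standard library returns a lazy generator of matching paths, so the whole pipeline—from filename discovery to individual lines—can remain lazy end to end (the '*.log' pattern here is purely illustrative):

```python
from itertools import chain
from pathlib import Path

def from_logs(fnames):
    # as above: lazily open each named file
    yield from (open(f) for f in fnames)

# No concrete list of filenames is ever built; files are only
# discovered, opened, and read as lines are actually consumed
lines = chain.from_iterable(from_logs(Path('.').glob('*.log')))
```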
Besides the chaining with itertools
, we should mention
collections.ChainMap()
in the same breath. Dictionaries (or generally
any collections.abc.Mapping
) are iterable (over their keys). Just as
we might want to chain multiple sequence-like iterables, we sometimes
want to chain together multiple mappings without needing to create a
single larger concrete one. ChainMap()
is handy, and does not alter
the underlying mappings used to construct it.
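A small sketch of its behavior (the settings dictionaries are hypothetical, used only for illustration):

```python
from collections import ChainMap

defaults = {'color': 'blue', 'user': 'guest'}
overrides = {'color': 'red'}

# Lookup searches the maps left to right; the first hit wins
settings = ChainMap(overrides, defaults)
print(settings['color'])  # → red
print(settings['user'])   # → guest

# Iteration covers the keys of all underlying maps (deduplicated),
# and those maps themselves are left unmodified
print(sorted(settings))   # → ['color', 'user']
print(defaults['color'])  # → blue
```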