Have you ever noticed that many Python objects know how to behave inside of a for
loop? That’s not an accident. Iteration is so useful, and so common, that Python makes it easy for an object to be iterable. All it has to do is implement a handful of behaviors, known collectively as the iterator protocol.
In this chapter, we’ll explore that protocol and how we can use it to create iterable objects. We’ll do this in three ways:
We’ll create our own iterators via Python classes, directly implementing the protocol ourselves.
We’ll create generators, objects that implement the protocol, based on something that looks very similar to a function. Not surprisingly, these are known as generator functions.
We’ll also create generators using generator expressions, which look quite a bit like list comprehensions.
Even newcomers to Python know that if you want to iterate over the characters in a string, you can write
for i in 'abcd':
    print(i)     ❶
❶ Prints a, b, c, and d, each on a separate line
This feels natural, and that’s the point. What if you just want to execute a chunk of code five times? Can you iterate over the integer 5? Many newcomers to Python assume that the answer is yes and write the following:
for i in 5:     ❶
    print(i)

❶ Raises TypeError: 'int' object is not iterable
From this, we can see that while strings, lists, and dicts are iterable, integers aren’t. They aren’t because they don’t implement the iterator protocol, which consists of three parts:

- An __iter__ method, which returns an iterator object
- A __next__ method on that iterator, which returns one value each time it’s invoked
- The StopIteration exception, which __next__ raises to signal that no more values remain
Sequences (strings, lists, and tuples) are the most common form of iterables, but a large number of other objects, such as files and dicts, are also iterable. Best of all, when you define your own classes, you can make them iterable. All you have to do is make sure that the iterator protocol is in place on your object.
Given those three parts, we can now understand what a for loop really does:
- It asks an object whether it’s iterable, using the iter built-in function (http://mng.bz/jgja). This function invokes the __iter__ method on the target object. Whatever __iter__ returns is called the iterator.
- If the object is iterable, then the for loop invokes the next built-in function on the iterator that was returned. That function invokes __next__ on the iterator.
- If __next__ raises a StopIteration exception, then the loop exits.
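We can watch those three steps by hand, without a for loop, by calling the iter and next built-ins directly. Here’s a quick sketch you can run yourself:

```python
# Manually performing what a for loop does behind the scenes
i = iter('ab')        # invokes 'ab'.__iter__(), returning a string iterator
print(next(i))        # invokes i.__next__(), returning 'a'
print(next(i))        # returns 'b'

try:
    next(i)           # the iterator is exhausted...
except StopIteration: # ...so __next__ raises StopIteration
    print('done')
```

A for loop does exactly this, catching StopIteration for us and exiting quietly.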
This protocol explains a couple things that tend to puzzle newcomers to Python:
Why don’t we need any indexes? In C-like languages, we need a numeric index for our iterations. That’s so the loop can go through each of the elements of the collection, one at a time. In those cases, the loop is responsible for keeping track of the current location. In Python, the object itself is responsible for producing the next item. The for loop doesn’t know whether we’re on the first item or the last one. But it does know when we’ve reached the end.
How is it that different objects behave differently in for loops? After all, strings return characters, but dicts return keys, and files return lines. The answer is that the iterator object can return whatever it wants. So string iterators return characters, dict iterators return keys, and file iterators return the lines in a file.
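A quick way to see this for yourself is to ask different objects for their iterators and pull a single value from each. A small sketch:

```python
# Each type supplies its own iterator class, so next returns
# whatever that iterator decides: characters, keys, and so on.
print(next(iter('xyz')))             # 'x' -- a string iterator returns characters
print(next(iter({'a': 1, 'b': 2})))  # 'a' -- a dict iterator returns keys
print(type(iter('xyz')))             # a str_iterator object
```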
If you’re defining a new class, you can make it iterable as follows:
- Define an __iter__ method that takes only self as an argument and returns self. In other words, when Python asks your object, “Are you iterable?” the answer will be, “Yes, and I’m my own iterator.”
- Define a __next__ method that takes only self as an argument. This method should either return a value or raise StopIteration. If it never raises StopIteration, then any for loop on this object will never exit.
There are some more sophisticated ways to do things, including returning a separate, different object from __iter__. I demonstrate and discuss that later in this chapter.
Here’s a simple class that implements the protocol, wrapping itself around an iterable object but indicating when it reaches each stage of iteration:
class LoudIterator():
    def __init__(self, data):
        print('    Now in __init__')
        self.data = data                    ❶
        self.index = 0                      ❷

    def __iter__(self):
        print('    Now in __iter__')
        return self                         ❸

    def __next__(self):
        print('    Now in __next__')
        if self.index >= len(self.data):    ❹
            print(f'    self.index ({self.index}) is too big; exiting')
            raise StopIteration
        value = self.data[self.index]       ❺
        self.index += 1                     ❻
        print(f'    Got value {value}, incremented index to {self.index}')
        return value

for one_item in LoudIterator('abc'):
    print(one_item)
❶ Stores the data in an attribute, self.data
❷ Creates an index attribute, keeping track of our current position
❸ Our __iter__ does the simplest thing, returning self.
❹ Raises StopIteration if our self.index has reached the end
❺ Grabs the current value, but doesn’t return it yet
❻ Increments self.index, so the next call returns the next element
If we execute this code, we’ll see the following output:
    Now in __init__
    Now in __iter__
    Now in __next__
    Got value a, incremented index to 1
a
    Now in __next__
    Got value b, incremented index to 2
b
    Now in __next__
    Got value c, incremented index to 3
c
    Now in __next__
    self.index (3) is too big; exiting
This output walks us through the iteration process that we’ve already seen, starting with a call to __iter__ and then repeated invocations of __next__. The loop exits when the iterator raises StopIteration.
Adding such methods to a class works when you’re creating your own new types. There are two other ways to create iterators in Python:
You can use a generator expression, which we’ve already seen and used. As you might remember, generator expressions look and work similarly to list comprehensions, except that you use round parentheses rather than square brackets. But unlike list comprehensions, which return lists that might consume a great deal of memory, generator expressions return one element at a time.
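As a quick reminder of the difference, here’s a small sketch comparing the two forms:

```python
# A list comprehension builds the entire list in memory, all at once.
squares_list = [n * n for n in range(5)]
print(squares_list)        # [0, 1, 4, 9, 16]

# The equivalent generator expression returns a generator object;
# nothing is computed until we ask for a value.
squares_gen = (n * n for n in range(5))
print(next(squares_gen))   # 0 -- computed on demand
print(list(squares_gen))   # [1, 4, 9, 16] -- the remaining elements
```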
You can use a generator function: something that looks like a function but, when executed, acts like an iterator. For example:
def foo():
    yield 1
    yield 2
    yield 3
When we execute foo, the function’s body doesn’t execute. Rather, we get a generator object back; that is, something that implements the iterator protocol. We can thus put it in a for loop:
g = foo()

for one_item in g:
    print(one_item)
This loop will print 1, 2, and 3. Why? Because with each iteration (i.e., each time we call next on g), the function executes through the next yield statement, returns the value it got from yield, and then goes to sleep, waiting for the next iteration. When the generator function exits, it automatically raises StopIteration, thus ending the loop.
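We can see this at work by calling next on the generator ourselves, rather than letting a for loop do it. A small sketch:

```python
def foo():
    yield 1
    yield 2

g = foo()
print(next(g))   # 1 -- runs the body up to the first yield
print(next(g))   # 2 -- resumes just after the first yield

try:
    next(g)      # the function body has ended...
except StopIteration:
    print('generator exhausted')
```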
Iterators are pervasive in Python because they’re so convenient--and in many ways, they’ve been made convenient because they’re pervasive. In this chapter, you’ll practice writing all of these types of iterators and getting a feel for when each of these techniques should be used.
The built-in enumerate function allows us to get not just the elements of a sequence, but also the index of each element, as in
for index, letter in enumerate('abc'):
    print(f'{index}: {letter}')
Create your own MyEnumerate class, such that someone can use it instead of enumerate. It will need to return a tuple with each iteration, with the first element in the tuple being the index (starting with 0) and the second element being the current element from the underlying data structure. Trying to use MyEnumerate with a noniterable argument will result in an error.
In this exercise, we know that our MyEnumerate class will take a single iterable object. With each iteration, we’ll get back not one of that argument’s elements, but rather a two-element tuple.
This means that at the end of the day, we’re going to need a __next__ method that will return a tuple. Moreover, it’ll need to keep track of the current index. Since __next__, like all methods and functions, loses its local scope between calls, we’ll need to store the current index in another place. Where? On the object itself, as an attribute.
Thus, our __init__ method will initialize two attributes: self.data, where we’ll store the object over which we’re iterating, and self.index, which will start with 0 and be incremented with each call to __next__. Our implementation of __iter__ will be the standard one that we’ve seen so far, namely return self.
Finally, __next__ checks to see if self.index has gone past the length of self.data. If so, then we raise StopIteration, which causes the for loop to exit.
class MyEnumerate():
    def __init__(self, data):                          ❶
        self.data = data                               ❷
        self.index = 0                                 ❸

    def __iter__(self):
        return self                                    ❹

    def __next__(self):
        if self.index >= len(self.data):               ❺
            raise StopIteration
        value = (self.index, self.data[self.index])    ❻
        self.index += 1                                ❼
        return value                                   ❽

for index, letter in MyEnumerate('abc'):
    print(f'{index} : {letter}')
❶ Initializes MyEnumerate with an iterable argument, “data”
❷ Stores “data” on the object as self.data
❸ Initializes self.index with 0
❹ Because our object will be its own iterator, returns self
❺ Are we at the end of the data? If so, then raises StopIteration.
❻ Sets the value to be a tuple, with the index and value
❼ Increments self.index for the next iteration
❽ Returns the tuple
You can work through a version of this code in the Python Tutor at http://mng.bz/JydQ.
Note that the Python Tutor sometimes displays an error message when StopIteration
is raised.
Watch this short video walkthrough of the solution: https://livebook.manning.com/video/python-workout.
Now that you’ve created a simple iterator class, let’s dig in a bit deeper:
- Rewrite MyEnumerate such that it uses a helper class (MyEnumerateIterator), as described in the “Discussion” section. In the end, MyEnumerate will have the __iter__ method that returns a new instance of MyEnumerateIterator, and the helper class will implement __next__. It should work the same way, but will also produce results if we iterate over it twice in a row.
- The built-in enumerate method takes a second, optional argument: an integer, representing the first index that should be used. (This is particularly handy when numbering things for nontechnical users, who believe that things should be numbered starting with 1, rather than 0.) Add this capability to MyEnumerate.
- Redefine MyEnumerate as a generator function, rather than as a class.
From the examples we’ve seen so far, it might appear as though an iterable simply goes through the elements of whatever data it’s storing and then exits. But an iterator can do anything it wants, and can return whatever data it wants, until the point when it raises StopIteration. In this exercise, we see just how that works.
Define a class, Circle, that takes two arguments when defined: a sequence and a number. The idea is that the object will then return elements the defined number of times. If the number is greater than the number of elements, then the sequence repeats as necessary. You should define the class such that it uses a helper (which I call CircleIterator). Here’s an example:
c = Circle('abc', 5)
print(list(c)) ❶

❶ Prints ['a', 'b', 'c', 'a', 'b']
In many ways, our Circle class is a simple iterator, going through each of its values. But we might need to provide more outputs than we have inputs, circling around to the beginning one or more times.
The trick here is to use the modulus operator (%), which returns the integer remainder from a division operation. Modulus is often used in programs to ensure that we can wrap around as many times as we need.
In this case, we’re retrieving from self.data, as per usual. But the element won’t be self.data[self.index], but rather self.data[self.index % len(self.data)].
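Here’s a tiny sketch showing how the modulus operator wraps a growing index back into a sequence’s valid range:

```python
# As the index grows past the end of the data, index % len(data)
# wraps it back to the beginning, over and over.
data = 'abc'
for index in range(5):
    print(index, data[index % len(data)])
# 0 a
# 1 b
# 2 c
# 3 a  -- wrapped around
# 4 b
```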
Since self.index will likely end up being bigger than len(self.data), we can no longer use that as a test for whether we should raise StopIteration. Rather, we’ll need to have a separate attribute, self.max_times, which tells us how many iterations we should execute.
Once we have all of this in place, the implementation becomes fairly straightforward. Our Circle class remains with only __init__ and __iter__, the latter of which returns a new instance of CircleIterator. Note that we have to pass both self.data and self.max_times to CircleIterator, and thus need to store them as attributes in our instance of Circle.
Our iterator then uses the logic we described in its __next__ method to return one element at a time, until we have returned self.max_times items.
class CircleIterator():
    def __init__(self, data, max_times):
        self.data = data
        self.max_times = max_times
        self.index = 0

    def __next__(self):
        if self.index >= self.max_times:
            raise StopIteration
        value = self.data[self.index % len(self.data)]
        self.index += 1
        return value

class Circle():
    def __init__(self, data, max_times):
        self.data = data
        self.max_times = max_times

    def __iter__(self):
        return CircleIterator(self.data, self.max_times)

c = Circle('abc', 5)
print(list(c))
You can work through a version of this code in the Python Tutor at http://mng.bz/wBjg.
Watch this short video walkthrough of the solution: https://livebook.manning.com/video/python-workout.
I hope you’re starting to see the potential for iterators, and how they can be written in a variety of ways. Here are some additional exercises to get you thinking about what those ways could be:
- Rather than write a helper, you could also define iteration capabilities in a class and then inherit from it. Reimplement Circle as a class that inherits from CircleIterator, which implements __init__ and __next__. Of course, the parent class will have to know what to return in each iteration; add a new attribute in Circle, self.returns, a list of attribute names that should be returned.
- Implement Circle as a generator function, rather than as a class.
- Implement a MyRange class that returns an iterator that works the same as range, at least in for loops. (Modern range objects have a host of other capabilities, such as being subscriptable. Don’t worry about that.) The class, like range, should take one, two, or three integer arguments.
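As a hint for the final item, here’s one possible sketch of a MyRange class. (It’s not the only reasonable design, and it skips the argument validation the real range performs.)

```python
# A sketch of MyRange, handling one, two, or three arguments
# the way the built-in range does.
class MyRange:
    def __init__(self, first, second=None, step=1):
        if second is None:                  # MyRange(stop)
            self.current, self.stop = 0, first
        else:                               # MyRange(start, stop[, step])
            self.current, self.stop = first, second
        self.step = step

    def __iter__(self):
        return self                         # we are our own iterator

    def __next__(self):
        if (self.step > 0 and self.current >= self.stop) or \
           (self.step < 0 and self.current <= self.stop):
            raise StopIteration
        value = self.current
        self.current += self.step
        return value

print(list(MyRange(3)))         # [0, 1, 2]
print(list(MyRange(2, 5)))      # [2, 3, 4]
print(list(MyRange(1, 10, 3)))  # [1, 4, 7]
```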
File objects, as we’ve seen, are iterators; when we put them in a for loop, each iteration returns the next line from the file. But what if we want to read through a number of files? It would be nice to have an iterator that goes through each of them.
In this exercise, I’d like you to create just such an iterator, using a generator function. That is, this generator function will take a directory name as an argument. With each iteration, the generator should return a single string, representing one line from one file in that directory. Thus, if the directory contains five files, and each file contains 10 lines, the generator will return a total of 50 strings: each of the lines from file 0, then each of the lines from file 1, then each of the lines from file 2, until it gets through all of the lines from file 4.
If you encounter a file that can’t be opened--because it’s a directory, because you don’t have permission to read from it, and so on--you should just ignore the problem altogether.
Let’s start the discussion by pointing out that if you really wanted to do this the right way, you would likely use the os.walk function (http://mng.bz/D2Ky), which goes through each of the files in a directory and then descends into its subdirectories. But we’ll ignore that and work to understand the all_lines generator function that I’ve created here.
First, we run os.listdir on path. This returns a list of strings. It’s important to remember that os.listdir only returns the filenames, not the full path of the file. This means that we can’t just open the filename; we need to combine path with the filename.
We could use str.join, or even just + or an f-string. But there’s a better approach, namely os.path.join (http://mng.bz/oPPM), which takes any number of parameters (thanks to the *args) and then joins them together with the value of os.sep, the directory-separation character for the current operating system. Thus, we don’t need to think about whether we’re on a Unix or Windows system; Python can do that work for us.
What if there’s a problem reading from the file? We then trap that with an except OSError clause, in which we have nothing more than pass. The pass keyword means that Python shouldn’t do anything; it’s needed because of the structure of Python’s syntax, which requires something indented following a colon. But we don’t want to do anything if an error occurs, so we use pass.
And if there’s no problem? Then we simply return the current line using yield. Immediately after the yield, the function goes to sleep, waiting for the next time a for loop invokes next on it.
Note Using except without specifying which exception you might get is generally frowned upon, all the more so if you pair it with pass. If you do this in production code, you’ll undoubtedly encounter problems at some point, and because you haven’t trapped specific exceptions or logged the errors, you’ll have trouble debugging the problem as a result. For a good (if slightly old) introduction to Python exceptions and how they should be used, see http://mng.bz/VgBX.
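If you do want to follow the note’s advice in your own code, the pattern might look something like this sketch, which traps a specific exception and logs it. (The function name and message here are my own, for illustration.)

```python
# A sketch of the more defensive pattern: trap a specific
# exception and log it, rather than a bare except with pass.
import logging

def read_first_line(filename):
    try:
        with open(filename) as f:
            return f.readline()
    except OSError as e:
        # Record what went wrong, so problems remain debuggable
        logging.warning('Skipping %s: %s', filename, e)
        return None

print(read_first_line('no-such-file.txt'))  # logs a warning, returns None
```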
import os

def all_lines(path):
    for filename in os.listdir(path):                   ❶
        full_filename = os.path.join(path, filename)    ❷
        try:
            for line in open(full_filename):            ❸
                yield line                              ❹
        except OSError:
            pass                                        ❺
❶ Gets a list of files in path
❷ Uses os.path.join to create a full filename that we’ll open
❸ Opens and iterates over each line in full_filename
❹ Returns the line using yield, needed in iterators
❺ Ignores file-related problems silently
The Python Tutor site doesn’t work with files, so there’s no link to it. But you could see all of the lines from all files in the /etc/ directory on your computer with
for one_line in all_lines('/etc/'):
    print(one_line)
Watch this short video walkthrough of the solution: https://livebook.manning.com/video/python-workout.
If something you want to do as an iterator doesn’t align with an existing class but can be defined as a function, then a generator function will likely be a good way to implement it. Generator functions are particularly useful in taking potentially large quantities of data, breaking them down, and returning their output at a pace that won’t overwhelm the system. Here are some other problems you can solve using generator functions:
- Modify all_lines such that it doesn’t return a string with each iteration, but rather a tuple. The tuple should contain four elements: the name of the file, the current number of the file (from all those returned by os.listdir), the line number within the current file, and the current line.
- The current version of all_lines returns all of the lines from the first file, then all of the lines from the second file, and so forth. Modify the function such that it returns the first line from each file, and then the second line from each file, until all lines from all files are returned. When you finish printing lines from shorter files, ignore those files while continuing to display lines from the longer files.
- Modify all_lines such that it takes two arguments: a directory name and a string. Only those lines containing the string (i.e., for which you can say s in line) should be returned. If you know how to work with regular expressions and Python’s re module, then you could even make the match conditional on a regular expression.
Note In generator functions, we don’t need to explicitly raise StopIteration. That happens automatically when the generator reaches the end of the function. Indeed, raising StopIteration from within the generator is something that you should not do. If you want to exit from the function prematurely, it’s best to use a return statement. It’s not an error to use return with a value (e.g., return 5) from a generator function, but the value will be ignored. In a generator function, then, yield indicates that you want to keep the generator going and return a value for the current iteration, while return indicates that you want to exit completely.
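Here’s a small sketch demonstrating that behavior: return ends the generator early, and the for loop (or list) simply stops, with no visible StopIteration. (The function here is my own example, not from the exercises.)

```python
# Yield the first n even numbers from data, then exit early
# with a plain return -- no explicit StopIteration needed.
def first_n_evens(data, n):
    count = 0
    for item in data:
        if count == n:
            return            # exit the generator prematurely
        if item % 2 == 0:
            yield item
            count += 1

print(list(first_n_evens(range(20), 3)))  # [0, 2, 4]
```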
Sometimes, the point of an iterator is not to change existing data, but rather to provide data in addition to what we previously received. Moreover, a generator doesn’t necessarily provide all of its values in immediate succession; it can be queried on occasion, whenever we need an additional value. Indeed, the fact that generators retain all of their state while sleeping between iterations means that they can just hang around, as it were, waiting until needed to provide the next value.
In this exercise, write a generator function whose argument must be iterable. With each iteration, the generator will return a two-element tuple. The first element in the tuple will be an integer indicating how many seconds have passed since the previous iteration. The tuple’s second element will be the next item from the passed argument.
Note that the timing should be relative to the previous iteration, not when the generator was first created or invoked. Thus the timing number in the first iteration will be 0.
You can use time.perf_counter, which returns the number of seconds since the program was started. You could use time.time, but perf_counter is considered more reliable for such purposes.
The solution’s generator function takes a single piece of data and iterates over it. However, it returns a two-element tuple for each item it returns, in which the first element is the time since the previous iteration ran.
For this to work, we need to always know when the previous iteration was executed. Thus, we always calculate and set last_time before we yield the current values of delta and item.
However, we need to have a value for delta the first time we get a result back. This should be 0. To get around this, we set last_time to None at the top of the function. Then, with each iteration, we calculate delta to be the difference between current_time and last_time or current_time. If last_time is None, then we’ll get the value of current_time, and delta will be 0. This should only occur once; after the first iteration, last_time will never be None.
Normally, invoking a function multiple times means that the local variables are reset with each invocation. However, a generator function works differently: it’s only invoked once, and thus has a single stack frame. This means that the local variables, including parameters, retain their values across calls. We can thus set such values as last_time and use them in future iterations.
import time

def elapsed_since(data):
    last_time = None                                        ❶
    for item in data:
        current_time = time.perf_counter()                  ❷
        delta = current_time - (last_time or current_time)  ❸
        last_time = time.perf_counter()
        yield (delta, item)                                 ❹

for t in elapsed_since('abcd'):
    print(t)
    time.sleep(2)
❶ Initializes last_time with None
❷ Gets the current time via time.perf_counter
❸ Calculates the delta between the last time and now
❹ Yields a tuple with the delta and the current item
You can work through a version of this code in the Python Tutor at http://mng.bz/qMjz.
Watch this short video walkthrough of the solution: https://livebook.manning.com/video/python-workout.
In this exercise, we saw how we can combine user-supplied data with additional information from the system. Here are some more exercises you can try to get additional practice writing such generator functions:
- The existing function elapsed_since reported how much time passed between iterations. Now write a generator function that takes two arguments: a piece of data and a minimum amount of time that must elapse between iterations. If the next element is requested via the iterator protocol (i.e., next), and the time elapsed since the previous iteration is greater than the user-defined minimum, then the value is returned. If not, then the generator uses time.sleep to wait until the appropriate amount of time has elapsed.
- Write a generator function, file_usage_timing, that takes a single directory name as an argument. With each iteration, we get a tuple containing not just the current filename, but also the three reports that we can get about a file’s most recent usage: its access time (atime), modification time (mtime), and creation time (ctime). Hint: all are available via the os.stat function.
- Write a generator function that takes two arguments: an iterable and a function. With each iteration, the function is invoked on the current element. If the result is True, then the element is returned as is. Otherwise, the next element is tested, until the function returns True. Alternative: implement this as a regular function that returns a generator expression.
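As a hint for the final item, here’s one possible sketch, in both the generator-function form and the generator-expression alternative. (The function names are my own.)

```python
# A sketch of a filtering generator: yield only those elements
# for which func returns True.
def my_filter(data, func):
    for item in data:
        if func(item):
            yield item

print(list(my_filter(range(10), lambda n: n % 3 == 0)))  # [0, 3, 6, 9]

# The alternative: a regular function returning a generator expression
def my_filter_expr(data, func):
    return (item for item in data if func(item))

print(list(my_filter_expr(range(10), lambda n: n % 3 == 0)))  # [0, 3, 6, 9]
```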
As you can imagine, iterator patterns tend to repeat themselves. For this reason, Python comes with the itertools module (http://mng.bz/NK4E), which makes it easy to create many types of iterators. The classes in itertools have been optimized and debugged across many projects, and often include features that you might not have considered. It’s definitely worth keeping this module in the back of your mind for your own projects.
One of my favorite objects in itertools is called chain. It takes any number of iterables as arguments and then returns each of their elements, one at a time, as if they were all part of a single iterable. For example:
from itertools import chain

for one_item in chain('abc', [1,2,3], {'a':1, 'b':2}):
    print(one_item)
a
b
c
1
2
3
a
b
The final 'a' and 'b' come from the dict we passed, since iterating over a dict returns its keys.
While itertools.chain is convenient and clever, it’s not that hard to implement. For this exercise, that’s precisely what you should do: implement a generator function called mychain that takes any number of arguments, each of which is an iterable. With each iteration, it should return the next element from the current iterable, or the first element from the subsequent iterable, unless you’re at the end, in which case it should exit.
It’s true that you could create this as a Python class that implements the iterator protocol, with __iter__ and __next__. But, as you can see, the code is so much simpler, easier to understand, and more elegant when we use a generator function.
Our function takes *args as a parameter, meaning that args will be a tuple when our function executes. Because it’s a tuple, we can iterate over its elements, no matter how many there might be.
We’ve stated that each argument passed to mychain should be iterable, which means that we should be able to iterate over those arguments as well. Then, in the inner for loop, we simply yield the current item. This returns the current value to the caller, but also holds onto the current place in the generator function. Thus, the next time we invoke __next__ on our iteration object, we’ll get the next item in the series.
def mychain(*args):          ❶
    for arg in args:         ❷
        for item in arg:     ❸
            yield item

for one_item in mychain('abc', [1,2,3], {'a':1, 'b':2}):
    print(one_item)
❶ args is a tuple of iterables
❷ Iterates over each of the iterable arguments
❸ Loops over each element of each iterable, and yields it
You can work through a version of this code in the Python Tutor at http://mng.bz/7Xv4.
Watch this short video walkthrough of the solution: https://livebook.manning.com/video/python-workout.
In this exercise, we saw how we can better understand some built-in functionality by reimplementing it ourselves. In particular, we saw how we can create our own version of itertools.chain as a generator function. Here are some additional challenges you can solve using generator functions:
- The built-in zip function returns an iterator that, given iterable arguments, returns tuples taken from those arguments’ elements. The first iteration will return a tuple from the arguments’ index 0, the second iteration will return a tuple from the arguments’ index 1, and so on, stopping when the shortest of the arguments ends. Thus zip('abc', [10, 20, 30]) returns the iterator equivalent of [('a', 10), ('b', 20), ('c', 30)]. Write a generator function that reimplements zip in this way.
- Reimplement the all_lines function from exercise 49 using mychain.
- In the “Beyond the exercise” section for exercise 48, you implemented a MyRange class, which mimics the built-in range class. Now do the same thing, but using a generator expression.
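As a hint for the first item, here’s one possible sketch of a zip reimplementation. (It’s not the only approach, and it skips the optional strict-length checking that modern zip supports.)

```python
# A sketch of zip as a generator function, stopping as soon as
# the shortest iterable is exhausted.
def myzip(*args):
    iterators = [iter(arg) for arg in args]
    if not iterators:      # no arguments means nothing to yield
        return
    while True:
        result = []
        for i in iterators:
            try:
                result.append(next(i))
            except StopIteration:   # shortest iterable is done
                return
        yield tuple(result)

print(list(myzip('abc', [10, 20, 30, 40])))  # [('a', 10), ('b', 20), ('c', 30)]
```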
In this chapter, we looked at the iterator protocol and how we can both implement and use it in a variety of ways. While we like to say that there’s only one way to do things in Python, you can see that there are at least three different ways to create an iterator:

- Writing a class that implements the __iter__ and __next__ methods
- Writing a generator function, whose body contains one or more yield statements
- Writing a generator expression, which looks like a list comprehension in round parentheses
The iterator protocol is both common and useful in Python. By now, it’s a bit of a chicken-and-egg situation--is it worth adding the iterator protocol to your objects because so many programs expect objects to support it? Or do programs use the iterator protocol because so many programs support it? The answer might not be clear, but the implications are. If you have a collection of data, or something that can be interpreted as a collection, then it’s worth adding the appropriate methods to your class. And if you’re not creating a new class, you can still take advantage of iterables with generator functions and expressions.
After doing the exercises in this chapter, I hope that you can see how to do the following:
Add the iterator protocol to a class via a helper iterator class
Write generator functions that filter, modify, and add to iterators that you would otherwise have created or used
Use generator expressions for greater efficiency than list comprehensions
Congratulations! You’ve reached the end of the book, which (if you’re not peeking ahead) means that you’ve finished a large number of Python exercises. As a result, your Python has improved in a few ways.
First, you’re now more familiar with Python syntax and techniques. Like someone learning a foreign language, you might previously have had the vocabulary and grammar structures in place, but now you can express yourself more fluently. You don’t need to think quite as long when deciding what word to choose. You won’t be using constructs that work but are considered un-Pythonic.
Second, you’ve seen enough different problems, and used Python to solve them, that you now know what to do when you encounter new problems. You’ll know what questions to ask, how to break the problems down into their elements, and what Python constructs will best map to your solutions. You’ll be able to compare the trade-offs between different options and then integrate the best ones into your code.
Third, you’re now more familiar with Python’s way of doing things and the vocabulary that the language uses to describe them. This means that the Python documentation, as well as the community’s ecosystem of blogs, tutorials, articles, and videos, will be more understandable to you. The descriptions will make more sense, and the examples will be more powerful.
In short, being more fluent in Python means being able to write better code in less time, while keeping it readable and Pythonic. It also means being able to learn more as you continue on your path as a developer.
I wish you the best of success in your Python career and hope that you’ll continue to find ways to practice your Python as you move forward.