7

Generators, Iterators, and Asynchronous Programming

Generators are another of those features that set Python apart from more traditional languages. In this chapter, we will explore their rationale, why they were introduced in the language, and the problems they solve. We will also cover how to address problems idiomatically by using generators, and how to make our generators (or any iterable, for that matter) Pythonic.

We will understand why iteration (in the form of the iterator pattern) is automatically supported in the language. From there, we will take another journey and explore how generators became such a fundamental feature of Python in order to support other functionality, such as coroutines and asynchronous programming.

The goals of this chapter are as follows:

  • To create generators that improve the performance of our programs
  • To study how iterators (and the iterator pattern, in particular) are deeply embedded in Python
  • To solve problems that involve iteration idiomatically
  • To understand how generators work as the basis for coroutines and asynchronous programming
  • To explore the syntactic support for coroutines—yield from, await, and async def

Mastering generators will take you a long way toward writing idiomatic Python, hence their importance for this book. In this chapter, we not only study how to use generators, but we also explore their internals, in order to deeply understand how they work.

Technical requirements

The examples in this chapter will work with any version of Python 3.9 on any platform.

The code used in this chapter can be found at https://github.com/PacktPublishing/Clean-Code-in-Python-Second-Edition. The instructions are available in the README file.

Creating generators

Generators were introduced in Python a long time ago (PEP-255), with the idea of introducing iteration in Python while improving the performance of the program (by using less memory) at the same time.

The idea of a generator is to create an object that is iterable, and, while it's being iterated, will produce the elements it contains, one at a time. The main use of generators is to save memory—instead of having a very large list of elements in memory, holding everything at once, we have an object that knows how to produce each particular element, one at a time, as it is required.

This feature enables lazy computations of heavyweight objects in memory, in a similar manner to what other functional programming languages (Haskell, for instance) provide. It would even be possible to work with infinite sequences because the lazy nature of generators enables such an option.

A first look at generators

Let's start with an example. The problem at hand now is that we want to process a large list of records and get some metrics and indicators over them. Given a large dataset with information about purchases, we want to process it in order to get the lowest sale, the highest sale, and the average price of a sale.

For the simplicity of this example, we will assume a CSV with only two fields, in the following format:

<purchase_date>, <price>
...

We are going to create an object that receives all the purchases, and this will give us the necessary metrics. We could get some of these values out of the box by simply using the min() and max() built-in functions, but that would require iterating all of the purchases more than once, so instead, we are using our custom object, which will get these values in a single iteration.

The code that will get the numbers for us looks rather simple. It's just an object with a method that will process all the prices in one go and, at each step, update the value of each particular metric we are interested in. We will show a first implementation in the following listing, and, later on in this chapter (once we have seen more about iteration), we will revisit it and get a much better (and more compact) version. For now, we are settling for the following:

class PurchasesStats:
    def __init__(self, purchases):
        self.purchases = iter(purchases)
        self.min_price: float = None
        self.max_price: float = None
        self._total_purchases_price: float = 0.0
        self._total_purchases = 0
        self._initialize()
    def _initialize(self):
        try:
            first_value = next(self.purchases)
        except StopIteration:
            raise ValueError("no values provided")
        self.min_price = self.max_price = first_value
        self._update_avg(first_value)
    def process(self):
        for purchase_value in self.purchases:
            self._update_min(purchase_value)
            self._update_max(purchase_value)
            self._update_avg(purchase_value)
        return self
    def _update_min(self, new_value: float):
        if new_value < self.min_price:
            self.min_price = new_value
    def _update_max(self, new_value: float):
        if new_value > self.max_price:
            self.max_price = new_value
    @property
    def avg_price(self):
        return self._total_purchases_price / self._total_purchases
    def _update_avg(self, new_value: float):
        self._total_purchases_price += new_value
        self._total_purchases += 1
    def __str__(self):
        return (
            f"{self.__class__.__name__}({self.min_price}, "
            f"{self.max_price}, {self.avg_price})"
        )

This object will receive all the totals for purchases and process the required values. Now, we need a function that loads these numbers into something that this object can process. Here is the first version:

def _load_purchases(filename):
    purchases = []
    with open(filename) as f:
        for line in f:
            *_, price_raw = line.partition(",")
            purchases.append(float(price_raw))
    return purchases

This code works; it loads all the numbers of the file into a list that, when passed to our custom object, will produce the numbers we want. It has a performance issue, though. If you run it with a rather large dataset, it will take a while to complete, and it might even fail if the dataset is large enough to not fit into the main memory.

If we take a look at our code that consumes this data, it is processing purchases, one at a time, so we might be wondering why our producer fits everything in memory at once. It is creating a list where it puts all of the content of the file, but we know we can do better.

The solution is to create a generator. Instead of loading the entire content of the file in a list, we will produce the results one at a time. The code will now look like this:

def load_purchases(filename):
    with open(filename) as f:
        for line in f:
            *_, price_raw = line.partition(",")
            yield float(price_raw)

If you measure the process this time, you will notice that the usage of memory has dropped significantly. We can also see how the code looks simpler—there is no need to define the list (therefore, there is no need to append to it), and the return statement has also disappeared.

In this case, the load_purchases function is a generator function, or simply a generator.

In Python, the mere presence of the keyword yield in any function makes it a generator, and, as a result, when calling it, nothing other than creating an instance of the generator will happen:

>>> load_purchases("file")
<generator object load_purchases at 0x...>

A generator object is an iterable (we will revisit iterables in more detail later on), which means that it can work with for loops. Note how we did not have to change anything on the consumer code—our statistics processor remained the same, with the for loop unmodified, after the new implementation.

Working with iterables allows us to create these kinds of powerful abstractions that are polymorphic with respect to for loops. As long as we keep the iterable interface, we can iterate over that object transparently.

What we're exploring in this chapter is another case of idiomatic code that blends well with Python itself. In previous chapters, we have seen how we can implement our own context managers to connect our objects to with statements, or how we can create custom container objects to leverage the in operator, or booleans for the if statement, and so on. Now it's the turn of the for statement, and for that, we'll create iterators.

Before going into the details and nuances of generators, we can take a quick look at how generators relate to a concept that we have already seen: comprehensions. A generator in the form of a comprehension is called a generator expression, and we'll discuss it briefly in the next section.

Generator expressions

Generators save a lot of memory, and since they are iterators, they are a convenient alternative to other iterables or containers that require more space in memory such as lists, tuples, or sets.

Much like these data structures, they can also be defined by comprehension, only in this case they are called generator expressions (there is an ongoing argument about whether they should be called generator comprehensions; in this book, we will just refer to them by their canonical name, but feel free to use whichever you prefer).

They are defined in the same way we would define a list comprehension: if we replace the square brackets with parentheses, we get a generator as the result of the expression. Generator expressions can also be passed directly to functions that work with iterables, such as sum() and max():

>>> [x**2 for x in range(10)]
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
>>> (x**2 for x in range(10))
<generator object <genexpr> at 0x...>
>>> sum(x**2 for x in range(10))
285

Always pass a generator expression, instead of a list comprehension, to functions that expect iterables, such as min(), max(), and sum(). This is more efficient and Pythonic.

What the previous recommendation means is to try to avoid passing lists to functions that already work with generators. The example in the next code is something you would want to avoid, and favor the approach from the previous listing:

>>> sum([x**2 for x in range(10)])  # here the list can be avoided

And, of course, you can assign a generator expression to a variable and use it somewhere else (as with comprehensions). Keep in mind that there is an important distinction in this case, because we're talking about generators here. A list can be reused and iterated multiple times, but a generator will be exhausted after it has been iterated over. For this reason, make sure the result of the expression is consumed only once, or you'll get unexpected results.

Remember that generators are exhausted after they're iterated over, because they don't hold all the data in memory.
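
For instance, a quick interactive session illustrates the point; the second pass yields nothing because the generator was already consumed:

>>> squares = (x**2 for x in range(3))
>>> list(squares)
[0, 1, 4]
>>> list(squares)  # already exhausted
[]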

A common approach is to create new generator expressions in the code. This way, the first one is exhausted after it is iterated, but then a new one is created. Chaining generator expressions this way is useful and helps to save memory, as well as making the code more expressive because it resolves different iterations in different steps. One scenario where this is useful is when you need to apply multiple filters on an iterable; you can achieve this by using multiple generator expressions that act as chained filters.
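
As a minimal sketch of that last idea (the values and thresholds here are made up for illustration), each generator expression acts as one lazy filtering step over the previous one:

values = range(100)
evens = (n for n in values if n % 2 == 0)    # first filter
large_evens = (n for n in evens if n > 90)   # second filter, chained lazily
print(list(large_evens))                     # [92, 94, 96, 98]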

Now that we have a new tool in our toolbox (iterators), let's see how we can use it to write more idiomatic code.

Iterating idiomatically

In this section, we will first explore some idioms that come in handy when we have to deal with iteration in Python. These code recipes will help us get a better idea of the types of things we can do with generators (especially after we have already seen generator expressions), and how to solve typical problems in relation to them.

Once we have seen some idioms, we will move on to exploring iteration in Python in more depth, analyzing the methods that make iteration possible, and how iterable objects work.

Idioms for iteration

We are already familiar with the built-in enumerate() function that, given an iterable, returns another one whose elements are tuples: the first item of each tuple is the index, and the second is the element at that position in the original iterable:

>>> list(enumerate("abcdef"))
[(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd'), (4, 'e'), (5, 'f')]

We wish to create a similar object, but in a lower-level fashion: one that simply produces an infinite sequence. We want an object that can produce a sequence of numbers, from a given starting value, without any upper limit.

An object as simple as the following one can do the trick. Every time we call this object, we get the next number of the sequence ad infinitum:

class NumberSequence:
    def __init__(self, start=0):
        self.current = start
    def next(self):
        current = self.current
        self.current += 1
        return current

Based on this interface, we would have to use this object by explicitly invoking its next() method:

>>> seq = NumberSequence()
>>> seq.next()
0
>>> seq.next()
1
>>> seq2 = NumberSequence(10)
>>> seq2.next()
10
>>> seq2.next()
11

But with this code, we cannot reconstruct the enumerate() function as we would like to, because its interface does not support iteration with a regular Python for loop, which also means that we cannot pass it as a parameter to functions that expect something to iterate over. Notice how the following code fails:

>>> list(zip(NumberSequence(), "abcdef"))
Traceback (most recent call last):
  File "...", line 1, in <module>
TypeError: zip argument #1 must support iteration

The problem lies in the fact that NumberSequence does not support iteration. To fix this, we have to make the object an iterable by implementing the magic method __iter__(). We have also replaced the previous next() method with the __next__ magic method, which makes the object an iterator:

class SequenceOfNumbers:
    def __init__(self, start=0):
        self.current = start
    def __next__(self):
        current = self.current
        self.current += 1
        return current
    def __iter__(self):
        return self

This has an advantage: not only can we iterate over the object, but we also don't even need the .next() method anymore because having __next__() allows us to use the next() built-in function:

>>> list(zip(SequenceOfNumbers(), "abcdef"))
[(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd'), (4, 'e'), (5, 'f')]
>>> seq = SequenceOfNumbers(100)
>>> next(seq)
100
>>> next(seq)
101

This makes use of the iteration protocol. Similar to the context manager protocol we have explored in previous chapters, which consists of the __enter__ and __exit__ methods, this protocol relies on the __iter__ and __next__ methods.

Having these protocols in Python has an advantage: everyone that knows Python will be familiar with this interface already, so there's a sort of "standard contract." This means, instead of having to define your own methods and agree with the team (or any potential reader of the code), that this is the expected standard or protocol your code works with (as with our custom next() method in the first example); Python already provides an interface and has a protocol already. We only have to implement it properly.

The next() function

The next() built-in function will advance the iterable to its next element and return it:

>>> word = iter("hello")
>>> next(word)
'h'
>>> next(word)
'e'  # ...

If the iterator does not have more elements to produce, the StopIteration exception is raised:

>>> ...
>>> next(word)
'o'
>>> next(word)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>>

This exception signals that the iteration is over and that there are no more elements to consume.

If we wish to handle this case, besides catching the StopIteration exception, we could provide this function with a default value in its second parameter. Should this be provided, it will be the return value in lieu of throwing StopIteration:

>>> next(word, "default value")
'default value'

It is advisable to use the default value most of the time, to avoid having exceptions at runtime in our programs. If we are absolutely sure that the iterator we're dealing with cannot be empty, it's still better to be explicit (and intentional) about it, and not rely on side effects of built-in functions (that is, to properly assert the case).

The next() function can be quite useful in combination with generator expressions, in situations where we want to look for the first element of an iterable that meets certain criteria. We'll see examples of this idiom throughout the chapter, but the main idea is to use this function instead of creating a list comprehension and then taking its first element.
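
For instance, a minimal sketch of that idiom (the data here is made up for illustration) combines next(), a generator expression, and the default value we just discussed:

>>> prices = [3.5, 7.2, 10.1, 15.0]
>>> next((p for p in prices if p > 9), "no match")    # first price above 9, no intermediate list
10.1
>>> next((p for p in prices if p > 100), "no match")  # nothing qualifies, so the default is returned
'no match'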

Using a generator

The previous code can be simplified significantly by simply using a generator. Generator objects are iterators. This way, instead of creating a class, we can define a function that yields the values as needed:

def sequence(start=0):
    while True:
        yield start
        start += 1

Remember from our first definition that the presence of the yield keyword in the body of the function makes it a generator. Because it is a generator, it's perfectly fine to create an infinite loop like this: each time the generator is advanced, it will only run the code until the next yield statement is reached, produce its value, and suspend there:

>>> seq = sequence(10)
>>> next(seq)
10
>>> next(seq)
11
>>> list(zip(sequence(), "abcdef"))
[(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd'), (4, 'e'), (5, 'f')]

This difference can be thought of as analogous to the different ways there are to create a decorator, as we explored in the previous chapter (with an object or a function). Here as well, we can use a generator function, or an iterable object, as in the previous section. Whenever possible, constructing a generator is recommended, because it's syntactically simpler, and therefore easier to understand.

Itertools

Working with iterable objects has the advantage that the code blends better with Python itself because iteration is a key component of the language. Besides that, we can take full advantage of the itertools module (ITER-01). Actually, the sequence() generator we just created is fairly similar to itertools.count(). However, there is more we can do.

One of the nicest things about iterators, generators, and itertools is that they are composable objects that can be chained together.
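
As a small, self-contained illustration of that composability (the pipeline below is made up for this sketch), lazy building blocks can be stacked much like pipes, and nothing is computed until the final consumer asks for values:

import itertools

odds = filter(lambda n: n % 2, itertools.count())   # 1, 3, 5, ... (infinite, lazy)
squares = map(lambda n: n ** 2, odds)               # 1, 9, 25, ... (still lazy)
print(list(itertools.islice(squares, 5)))           # [1, 9, 25, 49, 81]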

For instance, going back to our first example that processed purchases in order to get some metrics, what if we want to do the same, but only for those values over a certain threshold? The naïve approach to solving this problem would be to place the condition while iterating:

# ...
    def process(self):
        for purchase in self.purchases:
            if purchase > 1000.0:
                ...

This is not only non-Pythonic, but it's also rigid (and rigidity is a trait that denotes bad code). It doesn't handle changes very well. What if the number changes now? Do we pass it by parameter? What if we need more than one? What if the condition is different (less than, for instance)? Do we pass a lambda?

These questions should not be answered by this object, whose sole responsibility is to compute a set of well-defined metrics over a stream of purchases represented as numbers. And, of course, the answer is no. It would be a huge mistake to make such a change (once again, clean code is flexible, and we don't want to make it rigid by coupling this object to external factors). These requirements will have to be addressed elsewhere.

It's better to keep this object independent of its clients. The less responsibility this class has, the more useful it will be for more clients, hence enhancing its chances of being reused.

Instead of changing this code, we're going to keep it as it is and assume that the new data is filtered according to whatever requirements each customer of the class has.

For instance, if we wanted to process only the first 10 purchases that amount to more than 1000, we would do the following:

>>> from itertools import islice
>>> purchases = islice(filter(lambda p: p > 1000.0, purchases), 10)
>>> stats = PurchasesStats(purchases).process()  # ...

There is no memory penalty for filtering this way because, since they are all generators, the evaluation is always lazy. This gives us the power of thinking as if we had filtered the entire set at once and then passed it to the object, but without actually fitting everything in memory.

Keep in mind the trade-off mentioned at the beginning of the chapter, between memory and CPU usage. While the code might use less memory, it could take up more CPU time, but most of the time this is acceptable when we have to process lots of objects while keeping the code maintainable.

Simplifying code through iterators

Now, we will briefly discuss some situations that can be improved with the help of iterators, and occasionally the itertools module. After discussing each case, and its proposed optimization, we will close each point with a corollary.

Repeated iterations

Now that we have seen more about iterators, and introduced the itertools module, we can show you how one of the first examples of this chapter (the one for computing statistics about some purchases) can be dramatically simplified:

import itertools
from statistics import median

def process_purchases(purchases):
    min_, max_, avg = itertools.tee(purchases, 3)
    return min(min_), max(max_), median(avg)

In this example, itertools.tee will split the original iterable into three new ones. We will use each of these for the different kinds of iterations that we require, without needing to repeat three different loops over purchases.

The reader can simply verify that if we pass an iterable object as the purchases parameter, this one is traversed only once (thanks to the itertools.tee function [TEE]), which was our main requirement. It is also possible to verify how this version is equivalent to our original implementation. In this case, there is no need to manually raise ValueError because passing an empty sequence to the min() function will do this.

If you are thinking about running a loop over the same object more than once, stop and think if itertools.tee can be of any help.

The itertools module contains many useful functions and nice abstractions that come in handy when dealing with iterations in Python. It also contains good recipes about how to solve typical iteration problems in an idiomatic fashion. As general advice, if you're thinking about how to solve a particular problem that involves iteration, go and take a look at this module. Even if the answer isn't literally there, it'll be good inspiration.

Nested loops

In some situations, we need to iterate over more than one dimension, looking for a value, and nested loops come as the first idea. When the value is found, we need to stop iterating, but the break keyword doesn't work entirely because we have to escape from two (or more) for loops, not just one.

What would be the solution to this? A flag signaling escape? No. Raising an exception? No, this would be the same as the flag, but even worse, because we know that exceptions are not to be used for control flow logic. Moving the code to a smaller function and returning from it? Close, but not quite.

The answer is, whenever possible, flatten the iteration to a single for loop.

This is the kind of code we would like to avoid:

def search_nested_bad(array, desired_value):
    coords = None
    for i, row in enumerate(array):
        for j, cell in enumerate(row):
            if cell == desired_value:
                coords = (i, j)
                break
        if coords is not None:
            break
    if coords is None:
        raise ValueError(f"{desired_value} not found")
    logger.info("value %r found at [%i, %i]", desired_value, *coords)
    return coords

And here is a simplified version of it that does not rely on flags to signal termination, and has a simpler, more compact structure of iteration:

def _iterate_array2d(array2d):
    for i, row in enumerate(array2d):
        for j, cell in enumerate(row):
            yield (i, j), cell

def search_nested(array, desired_value):
    try:
        coord = next(
            coord
            for (coord, cell) in _iterate_array2d(array)
            if cell == desired_value
        )
    except StopIteration as e:
        raise ValueError(f"{desired_value} not found") from e
    logger.info("value %r found at [%i, %i]", desired_value, *coord)
    return coord

It's worth mentioning how the auxiliary generator that was created works as an abstraction for the iteration that's required. In this case, we just need to iterate over two dimensions, but if we needed more, a different object could handle this without the client needing to know about it. This is the essence of the iterator design pattern, which, in Python, is transparent, since the language supports iterator objects automatically; this is the topic covered in the next section.

Try to simplify the iteration as much as possible with as many abstractions as are required, flattening the loops wherever possible.

Hopefully, this example serves as inspiration to you to get the idea that we can use generators for something more than just saving memory. We can take advantage of the iteration as an abstraction. That is, we can create abstractions not only by defining classes or functions but also by taking advantage of the syntax of Python. In the same way that we have seen how to abstract away some logic behind a context manager (so we don't know the details of what happens under the with statement), we can do the same with iterators (so we can forget the underlying logic of a for loop).

That's why we will start exploring how the iterator pattern works in Python, starting with the next section.

The iterator pattern in Python

Here, we will take a small detour from generators to understand iteration in Python more deeply. Generators are a particular case of iterable objects, but iteration in Python goes beyond generators, and being able to create good iterable objects will give us the chance to create more efficient, compact, and readable code.

In the previous code listings, we have been seeing examples of iterable objects that are also iterators, because they implement both the __iter__() and __next__() magic methods. While this is fine in general, it's not strictly required that they always have to implement both methods, and here we'll show the subtle differences between an iterable object (one that implements __iter__) and an iterator (that implements __next__).

We also explore other topics related to iterations, such as sequences and container objects.

The interface for iteration

An iterable is an object that supports iteration, which, at a very high level, means that we can run a for .. in ... loop over it, and it will work without any issues. However, iterable does not mean the same as iterator.

Generally speaking, an iterable is just something we can iterate, and it uses an iterator to do so. This means that in the __iter__ magic method, we would like to return an iterator, namely, an object with a __next__() method implemented.

An iterator is an object that only knows how to produce a series of values, one at a time, when it's being called by the already explored built-in next() function. While the iterator is not being called, it's simply frozen, sitting idly by until it's called again for the next value to produce. In this sense, generators are iterators.

Python concept | Magic method | Considerations
---------------|--------------|--------------------------------------------------------
Iterable       | __iter__     | They work with an iterator to construct the iteration
               |              | logic. These objects can be iterated in a
               |              | for ... in ...: loop.
Iterator       | __next__     | Define the logic for producing values one at a time.
               |              | The StopIteration exception signals that the iteration
               |              | is over. The values can be obtained one by one via the
               |              | built-in next() function.

Table 7.1: Iterables and iterators

In the following code, we will see an example of an iterator object that is not iterable—it only supports invoking its values, one at a time. Here, the name sequence refers just to a series of consecutive numbers, not to the sequence concept in Python, which we will explore later on:

class SequenceIterator:
    def __init__(self, start=0, step=1):
        self.current = start
        self.step = step
    def __next__(self):
        value = self.current
        self.current += self.step
        return value

Notice that we can get the values of the sequence one at a time, but we can't iterate over this object (this is fortunate because it would otherwise result in an endless loop):

>>> si = SequenceIterator(1, 2)
>>> next(si)
1
>>> next(si)
3
>>> next(si)
5
>>> for _ in SequenceIterator(): pass
... 
Traceback (most recent call last):
  ...
TypeError: 'SequenceIterator' object is not iterable

The error message is clear, as the object doesn't implement __iter__().

Just for explanatory purposes, we can separate the iteration into another object (again, it would be enough to make the object implement both __iter__ and __next__, but doing so separately helps clarify the distinction we're trying to make in this explanation).
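
For instance, a minimal sketch of that separation (the wrapper class name is illustrative) pairs the SequenceIterator defined above with an iterable that merely knows how to build it:

class SequenceWrapper:
    """Iterable: it only knows how to construct its iterator."""

    def __init__(self, start=0, step=1):
        self.start = start
        self.step = step

    def __iter__(self):
        # Each iteration gets a fresh SequenceIterator; the wrapper holds no cursor.
        return SequenceIterator(self.start, self.step)

Because __iter__ returns a new iterator every time, the wrapper can be traversed more than once, and it works anywhere an iterable is expected:

>>> list(zip(SequenceWrapper(1, 2), "abc"))
[(1, 'a'), (3, 'b'), (5, 'c')]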

Sequence objects as iterables

As we have just seen, if an object implements the __iter__() magic method, it means it can be used in a for loop. While this is a great feature, it's not the only possible form of iteration we can achieve. When we write a for loop, Python will try to see if the object we're using implements __iter__, and if it does, it will use that to construct the iteration, but if it doesn't, there are fallback options.

If the object happens to be a sequence (meaning that it implements the __getitem__() and __len__() magic methods), it can also be iterated. If that is the case, the interpreter will then provide values in sequence, until the IndexError exception is raised, which, analogous to the aforementioned StopIteration, also signals the stop for the iteration.

With the sole purpose of illustrating such behavior, we will run the following experiment, which shows a sequence object that applies a transformation (much like map()) over a range of numbers:

# generators_iteration_2.py
class MappedRange:
    """Apply a transformation to a range of numbers."""
    def __init__(self, transformation, start, end):
        self._transformation = transformation
        self._wrapped = range(start, end)
    def __getitem__(self, index):
        value = self._wrapped.__getitem__(index)
        result = self._transformation(value)
        logger.info("Index %d: %s", index, result)
        return result
    def __len__(self):
        return len(self._wrapped)

Keep in mind that this example is only designed to illustrate that an object such as this one can be iterated with a regular for loop. There is a logging line placed in the __getitem__ method to explore what values are passed while the object is being iterated, as we can see from the following test:

>>> mr = MappedRange(abs, -10, 5)
>>> mr[0]
Index 0: 10
10
>>> mr[-1]
Index -1: 4
4
>>> list(mr)
Index 0: 10
Index 1: 9
Index 2: 8
Index 3: 7
Index 4: 6
Index 5: 5
Index 6: 4
Index 7: 3
Index 8: 2
Index 9: 1
Index 10: 0
Index 11: 1
Index 12: 2
Index 13: 3
Index 14: 4
[10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0, 1, 2, 3, 4]

As a word of caution, it's important to highlight that while it is useful to know this, it's also a fallback mechanism for when the object doesn't implement __iter__, so most of the time we'll only want to resort to these methods when we are actually designing proper sequences, and not just objects we want to iterate over.

When thinking about designing an object for iteration, favor a proper iterable object (with __iter__), rather than a sequence that can coincidentally also be iterated.
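
For comparison, a minimal sketch (an illustrative rewrite, not the original listing) of the previous example as a proper iterable implements __iter__ directly, here by returning a generator expression:

class MappedIterable:
    """Like MappedRange, but iterated through __iter__ instead of the sequence fallback."""

    def __init__(self, transformation, start, end):
        self._transformation = transformation
        self._wrapped = range(start, end)

    def __iter__(self):
        # A fresh generator per iteration, so the object can be traversed repeatedly.
        return (self._transformation(value) for value in self._wrapped)

Iterating it goes through __iter__ directly:

>>> list(MappedIterable(abs, -3, 3))
[3, 2, 1, 0, 1, 2]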

Iterables are an important part of Python, not only because of the capabilities they offer to us as software engineers, but also because they play a fundamental role in the internals of Python.

We have seen in A brief introduction to asynchronous code in Chapter 2, Pythonic Code, how to read asynchronous code. Now that we have also explored iterators in Python, we can see how these two concepts are related. In particular, the next section explores coroutines, and we'll see how iterators are at the core of them.

Coroutines

The idea of a coroutine is to have a function, whose execution can be suspended at a given point in time, to be later resumed. By having this kind of functionality, a program might be able to suspend a part of the code, in order to dispatch something else for processing, and then come back to this original point to resume.

As we already know, generator objects are iterables. They implement __iter__() and __next__(). This is provided by Python automatically, so that when we call a generator function, we get a generator object that can be iterated or advanced with the next() function.

Besides this basic functionality, they have more methods so that they can work as coroutines (PEP-342). Here, we will explore how generators evolved into coroutines to support the basis of asynchronous programming before we go into more detail in the next section, where we will explore the new features of Python and the syntax that covers programming asynchronously.

The basic methods added in PEP-342 to support coroutines are as follows:

  • .close()
  • .throw(ex_type[, ex_value[, ex_traceback]])
  • .send(value)

Python takes advantage of generators in order to create coroutines. Because generators can naturally suspend, they're a convenient starting point. But generators weren't enough as they were originally thought to be, so these methods were added. This is because typically, it's not enough to just be able to suspend some part of the code; you'd also want to communicate with it (pass data, and signal about changes in the context).

By exploring each method in more detail, we'll be able to learn more about the internals of coroutines in Python. After this, I'll present another recapitulation of how asynchronous programming works, but unlike the one presented in Chapter 2, Pythonic Code, this one will relate to the internal concepts we just learned.

The methods of the generator interface

In this section, we will explore what each of the aforementioned methods does, how it works, and how it is expected to be used. By understanding how to use these methods, we will be able to make use of simple coroutines.

Later on, we will explore more advanced uses of coroutines, and how to delegate to sub-generators (coroutines) in order to refactor code, and how to orchestrate different coroutines.

close()

When calling this method, the generator will receive the GeneratorExit exception. If it's not handled, then the generator will finish without producing any more values, and its iteration will stop.

This exception can be used to handle a finishing status. In general, if our coroutine does some sort of resource management, we want to catch this exception and use that control block to release all resources being held by the coroutine. It is similar to using a context manager or placing the code in the finally block of an exception control, but handling this exception specifically makes it more explicit.

In the following example, we have a coroutine that makes use of a database handler object that holds a connection to a database, and runs queries over it, streaming data by pages of a fixed length (instead of reading everything that is available at once):

def stream_db_records(db_handler):
    try:
        while True:
            yield db_handler.read_n_records(10)
    except GeneratorExit:
        db_handler.close()

At each call to the generator, it will return 10 rows obtained from the database handler, but when we decide to explicitly finish the iteration and call close(), we also want to close the connection to the database:

>>> streamer = stream_db_records(DBHandler("testdb"))
>>> next(streamer)
[(0, 'row 0'), (1, 'row 1'), (2, 'row 2'), (3, 'row 3'), ...]
>>> next(streamer)
[(0, 'row 0'), (1, 'row 1'), (2, 'row 2'), (3, 'row 3'), ...]
>>> streamer.close()
INFO:...:closing connection to database 'testdb'

Use the close() method on generators to perform finishing-up tasks when needed.

This method is intended to be used for resource cleanup, so you'd typically use it for manually freeing resources when you couldn't do this automatically (for example, if you didn't use a context manager). Next, we'll see how to pass exceptions to the generator.

throw(ex_type[, ex_value[, ex_traceback]])

This method will throw the exception at the line where the generator is currently suspended. If the generator handles the exception that was sent, the code in that particular except clause will be called; otherwise, the exception will propagate to the caller.

Here, we are modifying the previous example slightly to show the difference when we use this method for an exception that is handled by the coroutine, and when it's not:

class CustomException(Exception):
    """A type of exception that is under control."""
def stream_data(db_handler):
    while True:
        try:
            yield db_handler.read_n_records(10)
        except CustomException as e:
            logger.info("controlled error %r, continuing", e)
        except Exception as e:
            logger.info("unhandled error %r, stopping", e)
            db_handler.close()
            break

Now, it is a part of the control flow to receive a CustomException, and, in such a case, the generator will log an informative message (of course, we can adapt this according to our business logic in each case), and move on to the next yield statement, which is the line where the coroutine reads from the database and returns that data.

This particular example handles all exceptions, but if the last block (except Exception:) wasn't there, the result would be that the exception is raised at the line where the generator is paused (again, the yield), and it would propagate from there to the caller:

>>> streamer = stream_data(DBHandler("testdb"))
>>> next(streamer)
[(0, 'row 0'), (1, 'row 1'), (2, 'row 2'), (3, 'row 3'), (4, 'row 4'), ...]
>>> next(streamer)
[(0, 'row 0'), (1, 'row 1'), (2, 'row 2'), (3, 'row 3'), (4, 'row 4'), ...]
>>> streamer.throw(CustomException)
WARNING:controlled error CustomException(), continuing
[(0, 'row 0'), (1, 'row 1'), (2, 'row 2'), (3, 'row 3'), (4, 'row 4'), ...]
>>> streamer.throw(RuntimeError)
ERROR:unhandled error RuntimeError(), stopping
INFO:closing connection to database 'testdb'
Traceback (most recent call last):
  ...
StopIteration

When our exception from the domain was received, the generator continued. However, when it received another exception that was not expected, the default block caught it; there, we closed the connection to the database and finished the iteration, which resulted in the generator being stopped. As we can see from the StopIteration that was raised, this generator can't be iterated further.

send(value)

In the previous example, we created a simple generator that reads rows from a database, and when we wished to finish its iteration, this generator released the resources linked to the database. This is a good example of using one of the methods that generators provide (close()), but there is more we can do.

One observation about the generator is that it reads a fixed number of rows from the database.

We would like to parametrize that number (10) so that we can change it throughout different calls. Unfortunately, the next() function does not provide us with options for that. But luckily, we have send():

def stream_db_records(db_handler):
    retrieved_data = None
    previous_page_size = 10
    try:
        while True:
            page_size = yield retrieved_data
            if page_size is None:
                page_size = previous_page_size
            previous_page_size = page_size
            retrieved_data = db_handler.read_n_records(page_size)
    except GeneratorExit:
        db_handler.close()

The idea is that we have now made the coroutine able to receive values from the caller by means of the send() method. This method is the one that actually distinguishes a generator from a coroutine because when it's used, it means that the yield keyword will appear on the right-hand side of the statement, and its return value will be assigned to something else.

In coroutines, we generally find the yield keyword to be used in the following form:

receive = yield produced

yield, in this case, will do two things. It will send produced back to the caller, which will pick it up on the next round of iteration (after calling next(), for example), and it will suspend there. At a later point, the caller will want to send a value back to the coroutine by using the send() method. This value will become the result of the yield statement, assigned in this case to the variable named receive.

Sending values to the coroutine only works when this one is suspended at a yield statement, waiting for something to produce. For this to happen, the coroutine will have to be advanced to that status. The only way to do this is by calling next() on it. This means that before sending anything to the coroutine, this has to be advanced at least once via the next() method. Failure to do so will result in an exception:

>>> def coro():
...     y = yield
...
>>> c = coro()
>>> c.send(1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can't send non-None value to a just-started generator
>>>

Always remember to advance a coroutine by calling next() before sending any values to it.

Back to our example. We are changing the way elements are produced or streamed, to make it able to receive the number of records it's expected to read from the database.

The first time we call next(), the generator will advance up to the line containing yield; it will provide a value to the caller (None, as set in the variable), and it will suspend there. From there, we have two options. If we choose to advance the generator by calling next(), the default value of 10 will be used, and it will go on with this as usual. This is because calling next() is technically the same as send(None), but this is covered in the if statement that will handle the value that we previously set.

If, on the other hand, we decide to provide an explicit value via send(<value>), this one will become the result of the yield statement, which will be assigned to the variable containing the length of the page to use, which, in turn, will be used to read from the database.

Successive calls will have this logic, but the important point is that now we can dynamically change the length of the data to read in the middle of the iteration, at any point.
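
A possible session (assuming the same illustrative DBHandler used above, whose read_n_records(n) returns n rows) could look like this:

>>> streamer = stream_db_records(DBHandler("testdb"))
>>> next(streamer)          # prime the coroutine; the first yield produces the initial None
>>> len(streamer.send(5))   # ask for a page of five records
5
>>> len(next(streamer))     # next() sends None, so the previous page size (5) is reused
5
>>> len(streamer.send(7))   # change the page size again, mid-iteration
7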

Now that we understand how the previous code works, most Pythonistas would expect a simplified version of it (after all, Python is also about brevity and clean and compact code):

def stream_db_records(db_handler):
    retrieved_data = None
    page_size = 10
    try:
        while True:
            page_size = (yield retrieved_data) or page_size
            retrieved_data = db_handler.read_n_records(page_size)
    except GeneratorExit:
        db_handler.close()

This version is not only more compact, but it also illustrates the idea better. The parentheses around yield make it clearer that it's an expression (think of it as if it were a function call) and that we are using its result, falling back to the previously held page size when nothing is sent.

This works as we expect, but we always have to remember to advance the coroutine before sending any data to it. If we forget to call the first next(), we'll get a TypeError. The result of this first call can be ignored for our purposes because it doesn't return anything we'll use.

It would be good if we could use the coroutine directly, right after it is created, without having to remember to call next() the first time, every time we are going to use it. Some authors (PYCOOK) devised an interesting decorator to achieve this. The idea of this decorator is to advance the coroutine, so the following definition works automatically:

@prepare_coroutine
def auto_stream_db_records(db_handler):
    retrieved_data = None
    page_size = 10
    try:
        while True:
            page_size = (yield retrieved_data) or page_size
            retrieved_data = db_handler.read_n_records(page_size)
    except GeneratorExit:
        db_handler.close()

>>> streamer = auto_stream_db_records(DBHandler("testdb"))
>>> len(streamer.send(5))
5
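
A minimal sketch of what such a decorator could look like (this is an assumption for illustration; the referenced implementation may differ) simply creates the coroutine and advances it once before handing it back:

from functools import wraps

def prepare_coroutine(coroutine_function):
    """Create the coroutine and advance it to its first yield before returning it."""

    @wraps(coroutine_function)
    def wrapped(*args, **kwargs):
        coroutine = coroutine_function(*args, **kwargs)
        next(coroutine)  # prime it, so callers can .send() right away
        return coroutine

    return wrapped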

Keep in mind, these are the fundamentals of how coroutines work in Python. By following these examples, you'll get an idea of what's actually going on in Python when working with coroutines. However, in modern Python, you wouldn't typically write these sorts of coroutines by yourself, because there's new syntax available (which we have mentioned, but we'll revisit to see how they relate to the ideas we have just seen).

Before jumping into the new syntactic capabilities, we need to explore the last jump the coroutines took in terms of their added functionality, in order to bridge missing gaps. After that, we'll be able to understand the meaning behind each keyword and statement used in asynchronous programming.

More advanced coroutines

So far, we have a better understanding of coroutines, and we can create simple ones to handle small tasks. We can say that these coroutines are, in fact, just more advanced generators (and that would be right, coroutines are just fancy generators), but, if we actually want to start supporting more complex scenarios, we usually have to go for a design that handles many coroutines concurrently, and that requires more features.

When handling many coroutines, we find new problems. As the control flow of our application becomes more complex, we want to pass values up and down the stack (as well as exceptions), be able to capture values from sub-coroutines we might call at any level, and finally, schedule multiple coroutines to run toward a common goal.

To make things simpler, generators had to be extended once again. That is what PEP-380 addressed by changing the semantics of generators so that they can return values and by introducing the new yield from construction.

Returning values in coroutines

As introduced at the beginning of this chapter, iteration is a mechanism that calls next() on an iterable object many times until a StopIteration exception is raised.

So far, we have been exploring the iterative nature of generators—we produce values one at a time, and, in general, we only care about each value as it's being produced at every step of the for loop. This is a very logical way of thinking about generators, but coroutines have a different idea; even though they are technically generators, they weren't conceived with the idea of iteration in mind, but with the goal of suspending the execution of code until it's resumed later on.

This is an interesting challenge; when we design a coroutine, we usually care more about suspending the state rather than iterating (and iterating a coroutine would be an odd case). The challenge lies in that it is easy to mix them both. This is because of a technical implementation detail; the support for coroutines in Python was built upon generators.

If we want to use coroutines to process some information and suspend its execution, it would make sense to think of them as lightweight threads (or green threads, as they are called in other platforms). In such a case, it would make sense if they could return values, much like calling any other regular function.

But let's remember that generators are not regular functions, so in a generator, the construction value = generator() will do nothing other than create a generator object. What would be the semantics for making a generator return a value? It will have to be after the iteration is done.

When a generator returns a value, its iteration is immediately stopped (it can't be iterated any further). To preserve the semantics, the StopIteration exception is still raised, and the value to be returned is stored inside the exception object. It's the responsibility of the caller to catch it.

In the following example, we are creating a simple generator that produces two values and then returns a third. Notice how we have to catch the exception in order to get this value, and how it's stored precisely inside the exception under the attribute named value:

>>> def generator():
...     yield 1
...     yield 2
...     return 3
... 
>>> value = generator()
>>> next(value)
1
>>> next(value)
2
>>> try:
...     next(value)
... except StopIteration as e:
...     print(f">>>>>> returned value: {e.value}")
... 
>>>>>> returned value: 3

As we'll see later, this mechanism is used to make coroutines return values. Before PEP-380, this didn't make any sense, and any attempt at having a return statement inside a generator was considered a syntax error. But now, the idea is that, when the iteration is over, we want to return a final value, and the way to provide it is to store it in the exception being raised at the end of the iteration (StopIteration). That might not be the cleanest approach, but it's completely backward-compatible, as it doesn't change the interface of the generator.

Delegating into smaller coroutines – the 'yield from' syntax

The previous feature is interesting in the sense that it opens up a lot of new possibilities with coroutines (generators), now that they can return values. But this feature, by itself, would not be so useful without proper syntax support, because catching the returned value this way is a bit cumbersome.

This is one of the main features of the yield from syntax. Among other things (that we'll review in detail), it can collect the value returned by a sub-generator. Remember that we said that returning data in a generator was nice, but that, unfortunately, writing statements as value = generator() wouldn't work? Well, writing them as value = yield from generator() would.

The simplest use of yield from

In its most basic form, the new yield from syntax can be used to chain generators from nested for loops into a single one, which will end up producing all the values in a single continuous stream.

A canonical example is about creating a function similar to itertools.chain() from the standard library. This is a very nice function because it allows you to pass any number of iterables and will return them all together in one stream.

The naïve implementation might look like this:

def chain(*iterables):
    for it in iterables:
        for value in it:
            yield value

It receives a variable number of iterables and traverses all of them. Since each one is an iterable, it supports the for ... in ... construction, so we have another for loop to get every value inside each particular iterable and produce it to the caller.

This might be helpful in multiple cases, such as chaining generators together or trying to iterate over things that it wouldn't normally be possible to process in one go (such as lists together with tuples, and so on).

However, the yield from syntax allows us to go further and avoid the nested loop because it's able to produce the values from a sub-generator directly. In this case, we could simplify the code like this:

def chain(*iterables):
    for it in iterables:
        yield from it

Notice that for both implementations, the behavior of the generator is exactly the same:

>>> list(chain("hello", ["world"], ("tuple", " of ", "values.")))
['h', 'e', 'l', 'l', 'o', 'world', 'tuple', ' of ', 'values.']

This means that we can use yield from over any other iterable, and it will work as if the top-level generator (the one the yield from is using) were generating those values itself.

This works with any iterable, and even generator expressions aren't the exception. Now that we're familiar with its syntax, let's see how we could write a simple generator function that will produce all the powers of a number (for instance, if provided with all_powers(2, 3), it will have to produce 2^0, 2^1,... 2^3):

def all_powers(n, pow):
    yield from (n ** i for i in range(pow + 1))

While this simplifies the syntax a bit, saving one line of a for statement isn't a big advantage, and it wouldn't justify adding such a change to the language.

Indeed, this is actually just a side effect and the real raison d'être of the yield from construction is what we are going to explore in the following two sections.

Capturing the value returned by a sub-generator

In the following example, we have a generator that calls another two nested generators, producing values in a sequence. Each one of these nested generators returns a value, and we will see how the top-level generator is able to effectively capture the return value since it's calling the internal generators through yield from:

def sequence(name, start, end):
    logger.info("%s started at %i", name, start)
    yield from range(start, end)
    logger.info("%s finished at %i", name, end)
    return end

def main():
    step1 = yield from sequence("first", 0, 5)
    step2 = yield from sequence("second", step1, 10)
    return step1 + step2

This is a possible execution of the code in main while it's being iterated:

>>> g = main()
>>> next(g)
INFO:generators_yieldfrom_2:first started at 0
0
>>> next(g)
1
>>> next(g)
2
>>> next(g)
3
>>> next(g)
4
>>> next(g)
INFO:generators_yieldfrom_2:first finished at 5
INFO:generators_yieldfrom_2:second started at 5
5
>>> next(g)
6
>>> next(g)
7
>>> next(g)
8
>>> next(g)
9
>>> next(g)
INFO:generators_yieldfrom_2:second finished at 10
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration: 15

The first line of main delegates into the internal generator, and produces the values, extracting them directly from it. This is nothing new, as we have already seen. Notice, though, how the sequence() generator function returns the end value, which is assigned in the first line to the variable named step1, and how this value is correctly used at the start of the following instance of that generator.

In the end, this other generator also returns the second end value (10), and the main generator, in turn, returns the sum of them (5+10=15), which is the value we see once the iteration has stopped.

We can use yield from to capture the last value of a coroutine after it has finished its processing.

With this example and the ones presented in the previous section, you can get an idea of what the yield from construction does in Python. The yield from construction will take the generator, and forward the iteration of it downstream, but once it's done, it'll catch its StopIteration exception, get the value of it, and return that value to the caller function. The value attribute of the StopIteration exception becomes the result of the statement.
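
Conceptually, and leaving aside the delegation of send(), throw(), and close() that PEP-380 also specifies, result = yield from sub behaves roughly like the following simplified sketch:

def simplified_yield_from(sub):
    """Rough, simplified equivalent of: result = yield from sub."""
    try:
        while True:
            yield next(sub)      # forward every produced value to our own caller
    except StopIteration as e:
        result = e.value         # capture the sub-generator's return value
    return result                # which becomes this generator's own return value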

This is a powerful construction, because in conjunction with the topic of the next section (how to send and receive contextual information from a sub-generator), this means coroutines can take the shape of something similar to threads.

Sending and receiving data to and from a sub-generator

Now, we will see the other nice feature of the yield from syntax, which is probably what gives it its full power. As we already introduced when we explored generators acting as coroutines, we know that we can send values and throw exceptions at them, and, in such cases, the coroutine will either receive the value for its internal processing, or it will have to handle the exception accordingly.

If we now have a coroutine that delegates into other ones (as in the previous example), we would also like to preserve this logic. Having to do so manually would be quite complex (you can take a look at the code described in PEP-380 to get an idea of what we would have to handle if yield from didn't do it for us automatically).

In order to illustrate this, let's keep the same top-level generator (main) unmodified with respect to the previous example (calling other internal generators), but let's modify the internal generators to make them able to receive values and handle exceptions.

The code is probably not idiomatic, only for the purposes of showing how this mechanism works:

def sequence(name, start, end):
    value = start
    logger.info("%s started at %i", name, value)
    while value < end:
        try:
            received = yield value
            logger.info("%s received %r", name, received)
            value += 1
        except CustomException as e:
            logger.info("%s is handling %s", name, e)
            received = yield "OK"
    return end

Now, we will call the main coroutine, not only by iterating it, but also by providing values and throwing exceptions at it in order to see how they are handled inside sequence:

>>> g = main()
>>> next(g)
INFO: first started at 0
0
>>> next(g)
INFO: first received None
1
>>> g.send("value for 1")
INFO: first received 'value for 1'
2
>>> g.throw(CustomException("controlled error"))
INFO: first is handling controlled error
'OK'
... # advance more times
INFO:second started at 5
5
>>> g.throw(CustomException("exception at second generator"))
INFO: second is handling exception at second generator
'OK'

This example tells us a lot of different things. Notice how we never send values to sequence, but only to main, and even so, the code that receives those values is the nested generator. Even though we never explicitly send anything to sequence, it receives the data as it's passed along by yield from.

The main coroutine calls two other coroutines internally, producing their values, and at any particular point in time it will be suspended in one of them. When it's suspended at the first one, we can see the logs telling us that it was that instance of the coroutine that received the value we sent. The same happens when we throw an exception at it. When the first coroutine finishes, it returns the value that is assigned to the variable named step1, which is then passed as input to the second coroutine, which will do the same (it will handle the send() and throw() calls accordingly).

The same happens for the values that each coroutine produces. When we are at any given step, the return value of calling send() corresponds to the value that the sub-coroutine (the one main is currently suspended at) has produced. When we throw an exception that is being handled, the sequence coroutine produces the value "OK", which is propagated to the calling coroutine (main), and that, in turn, will end up at main's caller.

As anticipated, these methods, together with yield from, provide us with a lot of new functionality (something that can resemble threads). This opens up the doors for asynchronous programming, which we will explore next.

Asynchronous programming

With the constructions we have seen so far, we can create asynchronous programs in Python. This means that we can create programs that have many coroutines, schedule them to work in a particular order, and switch between them while each one is suspended on a yield from expression.

The main advantage we get from this is the possibility of parallelizing I/O operations in a non-blocking way. What we need is a low-level generator (usually implemented by a third-party library) that knows how to handle the actual I/O while the coroutine is suspended. The idea is for the coroutine to suspend itself so that our program can handle another task in the meantime. The way the application regains control is by means of the yield from statement, which suspends the coroutine and produces a value to the caller (as in the examples we saw previously when we used this syntax to alter the control flow of the program).

This is roughly the way asynchronous programming had been working in Python for quite a few years, until it was decided that better syntactic support was needed.

The fact that coroutines and generators are technically the same causes some confusion. Syntactically (and technically), they are the same, but semantically, they are different. We create generators when we want to achieve efficient iteration. We typically create coroutines with the goal of running non-blocking I/O operations.

While this difference is clear, the dynamic nature of Python would still allow developers to mix these different types of objects, ending up with a runtime error at a very late stage of the program. Remember that in the simplest and most basic form of the yield from syntax, we used this construction over iterable objects (we created a sort of chain function applied over strings, lists, and so on). None of these objects were coroutines, and it still worked. Then, we saw that we can have multiple coroutines, use yield from to send the value (or exceptions), and get some results back. These are clearly two very different use cases; however, if we write something along the lines of the following statement:

result = yield from iterable_or_awaitable()

It's not clear what iterable_or_awaitable returns. It can be a simple iterable such as a string, and it might still be syntactically correct. Or, it might be an actual coroutine. The cost of this mistake will be paid much later, at runtime.

For this reason, the typing system in Python had to be extended. Before Python 3.5, coroutines were just generators with a @coroutine decorator applied, and they were to be called with the yield from syntax. Now, there is a specific type of object the Python interpreter recognizes as such, that is, a coroutine.

This change heralded syntax changes as well. The await and async def syntax were introduced. The former is intended to be used instead of yield from, and it only works with awaitable objects (which coroutines conveniently happen to be). Trying to call await with something that doesn't respect the interface of an awaitable will raise an exception (this is a good example of how interfaces can help to achieve a more solid design, preventing runtime errors).

async def is the new way of defining coroutines, replacing the aforementioned decorator, and it creates an object that, when called, returns an instance of a coroutine. In the same way that invoking a generator function makes the interpreter return a generator object, invoking a function defined with async def gives you a coroutine object, which has an __await__ method and can therefore be used in await expressions.
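
To make this more concrete, here is a minimal sketch using only asyncio.sleep from the standard library (the function name fetch_value is just illustrative):

import asyncio


async def fetch_value():
    await asyncio.sleep(0.1)  # suspends this coroutine; control returns to the event loop
    return 42


coro = fetch_value()       # calling the function only creates a coroutine object
print(type(coro))          # <class 'coroutine'>; it has an __await__ method
print(asyncio.run(coro))   # the event loop drives the coroutine to completion: prints 42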

Without going into all the details and possibilities of asynchronous programming in Python, we can say that despite the new syntax and the new types, this is not doing anything fundamentally different from the concepts we have covered in this chapter.

The idea behind programming asynchronously in Python is that there is an event loop (typically asyncio, because it's the one included in the standard library, but there are many others that will work just the same) that manages a series of coroutines. These coroutines belong to the event loop, which will call them according to its scheduling mechanism. When each one of them runs, it will call our code (according to the logic we have defined inside the coroutine we programmed), and when we want to get control back to the event loop, we call await <coroutine>, which processes a task asynchronously. The event loop will resume, and another coroutine will run while that operation is in progress.

This mechanism represents the basics of how asynchronous programming works in Python. You can think of the new syntax added for coroutines (async def / await) as just an API for writing code that will be called by the event loop. By default, that event loop will typically be asyncio, because it's the one that comes in the standard library, but any event loop system that matches the API would work. This means you can use libraries like uvloop (https://github.com/MagicStack/uvloop) and trio (https://github.com/python-trio/trio), and the code would work the same. You can even register your own event loop, and it should also work the same (provided it complies with the API, that is).
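
For instance, the following sketch (with illustrative names) shows the event loop interleaving two coroutines: while one of them is suspended on await asyncio.sleep(), the other one gets to run:

import asyncio


async def task(name, delay):
    print(f"{name}: suspending")
    await asyncio.sleep(delay)   # hand control back to the event loop
    print(f"{name}: resumed")


async def run_tasks():
    # both coroutines are scheduled on the same event loop and run concurrently
    await asyncio.gather(task("first", 0.2), task("second", 0.1))


asyncio.run(run_tasks())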

In practice, there are more particularities and edge cases that are beyond the scope of this book. It is, however, worth mentioning that these concepts are related to the ideas introduced in this chapter and that this arena is another place where generators demonstrate being a core concept of the language, as there are many things constructed on top of them.

Magic asynchronous methods

I've made the case in previous chapters (and hopefully convinced you) that, whenever possible, we can take advantage of Python's magic methods to make the abstractions we create blend naturally with the syntax of the language and, in this way, achieve better, more compact, and perhaps cleaner code.

But what happens if on any of these methods we need to call a coroutine? If we have to call await in a function, that means the function itself would have to be a coroutine (defined with async def), or else there will be a syntax error.

But then, how does this work with the current syntax and magic methods? It doesn't. We need new syntax, and new magic methods, in order to work with asynchronous programming. The good news is that they're analogous to the previous ones.

Here's a summary of the new magic methods and how they relate to the new syntax.

Concept         | Magic methods         | Syntax usage
Context manager | __aenter__, __aexit__ | async with async_cm() as x: ...
Iteration       | __aiter__, __anext__  | async for e in aiter: ...

Table 7.2: Asynchronous syntax and their magic methods

This new syntax is mentioned in PEP-492 (https://www.python.org/dev/peps/pep-0492/).

Asynchronous context managers

The idea is simple: if we need to call a coroutine from within a context manager, we can't use the normal __enter__ and __exit__ methods, because they're defined as regular functions; instead, we have to use the new __aenter__ and __aexit__ coroutine methods. And instead of entering it merely by using with, we have to use async with.

There's even an @asynccontextmanager decorator available in the contextlib module, to create an asynchronous context manager in the same way as shown before.

The async with syntax for asynchronous context managers works in a similar way: when the context is entered, the __aenter__ coroutine is called automatically, and when it's being exited, __aexit__ will trigger. It's even possible to group multiple asynchronous context managers in the same async with statement, but it's not possible to mix them with regular ones. An attempt to use a regular context manager with the async with syntax will fail with an AttributeError.

Our example from Chapter 2, Pythonic Code, would look like the following code if adapted to asynchronous programming:

import contextlib


@contextlib.asynccontextmanager
async def db_management():
    # stop_database() and start_database() are assumed to be coroutines here
    try:
        await stop_database()
        yield
    finally:
        await start_database()

Moreover, if we had more than one context manager that we wanted to use, we could do, for example:

@contextlib.asynccontextmanager
async def metrics_logger():
    yield await create_metrics_logger()
 
 
async def run_db_backup():
    async with db_management(), metrics_logger():
        print("Performing DB backup...")

As you'd expect, the contextlib module also provides the abstract base class AbstractAsyncContextManager, which defines the asynchronous context manager interface in terms of the __aenter__ and __aexit__ methods.
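
For illustration, a minimal class-based sketch built on that base class could look like this (DBHandler is a hypothetical name; stop_database and start_database are the same coroutines assumed in the example above):

import contextlib


class DBHandler(contextlib.AbstractAsyncContextManager):
    async def __aenter__(self):
        await stop_database()   # assumed coroutine, as above
        return self

    async def __aexit__(self, exc_type, exc_value, exc_tb):
        await start_database()  # assumed coroutine, as above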

Other magic methods

What happens with the rest of the magic methods? Do they all get their asynchronous counterpart? No, but there's something I wanted to point out about that: it shouldn't be needed.

Remember that achieving clean code is in part about making sure you distribute the responsibilities correctly in the code and place things in their proper places. To give an example, if you're thinking about calling a coroutine inside a __getattr__ method, there's something probably amiss in your design, as there should probably be a better place for that coroutine.

Coroutines that we await are used in order to have parts of our code running concurrently, so they typically relate to external resources being managed, whereas the logic we put in the rest of the magic methods (__getitem__, __getattr__, etc.) should be object-oriented code, or code that can be resolved in terms of solely the internal representation of that object.

By the same token (and also following up on good design practices), it wouldn't be good to make __init__ a coroutine, because we typically want lightweight objects that we can initialize safely without side effects. Even better, we have already covered the benefits of using dependency injection, so that's even more reason not to want an asynchronous initialization method: our object should work with dependencies already initialized.
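
As a small sketch of that idea (all names here are hypothetical), the coroutine that acquires the resource runs outside the object, and the already-initialized dependency is then injected through a plain, synchronous __init__:

class DataProcessor:
    def __init__(self, connection):
        # lightweight, synchronous initialization: the dependency arrives ready to use
        self._connection = connection


async def build_processor():
    connection = await open_connection()  # hypothetical coroutine doing the asynchronous work
    return DataProcessor(connection)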

The second case of the previous table, asynchronous iteration, is of more interest for the purposes of this chapter, so we'll explore it in the next section.

The syntax for asynchronous iteration (async for) works with any asynchronous iterator, whether it is created by us (as we'll see how to do in the next section), or whether it's an asynchronous generator (which we'll see in the section after that).

Asynchronous iteration

In the same way that we have the iterator objects we saw at the beginning of the chapter (that is, objects that support being iterated over with Python's built-in for loop), we can do the same, but in an asynchronous fashion.

Imagine we want to create an iterator to abstract the way in which we read data from an external source (like a database), but the part that extracts the data itself is a coroutine, so we couldn't call it during the already familiar __next__ operation as before. That's why we need to make use of the __anext__ coroutine.

The following example illustrates in a simple way how this can be achieved. Disregarding external dependencies, or any other accidental complexity, we'll focus on the methods that make this type of operation possible, in order to study them:

import asyncio
import random
 
 
async def coroutine():
    await asyncio.sleep(0.1)
    return random.randint(1, 10000)
 
 
class RecordStreamer:
    def __init__(self, max_rows=100) -> None:
        self._current_row = 0
        self._max_rows = max_rows
 
    def __aiter__(self):
        return self
 
    async def __anext__(self):
        if self._current_row < self._max_rows:
            row = (self._current_row, await coroutine())
            self._current_row += 1
            return row
        raise StopAsyncIteration

The first method, __aiter__, is used to indicate that the object is an asynchronous iterator. Just as in the synchronous version, most of the time it's enough to return self, and therefore it doesn't need to be a coroutine.

But __anext__, on the other hand, is precisely the part of our code where our asynchronous logic lies, so it needs to be a coroutine for starters. In this case, we're awaiting another coroutine in order to produce part of the data we want to return.

It also needs a separate exception to signal the end of the iteration, which in this case is StopAsyncIteration.

This exception works in an analogous way, only that it's meant for the async for kind of loops. When encountered, the interpreter will finish the loop.

This sort of object can be used in the following form:

async for row in RecordStreamer(10):
    ...
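
To actually drive that loop outside the asyncio REPL, a minimal sketch (consume is an illustrative name) would wrap it in a coroutine and hand it to the event loop:

import asyncio


async def consume():
    async for row in RecordStreamer(10):
        print(row)


asyncio.run(consume())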

You can clearly see how this is analogous to the synchronous version we explored at the beginning of the chapter. One important distinction, though, is that, as we would expect, the next() function won't work on this object (it doesn't implement __next__, after all), so advancing an asynchronous iterator by one place requires a different idiom.

Advancing the asynchronous iterator by one place could be achieved by doing something like the following:

await async_iterator.__anext__()

But more interesting constructions, such as the one we saw earlier that used the next() function over a generator expression to search for the first value meeting certain conditions, aren't supported either, because next() isn't capable of handling asynchronous iterators.

Inspired by the previous idiom, we can create a generator expression using the asynchronous iteration, and then take the first value from it. Better yet, we can create our own version of this function to work with asynchronous generators, which might look like this:

NOT_SET = object()
 
async def anext(async_generator_expression, default=NOT_SET):
    try:
        return await async_generator_expression.__anext__()
    except StopAsyncIteration:
        if default is NOT_SET:
            raise
        return default

Starting from Python 3.8, the asyncio module has a nice capability that allows us to interact with coroutines directly from the REPL. That way, we can test interactively how the previous code would work:

$ python -m asyncio
>>> streamer = RecordStreamer(10)
>>> await anext(streamer)
(0, 5017)
>>> await anext(streamer)
(1, 5257)
>>> await anext(streamer)
(2, 3507)
...
>>> await anext(streamer)
(9, 5440)
>>> await anext(streamer)
Traceback (most recent call last):
    ...
    raise StopAsyncIteration
StopAsyncIteration
>>>

You'll note that it resembles the original next() function both in terms of interface and behavior.

Now we know how to use iteration in asynchronous programming, but we can do better than that. Most of the time we just need a generator and not a whole iterator object. Generators have the advantage that their syntax makes them easier to write and understand, so in the next section, I'll mention how to create generators for asynchronous programs.

Asynchronous generators

Before Python 3.6, the functionality explored in the previous section was the only way to achieve asynchronous iteration in Python. Because of the intricacies of the coroutines and generators we explored in previous sections, trying to use the yield statement inside a coroutine was not entirely defined, hence not allowed (for example, would yield try to suspend the coroutine, or generate a value for the caller?).

Asynchronous generators were introduced in PEP-525 (https://www.python.org/dev/peps/pep-0525/).

The issue with using the yield keyword inside a coroutine was solved in this PEP; it's now allowed, but with a different and clear meaning. Unlike in the first example of coroutines we saw, yield inside a properly defined coroutine (one created with async def) doesn't mean suspending or pausing the execution of that coroutine, but producing a value for the caller. This is an asynchronous generator: the same idea as the generators we saw at the very beginning of the chapter, but usable in an asynchronous way (meaning it will probably await other coroutines inside its definition).

The main advantage of asynchronous generators over asynchronous iterators is the same advantage regular generators have: they allow us to achieve the same thing in a more compact way.

As promised, the previous example looks more compact when written with an asynchronous generator:

async def record_streamer(max_rows):
    current_row = 0
    while current_row < max_rows:
        row = (current_row, await coroutine())
        current_row += 1
        yield row
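
Inside a coroutine (or the asyncio REPL), the asynchronous generator is consumed exactly like the iterator version, and, since it also implements __anext__, the anext() helper defined earlier works on it as well:

async for row in record_streamer(10):
    ...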

This feels closer to a regular generator, as the structure is the same except for the async def / await constructions. Moreover, you'll have to remember fewer details (as to which methods need to be implemented and which exception has to be raised), so I'd recommend that, whenever possible, you try to favor asynchronous generators over iterators.

This concludes our journey through iteration in Python and asynchronous programming. In particular, this last topic we've just explored is the pinnacle of it, because it relates to all the concepts we've learned in this chapter.

Summary

Generators are everywhere in Python. Since their introduction a long time ago, they have proved to be a great addition that makes programs more efficient and iteration much simpler.

As time passed and more complex tasks needed to be supported, generators helped again by serving as the basis for coroutines.

And, while in Python coroutines are technically generators, we shouldn't forget that they're semantically different. Generators are created with the idea of iteration, while coroutines have the goal of asynchronous programming (suspending and resuming the execution of a part of our program at any given time). This distinction became so important that it made Python's syntax (and type system) evolve.

Iteration and asynchronous programming constitute the last of the main pillars of Python programming. Now, it's time to see how everything fits together and to put all of these concepts we have been exploring over the past few chapters into action. This means that by now, you have a complete understanding of Python's capabilities.

It's now time to use this to your advantage, so in the next chapters, we'll see how to put these concepts into action, related to more general ideas of software engineering, such as testing, design patterns, and architecture.

We'll start this new part of our journey by exploring unit testing and refactoring in the next chapter.

References

Here is a list of information you can refer to:

PEP-380: Syntax for Delegating to a Subgenerator (https://www.python.org/dev/peps/pep-0380/)
PEP-492: Coroutines with async and await syntax (https://www.python.org/dev/peps/pep-0492/)
PEP-525: Asynchronous Generators (https://www.python.org/dev/peps/pep-0525/)