Some performance considerations

So, we've seen that we have many different ways to achieve the same result. We can use any combination of map, zip, and filter, or choose to go with a comprehension, or maybe choose to use a generator, either function or expression. We may even decide to go with for loops: when the logic to apply at each iteration isn't simple, they may be the best option.

Readability concerns aside, let's talk about performance. When it comes to performance, there are usually two factors that play a major role: space and time.

Space means the size of the memory that a data structure is going to take up. The best way to choose is to ask yourself whether you really need a list (or tuple), or whether a simple generator function would work just as well. If a generator will do, go with it: it will save a lot of space. The same goes for functions: if you don't actually need them to return a list or tuple, you can transform them into generator functions as well.
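
To get a feel for the space difference, here is a small sketch of mine (the file name and the sizes are illustrative; exact figures vary with platform and Python version) that uses sys.getsizeof to compare a fully built list with an equivalent generator expression:

# space.sketch.py
import sys

mx = 10 ** 6
as_list = [n ** 2 for n in range(mx)]  # one million results, all held in memory
as_gen = (n ** 2 for n in range(mx))   # results produced lazily, one at a time

# getsizeof measures only the container itself (the list's array of references,
# or the generator object), not the individual items
print(sys.getsizeof(as_list))  # several megabytes, growing with mx
print(sys.getsizeof(as_gen))   # a couple of hundred bytes, regardless of mx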

Sometimes, you will have to use lists (or tuples); for example, some algorithms scan sequences using multiple pointers, or run over the sequence more than once. A generator function (or expression) can be iterated over only once, and then it is exhausted, so in these situations it wouldn't be the right choice.
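
A quick sketch (the names are mine, purely for illustration) shows what exhaustion means in practice:

# exhausted.sketch.py
numbers = (n for n in range(5))  # a generator expression
print(list(numbers))  # [0, 1, 2, 3, 4]
print(list(numbers))  # [] - the generator is now exhausted

numbers = [n for n in range(5)]  # a list, on the other hand,
print(list(numbers))  # [0, 1, 2, 3, 4]
print(list(numbers))  # [0, 1, 2, 3, 4] - can be scanned as many times as we want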

Time is a bit harder than space because it depends on more variables, so it isn't possible to state with absolute certainty that X is faster than Y for all cases. However, based on tests run on recent versions of Python, we can say that, on average, map exhibits performance similar to that of list comprehensions and generator expressions, while for loops are consistently slower.

In order to fully appreciate the reasoning behind these statements, we would need to understand how Python works under the hood, which is a bit outside the scope of this book, as it gets quite technical. Let's just say that map and list comprehensions run at C-language speed within the interpreter, while a Python for loop is executed as Python bytecode within the Python Virtual Machine, which is often much slower.

There are several different implementations of Python. The original one, and still the most common one, is CPython (https://github.com/python/cpython), which is written in C. C is one of the most powerful and popular programming languages still used today.

How about we do a small exercise and try to find out whether the claims I made are accurate? I will write a small piece of code that collects the results of divmod(a, b) for a certain set of integer pairs, (a, b). I will use the time function from the time module to calculate the elapsed time of the operations that I will perform:

# performances.py
from time import time
mx = 5000

t = time() # start time for the for loop
floop = []
for a in range(1, mx):
    for b in range(a, mx):
        floop.append(divmod(a, b))
print('for loop: {:.4f} s'.format(time() - t)) # elapsed time

t = time() # start time for the list comprehension
compr = [
    divmod(a, b) for a in range(1, mx) for b in range(a, mx)]
print('list comprehension: {:.4f} s'.format(time() - t))

t = time() # start time for the generator expression
gener = list(
    divmod(a, b) for a in range(1, mx) for b in range(a, mx))
print('generator expression: {:.4f} s'.format(time() - t))

As you can see, we're creating three lists: floop, compr, and gener. Running the code produces the following:

$ python performances.py
for loop: 4.4814 s
list comprehension: 3.0210 s
generator expression: 3.4334 s

The list comprehension runs in ~67% of the time taken by the for loop. That's impressive. The generator expression comes quite close, at ~77%. The reason the generator expression is slower is that we need to feed it to the list() constructor, which has a little more overhead compared to a plain list comprehension. If I didn't have to retain the results of those calculations, a generator would probably have been a more suitable option.
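
For example, if we only needed to consume those results once, say to count them or to feed them into some aggregate, we could skip the list() call entirely. A hypothetical variant (my own sketch, not part of the script above) might look like this:

# generator.only.sketch.py - results are consumed on the fly, never stored
from time import time

mx = 5000
t = time()
pairs = (divmod(a, b) for a in range(1, mx) for b in range(a, mx))
count = sum(1 for _ in pairs)  # consume the generator without ever building a list
print('generator (no list): {:.4f} s'.format(time() - t))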

It is interesting to notice that, within the body of the for loop, we are appending data to a list. This implies that, behind the scenes, Python does the work of resizing it every now and then, allocating space for the items to be appended. I guessed that preallocating a list of zeros, and simply filling it with the results, might speed up the for loop, but I was wrong. Check it for yourself; you need exactly mx * (mx - 1) // 2 elements to be preallocated.
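
If you want to run that experiment yourself, a possible variant (my own sketch, not part of the original script) is the following:

# prealloc.sketch.py - preallocate the result list, then fill it by index
from time import time

mx = 5000
t = time()
prealloc = [0] * (mx * (mx - 1) // 2)  # exactly one slot per (a, b) pair
i = 0
for a in range(1, mx):
    for b in range(a, mx):
        prealloc[i] = divmod(a, b)
        i += 1
print('for loop (preallocated): {:.4f} s'.format(time() - t))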

Let's see a similar example that compares a for loop and a map call:

# performances.map.py
from time import time
mx = 2 * 10 ** 7

t = time()
absloop = []
for n in range(mx):
    absloop.append(abs(n))
print('for loop: {:.4f} s'.format(time() - t))

t = time()
abslist = [abs(n) for n in range(mx)]
print('list comprehension: {:.4f} s'.format(time() - t))

t = time()
absmap = list(map(abs, range(mx)))
print('map: {:.4f} s'.format(time() - t))

This code is conceptually very similar to the previous example. The only thing that has changed is that we're applying the abs function instead of the divmod one, and we have only one loop instead of two nested ones. Execution gives the following result:

$ python performances.map.py
for loop: 3.8948 s
list comprehension: 1.8594 s
map: 1.1548 s

And map wins the race: it takes ~62% of the time of the list comprehension and ~30% of the time of the for loop. Take these results with a pinch of salt, as things might differ according to various factors, such as the OS and the Python version. But in general, I think it is safe to say that these results are good enough to give you an idea when it comes to coding for performance.
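
Incidentally, the measurements in this section are single wall-clock runs. If you want more stable numbers, one option (not used in these scripts, just a sketch) is the standard library's timeit module, which runs a statement repeatedly and lets you take the best result:

# timeit.sketch.py - an alternative way of measuring, using timeit
import timeit

mx = 10 ** 5  # smaller than before, since the statement is executed many times
stmt = 'list(map(abs, range(mx)))'
# repeat() returns one total time per run; the minimum is usually the most stable
timings = timeit.repeat(stmt, globals={'mx': mx}, number=100, repeat=5)
print('best of 5 runs: {:.4f} s'.format(min(timings)))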

Apart from these small case-by-case differences, though, it's quite clear that the for loop option is the slowest one, so let's see why we would still want to use it.
