Profiling Python

There are a few different ways to profile a Python application. Profiling means having the application run while keeping track of several different parameters, such as the number of times a function is called and the amount of time spent inside it. Profiling can help us find the bottlenecks in our application, so that we can improve only what is really slowing us down.

If you take a look at the profiling section in the standard library official documentation, you will see that there are a couple of different implementations of the same profiling interface—profile and cProfile:

cProfile is recommended for most users, it's a C extension with reasonable overhead that makes it suitable for profiling long-running programs
profile is a pure Python module whose interface is imitated by cProfile, but which adds significant overhead to profiled programs

This interface does determinist profiling, which means that all function calls, function returns, and exception events are monitored, and precise timings are made for the intervals between these events. Another approach, called statistical profiling, randomly samples the effective instruction pointer, and deduces where time is being spent.

The latter usually involves less overhead, but provides only approximate results. Moreover, because of the way the Python interpreter runs the code, deterministic profiling doesn't add as much overhead as one would think, so I'll show you a simple example using cProfile from the command line.

We're going to calculate Pythagorean triples (I know, you've missed them...) using the following code:

# profiling/triples.py
def calc_triples(mx):
    triples = []
    for a in range(1, mx + 1):
        for b in range(a, mx + 1):
            hypotenuse = calc_hypotenuse(a, b)
            if is_int(hypotenuse):
                triples.append((a, b, int(hypotenuse)))
    return triples

def calc_hypotenuse(a, b):
    return (a**2 + b**2) ** .5

def is_int(n):  # n is expected to be a float
    return n.is_integer()

triples = calc_triples(1000)

The script is extremely simple; we iterate over the interval [1, mx] with a and b (avoiding repetition of pairs by setting b >= a) and we check whether they belong to a right triangle. We use calc_hypotenuse to get hypotenuse for a and b, and then, with is_int, we check whether it is an integer, which means (a, b, c) is a Pythagorean triple. When we profile this script, we get information in a tabular form. The columns are ncalls, tottime, percall, cumtime, percall, and filename:lineno(function). They represent the amount of calls we made to a function, how much time we spent in it, and so on. I'll trim a couple of columns to save space, so if you run the profiling yourself—don't worry if you get a different result. Here is the code:

$ python -m cProfile triples.py
1502538 function calls in 0.704 seconds
Ordered by: standard name

ncalls tottime percall filename:lineno(function)
500500   0.393   0.000 triples.py:17(calc_hypotenuse)
500500   0.096   0.000 triples.py:21(is_int)
     1   0.000   0.000 triples.py:4(<module>)
     1   0.176   0.176 triples.py:4(calc_triples)
     1   0.000   0.000 {built-in method builtins.exec}
  1034   0.000   0.000 {method 'append' of 'list' objects}
     1   0.000   0.000 {method 'disable' of '_lsprof.Profil...
500500   0.038   0.000 {method 'is_integer' of 'float' objects}

Even with this limited amount of data, we can still infer some useful information about this code. First, we can see that the time complexity of the algorithm we have chosen grows with the square of the input size. The amount of times we get inside the inner loop body is exactly mx (mx + 1) / 2. We run the script with mx = 1000, which means we get 500500 times inside the inner for loop. Three main things happen inside that loop: we call calc_hypotenuse, we call is_int, and, if the condition is met, we append it to the triples list.

Taking a look at the profiling report, we notice that the algorithm has spent 0.393 seconds inside calc_hypotenuse, which is way more than the 0.096 seconds spent inside is_int, given that they were called the same number of times, so let's see whether we can boost calc_hypotenuse a little.

As it turns out, we can. As I mentioned earlier in this book, the ** power operator is quite expensive, and in calc_hypotenuse, we're using it three times. Fortunately, we can easily transform two of those into simple multiplications, like this:

def calc_hypotenuse(a, b): 
    return (a*a + b*b) ** .5

This simple change should improve things. If we run the profiling again, we see that 0.393 is now down to 0.137. Not bad! This means now we're spending only about 37% of the time inside calc_hypotenuse that we were before.

Let's see whether we can improve is_int as well, by changing it, like this:

def is_int(n): 
    return n == int(n)

This implementation is different, and the advantage is that it also works when n is an integer. Alas, when we run the profiling against it, we see that the time taken inside the is_int function has gone up to 0.135 seconds, so, in this case, we need to revert to the previous implementation. You will find the three versions in the source code for the book.

This example was trivial, of course, but enough to show you how one could profile an application. Having the amount of calls that are performed against a function helps us better understand the time complexity of our algorithms. For example, you wouldn't believe how many coders fail to see that those two for loops run proportionally to the square of the input size.

One thing to mention: depending on what system you're using, results may be different. Therefore, it's quite important to be able to profile software on a system that is as close as possible to the one the software is deployed on, if not actually on that one.

Table of Contents for Profiling Python

Create new playlist

Sign In

Sign Up

Table of Contents for
Profiling Python