Profiling Cython

Cython provides a feature, called annotated view, that helps identify which lines are executed through the Python interpreter and which are good candidates for further optimization. We can turn this feature on by compiling a Cython file with the -a option. Cython will then generate an HTML file containing our code annotated with useful information. The -a option is used as follows:

$ cython -a cevolve.pyx
$ firefox cevolve.html

The HTML file displayed in the following screenshot shows our Cython file line by line:

Each line of the source code appears in a shade of yellow: a more intense color corresponds to more interpreter-related calls, while white lines are translated to plain C code. Since interpreter calls substantially slow down execution, the objective is to make the function body as white as possible. By clicking on any line, we can inspect the code generated by the Cython compiler. For example, the v_y = x/norm line checks that norm is not zero and raises a ZeroDivisionError otherwise. The x = r_i[j, 0] line shows that Cython checks whether the indexes are within the bounds of the array. You may note that the last line is of a very intense color; by inspecting the code, we can see that this is actually a glitch: the code refers to boilerplate related to the end of the function.

Cython can disable checks, such as the check for division by zero, to remove those extra interpreter-related calls; this is usually accomplished through compiler directives. There are a few different ways to add compiler directives:

  • Using a decorator or a context manager
  • Using a comment at the beginning of the file
  • Using the Cython command-line options

For a complete list of the Cython compiler directives, you can refer to the official documentation at http://docs.cython.org/src/reference/compilation.html#compiler-directives.

For example, to disable bounds checking for arrays, it is sufficient to decorate a function with cython.boundscheck, in the following way:

    cimport cython

    @cython.boundscheck(False)
    def myfunction():
        # Code here

Alternatively, we can use cython.boundscheck to wrap a block of code into a context manager, as follows:

    with cython.boundscheck(False):
        # Code here

If we want to disable bounds checking for a whole module, we can add the following line of code at the beginning of the file:

    # cython: boundscheck=False 

To alter the directives with the command-line options, you can use the -X option as follows:

$ cython -X boundscheck=True cevolve.pyx

To disable the extra checks in our c_evolve function, we can disable the boundscheck directive and enable cdivision (this prevents checks for ZeroDivisionError), as in the following code:

    cimport cython

    @cython.boundscheck(False)
    @cython.cdivision(True)
    def c_evolve(double[:, :] r_i,
                 double[:] ang_speed_i,
                 double timestep,
                 int nsteps):

If we look at the annotated view again, the loop body has become completely white; we removed all traces of the interpreter from the inner loop. To recompile, just type python setup.py build_ext --inplace again. By running the benchmark, however, we note that we didn't obtain a performance improvement, suggesting that those checks are not part of the bottleneck:

    In [3]: %timeit benchmark(100, 'cython')
    100 loops, best of 3: 13.4 ms per loop

Another way to profile Cython code is through the cProfile module. As an example, we can write a simple function that calculates the Chebyshev distance between two sets of coordinates. Create a cheb.py file:

    import numpy as np
    from distance import chebyshev

    def benchmark():
        a = np.random.rand(100, 2)
        b = np.random.rand(100, 2)
        for x1, y1 in a:
            for x2, y2 in b:
                chebyshev(x1, x2, y1, y2)
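The distance module above is a compiled Cython extension whose source is not shown here; as a point of reference, a pure Python equivalent of the chebyshev function (its signature inferred from the call in benchmark, so treat it as a sketch) could look like this:

```python
def chebyshev(x1, x2, y1, y2):
    # Chebyshev distance between (x1, y1) and (x2, y2):
    # the maximum of the absolute coordinate differences
    return max(abs(x1 - x2), abs(y1 - y2))
```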

If we try profiling this script as-is, we won't get any statistics about the functions that we implemented in Cython. If we want to collect profiling information for the max and min functions, we need to add the profile=True directive to the mathlib.pyx file, as shown in the following code:

    # cython: profile=True

    cdef int max(int a, int b):
        # Code here

We can now profile our script with %prun using IPython, as follows:

    import cheb
    %prun cheb.benchmark()

    # Output:
    2000005 function calls in 2.066 seconds

    Ordered by: internal time

     ncalls  tottime  percall  cumtime  percall filename:lineno(function)
          1    1.664    1.664    2.066    2.066 cheb.py:4(benchmark)
    1000000    0.351    0.000    0.401    0.000 {distance.chebyshev}
    1000000    0.050    0.000    0.050    0.000 mathlib.pyx:2(max)
          2    0.000    0.000    0.000    0.000 {method 'rand' of 'mtrand.RandomState' objects}
          1    0.000    0.000    2.066    2.066 <string>:1(<module>)
          1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

From the output, we can see that the max function is present and is not a bottleneck. Most of the time is spent in the benchmark function itself, meaning that the bottleneck is likely the pure Python for loop. In this case, the best strategy would be to rewrite the loop in NumPy or port the code to Cython.
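As a sketch of the NumPy option (our own illustration, under the assumption that all pairwise distances are wanted), the double for loop can be replaced with a single broadcast operation:

```python
import numpy as np

def benchmark_numpy():
    a = np.random.rand(100, 2)
    b = np.random.rand(100, 2)
    # Broadcasting: a[:, None, :] has shape (100, 1, 2) and
    # b[None, :, :] has shape (1, 100, 2), so their difference
    # has shape (100, 100, 2), one entry per pair of points.
    diff = np.abs(a[:, None, :] - b[None, :, :])
    # The Chebyshev distance is the maximum over the coordinate axis
    return diff.max(axis=2)
```

This removes the million interpreted chebyshev calls entirely, at the cost of allocating a (100, 100, 2) intermediate array.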
