Reaching optimal performance with numexpr

When handling complex expressions, NumPy stores intermediate results in memory. David M. Cooke wrote a package called numexpr, which optimizes and compiles array expressions on the fly. It works by optimizing the usage of the CPU cache and by taking advantage of multiple processors.

Its usage is generally straightforward and is based on a single function--numexpr.evaluate. The function takes a string containing an array expression as its first argument. The syntax is basically identical to that of NumPy. For example, we can calculate a simple a + b * c expression in the following way:

    import numpy as np
    import numexpr as ne

    a = np.random.rand(10000)
    b = np.random.rand(10000)
    c = np.random.rand(10000)
    d = ne.evaluate('a + b * c')
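
As a quick sanity check, the result agrees with the equivalent plain NumPy computation (np.allclose is the standard NumPy helper for comparing arrays up to floating-point tolerance):

    np.allclose(d, a + b * c)  # True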

The numexpr package improves performance in almost all cases, but to get a substantial advantage, you should use it with large arrays. An application that involves large arrays is the calculation of a distance matrix. In a particle system, a distance matrix contains all the possible distances between the particles. To calculate it, we first compute the vectors connecting every pair of particles (i, j), as follows:

    x_ij = x_j - x_i
    y_ij = y_j - y_i

Then, we calculate the length of this vector by taking its norm, as in the following code:

    d_ij = sqrt(x_ij**2 + y_ij**2) 

We can write this in NumPy by employing the usual broadcasting rules (the operation is similar to the outer product):

    r = np.random.rand(10000, 2)
    r_i = r[:, np.newaxis]
    r_j = r[np.newaxis, :]
    d_ij = r_j - r_i

Finally, we calculate the norm over the last axis using the following line of code:

    d_ij = np.sqrt((d_ij ** 2).sum(axis=2)) 
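
To make the broadcasting explicit, r_i has shape (10000, 1, 2) and r_j has shape (1, 10000, 2), so their difference broadcasts to shape (10000, 10000, 2); after summing over the last axis and taking the square root, d_ij is the (10000, 10000) distance matrix:

    r_i.shape   # (10000, 1, 2)
    r_j.shape   # (1, 10000, 2)
    d_ij.shape  # (10000, 10000)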

Rewriting the same expression using the numexpr syntax is extremely easy. The numexpr package doesn't support slicing in its array expressions; therefore, we first need to prepare the operands for broadcasting by adding an extra dimension, as follows:

    r = np.random.rand(10000, 2)
    r_i = r[:, np.newaxis]
    r_j = r[np.newaxis, :]

At this point, we should try to pack as many operations as possible into a single expression to allow significant optimization.

Most of the NumPy mathematical functions are also available in numexpr. However, there is a limitation--the reduction operations (the ones that reduce an axis, such as sum) have to happen last. Therefore, we have to first calculate the sum, then step out of numexpr, and finally calculate the square root in another expression:

    d_ij = ne.evaluate('sum((r_j - r_i)**2, axis=2)')
    d_ij = ne.evaluate('sqrt(d_ij)')

The numexpr compiler will avoid redundant memory allocation by not storing intermediate results. When possible, it will also distribute the operations over multiple processors. In the distance_matrix.py file, you will find two functions that implement the two versions: distance_matrix_numpy and distance_matrix_numexpr:

    from distance_matrix import (distance_matrix_numpy,
                                 distance_matrix_numexpr)
    %timeit distance_matrix_numpy(10000)
    1 loops, best of 3: 3.56 s per loop
    %timeit distance_matrix_numexpr(10000)
    1 loops, best of 3: 858 ms per loop
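
For reference, the two functions could be implemented along the following lines. This is only a sketch assembled from the snippets above; the actual contents of distance_matrix.py in the accompanying code may differ:

    import numpy as np
    import numexpr as ne

    def distance_matrix_numpy(n):
        # n random particles in 2D
        r = np.random.rand(n, 2)
        r_i = r[:, np.newaxis]
        r_j = r[np.newaxis, :]
        d_ij = r_j - r_i
        # norm over the last axis, computed with plain NumPy (allocates intermediates)
        return np.sqrt((d_ij ** 2).sum(axis=2))

    def distance_matrix_numexpr(n):
        # same calculation, packed into numexpr expressions
        r = np.random.rand(n, 2)
        r_i = r[:, np.newaxis]
        r_j = r[np.newaxis, :]
        d_ij = ne.evaluate('sum((r_j - r_i)**2, axis=2)')
        return ne.evaluate('sqrt(d_ij)')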

By simply converting the expressions to use numexpr, we were able to obtain roughly a 4x increase in performance over standard NumPy (3.56 s versus 858 ms). The numexpr package can be used every time you need to optimize a NumPy expression that involves large arrays and complex operations, and you can do so with minimal changes in the code.
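
As mentioned earlier, numexpr distributes the work over multiple threads when possible. If you need to control this explicitly, the package exposes helpers for querying the core count and setting the thread pool size; the snippet below is a small illustration using those functions:

    import numexpr as ne

    ncores = ne.detect_number_of_cores()  # number of cores detected by numexpr
    ne.set_num_threads(ncores)            # size of the worker thread pool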
