First steps with Numba

Getting started with Numba is fairly straightforward. As a first example, we will implement a function that calculates the sum of squares of an array. The function definition is as follows:

    def sum_sq(a):
        result = 0
        N = len(a)
        for i in range(N):
            # Square each element before accumulating it
            result += a[i]**2
        return result

To set up this function with Numba, it is sufficient to apply the nb.jit decorator:

    import numba as nb

    @nb.jit
    def sum_sq(a):
        ...

The nb.jit decorator does very little at the moment it is applied. However, when the function is invoked for the first time, Numba detects the type of the input argument, a, and compiles a specialized, performant version of the original function.

To measure the performance gain obtained by the Numba compiler, we can compare the timings of the original and the specialized functions. The original, undecorated function can be easily accessed through the py_func attribute. The timings for the two functions are as follows:

    import numpy as np

    x = np.random.rand(10000)

    # Original
    %timeit sum_sq.py_func(x)
    100 loops, best of 3: 6.11 ms per loop

    # Numba
    %timeit sum_sq(x)
    100000 loops, best of 3: 11.7 µs per loop

From the previous code, you can see that the Numba version (11.7 µs) is roughly 500 times faster than the Python version (6.11 ms), a gain of more than two orders of magnitude. We can also compare how this implementation stacks up against NumPy standard operators:

    %timeit (x**2).sum()
    10000 loops, best of 3: 14.8 µs per loop

In this case, the Numba-compiled function is marginally faster than the NumPy vectorized operations. The likely reason for the extra speed is that the NumPy expression x**2 allocates a temporary array before performing the sum, whereas our sum_sq function accumulates the result in a single scalar as it loops over the input.
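As a quick arithmetic check, the ratios between the timings quoted above can be worked out in a few lines. The three numbers are taken from the measurements in the text; the variable names are only illustrative:

```python
import math

# Timings quoted in the text, converted to seconds
python_time = 6.11e-3   # sum_sq.py_func(x)
numba_time = 11.7e-6    # sum_sq(x)
numpy_time = 14.8e-6    # (x**2).sum()

speedup_vs_python = python_time / numba_time
speedup_vs_numpy = numpy_time / numba_time

# About 522x, which is between two and three orders of magnitude
print(f"Numba vs. Python: {speedup_vs_python:.0f}x")
print(f"Orders of magnitude: {math.log10(speedup_vs_python):.1f}")

# About 1.26x, a marginal difference by comparison
print(f"Numba vs. NumPy:  {speedup_vs_numpy:.2f}x")
```

These figures come from a single machine and a single run, so they are best read as indicative rather than as fixed properties of the libraries.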

As we didn't use array-specific methods in sum_sq, we can also try to apply the same function to a regular Python list of floating point numbers. Interestingly, Numba is able to obtain a substantial speedup even in this case, as compared to a list comprehension:

    x_list = x.tolist()
    %timeit sum_sq(x_list)
    1000 loops, best of 3: 199 µs per loop

    %timeit sum([x**2 for x in x_list])
    1000 loops, best of 3: 1.28 ms per loop

Considering that all we needed to do was apply a simple decorator to obtain a large speedup across different data types, it's no wonder that what Numba does looks like magic. In the following sections, we will dig deeper to understand how Numba works and to evaluate the benefits and limitations of the Numba compiler.
