Numba

In this final subsection, we want to talk about Numba. It is probably one of the hottest ways to speed up your Python code with almost no changes. Numba compiles Python code (vanilla Python or NumPy-based) into machine code using LLVM. By doing so, and by leveraging a suite of optimizations along the way, it drastically increases the speed of the code, especially if you use a lot of loops and NumPy arrays.

The great thing about Numba is that, in the best-case scenario, it will speed up your code if you simply add a decorator to your function or class; that is, if you're lucky. If you're not, you'll have to work through the documentation and somewhat obscure error messages and experiment with datatype annotations. In some cases, Numba can even be more performant than NumPy! As if that isn't enough, Numba can also compile your code for CUDA, leveraging the massive parallelism of GPUs, which are often an order of magnitude faster than CPUs!

Here is a simple example. The compute_distances function resembles the behavior of euclidean_distances, but, being written in pure Python, it performs fairly slowly:

import numpy as np

def distance(p1, p2):
    distance = 0
    for c1, c2 in zip(p1, p2):
        distance += (c2 - c1) ** 2

    return np.sqrt(distance)

def compute_distances(points1, points2):
    A = np.zeros(shape=(len(points1), len(points2)))

    for i, p1 in enumerate(points1):
        for j, p2 in enumerate(points2):
            A[i, j] = distance(p1, p2)

    return A

%timeit compute_distances([(0, 0)]*100, [(1,1)]*200)

The performance (output) of the preceding code snippet is as follows:

>>> 43.8 ms ± 1.46 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
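As a quick sanity check, the output should match that of euclidean_distances (assuming scikit-learn's implementation is the one referenced above). A minimal sketch:

import numpy as np
from sklearn.metrics.pairwise import euclidean_distances

# NumPy arrays iterate row by row, so compute_distances accepts them too
a = np.zeros((100, 2))
b = np.ones((200, 2))
assert np.allclose(compute_distances(a, b), euclidean_distances(a, b))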

However, once we add a decorator to each function, performance increases more than tenfold: 

from numba import jit

@jit()
def distance(p1, p2):
    distance = 0
    for c1, c2 in zip(p1, p2):
        distance += (c2 - c1) ** 2

    return np.sqrt(distance)


@jit()
def compute_distances(points1, points2):
    A = np.zeros(shape=(len(points1), len(points2)))

    for i, p1 in enumerate(points1):
        for j, p2 in enumerate(points2):
            A[i, j] = distance(p1, p2)

    return A

%timeit compute_distances([(0, 0)]*100, [(1,1)]*200)

The performance (output) of the preceding code snippet is as follows:

>>> 3.02 ms ± 101 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

On that run, Numba shows a deprecation warning: future versions will require list arguments to be passed with an explicitly specified type. In the current version, the code works as is.
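One way to silence that warning is to pass Numba's typed list instead of a plain Python list. A rough sketch, assuming a reasonably recent Numba version:

from numba.typed import List

# Typed lists carry their element type, unlike plain (reflected) Python lists
points1 = List()
for p in [(0.0, 0.0)] * 100:
    points1.append(p)

points2 = List()
for p in [(1.0, 1.0)] * 200:
    points2.append(p)

compute_distances(points1, points2)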

In our experience, Numba is great for non-trivial, multi-nested computations that are easier to write in pure Python (and optimize with Numba) than in NumPy. At the same time, it isn't as mature as NumPy, and changes to its API happen fairly often.
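When lazy compilation fails, the datatype annotations mentioned earlier can help: you can pass an explicit signature to the decorator, which pins the argument types and compiles the function eagerly. A minimal sketch (the function and signature here are our own toy illustration, not part of the measured example):

from numba import njit, float64

# njit is shorthand for jit(nopython=True); the explicit signature
# fixes the types and triggers compilation at definition time
@njit(float64(float64, float64))
def squared_diff(c1, c2):
    return (c2 - c1) ** 2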

In this section, we covered a few ways to improve the performance of Python code. Starting from a naive, slow, but easy algorithm implementation, we took different angles in order to make it faster, such as using vectorized C-based loops, specific data structures that are efficient for the task, running operations on multiple cores or multiple machines, and using modern compilers. Some of those solutions can and should be combined. All of them have their own benefits, limitations, and requirements: larger memory, more CPUs and machines, specific knowledge, and so on. Don't rush to implement any optimization before you're sure you need it. Once you are sure, though, a wide range of possibilities is available.

Numba is not the only way to compile Python into a more performant, C-level version. In fact, there are quite a few alternatives, and among the most popular is Cython. The idea behind that package is somewhat similar to Numba's, but no LLVM is involved: the code is translated to C directly, which means you can store and reuse the compiled result. And as we mentioned earlier, Numba can also compile your code for CUDA and run it on a GPU!
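To give a flavor of the GPU side, here is a minimal CUDA kernel sketch (assuming an NVIDIA GPU and a CUDA-enabled Numba installation; the kernel itself is our own toy example):

import numpy as np
from numba import cuda

@cuda.jit
def add_one(arr):
    i = cuda.grid(1)       # absolute index of this thread
    if i < arr.size:       # guard against out-of-range threads
        arr[i] += 1.0

arr = np.zeros(1024)
threads_per_block = 256
blocks = (arr.size + threads_per_block - 1) // threads_per_block
d_arr = cuda.to_device(arr)                # copy data to the GPU
add_one[blocks, threads_per_block](d_arr)  # launch the kernel
result = d_arr.copy_to_host()              # copy results back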

For more information on Numba, check out the official documentation at https://numba.pydata.org.

Now, let's talk about an important topic we've ignored so far: concurrency.
