Particle simulator in Cython

Now that we have a basic understanding of how Cython works, we can rewrite the ParticleSimulator.evolve method. Thanks to Cython, we can convert our loops to C, thus removing the overhead introduced by the Python interpreter.

In Chapter 3, Fast Array Operations with NumPy and Pandas, we wrote a fairly efficient version of the evolve method using NumPy. We can rename the old version as evolve_numpy to differentiate it from the new version:

    def evolve_numpy(self, dt):
        timestep = 0.00001
        nsteps = int(dt/timestep)

        r_i = np.array([[p.x, p.y] for p in self.particles])
        ang_speed_i = np.array([p.ang_speed for p in self.particles])
        v_i = np.empty_like(r_i)

        for i in range(nsteps):
            norm_i = np.sqrt((r_i ** 2).sum(axis=1))

            v_i = r_i[:, [1, 0]]
            v_i[:, 0] *= -1
            v_i /= norm_i[:, np.newaxis]

            d_i = timestep * ang_speed_i[:, np.newaxis] * v_i

            r_i += d_i

        for i, p in enumerate(self.particles):
            p.x, p.y = r_i[i]

We want to convert this code to Cython. Our strategy will be to take advantage of Cython's fast indexing by replacing the NumPy array broadcasting with an explicit, indexing-based algorithm. Since Cython generates efficient C code, we are free to use as many loops as we like without any performance penalty.

As a design choice, we can decide to encapsulate the loop in a function that we will rewrite in a Cython module called cevolve.pyx. The module will contain a single Python function, c_evolve, that will take the particle positions, angular velocities, timestep, and number of steps as input.

At first, we are not adding typing information; we just want to isolate the function and ensure that we can compile our module without errors:

    # file: simul.py
    def evolve_cython(self, dt):
        timestep = 0.00001
        nsteps = int(dt/timestep)

        r_i = np.array([[p.x, p.y] for p in self.particles])
        ang_speed_i = np.array([p.ang_speed for p in self.particles])

        c_evolve(r_i, ang_speed_i, timestep, nsteps)

        for i, p in enumerate(self.particles):
            p.x, p.y = r_i[i]

    # file: cevolve.pyx
    import numpy as np

    def c_evolve(r_i, ang_speed_i, timestep, nsteps):
        v_i = np.empty_like(r_i)

        for i in range(nsteps):
            norm_i = np.sqrt((r_i ** 2).sum(axis=1))

            v_i = r_i[:, [1, 0]]
            v_i[:, 0] *= -1
            v_i /= norm_i[:, np.newaxis]

            d_i = timestep * ang_speed_i[:, np.newaxis] * v_i

            r_i += d_i
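
Before benchmarking, the cevolve.pyx module has to be compiled and made importable from simul.py (for example, with from cevolve import c_evolve). The following is a minimal sketch of a build script; the file name setup.py and the exact options are assumptions and may differ from your own project setup:

    # file: setup.py (a minimal sketch; file name and options are assumptions)
    from distutils.core import setup
    from Cython.Build import cythonize

    # Compile cevolve.pyx into a C extension module named cevolve
    setup(ext_modules=cythonize('cevolve.pyx'))

Running python setup.py build_ext --inplace then produces the compiled extension in the current directory, and simul.py can import c_evolve from it.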

Note that c_evolve doesn't need a return value, as the values are updated in-place in the r_i array. We can benchmark the untyped Cython version against the old NumPy version by slightly changing our benchmark function, as follows:

    def benchmark(npart=100, method='python'):
        particles = [Particle(uniform(-1.0, 1.0),
                              uniform(-1.0, 1.0),
                              uniform(-1.0, 1.0))
                     for i in range(npart)]
        simulator = ParticleSimulator(particles)
        if method == 'python':
            simulator.evolve_python(0.1)
        elif method == 'cython':
            simulator.evolve_cython(0.1)
        elif method == 'numpy':
            simulator.evolve_numpy(0.1)

We can time the different versions in an IPython shell:

    %timeit benchmark(100, 'cython')
    1 loops, best of 3: 401 ms per loop
    %timeit benchmark(100, 'numpy')
    1 loops, best of 3: 413 ms per loop

The two versions run at essentially the same speed: compiling the Cython module without static typing gives no advantage over the plain Python version. The next step is to declare the types of all the important variables so that Cython can perform its optimizations.

We can start adding types to the function arguments and see how the performance changes. We can declare the arrays as typed memoryviews containing double values. It's worth mentioning that if we pass an array of int or float32 type, the cast won't happen automatically and we will get an error:

    def c_evolve(double[:, :] r_i,
                 double[:] ang_speed_i,
                 double timestep,
                 int nsteps):
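
Since no automatic cast takes place, the caller has to make sure that the arrays already hold double (float64) values. A simple precaution, sketched here as a possible tweak to evolve_cython (not shown this way in the chapter), is to request the dtype explicitly when building the arrays:

    # In evolve_cython: request float64 explicitly so that the arrays
    # match the double[:, :] and double[:] memoryview signatures.
    r_i = np.array([[p.x, p.y] for p in self.particles], dtype='float64')
    ang_speed_i = np.array([p.ang_speed for p in self.particles],
                           dtype='float64')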

At this point, we can rewrite the loops over the particles and timesteps. We can declare the i and j iteration indices and the nparticles particle number as int:

    cdef int i, j
    cdef int nparticles = r_i.shape[0]

The algorithm is very similar to the pure Python version; we iterate over the particles and timesteps, and we compute the velocity and displacement vectors for each particle coordinate using the following code:

    for i in range(nsteps):
        for j in range(nparticles):
            x = r_i[j, 0]
            y = r_i[j, 1]
            ang_speed = ang_speed_i[j]

            norm = sqrt(x ** 2 + y ** 2)

            vx = (-y)/norm
            vy = x/norm

            dx = timestep * ang_speed * vx
            dy = timestep * ang_speed * vy

            r_i[j, 0] += dx
            r_i[j, 1] += dy

In the preceding code, we added the x, y, ang_speed, norm, vx, vy, dx, and dy variables. To avoid the Python interpreter overhead, we have to declare them with their corresponding types at the beginning of the function, as follows:

    cdef double norm, x, y, vx, vy, dx, dy, ang_speed 

We also used a function called sqrt to calculate the norm. If we use the sqrt present in the math module or the one in numpy, we will again include a slow Python function in our critical loop, thus killing our performance. A fast sqrt is available in the standard C library, already wrapped in the libc.math Cython module:

    from libc.math cimport sqrt 
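
Putting these fragments together, the fully typed cevolve.pyx might look like the following sketch; it is assembled from the pieces shown above, and the exact layout of the original source may differ:

    # file: cevolve.pyx (a sketch assembling the typed fragments above)
    from libc.math cimport sqrt

    def c_evolve(double[:, :] r_i,
                 double[:] ang_speed_i,
                 double timestep,
                 int nsteps):
        cdef int i, j
        cdef int nparticles = r_i.shape[0]
        cdef double norm, x, y, vx, vy, dx, dy, ang_speed

        for i in range(nsteps):
            for j in range(nparticles):
                x = r_i[j, 0]
                y = r_i[j, 1]
                ang_speed = ang_speed_i[j]

                # Velocity is perpendicular to the position vector
                norm = sqrt(x ** 2 + y ** 2)
                vx = (-y)/norm
                vy = x/norm

                # Displacement over one timestep, applied in-place
                dx = timestep * ang_speed * vx
                dy = timestep * ang_speed * vy
                r_i[j, 0] += dx
                r_i[j, 1] += dy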

We can rerun our benchmark to assess our improvements, as follows:

    In [4]: %timeit benchmark(100, 'cython')
    100 loops, best of 3: 13.4 ms per loop
    In [5]: %timeit benchmark(100, 'numpy')
    1 loops, best of 3: 429 ms per loop

For a small number of particles, the speed-up is massive: the typed Cython version is roughly 30 times faster than the NumPy version (13.4 ms versus 429 ms). However, we should also test how the performance scales with a larger number of particles:

    In [2]: %timeit benchmark(1000, 'cython')
    10 loops, best of 3: 134 ms per loop
    In [3]: %timeit benchmark(1000, 'numpy')
    1 loops, best of 3: 877 ms per loop

As we increase the number of particles, the two versions get closer in speed: with 1,000 particles, the speed-up already drops to a more modest 6x. This is likely because, as the number of particles grows, the Python interpreter overhead in the NumPy version becomes less and less significant compared to the cost of the array operations themselves.
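
To explore this trend more systematically, we can sweep over a few particle counts outside of IPython as well. The following sketch uses the standard timeit module and assumes that benchmark is importable from simul; the particle counts and the number of repetitions are arbitrary choices, not taken from the chapter:

    # file: scaling.py (a sketch; particle counts and repeats are arbitrary)
    from timeit import timeit

    for npart in (100, 1000, 10000):
        for method in ('numpy', 'cython'):
            # Time a single call of benchmark(npart, method)
            t = timeit('benchmark(%d, %r)' % (npart, method),
                       setup='from simul import benchmark',
                       number=1)
            print('%6d particles, %-6s: %.3f s' % (npart, method, t))

Sweeps such as this make it easy to see at which problem size the NumPy version starts to catch up with the typed Cython code.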
