Rewriting the particle simulator in NumPy

In this section, we will optimize our particle simulator by rewriting some parts of it in NumPy. We found, from the profiling we did in Chapter 1, Benchmarking and Profiling, that the slowest part of our program is the following loop contained in the ParticleSimulator.evolve method:

    for i in range(nsteps):
        for p in self.particles:
            norm = (p.x**2 + p.y**2)**0.5
            v_x = (-p.y)/norm
            v_y = p.x/norm

            d_x = timestep * p.ang_vel * v_x
            d_y = timestep * p.ang_vel * v_y

            p.x += d_x
            p.y += d_y

You may have noticed that the body of the inner loop acts solely on the current particle. If we had an array containing the particle positions and angular speeds, we could rewrite the inner loop as a broadcasted operation. The time steps of the outer loop, in contrast, each depend on the previous step and cannot be parallelized in this way.

It is natural, then, to store all the particle coordinates in an array of shape (nparticles, 2) and the angular speeds in an array of shape (nparticles,), where nparticles is the number of particles. We'll call those arrays r_i and ang_vel_i:

    r_i = np.array([[p.x, p.y] for p in self.particles])
    ang_vel_i = np.array([p.ang_vel for p in self.particles])
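As a quick illustration of the resulting shapes, we can build the same arrays from a couple of stand-in objects (a hypothetical minimal substitute for the Particle class, used here only for demonstration):

```python
import numpy as np
from types import SimpleNamespace

# Hypothetical stand-ins for Particle instances
particles = [SimpleNamespace(x=0.1, y=0.2, ang_vel=1.0),
             SimpleNamespace(x=-0.3, y=0.4, ang_vel=-2.0)]

r_i = np.array([[p.x, p.y] for p in particles])
ang_vel_i = np.array([p.ang_vel for p in particles])

print(r_i.shape)        # (2, 2)
print(ang_vel_i.shape)  # (2,)
```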

The velocity direction, perpendicular to the vector (x, y), was defined as follows:

    v_x = -y / norm
    v_y = x / norm

The norm can be calculated using the strategy illustrated in the Calculating the norm section under the Getting started with NumPy heading:

    norm_i = ((r_i ** 2).sum(axis=1))**0.5 

For the (-y, x) components, we first need to swap the x and y columns in r_i, then multiply the first column by -1, and finally divide by the norm. Since norm_i is of shape (nparticles,), it needs a new axis to broadcast against the (nparticles, 2) array, as shown in the following code:

    v_i = r_i[:, [1, 0]] / norm_i[:, np.newaxis]
    v_i[:, 0] *= -1
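On a small hypothetical array, these two lines produce unit vectors perpendicular to each position. Note that fancy indexing with [1, 0] returns a copy, so modifying v_i in place does not alter r_i:

```python
import numpy as np

# Hypothetical positions; the norms are 5 and 1 here
r_i = np.array([[3.0, 4.0],
                [0.0, 1.0]])
norm_i = ((r_i ** 2).sum(axis=1)) ** 0.5

# Swap the columns (a copy), normalize, then negate the first column
v_i = r_i[:, [1, 0]] / norm_i[:, np.newaxis]
v_i[:, 0] *= -1

print(v_i)  # [[-0.8  0.6], [-1.  0.]]
```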

To calculate the displacement, we need to compute the product of v_i, ang_vel_i, and timestep. Since ang_vel_i is of shape (nparticles,), it needs a new axis in order to operate with v_i of shape (nparticles, 2). We will do that using numpy.newaxis, as follows:

    d_i = timestep * ang_vel_i[:, np.newaxis] * v_i
    r_i += d_i
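The effect of numpy.newaxis can be seen on a small hypothetical example: it turns the (nparticles,) array into an (nparticles, 1) column, which then broadcasts element-wise against the (nparticles, 2) velocity array:

```python
import numpy as np

timestep = 0.00001
ang_vel_i = np.array([1.0, -2.0])      # shape (2,)
v_i = np.array([[-0.8, 0.6],
                [-1.0, 0.0]])          # shape (2, 2)

print(ang_vel_i[:, np.newaxis].shape)  # (2, 1)

# Each row of v_i is scaled by its particle's angular velocity
d_i = timestep * ang_vel_i[:, np.newaxis] * v_i
print(d_i.shape)                       # (2, 2)
```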

Outside the loop, we have to update the particle instances with the new coordinates, x and y, as follows:

    for i, p in enumerate(self.particles):
        p.x, p.y = r_i[i]

To summarize, we will implement a method called ParticleSimulator.evolve_numpy and benchmark it against the pure Python version, renamed as ParticleSimulator.evolve_python:

    def evolve_numpy(self, dt):
        timestep = 0.00001
        nsteps = int(dt/timestep)

        r_i = np.array([[p.x, p.y] for p in self.particles])
        ang_vel_i = np.array([p.ang_vel for p in self.particles])

        for i in range(nsteps):
            norm_i = np.sqrt((r_i ** 2).sum(axis=1))
            v_i = r_i[:, [1, 0]]
            v_i[:, 0] *= -1
            v_i /= norm_i[:, np.newaxis]
            d_i = timestep * ang_vel_i[:, np.newaxis] * v_i
            r_i += d_i

        for i, p in enumerate(self.particles):
            p.x, p.y = r_i[i]
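As a standalone sanity check (with hypothetical values, outside the class), one iteration of this loop body moves a particle on the unit circle purely along the tangential direction:

```python
import numpy as np

timestep = 0.00001
ang_vel_i = np.array([1.0])    # one particle, unit angular velocity
r_i = np.array([[1.0, 0.0]])   # starting on the unit circle

# One iteration of the loop body from evolve_numpy
norm_i = np.sqrt((r_i ** 2).sum(axis=1))
v_i = r_i[:, [1, 0]]
v_i[:, 0] *= -1
v_i /= norm_i[:, np.newaxis]
d_i = timestep * ang_vel_i[:, np.newaxis] * v_i
r_i += d_i

# The displacement is along y only, perpendicular to (1, 0)
print(d_i[0, 1])  # 1e-05
```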

We also update the benchmark to conveniently change the number of particles and the simulation method, as follows:

    def benchmark(npart=100, method='python'):
        particles = [Particle(uniform(-1.0, 1.0),
                              uniform(-1.0, 1.0),
                              uniform(-1.0, 1.0))
                     for i in range(npart)]

        simulator = ParticleSimulator(particles)

        if method == 'python':
            simulator.evolve_python(0.1)
        elif method == 'numpy':
            simulator.evolve_numpy(0.1)

Let's run the benchmark in an IPython session:

    from simul import benchmark
    %timeit benchmark(100, 'python')
    1 loops, best of 3: 614 ms per loop
    %timeit benchmark(100, 'numpy')
    1 loops, best of 3: 415 ms per loop

We have some improvement, but it doesn't look like a huge speed boost. The power of NumPy is revealed when handling big arrays. If we increase the number of particles, we will note a more significant performance boost:

    %timeit benchmark(1000, 'python')
    1 loops, best of 3: 6.13 s per loop
    %timeit benchmark(1000, 'numpy')
    1 loops, best of 3: 852 ms per loop

The plot in the following figure was produced by running the benchmark with different particle numbers:

The plot shows that both implementations scale linearly with the number of particles, but the runtime of the pure Python version grows much faster than that of the NumPy version; at greater sizes, the NumPy advantage is greater. In general, when using NumPy, you should try to pack your data into large arrays and group the calculations using the broadcasting feature.
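As a general illustration of this advice (not part of the simulator itself), the same reduction expressed as a single NumPy expression replaces thousands of interpreted Python-level iterations with one call into optimized compiled code:

```python
import numpy as np

x = np.linspace(0.0, 1.0, 100_000)

# Python-level loop: one interpreted iteration per element
total_loop = 0.0
for value in x:
    total_loop += value * value

# Vectorized: a single NumPy expression over the whole array
total_numpy = (x ** 2).sum()

print(np.allclose(total_loop, total_numpy))  # True
```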
