Scientific computing with mpi4py

Even though Dask and Spark are excellent technologies widely used in the IT industry, they have not been broadly adopted in academic research. High-performance supercomputers with thousands of processors have been used in academia for decades to run intensive numerical applications. For this reason, supercomputers are generally configured with a very different software stack, one that focuses on computationally intensive algorithms implemented in low-level languages, such as C, Fortran, or even assembly.

The principal library used for parallel execution on these kinds of systems is the Message Passing Interface (MPI), which, while less convenient and sophisticated than Dask or Spark, is perfectly capable of expressing parallel algorithms and achieving excellent performance. Note that, contrary to Dask and Spark, MPI does not follow the MapReduce model and is best suited for running thousands of processes with very little data sent between them.

MPI works quite differently from what we've seen so far. Parallelism in MPI is achieved by running the same script in multiple processes (possibly on different nodes); communication and synchronization between processes are coordinated by a designated process, commonly called the root, which is usually identified by the ID 0.

In this section, we will briefly demonstrate the main concepts of MPI using its mpi4py Python interface. In the following example, we demonstrate the simplest possible parallel code with MPI. The code imports the MPI module and retrieves COMM_WORLD, which is an interface that can be used to interact with other MPI processes. The Get_rank function will return an integer identifier for the current process:

    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    print("This is process", rank)

We can place the preceding code in a file, mpi_example.py, and execute it. Running the script normally (that is, without mpiexec) won't do anything special, as it involves the execution of a single process:

    $ python mpi_example.py
    This is process 0

MPI jobs are meant to be executed using the mpiexec command, which takes a -n option to indicate the number of parallel processes. Running the script using the following command will generate four independent executions of the same script, each with a different ID:

    $ mpiexec -n 4 python mpi_example.py
    This is process 0
    This is process 2
    This is process 1
    This is process 3
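
Since each process executes exactly the same script, the rank is what lets processes take on different roles and exchange data. As a minimal sketch that is not part of the example above, the following snippet has every non-root process send a small Python object to the root with comm.send, while the root collects the messages with comm.recv (the lowercase send and recv methods transparently pickle generic Python objects):

    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    n_procs = comm.Get_size()

    if rank == 0:
        # The root collects one message from every other process
        for _ in range(n_procs - 1):
            message = comm.recv(source=MPI.ANY_SOURCE)
            print("root received:", message)
    else:
        # Every other process sends a small Python object to the root
        comm.send({"rank": rank, "payload": rank ** 2}, dest=0)

Running this sketch with mpiexec -n 4 should print three messages on the root process, one for each of the other ranks.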

Distributing the processes across the network is performed automatically by a resource manager (such as TORQUE). Generally, supercomputers are configured by the system administrator, who will also provide instructions on how to run MPI software.

To get a feel for what an MPI program looks like, we will reimplement the approximation of pi. The complete code is shown here. The program will do the following:

  • Create a random array of size N / n_procs for each process so that each process tests the same number of samples (n_procs is obtained through the Get_size function)
  • In each process, calculate the sum of the hit tests and store it in hits_count, which represents the partial count for that process
  • Use the reduce function to calculate the total sum of the partial counts (reduce combines the values with a sum by default); the root argument specifies which process will receive the result
  • Print the final result, on the process corresponding to root only:
      from mpi4py import MPI
      import numpy as np

      comm = MPI.COMM_WORLD
      rank = comm.Get_rank()
      n_procs = comm.Get_size()

      N = 10000

      print("This is process", rank)

      # Each process generates its own chunk of N / n_procs random points
      x_part = np.random.uniform(-1, 1, int(N / n_procs))
      y_part = np.random.uniform(-1, 1, int(N / n_procs))

      # Count the points that fall inside the unit circle
      hits_part = x_part**2 + y_part**2 < 1
      hits_count = hits_part.sum()

      print("partial counts", hits_count)

      # Sum the partial counts on the root process
      total_counts = comm.reduce(hits_count, root=0)

      if rank == 0:
          print("Total hits:", total_counts)
          print("Final result:", 4 * total_counts / N)

We can now place the preceding code in a file named mpi_pi.py and execute it using mpiexec. The output shows how the executions of the four processes are interleaved until we reach the reduce call:

    $ mpiexec -n 4 python mpi_pi.py
    This is process 3
    partial counts 1966
    This is process 1
    partial counts 1944
    This is process 2
    partial counts 1998
    This is process 0
    partial counts 1950
    Total hits: 7858
    Final result: 3.1432
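
The lowercase comm.reduce method used in the script pickles generic Python objects, which is perfectly adequate for small values such as the integer hits_count. For large NumPy arrays, mpi4py also provides uppercase, buffer-based counterparts such as comm.Reduce, which avoid the pickling overhead. As a minimal sketch (not part of the original script, and assuming the comm, rank, np, and hits_count names defined above), the same reduction could be written as follows:

    # Buffer-based reduction of a one-element NumPy array (sketch)
    partial = np.array([hits_count], dtype='i')
    total = np.zeros(1, dtype='i') if rank == 0 else None
    comm.Reduce(partial, total, op=MPI.SUM, root=0)

    if rank == 0:
        print("Total hits:", total[0])

Both forms produce the same total on the root process; the buffer-based call simply bypasses the pickling step.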