Implementing an MPI-enabled application

To make an application work with MPI, we need to add MPI calls that initialize the MPI environment, identify each process, and split the work among the processes:

  1. We will reuse the OpenMP sample code, so copy the openmp.cu file from the 08_openmp_cuda directory and save it as simpleMPI.cu.
  2. Insert the MPI header include statement at the beginning of the code:
#include <mpi.h>
  3. Insert the following code right after the stopwatch has been created in the main() function:
// initialize MPI, then get the number of processes (np) and this process's rank
int np, rank;
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &np);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  4. Divide the required memory sizes by the number of processes, placing this code right after the code from step 3:
bufsize /= np;
size /= np;
  5. Each thread should report which process it belongs to. Update the printf() call in the parallel execution block as follows:
// each operator processes its corresponding slice of the data
omp_set_num_threads(num_operator);
#pragma omp parallel
{
    int i = omp_get_thread_num();
    int offset = i * size / num_operator;
    printf("Launched GPU task (%d, %d)\n", rank, i);

    ls_operator[i].set_index(i);
    ls_operator[i].async_operation(&h_c[offset],
                                   &h_a[offset], &h_b[offset],
                                   &d_c[offset], &d_a[offset],
                                   &d_b[offset],
                                   size / num_operator,
                                   bufsize / num_operator);
}
  6. At the end of main(), call MPI_Finalize() to shut down the MPI environment.
  7. Compile the code with the following command:
$ nvcc -m64 -gencode arch=compute_70,code=sm_70 -I/usr/local/cuda/samples/common/inc -I/usr/local/include/ -Xcompiler -fopenmp -lgomp -lmpi -o simpleMPI ./simpleMPI.cu

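Replace compute_70 and sm_70 in the -gencode option with your GPU's compute capability written without the dot (for example, compute_75 and sm_75 for a GPU with compute capability 7.5).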

  8. Test the compiled application using the following command:
$ ./simpleMPI 2
  9. Now, test MPI execution using the following command:
$ mpirun -np 2 ./simpleMPI 2
Number of process: 2
Number of operations: 2
Launched GPU task (1, 0)
Launched GPU task (1, 1)
Number of operations: 2
Launched GPU task (0, 0)
Launched GPU task (0, 1)
stream 0 - elapsed 13.390 ms
stream 1 - elapsed 25.532 ms
compared a sample result...
host: 1.306925, device: 1.306925
Time= 25.749 msec, bandwidth= 15.637624 GB/s
stream 0 - elapsed 21.334 ms
stream 1 - elapsed 26.010 ms
compared a sample result...
host: 1.306925, device: 1.306925
Time= 26.111 msec, bandwidth= 15.420826 GB/s
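The bufsize /= np and size /= np lines use integer division, so when size is not a multiple of np the leftover elements are silently dropped. The following is a minimal sketch of a partition that hands out the remainder as well; slice_count and slice_offset are hypothetical helper names, not part of the sample, and the sketch assumes np > 0.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of elements owned by `rank` when `total`
// elements are divided among `np` processes. The first (total % np) ranks
// take one extra element each, so no element is dropped.
std::size_t slice_count(std::size_t total, int np, int rank) {
    assert(np > 0 && rank >= 0 && rank < np);
    std::size_t n = static_cast<std::size_t>(np);
    std::size_t r = static_cast<std::size_t>(rank);
    return total / n + (r < total % n ? 1 : 0);
}

// Hypothetical helper: index of the first element owned by `rank`.
// Each earlier rank that received an extra element shifts this start by one.
std::size_t slice_offset(std::size_t total, int np, int rank) {
    assert(np > 0 && rank >= 0 && rank < np);
    std::size_t n = static_cast<std::size_t>(np);
    std::size_t r = static_cast<std::size_t>(rank);
    std::size_t rem = total % n;
    return r * (total / n) + (r < rem ? r : rem);
}
```

With 10 elements and 3 processes, the ranks get 4, 3, and 3 elements starting at 0, 4, and 7. When size divides evenly, every rank gets exactly size / np elements, which is what the sample's plain division produces.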
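Inside each process, OpenMP thread i starts at i * size / num_operator in the process-local buffers and passes size / num_operator elements to async_operation(). The sketch below reproduces that arithmetic in a standalone function so the split can be checked on the host; thread_range and TaskRange are hypothetical names, and std::size_t stands in for the sample's int.

```cpp
#include <cassert>
#include <cstddef>

// Element range handed to one OpenMP thread (one GPU operator).
struct TaskRange {
    std::size_t offset;  // first element in the process-local buffers
    std::size_t count;   // element count passed to async_operation()
};

// Reproduces the parallel block's arithmetic: `size` is the per-process
// element count (after size /= np) and `i` is the omp_get_thread_num() value.
TaskRange thread_range(std::size_t size, int num_operator, int i) {
    assert(num_operator > 0 && i >= 0 && i < num_operator);
    std::size_t n = static_cast<std::size_t>(num_operator);
    TaskRange r;
    r.offset = static_cast<std::size_t>(i) * size / n;
    r.count = size / n;
    return r;
}
```

For size = 10 and num_operator = 3, the threads start at 0, 3, and 6 and each takes 3 elements, so element 9 is never processed. This is harmless when size is a multiple of num_operator, but any other combination leaves a tail unprocessed.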
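The -gencode value in the compile command is built from the compute capability with the dot removed, so compute capability 7.0 becomes arch=compute_70,code=sm_70. A small sketch that makes the mapping explicit; gencode_for is a hypothetical helper, not part of the sample or of nvcc.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: builds the -gencode value for compute capability
// major.minor, e.g. 7.0 -> "arch=compute_70,code=sm_70". In a CUDA program,
// major and minor can be read from the cudaDeviceProp fields of the same
// names, filled in by cudaGetDeviceProperties().
std::string gencode_for(int major, int minor) {
    assert(major > 0 && minor >= 0 && minor < 10);
    std::string cc = std::to_string(major) + std::to_string(minor);
    return "arch=compute_" + cc + ",code=sm_" + cc;
}
```

For example, gencode_for(8, 6) yields arch=compute_86,code=sm_86, which would replace the compute_70/sm_70 pair in the command above on a compute capability 8.6 GPU.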
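Each rank prints its own stream timings and summary, and because the two processes run independently, their lines can interleave in any order. The bandwidth line is bytes moved divided by elapsed time. The sketch below shows that conversion, assuming 1 GB = 10^9 bytes; bandwidth_gbs is a hypothetical name, and the byte count is left to the caller because the sample's exact formula is not shown here.

```cpp
#include <cassert>

// Hypothetical helper: converts `bytes` transferred in `elapsed_ms`
// milliseconds into GB/s, using 1 GB = 1e9 bytes.
double bandwidth_gbs(double bytes, double elapsed_ms) {
    assert(elapsed_ms > 0.0);
    double seconds = elapsed_ms * 1.0e-3;
    return bytes / seconds / 1.0e9;
}
```

For instance, 402,653,184 bytes (three arrays of 2^25 four-byte floats) in 25.749 ms works out to about 15.64 GB/s, matching the first summary line above. Treat that byte count as a reconstruction from the printed numbers, not as the sample's definition.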