The GPU is capable of executing kernels from concurrent CPU processes. By default, however, kernels from different processes are only executed in a time-sliced manner, even when no single kernel fully utilizes the GPU's compute resources. To remove this unnecessary serialization, the GPU provides Multi-Process Service (MPS) mode. This enables different processes to execute their kernels simultaneously on a GPU so that its resources are fully utilized. When MPS is enabled, the nvidia-cuda-mps-control daemon monitors the target GPU and manages the kernel operations of the processes using that GPU. This feature is only available on Linux. Here, we can see MPS in action, with multiple processes sharing the same GPU:
As we can see, each process has a part that runs on the GPU (green bars) and a part that runs on the CPU (blue bars). Ideally, the GPU work of one process should overlap with the CPU work of another, so that neither the blue nor the green bars leave the GPU idle. This is made possible by the MPS feature, which is supported on recent GPUs (compute capability 3.5 and higher).
The good thing about this is that no changes need to be made to the application to make use of MPS. The MPS process runs as a daemon, which is started with the following commands:
$ nvidia-smi -c EXCLUSIVE_PROCESS
$ nvidia-cuda-mps-control -d
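When you are finished with MPS, the daemon can be shut down and the compute mode restored. The following is a typical sequence (root privileges are usually required, and the daemon must be stopped by the same user that started it):

$ echo quit | nvidia-cuda-mps-control
$ nvidia-smi -c DEFAULT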
After running these commands, all processes submit their CUDA commands to the MPS daemon, which takes care of submitting them to the GPU. From the GPU's perspective, there is only one process accessing it (the MPS daemon), and hence kernels from multiple processes can run concurrently. This also allows memory copies issued by one process to overlap with kernel executions from other MPI processes.
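To make the scenario concrete, the following is a minimal, hypothetical sketch of the kind of program whose kernels benefit from MPS: each MPI rank (or plain process) runs the same binary, and its single small grid leaves most of the GPU idle. The file name `mps_demo.cu` and the kernel `scale` are illustrative, not from the original text.

```cuda
// mps_demo.cu -- hypothetical sketch. Each concurrent process runs this
// program; with the MPS daemon active, their kernels can share the GPU
// instead of being time-sliced.
#include <cstdio>
#include <cuda_runtime.h>

// A deliberately small kernel that underutilizes the GPU, leaving room
// for concurrent kernels from other processes to run alongside it.
__global__ void scale(float *data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const int n = 1 << 20;
    float *d_data = nullptr;
    cudaMalloc(&d_data, n * sizeof(float));

    // Without MPS, this launch from several processes is serialized by
    // time-slicing; with MPS, the launches can overlap on the GPU.
    scale<<<(n + 255) / 256, 256>>>(d_data, 2.0f, n);
    cudaDeviceSynchronize();

    cudaFree(d_data);
    printf("rank finished\n");
    return 0;
}
```

Launching several instances at once, for example with `mpirun -np 4 ./mps_demo`, and comparing profiler timelines with the MPS daemon running versus stopped, makes the concurrency (or lack of it) directly visible.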