First, we focus on DirectPath I/O (DPIO) passthrough mode as we scale from one GPU to four GPUs:
| CIFAR-10 | 1 GPU | 2 GPUs | 4 GPUs |
| --- | --- | --- | --- |
| Normalized images/sec in thousands (w.r.t. 1 GPU) | 1.1 | 2.01 | 3.77 |
| CPU utilization | 23% | 41% | 73% |
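
As a rough aid to reading the scaling figures above, the short sketch below derives speedup, per-GPU scaling efficiency, and CPU utilization per GPU from the table values. The numbers are copied from the table at face value; the function name `scaling_report` and the efficiency definition (speedup divided by GPU count) are assumptions made for illustration, not part of the original benchmark.

```python
# Values copied from the CIFAR-10 DPIO scaling table above.
gpus = [1, 2, 4]
normalized_throughput = [1.1, 2.01, 3.77]   # normalized images/sec
cpu_utilization_pct = [23, 41, 73]          # host CPU utilization in percent


def scaling_report(gpu_counts, throughput, cpu_pct):
    """Return speedup, scaling efficiency, and CPU% per GPU for each row.

    Assumption: efficiency is defined as speedup divided by the GPU-count
    ratio, both taken relative to the first (single-GPU) entry.
    """
    rows = []
    for n, t, c in zip(gpu_counts, throughput, cpu_pct):
        speedup = t / throughput[0]
        efficiency = speedup / (n / gpu_counts[0])
        rows.append({
            "gpus": n,
            "speedup": round(speedup, 2),
            "efficiency": round(efficiency, 2),
            "cpu_per_gpu_pct": round(c / n, 2),
        })
    return rows


if __name__ == "__main__":
    for row in scaling_report(gpus, normalized_throughput, cpu_utilization_pct):
        print(row)
```

Under these assumptions, four GPUs deliver about 3.43 times the single-GPU throughput, or roughly 86% scaling efficiency, while CPU utilization per GPU falls from 23% to about 18%.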
The number of images processed per second increases as GPUs are added to the server. Throughput is normalized to the single-GPU result of roughly 1,000 images/second, and it keeps growing as the GPU count increases. To compare DPIO and GRID vGPU performance, we configure one vGPU per VM in both modes:
| MNIST Workload (lower is better) | DPIO | GRID vGPU |
| --- | --- | --- |
| Normalized training times | 1.1 | 1.03 |

| CIFAR-10 Workload (higher is better) | DPIO | GRID vGPU |
| --- | --- | --- |
| Normalized images/second | 1.1 | 0.83 |
With one vGPU per VM, DPIO and GRID vGPU modes deliver more or less the same performance. In DPIO mode, a VM can be configured with all the available GPUs on the host, whereas in GRID vGPU mode a VM can be configured with at most one GPU. We can therefore compare four VMs, each running the same job, against a single VM that uses all four GPUs on the host in DPIO mode:
| CIFAR-10 Workload | DPIO (one VM, four GPUs) | DPIO (four VMs) | GRID vGPU (four VMs) |
| --- | --- | --- | --- |
| Normalized images/second | 1.1 | 0.96 | 0.94 |
| CPU utilization | 73% | 69% | 67% |
Workloads that need low latency or shorter training times should run in virtual machines configured in multi-GPU DPIO mode. Because the GPUs are dedicated to those virtual machines, the other virtual machines on the host cannot access them during that time. Workloads that can tolerate longer latencies or training times can run in virtual machines configured with one GPU in GRID vGPU mode and still enjoy the benefits of virtualization.
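
The recommendation above can be written down as a small decision helper. This is only a sketch of the rule of thumb stated in the text: the `Workload` class, its fields, and the function `suggest_gpu_mode` are hypothetical names introduced for illustration and do not correspond to any VMware or NVIDIA API.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical description of a training job (illustrative only)."""
    name: str
    latency_sensitive: bool
    needs_short_training_time: bool


def suggest_gpu_mode(workload: Workload) -> str:
    """Apply the rule of thumb from the text to pick a GPU mode.

    Jobs that need low latency or short training times go to multi-GPU
    DPIO; everything else goes to GRID vGPU with one GPU per VM.
    """
    if workload.latency_sensitive or workload.needs_short_training_time:
        # GPUs are dedicated to this VM and unavailable to other VMs.
        return "DPIO (multi-GPU passthrough)"
    # Keeps the virtualization benefits at the cost of longer run times.
    return "GRID vGPU (one GPU per VM)"


if __name__ == "__main__":
    jobs = [
        Workload("interactive-inference", True, False),
        Workload("overnight-training", False, False),
    ]
    for job in jobs:
        print(job.name, "->", suggest_gpu_mode(job))
```

A real placement decision would also weigh how many GPUs the host has free and how many virtual machines need to share them, which this sketch deliberately leaves out.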