CUDA device synchronization

Before we can use CUDA streams, we need to understand the notion of device synchronization. This is an operation in which the host blocks any further execution until all operations issued to the GPU (memory transfers and kernel executions) have completed. It is required so that operations that depend on the results of prior operations are not executed out of order; for example, it ensures that a CUDA kernel has finished running before the host tries to read its output.

In CUDA C, device synchronization is performed with the cudaDeviceSynchronize function, which blocks the host until all previously issued GPU operations have completed. cudaDeviceSynchronize is so fundamental that it is usually one of the first topics covered in books on CUDA C. We haven't seen it yet because PyCUDA has been calling it for us automatically, behind the scenes, as needed. Let's look at an example of CUDA C code to see how this is done manually:

// Copy an array of floats from the host to the device.
cudaMemcpy(device_array, host_array, size_of_array*sizeof(float), cudaMemcpyHostToDevice);
// Block execution until memory transfer to device is complete.
cudaDeviceSynchronize();
// Launch CUDA kernel: grid size (number of blocks) first, block size (threads per block) second.
Some_CUDA_Kernel <<< grid_size, block_size >>> (device_array, size_of_array);
// Block execution until GPU kernel function returns.
cudaDeviceSynchronize();
// Copy output of kernel to host.
cudaMemcpy(host_array, device_array, size_of_array*sizeof(float), cudaMemcpyDeviceToHost);
// Block execution until memory transfer to host is complete.
cudaDeviceSynchronize();
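For reference, the fragment above can be expanded into a complete program along the following lines. This is a minimal sketch, not the book's own code: the body given to Some_CUDA_Kernel (adding 1.0f to each element), the helpers check and run_pipeline, and the array size are assumptions made for illustration. It also checks the value each call returns, since a kernel launch returns to the host immediately and errors raised while the kernel runs only surface at a later call such as cudaDeviceSynchronize.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
static void check(cudaError_t err, const char *what)
{
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

// Hypothetical stand-in for Some_CUDA_Kernel: adds 1.0f to every element.
__global__ void Some_CUDA_Kernel(float *device_array, int size_of_array)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < size_of_array)
        device_array[i] += 1.0f;
}

// Copy to the device, run the kernel, copy back, synchronizing after each step.
void run_pipeline(float *host_array, int size_of_array)
{
    size_t bytes = size_of_array * sizeof(float);
    float *device_array = nullptr;
    check(cudaMalloc(&device_array, bytes), "cudaMalloc");

    check(cudaMemcpy(device_array, host_array, bytes, cudaMemcpyHostToDevice), "copy to device");
    check(cudaDeviceSynchronize(), "sync after copy to device");

    int block_size = 256;                                           // threads per block
    int grid_size = (size_of_array + block_size - 1) / block_size;  // blocks per grid
    // The launch returns to the host immediately; the kernel runs asynchronously.
    Some_CUDA_Kernel<<<grid_size, block_size>>>(device_array, size_of_array);
    check(cudaGetLastError(), "kernel launch");           // e.g. an invalid launch configuration
    check(cudaDeviceSynchronize(), "sync after kernel");  // errors raised while the kernel ran

    check(cudaMemcpy(host_array, device_array, bytes, cudaMemcpyDeviceToHost), "copy to host");
    check(cudaDeviceSynchronize(), "sync after copy to host");

    check(cudaFree(device_array), "cudaFree");
}

int main()
{
    const int size_of_array = 1 << 20;
    float *host_array = static_cast<float *>(std::malloc(size_of_array * sizeof(float)));
    assert(host_array != nullptr);
    for (int i = 0; i < size_of_array; ++i)
        host_array[i] = static_cast<float>(i);

    run_pipeline(host_array, size_of_array);
    std::printf("host_array[10] = %.1f\n", host_array[10]);  // prints "host_array[10] = 11.0"
    assert(host_array[10] == 11.0f);
    std::free(host_array);
    return 0;
}
```

Inside the triple angle brackets the grid dimension (number of blocks) comes first and the block dimension (threads per block) second, which is why the launch reads <<<grid_size, block_size>>>.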

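In this block of code, we synchronize with the device directly after every GPU operation. (Strictly speaking, a plain cudaMemcpy between host and device already blocks the host in most cases, and operations issued to the default stream run in order, so some of these calls are redundant; they are written out here to make each synchronization point explicit.) If we only need to call a single CUDA kernel at a time, as here, this is fine. But if we want to concurrently launch multiple independent kernels and memory operations that work on different arrays of data, synchronizing across the entire device would be inefficient, since every independent task would have to wait for all the others to finish. In that case, we should synchronize on individual streams instead. We'll see how to do this next.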
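As a preview of the stream-based approach, the following minimal sketch queues copies and kernel launches for several independent arrays, each in its own stream, and then waits on each stream with cudaStreamSynchronize rather than on the whole device with cudaDeviceSynchronize. The kernel scale, the helpers check and scale_in_streams, and the choice of four streams are assumptions for illustration. The host buffers are allocated with cudaMallocHost (pinned memory), because asynchronous copies from ordinary pageable memory generally cannot overlap with other work.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
static void check(cudaError_t err, const char *what)
{
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

// Hypothetical kernel: multiplies every element by a scalar.
__global__ void scale(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= factor;
}

const int NUM_STREAMS = 4;  // assumed number of independent arrays

// Scales NUM_STREAMS independent arrays of n floats, one stream per array.
// Each host[s] should be pinned memory (cudaMallocHost) so that the
// asynchronous copies can overlap with work in other streams.
void scale_in_streams(float *host[], int n, float factor)
{
    size_t bytes = n * sizeof(float);
    int block_size = 256;
    int grid_size = (n + block_size - 1) / block_size;
    cudaStream_t streams[NUM_STREAMS];
    float *device[NUM_STREAMS];

    // Allocate up front: device allocations can implicitly synchronize.
    for (int s = 0; s < NUM_STREAMS; ++s) {
        check(cudaStreamCreate(&streams[s]), "cudaStreamCreate");
        check(cudaMalloc(&device[s], bytes), "cudaMalloc");
    }

    // Queue copy -> kernel -> copy on each stream. These calls return
    // immediately; work within one stream still runs in issue order.
    for (int s = 0; s < NUM_STREAMS; ++s) {
        check(cudaMemcpyAsync(device[s], host[s], bytes,
                              cudaMemcpyHostToDevice, streams[s]), "copy to device");
        scale<<<grid_size, block_size, 0, streams[s]>>>(device[s], n, factor);
        check(cudaGetLastError(), "kernel launch");
        check(cudaMemcpyAsync(host[s], device[s], bytes,
                              cudaMemcpyDeviceToHost, streams[s]), "copy to host");
    }

    // Wait on each stream individually rather than on the whole device.
    for (int s = 0; s < NUM_STREAMS; ++s)
        check(cudaStreamSynchronize(streams[s]), "cudaStreamSynchronize");

    for (int s = 0; s < NUM_STREAMS; ++s) {
        check(cudaFree(device[s]), "cudaFree");
        check(cudaStreamDestroy(streams[s]), "cudaStreamDestroy");
    }
}

int main()
{
    const int n = 1 << 16;
    float *host[NUM_STREAMS];
    for (int s = 0; s < NUM_STREAMS; ++s) {
        check(cudaMallocHost(&host[s], n * sizeof(float)), "cudaMallocHost");
        for (int i = 0; i < n; ++i)
            host[s][i] = static_cast<float>(s);
    }

    scale_in_streams(host, n, 2.0f);

    for (int s = 0; s < NUM_STREAMS; ++s) {
        assert(host[s][n - 1] == 2.0f * s);
        std::printf("array %d: first element = %.1f\n", s, host[s][0]);
        check(cudaFreeHost(host[s]), "cudaFreeHost");
    }
    return 0;
}
```

Allocation and deallocation are kept out of the issuing loop because device memory allocation and cudaFree may implicitly synchronize, which would serialize the streams. In a real program, the host could start using host[s] as soon as its own stream has synchronized, while the other streams are still running.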
