Streams, Events, Contexts, and Concurrency

In the prior chapters, we saw that there are two primary operations we perform from the host when interacting with the GPU: 

  • Copying memory data to and from the GPU
  • Launching kernel functions

We know that within a single kernel, there is one level of concurrency among its many threads; however, another level of concurrency is available to us, across multiple kernels and GPU memory operations. This means that we can launch multiple memory and kernel operations at once, without waiting for each operation to finish before issuing the next. At the same time, we have to be somewhat organized to ensure that all inter-dependent operations are synchronized: we shouldn't launch a particular kernel until its input data has been fully copied to device memory, and we shouldn't copy the output of a launched kernel back to the host until the kernel has finished executing. 

To this end, we have what are known as CUDA streams—a stream is a sequence of operations that run in order on the GPU. By itself, a single stream is of little use—the point is to gain concurrency over GPU operations issued by the host by using multiple streams. This means that we should interleave launches of GPU operations that correspond to different streams, in order to exploit this concurrency.
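To make this concrete, here is a minimal sketch in CUDA C of interleaving copy–kernel–copy sequences across several streams. The kernel `scale`, the array size, and the stream count are illustrative assumptions, not from the text; error checking is omitted for brevity. Within each stream the three operations run in order, but operations in different streams are free to overlap.

```cuda
// Sketch: interleaving independent work across multiple CUDA streams.
#include <cuda_runtime.h>

__global__ void scale(float *x, int n) {           // hypothetical example kernel
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= 2.0f;
}

int main() {
    const int N = 1 << 20, NUM_STREAMS = 4;        // illustrative sizes
    float *h[NUM_STREAMS], *d[NUM_STREAMS];
    cudaStream_t streams[NUM_STREAMS];

    for (int s = 0; s < NUM_STREAMS; ++s) {
        cudaStreamCreate(&streams[s]);
        cudaMallocHost(&h[s], N * sizeof(float));  // pinned host memory, needed for truly async copies
        cudaMalloc(&d[s], N * sizeof(float));
    }

    // Interleave the launches: each stream's copy->kernel->copy sequence
    // is ordered, but the streams themselves may overlap with one another.
    for (int s = 0; s < NUM_STREAMS; ++s) {
        cudaMemcpyAsync(d[s], h[s], N * sizeof(float),
                        cudaMemcpyHostToDevice, streams[s]);
        scale<<<(N + 255) / 256, 256, 0, streams[s]>>>(d[s], N);
        cudaMemcpyAsync(h[s], d[s], N * sizeof(float),
                        cudaMemcpyDeviceToHost, streams[s]);
    }

    cudaDeviceSynchronize();  // wait for all streams before using results on the host
    return 0;
}
```

Note that because each stream preserves its internal order, the dependency requirement above—kernel after its input copy, output copy after the kernel—is satisfied automatically within a stream.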

We will be covering this notion of streams extensively in this chapter. Additionally, we will look at events, a feature of streams that is used to precisely time kernels and to indicate to the host which operations have completed within a given stream.
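As a preview, the following fragment sketches both uses of events with the CUDA runtime API: recording a pair of events around a kernel launch to time it, and querying an event to check progress without blocking. The kernel name, launch configuration, and stream are assumed placeholders.

```cuda
// Sketch: using CUDA events to time a kernel and to query stream progress.
cudaEvent_t start, stop;
cudaEventCreate(&start);
cudaEventCreate(&stop);

cudaEventRecord(start, stream);                   // mark the point before the kernel
scale<<<grid, block, 0, stream>>>(d_x, N);        // assumed kernel and launch parameters
cudaEventRecord(stop, stream);                    // mark the point after the kernel

// Non-blocking check: cudaSuccess means everything before `stop` has finished,
// cudaErrorNotReady means the stream is still working.
cudaError_t status = cudaEventQuery(stop);

cudaEventSynchronize(stop);                       // block the host until `stop` is reached
float ms = 0.0f;
cudaEventElapsedTime(&ms, start, stop);           // elapsed time in milliseconds
```

Because the events are recorded into the same stream as the kernel, the measured interval brackets exactly the work issued between them.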

Finally, we will briefly look at CUDA contexts. A context can be thought of as analogous to a process in your operating system, in that the GPU keeps each context's data and kernel code walled off and encapsulated away from the other contexts currently existing on the GPU. We will see the basics of this near the end of the chapter.
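For a first taste of what explicit context management looks like, here is a sketch using the CUDA driver API, which is where contexts are created and destroyed directly. These are standard driver API calls; error checking is omitted, and the device index 0 is an assumption.

```cuda
// Sketch: explicitly creating and destroying a CUDA context (driver API).
#include <cuda.h>

cuInit(0);                    // initialize the driver API
CUdevice dev;
cuDeviceGet(&dev, 0);         // assume we use device 0

CUcontext ctx;
cuCtxCreate(&ctx, 0, dev);    // new context, made current on this host thread
// ... memory allocations and kernel launches now belong to `ctx`,
// walled off from any other context on the GPU ...
cuCtxDestroy(ctx);            // tears down the context and frees everything it owned
```

We will return to these calls, and to synchronizing within a context, near the end of the chapter.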

The following are the learning outcomes for this chapter:

  • Understanding the concepts of device and stream synchronization
  • Learning how to effectively use streams to organize concurrent GPU operations
  • Learning how to effectively use CUDA events
  • Understanding CUDA contexts
  • Learning how to explicitly synchronize within a given context
  • Learning how to explicitly create and destroy a CUDA context
  • Learning how to use contexts to allow for GPU usage among multiple processes and threads on the host