Synchronizing the current context

We're going to see how to explicitly synchronize the device within a context from Python, just as in CUDA C. This is one of the most fundamental skills in CUDA C, and it is covered in the first or second chapter of most other books on the topic. So far, we have been able to avoid it because PyCUDA has performed most synchronizations for us automatically through pycuda.gpuarray functions such as to_gpu or get; otherwise, synchronization was handled by streams, as in the to_gpu_async and get_async functions we saw at the beginning of this chapter.

We will start humbly by modifying the program we wrote in Chapter 3, Getting Started with PyCUDA, which generates an image of the Mandelbrot set, so that it uses explicit context synchronization. (The original is available as the file gpu_mandelbrot0.py under the 3 directory in the repository.)

We won't get any performance gains over our original Mandelbrot program here; the point of this exercise is simply to help us understand CUDA contexts and GPU synchronization.

Looking at the header, we, of course, see the import pycuda.autoinit line. We can access the current context object with pycuda.autoinit.context, and we can synchronize in our current context by calling the pycuda.autoinit.context.synchronize() function.
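As a minimal sketch of those two steps (wrapped in a try/except here so it degrades gracefully when PyCUDA or a CUDA device is unavailable), accessing and synchronizing the current context looks like this:

```python
try:
    import pycuda.autoinit  # creates and activates a context on the first CUDA device

    ctx = pycuda.autoinit.context  # the current context object
    ctx.synchronize()              # block until all work queued in this context finishes
    synchronized = True
except Exception:
    # PyCUDA is not installed or no CUDA device is present.
    synchronized = False

print(synchronized)
```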

Now let's modify the gpu_mandelbrot function to handle explicit synchronization. The first GPU-related line we see is this:

mandelbrot_lattice_gpu = gpuarray.to_gpu(mandelbrot_lattice)

We can now change this to be explicitly synchronized. We can copy to the GPU asynchronously with to_gpu_async, and then synchronize as follows:

mandelbrot_lattice_gpu = gpuarray.to_gpu_async(mandelbrot_lattice)
pycuda.autoinit.context.synchronize()

The next line allocates memory on the GPU with the gpuarray.empty function. Memory allocation in CUDA is, by the nature of the GPU architecture, automatically synchronized; there is no asynchronous memory allocation equivalent. Hence, we keep this line as it was before.

Memory allocation in CUDA is always synchronized!
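To illustrate (a hedged sketch that falls back to a host allocation when PyCUDA or a CUDA device is unavailable): because gpuarray.empty blocks until the allocation completes, no synchronize() call is needed after it.

```python
import numpy as np

try:
    import pycuda.autoinit
    import pycuda.gpuarray as gpuarray

    # Allocation is synchronous: by the time this call returns,
    # the device memory exists and is ready for use.
    buf = gpuarray.empty((512, 512), dtype=np.float32)
    shape = buf.shape
except Exception:
    # No GPU available: stand in with a host allocation of the same shape.
    shape = np.empty((512, 512), dtype=np.float32).shape

print(shape)
```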

We now see the next two lines—our Mandelbrot kernel is launched with an invocation to mandel_ker, and we copy the contents of our Mandelbrot gpuarray object with an invocation to get. We synchronize after the kernel launch, switch get to get_async, and finally synchronize one last time:

mandel_ker( mandelbrot_lattice_gpu, mandelbrot_graph_gpu, np.int32(max_iters), np.float32(upper_bound))
pycuda.autoinit.context.synchronize()
mandelbrot_graph = mandelbrot_graph_gpu.get_async()
pycuda.autoinit.context.synchronize()

We can now run this, and it will produce a Mandelbrot image to disk, exactly as in Chapter 3, Getting Started with PyCUDA.

(This example is also available as gpu_mandelbrot_context_sync.py in the repository.)
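Putting the pieces together, here is a condensed sketch of the explicitly synchronized flow—not the repository file itself. The kernel follows the ElementwiseKernel from Chapter 3, the lattice helper is a simplified stand-in, and a pure-NumPy fallback computes the same graph when no CUDA device is present:

```python
import numpy as np

def make_lattice(width, height, real_low=-2, real_high=2, imag_low=-2, imag_high=2):
    # Build the 2-D complex lattice of points to test, as in Chapter 3.
    real_vals = np.linspace(real_low, real_high, width, dtype=np.complex64)
    imag_vals = 1j * np.linspace(imag_high, imag_low, height).astype(np.complex64)
    return (real_vals[np.newaxis, :] + imag_vals[:, np.newaxis]).astype(np.complex64)

def gpu_mandelbrot_sync(lattice, max_iters, upper_bound):
    # GPU path with explicit context synchronization after each asynchronous step.
    import pycuda.autoinit
    import pycuda.gpuarray as gpuarray
    from pycuda.elementwise import ElementwiseKernel

    mandel_ker = ElementwiseKernel(
        "pycuda::complex<float> *lattice, float *graph, int max_iters, float upper_bound",
        """
        graph[i] = 1;
        pycuda::complex<float> c = lattice[i];
        pycuda::complex<float> z(0, 0);
        for (int j = 0; j < max_iters; j++) {
            z = z * z + c;
            if (abs(z) > upper_bound) { graph[i] = 0; break; }
        }
        """,
        "mandel_ker")

    lattice_gpu = gpuarray.to_gpu_async(lattice)
    pycuda.autoinit.context.synchronize()   # wait for the async host-to-device copy
    graph_gpu = gpuarray.empty(lattice.shape, dtype=np.float32)  # allocation: always synchronous
    mandel_ker(lattice_gpu, graph_gpu, np.int32(max_iters), np.float32(upper_bound))
    pycuda.autoinit.context.synchronize()   # wait for the kernel to finish
    graph = graph_gpu.get_async()
    pycuda.autoinit.context.synchronize()   # wait for the device-to-host copy
    return graph

def cpu_mandelbrot(lattice, max_iters, upper_bound):
    # Pure-NumPy fallback so the sketch runs without a CUDA device.
    graph = np.ones(lattice.shape, dtype=np.float32)
    z = np.zeros(lattice.shape, dtype=np.complex64)
    with np.errstate(over="ignore", invalid="ignore"):
        for _ in range(max_iters):
            z = z * z + lattice
            graph[np.abs(z) > upper_bound] = 0  # mark escaped points; they stay 0
    return graph

lattice = make_lattice(64, 64)
try:
    mandelbrot_graph = gpu_mandelbrot_sync(lattice, 256, 2.0)
except Exception:
    mandelbrot_graph = cpu_mandelbrot(lattice, 256, 2.0)
```

Both paths produce the same 0/1 membership graph; the GPU branch simply makes each synchronization point visible, exactly where the text placed the synchronize() calls.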
