Working with Compiled GPU Code

Throughout the course of this book, we have generally relied on the PyCUDA library to compile and link our inline CUDA-C code with our Python code automatically, using just-in-time compilation. We might recall, however, that the compilation process can sometimes take a while. In Chapter 3, Getting Started With PyCUDA, we even saw in detail how the compilation process can contribute to slowdowns, and how it can be somewhat arbitrary as to when inline code will be compiled and retained. Depending on the application, this may be inconvenient and cumbersome, or even unacceptable in the case of a real-time system.

To this end, we will finally see how to use pre-compiled GPU code from Python. In particular, we will look at three distinct ways to do this. First, we will look at how we can do this by writing a host-side CUDA-C function that indirectly launches a CUDA kernel; this method involves invoking the host-side function with the standard Python Ctypes library. Second, we will compile our kernel into what is known as a PTX module, which is effectively a DLL file containing compiled GPU code. We can then load this file with PyCUDA and launch our kernel directly. Finally, we will end this chapter by looking at how to write our own full-on Ctypes interface to the CUDA Driver API. We can then use the appropriate functions from the Driver API to load our PTX file and launch a kernel.
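As a preview of the first approach, the following is a minimal sketch of how Ctypes loads a shared library and calls a compiled C function from Python. Here we call `cos` from the standard C math library (`libm`) rather than one of our own CUDA-C wrappers, and we assume a Unix-like system where `find_library` can locate `libm`; later in the chapter the same pattern will be applied to functions we compile ourselves.

```python
import ctypes
import ctypes.util

# Locate and load the standard C math library.
# (Assumes a Unix-like system; on Windows we would load a .dll instead.)
libm = ctypes.CDLL(ctypes.util.find_library("m"))

# Declare the argument and return types, just as we will later do
# for our host-side CUDA-C wrapper functions.
libm.cos.argtypes = [ctypes.c_double]
libm.cos.restype = ctypes.c_double

print(libm.cos(0.0))  # 1.0
```

Note that declaring `argtypes` and `restype` is essential: without them, Ctypes defaults to treating arguments and return values as integers, which silently corrupts floating-point data.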

The learning outcomes for this chapter are as follows:

  • Launching compiled (host-side) code with the Ctypes module
  • Using host-side CUDA C wrappers with Ctypes to launch a kernel from Python
  • How to compile a CUDA C module into a PTX file
  • How to load a PTX module into PyCUDA to launch pre-compiled kernels
  • How to write your own custom Python interface to the CUDA Driver API