Working with Compiled GPU Code

Throughout the course of this book, we have generally relied on the PyCUDA library to compile and link our inline CUDA-C code with our Python code automatically, using just-in-time compilation. We might recall, however, that the compilation process can sometimes take a while. In Chapter 3, Getting Started With PyCUDA, we even saw in detail how the compilation process can contribute to slowdowns, and how it can be somewhat arbitrary as to when inline code will be compiled and retained. Depending on the application, this may be inconvenient and cumbersome, or even unacceptable in the case of a real-time system.

To this end, we will finally see how to use pre-compiled GPU code from Python. In particular, we will look at three distinct ways to do this. First, we will look at how we can do this by writing a host-side CUDA-C function that indirectly launches a CUDA kernel; this method involves invoking the host-side function with the standard Python Ctypes library. Second, we will compile our kernel into what is known as a PTX module, which is effectively a DLL file containing compiled GPU code. We can then load this file with PyCUDA and launch our kernel directly. Finally, we will end this chapter by looking at how to write our own full-on Ctypes interface to the CUDA Driver API. We can then use the appropriate functions from the Driver API to load our PTX file and launch a kernel.
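As a preview of the first approach, the following is a minimal sketch of how Ctypes loads a shared library and calls a compiled C function from Python. Here we call `cos` from the standard C math library (`libm`) rather than one of our own CUDA-C wrappers, and we assume a Unix-like system where `find_library` can locate `libm`; later in the chapter the same pattern will be applied to functions we compile ourselves.

```python
import ctypes
import ctypes.util

# Locate and load the standard C math library.
# (Assumes a Unix-like system; on Windows we would load a .dll instead.)
libm = ctypes.CDLL(ctypes.util.find_library("m"))

# Declare the argument and return types, just as we will later do
# for our host-side CUDA-C wrapper functions.
libm.cos.argtypes = [ctypes.c_double]
libm.cos.restype = ctypes.c_double

print(libm.cos(0.0))  # 1.0
```

Note that declaring `argtypes` and `restype` is essential: without them, Ctypes defaults to treating arguments and return values as integers, which silently corrupts floating-point data.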

The learning outcomes for this chapter are as follows:

  • Launching compiled (host-side) code with the Ctypes module
  • Using host-side CUDA C wrappers with Ctypes to launch a kernel from Python
  • How to compile a CUDA C module into a PTX file
  • How to load a PTX module into PyCUDA to launch pre-compiled kernels
  • How to write your own custom Python interface to the CUDA Driver API