
We started this chapter with a brief overview of the Python Ctypes library, which is used to interface directly with compiled binary code, particularly dynamic libraries written in C/C++. We then wrote a C-based wrapper in CUDA-C that launches a CUDA kernel, and used it to launch that kernel indirectly from Python by writing a Ctypes interface to the wrapper function. Next, we learned how to compile a CUDA kernel into a PTX module binary, which can be thought of as a DLL that contains CUDA kernel functions, and saw how to load a PTX file and launch pre-compiled kernels with PyCUDA. Finally, we wrote a collection of Ctypes wrappers for the CUDA Driver API and used them to perform basic GPU operations, including launching a pre-compiled kernel from a PTX file onto the GPU.
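As a brief reminder of the Ctypes pattern summarized above, the following is a minimal sketch rather than code from this chapter. It assumes a POSIX system where the C standard library can be located, and it calls the standard C functions `abs` and `strlen` in place of a CUDA wrapper; the variable names are illustrative only. The same three steps (load the library, declare the argument and return types, call the function) apply when wrapping a compiled CUDA-C launcher.

```python
import ctypes
import ctypes.util

# Assumption: a POSIX system. find_library may return None, in which case
# CDLL(None) exposes the symbols already loaded into the current process.
libc_path = ctypes.util.find_library("c")
libc = ctypes.CDLL(libc_path)

# Declare the C signatures so Ctypes converts arguments and results correctly.
libc.abs.argtypes = [ctypes.c_int]
libc.abs.restype = ctypes.c_int

libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

# Call the compiled functions as if they were ordinary Python functions.
abs_result = libc.abs(-42)
len_result = libc.strlen(b"CUDA")
```

Declaring `argtypes` and `restype` explicitly is a design choice worth keeping: without them, Ctypes assumes every function returns a C `int`, which can silently truncate pointers and other wide return values.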

We will now proceed to what will arguably be the most technical chapter of this book: Chapter 11, Performance Optimization in CUDA. In this chapter, we will learn about some of the technical ins and outs of NVIDIA GPUs that will help us improve the performance of our applications.
