Debugging and Profiling Your CUDA Code

In this chapter, we will learn how to debug and profile our GPU code using several different methods and tools. While we can easily debug pure Python code using IDEs such as Spyder and PyCharm, we can't use these tools to debug the GPU code itself, which is written in CUDA-C, with PyCUDA providing only the interface. The first and easiest method for debugging a CUDA kernel is to use printf statements, which we can call directly inside a CUDA kernel to print to standard output. We will see how to use printf in the context of CUDA and how to apply it effectively for debugging.
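As a rough preview of the printf approach, the following minimal sketch embeds a CUDA-C kernel in a Python string and launches it through PyCUDA. The names kernel_source, hello_kernel, and run_hello are hypothetical and chosen only for illustration, and the launch step assumes that PyCUDA is installed and a CUDA-capable GPU is available.

```python
# CUDA-C source for a kernel in which every thread prints its own indices.
# A raw string keeps the "\n" escape intact for the CUDA compiler.
kernel_source = r'''
#include <stdio.h>

__global__ void hello_kernel()
{
    printf("Hello from thread %d in block %d\n", threadIdx.x, blockIdx.x);
}
'''


def run_hello():
    """Compile and launch the kernel (needs PyCUDA and a CUDA GPU)."""
    import pycuda.autoinit  # noqa: F401  (creates a CUDA context on import)
    import pycuda.driver as drv
    from pycuda.compiler import SourceModule

    module = SourceModule(kernel_source)
    hello = module.get_function("hello_kernel")

    # Launch 2 blocks of 4 threads each, so 8 lines should be printed.
    hello(block=(4, 1, 1), grid=(2, 1, 1))

    # Device-side printf output is buffered; synchronizing flushes it.
    drv.Context.synchronize()


if __name__ == "__main__":
    run_hello()
```

The order in which the threads' lines appear is not guaranteed, which is itself a useful reminder when reading printf output from a kernel.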

Next, we will fill in some of the gaps in our knowledge of CUDA-C programming so that we can write CUDA programs directly within the NVIDIA Nsight IDE, which will allow us to make test cases in CUDA-C for some of the code we have been writing. We will take a look at how to compile CUDA-C programs, both from the command line with nvcc and with the Nsight IDE. We will then see how to debug within Nsight and use Nsight to understand the CUDA lockstep property. Finally, we will have an overview of the NVIDIA command-line profiler and the Visual Profiler for profiling our code.

The learning outcomes for this chapter include the following:

Using printf effectively as a debugging tool for CUDA kernels
Writing complete CUDA-C programs outside of Python, especially for creating test cases for debugging
Compiling CUDA-C programs on the command line with the nvcc compiler
Developing and debugging CUDA programs with the NVIDIA Nsight IDE
Understanding the CUDA warp lockstep property and why we should avoid branch divergence within a single CUDA warp
Using the NVIDIA command-line profiler and the Visual Profiler effectively for GPU code
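To give a feel for why branch divergence within a warp is costly, here is a deliberately simplified toy model in plain Python. It assumes that the threads of a warp execute in lockstep, so that when threads in the same warp take different sides of an if/else, the warp runs both sides one after the other. The function warp_steps and the step costs are hypothetical and are not measurements of real hardware.

```python
WARP_SIZE = 32  # threads per warp on current NVIDIA GPUs


def warp_steps(thread_ids, takes_if, if_cost, else_cost):
    """Toy estimate of the steps one warp spends on an if/else.

    Under the lockstep assumption, each branch that at least one
    thread takes is executed once by the whole warp.
    """
    took_if = any(takes_if(t) for t in thread_ids)
    took_else = any(not takes_if(t) for t in thread_ids)
    return if_cost * took_if + else_cost * took_else


warp = range(WARP_SIZE)

# Every thread takes the same branch: only that branch is executed.
uniform = warp_steps(warp, lambda t: True, if_cost=10, else_cost=10)

# Even and odd threads disagree: both branches are executed in turn.
divergent = warp_steps(warp, lambda t: t % 2 == 0, if_cost=10, else_cost=10)

print(uniform, divergent)  # 10 20
```

In this model the divergent warp takes twice as long as the uniform one, even though each individual thread only needs the result of one branch.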
