Debugging and Profiling Your CUDA Code

In this chapter, we will learn how to debug and profile our GPU code using several different methods and tools. While we can easily debug pure Python code using IDEs such as Spyder and PyCharm, we can't use these tools to debug the GPU code itself, since that code is written in CUDA-C, with PyCUDA only providing an interface to it. The first and easiest method for debugging a CUDA kernel is to use printf statements, which we can call directly from within a CUDA kernel to print to the standard output. We will see how to use printf in the context of CUDA and how to apply it effectively for debugging.
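As a brief preview of the printf approach, the following is a minimal sketch of a standalone CUDA-C program, not code taken from this chapter. The kernel name hello_kernel and the launch configuration of 2 blocks with 4 threads each are assumptions chosen purely for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel, for illustration only: each thread prints its own indices.
__global__ void hello_kernel()
{
    printf("Hello from thread %d in block %d\n", threadIdx.x, blockIdx.x);
}

int main()
{
    // Launch 2 blocks of 4 threads each (an arbitrary choice for this sketch).
    hello_kernel<<<2, 4>>>();

    // Output from printf inside a kernel is buffered on the device;
    // synchronizing with the device flushes it to the host's standard output.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

Assuming the file is saved as hello.cu, it can be compiled with nvcc hello.cu -o hello. Each of the eight threads prints one line, and the order of lines from different blocks is not guaranteed.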

Next, we will fill in some of the gaps in our CUDA-C programming so that we can write CUDA programs directly within the NVIDIA Nsight IDE, which will allow us to create test cases in CUDA-C for some of the code we have been writing. We will take a look at how to compile CUDA-C programs, both from the command line with nvcc and with the Nsight IDE. We will then see how to debug within Nsight and use Nsight to understand the CUDA lockstep property. Finally, we will have an overview of the NVIDIA command-line and Visual Profilers for profiling our code.

The learning outcomes for this chapter include the following:

  • Using printf effectively as a debugging tool for CUDA kernels
  • Writing complete CUDA-C programs outside of Python, especially for creating test cases for debugging
  • Compiling CUDA-C programs on the command line with the nvcc compiler
  • Developing and debugging CUDA programs with the NVIDIA Nsight IDE
  • Understanding the CUDA warp lockstep property and why we should avoid branch divergence within a single CUDA warp
  • Using the NVIDIA command-line and Visual Profilers effectively for GPU code
