CUDA Thread Programming

CUDA has a hierarchical thread architecture that lets us control CUDA threads in groups. Understanding how these groups execute in parallel on a GPU helps you write parallel code and achieve better performance. In this chapter, we will cover CUDA thread operations and their relationship with GPU resources. As practical experience, we will investigate the parallel reduction algorithm and see how CUDA code can be optimized by applying several optimization strategies.
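To make the hierarchy concrete before we dive in, here is a minimal sketch (the kernel name print_index and the launch configuration are illustrative, not from the chapter): a grid contains blocks, each block contains threads, and every thread can compute a unique global index from its block and thread IDs:

```cuda
#include <cstdio>

// Each thread derives a unique global index from its block and thread IDs.
__global__ void print_index(void)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    printf("block %d, thread %d -> global index %d\n",
           blockIdx.x, threadIdx.x, idx);
}

int main(void)
{
    print_index<<<2, 4>>>();   // a grid of 2 blocks, 4 threads per block
    cudaDeviceSynchronize();   // wait for the kernel's printf output
    return 0;
}
```

This block-and-thread indexing pattern is the foundation for everything that follows in this chapter, from occupancy to grid-stride loops.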

In this chapter, you will learn how CUDA threads operate on a GPU: parallel and concurrent thread execution, warp execution, memory bandwidth issues, control overhead, SIMD operations, and so on.
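As a first taste of warp execution, here is a minimal sketch (the kernel name warp_info is illustrative, not from the chapter) showing that the threads of a block are grouped into warps of warpSize threads, which the GPU executes in SIMD fashion:

```cuda
#include <cstdio>

// Threads within a block are scheduled in groups of warpSize (32 on current GPUs).
__global__ void warp_info(void)
{
    int warp_id = threadIdx.x / warpSize;  // warp index within the block
    int lane_id = threadIdx.x % warpSize;  // lane (thread) index within the warp
    printf("thread %2d -> warp %d, lane %2d\n", threadIdx.x, warp_id, lane_id);
}

int main(void)
{
    warp_info<<<1, 64>>>();    // one block of 64 threads: two full warps
    cudaDeviceSynchronize();
    return 0;
}
```

Keeping this warp grouping in mind will matter later when we discuss warp divergence and warp synchronous programming.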

The following topics will be covered in this chapter:

  • Hierarchical CUDA thread operations
  • Understanding CUDA occupancy
  • Data sharing across multiple CUDA threads
  • Identifying an application's performance limiter
  • Minimizing the CUDA warp divergence effect
  • Increasing memory utilization and grid-stride loops
  • Cooperative Groups for flexible thread handling
  • Warp synchronous programming
  • Low-/mixed-precision operations