CUDA Thread Programming

CUDA has a hierarchical thread architecture that lets us control CUDA threads in groups. Understanding how these groups execute in parallel on a GPU helps you write parallel code and achieve better performance. In this chapter, we will cover CUDA thread operations and their relationship with GPU resources. As practical experience, we will investigate the parallel reduction algorithm and see how CUDA code can be optimized by applying several optimization strategies.
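To make the hierarchy concrete before we dive in, here is a minimal sketch (the kernel name print_index and the launch configuration are illustrative, not from the chapter): a grid contains blocks, each block contains threads, and every thread can compute a unique global index from its block and thread IDs:

```cuda
#include <cstdio>

// Each thread derives a unique global index from its block and thread IDs.
__global__ void print_index(void)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    printf("block %d, thread %d -> global index %d\n",
           blockIdx.x, threadIdx.x, idx);
}

int main(void)
{
    print_index<<<2, 4>>>();   // a grid of 2 blocks, 4 threads per block
    cudaDeviceSynchronize();   // wait for the kernel's printf output
    return 0;
}
```

This block-and-thread indexing pattern is the foundation for everything that follows in this chapter, from occupancy to grid-stride loops.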

In this chapter, you will learn how CUDA threads operate on a GPU: parallel and concurrent thread execution, warp execution, memory bandwidth issues, control overhead, SIMD operations, and so on.
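As a first taste of warp execution, here is a minimal sketch (the kernel name warp_info is illustrative, not from the chapter) showing that the threads of a block are grouped into warps of warpSize threads, which the GPU executes in SIMD fashion:

```cuda
#include <cstdio>

// Threads within a block are scheduled in groups of warpSize (32 on current GPUs).
__global__ void warp_info(void)
{
    int warp_id = threadIdx.x / warpSize;  // warp index within the block
    int lane_id = threadIdx.x % warpSize;  // lane (thread) index within the warp
    printf("thread %2d -> warp %d, lane %2d\n", threadIdx.x, warp_id, lane_id);
}

int main(void)
{
    warp_info<<<1, 64>>>();    // one block of 64 threads: two full warps
    cudaDeviceSynchronize();
    return 0;
}
```

Keeping this warp grouping in mind will matter later when we discuss warp divergence and warp synchronous programming.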

The following topics will be covered in this chapter:

  • Hierarchical CUDA thread operations
  • Understanding CUDA occupancy
  • Data sharing across multiple CUDA threads
  • Identifying an application's performance limiter
  • Minimizing the CUDA warp divergence effect
  • Increasing memory utilization and grid-stride loops
  • Cooperative Groups for flexible thread handling
  • Warp synchronous programming
  • Low-/mixed-precision operations