Analyzing the optimal occupancy using the Occupancy Calculator

In practice, we can use the CUDA Occupancy Calculator, which is provided with the CUDA Toolkit, to obtain a kernel's theoretical occupancy from some basic kernel information. The calculator is an Excel file, and its location depends on the OS you use:

  • Windows: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\<cuda-version>\tools
  • Linux: /usr/local/cuda/tools
  • macOS: /Developer/NVIDIA/<cuda-version>/tools

The following is a screenshot of the calculator:

CUDA Occupancy Calculator

This calculator has two parts: kernel information inputs and occupancy information outputs. As input, it requires two kinds of information, as follows:

  • The GPU's compute capability (green)
  • Thread block resource information (yellow):
    • Threads per CUDA thread block
    • Registers per CUDA thread
    • Shared memory per block

As output, the calculator shows the following occupancy information:

  • GPU occupancy data (blue)
  • The GPU's physical limits for the selected compute capability (gray)
  • Allocated resources per block (yellow)
  • Maximum thread blocks per streaming multiprocessor (yellow, orange, and red)
  • Occupancy limit graphs for the three key occupancy resources: threads, registers, and shared memory per block
  • Red triangles on graphs, which show the current occupancy data
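The per-resource block limits that the calculator reports can be approximated with some simple arithmetic. The following is a minimal host-side C++ sketch of that idea, not the spreadsheet's actual formula: the `SmLimits`, `KernelUsage`, and `Occupancy` structs and the `estimate_occupancy` function are hypothetical names introduced for illustration, and the model ignores register and shared memory allocation granularity, so its results may differ slightly from the sheet:

```cpp
#include <algorithm>
#include <cassert>

// Per-SM hardware limits (the calculator's gray "physical limits" area).
// The caller supplies these; they depend on the compute capability.
struct SmLimits {
    int warp_size;         // threads per warp
    int max_warps;         // resident warps per SM
    int max_blocks;        // resident thread blocks per SM
    int registers;         // 32-bit registers per SM
    int shared_mem_bytes;  // shared memory per SM, in bytes
};

// Kernel resource usage (the calculator's three resource inputs).
struct KernelUsage {
    int threads_per_block;
    int registers_per_thread;
    int shared_mem_per_block;  // bytes
};

// Block limit imposed by each resource, plus the resulting occupancy.
struct Occupancy {
    int blocks_by_warps;
    int blocks_by_registers;
    int blocks_by_shared_mem;
    int active_blocks;  // minimum of the three limits
    double ratio;       // active warps / maximum warps per SM
};

// Simplified estimate: assumes resources are allocated exactly, with no
// rounding up to hardware allocation units.
Occupancy estimate_occupancy(const SmLimits& sm, const KernelUsage& k) {
    assert(k.threads_per_block > 0);
    assert(k.registers_per_thread >= 0 && k.shared_mem_per_block >= 0);

    // A partially filled warp still occupies a whole warp slot.
    const int warps_per_block =
        (k.threads_per_block + sm.warp_size - 1) / sm.warp_size;

    Occupancy o{};
    o.blocks_by_warps =
        std::min(sm.max_blocks, sm.max_warps / warps_per_block);

    const int regs_per_block =
        k.registers_per_thread * warps_per_block * sm.warp_size;
    o.blocks_by_registers =
        regs_per_block > 0 ? sm.registers / regs_per_block : sm.max_blocks;

    o.blocks_by_shared_mem =
        k.shared_mem_per_block > 0
            ? sm.shared_mem_bytes / k.shared_mem_per_block
            : sm.max_blocks;

    // The scarcest resource decides how many blocks fit on one SM.
    o.active_blocks = std::min({o.blocks_by_warps, o.blocks_by_registers,
                                o.blocks_by_shared_mem});
    o.ratio = static_cast<double>(o.active_blocks * warps_per_block) /
              sm.max_warps;
    return o;
}
```

As an example, assume an SM with 64 resident warps, 32 resident blocks, 65,536 registers, and 96 KB of shared memory. A kernel that uses 256 threads per block, 32 registers per thread, and 4 KB of shared memory per block is then limited to 8 blocks by both warps and registers, which gives 64 active warps and 100% theoretical occupancy. Doubling the register usage to 64 per thread drops the register limit to 4 blocks and the occupancy to 50%.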

Now, let's put the obtained information into the calculator. We can edit the green- and orange-colored areas in the Excel sheet:

Enter your acquired kernel resource information, and see how the sheet changes. 

Depending on compute capability and input data, the occupancy changes, as shown in the following screenshot: 

Changes in occupancy depending on compute capability and input data

The blue-colored area shows the kernel function's theoretical occupancy. In this screenshot, it shows 100% occupancy. The right-hand side of the sheet presents the occupancy graphs for the GPU resources: CUDA threads, shared memory, and registers.

In general, kernel code cannot reach 100% theoretical occupancy, for many reasons. However, targeting peak occupancy is the first step toward utilizing GPU resources efficiently.
