Memory throughput analysis

An application developer needs to understand the memory throughput of an application. Throughput can be defined in two ways:

  • From an application point of view: The number of bytes requested by the application
  • From a hardware point of view: The number of bytes moved by the hardware
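To make the two viewpoints concrete, the following is a minimal sketch of a toy model rather than a measurement. It assumes a warp of 32 threads, each reading one 4-byte element at a fixed stride, and assumes that global memory is moved in whole 32-byte sectors. The function name warp_traffic and its parameters are hypothetical and exist only for illustration:

```python
# Toy model (an assumption, not profiler output): a warp of 32 threads,
# each reading one element, with global memory moved in 32-byte sectors.
WARP_SIZE = 32
SECTOR_BYTES = 32


def warp_traffic(stride_elems, elem_bytes=4):
    """Return (requested_bytes, moved_bytes) for one warp-wide load.

    Assumes an aligned base address of 0 and that elem_bytes divides
    SECTOR_BYTES, so no single element straddles two sectors.
    """
    # Byte address read by each thread in the warp.
    addresses = [t * stride_elems * elem_bytes for t in range(WARP_SIZE)]
    # Application view: every thread asks for elem_bytes.
    requested = WARP_SIZE * elem_bytes
    # Hardware view: each distinct sector touched is moved in full.
    sectors = {addr // SECTOR_BYTES for addr in addresses}
    moved = len(sectors) * SECTOR_BYTES
    return requested, moved


if __name__ == "__main__":
    for stride in (1, 2, 8):
        requested, moved = warp_traffic(stride)
        print(f"stride {stride}: requested {requested} B, moved {moved} B, "
              f"efficiency {requested / moved:.1%}")
```

In this model, a stride of 1 moves 128 bytes for 128 bytes requested, while a stride of 8 moves 1,024 bytes for the same 128 bytes requested, an efficiency of 12.5%. Real hardware adds caching and alignment effects that the sketch ignores, so a profiler remains the source of truth.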

The two numbers can differ substantially. There are many reasons for this, including uncoalesced accesses, in which not all of the bytes in a transaction are used, shared memory bank conflicts, and so on. We should analyze the application's memory behavior from the following two aspects:

  • Address pattern: Determining the access pattern in real code is quite difficult, which is why tools such as profilers are so important. Profiler metrics such as global memory efficiency and L1/L2 transactions per access deserve careful attention.
  • The number of concurrent accesses in flight: As a GPU is a latency-hiding architecture, enough accesses must be in flight to saturate the memory bandwidth. When the number of concurrent accesses is insufficient, the throughput from a hardware point of view is much lower than the theoretical value.
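The amount of data that must be in flight can be estimated with Little's law (data in flight = throughput × latency). The following is a back-of-the-envelope sketch, not a measurement. The bandwidth, latency, SM count, and thread counts are hypothetical round numbers chosen only to keep the arithmetic easy to follow, and both function names are made up for this illustration:

```python
def bytes_in_flight_per_sm(bandwidth_gb_per_s, latency_ns, num_sms):
    """Little's law: data in flight = throughput x latency, split across SMs.

    With 1 GB = 1e9 bytes, 1 GB/s x 1 ns is exactly 1 byte, so the
    product of the first two arguments is already a byte count.
    """
    return bandwidth_gb_per_s * latency_ns / num_sms


def loads_in_flight_per_thread(bytes_per_sm, threads_per_sm, bytes_per_load):
    """Independent loads each resident thread must keep outstanding."""
    return bytes_per_sm / (threads_per_sm * bytes_per_load)


if __name__ == "__main__":
    # Hypothetical device parameters (assumptions, not vendor figures).
    per_sm = bytes_in_flight_per_sm(bandwidth_gb_per_s=800,
                                    latency_ns=400,
                                    num_sms=80)
    print(f"bytes in flight needed per SM: {per_sm:.0f}")

    # Fewer resident threads means each thread needs more loads in flight.
    for threads in (1024, 256):
        loads = loads_in_flight_per_thread(per_sm, threads, bytes_per_load=4)
        print(f"{threads} threads/SM: {loads:.2f} 4-byte loads per thread")
```

With these assumed values, each SM needs 4,000 bytes in flight. That is roughly one 4-byte load per thread with 1,024 resident threads, but almost four independent loads per thread with only 256. Under this model, low occupancy has to be compensated for with more independent accesses per thread or wider loads.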

The following diagram demonstrates that ~6 KB of data in flight per SM can reach 90% of peak bandwidth on the Volta architecture. The same experiment on a previous-generation architecture yields a different graph. In general, it is recommended to understand the GPU memory characteristics of a particular architecture in order to get the best performance from that hardware.

This section provided us with sample uses of global memory and showed how we can use it optimally. Sometimes, coalesced data access from global memory is difficult (for example, in computational fluid dynamics (CFD) codes on unstructured grids, the data of neighboring cells may not reside next to each other in memory). To solve a problem like this, or at least to reduce its impact on performance, we need to make use of another form of memory, known as shared memory.
