Memory access in GPUs

By now, it should be clear that fast, local memory is key to the performance of the workloads we offload to our processor when doing deep learning. It is not just the quantity and proximity of memory that matters, however, but also how that memory is accessed. Think of sequential versus random access performance on hard drives; the principle is the same.

Why does this matter for DNNs? Put simply, they are high-dimensional structures that must ultimately be embedded in the 1D address space of the memory that feeds our ALUs. Modern (vector) GPUs, built for graphics workloads, assume that they will be accessing adjacent memory locations, since one part of a 3D scene is stored next to a related part (adjacent pixels in a frame), and they are optimized around this assumption. Our networks are not 3D scenes. The layout of their data is sparse, and it depends on the network (and, in turn, graph) structure and on the information the network holds.
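
To make the adjacency assumption concrete, here is a minimal CUDA sketch (not from the original text; the kernel and variable names are illustrative). In the first kernel, consecutive threads in a warp read consecutive addresses, which the hardware coalesces into a few wide memory transactions; in the second, an index array scatters the reads, so the same warp can require many narrow transactions for the same amount of data:

#include <cstdio>
#include <cuda_runtime.h>

// Coalesced copy: thread i reads in[i], so a warp's 32 loads hit
// consecutive addresses and coalesce into a few wide transactions.
__global__ void copy_coalesced(const float *in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i];
}

// Gathered copy: thread i reads in[idx[i]]. With a scattered index
// array, a warp's loads spread across memory and the hardware must
// issue many separate transactions instead.
__global__ void copy_gather(const float *in, const int *idx,
                            float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[idx[i]];
}

int main() {
    const int n = 1 << 20;
    float *in, *out;
    int *idx;
    cudaMallocManaged(&in, n * sizeof(float));
    cudaMallocManaged(&out, n * sizeof(float));
    cudaMallocManaged(&idx, n * sizeof(int));
    for (int i = 0; i < n; ++i) {
        in[i] = (float)i;
        idx[i] = (i * 7919) % n;  // odd multiplier: a crude scatter permutation
    }
    int threads = 256, blocks = (n + threads - 1) / threads;
    copy_coalesced<<<blocks, threads>>>(in, out, n);
    copy_gather<<<blocks, threads>>>(in, idx, out, n);
    cudaDeviceSynchronize();
    printf("out[42] = %f\n", out[42]);
    cudaFree(in); cudaFree(out); cudaFree(idx);
    return 0;
}

Profiling the two kernels (for example, with Nsight Compute) shows the gap in achieved memory throughput, even though both move exactly the same number of bytes.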

The following diagram represents the memory access motifs for these different workloads:

For DNNs, we want to get as close to strided memory access patterns as possible when we write our operations. After all, matrix multiplication is one of the most common operations in DNNs, and it reads its operands in regular, predictable strides.
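
To see where those strides come from, consider this naive CUDA matrix-multiply kernel (a minimal sketch, not from the original text). For row-major N x N matrices, each thread walks a row of A with unit stride but a column of B with stride N, which is exactly the strided pattern described above:

#include <cuda_runtime.h>

// Naive C = A * B for square, row-major N x N matrices.
// Per thread: A is read with unit stride (along a row), while B is
// read with stride N (down a column) -- a regular, strided pattern.
__global__ void matmul_naive(const float *A, const float *B,
                             float *C, int N) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < N && col < N) {
        float acc = 0.0f;
        for (int k = 0; k < N; ++k)
            acc += A[row * N + k] * B[k * N + col];  // B: stride-N reads
        C[row * N + col] = acc;
    }
}

Because the strides are fixed and predictable, the hardware and the kernel author can both plan around them (for example, by staging tiles of B in shared memory), which is far harder to do for the irregular, data-dependent access patterns of sparse network structures.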
