Compute space and work groups

The number of invocations of a compute shader is governed by the user-defined compute space. This space is divided into a number of work groups. Each work group is then broken down into a number of invocations. We think of this in terms of the global compute space (all shader invocations) and the local work group space (the invocations within a particular work group). The compute space can be defined as a one-, two-, or three-dimensional space.

Technically, it is always defined as a three-dimensional space, but any of the three dimensions can be defined with a size of one (1), which effectively removes that dimension.

For example, a one-dimensional compute space with five work groups and three invocations per work group could be represented as the following diagram. The thicker lines represent the work groups, and the thinner lines represent the invocations within each work group:

In this case, we have 5 * 3 = 15 shader invocations. The gray shaded invocation is in work group 2 and, within that work group, it is invocation 1 (the invocations are indexed starting at 0). We can also refer to that invocation with a global index of 7 by indexing the total number of invocations starting at zero. The global index determines an invocation's location within the global compute space, rather than just within the work group.

The global index is determined by taking the product of the work group index (2) and the number of invocations per work group (3), plus the local invocation index (1); that is, 2 * 3 + 1 = 7. It is simply the index of each invocation in the global compute space, starting at zero on the left and counting from there.
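The arithmetic above can be sketched as a small C++ helper. This is only an illustration of the formula, not OpenGL API code: the function and parameter names are made up for this example. Inside a compute shader, GLSL exposes the same quantities through the built-in variables gl_WorkGroupID, gl_WorkGroupSize, gl_LocalInvocationID, and gl_GlobalInvocationID, so we would not normally compute this by hand.

```cpp
#include <cstdint>

// Global index of an invocation in a one-dimensional compute space.
// The names here are illustrative only; they are not OpenGL identifiers.
inline std::uint32_t globalIndex1D(std::uint32_t workGroupIndex,
                                   std::uint32_t invocationsPerGroup,
                                   std::uint32_t localIndex) {
    // Skip over all invocations in the preceding work groups,
    // then step to the invocation within this work group.
    return workGroupIndex * invocationsPerGroup + localIndex;
}
```

For the shaded invocation in the example, globalIndex1D(2, 3, 1) evaluates to 2 * 3 + 1 = 7.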

The following diagram shows a representation of a two-dimensional compute space where the space is divided into 20 work groups, four in the x direction and five in the y direction. Each work group is then divided into nine invocations, three in the x direction and three in the y direction:

The cell that is shaded in gray represents invocation (0, 1) within the work group (2, 0). The total number of compute shader invocations in this example is then 20 * 9 = 180. The global index of this shaded invocation is (6, 1). As with the one-dimensional case, we can think of this as an index into the global compute space (ignoring the work group boundaries). It can be computed for each dimension as the number of invocations per work group times the work group index, plus the local invocation index. For the x dimension, this would be 3 * 2 + 0 = 6, and for the y dimension it is 3 * 0 + 1 = 1.
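The per-dimension calculation might be sketched in C++ as follows. The Index2D alias and both function names are hypothetical helpers introduced for this illustration; they are not part of OpenGL.

```cpp
#include <array>
#include <cstdint>

// A pair of (x, y) values. Hypothetical helper type for this sketch.
using Index2D = std::array<std::uint32_t, 2>;

// Global index: for each dimension, invocations per work group times
// the work group index, plus the local invocation index.
inline Index2D globalIndex2D(const Index2D& workGroup,
                             const Index2D& groupSize,
                             const Index2D& local) {
    return Index2D{{groupSize[0] * workGroup[0] + local[0],
                    groupSize[1] * workGroup[1] + local[1]}};
}

// Total invocations: number of work groups times invocations per group.
inline std::uint32_t totalInvocations2D(const Index2D& groupCount,
                                        const Index2D& groupSize) {
    return (groupCount[0] * groupCount[1]) * (groupSize[0] * groupSize[1]);
}
```

With the values from the example, globalIndex2D({2, 0}, {3, 3}, {0, 1}) gives (6, 1), and totalInvocations2D({4, 5}, {3, 3}) gives 20 * 9 = 180.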

The same idea can extend in a straightforward manner to a three-dimensional compute space. In general, we choose the dimensionality based on the data to be processed. For example, if I'm working on the physics of a particle simulation, I would just have a list of particles to process, so a one-dimensional compute space might make sense. On the other hand, if I'm processing a cloth simulation, the data will have a grid structure, so a two-dimensional compute space would be appropriate.
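Once the dimensionality is chosen, we still need to decide how many work groups to launch in each dimension. A common approach (an assumption on our part, not something prescribed by the text above) is to round up so that every data item is covered by at least one invocation. The helper below is a minimal sketch with a made-up name.

```cpp
#include <cstdint>

// Number of work groups needed along one dimension so that
// itemCount items are covered when each work group contains
// localSize invocations along that dimension.
// Precondition: localSize > 0.
inline std::uint32_t workGroupsNeeded(std::uint32_t itemCount,
                                      std::uint32_t localSize) {
    // Round up; do the addition in 64 bits to avoid overflow.
    std::uint64_t total = static_cast<std::uint64_t>(itemCount) + localSize - 1;
    return static_cast<std::uint32_t>(total / localSize);
}
```

For a list of 1,000 particles and a hypothetical work group size of 128, this gives 8 work groups in x (and 1 in y and z). For a 40 x 40 cloth grid with 10 x 10 work groups, it gives 4 work groups in each of x and y. When the item count is not an exact multiple of the work group size, the last work group contains extra invocations, so the shader would need to check its global index against the data size before touching any data.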

There are limits to the total number of work groups and local shader invocations. These can be queried (via glGetInteger*) using the GL_MAX_COMPUTE_WORK_GROUP_COUNT, GL_MAX_COMPUTE_WORK_GROUP_SIZE, and GL_MAX_COMPUTE_WORK_GROUP_INVOCATIONS parameters.
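As a rough sketch of how these limits might be applied, the following C++ checks a proposed compute space against them. The ComputeLimits struct and the fitsLimits function are hypothetical names for this illustration. So that the example runs without an OpenGL context, the limit values are passed in rather than queried; in a real program, the two per-dimension limits would be read with glGetIntegeri_v (indices 0, 1, and 2) and the invocation limit with glGetIntegerv.

```cpp
#include <array>
#include <cstdint>

// Hypothetical container for the three queried limits.
struct ComputeLimits {
    std::array<std::uint32_t, 3> maxGroupCount; // GL_MAX_COMPUTE_WORK_GROUP_COUNT, per dimension
    std::array<std::uint32_t, 3> maxGroupSize;  // GL_MAX_COMPUTE_WORK_GROUP_SIZE, per dimension
    std::uint32_t maxInvocations;               // GL_MAX_COMPUTE_WORK_GROUP_INVOCATIONS
};

// Returns true if neither the work group counts nor the work group
// size exceed the given limits. Only upper bounds are checked.
inline bool fitsLimits(const ComputeLimits& limits,
                       const std::array<std::uint32_t, 3>& groupCount,
                       const std::array<std::uint32_t, 3>& groupSize) {
    std::uint64_t invocationsPerGroup = 1;
    for (int i = 0; i < 3; ++i) {
        if (groupCount[i] > limits.maxGroupCount[i]) return false;
        if (groupSize[i] > limits.maxGroupSize[i]) return false;
        invocationsPerGroup *= groupSize[i];
    }
    // The product of the three local sizes has its own limit.
    return invocationsPerGroup <= limits.maxInvocations;
}
```

Note that a work group size can pass the per-dimension check and still fail the invocation check: with a hypothetical invocation limit of 1,024, a 64 x 64 x 1 work group (4,096 invocations) would be rejected even if each dimension alone allows 1,024.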

The order of execution of the work groups, and thereby of the individual shader invocations, is unspecified: the system can execute them in any order. Therefore, we shouldn't rely on any particular ordering of the work groups. Local invocations within a particular work group will be executed in parallel (if possible), so any communication between invocations should be done with great care. Invocations within a work group can communicate via shared local data, but invocations should not (in general) communicate with invocations in other work groups without considering the pitfalls involved, such as deadlock and data races. In fact, those can also be issues for shared local data within a work group, and care must be taken to avoid them. In general, for reasons of efficiency, it is best to attempt communication only within a work group. As with any kind of parallel programming, "there be dragons here."

OpenGL provides a number of atomic operations and memory barriers that can help with the communication between invocations. We'll see some examples in the recipes that follow.
