Because the order of execution of the invocations of a local workgroup, and of all of the local workgroups that make up the global workgroup, is not defined, the operations that an invocation performs can occur out of order with respect to those of other invocations. If no communication between the invocations is required and they can all run completely independently of each other, then this likely isn’t going to be an issue. However, if the invocations need to communicate with each other either through images and buffers or through shared variables, then it may be necessary to synchronize their operations with each other.

There are two types of synchronization commands. The first is an execution barrier, which is invoked using the barrier() function. This is similar to the barrier() function you can use in a tessellation control shader to synchronize the invocations that are processing the control points. When an invocation of a compute shader reaches a call to barrier(), it will stop executing and wait for all other invocations within the same local workgroup to catch up. Once the invocation resumes executing, having returned from the call to barrier(), it is safe to assume that all other invocations have also reached their corresponding call to barrier(), and have completed any operations that they performed before this call. The usage of barrier() in a compute shader is somewhat more flexible than what is allowed in a tessellation control shader. In particular, there is no requirement that barrier() be called only from the shader’s main() function. Calls to barrier() must, however, only be executed inside uniform flow control. That is, if one invocation within a local workgroup executes a barrier() function, then all invocations within that workgroup must also execute the same call. This seems logical as one invocation of the shader has no knowledge of the control flow of any other and must assume that the other invocations will eventually reach the barrier—if they do not, then deadlock can occur.
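As a sketch of the uniform-flow-control requirement (the workgroup size and predicates here are made up for illustration), compare a legal placement of barrier() with an illegal one:

```glsl
#version 430 core

layout (local_size_x = 64) in;

shared uint data[64];

void main(void)
{
    // Legal: the branch is divergent, but every invocation in the
    // workgroup eventually reaches the same call to barrier().
    if (gl_LocalInvocationID.x < 32u)
    {
        data[gl_LocalInvocationID.x] = gl_LocalInvocationID.x;
    }
    barrier();

    // Illegal: barrier() inside non-uniform flow control. Invocations
    // that skip the branch never reach the call, so those that do
    // reach it may wait forever -- deadlock.
    // if (gl_LocalInvocationID.x < 32u)
    // {
    //     barrier();
    // }
}
```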

When communicating between invocations within a local workgroup, you can write to shared variables from one invocation and then read from them in another. However, you need to make sure that by the time you read from a shared variable in the destination invocation that the source invocation has completed the corresponding write to that variable. To ensure this, you can write to the variable in the source invocation, and then in both invocations execute the barrier() function. When the destination invocation returns from the barrier() call, it can be sure that the source invocation has also executed the function (and therefore completed the write to the shared variable), and so it is safe to read from the variable.
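The write–barrier–read pattern might look like the following sketch, in which each invocation writes one element of a shared array and then reads an element written by a neighboring invocation (the buffer names and binding points are assumptions made for this example):

```glsl
#version 430 core

layout (local_size_x = 64) in;

layout (binding = 0, std430) buffer InputBuffer  { float in_data[];  };
layout (binding = 1, std430) buffer OutputBuffer { float out_data[]; };

// One element of scratch storage per invocation in the local workgroup.
shared float scratch[64];

void main(void)
{
    uint id = gl_LocalInvocationID.x;

    // Source side: each invocation writes its own element of shared memory.
    scratch[id] = in_data[gl_GlobalInvocationID.x];

    // Wait until every invocation in the workgroup has written its element.
    barrier();

    // Destination side: it is now safe to read an element written by a
    // *different* invocation -- here, the next invocation's value.
    out_data[gl_GlobalInvocationID.x] = scratch[(id + 1u) % 64u];
}
```

Without the call to barrier(), the read of `scratch[(id + 1u) % 64u]` could execute before the neighboring invocation had written that element, producing undefined results.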

The second type of synchronization primitive is the memory barrier. The heaviest, most brute-force version of the memory barrier is memoryBarrier(). When memoryBarrier() is called, it ensures that any writes to memory that have been performed by the shader invocation have been committed to memory rather than lingering in caches or being scheduled after the call to memoryBarrier(), for example. Any operations that occur after the call to memoryBarrier() will see the results of those memory writes if the same memory locations are read again—even in different invocations of the same compute shader. Furthermore, memoryBarrier() can serve as an instruction to the shader compiler to not reorder memory operations if doing so would move them across the barrier. If memoryBarrier() seems somewhat heavy handed, that would be an astute observation. In fact, there are several other memory barrier functions that serve as subsets of the memoryBarrier() mega function; memoryBarrier() is simply defined as calling each of these subfunctions back to back in some undefined (but not really relevant) order.

The memoryBarrierAtomicCounter() function waits for any updates to atomic counters to complete before continuing. The memoryBarrierBuffer() and memoryBarrierImage() functions wait for any write accesses to buffer and image variables to complete, respectively. The memoryBarrierShared() function waits for any updates to variables declared with the shared qualifier. These functions allow much finer-grained control over which types of memory accesses are waited for. For example, if you are using an atomic counter to arbitrate accesses to a buffer variable, you might want to ensure that updates to atomic counters are seen by other invocations of the shader without necessarily waiting for any prior writes to the buffer to complete, as the latter may take much longer than the former. Also, calling memoryBarrierAtomicCounter() will allow the shader compiler to reorder accesses to buffer variables without violating the logic implied by atomic counter operations.
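The atomic-counter arbitration scenario above might be sketched as follows (the counter and buffer names, bindings, and the choice of payload are assumptions made for this example):

```glsl
#version 430 core

layout (local_size_x = 64) in;

// Hypothetical counter used to hand out slots in the output buffer.
layout (binding = 0) uniform atomic_uint items_ready;

layout (binding = 0, std430) buffer ResultBuffer { uint results[]; };

void main(void)
{
    // Claim a unique slot in the output buffer via the atomic counter.
    uint slot = atomicCounterIncrement(items_ready);

    // Make the counter update visible to other invocations without
    // waiting for any outstanding buffer writes to complete -- the
    // compiler remains free to reorder unrelated buffer accesses.
    memoryBarrierAtomicCounter();

    // Write a payload into the slot this invocation reserved.
    results[slot] = gl_GlobalInvocationID.x;
}
```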

Note that even after a call to memoryBarrier() or one of its subfunctions, there is still no guarantee that all other invocations have reached this point in the shader. To ensure this, you will need to call the execution barrier function, barrier(), before reading from memory that would have been written prior to the call to memoryBarrier().

Use of memory barriers is not necessary to ensure the observed order of memory transactions within a single shader invocation. Reading the value of a variable in a particular invocation of a shader will always return the value most recently written to that variable by that invocation, even if the compiler reordered the memory operations behind the scenes.

One final function, groupMemoryBarrier(), is effectively equivalent to memoryBarrier(), except that it applies only to other invocations within the same local workgroup. All of the other memory barrier functions apply globally. That is, they ensure that memory writes performed by any invocation in the global workgroup are committed before continuing.
