Implementing order-independent transparency

Transparency can be a difficult effect to do accurately in pipeline architectures like OpenGL. The general technique is to draw opaque objects first, with the depth buffer enabled, then to make the depth buffer read-only (using glDepthMask), disable the depth test, and draw the transparent geometry. However, care must be taken to ensure that the transparent geometry is drawn from back to front. That is, objects farther from the viewer should be drawn before the objects that are closer. This requires some sort of depth-sorting to take place prior to rendering.

The following images show an example of a block of small, semi-transparent spheres with some semi-transparent cubes placed evenly within them. On the right-hand side, the objects are rendered in an arbitrary order, using standard OpenGL blending. The result looks incorrect because objects are blended in an improper order. The cubes, which were drawn last, appear to be on top of the spheres, and the spheres look jumbled, especially in the middle of the block. On the left, the scene is drawn using proper ordering, so objects appear to be oriented correctly with respect to depth, and the overall look is more realistic looking:

Order Independent Transparency (OIT) means that we can draw objects in any order and still get accurate results. Depth sorting is done at some other level, perhaps within the fragment shader, so that the programmer need not sort objects before rendering. There are a variety of techniques for doing this; one of the most common technique is to keep a list of colors for each pixel, sort them by depth, and then blend them together in the fragment shader. In this recipe we'll use this technique to implement OIT, making use of some of the newest features in OpenGL 4.3.

Shader storage buffer objects (SSBO) and image load/store are some of the newest features in OpenGL, introduced in 4.3 and 4.2, respectively. They allow arbitrary read/write access to data from within a shader. Prior to this, shaders were very limited in terms of what data they could access. They could read from a variety of locations (textures, uniforms, and so on), but writing was very limited. Shaders could only write to controlled, isolated locations such as fragment shader outputs and transform feedback buffers. This was for a very good reason. Since shaders can execute in parallel and in a seemingly arbitrary order, it is very difficult to ensure that data is consistent between instantiations of a shader. Data written by one shader instance might not be visible to another shader instance whether or not that instance is executed after the other. Despite this, there are good reasons for wanting to read and write to shared locations. With the advent of SSBOs and image load/store, that capability is now available to us. We can create buffers and textures (called images) with read/write access to any shader instance. This is especially important for compute shaders, the subject of Chapter 11, Using Compute Shaders. However, this power comes at a price. The programmer must now be very careful to avoid the types of memory consistency errors that come along with writing to memory that is shared among parallel threads. Additionally, the programmer must be aware of the performance issues that come with synchronization between shader invocations.

For a more thorough discussion of the issues involved with memory consistency and shaders, refer to Chapter 11, of The OpenGL Programming Guide, 8th Edition. That chapter also includes another similar implementation of OIT.

In this recipe, we'll use SSBOs and image load/store to implement order-independent transparency. We'll use two passes. In the first pass, we'll render the scene geometry and store a linked list of fragments for each pixel. After the first pass, each pixel will have a corresponding linked list containing all fragments that were written to that pixel, including their depth and color. In the second pass, we'll draw a fullscreen quad to invoke the fragment shader for each pixel. In the fragment shader, we'll extract the linked list for the pixel, sort the fragments by depth (largest to smallest), and blend the colors in that order. The final color will then be sent to the output device.

That's the basic idea, so let's dig into the details. We'll need three memory objects that are shared among the fragment shader instances:

An atomic counter: This is just an unsigned integer that we'll use to keep track of
the size of our linked list buffer. Think of this as the index of the first unused slot in the buffer.
A head-pointer texture that corresponds to the size of the screen: The texture will store a single unsigned integer in each texel. The value is the index of the head of the linked list for the corresponding pixel.
A buffer containing all of our linked lists: Each item in the buffer will correspond to a fragment, and contains a struct with the color and depth of the fragment as well as an integer, which is the index of the next fragment in the linked list.

In order to understand how all of this works together, let's consider a simple example. Suppose that our screen is three pixels wide and three pixels high. We'll have a head pointer texture that is the same dimensions, and we'll initialize all of the texels to a special value that indicates the end of the linked list (an empty list). In the following diagram, that value is shown as an x, but in practice, we'll use 0xffffffff. The initial value of the counter is zero, and the linked list buffer is allocated to a certain size but treated as empty initially. The initial state of our memory is shown in the following diagram:

Now suppose that a fragment is rendered at the position (0,1) with a depth of 0.75. The fragment shader will take the following steps:

Increment the atomic counter. The new value will be 1, but we'll use the previous value (0) as the index for our new node in the linked list.
Update the head pointer texture at (0,1) with the previous value of the counter (0). This is the index of the new head of the linked list at that pixel. Hold on to the previous value that was stored there (x), as we'll need that in the next step.

Add a new value into the linked list buffer at the location corresponding to the previous value of the counter (0). Store the color of the fragment and its depth here. Store in the next component the previous value of the head pointer texture at (0,1) that we held on to in step 2. In this case, it is the special value indicating the end of the list.

After processing this fragment, the memory layout looks as follows:

Now, suppose another fragment is rendered at (0,1), with a depth of 0.5. The fragment shader will execute the same steps as the previous ones, resulting in the following memory layout:

We now have a two-element linked list starting at index 1 and ending at index 0. Suppose, now that we have three more fragments in the following order: a fragment at (1,1) with a depth of 0.2, a fragment at (0,1) with a depth of 0.3, and a fragment at (1,1) with a depth of 0.4. Following the same steps for each fragment, we get the following result:

The linked list at (0,1) consists of fragments {3, 1, 0} and the linked list at (1,1) contains fragments {4, 2}.

Now, we must keep in mind that due to the highly parallel nature of GPUs, fragments can be rendered in virtually any order. For example, fragments from two different polygons might proceed through the pipeline in the opposite order as to when the draw instructions for polygons were issued. As a programmer, we must not expect any specific ordering of fragments. Indeed, instructions from separate instances of the fragment shader may interleave in arbitrary ways. The only thing that we can be sure of is that the statements within a particular instance of the shader will execute in order. Therefore, we need to convince ourselves that any interleaving of the previous three steps will still result in a consistent state. For example, suppose instance one executes steps 1 and 2, then another instance (another fragment, perhaps at the same fragment coordinates) executes steps 1, 2, and 3, before the first instance executes step 3. Will the result still be consistent? I think you can convince yourself that it will be, even though the linked list will be broken for a short time during the process. Try working through other interleavings and convince yourself that we're OK.

Not only can statements within separate instances of a shader interleave with each other, but the sub-instructions that make up the statements can interleave. (For example, the sub-instructions for an increment operation consist of a load, increment, and a store.) What's more, they could actually execute at exactly the same time. Consequently, if we aren't careful, nasty memory consistency issues can crop up. To help avoid this, we need to make careful use of the GLSL support for atomic operations.

Recent versions of OpenGL (4.2 and 4.3) have introduced the tools that we need to make this algorithm possible. OpenGL 4.2 introduced atomic counters and the ability to read and write to arbitrary locations within a texture (called image load/store). OpenGL 4.3 introduced shader storage buffer objects. We'll make use of all three of these features in this example, as well as the various atomic operations and memory barriers that go along with them.

Table of Contents for Implementing order-independent transparency

Create new playlist

Sign In

Sign Up

Table of Contents for
Implementing order-independent transparency