i
i
i
i
i
i
i
i
2
VII
Order-Independent Transparency
using Per-Pixel Linked Lists
Nicolas Thibieroz
2.1 Introduction
Order-independent transparency (OIT) has been an active area of research in
real-time computer graphics for a number of years. The main area of research
has focused on how to effect fast and efficient back-to-front sorting and render-
ing of translucent fragments to ensure visual correctness when order-dependent
blending modes are employed. The complexity of the task is such that many real-
time applications have chosen to forfeit this step altogether in favor of simpler and
faster alternative methods such as sorting per object or per triangle, or simply
falling back to order-independent blending modes (e.g., additive blending) that
don’t require any sorting [Thibieroz 08]. Different OIT techniques have previously
been described (e.g., [Everitt 01], [Myers 07]) and although those techniques suc-
ceed in achieving correct ordering of translucent fragments, they usually come
with performance, complexity, or compatibility shortcomings that make their use
difficult for real-time gaming scenarios.
This chapter presents an OIT implementation relying on the new features
of the DirectX 11 API from Microsoft.
1
An overview of the algorithm will be
presented first, after which each phase of the method will be explained in detail.
Sorting, blending, multisampling, anti-aliasing support, and optimizations will
be treated in separate sections before concluding with remarks concerning the
attractiveness and future of the technique.
2.2 Algorithm Overview
The OIT algorithm presented in this chapter shares some similarities with the
A-buffer algorithm [Carpenter 84], whereby translucent fragments are stored in
1
“Microsoft DirectX” is a registered trademark.
409
i
i
i
i
i
i
i
i
410 VII GPGPU
buffers for later rendering. Our method uses linked lists [Yang 10] to store a list
of translucent fragments for each pixel. Therefore, every screen coordinate in the
render viewport will contain an entry to a unique per-pixel linked list containing
all translucent fragments at that particular location.
Prior to rendering translucent geometry, all opaque and transparent (alpha-
tested) models are rendered onto the render target viewport as desired. Then the
OIT algorithm can be invoked to render corrected-ordered translucent fragments.
The algorithm relies on a two-step process.
1. The first step is the creation of per-pixel linked lists whereby the translucent
contribution to the scene is entirely captured into a pair of buffers containing
the head pointers and linked lists nodes for all translucent pixels.
2. The second step is the traversal of per-pixel linked lists to fetch, sort, blend
and finally render all pixels in the correct order onto the destination render
viewport.
2.3 DirectX 11 Features Requisites
DirectX 11 has introduced new features that finally make it possible to create
and parse concurrent linked lists on the GPU.
2.3.1 Unordered Access Views
Unordered access view (UAV) is a special type of resource view that can be bound
as output to a pixel or compute shader to allow the programmer to write data at
arbitrary locations. The algorithm uses a pair of UAVs to store per-pixel linked
lists nodes and head pointers.
2.3.2 Atomic Operations
The creation of per-pixel linked lists also requires a way to avoid any contention
when multiple pixel shader invocations perform memory operations into a buffer.
Indeed such read/modify/write operations must be guaranteed to occur atom-
ically for the algorithm to work as intended. DirectX 11 supports a set of
Interlocked*() Shader Model 5.0 instructions that fulfill this purpose. Such
atomic operation will be used to keep track of the head pointer address when
creating the per-pixel linked lists.
2.3.3 Hardware Counter Support on UAV
A less well-known feature of DirectX 11 allows the creation of a built-in hardware
counter on buffer UAVs. This counter is declared by specifying the
D3D11 BUFFER UAV FLAG COUNTER flag when generating a UAV for the intended
i
i
i
i
i
i
i
i
2. Order-Independent Transparency using Per-Pixel Linked Lists 411
buffer. The programmer is given control of the counter via the following two
Shader Model 5.0 methods:
u i n t <Buf fe r >. IncrementCoun t er ( ) ;
u i n t <Buf fe r >. DecrementCounter ( ) ;
Hardware counter support is used to keep track of the offset at which to store
the next linked list node.
While hardware counter support is not strictly required for the algorithm to
work, it enables considerable performance improvement compared to manually
keeping track of a counter via a single-element buffer UAV.
2.3.4 Early Depth/Stencil Rejection
Graphics APIs like OpenGL and Direct3D specify that the depth/stencil test
be executed after the pixel shader stage in the graphics pipeline. A problem
arises when the pixel shader outputs to UAVs because UAVs may be written
into the shader even though the subsequent depth/stencil test actually discards
the pixel, which may not be the intended behavior of the algorithm. Shader
Model 5.0 allows the [earlydepthstencil] keyword to be declared in front of
a pixel shader function to indicate that the depth/stencil test is to be explicitly
performed before the pixel shader stage, allowing UAV writes to be carried out
only if the depth/ stencil test succeeds first. This functionality is important for
the algorithm presented in this chapter, since only visible translucent fragments
need storing into the per-pixel linked lists.
2.3.5 SV COVERAGE Pixel Shader Input
DirectX 11 allows SV COVERAGE to be declared as an input to the pixel shader
stage. SV COVERAGE contains a bit mask of all samples that are covered by the
current primitive. This information is used by this OIT technique when multi-
sampling antialiasing (MSAA) is enabled.
2.3.6 Per-sample Pixel Shader Execution
DirectX 11 allows the pixel shader stage to execute per sample (as opposed to
per pixel) when MSAA is enabled. This functionality will be exploited to allow
MSAA support with our OIT technique.
2.4 Head Pointer and Nodes Buffers
The algorithm builds a reverse linked list for each pixel location in the target
viewport. The linked list head pointer is the address of the first element in the
i
i
i
i
i
i
i
i
412 VII GPGPU
linked list. The linked list nodes are the individual elements of the linked list
that contain fragment data as well as the address of the next node.
2.4.1 Head Pointer Buffer
The algorithm allocates a screen-sized buffer of type DXGI FORMAT R32 UINT that
contains the address of the head pointer for every 2D coordinate on the screen.
Despite the resource having the same dimensions as the render viewport,
the declaration must employ the buffer type because atomic operations are not
supported on Texture2D resources. Therefore an extra step will be required in
the shader wherein a 2D screen-coordinate location is converted to a byte-linear
address:
u i n t uL in ea rA dd re s s I n B y t e s = 4 ( S cree n Pos . yRENDERWIDTH +
Scre e nPos . x ) ;
The head pointer buffer is initialized to a “magic” value of 0xFFFFFFFF, in-
dicating that the linked list is empty to start with. In effect, an address of
0xFFFFFFFF indicates that no more nodes are available (i.e., the end of the list
has been reached).
The term reverse linked list is used because the head pointer is dynamically
updated at construction time to receive the address of the latest linked list node
written out at each pixel location. Once construction is complete, the head
pointer value effectively contains the address of the last node written out, with
this last node sequentially pointing to the nodes previously stored for the same
pixel location.
2.4.2 Nodes Buffer
The nodes buffer stores the nodes of all per-pixel linked lists. We cannot allocate
individual nodes buffers for every linked list since the render viewport dimensions
guarantee that a huge number of them will be required (one for each pixel in
the render viewport). Therefore the nodes buffer must be allocated with enough
memory to accommodate all possible translucent fragments in the scene. It is the
responsibility of the programmer to define this upper limit. A good heuristic to
use is to base the allocation size on the render viewport dimensions multiplied by
the average translucent overdraw expected. Certainly the size of the nodes buffer
is likely to be very large, and may place an unreasonable burden on the video
memory requirements of the OIT technique. Section 2.9 introduces a significant
optimization to dramatically reduce the memory requirements of this algorithm.
The UAV for the nodes buffer is created with a built-in hardware counter
initialized to zero as a way to keep track of how many fragments have been
stored in the buffer.
i
i
i
i
i
i
i
i
2. Order-Independent Transparency using Per-Pixel Linked Lists 413
2.4.3 Fragment Data
Each linked list node stores fragment data as well as the address of the next
node. The address of the next node is stored as a uint while the type and size
of the fragment data depends on the needs of the application. Typically the
fragment structure includes fragment depth and color but other variables may be
stored as needed (e.g., normal, blending mode id, etc.). Fragment depth is an
essential component of the fragment structure since it will be required at linked
list traversal time to correctly sort fragments prior to rendering.
It is important to point out that any additional structure member will increase
the total size of the nodes buffer and therefore the total memory requirement of
the OIT solution. It is therefore desirable to economize the size of the fragment
structure. The implementation presented in this chapter packs the fragment color
into a single uint type and uses the following node data structure for a total of
12 bytes per fragment:
struct NodeData STRUCT
{
u i n t uCol o r ; // Fragment c o l o r packed as RGBA
u i n t uDepth ; // Fragment depth
u i n t uNext ; // Address o f next l in k e d l i s t node
} ;
2.5 Per-Pixel Linked List Creation
The algorithm presented does not use DirectCompute; instead, the storing of
translucent fragments into linked lists is done via a pixel shader writing to the
head pointer and nodes buffers UAVs. Earlier shader stages in the pipeline are
enabled (vertex shader, but also hull shader, domain shader and geometry shader
if needed) in order to turn incoming triangles into fragments that can be stored
into per-pixel linked lists.
No color render target is bound at this stage although a depth buffer is still
required to ensure that only translucent fragments that pass the depth/stencil
test are stored in the per-pixel linked lists. Binding a depth buffer avoids the
need to store translucent fragments that would result in being completely hidden
by previously-rendered opaque or alpha-tested geometry.
2.5.1 Algorithm Description
A description of the algorithm used to build per-pixel linked lists follows.
For each frame
Clear head pointer buffer to 0xFFFFFFFF (1). This indicates that
the per-pixel linked lists are all empty to start with.
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.145.81.232