Chapter 2. Order-Independent Transparency using Per-Pixel Linked Lists (1/5)

Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

VII

Order-Independent Transparency

using Per-Pixel Linked Lists

Nicolas Thibieroz

2.1 Introduction

Order-independent transparency (OIT) has been an active area of research in

real-time computer graphics for a number of years. The main area of research

has focused on how to eﬀect fast and eﬃcient back-to-front sorting and render-

ing of translucent fragments to ensure visual correctness when order-dependent

blending modes are employed. The complexity of the task is such that many real-

time applications have chosen to forfeit this step altogether in favor of simpler and

faster alternative methods such as sorting per object or per triangle, or simply

falling back to order-independent blending modes (e.g., additive blending) that

don’t require any sorting [Thibieroz 08]. Diﬀerent OIT techniques have previously

been described (e.g., [Everitt 01], [Myers 07]) and although those techniques suc-

ceed in achieving correct ordering of translucent fragments, they usually come

with performance, complexity, or compatibility shortcomings that make their use

diﬃcult for real-time gaming scenarios.

This chapter presents an OIT implementation relying on the new features

of the DirectX 11 API from Microsoft.

An overview of the algorithm will be

presented ﬁrst, after which each phase of the method will be explained in detail.

Sorting, blending, multisampling, anti-aliasing support, and optimizations will

be treated in separate sections before concluding with remarks concerning the

attractiveness and future of the technique.

2.2 Algorithm Overview

The OIT algorithm presented in this chapter shares some similarities with the

A-buﬀer algorithm [Carpenter 84], whereby translucent fragments are stored in

“Microsoft DirectX” is a registered trademark.

409

410 VII GPGPU

buﬀers for later rendering. Our method uses linked lists [Yang 10] to store a list

of translucent fragments for each pixel. Therefore, every screen coordinate in the

render viewport will contain an entry to a unique per-pixel linked list containing

all translucent fragments at that particular location.

Prior to rendering translucent geometry, all opaque and transparent (alpha-

tested) models are rendered onto the render target viewport as desired. Then the

OIT algorithm can be invoked to render corrected-ordered translucent fragments.

The algorithm relies on a two-step process.

1. The ﬁrst step is the creation of per-pixel linked lists whereby the translucent

contribution to the scene is entirely captured into a pair of buﬀers containing

the head pointers and linked lists nodes for all translucent pixels.

2. The second step is the traversal of per-pixel linked lists to fetch, sort, blend

and ﬁnally render all pixels in the correct order onto the destination render

viewport.

2.3 DirectX 11 Features Requisites

DirectX 11 has introduced new features that ﬁnally make it possible to create

and parse concurrent linked lists on the GPU.

2.3.1 Unordered Access Views

Unordered access view (UAV) is a special type of resource view that can be bound

as output to a pixel or compute shader to allow the programmer to write data at

arbitrary locations. The algorithm uses a pair of UAVs to store per-pixel linked

lists nodes and head pointers.

2.3.2 Atomic Operations

The creation of per-pixel linked lists also requires a way to avoid any contention

when multiple pixel shader invocations perform memory operations into a buﬀer.

Indeed such read/modify/write operations must be guaranteed to occur atom-

ically for the algorithm to work as intended. DirectX 11 supports a set of

Interlocked*() Shader Model 5.0 instructions that fulﬁll this purpose. Such

atomic operation will be used to keep track of the head pointer address when

creating the per-pixel linked lists.

2.3.3 Hardware Counter Support on UAV

A less well-known feature of DirectX 11 allows the creation of a built-in hardware

counter on buﬀer UAVs. This counter is declared by specifying the

D3D11 BUFFER UAV FLAG COUNTER ﬂag when generating a UAV for the intended

2. Order-Independent Transparency using Per-Pixel Linked Lists 411

buﬀer. The programmer is given control of the counter via the following two

Shader Model 5.0 methods:

u i n t <Buf fe r >. IncrementCoun t er ( ) ;

u i n t <Buf fe r >. DecrementCounter ( ) ;

Hardware counter support is used to keep track of the oﬀset at which to store

the next linked list node.

While hardware counter support is not strictly required for the algorithm to

work, it enables considerable performance improvement compared to manually

keeping track of a counter via a single-element buﬀer UAV.

2.3.4 Early Depth/Stencil Rejection

Graphics APIs like OpenGL and Direct3D specify that the depth/stencil test

be executed after the pixel shader stage in the graphics pipeline. A problem

arises when the pixel shader outputs to UAVs because UAVs may be written

into the shader even though the subsequent depth/stencil test actually discards

the pixel, which may not be the intended behavior of the algorithm. Shader

Model 5.0 allows the [earlydepthstencil] keyword to be declared in front of

a pixel shader function to indicate that the depth/stencil test is to be explicitly

performed before the pixel shader stage, allowing UAV writes to be carried out

only if the depth/ stencil test succeeds ﬁrst. This functionality is important for

the algorithm presented in this chapter, since only visible translucent fragments

need storing into the per-pixel linked lists.

2.3.5 SV COVERAGE Pixel Shader Input

DirectX 11 allows SV COVERAGE to be declared as an input to the pixel shader

stage. SV COVERAGE contains a bit mask of all samples that are covered by the

current primitive. This information is used by this OIT technique when multi-

sampling antialiasing (MSAA) is enabled.

2.3.6 Per-sample Pixel Shader Execution

DirectX 11 allows the pixel shader stage to execute per sample (as opposed to

per pixel) when MSAA is enabled. This functionality will be exploited to allow

MSAA support with our OIT technique.

2.4 Head Pointer and Nodes Buﬀers

The algorithm builds a reverse linked list for each pixel location in the target

viewport. The linked list head pointer is the address of the ﬁrst element in the

412 VII GPGPU

linked list. The linked list nodes are the individual elements of the linked list

that contain fragment data as well as the address of the next node.

2.4.1 Head Pointer Buﬀer

The algorithm allocates a screen-sized buﬀer of type DXGI FORMAT R32 UINT that

contains the address of the head pointer for every 2D coordinate on the screen.

Despite the resource having the same dimensions as the render viewport,

the declaration must employ the buﬀer type because atomic operations are not

supported on Texture2D resources. Therefore an extra step will be required in

the shader wherein a 2D screen-coordinate location is converted to a byte-linear

address:

u i n t uL in ea rA dd re s s I n B y t e s = 4 ∗ ( S cree n Pos . y∗RENDERWIDTH +

Scre e nPos . x ) ;

The head pointer buﬀer is initialized to a “magic” value of 0xFFFFFFFF, in-

dicating that the linked list is empty to start with. In eﬀect, an address of

0xFFFFFFFF indicates that no more nodes are available (i.e., the end of the list

has been reached).

The term reverse linked list is used because the head pointer is dynamically

updated at construction time to receive the address of the latest linked list node

written out at each pixel location. Once construction is complete, the head

pointer value eﬀectively contains the address of the last node written out, with

this last node sequentially pointing to the nodes previously stored for the same

pixel location.

2.4.2 Nodes Buﬀer

The nodes buﬀer stores the nodes of all per-pixel linked lists. We cannot allocate

individual nodes buﬀers for every linked list since the render viewport dimensions

guarantee that a huge number of them will be required (one for each pixel in

the render viewport). Therefore the nodes buﬀer must be allocated with enough

memory to accommodate all possible translucent fragments in the scene. It is the

responsibility of the programmer to deﬁne this upper limit. A good heuristic to

use is to base the allocation size on the render viewport dimensions multiplied by

the average translucent overdraw expected. Certainly the size of the nodes buﬀer

is likely to be very large, and may place an unreasonable burden on the video

memory requirements of the OIT technique. Section 2.9 introduces a signiﬁcant

optimization to dramatically reduce the memory requirements of this algorithm.

The UAV for the nodes buﬀer is created with a built-in hardware counter

initialized to zero as a way to keep track of how many fragments have been

stored in the buﬀer.

2. Order-Independent Transparency using Per-Pixel Linked Lists 413

2.4.3 Fragment Data

Each linked list node stores fragment data as well as the address of the next

node. The address of the next node is stored as a uint while the type and size

of the fragment data depends on the needs of the application. Typically the

fragment structure includes fragment depth and color but other variables may be

stored as needed (e.g., normal, blending mode id, etc.). Fragment depth is an

essential component of the fragment structure since it will be required at linked

list traversal time to correctly sort fragments prior to rendering.

It is important to point out that any additional structure member will increase

the total size of the nodes buﬀer and therefore the total memory requirement of

the OIT solution. It is therefore desirable to economize the size of the fragment

structure. The implementation presented in this chapter packs the fragment color

into a single uint type and uses the following node data structure for a total of

12 bytes per fragment:

struct NodeData STRUCT

{

u i n t uCol o r ; // Fragment c o l o r packed as RGBA

u i n t uDepth ; // Fragment depth

u i n t uNext ; // Address o f next l in k e d l i s t node

} ;

2.5 Per-Pixel Linked List Creation

The algorithm presented does not use DirectCompute; instead, the storing of

translucent fragments into linked lists is done via a pixel shader writing to the

head pointer and nodes buﬀers UAVs. Earlier shader stages in the pipeline are

enabled (vertex shader, but also hull shader, domain shader and geometry shader

if needed) in order to turn incoming triangles into fragments that can be stored

into per-pixel linked lists.

No color render target is bound at this stage although a depth buﬀer is still

required to ensure that only translucent fragments that pass the depth/stencil

test are stored in the per-pixel linked lists. Binding a depth buﬀer avoids the

need to store translucent fragments that would result in being completely hidden

by previously-rendered opaque or alpha-tested geometry.

2.5.1 Algorithm Description

A description of the algorithm used to build per-pixel linked lists follows.

• For each frame

– Clear head pointer buﬀer to 0xFFFFFFFF (−1). This indicates that

the per-pixel linked lists are all empty to start with.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

Table of Contents for Chapter 2. Order-Independent Transparency using Per-Pixel Linked Lists (1/5)

Create new playlist

Sign In

Sign Up

Table of Contents for
Chapter 2. Order-Independent Transparency using Per-Pixel Linked Lists (1/5)