12. Data Structures for Graphics (8/9)

Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

296 12. Data Structures for Graphics

in the ﬁnal else statement (assuming the subtrees are non-empty for simplicity)

gives:

if (fa∗ fc ≥ 0) then

swap(fb,fc)

swap(b, c)

swap(fa,fb)

swap(a, b)

else if (fb∗ fc ≥ 0) then

swap(fa,fc)

swap(a, c)

swap(fa,fb)

swap(a, b)

compute A

compute B

=(a, b, A)

=(b, B , A)

=(A, B, c)

if (fc ≥ 0) then

negative-subtree.add(T

)

negative-subtree.add(T

)

positive-subtree.add(T

)

else

positive-subtree.add(T

)

positive-subtree.add(T

)

negative-subtree.add(T

)

This code takes advantage of the fact that the product of a and b are positive if they

have the same sign—thus, the ﬁrst if statement. If vertices are swapped, we must

do two swaps to keep the vertices ordered counterclockwise. Note that exactly

one of the vertices may lie exactly on the plane, in which case the code above will

work, but one of the generated triangles will have zero area. This can be handled

by ignoring the possibility, which is not that risky, because the rasterization code

must handle zero-area triangles in screen space (i.e., edge-on triangles). You can

also add a check that does not add zero-area triangles to the tree. Finally, you can

put in a special case for when exactly one of fa, fb,andfc is zero which cuts the

triangle into two triangles.

To compute A and B, a line segment and implicit plane intersection is needed.

For example, the parametric line connecting a and c is

p(t)=a + t(c −a).

12.5. Tiling Multidimensional Arrays 297

The point of intersection with the plane n · p + D =0is found by plugging p(t)

into the plane equation:

n · (a + t(c − a)) + D =0,

and solving for t:

t = −

n · a + D

n · (c − a)

Calling this solution t

, we can write the expression for A:

A = a + t

(c − a).

A similar computation will give B.

12.4.4 Optimizing the Tree

The efﬁciency of tree creation is much less of a concern than tree traversal because

it is a preprocess. The traversal of the BSP tree takes time proportional to the

number of nodes in the tree. (How well balanced the tree is does not matter.)

There will be one node for each triangle, including the triangles that are created

as a result of splitting. This number can depend on the order in which triangles

are added to the tree. For example, in Figure 12.39, if T

is the root, there will be

two nodes in the tree, but if T

is the root, there will be more nodes, because T

will be split.

It is difﬁcult to ﬁnd the “best” order of triangles to add to the tree. For N

triangles, there are N! orderings that are possible. So trying all orderings is not

usually feasible. Alternatively, some predetermined number of orderings can be

Figure 12.39. Using

as the root of a BSP tree

will result in a tree with two

nodes. Using

as the root

will require a cut and thus

make a larger tree.

tried from a random collection of permutations, and the best one can be kept for

the ﬁnal tree.

The splitting algorithm described above splits one triangle into three trian-

gles. It could be more efﬁcient to split a triangle into a triangle and a con-

vex quadrilateral. This is probably not worth it if all input models have only

triangles, but would be easy to support for implementations that accommodate

arbitrary polygons.

12.5 Tiling Multidimensional Arrays

Effectively utilizing the memory hierarchy is a crucial task in designing algo-

rithms for modern architectures. Making sure that multidimensional arrays have

298 12. Data Structures for Graphics

data in a “nice” arrangement is accomplished by tiling, sometimes also called

bricking. A traditional 2D array is stored as a 1D array together with an indexing

mechanism; for example, an N

by N

array is stored in a 1D array of length

and the 2D index (x, y) (which runs from (0, 0) to (N

−1,N

−1))maps

to the 1D index (running from 0 to N

− 1) using the formula

Figure 12.40. The mem-

ory layout for an untiled 2D

array with

and

index = x + N

An example of how that memory lays out is shown in Figure 12.40. A problem

with this layout is that although two adjacent array elements that are in the same

row are next to each other in memory, two adjacent elements in the same column

will be separated by N

elements in memory. This can cause poor memory lo-

cality for large N

. The standard solution to this is to use tiles to make memory

locality for rows and columns more equal. An example is shown in Figure 12.41

where 2 × 2 tiles are used. The details of indexing such an array are discussed in

the next section. A more complicated example, with two levels of tiling on a 3D

array, is covered after that.

A key question is what size to make the tiles. In practice, they should be

similar to the memory-unit size on the machine. For example, if we are using

Figure 12.41. The mem-

ory layout for a tiled 2D ar-

ray with

and

and 2 × 2 tiles. Note that

padding on the top of the

array is needed because

is not a multiple of the tile

size two.

16-bit (2-byte) data values on a machine with 128-byte cache lines, 8 × 8 tiles ﬁt

exactly in a cache line. However, using 32-bit ﬂoating-point numbers, which ﬁt

32 elements to a cache line, 5 × 5 tiles are a bit too small and 6 × 6 tiles are a

bit too large. Because there are also coarser-sized memory units such as pages,

hierarchical tiling with similar logic can be useful.

12.5.1 One-Level Tiling for 2D Arrays

If we assume an N

×N

array decomposed into square n×n tiles (Figure 12.42),

then the number of tiles required is

= N

/n,

= N

/n.

Here, we assume that n divides N

and N

exactly. When this is not true, the

array should be padded. For example, if N

=15and n =4,thenN

should

be changed to 16. To work out a formula for indexing such an array, we ﬁrst ﬁnd

the tile indices (b

) that give the row/column for the tiles (the tiles themselves

form a 2D array):

= x ÷ n,

= y ÷ n,

12.5. Tiling Multidimensional Arrays 299

Figure 12.42. A tiled 2D array composed of

tiles each of size

where ÷ is integer division, e.g., 12 ÷ 5=2. If we order the tiles along rows as

shown in Figure 12.40, then the index of the ﬁrst element of the tile (b

) is

index = n

+ b

The memory in that tile is arranged like a traditional 2D array as shown in Fig-

ure 12.41. The partial offsets (x



) inside the tile are



= x mod n,



= y mod n,

where mod is the remainder operator, e.g., 12 mod 5 = 2. Therefore, the offset

inside the tile is

offset = y



n + x



Thus the full formula for ﬁnding the 1D index element (x, y) in an N

×N

array

with n × n tiles is

index = n

+ b

)+y



n + x



= n

((N

÷ n)(y ÷ n)+x ÷ n)+(y mod n)n +(x mod n).

This expression contains many integer multiplication, divide and modulus oper-

ations, which are costly on some processors. When n is a power of two, these

operations can be converted to bitshifts and bitwise logical operations. However,

as noted above, the ideal size is not always a power of two. Some of the mul-

tiplications can be converted to shift/add operations, but the divide and modulus

300 12. Data Structures for Graphics

operations are more problematic. The indices could be computed incrementally,

but this would require tracking counters, with numerous comparisons and poor

branch prediction performance.

However, there is a simple solution; note that the index expression can be

written as

index = F

(x)+F

(y),

where

(x)=n

(x ÷ n)+(x mod n),

(y)=n

÷ n)(y ÷ n)+(y mod n)n.

We tabulate F

and F

, and use x and y to ﬁnd the index into the data array.

These tables will consist of N

and N

elements, respectively. The total size of

the tables will ﬁt in the primary data cache of the processor, even for very large

data set sizes.

12.5.2 Example: Two-Level Tiling for 3D Arrays

Effective TLB utilization is also becoming a crucial factor in algorithm perfor-

mance. The same technique can be used to improve TLB hit rates in a 3D array

TLB: translation lookaside

buffer, a cache that is part

of the virtual memory sys-

tem.

by creating m × m × m bricks of n × n × n cells. For example, a 40 × 20 × 19

volume could be decomposed into 4 × 2 × 2 macrobricks of 2 × 2 × 2 bricks of

5 × 5 × 5 cells. This corresponds to m =2and n =5. Because 19 cannot be

factored by mn =10, one level of padding is needed. Empirically useful sizes

are m =5for 16 bit datasets and m =6for ﬂoat datasets.

The resulting index into the data array can be computed for any (x, y, z) triple

with the expression

index =((x ÷ n) ÷ m)n

((N

÷ n) ÷ m)((N

÷ n) ÷ m)

+((y ÷ n) ÷ m)n

((N

÷ n) ÷ m)

+((z ÷ n) ÷ m)n

+((x ÷ n)modm)n

+((y ÷ n)modm)n

+((z ÷ n)modm)n

+(x mod (n

))n

+(y mod n)n

+(z mod n),

where N

, N

and N

are the respective sizes of the dataset.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

Table of Contents for 12. Data Structures for Graphics (8/9)

Create new playlist

Sign In

Sign Up

Table of Contents for
12. Data Structures for Graphics (8/9)