i
i
i
i
i
i
i
i
296 12. Data Structures for Graphics
in the nal else statement (assuming the subtrees are non-empty for simplicity)
gives:
if (fa fc 0) then
swap(fb,fc)
swap(b, c)
swap(fa,fb)
swap(a, b)
else if (fb fc 0) then
swap(fa,fc)
swap(a, c)
swap(fa,fb)
swap(a, b)
compute A
compute B
T
1
=(a, b, A)
T
2
=(b, B , A)
T
3
=(A, B, c)
if (fc 0) then
negative-subtree.add(T
1
)
negative-subtree.add(T
2
)
positive-subtree.add(T
3
)
else
positive-subtree.add(T
1
)
positive-subtree.add(T
2
)
negative-subtree.add(T
3
)
This code takes advantage of the fact that the product of a and b are positive if they
have the same sign—thus, the rst if statement. If vertices are swapped, we must
do two swaps to keep the vertices ordered counterclockwise. Note that exactly
one of the vertices may lie exactly on the plane, in which case the code above will
work, but one of the generated triangles will have zero area. This can be handled
by ignoring the possibility, which is not that risky, because the rasterization code
must handle zero-area triangles in screen space (i.e., edge-on triangles). You can
also add a check that does not add zero-area triangles to the tree. Finally, you can
put in a special case for when exactly one of fa, fb,andfc is zero which cuts the
triangle into two triangles.
To compute A and B, a line segment and implicit plane intersection is needed.
For example, the parametric line connecting a and c is
p(t)=a + t(c a).
i
i
i
i
i
i
i
i
12.5. Tiling Multidimensional Arrays 297
The point of intersection with the plane n · p + D =0is found by plugging p(t)
into the plane equation:
n · (a + t(c a)) + D =0,
and solving for t:
t =
n · a + D
n · (c a)
.
Calling this solution t
A
, we can write the expression for A:
A = a + t
A
(c a).
A similar computation will give B.
12.4.4 Optimizing the Tree
The efciency of tree creation is much less of a concern than tree traversal because
it is a preprocess. The traversal of the BSP tree takes time proportional to the
number of nodes in the tree. (How well balanced the tree is does not matter.)
There will be one node for each triangle, including the triangles that are created
as a result of splitting. This number can depend on the order in which triangles
are added to the tree. For example, in Figure 12.39, if T
1
is the root, there will be
two nodes in the tree, but if T
2
is the root, there will be more nodes, because T
1
will be split.
It is difcult to nd the “best” order of triangles to add to the tree. For N
triangles, there are N! orderings that are possible. So trying all orderings is not
usually feasible. Alternatively, some predetermined number of orderings can be
Figure 12.39. Using
T
1
as the root of a BSP tree
will result in a tree with two
nodes. Using
T
2
as the root
will require a cut and thus
make a larger tree.
tried from a random collection of permutations, and the best one can be kept for
the nal tree.
The splitting algorithm described above splits one triangle into three trian-
gles. It could be more efcient to split a triangle into a triangle and a con-
vex quadrilateral. This is probably not worth it if all input models have only
triangles, but would be easy to support for implementations that accommodate
arbitrary polygons.
12.5 Tiling Multidimensional Arrays
Effectively utilizing the memory hierarchy is a crucial task in designing algo-
rithms for modern architectures. Making sure that multidimensional arrays have
i
i
i
i
i
i
i
i
298 12. Data Structures for Graphics
data in a “nice” arrangement is accomplished by tiling, sometimes also called
bricking. A traditional 2D array is stored as a 1D array together with an indexing
mechanism; for example, an N
x
by N
y
array is stored in a 1D array of length
N
x
N
y
and the 2D index (x, y) (which runs from (0, 0) to (N
x
1,N
y
1))maps
to the 1D index (running from 0 to N
x
N
y
1) using the formula
Figure 12.40. The mem-
ory layout for an untiled 2D
array with
N
x
=
4
and
N
y
=
3
.
index = x + N
x
y.
An example of how that memory lays out is shown in Figure 12.40. A problem
with this layout is that although two adjacent array elements that are in the same
row are next to each other in memory, two adjacent elements in the same column
will be separated by N
x
elements in memory. This can cause poor memory lo-
cality for large N
x
. The standard solution to this is to use tiles to make memory
locality for rows and columns more equal. An example is shown in Figure 12.41
where 2 × 2 tiles are used. The details of indexing such an array are discussed in
the next section. A more complicated example, with two levels of tiling on a 3D
array, is covered after that.
A key question is what size to make the tiles. In practice, they should be
similar to the memory-unit size on the machine. For example, if we are using
Figure 12.41. The mem-
ory layout for a tiled 2D ar-
ray with
N
x
=
4
and
N
y
=
3
and 2 × 2 tiles. Note that
padding on the top of the
array is needed because
N
y
is not a multiple of the tile
size two.
16-bit (2-byte) data values on a machine with 128-byte cache lines, 8 × 8 tiles t
exactly in a cache line. However, using 32-bit oating-point numbers, which t
32 elements to a cache line, 5 × 5 tiles are a bit too small and 6 × 6 tiles are a
bit too large. Because there are also coarser-sized memory units such as pages,
hierarchical tiling with similar logic can be useful.
12.5.1 One-Level Tiling for 2D Arrays
If we assume an N
x
×N
y
array decomposed into square n×n tiles (Figure 12.42),
then the number of tiles required is
B
x
= N
x
/n,
B
y
= N
y
/n.
Here, we assume that n divides N
x
and N
y
exactly. When this is not true, the
array should be padded. For example, if N
x
=15and n =4,thenN
x
should
be changed to 16. To work out a formula for indexing such an array, we rst nd
the tile indices (b
x
,b
y
) that give the row/column for the tiles (the tiles themselves
form a 2D array):
b
x
= x ÷ n,
b
y
= y ÷ n,
i
i
i
i
i
i
i
i
12.5. Tiling Multidimensional Arrays 299
Figure 12.42. A tiled 2D array composed of
B
x
×
B
y
tiles each of size
n
by
n
.
where ÷ is integer division, e.g., 12 ÷ 5=2. If we order the tiles along rows as
shown in Figure 12.40, then the index of the rst element of the tile (b
x
,b
y
) is
index = n
2
(B
x
b
y
+ b
x
).
The memory in that tile is arranged like a traditional 2D array as shown in Fig-
ure 12.41. The partial offsets (x
,y
) inside the tile are
x
= x mod n,
y
= y mod n,
where mod is the remainder operator, e.g., 12 mod 5 = 2. Therefore, the offset
inside the tile is
offset = y
n + x
.
Thus the full formula for nding the 1D index element (x, y) in an N
x
×N
y
array
with n × n tiles is
index = n
2
(B
x
b
y
+ b
x
)+y
n + x
,
= n
2
((N
x
÷ n)(y ÷ n)+x ÷ n)+(y mod n)n +(x mod n).
This expression contains many integer multiplication, divide and modulus oper-
ations, which are costly on some processors. When n is a power of two, these
operations can be converted to bitshifts and bitwise logical operations. However,
as noted above, the ideal size is not always a power of two. Some of the mul-
tiplications can be converted to shift/add operations, but the divide and modulus
i
i
i
i
i
i
i
i
300 12. Data Structures for Graphics
operations are more problematic. The indices could be computed incrementally,
but this would require tracking counters, with numerous comparisons and poor
branch prediction performance.
However, there is a simple solution; note that the index expression can be
written as
index = F
x
(x)+F
y
(y),
where
F
x
(x)=n
2
(x ÷ n)+(x mod n),
F
y
(y)=n
2
(N
x
÷ n)(y ÷ n)+(y mod n)n.
We tabulate F
x
and F
y
, and use x and y to nd the index into the data array.
These tables will consist of N
x
and N
y
elements, respectively. The total size of
the tables will t in the primary data cache of the processor, even for very large
data set sizes.
12.5.2 Example: Two-Level Tiling for 3D Arrays
Effective TLB utilization is also becoming a crucial factor in algorithm perfor-
mance. The same technique can be used to improve TLB hit rates in a 3D array
TLB: translation lookaside
buffer, a cache that is part
of the virtual memory sys-
tem.
by creating m × m × m bricks of n × n × n cells. For example, a 40 × 20 × 19
volume could be decomposed into 4 × 2 × 2 macrobricks of 2 × 2 × 2 bricks of
5 × 5 × 5 cells. This corresponds to m =2and n =5. Because 19 cannot be
factored by mn =10, one level of padding is needed. Empirically useful sizes
are m =5for 16 bit datasets and m =6for oat datasets.
The resulting index into the data array can be computed for any (x, y, z) triple
with the expression
index =((x ÷ n) ÷ m)n
3
m
3
((N
z
÷ n) ÷ m)((N
y
÷ n) ÷ m)
+((y ÷ n) ÷ m)n
3
m
3
((N
z
÷ n) ÷ m)
+((z ÷ n) ÷ m)n
3
m
3
+((x ÷ n)modm)n
3
m
2
+((y ÷ n)modm)n
3
m
+((z ÷ n)modm)n
3
+(x mod (n
2
))n
2
+(y mod n)n
+(z mod n),
where N
x
, N
y
and N
z
are the respective sizes of the dataset.
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.12.151.153