390 24.BitHacksforGames
On PowerPC processors, the cntlzw (count leading zeros word) instruction
can be used to evaluate the first expression in Table 24.1,
(a == 0), by calculat-
ing
cntlzw(a) >> 5. This works because the cntlzw instruction produces the
value 32 when its input is zero and a lower value for any other input. When shift-
ed right five bits, the value 32 becomes 1, and all other values are completely
shifted out to produce 0. A similar instruction called
BSR (bit scan reverse) exists
on x86 processors, but it produces undefined results when the input is zero, so it
cannot achieve the same result without a branch to handle the zero case.
Predicates can be used to perform a variety of conditional operations. The
expressions shown in Table 24.1 are typically used to change some other value
by one (or a power of two with the proper left shift), and the expressions shown
in Table 24.2 are typically used as masks that conditionally preserve or clear
some other value. We look at several examples in the remainder of this section.
ConditionalIncrementandDecrement
To perform conditional increment or decrement operations, the expressions
shown in Table 24.1 can simply be added to or subtracted from another value,
respectively. For example, the conditional statement
if (a >= 0) x++;
can be replaced by the following non-branching code:
x += (unsigned) ~a >> 31;
ConditionalAdditionandSubtraction
Conditional addition and subtraction can be performed by using the expressions
shown in Table 24.2 to mask the operations. For example, the conditional state-
ment
if (a >= 0) x += y;
can be replaced by the following non-branching code:
x += y & (~a >> 31);
IncrementorDecrementModuloN
The mask for the predicate (a < 0) can be used to implement increment and
decrement operations modulo a number n. Incrementing modulo 3 is particularly
common in game programming because it’s used to iterate over the vertices of a
24.2Predicates 391
triangle or to iterate over the columns of a
3
3
matrix from an arbitrary starting
index in the range
0,2
.
To increment a number modulo n, we can subtract 1n
from it and compare
against zero. If the result is negative, then we keep the new value; otherwise, we
wrap around to zero. For the decrement operation, we subtract one and then
compare against zero. If the result is negative, then we add the modulus n. These
operations are shown in Listing 24.4, and both generate four instructions when n
is a compile-time constant.
template <int n> inline int IncMod(int x)
{
return ((x + 1) & ((x - (n - 1)) >> 31));
}
template <int n> inline int DecMod(int x)
{
x--;
return (x + ((x >> 31) & n));
}
Listing 24.4. These functions increment and decrement the input value modulo n.
ClampingtoZero
Another use of masks is clamping against zero. The minimum and maximum
functions shown in Listing 24.5 take a single input and clamp to a minimum of
zero or a maximum of zero. On processors that have a logical AND with com-
plement instruction, like the PowerPC, both of these functions generate only two
instructions.
inline int MinZero(int x)
{
return (x & (x >> 31));
}
inline int MaxZero(int x)
{
return (x & ~(x >> 31));
}
Listing 24.5. These functions take the minimum and maximum of the input with zero.
392 24.BitHacksforGames
MinimumandMaximum
Branchless minimum and maximum operations are difficult to achieve when they
must work for the entire range of 32-bit integers. (However, see [1] for some ex-
amples that use special instructions.) They become much easier when we can
assume that the difference between the two input operands doesn’t underflow or
overflow when one is subtracted from the other. Another way of putting this is to
say that the input operands always have two bits of sign or that they are always in
the range
30 30
2,2 1
. When this is the case, we can compare the difference to
zero in order to produce a mask used to choose the minimum or maximum value.
The code is shown in Listing 24.6. Both functions generate four instructions if a
logical AND with complement is available; otherwise, the
Min() function gener-
ates five instructions.
24.3MiscellaneousTricks
This section presents several miscellaneous tricks that can be used to optimize
code. Some of the tricks are generic and can be applied to many different situa-
tions, while others are meant for specific tasks.
CleartheLeastSignificant1Bit
The least significant 1 bit of any value x can be cleared by logically ANDing it
with 1
x
. This property can be used to count the number of 1 bits in a value by
repeatedly clearing the least significant 1 bit until the value becomes zero. (See
[Anderson 2005] for more efficient methods, however.)
inline int Min(int x, int y)
{
int a = x - y;
return (x - (a & ~(a >> 31)));
}
inline int Max(int x, int y)
{
int a = x - y;
return (x - (a & (a >> 31)));
}
Listing 24.6. These functions return the minimum and maximum of a pair of integers when we
can assume two bits of sign.
24.3MiscellaneousTri cks 393
TestforPowerofTwo
If we clear the least significant 1 bit and find that the result is zero, then we know
that the original value was either zero or was a power of two because only a sin-
gle bit was set. Functions that test whether a value is a power of two, with and
without returning a positive result for zero, are shown in Listing 24.7.
inline bool PowerOfTwo(int x)
{
int y = x - 1; // y is negative only if x == 0.
return ((x & y) - (y >> 31) == 0);
}
inline bool PowerOfTwoOrZero(int x)
{
return ((x & (x - 1)) == 0);
}
Listing 24.7. These functions test whether a value is a power of two.
TestforPowerofTwoMinusOne
The bits to the right of the least significant 0 bit of any value x can be cleared by
logically ANDing it with 1
x
. If the result is zero, then that means that the value
was composed of a contiguous string of n 1 bits with no trailing 0 bits, which is
the representation of
2
1
n
. Note that this test gives a positive result for zero,
which is correct because zero is one less than
0
2 .
DetermineWhetheraVoxelContainsTriangles
In the marching cubes algorithm, an 8-bit code is determined for every cell in a
voxel grid, and this code maps to a set of triangulation cases that tell how many
vertices and triangles are necessary to extract the isosurface within the cell. The
codes
0x00 and 0xFF correspond to cells containing no triangles, and such cells
are skipped during the mesh generation process. We would like to avoid making
two comparisons and instead make only one so that empty cells are skipped more
efficiently.
Suppose that voxel values are signed 8-bit integers. We form the case code
by shifting the sign bits from the voxels at each of the eight cell corners into a
394 24.BitHacksforGames
different position in an 8-bit byte, as shown in Listing 24.8. We then exclusive
OR the case code with any one of the voxel values shifted right seven bits. The
result is zero for exactly the cases
0x00 and 0xFF, and it’s nonzero for everything
else.
unsigned long caseCode = ((corner[0] >> 7) & 0x01)
| ((corner[1] >> 6) & 0x02)
| ((corner[2] >> 5) & 0x04)
| ((corner[3] >> 4) & 0x08)
| ((corner[4] >> 3) & 0x10)
| ((corner[5] >> 2) & 0x20)
| ((corner[6] >> 1) & 0x40)
| (corner[7] & 0x80);
if ((caseCode ^ ((corner[7] >> 7) & 0xFF)) != 0)
{
// Cell has a nontrivial triangulation.
}
Listing 24.8. The case code for a cell is constructed by shifting the sign bits from the eight corner
voxel values into specific bit positions. One of the voxel values is shifted seven bits right to
produce a mask of all 0s or all 1s, and it is then exclusive ORed with the case code to determine
whether a cell contains triangles.
DeterminetheIndexoftheGreatestValueinaSetofThree
Given a set of three values (which could be floating-point), we sometimes need
to determine the index of the largest value in the set, in the range
0,2
. In partic-
ular, this arises when finding the support point for a triangle in a given direction
for the Gilbert-Johnson-Keerthi (GJK) algorithm. It’s easy to perform a few
comparisons and return different results based on the outcome, but we would like
to eliminate some of the branches involved in doing that.
For a set of values
012
,,vvv
, Table 24.3 enumerates the six possible combi-
nations of the truth values for the comparisons
10
v
v
,
20
v
v
, and
21
v
v
, which
we label as
0
b
,
1
b
, and
2
b
, respectively. As it turns out, the sum of
01
|
b
b and
12
&
b
b
produces the correct index for all possible cases. This leads us to the code
shown in Listing 24.9.
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.147.126.211