24. Bit Hacks for Games (2/3)

Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

390 24.BitHacksforGames

On PowerPC processors, the cntlzw (count leading zeros word) instruction

can be used to evaluate the first expression in Table 24.1,

(a == 0), by calculat-

ing

cntlzw(a) >> 5. This works because the cntlzw instruction produces the

value 32 when its input is zero and a lower value for any other input. When shift-

ed right five bits, the value 32 becomes 1, and all other values are completely

shifted out to produce 0. A similar instruction called

BSR (bit scan reverse) exists

on x86 processors, but it produces undefined results when the input is zero, so it

cannot achieve the same result without a branch to handle the zero case.

Predicates can be used to perform a variety of conditional operations. The

expressions shown in Table 24.1 are typically used to change some other value

by one (or a power of two with the proper left shift), and the expressions shown

in Table 24.2 are typically used as masks that conditionally preserve or clear

some other value. We look at several examples in the remainder of this section.

ConditionalIncrementandDecrement

To perform conditional increment or decrement operations, the expressions

shown in Table 24.1 can simply be added to or subtracted from another value,

respectively. For example, the conditional statement

if (a >= 0) x++;

can be replaced by the following non-branching code:

x += (unsigned) ~a >> 31;

ConditionalAdditionandSubtraction

Conditional addition and subtraction can be performed by using the expressions

shown in Table 24.2 to mask the operations. For example, the conditional state-

ment

if (a >= 0) x += y;

can be replaced by the following non-branching code:

x += y & (~a >> 31);

IncrementorDecrementModuloN

The mask for the predicate (a < 0) can be used to implement increment and

decrement operations modulo a number n. Incrementing modulo 3 is particularly

common in game programming because it’s used to iterate over the vertices of a

24.2Predicates 391

triangle or to iterate over the columns of a



matrix from an arbitrary starting

index in the range





0,2

To increment a number modulo n, we can subtract 1n



from it and compare

against zero. If the result is negative, then we keep the new value; otherwise, we

wrap around to zero. For the decrement operation, we subtract one and then

compare against zero. If the result is negative, then we add the modulus n. These

operations are shown in Listing 24.4, and both generate four instructions when n

is a compile-time constant.

template <int n> inline int IncMod(int x)

{

return ((x + 1) & ((x - (n - 1)) >> 31));

}

template <int n> inline int DecMod(int x)

{

x--;

return (x + ((x >> 31) & n));

}

Listing 24.4. These functions increment and decrement the input value modulo n.

ClampingtoZero

Another use of masks is clamping against zero. The minimum and maximum

functions shown in Listing 24.5 take a single input and clamp to a minimum of

zero or a maximum of zero. On processors that have a logical AND with com-

plement instruction, like the PowerPC, both of these functions generate only two

instructions.

inline int MinZero(int x)

{

return (x & (x >> 31));

}

inline int MaxZero(int x)

{

return (x & ~(x >> 31));

}

Listing 24.5. These functions take the minimum and maximum of the input with zero.

392 24.BitHacksforGames

MinimumandMaximum

Branchless minimum and maximum operations are difficult to achieve when they

must work for the entire range of 32-bit integers. (However, see [1] for some ex-

amples that use special instructions.) They become much easier when we can

assume that the difference between the two input operands doesn’t underflow or

overflow when one is subtracted from the other. Another way of putting this is to

say that the input operands always have two bits of sign or that they are always in

the range





30 30

2,2 1

. When this is the case, we can compare the difference to

zero in order to produce a mask used to choose the minimum or maximum value.

The code is shown in Listing 24.6. Both functions generate four instructions if a

logical AND with complement is available; otherwise, the

Min() function gener-

ates five instructions.

24.3MiscellaneousTricks

This section presents several miscellaneous tricks that can be used to optimize

code. Some of the tricks are generic and can be applied to many different situa-

tions, while others are meant for specific tasks.

CleartheLeastSignificant1Bit

The least significant 1 bit of any value x can be cleared by logically ANDing it

with 1

 . This property can be used to count the number of 1 bits in a value by

repeatedly clearing the least significant 1 bit until the value becomes zero. (See

[Anderson 2005] for more efficient methods, however.)

inline int Min(int x, int y)

{

int a = x - y;

return (x - (a & ~(a >> 31)));

}

inline int Max(int x, int y)

{

int a = x - y;

return (x - (a & (a >> 31)));

}

Listing 24.6. These functions return the minimum and maximum of a pair of integers when we

can assume two bits of sign.

24.3MiscellaneousTri cks 393

TestforPowerofTwo

If we clear the least significant 1 bit and find that the result is zero, then we know

that the original value was either zero or was a power of two because only a sin-

gle bit was set. Functions that test whether a value is a power of two, with and

without returning a positive result for zero, are shown in Listing 24.7.

inline bool PowerOfTwo(int x)

{

int y = x - 1; // y is negative only if x == 0.

return ((x & y) - (y >> 31) == 0);

}

inline bool PowerOfTwoOrZero(int x)

{

return ((x & (x - 1)) == 0);

}

Listing 24.7. These functions test whether a value is a power of two.

TestforPowerofTwoMinusOne

The bits to the right of the least significant 0 bit of any value x can be cleared by

logically ANDing it with 1



. If the result is zero, then that means that the value

was composed of a contiguous string of n 1 bits with no trailing 0 bits, which is

the representation of



. Note that this test gives a positive result for zero,

which is correct because zero is one less than

2 .

DetermineWhetheraVoxelContainsTriangles

In the marching cubes algorithm, an 8-bit code is determined for every cell in a

voxel grid, and this code maps to a set of triangulation cases that tell how many

vertices and triangles are necessary to extract the isosurface within the cell. The

codes

0x00 and 0xFF correspond to cells containing no triangles, and such cells

are skipped during the mesh generation process. We would like to avoid making

two comparisons and instead make only one so that empty cells are skipped more

efficiently.

Suppose that voxel values are signed 8-bit integers. We form the case code

by shifting the sign bits from the voxels at each of the eight cell corners into a

394 24.BitHacksforGames

different position in an 8-bit byte, as shown in Listing 24.8. We then exclusive

OR the case code with any one of the voxel values shifted right seven bits. The

result is zero for exactly the cases

0x00 and 0xFF, and it’s nonzero for everything

else.

unsigned long caseCode = ((corner[0] >> 7) & 0x01)

| ((corner[1] >> 6) & 0x02)

| ((corner[2] >> 5) & 0x04)

| ((corner[3] >> 4) & 0x08)

| ((corner[4] >> 3) & 0x10)

| ((corner[5] >> 2) & 0x20)

| ((corner[6] >> 1) & 0x40)

| (corner[7] & 0x80);

if ((caseCode ^ ((corner[7] >> 7) & 0xFF)) != 0)

{

// Cell has a nontrivial triangulation.

}

Listing 24.8. The case code for a cell is constructed by shifting the sign bits from the eight corner

voxel values into specific bit positions. One of the voxel values is shifted seven bits right to

produce a mask of all 0s or all 1s, and it is then exclusive ORed with the case code to determine

whether a cell contains triangles.

DeterminetheIndexoftheGreatestValueinaSetofThree

Given a set of three values (which could be floating-point), we sometimes need

to determine the index of the largest value in the set, in the range





0,2

. In partic-

ular, this arises when finding the support point for a triangle in a given direction

for the Gilbert-Johnson-Keerthi (GJK) algorithm. It’s easy to perform a few

comparisons and return different results based on the outcome, but we would like

to eliminate some of the branches involved in doing that.

For a set of values





012

,,vvv

, Table 24.3 enumerates the six possible combi-

nations of the truth values for the comparisons

v

, and

v

, which

we label as

, and

, respectively. As it turns out, the sum of

b and

produces the correct index for all possible cases. This leads us to the code

shown in Listing 24.9.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

Table of Contents for 24. Bit Hacks for Games (2/3)

Create new playlist

Sign In

Sign Up

Table of Contents for
24. Bit Hacks for Games (2/3)