Chapter 18. Multiprecision Processing and Monte Carlo Methods

This chapter discusses two important SDK libraries whose capabilities can be used in many different applications. The first is the Multiprecision Math library, or libmpm, and the second is the Monte Carlo library, or libmc_rand. Both libraries provide functions that generate and manipulate special kinds of numbers.

The functions in the Multiprecision Math library operate on signed and unsigned integers that, by default, can occupy up to 4096 bits. Numbers of this size are important in cryptography, number theory, and any computation requiring values larger than 64-bit long longs. libmpm not only contains basic arithmetic functions for multiprecision numbers, but also advanced routines for operations such as modular reduction and modular exponentiation. These routines are crucial in data security, and the discussion of libmpm concludes with a full example of RSA encryption.

The Monte Carlo library, libmc_rand, provides functions that generate and transform sequences of numbers that resemble random numbers. Specifically, it generates number sequences using the Kirkpatrick-Stoll, Mersenne Twister, and Sobol methods. The distribution of these sequences can then be transformed using the Box-Muller algorithm, Moro’s inversion, or the Polar method.

Multiprecision Mathematics Library (libmpm)

In his fine book Cryptography in C and C++, Michael Welschenbach spends many chapters developing functions that operate on numbers comprising hundreds and thousands of bits. These functions are vital in Internet security and find application throughout computational mathematics. With libmpm, these routines are already available, and as a later example will show, they provide a solid foundation for implementing cryptographic methods such as RSA.

The multiprecision numbers we’ll be dealing with can be positive or negative, but they’re all integers. The functions in libmpm access these numbers as arrays of vector unsigned ints. For example, to allocate memory for a 512-bit integer called big_int, you’d create an array of 512/128 = 4 vectors, as shown here:

vector unsigned int big_int[4];

In this example, the pointer big_int would be passed into the libmpm functions and its size would be given as 4.

The maximum allowable size is determined by the MPM_MAX_SIZE parameter. This is set to 32, which means that multiprecision numbers can have up to 128 * 32 = 4096 bits. The bits of a multiprecision number are ordered in big-endian fashion, just as they are inside a vector.

This presentation of libmpm consists of two parts: general-purpose arithmetic/logic functions and multiprecision division/modular functions. The functions in the first category are easy to understand, but those in the second require additional explanation.

Multiprecision General Arithmetic/Logic Functions

Table 18.1 lists the general-purpose routines in libmpm that operate on multiprecision values. In each entry, vec is short for vector unsigned int. The size parameter identifies how many vector unsigned ints make up the multiprecision value.

Table 18.1. General-Purpose Multiprecision Functions

Function Name

Purpose

mpm_abs(vec* a, int size)

Set a equal to its absolute value.

mpm_neg(vec* a, int size)

Set a equal to its negation.

mpm_add(vec* sum, vec* a, vec* b, int size)

Set sum equal to the sum of a and b, where a and b have the same size, return carry.

mpm_add2(vec* sum, vec* a, int a_size, vec* b, int b_size)

Set sum equal to the sum of a and b, where a and b have different sizes, return carry.

mpm_add3(vec* sum, int sum_size, vec* a, int a_size, vec* b, int b_size)

Set sum equal to the sum of a and b, where a, b, and sum have different sizes, return carry.

mpm_add_partial(vec* sum, vec* a, vec* b, vec* carry, int size)

Perform a partial sum of two values. Accept carry as input, return as output.

mpm_sub(vec* diff, vec* a, vec* b, int size)

Set diff equal to a minus b, where a and b have the same size, return borrow.

mpm_sub2(vec* diff, vec* a, int a_size, vec* b, int b_size)

Set diff equal to a minus b, where a and b have different sizes.

mpm_square(vec* square, vec* a, int size)

Set square equal to a*a.

mpm_mul(vec* prod, vec* a, int a_size, vec* b, int b_size)

Set prod equal to a times b, where a and b have different sizes.

mpm_madd(vec* res, vec* a, int a_size, vec* b, int b_size, vec* c, int c_size)

Set res equal to a*b + c, where a, b, and c have different sizes.

mpm_cmpeq(vec* a, vec* b, int size)

Return all 1s if a and b are equal, all 0s otherwise.

mpm_cmpeq2(vec* a, int a_size, vec* b, int b_size)

Return all 1s if a and b (different sizes) are equal, all 0s otherwise.

mpm_cmpge(vec* a, vec* b, int size)

Return all 1s if a is greater than or equal to b, all 0s otherwise.

mpm_cmpge2(vec* a, int a_size, vec* b, int b_size)

Return all 1s if a is greater than or equal to b (different sizes), all 0s otherwise.

mpm_cmpgt(vec* a, vec* b, int size)

Return all 1s if a is greater than b, all 0s otherwise.

mpm_cmpgt2(vec* sum, vec* a, int a_size, vec* b, int b_size)

Return all 1s if a is greater than b (different sizes), all 0s otherwise.

mpm_swap_endian(vec* a, int size)

Reverse the bit ordering of the contents of the multiprecision number in a.

mpm_sizeof(vec* a, int size)

Return index of highest non-0 quadword in a.

None of these functions initialize a multiprecision value. The vectors in the value must be initialized individually.

The first two functions are concerned with the sign of the input multiprecision value. mpm_abs sets the input number equal to its magnitude and mpm_neg sets the number equal to its negation.

The usage of mpm_add, mpm_add2, and mpm_add3 depends on the size of the operands and the sum. mpm_add assumes that the operands and sum have the same size, and returns a vector containing the carry result. mpm_add2 assumes that the operands have different sizes and returns the size of the sum. mpm_add3 also assumes that the operands have different sizes, but accepts the size of the sum as a function argument.

mpm_add_partial is similar to mpm_add, but carry is a multiprecision value that serves as both input and output. This makes it possible to add multiple multiprecision values with a single carry value. For example, let’s say you want to add three 256-bit numbers. The same carry value is updated with each addition and is used at the end to augment the running sum. This is shown in Figure 18.1.

Multiprecision addition for multiple values

Figure 18.1. Multiprecision addition for multiple values

To implement this in code, mpm_add_partial must be called twice: once to add the first two inputs, and once to add the third input to the temporary sum. Then mpm_add adds the carry to the temporary sum. This is shown in Listing 18.1, which adds three vectors containing repeated 0xAAAAAAAA, 0xBBBBBBBB, and 0xCCCCCCCC sequences.

Example 18.1. Multiprecision Addition: spu_mpm_add.c

#include <stdio.h>
#include <libmpm.h>
#include <spu_intrinsics.h>

#define SIZE 2

int main() {

   int i, j;

   /* Create multiprecision values x, y, and z */
   vector unsigned int x[SIZE], y[SIZE], z[SIZE];

   /* Declare the carry, temp sum, and sum */
   vector unsigned int carry[SIZE], tmp_sum[SIZE],
      sum[SIZE+1];

   for (i=0; i<SIZE; i++) {
      x[i] = (vector unsigned int)
         {0xAAAAAAAA, 0xAAAAAAAA, 0xAAAAAAAA, 0xAAAAAAAA};
      y[i] = (vector unsigned int)
         {0xBBBBBBBB, 0xBBBBBBBB, 0xBBBBBBBB, 0xBBBBBBBB};
      z[i] = (vector unsigned int)
         {0xCCCCCCCC, 0xCCCCCCCC, 0xCCCCCCCC, 0xCCCCCCCC};
      carry[i] = (vector unsigned int)
         {0x00000000, 0x00000000, 0x00000000, 0x00000000};
      tmp_sum[i] = (vector unsigned int)
         {0x00000000, 0x00000000, 0x00000000, 0x00000000};
   }

   /* Perform the addition */
   mpm_add_partial(tmp_sum, x, y, carry, SIZE);
   mpm_add_partial(tmp_sum, z, tmp_sum, carry, SIZE);

   /* Set the first vector in sum */
   vector unsigned char sum_vec = {128, 128, 128, 128, 128,
      128, 128, 128, 128, 128, 128, 128, 28, 29, 30, 31};
   sum[0] = spu_shuffle(carry[0], carry[0], sum_vec);

   /* Shift the rest of the carry values into the sum */
   sum_vec = (vector unsigned char){4, 5, 6, 7,
      8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19};
   for (i=1; i<SIZE; i++) {
      sum[i] = spu_shuffle(carry[i-1], carry[i], sum_vec);
   }

   /* Shift the last carry value */
   sum[SIZE] = spu_slqwbyte(carry[SIZE-1],
      sizeof(unsigned int));

   /* Add the sum to the tmp_sum */
   mpm_add2(sum, tmp_sum, SIZE, sum, SIZE+1);

   /* Display results */
   printf("Sum:
");
   for (i=0; i<SIZE+1; i++) {
      for (j=0; j<4; j++)
         printf("%08x ", spu_extract(sum[i], j));
      printf("
");
   }
   return 0;
}

The three multiplication functions, mpm_square, mpm_mul, and mpm_madd operate exactly as their names imply. The first multiplies a multiprecision value by itself, the second multiplies different numbers, and the third multiplies two numbers and adds a third. In each case, the quadword size of each input value must be identified.

The next six functions in the table compare multiprecision values. Like the similarly named SPU intrinsics, mpm_cmpgeq tests for equality, mpm_cmpgth tests for a greater-than relationship, and mpm_cmpgte tests for a greater-than-or-equal-to relationship. Each function has a second implementation that operates on values of different sizes. If a test succeeds, the function returns an unsigned int whose bits are all set to 1. If the test fails, all the bits are set to 0.

The last two functions perform miscellaneous operations. mpm_swap_endian reverses the bit ordering of a number: little-endian becomes big-endian, and big-endian becomes little-endian. mpm_sizeof returns the size of the non-zero portion of a multiprecision value, which is not necessarily the amount of memory allocated to hold the value. For example, if big_int is made up of eight vectors but only the lowest six hold non-zero values, mpm_sizeof returns 6.

Multiprecision Division and Modular Operations

Multiprecision division and modular mathematics are crucial in cryptography, and many systems are only secure because these operations take so long to perform. For this reason, libmpm.a provides a number of different functions for division and modular computation. Table 18.2 lists each of them.

Table 18.2. Multiprecision Division Functions

Function name

Purpose

mpm_div(vec* quot, vec* rem, vec* a, int a_size, vec* b, int b_size)

Compute a divided by b, place the quotient in quot, the remainder in rem

mpm_div2(vec* quot, vec* a, int a_size, vec* b, int b_size)

Compute a divided by b and place the quotient in quot

mpm_mod(vec* mod, vec* a, int a_size, vec* b, int b_size)

Set mod equal to a mod b

mpm_fixed_mod_reduction(vec* mod, vec* a, vec* b, vec* u, int b_size)

Set mod equal to a mod b using efficient algorithm

mpm_mul_inv(vec* res, vec* a, vec* b, int size)

Set res equal to the modular multiplicative inverse of b with respect to a

mpm_mul_inv2(vec* res, vec* a, int a_size, vec* b, int b_size)

Set res equal to the modular multiplicative inverse of b with respect to a

mpm_mul_inv3(vec* res, vec* a, int a_size, vec* b, int b_size)

Set res equal to the modular multiplicative inverse of b with respect to a

mpm_gcd(vec* gcd, vec* a, int a_size, vec* b, int b_size)

Set gcd equal to the greatest common divisor of a and b

mpm_mont_mod_mul(vec* res, vec* a, vec* b, vec* m, int size, int p)

Set res equal to the product of a and b modulo m

mpm_mod_exp(vec* res, vec* b, vec* e, int e_size, vec* m, int m_size, int k)

Set res equal to b^e mod m, where k is the size of the sliding window

mpm_mod_exp2(vec* res, vec* b, vec* e, int e_size, vec* m, int m_size, int k, int u)

Set res equal to b^e mod m, where k is the size of the sliding window and u is an optimization value

mpm_mod_exp3(vec* res, vec* b, int b_size, vec* e, int e_size, vec* m, int m_size, int u)

Set res equal to b^e mod m, where the sliding window is 6 bits wide and u is a precomputed value

mpm_mont_mod_exp(vec* res, vec* b, vec* e, int e_size, vec* m, int m_size, int k)

Set res equal to b^e mod m, where k is the size of the sliding window

mpm_mont_mod_exp(vec* res, vec* b, vec* e, int e_size, vec* m, int m_size, int k, vec p, vec* a, vec* u)

Set res equal to b^e mod m, where k is the size of the sliding window and p, a, and u are precomputed values

mpm_mont_mod_exp(vec* res, vec* b, vec* e, int e_size, vec* m, int m_size)

Set res equal to b^e mod m, where the sliding window is at a maximum

The first four functions perform straightforward arithmetic. mpm_div accepts two multiprecision values and returns the quotient (a / b or a div b) and remainder (a % b or a mod b). mpm_div2 returns the quotient without the remainder and mpm_mod returns the remainder without the quotient. mpm_fixed_mod_reduction also returns the remainder, but uses an especially efficient method called Barrett’s algorithm. This algorithm requires that a be twice the quadword size of b, and requires a precomputed value u = 2256*n/b.

The next three functions look similar to regular division, but there’s an important difference. If a is the multiplicative inverse of b, ab = 1. If res is the modular multiplicative inverse of b with respect to a, the equation becomes

(b*res)mod a = 1

Each modular inverse function requires b to be non-zero and less than a. Each returns 1 if it finds an inverse, 0 if it doesn’t.

The three mpm_mult_inv functions accept the same parameters but differ in how they compute the inverse. mpm_mult_inv is efficient for values of approximately equal size, mpm_mult_inv2 is efficient for values of significantly different size, and mpm_mult_inv3 uses a technique that takes a middle ground between mpm_mult_inv and mpm_mult_inv2. Experimentation may be required to determine which function performs best for a given application.

The next function, gcd, returns the greatest common divisor of two values. This is the largest number that divides evenly into both values. Finding this value is a common task in number theory, and gcd(21,28) = 7 because 7 is the largest value that divides into both 21 and 28. If two numbers are coprime, their gcd equals 1. For example, because gcd(54, 25) = 1, 54 and 25 are coprime.

The mpm_mont_mod_mul function performs modular multiplication. It computes the value

(a*b*y)mod m

where

Multiprecision Division Functions

This uses the Montgomery method to calculate the product, and the result must have the same quadword size as m. The last parameter, p equals 2128 − g, where g is given by

Multiprecision Division Functions

For more information on mpm_mont_mod_mul and the other modular functions, see the SDK Example Library API documentation provided with the Cell SDK.

Modular Exponentiation

The last six functions in Table 18.2 deal with modular exponentiation, which computes

res = be mod m

This operation is important in cryptography for a simple reason: It’s easy to compute res given b, e, and m, but nearly impossible to reverse the operation and compute b given res, e, and m. Rapid modular exponentiation is critical in many encryption/decryption schemes, including the RSA cryptographic system.

The first three functions, mpm_mod_exp and its variants, rely on a sliding-window optimization to compute the value of res. The operation of the window goes beyond the scope of this book, but its maximum size is determined by the global parameter MPM_MOD_EXP_MAX_K. mpm_mod_exp and mpm_mod_exp2 accept variable window lengths, and mpm_mod_exp3 sets it to a constant value of 6 bits. mpm_mod_exp2 accepts an additional parameter u, where

Modular Exponentiation

The next three functions, mpm_mont_mod_exp and its variants, perform modular exponentiation using Montgomery modular multiplication and the sliding-window optimization. mpm_mont_mod_exp and mpm_mont_mod_exp2 accept variable lengths for the window, but mpm_mont_mod_exp3 specifies its size as MPM_MOD_EXP_MAX_K. mpm_mont_mod_exp2 accepts additional parameters u, a, and p where

u = 2256*size(m) mod m

a = 2128*size(m) mod m

and p equals 2128 – g, where g is given by

Modular Exponentiation

Now let’s see how these functions are used in practice.

Public Key Cryptography and RSA

Section 14.3, “SPU Security and Isolation,” describes public key cryptography and how SPU isolation uses a public key and private key to encrypt and decrypt SPU executables. This section explains how these keys can be generated using the basic RSA algorithm.

RSA is named after its creators, Ron Rivest, Adi Shamir, and Leonard Adleman. In 1977, they published a paper describing how modular operations on large numbers can be used to encrypt and decrypt information. Since their discovery, RSA has become a popular means of securing online data.

The goal of RSA is to enable a sender (we’ll call her Alice) to transmit an encrypted message to a receiver (Bob) in such a way that an eavesdropper (Eve) can’t decrypt it. Alice can’t send a decryption key because Eve will use it to break the encryption. Instead, Bob sends Alice his public key. This can encrypt messages intended for Bob but it can’t decrypt them. When Alice encrypts her message with Bob’s public key, Bob can authenticate that Eve didn’t interfere. Then he alone can decrypt the message with his private key.

Mathematically, the process of generating the public/private key consists of five steps:

  1. Pick large prime numbers, p and q, and compute their product, called n.

  2. Calculate m = (p − 1)(q − 1).

  3. Find an integer e less than m that has no divisors with m except 1.

  4. Compute d, the modular multiplicative inverse of e with respect to m. That is, find d such that de mod m = 1.

  5. Release e and n as the public key, keep d and n as the private key.

Once the keys are generated, Alice can encrypt her message to Bob by converting it to a number, PT, and applying the following equation

CT = PTe mod n

where e is Bob’s public key. The resulting value, called the cipher text or CT, can be sent safely over the channel. When Bob receives it, he can decrypt it using

PT = CTd mod n

The result is Alice’s original message, called the plain text or PT. Eve can intercept the cipher text, she can’t decrypt it because she doesn’t know the value of d.

The following lines of code are taken from spu_rsa.c in the Chapter18/rsa directory. They show how the RSA process is performed using libmpm functions. The modular multiplicative inverse is computed with mpm_mul_inv2, and modular exponentiation is performed with mpm_mod_exp:

/* Step 1: Compute the product of the primes */
vector unsigned int n[12];
mpm_mul(n, p_vec, 4, q_vec, 8);

/* Step 2: Compute m = (p-1)(q-1) */
vector unsigned int m[12];
mpm_sub2(p_vec, p_vec, 4, &one_vec, 1);
mpm_sub2(q_vec, q_vec, 8, &one_vec, 1);
mpm_mul(m, p_vec, 4, q_vec, 8);

/* Step 3: Find an integer e<m such that gcd(e, m) = 1 */
vector unsigned int e = (vector unsigned int)
   {0, 0, 0, 257};

/* Step 4: Compute d such that de mod m = 1 */
vector unsigned int d[12];
mpm_mul_inv2(d, m, 12, &e, 1);

/* Step 5: Create and encrypt the message */
mpm_mod_exp(cipher_text, plain_text, &e, 1, n, 12, 2);

/* Step 6: Decrypt the message */
mpm_mod_exp(result_text, cipher_text, d, 12, n, 12, 6);

The proper value for the last argument in mpm_mod_exp depends on the exponent. The SDK documentation recommends a sliding window size of 2 for an exponent comprising 4 to 12 bits and a size of 6 for an exponent comprising 1024 to 2048 bits.

The Monte Carlo Library (libmc_rand)

Many systems of interest to scientists and engineers are either too complex or too uncertain to provide a clear relationship between input and output. Statistical simulation can be of great help in investigating these systems, and a common method is this:

  1. Generate a series of random inputs.

  2. Send the random inputs into the simulated system.

  3. Analyze output statistics and draw inferences about the system.

This was the method employed by John von Neumann and other Los Alamos physicists to simulate chemical processes related to the atomic bomb. One physicist, Stanislas Ulam, had a high regard for poker, so this became known as the Monte Carlo method. This method has grown widely in popularity since its early usage, and today its range of applications includes finance, operations research, and computer graphics, particularly ray tracing.

As more random inputs are fed into the system, the outputs will ideally converge to an approximation of the actual system response. Practical simulations of complex systems may require millions of inputs to gain meaningful data. But if the input values aren’t completely random, patterns may skew the results and lead to erroneous conclusions.

Purely digital circuits can’t generate random numbers. Therefore, analysts usually settle for pseudo-random numbers with qualities that resemble randomness. When comparing pseudo-random number generators (PRNGs), a common figure of merit is how many numbers can be generated before the sequence repeats itself. This is called the period of the PRNG. Another important concern is how quickly these numbers can be generated.

The SDK’s Monte Carlo library, libmc_rand, provides functions that generate number sequences with qualities that approach randomness. Other libmc_rand functions change the distribution of existing number sequences. This library is contained in the SDK ISO containing extra packages (see Chapter 2, “The Cell Software Development Kit (SDK)”). If the Extras ISO is in the /tmp/cellsdkiso directory, change to the /opt/cell directory and mount it with the following command:

./cellsdk --iso /tmp/cellsdkiso mount

Then install the Monte Carlo module with this command:

yum install libmc-rand-devel

The libmc_rand functions are only available to run on SPUs. Unlike the libmpm functions, they cannot be inlined within code.

Generating Pseudo- and Quasi-Random Numbers

Mathematicians have two criteria for random numbers: the values must be independent and identically distributed. The first requirement demands that the numbers have no pattern or bias, and the second requires that the values be evenly distributed throughout the range. libmc_rand provides three different ways of generating values that either meet these requirements or come close.

The best way to generate random numbers using the Cell is to use its hardware random number generator (RNG). This draws samples from the physical environment to generate random numbers with no pattern whatsoever. This RNG plays a critical role in the security functions described in Chapter 14, “Advanced SPU Topics: Overlays, Software Caching, and SPU Isolation,” but the Cell simulator and PlayStation 3 can’t access it. mc_rand_hw_init tests whether the hardware RNG is available. If this function returns 0, the hardware RNG is inaccessible.

If the hardware RNG isn’t available, libmc_rand provides three software methods that produce values that resemble random numbers, but aren’t. The first two, Kirkpatrick-Stoll and Mersenne Twister, generate values using modular polynomials, conceptually represented by feedback shift registers (FSRs). These values appear to be random, and long sequences can be generated before the values repeat themselves. But they deviate statistically from randomness, so these values are called pseudo-random.

In many cases, analysts are willing to accept sequences with patterns as long as the values are evenly distributed throughout the range. Numbers generated for this purpose are called quasi-random, and they’re generally used for simulations in multidimensional space. Quasi-random numbers fill space more evenly than pseudo-random numbers. This makes certain types of computations converge faster. Monte Carlo methods that rely on these values are called Quasi-Monte Carlo methods, or QMC.

Feedback Shift Registers (FSRs) and Pseudo-Random Numbers

The Monte Carlo library provides two methods for generating pseudo-random numbers: the Kirkpatrick-Stoll method and the Mersenne Twister. To see how these work, you need to understand feedback shift registers (FSRs).

A shift register is a simple queue of bit-wide memory stages. This is shown on the upper portion of Figure 18.2. The bits shift with each clock signal: The last bit leaves the register, a new bit enters the register, and the rest of the bits shuffle toward the final position.

Shift register and linear-feedback shift register

Figure 18.2. Shift register and linear-feedback shift register

This isn’t interesting. The register’s value (0xA in the figure) is just the concatenation of its bit sequence. But if the bit shifted out is XORed with one or more intermediate bits and the result is fed back into the input, the register’s value becomes interesting and useful. This is the structure of a linear-feedback shift register (LFSR), and the bottom part of Figure 18.2 shows an example LFSR with four stages, numbered from right to left.

LFSRs have been studied for decades, and if the XOR gates are put in the right places, an n-stage LFSR can produce a sequence of 2n − 1 values without repetition. These values have no clear pattern from one to the next, and any subset of the 2n − 1 values are suitable for use in Monte Carlo simulations. After producing 2n−1 values, the sequence repeats itself from start to finish.

For example, suppose the LFSR in Figure 18.2 has a starting value of 1010. As clock signals proceed, the register takes the sequence of values listed in Table 18.3.

Table 18.3. Successive Values of a Four-Bit Linear Feedback Shift Register (LFSR)

Stage 4

Stage 3

Stage 2

Stage 1

Register Value

1

0

1

0

A

0

1

0

1

5

1

0

1

1

B

0

1

1

0

6

1

1

0

0

C

1

0

0

1

9

0

0

1

0

2

0

1

0

0

4

1

0

0

0

8

0

0

0

1

1

0

0

1

1

3

0

1

1

1

7

1

1

1

1

F

1

1

1

0

E

1

1

0

1

D

1

0

1

0

A

When the LFSR cycles through all 24 − 1 stages, it returns to its original value of 0xA. It can never hold an all-zero value because it would remain in that stage indefinitely.

Keeping in mind that XOR gates perform modulo-2 addition, the operation of an LFSR can be expressed as a modulo-2 polynomial. The output from stage n is represented by xn, and the input into the first stage is represented by x0 = 1. Other terms are included in the polynomial if the corresponding stage is connected to XOR gates. The LFSR in Figure 18.2 is expressed by the polynomial x4 + x + 1 because the output of the first stage (x) is connected to the XOR gate.

A generalized feedback shift register (GFSR) consists of identical LFSRs compressed on top of one another. That is, each stage of a GFSR stores n bits rather than one. There are no actual GFSRs in the Cell device; they’re implemented with loops that repeatedly bit-shift and XOR integer values. GFSRs can generate pseudo-random numbers at high speed, but require an initial value (the seed) and take time to initialize.

The simplest and fastest PRNG in libmc_rand is the Kirkpatrick-Stoll method (KS). This generates pseudo-random numbers using a 250-stage GFSR. The KS GFSR structure is simple and produces values quickly, but statistical analysis has shown that these values deviate from randomness.

Other libmc_rand functions generate pseudo-random numbers using the Mersenne Twister (MT) method. These values are statistically more random than those of the Kirkpatrick-Stoll, but take longer to generate. This method takes its name from its period (219937 − 1, 19,937 is a Mersenne prime), and its GFSR structure, called a twisted generalized feedback shift register, or TGFSR. This is one of the most popular PRNGs available.

Table 18.4 lists the functions that access the hardware RNG, the Kirkpatrick-Stoll PRNG, and the Mersenne Twister. The XX in certain function names can be replaced with HW for the hardware generator, KS for Kirkpatrick-Stoll, or MT for Mersenne Twister. Parameter types are abbreviated: ui stands for unsigned int, vui stands for vector unsigned int, vf stands for vector float, and vd stands for vector double.

Table 18.4. Monte Carlo Functions Related to the Hardware (HW), Kirkpatrick-Stoll (KS), and Mersenne Twister (MT) Number Generators

Function Name

Purpose

mc_rand_hw_init()

Initialize the hardware generator/check if available

mc_rand_ks_init(int seed)

Initialize the Kirkpatrick-Stoll generator with seed value

mc_rand_mt_init(int seed)

Initialize the Mersenne Twister generator with seed value

mc_rand_XX_u4()

Return integers in a vector unsigned int

mc_rand_XX_0_to_1_f4()

Return floats from 0 to 1 in a vector float

mc_rand_XX_0_to_1_d2()

Return doubles from 0 to 1 in a vector double

mc_rand_XX_minus1_to_1_f4()

Return floats from −1 to 1 in a vector float

mc_rand_XX_minus1_to_1_d2()

Return doubles from −1 to 1 in a vector double

mc_rand_XX_array_u4(ui count, vui* array)

Return integers in an array of count vectors

mc_rand_XX_0_to_1_array_f4(ui count, vf* array)

Return floats from 0 to 1 in an array of count vectors

mc_rand_XX_0_to_1_array_d2(ui count, vd* array)

Return doubles from 0 to 1 in an array of count vectors

mc_rand_XX_minus1_to_1_array_f4(ui count, vf* array)

Return floats from −1 to 1 in an array of count vectors

mc_rand_XX_minus1_to_1_array_d2(ui count, vd* array)

Return doubles from −1 to 1 in an array of count vectors

There are no data structures representing the number generators; you just call a function to initialize a generator and then call functions to request numbers. The first three functions initialize the hardware number generator, Kirkpatrick-Stoll generator, and Mersenne Twister generator, respectively. The Kirkpatrick-Stoll and Mersenne Twister functions require seed values, but the hardware generator doesn’t.

Each of the next five functions returns a single vector as output. None of them have any arguments, and the only difference between them is the datatype and range of the output values. Here are some examples:

  • mc_rand_hw_u4() returns a hardware-generated vector unsigned int.

  • mc_rand_ks_0_to_1_f4() returns a vector of four floats between 0 and 1 generated by the Kirkpatrick-Stoll method.

  • mc_rand_mt_minus1_to_1_d2() returns a vector of two doubles between −1 and 1 generated by the Mersenne Twister.

The next five functions are similar to the preceding five, but produce values in vector arrays rather than individual vectors. Each function takes two parameters: an integer identifying how many vectors are in the array, and a pointer to the vector array itself. Extending the earlier examples:

  • mc_rand_hw_array_u4(2, int_vec) places two vectors of hardware-generated integers in the int_vec array.

  • mc_rand_ks_0_to_1_array_f4(3, float_vec) places three vectors of floats between 0 and 1 generated by the Kirkpatrick-Stoll method in the float_vec array.

  • mc_rand_mt_minus1_to_1_array_d2(4, double_vec) places a vector of two doubles between −1 and 1 generated by the Mersenne Twister in the double_vec array.

By calling these functions, you can generate large numbers of random values quickly and without a for loop.

The code in Listing 18.2 shows how these functions are used in practice. It generates values from the hardware generator if available. Then it calls on the Kirkpatrick-Stoll method and Mersenne Twister to generate values. The last two PRNGs are initialized with a value returned by gettimeofday().

Example 18.2. Generating Random and Pseudo-Random Numbers: spu_random1.c

#include <stdio.h>
#include <stdlib.h>
#include <spu_intrinsics.h>
#include <sys/time.h>
#include <mc_rand.h>

int main(int argc, char **argv) {

   int i, j;
   vector unsigned int int_vec;
   vector float float_vec;
   vector double double_vec[4];

   /* Get the current time to use as seed */
   struct timeval time;
   gettimeofday(&time, NULL);

   /* The hardware number generator */
   printf("
Hardware Generator:
");
   if(mc_rand_hw_init() == 0) {
      int_vec = mc_rand_hw_u4();
      for(i=0; i<4; i++)
         printf("%u ",spu_extract(int_vec, i));
      printf("

");
   }
   else
      printf("Hardware generator not available.

");

   /* The Kirkpatrick Stoll PRNG */
   mc_rand_ks_init(time.tv_sec);
   float_vec = mc_rand_ks_0_to_1_f4();
   printf("Kirkpatrick-Stoll:
");
   for(i=0; i<4; i++)
      printf("%f ", spu_extract(float_vec, i));
   printf("

");

   /* The Mersenne Twister PRNG */
   mc_rand_mt_init(time.tv_sec);
   mc_rand_mt_minus1_to_1_array_d2(4, double_vec);
   printf("Mersenne Twister:
");
   for(i=0; i<4; i++) {
      for(j=0; j<2; j++)
         printf("%g ",spu_extract(double_vec[i], j));
      printf("
");
   }

   return 0;
}

There are two points to keep in mind when using these functions. First, be sure to check whether the hardware generator is available before trying to generate numbers with it. Second, array generation functions like mc_rand_ks_0_to_1_array_f4 always return void, whereas the vector generation functions like mc_rand_hw_u4 always return vectors.

Quasi-Random Numbers and the Sobol Generator

In addition to pseudo-random numbers, libmc_rand functions generate quasi-random numbers using the Sobol method. Each value in a Sobol sequence represents a point in a k-dimensional cube, where each of the cube’s dimensions extends from 0 and 1. As more bits are used in each point, more points can be packed inside the cube.

Initializing a Sobol generator is more complex than initializing any of the other generators. This is because Sobol generators require a table of direction numbers. A direction number is a special k-dimensional point whose dimensions lie between 0 and 1. The theory behind generating direction numbers requires a great deal of math, so this presentation only explains the method.

To compute n direction numbers in a k-dimensional cube, perform the following steps:

  1. For each of the k dimensions, find the an LFSR polynomial that produces 2n − 1 nonrepeating values. Form a binary sequence, ai, by taking the coefficients of the polynomial terms less than xn.

  2. Choose n odd integers (mi) for i between 1 and n, with each less than 2i. Then compute successive values of m with

    mi = 2a1mi−1 ⊕22a2mi−2 ⊕23 a3mi−3... ⊕2n anmi−nmi−n

    where ⊕ represents the bitwise binary XOR operation.

  3. As each mi is computed, calculate the corresponding direction number vi = mi/2i. Obtain n direction numbers for each of the k dimensions.

Each fractional direction number, vi, must be converted to an unsigned integer. This is done by setting bits 2 through 2+i of a 32-bit value equal to mi. To start, m1 must be odd and less than 21 = 2, so m1 always equals 1. Figure 18.3 shows how to compute direction numbers where m1−m4 equal [1, 3, 3, 7].

Expressing fractional direction numbers as integers

Figure 18.3. Expressing fractional direction numbers as integers

Successive values of m are calculated with the equation

mi = 2a1mi−1 ⊕22a2mi−2 ⊕23 a3mi−3... ⊕2n anmi−nmi−n

where ai are the lower terms of a polynomial corresponding to an n-stage LFSR. For example, the polynomial corresponding to the LFSR in Figure 18.3 is x4 + x + 1, so ai = {0, 0, 1, 1}. If mi = {1, 3, 3, 7}, m5 is computed by

Expressing fractional direction numbers as integers

Once the direction numbers are determined for each of the k dimensions, the Sobol generator will produce quasi-random numbers that fill the k-dimensional space.

Table 18.5 lists the libmc_rand functions associated with the Sobol generator. The names of the functions are similar to those in Table 18.4, but the parameters differ significantly.

Table 18.5. Monte Carlo Functions for Generating Sobol Sequences

Function Name

Purpose

mc_rand_sb_init(sobol_cntrlblck_t* block, uint max, uint dim, vector uchar* mem, uint memsize)

Initialize the Sobol generator with direction numbers for the given dimensions and memory usage

mc_rand_sb_seed()

Seed the Sobol generator

mc_rand_sb_u4()

Return numbers in a vector unsigned int

mc_rand_sb_0_to_1_f4()

Return numbers from 0 to 1 in a vector float

mc_rand_sb_0_to_1_d2()

Return numbers from 0 to 1 in a vector double

mc_rand_sb_minus1_to_1_f4()

Return numbers from −1 to 1 in a vector float

mc_rand_sb_minus1_to_1_d2()

Return numbers from −1 to 1 in a vector double

mc_rand_sb_array_u4()

Return numbers in a vector unsigned int

mc_rand_sb_0_to_1_f4()

Return numbers from 0 to 1 in a vector float

mc_rand_sb_0_to_1_d2()

Return numbers from 0 to 1 in a vector double

mc_rand_sb_minus1_to_1_f4()

Return numbers from −1 to 1 in a vector float

mc_rand_sb_minus1_to_1_d2

Return numbers from −1 to 1 in a vector double

Direction numbers are contained in a sobol_cntrlblck_t structure, declared in mc_rand_sb.h:

typedef struct {
   vector unsigned int* pu32_vecDirection;
   unsigned int u32sizeofTable;
   unsigned int u32TableDimension;
   unsigned int u32TableBitCount;
   unsigned int u32MaxBitCount;
   unsigned int u32Seed;
} sobol_cntrlblck_t;

The first field, pu32_vecDirection, points to the table of direction numbers, and the second identifies how many dimensions each generated point should have. u32TableDirection specifies much memory the table takes up. The next two fields tell the generator how many bits should be used for each quasi-random number, and the last field, u32Seed, provides an initial value for the generator.

The Sobol initialization function, mc_rand_sb_init, accepts the sobol_cntrlblck_t as its first parameter. The rest of the parameters are as follows:

  • max: The maximum number of values that need to be generated at any one time.

  • dim: How many dimensions each point should have. Should have the same value as the u32TableDimension field of sobol_cntrlblck_t.

  • mem:Memory location to store lookup table data and intermediate computation.

  • memsize: Size of lookup table storage area, mem.

The Monte Carlo library documentation strongly recommends defining SOBOL_RUNS and SOBOL_DIMENSION in code. The first value identifies the maximum number of elements that will be needed in an array (112 by default), and the second identifies the desired dimensionality of the generated values (40 by default).

The SDK provides an excellent example that demonstrates how Sobol sequences are generated and used in code. The /opt/cell/sdk/prototype/src/examples/monte_carlo directory contains a project folder called pi. The pi application approximates the value of π by generating Sobol numbers. The direction numbers are taken from a header file called sobol_init_30_40.h, which provides Sobol numbers for 30 bits and 40 dimensions.

Transforming the Distribution of Number Sequences

The number sequences produced thus far have a uniform probability distribution, which means that any possible value is as likely to occur as the next. For example, a sequence of unbiased dice rolls will have approximately equal numbers of ones, twos, threes, fours, fives, and sixes. When pulling a card out of a fair deck, it’s equally likely that the card’s suit is spades, clubs, hearts, or diamonds.

But many random processes don’t have uniform distributions. If a classroom of students take an exam, it’s much more likely that an exam grade will be close to the average than to the highest or lowest grades. If you sample heights of random people, the sequence will contain many values close to the average height but only a few who are extremely short or extremely tall. This average-oriented distribution is called the normal probability distribution, and its graph is commonly referred to as the bell curve. Figure 18.4 shows the graph of the uniform and normal probability distribution.

Normal and uniform probability distributions

Figure 18.4. Normal and uniform probability distributions

If a simulation requires normally distributed values, the output of the Mersenne Twister and the other pseudo-random generators won’t suffice. You need a way to transform the distribution of the values from uniform to normal. libmc_rand provides three methods for this purpose: Box-Muller, Moro inversion, and the Polar method.

The Monte Carlo Transformation Methods

The mathematical derivations of the Box-Muller, Moro inversion, and Polar methods lie beyond the scope of this book, but to decide which one to use, it’s important to see how they work. The Box-Muller method takes two numbers from a uniform distribution (x and y) and produces two numbers from a normal distribution (a and b). The relationship between a, b, x, and y is given by the following equations:

The Monte Carlo Transformation Methods

The Box-Muller method can only transform an even number of input values.

The Polar method is similar to the Box-Muller method, but removes the need to compute sine and cosine values. The Polar equations are given by

The Monte Carlo Transformation Methods

and

The Monte Carlo Transformation Methods

where

q =(2x−1)2 +(2y−1)2

The problem with the Polar method is that q must be less than 1. Any input values that would result in an invalid value must be discarded. Therefore, the number of output values will usually be less than the number of input values. Despite the discarding, the Polar method is as effective as the Box-Muller and generally computes values faster.

The last method is the Moro inversion, which is entirely different from the Box-Muller and Polar methods. It returns one normally distributed value for each uniformly distributed value, and does so by computing the inverse of the cumulative distribution function, or cdf. The cdf is the probability that a random variable is less than or equal to a specific value. You can find more information about Moro’s inversion and the other two methods in The Monte Carlo Programmer’s Guide and API Reference.

The Monte Carlo Transformation Functions

Table 18.6 lists all the libmc_rand functions that transform the distribution of numeric sequences. In each case, the input floats must lie between 0 and 1, but may not equal 0 or 1.

Table 18.6. Monte Carlo Functions for Transforming Uniform Sequences into Normal Sequences

Function Name

Purpose

mc_transform_bm2_array_f4(ui count, vf* in_array, vf* out_array)

Transforms count (must be even) vector floats of in_array into the normally distributed out_array using the Box-Muller method

mc_transform_bm2_array_d2(ui count, vd* in_array, vf* out_array)

Transforms count (must be even) vector doubles of in_array into the normally distributed out_array using the Box-Muller method

mc_transform_mi_f4(vf vec)

Transforms vec using Moro inversion and returns the transformed vector float

mc_transform_mi_d2(vd vec)

Transforms vec using Moro inversion and returns the transformed vector double

mc_transform_mi_array_f4(ui count, vf* in_array, vf* out_array)

Transforms count vector floats of in_array into the normally distributed out_array using Moro inversion

mc_transform_mi_array_d2(ui count, vd* in_array, vd* out_array)

Transforms count vector doubles of in_array into the normally distributed out_array using Moro inversion

mc_transform_po_f4(vf (*func)(void))

Transforms the vector float returned by func using the Polar method

mc_transform_po_d2(vd (*func)(void))

Transforms the vector double returned by func using the Polar method

mc_transform_po_array_f4(ui count, vf* in_array, vf* out_array, vf (*func)(void))

Transforms count vector floats of in_array into the normally distributed out_array using the Polar method

mc_transform_po_array_d2(ui count, vd* in_array, vd* out_array, vd (*func)(void))

Transforms count vector doubles of in_array into the normally distributed out_array using the Polar method

The Box-Muller functions are identified with bm2 and the Moro inversion functions are identified with mi. Each Box-Muller function operates on an array of uniformly distributed vectors and produces an output array of normally distributed values. The first two Moro inversion functions operate on arrays, and the second two receive individual vectors. The second pair of Moro inversion functions, mc_transform_mi_array_f4 and mc_transform_mi_array_d2, return their output values in a vector.

The last four functions in the table transform number distribution with the Polar method. Their usage is markedly different from the Box-Muller and Moro inversion functions. Each accepts a pointer to a function that generates pseudo-random numbers, and this is necessary because the Polar method discards unsuitable input values. To produce N output values, the Polar functions must call the generation function until all N outputs are produced.

The functions that use the Polar method can be confusing, so the code in Listing 18.3 shows how they’re used in practice. First, the spu_random2 application uses the Mersenne Twister to generate four vector floats containing uniformly distribute values. Then it transforms the uniformly distributed sequence into a normally distributed sequence using the Polar method.

Example 18.3. Transforming Random Numbers: spu_random2.c

#include <stdio.h>
#include <stdlib.h>
#include <spu_intrinsics.h>
#include <sys/time.h>
#include <mc_rand.h>
#include <simdmath.h>

int main() {

   int i, j;

   /* The input and output arrays */
   vector float uniform_vec[4];
   vector float normal_vec[4];

   /* Generate a seed */
   struct timeval time;
   gettimeofday(&time, NULL);

   /* Generate the random numbers */
   mc_rand_mt_init(time.tv_sec);
   mc_rand_mt_0_to_1_array_f4(4, uniform_vec);

   /* Transform the array */
   mc_transform_po_array_f4(4, uniform_vec, normal_vec,
      &mc_rand_mt_0_to_1_f4);

   /* Display the results */
   printf("Uniform Distribution: 
");
   for(i=0; i<4; i++)
      for(j=0; j<4; j++)
         printf("%f ", spu_extract(uniform_vec[i], j));

   printf("

Normal Distribution: 
");
   for(i=0; i<4; i++)
      for(j=0; j<4; j++)
         printf("%f ", spu_extract(normal_vec[i], j));
   printf("
");
   return 0;
}

The call to mc_transform_po_array_f4 might seem odd at first. The last parameter is a pointer to the function mc_rand_mt_0_to_1_f4, which uses the Mersenne Twister to produce a vector float of uniformly distributed values. If the Polar method can call the generation function, why does it need an input vector of floats? The answer is that the Polar method may need to discard input values, and the Mersenne Twister provides a backup mechanism to make sure that count values are always returned.

Conclusion

This chapter has covered a lot of ground in a small number of pages, and it’s a shame that neither of the topics have received the in-depth treatment they deserve. Multiprecision mathematics and Monte Carlo methods are not only important facets of the SDK, they’re also crucial in cryptography, simulation, and financial modeling. Research in these areas progresses at a rapid pace, so don’t be surprised if these libraries expand beyond their current capabilities.

The functions in the Multiprecision Math library, or libmpm, operate on values whose bit length far exceed any of the normally available datatypes. Many of the functions perform simple arithmetic, but others implement such complex algorithms as modular reduction and exponentiation. These routines are vital in cryptography, and although this chapter presented a basic RSA application, a full appreciation of the libmpm functions requires specialized knowledge in the field of computational mathematics.

The routines in the Monte Carlo library, libmc_rand, make it possible to generate and transform numbers whose properties approach randomness. Some functions generate pseudo-random numbers, whereas others generate quasi-random numbers. Pseudo-random numbers, generated by the Kirkpatrick-Stoll method and the Mersenne Twister, come close to the ideals of independence and identical distribution. Quasi-random numbers aren’t as independent as pseudo-random numbers, but are distributed in such a way as to occupy as much of the range as possible. They’re commonly used in simulations where space filling is a priority.

The transformation functions in libmc_rand accept uniformly distributed number sequences and produce values with a normal probability distribution. The Polar method functions execute faster than the Box-Muller and Moro inversion functions, but they’re more complex to use in code. This is because the Polar method requires that certain input values be discarded before processing.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.149.27.100