Estimating π with Monte Carlo

First, we will apply our new knowledge of cuRAND to perform an estimate of the well-known mathematical constant π, or Pi, which is, of course, the never-ending irrational number 3.14159265358979...

To get an estimate, though, we need to take a moment to think about what this means. Let's think about a circle. Remember that the radius of a circle is the length from the center of the circle to any point on the circle; usually, this is designated with R. The diameter is defined as D = 2R, and the circumference C is the length around the circle. Pi is then defined as π = C / D. We can use Euclidean geometry to find a formula for the area of the circle, which turns out to be A = πR². Now, let's think about a circle with radius R inscribed in a square with all sides of length 2R.

So, of course, we know that the area of the square is (2R)² = 4R². Let's consider R = 1, so that we know the area of the circle is exactly π, while the area of the square is exactly 4. Let's make a further assumption and state that both the circle and square are centered at (0,0) in the Cartesian plane. Now, let's pick a completely random point within the square, (x, y), and see if it falls within the circle. How can we do this? By applying the Pythagorean theorem: we check whether x² + y² is less than or equal to 1. Let's designate the total number of random points we choose with iters, and the number of hits with hits.

Let's do a little bit more thinking about this: the probability of picking a point within the circle should be proportional to the area of the circle divided by the area of the square; here, this is π / 4. However, notice that if we choose a very large number of random points, we will get the following approximation:

hits / iters ≈ π / 4, and therefore π ≈ 4 × hits / iters

This is exactly how we will estimate π! The number of iterations we will have to do will be very high before we can come up with a decent estimate of Pi, but notice how nicely parallelizable this is: we can check the "hits" in different threads, splitting the total number of iterations among different threads. At the end of the day, we can just sum up the total number of hits among all of the threads to get our estimate.
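To build some intuition before we write the GPU version, here is a minimal CPU-side sketch of the same idea in plain NumPy; the sample count of 2**20 is just an illustrative choice:

import numpy as np

# Minimal CPU-side sketch of the Monte Carlo estimate of Pi.
iters = 2**20                          # illustrative sample count
x = np.random.uniform(0, 1, iters)     # random x coordinates in the unit square
y = np.random.uniform(0, 1, iters)     # random y coordinates in the unit square
hits = np.sum(x*x + y*y <= 1.0)        # points falling inside the quarter circle
print("CPU estimate of Pi: %s" % (4.0 * hits / iters))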

We can now begin to write a program to make our Monte Carlo estimate. Let's first import the usual Python modules that we will need for a PyCUDA program, with one addition from SymPy:

SymPy is used for exact symbolic computations in Python; when we end up with very large integers, its Rational function lets us make a much more accurate floating-point estimate of their quotient.
import pycuda.autoinit
import pycuda.driver as drv
from pycuda import gpuarray
from pycuda.compiler import SourceModule
import numpy as np
from sympy import Rational
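As a quick illustration of why Rational is useful here, consider dividing two large integers (these particular values are made up purely for the example):

from sympy import Rational

numerator = 2**53 + 1     # made-up large integers, purely for illustration
denominator = 2**53
# Converting to floats first silently drops the +1, so the quotient comes out as exactly 1.0...
print(float(numerator) / float(denominator))
# ...while Rational keeps the quotient exact until we evaluate it to the requested precision.
print(Rational(numerator, denominator).evalf(30))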

Now, we have to do something a little different than usual when we build our kernel: we need to set the option no_extern_c=True in SourceModule. This stops PyCUDA from wrapping the entire source in an extern "C" block, which is necessary because the cuRAND headers contain C++ code that must be compiled as such. We then begin writing our kernel and include the appropriate header:

ker = SourceModule(no_extern_c=True, source='''
#include <curand_kernel.h>

Now, let's include a macro for the squared Pythagorean distance. Since we are just checking whether this value is less than or equal to 1, we can omit the square root. We will be using a lot of unsigned 64-bit integers, so let's make another macro to save us from typing unsigned long long over and over:

#define _PYTHAG(a,b) (a*a + b*b)
#define ULL unsigned long long

We can now set up our kernel. Because of how PyCUDA interfaces with our compiled code, the kernel will have to be exposed as a bona fide C function rather than as a C++ function. We do this with an extern "C" block:

extern "C" {

We can now define our kernel. We will have two parameters: one for iters, which is the total number of iterations for each thread, and another for an array that will hold the total number of hits for each thread. We will need a curandState object for this:

__global__ void estimate_pi(ULL iters, ULL * hits)
{
curandState cr_state;

Let's hold the global thread ID in an integer called tid:

int tid = blockIdx.x * blockDim.x + threadIdx.x;

clock() is a device function that returns the value of a counter that is incremented every clock cycle; adding tid to its output gives each thread a unique seed. We don't need to use different subsequences or offsets, so let's set them both to 0. We will also carefully typecast everything here to 64-bit unsigned integers:

curand_init( (ULL) clock() + (ULL) tid, (ULL) 0, (ULL) 0, &cr_state);

Let's set up the x and y values that will hold a random point in the unit square:

float x, y;

We will then iterate iters times to see how many hits in the circle we get, generating random values with curand_uniform(&cr_state). Notice that we can generate these over 0 to 1 rather than from -1 to 1: by symmetry, sampling only the upper-right quadrant of the circle gives the same hit ratio, and the squaring in the _PYTHAG macro would discard any negative sign anyway:

for(ULL i=0; i < iters; i++)
{
x = curand_uniform(&cr_state);
y = curand_uniform(&cr_state);

if(_PYTHAG(x,y) <= 1.0f)
hits[tid]++;
}

Since each thread only ever increments its own entry of hits, there is no need for atomic operations here. We can now return from the kernel and close it off, along with the extern "C" block, with two final } brackets:

return;
}
}
''')

Now, let's get the Python wrapper function to our kernel with get_function. We will also set up the block and grid sizes: 32 threads per block, and 512 blocks per grid. Let's calculate the total number of threads and set up an array on the GPU to hold all of the hits (initialized to 0s, of course):

pi_ker = ker.get_function("estimate_pi")
threads_per_block = 32
blocks_per_grid = 512
total_threads = threads_per_block * blocks_per_grid
hits_d = gpuarray.zeros((total_threads,),dtype=np.uint64)

Let's set the total number of iterations per thread to 2²⁴:

iters = 2**24

We can now launch the kernel as usual:

pi_ker(np.uint64(iters), hits_d, grid=(blocks_per_grid,1,1), block=(threads_per_block,1,1))

Now, let's sum over the number of hits in the array, which gives us the total number of hits. Let's also calculate the total number of iterations among all of the threads in the array:

total_hits = np.sum( hits_d.get() )
total = np.uint64(total_threads) * np.uint64(iters)

We can now make our estimate with Rational, like so:

est_pi_symbolic = Rational(4) * Rational(int(total_hits), int(total))

We can now convert this into a floating point value:

est_pi = float(est_pi_symbolic.evalf())

Let's check our estimate against NumPy's constant value, numpy.pi:

print "Our Monte Carlo estimate of Pi is : %s" % est_pi
print "NumPy's Pi constant is: %s " % np.pi
print "Our estimate passes NumPy's 'allclose' : %s" % np.allclose(est_pi, np.pi)

We are now done. Let's run this from IPython and check the output (this program is also available as the monte_carlo_pi.py file under Chapter08 in this book's repository).
