A simple 1D FFT

Let's start by looking at how we can use cuFFT to compute a simple 1D FFT. First, we will briefly discuss the cuFFT interface in Scikit-CUDA.

There are two submodules here with which we can access the cuFFT library: cufft and fft. cufft consists of a collection of low-level wrappers for the cuFFT library, while fft provides a more user-friendly interface; we will be working solely with fft in this chapter.

Let's start with the appropriate imports, remembering to include the Scikit-CUDA fft submodule:

import pycuda.autoinit
from pycuda import gpuarray
import numpy as np
from skcuda import fft

We will now set up a random array and copy it to the GPU. We will also set up an empty GPU array that will be used to store the FFT (notice that we are using a real float32 array as an input, but the output will be a complex64 array, since the Fourier transform of a real signal is, in general, complex-valued):

x = np.asarray(np.random.rand(1000), dtype=np.float32)
x_gpu = gpuarray.to_gpu(x)
x_hat = gpuarray.empty_like(x_gpu, dtype=np.complex64)

We will now set up a cuFFT plan for the forward FFT. This is an object that cuFFT uses to determine the shape of the transform, as well as its input and output data types:

plan = fft.Plan(x_gpu.shape, np.float32, np.complex64)

We will also set up a plan object for the inverse FFT. Notice that this time we go from complex64 to real float32:

inverse_plan = fft.Plan(x.shape, in_dtype=np.complex64, out_dtype=np.float32)

Now we take the forward FFT from x_gpu into x_hat, and the inverse FFT from x_hat back into x_gpu. Notice that we set scale=True in the inverse FFT; we do this so that the result of the inverse FFT is scaled by 1/N, since cuFFT transforms are unnormalized:

fft.fft(x_gpu, x_hat, plan)
fft.ifft(x_hat, x_gpu, inverse_plan, scale=True)
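To see why the 1/N factor matters, here is a minimal CPU-only sketch using NumPy rather than cuFFT. The helper unscaled_inverse_dft is a hypothetical name introduced only for illustration; it mimics an inverse transform that omits the 1/N factor, which is the situation scale=True is meant to correct. Float64 is used here purely to keep rounding error out of the picture.

```python
import numpy as np

def unscaled_inverse_dft(X):
    """Inverse DFT without the 1/N factor (hypothetical helper).

    Uses the identity conj(fft(conj(X))) == N * ifft(X).
    """
    return np.conj(np.fft.fft(np.conj(X)))

rng = np.random.default_rng(0)
x = rng.random(1000)           # real input signal
N = x.size

X = np.fft.fft(x)              # forward transform
raw = unscaled_inverse_dft(X)  # round trip without scaling: N times too large
scaled = raw / N               # round trip with the 1/N scaling applied

print('unscaled round trip equals N * x: %s' % np.allclose(raw.real, N * x))
print('scaled round trip equals x: %s' % np.allclose(scaled.real, x))
```

Without the scaling step, a forward transform followed by an inverse transform returns the input multiplied by N instead of the input itself.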

We will now check x_hat against a NumPy FFT of x, and x_gpu against x itself:

y = np.fft.fft(x)
print('cuFFT matches NumPy FFT: %s' % np.allclose(x_hat.get(), y, atol=1e-6))
print('cuFFT inverse matches original: %s' % np.allclose(x_gpu.get(), x, atol=1e-6))

If you run this, you will see that x_hat does not match y, yet x_gpu matches x. How is this possible? Remember that x is real; from the definition of the Discrete Fourier Transform, you can prove that the outputs for a real input vector repeat, in reverse order, as complex conjugates after N/2. While the NumPy FFT computes these redundant values anyway, cuFFT saves time by computing only the first half of the outputs (N/2 + 1 values, to be exact) when it sees that the input is real. The remaining entries of x_hat are not filled in with the conjugate values, and you will typically find 0 there. You should verify that this is the case by checking the preceding variables.
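The conjugate symmetry can be checked numerically on the CPU. The following is a small sketch using only NumPy (not cuFFT); it compares the full transform from np.fft.fft with the half-length transform from np.fft.rfft, which returns only the N//2 + 1 non-redundant outputs of a real input.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1000
x = rng.random(N)                # real input signal

y = np.fft.fft(x)                # all N outputs
y_half = np.fft.rfft(x)          # only the N//2 + 1 non-redundant outputs

# For real x, y[N - k] equals the complex conjugate of y[k], k = 1 .. N-1.
k = np.arange(1, N)
symmetric = np.allclose(y[N - k], np.conj(y[k]))

# The half-length transform agrees with the start of the full transform.
half_matches = np.allclose(y_half, y[:N // 2 + 1])

print('outputs are conjugate-symmetric: %s' % symmetric)
print('rfft matches first N//2 + 1 outputs: %s' % half_matches)
print('number of non-redundant outputs: %d' % y_half.size)
```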

Thus, if we change the first print statement in the preceding code to compare only the first N/2 outputs between cuFFT and NumPy, where N is the length of x, then it will report True:

N = x.size
print('cuFFT matches NumPy FFT: %s' % np.allclose(x_hat.get()[0:N//2], y[0:N//2], atol=1e-6))
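If all N outputs are needed after all, the missing half can be rebuilt on the host from the conjugate symmetry. The function below is a hypothetical helper, not part of Scikit-CUDA or NumPy, and it assumes that the first n//2 + 1 entries of its input hold valid outputs. It is demonstrated here with NumPy data only.

```python
import numpy as np

def full_spectrum_from_half(half, n):
    """Rebuild all n DFT outputs of a real signal (hypothetical helper).

    Assumes half[0 : n//2 + 1] holds the non-redundant outputs; any
    entries beyond that are ignored.
    """
    half = np.asarray(half)
    m = n // 2 + 1
    full = np.empty(n, dtype=half.dtype)
    full[:m] = half[:m]
    # Remaining outputs follow from y[k] = conj(y[n - k]) for k = m .. n-1.
    k = np.arange(m, n)
    full[k] = np.conj(half[n - k])
    return full

rng = np.random.default_rng(0)
x = rng.random(1000)
y = np.fft.fft(x)

# Simulate a half-filled output buffer: valid first half, zeros elsewhere.
half_filled = np.zeros_like(y)
half_filled[:x.size // 2 + 1] = y[:x.size // 2 + 1]

rebuilt = full_spectrum_from_half(half_filled, x.size)
print('rebuilt spectrum matches NumPy FFT: %s' % np.allclose(rebuilt, y))
```

Under the same assumption, the helper could be applied to the host copy returned by x_hat.get(), although that usage is untested here.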