Using cuFFT for 2D convolution 

We will now write a small program that performs Gaussian filtering on an image using cuFFT-based two-dimensional convolution. Gaussian filtering smooths a rough image with a Gaussian filter, which takes its name from the Gaussian (normal) distribution in statistics on which it is based. The Gaussian filter is defined over two dimensions with a standard deviation of σ as follows:
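
Written out to match the gaussian_filter lambda defined later in this section, the filter can be stated as follows. The normalizing constant is an assumption taken from that code; many references use 1/(2πσ²) for the two-dimensional case, and the choice does not affect the final result because the discrete kernel is rescaled to sum to 1.

```latex
G(x, y) = \frac{1}{\sqrt{2 \pi \sigma^2}} \exp\!\left( -\frac{x^2 + y^2}{2 \sigma^2} \right)
```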

When we convolve a discrete image with a filter, we sometimes refer to the filter as a convolution kernel. Oftentimes, image processing engineers will just call this a plain kernel, but since we don't want to confuse these with CUDA kernels, we will always use the full term, convolution kernel. We will be using a discrete version of the Gaussian filter as our convolution kernel here.

Let's start with the appropriate imports. Notice that we will use the Scikit-CUDA submodule linalg here, which provides a higher-level interface than cuBLAS. Since we're working with images, we will also import Matplotlib's pyplot submodule. Also note that the first line enables Python 3-style division: dividing two integers with the / operator returns a float without typecasting, while integer division is performed with the // operator:

from __future__ import division
import pycuda.autoinit
from pycuda import gpuarray
import numpy as np
from skcuda import fft
from skcuda import linalg
from matplotlib import pyplot as plt
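
The division behavior described above can be checked directly on Python 3 (or on Python 2 with the __future__ import). One related detail, offered as a side note rather than something the text states: unary minus binds more tightly than //, so -n // 2 floors toward negative infinity and differs from -(n // 2) for odd n. The variable names below are made up for this illustration.

```python
# True division always returns a float, even for two integers.
true_quotient = 7 / 2
# Floor division keeps integers as integers.
floor_quotient = 7 // 2

# Unary minus is applied before //, so these two differ for odd n.
n = 31
negated_then_floored = -n // 2     # floor(-15.5)
floored_then_negated = -(n // 2)   # -(15)

print(true_quotient, floor_quotient)                 # 3.5 3
print(negated_then_floored, floored_then_negated)    # -16 -15
```

This matters whenever a negative shift is computed from an odd array size.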

Let's jump right in and start writing the convolution function. This will take in two NumPy arrays of the same size, x and y. We will typecast these to complex64 arrays, and then return -1 if they are not of the same size:

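def cufft_conv(x, y):
    # the FFT plans below are set up for single-precision complex data
    x = x.astype(np.complex64)
    y = y.astype(np.complex64)

    # both inputs must already have the same shape
    if x.shape != y.shape:
        return -1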

We will now set up our FFT plan and inverse FFT plan objects:

    plan = fft.Plan(x.shape, np.complex64, np.complex64)
    inverse_plan = fft.Plan(x.shape, np.complex64, np.complex64)

Now we can copy our arrays to the GPU. We will also set up some empty arrays of the appropriate sizes to hold the FFTs of these arrays, plus one additional array that will hold the output of the final convolution, out_gpu:

    x_gpu = gpuarray.to_gpu(x)
    y_gpu = gpuarray.to_gpu(y)

    x_fft = gpuarray.empty_like(x_gpu, dtype=np.complex64)
    y_fft = gpuarray.empty_like(y_gpu, dtype=np.complex64)
    out_gpu = gpuarray.empty_like(x_gpu, dtype=np.complex64)

We can now perform our FFTs:

    fft.fft(x_gpu, x_fft, plan)
    fft.fft(y_gpu, y_fft, plan)

We will now perform pointwise (Hadamard) multiplication between x_fft and y_fft with the linalg.multiply function. We will set overwrite=True so as to write the final value into y_fft:

    linalg.multiply(x_fft, y_fft, overwrite=True)

Now we will call the inverse FFT, outputting the final result into out_gpu. We transfer this value to the host and return it:

    fft.ifft(y_fft, out_gpu, inverse_plan, scale=True)
    conv_out = out_gpu.get()
    return conv_out
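
The function above relies on the convolution theorem: multiplying two FFTs pointwise and inverting the product gives the circular convolution of the inputs. The following is a minimal CPU-side sketch of that identity, using NumPy's np.fft in place of cuFFT; the helper names circular_conv_direct and circular_conv_fft are made up for this illustration.

```python
import numpy as np

def circular_conv_direct(x, y):
    """Circular 2D convolution straight from the definition (slow, for checking)."""
    rows, cols = x.shape
    out = np.zeros((rows, cols), dtype=np.float64)
    for i in range(rows):
        for j in range(cols):
            for m in range(rows):
                for n in range(cols):
                    # Indices wrap around modulo the array size.
                    out[i, j] += x[m, n] * y[(i - m) % rows, (j - n) % cols]
    return out

def circular_conv_fft(x, y):
    """Same quantity via FFT -> pointwise product -> inverse FFT."""
    return np.real(np.fft.ifft2(np.fft.fft2(x) * np.fft.fft2(y)))

rng = np.random.RandomState(0)
x = rng.rand(6, 5)
y = rng.rand(6, 5)

direct = circular_conv_direct(x, y)
via_fft = circular_conv_fft(x, y)
print(np.allclose(direct, via_fft))
```

Note that np.fft.ifft2 applies the 1/N scaling on its own, which is what scale=True requests from fft.ifft.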

We are not done yet. Our convolution kernel will be much smaller than our input image, so we will have to resize both 2D arrays (the convolution kernel and the image) to a common shape before we can multiply their FFTs pointwise. Matching the shapes is not enough, though: we also need to zero-pad the arrays and center the convolution kernel appropriately. Zero padding means adding a buffer of zeros around the image to prevent wrap-around error. Remember that an FFT-based convolution is a circular convolution, so values that run past one edge always wrap around to the opposite edge. When we are done with the convolution, we can remove the buffer from the outside of the image to get the final output image.
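
To see the wrap-around concretely, here is a small one-dimensional sketch on the CPU with NumPy's FFT. It is an illustration only: the signal and kernel values are arbitrary, and fft_conv is a hypothetical helper. Without padding, the tail of the result folds back onto the start; with enough zeros appended, the FFT result matches ordinary linear convolution.

```python
import numpy as np

signal = np.array([1.0, 2.0, 3.0, 4.0])
kernel = np.array([1.0, 1.0, 1.0])

def fft_conv(a, b, n):
    """Circular convolution of a and b, both zero-padded to length n."""
    return np.real(np.fft.ifft(np.fft.fft(a, n) * np.fft.fft(b, n)))

# No extra room: the result has length 4 and its tail wraps onto the head.
wrapped = fft_conv(signal, kernel, len(signal))

# Padded to len(signal) + len(kernel) - 1: no wrap-around is possible.
full_len = len(signal) + len(kernel) - 1
padded = fft_conv(signal, kernel, full_len)

linear = np.convolve(signal, kernel)  # reference linear convolution

print(np.round(wrapped, 6))  # first two entries are contaminated by the tail
print(np.round(padded, 6))
print(linear)
```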

Let's create a new function called conv_2d that takes in a convolution kernel, ker, and an image, img. The padded image size will be (2*ker.shape[0] + img.shape[0], 2*ker.shape[1] + img.shape[1]). Let's set up the padded convolution kernel first. We will create a 2D array of zeros of this size, and then set the upper-left submatrix as our convolution kernel, like so:

def conv_2d(ker, img):
    padded_ker = np.zeros((img.shape[0] + 2*ker.shape[0], img.shape[1] + 2*ker.shape[1])).astype(np.float32)
    padded_ker[:ker.shape[0], :ker.shape[1]] = ker

We will now have to shift our convolution kernel so that its center is precisely at the coordinate (0,0). We can do this with the NumPy roll command:

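    # parenthesize so that // is applied before the negation;
    # -n//2 floors toward negative infinity and shifts one step too far for odd n
    padded_ker = np.roll(padded_ker, shift=-(ker.shape[0]//2), axis=0)
    padded_ker = np.roll(padded_ker, shift=-(ker.shape[1]//2), axis=1)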
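
A quick way to check the centering on the CPU is to use a kernel that is a single 1 at its center: convolving with it should return the input unchanged, which only happens if that 1 ends up at index (0, 0) after the roll. This sketch uses NumPy's FFT and writes the shift as -(n // 2), with explicit parentheses so that the floor division is applied before the negation; the array sizes are arbitrary.

```python
import numpy as np

# 3x3 "identity" kernel: a single 1 at its center, index (1, 1).
ker = np.zeros((3, 3))
ker[1, 1] = 1.0

padded_ker = np.zeros((8, 8))
padded_ker[:ker.shape[0], :ker.shape[1]] = ker

# Move the kernel center from (1, 1) to (0, 0); the rest wraps to the far edges.
padded_ker = np.roll(padded_ker, shift=-(ker.shape[0] // 2), axis=0)
padded_ker = np.roll(padded_ker, shift=-(ker.shape[1] // 2), axis=1)

img = np.arange(64, dtype=np.float64).reshape(8, 8)
out = np.real(np.fft.ifft2(np.fft.fft2(padded_ker) * np.fft.fft2(img)))

print(padded_ker[0, 0])        # the 1 now sits at the origin
print(np.allclose(out, img))   # a centered identity kernel leaves the image unchanged
```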

Now we need to pad the input image:

    padded_img = np.zeros_like(padded_ker).astype(np.float32)
    padded_img[ker.shape[0]:-ker.shape[0], ker.shape[1]:-ker.shape[1]] = img

Now we have two arrays of the same size that are appropriately formatted, so we can pass them to the cufft_conv function that we just wrote:

    out_ = cufft_conv(padded_ker, padded_img)

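We can now remove the zero buffer around our image and return the result. Since both inputs are real, we keep only the real part of the complex output:

    # the imaginary part is only rounding noise for real-valued inputs
    output = np.real(out_[ker.shape[0]:-ker.shape[0], ker.shape[1]:-ker.shape[1]])

    return output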
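
When testing conv_2d, it can help to have a CPU-only reference to compare against. The following sketch mirrors the same pad, roll, multiply, and crop steps with np.fft standing in for cuFFT. The name conv_2d_cpu is hypothetical, and the shift is written as -(n // 2) so that the center of an odd-sized kernel lands exactly on index 0.

```python
import numpy as np

def conv_2d_cpu(ker, img):
    """CPU reference for FFT-based 2D convolution with zero padding."""
    kh, kw = ker.shape
    shape = (img.shape[0] + 2 * kh, img.shape[1] + 2 * kw)

    # Kernel in the upper-left corner, then rolled so its center is at (0, 0).
    padded_ker = np.zeros(shape, dtype=np.float64)
    padded_ker[:kh, :kw] = ker
    padded_ker = np.roll(padded_ker, shift=-(kh // 2), axis=0)
    padded_ker = np.roll(padded_ker, shift=-(kw // 2), axis=1)

    # Image surrounded by a buffer of zeros.
    padded_img = np.zeros(shape, dtype=np.float64)
    padded_img[kh:-kh, kw:-kw] = img

    out = np.real(np.fft.ifft2(np.fft.fft2(padded_ker) * np.fft.fft2(padded_img)))
    return out[kh:-kh, kw:-kw]  # crop the buffer away again

# A 3x3 box blur applied to a constant image: interior pixels stay at 1.0,
# while border pixels darken because the zero padding is averaged in.
box = np.ones((3, 3)) / 9.0
flat = np.ones((6, 7))
blurred = conv_2d_cpu(box, flat)
print(blurred.shape)
print(np.round(blurred[2, 3], 6), np.round(blurred[0, 0], 6))
```

The darkened border is a property of zero padding itself, so the same effect should be expected at the edges of the GPU result.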

We are not yet done. Let's write some small functions to set up our Gaussian filter, and then we can move on to applying this to an image. We can write the basic filter itself with a single line using a lambda function:

gaussian_filter = lambda x, y, sigma : (1 / np.sqrt(2*np.pi*(sigma**2)) )*np.exp( -(x**2 + y**2) / (2 * (sigma**2) ))

We can now write a function that uses this filter to output a discrete convolution kernel. The convolution kernel will have a height and width of 2*sigma + 1, which is fairly standard (sigma must be an integer here, since it sets the array size). Notice that we normalize the values of our Gaussian kernel by summing them into total_ and dividing by it:

def gaussian_ker(sigma):
    ker_ = np.zeros((2*sigma+1, 2*sigma+1))
    for i in range(2*sigma + 1):
        for j in range(2*sigma + 1):
            ker_[i,j] = gaussian_filter(i - sigma, j - sigma, sigma)
    total_ = np.sum(ker_.ravel())
    ker_ = ker_ / total_
    return ker_
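
The double loop is easy to follow, but the same kernel can be built without explicit loops. This is an alternative sketch, not the version used in the rest of this section; gaussian_ker_vectorized is a hypothetical name, and the constant factor in front of the exponential is dropped because the normalization cancels it.

```python
import numpy as np

def gaussian_ker_vectorized(sigma):
    """Normalized (2*sigma + 1) x (2*sigma + 1) Gaussian kernel; sigma is a positive integer."""
    offsets = np.arange(-sigma, sigma + 1)
    # Broadcasting a column against a row gives x**2 + y**2 at every grid point.
    r_squared = offsets[:, np.newaxis] ** 2 + offsets[np.newaxis, :] ** 2
    ker = np.exp(-r_squared / (2.0 * sigma ** 2))
    return ker / ker.sum()

ker = gaussian_ker_vectorized(3)
print(ker.shape)
print(np.isclose(ker.sum(), 1.0))
print(ker.argmax() == ker.size // 2)  # the peak is the center element
```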

We are now ready to test this on an image! As our test case, we will use Gaussian filtering to blur a color JPEG image of this book's editor, Akshada Iyer. (This image is available under the Chapter07 directory in the GitHub repository with the file name akshada.jpg.) We will use Matplotlib's imread function to read the image; this is stored as an array of unsigned 8-bit integers ranging from 0 to 255 by default. We will typecast this to an array of floats and normalize it so that all of the values will range from 0 to 1.

Note to the readers of the print edition of this text: although the print edition of this text is in greyscale, this is a color image.

We will then set up an array of zeros that will store the blurred image:

if __name__ == '__main__':
    akshada = np.float32(plt.imread('akshada.jpg')) / 255
    akshada_blurred = np.zeros_like(akshada)
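
The conversion can be tried without the image file. This sketch uses a small synthetic uint8 array as a stand-in for what plt.imread returns for a JPEG; the array contents and the names fake_img, normalized, and blurred are made up for the illustration.

```python
import numpy as np

# Stand-in for a decoded 2x2 RGB JPEG: unsigned 8-bit values in [0, 255].
fake_img = np.array([[[0, 128, 255], [255, 0, 0]],
                     [[10, 20, 30], [200, 100, 50]]], dtype=np.uint8)

# Convert to float32 first, then scale into [0, 1].
normalized = np.float32(fake_img) / 255
blurred = np.zeros_like(normalized)  # same shape and dtype, filled with zeros

print(normalized.dtype, normalized.min(), normalized.max())
print(blurred.shape, blurred.dtype)
```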

Let's set up our convolution kernel. Here, a standard deviation of 15 should be enough:

    ker = gaussian_ker(15)

We can now blur the image. Since this is a color image, we will have to apply Gaussian filtering to each color layer (red, green, and blue) individually; this is indexed by the third dimension in the image arrays:

    for k in range(3):
        akshada_blurred[:,:,k] = conv_2d(ker, akshada[:,:,k])

Now let's look at the Before and After images side-by-side by using some Matplotlib tricks:

    fig, (ax0, ax1) = plt.subplots(1,2)
    fig.suptitle('Gaussian Filtering', fontsize=20)
    ax0.set_title('Before')
    ax0.axis('off')
    ax0.imshow(akshada)
    ax1.set_title('After')
    ax1.axis('off')
    ax1.imshow(akshada_blurred)
    plt.tight_layout()
    plt.subplots_adjust(top=.85)
    plt.show()

We can now run the program and observe the effects of Gaussian filtering.

This program is available in the Chapter07 directory in a file called conv_2d.py in the repository for this book.
