Understanding Numba, CUDA, and GPU acceleration

You have seen how simple it is to create CPU-accelerated code using Numba. Numba provides a similar interface for running computations on a GPU using the Compute Unified Device Architecture (CUDA). Let's port our IOU matrix calculation function so that it is computed on a GPU using Numba.

We can instruct Numba to perform the computation on a GPU by slightly modifying the decorator parameters, as follows:

import numba

@numba.guvectorize(['(f8[:, :], f8[:, :], f8[:, :])'],
                   '(m,k),(n,k)->(m,n)', target="cuda")
def mat_mul(x, y, z):
    # Fill z[i, j] with the IOU of the i-th box in x and the j-th box in y.
    for i in range(x.shape[0]):
        for j in range(y.shape[0]):
            z[i, j] = iou(x[i], y[j])

Here, we have instructed Numba to perform the computation on a GPU by passing target="cuda". We also have some work to do on the iou function. The new function looks as follows:

@numba.cuda.jit(device=True)
def iou(a: np.ndarray, b: np.ndarray) -> float:
    # Corners of the intersection rectangle.
    xx1 = max(a[0], b[0])
    yy1 = max(a[1], b[1])
    xx2 = min(a[2], b[2])
    yy2 = min(a[3], b[3])
    # Intersection area; zero when the boxes do not overlap.
    w = max(0., xx2 - xx1)
    h = max(0., yy2 - yy1)
    wh = w * h
    # Intersection over union: intersection / (area_a + area_b - intersection).
    result = wh / ((a[2] - a[0]) * (a[3] - a[1])
                   + (b[2] - b[0]) * (b[3] - b[1]) - wh)
    return result

First of all, we have changed the decorator: it is now numba.cuda.jit instead of numba.jit, which instructs Numba to create a function that is executed on a GPU. This function itself is called from other code running on the GPU device. For that purpose, we have passed device=True, which explicitly states that this function is intended to be called from functions that are executed on a GPU, rather than launched from the host.
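To make the distinction concrete, here is a minimal, hypothetical sketch (the names add_one and increment_kernel are made up for illustration): a device function can only be called from code that is already running on the GPU, while a regular kernel is what you launch from the host:

from numba import cuda

@cuda.jit(device=True)
def add_one(x):
    # Device function: callable only from GPU code, never launched directly.
    return x + 1

@cuda.jit
def increment_kernel(arr):
    # Regular kernel: launched from the host, calls the device function.
    i = cuda.grid(1)
    if i < arr.shape[0]:
        arr[i] = add_one(arr[i])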

You may also notice that we made quite a few modifications to eliminate all of the NumPy function calls. As with CPU acceleration, this is because numba.cuda cannot currently perform all of the operations that were used in the original function, so we replaced them with ones that numba.cuda does support, such as Python's built-in max and min.
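With both functions in place (note that iou must be defined before mat_mul in the actual script, since Numba compiles the gufunc eagerly), mat_mul can be called from the host like any other NumPy universal function. Here is a minimal usage sketch with made-up box data:

import numpy as np

# Hypothetical boxes in (x1, y1, x2, y2) format, one box per row.
boxes_a = np.random.rand(100, 4)
boxes_b = np.random.rand(200, 4)
# Shift the bottom-right corners so that x2 > x1 and y2 > y1.
boxes_a[:, 2:] += boxes_a[:, :2]
boxes_b[:, 2:] += boxes_b[:, :2]

iou_matrix = mat_mul(boxes_a, boxes_b)
print(iou_matrix.shape)  # (100, 200)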

Usually, in computer vision, your application will require GPU acceleration only when you are working with deep neural networks (DNNs). Most modern deep learning frameworks, such as TensorFlow, PyTorch, and MXNet, support GPU acceleration out of the box, allowing you to stay away from low-level GPU programming and concentrate on your models instead. If, after analyzing those frameworks, you still find yourself with a specific algorithm that you believe must be implemented directly with CUDA, you may want to look into the numba.cuda API, which supports most CUDA features.
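To give you a flavor of that API, the following is a minimal sketch of a hand-written kernel (scale_kernel and the launch configuration are illustrative, not part of the example above) that doubles every element of an array on the GPU:

import numpy as np
from numba import cuda

@cuda.jit
def scale_kernel(arr, factor):
    # One GPU thread per array element.
    i = cuda.grid(1)
    if i < arr.shape[0]:
        arr[i] *= factor

data = np.ones(1 << 20)
d_data = cuda.to_device(data)  # copy host -> device
threads_per_block = 128
blocks = (data.shape[0] + threads_per_block - 1) // threads_per_block
scale_kernel[blocks, threads_per_block](d_data, 2.0)
result = d_data.copy_to_host()  # copy device -> host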
