GPU acceleration

Convolution and its associated operations tend to benefit greatly from GPU acceleration. You saw earlier that GPU acceleration had minimal impact on our simpler networks, but it is extremely useful when building CNNs. All we need to do is add the magical cuda build tag, as shown here:

go build -tags='cuda'
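
Under the hood, the cuda tag is an ordinary Go build constraint. The following is a minimal, hypothetical sketch of the mechanism (not Gorgonia's actual source layout): a file guarded by the constraint is only compiled into your binary when you pass -tags='cuda'.

//go:build cuda

// This file is compiled only when you run `go build -tags='cuda'`.
// A sibling file guarded by //go:build !cuda could supply a CPU fallback.
package main

import "fmt"

func main() {
	fmt.Println("built with the cuda tag")
}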

As we tend to be more memory constrained on GPUs, be aware that the same batch size may not work on your GPU. As mentioned previously, the model uses around 4 GB of memory, so you will probably want to reduce the batch size if you have less than 6 GB of GPU memory (presumably, about 1 GB of that is already being used by your normal desktop). If your model runs very slowly, or the CUDA version of your executable simply fails, it is prudent to check whether running out of memory is the issue. You can do this with the NVIDIA SMI utility, telling it to report GPU status (including memory usage) every second, as shown here:

nvidia-smi -l 1

This produces a status report every second; watching it while your code runs will tell you broadly how much GPU memory your code is consuming.
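
If you want a rough sense of why the batch size matters so much, a back-of-the-envelope calculation helps. The following sketch uses purely illustrative shapes (they are not the exact dimensions of our model): the memory for a single float32 activation tensor grows linearly with the batch size, and per-layer activations plus gradients multiply that further.

package main

import "fmt"

func main() {
	// Illustrative shapes only: one activation tensor of
	// (batch, channels, height, width) float32 values.
	batch, channels, height, width := 100, 32, 28, 28
	bytes := batch * channels * height * width * 4 // 4 bytes per float32

	fmt.Printf("one activation tensor: ~%.1f MB\n", float64(bytes)/(1<<20))
	// Every layer keeps its own activations, and backpropagation adds
	// gradients of a similar size, so halving the batch size roughly
	// halves this part of the GPU memory bill.
}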

Let's quickly compare the performance of the CPU and GPU versions of our code. The CPU version takes roughly three minutes per epoch, as shown in the following output:

2018/12/30 13:23:36 Batches 500
2018/12/30 13:26:23 Epoch 0 |
2018/12/30 13:29:15 Epoch 1 |
2018/12/30 13:32:01 Epoch 2 |
2018/12/30 13:34:47 Epoch 3 |
2018/12/30 13:37:33 Epoch 4 |
2018/12/30 13:40:19 Epoch 5 |
2018/12/30 13:43:05 Epoch 6 |
2018/12/30 13:45:50 Epoch 7 |
2018/12/30 13:48:36 Epoch 8 |
2018/12/30 13:51:22 Epoch 9 |
2018/12/30 13:51:55 Epoch Test |

The GPU version takes around two minutes and thirty seconds per epoch, as shown in the following output:

2018/12/30 12:57:56 Batches 500
2018/12/30 13:00:24 Epoch 0
2018/12/30 13:02:49 Epoch 1
2018/12/30 13:05:15 Epoch 2
2018/12/30 13:07:40 Epoch 3
2018/12/30 13:10:04 Epoch 4
2018/12/30 13:12:29 Epoch 5
2018/12/30 13:14:55 Epoch 6
2018/12/30 13:17:21 Epoch 7
2018/12/30 13:19:45 Epoch 8
2018/12/30 13:22:10 Epoch 9
2018/12/30 13:22:40 Epoch Test
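
Working from the timestamps above, each CPU epoch takes roughly 166 seconds and each GPU epoch roughly 145 seconds, so for this particular model the CUDA build is only around 13% faster, presumably because the model is small enough that the non-convolutional operations and host-to-device transfers limit the overall gain.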

A future version of Gorgonia will also include support for improved versions of several operations; this is currently in testing, and you can use it by importing gorgonia.org/gorgonia/ops/nn and replacing the Gorgonia versions of your Conv2d, Rectify, MaxPool2D, and Dropout calls with their nnops equivalents. An example of a slightly different Layer 0 is as follows:

// Layer 0: convolution, ReLU activation, max-pooling, and dropout, using the nnops versions.
if c0, err = nnops.Conv2d(x, m.w0, tensor.Shape{3, 3}, []int{1, 1}, []int{1, 1}, []int{1, 1}); err != nil {
	return errors.Wrap(err, "Layer 0 Convolution failed")
}
if a0, err = nnops.Rectify(c0); err != nil {
	return errors.Wrap(err, "Layer 0 activation failed")
}
if p0, err = nnops.MaxPool2D(a0, tensor.Shape{2, 2}, []int{0, 0}, []int{2, 2}); err != nil {
	return errors.Wrap(err, "Layer 0 Maxpooling failed")
}
if l0, err = nnops.Dropout(p0, m.d0); err != nil {
	return errors.Wrap(err, "Unable to apply a dropout")
}
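
For completeness, here is a sketch of the import block these calls assume; the nnops alias matches the usage above, and errors.Wrap is assumed to come from the github.com/pkg/errors package:

import (
	"github.com/pkg/errors"

	nnops "gorgonia.org/gorgonia/ops/nn"
	"gorgonia.org/tensor"
)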

As an exercise, replace all the necessary operations with their nnops versions and run the code to see how the performance differs.
