Why convolution?

So why is convolution so efficient? To see that, let's suppose for a moment that we are not using a convolution layer but instead a fully-connected, or dense, layer.

So, we have the input, 784, which is just 28 x 28, the size of the image, and the first hidden or dense layer, which has 11,520 units; that comes from multiplying its three dimensions, 24 x 24 x 20.

Since this layer is fully connected, each input is connected to every output, which means each of the 784 inputs carries 11,520 parameters to learn. In total, that's 784 x 11,520, or roughly 9,000,000 parameters. And this is just the first hidden layer; neural networks usually have many of these hidden layers. So this approach doesn't look that promising.
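To make the arithmetic concrete, here is a tiny sketch in plain Java (no library assumed) that reproduces these counts:

```java
public class DenseParameterCount {
    public static void main(String[] args) {
        int inputs = 28 * 28;            // 784 input pixels
        int hiddenUnits = 24 * 24 * 20;  // 11,520 units in the first hidden layer

        // Fully connected: every input is wired to every hidden unit,
        // so the weight count is the product of the two sizes
        // (biases would add another 11,520 on top).
        long denseWeights = (long) inputs * hiddenUnits;
        System.out.println("Dense layer weights: " + denseWeights); // 9,031,680
    }
}
```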

As we know, when we use convolution, we don't predefine the nature of the filter; we don't say it's a Sobel, a horizontal, or a vertical edge detector, but rather let the neural network figure out what kind of filter it wants to learn. That means the cells in the filters are just parameters. With 20 filters of size 5 x 5, the number of cells is 5 x 5 x 20, which gives us 500 parameters. So we are comparing 500 parameters against 9,000,000 parameters.
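Extending the sketch above to the convolutional case shows just how dramatic the difference is:

```java
public class ConvParameterCount {
    public static void main(String[] args) {
        int filterSize = 5;   // 5 x 5 kernel
        int numFilters = 20;  // the layer learns 20 such filters

        // Each filter cell is one learnable parameter,
        // no matter where on the image the filter is applied.
        int convWeights = filterSize * filterSize * numFilters;
        long denseWeights = (long) (28 * 28) * (24 * 24 * 20);

        System.out.println("Convolutional weights: " + convWeights);           // 500
        System.out.println("Dense weights:         " + denseWeights);          // 9,031,680
        System.out.println("Ratio: ~" + (denseWeights / convWeights) + "x");   // ~18,063x
    }
}
```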

This is what we gain by using convolution: far fewer parameters, while the neural network's ability to predict is retained. Adding more dense layers often just slows training down without improving performance. Although this comparison is the most direct way to demonstrate why convolution is so efficient, there are two other ways of seeing it, and the first of them is parameter sharing.

Parameter sharing says that a filter that works at one position will also work just fine at other positions.

So basically, you don't need a separate filter for each position, with one kind of filter on one part of the image and different filters elsewhere; the filter that works at one position also works at every other position. And that, as you can imagine, reduces the number of parameters dramatically.

Similarly, we have the sparsity of connections. Sparsity of connections means that each output value is connected to only a small part of the image or matrix: only the pixels under the current filter window contribute to it, while all the other pixels are not connected at all. Once again, the only parameters you have to learn are the ones in the filter itself.
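Both properties become obvious if you write the convolution out as plain Java. The following is a minimal sketch with the sizes assumed above (valid convolution, stride 1, no padding):

```java
public class ConvolutionSketch {
    // Applies one 5 x 5 filter to a 28 x 28 image.
    static double[][] convolve(double[][] image, double[][] filter) {
        int k = filter.length;          // 5
        int out = image.length - k + 1; // 28 - 5 + 1 = 24
        double[][] result = new double[out][out];
        for (int row = 0; row < out; row++) {
            for (int col = 0; col < out; col++) {
                double sum = 0;
                // Sparsity: this output cell reads only the k x k window below it...
                for (int i = 0; i < k; i++) {
                    for (int j = 0; j < k; j++) {
                        // ...and parameter sharing: the very same filter weights
                        // are reused at every (row, col) position.
                        sum += image[row + i][col + j] * filter[i][j];
                    }
                }
                result[row][col] = sum;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] output = convolve(new double[28][28], new double[5][5]);
        System.out.println("Output: " + output.length + " x " + output[0].length); // 24 x 24
    }
}
```

Twenty such filters stacked together produce exactly the 24 x 24 x 20 volume we counted earlier, while still needing only 500 shared weights.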

Another benefit is that convolution makes the neural network's predictions more robust: a cat that is shifted a little to the right or to the left, or maybe skewed up or down, is still just a cat to a convolutional neural network.

We'll now apply this architecture to build a Java application that recognizes handwritten digits, and we'll achieve much higher accuracy by doing so.
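As a preview, a convolutional network of this shape could be configured along the following lines with Deeplearning4j. This is a minimal sketch, not the exact configuration of the application: the first layer follows the 20-filter, 5 x 5 example above, while the pooling and layer sizes here are illustrative assumptions.

```java
import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.inputs.InputType;
import org.deeplearning4j.nn.conf.layers.*;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.learning.config.Adam;
import org.nd4j.linalg.lossfunctions.LossFunctions;

public class DigitRecognizerSketch {
    public static void main(String[] args) {
        MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
                .seed(123)
                .updater(new Adam(1e-3))
                .list()
                // 20 filters of 5 x 5: the 500-parameter layer discussed above
                .layer(new ConvolutionLayer.Builder(5, 5)
                        .nIn(1).nOut(20).activation(Activation.RELU).build())
                .layer(new SubsamplingLayer.Builder(SubsamplingLayer.PoolingType.MAX)
                        .kernelSize(2, 2).stride(2, 2).build())
                .layer(new DenseLayer.Builder()
                        .nOut(500).activation(Activation.RELU).build())
                .layer(new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                        .nOut(10).activation(Activation.SOFTMAX).build()) // 10 digit classes
                .setInputType(InputType.convolutionalFlat(28, 28, 1))     // 28 x 28 grayscale
                .build();

        MultiLayerNetwork network = new MultiLayerNetwork(conf);
        network.init();
        System.out.println(network.summary());
    }
}
```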
