An essential property of convolutional neural network architectures is that the spatial size of the data shrinks from the input to the output of the model while the channel depth increases. As mentioned earlier, this is usually done by choosing a convolution stride greater than 1 or by adding pooling layers. The receptive field determines how much of the original input contributes to a single element of the output. As the receptive field expands from layer to layer, convolutional layers can combine low-level features (lines, edges) into higher-level features (curves, textures).
The receptive field, $RF_k$, of layer $k$ can be given by the following formula:

$$RF_k = RF_{k-1} + (f_k - 1) \prod_{i=1}^{k-1} s_i$$

Here, $RF_{k-1}$ is the receptive field of layer $k - 1$, $f_k$ is the filter size of layer $k$, and $s_i$ is the stride of layer $i$. So, for the preceding example, the input layer has RF = 1, the hidden layer has RF = 3, and the last layer has RF = 5.
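To make the calculation concrete, the following minimal Python sketch computes the receptive field layer by layer. Each layer adds its filter size minus one, multiplied by the product of the strides of all earlier layers. The function name `receptive_fields` and the `(filter_size, stride)` pair layout are illustrative choices, not part of any library. The filter size of 3 and stride of 1 used in the first call are assumptions: they are one configuration that reproduces the values 1, 3, and 5 quoted above.

```python
from typing import List, Tuple


def receptive_fields(layers: List[Tuple[int, int]]) -> List[int]:
    """Return the receptive field of the input and of every layer.

    `layers` holds one (filter_size, stride) pair per layer, in order.
    """
    rf = 1      # a single input element only sees itself
    jump = 1    # product of the strides of all earlier layers
    result = [rf]
    for filter_size, stride in layers:
        # Each layer widens the field by (filter_size - 1) input steps,
        # scaled by how far apart neighbouring elements already are.
        rf += (filter_size - 1) * jump
        result.append(rf)
        jump *= stride
    return result


# Assumed configuration: two layers with 3-wide filters and stride 1.
print(receptive_fields([(3, 1), (3, 1)]))  # [1, 3, 5]

# Giving the first layer a stride of 2 widens the last layer's field.
print(receptive_fields([(3, 2), (3, 1)]))  # [1, 3, 7]
```

The second call shows why strides matter: the stride of a layer does not change that layer's own receptive field, but it scales the contribution of every layer after it.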
Now that we're acquainted with the basic concepts of convolutional neural networks, let's look at how we can combine them to create a concrete network architecture for image classification.