Recipe for CNN creation

The following points are based on our experience training neural networks and on what researchers in the field currently consider best practice. Hopefully, they will help you if you ever need to design your own CNN architecture from scratch. But before designing your own CNN, check out existing off-the-shelf architectures: you can learn from them, and they may already do the job for you.

  1. Use convolution layers with 3x3 kernels. Larger kernels are more expensive in terms of both parameters and computation. On top of this, as we saw in the earlier chapters, you can stack 3x3 conv layers to produce a larger receptive field, with the added benefit of more nonlinear activations. For example, two stacked 3x3 layers see the same 5x5 region as a single 5x5 kernel, but with fewer weights (18 versus 25 per channel pair) and one extra activation in between. (The first sketch after this list puts this and several of the following tips together in code.)
  2. First-layer convolutions should generally have at least 32 filters. This way, deeper layers are not restricted by the number of features the first layer extracts.
  3. Avoid pooling layers if possible; instead, use convolution layers with a stride of 2. These downsample the input like pooling does, but rather than simply throwing away valuable information, they learn what to keep. A strided convolution effectively combines conv and pooling in a single layer.
  4. When you decrease the spatial size of your feature maps, increase the number of filters so that you don't lose too much information too quickly. In deep networks, avoid reducing the spatial size too aggressively in the first layers.
  5. Follow the advice in this chapter: start your network design small, then gradually increase its complexity, so that you avoid overfitting issues.
  6. Use batchnorm. It really helps with training your networks!
  7. Gradually decrease the spatial size of your feature maps as you get deeper into your network.
  8. Minimize the number of FC layers, and use dropout before the final layer. Use FC layers only if you need to concatenate some scalar features at the end. (You can often avoid even that by encoding such features as extra input channels.)
  9. If you require a large receptive field (for example, detection or classification where the object size is close to the total image size), try dilated convolutions with an exponentially increasing dilation factor per layer. This grows the receptive field very quickly while keeping the number of parameters low (see the dilated-convolution sketch after this list).
  10. If your network becomes deep and the training loss does not decrease, consider using residual connections (a residual-block sketch follows this list).
  11. Once your network's accuracy is within expected values, and if computation cost is an issue, look at techniques such as depthwise convolutions, bottleneck modules, or whatever else is coming out in the field, depending on your use case (a depthwise-separable sketch follows this list).
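
To make these tips concrete, here is a minimal sketch of a small classification CNN that follows tips 1 through 8. It is our own illustration in PyTorch (the framework choice, layer sizes, and names such as SmallCNN are ours, not prescribed by the recipe): 3x3 kernels everywhere, at least 32 filters in the first layer, stride-2 convolutions instead of pooling, filter counts that double as the spatial size halves, batchnorm after every convolution, and a single FC layer preceded by dropout.

```python
# A minimal sketch, not a definitive implementation: a small CNN that
# follows tips 1-8 from the list above.
import torch
import torch.nn as nn

def conv_bn_relu(in_ch, out_ch, stride=1):
    """3x3 convolution followed by batchnorm and ReLU (tips 1 and 6)."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=stride,
                  padding=1, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

class SmallCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            conv_bn_relu(3, 32),              # tip 2: >= 32 filters in the first layer
            conv_bn_relu(32, 32),
            conv_bn_relu(32, 64, stride=2),   # tips 3/4: stride-2 downsampling, double the filters
            conv_bn_relu(64, 64),
            conv_bn_relu(64, 128, stride=2),  # tip 7: shrink spatial size gradually
            conv_bn_relu(128, 128),
        )
        self.pool = nn.AdaptiveAvgPool2d(1)   # global average pooling avoids large FC layers
        self.dropout = nn.Dropout(0.5)        # tip 8: dropout before the final layer
        self.fc = nn.Linear(128, num_classes) # tip 8: a single FC layer at the end

    def forward(self, x):
        x = self.features(x)
        x = self.pool(x).flatten(1)
        return self.fc(self.dropout(x))

model = SmallCNN()
logits = model(torch.randn(1, 3, 32, 32))  # e.g. a CIFAR-sized input
print(logits.shape)                        # torch.Size([1, 10])
```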
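
For tip 9, the following sketch (again our own example, with arbitrary channel counts) stacks 3x3 convolutions whose dilation factor doubles at each layer. With dilations of 1, 2, 4, and 8, the receptive field grows to 31x31 in only four layers, while each layer still has just 3x3 = 9 weights per channel pair.

```python
# A sketch of tip 9: exponentially growing dilation factors.
# Receptive field after each layer: 3, 7, 15, 31.
import torch
import torch.nn as nn

layers = []
channels = 32
for dilation in (1, 2, 4, 8):
    layers += [
        # padding = dilation keeps the spatial size unchanged for a 3x3 kernel
        nn.Conv2d(channels, channels, kernel_size=3,
                  dilation=dilation, padding=dilation),
        nn.ReLU(inplace=True),
    ]
dilated_stack = nn.Sequential(*layers)

x = torch.randn(1, channels, 64, 64)
print(dilated_stack(x).shape)  # torch.Size([1, 32, 64, 64])
```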
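
For tip 10, here is a sketch of a basic residual block of the kind introduced by ResNet. The identity shortcut adds the block's input to its output, giving gradients a direct path through the network, which often gets the training loss moving again in very deep models. The exact layout below is one common variant, not the only one.

```python
# A sketch of tip 10: a basic residual block with an identity shortcut.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # The skip connection: add the input back before the final activation
        return self.relu(self.body(x) + x)

block = ResidualBlock(64)
print(block(torch.randn(1, 64, 16, 16)).shape)  # torch.Size([1, 64, 16, 16])
```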
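
Finally, for tip 11, this sketch shows a depthwise separable convolution, as popularized by MobileNet: a per-channel 3x3 (depthwise) convolution followed by a 1x1 (pointwise) convolution. The channel counts below are arbitrary, chosen only to make the parameter savings easy to count.

```python
# A sketch of tip 11: a depthwise separable convolution.
import torch
import torch.nn as nn

def depthwise_separable(in_ch, out_ch):
    return nn.Sequential(
        # groups=in_ch makes the 3x3 convolution per-channel (depthwise)
        nn.Conv2d(in_ch, in_ch, kernel_size=3, padding=1,
                  groups=in_ch, bias=False),
        nn.BatchNorm2d(in_ch),
        nn.ReLU(inplace=True),
        # the 1x1 pointwise convolution mixes information across channels
        nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

# Convolution weights: 3*3*64 + 64*128 = 8,768 versus 3*3*64*128 = 73,728
# for a standard 3x3 convolution with the same input/output channels.
layer = depthwise_separable(64, 128)
print(layer(torch.randn(1, 64, 32, 32)).shape)  # torch.Size([1, 128, 32, 32])
```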
Remember that training and designing CNNs is a very empirical science, so always be aware that what is considered best practice can change quickly.