Parameters and memory calculation

One of the coolest features of VGG is that, due to the small kernel size in its conv layers, the number of parameters used is low. If we remember from Chapter 2, Deep Learning and Convolutional Neural Networks, the number of parameters in a convolution layer (minus the bias) can be calculated as follows:

$$\text{parameters} = k_h \times k_w \times c_{in} \times c_{out}$$

Here, $k_h$ and $k_w$ are the kernel height and width, $c_{in}$ is the number of input channels, and $c_{out}$ is the number of output channels (filters).
So, for example, the first layer (a 3 x 3 convolution over the 3 RGB input channels, producing 64 feature maps) would have the following number of parameters:

$$3 \times 3 \times 3 \times 64 = 1{,}728$$
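
To make this concrete, here is a minimal Python sketch that applies the formula above (the conv_params helper is our own naming, not code from this book):

    def conv_params(kernel_h, kernel_w, channels_in, channels_out, bias=True):
        """Parameters in a conv layer: k_h * k_w * c_in * c_out (+ one bias per filter)."""
        weights = kernel_h * kernel_w * channels_in * channels_out
        return weights + (channels_out if bias else 0)

    # First VGG conv layer: 3 x 3 kernel, 3 input channels (RGB), 64 filters.
    print(conv_params(3, 3, 3, 64, bias=False))  # 1728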
Beware, though, that this low number of parameters does not hold for the fully connected (dense) layers at the end of the model, which is usually where most of a model's parameters are found. This is especially true if, as in VGGNet, you stack multiple dense layers one after the other.

For example, the first dense layer connects the flattened 7 x 7 x 512 output of the last pooling layer to 4,096 units, which gives this number of parameters:

$$7 \times 7 \times 512 \times 4096 = 102{,}760{,}448$$

That's more than six times all the parameters in the conv layers up to that point!
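
As a quick sanity check on that claim, the following framework-free sketch sums the weight counts of VGG16's thirteen conv layers (transcribed from the architecture, biases ignored) and compares the total with the first dense layer:

    # (k_h, k_w, c_in, c_out) for the 13 conv layers of VGG16.
    vgg16_convs = [
        (3, 3, 3, 64), (3, 3, 64, 64),
        (3, 3, 64, 128), (3, 3, 128, 128),
        (3, 3, 128, 256), (3, 3, 256, 256), (3, 3, 256, 256),
        (3, 3, 256, 512), (3, 3, 512, 512), (3, 3, 512, 512),
        (3, 3, 512, 512), (3, 3, 512, 512), (3, 3, 512, 512),
    ]
    conv_total = sum(kh * kw * cin * cout for kh, kw, cin, cout in vgg16_convs)
    dense_first = 7 * 7 * 512 * 4096

    print(conv_total)                # 14710464
    print(dense_first)               # 102760448
    print(dense_first / conv_total)  # ~6.99, i.e. more than six times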

As mentioned earlier, you need a large number of samples in your training dataset to fit this many model parameters, so it's better to avoid this explosion of parameters caused by excessive use of fully connected layers. Luckily, researchers found that VGGNet performs roughly the same with just one dense layer at the end rather than three. Removing these fully connected layers therefore strips a huge number of parameters from the model while barely reducing its performance. As a result, this is what we recommend you do as well if you decide to implement a VGGNet; a sketch of such a modification follows.
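
As an illustration of this recommendation, here is a minimal sketch in tf.keras (assuming TensorFlow 2.x, which may differ from the framework used elsewhere in this book): the convolutional base of VGG16 is kept, and a single dense classifier replaces the original three fully connected layers:

    import tensorflow as tf
    from tensorflow.keras import layers

    # Convolutional base of VGG16, without the original fc1/fc2/predictions stack.
    base = tf.keras.applications.VGG16(include_top=False,
                                       weights=None,
                                       input_shape=(224, 224, 3))

    # A single dense layer replaces the three fully connected layers.
    model = tf.keras.Sequential([
        base,
        layers.Flatten(),                          # 7 x 7 x 512 -> 25,088
        layers.Dense(1000, activation="softmax"),  # one classifier layer only
    ])

    model.summary()  # roughly 25M parameters in the top, versus ~124M originally

Swapping Flatten for GlobalAveragePooling2D would shrink the classifier even further, at the cost of discarding spatial information.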
