VGGNet

Created by the Visual Geometry Group (VGG) at Oxford University, VGGNet was one of the first architectures to stack a much larger number of layers together; its best-known variants, VGG-16 and VGG-19, have 16 and 19 weight layers respectively. AlexNet was considered deep when it first came out with its eight learned layers, but that is now a small number compared to both VGG and other modern architectures.

VGGNet uses only very small convolution filters, with a spatial size of 3x3, whereas AlexNet used filters as large as 11x11. Stacks of these 3x3 convolutional layers are interspersed with 2x2 max pooling layers (stride 2), each of which halves the height and width of the feature maps.
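To make the alternation of convolution and pooling concrete, the sketch below traces feature-map sizes through the 13 convolutional layers of the VGG-16 configuration. It assumes the settings used by VGG-16: a 224x224 RGB input, 3x3 convolutions with stride 1 and 1 pixel of padding, and 2x2 max pooling with stride 2. The names `VGG16_CFG`, `out_size`, and `trace_shapes` are hypothetical helpers for illustration, not part of any library.

```python
# Hypothetical encoding of the VGG-16 convolutional stack: an integer is a
# 3x3 conv layer with that many output channels, "M" is a 2x2 max pool.
VGG16_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512, "M"]


def out_size(size, kernel, stride, padding):
    """Spatial output size of a sliding window (convolution or pooling)."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_shapes(cfg, size=224, channels=3):
    """Return (layer, channels, height_and_width) after each layer in cfg."""
    shapes = []
    for layer in cfg:
        if layer == "M":
            # 2x2 max pool, stride 2, no padding: halves height and width
            size = out_size(size, kernel=2, stride=2, padding=0)
        else:
            # 3x3 conv, stride 1, padding 1: height and width are unchanged
            size = out_size(size, kernel=3, stride=1, padding=1)
            channels = layer
        shapes.append((layer, channels, size))
    return shapes


for layer, channels, size in trace_shapes(VGG16_CFG):
    print(f"{str(layer):>4}: {channels} x {size} x {size}")
```

Only the pooling layers shrink the feature maps (224 to 112, 56, 28, 14 and finally 7), while the padded 3x3 convolutions change only the channel count, so the stack ends with 512 feature maps of size 7x7.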

Using such small filters means that each filter sees only a very small neighborhood of pixels. At first, this might give the impression that the model takes only local information into account. However, stacking small filters one after another gives the same "receptive field" as a single larger filter: with stride 1, each additional 3x3 layer widens the receptive field by two pixels, so two stacked 3x3 filters see a 5x5 region and three stacked 3x3 filters see a 7x7 region, the same as a single 7x7 filter.
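The receptive-field claim can be checked numerically. The sketch below assumes stride-1 convolutions with odd, centered kernels and works along one axis (the 2D field is the square of this width). It computes the width both from the closed form 1 + n(k - 1) and by brute force, tracking which input positions can influence a single output position; both function names are hypothetical.

```python
def receptive_field_formula(num_layers, kernel=3):
    """Receptive field width of num_layers stacked stride-1 convolutions."""
    # Each stride-1 k x k layer widens the field by (k - 1) pixels per axis.
    return 1 + num_layers * (kernel - 1)


def receptive_field_bruteforce(num_layers, kernel=3):
    """Count the input positions (along one axis) that can affect one output."""
    half = kernel // 2  # assumes an odd kernel size centered on each pixel
    positions = {0}  # start from a single output position
    for _ in range(num_layers):
        # Every position draws on the kernel's neighborhood in the layer below.
        positions = {p + d for p in positions for d in range(-half, half + 1)}
    return len(positions)


for n in (1, 2, 3):
    rf = receptive_field_bruteforce(n)
    print(f"{n} stacked 3x3 layer(s): {rf}x{rf} receptive field")
```

Three stacked 3x3 layers and one 7x7 layer both report a width of 7, matching the comparison in the text.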

Stacking small filters brings a double bonus: the network can be made deeper (which, as we will see, is generally better) while keeping the same receptive field size, and it needs fewer parameters. This idea is further explored later in the chapter.
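The parameter saving follows from simple arithmetic. The sketch below assumes every layer has the same number of input and output channels C and ignores biases; under those assumptions, three stacked 3x3 layers need 3 x (3 x 3 x C x C) = 27C² weights, while a single 7x7 layer needs 49C². The helpers `conv_weights` and `compare` are hypothetical.

```python
def conv_weights(kernel, in_channels, out_channels):
    """Number of weights in one kernel x kernel conv layer (biases ignored)."""
    return kernel * kernel * in_channels * out_channels


def compare(channels):
    """Weights for three stacked 3x3 layers vs one 7x7 layer, C channels throughout."""
    stacked = 3 * conv_weights(3, channels, channels)  # 27 * C^2
    single = conv_weights(7, channels, channels)       # 49 * C^2
    return stacked, single


for c in (64, 256, 512):
    stacked, single = compare(c)
    print(f"C={c}: three 3x3 = {stacked:,}  one 7x7 = {single:,}")
```

Because the ratio 27/49 does not depend on C, the stacked version uses roughly 45% fewer weights at every channel width.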
