Deep convolutional networks

A network structure that recently achieved very good results at image recognition benchmarks is the convolutional neural network (CNN) or ConvNet. CNNs are a type of feedforward neural network that are structured in such a way that it emulates the behavior of the visual cortex, exploiting 2D structures of an input image, that is, patterns that exhibit spatially local correlation. It works on the basic principles of how the brain recalls or remembers images. We, as humans, remember images on the basis of features only. Given the features, our brain will start forming the image itself. In computers, consider the following diagram, which shows how a feature is detected:

In the same way, many features are detected from the image, as shown in the following diagram:

A CNN consists of a number of convolutional and subsampling layers, optionally followed by fully connected layers. An example of this shown in the following diagram. The input layer reads all of the pixels in an image and then we apply multiple filters. In the following diagram, four different filters are applied. Each filter is applied to the original image; for example, one pixel of a 6 x 6 filter is calculated as the weighted sum of a 6 x 6 square of input pixels and corresponding 6 x 6 weights. This effectively introduces filters that are similar to standard image processing, such as smoothing, correlation, edge detection, and so on. The resulting image is called a feature map. In the example in the following diagram, we have four feature maps, one for each filter.

The next layer is the subsampling layer, which reduces the size of the input. Each feature map is subsampled typically with mean or max pooling over a contiguous region of 2 x 2 (up to 5 x 5 for large images). For example, if the feature map size is 16 x 16 and the subsampling region is 2 x 2, the reduced feature map size is 8 x 8, where 4 pixels (a 2 x 2 square) are combined into a single pixel by calculating the max, min, mean, or some other functions:

The network may contain several consecutive convolution and subsampling layers, as shown in the preceding diagram. A particular feature map is connected to the next reduced/convoluted feature map, while feature maps at the same layer are not connected to each other.

After the last subsampling or convolutional layer, there is usually a fully connected layer, identical to the layers in a standard multilayer neural network, which represents the target data.

A CNN is trained using a modified backpropagation algorithm that takes the subsampling layers into account and updates the convolutional filter weights based on all the values where this filter is applied.

Some good CNN designs can be found at the ImageNet competition results page: http://www.image-net.org/. An example is AlexNet, which is described in the ImageNet Classification with Deep Covolutional Neural Networks paper by A. Krizhevsky et al.

This concludes our review of the main neural network structures. In the following section, we'll move on to the actual implementation.

Table of Contents for Deep convolutional networks

Create new playlist

Sign In

Sign Up

Table of Contents for
Deep convolutional networks