Converting a traditional CNN to a fully convolutional network

A technique that is very important to efficient object detectors, because it improves the reuse of computation, is applying the sliding window convolutionally. To do this, we convert all FC layers to CONV layers, as shown in the next figure.

The purpose of implementing our network this way is that it can accept input images bigger than the ones it was originally designed for, while sharing computation across window positions to make the whole process more efficient. A network in which all the FC layers have been converted to CONV layers is called a fully convolutional network (FCN).

The basic technique for converting an FC layer to a CONV layer is to set the kernel size equal to the input's spatial dimensions and the number of filters equal to the number of outputs of the FC layer. In this example, we expect a 14x14x3 input image.
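The equivalence can be sketched in NumPy (a toy illustration, not the book's code; the 14x14x3 input matches the example above, while the number of outputs is kept small here purely for readability): an FC layer's weight matrix, reshaped into filters whose kernel covers the entire input, produces exactly the same numbers as the flatten-and-multiply FC computation.

```python
import numpy as np

rng = np.random.default_rng(0)

# A 14x14x3 input, as in the example above.
x = rng.standard_normal((14, 14, 3))

# An FC layer with n_out outputs on this input needs a
# (14*14*3, n_out) weight matrix.
n_out = 8  # toy size; a real network might use 4096
W = rng.standard_normal((14 * 14 * 3, n_out))

# FC view: flatten the input and multiply.
fc_out = x.reshape(-1) @ W  # shape (n_out,)

# CONV view: reshape the same weights into n_out filters of
# size 14x14x3. The kernel equals the input's spatial size, so
# there is exactly one valid position, and correlating there is
# the same dot product as above.
filters = W.reshape(14, 14, 3, n_out)
conv_out = np.tensordot(x, filters, axes=3)  # shape (n_out,)

assert np.allclose(fc_out, conv_out)
```

Because the reshape uses the same (C-order) memory layout as the flatten, each filter coefficient lines up with the FC weight it replaces, so the two views are numerically identical.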

For example, if we train a fully convolutional network with 100 x 100 input patches and then test it with 2,000 x 2,000 input images, the effect is that of a 100 x 100 sliding window running across the 2,000 x 2,000 image. When a bigger input volume is used (as in this example), the output of the FCN is a volume in which each cell corresponds to one position of the 100 x 100 window patch on the original input image.

Now, every time we use an input image bigger than the original training input, the effect is as if we were sliding the classifier across the image, but with less computation. This way, we perform the sliding window convolutionally, in one step, through a single forward pass of the CNN:
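The sliding-window effect described above can be checked with a small NumPy sketch (toy sizes chosen here for illustration: a classifier "trained" on 4x4x3 patches, applied to an 8x8x3 image, rather than the 100x100 and 2,000x2,000 of the example). Running the reshaped FC weights as a filter over the whole image yields a 5x5 output grid, and each cell equals the FC classifier run on the corresponding 4x4 window.

```python
import numpy as np

rng = np.random.default_rng(1)

k, n_out = 4, 3  # "training" patch size and classifier outputs (toy sizes)
W = rng.standard_normal((k * k * 3, n_out))  # FC weights of the small classifier
image = rng.standard_normal((8, 8, 3))       # bigger test image

# Convolutional application: correlate the reshaped FC weights at
# every valid position of the bigger image (stride 1).
filters = W.reshape(k, k, 3, n_out)
H = image.shape[0] - k + 1  # 5 valid positions per axis
out = np.empty((H, H, n_out))
for i in range(H):
    for j in range(H):
        out[i, j] = np.tensordot(image[i:i + k, j:j + k], filters, axes=3)

# Each cell of the 5x5 output grid equals the FC classifier applied
# to the corresponding 4x4 patch, i.e. one sliding-window position.
patch_out = image[2:2 + k, 3:3 + k].reshape(-1) @ W
assert np.allclose(out[2, 3], patch_out)
```

The explicit double loop is only there to make the window positions visible; a real FCN computes the same output grid in one convolution, reusing the overlapping computation between neighboring windows instead of redoing it.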
