Learning about transfer learning

Usually, a CNN has millions of parameters. Let's make a rough estimate to see where all of those parameters come from.

Suppose we have a 10-layer network and each layer has 100 filters of size 3 x 3. These numbers are quite modest; networks with good performance usually have dozens of layers and hundreds of filters in each layer. In our case, each filter has a depth of 100, since a filter's depth must match the number of channels produced by the preceding layer.

Hence, each filter has 3 x 3 x 100 = 900 parameters (excluding biases, of which there are 100 per layer, one per filter), which results in 900 x 100 = 90,000 parameters for each layer and, therefore, about 900,000 parameters for the complete 10-layer network. To learn so many parameters from scratch without overfitting would require quite a large annotated dataset. A question arises: what can we do instead?
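
To sanity-check this estimate, we can build such a toy network in Keras and let it count its own parameters. This is only an illustrative sketch: the 100-channel input shape is an assumption chosen so that every layer, including the first, sees 100 input channels:

from tensorflow import keras as K

# A toy 10-layer network matching the estimate above:
# 100 filters of size 3 x 3 in every layer
model = K.Sequential(
    [K.Input(shape=(224, 224, 100))]  # assumed 100-channel input
    + [K.layers.Conv2D(100, kernel_size=3, padding="same") for _ in range(10)]
)

# Each layer: 3 * 3 * 100 * 100 = 90,000 weights + 100 biases = 90,100 parameters
print(model.count_params())  # 901,000

The counted total of 901,000 matches the estimate: 900,000 weights plus 1,000 biases.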

You have learned that layers of a network act as feature extractors. Besides this, natural images have quite a lot in common. Therefore, it would be a good idea to use the feature extractors of a network that was trained on a large dataset to achieve good performance on a different, smaller dataset. This technique is called transfer learning.

Let's pick a pretrained model as our base model, which is a single line of code with Keras:

base_model = K.applications.MobileNetV2(input_shape=(224, 224, 3), include_top=False)

Here, we use the pretrained MobileNetV2 network, which is robust and lightweight. Of course, you can use other available models instead; they can be found on the Keras website or simply listed with dir(K.applications).

We have taken the version of the network that excludes the top layers responsible for classification by passing include_top=False, as we are going to build a new classifier on top of it. Still, the network includes all the other layers that were trained on ImageNet. ImageNet is a dataset of more than a million images, each annotated with one of 1,000 classes.
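
To see what include_top=False leaves out, we can instantiate the same model with the top in place and compare; this is just a quick check:

# For comparison: the same model with the classification top included
full_model = K.applications.MobileNetV2(input_shape=(224, 224, 3), include_top=True)
print(full_model.output.shape)  # (None, 1000): one score per ImageNet class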

Let's take a look at the shape of the output of our base model:

print(base_model.output.shape)

The result is as follows:

(None, 7, 7, 1280)

The first dimension is undefined (None) and denotes the batch size or, in other words, the number of input images. Suppose we simultaneously pass a stack of 10 images to the network; then the output would have a shape of (10, 7, 7, 1280), and the first dimension of the tensor would correspond to the index of the input image.
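
We can check this with a stack of dummy images (purely illustrative):

import numpy as np

# A batch of 10 dummy images, just to demonstrate the output shape
batch = np.zeros((10, 224, 224, 3), dtype="float32")
features = base_model.predict(batch)
print(features.shape)  # (10, 7, 7, 1280)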

The next two dimensions are the spatial size of the feature map (its height and width), and the last is the number of channels. In the original model, this output represents features extracted from the input images, which are later used to classify those images into the ImageNet classes.

These features are a good enough representation of natural images for the network to classify the images of ImageNet based on them. Let's try to use the same features to classify the types and breeds of our pets. To do this, we will first prepare a classifier in the next section.
