How? An overview

How should we use transfer learning? There are two typical ways to go about it. The first, and less time-consuming, way is to use what is known as a pre-trained model, that is, a model that has previously been trained on a large-scale dataset, for example, the ImageNet dataset. These pre-trained models are readily available across different deep learning frameworks, in collections often referred to as "model zoos". The choice of a pre-trained model depends largely on the task we currently want to solve and on the size of our dataset. After choosing a model, we can use all of it, or parts of it, as the initialization for the model that will solve our actual task.
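As a minimal sketch of this first approach, the following snippet loads a ResNet-18 pre-trained on ImageNet from torchvision's model zoo. The use of PyTorch and torchvision here is an assumption for illustration; every major framework offers an equivalent mechanism:

```python
# A minimal sketch, assuming PyTorch and torchvision are installed.
import torchvision.models as models

# Download ResNet-18 weights pre-trained on ImageNet from the
# torchvision "model zoo" (cached locally after the first call).
model = models.resnet18(pretrained=True)

# Inspect the architecture to decide which parts to reuse.
print(model)
```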

The other, less common, way is to pretrain the model ourselves. This is typically necessary when the available pre-trained networks are not suitable for the specific problem, and we have to design the network architecture ourselves. Naturally, this requires more time and effort, both to design the model and to prepare the dataset. In some cases, the dataset used to pretrain the network can even be synthetic, generated by computer graphics engines such as 3D Studio Max or Unity, or by other convolutional neural networks, such as GANs. A model pretrained on such virtual data can then be fine-tuned on real data, and it can perform as well as a model trained solely on real data.

If, for example, we want to discriminate between cats and dogs and we do not have enough data, we can download a network trained on ImageNet from a model zoo and reuse the weights of all but its last layer. The last layer has to be adjusted so that its size matches the number of classes, in our case two, and its weights have to be reinitialized and trained. We then do what we call freezing of the layers that are not to be trained, by setting their learning rate to zero, or to a very small number (refer to the following figure). If a bigger dataset is available, we can also train the last three fully connected layers. Sometimes, the pre-trained network can be used just to initialize the weights, and the whole network is then trained normally, as sketched in the code below.
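The following is a minimal sketch of this cats-versus-dogs scenario, again assuming PyTorch and torchvision; data loading and the training loop are omitted. It replaces the final layer with a freshly initialized two-class head, freezes every other layer, and hands only the new layer's parameters to the optimizer:

```python
# A minimal sketch, assuming PyTorch and torchvision; the two-class
# setup (cats vs. dogs) is the example from the text.
import torch.nn as nn
import torch.optim as optim
import torchvision.models as models

model = models.resnet18(pretrained=True)

# "Freeze" all pre-trained layers: excluding them from gradient
# updates has the same effect as a learning rate of zero.
for param in model.parameters():
    param.requires_grad = False

# Replace the last layer so its size matches our number of classes
# (two); its weights are freshly initialized and will be trained.
model.fc = nn.Linear(model.fc.in_features, 2)

# Optimize only the new layer's parameters.
optimizer = optim.SGD(model.fc.parameters(), lr=0.01, momentum=0.9)

# Alternative: instead of freezing, skip the loop above and give the
# pre-trained layers a very small learning rate via parameter groups:
# optimizer = optim.SGD([
#     {"params": [p for name, p in model.named_parameters()
#                 if not name.startswith("fc")], "lr": 1e-4},
#     {"params": model.fc.parameters(), "lr": 1e-2},
# ], momentum=0.9)
```

Freezing via requires_grad=False is usually preferable to a zero learning rate in practice, since the frozen parameters are skipped in the backward pass entirely, which also makes training cheaper.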

Transfer learning works because the features computed in the initial layers are more general, capturing patterns such as edges and simple textures that look similar across tasks, while the features extracted in the top layers become increasingly specific to the problem we want to solve.

For a further look into how to use transfer learning, and a deeper understanding of the topic, let's take a look at the code.
