Transfer learning

The reusability of code is one of the fundamental concepts of object-oriented programming (OOP) and it is pretty popular in the software-engineering world. Similarly, transfer learning involves reusing a model built to achieve a specific task to solve another related task.

It is understandable that to achieve better performance measurements, ML models need to be trained on large amounts of labeled data. The availability of fewer amounts of data means less training and the result is a model with suboptimal performance.

Transfer learning attempts to solve the problems arising from the availability of fewer amounts of data by reusing the knowledge obtained by a different related model. Having fewer data points available to train a model should not impede building a better model, which is the core concept behind transfer learning. The following diagram is an illustration showing the purpose of transfer learning in an image recognition task that classifies dog and cat images:

In this task, a neural network model is involved with detecting the edges, color blob detection, and so on in the first few layers. Only at the progressive layers (maybe in the last few layers) does the model attempt to identify the facial features of dogs or cats in order to classify them as one of the targets (a dog or a cat).

It may be observed that the tasks of identifying edges and color blobs are not specific to cats' and dogs' images. The knowledge to infer edges or color blobs may be generally inferred even if a model is trained on non-dog or non-cat images. Eventually, if this knowledge is clubbed with knowledge derived from inferring cat faces versus dog faces, even if they are small in number, we will have a better model than the suboptimal model obtained by training on fewer images.

In the case of a dogs-cats classifier, first, a model is trained on a large set of images that are not confined to cats' and dogs' images. The model is then taken and the last few layers are retrained on the dogs' and cats' faces. The model, thus obtained, is then tested and used post evidencing performance measurements that are satisfactory.

The concept of transfer learning is used not just for image-related tasks. Another example of it being used is in natural language processing (NLP) where it can perform sentiment analysis on text data.

Assume a company that launched a new product has a concept that never existed before (say, for now, a flying car). The task is to analyze the tweets related to the new product and identify each of them as being of positive, negative, or neural sentiment. It may be observed that prior, labeled tweets are unavailable in the flying car's domain. In such cases, we can take a model built based on the labeled data of generic product reviews for several products and domains. We can reuse the model by supplementing it with flying-car-domain-specific terminology to avail a new model. This new model will be finally used for testing and deploying to analyze sentiment on the tweets obtained about the newly launched flying cars.

It is possible to achieve transfer learning through the following two ways:

By reusing one's own model
By reusing a pretrained model

Pretrained models are models built by various organizations or individuals as part of their research work or as part of a competition. These models are generally very complex and are trained on large amounts of data. They are also optimized to perform their tasks with high precision. These models may take days or weeks to train on modern hardware. Organizations or individuals often release these models under permissive license for reuse. Such pretrained models can be downloaded and reused through the transfer-learning paradigm. This will effectively make use of the vast existing knowledge that the pretrained models possess, which would otherwise be hard to attain for an individual with limited hardware resources and amounts of data to train.

There are several pretrained models made available by various parties. The following described are some of the popular pretrained models:

Inception-V3 model: This model has been trained on ImageNet as part of a large visual recognition challenge. The competition required the participants to classify a given image into one of 1,000 classes. Some of the classes include the names of animals and object names.

MobileNet: This pretrained model has been built by Google and it is meant to perform object detection using the ImageNet database. The architecture is designed for mobiles.
VCG Face: This is a pretrained model built for face recognition.
VCG 16: This is a pretrained model trained on the MS COCO dataset. This one accomplishes image captioning; that is, given an input image, it generates a caption describing the image's contents.
Google's Word2Vec model and Stanford's GloVe model: These pretrained models take text as input and produce word vectors as output. Distributed word vectors offer one form of representing documents for NLP or ML applications.

Now that we have a basic understanding of various possible ML methods, in the next section, we focus on quickly reviewing the key terminology used in ML.

Table of Contents for Transfer learning

Create new playlist

Sign In

Sign Up

Table of Contents for
Transfer learning