Transfer learning strategies

Let's start by looking at a formal definition of transfer learning and then utilize it to understand different strategies. In their paper, A Survey on Transfer Learning (https://www.cse.ust.hk/~qyang/Docs/2009/tkde_transfer_learning.pdf), Pan and Yang use domains, tasks, and marginal probabilities to present a framework for understanding transfer learning. The framework is defined as follows:

A domain, D, is defined as a two-element tuple consisting of the feature space, χ, and the marginal probability distribution, P(X), where X denotes a sample of data points.

Here, X = {x1, x2, ..., xn}, with xi as a specific vector and X ∈ χ. Thus, the domain can be written as D = {χ, P(X)}.

A task, T, on the other hand, can be defined as a two-element tuple of the label space, γ, and the objective function, f. The objective function can also be denoted as P(γ|X) from a probabilistic viewpoint. Thus, the task can be written as T = {γ, f}.

Using this framework, we can define transfer learning as a process aimed at improving the target objective function, fT (or target task, TT), in the target domain, DT, using knowledge from the source task, TS, in the source domain, DS. This leads to the following four scenarios:

  • Feature space: The feature spaces of the source and target domains are different from each other, such that χs ≠ χt. For instance, if our tasks are related to document classification, this scenario refers to source and target tasks in different languages.
  • Marginal probability: The marginal probabilities of the source and target domains are different, such that P(Xs) ≠ P(Xt). This scenario is also known as domain adaptation.
  • Label space: The label spaces of the source and target domains are different in this scenario, such that γs ≠ γt. This also usually implies the presence of scenario four, that is, different conditional probabilities.
  • Conditional probabilities: In this case, P(Ys|Xs) ≠ P(Yt|Xt), such that the conditional probabilities are different in the source and target domains (a small sketch of these scenarios follows this list).
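To make this framework concrete, the following is a minimal Python sketch (not from the paper; the domains, tasks, and label sets are hypothetical) that encodes a domain as a (feature space, marginal distribution) pair and a task as a (label space, conditional distribution) pair, and then checks which of the four scenarios hold for a toy cross-lingual document-classification setup:

```python
# Minimal, illustrative encoding of Pan and Yang's framework; the actual
# distributions are stood in for by simple string identifiers.
from collections import namedtuple

Domain = namedtuple("Domain", ["feature_space", "marginal"])  # D = {χ, P(X)}
Task = namedtuple("Task", ["label_space", "conditional"])     # T = {γ, P(γ|X)}

# Hypothetical setup: English movie reviews (source) vs. French reviews (target).
source_domain = Domain(feature_space="english_bag_of_words", marginal="P_source(X)")
target_domain = Domain(feature_space="french_bag_of_words", marginal="P_target(X)")
source_task = Task(label_space=frozenset({"positive", "negative"}),
                   conditional="P_source(Y|X)")
target_task = Task(label_space=frozenset({"positive", "neutral", "negative"}),
                   conditional="P_target(Y|X)")

def transfer_scenarios(d_s, d_t, t_s, t_t):
    """Report which of the four mismatch scenarios apply."""
    return {
        "different feature spaces": d_s.feature_space != d_t.feature_space,
        "different marginal probabilities": d_s.marginal != d_t.marginal,
        "different label spaces": t_s.label_space != t_t.label_space,
        "different conditional probabilities": t_s.conditional != t_t.conditional,
    }

print(transfer_scenarios(source_domain, target_domain, source_task, target_task))
```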

Transfer learning, as we have seen so far, is the ability to utilize existing knowledge from the source learner in the target task. During the process of transfer learning, the following three important questions must be answered:

  • What to transfer: This is the first and most important step in the whole process. We try to identify which part of the knowledge can be transferred from the source to the target in order to improve the performance of the target task. In doing so, we try to establish which portion of the knowledge is source-specific and what is common between the source and the target.
  • When to transfer: There can be scenarios where transferring knowledge for the sake of it makes matters worse rather than improving anything (this is also known as negative transfer). We should aim at utilizing transfer learning to improve target task performance/results, not degrade them, so we need to be careful about when to transfer and when not to (a rough check is sketched after this list).
  • How to transfer: Once the what and when have been answered, we can proceed toward identifying ways of actually transferring the knowledge across domains/tasks. This involves changes to existing algorithms and different techniques, which we will cover in later sections of this chapter. Specific use cases are also lined up in the next section for a better understanding of how to transfer.
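As a rough illustration of the when to transfer question, here is a hedged sketch (synthetic data; scikit-learn and NumPy are assumed to be installed) that compares a baseline trained only on the scarce target data against a model that first trains on the plentiful source data and then continues on the target data. If the transfer variant scores worse on held-out target data, we are likely looking at negative transfer:

```python
# Synthetic check for negative transfer: target-only baseline vs. source-then-target.
import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import accuracy_score

rng = np.random.RandomState(0)

# Source domain: plentiful data with its own decision rule.
X_src = rng.normal(0.0, 1.0, size=(2000, 20))
y_src = (X_src[:, 0] + X_src[:, 1] > 0).astype(int)

# Target domain: scarce data with a related but different decision rule.
X_tgt = rng.normal(0.5, 1.0, size=(100, 20))
y_tgt = (X_tgt[:, 0] - 0.5 * X_tgt[:, 1] > 0).astype(int)
X_test = rng.normal(0.5, 1.0, size=(500, 20))
y_test = (X_test[:, 0] - 0.5 * X_test[:, 1] > 0).astype(int)

# Baseline: trained on target data only.
baseline = SGDClassifier(random_state=0).fit(X_tgt, y_tgt)

# Transfer: fit on source first, then continue training on target
# (incremental updates stand in for "reusing source knowledge").
transfer = SGDClassifier(random_state=0)
transfer.partial_fit(X_src, y_src, classes=[0, 1])
transfer.partial_fit(X_tgt, y_tgt)

acc_base = accuracy_score(y_test, baseline.predict(X_test))
acc_xfer = accuracy_score(y_test, transfer.predict(X_test))
print(f"target-only: {acc_base:.3f}, with transfer: {acc_xfer:.3f}")
if acc_xfer < acc_base:
    print("Transfer hurt performance on this run, a sign of negative transfer.")
```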

Grouping these techniques helps us understand their overall characteristics and provides a better framework for utilizing them. Transfer learning methods can be categorized based on the type of traditional ML algorithms involved, as follows:

  • Inductive transfer: In this scenario, the source and target domains are the same, yet the source and target tasks are different from each other. The algorithms try to utilize the inductive biases of the source domain to help improve the target task. Depending upon whether the source domain contains labeled data or not, this can be further divided into two subcategories, similar to multitask learning and self-taught learning, respectively.
  • Unsupervised transfer: This setting is similar to inductive transfer, with a focus on unsupervised tasks in the target domain. The source and target domains are similar, but the tasks are different. In this scenario, labeled data is unavailable in either of the domains.
  • Transductive transfer: In this scenario, there are similarities between the source and target tasks, but the corresponding domains are different. In this setting, the source domain has a lot of labeled data, while the target domain has none. This can be further classified into subcategories, referring to settings where either the feature spaces or the marginal probabilities are different (a hedged sketch of one such setting follows this list).
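A common recipe in the transductive setting, where only the marginal distributions differ (P(Xs) ≠ P(Xt)) and the target domain is unlabeled, is importance weighting of source instances. The following hedged sketch (synthetic data; scikit-learn is assumed) trains a domain classifier to separate source from target samples and uses its probability ratio to reweight the labeled source data toward the target distribution:

```python
# Importance weighting for covariate shift: labeled source, unlabeled target.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(42)
X_src = rng.normal(0.0, 1.0, size=(1000, 5))   # labeled source samples
y_src = (X_src[:, 0] > 0).astype(int)
X_tgt = rng.normal(1.0, 1.0, size=(1000, 5))   # unlabeled target samples

# 1) Domain classifier: distinguish source (0) from target (1) samples.
X_dom = np.vstack([X_src, X_tgt])
y_dom = np.concatenate([np.zeros(len(X_src)), np.ones(len(X_tgt))])
domain_clf = LogisticRegression(max_iter=1000).fit(X_dom, y_dom)

# 2) Importance weights ~ p(target | x) / p(source | x): source instances that
#    look like target data get larger weights.
p_tgt = domain_clf.predict_proba(X_src)[:, 1]
weights = p_tgt / np.clip(1.0 - p_tgt, 1e-6, None)

# 3) Train the actual task model on the reweighted, labeled source data.
task_clf = LogisticRegression(max_iter=1000).fit(X_src, y_src, sample_weight=weights)
print("trained task model on", len(X_src), "reweighted source instances")
```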

The three transfer categories discussed above outline different settings in which transfer learning can be applied and studied in detail. To answer the question of what to transfer across these categories, some of the following approaches can be applied:

  • Instance transfer: Reusing knowledge from the source domain directly in the target task is usually an ideal scenario. In most cases, the source domain data cannot be reused as is. Rather, there are certain instances from the source domain that can be reused along with the target data to improve results. In the case of inductive transfer, modifications of AdaBoost, such as TrAdaBoost by Dai and co-authors, help utilize training instances from the source domain to improve the target task.
  • Feature-representation transfer: This approach aims to minimize domain divergence and reduce error rates by identifying good feature representations that can be utilized from the source to the target domain. Depending upon the availability of labeled data, supervised or unsupervised methods may be applied for feature-representation-based transfers.
  • Parameter transfer: This approach works on the assumption that the models for related tasks share some parameters or a prior distribution of hyperparameters. Unlike multitask learning, where both the source and target tasks are learned simultaneously, in transfer learning we may apply additional weight to the loss of the target domain to improve overall performance (see the Keras sketch after this list).
  • Relational-knowledge transfer: Unlike the preceding three approaches, relational-knowledge transfer attempts to handle non-IID data, that is, data that is not independent and identically distributed. In other words, data where each data point has a relationship with other data points; for instance, social network data utilizes relational-knowledge-transfer techniques.
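To connect the feature-representation and parameter transfer ideas to the deep learning setting that the rest of the chapter builds toward, here is a minimal Keras sketch. It assumes TensorFlow is installed and uses an ImageNet-pretrained VGG16 as the source model; the five-class target task and the target_train_ds dataset are hypothetical. Freezing the pretrained convolutional base and training a new head reuses the learned feature representations, while unfreezing and fine-tuning the top layers then adapts the source parameters to the target task:

```python
import tensorflow as tf
from tensorflow.keras.applications import VGG16

# Pretrained convolutional base from the source task (ImageNet).
base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # feature-representation transfer: reuse frozen features

# New classification head for a hypothetical 5-class target task.
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(5, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(target_train_ds, epochs=5)  # train only the new head on target data

# Parameter transfer flavor: unfreeze the last few layers of the base and
# fine-tune them with a small learning rate so source knowledge is adapted,
# not overwritten.
base.trainable = True
for layer in base.layers[:-4]:
    layer.trainable = False
model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(target_train_ds, epochs=5)
```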

In this section, we studied different strategies for performing transfer learning under different contexts and settings in a very generic manner. Let's now utilize this understanding and learn how transfer learning is applied in the context of deep learning.
