Relation networks in one-shot learning

A relation network consists of two important functions: the embedding function, denoted by f_φ, and the relation function, denoted by g_φ. The embedding function is used for extracting features from the input. If our input is an image, then we can use a convolutional network as our embedding function, which will give us the feature vectors/embeddings of the image. If our input is text, then we can use an LSTM network to get the embeddings of the text.
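As a minimal sketch of what an embedding function does, the toy f_phi below flattens an image and applies a linear projection followed by a nonlinearity. The weight matrix and image sizes here are arbitrary assumptions for illustration; a real relation network would learn a convolutional network in this role:

```python
import numpy as np

rng = np.random.default_rng(0)

def f_phi(image, weights):
    """Toy embedding function: flatten the input and apply a learned
    linear projection with a tanh nonlinearity. A real relation
    network would use a CNN (for images) or an LSTM (for text)."""
    return np.tanh(image.reshape(-1) @ weights)

# Hypothetical 8x8 grayscale image and a 64 -> 16 projection matrix.
W = rng.normal(size=(64, 16))
image = rng.normal(size=(8, 8))

embedding = f_phi(image, W)
print(embedding.shape)  # (16,)
```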

As we know, in one-shot learning, we have only a single example per class. For example, let's say our support set contains three classes with one example per class. As shown in the following diagram, we have a support set containing three classes, {lion, elephant, dog}:

And let's say we have a query image, x_j, as shown in the following diagram, and we want to predict the class of this query image:

First, we take each image, x_i, from the support set and pass it to the embedding function, f_φ(x_i), to extract its features. Since our support set contains images, we can use a convolutional network as our embedding function for learning the embeddings. The embedding function will give us the feature vector of each of the data points in the support set. Similarly, we will learn the embedding of our query image, x_j, by passing it to the embedding function, f_φ(x_j).

So, once we have the feature vectors of the support set, f_φ(x_i), and the query image, f_φ(x_j), we combine them using an operator, Z. Here, Z can be any combination operator; we use concatenation as the operator for combining the feature vectors of the support and query sets, that is, Z(f_φ(x_i), f_φ(x_j)).
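The combination step above can be sketched directly; assuming 16-dimensional embeddings as stand-ins for f_φ(x_i) and f_φ(x_j), the operator Z is just a concatenation:

```python
import numpy as np

def Z(support_embedding, query_embedding):
    """Combination operator: concatenate the support and query
    feature vectors into one vector for the relation function."""
    return np.concatenate([support_embedding, query_embedding])

support_emb = np.ones(16)   # stand-in for f_phi(x_i)
query_emb = np.zeros(16)    # stand-in for f_phi(x_j)

combined = Z(support_emb, query_emb)
print(combined.shape)  # (32,)
```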

As shown in the following figure, we will combine the feature vectors of the support set, f_φ(x_i), and the query image, f_φ(x_j). But what is the use of combining them like this? It helps us to understand how the feature vector of an image in the support set is related to the feature vector of the query image. In our example, it helps us to understand how the feature vectors of the images of a lion, an elephant, and a dog are related to the feature vector of the query image:

But how can we measure this relatedness? This is why we use a relation function, g_φ. We pass these combined feature vectors to the relation function, which will generate a relation score ranging from 0 to 1, representing the similarity between the samples in the support set, x_i, and the sample in the query set, x_j.

The following equation shows how we compute the relation score, r_ij, in a relation network:

r_ij = g_φ(Z(f_φ(x_i), f_φ(x_j)))

In this equation, r_ij denotes the relation score representing the similarity between each of the classes in the support set and the query image. Since we have three classes in the support set and one image in the query set, we will have three scores indicating how similar each of the three classes is to the query image.
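Putting the pieces together, the sketch below computes r_ij = g_φ(Z(f_φ(x_i), f_φ(x_j))) for the three support classes from the example. All weights here are untrained random placeholders (an assumption for illustration; in a real relation network, f_φ and g_φ are learned end to end), so the scores are meaningless numbers, but the shapes and the 0-to-1 range of the relation score are exactly as described:

```python
import numpy as np

rng = np.random.default_rng(1)

def f_phi(x, W):
    """Toy embedding function (stand-in for a CNN)."""
    return np.tanh(x @ W)

def Z(a, b):
    """Combination operator: concatenation."""
    return np.concatenate([a, b])

def g_phi(z, W1, W2):
    """Toy relation function: a two-layer MLP whose sigmoid
    output is a relation score between 0 and 1."""
    h = np.maximum(z @ W1, 0.0)             # ReLU hidden layer
    return 1.0 / (1.0 + np.exp(-(h @ W2)))  # sigmoid -> score in (0, 1)

# Hypothetical untrained parameters: 64-dim inputs, 16-dim embeddings.
W_embed = rng.normal(size=(64, 16))
W1 = rng.normal(size=(32, 8))
W2 = rng.normal(size=(8,))

# One-shot support set: one example per class, plus one query.
support = {c: rng.normal(size=64) for c in ["lion", "elephant", "dog"]}
query = rng.normal(size=64)

# r_ij = g_phi(Z(f_phi(x_i), f_phi(x_j))) for each support class.
q_emb = f_phi(query, W_embed)
scores = {c: float(g_phi(Z(f_phi(x, W_embed), q_emb), W1, W2))
          for c, x in support.items()}

# The predicted class is the one with the highest relation score.
predicted = max(scores, key=scores.get)
```

With three support classes and one query, the loop yields exactly three relation scores, and the query is assigned to the class whose score is highest.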

The overall representation of a relation network in a one-shot learning setting is shown in the following diagram:
