Algorithm

The prototypical network algorithm proceeds as follows:

Let's say we have a dataset, D = {(x₁, y₁), (x₂, y₂), ..., (x_N, y_N)}, where each x_i is a feature vector and y_i is its class label.
Since we perform episodic training, in each episode we randomly sample n data points per class from our dataset, D, to form our support set, S.
Similarly, we sample n data points per class, none of which appear in the support set, to form our query set, Q.
We compute the embeddings of the data points in our support set using our embedding function, f_φ(·), where φ denotes its learnable parameters. The embedding function can be any feature extractor: say, a convolutional network for images or an LSTM network for text.
Once we have the embedding of each data point, we compute the prototype, c_k, of each class k by taking the mean of the embeddings of the support points in that class, S_k:

c_k = (1 / |S_k|) Σ_{(x_i, y_i) ∈ S_k} f_φ(x_i)

Similarly, we compute the embeddings of the query set using the same function, f_φ.
We calculate the Euclidean distance, d, between each query point embedding and each class prototype.
We predict the probability, p_φ(y = k | x), that a query point x belongs to class k by applying softmax over the negative distances, so that the closest prototype receives the highest probability:

p_φ(y = k | x) = exp(-d(f_φ(x), c_k)) / Σ_{k'} exp(-d(f_φ(x), c_{k'}))

We compute the loss function, J(φ), as the negative log probability of the true class k of the query point, J(φ) = -log p_φ(y = k | x), and we minimize it with respect to φ using stochastic gradient descent.
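
The steps above can be sketched in plain Python for a single episode. This is a minimal illustration under stated assumptions, not a reference implementation: the identity function `embed` stands in for the embedding function, the helper names (`sample_episode`, `prototypes`, `class_probabilities`, `episode_loss`) and the toy two-class dataset are made up for the example, and no gradient step is taken because the stand-in embedding has no parameters to update.

```python
import math
import random
from collections import defaultdict


def sample_episode(dataset, n_support, n_query, rng):
    """Per class, draw n_support support points and n_query disjoint query points."""
    by_class = defaultdict(list)
    for x, y in dataset:
        by_class[y].append(x)
    support, query = [], []
    for label, points in by_class.items():
        # Sampling without replacement keeps support and query disjoint.
        chosen = rng.sample(points, n_support + n_query)
        support += [(x, label) for x in chosen[:n_support]]
        query += [(x, label) for x in chosen[n_support:]]
    return support, query


def embed(x):
    """Hypothetical stand-in for the embedding function: the identity map."""
    return list(x)


def prototypes(support):
    """Mean embedding of the support points of each class."""
    grouped = defaultdict(list)
    for x, label in support:
        grouped[label].append(embed(x))
    return {
        label: [sum(dim) / len(vecs) for dim in zip(*vecs)]
        for label, vecs in grouped.items()
    }


def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def class_probabilities(x, protos):
    """Softmax over negative distances from the query embedding to each prototype."""
    z = embed(x)
    logits = {label: -euclidean(z, c) for label, c in protos.items()}
    top = max(logits.values())  # subtract the max for numerical stability
    exps = {label: math.exp(v - top) for label, v in logits.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


def episode_loss(query, protos):
    """Average negative log probability of the true class over the query set."""
    losses = [-math.log(class_probabilities(x, protos)[y]) for x, y in query]
    return sum(losses) / len(losses)


rng = random.Random(0)
# Toy data (an assumption for this sketch): two well-separated 2-D classes.
dataset = [((rng.gauss(0, 0.1), rng.gauss(0, 0.1)), "a") for _ in range(10)]
dataset += [((rng.gauss(5, 0.1), rng.gauss(5, 0.1)), "b") for _ in range(10)]

support, query = sample_episode(dataset, n_support=3, n_query=2, rng=rng)
protos = prototypes(support)
loss = episode_loss(query, protos)
print(f"episode loss: {loss:.4f}")
```

In actual training, `embed` would be a neural network and the episode loss would be backpropagated through it. Here the loss is small simply because the two toy classes are far apart.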
