Exploring triplet loss

In this section, we shall look at the finer details of the cost function with a triplet loss; we will train the network weights using this triplet cost function, or triplet loss. The idea was introduced by Schroff et al. in the 2015 paper, FaceNet: A Unified Embedding for Face Recognition and Clustering. We shall also explore how to choose the triplets so that our network will learn to achieve really high accuracy.

We begin by choosing the base or anchor image, which will be used as the reference for the other comparisons. The base image is as follows:

We shall now select a different image that represents the same person; this is known as the positive image, shown as follows:

As we have seen in the previous section, we want the similarity function d to be as close to zero as possible. It is mathematically expressed as:
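d(A, P) ≈ 0

Here, A denotes the encoding of the anchor image and P the encoding of the positive image.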

Having this value close to zero means that the encoded values for the anchor image and the positive image will be almost the same. This also implies that the images are of the same person, which is exactly what we want in this case.

To understand this better, let's keep the anchor image constant and change the positive image to that of a different person, which is known as the negative image:

Here the distance function or the similarity function will produce a much larger value:
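d(A, N) >> 0

Here, N denotes the encoding of the negative image.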

The reason we want a big value here is that we want the encoded values of the anchor image and the negative image to be different, which implies that the images are of different people. This helps the neural network figure out the difference between the two.

The name triplet derives from the fact that we now require three images: the anchor image, the positive image, and the negative image, as compared to the classical case where we use two images. To understand this better, for one sample, let's use three images and feed them to a convolutional architecture; but before we jump into this, let's take a look at the definition of the triplet loss:
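d(A, P) → 0, d(A, N) > 0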

We have two similarity functions: the positive similarity function, d(A, P), and the negative similarity function, d(A, N). The negative similarity function tends to be greater than zero, while the positive similarity function tends to go to zero.

If we replace the zero with these functions, we will have the following equation:
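d(A, P) ≤ d(A, N)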

So, the similarity function of the positive case will have to be smaller than the similarity function of the negative case. This makes perfect sense, since we want the distance between images of the same person to be considerably smaller than the distance between images of different people.

Let's solve this equation mathematically:
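d(A, P) - d(A, N) ≤ 0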

This holds mathematically true, and we have an equation that satisfies our conditions. 

In practice, a neural network could cheat by finding weights such that both distances have the same value. The neural network usually figures out such weights within a few iterations, which causes the distance of the positive case to be the same as the distance of the negative case. And if they are the same, then the left-hand side of our equation will be equal to zero. In this situation, even though our condition is satisfied, the network really learns nothing.

To solve this, we shall add a constant value, α (alpha), as follows:
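d(A, P) - d(A, N) + α ≤ 0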

The neural network now has to work harder to produce at least a minimum difference of α between the two similarity functions. If the condition is not satisfied, the neural network will have to keep trying for any number of iterations. Solving this equation further, we get the following:
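d(A, P) + α ≤ d(A, N)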

Let us assume that the positive case produces an output of 0.5 and the α value is equal to 0.2:
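0.5 - d(A, N) + 0.2 ≤ 0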

The neural network will try to learn weights that produce a suitable distance value for the negative image. Let's assume that it tries to cheat the equation the first time by producing 0.5, the same value as the positive image distance. The equation will not be satisfied, as the final answer will be greater than zero, which is the exact opposite of what we require:
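0.5 - 0.5 + 0.2 = 0.2 > 0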

In the following iteration, it assumes a negative distance value of 0.7, which perfectly satisfies the condition:
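0.5 - 0.7 + 0.2 = 0 ≤ 0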

For the next iterations, the network can keep choosing values for the positive distance, such as 0.5 and 0.3, until the final value obtained is less than zero.

This step-by-step, iteration-by-iteration approach helps us increase the negative distance function, since we want the difference between images of different people to be greater, and decrease the positive distance function, to reduce the difference between images of the same person.
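To make this concrete, the following is a minimal Python/NumPy sketch of the triplet loss described above. It is only an illustrative example, not the exact implementation used later in this book: the encoding vectors and the margin value of 0.2 are made-up numbers, and d is assumed to be the squared Euclidean distance between encodings, as in the FaceNet paper.

import numpy as np

def distance(x, y):
    # d(x, y): squared Euclidean distance between two encodings
    return float(np.sum((x - y) ** 2))

def triplet_loss(anchor, positive, negative, alpha=0.2):
    # max(d(A, P) - d(A, N) + alpha, 0): the loss is zero only when the
    # negative is at least alpha farther from the anchor than the positive
    d_pos = distance(anchor, positive)   # d(A, P), should tend to zero
    d_neg = distance(anchor, negative)   # d(A, N), should stay large
    return max(d_pos - d_neg + alpha, 0.0)

# Made-up encodings; in practice these come out of the convolutional network
anchor   = np.array([0.10, 0.50, 0.30])
positive = np.array([0.12, 0.48, 0.31])  # same person, close to the anchor
negative = np.array([0.90, 0.10, 0.70])  # different person, far from the anchor

print(triplet_loss(anchor, positive, negative))  # 0: the condition is satisfied
print(triplet_loss(anchor, negative, positive))  # large: the network must keep learning

During training, this value is averaged over many such triplets and minimized, which pushes the network toward satisfying d(A, P) + α ≤ d(A, N) for every triplet.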
