Applying content cost function

In this section, we are going to define the content cost function and then formalize it a bit more by calculating its derivative, which we will use in later applications as well. We will use transfer learning, or a pre-trained convolutional architecture such as VGG-16, but this time in a different way. Instead of using the model to make predictions through its softmax layer, we will use the layers' knowledge, or their ability to capture the features of images, as depicted in the following diagram:

As we saw in the first section, What are convolution network layers learning, the first layers of the neural network capture rather low-level features, such as shapes or maybe colors, and as we move deeper, the layers detect more high-level features and, at the same time, capture a bigger part of the image, as shown in the preceding diagram.

Suppose that we feed both our content image and our generated image to this neural network and, for each of them, store the corresponding activation values, as shown in the following diagram:

A pre-trained network, such as VGG-16, was trained for a long time on millions of images, so the layers of the network have learned to capture a vast, complex set of features during that time.

Now, knowing this, if the activation values of a specific layer are quite similar between those two images, then we can say that the images are also similar. In some ways this is true, but of course it highly depends on the layer as well.

For early layers, even if the activation values are the same between the images, we can't conclude that the images are similar, since those layers capture only low-level features. Basically, the fact that two images share some shapes and colors does not make them similar. But as we go deeper, the layers detect more high-level features and bigger parts of the images, and that's when we can say that two images with the same activation values really are similar.

Remember, we want the generated image to be similar to the content image, but also to the style image at the same time. So we don't want the generated image to be identical to the content image; we need to leave some room for the style image as well. Therefore, it is not in our best interest to go to the deepest layer of the network, because that would make the generated image identical to the content image. We need to go deep, but not all the way to the end.

So, in our example here, this layer could be okay. We will take the activations in layer l for the content and generated images and, depending on how similar or different they are, use that feedback to update the pixels of the generated image to be more similar to the content image. But again, we'll leave some room for the style image, and that's why we won't go that deep:

Let's try to formalize this a bit further. So, we had this general cost function formula:

J(G) = α × J_content(C, G) + β × J_style(S, G)

This is simply a weighted sum of the content cost function and the style cost function, which we will see in the next section. The coefficients α and β simply affect how similar you want the generated image to be to each of these.
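As a minimal sketch, the weighted sum above can be written as a one-line function. The name `total_cost` and the values of `alpha` and `beta` are illustrative assumptions, not values from the text:

```python
# Overall style-transfer cost: a weighted sum of content and style costs.
# alpha and beta are illustrative defaults, not prescribed by the text.
def total_cost(j_content, j_style, alpha=10.0, beta=40.0):
    return alpha * j_content + beta * j_style

print(total_cost(2.0, 0.5))  # 10*2.0 + 40*0.5 = 40.0
```

Raising α pulls the generated image toward the content image; raising β pulls it toward the style image.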

So, the cost function of the content image will look like this, where the squared difference is divided by 4 multiplied by the width, height, and number of channels of the layer that we choose to compare:

J_content(C, G) = (1 / (4 × w × h × c)) × Σ (a^(l)(C) − a^(l)(G))²

As we go deeper, the width and height tend to shrink, while the number of channels increases. Then we have the squared difference between the activation values of the content and generated images, which is quite similar to what we have seen so far.
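The formula above can be sketched directly with NumPy. This is a simplified illustration, assuming the two activation tensors share the shape (height, width, channels); the function name `content_cost` is our own:

```python
import numpy as np

def content_cost(a_C, a_G):
    """Content cost between the activations of the content image (a_C)
    and the generated image (a_G) at a chosen layer."""
    h, w, c = a_C.shape
    # Sum of squared differences, scaled by 1 / (4 * w * h * c).
    return np.sum((a_C - a_G) ** 2) / (4 * h * w * c)

rng = np.random.default_rng(0)
a_C = rng.standard_normal((4, 4, 3))
a_G = rng.standard_normal((4, 4, 3))
print(content_cost(a_C, a_G))      # some positive number
print(content_cost(a_C, a_C))      # identical activations -> 0.0
```

Note that the cost is zero exactly when the two activation tensors match, which is what drives the generated image toward the content image.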

Now, the cost function itself tells us how similar those two sets of activation values are, but in order to actually change the pixels, we need to turn this into feedback, and mathematically that is done by taking the derivative of the cost function. In this instance, it is quite simple: using basic calculus, the exponent 2 comes to the front, 2/4 simplifies to 1/2, the width, height, and number of channels stay unchanged because they are just constants, and what remains is simply the difference between the activation values:

∂J_content/∂a^(l)(G) = (1 / (2 × w × h × c)) × (a^(l)(G) − a^(l)(C))

So each time, we will use this derivative, and of course the derivative of the style cost function as well, to update the pixels of the image to look more similar, in this case, to the content image. In the next section, we will look at the style cost function.
