A faster way to transfer style

As you may have inferred from the title of this section, the big drawback of the approach introduced in the previous section is that the process requires iterative optimization, as summarized in the following figure:

This optimization is akin to training, in the sense that it performs many iterations to minimize the loss. It therefore typically takes a considerable amount of time, especially on a modest computer. As implied at the start of this book, we ideally want to restrict ourselves to performing inference on the edge, as it requires significantly less compute power and can run in near real time, allowing us to adopt it for interactive applications. Luckily for us, in their paper Perceptual Losses for Real-Time Style Transfer and Super-Resolution, J. Johnson, A. Alahi, and L. Fei-Fei describe a technique that decouples training (optimization) and inference for style transfer.

Previously, we described a network that took as its input a generated image, a style image, and a content image. The network minimized loss by iteratively adjusting the generated image using the loss functions for content and style; this gave us the flexibility to plug in any style and content image, but came at the cost of being computationally expensive, that is, slow. What if we sacrifice this flexibility for performance by restricting ourselves to a single style and, instead of performing the optimization to generate the image, train a CNN? The CNN would learn the style and, once trained, could generate a stylized image from a content image with a single pass through the network (inference). This is, in essence, what the paper Perceptual Losses for Real-Time Style Transfer and Super-Resolution describes, and it is the network we will use in this chapter.
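
To make this idea more concrete, here is a minimal Python (tf.keras) sketch of what such a feed-forward transformation network could look like. The layer counts and filter sizes are illustrative assumptions only, not the exact architecture described in the paper or the model we will convert later in this chapter:

import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters=128):
    # Two 3x3 convolutions with a skip connection, a common building block
    # in feed-forward style transfer networks
    shortcut = x
    x = layers.Conv2D(filters, 3, padding='same', activation='relu')(x)
    x = layers.Conv2D(filters, 3, padding='same')(x)
    return layers.Add()([shortcut, x])

def build_transform_net(input_shape=(256, 256, 3)):
    inputs = layers.Input(shape=input_shape)
    # Downsample the content image
    x = layers.Conv2D(32, 9, strides=1, padding='same', activation='relu')(inputs)
    x = layers.Conv2D(64, 3, strides=2, padding='same', activation='relu')(x)
    x = layers.Conv2D(128, 3, strides=2, padding='same', activation='relu')(x)
    # Residual blocks carry most of the stylization capacity
    for _ in range(5):
        x = residual_block(x, 128)
    # Upsample back to the input resolution
    x = layers.Conv2DTranspose(64, 3, strides=2, padding='same', activation='relu')(x)
    x = layers.Conv2DTranspose(32, 3, strides=2, padding='same', activation='relu')(x)
    outputs = layers.Conv2D(3, 9, padding='same', activation='sigmoid')(x)
    return tf.keras.Model(inputs, outputs, name='transform_net')

transform_net = build_transform_net()
transform_net.summary()

Once the weights of a network like this have been trained for one particular style, stylizing a new content image is nothing more than a forward pass through these layers.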

To better elucidate the difference between the previous approach and this approach, take a moment to review and compare the preceding figure with the following one:


Unlike the previous approach, where we optimized over a given set of content, style, and generated images and adjusted the generated image to minimize the loss, we now feed a CNN a set of content images and have the network generate the stylized images. We then apply the same loss functions described earlier for a single style but, instead of adjusting the generated image, we adjust the weights of the network using the gradients from the loss function. We repeat this until we have sufficiently minimized the mean loss across all of our content images.
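
The following sketch illustrates that training loop with tf.keras, reusing the transform_net from the earlier sketch. Everything here is an assumption for illustration, not the chapter's actual training code: the choice of VGG16 layers, the loss weights, and the names loss_net, gram_matrix, style_grams (the precomputed Gram matrices of the single style image), and train_step are all hypothetical, and VGG preprocessing is omitted for brevity:

import tensorflow as tf

# A fixed, pre-trained network used only to measure the perceptual losses;
# its weights are never updated
vgg = tf.keras.applications.VGG16(include_top=False, weights='imagenet')
vgg.trainable = False
content_layer = vgg.get_layer('block3_conv3').output
style_layers = [vgg.get_layer(name).output for name in
                ['block1_conv2', 'block2_conv2', 'block3_conv3', 'block4_conv3']]
loss_net = tf.keras.Model(vgg.input, [content_layer] + style_layers)

def gram_matrix(features):
    # Channel-wise correlations of the feature maps, used for the style loss
    b, h, w, c = tf.unstack(tf.shape(features))
    flat = tf.reshape(features, [b, h * w, c])
    return tf.matmul(flat, flat, transpose_a=True) / tf.cast(h * w * c, tf.float32)

optimizer = tf.keras.optimizers.Adam(1e-3)

@tf.function
def train_step(content_batch, style_grams, content_weight=1.0, style_weight=5.0):
    with tf.GradientTape() as tape:
        # The transform network generates the stylized images
        generated = transform_net(content_batch, training=True)
        gen_feats = loss_net(generated * 255.0)
        content_feats = loss_net(content_batch * 255.0)
        # Content loss: the generated images should keep the content images' features
        content_loss = tf.reduce_mean(tf.square(gen_feats[0] - content_feats[0]))
        # Style loss: the generated images' Gram matrices should match those of
        # the single style image (style_grams is assumed to be precomputed once)
        style_loss = tf.add_n([tf.reduce_mean(tf.square(gram_matrix(g) - s))
                               for g, s in zip(gen_feats[1:], style_grams)])
        loss = content_weight * content_loss + style_weight * style_loss
    # Crucially, the gradients update the network's weights, not the image
    grads = tape.gradient(loss, transform_net.trainable_variables)
    optimizer.apply_gradients(zip(grads, transform_net.trainable_variables))
    return loss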

Now, with our model trained, we can have our network stylize an image with a single pass, as shown here:
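
The figure shows this single forward pass; in code, it could look something like the following minimal sketch, again assuming the transform_net from the earlier sketches (or an equivalent saved Keras model) and a placeholder content image path:

import numpy as np
import tensorflow as tf

def stylize(model, image_path, size=(256, 256)):
    # Load the content image and scale it to the range the network was trained on
    img = tf.keras.preprocessing.image.load_img(image_path, target_size=size)
    x = tf.keras.preprocessing.image.img_to_array(img) / 255.0
    x = np.expand_dims(x, axis=0)   # add the batch dimension
    y = model.predict(x)            # a single forward pass, no optimization loop
    return np.clip(y[0] * 255.0, 0, 255).astype('uint8')

stylized = stylize(transform_net, 'content.jpg')  # 'content.jpg' is a placeholder path

Note that there is no optimization loop here: the stylized image is produced by a single call to the trained model, which is what makes this approach fast enough for interactive use.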

Over the last two sections, we have described, at a high level, how these networks work. Now, it's time to build an application that takes advantage of all of this. In the next section, we will quickly walk through converting the trained Keras model to Core ML before moving on to the main topic of this chapter: implementing custom layers for Core ML.
