Transfer learning

We have discussed how deep learning is fundamentally based on black-box models that learn to associate input patterns with specific classification or regression outcomes. The entire processing pipeline that is often needed to prepare the data for a specific detection task is absorbed by the complexity of the neural architecture. However, the price to pay for high accuracy is a proportionally large number of training samples. State-of-the-art visual networks are trained with millions of images and, obviously, each of them must be properly labeled. Even though there are many free datasets that can be employed to train several models, many specific scenarios require hard preparatory work that is sometimes very difficult to carry out.

Luckily, deep neural architectures are hierarchical models that learn in a structured way. As we have seen in the examples of deep convolutional networks, the first layers become progressively more sensitive to low-level features, while the higher ones concentrate on extracting more detailed, high-level features. In several tasks, it's reasonable to assume that a network trained, for example, on a large visual dataset (such as ImageNet or Microsoft COCO) could be reused to achieve a specialization in a slightly different task. This concept is known as transfer learning, and it's one of the most useful techniques when it's necessary to create state-of-the-art models with brand new datasets and specific objectives. For example, a customer can ask for a system that monitors a few cameras, with the goal of segmenting the images and highlighting the boundaries of specific targets.

The input is made up of video frames with the same geometric properties as the thousands of images employed to train very powerful models (for example, Inception, ResNet, or VGG); therefore, we can take a pre-trained model, remove the highest layers (normally dense ones ending in a softmax classification layer), and connect the flattening layer to an MLP that outputs the coordinates of the bounding boxes. The first part of the network can be frozen (its weights are not modified anymore), while SGD is applied to tune the weights of the newly added, specialized sub-network, as shown in the sketch below.
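As a quick illustration of how such a specialization might be wired up in Keras, consider the following sketch (not a definitive implementation): a VGG16 convolutional base from keras.applications is frozen and a small, hypothetical MLP head is attached to regress four bounding-box coordinates. The head size, input shape, and training call are illustrative assumptions rather than a prescribed recipe.

```python
# Sketch: reuse a pre-trained convolutional base and train only a new head.
from keras.applications import VGG16
from keras.layers import Dense, Flatten
from keras.models import Model

# Convolutional base trained on ImageNet, without the dense top layers
base = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))

# Freeze the pre-trained part: its weights are not modified anymore
for layer in base.layers:
    layer.trainable = False

# Flatten the last convolutional output and connect a new MLP head
x = Flatten()(base.output)
x = Dense(256, activation='relu')(x)          # head size is an assumption
outputs = Dense(4, activation='linear')(x)    # (x_min, y_min, x_max, y_max)

model = Model(inputs=base.input, outputs=outputs)

# Only the new sub-network is trained; SGD updates its weights
model.compile(optimizer='sgd', loss='mse')
# model.fit(X_frames, Y_boxes, epochs=10, batch_size=32)  # hypothetical data
```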

Clearly, such an approach can dramatically speed up the training process, because the most complex part of the model is already trained; it can also guarantee an extremely high accuracy (with respect to a naive solution), thanks to the optimization already performed on the original model. Obviously, the most natural question is: how does this method work? Is there any formal proof? Unfortunately, there are no mathematical proofs, but there's enough evidence to reassure us about the validity of this approach. Generally speaking, the goal of a neural training process is to specialize each layer so that it provides a more particular (detailed, filtered, and so on) representation to the following one. Convolutional networks are a clear example of this behavior, but the same is observable in MLPs as well. The analysis of very deep convolutional networks showed how the content remains visual until it reaches the flattening layer, where it's sent to a series of dense layers that are responsible for feeding the final softmax layer. In other words, the output of the convolutional block is a higher-level, segmented representation of the input, which is seldom affected by the specific classification problem. For this reason, transfer learning is generally sound and doesn't normally require a retraining of the lower layers.

However, it's difficult to know in advance which model can yield the best performance, and it's very useful to know which dataset has been used to train the original network. General-purpose datasets (for example, ImageNet) are very useful in many contexts, while specific ones (such as CIFAR-10 or Fashion MNIST) can be too restrictive. Luckily, Keras offers (in the keras.applications package) many models (even quite complex ones) that have been trained with the ImageNet dataset and that can be immediately employed in a production-ready application, as shown in the example below. Even if using them is extremely simple, doing so requires a deeper knowledge of this framework, which is beyond the scope of this book. I invite the reader interested in this topic to check the book Deep Learning with Keras, Gulli A., Pal S., Packt.
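To give a sense of how directly these pre-trained models can be used, here is a brief sketch that loads ResNet50 from keras.applications and classifies a single image; the file name is a placeholder assumption, and any other model in the package could be substituted.

```python
# Sketch: out-of-the-box inference with a pre-trained keras.applications model.
import numpy as np
from keras.applications.resnet50 import ResNet50, preprocess_input, decode_predictions
from keras.preprocessing import image

# Full model, including the ImageNet softmax top
model = ResNet50(weights='imagenet')

# Load and preprocess a single image to the network's expected input size
img = image.load_img('sample.jpg', target_size=(224, 224))  # placeholder path
x = np.expand_dims(image.img_to_array(img), axis=0)
x = preprocess_input(x)

# Print the top-3 ImageNet class predictions
preds = model.predict(x)
print(decode_predictions(preds, top=3)[0])
```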
