Deep convolutional networks

In the previous chapter, Chapter 9, Neural Networks for Machine Learning, we saw how a multi-layer perceptron can achieve very high accuracy when working with an image dataset that is not very complex, such as the MNIST handwritten digits one. However, since the layers are fully connected, the images, which in general are three-dimensional structures (width × height × channels), must be flattened into one-dimensional arrays, where the geometric properties are irreversibly lost. With more complex datasets, where the distinction between classes depends on finer details and on their relationships, this approach can yield moderate accuracies, but it can never reach the precision required by production-ready applications.
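The loss of geometric structure can be seen in a minimal NumPy sketch (the toy 4×4 image is purely illustrative): after flattening, pixels that were vertical neighbors are no longer adjacent in the array.

```python
import numpy as np

# A toy 4x4 single-channel "image" (width x height x channels)
image = np.arange(16, dtype=np.float32).reshape(4, 4, 1)

# Flattening for a fully connected layer destroys the 2D neighborhood
flat = image.reshape(-1)
print(flat.shape)  # (16,)

# Pixels (0, 0) and (1, 0) are vertical neighbors in the image,
# but end up 4 positions apart in the 1D array, so the geometry
# is no longer explicit for the network
print(flat[0], flat[4])
```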

The conjunction of neuroscientific studies and image processing techniques suggested experimenting with neural networks where the first layers work with bidimensional structures (without the channels), trying to extract a hierarchy of features that are strictly dependent on the geometric properties of the image. In fact, as confirmed by neuroscientific research about the visual cortex, a human being doesn't decode an image directly. The process is sequential and starts by detecting low-level elements such as lines and orientations; progressively, it proceeds by focusing on sub-properties that define more and more complex shapes, different colors, structural features, and so on, until the amount of information is enough to resolve any possible ambiguity (for further scientific details, I recommend the book Vision and Brain: How We Perceive the World, Stone J. V., MIT Press).

For example, we can imagine the decoding process of an eye as a sequence made up of these filters (of course, this is only a didactic example): directions (dominant horizontal dimension), a central circle inside an ellipsoidal shape, a darker center (pupil) and a clear background (bulb), a smaller darker circle in the middle of the pupil, the presence of eyebrows, and so on. Even if the process is not biologically correct, it can be considered as a reasonable hierarchical process where a higher-level sub-feature is obtained after a lower-level filtering.

This approach has been synthesized using the bidimensional convolutional operator, which was already known as a powerful image processing tool. However, in this case, there's a very important difference: the structure of the filters is not pre-imposed but learned by the network using the same back-propagation algorithm employed for MLPs. In this way, the model can adapt the weights considering a final goal (which is the classification output), without taking into account any pre-processing steps. In fact, a deep convolutional network, even more than an MLP, is based on the concept of end-to-end learning, which is a different way to express what we have described before. The input is the source; in the middle, there's a flexible structure; and, at the end, we define a global cost function, measuring the accuracy of the classification. The learning process has to back-propagate the errors and correct the weights to reach a specific goal, but we don't know exactly how this process works. What we can easily do is analyze the structure of the filters at the end of the learning phase, discovering that the network has specialized its first layers on low-level details (such as orientations) and its last ones on high-level, sometimes recognizable, features (such as the components of a face).

It's not surprising that such models achieved state-of-the-art performance in tasks such as image recognition, segmentation (detecting the boundaries of the different parts composing an image), and tracking (detecting the position of moving objects). Moreover, deep convolutional networks have become the first block of many different architectures (such as deep reinforcement learning or neural style transfer) and, even with a few known limitations, continue to be the first choice for solving several complex real-life problems. The main drawback of such models (which is also a common objection) is that they require very large datasets to reach high accuracies. All the most important models are trained with millions of images, and their generalization ability (that is, the main goal) is proportional to the number of different samples. Some researchers have noticed that a human being learns to generalize without this huge amount of experience and, in the coming decades, we are likely to observe improvements from this viewpoint. However, deep convolutional networks have revolutionized many Artificial Intelligence fields, allowing results that were considered almost impossible just a few years ago.
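The classical image-processing role of the bidimensional convolutional operator can be illustrated with a small NumPy sketch (the Sobel-like kernel and toy image are illustrative assumptions, not from the text): a fixed, hand-designed filter detects vertical edges, exactly the kind of low-level feature that a convolutional network learns by itself via back-propagation instead of having it pre-imposed.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2D cross-correlation (what deep learning frameworks
    conventionally call 'convolution')."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A Sobel-like vertical-edge detector; a convolutional network
# learns kernels of this kind instead of using fixed ones
kernel = np.array([[-1., 0., 1.],
                   [-2., 0., 2.],
                   [-1., 0., 1.]])

# Toy 5x5 image with a vertical edge: left part dark, right part bright
image = np.zeros((5, 5))
image[:, 3:] = 1.0

response = conv2d(image, kernel)
print(response)  # strong responses only where the edge is
```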

In this section, we are going to discuss different kinds of convolutions and how they can be implemented using Keras; therefore, for specific technical details, I suggest checking the official documentation and the book Deep Learning with Keras, Gulli A., Pal S., Packt.
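As a first taste of what follows, here is a minimal sketch of a Keras convolutional classifier (the layer sizes, input shape, and number of classes are illustrative assumptions, not values from the text): a stack of learned 2D filters followed by pooling, flattening, and a fully-connected output layer.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Minimal illustrative model: 16 learned 3x3 filters, then pooling,
# flattening, and a softmax output for a hypothetical 10-class task
model = keras.Sequential([
    layers.Input(shape=(32, 32, 3)),               # width x height x channels
    layers.Conv2D(16, (3, 3), activation='relu'),  # filters learned via back-propagation
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(10, activation='softmax')
])

# A dummy forward pass just to check the shapes
out = model.predict(np.zeros((1, 32, 32, 3)), verbose=0)
print(out.shape)  # (1, 10)
```

Note that, unlike the fixed filters of classical image processing, the 3×3 kernels in `Conv2D` start from random values and are adapted during training together with the dense layer's weights.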
