Neural networks

As we saw in the first section of this chapter, a neural network learns the weights in each of its layers. There may be millions of weights, but what the network is trying to figure out are good values for them. First, we do the forward pass, during which we generate the hypothesis. Then we compare the hypothesis with the real values in the data we have, and come back with feedback that changes the weights in such a way that the next forward pass produces a better hypothesis. This feedback pass, or backpropagation pass, updates all the weights:

We repeat this process of the forward pass and the backpropagation pass until we're satisfied with the accuracy:
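To make this loop concrete, here is a minimal sketch of how it could look, assuming the Deeplearning4j framework. The model, trainData, and testData names are placeholders for a network and data iterators that are assumed to have been built elsewhere:

import org.deeplearning4j.eval.Evaluation;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;

public class TrainingLoopSketch {

    // Repeat the forward pass and the backpropagation pass until we're satisfied
    // with the accuracy (or we run out of epochs).
    public static void trainUntilSatisfied(MultiLayerNetwork model,
                                           DataSetIterator trainData,
                                           DataSetIterator testData,
                                           double targetAccuracy,
                                           int maxEpochs) {
        for (int epoch = 0; epoch < maxEpochs; epoch++) {
            trainData.reset();
            model.fit(trainData);                        // forward pass + backpropagation over every batch
            testData.reset();
            Evaluation eval = model.evaluate(testData);  // forward pass only, to score the current hypothesis
            if (eval.accuracy() >= targetAccuracy) {
                break;                                   // good enough; stop training
            }
        }
    }
}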

Now, if we store all these decimal values of the weights on disk in some way, we are storing all the knowledge this neural network has acquired to solve the problem at hand. This means that any other neural network with the same configuration, that is, the same number of neurons and hidden layers, can load these decimal values, or weights, and predict the results with exactly the same accuracy. The knowledge is completely abstracted from the original neural network, which is the first strength of this approach. Every other neural network can then load this knowledge, or these parameters, and improve from there by continuing to train even further.
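As a rough illustration of what storing the knowledge on disk can look like, here is a small sketch assuming Deeplearning4j, whose ModelSerializer writes all the weights (and, optionally, the optimizer state) into a single file; the file name is just an example:

import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.deeplearning4j.util.ModelSerializer;

import java.io.File;
import java.io.IOException;

public class WeightPersistenceSketch {

    // Writes every learned weight of a trained network to disk.
    public static void saveKnowledge(MultiLayerNetwork trainedModel) throws IOException {
        ModelSerializer.writeModel(trainedModel, new File("network-weights.zip"), true);
    }

    // Any other program that can read this file gets the same knowledge back,
    // as long as it restores a network with the same configuration.
    public static MultiLayerNetwork loadKnowledge() throws IOException {
        return ModelSerializer.restoreMultiLayerNetwork(new File("network-weights.zip"));
    }
}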

The knowledge we transfer is also independent of the technical details because, in the end, we don't store any source code or binary code, as is the case with software, just pure decimal values. Therefore, any program written in any technology that can read those decimal values will have all the knowledge needed to solve the problem. Indeed, some of the weights we'll use in this book come from neural networks built with C++, Python, or TensorFlow, which are very different from Java frameworks.

How we can use the power of transfer learning depends mostly on two factors: the amount and quality of the data we have at hand, and the processing power we can afford. For computer vision, as a default choice, it's better to first look around: maybe someone else has already trained a network for the problem we want to solve, or for a similar one. That person may have trained the neural network for weeks or even months and already gone through the painful process of parameter tuning, and we can reuse all that work and perhaps take it from there to improve it further.

Let's suppose we have a neural network with an input layer, hidden layers, and a softmax layer that tries to predict 1,000 classes, as shown in the following diagram. Let's take the first case, where we don't have a lot of data and we can't afford a lot of processing power. Here, we'll load the weights from a pre-trained neural network and freeze all the weights in all the hidden layers. What we mean by freeze is that we aren't going to train these weights:

We're going to use these weight values for the forward pass, but when we get the feedback, we'll stop there; we aren't going to update the millions of frozen weights. As a result, we need very little processing power.

We can even go one step further and modify the softmax layer itself. Instead of outputting 1,000 classes, we change the output to match our problem; for example, just two classes if we're trying to determine whether an image shows a dog or a cat. Now, we'll train only the weights of this new output layer. They may number in the thousands, but that's still far fewer than the millions of weights in a network such as VGG-16.
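The following sketch shows what this first case might look like with Deeplearning4j's model zoo and transfer learning API. The layer names fc2 and predictions are the ones used by DL4J's VGG-16 zoo model, and the learning rate and seed are arbitrary example values; treat the whole listing as a sketch rather than the book's final application:

import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.deeplearning4j.nn.graph.ComputationGraph;
import org.deeplearning4j.nn.transferlearning.FineTuneConfiguration;
import org.deeplearning4j.nn.transferlearning.TransferLearning;
import org.deeplearning4j.nn.weights.WeightInit;
import org.deeplearning4j.zoo.PretrainedType;
import org.deeplearning4j.zoo.model.VGG16;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.learning.config.Nesterovs;
import org.nd4j.linalg.lossfunctions.LossFunctions;

import java.io.IOException;

public class FrozenVgg16Sketch {

    public static ComputationGraph buildCatVsDogNetwork() throws IOException {
        // Load VGG-16 with weights pre-trained on ImageNet (1,000 classes).
        ComputationGraph vgg16 = (ComputationGraph) VGG16.builder().build()
                .initPretrained(PretrainedType.IMAGENET);

        // Training settings that apply only to the layers we actually train.
        FineTuneConfiguration fineTune = new FineTuneConfiguration.Builder()
                .updater(new Nesterovs(1e-4))
                .seed(123)
                .build();

        // Freeze everything up to and including "fc2", and swap the 1,000-way
        // softmax ("predictions") for a 2-way softmax: cat or dog.
        return new TransferLearning.GraphBuilder(vgg16)
                .fineTuneConfiguration(fineTune)
                .setFeatureExtractor("fc2")            // all layers up to and including fc2 are frozen
                .removeVertexKeepConnections("predictions")
                .addLayer("predictions",
                        new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                                .nIn(4096).nOut(2)     // 4,096 inputs from fc2, 2 output classes
                                .weightInit(WeightInit.XAVIER)
                                .activation(Activation.SOFTMAX)
                                .build(),
                        "fc2")
                .build();
    }
}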

Then we have the second case, where we have a bit more data than before, though still not a lot, and we can afford more processing power:

As before, we load pre-trained weights from an existing neural network, but in this case, we freeze fewer layers, as shown in the preceding diagram, and train the rest of the weights. For the forward pass, we reuse all the layers and go straight to the end; then, when we come back with feedback in the backpropagation step, we update the weights of the unfrozen layers. Instead of stopping at the output layer, we continue back until we reach the frozen layers, say the fourth layer, and stop there. Sometimes, this produces better results, because by training more weights we're adapting them more closely to our problem; we could say that we're improving the existing weights:
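Reusing the vgg16 graph and fineTune configuration from the previous sketch, the only change for this second case is to freeze up to an earlier layer so that the later blocks are trained as well. The layer name block4_pool is my assumption of a mid-network layer name in DL4J's VGG-16 zoo model and should be checked against the actual graph:

// Freeze only the early layers; everything after "block4_pool" (an assumed
// layer name; check vgg16.summary() for the real ones) is trained, so
// backpropagation goes further back before stopping.
ComputationGraph partiallyFrozen = new TransferLearning.GraphBuilder(vgg16)
        .fineTuneConfiguration(fineTune)
        .setFeatureExtractor("block4_pool")
        .removeVertexKeepConnections("predictions")
        .addLayer("predictions",
                new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                        .nIn(4096).nOut(2)
                        .weightInit(WeightInit.XAVIER)
                        .activation(Activation.SOFTMAX)
                        .build(),
                "fc2")
        .build();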

Finally, we have the case where we have a lot of data, we're sure about its quality, so it will genuinely add something to our problem, and, at the same time, we can afford a lot of processing power. In this case, we load the weights from a pre-trained neural network, but we don't freeze any of the layers; we simply take these weights as a starting point and keep training on our own data. Of course, most of the time we still need to modify the softmax layer, because we're trying to predict different classes. Even when you have enough resources to train a neural network from scratch, starting from pre-trained weights won't hurt if the problems are sufficiently similar, because you can still reuse some of the features the network has already captured.
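For this third case, a minimal sketch would simply omit the setFeatureExtractor() call, so nothing is frozen and backpropagation updates every layer. It reuses the vgg16 graph and fineTune configuration from the first sketch; fullTrainData is a hypothetical DataSetIterator over our own images:

// No setFeatureExtractor() call: no layer is frozen, so the pre-trained weights
// are only a starting point and every weight is updated during training.
ComputationGraph fullyTrainable = new TransferLearning.GraphBuilder(vgg16)
        .fineTuneConfiguration(fineTune)
        .removeVertexKeepConnections("predictions")
        .addLayer("predictions",
                new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                        .nIn(4096).nOut(2)
                        .weightInit(WeightInit.XAVIER)
                        .activation(Activation.SOFTMAX)
                        .build(),
                "fc2")
        .build();

for (int epoch = 0; epoch < 10; epoch++) {
    fullTrainData.reset();
    fullyTrainable.fit(fullTrainData);   // backpropagation now reaches all the way back to the first layer
}

In the next section, we'll see how to build a Java application that's able to recognize cats and dogs, and we'll use the first case, where we don't have a lot of data or processing power.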
