Training neural networks

So how do we go about setting the values of the weights and biases in our neural network so that it best solves our problem? Well, this is done in what is called the training phase. During this phase, we make the neural network “learn” from a training dataset. The training dataset consists of a set of inputs (normally denoted as X) along with the corresponding desired outputs, or labels (normally denoted as Y).

When we say the network learns, all that is happening is that the network parameters are updated so that the network outputs the correct Y for every X in the training dataset. The expectation is that, after training, the network will generalize and perform well on new inputs not seen during training. For this to happen, however, the dataset must be representative enough of the problem you are trying to solve. For example, if you want to classify cars, you need a dataset that covers different car types, colors, illumination conditions, and so on.

A common pitfall when training machine learning models in general is not having enough data, or using a model that is not complex enough to capture the structure of the data. These issues can lead to the problems of overfitting and underfitting. You will learn how to deal with these problems in practice in future chapters.

During training, the network is executed in two distinct modes:

  • Forward propagation: We work forward through the network, producing an output for the current input from the dataset. A loss function is then evaluated, which tells us how well the network did at predicting the correct outputs.
  • Backward propagation: We work backward through the network, calculating the impact each weight had on producing the current loss.

The following image shows the two different ways the network is run during training.
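
To make the two passes concrete, here is a minimal sketch, in plain NumPy, of a single linear layer with a mean-squared-error loss; the toy data, layer size, and variable names are made up for illustration and are not from the book:

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy training set: X holds the inputs, Y the desired outputs (labels).
    X = rng.normal(size=(8, 3))            # 8 examples with 3 features each
    Y = X @ np.array([1.5, -2.0, 0.5])     # labels produced by a known rule

    # Network parameters: one weight per input feature, plus a bias.
    w = rng.normal(size=3)
    b = 0.0

    # Forward propagation: run the inputs through the network and
    # evaluate the loss, which tells us how far we are from Y.
    y_pred = X @ w + b
    loss = np.mean((y_pred - Y) ** 2)

    # Backward propagation: compute the gradient of the loss with
    # respect to every parameter, i.e. the impact each one had on the loss.
    grad_pred = 2.0 * (y_pred - Y) / len(Y)
    grad_w = X.T @ grad_pred
    grad_b = grad_pred.sum()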

Currently, the workhorse for making neural networks "learn" is the backpropagation algorithm combined with a gradient-based optimizer like gradient descent.

Backpropagation is used to calculate gradients that tell us what effect each weight had on producing the current loss. Once the gradients are found, an optimization technique such as gradient descent uses them to update the weights in a way that minimizes the value of the loss function.
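
As a sketch, one gradient-descent step on the parameters from the NumPy example above could look like the following; the learning rate of 0.1 is an arbitrary choice for illustration:

    learning_rate = 0.1                    # arbitrary step size for this sketch

    # Move each parameter a small step against its gradient, which
    # (locally) decreases the value of the loss function.
    w = w - learning_rate * grad_w
    b = b - learning_rate * grad_b

Repeating the forward pass, backward pass, and this update over many iterations is what drives the loss down during training.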

Just a closing remark: ML libraries such as TensorFlow, PyTorch, Caffe, and CNTK provide backpropagation, optimizers, and everything else needed to represent and train neural networks, so you don't have to rewrite all of this code yourself.
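
For instance, a minimal training loop in PyTorch (one of the libraries mentioned above) might look roughly like this; the layer sizes, learning rate, and number of iterations are arbitrary assumptions for the sketch, and the library takes care of the backward pass and the weight updates:

    import torch

    model = torch.nn.Linear(3, 1)                       # weights and bias
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = torch.nn.MSELoss()

    X = torch.randn(8, 3)                               # toy inputs
    Y = torch.randn(8, 1)                               # toy labels

    for _ in range(100):
        optimizer.zero_grad()          # clear gradients from the previous step
        loss = loss_fn(model(X), Y)    # forward propagation and loss
        loss.backward()                # backward propagation (backpropagation)
        optimizer.step()               # gradient-descent update of the weights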