Controlling variance with dropout

One of the most effective ways to reduce overfitting in deep neural networks is to employ a technique called dropout. Dropout does exactly what its name says: it drops neurons out of a hidden layer. Here's how it works.

On every minibatch, we randomly choose to turn off nodes in each hidden layer. Imagine we had implemented dropout on some hidden layer and chose the drop probability to be 0.5. That means, for every minibatch, for every neuron, we flip a coin to decide whether to use that neuron. In doing so, you'd randomly turn off about half of the neurons in that hidden layer.
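To make that coin flip concrete, here's a tiny NumPy sketch (not part of the original example) showing a dropout mask applied to the activations of a hypothetical hidden layer, with a drop probability of 0.5:

import numpy as np

rng = np.random.default_rng(42)

# activations from a hypothetical hidden layer: one example, 8 neurons
activations = rng.normal(size=8)

drop_probability = 0.5

# flip a coin for every neuron: True keeps it, False turns it off
mask = rng.random(size=activations.shape) >= drop_probability

# inverted dropout: zero out the dropped neurons and rescale the survivors
# so the expected magnitude of the layer's output stays the same
dropped_activations = activations * mask / (1.0 - drop_probability)

print(mask)                 # roughly half True, half False
print(dropped_activations)  # survivors scaled up by 1 / 0.5 = 2

Keras takes care of both the masking and the rescaling during training, and applies no dropout at all at prediction time, so we never have to write this ourselves.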

If we do this over and over again, it's as if we're training many smaller networks. The model weights remain relatively small, and each of these smaller networks is less likely to overfit the data. It also forces each neuron to be less dependent on the other neurons doing their jobs.

Dropout works amazingly well to combat overfitting on many, if not most, of the deep learning problems you're likely to encounter. If you have a high-variance model, dropout is a good first choice for reducing overfitting.

Keras contains a built-in Dropout layer that we can use to implement dropout in the network. The Dropout layer randomly turns off the outputs of neurons in the previous layer, which lets us easily retrofit our network to use dropout. To use it, we first need to import the new layer in addition to the other layer types we're using, as shown in the following code:

from keras.layers import Input, Dense, Dropout

Then, we just insert Dropout layers into our model, as shown in the following code:

def build_network(input_features=None):
    # first we specify an input layer, with a shape == features
    inputs = Input(shape=(input_features,), name="input")
    x = Dense(512, activation='relu', name="hidden1")(inputs)
    # randomly drop 50% of hidden1's outputs on each training batch
    x = Dropout(0.5)(x)
    x = Dense(256, activation='relu', name="hidden2")(x)
    x = Dropout(0.5)(x)
    x = Dense(128, activation='relu', name="hidden3")(x)
    x = Dropout(0.5)(x)
    prediction = Dense(10, activation='softmax', name="output")(x)
    model = Model(inputs=inputs, outputs=prediction)
    model.compile(optimizer='adam', loss='categorical_crossentropy',
                  metrics=["accuracy"])
    return model

This is the exact model we've used previously; however, we've inserted a Dropout layer after each Dense layer, which is how I normally start when I implement dropout. Like other model architecture decisions, you could choose to implement dropout in only some layers, all layers, or no layers. You can also vary the dropout rate (in Keras, the argument to Dropout is the fraction of the previous layer's outputs to drop); however, I do recommend starting at 0.5, as it tends to work pretty well.

A safe choice is dropout at every layer with a rate of 0.5. A good second try is to use dropout only in the first hidden layer, as sketched below.
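If you want to try that second option, the change is small. The following sketch isn't from the original example; it just modifies the build_network() function shown previously so that a single Dropout layer sits after the first, widest hidden layer:

def build_network(input_features=None):
    inputs = Input(shape=(input_features,), name="input")
    x = Dense(512, activation='relu', name="hidden1")(inputs)
    # dropout only after the first hidden layer
    x = Dropout(0.5)(x)
    x = Dense(256, activation='relu', name="hidden2")(x)
    x = Dense(128, activation='relu', name="hidden3")(x)
    prediction = Dense(10, activation='softmax', name="output")(x)
    model = Model(inputs=inputs, outputs=prediction)
    model.compile(optimizer='adam', loss='categorical_crossentropy',
                  metrics=["accuracy"])
    return model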

Let's train our new model (the one with dropout after every hidden layer) and see how it compares to our first try:
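The exact training call isn't reproduced here, but a minimal sketch might look like the following; the random arrays, the 784-feature input size, and the TensorBoard log directory are placeholders rather than names from the original text:

import numpy as np
from keras.callbacks import TensorBoard

# placeholder data standing in for the real training and validation sets
train_X = np.random.random((1000, 784))
train_y = np.eye(10)[np.random.randint(0, 10, 1000)]
val_X = np.random.random((200, 784))
val_y = np.eye(10)[np.random.randint(0, 10, 200)]

model = build_network(input_features=784)
model.fit(x=train_X, y=train_y,
          batch_size=32,
          epochs=50,
          validation_data=(val_X, val_y),
          callbacks=[TensorBoard(log_dir="./tb_log/dropout")])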

Let's take a look at validation accuracy first. The model using dropout trains more slowly than the unregularized model, but in this case it gets up to speed fairly quickly. Look at the validation accuracy around epoch 44: it's marginally better than the unregularized model's.

Now, let's look at validation loss. You can see the impact dropout has on the model's overfitting, and it's quite pronounced. While it only translates to a marginal improvement in the final product, dropout does a pretty good job of keeping our validation loss from climbing.
