L2 and L1 regularization

The first technique we will look at for creating a more robust model is L1 or L2 regularization. These are by far the most common methods of regularization. The basic idea is that, during training of our model, we actively try to impose some constraint on the values of the model weights using either the L1 or L2 norm of those weights.

We do this by adding an extra term to whatever loss function we are using. For L1 regularization, the term we add is $\lambda \sum_{i} |w_i|$, and for L2 regularization, the term we add is $\frac{\lambda}{2} \sum_{i} w_i^2$. In the preceding terms, $w$ is all the weights in our network, and $\lambda$ is a hyperparameter called the regularization strength. By adding this term, we prevent weight values from becoming too large.

Therefore, the SVM loss function, that is, $L = \frac{1}{N} \sum_{n} \sum_{j \neq y_n} \max(0, s_j - s_{y_n} + 1)$, from Chapter 1, Setup and Introduction to TensorFlow, for L1 regularization becomes as follows:

$$L = \frac{1}{N} \sum_{n} \sum_{j \neq y_n} \max(0, s_j - s_{y_n} + 1) + \lambda \sum_{l} \sum_{i,j} \left| W_{l,i,j} \right|$$

Here, we add the regularization term that sums over all weights of the network. Here, $l$ is the layer index, and $i, j$ are the indexes of the weight matrix for each layer. The expression for L2 regularization will look similar.
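To make this summation concrete, here is a minimal sketch (assuming TensorFlow 1.x and a hypothetical set of three per-layer weight matrices) of how the L1 and L2 penalty terms could be computed by hand and added to an unregularized data loss:

import tensorflow as tf

# Hypothetical list of per-layer weight matrices W_l (one tensor per layer).
weights = [tf.get_variable("w%d" % i, shape=[64, 64]) for i in range(3)]
data_loss = tf.constant(1.0)  # Stand-in for the unregularized SVM/hinge loss.
lam = 0.001                   # Regularization strength (lambda).

# L1 penalty: lambda * sum over layers l and indexes i, j of |W_l[i, j]|.
l1_penalty = lam * tf.add_n([tf.reduce_sum(tf.abs(w)) for w in weights])

# L2 penalty: (lambda / 2) * sum over layers l and indexes i, j of W_l[i, j]^2.
l2_penalty = (lam / 2.0) * tf.add_n([tf.reduce_sum(tf.square(w)) for w in weights])

# Regularized losses: pick one penalty and add it to the data loss.
l1_regularized_loss = data_loss + l1_penalty
l2_regularized_loss = data_loss + l2_penalty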

For L1 regularization, this extra term encourages the weight vectors to become sparse, meaning that many of the weight values become zero. As a result, the model becomes more resistant to noisy inputs, as it ends up relying only on a subset of the most important inputs, which helps avoid overfitting.

For L2 regularization, this extra term, apart from keeping the sum of squared weights low, also forces the weight values to be spread evenly across the weight vectors, so that the model uses all the weights a little bit rather than using a few weights a lot. Due to the multiplicative interaction between inputs and weights, this is intuitively a desirable property to have and helps the model avoid overfitting. L2 regularization is also sometimes called weight decay; this is because, during training, all of your weights will be linearly reduced, or 'decay', by the term $\lambda w$ (the derivative of the L2 regularization term).
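To see where the name weight decay comes from, consider a single update step (a worked step, assuming plain gradient descent with learning rate $\eta$ and the $\frac{\lambda}{2} \sum_{i} w_i^2$ penalty from before):

$$w \leftarrow w - \eta \left( \frac{\partial L_{data}}{\partial w} + \lambda w \right) = (1 - \eta \lambda)\, w - \eta \frac{\partial L_{data}}{\partial w}$$

Each step multiplies the weights by $(1 - \eta \lambda)$, shrinking them towards zero before the usual gradient update is applied.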

Note that we do not include the bias terms during regularization, only the weights. This is because the bias terms do not really affect a model's overfitting, as they affect the output in an additive way, just shifting it up or down rather than changing the shape of your function. There is no harm in including them, but there is also no benefit, so there is no point in doing so.

In the following diagram, you may note that increasing the regularization strength lambda reduces overfitting. If the regularization strength is too high, however, the network becomes nearly linear and cannot shape complicated decision boundaries.

We can manually implement L2/L1 regularization by grabbing all our weights and applying the L2 norm to each, and then adding them all together, but this soon gets tedious for big models. Luckily, there is an easier way in TensorFlow if we are using tf.layers. First, we set up our regularizer, as demonstrated:

l2_reg = tf.contrib.layers.l2_regularizer(scale=0.001) 

The scale argument is our lambda from before that we need to find and set ourselves, usually by cross validation. If we set it to 0, no regularization will occur. Now, when we create any layers, we pass our regularizer through as an argument. TensorFlow will do the calculations to get all the regularization terms we need to add to our loss function:

# Example of passing a regularizer to a conv layer.
reg_conv_layer = tf.layers.conv2d(inputs, filters, kernel_size, kernel_regularizer=l2_reg)
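The same regularizer object can be reused across layer types; for example (a sketch, assuming a hypothetical fully connected layer with 128 units), tf.layers.dense accepts the same kernel_regularizer argument:

# Example of passing the same regularizer to a fully connected layer.
reg_fc_layer = tf.layers.dense(inputs, 128, kernel_regularizer=l2_reg)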

To add our regularization terms, we first need to collect them all up. Luckily, TensorFlow will automatically place all of our regularization terms together in a collection for us, so that we can access them easily. TensorFlow stores some important collections associated with your created graph, such as trainable variables, summaries, and regularization losses, to name a few, inside tf.GraphKeys. We can access these collections using tf.get_collection() and supplying the name of a collection to get. For example, to get our regularization losses, we will write the following:

reg_losses = tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)

This will return a list containing all the tensors stored in this collection.

You can also make your own collections simply by adding variables to them using tf.add_to_collection(name='my_collection', value=some_variable_to_add); the collection is created the first time something is added under that name. You can then retrieve its contents with tf.get_collection(key='my_collection'), which returns a list of everything stored under that key (or an empty list if nothing has been added yet).
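As a small sketch (using a hypothetical collection name and variable), this is how you could stash something in your own collection and read it back:

# Create a variable and add it to a custom collection.
some_variable_to_add = tf.get_variable("my_var", shape=[10])
tf.add_to_collection(name='my_collection', value=some_variable_to_add)

# Later, retrieve everything stored under that key (a Python list).
my_collection = tf.get_collection(key='my_collection')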

Now that we have our regularization loss terms, we can just add them to our usual training loss, like so, and then optimize the combined loss:

train_loss=[...]  # Training loss 

# tf.add_n sums the list of regularization loss tensors.
combined_loss = train_loss + tf.add_n(reg_losses)
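Finally, a minimal sketch (assuming a plain gradient descent optimizer and a hypothetical learning rate of 0.01) of optimizing the combined loss:

# Minimize the combined (training + regularization) loss.
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.01)
train_op = optimizer.minimize(combined_loss)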