If you are looking for custom training loops, custom layers, and so on, check out Appendix B, where we cover those topics briefly.
Understanding Keras Layers
Layers are a fundamental part of Keras. A layer includes a state (the weights of the neurons, for example) and some computation (implemented in the call() method). Keras offers many layers that you can use without having to develop your own. The most commonly used (and probably the ones you have seen so far) are the following:
Dense: Like the ones we saw in the FFNN discussion
Conv1D, Conv2D, and Conv3D: Convolutional layers in multiple dimensions
MaxPooling1D, MaxPooling2D, and MaxPooling3D: Max-pooling layers
AveragePooling1D, AveragePooling2D, and AveragePooling3D: Average-pooling layers
LSTM: Long short-term memory (recurrent) layers
Regularization layers, such as Dropout
And many more. Remember that any operation that takes a tensor as input and gives a tensor as output is a layer in the Keras language. For example, flattening a 2D image into a 1D vector is also a layer (see the Flatten layer). Also, reshaping an input can be done with a layer (see the Reshape layer). Even applying an activation function can be done with a layer (see the ReLU layer for example).
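As a quick illustration, here is a minimal sketch (the shapes are made up for the example) that applies these "operation" layers directly to a tensor:
import numpy as np
from tensorflow.keras import layers

image = np.zeros((1, 28, 28))           # a batch with one 28x28 "image"
flat = layers.Flatten()(image)          # shape becomes (1, 784)
back = layers.Reshape((28, 28))(flat)   # back to shape (1, 28, 28)
activated = layers.ReLU()(back)         # an activation applied as a layer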
In Appendix B, we briefly discuss how to develop your own layers. Note that you can also easily do the following things with layers (see the short sketch after this list):
Retrieve the gradients (see Appendix B)
Retrieve the weights (see Appendix B)
Add regularization losses (as discussed in the main part of the book)
Set the weights to values of your choosing (see Appendix B)
Use initializers for the weights (for example, He, Glorot, etc.)
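For example, here is a minimal sketch (the layer size and input shape are made up for the example) showing how to choose an initializer and how to retrieve and set the weights:
from tensorflow import keras

layer = keras.layers.Dense(2, activation="relu",
                           kernel_initializer="he_normal")  # He initialization
layer.build(input_shape=(None, 4))           # create the weights for 4 inputs
weights, biases = layer.get_weights()        # retrieve the weights as NumPy arrays
layer.set_weights([weights * 0.0, biases])   # set them to values of your choosing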
Setting the Activation Function
To set the activation function in a layer, use the activation argument. For example, the code
layers.Dense(2, activation="relu")
creates a layer with two neurons and the ReLU activation function. Note that if you don't specify any activation, none is used (or, in other words, the identity function is used as the activation function). As usual, Keras offers many activation functions: relu, sigmoid, softmax, softplus, softsign, tanh, selu, elu, and exponential.
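The string names are just shortcuts. Here is a minimal sketch of three equivalent ways to attach ReLU to a layer:
from tensorflow import keras
from tensorflow.keras import layers

layers.Dense(2, activation="relu")                  # by name
layers.Dense(2, activation=keras.activations.relu)  # by function
keras.Sequential([layers.Dense(2), layers.ReLU()])  # as a separate layer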
Putting It All Together and Training
The easiest way to train a model in Keras involves three steps:
1. Create the network by specifying the architecture (the number of layers, the number of neurons, the types of layers, the activation functions, etc.). In the examples, you do this with keras.Sequential().
2. Compile the model with the compile() method. This step specifies which optimizer, loss function, and metrics Keras should use.
3. Train the model by using the fit() method. In the fit() method you can specify the number of epochs, the batch size, and many other parameters.
Here is a minimal example:
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Dense(2, activation="relu"),
    layers.Dense(3, activation="relu"),
    layers.Dense(1),
])
This specifies the network architecture. After that, you compile the model with the compile() method:
model.compile(optimizer='adam', loss='mse')
where you specify the Adam optimizer and the MSE loss. After that, you can train the model:
model.fit(x, y, batch_size=32, epochs=10)
where x indicates the inputs, y indicates the labels, the batch_size is specified as 32, and you want to train for ten epochs.
The fit() method accepts lots of parameters. You can specify:
How much output you want by specifying the verbose parameter (0 for no output, 1 for a progress bar, and 2 for one line for each epoch).
Actions at different points during the training, by specifying which callback functions the fit() method should use. For more information on callbacks, see the next section.
A validation dataset, by simply giving a validation_split parameter (the fraction of the data that you want to use as the validation dataset). The fit() method will then report the metrics for this dataset.
You can specify many more options. As usual, to get a complete overview, check out the official documentation at https://keras.io/api/models/model_training_apis/.
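As an illustration, here is a minimal sketch combining two of these options (x and y are assumed to be NumPy arrays of inputs and labels; callbacks are covered in the next section):
history = model.fit(
    x, y,
    batch_size=32,
    epochs=10,
    verbose=2,             # one line of output per epoch
    validation_split=0.2,  # hold out 20% of the data for validation
)
# history.history is a dictionary with the loss (and metrics) per epoch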
Using Callback Functions
Callback functions are a powerful way to customize training of a model. They can be used with the fit(), evaluate(), and predict() functions.
It is instructive to understand a bit better what Keras callback functions are, since they are used quite often when developing models. From the official documentation:
A callback is a set of functions to be applied at given stages of the training procedure.
The idea is that you can pass a list of callback functions to the fit() method of the Sequential or Model classes. The relevant methods of the callbacks will then be called at each stage of the training. Their use is rather easy. For the fit() method, you would use them as
model.fit(
    ...,
    callbacks=[Callback()],
)
where Callback() is a placeholder name for a callback (you need to change it to the name of the callback function you want to use). There are callback functions that perform many tasks (see the sketch after this list), such as
ModelCheckpoint saves the weights or the entire model at a specified frequency
LearningRateScheduler changes the learning rate according to some schedule
TerminateOnNaN stops the training process if NaN appears (so you don’t waste time or computing resources)
And many more
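For instance, here is a minimal sketch using two of these callbacks together (the decay schedule is an assumption for illustration; x and y are the training data):
callbacks = [
    tf.keras.callbacks.LearningRateScheduler(
        lambda epoch: 1e-3 * 0.95 ** epoch),  # decay the learning rate each epoch
    tf.keras.callbacks.TerminateOnNaN(),      # stop if the loss becomes NaN
]
model.fit(x, y, epochs=10, callbacks=callbacks)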
As usual, you can find more information on the official documentation at https://keras.io/api/callbacks/. In Appendix B, I discuss how to develop your own custom callback class, since this is one of the best ways to check and control the training process at various stages.
Saving and Loading Models
It is often useful to save a model on disk, so you can continue the training at a later stage or reuse a previously trained model. To learn how to do this, let's consider the MNIST dataset for the sake of giving a concrete example.
You need the following imports:
import os
import tensorflow as tf
from tensorflow import keras
Load the MNIST dataset again and take the first 5,000 observations.
(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.mnist.load_data()
train_labels = train_labels[:5000]
test_labels = test_labels[:5000]
train_images = train_images[:5000].reshape(-1, 28 * 28) / 255.0
test_images = test_images[:5000].reshape(-1, 28 * 28) / 255.0
Let's now build a simple Keras model with a Dense layer with 512 neurons, a bit of dropout, and the classic ten-neuron output layer for classification (remember, the MNIST dataset has ten classes).
model = tf.keras.models.Sequential([
    keras.layers.Dense(512, activation=tf.keras.activations.relu, input_shape=(784,)),
    keras.layers.Dropout(0.2),
    keras.layers.Dense(10, activation=tf.keras.activations.softmax)
])
model.compile(optimizer='adam',
              loss=tf.keras.losses.sparse_categorical_crossentropy,
              metrics=['accuracy'])
We have added a bit of dropout, since this model has 407,050 trainable parameters. You can check this number simply by using model.summary().
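If you are wondering where the 407,050 comes from, it is simply the weights plus the biases of the two Dense layers:
# Dense(512): 784 * 512 weights + 512 biases = 401,920 parameters
# Dense(10):  512 * 10  weights +  10 biases =   5,130 parameters
# Total:      401,920 + 5,130 = 407,050 trainable parameters
model.summary()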
What we need to do first is define where we want to save the model on disk. We can do that (for example) in this way:
checkpoint_path = "training/cp.ckpt"
checkpoint_dir = os.path.dirname(checkpoint_path)
After that, we need to use a callback (remember what we did in the last section) that will save the weights:
cp_callback = tf.keras.callbacks.ModelCheckpoint(checkpoint_path,
                                                 save_weights_only=True,
                                                 verbose=1)
Note that we don't need to define a class as we did in the previous section, since ModelCheckpoint inherits from the Callback class.
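Just to make this point concrete, a hand-written callback would look something like this minimal (hypothetical) sketch; Appendix B covers custom callbacks in detail:
class PrintLoss(tf.keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs=None):
        # logs contains the loss and metrics of the epoch that just finished
        print("Epoch {}: loss = {:.4f}".format(epoch, logs["loss"]))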
Then we can simply train the model, specifying the correct callback function:
model.fit(train_images, train_labels, epochs=10,
          validation_data=(test_images, test_labels),
          callbacks=[cp_callback])
If you check the contents of the training folder (where we told the callback to save the checkpoints), you should see at least three files:
cp.ckpt.data-00000-of-00001: Contains the weights (if the number of weights is big, you will see many files like this one)
cp.ckpt.index: Contains information about which weights are in which file
checkpoint: Contains information about the checkpoint itself
We can now test our method. This code will give you a model that will reach an accuracy on the validation dataset of roughly 92%.
If we define a second model
model2 = tf.keras.models.Sequential([
keras.layers.Dense(512, activation=tf.keras.activations.relu, input_shape=(784,)),
keras.layers.Dropout(0.2),
keras.layers.Dense(10, activation=tf.keras.activations.softmax)
])
model2.compile(optimizer='adam',
loss=tf.keras.losses.sparse_categorical_crossentropy,
metrics=['accuracy'])
and we check its accuracy on the validation dataset with
loss, acc = model2.evaluate(test_images, test_labels)
print("Untrained model, accuracy: {:5.2f}%".format(100*acc))
we will get an accuracy of roughly 8.6%. That was expected, since this model has not been trained yet. But now we can load the saved weights into this model and try again.
model2.load_weights(checkpoint_path)
loss,acc = model2.evaluate(test_images, test_labels)
print("Second model, accuracy: {:5.2f}%".format(100*acc))
We should get this result:
5000/5000 [==============================] - 0s 50us/step
Restored model, accuracy: 92.06%
That again makes sense, since the new model is now using the weights of the old, trained model. Keep in mind that, to load pretrained weights into a new model, the new model needs to have the exact same architecture as the one used to save the weights.
As you will see again and again, the basic idea is to use a callback that saves your weights. Of course, you can customize the callback function. For example, if you want to save the weights at every epoch with a different filename each time, so that you can later restore a specific checkpoint, you first need to define the filename in a dynamic way:
checkpoint_path = "training/cp-{epoch:04d}.ckpt"
checkpoint_dir = os.path.dirname(checkpoint_path)
You should use the following callback:
cp_callback = tf.keras.callbacks.ModelCheckpoint(
    checkpoint_path, verbose=1, save_weights_only=True,
    period=1)  # period=1 saves the weights at every epoch
Note that checkpoint_path can contain named formatting options (in the name we have {epoch:04d}), which will be filled by the values of epoch and logs (passed in on_epoch_end(), which you saw in the previous section).
You can check the original code for tf.keras.callbacks.ModelCheckpoint, and you will find that the formatting is done in the on_epoch_end(self, epoch, logs) method:
filepath = self.filepath.format(epoch=epoch + 1, **logs)
You can define your filename using the epoch number and the values contained in the logs dictionary.
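For example (a hypothetical filename pattern, assuming the model tracks a validation loss), you could include val_loss in the name:
# {epoch:04d} and {val_loss:.4f} will be filled in from epoch and logs
checkpoint_path = "training/cp-{epoch:04d}-{val_loss:.4f}.ckpt"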
Let's get back to our example. Let's start by saving a first version of the model:
model.save_weights(checkpoint_path.format(epoch=0))
and then we can fit the model as usual:
model.fit(train_images, train_labels,
          epochs=10, callbacks=[cp_callback],
          validation_data=(test_images, test_labels),
          verbose=0)
Be careful, since this will save lots of files; in our example, one per epoch. The directory content may look like this:
checkpoint
cp-0000.ckpt.data-00000-of-00001
cp-0000.ckpt.index
cp-0001.ckpt.data-00000-of-00001
cp-0001.ckpt.index
cp-0002.ckpt.data-00000-of-00001
cp-0002.ckpt.index
cp-0003.ckpt.data-00000-of-00001
cp-0003.ckpt.index
cp-0004.ckpt.data-00000-of-00001
cp-0004.ckpt.index
cp-0005.ckpt.data-00000-of-00001
cp-0005.ckpt.index
cp-0006.ckpt.data-00000-of-00001
cp-0006.ckpt.index
cp-0007.ckpt.data-00000-of-00001
cp-0007.ckpt.index
cp-0008.ckpt.data-00000-of-00001
cp-0008.ckpt.index
cp-0009.ckpt.data-00000-of-00001
cp-0009.ckpt.index
cp-0010.ckpt.data-00000-of-00001
cp-0010.ckpt.index
cp.ckpt.data-00000-of-00001
cp.ckpt.index
One last tip before moving on: how to get the latest checkpoint without having to search for its filename. This can be done easily with the following code:
latest = tf.train.latest_checkpoint('training')
model.load_weights(latest)
This will automatically load the weights saved in the latest checkpoint. The variable latest is simply a string and contains the last checkpoint filename saved. In this example, that is training/cp-0010.ckpt.
Saving Your Weights Manually
Of course, you can simply save your model weights manually when you are done training, without defining a callback function:
model.save_weights('./checkpoints/my_checkpoint')
This command will generate the three files we described previously. Two of them start with the string you gave as a name (in this case, my_checkpoint):
checkpoint
my_checkpoint.data-00000-of-00001
my_checkpoint.index
Reloading the weights in a new model is as simple as this:
model.load_weights('./checkpoints/my_checkpoint')
Keep in mind that, to be able to reload saved weights into a new model, the new model must have exactly the same architecture as the old one.
Saving the Entire Model
Keras also gives us a way to save the entire model to disk: the weights, the architecture, and the optimizer. We can then re-create the same trained model on another machine by simply moving one file. For example, we could use the following code
model.save('my_model.h5')
This will save the entire model in one file, called my_model.h5. We can simply move the file to a different computer and re-create the same trained model with
new_model = keras.models.load_model('my_model.h5')
Note that this model will have the same trained weights as your original model, so it's ready to use. This may be helpful if, for example, you want to stop training your model and continue the training on a different machine, or if you must stop the training for a while and continue at a later time.
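For instance, here is a minimal sketch of that workflow (reusing train_images and train_labels from earlier):
# On the first machine:
model.save('my_model.h5')

# On the second machine: the optimizer state was saved too,
# so training can resume where it left off
new_model = keras.models.load_model('my_model.h5')
new_model.fit(train_images, train_labels, epochs=5)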