3

Keras

In this chapter, we will focus on the high-level TensorFlow API named Keras.

By the end of this chapter, you should have a better understanding of:

  • The Keras Sequential API
  • The Keras Functional API
  • The Keras Subclassing API
  • The Keras Preprocessing API

Introduction

In the previous chapter, we covered TensorFlow's fundamentals, and we are now able to set up a computational graph. This chapter will introduce Keras, a high-level neural network API written in Python with multiple backends. TensorFlow is one of them. François Chollet, a French software engineer and AI researcher currently working at Google, created Keras for his own personal use before it was open-sourced in 2015. Keras's primary goal is to provide an easy-to-use and accessible library to enable fast experiments.

TensorFlow v1 suffers from usability issues; in particular, a sprawling and sometimes confusing API. For example, TensorFlow v1 offers two high-level APIs:

  • The Estimator API (added in release 1.1) is used for training models locally or in distributed environments
  • The Keras API was added later (in release 1.4.0) and is intended to be used for fast prototyping

With TensorFlow v2, Keras became the official high-level API. Keras can scale and suit various user profiles, from research to application development and from model training to deployment. Keras provides four key advantages: it's user-friendly (without sacrificing flexibility and performance), modular, composable, and scalable.

The TensorFlow Keras APIs are the same as the Keras API. However, the TensorFlow implementation of Keras has been optimized for TensorFlow: it integrates TensorFlow-specific functionality, such as eager execution, data pipelines, and Estimators.

The difference between Keras, the independent library, and the Keras implementation integrated into TensorFlow lies mainly in the way we import it.

Here is the command to import the Keras API specification:

import keras

Here is TensorFlow's implementation of the Keras API specification:

import tensorflow as tf
from tensorflow import keras

Now, let's start by discovering the basic building blocks of Keras.

Understanding Keras layers

Keras layers are the fundamental building blocks of Keras models. Each layer receives data as input, does a specific task, and returns an output.

Keras includes a wide range of built-in layers:

  • Core layers: Dense, Activation, Flatten, Input, Reshape, Permute, RepeatVector, SpatialDropout, and many more.
  • Convolutional layers for Convolutional Neural Networks: Conv1D, Conv2D, SeparableConv1D, Conv3D, Cropping2D, and many more.
  • Pooling layers that perform a downsampling operation to reduce feature maps: MaxPooling1D, AveragePooling2D, and GlobalAveragePooling3D.
  • Recurrent layers for recurrent neural networks to process recurrent or sequence data: RNN, SimpleRNN, GRU, LSTM, ConvLSTM2D, etc.
  • The embedding layer, which can only be used as the first layer in a model and turns positive integers into dense vectors of fixed size.
  • Merge layers: Add, Subtract, Multiply, Average, Maximum, Minimum, and many more.
  • Advanced activation layers: LeakyReLU, PReLU, Softmax, ReLU, etc.
  • The batch normalization layer, which normalizes the activation of the previous layer at each batch.
  • Noise layers: GaussianNoise, GaussianDropout, and AlphaDropout.
  • Layer wrappers: TimeDistributed applies a layer to every temporal slice of an input, and Bidirectional is a wrapper for RNNs.
  • Locally-connected layers: LocallyConnected1D and LocallyConnected2D. They work like Conv1D or Conv2D without sharing their weights.

We can also write our own Keras layers, as explained in the Keras Subclassing API section of this chapter.

Getting ready

To start, we'll review some methods that are common to all Keras layers. These methods are very useful for inspecting the configuration and the state of a layer.

How to do it...

  1. Let's start with the layer's weights. The weights are possibly the most essential concept in a layer; they decide how much influence the input will have on the output and they represent the state of the layer. The get_weights() function returns the weights of the layer as a list of NumPy arrays:
    layer.get_weights()
    

    The set_weights() method sets the weights of the layer from a list of NumPy arrays:

    layer.set_weights(weights)
    
  2. As we'll explain in the Keras Functional API recipe, sometimes neural network topology isn't linear. In this case, a layer can be used several times in the network (shared layer). We can easily get the inputs and outputs of a layer by using this command if the layer is a single node (no shared layer):
    layer.input
    layer.output
    

    Or this one, if the layer has multiple nodes:

    layer.get_input_at(node_index)
    layer.get_output_at(node_index)
    
  3. We can also easily get the layer's input and output shapes by using this command if a layer is a single node (no shared layer):
    layer.input_shape
    layer.output_shape
    

    Or this one, if the layer has multiple nodes:

    layer.get_input_shape_at(node_index)
    layer.get_output_shape_at(node_index)
    
  4. Now, we'll be discussing the layer's configuration. As the same layer could be instantiated several times, the configuration doesn't include the weights or connectivity information. The get_config() function returns a dictionary containing the configuration of the layer:
    layer.get_config()
    

    The from_config() method instantiates a layer from its configuration:

    layer.from_config(config)
    

    Note that the layer configuration is stored in an associative array (Python dictionary), a data structure that maps keys to values.
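
Putting these methods together, here is a minimal sketch (using an illustrative Dense layer built on a dummy 4-feature input; the shapes and values are only for demonstration) that inspects a layer's weights and round-trips its configuration:

import numpy as np
import tensorflow as tf

# Build a small dense layer by calling it once on a dummy batch
layer = tf.keras.layers.Dense(3)
layer(tf.zeros((1, 4)))  # building creates a (4, 3) kernel and a (3,) bias

# Inspect and overwrite the layer's state
kernel, bias = layer.get_weights()
print(kernel.shape, bias.shape)  # (4, 3) (3,)
layer.set_weights([np.ones((4, 3)), np.zeros(3)])

# Round-trip the configuration (weights are not included in the config)
config = layer.get_config()
same_layer = tf.keras.layers.Dense.from_config(config)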

How it works...

The layers are the building blocks of the models. Keras offers a wide range of built-in layers and useful methods to inspect what is happening inside the models.

With Keras, we can build models in three ways: with the Sequential, the Functional, or the Subclassing API. We'll later see that only the last two APIs allow access to the layers.

See also

For some references on the Keras Layers API, see the following documentation:

Using the Keras Sequential API

The main goal of Keras is to make it easy to create deep learning models. The Sequential API allows us to create Sequential models, which are a linear stack of layers. Models that are connected layer by layer can solve many problems. To create a Sequential model, we have to create an instance of a Sequential class, create some model layers, and add them to it.

We will go from the creation of our Sequential model to its prediction via the compilation, training, and evaluation steps. By the end of this recipe, you will have a Keras model ready to be deployed in production.

Getting ready

This recipe will cover the main ways of creating a Sequential model and assembling layers to build a model with the Keras Sequential API.

To start, we load TensorFlow and NumPy, as follows:

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import Dense
import numpy as np

We are ready to proceed with an explanation of how to do it.

How to do it...

  1. First, we will create a Sequential model. Keras offers two equivalent ways of creating a Sequential model. Let's start by passing a list of layer instances to the constructor. We'll build a fully connected multi-class classifier (10 categories), aka a multi-layer perceptron, by entering the following code.
    model = tf.keras.Sequential([
        # Add a fully connected layer with 1024 units to the model
        tf.keras.layers.Dense(1024, input_dim=64),
        # Add an activation layer with ReLU activation function
        tf.keras.layers.Activation('relu'),
        # Add a fully connected layer with 256 units to the model
        tf.keras.layers.Dense(256),
        # Add an activation layer with ReLU activation function
        tf.keras.layers.Activation('relu'),
        # Add a fully connected layer with 10 units to the model
        tf.keras.layers.Dense(10),
        # Add an activation layer with softmax activation function
        tf.keras.layers.Activation('softmax')
    ])
    

    Another way to create a Sequential model is to instantiate a Sequential class and then add layers via the .add() method.

    model = tf.keras.Sequential()
    # Add a fully connected layer with 1024 units to the model
    model.add(tf.keras.layers.Dense(1024, input_dim=64))
    # Add an activation layer with ReLU activation function
    model.add(tf.keras.layers.Activation('relu'))
    # Add a fully connected layer with 256 units to the model
    model.add(tf.keras.layers.Dense(256))
    # Add an activation layer with ReLU activation function
    model.add(tf.keras.layers.Activation('relu'))
    # Add a fully connected layer with 10 units to the model
    model.add(tf.keras.layers.Dense(10))
    # Add an activation layer with softmax activation function
    model.add(tf.keras.layers.Activation('softmax'))
    
  2. Let's take a closer look at the layer configuration. The tf.keras.layers API offers a lot of built-in layers and also provides an API to create our layers. In most of them, we can set these parameters to the layer's constructor:
    • We can add an activation function by specifying the name of a built-in function or as a callable object. This function decides whether a neuron should be activated or not. By default, a layer has no activation function. Below are the two ways to create a layer with an activation function. Note that you don't have to run the following code; these layers are not assigned to variables.
      # Creation of a dense layer with a sigmoid activation function:
      Dense(256, activation='sigmoid')
      # Or:
      Dense(256, activation=tf.keras.activations.sigmoid)
      
    • We can also specify an initialization strategy for the initial weights (kernel and bias) by passing the string identifier of built-in initializers or a callable object. The kernel is by default set to the "Glorot uniform" initializer, and the bias is set to zeros.
      # A dense layer with a kernel initialized to a random normal distribution:
      Dense(256, kernel_initializer='random_normal')
      # A dense layer with a bias vector initialized with a constant value of 5.0:
      Dense(256, bias_initializer=tf.keras.initializers.Constant(value=5))
      
    • We can also specify regularizers for kernel and bias, such as L1 (also called Lasso) or L2 (also called Ridge) regularization. By default, no regularization is applied. A regularizer aims to prevent overfitting by penalizing a model for having large weights. These penalties are incorporated in the loss function that the network optimizes.
      # A dense layer with L1 regularization of factor 0.01 applied to the kernel matrix:
      Dense(256, kernel_regularizer=tf.keras.regularizers.l1(0.01))
      # A dense layer with L2 regularization of factor 0.01 applied to the bias vector:
      Dense(256, bias_regularizer=tf.keras.regularizers.l2(0.01))
      
  3. In Keras, it's strongly recommended to set the input shape for the first layer. Yet, contrary to appearances, the input layer isn't a layer but a tensor. Its shape must be the same as our training data. The following layers perform automatic shape inference; their shapes are calculated based on the units of the previous layer.

    Each type of layer requires input with a certain number of dimensions, so there are different ways to specify the input shape depending on the kind of layer. Here, we'll focus on the Dense layer, so we'll use the input_dim parameter. Since the shape of the weights depends on the input size, if the input shape isn't specified in advance, the model has no weights and is not built. In this case, we can't call methods such as summary or access attributes such as layers and weights.

    In this recipe, we'll create datasets with 64 features, and we'll process batches of 10 samples. The shape of our input data is (10,64), aka (batch_size, number_of_features). By default, a Keras model is defined to support any batch size, so the batch size isn't mandatory. We just have to specify the number of features through the input_dim parameter to our first layer.

    Dense(256, input_dim=64)
    

    However, we can force the batch size for efficiency reasons with the batch_size argument.

     Dense(256, input_dim=64, batch_size=10)
    
  4. Before the learning phase, our model needs to be configured. This is done by the compile method. We have to specify:
    • An optimization algorithm for the training of our neural network. We can pass an optimizer instance from the tf.keras.optimizers module. For example, we can use an instance of tf.keras.optimizers.RMSprop or 'RMSprop', which is an optimizer that implements the RMSprop algorithm.
    • A loss function (also called an objective function or optimization score function) that the model aims to minimize. It can be the name of an existing loss function (such as categorical_crossentropy or mse), a symbolic TensorFlow loss function (tf.keras.losses.MAPE), or a custom loss function that takes two tensors as input (the true and predicted tensors) and returns a scalar for each data point.
    • A list of metrics used to judge our model's performance that aren't used in the model training process. We can either pass the string names or callables from the tf.keras.metrics module.
    • If we want to be sure that the model trains and evaluates eagerly, we can set the run_eagerly argument to True.

    Note that the graph is finalized with the compile method.

    Now, we'll compile the model using the Adam optimizer and the categorical cross-entropy loss, and display the accuracy metric.

    model.compile(
        optimizer="adam", 
        loss="categorical_crossentropy",
        metrics=["accuracy"]
    )
    
  5. Now, we'll generate three toy datasets of 64 features with random values. One will be used to train the model (2,000 samples), another one to validate (500 samples), and the last one to test (500 samples).
    data = np.random.random((2000, 64))
    labels = np.random.random((2000, 10))
    val_data = np.random.random((500, 64))
    val_labels = np.random.random((500, 10))
    test_data = np.random.random((500, 64))
    test_labels = np.random.random((500, 10))
    
  6. After the model has been configured, the learning phase begins by calling the fit method. The training configuration is done by these three arguments:
    • We have to set the number of epochs, aka the number of iterations over the entire input data.
    • We have to specify the number of samples per gradient update through the batch_size argument. Note that the last batch may be smaller if the total number of samples is not divisible by the batch size.
    • We can specify a validation dataset by setting the validation_data argument (a tuple of inputs and labels). This dataset makes it easy to monitor the performance of the model. The loss and metrics are computed in inference mode at the end of each epoch.

    Now, we'll train the model on our toy datasets by calling the fit method:

    model.fit(data, labels, epochs=10, batch_size=50,
              validation_data=(val_data, val_labels))
    
  7. Then, we'll evaluate our model on the test dataset. We'll call the model.evaluate function, which returns the loss value and the metric values of the model in test mode. Computation is done in batches. It has three important arguments: the input data, the target data, and the batch size. This function predicts the output for the given input, then computes the metrics specified in model.compile by comparing the predictions with the target data, and returns the computed loss and metric values as output.
    model.evaluate(test_data, test_labels, batch_size=50)
    
  8. We can also just use the model to make a prediction. The tf.keras.Model.predict method takes only input data and returns the output of the last layer, the model's prediction, as a NumPy array:
    result = model.predict(data, batch_size=50)
    

    Analyzing this model's performance is of no interest in this recipe because we randomly generated a dataset.

    Now, let's move on to an analysis of this recipe.

How it works...

Keras provides the Sequential API to create models composed of a linear stack of layers. We can either pass a list of layer instances as an array to the constructor or use the add method.

Keras provides different kinds of layers. Most of them share some common constructor arguments such as activation, kernel_initializer and bias_initializer, and kernel_regularizer and bias_regularizer.

Take care with the delayed-build pattern: if no input shape is specified on the first layer, the model gets built the first time the model is called on some input data or when methods such as fit, eval, predict, and summary are called. The graph is finalized with the compile method, which configures the model before the learning phase. Then, we can evaluate the model or make predictions.
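
The loss passed to compile doesn't have to be a built-in name: as mentioned earlier, a custom callable that takes the true and predicted tensors and returns a value per data point also works. Here is a minimal sketch, continuing with the model defined above (my_mse_loss is our own illustrative name, not part of the recipe):

# A custom loss: takes y_true and y_pred tensors and returns one value per sample
def my_mse_loss(y_true, y_pred):
    return tf.reduce_mean(tf.square(y_true - y_pred), axis=-1)

# It can then be passed to compile in place of a built-in loss name
model.compile(optimizer='adam', loss=my_mse_loss, metrics=['accuracy'])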

See also

For some references on the Keras Sequential API, visit the following websites:

Using the Keras Functional API

The Keras Sequential API is great for developing deep learning models in most situations. However, this API has some limitations, such as a linear topology, that could be overcome with the Functional API. Note that many high-performing networks are based on a non-linear topology such as Inception, ResNet, etc.

The Functional API allows defining complex models with a non-linear topology, multiple inputs, multiple outputs, residual connections with non-sequential flows, and shared and reusable layers.

The deep learning model is usually a directed acyclic graph (DAG). The Functional API is a way to build a graph of layers and create more flexible models than the tf.keras.Sequential API.

Getting ready

This recipe will cover the main ways of creating a Functional model, using callable models, manipulating complex graph topologies, sharing layers, and finally introducing the concept of the layer "node" with the Keras Functional API.

As usual, we just need to import TensorFlow as follows:

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import Input, Dense, TimeDistributed
from tensorflow.keras import models

We are ready to proceed with an explanation of how to do it.

How to do it...

Let's go and make a Functional model for recognizing the MNIST dataset of handwritten digits. We will predict the handwritten digits from grayscale images.

Creating a Functional model

  1. First, we will load the MNIST dataset.
    mnist = tf.keras.datasets.mnist
    (X_mnist_train, y_mnist_train), (X_mnist_test, y_mnist_test) = mnist.load_data()
    
  2. Then, we will create an input node with a 28x28 dimensional shape. Remember that in Keras, the input layer is not a layer but a tensor, and we have to specify the input shape for the first layer. This tensor must have the same shape as our training data. By default, a Keras model is defined to support any batch size, so the batch size isn't mandatory. Input() is used to instantiate a Keras tensor.
    inputs = tf.keras.Input(shape=(28,28))
    
  3. Then, we will create a Flatten layer using the following command. When called, it will turn each image of size (28,28) into a flat array of 784 pixels.
    flatten_layer = keras.layers.Flatten()
    
  4. We'll add a new node in the graph of layers by calling the flatten_layer on the inputs object:
    flatten_output = flatten_layer(inputs)
    

    The "layer call" action is like drawing an arrow from inputs to the flatten_layer. We're "passing" the inputs to the flatten layer, and as a result, it produces outputs. A layer instance is callable (on a tensor) and returns a tensor.

  5. Then, we'll create a new layer instance:
    dense_layer = tf.keras.layers.Dense(50, activation='relu')
    
  6. We'll add a new node:
    dense_output = dense_layer(flatten_output)
    
  7. To build a model, multiple layers are stacked. In this example, we will add another dense layer to do a classification task between 10 classes:
    predictions = tf.keras.layers.Dense(10, activation='softmax')(dense_output)
    
  8. Input tensor(s) and output tensor(s) are used to define a model. The model is a function of one or more input layers and one or more output layers. The model instance formalizes the computational graph on how the data flows from input(s) to output(s).
    model = keras.Model(inputs=inputs, outputs=predictions)
    
  9. Now, we'll print the summary.
    model.summary()
    
  10. This results in the following output:

    Figure 3.1: Summary of the model

  11. Such a model can be trained and evaluated by the same compile, fit, evaluate, and predict methods used in the Keras Sequential model.
    model.compile(optimizer='sgd',
                 loss='sparse_categorical_crossentropy',
                 metrics=['accuracy'])
    model.fit(X_mnist_train, y_mnist_train,
              validation_data=(X_mnist_test, y_mnist_test),
              epochs=10)
    

In this recipe, we have built a model using the Functional API.
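
For completeness, here is a minimal sketch of the evaluate and predict calls on this model, using the test split loaded in step 1 (note that the pixel values were not normalized in this recipe):

# Evaluate on the test set and get per-image class probabilities
test_loss, test_acc = model.evaluate(X_mnist_test, y_mnist_test)
predictions = model.predict(X_mnist_test)
print(predictions.shape)  # (10000, 10): one probability per class for each image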

Using callable models like layers

Let's go into the details of the Functional API with callable models.

  1. With the Functional API, it is easy to reuse trained models: any model can be treated as a layer, by calling it on a tensor. We will reuse the model defined in the previous section as a layer to see this in action. It's a classifier for 10 categories. This model returns 10 probabilities: 1 for each category. It's called a 10-way softmax. So, by calling the model defined above, the model will predict for each input one of the 10 classes.
    x = Input(shape=(28, 28))
    # y will contain the prediction for x
    y = model(x)
    

    Note that by calling a model, we are not just reusing the model architecture, we are also reusing its weights.

  2. If we're facing a sequence problem, creating a model will become very easy with the Functional API. For example, instead of processing one image, we want to process a video composed of many images. We could turn an image classification model into a video classification model in just one line using the TimeDistributed layer wrapper. This wrapper applies our previous model to every temporal slice of the input sequence, or in other words, to each image of our video.
    from tensorflow.keras.layers import TimeDistributed
    # Input tensor for sequences of 10 timesteps,
    # each containing a 28x28 dimensional matrix.
    input_sequences = tf.keras.Input(shape=(10, 28, 28))
    # We will apply the previous model to each timestep of the input sequence.
    # The MNIST model returns a vector with 10 probabilities (one for each digit).
    # The TimeDistributed output will be a sequence of 10 vectors of size 10.
    processed_sequences = tf.keras.layers.TimeDistributed(model)(input_sequences)
    

We have seen that models are callable like layers. Now, we'll learn how to create complex models with a non-linear topology.

Creating a model with multiple inputs and outputs

The Functional API makes it easy to manipulate a large number of intertwined datastreams with multiple inputs and outputs and non-linear connectivity topologies. These cannot be handled with the Sequential API, which isn't able to create a model with layers that aren't connected sequentially or with multiple inputs or outputs.

Let's go with an example. We're going to build a system for predicting the price of a specific house and the elapsed time before its sale.

The model will have two inputs:

  • Data about the house such as the number of bedrooms, house size, air conditioning, fitted kitchen, etc.
  • A recent picture of the house

This model will have two outputs:

  • The elapsed time before the sale (two categories – slow or fast)
  • The predicted price
  1. To build this system, we'll start by building the first block to process tabular data about the house.
    house_data_inputs = tf.keras.Input(shape=(128,), name='house_data')
    x = tf.keras.layers.Dense(64, activation='relu')(house_data_inputs)
    block_1_output = tf.keras.layers.Dense(32, activation='relu')(x)
    
  2. Then, we'll build the second block to process the house image data.
    house_picture_inputs = tf.keras.Input(shape=(128,128,3), name='house_picture')
    x = tf.keras.layers.Conv2D(64, 3, activation='relu', padding='same')(house_picture_inputs)
    x = tf.keras.layers.Conv2D(64, 3, activation='relu', padding='same')(x)
    block_2_output = tf.keras.layers.Flatten()(x)
    
  3. Now, we'll merge all available features into a single large vector via concatenation.
    x = tf.keras.layers.concatenate([block_1_output, block_2_output])
    
  4. Then, we'll stick a regression output for the price prediction on top of the features.
    price_pred = tf.keras.layers.Dense(1, name='price', activation='relu')(x)
    
  5. And, we'll stick a time classifier on top of the features.
    time_elapsed_pred = tf.keras.layers.Dense(2, name='elapsed_time', activation='softmax')(x)
    
  6. Now, we'll build the model.
    model = keras.Model([house_data_inputs, house_picture_inputs],
                       [price_pred, time_elapsed_pred],
                       name='toy_house_pred')
    
  7. Now, we'll plot the model.
    keras.utils.plot_model(model, 'multi_input_and_output_model.png', show_shapes=True)
    
  8. This results in the following output:

    Figure 3.2: Plot of a model with multiple inputs and outputs

In this recipe, we have created a complex model using the Functional API with multiple inputs and outputs that predicts the price of a specific house and the elapsed time before its sale. Now, we'll introduce the concept of shared layers.
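
Such a multi-output model is compiled with one loss per output. Here is a minimal sketch (the dictionary keys match the output layer names defined above, 'price' and 'elapsed_time'; the optimizer, losses, and loss weights are our own illustrative choices):

model.compile(
    optimizer='adam',
    loss={'price': 'mse',
          'elapsed_time': 'sparse_categorical_crossentropy'},
    loss_weights={'price': 1.0, 'elapsed_time': 0.2})
# fit would then receive a dict of inputs and a dict of targets, for example:
# model.fit({'house_data': house_data, 'house_picture': house_pictures},
#           {'price': price_targets, 'elapsed_time': time_targets},
#           epochs=10, batch_size=32)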

Shared layers

Some models reuse the same layer multiple times inside their architecture. These layer instances learn features that correspond to multiple paths in the graph of layers. Shared layers are often used to encode inputs from similar spaces.

To share a layer (weights and all) across different inputs, we only need to instantiate the layer once and call it on as many inputs as we want.

Let's consider two different sequences of text. We will apply the same embedding layer to these two sequences, which feature similar vocabulary.

# Variable-length sequence of integers
text_input_a = tf.keras.Input(shape=(None,), dtype='int32')
# Variable-length sequence of integers
text_input_b = tf.keras.Input(shape=(None,), dtype='int32')
# Embedding for 1000 unique words mapped to 128-dimensional vectors
shared_embedding = tf.keras.layers.Embedding(1000, 128)
# Reuse the same layer to encode both inputs
encoded_input_a = shared_embedding(text_input_a)
encoded_input_b = shared_embedding(text_input_b)

In this recipe, we have learned how to reuse a layer multiple times in the same model. Now, we'll introduce the concept of extracting and reusing a layer.

Extracting and reusing nodes in the graph of layers

In the first recipe of this chapter, we saw that a layer is an instance that takes a tensor as an argument and returns another tensor. A model is composed of several layer instances. These layer instances are objects that are chained one to another by their input and output tensors. Each time we call a layer on an input, the layer creates a new output tensor and adds a "node" that links the input tensor to the output tensor.

The graph of layers is a static data structure. With the Keras Functional API, we can easily access and inspect the model.

The tf.keras.applications module contains canned architectures with pre-trained weights.

  1. Let's go to download the ResNet 50 pre-trained model.
    resnet = tf.keras.applications.resnet.ResNet50()
    
  2. Then, we'll collect the output tensors of the model's intermediate layers by querying the graph data structure:
    intermediate_layers = [layer.output for layer in resnet.layers]
    
  3. Then, we'll display the first 10 intermediate layer outputs:
    intermediate_layers[:10]
    
  4. This results in the following output:
     [<tf.Tensor 'input_7:0' shape=(None, 224, 224, 3) dtype=float32>,
     <tf.Tensor 'conv1_pad/Pad:0' shape=(None, 230, 230, 3) dtype=float32>,
     <tf.Tensor 'conv1_conv/BiasAdd:0' shape=(None, 112, 112, 64) dtype=float32>,
     <tf.Tensor 'conv1_bn/cond/Identity:0' shape=(None, 112, 112, 64) dtype=float32>,
     <tf.Tensor 'conv1_relu/Relu:0' shape=(None, 112, 112, 64) dtype=float32>,
     <tf.Tensor 'pool1_pad/Pad:0' shape=(None, 114, 114, 64) dtype=float32>,
     <tf.Tensor 'pool1_pool/MaxPool:0' shape=(None, 56, 56, 64) dtype=float32>,
     <tf.Tensor 'conv2_block1_1_conv/BiasAdd:0' shape=(None, 56, 56, 64) dtype=float32>,
     <tf.Tensor 'conv2_block1_1_bn/cond/Identity:0' shape=(None, 56, 56, 64) dtype=float32>,
     <tf.Tensor 'conv2_block1_1_relu/Relu:0' shape=(None, 56, 56, 64) dtype=float32>]
    
  5. Now, we'll select all the feature layers. We'll go into the details in the convolutional neural network chapter.
    feature_layers = intermediate_layers[:-2]
    
  6. Then, we'll reuse the nodes in order to create our feature-extraction model.
    feat_extraction_model = keras.Model(inputs=resnet.input, outputs=feature_layers)
    

One of the interesting benefits of a deep learning model is that it can be reused partly or wholly on similar predictive modeling problems. This technique is called "transfer learning": it significantly improves the training phase by decreasing the training time and improving the model's performance on a related problem.

The new model architecture is based on one or more layers from a pre-trained model. The weights of the pre-trained model may be used as the starting point for the training process. They can be either fixed or fine-tuned, or totally adapted during the learning phase. The two main approaches to implement transfer learning are weight initialization and feature extraction. Don't worry, we'll go into the details later in this book.
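
As a quick preview, here is a minimal feature-extraction sketch (assuming we reload ResNet50 without its classification head via the include_top argument and freeze its weights; the 5-class head on top is purely illustrative):

# Reuse ResNet50 as a frozen feature extractor
base = tf.keras.applications.ResNet50(include_top=False, pooling='avg',
                                      input_shape=(224, 224, 3))
base.trainable = False  # freeze the pre-trained weights

new_inputs = tf.keras.Input(shape=(224, 224, 3))
features = base(new_inputs, training=False)
new_outputs = tf.keras.layers.Dense(5, activation='softmax')(features)
transfer_model = tf.keras.Model(new_inputs, new_outputs)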

In this recipe, we have loaded a pretrained model based on the ResNet-50 architecture. We have extracted nodes from this model and reused them in a new model.

How it works...

The Keras Sequential API is appropriate in the vast majority of cases but is limited to creating layer-by-layer models. The Functional API is more flexible and allows extracting and reusing nodes, sharing layers, and creating non-linear models with multiple inputs and multiple outputs. Note that many high-performing networks are based on a non-linear topology.

In this recipe, we have learned how to build models using the Keras Functional API. These models are trained and evaluated by the same compile, fit, evaluate, and predict methods used by the Keras Sequential model.

We have also viewed how to reuse trained models as a layer, how to share layers, and also how to extract and reuse nodes. This last approach is used in transfer learning techniques that speed up training and improve performance.

There's more...

As we can access every layer, models built with the Keras Functional API have specific features such as model plotting, whole-model saving, etc.

Models built with the Functional API could be complex, so here are some tips to consider to avoid pulling your hair out during the process:

  • Name the layers: It will be quite useful when we display summaries and plots of the model graph.
  • Separate submodels: Consider each submodel as being like a Lego brick that we will combine together with the others at the end.
  • Review the layer summary: Use the summary method to check the outputs of each layer.
  • Review graph plots: Use the plot method to display and check the connection between the layers.
  • Consistent variable names: Use the same variable name for the input and output layers. It avoids copy-paste mistakes.

See also

For some references on the Keras Functional API, visit the following websites:

Using the Keras Subclassing API

Keras is based on object-oriented design principles. So, we can subclass the Model class and create our own model architecture definition.

The Keras Subclassing API is the third way proposed by Keras to build deep neural network models.

This API is fully customizable, but this flexibility also brings complexity! So, hold on to your hats, it's harder to use than the Sequential or Functional API.

But you're probably wondering why we need this API if it's so hard to use. Some model architectures and custom layers can be extremely challenging to express with the other APIs, and some researchers and developers want full control over their models and the way they are trained. The Subclassing API provides this flexibility. Let's go into the details.

Getting ready

Here, we will cover the main ways of creating a custom layer and a custom model using the Keras Subclassing API.

To start, we load TensorFlow, as follows:

import tensorflow as tf
from tensorflow import keras

We are ready to proceed with an explanation of how to do it.

How to do it...

Let's start by creating our layer.

Creating a custom layer

As explained in the Understanding Keras layers section, Keras provides various built-in layers such as dense, convolutional, recurrent, and normalization layers through its layers API.

All layers are subclasses of the Layer class and implement these methods:

  • The build method, which defines the weights of the layer.
  • The call method, which specifies the transformation from inputs to outputs done by the layer.
  • The compute_output_shape method, if the layer modifies the shape of its input. This allows Keras to perform automatic shape inference.
  • The get_config and from_config methods, if the layer is serialized and deserialized.
  1. Let's put the theory into action. First, we'll subclass the Layer class to build a custom dense layer:
    class MyCustomDense(tf.keras.layers.Layer):
        # Initialize this class with the number of units
        def __init__(self, units):
            super(MyCustomDense, self).__init__()
            self.units = units
     
        # Define the weights and the bias
        def build(self, input_shape):
            self.w = self.add_weight(shape=(input_shape[-1], self.units),
                                initializer='random_normal',
                                trainable=True)
            self.b = self.add_weight(shape=(self.units,),
                                initializer='random_normal',
                                trainable=True)
     
        # Applying this layer transformation to the input tensor
        def call(self, inputs):
            return tf.matmul(inputs, self.w) + self.b
        
        # Function to retrieve the configuration
        def get_config(self):
            return {'units': self.units}
    
  2. Then, we'll create a model using the MyCustomDense layer created in the previous step:
    # Create an input layer
    inputs = keras.Input((12,4))
    # Add an instance of the MyCustomDense layer
    outputs = MyCustomDense(2)(inputs)
    # Create a model
    model = keras.Model(inputs, outputs)
    # Get the model config
    config = model.get_config()
    
  3. Next, we will reload the model from the config:
    new_model = keras.Model.from_config(config, 
                                  custom_objects={'MyCustomDense': MyCustomDense})
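
    We can also call the layer directly on a dummy tensor to check that build and call behave as expected (a minimal sanity-check sketch):

    # Quick check: calling the layer on a (3, 4) batch builds a (4, 2) kernel
    layer = MyCustomDense(2)
    out = layer(tf.ones((3, 4)))
    print(out.shape)           # (3, 2)
    print(layer.get_config())  # {'units': 2}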
    

In this recipe, we have created our Layer class. Now, we'll create our model.

Creating a custom model

By subclassing the tf.keras.Model class, we can build a fully customizable model.

We define our layers in the __init__ method, and we have complete control over the forward pass of the model by implementing the call method. The training Boolean argument can be used to specify different behavior during the training and inference phases.

  1. First, we will load the MNIST dataset and normalize the grayscale pixel values:
    mnist = tf.keras.datasets.mnist
    (X_mnist_train, y_mnist_train), (X_mnist_test, y_mnist_test) = mnist.load_data()
    train_mnist_features = X_mnist_train/255
    test_mnist_features = X_mnist_test/255
    
  2. Let's go and make a subclass Model for recognizing MNIST data:
    class MyMNISTModel(tf.keras.Model):
        def __init__(self, num_classes):
            super(MyMNISTModel, self).__init__(name='my_mnist_model')
            self.num_classes = num_classes
            self.flatten_1 = tf.keras.layers.Flatten()
            self.dropout = tf.keras.layers.Dropout(0.1)
            self.dense_1 = tf.keras.layers.Dense(50, activation='relu')
            self.dense_2 = tf.keras.layers.Dense(10, activation='softmax')
        def call(self, inputs, training=False):
            x = self.flatten_1(inputs)
            x = self.dense_1(x)
            # Apply dropout only during the training phase
            if training:
                x = self.dropout(x, training=training)
            return self.dense_2(x)
    
  3. Now, we are going to instantiate the model and process the training:
    my_mnist_model = MyMNISTModel(10)
    # Compile
    my_mnist_model.compile(optimizer='sgd',
                          loss='sparse_categorical_crossentropy',
                          metrics=['accuracy'])
    # Train
    my_mnist_model.fit(train_mnist_features, y_mnist_train,
                      validation_data=(test_mnist_features, y_mnist_test),
                      epochs=10)
    

How it works...

The Subclassing API is a way for deep learning practitioners to build their layers or models using object-oriented Keras design principles. We recommend using this API only if your model cannot be achieved using the Sequential or the Functional API. Although this way can be complicated to implement, it remains useful in a few cases, and it is interesting for all developers and researchers to know how layers and models are implemented in Keras.

See also

For some references on the Keras Subclassing API, see the following tutorials, papers, and articles:

Using the Keras Preprocessing API

The Keras Preprocessing API gathers modules for data processing and data augmentation. This API provides utilities for working with sequence, text, and image data. Data preprocessing is an essential step in machine learning and deep learning. It converts, transforms, or encodes raw data into an understandable, useful, and efficient format for learning algorithms.

Getting ready

This recipe will cover some preprocessing methods provided by Keras for sequence, text, and image data.

As usual, we just need to import TensorFlow as follows:

import tensorflow as tf
from tensorflow import keras
import numpy as np
from tensorflow.keras.preprocessing.sequence import TimeseriesGenerator, pad_sequences, skipgrams, make_sampling_table
from tensorflow.keras.preprocessing.text import text_to_word_sequence, one_hot, hashing_trick, Tokenizer
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

We are ready to proceed with an explanation of how to do it.

How to do it...

Let's start with the sequence data.

Sequence preprocessing

Sequence data is data where the order matters, such as text or a time series. For example, a time series is defined by a series of data points ordered by time.

Time series generator

Keras provides utilities for preprocessing sequence data such as time series data. The TimeseriesGenerator takes in consecutive data points and applies transformations using time series parameters such as stride, length of history, etc., to return a generator that can be fed directly to Keras models.

  1. Let's go with a toy time series dataset of 10 integer values:
    series = np.array([i for i in range(10)])
    print(series)
    
  2. This results in the following output:
    [0 1 2 3 4 5 6 7 8 9]
    
  3. We want to predict the next value from the last five lag observations. So, we'll define a generator with the length argument set to 5. This argument specifies the length of the output sequences in a number of timesteps:
    generator = TimeseriesGenerator(data = series,
                                   targets = series,
                                   length=5,
                                   batch_size=1,
                                   shuffle=False,
                                   reverse=False)
    
  4. We want to generate samples composed of 5 lag observations for one prediction and the toy time series dataset contains 10 values. So, the number of samples generated is 5:
    # number of samples
    print('Samples: %d' % len(generator))
    
  5. Then, we'll display the inputs and output of each sample and check that the data is well prepared:
    for i in range(len(generator)):
        x, y = generator[i]
        print('%s => %s' % (x, y))
    
  6. This results in the following output:
    [[0 1 2 3 4]] => [5]
    [[1 2 3 4 5]] => [6]
    [[2 3 4 5 6]] => [7]
    [[3 4 5 6 7]] => [8]
    [[4 5 6 7 8]] => [9]
    
  7. Now, we'll create and compile a model:
    model = Sequential()
    model.add(Dense(10, activation='relu', input_dim=5))
    model.add(Dense(1))
    model.compile(optimizer='adam', loss='mse')
    
  8. And we'll train the model by giving the generator as input data:
    model.fit(generator, epochs=10)
    

Preparing time series data for modeling with deep learning methods can be very challenging. But fortunately, Keras provides a generator that will help us transform a univariate or multivariate time series dataset into a data structure ready to train models. This generator offers many options to prepare the data, such as the shuffle, the sampling rate, the start and end offsets, etc. We recommend consulting the official Keras API to get more details.
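
Once the model is trained, we can ask it for the value following the last window of five observations. Here is a minimal sketch (with such a tiny toy dataset, don't expect an accurate prediction):

# Predict the value that follows the last 5 observations of the series
last_window = series[-5:].reshape((1, 5))
next_value = model.predict(last_window)
print(next_value)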

Now, we'll focus on how to prepare data for variable-length input sequences.

Padding sequences

When processing sequence data, each sample often has a different length. In order for all the sequences to fit the desired length, the solution is to pad them. Sequences shorter than the defined sequence length are padded with values at the beginning (by default) or the end of each sequence. Otherwise, if the sequence is longer than the desired length, the sequence is truncated.

  1. Let's start with four sentences:
    sentences = [["What", "do", "you", "like", "?"],
                 ["I", "like", "basket-ball", "!"],
                 ["And", "you", "?"],
                 ["I", "like", "coconut", "and", "apple"]]
    
  2. First, we'll build the vocabulary lookup table. We'll create two dictionaries to go from the words to integer identifiers and vice versa.
    text_set = set(np.concatenate(sentences))
    vocab_to_int = dict(zip(text_set, range(len(text_set))))
    int_to_vocab = {vocab_to_int[word]:word for word in vocab_to_int.keys()}
    
  3. Then after building the vocabulary lookup table, we'll encode the sentences as integer arrays.
    encoded_sentences = []
    for sentence in sentences:
        encoded_sentence = [vocab_to_int[word] for word in sentence]
        encoded_sentences.append(encoded_sentence)
    encoded_sentences
    
  4. This results in the following output:
    [[8, 4, 7, 6, 0], [5, 6, 2, 3], [10, 7, 0], [5, 6, 1, 9, 11]]
    
  5. Now, we'll use the pad_sequences function to truncate and pad sequences to a common length easily. The pre-sequence padding is activated by default.
    pad_sequences(encoded_sentences)
    
  6. This results in the following output:
    array([[ 8,  4,  7,  6,  0],
           [ 0,  5,  6,  2,  3],
           [ 0,  0, 10,  7,  0],
           [ 5,  6,  1,  9, 11]], dtype=int32)
    
  7. Then, we'll keep the default pre-sequence padding and set the maxlen argument to the desired length – here, 7.
    pad_sequences(encoded_sentences, maxlen = 7)
    
  8. This results in the following output:
    array([[ 0,  0,  8,  4,  7,  6,  0],
           [ 0,  0,  0,  5,  6,  2,  3],
           [ 0,  0,  0,  0, 10,  7,  0],
           [ 0,  0,  5,  6,  1,  9, 11]], dtype=int32)
    
  9. The length of the sequence can also be trimmed to the desired length – here, 3. By default, this function removes timesteps from the beginning of each sequence.
    pad_sequences(encoded_sentences, maxlen = 3)
    
  10. This results in the following output:
    array([[ 7,  6,  0],
           [ 6,  2,  3],
           [10,  7,  0],
           [ 1,  9, 11]], dtype=int32)
    
  11. Set the truncating argument to post to remove timesteps from the end of each sequence.
    pad_sequences(encoded_sentences, maxlen = 3, truncating='post')
    
  12. This results in the following output:
    array([[ 8,  4,  7],
           [ 5,  6,  2],
           [10,  7,  0],
           [ 5,  6,  1]], dtype=int32)
    

Padding is very useful when we want all sequences in a list to have the same length.

In the next section, we will cover a very popular technique for preprocessing text.

Skip-grams

Skip-grams is one of the unsupervised learning techniques in natural language processing. It finds the most related words for a given word and predicts the context word for this given word.

Keras provides the skipgrams pre-processing function, which takes in an integer-encoded sequence of words and returns the relevance for each pair of words in the defined window. If the pair of words is relevant, the sample is positive, and the associated label is set to 1. Otherwise, the sample is considered negative, and the label is set to 0.

An example is better than thousands of words. So, let's take this sentence, "I like coconut and apple," select the first word as our "context word," and use a window size of two. We make pairs of the context word "I" with the words covered in the specified window. So, we have two pairs of words, (I, like) and (I, coconut), both labeled 1.

Let's put the theory into action:

  1. First, we'll encode a sentence as a list of word indices:
    sentence = "I like coconut and apple"
    encoded_sentence = [vocab_to_int[word] for word in sentence.split()]
    vocabulary_size = len(vocab_to_int)
    
  2. Then, we'll call the skipgrams function with a window size of 1:
    pairs, labels = skipgrams(encoded_sentence, 
                              vocabulary_size, 
                              window_size=1)
    
  3. Now, we'll print the results:
    for i in range(len(pairs)):
        print("({:s} , {:s} ) -> {:d}".format(
              int_to_vocab[pairs[i][0]], 
              int_to_vocab[pairs[i][1]], 
              labels[i]))
    
  4. This results in the following output:
    (coconut , and ) -> 1
    (apple , ! ) -> 0
    (and , coconut ) -> 1
    (apple , and ) -> 1
    (coconut , do ) -> 0
    (like , I ) -> 1
    (and , apple ) -> 1
    (like , coconut ) -> 1
    (coconut , do ) -> 0
    (I , like ) -> 1
    (coconut , like ) -> 1
    (and , do ) -> 0
    (like , coconut ) -> 0
    (I , ! ) -> 0
    (like , ! ) -> 0
    (and , coconut ) -> 0
    

Note that the non-word is defined by index 0 in the vocabulary and will be skipped. We recommend that readers consult the Keras API to find more details about skip-grams.
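
The make_sampling_table function imported earlier complements skipgrams: it builds a word-rank-based probability table used to subsample very frequent words. Here is a minimal sketch (the sampling factor is an illustrative value):

# Probability of sampling each word, indexed by word rank (most frequent first)
sampling_table = make_sampling_table(size=vocabulary_size, sampling_factor=0.01)
pairs, labels = skipgrams(encoded_sentence,
                          vocabulary_size,
                          window_size=1,
                          sampling_table=sampling_table)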

Now, let's introduce some tips to preprocess text data.

Text preprocessing

In deep learning, we cannot feed raw text directly into our network. We have to encode our text as numbers and provide integers as input. Our model will generate integers as output. This module provides utilities for preprocessing text input.

Split text to word sequence

Keras provides the text_to_word_sequence method, which transforms a sequence into a list of words or tokens.

  1. Let's go with this sentence:
    sentence = "I like coconut , I like apple"
    
  2. Then, we'll call the method that converts a sentence into a list of words. By default, this method splits the text on whitespace.
    text_to_word_sequence(sentence, lower=False) 
    
  3. This results in the following output:
    ['I', 'like', 'coconut', 'I', 'like', 'apple']
    
  4. Now, we'll set the lower argument to True, and the text will be converted to lower case:
    text_to_word_sequence(sentence, lower=True, filters=[])
    
  5. This results in the following output:
    ['i', 'like', 'coconut', ',', 'i', 'like', 'apple']
    

Note that by default, the filters argument filters out a list of characters such as punctuation. In our last code execution, we removed all the predefined filters.

Let's continue with a method to encode words or categorical features.

Tokenizer

The Tokenizer class is the utility class for text tokenization. It's the preferred approach for preparing text in deep learning.

This class takes as inputs:

  • The maximum number of words to keep. Only the most common words will be kept based on word frequency.
  • A list of characters to filter out.
  • A boolean to convert the text into lower case, or not.
  • The separator for word splitting.
  1. Let's go with these sentences:
    sentences = [["What", "do", "you", "like", "?"],
                 ["I", "like", "basket-ball", "!"],
                 ["And", "you", "?"],
                 ["I", "like", "coconut", "and", "apple"]]
    
  2. Now, we will create a Tokenizer instance and fit it on the previous sentences:
    # create the tokenizer
    t = Tokenizer()
    # fit the tokenizer on the documents
    t.fit_on_texts(sentences)
    
  3. The tokenizer creates several pieces of information about the document. We can get a dictionary containing the count for each word.
    print(t.word_counts)
    
  4. This results in the following outputs:
    OrderedDict([('what', 1), ('do', 1), ('you', 2), ('like', 3), ('?', 2), ('i', 2), ('basket-ball', 1), ('!', 1), ('and', 2), ('coconut', 1), ('apple', 1)])
    
  5. We can also get the number of documents (here, sentences) that were used to fit the Tokenizer:
    print(t.document_count)
    
  6. This results in the following outputs:
    4
    
  7. A dictionary contains, for each word, its unique integer identifier:
    print(t.word_index)
    
  8. This results in the following outputs:
    {'like': 1, 'you': 2, '?': 3, 'i': 4, 'and': 5, 'what': 6, 'do': 7, 'basket-ball': 8, '!': 9, 'coconut': 10, 'apple': 11}
    
  9. A dictionary containing, for each word, the number of documents in which it appears:
    print(t.word_docs)
    
  10. This results in the following outputs:
    defaultdict(<class 'int'>, {'do': 1, 'like': 3, 'what': 1, 'you': 2, '?': 2, '!': 1, 'basket-ball': 1, 'i': 2, 'and': 2, 'coconut': 1, 'apple': 1})
    
  11. Now, we are ready to encode our documents, thanks to the texts_to_matrix function. This function provides four different document encoding schemes to compute the coefficient for each token.

    Let's start with the binary mode, which returns whether or not each token is present in the document.

    t.texts_to_matrix(sentences, mode='binary')
    
  12. This results in the following outputs:
     [[0. 1. 1. 1. 0. 0. 1. 1. 0. 0. 0. 0.]
     [0. 1. 0. 0. 1. 0. 0. 0. 1. 1. 0. 0.]
     [0. 0. 1. 1. 0. 1. 0. 0. 0. 0. 0. 0.]
     [0. 1. 0. 0. 1. 1. 0. 0. 0. 0. 1. 1.]]
    
  13. The Tokenizer API offers another mode based on word count – it returns the count of each word in the document:
    t.texts_to_matrix(sentences, mode='count')
    
  14. This results in the following outputs:
    [[0. 1. 1. 1. 0. 0. 1. 1. 0. 0. 0. 0.]
     [0. 1. 0. 0. 1. 0. 0. 0. 1. 1. 0. 0.]
     [0. 0. 1. 1. 0. 1. 0. 0. 0. 0. 0. 0.]
     [0. 1. 0. 0. 1. 1. 0. 0. 0. 0. 1. 1.]]
    

Note that we can also use the tfidf mode or the frequency mode. The first returns the term frequency-inverse document frequency score for each word, and the second returns the frequency of each word in the document related to the total number of words in the document.

The Tokenizer API can fit the training dataset and encode text data in the training, validation, and test datasets.
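
Besides the document-matrix encodings, the fitted tokenizer can also turn each sentence into a sequence of word indices, which is usually combined with padding before feeding a network. Here is a minimal sketch reusing the tokenizer and sentences defined above:

# Encode each sentence as a list of word indices, then pad to a common length
sequences = t.texts_to_sequences(sentences)
padded = pad_sequences(sequences, maxlen=5)
print(padded)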

In this section, we have covered a few techniques to prepare text data before training and prediction.

Now, let's go on to prepare and augment images.

Image preprocessing

The data preprocessing module provides a set of tools for real-time data augmentation on image data.

In deep learning, the performance of a neural network often improves with the number of examples available in the training dataset.

The ImageDataGenerator class in the Keras preprocessing API allows the creation of new data from the training dataset. It isn't applied to the validation or test dataset because it aims to expand the number of examples in the training dataset with plausible new images. This technique is called data augmentation. Beware not to confuse data augmentation with data preparation steps such as normalization or image resizing, which are applied to all data in interaction with the model. Data augmentation includes many transformations from the field of image manipulation, such as rotation, horizontal and vertical shift, horizontal and vertical flip, brightness changes, and much more.

The strategy may differ depending on the task at hand. For example, in the MNIST dataset, which contains images of handwritten digits, applying a horizontal flip doesn't make sense: except for the digit 8, this transformation isn't appropriate.

While in the case of a baby picture, applying this kind of transformation makes sense because the image could have been taken from the left or right.

  1. Let's put the theory into action and perform a data augmentation on the CIFAR10 dataset. We will start by downloading the CIFAR dataset.
    # Load CIFAR10 Dataset
    (x_cifar10_train, y_cifar10_train), (x_cifar10_test, y_cifar10_test) = tf.keras.datasets.cifar10.load_data()
    
  2. Now, we'll create an image data generator that applies a horizontal flip, a random rotation of up to 15 degrees, and a shift of up to 3 pixels in width and height.
    datagen = tf.keras.preprocessing.image.ImageDataGenerator(
        rotation_range=15,
        width_shift_range=3,
        height_shift_range=3,
        horizontal_flip=True)
    
  3. Create an iterator on the train dataset.
    it = datagen.flow(x_cifar10_train, y_cifar10_train, batch_size=32)
    
  4. Create a model and compile it.
    model = tf.keras.models.Sequential([
       tf.keras.layers.Conv2D(filters=32, kernel_size=3, padding="same", activation="relu", input_shape=[32, 32, 3]),
       tf.keras.layers.Conv2D(filters=32, kernel_size=3, padding="same", activation="relu"),
       tf.keras.layers.MaxPool2D(pool_size=2),
       tf.keras.layers.Conv2D(filters=64, kernel_size=3, padding="same", activation="relu"),
       tf.keras.layers.Conv2D(filters=64, kernel_size=3, padding="same", activation="relu"),
       tf.keras.layers.MaxPool2D(pool_size=2),
       tf.keras.layers.Flatten(),
       tf.keras.layers.Dense(128, activation="relu"),
       tf.keras.layers.Dense(10, activation="softmax")
    ])
    model.compile(loss="sparse_categorical_crossentropy",
                 optimizer=tf.keras.optimizers.SGD(learning_rate=0.01),
                 metrics=["accuracy"])
    
  5. And process the training by calling the fit method. Take care to set the steps_per_epoch argument, which specifies the number of sample batches comprising an epoch.
    history = model.fit(it, epochs=10,
                        steps_per_epoch=len(x_cifar10_train) // 32,
                        validation_data=(x_cifar10_test, y_cifar10_test))
    

With the image data generator, we have extended the size of our original dataset by creating new images. With more images, the training of a deep learning model can be improved.
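
If we want to look at what the generator actually produces, we can pull a single batch from the iterator. Here is a minimal sketch; note that pixel rescaling (for example, the rescale argument of ImageDataGenerator) can also be added so that every generated image is normalized:

# Retrieve one augmented batch from the iterator
batch_images, batch_labels = next(it)
print(batch_images.shape, batch_labels.shape)  # (32, 32, 32, 3) (32, 1)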

How it works...

The Keras Preprocessing API allows transforming, encoding, and augmenting data for neural networks. It makes it easier to work with sequence, text, and image data.

First, we introduced the Keras Sequence Preprocessing API. We used the time series generator to transform a univariate or multivariate time series dataset into a data structure ready to train models. Then, we focused on the data preparation for variable-length input sequences, aka padding. And we finished this first part with the skip-gram technique, which finds the most related words for a given word and predicts the context word for that given word.

Then, we covered the Keras Text Preprocessing API, which offers a complete turnkey solution to process natural language. We learned how to split text into words and tokenize the words using binary, word count, tfidf, or frequency mode.

Finally, we focused on the Image Preprocessing API using the ImageDataGenerator, which is a real advantage to increase the size of your training dataset and to work with images.

See also

For some references on the Keras Preprocessing API, visit the following websites:
