In this chapter, we will focus on the high-level TensorFlow API named Keras.
By the end of this chapter, you should have a better understanding of:
In the previous chapter, we covered TensorFlow's fundamentals, and we are now able to set up a computational graph. This chapter will introduce Keras, a high-level neural network API written in Python with multiple backends. TensorFlow is one of them. François Chollet, a French software engineer and AI researcher currently working at Google, created Keras for his own personal use before it was open-sourced in 2015. Keras's primary goal is to provide an easy-to-use and accessible library to enable fast experiments.
TensorFlow v1 suffers from usability issues; in particular, a sprawling and sometimes confusing API. For example, TensorFlow v1 offers two high-level APIs:
With TensorFlow v2, Keras became the official high-level API. Keras can scale and suit various user profiles, from research to application development and from model training to deployment. Keras provides four key advantages: it's user-friendly (without sacrificing flexibility and performance), modular, composable, and scalable.
The TensorFlow Keras API follows the Keras API specification. However, the TensorFlow implementation of Keras has been optimized for TensorFlow: it integrates TensorFlow-specific functionality, such as eager execution, data pipelines, and Estimators.
From a user's perspective, the difference between Keras, the independent library, and Keras as integrated into TensorFlow is mainly the way we import it.
Here is the command to import the Keras API specification:
import keras
Here is TensorFlow's implementation of the Keras API specification:
import tensorflow as tf
from tensorflow import keras
Now, let's start by discovering the basic building blocks of Keras.
Keras layers are the fundamental building blocks of Keras models. Each layer receives data as input, does a specific task, and returns an output.
Keras includes a wide range of built-in layers:
We can also write our own Keras layers, as explained in the Keras Subclassing API section of this chapter.
To start, we'll review some methods that are common to all Keras layers. These methods are very useful for inspecting the configuration and the state of a layer.

The get_weights() function returns the weights of the layer as a list of NumPy arrays:
layer.get_weights()
The set_weights() method sets the weights of the layer from a list of NumPy arrays:
layer.set_weights(weights)
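For instance, here is a minimal sketch (assuming tensorflow is imported as tf and numpy as np) that builds a small Dense layer on dummy data, reads its weights, and writes zeroed weights back:
layer = tf.keras.layers.Dense(4)
# Calling the layer on dummy data builds its kernel and bias
layer(tf.zeros((1, 8)))
kernel, bias = layer.get_weights()
print(kernel.shape, bias.shape)  # (8, 4) (4,)
# Write back zeroed weights with the same shapes
layer.set_weights([np.zeros_like(kernel), np.zeros_like(bias)])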
We can retrieve the input and output tensors of a layer via the following attributes:
layer.input
layer.output
Or these methods, if the layer has multiple nodes:
layer.get_input_at(node_index)
layer.get_output_at(node_index)
Likewise, we can retrieve the input and output shapes:
layer.input_shape
layer.output_shape
Or these methods, if the layer has multiple nodes:
layer.get_input_shape_at(node_index)
layer.get_output_shape_at(node_index)
The get_config() function returns a dictionary containing the configuration of the layer:
layer.get_config()
The from_config() method instantiates a layer from its configuration:
layer.from_config(config)
Note that the layer configuration is stored in an associative array (Python dictionary), a data structure that maps keys to values.
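For example, a layer can be round-tripped through its configuration (a small sketch with a standard Dense layer); note that the configuration describes the architecture only, not the weights:
layer = tf.keras.layers.Dense(256, activation='relu')
config = layer.get_config()
# config is a plain Python dictionary, e.g. {'units': 256, 'activation': 'relu', ...}
new_layer = tf.keras.layers.Dense.from_config(config)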
The layers are the building blocks of the models. Keras offers a wide range of built-in layers along with useful methods for getting insight into what's happening inside our models.
With Keras, we can build models in three ways: with the Sequential, the Functional, or the Subclassing API. We'll later see that only the last two APIs allow access to the layers.
For some references on the Keras Layers API, see the following documentation:
The main goal of Keras is to make it easy to create deep learning models. The Sequential API allows us to create Sequential models, which are a linear stack of layers. Models that are connected layer by layer can solve many problems. To create a Sequential model, we have to create an instance of a Sequential class, create some model layers, and add them to it.
We will go from the creation of our Sequential model to its prediction via the compilation, training, and evaluation steps. By the end of this recipe, you will have a Keras model ready to be deployed in production.
This recipe will cover the main ways of creating a Sequential model and assembling layers to build a model with the Keras Sequential API.
To start, we load TensorFlow and NumPy, as follows:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import Dense
import numpy as np
We are ready to proceed with an explanation of how to do it.
First, we create a Sequential model by passing a list of layer instances to the constructor:
model = tf.keras.Sequential([
# Add a fully connected layer with 1024 units to the model
tf.keras.layers.Dense(1024, input_dim=64),
# Add an activation layer with ReLU activation function
tf.keras.layers.Activation('relu'),
# Add a fully connected layer with 256 units to the model
tf.keras.layers.Dense(256),
# Add an activation layer with ReLU activation function
tf.keras.layers.Activation('relu'),
# Add a fully connected layer with 10 units to the model
tf.keras.layers.Dense(10),
# Add an activation layer with softmax activation function
tf.keras.layers.Activation('softmax')
])
Another way to create a Sequential model is to instantiate a Sequential class and then add layers via the .add() method.
model = tf.keras.Sequential()
# Add a fully connected layer with 1024 units to the model
model.add(tf.keras.layers.Dense(1024, input_dim=64))
# Add an activation layer with ReLU activation function
model.add(tf.keras.layers.Activation('relu'))
# Add a fully connected layer with 256 units to the model
model.add(tf.keras.layers.Dense(256))
# Add an activation layer with ReLU activation function
model.add(tf.keras.layers.Activation('relu'))
# Add a fully connected layer with 10 units to the model
model.add(tf.keras.layers.Dense(10))
# Add an activation layer with softmax activation function
model.add(tf.keras.layers.Activation('softmax'))
The tf.keras.layers API offers a lot of built-in layers and also provides an API for creating our own layers. In most of them, we can set the following kinds of parameters in the layer's constructor:

# Creation of a dense layer with a sigmoid activation function:
Dense(256, activation='sigmoid')
# Or:
Dense(256, activation=tf.keras.activations.sigmoid)
# A dense layer with a kernel initialized with a random normal distribution:
Dense(256, kernel_initializer='random_normal')
# A dense layer with a bias vector initialized with a constant value of 5.0:
Dense(256, bias_initializer=tf.keras.initializers.Constant(value=5))
# A dense layer with L1 regularization of factor 0.01 applied to the kernel matrix:
Dense(256, kernel_regularizer=tf.keras.regularizers.l1(0.01))
# A dense layer with L2 regularization of factor 0.01 applied to the bias vector:
Dense(256, bias_regularizer=tf.keras.regularizers.l2(0.01))
Each type of layer requires input with a certain number of dimensions, so there are different ways to specify the input shape depending on the kind of layer. Here, we'll focus on the Dense layer, so we'll use the input_dim parameter. Since the shape of the weights depends on the input size, if the input shape isn't specified in advance, the model has no weights: the model is not built. In this case, you can't call methods or attributes such as summary, layers, weights, and so on.
In this recipe, we'll create datasets with 64 features, and we'll process batches of 10 samples. The shape of our input data is (10, 64), aka (batch_size, number_of_features). By default, a Keras model is defined to support any batch size, so the batch size isn't mandatory. We just have to specify the number of features through the input_dim parameter of our first layer:
Dense(256, input_dim=64)
However, we can force the batch size for efficiency reasons with the batch_size argument:
Dense(256, input_dim=64, batch_size=10)
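To illustrate the delayed-build pattern mentioned above, here is a small sketch (the variable name unbuilt_model is illustrative): without an input shape, the model has no weights until it is first called on data:
unbuilt_model = tf.keras.Sequential([tf.keras.layers.Dense(256)])
# No input shape was given, so the model isn't built yet
print(unbuilt_model.built)  # False
# Calling the model on a batch of 10 samples with 64 features builds it
unbuilt_model(tf.zeros((10, 64)))
print(unbuilt_model.weights[0].shape)  # (64, 256)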
The learning process is configured with the compile method. We have to specify:

- An optimizer from the tf.keras.optimizers module. For example, we can use an instance of tf.keras.optimizers.RMSprop or the string 'RMSprop', an optimizer that implements the RMSprop algorithm.
- A loss function. This can be specified by its string name (categorical_crossentropy or mse), a symbolic TensorFlow loss function (tf.keras.losses.MAPE), or a custom loss function, which takes as input two tensors (the true and predicted tensors) and returns a scalar for each data point.
- Optionally, a list of metrics to be evaluated during training and testing, from the tf.keras.metrics module.
- Optionally, run_eagerly, which can be set to True to run the model eagerly.

Note that the graph is finalized with the compile method.
Now, we'll compile the model using the Adam optimizer for categorical cross-entropy loss and display the accuracy metric.
model.compile(
optimizer="adam",
loss="categorical_crossentropy",
metrics=["accuracy"]
)
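As mentioned above, the loss can also be a custom function taking the true and predicted tensors and returning a value per data point. Here is a minimal sketch with a hypothetical hand-written mean squared error (my_custom_mse is not part of Keras):
def my_custom_mse(y_true, y_pred):
    # Return one scalar error per data point
    return tf.reduce_mean(tf.square(y_true - y_pred), axis=-1)

model.compile(optimizer="adam",
              loss=my_custom_mse,
              metrics=["accuracy"])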
Now, let's build toy datasets for training, validation, and testing:
data = np.random.random((2000, 64))
labels = np.random.random((2000, 10))
val_data = np.random.random((500, 64))
val_labels = np.random.random((500, 10))
test_data = np.random.random((500, 64))
test_labels = np.random.random((500, 10))
The model is trained using the fit method. The training configuration is done by these three arguments:

- The number of epochs, where one epoch is an iteration over the entire input data.
- The number of samples per gradient update, specified through the batch_size argument. Note that the last batch may be smaller if the total number of samples is not divisible by the batch size.
- The validation data, passed through the validation_data argument (a tuple of inputs and labels). This dataset makes it easy to monitor the performance of the model: the loss and metrics are computed in inference mode at the end of each epoch.

Now, we'll train the model on our toy datasets by calling the fit method:
model.fit(data, labels, epochs=10, batch_size=50,
validation_data=(val_data, val_labels))
We evaluate the model with the model.evaluate function, which returns the loss value and the metric values of the model in test mode. Computation is done in batches. It has three important arguments: the input data, the target data, and the batch size. This function predicts the output for a given input, then computes the metrics specified in model.compile based on the target data and the model's predictions, and returns the computed metric values as output.
model.evaluate(data, labels, batch_size=50)
The tf.keras.Model.predict method takes only data as input and returns a prediction. Here's how to predict, in inference mode, the output of the last layer for the data provided, as a NumPy array:
result = model.predict(data, batch_size=50)
Analyzing this model's performance is of no interest in this recipe because we randomly generated a dataset.
Now, let's move on to an analysis of this recipe.
Keras provides the Sequential API to create models composed of a linear stack of layers. We can either pass a list of layer instances as an array to the constructor or use the add method.
Keras provides different kinds of layers. Most of them share some common constructor arguments such as activation, kernel_initializer and bias_initializer, and kernel_regularizer and bias_regularizer.
Take care with the delayed-build pattern: if no input shape is specified on the first layer, the model gets built the first time it is called on some input data or when methods such as fit, evaluate, predict, and summary are called. The graph is finalized with the compile method, which configures the model before the learning phase. Then, we can evaluate the model or make predictions.
For some references on the Keras Sequential API, visit the following websites:
The Keras Sequential API is great for developing deep learning models in most situations. However, this API has some limitations, such as a linear topology, that could be overcome with the Functional API. Note that many high-performing networks are based on a non-linear topology such as Inception, ResNet, etc.
The Functional API allows defining complex models with a non-linear topology, multiple inputs, multiple outputs, residual connections with non-sequential flows, and shared and reusable layers.
The deep learning model is usually a directed acyclic graph (DAG). The Functional API is a way to build a graph of layers and create more flexible models than the tf.keras.Sequential API.
This recipe will cover the main ways of creating a Functional model: using callable models, manipulating complex graph topologies, sharing layers, and finally introducing the concept of the layer "node" in the Keras Functional API.
As usual, we just need to import TensorFlow as follows:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import Input, Dense, TimeDistributed
We are ready to proceed with an explanation of how to do it.
Let's make a Functional model for recognizing handwritten digits from the MNIST dataset. We will predict the digits from grayscale images.
mnist = tf.keras.datasets.mnist
(X_mnist_train, y_mnist_train), (X_mnist_test, y_mnist_test) = mnist.load_data()
Input() is used to instantiate a Keras tensor:
inputs = tf.keras.Input(shape=(28,28))
Next, we create a flatten layer instance:
flatten_layer = keras.layers.Flatten()
Then, we call the flatten_layer on the inputs object:
flatten_output = flatten_layer(inputs)
The "layer call" action is like drawing an arrow from inputs
to the flatten_layer
. We're "passing" the inputs to the flatten layer, and as a result, it produces outputs. A layer instance is callable (on a tensor) and returns a tensor.
Then, we create a dense layer with 50 units and a ReLU activation function, and we call it on the flattened output:
dense_layer = tf.keras.layers.Dense(50, activation='relu')
dense_output = dense_layer(flatten_output)
Finally, we add a second dense layer to do a classification task between 10 classes:
predictions = tf.keras.layers.Dense(10, activation='softmax')(dense_output)
Now, we create the model by specifying its inputs and outputs in the graph of layers:
model = keras.Model(inputs=inputs, outputs=predictions)
model.summary()
Figure 3.1: Summary of the model
The model is trained and evaluated by the same compile, fit, evaluate, and predict methods used in the Keras Sequential model. Let's compile and train our model:
model.compile(optimizer='sgd',
loss='sparse_categorical_crossentropy',
metrics=['accuracy'])
model.fit(X_mnist_train, y_mnist_train,
          validation_data=(X_mnist_test, y_mnist_test),
          epochs=10)
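As a quick sanity check (a small sketch), predict returns one probability per digit class for each input image:
predictions = model.predict(X_mnist_test[:3])
print(predictions.shape)  # (3, 10): one probability per digit for each image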
In this recipe, we have built a model using the Functional API.
Let's go into the details of the Functional API with callable models.
We can treat any model as if it were a layer, by calling it on a tensor:
x = Input(shape=(28, 28))
# y will contain the prediction for x
y = model(x)
Note that by calling a model, we are not just reusing the model architecture, we are also reusing its weights.
Suppose we want to process sequences of images, such as the frames of a video, with our MNIST model. We can use the TimeDistributed layer wrapper. This wrapper applies our previous model to every temporal slice of the input sequence or, in other words, to each image of our video.
from tensorflow.keras.layers import TimeDistributed
# Input tensor for sequences of 10 timesteps,
# each containing a 28x28 dimensional matrix.
input_sequences = tf.keras.Input(shape=(10, 28, 28))
# We will apply the previous model to each timestep, i.e., to each image.
# The MNIST model returns a vector of 10 probabilities (one for each digit).
# The TimeDistributed output will be a sequence of 10 vectors of size 10.
processed_sequences = tf.keras.layers.TimeDistributed(model)(input_sequences)
We have seen that models are callable like layers. Now, we'll learn how to create complex models with a non-linear topology.
The Functional API makes it easy to manipulate a large number of intertwined datastreams with multiple inputs and outputs and non-linear connectivity topologies. These cannot be handled with the Sequential API, which isn't able to create a model with layers that aren't connected sequentially or with multiple inputs or outputs.
Let's go with an example. We're going to build a system for predicting the price of a specific house and the elapsed time before its sale.
The model will have two inputs:
This model will have two outputs:
house_data_inputs = tf.keras.Input(shape=(128,), name='house_data')
x = tf.keras.layers.Dense(64, activation='relu')(house_data_inputs)
block_1_output = tf.keras.layers.Dense(32, activation='relu')(x)
house_picture_inputs = tf.keras.Input(shape=(128,128,3), name='house_picture')
x = tf.keras.layers.Conv2D(64, 3, activation='relu', padding='same')(house_picture_inputs)
x = tf.keras.layers.Conv2D(64, 3, activation='relu', padding='same')(x)
block_2_output = tf.keras.layers.Flatten()(x)
x = tf.keras.layers.concatenate([block_1_output, block_2_output])
price_pred = tf.keras.layers.Dense(1, name='price', activation='relu')(x)
time_elapsed_pred = tf.keras.layers.Dense(2, name='elapsed_time', activation='softmax')(x)
model = keras.Model([house_data_inputs, house_picture_inputs],
[price_pred, time_elapsed_pred],
name='toy_house_pred')
keras.utils.plot_model(model, 'multi_input_and_output_model.png', show_shapes=True)
Figure 3.2: Plot of a model with multiple inputs and outputs
In this recipe, we have created a complex model using the Functional API with multiple inputs and outputs that predicts the price of a specific house and the elapsed time before its sale. Now, we'll introduce the concept of shared layers.
Some models reuse the same layer multiple times inside their architecture. These layer instances learn features that correspond to multiple paths in the graph of layers. Shared layers are often used to encode inputs from similar spaces.
To share a layer (weights and all) across different inputs, we only need to instantiate the layer once and call it on as many inputs as we want.
Let's consider two different sequences of text. We will apply the same embedding layer to these two sequences, which feature similar vocabulary.
# Variable-length sequence of integers
text_input_a = tf.keras.Input(shape=(None,), dtype='int32')
# Variable-length sequence of integers
text_input_b = tf.keras.Input(shape=(None,), dtype='int32')
# Embedding for 1000 unique words mapped to 128-dimensional vectors
shared_embedding = tf.keras.layers.Embedding(1000, 128)
# Reuse the same layer to encode both inputs
encoded_input_a = shared_embedding(text_input_a)
encoded_input_b = shared_embedding(text_input_b)
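To convince ourselves that the layer is truly shared, here is a quick check (a small sketch): both encoded outputs come from the same Embedding instance, so there is a single weight matrix behind them:
# The shared layer owns exactly one (1000, 128) embedding matrix
print(len(shared_embedding.weights))      # 1
print(shared_embedding.weights[0].shape)  # (1000, 128)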
In this recipe, we have learned how to reuse a layer multiple times in the same model. Now, we'll introduce the concept of extracting and reusing a layer.
In the first recipe of this chapter, we saw that a layer is an instance that takes a tensor as an argument and returns another tensor. A model is composed of several layer instances, and these instances are objects chained to one another through their input and output tensors. Each time we call a layer on an input, the layer produces a new output tensor and adds a "node" linking the input to the output.
The graph of layers is a static data structure. With the Keras Functional API, we can easily access and inspect the model.
The tf.keras.applications module contains canned architectures with pre-trained weights. Let's load the ResNet50 model and gather the output tensors of its intermediate layers:
resnet = tf.keras.applications.resnet.ResNet50()
intermediate_layers = [layer.output for layer in resnet.layers]
intermediate_layers[:10]
[<tf.Tensor 'input_7:0' shape=(None, 224, 224, 3) dtype=float32>,
<tf.Tensor 'conv1_pad/Pad:0' shape=(None, 230, 230, 3) dtype=float32>,
<tf.Tensor 'conv1_conv/BiasAdd:0' shape=(None, 112, 112, 64) dtype=float32>,
<tf.Tensor 'conv1_bn/cond/Identity:0' shape=(None, 112, 112, 64) dtype=float32>,
<tf.Tensor 'conv1_relu/Relu:0' shape=(None, 112, 112, 64) dtype=float32>,
<tf.Tensor 'pool1_pad/Pad:0' shape=(None, 114, 114, 64) dtype=float32>,
<tf.Tensor 'pool1_pool/MaxPool:0' shape=(None, 56, 56, 64) dtype=float32>,
<tf.Tensor 'conv2_block1_1_conv/BiasAdd:0' shape=(None, 56, 56, 64) dtype=float32>,
<tf.Tensor 'conv2_block1_1_bn/cond/Identity:0' shape=(None, 56, 56, 64) dtype=float32>,
<tf.Tensor 'conv2_block1_1_relu/Relu:0' shape=(None, 56, 56, 64) dtype=float32>]
Then, we use these intermediate outputs to create a new feature-extraction model:
feature_layers = intermediate_layers[:-2]
feat_extraction_model = keras.Model(inputs=resnet.input, outputs=feature_layers)
One of the interesting benefits of a deep learning model is that it can be reused partly or wholly on similar predictive modeling problems. This technique is called "transfer learning": it significantly improves the training phase by decreasing the training time, and it often improves the model's performance on the related problem.
The new model architecture is based on one or more layers from a pre-trained model. The weights of the pre-trained model may be used as the starting point for the training process. They can be either fixed or fine-tuned, or totally adapted during the learning phase. The two main approaches to implement transfer learning are weight initialization and feature extraction. Don't worry, we'll go into the details later in this book.
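As a sketch of the feature-extraction approach (the five-class head and the variable names here are purely illustrative), we can freeze the pre-trained layers and train only a new classifier on top:
# Reuse ResNet50 as a frozen feature extractor
base = tf.keras.applications.resnet.ResNet50(include_top=False, pooling='avg')
base.trainable = False  # keep the pre-trained weights fixed
# Hypothetical new head for a 5-class problem
inputs = tf.keras.Input(shape=(224, 224, 3))
features = base(inputs, training=False)
outputs = tf.keras.layers.Dense(5, activation='softmax')(features)
transfer_model = tf.keras.Model(inputs, outputs)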
In this recipe, we have loaded a pre-trained model based on the ResNet50 architecture. We have extracted nodes from this model and reused them in a new model.
The Keras Sequential API is appropriate in the vast majority of cases but is limited to creating layer-by-layer models. The Functional API is more flexible and allows extracting and reusing nodes, sharing layers, and creating non-linear models with multiple inputs and multiple outputs. Note that many high-performing networks are based on a non-linear topology.
In this recipe, we have learned how to build models using the Keras Functional API. These models are trained and evaluated by the same compile, fit, evaluate, and predict methods used by the Keras Sequential model.
We have also viewed how to reuse trained models as a layer, how to share layers, and also how to extract and reuse nodes. This last approach is used in transfer learning techniques that speed up training and improve performance.
As we can access every layer, models built with the Keras Functional API have specific features such as model plotting, whole-model saving, etc.
Models built with the Functional API could be complex, so here are some tips to consider to avoid pulling your hair out during the process:
- Use the summary method to check the outputs of each layer.
- Use the plot method to display and check the connections between the layers.

For some references on the Keras Functional API, visit the following websites:
tf.keras.Model API: https://www.tensorflow.org/api_docs/python/tf/keras/Model

Keras is based on object-oriented design principles. So, we can subclass the Model class and create our model architecture definition.
The Keras Subclassing API is the third way proposed by Keras to build deep neural network models.
This API is fully customizable, but this flexibility also brings complexity! So, hold on to your hats, it's harder to use than the Sequential or Functional API.
But you're probably wondering why we need this API if it's so hard to use. Some model architectures and some custom layers can be extremely challenging. Some researchers and some developers hope to have full control of their models and the way to train them. The Subclassing API provides these features. Let's go into the details.
Here, we will cover the main ways of creating a custom layer and a custom model using the Keras Subclassing API.
To start, we load TensorFlow, as follows:
import tensorflow as tf
from tensorflow import keras
We are ready to proceed with an explanation of how to do it.
Let's start by creating our layer.
As explained in the Understanding Keras layers section, Keras provides various built-in layers such as dense, convolutional, recurrent, and normalization layers through its layered API.
All layers are subclasses of the Layer class and implement these methods:

- The build method, which defines the weights of the layer.
- The call method, which specifies the transformation from inputs to outputs done by the layer.
- The compute_output_shape method, if the layer modifies the shape of its input. This allows Keras to perform automatic shape inference.
- The get_config and from_config methods, if the layer needs to be serialized and deserialized.

Let's create our own dense layer by subclassing the Layer class:

class MyCustomDense(tf.keras.layers.Layer):
    # Initialize this class with the number of units
    def __init__(self, units):
        super(MyCustomDense, self).__init__()
        self.units = units

    # Define the weights and the bias
    def build(self, input_shape):
        self.w = self.add_weight(shape=(input_shape[-1], self.units),
                                 initializer='random_normal',
                                 trainable=True)
        self.b = self.add_weight(shape=(self.units,),
                                 initializer='random_normal',
                                 trainable=True)

    # Apply this layer's transformation to the input tensor
    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b

    # Function to retrieve the configuration
    def get_config(self):
        return {'units': self.units}
Now, let's build a model using the MyCustomDense layer created in the previous step:
# Create an input layer
inputs = keras.Input((12,4))
# Add an instance of the MyCustomDense layer
outputs = MyCustomDense(2)(inputs)
# Create a model
model = keras.Model(inputs, outputs)
# Get the model config
config = model.get_config()
# Reload the model from the config
new_model = keras.Model.from_config(config,
                                    custom_objects={'MyCustomDense': MyCustomDense})
In this recipe, we have created our own Layer class. Now, we'll create our own model.
By subclassing the tf.keras.Model class, we can build a fully customizable model.
We define our layers in the __init__ method, and we have full control over the forward pass of the model by implementing the call method. The training Boolean argument can be used to specify different behavior during the training and inference phases.
First, we load and normalize the MNIST dataset:
mnist = tf.keras.datasets.mnist
(X_mnist_train, y_mnist_train), (X_mnist_test, y_mnist_test) = mnist.load_data()
train_mnist_features = X_mnist_train/255
test_mnist_features = X_mnist_test/255
Then, we subclass Model to create our own model for recognizing MNIST data:
class MyMNISTModel(tf.keras.Model):
    def __init__(self, num_classes):
        super(MyMNISTModel, self).__init__(name='my_mnist_model')
        self.num_classes = num_classes
        self.flatten_1 = tf.keras.layers.Flatten()
        self.dropout = tf.keras.layers.Dropout(0.1)
        self.dense_1 = tf.keras.layers.Dense(50, activation='relu')
        self.dense_2 = tf.keras.layers.Dense(10, activation='softmax')

    def call(self, inputs, training=False):
        x = self.flatten_1(inputs)
        x = self.dense_1(x)
        # Apply dropout only during the training phase
        if training:
            x = self.dropout(x, training=training)
        return self.dense_2(x)
my_mnist_model = MyMNISTModel(10)
# Compile
my_mnist_model.compile(optimizer='sgd',
loss='sparse_categorical_crossentropy',
metrics=['accuracy'])
# Train
my_mnist_model.fit(train_mnist_features, y_mnist_train,
                   validation_data=(test_mnist_features, y_mnist_test),
                   epochs=10)
The Subclassing API is a way for deep learning practitioners to build their layers or models using object-oriented Keras design principles. We recommend using this API only if your model cannot be achieved using the Sequential or the Functional API. Although this way can be complicated to implement, it remains useful in a few cases, and it is interesting for all developers and researchers to know how layers and models are implemented in Keras.
For some references on the Keras Subclassing API, see the following tutorials, papers, and articles:
The Keras Preprocessing API gathers modules for data processing and data augmentation. This API provides utilities for working with sequence, text, and image data. Data preprocessing is an essential step in machine learning and deep learning. It converts, transforms, or encodes raw data into an understandable, useful, and efficient format for learning algorithms.
This recipe will cover some preprocessing methods provided by Keras for sequence, text, and image data.
As usual, we just need to import TensorFlow as follows:
import tensorflow as tf
from tensorflow import keras
import numpy as np
from tensorflow.keras.preprocessing.sequence import TimeseriesGenerator, pad_sequences, skipgrams, make_sampling_table
from tensorflow.keras.preprocessing.text import text_to_word_sequence, one_hot, hashing_trick, Tokenizer
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
We are ready to proceed with an explanation of how to do it.
Let's start with the sequence data.
Sequence data is data where the order matters, such as text or a time series. A time series is a series of data points ordered by time.
Keras provides utilities for preprocessing sequence data such as time series data. It takes in consecutive data points and applies transformations using time series parameters such as stride, length of history, etc., to return a TensorFlow dataset instance.
First, we create a toy series of 10 integers:
series = np.array([i for i in range(10)])
print(series)
[0 1 2 3 4 5 6 7 8 9]
Then, we create our time series generator with the length argument set to 5. This argument specifies the length of the output sequences in a number of timesteps:
generator = TimeseriesGenerator(data = series,
targets = series,
length=5,
batch_size=1,
shuffle=False,
reverse=False)
# number of samples
print('Samples: %d' % len(generator))
for i in range(len(generator)):
x, y = generator[i]
print('%s => %s' % (x, y))
[[0 1 2 3 4]] => [5]
[[1 2 3 4 5]] => [6]
[[2 3 4 5 6]] => [7]
[[3 4 5 6 7]] => [8]
[[4 5 6 7 8]] => [9]
Now, we can define a simple model, compile it, and fit it directly on the generator:
model = Sequential()
model.add(Dense(10, activation='relu', input_dim=5))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
model.fit(generator, epochs=10)
Preparing time series data for modeling with deep learning methods can be very challenging. Fortunately, Keras provides a generator that helps us transform a univariate or multivariate time series dataset into a data structure ready for training models. This generator offers many options to prepare the data, such as shuffling, the sampling rate, the start and end offsets, etc. We recommend consulting the official Keras API for more details.
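For instance, here is a small sketch on the same toy series showing the sampling_rate option, which keeps one timestep out of every sampling_rate within each window:
generator = TimeseriesGenerator(data=series, targets=series,
                                length=6, sampling_rate=2,
                                batch_size=1)
x, y = generator[0]
print('%s => %s' % (x, y))  # [[0 2 4]] => [6]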
Now, we'll focus on how to prepare data for variable-length input sequences.
When processing sequence data, each sample often has different lengths. In order for all the sequences to fit the desired length, the solution is to pad them. Sequences shorter than the defined sequence length are padded with values at the end (by default) or the beginning of each sequence. Otherwise, if the sequence is greater than the desired length, the sequence is truncated.
sentences = [["What", "do", "you", "like", "?"],
["I", "like", "basket-ball", "!"],
["And", "you", "?"],
["I", "like", "coconut", "and", "apple"]]
First, we build the vocabulary and the mappings between words and integers:
text_set = set(np.concatenate(sentences))
vocab_to_int = dict(zip(text_set, range(len(text_set))))
int_to_vocab = {vocab_to_int[word]:word for word in vocab_to_int.keys()}
Then, we encode each sentence as a list of integers:
encoded_sentences = []
for sentence in sentences:
encoded_sentence = [vocab_to_int[word] for word in sentence]
encoded_sentences.append(encoded_sentence)
encoded_sentences
[[8, 4, 7, 6, 0], [5, 6, 2, 3], [10, 7, 0], [5, 6, 1, 9, 11]]
Keras provides the pad_sequences function to truncate and pad sequences to a common length easily. Pre-sequence padding is applied by default:
pad_sequences(encoded_sentences)
array([[ 8, 4, 7, 6, 0],
[ 0, 5, 6, 2, 3],
[ 0, 0, 10, 7, 0],
[ 5, 6, 1, 9, 11]], dtype=int32)
To pad the sequences to a desired length, we set the maxlen argument – here, 7:
pad_sequences(encoded_sentences, maxlen = 7)
array([[ 0, 0, 8, 4, 7, 6, 0],
[ 0, 0, 0, 5, 6, 2, 3],
[ 0, 0, 0, 0, 10, 7, 0],
[ 0, 0, 5, 6, 1, 9, 11]], dtype=int32)
If maxlen is smaller than a sequence's length, the sequence is truncated – by default, timesteps are removed from the beginning:
pad_sequences(encoded_sentences, maxlen = 3)
array([[ 7, 6, 0],
[ 6, 2, 3],
[10, 7, 0],
[ 1, 9, 11]], dtype=int32)
We can also set the truncating argument to post to remove timesteps from the end of each sequence:
pad_sequences(encoded_sentences, maxlen = 3, truncating='post')
array([[ 8, 4, 7],
[ 5, 6, 2],
[10, 7, 0],
[ 5, 6, 1]], dtype=int32)
Padding is very useful when we want all sequences in a list to have the same length.
In the next section, we will cover a very popular technique for preprocessing text.
Skip-gram is an unsupervised learning technique in natural language processing: it finds the most related words for a given word and predicts the context words for that given word.
Keras provides the skipgrams preprocessing function, which takes in an integer-encoded sequence of words and returns the relevance of each pair of words in the defined window. If the pair of words is relevant, the sample is positive, and the associated label is set to 1. Otherwise, the sample is considered negative, and the label is set to 0.
An example is better than a thousand words, so let's take the sentence "I like coconut and apple," select the first word as our "context word," and use a window size of two. We make pairs of the context word "I" with each word covered by the specified window. So, we have two pairs of words, (I, like) and (I, coconut), both of which are labeled 1.
Let's put the theory into action:
sentence = "I like coconut and apple"
encoded_sentence = [vocab_to_int[word] for word in sentence.split()]
vocabulary_size = len(encoded_sentence)
Then, we call the skipgrams function with a window size of 1:
pairs, labels = skipgrams(encoded_sentence,
vocabulary_size,
window_size=1,
negative_samples=0)
for i in range(len(pairs)):
print("({:s} , {:s} ) -> {:d}".format(
int_to_vocab[pairs[i][0]],
int_to_vocab[pairs[i][1]],
labels[i]))
(coconut , and ) -> 1
(apple , ! ) -> 0
(and , coconut ) -> 1
(apple , and ) -> 1
(coconut , do ) -> 0
(like , I ) -> 1
(and , apple ) -> 1
(like , coconut ) -> 1
(coconut , do ) -> 0
(I , like ) -> 1
(coconut , like ) -> 1
(and , do ) -> 0
(like , coconut ) -> 0
(I , ! ) -> 0
(like , ! ) -> 0
(and , coconut ) -> 0
Note that the non-word is defined by index 0 in the vocabulary and will be skipped. We recommend that readers consult the Keras API for more details about the skipgrams function.
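The make_sampling_table function imported at the beginning of this recipe works together with skipgrams: it builds a word-rank-based probability table used to subsample very frequent words. A minimal sketch (on such a tiny vocabulary, most words get subsampled away, so this only shows the plumbing):
# Probability of keeping each word, indexed by word rank
sampling_table = make_sampling_table(vocabulary_size)
pairs, labels = skipgrams(encoded_sentence,
                          vocabulary_size,
                          window_size=1,
                          sampling_table=sampling_table)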
Now, let's introduce some tips to preprocess text data.
In deep learning, we cannot feed raw text directly into our network. We have to encode our text as numbers and provide integers as input. Our model will generate integers as output. This module provides utilities for preprocessing text input.
Keras provides the text_to_word_sequence method, which transforms a sequence into a list of words or tokens:
sentence = "I like coconut , I like apple"
text_to_word_sequence(sentence, lower=False)
['I', 'like', 'coconut', 'I', 'like', 'apple']
We can set the lower argument to True, and the text will be converted to lowercase:
text_to_word_sequence(sentence, lower=True, filters=[])
['i', 'like', 'coconut', ',', 'i', 'like', 'apple']
Note that, by default, the filters argument filters out a list of characters such as punctuation. In our last code execution, we removed all the predefined filters.
Let's continue with a method to encode words or categorical features.
The Tokenizer class is the utility class for text tokenization. It's the preferred approach for preparing text in deep learning.
This class takes as inputs:
sentences = [["What", "do", "you", "like", "?"],
["I", "like", "basket-ball", "!"],
["And", "you", "?"],
["I", "like", "coconut", "and", "apple"]]
Then, we create a Tokenizer instance and fit it on the previous sentences:
# create the tokenizer
t = Tokenizer()
# fit the tokenizer on the documents
t.fit_on_texts(sentences)
Once fitted, the tokenizer exposes information about the corpus. The word_counts dictionary contains the number of occurrences of each word:
print(t.word_counts)
OrderedDict([('what', 1), ('do', 1), ('you', 2), ('like', 3), ('?', 2), ('i', 2), ('basket-ball', 1), ('!', 1), ('and', 2), ('coconut', 1), ('apple', 1)])
The document_count attribute returns the number of documents used to fit the tokenizer:
print(t.document_count)
4
The word_index dictionary maps each word to a unique integer index:
print(t.word_index)
{'like': 1, 'you': 2, '?': 3, 'i': 4, 'and': 5, 'what': 6, 'do': 7, 'basket-ball': 8, '!': 9, 'coconut': 10, 'apple': 11}
The word_docs dictionary returns the number of documents in which each word appears, computed during the fit of the Tokenizer:
print(t.word_docs)
defaultdict(<class 'int'>, {'do': 1, 'like': 3, 'what': 1, 'you': 2, '?': 2, '!': 1, 'basket-ball': 1, 'i': 2, 'and': 2, 'coconut': 1, 'apple': 1})
Once the tokenizer has been fitted on the training data, we can encode documents with the texts_to_matrix function. This function provides four different document encoding schemes to compute the coefficient for each token.

Let's start with the binary mode, which returns whether or not each token is present in the document:
t.texts_to_matrix(sentences, mode='binary')
[[0. 1. 1. 1. 0. 0. 1. 1. 0. 0. 0. 0.]
[0. 1. 0. 0. 1. 0. 0. 0. 1. 1. 0. 0.]
[0. 0. 1. 1. 0. 1. 0. 0. 0. 0. 0. 0.]
[0. 1. 0. 0. 1. 1. 0. 0. 0. 0. 1. 1.]]
The Tokenizer API offers another mode based on word count – it returns the count of each word in the document:
t.texts_to_matrix(sentences, mode='count')
[[0. 1. 1. 1. 0. 0. 1. 1. 0. 0. 0. 0.]
[0. 1. 0. 0. 1. 0. 0. 0. 1. 1. 0. 0.]
[0. 0. 1. 1. 0. 1. 0. 0. 0. 0. 0. 0.]
[0. 1. 0. 0. 1. 1. 0. 0. 0. 0. 1. 1.]]
Note that we can also use the tfidf mode or the frequency mode. The first returns the term frequency-inverse document frequency score for each word, and the second returns the frequency of each word in the document relative to the total number of words in the document.
The Tokenizer API can be fitted on the training dataset and then used to encode text data in the training, validation, and test datasets.
In this section, we have covered a few techniques to prepare text data before training and prediction.
Now, let's go on to prepare and augment images.
The data preprocessing module provides a set of tools for real-time data augmentation on image data.
In deep learning, the performance of a neural network is often improved by the number of examples available in the training dataset.
The ImageDataGenerator class in the Keras preprocessing API allows the creation of new data from the training dataset. It isn't applied to the validation or test dataset because it aims to expand the number of examples in the training dataset with plausible new images. This technique is called data augmentation. Beware not to confuse data augmentation with data preparation, such as normalization or image resizing, which is applied to all data in interaction with the model. Data augmentation includes many transformations from the field of image manipulation, such as rotation, horizontal and vertical shift, horizontal and vertical flip, brightness changes, and much more.
The strategy may differ depending on the task at hand. For example, in the MNIST dataset, which contains images of handwritten digits, applying a horizontal flip doesn't make sense: except for the digit 8, this transformation produces implausible digits. In the case of a baby picture, however, applying this kind of transformation makes sense, because the image could have been taken from the left or the right.
Let's apply data augmentation to the CIFAR10 dataset. We will start by downloading the dataset.
# Load CIFAR10 Dataset
(x_cifar10_train, y_cifar10_train), (x_cifar10_test, y_cifar10_test) = tf.keras.datasets.cifar10.load_data()
Then, we create an ImageDataGenerator instance with a set of random transformations:
datagen = tf.keras.preprocessing.image.ImageDataGenerator(
rotation_range=15,
width_shift_range=3,
height_shift_range=3,
horizontal_flip=True)
Next, we create an iterator on the training data:
it = datagen.flow(x_cifar10_train, y_cifar10_train, batch_size=32)
Now, we build a convolutional model:
model = tf.keras.models.Sequential([
tf.keras.layers.Conv2D(filters=32, kernel_size=3, padding="same", activation="relu", input_shape=[32, 32, 3]),
tf.keras.layers.Conv2D(filters=32, kernel_size=3, padding="same", activation="relu"),
tf.keras.layers.MaxPool2D(pool_size=2),
tf.keras.layers.Conv2D(filters=64, kernel_size=3, padding="same", activation="relu"),
tf.keras.layers.Conv2D(filters=64, kernel_size=3, padding="same", activation="relu"),
tf.keras.layers.MaxPool2D(pool_size=2),
tf.keras.layers.Flatten(),
tf.keras.layers.Dense(128, activation="relu"),
tf.keras.layers.Dense(10, activation="softmax")
])
model.compile(loss="sparse_categorical_crossentropy",
optimizer=tf.keras.optimizers.SGD(learning_rate=0.01),
metrics=["accuracy"])
Finally, we train the model by passing the iterator to the fit method. Take care to set the steps_per_epoch argument, which specifies the number of sample batches comprising an epoch:
history = model.fit(it, epochs=10,
steps_per_epoch=len(x_cifar10_train) // 32,
validation_data=(x_cifar10_test, y_cifar10_test))
With the image data generator, we have extended the size of our original dataset by creating new images. With more images, the training of a deep learning model can be improved.
The Keras Preprocessing API allows transforming, encoding, and augmenting data for neural networks. It makes it easier to work with sequence, text, and image data.
First, we introduced the Keras Sequence Preprocessing API. We used the time series generator to transform a univariate or multivariate time series dataset into a data structure ready to train models. Then, we focused on the data preparation for variable-length input sequences, aka padding. And we finished this first part with the skip-gram technique, which finds the most related words for a given word and predicts the context word for that given word.
Then, we covered the Keras Text Preprocessing API, which offers a complete turnkey solution to process natural language. We learned how to split text into words and tokenize the words using binary, word count, tfidf
, or frequency mode.
Finally, we focused on the Image Preprocessing API and the ImageDataGenerator class, which is a real asset for increasing the size of your training dataset when working with images.
For some references on the Keras Preprocessing API, visit the following websites: