In the last chapter, I introduced deep learning for forecasting by covering the situations where deep learning is ideal and by outlining the three main types of deep learning models: single-step, multi-step, and multi-output. We then proceeded with data exploration and feature engineering to remove useless features and create new features that will help us forecast traffic volume. With that setup done, we are now ready to implement deep learning to forecast our target variable, which is the traffic volume.
In this chapter, we’ll build a reusable class that will create windows of data. This step is probably the most complicated and most useful topic in this part of the book on deep learning. Applying deep learning for forecasting relies on creating appropriate time windows and specifying the inputs and labels. Once that is done, you will see that implementing different models becomes incredibly easy, and this framework can be reused for different situations and datasets.
Once you know how to create windows of data, we’ll move on to implement baseline models, linear models, and deep neural networks. This will let us measure the performance of these models, and we can then move on to more complex architectures in the following chapters.
We’ll start off by creating the DataWindow class, which will allow us to format the data appropriately to be fed to our deep learning models. We’ll also add a plotting method to this class so that we can visualize the predictions and the actual values.
Before diving into the code and building the DataWindow class, however, it is important to understand why we must perform data windowing for deep learning. Deep learning models have a particular way of fitting on data, which we’ll explore in the next section. Then we’ll move on and implement the DataWindow class.
In the first half of this book, we fit statistical models, such as SARIMAX, on training sets and made predictions. We were, in reality, fitting a set of predefined functions of a certain order (p,d,q)(P,D,Q)m, and finding out which order resulted in the best fit.
For deep learning models, we do not have a set of functions to try. Instead, we let the neural network derive its own function such that when it takes the inputs, it generates the best predictions possible. To achieve that, we perform what is called data windowing. This is a process in which we define a sequence of data points on our time series and define which are inputs and which are labels. That way, the deep learning model can fit on the inputs, generate predictions, compare them to the labels, and repeat this process until it cannot improve the accuracy of its predictions.
Let’s walk through an example of data windowing. Our data window will use 24 hours of data to predict the next 24 hours. You might wonder why we are using just 24 hours of data to generate predictions. After all, deep learning is data hungry and is used for large datasets. The key lies in the data window. A single window has 24 timesteps as input to generate an output of 24 timesteps. However, the entire training set is separated into multiple windows, meaning that we have many windows with inputs and labels, as shown in figure 13.1.
In figure 13.1 you can see the first 400 timesteps of our training set for traffic volume. Each data window consists of 24 input timesteps and 24 label timesteps (as shown in figure 13.2), giving us a total length of 48 timesteps. We can generate many data windows with the training set, so we are, in fact, leveraging this large quantity of data.
As you can see in figure 13.2, the data window’s total length is the sum of the lengths of each sequence. In this case, since we have 24 timesteps as input and 24 labels, the total length of the data window is 48 timesteps.
You might think that we are wasting a lot of training data, since in figure 13.2 timesteps 24 to 47 are labels. Are those never going to be used as inputs? Of course, they will be. The DataWindow class that we’ll implement in the next section generates data windows with inputs starting at t = 0. Then it will create another set of data windows, but this time starting at t = 1. Then it will start at t = 2. This goes on until it cannot have a sequence of 24 consecutive labels in the training set, as illustrated in figure 13.3.
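The sliding behavior described above can be sketched in plain Python. This is only an illustration on a toy series, not the TensorFlow implementation used later in the chapter; the function name sliding_windows and the small widths are assumptions chosen to keep the output readable.

```python
def sliding_windows(series, input_width, label_width):
    """Return (inputs, labels) pairs, one per possible start index."""
    total = input_width + label_width
    pairs = []
    # The last valid start is the one that still leaves `total` points.
    for start in range(len(series) - total + 1):
        window = series[start:start + total]
        pairs.append((window[:input_width], window[input_width:]))
    return pairs


series = list(range(10))  # toy series: 0, 1, ..., 9
pairs = sliding_windows(series, input_width=3, label_width=2)

print(len(pairs))  # 6
print(pairs[0])    # ([0, 1, 2], [3, 4])
print(pairs[1])    # ([1, 2, 3], [4, 5])
```

Notice that the values 3 and 4 are labels in the first window and inputs in the following ones, which is why no training data is wasted.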
To make computation more efficient, deep learning models are trained with batches. A batch is simply a collection of data windows that are fed to the model for training, as shown in figure 13.4.
Figure 13.4 shows an example of a batch with a batch size of 32. That means that 32 data windows are grouped together and used to train the model. Of course, this is only one batch; the DataWindow class generates as many batches as possible with the given training set. In our case, we have a training set with 12,285 rows. If each batch has 32 data windows, that means that we will have roughly 12285/32 ≈ 384 batches.
Training the model on all 384 batches once is called one epoch. One epoch often does not result in an accurate model, so the model will train for as many epochs as necessary until it cannot improve the accuracy of its predictions.
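The batch count can be checked with a little arithmetic. The sketch below assumes consecutive windows (a stride of 1) and that the last, partially filled batch is kept; the helper name count_batches is hypothetical. It shows that the figure of 384 is an estimate based on the number of rows, and that the exact count depends slightly on the window length.

```python
import math


def count_batches(n_rows, window_size, batch_size=32):
    """Number of full windows that fit in n_rows, and the batches they fill."""
    n_windows = n_rows - window_size + 1
    return n_windows, math.ceil(n_windows / batch_size)


print(count_batches(12285, window_size=48))  # (12238, 383)
print(count_batches(12285, window_size=2))   # (12284, 384)
```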
The final important concept in data windowing for deep learning is shuffling. I mentioned in the very first chapter of this book that time series data cannot be shuffled. Time series data has an order, and that order must be kept, so why are we shuffling the data here?
In this context, shuffling occurs at the batch level, not inside the data window—the order of the time series itself is maintained within each data window. Each data window is independent of all others. Therefore, in a batch, we can shuffle the data windows and still keep the order of our time series, as shown in figure 13.5. Shuffling the data is not essential, but it is recommended as it tends to make more robust models.
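To make the distinction concrete, here is a minimal plain-Python sketch of shuffling at the window level. The toy series, the window size, and the fixed seed are assumptions made only for illustration.

```python
import random

series = list(range(8))  # toy series in chronological order
window_size = 3
windows = [series[i:i + window_size]
           for i in range(len(series) - window_size + 1)]

rng = random.Random(42)  # fixed seed, only to make the sketch repeatable
shuffled = windows[:]    # copy, so the original order is kept for comparison
rng.shuffle(shuffled)    # reorders the windows, not their contents

# Every window is still in chronological order internally.
print(all(w == sorted(w) for w in shuffled))  # True
```

The order in which the windows are presented changes, but each window still contains consecutive timesteps.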
Now that you understand the inner workings of data windowing and how it is used for training deep learning models, let’s implement the DataWindow class.
We are now ready to implement the DataWindow class. This class has the advantage of being flexible, meaning that you can use it in a wide variety of scenarios to apply deep learning. The full code is available on GitHub: https://github.com/marcopeix/TimeSeriesForecastingInPython/tree/master/CH13%26CH14.
The class is based on the width of the input, the width of the label, and the shift. The width of the input is simply the number of timesteps that are fed into the model to make predictions. For example, given that we have hourly data in our dataset, if we feed the model with 24 hours of data to make a prediction, the input width is 24. If we feed only 12 hours of data, the input width is 12.
The label width is equivalent to the number of timesteps in the predictions. If we predict only one timestep, the label width is 1. If we predict a full day of data (with hourly data), the label width is 24.
Finally, the shift is the number of timesteps separating the input and the predictions. If we predict the next timestep, the shift is 1. If we predict the next 24 hours (with hourly data), the shift is 24.
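The relationship between these three parameters can be sketched with a few lines of plain Python. The helper window_indices is hypothetical; it mirrors the index arithmetic used by the class initializer in this chapter, where the total window size is the input width plus the shift.

```python
def window_indices(input_width, label_width, shift):
    """Return the total window size and the positions of inputs and labels."""
    total_window_size = input_width + shift
    input_indices = list(range(0, input_width))
    # Labels occupy the last `label_width` positions of the window.
    label_start = total_window_size - label_width
    label_indices = list(range(label_start, total_window_size))
    return total_window_size, input_indices, label_indices


print(window_indices(1, 1, 1))  # (2, [0], [1])

total, inputs, labels = window_indices(24, 24, 24)
print(total, inputs[0], inputs[-1], labels[0], labels[-1])  # 48 0 23 24 47

total, inputs, labels = window_indices(24, 24, 1)
print(total, labels[0], labels[-1])  # 25 1 24
```

In the last case the labels are simply the inputs moved forward by one timestep, so at every position the label is the value that follows the input.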
Let’s visualize some windows of data to better understand these parameters. Figure 13.6 shows a window of data where the model predicts the next data point, given a single data point.
Now let’s consider the situation where we feed 24 hours of data to the model in order to predict the next 24 hours. The data window in that situation is shown in figure 13.7.
Now that you understand the concept of input width, label width, and shift, we can create the DataWindow class and define its initialization function in listing 13.1. The function will also take in the training, validation, and test sets, as the windows of data will come from our dataset. Finally, we’ll allow the target column to be specified.
Note that the following listing reuses code from the official TensorFlow documentation’s website (https://www.tensorflow.org/tutorials/structured_data/time_series). This method of creating windows of data is viewed by the community as the best and easiest way of predicting time series data with deep learning models. It is also the best way to extend the capabilities of TensorFlow’s native function timeseries_dataset_from_array, such that we can apply deep learning models in any forecasting scenario.
The full implementation of the data windowing technique is shown in code listing 13.3. All code is reused within the terms of the Apache 2.0 License (https://www.apache.org/licenses/LICENSE-2.0), which you can consult in the GitHub repository (https://github.com/marcopeix/TimeSeriesForecastingInPython) for this book.
The examples that follow in the book build upon the code from the documentation, to make it more reusable in any scenario you might encounter outside of this book.
    import numpy as np

    class DataWindow():
        # The default arguments require train_df, val_df, and test_df
        # to exist already when the class is defined.
        def __init__(self, input_width, label_width, shift,
                     train_df=train_df, val_df=val_df, test_df=test_df,
                     label_columns=None):
            self.train_df = train_df
            self.val_df = val_df
            self.test_df = test_df

            self.label_columns = label_columns  # ❶
            if label_columns is not None:
                self.label_columns_indices = {name: i for i, name in
                                              enumerate(label_columns)}  # ❷
            self.column_indices = {name: i for i, name in
                                   enumerate(train_df.columns)}  # ❸

            self.input_width = input_width
            self.label_width = label_width
            self.shift = shift

            self.total_window_size = input_width + shift

            self.input_slice = slice(0, input_width)  # ❹
            self.input_indices = np.arange(self.total_window_size)[self.input_slice]  # ❺

            self.label_start = self.total_window_size - self.label_width  # ❻
            self.labels_slice = slice(self.label_start, None)  # ❼
            self.label_indices = np.arange(self.total_window_size)[self.labels_slice]
❶ Name of the column that we wish to predict
❷ Create a dictionary with the name and index of the label column. This will be used for plotting.
❸ Create a dictionary with the name and index of each column. This will be used to separate the features from the target variable.
❹ The slice function returns a slice object that specifies how to slice a sequence. In this case, it says that the input slice starts at 0 and ends when we reach the input_width.
❺ Assign indices to the inputs. These are useful for plotting.
❻ Get the index at which the label starts. In this case, it is the total window size minus the width of the label.
❼ The same steps that were applied for the inputs are applied for labels.
In listing 13.1 you can see that the initialization function basically assigns the variables and manages the indices of the inputs and the labels. Our next step is to split our window between inputs and labels, so that our models can make predictions based on the inputs and measure an error metric against the labels. The following split_to_inputs_labels function is defined within the DataWindow class.
    def split_to_inputs_labels(self, features):
        inputs = features[:, self.input_slice, :]  # ❶
        labels = features[:, self.labels_slice, :]  # ❷
        if self.label_columns is not None:  # ❸
            labels = tf.stack(
                [labels[:, :, self.column_indices[name]]
                 for name in self.label_columns],
                axis=-1
            )
        inputs.set_shape([None, self.input_width, None])  # ❹
        labels.set_shape([None, self.label_width, None])

        return inputs, labels
❶ Slice the window to get the inputs using the input_slice defined in __init__.
❷ Slice the window to get the labels using the labels_slice defined in __init__.
❸ If we have more than one target, we stack the labels.
❹ The shape will be [batch, time, features]. At this point, we only specify the time dimension and allow the batch and feature dimensions to be defined later.
The split_to_inputs_labels function will separate the big data window into two windows: one for the inputs and the other for the labels, as shown in figure 13.8.
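The same split can be mimicked without TensorFlow to see what happens to the shapes. The sketch below is an assumption-laden simplification: it works on nested lists shaped [window][timestep][feature] instead of tensors, and the function name split_window and the toy values are hypothetical.

```python
def split_window(batch, input_width, label_width, label_column_indices=None):
    """Split each window of a [window][timestep][feature] batch."""
    inputs, labels = [], []
    for window in batch:
        # Inputs are the first `input_width` timesteps, with all features.
        inputs.append(window[:input_width])
        # Labels are the last `label_width` timesteps.
        label_steps = window[len(window) - label_width:]
        if label_column_indices is not None:
            # Keep only the target columns, preserving the feature axis.
            label_steps = [[step[i] for i in label_column_indices]
                           for step in label_steps]
        labels.append(label_steps)
    return inputs, labels


# One window with 3 timesteps and 2 features per timestep.
batch = [[[10, 0.1], [20, 0.2], [30, 0.3]]]
inputs, labels = split_window(batch, input_width=2, label_width=1,
                              label_column_indices=[0])

print(inputs)  # [[[10, 0.1], [20, 0.2]]]
print(labels)  # [[[30]]]
```

The inputs keep every feature, while the labels keep only the target column, which is the same division of roles as in the TensorFlow version.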
Next we’ll define a function to plot the input data, the predictions, and the actual values (listing 13.2). Since we will be working with many time windows, we’ll show only the plot of three time windows, but this parameter can easily be changed. Also, the default label will be traffic volume, but we can change that by specifying any column we choose. Again, this function should be included in the DataWindow class.
    def plot(self, model=None, plot_col='traffic_volume', max_subplots=3):
        inputs, labels = self.sample_batch

        plt.figure(figsize=(12, 8))
        plot_col_index = self.column_indices[plot_col]
        max_n = min(max_subplots, len(inputs))

        for n in range(max_n):
            # Use max_n rows so that changing max_subplots works as expected.
            plt.subplot(max_n, 1, n+1)
            plt.ylabel(f'{plot_col} [scaled]')
            plt.plot(self.input_indices, inputs[n, :, plot_col_index],
                     label='Inputs', marker='.', zorder=-10)  # ❶

            if self.label_columns:
                label_col_index = self.label_columns_indices.get(plot_col, None)
            else:
                label_col_index = plot_col_index

            if label_col_index is None:
                continue

            plt.scatter(self.label_indices, labels[n, :, label_col_index],
                        edgecolors='k', marker='s', label='Labels',
                        c='green', s=64)  # ❷
            if model is not None:
                predictions = model(inputs)
                plt.scatter(self.label_indices, predictions[n, :, label_col_index],
                            marker='X', edgecolors='k', label='Predictions',
                            c='red', s=64)  # ❸

            if n == 0:
                plt.legend()

        plt.xlabel('Time (h)')
❶ Plot the inputs. They will appear as a continuous blue line with dots.
❷ Plot the labels or actual values. They will appear as green squares.
❸ Plot the predictions. They will appear as red crosses.
We are almost done building the DataWindow class. The last main piece of logic will format our dataset into tensors so that they can be fed to our deep learning models. TensorFlow comes with a very handy function called timeseries_dataset_from_array, which creates a dataset of sliding windows, given an array.
    def make_dataset(self, data):
        data = np.array(data, dtype=np.float32)
        ds = tf.keras.preprocessing.timeseries_dataset_from_array(
            data=data,  # ❶
            targets=None,  # ❷
            sequence_length=self.total_window_size,  # ❸
            sequence_stride=1,  # ❹
            shuffle=True,  # ❺
            batch_size=32  # ❻
        )

        ds = ds.map(self.split_to_inputs_labels)
        return ds
❶ Pass in the data. This corresponds to our training set, validation set, or test set.
❷ Targets are set to None, as they are handled by the split_to_inputs_labels function.
❸ Define the length of each sequence, which is equal to the total window size.
❹ Define the number of timesteps separating each sequence. In our case, we want the sequences to be consecutive, so sequence_stride=1.
❺ Shuffle the sequences. Keep in mind that the data is still in chronological order. We are simply shuffling the order of the sequences, which makes the model more robust.
❻ Define the number of sequences in a single batch.
Remember that we are shuffling the sequences in a batch. This means that within each sequence, the data is in chronological order. However, in a batch of 32 sequences, we can and should shuffle them to make our model more robust and less prone to overfitting.
We’ll conclude our DataWindow class by defining some properties to apply the make_dataset function on the training, validation, and testing sets. We’ll also create a sample batch that we’ll cache within the class for plotting purposes.
    @property
    def train(self):
        return self.make_dataset(self.train_df)

    @property
    def val(self):
        return self.make_dataset(self.val_df)

    @property
    def test(self):
        return self.make_dataset(self.test_df)

    @property
    def sample_batch(self):  # ❶
        result = getattr(self, '_sample_batch', None)
        if result is None:
            result = next(iter(self.train))
            self._sample_batch = result
        return result
❶ Get a sample batch of data for plotting purposes. If the sample batch does not exist, we’ll retrieve a sample batch and cache it.
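The caching pattern used by sample_batch can be isolated in a small standalone sketch. The Sampler class and its _draw method are hypothetical stand-ins for the data window and for fetching a batch; only the getattr-based caching logic mirrors the listing.

```python
class Sampler:
    def __init__(self):
        self.calls = 0

    def _draw(self):
        # Stand-in for an expensive operation, such as fetching a batch.
        self.calls += 1
        return f"batch #{self.calls}"

    @property
    def sample_batch(self):
        # Look for a cached value; the attribute does not exist at first.
        result = getattr(self, "_sample_batch", None)
        if result is None:
            result = self._draw()
            self._sample_batch = result  # cache it for later accesses
        return result


s = Sampler()
print(s.sample_batch)  # batch #1
print(s.sample_batch)  # batch #1
print(s.calls)         # 1
```

Because the result is stored on the instance, every plot made from the same data window object shows the same sample batch.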
Our DataWindow class is now complete. The full class with all methods and properties is shown in listing 13.3.
    import numpy as np
    import tensorflow as tf
    import matplotlib.pyplot as plt

    class DataWindow():
        # The default arguments require train_df, val_df, and test_df
        # to exist already when the class is defined.
        def __init__(self, input_width, label_width, shift,
                     train_df=train_df, val_df=val_df, test_df=test_df,
                     label_columns=None):
            self.train_df = train_df
            self.val_df = val_df
            self.test_df = test_df

            self.label_columns = label_columns
            if label_columns is not None:
                self.label_columns_indices = {name: i for i, name in
                                              enumerate(label_columns)}
            self.column_indices = {name: i for i, name in
                                   enumerate(train_df.columns)}

            self.input_width = input_width
            self.label_width = label_width
            self.shift = shift

            self.total_window_size = input_width + shift

            self.input_slice = slice(0, input_width)
            self.input_indices = np.arange(self.total_window_size)[self.input_slice]

            self.label_start = self.total_window_size - self.label_width
            self.labels_slice = slice(self.label_start, None)
            self.label_indices = np.arange(self.total_window_size)[self.labels_slice]

        def split_to_inputs_labels(self, features):
            inputs = features[:, self.input_slice, :]
            labels = features[:, self.labels_slice, :]
            if self.label_columns is not None:
                labels = tf.stack(
                    [labels[:, :, self.column_indices[name]]
                     for name in self.label_columns],
                    axis=-1
                )
            inputs.set_shape([None, self.input_width, None])
            labels.set_shape([None, self.label_width, None])

            return inputs, labels

        def plot(self, model=None, plot_col='traffic_volume', max_subplots=3):
            inputs, labels = self.sample_batch

            plt.figure(figsize=(12, 8))
            plot_col_index = self.column_indices[plot_col]
            max_n = min(max_subplots, len(inputs))

            for n in range(max_n):
                # Use max_n rows so that changing max_subplots works as expected.
                plt.subplot(max_n, 1, n+1)
                plt.ylabel(f'{plot_col} [scaled]')
                plt.plot(self.input_indices, inputs[n, :, plot_col_index],
                         label='Inputs', marker='.', zorder=-10)

                if self.label_columns:
                    label_col_index = self.label_columns_indices.get(plot_col, None)
                else:
                    label_col_index = plot_col_index

                if label_col_index is None:
                    continue

                plt.scatter(self.label_indices, labels[n, :, label_col_index],
                            edgecolors='k', marker='s', label='Labels',
                            c='green', s=64)
                if model is not None:
                    predictions = model(inputs)
                    plt.scatter(self.label_indices, predictions[n, :, label_col_index],
                                marker='X', edgecolors='k', label='Predictions',
                                c='red', s=64)

                if n == 0:
                    plt.legend()

            plt.xlabel('Time (h)')

        def make_dataset(self, data):
            data = np.array(data, dtype=np.float32)
            ds = tf.keras.preprocessing.timeseries_dataset_from_array(
                data=data,
                targets=None,
                sequence_length=self.total_window_size,
                sequence_stride=1,
                shuffle=True,
                batch_size=32
            )

            ds = ds.map(self.split_to_inputs_labels)
            return ds

        @property
        def train(self):
            return self.make_dataset(self.train_df)

        @property
        def val(self):
            return self.make_dataset(self.val_df)

        @property
        def test(self):
            return self.make_dataset(self.test_df)

        @property
        def sample_batch(self):
            result = getattr(self, '_sample_batch', None)
            if result is None:
                result = next(iter(self.train))
                self._sample_batch = result
            return result
For now, the DataWindow class might seem a bit abstract, but we will soon use it to apply baseline models. We will be using this class in all the chapters in this deep learning part of the book, so you will gradually tame this code and appreciate how easy it is to test different deep learning architectures.
With the DataWindow class complete, we are ready to use it. We will apply baseline models as single-step, multi-step, and multi-output models. You will see that their implementation is similar and incredibly simple when we have the right data windows.
Recall that a baseline is used as a benchmark to evaluate more complex models. A model is performant if it compares favorably to another, so building a baseline is an important step in modeling.
We’ll first implement a single-step model as a baseline. In a single-step model, the input is one timestep and the output is the prediction of the next timestep.
The first step is to generate a window of data. Since we are defining a single-step model, the input width is 1, the label width is 1, and the shift is also 1, since the model predicts the next timestep. Our target variable is the volume of traffic.
    single_step_window = DataWindow(input_width=1, label_width=1, shift=1,
                                    label_columns=['traffic_volume'])
For plotting purposes, we’ll also define a wider window so we can visualize many predictions of our model. Otherwise, we could only visualize one input data point and one output prediction, which is not very interesting.
    wide_window = DataWindow(input_width=24, label_width=24, shift=1,
                             label_columns=['traffic_volume'])
In this situation, the simplest prediction we can make is the last observed value. Basically, the prediction is simply the input data point. This is implemented by the class Baseline. As you can see in the following listing, the Baseline class can also be used for a multi-output model. For now, we’ll solely focus on a single-step model.
    from tensorflow.keras import Model

    class Baseline(Model):
        def __init__(self, label_index=None):
            super().__init__()
            self.label_index = label_index

        def call(self, inputs):
            if self.label_index is None:  # ❶
                return inputs

            elif isinstance(self.label_index, list):  # ❷
                tensors = []
                for index in self.label_index:
                    result = inputs[:, :, index]
                    result = result[:, :, tf.newaxis]
                    tensors.append(result)
                return tf.concat(tensors, axis=-1)

            result = inputs[:, :, self.label_index]  # ❸
            return result[:, :, tf.newaxis]
❶ If no target is specified, we return all columns. This is useful for multi-output models where all columns are to be predicted.
❷ If we specify a list of targets, it will return only the specified columns. Again, this is used for multi-output models.
❸ Return the input for a given target variable.
With the class defined, we can now initialize the model and compile it to generate predictions. To do so, we’ll find the index of our target column, traffic_volume, and pass it in to Baseline. Note that TensorFlow requires us to provide a loss function and a metric of evaluation. In this case, and throughout the deep learning chapters, we’ll use the mean squared error (MSE) as a loss function, since it penalizes large errors and generally yields well-fitted models. For the evaluation metric, we’ll use the mean absolute error (MAE) for its ease of interpretation.
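To see why the MSE penalizes large errors more than the MAE does, consider the following plain-Python sketch. The two toy prediction sets are assumptions chosen so that the total absolute error is identical in both cases; the mse and mae helpers are simple stand-ins for the Keras implementations.

```python
def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [0.0, 0.0, 0.0, 0.0]
small_errors = [1.0, 1.0, 1.0, 1.0]  # four errors of size 1
one_large = [0.0, 0.0, 0.0, 4.0]     # a single error of size 4

print(mae(y_true, small_errors), mae(y_true, one_large))  # 1.0 1.0
print(mse(y_true, small_errors), mse(y_true, one_large))  # 1.0 4.0
```

Both sets of predictions have the same MAE, but the one with a single large miss has an MSE four times higher. The MAE, on the other hand, is expressed in the same units as the target, which makes it easier to interpret.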
    from tensorflow.keras.losses import MeanSquaredError
    from tensorflow.keras.metrics import MeanAbsoluteError

    column_indices = {name: i for i, name in enumerate(train_df.columns)}  # ❶

    baseline_last = Baseline(label_index=column_indices['traffic_volume'])  # ❷

    baseline_last.compile(loss=MeanSquaredError(), metrics=[MeanAbsoluteError()])  # ❸
❶ Generate a dictionary with the name and index of each column in the training set.
❷ Pass the index of the target column in the Baseline class.
❸ Compile the model to generate the predictions.
We’ll now evaluate the performance of our baseline on both the validation and test sets. Models built with TensorFlow conveniently come with the evaluate method, which allows us to compare the predictions to the actual values and calculate the error metric.
    val_performance = {}  # ❶
    performance = {}  # ❷

    val_performance['Baseline - Last'] = baseline_last.evaluate(single_step_window.val)  # ❸
    performance['Baseline - Last'] = baseline_last.evaluate(single_step_window.test, verbose=0)  # ❹
❶ Create a dictionary to hold the MAE of a model on the validation set.
❷ Create a dictionary to hold the MAE of a model on the test set.
❸ Store the MAE of the baseline on the validation set.
❹ Store the MAE of the baseline on the test set.
Great, we have successfully built a baseline that predicts the last known value and evaluated it. We can visualize the predictions using the plot method of the DataWindow class. Remember to use the wide_window to see more than just two data points.
In figure 13.9 the labels are squares and the predictions are crosses. The crosses at each timestep are simply the last known value, meaning that we have a baseline that functions as expected. Your plot may differ from figure 13.9, as the cached sample batch changes every time a data window is initialized.
We can optionally print the MAE of our baseline on the test set.
This returns an MAE of 0.081. More complex models should perform better than the baseline, resulting in a smaller MAE.
In the previous section, we built a single-step baseline model that simply predicted the last known value. For multi-step models, we’ll predict more than one timestep into the future. In this case, we’ll forecast the traffic volume for the next 24 hours of data given an input of 24 hours.
Again, the first step is to generate the appropriate window of data. Because we wish to predict 24 timesteps into the future with an input of 24 hours, the input width is 24, the label width is 24, and the shift is also 24.
    multi_window = DataWindow(input_width=24, label_width=24, shift=24,
                              label_columns=['traffic_volume'])
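With the data window generated, we can now focus on implementing the baseline models. In this situation, there are two reasonable baselines: repeating the last known value over the next 24 timesteps, and repeating the last 24 timesteps as the prediction for the next 24 timesteps.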
With that in mind, let’s implement the first baseline, where we’ll simply repeat the last known value over the next 24 timesteps.
Predicting the last known value
To predict the last known value, we’ll define a MultiStepLastBaseline class that simply takes in the input and repeats the last value of the input sequence over 24 timesteps. This acts as the prediction of the model.
    class MultiStepLastBaseline(Model):
        def __init__(self, label_index=None):
            super().__init__()
            self.label_index = label_index

        def call(self, inputs):
            if self.label_index is None:
                return tf.tile(inputs[:, -1:, :], [1, 24, 1])  # ❶
            # Slice label_index:label_index + 1 so that only the target column
            # is kept while the feature axis is preserved.
            return tf.tile(
                inputs[:, -1:, self.label_index:self.label_index + 1],
                [1, 24, 1]
            )  # ❷
❶ If no target is specified, return the last known value of all columns over the next 24 timesteps.
❷ Return the last known value of the target column over the next 24 timesteps.
Next we’ll initialize the class and specify the target column. We’ll then repeat the same steps as in the previous section, compiling the model and evaluating it on the validation set and test set.
    ms_baseline_last = MultiStepLastBaseline(label_index=column_indices['traffic_volume'])

    ms_baseline_last.compile(loss=MeanSquaredError(), metrics=[MeanAbsoluteError()])

    ms_val_performance = {}
    ms_performance = {}

    ms_val_performance['Baseline - Last'] = ms_baseline_last.evaluate(multi_window.val)
    ms_performance['Baseline - Last'] = ms_baseline_last.evaluate(multi_window.test, verbose=0)
We can now visualize the predictions using the plot method of DataWindow. The result is shown in figure 13.10.
Again, we can optionally print the baseline’s MAE. From figure 13.10, we can expect it to be fairly high, since there is a large discrepancy between the labels and the predictions.
This gives an MAE of 0.347. Now let’s see if we can build a better baseline by simply repeating the input sequence.
Let’s implement a second baseline for multi-step models, which simply returns the input sequence. This means that the prediction for the next 24 hours will simply be the last known 24 hours of data. This is implemented through the RepeatBaseline class.
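    class RepeatBaseline(Model):
        def __init__(self, label_index=None):
            super().__init__()
            self.label_index = label_index

        def call(self, inputs):
            if self.label_index is None:
                return inputs
            # Slice label_index:label_index + 1 so that only the target column
            # is returned while the feature axis is preserved.
            return inputs[:, :, self.label_index:self.label_index + 1]  # ❶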
❶ Return the input sequence for the given target column.
Now we can initialize the baseline model and generate predictions. Note that the loss function and evaluation metric remain the same.
    ms_baseline_repeat = RepeatBaseline(label_index=column_indices['traffic_volume'])

    ms_baseline_repeat.compile(loss=MeanSquaredError(), metrics=[MeanAbsoluteError()])

    ms_val_performance['Baseline - Repeat'] = ms_baseline_repeat.evaluate(multi_window.val)
    ms_performance['Baseline - Repeat'] = ms_baseline_repeat.evaluate(multi_window.test, verbose=0)
Next we can visualize the predictions. The result is shown in figure 13.11.
This baseline performs well. This is to be expected, since we identified daily seasonality in the previous chapter. This baseline is equivalent to predicting the last known season.
Again, we can print the MAE on the test set to verify that we indeed have a better baseline than simply predicting the last known value.
This gives an MAE of 0.341, which is lower than the MAE obtained by predicting the last known value. We have therefore successfully built a better baseline.
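The effect of seasonality on these two baselines can be illustrated on a synthetic series. The sketch below is an assumption-heavy toy: it uses a perfectly periodic sine wave with a period of 24 instead of the traffic data, so the gap between the two baselines is far larger than the one measured in this chapter.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


period = 24
# Two full "days" of a perfectly seasonal toy signal.
series = [math.sin(2 * math.pi * t / period) for t in range(2 * period)]
inputs, labels = series[:period], series[period:]

# Baseline 1: repeat the last known value over the next 24 timesteps.
last_value_forecast = [inputs[-1]] * period
# Baseline 2: repeat the whole input sequence.
repeat_forecast = list(inputs)

print(mae(labels, last_value_forecast) > mae(labels, repeat_forecast))  # True
```

On real data the seasonal pattern is never repeated exactly, so the advantage of repeating the input sequence is smaller, but the direction of the comparison is the same.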
The final type of model we’ll cover is the multi-output model. In this situation, we wish to predict the traffic volume and the temperature for the next timestep using a single input data point. Essentially, we’re applying the single-step model on both the traffic volume and temperature, making it a multi-output model.
Again, we’ll start off by defining the window of data, but here we’ll define two windows: one for training and the other for visualization. Since the model takes in one data point and outputs one prediction, we want to initialize a wide window of data to visualize many predictions over many timesteps.
    mo_single_step_window = DataWindow(input_width=1, label_width=1, shift=1,
                                       label_columns=['temp', 'traffic_volume'])  # ❶
    mo_wide_window = DataWindow(input_width=24, label_width=24, shift=1,
                                label_columns=['temp', 'traffic_volume'])
❶ Notice that we pass in both temp and traffic_volume, as those are our two targets for the multi-output model.
Then we’ll use the Baseline class that we defined for the single-step model. Recall that this class can output the last known value for a list of targets.
    class Baseline(Model):
        def __init__(self, label_index=None):
            super().__init__()
            self.label_index = label_index

        def call(self, inputs):
            if self.label_index is None:  # ❶
                return inputs

            elif isinstance(self.label_index, list):  # ❷
                tensors = []
                for index in self.label_index:
                    result = inputs[:, :, index]
                    result = result[:, :, tf.newaxis]
                    tensors.append(result)
                return tf.concat(tensors, axis=-1)

            result = inputs[:, :, self.label_index]  # ❸
            return result[:, :, tf.newaxis]
❶ If no target is specified, we return all columns. This is useful for multi-output models where all columns are to be predicted.
❷ If we specify a list of targets, it will return only these specified columns. Again, this is used for multi-output models.
❸ Return the input for a given target variable.
In the case of the multi-output model, we must simply pass the indexes of the temp and traffic_volume columns to output the last known value for the respective variables as a prediction.
    print(column_indices['traffic_volume'])  # index of the traffic_volume column
    print(column_indices['temp'])  # index of the temp column

    mo_baseline_last = Baseline(label_index=[0, 2])  # indices of the two target columns
With the baseline initialized with our two target variables, we can now compile the model and evaluate it.
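    # The model must be compiled before it can be evaluated.
    mo_baseline_last.compile(loss=MeanSquaredError(), metrics=[MeanAbsoluteError()])

    mo_val_performance = {}
    mo_performance = {}

    mo_val_performance['Baseline - Last'] = mo_baseline_last.evaluate(mo_wide_window.val)
    mo_performance['Baseline - Last'] = mo_baseline_last.evaluate(mo_wide_window.test, verbose=0)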
Finally, we can visualize the predictions against the actual values. By default, our plot method will show the traffic volume on the y-axis, allowing us to quickly display one of our targets, as shown in figure 13.12.
Figure 13.12 does not show anything surprising, as we already saw these results when we built a single-step baseline model. The particularity of the multi-output model is that we also have predictions for the temperature. Of course, we can also visualize the predictions for the temperature by specifying the target in the plot method. The result is shown in figure 13.13.
Again, we can print the MAE of our baseline model.
We obtain an MAE of 0.047 on the test set. In the next chapter, we’ll start building more complex models, and they should result in a lower MAE, as they will be trained to fit the data.
In this chapter, we covered the crucial step of creating data windows, which will allow us to quickly build any type of model. We then proceeded to build baseline models for each type of model, so that we have benchmarks we can compare to when we build our more complex models in later chapters.
Of course, building baseline models is not an application of deep learning just yet. In the next chapter, we will implement linear models and deep neural networks, and see if those models are already more performant than the simple baselines.
In the previous chapter, as an exercise, we prepared the air pollution dataset for deep learning modeling. Now we’ll use the training set, validation set, and test set to build baseline models and evaluate them.
For each type of model, follow the steps outlined. Recall that the target for the single-step and multi-step model is the concentration of NO2, and the targets for the multi-output model are the concentration of NO2 and temperature. The complete solution is available on GitHub: https://github.com/marcopeix/TimeSeriesForecastingInPython/tree/master/CH13%26CH14.
Data windowing is essential in deep learning to format the data as inputs and labels for the model.
The DataWindow class can easily be used in any situation and can be extended to your liking. Make use of it in your own projects.
Deep learning models require a loss function and an evaluation metric. In our case, we chose the mean squared error (MSE) as the loss function, because it penalizes large errors and tends to yield better-fit models. The evaluation metric is the mean absolute error (MAE), chosen for its ease of interpretation.