There's more...

In this section, we would like to mention a few interesting and useful characteristics of CNNs.

The stochastic aspect of training neural networks: Training neural networks is inherently stochastic – there is a random component to it. That is why we should not judge the skill of a model based on a single evaluation. Given that it is feasible to train the network multiple times, we can do so and aggregate the performance evaluation metrics. Alternatively, we can set the random seed (as we did in the recipes of this chapter), which guarantees (to a certain extent) that training the same network multiple times will result in the same weights. In the previous recipe, we used torch.manual_seed(42) to make the results reproducible. When using more complex networks with more dependencies, it is safer to take extra precautions. That is why we defined a custom function that tries to make sure that – whenever it is possible – the results will be reproducible.

The function (together with the required imports) is defined as follows:

import os
import random
import numpy as np
import torch

def custom_set_seed(seed):
    # seed PyTorch on the CPU and on all GPUs
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # force cuDNN to use deterministic algorithms
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    # seed NumPy, Python's random module, and Python's hashing
    np.random.seed(seed)
    random.seed(seed)
    os.environ['PYTHONHASHSEED'] = str(seed)

Inside the function, we execute a series of operations, such as setting the seeds using multiple libraries and configuring some CUDA-related settings. However, there are cases when even such precautions do not help. As stated in the PyTorch documentation, there is currently no simple way to avoid non-determinism in some functions (certain forms of pooling, padding or sampling). 

Multi-headed models: A popular approach to training more advanced CNNs is training a so-called multi-headed model. The underlying idea is to train different specifications of the convolutional parts of the network, concatenate them at the flattening stage, and feed the joined output to the fully connected layer(s). 

The main benefit of using multi-headed models is the flexibility they provide, which (depending on the use case) can result in improved performance. We will illustrate this with an example. Imagine training a three-headed CNN. Each head can apply a different number of kernels to the same input series. Additionally, the kernel sizes can differ. This way, each head extracts slightly different features from the input series, and all of them are then used to make the final prediction.

Alternatively, some heads can use a modified version of the input series. A possible example is applying some kind of smoothing to a volatile time series. Cui et al. (2016) present a potential architecture of a multi-headed model.
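The following is a minimal sketch of such a three-headed network; the kernel sizes (3, 5, and 7), the number of filters, and the input length of 24 observations are arbitrary choices made for illustration only:

import torch
import torch.nn as nn

class MultiHeadedCNN(nn.Module):
    # illustrative three-headed 1D CNN; all sizes are arbitrary
    def __init__(self, input_len=24, n_filters=32):
        super().__init__()
        self.heads = nn.ModuleList([
            nn.Sequential(
                nn.Conv1d(1, n_filters, kernel_size=k, padding=k // 2),
                nn.ReLU(),
                nn.Flatten()
            )
            for k in (3, 5, 7)
        ])
        # each head outputs n_filters * input_len features after flattening
        self.fc = nn.Linear(3 * n_filters * input_len, 1)

    def forward(self, x):
        # x has shape (batch_size, 1, input_len)
        out = torch.cat([head(x) for head in self.heads], dim=1)
        return self.fc(out)

Each head looks at the same input with a different receptive field, and their flattened outputs are concatenated before the final fully connected layer.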

Multivariate input: In this recipe, we showed how to apply a 1D CNN to a univariate time series. Naturally, it is possible to extend the architecture to a multivariate case, in which multiple univariate time series are used to predict the next value of a given series (it is also possible to predict the next values of multiple parallel series but, for simplicity, we do not elaborate on that).

To understand how CNNs handle multivariate input, it helps to start with 2D images, as this was the original application of CNNs. A grayscale image can be represented as a matrix of numbers in the range of 0-255, where 0 represents black, 255 stands for white, and all the numbers in between are shades of gray. The common RGB representation of colored images consists of 3 components – red, green, and blue. In other words, we can represent any image using 3 matrices (of the same size), each one representing the intensity of a different RGB color.

That is why CNNs accept multi-channel input. A grayscale image has 1 channel, while a colored one has 3 channels. Following this approach, we can use the channels to store the different time series that serve as features. We only need to make sure that all of them are of the same size (the same number of observations).
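The following sketch shows how three aligned time series (the number of series, the series length, and the filter count are chosen only for illustration) can be stacked as input channels, analogous to the RGB channels of an image:

import torch
import torch.nn as nn

# three aligned time series of length 24, stacked as channels
batch_size, n_series, seq_len = 16, 3, 24
x = torch.randn(batch_size, n_series, seq_len)

# the convolution slides over time, looking at all 3 series at once
conv = nn.Conv1d(in_channels=n_series, out_channels=32, kernel_size=3)
print(conv(x).shape)  # torch.Size([16, 32, 22])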

Multi-output CNN: Similar to the multilayer perceptron presented in the previous recipe, CNNs can also produce multi-step output. To do so, we need to appropriately define the last fully connected layer. Please refer to the There's more section of the previous recipe for more information.
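As a minimal sketch, assuming the flattened convolutional output has 128 features (an arbitrary number chosen for illustration), the only change required is in the output size of the last layer:

import torch.nn as nn

# predict 3 future steps instead of 1 by widening the last layer
n_steps_ahead = 3
fc = nn.Linear(128, n_steps_ahead)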

Combining CNNs with RNNs: In the next recipe, we cover recurrent neural networks (RNNs), which account for the time component of the series, that is, where within the series a given feature occurs. We have already mentioned that CNNs are translation invariant – they do not distinguish where a feature is located within the series, only that it is there. That is why CNNs and RNNs make a natural combination. CNNs offer speed and feature-extraction capabilities, while RNNs provide sensitivity to the time component. A possible use case is when we consider series that are too long to be processed using RNNs (thousands of observations). We can use CNNs to downsample (shorten) the series by extracting higher-level features and then feed these as input to the RNN. In the literature, this approach has been shown to perform better than using CNNs or RNNs alone.
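The following is an illustrative sketch of such a hybrid (the filter count, pooling factor, and hidden size are arbitrary choices): a 1D convolution extracts features and max pooling shortens the series before an LSTM models the remaining temporal structure:

import torch
import torch.nn as nn

class CNNLSTM(nn.Module):
    # illustrative CNN-RNN hybrid; all sizes are arbitrary
    def __init__(self, n_filters=32, hidden_size=64):
        super().__init__()
        self.feature_extractor = nn.Sequential(
            nn.Conv1d(1, n_filters, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.MaxPool1d(kernel_size=4)  # shorten the series 4x
        )
        self.lstm = nn.LSTM(n_filters, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, 1)

    def forward(self, x):
        # x has shape (batch_size, 1, seq_len)
        features = self.feature_extractor(x)   # (batch, n_filters, seq_len // 4)
        features = features.permute(0, 2, 1)   # (batch, seq_len // 4, n_filters)
        _, (h_n, _) = self.lstm(features)
        return self.fc(h_n[-1])                # predict the next value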
