Summary

In this chapter, we learned the basic principles of RNNs. This type of neural network is commonly used in sequence analysis. The main differences from feedforward neural networks are the existence of a recurrent link; weights that are shared across timesteps; the ability to save an internal state in memory; and, in bidirectional networks, both forward and backward data flow.
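
As a minimal illustration of these ideas, the following Python sketch uses the PyTorch API (the dimensions and variable names here are purely illustrative) to show a vanilla RNN cell applying the same weights at every timestep while carrying a hidden state forward:

```python
import torch

# A single RNN cell: the same weight matrices are reused at every
# timestep, and the hidden state h carries memory forward in time.
rnn_cell = torch.nn.RNNCell(input_size=8, hidden_size=16)

x = torch.randn(5, 3, 8)      # a sequence of 5 timesteps, batch of 3
h = torch.zeros(3, 16)        # initial hidden (internal) state

for t in range(x.size(0)):    # unrolled forward pass over the timesteps
    h = rnn_cell(x[t], h)     # recurrent link: h feeds back into the cell
```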

We became familiar with different types of RNNs and saw that the simplest one has problems with vanishing and exploding gradients, while the more advanced architectures can successfully deal with these problems. We learned the basics of the LSTM architecture, which is based on a hidden state, a cell state, and three types of gates (filters) that control what information to use from the previous timestep, what information to forget, and what portion of the information to pass to the next timestep.
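
As a brief, hedged reminder of how these pieces fit together, a minimal PyTorch sketch (with illustrative sizes of our own choosing) shows an LSTM layer returning both the hidden state and the cell state:

```python
import torch

# An LSTM layer maintains two internal states: the hidden state h and
# the cell state c. Its gates decide what to keep from the previous
# timestep, what to forget, and what to expose as output.
lstm = torch.nn.LSTM(input_size=8, hidden_size=16, batch_first=True)

x = torch.randn(3, 5, 8)          # batch of 3 sequences, 5 timesteps each
output, (h_n, c_n) = lstm(x)      # h_n, c_n: final hidden and cell states

print(output.shape)   # torch.Size([3, 5, 16]) - hidden state for every timestep
print(h_n.shape)      # torch.Size([1, 3, 16]) - last hidden state per sequence
```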

Then, we looked at the GRU, which is simpler than the LSTM and has only one hidden state and two gates. We also looked at the bidirectional RNN architecture and saw how it can be used to process input sequences in the backward direction as well; however, this type of architecture roughly doubles the size of the network. We also learned how to use multiple layers in an RNN so that the upper layers process the hidden states produced by the lower layers, and that such an approach can significantly improve network performance.
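
In PyTorch, these variations are mostly constructor arguments; the following sketch (dimensions are again only illustrative) contrasts a GRU with a stacked, bidirectional LSTM:

```python
import torch

# A GRU keeps only a hidden state (no separate cell state) and two gates.
gru = torch.nn.GRU(input_size=8, hidden_size=16, batch_first=True)

# Stacking layers and processing the sequence in both directions are
# constructor options; bidirectional=True roughly doubles the number of
# parameters and the width of the output.
bi_lstm = torch.nn.LSTM(input_size=8, hidden_size=16,
                        num_layers=2,        # upper layer consumes the lower layer's hidden states
                        bidirectional=True,
                        batch_first=True)

x = torch.randn(3, 5, 8)
gru_out, _ = gru(x)
bi_out, _ = bi_lstm(x)
print(gru_out.shape)   # torch.Size([3, 5, 16])
print(bi_out.shape)    # torch.Size([3, 5, 32]) - forward and backward states concatenated
```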

Next, we learned that we need a modified backpropagation algorithm called BPTT to train RNNs. This algorithm assumes that the RNN is unfolded into a feedforward network with a number of layers equal to the number of timesteps (the sequence length). BPTT also shares the same weights across all of these layers and accumulates the gradients before the weights are updated. Then, we talked about the computational complexity of this algorithm and saw that its TBPTT modification is more suitable in practice. The TBPTT algorithm uses a limited number of timesteps for unfolding and for the backward pass.
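
A rough sketch of the truncated idea follows (the window size, names, and loss are our own assumptions, not the chapter's code): the long sequence is processed in windows, gradients flow only within a window, and the hidden state is detached between windows.

```python
import torch

# Truncated BPTT sketch: split a long sequence into windows of k
# timesteps; backpropagate only within a window, then detach the hidden
# state so the graph does not grow across the whole sequence.
rnn = torch.nn.RNN(input_size=8, hidden_size=16, batch_first=True)
head = torch.nn.Linear(16, 1)
optimizer = torch.optim.SGD(list(rnn.parameters()) + list(head.parameters()), lr=0.01)

x = torch.randn(3, 100, 8)            # a long sequence of 100 timesteps
target = torch.randn(3, 100, 1)
k = 20                                # truncation window length
h = None

for start in range(0, x.size(1), k):
    chunk = x[:, start:start + k]
    out, h = rnn(chunk, h)
    loss = torch.nn.functional.mse_loss(head(out), target[:, start:start + k])
    optimizer.zero_grad()
    loss.backward()                   # gradients flow only through this window
    optimizer.step()
    h = h.detach()                    # cut the graph before the next window
```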

Another theme we discussed was connected to natural language processing: word embeddings. An embedding, in general, is a numerical vector associated with an item of another type (such as a word), where the algebraic properties of the vector reflect some innate properties of the original item. Embeddings are used to convert non-numeric concepts into numeric ones so that we can work with them. We looked at the Word2Vec algorithm, which creates word embeddings based on local statistics, as well as the GloVe algorithm, which is based mostly on global statistics.
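
The sketch below shows the general mechanics of an embedding lookup in PyTorch; the weight matrix here is a random placeholder that stands in for real pre-trained Word2Vec or GloVe vectors, and the sizes are assumptions for illustration only.

```python
import torch

# An embedding maps a discrete token index to a dense numeric vector.
# In practice, the rows of the weight matrix would be filled with
# pre-trained Word2Vec or GloVe vectors instead of random values.
vocab_size, embedding_dim = 10_000, 100
pretrained = torch.randn(vocab_size, embedding_dim)   # stand-in for GloVe weights

embedding = torch.nn.Embedding.from_pretrained(pretrained, freeze=True)

token_ids = torch.tensor([[12, 257, 3]])   # a sentence encoded as vocabulary indices
vectors = embedding(token_ids)             # shape: [1, 3, 100]
```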

Finally, in the last part of this chapter, we developed an application that performs sentiment analysis of movie reviews. We implemented a bidirectional multilayer LSTM network with the PyTorch framework. We also made helper classes for reading the training and test datasets and the pre-trained GloVe embeddings. Then, we implemented the full training and testing cycle and applied the packed sequences optimization technique, which reduced the amount of computation and made the model ignore the padding zeros in the padded sequences.
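
For reference, a minimal sketch of that optimization (with made-up shapes and lengths, not the application's actual code) shows how a padded batch is packed before being fed to the LSTM so that the padded timesteps are skipped:

```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

# A padded batch: two sequences with real lengths 5 and 3, padded with zeros.
batch = torch.randn(2, 5, 8)
batch[1, 3:] = 0.0
lengths = torch.tensor([5, 3])

lstm = torch.nn.LSTM(input_size=8, hidden_size=16, batch_first=True)

# Packing passes the true lengths to the LSTM, so the padded timesteps
# are not processed as (zero) noise and no computation is wasted on them.
packed = pack_padded_sequence(batch, lengths, batch_first=True, enforce_sorted=False)
packed_out, (h_n, c_n) = lstm(packed)
output, out_lengths = pad_packed_sequence(packed_out, batch_first=True)
```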

In the next chapter, we will discuss how to save and load model parameters. We will also look at the different APIs that exist in ML libraries for this purpose. Saving and loading model parameters can be quite an important part of the training process because it allows us to stop and restore training at an arbitrary moment. Also, saved model parameters can be used for evaluation purposes after the model has been trained.
