LSTM layer

I'm only going to use one LSTM layer here, with just 10 neurons, as shown in the following code:

lstm1 = LSTM(10, activation='tanh', return_sequences=False,
             dropout=0.2, recurrent_dropout=0.2, name='lstm1')(embedding)

Why am I using such a small LSTM layer? As you're about to see, this model is going to struggle with overfitting. Even just 10 LSTM units can learn the training data a little too well. The best answer to this problem would be to add more data, but since we really can't, keeping the network structure simple is a good idea.

That leads us to the use of dropout. I will use both dropout and recurrent dropout on this layer. We haven't talked about recurrent dropout yet, so let's cover it now. Normal dropout, applied to an LSTM layer in this way, randomly masks the inputs to the LSTM. Recurrent dropout randomly masks the recurrent connections, the memory carried between the unrolled time steps within an LSTM unit/neuron. As always, dropout is a hyperparameter, and you'll need to search for an optimal value.
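If it helps to see that search spelled out, the following is a minimal sketch of a grid search over both dropout rates, reusing the embedding output from the snippet above; the candidate values and the training step are placeholders, not what this chapter actually uses:

for dropout_rate in [0.1, 0.2, 0.4]:
    for recurrent_dropout_rate in [0.1, 0.2, 0.4]:
        # dropout masks the layer's inputs; recurrent_dropout masks the
        # recurrent state carried between time steps
        lstm1 = LSTM(10, activation='tanh', return_sequences=False,
                     dropout=dropout_rate,
                     recurrent_dropout=recurrent_dropout_rate)(embedding)
        # ...build a model on top of lstm1, train it, and keep the
        # combination with the best validation loss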

Because our inputs are document based, and because there isn't any context we need to remember between documents, this is a great time to use a stateless LSTM.
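To make the stateless point concrete, here is a minimal, self-contained sketch of the layer in context; the vocabulary size, embedding dimension, and sequence length are placeholders rather than this chapter's actual values, and stateful is simply left at its default of False:

from keras.layers import Input, Embedding, LSTM
from keras.models import Model

# Placeholder dimensions for illustration only
sequence_input = Input(shape=(100,), name='input')    # 100-token documents
embedding = Embedding(input_dim=10000, output_dim=50,
                      name='embedding')(sequence_input)
# stateful defaults to False, so the hidden state starts from zeros for
# every document instead of being carried over between batches
lstm1 = LSTM(10, activation='tanh', return_sequences=False,
             dropout=0.2, recurrent_dropout=0.2, name='lstm1')(embedding)
model = Model(inputs=sequence_input, outputs=lstm1)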
