Training network architecture

In this example, we're actually going to use two separate architectures, one for training and one for inference. The inference model will reuse the layers trained in the training model. While the two architectures share the same parts, to make things clearer I will show each part separately. The following is the model we will use to train the network:

encoder_input = Input(shape=(None, num_encoder_tokens), name='encoder_input')
encoder_outputs, state_h, state_c = LSTM(lstm_units, return_state=True,
                                         name="encoder_lstm")(encoder_input)
encoder_states = [state_h, state_c]

decoder_input = Input(shape=(None, num_decoder_tokens), name='decoder_input')
decoder_lstm = LSTM(lstm_units, return_sequences=True,
                    return_state=True, name="decoder_lstm")
decoder_outputs, _, _ = decoder_lstm(decoder_input, initial_state=encoder_states)
decoder_dense = Dense(num_decoder_tokens, activation='softmax',
                      name='softmax_output')
decoder_output = decoder_dense(decoder_outputs)

model = Model([encoder_input, decoder_input], decoder_output)
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')

If we zoom into the encoder, we see a fairly standard LSTM. What's different is that we're getting the states from the encoder (return_state=True), which we don't typically do if we're connecting an LSTM to a dense layer. These states are what we will capture in encoder_states. We will use them to provide context to, or condition, the decoder.
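To make the effect of return_state=True concrete, here is a small standalone sketch. The token count, unit count, and shapes are toy values of my choosing, not the chapter's actual hyperparameters:

```python
import numpy as np
from keras.models import Model
from keras.layers import Input, LSTM

# A toy encoder: 4 LSTM units, 6-token one-hot input
toy_input = Input(shape=(None, 6))
# return_state=True makes the layer return the output plus both states
outputs, state_h, state_c = LSTM(4, return_state=True)(toy_input)
toy_model = Model(toy_input, [outputs, state_h, state_c])

# One batch of 2 sequences, each 5 timesteps long
out, h, c = toy_model.predict(np.zeros((2, 5, 6)))
print(out.shape, h.shape, c.shape)  # each has shape (2, 4)
```

Note that with return_sequences left at its default of False, the first returned tensor is just the final hidden state, so out and h are identical here; it's the pair [state_h, state_c] that the decoder needs.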

On the decoder side, we're setting up decoder_lstm a little differently from how we've previously constructed Keras layers, but it's really just alternative syntax.

Have a look at the following code:

decoder_lstm = LSTM(lstm_units, return_sequences=True,
                    return_state=True, name="decoder_lstm")
decoder_outputs, _, _ = decoder_lstm(decoder_input, initial_state=encoder_states)

It's functionally the same as the following code:

decoder_outputs, _, _ = LSTM(lstm_units, return_sequences=True,
                             return_state=True,
                             name="decoder_lstm")(decoder_input,
                                                  initial_state=encoder_states)

The reason why I did this will become apparent in the inference architecture.

Please note that the decoder takes the encoder's hidden states as its initial state. The decoder output is then passed to a softmax layer that predicts decoder_output_data.

Lastly, we will define our training model, which I will creatively call model, as one that takes encoder_input_data and decoder_input_data as inputs and predicts decoder_output_data.
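Putting it all together, training is then just a call to fit with the two input arrays and one target array. The following self-contained sketch rebuilds the same architecture at toy sizes and fits it on random stand-in arrays; the sizes, batch_size, and epochs here are illustrative values of my choosing, not the chapter's:

```python
import numpy as np
from keras.models import Model
from keras.layers import Input, LSTM, Dense

# Toy sizes, for illustration only
num_encoder_tokens, num_decoder_tokens, lstm_units = 6, 7, 4

encoder_input = Input(shape=(None, num_encoder_tokens))
_, state_h, state_c = LSTM(lstm_units, return_state=True)(encoder_input)

decoder_input = Input(shape=(None, num_decoder_tokens))
decoder_outputs, _, _ = LSTM(lstm_units, return_sequences=True,
                             return_state=True)(decoder_input,
                                                initial_state=[state_h, state_c])
decoder_output = Dense(num_decoder_tokens, activation='softmax')(decoder_outputs)

model = Model([encoder_input, decoder_input], decoder_output)
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')

# Random stand-ins for the real one-hot training arrays:
# 8 sequence pairs, each 5 timesteps long
enc = np.random.rand(8, 5, num_encoder_tokens)
dec_in = np.random.rand(8, 5, num_decoder_tokens)
dec_out = np.random.rand(8, 5, num_decoder_tokens)

# Two inputs, one target -- exactly the training model described above
model.fit([enc, dec_in], dec_out, batch_size=4, epochs=1, verbose=0)
preds = model.predict([enc, dec_in])
print(preds.shape)  # (8, 5, 7): one softmax distribution per timestep
```

Because return_sequences=True on the decoder and the Dense layer is applied per timestep, the model emits a probability distribution over the decoder vocabulary at every step of the output sequence.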
