Network architecture (for inference)

To predict an entire output sequence given an input sequence, we need to rearrange our architecture slightly. I suspect future versions of Keras will make this simpler, but as of today it's a necessary step.

Why does it need to be different? Because at inference time we won't have the decoder_input_data teacher vector; we're on our own now. So, we have to set things up so that the network doesn't require that vector.

Let's take a look at this inference architecture, and then step through the code:

encoder_model = Model(encoder_input, encoder_states)

decoder_state_input_h = Input(shape=(lstm_units,))
decoder_state_input_c = Input(shape=(lstm_units,))
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]
decoder_outputs, state_h, state_c = decoder_lstm(
    decoder_input, initial_state=decoder_states_inputs)
decoder_states = [state_h, state_c]
decoder_outputs = decoder_dense(decoder_outputs)
decoder_model = Model(
    [decoder_input] + decoder_states_inputs,
    [decoder_outputs] + decoder_states)

First, we build an encoder model. This model takes an input sequence and returns the final hidden states (h and c) of the LSTM we trained in the previous model.
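As a quick sketch of how that encoder is queried at inference time (the shapes and the encode_input helper here are illustrative assumptions, carried over from the training code rather than taken from it):

```python
import numpy as np

def encode_input(encoder_model, input_seq):
    """Run one one-hot-encoded input sequence through the encoder.

    input_seq is assumed to have shape (1, timesteps, num_encoder_tokens);
    the result is the state pair [h, c], each of shape (1, lstm_units),
    which will seed the decoder.
    """
    h, c = encoder_model.predict(input_seq)
    return [h, c]
```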

The decoder model then has two state inputs: the h and c hidden states, derived from the encoder model, that condition its output. Collectively, we call these decoder_states_inputs.

We can reuse decoder_lstm from above; however, this time we aren't going to discard the states, state_h and state_c. Instead, we're going to pass them out as network outputs, along with the softmax prediction for the target.

Now, when we infer a new output sequence, we can get these states after the first character is predicted and pass them back into the LSTM, along with that softmax prediction, so that the LSTM can predict the next character. We repeat that loop until the decoder generates a ' ', which signals we've reached the <EOS>.
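That loop can be sketched as follows. Everything named here is an assumption carried over from the surrounding code rather than the author's exact implementation: encoder_model and decoder_model are the two models built above, target_token_index maps characters to indices (with '\t' assumed as the start-of-sequence character), reverse_target_index inverts it, and max_decoder_seq_length caps the output length.

```python
import numpy as np

def decode_sequence(input_seq, encoder_model, decoder_model,
                    target_token_index, reverse_target_index,
                    num_decoder_tokens, max_decoder_seq_length):
    # Encode the input sequence into the initial [h, c] states.
    states = encoder_model.predict(input_seq)

    # Seed the decoder with the start-of-sequence character.
    target_seq = np.zeros((1, 1, num_decoder_tokens))
    target_seq[0, 0, target_token_index['\t']] = 1.0

    decoded = ''
    while True:
        output_tokens, h, c = decoder_model.predict([target_seq] + states)
        # Greedy choice: take the most probable next character.
        token_index = int(np.argmax(output_tokens[0, -1, :]))
        char = reverse_target_index[token_index]
        # Stop on ' ' (the <EOS> signal described above), or when the
        # output grows too long.
        if char == ' ' or len(decoded) >= max_decoder_seq_length:
            break
        decoded += char
        # Feed the prediction and the updated states back into the LSTM.
        target_seq = np.zeros((1, 1, num_decoder_tokens))
        target_seq[0, 0, token_index] = 1.0
        states = [h, c]
    return decoded
```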

We will look at the inference code shortly; for now, let's look at how we train and serialize this collection of models.
