Training

We're finally ready to train our sequence-to-sequence network. The following code first calls all of our data loading functions, creates our callbacks, and then fits the model:

# Load the translation pairs and one-hot encode them
data = load_data()
data = one_hot_vectorize(data)

# Create the callbacks used to monitor training, then build the
# training, encoder, and decoder models
callbacks = create_callbacks("char_s2s")
model, encoder_model, decoder_model = build_models(256, data['num_encoder_tokens'],
                                                   data['num_decoder_tokens'])
print(model.summary())

# Fit the training model, holding out 20% of the data for validation
model.fit(x=[data["encoder_input_data"], data["decoder_input_data"]],
          y=data["decoder_target_data"],
          batch_size=64,
          epochs=100,
          validation_split=0.2,
          callbacks=callbacks)

# Save all three models for use in a separate inference program
model.save('char_s2s_train.h5')
encoder_model.save('char_s2s_encoder.h5')
decoder_model.save('char_s2s_decoder.h5')

You'll note that I haven't defined a separate validation or test set as we normally do. This time, following the example set by the blog post, I'll let Keras randomly choose 20% of the data for validation, which works perfectly fine for an example. If you're going to use this code to actually do machine translation, though, please use a separate test set, as sketched below.
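If you do want a held-out test set, one simple approach is to slice the arrays in the data dictionary before fitting. The following is only a rough sketch, assuming the encoder/decoder arrays are NumPy arrays aligned on the sample axis; the 10% test fraction and the variable names are my own illustrative choices, not part of the original code:

import numpy as np

# Shuffle the sample indices and hold out 10% of the pairs as a test set
num_samples = data["encoder_input_data"].shape[0]
indices = np.random.permutation(num_samples)
test_size = int(0.1 * num_samples)
test_idx, train_idx = indices[:test_size], indices[test_size:]

# Training split, passed to model.fit()
encoder_input_train = data["encoder_input_data"][train_idx]
decoder_input_train = data["decoder_input_data"][train_idx]
decoder_target_train = data["decoder_target_data"][train_idx]

# Held-out test split, used only for a final evaluation after training
encoder_input_test = data["encoder_input_data"][test_idx]
decoder_input_test = data["decoder_input_data"][test_idx]
decoder_target_test = data["decoder_target_data"][test_idx]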

After the training model is fit, I'm going to save all three models and load them again in a separate program built for inference. I'm doing this to keep the code somewhat clean, because the inference code is fairly complex on its own.
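As a minimal sketch, and assuming the models deserialize cleanly from the .h5 files saved above, the inference program can start by reloading all three models with load_model:

from keras.models import load_model

# Reload the training, encoder, and decoder models saved after fitting
model = load_model('char_s2s_train.h5')
encoder_model = load_model('char_s2s_encoder.h5')
decoder_model = load_model('char_s2s_decoder.h5')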

Let's take a look at 100 epochs of training for this model:

[Figure: TensorBoard plot of training loss and val_loss over 100 epochs]

As you can see, we start to overfit somewhere around epoch 20: while the training loss continues to decrease, val_loss begins to increase. Model checkpointing probably won't help much in this scenario, since we won't be serializing the inference models until after training is over. So, ideally, we should train one more time, setting the number of epochs to just slightly more than the epoch at which the smallest val_loss was observed in TensorBoard.
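A rough sketch of that second training run follows. The epoch count of 21 is an assumption based on the run above, where val_loss bottomed out around epoch 20; in practice you would read the exact epoch off of TensorBoard and rerun the training script with the epochs argument changed:

# Rebuild the models from scratch so we don't continue from the overfit weights
model, encoder_model, decoder_model = build_models(256, data['num_encoder_tokens'],
                                                   data['num_decoder_tokens'])

# Refit, stopping just past the epoch where val_loss was lowest
# (roughly epoch 20 in the run above)
model.fit(x=[data["encoder_input_data"], data["decoder_input_data"]],
          y=data["decoder_target_data"],
          batch_size=64,
          epochs=21,
          validation_split=0.2,
          callbacks=callbacks)

# Overwrite the saved models with the better-fit versions
model.save('char_s2s_train.h5')
encoder_model.save('char_s2s_encoder.h5')
decoder_model.save('char_s2s_decoder.h5')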
