With all of these pieces in place, training takes just a few lines, as shown in the following code:
glove_dir = os.path.join(BASE_DIR, 'glove.6B')
text_data_dir = os.path.join(BASE_DIR, '20_newsgroup')
embeddings_index = load_word_vectors(glove_dir)
data = load_data(text_data_dir, vocab_size=20000, sequence_length=1000)
data = tokenize_text(data)
data = train_val_test_split(data)
data["embedding_dim"] = 100
data["embedding_matrix"] = embedding_index_to_matrix(embeddings_index=embeddings_index,
vocab_size=data["vocab_size"],
embedding_dim=data["embedding_dim"],
word_index=data["tokenizer"].word_index)
callbacks = create_callbacks("newsgroups-pretrained")
model = build_model(vocab_size=data["vocab_size"],
                    embedding_dim=data["embedding_dim"],
                    sequence_length=data["sequence_length"],
                    embedding_matrix=data["embedding_matrix"])
model.fit(data["X_train"], data["y_train"],
batch_size=128,
epochs=10,
validation_data=(data["X_val"], data["y_val"]),
callbacks=callbacks)
Note that we're training for only 10 epochs; it doesn't take long to minimize the loss for this problem.
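For reference, the `embedding_index_to_matrix` helper called above can be sketched as follows. The function signature matches the call in the listing, but the body here is an assumption: a minimal NumPy implementation that maps each word in the tokenizer's `word_index` to its pretrained GloVe vector, leaving a zero row for out-of-vocabulary words. The `+ 1` in the matrix size reflects the Keras `Tokenizer`'s 1-based word indices; adjust it if your `build_model` expects a different input dimension.

```python
import numpy as np

def embedding_index_to_matrix(embeddings_index, vocab_size,
                              embedding_dim, word_index):
    """Build a (vocab_size + 1) x embedding_dim matrix of pretrained vectors.

    Row i holds the GloVe vector for the word with index i in word_index;
    words without a pretrained vector keep a row of zeros.
    """
    embedding_matrix = np.zeros((vocab_size + 1, embedding_dim))
    for word, i in word_index.items():
        if i > vocab_size:
            continue  # skip words beyond the capped vocabulary
        vector = embeddings_index.get(word)
        if vector is not None:
            embedding_matrix[i] = vector
    return embedding_matrix
```

Leaving unknown words as zero rows is a common choice; another option is to initialize them with small random values so the network can learn representations for them during training.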