With all of these pieces in place, training takes just a few lines, as shown in the following code:
glove_dir = os.path.join(BASE_DIR, 'glove.6B')
text_data_dir = os.path.join(BASE_DIR, '20_newsgroup')
embeddings_index = load_word_vectors(glove_dir)
data = load_data(text_data_dir, vocab_size=20000, sequence_length=1000)
data = tokenize_text(data)
data = train_val_test_split(data)
data["embedding_dim"] = 100
data["embedding_matrix"] = embedding_index_to_matrix(embeddings_index=embeddings_index,
vocab_size=data["vocab_size"],
embedding_dim=data["embedding_dim"],
word_index=data["tokenizer"].word_index)
callbacks = create_callbacks("newsgroups-pretrained")
model = build_model(vocab_size=data["vocab_size"],
                    embedding_dim=data["embedding_dim"],
                    sequence_length=data["sequence_length"],
                    embedding_matrix=data["embedding_matrix"])
model.fit(data["X_train"], data["y_train"],
batch_size=128,
epochs=10,
validation_data=(data["X_val"], data["y_val"]),
callbacks=callbacks)
Note that we're training for only 10 epochs; it doesn't take long to minimize the loss for this problem.
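For reference, the `embedding_index_to_matrix` helper called above can be sketched as follows. The function signature matches the call in the listing, but the body here is an assumption: a minimal NumPy implementation that maps each word in the tokenizer's `word_index` to its pretrained GloVe vector, leaving a zero row for out-of-vocabulary words. The `+ 1` in the matrix size reflects the Keras `Tokenizer`'s 1-based word indices; adjust it if your `build_model` expects a different input dimension.

```python
import numpy as np

def embedding_index_to_matrix(embeddings_index, vocab_size,
                              embedding_dim, word_index):
    """Build a (vocab_size + 1) x embedding_dim matrix of pretrained vectors.

    Row i holds the GloVe vector for the word with index i in word_index;
    words without a pretrained vector keep a row of zeros.
    """
    embedding_matrix = np.zeros((vocab_size + 1, embedding_dim))
    for word, i in word_index.items():
        if i > vocab_size:
            continue  # skip words beyond the capped vocabulary
        vector = embeddings_index.get(word)
        if vector is not None:
            embedding_matrix[i] = vector
    return embedding_matrix
```

Leaving unknown words as zero rows is a common choice; another option is to initialize them with small random values so the network can learn representations for them during training.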