Training

With all the code put together, training can be done in just a few lines, as shown in the following code:

glove_dir = os.path.join(BASE_DIR, 'glove.6B')
text_data_dir = os.path.join(BASE_DIR, '20_newsgroup')
embeddings_index = load_word_vectors(glove_dir)

data = load_data(text_data_dir, vocab_size=20000, sequence_length=1000)
data = tokenize_text(data)
data = train_val_test_split(data)
data["embedding_dim"] = 100
data["embedding_matrix"] = embedding_index_to_matrix(embeddings_index=embeddings_index,
                                                     vocab_size=data["vocab_size"],
                                                     embedding_dim=data["embedding_dim"],
                                                     word_index=data["tokenizer"].word_index)

callbacks = create_callbacks("newsgroups-pretrained")
model = build_model(vocab_size=data["vocab_size"],
                    embedding_dim=data['embedding_dim'],
                    sequence_length=data['sequence_length'],
                    embedding_matrix=data['embedding_matrix'])

model.fit(data["X_train"], data["y_train"],
          batch_size=128,
          epochs=10,
          validation_data=(data["X_val"], data["y_val"]),
          callbacks=callbacks)
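The helper functions in this listing were built earlier in the chapter. As a refresher, the step that benefits most from a second look is embedding_index_to_matrix, which maps each word in the tokenizer's vocabulary to its pretrained GloVe vector. A minimal sketch, assuming embeddings_index maps words to NumPy vectors and word_index maps words to integer indices (both assumptions based on the earlier code, not shown here), might look like this:

```python
import numpy as np

def embedding_index_to_matrix(embeddings_index, vocab_size, embedding_dim, word_index):
    # Row i of the matrix holds the pretrained vector for the word with
    # tokenizer index i; words missing from GloVe are left as all zeros.
    embedding_matrix = np.zeros((vocab_size, embedding_dim))
    for word, i in word_index.items():
        if i >= vocab_size:
            continue  # word is outside the vocab_size most frequent words
        vector = embeddings_index.get(word)
        if vector is not None:
            embedding_matrix[i] = vector
    return embedding_matrix
```

The resulting matrix is what gets passed to the Embedding layer as its initial weights, which is how the pretrained GloVe knowledge enters the network.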

Note that we're training for only 10 epochs; it doesn't take long to minimize loss on this problem.
