Visualizing document embeddings

Our document CNN model contains a document embedding layer. Let's try to visualize what features the model has learned at this layer. We will first take the test set and compute the document embeddings, as follows:

# Compute the document embedding for every test document
doc_embeddings = newsgrp_model.get_document_model().predict(x_test)
print(doc_embeddings.shape)

(7318, 80)
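As a reminder, get_document_model() was defined when we assembled the CNN earlier. A sub-model that exposes a named embedding layer can be built along these lines; this is only a sketch, and the layer name 'doc_embedding' is an assumption, so substitute whatever name your own model uses:

from tensorflow.keras.models import Model

def get_document_model(trained_model):
    # Sub-model mapping the model's input to the activations of the
    # document embedding layer. The layer name 'doc_embedding' is an
    # assumption here; use the actual name from your trained CNN.
    embedding_layer = trained_model.get_layer('doc_embedding')
    return Model(inputs=trained_model.input,
                 outputs=embedding_layer.output)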

We get 80-dimensional embedding vectors for all the test documents. To visualize these vectors, we will use the popular t-SNE dimensionality reduction technique to project them into two-dimensional space and draw a scatter plot, as follows:

import numpy as np
from sklearn.manifold import TSNE
from utils import scatter_plot

# Project the 80-dimensional embeddings down to two dimensions
doc_proj = TSNE(n_components=2,
                random_state=42).fit_transform(doc_embeddings)
f, ax, sc, txts = scatter_plot(doc_proj, np.array(test_labels))
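The scatter_plot helper comes from the utils module that accompanies this chapter's code. If you don't have it, a minimal stand-in with the same call signature might look like the following; the styling and the centroid annotations are assumptions, not the book's exact implementation:

import numpy as np
import matplotlib.pyplot as plt

def scatter_plot(projections, labels):
    # Color each 2D point by its class label
    f, ax = plt.subplots(figsize=(8, 8))
    sc = ax.scatter(projections[:, 0], projections[:, 1],
                    c=labels, cmap='tab10', s=10)
    txts = []
    for label in np.unique(labels):
        # Annotate each class at the median position of its points
        cx, cy = np.median(projections[labels == label], axis=0)
        txts.append(ax.text(cx, cy, str(label), fontsize=16))
    return f, ax, sc, txts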

The output of the preceding code is as follows:

The labels (0-5) on the scatter plot represent the six classes. As you can see, the model has learned good embeddings: the six classes are well separated, even after projecting the 80-dimensional vectors down to two dimensions. We can use these embeddings for other text analytics tasks, such as information retrieval or text search. Given a query document, we can compute its dense embedding and compare it with the embeddings of all the documents in the corpus to find the most similar ones. This can help us augment keyword-based query results and improve retrieval performance.
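As a sketch of this retrieval idea, we can rank documents by the cosine similarity of their embeddings to a query embedding using scikit-learn. Here the query is simply the first test document, an illustrative choice:

from sklearn.metrics.pairwise import cosine_similarity

# Treat the first test document as the query (illustrative choice)
query_embedding = doc_embeddings[0:1]            # shape (1, 80)

# Cosine similarity of the query against every document embedding
similarities = cosine_similarity(query_embedding, doc_embeddings)[0]

# Indices of the five most similar documents; the query itself
# ranks first with similarity 1.0
top5 = similarities.argsort()[::-1][:5]
print(top5, similarities[top5])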
