How to implement LDA using sklearn

Using the BBC data as before, we use sklearn.decomposition.LatentDirichletAllocation to train an LDA model with five topics (see the sklearn documentation for details on the parameters, and the notebook lda_with_sklearn for implementation details):

from sklearn.decomposition import LatentDirichletAllocation

lda = LatentDirichletAllocation(n_components=5,
                                n_jobs=-1,
                                max_iter=500,
                                learning_method='batch',
                                evaluate_every=5,
                                verbose=1,
                                random_state=42)
lda.fit(train_dtm)
LatentDirichletAllocation(batch_size=128, doc_topic_prior=None,
                          evaluate_every=5, learning_decay=0.7,
                          learning_method='batch', learning_offset=10.0,
                          max_doc_update_iter=100, max_iter=500,
                          mean_change_tol=0.001, n_components=5, n_jobs=-1,
                          n_topics=None, perp_tol=0.1, random_state=42,
                          topic_word_prior=None, total_samples=1000000.0,
                          verbose=1)

Because we set evaluate_every=5, the model computes the in-sample perplexity every five iterations during training and stops iterating once the improvement falls below the tolerance perp_tol. We can persist and load the result as usual with sklearn objects:

import joblib

joblib.dump(lda, model_path / 'lda.pkl')
lda = joblib.load(model_path / 'lda.pkl')