Reverse-engineering the process

The generative process is fictional, but it is useful because it permits the recovery of the document-topic and topic-word distributions. The LDA algorithm reverse-engineers the work of the imaginary author and arrives at a summary of the document-topic-word relationships that concisely describes the following:

  • The percentage contribution of each topic to a document
  • The probabilistic association of each word with a topic

LDA solves the Bayesian inference problem of recovering the distributions from the corpus of documents and the words they contain by reverse-engineering the assumed content generation process. Because the exact posterior distribution is intractable, the original paper approximates it using variational Bayes (VB). Alternatives include Gibbs sampling and expectation propagation. Later, we will illustrate implementations using the sklearn and gensim libraries.
