The generative model

The Dirichlet distribution figures prominently in the latent Dirichlet allocation (LDA) topic model, which assumes the following generative process when an author adds an article to a body of documents:

  1. Randomly mix a small subset of the K shared topics according to the topic probabilities; the resulting weights are the document-topic probabilities
  2. For each word, select one of the topics according to the document-topic probabilities
  3. For that word, select a term from the chosen topic's word list according to the topic-word probabilities
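
The three steps can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the helper names `sample_dirichlet` and `generate_document`, the toy vocabulary, and the concentration values 0.1 and 0.2 are all assumptions made for the example. It relies only on the standard library and draws a Dirichlet sample by normalizing independent Gamma draws.

```python
import random


def sample_dirichlet(alpha, size, rng):
    """Draw one sample from a symmetric Dirichlet by normalizing Gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(n_words, doc_alpha, topic_word_probs, vocab, rng):
    """Generate one document following the three steps of the process."""
    n_topics = len(topic_word_probs)
    # Step 1: draw the document's mix of topics (document-topic probabilities)
    doc_topic_probs = sample_dirichlet(doc_alpha, n_topics, rng)
    words = []
    for _ in range(n_words):
        # Step 2: select a topic for this word position
        topic = rng.choices(range(n_topics), weights=doc_topic_probs)[0]
        # Step 3: select a term from that topic's word distribution
        word = rng.choices(vocab, weights=topic_word_probs[topic])[0]
        words.append(word)
    return doc_topic_probs, words


rng = random.Random(42)
vocab = [f"word_{i}" for i in range(20)]  # hypothetical toy vocabulary
n_topics = 5                              # K, chosen arbitrarily here

# Topic-word probabilities shared by all documents (one row per topic)
topic_word_probs = [sample_dirichlet(0.1, len(vocab), rng) for _ in range(n_topics)]

doc_topic_probs, words = generate_document(50, 0.2, topic_word_probs, vocab, rng)
print([round(p, 3) for p in doc_topic_probs])
print(" ".join(words[:10]))
```

The topic-word probabilities are drawn once and shared across documents, whereas each call to `generate_document` draws a fresh topic mix for that document.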

As a result, the article content depends on the weight of each topic and on the terms that make up each topic. The Dirichlet distribution governs both the selection of topics for documents and the selection of words for topics. It encodes the idea that a document covers only a few topics, while each topic uses only a small number of words frequently.
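
How strongly the Dirichlet distribution concentrates probability on a few components depends on its concentration parameter: values below 1 favor sparse draws, and large values favor nearly uniform ones. The sketch below compares the two regimes by averaging the largest weight across many draws. The function name `mean_largest_weight`, the ten components, and the values 0.1 and 10.0 are illustrative assumptions, not settings taken from the text.

```python
import random


def sample_dirichlet(alpha, size, rng):
    """Draw one sample from a symmetric Dirichlet by normalizing Gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]


def mean_largest_weight(alpha, size=10, n_draws=500, seed=0):
    """Average the largest component over many draws as a rough sparsity measure."""
    rng = random.Random(seed)
    largest = [max(sample_dirichlet(alpha, size, rng)) for _ in range(n_draws)]
    return sum(largest) / n_draws


sparse = mean_largest_weight(alpha=0.1)   # small alpha: a few components dominate
dense = mean_largest_weight(alpha=10.0)   # large alpha: weights are close to uniform

print(f"alpha=0.1  mean largest weight: {sparse:.3f}")
print(f"alpha=10.0 mean largest weight: {dense:.3f}")
```

With the small concentration value, a single component typically takes most of the probability mass, which matches the picture of a document dominated by a few topics.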

The plate notation for the LDA model summarizes these relationships:

[Figure: plate notation for the LDA model]