The Expectation Maximization (EM) algorithm is a probabilistic-model-based clustering algorithm. It relies on a mixture model, in which the data is modeled as a mixture of simple models. The parameters of these models are estimated by Maximum Likelihood Estimation (MLE).
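Mixture models assume that the data is generated by a combination of several simple probability distributions. Given K distributions, where the jth distribution has the parameter $\theta_j$ and the mixing weight $w_j$ (the weights are non-negative and sum to 1), and $\Theta = \{w_j, \theta_j\}_{j=1}^{K}$ is the set of parameters of all distributions, the probability of a data object $x$ is:
$$p(x \mid \Theta) = \sum_{j=1}^{K} w_j \, p_j(x \mid \theta_j)$$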
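As a small illustration of the mixture idea, the following sketch evaluates the density of a two-component univariate Gaussian mixture. The function name `mixture_density`, the choice of Gaussian components, and all parameter values are assumptions made for illustration; they are not taken from the book's code bundle.

```r
# Density of a univariate Gaussian mixture: sum over j of w[j] * N(x | mu[j], sigma[j]).
# All names and parameter values below are illustrative assumptions.
mixture_density <- function(x, w, mu, sigma) {
  stopifnot(length(w) == length(mu), length(mu) == length(sigma))
  stopifnot(abs(sum(w) - 1) < 1e-8)  # mixing weights must sum to 1
  # Weighted density of each component at each data point
  comp <- sapply(seq_along(w), function(j) w[j] * dnorm(x, mu[j], sigma[j]))
  # sapply returns a plain vector when x has length 1, so force an n x K matrix
  comp <- matrix(comp, nrow = length(x))
  rowSums(comp)
}

w     <- c(0.3, 0.7)  # mixing weights
mu    <- c(0, 5)      # component means
sigma <- c(1, 2)      # component standard deviations
dens  <- mixture_density(c(-1, 0, 2.5, 5), w, mu, sigma)
print(dens)
```

Each data point receives a density that is the weighted sum of the component densities, which is exactly what the EM algorithm later tries to fit.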
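The EM algorithm works as follows. In the first step, an initial set of model parameters is selected. The second step is the expectation step, which calculates, for each data object $x_i$ and each distribution $j$ with weight $w_j$ and parameter $\theta_j$, the probability:
$$P(j \mid x_i, \Theta) = \frac{w_j \, p_j(x_i \mid \theta_j)}{\sum_{l=1}^{K} w_l \, p_l(x_i \mid \theta_l)}$$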
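The previous equation gives the probability that each data object belongs to each distribution. The third step is the maximization step. Using the result of the expectation step, the parameter estimates are updated to the values that maximize the expected likelihood:
$$\Theta^{(t+1)} = \arg\max_{\Theta} \sum_{i=1}^{n} \sum_{j=1}^{K} P(j \mid x_i, \Theta^{(t)}) \, \log\big( w_j \, p_j(x_i \mid \theta_j) \big)$$
Here $\Theta^{(t)}$ denotes the current estimates, $n$ is the number of data objects, and the maximization is over the weights $w_j$ and parameters $\theta_j$ collected in $\Theta$.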
The expectation step and the maximization step are repeated until the ending condition is met, that is, until the changes in the parameter estimates fall below a certain threshold.
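The steps above can be sketched in R for the simple case of a univariate Gaussian mixture. This is a minimal sketch under stated assumptions, not the implementation from the book's code bundle: the function name `em_gmm`, its arguments, the quantile-based initialization, and the simulated data are all hypothetical choices made for illustration.

```r
# Minimal EM sketch for a univariate Gaussian mixture with K components.
# Function name, arguments, initialization, and data are illustrative assumptions.
em_gmm <- function(x, K = 2, tol = 1e-6, max_iter = 500) {
  n <- length(x)
  stopifnot(K >= 2, n > K)

  # Step 1: initial parameters (equal weights, spread-out quantiles as means)
  w     <- rep(1 / K, K)
  mu    <- as.numeric(quantile(x, probs = (seq_len(K) - 0.5) / K))
  sigma <- rep(sd(x), K)

  for (iter in seq_len(max_iter)) {
    # Expectation step: P(component j | x_i, current parameters), an n x K matrix
    dens <- sapply(seq_len(K), function(j) w[j] * dnorm(x, mu[j], sigma[j]))
    resp <- dens / rowSums(dens)

    # Maximization step: closed-form updates for Gaussian components
    nk        <- colSums(resp)
    w_new     <- nk / n
    mu_new    <- colSums(resp * x) / nk
    sigma_new <- sqrt(colSums(resp * outer(x, mu_new, "-")^2) / nk)

    # Ending condition: largest change in any parameter estimate below tol
    delta <- max(abs(c(w_new - w, mu_new - mu, sigma_new - sigma)))
    w <- w_new; mu <- mu_new; sigma <- sigma_new
    if (delta < tol) break
  }

  # Hard cluster labels from the responsibilities of the last expectation step
  list(weights = w, means = mu, sds = sigma, iterations = iter,
       cluster = max.col(resp, ties.method = "first"))
}

# Simulated data from two Gaussians (assumed values, for demonstration only)
set.seed(42)
x   <- c(rnorm(200, mean = 0, sd = 1), rnorm(300, mean = 6, sd = 1.5))
fit <- em_gmm(x, K = 2)
print(fit$means)
print(fit$weights)
```

The sketch stops on the change in the parameter estimates, matching the ending condition described above. It does not guard against a component's standard deviation collapsing toward zero, which can happen with poor initial values or very small clusters.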
Please take a look at the R code file ch_06_em.R from the bundle of R code for an implementation of the EM algorithm described above. The code can be tested with the following command:
> source("ch_06_em.R")
Determining user search intent is an important yet difficult problem, because the data available about the searcher and the queries is sparse.
User intent has a wide range of applications, including cluster query refinements, user intention profiles, and web search intent induction. Given web search engine queries, identifying the user's intention is also a key requirement.
The sequence of clicks on the search results is a good basis for determining the user's interests and preferences.
Web search personalization is another important application of user search intention. It depends on both the user's context and the user's intention. When user intention is taken into account, information can be provided more effectively and efficiently.