Model-based clustering algorithms

Model-based algorithms assume that the clusters in the data space follow a particular mathematical model and try to find the model parameters that maximize the likelihood of the available data. Often, this relies on the apparatus of mathematical statistics.

The EM (Expectation–Maximization) algorithm, as applied to clustering, assumes that the dataset can be modeled as a mixture (a weighted sum) of multidimensional normal distributions. Its purpose is to estimate the distribution parameters that maximize the likelihood function, which serves as a measure of model quality. In other words, it assumes that the data in each cluster obeys a particular distribution law, namely, the normal distribution. Under this assumption, we can determine the optimal parameters of each distribution: the mean and variance (the covariance matrix in the multidimensional case) at which the likelihood function is maximal. The algorithm alternates two steps until the parameters stop changing: the expectation step computes the probability that each object belongs to each cluster given the current parameters, and the maximization step re-estimates the parameters using these probabilities as weights. Thus, we assume that any object belongs to all clusters, but with different probabilities. The task is to fit the set of distributions to the data and then determine the probability of each object belonging to each cluster. The object is assigned to the cluster for which this probability is highest.
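To make the two alternating steps concrete, here is a minimal sketch of EM for a mixture of one-dimensional normal distributions. It is an illustration under simplifying assumptions rather than a production implementation: the data is scalar, the number of iterations is fixed instead of being driven by a convergence test, and the names Component, NormalPdf, Responsibilities, FitEm, and Assign, as well as the variance floor kMinVariance, are hypothetical helpers introduced only for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One normal component of a 1-D mixture (illustrative type, not a library API).
struct Component {
  double weight;    // mixing proportion; the weights of all components sum to 1
  double mean;
  double variance;
};

// Density of the normal distribution N(mean, variance) at point x.
double NormalPdf(double x, double mean, double variance) {
  const double kPi = 3.14159265358979323846;
  const double d = x - mean;
  return std::exp(-0.5 * d * d / variance) / std::sqrt(2.0 * kPi * variance);
}

// E-step: resp[i][k] is the probability that object i belongs to cluster k.
std::vector<std::vector<double>> Responsibilities(
    const std::vector<double>& data, const std::vector<Component>& model) {
  std::vector<std::vector<double>> resp(
      data.size(), std::vector<double>(model.size(), 0.0));
  for (std::size_t i = 0; i < data.size(); ++i) {
    double total = 0.0;
    for (std::size_t k = 0; k < model.size(); ++k) {
      resp[i][k] = model[k].weight *
                   NormalPdf(data[i], model[k].mean, model[k].variance);
      total += resp[i][k];
    }
    for (std::size_t k = 0; k < model.size(); ++k) {
      // If every density underflowed to zero, fall back to equal probabilities.
      resp[i][k] = total > 0.0 ? resp[i][k] / total : 1.0 / model.size();
    }
  }
  return resp;
}

// Runs a fixed number of EM iterations, updating the model in place.
void FitEm(const std::vector<double>& data, std::vector<Component>& model,
           int iterations) {
  assert(!data.empty() && !model.empty());
  const double kMinVariance = 1e-6;  // assumed floor that avoids a zero variance
  for (int it = 0; it < iterations; ++it) {
    const auto resp = Responsibilities(data, model);  // E-step
    // M-step: re-estimate each component using the probabilities as weights.
    for (std::size_t k = 0; k < model.size(); ++k) {
      double nk = 0.0, sum = 0.0;
      for (std::size_t i = 0; i < data.size(); ++i) {
        nk += resp[i][k];
        sum += resp[i][k] * data[i];
      }
      if (nk <= 0.0) continue;  // component received no mass; leave it as is
      const double mean = sum / nk;
      double sq = 0.0;
      for (std::size_t i = 0; i < data.size(); ++i) {
        const double d = data[i] - mean;
        sq += resp[i][k] * d * d;
      }
      model[k].weight = nk / static_cast<double>(data.size());
      model[k].mean = mean;
      model[k].variance = std::max(sq / nk, kMinVariance);
    }
  }
}

// Hard assignment: index of the cluster with the highest probability for x.
std::size_t Assign(double x, const std::vector<Component>& model) {
  const std::vector<double> resp =
      Responsibilities(std::vector<double>{x}, model)[0];
  return static_cast<std::size_t>(
      std::max_element(resp.begin(), resp.end()) - resp.begin());
}
```

A caller would fill a std::vector<Component> with initial guesses (for example, equal weights, unit variances, and distinct means), call FitEm on the data, and then use Assign to label individual objects. The starting means are supplied by the caller, so the result depends on them.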

The EM algorithm is simple and easy to implement. It is not sensitive to isolated objects and converges quickly when the initialization is good. However, it requires us to specify the number of clusters, k, which implies a priori knowledge about the data. Also, if the initialization is poor, the algorithm may converge slowly or produce a poor-quality result, because it only finds a local maximum of the likelihood. Such algorithms are poorly suited to high-dimensional spaces since, in this case, it is difficult to assume a mathematical model for the distribution of the data.

In this section, we discussed various clustering algorithms, and in the following sections, we will see how to use them in real examples with various C++ libraries.
