Mixture models

Mixture models naturally arise when the overall population is a combination of distinct sub-populations. A familiar example is the distribution of heights in an adult human population, which can be described as a mixture of female and male sub-populations. Another classical example is the clustering of handwritten digits; in this case, it is very reasonable to expect 10 sub-populations, at least in a base 10 system! If we know which sub-population each observation belongs to, it is generally a good idea to use that information and model each sub-population as a separate group. However, when we do not have direct access to this information, mixture models come in handy.

Many datasets cannot be properly described using a single probability distribution, but they can be described as a mixture of such distributions. Models that assume data comes from a mixture of distributions are known as mixture models.
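
Generatively, sampling from a mixture is a two-step process: first pick a component according to the mixture weights, then draw a value from that component's distribution. The following is a minimal sketch of this idea for the heights example; the means, standard deviations, and the 50/50 split are made-up values chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical parameters for the heights example (in cm):
# two Gaussian sub-populations, mixed with weights that sum to 1.
weights = np.array([0.5, 0.5])    # proportion of each sub-population
means = np.array([162.0, 176.0])  # assumed sub-population means
stds = np.array([7.0, 7.5])       # assumed standard deviations

n = 1_000
# Step 1: pick a component for each observation according to the weights.
components = rng.choice(len(weights), size=n, p=weights)
# Step 2: sample each observation from its assigned Gaussian.
heights = rng.normal(means[components], stds[components])
```

A histogram of heights would show a single blended (and possibly bimodal) distribution, even though every individual value was generated by exactly one of the two Gaussians; the mixture model's job is to work backwards from data like this to the components.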

When building a mixture model, it is not strictly necessary to believe we are describing true sub-populations in the data. Mixture models can also be used as a statistical trick to add flexibility to our toolbox. Take, for example, the Gaussian distribution. We can use it as a reasonable approximation for many unimodal and more or less symmetrical distributions. But what about multimodal or skewed distributions? Can we use Gaussian distributions to model them? Yes, we can, if we use a mixture of Gaussians. In a Gaussian mixture model, each component is a Gaussian with a different mean and generally (but not necessarily) a different standard deviation. By combining Gaussians, we can add flexibility to our model in order to fit complex data distributions. In fact, we can approximate virtually any distribution with a suitable combination of Gaussians; the number of components needed will depend on the details of the data and the desired accuracy of the approximation.

We have been applying this idea of a mixture of Gaussians in many of the plots throughout this book. The Kernel Density Estimation (KDE) technique is a non-Bayesian (and non-parametric) implementation of this idea. Conceptually, when we call az.plot_kde, the function places a Gaussian (with a fixed variance) on top of each data point and then sums all the individual Gaussians to approximate the empirical distribution of the data.

Figure 6.1 shows an actual example of how we can mix eight Gaussians to represent a complex distribution, like a boa constrictor digesting an elephant (if you do not get the reference, I strongly recommend you get a copy of the book The Little Prince). In Figure 6.1, all the Gaussians have the same variance and are centered at the orange dots, which represent sample points from a possible unknown population. If you look carefully at the figure, you may notice that two of the Gaussians are basically one on top of the other:

Figure 6.1: Eight equal-variance Gaussians, centered at the orange sample points, summed to approximate a complex distribution
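
To make the KDE connection concrete, here is a minimal sketch of the construction behind Figure 6.1, computed by hand with NumPy/SciPy rather than ArviZ; the eight sample points and the fixed bandwidth are assumptions chosen purely for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
data = rng.normal(0, 1, size=8)  # stand-in for the eight orange sample points
grid = np.linspace(data.min() - 3, data.max() + 3, 200)

bandwidth = 0.5  # the same fixed standard deviation for every kernel
# One Gaussian centered on each data point, all with the same variance...
kernels = np.array([stats.norm(loc=x, scale=bandwidth).pdf(grid) for x in data])
# ...summed and divided by the number of points, so the result is a
# proper density approximating the empirical distribution of the data.
kde = kernels.sum(axis=0) / len(data)
```

az.plot_kde automates essentially this construction, choosing the bandwidth automatically; dividing the summed kernels by the number of points keeps the estimate a valid density that integrates to 1.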

Whether we really believe in sub-populations, or we use them for mathematical convenience (or even something in the middle), mixture models are a useful way of adding flexibility to our models by using a mixture of distributions to describe the data.
