Finite mixture models

One way to build mixture models is to consider a finite weighted mixture of two or more distributions. This is known as a finite mixture model. Thus, the probability density of the observed data is a weighted sum of the probability densities of the $K$ subgroups of the data:

$$p(y) = \sum_{i=1}^{K} w_i \, p_i(y)$$

Here, $w_i$ is the weight of each component (or class). We can interpret $w_i$ as the probability of component $i$; thus, its values are restricted to the interval $[0, 1]$ and $\sum_{i=1}^{K} w_i = 1$. The components $p_i(y)$ can be virtually anything we may consider useful, from simple distributions, such as a Gaussian or a Poisson, to more complex objects, such as hierarchical models or neural networks. For a finite mixture model, $K$ is a finite number (usually, but not necessarily, a small number). In order to fit a finite mixture model, we need to provide a value of $K$, either because we really know the correct value beforehand, or because we have some educated guess.
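
As a minimal sketch of this equation (with arbitrary, made-up weights and Gaussian components, not values estimated from any data), we can evaluate and plot a mixture density directly with NumPy and SciPy:

import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

# made-up weights w_i (must sum to 1) and Gaussian components p_i
weights = [0.3, 0.5, 0.2]
components = [stats.norm(-2, 0.8), stats.norm(0, 0.5), stats.norm(3, 1.0)]

x = np.linspace(-6, 7, 500)
# p(y) = sum_i w_i p_i(y)
mixture_pdf = sum(w * comp.pdf(x) for w, comp in zip(weights, components))
plt.plot(x, mixture_pdf)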

Conceptually, to solve a mixture model, all we need to do is properly assign each data point to one of the components. In a probabilistic model, we can do this by introducing a random variable $z$, whose function is to specify to which component a particular observation is assigned. This variable is generally referred to as a latent variable. We call it latent because it is not directly observable.
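
To make the role of $z$ concrete, the following sketch (again with made-up weights and components) first draws $z$ according to the component weights and then draws each observation from the component that $z$ selects:

import numpy as np
from scipy import stats

# same kind of made-up weights and Gaussian components as before
weights = [0.3, 0.5, 0.2]
components = [stats.norm(-2, 0.8), stats.norm(0, 0.5), stats.norm(3, 1.0)]

rng = np.random.default_rng(123)
# latent variable z: the component each observation is assigned to
z = rng.choice(len(weights), size=1000, p=weights)
# each observation is drawn from the component indicated by its z
y = np.array([components[k].rvs(random_state=rng) for k in z])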

Let's start building mixture models by using the chemical shifts data we already saw in Chapter 2, Programming Probabilistically:

import pandas as pd
import matplotlib.pyplot as plt
import arviz as az

cs = pd.read_csv('../data/chemical_shifts_theo_exp.csv')
cs_exp = cs['exp']
# KDE of the experimental chemical shifts, with a histogram underneath
az.plot_kde(cs_exp)
plt.hist(cs_exp, density=True, bins=30, alpha=0.3)
plt.yticks([])

Figure 6.2

From Figure 6.2, we can see that this data cannot be properly described using a single Gaussian, but maybe three or four Gaussians could do the trick. In fact, there are good theoretical reasons, which we will not discuss here, indicating that the data really comes from a mixture of around 40 sub-populations, albeit with considerable overlap between them.

To develop the intuition behind mixture models, we can get ideas from the coin-flipping problem. For that model, we had two possible outcomes and we used the Bernoulli distribution to describe them. Since we did not know the probability of getting heads or tails, we used a beta distribution as a prior. A mixture model is similar, except that instead of two outcomes, like heads or tails, we now have $K$ outcomes (or $K$ components). The generalization of the Bernoulli distribution to $K$ outcomes is the categorical distribution, and the generalization of the beta distribution is the Dirichlet distribution. So, let me introduce these two new distributions.
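
As a quick preview (a minimal sketch using NumPy, not a model definition), we can draw a weight vector from a Dirichlet and then component assignments from a categorical; the concentration parameter below is chosen arbitrarily:

import numpy as np

rng = np.random.default_rng(42)
# Dirichlet draw: a vector of K=3 weights that sums to 1 (alpha chosen arbitrarily)
w = rng.dirichlet(alpha=[1.0, 1.0, 1.0])
# categorical draws: a component label in {0, 1, 2} for each of 10 observations
z = rng.choice(3, size=10, p=w)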
