Variational methods

Most of modern Bayesian statistics is done using Markovian methods (see the next section), but for some problems those methods can be too slow. Variational methods are an alternative that could be a better choice for large datasets (think big data) and/or for posteriors that are too expensive to compute.

The general idea of variational methods is to approximate the posterior distribution with a simpler distribution, in a similar fashion to the Laplace method, but in a more elaborate way. We can find this simpler distribution by solving an optimization problem: finding the distribution closest to the posterior under some way of measuring closeness. A common way of measuring closeness between distributions is the Kullback-Leibler (KL) divergence (as discussed in Chapter 5, Model Comparison). Using the KL divergence, we can write:

$$\operatorname{KL}\left(q(\theta) \parallel p(\theta \mid y)\right) = \int q(\theta) \log \frac{q(\theta)}{p(\theta \mid y)} \, d\theta \tag{8.1}$$
Where $q(\theta)$ is the simpler distribution, often called the variational distribution, that we use to approximate the posterior. By using an optimization method, we try to find the parameters of $q(\theta)$ (often called the variational parameters) that make $q(\theta)$ as close as possible, in terms of the KL divergence, to the posterior distribution. Notice that we wrote $\operatorname{KL}\left(q(\theta) \parallel p(\theta \mid y)\right)$ and not $\operatorname{KL}\left(p(\theta \mid y) \parallel q(\theta)\right)$; we do so because this leads to a more convenient way of expressing the problem and a better solution, although I should make it clear that writing the KL divergence in the other direction can also be useful and in fact leads to another set of methods that we will not discuss here.
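
To see why the direction matters, note that the KL divergence is not symmetric. The following snippet is an illustrative sketch (not taken from the original text) that evaluates both directions for two made-up discrete distributions; scipy.stats.entropy(pk, qk) computes the KL divergence of pk from qk:

import numpy as np
from scipy.stats import entropy

# Two made-up discrete distributions over the same three states
p = np.array([0.1, 0.6, 0.3])
q = np.array([0.4, 0.4, 0.2])

# The KL divergence is not symmetric: the two directions differ
print(entropy(q, p))  # KL(q ‖ p), the direction used in expression 8.1
print(entropy(p, q))  # KL(p ‖ q), the reverse direction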

The problem with expression 8.1 is that we do not know the posterior, so we cannot use it directly. We need to find an alternative way to express the problem. The following steps show how to do that; if you do not care about the intermediate steps, feel free to jump to equation 8.7.

First, we replace the conditional distribution with its definition (see Chapter 1, Thinking Probabilistically, if you do not remember how to do this), that is, $p(\theta \mid y) = \frac{p(\theta, y)}{p(y)}$:

$$\operatorname{KL}\left(q(\theta) \parallel p(\theta \mid y)\right) = \int q(\theta) \log \frac{q(\theta)\, p(y)}{p(\theta, y)} \, d\theta \tag{8.2}$$
Then we just reorder 8.2, grouping $p(y)$ as a separate factor inside the logarithm:

$$\operatorname{KL}\left(q(\theta) \parallel p(\theta \mid y)\right) = \int q(\theta) \log \left( \frac{q(\theta)}{p(\theta, y)}\, p(y) \right) d\theta \tag{8.3}$$
By the properties of the logarithm, we have this equation:

$$\operatorname{KL}\left(q(\theta) \parallel p(\theta \mid y)\right) = \int q(\theta) \left( \log \frac{q(\theta)}{p(\theta, y)} + \log p(y) \right) d\theta \tag{8.4}$$
Reordering, we split the integral in two:

$$\operatorname{KL}\left(q(\theta) \parallel p(\theta \mid y)\right) = \int q(\theta) \log \frac{q(\theta)}{p(\theta, y)} \, d\theta + \int q(\theta) \log p(y) \, d\theta \tag{8.5}$$
The integral of $q(\theta)$ is 1 and we can move $\log p(y)$ out of the integral; then we get:

$$\operatorname{KL}\left(q(\theta) \parallel p(\theta \mid y)\right) = \int q(\theta) \log \frac{q(\theta)}{p(\theta, y)} \, d\theta + \log p(y) \tag{8.6}$$
And using the properties of the logarithm to flip the fraction (and hence its sign), we can solve for the log evidence:

$$\log p(y) = \operatorname{KL}\left(q(\theta) \parallel p(\theta \mid y)\right) + \int q(\theta) \log \frac{p(\theta, y)}{q(\theta)} \, d\theta \tag{8.7}$$

The last term in equation 8.7 is known as the evidence lower bound (ELBO).
Since $\operatorname{KL}\left(q(\theta) \parallel p(\theta \mid y)\right) \geq 0$, then $\log p(y) \geq \text{ELBO}$; or, in other words, the evidence (or marginal likelihood) is always equal to or larger than the ELBO, and that is the reason for its name. Since $\log p(y)$ is a constant, we can just focus on the ELBO. Maximizing the value of the ELBO is equivalent to minimizing the KL divergence; thus, maximizing the ELBO is a way to make $q(\theta)$ as close as possible to the posterior, $p(\theta \mid y)$.
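
We can check equation 8.7 numerically. The following sketch uses a made-up conjugate beta-binomial example (not from the original text) so that $\log p(y)$ has a closed form, and evaluates the ELBO on a grid: when $q(\theta)$ equals the true posterior, the ELBO matches the log evidence, and any other $q(\theta)$ gives a strictly smaller value:

import numpy as np
from scipy import stats
from scipy.special import betaln, gammaln

# Made-up data: 6 heads in 9 tosses, with a Beta(1, 1) prior,
# so the posterior is Beta(7, 4) and log p(y) is known exactly
h, n = 6, 9
log_py = (gammaln(n + 1) - gammaln(h + 1) - gammaln(n - h + 1)
          + betaln(1 + h, 1 + n - h))

grid = np.linspace(1e-6, 1 - 1e-6, 10000)
dx = grid[1] - grid[0]

def elbo(a, b):
    # ELBO = integral of q(θ) (log p(θ, y) - log q(θ)) dθ, on a grid
    logq = stats.beta.logpdf(grid, a, b)
    log_joint = (stats.beta.logpdf(grid, 1, 1)
                 + stats.binom.logpmf(h, n, grid))
    return np.sum(np.exp(logq) * (log_joint - logq)) * dx

print(log_py)      # the log evidence
print(elbo(7, 4))  # q is the exact posterior: ELBO equals log p(y)
print(elbo(2, 2))  # a worse q: the ELBO is strictly smaller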

Notice that, so far, we have not introduced any approximation; we have just been doing some algebra. The approximation is introduced the moment we choose $q(\theta)$. In principle, $q(\theta)$ can be anything we want, but in practice we should choose distributions that are easy to deal with. One solution is to assume that the high-dimensional posterior can be described by independent one-dimensional distributions; mathematically, this can be expressed as follows:

$$q(\theta) = \prod_{j} q_j(\theta_j) \tag{8.8}$$
This is known as the mean-field approximation. Mean-field approximations are common in physics, where they are used to model complex systems with many interacting parts as a collection of simpler subsystems that do not interact at all or, more generally, where the interactions are taken into account only on average.
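
As a toy illustration of the factorization in equation 8.8 (a sketch with made-up factors, not from the original text), a mean-field $q$ over two parameters is just the product of two independent one-dimensional distributions:

from scipy import stats

# Two hypothetical variational factors, one per parameter
q1 = stats.norm(0.0, 1.0)   # q_1(θ_1)
q2 = stats.norm(2.0, 0.5)   # q_2(θ_2)

def q_joint(theta1, theta2):
    # Mean-field: the joint density factorizes, so any correlation
    # between θ_1 and θ_2 is ignored by construction
    return q1.pdf(theta1) * q2.pdf(theta2)

print(q_joint(0.0, 2.0))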

We could choose a different distribution, $q_j(\theta_j)$, for each parameter, $\theta_j$. Generally, the $q_j$ distributions are taken from the exponential family because they are easy to deal with. The exponential family includes many of the distributions we have used in this book, such as the Normal, exponential, beta, Dirichlet, gamma, Poisson, categorical, and Bernoulli distributions.

With all these elements in place, we have effectively turned an inference problem into an optimization problem; thus, at least conceptually, all we need to solve it is some off-the-shelf optimization method to maximize the ELBO. In practice, things are a little bit more complex, but we have covered the general idea.
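
As a minimal sketch of how this looks in practice, assuming PyMC3 (the library used throughout this book) and a made-up coin-flip model, pm.fit maximizes the ELBO using automatic differentiation variational inference (ADVI):

import numpy as np
import pymc3 as pm

# Made-up coin-flip data, just for illustration
data = np.array([1, 0, 0, 1, 1, 1, 0, 1, 1])

with pm.Model() as model:
    θ = pm.Beta('θ', alpha=1., beta=1.)
    y = pm.Bernoulli('y', p=θ, observed=data)
    # Maximize the ELBO; approx is the fitted variational distribution
    approx = pm.fit(n=10000, method='advi')

# Draw posterior samples from the variational approximation;
# approx.hist stores the negative ELBO at each iteration
trace = approx.sample(1000)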
