Hamiltonian Monte Carlo

MCMC methods, including Metropolis-Hastings, come with the theoretical guarantee that if we take enough samples, we will get an accurate approximation of the correct distribution. However, in practice, getting enough samples could take more time than we have. For that reason, alternatives to the general Metropolis-Hastings algorithm have been proposed.

Many of those alternative methods, like the Metropolis-Hastings algorithm itself, were originally developed to solve problems in statistical mechanics, a branch of physics that studies the properties of atomic and molecular systems, and thus can be interpreted in a very natural way using analogies of physical systems. One such modification is known as Hamiltonian Monte Carlo, or Hybrid Monte Carlo (HMC). In simple terms, a Hamiltonian is a description of the total energy of a physical system. The name Hybrid is also used because HMC was originally conceived as a hybridization of molecular mechanics, a widely used simulation technique for molecular systems, and Metropolis-Hastings.

The HMC method is essentially the same as Metropolis-Hastings except, and this is a very, very important exception, that the proposal of new positions is not random. To get a general conceptual understanding of HMC without going into mathematical details, let's use the lake and boat analogy again. Instead of moving the boat randomly, we do so by following the curvature of the bottom of the lake. To decide where to move the boat, we let a ball roll along the bottom of the lake, starting from our current position. Our ball is a very special one: not only is it perfectly spherical, it also has no friction and thus is not slowed down by the water or mud. We throw the ball and let it roll for a short moment until we suddenly stop it. Then we accept or reject this proposed step using the Metropolis criterion, just as we did with the vanilla Metropolis-Hastings method, and the whole procedure is repeated many times. This modified procedure has a higher chance of accepting new positions, even if they are far away from the previous position.

Moving according to the curvature of the parameter space turns out to be a smarter way of moving because it avoids one of the main drawbacks of Metropolis-Hastings: trying to explore the sample space efficiently with large jumps means rejecting most of the proposed steps. Instead, using HMC, it is possible to get a high acceptance rate even for faraway points in the parameter space, thus resulting in a very efficient sampling method.

Let's get out of our Gedanken experiment and back to the real world. We have to pay a price for this very clever Hamiltonian-based proposal: we need to compute the gradients of our function. A gradient is the generalization of the concept of the derivative to more than one dimension; computing the derivative of a function at a point tells us in which direction the function increases and in which direction it decreases. We can use this gradient information to simulate the ball moving in a curved space; in fact, we use the same laws of motion and mathematics used in classical physics to simulate classical mechanical systems, such as balls rolling, the orbits of planets, and the jiggling of molecules.
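To make the ball-rolling idea a little more concrete, here is a minimal sketch of a single HMC proposal in Python. It assumes we can evaluate the log-probability of the target distribution and its gradient; the function names (log_prob, grad_log_prob) and the tuning values (step_size, n_steps) are illustrative choices for this sketch, not part of any particular library.

```python
import numpy as np

def hmc_step(q, log_prob, grad_log_prob, step_size=0.1, n_steps=20, rng=None):
    """One HMC transition: simulate the frictionless ball with leapfrog
    integration, then accept or reject with the Metropolis criterion."""
    rng = np.random.default_rng() if rng is None else rng
    p0 = rng.normal(size=q.shape)                      # random initial momentum (the "throw")
    q_new, p = q.copy(), p0.copy()

    # Leapfrog integration: follow the curvature of the log-probability surface
    p += 0.5 * step_size * grad_log_prob(q_new)        # half step for the momentum
    for i in range(n_steps):
        q_new += step_size * p                         # full step for the position
        if i < n_steps - 1:
            p += step_size * grad_log_prob(q_new)      # full step for the momentum
    p += 0.5 * step_size * grad_log_prob(q_new)        # final half step for the momentum

    # Total energy (Hamiltonian) = potential (-log_prob) + kinetic (p**2 / 2)
    h_current = -log_prob(q) + 0.5 * np.sum(p0 ** 2)
    h_proposed = -log_prob(q_new) + 0.5 * np.sum(p ** 2)

    # Metropolis criterion applied to the change in total energy
    if rng.uniform() < np.exp(h_current - h_proposed):
        return q_new                                   # accept the proposal
    return q                                           # reject it, stay where we are

# Usage sketch: sampling a standard normal, where log_prob(q) = -0.5 * q**2
samples = [np.array([0.0])]
for _ in range(1000):
    samples.append(hmc_step(samples[-1],
                            log_prob=lambda q: -0.5 * np.sum(q ** 2),
                            grad_log_prob=lambda q: -q))
```

The proposal is generated by integrating the ball's position and momentum with the gradient, and the final accept/reject decision compares the total energy before and after the move, which plays the role of the Metropolis criterion described earlier.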

Computing gradients makes us face a trade-off: each HMC step is more expensive to compute than a Metropolis-Hastings step, but the probability of accepting that step is much higher with HMC than with Metropolis. To balance this trade-off in favor of HMC, we need to tune a few parameters of the HMC method (in a similar fashion to how we tune the width of the proposal distribution for an efficient Metropolis-Hastings sampler). When this tuning is done by hand, it takes some trial and error and also requires an experienced user, making this procedure a less universal inference engine than we may want. Luckily for us, PyMC3 comes with a relatively new sampler known as the No-U-Turn Sampler (NUTS). This method has proven very useful in providing very good efficiency for solving Bayesian models without requiring human intervention (or at least minimizing it). One caveat of NUTS is that it only works for continuous distributions; the reason is that we cannot compute gradients for discrete distributions. PyMC3 solves this problem by assigning NUTS to continuous parameters and Metropolis to the discrete ones.
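As a minimal sketch of what this looks like in practice, the following PyMC3 model mixes a continuous and a discrete parameter; the model and data are made up purely for illustration. When we call pm.sample(), PyMC3 assigns NUTS to the continuous parameter and Metropolis to the discrete one without any extra work on our part.

```python
import pymc3 as pm

# Toy data, invented for illustration only
data = [4.2, 5.1, 3.8, 4.9]

with pm.Model() as model:
    mu = pm.Normal('mu', mu=0, sd=10)    # continuous parameter -> sampled with NUTS
    k = pm.Poisson('k', mu=2)            # discrete parameter -> sampled with Metropolis
    y = pm.Normal('y', mu=mu + k, sd=1, observed=data)
    trace = pm.sample(1000)              # PyMC3 chooses the samplers automatically
```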

I strongly recommend you complement this section with this very cool animation by Chi Feng: https://chi-feng.github.io/mcmc-demo/.
