Sequential Monte Carlo

One caveat of Metropolis-Hastings, and also of NUTS (and other Hamiltonian Monte Carlo variants), is that if the posterior has multiple peaks separated by regions of very low probability, these methods can get stuck in a single mode and miss the others!

Many of the methods developed to overcome this multimodality problem are based on the idea of tempering. This idea, once again, is borrowed from statistical mechanics. The number of states a physical system can populate depends on the temperature of the system; at 0 Kelvin (the lowest possible temperature), every system is stuck in a single state. On the other extreme, for an infinite temperature all possible states are equally likely. Generally, we are interested in systems at some intermediate temperature. For Bayesian models, there is a very intuitive way to adapt this tempering idea by writing Bayes' theorem with a twist:

p(θ|y)_β ∝ p(y|θ)^β p(θ)        (8.13)

The only difference between expressions 1.4 and 8.13 is the β parameter, known as the inverse temperature or tempering parameter. Notice that for β = 0, p(y|θ)^β = 1, and thus the tempered posterior, p(θ|y)_β, is just the prior, p(θ); when β = 1, the tempered posterior is the actual full posterior. As sampling from the prior is generally easier than sampling from the posterior, we can start sampling from an easier distribution and, by increasing the value of β, slowly morph it into the more complex distribution we really care about.
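To make the relationship concrete, here is a minimal sketch of a tempered log-posterior (the Gaussian prior, likelihood, and data are invented purely for illustration):

```python
import numpy as np
from scipy import stats

# Toy model (hypothetical, for illustration only)
y = np.array([0.5, 1.2, 0.8])                      # observed data

def log_prior(theta):
    return stats.norm.logpdf(theta, loc=0, scale=10)

def log_likelihood(theta):
    return stats.norm.logpdf(y, loc=theta, scale=1).sum()

def tempered_log_post(theta, beta):
    # log p(θ|y)_β = β·log p(y|θ) + log p(θ) + constant
    return beta * log_likelihood(theta) + log_prior(theta)

theta = 0.7
print(tempered_log_post(theta, beta=0.0))          # just the log-prior
print(tempered_log_post(theta, beta=1.0))          # full unnormalized log-posterior
```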

There are many methods that exploit this idea; one of them is known as Sequential Monte Carlo (SMC). The SMC method, as implemented in PyMC3, can be summarized as follows:

  1. Initialize β at zero.
  2. Generate N samples S_β from the tempered posterior (for β = 0, this is just the prior).
  3. Increase β a little bit.
  4. Compute a set of N weights W. The weights are computed according to the new tempered posterior.
  5. Obtain S_w by resampling S_β according to W.
  6. Run N Metropolis chains, starting each one from a different sample in S_w.
  7. Repeat from step 3 until β reaches 1.

The resampling step works by removing samples with low probability and replacing them with samples of higher probability. The Metropolis step perturbs these samples, helping to explore the parameter space (a minimal sketch of the whole loop follows).
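The sketch below implements the steps above with NumPy for a toy one-parameter Gaussian model, using a fixed β increment and a fixed proposal width; the model, increment, and proposal width are all assumptions for illustration (PyMC3 adapts these quantities automatically, as discussed later):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
y = rng.normal(1.0, 1.0, size=20)            # synthetic data, for illustration

def log_prior(thetas):
    return stats.norm.logpdf(thetas, 0, 10)

def log_like(thetas):
    return stats.norm.logpdf(y[:, None], thetas, 1).sum(axis=0)

N = 1000
beta, d_beta = 0.0, 0.1                      # fixed increment, for simplicity
samples = rng.normal(0, 10, size=N)          # steps 1-2: sample the prior (β = 0)

while beta < 1:
    new_beta = min(1.0, beta + d_beta)       # step 3: increase β a little
    # step 4: incremental importance weights (the prior factors cancel)
    log_w = (new_beta - beta) * log_like(samples)
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    # step 5: resample according to the weights
    samples = rng.choice(samples, size=N, p=w)
    # step 6: a few Metropolis steps from each resampled particle
    for _ in range(5):
        proposal = samples + rng.normal(0, 0.5, size=N)
        log_acc = (new_beta * log_like(proposal) + log_prior(proposal)
                   - new_beta * log_like(samples) - log_prior(samples))
        samples = np.where(np.log(rng.uniform(size=N)) < log_acc,
                           proposal, samples)
    beta = new_beta                          # step 7: repeat until β reaches 1

print(samples.mean(), samples.std())         # approximate posterior mean and sd
```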

The efficiency of the tempered method depends heavily on the intermediate values of β, usually referred to as the cooling schedule. The smaller the difference between two successive values of β, the closer the two successive tempered posteriors will be, and thus the easier the transition from one stage to the next. But if the steps are too small, we will need many intermediate stages, and beyond some point this translates into wasting a lot of computational resources without really improving the accuracy of the results.

Fortunately, SMC can automatically compute the intermediate values of β. The exact cooling schedule adapts to the difficulty of the problem; distributions that are harder to sample require more stages than simpler ones.
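PyMC3's exact rule is not reproduced here, but a common scheme in SMC implementations, assumed below for illustration, picks the next β by bisection so that the effective sample size (ESS) of the incremental weights stays near a target fraction of N:

```python
import numpy as np

def next_beta(log_like_vals, beta, target_ess=0.5):
    """Bisect for the next β so that the ESS of the incremental
    importance weights is about target_ess * N (illustrative)."""
    def ess_frac(new_beta):
        log_w = (new_beta - beta) * log_like_vals
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        return 1.0 / (np.sum(w ** 2) * len(w))  # ESS as a fraction of N

    if ess_frac(1.0) >= target_ess:          # jump straight to the posterior
        return 1.0
    low, high = beta, 1.0
    for _ in range(50):                      # bisection on (β, 1)
        mid = 0.5 * (low + high)
        if ess_frac(mid) >= target_ess:
            low = mid                        # weights still healthy: push further
        else:
            high = mid                       # too degenerate: back off
    return low
```

In the earlier sketch, this would replace the fixed increment, for example `new_beta = next_beta(log_like(samples), beta)`.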

SMC is summarized in Figure 8.6. The first subplot shows five samples (orange dots) at some particular stage. The second subplot shows how these samples are re-weighted according to their tempered posterior density (blue curve). The third subplot shows the result of running a certain number of Metropolis steps, starting from the re-weighted samples in the second subplot; notice how the two samples with the lowest posterior density (the smallest circles at the far right and far left) are discarded and not used to seed new Markov chains:

Figure 8.6

Besides the intermediate values of β, two more parameters are dynamically computed based on the acceptance rate of the previous stage: the number of steps of each Markov chain and the width of the proposal distribution.
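As a hypothetical illustration of such tuning (the target rate, bounds, and formulas below are assumptions, not PyMC3's actual rules), the proposal width can be nudged toward a target acceptance rate, and the number of steps chosen so that each chain is very likely to accept at least one move:

```python
import numpy as np

def tune(scale, acc_rate, target=0.234, p_goal=0.99):
    """Adapt proposal width and chain length from the previous
    stage's acceptance rate (illustrative rules, not PyMC3's)."""
    # widen the proposal when acceptance is high, shrink it when low
    new_scale = scale * np.exp(acc_rate - target)
    # enough steps so each chain accepts at least once w.p. ~p_goal
    acc = min(max(acc_rate, 1e-3), 1 - 1e-6)
    n_steps = int(np.ceil(np.log(1 - p_goal) / np.log(1 - acc)))
    return new_scale, n_steps
```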

For step 6 of the SMC algorithm, PyMC3 uses the Metropolis algorithm; this is not necessarily the only choice, but it is a very reasonable one, motivated by both theoretical and practical arguments. It is important to note that even though the SMC method uses the Metropolis algorithm internally, it has several advantages over plain Metropolis:

  • It can sample from multimodal distributions.
  • It does not have a burn-in period. This is due to the re-weighting step: the weights are computed in such a way that S_w approximates not p(θ|y)_{β_i} but p(θ|y)_{β_{i+1}}, thus at each stage the MCMC chains start, approximately, from the correct tempered posterior distribution.
  • It can generate samples with low autocorrelation.
  • It can be used to approximate the marginal likelihood (see Chapter 5, Model Comparison); this is just a side effect of the SMC method and requires almost no additional computation (see the sketch after this list).
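The sketch below shows why this estimate is nearly free: each stage already computes the incremental weights (`log_w` in the SMC sketch above), and the mean unnormalized weight links the normalizing constants of two successive tempered posteriors, so log p(y) can be accumulated stage by stage:

```python
import numpy as np

def update_log_marginal(log_marginal, log_w):
    """Each stage contributes the log of the mean unnormalized
    incremental weight; the sum over stages estimates log p(y)."""
    m = log_w.max()
    return log_marginal + m + np.log(np.mean(np.exp(log_w - m)))

# Usage inside the SMC loop, right after computing log_w:
#     log_marginal = update_log_marginal(log_marginal, log_w)
# After the final stage, log_marginal approximates log p(y).
```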