Convergence

An MCMC sampler, such as NUTS or Metropolis, can take some time before it converges, that is, before it starts sampling from the correct distribution. As we previously explained, MCMC methods come with theoretical guarantees of convergence under very general conditions and an infinite number of samples. Unfortunately, in practice we can only get a finite sample, so we must rely instead on empirical tests. At best, these tests provide hints or warnings that something bad could be happening when they fail, but they do not guarantee that everything is OK when they pass.

One way to visually check for convergence is to run the ArviZ plot_trace function and inspect the result. To better understand what we should look for when inspecting these plots, let's compare the results for the two previously defined models (see Figures 8.7 and 8.8):

az.plot_trace(trace_cm, var_names=['a'], divergences='top')

Figure 8.7
az.plot_trace(trace_ncm, var_names=['a'])

Figure 8.8

Notice how the KDE in Figure 8.8 is smoother than the one in Figure 8.7; a smooth KDE is a good sign, while an uneven KDE may indicate a problem, such as the need for more samples or something more serious. The trace itself (the plot on the right) should look like white noise, meaning we should not see any recognizable pattern; we want a curve freely meandering around, like the trace in Figure 8.8. When this happens, we say we have good mixing. Figure 8.7, instead, is an example of pathological behavior; if you compare it carefully with Figure 8.8, you will notice that the overlap of the two chains is larger for 8.8 than for 8.7, and you will also notice something fishy going on at several regions along the trace in 8.7. The clearest one is in the region between 500 and 1,000 draws, where one of the chains (the blue one) got stuck (it is basically a horizontal line).

This is really, really, really bad. The sampler is rejecting all new proposals except those in the very close neighborhood of the current value; in other words, it is sampling very slowly and thus very inefficiently. With infinite samples this would not be problematic, but with a finite sample it introduces a bias in the result. A simple fix is to take more samples, but this only helps if the bias is small; otherwise, the number of samples required to compensate for the bias grows so fast that this simple fix becomes useless.

Figure 8.9 shows some additional examples of traces with good mixing (on the right) and bad mixing (on the left). When there is more than one region of high probability, such as with discrete variables or multimodal distributions, we do not want the trace to spend a long time in one value or region before moving on; we want it to jump from one region to the other with ease:

Figure 8.9
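
To get a feel for the difference, the following hypothetical sketch (not generated with the book's models) simulates a well-mixed, white-noise-like trace and a highly autocorrelated one that wanders, or even gets stuck, for long stretches:

import numpy as np
import matplotlib.pyplot as plt

# Hypothetical traces: 'good' behaves like white noise, while 'bad' is an AR(1)
# process with a coefficient close to 1, so each draw barely moves away from
# the previous one.
rng = np.random.default_rng(42)
n_draws = 1000
good = rng.normal(size=n_draws)
bad = np.zeros(n_draws)
for i in range(1, n_draws):
    bad[i] = 0.99 * bad[i - 1] + 0.1 * rng.normal()

_, axes = plt.subplots(2, 1, sharex=True)
axes[0].plot(good)
axes[0].set_title('good mixing')
axes[1].plot(bad)
axes[1].set_title('bad mixing')
plt.show()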

Another feature of a good MCMC sample is an auto-similar trace: the first 10% (or so) of the trace should look similar to other portions, such as the last 50% or the last 10%. Once again, we do not want patterns; we expect something noisy. This can also be checked with az.plot_trace. If the first part of the trace looks different from the rest, this is an indication that we need burn-in or a larger number of samples. If we see a lack of auto-similarity in other parts, or we see a pattern, this could mean we need more draws, but more often than not we should try a different parameterization. For difficult models, we may even need to apply a combination of all of these strategies.
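
As a rough, hypothetical sketch of this check (the array of draws below is synthetic, standing in for the draws of one chain), we can compare summaries of the first 10% of a trace with summaries of the last 50%; for a well-behaved trace they should be close:

import numpy as np

rng = np.random.default_rng(0)
draws = rng.normal(size=2000)  # stand-in for the draws of one chain

def segment_summary(d, start, stop):
    # mean and standard deviation of the draws between two relative positions
    seg = d[int(start * len(d)):int(stop * len(d))]
    return seg.mean(), seg.std()

print('first 10%:', segment_summary(draws, 0.0, 0.1))
print('last 50%: ', segment_summary(draws, 0.5, 1.0))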

By default, PyMC3 will try to run independent chains in parallel (the exact number depends on the number of available processors). The number of chains is specified with the chains argument of the pm.sample function. We can use the plot_trace or plot_forest ArviZ functions to visually inspect whether the parallel chains are similar to each other. The chains are then combined into a single sample for inference, so running chains in parallel is not a waste of resources.
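
As a minimal sketch of this workflow (a toy model, not the centered or non-centered models from the book), we can request several chains explicitly and then compare them visually:

import pymc3 as pm
import arviz as az

with pm.Model() as toy_model:
    a = pm.Normal('a', 0., 10.)
    toy_trace = pm.sample(1000, chains=4)  # four independent chains

# the chains should look alike; large discrepancies between them are a warning sign
az.plot_trace(toy_trace)
az.plot_forest(toy_trace, var_names=['a'])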

A quantitative way of comparing independent chains is the Rhat statistic. The idea of this test is to compare the variance between chains with the variance within chains: ideally, we expect a value of 1 and, as an empirical rule, we will be OK with values below 1.1; higher values signal a lack of convergence. We can compute it using the az.r_hat function; we just need to pass a PyMC3 trace object. The Rhat diagnostic is also computed by default by the az.summary function (as you may recall from previous chapters) and optionally by az.plot_forest (using the r_hat=True argument), as we can see in the following examples:

az.plot_forest(trace_cm, var_names=['a'], r_hat=True, eff_n=True)

Figure 8.10

And also for az.summary:

summaries = pd.concat([az.summary(trace_cm, var_names=['a']),
                       az.summary(trace_ncm, var_names=['a'])])
summaries.index = ['centered', 'non_centered']
summaries
              mean    sd  mc error  hpd 3%  hpd 97%   eff_n  r_hat
centered      8.53  5.84      0.58    1.53    18.87    49.0   1.04
non_centered  7.92  6.01      0.04    0.01    18.48  3817.0   1.00
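
To make the idea behind Rhat more concrete, here is a minimal sketch of the classic Gelman-Rubin computation (not ArviZ's exact implementation) on a hypothetical array of draws with one row per chain:

import numpy as np

rng = np.random.default_rng(0)
chains = rng.normal(size=(4, 1000))    # stand-in for 4 chains of draws of 'a'

n = chains.shape[1]                    # draws per chain
chain_means = chains.mean(axis=1)
W = chains.var(axis=1, ddof=1).mean()  # mean within-chain variance
B = n * chain_means.var(ddof=1)        # between-chain variance
var_hat = (n - 1) / n * W + B / n      # pooled estimate of the posterior variance
r_hat = np.sqrt(var_hat / W)           # close to 1 when the chains agree
print(round(r_hat, 3))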