Summarizing the posterior

Generally, the first task we will perform after sampling from the posterior is to check what the results look like. The plot_trace function from ArviZ is ideally suited to this task:

az.plot_trace(trace)

Figure 2.1

By using az.plot_trace, we get two subplots for each unobserved variable. The only unobserved variable in our model is θ. Notice that y is an observed variable representing the data; we do not need to sample it because we already know those values. Thus, in Figure 2.1, we have two subplots. On the left, we have a Kernel Density Estimation (KDE) plot, which is like a smoothed version of a histogram. On the right, we get the individual sampled values at each step during the sampling. From the trace plot, we can visually identify the range of plausible values of θ under the posterior. You should compare this result obtained with PyMC3 with those from the previous chapter, which were obtained analytically.
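To see why a KDE reads as a smoothed histogram, the sketch below builds both from the same samples and compares them with an exact density, mimicking the comparison with the analytical result suggested above. It is an illustration, not ArviZ's internal KDE: it uses SciPy's gaussian_kde instead. The samples are a hypothetical stand-in for a PyMC3 trace, drawn directly from a Beta(2, 4) distribution (the analytical posterior for a Beta(1, 1) prior and 1 success in 4 trials, an assumption chosen only for illustration).

```python
import numpy as np
from scipy import stats

# Hypothetical stand-in for posterior samples of theta: draws from the
# analytical Beta(2, 4) posterior (Beta(1, 1) prior, 1 success in 4 trials).
posterior = stats.beta(2, 4)
rng = np.random.default_rng(42)
samples = posterior.rvs(size=2000, random_state=rng)

# Histogram: bin counts normalized so that the bars integrate to 1.
hist, edges = np.histogram(samples, bins=20, range=(0, 1), density=True)

# KDE: a Gaussian kernel centered on every sample, summed and normalized.
# The result is a smooth curve instead of a step function.
kde = stats.gaussian_kde(samples)
grid = np.linspace(0, 1, 201)
kde_density = kde(grid)

# Exact posterior density, playing the role of the analytical result.
exact_density = posterior.pdf(grid)

max_abs_diff = np.max(np.abs(kde_density - exact_density))
print(f"largest |KDE - exact| on the grid: {max_abs_diff:.3f}")
```

The largest discrepancies tend to appear near the boundary at 0, where a Gaussian kernel spills probability mass outside the support of θ.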

ArviZ provides several other plots to help interpret the trace, and we will see them in the following pages. We may also want to have a numerical summary of the trace. We can get that using az.summary, which will return a pandas DataFrame:

az.summary(trace)

     mean   sd     mc error   hpd 3%   hpd 97%   eff_n   r_hat
θ    0.33   0.18   0.0        0.02     0.64      847.0   1.0

We get the mean, standard deviation (sd), and 94% HPD interval (hpd 3% and hpd 97%). As we discussed in Chapter 1, Thinking Probabilistically, we can use these numbers to interpret and report the results of a Bayesian inference. The last two metrics (eff_n and r_hat) are related to diagnosing the samples. For details, see Chapter 8, Inference Engines.
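To make the first columns of az.summary less of a black box, here is a minimal sketch that recomputes the mean, sd, and a 94% HPD from raw samples with NumPy. The samples are a hypothetical stand-in drawn from a Beta(2, 4) distribution, chosen only because its mean (1/3) and sd (about 0.18) resemble the table. The hpd helper is our own; it follows the common sorted-samples approach for unimodal posteriors and is not necessarily ArviZ's exact code.

```python
import numpy as np
from scipy import stats

def hpd(samples, cred_mass=0.94):
    """Return the narrowest interval containing `cred_mass` of the samples.

    Sketch for a unimodal posterior: sort the draws, slide a window that
    spans a fixed number of them, and keep the shortest window.
    """
    sorted_s = np.sort(np.asarray(samples))
    n = len(sorted_s)
    n_in = int(np.floor(cred_mass * n))  # index offset spanned by the window
    widths = sorted_s[n_in:] - sorted_s[: n - n_in]
    start = int(np.argmin(widths))
    return sorted_s[start], sorted_s[start + n_in]

# Hypothetical posterior samples for theta (Beta(2, 4) stand-in).
rng = np.random.default_rng(0)
samples = stats.beta(2, 4).rvs(size=4000, random_state=rng)

mean = samples.mean()
sd = samples.std(ddof=1)
low, high = hpd(samples, cred_mass=0.94)

# The 94% HPD bounds are labeled "hpd 3%" and "hpd 97%", but for a skewed
# posterior they are not the 3rd and 97th percentiles.
eq_low, eq_high = np.percentile(samples, [3, 97])

print(f"mean={mean:.2f}  sd={sd:.2f}  hpd 3%={low:.2f}  hpd 97%={high:.2f}")
print(f"equal-tailed 94% interval: [{eq_low:.2f}, {eq_high:.2f}]")
```

Because this posterior is skewed to the right, the HPD is shifted toward lower values of θ than the equal-tailed interval, which is exactly what makes it the narrowest interval with the requested probability.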

Another way to visually summarize the posterior is to use the plot_posterior function that comes with ArviZ. We already used this function in the previous chapter for a fake posterior; we are going to use it now for a real posterior. By default, plot_posterior shows a histogram for discrete variables and a KDE for continuous variables. We also get the mean of the distribution (we can ask for the median or mode using the point_estimate argument) and the 94% HPD as a black line at the bottom of the plot. A different probability for the HPD can be set with the credible_interval argument. This type of plot was introduced by John K. Kruschke in his great book Doing Bayesian Data Analysis:
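The point_estimate argument chooses among three summaries of the same samples. As a rough sketch of what each one means (not ArviZ's implementation), the code below computes the mean, the median, and a KDE-based mode from hypothetical Beta(2, 4) samples, whose exact values are a mode of 0.25, a median of about 0.31, and a mean of 1/3.

```python
import numpy as np
from scipy import stats

# Hypothetical posterior samples for theta: a Beta(2, 4) stand-in.
rng = np.random.default_rng(1)
samples = stats.beta(2, 4).rvs(size=20000, random_state=rng)

# Mean and median come straight from the samples.
mean = samples.mean()
median = np.median(samples)

# For a continuous variable, the mode is estimated as the peak of a KDE.
grid = np.linspace(0, 1, 1001)
density = stats.gaussian_kde(samples)(grid)
mode = grid[np.argmax(density)]

# For a right-skewed posterior like this one, expect mode < median < mean.
print(f"mean={mean:.3f}  median={median:.3f}  mode={mode:.3f}")
```

With ArviZ itself, a call such as az.plot_posterior(trace, point_estimate='median', credible_interval=0.9) should switch both settings; note that later ArviZ releases renamed credible_interval to hdi_prob, so check the version you have installed.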

az.plot_posterior(trace)

Figure 2.2
