Interpreting and visualizing the posterior

As we have already seen, we can explore the posterior using ArviZ functions like plot_trace and summary, or we can use our own functions. For a linear regression, it can be useful to plot the average line that fits the data, computed from the posterior mean values of α and β. To reflect the posterior's uncertainty, we can add semitransparent lines that have been sampled from the posterior:

plt.plot(x, y, 'C0.')

alpha_m = trace_g['α'].mean()
beta_m = trace_g['β'].mean()

# Plot every 10th posterior draw as a semitransparent line
draws = range(0, len(trace_g['α']), 10)
plt.plot(x, trace_g['α'][draws] + trace_g['β'][draws]
         * x[:, np.newaxis], c='gray', alpha=0.5)

# Plot the line given by the posterior means
plt.plot(x, alpha_m + beta_m * x, c='k',
         label=f'y = {alpha_m:.2f} + {beta_m:.2f} * x')

plt.xlabel('x')
plt.ylabel('y', rotation=0)
plt.legend()
Figure 3.5

Notice that the uncertainty is lower in the middle, although it is not reduced to a single point. That is, the posterior is compatible with lines that do not pass exactly through the mean of the data, as we have already mentioned.
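One way to see why the lines are tighter in the middle is to measure the spread of α + βx across draws at several values of x. The following is only a sketch: it does not use the trace from the model, but synthetic stand-in draws (alpha_draws, beta_draws, x_bar, and all the numbers are assumptions made up for illustration) in which every line passes close to the mean of the data with a slightly different slope.

```python
import random
import statistics

random.seed(0)

x_bar = 5.0  # assumed mean of the x data (illustrative value)

# Synthetic stand-ins for posterior draws, NOT the real trace_g:
# each line passes close to the point (x_bar, 10) with its own slope.
beta_draws = [random.gauss(2.0, 0.3) for _ in range(2000)]
alpha_draws = [10.0 + random.gauss(0.0, 0.1) - b * x_bar for b in beta_draws]

def line_spread(x):
    """Standard deviation of alpha + beta * x across the draws."""
    heights = [a + b * x for a, b in zip(alpha_draws, beta_draws)]
    return statistics.pstdev(heights)

for x in (0.0, 2.5, 5.0, 7.5, 10.0):
    print(f"x = {x:4.1f}  spread = {line_spread(x):.3f}")
```

With draws like these, the spread is smallest near x_bar and grows toward both ends, but it never reaches zero, which matches the shape seen in the plot.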

Having the semitransparent lines is perfectly fine, but we may want to add a cool-factor to the plot and instead use a semitransparent band to illustrate the Highest-Posterior Density (HPD) interval of μ. In fact, this was the main reason we defined the deterministic variable μ in the model. Having this variable simplifies the following code:

plt.plot(x, alpha_m + beta_m * x, c='k',
         label=f'y = {alpha_m:.2f} + {beta_m:.2f} * x')

sig = az.plot_hpd(x, trace_g['μ'], credible_interval=0.98, color='k')

plt.xlabel('x')
plt.ylabel('y', rotation=0)
plt.legend()
Figure 3.6
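
To make the idea of an HPD interval more concrete, here is a minimal sketch of how one could be computed from a list of samples: sort them and keep the narrowest window that holds the requested fraction of points. The function hpd_interval is a hypothetical helper written for this illustration, not the routine ArviZ uses internally, and the exponential draws are a toy sample.

```python
import math
import random

def hpd_interval(samples, mass=0.94):
    """Return the narrowest interval containing `mass` of the samples."""
    ordered = sorted(samples)
    n = len(ordered)
    if n == 0:
        raise ValueError("samples must not be empty")
    k = max(1, math.ceil(mass * n))  # number of points the interval must hold
    # Slide a window of k consecutive sorted points and keep the narrowest one.
    best_start = min(range(n - k + 1),
                     key=lambda i: ordered[i + k - 1] - ordered[i])
    return ordered[best_start], ordered[best_start + k - 1]

random.seed(1)
draws = [random.expovariate(1.0) for _ in range(5000)]  # skewed toy sample
low, high = hpd_interval(draws, mass=0.94)
print(f"94% HPD: [{low:.2f}, {high:.2f}]")
```

For a skewed sample such as this one, the narrowest window sits against the high-density side instead of cutting equal tails, which is what distinguishes an HPD interval from an equal-tailed one.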

One more option is to plot the HPD (for example, 94% and 50%) of the predicted data ŷ, that is, where we expect to see 94% and 50% of future data according to our model. For Figure 3.7, we are going to use a darker gray for the HPD 50 and a lighter gray for the HPD 94.

Getting the posterior predictive samples is easy in PyMC3 using the sample_posterior_predictive() function:

ppc = pm.sample_posterior_predictive(trace_g,
                                     samples=2000,
                                     model=model_g)
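
Conceptually, posterior predictive sampling takes each posterior draw of the parameters and simulates new data from the likelihood. The sketch below illustrates that idea with plain Python and made-up draws. It assumes a Normal likelihood with mean α + βx and standard deviation eps; the names posterior_draws and simulate_predictions, and all the numbers, are hypothetical and are not taken from the model or from PyMC3.

```python
import random

random.seed(2)

x = [0.0, 1.0, 2.0, 3.0]  # toy predictor values

# Hypothetical posterior draws of (alpha, beta, eps); in a real workflow
# these would come from the trace instead of being made up here.
posterior_draws = [(random.gauss(2.0, 0.1),
                    random.gauss(1.0, 0.05),
                    abs(random.gauss(0.5, 0.05))) for _ in range(1000)]

def simulate_predictions(draws, x_values):
    """For each draw, simulate one y per x from Normal(alpha + beta * x, eps)."""
    return [[random.gauss(alpha + beta * xi, eps) for xi in x_values]
            for alpha, beta, eps in draws]

y_pred = simulate_predictions(posterior_draws, x)
print(len(y_pred), len(y_pred[0]))  # one row per draw, one column per x
```

The result has one row per posterior draw and one column per value of x, so each column can be summarized with an interval to obtain a band of predicted data.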

Now, we can plot the results:

plt.plot(x, y, 'b.')
plt.plot(x, alpha_m + beta_m * x, c='k',
         label=f'y = {alpha_m:.2f} + {beta_m:.2f} * x')

az.plot_hpd(x, ppc['y_pred'], credible_interval=0.5, color='gray')
az.plot_hpd(x, ppc['y_pred'], color='gray')

plt.xlabel('x')
plt.ylabel('y', rotation=0)
Figure 3.7

The function az.plot_hpd is a helper function that we can use to plot an HPD interval for linear regressions. By default, this function smooths the interval. Try passing the argument smooth=False to see what I mean.
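
To get a feeling for what smoothing does to the edge of a band, here is a small sketch using a centered moving average. This is only a stand-in chosen for simplicity; ArviZ has its own smoothing routine, and moving_average and the toy values below are assumptions made for this illustration.

```python
def moving_average(values, window=5):
    """Smooth a sequence with a centered moving average.

    The window shrinks near the edges so the output keeps the input length.
    """
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed

# A jagged toy upper bound, as an unsmoothed interval edge might look.
upper = [3.0, 3.4, 2.9, 3.6, 3.1, 3.8, 3.3, 4.0]
print(moving_average(upper, window=3))
```

The smoothed values follow the same overall trend with less point-to-point jitter, which is the kind of difference to look for when comparing the plot with and without smooth=False.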
