Model comparison with PyMC3

Model comparison with ArviZ is more easily done than said!

import arviz as az

waic_l = az.waic(trace_l)  # compute WAIC from the trace of the linear model
waic_l

     waic        waic_se    p_waic    warning
0    28.750381   5.303983   2.443984  0

If you want to compute LOO instead of WAIC, you must use az.loo.
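For example, a minimal sketch reusing the same trace as above (loo_l is just an illustrative name):

loo_l = az.loo(trace_l)  # Pareto-smoothed importance sampling LOO for the linear model
loo_l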

For WAIC and LOO, PyMC3 reports four values (see the sketch after this list for how they are computed):

  • A point estimate
  • The standard error of the point estimate (this is computed by assuming normality and hence it may not be very reliable when the sample size is low)
  • The effective number of parameters
  • A warning (read the A note on the reliability of WAIC and LOO computations section for more details)
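To make these four quantities concrete, here is a minimal NumPy sketch of the usual WAIC formulas on the deviance scale (lower is better). This is only an illustration, not ArviZ's exact implementation; log_lik is a hypothetical array of pointwise log-likelihood samples with shape (n_samples, n_observations):

import numpy as np
from scipy.special import logsumexp

def waic_from_log_lik(log_lik):
    n_samples, n_obs = log_lik.shape
    # log pointwise predictive density: average the likelihood over posterior samples
    lppd_i = logsumexp(log_lik, axis=0) - np.log(n_samples)
    # effective number of parameters: posterior variance of the pointwise log-likelihood
    p_waic_i = np.var(log_lik, axis=0, ddof=1)
    # pointwise WAIC on the deviance scale
    waic_i = -2 * (lppd_i - p_waic_i)
    # point estimate, its standard error (assuming normality), and the effective number of parameters
    return waic_i.sum(), np.sqrt(n_obs * np.var(waic_i)), p_waic_i.sum()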

Since the values of WAIC/LOO are always interpreted in a relative fashion, that is, by comparing them across models, ArviZ provides two auxiliary functions to ease the comparison. The first one is az.compare:

cmp_df = az.compare({'model_l': trace_l, 'model_p': trace_p},
                    method='BB-pseudo-BMA')
cmp_df
     waic    pwaic   dwaic   weight   se     dse    warning
1    9.07    2.59    0       1        5.11   0      0
2    28.75   2.44    19.68   0        4.51   5.32   0

 

We have many columns, so let's go through their meanings one by one:

  1. The first column clearly contains the values of WAIC. The DataFrame is always sorted from the lowest to the highest WAIC. The index reflects the order in which the models are passed to this function.
  2. The second column is the estimated effective number of parameters. In general, models with more parameters will be more flexible to fit data and at the same time could also lead to overfitting. Thus, we can interpret pWAIC as a penalization term. Intuitively, we can also interpret it as a measure of how flexible each model is in fitting the data.
  3. The third column is the relative difference between the value of WAIC for the top-ranked model and the value of WAIC for each model. For this reason, we will always get a value of 0 for the first model.
  4. Sometimes, when comparing models, we do not want to select the best model. Instead, we want to perform predictions by averaging over all the models (or at least several models). Ideally, we would like to perform a weighted average, giving more weight to the model that seems to explain/predict the data better. There are many approaches to perform this task. One of them is to use Akaike weights based on the values of WAIC for each model. These weights can be loosely interpreted as the probability of each model (among the compared models), given the data. One caveat of this approach is that the weights are based on point estimates of WAIC (that is, the uncertainty is ignored). A small numerical sketch of these weights appears after this list.
  5. The fifth column records the standard error for the WAIC computations. The standard error can be useful for assessing the uncertainty of the WAIC estimates. 
  6. In the same way that we can compute the standard error for each value of WAIC, we can compute the standard error of the differences between two values of WAIC. Notice that both quantities are not necessarily the same. The reason for this is that the uncertainty about WAIC is correlated between models. This quantity is always 0 for the top-ranked model.
  7. Finally, we have the last column, named warning. A value of 1 indicates that the computation of WAIC may not be reliable. Read the A note on the reliability of WAIC and LOO computations section for further details.
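As anticipated in point 4, here is a minimal sketch of plain Akaike-type weights computed from the WAIC values in the table above. Note that ArviZ's 'BB-pseudo-BMA' method also propagates the uncertainty in WAIC using a Bayesian bootstrap, so its weights will not match this simple calculation exactly:

import numpy as np

waics = np.array([9.07, 28.75])      # WAIC values from the comparison table above
d_waic = waics - waics.min()         # differences with respect to the top-ranked model
weights = np.exp(-0.5 * d_waic)
weights /= weights.sum()             # normalize; gives roughly [1.0, 5e-5], matching the table
weights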

We can also get similar information as a visualization by using the az.plot_compare function. This second convenience function takes the output of az.compare and produces a summary plot in the style of the one used in the book Statistical Rethinking by Richard McElreath:

az.plot_compare(cmp_df)

Figure 5.8

Let me describe Figure 5.8 in detail:

  • The empty circle represents the values of WAIC and the black error bars associated with them are the values of the standard error of WAIC.
  • The value of the lowest WAIC is also indicated with a vertical dashed grey line to ease comparison with other WAIC values.
  • The filled-in black dots are the in-sample deviance of each model, which for WAIC is 2 pWAIC below the corresponding WAIC value.
  • For all models except the top-ranked one, we also get a triangle, indicating the value of the difference of WAIC between that model and the top model, and a grey error bar indicating the standard error of the differences between the top-ranked WAIC and WAIC for each model.

The simplest way to use information criteria is to perform model selection: simply choose the model with the lowest information criterion value and forget about any other model. Under this interpretation, the choice is very easy: the quadratic model is the best. Notice that the standard errors do not overlap, which gives us some confidence in making this choice. If the standard errors did overlap, we should provide a more nuanced answer.
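One way to make this check numerical, assuming cmp_df has the dwaic and dse columns shown in the table above (column names vary across ArviZ versions), is to compare the WAIC difference against the standard error of that difference:

lower_ranked = cmp_df.iloc[1:]                 # every model except the top-ranked one
lower_ranked['dwaic'] / lower_ranked['dse']    # 19.68 / 5.32, roughly 3.7, well above the noise level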
