WAIC in depth

If we expand equation 5.6, we get the following:

$$WAIC = -2 \sum_{i=1}^{n} \log \left( \frac{1}{S} \sum_{s=1}^{S} p(y_i \mid \theta^{s}) \right) + 2 \sum_{i=1}^{n} \operatorname{Var}_{s=1}^{S} \left( \log p(y_i \mid \theta^{s}) \right)$$

Both terms in this expression look very similar. The first one, the lppd (log pointwise predictive density), computes the mean likelihood over the posterior samples; we do this for each data point, then take the logarithm and sum over all data points. Please compare this term with equations 5.3 and 5.4. This is just what we call the deviance, but computed taking the posterior into account. Thus, if we accept that computing the log-likelihood is a good way to measure the appropriateness of a model's fit, then computing it from the posterior is a logical path for a Bayesian approach.

As we already said, the lppd of the observed data is an overestimate of the lppd for future data, so we introduce a second term to correct the overestimation. The second term computes the variance of the log-likelihood over the posterior samples; we do this for each data point and then sum over all data points. Why does the variance give a penalization term? The intuition is similar to that of the Bayes factor's built-in Occam's razor: the larger the effective number of parameters, the greater the spread of the posterior will be. When we add structure to a model, such as informative/regularizing priors or hierarchical dependencies, we restrict the posterior and thus decrease the effective number of parameters compared with a similar unregularized or less structured model.
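The two-term computation described above can be sketched directly with NumPy. This is a minimal illustration, not the book's code: the Gaussian model, the synthetic data, and all variable names (`y`, `theta`, `log_lik`) are assumptions made here to keep the example self-contained. The key object is a matrix of per-sample, per-point log-likelihoods, from which both the lppd and the variance penalty follow:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: n observations and S posterior samples of a Gaussian
# mean with known unit variance. log_lik[s, i] = log p(y_i | theta_s).
y = rng.normal(loc=1.0, scale=1.0, size=20)        # observed data (n points)
theta = rng.normal(loc=1.0, scale=0.2, size=1000)  # posterior samples of the mean
log_lik = -0.5 * np.log(2 * np.pi) - 0.5 * (y[None, :] - theta[:, None]) ** 2

# lppd: for each data point, average the *likelihood* (not the log-likelihood)
# over the posterior samples, then take the log and sum over data points.
lppd = np.sum(np.log(np.exp(log_lik).mean(axis=0)))

# Penalty term: the variance of the *log-likelihood* over the posterior
# samples, summed over data points (the effective number of parameters).
p_waic = np.sum(np.var(log_lik, axis=0, ddof=1))

waic = -2 * (lppd - p_waic)  # on the deviance scale
```

In practice, exponentiating very negative log-likelihoods can underflow, so production implementations compute the inner average with a log-sum-exp trick rather than `np.exp(...).mean(...)` as done in this sketch.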
