Log-likelihood and deviance

An intuitive way of measuring how well a model fits the data is to compute the quadratic mean error between the data and the predictions made by the model:

$$\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \operatorname{E}(y_i \mid \theta)\right)^2$$

$\operatorname{E}(y_i \mid \theta)$ is just the predicted value, given the estimated parameters.

Note that this is essentially the average of the squared differences between the observed and predicted data. Squaring the errors ensures that positive and negative differences do not cancel out, and it emphasizes large errors more than alternatives such as taking the absolute value of the differences. A more general measure is to compute the log-likelihood:

$$\sum_{i=1}^{n} \log p(y_i \mid \theta)$$
When the likelihood is normal, the log-likelihood turns out to be proportional to the quadratic mean error, up to an additive constant.
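As a quick numerical check of this relationship, here is a minimal sketch assuming a normal likelihood with a fixed standard deviation of 1; the data `y_obs` and predictions `y_pred` are illustrative stand-ins, not from any particular model:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Illustrative data: 100 observations and a model that predicts their mean (0)
y_obs = rng.normal(loc=0.0, scale=1.0, size=100)
y_pred = np.zeros_like(y_obs)

# Quadratic mean error (mean squared error)
mse = np.mean((y_obs - y_pred) ** 2)

# Log-likelihood under a normal likelihood with a fixed standard deviation of 1
log_lik = stats.norm.logpdf(y_obs, loc=y_pred, scale=1.0).sum()

# For a normal likelihood with fixed sd, the log-likelihood is an affine
# function of the mse: log_lik = -n/2 * log(2*pi*sd^2) - n * mse / (2*sd^2)
n = len(y_obs)
print(np.allclose(log_lik, -0.5 * n * np.log(2 * np.pi) - 0.5 * n * mse))  # True
```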

In practice, and for historical reasons, people usually do not use the log-likelihood directly; instead, they use a quantity known as the deviance:

$$D(\theta) = -2 \sum_{i=1}^{n} \log p(y_i \mid \theta)$$
The deviance is used by Bayesians and non-Bayesians alike; the difference is that under a Bayesian framework, $\theta$ is estimated from the posterior and, like any quantity derived from a posterior, the deviance has a distribution (see the sketch after the following list). In contrast, in non-Bayesian settings, $\theta$ is a point estimate. To learn how to use the deviance, we should note two key aspects of this quantity:

  • The lower the deviance, the higher the log-likelihood and the better the agreement between the model's predictions and the data. Therefore, we want low deviance values.
  • The deviance measures the within-sample accuracy of the model, and hence more complex models will generally have a lower deviance than simpler ones. Thus, we need to somehow include a penalization term for model complexity.
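To make the Bayesian reading of the deviance concrete, here is a minimal sketch that computes one deviance value per posterior draw, so the deviance itself ends up with a distribution. The posterior samples `theta_post` are hypothetical stand-ins for draws you would obtain from a real sampler, and a normal likelihood with known standard deviation is assumed:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Illustrative data: 50 observations from a normal with unknown mean theta
y_obs = rng.normal(loc=1.0, scale=1.0, size=50)

# Hypothetical posterior draws for theta (stand-ins for real MCMC samples)
theta_post = rng.normal(loc=y_obs.mean(),
                        scale=1.0 / np.sqrt(len(y_obs)),
                        size=2000)

# D(theta) = -2 * sum_i log p(y_i | theta): one deviance value per draw,
# so under a Bayesian framework the deviance itself has a distribution
deviance = np.array([
    -2 * stats.norm.logpdf(y_obs, loc=theta, scale=1.0).sum()
    for theta in theta_post
])

print(deviance.mean(), deviance.std())
```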

In the following sections, we will learn about different information criteria. They all combine the deviance with a penalization term; what makes them different is how each of these terms is computed.
