Bayesian variable selection

Variable selection within a classical context is usually simple. It boils down to selecting an appropriate metric (such as the AIC or p-values) and evaluating models in a greedy way: starting with either a simple (or complex) model and seeing what happens when we add (or remove) terms.

In a Bayesian context, things are not that easy, since we are not treating parameters as fixed values: we are estimating a posterior density for each of them. A density has no p-value attached, so we can no longer remove terms based on significance. The AIC route does not work either, because we do not have a single AIC value, but a distribution of possible AIC values.

Clearly, we need a different way of doing variable selection, one that takes into account that we are dealing with densities. Kuo and Mallick (https://www.jstor.org/stable/25053023?seq=1#page_scan_tab_contents) proposed a simple and powerful way of doing variable selection in a proper Bayesian context. The idea is to multiply each variable by an indicator variable that can take only the values zero or one. Each indicator variable is distributed as a Bernoulli random variable with probability q:
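The equation itself is not reproduced in this excerpt; a plausible rendering of the model it describes, written with generic symbols (z_j for the indicators, β_j for the coefficients, q for the inclusion probability), is:

    z_j ~ Bernoulli(q)
    μ = α + Σ_j z_j β_j x_j

That is, each coefficient β_j enters the linear predictor only when its indicator z_j equals one; when z_j is zero, the variable x_j is effectively dropped from the model.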

Additionally, we can place a prior on the q parameter that reflects how many of the variables we expect to be relevant. If we expect most of them to matter, then, in general, q should be large. If a variable is not relevant, the posterior of its indicator variable will concentrate near zero (meaning that the associated Bernoulli variable will generally be equal to 0). In other words, the model learns that the variable could be multiplied by zero and little would change.
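As an illustration of how this could be set up in practice, here is a minimal sketch using PyMC (v4+ API). The data, priors, and variable names are illustrative assumptions, not taken from the original text:

import numpy as np
import pymc as pm

# Synthetic data: only the first two of four predictors actually matter.
rng = np.random.default_rng(42)
N, K = 200, 4
X = rng.normal(size=(N, K))
y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=1.0, size=N)

with pm.Model() as kuo_mallick:
    # Prior inclusion probability; the Beta prior encodes how many
    # predictors we expect to be relevant.
    q = pm.Beta("q", alpha=1.0, beta=1.0)
    # One Bernoulli indicator per predictor.
    ind = pm.Bernoulli("ind", p=q, shape=K)
    # Regression coefficients, intercept, and noise scale.
    beta = pm.Normal("beta", mu=0.0, sigma=10.0, shape=K)
    alpha = pm.Normal("alpha", mu=0.0, sigma=10.0)
    sigma = pm.HalfNormal("sigma", sigma=5.0)
    # Each coefficient is switched on or off by its indicator.
    mu = alpha + pm.math.dot(X, ind * beta)
    pm.Normal("y_obs", mu=mu, sigma=sigma, observed=y)
    idata = pm.sample(1000, tune=1000)

# The posterior mean of each indicator approximates its inclusion probability;
# values near zero suggest the corresponding variable can be dropped.
print(idata.posterior["ind"].mean(dim=("chain", "draw")).values)

Because the indicators are discrete, PyMC automatically assigns them a Metropolis-type step while sampling the continuous parameters with NUTS; the posterior means of the indicators are then read off as approximate inclusion probabilities.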
