How it works... 

Forward methods (both AIC and p-value based) start with a model that contains only an intercept. At each step, they try adding each regressor that is not yet in the model and keep the one that yields the lowest AIC (or has the lowest p-value). They repeat this until no remaining variable improves the criterion, or until every variable has been added.

Backward methods (both AIC and p-value based) start with the full model, containing all the variables, and try removing each variable in turn. They drop the variable whose removal yields the lowest AIC (or the variable with the highest p-value). This is repeated until no removal improves the criterion.

Stepwise methods combine both approaches. At each step, they consider adding the variable that yields the lowest AIC/lowest p-value, and dropping a variable already in the model if removing it lowers the AIC (or if its p-value is the highest and exceeds the threshold). This continues until no addition or removal improves the model.

These methods often work reasonably well in practice.
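The forward procedure described above can be sketched in a few lines. This is a minimal illustration in Python with NumPy only, not the implementation used by any particular package. The helper names `ols_aic` and `forward_select` are hypothetical, the data are simulated, and the AIC is computed for a Gaussian linear model up to an additive constant.

```python
import numpy as np

def ols_aic(X, y):
    """AIC of a least-squares fit, up to an additive constant (Gaussian errors assumed)."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    return n * np.log(rss / n) + 2 * k

def forward_select(X, y):
    """Greedy forward selection: add the regressor that lowers the AIC the most."""
    n, p = X.shape
    intercept = np.ones((n, 1))
    selected = []                          # column indices chosen so far
    best_aic = ols_aic(intercept, y)       # start from the intercept-only model
    while len(selected) < p:
        candidates = [j for j in range(p) if j not in selected]
        scores = {
            j: ols_aic(np.hstack([intercept, X[:, selected + [j]]]), y)
            for j in candidates
        }
        j_best = min(scores, key=scores.get)
        if scores[j_best] >= best_aic:     # no candidate improves the AIC: stop
            break
        selected.append(j_best)
        best_aic = scores[j_best]
    return selected, best_aic

# Simulated data (an assumption for illustration): only columns 0 and 2 matter.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(scale=0.5, size=n)

selected, aic = forward_select(X, y)
print("selected columns:", selected)
print("AIC of final model:", round(aic, 2))
```

A backward variant would start from all columns and remove, at each step, the column whose removal gives the lowest AIC, stopping when no removal helps.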

Nevertheless, there are two main problems with these approaches: 

  • There is no guarantee that the models they find make sense. There may be variables that should definitely be included, yet these algorithms might drop them. For example, when predicting house prices, we should always have the house size as an independent variable. A model that omitted that variable would make little sense.
  • The chosen models are judged only by their AIC or p-values; these techniques disregard the residuals. A model with a good AIC may still have residuals that show structure, are non-Gaussian, and so on.
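The second point can be illustrated with a small simulation. The sketch below (Python with NumPy, simulated data, all names hypothetical) fits a straight line to data that actually has a quadratic component. The linear model improves on the intercept-only model in terms of AIC, yet its residuals are strongly related to the squared regressor and are skewed, which the AIC alone does not reveal.

```python
import numpy as np

def ols_fit(X, y):
    """Return least-squares residuals and an AIC (Gaussian errors, up to a constant)."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    aic = n * np.log(float(np.sum(resid ** 2)) / n) + 2 * k
    return resid, aic

# Simulated data (an assumption for illustration): y depends on x and on x squared.
rng = np.random.default_rng(1)
n = 300
x = rng.uniform(-2, 2, size=n)
y = 1.0 + 0.5 * x + x ** 2 + rng.normal(scale=0.3, size=n)

_, aic_intercept = ols_fit(np.ones((n, 1)), y)
resid, aic_linear = ols_fit(np.column_stack([np.ones(n), x]), y)

# Residuals are uncorrelated with x by construction of least squares,
# but they can still carry structure the model has missed.
corr_quadratic = np.corrcoef(resid, x ** 2)[0, 1]

# Sample skewness of the residuals; values far from 0 suggest non-Gaussian errors.
z = (resid - resid.mean()) / resid.std()
skewness = float(np.mean(z ** 3))

print("AIC improves over intercept-only model:", aic_linear < aic_intercept)
print("correlation of residuals with x^2:", round(corr_quadratic, 3))
print("residual skewness:", round(skewness, 3))
```

In practice, a residual plot against the fitted values or a normal Q-Q plot would serve the same purpose as these two summary numbers.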