Stepwise regression method for variable selection

In the stepwise regression method, a base regression model is first fitted by ordinary least squares (OLS). Variables are then added to or removed from the base model one at a time, depending on the resulting value of the Akaike Information Criterion (AIC). AIC rewards goodness of fit and penalizes the number of parameters, so among the candidate models being compared, the one with the lowest AIC is preferred; this is a relative ranking, not a guarantee that the chosen model is the best possible one. Taking the Cars93_1.csv file, let's create a multiple linear regression model by using the stepwise method. The step function can search for the best model in three different ways:

  • Forward method
  • Backward method
  • Both

In the forward selection method, a null model (intercept only) is created first, and variables are added one at a time to see whether the AIC value improves. Independent variables keep being added to the null model as long as the AIC value improves. In the backward selection method, the complete model is taken as the base model. Each independent variable is removed from the model in turn and the AIC value is checked. If removing a variable improves the AIC, that variable is dropped; the process continues until no further removal lowers the AIC. In the both method, forward and backward steps alternate, so a variable dropped at one step can be added back at a later one, until the most relevant model is identified. Let's look at the model result at the first iteration:

> #base model
> fit<-lm(MPG.Overall~.,data=Cars93_1)
> #stepwise regression; step() sets the search mode with direction, it has no method argument
> model<-step(fit,direction="both")

Start: AIC=183.52
MPG.Overall ~ Price + EngineSize + Horsepower + RPM + Rev.per.mile +
    Fuel.tank.capacity + Length + Wheelbase + Width + Turn.circle +
    Rear.seat.room + Luggage.room + Weight

                     Df Sum of Sq    RSS    AIC
- Turn.circle         1     0.147 546.54 181.54
- Rear.seat.room      1     0.239 546.64 181.56
- Horsepower          1     0.323 546.72 181.57
- Price               1     6.055 552.45 182.43
- Width               1     6.185 552.58 182.45
- RPM                 1     6.686 553.08 182.52
- Length              1     6.695 553.09 182.52
- EngineSize          1     8.186 554.58 182.74
- Luggage.room        1     9.673 556.07 182.96
<none>                            546.40 183.52
- Wheelbase           1    35.799 582.20 186.73
- Rev.per.mile        1    40.565 586.96 187.40
- Fuel.tank.capacity  1    47.646 594.04 188.38
- Weight              1    63.400 609.80 190.53
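The AIC column in this table can be checked by hand. For a linear model, step() reports n * log(RSS / n) + 2 * p, where p is the number of estimated coefficients including the intercept. The sketch below redoes that arithmetic from the printed RSS values. Note that n = 82 is an assumption: the number of rows used in the fit is not stated in the text, and 82 is simply the value that makes the printed figures agree. The helper name aic_from_rss is hypothetical.

```r
# Hypothetical helper: AIC as step() reports it for an lm fit,
# n * log(RSS / n) + 2 * p, with p = number of coefficients (intercept included).
aic_from_rss <- function(rss, n, p) n * log(rss / n) + 2 * p

n <- 82  # ASSUMPTION: rows used in the fit; not stated in the text

# Full model: 13 predictors + intercept = 14 coefficients, RSS = 546.40
aic_full <- aic_from_rss(546.40, n, p = 14)

# After dropping Turn.circle: 13 coefficients, RSS = 546.54
aic_drop <- aic_from_rss(546.54, n, p = 13)

# Compare with the printed 183.52 and 181.54; small differences can arise
# because the RSS values in the printout are rounded to two decimals.
print(round(c(full = aic_full, drop_Turn.circle = aic_drop), 2))
```

Dropping Turn.circle barely changes the RSS but removes one coefficient from the penalty term, which is why the AIC falls by almost exactly 2.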

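The search starts at an AIC of 183.52. The step reproduced next has an AIC of 175.51 and still contains eight independent variables. Because the Luggage.room and Width rows sit above <none>, removing either of them lowers the AIC further, so the search continues past this step before it settles on the final model with six independent variables: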

Step: AIC=175.51
MPG.Overall ~ EngineSize + RPM + Rev.per.mile + Fuel.tank.capacity +
    Wheelbase + Width + Luggage.room + Weight

                     Df Sum of Sq    RSS    AIC
- Luggage.room        1     8.976 568.78 174.82
- Width               1    12.654 572.46 175.34
<none>                            559.81 175.51
- EngineSize          1    14.022 573.83 175.54
- RPM                 1    19.422 579.23 176.31
- Wheelbase           1    28.477 588.28 177.58
- Rev.per.mile        1    37.873 597.68 178.88
- Fuel.tank.capacity  1    52.516 612.32 180.86
- Weight              1   135.462 695.27 191.28

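In each table, the <none> row stands for the current model with nothing added or removed. Rows listed above it have a lower AIC, so making that change improves the model; rows below it make the model worse. The search stops once <none> has the lowest AIC in the table. At that stage no further tuning or variable reduction is worthwhile, and dropping more variables would not make sense.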
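The three search modes described earlier map onto the direction argument of step(). The following is a minimal sketch, not the book's own code: it uses R's built-in mtcars data as a stand-in for Cars93_1.csv (an assumption, so that the example runs without the file), and the object names null_fit, full_fit, fwd, bwd and both are illustrative.

```r
# ASSUMPTION: mtcars (shipped with R) stands in for the Cars93_1 data.
null_fit <- lm(mpg ~ 1, data = mtcars)  # intercept-only (null) model
full_fit <- lm(mpg ~ ., data = mtcars)  # model with every predictor

# Forward: start from the null model; scope names the largest model allowed.
fwd <- step(null_fit, scope = formula(full_fit),
            direction = "forward", trace = 0)

# Backward: start from the full model and remove terms one at a time.
bwd <- step(full_fit, direction = "backward", trace = 0)

# Both: start from the null model; at each step a term may be added or dropped.
both <- step(null_fit,
             scope = list(lower = ~ 1, upper = formula(full_fit)),
             direction = "both", trace = 0)

# AIC of each selected model, on the same scale that step() prints.
aic_by_mode <- sapply(list(forward = fwd, backward = bwd, both = both),
                      function(m) extractAIC(m)[2])
print(aic_by_mode)

# Formula chosen by backward elimination.
print(formula(bwd))
```

Setting trace = 0 suppresses the per-step tables; leave it at the default to see them. The path a search followed is stored in the anova component of the returned model, for example bwd$anova. The three modes need not arrive at the same model, which is one reason to compare their AIC values side by side.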
