Dealing with multicollinearity

The modeler still wasn't sure that the model was robust enough. He remembered that he hadn't tested the model for the effects of multicollinearity. We touched on this phenomenon earlier, when we studied the correlation of the stock price with each of the eight independent variables proposed for the regression model. The multicollinearity test relies on two related diagnostics: tolerance and the variance inflation factor (VIF).

The PROC REG code for multicollinearity is as follows:

PROC REG DATA=build plots(only label)=(RStudentByLeverage CooksD);
ID date;
/* The TOL and VIF options request the multicollinearity diagnostics */
MODEL stock = basket_index -- m1_money_supply_index / tol vif;
RUN;
Figure 2.18: Partial output for multicollinearity

Tolerance is computed as 1 − R², where R² comes from regressing that predictor on all of the other predictors. When this R² is high, the tolerance is very low, and such low tolerance values are indicative of multicollinearity. The VIF is the inverse of tolerance; that is, 1/tolerance, so a high VIF isn't good for the model. A VIF above 10 is definitely a matter of concern, and some academics also consider values between 2 and 10 to be indicative of multicollinearity. In our model's case, we have very high VIF values. Multiple variables could be driving the high VIF, or even a single variable interacting with the other predictors might be responsible. The modeler decided to remove the variable representing inflation in the top 10 economies, since inflation is a factor that might be related to GDP growth, among other variables.
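The tolerance and VIF figures that PROC REG reports can be reproduced by hand: regress each predictor on the remaining predictors, take the R² of that auxiliary regression, and compute tolerance = 1 − R² and VIF = 1/tolerance. Below is a minimal sketch of this computation in Python with NumPy, using made-up data in which one predictor is nearly a linear copy of another (the variable names and data are illustrative assumptions, not the book's dataset):

```python
import numpy as np

def vif_and_tolerance(X):
    """For each column j of the predictor matrix X, regress it on the
    remaining columns by least squares, then report
    tolerance = 1 - R^2 and VIF = 1 / tolerance."""
    n, k = X.shape
    results = []
    for j in range(k):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        # Add an intercept column for the auxiliary regression
        A = np.column_stack([np.ones(n), others])
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        ss_res = resid @ resid
        ss_tot = ((y - y.mean()) ** 2).sum()
        r2 = 1.0 - ss_res / ss_tot
        tol = 1.0 - r2
        results.append((tol, 1.0 / tol))
    return results

# Hypothetical data: x2 is almost a linear copy of x1, so both get a high VIF
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.05, size=200)  # nearly collinear with x1
x3 = rng.normal(size=200)                   # independent predictor
X = np.column_stack([x1, x2, x3])

for name, (tol, vif) in zip(["x1", "x2", "x3"], vif_and_tolerance(X)):
    print(f"{name}: tolerance={tol:.4f}  VIF={vif:.1f}")
```

Running this, the two nearly collinear predictors show VIFs far above the threshold of 10, while the independent predictor stays close to 1, mirroring the pattern in the PROC REG output.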

The PROC REG code for multicollinearity, after the removal of a variable, is as follows:

PROC REG DATA=build plots(only label)=(RStudentByLeverage CooksD); 
ID date; 
MODEL stock = basket_index eps p_e_ratio global_mkt_share media_analytics_index m1_money_supply_index top_10_gdp/tol vif;
RUN; 
Figure 2.19: Partial regression output for multicollinearity after the removal of a variable

As you can see, the VIF values have dropped substantially after removing the single variable related to inflation in the top 10 economies. That variable was itself statistically significant, so where the model previously had six significant variables, it now has five. This shouldn't impact the model negatively: by reducing the multicollinearity, we have made the model much more stable.
