Forward selection model

In Figure 5.9, we can see that 75 observations were read, but only 65 were used. This highlights the fact that we entered null values for CPI for the six months for which we want to generate the forecast. The first variable entered in the model was the spending to saving ratio, calculated on a quarterly basis.

The variable is significant with a p-value of <0.0001:

Figure 5.9: Forward selection regression: step 1

The second variable entered in the model is the category of clothing and shopping spend by the bank's customers. The variable is again deemed significant and has a p-value of <0.0001 (Figure 5.10):

Figure 5.10: Forward selection regression: step 2

In Figure 5.11, we can see that the third variable of entertainment spending is entered and that too is significant with p<0.0001:

Figure 5.11: Forward selection regression: step 3

In Figure 5.12, we can see that the fourth variable of entertainment spending is entered and that too is significant with p<0.0001:

Figure 5.12: Forward selection regression: step 4

In the final step of the forward selection model (Figure 5.13), we can review the statistics for entry of the five variables that did not make it into the model summary. These variables were deemed insignificant, so after four variables had been introduced in the model, no other variables were retained:

Figure 5.13: Forward selection regression: step 5–model summary
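The entry-and-stop logic behind these steps can be sketched in a few lines of Python. This is a minimal illustration, not the procedure that produced the figures: the data are synthetic, the names x1, x2, x3 and the function forward_select are hypothetical, and a fixed F-to-enter cutoff of 4.0 stands in for the p-value entry threshold that the software applies.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def sse(columns, y):
    """Error sum of squares of an OLS fit of y on an intercept plus columns."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    xtx = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(k)]
           for a in range(k)]
    xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(k)]
    beta = solve(xtx, xty)
    return sum((y[i] - sum(X[i][j] * beta[j] for j in range(k))) ** 2
               for i in range(n))


def forward_select(candidates, y, f_to_enter=4.0):
    """Return variable names in the order they enter the model.

    At each step the candidate with the largest partial F statistic is added;
    selection stops when no remaining candidate reaches f_to_enter.
    (A perfect fit, SSE = 0, is not handled in this sketch.)
    """
    selected, remaining = [], dict(candidates)
    current_sse, n = sse([], y), len(y)
    while remaining:
        best_name, best_f, best_sse = None, 0.0, None
        for name, col in remaining.items():
            cols = [candidates[s] for s in selected] + [col]
            new_sse = sse(cols, y)
            df_resid = n - len(cols) - 1
            f_stat = (current_sse - new_sse) / (new_sse / df_resid)
            if f_stat > best_f:
                best_name, best_f, best_sse = name, f_stat, new_sse
        if best_name is None or best_f < f_to_enter:
            break  # no remaining candidate is significant enough to enter
        selected.append(best_name)
        del remaining[best_name]
        current_sse = best_sse
    return selected


# Synthetic data: y depends on x1 and x2 only; x3 is unrelated to y.
data = {
    "x1": [-1, -1, -1, -1, 1, 1, 1, 1],
    "x2": [-1, -1, 1, 1, -1, -1, 1, 1],
    "x3": [-1, 1, -1, 1, -1, 1, -1, 1],
}
noise = [-0.1, 0.1, 0.1, -0.1, 0.1, -0.1, -0.1, 0.1]
y = [5 * a + 3 * b + e for a, b, e in zip(data["x1"], data["x2"], noise)]

order = forward_select(data, y)
print(order)
```

In this toy run, x1 enters first and x2 second. x3 explains essentially none of the remaining variation, so the loop stops without adding it, which mirrors how the candidates listed in the final step are left out of the model.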

We have also requested plots to summarize the fit criteria for CPI. As Figure 5.14 shows, step 4 achieved the highest R² and adjusted R², along with the lowest C(p), AIC, BIC, and SBC. We have discussed all of these diagnostic terms earlier, except Mallows' C(p). It is a function of the sum of squared errors and, in general, a model with a lower C(p) is preferred when comparing models:

Figure 5.14: Forward selection regression–fit criteria
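For reference, a common textbook form of Mallows' C(p) is SSE_p / MSE_full - n + 2p. Here SSE_p is the error sum of squares of a candidate model with p parameters (intercept included), MSE_full is the mean squared error of the model containing all candidate variables, and n is the number of observations used. The short Python sketch below applies that formula. The SSE values are made up for illustration and are not taken from Figure 5.14, and the software's exact definition may differ in detail.

```python
def mallows_cp(sse_p, mse_full, n, p):
    """Mallows' C(p) for a candidate model with p parameters (intercept included)."""
    return sse_p / mse_full - n + 2 * p


n = 65                  # observations used, as reported in the text
p_full, p_sub = 10, 5   # 9 or 4 predictors, plus an intercept in each case
sse_full, sse_sub = 110.0, 120.0     # hypothetical error sums of squares
mse_full = sse_full / (n - p_full)   # 110 / 55 = 2.0

cp_full = mallows_cp(sse_full, mse_full, n, p_full)
cp_sub = mallows_cp(sse_sub, mse_full, n, p_sub)
print(cp_full, cp_sub)
```

By construction, C(p) equals p for the full model. A smaller model whose C(p) is close to or below its own p is usually regarded as fitting adequately without the extra variables.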

Another important chart that is produced is the Cook's D for CPI. The statistics can also be evaluated using the observed versus predicted CPI value table that is produced as part of the model output. Large studentized residuals indicate observations whose deletion would influence the model significantly. These observations can have a higher residual value (that is, a larger difference between observed and predicted values than other observations). Cook's D is helpful in identifying extreme or outlier observations. An observation with a studentized residual greater than 2 (in absolute value) and a Cook's D value greater than 0.2 should be investigated. Both the studentized residual and the Cook's D metric appear to flag the CPI value for January 2012 as an important observation.

If you look at the residuals for the observations highlighted by blue bars in Figure 5.15, you will see that they have the highest absolute residual values in the observed versus predicted table:

Figure 5.15: Forward selection partial output–studentized residuals and Cook's D
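To make the two diagnostics concrete, here is a minimal Python sketch for a regression with a single predictor. It rests on several assumptions: the data are a toy series rather than the CPI data, the residuals are internally studentized (the software may report a different variant), the function name influence_stats is hypothetical, and the rule of thumb above is read as requiring both thresholds to be exceeded.

```python
import math


def influence_stats(x, y):
    """Return (studentized residual, Cook's D) pairs for a one-predictor OLS fit."""
    n, p = len(x), 2                       # p counts the intercept and the slope
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    mse = sum(e * e for e in resid) / (n - p)
    out = []
    for xi, e in zip(x, resid):
        h = 1 / n + (xi - xbar) ** 2 / sxx            # leverage of this point
        student = e / math.sqrt(mse * (1 - h))        # internally studentized
        cooks_d = student ** 2 / p * h / (1 - h)
        out.append((student, cooks_d))
    return out


# Toy data: a straight line with one unusually high final value.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [1, 2, 3, 4, 5, 6, 7, 8, 9, 15]

stats = influence_stats(x, y)
flagged = [i for i, (r, d) in enumerate(stats) if abs(r) > 2 and d > 0.2]
print(flagged)
```

Only the last point is flagged here. It combines a large residual with high leverage, which is the pattern Cook's D is designed to pick up.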