Maximize R

In the maximize R model, the first variable selected is the one that yields the highest R2 value when entered into the model on its own. As shown in Figure 5.20, the first variable entered in our model is Spend_save_quarterly_ratio, and it is statistically significant. The Clothing_and_shopping variable is entered next, and the R2 value increases further. The procedure then reports that the best two-variable model has been found:

Figure 5.20: Maximize R: steps 1 and 2

In Figure 5.21, we see the procedure's attempt to find the best possible three-variable model. The Entertainment variable is entered in step 3. In step 4, however, we see not only the entry of the Education variable but also the exit of the Clothing_and_shopping variable that was introduced in step 2. The procedure determines that the best three-variable model should contain the Spend_and_save_quarterly_ratio, Entertainment, and Education variables:

Figure 5.21: Maximize R: steps 3 and 4
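The enter-then-swap behavior seen in steps 1 to 4 can be sketched in Python. This is only an illustration of the general idea, not the routine that produced the figures: the function names r_squared and maximize_r2 are hypothetical, the predictors are assumed to sit in a NumPy array X with one column per variable, and significance tests are left out.

```python
import numpy as np

def r_squared(X, y, cols):
    """R2 of an OLS fit of y on an intercept plus the chosen columns of X."""
    A = np.column_stack([np.ones(len(y)), X[:, cols]])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    tss = np.sum((y - y.mean()) ** 2)
    return 1.0 - (resid @ resid) / tss

def maximize_r2(X, y, max_size):
    """Return a list of (action, selected columns, R2), one entry per step."""
    n_vars = X.shape[1]
    selected, history = [], []
    while len(selected) < max_size:
        # Entry step: add the excluded variable that raises R2 the most.
        outside = [j for j in range(n_vars) if j not in selected]
        best = max(outside, key=lambda j: r_squared(X, y, selected + [j]))
        selected = selected + [best]
        history.append(("enter", list(selected), r_squared(X, y, selected)))
        # Swap step: exchange one included variable for one excluded
        # variable if that raises R2; repeat until no swap helps.
        while True:
            best_swap, best_r2 = None, r_squared(X, y, selected)
            for i in selected:
                for j in range(n_vars):
                    if j in selected:
                        continue
                    trial = [k for k in selected if k != i] + [j]
                    r2 = r_squared(X, y, trial)
                    if r2 > best_r2 + 1e-12:
                        best_swap, best_r2 = trial, r2
            if best_swap is None:
                break
            selected = best_swap
            history.append(("swap", list(selected), best_r2))
    return history
```

A "swap" entry in the returned history corresponds to a step like step 4, where one variable enters and another exits without the model size changing.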

In step 5 in Figure 5.22, we end up with the same variables in the model as we did with the forward and backward selection models. In step 6, an additional variable enters, yet all the variables remain significant. This is the first time so far that we have had a five-variable model in which every variable is significant. The R2 achieved in step 6 is the highest so far. Also notice that the C(p) value has gone down with the addition of the Furniture_home_improvement variable in step 6:

Figure 5.22: Maximize R: steps 5 and 6

The procedure continues to add variables until all of them have been included; step 10 is the last step for the maximize R model. However, as we can see in Figure 5.23, we are left with a model that has many statistically insignificant variables and a higher C(p) than the models in steps 5 and 6:

Figure 5.23: Maximize R: step 10
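To make the C(p) comparison concrete, here is a minimal sketch of Mallows' C(p) in its usual form, C(p) = SSE_p / MSE_full - (n - 2p), where p counts the fitted parameters including the intercept and MSE_full comes from the model with every candidate variable. The function names sse and mallows_cp are hypothetical, and the predictors are assumed to be held in a NumPy array X.

```python
import numpy as np

def sse(X, y, cols):
    """Residual sum of squares for OLS on an intercept plus the chosen columns."""
    A = np.column_stack([np.ones(len(y)), X[:, cols]])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)

def mallows_cp(X, y, cols):
    """Mallows' C(p) for the submodel that uses the columns in cols."""
    n, k = X.shape
    # Error variance estimate from the full model (all k variables).
    mse_full = sse(X, y, list(range(k))) / (n - k - 1)
    p = len(cols) + 1  # parameters in the submodel, intercept included
    return sse(X, y, cols) / mse_full - (n - 2 * p)
```

Under this definition the full model always has C(p) equal to its own parameter count, so a smaller model whose C(p) is close to its p is generally preferred over a full model padded with insignificant variables.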

The studentized residuals and Cook's D chart, shown in Figure 5.24, point to an outlier in May2015, which seems to be the result of a data storage issue. On checking the modeling dataset, we observe that the May2015 value of the Communication variable is 18. This is an outlier, as the Apr2015 and Jun2015 values are 108 and 109, respectively:

Figure 5.24: Maximize R2 partial output–studentized residuals and Cook's D
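For readers who want to see how such diagnostics are derived, the following is a minimal sketch, assuming an ordinary least squares fit with an intercept. The function name influence_measures is hypothetical, and the residuals computed here are internally studentized, which may differ from the variant plotted in the figure.

```python
import numpy as np

def influence_measures(X, y):
    """Return (studentized residuals, Cook's D) for OLS of y on X plus intercept."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), X])
    n, p = A.shape
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    # Leverage values: diagonal of the hat matrix, obtained from a QR factorization.
    Q, _ = np.linalg.qr(A)
    leverage = np.sum(Q ** 2, axis=1)
    s2 = (resid @ resid) / (n - p)  # residual variance estimate
    student = resid / np.sqrt(s2 * (1.0 - leverage))
    cooks_d = student ** 2 * leverage / (p * (1.0 - leverage))
    return student, cooks_d
```

An observation whose studentized residual and Cook's D both stand well apart from the rest, as May2015 does here, is worth checking against the raw data before the model is refit.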

So far, we have compared three models in the multivariate regression section. Apart from the Furniture_home_improvement variable being present in step 6 of the maximize R model, the statistically significant variables are the same across all three models. Let us look at another modeling methodology before we compare the forecasts from the models.
