For the model that relies on the CPI data series alone, we will use the SAS procedure for fitting an Unobserved Components Model (UCM), Proc UCM. Although we call it a univariate model, we will end up using some components of the CPI time series as independent variables. Remember, we aren't using the nine internal variables that are available to us as part of the business problem. Those nine variables were used in the multivariate regression model. The components we will be using are the irregular, trend, and seasonal components. We will also leverage some of the plots that the UCM procedure can produce.
The following is the code for the univariate model, using Proc UCM:
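Proc UCM Data=Model;
    Id Month Interval=Month;            /* time ID variable, monthly data */
    Model CPI;                          /* response series */
    Irregular;                          /* irregular (noise) component */
    Level;                              /* level part of the trend */
    Slope Var = 0 Noest;                /* slope variance fixed at 0, not estimated */
    Season Length = 12 Type = Trig;     /* trigonometric seasonal, 12-month cycle */
    /* Hold out the last 6 observations from estimation; request residual diagnostics */
    Estimate Back = 6 Plot = (loess panel cusum wn);
    /* Forecast 24 months beyond the end of the data */
    Forecast Back = 0 Lead = 24 Print = Forecasts Plot = (forecasts decomp);
Run;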
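To make the roles of the Irregular, Level, Slope, and Season statements concrete, the model they describe can be written out as follows. This is a sketch of the standard structural time series form, which we assume is the one Proc UCM fits for these statements. The symbols ($\mu_t$, $\beta_t$, $\gamma_t$, $\varepsilon_t$, and so on) are our own notation and do not appear in the SAS output.

```latex
\begin{align*}
\text{CPI}_t &= \mu_t + \gamma_t + \varepsilon_t,
  & \varepsilon_t &\sim N(0, \sigma_\varepsilon^2)
  && \text{(Irregular)} \\
\mu_t &= \mu_{t-1} + \beta_{t-1} + \eta_t,
  & \eta_t &\sim N(0, \sigma_\eta^2)
  && \text{(Level)} \\
\beta_t &= \beta_{t-1} + \xi_t,
  & \xi_t &\sim N(0, \sigma_\xi^2), \quad \sigma_\xi^2 = 0
  && \text{(Slope, variance fixed)} \\
\gamma_t &= \sum_{j=1}^{6} \gamma_{j,t},
  & \lambda_j &= \frac{2\pi j}{12}
  && \text{(Season, trigonometric)}
\end{align*}
```

Because the slope variance is fixed at zero, $\beta_t$ reduces to a constant: the trend has a level that can drift over time but a growth rate that does not. Each seasonal term $\gamma_{j,t}$ is a cycle with frequency $\lambda_j$, and together the six harmonics cover a 12-month seasonal pattern.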
The trend has been specified using the Level and Slope statements. In Figure 5.25, we can see the preliminary estimates of the free parameters:
As seen in Figure 5.26, both the Level and Slope (part of the trend component) are significant components in the model:
Let us evaluate the residual diagnostics, shown in Figure 5.27. In the histogram, the residuals appear to be normally distributed. In the Q-Q plot, the residuals lie close to the reference line, which also suggests normality. The ACF and PACF don't exhibit any violation of the whiteness assumption:
In Figure 5.28, showing the residual white noise test, the first three lags correspond to the three components we have included in the model. As a standard, the white noise test is not done for the number of lags that equals the number of components used in the model. While the p-value at the fourth lag does not exceed 0.05, from the fifth lag onwards the residuals have a p-value greater than 0.05. Hence, no meaningful violation of whiteness can be observed in the model:
There is a structural break in the cumulative residuals, as seen in Figure 5.29. For a period of almost two years, the cumulative residuals are above the 95% confidence limit:
We have tried to solve the business problem using both the multivariate regression approach and the UCM approach. Let us compare the forecasts generated for our validation period:
| Forecasted Month | Forward Selection | Backward Selection | Maximize R | UCM | Observed Values |
|---|---|---|---|---|---|
| Oct 2017 | 106.4077 | 106.4077 | 106.3580 | 106.46 | 106.4 |
| Nov 2017 | 106.4077 | 106.4077 | 106.3592 | 106.48 | 106.5 |
| Dec 2017 | 106.3413 | 106.3413 | 106.2917 | 106.48 | 106.6 |
| Jan 2018 | 106.2715 | 106.2715 | 106.2025 | 106.46 | 106.6 |
| Feb 2018 | 106.2051 | 106.2051 | 106.1272 | 106.46 | 106.6 |
| Mar 2018 | 106.2051 | 106.2051 | 106.1198 | 106.476667 | 106.7 |
From the preceding table, we can observe that the UCM forecasts are closer to the observed values in five of the six validation months (all except Oct 2017) and are also directionally correct more often than the other models. This poses a dilemma for the bank's management. They are keen to leverage the internal data for forecasting, yet the publicly available CPI data seems to produce more accurate forecasts than the models using internal data. The management team will have to weigh this when deciding whether to use internal data. Of the nine variables that were used, it seems that only four are needed if the bank does go ahead with models based on internal data.
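The comparison in the preceding table can also be summarized as a single number per model. The following is a minimal sketch that computes the mean absolute error (MAE) of each model over the six validation months. The dataset name forecast_compare and the variable names are our own illustrative choices, and the values are typed in from the table rather than read from the model output.

```sas
/* Validation-period forecasts and observed CPI, copied from the comparison table. */
/* Dataset and variable names are illustrative assumptions, not from the models.   */
data forecast_compare;
    input month :monyy7. forward backward maxr ucm observed;
    format month monyy7.;
    /* Absolute forecast error of each model for the month */
    ae_forward  = abs(forward  - observed);
    ae_backward = abs(backward - observed);
    ae_maxr     = abs(maxr     - observed);
    ae_ucm      = abs(ucm      - observed);
    datalines;
Oct2017 106.4077 106.4077 106.3580 106.46     106.4
Nov2017 106.4077 106.4077 106.3592 106.48     106.5
Dec2017 106.3413 106.3413 106.2917 106.48     106.6
Jan2018 106.2715 106.2715 106.2025 106.46     106.6
Feb2018 106.2051 106.2051 106.1272 106.46     106.6
Mar2018 106.2051 106.2051 106.1198 106.476667 106.7
;
run;

/* Mean absolute error (MAE) per model over the six validation months */
proc means data=forecast_compare mean maxdec=4;
    var ae_forward ae_backward ae_maxr ae_ucm;
run;
```

With the table values, the MAE works out to roughly 0.12 for UCM, against roughly 0.26 for the forward and backward selection models and 0.32 for the third regression model. Six months is a small validation window, so these figures are best read as an indication, not as a firm ranking.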