We can achieve similar results with Autoregressive Integrated Moving Average (ARIMA) models. To predict future values of a time-series, we usually have to make it stationary first, which means that the data has a constant mean, variance, and autocorrelation over time. In the past two sections, we used seasonal decomposition and the Holt-Winters filter to achieve this. Now let's see how ARIMA, a generalized version of the Autoregressive Moving Average (ARMA) model, can take care of this data transformation for us.
ARIMA(p, d, q) actually combines three models, controlled by three non-negative integer parameters:

- p: the order of the autoregressive (AR) part
- d: the degree of differencing (the integrated part)
- q: the order of the moving average (MA) part
As ARIMA extends ARMA with an integrated (differencing) component, it can also deal with non-stationary time-series, as these naturally become stationary after differencing; in other words, when the d parameter is larger than zero.
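To see what the integrated part does, we can difference a non-stationary series by hand with the base diff function, which is exactly what ARIMA applies internally when d > 0. A minimal sketch on a simulated random walk (the chapter's nts object is not used here, so the example is self-contained):

```r
# A random walk is non-stationary by construction: its variance grows
# over time, so the raw series has a much larger spread
set.seed(42)
rw <- cumsum(rnorm(200))

# Taking the first difference (d = 1) recovers the underlying white
# noise, which is stationary
drw <- diff(rw)

c(sd(rw), sd(drw))  # the differenced series has a much smaller spread
```

Repeated differencing (d = 2, 3, ...) works the same way and is available via the differences argument of diff.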
Traditionally, choosing the best ARIMA model for a time-series required building multiple models with a variety of parameters and comparing their fits. Fortunately, the forecast package comes with a very useful function that can select the best-fitting ARIMA model for a time-series by running unit root tests and comparing the Akaike Information Criterion (AIC) of candidate models fitted via maximum likelihood (ML):
> auto.arima(nts)
Series: ts
ARIMA(3,0,0)(2,0,0)[7] with non-zero mean

Coefficients:
         ar1      ar2     ar3    sar1    sar2  intercept
      0.3205  -0.1199  0.3098  0.2221  0.1637   621.8188
s.e.  0.0506   0.0538  0.0538  0.0543  0.0540     8.7260

sigma^2 estimated as 2626:  log likelihood=-1955.45
AIC=3924.9   AICc=3925.21   BIC=3952.2
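For comparison, the traditional manual route mentioned above would fit a few candidate models with base R's arima function and compare their AICs directly. A minimal sketch, where AirPassengers stands in for the chapter's nts object and the candidate orders are purely illustrative:

```r
# Fit several candidate ARIMA(p, 0, 0) models and compare their AICs
# (a lower AIC indicates a better fit); this is what auto.arima
# automates across a much larger search space
candidates <- list(c(1, 0, 0), c(2, 0, 0), c(3, 0, 0))
fits <- lapply(candidates, function(o) arima(AirPassengers, order = o))
sapply(fits, AIC)  # pick the model with the smallest value
```

In practice, the candidate grid also has to cover the d and q orders and the seasonal terms, which is exactly why the automated search is so convenient.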
It seems that an AR(3) model with AR(2) seasonal effects has the lowest AIC. But checking the manual of auto.arima reveals that the information criteria used for the model selection were approximated due to the large number (more than 100) of observations. Re-running the algorithm with approximation disabled returns a different model:
> auto.arima(nts, approximation = FALSE)
Series: ts
ARIMA(0,0,4)(2,0,0)[7] with non-zero mean

Coefficients:
         ma1      ma2     ma3     ma4    sar1    sar2  intercept
      0.3257  -0.0311  0.2211  0.2364  0.2801  0.1392   621.9295
s.e.  0.0531   0.0531  0.0496  0.0617  0.0534  0.0557     7.9371

sigma^2 estimated as 2632:  log likelihood=-1955.83
AIC=3927.66   AICc=3928.07   BIC=3958.86
Although the preceding seasonal ARIMA model seems to fit the data reasonably well, we might want to build a real integrated model by specifying the D argument (the order of seasonal differencing), and plot the resulting forecasts:
> plot(forecast(auto.arima(nts, D = 1, approximation = FALSE), 31))
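Base R can produce the same kind of forecast without the forecast package: calling predict on an arima fit returns both point predictions and their standard errors. A hedged sketch using AirPassengers in place of the chapter's nts object (the orders are illustrative, not the ones auto.arima selected above):

```r
# Fit a seasonal ARIMA by hand and forecast a year ahead with base R
fit <- arima(AirPassengers, order = c(1, 1, 1),
             seasonal = list(order = c(0, 1, 1)))
pred <- predict(fit, n.ahead = 12)

pred$pred                    # 12 point forecasts
pred$pred + 1.96 * pred$se   # approximate upper 95% bound
pred$pred - 1.96 * pred$se   # approximate lower 95% bound
```

The forecast function used above wraps this same machinery, but adds proper prediction intervals and the convenient plot method shown in the preceding command.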
Time-series analysis can sometimes be tricky, and finding the optimal model with the appropriate parameters requires reasonable experience with these statistical methods. Still, the preceding short examples show that even a basic understanding of time-series objects and the related methods will usually reveal impressive patterns in the data and provide adequate predictions.