How to do it...  

In this recipe, we will work with multiple time series related to oil/gas in Argentina. The objective is to model these series jointly with a single VAR model, and then use that model to make predictions:

  1. As usual, we first load the data:
library(vars)
oilgas = read.csv("./fuel_series.csv")
colnames(oilgas) = c("time_index", "oil_processed", "gasoil_prod", "fueloil_prod", "butane")
# drop the time index column and build a monthly multivariate time series
joined_data = ts(oilgas[-1], start = c(1996, 1), frequency = 12)
  2. Because the data is collected monthly, it is good practice to set the lag number to 12 (this way, every month will be correlated with the same month in previous years). But this is problematic in some ways, because it means that each equation will have 12 lags x 4 variables = 48 coefficients just for the endogenous part (we could also include a constant and a trend in each equation, which would require even more parameters; a sketch of that option follows the code below).

Many of those coefficients won't be significant, so we can use an automatic method to discard lags/coefficients based on their t-values. If we call restrict() with method = "ser", the internal algorithm will remove every coefficient whose t-value is smaller than the thresh= value, which is 2 by default. The practical consequence of having a smaller/simpler model is that the confidence intervals for the predictions will be narrower:

m = VAR(joined_data, p = 12)
m = restrict(m, method = "ser")  # keep only coefficients with |t| >= 2
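If we wanted the deterministic terms mentioned above, VAR() accepts a type= argument; a minimal sketch (m_bt is a hypothetical name):

# type = "both" adds a constant and a linear trend to every equation
m_bt = VAR(joined_data, p = 12, type = "both")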
  3. Once the model is estimated, the first thing to check is that all the roots of the model are smaller than 1 in modulus. If not, the model needs to be fixed, possibly by differencing the variables (as sketched below). A non-stationary VAR model yields enormous confidence intervals and is not practical in any way. As we can see here, our model is fine:
any(roots(m) > 0.9999)

We get the following output, confirming that no root is at or near the unit circle:

[1] FALSE

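Had the check failed, a minimal fix (assuming first differences suffice; joined_diff and m_diff are hypothetical names) would be to difference the series and refit:

joined_diff = diff(joined_data)    # first differences of every series
m_diff = VAR(joined_diff, p = 12)  # refit the VAR on the differenced data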
  4. The residuals should be checked: they should be homoscedastic, Gaussian, and with no structure (sketches of the extra checks follow the code below). In our case, they look mostly fine, except that the Gaussian hypothesis is rejected. Fixing non-normality is much harder in VAR models than in univariate AR models. One possibility is to take logs of each variable; nevertheless, we decide to continue with our model. We essentially get one of these plots for each variable:
normalitytest <- normality.test(m) 
plot(normalitytest)

The following output shows the Residuals graph:
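The normality test only covers the Gaussian hypothesis; the homoscedasticity and no-structure requirements can be checked with the vars package's own tests. A minimal sketch:

serial.test(m, lags.pt = 16)  # Portmanteau test for residual autocorrelation
arch.test(m)                  # multivariate ARCH test for heteroscedasticity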

  5. We can plot the actual versus fitted values. When calling plot(), we get one of these plots for each variable. The model fits reasonably well, except for an obvious outlier in the first part of the series:
plot(m) 

The following screenshot shows the Diagram of fit and residuals for gasoil_prod:

  6. We then plot the forecast error variance decomposition: this shows how much of each variable's forecast variability is explained by each variable. The interactions are not very strong, and each series seems to be explained mostly by itself.
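The call itself is not shown in the recipe's code; a minimal sketch using the vars package's fevd() function (the 12-period horizon and the fevd_m name are assumptions):

fevd_m <- fevd(m, n.ahead = 12)  # forecast error variance decomposition
plot(fevd_m)                     # one stacked bar chart per variable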

The following output shows the forecast error variance decomposition. Processed oil explains most of its own variability, and it also explains a substantial amount of the variability of the rest:

  7. A usual output for VAR models is the set of impulse-response functions, which show how a shock to one variable affects the other variables. Let's see, for example, how oil_processed impacts the butane/gasoil/fueloil production: evidently, a shock to processed oil causes an impact on the three other series, which fades to zero after two periods (the red lines show the bootstrapped confidence intervals):
var.2c.irf <- irf(m, impulse = "oil_processed", 
response = c("butane", "gasoil_prod", "fueloil_prod"), boot = TRUE)
plot(var.2c.irf)
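By default, irf() in the vars package computes the responses 10 periods ahead with 100 bootstrap replications; both can be changed via its n.ahead= and runs= arguments.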

After running the preceding code, we get the following output:

  8. Finally, we predict the values for the next 2 years (24 months). The following output shows these forecasts:
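The call for this step is not shown in the recipe; a minimal sketch using the vars package's predict method (pred24 is a hypothetical name):

pred24 <- predict(m, n.ahead = 24)  # point forecasts plus confidence intervals
plot(pred24)                        # one forecast plot per variable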
