Vector autoregression

We've seen in the preceding section that temperature and CO2 require a first-order difference. Another simple way to show this is with the forecast package's ndiffs() function. Its output is the minimum number of differences needed to make the data stationary. In the function, you can specify which of the three available tests you would like to use: Kwiatkowski-Phillips-Schmidt-Shin (KPSS), Augmented Dickey-Fuller (ADF), or Phillips-Perron (PP). I'll use ADF in the following code, whose null hypothesis is that the data isn't stationary:

> climate49 <- window(climate_ts, start = 1949)  

> forecast::ndiffs(climate49[, 1], test = "adf")
[1] 1

> forecast::ndiffs(climate49[, 2], test = "adf")
[1] 1
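
If you want to cross-check with one of the other available tests, it's just a change to the test argument. Here's a quick sketch using KPSS, whose null hypothesis runs the other way (the data is stationary), so agreement with ADF is reassuring; I'll let you run it and compare:

> forecast::ndiffs(climate49[, 1], test = "kpss")

> forecast::ndiffs(climate49[, 2], test = "kpss")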

We see that both require a first-order difference to become stationary. We'll start by creating that difference, then complete the traditional approach, where both series are stationary:

> climate_diff <- diff(climate49)
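
As a quick sanity check, and this is my addition rather than a required step, you can confirm that the differenced series need no further differencing; both calls should now return 0:

> forecast::ndiffs(climate_diff[, 1], test = "adf")

> forecast::ndiffs(climate_diff[, 2], test = "adf")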

It's now a matter of determining the optimal lag structure for the vector autoregression based on the information criteria. This is done with the VARselect() function in the vars package. You only need to specify the data and the maximum number of lags to consider using lag.max = x in the function. Let's use a maximum of 12 lags:

> lag.select <- vars::VARselect(climate_diff, lag.max = 12)

> lag.select$selection
AIC(n)  HQ(n)  SC(n) FPE(n)
     5      1      1      5

We called the information criteria using lag.select$selection. Four different criteria are provided: AIC, the Hannan-Quinn Criterion (HQ), the Schwarz-Bayes Criterion (SC), and FPE. Note that AIC and SC are covered in Chapter 2, Linear Regression, so I won't go over the criterion formulas or differences here. If you want to see the actual results for each lag, you can use lag.select$criteria. We can see that AIC and FPE selected lag 5, while HQ and SC selected lag 1, as the optimal structure for a VAR model. It seems to make sense that the five-year lag is the one to use. We'll create that model using the VAR() function. I'll let you try it with lag 1:

> fit1 <- vars::VAR(climate_diff, p = 5)
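
If you do try the lag 1 version, it's a one-argument change; I'll name the object fit1.l1 (my naming, purely for a comparison we'll make shortly):

> fit1.l1 <- vars::VAR(climate_diff, p = 1)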

The summary results are quite lengthy, as the function builds two separate models, and would take up probably two whole pages. What I provide is the abbreviated output showing the results with temperature as the response:

> summary(fit1)
Residual standard error: 0.09877 on 48 degrees of freedom
Multiple R-Squared: 0.4692, Adjusted R-squared: 0.3586
F-statistic: 4.243 on 10 and 48 DF, p-value: 0.0002996

The model is significant, with a resulting adjusted R-squared of 0.36.
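
Incidentally, if you'd rather not scroll through both models, I believe summary() for varest objects accepts an equation argument to restrict the output to a single response; a sketch, assuming your temperature column is named Temp:

> summary(fit1, equation = "Temp")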

As we did in the previous section, we should check for serial correlation. Here, the vars package provides the serial.test() function for multivariate autocorrelation. It offers several different tests, but let's focus on the Portmanteau Test; please note that the popular Durbin-Watson test is for univariate series only. The null hypothesis is that the autocorrelations are zero and the alternative is that they aren't zero:

> vars::serial.test(fit1, type = "PT.asymptotic")

Portmanteau Test (asymptotic)

data: Residuals of VAR object fit1
Chi-squared = 33.332, df = 44, p-value = 0.8794

With a p-value of 0.8794, we don't have evidence to reject the null and can say that the residuals aren't autocorrelated. What does the test say with 1 lag?
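
You can answer that yourself by running the same test on the lag 1 model built earlier (my fit1.l1 object); a p-value below 0.05 would indicate leftover autocorrelation in the residuals, which would be another point in favor of lag 5:

> vars::serial.test(fit1.l1, type = "PT.asymptotic")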

To do the Granger causality tests in R, you can use either the grangertest() function in the lmtest package or the causality() function in the vars package. I'll demonstrate the technique using causality(). It's very easy, as you just need to create two objects, one for x causing y and one for y causing x, utilizing the fit1 object previously created:

> x2y <- vars::causality(fit1, cause = "CO2")

> y2x <- vars::causality(fit1, cause = "Temp")

It's now just a simple matter to call the Granger test results:

> x2y$Granger

Granger causality H0: CO2 don't Granger-cause Temp

data: VAR object fit1
F-Test = 2.7907, df1 = 5, df2 = 96, p-value = 0.02133

> y2x$Granger

Granger causality H0: Temp don't Granger-cause CO2

data: VAR object fit1
F-Test = 0.71623, df1 = 5, df2 = 96, p-value = 0.6128

The p-value for CO2 differences Granger causing temperature is 0.02133, and it isn't significant in the other direction. So what does all of this mean? The first thing we can say is that Y doesn't Granger cause X. As for X causing Y, we can reject the null at the 0.05 significance level and therefore conclude that X does Granger cause Y. However, is that the relevant conclusion here? Remember, the p-value evaluates how likely an effect at least this large would be if the null hypothesis were true. Also, remember that the test was never designed to be some binary yea or nay. Since this study is based on observational data, I believe we can say that it's highly probable that CO2 emissions Granger cause surface temperature anomalies. But there's a lot of room for criticism of that conclusion. I mentioned upfront the controversy around the quality of the data.
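
As a cross-check, here's a sketch of the lmtest route mentioned earlier, using grangertest()'s formula interface on the differenced data; don't be surprised if the statistic differs somewhat from causality()'s, as the two functions construct the test differently:

> lmtest::grangertest(Temp ~ CO2, order = 5,
    data = as.data.frame(climate_diff))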

However, we still need to model the original, undifferenced levels using the alternative Granger causality technique. The process to find the correct number of lags is the same as before, except we don't need to make the data stationary:

> level.select <- vars::VARselect(climate49, lag.max = 12)

> level.select$selection
AIC(n)  HQ(n)  SC(n) FPE(n)
    10      1      1      6

Let's try the lag 6 structure and see whether we can achieve significance, remembering to add one extra lag to account for the integrated series. A discussion of the technique and why the extra lag is needed is available at http://davegiles.blogspot.de/2011/04/testing-for-granger-causality.html:

> fit2 <- vars::VAR(climate49, p = 7)

> vars::serial.test(fit2, type = "PT.asymptotic")

Portmanteau Test (asymptotic)

data: Residuals of VAR object fit2
Chi-squared = 32.693, df = 36, p-value = 0.6267

Now, to determine Granger causality for X causing Y, you conduct a Wald test of the null hypothesis that the coefficients on the lags of X, and only X, are zero in the equation that predicts Y, remembering not to include the extra coefficients that account for integration in the test.

The Wald test in R is available in the aod package we've already loaded. We need to specify the coefficients of the full model, its variance-covariance matrix, and the positions of the causative variable's coefficients via the Terms argument.

The coefficients for Temp that we need to test in the VAR object sit at the even positions from 2 to 12, while the coefficients for CO2 sit at the odd positions from 1 to 11. Instead of typing out c(2, 4, 6, and so on) in our function, let's create the index objects with base R's seq() function.

First, let's see whether CO2 Granger causes temperature:

> CO2terms <- seq(1, 11, 2)

> Tempterms <- seq(2, 12, 2)
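
Before running the test, it's worth confirming that the positions line up as expected. Printing the coefficient names of the temperature equation should show the CO2 lags in the odd positions (1 through 11), the Temp lags in the even positions (2 through 12), and the seventh-lag terms, the extra integration lags we exclude, in positions 13 and 14:

> names(coef(fit2$varresult$Temp))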

We're now ready to run the Wald test, described in the following code and abbreviated output:

> aod::wald.test(
b = coef(fit2$varresult$Temp),
Sigma = vcov(fit2$varresult$Temp),
Terms = c(CO2terms)
)

$result$`chi2`
       chi2          df           P
13.48661591  6.00000000  0.03592734

How about that? We have a significant p-value, so let's test causality in the other direction with the following code:

> aod::wald.test(
b = coef(fit2$varresult$CO2),
Sigma = vcov(fit2$varresult$CO2),
Terms = c(Tempterms)
)

$result$`chi2`
     chi2        df         P
4.7709016 6.0000000 0.5735146

Conversely, we can say that temperature doesn't Granger cause CO2. The last thing to show here is how to use a vector autoregression to produce a forecast. A predict() function is available, and we'll plot the forecast for the next 24 years:

> plot(predict(fit2, n.ahead = 24, ci = 0.95))

The output of the preceding code is as follows:

[Plot: 24-year forecasts for Temp and CO2 with 95% confidence intervals]
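
If you want the numbers behind the chart, capture the prediction object first; it stores one matrix per series, with the point forecast and interval bounds, in its fcst element:

> fcast <- predict(fit2, n.ahead = 24, ci = 0.95)

> head(fcast$fcst$Temp)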

Looking out a couple of decades hence, we see temperature anomalies getting close to 1 degree. If nothing else, I hope this has stimulated your thinking on how to apply the technique to your own real-world problems or maybe even to examine the climate change data in more detail. There should be a high bar when it comes to demonstrating causality, and Granger causality is a great tool for assisting in that endeavor.
