Cointegration is similar to correlation, but many regard it as a better measure of how closely two time series are related. Two non-stationary time series $x(t)$ and $y(t)$ are cointegrated if some linear combination of them is stationary. In that case, for a suitable coefficient $a$, the following series is stationary:

$$y(t) - a\,x(t)$$
Consider a drunk man and his dog out on a walk. Correlation tells us whether they are heading in the same direction; cointegration tells us whether the distance between the man and his dog stays bounded over time. We will demonstrate cointegration using synthetic time series and real data. The Augmented Dickey-Fuller (ADF) test (see http://en.wikipedia.org/wiki/Augmented_Dickey%E2%80%93Fuller_test) tests for a unit root in a time series. Applied to the residuals of a regression of one series on the other, it can be used to check two series for cointegration: the null hypothesis is a unit root in the residuals (no cointegration), so a low p-value is evidence that the series are cointegrated.
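To make the definition concrete, here is a minimal NumPy sketch of the two-step idea (commonly called the Engle-Granger approach). It is an illustration under stated assumptions, not the book's code: the series are synthetic, the seed is arbitrary, the coefficient a is fitted by least squares without an intercept, and the unit-root check is a plain Dickey-Fuller t-statistic with no constant and no lag terms, a simplification of the ADF test. The helper names `hedge_ratio` and `df_stat` are hypothetical.

```python
import numpy as np


def hedge_ratio(x, y):
    """Least-squares slope a in y ~ a * x (no intercept)."""
    return np.dot(x, y) / np.dot(x, x)


def df_stat(e):
    """Plain Dickey-Fuller t-statistic for diff(e)[t] = rho * e[t-1] + error
    (no constant, no lagged differences). Strongly negative values are
    evidence against a unit root, i.e. in favour of stationarity."""
    de = np.diff(e)
    lag = e[:-1]
    rho = np.dot(lag, de) / np.dot(lag, lag)
    resid = de - rho * lag
    s2 = np.dot(resid, resid) / (len(de) - 1)
    se = np.sqrt(s2 / np.dot(lag, lag))
    return rho / se


rng = np.random.default_rng(42)   # arbitrary seed for reproducibility
n = 1000
x = np.cumsum(rng.normal(size=n))  # random walk: non-stationary
y = 2.0 * x + rng.normal(size=n)   # shares the stochastic trend of x

a = hedge_ratio(x, y)
spread = y - a * x                 # the linear combination y(t) - a x(t)

print("estimated a:", a)
print("DF statistic of x:", df_stat(x))
print("DF statistic of spread:", df_stat(spread))
```

The random walk x on its own should give a statistic near zero, while the spread should be far more negative. Strictly speaking, residuals from an estimated regression should be compared with Engle-Granger critical values, which are more negative than ordinary Dickey-Fuller ones.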
For the following code, have a look at the cointegration.py file in this book's code bundle:

    import numpy as np
    import statsmodels.api as sm
    import statsmodels.tsa.stattools as ts


    def calc_adf(x, y):
        # Regress x on y (no constant) and run the ADF test on the residuals
        result = sm.OLS(x, y).fit()
        return ts.adfuller(result.resid)


    # data holds two columns: YEAR and SUNACTIVITY
    data_loader = sm.datasets.sunspots.load_pandas()
    data = data_loader.data.values
    N = len(data)

    t = np.linspace(-2 * np.pi, 2 * np.pi, N)
    sine = np.sin(np.sin(t))
    print("Self ADF", calc_adf(sine, sine))

    noise = np.random.normal(0, .01, N)
    print("ADF sine with noise", calc_adf(sine, sine + noise))

    cosine = 100 * np.cos(t) + 10
    print("ADF sine vs cosine with noise", calc_adf(sine, cosine + noise))

    print("Sine vs sunspots", calc_adf(sine, data))
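Let's get started with the cointegration demo. First, define a function that regresses one series on the other and runs the ADF test on the residuals:

    def calc_adf(x, y):
        result = sm.OLS(x, y).fit()
        return ts.adfuller(result.resid)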
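In `calc_adf`, `sm.OLS(x, y)` treats `x` as the dependent variable and `y` as the regressor, and since no constant column is added, the fit has no intercept. The following NumPy sketch reproduces those residuals for the single-regressor case only (the two-column sunspots call is not covered); the helper name `ols_resid_no_constant` is hypothetical. It also shows why regressing a series on itself is a degenerate case:

```python
import numpy as np


def ols_resid_no_constant(x, y):
    """Residuals of regressing x on a single regressor y without an
    intercept, mirroring what calc_adf passes to adfuller for 1-D inputs."""
    b = np.dot(y, x) / np.dot(y, y)  # least-squares slope
    return x - b * y


N = 309  # length of the sunspots series used in the demo
t = np.linspace(-2 * np.pi, 2 * np.pi, N)
sine = np.sin(np.sin(t))

# Regressing a series on itself leaves residuals that are zero up to rounding.
print(np.abs(ols_resid_no_constant(sine, sine)).max())
```

If an intercept is wanted, `sm.add_constant(y)` appends a column of ones to the regressor before fitting.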
Next, load the sunspots data and determine the number of data points:

    data_loader = sm.datasets.sunspots.load_pandas()
    data = data_loader.data.values
    N = len(data)
Generate a sine wave with N points and calculate the ADF statistic of the sine against itself:

    t = np.linspace(-2 * np.pi, 2 * np.pi, N)
    sine = np.sin(np.sin(t))
    print("Self ADF", calc_adf(sine, sine))
The code should print the following:

    Self ADF (-5.0383000037165746e-16, 0.95853208606005591, 0, 308, {'5%': -2.8709700936076912, '1%': -3.4517611601803702, '10%': -2.5717944160060719}, -21533.113655477719)
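The first value in the printout is the ADF statistic and the second is the p-value. The p-value is very high, so we cannot reject the null hypothesis of a unit root. This case is degenerate, however: regressing the sine on itself leaves residuals that are zero up to rounding error. The next two values are the number of lags used and the number of observations used. The dictionary gives the critical values of the test statistic at the 1%, 5%, and 10% levels for this sample size, and the last value is the information criterion (AIC by default) of the selected lag length.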
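To keep the tuple fields straight, a small sketch can label them with a `namedtuple`. The field names are our own labels, following the order of the values `adfuller` returns when automatic lag selection is left at its default; the numbers are the "Self ADF" output copied from above.

```python
from collections import namedtuple

# Our own labels for the six values adfuller returns with default settings.
ADFResult = namedtuple(
    "ADFResult",
    ["adf", "pvalue", "usedlag", "nobs", "critical_values", "icbest"],
)

# The "Self ADF" tuple printed above, copied verbatim.
self_adf = ADFResult(
    -5.0383000037165746e-16, 0.95853208606005591, 0, 308,
    {'5%': -2.8709700936076912, '1%': -3.4517611601803702,
     '10%': -2.5717944160060719},
    -21533.113655477719,
)

print("statistic:", self_adf.adf)
print("p-value:", self_adf.pvalue)
print("lags used:", self_adf.usedlag, "observations:", self_adf.nobs)
print("5% critical value:", self_adf.critical_values["5%"])
```

With statsmodels available, the same labeling would be `ADFResult(*ts.adfuller(resid))`.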
Now add a small amount of normally distributed noise to the sine and test again:

    noise = np.random.normal(0, .01, N)
    print("ADF sine with noise", calc_adf(sine, sine + noise))

With the noise, we get results like the following (the noise is not seeded, so the exact numbers vary between runs):

    ADF sine with noise (-7.4535502402193075, 5.5885761455106898e-11, 3, 305, {'5%': -2.8710633193086648, '1%': -3.4519735736206991, '10%': -2.5718441306100512}, -1855.0243977703672)
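The p-value has dropped considerably, and the ADF statistic of -7.45 is lower than all the critical values in the dictionary. Together, these are strong arguments to reject the null hypothesis of a unit root in the residuals, that is, evidence that the sine and the noisy sine are cointegrated.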
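The ADF test is left-tailed: the unit-root null hypothesis is rejected at a given level when the statistic is below (more negative than) the corresponding critical value. A sketch of that rule, applied to the "ADF sine with noise" and "Self ADF" outputs above; `rejected_levels` is a hypothetical helper name.

```python
def rejected_levels(stat, critical_values):
    """Return the significance levels at which the unit-root null is
    rejected, i.e. where stat is below the critical value, sorted 1%, 5%, 10%."""
    return sorted(
        (level for level, cv in critical_values.items() if stat < cv),
        key=lambda level: float(level.rstrip("%")),
    )


# Statistic and critical values from the "ADF sine with noise" run above.
noise_stat = -7.4535502402193075
noise_crit = {'5%': -2.8710633193086648, '1%': -3.4519735736206991,
              '10%': -2.5718441306100512}
print(rejected_levels(noise_stat, noise_crit))  # ['1%', '5%', '10%']

# The degenerate "Self ADF" run rejects at no level.
self_stat = -5.0383000037165746e-16
self_crit = {'5%': -2.8709700936076912, '1%': -3.4517611601803702,
             '10%': -2.5717944160060719}
print(rejected_levels(self_stat, self_crit))  # []
```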
Next, test the sine against a scaled and shifted cosine with the same noise added:

    cosine = 100 * np.cos(t) + 10
    print("ADF sine vs cosine with noise", calc_adf(sine, cosine + noise))

The following values get printed:

    ADF sine vs cosine with noise (-17.927224617871534, 2.8918612252729532e-30, 16, 292, {'5%': -2.8714895534256861, '1%': -3.4529449243622383, '10%': -2.5720714378870331}, -11017.837238220782)
Again, the statistic is far below all the critical values, so we reject the unit-root null hypothesis. Checking for cointegration between the sine and the sunspots data gives the following output:

    Sine vs sunspots (-6.7242691810701016, 3.4210811915549028e-09, 16, 292, {'5%': -2.8714895534256861, '1%': -3.4529449243622383, '10%': -2.5720714378870331}, -1102.5867415291168)
The critical values are roughly the same for all the pairs because they depend on the number of observations used, which doesn't vary much here. The outcome is summarized in the following table:
Pair | Statistic | p-value | 1% critical | 5% critical | 10% critical | Reject unit-root null? |
---|---|---|---|---|---|---|
Sine with self | -5.04E-16 | 0.96 | -3.45 | -2.87 | -2.57 | No |
Sine versus sine with noise | -7.45 | 5.59E-11 | -3.45 | -2.87 | -2.57 | Yes |
Sine versus cosine with noise | -17.93 | 2.89E-30 | -3.45 | -2.87 | -2.57 | Yes |
Sine versus sunspots | -6.72 | 3.42E-09 | -3.45 | -2.87 | -2.57 | Yes |