Defining cointegration

Cointegration is similar to correlation but is viewed by many as a superior measure of the relatedness of two time series. Two time series x(t) and y(t) are cointegrated if a linear combination of them is stationary. In such a case, the following expression should be stationary for some constant a:

y(t) – a x(t)
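
To make the definition concrete, here is a minimal sketch using synthetic data. The random walk, the coefficient a = 2, the noise level, and the seed are all assumptions chosen for illustration. It builds a non-stationary x(t), derives y(t) from it, and shows that the combination y(t) - a x(t) stays within a narrow band while x(t) itself wanders:

```python
import numpy as np

rng = np.random.RandomState(42)  # fixed seed, an arbitrary choice
n = 1000

# x(t) is a random walk: non-stationary, its spread grows over time
x = np.cumsum(rng.normal(0, 1, n))

# y(t) follows a * x(t) plus stationary noise, so the pair is cointegrated
a = 2.0
y = a * x + rng.normal(0, 0.5, n)

# The linear combination y(t) - a x(t) is just the stationary noise term
spread = y - a * x

print("std of x:      %.2f" % x.std())
print("std of spread: %.2f" % spread.std())
```

In practice the coefficient a is not known in advance and has to be estimated, for example with a regression.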

Consider a drunk man and his dog out on a walk. Correlation tells us whether they are going in the same direction. Cointegration tells us something about the distance over time between the man and his dog. We will show cointegration using randomly generated time series and real data. The Augmented Dickey-Fuller (ADF) test (see http://en.wikipedia.org/wiki/Augmented_Dickey%E2%80%93Fuller_test) tests for a unit root in a time series. Applied to the residuals of a regression of one series on the other, it can be used to check whether two time series are cointegrated.

For the following code, have a look at the cointegration.py file in this book's code bundle:

import numpy as np
import statsmodels.api as sm
import statsmodels.tsa.stattools as ts

def calc_adf(x, y):
    # Regress x on y, then run the ADF test on the residuals
    result = sm.OLS(x, y).fit()
    return ts.adfuller(result.resid)

# Load the sunspots dataset as a NumPy array
data_loader = sm.datasets.sunspots.load_pandas()
data = data_loader.data.values
N = len(data)

# Synthetic signal with the same length as the sunspots data
t = np.linspace(-2 * np.pi, 2 * np.pi, N)
sine = np.sin(np.sin(t))
print("Self ADF", calc_adf(sine, sine))

# Small Gaussian noise added to the signal
noise = np.random.normal(0, .01, N)
print("ADF sine with noise", calc_adf(sine, sine + noise))

# Cosine with a larger magnitude and an offset
cosine = 100 * np.cos(t) + 10
print("ADF sine vs cosine with noise", calc_adf(sine, cosine + noise))

print("Sine vs sunspots", calc_adf(sine, data))

Let's get started with the cointegration demo:

  1. Define the following function to calculate the ADF statistic:
    def calc_adf(x, y):
        result = sm.OLS(x, y).fit()
        return ts.adfuller(result.resid)
  2. Load the sunspots data into a NumPy array:
    data_loader = sm.datasets.sunspots.load_pandas()
    data = data_loader.data.values
    N = len(data)
  3. Generate a sine and calculate the cointegration of the sine with itself:
    t = np.linspace(-2 * np.pi, 2 * np.pi, N)
    sine = np.sin(np.sin(t))
    print("Self ADF", calc_adf(sine, sine))

    The code should print the following:

    Self ADF (-5.0383000037165746e-16, 0.95853208606005591, 0, 308, {'5%': -2.8709700936076912, '1%': -3.4517611601803702, '10%': -2.5717944160060719}, -21533.113655477719)
    

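    The first value in the printout is the ADF test statistic and the second value is the p-value. As you can see, the p-value is very high. The next two values are the number of lags used and the number of observations used in the regression. The dictionary gives the critical values of the test statistic at the 1%, 5%, and 10% levels for this sample size, and the final value is the maximized information criterion (AIC by default).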

  4. Now, add noise to the sine to demonstrate how noise will influence the signal:
    noise = np.random.normal(0, .01, N)
    print("ADF sine with noise", calc_adf(sine, sine + noise))

    With the noise, we get the following results:

    ADF sine with noise (-7.4535502402193075, 5.5885761455106898e-11, 3, 305, {'5%': -2.8710633193086648, '1%': -3.4519735736206991, '10%': -2.5718441306100512}, -1855.0243977703672)
    

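    The p-value has gone down considerably. The ADF statistic of -7.45 is lower than all the critical values in the dictionary. These are strong arguments to reject the null hypothesis of the test, namely that the residuals have a unit root. Stationary residuals are what we expect from a cointegrated pair.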

  5. Let's generate a cosine of a larger magnitude and offset. Again, let's add noise to it:
    cosine = 100 * np.cos(t) + 10
    print("ADF sine vs cosine with noise", calc_adf(sine, cosine + noise))

    The following values get printed:

    ADF sine vs cosine with noise (-17.927224617871534, 2.8918612252729532e-30, 16, 292, {'5%': -2.8714895534256861, '1%': -3.4529449243622383, '10%': -2.5720714378870331}, -11017.837238220782)
    

Similarly, we have strong arguments to reject the null hypothesis of a unit root in the residuals. Checking for cointegration between the sine and sunspots gives the following output:

Sine vs sunspots (-6.7242691810701016, 3.4210811915549028e-09, 16, 292, {'5%': -2.8714895534256861, '1%': -3.4529449243622383, '10%': -2.5720714378870331}, -1102.5867415291168)
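
The printed tuples are easier to read once each position has a name. The sketch below wraps the six values in a namedtuple. The field names are my own labels rather than statsmodels identifiers, and the six-element layout assumes adfuller was called with its default settings, as in calc_adf. The numbers are the "ADF sine with noise" output from step 4, rounded for readability:

```python
from collections import namedtuple

# Field names are illustrative labels, not statsmodels identifiers
AdfResult = namedtuple(
    "AdfResult",
    ["statistic", "pvalue", "used_lag", "n_obs", "critical_values", "ic_best"])

# The "ADF sine with noise" tuple from step 4, rounded
raw = (-7.4536, 5.5886e-11, 3, 305,
       {'5%': -2.8711, '1%': -3.4520, '10%': -2.5718}, -1855.02)

result = AdfResult(*raw)
print("statistic: %.2f" % result.statistic)
print("p-value:   %.3g" % result.pvalue)
print("5%% critical value: %.2f" % result.critical_values['5%'])
```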

The critical values are roughly the same for the pairs used here because they depend on the number of observations, which doesn't vary much. The outcome is summarized in the following table:

Pair                           Statistic  p-value   5%     1%     10%    Reject
Sine with self                 -5.03E-16  0.95      -2.87  -3.45  -2.57  No
Sine versus sine with noise    -7.45      5.58E-11  -2.87  -3.45  -2.57  Yes
Sine versus cosine with noise  -17.92     2.89E-30  -2.87  -3.45  -2.57  Yes
Sine versus sunspots           -6.72      3.42E-09  -2.87  -3.45  -2.57  Yes
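
The Reject column can be reproduced by comparing each statistic with a critical value: the unit-root null hypothesis is rejected when the statistic is lower than the critical value. The following sketch applies that rule to the rounded numbers from the table. The helper name rejects_unit_root and the choice of the 5% level are assumptions made for illustration:

```python
# Critical values shared (after rounding) by all four rows of the table
CRITICAL = {'1%': -3.45, '5%': -2.87, '10%': -2.57}

# Statistics copied from the table
STATISTICS = [
    ("Sine with self", -5.03e-16),
    ("Sine versus sine with noise", -7.45),
    ("Sine versus cosine with noise", -17.92),
    ("Sine versus sunspots", -6.72),
]

def rejects_unit_root(statistic, critical_values, level='5%'):
    """Return True if the statistic is below the critical value at `level`."""
    return statistic < critical_values[level]

for pair, stat in STATISTICS:
    verdict = "Yes" if rejects_unit_root(stat, CRITICAL) else "No"
    print("%-30s %s" % (pair, verdict))
```

With these numbers the verdicts do not change if the 1% or 10% level is used instead.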
