Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

Autocorrelation

Autocorrelation is correlation within a dataset and can indicate a trend.

Note

For a given time series, with known mean and standard deviations, we can define the autocorrelation for times s and t using the expected value operator as follows:

This is, in essence, the formula for correlation applied to a time series and the same time series lagged.

For example, if we have a lag of one period, we can check if the previous value influences the current value. For that to be true, the autocorrelation value has to be pretty high.

In the previous chapter, Chapter 6, Data Visualization, we already used a pandas function that plots autocorrelation. In this example, we will use the NumPy correlate() function to calculate the actual autocorrelation values for the sunspots cycle. At the end, we need to normalize the values we receive. Apply the NumPy correlate() function as follows:

y = data - np.mean(data)
norm = np.sum(y ** 2)
correlated = np.correlate(y, y, mode='full')/norm

We are also interested in the indices corresponding to the highest correlations. These indices can be found with the NumPy argsort() function, which returns the indices that would sort an array:

print np.argsort(res)[-5:]

These are the indices found for the largest autocorrelations:

[ 9 11 10  1  0]

The largest autocorrelation is by definition for zero lag, that is, the correlation of a signal with itself. The next largest values are for a lag of one and ten years. Check the autocorrelation.py file in this book's code bundle:

import numpy as np
import pandas as pd
import statsmodels.api as sm
import matplotlib.pyplot as plt
from pandas.tools.plotting import autocorrelation_plot


data_loader = sm.datasets.sunspots.load_pandas()
data = data_loader.data["SUNACTIVITY"].values
y = data - np.mean(data)
norm = np.sum(y ** 2)
correlated = np.correlate(y, y, mode='full')/norm
res = correlated[len(correlated)/2:]

print np.argsort(res)[-5:]
plt.plot(res)
plt.grid(True)
plt.xlabel("Lag")
plt.ylabel("Autocorrelation")
plt.show()
autocorrelation_plot(data)
plt.show()

Refer to the following plot for the end result:

Compare the previous plot with the plot produced by pandas:

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

Table of Contents for Autocorrelation

Create new playlist

Sign In

Sign Up

Autocorrelation

Note

Table of Contents for
Autocorrelation