Importance of autocorrelation

Autocorrelation represents the degree of similarity between a given time series and a lagged (that is, delayed in time) version of itself over successive time intervals. It arises in time series studies when the errors associated with a given time period carry over into future time periods. For example, if we are predicting the growth of stock dividends, an overestimate in one year is likely to lead to overestimates in succeeding years.

Time series data arises in many scientific applications and in many financial processes. Some examples include reports of financial performance, prices over time, and volatility computed over time.

If we are analyzing unknown data, autocorrelation can help us detect whether or not the data is random. For that, we can use a correlogram. It can help answer questions such as: is the data random, is this time series white noise, is it sinusoidal, is it autoregressive, and what model describes it?
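To make this concrete before we turn to the recipe, the sample autocorrelation at a single lag, which is one point on a correlogram, can be computed directly with NumPy. The following is only an illustrative sketch, and the autocorr helper is our own name, not part of the recipe:

import numpy as np

def autocorr(x, lag):
    # sample autocorrelation at a positive lag, normalized so that lag 0 would equal 1
    z = np.asarray(x, dtype=float)
    z = z - z.mean()
    return np.dot(z[:-lag], z[lag:]) / np.dot(z, z)

noise = np.random.random(365)
print(autocorr(noise, 1))  # close to 0 for random data

For random data this value stays close to 0 at every nonzero lag, while for periodic data it rises again at lags matching the period.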

Getting ready

We will use matplotlib to compare two datasets. One is the Google daily trend of search volume for a certain keyword over one year (365 days). The other is a set of 365 random measurements drawn from a uniform distribution with NumPy.

We will autocorrelate both datasets and compare how their correlograms reveal patterns in the data.
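The ch07_search_data module used in the following code ships with the book's code bundle and exposes DATA, a sequence of 365 numeric values. If you do not have it at hand, a rough stand-in can be generated as follows; this synthetic data is only an assumption and will not reproduce the exact figure shown later:

# ch07_search_data.py: synthetic stand-in (an assumption, not the book's dataset)
import numpy as np

days = np.arange(365)
# weekly and monthly periodic components plus noise, mimicking a seasonal search trend
DATA = list(50 + 10 * np.sin(2 * np.pi * days / 30.0)
               + 5 * np.sin(2 * np.pi * days / 7.0)
               + np.random.normal(0, 2, 365))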

How to do it...

In this section, we will perform the following steps:

  1. Import the matplotlib.pyplot module
  2. Import the numpy package
  3. Use a cleaned dataset of Google search volume for a year
  4. Plot the dataset and plot its autocorrelation diagram
  5. Generate a random dataset of the same length using NumPy
  6. Plot the random dataset on the same figure and plot its autocorrelation diagram
  7. Add appropriate labels and grids for easier understanding of the plot

This is the code:

import matplotlib.pyplot as plt
import numpy as np

# import the data

from ch07_search_data import DATA as d

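# centre the data on its mean so the autocorrelation reflects fluctuations around it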
total = sum(d)
av = total / len(d)
z = [i - av for i in d]

fig = plt.figure()
# plt.title('Comparing autocorrelations')

# Search trend volume
ax1 = fig.add_subplot(221)
ax1.plot(d)
ax1.set_xlabel('Google Trends data for "flowers"')

# Is there a pattern in search trend for this keyword?
ax2 = fig.add_subplot(222)
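# maxlags=None plots every possible lag; normed=True scales the values so that lag 0 equals 1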
ax2.acorr(z, usevlines=True, maxlags=None, normed=True, lw=2)
ax2.grid(True)
ax2.set_xlabel('Autocorrelation')

# Now let's generate uniformly distributed random data for the same period
d1 = np.random.random(365)
assert len(d) == len(d1)

total = sum(d1)
av = total / len(d1)
z = [i - av for i in d1]

# Random: "search trend volume"
ax3 = fig.add_subplot(223)
ax3.plot(d1)
ax3.set_xlabel('Random data')

# Is there a pattern in search trend for this keyword?
ax4 = fig.add_subplot(224)
ax4.set_xlabel('Autocorrelation of random data')
ax4.acorr(z, usevlines=True, maxlags=None, normed=True, lw=2)
ax4.grid(True)

plt.show()

This code will render the following figure:

[Figure: Google Trends data for "flowers" and its autocorrelation (top row); random data and its autocorrelation (bottom row)]

How it works...

Looking at the left-hand plots, it is easy to spot patterns in the search volume data (top left), whereas in the bottom-left plot of random data patterns are not obvious, but might still exist.

Computing and plotting the autocorrelation of the random data, we see a high correlation at lag 0, which is expected: data is perfectly correlated with itself at zero time lag. Away from zero lag, however, the signal is almost 0, so we can safely conclude that there is no correlation between the original signal and any of the time lags examined.

Looking at the real data, the Google search volume trend, we can see the same behavior at zero time lag, which is again something we can expect for any autocorrelated signal. However, we also have strong signals at around 30, 60, and 110 days after zero lag. This indicates that there is a pattern in this particular search term and in the way people search for it on the Google search engine.

Explaining why this is so is a very different story, and we will leave that exercise to the reader. Remember that correlation and causation are two very different things.

There's more...

Autocorrelation is used very often when we want to identify a model for unknown data and try to fit the data to that model. How data correlates with itself is sometimes the first step toward identifying an appropriate model for the dataset we are presented with. This requires more than Python; it requires knowledge of mathematical modeling. Various statistical tests (the Ljung-Box test, the Box-Pierce test, and so on) will help us answer these questions.
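For instance, both statistics are available in the statsmodels package. The following is a minimal sketch, assuming statsmodels is installed and that a recent version is used, which returns the results as a table; it is not part of the recipe code. Small p-values suggest that the series is not white noise:

import numpy as np
from statsmodels.stats.diagnostic import acorr_ljungbox

noise = np.random.random(365)
# test the first 30 lags; boxpierce=True adds the Box-Pierce statistic as well
print(acorr_ljungbox(noise, lags=30, boxpierce=True))

For the random dataset above, the p-values should stay large, so we cannot reject the hypothesis that it is white noise.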
