© Sayan Mukhopadhyay, Pratip Samanta 2023
S. Mukhopadhyay, P. Samanta, Advanced Data Analytics Using Python, https://doi.org/10.1007/978-1-4842-8005-8_6

6. Time Series

Sayan Mukhopadhyay1   and Pratip Samanta1
(1)
Kolkata, West Bengal, India
 

A time series is a series of data points arranged chronologically. Most commonly, the time points are equally spaced. A few examples are the passenger loads of an airline recorded each month for the past two years or the price of an instrument in the share market recorded each day for the last year. The primary aim of time-series analysis is to predict the future value of a parameter based on its past data.

Classification of Variation

Traditionally, time-series analysis divides the variation into three major components: trends, seasonal variations, and other cyclic changes. The variation that remains is attributed to "irregular" fluctuations, or the error term. This approach is particularly valuable when the variation consists mostly of trends and seasonality.

Analyzing a Series Containing a Trend

A trend is a change in the mean level that is long-term in nature. For example, if you have a series like 2, 4, 6, 8 and someone asks you for the next value, the obvious answer is 10. You can justify your answer by fitting a line to the data using the simple least square estimation or any other regression method. A trend can also be nonlinear. Figure 6-1 shows an example of a time series with trends.

A graph plots the international airline passenger data in thousands. It plots an increasing trend from around 100 in Jan 49 to 600 in Jan 60.

Figure 6-1

A time series with trends

The simplest type of time series is the familiar “linear trend plus noise” for which the observation at time t is a random variable Xt, as follows:
$$ X_t = \alpha + \beta t + \varepsilon_t $$

Here, α and β are constants, and ε_t denotes a random error term with mean 0. The average level at time t is given by m_t = (α + βt). This is sometimes called the trend term.

Curve Fitting

Fitting a simple function of time such as a polynomial curve (linear, quadratic, etc.), a Gompertz curve, or a logistic curve is a well-known method of dealing with nonseasonal data that contains a trend, particularly yearly data. The global linear trend is the simplest type of polynomial curve. The Gompertz curve can be written as log X_t = a + b r^t, where a, b, and r are parameters with 0 < r < 1, or in the following format, where α, β, and γ are parameters:
$$ X_t = \alpha \exp\left[\beta \exp\left(-\gamma t\right)\right] $$
This looks quite different but is actually equivalent, provided γ > 0. The logistic curve is as follows:
$$ X_t = a / \left(1 + b\,e^{-ct}\right) $$

Both these curves are S-shaped and approach an asymptotic value as t → ∞, with the Gompertz curve generally converging more slowly than the logistic one. Fitting the curves to data may lead to nonlinear simultaneous equations.

For all curves of this nature, the fitted function provides a measure of the trend, and the residuals provide an estimate of local fluctuations where the residuals are the differences between the observations and the corresponding values of the fitted curve.
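To make this concrete, here is a minimal sketch of fitting a logistic curve by nonlinear least squares with SciPy's curve_fit. The synthetic series, the initial guesses, and the parameter values are illustrative assumptions, not data from this chapter.
import numpy as np
from scipy.optimize import curve_fit

def logistic(t, a, b, c):
    # logistic curve X_t = a / (1 + b * exp(-c * t))
    return a / (1.0 + b * np.exp(-c * t))

t = np.arange(50, dtype=float)
rng = np.random.default_rng(0)
# synthetic S-shaped data with a little noise (illustrative values)
x = logistic(t, a=100.0, b=20.0, c=0.2) + rng.normal(0, 2.0, size=t.size)

params, _ = curve_fit(logistic, t, x, p0=[x.max(), 10.0, 0.1])
print("fitted a, b, c:", params)
trend = logistic(t, *params)      # measure of the trend
residuals = x - trend             # estimate of local fluctuations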

Removing Trends from a Time Series

Differencing a given time series until it becomes stationary is a special type of filtering that is particularly useful for removing a trend. You will see that it is an integral part of the Box-Jenkins procedure. For data with a linear trend, first-order differencing is usually enough to remove the trend.

Mathematically, it looks like this:
$$ \begin{aligned} y(t) &= a\,t + c \\ y(t+1) &= a\,(t+1) + c \\ z(t) &= y(t+1) - y(t) = a \quad \text{(no trend present in } z(t)\text{)} \end{aligned} $$

A trend can be exponential as well. In this case, you will have to do a logarithmic transformation to convert the trend from exponential to linear.

Mathematically, it looks like this:
$$ \begin{aligned} y(t) &= a^{t} \\ z(t) &= \log\big(y(t)\big) = t\,\log(a) \quad \text{(} z(t) \text{ is a linear function of } t \text{)} \end{aligned} $$
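Here is a minimal sketch of both operations with NumPy: first-order differencing to remove a linear trend, and a log transform to turn an exponential trend into a linear one. The synthetic series are illustrative assumptions.
import numpy as np

t = np.arange(100, dtype=float)
y_linear = 2.0 * t + 5.0 + np.random.normal(0, 1.0, size=t.size)
z = np.diff(y_linear)          # first-order difference: trend removed, mean close to 2

y_exp = 3.0 * np.exp(0.05 * t)
z_log = np.log(y_exp)          # log transform: now a linear function of t
print(z.mean(), np.polyfit(t, z_log, 1))   # slope of z_log is ~0.05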

Analyzing a Series Containing Seasonality

Many time series, such as airline passenger loads or weather readings, display variations that repeat after a specific time period. For instance, in India, there will always be an increase in airline passenger loads during the holiday of Diwali. This yearly variation is easy to understand and can be estimated if seasonality is of direct interest. Similar to trends, if you have a series such as 1, 2, 1, 2, 1, 2, your obvious choices for the next two values are 1 and 2.

The Holt-Winters model is a popular model for realizing time series with seasonality and is also known as exponential smoothing. The Holt-Winters model has two variations: additive and multiplicative. In the additive model with single exponential smoothing, the series is realized as follows:
$$ X(t+1) = \alpha\,X_t + (1 - \alpha)\,S_{t-1} $$

In this model, every point is realized as a weighted average of the current observation and the previous smoothed value S. Expanding the recursion, X(t+1) also depends on X(t−1) and S(t−2), but through weights involving higher powers of (1−α), so the influence of older observations decays exponentially. This is why the method is known as exponential smoothing. The starting value of S is crucial in this method; commonly, it is initialized with 1 or with the average of the first four observations.
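Here is a minimal sketch of the single exponential smoothing recursion just described. The value of α and the choice of starting value are illustrative assumptions.
def exponential_smoothing(series, alpha=0.3):
    # each smoothed value is a weighted average of the current observation
    # and the previous smoothed value
    smoothed = [series[0]]                 # start with the first observation
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed

print(exponential_smoothing([10, 12, 13, 12, 15, 16, 18, 17], alpha=0.3))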

The multiplicative seasonal model time series is as follows:
$$ X(t+1) = \left(b_1 + b_2\,t\right) S_t + \text{noise} $$

Here, b1, often referred to as the permanent component, is the initial weight of the seasonality; b2 represents the trend, which is linear in this case.

However, at the time of writing there was no standard implementation of the Holt-Winters model in Python; it is available in R (see Chapter 1 for how R's Holt-Winters model can be called from Python code). Recent releases of statsmodels do provide an ExponentialSmoothing class that covers similar ground.

Removing Seasonality from a Time Series

There are two ways of removing seasonality from a time series.
  • By filtering

  • By differencing

By Filtering

The series {xt} is converted into another called {yt} with the linear operation shown here, where {ar} is a set of weights:
$$ Y_t = \sum_{r=-q}^{+s} a_r\,X_{t+r} $$

To smooth out local fluctuations and estimate the local mean, you should clearly choose the weights so that ∑ar = 1; then the operation is often referred to as a moving average. Moving averages are often symmetric, with s = q and a_j = a_{-j}. The simplest example of a symmetric smoothing filter is the simple moving average, for which a_r = 1/(2q+1) for r = -q, …, +q.

The smoothed value of xt is given by the following:
$$ Sm(x_t) = \frac{1}{2q+1}\sum_{r=-q}^{+q} x_{t+r} $$

The simple moving average is useful for removing seasonal variations, but it is unable to deal well with trends.

By Differencing

Differencing is widely used and often works well. Seasonal differencing removes seasonal variation.

Mathematically, if time series y(t) contains additive seasonality S(t) with time period T, then:
$$ \begin{aligned} y(t) &= a\,S(t) + b\,t + c \\ y(t+T) &= a\,S(t+T) + b\,(t+T) + c \\ z(t) &= y(t+T) - y(t) = b\,T + \text{noise term} \end{aligned} $$

Similar to trends, you can convert the multiplicative seasonality to additive by log transformation.

Now, finding the time period T in a time series is the critical part. It can be done in two ways: either by using the autocorrelation function in the time domain or by using the Fourier transform in the frequency domain. In both cases, you will see a spike in the plot. For the autocorrelation function, the spike will be at lag T, whereas for the Fourier transform, the spike will be at frequency 1/T.
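Here is a minimal sketch of this procedure using the autocorrelation function from statsmodels. The synthetic monthly series and the peak-picking heuristic are illustrative assumptions; in practice you would inspect the correlogram for the dominant spike.
import numpy as np
from statsmodels.tsa.stattools import acf

t = np.arange(240)
y = 10 * np.sin(2 * np.pi * t / 12) + np.random.normal(0, 0.5, size=t.size)

r = acf(y, nlags=40)
T = int(np.argmax(r[1:]) + 1)    # crude heuristic: largest spike beyond lag 0
print("estimated period:", T)    # close to 12 for this series

z = y[T:] - y[:-T]               # seasonal differencing at lag T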

Transformation

Up to now I have discussed the various kinds of transformation in a time series. The three main reasons for making a transformation are covered in the next sections.

To Stabilize the Variance

The standard way to do this is to take a logarithmic transformation of the series; it pulls together points that are widely scattered.

To Make the Seasonal Effect Additive

If the series has a trend and the size of the seasonal effect appears to increase with the mean, then it may be advisable to transform the data so as to make the seasonal effect constant from year to year. Such a seasonal effect is said to be additive. If, instead, the size of the seasonal effect is directly proportional to the mean, the seasonal effect is said to be multiplicative, and a logarithmic transformation is needed to make it additive again.

To Make the Data Distribution Normal

In most probability models, it is assumed that the distribution of the data is Gaussian, or normal. For example, there can be evidence of skewness in a trend that causes "spikes" in the time plot that are all in the same direction.

To transform the data to a normal distribution, the most common transform is to subtract the mean and then divide by the standard deviation. I gave an example of this transformation in the RNN example in Chapter 5; I'll give another in the final example of the current chapter. The logic behind this transformation is that it makes the mean 0 and the standard deviation 1, which are characteristics of a normal distribution. Another popular transformation is to use the logarithm. The major advantages of the logarithm are that it reduces the variation and that the logarithm of log-normally distributed data is Gaussian. A transformation may also be problem-specific or domain-specific. For instance, in a time series of an airline's passenger load data, the series can be normalized by dividing by the number of days in the month or by the number of holidays in a month.
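Here is a minimal sketch of both transformations: z-score standardization (mean 0, standard deviation 1) and the log transform. The sample values are illustrative.
import numpy as np

x = np.array([112.0, 118.0, 132.0, 129.0, 121.0, 135.0, 148.0, 148.0])
z = (x - x.mean()) / x.std()       # standardized series: mean 0, std 1
log_x = np.log(x)                  # damps growing variation
print(round(z.mean(), 6), round(z.std(), 6))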

Cyclic Variation

In some time series, seasonality is not a constant but a stochastic variable. That is known as cyclic variation. In this case, the periodicity first has to be estimated and then removed in the same way as seasonal variation.

Irregular Fluctuations

A time series without trends and cyclic variations can be realized as a weakly stationary time series. In the next section, you will examine various probabilistic models for realizing weakly stationary time series.

Stationary Time Series

Normally, a time series is said to be stationary if there is no systematic change in mean and variance and if strictly periodic variations have been removed. In real life, there are no truly stationary time series; whatever data you receive, you can use transformations to bring it closer to a stationary series.

Stationary Process

A time series is strictly stationary if the joint distribution of X(t1),...,X(tk) is the same as the joint distribution of X(t1 + τ),...,X(tk + τ) for all t1,…,tk, τ. If k = 1, strict stationarity implies that the distribution of X(t) is the same for all t, so, provided the first two moments are finite, you have the following:
$$ \mu(t) = \mu $$
$$ \sigma^2(t) = \sigma^2 $$

They are both constants, which do not depend on the value of t.

A weakly stationary time series is a stochastic process whose mean is constant and whose autocovariance is a function of the time lag only.

Autocorrelation and the Correlogram

Quantities called sample autocorrelation coefficients act as an important guide to the properties of a time series. They evaluate the correlation, if any, between observations at different distances apart and provide valuable descriptive information. You will see that they are also an important tool in model building and often provide valuable clues for a suitable probability model for a given set of data. The quantity lies in the range [-1, 1] and measures the strength of the linear association between the two variables. It can easily be shown that the value does not depend on the units in which the two variables are measured; if the variables are independent, the theoretical correlation is zero.

A helpful supplement in interpreting a set of autocorrelation coefficients is a graph called a correlogram. The correlogram may be alternatively called the sample autocorrelation function.

Suppose a stationary stochastic process X(t) has mean μ, variance σ², autocovariance function (acv.f.) γ(τ), and autocorrelation function (ac.f.) ρ(τ).
$$ \rho(\tau) = \frac{\gamma(\tau)}{\gamma(0)} = \gamma(\tau)/\sigma^2 $$

Estimating Autocovariance and Autocorrelation Functions

In the stochastic process, the autocovariance is the covariance of the process with itself at pairs of time points. Autocovariance is calculated as follows:
$$ \gamma(h) = \frac{1}{n}\sum_{t=1}^{n-|h|}\left(x_{t+|h|} - \bar{x}\right)\left(x_t - \bar{x}\right), \quad -n < h < n $$
Figure 6-2 shows a sample autocorrelation distribution.

A pair of simulated discrete-time sequences showing the autocorrelation and partial autocorrelation functions; both plots have their highest peak, 1.0, at lag 0.

Figure 6-2

Sample autocorrelations
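Before moving on, here is a minimal sketch that computes the sample autocovariance and autocorrelation directly from the formula above. The toy data values are an illustrative assumption.
import numpy as np

def autocovariance(x, h):
    # sample autocovariance at lag h, dividing by n as in the formula above
    x = np.asarray(x, dtype=float)
    n, mean = len(x), x.mean()
    h = abs(h)
    return np.sum((x[h:] - mean) * (x[:n - h] - mean)) / n

def autocorrelation(x, h):
    return autocovariance(x, h) / autocovariance(x, 0)

data = [10, 5, 8, 9, 15, 22, 26, 11, 15, 16, 18, 7]
print([round(autocorrelation(data, h), 3) for h in range(5)])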

Time-Series Analysis with Python

Statsmodels is a Python package that complements SciPy for statistical computations, including descriptive statistics and estimation of statistical models. Besides the earlier models (linear regression, robust linear models, generalized linear models, and models for discrete data), the latest release of statsmodels includes some basic tools and models for time-series analysis, such as descriptive statistics, statistical tests, and several linear model classes. The linear model classes include autoregressive (AR), autoregressive moving-average (ARMA), and vector autoregressive (VAR) models.

Useful Methods

Let’s start with a moving average.

Moving Average Process

Suppose that {Zt} is a purely random process with mean 0 and variance σ_z². Then a process {Xt} is said to be a moving-average process of order q if:
$$ X_t = \beta_0 Z_t + \beta_1 Z_{t-1} + \dots + \beta_q Z_{t-q} $$
Here, {βi} are constants. The Zs are usually scaled so that β0 = 1.
$$ E(X_t) = 0, \qquad \operatorname{Var}(X_t) = \sigma_Z^2 \sum_{i=0}^{q} \beta_i^2 $$
The Zs are independent.
$$ \begin{aligned} \gamma(k) &= \operatorname{Cov}\left(X_t, X_{t+k}\right) \\ &= \operatorname{Cov}\left(\beta_0 Z_t + \dots + \beta_q Z_{t-q},\ \beta_0 Z_{t+k} + \dots + \beta_q Z_{t+k-q}\right) \\ &= \begin{cases} 0 & k > q \\ \sigma_Z^2 \sum_{i=0}^{q-k} \beta_i \beta_{i+k} & k = 0, 1, \dots, q \\ \gamma(-k) & k < 0 \end{cases} \end{aligned} $$
$$ \operatorname{Cov}\left(Z_s, Z_t\right) = \begin{cases} \sigma_Z^2 & s = t \\ 0 & s \neq t \end{cases} $$
As γ(k) is not dependent on t and the mean is constant, the process is second-order stationary for all values of {βi}.
$$ \rho(k) = \begin{cases} 1 & k = 0 \\ \sum_{i=0}^{q-k} \beta_i \beta_{i+k} \Big/ \sum_{i=0}^{q} \beta_i^2 & k = 1, \dots, q \\ 0 & k > q \\ \rho(-k) & k < 0 \end{cases} $$
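Here is a minimal sketch that simulates an MA(2) process and checks numerically that its sample autocorrelation is close to zero beyond lag q = 2, as the formula above predicts. The coefficients are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(42)
z = rng.normal(0, 1.0, size=5000)              # purely random process {Z_t}
beta = [1.0, 0.6, 0.3]                         # beta_0, beta_1, beta_2
x = np.convolve(z, beta, mode='valid')         # MA(2): X_t = sum_i beta_i Z_{t-i}

def sample_acf(x, h):
    x = x - x.mean()
    return np.sum(x[h:] * x[:len(x) - h]) / np.sum(x * x)

print([round(sample_acf(x, h), 3) for h in range(6)])   # roughly 0 for lags > 2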

Fitting Moving Average Process

The moving-average (MA) model is a well-known approach for realizing a single-variable weakly stationary time series (see Figure 6-3). The moving-average model specifies that the output variable depends linearly on its own previous error terms as well as on a stochastic term. Together with the autoregressive (AR) model, it is a special case and a key component of the more general ARMA and ARIMA models of time series.
$$ X_t = \varepsilon_t + \sum_{i=1}^{p} \varphi_i X_{t-i} + \sum_{i=1}^{q} \theta_i \varepsilon_{t-i} + \sum_{i=1}^{b} \eta_i d_{t-i} $$

A graph of price versus period, plotting the actual series and its moving average as two increasing trends.

Figure 6-3

Example of moving average

Here’s the example code for a moving-average model:
Please install NumPy with the following command:
pip install numpy
import numpy as np

def moving_average(x, w):
    # convolve with a window of w equal weights and divide by w;
    # 'valid' keeps only positions where the window fully overlaps the data
    return np.convolve(x, np.ones(w), 'valid') / w

data = np.array([10, 5, 8, 9, 15, 22, 26, 11, 15, 16, 18, 7])
print(moving_average(data, 4))

Autoregressive Processes

Suppose {Zt} is a purely random process with mean 0 and variance σ_z². Then a process {Xt} is said to be an autoregressive process of order p if you have this:
$$ x_t = \alpha_1 x_{t-1} + \dots + \alpha_p x_{t-p} + z_t, \quad \text{or} \quad x_t = \sum_{i=1}^{p} \alpha_i x_{t-i} + z_t $$
For a first-order process (p = 1), the autocovariance function is given by the following:
$$ \gamma(k) = \frac{\alpha^{k} \sigma_z^2}{1 - \alpha^2}, \quad k = 0, 1, 2, \dots, \ \text{hence} $$
$$ \rho_k = \frac{\gamma(k)}{\gamma(0)} = \alpha^{k} $$
Figure 6-4 shows a time series and its autocorrelation plot of the AR model.

A simulated time series from an AR model, plotting a fluctuating trend with multiple crests and troughs.

Figure 6-4

A time series and AR model

Estimating Parameters of an AR Process

A process is called weakly stationary if its mean is constant and the autocovariance function depends only on the time lag. No real-life process is exactly weakly stationary, but the assumption is imposed on time-series data to allow some stochastic analysis. Suppose Z(t) is a weakly stationary process with mean 0 and constant variance. Then X(t) is an autoregressive process of order p if you have the following:
$$ X(t) = a_1\,X(t-1) + a_2\,X(t-2) + \dots + a_p\,X(t-p) + Z(t), \quad \text{where } a_i \in \mathbb{R} \text{ and } p \in \mathbb{N} $$
Now, E[X(t)] is the expected value of X(t).
$$ \begin{aligned} \operatorname{Covariance}\big(X(t), X(t+h)\big) &= E\Big[\big(X(t) - E[X(t)]\big)\,\big(X(t+h) - E[X(t+h)]\big)\Big] \\ &= E\big[\big(X(t) - m\big)\,\big(X(t+h) - m\big)\big] \end{aligned} $$
If X(t) is a weakly stationary process, then:
$$ \begin{aligned} E[X(t)] &= E[X(t+h)] = m \ (\text{constant}) \\ \operatorname{Cov}\big(X(t), X(t+h)\big) &= E\big[X(t)\,X(t+h)\big] - m^2 = c(h) \end{aligned} $$

Here, m is constant, and cov[X(t), X(t+h)] is a function of h only, written c(h), for the weakly stationary process. c(h) is known as the autocovariance.

Similarly, the correlation of (X(t), X(t+h)), ρ(h) = r(h) = c(h)/c(0), is known as the autocorrelation.

If X(t) is a stationary process that is realized as an autoregressive model, then:
$$ X(t) = a_1\,X(t-1) + a_2\,X(t-2) + \dots + a_p\,X(t-p) + Z(t) $$

Correlation(X(t), X(t)) = a1 * correlation(X(t), X(t-1)) + … + ap * correlation(X(t), X(t-p)) + 0

As the covariance of (X(t), X(t+h)) depends only on h:
$$ r_0 = a_1 r_1 + a_2 r_2 + \dots + a_p r_p $$
$$ r_1 = a_1 r_0 + a_2 r_1 + \dots + a_p r_{p-1} $$

So, for a p-order model, you can generate p such equations and find the p coefficients by solving that system of equations.

In this case, realize the data sets only with the first-order and second-order autoregressive models and choose the model whose mean residual is smaller. For that, the reduced formulas are as follows (see the sketch after this list):
  • First order: a1 = r1

  • Second order: $$ a_1 = \frac{r_1\left(1 - r_2\right)}{1 - r_1^2}, \qquad a_2 = \frac{r_2 - r_1^2}{1 - r_1^2} $$
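Here is a minimal sketch that applies these reduced formulas to sample autocorrelations computed from a toy series. The data values are illustrative assumptions.
import numpy as np

def sample_autocorr(x, h):
    x = np.asarray(x, dtype=float) - np.mean(x)
    return np.sum(x[h:] * x[:len(x) - h]) / np.sum(x * x)

data = np.array([10, 5, 8, 9, 15, 22, 26, 11, 15, 16, 18, 7], dtype=float)
r1, r2 = sample_autocorr(data, 1), sample_autocorr(data, 2)

a1_first_order = r1                                   # first-order model: a1 = r1
a1 = r1 * (1 - r2) / (1 - r1 ** 2)                    # second-order model coefficients
a2 = (r2 - r1 ** 2) / (1 - r1 ** 2)
print(a1_first_order, a1, a2)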

Here is some example code for an autoregressive model:
Please download the required CSV file from https://github.com/pratips/book-Advanced-Data-Analytics-Using-Python and install the Python libraries using the pip command.
from pandas import read_csv
from matplotlib import pyplot
from statsmodels.tsa.ar_model import AutoReg
from sklearn.metrics import mean_squared_error
from math import sqrt
# load dataset
series = read_csv('temperatures.csv', header=0, index_col=0, parse_dates=True).squeeze("columns")  # squeeze the single column to a Series (the squeeze= keyword was removed in newer pandas)
# split dataset
X = series.values
train, test = X[1:len(X)-7], X[len(X)-7:]
# train autoregression
model = AutoReg(train, lags=29)
model_fit = model.fit()
print('Coefficients: %s' % model_fit.params)
# make predictions
predictions = model_fit.predict(start=len(train), end=len(train)+len(test)-1, dynamic=False)
for i in range(len(predictions)):
    print('predicted=%f, expected=%f' % (predictions[i], test[i]))
rmse = sqrt(mean_squared_error(test, predictions))
print('Test RMSE: %.3f' % rmse)
# plot results
pyplot.plot(test)
pyplot.plot(predictions, color='red')
pyplot.show()

Mixed ARMA Models

Mixed ARMA models are a combination of MA and AR processes. A mixed autoregressive/moving average process containing p AR terms and q MA terms is said to be an ARMA process of order (p,q). It is given by the following:
$$ X_t = \alpha_1 X_{t-1} + \dots + \alpha_p X_{t-p} + Z_t + \beta_1 Z_{t-1} + \dots + \beta_q Z_{t-q} $$

The following example code was adapted from the statsmodels site to realize time-series data as an ARMA model.

Please install matplotlib, statsmodels, and numpy using the pip command before running the code.
import matplotlib.pyplot as plt
import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess
import statsmodels.api as sm
np.random.seed(1234)
# lag-polynomial coefficients for the AR and MA parts (zero-lag coefficient first)
data = np.array([1, .85, -.43, -.63, .8])
parameter = np.array([1, .41])
model = ArmaProcess(data, parameter)
fig = plt.figure(figsize=(12, 8))
ax = fig.add_subplot(111)
ax.plot(model.generate_sample(nsample=50))
plt.show()
Here is how to estimate the parameters of an ARMA model:
  1. After specifying the order of a stationary ARMA process, you need to estimate the parameters.

  2. Assume the following:
    • The model order (p and q) is known.

    • The data has zero mean.

  3. If step 2 is not a reasonable assumption, you can subtract the sample mean Ȳ and fit a zero-mean ARMA model, as in φ(B)X_t = θ(B)a_t where X_t = Y_t − Ȳ. Then use X̂_t + Ȳ as the model for Y_t, as shown in the sketch after this list.
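Here is a minimal sketch of step 3, using the statsmodels ARIMA class with d = 0 to fit a zero-mean ARMA model and adding the sample mean back when forecasting. The order (2, 1) and the synthetic series are illustrative assumptions.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
z = rng.normal(0, 1.0, size=200)
y = np.zeros(200)
for t in range(1, 200):
    y[t] = 0.6 * y[t - 1] + z[t]      # a synthetic AR(1)-like series
y = y + 50                            # shift so the mean is clearly nonzero

y_bar = y.mean()
x = y - y_bar                         # X_t = Y_t - Y_bar, approximately zero mean
model_fit = ARIMA(x, order=(2, 0, 1), trend='n').fit()   # zero-mean ARMA(2, 1)
forecast_y = model_fit.forecast(steps=5) + y_bar          # add the mean back for Y_t
print(forecast_y)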

     

Integrated ARMA Models

To fit a stationary model such as the one discussed earlier, it is imperative to remove nonstationary sources of variation. Differencing is widely used for econometric data. If X_t is replaced by ∇^d X_t, then you have a model capable of describing certain types of nonstationary series.
$$ Y_t = \left(1 - L\right)^d X_t $$
These are the estimating parameters of an ARIMA model:
  • ARIMA models are designated by the level of autoregression, integration, and moving averages.

  • ARIMA models do not assume any particular pattern; they use an iterative approach to identify a possible model.

  • The model “fits” if residuals are generally small, randomly distributed, and, in general, contain no useful information.

Here is the example code for an ARIMA model.

Please download the required CSV file from https://github.com/pratips/book-Advanced-Data-Analytics-Using-Python and install the Python libraries using the pip command.
# importing libraries
from pandas import read_csv
from matplotlib import pyplot
from statsmodels.tsa.arima.model import ARIMA
from sklearn.metrics import mean_squared_error
# reading csv
series = read_csv('temperatures.csv', header=0, index_col=0, parse_dates=True).squeeze("columns")  # squeeze the single column to a Series (the squeeze= keyword was removed in newer pandas)
P = series.values
size = int(len(P) * 0.9)
#splitting into train test set
train, test = P[0:size], P[size:len(P)]
history = [p for p in train]
predictions = list()
for t in range(len(test)):
   # fitting model
    model = ARIMA(history, order=(5,1,0))
    model_fit = model.fit()
    output = model_fit.forecast()
    yhat = output[0]
    predictions.append(yhat)
    obs = test[t]
    history.append(obs)
    print('predicted=%f, expected=%f' % (yhat, obs))
error = mean_squared_error(test, predictions)
print('Test MSE: %.3f' % error)
# plotting results
pyplot.plot(test)
pyplot.plot(predictions, color='red')
pyplot.show()

The Fourier Transform

The representation of nonperiodic signals by everlasting exponential signals can be accomplished by a simple limiting process, and I will illustrate that nonperiodic signals can be expressed as a continuous sum (integral) of everlasting exponential signals. Say you want to represent the nonperiodic signal g(t). Realizing any nonperiodic signal as a periodic signal with an infinite time period, you get the following:
$$ g(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty} G(\omega)\, e^{j\omega t}\, d\omega $$
$$ G(n\Delta\omega) = \lim_{T_0 \to \infty} \int_{-T_0/2}^{T_0/2} g_p(t)\, e^{-jn\Delta\omega t}\, dt $$
Hence:
$$ G(\omega) = \int_{-\infty}^{\infty} g(t)\, e^{-j\omega t}\, dt $$

G(ω) is known as the Fourier transform of g(t).

Here is the relation between autocovariance and the Fourier transform:
$$ \gamma(0) = \sigma_X^2 = \int_0^{\pi} dF(\omega) = F(\pi) $$

An Exceptional Scenario

In the airline or hotel domain, the passenger load in month t is less correlated with the data for months t−1 or t−2 than with the data for month t−12. For example, the passenger load in the month of Diwali (October) is more correlated with last year's Diwali data than with the same year's August and September data. Historically, the pick-up model has been used to predict this kind of data. The pick-up model has two variations.

In the additive pick-up model,
$$ X(t) = X(t-1) + \left[X(t-12) - X(t-13)\right] $$
In the multiplicative pick-up model,
$$ X(t) = X(t-1) \times \frac{X(t-12)}{X(t-13)} $$
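Here is a minimal sketch of both pick-up forecasts for a monthly series, using last year's month-over-month change (or ratio). The monthly loads are illustrative values, not real airline data.
def pickup_forecast(loads, additive=True):
    # loads[-1] is month t-1, loads[-12] is month t-12, loads[-13] is month t-13
    if additive:
        return loads[-1] + (loads[-12] - loads[-13])
    return loads[-1] * (loads[-12] / loads[-13])

monthly_loads = [112, 118, 132, 129, 121, 135, 148, 148, 136, 119, 104, 118,
                 115, 126, 141, 135, 125, 149, 170, 170, 158, 133, 114, 140]
print(pickup_forecast(monthly_loads, additive=True))
print(pickup_forecast(monthly_loads, additive=False))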

Studies have shown that for this kind of data the neural network–based predictor gives more accuracy than the time-series model.

In high-frequency trading in investment banking, time-series models are too time-consuming to capture the latest pattern of the instrument. So, traders calculate dX/dt and d²X/dt² on the fly, where X is the price of the instrument. If both are positive, they blindly send an order to buy the instrument. If both are negative, they blindly sell the instrument if they have it in their portfolio. If the two have opposite signs, they do a more detailed analysis using the time-series data.
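Here is a minimal sketch of that on-the-fly rule, approximating dX/dt and d²X/dt² with discrete differences of the latest prices. The prices and the decision labels are illustrative assumptions.
def momentum_signal(prices):
    d1 = prices[-1] - prices[-2]                      # approximates dX/dt
    d2 = prices[-1] - 2 * prices[-2] + prices[-3]     # approximates d2X/dt2
    if d1 > 0 and d2 > 0:
        return "buy"
    if d1 < 0 and d2 < 0:
        return "sell"
    return "analyze"          # opposite signs: fall back to time-series analysis

print(momentum_signal([100.0, 100.4, 101.1]))   # buy
print(momentum_signal([101.1, 100.6, 100.4]))   # analyze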

As I stated earlier, there are many scenarios in time-series analysis where R is a better choice than Python. The following example, however, uses pmdarima, a Python package that ports R's auto.arima functionality. The beauty of the auto.arima model is that it automatically finds the order, trends, and seasonality of the data and fits the model. In the forecast, we print only the mean value, but the model also provides the upper and lower limits of the prediction.

Please install the pmdarima library by using the following command:
pip install pmdarima
Once it is installed, you can run the following code, which is based on pmdarima's manual. It loads a data set available in the package, fits an auto-ARIMA model, and makes predictions.
import numpy as np
import pmdarima as pm
from pmdarima.datasets import load_wineind
# loading dataset
wineind = load_wineind().astype(np.float64)
# fitting a stepwise model:
stepwise_fit = pm.auto_arima(wineind, start_p=1, start_q=1, max_p=3, max_q=3, m=12, start_P=0, seasonal=True, d=1, D=1, trace=True, error_action='ignore', suppress_warnings=True, stepwise=True)
stepwise_fit.summary()
predicted_15 = stepwise_fit.predict(n_periods=15)
print(predicted_15)

Missing Data

One important aspect of time series, and of many other data analysis tasks, is figuring out how to deal with missing data. A simple approach is to fill in each missing record with the average value of the series. This is fine when the number of missing data instances is not very high. But if it is high, then the average of the highest and lowest values is a better alternative.
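Here is a minimal sketch of both fills with pandas. The series values are illustrative assumptions.
import pandas as pd
import numpy as np

s = pd.Series([112.0, np.nan, 132.0, 129.0, np.nan, 135.0])

filled_mean = s.fillna(s.mean())                     # few missing values: series mean
filled_mid = s.fillna((s.max() + s.min()) / 2.0)     # many missing values: midpoint of extremes
print(filled_mean.tolist())
print(filled_mid.tolist())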

Summary

Following feature engineering, this chapter went over some basic statistics, particularly time-series models. One thing to keep in mind is that you can apply any supervised machine learning model to time-series data if you transform the feature vector (1×n) into a matrix (k×n) in which each row contains the latest k observations of the original column, as the sketch below illustrates.
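Here is a minimal sketch of that transformation: each row holds the latest k observations, and the target is the next value. The series and the window size k are illustrative assumptions.
import numpy as np

def make_lag_matrix(series, k):
    series = np.asarray(series, dtype=float)
    X = np.array([series[i:i + k] for i in range(len(series) - k)])  # sliding windows
    y = series[k:]                                                   # next value after each window
    return X, y

X, y = make_lag_matrix([10, 5, 8, 9, 15, 22, 26, 11, 15, 16], k=3)
print(X.shape, y.shape)      # (7, 3) feature rows with 7 aligned targets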
