Chapter 2: Regression Analysis for Time Series Data

Introduction

The Data Series

Durbin-Watson Test Using PROC REG

Definition of the Durbin-Watson Test Statistic

Procedure Output

Cochrane-Orcutt Estimation

Conclusion

Introduction

This chapter presents a simple, naive example of an ordinary regression using time series data. The results from this analysis can lead to unrealistic assumptions. Even when some of the errors are eliminated by the application of more refined techniques, the conclusion is doubtful. In practice, many regression models for time series data produce similar results. This chapter presents an analysis that is obviously in error in order to set the scene for properly modeling the dynamics of time series in later chapters.

The Data Series

The example in this chapter uses quarterly data for the milk production in the United States, measured in millions of pounds, as the dependent variable and the number of milk cows as the independent variable. This regression can be understood as a calculation of the milk production per cow in the form of the estimated regression coefficient. Quarterly dummies are applied in the regression because the relation might be affected by weather conditions. The data set includes data from 1998Q1 to 2012Q4, giving a total of T = 60 observations.

The series are plotted by the code in Program 2.1.

Program 2.1: Plotting the Two Time Series in an Overlaid Plot

    PROC SGPLOT DATA=SASMTS.QUARTERLY_MILK;
      /* Milk production on the left axis */
      SERIES Y=PRODUCTION X=DATE / MARKERS MARKERATTRS=(SYMBOL=CIRCLE COLOR=BLUE);
      /* Number of cows on the right (Y2) axis */
      SERIES Y=COWS X=DATE / MARKERS MARKERATTRS=(SYMBOL=TRIANGLE COLOR=RED) Y2AXIS;
    RUN;

Figure 2.1 shows that the series for milk production has a clear seasonal pattern, while seasonality is seemingly absent from the series for the number of cows. Moreover, milk production is clearly trending upward, while the number of cows varies in a cyclic way.

Figure 2.1: Plots of the Time Series of Milk Production and the Number of Cows in the United States

Durbin-Watson Test Using PROC REG

For this data set, you apply PROC REG, using production, denoted y, as the dependent variable and the number of cows, denoted x, as the independent variable. In mathematical terms, the model is written as follows:

$y_t=\alpha+\beta x_t+\delta_1 q_{1t}+\delta_2 q_{2t}+\delta_3 q_{3t}+\varepsilon_t$

The parameterization includes the dummy variables Q1, Q2, and Q3 for the first three quarters, leaving the intercept, α, as the level for the fourth quarter. These dummies are defined by letting, for example, Q1t = 1 for the first quarter and Q1t = 0 for the remaining quarters. The parameter β could, in naive terms, be interpreted as the milk production per cow or, more precisely, taking the units of measurement into account, as the milk production in millions of pounds per one thousand cows. Program 2.2 estimates this naive model using PROC REG.

Program 2.2: Durbin-Watson Test Using PROC REG

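    PROC REG DATA=SASMTS.QUARTERLY_MILK PLOTS=ALL;
      /* DWPROB requests the p-value for the Durbin-Watson test */
      MODEL PRODUCTION=COWS Q1 Q2 Q3 / DWPROB;
      ID DATE;
      /* Joint test that all three seasonal dummies are zero */
      TEST Q1=Q2=Q3=0;
    RUN;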
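The dummy variables Q1, Q2, and Q3 are taken as given in the data set. If they had to be built from a date variable, a DATA step along the following lines might do it. This is only a sketch on toy data: the data set name DUMMY_DEMO and the five input dates are made up for illustration, and it assumes the dates are stored as SAS date values.

```sas
/* Sketch: quarterly dummies from a SAS date value (toy data, not the milk data). */
DATA DUMMY_DEMO;
  INPUT DATE :YYQ6.;          /* reads values such as 1998Q1 as SAS dates */
  FORMAT DATE YYQ6.;
  /* A logical expression evaluates to 1 when true and 0 when false. */
  Q1 = (QTR(DATE) = 1);
  Q2 = (QTR(DATE) = 2);
  Q3 = (QTR(DATE) = 3);
  /* No dummy for the fourth quarter: it is absorbed by the intercept. */
  DATALINES;
1998Q1
1998Q2
1998Q3
1998Q4
1999Q1
;
RUN;

PROC PRINT DATA=DUMMY_DEMO;
RUN;
```

Leaving out the fourth-quarter dummy avoids perfect collinearity with the intercept, which is why the intercept represents the fourth-quarter level.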

In regression models that are estimated by ordinary least squares (OLS), a crucial assumption is that the remainder terms, εt, are uncorrelated. This assumption is usually harder to justify for time series data than for other types of data sets. In this example, high production in one quarter could well carry over to the next quarter because the actual cows are the same for some years.

Definition of the Durbin-Watson Test Statistic

The Durbin-Watson test statistic is defined by the following:

$DW=\frac{\sum_{t=2}^{T}(e_t-e_{t-1})^2}{\sum_{t=1}^{T}e_t^2}$

This test statistic is closely related to the first-order autocorrelation of the residuals. The first-order autocorrelation is defined as the correlation coefficient, corr(ε_t, ε_{t−1}), between a term ε_t and the previous term ε_{t−1}. In time series, a usual assumption is that the variance of the residuals ε_t is constant and that the relation expressed by the autocorrelation is constant. In other words, the variance and the autocorrelation are both assumed to be independent of the time index t. Similarly, the lag k autocorrelation is defined by corr(ε_t, ε_{t−k}).

For the residuals, e_t, which sum to zero as always for residuals from a regression model, the first-order autocorrelation is estimated by the following:

$r_1=\frac{\sum_{t=2}^{T}e_t e_{t-1}}{\sum_{t=1}^{T}e_t^2}$

By these formulas, the following approximate relation exists between the Durbin-Watson test statistic and the estimated first order autocorrelation:

DW ≈ 2(1 − r₁)
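The approximation can be made exact by expanding the square in the numerator of DW. The short derivation below uses only the two definitions above and shows which terms the approximation drops:

```latex
\begin{aligned}
\sum_{t=2}^{T}(e_t-e_{t-1})^2
  &= \sum_{t=2}^{T}e_t^2+\sum_{t=2}^{T}e_{t-1}^2-2\sum_{t=2}^{T}e_t e_{t-1} \\
  &= 2\sum_{t=1}^{T}e_t^2-e_1^2-e_T^2-2\sum_{t=2}^{T}e_t e_{t-1}, \\
DW &= 2(1-r_1)-\frac{e_1^2+e_T^2}{\sum_{t=1}^{T}e_t^2}.
\end{aligned}
```

The correction term is never negative, so DW is at most 2(1 − r₁). It is small when the first and last residuals account for only a small share of the total sum of squares.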

By definition, the Durbin-Watson statistic is bounded to the interval from 0 to 4. If the test statistic equals 2, the residuals show no first-order autocorrelation, which is a weaker property than independence. If the value is close to 4, the residuals have a negative autocorrelation, while values of the Durbin-Watson test statistic close to 0 indicate a positive autocorrelation.
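To see how the two quantities are computed, the definitions of DW and r₁ can be coded directly in a DATA step. The following is a minimal sketch on made-up residuals, not on the milk data: the data set names RESID_DEMO and DW_DEMO, the variable names, and the six residual values are all hypothetical. In practice, PROC REG reports these numbers itself.

```sas
/* Toy residuals that sum to zero (made-up numbers, for illustration only). */
DATA RESID_DEMO;
  INPUT E @@;
  DATALINES;
1.0 0.5 -0.5 -1.0 -0.5 0.5
;
RUN;

DATA DW_DEMO (KEEP=DW R1 APPROX);
  SET RESID_DEMO END=LAST;
  E_LAG = LAG(E);              /* called on every row; missing on the first row */
  SS + E**2;                   /* denominator: sum of e_t**2 for t = 1..T       */
  IF _N_ > 1 THEN DO;
    NUM_DW + (E - E_LAG)**2;   /* numerator of DW, t = 2..T */
    NUM_R1 + E * E_LAG;        /* numerator of r1, t = 2..T */
  END;
  IF LAST THEN DO;
    DW     = NUM_DW / SS;
    R1     = NUM_R1 / SS;
    APPROX = 2 * (1 - R1);     /* approximate value of DW implied by r1 */
    OUTPUT;
  END;
RUN;

PROC PRINT DATA=DW_DEMO;
RUN;
```

With only six observations, DW and the approximation 2(1 − r₁) can differ noticeably; the gap shrinks as the series gets longer.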

The distribution of the Durbin-Watson test statistic is not explicitly known. Usually, an approximation is applied in the form of tables that include a “gray zone” of inconclusive values. These tables allow for different numbers of independent variables in the model. This approximation is useful for short time series of, say, up to 30 observations. For longer time series, a calculation of the p-value by the asymptotic distribution of the first-order autocorrelation gives an acceptable approximation.

The Durbin-Watson test tests only against the possibility of first-order autocorrelation in the residuals. For quarterly data, a fourth-order autocorrelation could be expected as well. But in the present setup, where quarterly dummies are included in the model, this situation is unlikely. More importantly, second-order autocorrelation can be present even if there is no first-order autocorrelation. So acceptance of a model by the Durbin-Watson test statistic is, strictly speaking, not reason enough to conclude that no autocorrelation exists.

On the other hand, a significant Durbin-Watson test statistic can point toward model deficits other than first-order autocorrelation. So the test statistic is often just a simple way to see whether something is wrong with the model. The test statistic is often used together with other similar tests for problems like heteroscedasticity and non-normality as crude indicators for the model fit.

Procedure Output

The DWPROB option in the MODEL statement gives the Durbin-Watson test statistic and the p-value for the test. This is the classical way to test for autocorrelation in the residuals of regression models. Moreover, the first-order autocorrelation is printed. These test results are given in Output 2.1. In this situation, the autocorrelation problem is huge. The Durbin-Watson statistic, DW = .044, is close to its lower bound, which is 0, and the autocorrelation, r1 = .936, is close to its upper bound, which is 1. For this particular time series data, the test leads to a p-value very close to zero, and the hypothesis of independent residuals is clearly rejected.

Output 2.1: The Durbin-Watson Test
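As a rough plausibility check of the tiny p-value, one can use the textbook large-sample result that, under the hypothesis of uncorrelated residuals, √T·r₁ is approximately standard normal. This is an assumption made here for illustration and is not necessarily how PROC REG computes its p-value. With T = 60 and r₁ = .936 as reported above:

```latex
z=\sqrt{T}\,r_1=\sqrt{60}\times 0.936\approx 7.25
```

A standard normal value above 7 corresponds to a one-sided p-value that is zero to many decimal places, which agrees with the clear rejection.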

The conclusion is that OLS estimation is inefficient because the estimation should, preferably, be corrected for the autocorrelation. However, the estimates obtained by least squares in spite of the autocorrelation retain the attractive quality of being unbiased. So the estimated regression coefficients are often not much disturbed by residual autocorrelation. The real problem arises when testing is performed, as the printed standard errors for the estimated regression coefficients and all p-values are misleading. An intuitive way of explaining this situation is that the positive autocorrelation means that the observations are drawn from far fewer than 60 independent sources of information because the autocorrelation makes consecutive observations look alike.

The printed test results for the regression parameters are, for this reason, in error (Output 2.2). The same has to be said about the test for all seasonal dummies being zero (Output 2.3). This test is printed by the TEST statement in Program 2.2.

Output 2.2: Parameter Estimates from Ordinary Least Squares Estimation

Output 2.3: Simultaneous Test for Seasonality

Cochrane-Orcutt Estimation

Such problems are often seen when you are analyzing time series data using OLS by PROC REG. PROC REG offers no obvious solution to correct these errors. PROC REG focuses on cross-sectional data sets for which variable selection, identification of outliers, and influential data points are the main issues. But by using simple preprocessing in a DATA step, you might be able to analyze the data in a more correct way, even when using PROC REG.

The classical way in econometrics to allow for autocorrelated residuals is to apply Cochrane-Orcutt estimation. The idea is to transform the series by taking into account the estimated first-order autocorrelation for the residuals. This number, φ1 = .936, is printed in Output 2.1.

The method relies on the assumption that the residuals have the form of a first-order autoregressive, AR(1), model:

$\varepsilon_t=\varphi_1\varepsilon_{t-1}+\zeta_t$

where the remainder terms ζt are assumed to be independent and identically distributed. A series of this form has a first-order autocorrelation that equals φ1. In Chapter 6, this model is extended in many ways to a very useful class for time series data.
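The claim that the first-order autocorrelation equals φ1 follows in two lines. The sketch below additionally assumes that the series is stationary (|φ1| < 1), so that the variance of ε_t does not depend on t, and that ζ_t is uncorrelated with ε_{t−1}:

```latex
\begin{aligned}
\operatorname{cov}(\varepsilon_t,\varepsilon_{t-1})
  &= \operatorname{cov}(\varphi_1\varepsilon_{t-1}+\zeta_t,\ \varepsilon_{t-1})
   = \varphi_1\operatorname{var}(\varepsilon_{t-1}), \\
\operatorname{corr}(\varepsilon_t,\varepsilon_{t-1})
  &= \frac{\varphi_1\operatorname{var}(\varepsilon_{t-1})}
          {\sqrt{\operatorname{var}(\varepsilon_t)\operatorname{var}(\varepsilon_{t-1})}}
   = \varphi_1 .
\end{aligned}
```

Repeating the substitution k times gives a lag k autocorrelation of φ1 raised to the power k, so the correlation dies out geometrically as the lag increases.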

The regression model is then transformed in the following way:

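$\begin{aligned}
\tilde{y}_t &= y_t-\varphi_1 y_{t-1} \\
&= \alpha+\beta x_t+\delta_1 q_{1t}+\delta_2 q_{2t}+\delta_3 q_{3t}+\varepsilon_t-\varphi_1(\alpha+\beta x_{t-1}+\delta_1 q_{1,t-1}+\delta_2 q_{2,t-1}+\delta_3 q_{3,t-1}+\varepsilon_{t-1}) \\
&= \alpha(1-\varphi_1)+\beta\tilde{x}_t+\delta_1\tilde{q}_{1t}+\delta_2\tilde{q}_{2t}+\delta_3\tilde{q}_{3t}+\zeta_t
\end{aligned}$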

where

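$\tilde{y}_t=y_t-\varphi_1 y_{t-1}$

$\tilde{x}_t=x_t-\varphi_1 x_{t-1}$

$\zeta_t=\varepsilon_t-\varphi_1\varepsilon_{t-1}$

and

$\tilde{q}_{it}=q_{it}-\varphi_1 q_{i,t-1}$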

The manipulation of the data is easily coded as a DATA step followed by an application of PROC REG (Program 2.3). Note that the LAG function returns the lagged value of the series. In other words, for example, LAG(COWS) equals the number of cows in the previous quarter.

Program 2.3: Cochrane-Orcutt Estimation by a DATA Step and PROC REG

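    DATA CO_TRANSFORM;
      SET SASMTS.QUARTERLY_MILK;
      /* Quasi-differences using the estimated first-order autocorrelation, 0.936. */
      /* LAG returns a missing value on the first observation, so that row drops out. */
      Y=PRODUCTION-0.936*LAG(PRODUCTION);
      X=COWS-0.936*LAG(COWS);
      QQ1=Q1-0.936*LAG(Q1);
      QQ2=Q2-0.936*LAG(Q2);
      QQ3=Q3-0.936*LAG(Q3);
    RUN;

    PROC REG DATA=CO_TRANSFORM PLOTS=ALL;
      MODEL Y=X QQ1 QQ2 QQ3 / DW DWPROB;
      ID DATE;
      TEST QQ1=QQ2=QQ3=0;
    RUN;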

The estimated coefficient for the number of cows has changed from 17.997 to 7.796 (Output 2.4), and the standard errors of the parameter estimates are much smaller than in Output 2.2. The seasonal dummies are now significant, meaning that seasonality exists in the production of milk per cow, which is intuitive.

Output 2.4: Parameter Estimates by Cochrane-Orcutt Estimation

The autocorrelation problem is fixed according to the Durbin-Watson test statistic (Output 2.5). The method reduces the number of observations in the analysis by one, as is clearly stated in Output 2.5, because the definition of the variables in the DATA step excludes the first observation, which cannot be defined because it has no lagged value in the data set.

Output 2.5: Durbin-Watson Test for the Residuals of Cochrane-Orcutt Estimation

Such large changes in parameter estimates are usually not seen with Cochrane-Orcutt estimation when the first-order autocorrelation is around, say, .5. But in this case, φ1 = .936. This value is very close to the upper limit +1, which corresponds to a unit root. When the value φ1 = 1 is applied, it makes the whole model more dynamic in handling the quarterly changes in the two time series. This is the subject of Chapter 4, where the example is continued.
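To make the limiting case concrete, setting φ1 = 1 in the transformation amounts to subtracting the regression equation at time t − 1 from the equation at time t. The sketch below writes this out; the difference operator Δ (with Δy_t = y_t − y_{t−1}) is notation introduced here for illustration:

```latex
\begin{aligned}
y_t-y_{t-1} &= \beta(x_t-x_{t-1})+\delta_1(q_{1t}-q_{1,t-1})+\delta_2(q_{2t}-q_{2,t-1})
              +\delta_3(q_{3t}-q_{3,t-1})+(\varepsilon_t-\varepsilon_{t-1}), \\
\Delta y_t  &= \beta\,\Delta x_t+\delta_1\Delta q_{1t}+\delta_2\Delta q_{2t}
              +\delta_3\Delta q_{3t}+\zeta_t .
\end{aligned}
```

The intercept α cancels in the subtraction, so the model becomes a regression of the quarterly change in production on the quarterly change in the number of cows.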

Conclusion

This chapter demonstrates the shortcomings of regression models estimated by OLS for time series data that have autocorrelated errors. The old-fashioned tool for mending the problem, the Cochrane-Orcutt estimation algorithm, works, but it is not the final solution. Nowadays, more efficient procedures exist for full maximum likelihood estimation of all parameters in models for time series data.

For modeling multiple time series, SAS offers many other procedures that are designed especially for time series, such as PROC AUTOREG, which is a straightforward extension of PROC REG. The AUTOREG procedure will be considered in the next few chapters, but the rest of the book will concentrate on the much more specialized procedure, PROC VARMAX, which includes up-to-date models for the dynamics of multiple time series.
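As a brief preview, the regression from Program 2.2 with AR(1) errors might be specified in PROC AUTOREG roughly as follows. This is a sketch only: the choices NLAG=1 and METHOD=ML are assumptions made here for illustration and are not taken from this chapter.

```sas
/* Sketch: same regression as Program 2.2, with AR(1) errors estimated jointly. */
PROC AUTOREG DATA=SASMTS.QUARTERLY_MILK;
  /* NLAG=1 requests a first-order autoregressive error model;  */
  /* METHOD=ML estimates all parameters by maximum likelihood.  */
  MODEL PRODUCTION=COWS Q1 Q2 Q3 / NLAG=1 METHOD=ML DWPROB;
RUN;
```

Unlike the two-step approach in Program 2.3, the autoregressive parameter is estimated together with the regression coefficients, and no observation has to be dropped by hand. Note that PROC AUTOREG writes the error model with the opposite sign on the autoregressive parameters, so a negative reported AR(1) coefficient corresponds to a positive φ1 in the notation of this chapter.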
