Chapter 2: Regression Analysis for Time Series Data

Introduction

The Data Series

Durbin-Watson Test Using PROC REG

Definition of the Durbin-Watson Test Statistic

Procedure Output

Cochrane-Orcutt Estimation

Conclusion

Introduction

This chapter presents a simple, naive example of an ordinary regression using time series data. The results from this analysis can lead to unrealistic assumptions. Even when some of the errors are eliminated by the application of more refined techniques, the conclusion is doubtful. In practice, many regression models for time series data produce similar results. This chapter presents an analysis that is obviously in error in order to set the scene for properly modeling the dynamics of time series in later chapters.

The Data Series

The example in this chapter uses quarterly data for the milk production in the United States, measured in millions of pounds, as the dependent variable and the number of milk cows as the independent variable. This regression can be understood as a calculation of the milk production per cow in the form of the estimated regression coefficient. Quarterly dummies are applied in the regression because the relation might be affected by weather conditions. The data set includes data from 1998Q1 to 2012Q4, giving a total of T = 60 observations.

The series are plotted by the code in Program 2.1.

Program 2.1: Plotting the Two Time Series in an Overlaid Plot

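PROC SGPLOT DATA=SASMTS.QUARTERLY_MILK;
    /* Milk production on the left axis */
    SERIES Y=PRODUCTION X=DATE/MARKERS MARKERATTRS=(SYMBOL=CIRCLE COLOR=BLUE);
    /* Number of cows on the right (Y2) axis because the scales differ */
    SERIES Y=COWS X=DATE/MARKERS MARKERATTRS=(SYMBOL=TRIANGLE COLOR=RED) Y2AXIS;
RUN;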

Figure 2.1 shows that the series for milk production has a clear seasonal pattern, while seasonality is seemingly absent from the series for the number of cows. Moreover, milk production is clearly trending upward, while the number of cows varies in a cyclic way.

Figure 2.1: Plots of the Time Series of Milk Production and the Number of Cows in the United States

image

Durbin-Watson Test Using PROC REG

For this data set, you apply PROC REG, using production, denoted y, as the dependent variable and the number of cows, denoted x, as the independent variable. In mathematical terms, the model is written as follows:

$$y_t = \alpha + \beta x_t + \delta_1 q_{1t} + \delta_2 q_{2t} + \delta_3 q_{3t} + \varepsilon_t$$

The parameterization includes the dummy variables Q1, Q2, and Q3 for the first three quarters, leaving the intercept, α, as the level for the fourth quarter. These dummies are defined by letting, for example, q1t = 1 in the first quarter and q1t = 0 in the remaining quarters. In naive terms, the parameter β could be interpreted as the milk production per cow or, more precisely, taking the units of measurement into account, as the milk production in millions of pounds per one thousand cows. Program 2.2 estimates this naive model using PROC REG.

Program 2.2: Durbin-Watson Test Using PROC REG

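PROC REG DATA=SASMTS.QUARTERLY_MILK PLOTS=ALL;
    /* DWPROB prints the Durbin-Watson statistic together with its p-value */
    MODEL PRODUCTION=COWS Q1 Q2 Q3/DWPROB;
    ID DATE;
    /* Joint test that all three seasonal dummies are zero */
    TEST Q1=Q2=Q3=0;
RUN;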
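The data set used in Program 2.2 already contains the dummies Q1, Q2, and Q3. If they had to be built by hand, a DATA step along the following lines could do it. This is only a sketch: it assumes that DATE holds a numeric SAS date value, and the output name MILK_DUMMIES is hypothetical.

```sas
/* Sketch: assumes DATE is a numeric SAS date value */
DATA MILK_DUMMIES;              /* hypothetical output name */
    SET SASMTS.QUARTERLY_MILK;
    /* A comparison returns 1 when true and 0 when false */
    Q1=(QTR(DATE)=1);
    Q2=(QTR(DATE)=2);
    Q3=(QTR(DATE)=3);
    /* No Q4 dummy: the fourth quarter is absorbed by the intercept */
RUN;
```

Leaving out the dummy for the fourth quarter avoids perfect collinearity with the intercept, which is why α represents the fourth-quarter level.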

In regression models that are estimated by ordinary least squares (OLS), a crucial assumption is that the remainder terms, εt, should be uncorrelated. Usually, this assumption is not as obvious for time series data as it is for other types of data sets. In this example, a high production one quarter could well continue the next quarter because the actual cows are the same for some years.

Definition of the Durbin-Watson Test Statistic

The Durbin-Watson test statistic is defined by the following:

$$DW = \frac{\sum_{t=2}^{T} (e_t - e_{t-1})^2}{\sum_{t=1}^{T} e_t^2}$$

This test statistic is closely related to the first-order autocorrelation of the residuals. The first-order autocorrelation is defined as the correlation coefficient, corr(εt, εt-1), between a term εt and the previous term εt-1. For time series, a usual assumption is that the variance of the remainder terms εt is constant and that the autocorrelation is constant as well. In other words, the variance and the autocorrelation are both assumed to be independent of the time index t. Similarly, the lag k autocorrelation is defined by corr(εt, εt-k).

For the residuals, et, which sum to zero as always for residuals from a regression model, the first-order autocorrelation is estimated by the following:

$$r_1 = \frac{\sum_{t=2}^{T} e_t e_{t-1}}{\sum_{t=1}^{T} e_t^2}$$

By these formulas, the following approximate relation holds between the Durbin-Watson test statistic and the estimated first-order autocorrelation:

DW ≈ 2(1 − r1)
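The approximation follows from expanding the square in the numerator of the Durbin-Watson statistic. Writing S for the sum of squared residuals (a symbol introduced here only for this derivation), a short calculation gives the exact relation:

```latex
\[
\begin{aligned}
\sum_{t=2}^{T}(e_t-e_{t-1})^2
  &= \sum_{t=2}^{T}e_t^2+\sum_{t=2}^{T}e_{t-1}^2-2\sum_{t=2}^{T}e_t e_{t-1} \\
  &= (S-e_1^2)+(S-e_T^2)-2\sum_{t=2}^{T}e_t e_{t-1},
  \qquad S=\sum_{t=1}^{T}e_t^2 ,
\end{aligned}
\]
\[
DW = 2(1-r_1)-\frac{e_1^2+e_T^2}{S}.
\]
```

The correction term is small when the first and last residuals contribute little to the total sum of squares, which is why the relation is stated only as an approximation.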

By definition, the Durbin-Watson statistic is bound to the interval from 0 to 4. A value of 2 corresponds to an estimated first-order autocorrelation of zero, so the residuals show no first-order autocorrelation, although they are not necessarily independent. If the value is close to 4, the residuals have a negative autocorrelation, while values of the Durbin-Watson test statistic close to 0 indicate a positive autocorrelation.
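To see how the two formulas work in practice, both quantities can be computed by hand from the OLS residuals. The following is a minimal sketch rather than part of the original program set. The data set names RESIDUALS and DW_BY_HAND and the variable names are hypothetical, and the observations are assumed to be in chronological order.

```sas
/* Save the OLS residuals; NOPRINT suppresses the usual output */
PROC REG DATA=SASMTS.QUARTERLY_MILK NOPRINT;
    MODEL PRODUCTION=COWS Q1 Q2 Q3;
    OUTPUT OUT=RESIDUALS R=E;       /* E holds the residuals e(t) */
RUN;
QUIT;

DATA DW_BY_HAND;
    SET RESIDUALS END=LAST;
    E_LAG=LAG(E);                   /* e(t-1), missing for t=1 */
    SUM_SQ+E**2;                    /* sum of e(t)**2 for t=1..T */
    IF _N_>1 THEN DO;
        SUM_DIFF_SQ+(E-E_LAG)**2;   /* numerator of DW, t=2..T */
        SUM_CROSS+E*E_LAG;          /* numerator of r1, t=2..T */
    END;
    IF LAST THEN DO;
        DW=SUM_DIFF_SQ/SUM_SQ;
        R1=SUM_CROSS/SUM_SQ;
        APPROX=2*(1-R1);            /* approximate value of DW */
        OUTPUT;                     /* write one summary row only */
    END;
    KEEP DW R1 APPROX;
RUN;

PROC PRINT DATA=DW_BY_HAND;
RUN;
```

The LAG function is called on every observation, outside the IF block, because calling it conditionally would change which values it returns.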

The distribution of the Durbin-Watson test statistic is not explicitly known. Usually, an approximation is applied in the form of tables that include a “gray zone” of inconclusive values. These tables allow for different numbers of independent variables in the model. This approximation is useful for short time series of, say, up to 30 observations. For longer time series, a calculation of the p-value by the asymptotic distribution of the first-order autocorrelation gives an acceptable approximation.

The Durbin-Watson test tests only against the possibility of first-order autocorrelation in the residuals. For quarterly data, a fourth-order autocorrelation could be expected as well. But in the present setup, where quarterly dummies are included in the model, this situation is unlikely. More importantly, second-order autocorrelation can be present even if there is no first-order autocorrelation. So acceptance of a model by the Durbin-Watson test statistic is, strictly speaking, not reason enough to conclude that no autocorrelation exists.
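One way to look beyond lag 1 is through the generalized Durbin-Watson statistics in PROC AUTOREG, a procedure that the chapter only mentions in its conclusion. The following minimal sketch is not taken from the original programs. It uses the DW= option of the MODEL statement to request the statistics up to lag 4, which is the natural choice for quarterly data.

```sas
PROC AUTOREG DATA=SASMTS.QUARTERLY_MILK;
    /* DW=4 requests Durbin-Watson statistics for orders 1 to 4; */
    /* DWPROB adds the corresponding p-values                    */
    MODEL PRODUCTION=COWS Q1 Q2 Q3/DW=4 DWPROB;
RUN;
```

Even then, each statistic addresses one lag at a time, so the caution expressed above still applies.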

On the other hand, a significant Durbin-Watson test statistic can point toward model deficits other than first-order autocorrelation. So the test statistic is often just a simple way to see whether something is wrong with the model. The test statistic is often used together with other similar tests for problems like heteroscedasticity and non-normality as crude indicators for the model fit.

Procedure Output

The DWPROB option in the MODEL statement gives the Durbin-Watson test statistic and the p-value for the test. This is the classical way to test for autocorrelation in the residuals of regression models. Moreover, the first-order autocorrelation is printed. These test results are given in Output 2.1. In this situation, the autocorrelation problem is huge. The Durbin-Watson statistic, DW = .044, is close to its lower bound of zero, and the autocorrelation, r1 = .936, is close to its upper bound of 1. For this particular time series data, the test leads to a p-value very close to zero, and the hypothesis of independent residuals is clearly rejected.

Output 2.1: The Durbin-Watson Test

image

The conclusion is that OLS estimation is inefficient because the estimation should, preferably, be corrected for the autocorrelation. However, the estimates obtained by least squares in spite of the autocorrelation retain the attractive quality of being unbiased. So the estimated values of the regression coefficients are often not much disturbed by residual autocorrelation. The real problem arises when testing is performed, as the printed standard deviations for the estimated regression coefficients and all p-values are misleading. An intuitive way of explaining this situation is that the positive autocorrelation means that the observations are drawn from far fewer than 60 independent sources of information because the autocorrelation makes consecutive observations look alike.
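The loss of information can be given a rough order of magnitude. As an illustration only, assume that the remainder terms behave like a stationary first-order autoregressive series with autocorrelation ρ (a symbol introduced here for this sketch). For the mean of such a series, a standard large-sample result says that T correlated observations carry about as much information as the following number of independent ones:

```latex
\[
T_{\text{eff}} \approx T\,\frac{1-\rho}{1+\rho},
\qquad\text{so with } T = 60,\ \rho = .936:\quad
T_{\text{eff}} \approx 60 \times \frac{0.064}{1.936} \approx 2 .
\]
```

The formula applies to a sample mean rather than to regression coefficients, and it is rough when ρ is this close to 1. The number should therefore be read only as an indication that far fewer than 60 independent observations are effectively available.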

The printed test results for the regression parameters are, for this reason, in error (Output 2.2). The same has to be said about the test for all seasonal dummies being zero (Output 2.3). This test is printed by the TEST statement in Program 2.2.

Output 2.2: Parameter Estimates from Ordinary Least Squares Estimation

image

Output 2.3: Simultaneous Test for Seasonality

image

Cochrane-Orcutt Estimation

Such problems are often seen when you are analyzing time series data using OLS by PROC REG. PROC REG offers no obvious solution to correct these errors. PROC REG focuses on cross-sectional data sets for which variable selection, identification of outliers, and influential data points are the main issues. But by using simple preprocessing in a DATA step, you might be able to analyze the data in a more correct way, even when using PROC REG.

The classical approach in econometrics to allowing for autocorrelated residuals is Cochrane-Orcutt estimation. The idea is to transform the series by taking into account the estimated first-order autocorrelation for the residuals. This number, φ1 = .936, is printed in Output 2.1.

The method relies on the assumption that the residuals have the form of a first-order autoregressive, AR(1), model:

$$\varepsilon_t = \varphi_1 \varepsilon_{t-1} + \zeta_t$$

where the remainder terms ζt are assumed to be independent and identically distributed. A series of this form has a first-order autocorrelation that equals φ1. In Chapter 6, this model is extended in many ways to a very useful class for time series data.
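The claim about the first-order autocorrelation can be checked in a few lines. The sketch below assumes that the series is stationary, which requires |φ1| < 1, and that ζt is uncorrelated with all earlier values of εt. The symbols for the two variances are introduced here only for the derivation.

```latex
\[
\begin{aligned}
\operatorname{Cov}(\varepsilon_t,\varepsilon_{t-1})
  &= \operatorname{Cov}(\varphi_1\varepsilon_{t-1}+\zeta_t,\ \varepsilon_{t-1})
   = \varphi_1\operatorname{Var}(\varepsilon_{t-1})
   = \varphi_1\sigma_\varepsilon^2 , \\
\operatorname{corr}(\varepsilon_t,\varepsilon_{t-1})
  &= \frac{\varphi_1\sigma_\varepsilon^2}{\sigma_\varepsilon^2}
   = \varphi_1 ,
\qquad
\sigma_\varepsilon^2 = \frac{\sigma_\zeta^2}{1-\varphi_1^2} .
\end{aligned}
\]
```

Repeating the substitution k times gives a lag k autocorrelation of φ1 raised to the power k, so the dependence dies out geometrically, and slowly when φ1 is close to 1.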

The regression model is then transformed in the following way:

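$$
\begin{aligned}
\tilde{y}_t &= y_t - \varphi_1 y_{t-1} \\
&= \alpha + \beta x_t + \delta_1 q_{1t} + \delta_2 q_{2t} + \delta_3 q_{3t} + \varepsilon_t - \varphi_1\left(\alpha + \beta x_{t-1} + \delta_1 q_{1,t-1} + \delta_2 q_{2,t-1} + \delta_3 q_{3,t-1} + \varepsilon_{t-1}\right) \\
&= \alpha(1-\varphi_1) + \beta \tilde{x}_t + \delta_1 \tilde{q}_{1t} + \delta_2 \tilde{q}_{2t} + \delta_3 \tilde{q}_{3t} + \zeta_t
\end{aligned}
$$

where

$$\tilde{y}_t = y_t - \varphi_1 y_{t-1}, \qquad \tilde{x}_t = x_t - \varphi_1 x_{t-1}, \qquad \tilde{q}_{it} = q_{it} - \varphi_1 q_{i,t-1} \quad (i = 1, 2, 3)$$

and

$$\zeta_t = \varepsilon_t - \varphi_1 \varepsilon_{t-1}$$

The coefficients β and δi are unchanged by the transformation, but the intercept of the transformed regression is α(1 − φ1) rather than α.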

The manipulation of the data is easily coded as a DATA step followed by an application of PROC REG (Program 2.3). Note that the LAG function returns the lagged value of the series. In other words, for example, LAG(COWS) equals the number of cows in the previous quarter.
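The behavior of the LAG function at the start of a series is easy to see in a small example. The numbers below are hypothetical and serve only to show the mechanics. The names LAG_DEMO, COWS_LAG, and X are likewise chosen for this sketch.

```sas
/* Hypothetical numbers, only to show how LAG behaves */
DATA LAG_DEMO;
    INPUT COWS;
    COWS_LAG=LAG(COWS);         /* value from the previous observation */
    X=COWS-0.936*COWS_LAG;      /* missing whenever COWS_LAG is missing */
    DATALINES;
9100
9120
9150
;
RUN;

PROC PRINT DATA=LAG_DEMO;
RUN;
```

For the first observation, both COWS_LAG and X are missing because no previous value exists. PROC REG leaves out observations with missing values in the model variables, so a transformed series has one usable observation fewer than the original.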

Program 2.3: Cochrane-Orcutt Estimation by a DATA Step and PROC REG

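DATA CO_TRANSFORM;
    SET SASMTS.QUARTERLY_MILK;
    /* Subtract .936 times the previous value from every variable. */
    /* LAG returns a missing value for the first observation.      */
    Y=PRODUCTION-0.936*LAG(PRODUCTION);
    X=COWS-0.936*LAG(COWS);
    QQ1=Q1-0.936*LAG(Q1);
    QQ2=Q2-0.936*LAG(Q2);
    QQ3=Q3-0.936*LAG(Q3);
RUN;

PROC REG DATA=CO_TRANSFORM PLOTS=ALL;
    /* Same model as Program 2.2, now on the transformed variables */
    MODEL Y=X QQ1 QQ2 QQ3/DW DWPROB;
    ID DATE;
    TEST QQ1=QQ2=QQ3=0;
RUN;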

The estimated coefficient for the number of cows has changed from 17.997 to 7.796 (Output 2.4), and the standard deviations of the parameter estimates are much smaller than in Output 2.2. The seasonal dummies are now significant, meaning that a seasonality exists in the production of milk per cow, which is intuitive.

Output 2.4: Parameter Estimates by Cochrane-Orcutt Estimation

image

The autocorrelation problem is fixed according to the Durbin-Watson test statistic (Output 2.5). The method reduces the number of observations in the analysis by one, as is clearly stated in Output 2.5, because the definition of the variables in the DATA step excludes the first observation, which cannot be defined because it has no lagged value in the data set.

Output 2.5: Durbin-Watson Test for the Residuals of Cochrane-Orcutt Estimation

image

Such large changes in parameter estimates are usually not seen with Cochrane-Orcutt estimation when the first-order autocorrelation is around, say, .5. But in this case, φ1 = .936. This value is very close to the upper limit +1, which corresponds to a unit root. When the value φ1 = 1 is applied, the transformed variables become the quarterly changes in the two time series, which makes the whole model more dynamic. This is the subject of Chapter 4, where the example is continued.

Conclusion

This chapter demonstrates the shortcomings of regression models estimated by OLS for time series data that have autocorrelated errors. The old-fashioned tool for mending the problem, the Cochrane-Orcutt estimation algorithm, works, but it is not the final solution. Nowadays, more efficient procedures exist for full maximum likelihood estimation of all parameters in models for time series data.

For modeling multiple time series, SAS offers many other procedures that are designed especially for time series, such as PROC AUTOREG, which is a straightforward extension of PROC REG. The AUTOREG procedure will be considered in the next few chapters, but the rest of the book will concentrate on the much more specialized procedure, PROC VARMAX, which includes up-to-date models for the dynamics of multiple time series.
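As a pointer to what follows, the regression of this chapter with first-order autoregressive errors might be specified in PROC AUTOREG roughly as below. This is a minimal sketch and not a program from the chapter. It uses only the NLAG= and METHOD= options of the MODEL statement to request maximum likelihood estimation of the regression coefficients and the autoregressive parameter together.

```sas
PROC AUTOREG DATA=SASMTS.QUARTERLY_MILK;
    /* NLAG=1 specifies AR(1) errors; METHOD=ML requests maximum  */
    /* likelihood instead of the default Yule-Walker estimation   */
    MODEL PRODUCTION=COWS Q1 Q2 Q3/NLAG=1 METHOD=ML DWPROB;
RUN;
```

Unlike Program 2.3, this approach does not fix the autocorrelation at a value estimated beforehand. Note that PROC AUTOREG writes the autoregressive error model with the opposite sign convention, so a positive φ1 in the notation of this chapter is reported as a negative autoregressive parameter.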
