Chapter 4: Regression Models for Differenced Series

Introduction

Regression Model for the Differenced Series

Regression Results

Inclusion of the Lagged Independent Variable

Reverted Regression

Inclusion of the Lagged Independent Variable in the Model

Two Lags of the Independent Variables

Inclusion of the Lagged Dependent Variable in the Regression

How to Interpret a Model with a Lagged Dependent Variable

Conclusions about the Models in Chapters 2, 3, and 4

Introduction

In Chapters 2 and 3, the original series of milk production is modeled with the original series of the number of cows as the independent variable. In this chapter, the series of changes from one quarter to the next quarter are modeled instead. Tests for stationarity of these series of changes are discussed in Chapter 5, but such test results are not necessary to justify the modeling in this chapter. The idea is that the levels of the number of cows and of milk production are determined by history, whereas the dynamics of the series are what matter for the future.

You will find that the models for the dependence between these differenced series lead to independent residuals. Because the series of milk production has a clear seasonal pattern, while the number of cows has none, the model must include seasonal dummies. The estimation of such models, using simple regression procedures in SAS, is the subject of this chapter. In this chapter, only first-order differences are applied. An alternative approach is seasonal differencing, which takes the difference between, say, the milk production in one quarter and the milk production in the same quarter of the previous year.
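Seasonal differencing is not pursued in this chapter, but it could be sketched as follows. This is a minimal illustration that assumes the DIF4 function is applied to the same input data set as in Program 4.1; the names SEASDIF_MILK, D4PRODUCTION, and D4COWS are hypothetical and are not used elsewhere in this book.

```sas
/* Sketch only: seasonal (lag 4) differences of the quarterly series. */
/* SEASDIF_MILK, D4PRODUCTION, and D4COWS are hypothetical names.     */
DATA SEASDIF_MILK;
    SET SASMTS.QUARTERLY_MILK;
    D4PRODUCTION=DIF4(PRODUCTION);  /* production minus its value four quarters earlier */
    D4COWS=DIF4(COWS);              /* same transformation for the number of cows       */
RUN;
```

The first four observations of the two new variables are missing because no value from the previous year exists for them.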

Regression Model for the Differenced Series

The series of differences are easily calculated in a DATA step (Program 4.1), which creates the new data set DIF_MILK. The differences are calculated by the DIF function, which transforms a series $y_t$ into the series of first differences:

$$\Delta y_t = y_t - y_{t-1}.$$

The differences of the production series are lagged one quarter by the LAG function to define the variable LDPRODUCTION, and two quarters by the LAG2 function to define the variable L2DPRODUCTION. The differences of the number of cows are lagged one quarter in the same way to define LDCOWS. These lagged variables are used later in this chapter.

The two series are plotted by PROC SGPLOT. Program 4.1 also includes a simple application of PROC REG for modeling these differences.

Program 4.1: A Simple Regression for the Differenced Series

DATA DIF_MILK;
    SET SASMTS.QUARTERLY_MILK;
    /* First differences and their lags */
    DCOWS=DIF(COWS);
    LDCOWS=LAG(DCOWS);
    DPRODUCTION=DIF(PRODUCTION);
    LDPRODUCTION=LAG(DPRODUCTION);
    L2DPRODUCTION=LAG2(DPRODUCTION);
RUN;

/* Plot of the differenced milk production series */
PROC SGPLOT DATA=DIF_MILK;
    SERIES Y=DPRODUCTION X=DATE/MARKERS MARKERATTRS=(SYMBOL=CIRCLE
        COLOR=BLUE);
    REFLINE 0;
RUN;

/* Plot of the differenced series for the number of cows */
PROC SGPLOT DATA=DIF_MILK;
    SERIES Y=DCOWS X=DATE/MARKERS MARKERATTRS=(SYMBOL=TRIANGLE
        COLOR=RED);
    REFLINE 0;
RUN;

/* Regression for the differenced series with Durbin-Watson p-values */
PROC REG DATA=DIF_MILK PLOTS(UNPACK)=ALL;
    MODEL DPRODUCTION=DCOWS Q1 Q2 Q3/ DWPROB;
RUN;
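The DIF and LAG functions return missing values at the start of a series. As a result, DCOWS and DPRODUCTION are missing in the first observation, LDCOWS and LDPRODUCTION in the first two observations, and L2DPRODUCTION in the first three. A quick way to see this, assuming the DATA step of Program 4.1 has been run, is to list the first few observations:

```sas
/* Sketch only: list the first observations of DIF_MILK to see */
/* where DIF, LAG, and LAG2 leave missing values.              */
PROC PRINT DATA=DIF_MILK(OBS=5);
    VAR DATE COWS DCOWS LDCOWS PRODUCTION DPRODUCTION
        LDPRODUCTION L2DPRODUCTION;
RUN;
```

Observations with missing values in any model variable are left out of the regression, so models with more lags are estimated on slightly fewer observations.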

The two series of differences are presented in Figure 4.1 and Figure 4.2. Both series seem to be stable with no trends. The differenced series for milk production has a regular seasonal pattern, and the differenced series for the number of cows looks like a smooth curve moving around the horizontal axis.

Figure 4.1: Differenced Series for Milk Production

Figure 4.2: Differenced Series for the Number of Cows

The autocorrelation problem in the regression analysis seems to be solved, because the Durbin-Watson test is insignificant (Output 4.1).
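As a reminder of what the test measures, the Durbin-Watson statistic can be written in terms of the regression residuals. The notation is introduced here only for illustration: $e_t$ denotes the residuals, $T$ the number of observations, and $r_1$ the lag 1 sample autocorrelation of the residuals.

```latex
d = \frac{\sum_{t=2}^{T} (e_t - e_{t-1})^2}{\sum_{t=1}^{T} e_t^2} \approx 2\,(1 - r_1)
```

A value of $d$ close to 2 therefore corresponds to a first-order residual autocorrelation close to zero.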

Output 4.1: Durbin-Watson Test in the Regression Model for Differenced Series

Regression Results

The parameter estimates, presented in Output 4.2, are close to the estimates in Chapters 2 and 3, where the regression for the level series is corrected for autocorrelated residuals. The main difference lies in the autoregressive parameter: its estimated value is φ1 = .936 in Chapter 2 and φ1 = .996 in the full maximum likelihood estimation in Chapter 3, while the first-order differencing in Program 4.1 corresponds to fixing the value at φ1 = 1.

The seasonal dummies are highly significant. They reflect that the seasonality in the milk production in Figure 2.1 and Figure 4.1 is clear and constant. The seasonal dummy variables are not transformed by differencing, so the actual meaning of the dummy variables has changed. In the model for the differenced series, the dummy variables are dummies for the changes in the dependent variable, the production of milk, and not the actual level of this series of milk production. The estimated coefficient of the number of cows is smaller than was the case in Output 2.4 and Output 3.4, where estimated autoregressive parameters are applied. This estimated regression coefficient is also much smaller than in the case of the regression in levels without differencing, as reported in Output 2.2.
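The link between the regression in levels with autocorrelated errors and the regression for differenced series can be sketched as follows. The notation is introduced here only for illustration: $P_t$ is milk production, $C_t$ is the number of cows, $s_t$ is the seasonal level captured by the quarterly dummies, and $u_t$ is a first-order autoregressive error driven by white noise $\varepsilon_t$.

```latex
\begin{aligned}
P_t &= \beta C_t + s_t + u_t, \qquad u_t = \varphi_1 u_{t-1} + \varepsilon_t \\
P_t - \varphi_1 P_{t-1} &= \beta\,(C_t - \varphi_1 C_{t-1}) + (s_t - \varphi_1 s_{t-1}) + \varepsilon_t \\
\Delta P_t &= \beta\,\Delta C_t + \Delta s_t + \varepsilon_t \qquad \text{for } \varphi_1 = 1
\end{aligned}
```

The second line follows by subtracting $\varphi_1$ times the equation for quarter $t-1$ from the equation for quarter $t$. The term $\Delta s_t$ is again a fixed quarterly pattern, so it can still be captured by quarterly dummies, but it now describes changes from one quarter to the next rather than seasonal levels.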

Output 4.2: Estimated Parameters in the Model for Differenced Series

This model is acceptable from an autocorrelation point of view.

Inclusion of the Lagged Independent Variable

You could argue that milk production is also influenced by the number of cows in the preceding quarter, because the stock of individual milk-producing cows is almost the same for two succeeding quarters. You can model this type of dynamics by including the lagged difference, which was also defined as a variable, LDCOWS, in the DATA step in Program 4.1. The code for PROC REG is given in Program 4.2.

Program 4.2: Inclusion of the Lagged Independent Variable in the Model for Differenced Series

PROC REG DATA=DIF_MILK PLOTS(UNPACK)=ALL;
    MODEL DPRODUCTION=DCOWS LDCOWS Q1 Q2 Q3/ DWPROB;
RUN;

The estimated parameters in Output 4.3 show that the extra parameter is insignificant and, moreover, that its estimated value, −.89, has the wrong sign according to intuition. The coefficient of the unlagged difference for the number of cows is somewhat larger than in Output 4.2. Changes in the number of cows in two successive quarters may well be positively correlated. In that case, the two estimated coefficients are negatively correlated, and the negative coefficient of the LDCOWS variable is simply a statistical artifact of multicollinearity.
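The multicollinearity argument can be checked directly. The following is a minimal sketch, assuming DIF_MILK from Program 4.1 exists; it is not one of the numbered programs of this chapter. PROC CORR gives the correlation between the two regressors, and the VIF and CORRB options of the MODEL statement in PROC REG print variance inflation factors and the correlation matrix of the parameter estimates.

```sas
/* Sketch only: diagnostics for the multicollinearity argument. */
/* Correlation between the current and the lagged change in    */
/* the number of cows.                                          */
PROC CORR DATA=DIF_MILK;
    VAR DCOWS LDCOWS;
RUN;

/* Same model as Program 4.2 with two extra MODEL options:     */
/* VIF prints variance inflation factors, and CORRB prints the */
/* correlation matrix of the parameter estimates.              */
PROC REG DATA=DIF_MILK;
    MODEL DPRODUCTION=DCOWS LDCOWS Q1 Q2 Q3/ VIF CORRB;
RUN;
QUIT;
```

A clearly positive correlation between DCOWS and LDCOWS, together with a clearly negative correlation between their two estimated coefficients, would support the interpretation given above.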

Output 4.3: Parameter Estimates in the Model Including the Lagged Independent Variable

The result is that milk production in a quarter is determined only by the number of cows in the same quarter. The idea of a lagged dependence, caused by, say, changing productivity of the stock of cows, is not supported because the lagged effect is insignificant. This model is in some sense the supply side of the dairy industry.

Reverted Regression

In the reverted regression, the differenced series of the number of cows is used as the dependent variable, and the differenced series of milk production is used as the independent variable. The reverted regression is modeled in this section because it illustrates many features of the dynamics in time series modeling. The section is mainly intended to prepare you for the simultaneous modeling of the two series, which exemplifies the main subject of this book. Economically, however, it is a simple model for the demand side of the dairy industry: you may argue that increased milk production is driven by increased consumer demand, which leads to a higher number of cows to meet this demand.

Program 4.3 gives the simple code for a regression by PROC AUTOREG, where the number of cows is the dependent variable and milk production is the independent variable. PROC AUTOREG is applied because autocorrelation exists, so a plot of the autocorrelation function is useful. The estimated regression coefficients and their standard errors remain the same if PROC REG is applied instead.

Program 4.3: Differences in the Number of Cows Explained by Changes in Milk Production

PROC AUTOREG DATA=DIF_MILK PLOTS(UNPACK)=ALL;
    MODEL DCOWS=DPRODUCTION Q1 Q2 Q3/ DWPROB;
RUN;
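For comparison, the same least squares fit can be requested from PROC REG. This is a minimal sketch, assuming DIF_MILK from Program 4.1 exists; it gives the coefficient estimates, but not the residual autocorrelation plot that motivates the use of PROC AUTOREG here.

```sas
/* Sketch only: the model of Program 4.3 fitted by PROC REG. */
PROC REG DATA=DIF_MILK;
    MODEL DCOWS=DPRODUCTION Q1 Q2 Q3/ DWPROB;
RUN;
QUIT;
```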

The estimated regression coefficient of milk production is significantly positive (Output 4.4), which is natural because the relationship in the opposite direction is significant. Moreover, the seasonal dummies are significant. This is because the independent variable, milk production, has a clear seasonal pattern. This pattern has to be “dummied out” in a model for a dependent variable (the number of cows) that has no seasonal structure itself.

Output 4.4: Estimated Parameters from Program 4.3

The autocorrelation function of the residuals, Figure 4.3, indicates a clearly significant first-order autocorrelation, but also a cyclic behavior, which could point to an autoregressive model of order two or higher.

Figure 4.3: Autocorrelation Function for the Residuals of Program 4.3

Inclusion of the Lagged Independent Variable in the Model

In this section, the autocorrelations of the residual process in the previous section are modeled by inclusion of lags in the regression, rather than by extension of the residual process with lags, in order to directly model the dynamics of the two series. The intention is to formulate a model of the dynamics of the series, rather than trying to repair a regression model with a poor fit.

Program 4.4 includes the lagged series of first-order differences of milk production in the model.

Program 4.4: Inclusion of the Lagged Independent Variable

PROC AUTOREG DATA=DIF_MILK PLOTS(UNPACK)=ALL;
    MODEL DCOWS=DPRODUCTION LDPRODUCTION Q1 Q2 Q3/ DWPROB;
RUN;

The resulting parameters are presented in Output 4.5. Both the lag 0 and the lag 1 changes in milk production are significant. The positive coefficient of the lagged difference in milk production indicates that an increment in milk production leads to an immediate increase in the number of cows, followed by another increase in the number of cows in the next quarter.

Output 4.5: Estimated Parameters from Program 4.4

This inclusion of the lagged independent variable does not, however, solve the autocorrelation problem (see Figure 4.4). At least the lag 1 autocorrelation is significant, and the behavior is systematic and can be seen as a cycle, even if the values for lags larger than 1 are insignificant. The autocorrelation function alone might point to a moving average model of order 1 as a proper model, or perhaps to an autoregressive model of high order. However, the partial autocorrelation function and the inverse autocorrelation function of the residual series (not shown here) have a significant value only at lag 1. This means that an autoregression of first order is sufficient, which suggests that one further lagged value of the right-side variable should do the job.
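The partial and inverse autocorrelation functions that are mentioned but not shown can be produced in several ways. One possibility, sketched here under the assumption that DIF_MILK from Program 4.1 exists, is to save the residuals from the model of Program 4.4 and pass them to the IDENTIFY statement of PROC ARIMA. The names RES44 and RESID are hypothetical, and NLAG=12 is an arbitrary choice.

```sas
/* Sketch only: save the residuals of the model in Program 4.4. */
/* RES44 and RESID are hypothetical names.                       */
PROC AUTOREG DATA=DIF_MILK;
    MODEL DCOWS=DPRODUCTION LDPRODUCTION Q1 Q2 Q3;
    OUTPUT OUT=RES44 R=RESID;
RUN;

/* The IDENTIFY statement prints the autocorrelation, inverse   */
/* autocorrelation, and partial autocorrelation functions.      */
PROC ARIMA DATA=RES44;
    IDENTIFY VAR=RESID NLAG=12;
RUN;
QUIT;
```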

Figure 4.4: Residual Autocorrelations in a Model with a Lagged Independent Variable

Two Lags of the Independent Variables

In Program 4.5, the dynamics are extended by including the independent variable lagged twice, as an attempt to solve the autocorrelation problem seen in Figure 4.4. The use of two lags instead of only one is natural because some time is required to adjust the number of milking cows to meet the actual demand.

Program 4.5: Inclusion of Two Lagged Independent Variables

PROC AUTOREG DATA=DIF_MILK PLOTS(UNPACK)=ALL;
    MODEL DCOWS=DPRODUCTION LDPRODUCTION L2DPRODUCTION Q1 Q2 Q3/ DWPROB;
RUN;

The extra parameter is also significant (Output 4.6), and the model shows that the number of cows increases during both the same quarter and the next two quarters if the demand for milk increases from one quarter to the next quarter.

Output 4.6: Estimated Parameters from Program 4.5

From the estimated autocorrelation function for the residuals (Figure 4.5), you can argue that the autocorrelation problem is solved, but the first-order autocorrelation is close to significance. This autocorrelation is significant if the one-sided alternative hypothesis, which is natural in the present context, is applied.

Figure 4.5: Residual Autocorrelations in a Model with Two Lags of Independent Variable

Inclusion of the Lagged Dependent Variable in the Regression

In Program 4.6, the lagged value of the dependent variable is included as an independent variable in the regression. PROC AUTOREG is used in order to show the plot of the residual autocorrelations; the corresponding application of PROC REG gives the same parameter estimates. The idea of using a lagged dependent variable may be rather counterintuitive at first sight, but the main purpose of this exercise will be clear in the conclusion of this chapter.

Program 4.6: Inclusion of the Lagged Dependent Variable as Right-Side Variable

PROC AUTOREG DATA=DIF_MILK PLOTS(UNPACK)=ALL;
    MODEL DCOWS=DPRODUCTION LDCOWS Q1 Q2 Q3/ DWPROB;
RUN;

The estimated parameters for both the changes in milk production and the lagged changes in the number of cows are significant for the present change in the number of cows (Output 4.7).

Output 4.7: Inclusion of the Lagged Value of the Dependent Variable

The autocorrelation problem is now completely solved (Figure 4.6). All autocorrelations are insignificant and, moreover, the autocorrelations seem random, having no systematic behavior in the signs.
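A complementary check, which is not part of the programs of this chapter, is the Durbin h test. The ordinary Durbin-Watson statistic is known to be biased toward finding no autocorrelation when a lagged dependent variable is among the regressors. The following is a minimal sketch, assuming the LAGDEP= option of the MODEL statement in PROC AUTOREG is used to name the lagged dependent variable:

```sas
/* Sketch only: Durbin h test for the model of Program 4.6.   */
/* LAGDEP= names the regressor that is the lagged dependent   */
/* variable.                                                   */
PROC AUTOREG DATA=DIF_MILK;
    MODEL DCOWS=DPRODUCTION LDCOWS Q1 Q2 Q3/ LAGDEP=LDCOWS;
RUN;
```

An insignificant test statistic would agree with the conclusion drawn from Figure 4.6.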

Figure 4.6: Autocorrelations of Residuals in the Model with a Lagged Dependent Variable

How to Interpret a Model with a Lagged Dependent Variable

The model asserts that changes in the number of cows, $\Delta y_t = y_t - y_{t-1}$, are explained by changes in the production of milk, $\Delta x_t$, but, more interestingly, also by the lagged changes of the series of the number of cows itself, $\Delta y_{t-1}$. By expanding the estimated relationship, you can transform it into a relationship that explains changes in the number of cows by many lags of changes in milk production.

This is seen by the following calculation where the error terms are left out:

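$$
\begin{aligned}
\Delta y_t &= 0.03\,\Delta x_t + 0.52\,\Delta y_{t-1} \\
&= 0.03\,\Delta x_t + 0.52\,(0.03\,\Delta x_{t-1} + 0.52\,\Delta y_{t-2}) \\
&= 0.03\,\Delta x_t + 0.52 \times 0.03\,\Delta x_{t-1} + 0.52^2\,(0.03\,\Delta x_{t-2} + 0.52\,\Delta y_{t-3}) \\
&= \dots = 0.03 \sum_{j=0}^{\infty} 0.52^j\,\Delta x_{t-j}
\end{aligned}
$$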

This expression asserts that an increase by 1 in milk production leads to an immediate increase by .03 in the number of cows, followed by further, geometrically declining increases in the subsequent quarters. In this way, the inclusion of many lags of the independent variable is parameterized by just one lag of the dependent variable.

This method is sometimes denoted the Koyck lag. The expression for the dynamics of the series includes only two parameters even if an infinite number of lags is included in the model. In this way, the parameterization is made quite efficient.

The total effect is calculated as the infinite sum of the series:

$$\frac{0.03}{1 - 0.52} \approx 0.06$$

If you take the actual units of the two series into account, this expression means that an increase of 1 million pounds of milk in total leads to .06 thousand more cows, or just 60 more cows.
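The lag weights and their accumulation can be tabulated in a short DATA step. This is a minimal sketch that uses only the rounded estimates .03 and .52 from the calculation above; the data set name KOYCK and all variable names are hypothetical, and the choice of 12 lags is arbitrary.

```sas
/* Sketch only: lag weights implied by the estimates .03 and .52. */
/* KOYCK and all variable names are hypothetical.                 */
DATA KOYCK;
    B=0.03;                 /* immediate effect of a change in milk production */
    PHI=0.52;               /* coefficient of the lagged dependent variable    */
    TOTAL=B/(1-PHI);        /* limit of the cumulative effect, 0.0625          */
    CUMULATIVE=0;
    DO J=0 TO 12;
        WEIGHT=B*PHI**J;               /* effect after J quarters   */
        CUMULATIVE=CUMULATIVE+WEIGHT;  /* effect accumulated so far */
        OUTPUT;
    END;
RUN;

PROC PRINT DATA=KOYCK NOOBS;
    VAR J WEIGHT CUMULATIVE TOTAL;
RUN;
```

The listing shows how quickly the weights die out: because .52 is well below 1, the cumulative effect is close to its limit after only a few quarters.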

Conclusions about the Models in Chapters 2, 3, and 4

In Chapters 2 through 4, the relationships between two time series were modeled with ordinary least squares regression. This technique is useful even for data with time series dynamics, and it relies on methods familiar to everyone with an introductory knowledge of regression analysis. Most calculations can be performed by PROC REG, which is very simple.

But, in some sense, the methods seem counterintuitive. One problem is that it is hard to tell which of the two series should be chosen as the independent series and which one should be dependent on the other. It is possible to argue for a two-way relationship.

Models for multidimensional time series, possibly with feedback, as in the present example, are the subject of the subsequent chapters, which also include a more precise formulation of the various models applied to multivariate time series.
