2.4. Estimation with PROC GLM for More Than Two Observations Per Person

When each person has three or more measurements on the time-varying variables, it's not obvious how to extend the method of difference scores. One approach is to compute first-difference scores for each pair of adjacent observations, yielding T – 1 observations for each individual. Then the problem is to estimate a single model for the entire set while allowing for correlated errors.[1] Another approach is the dummy variable method, which gives the correct results in this new situation but is computationally intensive. In general, the easiest method is the one that was implemented in the last section using the ABSORB statement in PROC GLM. We now consider that method in greater detail.

[1] One reasonable method is to do generalized least squares on the difference equations, allowing for unrestricted correlations between the error terms from the same individual. In SAS, this can be done with PROC GENMOD using the REPEATED statement.
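As a sketch of the footnote's suggestion (the data set and variable names here are hypothetical; it assumes an earlier DATA step has already computed first-difference scores DANTI, DSELF, and DPOV, with a PERIOD variable indexing each adjacent pair):

```sas
/* GLS-type estimation on first-difference scores.
   Assumes a data set DIFF with variables id, period, and the
   difference scores danti, dself, dpov (hypothetical names),
   one observation per person per adjacent pair of waves. */
PROC GENMOD DATA=diff;
   CLASS id period;
   MODEL danti = dself dpov period;
   REPEATED SUBJECT=id / TYPE=UN;
RUN;
```

The TYPE=UN option leaves the within-person correlation structure of the difference-equation errors unrestricted, as the footnote describes.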

As before, our basic model is given by the equation

yit = μt + β xit + αi + εit

where αi is a set of fixed parameters, εit satisfies the assumptions of a standard linear model, and xit is assumed to be strictly exogenous. OLS produces optimal estimates of the parameters, but direct application of OLS with dummy variables for the αi terms is computationally tedious. It turns out, however, that we can get identical results by "conditioning out" the αi terms and performing OLS on deviation scores. That is, for each person and for each time-varying variable (both response variables and predictors), we compute the means over time for that person:

ȳi = (1/ni) Σt yit    and    x̄i = (1/ni) Σt xit

where ni is the number of measurements for person i. Then we subtract the person-specific means from the observed values of each variable:

y*it = yit − ȳi    and    x*it = xit − x̄i

Finally, we regress y* on x*, plus variables to represent the effect of time. This is what PROC GLM does when you use the ABSORB statement.

If you construct the deviation scores yourself (using, say, PROC MEANS and a DATA step) and then use PROC REG to estimate the regression, you will get the correct OLS regression coefficients for the time-varying predictors. But the standard errors and p-values will be incorrect. That's because PROC REG calculates the degrees of freedom from the number of variables on the MODEL statement alone, when the calculation should also count the dummy variables implicitly used to represent the different persons in the sample (580 for the NLSY data). Formulas are available to correct these statistics (Judge et al. 1985), but it's much easier to let PROC GLM do it automatically. When the ABSORB statement is used, GLM converts all variables to deviation scores, estimates the regression, and uses the correct degrees of freedom to compute standard errors and p-values.
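For illustration, the do-it-yourself approach might look like the following sketch, using the PERSYR3 person-year data set constructed below (time effects are omitted for brevity, and remember that the standard errors and p-values from PROC REG will be based on the wrong degrees of freedom):

```sas
/* Person-specific means of each time-varying variable */
PROC MEANS DATA=persyr3 NOPRINT NWAY;
   CLASS id;
   VAR anti self pov;
   OUTPUT OUT=means MEAN=manti mself mpov;
RUN;

/* Merge the means back in and compute deviation scores */
DATA devs;
   MERGE persyr3 means;
   BY id;
   anti_d = anti - manti;
   self_d = self - mself;
   pov_d  = pov  - mpov;
RUN;

/* Coefficients will match GLM with ABSORB;
   standard errors and p-values will not */
PROC REG DATA=devs;
   MODEL anti_d = self_d pov_d;
RUN;
```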

Let's try this with the NLSY data, except now we also include data from the middle year, 1992. Again, the first step is to construct a data set with one observation for each person at each time point:

DATA persyr3;
   SET my.nlsy;
   id=_N_;
   time=1;
   anti=anti90;
   self=self90;
   pov=pov90;
   OUTPUT;
   time=2;
   anti=anti92;
   self=self92;
   pov=pov92;
   OUTPUT;
   time=3;
   anti=anti94;
   self=self94;
   pov=pov94;
   OUTPUT;
RUN;

If there were more than three time points, it might be worthwhile to shorten this program by using arrays and a DO loop. Note that TIME has been assigned values of 1, 2 and 3, which facilitates the use of a CLASS statement in PROC GLM. This DATA step produced 1,743 observations, three for each of the 581 children.
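For example, an equivalent and more compact version of the DATA step using arrays and a DO loop might look like this:

```sas
DATA persyr3;
   SET my.nlsy;
   /* Each array holds one variable's measurements in time order */
   ARRAY a (*) anti90 anti92 anti94;
   ARRAY s (*) self90 self92 self94;
   ARRAY p (*) pov90 pov92 pov94;
   id=_N_;
   DO time=1 TO 3;
      anti=a(time);
      self=s(time);
      pov=p(time);
      OUTPUT;        /* one record per person per time point */
   END;
RUN;
```

With more time points, only the ARRAY statements and the upper bound of the DO loop would need to change.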

The PROC GLM statements for estimating the basic model are virtually identical to those for the two-period case, except that we now use a CLASS statement to handle the three-valued TIME variable:

PROC GLM DATA=persyr3;
   ABSORB id;
   CLASS time;
   MODEL anti=self pov time / SOLUTION;
RUN;

Note that for this to work, the data set must be sorted by the variable specified on the ABSORB statement. Of course, the DATA step that produced PERSYR3 did this automatically.

Results in Output 2.10 are similar to what we found in Output 2.2 for two time points: a highly significant effect of self-esteem with a coefficient of about –.055, and a nonsignificant effect of poverty. TIME also has a significant effect, with antisocial behavior increasing over the three periods.

Table 2.10. Output 2.10 GLM Estimates of a Fixed Effects Model for Three Periods
Dependent Variable: anti
Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model             584      3181.883112      5.448430      5.48   <.0001
Error            1158      1151.232207      0.994156
Corrected Total  1742      4333.115318

R-Square   Coeff Var   Root MSE   anti Mean
0.734318    60.91480   0.997074    1.636833

Source   DF     Type I SS   Mean Square   F Value   Pr > F
id      580   3142.448652      5.418015      5.45   <.0001
self      1     23.966255     23.966255     24.11   <.0001
pov       1      1.254392      1.254392      1.26   0.2615
time      2     14.213813      7.106907      7.15   0.0008

Source   DF   Type III SS   Mean Square   F Value   Pr > F
self      1   27.29362397   27.29362397     27.45   <.0001
pov       1    1.44138475    1.44138475      1.45   0.2288
time      2   14.21381348    7.10690674      7.15   0.0008

Parameter       Estimate   Standard Error   t Value   Pr > |t|
self        −.0551514027       0.01052575     −5.24     <.0001
pov         0.1124748908       0.09340988      1.20     0.2288
time 1      −.2107365666       0.05879781     −3.58     0.0004
time 2      −.1663431979       0.05856544     −2.84     0.0046
time 3      0.0000000000       .                  .          .

We also learn from the output that 73% of the variation in antisocial behavior is between children, whereas the remaining 27% is within children (across time). I got these numbers by dividing the Type I sum of squares for ID (3142.45) by the corrected total sum of squares (4333.12), which yields .73, leaving .27 within children. The square root of .73 (about .85) is an estimate of the intraclass correlation for these data. Since the total R2 from the model is also about .73, we conclude that the time-varying predictors are not accounting for much additional variation.

It's also instructive to compare the results in Output 2.10 to what you get when the ABSORB statement is omitted—that is, OLS regression with no control for between-person variation. These results are shown in Output 2.11. Notice first that the mean squared error in Output 2.10 is less than half of what we see in Output 2.11. That's because the control for between-person variation greatly reduces the error sum of squares (the R2 increases from .048 to .734). It also reduces the degrees of freedom (which, by itself, would make the mean squared error larger), but in this case the proportional reduction in degrees of freedom is much smaller than the reduction in the error sum of squares. In data sets where the between-person proportion of variation in the dependent variable is small, the mean squared error could go up rather than down in a fixed effects model.

Table 2.11. Output 2.11 Conventional OLS without Control for Between-Person Variation
Dependent Variable: anti
Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               4       208.312848     52.078212     21.94   <.0001
Error            1738      4124.802470      2.373304
Corrected Total  1742      4333.115318

R-Square   Coeff Var   Root MSE   anti Mean
0.048075    94.11792   1.540553    1.636833

Source   DF   Type III SS   Mean Square   F Value   Pr > F
self      1    86.3022527    86.3022527     36.36   <.0001
pov       1   102.8341414   102.8341414     43.33   <.0001
time      2    15.7786990     7.8893495      3.32   0.0362

Parameter       Estimate   Standard Error   t Value   Pr > |t|
Intercept    2.959617390       0.23985318     12.34     <.0001
self        −0.066894066       0.01109311     −6.03     <.0001
pov          0.517573842       0.07862856      6.58     <.0001
time 1      −0.222741659       0.09059359     −2.46     0.0140
time 2      −0.172135548       0.09043193     −1.90     0.0571
time 3       0.000000000       .                  .          .

The root mean squared error directly affects the standard errors of the coefficients, so we might expect the standard errors to be smaller for the fixed effects regression than for the conventional regression. That's true for SELF, where the fixed effects standard error is .0105 while the conventional OLS standard error is .0111. But for POV, the fixed effects standard error is .0934 and the conventional OLS standard error is .0786. Why the difference? The answer is that the standard errors depend not only on the root mean square error, but also on the relative proportion of within- and between-person variation on the predictor variables. Other things being equal, the greater the proportion of variation that is between persons on a given predictor variable, the larger the standard error of its fixed effects coefficient. Other analysis shows that for SELF, 53% of the variation is between persons. For POV, on the other hand, the between-person variation is 70%. That's why the standard error for POV went up rather than down under the fixed effects analysis. The ideal situation for a fixed effects analysis is when all of the variation on the time-varying predictors is within persons, but there's still lots of between-person variation on the response variable.
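One simple way to obtain such a within/between decomposition for a predictor (a sketch, not shown in the text) is a one-way ANOVA of the predictor on the person identifier; the reported R-square is then the between-person proportion of variation:

```sas
/* Between-person proportion of variation in SELF:
   R-square from the one-way ANOVA of SELF on person ID
   (SS for id divided by the corrected total SS).
   Repeat with pov as the dependent variable for POV. */
PROC GLM DATA=persyr3;
   CLASS id;
   MODEL self=id;
RUN;
```

The complement of that R-square is the within-person proportion, which is what the fixed effects estimator actually uses.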

As in the two-period case, we can also test whether the time-varying predictors have coefficients that vary with time by including interactions between them and TIME:

PROC GLM DATA=persyr3;
   ABSORB id;
   CLASS time;
   MODEL anti=self pov time self*time pov*time / SOLUTION;
RUN;

With high p-values for the two interactions, Output 2.12 shows no evidence for variation over time of the coefficients for SELF and POV.

Table 2.12. Output 2.12 Tests for Interaction between TIME and Time-Varying Predictors
Source      DF   Type III SS   Mean Square   F Value   Pr > F
self         1   26.61340572   26.61340572     26.74   <.0001
pov          1    1.34058714    1.34058714      1.35   0.2460
time         2    3.72428319    1.86214160      1.87   0.1544
self*time    2    2.62684393    1.31342197      1.32   0.2676
pov*time     2    0.04291216    0.02145608      0.02   0.9787

In a similar way, we can test for constancy of the effects of time-invariant predictors:

PROC GLM DATA=persyr3;
   ABSORB id;
   CLASS time;
   MODEL anti=self pov time black*time hispanic*time
         childage*time married*time gender*time
         momage*time momwork*time / SOLUTION;
RUN;

As shown in Output 2.13, there is no evidence that any of the time-invariant predictors has an effect that varies with time.

Table 2.13. Output 2.13 Tests for Interaction between TIME and Time-Invariant Predictors
Source          DF   Type III SS   Mean Square   F Value   Pr > F
self             1   24.13352368   24.13352368     24.41   <.0001
pov              1    1.28045845    1.28045845      1.30   0.2553
time             2    0.48255089    0.24127545      0.24   0.7835
black*time       2    4.99285922    2.49642961      2.53   0.0805
hispanic*time    2    1.63509176    0.81754588      0.83   0.4376
childage*time    2    5.06884670    2.53442335      2.56   0.0774
married*time     2    1.21443928    0.60721964      0.61   0.5412
gender*time      2    0.94702064    0.47351032      0.48   0.6195
momage*time      2    2.77934289    1.38967145      1.41   0.2456
momwork*time     2    3.47491248    1.73745624      1.76   0.1729

In addition to PROC GLM, another SAS procedure, PROC TSCSREG (for time series cross section regression), also does OLS estimation of the fixed effects model. PROC TSCSREG, which is part of the SAS/ETS product, has one nice feature that I will discuss in the next section: a Hausman test of fixed effects versus random effects. However, the downside of PROC TSCSREG is that it explicitly estimates coefficients for the dummy variables for the fixed effects and thus may use excessive computer time for large samples.
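For comparison, a sketch of the corresponding one-way fixed effects specification in PROC TSCSREG might look like this (TSCSREG has no CLASS statement, so the time effects would have to be represented by dummy variables created in a prior DATA step; they are omitted here):

```sas
/* One-way (cross-sectional) fixed effects model in PROC TSCSREG.
   The data must be sorted by id and time, as PERSYR3 already is. */
PROC TSCSREG DATA=persyr3;
   ID id time;
   MODEL anti=self pov / FIXONE;
RUN;
```

The FIXONE option requests fixed effects for the cross-sectional (person) dimension, which should reproduce the coefficients from the ABSORB approach.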
