2.6. A Hybrid Method

As we've seen, both the GEE method and the random effects method may produce estimates that are markedly different from the fixed effects estimates. That's because neither of those methods controls for stable, unmeasured characteristics of the individuals. There's another approach, however, that combines some of the virtues of fixed effects and random effects methods. This method produces coefficient estimates that are identical to those from the fixed effects method, but the standard errors and test statistics might be somewhat different, depending on the details of the estimation method.

The basic idea is to decompose the time-varying predictors into two parts, one representing within-person variation, the other representing between-person variation (Neuhaus and Kalbfleisch 1998). Both of these components are used as predictors in the regression model. The coefficients for the within-person components will be identical to those for the classic fixed effects estimates.
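
In symbols, the decomposition can be sketched as follows. The notation is an assumption introduced here for illustration and is not taken from the SAS programs: x_it is a time-varying predictor for person i at time t, x̄_i is that person's mean over the T time points (three in the NLSY example), z_i is a time-invariant predictor, μ_t is a time-specific intercept, α_i is a person-specific random intercept, and ε_it is the remaining error.

```latex
\begin{aligned}
x_{it} &= \underbrace{(x_{it} - \bar{x}_i)}_{\text{within-person}}
        + \underbrace{\bar{x}_i}_{\text{between-person}},
        \qquad \bar{x}_i = \frac{1}{T}\sum_{t=1}^{T} x_{it},\\[4pt]
y_{it} &= \mu_t + \beta_W\,(x_{it} - \bar{x}_i) + \beta_B\,\bar{x}_i
        + \gamma\, z_i + \alpha_i + \varepsilon_{it}.
\end{aligned}
```

In this sketch, β_W plays the role of the fixed effects coefficient and β_B is the between-person coefficient. The random effects assumption corresponds to β_W = β_B.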

There are several potential advantages of doing it this way. One is that you also get estimates for the between-person effects, as well as coefficients for any measured time-invariant predictors. Second, by testing whether the between-person coefficients are the same as the corresponding within-person coefficients, you get a test that has the same function as the Hausman test that we looked at earlier. That is, it can tell you whether a fixed effects approach offers any gains over a random effects regression. Third, using the options available in PROC MIXED, you can extend the conventional fixed effects models in several important ways.

Here's how to do it for the NLSY data. First, we use PROC MEANS to calculate the means for SELF and POV across the three observations for each child, and output them to a data set. The NWAY and NOPRINT options suppress unwanted output. The CLASS statement says to compute the means separately for each value of the ID variable. The data set of means is then merged with the original data, and deviations of each variable from its within-person mean are calculated:

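* Person-specific means of SELF and POV, one record per child;
PROC MEANS DATA=persyr3 NWAY NOPRINT;
   CLASS id;
   VAR self pov;
   OUTPUT OUT=a MEAN=mself mpov;
RUN;
PROC SORT DATA=persyr3;
   BY id;
RUN;
PROC SORT DATA=a;
   BY id;
RUN;
* Merge the means back and compute within-person deviations;
DATA combine;
   MERGE persyr3 a;
   BY id;
   dself=self-mself;   /* deviation of SELF from the child's mean */
   dpov=pov-mpov;      /* deviation of POV from the child's mean  */
   dtime=time-2;       /* TIME centered at its mean of 2          */
   time1=(time=1);     /* dummy variables for time 1 and time 2   */
   time2=(time=2);
RUN;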

Note that I've calculated deviation scores for TIME, although I won't use those in the first few models. (The mean of TIME is necessarily 2 for every child.) I've also created two dummy variables, TIME1 and TIME2, so that I can represent time in PROC REG, which doesn't have a CLASS statement.
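
The same variables could also be built in a single step with PROC SQL. The following is only a sketch of that alternative, not the program used for the results in this section. It relies on PROC SQL remerging the group means with the original rows, and the data set name COMBINE2 is a hypothetical name chosen so that COMBINE is not overwritten.

```sas
/* Sketch: one-step alternative to PROC MEANS + MERGE.              */
/* COMBINE2 is a hypothetical name; PERSYR3 is the original data.   */
PROC SQL;
   CREATE TABLE combine2 AS
   SELECT *,
          MEAN(self)        AS mself,   /* person-specific mean of SELF */
          MEAN(pov)         AS mpov,    /* person-specific mean of POV  */
          self - MEAN(self) AS dself,   /* within-person deviations     */
          pov  - MEAN(pov)  AS dpov,
          time - 2          AS dtime,   /* TIME centered at its mean, 2 */
          (time = 1)        AS time1,   /* dummy variables for PROC REG */
          (time = 2)        AS time2
   FROM persyr3
   GROUP BY id                          /* means are computed per child */
   ORDER BY id, time;
QUIT;
```

The log should contain a note that the query requires remerging summary statistics back with the original data. That note is expected here, because each row keeps its own values alongside the child's means.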

In the literature on multilevel models (Bryk and Raudenbush 1992; Goldstein 1987; Kreft et al. 1995), the practice of subtracting person-specific means from each time-varying variable is referred to as group-mean centering. Although it is well known that using group-mean centered variables can produce substantially different results, this literature has generally not made the connection to fixed effects models, nor has it recognized that group-mean centering controls for all time-invariant covariates.

The calculation of centered predictors is similar to the computational method for getting fixed effects estimates that I described in the previous section. What's new here is that we don't calculate centered scores for the dependent variable. Once the new data set is constructed, we can run an OLS regression with both the centered variables and the mean variables, along with any other time-invariant predictors that we want to include:

PROC REG DATA=combine;
   MODEL anti=dpov dself mpov mself time1 time2 black hispanic
         childage married gender momage momwork;
RUN;

Results are in Output 2.20. The coefficients for the centered variables, DPOV and DSELF, are the same as the fixed effects coefficients for POV and SELF in Output 2.10. But we also get coefficients for several time-invariant predictors, something that was not available with the earlier method.

Table 2.20. Output 2.20 OLS Regression Using Centered Predictors

Root MSE          1.50725    R-Square    0.0935
Dependent Mean    1.63683    Adj R-Sq    0.0867
Coeff Var        92.08316

                  Parameter    Standard
Variable    DF     Estimate       Error    t Value    Pr > |t|
Intercept    1      3.11882     0.79422       3.93      <.0001
dpov         1      0.11247     0.14121       0.80      0.4258
dself        1     -0.05515     0.01591      -3.47      0.0005
mpov         1      0.61643     0.10710       5.76      <.0001
mself        1     -0.09003     0.01506      -5.98      <.0001
time1        1     -0.21074     0.08888      -2.37      0.0179
time2        1     -0.16634     0.08853      -1.88      0.0604
black        1      0.11093     0.09019       1.23      0.2189
hispanic     1     -0.27990     0.09513      -2.94      0.0033
childage     1      0.08575     0.06206       1.38      0.1672
married      1     -0.12841     0.08788      -1.46      0.1441
gender       1     -0.50816     0.07289      -6.97      <.0001
momage       1     -0.01134     0.01738      -0.65      0.5142
momwork      1      0.16412     0.08141       2.02      0.0440

Despite the fact that the coefficients for DPOV and DSELF replicate our earlier results, the reported standard errors for those coefficients are about 50% larger in Output 2.20 than they were in Output 2.10. This means, of course, that the t-statistics are about 1/3 smaller. That's because the error term in the earlier regression consisted entirely of within-person variation on the dependent variable. Here there is both within-person and between-person variation. We can correct the problem by estimating a random effects model in PROC MIXED:

PROC MIXED DATA=combine COVTEST NOCLPRINT;
   CLASS id time;
   MODEL anti=dpov dself mpov mself time black hispanic
         childage married gender momage momwork / SOLUTION;
   RANDOM INTERCEPT / SUBJECT=id;
RUN;

Results in Output 2.21 have standard errors for DPOV and DSELF that are identical to those in Output 2.10. Note, however, that for the time-invariant predictors, the random effects standard errors are larger than the OLS standard errors rather than smaller (which is just what is expected from clustering adjustments).

Table 2.21. Output 2.21 PROC MIXED Estimates for Centered Predictors

Covariance Parameter Estimates

                                    Standard
Cov Parm    Subject     Estimate       Error    Z Value      Pr Z
Intercept   id            1.2896     0.09692      13.31    <.0001
Residual                  0.9942     0.04132      24.06    <.0001

Solution for Fixed Effects

                                Standard
Effect      time    Estimate       Error      DF    t Value    Pr > |t|
Intercept             3.1188      1.1601     571       2.69      0.0074
dpov                  0.1125     0.09341    1158       1.20      0.2288
dself               -0.05515     0.01053    1158      -5.24      <.0001
mpov                  0.6164      0.1567    1158       3.93      <.0001
mself               -0.09003     0.02203    1158      -4.09      <.0001
time        1        -0.2107     0.05880    1158      -3.58      0.0004
time        2        -0.1663     0.05857    1158      -2.84      0.0046
time        3              0           .       .          .           .
black                 0.1109      0.1320    1158       0.84      0.4007
hispanic             -0.2799      0.1392    1158      -2.01      0.0446
childage             0.08575     0.09080    1158       0.94      0.3452
married              -0.1284      0.1286    1158      -1.00      0.3181
gender               -0.5082      0.1066    1158      -4.77      <.0001
momage              -0.01134     0.02543    1158      -0.45      0.6558
momwork               0.1641      0.1191    1158       1.38      0.1685
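
A rough back-of-the-envelope check, using only numbers from Outputs 2.20 and 2.21, suggests where the factor of about 1.5 comes from. This is a sketch rather than part of the SAS output. It assumes that the centered predictors are approximately uncorrelated with the between-person predictors, so that the two sets of standard errors differ mainly through the residual variance.

```latex
\begin{aligned}
\frac{\mathrm{SE}_{\text{OLS}}(\texttt{dpov})}{\mathrm{SE}_{\text{MIXED}}(\texttt{dpov})}
  &= \frac{0.14121}{0.09341} \approx 1.51,
&\qquad
\frac{\mathrm{SE}_{\text{OLS}}(\texttt{dself})}{\mathrm{SE}_{\text{MIXED}}(\texttt{dself})}
  &= \frac{0.01591}{0.01053} \approx 1.51,\\[4pt]
\sqrt{\frac{(\text{Root MSE})^2}{\hat{\sigma}^2_{\text{Residual}}}}
  &= \sqrt{\frac{1.50725^2}{0.9942}} \approx \sqrt{2.285} \approx 1.51.
\end{aligned}
```

The OLS mean squared error (about 2.27) is also close to the sum of the two variance components in Output 2.21 (1.2896 + 0.9942 = 2.28), which is consistent with the OLS error term absorbing both within-person and between-person variation.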

What we've gained by the centering method is the ability to estimate coefficients for time-invariant predictors. It's essential to keep in mind, however, that the coefficients of the time-invariant predictors (unlike those for the within-person time-varying predictors) will be biased if those variables are correlated with the unobserved fixed effects.

There's another attraction to this approach. If the random effects model is correct (that is, if the time-varying predictors are uncorrelated with the person-specific fixed effects), the coefficients for the centered variables should be the same as the coefficients for the mean variables. Since both are estimated in the same model, it's easy to test that assumption in PROC MIXED by including CONTRAST statements after the MODEL statement:

CONTRAST 'pov' dpov 1 mpov -1;
CONTRAST 'self' dself 1 mself -1;
CONTRAST 'overall' dpov 1 mpov -1, dself 1 mself -1;

For each CONTRAST statement, the text in quotes is a required label, used for distinguishing one test from another in the output. The first CONTRAST statement tests whether the coefficients for DPOV and MPOV are the same. In detail, the coefficient for DPOV is multiplied by 1, the coefficient for MPOV is multiplied by −1, the results are added together, and the sum is tested for a difference with 0. The next CONTRAST statement does the same for the two self-esteem variables, and the final CONTRAST statement tests both hypotheses simultaneously. In Output 2.22, we see strong evidence that the assumption is not satisfied for POV, but might be for SELF. The overall test yields results very similar to the Hausman test in Output 2.17.

Table 2.22. Output 2.22 Tests for Fixed Effects vs. Random Effects Using Centered Scores

Contrasts

Label       Num DF    Den DF    F Value    Pr > F
pov              1      1158       7.63    0.0058
self             1      1158       2.04    0.1535
overall          2      1158       4.93    0.0073
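
If you also want the size of each difference and not just a test, one option is to add ESTIMATE statements to the same PROC MIXED step. The following is a sketch under that assumption. The labels are arbitrary, and the CL option requests confidence limits.

```sas
/* Sketch: same model as for Output 2.21, with ESTIMATE statements   */
/* reporting each within-minus-between difference and its limits.    */
PROC MIXED DATA=combine COVTEST NOCLPRINT;
   CLASS id time;
   MODEL anti=dpov dself mpov mself time black hispanic
         childage married gender momage momwork / SOLUTION;
   RANDOM INTERCEPT / SUBJECT=id;
   ESTIMATE 'pov: within - between'  dpov 1 mpov -1 / CL;
   ESTIMATE 'self: within - between' dself 1 mself -1 / CL;
RUN;
```

For a contrast with one numerator degree of freedom, the t statistic from ESTIMATE should square to the F value from the matching CONTRAST statement. From the coefficients in Output 2.21, the estimated difference for POV would be 0.1125 - 0.6164 = -0.5039.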

If we conclude that the coefficients for the mean and deviation scores are different for a particular variable, a natural question is whether the coefficient for the mean has any useful interpretation. In most cases, I don't think so, because that coefficient is typically confounded with the effects of other unobserved variables. Nevertheless, it's important to have the mean variables in the model in order to get good estimates of the effects of other time-invariant variables. Omitting them would mean that the variable in question was not fully controlled.

Another advantage of the centering method for getting fixed effects estimates is that we can allow for random variation in the slope parameters for the time-varying predictors. For example, instead of estimating separate coefficients for time 1 and time 2, let's assume that antisocial behavior changes linearly with time. Then we allow the coefficient of TIME (actually DTIME, which is TIME minus the mean of 2) to vary randomly from child to child, which gives a random linear growth model. Here's how to set it up:

PROC MIXED DATA=combine COVTEST NOCLPRINT;
   CLASS id;
   MODEL anti=dpov dself dtime mpov mself / SOLUTION;
   RANDOM INTERCEPT dtime / SUBJECT=id;
RUN;

Output 2.23 gives the results. We see that the fixed coefficient for DTIME (.1055) is highly significant, but the variance around that coefficient (.1409) is also highly significant. So we conclude that antisocial behavior tends to increase over time during the adolescent years, but there is substantial variation among children in the rate of increase.

Table 2.23. Output 2.23 Estimates for a Random Growth Curve Model with Fixed Effects

Covariance Parameter Estimates

                                    Standard
Cov Parm    Subject     Estimate       Error    Z Value      Pr Z
Intercept   id            1.4139      0.1013      13.96    <.0001
dtime       id            0.1409     0.04184       3.37    0.0004
Residual                  0.8539     0.05021      17.01    <.0001

Solution for Fixed Effects

                          Standard
Effect        Estimate       Error      DF    t Value    Pr > |t|
Intercept       2.9336      0.4616     579       6.36      <.0001
dpov            0.1387     0.09391     578       1.48      0.1402
dself         -0.05515     0.01048     578      -5.26      <.0001
dtime           0.1055     0.03140     580       3.36      0.0008
mpov            0.6842      0.1376     578       4.97      <.0001
mself         -0.07477     0.02225     578      -3.36      0.0008
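
Written out, the model behind Output 2.23 can be sketched as follows. The Greek symbols are notation assumed here for illustration. Because the RANDOM statement does not specify a TYPE= option, the random intercept and random slope are treated as uncorrelated variance components.

```latex
\begin{aligned}
\text{anti}_{it} &= \beta_0 + \beta_1\,\text{dpov}_{it} + \beta_2\,\text{dself}_{it}
   + (\delta + u_{1i})\,\text{dtime}_{t}
   + \beta_3\,\text{mpov}_{i} + \beta_4\,\text{mself}_{i}
   + u_{0i} + \varepsilon_{it},\\[4pt]
u_{0i} &\sim N(0,\tau_0^2), \qquad
u_{1i} \sim N(0,\tau_1^2), \qquad
\varepsilon_{it} \sim N(0,\sigma^2),\\[4pt]
\hat{\delta} &= 0.1055, \quad \hat{\tau}_0^2 = 1.4139, \quad
\hat{\tau}_1^2 = 0.1409, \quad \hat{\sigma}^2 = 0.8539 .
\end{aligned}
```

To get a feel for the slope variance, note that its square root is about 0.375. If the child-specific slopes were normally distributed, roughly 95% of them would fall in the range 0.1055 ± 1.96(0.375), or about -0.63 to 0.84. That range is a rough illustration under the normality assumption, not a result reported by PROC MIXED.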

In the same way, I tried fitting models for random variation in the effect of POV and SELF, but there is no evidence for such variation (results not shown). This approach of combining models with fixed and random effects is quite similar to the conditional linear mixed models of Verbeke et al. (2001), although they used a somewhat different computational method for obtaining the estimates.[]

[] Verbeke et al. (2001) provide a SAS macro that transforms the data set as a precursor to using PROC MIXED.

One final attraction of estimating the fixed effects model with PROC MIXED is the ability to specify models for the error structure that are less constrained than the conventional fixed effects method, which implies a covariance structure for the dependent variable known as compound symmetry. Compound symmetry means that the variance of the error term is constant over time and the covariance between any two time points is the same. PROC MIXED allows for a wide variety of alternative structures that can be specified with the REPEATED statement. Although there isn't space for a detailed discussion of the many options, let's consider the most general option, which has no constraints whatever on the error structure. While this option would not work well if there were many time points (there would be too many different covariances), it's quite reasonable when there are only three. A model with an unstructured covariance matrix is specified as follows:

PROC MIXED DATA=combine COVTEST NOCLPRINT;
   CLASS id time;
   MODEL anti=dpov dself mpov mself time black hispanic
         childage married gender momage momwork / SOLUTION;
   REPEATED time / SUBJECT=id TYPE=UN;
RUN;

On the REPEATED statement, the TIME variable is optional as long as everyone has the same number of time points and they are in the same order. UN stands for unstructured. The output from this program (not shown) has both coefficients and standard errors that differ slightly from those in Output 2.23, but with no appreciable change in the p-values.[]
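
To make the contrast concrete, here is a sketch of the two covariance structures for the three time points. The symbols are assumptions introduced here for illustration: σ²_α is the between-person (random intercept) variance, σ²_ε is the residual variance, and the σ_jk are the freely estimated elements of the unstructured matrix.

```latex
\Sigma_{\text{CS}} =
\begin{pmatrix}
\sigma_\alpha^2+\sigma_\varepsilon^2 & \sigma_\alpha^2 & \sigma_\alpha^2\\
\sigma_\alpha^2 & \sigma_\alpha^2+\sigma_\varepsilon^2 & \sigma_\alpha^2\\
\sigma_\alpha^2 & \sigma_\alpha^2 & \sigma_\alpha^2+\sigma_\varepsilon^2
\end{pmatrix},
\qquad
\Sigma_{\text{UN}} =
\begin{pmatrix}
\sigma_{11} & \sigma_{12} & \sigma_{13}\\
\sigma_{12} & \sigma_{22} & \sigma_{23}\\
\sigma_{13} & \sigma_{23} & \sigma_{33}
\end{pmatrix}
```

Compound symmetry has two free parameters, and the unstructured matrix has six. With the estimates in Output 2.21, compound symmetry implies a variance of 1.2896 + 0.9942 = 2.2838 at every time point and a correlation of 1.2896/2.2838, or about 0.56, between any two time points.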

[] Another, similar approach is to modify the program that produced Output 2.23 by specifying the EMPIRICAL option on the PROC statement. This produces robust standard errors and test statistics using the method of White (1980), which allows for heterogeneous variances and unstructured correlations.
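
Taking the note above at its word, the modification might look like the following sketch. The only change from the program for Output 2.23 is the EMPIRICAL option on the PROC statement. No output is shown because this run is not part of the original analysis.

```sas
/* Sketch: program for Output 2.23 with robust (sandwich) standard errors. */
/* EMPIRICAL uses the SUBJECT= variable to define the independent units.   */
PROC MIXED DATA=combine COVTEST NOCLPRINT EMPIRICAL;
   CLASS id;
   MODEL anti=dpov dself dtime mpov mself / SOLUTION;
   RANDOM INTERCEPT dtime / SUBJECT=id;
RUN;
```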
