3.5. Subject-Specific versus Population-Averaged Coefficients

In Chapter 2, we saw that estimates for the linear random effects model could also be obtained by using GEE estimation in PROC GENMOD. Although GEE estimation also works well for logistic regression models, its results are not equivalent to the random effects estimates produced by PROC NLMIXED. Let's first examine the differences for our NLSY example, and then we'll discuss the nature of those differences. Here's a GENMOD program for a model that's similar to the one we just estimated in NLMIXED:

PROC GENMOD DATA=teenyrs5;
   CLASS year id;
   MODEL pov = year mother spouse inschool hours
         / DIST=BINOMIAL;
   REPEATED SUBJECT=id / TYPE=EXCH MODELSE;
RUN;

The DIST=BINOMIAL option specifies a binomial distribution for the response variable POV. For this distribution, the default link function is the logit, so the model is a logistic regression. The REPEATED statement invokes GEE estimation (in addition to conventional maximum likelihood) for the logistic regression model. The TYPE=EXCH option specifies an exchangeable working correlation structure, in which all the within-person correlations are equal, which is similar to the random effects model. The MODELSE option requests standard errors based on the model rather than the robust (empirical) standard errors that GENMOD reports by default.

These two options on the REPEATED statement are not necessarily optimal but have been chosen to maximize the similarity with the random effects model. One thing that's not at all similar, however, is the computation time. While PROC NLMIXED took 40 seconds to estimate the random effects model, PROC GENMOD took only about half a second to estimate the analogous model.

Results in Output 3.10 are similar to those in Output 3.9 for the random effects model estimated by NLMIXED. The p-values for the coefficients are very close using the two methods. Although the coefficients have the same signs and similar magnitudes, the GEE coefficients are all smaller in absolute value than the random effects coefficients (except for year 3). This pattern is no accident. Both methods make similar assumptions about the data, but the random effects maximum likelihood method produces subject-specific coefficients, whereas the GEE method produces population-averaged coefficients (Hu et al. 1998; Diggle et al. 1994, Ch. 7).

Table 3.11. Output 3.10 GEE Estimates from PROC GENMOD
Analysis Of GEE Parameter Estimates
Model-Based Standard Error Estimates

Parameter       Estimate Standard Error  95% Confidence Limits         Z   Pr > |Z|
Intercept         0.4019         0.0868      0.2317     0.5721      4.63     <.0001
year      1       0.1331         0.0892     −0.0418     0.3080      1.49     0.1358
year      2      −0.0915         0.0854     −0.2589     0.0759     −1.07     0.2840
year      3      −0.0435         0.0820     −0.2041     0.1172     −0.53     0.5958
year      4      −0.0713         0.0800     −0.2281     0.0855     −0.89     0.3728
year      5       0.0000         0.0000      0.0000     0.0000         .          .
mother           −0.8450         0.0919     −1.0251    −0.6650     −9.20     <.0001
spouse            0.9847         0.1206      0.7484     1.2211      8.16     <.0001
inschool          0.0471         0.0763     −0.1024     0.1966      0.62     0.5373
hours             0.0216         0.0023      0.0171     0.0260      9.41     <.0001
Scale             1.0000              .           .          .         .          .

What's the difference? A subject-specific coefficient is an estimate of what would happen to a particular individual if the predictor variable were increased by one unit. A population-averaged coefficient, on the other hand, is an estimate of what would happen to the whole population if everyone's predictor variable were raised by one unit. For linear models, there is no difference. But for logistic regression models, and for many other nonlinear models, subject-specific coefficients will typically be larger than population-averaged coefficients.
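The difference can be seen in a small simulation. The DATA step below is a minimal sketch that is not part of the original analysis: it assumes a random-intercept logistic model with a single binary predictor x, a hypothetical subject-specific coefficient BETA_SS = 1, and a hypothetical normal αi with standard deviation SIGMA = 2 (all variable and data set names are illustrative). Every simulated person's log-odds rise by exactly 1 when x goes from 0 to 1, but the population-averaged coefficient BETA_PA, computed as the difference in log-odds of the averaged probabilities, comes out smaller.

```sas
/* Hypothetical illustration: subject-specific vs. population-averaged effect */
DATA pa_demo;
   CALL STREAMINIT(20240501);          /* fixed seed for reproducibility       */
   beta_ss = 1;                        /* assumed subject-specific coefficient */
   sigma   = 2;                        /* assumed standard deviation of alpha  */
   n       = 200000;                   /* number of simulated persons          */
   sum0 = 0;
   sum1 = 0;
   DO i = 1 TO n;
      alpha = sigma*RAND('NORMAL');    /* person-specific random intercept     */
      sum0  = sum0 + LOGISTIC(alpha);             /* this person's P(y=1) at x=0 */
      sum1  = sum1 + LOGISTIC(alpha + beta_ss);   /* this person's P(y=1) at x=1 */
   END;
   pbar0 = sum0/n;                     /* population proportion at x=0         */
   pbar1 = sum1/n;                     /* population proportion at x=1         */
   /* population-averaged coefficient: log-odds ratio of the averaged probabilities */
   beta_pa = LOG(pbar1/(1 - pbar1)) - LOG(pbar0/(1 - pbar0));
   KEEP beta_ss sigma pbar0 pbar1 beta_pa;
   OUTPUT;
RUN;

PROC PRINT DATA=pa_demo NOOBS;
RUN;
```

Setting SIGMA = 0 removes the unobserved heterogeneity, and BETA_PA then equals BETA_SS up to rounding error; increasing SIGMA pulls BETA_PA further toward 0.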

Which is preferable? Well, that depends. Suppose you're a doctor and you want to know how much a flu vaccine will lower your patient's risk of getting infected. Then the subject-specific coefficient is what you want. On the other hand, if you're a public-health administrator and you want to know how the proportion of people who contract some disease will change if everyone is vaccinated, then the population-averaged coefficient might be more useful. But even in the latter case, there's a sense in which the subject-specific coefficient is more fundamental.

Suppose that the true model is the basic random effects logistic model of equation (3.5). The coefficient vectors β and γ are both subject-specific. But if we estimate the model with GEE using PROC GENMOD, we will get population-averaged coefficients β* and γ*. How much these coefficients differ depends on the variance of αi. Specifically, if Var(αi) = 0, then β = β* and γ = γ*. As Var(αi) increases, the elements of β* and γ* shrink toward 0 in absolute value. When αi has a normal distribution, the approximate relationship is
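
$$\beta^{*} \approx \frac{\beta}{\sqrt{1 + c^{2}\,\mathrm{Var}(\alpha_i)}}, \qquad \gamma^{*} \approx \frac{\gamma}{\sqrt{1 + c^{2}\,\mathrm{Var}(\alpha_i)}}, \qquad c^{2} = \left(\frac{16\sqrt{3}}{15\pi}\right)^{2} \approx 0.346$$
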
So the population-averaged coefficients depend on the degree of unobserved heterogeneity in the logistic regression model. Comparing Outputs 3.9 and 3.10, we find that this relationship does, in fact, hold approximately.
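To get a feel for the size of the attenuation, the factor can be tabulated. This sketch assumes the widely used approximation for a normally distributed αi in which each population-averaged coefficient is roughly the subject-specific coefficient divided by the square root of 1 + c²·Var(αi), with c = 16√3/(15π), so that c² ≈ 0.346. The variances in the DO list are hypothetical values, not estimates from the NLSY data, and the data set name is illustrative.

```sas
/* Hypothetical table of attenuation factors under the normal-alpha approximation */
DATA attenuation;
   c2 = (16*SQRT(3)/(15*CONSTANT('PI')))**2;   /* approximately 0.346            */
   DO var_alpha = 0, 0.5, 1, 2, 4, 8;          /* hypothetical values of Var(alpha) */
      factor = 1/SQRT(1 + c2*var_alpha);       /* ratio of PA to SS coefficient  */
      OUTPUT;
   END;
RUN;

PROC PRINT DATA=attenuation NOOBS;
   VAR var_alpha factor;
RUN;
```

Because the same factor applies to every coefficient, the ratio of each GEE coefficient to the corresponding random effects coefficient should be roughly constant across variables under this approximation, which is an informal way to check the relationship.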

If your main concern is with the statistical significance and relative importance of the variables in the model, the population-averaged results obtained with GEE estimation may be quite adequate. Given the computational economy of GENMOD, that may be the way to go. But if you want the best estimates of the magnitudes of the subject-specific effects, the coefficients produced by NLMIXED are preferable. Note also that coefficients for the fixed effects logistic regression model, as estimated by conditional logistic regression using PROC LOGISTIC, are subject-specific as well and thus are not attenuated by unobserved heterogeneity.
