5.3. Cox Regression with Fixed Effects

Now we're ready to introduce fixed effects into the Cox regression model. As usual, this makes it possible to control for all stable predictor variables, while at the same time addressing the problem of dependence among the repeated observations. As in earlier fixed effects models, αi represents the combined effects of all stable covariates:

$$\log h_{ij}(t) = \mu(t) + \beta x_{ij} + \alpha_i \qquad (5.3)$$

How can we estimate equation (5.3) for our birth interval data? One obvious possibility is to put dummy variables in the model for all women (except one). This method worked well for the Poisson and negative binomial regression models in chapter 4, but it runs into serious difficulties here. First, there is the practical problem of putting 6,910 dummy variables into a PROC PHREG model. I actually tried to do this, but my computer was still running after 10 days, at which point I terminated the job. In principle, such computational difficulties could be solved by using Greene's (2001) algorithms, but these are not currently available in any commercial software.

The more fundamental difficulty is the potential bias introduced by estimating so many "incidental parameters." In previous chapters, we saw that this bias could be quite serious for logistic regression models, but not for Poisson or negative binomial models. Elsewhere (Allison 2002), I've shown that Cox regression is more like logistic regression in this regard. When the average number of intervals per person is fewer than three, regression coefficients are inflated by approximately 30 to 90%, depending on the level of censoring (a higher proportion of censored cases produces greater inflation).

Fortunately, there is a simple alternative method that does the job very well. It's similar to the conditional likelihood methods used for both logistic and Poisson regression in that the coefficients for the dummy variables are not actually estimated but are eliminated from the likelihood function. First we modify equation (5.3) by defining

$$\alpha_i(t) = \mu(t) + \alpha_i$$

which yields

$$\log h_{ij}(t) = \alpha_i(t) + \beta x_{ij} \qquad (5.4)$$

In this equation, the fixed effect αi has been absorbed into the unspecified function of time, which is now allowed to vary from one individual to another. Thus, each individual has her own baseline hazard function, which is considerably less restrictive than merely allowing each individual to have her own constant.
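To see why the individual-specific functions αi(t) cause no trouble, it helps to write out the partial likelihood for model (5.4) when each woman is her own stratum. The following is a sketch that assumes no tied event times; the symbols δij (1 if interval j of woman i ends in a birth, 0 if it is censored), tij (the length of that interval), and Ri(t) (the set of woman i's intervals still at risk at duration t) are introduced here for illustration only. Because every interval in a woman's risk set shares the same αi(t), that term cancels from numerator and denominator, leaving a function of β alone:

$$PL(\beta) = \prod_{i}\prod_{j:\,\delta_{ij}=1} \frac{\exp[\alpha_i(t_{ij}) + \beta x_{ij}]}{\sum_{k \in R_i(t_{ij})} \exp[\alpha_i(t_{ij}) + \beta x_{ik}]} = \prod_{i}\prod_{j:\,\delta_{ij}=1} \frac{\exp(\beta x_{ij})}{\sum_{k \in R_i(t_{ij})} \exp(\beta x_{ik})}$$

Only comparisons among a woman's own intervals enter each factor, so only within-woman variation in the covariates carries information about β.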

Model (5.4) can be estimated by partial likelihood using the well-known method of stratification. Stratification allows different subgroups to have different baseline hazard functions, while constraining the coefficients to be the same across subgroups. It is accomplished by constructing a partial likelihood function for each subgroup, multiplying those likelihood functions together, and then maximizing the resulting likelihood function with respect to the coefficient vector β. In PHREG, stratification is implemented with the STRATA statement. Here's how it's done for the birth interval data:

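/* Fixed effects Cox regression: a separate baseline hazard for each woman */
PROC PHREG DATA=my.nsfg NOSUMMARY;
   /* DUR is the time variable; BIRTH=0 marks censored intervals */
   MODEL dur*birth(0)= pregordr age married passt nobreast lbw
         caesar multiple college / TIES=EFRON;
   STRATA caseid;   /* one stratum per woman */
RUN;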

The statement STRATA CASEID creates a separate stratum for each value of CASEID, which means a separate stratum for each of the 6,911 women. That may seem like an enormous number of strata, but PHREG handles it with ease. The NOSUMMARY option is not essential, but it's strongly advised in order to avoid voluminous, uninformative output. If you don't include it, the output contains a line for each stratum, reporting the numbers of cases and events for that stratum.
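What stratification does can be illustrated with a small Python sketch of the stratified partial likelihood. This is not PHREG's implementation: it uses a single covariate, a made-up toy data set (the tuples in DATA are hypothetical, not NSFG records), the Breslow form of the likelihood, and assumes no tied event times (with no ties, the Efron and Breslow methods coincide). Each woman's risk sets contain only her own intervals, and β is found by Newton-Raphson.

```python
import math
from collections import defaultdict

# Hypothetical toy data (NOT from the NSFG file): one tuple per birth interval,
# (woman_id, duration, event, x), where event=1 if the interval ended in a birth
# and event=0 if it was censored. Event times are untied within each woman.
DATA = [
    (1, 20.0, 1, 0.0), (1, 35.0, 1, 1.0), (1, 50.0, 0, 1.0),
    (2, 15.0, 1, 1.0), (2, 30.0, 1, 0.0),
    (3, 25.0, 1, 1.0), (3, 40.0, 1, 0.0), (3, 45.0, 0, 0.0),
    (4, 60.0, 0, 1.0),  # a single censored interval: contributes nothing
]


def stratified_loglik(beta, data):
    """Stratified Cox partial log-likelihood for one covariate (Breslow form,
    assuming no tied event times). Returns (loglik, score, information)."""
    strata = defaultdict(list)
    for woman, t, d, x in data:
        strata[woman].append((t, d, x))
    loglik = score = info = 0.0
    for rows in strata.values():          # one stratum per woman
        for t_j, d_j, x_j in rows:
            if d_j == 0:                  # censored intervals add no factor of
                continue                  # their own, but stay in risk sets
            # Risk set: this woman's own intervals still at risk at t_j
            risk = [x for t, _, x in rows if t >= t_j]
            w = [math.exp(beta * x) for x in risk]
            s0 = sum(w)
            s1 = sum(wi * x for wi, x in zip(w, risk))
            s2 = sum(wi * x * x for wi, x in zip(w, risk))
            loglik += beta * x_j - math.log(s0)
            score += x_j - s1 / s0
            info += s2 / s0 - (s1 / s0) ** 2
    return loglik, score, info


def fit(data, beta=0.0, tol=1e-10, max_iter=50):
    """Maximize the stratified partial likelihood by Newton-Raphson."""
    for _ in range(max_iter):
        _, u, i = stratified_loglik(beta, data)
        step = u / i
        beta += step
        if abs(step) < tol:
            break
    return beta


if __name__ == "__main__":
    beta_hat = fit(DATA)
    print(f"beta_hat = {beta_hat:.4f}, HR = {math.exp(beta_hat):.3f}")
    # Adding a woman-specific constant to x (a stable difference between women)
    # leaves the stratified partial likelihood unchanged.
    shifted = [(s, t, d, x + 5.0 if s == 2 else x) for s, t, d, x in DATA]
    print(stratified_loglik(0.3, DATA)[0], stratified_loglik(0.3, shifted)[0])
```

The last two printed values agree, which is the sense in which stable characteristics are absorbed into each woman's stratum. Likewise, woman 4 (one censored interval) adds nothing to the likelihood, and a woman with a single interval that ends in a birth contributes a factor of exactly 1.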

The results in Output 5.3 show some noteworthy differences from those in Outputs 5.1 and 5.2. First, nothing is reported for COLLEGE: as with most of our fixed effects methods, we can't estimate coefficients for variables that do not vary within persons. Moving upward from COLLEGE, we see that the effect of a multiple birth is about the same as the previous estimates. But the coefficient for CAESAR is somewhat attenuated and no longer statistically significant. Low birth weight was previously far from statistically significant, but here the p-value is less than .01. The hazard ratio for LBW tells us that a low birth weight is associated with a 21% reduction in the hazard for a subsequent birth. The effect of breast feeding is attenuated, both in magnitude and significance. Public assistance was previously highly significant, but here it's not significant at all. The effect of marital status is about the same. Age is no longer statistically significant. On the other hand, the effect of pregnancy order is much greater, both in magnitude and statistical significance. Each additional birth is associated with about a 50% reduction in the hazard for a subsequent birth.
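The percentage reductions quoted above come from the hazard ratio exp(b): a hazard ratio of 0.789 means the hazard is 100 × (0.789 − 1) ≈ −21% different for a one-unit increase in the covariate. A minimal Python sketch of that arithmetic, using coefficients copied from Output 5.3 (the function name percent_change is just an illustrative label):

```python
import math


def percent_change(beta):
    """Percent change in the hazard for a one-unit increase in a covariate,
    computed from a Cox coefficient as 100 * (exp(beta) - 1)."""
    return 100.0 * (math.exp(beta) - 1.0)


# Coefficients copied from Output 5.3
estimates = {"pregordr": -0.71663, "lbw": -0.23642, "multiple": -0.60731}
for name, b in estimates.items():
    print(f"{name}: HR = {math.exp(b):.3f}, change in hazard = {percent_change(b):.1f}%")
```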

Table 5.3. Output 5.3 Cox Regression with Fixed Effects via Stratification
Testing Global Null Hypothesis: BETA=0

Test              Chi-Square   DF   Pr > ChiSq
Likelihood Ratio   2640.9583    8       <.0001
Score              2293.6193    8       <.0001
Wald               1855.3631    8       <.0001

Analysis of Maximum Likelihood Estimates

Variable   DF   Parameter Estimate   Standard Error   Chi-Square   Pr > ChiSq   Hazard Ratio
pregordr    1             −0.71663          0.03372     451.7316       <.0001          0.488
age         1            0.0000818        0.0001125       0.5285       0.4672          1.000
married     1              0.18307          0.06958       6.9219       0.0085          1.201
passt       1              0.07590          0.06863       1.2229       0.2688          1.079
nobreast    1             −0.12832          0.06047       4.5035       0.0338          0.880
lbw         1             −0.23642          0.08117       8.4832       0.0036          0.789
caesar      1             −0.07839          0.09272       0.7148       0.3979          0.925
multiple    1             −0.60731          0.21852       7.7240       0.0054          0.545
college     0                    0                .            .            .              .
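The columns of this output are linked by simple formulas: the chi-square is the squared ratio of the estimate to its standard error, its p-value comes from a chi-square distribution with 1 degree of freedom, and the hazard ratio is exp(estimate). A small Python check, applied here to the LBW row (wald_test is a hypothetical helper, not a SAS feature); tiny discrepancies in the last digit arise because PHREG works from unrounded estimates:

```python
import math


def wald_test(estimate, std_err):
    """Return the 1-df Wald chi-square, its p-value, and the hazard ratio."""
    z = estimate / std_err
    chi_sq = z * z
    # For 1 df, P(chi-square > z^2) equals the two-sided normal tail
    # probability, which is erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return chi_sq, p_value, math.exp(estimate)


chi_sq, p, hr = wald_test(-0.23642, 0.08117)  # LBW row of Output 5.3
print(f"chi-square = {chi_sq:.2f}, p = {p:.4f}, hazard ratio = {hr:.3f}")
```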

Why the differences? Well, like any fixed effects method, this one controls for all stable covariates, so it's possible that some of the earlier results in Output 5.2 were spurious. Thus, if I had to choose between the results in Output 5.2 and Output 5.3, I would emphatically choose the latter. The thing to keep in mind is that, in this analysis, each woman is compared with herself across her different birth intervals. For each woman, we're asking why some of her birth intervals are longer or shorter than others. Is it, for example, because she's married for some of the intervals and not for others? This approach will produce different answers from asking why some women tend to have longer birth intervals than other women.

This is particularly relevant to the PREGORDR variable. In a conventional Cox regression, this variable is likely to have a positive effect on the hazard for purely artifactual reasons. Within a fixed window of time, women who make it to higher numbers of births in that window will necessarily have shorter birth intervals. By doing a fixed effects analysis, we are able to remove that artifact, which is why the negative coefficient becomes so much larger in magnitude than before.

As with linear and logistic models, even though the fixed effects Cox model will not estimate the effects of time-invariant covariates like COLLEGE, it is possible to estimate interactions between time-invariant variables and other variables. For example, let's estimate a model that includes an interaction between COLLEGE and NOBREAST. Since PROC PHREG does not allow interactions to be directly specified in the MODEL statement,[1] it's necessary to create a new variable in a DATA step:

[1] In SAS 9, there is an experimental procedure called TPHREG that essentially duplicates PHREG with the addition of the CLASS statement. With this procedure, one can directly specify interactions on the MODEL statement.

/* Create the COLLEGE x NOBREAST interaction in a DATA step */
DATA nsfg2;
   SET my.nsfg;
   collbreast=college*nobreast;
RUN;

PROC PHREG DATA=nsfg2 NOSUMMARY;
   MODEL dur*birth(0)=pregordr age married passt nobreast lbw
         caesar multiple collbreast / TIES=EFRON;
   STRATA caseid;
RUN;

Results are in Output 5.4. The interaction between COLLEGE and NOBREAST is statistically significant at the .05 level. But how is it interpreted? The main effect of NOBREAST represents the effect of this variable when COLLEGE=0, that is, among women without a college education. That coefficient is positive and far from statistically significant. The effect of NOBREAST among college-educated women is found by adding the interaction coefficient to the main effect: .0421 + (−.2659) = −.22, which is statistically significant. Since NOBREAST indicates that the baby was not breast fed, this negative coefficient means that breast feeding increases the hazard of a subsequent birth among college-educated women, but not among other women.
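The arithmetic for this conditional effect can be written out as a short Python sketch. The point estimate needs only the two coefficients in Output 5.4; its standard error would also need the covariance between the NOBREAST and COLLBREAST estimates, which the output does not show (PHREG prints the estimated covariance matrix when the COVB option is added to the MODEL statement). The helper name conditional_effect is hypothetical, and no covariance value is assumed below.

```python
import math


def conditional_effect(b_main, b_inter, var_main=None, var_inter=None, cov=None):
    """Effect of NOBREAST among college-educated women (COLLEGE=1).

    The log hazard contains b_main*NOBREAST + b_inter*COLLEGE*NOBREAST, so when
    COLLEGE=1 the NOBREAST coefficient is b_main + b_inter. Its standard error is
    sqrt(Var(b_main) + Var(b_inter) + 2*Cov(b_main, b_inter)), computed only when
    all three (co)variances are supplied; otherwise None is returned for it."""
    effect = b_main + b_inter
    se = None
    if var_main is not None and var_inter is not None and cov is not None:
        se = math.sqrt(var_main + var_inter + 2.0 * cov)
    return effect, se


# Coefficients for NOBREAST and COLLBREAST copied from Output 5.4
effect, _ = conditional_effect(0.04210, -0.26590)
print(f"NOBREAST effect when COLLEGE=1: {effect:.4f}, hazard ratio {math.exp(effect):.3f}")
```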

Table 5.4. Output 5.4 Fixed Effects Cox Regression with Interaction
Testing Global Null Hypothesis: BETA=0

Test              Chi-Square   DF   Pr > ChiSq
Likelihood Ratio   2645.5030    9       <.0001
Score              2296.1533    9       <.0001
Wald               1857.0837    9       <.0001

Analysis of Maximum Likelihood Estimates

Variable   DF   Parameter Estimate   Standard Error   Chi-Square   Pr > ChiSq   Hazard Ratio
pregordr    1             −0.71740          0.03373     452.4115       <.0001          0.488
age         1            0.0000777        0.0001126       0.4762       0.4901          1.000
married     1              0.18341          0.06955       6.9552       0.0084          1.201
passt       1              0.07536          0.06861       1.2064       0.2720          1.078
nobreast    1              0.04210          0.10019       0.1766       0.6744          1.043
lbw         1             −0.24208          0.08120       8.8879       0.0029          0.785
caesar      1             −0.07869          0.09277       0.7195       0.3963          0.924
multiple    1             −0.58926          0.21909       7.2334       0.0072          0.555
collbreast  1             −0.26590          0.12479       4.5402       0.0331          0.767
