5.2. Cox Regression

The most popular method for analyzing event history data is Cox regression, named after its inventor, David Cox (1972), who introduced the proportional hazards model and the partial likelihood method for estimating that model. Before we discuss fixed effects analysis, it's essential to review this method.

Rather than directly modeling the length of the interval, Cox regression takes as its dependent variable the hazard, or instantaneous likelihood of event occurrence. For repeated events, the hazard may be defined as follows. Let Ni(t) be the number of events that have occurred to individual i by time t. The hazard for individual i at time t is given by

$$h_i(t) = \lim_{\Delta t \to 0} \frac{\Pr[N_i(t + \Delta t) - N_i(t) = 1]}{\Delta t} \tag{5.1}$$

In words, this equation says to consider the probability of one additional event in some small interval of time Δt. Then form the ratio of this probability to Δt, and take the limit of this ratio as Δt goes to 0. For repeated events, the hazard function is also known as the intensity function.

The next step is to model the hazard as a function of the predictor variables. Letting hik(t) be the hazard for the kth event for individual i, a proportional hazards model is given by

$$\log h_{ik}(t) = \mu(t - t_{i(k-1)}) + \beta x_{ik} \tag{5.2}$$

where xik is a column vector of predictor variables that may vary across individuals and across events, β is a row vector of coefficients, ti(k−1) is the time of the (k−1)th event, and μ(.) is an unspecified function. In this model, the hazard of an event depends on the time since the most recent event. Later, we'll consider alternative ways of representing the dependence on time.
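
To see why a model of this kind is called a proportional hazards model, it can help to exponentiate it. The following is a sketch that assumes the log-linear form log hik(t) = μ(t − ti(k−1)) + βxik; the symbols u (time elapsed since the most recent event) and x, x′ (two predictor vectors) are notation introduced here only for illustration.

```latex
% Exponentiated form of the model
h_{ik}(t) = \exp\{\mu(t - t_{i(k-1)})\}\,\exp(\beta x_{ik})

% Ratio of hazards for two intervals compared at the same elapsed time u
\frac{h(u \mid x)}{h(u \mid x')}
  = \frac{e^{\mu(u)}\, e^{\beta x}}{e^{\mu(u)}\, e^{\beta x'}}
  = \exp\{\beta\,(x - x')\}
```

Under these assumptions the ratio does not depend on u, and the function μ cancels out of it. That cancellation is the intuition behind being able to estimate β while leaving μ unspecified.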

The method of partial likelihood makes it possible to estimate β without specifying anything about the function μ. For details on how this is accomplished, see Allison (1995). In SAS, partial likelihood is implemented with PROC PHREG. Here's a program for estimating the model in (5.2), without incorporating fixed effects:

PROC PHREG DATA=my.nsfg;
  MODEL dur*birth(0)=pregordr age married passt
        nobreast lbw caesar multiple college / TIES=EFRON;
RUN;

In the MODEL statement, the left-hand side of the equation is expressed as DUR*BIRTH(0), which is necessary to allow for the fact that many of the intervals are terminated by the interview rather than by another birth. In event history terminology, these are called censored intervals. The variable BIRTH indicates whether or not an interval is censored, and the number in parentheses (in this case 0) gives the value of the variable that corresponds to censored cases. The TIES=EFRON option requests a slight technical change in the estimation method that I strongly recommend for routine use. See Allison (1995) for details.
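
As an illustration of how a duration variable and a censoring indicator of this kind might be built, here is a minimal sketch of a DATA step. It is not the code used to create MY.NSFG. The data set names RAWINTERVALS and INTERVALS and the variables START, NEXTBIRTH, and INTDATE are hypothetical, and the sketch assumes one record per birth interval with dates stored as numeric values in a common time unit.

```sas
/* Hypothetical input, one record per birth interval:           */
/*   start     = date of the birth that opens the interval      */
/*   nextbirth = date of the next birth (missing if none)       */
/*   intdate   = date of the interview                          */
/* All names here are assumptions made for illustration.        */
DATA intervals;
  SET rawintervals;
  IF nextbirth NE . THEN DO;
    birth = 1;                   /* interval ended with a birth  */
    dur   = nextbirth - start;   /* observed interval length     */
  END;
  ELSE DO;
    birth = 0;                   /* censored by the interview    */
    dur   = intdate - start;     /* time observed until censoring */
  END;
RUN;
```

With variables coded this way, the specification DUR*BIRTH(0) tells PROC PHREG that records with BIRTH=0 contribute only the information that no birth occurred before time DUR.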

In Output 5.1, we see that 6,911 of the birth intervals were censored. That's not surprising, because the data collection method implies that each woman's last interval was terminated by the interview. Looking at the "Analysis of Maximum Likelihood Estimates," we find that all the variables but one (low birth weight) have highly significant effects on the hazard for a subsequent birth. Increased hazards are associated with being married or being on public assistance. All the other variables have negative signs.

To get a more precise interpretation for the effect of each variable, it's helpful to look at the last column, labeled "Hazard Ratio." These numbers are the exponentiated values of the parameter estimates, and they are interpreted similarly to odds ratios in logistic regression. For example, MARRIED has a hazard ratio of 1.25. This means that women who are married at the time of a birth have a hazard for another birth that is 25% larger than the hazard for unmarried women (controlling for other variables in the model). The hazard ratio for MULTIPLE is .493, which means that if a woman has twins, the hazard for the next birth is cut in half. For AGE, the hazard ratio per additional year of the mother's age is .936, which means that each additional year reduces the hazard of a subsequent birth by 100(1 − .936) = 6.4%. (Output 5.1 itself shows .999 for AGE, the ratio for a one-unit change in the variable as coded; .936 equals exp(100 × −0.0006565), which suggests that AGE is recorded in hundredths of a year.)
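
The arithmetic behind the "Hazard Ratio" column can be checked by hand. The following sketch uses the parameter estimates for MARRIED and MULTIPLE reported in Output 5.1; the notation HR_j and \hat\beta_j (the hazard ratio and estimated coefficient for variable j) is introduced here for illustration.

```latex
\mathrm{HR}_j = \exp(\hat\beta_j), \qquad
\text{percent change in hazard} = 100\,(\mathrm{HR}_j - 1)

\begin{aligned}
\text{MARRIED:}  \quad & \exp(0.22320) \approx 1.250
                   && \Rightarrow\; 100\,(1.250 - 1) = 25\% \\
\text{MULTIPLE:} \quad & \exp(-0.70661) \approx 0.493
                   && \Rightarrow\; 100\,(0.493 - 1) \approx -51\%
\end{aligned}
```

A hazard ratio above 1 therefore translates into a percentage increase in the hazard, and a ratio below 1 into a percentage decrease.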

Table 5.1. Output 5.1 Cox Regression Estimates for a Conventional Model

Model Information

Data Set              MY.NSFG
Dependent Variable    dur
Censoring Variable    birth
Censoring Value(s)    0
Ties Handling         EFRON

Summary of the Number of Event and Censored Values

Total    Event    Censored    Percent Censored
14932     8021        6911               46.28

Testing Global Null Hypothesis: BETA=0

Test                Chi-Square    DF    Pr > ChiSq
Likelihood Ratio     1702.5117     9        <.0001
Score                1607.4459     9        <.0001
Wald                 1585.4544     9        <.0001

Analysis of Maximum Likelihood Estimates

                  Parameter     Standard                                 Hazard
Variable    DF     Estimate        Error    Chi-Square    Pr > ChiSq      Ratio
pregordr     1     −0.16434      0.01150      204.0833        <.0001      0.848
age          1   −0.0006565    0.0000306      461.2265        <.0001      0.999
married      1      0.22320      0.02867       60.6010        <.0001      1.250
passt        1      0.13824      0.02868       23.2324        <.0001      1.148
nobreast     1     −0.27190      0.02332      135.9444        <.0001      0.762
lbw          1     −0.00246      0.04204        0.0034        0.9533      0.998
caesar       1     −0.11706      0.03054       14.6912        0.0001      0.890
multiple     1     −0.70661      0.14257       24.5635        <.0001      0.493
college      1     −0.20844      0.02598       64.3778        <.0001      0.812

Unfortunately, there's a potential problem with these results: 69% of the women contributed at least two birth intervals to the data set, and it's reasonable to suspect that there would be some dependence among these repeated observations. In particular, it's natural to suppose that some women have persistently short birth intervals, whereas others have persistently long intervals. The failure to address this dependence could lead to serious underestimates of the standard errors and p-values.

Fortunately, beginning with SAS 8.1, PROC PHREG includes an option called COVSANDWICH that makes it easy to correct for dependence when there are repeated observations. This option invokes a method variously known as the robust variance estimator or the modified sandwich estimator, developed for Cox regression by Lin and Wei (1989) and described in some detail in Therneau and Grambsch (2000). Here's a modified PHREG program that includes this option.

PROC PHREG DATA=my.nsfg COVSANDWICH(AGGREGATE);
   MODEL dur*birth(0)=pregordr age married passt
         nobreast lbw caesar multiple college / TIES=EFRON;
   ID caseid;
RUN;

The option COVSANDWICH can be abbreviated to COVS. To correct for dependence, it's necessary to include both the AGGREGATE option and an ID statement that gives the name of the variable containing the ID number that is common to all observations in the same "cluster" (a woman, in this example).

Results are shown in Output 5.2. Looking first at the "Testing Global Null Hypothesis" panel, we find the score and Wald statistics now have two versions, "model-based" and "sandwich." The model-based chi-squares are the same as in Output 5.1, whereas the sandwich chi-squares have been adjusted for dependence among the observations. Clearly the adjustments have not been major. In the "Analysis of Maximum Likelihood Estimates," we see that the coefficient estimates and the hazard ratios are exactly the same as in Output 5.1. Robust variance estimation only affects the standard errors and associated statistics. The reported standard errors, chi-squares, and p-values are all adjusted for dependence. We also get a new column "StdErr Ratio," which is the ratio of the corrected standard errors to the uncorrected standard errors in Output 5.1. For the most part, the corrections here are rather small. The one exception is the corrected standard error for PREGORDR, which is 37% larger than its uncorrected version, resulting in a corrected chi-square that is only about half the uncorrected statistic. It's still highly significant, however.
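
The figures quoted for the first-listed variable, pregordr, can be verified from the two outputs. The sketch below assumes that each single-coefficient chi-square is the usual Wald statistic, the squared ratio of the estimate to its standard error. Small discrepancies from the printed values arise because the standard errors are displayed to only five decimal places.

```latex
\begin{aligned}
\text{StdErr Ratio} &= \frac{0.01575}{0.01150} \approx 1.37 \\[4pt]
\chi^2_{\text{corrected}} &= \left(\frac{-0.16434}{0.01575}\right)^{2} \approx 108.9
   \quad (\text{reported: } 108.8409) \\[4pt]
\frac{\chi^2_{\text{corrected}}}{\chi^2_{\text{uncorrected}}}
   &= \frac{108.8409}{204.0833} \approx 0.53 \approx \frac{1}{1.369^{2}}
\end{aligned}
```

Because the coefficient itself is unchanged, inflating the standard error by a factor of about 1.37 divides the chi-square by about 1.37 squared, or roughly 1.9.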

Table 5.2. Output 5.2 Cox Regression with Robust Variance Estimation

Testing Global Null Hypothesis: BETA=0

Test                   Chi-Square    DF    Pr > ChiSq
Likelihood Ratio        1702.5117     9        <.0001
Score (Model-Based)     1607.4459     9        <.0001
Score (Sandwich)        1503.1134     9        <.0001
Wald (Model-Based)      1585.4544     9        <.0001
Wald (Sandwich)         1575.6025     9        <.0001

Analysis of Maximum Likelihood Estimates

                  Parameter     Standard    StdErr                                 Hazard
Variable    DF     Estimate        Error     Ratio    Chi-Square    Pr > ChiSq      Ratio
pregordr     1     −0.16434      0.01575     1.369      108.8409        <.0001      0.848
age          1   −0.0006565    0.0000310     1.016      447.0998        <.0001      0.999
married      1      0.22320      0.02942     1.026       57.5589        <.0001      1.250
passt        1      0.13824      0.02952     1.029       21.9242        <.0001      1.148
nobreast     1     −0.27190      0.02275     0.975      142.8908        <.0001      0.762
lbw          1     −0.00246      0.04298     1.022        0.0033        0.9543      0.998
caesar       1     −0.11706      0.02792     0.914       17.5824        <.0001      0.890
multiple     1     −0.70661      0.14371     1.008       24.1746        <.0001      0.493
college      1     −0.20844      0.02615     1.007       63.5451        <.0001      0.812
