5.2. Cox Regression

The most popular method for analyzing event history data is Cox regression, named after its inventor, David Cox (1972), who introduced the proportional hazards model and the partial likelihood method for estimating that model. Before we discuss fixed effects analysis, it's essential to review this method.

Rather than modeling the length of the interval directly, Cox regression takes as its dependent variable the hazard, or instantaneous likelihood of event occurrence. For repeated events, the hazard may be defined as follows. Let N_i(t) be the number of events that have occurred to individual i by time t. The hazard for individual i at time t is given by

$$h_i(t) = \lim_{\Delta t \to 0} \frac{\Pr\{N_i(t + \Delta t) - N_i(t) = 1\}}{\Delta t} \tag{5.1}$$

In words, this equation says to consider the probability of one additional event in some small interval of time Δt. Then form the ratio of this probability to Δt, and take the limit of this ratio as Δt goes to 0. For repeated events, the hazard function is also known as the intensity function.
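To make the limit concrete, here is a minimal Python sketch for the simpler case of a single, non-repeated event. It assumes, purely for illustration, that the waiting time follows a Weibull distribution with shape K = 1.5 and scale LAM = 3 (neither value comes from the text). The probability of an event in [t, t + Δt), given that none has occurred by t, divided by Δt, should approach the closed-form Weibull hazard as Δt shrinks.

```python
import math

# Illustrative assumption: one non-repeated event whose waiting time
# follows a Weibull distribution with shape K and scale LAM.
K, LAM = 1.5, 3.0


def survival(t: float) -> float:
    """Probability that the event has not occurred by time t."""
    return math.exp(-((t / LAM) ** K))


def exact_hazard(t: float) -> float:
    """Closed-form Weibull hazard, (K / LAM) * (t / LAM) ** (K - 1)."""
    return (K / LAM) * (t / LAM) ** (K - 1)


def approx_hazard(t: float, dt: float) -> float:
    """Pr(event in [t, t + dt) | no event before t), divided by dt."""
    prob = 1.0 - survival(t + dt) / survival(t)
    return prob / dt


if __name__ == "__main__":
    t = 2.0
    for dt in (1.0, 0.1, 0.01, 0.001):
        print(f"dt = {dt:<6} ratio = {approx_hazard(t, dt):.5f}")
    print(f"limit (exact hazard) = {exact_hazard(t):.5f}")
```

The printed ratios move toward the exact hazard as dt gets smaller, which is exactly the limiting operation described in words above.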

The next step is to model the hazard as a function of the predictor variables. Letting h_ik(t) be the hazard for the kth event for individual i, a proportional hazards model is given by

$$\log h_{ik}(t) = \mu\bigl(t - t_{i(k-1)}\bigr) + \beta x_{ik} \tag{5.2}$$

where x_ik is a column vector of predictor variables that may vary across individuals and across events, β is a row vector of coefficients, t_i(k−1) is the time of the (k−1)th event, and μ(.) is an unspecified function. In this model, the hazard of an event depends on the time since the most recent event. Later, we'll consider alternative ways of representing the dependence on time.
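A short Python sketch can show what "proportional" means in this model. The baseline function mu below is an arbitrary, hypothetical shape (the model itself leaves μ unspecified), and the two coefficients are borrowed from the MARRIED and MULTIPLE rows of Output 5.1 only for illustration. Because μ enters every woman's log-hazard in the same way, the ratio of two women's hazards at the same time since their last birth reduces to exp(β(x1 − x2)), whatever that gap is.

```python
import math


# Hypothetical baseline log-hazard mu(gap), where gap = t - t_i(k-1) is the
# time since the previous event. The model leaves mu unspecified; this shape
# is arbitrary and only for illustration.
def mu(gap: float) -> float:
    return -3.0 + 0.8 * math.log(gap) - 0.1 * gap


# Coefficients for (MARRIED, MULTIPLE), borrowed from Output 5.1.
BETA = (0.22320, -0.70661)


def log_hazard(gap: float, x: tuple) -> float:
    """log h_ik(t) = mu(t - t_i(k-1)) + beta * x_ik."""
    return mu(gap) + sum(b * xj for b, xj in zip(BETA, x))


def hazard_ratio(gap: float, x1: tuple, x2: tuple) -> float:
    """Ratio of the hazards of two covariate profiles at the same gap."""
    return math.exp(log_hazard(gap, x1) - log_hazard(gap, x2))


if __name__ == "__main__":
    married = (1.0, 0.0)     # married, singleton birth
    unmarried = (0.0, 0.0)   # unmarried, singleton birth
    for gap in (6.0, 24.0, 60.0):
        print(gap, round(hazard_ratio(gap, married, unmarried), 4))
```

Changing mu to any other function leaves every printed ratio untouched, which is why β can be estimated without knowing μ.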

The method of partial likelihood makes it possible to estimate β without specifying anything about the function μ. For details on how this is accomplished, see Allison (1995). In SAS, partial likelihood is implemented with PROC PHREG. Here's a program for estimating the model in (5.2), without incorporating fixed effects:
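The idea behind partial likelihood can be sketched in a few lines of Python. This is not PROC PHREG's implementation; it is a minimal one-covariate version that assumes no tied event times and treats each interval's duration (like DUR) as the time scale. Each observed event contributes exp(βx_i) divided by the sum of exp(βx_j) over everyone still at risk at that duration, so the baseline μ cancels and only the ordering of the durations matters. β is then found by Newton-Raphson. The function names are hypothetical.

```python
import math


def _risk_sums(beta, times, x, t):
    """S0, S1, S2 over the risk set {j : times[j] >= t}."""
    s0 = s1 = s2 = 0.0
    for tj, xj in zip(times, x):
        if tj >= t:
            w = math.exp(beta * xj)
            s0 += w
            s1 += w * xj
            s2 += w * xj * xj
    return s0, s1, s2


def partial_loglik(beta, times, events, x):
    """Cox partial log-likelihood, one covariate, no tied event times.
    Censored cases (event == 0) enter only through the risk sets."""
    ll = 0.0
    for ti, di, xi in zip(times, events, x):
        if di:
            s0, _, _ = _risk_sums(beta, times, x, ti)
            ll += beta * xi - math.log(s0)
    return ll


def score_info(beta, times, events, x):
    """First derivative (score) and observed information at beta."""
    u = info = 0.0
    for ti, di, xi in zip(times, events, x):
        if di:
            s0, s1, s2 = _risk_sums(beta, times, x, ti)
            u += xi - s1 / s0
            info += s2 / s0 - (s1 / s0) ** 2
    return u, info


def fit_cox(times, events, x, tol=1e-10, max_iter=50):
    """Newton-Raphson for beta; returns (beta_hat, model-based SE)."""
    beta = 0.0
    for _ in range(max_iter):
        u, info = score_info(beta, times, events, x)
        step = u / info
        beta += step
        if abs(step) < tol:
            break
    _, info = score_info(beta, times, events, x)
    return beta, 1.0 / math.sqrt(info)


if __name__ == "__main__":
    times, events, x = [1.0, 2.0, 3.0], [1, 1, 1], [0.0, 1.0, 0.0]
    b, se = fit_cox(times, events, x)
    print(f"beta_hat = {b:.6f} (hand-derived {0.5 * math.log(2):.6f}), SE = {se:.4f}")
    # Squaring every duration keeps the ordering, so the partial likelihood is unchanged.
    print(partial_loglik(0.3, times, events, x),
          partial_loglik(0.3, [t * t for t in times], events, x))
```

For this three-observation toy data set the maximizer can be worked out by hand as β = ½ log 2 ≈ 0.3466, and the last line shows that a monotone change of the time scale leaves the partial likelihood unchanged.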

PROC PHREG DATA=my.nsfg;
  MODEL dur*birth(0)=pregordr age married passt
        nobreast lbw caesar multiple college / TIES=EFRON;
RUN;

In the MODEL statement, the left-hand side of the equation is expressed as DUR*BIRTH(0). This form is necessary because many of the intervals were terminated by the interview rather than by another birth; in event history terminology, these are called censored intervals. The variable BIRTH indicates whether or not an interval is censored, and the number in parentheses (in this case 0) gives the value of that variable that corresponds to censored cases. The TIES=EFRON option requests Efron's approximation for handling tied event times, a slight technical change in the estimation method that I strongly recommend for routine use. See Allison (1995) for details.
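For readers curious about what TIES=EFRON changes, the Python sketch below compares the Breslow approximation (PROC PHREG's default, TIES=BRESLOW) with Efron's approximation for the partial log-likelihood of a single covariate. The only difference is in the denominators when d events share a time: Breslow divides each tied event by the full risk-set sum, while Efron removes a growing fraction j/d of the tied cases' own risk scores. The function name and toy data are illustrative assumptions.

```python
import math
from collections import defaultdict


def loglik_ties(beta, times, events, x, method="efron"):
    """Cox partial log-likelihood for one covariate, allowing tied event times.

    breslow: each of the d events tied at time t divides by the full
             risk-set sum S0(t).
    efron:   the j-th tied event (j = 0, ..., d - 1) divides by
             S0(t) - (j / d) * (sum of the tied events' own risk scores).
    """
    if method not in ("breslow", "efron"):
        raise ValueError(f"unknown method: {method}")
    tied = defaultdict(list)  # event time -> indices of events at that time
    for i, (t, d) in enumerate(zip(times, events)):
        if d:
            tied[t].append(i)
    ll = 0.0
    for t, idx in tied.items():
        d = len(idx)
        s0 = sum(math.exp(beta * xj) for tj, xj in zip(times, x) if tj >= t)
        s_tied = sum(math.exp(beta * x[i]) for i in idx)
        ll += beta * sum(x[i] for i in idx)
        for j in range(d):
            frac = j / d if method == "efron" else 0.0
            ll -= math.log(s0 - frac * s_tied)
    return ll


if __name__ == "__main__":
    b = math.log(2)
    # No ties: the two methods coincide.
    print(loglik_ties(b, [1, 2, 3], [1, 1, 1], [1, 0, 0], "breslow"),
          loglik_ties(b, [1, 2, 3], [1, 1, 1], [1, 0, 0], "efron"))
    # Two events tied at t = 1.
    print(loglik_ties(b, [1, 1, 2], [1, 1, 1], [1, 0, 0], "breslow"),
          loglik_ties(b, [1, 1, 2], [1, 1, 1], [1, 0, 0], "efron"))
```

With no ties the two methods agree exactly. In the tied toy example at β = log 2, Breslow gives log(1/8) and Efron gives log(1/5), so the choice matters only when tied durations are common.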

In Output 5.1, we see that 6,911 of the birth intervals were censored. That's not surprising, because the data collection method implies that each woman's last interval was terminated by the interview. Looking at the "Analysis of Maximum Likelihood Estimates," we find that all the variables but one (low birth weight) have highly significant effects on the hazard for a subsequent birth. Increased hazards are associated with being married or being on public assistance. All the other variables have negative signs.
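The counts in the "Summary of the Number of Event and Censored Values" panel follow directly from the BIRTH(0) coding. A minimal Python sketch, using a hypothetical helper named censoring_summary:

```python
def censoring_summary(birth):
    """Summarize intervals coded as in DUR*BIRTH(0): a BIRTH value of 0
    marks a censored interval; any other value marks an observed birth.
    Returns (total, events, censored, percent censored)."""
    total = len(birth)
    censored = sum(1 for b in birth if b == 0)
    events = total - censored
    return total, events, censored, round(100.0 * censored / total, 2)


if __name__ == "__main__":
    birth = [1] * 8021 + [0] * 6911  # counts taken from Output 5.1
    print(censoring_summary(birth))
```

Feeding it 8,021 events and 6,911 zeros reproduces the 14,932 intervals and 46.28 percent censored shown in Output 5.1.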

To get a more precise interpretation for the effect of each variable, it's helpful to look at the last column, labeled "Hazard Ratio." These numbers are the exponentiated values of the parameter estimates, and they are interpreted similarly to odds ratios in logistic regression. For example, MARRIED has a hazard ratio of 1.25. This means that women who are married at the time of a birth have a hazard for another birth that is 25% larger than the hazard for unmarried women (controlling for other variables in the model). The hazard ratio for MULTIPLE is .493, which means that if a woman has twins, the hazard for the next birth is cut in half. For AGE, the hazard ratio is .936, which means that each additional year of the mother's age reduces the hazard of a subsequent birth by 100(1 − .936) = 6.4%.
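The "Hazard Ratio" column is simply exp(estimate), and the percent change in the hazard for a one-unit increase in a predictor is 100(exp(β) − 1). The Python sketch below recomputes both for a subset of the rows in Output 5.1; the estimates are copied from that table, and the function names are illustrative.

```python
import math

# Parameter estimates copied from Output 5.1 (a subset of the rows).
ESTIMATES = {
    "pregordr": -0.16434,
    "married": 0.22320,
    "passt": 0.13824,
    "nobreast": -0.27190,
    "lbw": -0.00246,
    "caesar": -0.11706,
    "multiple": -0.70661,
    "college": -0.20844,
}


def hazard_ratio(b: float) -> float:
    """Hazard ratio for a one-unit increase in the predictor."""
    return math.exp(b)


def percent_change(b: float) -> float:
    """Percent change in the hazard for a one-unit increase."""
    return 100.0 * (math.exp(b) - 1.0)


if __name__ == "__main__":
    for name, b in ESTIMATES.items():
        print(f"{name:<9} HR = {hazard_ratio(b):.3f}   change = {percent_change(b):+.1f}%")
```

Rounded to three decimals, these reproduce the printed hazard ratios, for example 1.250 for MARRIED (about a 25% increase) and .493 for MULTIPLE (about a 51% decrease).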

Table 5.1. Output 5.1 Cox Regression Estimates for a Conventional Model
Model Information
Data Set	MY.NSFG
Dependent Variable	dur
Censoring Variable	birth
Censoring Value(s)	0
Ties Handling	EFRON

Summary of the Number of Event and Censored Values
Total	Event	Censored	Percent Censored
14932	8021	6911	46.28

Testing Global Null Hypothesis: BETA=0
Test	Chi-Square	DF	Pr>ChiSq
Likelihood Ratio	1702.5117	9	<.0001
Score	1607.4459	9	<.0001
Wald	1585.4544	9	<.0001

Analysis of Maximum Likelihood Estimates
Variable	DF	Parameter Estimate	Standard Error	Chi-Square	Pr>ChiSq	Hazard Ratio
pregordr	1	−0.16434	0.01150	204.0833	<.0001	0.848
age	1	−0.0006565	0.0000306	461.2265	<.0001	0.999
married	1	0.22320	0.02867	60.6010	<.0001	1.250
passt	1	0.13824	0.02868	23.2324	<.0001	1.148
nobreast	1	−0.27190	0.02332	135.9444	<.0001	0.762
lbw	1	−0.00246	0.04204	0.0034	0.9533	0.998
caesar	1	−0.11706	0.03054	14.6912	0.0001	0.890
multiple	1	−0.70661	0.14257	24.5635	<.0001	0.493
college	1	−0.20844	0.02598	64.3778	<.0001	0.812

Unfortunately, there's a potential problem with these results: 69% of the women contributed at least two birth intervals to the data set, and it's reasonable to suspect some dependence among these repeated observations. In particular, it's natural to suppose that some women have persistently short birth intervals, whereas others have persistently long ones. Failing to address this dependence could lead to serious underestimates of the standard errors and, consequently, p-values that are too small.

Fortunately, beginning with SAS 8.1, PROC PHREG includes an option called COVSANDWICH that makes it easy to correct for dependence when there are repeated observations. This option invokes a method variously known as the robust variance estimator or the modified sandwich estimator, developed for Cox regression by Lin and Wei (1989) and described in some detail in Therneau and Grambsch (2000). Here's a modified PHREG program that includes this option.

PROC PHREG DATA=my.nsfg COVSANDWICH(AGGREGATE);
   MODEL dur*birth(0)=pregordr age married passt
         nobreast lbw caesar multiple college / TIES=EFRON;
   ID caseid;
RUN;

The option COVSANDWICH can be abbreviated to COVS. To correct for dependence, it's necessary to include both the AGGREGATE option and an ID statement that gives the name of the variable containing the ID number that is common to all observations in the same "cluster" (a woman, in this example).
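Conceptually, the robust estimator can be written as D′D, where each row of D is one observation's dfbeta residual (its score residual multiplied by the inverse of the information matrix). What AGGREGATE adds is that the rows of D are first summed within each ID cluster. The one-coefficient Python sketch below illustrates only that step; the residual values and cluster labels are made up, and computing real dfbeta residuals is not shown.

```python
from collections import defaultdict


def robust_variance(dfbeta, cluster_ids=None):
    """Sandwich variance of a single coefficient written as D'D, where
    dfbeta[j] is observation j's dfbeta residual. If cluster_ids is given,
    residuals are first summed within each cluster (e.g., each woman)."""
    if cluster_ids is None:
        return sum(d * d for d in dfbeta)
    totals = defaultdict(float)
    for d, c in zip(dfbeta, cluster_ids):
        totals[c] += d
    return sum(s * s for s in totals.values())


if __name__ == "__main__":
    # Made-up residuals: woman "A" contributes two intervals, "B" and "C" one each.
    dfbeta = [0.10, 0.20, -0.05, 0.30]
    ids = ["A", "A", "B", "C"]
    print("unaggregated SE:", robust_variance(dfbeta) ** 0.5)
    print("aggregated SE:  ", robust_variance(dfbeta, ids) ** 0.5)
```

Because the aggregated total equals the unaggregated one plus twice the within-cluster cross-products, residuals that move together within a woman inflate the standard error, while residuals that offset each other can shrink it. That is how a corrected standard error can come out smaller than the uncorrected one.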

Results are shown in Output 5.2. Looking first at the "Testing Global Null Hypothesis" panel, we find that the score and Wald statistics now have two versions, "model-based" and "sandwich." The model-based chi-squares are the same as in Output 5.1, whereas the sandwich chi-squares have been adjusted for dependence among the observations. Clearly the adjustments have not been major.

In the "Analysis of Maximum Likelihood Estimates," we see that the coefficient estimates and the hazard ratios are exactly the same as in Output 5.1: robust variance estimation affects only the standard errors and associated statistics. The reported standard errors, chi-squares, and p-values are all adjusted for dependence. We also get a new column, "StdErr Ratio," which is the ratio of the corrected standard errors to the uncorrected standard errors in Output 5.1. For the most part, the corrections here are rather small. The one exception is the corrected standard error for PREGORDR, which is 37% larger than its uncorrected version, resulting in a corrected chi-square that is only about half the uncorrected statistic (108.8 versus 204.1). It's still highly significant, however.
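The StdErr Ratio and chi-square columns can be checked by hand: the ratio is the sandwich standard error divided by the model-based one, and each Wald chi-square is (estimate/SE)². Here is a Python sketch for PREGORDR using the values printed in Outputs 5.1 and 5.2. Because those printed values are rounded, the results agree with the printed statistics only to about three significant digits.

```python
# PREGORDR values copied from Outputs 5.1 and 5.2 (printed, hence rounded).
ESTIMATE = -0.16434
SE_MODEL = 0.01150    # model-based standard error (Output 5.1)
SE_ROBUST = 0.01575   # sandwich standard error (Output 5.2)


def wald_chisq(b: float, se: float) -> float:
    """Wald chi-square with 1 df: (estimate / standard error) squared."""
    return (b / se) ** 2


ratio = SE_ROBUST / SE_MODEL
chi_model = wald_chisq(ESTIMATE, SE_MODEL)
chi_robust = wald_chisq(ESTIMATE, SE_ROBUST)

if __name__ == "__main__":
    print(f"StdErr ratio:     {ratio:.3f}")
    print(f"chi-square:       {chi_model:.1f} -> {chi_robust:.1f}")
    print(f"chi-square ratio: {chi_robust / chi_model:.3f} (= 1 / ratio**2)")
```

Since the estimate itself does not change, the chi-square shrinks by exactly the square of the StdErr Ratio, which is why a 37% larger standard error roughly halves the chi-square.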

Table 5.2. Output 5.2 Cox Regression with Robust Variance Estimation
Testing Global Null Hypothesis: BETA=0
Test	Chi-Square	DF	Pr>ChiSq
Likelihood Ratio	1702.5117	9	<.0001
Score (Model-Based)	1607.4459	9	<.0001
Score (Sandwich)	1503.1134	9	<.0001
Wald (Model-Based)	1585.4544	9	<.0001
Wald (Sandwich)	1575.6025	9	<.0001

Analysis of Maximum Likelihood Estimates
Variable	DF	Parameter Estimate	Standard Error	StdErr Ratio	Chi-Square	Pr>ChiSq	Hazard Ratio
pregordr	1	−0.16434	0.01575	1.369	108.8409	<.0001	0.848
age	1	−0.0006565	0.0000310	1.016	447.0998	<.0001	0.999
married	1	0.22320	0.02942	1.026	57.5589	<.0001	1.250
passt	1	0.13824	0.02952	1.029	21.9242	<.0001	1.148
nobreast	1	−0.27190	0.02275	0.975	142.8908	<.0001	0.762
lbw	1	−0.00246	0.04298	1.022	0.0033	0.9543	0.998
caesar	1	−0.11706	0.02792	0.914	17.5824	<.0001	0.890
multiple	1	−0.70661	0.14371	1.008	24.1746	<.0001	0.493
college	1	−0.20844	0.02615	1.007	63.5451	<.0001	0.812
