One of PROC LIFEREG’s more useful features is its ability to handle left censoring and interval censoring. Recall that left censoring occurs when we know that an event occurred earlier than some time t, but we don’t know exactly when. Interval censoring occurs when the time of event occurrence is known to be somewhere between times a and b, but we don’t know exactly when. Left censoring can be seen as a special case of interval censoring in which a = 0; right censoring is a special case in which b=∞.
Interval-censored data are readily incorporated into the likelihood function. The contribution to the likelihood for an observation censored between a and b is just
$S_i(a) - S_i(b)$
where $S_i(\cdot)$ is the survivor function for observation $i$. (This difference is always positive because $S_i(t)$ is a decreasing function of $t$.) In words, the probability of an event occurring in the interval $(a, b)$ is the probability of an event occurring after $a$ minus the probability of it occurring after $b$. For left-censored data, $S_i(a) = 1$; for right-censored data, $S_i(b) = 0$.
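As a compact summary, the contributions for the four kinds of observations can be sketched as follows. The density $f_i$ and the subscripted times $t_i$, $a_i$, and $b_i$ are notation introduced here for illustration; only the interval-censored line is stated explicitly in the text above, and the other lines follow from it together with the usual density contribution for an exactly observed event time.

```latex
\begin{align*}
L_i &= f_i(t_i)            && \text{uncensored, event observed at } t_i \\
L_i &= S_i(a_i) - S_i(b_i) && \text{interval censored in } (a_i, b_i) \\
L_i &= 1 - S_i(b_i)        && \text{left censored at } b_i \text{, since } S_i(a_i) = 1 \\
L_i &= S_i(a_i)            && \text{right censored at } a_i \text{, since } S_i(b_i) = 0
\end{align*}
```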
PROC LIFEREG can handle any combination of left-censored, right-censored, and interval-censored data, but a different MODEL syntax is required if there are any left-censored or interval-censored observations. Instead of a time variable and a censoring variable, PROC LIFEREG needs two time variables, an upper time and a lower time. Let’s call them UPPER and LOWER. The MODEL statement then reads as follows:
MODEL (LOWER,UPPER)=list of covariates;
The censoring status is determined by whether the two values are equal and whether either is coded as missing data:
Status | Coding of LOWER and UPPER |
---|---|
Uncensored | LOWER and UPPER are both present and equal. |
Interval Censored | LOWER and UPPER are present and different. |
Right Censored | LOWER is present, but UPPER is missing. |
Left Censored | LOWER is missing, but UPPER is present. |
You might think that left censoring could also be indicated by coding LOWER as 0, but PROC LIFEREG excludes any observations with times that are 0 or negative. Observations are also excluded if both UPPER and LOWER are missing, or if LOWER > UPPER. Here are some examples:
Observation | Lower | Upper | Status |
---|---|---|---|
1 | 3.9 | 3.9 | Uncensored |
2 | 7.2 | . | Right Censored |
3 | 4.1 | 5.6 | Interval Censored |
4 | . | 2.0 | Left Censored |
5 | 0 | 5.8 | Excluded |
6 | 3.2 | 1.9 | Excluded |
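The coding rules can be checked before fitting a model with a short DATA step like the one below. This is only a sketch: the data set name STATUS_CHECK and the variables OBS and STATUS are made up for illustration, and the logic simply mirrors the rules described in the text rather than anything that PROC LIFEREG itself reports for each observation.

```sas
/* Illustrative only: STATUS_CHECK, OBS, and STATUS are hypothetical names. */
data status_check;
   input obs lower upper;
   length status $ 20;
   /* Exclusions: both times missing, a time that is 0 or negative, */
   /* or LOWER greater than UPPER.                                  */
   if missing(lower) and missing(upper) then status = 'Excluded';
   else if (not missing(lower) and lower <= 0)
        or (not missing(upper) and upper <= 0) then status = 'Excluded';
   else if not missing(lower) and not missing(upper)
        and lower > upper then status = 'Excluded';
   /* Remaining cases follow the LOWER/UPPER coding rules. */
   else if missing(lower) then status = 'Left Censored';
   else if missing(upper) then status = 'Right Censored';
   else if lower = upper then status = 'Uncensored';
   else status = 'Interval Censored';
   datalines;
1 3.9 3.9
2 7.2 .
3 4.1 5.6
4 . 2.0
5 0 5.8
6 3.2 1.9
;
run;

proc print data=status_check noobs;
run;
```

The explicit checks with the MISSING function matter because SAS treats a missing numeric value as smaller than any number, so an unguarded comparison such as LOWER <= 0 would also be true for a left-censored observation.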
Let’s look at a hypothetical example of left censoring for the recidivism data. Suppose that of the 114 arrests, the week of arrest was unavailable for 30 cases. In other words, we know that an arrest occurred between 0 and 52 weeks, but we don’t know when. To illustrate this, I modified the recidivism data by recoding the WEEK variable as missing for the first 30 arrests in the data set. Then, I estimated a Weibull model with the following program:
    data;
       set recidlft;
       /* uncensored cases: */
       if arrest=1 and week ne . then do;
          upper=week;
          lower=week;
       end;
       /* left-censored cases: */
       if arrest=1 and week = . then do;
          upper=52;
          lower=.;
       end;
       /* right-censored cases: */
       if arrest=0 then do;
          upper=.;
          lower=52;
       end;
    run;
    proc lifereg;
       model (lower,upper)=fin age race wexp mar paro prio
          / dist=weibull;
    run;
You should compare the results in Output 4.15 with those in Output 4.3, for which there were no left-censored cases. Although the results are quite similar, the chi-square statistics are nearly all smaller when some of the data are left censored. This is to be expected since left censoring entails some loss of information. Note that you cannot compare the log-likelihood for this model with the log-likelihood for the model with no left censoring. Whenever you alter the data, the log-likelihoods are no longer comparable.
    L I F E R E G   P R O C E D U R E

    Data Set          =WORK.DATA5
    Dependent Variable=Log(LOWER)
    Dependent Variable=Log(UPPER)
    Noncensored Values=      84    Right Censored Values=     318
    Left Censored Values=    30    Interval Censored Values=    0

    Log Likelihood for WEIBULL -304.7419068

    Variable  DF    Estimate   Std Err  ChiSquare  Pr>Chi  Label/Value
    INTERCPT   1   3.9656454  0.458605   74.77401  0.0001  Intercept
    FIN        1  0.29116771  0.151256   3.705626  0.0542
    AGE        1  0.04471953   0.01772   6.368647  0.0116
    RACE       1  -0.2312236  0.240799   0.922055  0.3369
    WEXP       1  0.12310271  0.166282   0.548081  0.4591
    MAR        1  0.31904485  0.300374   1.128183  0.2882
    PARO       1  0.07010297  0.152682   0.210814  0.6461
    PRIO       1  -0.0699893  0.022929   9.317426  0.0023
    SCALE      1  0.77869005  0.080329                     Extreme value scale parameter
Now consider an application of interval censoring. We can actually view the recidivism data as discrete since we only know the week of the arrest, not the exact day. Although Petersen (1991) has shown that some bias can result from treating discrete data as continuous, there’s probably no danger with 52 different values for the measurement of arrest time. Nevertheless, you can use the interval-censoring option to get a slightly improved estimate. For an arrest that occurs in week 2, the actual interval in which the arrest occurred is (1, 2). Similarly, the interval is (2, 3) for an arrest occurring in week 3. This suggests the following recoding of the data:
    data;
       set recid;
       /* interval-censored cases: */
       if arrest=1 then do;
          upper=week;
          lower=week-.9999;
       end;
       /* right-censored cases: */
       if arrest=0 then do;
          upper=.;
          lower=52;
       end;
    run;
    proc lifereg;
       model (lower,upper)=fin age race wexp mar paro prio
          / dist=weibull;
    run;
To get the lower value for the interval-censored cases, I subtract .9999 instead of 1 so that the result is not 0 for those persons with WEEK=1.
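To make the recoding concrete, here is a sketch of what each observation contributes to the likelihood under this coding. The symbols $w_i$ (recorded week of arrest), $\mathbf{x}_i$, $\boldsymbol{\beta}$, and $\sigma$ are notation introduced here for illustration. The survivor function shown is the standard form implied by a log-linear Weibull model, $\log T_i = \mathbf{x}_i\boldsymbol{\beta} + \sigma\varepsilon_i$ with $\varepsilon_i$ following a standard extreme value distribution, where $\sigma$ corresponds to the SCALE parameter.

```latex
\begin{align*}
S_i(t) &= \exp\!\left\{-\exp\!\left[\frac{\log t - \mathbf{x}_i\boldsymbol{\beta}}{\sigma}\right]\right\} \\[4pt]
L_i &= S_i(w_i - 0.9999) - S_i(w_i) && \text{arrest recorded in week } w_i \\
L_i &= S_i(52) && \text{no arrest by week 52}
\end{align*}
```

Because $\log t$ is undefined at $t = 0$, a lower bound of exactly 0 for WEEK=1 cannot be used, which is why the lower value is set just above 0 instead.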
The results in Output 4.16 are very close to those in Output 4.3, which assumed that time was measured exactly. If the intervals had been larger, we might have found more substantial differences. The magnitude of the log-likelihood is nearly doubled for the interval-censored version but, again, log-likelihoods are not comparable when the data are altered.
    L I F E R E G   P R O C E D U R E

    Data Set          =RECID
    Dependent Variable=Log(LOWER)
    Dependent Variable=Log(UPPER)
    Noncensored Values=       0    Right Censored Values=     318
    Left Censored Values=     0    Interval Censored Values=  114

    Log Likelihood for WEIBULL -680.9846402

    Variable  DF    Estimate   Std Err  ChiSquare  Pr>Chi  Label/Value
    INTERCPT   1  3.99060867  0.437423   83.22915  0.0001  Intercept
    FIN        1  0.28371327  0.143999   3.881861  0.0488
    AGE        1   0.0425244  0.016707   6.478366  0.0109
    RACE       1  -0.2342767  0.229784    1.03949  0.3079
    WEXP       1  0.11058711  0.158172   0.488822  0.4845
    MAR        1  0.32458763  0.285258   1.294761  0.2552
    PARO       1  0.06182479   0.14573    0.17998  0.6714
    PRIO       1  -0.0684968  0.021843   9.833993  0.0017
    SCALE      1  0.74354156  0.066542                     Extreme value scale parameter