Left Censoring and Interval Censoring

One of PROC LIFEREG’s more useful features is its ability to handle left censoring and interval censoring. Recall that left censoring occurs when we know that an event occurred earlier than some time t, but we don’t know exactly when. Interval censoring occurs when the time of event occurrence is known to be somewhere between times a and b, but we don’t know exactly when. Left censoring can be seen as a special case of interval censoring in which a = 0; right censoring is a special case in which b=∞.

Interval-censored data are readily incorporated into the likelihood function. The contribution to the likelihood for an observation censored between a and b is just

S_i(a) − S_i(b)

where S_i(.) is the survivor function for observation i. (This difference is never negative because S_i(t) is a nonincreasing function of t.) In words, the probability of an event occurring in the interval (a, b) is the probability of an event occurring after a minus the probability of it occurring after b. For left-censored data, a = 0 and so S_i(a) = 1; for right-censored data, b = ∞ and so S_i(b) = 0.
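To make the formula concrete, here is a minimal Python sketch (not SAS, and not how PROC LIFEREG is implemented) that evaluates the contribution for a Weibull survivor function. The parameterization S(t) = exp(-(t/scale)^shape) and the values of shape and scale are assumptions chosen only for illustration.

```python
import math


def weibull_survivor(t, shape, scale):
    """Weibull survivor function S(t) = exp(-(t/scale)**shape)."""
    if t == math.inf:
        return 0.0  # no one survives forever
    return math.exp(-((t / scale) ** shape))


def interval_contribution(a, b, shape, scale):
    """Likelihood contribution S(a) - S(b) for an event known to lie in (a, b)."""
    return weibull_survivor(a, shape, scale) - weibull_survivor(b, shape, scale)


# Hypothetical parameter values, not estimates from the recidivism data.
shape, scale = 1.3, 120.0

interval = interval_contribution(10, 20, shape, scale)      # event between 10 and 20
left = interval_contribution(0, 52, shape, scale)           # left censored at 52: 1 - S(52)
right = interval_contribution(52, math.inf, shape, scale)   # right censored at 52: S(52) - 0

print(interval, left, right)
```

With a = 0 the first term is 1, and with b = ∞ the second term is 0, so the left-censored and right-censored contributions at the same time point sum to 1.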

PROC LIFEREG can handle any combination of left-censored, right-censored, and interval-censored data, but a different MODEL syntax is required if there are any left-censored or interval-censored observations. Instead of a time variable and a censoring variable, PROC LIFEREG needs two time variables, an upper time and a lower time. Let’s call them UPPER and LOWER. The MODEL statement then reads as follows:

MODEL (LOWER,UPPER)=list of covariates;

The censoring status is determined by whether the two values are equal and whether either is coded as missing data:

Uncensored:         LOWER and UPPER are both present and equal.
Interval Censored:  LOWER and UPPER are present and different.
Right Censored:     LOWER is present, but UPPER is missing.
Left Censored:      LOWER is missing, but UPPER is present.

You might think that left censoring could also be indicated by coding LOWER as 0, but PROC LIFEREG excludes any observations with times that are 0 or negative. Observations are also excluded if both UPPER and LOWER are missing, or if LOWER > UPPER. Here are some examples:

Observation   Lower   Upper   Status
1             3.9     3.9     Uncensored
2             7.2     .       Right Censored
3             4.1     5.6     Interval Censored
4             .       2.0     Left Censored
5             0       5.8     Excluded
6             3.2     1.9     Excluded
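The classification rules can be sketched as a small Python function. This only mirrors the rules as stated in the text; it is not PROC LIFEREG's own code. The function name censoring_status is hypothetical, and None stands in for a SAS missing value (.).

```python
def censoring_status(lower, upper):
    """Classify one (lower, upper) pair; None plays the role of SAS missing (.)."""
    if lower is None and upper is None:
        return "Excluded"            # both times missing
    # Times that are 0 or negative are excluded.
    if (lower is not None and lower <= 0) or (upper is not None and upper <= 0):
        return "Excluded"
    if upper is None:
        return "Right Censored"      # only the lower time is known
    if lower is None:
        return "Left Censored"       # only the upper time is known
    if lower > upper:
        return "Excluded"            # bounds in the wrong order
    if lower == upper:
        return "Uncensored"
    return "Interval Censored"


# The six example observations from the table above.
examples = [(3.9, 3.9), (7.2, None), (4.1, 5.6), (None, 2.0), (0, 5.8), (3.2, 1.9)]
for number, (lower, upper) in enumerate(examples, start=1):
    print(number, lower, upper, censoring_status(lower, upper))
```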

Let’s look at a hypothetical example of left censoring for the recidivism data. Suppose that of the 114 arrests, the week of arrest was unavailable for 30 cases. In other words, we know that an arrest occurred between 0 and 52 weeks, but we don’t know when. To illustrate this, I modified the recidivism data by recoding the WEEK variable as missing for the first 30 arrests in the data set. Then, I estimated a Weibull model with the following program:

data;
   set recidlft;

      /* uncensored cases: */
   if arrest=1 and week ne . then do;
      upper=week;
      lower=week;
   end;

      /* left-censored cases: */
   if arrest=1 and week = . then do;
      upper=52;
      lower=.;
   end;

      /* right-censored cases: */
   if arrest=0 then do;
      upper=.;
      lower=52;
   end;
run;

proc lifereg;
   model (lower,upper)=fin age race wexp mar paro prio
         / dist=weibull;
run;

You should compare the results in Output 4.15 with those in Output 4.3, for which there were no left-censored cases. Although the results are quite similar, the chi-square statistics are nearly all smaller when some of the data are left censored. This is to be expected since left censoring entails some loss of information. Note that you cannot compare the log-likelihood for this model with the log-likelihood for the model with no left censoring. Whenever you alter the data, the log-likelihoods are no longer comparable.

Output 4.15. Results for the Weibull Model with Left-Censored Data
                   L I F E R E G  P R O C E D U R E

Data Set          =WORK.DATA5
Dependent Variable=Log(LOWER)
Dependent Variable=Log(UPPER)
Noncensored Values=    84  Right Censored Values=    318
Left Censored Values=  30  Interval Censored Values=   0

Log Likelihood for WEIBULL -304.7419068

Variable  DF   Estimate  Std Err ChiSquare  Pr>Chi Label/Value

INTERCPT   1  3.9656454 0.458605  74.77401  0.0001 Intercept
FIN        1 0.29116771 0.151256  3.705626  0.0542
AGE        1 0.04471953  0.01772  6.368647  0.0116
RACE       1 -0.2312236 0.240799  0.922055  0.3369
WEXP       1 0.12310271 0.166282  0.548081  0.4591
MAR        1 0.31904485 0.300374  1.128183  0.2882
PARO       1 0.07010297 0.152682  0.210814  0.6461
PRIO       1 -0.0699893 0.022929  9.317426  0.0023
SCALE      1 0.77869005 0.080329                 Extreme value scale parameter
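The ChiSquare column in Output 4.15 can be checked by hand. The Python sketch below assumes that each statistic is a Wald chi-square, that is, the squared ratio of the estimate to its standard error. The printed values agree with that assumption to within rounding of the standard errors. The three rows are copied from the output above.

```python
def wald_chi_square(estimate, std_err):
    """Wald chi-square with 1 df: (estimate / standard error) squared."""
    return (estimate / std_err) ** 2


# (estimate, standard error, printed chi-square) copied from Output 4.15.
rows = {
    "FIN": (0.29116771, 0.151256, 3.705626),
    "AGE": (0.04471953, 0.01772, 6.368647),
    "PRIO": (-0.0699893, 0.022929, 9.317426),
}

for name, (estimate, std_err, printed) in rows.items():
    computed = wald_chi_square(estimate, std_err)
    print(name, round(computed, 4), printed)
```

Because the standard error is in the denominator, the larger standard errors that come with left censoring translate directly into smaller chi-square statistics.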

Now consider an application of interval censoring. We can actually view the recidivism data as discrete since we only know the week of the arrest, not the exact day. Although Petersen (1991) has shown that some bias can result from treating discrete data as continuous, there’s probably no danger with 52 different values for the measurement of arrest time. Nevertheless, you can use the interval-censoring option to get a slightly improved estimate. For an arrest that occurs in week 2, the actual interval in which the arrest occurred is (1, 2). Similarly, the interval is (2, 3) for an arrest occurring in week 3. This suggests the following recoding of the data:

data;
   set recid;

      /* interval censored cases: */
   if arrest=1 then do;
      upper=week;
      lower=week-.9999;
   end;
      /* right censored cases: */
   if arrest=0 then do;
      upper=.;
      lower=52;
   end;
run;

proc lifereg;
   model (lower, upper) = fin age race wexp mar paro prio
         / dist=weibull;
run;

To get the lower value for the interval-censored cases, I subtract .9999 instead of 1 so that the result is not 0 for those persons with WEEK=1.
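The effect of subtracting .9999 rather than 1 can be seen in a short Python sketch of the same recoding. The function name interval_bounds is hypothetical, and None again stands in for a SAS missing value.

```python
def interval_bounds(week, arrest):
    """Return (lower, upper) for one case, following the recoding described above."""
    if arrest == 1:
        # Interval censored: the arrest happened during the week ending at `week`.
        return (week - 0.9999, week)
    # Right censored at the end of the 52-week follow-up.
    return (52, None)


first_week = interval_bounds(1, 1)
print(first_week)              # lower bound is slightly above 0
print(interval_bounds(20, 1))
print(interval_bounds(52, 0))
```

Had 1 been subtracted instead, the lower bound for WEEK=1 would be exactly 0, and that observation would be excluded.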

The results in Output 4.16 are very close to those in Output 4.3, which assumed that time was measured exactly. If the intervals had been larger, we might have found more substantial differences. The magnitude of the log-likelihood is nearly doubled for the interval-censored version but, again, log-likelihoods are not comparable when the data are altered.

Output 4.16. Results Treating Recidivism Data as Interval Censored
                   L I F E R E G  P R O C E D U R E

Data Set          =RECID
Dependent Variable=Log(LOWER)
Dependent Variable=Log(UPPER)
Noncensored Values=     0  Right Censored Values=    318
Left Censored Values=   0  Interval Censored Values= 114

Log Likelihood for WEIBULL -680.9846402

Variable  DF   Estimate  Std Err ChiSquare  Pr>Chi Label/Value

INTERCPT   1 3.99060867 0.437423  83.22915  0.0001 Intercept
FIN        1 0.28371327 0.143999  3.881861  0.0488
AGE        1  0.0425244 0.016707  6.478366  0.0109
RACE       1 -0.2342767 0.229784   1.03949  0.3079
WEXP       1 0.11058711 0.158172  0.488822  0.4845
MAR        1 0.32458763 0.285258  1.294761  0.2552
PARO       1 0.06182479  0.14573   0.17998  0.6714
PRIO       1 -0.0684968 0.021843  9.833993  0.0017
SCALE      1 0.74354156 0.066542                 Extreme value scale parameter
