The Life-Table Method

If the number of observations is large and if event times are precisely measured, there will be many unique event times. The KM method then produces long tables that may be unwieldy for presentation and interpretation. You can avoid this problem with the life-table method because event times are grouped into intervals that can be as long or short as you please. In addition, the life-table method (also known as the actuarial method) can produce estimates and plots of the hazard function, which are not available in PROC LIFETEST with the KM method. The downside to the life-table method is that the choice of intervals is usually somewhat arbitrary, leading to arbitrariness in the results and possible uncertainty about how to choose the intervals. There is inevitably some loss of information as well. Note, however, that PROC LIFETEST computes the log-rank and Wilcoxon statistics from the ungrouped data (if available) so they are unaffected by the choice of intervals for the life-table method.

Let’s first see what the life-table method produces in PROC LIFETEST, and then we’ll discuss some details of the calculations. Output 3.7 displays the first 20 observations from a recidivism study that was briefly described in Chapter 1, “Introduction,” in the section Why Use Survival Analysis? The sample consisted of 432 male inmates who were released from Maryland state prisons in the early 1970s (Rossi et al. 1980). These men were followed for one year after their release, and the dates of any arrests were recorded. We’ll only look at the first arrest here. (In fact, there weren’t enough men with two or more arrests to use the techniques for repeated events discussed in Chapter 8, “Heterogeneity, Repeated Events, and Other Topics.”) The WEEK variable contains the week of the first arrest after release. The variable ARREST has a value of 1 for those arrested during the one-year follow-up, and it has a value of 0 for those who were not. Only 26 percent of the men were arrested. The data are singly right censored so that all the censored cases have a value of 52 for WEEK.

Output 3.7. Recidivism Data (First 20 Cases Out of 432)
OBS  WEEK  FIN  AGE  RACE  WEXP  MAR  PARO  PRIO  ARREST

  1    52   1    24    1     1    0     1     1       0
  2    52   0    29    1     1    0     1     3       0
  3    52   1    20    1     1    1     1     1       0
  4    52   0    20    1     0    0     1     1       0
  5    52   1    31    0     1    0     1     3       0
  6    12   1    22    1     1    1     1     2       1
  7    52   0    24    1     1    0     1     2       0
  8    19   0    18    1     0    0     0     2       1
  9    52   0    18    1     0    0     1     3       0
 10    15   1    22    1     0    0     1     3       1
 11     8   1    21    1     1    0     1     4       1
 12    52   1    21    1     0    0     1     1       0
 13    52   1    21    0     1    0     1     1       0
 14    36   1    19    1     0    0     1     2       1
 15    52   0    33    1     1    0     1     2       1
 16     4   0    18    1     1    0     0     1       1
 17    45   1    18    1     0    0     0     5       1
 18    52   0    21    1     0    0     0     0       1
 19    52   1    20    1     0    1     0     1       0
 20    52   0    22    1     1    0     0     1       0

The covariates shown in Output 3.7 have the following interpretation:

FIN has a value of 1 if the inmate received financial aid after release; otherwise, FIN has a value of 0. This variable was randomly assigned with equal numbers in each category.
AGE specifies age in years at the time of release.
RACE has a value of 1 if the person was black; otherwise, RACE has a value of 0.
WEXP has a value of 1 if the inmate had full-time work experience before incarceration; otherwise, WEXP has a value of 0.
MAR has a value of 1 if the inmate was married at the time of release; otherwise, MAR has a value of 0.
PARO has a value of 1 if the inmate was released on parole; otherwise, PARO has a value of 0.
PRIO specifies the number of convictions an inmate had prior to incarceration.

We now request a life table, using the default specification for interval lengths:

proc lifetest data=recid method=life;   /* method=life requests the life-table (actuarial) estimates */
   time week*arrest(0);                 /* week is the event time; arrest=0 marks a censored case    */
run;

Output 3.8 shows the results. PROC LIFETEST constructs six intervals, starting at 0 and incrementing by periods of 10 weeks. The algorithm for the default choice of intervals is fairly complex (see the SAS/STAT User’s Guide for details). You can override the default by specifying WIDTH=w in the PROC LIFETEST statement. The intervals will then begin with [0, w) and will increment by w. Alternatively, you can get more control over the intervals by specifying INTERVALS=a b c ... in the PROC LIFETEST statement, where a, b, and c are cut points. For example, INTERVALS= 15 20 30 50 produces the intervals [0, 15), [15, 20), [20, 30), [30, 50), [50, ∞). See the SAS/STAT User’s Guide for other options. Note that intervals do not have to be the same length. It’s often desirable to make later intervals longer so that they include enough events to give reliable estimates of the hazard and other statistics.
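For example, either of the following calls would override the defaults (a sketch only; the WIDTH= value and the INTERVALS= cut points are simply the illustrative numbers mentioned above, and Output 3.8 itself was produced with the default intervals):

/* Fixed-width intervals of length w (here w = 5, purely for illustration) */
proc lifetest data=recid method=life width=5;
   time week*arrest(0);
run;

/* Explicit cut points, giving [0,15), [15,20), [20,30), [30,50), [50,infinity) */
proc lifetest data=recid method=life intervals=15 20 30 50;
   time week*arrest(0);
run;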

Output 3.8. Applying the Life-Table Method to the Recidivism Data
                  Life Table Survival Estimates

                                       Effective  Conditional
   Interval       Number     Number      Sample   Probability
[Lower,  Upper)   Failed    Censored      Size     of Failure

     0       10      14          0       432.0        0.0324
    10       20      21          0       418.0        0.0502
    20       30      23          0       397.0        0.0579
    30       40      23          0       374.0        0.0615
    40       50      26          0       351.0        0.0741
    50       60       7        318       166.0        0.0422


                 Conditional
                 Probability                      Survival   Median
   Interval        Standard                       Standard  Residual
[Lower,  Upper)     Error     Survival   Failure    Error   Lifetime

     0       10     0.00852     1.0000         0         0         .
    10       20      0.0107     0.9676    0.0324   0.00852         .
    20       30      0.0117     0.9190    0.0810    0.0131         .
    30       40      0.0124     0.8657    0.1343    0.0164         .
    40       50      0.0140     0.8125    0.1875    0.0188         .
    50       60      0.0156     0.7523    0.2477    0.0208         .


                           Evaluated at the Midpoint of the Interval

                  Median                PDF               Hazard
   Interval      Standard            Standard            Standard
[Lower,  Upper)    Error      PDF      Error    Hazard     Error

     0       10         .   0.00324  0.000852  0.003294   0.00088
    10       20         .   0.00486   0.00103  0.005153  0.001124
    20       30         .   0.00532   0.00108  0.005966  0.001244
    30       40         .   0.00532   0.00108  0.006345  0.001322
    40       50         .   0.00602   0.00114  0.007692  0.001507
    50       60         .   0.00317   0.00118  0.004308  0.001628

For each interval, 14 different statistics are reported. The four statistics displayed in the first panel, while not of major interest in themselves, are necessary for calculating the later statistics. Number Failed and Number Censored should be self-explanatory. Effective Sample Size is straightforward for the first five intervals because they contain no censored cases. The effective sample size for these intervals is just the number of persons who had not yet been arrested at the start of the interval. For the last interval, however, the effective sample size is only 166, even though 325 persons made it to the 50th week without an arrest. Why? The answer is a fundamental property of the life-table method: it treats any cases censored within an interval as if they were censored at the midpoint of the interval. This treatment is equivalent to assuming that the distribution of censoring times is uniform within the interval. Since censored cases are at risk for only half of the interval, they count for only half in figuring the effective sample size. Thus, the effective sample size for the last interval is 7 + 318/2 = 166. The 7 corresponds to the seven men who were arrested in the interval; they are treated as though they were at risk for the whole interval.
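In symbols (just a restatement of this rule, using the notation defined for equation (3.3) below, where $n_i$ is the number entering interval $i$ and $w_i$ is the number censored within it):

$$\text{effective sample size}_i = n_i - \frac{w_i}{2}, \qquad \text{e.g., for the last interval: } 325 - \frac{318}{2} = 166 .$$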

The Conditional Probability of Failure is an estimate of the probability that a person will be arrested in the interval, given that he made it to the start of the interval. This estimate is calculated as (number failed)/(effective sample size). An estimate of its standard error is given in the next panel. The Survival column is the life-table estimate of the survivor function, that is, the probability that the event occurs at a time greater than or equal to the start time of each interval. For example, the estimated probability that an inmate will not be arrested until week 30 or later is .8657.
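As a quick check against Output 3.8, the first and last intervals give

$$\hat{q}_1 = \frac{14}{432} = .0324, \qquad \hat{q}_6 = \frac{7}{166} = .0422,$$

which match the Conditional Probability of Failure column.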

The survival estimate is calculated from the conditional probabilities of failure in the following way. For interval $i$, let $t_i$ be the start time and $q_i$ be the conditional probability of failure. The probability of surviving to $t_i$ or beyond is then

Equation 3.2

$$\hat{S}(t_i) = \prod_{j=1}^{i-1} (1 - q_j)$$

For $i = 1$ and, hence, $t_i = 0$, the survival probability is set to 1.0.
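For example, the estimated probability of surviving to the start of the third interval (week 20) is

$$\hat{S}(20) = (1 - .0324)(1 - .0502) = .9190,$$

in agreement with the Survival column of Output 3.8.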

The rationale for equation (3.2) is a fairly simple application of conditional probability theory. Suppose we want an expression for the probability of surviving to $t_4$ or beyond. To obtain this, let's define the following events:

A = survival to $t_2$ or beyond.
B = survival to $t_3$ or beyond.
C = survival to $t_4$ or beyond.

We want the probability of C. But since you can't get past $t_4$ without getting past $t_2$ and $t_3$, we can write Pr(C) = Pr(A, B, C). By the definition of conditional probability, we can rewrite this as

$$\Pr(A, B, C) = \Pr(C \mid A, B)\,\Pr(B \mid A)\,\Pr(A) = (1 - q_3)(1 - q_2)(1 - q_1).$$

Extending this argument to other intervals yields the formula in equation (3.2). Note the similarity between this formula and equation (3.1) for the KM estimator. In equation (3.1), $d_j/n_j$ is equivalent to $q_j$ in equation (3.2); both are estimates of the probability of failure in an interval given survival to the start of the interval. The major differences between the two formulas are as follows (a short sketch of the corresponding KM request appears after the list):

  • The number of censored observations in an interval is not halved in the KM estimator.

  • The interval boundaries for the KM estimator are determined by the event times themselves.

Thus, each interval for KM estimation extends from one unique event time up to, but not including, the next unique event time.
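Here is the sketch mentioned above: the KM estimates for the same data require no METHOD= option, because the product-limit (KM) method is PROC LIFETEST's default.

proc lifetest data=recid;     /* method=pl (Kaplan-Meier) is the default */
   time week*arrest(0);
run;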

Continuing with the second panel of Output 3.8, the Failure column is just 1 minus the Survival column. We are also given the standard errors of the Survival probabilities. As with the KM method, these standard errors are calculated by Greenwood’s formula, and we can use them to construct confidence intervals around the survival probabilities. The Median Residual Lifetime column is, in principle, an estimate of the remaining time until an event for an individual who survived to the start of the interval. For this example, however, the estimates are all missing. To calculate this statistic for a given interval, it is necessary that there be a later interval whose survival probability is less than half the survival probability associated with the interval of interest. It is apparent from Output 3.8 that no interval satisfies this criterion. For any set of data, there will nearly always be some later intervals for which this statistic cannot be calculated.
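In symbols (a restatement of that criterion, not an additional PROC LIFETEST statistic), the median residual lifetime at the start of interval $i$ is the value $m$ satisfying

$$S(t_i + m) = \tfrac{1}{2}\,S(t_i),$$

and no such point can be found here because the smallest tabled survival probability (.7523) exceeds half of every earlier survival probability.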

The PDF column is an estimate of the probability density function at the midpoint of the interval. Of greater interest is the Hazard column, which gives estimates of the hazard function at the midpoint of each interval. This is calculated as

Equation 3.3

$$\hat{h}(t_{im}) = \frac{d_i}{b_i\left(n_i - \dfrac{w_i}{2} - \dfrac{d_i}{2}\right)}$$

where for the $i$th interval, $t_{im}$ is the midpoint, $d_i$ is the number of events, $b_i$ is the width of the interval, $n_i$ is the number still at risk at the beginning of the interval, and $w_i$ is the number of cases withdrawn (censored) within the interval. A better estimate of the hazard could be obtained by $d_i/T_i$, where $T_i$ is the total exposure time within interval $i$. For each individual, exposure time is the amount of time actually observed within the interval. For an individual who is known to have survived the whole interval, exposure time is just the interval width $b_i$. For individuals who had events or who withdrew in the interval, exposure time is the time from the beginning of the interval until the event or withdrawal. Total exposure time is the sum of all the individual exposure times. The denominator in equation (3.3) is an approximation to total exposure time in which all events and all withdrawals are presumed to occur at the midpoint of the interval (hence the division by 2). Why use an inferior estimator? Well, exact exposure times are not always available (see the next section), so the estimator in equation (3.3) has become the standard for life tables.
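As a check against Output 3.8, applying equation (3.3) to the first and last intervals gives

$$\hat{h}(5) = \frac{14}{10\,(432 - 0/2 - 14/2)} = \frac{14}{4250} = .003294, \qquad \hat{h}(55) = \frac{7}{10\,(325 - 318/2 - 7/2)} = \frac{7}{1625} = .004308,$$

which match the Hazard column.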

Output 3.9. Hazard Estimates for Recidivism Data


You can get plots of the survival and hazard estimates by putting PLOTS=(S,H) in the PROC LIFETEST statement. Output 3.9 displays the graph of the hazard function. Apparently the hazard of arrest increases steadily until the 50-60 week interval, when it drops precipitously from .0077 to .0043. This drop is an artifact of the way the last interval is constructed, however. Although the interval runs from 50 to 60, in fact, no one was at risk of an arrest after week 52, when observation was terminated. As a result, the denominator in equation (3.3) is a gross overestimate of the exposure time in the interval.

We can fix this by explicitly setting the last interval to end at 53. (If we set it at 52, the interval will not include arrests that occurred in week 52 because the interval is open on the right). At the same time, it is better to recode the censored times from 52 to 53 because they are not censored within the interval but rather at the end. The recode has the effect of crediting the full interval (rather than only half) as exposure time for the censored cases.

Here’s the revised SAS code:

data;                                  /* unnamed data set; the PROC step below uses it by default */
   set recid;
   if arrest=0 then week=53;           /* recode censored times from 52 to 53 */
run;
proc lifetest method=life plots=(s,h) graphics
              intervals=10 20 30 40 50 53;   /* last interval is now [50,53) */
   time week*arrest(0);
run;

The resulting hazard estimates and plot in Output 3.10 show only a slight tendency for the hazard to decline at the end of the one-year observation period.

Output 3.10. Corrected Hazard Estimates and Plot for Recidivism Data

            Evaluated at the Midpoint of the Interval

                  Median                PDF               Hazard
  Interval      Standard            Standard            Standard
[Lower,  Upper)    Error      PDF      Error    Hazard     Error

    0       10         .   0.00324  0.000852  0.003294   0.00088
   10       20         .   0.00486   0.00103  0.005153  0.001124
   20       30         .   0.00532   0.00108  0.005966  0.001244
   30       40         .   0.00532   0.00108  0.006345  0.001322
   40       50         .   0.00602   0.00114  0.007692  0.001507
   50       53         .   0.00540   0.00202  0.007258  0.002743
   53        .         .         .         .         .         .
