If the number of observations is large and if event times are precisely measured, there will be many unique event times. The KM method then produces long tables that may be unwieldy for presentation and interpretation. You can avoid this problem with the life-table method because event times are grouped into intervals that can be as long or short as you please. In addition, the life-table method (also known as the actuarial method) can produce estimates and plots of the hazard function, which are not available in PROC LIFETEST with the KM method. The downside to the life-table method is that the choice of intervals is usually somewhat arbitrary, leading to arbitrariness in the results and possible uncertainty about how to choose the intervals. There is inevitably some loss of information as well. Note, however, that PROC LIFETEST computes the log-rank and Wilcoxon statistics from the ungrouped data (if available) so they are unaffected by the choice of intervals for the life-table method.
Let’s first see what the life-table method produces in PROC LIFETEST, and then we’ll discuss some details of the calculations. Output 3.7 displays the first 20 observations from a recidivism study that was briefly described in Chapter 1, “Introduction,” in the section Why Use Survival Analysis? The sample consisted of 432 male inmates who were released from Maryland state prisons in the early 1970s (Rossi et al. 1980). These men were followed for one year after their release, and the dates of any arrests were recorded. We’ll only look at the first arrest here. (In fact, there weren’t enough men with two or more arrests to use the techniques for repeated events discussed in Chapter 8, “Heterogeneity, Repeated Events, and Other Topics.”) The WEEK variable contains the week of the first arrest after release. The variable ARREST has a value of 1 for those arrested during the one-year follow-up, and it has a value of 0 for those who were not. Only 26 percent of the men were arrested. The data are singly right censored so that all the censored cases have a value of 52 for WEEK.
OBS  WEEK  FIN  AGE  RACE  WEXP  MAR  PARO  PRIO  ARREST
  1    52    1   24     1     1    0     1     1       0
  2    52    0   29     1     1    0     1     3       0
  3    52    1   20     1     1    1     1     1       0
  4    52    0   20     1     0    0     1     1       0
  5    52    1   31     0     1    0     1     3       0
  6    12    1   22     1     1    1     1     2       1
  7    52    0   24     1     1    0     1     2       0
  8    19    0   18     1     0    0     0     2       1
  9    52    0   18     1     0    0     1     3       0
 10    15    1   22     1     0    0     1     3       1
 11     8    1   21     1     1    0     1     4       1
 12    52    1   21     1     0    0     1     1       0
 13    52    1   21     0     1    0     1     1       0
 14    36    1   19     1     0    0     1     2       1
 15    52    0   33     1     1    0     1     2       1
 16     4    0   18     1     1    0     0     1       1
 17    45    1   18     1     0    0     0     5       1
 18    52    0   21     1     0    0     0     0       1
 19    52    1   20     1     0    1     0     1       0
 20    52    0   22     1     1    0     0     1       0
The covariates shown in Output 3.7 have the following interpretation:
We now request a life table, using the default specification for interval lengths:
proc lifetest data=recid method=life;
   time week*arrest(0);
run;
Output 3.8 shows the results. PROC LIFETEST constructs six intervals, starting at 0 and incrementing by periods of 10 weeks. The algorithm for the default choice of intervals is fairly complex (see the SAS/STAT User’s Guide for details). You can override the default by specifying WIDTH=w in the PROC LIFETEST statement. The intervals will then begin with [0, w) and will increment by w. Alternatively, you can get more control over the intervals by specifying INTERVALS=a b c ... in the PROC LIFETEST statement, where a, b, and c are cut points. For example, INTERVALS= 15 20 30 50 produces the intervals [0, 15), [15, 20), [20, 30), [30, 50), [50, ∞). See the SAS/STAT User’s Guide for other options. Note that intervals do not have to be the same length. It’s often desirable to make later intervals longer so that they include enough events to give reliable estimates of the hazard and other statistics.
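As a minimal sketch of the two overrides just described, the following steps rerun the life table on the same RECID data set, first with fixed-width intervals and then with explicit cut points. The width of 13 weeks is an arbitrary value chosen here for illustration, not one used in the text; the cut points are those from the INTERVALS= example above.

```sas
/* Fixed-width intervals: [0,13), [13,26), [26,39), ...           */
/* WIDTH=13 is an illustrative assumption, not a value from the text */
proc lifetest data=recid method=life width=13;
   time week*arrest(0);
run;

/* Explicit cut points: [0,15), [15,20), [20,30), [30,50), [50,infinity) */
proc lifetest data=recid method=life intervals=15 20 30 50;
   time week*arrest(0);
run;
```

Unequal cut points such as these are one way to lengthen the later intervals so that each contains enough events.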
Life Table Survival Estimates

    Interval                               Effective    Conditional
 [Lower,  Upper)     Number     Number       Sample     Probability
                     Failed    Censored       Size       of Failure
     0      10         14          0         432.0        0.0324
    10      20         21          0         418.0        0.0502
    20      30         23          0         397.0        0.0579
    30      40         23          0         374.0        0.0615
    40      50         26          0         351.0        0.0741
    50      60          7        318         166.0        0.0422

                    Conditional
                    Probability                          Survival    Median
    Interval         Standard                            Standard   Residual
 [Lower,  Upper)      Error      Survival    Failure      Error     Lifetime
     0      10       0.00852      1.0000     0           0             .
    10      20       0.0107       0.9676     0.0324      0.00852       .
    20      30       0.0117       0.9190     0.0810      0.0131        .
    30      40       0.0124       0.8657     0.1343      0.0164        .
    40      50       0.0140       0.8125     0.1875      0.0188        .
    50      60       0.0156       0.7523     0.2477      0.0208        .

                 Evaluated at the Midpoint of the Interval

                     Median                   PDF                     Hazard
    Interval        Standard                Standard                 Standard
 [Lower,  Upper)      Error       PDF        Error       Hazard       Error
     0      10          .       0.00324    0.000852     0.003294    0.00088
    10      20          .       0.00486    0.00103      0.005153    0.001124
    20      30          .       0.00532    0.00108      0.005966    0.001244
    30      40          .       0.00532    0.00108      0.006345    0.001322
    40      50          .       0.00602    0.00114      0.007692    0.001507
    50      60          .       0.00317    0.00118      0.004308    0.001628
For each interval, 14 different statistics are reported. The four statistics displayed in the first panel, while not of major interest in themselves, are necessary for calculating the later statistics. Number Failed and Number Censored should be self-explanatory. Effective Sample Size is straightforward for the first five intervals because they contain no censored cases. The effective sample size for these intervals is just the number of persons who had not yet been arrested at the start of the interval. For the last interval, however, the effective sample size is only 166, even though 325 persons (the 7 who failed plus the 318 who were censored) made it to the 50th week without an arrest. Why? The answer is a fundamental property of the life-table method. The method treats any cases censored within an interval as if they were censored at the midpoint of the interval. This treatment is equivalent to assuming that the distribution of censoring times is uniform within the interval. Since censored cases are only at risk for half of the interval, they only count for half in figuring the effective sample size. Thus, the effective sample size for the last interval is 7 + 318/2 = 166. The 7 corresponds to the seven men who were arrested in the interval; they are treated as though they were at risk for the whole interval.
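The half-weighting of censored cases can be written out as a short worked calculation. The symbols here are notation assumed for this illustration: n is the number entering the interval, w the number censored within it, and n' the effective sample size. The counts come from the last row of the first panel of Output 3.8.

```latex
% Effective sample size for the interval [50, 60)
% n = number entering the interval = failed + censored = 7 + 318
\begin{aligned}
n  &= 7 + 318 = 325, \qquad w = 318, \\
n' &= n - \tfrac{w}{2} = 325 - 159 = 166
    \quad\text{(equivalently } 7 + \tfrac{318}{2} = 166\text{)}.
\end{aligned}
```

For the first five intervals w = 0, so the effective sample size equals the number entering the interval.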
The Conditional Probability of Failure is an estimate of the probability that a person will be arrested in the interval, given that he made it to the start of the interval. This estimate is calculated as (number failed)/(effective sample size). An estimate of its standard error is given in the next panel. The Survival column is the life-table estimate of the survivor function, that is, the probability that the event occurs at a time greater than or equal to the start time of each interval. For example, the estimated probability that an inmate will not be arrested until week 30 or later is .8657.
The survival estimate is calculated from the conditional probabilities of failure in the following way. For interval i, let $t_i$ be the start time and $q_i$ be the conditional probability of failure. The probability of surviving to $t_i$ or beyond is then
Equation 3.2
$$\hat{S}(t_i) = \prod_{j=1}^{i-1} (1 - q_j)$$
For i = 1 and, hence, $t_i$ = 0, the survival probability is set to 1.0.
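As a check on this rule, the Survival column of Output 3.8 can be reproduced by multiplying the complements of the conditional failure probabilities for all earlier intervals. The sketch below uses the exact ratios (number failed)/(effective sample size) from the first panel rather than the rounded probabilities; the hat notation for the estimate is an assumption of this illustration.

```latex
% Survival to the start of the 2nd, 3rd, and 4th intervals (t = 10, 20, 30)
\begin{aligned}
\hat{S}(10) &= 1 - \tfrac{14}{432} = \tfrac{418}{432} \approx 0.9676, \\
\hat{S}(20) &= \tfrac{418}{432}\left(1 - \tfrac{21}{418}\right) = \tfrac{397}{432} \approx 0.9190, \\
\hat{S}(30) &= \tfrac{397}{432}\left(1 - \tfrac{23}{397}\right) = \tfrac{374}{432} \approx 0.8657.
\end{aligned}
```

Because the first five intervals contain no censored cases, the product telescopes to the simple proportion of the 432 men not yet arrested at the start of each interval.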
The rationale for equation (3.2) is a fairly simple application of conditional probability theory. Suppose we want an expression for the probability of surviving to $t_4$ or beyond. To obtain this, let’s define the following events:
A = survival to $t_2$ or beyond.
B = survival to $t_3$ or beyond.
C = survival to $t_4$ or beyond.
We want the probability of C. But since you can’t get past $t_4$ without getting past $t_2$ and $t_3$, we can write Pr(C) = Pr(A, B, C). By the definition of conditional probability, we can rewrite this as
$$\Pr(A, B, C) = \Pr(C \mid A, B)\,\Pr(B \mid A)\,\Pr(A) = (1 - q_3)(1 - q_2)(1 - q_1).$$
Extending this argument to other intervals yields the formula in equation (3.2). Note the similarity between this formula and equation (3.1) for the KM estimator. In equation (3.1), $d_j/n_j$ is equivalent to $q_j$ in equation (3.2); both are estimates of the probability of failure in an interval given survival to the start of the interval. The major differences between the two formulas are as follows:
The number of censored observations in an interval is not halved in the KM estimator.
The interval boundaries for the KM estimator are determined by the event times themselves.
Thus, each interval for KM estimation extends from one unique event time up to, but not including, the next unique event time.
Continuing with the second panel of Output 3.8, the Failure column is just 1 minus the Survival column. We are also given the standard errors of the Survival probabilities. As with the KM method, these standard errors are calculated by Greenwood’s formula, and we can use them to construct confidence intervals around the survival probabilities. The Median Residual Lifetime column is, in principle, an estimate of the remaining time until an event for an individual who survived to the start of the interval. For this example, however, the estimates are all missing. To calculate this statistic for a given interval, it is necessary that there be a later interval whose survival probability is less than half the survival probability associated with the interval of interest. It is apparent from Output 3.8 that no interval satisfies this criterion. For any set of data, there will nearly always be some later intervals for which this statistic cannot be calculated.
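To illustrate how the reported standard errors can be used, here is a rough 95 percent confidence interval for the survival probability at week 30. It assumes the usual normal approximation (estimate plus or minus 1.96 standard errors), which is an assumption of this sketch rather than something stated in the output; the estimate and standard error are taken from the 30-40 row of Output 3.8.

```latex
% Approximate 95% confidence interval for survival to week 30
\begin{aligned}
0.8657 \pm 1.96 \times 0.0164 &= 0.8657 \pm 0.0321, \\
\text{interval} &\approx (0.8336,\ 0.8978).
\end{aligned}
```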
The PDF column is an estimate of the probability density function at the midpoint of the interval. Of greater interest is the Hazard column, which gives estimates of the hazard function at the midpoint of each interval. This is calculated as
Equation 3.3
$$h(t_{im}) = \frac{d_i}{b_i\left(n_i - \dfrac{w_i}{2} - \dfrac{d_i}{2}\right)}$$
where for the ith interval, $t_{im}$ is the midpoint, $d_i$ is the number of events, $b_i$ is the width of the interval, $n_i$ is the number still at risk at the beginning of the interval, and $w_i$ is the number of cases withdrawn (censored) within the interval. A better estimate of the hazard could be obtained by $d_i/T_i$, where $T_i$ is the total exposure time within interval i. For each individual, exposure time is the amount of time actually observed within the interval. For an individual who is known to have survived the whole interval, exposure time is just the interval width $b_i$. For individuals who had events or who withdrew in the interval, exposure time is the time from the beginning of the interval until the event or withdrawal. Total exposure time is the sum of all the individual exposure times. The denominator in equation (3.3) is an approximation to total exposure time, such that all events and all withdrawals are presumed to occur at the midpoint of the interval (thus, the division by 2). Why use an inferior estimator? Well, exact exposure times are not always available (see the next section), so the estimator in equation (3.3) has become the standard for life tables.
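A worked example may make the midpoint approximation concrete. Reading the denominator as the interval width times the number at risk, less half the withdrawals and half the events, reproduces the Hazard column of Output 3.8. The counts below are taken from the first panel of that output; the number entering the last interval, 325, is the 7 failures plus the 318 censored cases.

```latex
% Hazard at the midpoint of the first and last intervals of Output 3.8
\begin{aligned}
[0, 10):\quad  & \frac{14}{10\,(432 - 0/2 - 14/2)} = \frac{14}{4250} \approx 0.003294, \\
[50, 60):\quad & \frac{7}{10\,(325 - 318/2 - 7/2)} = \frac{7}{1625} \approx 0.004308.
\end{aligned}
```

The denominators, 4250 and 1625 person-weeks, are the approximate total exposure times for the two intervals.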
You can get plots of the survival and hazard estimates by putting PLOTS=(S,H) in the PROC LIFETEST statement. Output 3.9 displays the graph of the hazard function. Apparently the hazard of arrest increases steadily until the 50-60 week interval, when it drops precipitously from .0077 to .0043. This drop is an artifact of the way the last interval is constructed, however. Although the interval runs from 50 to 60, in fact, no one was at risk of an arrest after week 52 when observation was terminated. As a result, the denominator in equation (3.3) is a gross overestimate of the exposure time in the interval.
We can fix this by explicitly setting the last interval to end at 53. (If we set it at 52, the interval will not include arrests that occurred in week 52 because the interval is open on the right.) At the same time, it is better to recode the censored times from 52 to 53 because they are not censored within the interval but rather at the end. The recode has the effect of crediting the full interval (rather than only half) as exposure time for the censored cases.
Here’s the revised SAS code:
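/* Recode censored times from 52 to 53 so that they fall at the end of the last interval */
data;
   set recid;
   if arrest=0 then week=53;
run;

/* No DATA= option: PROC LIFETEST reads the most recently created data set */
proc lifetest method=life plots=(s,h) graphics
              intervals=10 20 30 40 50 53;
   time week*arrest(0);
run;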
The resulting hazard estimates and plot in Output 3.10 show only a slight tendency for the hazard to decline at the end of the one-year observation period.
                 Evaluated at the Midpoint of the Interval

                     Median                   PDF                     Hazard
    Interval        Standard                Standard                 Standard
 [Lower,  Upper)      Error       PDF        Error       Hazard       Error
     0      10          .       0.00324    0.000852     0.003294    0.00088
    10      20          .       0.00486    0.00103      0.005153    0.001124
    20      30          .       0.00532    0.00108      0.005966    0.001244
    30      40          .       0.00532    0.00108      0.006345    0.001322
    40      50          .       0.00602    0.00114      0.007692    0.001507
    50      53          .       0.00540    0.00202      0.007258    0.002743
    53       .          .        .          .            .           .
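The revised hazard for the last interval can be checked by hand, using the same midpoint approximation as before: interval width times the number at risk, less half the withdrawals and half the events. This sketch assumes that 325 men enter the interval at week 50 (the 7 arrests plus the 318 censored cases from the earlier output) and that, after the recode to 53, none of the censored cases is withdrawn within [50, 53).

```latex
% Hazard at the midpoint of [50, 53) after the recode: width 3, no withdrawals
\frac{7}{3\,(325 - 0/2 - 7/2)} = \frac{7}{964.5} \approx 0.007258
```

The approximate exposure time falls from 1625 person-weeks under the original [50, 60) interval to 964.5, which is why the estimated hazard rises from 0.004308 to 0.007258.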