In ordinary linear regression, the assumption of a normal distribution for the disturbance term is routinely invoked for a wide range of applications. Yet PROC LIFEREG allows for four additional distributions for ε: extreme value (2 parameter), extreme value (1 parameter), log-gamma, and logistic. For each of these distributions, there is a corresponding distribution for T:
Distribution of ε | Distribution of T |
---|---|
extreme value (2 par.) | Weibull |
extreme value (1 par.) | exponential |
log-gamma | gamma |
logistic | log-logistic |
normal | log-normal |
Incidentally, all AFT models are named for the distribution of T rather than the distribution of ε or log T. You might expect that, since a logistic or normal ε leads to a log-logistic or log-normal T, a gamma ε would lead to a log-gamma T. But it is just the reverse: a log-gamma ε leads to a gamma T. This is one of those unfortunate terminological inconsistencies that we just have to live with.
What is it about survival analysis that makes these alternatives worth considering? The main reason for allowing other distributions is that they have different implications for hazard functions that may, in turn, lead to different substantive interpretations. The remainder of this section explores each of these alternatives in some detail.
The simplest model that PROC LIFEREG estimates is the exponential model, invoked by DIST=EXPONENTIAL in the MODEL statement. This model specifies that ε has a standard extreme-value distribution, and constrains σ = 1. If ε has an extreme-value distribution, then log T also has an extreme-value distribution, conditional on the covariates. This implies that T itself has an exponential distribution, which is why we call it the exponential model. The standard extreme value distribution is also known as a Gumbel distribution or a double exponential distribution. It has a p.d.f. of f(ε) = exp[ε – exp(ε)]. Like the normal distribution, this is a unimodal distribution defined on the entire real line. Unlike the normal, however, it is not symmetrical, being slightly skewed to the left.
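A quick numerical check of the density just given makes the skewness claim concrete. The Python sketch below (standard library only; the integration range and grid size are arbitrary choices) integrates f(ε) = exp[ε – exp(ε)] with the trapezoid rule. The total area comes out at 1, and the mean lies near –0.5772 (minus Euler's constant), below the mode at 0, as expected for a left-skewed distribution.

```python
import math

def ev_pdf(e):
    """Standard extreme-value (Gumbel minimum) density f(e) = exp[e - exp(e)]."""
    return math.exp(e - math.exp(e))

def integrate(f, lo, hi, n=20000):
    """Composite trapezoid rule on [lo, hi]."""
    h = (hi - lo) / n
    return h * (0.5 * f(lo) + 0.5 * f(hi) + sum(f(lo + i * h) for i in range(1, n)))

# Essentially all of the mass lies in [-30, 5]; the tails beyond are negligible.
area = integrate(ev_pdf, -30.0, 5.0)
mean = integrate(lambda e: e * ev_pdf(e), -30.0, 5.0)
mode = 0.0   # f'(e) = f(e) * (1 - exp(e)) vanishes at e = 0

print(round(area, 6), round(mean, 4), mode)
```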
As we saw in Chapter 2, “Basic Concepts of Survival Analysis,” an exponential distribution for T corresponds to a constant hazard function, which is the most characteristic feature of this model. However, equation (2.12) expresses the exponential regression model as
$$\log h(t) = \beta_0^{\bullet} + \beta_1^{\bullet}x_1 + \cdots + \beta_k^{\bullet}x_k \qquad (4.3)$$
where the •s have been added to distinguish these coefficients from those in equation (4.2). Although the dependent variable in equation (4.2) is the log of time, in equation (4.3) it is the log of the hazard. It turns out that the two models are completely equivalent. Furthermore, there is a simple relationship between the coefficients in equation (4.2) and equation (4.3), namely that βj = –βj• for all j.
The change in signs makes intuitive sense. If the hazard is high, then events occur quickly and survival times are short. On the other hand, when the hazard is low, events are unlikely to occur and survival times are long. It is important to be able to shift back and forth between these two ways of expressing the model so that you can compare results across different computer programs. In particular, since PROC PHREG reports coefficients in log-hazard form, we need to make the conversion in order to compare PROC LIFEREG output with PROC PHREG output.
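As a small illustration of moving between the two forms, here is a minimal Python sketch that converts the exponential-model estimates in Output 4.2 from log-survival-time form to log-hazard form. The function name is our own; the only rule it applies is βj• = –βj, which for the exponential model holds for every coefficient, including the intercept.

```python
# Exponential-model estimates in log-survival-time form (Output 4.2).
aft_coefs = {
    "INTERCPT": 4.05069154, "FIN": 0.36626434, "AGE": 0.05559804,
    "RACE": -0.3049391, "WEXP": 0.14674614, "MAR": 0.42698669,
    "PARO": 0.08264792, "PRIO": -0.0856592,
}

def exponential_to_log_hazard(coefs):
    """Exponential model: beta_j* = -beta_j for every j, intercept included."""
    return {name: -b for name, b in coefs.items()}

log_hazard = exponential_to_log_hazard(aft_coefs)
for name, b in log_hazard.items():
    print(f"{name:8s} {b: .4f}")
```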
You may wonder why there is no disturbance term in equation (4.3) (a characteristic it shares with the more familiar logistic regression model). No disturbance term is needed because there is implicit random variation in the relationship between h(t), the unobserved hazard, and the observed event time T. Even if two individuals have exactly the same covariate values (and therefore the same hazard), they will not have the same event time. Nevertheless, in Chapter 8, “Heterogeneity, Repeated Events, and Other Topics,” we will see that there have been some attempts to add a disturbance term to models like this to represent unobserved heterogeneity.
Output 4.2 shows the results of fitting the exponential model to the recidivism data. Comparing this with the log-normal results in Output 4.1, we see some noteworthy differences. The coefficient for AGE is about twice as large in the exponential model, and its p-value declines from .08 to .01. Similarly, the coefficient for PRIO increases somewhat in magnitude, and its p-value also goes down substantially. On the other hand, the p-value for FIN increases to slightly above the .05 level.
LIFEREG PROCEDURE
Log Likelihood for EXPONENT -325.8259007

Variable | DF | Estimate | Std Err | ChiSquare | Pr>Chi | Label/Value |
---|---|---|---|---|---|---|
INTERCPT | 1 | 4.05069154 | 0.58604 | 47.77542 | 0.0001 | Intercept |
FIN | 1 | 0.36626434 | 0.191116 | 3.672793 | 0.0553 | |
AGE | 1 | 0.05559804 | 0.021841 | 6.479813 | 0.0109 | |
RACE | 1 | -0.3049391 | 0.30794 | 0.980603 | 0.3220 | |
WEXP | 1 | 0.14674614 | 0.2117 | 0.480499 | 0.4882 | |
MAR | 1 | 0.42698669 | 0.381382 | 1.253453 | 0.2629 | |
PARO | 1 | 0.08264792 | 0.195604 | 0.178529 | 0.6726 | |
PRIO | 1 | -0.0856592 | 0.028313 | 9.153053 | 0.0025 | |
SCALE | 0 | 1 | 0 | | | Extreme value scale parameter |

Lagrange Multiplier ChiSquare for Scale 24.93017 Pr>Chi is 0.0001.
Clearly, the choice of model can make a substantive difference. Later, this chapter considers some criteria for choosing among these and other models. Notice that the SCALE parameter σ is forced equal to 1.0; the last line says that the “Lagrange Multiplier ChiSquare for Scale” is 24.93017 with a p-value of .0001. This is a 1 degree-of-freedom test for the null hypothesis that σ = 1. Here the null hypothesis is soundly rejected, indicating that the hazard function is not constant over time. While this might suggest that the log-normal model is superior, things are not quite that simple. There are other models to consider as well.
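For a 1 degree-of-freedom test, the p-value can be computed without any statistical library, because the square of a standard normal variable has a chi-square distribution with 1 df. The Python sketch below (a helper of our own, not part of SAS) applies this to the Lagrange multiplier statistic of 24.93017 and shows that the exact p-value is far smaller than the .0001 printed in the output. As a sanity check, it also reproduces the .9632 reported for the statistic of .002124 in Output 4.6.

```python
import math

def chi2_1df_pvalue(x):
    """Upper-tail p-value for a chi-square statistic with 1 degree of freedom.

    If Z ~ N(0, 1), then Z**2 is chi-square(1), so P(Z**2 > x) = erfc(sqrt(x / 2)).
    """
    if x < 0:
        raise ValueError("a chi-square statistic cannot be negative")
    return math.erfc(math.sqrt(x / 2.0))

p_lm = chi2_1df_pvalue(24.93017)   # Lagrange multiplier test of sigma = 1
print(f"{p_lm:.1e}")
print(round(chi2_1df_pvalue(0.002124), 4))
```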
The Weibull model is a slight modification of the exponential model, with big consequences. By specifying DIST=WEIBULL in the MODEL statement, we retain the assumption that ε has a standard extreme-value distribution, but we relax the assumption that σ = 1. When σ > 1, the hazard decreases with time. When σ = 1, the hazard is constant, and we are back to the exponential model. When .5 < σ < 1, the hazard is increasing at a decreasing rate. When 0 < σ < .5, the hazard is increasing at an increasing rate. And when σ = .5, the hazard function is an increasing straight line with an origin at 0. Graphs of these hazard functions appear in Figure 2.3 (with the α in the figure equal to 1/σ – 1).
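The rules above can be encoded in a few lines. This is a sketch only: the function name and return format are hypothetical, and it relies on the fact that the Weibull hazard is proportional to t raised to the power α = 1/σ – 1.

```python
def weibull_hazard_shape(sigma):
    """Return (alpha, description) for a Weibull model with scale sigma.

    The hazard is proportional to t**alpha, where alpha = 1/sigma - 1.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    alpha = 1.0 / sigma - 1.0
    if sigma > 1:
        shape = "decreasing"
    elif sigma == 1:
        shape = "constant (exponential model)"
    elif sigma > 0.5:
        shape = "increasing at a decreasing rate"
    elif sigma == 0.5:
        shape = "increasing straight line from 0"
    else:
        shape = "increasing at an increasing rate"
    return alpha, shape

for s in (1.5, 1.0, 0.7124, 0.5, 0.3):
    print(s, weibull_hazard_shape(s))
```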
We call this the Weibull model because T has a Weibull distribution, conditional on the covariates. The Weibull distribution has long been the most popular parametric model in the biostatistical literature, for two reasons. First, it has a relatively simple survivor function that is easy to manipulate mathematically:
$$S(t) = \exp\left[-(\lambda t)^{\gamma}\right]$$
where γ = 1/σ and λ = exp{–[β0 + β1x1 + ... + βkxk]}.
Second, in addition to being an AFT model, the Weibull model is also a proportional hazards model. This means that its coefficients (when suitably transformed) can be interpreted as relative hazard ratios. In fact, the Weibull model (and its special case, the exponential model) is the only model that is simultaneously a member of both these classes.
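The proportional hazards property can be checked numerically. Writing the Weibull survivor function as S(t) = exp[–(λt)^γ] with γ = 1/σ and λ = exp{–[β0 + β1x1 + ... + βkxk]}, the hazard is h(t) = γλ^γ t^(γ–1). In the Python sketch below (σ and the two linear-predictor values are hypothetical), the ratio of the two individuals' hazards is the same at every t and equals exp[–(xb_a – xb_b)/σ].

```python
import math

def weibull_hazard(t, xb, sigma):
    """Weibull hazard when log T = xb + sigma * eps, eps standard extreme value.

    With gamma = 1/sigma and lam = exp(-xb), S(t) = exp(-(lam*t)**gamma)
    and h(t) = gamma * lam**gamma * t**(gamma - 1).
    """
    gamma = 1.0 / sigma
    lam = math.exp(-xb)
    return gamma * lam ** gamma * t ** (gamma - 1.0)

sigma = 0.7                    # hypothetical scale parameter
xb_a, xb_b = 4.0, 4.3          # hypothetical linear predictors
ratios = [weibull_hazard(t, xb_a, sigma) / weibull_hazard(t, xb_b, sigma)
          for t in (1.0, 10.0, 52.0)]
expected = math.exp(-(xb_a - xb_b) / sigma)
print([round(r, 6) for r in ratios], round(expected, 6))
```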
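As with the exponential model, there is an exact equivalence between the log-hazard form of the model
$$\log h(t) = \alpha \log t + \beta_0^{\bullet} + \beta_1^{\bullet}x_1 + \cdots + \beta_k^{\bullet}x_k$$
and the log-survival time model
$$\log T = \beta_0 + \beta_1x_1 + \cdots + \beta_kx_k + \sigma\varepsilon.$$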
The relationship between the parameters is slightly more complicated, however. Specifically, for the Weibull model
$$\beta_j^{\bullet} = -\frac{\beta_j}{\sigma}, \qquad j = 1, \ldots, k,$$
and α = 1/σ – 1. Since βj = 0 if and only if βj• = 0, a test of the null hypothesis that a coefficient is 0 will be the same regardless of which form you use. On the other hand, standard errors and confidence intervals for coefficients in the log-survival time format are not so easily converted to the log-hazard format. Collett (1994, p. 282) gives formulas for accomplishing this.
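Here is a minimal Python sketch of the slope conversion, applied to the estimates in Output 4.3. The function name is our own. It reproduces, to three decimals, the log-hazard coefficients reported for the Weibull model in this section, along with α = 1/σ – 1 ≈ 0.4037. It converts point estimates only; as just noted, standard errors need separate formulas.

```python
# Weibull log-survival-time estimates (Output 4.3).
sigma_hat = 0.71240533
aft_coefs = {
    "FIN": 0.27216336, "AGE": 0.0407138, "RACE": -0.2248024,
    "WEXP": 0.10655659, "MAR": 0.31127326, "PARO": 0.05882725,
    "PRIO": -0.0658169,
}

def weibull_to_log_hazard(coefs, sigma):
    """Slope conversion beta_j* = -beta_j / sigma, plus alpha = 1/sigma - 1."""
    slopes = {name: -b / sigma for name, b in coefs.items()}
    alpha = 1.0 / sigma - 1.0
    return slopes, alpha

slopes, alpha = weibull_to_log_hazard(aft_coefs, sigma_hat)
for name, b in slopes.items():
    print(f"{name:5s} {b: .3f}")
print(f"alpha {alpha:.4f}")
```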
Output 4.3 shows the results from fitting the Weibull model to the recidivism data. Compared with the exponential model in Output 4.2, the coefficients are all somewhat attenuated. But the standard errors are also smaller, so the chi-square statistics and p-values are hardly affected at all. Furthermore, if we convert the coefficients to the log-hazard format by changing sign and dividing by σ̂ (the SCALE estimate of .7124 in the output), we get
Variable | Log-hazard coefficient |
---|---|
FIN | -0.382 |
AGE | -0.057 |
RACE | 0.316 |
WEXP | -0.150 |
MAR | -0.437 |
PARO | -0.083 |
PRIO | 0.092 |
These coefficients are much closer to the log-hazard coefficients for the exponential model (which differ only in sign from the log-survival time coefficients).
LIFEREG PROCEDURE
Log Likelihood for WEIBULL -319.3765238

Variable | DF | Estimate | Std Err | ChiSquare | Pr>Chi | Label/Value |
---|---|---|---|---|---|---|
INTERCPT | 1 | 3.9901348 | 0.419095 | 90.64624 | 0.0001 | Intercept |
FIN | 1 | 0.27216336 | 0.137962 | 3.891714 | 0.0485 | |
AGE | 1 | 0.0407138 | 0.016004 | 6.472179 | 0.0110 | |
RACE | 1 | -0.2248024 | 0.220159 | 1.042629 | 0.3072 | |
WEXP | 1 | 0.10655659 | 0.151541 | 0.494425 | 0.4820 | |
MAR | 1 | 0.31127326 | 0.273302 | 1.297171 | 0.2547 | |
PARO | 1 | 0.05882725 | 0.139638 | 0.17748 | 0.6735 | |
PRIO | 1 | -0.0658169 | 0.020941 | 9.878652 | 0.0017 | |
SCALE | 1 | 0.71240533 | 0.063423 | | | Extreme value scale parameter |
Since σ̂ (labeled SCALE in Output 4.3) is between .5 and 1, we conclude that the hazard is increasing at a decreasing rate. We can also calculate α̂ = (1/.7124) – 1 = 0.4037, which is the coefficient for log t in the log-hazard model. Because both the dependent and independent variables are logged, this coefficient can be interpreted as an elasticity: a 1 percent increase in time since release produces approximately a 0.40 percent increase in the hazard for arrest.
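The elasticity reading of α̂ can be checked directly. In the Weibull log-hazard model the hazard is proportional to t^α, so multiplying t by a factor c multiplies the hazard by c^α. The Python sketch below uses the SCALE estimate of .7124; the doubling comparison is our own added illustration.

```python
sigma_hat = 0.7124
alpha = 1.0 / sigma_hat - 1.0        # coefficient of log t in the log-hazard model

# The hazard is proportional to t**alpha, so scaling t by c scales the hazard by c**alpha.
pct_change_1pct = 100.0 * (1.01 ** alpha - 1.0)   # effect of a 1 percent increase in t
ratio_doubling = 2.0 ** alpha                      # effect of doubling elapsed time

print(f"alpha = {alpha:.4f}")
print(f"1% more time -> hazard up {pct_change_1pct:.2f}%")
print(f"2x time -> hazard multiplied by {ratio_doubling:.3f}")
```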
Although we have already discussed the log-normal model and applied it to the recidivism data, we have not yet considered the shape of its hazard function. Unlike the Weibull model, the log-normal model has a nonmonotonic hazard function. The hazard is 0 when t=0. It rises to a peak and then declines toward 0 as t goes to infinity. The log-normal is not a proportional hazards model, and its hazard cannot be expressed in closed form (it involves the c.d.f. of a standard normal variable). It can, however, be expressed as a regression model in which the dependent variable is the logarithm of the hazard. Specifically,
$$\log h(t) = \log h_0\left(t e^{-\beta x}\right) - \beta x$$
where h0(.) is the hazard function for an individual with x = 0. This equation also applies to the log-logistic and gamma models to be discussed shortly, except that h0(.) will be different in each case.
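Because the log-normal hazard has no closed form, it is easiest to explore numerically. The Python sketch below computes it as the ratio of the density to the survivor function, using math.erfc for the normal tail, and treats βx as the whole linear predictor (so h0 is the hazard when that predictor is 0). It checks the identity log h(t) = log h0(te^(–βx)) – βx for hypothetical values of σ and βx, and it shows the rise-and-fall shape for a distribution with median 1 (σ = 1, chosen arbitrarily).

```python
import math

def lognormal_hazard(t, xb, sigma):
    """Hazard when log T = xb + sigma * eps with eps ~ N(0, 1)."""
    z = (math.log(t) - xb) / sigma
    density = math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma * t)
    survival = 0.5 * math.erfc(z / math.sqrt(2.0))   # 1 - Phi(z)
    return density / survival

# AFT identity: h(t) = h0(t * exp(-xb)) * exp(-xb), where h0 is the hazard at xb = 0.
sigma, xb = 1.2, 0.8            # hypothetical values
identity_holds = all(
    math.isclose(math.log(lognormal_hazard(t, xb, sigma)),
                 math.log(lognormal_hazard(t * math.exp(-xb), 0.0, sigma)) - xb,
                 rel_tol=1e-9, abs_tol=1e-12)
    for t in (0.5, 2.0, 7.0)
)
print(identity_holds)

# Inverted U: near 0 at small t, a peak, then a slow decline (median = exp(0) = 1).
for t in (0.01, 0.6, 5.0, 50.0):
    print(t, round(lognormal_hazard(t, 0.0, 1.0), 4))
```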
Some typical log-normal hazard functions are shown in Figure 4.1. All three functions correspond to distributions with a median of 1.0. When σ is large, the hazard peaks so rapidly that the function is almost indistinguishable from those like the Weibull and log-logistic that may have an infinite hazard when t = 0.
The inverted U-shape of the log-normal hazard is often appropriate for repeatable events. Suppose, for example, that the event of interest is a residential move. Immediately after a move, the hazard of another move is likely to be extremely low. People need to rest and recoup the substantial costs involved in moving. The hazard will certainly rise with time, but much empirical evidence indicates that it eventually begins to decline. One explanation is that, as time goes by, people become increasingly invested in a particular location or community. However, Chapter 8 shows how the declining portion of the hazard function may also be a consequence of unobserved heterogeneity.
Another model that allows for an inverted U-shaped hazard is the log-logistic model, which assumes that ε has a logistic distribution with p.d.f.
$$f(\varepsilon) = \frac{e^{\varepsilon}}{\left(1 + e^{\varepsilon}\right)^{2}}.$$
A symmetric distribution with a mean of 0, the logistic distribution is quite similar in shape to the normal distribution. The logistic distribution is well known to students of the logistic (logit) regression model, which can be derived by assuming (a) a linear model with a logistically distributed error term and (b) a dichotomization of the dependent variable.
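Both claims in this paragraph can be checked numerically. The Python sketch below uses the standard logistic density f(ε) = e^ε/(1 + e^ε)², written in an overflow-safe form. It confirms that the variance is π²/3 (somewhat larger than the standard normal's 1), and that if y* = βx + ε with logistic ε, then P(y* > 0) equals the logit formula 1/(1 + e^(–βx)). The value of βx is hypothetical.

```python
import math

def logistic_pdf(e):
    """Standard logistic density exp(e)/(1 + exp(e))**2, written via |e| to avoid overflow."""
    a = math.exp(-abs(e))
    return a / (1.0 + a) ** 2

def integrate(f, lo, hi, n=40000):
    """Composite trapezoid rule on [lo, hi]."""
    h = (hi - lo) / n
    return h * (0.5 * f(lo) + 0.5 * f(hi) + sum(f(lo + i * h) for i in range(1, n)))

variance = integrate(lambda e: e * e * logistic_pdf(e), -40.0, 40.0)

# Latent-variable derivation of the logit model: y = 1 when xb + eps > 0.
xb = 0.7                                         # hypothetical linear predictor
p_latent = integrate(logistic_pdf, -xb, 40.0)    # P(eps > -xb)
p_logit = 1.0 / (1.0 + math.exp(-xb))

print(round(variance, 4), round(math.pi ** 2 / 3, 4))
print(round(p_latent, 6), round(p_logit, 6))
```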
If ε has a logistic distribution, then so does log T (although with a nonzero mean). It follows that T has a log-logistic distribution. The log-logistic hazard function is
$$h(t) = \frac{\lambda\gamma(\lambda t)^{\gamma - 1}}{1 + (\lambda t)^{\gamma}}$$
where γ = 1/σ and λ = exp{–[β0 + β1 x1 + ... + βkxk]}. This produces the characteristic shapes shown in Figure 4.2, all of which correspond to distributions with a median of 1.0.
When σ < 1, the log-logistic hazard is similar to the log-normal hazard: it starts at 0, rises to a peak, and then declines toward 0. When σ > 1, the hazard behaves like the decreasing Weibull hazard: it starts at infinity and declines toward 0. When σ = 1, the hazard has a value of λ at t=0, and then declines toward 0 as t goes to infinity.
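The three regimes can be reproduced with a direct implementation of the log-logistic hazard h(t) = λγ(λt)^(γ–1)/[1 + (λt)^γ]. The Python sketch below sets λ = 1 (median 1) and tries σ = 0.5, 1, and 2, which are arbitrary illustrative values. Setting the derivative of log h(t) to zero also gives the peak location for σ < 1, t* = (γ – 1)^(1/γ)/λ; this is a small derivation of our own, which the code checks numerically.

```python
def loglogistic_hazard(t, lam, sigma):
    """Log-logistic hazard lam*gamma*(lam*t)**(gamma-1) / (1 + (lam*t)**gamma), gamma = 1/sigma."""
    gamma = 1.0 / sigma
    u = lam * t
    return lam * gamma * u ** (gamma - 1.0) / (1.0 + u ** gamma)

def loglogistic_peak(lam, sigma):
    """Time of maximum hazard when sigma < 1: t* = (gamma - 1)**(1/gamma) / lam."""
    if sigma >= 1:
        raise ValueError("the hazard has an interior peak only when sigma < 1")
    gamma = 1.0 / sigma
    return (gamma - 1.0) ** (1.0 / gamma) / lam

for sigma in (0.5, 1.0, 2.0):
    print(sigma, [round(loglogistic_hazard(t, 1.0, sigma), 3) for t in (1e-6, 0.5, 1.0, 3.0, 10.0)])

# Peak time for the SCALE estimate in Output 4.4, with lambda = 1.
print(loglogistic_peak(1.0, 0.64713465))
```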
The log-logistic model has a rather simple survivor function,
$$S(t) = \frac{1}{1 + (\lambda t)^{\gamma}}.$$
As before, γ = 1/σ and λ = exp{–[β0 + β1x1 + ... + βkxk]}. A little algebra shows that this can be written as
$$\log\left[\frac{S(t)}{1 - S(t)}\right] = \beta_0^{\bullet} + \beta_1^{\bullet}x_1 + \cdots + \beta_k^{\bullet}x_k - \gamma \log t$$
where βj• = βj/σ for j = 0, 1, ..., k. This is nothing more than a logistic (logit) regression model, which means that we can estimate the log-logistic model for the recidivism data by fitting a logit model to the dichotomy arrested versus not arrested in the first year after release. (Because t is a constant 52 weeks, the term γ log t gets absorbed into the intercept.) Of course, this estimation method is not fully efficient because we are not using the information on the exact timing of the arrests, and we certainly will not get the same estimates. The point is that the two apparently different methods are actually estimating the same underlying model.
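The logit representation is easy to verify numerically. The Python sketch below computes S(t) = 1/[1 + (λt)^γ] at t = 52 weeks and checks that its log-odds equal βx/σ – γ log t. The SCALE value comes from Output 4.4, but the linear predictor is hypothetical rather than taken from the fitted model.

```python
import math

def loglogistic_survival(t, xb, sigma):
    """S(t) = 1 / (1 + (lam*t)**gamma) with gamma = 1/sigma and lam = exp(-xb)."""
    gamma = 1.0 / sigma
    lam = math.exp(-xb)
    return 1.0 / (1.0 + (lam * t) ** gamma)

sigma = 0.64713465   # SCALE estimate from Output 4.4
xb = 4.5             # hypothetical value of beta0 + beta1*x1 + ... + betak*xk
t = 52.0             # one year after release, in weeks

s = loglogistic_survival(t, xb, sigma)
log_odds = math.log(s / (1.0 - s))
linear_form = xb / sigma - math.log(t) / sigma    # beta*x / sigma - gamma * log t
print(round(log_odds, 6), round(linear_form, 6))
```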
Because S(t) is the probability of surviving to time t, S(t)/[1–S(t)] is the odds of surviving to time t. Thus, we have a model that is linear in the log-odds. Equivalently, we can say that the log-logistic model is a member of a general class of models called proportional odds models. This class can be defined in the following way. Let Si(t) and Sj(t) be the survivor functions for any two individuals i and j. The proportional odds model says that
$$\frac{S_i(t)}{1 - S_i(t)} = \phi_{ij}\,\frac{S_j(t)}{1 - S_j(t)} \quad \text{for all } t,$$
where φij is some constant that is specific to the pair (i, j). In fact, the log-logistic model is the only model that is both a proportional odds model and an AFT model.
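The proportional odds property follows from the log-logistic survivor function S(t) = 1/[1 + (λt)^γ]: with a common σ, the odds of surviving to t for individuals i and j differ by the factor exp[(βx_i – βx_j)/σ] at every t. The Python sketch below checks this with hypothetical linear predictors and the SCALE estimate from Output 4.4.

```python
import math

def survival_odds(t, xb, sigma):
    """Odds S(t)/[1 - S(t)] for the log-logistic model, gamma = 1/sigma, lam = exp(-xb)."""
    gamma = 1.0 / sigma
    lam = math.exp(-xb)
    s = 1.0 / (1.0 + (lam * t) ** gamma)
    return s / (1.0 - s)

sigma = 0.64713465             # SCALE estimate from Output 4.4
xb_i, xb_j = 4.2, 3.8          # hypothetical linear predictors
phi_ij = math.exp((xb_i - xb_j) / sigma)
ratios = [survival_odds(t, xb_i, sigma) / survival_odds(t, xb_j, sigma)
          for t in (5.0, 26.0, 52.0)]
print(round(phi_ij, 4), [round(r, 4) for r in ratios])
```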
To fit the log-logistic model with PROC LIFEREG, you specify DIST=LLOGISTIC as an option in the MODEL statement. Output 4.4 shows the results for the recidivism data. The first thing to notice is that the estimate of σ (labeled SCALE) is less than 1.0, implying that the estimated hazard function follows the inverted U-shaped form shown in Figure 4.2. Given what I just said about the similarity of the log-normal and log-logistic hazards, you might expect the other results to be most similar to the log-normal output in Output 4.1. But the coefficients and test statistics actually appear to be closer to those for the Weibull model in Output 4.3.
LIFEREG PROCEDURE
Log Likelihood for LLOGISTC -319.3983709

Variable | DF | Estimate | Std Err | ChiSquare | Pr>Chi | Label/Value |
---|---|---|---|---|---|---|
INTERCPT | 1 | 3.91830423 | 0.427434 | 84.03459 | 0.0001 | Intercept |
FIN | 1 | 0.28887608 | 0.145585 | 3.937211 | 0.0472 | |
AGE | 1 | 0.03636558 | 0.015572 | 5.453767 | 0.0195 | |
RACE | 1 | -0.2791492 | 0.229654 | 1.477496 | 0.2242 | |
WEXP | 1 | 0.17842379 | 0.157185 | 1.288494 | 0.2563 | |
MAR | 1 | 0.34730388 | 0.269669 | 1.658665 | 0.1978 | |
PARO | 1 | 0.05079816 | 0.149574 | 0.115341 | 0.7341 | |
PRIO | 1 | -0.0691822 | 0.022742 | 9.254181 | 0.0023 | |
SCALE | 1 | 0.64713465 | 0.055923 | | | Logistic scale parameter |
The survival analysis literature discusses two different gamma models: the standard (2-parameter) model and the generalized (3-parameter) model. PROC LIFEREG fits the generalized model. Because the generalized gamma model has one more parameter than any of the other models we have considered, its hazard function can take on a wide variety of shapes. In particular, the exponential, Weibull, standard gamma, and log-normal models (but not the log-logistic) are all special cases of the generalized gamma model. This fact is exploited later in this chapter when we consider likelihood ratio tests for comparing the different models. But the generalized gamma model can also take on shapes that are unlike any of these special cases. Most important, it can have hazard functions with U or bathtub shapes in which the hazard declines, reaches a minimum, and then increases. It is well known that the hazard for human mortality, considered over the whole life span, has such a shape. On the other hand, the generalized gamma model cannot represent hazard functions that have more than one peak.
Given the richness of the generalized gamma model, why not always use it instead of the other models? There are two reasons. First, the formula for the hazard function for the generalized gamma model is rather complicated, involving the gamma function and the incomplete gamma function. Consequently, you may often find it difficult to judge the shape of the hazard function from the estimated parameters. By contrast, hazard functions for the specific submodels can be rather simply described, as we have already seen. Second, computation for the generalized gamma model is considerably more difficult. For example, it took more than five times as much computer time to estimate the generalized gamma model for the recidivism data as compared with the exponential model. This fact can be an important consideration when you are working with large data sets. The generalized gamma model also has a reputation for convergence problems, although the parameterization and numerical algorithms used by PROC LIFEREG seem to have reduced these to a minimum.
To fit the generalized gamma model with PROC LIFEREG, you specify DIST=GAMMA as an option in the MODEL statement. Output 4.5 shows the results from fitting this model to the recidivism data. As usual, the SCALE parameter is the estimate of σ in equation (4.2). The estimate labeled SHAPE is the additional shape parameter that is denoted by δ in the PROC LIFEREG documentation. (In the output for earlier releases of PROC LIFEREG, this parameter is labeled GAMMA.) When the shape parameter is 0, we get the log-normal distribution. When it is 1.0, we have the Weibull distribution. And when the shape parameter and the scale parameter are equal, we have the standard gamma distribution. In Output 4.5, the shape estimate is almost exactly 1.0, so we are very close to the Weibull distribution. The shape and scale parameters are also similar, so the standard gamma model is also quite plausible for these data. Later, we'll make these comparisons more rigorous.
LIFEREG PROCEDURE
Log Likelihood for GAMMA -319.3764549

Variable | DF | Estimate | Std Err | ChiSquare | Pr>Chi | Label/Value |
---|---|---|---|---|---|---|
INTERCPT | 1 | 3.99149155 | 0.434907 | 84.23224 | 0.0001 | Intercept |
FIN | 1 | 0.27244327 | 0.140118 | 3.780636 | 0.0518 | |
AGE | 1 | 0.04066435 | 0.016546 | 6.039829 | 0.0140 | |
RACE | 1 | -0.2254894 | 0.227983 | 0.978241 | 0.3226 | |
WEXP | 1 | 0.10734828 | 0.16595 | 0.418444 | 0.5177 | |
MAR | 1 | 0.31179029 | 0.276896 | 1.267921 | 0.2602 | |
PARO | 1 | 0.05878877 | 0.139813 | 0.176805 | 0.6741 | |
PRIO | 1 | -0.0658616 | 0.021303 | 9.558375 | 0.0020 | |
SCALE | 1 | 0.71511922 | 0.239598 | | | Gamma scale parameter |
SHAPE | 1 | 0.99429204 | 0.484882 | | | Gamma shape parameter |
As for the standard gamma model, there is no direct way of fitting this in PROC LIFEREG. Ideally, you fit the generalized model while imposing the constraint SCALE=SHAPE, but PROC LIFEREG doesn’t handle equality constraints. PROC LIFEREG does allow you to fix both the scale and shape parameters at specific values. So if you’re desperate to fit the standard gamma model, you can try out a bunch of different values (for example, with a line search) until you find the common value for the shape and scale parameters that maximizes the log-likelihood. When I did this for the recidivism data, I ended up with scale and shape parameters equal to .811. Without going into the details of the search, the code to fit the final model is as follows:
```sas
proc lifereg data=recid;
   model week*arrest(0)=fin age race wexp mar paro prio
         / dist=gamma noshape1 shape1=.811 noscale scale=.811;
run;
```
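The search itself is easy to automate. Below is a generic golden-section line search in Python, offered only as a sketch of the idea. In practice each evaluation of the objective would mean running the PROC LIFEREG step above with SCALE= and SHAPE1= set to a candidate value and reading off the log-likelihood; the stand-in objective toy_loglik is purely hypothetical, and the search assumes the log-likelihood is unimodal on the bracket.

```python
import math

def golden_section_max(f, lo, hi, tol=1e-4):
    """Maximize a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0   # about 0.618
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:
            # Maximum lies in [a, d]; the old c becomes the new d.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            # Maximum lies in [c, b]; the old d becomes the new c.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2.0

def toy_loglik(v):
    """Hypothetical stand-in for the log-likelihood at SCALE = SHAPE = v."""
    return -(v - 0.75) ** 2

best = golden_section_max(toy_loglik, 0.3, 1.5)
print(round(best, 3))
```

Golden-section search reuses one interior evaluation per step, which matters when each evaluation is a full model fit.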
Output 4.6 shows the results. The coefficients show little change from Output 4.5, which is not surprising since the log-likelihood hardly changes at all. The standard errors are probably slight underestimates because they do not take account of the sampling variation in the SCALE-SHAPE estimate.
LIFEREG PROCEDURE
Log Likelihood for GAMMA -319.4636775

Variable | DF | Estimate | Std Err | ChiSquare | Pr>Chi | Label/Value |
---|---|---|---|---|---|---|
INTERCPT | 1 | 4.03969532 | 0.426895 | 89.54781 | 0.0001 | Intercept |
FIN | 1 | 0.28319316 | 0.141064 | 4.030268 | 0.0447 | |
AGE | 1 | 0.03902397 | 0.015661 | 6.209049 | 0.0127 | |
RACE | 1 | -0.2498191 | 0.227472 | 1.206139 | 0.2721 | |
WEXP | 1 | 0.13482081 | 0.15585 | 0.748343 | 0.3870 | |
MAR | 1 | 0.33229455 | 0.275574 | 1.454021 | 0.2279 | |
PARO | 1 | 0.0579843 | 0.145018 | 0.159874 | 0.6893 | |
PRIO | 1 | -0.0673115 | 0.021468 | 9.830681 | 0.0017 | |
SCALE | 0 | 0.811 | 0 | | | Gamma scale parameter |
SHAPE | 0 | 0.811 | 0 | | | Gamma shape parameter |

Lagrange Multiplier ChiSquare for Scale 0.002124 Pr>Chi is 0.9632.
What is the interpretation of the standard gamma model? Under this model, the survival time T has the gamma p.d.f.
$$f(t) = \frac{\lambda(\lambda t)^{K-1}e^{-\lambda t}}{\Gamma(K)}$$
where K = 1/δ² (δ is the shape parameter reported by LIFEREG). The λ parameter is a function of the covariates: λ = exp{–[β0 + β1x1 + ... + βkxk]}. Γ(.) is the gamma function, a well-known function in mathematics that is defined by an integral. When K is a positive integer, Γ(K) = (K – 1)! and, in particular, Γ(1) = 0! = 1.
Both the survivor and hazard functions of the standard gamma distribution are awkward because they involve incomplete gamma functions, which are expressed as integrals. When K > 1, the hazard is 0 at time 0 and increases thereafter. When 0 < K < 1, the hazard is infinite at time 0 and decreases thereafter. When K = 1, the hazard is constant, and we are back to the exponential model. So far, this is similar to the Weibull hazard. The big difference is that the increasing Weibull hazard increases without limit, while the standard gamma hazard approaches λ as an upper limit. (Clearly, if there is an upper limit, the hazard cannot keep increasing at an increasing rate.) Similarly, while the decreasing Weibull hazard approaches 0 as a lower limit, the decreasing gamma hazard has a lower limit of λ. Figure 4.3 shows examples of such hazards with λ = 1. Since λ is a log-linear function of the covariates, we can think of the covariates as raising or lowering the boundary of the hazard. The K parameter determines how quickly the limit is approached and whether it approaches from above or below. For the model in Output 4.6, we can get the ML estimate of K by taking 1/(.811)² = 1.52. Since this is greater than 1, we have evidence for an increasing hazard.
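The bounded hazard can be seen numerically. The Python sketch below evaluates the standard gamma hazard as f(t)/S(t) for the density f(t) = λ(λt)^(K–1)e^(–λt)/Γ(K), computing the regularized incomplete gamma function from its power series. That series is adequate for the moderate values of λt used here but is not a production implementation. The sketch uses λ = 1, as in Figure 4.3, K = 1/(.811)² from Output 4.6, and K = 0.6 as an arbitrary contrasting value.

```python
import math

def gamma_survival(t, k, lam):
    """S(t) = 1 - P(k, lam*t), where P is the regularized lower incomplete gamma."""
    x = lam * t
    if x <= 0:
        return 1.0
    term = 1.0 / k          # n = 0 term of the sum of x**n / (k (k+1) ... (k+n))
    total = term
    n = 0
    while term > 1e-17 * total:
        n += 1
        term *= x / (k + n)
        total += term
    lower = math.exp(k * math.log(x) - x - math.lgamma(k)) * total
    return 1.0 - lower

def gamma_hazard(t, k, lam):
    """Hazard f(t)/S(t) for f(t) = lam * (lam*t)**(k-1) * exp(-lam*t) / Gamma(k)."""
    x = lam * t
    density = math.exp(math.log(lam) + (k - 1.0) * math.log(x) - x - math.lgamma(k))
    return density / gamma_survival(t, k, lam)

k_hat = 1.0 / 0.811 ** 2       # about 1.52, greater than 1: increasing hazard
times = (0.1, 1.0, 8.0)
rising = [gamma_hazard(t, k_hat, 1.0) for t in times]
falling = [gamma_hazard(t, 0.6, 1.0) for t in times]
print(round(k_hat, 2), [round(h, 3) for h in rising], [round(h, 3) for h in falling])
```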