Alternative Distributions

In ordinary linear regression, the assumption of a normal distribution for the disturbance term is routinely invoked for a wide range of applications. Yet PROC LIFEREG allows for four additional distributions for ε: extreme value (2 parameter), extreme value (1 parameter), log-gamma, and logistic. For each of these distributions, there is a corresponding distribution for T:

Distribution of ε          Distribution of T
extreme value (2 par.)     Weibull
extreme value (1 par.)     exponential
log-gamma                  gamma
logistic                   log-logistic
normal                     log-normal

Incidentally, all AFT models are named for the distribution of T rather than the distribution of ε or log T. You might expect that since the logistic and normal lead to the log-logistic and log-normal, the gamma will lead to the log-gamma. But it is just the reverse. This is one of those unfortunate terminological inconsistencies that we just have to live with.

What is it about survival analysis that makes these alternatives worth considering? The main reason for allowing other distributions is that they have different implications for hazard functions that may, in turn, lead to different substantive interpretations. The remainder of this section explores each of these alternatives in some detail.

The Exponential Model

The simplest model that PROC LIFEREG estimates is the exponential model, invoked by DIST=EXPONENTIAL in the MODEL statement. This model specifies that ε has a standard extreme-value distribution, and constrains σ = 1. If ε has an extreme-value distribution, then log T also has an extreme-value distribution, conditional on the covariates. This implies that T itself has an exponential distribution, which is why we call it the exponential model. The standard extreme value distribution is also known as a Gumbel distribution or a double exponential distribution. It has a p.d.f. of f(ε) = exp[ε – exp(ε)]. Like the normal distribution, this is a unimodal distribution defined on the entire real line. Unlike the normal, however, it is not symmetrical, being slightly skewed to the left.
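As a minimal sketch of the call, assume the recidivism data are stored in a data set named RECID, with the same time, censoring, and covariate names that appear in the MODEL statement for the gamma example later in this section:

```sas
/* Sketch: exponential AFT model for the recidivism data.           */
/* The data set name and variable names are assumed to match the    */
/* gamma example shown later in this section.                       */
proc lifereg data=recid;
  model week*arrest(0) = fin age race wexp mar paro prio
        / dist=exponential;
run;
```

Because DIST=EXPONENTIAL fixes σ at 1, the SCALE line in the resulting output should show 0 degrees of freedom.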

As we saw in Chapter 2, “Basic Concepts of Survival Analysis,” an exponential distribution for T corresponds to a constant hazard function, which is the most characteristic feature of this model. However, equation (2.12) expresses the exponential regression model as

Equation 4.3

log h(t) = β0• + β1•x1 + ... + βk•xk

where the •s have been added to distinguish these coefficients from those in equation (4.2). Although the dependent variable in equation (4.2) is the log of time, in equation (4.3) it is the log of the hazard. It turns out that the two models are completely equivalent. Furthermore, there is a simple relationship between the coefficients in equation (4.2) and equation (4.3), namely that βj = –βj• for all j.

The change in signs makes intuitive sense. If the hazard is high, then events occur quickly and survival times are short. On the other hand, when the hazard is low, events are unlikely to occur and survival times are long. It is important to be able to shift back and forth between these two ways of expressing the model so that you can compare results across different computer programs. In particular, since PROC PHREG reports coefficients in log-hazard form, we need to make the conversion in order to compare PROC LIFEREG output with PROC PHREG output.
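The conversion itself is only a sign reversal. The following DATA step is a sketch that applies it to the exponential estimates reported below in Output 4.2; the data set name EXP_LOGHAZ and the variable names are made up for illustration.

```sas
/* Sketch: convert exponential-model coefficients from the          */
/* log-survival-time form to the log-hazard form by reversing       */
/* their signs. Estimates are copied from Output 4.2.               */
data exp_loghaz;
  length variable $8;
  input variable $ aft_estimate;
  loghaz_estimate = -aft_estimate;
  datalines;
FIN   0.36626434
AGE   0.05559804
RACE -0.3049391
WEXP  0.14674614
MAR   0.42698669
PARO  0.08264792
PRIO -0.0856592
;
run;

proc print data=exp_loghaz noobs;
run;
```

The LOGHAZ_ESTIMATE column is the one to set beside PROC PHREG coefficients.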

You may wonder why there is no disturbance term in equation (4.3) (a characteristic it shares with the more familiar logistic regression model). No disturbance term is needed because there is implicit random variation in the relationship between h(t), the unobserved hazard, and the observed event time T. Even if two individuals have exactly the same covariate values (and therefore the same hazard), they will not have the same event time. Nevertheless, in Chapter 8, “Heterogeneity, Repeated Events, and Other Topics,” we will see that there have been some attempts to add a disturbance term to models like this to represent unobserved heterogeneity.

Output 4.2 shows the results of fitting the exponential model to the recidivism data. Comparing this with the log-normal results in Output 4.1, we see some noteworthy differences. The coefficient for AGE is about twice as large in the exponential model, and its p-value declines from .08 to .01. Similarly, the coefficient for PRIO increases somewhat in magnitude, and its p-value also goes down substantially. On the other hand, the p-value for FIN increases to slightly above the .05 level.

Output 4.2. Exponential Model Applied to Recidivism Data
L I F E R E G  P R O C E D U R E

Log Likelihood for EXPONENT -325.8259007

Variable  DF   Estimate  Std Err ChiSquare  Pr>Chi Label/Value

INTERCPT   1 4.05069154  0.58604  47.77542  0.0001 Intercept
FIN        1 0.36626434 0.191116  3.672793  0.0553
AGE        1 0.05559804 0.021841  6.479813  0.0109
RACE       1 -0.3049391  0.30794  0.980603  0.3220
WEXP       1 0.14674614   0.2117  0.480499  0.4882
MAR        1 0.42698669 0.381382  1.253453  0.2629
PARO       1 0.08264792 0.195604  0.178529  0.6726
PRIO       1 -0.0856592 0.028313  9.153053  0.0025
SCALE      0          1        0                 Extreme value scale parameter
  Lagrange Multiplier ChiSquare for Scale 24.93017 Pr>Chi is 0.0001.

Clearly, the choice of model can make a substantive difference. Later, this chapter considers some criteria for choosing among these and other models. Notice that the SCALE parameter σ is forced equal to 1.0; the last line says that the “Lagrange Multiplier ChiSquare for Scale” is 24.93017 with a p-value of .0001. This is a 1 degree-of-freedom test for the null hypothesis that σ = 1. Here the null hypothesis is soundly rejected, indicating that the hazard function is not constant over time. While this might suggest that the log-normal model is superior, things are not quite that simple. There are other models to consider as well.
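The printed p-value of .0001 is simply the smallest value the output displays. As a rough check, the upper-tail probability for a 1 degree-of-freedom chi-square can be computed directly; this sketch uses the PROBCHI function and the statistic from Output 4.2.

```sas
/* Sketch: upper-tail p-value for the 1-df Lagrange multiplier      */
/* chi-square reported in Output 4.2.                               */
data _null_;
  chisq = 24.93017;
  df    = 1;
  p     = 1 - probchi(chisq, df);
  put 'LM chi-square = ' chisq ' df = ' df ' p-value = ' p e10.;
run;
```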

The Weibull Model

The Weibull model is a slight modification of the exponential model, with big consequences. By specifying DIST=WEIBULL in the MODEL statement, we retain the assumption that ε has a standard extreme-value distribution, but we relax the assumption that σ = 1. When σ > 1, the hazard decreases with time. When .5 < σ < 1, the hazard is increasing at a decreasing rate. When 0 < σ < .5, the hazard is increasing at an increasing rate. And when σ = .5, the hazard function is an increasing straight line with an origin at 0. Graphs of these hazard functions appear in Figure 2.3 (with the α in the figure equal to 1/σ – 1).
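To see these four cases numerically, the sketch below tabulates the Weibull hazard for one value of σ in each range. It assumes the linear predictor is 0, in which case the hazard reduces to h(t) = (1/σ)t^(1/σ – 1); the particular σ values and the time grid are arbitrary choices for illustration.

```sas
/* Sketch: Weibull hazards for several scale values, assuming the   */
/* linear predictor is 0 so that                                    */
/*   h(t) = (1/sigma) * t**(1/sigma - 1).                           */
data weibull_hazard;
  do sigma = 2, 0.75, 0.5, 0.25;       /* one value per case        */
    alpha = 1/sigma - 1;               /* exponent of t             */
    do t = 0.25 to 3 by 0.25;
      hazard = (1/sigma) * t**alpha;
      output;
    end;
  end;
run;

proc print data=weibull_hazard noobs;
  by sigma notsorted;
run;
```

With σ = 0.5 the exponent is exactly 1, so the hazard column is the straight line 2t.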

We call this the Weibull model because T has a Weibull distribution, conditional on the covariates. The Weibull distribution has long been the most popular parametric model in the biostatistical literature, for two reasons. First, it has a relatively simple survivor function that is easy to manipulate mathematically:

S(t) = exp{–[t exp(–βx)]^(1/σ)}

Second, in addition to being an AFT model, the Weibull model is also a proportional hazards model. This means that its coefficients (when suitably transformed) can be interpreted as relative hazard ratios. In fact, the Weibull model (and its special case, the exponential model) is the only model that is simultaneously a member of both these classes.

As with the exponential model, there is an exact equivalence between the log-hazard form of the model

log h(t) = α log t + β0• + β1•x1 + ... + βk•xk

and the log-survival time model

log T = β0 + β1x1 + ... + βkxk + σε.

The relationship between the parameters is slightly more complicated, however. Specifically, for the Weibull model

βj• = –βj/σ   for j = 1, ..., k

and α = 1/σ – 1. Since βj• = 0 if and only if βj = 0, a test of the null hypothesis that a coefficient is 0 will be the same regardless of which form you use. On the other hand, standard errors and confidence intervals for coefficients in the log-survival time format are not so easily converted to the log-hazard format. Collett (1994, p. 282) gives formulas for accomplishing this.

Output 4.3 shows the results from fitting the Weibull model to the recidivism data. Compared with the exponential model in Output 4.2, the coefficients are all somewhat attenuated. But the standard errors are also smaller, so the chi-square statistics and p-values are hardly affected at all. Furthermore, if we convert the coefficients to the log-hazard format by changing sign and dividing by σ̂ (the SCALE estimate of .7124 in the output), we get

FIN     -0.382
AGE     -0.057
RACE     0.316
WEXP    -0.150
MAR     -0.437
PARO    -0.083
PRIO     0.092.

These coefficients are much closer to the log-hazard coefficients for the exponential model (which differ only in sign from the log-survival time coefficients).
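The same conversion can be scripted rather than done by hand. This sketch reads the Weibull estimates from Output 4.3, reverses each sign, and divides by the SCALE estimate; the data set and variable names are made up for illustration.

```sas
/* Sketch: convert Weibull coefficients from the log-survival-time  */
/* form to the log-hazard form: change sign, divide by SCALE.       */
/* Estimates are copied from Output 4.3.                            */
data weibull_loghaz;
  length variable $8;
  retain scale 0.71240533;             /* SCALE estimate            */
  input variable $ aft_estimate;
  loghaz_estimate = -aft_estimate / scale;
  format loghaz_estimate 7.3;
  datalines;
FIN   0.27216336
AGE   0.0407138
RACE -0.2248024
WEXP  0.10655659
MAR   0.31127326
PARO  0.05882725
PRIO -0.0658169
;
run;

proc print data=weibull_loghaz noobs;
  var variable aft_estimate loghaz_estimate;
run;
```

Only the point estimates are converted here; as noted earlier, the standard errors do not carry over by the same simple rule.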

Output 4.3. Weibull Model Applied to Recidivism Data
L I F E R E G  P R O C E D U R E

Log Likelihood for WEIBULL -319.3765238

Variable  DF   Estimate  Std Err ChiSquare  Pr>Chi Label/Value

INTERCPT   1  3.9901348 0.419095  90.64624  0.0001 Intercept
FIN        1 0.27216336 0.137962  3.891714  0.0485
AGE        1  0.0407138 0.016004  6.472179  0.0110
RACE       1 -0.2248024 0.220159  1.042629  0.3072
WEXP       1 0.10655659 0.151541  0.494425  0.4820
MAR        1 0.31127326 0.273302  1.297171  0.2547
PARO       1 0.05882725 0.139638   0.17748  0.6735
PRIO       1 -0.0658169 0.020941  9.878652  0.0017
SCALE      1 0.71240533 0.063423                 Extreme value scale parameter

Since σ̂ (labeled SCALE in Output 4.3) is between .5 and 1, we conclude that the hazard is increasing at a decreasing rate. We can also calculate α̂ = (1/.7124) – 1 = 0.4037, which is the coefficient for log t in the log-hazard model. Because both the dependent and independent variables are logged, this coefficient can be interpreted as follows: a 1 percent increase in time since release produces a 0.40 percent increase in the hazard for arrest.
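A short DATA step can confirm both the coefficient for log t and the percentage interpretation. This is only a sketch; it relies on the hazard being proportional to t raised to the power α, so multiplying t by 1.01 multiplies the hazard by 1.01 raised to that power.

```sas
/* Sketch: coefficient of log t and the implied percent change in   */
/* the hazard for a 1 percent increase in time.                     */
data _null_;
  scale = 0.71240533;                  /* SCALE estimate, Output 4.3 */
  alpha = 1/scale - 1;                 /* coefficient of log t       */
  pct_change = 100 * (1.01**alpha - 1);
  put alpha= 6.4 pct_change= 6.3;
run;
```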

The Log-Normal Model

Although we have already discussed the log-normal model and applied it to the recidivism data, we have not yet considered the shape of its hazard function. Unlike the Weibull model, the log-normal model has a nonmonotonic hazard function. The hazard is 0 when t=0. It rises to a peak and then declines toward 0 as t goes to infinity. The log-normal is not a proportional hazards model, and its hazard cannot be expressed in closed form (it involves the c.d.f. of a standard normal variable). It can, however, be expressed as a regression model in which the dependent variable is the logarithm of the hazard. Specifically,

log h(t) = log h0(t exp(–βx)) – βx

where h0(.) is the hazard function for an individual with x = 0. This equation also applies to the log-logistic and gamma models to be discussed shortly, except that h0(.) will be different in each case.

Some typical log-normal hazard functions are shown in Figure 4.1. All three functions correspond to distributions with a median of 1.0. When σ is large, the hazard peaks so rapidly that the function is almost indistinguishable from those like the Weibull and log-logistic that may have an infinite hazard when t = 0.
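Curves of this kind can be generated from the normal p.d.f. and c.d.f. The sketch below assumes a median of 1.0 (so that log T has mean 0) and uses three illustrative values of σ, which are not necessarily the ones plotted in the figure. It also assumes a SAS release in which PROC SGPLOT is available.

```sas
/* Sketch: log-normal hazards with median 1.0 (mean of log T = 0).  */
/*   h(t) = pdf(z) / [sigma * t * (1 - cdf(z))], z = log(t)/sigma   */
data lognormal_hazard;
  do sigma = 0.5, 1, 2;                /* illustrative values only  */
    do t = 0.1 to 4 by 0.1;
      z = log(t) / sigma;
      hazard = pdf('NORMAL', z) / (sigma * t * (1 - probnorm(z)));
      output;
    end;
  end;
run;

proc sgplot data=lognormal_hazard;
  series x=t y=hazard / group=sigma;
run;
```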

Figure 4.1. Typical Hazard Functions for a Log-Normal Model


The inverted U-shape of the log-normal hazard is often appropriate for repeatable events. Suppose, for example, that the event of interest is a residential move. Immediately after a move, the hazard of another move is likely to be extremely low. People need to rest and recoup the substantial costs involved in moving. The hazard will certainly rise with time, but much empirical evidence indicates that it eventually begins to decline. One explanation is that, as time goes by, people become increasingly invested in a particular location or community. However, Chapter 8 shows how the declining portion of the hazard function may also be a consequence of unobserved heterogeneity.

The Log-Logistic Model

Another model that allows for an inverted U-shaped hazard is the log-logistic model, which assumes that ε has a logistic distribution with p.d.f.

f(ε) = exp(ε) / [1 + exp(ε)]^2.

A symmetric distribution with a mean of 0, the logistic distribution is quite similar in shape to the normal distribution. The logistic distribution is well known to students of the logistic (logit) regression model, which can be derived by assuming (a) a linear model with a logistically distributed error term and (b) a dichotomization of the dependent variable.

If ε has a logistic distribution, then so does log T (although with a nonzero mean). It follows that T has a log-logistic distribution. The log-logistic hazard function is

h(t) = λγ(λt)^(γ–1) / [1 + (λt)^γ]

where γ = 1/σ and λ = exp{–[β0 + β1x1 + ... + βkxk]}. This produces the characteristic shapes shown in Figure 4.2, all of which correspond to distributions with a median of 1.0.

Figure 4.2. Typical Hazard Functions for the Log-Logistic Model


When σ < 1, the log-logistic hazard is similar to the log-normal hazard: it starts at 0, rises to a peak, and then declines toward 0. When σ > 1, the hazard behaves like the decreasing Weibull hazard: it starts at infinity and declines toward 0. When σ = 1, the hazard has a value of λ at t=0, and then declines toward 0 as t goes to infinity.
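These three cases can be checked numerically. The sketch below evaluates the log-logistic hazard, h(t) = λγ(λt)^(γ–1)/[1 + (λt)^γ] with γ = 1/σ, for one illustrative σ in each range. Setting λ = 1 gives a median of 1.0. The variable name GAM stands in for γ, and PROC SGPLOT is assumed to be available.

```sas
/* Sketch: log-logistic hazards for sigma below, at, and above 1.   */
data loglogistic_hazard;
  lambda = 1;                          /* median of 1.0             */
  do sigma = 0.5, 1, 2;                /* illustrative values only  */
    gam = 1/sigma;
    do t = 0.1 to 4 by 0.1;
      hazard = lambda*gam*(lambda*t)**(gam - 1)
               / (1 + (lambda*t)**gam);
      output;
    end;
  end;
run;

proc sgplot data=loglogistic_hazard;
  series x=t y=hazard / group=sigma;
run;
```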

The log-logistic model has a rather simple survivor function,

S(t) = 1 / [1 + (λt)^γ].

As before, γ = 1/σ and λ = exp{–[β0 + β1x1 + ... + βkxk]}. A little algebra shows that this can be written as

log{S(t)/[1 – S(t)]} = β0• + β1•x1 + ... + βk•xk – γ log t

where βj• = βj/σ for j = 1, ..., k. This is nothing more than a logistic (logit) regression model, which means that we can estimate the log-logistic model for the recidivism data by fitting a logit model to the dichotomy arrested versus not arrested in the first year after release. (Because t is a constant 52 weeks, the term γ log t gets absorbed into the intercept.) Of course, this estimation method is not fully efficient because we are not using the information on the exact timing of the arrests, and we certainly will not get the same estimates. The point is that the two apparently different methods are actually estimating the same underlying model.
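The logit fit described above could be sketched as follows. It assumes that ARREST is coded 1 for those arrested during the year and 0 otherwise, and that the SAS release supports the EVENT= response option in PROC LOGISTIC.

```sas
/* Sketch: logit model for surviving the first year without arrest. */
/* EVENT='0' makes ARREST = 0 the modeled outcome, so the           */
/* coefficients refer to the log-odds of survival to week 52.       */
proc logistic data=recid;
  model arrest(event='0') = fin age race wexp mar paro prio;
run;
```

With survival as the modeled outcome, the slopes are on the same scale as the PROC LIFEREG log-logistic coefficients divided by σ, so they should share the same signs.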

Because S(t) is the probability of surviving to time t, S(t)/[1–S(t)] is the odds of surviving to time t. Thus, we have a model that is linear in the log-odds. Equivalently, we can say that the log-logistic model is a member of a general class of models called proportional odds models. This class can be defined in the following way. Let Si(t) and Sj(t) be the survivor functions for any two individuals i and j. The proportional odds model says that

Si(t)/[1 – Si(t)] = øij Sj(t)/[1 – Sj(t)]

where øij is some constant that is specific to the pair (i, j). In fact, the log-logistic model is the only model that is both a proportional odds model and an AFT model.
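A quick numerical check makes the proportional odds property concrete. The sketch below uses the log-logistic survivor function S(t) = 1/[1 + (λt)^γ] for two hypothetical individuals who share the same γ but have different values of λ. All of the numbers are invented for illustration.

```sas
/* Sketch: the ratio of survival odds for two individuals is the    */
/* same at every t under the log-logistic model.                    */
data prop_odds;
  gam      = 1/0.5;                    /* hypothetical sigma = 0.5  */
  lambda_i = 0.8;                      /* hypothetical individual i */
  lambda_j = 1.6;                      /* hypothetical individual j */
  do t = 0.5 to 3 by 0.5;
    s_i = 1 / (1 + (lambda_i*t)**gam);
    s_j = 1 / (1 + (lambda_j*t)**gam);
    odds_ratio = (s_i/(1 - s_i)) / (s_j/(1 - s_j));
    output;
  end;
run;

proc print data=prop_odds noobs;
  var t s_i s_j odds_ratio;
run;
```

Algebraically the ratio is (λj/λi)^γ, which does not involve t, so the ODDS_RATIO column should be constant apart from rounding.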

To fit the log-logistic model with PROC LIFEREG, you specify DIST=LLOGISTIC as an option in the MODEL statement. Output 4.4 shows the results for the recidivism data. The first thing to notice is that the estimate of σ (labeled SCALE) is less than 1.0, implying that the estimated hazard function follows the inverted U-shaped form shown in Figure 4.2. Given what I just said about the similarity of the log-normal and log-logistic hazards, you might expect the other results to be most similar to the log-normal output in Output 4.1. But the coefficients and test statistics actually appear to be closer to those for the Weibull model in Output 4.3.

Output 4.4. Log-Logistic Model Applied to Recidivism Data
L I F E R E G  P R O C E D U R E

Log Likelihood for LLOGISTC -319.3983709

Variable  DF   Estimate  Std Err ChiSquare  Pr>Chi Label/Value

INTERCPT   1 3.91830423 0.427434  84.03459  0.0001 Intercept
FIN        1 0.28887608 0.145585  3.937211  0.0472
AGE        1 0.03636558 0.015572  5.453767  0.0195
RACE       1 -0.2791492 0.229654  1.477496  0.2242
WEXP       1 0.17842379 0.157185  1.288494  0.2563
MAR        1 0.34730388 0.269669  1.658665  0.1978
PARO       1 0.05079816 0.149574  0.115341  0.7341
PRIO       1 -0.0691822 0.022742  9.254181  0.0023
SCALE      1 0.64713465 0.055923                   Logistic scale parameter

The Gamma Model

Survival analysis literature discusses two different gamma models: the standard (2-parameter) model and the generalized (3-parameter) model. PROC LIFEREG fits the generalized model. Because the generalized gamma model has one more parameter than any of the other models we have considered, its hazard function can take on a wide variety of shapes. In particular, the exponential, Weibull, standard gamma, and log-normal models (but not the log-logistic) are all special cases of the generalized gamma model. This fact is exploited later in this chapter when we consider likelihood ratio tests for comparing the different models. But the generalized gamma model can also take on shapes that are unlike any of these special cases. Most important, it can have hazard functions with U or bathtub shapes in which the hazard declines, reaches a minimum, and then increases. It is well known that the hazard for human mortality, considered over the whole life span, has such a shape. On the other hand, the generalized gamma model cannot represent hazard functions that have more than one peak.

Given the richness of the generalized gamma model, why not always use it instead of the other models? There are two reasons. First, the formula for the hazard function for the generalized gamma model is rather complicated, involving the gamma function and the incomplete gamma function. Consequently, you may often find it difficult to judge the shape of the hazard function from the estimated parameters. By contrast, hazard functions for the specific submodels can be rather simply described, as we have already seen. Second, computation for the generalized gamma model is considerably more difficult. For example, it took more than five times as much computer time to estimate the generalized gamma model for the recidivism data as compared with the exponential model. This fact can be an important consideration when you are working with large data sets. The generalized gamma model also has a reputation for convergence problems, although the parameterization and numerical algorithms used by PROC LIFEREG seem to have reduced these to a minimum.

To fit the generalized gamma model with PROC LIFEREG, you specify DIST=GAMMA as an option in the MODEL statement. Output 4.5 shows the results from fitting this model to the recidivism data. As usual, the SCALE parameter is the estimate of σ in equation (4.2). The estimate labeled SHAPE is the additional shape parameter that is denoted by δ in the PROC LIFEREG documentation. (In the output for earlier releases of PROC LIFEREG, this parameter is labeled GAMMA). When the shape parameter is 0, we get the log-normal distribution. When it is 1.0, we have the Weibull distribution. And when the shape parameter and the scale parameter are equal, we have the standard gamma distribution. In Output 4.5, the shape estimate is almost exactly 1.0, so we are very close to the Weibull distribution. The shape and scale parameters are also similar, so the standard gamma model is also quite plausible for these data. Later, we’ll make these comparisons more rigorous.

Output 4.5. Generalized Gamma Model Applied to the Recidivism Data
   L I F E R E G  P R O C E D U R E

Log Likelihood for GAMMA -319.3764549

Variable  DF   Estimate  Std Err ChiSquare  Pr>Chi Label/Value

INTERCPT   1 3.99149155 0.434907  84.23224  0.0001 Intercept
FIN        1 0.27244327 0.140118  3.780636  0.0518
AGE        1 0.04066435 0.016546  6.039829  0.0140
RACE       1 -0.2254894 0.227983  0.978241  0.3226
WEXP       1 0.10734828  0.16595  0.418444  0.5177
MAR        1 0.31179029 0.276896  1.267921  0.2602
PARO       1 0.05878877 0.139813  0.176805  0.6741
PRIO       1 -0.0658616 0.021303  9.558375  0.0020
SCALE      1 0.71511922 0.239598                   Gamma scale parameter
SHAPE      1 0.99429204 0.484882                   Gamma shape parameter

As for the standard gamma model, there is no direct way of fitting this in PROC LIFEREG. Ideally, you fit the generalized model while imposing the constraint SCALE=SHAPE, but PROC LIFEREG doesn’t handle equality constraints. PROC LIFEREG does allow you to fix both the scale and shape parameters at specific values. So if you’re desperate to fit the standard gamma model, you can try out a bunch of different values (for example, with a line search) until you find the common value for the shape and scale parameters that maximizes the log-likelihood. When I did this for the recidivism data, I ended up with scale and shape parameters equal to .811. Without going into the details of the search, the code to fit the final model is as follows:

proc lifereg data=recid;
  /* NOSHAPE1 and NOSCALE hold the shape and scale fixed at the values given */
  model week*arrest(0)=fin age race wexp
        mar paro prio /dist=gamma noshape1 shape1=.811
        noscale scale=.811;
run;

Output 4.6 shows the results. The coefficients show little change from Output 4.5, which is not surprising since the log-likelihood hardly changes at all. The standard errors are probably slight underestimates because they do not take account of the sampling variation in the SCALE-SHAPE estimate.

Output 4.6. Standard Gamma Model Applied to the Recidivism Data
                   L I F E R E G  P R O C E D U R E
Log Likelihood for GAMMA -319.4636775

 Variable  DF   Estimate  Std Err ChiSquare  Pr>Chi Label/Value

 INTERCPT   1 4.03969532 0.426895  89.54781  0.0001 Intercept
 FIN        1 0.28319316 0.141064  4.030268  0.0447
 AGE        1 0.03902397 0.015661  6.209049  0.0127
 RACE       1 -0.2498191 0.227472  1.206139  0.2721
 WEXP       1 0.13482081  0.15585  0.748343  0.3870
 MAR        1 0.33229455 0.275574  1.454021  0.2279
 PARO       1  0.0579843 0.145018  0.159874  0.6893
 PRIO       1 -0.0673115 0.021468  9.830681  0.0017
 SCALE      0      0.811        0                   Gamma scale parameter
 SHAPE      0      0.811        0                   Gamma shape parameter
    Lagrange Multiplier ChiSquare for Scale 0.002124 Pr>Chi is 0.9632.
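The search over candidate values mentioned above can be automated with a small macro. What follows is only a sketch of one way to do it, not the procedure actually used to arrive at .811. The macro name GAMMAGRID and its parameters LO and HI (bounds in hundredths) are hypothetical. Each pass fits the model with scale and shape fixed at the same candidate value; you then compare the "Log Likelihood for GAMMA" lines and keep the value that gives the largest one.

```sas
/* Sketch: grid search for the common scale = shape value.          */
/* LO and HI are hypothetical bounds in hundredths (70 means 0.70). */
%macro gammagrid(lo=70, hi=90);
  %local i v;
  %do i = &lo %to &hi;
    %let v = %sysevalf(&i / 100);
    title "Standard gamma candidate: scale = shape = &v";
    proc lifereg data=recid;
      model week*arrest(0) = fin age race wexp mar paro prio
            / dist=gamma noshape1 shape1=&v noscale scale=&v;
    run;
  %end;
  title;
%mend gammagrid;

%gammagrid(lo=70, hi=90)
```

A coarse grid followed by a finer one around the best value keeps the number of fits manageable.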

What is the interpretation of the standard gamma model? Under this model, the survival time T has the gamma p.d.f.

f(t) = λ(λt)^(K–1) exp(–λt) / Γ(K)

where K = 1/δ^2 (δ is the shape parameter reported by LIFEREG). The λ parameter is a function of the covariates: λ = exp{–[β0 + β1x1 + ... + βkxk]}. Γ(.) is the gamma function, a well-known function in mathematics that is defined by an integral. When K is an integer, Γ(K) = (K – 1)! and, in particular, Γ(1) = 0! = 1.

Both the survivor and hazard functions of the standard gamma distribution are awkward because they involve incomplete gamma functions, which are expressed as integrals. When K > 1, the hazard is 0 at time 0, and increases thereafter. When 0 < K < 1, the hazard is infinite at time 0 and decreases thereafter. When K = 1, the hazard is constant, and we are back to the exponential model. So far, this is similar to the Weibull hazard. The big difference is that the increasing Weibull hazard increases without limit, while the standard gamma hazard approaches λ as an upper limit. (Clearly, if there is an upper limit, the hazard cannot increase at an increasing rate.) Similarly, while the decreasing Weibull hazard approaches 0 as a lower limit, the decreasing gamma hazard has a lower limit of λ. Figure 4.3 shows examples of such hazards with λ = 1. Since λ is a log-linear function of the covariates, we can think of the covariates as raising or lowering the boundary of the hazard. The K parameter determines how quickly the limit is approached and whether it approaches from above or below. For the model in Output 4.6, we can get the ML estimate of K by taking 1/(.811)^2 = 1.52. Since this is greater than 1, we have evidence for an increasing hazard.
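Although the hazard has no simple closed form, it is easy to evaluate numerically as the ratio of the density to the survivor function. The sketch below computes K from the fixed value in Output 4.6 and traces the hazard with λ = 1, a value chosen only for illustration. It uses the SAS PDF and CDF functions for the gamma distribution, which take the shape parameter first and then the scale (here 1/λ).

```sas
/* Sketch: standard gamma hazard h(t) = f(t)/S(t), with K taken     */
/* from the common scale = shape value in Output 4.6.               */
data gamma_hazard;
  delta  = 0.811;                      /* scale = shape, Output 4.6 */
  k      = 1 / delta**2;               /* about 1.52                */
  lambda = 1;                          /* hypothetical limit value  */
  do t = 0.25 to 10 by 0.25;
    f = pdf('GAMMA', t, k, 1/lambda);
    s = 1 - cdf('GAMMA', t, k, 1/lambda);
    hazard = f / s;
    output;
  end;
run;

proc print data=gamma_hazard noobs;
  var t hazard;
run;
```

Because K exceeds 1, the HAZARD column should rise with t while staying below λ.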

Figure 4.3. Typical Hazard Functions for the Standard Gamma Model

