Describing Survival Distributions

All of the standard approaches to survival analysis are probabilistic or stochastic. That is, the times at which events occur are assumed to be realizations of some random process. It follows that T, the event time for some particular individual, is a random variable having a probability distribution. There are many different models for survival data, and what often distinguishes one model from another is the probability distribution for T. Before looking at these different models, you need to understand the three different ways of describing probability distributions.

Cumulative Distribution Function

One way that works for all random variables is the cumulative distribution function, or c.d.f. The c.d.f. of a variable T, denoted by F(t), is a function that tells us the probability that the variable will be less than or equal to any value t that we choose. Thus, F(t) = Pr{T ≤ t}. If we know the value of F for every value of t, then we know all there is to know about the distribution of T. In survival analysis, it is more common to work with a closely related function called the survivor function, defined as S(t) = Pr{T > t} = 1 – F(t). If the event of interest is a death, the survivor function gives the probability of surviving beyond t. Because S is a probability, we know that it is bounded by 0 and 1. And because T cannot be negative, we know that S(0) = 1. Finally, as t gets larger, S never increases (and usually decreases). Within these restrictions, S can have a wide variety of shapes.
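As a small illustration of these properties, the sketch below evaluates F(t) and S(t) for an exponential distribution. The exponential form and the rate of 0.5 are assumptions chosen only for the example; they do not come from the text.

```python
import math

def cdf(t, rate=0.5):
    """F(t) = Pr{T <= t} for an exponential distribution (assumed for illustration)."""
    return 1.0 - math.exp(-rate * t) if t >= 0 else 0.0

def survivor(t, rate=0.5):
    """S(t) = Pr{T > t} = 1 - F(t)."""
    return 1.0 - cdf(t, rate)

times = [0, 1, 2, 5, 10]
surv = [survivor(t) for t in times]

# Print F and S side by side; S starts at 1 and never increases.
for t, s in zip(times, surv):
    print(f"t={t:>2}  F(t)={cdf(t):.4f}  S(t)={s:.4f}")
```

Any other distribution on the nonnegative numbers could be substituted for the exponential; the bounds and the monotonicity of S would still hold.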

Chapter 3, “Estimating and Comparing Survival Curves with PROC LIFETEST,” explains how to estimate survivor functions using life-table and Kaplan-Meier methods. Often, the objective is to compare survivor functions for different subgroups in a sample. If the survivor function for one group is always higher than the survivor function for another group, then the first group clearly lives longer than the second group. If survivor functions cross, however, the situation is more ambiguous.
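To see why crossing survivor functions are ambiguous, consider a sketch with two hypothetical groups, A and B. Both functional forms are assumptions made up for this example: group A has S(t) = exp(−t) and group B has S(t) = exp(−t²).

```python
import math

def survivor_a(t):
    """Hypothetical group A: S(t) = exp(-t)."""
    return math.exp(-t)

def survivor_b(t):
    """Hypothetical group B: S(t) = exp(-t**2)."""
    return math.exp(-t ** 2)

# Report which group has the higher survival probability at each time.
for t in [0.25, 0.5, 1.0, 1.5, 2.0]:
    a, b = survivor_a(t), survivor_b(t)
    if a > b:
        higher = "A"
    elif b > a:
        higher = "B"
    else:
        higher = "neither"
    print(f"t={t:<4}  S_A={a:.4f}  S_B={b:.4f}  higher: {higher}")
```

In this example group B does better before t = 1 and group A does better afterward, so neither group can be said to live longer without further qualification.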

Probability Density Function

When variables are continuous, another common way of describing their probability distributions is the probability density function, or p.d.f. This function is defined as

Equation 2.1

$$f(t) = \frac{dF(t)}{dt} = -\frac{dS(t)}{dt}$$

That is, the p.d.f. is just the derivative or slope of the c.d.f. Although this definition is considerably less intuitive than that for the c.d.f., it is the p.d.f. that most directly corresponds to our intuitive notions of distributional shape. For example, the familiar bell-shaped curve that is associated with the normal distribution is given by its p.d.f., not its c.d.f.
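The statement that the p.d.f. is the slope of the c.d.f. can be checked numerically. The sketch below again assumes an exponential distribution with a rate of 0.5 (an illustrative choice, not part of the text) and compares a finite-difference slope of F with the closed-form density.

```python
import math

RATE = 0.5  # illustrative assumption

def cdf(t):
    """F(t) for the assumed exponential distribution."""
    return 1.0 - math.exp(-RATE * t)

def pdf(t):
    """Closed-form density f(t) = RATE * exp(-RATE * t)."""
    return RATE * math.exp(-RATE * t)

def slope_of_cdf(t, eps=1e-6):
    """Central-difference approximation to dF/dt at t (use t > eps)."""
    return (cdf(t + eps) - cdf(t - eps)) / (2.0 * eps)

for t in [0.5, 1.0, 2.0, 4.0]:
    print(f"t={t}  slope of F={slope_of_cdf(t):.6f}  f(t)={pdf(t):.6f}")
```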

Hazard Function

For continuous survival data, the hazard function is actually more popular than the p.d.f. as a way of describing distributions. The hazard function is defined as

Equation 2.2

$$h(t) = \lim_{\Delta t \to 0} \frac{\Pr\{t \le T < t + \Delta t \mid T \ge t\}}{\Delta t}$$

Instead of h(t), some authors denote the hazard by λ(t) or r(t). Because the hazard function is so central to survival analysis, it is worth taking some time to explain this definition. The aim of the definition is to quantify the instantaneous risk that an event will occur at time t. Since time is continuous, the probability that an event will occur at exactly time t is necessarily 0. But we can talk about the probability that an event occurs in the small interval between t and t + Δt. We also want to make this probability conditional on the individual surviving to time t. Why? Because if individuals have already died (that is, experienced the event), they are clearly no longer at risk of the event. Thus, we only want to consider those individuals who have made it to the beginning of the interval [t, t + Δt). These considerations point to the numerator in equation (2.2): Pr{t ≤ T < t + Δt | T ≥ t}.

The numerator is still not quite what we want, however. First, the probability is a nondecreasing function of Δt—the longer the interval, the more likely it is that an event will occur in that interval. To adjust for this, we divide by Δt, as in equation (2.2). Second, we want the risk for event occurrence at exactly time t, not in some interval beginning with t. So we shrink the interval down by letting Δt get smaller and smaller, until it reaches a limiting value.
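The limiting process can be watched numerically. The sketch below assumes a survivor function S(t) = exp(−t²), chosen only for illustration; its hazard works out to h(t) = 2t. For continuous T, Pr{T ≥ t} equals S(t), so the conditional probability in the numerator can be written as [S(t) − S(t + Δt)] / S(t).

```python
import math

def survivor(t):
    """Assumed survivor function S(t) = exp(-t**2); its hazard is h(t) = 2t."""
    return math.exp(-t ** 2)

def interval_risk_per_unit_time(t, dt):
    """Pr{t <= T < t + dt | T >= t} / dt, expressed through S."""
    conditional_prob = (survivor(t) - survivor(t + dt)) / survivor(t)
    return conditional_prob / dt

t = 1.5
# Shrink the interval and watch the ratio settle toward the hazard at t.
for dt in [1.0, 0.1, 0.01, 0.001, 0.0001]:
    print(f"dt={dt:<7} ratio={interval_risk_per_unit_time(t, dt):.5f}")
print(f"hazard at t={t}: {2 * t:.5f}")
```

As Δt shrinks, the ratio approaches the value of the hazard at t, which is the sense in which the hazard measures instantaneous risk.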

The definition of the hazard function in equation (2.2) is similar to an alternative definition of the p.d.f.:

Equation 2.3

$$f(t) = \lim_{\Delta t \to 0} \frac{\Pr\{t \le T < t + \Delta t\}}{\Delta t}$$

The only difference is that the probability in the numerator of equation (2.3) is an unconditional probability, whereas the probability in equation (2.2) is conditional on T ≥ t. For this reason, the hazard function is sometimes described as a conditional density. When events are repeatable, the hazard function is often referred to as the intensity function.
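The "conditional density" description can be made concrete with a short derivation. It is a sketch that assumes T is continuous, so that Pr{T ≥ t} = S(t), and it uses the fact that the event {t ≤ T < t + Δt} is contained in the event {T ≥ t}:

```latex
\begin{aligned}
h(t) &= \lim_{\Delta t \to 0}
        \frac{\Pr\{t \le T < t + \Delta t \mid T \ge t\}}{\Delta t} \\
     &= \lim_{\Delta t \to 0}
        \frac{\Pr\{t \le T < t + \Delta t\}}{\Delta t \, \Pr\{T \ge t\}} \\
     &= \frac{f(t)}{S(t)}
\end{aligned}
```

In other words, conditioning on survival to t rescales the density by the probability of having survived that long.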

The survivor function, the probability density function, and the hazard function are equivalent ways of describing a continuous probability distribution. Given any one of them, we can recover the other two. The relationship between the p.d.f. and the survivor function is given directly by the definition in equation (2.1). Another simple formula expresses the hazard in terms of the p.d.f. and the survivor function:

Equation 2.4

$$h(t) = \frac{f(t)}{S(t)}$$

Together, equations (2.4) and (2.1) imply that

Equation 2.5

$$h(t) = -\frac{d}{dt} \log S(t)$$

Integrating both sides of equation (2.5) gives an expression for the survivor function in terms of the hazard function:

Equation 2.6

$$S(t) = \exp\left\{-\int_0^t h(u)\,du\right\}$$

Together with equation (2.4), this formula leads to

Equation 2.7

$$f(t) = h(t) \exp\left\{-\int_0^t h(u)\,du\right\}$$

These formulas are extremely useful in any mathematical treatment of models for survival analysis because it is often necessary to move from one representation to another.
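Moving between representations can also be done numerically. The sketch below starts from an assumed hazard h(t) = 2t (an illustrative choice, not from the text), integrates it with the trapezoidal rule to obtain the survivor function, and multiplies by the hazard to obtain the density. For this hazard the closed forms are S(t) = exp(−t²) and f(t) = 2t·exp(−t²), which gives something to compare against.

```python
import math

def hazard(t):
    """Assumed hazard function h(t) = 2t."""
    return 2.0 * t

def cumulative_hazard(t, steps=1000):
    """Trapezoidal approximation to the integral of hazard(u) from 0 to t."""
    width = t / steps
    total = 0.5 * (hazard(0.0) + hazard(t))
    for i in range(1, steps):
        total += hazard(i * width)
    return total * width

def survivor_from_hazard(t):
    """S(t) = exp(-integral of h from 0 to t)."""
    return math.exp(-cumulative_hazard(t))

def density_from_hazard(t):
    """f(t) = h(t) * S(t)."""
    return hazard(t) * survivor_from_hazard(t)

for t in [0.5, 1.0, 1.5]:
    print(f"t={t}  S from hazard={survivor_from_hazard(t):.6f}  "
          f"closed form={math.exp(-t ** 2):.6f}  "
          f"f from hazard={density_from_hazard(t):.6f}")
```

The trapezoidal rule is only one possible choice of numerical integration; it happens to be essentially exact here because the assumed hazard is linear.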
