Why Use Survival Analysis?

Survival data have two common features that are difficult to handle with conventional statistical methods: censoring and time-dependent covariates (sometimes called time-varying explanatory variables). Consider the following example, which illustrates both these problems. A sample of 432 inmates released from Maryland state prisons was followed for one year after release (Rossi et al. 1980). The event of interest was the first arrest. The aim was to determine how the occurrence and timing of arrests depended on several covariates (predictor variables). Some of these covariates (like race, age at release, and number of previous convictions) remained constant over the one-year interval. Others (like marital status and employment status) could change at any time during the follow-up period.

How do you analyze such data using conventional methods? One possibility is to perform a logit (logistic regression) analysis with a dichotomous dependent variable: arrested or not arrested. But this analysis ignores information on the timing of arrests. It’s natural to suppose that people who are arrested one week after release have, on average, a higher propensity to be arrested than those who are not arrested until the 52nd week. At the least, ignoring that information should reduce the precision of the estimates.

One solution to this problem is to make the dependent variable the length of time between release and first arrest and then estimate a conventional linear regression model. But what do you do with the persons who were not arrested during the one-year follow-up? Such cases are referred to as censored. A couple of obvious ad hoc methods exist for dealing with censored cases, but neither works well. One method is to discard the censored cases. That might be acceptable if the proportion of censored cases is small. In our recidivism example, however, fully 75 percent of the released inmates were not arrested during the first year after release. That’s a lot of data to discard, and it has been shown that large biases may result. Alternatively, you could set the time of arrest at one year for all those who were not arrested. That’s clearly an underestimate, however, because any later arrest would occur after one year, and some of those ex-convicts may never be arrested. Again, large biases may occur.
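
The bias from both ad hoc methods can be seen in a small simulation. The sketch below is illustrative only. It assumes that time to first arrest follows an exponential distribution, with a rate chosen so that about 75 percent of cases remain unarrested at 52 weeks. The distributional assumption, the function names, and the random seed are not taken from the recidivism study.

```python
import math
import random

FOLLOW_UP = 52.0  # weeks of follow-up, as in the recidivism example
# Assumed rate: chosen so that P(no arrest by week 52) = 0.75.
RATE = -math.log(0.75) / FOLLOW_UP
TRUE_MEAN = 1.0 / RATE  # mean of an exponential distribution


def simulate(n, seed=0):
    """Return (observed_time, arrested) pairs, censored at FOLLOW_UP."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        t = rng.expovariate(RATE)
        if t <= FOLLOW_UP:
            data.append((t, True))           # arrest observed
        else:
            data.append((FOLLOW_UP, False))  # censored at one year
    return data


def mean_discarding_censored(data):
    """Ad hoc method 1: average only the observed arrest times."""
    times = [t for t, arrested in data if arrested]
    return sum(times) / len(times)


def mean_capping_censored(data):
    """Ad hoc method 2: treat censored cases as arrested at week 52."""
    return sum(t for t, _ in data) / len(data)


data = simulate(432)
print(f"mean under the assumed model: {TRUE_MEAN:.1f} weeks")
print(f"discarding censored cases:    {mean_discarding_censored(data):.1f} weeks")
print(f"setting censored to 52 weeks: {mean_capping_censored(data):.1f} weeks")
```

Neither ad hoc average can exceed 52 weeks, whereas the mean time to arrest under the assumed model is roughly 181 weeks, so both methods badly understate it.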

Whichever method you use, it’s not at all clear how a time-dependent variable like employment status can be appropriately incorporated into either the logit model for the occurrence of arrests or the linear model for the timing of arrests. The data set contains information on whether each person was working full time during each of the 52 weeks of follow-up. You could, I suppose, estimate a model with 52 indicator (dummy) variables for employment status. Aside from the computational awkwardness and statistical inefficiency of such a procedure, there is a more fundamental problem: the employment indicators for weeks after an arrest might be consequences of the arrest rather than causes. In particular, someone who is jailed after an arrest is not likely to be working full time in subsequent weeks. In short, conventional methods don’t offer much hope for dealing with either censoring or time-dependent covariates.
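
To make the problem with post-arrest employment indicators concrete, here is a minimal sketch that keeps only the weeks up to and including the arrest week for one person. The function name, the 0/1 coding, and the example record are hypothetical. Whether to keep the arrest week itself is a judgment call that this sketch does not settle.

```python
def employment_before_arrest(weekly_employment, arrest_week=None):
    """Keep employment indicators only for weeks up to the arrest.

    weekly_employment: list of 0/1 values, one per week of follow-up.
    arrest_week: 1-based week of first arrest, or None if never arrested.
    """
    if arrest_week is None:
        return list(weekly_employment)  # censored case: all weeks are usable
    return list(weekly_employment[:arrest_week])


# Hypothetical person: working in weeks 1-10, arrested in week 11,
# then not working (for example, jailed) for the rest of the year.
record = [1] * 10 + [0] * 42
usable = employment_before_arrest(record, arrest_week=11)
print(len(usable), "weeks retained;", sum(usable), "of them employed")
```

The 41 zeros after week 11 say nothing about why this person was arrested, so a model that used all 52 indicators would wrongly count them as evidence that unemployment leads to arrest.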

By contrast, all methods of survival analysis allow for censoring, and many also allow for time-dependent covariates. In the case of censoring, the trick is to devise a procedure that combines the information in the censored and uncensored cases in a way that produces consistent estimates of the parameters of interest. You can easily accomplish this by the method of maximum likelihood or its close cousin, partial likelihood. Time-dependent covariates can also be incorporated with these likelihood-based methods. Later chapters explain how you can usefully apply these methods to the recidivism data.
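
As one concrete instance of how maximum likelihood combines censored and uncensored cases, assume (purely for illustration) that time to arrest is exponential with constant hazard λ. Each arrested case contributes its density, λ·exp(-λt), to the likelihood, and each censored case contributes its survival probability, exp(-λt). The log-likelihood is then d·log(λ) - λ·Σt, which is maximized at λ = d / Σt, where d is the number of arrests and Σt is the total observed time over all cases. The sketch below applies this estimator to simulated data. The exponential model, the function names, and the simulation settings are assumptions, not the methods developed in later chapters.

```python
import math
import random


def exponential_mle(data):
    """Maximum likelihood estimate of a constant hazard.

    data: (observed_time, event) pairs; event is False for censored cases.
    Censored cases add to the total time at risk but not to the event count.
    """
    events = sum(1 for _, event in data if event)
    total_time = sum(t for t, _ in data)
    return events / total_time


def simulate_censored(n, rate, follow_up, seed=0):
    """Draw exponential event times and censor them at follow_up."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        t = rng.expovariate(rate)
        data.append((t, True) if t <= follow_up else (follow_up, False))
    return data


# Assumed hazard: chosen so that 75 percent are unarrested at week 52.
true_rate = -math.log(0.75) / 52.0
data = simulate_censored(20000, true_rate, 52.0)
print(f"hazard used in simulation: {true_rate:.5f}")
print(f"censored-data MLE:         {exponential_mle(data):.5f}")
```

With a large simulated sample, the estimate should land close to the hazard used to generate the data, even though about three-quarters of the cases are censored.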
