How to Choose a Method

In this book we’ve examined many different approaches to the analysis of survival data: Kaplan-Meier estimation, log-rank tests, accelerated failure time models, piecewise exponential models, Cox regression models, logit models, and complementary log-log models. Each of these methods is worth using in some situations. Along the way I have tried to point out their relative advantages and disadvantages, but those discussions are scattered throughout the book. It’s not easy to keep all the points in mind when designing a study or planning an analysis.

Many readers of early versions of this book urged me to provide a concluding road map for choosing a method of survival analysis. Although I give this kind of advice all the time, I do so here with some hesitation. While statisticians may have great consensus about the characteristics of various statistical methods, the choice among competing methods is often very personal, especially when dealing with methods as similar in spirit and results as those presented here. Five equally knowledgeable consultants could easily give you five different recommendations. I’m going to present some rules of thumb that I rely on myself in giving advice, but please don’t take them as authoritative pronouncements. Use them only to the degree that you find their rationales persuasive.

Make Cox Regression Your Default Method

Given the relative length of Chapter 5, “Estimating Cox Regression Models with PROC PHREG,” it will come as no surprise that I have a strong preference for Cox regression via PROC PHREG. This particular method

  • is more robust than the accelerated failure time methods

  • has excellent capabilities for time-dependent covariates

  • handles both continuous-time and discrete-time data

  • allows for late entry and temporary exit from the risk set

  • has a facility for nonparametric adjustment of nuisance variables (stratification).

PROC PHREG can also do log-rank tests (using a single dichotomous covariate). With Release 6.10 (and later), PROC PHREG will even do Kaplan-Meier estimation (by fitting a model with no covariates and using the BASELINE statement to produce a table of survival probabilities).
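As a minimal sketch of both uses (the data set MYDATA and the variables DURAT, STATUS, and TREAT are hypothetical, with STATUS = 0 indicating a censored observation):

   PROC PHREG DATA=MYDATA;
      MODEL DURAT*STATUS(0)=TREAT;        /* score test for TREAT reproduces the log-rank test */
   RUN;

   PROC PHREG DATA=MYDATA;
      MODEL DURAT*STATUS(0)=;             /* null model with no covariates */
      BASELINE OUT=SURVEST SURVIVAL=S;    /* table of survival probability estimates */
   RUN;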

Beyond these intrinsic advantages, Cox regression has the considerable attraction of being widely used, accepted, and understood. What’s the point of choosing a marginally superior method if your audience is confused or skeptical and you have to waste valuable time and space with explanation and justification? For better or worse (mostly for better), Cox regression has become the standard, and it makes little sense to choose a different method unless you have a good reason for doing so. Having said that, I’ll now suggest some possible good reasons. They are presented in the form of questions that you should ask yourself when deciding on a method.

Do You Have Many Time-Dependent Covariates?

Although PROC PHREG does a good job of handling time-dependent covariates, the programming necessary for creating and manipulating those covariates can often be complex and confusing. Furthermore, the complete code for creating the covariates must be a part of every PROC PHREG step. While this may be tolerable for a single time-dependent covariate, the task can easily become overwhelming when there are 15 or 20 such covariates. This is not uncommon in the social sciences, where panel surveys with large numbers of questions are conducted at regular intervals.

In such situations, you may want to consider using one of the methods that produces multiple observations per individual: the piecewise exponential model described in Chapter 4, “Estimating Parametric Regression Models with PROC LIFEREG,” or the logit and complementary log-log models of Chapter 7, “Analysis of Tied or Discrete Data with the LOGISTIC, PROBIT, and GENMOD Procedures.” With these methods, you need one DATA step to produce the multiple records, with all of the time-dependent covariates assigned the appropriate values for each record. Once the new data set is constructed, you can proceed to estimate as many models as you like with no additional data manipulation. In these models, the time-dependent covariates are treated just like the fixed covariates. This approach is particularly well suited to situations where an expert programmer is available to produce the data set, which is then passed to an analyst. The analyst can then fit relatively simple models without worrying about the complexities of the time-dependent covariates.
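As a rough sketch of that single DATA step, suppose a hypothetical input data set PERSONS contains an integer duration DUR measured in years (at most 10), an event indicator EVENT (1 if the event occurred in year DUR, 0 if censored), and annual income measurements INC1-INC10 serving as the time-dependent covariate:

   DATA PERSYRS;
      SET PERSONS;
      ARRAY INC (10) INC1-INC10;
      DO YEAR=1 TO DUR;
         INCOME=INC(YEAR);                         /* value of the time-dependent covariate in this year */
         TARGET=0;
         IF YEAR=DUR AND EVENT=1 THEN TARGET=1;    /* dependent variable: did the event occur in this year? */
         OUTPUT;
      END;
   RUN;

The resulting PERSYRS data set has one record per person-year, and INCOME can then be treated exactly like a fixed covariate.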

If you choose to go this route, then you must decide which of the three alternative methods is most appropriate. If the event times are measured with considerable precision, then the piecewise exponential method has the edge—there is no loss of precision in the estimation process. On the other hand, when event times are measured coarsely, the piecewise exponential method is inappropriate. In those cases, you should use the methods of Chapter 7. As we saw there, the complementary log-log model is preferable when there is an underlying continuous-time process, while the logit model is appropriate when the event times are truly discrete.
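With a person-period data set like the hypothetical PERSYRS sketched above, the choice between the logit and complementary log-log models amounts to nothing more than a choice of link function, for example in PROC LOGISTIC (the CLASS statement, available in later releases, supplies the indicator variables for the year-specific intercepts):

   PROC LOGISTIC DATA=PERSYRS DESCENDING;
      CLASS YEAR / PARAM=REF;
      MODEL TARGET = YEAR INCOME / LINK=CLOGLOG;   /* use LINK=LOGIT for truly discrete event times */
   RUN;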

Is the Sample Large with Heavily Tied Event Times?

As we saw in Chapter 5, PROC PHREG can deal with situations in which many events occur at the same recorded times using either the DISCRETE or the EXACT method. Unfortunately, those options can take a great deal of computer time, and that time increases rapidly with sample size. If you have a sample of 10,000 observations with only 10 distinct event times, you can expect to wait a long time before you see any output. Why wait when the alternatives are so attractive? The complementary log-log and logit methods of Chapter 7 estimate exactly the same models as the EXACT and DISCRETE methods, with statistical efficiency at least as good as that of PROC PHREG. The only drawback is that you may have to reorganize the data to create multiple records per individual. If that’s undesirable, another alternative is to fit accelerated failure time models with PROC LIFEREG using the interval censoring option. The disadvantage there is that you must choose one of the parametric models rather than leaving the hazard function unspecified. That brings us to the next question.
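Before turning to it, here is a minimal sketch of the interval-censoring alternative just mentioned (the data set MYDATA, the covariates X1 and X2, and the variables YEARS and STATUS are hypothetical; event times are assumed to be recorded only in whole years, with STATUS = 1 for events and 0 for right-censored cases):

   DATA IC;
      SET MYDATA;
      IF STATUS=1 THEN DO;
         LOWER=YEARS-1;               /* event known only to fall in the interval (YEARS-1, YEARS] */
         IF LOWER=0 THEN LOWER=.;     /* a missing lower bound marks left censoring, i.e., the interval (0, YEARS] */
         UPPER=YEARS;
      END;
      ELSE DO;
         LOWER=YEARS;                 /* right censored: the upper bound is left missing */
         UPPER=.;
      END;
   RUN;

   PROC LIFEREG DATA=IC;
      MODEL (LOWER, UPPER) = X1 X2 / DIST=WEIBULL;
   RUN;

The (LOWER, UPPER) syntax tells PROC LIFEREG to treat each event as occurring somewhere within the stated interval.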

Do You Want to Study the Shape of the Hazard Function?

In some studies, one of the major aims is to investigate hypotheses about the dependence of the hazard on time. Cox regression is far from ideal in such situations because it treats the dependence on time as a nuisance function that cancels out of the estimating equations. You can still produce graphs of the baseline survival and hazard functions, but those graphs don’t provide direct tests of hypotheses. And if you have any time-dependent covariates, you can’t even produce the graphs.

With PROC LIFEREG, on the other hand, you can produce formal hypothesis tests that answer the following sorts of questions:

  • Is the hazard constant over time?

  • If not constant, is the hazard increasing, decreasing, or nonmonotonic?

  • If increasing (or decreasing), is the rate of change going up or down?

All of these questions are addressed within the context of smooth parametric functions. If that seems too restrictive, you can get much more flexibility with the piecewise exponential, logit, or complementary log-log models. With these models, the time scale is chopped into intervals, and the least restrictive models have a set of indicator variables to represent those intervals. Restrictions can easily be imposed to represent functions of almost any desired shape. A further advantage of these multiple-record methods is that two or more time axes can be readily introduced into a single model. A model for promotions, for example, could include time since last promotion, time since initial employment by the firm, time in the labor force, and time since completion of education.
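To take the first question in the list above as an example, the exponential model is the special case of the Weibull model in which the scale parameter equals 1, so comparing the two fits from PROC LIFEREG provides a direct test of a constant hazard (MYDATA, DURAT, STATUS, X1, and X2 are again hypothetical):

   PROC LIFEREG DATA=MYDATA;
      MODEL DURAT*STATUS(0) = X1 X2 / DIST=EXPONENTIAL;   /* constant hazard */
   RUN;

   PROC LIFEREG DATA=MYDATA;
      MODEL DURAT*STATUS(0) = X1 X2 / DIST=WEIBULL;       /* monotone hazard; Scale = 1 gives the exponential */
   RUN;

Twice the positive difference between the two log-likelihoods is a chi-square statistic with one degree of freedom for the null hypothesis that the hazard is constant over time.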

So if you want to study the dependence of the hazard on time, there are good alternatives to Cox regression. But whatever method you use, I urge you to be cautious in interpreting the results. As we saw in Chapter 8, “Heterogeneity, Repeated Events, and Other Topics,” the hazard function is strongly confounded with uncontrolled heterogeneity, which makes hazards look like they are declining with time even when they are constant or increasing. As a result, any declines in the hazard function may be purely artifactual, a possibility that you can never completely rule out.

Do You Want to Generate Predicted Event Times or Survival Probabilities?

As we saw in Chapter 5, you can use output from the BASELINE statement in PROC PHREG to get predicted median survival times or survival probabilities for any specified set of covariates. Because BASELINE produces a complete set of survivor function estimates, however, getting predicted values or five-year survival probabilities for a large number of observations can be rather cumbersome. Furthermore, predicted median survival times may be unavailable for many or all of the observations if a substantial fraction of the sample is censored. With PROC LIFEREG, on the other hand, you can easily generate predicted median survival times (or any other percentile) for all observations using the OUTPUT statement. Using my PREDICT macro, you can also produce estimated survival probabilities for a specified survival time.
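A minimal sketch with PROC LIFEREG (again using the hypothetical MYDATA and its variables):

   PROC LIFEREG DATA=MYDATA;
      MODEL DURAT*STATUS(0) = X1 X2 / DIST=WEIBULL;
      OUTPUT OUT=PRED PREDICTED=MEDIAN QUANTILES=0.5;   /* predicted median survival time for every observation */
   RUN;

Changing QUANTILES= to another value, or to a list of values, produces other percentiles of each observation’s predicted survival distribution.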

Do You Have Left-Censored Data?

The only SAS procedure that allows for left censoring is PROC LIFEREG. In principle, it’s possible to adapt Cox regression to handle left censoring, but I know of no commercial program that does this.
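With the same (LOWER, UPPER) syntax sketched earlier, a left-censored observation is simply one whose lower bound is missing:

   PROC LIFEREG DATA=MYDATA;
      MODEL (LOWER, UPPER) = X1 X2 / DIST=WEIBULL;   /* LOWER = . marks a left-censored observation */
   RUN;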
