Graphical Methods for Evaluating Model Fit

Another way to discriminate between different probability distributions is to use graphical diagnostics. In Chapter 3 we saw how to use plots of the estimated survivor function to evaluate two of the distributional models considered in this chapter. Specifically, if the distribution of event times is exponential, a plot of –log Ŝ(t) versus t should yield a straight line through the origin. If event times have a Weibull distribution, a plot of log[–log Ŝ(t)] versus log t should also be a straight line. These plots can be requested with the PLOTS=(LS, LLS) option in the PROC LIFETEST statement.
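As a minimal sketch of that request, assuming the recidivism data are stored in a data set named RECID with time variable WEEK and censoring indicator ARREST (the names used in the code later in this section):

```sas
/* Request the log-survivor (LS) and log-log survivor (LLS) plots. */
/* ARREST=0 is assumed to mark censored observations.              */
proc lifetest data=recid plots=(ls, lls);
   time week*arrest(0);
run;
```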

Output 4.9 shows the log-survivor plot for the recidivism data. The graph is approximately linear with a slight tendency to bow upward. This tendency is consistent with earlier indications that the hazard tends to increase with time. Output 4.10 shows the log-log survivor plot for the same data. There is little evidence of nonlinearity (except for the jag in the middle), which is consistent with the Weibull model.

Output 4.9. Log-Survivor Plot for Recidivism Data


Output 4.10. Log-Log Survivor Plot for Recidivism Data
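Both plots follow from the form of the survivor function. As a brief sketch, suppose the exponential model has survivor function S(t) = exp(–λt) and the Weibull model has S(t) = exp[–(λt)^γ]. The symbols λ and γ are assumed here for illustration and may not match the parameterization used elsewhere in this chapter.

```latex
\begin{align*}
% Exponential: minus the log survivor function is linear in t with zero intercept.
S(t) = e^{-\lambda t}
  &\;\Longrightarrow\; -\log S(t) = \lambda t \\[4pt]
% Weibull: the log of minus the log survivor function is linear in log t.
S(t) = e^{-(\lambda t)^{\gamma}}
  &\;\Longrightarrow\; \log\bigl[-\log S(t)\bigr] = \gamma \log \lambda + \gamma \log t
\end{align*}
```

Under these assumptions the log-survivor plot has slope λ and the log-log survivor plot has slope γ. Setting γ = 1 reduces the Weibull model to the exponential model.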


We can use a similar approach to evaluate the log-normal and log-logistic distributions, but it’s a little more trouble to produce the graphs. The steps are as follows:

  1. Use PROC LIFETEST to get the Kaplan-Meier estimate of the survivor function and output it to a SAS data set.

  2. In a new DATA step, apply appropriate transformations to the survivor estimates.

  3. Use the PLOT or GPLOT procedures to produce the desired graphs.

For the log-normal distribution, a plot of Φ⁻¹[1 – Ŝ(t)] versus log t should be linear, where Φ(.) is the c.d.f. of a standard normal variable and Φ⁻¹ is its inverse. Similarly, a log-logistic distribution implies that a plot of log[(1 – Ŝ(t))/Ŝ(t)] versus log t will be linear. Here’s the SAS code for producing these plots for the recidivism data:

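proc lifetest data=recid outsurv=a;
   time week*arrest(0);
run;

data;
   set a;
   s=survival;            /* KM estimate of the survivor function */
   logit=log((1-s)/s);    /* log-logistic transformation */
   lnorm=probit(1-s);     /* log-normal transformation */
   lweek=log(week);       /* log of the time variable */
run;

proc gplot;
   symbol1 value=none i=join;
   plot logit*lweek lnorm*lweek;
run;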

The OUTSURV option on the first line produces a data set (named A in this example) that includes the KM estimates of the survivor function in a variable called SURVIVAL. See Output 3.3 for an example of what’s contained in such data sets. In the DATA step that follows, the variable SURVIVAL is copied to a new variable S to make it easier to specify the transformations. Next, the two transformations are calculated, along with the logarithm of the time variable. (PROBIT is the built-in SAS function that gives the inverse of the standard normal c.d.f.) Finally, the two plots are requested. These are shown in Output 4.11 and Output 4.12. The plot for the log-logistic distribution shows some minor deviations from linearity, while the log-normal plot appears to be more seriously bowed upward.

Output 4.11. Plot for Evaluating Log-Logistic Model


Output 4.12. Plot for Evaluating Log-Normal Model
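The two transformations plotted here can be checked against the survivor functions of the corresponding models. As a brief sketch, assume that for the log-normal model log T is normal with mean μ and standard deviation σ, and that the log-logistic model has S(t) = 1/[1 + (λt)^γ]. The symbols μ, σ, λ, and γ are assumed for illustration and may differ from the notation used elsewhere in this chapter.

```latex
\begin{align*}
% Log-normal: the inverse normal c.d.f. of 1 - S(t) is linear in log t.
S(t) = 1 - \Phi\!\left(\frac{\log t - \mu}{\sigma}\right)
  &\;\Longrightarrow\;
  \Phi^{-1}\bigl[1 - S(t)\bigr] = -\frac{\mu}{\sigma} + \frac{1}{\sigma}\,\log t \\[4pt]
% Log-logistic: the log odds of failure by time t are linear in log t.
S(t) = \frac{1}{1 + (\lambda t)^{\gamma}}
  &\;\Longrightarrow\;
  \log\!\left[\frac{1 - S(t)}{S(t)}\right] = \gamma \log \lambda + \gamma \log t
\end{align*}
```

In both cases the transformed survivor function is a linear function of log t, so curvature in either plot counts as evidence against the corresponding model.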


One difficulty with all these plots is that they are based on the assumption that the sample is drawn from a homogeneous population, implying that no covariates are related to survival time. In practice, that means that a model that looks fine on the plots may not fit well when covariates are taken into account. Similarly, a model that is rejected on the basis of the plots may be quite satisfactory when survival time is allowed to depend on covariates. One solution to this problem is to create plots on the residuals from the regression models. Not only does this take the covariates into account in judging model fit, it also leads to a single type of transformation and plot regardless of the model fitted.

Several different kinds of residuals have been proposed for survival models (Collett 1994), but the ones most suitable for this purpose are Cox-Snell residuals, defined as

     e_i = –log Ŝ(t_i | x_i)

where t_i is the observed event time or censoring time for individual i, x_i is the vector of covariate values for individual i, and Ŝ(t) is the estimated probability of surviving to time t, based on the fitted model. Now the e_i's are rather unlike the usual residuals calculated from a linear regression model. For one thing, they’re always positive. For our purposes, however, what’s important about these residuals is that, if the fitted model is correct, the e_i's have (approximately) an exponential distribution with parameter λ=1. (If t_i is a censoring time, then e_i is also treated as a censored observation.) But we already have a graphical method for evaluating exponential distributions with censoring: compute the KM estimator of the survivor function, take minus the log of the estimated survivor function, and plot that against t (actually e in this case). The resulting graph should be a straight line through the origin with a slope of 1.
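The unit-exponential result can be sketched in a few lines. Assume that the true survivor function S(t | x) is continuous and strictly decreasing in t, and write E = –log S(T | x) for the residual computed from the true (rather than estimated) survivor function; the symbols E and u are introduced here only for illustration.

```latex
\begin{align*}
\Pr(E > u)
  &= \Pr\bigl\{-\log S(T \mid x) > u\bigr\} \\
  &= \Pr\bigl\{S(T \mid x) < e^{-u}\bigr\} \\
  % S(T | x) is uniformly distributed on (0, 1) when T has survivor function S.
  &= e^{-u}, \qquad u \ge 0 .
\end{align*}
```

This is the survivor function of an exponential distribution with parameter 1. The result holds only approximately for the e_i's because they are computed from the estimated survivor function. Since –log S(t | x) increases with t, an event time known only to exceed t_i yields a residual known only to exceed e_i, which is why censoring carries over to the residuals.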

Here’s an example of how to do this for a Weibull model fitted to the recidivism data:

proc lifereg data=recid;
   model week*arrest(0)=fin age race wexp mar paro prio
         / dist=weibull;
   output out=a cdf=f;    /* F = estimated c.d.f. at the observed time */
run;

data b;
   set a;
   e=-log(1-f);           /* Cox-Snell residual */
run;

proc lifetest data=b plots=(ls) notable graphics;
   time e*arrest(0);
   symbol1 v=none;
run;

Output 4.13. Residual Plot for Weibull Model


The OUTPUT statement in the LIFEREG procedure defines an output data set (here named A) containing all of the original data and selected additional variables. By specifying CDF=F, we request the estimated c.d.f. evaluated at t_i, and we give that variable the name F (any other name would do). Since the c.d.f. is just 1 minus the survivor function, we’re halfway to getting the residuals. In the DATA step, we take minus the log of 1 – F to get the Cox-Snell residuals. Finally, in PROC LIFETEST we request the log-survivor plot for the residuals (the NOTABLE option suppresses the KM table). We can repeat this set of statements for each choice of distribution, changing only the DIST option in the MODEL statement.
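One way to avoid retyping the statements for each distribution is to wrap them in a macro. The following is only a sketch: %CSPLOT is a hypothetical macro name, the statements inside it are the same ones used above for the Weibull model, and the three distributions in the calls are just examples of values accepted by the DIST option.

```sas
/* Hypothetical macro: Cox-Snell residual plot for one distribution. */
%macro csplot(dist);
   proc lifereg data=recid;
      model week*arrest(0)=fin age race wexp mar paro prio
            / dist=&dist;
      output out=a cdf=f;      /* estimated c.d.f. at each observed time */
   run;

   data b;
      set a;
      e=-log(1-f);             /* Cox-Snell residual */
   run;

   /* Log-survivor plot of the residuals; should be roughly a */
   /* straight line through the origin with slope 1.          */
   proc lifetest data=b plots=(ls) notable graphics;
      time e*arrest(0);
      symbol1 v=none;
   run;
%mend csplot;

%csplot(weibull);
%csplot(lnormal);
%csplot(llogistic);
```

Each call overwrites the data sets A and B, so the plots are produced one at a time in the order of the calls.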

Output 4.14. Residual Plot for Log-Normal Model


Unfortunately, while this method is attractive in theory and is easy to implement, I have not found it to be sensitive to differences in model fit. Output 4.13 shows the plot for the Weibull model, which fit the data well according to the earlier likelihood ratio test. To my eye, it looks pretty straight. Output 4.14 for the log-normal model also looks fairly straight, even though the likelihood ratio test indicated rejection. Any differences between the two plots are quite subtle. Plots for the other models are also similar.
