Graphical Methods for Evaluating Model Fit


Another way to discriminate between different probability distributions is to use graphical diagnostics. In Chapter 3 we saw how to use plots of the estimated survivor function to evaluate two of the distributional models considered in this chapter. Specifically, if the distribution of event times is exponential, a plot of – log Ŝ(t) versus t should yield a straight line with an origin at 0. If event times have a Weibull distribution, a plot of log[–log Ŝ(t)] versus log t should also be a straight line. These plots can be requested in PROC LIFETEST with the PLOTS=(LS, LLS) option in the PROC LIFETEST statement.
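As a minimal sketch, the request described above might look like the following. It assumes the recidivism data are stored in a data set named RECID, with time variable WEEK and censoring indicator ARREST (0 = censored), as in the code later in this section.

```sas
/* Request the log-survivor (LS) and log-log survivor (LLS) plots */
proc lifetest data=recid plots=(ls, lls);
   time week*arrest(0);   /* ARREST=0 marks censored observations */
run;
```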

Output 4.9 shows the log-survivor plot for the recidivism data. The graph is approximately linear with a slight tendency to bow upward. This tendency is consistent with earlier indications that the hazard tends to increase with time. Output 4.10 shows the log-log survivor plot for the same data. There is little evidence of nonlinearity (except for the jag in the middle), which is consistent with the Weibull model.
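The reasoning behind both plots can be written out briefly. What follows is a sketch under an assumed parameterization (rate λ for the exponential, scale λ and shape γ for the Weibull). The symbols λ, γ, h(t), and H(t) are introduced here for illustration and may differ from the notation used elsewhere in the book.

```latex
\begin{align*}
\text{Exponential: } & S(t) = e^{-\lambda t}
  \;\Longrightarrow\; -\log S(t) = \lambda t \\
\text{Weibull: } & S(t) = \exp\{-(\lambda t)^{\gamma}\}
  \;\Longrightarrow\; \log[-\log S(t)] = \gamma \log \lambda + \gamma \log t \\
\text{In general: } & -\log S(t) = H(t) = \int_0^t h(u)\,du
\end{align*}
```

Because the slope of –log S(t) at any time is the hazard h(t), a log-survivor curve that bows upward is what an increasing hazard would produce. In the Weibull case under this parameterization, that corresponds to γ > 1, and γ is the slope of the log-log survivor plot.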

Output 4.9. Log-Survivor Plot for Recidivism Data

Output 4.10. Log-Log Survivor Plot for Recidivism Data

We can use a similar approach to evaluate the log-normal and log-logistic distributions, but it’s a little more trouble to produce the graphs. The steps are as follows:

1. Use PROC LIFETEST to get the Kaplan-Meier estimate of the survivor function and output it to a SAS data set.
2. In a new DATA step, apply appropriate transformations to the survivor estimates.
3. Use the PLOT or GPLOT procedure to produce the desired graphs.

For the log-normal distribution, a plot of Φ^-1[1 – Ŝ(t)] versus log t should be linear, where Φ(.) is the c.d.f. of a standard normal variable and Φ^-1 is its inverse. Similarly, a log-logistic distribution implies that a plot of log[(1 – Ŝ(t))/Ŝ(t)] versus log t will be linear. Here’s the SAS code for producing these plots for the recidivism data:

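proc lifetest data=recid outsurv=a;   /* save KM estimates in data set A */
   time week*arrest(0);
run;

data;
   set a;
   s=survival;            /* KM estimate of the survivor function */
   logit=log((1-s)/s);    /* transformation for the log-logistic plot */
   lnorm=probit(1-s);     /* transformation for the log-normal plot */
   lweek=log(week);       /* log of the time variable */
run;

proc gplot;
   symbol1 value=none i=join;
   plot logit*lweek lnorm*lweek;
run;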

The OUTSURV= option on the first line produces a data set (named A in this example) that includes the KM estimates of the survivor function in a variable called SURVIVAL. See Output 3.3 for an example of what’s contained in such data sets. In the DATA step that follows, the variable SURVIVAL is copied to a new variable S to make it easier to specify the transformations. Next, the two transformations are calculated, along with the logarithm of the time variable. (PROBIT is the built-in SAS function that gives the inverse of the standard normal c.d.f.) Finally, the two plots are requested. These are shown in Output 4.11 and Output 4.12. The plot for the log-logistic distribution shows some minor deviations from linearity, while the log-normal plot appears to be more seriously bowed upward.
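The linearity claims for these two plots can be checked directly from the survivor functions. The sketch below assumes one common parameterization (log T normal with mean μ and standard deviation σ for the log-normal; scale λ and shape γ for the log-logistic). These symbols are introduced here for illustration and are not taken from the text.

```latex
\begin{align*}
\text{Log-normal: } & S(t) = 1 - \Phi\!\left(\frac{\log t - \mu}{\sigma}\right)
  \;\Longrightarrow\; \Phi^{-1}[1 - S(t)] = -\frac{\mu}{\sigma} + \frac{1}{\sigma}\log t \\
\text{Log-logistic: } & S(t) = \frac{1}{1 + (\lambda t)^{\gamma}}
  \;\Longrightarrow\; \log\frac{1 - S(t)}{S(t)} = \gamma \log \lambda + \gamma \log t
\end{align*}
```

In both cases the transformed survivor function is a linear function of log t, so marked curvature in the corresponding plot counts as evidence against that distribution.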

Output 4.11. Plot for Evaluating Log-Logistic Model

Output 4.12. Plot for Evaluating Log-Normal Model

One difficulty with all these plots is that they are based on the assumption that the sample is drawn from a homogeneous population, implying that no covariates are related to survival time. In practice, that means that a model that looks fine on the plots may not fit well when covariates are taken into account. Similarly, a model that is rejected on the basis of the plots may be quite satisfactory when survival time is allowed to depend on covariates. One solution to this problem is to create plots on the residuals from the regression models. Not only does this take the covariates into account in judging model fit, it also leads to a single type of transformation and plot regardless of the model fitted.

Several different kinds of residuals have been proposed for survival models (Collett 1994), but the ones most suitable for this purpose are Cox-Snell residuals, defined as

   e_i = –log Ŝ(t_i | x_i)

where t_i is the observed event time or censoring time for individual i, x_i is the vector of covariate values for individual i, and Ŝ(t | x) is the estimated probability of surviving to time t, given covariate values x, based on the fitted model. Now the e_i's are rather unlike the usual residuals calculated from a linear regression model. For one thing, they’re always positive. For our purposes, however, what’s important about these residuals is that, if the fitted model is correct, the e_i's have (approximately) an exponential distribution with parameter λ = 1. (If t_i is a censoring time, then e_i is also treated as a censored observation.) But we already have a graphical method for evaluating exponential distributions with censoring: compute the KM estimator of the survivor function, take minus the log of the estimated survivor function, and plot that against t (actually e in this case). The resulting graph should be a straight line, with a slope of 1 and an origin at 0.
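The claim that these residuals are approximately exponential with parameter 1 rests on a standard argument, sketched here for the true survivor function S(t | x) of a continuous event time T. The argument value u is a symbol introduced only for this illustration.

```latex
\begin{align*}
& S(T \mid x) \sim \mathrm{Uniform}(0, 1) \\
& \Pr\{-\log S(T \mid x) > u\} = \Pr\{S(T \mid x) < e^{-u}\} = e^{-u}, \qquad u \ge 0
\end{align*}
```

So –log S(T | x) has survivor function exp(–u), which is the unit exponential, and minus the log of that survivor function is just u: a line with slope 1 through the origin. With the estimate Ŝ in place of S, the result holds only approximately.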

Here’s an example of how to do this for a Weibull model fitted to the recidivism data:

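proc lifereg data=recid;
   model week*arrest(0)=fin age race wexp mar paro prio
         / dist=weibull;
   output out=a cdf=f;    /* F = estimated c.d.f. at each observed time */
run;

data b;
   set a;
   e=-log(1-f);           /* Cox-Snell residual: minus log of estimated survivor function */
run;

proc lifetest data=b plots=(ls) notable graphics;
   time e*arrest(0);      /* residuals keep the original censoring indicator */
   symbol1 v=none;
run;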

Output 4.13. Residual Plot for Weibull Model

The OUTPUT statement in the LIFEREG procedure defines an output data set (here named A) containing all of the original data and selected additional variables. By specifying CDF=F, we request the estimated c.d.f. evaluated at t_i, and we give that variable the name F (or any other name we choose). Since the c.d.f. is just 1 minus the survivor function, we’re halfway there in getting the residuals. In the DATA step, we take minus the log of 1 – F to get the Cox-Snell residuals. Finally, in PROC LIFETEST we request the log-survivor plot for the residuals (the NOTABLE option suppresses the KM table). We can repeat this set of statements for each choice of distribution, changing only the DIST option in the MODEL statement.
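As a sketch of that repetition, the statements below refit the model with a log-normal distribution. The only substantive change from the Weibull version is DIST=LNORMAL, the PROC LIFEREG keyword for the log-normal. The data set names A and B are reused here for convenience.

```sas
proc lifereg data=recid;
   model week*arrest(0)=fin age race wexp mar paro prio
         / dist=lnormal;   /* log-normal instead of Weibull */
   output out=a cdf=f;     /* F = estimated c.d.f. at each observed time */
run;

data b;
   set a;
   e=-log(1-f);            /* Cox-Snell residual */
run;

proc lifetest data=b plots=(ls) notable graphics;
   time e*arrest(0);
   symbol1 v=none;
run;
```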

Output 4.14. Residual Plot for Log-Normal Model

Unfortunately, while this method is attractive in theory and is easy to implement, I have not found it to be sensitive to differences in model fit. Output 4.13 shows the plot for the Weibull model, which fit the data well according to the earlier likelihood ratio test. To my eye, it looks pretty straight. Output 4.14 for the log-normal model also looks fairly straight, even though the likelihood ratio test indicated rejection. Any differences between the two plots are quite subtle. Plots for the other models are also similar.
