In the recidivism example, several of the covariates (race, marital status, work experience, and parole status) are dichotomous variables coded as indicator (dummy) variables. For a categorical covariate with more than two categories, the standard approach is to create a set of indicator variables, one for every category except one, which serves as the reference category. You can do this in a DATA step, but PROC LIFEREG does it automatically for any variables listed in a CLASS statement. Here's an example. Another covariate in the recidivism data set is education, which was originally coded like this:
| Code | Education | Cases |
|---|---|---|
| 2 | 6th grade or less | 24 |
| 3 | 7th to 9th grade | 239 |
| 4 | 10th to 11th grade | 119 |
| 5 | 12th grade | 39 |
| 6 | Some college | 11 |
Because the two extreme categories contain so few cases, I combined each of them (in a DATA step) with its adjacent category to produce a variable EDUC with values 3 (9th grade or less), 4 (10th to 11th grade), and 5 (12th grade or more). I then specified a Weibull model in PROC LIFEREG with the following statements:
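```sas
proc lifereg data=recid;
  class educ;   /* create indicator variables for EDUC automatically */
  model week*arrest(0) = fin age race wexp mar paro prio educ
        / dist=weibull covb;   /* COVB prints the covariance matrix */
run;
```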
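The DATA step that collapses the education categories is not shown in the text. The following is a minimal sketch of one way to write it, not necessarily the step actually used. It assumes that the original codes 2 through 6 are stored in data set RECID in a variable with the hypothetical name EDUCRAW, and it would have to run before the PROC LIFEREG step.

```sas
/* Sketch only: EDUCRAW is a hypothetical name for the original
   education codes (2-6); EDUC receives the collapsed codes 3-5. */
data recid;
  set recid;
  select (educraw);
    when (2, 3) educ = 3;   /* 9th grade or less   */
    when (4)    educ = 4;   /* 10th to 11th grade  */
    when (5, 6) educ = 5;   /* 12th grade or more  */
    otherwise   educ = .;   /* missing or unexpected codes stay missing */
  end;
run;

/* Cross-check the recode: each original code should map to one new code */
proc freq data=recid;
  tables educraw*educ / list missing;
run;
```

Explicit WHEN lists are used rather than a comparison such as EDUCRAW < 3 because SAS treats a missing value as smaller than any number, so a range comparison would silently move missing cases into the lowest category. If the counts in the coding table are complete, the frequency listing should show 263, 119, and 50 cases for levels 3, 4, and 5.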
The COVB option requests the covariance matrix of the parameter estimates, which we'll discuss later in the chapter. Output 4.7 shows other results (some output lines have been omitted). For any variable listed in the CLASS statement, PROC LIFEREG first reports the number of levels found and the values of those levels.
In the table of estimates, there are four lines for the EDUC variable. The first line is a chi-square test of the null hypothesis that all the coefficients associated with EDUC are 0. In this case, the chi-square statistic is 3.20 with 2 d.f., yielding a p-value of .20, which is clearly not significant. The next two lines contain coefficients, standard errors, and hypothesis tests for levels 3 and 4 of EDUC, and the last line merely indicates that level 5 is the omitted category. Hence, each estimated coefficient is a contrast with level 5. (By default, PROC LIFEREG takes the highest formatted value as the omitted category, but the ORDER option in the PROC LIFEREG statement gives you some control over this.)
```
L I F E R E G   P R O C E D U R E

Class Level Information

Class    Levels    Values
EDUC          3    3 4 5

Log Likelihood for WEIBULL -317.4956184

Variable  DF    Estimate   Std Err  ChiSquare  Pr>Chi  Label/Value
INTERCPT   1  4.46799186  0.517139   74.64661  0.0001  Intercept
FIN        1  0.26898413  0.137861   3.806899  0.0510
AGE        1  0.03920003  0.015947   6.042594  0.0140
RACE       1   -0.252407   0.22293   1.281928  0.2575
WEXP       1  0.07729428  0.152151   0.258073  0.6114
MAR        1  0.30129792  0.273231   1.215999  0.2701
PARO       1  0.06579323  0.139592   0.222148  0.6374
PRIO       1  -0.0585497  0.021336   7.530823  0.0061
EDUC       2                         3.202349  0.2017
           1  -0.5115662  0.309022   2.740463  0.0978  3
           1  -0.3536199  0.324266    1.18924  0.2755  4
           0           0         0          .       .  5
SCALE      1   0.7118726  0.063398                     Extreme value scale parameter
```
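The EDUC numbers in this output can be checked by hand. Each single-coefficient chi-square is the squared ratio of the estimate to its standard error, and a chi-square distribution with 2 d.f. has the closed-form upper tail $\Pr(\chi^2_2 > x) = e^{-x/2}$, so the p-value of the overall EDUC test follows directly from its statistic:

$$
\left(\frac{-0.5115662}{0.309022}\right)^2 \approx (-1.6554)^2 \approx 2.740,
\qquad
p = \Pr(\chi^2_2 > 3.202349) = e^{-3.202349/2} = e^{-1.6012} \approx 0.2017 .
$$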
In the release of SAS used here, PROC LIFEREG, unlike the GLM procedure, has no facility for specifying interactions between two CLASS variables or between a CLASS variable and a quantitative covariate. To include such interactions, you must create variables representing them yourself, either in a DATA step or with the GLMMOD procedure.
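As an illustration of the DATA step route, here is a minimal sketch that builds the EDUC indicators by hand and multiplies each one by AGE, chosen only as an example of a quantitative covariate. It assumes that RECID already contains the collapsed EDUC variable (values 3, 4, 5); the data set RECID2 and the variables EDUC3, EDUC4, EDUC3AGE, and EDUC4AGE are hypothetical names.

```sas
/* Sketch: hand-made indicators and CLASS-by-covariate interactions.
   RECID2, EDUC3, EDUC4, EDUC3AGE and EDUC4AGE are hypothetical names. */
data recid2;
  set recid;
  /* A comparison in SAS evaluates to 1 (true) or 0 (false).
     Level 5 is the omitted (reference) category, as with CLASS above. */
  educ3 = (educ = 3);
  educ4 = (educ = 4);
  /* Keep missing EDUC missing instead of coding it as level 5 */
  if missing(educ) then do;
    educ3 = .;
    educ4 = .;
  end;
  /* Interaction terms: each indicator times the covariate AGE */
  educ3age = educ3 * age;
  educ4age = educ4 * age;
run;

proc lifereg data=recid2;
  model week*arrest(0) = fin age race wexp mar paro prio
                         educ3 educ4 educ3age educ4age / dist=weibull;
run;
```

Dropping EDUC3AGE and EDUC4AGE from the MODEL statement should reproduce the EDUC coefficients in Output 4.7, since the CLASS statement builds the same two indicators with level 5 as the reference. With the product terms included, the effect of age is allowed to differ across education levels; twice the difference between the log-likelihoods of the models with and without them gives a likelihood-ratio chi-square with 2 d.f. for the interaction.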