Categorical Variables and the CLASS Statement

In the recidivism example, several of the covariates—race, marital status, work experience, and parole status—are dichotomous variables that are coded as indicator (dummy) variables. For categorical covariates with more than two categories, the standard approach is to create a set of indicator variables, one for each category (except for one). You can do this in the DATA step, but PROC LIFEREG does it automatically if the variable (or variables) is listed in a CLASS statement. Here’s an example. Another covariate in the recidivism data set is education, which was originally coded like this:

2 = 6th grade or less        24 cases
3 = 7th to 9th grade        239 cases
4 = 10th to 11th grade      119 cases
5 = 12th grade               39 cases
6 = some college             11 cases

Due to the small numbers of cases in the two extreme categories, I combined them (in a DATA step) with the adjacent categories to produce a variable EDUC with values of 3 (9th or less), 4 (10th to 11th) and 5 (12th or more). I then specified a Weibull model in PROC LIFEREG with the following statements:

proc lifereg data=recid;
   class educ;
   model week*arrest(0)=fin age race wexp mar paro
         prio educ/dist=weibull covb;
   run;
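
For reference, the DATA step recoding that produced EDUC might look something like the sketch below. The names RECID_RAW (input data set) and EDUC_ORIG (the variable with the original 2 to 6 coding) are assumptions made for illustration; the text does not give the original names.

```sas
/* Sketch only: recid_raw and educ_orig are hypothetical names for the  */
/* data set and variable holding the original 2-6 education coding.     */
data recid;
   set recid_raw;
   if missing(educ_orig) then educ = .;    /* keep missing as missing  */
   else if educ_orig <= 3 then educ = 3;   /* 9th grade or less        */
   else if educ_orig >= 5 then educ = 5;   /* 12th grade or more       */
   else educ = 4;                          /* 10th to 11th grade       */
run;

/* Check the collapsed categories */
proc freq data=recid;
   tables educ;
run;
```

If the case counts listed earlier are correct, the frequency table should show 263, 119, and 50 cases for levels 3, 4, and 5. The explicit check for missing values matters because SAS treats a missing numeric value as smaller than any number, so a bare `educ_orig <= 3` would place missing cases in the lowest category.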

The COVB option requests the covariance matrix for the parameter estimates, which we’ll discuss later in the chapter. Output 4.7 shows other results (some output lines have been omitted). For any variables listed in the CLASS statement, PROC LIFEREG first reports the number of levels found, and values for those levels.

In the table of estimates, there are four lines for the EDUC variable. The first line is a chi-square test of the null hypothesis that all the coefficients associated with EDUC are 0. In this case, the chi-square statistic is 3.20 with 2 d.f., yielding a p-value of .20—clearly not significant. The next two lines contain coefficients, standard errors, and hypothesis tests for levels 3 and 4 of EDUC, while the last line merely informs us that level 5 is the omitted category. Hence, each of the estimated coefficients is a contrast with level 5. (The default in PROC LIFEREG is to take the highest formatted value as the omitted category, but you can get some control over this with the ORDER option in the PROC LIFEREG statement.)
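
To make the contrast with level 5 concrete, here is a minimal sketch of the equivalent DATA step approach, in which the two indicator variables are built by hand and level 5 is left out as the reference category. The names RECID_DUMMY, EDUC3, and EDUC4 are made up for this illustration.

```sas
/* Sketch only: recid_dummy, educ3, and educ4 are hypothetical names. */
data recid_dummy;
   set recid;
   if not missing(educ) then do;
      educ3 = (educ = 3);   /* 1 if 9th grade or less, else 0   */
      educ4 = (educ = 4);   /* 1 if 10th to 11th grade, else 0  */
   end;                     /* level 5 is the omitted category  */
run;

proc lifereg data=recid_dummy;
   model week*arrest(0)=fin age race wexp mar paro
         prio educ3 educ4/dist=weibull;
run;
```

The coefficients for EDUC3 and EDUC4 should match those reported for levels 3 and 4 of EDUC. What this version does not provide is the 2 d.f. test for EDUC as a whole, which is one reason to prefer the CLASS statement.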

Output 4.7. Recidivism Model with Education as a CLASS Variable
L I F E R E G  P R O C E D U R E
                       Class Level Information

                      Class    Levels    Values

                      EDUC          3    3 4 5

Log Likelihood for WEIBULL -317.4956184

Variable  DF   Estimate  Std Err ChiSquare  Pr>Chi Label/Value

INTERCPT   1 4.46799186 0.517139  74.64661  0.0001 Intercept
FIN        1 0.26898413 0.137861  3.806899  0.0510
AGE        1 0.03920003 0.015947  6.042594  0.0140
RACE       1  -0.252407  0.22293  1.281928  0.2575
WEXP       1 0.07729428 0.152151  0.258073  0.6114
MAR        1 0.30129792 0.273231  1.215999  0.2701
PARO       1 0.06579323 0.139592  0.222148  0.6374
PRIO       1 -0.0585497 0.021336  7.530823  0.0061

EDUC       2                      3.202349  0.2017
           1 -0.5115662 0.309022  2.740463  0.0978            3
           1 -0.3536199 0.324266   1.18924  0.2755            4
           0          0        0         .   .                5

SCALE      1  0.7118726 0.063398                 Extreme value scale parameter
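
The p-values for EDUC in Output 4.7 can be checked from the reported statistics. The following sketch uses the PROBCHI function in a DATA step to convert a chi-square statistic into an upper-tail probability, and it also recomputes the 1 d.f. chi-square for level 3 as the squared ratio of the estimate to its standard error. The variable names are arbitrary.

```sas
/* Sketch only: recompute p-values from the numbers in Output 4.7. */
data _null_;
   /* Test for all EDUC coefficients: chi-square = 3.202349, 2 d.f. */
   p_joint = 1 - probchi(3.202349, 2);

   /* Level 3 versus level 5: (estimate / standard error) squared */
   chisq_3 = (-0.5115662 / 0.309022)**2;
   p_3     = 1 - probchi(chisq_3, 1);

   put p_joint=6.4 chisq_3=8.4 p_3=6.4;   /* results go to the SAS log */
run;
```

The values written to the log should agree with the printed output up to rounding.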

Unlike the GLM procedure, PROC LIFEREG does not have facilities for specifying interactions between two CLASS variables or between a CLASS variable and a quantitative covariate. To do this, you must create appropriate variables representing the interaction, either in the DATA step or with the GLMMOD procedure.
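
As a minimal sketch of the DATA step route, the statements below build an interaction between EDUC and AGE by multiplying hand-made indicator variables by the quantitative covariate. The choice of AGE is purely illustrative, and the names RECID_INT, EDUC3, EDUC4, EDUC3_AGE, and EDUC4_AGE are assumptions.

```sas
/* Sketch only: all new data set and variable names are hypothetical. */
data recid_int;
   set recid;
   if not missing(educ) then do;
      educ3 = (educ = 3);         /* indicator for level 3         */
      educ4 = (educ = 4);         /* indicator for level 4         */
      educ3_age = educ3 * age;    /* interaction: level 3 by age   */
      educ4_age = educ4 * age;    /* interaction: level 4 by age   */
   end;
run;

proc lifereg data=recid_int;
   model week*arrest(0)=fin age race wexp mar paro prio
         educ3 educ4 educ3_age educ4_age/dist=weibull;
run;
```

With this coding, the AGE coefficient is the age effect for the reference category (level 5), and each product term estimates how the age effect at that level differs from it. Because EDUC no longer appears in a CLASS statement, no joint test of the interaction terms is printed.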
