Now let’s look at a much more complicated table, the 2 × 2 × 4 × 4 table that we previously analyzed by way of a logit model in Section 4.5. Our main goal will be to duplicate the results of the logit model with a loglinear model. To refresh your memory, the sample consisted of 4,991 high school seniors in Wisconsin. The dependent variable was whether or not they planned to attend college in the following year. The three independent variables were coded as follows:
   IQ        1=low, 2=lower middle, 3=upper middle, 4=high
   SES       1=low, 2=lower middle, 3=upper middle, 4=high
   PARENT    1=low parental encouragement, 2=high encouragement
The data, shown in Section 4.5, was read in as 32 records, each record containing a unique combination of values of the independent variables, along with the number of seniors who had those values and the number of those seniors who planned to attend college. Unfortunately, that’s not the format we need for a loglinear analysis. Instead, we need 64 records, one for each cell in the four-way table, with values for all the variables and the frequency count in that cell. Here’s a DATA step that inputs the previous data set (WISC) and outputs the new data set in the appropriate format (WISCTAB).
   DATA wisctab;
      SET wisc;
      college=1;           /* record for seniors planning college     */
      freq=coll;
      OUTPUT;
      college=0;           /* record for seniors not planning college */
      freq=total-coll;
      OUTPUT;
      DROP total coll;
   PROC PRINT;
   RUN;
Output 10.4 shows what this new data set looks like.
   OBS    IQ    PARENT    SES    COLLEGE    FREQ

     1     1       1       1        1          4
     2     1       1       1        0        349
     3     1       1       2        1          2
     4     1       1       2        0        232
     5     1       1       3        1          8
     6     1       1       3        0        166
     7     1       1       4        1          4
     8     1       1       4        0         48
     9     1       2       1        1         13
    10     1       2       1        0         64
    11     1       2       2        1         27
    12     1       2       2        0         84
    13     1       2       3        1         47
    14     1       2       3        0         91
    15     1       2       4        1         39
    16     1       2       4        0         57
    17     2       1       1        1          9
    18     2       1       1        0        207
    19     2       1       2        1          7
    20     2       1       2        0        201
    21     2       1       3        1          6
    22     2       1       3        0        120
    23     2       1       4        1          5
    24     2       1       4        0         47
    25     2       2       1        1         33
    26     2       2       1        0         72
    27     2       2       2        1         64
    28     2       2       2        0         95
    29     2       2       3        1         74
    30     2       2       3        0        110
    31     2       2       4        1        123
    32     2       2       4        0         90
    33     3       1       1        1         12
    34     3       1       1        0        126
    35     3       1       2        1         12
    36     3       1       2        0        115
    37     3       1       3        1         17
    38     3       1       3        0         92
    39     3       1       4        1          9
    40     3       1       4        0         41
    41     3       2       1        1         38
    42     3       2       1        0         54
    43     3       2       2        1         93
    44     3       2       2        0         92
    45     3       2       3        1        148
    46     3       2       3        0        100
    47     3       2       4        1        224
    48     3       2       4        0         65
    49     4       1       1        1         10
    50     4       1       1        0         67
    51     4       1       2        1         17
    52     4       1       2        0         79
    53     4       1       3        1          6
    54     4       1       3        0         42
    55     4       1       4        1          8
    56     4       1       4        0         17
    57     4       2       1        1         49
    58     4       2       1        0         43
    59     4       2       2        1        119
    60     4       2       2        0         59
    61     4       2       3        1        198
    62     4       2       3        0         73
    63     4       2       4        1        414
    64     4       2       4        0         54
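Before fitting anything, it can be worth confirming that the restructuring did what we wanted. The following is only a quick sketch, assuming the DATA step above has already created WISCTAB; it asks for the number of records and the sum of the cell frequencies.

```sas
/* Sketch: check that WISCTAB has one record per cell of the       */
/* 2 x 2 x 4 x 4 table and that the cell counts add up to the      */
/* original sample size.                                            */
PROC MEANS DATA=wisctab N SUM;
   VAR freq;
RUN;
```

With the data as listed, N should be 64 and the sum of FREQ should be 4,991, the number of seniors in the sample.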
Here is the SAS code for estimating a loglinear model that is equivalent to the first logit model of Section 4.5:
   PROC GENMOD DATA=wisctab;
      CLASS iq ses;
      MODEL freq=iq|ses|parent college iq*college ses*college
            parent*college / D=P TYPE3;
   RUN;
As before, we are fitting a Poisson regression model for the frequency counts, with the default logarithmic link. The first term on the right-hand side of the MODEL equation—IQ|SES|PARENT—is shorthand for IQ*SES*PARENT IQ*SES IQ*PARENT SES*PARENT IQ SES PARENT. In other words, we fit the 3-way interaction, the three 2-way interactions, and the main effects of each of the independent variables. These parameters pertain only to the relationships among the independent variables in the logit model, not to the effects of the independent variables on the dependent variable (college choice). We include them in the model because to do otherwise would assert that they are 0. Because we cannot force these parameters to be 0 in a logit model, neither do we do it in the corresponding loglinear model. The general principle is this: Whenever you want a loglinear model to be equivalent to some logit model, you must include all possible interactions among the independent variables in the logit model. Even though we include these interactions, they rarely have any substantive interest because they describe relationships among the independent variables conditional on the values of the dependent variable. Ordinarily, this has no useful causal interpretation.
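To make the bar notation concrete, here is the same model with IQ|SES|PARENT written out term by term. This is just an illustrative rewrite of the MODEL statement above; it should specify the same model and give the same fit, although the order in which GENMOD lists the parameters may differ.

```sas
/* Sketch: same loglinear model with the bar notation expanded.    */
PROC GENMOD DATA=wisctab;
   CLASS iq ses;
   MODEL freq = iq ses parent                      /* main effects       */
                iq*ses iq*parent ses*parent        /* 2-way interactions */
                iq*ses*parent                      /* 3-way interaction  */
                college iq*college ses*college     /* terms that matter  */
                parent*college
                / D=P TYPE3;
RUN;
```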
The parameters that do have a useful interpretation are specified in the MODEL statement as COLLEGE IQ*COLLEGE SES*COLLEGE PARENT*COLLEGE. These correspond to the intercept and the three main effects of the independent variables on the dependent variable in the logit model. In other words, every parameter in the logit model appears in the loglinear model as a term that involves the dependent variable. Notice that IQ and SES are listed as CLASS variables so that, for each variable, three dummy variables will be constructed to represent the four categories. This is unnecessary for COLLEGE and PARENT because they are dichotomous.
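It may help to see why only the terms involving COLLEGE survive in the logit. The notation below is ours, introduced purely for illustration: m_{ijkc} is the expected frequency for IQ level i, SES level j, PARENT value x_k (coded 1 or 2) and COLLEGE value c (coded 0 or 1), and \lambda_{ijk} collects the intercept and every term generated by IQ|SES|PARENT.

```latex
\begin{align*}
\log m_{ijkc} &= \lambda_{ijk}
   + c\,\bigl(\alpha + \beta^{IQ}_{i} + \beta^{SES}_{j} + \gamma\,x_k\bigr) \\[4pt]
\log\frac{m_{ijk1}}{m_{ijk0}}
   &= \log m_{ijk1} - \log m_{ijk0}
    = \alpha + \beta^{IQ}_{i} + \beta^{SES}_{j} + \gamma\,x_k
\end{align*}
```

The \lambda_{ijk} terms cancel in the difference, which leaves exactly the logit model with intercept \alpha (the COLLEGE coefficient), IQ effects \beta^{IQ}_i (IQ*COLLEGE), SES effects \beta^{SES}_j (SES*COLLEGE), and slope \gamma for PARENT (PARENT*COLLEGE).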
Results are shown in Output 10.5. Values that are the same as those in Output 4.10, obtained by direct fitting of the logit model, are shown in boldface. The numbers to the right of the parameter names correspond to the values of the CLASS variables. Clearly, the loglinear model contains many more parameters than the logit model, but the ones that count are identical in the two models. Notice also that the deviance and Pearson chi-squares are identical for the logit and loglinear models.
                        The GENMOD Procedure

                          Model Information

              Description              Value

              Data Set                 WORK.WISCTAB
              Distribution             POISSON
              Link Function            LOG
              Dependent Variable       FREQ
              Observations Used        64

                       Class Level Information

              Class      Levels    Values

              IQ              4    1 2 3 4
              SES             4    1 2 3 4

               Criteria For Assessing Goodness Of Fit

    Criterion              DF          Value      Value/DF

    Deviance               24        25.2358        1.0515
    Scaled Deviance        24        25.2358        1.0515
    Pearson Chi-Square     24        24.4398        1.0183
    Scaled Pearson X2      24        24.4398        1.0183
    Log Likelihood          .     18912.8805             .

                  Analysis Of Parameter Estimates

    Parameter               DF    Estimate    Std Err   ChiSquare   Pr>Chi

    INTERCEPT                1      1.4076     0.4093     11.8287   0.0006
    IQ             1         1      2.4070     0.5036     22.8476   0.0001
    IQ             2         1      1.8198     0.4967     13.4206   0.0002
    IQ             3         1      1.7044     0.4969     11.7656   0.0006
    IQ             4         0      0.0000     0.0000       .        .
    SES            1         1      3.4289     0.4775     51.5546   0.0001
    SES            2         1      3.3431     0.4600     52.8290   0.0001
    SES            3         1      1.6470     0.5007     10.8213   0.0010
    SES            4         0      0.0000     0.0000       .        .
    IQ*SES         1 1       1      0.2993     0.5817      0.2647   0.6069
    IQ*SES         1 2       1     -0.7469     0.5692      1.7217   0.1895
    IQ*SES         1 3       1      0.2001     0.6061      0.1090   0.7413
    IQ*SES         1 4       0      0.0000     0.0000       .        .
    IQ*SES         2 1       1     -0.3036     0.5794      0.2745   0.6004
    IQ*SES         2 2       1     -0.6241     0.5633      1.2277   0.2679
    IQ*SES         2 3       1      0.0101     0.6070      0.0003   0.9867
    IQ*SES         2 4       0      0.0000     0.0000       .        .
    IQ*SES         3 1       1     -0.7575     0.5905      1.6454   0.1996
    IQ*SES         3 2       1     -1.4171     0.5746      6.0828   0.0137
    IQ*SES         3 3       1     -0.2082     0.6113      0.1160   0.7334
    IQ*SES         3 4       0      0.0000     0.0000       .        .
    IQ*SES         4 1       0      0.0000     0.0000       .        .
    IQ*SES         4 2       0      0.0000     0.0000       .        .
    IQ*SES         4 3       0      0.0000     0.0000       .        .
    IQ*SES         4 4       0      0.0000     0.0000       .        .
    PARENT                   1      1.3895     0.2144     42.0060   0.0001
    PARENT*IQ      1         1     -1.3237     0.2746     23.2379   0.0001
    PARENT*IQ      2         1     -0.7906     0.2632      9.0218   0.0027
    PARENT*IQ      3         1     -0.8352     0.2616     10.1935   0.0014
    PARENT*IQ      4         0      0.0000     0.0000       .        .
    PARENT*SES     1         1     -2.0023     0.2641     57.4604   0.0001
    PARENT*SES     2         1     -1.7432     0.2474     49.6418   0.0001
    PARENT*SES     3         1     -0.7940     0.2636      9.0726   0.0026
    PARENT*SES     4         0      0.0000     0.0000       .        .
    PARENT*IQ*SES  1 1       1      0.2425     0.3363      0.5198   0.4709
    PARENT*IQ*SES  1 2       1      0.6967     0.3196      4.7533   0.0292
    PARENT*IQ*SES  1 3       1      0.1915     0.3316      0.3336   0.5636
    PARENT*IQ*SES  1 4       0      0.0000     0.0000       .        .
    PARENT*IQ*SES  2 1       1      0.3940     0.3240      1.4789   0.2239
    PARENT*IQ*SES  2 2       1      0.4903     0.3061      2.5662   0.1092
    PARENT*IQ*SES  2 3       1      0.0860     0.3228      0.0709   0.7900
    PARENT*IQ*SES  2 4       0      0.0000     0.0000       .        .
    PARENT*IQ*SES  3 1       1      0.5263     0.3284      2.5689   0.1090
    PARENT*IQ*SES  3 2       1      0.9028     0.3082      8.5797   0.0034
    PARENT*IQ*SES  3 3       1      0.2568     0.3215      0.6380   0.4244
    PARENT*IQ*SES  3 4       0      0.0000     0.0000       .        .
    PARENT*IQ*SES  4 1       0      0.0000     0.0000       .        .
    PARENT*IQ*SES  4 2       0      0.0000     0.0000       .        .
    PARENT*IQ*SES  4 3       0      0.0000     0.0000       .        .
    PARENT*IQ*SES  4 4       0      0.0000     0.0000       .        .
    COLLEGE                  1     -3.1005     0.2123    213.3353   0.0001
    COLLEGE*IQ     1         1     -1.9663     0.1210    264.2400   0.0001
    COLLEGE*IQ     2         1     -1.3722     0.1024    179.7284   0.0001
    COLLEGE*IQ     3         1     -0.6331     0.0976     42.0831   0.0001
    COLLEGE*IQ     4         0      0.0000     0.0000       .        .
    COLLEGE*SES    1         1     -1.4140     0.1210    136.6657   0.0001
    COLLEGE*SES    2         1     -1.0580     0.1029    105.7894   0.0001
    COLLEGE*SES    3         1     -0.7516     0.0976     59.3364   0.0001
    COLLEGE*SES    4         0      0.0000     0.0000       .        .
    PARENT*COLLEGE           1      2.4554     0.1014    586.3859   0.0001
    SCALE                    0      1.0000     0.0000       .        .

    NOTE:  The scale parameter was held fixed.

                  LR Statistics For Type 3 Analysis

              Source             DF    ChiSquare    Pr>Chi

              IQ                  3     175.6015    0.0001
              SES                 3     379.7224    0.0001
              IQ*SES              9      17.5638    0.0406
              PARENT              1      34.1173    0.0001
              PARENT*IQ           3      86.1646    0.0001
              PARENT*SES          3     257.0933    0.0001
              PARENT*IQ*SES       9      13.7343    0.1321
              COLLEGE             1    1078.3695    0.0001
              COLLEGE*IQ          3     361.5648    0.0001
              COLLEGE*SES         3     179.8467    0.0001
              PARENT*COLLEGE      1     795.6139    0.0001
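For a side-by-side comparison, the corresponding logit model can be fit directly to the original 32-record data set. The code below is a sketch rather than a reproduction of Section 4.5: it assumes that WISC still contains the variables COLL and TOTAL used in the DATA step above, and it uses GENMOD's events/trials syntax with a binomial distribution.

```sas
/* Sketch: direct fit of the logit model on the grouped data.      */
/* COLL = number planning college, TOTAL = number of seniors in    */
/* each IQ x SES x PARENT combination (names taken from the DATA   */
/* step that built WISCTAB).                                        */
PROC GENMOD DATA=wisc;
   CLASS iq ses;
   MODEL coll/total = iq ses parent / D=B TYPE3;
RUN;
```

If the two models are equivalent as claimed, the INTERCEPT, IQ, SES, and PARENT estimates from this run should match the COLLEGE, COLLEGE*IQ, COLLEGE*SES, and PARENT*COLLEGE estimates in the loglinear output, and the deviance should again be 25.2358 with 24 degrees of freedom.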
Because the deviance is not 0, we know that this is not a saturated model, unlike the model we considered for the 2 × 2 table. To get a saturated model, we would have to include three 3-way interactions with COLLEGE and one 4-way interaction with COLLEGE. These would correspond to three 2-way interactions and one 3-way interaction in the logit model.
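As a sketch of what the saturated model would look like, the MODEL statement below adds the three 3-way interactions and the 4-way interaction with COLLEGE to the model fitted earlier. The same specification could be written more compactly as iq|ses|parent|college.

```sas
/* Sketch: saturated loglinear model for the 2 x 2 x 4 x 4 table.  */
PROC GENMOD DATA=wisctab;
   CLASS iq ses;
   MODEL freq = iq|ses|parent
                college iq*college ses*college parent*college
                iq*ses*college iq*parent*college ses*parent*college
                iq*ses*parent*college
                / D=P;
RUN;
```

The added terms account for 9 + 3 + 3 + 9 = 24 parameters, which is exactly the 24 degrees of freedom of the deviance reported in Output 10.5, so this model should reproduce the observed frequencies with a deviance of 0.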