4.5. A Four-Way Table with Ordinal Explanatory Variables

Next we consider a 2 × 2 × 4 × 4 table, reported by Sewell and Shah (1968), for a sample of 4,991 high school seniors in Wisconsin. The dependent variable was whether or not they planned to attend college in the following year. The three independent variables were coded as follows:

IQ       1=low, 2=lower middle, 3=upper middle, 4=high
SES      1=low, 2=lower middle, 3=upper middle, 4=high
PARENT   1=low parental encouragement, 2=high encouragement.

Below is a DATA step to read in the table. There is one record for each combination of the independent variables. The fourth variable TOTAL is the number of students who had each set of values for the independent variables, and the last variable COLL is the number of students who planned to attend college. Thus, on the first line, we see that there were 353 students who were low on all three independent variables; of those, only four planned to attend college.

DATA wisc;
  INPUT iq parent ses total coll;
  DATALINES;
1   1   1   353     4
1   1   2   234     2
1   1   3   174     8
1   1   4    52     4
1   2   1    77    13
1   2   2   111    27
1   2   3   138    47
1   2   4    96    39
2   1   1   216     9
2   1   2   208     7
2   1   3   126     6
2   1   4    52     5
2   2   1   105    33
2   2   2   159    64
2   2   3   184    74
2   2   4   213   123
3   1   1   138    12
3   1   2   127    12
3   1   3   109    17
3   1   4    50     9
3   2   1    92    38
3   2   2   185    93
3   2   3   248   148
3   2   4   289   224
4   1   1    77    10
4   1   2    96    17
4   1   3    48     6
4   1   4    25     8
4   2   1    92    49
4   2   2   178   119
4   2   3   271   198
4   2   4   468   414
;
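
Before fitting anything, it may be worth confirming that the table was read correctly. The following is a minimal sketch, assuming the WISC data set created above: there should be 32 records, and the TOTAL column should sum to the 4,991 students in the sample.

```sas
/* Sanity check on the DATA step: count the records and sum the two count variables */
PROC MEANS DATA=wisc N SUM;
  VAR total coll;
RUN;
```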

Now we’re ready for GENMOD:

PROC GENMOD DATA=wisc;
  CLASS iq ses;
  MODEL coll/total=iq ses parent / D=B TYPE3;
RUN;

IQ and SES are listed as CLASS variables so that they will be treated as categorical rather than quantitative variables. The TYPE3 option is needed to produce test statistics for sets of dummy variables created by the CLASS statement; otherwise GENMOD only prints a chi-square for each constructed dummy variable.

Output 4.10. GENMOD Output for a Four-Way Table
          Criteria For Assessing Goodness Of Fit

Criterion              DF         Value      Value/DF

Deviance               24       25.2358        1.0515
Scaled Deviance        24       25.2358        1.0515
Pearson Chi-Square     24       24.4398        1.0183
Scaled Pearson X2      24       24.4398        1.0183
Log Likelihood          .    -2166.0549             .

             Analysis Of Parameter Estimates

Parameter       DF    Estimate     Std Err   ChiSquare  Pr>Chi

INTERCEPT        1     -3.1005      0.2123    213.3353  0.0001
IQ         1     1     -1.9663      0.1210    264.2400  0.0001
IQ         2     1     -1.3722      0.1024    179.7284  0.0001
IQ         3     1     -0.6331      0.0976     42.0830  0.0001
IQ         4     0      0.0000      0.0000           .       .
SES        1     1     -1.4140      0.1210    136.6657  0.0001
SES        2     1     -1.0580      0.1029    105.7894  0.0001
SES        3     1     -0.7516      0.0976     59.3364  0.0001
SES        4     0      0.0000      0.0000           .       .
PARENT           1      2.4554      0.1014    586.3859  0.0001
SCALE            0      1.0000      0.0000           .       .

NOTE:  The scale parameter was held fixed.

  LR Statistics For Type 3 Analysis

Source       DF   ChiSquare  Pr>Chi

IQ            3    361.5648  0.0001
SES           3    179.8467  0.0001
PARENT        1    795.6139  0.0001

In Output 4.10, we see that a model with main effects and no interactions fits the data well. Even though GENMOD doesn’t compute a p-value for the deviance, whenever the deviance is close to the degrees of freedom, you can be sure that the p-value is well above .05. (In this case, the p-value is .39.) As with the previous table, the deviance is a test statistic for the null hypothesis that all the omitted terms (three 2-way interactions and one 3-way interaction) are 0.
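
If you want the p-value for the deviance, a short DATA step can compute it from the values in Output 4.10. This is only a sketch; the variable names are arbitrary, and the deviance and degrees of freedom are typed in by hand rather than read from the GENMOD output.

```sas
/* Upper-tail chi-square probability for the deviance in Output 4.10 */
DATA _NULL_;
  deviance = 25.2358;
  df = 24;
  p = 1 - PROBCHI(deviance, df);   /* PROBCHI returns the lower-tail probability */
  PUT 'Deviance p-value = ' p 6.4;
RUN;
```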

Despite the fact that the overall deviance is not significant, I also checked to see if any of the two-way interactions might be significant. This could happen if most of the deviance was attributable to a single interaction. The GENMOD code for fitting all the two-way interactions is:

PROC GENMOD DATA=wisc;
  CLASS iq ses;
  MODEL coll/total=iq|ses|parent @2/ D=B TYPE3;
RUN;

The syntax IQ|SES|PARENT is shorthand for all possible interactions and lower-order terms among the three variables. Because I wasn’t interested in the three-way interaction, I included the @2 option, which restricts the model to 2-way interactions and main effects. In Output 4.11 (produced by the TYPE3 option), we see that none of the 2-way interactions is statistically significant.
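
To make the bar notation concrete, here is a sketch of the same model with every term written out explicitly. It should be equivalent to the @2 specification, although the order in which GENMOD lists the effects may differ.

```sas
PROC GENMOD DATA=wisc;
  CLASS iq ses;
  /* Main effects plus all three 2-way interactions, written out term by term */
  MODEL coll/total=iq ses iq*ses parent iq*parent ses*parent / D=B TYPE3;
RUN;
```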

Output 4.11. Likelihood Ratio Tests for a Model with 2-Way Interactions
LR Statistics For Type 3 Analysis

Source        DF   ChiSquare  Pr>Chi

IQ             3     25.2021  0.0001
SES            3      2.5691  0.4629
SES*IQ         9     12.4052  0.1914
PARENT         1    724.5295  0.0001
PARENT*IQ      3      2.9357  0.4017
PARENT*SES     3      2.2237  0.5273

Now we can feel confident that the main-effects model in Output 4.10 is a reasonably good fit to the data. From the likelihood ratio statistics at the bottom of that output, we see that each of the three variables is highly significant, with parental encouragement having the strongest effect, followed by IQ and then SES.

Turning to the parameter estimates, the easiest one to interpret is the coefficient of 2.4554 for PARENT. Although PARENT has values of 1 for low encouragement and 2 for high encouragement, the coefficient would be identical if the values were 0 for low and 1 for high. Calculating exp(2.4554) = 11.65, we can say that the odds of planning to attend college are nearly 12 times as high for students whose parents gave high levels of encouragement as for students whose parents gave low levels of encouragement. Because the sample size is large, this adjusted odds ratio is estimated with good precision: the 95% confidence interval is 9.5 to 14.3 (calculated by hand using the Wald method).
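
The hand calculation of the Wald interval can be sketched in a DATA step: exponentiate the coefficient plus and minus a normal quantile times the standard error. The variable names here are illustrative, and the limits may differ slightly in the last digit from those quoted above, depending on whether the multiplier is 1.96 or a rounded 2.

```sas
DATA _NULL_;
  b  = 2.4554;            /* PARENT coefficient from Output 4.10 */
  se = 0.1014;            /* its standard error */
  z  = PROBIT(0.975);     /* standard normal quantile, about 1.96 */
  oddsratio = EXP(b);
  lower = EXP(b - z*se);  /* lower 95% Wald limit on the odds-ratio scale */
  upper = EXP(b + z*se);  /* upper 95% Wald limit */
  PUT oddsratio= 8.2 lower= 8.2 upper= 8.2;
RUN;
```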

To interpret the IQ and SES coefficients, it’s essential to remember that by default GENMOD uses the highest value of a CLASS variable as the reference category. Each of the coefficients is a comparison between a particular category and the reference category. For example, the coefficient of –1.9663 for IQ 1 is a comparison between the low IQ and the high IQ group. Exponentiating this coefficient yields .14, indicating that the odds of college plans among the low IQ group are about one-seventh the odds in the high IQ group. For IQ 2, we have exp(–1.3722) = .25. This tells us that students in the lower-middle IQ group have odds that are only about one-fourth the odds for the high group. Finally, comparing upper-middle and high groups, we have exp(–.6331) = .53. Thus, the odds of college plans in the highest group are about double the odds in the upper-middle group. From the chi-square column, we see that each of these contrasts with the reference category is statistically significant at well beyond the .001 level.

Again, each of the SES coefficients is a contrast with the high SES category. Each SES level is significantly different from the highest category and, as you might expect, the odds of college plans go up with each increase in SES. But what about comparisons between other categories? Suppose we want to know if there’s a significant difference between lower-middle (SES 2) and upper-middle (SES 3) groups with respect to college plans. The magnitude of the difference is found by taking the difference in coefficients: –.7516 – (–1.058) = .3064. Exponentiating yields 1.36, indicating that the odds of college plans are about 36% higher in the upper-middle than in the lower-middle group. To test for significance of the difference, we can use a CONTRAST statement after the MODEL statement:

CONTRAST 'UM vs. LM' ses 0 1 -1 0;

The text within the single quotes is a label; a label is mandatory but it can be any text you want. The four numbers after SES tell GENMOD to multiply the SES 1 coefficient by 0, SES 2 by 1, SES 3 by –1, and SES 4 by 0. GENMOD then sums the results and does a likelihood ratio test of the null hypothesis that the sum is equal to 0. Of course, this is equivalent to testing whether the difference between SES 2 and SES 3 is 0. The output clearly indicates that the difference is significant:

      CONTRAST Statement Results

Contrast     DF   ChiSquare  Pr>Chi  Type

UM vs. LM     1      9.1318  0.0025  LR
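
For reference, here is a sketch of the complete step with the CONTRAST statement in place, followed by a DATA step that reproduces the size of the difference from the coefficients in Output 4.10. The variable names in the DATA step are arbitrary.

```sas
PROC GENMOD DATA=wisc;
  CLASS iq ses;
  MODEL coll/total=iq ses parent / D=B TYPE3;
  /* Coefficients apply to SES 1, 2, 3, 4 in that order */
  CONTRAST 'UM vs. LM' ses 0 1 -1 0;
RUN;

DATA _NULL_;
  diff  = -0.7516 - (-1.0580);  /* SES 3 coefficient minus SES 2 coefficient */
  ratio = EXP(diff);            /* odds ratio, upper-middle vs. lower-middle */
  PUT diff= 7.4 ratio= 6.2;
RUN;
```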

The pattern of coefficients for IQ and SES suggests that SES and IQ might be treated as quantitative variables, thereby obtaining a more parsimonious representation of the data. This is easily accomplished by removing those variables from the CLASS statement. I estimated two models, one removing IQ from the CLASS statement and the other removing SES from the CLASS statement. When IQ is treated as a quantitative variable, the deviance is 25.84 with 26 degrees of freedom and a p-value of .47. When SES is treated as quantitative, the deviance is 35.24 with 26 degrees of freedom and a p-value of .11. While this is still an acceptable fit, it suggests that perhaps the effect of SES is not quite linear. To get a more sensitive test, we can take the difference in deviances between the model with SES as categorical (25.24) and the model with SES as quantitative (35.24), yielding a chi-square of 10 with 2 degrees of freedom (the difference in degrees of freedom for the two models). This test has a p-value of .0067, telling us that the model with SES as categorical fits significantly better than the model with SES as quantitative.
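
The two fits and the deviance-difference test might be set up as follows. This is a sketch: the deviances in the final DATA step are copied by hand from the text rather than captured from the GENMOD output.

```sas
/* IQ quantitative, SES categorical */
PROC GENMOD DATA=wisc;
  CLASS ses;
  MODEL coll/total=iq ses parent / D=B;
RUN;

/* SES quantitative, IQ categorical */
PROC GENMOD DATA=wisc;
  CLASS iq;
  MODEL coll/total=iq ses parent / D=B;
RUN;

/* Deviance-difference test: SES quantitative vs. SES categorical */
DATA _NULL_;
  chisq = 35.24 - 25.24;        /* difference in deviances */
  df    = 26 - 24;              /* difference in degrees of freedom */
  p     = 1 - PROBCHI(chisq, df);
  PUT chisq= 6.2 df= p= 6.4;
RUN;
```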

Output 4.12 gives the results for the model with SES as categorical and IQ as quantitative. By removing IQ from the CLASS statement, we impose the restriction that each one-level jump in IQ has the same effect on the odds of planning to go to college. Exponentiating the IQ coefficient of .6682, we get 1.95. We can then say that each one-level increase in IQ approximately doubles the odds (controlling for SES and parental encouragement). Consequently, moving three steps (from IQ 1 to IQ 4) multiplies the odds by 1.95³ = 7.4.
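
Because the effect is linear on the log-odds scale, the multiplier for a jump of k levels is exp(k × .6682). A small illustrative DATA step (names are arbitrary) tabulates it for one, two, and three levels:

```sas
DATA _NULL_;
  b = 0.6682;                   /* IQ coefficient from Output 4.12 */
  DO steps = 1 TO 3;
    multiplier = EXP(steps*b);  /* odds multiplier for a jump of that many IQ levels */
    PUT steps= multiplier= 6.2;
  END;
RUN;
```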

Output 4.12. Model with Quantitative Effect of IQ
             Analysis Of Parameter Estimates

Parameter     DF    Estimate     Std Err   ChiSquare  Pr>Chi

INTERCEPT      1     -5.7635      0.2265    647.7589  0.0001
IQ             1      0.6682      0.0367    332.0319  0.0001
SES       1    1     -1.4131      0.1209    136.5537  0.0001
SES       2    1     -1.0566      0.1028    105.5986  0.0001
SES       3    1     -0.7486      0.0975     58.9692  0.0001
SES       4    0      0.0000      0.0000           .       .
PARENT         1      2.4532      0.1013    586.0945  0.0001

The coefficients for SES and PARENT are about the same as they were in Output 4.10. Examination of the SES coefficients yields some insight into why we couldn’t impose a linear effect of SES. If the effect of SES were linear, the difference between adjacent coefficients should be the same at every level. Yet, while the difference between SES 4 and SES 3 is .75, the difference between SES 3 and SES 2 is only .31. With a sample size of nearly 5,000, even small differences like this may show up as statistically significant.
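
The adjacent differences can be checked with a short DATA step. This is a sketch with the SES coefficients from Output 4.12 entered by hand (the reference category, SES 4, is 0); under a linear effect the three gaps would be roughly equal.

```sas
DATA _NULL_;
  /* SES 1 to SES 4 coefficients from Output 4.12 */
  ARRAY b{4} _TEMPORARY_ (-1.4131, -1.0566, -0.7486, 0);
  DO level = 2 TO 4;
    below = level - 1;
    gap = b{level} - b{below};   /* change in log-odds for a one-level increase */
    PUT 'SES ' level 1. ' minus SES ' below 1. ': ' gap 6.4;
  END;
RUN;
```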

Although GENMOD doesn’t compute an R², it’s easy enough to calculate one yourself. First, you must fit a null model, that is, a model with no explanatory variables. For the data at hand, this can be done with the statement:

MODEL coll/total= / D=B;

The deviance for the null model is 2262.61. For the model in Output 4.12, the deviance is 25.85. The difference of 2236.76 is the chi-square for testing the null hypothesis that all the coefficients are 0. Applying the formula in equation (3.10) yields a generalized R² of .36.
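
The whole calculation might look like the sketch below. It assumes that equation (3.10), which is not reproduced in this section, is the generalized R² of the form 1 – exp(–L²/n), where L² is the likelihood ratio chi-square and n is the sample size; the deviances are copied by hand from the text.

```sas
/* Null model: intercept only */
PROC GENMOD DATA=wisc;
  MODEL coll/total= / D=B;
RUN;

DATA _NULL_;
  n   = 4991;                 /* number of students */
  lr  = 2262.61 - 25.85;      /* null deviance minus deviance of the fitted model */
  rsq = 1 - EXP(-lr/n);       /* assumed form of equation (3.10) */
  PUT lr= 8.2 rsq= 5.2;
RUN;
```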

Before leaving the four-way table, let’s consider one more model. It’s reasonable to argue that parental encouragement is an intervening variable in the production of college plans. Students with high IQ and high SES are more likely to get parental encouragement for attending college. As a result, controlling for parental encouragement may obscure the overall impact of these two variables. To check this possibility, Output 4.13 displays a model that excludes PARENT. The coefficient for IQ increases slightly—on the odds scale, it’s about 14% larger. For SES the change is more dramatic. The coefficient for SES 1 has a magnitude of 2.13 in Output 4.13 compared with 1.41 in Output 4.12, corresponding to odds ratios of 8.41 vs. 4.10. So the effect of going from the lowest to highest category of SES more than doubles when PARENT is removed from the model. It appears, then, that parental encouragement mediates a substantial portion of the overall effect of SES on college plans.

Output 4.13. Model without PARENT
             Analysis Of Parameter Estimates

Parameter     DF    Estimate     Std Err   ChiSquare  Pr>Chi

INTERCEPT      1     -1.5724      0.1123    196.1821  0.0001
IQ             1      0.7953      0.0335    565.1182  0.0001
SES       1    1     -2.1340      0.1091    382.5562  0.0001
SES       2    1     -1.5361      0.0935    269.6874  0.0001
SES       3    1     -0.9776      0.0894    119.4661  0.0001
SES       4    0      0.0000      0.0000           .       .
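
A model like the one in Output 4.13 can presumably be obtained by dropping PARENT from the MODEL statement, as in the sketch below. The DATA step then repeats the odds-ratio comparisons made in the text, using the rounded coefficient magnitudes quoted there; its variable names are illustrative.

```sas
/* IQ quantitative, SES categorical, PARENT omitted */
PROC GENMOD DATA=wisc;
  CLASS ses;
  MODEL coll/total=iq ses / D=B;
RUN;

DATA _NULL_;
  ses_with    = EXP(1.41);                 /* high vs. low SES, PARENT in the model */
  ses_without = EXP(2.13);                 /* high vs. low SES, PARENT excluded */
  iq_ratio    = EXP(0.7953)/EXP(0.6682);   /* relative change in the IQ odds ratio */
  PUT ses_with= 6.2 ses_without= 6.2 iq_ratio= 6.2;
RUN;
```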
