6.5. Cumulative Logit Model: Contingency Tables

The cumulative logit model can be very useful in analyzing contingency tables. Consider Table 6.1, which was tabulated by Sloane and Morgan (1996) from the General Social Survey. Our goal is to estimate a model for the dependence of happiness on year and marital status.

Table 6.1. General Happiness by Marital Status and Year
Year   Marital status   Very happy   Pretty happy   Not too happy
1974   Married                 473            493              93
       Unmarried                84            231              99
1984   Married                 332            387              62
       Unmarried               150            347             117
1994   Married                 571            793             112
       Unmarried               257            889             234

Here’s the SAS program to read the table:

DATA happy;
  INPUT year married happy count;
  y84 = year EQ 2;   /* 1 if 1984, otherwise 0 */
  y94 = year EQ 3;   /* 1 if 1994, otherwise 0 */
  DATALINES;
1 1 1 473
1 1 2 493
1 1 3  93
1 0 1  84
1 0 2 231
1 0 3  99
2 1 1 332
2 1 2 387
2 1 3  62
2 0 1 150
2 0 2 347
2 0 3 117
3 1 1 571
3 1 2 793
3 1 3 112
3 0 1 257
3 0 2 889
3 0 3 234
;

The two lines after the INPUT statement define dummy variables. YEAR is coded 1, 2, and 3 for 1974, 1984, and 1994, so Y84=1 when YEAR is equal to 2 and 0 otherwise. Note that, unlike GENMOD, LOGISTIC requires the dummy variables for YEAR to be defined in the DATA step. Notice also that I’ve coded HAPPY so that 1 is very happy and 3 is not too happy.
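
Before fitting anything, it can be worth confirming that the data set reproduces Table 6.1 and that the dummies are coded as intended. The step below is a minimal sketch of one way to do that and is not part of the original program. The WEIGHT statement tells PROC FREQ to treat COUNT as the cell frequency.

```sas
/* Sketch: check the HAPPY data set against Table 6.1 */
PROC FREQ DATA=happy;
  WEIGHT count;                                  /* each record stands for COUNT cases */
  TABLES year*married*happy / NOCOL NOPERCENT;   /* one MARRIED-by-HAPPY table per year */
  TABLES year*y84*y94 / LIST;                    /* shows which YEAR values set each dummy */
RUN;
```

The frequencies in the first set of tables should match the rows of Table 6.1, and the list from the second request should show Y84=1 only for YEAR=2 and Y94=1 only for YEAR=3.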

To fit the cumulative logit model, we run the following program, with results shown in Output 6.5:

PROC LOGISTIC DATA=happy;
  FREQ count;
  MODEL happy = married y84 y94 / AGGREGATE SCALE=N;
  TEST y84, y94;
RUN;

Output 6.5. Cumulative Logit Results for Happiness Table
              Score Test for the Proportional Odds Assumption

                 Chi-Square = 26.5204 with 3 DF (p=0.0001)

          Deviance and Pearson Goodness-of-Fit Statistics

                                                           Pr >
         Criterion        DF       Value    Value/DF    Chi-Square

         Deviance          7     30.9709      4.4244        0.0001
         Pearson           7     31.4060      4.4866        0.0001

                        Number of unique profiles: 6

    Model Fitting Information and Testing Global Null Hypothesis BETA=0

                             Intercept
               Intercept        and
 Criterion       Only       Covariates    Chi-Square for Covariates

 AIC           10937.042     10586.698         .
 SC            10950.347     10619.961         .
 -2 LOG L      10933.042     10576.698      356.343 with 3 DF (p=0.0001)
 Score              .             .         348.447 with 3 DF (p=0.0001)

                  Analysis of Maximum Likelihood Estimates

             Parameter Standard    Wald       Pr >    Standardized    Odds
 Variable DF  Estimate   Error  Chi-Square Chi-Square   Estimate     Ratio

 INTERCP1 1    -1.3203   0.0674   383.2575     0.0001            .     .
 INTERCP2 1     1.4876   0.0684   473.6764     0.0001            .     .
 MARRIED  1     0.9931   0.0553   322.9216     0.0001     0.270315   2.700
 Y84      1     0.0479   0.0735     0.4250     0.5145     0.011342   1.049
 Y94      1    -0.0716   0.0636     1.2673     0.2603    -0.019737   0.931

                Linear Hypotheses Testing

                 Wald                          Pr >
Label         Chi-Square              DF    Chi-Square

                 3.7619               2        0.1524

The first thing we see in Output 6.5 is that the score test for the proportional odds assumption indicates fairly decisive rejection of the model. This is corroborated by the overall goodness-of-fit tests (obtained with the AGGREGATE option), which have p-values less than .0001. The score test has 3 degrees of freedom corresponding to the constraints imposed on the three coefficients in the model. Roughly speaking, the score test can be thought of as one component of the overall deviance. So we can say that approximately 85% of the deviance stems from the constraints imposed by the cumulative logit model. The remaining 15% (and 4 degrees of freedom) comes from possible interactions between year and marital status.
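
The degrees of freedom and the 85% figure can be reconstructed by hand from Output 6.5. What follows is a rough accounting, assuming the usual count for grouped data (free multinomial probabilities minus estimated parameters) and treating the score statistic as a share of the deviance in the same approximate spirit as the text:

```latex
\begin{aligned}
\text{deviance df} &= 6 \text{ profiles} \times (3 - 1) \text{ free probabilities} - 5 \text{ parameters} = 7,\\
\text{share due to proportional odds} &\approx \frac{26.5204}{30.9709} \approx 0.856,\\
\text{remainder} &\approx 30.9709 - 26.5204 = 4.4505 \quad\text{on } 7 - 3 = 4 \text{ df}.
\end{aligned}
```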

Should we reject the model? Keep in mind that the sample size is quite large (5,724 cases), so it may be hard to find any parsimonious model with a p-value above .05. But let’s postpone a decision until we examine more evidence. Turning to the lower part of the output, we see strong evidence that married people report greater happiness than the unmarried but little evidence for change over time. Neither of the individual year coefficients is statistically significant. A simultaneous test that both coefficients are 0 (produced by the TEST statement and reported under “Linear Hypotheses Testing”) is also nonsignificant. So let’s try deleting the year variables and see what happens (Output 6.6).
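
The program for the reduced model is not shown. Presumably it is the earlier PROC LOGISTIC step with Y84 and Y94 dropped from the MODEL statement and the TEST statement removed, along these lines:

```sas
/* Sketch: cumulative logit model with marital status only */
PROC LOGISTIC DATA=happy;
  FREQ count;                                /* COUNT gives the number of cases per cell */
  MODEL happy = married / AGGREGATE SCALE=N; /* AGGREGATE requests the goodness-of-fit tests */
RUN;
```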

Output 6.6. Happiness Model with Year Variables Deleted
             Score Test for the Proportional Odds Assumption

                  Chi-Square = 0.3513 with 1 DF (p=0.5534)

              Deviance and Pearson Goodness-of-Fit Statistics

                                                           Pr >
         Criterion        DF       Value    Value/DF     Chi-Square

         Deviance          1      0.3508      0.3508         0.5537
         Pearson           1      0.3513      0.3513         0.5534

                        Number of unique profiles: 2

                 Analysis of Maximum Likelihood Estimates

             Parameter Standard    Wald       Pr >    Standardized   Odds
Variable DF   Estimate   Error  Chi-Square Chi-Square   Estimate    Ratio

INTERCP1 1     -1.3497   0.0459   865.8084     0.0001            .     .
INTERCP2 1      1.4569   0.0468   968.1489     0.0001            .     .
MARRIED  1      1.0017   0.0545   337.2309     0.0001     0.272655  2.723

Now the model fits great. Of course, there’s only 1 degree of freedom in the score test because only one coefficient is constrained across the two implicit equations. The deviance and Pearson chi-squares also have only 1 degree of freedom because LOGISTIC has regrouped the data into two profiles (married and unmarried) after the elimination of the year variables. Interpreting the marital status effect, we find that married people have odds of higher happiness that are about 2.7 times the odds for unmarried people.
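
The odds ratio is the exponentiated MARRIED coefficient. As a rough check on the fit, the estimates in Output 6.6 can also be converted into predicted probabilities of being very happy and compared with the proportions in Table 6.1 pooled over the three years. This sketch assumes the default LOGISTIC parameterization, in which the first intercept belongs to the logit of the probability that HAPPY equals 1. The figures are rounded hand calculations.

```latex
\begin{aligned}
\text{odds ratio} &= e^{1.0017} \approx 2.72,\\
\hat{P}(\text{very happy} \mid \text{married}) &= \frac{1}{1 + e^{-(-1.3497 + 1.0017)}} \approx 0.41
  \quad (\text{observed } 1376/3316 \approx 0.41),\\
\hat{P}(\text{very happy} \mid \text{unmarried}) &= \frac{1}{1 + e^{1.3497}} \approx 0.21
  \quad (\text{observed } 491/2408 \approx 0.20).
\end{aligned}
```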

It’s tempting to leave it at this, but the fact that the score statistic declined so dramatically with the deletion of the year variables suggests that something else is going on. To see what it might be, let’s fit separate models for the two ways of dichotomizing the happiness variable.

DATA a;
 SET happy;
 lesshap=happy GE 2;   /* 1 if pretty happy or not too happy, otherwise 0 */
 nottoo=happy EQ 3;    /* 1 if not too happy, otherwise 0 */
RUN;
PROC LOGISTIC DATA=a;
  FREQ count;
  MODEL lesshap=married y84 y94;
RUN;
PROC LOGISTIC DATA=a;
  FREQ count;
  MODEL nottoo=married y84 y94;
RUN;

Results are in Output 6.7. If the cumulative logit model is correct, the coefficients for the three variables should be approximately the same in the two models. That’s nearly true for MARRIED. But Y94 has a negative coefficient in the first model and a positive coefficient in the second. Both are significant beyond the .01 level. This is surely the cause of the difficulty with the cumulative logit model. What seems to be happening is that 1994 is different from the other two years, but not in a way that could be described as a uniform increase or decrease in happiness. Rather, in that year there were relatively more cases in the middle category and fewer in the two extreme categories.
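
One quick way to see this pattern in the raw data is to tabulate happiness by year. The step below is an illustrative sketch and not part of the original analysis. It pools over marital status, so it shows only the marginal pattern. Computed by hand from Table 6.1, the share in the middle category is about 49% in 1974, 53% in 1984, and 59% in 1994.

```sas
/* Sketch: row percentages of HAPPY within each YEAR, pooled over MARRIED */
PROC FREQ DATA=happy;
  WEIGHT count;                         /* COUNT is the cell frequency */
  TABLES year*happy / NOCOL NOPERCENT;  /* keep frequencies and row percentages only */
RUN;
```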

It’s possible to generalize the cumulative logit model to accommodate patterns like this. Specifically, one can model the parameter σ in equation (6.2) as a function of explanatory variables. In this example, the year variables would show up in the equation for σ but not in the linear equation for z. Although some commercial packages have this feature (for example, LIMDEP), LOGISTIC does not. In Chapter 10, we’ll see how this pattern can be modeled as a log-linear model.

Output 6.7. Results from Alternative Dichotomizations of General Happiness
1 vs.  (2,3)

             Parameter Standard    Wald       Pr >     Standardized   Odds
 Variable DF  Estimate   Error  Chi-Square Chi-Square    Estimate    Ratio

 INTERCPT 1    -1.2411   0.0735   285.4516     0.0001             .    .
 MARRIED  1     0.9931   0.0624   253.2457     0.0001      0.270311  2.700
 Y84      1    0.00668   0.0801     0.0070     0.9335      0.001582  1.007
 Y94      1    -0.2204   0.0699     9.9348     0.0016     -0.060762  0.802

(1,2) vs. 3

             Parameter Standard    Wald       Pr >     Standardized   Odds
 Variable DF  Estimate   Error  Chi-Square Chi-Square    Estimate    Ratio

 INTERCPT 1     1.2511   0.0915   186.8473     0.0001             .    .
 MARRIED  1     1.0109   0.0842   144.1700     0.0001      0.275159  2.748
 Y84      1     0.1928   0.1140     2.8603     0.0908      0.045643  1.213
 Y94      1     0.3037   0.0996     9.3083     0.0023      0.083737  1.355
