The cumulative logit model can be very useful in analyzing contingency tables. Consider Table 6.1, which was tabulated by Sloane and Morgan (1996) from the General Social Survey. Our goal is to estimate a model for the dependence of happiness on year and marital status.
| Year | Marital status | Very happy | Pretty happy | Not too happy |
|---|---|---|---|---|
| 1974 | Married | 473 | 493 | 93 |
| | Unmarried | 84 | 231 | 99 |
| 1984 | Married | 332 | 387 | 62 |
| | Unmarried | 150 | 347 | 117 |
| 1994 | Married | 571 | 793 | 112 |
| | Unmarried | 257 | 889 | 234 |
Here’s the SAS program to read the table:
```sas
DATA happy;
  INPUT year married happy count;
  y84 = year EQ 2;
  y94 = year EQ 3;
  DATALINES;
1 1 1 473
1 1 2 493
1 1 3 93
1 0 1 84
1 0 2 231
1 0 3 99
2 1 1 332
2 1 2 387
2 1 3 62
2 0 1 150
2 0 2 347
2 0 3 117
3 1 1 571
3 1 2 793
3 1 3 112
3 0 1 257
3 0 2 889
3 0 3 234
;
```
The two lines after the INPUT statement define dummy variables. For example, Y84 = 1 when YEAR equals 2, and 0 otherwise. Note that, unlike GENMOD, LOGISTIC requires the dummy variables for YEAR to be defined in the DATA step. Notice also that I’ve coded HAPPY so that 1 is very happy and 3 is not too happy.
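For readers following along outside SAS, the table and its dummy coding can be sketched in Python. This is a hypothetical translation of the DATA step above, not part of the book's program; the row tuples come straight from the DATALINES:

```python
# Each row: (year, married, happy, count), matching the DATALINES above.
rows = [
    (1, 1, 1, 473), (1, 1, 2, 493), (1, 1, 3, 93),
    (1, 0, 1, 84),  (1, 0, 2, 231), (1, 0, 3, 99),
    (2, 1, 1, 332), (2, 1, 2, 387), (2, 1, 3, 62),
    (2, 0, 1, 150), (2, 0, 2, 347), (2, 0, 3, 117),
    (3, 1, 1, 571), (3, 1, 2, 793), (3, 1, 3, 112),
    (3, 0, 1, 257), (3, 0, 2, 889), (3, 0, 3, 234),
]

# Dummy coding mirrors y84 = year EQ 2 and y94 = year EQ 3.
data = [
    {"year": y, "married": m, "happy": h, "count": n,
     "y84": int(y == 2), "y94": int(y == 3)}
    for (y, m, h, n) in rows
]

total = sum(r["count"] for r in data)
print(total)  # 5724 cases in all, the sample size cited later in the text
```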
To fit the cumulative logit model, we run the following program, with results shown in Output 6.5:
```sas
PROC LOGISTIC DATA=happy;
  FREQ count;
  MODEL happy = married y84 y94 / AGGREGATE SCALE=N;
  TEST y84, y94;
RUN;
```
Output 6.5

```
          Score Test for the Proportional Odds Assumption

             Chi-Square = 26.5204 with 3 DF (p=0.0001)

          Deviance and Pearson Goodness-of-Fit Statistics

                                                 Pr >
     Criterion      DF      Value    Value/DF    Chi-Square
     Deviance        7    30.9709      4.4244        0.0001
     Pearson         7    31.4060      4.4866        0.0001

     Number of unique profiles: 6

   Model Fitting Information and Testing Global Null Hypothesis BETA=0

                 Intercept    Intercept and
     Criterion   Only         Covariates      Chi-Square for Covariates
     AIC         10937.042    10586.698       .
     SC          10950.347    10619.961       .
     -2 LOG L    10933.042    10576.698       356.343 with 3 DF (p=0.0001)
     Score       .            .               348.447 with 3 DF (p=0.0001)

                  Analysis of Maximum Likelihood Estimates

               Parameter  Standard   Wald        Pr >        Standardized   Odds
 Variable  DF  Estimate   Error      Chi-Square  Chi-Square  Estimate       Ratio
 INTERCP1   1  -1.3203    0.0674     383.2575    0.0001      .              .
 INTERCP2   1   1.4876    0.0684     473.6764    0.0001      .              .
 MARRIED    1   0.9931    0.0553     322.9216    0.0001      0.270315       2.700
 Y84        1   0.0479    0.0735       0.4250    0.5145      0.011342       1.049
 Y94        1  -0.0716    0.0636       1.2673    0.2603     -0.019737       0.931

                      Linear Hypotheses Testing

                    Wald                Pr >
     Label          Chi-Square    DF    Chi-Square
                    3.7619         2    0.1524
```
The first thing we see in Output 6.5 is that the score test for the proportional odds assumption indicates fairly decisive rejection of the model. This is corroborated by the overall goodness-of-fit tests (obtained with the AGGREGATE option), which have p-values less than .0001. The score test has 3 degrees of freedom corresponding to the constraints imposed on the three coefficients in the model. Roughly speaking, the score test can be thought of as one component of the overall deviance. So we can say that approximately 85% of the deviance stems from the constraints imposed by the cumulative logit model. The remaining 15% (and 4 degrees of freedom) comes from possible interactions between year and marital status.
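The 85%/15% split quoted above is just the ratio of the score statistic to the overall deviance. A quick check of that arithmetic, using the numbers from Output 6.5:

```python
score = 26.5204      # proportional odds score test, 3 DF
deviance = 30.9709   # overall deviance, 7 DF

share = score / deviance
print(round(share, 3))       # ~0.856, i.e., roughly 85% of the deviance
print(round(1 - share, 3))   # the remaining ~15%, carrying 7 - 3 = 4 DF
```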
Should we reject the model? Keep in mind that the sample size is quite large (5,724 cases), so it may be hard to find any parsimonious model with a p-value above .05. But let’s postpone a decision until we examine more evidence. Turning to the lower part of the output, we see strong evidence that married people report greater happiness than the unmarried but little evidence for change over time. Neither of the individual year coefficients is statistically significant. A simultaneous test that both coefficients are 0 (produced by the TEST statement and reported under “Linear Hypothesis Testing”) is also nonsignificant. So let’s try deleting the year variables and see what happens (Output 6.6).
Output 6.6

```
          Score Test for the Proportional Odds Assumption

              Chi-Square = 0.3513 with 1 DF (p=0.5534)

          Deviance and Pearson Goodness-of-Fit Statistics

                                                Pr >
     Criterion      DF     Value    Value/DF    Chi-Square
     Deviance        1    0.3508      0.3508        0.5537
     Pearson         1    0.3513      0.3513        0.5534

     Number of unique profiles: 2

                  Analysis of Maximum Likelihood Estimates

               Parameter  Standard   Wald        Pr >        Standardized   Odds
 Variable  DF  Estimate   Error      Chi-Square  Chi-Square  Estimate       Ratio
 INTERCP1   1  -1.3497    0.0459     865.8084    0.0001      .              .
 INTERCP2   1   1.4569    0.0468     968.1489    0.0001      .              .
 MARRIED    1   1.0017    0.0545     337.2309    0.0001      0.272655       2.723
```
Now the model fits great. Of course, there’s only 1 degree of freedom in the score test because only one coefficient is constrained across the two implicit equations. The deviance and Pearson chi-squares also have only 1 degree of freedom because LOGISTIC has regrouped the data after the elimination of the year variables. Interpreting the marital status effect, we find that married people have odds of higher happiness that are nearly three times the odds for unmarried people.
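The odds ratio printed by LOGISTIC is simply the exponentiated coefficient, so the "nearly three times" figure can be verified directly from Output 6.6:

```python
import math

b_married = 1.0017           # MARRIED coefficient from Output 6.6
odds_ratio = math.exp(b_married)
print(round(odds_ratio, 3))  # 2.723, matching the Odds Ratio column
```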
It’s tempting to leave it at this, but the fact that the score statistic declined so dramatically with the deletion of the year variables suggests that something else is going on. To see what it might be, let’s fit separate models for the two ways of dichotomizing the happiness variable.
```sas
DATA a;
  SET happy;
  lesshap = happy GE 2;
  nottoo = happy EQ 3;
RUN;

PROC LOGISTIC DATA=a;
  FREQ count;
  MODEL lesshap = married y84 y94;
RUN;

PROC LOGISTIC DATA=a;
  FREQ count;
  MODEL nottoo = married y84 y94;
RUN;
```
Results are in Output 6.7. If the cumulative logit model is correct, the coefficients for the three variables should be the same in the two models. That’s nearly true for MARRIED. But Y94 has a negative coefficient in the first model and a positive coefficient in the second. Both are significant at beyond the .01 level. This is surely the cause of the difficulty with the cumulative logit model. What seems to be happening is that 1994 is different from the other two years, but not in a way that could be described as a uniform increase or decrease in happiness. Rather, in that year there were relatively more cases in the middle category and fewer in the two extreme categories.
It’s possible to generalize the cumulative logit model to accommodate patterns like this. Specifically, one can model the parameter σ in equation (6.2) as a function of explanatory variables. In this example, the year variables would show up in the equation for σ but not in the linear equation for z. Although some commercial packages have this feature (for example, LIMDEP), LOGISTIC does not. In Chapter 10, we’ll see how this pattern can be modeled as a log-linear model.
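Although LOGISTIC cannot fit it, the generalization described here can be written down. One common form of such a model is sketched below; the exact notation of equation (6.2) may differ, and the symbols γ and z are my labels for the scale-equation coefficients and covariates (the exponential link simply keeps σ positive):

```latex
\log\frac{\Pr(y \le j)}{\Pr(y > j)}
  = \frac{\alpha_j + \boldsymbol{\beta}'\mathbf{x}}{\sigma},
\qquad
\sigma = \exp(\boldsymbol{\gamma}'\mathbf{z})
```

In this example, the year dummies would appear in z (shrinking or stretching the spread across categories) rather than in x, which is exactly the "more cases in the middle, fewer at the extremes" pattern observed for 1994.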
Output 6.7

```
                             1 vs. (2,3)

               Parameter  Standard   Wald        Pr >        Standardized   Odds
 Variable  DF  Estimate   Error      Chi-Square  Chi-Square  Estimate       Ratio
 INTERCPT   1  -1.2411    0.0735     285.4516    0.0001      .              .
 MARRIED    1   0.9931    0.0624     253.2457    0.0001      0.270311       2.700
 Y84        1   0.00668   0.0801       0.0070    0.9335      0.001582       1.007
 Y94        1  -0.2204    0.0699       9.9348    0.0016     -0.060762       0.802

                             (1,2) vs. 3

               Parameter  Standard   Wald        Pr >        Standardized   Odds
 Variable  DF  Estimate   Error      Chi-Square  Chi-Square  Estimate       Ratio
 INTERCPT   1   1.2511    0.0915     186.8473    0.0001      .              .
 MARRIED    1   1.0109    0.0842     144.1700    0.0001      0.275159       2.748
 Y84        1   0.1928    0.1140       2.8603    0.0908      0.045643       1.213
 Y94        1   0.3037    0.0996       9.3083    0.0023      0.083737       1.355
```