10.5. Loglinear Models for Square, Ordered Tables

Loglinear models have been particularly popular for the analysis of mobility tables like the one shown in Table 10.2, which is based on data collected in Britain by Glass (1954) and his collaborators. The column variable is the respondent’s occupational status, classified into five categories with 1 being the highest and 5 the lowest. The same classification was used for the father’s occupation, the row variable in the table. For obvious reasons, we call this a square table. Our aim is to fit loglinear models that describe the relationship between father’s status and son’s status. In the process, we’ll see how to estimate some models proposed by Goodman (1970).

Table 10.2. Cross-Classification of Respondent’s Occupational Status by Father’s Occupational Status, 3,497 British Males.

                            Son’s Status
Father’s Status       1      2      3      4      5
       1             50     45      8     18      8
       2             28    174     84    154     55
       3             11     78    110    223     96
       4             14    150    185    714    447
       5              0     42     72    320    411

One model that nearly every researcher knows how to fit (but may not realize it) is the independence model, which asserts that the two variables are independent. Here is SAS code to fit the independence model:

DATA mobility;
    INPUT n dad son;
    DATALINES;
50  1 1
45  1 2
8   1 3
18  1 4
8   1 5
28  2 1
174 2 2
84  2 3
154 2 4
55  2 5
11  3 1
78  3 2
110 3 3
223 3 4
96  3 5
14  4 1
150 4 2
185 4 3
714 4 4
447 4 5
0   5 1
42  5 2
72  5 3
320 5 4
411 5 5
;
PROC GENMOD DATA=mobility;
  CLASS dad son;
  MODEL n = dad son / D=P;
RUN;

As usual, each cell of the table is read in as a separate record, with the variable N containing the frequency counts and with values of 1 through 5 for the row and column variables, DAD and SON. In PROC GENMOD, these are declared as CLASS variables so that, at this point, no ordering of the categories is assumed. The MODEL statement includes the main effects of DAD and SON but no interaction. This allows for variation in the marginal frequencies but doesn’t allow for any relationship between the two variables. Not surprisingly, the fit of this model is terrible. With 16 d.f. (25 cells minus 9 fitted parameters), the deviance is 810.98 and the Pearson chi-square is 1199.36. (It’s hardly worth the effort of calculating p-values because they would obviously be smaller than any sensible criterion.) Note that the Pearson chi-square is the same value that would be obtained by traditional methods of computing chi-square in a two-way table and the same value that is reported by PROC FREQ under the CHISQ option.
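The equivalence with PROC FREQ is easy to check. Here is a minimal sketch, assuming the MOBILITY data set created above; the WEIGHT statement tells PROC FREQ that each record is a cell count rather than a single respondent.

```sas
/* Cross-tabulate the cell counts and request the chi-square tests */
PROC FREQ DATA=mobility;
  WEIGHT n;                 /* n holds the cell frequency */
  TABLES dad*son / CHISQ;   /* Pearson and likelihood ratio chi-square */
RUN;
```

The Pearson chi-square in this output should match the value from PROC GENMOD, and the likelihood ratio chi-square should match the deviance, both with 16 d.f.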

In rejecting the independence model, we conclude that there is indeed a relationship between father’s status and son’s status. But how can we modify the model to represent this relationship? We could fit the saturated model by the statement:

MODEL n=dad son dad*son / D=P;

but that wouldn’t accomplish much. The deviance and Pearson chi-square would both be 0, and we’d have estimates for 16 parameters describing the relationship between the two variables. We might as well just look at the original table. Can’t we get something more parsimonious?

The first alternative model that Goodman considered is the quasi-independence model, also called the quasi-perfect mobility model when applied to a mobility table. This model takes note of the fact that the main diagonal cells in Table 10.2 tend to have relatively high frequency counts. We might explain this by postulating a process of occupational inheritance such that sons take up the same occupation as the father. The quasi-independence model allows for such inheritance but asserts that there is no additional relationship between father’s status and son’s status. That is, if the son doesn’t have the same status as the father, then father’s status doesn’t tell us anything about son’s status.

There are two ways to fit the quasi-independence model. One way is to include a separate parameter for each of the main diagonal cells. The other, equivalent way (which we will take) is to simply delete the main diagonal cells from the data being fitted. Here’s how:

PROC GENMOD DATA=mobility;
  WHERE dad NE son;
  CLASS dad son;
  MODEL n = dad son / D=P;
RUN;

In the WHERE statement, NE means “not equal to.”
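For comparison, the first approach (a separate parameter for each main diagonal cell) might be sketched as follows. The data set name QI and the variable DIAG are illustrative names, not part of the original example.

```sas
/* Assumption: QI and DIAG are hypothetical names chosen for this sketch */
DATA qi;
  SET mobility;
  IF dad = son THEN diag = dad;   /* one level per main diagonal cell */
  ELSE diag = 0;                  /* all off-diagonal cells share level 0 */
RUN;

PROC GENMOD DATA=qi;
  CLASS dad son diag;
  MODEL n = dad son diag / D=P;   /* all 25 cells are kept */
RUN;
```

Because each diagonal cell gets its own parameter, those cells are fitted exactly and contribute nothing to the deviance, so the fit statistics and degrees of freedom should agree with those from the WHERE approach.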

Although the quasi-independence model fits much better than the independence model, the fit is still bad. With 11 d.f., the deviance is 249.4 and the Pearson chi-square is 328.7. We conclude that although 69% of the original deviance, (810.98 − 249.4)/810.98, is attributable to the main diagonal cells, something else is going on in the table besides status inheritance.

To represent that something else, Goodman proposed 21 other models as possible candidates. Let’s consider two of them. The QPN model is based on the ordering of occupational status. In Table 10.2, the upper triangle represents downward mobility while the lower triangle represents upward mobility. The QPN model says that, besides ignoring the main diagonal cells, there is independence within each of these two triangles. To fit the model, we create a new variable distinguishing the two portions of the table:

DATA b;
  SET mobility;
  up = son GT dad;   /* 1 above the main diagonal (son's code exceeds father's), 0 otherwise */
RUN;
PROC GENMOD DATA=b;
  WHERE dad NE son;
  CLASS dad son;
  MODEL n = dad son son*up dad*up / D=P;
RUN;

We get a considerable improvement in fit from this model (output not shown). With 6 d.f., the deviance is 14.0 (p=.03) and the Pearson chi-square is 9.9 (p=.13). While the p-value for the deviance is below the .05 criterion, keep in mind that we are working with a rather large sample and even minor deviations from the model are likely to be statistically significant. What is the substantive interpretation of this model? In addition to allowing for status inheritance (by deleting the main diagonal), it seems to be saying that father’s status could affect whether the son moves up or down but does not determine the exact destination.
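PROC GENMOD reports the fit statistics but not their p-values in the goodness-of-fit table. A short DATA step using the PROBCHI function is one way to get them; the variable names here are illustrative.

```sas
/* Upper-tail chi-square probabilities for the QPN fit statistics */
DATA _NULL_;
  p_dev     = 1 - PROBCHI(14.0, 6);   /* deviance, 6 d.f. */
  p_pearson = 1 - PROBCHI(9.9, 6);    /* Pearson chi-square, 6 d.f. */
  PUT p_dev= 6.3 p_pearson= 6.3;      /* results are written to the log */
RUN;
```

The same step works for any of the models in this section once the statistic and degrees of freedom are replaced.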

Another of Goodman’s models is the diagonals parameter model. This model is motivated by the observation that the cells in Table 10.2 that are farther away from the main diagonal tend to have smaller frequencies. To represent this, we include a distinct parameter corresponding to each absolute difference between father’s status and son’s status:

DATA c;
  SET mobility;
  band = ABS(dad-son);   /* distance from the main diagonal: 0 through 4 */
RUN;
PROC GENMOD DATA=c;
  WHERE dad NE son;
  CLASS dad son band;
  MODEL n = dad son band / D=P;
RUN;

The results are shown in Output 10.9. The fit is not terrible but it’s not great either: the p-value for the likelihood ratio chi-square (the deviance) is .014. The only parameters of interest to us are those for BAND. We see that cells that are directly adjacent to the main diagonal (BAND=1) have frequencies that are estimated to be exp(2.4824), or about 12, times those in the off-diagonal corners (BAND=4, the reference category). For BAND=2 and BAND=3, the frequency counts are estimated to be about 7 times and 3 times those in the corners.

Output 10.9. Results from Fitting the Diagonals Parameter Model
           Criteria For Assessing Goodness Of Fit

    Criterion             DF         Value      Value/DF

    Deviance               8       19.0739        2.3842
    Scaled Deviance        8       19.0739        2.3842
    Pearson Chi-Square     8       15.9121        1.9890
    Scaled Pearson X2      8       15.9121        1.9890
    Log Likelihood         .     8473.7383             .

              Analysis Of Parameter Estimates

Parameter      DF    Estimate     Std Err   ChiSquare  Pr>Chi

INTERCEPT       1      3.0797      0.3680     70.0315  0.0001
DAD        1    1     -1.3915      0.1293    115.7369  0.0001
DAD        2    1     -0.2227      0.0802      7.7116  0.0055
DAD        3    1     -0.4620      0.0743     38.6411  0.0001
DAD        4    1      0.5450      0.0814     44.7860  0.0001
DAD        5    0      0.0000      0.0000           .       .
SON        1    1     -2.1280      0.1478    207.3948  0.0001
SON        2    1     -0.5790      0.0756     58.6140  0.0001
SON        3    1     -0.8890      0.0726    149.8054  0.0001
SON        4    1      0.2367      0.0768      9.4917  0.0021
SON        5    0      0.0000      0.0000           .       .
BAND       1    1      2.4824      0.3707     44.8335  0.0001
BAND       2    1      1.9539      0.3733     27.3900  0.0001
BAND       3    1      1.1482      0.3740      9.4258  0.0021
BAND       4    0      0.0000      0.0000           .       .
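Rather than exponentiating the BAND coefficients by hand, we could ask PROC GENMOD to do it. The following is a sketch that assumes a release of PROC GENMOD supporting the ESTIMATE statement with the EXP option; the labels are arbitrary.

```sas
PROC GENMOD DATA=c;
  WHERE dad NE son;
  CLASS dad son band;
  MODEL n = dad son band / D=P;
  /* Ratio implied by the BAND parameters: BAND=1 versus BAND=4 (the reference) */
  ESTIMATE 'band 1 vs 4' band 1 0 0 -1 / EXP;
  /* Ratios for neighboring bands */
  ESTIMATE 'band 1 vs 2' band 1 -1 0 0 / EXP;
  ESTIMATE 'band 2 vs 3' band 0 1 -1 0 / EXP;
RUN;
```

The first estimate should reproduce the factor of about 12 discussed above. From the coefficients in Output 10.9, the other two work out to roughly exp(2.4824 − 1.9539) = 1.7 and exp(1.9539 − 1.1482) = 2.2, and the ESTIMATE output adds confidence limits for each ratio.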

Before concluding our analysis, let’s consider one additional model that was not discussed in Goodman’s article, the quasi-uniform association model. This model incorporates the ordering of the occupational statuses in a very explicit manner. The conventional uniform association model says that if the cells in the table are properly ordered, then the odds ratio (cross-product ratio) in every 2 × 2 subtable of adjacent cells is exactly the same. Our modification of that model is to delete the main diagonal cells before we fit it. Here’s the SAS code:

DATA d;
  SET mobility;
  sonq = son;   /* numeric copies, left out of CLASS so they enter as quantitative scores */
  dadq = dad;
RUN;
PROC GENMOD DATA=d;
  WHERE dad NE son;
  CLASS dad son;
  MODEL n = dad son sonq*dadq / D=P OBSTATS;
RUN;

As in some earlier examples, we define new versions of the row and column variables so that we can treat those variables as both categorical and quantitative. The model includes the main effects of SON and DAD as categorical, thereby ensuring that the marginal frequencies are fitted exactly. The relationship between the two variables is specified by an interaction between the quantitative versions of the two variables. The OBSTATS option requests predicted values and residuals.

With 10 d.f., the quasi-uniform association model has a deviance of 19.3 (p=.04) and a Pearson chi-square of 16.2 (p=.09) (output not shown). While not quite as good a fit as the QPN model, it’s still decent for a sample of this size, and it has the virtue of providing a single number to describe the association between father’s status and son’s status in the off-diagonal cells: the coefficient for SONQ*DADQ, which is .3374. Exponentiating, we find that the estimated odds ratio in any 2 × 2 subtable of adjacent cells is 1.40. Thus for adjacent categories, being in the higher category for DAD increases the odds of being in the higher category for SON by 40%.
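Because the association term is the product of the two scores, the log odds ratio for a 2 × 2 subtable whose rows are r categories apart and whose columns are c categories apart is .3374 × r × c. Here is a minimal sketch that tabulates the implied odds ratios from the estimate quoted above; the variable names are illustrative.

```sas
/* Odds ratios implied by the quasi-uniform association estimate */
DATA _NULL_;
  b = 0.3374;                  /* coefficient of sonq*dadq quoted in the text */
  DO r = 1 TO 4;               /* distance between the two DAD categories */
    DO c = 1 TO 4;             /* distance between the two SON categories */
      odds_ratio = EXP(b * r * c);
      PUT r= c= odds_ratio= 8.2;
    END;
  END;
RUN;
```

The line for r=1 and c=1 gives the 1.40 reported above. Keep in mind that the model was fitted only to the off-diagonal cells, so these ratios describe subtables made up of off-diagonal cells.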

Comparing the observed and predicted frequencies in Output 10.10, we find that the numbers are close for most of the cells. (For the three lines shown in boldface, the observed frequency lies outside the 95% confidence interval based on the fitted model.)

Output 10.10. Selected OBSTATS Output for Quasi-Uniform Association Model
  N       Pred       Lower      Upper

 45    36.2948     28.0890    46.8979
  8    16.1264     12.6337    20.5846
 18    21.3776     16.3270    27.9904
  8     5.2014      3.6963     7.3195
 28    24.7939     18.3938    33.4210
 84    84.8482     72.6289    99.1233
154   157.6170    138.6531   179.1746
 55    53.7409     45.1977    63.8990
 11    11.8078      8.8657    15.7262
 78    90.9439     78.3197   105.6030
223   206.5569    184.0375   231.8317
 96    98.6915     85.9194   113.3623
 14    13.9113     10.1391    19.0870
150   150.1456    131.5876   171.3208
185   183.5770    162.5983   207.2623
447   448.3662    410.6894   489.4994
  0     2.4871      1.6884     3.6636
 42    37.6157     31.1354    45.4449
 72    64.4486     55.1532    75.3105
320   329.4487    297.8315   364.4223
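To find the poorly fitted cells without scanning the listing, we might save the predicted values and confidence limits and flag the cells ourselves. This sketch assumes a release of PROC GENMOD that supports the OUTPUT statement; the data set names FIT and FLAGGED and the variable OUTSIDE are illustrative.

```sas
/* Save predicted counts and 95% confidence limits for each off-diagonal cell */
PROC GENMOD DATA=d;
  WHERE dad NE son;
  CLASS dad son;
  MODEL n = dad son sonq*dadq / D=P;
  OUTPUT OUT=fit PRED=pred LOWER=lower UPPER=upper;
RUN;

DATA flagged;
  SET fit;
  outside = (n < lower) OR (n > upper);   /* 1 if the observed count falls outside the limits */
RUN;

PROC PRINT DATA=flagged;
  WHERE outside = 1;
  VAR dad son n pred lower upper;
RUN;
```

Printing DAD and SON alongside the counts also identifies which cell each line of Output 10.10 refers to.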
