10.2. A Loglinear Model for a 2 × 2 Table

Let’s start with the 2 × 2 table we analyzed in Chapter 4 (reproduced here as Table 10.1), which shows sentence by race for 147 death penalty cases. How can we represent this by a loglinear model?

Table 10.1. Death Sentence by Race of Defendant
           Blacks   Nonblacks   Total
Death        28        22         50
Life         45        52         97
Total        73        74        147

Let’s consider the table in more general form as

m11   m12
m21   m22

where mij is the expected number of cases falling into row i and column j. What I mean by this is that if n is the sample size and pij is the probability of falling into cell (i, j), then mij=npij.

There are a couple of different but equivalent ways of writing a loglinear model for these four frequency counts. The way I’m going to do it is consistent with how PROC GENMOD estimates the model. Let Ri be a dummy variable for the rows, having a value of 1 if i=1 and 0 if i=2. Similarly, let Cj be a dummy variable for the columns with a value of 1 if j=1 and 0 if j=2. We can then write the “saturated” loglinear model for this 2 × 2 table of frequency counts as:

Equation 10.1

$$\log m_{ij} = \beta_0 + \beta_1 R_i + \beta_2 C_j + \beta_3 R_i C_j$$

Note that RiCj is the product (interaction) of Ri and Cj. Equation (10.1) actually represents four different equations:

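Equation 10.2

$$
\begin{aligned}
\log m_{11} &= \beta_0 + \beta_1 + \beta_2 + \beta_3 \\
\log m_{12} &= \beta_0 + \beta_1 \\
\log m_{21} &= \beta_0 + \beta_2 \\
\log m_{22} &= \beta_0
\end{aligned}
$$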

In a moment, we’ll see how to estimate the four β parameters by using PROC GENMOD. But before doing that, let’s talk about why we might be interested in these parameters at all. All we’ve done in equation (10.2) is transform the four expected frequencies into four different quantities. The reason for doing this is that the β’s show us things about the table that aren’t so easily seen by looking at the frequency counts.

We are particularly interested in β3, the coefficient for the interaction term. If we solve equation (10.2) for β3, we get:

Equation 10.3

$$\beta_3 = \log\left(\frac{m_{11}\, m_{22}}{m_{12}\, m_{21}}\right)$$

The quantity in parentheses is the cross-product ratio. In Chapter 2, we saw that the cross-product ratio is equal to the odds ratio. In this case, it’s the ratio of the odds of a death sentence for blacks to the odds of a death sentence for nonblacks. Recall that an odds ratio of 1.0 corresponds to independence between the two variables. Because the logarithm of 1 is 0, independence of the row and column variables is equivalent to β3=0. So, we can test whether the two variables are independent by testing whether β3=0.
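To make the link between the cross-product ratio, the odds ratio, and independence concrete, here is a minimal sketch in Python. Python serves only as a calculator here and is not part of the SAS analysis; the function names are my own illustrative choices. The sketch computes both ratios for Table 10.1, then builds the table of expected counts under independence (row total times column total over n) and confirms that its cross-product ratio is 1.

```python
import math

# Table 10.1: rows are death, life; columns are blacks, nonblacks.
observed = [[28, 22], [45, 52]]

def cross_product_ratio(t):
    """(t11 * t22) / (t12 * t21) for a 2 x 2 table."""
    return (t[0][0] * t[1][1]) / (t[0][1] * t[1][0])

def odds_ratio(t):
    """Odds of row 1 in column 1 divided by odds of row 1 in column 2."""
    odds_col1 = t[0][0] / t[1][0]
    odds_col2 = t[0][1] / t[1][1]
    return odds_col1 / odds_col2

def independence_table(t):
    """Expected counts when rows and columns are independent."""
    row = [sum(r) for r in t]
    col = [t[0][j] + t[1][j] for j in range(2)]
    total = sum(row)
    return [[row[i] * col[j] / total for j in range(2)] for i in range(2)]

expected = independence_table(observed)

print("cross-product ratio:", round(cross_product_ratio(observed), 4))
print("odds ratio:         ", round(odds_ratio(observed), 4))
print("log ratio under independence:",
      round(math.log(cross_product_ratio(expected)), 10))
```

The two ratios agree (about 1.47 for these data), and the logarithm of the ratio for the independence table is 0 up to rounding error, which is the sense in which independence corresponds to β3=0.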

All this has been expressed in terms of expected frequency counts, the mij. What about the observed frequency counts, which I’ll denote by nij? For the model we’ve just been considering, the maximum likelihood estimator of β3 has the same form as equation (10.3) except that expected frequencies are replaced by observed frequencies:

Equation 10.4

$$\hat{\beta}_3 = \log\left(\frac{n_{11}\, n_{22}}{n_{12}\, n_{21}}\right)$$

For the data in Table 10.1, the estimate of β3 is log[(28 × 52)/(22 × 45)] = log(1.4707) = .3857. Similar expressions can readily be obtained for the other β parameters in the model, but we’re not usually very interested in them.
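For readers who want to see all four hand calculations at once, the following Python sketch solves the four equations in (10.2) with observed counts in place of expected ones. It is only a cross-check under that substitution, not the way GENMOD computes the estimates, and the names `n` and `saturated_betas` are hypothetical.

```python
import math

# Observed counts from Table 10.1, keyed by (i, j):
# row i: 1 = death, 2 = life; column j: 1 = blacks, 2 = nonblacks.
n = {(1, 1): 28, (1, 2): 22, (2, 1): 45, (2, 2): 52}

def saturated_betas(n):
    """Solve log n_ij = b0 + b1*R_i + b2*C_j + b3*R_i*C_j for the four b's."""
    b0 = math.log(n[(2, 2)])          # cell with R = 0, C = 0
    b1 = math.log(n[(1, 2)]) - b0     # cell with R = 1, C = 0
    b2 = math.log(n[(2, 1)]) - b0     # cell with R = 0, C = 1
    # Interaction: log of the cross-product ratio.
    b3 = math.log(n[(1, 1)] * n[(2, 2)] / (n[(1, 2)] * n[(2, 1)]))
    return b0, b1, b2, b3

b0, b1, b2, b3 = saturated_betas(n)
print([round(b, 4) for b in (b0, b1, b2, b3)])
# prints [3.9512, -0.8602, -0.1446, 0.3857]

# Round trip: the four parameters reproduce the four counts.
fitted_11 = math.exp(b0 + b1 + b2 + b3)
print(round(fitted_11, 6))
```

Because there are four parameters for four cells, plugging the estimates back into the model returns the original counts (28 for the first cell, up to floating-point error).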

Although it’s simple enough in this case to get all the maximum likelihood estimates by hand calculations, we can also do it with PROC GENMOD. Here is the SAS code:

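DATA penalty;
  /* One record per cell: n = count, death = 1 for a death sentence, black = 1 for a black defendant */
  INPUT n death black;
  DATALINES;
28 1 1
22 1 0
45 0 1
52 0 0
;
/* Saturated loglinear model: both main effects plus their interaction */
PROC GENMOD DATA=penalty;
  MODEL n = death black death*black / DIST=POISSON;
RUN;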

Each cell in the table is a separate record in the data set. DEATH is a dummy variable for a death sentence, and BLACK is a dummy variable for race. The MODEL statement includes the main effects of row and column, as well as their interaction. The DIST option in the MODEL statement says that each frequency count has a Poisson distribution whose expected value mij is given by equation (10.2). For frequency counts in contingency tables, the Poisson distribution is appropriate for a variety of different sampling designs. As we saw in Chapter 9, the default in GENMOD for a Poisson distribution is LINK=LOG. That means that the logarithm of the expected value of the dependent variable is assumed to be a linear function of the explanatory variables, which is exactly what equation (10.1) says.

Output 10.1. GENMOD Output for Loglinear Model for a 2 × 2 Table
       Data Set                        WORK.PENALTY
       Distribution                    POISSON
       Link Function                   LOG
       Dependent Variable              N
       Observations Used               4

          Criteria For Assessing Goodness Of Fit

   Criterion             DF         Value      Value/DF

   Deviance               0        0.0000             .
   Scaled Deviance        0        0.0000             .
   Pearson Chi-Square     0        0.0000             .
   Scaled Pearson X2      0        0.0000             .
   Log Likelihood         .      391.0691             .


              Analysis Of Parameter Estimates

Parameter    DF    Estimate     Std Err   ChiSquare  Pr>Chi

INTERCEPT     1      3.9512      0.1387    811.8410  0.0001
DEATH         1     -0.8602      0.2543     11.4392  0.0007
BLACK         1     -0.1446      0.2036      0.5043  0.4776
DEATH*BLACK   1      0.3857      0.3502      1.2135  0.2706
SCALE         0      1.0000      0.0000           .       .

NOTE:  The scale parameter was held fixed.

Examining Output 10.1, we see that the deviance for this model is 0. That’s because it’s a saturated model; there are four estimated parameters for the four cells in the table, so the model perfectly reproduces the frequency counts. The estimate for the interaction is .3857, the same value obtained from hand calculation of equation (10.4). This estimate has an associated Wald chi-square of 1.214, which is nearly identical to the traditional Pearson chi-square (1.218) for testing whether the two variables are independent (obtained with PROC FREQ). That’s not surprising because both statistics are testing the same null hypothesis.
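The two chi-square statistics mentioned above can be reproduced with a few lines of arithmetic. The Python sketch below is a cross-check only; it assumes the usual large-sample standard error for a log odds ratio, the square root of the sum of the reciprocal cell counts, and the standard Pearson formula for a 2 × 2 table without a continuity correction. The variable names are illustrative.

```python
import math

# Cell counts from Table 10.1: a = death/black, b = death/nonblack,
# c = life/black, d = life/nonblack.
a, b, c, d = 28, 22, 45, 52

# Wald test for the interaction: the estimate is the log odds ratio, and
# its large-sample standard error is sqrt(1/a + 1/b + 1/c + 1/d).
log_or = math.log(a * d / (b * c))
se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
wald = (log_or / se) ** 2

# Pearson chi-square for a 2 x 2 table (no continuity correction).
total = a + b + c + d
pearson = total * (a * d - b * c) ** 2 / (
    (a + b) * (c + d) * (a + c) * (b + d)
)

print(f"SE = {se:.4f}, Wald = {wald:.4f}, Pearson = {pearson:.4f}")
```

The standard error and Wald statistic agree with the DEATH*BLACK row of Output 10.1 to the printed precision, and the Pearson value rounds to 1.218.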

Instead of a loglinear model, we could estimate a logit model for this table, taking death sentence as the dependent variable:

PROC GENMOD DATA=penalty;
  WEIGHT n;
  MODEL death=black / D=BINOMIAL LINK=LOGIT;
RUN;

This produces Output 10.2.

Output 10.2. GENMOD Output for Logit Model for 2 × 2 Table
            Analysis Of Parameter Estimates

Parameter  DF    Estimate     Std Err   ChiSquare  Pr>Chi

INTERCEPT   1     -0.8602      0.2543     11.4392  0.0007
BLACK       1      0.3857      0.3502      1.2135  0.2706

Remarkably, the estimates in Output 10.2 are identical to some of the estimates in Output 10.1. The coefficient for BLACK in the logit model is the same as the DEATH*BLACK coefficient in the loglinear model, along with identical standard errors and chi-squares. Similarly, the intercept (and associated statistics) in the logit model is the same as the DEATH coefficient in the loglinear model. This is a general phenomenon. As I’ve said before, every logit model for a contingency table has a corresponding loglinear model. But the main effects in the logit model become 2-way interactions (with the dependent variable) in the loglinear model. The intercept in the logit model becomes a main effect of the dependent variable in the loglinear model.

Now let’s do something peculiar just to emphasize the point. We’ll estimate another logit model, but we’ll switch the variables and make BLACK the dependent variable and DEATH the independent variable:

PROC GENMOD DATA=penalty;
  WEIGHT n;
  MODEL black=death / D=BINOMIAL LINK=LOGIT;
RUN;

Output 10.3. GENMOD Output for Logit Model for 2 × 2 Table with Variables Reversed
           Analysis Of Parameter Estimates

Parameter  DF    Estimate     Std Err   ChiSquare  Pr>Chi

INTERCEPT   1     -0.1446      0.2036      0.5043  0.4776
DEATH       1      0.3857      0.3502      1.2135  0.2706

In Output 10.3, we see again that the slope coefficient (along with its standard error and chi-square) is identical to the two-way interaction in the loglinear model. Unlike bivariate linear regression, where the regression line changes when you reverse the dependent and independent variables, the bivariate logit model is symmetrical with respect to the two variables. We also see that the intercept term corresponds to the main effect of BLACK in the loglinear model.
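The symmetry is easy to confirm directly from the cell counts, because with a single dummy predictor the logit estimates are just observed log-odds and their differences. The following Python sketch is a hand calculation under that assumption rather than a fitted model, and the helper name `log_odds` is hypothetical.

```python
import math

# Counts from Table 10.1 keyed by (death, black).
n = {(1, 1): 28, (1, 0): 22, (0, 1): 45, (0, 0): 52}

def log_odds(successes, failures):
    """Log of the odds successes/failures."""
    return math.log(successes / failures)

# DEATH as the outcome, BLACK as the predictor.
intercept_death = log_odds(n[(1, 0)], n[(0, 0)])                 # BLACK = 0
slope_black = log_odds(n[(1, 1)], n[(0, 1)]) - intercept_death   # BLACK = 1 minus BLACK = 0

# BLACK as the outcome, DEATH as the predictor.
intercept_black = log_odds(n[(0, 1)], n[(0, 0)])                 # DEATH = 0
slope_death = log_odds(n[(1, 1)], n[(1, 0)]) - intercept_black   # DEATH = 1 minus DEATH = 0

print(round(intercept_death, 4), round(slope_black, 4))
print(round(intercept_black, 4), round(slope_death, 4))
```

Both slopes come out to the same log odds ratio (.3857), while the two intercepts differ and match the DEATH and BLACK main effects in Output 10.1.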

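Why do the logit models share parameters with the loglinear model? The algebra that demonstrates this is really quite simple. The dependent variable in the logit model is the logarithm of the odds. For example, the log-odds of a death sentence for racial group j is log(m1j/m2j). Substituting from the loglinear model in equation (10.1), we get:

$$\log\left(\frac{m_{1j}}{m_{2j}}\right) = \log m_{1j} - \log m_{2j} = (\beta_0 + \beta_1 R_1 + \beta_2 C_j + \beta_3 R_1 C_j) - (\beta_0 + \beta_1 R_2 + \beta_2 C_j + \beta_3 R_2 C_j)$$

But because R1=1 and R2=0, this reduces to:

$$\log\left(\frac{m_{1j}}{m_{2j}}\right) = \beta_1 + \beta_3 C_j$$

Similarly, if we reverse the independent and dependent variables, we obtain:

$$\log\left(\frac{m_{i1}}{m_{i2}}\right) = \beta_2 + \beta_3 R_i$$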

These results show that the loglinear model for the 2 × 2 table implies two logit models, one for the row variable as dependent on the column variable and the other for the column variable as dependent on the row variable.
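The algebra holds for any parameter values, not just the estimates for these data. As a quick numerical check, the Python sketch below picks arbitrary, hypothetical β values, generates the four expected frequencies from the loglinear model, and confirms that the row log-odds and column log-odds have the two logit forms. Nothing here comes from the SAS output.

```python
import math

# Hypothetical parameter values, chosen only to exercise the algebra.
b0, b1, b2, b3 = 1.0, -0.5, 0.25, 0.75

def dummy(index):
    """Dummy coding used in the text: 1 if the index is 1, 0 if it is 2."""
    return 1 if index == 1 else 0

def m(i, j):
    """Expected frequency for row i, column j under the loglinear model."""
    r, c = dummy(i), dummy(j)
    return math.exp(b0 + b1 * r + b2 * c + b3 * r * c)

# Row variable as dependent: log(m1j / m2j) should equal b1 + b3 * Cj.
row_ok = all(
    math.isclose(math.log(m(1, j) / m(2, j)), b1 + b3 * dummy(j))
    for j in (1, 2)
)

# Column variable as dependent: log(mi1 / mi2) should equal b2 + b3 * Ri.
col_ok = all(
    math.isclose(math.log(m(i, 1) / m(i, 2)), b2 + b3 * dummy(i))
    for i in (1, 2)
)

print(row_ok, col_ok)
```

In both logit forms the intercept β0 of the loglinear model drops out, and so does the main effect of whichever variable is treated as the predictor.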
