Testing independence of proportions

Remember the University of California, Berkeley (UCB) dataset that we first saw when discussing the relationship between two categorical variables in Chapter 3, Describing Relationships? Recall that UCB was sued because it appeared that the admissions department showed preferential treatment to male applicants. We also used cross-tabulation to compare the proportion of admissions across categories.

If admission rates were, say, 10%, you would expect about one out of every ten applicants to be accepted regardless of gender. If this is the case (that is, if gender has no bearing on the proportion of admits), then gender and admission are independent.
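
To make the "same proportion regardless of gender" idea concrete, here is a minimal sketch using the built-in UCBAdmissions table. It collapses over departments with margin.table() and then uses prop.table() to get the share admitted within each gender. Under perfect independence, both genders' admitted shares would sit at the overall admission rate, give or take noise. The object names (gender.admit, admit.rates, overall.rate) are just illustrative.

```r
# UCBAdmissions ships with base R (datasets package): Admit x Gender x Dept
# Collapse over departments, keeping Gender (rows) and Admit (columns)
gender.admit <- margin.table(UCBAdmissions, margin = c(2, 1))

# Row proportions: share admitted and rejected within each gender
admit.rates <- prop.table(gender.admit, margin = 1)
round(admit.rates, 3)

# Overall admission rate, ignoring gender entirely
overall.rate <- sum(gender.admit[, "Admitted"]) / sum(gender.admit)
round(overall.rate, 3)
```

If gender were irrelevant, the "Admitted" column of admit.rates would show roughly the same value in both rows, close to overall.rate.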

Small deviations from this 10% proportion are, of course, to be expected in the real world and are not necessarily indicative of a sexist admissions machine. However, if a test of independence of proportions is significant, that indicates that a deviation as extreme as the one we observed would be very unlikely to occur if the variables were truly independent.
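
To get a feel for how large "small deviations" can be when gender truly has no effect, here is a quick simulation sketch. It assumes a hypothetical 10% admission rate and hypothetical pools of 1,000 male and 1,000 female applicants (none of these figures come from the UCB data). It draws admissions for both groups from the same rate and records the gap between the two observed admission proportions.

```r
set.seed(1)  # make the simulation reproducible

# Hypothetical inputs, not taken from the UCB data
n.men   <- 1000
n.women <- 1000
rate    <- 0.10  # same true admission rate for both genders

# Each replicate: observed male rate minus observed female rate
gaps <- replicate(5000, {
  rbinom(1, n.men, rate) / n.men - rbinom(1, n.women, rate) / n.women
})

# The gaps centre on zero, but individual samples still deviate from it
round(mean(gaps), 4)
round(quantile(gaps, c(0.025, 0.975)), 3)
```

With these assumed numbers, gaps of two or three percentage points in either direction turn up routinely even though the true rates are identical. A significance test formalizes the question of whether an observed gap is larger than this kind of chance variation.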

A test statistic that captures divergence from an idealized, perfectly independent cross-tabulation is the chi-squared statistic (χ² statistic), and its sampling distribution is known as the chi-square distribution. If our chi-square statistic falls into the critical region of the chi-square distribution with the appropriate degrees of freedom, then we reject the hypothesis that gender and admission are independent.
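
For reference, here is the standard form of the statistic. The notation is introduced here and is not from the original text. Let O_ij be the observed count in row i and column j of an r × c cross-tabulation, R_i and C_j the row and column totals, and N the grand total. The expected count under independence, Pearson's statistic, and its degrees of freedom are then:

```latex
E_{ij} = \frac{R_i \, C_j}{N}, \qquad
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad
\mathrm{df} = (r - 1)(c - 1)
```

For a 2 × 2 table such as gender by admission, df = 1. R's chisq.test() applies Yates' continuity correction to 2 × 2 tables by default. This correction shrinks each |O_ij − E_ij| by up to 0.5 before squaring, so the reported value is slightly smaller than the plain formula gives.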

Let's perform one of these chi-square tests on the whole UCB Admissions dataset.

  > # The chi-square test function takes a cross-tabulation,
  > # which UCBAdmissions already is. I am converting it to a
  > # data frame and back so that you, dear reader, can learn
  > # how to do this with other data that isn't already in
  > # cross-tabulation form
  > ucba <- as.data.frame(UCBAdmissions)
  > head(ucba)
       Admit Gender Dept Freq
  1 Admitted   Male    A  512
  2 Rejected   Male    A  313
  3 Admitted Female    A   89
  4 Rejected Female    A   19
  5 Admitted   Male    B  353
  6 Rejected   Male    B  207
  >
  > # create cross-tabulation
  > cross.tab <- xtabs(Freq ~ Gender+Admit, data=ucba)
  >
  > chisq.test(cross.tab)
  
          Pearson's Chi-squared test with Yates' continuity correction
  
  data:  cross.tab
  X-squared = 91.6096, df = 1, p-value < 2.2e-16
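
To see where that number comes from, here is a sketch that computes Pearson's statistic by hand from the expected counts (row total × column total / grand total). It then checks the result against chisq.test() with the Yates correction turned off (correct = FALSE), because the hand computation uses the uncorrected formula. The names expected, chi.sq, dof, and p.value are illustrative.

```r
# Rebuild the gender-by-admission cross-tabulation
ucba <- as.data.frame(UCBAdmissions)
cross.tab <- xtabs(Freq ~ Gender + Admit, data = ucba)

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(cross.tab), colSums(cross.tab)) / sum(cross.tab)

# Pearson's statistic without any continuity correction
chi.sq  <- sum((cross.tab - expected)^2 / expected)
dof     <- (nrow(cross.tab) - 1) * (ncol(cross.tab) - 1)
p.value <- pchisq(chi.sq, df = dof, lower.tail = FALSE)

# Compare with chisq.test() with the Yates correction switched off
uncorrected <- chisq.test(cross.tab, correct = FALSE)
c(manual = chi.sq, chisq.test = unname(uncorrected$statistic))
```

The Yates-corrected value from plain chisq.test(cross.tab), as in the transcript, comes out slightly smaller than chi.sq. Both are far into the critical region for df = 1.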

Gender and admission status are almost certainly not independent (p < .0001). Before you conclude that the admissions department is sexist, though, remember Simpson's Paradox. If you don't, reread the relevant section in Chapter 3, Describing Relationships.
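
One quick way to see why the pooled result can mislead is to compute each gender's admission rate separately within each department. This is a sketch on the built-in table. Setting margin = c(2, 3) makes the Admitted and Rejected shares sum to 1 within every gender-department cell. The names dept.rates and admitted.share are illustrative.

```r
# Admission proportions within each Gender x Dept combination
dept.rates <- prop.table(UCBAdmissions, margin = c(2, 3))

# Keep only the admitted share: a Gender x Dept matrix
admitted.share <- dept.rates["Admitted", , ]
round(admitted.share, 2)
```

In several departments (department A among them) women were admitted at a higher rate than men, even though the pooled table favors men.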

Since the chi-square test of independence of proportions can be (and often is) used to compare a whole mess of proportions at once, it's sometimes referred to as an omnibus test, just like the ANOVA. It doesn't tell us which proportions are significantly discrepant, only that some of them are.
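
As a sketch of the omnibus idea, the same chisq.test() call can compare admission proportions across all six departments in one go. The single p-value says only that at least one department's admission rate differs from the others. Looking at the row proportions afterward is a descriptive follow-up, not a formal post hoc test. The names dept.tab and dept.test are illustrative.

```r
ucba <- as.data.frame(UCBAdmissions)

# Six departments by two outcomes: a 6 x 2 cross-tabulation
dept.tab <- xtabs(Freq ~ Dept + Admit, data = ucba)

# One omnibus test across all six admission proportions
# (no Yates correction is applied, since the table is not 2 x 2)
dept.test <- chisq.test(dept.tab)
dept.test$parameter  # degrees of freedom: (6 - 1) * (2 - 1) = 5

# The test only says *some* departments differ; the proportions hint at which
round(prop.table(dept.tab, margin = 1), 2)
```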
