Chapter 14

Tests on contingency tables

In this chapter we deal with the question of whether there is an association between two random variables. This question can be formulated in different ways: we can ask if the two random variables are independent, or test for homogeneity. The corresponding tests are presented in Section 14.1. These are foremost the well-known Fisher's exact test and Pearson's χ²-test. In Section 14.2 we test if two raters agree on their rating of the same issue. Section 14.3 deals with two risk measures, namely the odds ratio and the relative risk.

14.1 Tests on independence and homogeneity

In this section we deal with the two null hypotheses of independence and homogeneity. While a test of independence examines whether there is an association between two random variables, a test of homogeneity tests if the marginal proportions are the same for different random variables. The test problems in this chapter can be described for the homogeneity hypothesis as well as for the independence hypothesis.

14.1.1 Fisher's exact test

Description: Tests the hypothesis of independence or homogeneity in a 2×2 contingency table.
Assumptions:
  • Data are at least measured on a nominal scale with two possible categories, labeled as 1 and 2, for each of the two variables X and Y of interest.
  • The random sample follows a Poisson, Multinomial or Product-Multinomial sampling distribution.
  • A dataset of N observations is available and presented as a 2×2 contingency table.
Hypotheses: (A) H0: p11 = p1+ p+1 vs H1: p11 ≠ p1+ p+1
(B) H0: p11 ≤ p1+ p+1 vs H1: p11 > p1+ p+1
(C) H0: p11 ≥ p1+ p+1 vs H1: p11 < p1+ p+1
with pij = P(X = i, Y = j),
pi+ = P(X = i) and p+j = P(Y = j)
Test statistic: T = N11, the count in cell (1, 1)
Test decision: Reject H0 if for the observed value n11 of T
(A) P(T ≤ n11) ≤ α/2
or P(T ≥ n11) ≤ α/2
(B) P(T ≥ n11) ≤ α
(C) P(T ≤ n11) ≤ α
p-values: (A) p = Σ over all t with P(T = t) ≤ P(T = n11) of P(T = t)
(B) p = P(T ≥ n11)
(C) p = P(T ≤ n11)
with P(T = t) = C(n1+, t) C(n2+, n+1 − t) / C(N, n+1), the hypergeometric probability of t given the observed margins, where C(a, b) denotes the binomial coefficient
Annotations:
  • The test is based on the exact distribution of the test statistic T = N11 conditional on all marginal frequencies n1+, n2+, n+1, n+2, which is for all three sampling distributions the hypergeometric distribution. Given the marginal totals, T can take values from max(0, n1+ + n+1 − N) to min(n1+, n+1) (Agresti 1990).
  • This test has its origin in Fisher (1934, 1935) and Irwin (1935) and is also called the Fisher–Irwin test.
  • When testing for homogeneity let row variable X indicate to which of two populations each observation belongs. The test problem considers the probabilities to observe characteristic 1 of variable Y in the two populations, usually denoted by p1 and p2 for the two populations. Hence p1 = P(Y = 1|X = 1) and p2 = P(Y = 1|X = 2). Thereby we have the three test problems (A) H0: p1 = p2, (B) H0: p1 ≤ p2, and (C) H0: p1 ≥ p2. The test procedure is just the same as given above. All three hypotheses can also be expressed in terms of the odds ratio, see Agresti (1990) for details.
  • Fisher's exact test was originally developed for 2×2 tables. Freeman and Halton (1951) extended it to any I×J table and multinomial distributed random variables. This test is called the Freeman–Halton test as well as just Fisher's exact test like the original test.

Example
We test if there is an association between the malfunction of workpieces and which of two companies, A and B, produced them. A sample of N = 40 workpieces has been checked, coded 0 for functioning and 1 for defective (dataset in Table A.4).


SAS code
proc freq data=malfunction;
 tables company*malfunction /fisher;
run;
SAS output
         Fisher's Exact Test
---------------------------------------
Cell (1,1) Frequency (F)         9
Left-sided Pr <= F          0.0242
Right-sided Pr >= F         0.9960
Table Probability (P)       0.0202
Two-sided Pr <= P           0.0484
Remarks:
  • The procedure proc freq enables Fisher's exact test. After the tables statement the two variables must be specified, separated by a star (*).
  • The option fisher invokes Fisher's exact test. Alternatively the option chisq can be used, which also returns Fisher's exact test in the case of 2×2 tables.
  • Instead of using the raw data as in the example above, it is also possible to use the counts directly by constructing the 2×2 table as a dataset of cell counts and passing the frequencies via a weight statement:
    data counts;
     input r c counts;
     datalines;
     1 1 9
     1 2 11
     2 1 16
     2 2 4
     ;
    run;
    proc freq;
     tables r*c /fisher;
     weight counts;
    run;
    Here the first variable r holds the first index (the rows) and the second variable c holds the second index (the columns). The variable counts holds the frequencies for each cell. The weight statement indicates the variable that holds the frequencies.
  • SAS arranges the factors into the 2×2 table according to the (internal) order unless the weight method is used. The interpretation of the one-sided hypotheses (B) and (C) depends on the way the data are arranged in the table, so which table is finally analyzed needs to be checked carefully.


R code
# Read the two variables company and malfunction
x<-malfunction$company
y<-malfunction$malfunction
# Invoke the test
fisher.test(x,y,alternative="two.sided")
R output
Fisher's Exact Test for Count Data
data:  x and y
p-value = 0.04837
Remarks:
  • alternative="value" is optional and defines the type of alternative hypothesis: "two.sided" = two sided (A); "greater" = one sided (B); "less" = one sided (C). Default is "two.sided".
  • Instead of using the raw data as in the example above, it is also possible to use the counts directly by constructing a 2×2 table and handing this over to the function as first parameter:
    fisher.test(matrix(c(9,11,16,4), ncol = 2))
  • It is not clear how R arranges the factors into the 2×2 table if the "table" method is not used. For the two-sided hypothesis this does not matter, but for the directional hypotheses it is important. So in the latter case we recommend constructing the 2×2 table and handing it over to the function.
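As a cross-check outside SAS and R, the same conditional test can be sketched in Python, assuming SciPy is available; scipy.stats.fisher_exact handles the 2×2 case, with the directional alternatives referring to the odds ratio of the table as entered:

```python
from scipy.stats import fisher_exact

# 2x2 table of the malfunction example (cell (1,1) count is 9,
# matching the SAS output above)
table = [[9, 11], [16, 4]]

# Two-sided test, hypothesis (A)
odds_ratio, p_two = fisher_exact(table, alternative="two-sided")

# Directional tests condition on the same margins
_, p_less = fisher_exact(table, alternative="less")
_, p_greater = fisher_exact(table, alternative="greater")

print(round(p_two, 5), round(p_less, 4))
```

The two-sided p-value agrees with the R output above (0.04837), and the "less" alternative reproduces the SAS left-sided value of 0.0242.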

14.1.2 Pearson's χ²-test

Description: Tests the hypothesis of independence or homogeneity in a two-dimensional contingency table.
Assumptions:
  • Data are at least measured on a nominal scale with I and J possible outcomes of the two variables X and Y of interest.
  • The random sample follows a Poisson, Multinomial or Product-Multinomial sampling distribution.
  • A dataset of N observations is available and presented as an I×J contingency table.
Hypotheses: H0: X and Y are independent
vs H1: X and Y are not independent
Test statistic: χ² = Σi Σj (Nij − Eij)² / Eij
with Nij the random variable of cell counts of combination (i, j) and Eij = ni+ n+j / N the expected cell count.
Test decision: Reject H0 if for the observed value χ²0 of χ²
χ²0 > χ²(1−α; (I−1)(J−1))
p-values: p = 1 − P(χ²(I−1)(J−1) ≤ χ²0)
Annotations:
  • This test was introduced by Pearson (1900). Fisher (1922) corrected the degrees of freedom of this test, which Pearson incorrectly took to be IJ − 1.
  • The test problem can also be stated as: H0: pij = pi+ p+j for all (i, j) vs H1: pij ≠ pi+ p+j for at least one pair (i, j), i = 1, …, I, j = 1, …, J.
  • The test statistic χ² is asymptotically χ²-distributed with (I−1)(J−1) degrees of freedom.
  • χ²(1−α; (I−1)(J−1)) is the (1−α)-quantile of the χ²-distribution with (I−1)(J−1) degrees of freedom.
  • For 2×2 tables, Yates (1934) proposed a continuity correction for a better approximation to the χ²-distribution. In this case the test statistic is: χ² = Σi Σj (|Nij − Eij| − 0.5)² / Eij.
  • The expected frequency in each cell of the contingency table should be at least 5 to ensure the approximate χ²-distribution. If this condition is not fulfilled an alternative is Fisher's exact test (Test 14.1.1).
  • Special versions of this test are the χ² goodness-of-fit test (Test 12.2.1) and the K-sample binomial test (Test 4.3.1).
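As an illustration of the statistic and its Yates-corrected version, here is a short cross-check in Python, assuming SciPy; scipy.stats.chi2_contingency returns the statistic, p-value, degrees of freedom and expected counts:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Counts of the malfunction example
table = np.array([[9, 11], [16, 4]])

# Uncorrected Pearson statistic
stat, p, dof, expected = chi2_contingency(table, correction=False)

# With the Yates continuity correction (2x2 tables only)
stat_yates, p_yates, _, _ = chi2_contingency(table, correction=True)

print(round(stat, 4), round(stat_yates, 2), dof)
```

The values match the SAS output of the example below: 5.2267 uncorrected and 3.84 with the continuity correction, both on 1 degree of freedom; all expected counts (12.5 and 7.5) exceed 5, so the χ² approximation is adequate.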

Example
We test if there is an association between the malfunction of workpieces and which of two companies, A and B, produced them. A sample of N = 40 workpieces has been checked, coded 0 for functioning and 1 for defective (dataset in Table A.4).


SAS code
proc freq data=malfunction;
 tables company*malfunction /chisq;
run;
SAS output
     Statistics for Table of company by malfunction
Statistic                     DF       Value      Prob
------------------------------------------------------
Chi-Square                     1      5.2267    0.0222
Continuity Adj. Chi-Square     1      3.8400    0.0500
Remarks:
  • The procedure proc freq enables Pearson's χ²-test. Following the tables statement the two variables must be specified, separated by a star (*).
  • The option chisq invokes the test.
  • SAS prints the value and the p-value of the χ²-test statistic as well as of the Yates corrected χ²-test statistic.
  • Instead of using the raw data, it is also possible to use the counts directly. See Test 14.1.1 for details.


R code
# Read the two variables company and malfunction
x<-malfunction$company
y<-malfunction$malfunction
# Invoke the test
chisq.test(x,y,correct=TRUE)
R output
Pearson's Chi-squared test with Yates' continuity correction
data:  x and y
X-squared = 3.84, df = 1, p-value = 0.05004
Remarks:
  • correct="value" is optional and determines if Yates' continuity correction is used (value=TRUE) or not (value=FALSE). Default is TRUE.
  • Instead of using the raw data as in the example above, it is also possible to use the counts directly by constructing a 2×2 table and handing this over to the function as first parameter:
    chisq.test(matrix(c(9,11,16,4), ncol = 2))

14.1.3 Likelihood-ratio χ²-test

Description: Tests the hypothesis of independence or homogeneity in a two-dimensional contingency table.
Assumptions:
  • Data are at least measured on a nominal scale with I and J possible outcomes of the two variables X and Y of interest.
  • The random sample follows a Poisson, Multinomial or Product-Multinomial sampling distribution.
  • A dataset of N observations is available and presented as an I×J contingency table.
Hypotheses: H0: X and Y are independent
vs H1: X and Y are not independent
Test statistic: G² = 2 Σi Σj Nij ln(Nij / Eij)
with Nij the random variable of cell counts of combination (i, j) and Eij = ni+ n+j / N the expected cell count.
Test decision: Reject H0 if for the observed value g² of G²
g² > χ²(1−α; (I−1)(J−1))
p-values: p = 1 − P(χ²(I−1)(J−1) ≤ g²)
Annotations:
  • The test statistic G² is asymptotically χ²-distributed.
  • χ²(1−α; (I−1)(J−1)) is the (1−α)-quantile of the χ²-distribution with (I−1)(J−1) degrees of freedom.
  • This test is an alternative to Pearson's χ²-test (Test 14.1.2).
  • The approximation to the χ²-distribution is usually good if N/(IJ) exceeds 5. See Agresti (1990) for more details on this test.
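The statistic is easy to compute once the expected counts are known; a minimal Python sketch for the malfunction example, assuming SciPy for the tail probability:

```python
import numpy as np
from scipy.stats import chi2

observed = np.array([[9, 11], [16, 4]])
n = observed.sum()

# Expected counts under independence: e_ij = row_i * col_j / n
expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / n

# Likelihood-ratio statistic G^2 = 2 * sum o * ln(o / e)
g2 = 2 * np.sum(observed * np.log(observed / expected))

# p-value from the chi-square distribution with (I-1)(J-1) = 1 df
p_value = chi2.sf(g2, df=1)

print(round(g2, 4), round(p_value, 4))
```

This reproduces the values of the example below (G² = 5.3834, p = 0.0203).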

Example
We test if there is an association between the malfunction of workpieces and which of two companies, A and B, produced them. A sample of N = 40 workpieces has been checked, coded 0 for functioning and 1 for defective (dataset in Table A.4).


SAS code
proc freq data=malfunction;
 tables company*malfunction /chisq;
run;
SAS output
Statistics for Table of company by malfunction
Statistic                     DF       Value      Prob
------------------------------------------------------
Likelihood Ratio Chi-Square    1      5.3834    0.0203
Remarks:
  • The procedure proc freq enables the likelihood-ratio χ²-test. Following the tables statement the two variables must be specified, separated by a star (*).
  • The option chisq invokes the test.
  • Instead of using the raw data, it is also possible to use the counts directly. See Test 14.1.1 for details.


R code
# Read the two variables company and malfunction
x<-malfunction$company
y<-malfunction$malfunction
# Get the observed and expected cases
e<-chisq.test(x,y)$expected
o<-chisq.test(x,y)$observed
# Calculate the test statistic
g2<-2*sum(o*log(o/e))
# Get degrees of freedom from function chisq.test()
df<-chisq.test(x,y)$parameter
# Calculate the p-value
p_value<-1-pchisq(g2,df)
# Output results
cat("Likelihood-Ratio Chi-Square Test\n",
    "test statistic   ","p-value","\n",
    "--------------   ----------","\n",
    "  ",g2,"     ",p_value,"\n")
R output
Likelihood-Ratio Chi-Square Test
 test statistic    p-value
 --------------   ----------
    5.38341       0.02032911
Remarks:
  • There is no basic R function to calculate the likelihood-ratio χ²-test directly.
  • We used the R function chisq.test() to calculate the expected and observed observations as well as the degrees of freedom. See Test 14.1.2 for details on this function.

14.2 Tests on agreement and symmetry

Often categorical data are observed in so-called matched pairs, for example, as ratings of two raters on the same objects. Then it is of interest to analyze the agreement of the classification of objects into the categories. We present a test on the kappa coefficient, which is a measurement of agreement. Another question would be if the two raters classify objects into the same classes by the same proportion. For 2×2 tables the McNemar test is given, in which case the hypothesis of marginal homogeneity is equivalent to that of axial symmetry.

14.2.1 Test on Cohen's kappa

Description: Tests if the kappa coefficient, as a measure of agreement, differs from zero.
Assumptions:
  • Data are at least measured on a nominal scale.
  • Measurements are taken by letting two raters classify objects into k categories.
  • The raters make their judgement independently.
  • The two random variables X and Y describe the rating of the two raters for one subject, respectively, with the k categories as possible outcomes.
  • Data are summarized in a k×k contingency table counting the number of occurrences of the possible combinations of ratings in the sample.
  • A sample of size N is given, which follows the multinomial sampling scheme.
Hypotheses: (A) H0: κ = 0 vs H1: κ ≠ 0
(B) H0: κ ≤ 0 vs H1: κ > 0
(C) H0: κ ≥ 0 vs H1: κ < 0
where κ is the kappa coefficient
given by κ = (po − pe)/(1 − pe) with po = Σi pii the probability of agreement and pe = Σi pi+ p+i the probability of agreement by chance
Test statistic: Z = K / sqrt(Var0(K))
where K = (po − pe)/(1 − pe) is computed from the observed relative frequencies,
po = Σi nii / N, pe = Σi (ni+/N)(n+i/N)
and Var0(K) = [pe + pe² − Σi (ni+/N)(n+i/N)((ni+ + n+i)/N)] / (N (1 − pe)²)
Test decision: Reject H0 if for the observed value z of Z
(A) z ≤ z(α/2) or z ≥ z(1−α/2)
(B) z ≥ z(1−α)
(C) z ≤ z(α)
p-values: (A) p = 2 Φ(−|z|)
(B) p = 1 − Φ(z)
(C) p = Φ(z), with Φ the standard normal cumulative distribution function
Annotations:
  • The kappa coefficient was introduced by Cohen (1960) and is therefore known as Cohen's kappa.
  • K is under the null hypothesis asymptotically normally distributed with mean 0 and variance Var0(K).
  • In the case of a perfect agreement κ takes the value 1. It becomes 0 if the agreement is equal to that given by chance. A higher positive value indicates a stronger agreement, whereas negative values suggest that the agreement is weaker than expected by chance (Agresti 1990).
  • The above variance formula Var0(K) is different from the formula Cohen published. SAS uses the formula from Fleiss et al. (2003), which we present here.
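The computation of K and its null variance can be sketched as follows; Python is used here as a neutral illustration, and the 2×2 rating table is a made-up example, not the silicosis data:

```python
import math

# Hypothetical 2x2 rating table (rows: rater 1, columns: rater 2)
table = [[7, 3], [2, 8]]
n = sum(sum(row) for row in table)

# Marginal proportions and agreement probabilities
row = [sum(r) / n for r in table]            # row proportions
col = [sum(t) / n for t in zip(*table)]      # column proportions
po = sum(table[i][i] for i in range(2)) / n  # observed agreement
pe = sum(row[i] * col[i] for i in range(2))  # agreement by chance

# Kappa and its variance under H0 (Fleiss et al. 2003 formula)
kappa = (po - pe) / (1 - pe)
var0 = (pe + pe**2 - sum(row[i] * col[i] * (row[i] + col[i])
                         for i in range(2))) / (n * (1 - pe)**2)
z = kappa / math.sqrt(var0)

print(round(kappa, 3), round(z, 3))
```

For this table po = 0.75 and pe = 0.5, giving K = 0.5; the same steps are carried out on the real data in the R code below.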

Example
We test if two reviewers of X-rays of the lung agree on their rating of the lung disease silicosis. Judgements from both reviewers are available for each patient, rated as silicosis or no silicosis (dataset in Table A.9).


SAS code
proc freq data=silicosis;
 tables reviewer1*reviewer2;
 exact kappa;
run;
SAS output
     Simple Kappa Coefficient
--------------------------------
Kappa (K)                 0.3000
ASE                       0.2122
95% Lower Conf Limit     -0.1160
95% Upper Conf Limit      0.7160
      Test of H0: Kappa = 0
ASE under H0              0.2225
Z                         1.3484
One-sided Pr>  Z         0.0888
Two-sided Pr> |Z|        0.1775
Exact Test
One-sided Pr>=  K        0.1849
Two-sided Pr>= |K|       0.3698
Remarks:
  • The procedure proc freq enables this test. After the tables statement the two variables must be specified, separated by a star (*).
  • The option exact kappa invokes the test with asymptotic and exact p-values.
  • Instead of using the raw data, it is also possible to use the counts directly. See Test 14.1.1 for details.
  • Alternatively the code
    proc freq data=silicosis;
     tables reviewer1*reviewer2 /agree;
     test agree;
    run;
    can be used, but this will only give the p-values based on the Gaussian approximation.
  • The p-value of hypothesis (C) is not reported and must be calculated as one minus the p-value of hypothesis (B).


R code
# Get the number of observations
n<-length(silicosis$patient)
# Construct a 2x2 table
freqtable <- table(silicosis$reviewer1,silicosis$reviewer2)
# Calculate the observed frequencies
po<-(freqtable[1,1]+freqtable[2,2])/n
# Calculate the expected frequencies
row<-margin.table(freqtable,1)/n
col<-margin.table(freqtable,2)/n
pe<-row[1]*col[1]+row[2]*col[2]
# Calculate the simple kappa coefficient
k<-(po-pe)/(1-pe)
# Calculate the variance under the null hypothesis
var0<-(pe+pe^2-(row[1]*col[1]*(row[1]+col[1])+
                row[2]*col[2]*(row[2]+col[2])))/(n*(1-pe)^2)
# Calculate the test statistic
z<-k/sqrt(var0)
# Calculate p_values
p_value_A<-2*pnorm(-abs(z))
p_value_B<-1-pnorm(z)
p_value_C<-pnorm(z)
# Output results
k
z
p_value_A
p_value_B
p_value_C
R output
> k
0.3
> z
1.3484
> p_value_A
0.1775299
> p_value_B
0.08876493
> p_value_C
0.9112351
Remarks:
  • There is no basic R function to calculate the test directly.
  • The R function table is used to construct the basic 2×2 table and the R function margin.table is used to get the marginal frequencies of this table.

14.2.2 McNemar's test

Description: Test on axial symmetry or marginal homogeneity in a 2×2 table.
Assumptions:
  • Data are at least measured on a nominal scale.
  • Measurements are taken in matched pairs, for example, by letting two raters classify objects into two categories labeled 1 and 2.
  • The random variable X states the first rating and Y the second rating.
  • Data are summarized in a 2×2 contingency table counting the number of occurrences of the four possible combinations of ratings in the sample.
  • A sample of size N is given, which follows the multinomial sampling scheme.
Hypotheses: H0: p12 = p21 vs H1: p12 ≠ p21
with p12 = P(X = 1, Y = 2) and
p21 = P(X = 2, Y = 1).
Test statistic: χ² = (N12 − N21)² / (N12 + N21)
Test decision: Reject H0 if for the observed value χ²0 of χ²
χ²0 > χ²(1−α; 1)
p-values: p = 1 − P(χ²1 ≤ χ²0)
Annotations:
  • The test goes back to McNemar (1947).
  • The hypothesis of symmetry of the probabilities p12 and p21 is equivalent to that of marginal homogeneity, that is, p1+ = p+1 and p2+ = p+2.
  • The test statistic χ² is asymptotically χ²-distributed with one degree of freedom (Agresti 1990, p. 350).
  • χ²(1−α; 1) is the (1−α)-quantile of the χ²-distribution with one degree of freedom.
  • Sometimes a continuity correction for the better approximation to the χ²-distribution is proposed. In this case the test statistic is: χ² = (|N12 − N21| − 1)² / (N12 + N21).
  • This test is a large sample test as it is based on the asymptotic χ²-distribution of the test statistic. For small samples an exact test can be based on the binomial distribution of N12 conditional on the off-main diagonal total N12 + N21, with success probability 1/2 under H0. Alternatively the test decision can be based on Markov chain Monte Carlo methods, see Krampe and Kuhnt (2007), which also cover Bowker's test for symmetry as an extension to k×k tables.
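The exact version conditions on the discordant total; a Python sketch assuming SciPy, using scipy.stats.binomtest. The discordant counts 6 and 0 below are those implied by the SAS output of the IQ example (S = 6, exact p = 0.0313):

```python
from scipy.stats import binomtest

# Discordant cell counts (off-diagonal of the 2x2 table)
n12, n21 = 6, 0

# Asymptotic McNemar statistic
s = (n12 - n21) ** 2 / (n12 + n21)

# Exact test: under H0, N12 | (N12 + N21) ~ Binomial(n12 + n21, 1/2)
p_exact = binomtest(n12, n12 + n21, 0.5).pvalue

print(s, p_exact)
```

With only six discordant pairs the exact p-value (0.03125) is the more trustworthy one, which is why SAS reports it next to the asymptotic value.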

Example
Of interest is the marginal homogeneity of intelligence quotients before training (IQ1) and after training (IQ2). The dataset contains measurements of the subjects (dataset in Table A.2), which first need to be transformed into a binary variable given by the cut point of an intelligence quotient of 100.


SAS code
* Dichotomize the variables iq1 and iq2;
data temp;
 set iq;
  if iq1<=100 then iq_before=0;
  if iq1> 100 then iq_before=1;
  if iq2<=100 then iq_after=0;
  if iq2> 100 then iq_after=1;
run;
* Apply the test;
proc freq;
 tables iq_before*iq_after;
 exact mcnem;
run;
SAS output
   Statistics for Table of iq_before by iq_after
                McNemar's Test
        ----------------------------
        Statistic (S)         6.0000
        DF                         1
        Asymptotic Pr >  S    0.0143
        Exact      Pr >= S    0.0313
Remarks:
  • The procedure proc freq enables this test. After the tables statement the two variables must be specified, separated by a star (*).
  • The option exact mcnem invokes the test with asymptotic and exact p-values.
  • Instead of using the raw data, it is also possible to use the counts directly. See Test 14.1.1 for details.
  • SAS does not provide a continuity correction.


R code
# Dichotomize the variables IQ1 and IQ2
iq_before <- ifelse(iq$IQ1<=100, 0, 1)
iq_after  <- ifelse(iq$IQ2<=100, 0, 1)
# Apply the test
mcnemar.test(iq_before, iq_after, correct = FALSE)
R output
McNemar's Chi-squared test
data:  iq_before and iq_after
McNemar's chi-squared = 6, df = 1, p-value = 0.01431
Remarks:
  • correct="value" is optional and determines if a continuity correction is used (value=TRUE) or not (value=FALSE). Default is TRUE.
  • Instead of using the raw data as in the example above, it is also possible to use the counts directly by constructing the 2×2 table and handing this over to the function as first parameter:
    freqtable<-table(iq_before, iq_after)
    mcnemar.test(freqtable, correct = FALSE)

14.2.3 Bowker's test for symmetry

Description: Test on symmetry in a k×k table.
Assumptions:
  • Data are at least measured on a nominal scale.
  • Measurements are taken in matched pairs, for example, by letting two raters classify objects into k categories labeled with 1 to k.
  • The random variable X states the first rating and Y the second rating for an individual object.
  • Data are summarized in a k×k contingency table counting the number of occurrences of the possible combinations of ratings in the sample.
  • A sample of size N is given, which follows the multinomial sampling scheme.
Hypotheses: H0: pij = pji for all i < j
vs H1: pij ≠ pji for at least one pair (i, j), i < j,
with pij = P(X = i, Y = j).
Test statistic: χ² = Σ over pairs i < j of (Nij − Nji)² / (Nij + Nji)
Test decision: Reject H0 if for the observed value χ²0 of χ²
χ²0 > χ²(1−α; k(k−1)/2)
p-values: p = 1 − P(χ²k(k−1)/2 ≤ χ²0)
Annotations:
  • The test was introduced by Bowker (1948) as an extension of McNemar's test for symmetry in 2×2 tables to higher dimensional tables.
  • The test statistic χ² is asymptotically χ²-distributed with k(k−1)/2 degrees of freedom (Bowker 1948).
  • χ²(1−α; k(k−1)/2) is the (1−α)-quantile of the χ²-distribution with k(k−1)/2 degrees of freedom.
  • Sometimes a continuity correction of the test statistic for the better approximation to the χ²-distribution is proposed. Edwards (1948) suggested a correction for the McNemar test, which extended to Bowker's test reads χ²c = Σ over pairs i < j of (|Nij − Nji| − 1)² / (Nij + Nji). Under the null hypothesis of symmetry χ²c is also approximately χ²-distributed.
  • This test is a large sample test as it is based on the asymptotic χ²-distribution of the test statistic. For small samples test decisions can be based on Markov chain Monte Carlo methods, see Krampe and Kuhnt (2007).
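The statistic only involves the off-diagonal cells; a short Python sketch for the 3×3 health-rating table of the example below, assuming SciPy for the tail probability:

```python
import numpy as np
from scipy.stats import chi2

# 3x3 health-rating table (rows: GP 1, columns: GP 2)
table = np.array([[10, 8, 12],
                  [13, 14, 6],
                  [1, 10, 20]])
k = table.shape[0]

# Bowker's statistic: sum over all pairs i < j
s = sum((table[i, j] - table[j, i]) ** 2 / (table[i, j] + table[j, i])
        for i in range(k) for j in range(i + 1, k))

# Degrees of freedom k(k-1)/2 and p-value
df = k * (k - 1) // 2
p_value = chi2.sf(s, df)

print(round(s, 4), df, round(p_value, 6))
```

This reproduces the SAS and R outputs of the example (S = 11.4982, 3 degrees of freedom, p = 0.0093).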

Example
Of interest is the symmetry of the health rating of two general practitioners. The ratings can range from poor (=1) through fair (=2) to good (=3). Ratings of N = 94 patients are observed in the given sample (dataset in Table A.13).


SAS code
* Construct the contingency table;
data counts;
  input gp1 gp2 counts;
  datalines;
  1 1 10
  1 2  8
  1 3 12
  2 1 13
  2 2 14
  2 3  6
  3 1  1
  3 2 10
  3 3 20
  ;
run;
* Apply the test;
proc freq;
 tables gp1*gp2;
 weight counts;
 exact agree;
run;
SAS output
Statistics for Table of gp1 by gp2
      Test of Symmetry
   ------------------------
   Statistic (S)    11.4982
   DF                     3
   Pr> S            0.0093
Remarks:
  • The procedure proc freq enables this test. After the tables statement the two variables must be specified, separated by a star (*).
  • The first variable gp1 holds the rating index of the first physician, and the second variable gp2 the rating index of the second physician. The variable counts holds the frequency for each cell of the contingency table.
  • The option exact agree invokes Bowker's test if applied to tables larger than 2×2, stating asymptotic and exact p-values.
  • It is also possible to use raw data, see Test 14.1.1 for details.
  • SAS does not provide a continuity correction.


R code
# Construct the contingency table
table<-matrix(c(10,13,1,8,14,10,12,6,20),ncol=3)
# Apply the test
mcnemar.test(table)
R output
   McNemar's Chi-squared test
data:  table
McNemar's chi-squared = 11.4982, df = 3, p-value = 0.009316
Remarks:
  • R uses the function mcnemar.test to apply Bowker's test for symmetry, but a continuity correction is not provided.
  • It is also possible to use raw data, see Test 14.1.1 for details.

14.3 Test on risk measures

In this section we introduce tests for two common risk measures in 2×2 tables. The odds ratio and the relative risk are mainly used in epidemiology to identify risk factors for a health outcome. Note, for risk estimates a confidence interval is in most cases more meaningful than a test, because the confidence interval reflects the variability of an estimator.

14.3.1 Large sample test on the odds ratio

Description: Tests if the odds ratio in a 2×2 contingency table differs from unity.
Assumptions:
  • Data are at least measured on a nominal scale with two possible categories, labeled as 1 and 2, for each of the two variables X and Y of interest.
  • The random sample follows a Poisson, Multinomial or Product-Multinomial sampling distribution.
  • A dataset of N observations is available and presented as a 2×2 contingency table.
Hypotheses: (A) H0: θ = 1 vs H1: θ ≠ 1
(B) H0: θ ≤ 1 vs H1: θ > 1
(C) H0: θ ≥ 1 vs H1: θ < 1
where θ = (p11 p22)/(p12 p21) is the odds ratio.
Test statistic: Z = ln(OR) / SD
with OR = (N11 N22)/(N12 N21) the sample odds ratio
and SD = sqrt(1/N11 + 1/N12 + 1/N21 + 1/N22)
Test decision: Reject H0 if for the observed value z of Z
(A) z ≤ z(α/2) or z ≥ z(1−α/2)
(B) z ≥ z(1−α)
(C) z ≤ z(α)
p-values: (A) p = 2 Φ(−|z|)
(B) p = 1 − Φ(z)
(C) p = Φ(z)
Annotations:
  • The statistic ln(OR) is asymptotically Gaussian distributed and SD is an estimator of its asymptotic standard error (Agresti 1990, p. 54).
  • z(γ) is the γ-quantile of the standard normal distribution and Φ denotes its cumulative distribution function.
  • The odds ratio is also called the cross-product ratio as it can be expressed as the ratio of products of probabilities diagonally opposite in the table, θ = (p11 p22)/(p12 p21).
  • θ > 1 means that in row 1 response 1 is more likely than in row 2, and θ < 1 means that response 1 is less likely in row 1 than in row 2. The further away the odds ratio lies from unity, the stronger the association. If θ = 1 rows and columns are independent.
  • This is a large sample test. In the case of small sample sizes Fisher's exact test can be used (Test 14.1.1), as θ = 1 is equivalent to independence.
  • Cornfield (1951) showed that the odds ratio is an estimate for the relative risk in case-control studies.
  • The concept of odds ratios can be extended to larger contingency tables. Furthermore it is possible to adjust for other variables by using logistic regression.
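The large sample test amounts to a few lines; here is a Python sketch assuming SciPy for the normal probabilities, using the cell counts implied by the example output below (company A: 11 defective, 9 functioning; company B: 4 defective, 16 functioning):

```python
import math
from scipy.stats import norm

# Cell counts of the malfunction example
n11, n12, n21, n22 = 11, 9, 4, 16

# Odds ratio and standard error of ln(OR)
odds_ratio = (n11 * n22) / (n12 * n21)
se = math.sqrt(1 / n11 + 1 / n12 + 1 / n21 + 1 / n22)

# Test statistic and p-values for hypotheses (A), (B), (C)
z = math.log(odds_ratio) / se
p_a = 2 * norm.cdf(-abs(z))   # two-sided
p_b = 1 - norm.cdf(z)         # H1: OR > 1
p_c = norm.cdf(z)             # H1: OR < 1

print(round(odds_ratio, 5), round(z, 5), round(p_a, 6))
```

The values agree with the SAS and R outputs of the example (OR = 4.88889, z = 2.21241, two-sided p = 0.026938).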

Example
We test the odds ratio of companies A and B with respect to the malfunction of workpieces produced by them. A sample of N = 40 workpieces has been checked, coded 0 for functioning and 1 for defective (dataset in Table A.4).


SAS code
* Sort the dataset in the right order;
proc sort data=malfunction;
 by company descending malfunction;
run;
* Use proc freq to get the counts saved into freq_table;
proc freq order=data;
 tables company*malfunction /out=freq_table;
run;
* Get the counts out of freq_table;
data n11 n12 n21 n22;
 set freq_table;
 if company='A' and malfunction=1 then do;
    keep count; output n11;
 end;
 if company='A' and malfunction=0 then do;
    keep count; output n12;
 end;
 if company='B' and malfunction=1 then do;
    keep count; output n21;
 end;
 if company='B' and malfunction=0 then do;
    keep count; output n22;
 end;
run;
* Rename counts;
 data n11; set n11; rename count=n11; run;
 data n12; set n12; rename count=n12; run;
 data n21; set n21; rename count=n21; run;
 data n22; set n22; rename count=n22; run;
* Merge counts together and calculate test statistic;
data or_table;
 merge n11 n12 n21 n22;
 * Calculate the Odds Ratio;
 OR=(n11*n22)/(n12*n21);
 * Calculate the standard deviation of ln(OR);
 SD=sqrt(1/n11+1/n12+1/n22+1/n21);
 * Calculate test statistic;
 z=log(OR)/SD;
 * Calculate p-values;
 p_value_A=2*probnorm(-abs(z));
 p_value_B=1-probnorm(z);
 p_value_C=probnorm(z);
run;
* Output results;
proc print split='*' noobs;
 var OR z p_value_A p_value_B p_value_C;
 label OR='Odds Ratio*----------'
       z='Test Statistic*--------------'
       p_value_A='p-value A*---------'
    p_value_B='p-value B*---------'
    p_value_C='p-value C*---------';
 title 'Test on the Odds Ratio';
run;
SAS output
                         Test on the Odds Ratio
Odds Ratio    Test Statistic    p-value A    p-value B
----------    --------------    ---------    ---------
 4.88889         2.21241        0.026938     0.013469
p-value C
---------
 0.98653
Remarks:
  • The above code calculates the odds ratio for the malfunctions of company A vs B. An odds ratio of 4.889 means that the odds of a malfunction in company A are 4.889 times the odds in company B. Changing the rows of the table results in an estimated odds ratio of 1/4.889 ≈ 0.2045, which means that the odds of a malfunction in company B are about one fifth of those in company A.
  • There is no generic SAS function to calculate the p-value in a c14-math-0300 table directly, but logistic regression can be used as in the following code:
    proc logistic data=malfunction;
class company (PARAM=REF REF='B');
     model malfunction (event='1') = company;
    run;
    Note, this code correctly returns the above two-sided p-value and also the odds ratio of 4.889, because with the statement class company (PARAM=REF REF='B'); we tell SAS to use company B as reference. One-sided p-values are not given.
  • Also with proc freq the odds ratio itself can be calculated.
    * Sort the dataset in the right order;
    proc sort data=malfunction;
     by company descending malfunction;
    run;
    * Apply the test;
    proc freq order=data;
     tables company*malfunction /relrisk;
     exact comor;
    run;
    However, no p-values are reported.


R code
# Get the cell counts for the 2x2 table
n11<-sum(malfunction$company=='A' &
                        malfunction$malfunction==1)
n12<-sum(malfunction$company=='A' &
                        malfunction$malfunction==0)
n21<-sum(malfunction$company=='B' &
                        malfunction$malfunction==1)
n22<-sum(malfunction$company=='B' &
                        malfunction$malfunction==0)
# Calculate the Odds Ratio
OR=(n11*n22)/(n12*n21)
# Calculate the standard deviation of ln(OR)
SD=sqrt(1/n11+1/n12+1/n22+1/n21)
# Calculate test statistic
z=log(OR)/SD
# Calculate p-values
p_value_A<-2*pnorm(-abs(z));
p_value_B<-1-pnorm(z);
p_value_C<-pnorm(z);
# Output results
OR
z
p_value_A
p_value_B
p_value_C
R output
> OR
[1] 4.888889
> z
[1] 2.212413
> p_value_A
[1] 0.02693816
> p_value_B
[1] 0.01346908
> p_value_C
[1] 0.986531
Remarks:
  • The above code calculates the odds ratio for the malfunctions of company A vs company B. An odds ratio of 4.889 means that the odds of a malfunction in company A are 4.889 times the odds in company B. Interchanging the rows of the table yields an odds ratio of 1/4.889 = 0.205, which means that the odds of a malfunction in company B are about one fifth of those in company A.
  • There is no generic R function to calculate the odds ratio in a 2 × 2 table, but logistic regression can be used as in the following code:
    x<-malfunction$company
    y<-malfunction$malfunction
    summary(glm(x~y,family=binomial(link="logit")))
    Note that this code correctly returns the above two-sided p-value, but not the odds ratio of 4.889, because in this specification R uses company A as the reference level. Here, R returns a log(odds ratio) of -1.587, which equals an odds ratio of 0.205 (see first remark). One-sided p-values are not given.

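The hand calculation above is easy to port to other languages. As a cross-check, the following Python sketch implements the same formulas (OR, the standard deviation of ln(OR), z, and the three p-values). Note that the cell counts 11, 9, 4, 16 are an assumption: the raw data are not listed in this section, and these counts were chosen because they reproduce the reported estimates.

```python
import math
from statistics import NormalDist  # standard normal CDF, Python 3.8+

def odds_ratio_test(n11, n12, n21, n22):
    """Large-sample z-test on the odds ratio of a 2x2 table.

    Mirrors the SAS/R code above:
      OR = (n11*n22)/(n12*n21)
      SD = sqrt(1/n11 + 1/n12 + 1/n21 + 1/n22)   # SD of ln(OR)
      z  = ln(OR)/SD
    """
    OR = (n11 * n22) / (n12 * n21)
    SD = math.sqrt(1/n11 + 1/n12 + 1/n21 + 1/n22)
    z = math.log(OR) / SD
    Phi = NormalDist().cdf
    return {
        "OR": OR,
        "z": z,
        "p_A": 2 * Phi(-abs(z)),  # (A) two-sided
        "p_B": 1 - Phi(z),        # (B) H1: OR > 1
        "p_C": Phi(z),            # (C) H1: OR < 1
    }

# Assumed counts (n11, n12, n21, n22) reproducing the output above
res = odds_ratio_test(11, 9, 4, 16)
```

With these counts the function returns OR = 4.8889 and z = 2.2124, in agreement with the SAS and R output shown above.
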
14.3.2 Large sample test on the relative risk

Description: Tests if the relative risk in a 2 × 2 contingency table differs from unity.
Assumptions:
  • Data are at least measured on a nominal scale with two possible categories, labeled as 0 and 1, for each of the two variables X and Y of interest.
  • The random sample follows a Poisson, Multinomial or Product-Multinomial sampling distribution.
  • A dataset of n observations is available and presented as a 2 × 2 contingency table.
Hypotheses: (A) H0: RR = 1 vs H1: RR ≠ 1
(B) H0: RR ≤ 1 vs H1: RR > 1
(C) H0: RR ≥ 1 vs H1: RR < 1
with RR the relative risk of response 1 in row 1 compared with row 2.
Test statistic: Z = ln(RR_hat)/SD
with RR_hat = (n11/(n11+n12))/(n21/(n21+n22))
and SD = sqrt(1/n11 - 1/(n11+n12) + 1/n21 - 1/(n21+n22))
Test decision: Reject H0 if for the observed value z of Z
(A) z < z_(α/2) or z > z_(1-α/2)
(B) z > z_(1-α)
(C) z < z_α
p-values: (A) p = 2Φ(-|z|)
(B) p = 1 - Φ(z)
(C) p = Φ(z)
Annotations:
  • The statistic Z is asymptotically standard normally distributed, and SD is an estimator of the asymptotic standard error of ln(RR_hat) (Agresti 1990, p. 55).
  • z_α is the α-quantile of the standard normal distribution.
  • RR > 1 means that in row 1 of the table the risk of response 1 is higher than in row 2, and RR < 1 means that the risk of response 1 is lower in row 1 than in row 2. The further the relative risk is from unity, the stronger the association. If RR = 1, rows and columns are independent and neither row carries an increased risk. The relative risk can also be defined in terms of columns instead of rows.
  • This is a large sample test.
  • The concept of relative risk can be extended to larger contingency tables and it is possible to adjust for other variables by using generalized linear models.

Example
To test the relative risk of a malfunction in workpieces produced in company A compared with company B, a sample of 40 workpieces has been checked, with 0 for functioning and 1 for defective (dataset in Table A.4).


SAS code
* Sort the dataset in the right order;
proc sort data=malfunction;
 by company descending malfunction;
run;
* Use proc freq to get the counts saved into freq_table;
proc freq order=data;
 tables company*malfunction /out=freq_table;
run;
* Get the counts out of freq_table;
data n11 n12 n21 n22;
 set freq_table;
 if company='A' and malfunction=1 then do;
    keep count; output n11;
 end;
 if company='A' and malfunction=0 then do;
    keep count; output n12;
 end;
 if company='B' and malfunction=1 then do;
    keep count; output n21;
 end;
 if company='B' and malfunction=0 then do;
    keep count; output n22;
 end;
run;
* Rename counts;
 data n11; set n11; rename count=n11; run;
 data n12; set n12; rename count=n12; run;
 data n21; set n21; rename count=n21; run;
 data n22; set n22; rename count=n22; run;
* Merge counts and calculate test statistic;
data rr_table;
 merge n11 n12 n21 n22;
 * Calculate the Relative Risk;
 RR=(n11/(n11+n12))/(n21/(n21+n22));
 * Calculate the standard deviation of ln(RR);
 SD=sqrt(1/n11-1/(n11+n12)+1/n21-1/(n21+n22));
 * Calculate test statistic;
 z=log(RR)/SD;
 * Calculate p-values;
 p_value_A=2*probnorm(-abs(z));
 p_value_B=1-probnorm(z);
 p_value_C=probnorm(z);
run;
* Output results;
proc print split='*' noobs;
 var RR z p_value_A p_value_B p_value_C;
 label RR='Relative Risk*-------------'
       z='Test Statistic*--------------'
       p_value_A='p-value A*---------'
    p_value_B='p-value B*---------'
    p_value_C='p-value C*---------';
 title 'Test on the Relative Risk';
run;
SAS output
                  Test on the Relative Risk
Relative Risk    Test Statistic    p-value A    p-value B    p-value C
-------------    --------------    ---------    ---------    ---------
    2.75            2.06102        0.039301     0.019650      0.98035
Remarks:
  • The above code calculates the relative risk of malfunctions in products from company A vs company B. The risk of a malfunction is 2.75 times higher in company A than in company B. Interchanging the rows of the table yields an estimated relative risk of 1/2.75 = 0.364, which means that a malfunction in a product from company B is only 0.364 times as likely as in one from company A.
  • There is no generic SAS function to calculate the p-values of a relative risk in a 2 × 2 table, but generalized linear models can be used as in the following code:
    proc genmod data = malfunction descending;
     class company (PARAM=REF REF='B');
     model malfunction=company /dist=binomial link=log;
     run;
    Note that this code correctly returns the above two-sided p-value and also the relative risk of 2.75, as with the code class company (PARAM=REF REF='B'); we tell SAS to use company B as the reference level. SAS returns a log(relative risk) of 1.012, which equals a relative risk of 2.75 (see first remark). One-sided p-values are not given.
  • However, with proc freq the relative risk itself can be calculated but not the p-values:
    * Sort the dataset in the right order;
    proc sort data=malfunction;
     by company descending malfunction;
    run;
    * Apply the test;
    proc freq order=data;
     tables company*malfunction /relrisk;
    run;
    In the output, the line Cohort (Col1 Risk) states the wanted relative risk estimate, as we are interested in the risk of a malfunction (column 1) in row 1 compared with row 2.


R code
# Get the cell counts for the 2x2 table
n11<-sum(malfunction$company=='A' &
                        malfunction$malfunction==1)
n12<-sum(malfunction$company=='A' &
                        malfunction$malfunction==0)
n21<-sum(malfunction$company=='B' &
                        malfunction$malfunction==1)
n22<-sum(malfunction$company=='B' &
                        malfunction$malfunction==0)
# Calculate the Relative Risk
RR=(n11/(n11+n12))/(n21/(n21+n22))
# Calculate the standard deviation of ln(RR)
SD=sqrt(1/n11-1/(n11+n12)+1/n21-1/(n21+n22))
# Calculate test statistic
z=log(RR)/SD
# Calculate p-values
p_value_A<-2*pnorm(-abs(z));
p_value_B<-1-pnorm(z);
p_value_C<-pnorm(z);
# Output results
RR
z
p_value_A
p_value_B
p_value_C
R output
> RR
[1] 2.75
> z
[1] 2.061022
> p_value_A
[1] 0.03930095
> p_value_B
[1] 0.01965047
> p_value_C
[1] 0.9803495
Remarks:
  • The above code calculates the relative risk of malfunctions in products from company A vs company B. The risk of a malfunction in a product is 2.75 times higher in company A than in company B. Interchanging the rows of the table yields an estimated relative risk of 1/2.75 = 0.364, which means that a malfunction in a product from company B is only 0.364 times as likely as in one from company A.
  • There is no generic R function to calculate the relative risk in a 2 × 2 table, but generalized linear models can be used. The following code will do that:
    x<-malfunction$company
    y<-malfunction$malfunction
    summary(glm(y~x,family=binomial(link="log")))
    Note that the log link (not the logit link) is needed to estimate the relative risk. This code correctly returns the above two-sided p-value, but not the relative risk of 2.75, because R uses company A as the reference level: R returns a log(relative risk) of -1.012, which equals a relative risk of 0.364 (see first remark). One-sided p-values are not given.
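Analogously to the odds ratio, the relative risk test can be sketched in a few lines of Python. Again, the cell counts 11, 9, 4, 16 are an assumption chosen to reproduce the reported output; the formulas are exactly those of the SAS and R code above.

```python
import math
from statistics import NormalDist  # standard normal CDF, Python 3.8+

def relative_risk_test(n11, n12, n21, n22):
    """Large-sample z-test on the relative risk of a 2x2 table.

    RR = (n11/(n11+n12)) / (n21/(n21+n22)), and the standard
    deviation of ln(RR) is
    SD = sqrt(1/n11 - 1/(n11+n12) + 1/n21 - 1/(n21+n22)).
    """
    RR = (n11 / (n11 + n12)) / (n21 / (n21 + n22))
    SD = math.sqrt(1/n11 - 1/(n11 + n12) + 1/n21 - 1/(n21 + n22))
    z = math.log(RR) / SD
    Phi = NormalDist().cdf
    p_two_sided = 2 * Phi(-abs(z))  # hypothesis (A)
    return RR, z, p_two_sided

# Assumed counts (n11, n12, n21, n22) reproducing the output above
RR, z, p = relative_risk_test(11, 9, 4, 16)
```

With these counts the function returns RR = 2.75 and z = 2.0610, in agreement with the output shown above.
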

References

Agresti A. 1990 Categorical Data Analysis. John Wiley & Sons, Ltd.

Bowker A.H. 1948 A test for symmetry in contingency tables. Journal of the American Statistical Association 43, 572–574.

Cohen J. 1960 A coefficient of agreement for nominal scales. Educational and Psychological Measurement 20, 37–46.

Cornfield J. 1951 A method of estimating comparative rates from clinical data. Applications to cancer of the lung, breast and cervix. Journal of the National Cancer Institute 11, 1229–1275.

Edwards A.L. 1948 Note on the correction for continuity in testing the significance of the difference between correlated proportions. Psychometrika 13, 185–187.

Fisher R.A. 1922 On the interpretation of chi-square from contingency tables, and the calculation of P. Journal of the Royal Statistical Society 85, 87–94.

Fisher R.A. 1934 Statistical Methods for Research Workers, 5th edn. Oliver & Boyd.

Fisher R.A. 1935 The logic of inductive inference. Journal of the Royal Statistical Society, Series A 98, 39–54.

Fleiss J.L., Levin B. and Paik M.C. 2003 Statistical Methods for Rates and Proportions, 3rd edn. John Wiley & Sons, Ltd.

Freeman G.H. and Halton J.H. 1951 Note on an exact treatment of contingency, goodness of fit and other problems of significance. Biometrika 38, 141–149.

Irwin J.O. 1935 Tests of significance for differences between percentages based on small numbers. Metron 12, 83–94.

Krampe A. and Kuhnt S. 2007 Bowker's test for symmetry and modifications within the algebraic framework. Computational Statistics & Data Analysis 51, 4124–4142.

McNemar Q. 1947 Note on the sampling error of the difference between correlated proportions or percentages. Psychometrika 12, 153–157.

Pearson K. 1900 On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. Philosophical Magazine 50, 157–175.

Yates F. 1934 Contingency tables involving small numbers and the chi-square test. Supplement to the Journal of the Royal Statistical Society 1, 217–235.
