9.8 A Nonparametric Test for Correlation (Optional)

When the simple linear regression assumptions (Section 9.3) are violated (e.g., the random error has a highly skewed distribution), an alternative method of analysis may be required. One approach is to apply a nonparametric test for correlation based on ranks.

To illustrate, suppose 10 new paintings are shown to two art critics and each critic ranks the paintings from 1 (best) to 10 (worst). We want to determine whether the critics’ ranks are related. Does a correspondence exist between their ratings? If a painting is ranked high by critic 1, is it likely to be ranked high by critic 2? Or do high rankings by one critic correspond to low rankings by the other? That is, are the rankings of the critics correlated?

If the rankings are as shown in the “Perfect Agreement” columns of Table 9.6, we immediately notice that the critics agree on the rank of every painting. High ranks correspond to high ranks and low ranks to low ranks. This is an example of a perfect positive correlation between the ranks. In contrast, if the rankings appear as shown in the “Perfect Disagreement” columns of Table 9.6, then high ranks for one critic correspond to low ranks for the other. This is an example of perfect negative correlation.

Table 9.6 Rankings of 10 Paintings by Two Critics

              Perfect Agreement        Perfect Disagreement
Painting      Critic 1    Critic 2     Critic 1    Critic 2
1             4           4            9           2
2             1           1            3           8
3             7           7            5           6
4             5           5            1           10
5             2           2            2           9
6             6           6            10          1
7             8           8            6           5
8             3           3            4           7
9             10          10           8           3
10            9           9            7           4

In practice, you will rarely see perfect positive or perfect negative correlation between the ranks. In fact, it is quite possible for the critics’ ranks to appear as shown in Table 9.7. Note that these rankings indicate some, but not perfect, agreement between the critics, which points to the need for a measure of rank correlation.

Spearman’s rank correlation coefficient, rs, provides a measure of correlation between ranks. The formula for this measure of correlation is given in the next box. We also give a formula that is identical to rs when there are no ties in rankings; this formula provides a good approximation to rs when the number of ties is small relative to the number of pairs.

Note that if the ranks for the two critics are identical, as in the second and third columns of Table 9.6, the differences between the ranks will all be 0. Thus,

$$r_s = 1 - \frac{6\sum d^2}{n(n^2-1)} = 1 - \frac{6(0)}{10(99)} = 1$$

Table 9.7 Rankings of Paintings: Less-than-Perfect Agreement

              Critic             Difference between Rank 1 and Rank 2
Painting      1       2          d        d^2
1             4       5          -1       1
2             1       2          -1       1
3             9       10         -1       1
4             5       6          -1       1
5             2       1          1        1
6             10      9          1        1
7             7       7          0        0
8             3       3          0        0
9             6       4          2        4
10            8       8          0        0
                                          $\sum d^2 = 10$

That is, perfect positive correlation between the pairs of ranks is characterized by a Spearman correlation coefficient of $r_s = 1$. When the ranks indicate perfect disagreement, as in the fourth and fifth columns of Table 9.6, $\sum d_i^2 = 330$ and

$$r_s = 1 - \frac{6(330)}{10(99)} = -1.$$

Thus, perfect negative correlation is indicated by $r_s = -1$.
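The two extreme cases can be checked numerically. The following is a minimal sketch, not part of the original text; the function name `spearman_shortcut` is an illustrative choice, and the rank lists are copied from Table 9.6.

```python
def spearman_shortcut(u, v):
    """Shortcut formula r_s = 1 - 6*sum(d^2) / (n*(n^2 - 1)); exact only when there are no ties."""
    n = len(u)
    sum_d2 = sum((ui - vi) ** 2 for ui, vi in zip(u, v))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Table 9.6, "Perfect Agreement" columns (critic 1, critic 2)
agree_1 = [4, 1, 7, 5, 2, 6, 8, 3, 10, 9]
agree_2 = [4, 1, 7, 5, 2, 6, 8, 3, 10, 9]

# Table 9.6, "Perfect Disagreement" columns (critic 1, critic 2)
disagree_1 = [9, 3, 5, 1, 2, 10, 6, 4, 8, 7]
disagree_2 = [2, 8, 6, 10, 9, 1, 5, 7, 3, 4]

print(spearman_shortcut(agree_1, agree_2))        # 1.0
print(spearman_shortcut(disagree_1, disagree_2))  # -1.0
```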

Biography Charles E. Spearman (1863–1945)

Spearman’s Correlation

London-born Charles Spearman was educated at Leamington College before joining the British Army. After 20 years as a highly decorated officer, Spearman retired from the army and moved to Germany to begin his study of experimental psychology at the University of Leipzig. At the age of 41, he earned his Ph.D. and ultimately became one of the most influential figures in the field of psychology. Spearman was the originator of the classical theory of mental tests and developed the “two-factor” theory of intelligence. These theories were used to develop and support the “Plus-Elevens” tests in England: exams administered to British 11-year-olds that predict whether they should attend a university or a technical school. Spearman was greatly influenced by the works of Francis Galton (p. 502); consequently, he developed a strong statistical background. While conducting his research on intelligence, he proposed the rank-order correlation coefficient—now called “Spearman’s correlation coefficient.” During his career, Spearman spent time at various universities, including University College (London), Columbia University, Catholic University, and the University of Cairo (Egypt).

Spearman’s Rank Correlation Coefficient

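$$r_s = \frac{SS_{uv}}{\sqrt{SS_{uu}SS_{vv}}}$$

where

  • $SS_{uv} = \sum (u_i - \bar{u})(v_i - \bar{v}) = \sum u_i v_i - \frac{(\sum u_i)(\sum v_i)}{n}$

  • $SS_{uu} = \sum (u_i - \bar{u})^2 = \sum u_i^2 - \frac{(\sum u_i)^2}{n}$

  • $SS_{vv} = \sum (v_i - \bar{v})^2 = \sum v_i^2 - \frac{(\sum v_i)^2}{n}$

  • $u_i$ = Rank of the ith observation in sample 1

  • $v_i$ = Rank of the ith observation in sample 2

  • $n$ = Number of pairs of observations (number of observations in each sample)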

Shortcut Formula for rs*

$$r_s = 1 - \frac{6\sum d_i^2}{n(n^2-1)}$$

where

  • $d_i = u_i - v_i$ (difference in the ranks of the ith observations for samples 1 and 2)

  • $n$ = number of pairs of observations (number of observations in each sample)
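The statement that the shortcut formula is identical to the full formula when there are no ties can be verified with a short derivation. The sketch below is an added illustration; it assumes no ties, so that each set of ranks is a permutation of 1, 2, ..., n.

```latex
% Assumption: no ties, so u_1..u_n and v_1..v_n are each a permutation of 1..n.
\begin{align*}
\sum u_i &= \sum v_i = \frac{n(n+1)}{2}, \qquad
\sum u_i^2 = \sum v_i^2 = \frac{n(n+1)(2n+1)}{6} \\
SS_{uu} &= SS_{vv} = \frac{n(n+1)(2n+1)}{6} - \frac{n(n+1)^2}{4} = \frac{n(n^2-1)}{12} \\
\sum d_i^2 &= \sum \bigl[(u_i - \bar{u}) - (v_i - \bar{v})\bigr]^2
            = SS_{uu} + SS_{vv} - 2\,SS_{uv} \quad (\text{since } \bar{u} = \bar{v}) \\
r_s &= \frac{SS_{uv}}{\sqrt{SS_{uu}SS_{vv}}}
     = \frac{SS_{uu} - \tfrac{1}{2}\sum d_i^2}{SS_{uu}}
     = 1 - \frac{6\sum d_i^2}{n(n^2-1)}
\end{align*}
```

When ties are present, $SS_{uu}$ and $SS_{vv}$ fall slightly below $n(n^2-1)/12$, which is why the shortcut is then only an approximation.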

For the data of Table 9.7,

$$r_s = 1 - \frac{6\sum d^2}{n(n^2-1)} = 1 - \frac{6(10)}{10(99)} = 1 - \frac{6}{99} = .94$$

The fact that rs is close to 1 indicates that the critics tend to agree, but the agreement is not perfect.
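As a check on this calculation, the sketch below evaluates both the full formula and the shortcut for the Table 9.7 ranks. The function names are illustrative assumptions, not from the text; because these ranks contain no ties, the two results should coincide.

```python
from math import sqrt

def spearman_exact(u, v):
    """Full formula r_s = SS_uv / sqrt(SS_uu * SS_vv), applied to ranks u and v."""
    n = len(u)
    ss_uv = sum(a * b for a, b in zip(u, v)) - sum(u) * sum(v) / n
    ss_uu = sum(a * a for a in u) - sum(u) ** 2 / n
    ss_vv = sum(b * b for b in v) - sum(v) ** 2 / n
    return ss_uv / sqrt(ss_uu * ss_vv)

def spearman_shortcut(u, v):
    """Shortcut formula r_s = 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    n = len(u)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(u, v))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Ranks from Table 9.7
critic_1 = [4, 1, 9, 5, 2, 10, 7, 3, 6, 8]
critic_2 = [5, 2, 10, 6, 1, 9, 7, 3, 4, 8]

print(round(spearman_exact(critic_1, critic_2), 4))     # 0.9394
print(round(spearman_shortcut(critic_1, critic_2), 4))  # 0.9394
```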

The value of $r_s$ always falls between $-1$ and $+1$, with $+1$ indicating perfect positive correlation and $-1$ indicating perfect negative correlation. The closer $r_s$ falls to $+1$ or $-1$, the greater the correlation between the ranks. Conversely, the nearer $r_s$ is to 0, the weaker the correlation.

Note that the concept of correlation implies that two responses are obtained for each experimental unit. In the art critics example, each painting received two ranks (one from each critic) and the objective of the study was to determine the degree of positive correlation between the two rankings. Rank correlation methods can be used to measure the correlation between any pair of variables. If two variables are measured on each of n experimental units, we rank the measurements associated with each variable separately. Ties receive the average of the ranks of the tied observations. Then we calculate the value of rs for the two rankings. This value measures the rank correlation between the two variables. We illustrate the procedure in Example 9.9.
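The ranking step, including the rule that ties receive the average of the ranks they occupy, can be sketched as follows. The helper name `average_ranks` and the five sample values are illustrative assumptions, not data from the text.

```python
def average_ranks(values):
    """Rank values from smallest (rank 1) to largest (rank n); tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = ((i + 1) + (j + 1)) / 2  # average of the rank positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

# Hypothetical measurements: the two 20s occupy positions 3 and 4, so each gets rank 3.5.
print(average_ranks([20, 17, 19, 46, 20]))  # [3.5, 1.0, 2.0, 5.0, 3.5]
```

Each variable is ranked separately with a helper of this kind, and the two resulting rank lists are then passed to the formula for $r_s$.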

Example 9.9 Spearman’s Rank Correlation—Smoking versus Babies’ Weights

Problem

  1. A study is conducted to investigate the relationship between cigarette smoking during pregnancy and the weights of newborn infants. The 15 women smokers who make up the sample kept accurate records of the number of cigarettes smoked during their pregnancies, and the weights of their children were recorded at birth. The data are given in Table 9.8.

    Table 9.8 Data and Calculations for Example 9.9

    Woman   Cigarettes per Day   Rank   Baby’s Weight (pounds)   Rank   d       d^2
    1       12                   1      7.7                      5      -4      16
    2       15                   2      8.1                      9      -7      49
    3       35                   13     6.9                      4      9       81
    4       21                   7      8.2                      10     -3      9
    5       20                   5.5    8.6                      13.5   -8      64
    6       17                   3      8.3                      11.5   -8.5    72.25
    7       19                   4      9.4                      15     -11     121
    8       46                   15     7.8                      6      9       81
    9       20                   5.5    8.3                      11.5   -6      36
    10      25                   8.5    5.2                      1      7.5     56.25
    11      39                   14     6.4                      3      11      121
    12      25                   8.5    7.9                      7      1.5     2.25
    13      30                   12     8.0                      8      4       16
    14      27                   10     6.1                      2      8       64
    15      29                   11     8.6                      13.5   -2.5    6.25
                                                                        Total = 795

    Data Set: NEWBORN

    1. Calculate and interpret Spearman’s rank correlation coefficient for the data.

    2. Use a nonparametric test to determine whether level of cigarette smoking and weights of newborns are negatively correlated for all smoking mothers. Use α=.05.

Solution

  1. We first rank the number of cigarettes smoked per day, assigning a 1 to the smallest number (12) and a 15 to the largest (46). Note that the two ties receive the averages of their respective ranks. Similarly, we assign ranks to the 15 babies’ weights. Since the number of ties is relatively small, we will use the shortcut formula to calculate rs. The differences d between the ranks of the babies’ weights and the ranks of the number of cigarettes smoked per day are shown in Table 9.8. The squares of the differences, d2, are also given. Thus,

    $$r_s = 1 - \frac{6\sum d_i^2}{n(n^2-1)} = 1 - \frac{6(795)}{15(15^2-1)} = 1 - 1.42 = -.42$$

    The value of $r_s$ can also be obtained by computer. A SAS printout of the analysis is shown in Figure 9.27. The value of $r_s$, highlighted on the printout, agrees (except for rounding) with our hand-calculated value, $-.42$. The negative correlation coefficient indicates that in this sample, an increase in the number of cigarettes smoked per day is associated with (but is not necessarily the cause of) a decrease in the weight of the newborn infant.
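    The hand calculation can be reproduced with a few lines of code. The sketch below is an added illustration with assumed variable names; the ranks are copied from Table 9.8. It evaluates the shortcut formula alongside the full formula, and the small gap between the two results reflects the tied ranks.

```python
from math import sqrt

# Ranks copied from Table 9.8 (tied values already carry their average ranks).
cig_rank = [1, 2, 13, 7, 5.5, 3, 4, 15, 5.5, 8.5, 14, 8.5, 12, 10, 11]
weight_rank = [5, 9, 4, 10, 13.5, 11.5, 15, 6, 11.5, 1, 3, 7, 8, 2, 13.5]
n = len(cig_rank)

# Shortcut formula: 1 - 6*sum(d^2) / (n*(n^2 - 1))
sum_d2 = sum((u - v) ** 2 for u, v in zip(cig_rank, weight_rank))
rs_shortcut = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Full formula: SS_uv / sqrt(SS_uu * SS_vv)
ss_uv = sum(u * v for u, v in zip(cig_rank, weight_rank)) - sum(cig_rank) * sum(weight_rank) / n
ss_uu = sum(u * u for u in cig_rank) - sum(cig_rank) ** 2 / n
ss_vv = sum(v * v for v in weight_rank) - sum(weight_rank) ** 2 / n
rs_exact = ss_uv / sqrt(ss_uu * ss_vv)

print(sum_d2)                 # 795.0
print(round(rs_shortcut, 4))  # -0.4196
print(round(rs_exact, 4))     # -0.4247
```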

  2. If we define ρ as the population rank correlation coefficient [i.e., the rank correlation coefficient that could be calculated from all (x, y) values in the population], we can determine whether level of cigarette smoking and weights of newborns are negatively correlated by conducting the following test:

    • $H_0\colon \rho = 0$ (no population correlation between ranks)

    • $H_a\colon \rho < 0$ (negative population correlation between ranks)

    • Test statistic: $r_s$ (the sample Spearman rank correlation coefficient)

    To determine a rejection region, we consult Table XI in Appendix B, which is partially reproduced in Table 9.9. Note that the left-hand column gives values of n, the number of pairs of observations. The entries in the table are values for an upper-tail rejection region, since only positive values are given. Thus, for $n = 15$ and $\alpha = .05$, the value .441 is the boundary of the upper-tailed rejection region, so $P(r_s > .441) = .05$ if $H_0\colon \rho = 0$ is true. Similarly, for negative values of $r_s$, we have $P(r_s < -.441) = .05$ if $\rho = 0$. That is, we expect to see $r_s < -.441$ only 5% of the time if there is really no relationship between the ranks of the variables.

    The lower-tailed rejection region is therefore

    Rejection region ($\alpha = .05$): $r_s < -.441$

    Since the calculated $r_s = -.42$ is not less than $-.441$, we cannot reject $H_0$ at the $\alpha = .05$ level of significance. That is, this sample of 15 smoking mothers provides insufficient evidence to conclude that a negative correlation exists between the number of cigarettes smoked and the weight of newborns for the populations of measurements corresponding to all smoking mothers. This does not, of course, mean that no relationship exists. A study using a larger sample of smokers and taking other factors into account (father’s weight, sex of newborn child, etc.) would be more likely to reveal whether smoking and the weight of a newborn child are related.
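    The table lookup and comparison can be sketched in code. This is an illustrative sketch only: the dictionary holds the one-tailed α = .05 critical values for n = 10 through 20 as listed in Table 9.9, the function name `reject_lower_tailed` is an assumption, and the value −0.50 in the last line is a hypothetical statistic used for contrast.

```python
# One-tailed critical values at alpha = .05, copied from Table 9.9 (n = 10 through 20).
CRITICAL_05 = {
    10: 0.564, 11: 0.523, 12: 0.497, 13: 0.475, 14: 0.457, 15: 0.441,
    16: 0.425, 17: 0.412, 18: 0.399, 19: 0.388, 20: 0.377,
}

def reject_lower_tailed(rs, n, table=CRITICAL_05):
    """Return True if rs falls in the lower-tailed rejection region rs < -critical value."""
    return rs < -table[n]

print(reject_lower_tailed(-0.42, 15))  # False: -.42 is not less than -.441
print(reject_lower_tailed(-0.50, 15))  # True for this hypothetical value
```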

    Figure 9.27

    SAS Spearman correlation printout for Example 9.9

Table 9.9 Reproduction of Part of Table XI in Appendix B: Critical Values of Spearman’s Rank Correlation Coefficient

n     α=.05    α=.025   α=.01    α=.005
5     .900
6     .829     .886     .943
7     .714     .786     .893
8     .643     .738     .833     .881
9     .600     .683     .783     .833
10    .564     .648     .745     .794
11    .523     .623     .736     .818
12    .497     .591     .703     .780
13    .475     .566     .673     .745
14    .457     .545     .646     .716
15    .441     .525     .623     .689
16    .425     .507     .601     .666
17    .412     .490     .582     .645
18    .399     .476     .564     .625
19    .388     .462     .549     .608
20    .377     .450     .534     .591

Look Back

The two-tailed p-value of the test (.1145) is highlighted on the SAS printout, shown in Figure 9.27. Since the lower-tailed p-value, .1145/2 = .05725, exceeds α = .05, our conclusion is the same: Do not reject $H_0$.

Now Work Exercise 9.128

A summary of Spearman’s nonparametric test for correlation is given in the following box:

Nonparametric Spearman’s Rank Correlation Test

Let ρ represent the population rank correlation coefficient.

Test statistic: rs, the sample rank correlation coefficient (see formulas in the previous box)

Ties: Assign tied measurements the average of the ranks they would receive if they were unequal but occurred in successive order. For example, if the third-ranked and fourth-ranked measurements are tied, assign each a rank of (3+4)/2 = 3.5. (The number of ties should be small relative to the total sample size, n.)

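                     One-Tailed Tests                                   Two-Tailed Test
                     $H_0\colon \rho = 0$      $H_0\colon \rho = 0$       $H_0\colon \rho = 0$
                     $H_a\colon \rho < 0$      $H_a\colon \rho > 0$       $H_a\colon \rho \neq 0$
Rejection region:    $r_s < -r_{s,\alpha}$     $r_s > r_{s,\alpha}$       $|r_s| > r_{s,\alpha/2}$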

Decision: Reject H0 if test statistic falls into the rejection region where rs,α is obtained from Table XI of Appendix B.
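The three rejection regions in the box can be expressed as a single decision rule. The sketch below is an added illustration; the function name and the `alternative` labels are assumptions, and the caller supplies the critical value read from Table XI (the α column for a one-tailed test, the α/2 column for a two-tailed test).

```python
def spearman_decision(rs, critical, alternative):
    """Return True if H0: rho = 0 is rejected.

    critical    -- positive table value: r_{s,alpha} (one-tailed) or r_{s,alpha/2} (two-tailed)
    alternative -- "less" (Ha: rho < 0), "greater" (Ha: rho > 0), or "two-sided" (Ha: rho != 0)
    """
    if alternative == "less":
        return rs < -critical
    if alternative == "greater":
        return rs > critical
    if alternative == "two-sided":
        return abs(rs) > critical
    raise ValueError("alternative must be 'less', 'greater', or 'two-sided'")

# Example 9.9: n = 15, alpha = .05, lower-tailed, critical value .441 from Table 9.9.
print(spearman_decision(-0.42, 0.441, "less"))       # False
# The same statistic in a two-tailed test at alpha = .05 uses the .025 column (.525).
print(spearman_decision(-0.42, 0.525, "two-sided"))  # False
# Table 9.7 ranks (r_s = .94, n = 10) in an upper-tailed test at alpha = .05 (.564).
print(spearman_decision(0.94, 0.564, "greater"))     # True
```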

Conditions Required for a Valid Spearman’s Test

  1. The sample of experimental units on which the two variables are measured is randomly selected.

  2. The probability distributions of the two variables are continuous.

Exercises 9.124–9.139

Understanding the Principles

  1. 9.124 What conditions are required for a valid Spearman’s test?

  2. 9.125 What is the value of rs when there is perfect negative rank correlation between two variables? Perfect positive rank correlation?

Learning the Mechanics

  1. 9.126 Specify the rejection region for Spearman’s nonparametric test for rank correlation in each of the following situations:

    1. $H_0\colon \rho = 0$, $H_a\colon \rho \neq 0$, $n = 10$, $\alpha = .05$

    2. H0:ρ=0,Ha:ρ>0,n=20,α=.025

    3. H0:ρ=0,Ha:ρ<0,n=30,α=.01

  2. 9.127 Compute Spearman’s rank correlation coefficient for each of the following pairs of sample observations:

    1. x   33   61   20   19   40
       y   26   36   65   25   35

    2. x   89   102   120   137   41
       y   81   94    75    52    136

    3. x   2    15   4    10
       y   11   2    15   21

    4. x   5    20   15   10   3
       y   80   83   91   82   87

  3. L09128 9.128 The following sample data were collected on variables x and y:

    x 0 3 0 4 3 0 4
    y 0 2 2 0 3 1 2
    1. Specify the null and alternative hypotheses that should be used in conducting a hypothesis test to determine whether the variables x and y are correlated.

    2. Conduct the test of part a, using α=.05.

    3. What is the approximate p-value of the test of part b?

    4. What assumptions are necessary to ensure the validity of the test of part b?

Applying the Concepts—Basic

  1. MOON 9.129 Measuring the moon’s orbit. Refer to the American Journal of Physics (Apr. 2014) study of the moon’s orbit, Exercise 9.23 (p. 513). Recall that photographs were used to measure the angular size (in pixels) of the moon at various distances (heights) above the horizon (measured in degrees). The data for 13 different heights are reproduced in the table.

    1. Rank the angular sizes from smallest to largest.

    2. Rank the heights above the horizon from smallest to largest.

    3. Use the results, parts a and b, to compute Spearman’s rank correlation coefficient, rs.

    4. Carry out a test to determine if angular size and height above horizon have a positive rank correlation. Test using α=.05.

      Angle Height
      321.9 17
      322.3 18
      322.4 26
      323.2 32
      323.4 38
      324.4 42
      325.0 49
      325.7 52
      325.8 57
      325.0 60
      326.9 63
      326.0 67
      325.8 73
  2. ANTS 9.130 Mongolian desert ants. Refer to the Journal of Biogeography (Dec. 2003) study of ants in Mongolia, presented in Exercise 9.26 (p. 514). Data on annual rainfall, maximum daily temperature, and number of ant species recorded at each of 11 study sites are reproduced in the table below.

    Site   Region        Annual Rainfall (mm)   Max. Daily Temp. (°C)   Number of Ant Species
    1      Dry Steppe    196                    5.7                     3
    2      Dry Steppe    196                    5.7                     3
    3      Dry Steppe    179                    7.0                     52
    4      Dry Steppe    197                    8.0                     7
    5      Dry Steppe    149                    8.5                     5
    6      Gobi Desert   112                    10.7                    49
    7      Gobi Desert   125                    11.4                    5
    8      Gobi Desert   99                     10.9                    4
    9      Gobi Desert   125                    11.4                    4
    10     Gobi Desert   84                     11.4                    5
    11     Gobi Desert   115                    11.4                    4

    Based on Pfeiffer, M., et al. “Community organization and species richness of ants in Mongolia along an ecological gradient from steppe to Gobi desert.” Journal of Biogeography, Vol. 30, No. 12, Dec. 2003 (Tables 1 and 2).

    1. Consider the data for the five sites in the Dry Steppe region only. Rank the five annual rainfall amounts. Then rank the five maximum daily temperature values.

    2. Use the ranks from part a to find and interpret the rank correlation between annual rainfall (y) and maximum daily temperature (x).

    3. Repeat parts a and b for the six sites in the Gobi Desert region.

    4. Now consider the rank correlation between the number of ant species (y) and annual rainfall (x). Using all the data, compute and interpret Spearman’s rank correlation statistic.

  3. POLO 9.131 Game performance of water polo players. Refer to the Biology of Sport (Vol. 31, 2014) study of the physiological performance of top-level water polo players, Exercise 9.24 (p. 513). Two variables were measured for each of eight Olympic male water polo players during competition: mean heart rate over the four quarters of the game (expressed as a percentage of maximum heart rate) and maximal oxygen uptake (VO2max). Simulated data are shown in the accompanying table. Spearman’s rank correlation between the two variables is shown in the accompanying SAS printout.

    Player HR% VO2Max
    1 55 148
    2 54 157
    3 70 160
    4 67 179
    5 74 179
    6 77 180
    7 78 194
    8 85 197

    1. Locate the value of rs on the printout and interpret its value.

    2. Rank the data and then calculate rs to verify the result shown on the printout.

    3. Is there sufficient evidence (at α=.01) of positive rank correlation between mean heart rate and maximal oxygen uptake in the population of Olympic male water polo players?

  4. TRAPS 9.132 Lobster fishing study. Refer to the Bulletin of Marine Science (Apr. 2010) study of teams of fishermen fishing for the red spiny lobster in Baja California Sur, Mexico, Exercise 9.63 (p. 529). Recall that two variables measured for each of 8 teams from the Punta Abreojos (PA) fishing cooperative were total catch of lobsters (in kilograms) during the season and average percentage of traps allocated per day to exploring areas of unknown catch (called search frequency). These data are reproduced in the table on p. 529.

    Total Catch Search Frequency
    2,785 35
    6,535 21
    6,695 26
    4,891 29
    4,937 23
    5,727 17
    7,019 21
    5,735 20

    From Shester, G. G. “Explaining catch variation among Baja California lobster fishers through spatial analysis of trap-placement decisions.” Bulletin of Marine Science, Vol. 86, No. 2, Apr. 2010 (Table 1). Reprinted with permission from the University of Miami, Bulletin of Marine Science.

    1. Rank the total catch values from 1 to 8.

    2. Rank the search frequency values from 1 to 8.

    3. Use the ranks, parts a and b, to compute Spearman’s rank correlation coefficient.

    4. Based on the result, part c, is there sufficient evidence to indicate that total catch is negatively rank correlated with search frequency? Test using α=.05.

  5. BOXING2 9.133 Effect of massage on boxers. Refer to the British Journal of Sports Medicine (Apr. 2000) study of the effect of massaging boxers between rounds, presented in Exercise 9.70 (p. 530). Two variables measured on the boxers were blood lactate level (y) and the boxer’s perceived recovery (x). The data for 16 five-round boxing performances are reproduced in the table.

    Blood Lactate Level Perceived Recovery
    3.8 7
    4.2 7
    4.8 11
    4.1 12
    5.0 12
    5.3 12
    4.2 13
    2.4 17
    3.7 17
    5.3 17
    5.8 18
    6.0 18
    5.9 21
    6.3 21
    5.5 20
    6.5 24

    Based on Hemmings, B., Smith, M., Graydon, J., and Dyson, R. “Effects of massage on physiological restoration, perceived recovery, and repeated sports performance.” British Journal of Sports Medicine, Vol. 34, No. 2, Apr. 2000 (data adapted from Figure 3).

    1. Rank the values of the 16 blood lactate levels.

    2. Rank the values of the 16 perceived recovery values.

    3. Use the ranks from parts a and b to compute Spearman’s rank correlation coefficient. Give a practical interpretation of the result.

    4. Find the rejection region for a test to determine whether y and x are rank correlated. Use α=.10.

    5. What is the conclusion of the test you conducted in part d? State your answer in the words of the problem.

Applying the Concepts—Intermediate

  1. NAME2 9.134 The “name game.” Refer to the Journal of Experimental Psychology—Applied (June 2000) study in which the “name game” was used to help groups of students learn the names of other students in the group, presented in Exercise 9.34 (p. 517). Recall that one goal of the study was to investigate the relationship between proportion y of names recalled by a student and position (order x) of the student during the game. The data for 144 students in the first eight positions are saved in the NAME2 file. The first five and last five observations in the data set are listed in the table, followed by a SAS printout on p. 562.

    1. To properly apply the parametric test for correlation on the basis of the Pearson coefficient of correlation, r (Section 9.5), both the x and y variables must be normally distributed. Demonstrate that this assumption is violated for these data. What are the consequences of the violation?

    2. Find Spearman’s rank correlation coefficient on the accompanying SAS printout and interpret its value.

    3. Find the observed significance level for testing for zero rank correlation on the SAS printout, and interpret its value.

    4. At α=.05, is there sufficient evidence of rank correlation between proportion y of names recalled by a student and position (order x) of the student during the game?

    Data for Exercise 9.134

    Position Recall
    2 0.04
    2 0.37
    2 1.00
    2 0.99
    2 0.79
    9 0.72
    9 0.88
    9 0.46
    9 0.54
    9 0.99

    Based on Morris, P. E., and Fritz, C. O. “The name game: Using retrieval practice to improve the learning of names.” Journal of Experimental Psychology—Applied, Vol. 6, No. 2, June 2000 (data simulated from Figure 2).

    SAS Output for Exercise 9.134

  2. MTBE 9.135 Groundwater contamination of wells. Environmental Science & Technology (Jan. 2005) published an investigation of MTBE contamination in 70 New Hampshire groundwater wells. The researchers wanted an estimate of the correlation between the MTBE level of a groundwater well and the depth (in feet) of a well. Because MTBE level is not normally distributed, they employed Spearman’s rank correlation method. Also, because earlier analyses indicated that public and private wells have different MTBE distributions, the rank correlation was computed separately for each well class. The SPSS printout of the analysis is shown below. Interpret the results practically.

  1. PUNISH 9.136 Do nice guys finish first or last? Refer to the Nature (Mar. 20, 2008) study of whether the saying “nice guys finish last” applies to the competitive corporate world, Exercise 9.22 (p. 512). Recall that college students repeatedly played a version of the game “prisoner’s dilemma,” where competitors choose cooperation, defection, or costly punishment. At the conclusion of the games, the researchers recorded the average payoff and the number of times punishment was used for each player. The data in the table are representative of the data obtained in the study. The researchers concluded that “punishers tend to have lower payoffs.” Do you agree? Use Spearman’s rank correlation statistic to support your conclusion.

    Punish Payoff
    0 0.50
    1 0.20
    2 0.30
    3 0.25
    4 0.00
    5 0.30
    6 0.10
    8 0.20
    10 0.15
    12 0.30
    14 0.10
    16 0.20
    17 0.25
  2. TASTE 9.137 Taste-testing scales. Refer to the Journal of Food Science (Feb. 2014) taste-testing study, Exercise 9.86 (p. 539). Recall that a sample of 200 subjects used the perceived hedonic intensity (PHI) scale to rate their most favorite and least favorite food. In addition, each rated the sensory intensity of four solutions using the average perceived sensory intensity (PSI) scale. The SPSS printout below shows the Spearman rank correlation between perceived sensory intensity (PSI) and perceived hedonic intensity for both favorite (PHI-F) and least favorite (PHI-L) foods. According to the researchers, “the palatability of the favorite and least favorite foods varies depending on the perceived intensity of taste: Those who experience the greatest taste intensity (that is, supertasters) tend to experience more extreme food likes and dislikes.” In Exercise 9.86, you used Pearson correlations to support the accuracy of this statement. Now examine the Spearman correlations. Do these values also support the statement? Explain.

  3. EMPATHY 9.138 Pain empathy and brain activity. Refer to the Science (Feb. 20, 2004) study on the relationship between brain activity and pain-related empathy in persons who watch others in pain, presented in Exercise 9.72 (p. 531). Recall that 16 female partners watched while painful stimulation was applied to the finger of their respective male partners. The two variables of interest were y = female’s pain-related brain activity (measured on a scale ranging from −2 to 2) and x = female’s score on the Empathic Concern Scale (0 to 25 points). The data are reproduced in the accompanying table. Use Spearman’s rank correlation test to answer the research question “Do people scoring higher in empathy show higher pain-related brain activity?”

    Couple Brain Activity (y) Empathic Concern (x)
    1 .05 12
    2 .03 13
    3 .12 14
    4 .20 16
    5 .35 16
    6 0 17
    7 .26 17
    8 .50 18
    9 .20 18
    10 .21 18
    11 .45 19
    12 .30 20
    13 .20 21
    14 .22 22
    15 .76 23
    16 .35 24

    Based on Singer, T., et al. “Empathy for pain involves the affective but not sensory components of pain.” Science, Vol. 303, Feb. 20, 2004. (Adapted from Figure 4.)

  4. SCHEALTH 9.139 Food availability at middle schools. Refer to the Journal of School Health (Dec. 2009) study of identifying and quantifying food items in the a la carte line at a middle school, Exercise 7.99 (p. 420). Recall that two methods were compared—a detailed inventory approach and a checklist approach—for a sample of 36 middle schools. The data on percent of food items deemed healthy for each subject is saved in the SCHEALTH file. Use Spearman’s rank correlation coefficient to measure the strength of the association between the percentage determined using the inventory method and the percentage found using the checklist method. Conduct a test (at α=.05) for positive rank correlation.
