2.8 Graphing Bivariate Relationships (Optional)

Teaching Tip

Draw different scatterplots to illustrate the difference between positive and negative correlation.

The claim is often made that the crime rate and the unemployment rate are “highly correlated.” Another popular belief is that smoking and lung cancer are “related.” Some people even believe that the Dow Jones Industrial Average and the lengths of fashionable skirts are “associated.” The words correlated, related, and associated imply a relationship between two variables—in the examples just mentioned, two quantitative variables.

One way to describe the relationship between two quantitative variables, called a bivariate relationship, is to construct a scatterplot. A scatterplot is a two-dimensional plot, with one variable’s values plotted along the vertical axis and the other’s along the horizontal axis. For example, Figure 2.35 is a scatterplot relating (1) the cost of mechanical work (heating, ventilating, and plumbing) to (2) the floor area of the building for a sample of 26 factory and warehouse buildings. Note that the scatterplot suggests a general tendency for mechanical cost to increase as building floor area increases.

Figure 2.35

Scatterplot of cost vs. floor area

When an increase in one variable is generally associated with an increase in the second variable, we say that the two variables are “positively related” or “positively correlated.” Figure 2.35 implies that mechanical cost and floor area are positively correlated. Alternatively, if one variable has a tendency to decrease as the other increases, we say the variables are “negatively correlated.” Figure 2.36 shows several hypothetical scatterplots that portray a positive bivariate relationship (Figure 2.36a), a negative bivariate relationship (Figure 2.36b), and a situation in which the two variables are unrelated (Figure 2.36c).

Figure 2.36

Hypothetical bivariate relationships
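The verbal definitions of positive and negative correlation can be made concrete with a small counting sketch. The pairwise count below is not a method used in this section, and the three data sets are invented purely for illustration. For every pair of observations, the function checks whether the two variables moved in the same direction or in opposite directions. Mostly same-direction pairs suggest a positive relationship, mostly opposite-direction pairs suggest a negative one, and a roughly even split suggests no clear relationship.

```python
from itertools import combinations


def pair_counts(points):
    """Count pairs of observations that move in the same / opposite direction.

    points is a list of (x, y) tuples. Pairs that tie on x or on y are
    counted in neither total.
    """
    same = opposite = 0
    for (x1, y1), (x2, y2) in combinations(points, 2):
        product = (x2 - x1) * (y2 - y1)
        if product > 0:      # x and y both rose, or both fell
            same += 1
        elif product < 0:    # one rose while the other fell
            opposite += 1
    return same, opposite


# Hypothetical data sets (invented for this sketch, not taken from Figure 2.36).
rising = [(1, 2), (2, 3), (3, 5), (4, 4), (5, 7)]
falling = [(1, 9), (2, 7), (3, 8), (4, 4), (5, 2)]
scattered = [(1, 5), (2, 2), (3, 6), (4, 3), (5, 5)]

for name, data in [("rising", rising), ("falling", falling), ("scattered", scattered)]:
    same, opposite = pair_counts(data)
    print(f"{name:10s} same-direction pairs: {same}  opposite-direction pairs: {opposite}")
```

The counts only summarize direction. They say nothing about how reliable the pattern is, which is the same limitation that applies to reading a scatterplot by eye.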

Example 2.19 Graphing Bivariate Data—Hospital Application

Problem

  1. A medical item used in caring for a hospital patient is called a factor. For example, factors can be intravenous (IV) tubing, IV fluid, needles, shave kits, bedpans, diapers, dressings, medications, and even code carts. The coronary care unit at Bayonet Point Hospital (St. Petersburg, Florida) recently investigated the relationship between the number of factors administered per patient and the patient’s length of stay (in days). Data on these two variables for a sample of 50 coronary care patients are given in Table 2.9. Use a scatterplot to describe the relationship between the two variables of interest: number of factors and length of stay.

    Table 2.9 Medfactors Data on Patients’ Factors and Length of Stay

    Number of Factors Length of Stay (days)
    231 9
    323 7
    113 8
    208 5
    162 4
    117 4
    159 6
    169 9
     55 6
     77 3
    103 4
    147 6
    230 6
     78 3
    525 9
    121 7
    248 5
    233 8
    260 4
    224 7
    472 12
    220 8
    383 6
    301 9
    262 7
    354 11
    142 7
    286 9
    341 10
    201 5
    158 11
    243 6
    156 6
    184 7
    115 4
    202 6
    206 5
    360 6
     84 3
    331 9
    302 7
     60 2
    110 2
    131 5
    364 4
    180 7
    134 6
    401 15
    155 4
    338 8

    Based on Bayonet Point Hospital, Coronary Care Unit.

    Data Set: MEDFAC

    Figure 2.37

    SPSS scatterplot for medical factors data in Table 2.9

Solution

  1. Rather than construct the plot by hand, we resort to a statistical software package. The SPSS plot of the data in Table 2.9, with length of stay (LOS) on the vertical axis and number of factors (FACTORS) on the horizontal axis, is shown in Figure 2.37.

    Although the plotted points exhibit a fair amount of variation, the scatterplot clearly shows an increasing trend. It appears that a patient’s length of stay is positively correlated with the number of factors administered to the patient.
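For readers without access to SPSS, the following is a minimal sketch of how the same plot could be roughed out with plain Python. It is not the procedure used to produce Figure 2.37. The function name `text_scatter` and the grid size are assumptions made for this illustration, and the function assumes that each variable takes at least two distinct values. The data are the 50 pairs from Table 2.9, with length of stay on the vertical axis and number of factors on the horizontal axis.

```python
# Table 2.9 (MEDFAC): (number of factors, length of stay in days), 50 patients.
DATA = [
    (231, 9), (323, 7), (113, 8), (208, 5), (162, 4),
    (117, 4), (159, 6), (169, 9), (55, 6), (77, 3),
    (103, 4), (147, 6), (230, 6), (78, 3), (525, 9),
    (121, 7), (248, 5), (233, 8), (260, 4), (224, 7),
    (472, 12), (220, 8), (383, 6), (301, 9), (262, 7),
    (354, 11), (142, 7), (286, 9), (341, 10), (201, 5),
    (158, 11), (243, 6), (156, 6), (184, 7), (115, 4),
    (202, 6), (206, 5), (360, 6), (84, 3), (331, 9),
    (302, 7), (60, 2), (110, 2), (131, 5), (364, 4),
    (180, 7), (134, 6), (401, 15), (155, 4), (338, 8),
]


def text_scatter(points, rows=12, cols=50):
    """Return a rough text scatterplot of (x, y) points.

    x runs left to right and y runs bottom to top. Points that fall in
    the same grid cell share a single '*'.
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * cols for _ in range(rows)]
    for x, y in points:
        # Rescale each value to a column / row index on the grid.
        col = round((x - x_min) / (x_max - x_min) * (cols - 1))
        row = round((y - y_min) / (y_max - y_min) * (rows - 1))
        grid[rows - 1 - row][col] = "*"  # grid[0] is the top line
    body = "\n".join("|" + "".join(line) for line in grid)
    return body + "\n+" + "-" * cols


print("LOS (vertical) vs. FACTORS (horizontal)")
print(text_scatter(DATA))
```

The coarse grid hides overlapping points, so this is only a quick look at the trend. A graphics library gives a proper figure; for example, matplotlib's `pyplot.scatter` accepts the two lists of values directly.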

Look Back

If hospital administrators can be confident that the sample trend shown in Figure 2.37 accurately describes the trend in the population, then they may use this information to improve their forecasts of lengths of stay for future patients.

Now Work Exercise 2.157

The scatterplot is a simple, but powerful, tool for describing a bivariate relationship. However, keep in mind that it is only a graph. No measure of reliability can be attached to inferences made about bivariate populations based on scatterplots of sample data. The statistical tools that enable us to make inferences about bivariate relationships are presented in Chapter 11.

Exercises 2.154–2.169

Understanding the Principles

  1. 2.154 Define a bivariate relationship.

  2. 2.155 For what types of variables, quantitative or qualitative, are scatterplots useful?

  3. 2.156 What is the difference between positive association and negative association as it pertains to the relationship between two variables?

Learning the Mechanics

  1. L02157 2.157 Construct a scatterplot for the data in the table that follows. Do you detect a trend?

    Variable #1:  5 3 1 2 7 6 4 0  8
    Variable #2: 14 3 10 1 8 5 3 2 12
  2. L02158 2.158 Construct a scatterplot for the data in the table that follows. Do you detect a trend?

    Variable #1: .5 1 1.5 2 2.5  3 3.5  4 4.5  5
    Variable #2: 2 1   3 4   6 10   9 12 17 17

Applying the Concepts—Basic

  1. SAT 2.159 Comparing SAT scores. Refer to Exercise 2.46 (p. 52) and the data on state SAT scores saved in the SAT file. Consider a scatterplot for the data, with 2014 SAT score on the vertical axis and 2011 SAT score on the horizontal axis. What type of trend would you expect to observe? Why? Create the scatterplot and check your answer.

  2. PARKS 2.160 Does elevation impact hitting performance in baseball? The Colorado Rockies play their major league home baseball games in Coors Field, Denver. Each year, the Rockies are among the leaders in team batting statistics (e.g., home runs, batting average, and slugging percentage). Many baseball experts attribute this phenomenon to the “thin air” of Denver—called the “mile-high city” due to its elevation. Chance (Winter 2006) investigated the effects of elevation on slugging percentage in Major League Baseball. Data were compiled on players’ composite slugging percentage at each of 29 cities for a recent season as well as each city’s elevation (feet above sea level). (Selected observations are shown in the next table.) Construct a scatterplot for the data. Do you detect a trend?

    City Slug Pct Elevation
    Anaheim .480 160
    Arlington .605 616
    Atlanta .530 1050
    Baltimore .505 130
    Boston .505 20
    Denver .625 5277
    Seattle .550 350
    San Francisco .510 63
    St. Louis .570 465
    Tampa .500 10
    Toronto .535 566

    Based on Schaffer, J., and Heiny, E. L. “The effects of elevation on slugging percentage in Major League Baseball.” Chance, Vol. 19, No. 1, Winter 2006 (Figure 2).

  3. TRAPS 2.161 Lobster trap placement. Strategic placement of lobster traps is one of the keys to success for a lobster fisherman. An observational study of teams fishing for the red spiny lobster in Baja California Sur, Mexico, was conducted and the results published in Bulletin of Marine Science (Apr. 2010). Two variables measured for each of 8 teams from the Punta Abreojos (PA) fishing cooperative were total catch of lobsters (in kilograms) during the season and average percentage of traps allocated per day to exploring areas of unknown catch (called search frequency). These data are listed in the accompanying table. Graph the data in a scatterplot. What type of trend, if any, do you observe?

    Lobster fishing study data

    Total Catch Search Frequency
    2,785 35
    6,535 21
    6,695 26
    4,891 29
    4,937 23
    5,727 17
    7,019 21
    5,735 20

    Source: Shester, G. G. “Explaining catch variation among Baja California lobster fishers through spatial analysis of trap-placement decisions.” Bulletin of Marine Science, Vol. 86, No. 2, Apr. 2010 (Table 1).

  4. BREAM 2.162 Feeding behavior of fish. The feeding behavior of black bream (a type of fish) spawned in aquariums was studied in Brain, Behavior and Evolution (Apr. 2000). Zoologists recorded the number of aggressive strikes of two black bream feeding at the bottom of the aquarium in the 10-minute period following the addition of food. The number of strikes and age of the fish (in days) were recorded approximately each week for nine weeks, as shown in the table below.

    Week Number of Strikes Age of Fish (days)
    1 85 120
    2 63 136
    3 34 150
    4 39 155
    5 58 162
    6 35 169
    7 57 178
    8 12 184
    9 15 190

    Based on Shand, J., et al. “Variability in the location of the retinal ganglion cell area centralis is correlated with ontogenetic changes in feeding behavior in the Blackbream, Acanthopagrus butcheri.” Brain, Behavior and Evolution, Vol. 55, No. 4, Apr. 2000 (Figure 10H).

    1. Construct a scatterplot of the data, with number of strikes on the y-axis and age of the fish on the x-axis.

    2. Examine the scatterplot of part 1. Do you detect a trend?

  5. BTYPE 2.163 New method for blood typing. In Analytical Chemistry (May 2010), medical researchers tested a new method of typing blood using low-cost paper. Blood drops were applied to the paper and the rate of absorption (called blood wicking) was measured. The table gives the wicking lengths (in millimeters) for six blood drops, each at a different antibody concentration. Construct a plot to investigate the relationship between wicking length and antibody concentration. What do you observe?

    Droplet Length (mm) Concentration
    1 22.50 0.0
    2 16.00 0.2
    3 13.50 0.4
    4 14.00 0.6
    5 13.75 0.8
    6 12.50 1.0

    Based on Khan, M. S. “Paper diagnostic for instant blood typing.” Analytical Chemistry, Vol. 82, No. 10, May 2010 (Figure 4b).

Applying the Concepts—Intermediate

  1. BBALL 2.164 Sound waves from a basketball. Refer to the American Journal of Physics (June 2010) study of sound waves in a spherical cavity, Exercise 2.43 (p. 52). The frequencies of sound waves (estimated using a mathematical formula) resulting from the first 24 resonances (echoes) after striking a basketball with a metal rod are reproduced in the table that follows. Graph the data in a scatterplot, with frequency on the vertical axis and resonance number on the horizontal axis. Since a mathematical formula was used to estimate frequency, the researcher expects an increasing trend with very little variation. Does the graph support the researcher’s theory?

    Based on Russell, D. A. “Basketballs as spherical acoustic cavities.” American Journal of Physics, Vol. 78, No. 6, June 2010 (Table I).

    Data for Exercise 2.164

    Resonance Frequency
     1  979
     2 1572
     3 2113
     4 2122
     5 2659
     6 2795
     7 3181
     8 3431
     9 3638
    10 3694
    11 4038
    12 4203
    13 4334
    14 4631
    15 4711
    16 4993
    17 5130
    18 5210
    19 5214
    20 5633
    21 5779
    22 5836
    23 6259
    24 6339

  2. CLIFFS 2.165 Plants that grow on Swiss cliffs. A rare plant that grows on the limestone cliffs of the Northern Swiss Jura mountains was studied in Alpine Botany (Nov. 2012). The researchers collected data from a sample of 12 limestone cliffs. The variables measured for each cliff included the altitude above sea level (meters), plant population size (number of plants growing), and molecular variance (i.e., the variance in molecular weight of the plants). These data are provided in the accompanying table. The researchers are interested in whether either altitude or population size is related to molecular variance.

    Cliff Number Altitude Population Size Molecular Variance
    1 468 147 59.8
    2 589 209 24.4
    3 700  28 42.2
    4 664 177 59.5
    5 876 248 65.8
    6 909  53 17.7
    7 1032  33 12.5
    8 952 114 27.6
    9 832 217 35.9
    10 1099  10 13.3
    11 982   8  3.6
    12 1053  15  3.2

    Source: Rusterholz, H., Aydin, D., and Baur, B. “Population structure and genetic diversity of relict populations of Alyssum montanum on limestone cliffs in the Northern Swiss Jura mountains.” Alpine Botany, Vol. 122, No. 2, Nov. 2012 (Tables 1 and 2).

    1. Use a scatterplot to investigate the relationship between molecular variance and altitude. Do you detect a trend?

    2. Use a scatterplot to investigate the relationship between molecular variance and population size. Do you detect a trend?

  3. FRAG 2.166 Forest fragmentation study. Ecologists classify the cause of forest fragmentation as either anthropogenic (i.e., due to human development activities, such as road construction or logging) or natural in origin (e.g., due to wetlands or wildfire). Conservation Ecology (Dec. 2003) published an article on the causes of fragmentation for 54 South American forests. Using advanced high-resolution satellite imagery, the researchers developed two fragmentation indexes for each forest—one for anthropogenic fragmentation and one for fragmentation from natural causes. The values of these two indexes (where higher values indicate more fragmentation) for five of the forests in the sample are shown in the table below. The data for all 54 forests are saved in the FRAG file.

    Ecoregion (forest) Anthropogenic Index Natural Origin Index
    Araucaria moist forests 34.09 30.08
    Atlantic Coast restingas 40.87 27.60
    Bahia coastal forests 44.75 28.16
    Bahia interior forests 37.58 27.44
    Bolivian Yungas 12.40 16.75

    Based on Wade, T. G., et al. “Distribution and causes of global forest fragmentation.” Conservation Ecology, Vol. 7, No. 2, Dec. 2003 (Table 6).

    1. Ecologists theorize that an approximately linear (straight-line) relationship exists between the two fragmentation indexes. Graph the data for all 54 forests. Does the graph support the theory?

    2. Delete the data for the three forests with the largest anthropogenic indexes, and reconstruct the graph of part 1. Comment on the ecologists’ theory.

  4. ANTS 2.167 Mongolian desert ants. Refer to the Journal of Biogeography (Dec. 2003) study of ants in Mongolia, presented in Exercise 2.68 (p. 63). Data on annual rainfall, maximum daily temperature, percentage of plant cover, number of ant species, and species diversity index recorded at each of 11 study sites are saved in the ANTS file.

    1. Construct a scatterplot to investigate the relationship between annual rainfall and maximum daily temperature. What type of trend (if any) do you detect?

    2. Use scatterplots to investigate the relationship that annual rainfall has with each of the other four variables in the data set. Are the other variables positively or negatively related to rainfall?

Applying the Concepts—Advanced

  1. LSPILL 2.168 Spreading rate of spilled liquid. A contract engineer at DuPont Corp. studied the rate at which a spilled volatile liquid will spread across a surface (Chemical Engineering Progress, Jan. 2005). Suppose that 50 gallons of methanol spills onto a level surface outdoors. The engineer uses derived empirical formulas (assuming a state of turbulence-free convection) to calculate the mass (in pounds) of the spill after a period ranging from 0 to 60 minutes. The calculated mass values are given in the table that follows. Is there evidence to indicate that the mass of the spill tends to diminish as time increases?

    Data for Exercise 2.168

    Time (minutes) Mass (pounds)
     0 6.64
     1 6.34
     2 6.04
     4 5.47
     6 4.94
     8 4.44
    10 3.98
    12 3.55
    14 3.15
    16 2.79
    18 2.45
    20 2.14
    22 1.86
    24 1.60
    26 1.37
    28 1.17
    30 0.98
    35 0.60
    40 0.34
    45 0.17
    50 0.06
    55 0.02
    60 0.00

    Based on Barry, J. “Estimating rates of spreading and evaporation of volatile liquids.” Chemical Engineering Progress, Vol. 101, No. 1, Jan. 2005.

  2. PGA 2.169 Ranking driving performance of professional golfers. Refer to The Sport Journal (Winter 2007) analysis of a new method for ranking the total driving performance of golfers on the PGA tour, Exercise 2.66 (p. 62). Recall that the method uses both the average driving distance (in yards) and the driving accuracy (percent of drives that land in the fairway). Data on these two variables for the top 40 PGA golfers are saved in the PGA file. A professional golfer is practicing a new swing to increase his average driving distance. However, he is concerned that his driving accuracy will be lower. Is his concern reasonable? Explain.
