4.6 Descriptive Methods for Assessing Normality

In the chapters that follow, we learn how to make inferences about the population on the basis of information contained in the sample. Several of these techniques are based on the assumption that the population is approximately normally distributed. Consequently, it will be important to determine whether the sample data come from a normal population before we can apply these techniques properly.

A number of descriptive methods can be used to check for normality. In this section, we consider the four methods summarized in the following box:

Determining whether the Data Are from an Approximately Normal Distribution

  1. Construct either a histogram or stem-and-leaf display for the data, and note the shape of the graph. If the data are approximately normal, the shape of the histogram or stem-and-leaf display will be similar to the normal curve shown in Figure 4.15 (i.e., the display will be mound shaped and symmetric about the mean).

  2. Compute the intervals x̄±s, x̄±2s, and x̄±3s, and determine the percentage of measurements falling into each. If the data are approximately normal, the percentages will be approximately equal to 68%, 95%, and 100%, respectively.

  3. Find the interquartile range IQR and standard deviation s for the sample, and then calculate the ratio IQR/s. If the data are approximately normal, then IQR/s≈1.3.

  4. Construct a normal probability plot for the data. If the data are approximately normal, the points will fall (approximately) on a straight line.

The first two methods come directly from the properties of a normal distribution established in Section 4.5. Method 3 is based on the fact that for normal distributions, the z values corresponding to the 25th and 75th percentiles are −.67 and .67, respectively. (See Example 4.17.) Since σ=1 for a standard normal distribution,

\[
\frac{\text{IQR}}{\sigma} = \frac{Q_U - Q_L}{\sigma} = \frac{.67 - (-.67)}{1} = 1.34
\]
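The ratio can be checked numerically. The following is a minimal Python sketch that assumes only the standard library's `statistics.NormalDist`; the variable names are illustrative and are not part of the text.

```python
from statistics import NormalDist

# Standard normal distribution (mu = 0, sigma = 1).
std_normal = NormalDist(mu=0.0, sigma=1.0)

# z values at the 25th and 75th percentiles.
z_lower = std_normal.inv_cdf(0.25)
z_upper = std_normal.inv_cdf(0.75)

# Because sigma = 1, IQR / sigma is the difference of the two z values.
iqr_over_sigma = (z_upper - z_lower) / std_normal.stdev

print(round(z_lower, 2), round(z_upper, 2), round(iqr_over_sigma, 3))
```

With unrounded z values the ratio is slightly larger than 1.34; the figure in the text comes from rounding each z value to .67 before subtracting.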

The final descriptive method for checking normality is based on a normal probability plot. In such a plot, the observations in a data set are ordered from smallest to largest and are then plotted against the expected z-scores of observations calculated under the assumption that the data come from a normal distribution. When the data are, in fact, normally distributed, a linear (straight-line) trend will result. A nonlinear trend in the plot suggests that the data are nonnormal.

A normal probability plot for a data set is a scatterplot with the ranked data values on one axis and their corresponding expected z-scores from a standard normal distribution on the other axis. [Note: Computation of the expected standard normal z-scores is beyond the scope of this text. Therefore, we will rely on available statistical software packages to generate a normal probability plot.]
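Although the text leaves the computation to software, a rough idea of how the plotted points arise can be sketched in Python. The sketch below is an illustration only: it assumes the approximation (i − .375)/(n + .25) for the plotting positions (often called Blom's formula), which may differ slightly from what MINITAB or SPSS use, and the function names and the small sample are hypothetical. The correlation it computes is an extra numerical summary of straightness, not one of the four checks in the box.

```python
from statistics import NormalDist


def expected_z_scores(n):
    """Approximate expected z-scores for n ordered observations.

    Assumption: plotting positions (i - 0.375) / (n + 0.25); statistical
    packages may use slightly different formulas.
    """
    std_normal = NormalDist()
    return [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]


def probability_plot_points(data):
    """Pair each ranked data value with its approximate expected z-score."""
    ranked = sorted(data)
    return list(zip(ranked, expected_z_scores(len(ranked))))


def linear_correlation(points):
    """Pearson correlation of the plotted points (near 1 means nearly linear)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_z = sum(z for _, z in points) / n
    sxz = sum((x - mean_x) * (z - mean_z) for x, z in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    szz = sum((z - mean_z) ** 2 for _, z in points)
    return sxz / (sxx * szz) ** 0.5


# Hypothetical small sample, used only to illustrate the calls.
sample = [4.1, 5.0, 5.6, 6.2, 7.3]
points = probability_plot_points(sample)
r = linear_correlation(points)
print(points)
print(round(r, 3))
```

Plotting the pairs in `points` as a scatterplot gives the normal probability plot; a correlation well below 1 would correspond to the nonlinear trend described above.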

EPAGAS Example 4.24 Checking for Normal Data—EPA Estimated Gas Mileages

Problem

  1. The EPA mileage ratings on 100 cars, first presented in Chapter 2 (p. 42), are shown in Table 4.6. These data are saved in the EPAGAS file. Numerical and graphical descriptive measures for the data are shown on the MINITAB and SPSS printouts presented in Figures 4.30a–c. Determine whether the EPA mileage ratings are from an approximately normal distribution.

Solution

  1. As a first check, we examine the MINITAB histogram of the data shown in Figure 4.30a. Clearly, the mileages fall into an approximately mound shaped, symmetric distribution centered around the mean of about 37 mpg. Note that a normal curve is superimposed on the figure. Therefore, check #1 in the box indicates that the data are approximately normal.

    Table 4.6 EPA Gas Mileage Ratings for 100 Cars (miles per gallon)

    36.3 41.0 36.9 37.1 44.9 36.8 30.0 37.2 42.1 36.7
    32.7 37.3 41.2 36.6 32.9 36.5 33.2 37.4 37.5 33.6
    40.5 36.5 37.6 33.9 40.2 36.4 37.7 37.7 40.0 34.2
    36.2 37.9 36.0 37.9 35.9 38.2 38.3 35.7 35.6 35.1
    38.5 39.0 35.5 34.8 38.6 39.4 35.3 34.4 38.8 39.7
    36.3 36.8 32.5 36.4 40.5 36.6 36.1 38.2 38.4 39.3
    41.0 31.8 37.3 33.1 37.0 37.6 37.0 38.7 39.0 35.8
    37.0 37.2 40.7 37.4 37.1 37.8 35.9 35.6 36.7 34.5
    37.1 40.3 36.7 37.0 33.9 40.1 38.0 35.2 34.8 39.5
    39.9 36.9 32.9 33.8 39.8 34.0 36.8 35.0 38.1 36.9

    Figure 4.30a

    MINITAB histogram for gas mileage data

    Figure 4.30b

    MINITAB descriptive statistics for gas mileage data

    To apply check #2, we obtain x̄=37 and s=2.4 from the MINITAB printout of Figure 4.30b. The intervals x̄±s, x̄±2s, and x̄±3s are shown in Table 4.7, as is the percentage of mileage ratings that fall into each interval. These percentages agree almost exactly with those from a normal distribution.

    Check #3 in the box requires that we find the ratio IQR/s. From Figure 4.30b, the 25th percentile (labeled Q1 by MINITAB) is Q_L=35.625 and the 75th percentile (labeled Q3 by MINITAB) is Q_U=38.375. Then IQR=Q_U−Q_L=2.75, and the ratio is

    \[
    \frac{\text{IQR}}{s} = \frac{2.75}{2.4} = 1.15
    \]

    Since this value is approximately equal to 1.3, we have further confirmation that the data are approximately normal.

    A fourth descriptive method is to interpret a normal probability plot. An SPSS normal probability plot of the mileage data is shown in Figure 4.30c. Notice that the ordered mileage values (shown on the horizontal axis) fall reasonably close to a straight line when plotted against the expected values from a normal distribution. Thus, check #4 also suggests that the EPA mileage data are approximately normally distributed.

Figure 4.30c

SPSS normal probability plot for gas mileage data

Table 4.7 Describing the 100 EPA Mileage Ratings

Interval                   Percentage in Interval
x̄±s  = (34.6, 39.4)        68
x̄±2s = (32.2, 41.8)        96
x̄±3s = (29.8, 44.2)        99
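Checks #2 and #3 can also be reproduced directly from the data in Table 4.6. The following Python sketch is one way to do so, assuming only the standard library's `statistics` module; the helper `percent_within` is a hypothetical name, and the quartiles rely on the module's default interpolation method, which is assumed here to correspond to the percentile positions MINITAB reports.

```python
from statistics import mean, stdev, quantiles

# EPA mileage ratings from Table 4.6 (100 cars, miles per gallon).
mileage = [
    36.3, 41.0, 36.9, 37.1, 44.9, 36.8, 30.0, 37.2, 42.1, 36.7,
    32.7, 37.3, 41.2, 36.6, 32.9, 36.5, 33.2, 37.4, 37.5, 33.6,
    40.5, 36.5, 37.6, 33.9, 40.2, 36.4, 37.7, 37.7, 40.0, 34.2,
    36.2, 37.9, 36.0, 37.9, 35.9, 38.2, 38.3, 35.7, 35.6, 35.1,
    38.5, 39.0, 35.5, 34.8, 38.6, 39.4, 35.3, 34.4, 38.8, 39.7,
    36.3, 36.8, 32.5, 36.4, 40.5, 36.6, 36.1, 38.2, 38.4, 39.3,
    41.0, 31.8, 37.3, 33.1, 37.0, 37.6, 37.0, 38.7, 39.0, 35.8,
    37.0, 37.2, 40.7, 37.4, 37.1, 37.8, 35.9, 35.6, 36.7, 34.5,
    37.1, 40.3, 36.7, 37.0, 33.9, 40.1, 38.0, 35.2, 34.8, 39.5,
    39.9, 36.9, 32.9, 33.8, 39.8, 34.0, 36.8, 35.0, 38.1, 36.9,
]

x_bar = mean(mileage)
s = stdev(mileage)  # sample standard deviation (divisor n - 1)


def percent_within(data, center, spread, k):
    """Percentage of measurements in the interval center +/- k * spread."""
    inside = [x for x in data if center - k * spread <= x <= center + k * spread]
    return 100 * len(inside) / len(data)


# Check #2: percentages within 1, 2, and 3 standard deviations of the mean.
percentages = [percent_within(mileage, x_bar, s, k) for k in (1, 2, 3)]

# Check #3: IQR / s, using the lower and upper quartiles.
q_lower, _, q_upper = quantiles(mileage, n=4)
iqr_over_s = (q_upper - q_lower) / s

print(round(x_bar, 3), round(s, 3))
print(percentages)
print(q_lower, q_upper, round(iqr_over_s, 2))
```

Because the sketch uses the unrounded value of s rather than 2.4, the ratio IQR/s comes out slightly below the 1.15 computed above; the conclusion is unchanged.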

Look Back

The checks for normality given in the box on p. 210 are simple yet powerful techniques to apply, but they are only descriptive in nature. It is possible (although unlikely) that the data are nonnormal even when the checks are reasonably satisfied. Thus, we should be careful not to claim that the 100 EPA mileage ratings are, in fact, normally distributed. We can only state that it is reasonable to believe that the data are from a normal distribution.*

As we will learn in the next chapter, several inferential methods of analysis require the data to be approximately normal. If the data are clearly nonnormal, inferences derived from the method may be invalid. Therefore, it is advisable to check the normality of the data prior to conducting any analysis.

Statistics in Action Revisited

Assessing whether the Normal Distribution Is Appropriate for Modeling the Super Weapon Hit Data

In Statistics in Action Revisited in Section 4.5, we used the normal distribution to find the probability that a single flechette from a super weapon that shoots 1,100 flechettes at once hits one of three targets at 500 meters. Recall that for three range tests, the weapon was always aimed at the center target (i.e., the specification mean was set at μ=5 feet), but the specification standard deviation was varied at σ=1 foot, σ=2 feet, and σ=4 feet. Table SIA4.1 shows the calculated normal probabilities of hitting the three targets for the different values of σ, as well as the actual results of the three range tests. (Recall that the actual data are saved in the MOAGUN file.) You can see that the proportion of the 1,100 flechettes that actually hit each target—called the hit ratio—agrees very well with the estimated probability of a hit derived from the normal distribution.

Table SIA4.1 Summary of Normal Probability Calculations and Actual Range Test Results

Target             Specification   Normal Probability   Actual Number of Hits   Hit Ratio (Hits/1,100)
LEFT (−1 to 1)     σ=1             .0000                  0                     .000
                   σ=2             .0214                 30                     .027
                   σ=4             .0919                 73                     .066
MIDDLE (4 to 6)    σ=1             .6826                764                     .695
                   σ=2             .3820                409                     .372
                   σ=4             .1974                242                     .220
RIGHT (9 to 11)    σ=1             .0000                  0                     .000
                   σ=2             .0214                 23                     .021
                   σ=4             .0919                 93                     .085

Consequently, it appears that our assumption that the horizontal hit measurements are approximately normally distributed is reasonably satisfied. Further evidence that this assumption is satisfied is provided by the MINITAB histograms of the horizontal hit measurements shown in Figures SIA4.3a–c. The normal curves superimposed on the histograms fit the data very well.

Figure SIA4.3a

MINITAB histogram for the horizontal hit measurements when σ=1

Figure SIA4.3b

MINITAB histogram for the horizontal hit measurements when σ=2

Figure SIA4.3c

MINITAB histogram for the horizontal hit measurements when σ=4

Data Set: MOAGUN

Exercises 4.103–4.124

Understanding the Principles

  1. 4.103 Why is it important to check whether the sample data come from a normal population?

  2. 4.104 Give four methods for determining whether the sample data come from a normal population.

  3. 4.105 If a population data set is normally distributed, what is the proportion of measurements you would expect to fall within the following intervals?

    1. μ±σ

    2. μ±2σ

    3. μ±3σ

  4. 4.106 What is a normal probability plot and how is it used?

Learning the Mechanics

  1. 4.107 Normal probability plots for three data sets are shown at the bottom of the page. Which plot indicates that the data are approximately normally distributed?

  2. 4.108 Consider a sample data set with the following summary statistics: s=95, Q_L=72, and Q_U=195.

    1. Calculate IQR.

    2. Calculate IQR/s.

    3. Is the value of IQR/s approximately equal to 1.3? What does this imply?

  3. L04109 4.109 Examine the following sample data.

    32 48 25 135 53 37 5 39 213 165
    109 40 1 146 39 25 21 66 64 57
    197 60 112 10 155 134 301 304 107 82
    35 81 60 95 401 308 180 3 200 59
    1. Construct a stem-and-leaf plot to assess whether the data are from an approximately normal distribution.

    2. Find the values of Q_L, Q_U, and s for the sample data.

    3. Use the results from part b to assess the normality of the data.

    4. Generate a normal probability plot for the data, and use it to assess whether the data are approximately normal.

  4. L04110 4.110 Examine the following sample data.

    5.9 5.3 1.6 7.4 8.6 3.2 2.1
    4.0 7.3 8.4 5.9 6.7 4.5 6.3
    6.0 9.7 3.5 3.1 4.3 3.3 8.4
    4.6 8.2 6.5 1.1 5.0 9.4 6.4

    Plots for Exercise 4.107

    1. Construct a stem-and-leaf plot to assess whether the data are from an approximately normal distribution.

    2. Compute s for the sample data.

    3. Find the values of Q_L and Q_U, and use them together with the value of s from part b to assess whether the data come from an approximately normal distribution.

    4. Generate a normal probability plot for the data, and use it to assess whether the data are approximately normal.

Applying the Concepts—Basic

  1. ISR 4.111 Irrelevant speech effects. Refer to the analysis of irrelevant speech effects in Exercise 2.34 (p. 49). Irrelevant speech effects refer to the degree to which the memorization process is impaired by irrelevant background speech. In a study published in Acoustical Science & Technology (Vol. 35, 2014), subjects performed a memorization task under two conditions: (1) with irrelevant background speech and (2) in silence. The difference in the error rates for the two conditions—called the relative difference in error rate (RDER)—was computed for each subject. A MINITAB histogram and descriptive statistics printout summarizing the RDER values for 71 subjects are displayed below. Would you recommend using a normal distribution to model the distribution of RDER values? Explain.

    MINITAB output for Exercise 4.111
  2. 4.112 Characteristics of antiwar demonstrators. Refer to the American Journal of Sociology (Jan. 2014) study of the characteristics of antiwar demonstrators in the United States, Exercise 2.106 (p. 77). Based on data collected for over 5,000 antiwar demonstrators over a recent 3-year period, the researchers found that the mean number of protest organizations joined by the demonstrators was .90 with a standard deviation of 1.10 and a median of 1. Explain why x, the number of protest organizations joined by a randomly selected demonstrator, cannot be exactly normally distributed.

  3. 4.113 Software file updates. Software configuration management was used to monitor a software engineering team’s performance at Motorola, Inc. (Software Quality Professional, Nov. 2004). One of the variables of interest was the number of updates to a file changed because of a problem report. Summary statistics for n=421 files yielded the following results: x̄=4.71, s=6.09, Q_L=1, and Q_U=6. Are these data approximately normally distributed? Explain.

  4. JRC 4.114 Shear strength of rock fractures. Understanding the characteristics of rock masses, especially the nature of the fractures, is essential when building dams and power plants. The shear strength of rock fractures was investigated in Engineering Geology (May 12, 2010). The Joint Roughness Coefficient (JRC) was used to measure shear strength. Civil engineers collected JRC data for over 750 rock fractures. The results (simulated from information provided in the article) are summarized in the SPSS histogram shown below. Should the engineers use the normal probability distribution to model the behavior of shear strength for rock fractures? Explain.

    Based on Pooyan Asadollahi and Fulvio Tonon, “Constitutive model for rock fractures: Revisiting Barton’s empirical model.” Engineering Geology, Vol. 113, no. 1, pp. 11–32.

  5. 4.115 Estimating glacier elevations. Digital elevation models (DEMs) are now used to estimate elevations and slopes of remote regions. In Arctic, Antarctic, and Alpine Research (May 2004), geographers analyzed reading errors from maps produced by DEMs. Two readers of a DEM map of White Glacier (in Canada) estimated elevations at 400 points in the area. The difference between the elevation estimates of the two readers had a mean of μ=.28 meter and a standard deviation of σ=1.6 meters. A histogram of the difference (with a normal histogram superimposed on the graph) is shown below.

    1. On the basis of the histogram, the researchers concluded that the difference between elevation estimates is not normally distributed. Why?

    2. Will the interval μ±2σ contain more than 95%, exactly 95%, or less than 95% of the 400 elevation differences? Explain.

      “Uncertainty in digital elevation models of Axel Heiberg Island, Arctic Canada,” Arctic, Antarctic, and Alpine Research, Vol. 36, No. 2, May 2004 (Figure 3). © Regents of the University of Colorado. Reprinted with permission.

  6. TABLET 4.116 Drug content assessment. Scientists at GlaxoSmithKline Medicines Research Center used high-performance liquid chromatography (HPLC) to determine the amount of drug in a tablet produced by the company (Analytical Chemistry, Dec. 15, 2009). Drug concentrations (measured as a percentage) for 50 randomly selected tablets are listed in the accompanying table.

    91.28 92.83 89.35 91.90 82.85 94.83 89.83 89.00 84.62
    86.96 88.32 91.17 83.86 89.74 92.24 92.59 84.21 89.36
    90.96 92.85 89.39 89.82 89.91 92.16 88.67 89.35 86.51
    89.04 91.82 93.02 88.32 88.76 89.26 90.36 87.16 91.74
    86.12 92.10 83.33 87.61 88.20 92.78 86.35 93.84 91.20
    93.44 86.77 83.77 93.19 81.79

    Based on Borman, P. J., et al., “Design and analysis of method equivalence studies.” Analytical Chemistry, Vol. 81, No. 24, Dec. 15, 2009 (Table 3).

    1. Descriptive statistics for the drug concentrations are shown at the top of the following SPSS printout. Use this information to assess whether the data are approximately normal.

      SPSS Output for Exercise 4.116

    2. An SPSS normal probability plot is shown above. Use this information to assess whether the data are approximately normal.

Applying the Concepts—Intermediate

  1. SILICA 4.117 Mineral flotation in water study. Refer to the Minerals Engineering (Vol. 46–47, 2013) study of the impact of calcium and gypsum on the flotation properties of silica in water, Exercise 2.48 (p. 53). Fifty solutions of deionized water were prepared both with and without calcium/gypsum, and the level of flotation of silica in the solution was measured using a variable called zeta potential (measured in millivolts, mV). The data (simulated, based on information provided in the journal article) are reproduced in the tables below. Which of the two zeta potential distributions, without calcium/gypsum or with calcium/gypsum, can be approximated by a normal distribution?

    Data for Exercise 4.117

    Without calcium/gypsum
    47.1 53.0 50.8 54.4 57.4 49.2 51.5 50.2 46.4 49.7
    53.8 53.8 53.5 52.2 49.9 51.8 53.7 54.8 54.5 53.3
    50.6 52.9 51.2 54.5 49.7 50.2 53.2 52.9 52.8 52.1
    50.2 50.8 56.1 51.0 55.6 50.3 57.6 50.1 54.2 50.7
    55.7 55.0 47.4 47.5 52.8 50.6 55.6 53.2 52.3 45.7
    With calcium/gypsum
    9.2 11.6 10.6 8.0 10.9 10.0 11.0 10.7 13.1 11.5
    11.3 9.9 11.8 12.6 8.9 13.1 10.7 12.1 11.2 10.9
    9.1 12.1 6.8 11.5 10.4 11.5 12.1 11.3 10.7 12.4
    11.5 11.0 7.1 12.4 11.4 9.9 8.6 13.6 10.1 11.3
    13.0 11.9 8.6 11.3 13.0 12.2 11.3 10.5 8.8 13.4
  2. HABITAT 4.118 Habitats of endangered species. An evaluation of the habitats of endangered salmon species was performed in Conservation Ecology (Dec. 2003). The researchers identified 734 sites (habitats) for Chinook, coho, or steelhead salmon species in Oregon and assigned a habitat quality score to each site. (Scores range from 0 to 36 points, with lower scores indicating poorly maintained or degraded habitats.) The data are saved in the HABITAT file. Give your opinion on whether the habitat quality score is normally distributed.

  3. MLBAL MLBNL 4.119 Baseball batting averages. Major League Baseball (MLB) has two leagues: the American League (AL), which utilizes the designated hitter (DH) to bat for the pitcher, and the National League (NL), which does not allow the DH. A player’s batting average is computed by dividing the player’s total number of hits by his official number of at bats. The batting averages for all AL and NL players with at least 100 official at bats during the 2013 season are stored in the MLBAL and MLBNL files, respectively. Determine whether each batting average distribution is approximately normal.

  4. PGA 4.120 Ranking the driving performance of professional golfers. Consider The Sport Journal (Winter 2007) article on a new method for ranking the driving performance of PGA golfers, presented in Exercise 2.66 (p. 62). Recall that the method incorporates a golfer’s average driving distance (yards) and driving accuracy (percentage of drives that land in the fairway) into a driving performance index. Data on these three variables for the top 40 PGA golfers are saved in the PGA file. Determine which of the variables—driving distance, driving accuracy, and driving performance index—are approximately normally distributed.

  5. SAND 4.121 Permeability of sandstone during weathering. Consider the Geographical Analysis (Vol. 42, 2010) study of the decay properties of sandstone when exposed to the weather, Exercise 2.69 (p. 63). Blocks of sandstone were cut into 300 equal-sized slices and the slices randomly divided into three groups of 100 slices each. Slices in group A were not exposed to any type of weathering; slices in group B were repeatedly sprayed with a 10% salt solution (to simulate wetting by driven rain) under temperate conditions; and slices in group C were soaked in a 10% salt solution and then dried (to simulate blocks of sandstone exposed during a wet winter and dried during a hot summer). All sandstone slices were then tested for permeability, measured in milliDarcies (mD). The data for the study (simulated) are saved in the SAND file. Is it plausible to assume that the permeability measurements in any of the three experimental groups are approximately normally distributed?

  6. SANIT 4.122 Cruise ship sanitation scores. Consider the data on the Aug. 2013 sanitation scores for 186 cruise ships, presented in Exercise 2.41 (p. 51). The data are saved in the SANIT file. Assess whether the sanitation scores are approximately normally distributed.

Applying the Concepts—Advanced

  1. 4.123 Blond hair types in the Southwest Pacific. Consider the American Journal of Physical Anthropology (Apr. 2014) study of a mutation of blond-hair genotypes, Exercise 2.141 (p. 91). For each of 550 Southwest Pacific islanders, the effect of the mutation on hair pigmentation was measured with the melanin (M) index, where M ranges between −4 and 4. Box plots showing the distribution of M for three different genotypes—CC, CT, and TT—are reproduced below (left column). Use the guidelines of this section and Exercise 5.51 (p. 274) to determine whether any of these distributions are normally distributed.

  2. ECOPHD 4.124 Ranking Ph.D. programs in economics. Refer to Southern Economic Journal (Apr. 2008) rankings of Ph.D. programs in economics at 129 colleges and universities, Exercise 2.129 (p. 83). The number of publications published by faculty teaching in the Ph.D. program and the quality of the publications were used to calculate an overall productivity score for each program. The mean and standard deviation of these 129 productivity scores were then used to compute a z-score for each economics program. The data (z-scores) for all 129 economic programs are saved in the ECOPHD file. A MINITAB normal probability plot for the z-scores is shown in the accompanying printout.

    1. Use the graph to assess whether the data are approximately normal.

    2. Based on the graph, determine the nature of the skewness of the data.

4.7 Approximating a Binomial Distribution with a Normal Distribution (Optional)

When n is large, a normal probability distribution may be used to provide a good approximation to the probability distribution of a binomial random variable. To show how this approximation works, we refer to Example 4.13, in which we used the binomial distribution to model the number x of 20 eligible voters who favor a candidate. We assumed that 60% of all the eligible voters favored the candidate. The mean and standard deviation of x were found to be μ=12 and σ=2.2, respectively. The binomial distribution for n=20 and p=.6 is shown in Figure 4.31, and the approximating normal distribution with mean μ=12 and standard deviation σ=2.2 is superimposed.

Figure 4.31

Binomial distribution for n=20,p=.6 and normal distribution with μ=12,σ=2.2

As part of Example 4.13, we used Table I to find the probability that x≤10. This probability, which is equal to the sum of the areas contained in the rectangles (shown in Figure 4.31) that correspond to p(0), p(1), p(2), …, p(10), was found to equal .245. The portion of the normal curve that would be used to approximate the area p(0)+p(1)+p(2)+⋯+p(10) is highlighted in Figure 4.31. Note that this highlighted area lies to the left of 10.5 (not 10), so we may include all of the probability in the rectangle corresponding to p(10). Because we are approximating a discrete distribution (the binomial) with a continuous distribution (the normal), we call the use of 10.5 (instead of 10 or 11) a correction for continuity. That is, we are correcting the discrete distribution so that it can be approximated by the continuous one. The use of the correction for continuity leads to the calculation of the following standard normal z-value:

\[
z = \frac{x - \mu}{\sigma} = \frac{10.5 - 12}{2.2} = -.68
\]

Using Table II, we find the area between z=0 and z=−.68 to be .2517. Then the probability that x is less than or equal to 10 is approximated by the area under the normal distribution to the left of 10.5, shown highlighted in Figure 4.31. That is,

\[
P(x \le 10) \approx P(z \le -.68) = .5 - P(-.68 < z \le 0) = .5 - .2517 = .2483
\]

The approximation differs only slightly from the exact binomial probability, .245. Of course, when tables of exact binomial probabilities are available, we will use the exact value rather than a normal approximation.
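The comparison between the exact binomial probability and its normal approximation can be sketched in Python. This is an illustration under stated assumptions: it uses only the standard library (`math.comb` and `statistics.NormalDist`), the function names are hypothetical, and σ is left unrounded rather than set to 2.2.

```python
from math import comb, sqrt
from statistics import NormalDist


def binomial_cdf(a, n, p):
    """Exact P(x <= a) for a binomial random variable with parameters n and p."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(a + 1))


def normal_approx_cdf(a, n, p):
    """Approximate P(x <= a) by the normal area to the left of a + .5."""
    mu = n * p
    sigma = sqrt(n * p * (1 - p))
    return NormalDist(mu, sigma).cdf(a + 0.5)


n, p, a = 20, 0.6, 10
exact = binomial_cdf(a, n, p)
approx = normal_approx_cdf(a, n, p)

# For comparison only: the normal area to the left of 10, with no correction.
uncorrected = NormalDist(n * p, sqrt(n * p * (1 - p))).cdf(a)

print(round(exact, 4), round(approx, 4), round(uncorrected, 4))
```

Because σ and z are not rounded here, the approximation differs in the third decimal place from the .2483 obtained with Table II. Dropping the correction for continuity moves the approximation noticeably farther from the exact value.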

The normal distribution will not always provide a good approximation to binomial probabilities. The following is a useful rule of thumb to determine when n is large enough for the approximation to be effective: The interval μ±3σ should lie within the range of the binomial random variable x (i.e., from 0 to n) in order for the normal approximation to be adequate. The rule works well because almost all of the normal distribution falls within three standard deviations of the mean, so if this interval is contained within the range of x values, there is “room” for the normal approximation to work.

As shown in Figure 4.32a for the preceding example with n=20 and p=.6, the interval μ±3σ=12±3(2.2)=(5.4,18.6) lies within the range from 0 to 20. However, if we were to try to use the normal approximation with n=10 and p=.1, the interval μ±3σ becomes 1±3(.95), or (−1.85,3.85). As shown in Figure 4.32b, this interval is not contained within the range of x, since x=0 is the lower bound for a binomial random variable. Note in Figure 4.32b that the normal distribution will not “fit” in the range of x; therefore, it will not provide a good approximation to the binomial probabilities.
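The rule of thumb is easy to automate. The following Python sketch, with a hypothetical function name and unrounded σ, checks whether μ±3σ lies within the range from 0 to n for the two cases just discussed.

```python
from math import sqrt


def three_sigma_interval_fits(n, p):
    """Return (fits, (lower, upper)) for the interval mu +/- 3 sigma.

    fits is True when the whole interval lies within the range 0 to n.
    """
    mu = n * p
    sigma = sqrt(n * p * (1 - p))
    lower, upper = mu - 3 * sigma, mu + 3 * sigma
    return (lower >= 0 and upper <= n), (lower, upper)


for n, p in [(20, 0.6), (10, 0.1)]:
    fits, (lower, upper) = three_sigma_interval_fits(n, p)
    print(n, p, fits, round(lower, 2), round(upper, 2))
```

The first case passes the check and the second fails it, since the lower endpoint of the interval for n=10 and p=.1 is negative.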

Figure 4.32

Rule of thumb for normal approximation to binomial probabilities

Biography Abraham De Moivre (1667–1754)

Advisor to Gamblers

French-born mathematician Abraham de Moivre moved to London when he was 21 years old to escape religious persecution. In England, he earned a living first as a traveling teacher of mathematics and then as an advisor to gamblers, underwriters, and annuity brokers. De Moivre’s major contributions to probability theory are contained in two of his books: The Doctrine of Chances (1718) and Miscellanea Analytica (1730). In these works, he defines statistical independence, develops the formula for the normal probability distribution, and derives the normal curve as an approximation to the binomial distribution. Despite his eminence as a mathematician, de Moivre died in poverty. He is famous for using an arithmetic progression to predict the day of his death.

Example 4.25 Approximating a Binomial Probability with the Normal Distribution—Lot Acceptance Sampling

Problem

  1. One problem with any product that is mass produced (e.g., a graphing calculator) is quality control. The process must be monitored or audited to be sure that the output of the process conforms to requirements. One monitoring method is lot acceptance sampling, in which items being produced are sampled at various stages of the production process and are carefully inspected. The lot of items from which the sample is drawn is then accepted or rejected on the basis of the number of defectives in the sample. Lots that are accepted may be sent forward for further processing or may be shipped to customers; lots that are rejected may be reworked or scrapped. For example, suppose a manufacturer of calculators chooses 200 stamped circuits from the day’s production and determines x, the number of defective circuits in the sample. Suppose that up to a 6% rate of defectives is considered acceptable for the process.

    1. Find the mean and standard deviation of x, assuming that the rate of defectives is 6%.

    2. Use the normal approximation to determine the probability that 20 or more defectives are observed in the sample of 200 circuits (i.e., find the approximate probability that x≥20).

Solution

  1. The random variable x is binomial with n=200 and the fraction defective p=.06. Thus,

    \[
    \mu = np = 200(.06) = 12
    \]
    \[
    \sigma = \sqrt{npq} = \sqrt{200(.06)(.94)} = \sqrt{11.28} = 3.36
    \]

    We first note that

    μ±3σ=12±3(3.36)=12±10.08=(1.92,22.08)

    lies completely within the range from 0 to 200. Therefore, a normal probability distribution should provide an adequate approximation to this binomial distribution.

  2. By the rule of complements, P(x≥20)=1−P(x≤19). To find the approximating area corresponding to x≤19, refer to Figure 4.33. Note that we want to include all the binomial probability histograms from 0 to 19, inclusive. Since the event is of the form x≤a, the proper correction for continuity is a+.5=19+.5=19.5. Thus, the z value of interest is

    Figure 4.33

    Normal approximation to the binomial distribution with n=200,p=.06

    \[
    z = \frac{(a + .5) - \mu}{\sigma} = \frac{19.5 - 12}{3.36} = 2.23
    \]

    Referring to Table II in Appendix B, we find that the area to the right of the mean, 0, corresponding to z=2.23 (see Figure 4.34) is .4871. So the area A=P(z≤2.23) is

    A=.5+.4871=.9871

    Figure 4.34

    Standard normal distribution

    Thus, the normal approximation to the binomial probability we seek is

    \[
    P(x \ge 20) = 1 - P(x \le 19) \approx 1 - .9871 = .0129
    \]

    In other words, if, in fact, the true fraction of defectives is .06, then the probability that 20 or more defectives will be observed in a sample of 200 circuits is extremely small.

Look Back

If the manufacturer observes x≥20, the likely reason is that the process is producing more than the acceptable 6% defectives. The lot acceptance sampling procedure is another example of using the rare-event approach to make inferences.

The steps for approximating a binomial probability by a normal probability are given in the following box:

Using a Normal Distribution to Approximate Binomial Probabilities

  1. After you have determined n and p for the binomial distribution, calculate the interval

    \[
    \mu \pm 3\sigma = np \pm 3\sqrt{npq}
    \]

    If the interval lies in the range from 0 to n, the normal distribution will provide a reasonable approximation to the probabilities of most binomial events.

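  2. Express the binomial probability to be approximated in the form P(x≤a) or P(x≤b)−P(x≤a). For example,

    \[
    \begin{aligned}
    P(x < 3) &= P(x \le 2) \\
    P(x \ge 5) &= 1 - P(x \le 4) \\
    P(7 \le x \le 10) &= P(x \le 10) - P(x \le 6)
    \end{aligned}
    \]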
  3. For each value of interest, a, the correction for continuity is (a+.5) and the corresponding standard normal z value is

    \[
    z = \frac{(a + .5) - \mu}{\sigma} \quad \text{(see Figure 4.35)}
    \]
  4. Sketch the approximating normal distribution and shade the area corresponding to the probability of the event of interest, as in Figure 4.35. Verify that the rectangles you have included in the shaded area correspond to the probability you wish to approximate. Use the z value(s) you calculated in step 3 to find the shaded area with Table II or technology. This is the approximate probability of the binomial event.

    Figure 4.35

    Approximating binomial probabilities by normal probabilities
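Steps 2 and 3 of the box can be sketched as a small set of Python helpers. This is a minimal illustration, not a definitive implementation: the function names are hypothetical, only the standard library is assumed, and the sketch presumes that the check in step 1 has already been carried out for the chosen n and p.

```python
from math import sqrt
from statistics import NormalDist


def approx_at_most(a, n, p):
    """Step 3: approximate P(x <= a) using z = ((a + .5) - mu) / sigma."""
    mu = n * p
    sigma = sqrt(n * p * (1 - p))
    z = ((a + 0.5) - mu) / sigma
    return NormalDist().cdf(z)  # area to the left of z under the standard normal


def approx_at_least(a, n, p):
    """Step 2: P(x >= a) = 1 - P(x <= a - 1)."""
    return 1 - approx_at_most(a - 1, n, p)


def approx_between(a, b, n, p):
    """Step 2: P(a <= x <= b) = P(x <= b) - P(x <= a - 1)."""
    return approx_at_most(b, n, p) - approx_at_most(a - 1, n, p)


# Example 4.25: n = 200 circuits, p = .06, event x >= 20.
p_20_or_more = approx_at_least(20, 200, 0.06)
print(round(p_20_or_more, 4))

# The third illustration in step 2, P(7 <= x <= 10), with n = 20 and p = .6.
print(round(approx_between(7, 10, 20, 0.6), 4))
```

Writing every event in terms of P(x≤a) means the correction for continuity is applied in exactly one place, which makes it harder to add or subtract .5 in the wrong direction.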

Exercises 4.125–4.142

Understanding the Principles

  1. 4.125 For large n (say, n=100), why is it advantageous to use the normal distribution to approximate a binomial probability?

  2. 4.126 Why do we need a correction for continuity when approximating a binomial probability with the normal distribution?

Learning the Mechanics

  1. 4.127 Suppose x is a binomial random variable with p=.4 and n=25.

    1. Would it be appropriate to approximate the probability distribution of x with a normal distribution? Explain.

    2. Assuming that a normal distribution provides an adequate approximation to the distribution of x, what are the mean and variance of the approximating normal distribution?

    3. Use Table I of Appendix B or statistical software to find the exact value of P(x9).

    4. Use the normal approximation to find P(x9).

  2. 4.128 Assume that x is a binomial random variable with n and p as specified in parts a–f that follow. For which cases would it be appropriate to use a normal distribution to approximate the binomial distribution?

    1. n=100,p=.01

    2. n=20,p=.6

    3. n=10,p=.4

    4. n=1,000,p=.05

    5. n=100,p=.8

    6. n=35,p=.7

  3. 4.129 Assume that x is a binomial random variable with n=25 and p=.5. Use Table I of Appendix B and the normal approximation to find the exact and approximate values, respectively, of the following probabilities:

    1. P(x11)

    2. P(x16)

    3. P(8≤x≤16)

  4. 4.130 Assume that x is a binomial random variable with n=1,000 and p=.50. Find each of the following probabilities:

    1. P(x>500)

    2. P(490≤x<500)

    3. P(x>550)

  5. 4.131 Assume that x is a binomial random variable with n=100 and p=.40. Use a normal approximation to find the following:

    1. P(x35)

    2. P(40≤x≤50)

    3. P(x38)

Applying the Concepts—Basic

  1. 4.132 Blood diamonds. According to Global Research News (Mar. 4, 2014), one-fourth of all rough diamonds produced in the world are blood diamonds, i.e., diamonds mined to finance war or an insurgency. In a random sample of 700 rough diamonds purchased by a diamond buyer, let x be the number that are blood diamonds.

    1. Find the mean of x.

    2. Find the standard deviation of x.

    3. Find the z-score for the value x=200.

    4. Find the approximate probability that the number of the 700 rough diamonds that are blood diamonds is less than or equal to 200.

  2. 4.133 Working on summer vacation. In Exercise 4.58 (p. 192), you learned that 35% of U.S. adults do not work at all while on summer vacation (Adweek/Harris July 2011 poll). Now consider a random sample of 1,000 U.S. adults and let x represent the number who do not work during summer vacation.

    1. What is the expected value of x? Interpret this value practically.

    2. Find the variance of x.

    3. Find the approximate probability that x is less than 325.

    4. Would you expect to observe 390 or more U.S. adults in the sample of 1,000 who do not work during summer vacation? Explain.

  3. 4.134 Where will you get your next pet? Refer to Exercise 4.60 (p. 193) and the Associated Press/Petside.com poll that revealed that half of all pet owners would get their next dog or cat from a shelter (USA Today, May 12, 2010). Consider a random sample of 500 pet owners and define x as the number of pet owners who would acquire their next dog or cat from a shelter. In Exercise 4.70, you determined that x is a binomial random variable.

    1. Compute μ and σ for the probability distribution of x.

    2. Compute the z-score for the value x=240.

    3. Compute the z-score for the value x=270.

    4. Use the technique of this section to approximate P(240<x<270).

  4. 4.135 Cesarean birth study. In Exercise 4.62 (p. 193), you learned that 32% of all births in the United States occur by Cesarean section each year (National Vital Statistics Reports, Mar. 2010). In a random sample of 1,000 births this year, let x be the number that occur by Cesarean section.

    1. Find the mean of x. (This value should agree with your answer to Exercise 4.62 a.)

    2. Find the standard deviation of x. (This value should agree with your answer to Exercise 4.62 b.)

    3. Find the z-score for the value x=200.5.

    4. Find the approximate probability that the number of Cesarean sections in a sample of 1,000 births is less than or equal to 200.

  5. 4.136 LASIK surgery complications. According to studies, 1% of all patients who undergo laser surgery (i.e., LASIK) to correct their vision have serious post-laser vision problems (All About Vision, 2014). In a random sample of 100,000 LASIK patients, let x be the number who experience serious post-laser vision problems.

    1. Find E(x).

    2. Find Var(x).

    3. Find the z-score for x=950.

    4. Find the approximate probability that fewer than 950 patients in a sample of 100,000 will experience serious post-laser vision problems.

Applying the Concepts—Intermediate

  1. 4.137 Ecotoxicological survival study. The Journal of Agricultural, Biological and Environmental Statistics (Sept. 2000) gave an evaluation of the risk posed by hazardous pollutants. In the experiment, guppies (all the same age and size) were released into a tank of natural seawater polluted with the pesticide dieldrin and the number of guppies surviving after five days was determined. The researchers estimated that the probability of any single guppy surviving is .60. If 300 guppies are released into the polluted tank, estimate the probability that fewer than 100 guppies survive after five days. 

  2. 4.138 Chemical signals of mice. Refer to the Cell (May 14, 2010) study of the ability of a mouse to recognize the odor of a potential predator, Exercise 4.61 (p. 193). You learned that 40% of lab mice cells exposed to chemically produced major urinary proteins (Mups) from a cat responded positively (i.e., recognized the danger of the lurking predator). Again, consider a sample of 100 lab mice cells, each exposed to chemically produced cat Mups, and let x represent the number of cells that respond positively. How likely is it that fewer than half of the cells respond positively to cat Mups?

  3. 4.139 Fingerprint expertise. Refer to the Psychological Science (Aug. 2011) study of fingerprint identification, Exercise 4.63 (p. 193). Recall that the study found that when presented with prints from the same individual, a novice will correctly identify the match only 75% of the time. Now consider a sample of 120 different pairs of fingerprints where each pair is a match.

    1. Estimate the probability that a novice will correctly identify the match in more than half of all pairs of fingerprints.

    2. Estimate the probability that a novice will correctly identify the match in fewer than 100 pairs of fingerprints.

Applying the Concepts—Advanced

  1. 4.140 Body fat in men. The percentage of fat in the bodies of American men is an approximately normal random variable with mean equal to 15% and standard deviation equal to 2%.

    1. If these values were used to describe the body fat of men in the U.S. Army, and if a measure of 20% or more body fat characterizes the person as obese, what is the approximate probability that a random sample of 10,000 soldiers will contain fewer than 50 who would actually be characterized as obese?

    2. If the Army actually were to check the percentage of body fat for a random sample of 10,000 men, and if only 30 contained 20% (or higher) body fat, would you conclude that the Army was successful in reducing the percentage of obese men below the percentage in the general population? Explain your reasoning.

  2. 4.141 Marital name changing. Refer to the Advances in Applied Sociology (Nov. 2013) study of marital name changing, Exercise 4.78 (p. 206). Recall that the probability that an American female will change her last name upon marriage is .9. In a sample of 500 European females, suppose researchers find that fewer than 400 changed their last name upon marriage. Make an inference about the probability that a European female will change her last name upon marriage.

  3. 4.142 Waiting time at an emergency room. According to Health Affairs (Oct. 28, 2004), the median time a patient waits to see a doctor in a typical U.S. emergency room is 30 minutes. On a day when 150 patients visit the emergency room, what is the approximate probability that

    1. More than half will wait more than 30 minutes?

    2. More than 85 will wait more than 30 minutes?

    3. More than 60, but fewer than 90, will wait more than 30 minutes?

4.8 Sampling Distributions

In previous sections, we assumed that we knew the probability distribution of a random variable, and using this knowledge, we were able to compute the mean, variance, and probabilities associated with the random variable. However, in most practical applications, this information is not available. To illustrate, in Example 4.13 (p. 190), we calculated the probability that the binomial random variable x, the number of 20 polled voters who favored a certain mayoral candidate, assumed specific values. To do this, it was necessary to assume some value for p, the proportion of all voters who favored the candidate. Thus, for the purposes of illustration, we assumed that p=.6 when, in all likelihood, the exact value of p would be unknown. In fact, the probable purpose of taking the poll is to estimate p. Similarly, when we modeled the in-city gas mileage of a certain automobile model, we used the normal probability distribution with an assumed mean and standard deviation of 27 and 3 miles per gallon, respectively. In most situations, the true mean and standard deviation are unknown quantities that have to be estimated. Numerical quantities that describe probability distributions are called parameters. Thus, p, the probability of a success in a binomial experiment, and μ and σ, the mean and standard deviation, respectively, of a normal distribution, are examples of parameters.

A parameter is a numerical descriptive measure of a population. Because it is based on the observations in the population, its value is almost always unknown.

We have also discussed the sample mean x̄, sample variance s², sample standard deviation s, and the like, which are numerical descriptive measures calculated from the sample. (See Table 4.8 for a list of the statistics covered so far in this text.) We will often use the information contained in these sample statistics to make inferences about the parameters of a population.

A sample statistic is a numerical descriptive measure of a sample. It is calculated from the observations in the sample.

Table 4.8 List of Population Parameters and Corresponding Sample Statistics

                       Population Parameter   Sample Statistic
Mean:                  μ                      x̄
Median:                η                      M
Variance:              σ²                     s²
Standard deviation:    σ                      s
Binomial proportion:   p                      p̂

Note that the term statistic refers to a sample quantity and the term parameter refers to a population quantity.

Before we can show you how to use sample statistics to make inferences about population parameters, we need to be able to evaluate their properties. Does one sample statistic contain more information than another about a population parameter? On what basis should we choose the “best” statistic for making inferences about a parameter? For example, if we want to estimate a parameter of a population (say, the population mean μ), we can use a number of sample statistics for our estimate. Two possibilities are the sample mean x̄ and the sample median M. Which of these do you think will provide a better estimate of μ?

Before answering this question, consider the following example: Toss a fair die, and let x equal the number of dots showing on the up face. Suppose the die is tossed three times, producing the sample measurements 2, 2, 6. The sample mean is then x̄=3.33, and the sample median is M=2. Since the population mean of x is μ=3.5, you can see that, for this sample of three measurements, the sample mean x̄ provides an estimate that falls closer to μ than does the sample median (see Figure 4.36a). Now suppose we toss the die three more times and obtain the sample measurements 3, 4, 6. Then the mean and median of this sample are x̄=4.33 and M=4, respectively. This time, M is closer to μ. (See Figure 4.36b.)

Figure 4.36

Comparing the sample mean (x̄) and sample median (M) as estimators of the population mean (μ)

This simple example illustrates an important point: Neither the sample mean nor the sample median will always fall closer to the population mean. Consequently, we cannot compare these two sample statistics or, in general, any two sample statistics on the basis of their performance with a single sample. Instead, we need to recognize that sample statistics are themselves random variables because different samples can lead to different values for the sample statistics. As random variables, sample statistics must be judged and compared on the basis of their probability distributions (i.e., the collection of values and associated probabilities of each statistic that would be obtained if the sampling experiment were repeated a very large number of times). We will illustrate this concept with another example.
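The point can be checked by brute force for the die example. The sketch below is an illustration only (the function name `closer_statistic` is hypothetical): it enumerates all 216 equally likely samples of three tosses and records, for each one, whether the sample mean or the sample median falls closer to μ=3.5. Exact fractions are used so that ties are detected reliably.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from statistics import median

MU = Fraction(7, 2)  # population mean of one toss of a fair die (3.5)

def closer_statistic(sample):
    """Return 'mean', 'median', or 'tie' according to which statistic is closer to MU."""
    mean_err = abs(Fraction(sum(sample), len(sample)) - MU)
    median_err = abs(Fraction(median(sample)) - MU)
    if mean_err < median_err:
        return "mean"
    if median_err < mean_err:
        return "median"
    return "tie"

# Tally the outcome over every possible sample of three tosses
counts = Counter(closer_statistic(s) for s in product(range(1, 7), repeat=3))

print(closer_statistic((2, 2, 6)))  # first sample discussed in the text
print(closer_statistic((3, 4, 6)))  # second sample discussed in the text
print(dict(counts))
```

Both outcomes occur among the 216 samples, which is why a single sample cannot settle which statistic is the better estimator.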

Suppose it is known that in a certain part of Canada the daily high temperature recorded for all past months of January has a mean μ=10°F and a standard deviation σ=5°F. Consider an experiment consisting of randomly selecting 25 daily high temperatures from the records of past months of January and calculating the sample mean x̄. If this experiment were repeated a very large number of times, the value of x̄ would vary from sample to sample. For example, the first sample of 25 temperature measurements might have a mean x̄=9.8, the second sample a mean x̄=11.4, the third sample a mean x̄=10.5, etc. If the sampling experiment were repeated a very large number of times, the resulting histogram of sample means would be approximately the probability distribution of x̄. If x̄ is a good estimator of μ, we would expect the values of x̄ to cluster around μ as shown in Figure 4.37. This probability distribution is called a sampling distribution because it is generated by repeating a sampling experiment a very large number of times.

The sampling distribution of a sample statistic calculated from a sample of n measurements is the probability distribution of the statistic.

In actual practice, the sampling distribution of a statistic is obtained mathematically or (at least approximately) by simulating the sample on a computer, using a procedure similar to that just described.

If x̄ has been calculated from a sample of n=25 measurements selected from a population with mean μ=10 and standard deviation σ=5, the sampling distribution (Figure 4.37) provides information about the behavior of x̄ in repeated sampling.

Figure 4.37

Sampling distribution for x̄ based on a sample of n=25 temperature measurements

For example, the probability that you will draw a sample of 25 measurements and obtain a value of x̄ in the interval 9≤x̄≤10 will be the area under the sampling distribution over that interval.

Since the properties of a statistic are typified by its sampling distribution, it follows that, to compare two sample statistics, you compare their sampling distributions. For example, if you have two statistics, A and B, for estimating the same parameter (for purposes of illustration, suppose the parameter is the population variance σ²), and if their sampling distributions are as shown in Figure 4.38, you would prefer statistic A over statistic B. You would do so because the sampling distribution for statistic A centers over σ² and has less spread (variation) than the sampling distribution for statistic B. Then, when you draw a single sample in a practical sampling situation, the probability is higher that statistic A will fall nearer σ².

Figure 4.38

Two sampling distributions for estimating the population variance σ²

Remember that, in practice, we will not know the numerical value of the unknown parameter σ², so we will not know whether statistic A or statistic B is closer to σ² for a particular sample. We have to rely on our knowledge of the theoretical sampling distributions to choose the best sample statistic and then use it sample after sample. The procedure for finding the sampling distribution for a statistic is demonstrated in Example 4.26.

Example 4.26 Finding a Sampling Distribution—Come-Out Roll in Craps

Problem

  1. Consider the popular casino game of craps, in which a player throws two dice and bets on the outcome (the sum total of the dots showing on the upper faces of the two dice). In Example 4.5 (p. 172), we looked at the possible outcomes of a $5 wager on the first toss (called the come-out roll). Recall that if the sum total of the dice is 7 or 11, the roller wins $5; if the total is a 2, 3, or 12, the roller loses $5 (i.e., the roller “wins” −$5); and, for any other total (4, 5, 6, 8, 9, or 10), no money is lost or won on that roll (i.e., the roller wins $0). Let x represent the result of the come-out roll wager (−$5, $0, or +$5). We showed in Example 4.5 that the actual probability distribution of x is:

    Outcome of wager, x   −5    0     5
    p(x)                  1/9   6/9   2/9

    Now, consider a random sample of n=3 come-out rolls.

    1. Find the sampling distribution of the sample mean, x̄.

    2. Find the sampling distribution of the sample median, M.

Solution

  1. The outcomes for every possible sample of n=3 come-out rolls are listed in Table 4.9, along with the sample mean and median. The probability of each sample is obtained using the Multiplicative Rule. For example, the probability of the sample (0, 0, 5) is p(0)p(0)p(5)=(6/9)(6/9)(2/9)=72/729=.099. The probability for each sample is also listed in Table 4.9. Note that the sum of these probabilities is equal to 1.

    Table 4.9 All Possible Samples of n=3 Come-Out Rolls in Craps

    Possible Samples   x̄       M     Probability
    −5, −5, −5         −5      −5    (1/9)(1/9)(1/9)=1/729
    −5, −5, 0          −3.33   −5    (1/9)(1/9)(6/9)=6/729
    −5, −5, 5          −1.67   −5    (1/9)(1/9)(2/9)=2/729
    −5, 0, −5          −3.33   −5    (1/9)(6/9)(1/9)=6/729
    −5, 0, 0           −1.67   0     (1/9)(6/9)(6/9)=36/729
    −5, 0, 5           0       0     (1/9)(6/9)(2/9)=12/729
    −5, 5, −5          −1.67   −5    (1/9)(2/9)(1/9)=2/729
    −5, 5, 0           0       0     (1/9)(2/9)(6/9)=12/729
    −5, 5, 5           1.67    5     (1/9)(2/9)(2/9)=4/729
    0, −5, −5          −3.33   −5    (6/9)(1/9)(1/9)=6/729
    0, −5, 0           −1.67   0     (6/9)(1/9)(6/9)=36/729
    0, −5, 5           0       0     (6/9)(1/9)(2/9)=12/729
    0, 0, −5           −1.67   0     (6/9)(6/9)(1/9)=36/729
    0, 0, 0            0       0     (6/9)(6/9)(6/9)=216/729
    0, 0, 5            1.67    0     (6/9)(6/9)(2/9)=72/729
    0, 5, −5           0       0     (6/9)(2/9)(1/9)=12/729
    0, 5, 0            1.67    0     (6/9)(2/9)(6/9)=72/729
    0, 5, 5            3.33    5     (6/9)(2/9)(2/9)=24/729
    5, −5, −5          −1.67   −5    (2/9)(1/9)(1/9)=2/729
    5, −5, 0           0       0     (2/9)(1/9)(6/9)=12/729
    5, −5, 5           1.67    5     (2/9)(1/9)(2/9)=4/729
    5, 0, −5           0       0     (2/9)(6/9)(1/9)=12/729
    5, 0, 0            1.67    0     (2/9)(6/9)(6/9)=72/729
    5, 0, 5            3.33    5     (2/9)(6/9)(2/9)=24/729
    5, 5, −5           1.67    5     (2/9)(2/9)(1/9)=4/729
    5, 5, 0            3.33    5     (2/9)(2/9)(6/9)=24/729
    5, 5, 5            5       5     (2/9)(2/9)(2/9)=8/729

    1. From Table 4.9, you can see that x̄ can assume the values −5, −3.33, −1.67, 0, 1.67, 3.33, and 5. Because x̄=−5 occurs in only one sample, P(x̄=−5)=1/729. Similarly, x̄=−3.33 occurs in three samples: (−5, −5, 0), (−5, 0, −5), and (0, −5, −5). Therefore, P(x̄=−3.33)=6/729+6/729+6/729=18/729. Calculating the probabilities of the remaining values of x̄ and arranging them in a table, we obtain the following probability distribution:

      x̄       −5            −3.33          −1.67           0               1.67            3.33           5
      p(x̄)    1/729=.0014   18/729=.0247   114/729=.1564   288/729=.3951   228/729=.3127   72/729=.0988   8/729=.0110

      This is the sampling distribution for x̄ because it specifies the probability associated with each possible value of x̄. You can see that the most likely mean outcome after 3 randomly selected come-out rolls is x̄=$0; this result occurs with probability 288/729=.3951.

    2. In Table 4.9, you can see that the median M can assume one of three values: −5, 0, and 5. The value M=−5 occurs in 7 different samples. Therefore, P(M=−5) is the sum of the probabilities associated with these 7 samples; that is, P(M=−5)=1/729+6/729+2/729+6/729+2/729+6/729+2/729=25/729. Similarly, M=0 occurs in 13 samples and M=5 occurs in 7 samples. These probabilities are obtained by summing the probabilities of their respective sample points. After performing these calculations, we obtain the following probability distribution for the median M:

      M      −5             0               5
      p(M)   25/729=.0343   612/729=.8395   92/729=.1262

      Once again, the most likely median outcome after 3 randomly selected come-out rolls is M=$0, a result that occurs with probability 612/729=.8395.

Look Back

The sampling distributions of parts a and b are found by first listing all possible distinct values of the statistic and then calculating the probability of each value. Note that if the values of x were equally likely, the 27 sample points in Table 4.9 would all have the same probability of occurring, namely, 1/27.
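The enumeration in Example 4.26 can be reproduced programmatically. The following sketch is one possible way to do it (the variable names are illustrative, not from the text): it lists all 27 ordered samples, multiplies the three outcome probabilities for each, and accumulates the totals for each distinct value of the sample mean and sample median. Exact fractions are used so the results can be compared directly with the table entries.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product
from statistics import median

# Probability distribution of the come-out roll wager x (from Example 4.26)
p = {-5: Fraction(1, 9), 0: Fraction(6, 9), 5: Fraction(2, 9)}

mean_dist = defaultdict(Fraction)    # sampling distribution of the sample mean
median_dist = defaultdict(Fraction)  # sampling distribution of the sample median

for sample in product(p, repeat=3):  # all 27 ordered samples of n = 3 rolls
    prob = p[sample[0]] * p[sample[1]] * p[sample[2]]  # Multiplicative Rule
    mean_dist[Fraction(sum(sample), 3)] += prob
    median_dist[median(sample)] += prob

for value in sorted(mean_dist):
    print(f"P(mean = {float(value):6.2f}) = {float(mean_dist[value]):.4f}")
for value in sorted(median_dist):
    print(f"P(median = {value:2d}) = {float(median_dist[value]):.4f}")
```

Because `Fraction` reduces automatically, a value such as 612/729 is stored as 68/81; the decimal output is the easier one to compare with the tables.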

Example 4.26 demonstrates the procedure for finding the exact sampling distribution of a statistic when the number of different samples that could be selected from the population is relatively small. In the real world, populations often consist of a large number of different values, making samples difficult (or impossible) to enumerate. When this situation occurs, we may choose to obtain the approximate sampling distribution for a statistic by simulating the sampling over and over again and recording the proportion of times different values of the statistic occur. Example 4.27 illustrates this procedure.

Example 4.27 Simulating a Sampling Distribution—Thickness of Steel Sheets 

Problem

  1. The rolling machine of a steel manufacturer produces sheets of steel of varying thickness. The thickness of a steel sheet ranges between 150 and 200 millimeters, with the distribution shown in Figure 4.39. (This distribution is known as the uniform distribution.) Suppose we perform the following experiment over and over again: Randomly sample 11 steel sheets from the production line and record the thickness x of each. Calculate the two sample statistics

    $$\bar{x} = \text{Sample mean} = \frac{\sum x}{11}$$

    $$M = \text{Median} = \text{Sixth sample measurement when the 11 thicknesses are arranged in ascending order}$$

    Obtain approximations to the sampling distributions of x̄ and M.

Solution

  1. We used MINITAB to generate 1,000 samples from this population, each with n=11 observations. Then we computed x̄ and M for each sample. Our goal is to obtain approximations to the sampling distributions of x̄ and M in order to find out which sample statistic (x̄ or M) contains more information about μ. [Note: In this particular example, it is known that the population mean is μ=175 mm.] The first 10 of the 1,000 samples generated are presented in Table 4.10. For instance, the first computer-generated sample from the uniform distribution contained the following measurements (arranged in ascending order): 151, 157, 162, 169, 171, 173, 181, 182, 187, 188, and 193 millimeters. The sample mean x̄ and median M computed for this sample are

    Figure 4.39

    Uniform distribution for thickness of steel sheets

    Table 4.10 First 10 Samples of n=11 Thickness Measurements from Uniform Distribution

    Sample Thickness Measurements Mean Median
     1 173 171 187 151 188 181 182 157 162 169 193 174.00 173
     2 181 190 182 171 187 177 162 172 188 200 193 182.09 182
     3 192 195 187 187 172 164 164 189 179 182 173 180.36 182
     4 173 157 150 154 168 174 171 182 200 181 187 172.45 173
     5 169 160 167 170 197 159 174 174 161 173 160 169.46 169
     6 179 170 167 174 173 178 173 170 173 198 187 176.55 173
     7 166 177 162 171 154 177 154 179 175 185 193 172.09 175
     8 164 199 152 153 163 156 184 151 198 167 180 169.73 164
     9 181 193 151 166 180 199 180 184 182 181 175 179.27 181
    10 155 199 199 171 172 157 173 187 190 185 150 176.18 173

    Data Set: SIMUNI

    $$\bar{x} = \frac{151 + 157 + \cdots + 193}{11} = 174.0$$

    $$M = \text{Sixth ordered measurement} = 173$$

    The MINITAB relative frequency histograms for x̄ and M for the 1,000 samples of size n=11 are shown in Figure 4.40. These histograms represent approximations to the true sampling distributions of x̄ and M.

    Figure 4.40

    MINITAB histograms for sample mean and sample median, Example 4.27

Look Back

You can see that the values of x̄ tend to cluster around μ to a greater extent than do the values of M. Thus, on the basis of the observed sampling distributions, we conclude that x̄ contains more information about μ than M does, at least for samples of n=11 measurements from the uniform distribution.
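A simulation of this kind can be sketched with Python's standard library in place of MINITAB. The sketch below makes two assumptions that are not in the text: thicknesses are drawn as continuous values from the uniform distribution on 150 to 200 (the table shows whole millimeters), and the random seed is arbitrary, so the numbers will not match the SIMUNI data set. It draws 1,000 samples of n=11, computes the mean and median of each, and compares how tightly the two sets of values cluster.

```python
import random
import statistics

random.seed(1)  # arbitrary seed so the run is repeatable (an assumption)

N_SAMPLES, N = 1000, 11
means, medians = [], []
for _ in range(N_SAMPLES):
    # One sample of 11 thicknesses from the uniform distribution on [150, 200]
    sample = [random.uniform(150, 200) for _ in range(N)]
    means.append(statistics.mean(sample))
    medians.append(statistics.median(sample))

# Spread of each simulated sampling distribution
sd_means = statistics.stdev(means)
sd_medians = statistics.stdev(medians)

print(f"sample means:   center {statistics.mean(means):.1f}, std. dev. {sd_means:.2f}")
print(f"sample medians: center {statistics.mean(medians):.1f}, std. dev. {sd_medians:.2f}")
```

A smaller standard deviation for the simulated means than for the simulated medians is the numerical counterpart of the tighter clustering seen in the histograms.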

As noted earlier, many sampling distributions can be derived mathematically, but the theory necessary to do so is beyond the scope of this text. Consequently, when we need to know the properties of a statistic, we will present its sampling distribution and simply describe its properties. Several of the important properties we look for in sampling distributions are discussed in the next section.

Exercises 4.143–4.151

Understanding the Principles

  1. 4.143 What is the difference between a population parameter and a sample statistic?

  2. 4.144 What is a sampling distribution of a sample statistic?

Learning the Mechanics

  1. 4.145 The probability distribution shown here describes a population of measurements that can assume values of 0, 2, 4, and 6, each of which occurs with the same relative frequency:

    x      0     2     4     6
    p(x)   1/4   1/4   1/4   1/4
    1. List all the different samples of n=2 measurements that can be selected from this population.

    2. Calculate the mean of each different sample listed in part a.

    3. If a sample of n=2 measurements is randomly selected from the population, what is the probability that a specific sample will be selected?

    4. Assume that a random sample of n=2 measurements is selected from the population. List the different values of x̄ found in part b, and find the probability of each. Then give the sampling distribution of the sample mean x̄ in tabular form.

    5. Construct a probability histogram for the sampling distribution of x̄.

  2. 4.146 Simulate sampling from the population described in Exercise 4.145 by marking the values of x, one on each of four identical coins (or poker chips, etc.). Place the coins (marked 0, 2, 4, and 6) into a bag, randomly select one, and observe its value. Replace this coin, draw a second coin, and observe its value. Finally, calculate the mean x̄ for this sample of n=2 observations randomly selected from the population (Exercise 4.145, part b). Replace the coins, mix them, and, using the same procedure, select a sample of n=2 observations from the population. Record the numbers and calculate x̄ for this sample. Repeat this sampling process until you acquire 100 values of x̄. Construct a relative frequency distribution for these 100 sample means. Compare this distribution with the exact sampling distribution of x̄ found in part e of Exercise 4.145. [Note: The distribution obtained in this exercise is an approximation to the exact sampling distribution. However, if you were to repeat the sampling procedure, drawing two coins not 100 times, but 10,000 times, then the relative frequency distribution for the 10,000 sample means would be almost identical to the sampling distribution of x̄ found in Exercise 4.145, part e.]

  3. 4.147 Consider the population described by the probability distribution shown here:

    x 1 2 3 4 5
    p(x) .2 .3 .2 .2 .1

    The random variable x is observed twice. If these observations are independent, verify that the different samples of size 2 and their probabilities are as follows:

    Sample Probability Sample Probability
    1, 1 .04 3, 4 .04
    1, 2 .06 3, 5 .02
    1, 3 .04 4, 1 .04
    1, 4 .04 4, 2 .06
    1, 5 .02 4, 3 .04
    2, 1 .06 4, 4 .04
    2, 2 .09 4, 5 .02
    2, 3 .06 5, 1 .02
    2, 4 .06 5, 2 .03
    2, 5 .03 5, 3 .02
    3, 1 .04 5, 4 .02
    3, 2 .06 5, 5 .01
    3, 3 .04
    1. Find the sampling distribution of the sample mean x̄.

    2. Construct a probability histogram for the sampling distribution of x̄.

    3. What is the probability that x̄ is 4.5 or larger?

    4. Would you expect to observe a value of x̄ equal to 4.5 or larger? Explain.

  4. 4.148 Refer to Exercise 4.147 and find E(x)=μ. Then use the sampling distribution of x̄ found in Exercise 4.147 to find the expected value of x̄. Note that E(x̄)=μ.

  5. 4.149 Refer to Exercise 4.147. Assume that a random sample of n=2 measurements is randomly selected from the population.

    1. List the different values that the sample median M may assume, and find the probability of each. Then give the sampling distribution of the sample median.

    2. Construct a probability histogram for the sampling distribution of the sample median, and compare it with the probability histogram for the sample mean (Exercise 4.147, part b).

  6. 4.150 In Example 4.27, we used the computer to generate 1,000 samples, each containing n=11 observations, from a uniform distribution over the interval from 150 to 200. Now use the computer to generate 500 samples, each containing n=15 observations, from that same population.

    1. Calculate the sample mean for each sample. To approximate the sampling distribution of x̄, construct a relative frequency histogram for the 500 values of x̄.

    2. Repeat part a for the sample median. Compare this approximate sampling distribution with the approximate sampling distribution of x̄ found in part a.

  7. 4.151 Consider a population that contains values of x equal to 00, 01, 02, 03, …, 96, 97, 98, 99. Assume that these values occur with equal probability. Use the computer to generate 500 samples, each containing n=25 measurements, from this population. Calculate the sample mean x̄ and sample variance s² for each of the 500 samples.

    1. To approximate the sampling distribution of x̄, construct a relative frequency histogram for the 500 values of x̄.

    2. Repeat part a for the 500 values of s².

4.9 The Sampling Distribution of x̄ and the Central Limit Theorem

Estimating the mean useful life of automobiles, the mean number of crimes per month in a large city, and the mean yield per acre of a new soybean hybrid are practical problems with something in common. In each case, we are interested in making an inference about the mean μ of some population. As we mentioned in Chapter 2, the sample mean x̄ is, in general, a good estimator of μ. We now develop pertinent information about the sampling distribution for this useful statistic. We will show that x̄ is the minimum-variance unbiased estimator (MVUE) of μ.

Example 4.28 Describing the Sampling Distribution of x̄

Problem

  1. Suppose a population has the uniform probability distribution given in Figure 4.41. It can be shown (proof omitted) that the mean and standard deviation of this probability distribution are, respectively, μ=175 and σ=14.43. Now suppose a sample of 11 measurements is selected from this population. Describe the sampling distribution of the sample mean x̄ based on the 1,000 sampling experiments discussed in Example 4.27.

    Figure 4.41

    Sampled uniform population

Solution

  1. Recall that in Example 4.27 we generated 1,000 samples of n=11 measurements each. The MINITAB histogram for the 1,000 sample means is shown in Figure 4.42, with a normal probability distribution superimposed. You can see that this normal probability distribution approximates the computer-generated sampling distribution very well.

    To fully describe a normal probability distribution, it is necessary to know its mean and standard deviation. MINITAB gives these statistics for the 1,000 x̄’s in the upper right corner of the histogram of Figure 4.42. You can see that the mean is 175.2 and the standard deviation is 4.383.

    To summarize our findings based on 1,000 samples, each consisting of 11 measurements from a uniform population, the sampling distribution of x̄ appears to be approximately normal with a mean of about 175 and a standard deviation of about 4.38.

Figure 4.42

MINITAB histogram for sample mean in 1,000 samples

Look Back

Note that the simulated value μ_x̄=175.2 is very close to μ=175 for the uniform distribution; that is, the simulated sampling distribution of x̄ appears to provide an unbiased estimate of μ.

The true sampling distribution of x̄ has the properties given in the next box, assuming only that a random sample of n observations has been selected from any population.

Properties of the Sampling Distribution of x̄

  1. The mean of the sampling distribution of x̄ equals the mean of the sampled population. That is, μ_x̄=E(x̄)=μ.*

  2. The standard deviation of the sampling distribution of x̄ equals

    $$\frac{\text{Standard deviation of sampled population}}{\text{Square root of sample size}}$$

That is, σ_x̄=σ/√n.**

The standard deviation σ_x̄ is often referred to as the standard error of the mean.

You can see that our approximation to μ_x̄ in Example 4.28 was precise, since property 1 assures us that the mean is the same as that of the sampled population: 175. Property 2 tells us how to calculate the standard deviation of the sampling distribution of x̄. Substituting σ=14.43 (the standard deviation of the sampled uniform distribution) and the sample size n=11 into the formula for σ_x̄, we find that

σ_x̄ = σ/√n = 14.43/√11 = 4.35

Thus, the approximation we obtained in Example 4.28, σ_x̄≈4.38, is very close to the exact value, σ_x̄=4.35.***
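The comparison between the simulated and the exact standard error can be reproduced with a short simulation. The sketch below is only an illustration, not the MINITAB run from the text: it assumes the sampled population is uniform on the interval from 150 to 200 (an interval inferred from the stated mean of 175 and standard deviation of 14.43), and the seed and the names `simulate_sample_means`, `LOW`, and `HIGH` are hypothetical choices.

```python
import math
import random
import statistics

# Assumption: the population is uniform on [150, 200]; this interval is
# inferred from the stated mean (175) and standard deviation (14.43).
LOW, HIGH = 150.0, 200.0
N = 11              # measurements per sample
NUM_SAMPLES = 1000  # number of simulated samples


def simulate_sample_means(num_samples, n, rng):
    """Return one sample mean for each simulated sample of size n."""
    return [
        statistics.fmean(rng.uniform(LOW, HIGH) for _ in range(n))
        for _ in range(num_samples)
    ]


rng = random.Random(1)  # fixed seed so the run is repeatable
means = simulate_sample_means(NUM_SAMPLES, N, rng)

sigma = (HIGH - LOW) / math.sqrt(12)   # standard deviation of a uniform population
standard_error = sigma / math.sqrt(N)  # property 2: sigma / sqrt(n)

print(f"simulated mean of the sample means: {statistics.fmean(means):.2f}")
print(f"simulated std. dev. of the means:   {statistics.stdev(means):.3f}")
print(f"theoretical standard error:         {standard_error:.2f}")
```

The simulated values change with the seed, but they should stay near 175 and 4.35, which is the behavior the two properties describe.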

What about the shape of the sampling distribution? Two theorems provide this information. One is applicable whenever the original population data are normally distributed. The other, applicable when the sample size n is large, represents one of the most important theoretical results in statistics: the Central Limit Theorem.

Theorem 4.1

If a random sample of n observations is selected from a population with a normal distribution, the sampling distribution of x̄ will be a normal distribution.

Biography Pierre-Simon Laplace (1749–1827)

The Originator of the Central Limit Theorem

As a boy growing up in Normandy, France, Pierre-Simon Laplace attended a Benedictine priory school. Upon graduation, he entered Caen University to study theology. During his two years there, he discovered his mathematical talents and began his career as an eminent mathematician. In fact, he considered himself the best mathematician in France. Laplace’s contributions to mathematics ranged from introducing new methods of solving differential equations to presenting complex analyses of motions of astronomical bodies. While studying the angles of inclination of comet orbits in 1778, Laplace showed that the sum of the angles is normally distributed. Consequently, he is considered to be the originator of the Central Limit Theorem. (A rigorous proof of the theorem, however, was not provided until the early 1930s by another French mathematician, Paul Levy.) Laplace also discovered Bayes’s theorem and established Bayesian statistical analysis as a valid approach to many practical problems of his time.

Theorem 4.2: Central Limit Theorem

Consider a random sample of n observations selected from a population (any population) with mean μ and standard deviation σ. Then, when n is sufficiently large, the sampling distribution of x̄ will be approximately a normal distribution with mean μ_x̄=μ and standard deviation σ_x̄=σ/√n. The larger the sample size, the better will be the normal approximation to the sampling distribution of x̄.*

Thus, for sufficiently large samples, the sampling distribution of x̄ is approximately normal. How large must the sample size n be so that the normal distribution provides a good approximation to the sampling distribution of x̄? The answer depends on the shape of the distribution of the sampled population, as shown by Figure 4.43. Generally speaking, the greater the skewness of the sampled population distribution, the larger the sample size must be before the normal distribution is an adequate approximation to the sampling distribution of x̄. For most sampled populations, sample sizes of n≥30 will suffice for the normal approximation to be reasonable.

Figure 4.43

Sampling distributions of x̄ for different populations and different sample sizes
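The effect of sample size on the shape of the sampling distribution can also be explored numerically. The following sketch is an illustration under stated assumptions: it uses an exponential population with mean 1 as one example of a strongly skewed population (the text does not name a specific population here), and the helper names `skewness` and `sample_means` are hypothetical. It measures the skewness of simulated sample means for a few sample sizes; values closer to 0 indicate a more symmetric, more nearly normal shape.

```python
import random
import statistics


def skewness(values):
    """Sample skewness: the average cubed z-score of the values."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean(((v - mean) / sd) ** 3 for v in values)


def sample_means(n, num_samples, rng):
    """Means of num_samples samples of size n from an exponential population."""
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(num_samples)
    ]


rng = random.Random(7)  # fixed seed so the run is repeatable
skew_by_n = {n: skewness(sample_means(n, 5000, rng)) for n in (2, 5, 30)}

for n, skew in skew_by_n.items():
    print(f"n = {n:2d}: skewness of the simulated sample means = {skew:.2f}")
```

In a run like this, the skewness should shrink toward 0 as n grows, in line with the statement that more skewed populations need larger samples before the normal approximation is adequate.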

Example 4.29 Using the Central Limit Theorem to Find a Probability

Problem

  1. Suppose we have selected a random sample of n=36 observations from a population with mean equal to 80 and standard deviation equal to 6. It is known that the population is not extremely skewed.

    1. Sketch the relative frequency distributions for the population and for the sampling distribution of the sample mean x̄.

    2. Find the probability that x̄ will be larger than 82.

Solution

  1. We do not know the exact shape of the population relative frequency distribution, but we do know that it should be centered about μ=80, its spread should be measured by σ=6, and it is not highly skewed. One possibility is shown in Figure 4.44a. From the Central Limit Theorem, we know that the sampling distribution of x̄ will be approximately normal, since the sample size n=36 is large and the sampled population distribution is not extremely skewed. We also know that the sampling distribution will have mean

    μ_x̄ = μ = 80

    and standard deviation

    σ_x̄ = σ/√n = 6/√36 = 1

    The sampling distribution of x̄ is shown in Figure 4.44b.

    Figure 4.44

    A population relative frequency distribution and the sampling distribution for x̄

  2. The probability that x̄ will exceed 82 is equal to the highlighted area in Figure 4.45. To find this area, we need to find the z value corresponding to x̄=82. Recall that the standard normal random variable z is the difference of any normally distributed random variable and its mean, expressed in units of its standard deviation. Since x̄ is an (approximately) normally distributed random variable with mean μ_x̄=μ and standard deviation σ_x̄=σ/√n, it follows that the standard normal z value corresponding to the sample mean x̄ is

    z = [(Normal random variable) − (Mean)] / (Standard deviation) = (x̄ − μ_x̄)/σ_x̄

    Figure 4.45

    The sampling distribution of x̄

    Therefore, for x̄=82, we have

    z = (x̄ − μ_x̄)/σ_x̄ = (82 − 80)/1 = 2

    The area A in Figure 4.45 corresponding to z=2 is given in the table of areas under the normal curve (see Table II of Appendix B) as .4772. Therefore, the tail area corresponding to the probability that x̄ exceeds 82 is

    P(x̄ > 82) = P(z > 2) = .5 − .4772 = .0228

Look Back

The key to finding the probability in part b is to recognize that the distribution of x̄ is approximately normal with μ_x̄=μ and σ_x̄=σ/√n.
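The table lookup in Example 4.29 can be checked with software. This is a minimal sketch using `NormalDist` from Python's standard `statistics` module in place of Table II; the variable names are illustrative choices, not notation from the text.

```python
import math
from statistics import NormalDist

mu, sigma, n = 80, 6, 36                 # population mean, std. dev., sample size
standard_error = sigma / math.sqrt(n)    # sigma / sqrt(n) = 6 / 6 = 1

# Central Limit Theorem approximation to the sampling distribution of the mean
sampling_dist = NormalDist(mu, standard_error)

z = (82 - mu) / standard_error           # standardized value of 82
p_exceeds_82 = 1 - sampling_dist.cdf(82) # upper-tail area beyond 82

print(f"z = {z:.2f}, P(sample mean > 82) = {p_exceeds_82:.4f}")
```

The result agrees with the table-based answer of .0228 to four decimal places.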

Example 4.30 Application of the Central Limit Theorem—Testing a Manufacturer’s Claim

Problem

  1. A manufacturer of automobile batteries claims that the distribution of the lengths of life of its best battery has a mean of 54 months and a standard deviation of 6 months. Suppose a consumer group decides to check the claim by purchasing a sample of 50 of the batteries and subjecting them to tests that estimate the battery’s life.

    1. Assuming that the manufacturer’s claim is true, describe the sampling distribution of the mean lifetime of a sample of 50 batteries.

    2. Assuming that the manufacturer’s claim is true, what is the probability that the consumer group’s sample has a mean life of 52 or fewer months?

Solution

  1. Even though we have no information about the shape of the probability distribution of the lives of the batteries, we can use the Central Limit Theorem to deduce that the sampling distribution for a sample mean lifetime of 50 batteries is approximately normally distributed. Furthermore, the mean of this sampling distribution is the same as the mean of the sampled population, which is μ=54 months according to the manufacturer’s claim. Finally, the standard deviation of the sampling distribution is given by

    σ_x̄ = σ/√n = 6/√50 = .85 month

    Note that we used the claimed standard deviation of the sampled population, σ=6 months. Thus, if we assume that the claim is true, then the sampling distribution for the mean life of the 50 batteries sampled must be as shown in Figure 4.46.

    Figure 4.46

    Sampling distribution of x̄ in Example 4.30 for n=50

  2. If the manufacturer’s claim is true, the probability that the consumer group observes a mean battery life of 52 or fewer months for its sample of 50 batteries, P(x̄≤52), is equivalent to the highlighted area in Figure 4.46. Since the sampling distribution is approximately normal, we can find this area by computing the standard normal z value:

    z = (x̄ − μ_x̄)/σ_x̄ = (x̄ − μ)/σ_x̄ = (52 − 54)/.85 = −2.35

    Here, μ_x̄, the mean of the sampling distribution of x̄, is equal to μ, the mean of the lifetimes of the sampled population, and σ_x̄ is the standard deviation of the sampling distribution of x̄. Note that z is the familiar standardized distance (z-score) of Section 2.6, and since x̄ is approximately normally distributed, z will possess (approximately) the standard normal distribution of Section 4.5.

    The area A shown in Figure 4.46 between x̄=52 and x̄=54 (corresponding to z=−2.35) is found in Table II of Appendix B to be .4906. Therefore, the area to the left of x̄=52 is

    P(x̄ ≤ 52) = .5 − A = .5 − .4906 = .0094

    Thus, the probability that the consumer group will observe a sample mean of 52 or less is only .0094 if the manufacturer’s claim is true.

Look Back

If the 50 batteries tested do exhibit a mean of 52 or fewer months, the consumer group will have strong evidence that the manufacturer’s claim is untrue because such an event is very unlikely to occur if the claim is true. (This is still another application of the rare-event approach to statistical inference.)
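The probability in Example 4.30 can be recomputed without the table. The sketch below, which uses `NormalDist` from Python's standard library, is offered as a check rather than as the method of the text; it also shows how much the answer moves when the standard error is not rounded to .85 before standardizing. The variable names are illustrative.

```python
import math
from statistics import NormalDist

mu, sigma, n = 54, 6, 50                # claimed mean, claimed std. dev., sample size
standard_error = sigma / math.sqrt(n)   # about .8485; the worked solution rounds to .85

# Standardize 52 with the unrounded standard error
z_unrounded = (52 - mu) / standard_error
p_unrounded = NormalDist().cdf(z_unrounded)

# Standardize 52 the way the worked solution does (standard error .85, z to 2 places)
z_text = round((52 - mu) / 0.85, 2)
p_text = NormalDist().cdf(z_text)

print(f"rounded as in the text: z = {z_text:.2f}, P = {p_text:.4f}")
print(f"without rounding:       z = {z_unrounded:.3f}, P = {p_unrounded:.4f}")
```

Either way the probability is below .01, so the rare-event conclusion in the Look Back does not depend on the rounding.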

We conclude this section with three comments on the sampling distribution of x̄. First, from the formula σ_x̄=σ/√n, we see that the standard deviation of the sampling distribution of x̄ gets smaller as the sample size n gets larger. For example, we computed σ_x̄=.85 when n=50 in Example 4.30. However, for n=100, we obtain σ_x̄=σ/√n=6/√100=.60. This relationship will hold true for most of the sample statistics encountered in this text. That is, the standard deviation of the sampling distribution decreases as the sample size increases. Consequently, the larger the sample size, the more accurate the sample statistic (e.g., x̄) is in estimating a population parameter (e.g., μ). We will use this result in Chapter 5 to help us determine the sample size needed to obtain a specified accuracy of estimation.
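A few lines of code make the first comment concrete. The sketch below tabulates the standard error for the battery example (σ=6 months) at several sample sizes; the function name `standard_error` and the particular sample sizes beyond 50 and 100 are illustrative choices.

```python
import math

SIGMA = 6  # claimed population standard deviation from Example 4.30, in months


def standard_error(sigma, n):
    """Standard deviation of the sampling distribution of the sample mean."""
    return sigma / math.sqrt(n)


for n in (50, 100, 200, 400):
    print(f"n = {n:3d}: standard error = {standard_error(SIGMA, n):.2f} month")
```

Because n enters through its square root, quadrupling the sample size (for example, from 100 to 400) is what halves the standard error.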

Our second comment concerns the Central Limit Theorem. In addition to providing a very useful approximation for the sampling distribution of a sample mean, the Central Limit Theorem offers an explanation for the fact that many relative frequency distributions of data possess mound-shaped distributions. Many of the measurements we take in various areas of research are really means or sums of a large number of small phenomena. For example, a year’s growth of a pine seedling is the total of the numerous individual components that affect the plant’s growth. Similarly, we can view the length of time a construction company takes to build a house as the total of the times taken to complete a multitude of distinct jobs, and we can regard the monthly demand for blood at a hospital as the total of the many individual patients’ needs. Whether or not the observations entering into these sums satisfy the assumptions basic to the Central Limit Theorem is open to question; however, it is a fact that many distributions of data in nature are mound shaped and possess the appearance of normal distributions.

Finally, it is important to understand when to use σ or σ/√n in your statistical analysis. If the statistical inference you want to make concerns a single value of a random variable (for example, the life length x of one randomly selected battery), then use σ, the standard deviation of the probability distribution for x, in your calculations. Alternatively, if the statistical inference you want to make concerns the sample mean (for example, the mean life length x̄ for a random sample of n batteries), then use σ/√n, the standard deviation of the sampling distribution of x̄.
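The difference between the two choices shows up clearly in a side-by-side calculation. The sketch below reuses the battery figures from Example 4.30 and adds one assumption that the text does not make: that individual battery lifetimes are themselves normally distributed, so that a probability for a single battery can be computed at all. The variable names are illustrative.

```python
import math
from statistics import NormalDist

mu, sigma, n = 54, 6, 50

# Assumption (not stated in the text): individual lifetimes are normally
# distributed. This is needed only for the single-battery probability.
p_single_battery = NormalDist(mu, sigma).cdf(52)               # uses sigma

# The sample mean of n batteries is approximately normal by the CLT.
p_sample_mean = NormalDist(mu, sigma / math.sqrt(n)).cdf(52)   # uses sigma / sqrt(n)

print(f"P(one battery lasts 52 months or less)    = {p_single_battery:.3f}")
print(f"P(mean of 50 batteries is 52 months or less) = {p_sample_mean:.4f}")
```

Under these assumptions, a single battery lasting 52 months or less is fairly common, whereas a sample mean that low is rare, which is why mixing up σ and σ/√n can change a conclusion completely.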

Exercises 4.152–4.175

Understanding the Principles

  1. 4.152 What do the symbols μ_x̄ and σ_x̄ represent?

  2. 4.153 How does the mean of the sampling distribution of x̄ relate to the mean of the population from which the sample is selected?

  3. 4.154 How does the standard deviation of the sampling distribution of x̄ relate to the standard deviation of the population from which the sample is selected?

  4. 4.155 State the Central Limit Theorem.

  5. 4.156 Will the sampling distribution of x̄ always be approximately normally distributed? Explain.

Learning the Mechanics

  1. 4.157 Suppose a random sample of n measurements is selected from a population with mean μ=100 and variance σ²=100. For each of the following values of n, give the mean and standard deviation of the sampling distribution of the sample mean x̄.

    1. n=4

    2. n=25

    3. n=100

    4. n=50

    5. n=500

    6. n=1,000

  2. 4.158 Suppose a random sample of n=25 measurements is selected from a population with mean μ and standard deviation σ. For each of the following values of μ and σ, give the values of μ_x̄ and σ_x̄.

    1. μ=10, σ=3

    2. μ=100, σ=25

    3. μ=20, σ=40

    4. μ=10, σ=100

  3. 4.159 Consider the following probability distribution:

    x      1    2    3    8
    p(x)   .1   .4   .4   .1

    1. Find μ, σ², and σ.

    2. Find the sampling distribution of x̄ for random samples of n=2 measurements from this distribution by listing all possible values of x̄, and find the probability associated with each.

    3. Use the results of part b to calculate μ_x̄ and σ_x̄. Confirm that μ_x̄=μ and that σ_x̄=σ/√n=σ/√2.

  4. 4.160 A random sample of n=64 observations is drawn from a population with a mean equal to 20 and standard deviation equal to 16.

    1. Give the mean and standard deviation of the (repeated) sampling distribution of x̄.

    2. Describe the shape of the sampling distribution of x̄. Does your answer depend on the sample size?

    3. Calculate the standard normal z-score corresponding to a value of x̄=16.

    4. Calculate the standard normal z-score corresponding to x̄=23.

    5. Find P(x̄<16).

    6. Find P(x̄>23).

    7. Find P(16<x̄<23).

  5. 4.161 A random sample of n=100 observations is selected from a population with μ=30 and σ=16.

    1. Find μ_x̄ and σ_x̄.

    2. Describe the shape of the sampling distribution of x̄.

    3. Find P(x28).

    4. Find P(22.1≤x̄≤26.8).

    5. Find P(x28.2).

    6. Find P(x27.0).

Applet Exercise 4.7

Open the applet entitled Sampling Distribution. On the pull-down menu to the right of the top graph, select Binary.

    1. Run the applet for the sample size n=10 and the number of samples N=1000. Observe the shape of the graph of the sample proportions, and record the mean, median, and standard deviation of the sample proportions.

    2. How does the mean of the sample proportions compare with the mean μ=0.5 of the original distribution?

    3. Use the formula σ=√(np(1−p)), where n=1 and p=0.5, to compute the standard deviation of the original distribution. Divide the result by √10, the square root of the sample size used in the sampling distribution. How does this result compare with the standard deviation of the sample proportions?

    4. Explain how the graph of the distribution of sample proportions suggests that the distribution may be approximately normal.

    5. Explain how the results of parts b–d illustrate the Central Limit Theorem.

Applet Exercise 4.8

Open the applet entitled Sampling Distributions. On the pull-down menu to the right of the top graph, select Uniform. The box to the left of the top graph displays the population mean, median, and standard deviation of the original distribution.

    1. Run the applet for the sample size n=30 and the number of samples N=1000. Observe the shape of the graph of the sample means, and record the mean, median, and standard deviation of the sample means.

    2. How does the mean of the sample means compare with the mean of the original distribution?

    3. Divide the standard deviation of the original distribution by √30, the square root of the sample size used in the sampling distribution. How does this result compare with the standard deviation of the sample means?

    4. Explain how the graph of the distribution of sample means suggests that the distribution may be approximately normal.

    5. Explain how the results of parts b–d illustrate the Central Limit Theorem.

Applying the Concepts—Basic

  1. 4.162 Corporate sustainability of CPA firms. Refer to the Business and Society (Mar. 2011) study on the sustainability behaviors of CPA corporations, Exercise 1.26 (p. 22). Recall that corporate sustainability refers to business practices designed around social and environmental considerations. The level of support senior managers have for corporate sustainability was measured quantitatively on a scale ranging from 0 to 160 points, where higher point values indicate a higher level of support for sustainability. The study provided the following information on the distribution of levels of support for sustainability: μ=68, σ=27. Now consider a random sample of 30 senior managers and let x̄ represent the sample mean level of support.

    1. Give the value of μ_x̄, the mean of the sampling distribution of x̄, and interpret the result.

    2. Give the value of σ_x̄, the standard deviation of the sampling distribution of x̄, and interpret the result.

    3. What does the Central Limit Theorem say about the shape of the sampling distribution of x̄?

    4. Find P(x̄>65).

  2. 4.163 Voltage sags and swells. Refer to the Electrical Engineering (Vol. 95, 2013) study of the power quality (sags and swells) of a transformer, Exercise 2.127 (p. 82). For transformers built for heavy industry, the distribution of the number of sags per week has a mean of 353 with a standard deviation of 30. Of interest is x̄, the sample mean number of sags per week for a random sample of 45 transformers.

    1. Find E(x̄) and interpret its value.

    2. Find Var(x̄).

    3. Describe the shape of the sampling distribution of x̄.

    4. How likely is it to observe a sample mean number of sags per week that exceeds 400?

  3. PHISH 4.164 Phishing attacks to e-mail accounts. In Exercise 2.47 (p. 53), you learned that phishing describes an attempt to extract personal/financial information from unsuspecting people through fraudulent e-mail. Data from an actual phishing attack against an organization were presented in Chance (Summer 2007). The interarrival times, i.e., the time differences (in seconds), for 267 fraud box e-mail notifications were recorded and are saved in the PHISH file. For this exercise, consider these interarrival times to represent the population of interest.

    1. In Exercise 2.47 you constructed a histogram for the interarrival times. Describe the shape of the population of interarrival times.

    2. Find the mean and standard deviation of the population of interarrival times.

    3. Now consider a random sample of n=40 interarrival times selected from the population.

      Describe the shape of the sampling distribution of x̄, the sample mean. Theoretically, what are μ_x̄ and σ_x̄?

    4. Find P(x̄<90).

    5. Use a random number generator to select a random sample of n=40 interarrival times from the population, and calculate the value of x̄. (Every student in the class should do this.)

    6. Refer to part e. Obtain the values of x̄ computed by the students and combine them into a single data set. Form a histogram for these values of x̄. Is the shape approximately normal?

    7. Refer to part f. Find the mean and standard deviation of the x̄-values. Do these values approximate μ_x̄ and σ_x̄, respectively?

  4. 4.165 Physical activity of obese young adults. In a study on the physical activity of young adults, pediatric researchers measured overall physical activity as the total number of registered movements (counts) over a period of time and then computed the number of counts per minute (cpm) for each subject (International Journal of Obesity, Jan. 2007). The study revealed that the overall physical activity of obese young adults has a mean of μ=320 cpm and a standard deviation of σ=100 cpm. (In comparison, the mean for young adults of normal weight is 540 cpm.) In a random sample of n=100 obese young adults, consider the sample mean counts per minute, x̄.

    1. Describe the sampling distribution of x̄.

    2. What is the probability that the mean overall physical activity level of the sample is between 300 and 310 cpm?

    3. What is the probability that the mean overall physical activity level of the sample is greater than 360 cpm?

  5. 4.166 Cost of unleaded fuel. According to the American Automobile Association (AAA), the average cost of a gallon of regular unleaded fuel at gas stations in May 2014 was $3.65 (AAA Fuel Gauge Report). Assume that the standard deviation of such costs is $.15. Suppose that a random sample of n=100 gas stations is selected from the population and the cost per gallon of regular unleaded fuel is determined for each. Consider x̄, the sample mean cost per gallon.

    1. Calculate μ_x̄ and σ_x̄.

    2. What is the approximate probability that the sample has a mean fuel cost between $3.65 and $3.67?

    3. What is the approximate probability that the sample has a mean fuel cost that exceeds $3.67?

    4. How would the sampling distribution of x̄ change if the sample size n were doubled from 100 to 200? How do your answers to parts b and c change when the sample size is doubled?

  6. 4.167 Requests to a Web server. Brighton Webs LTD modeled the arrival time of requests to a Web server within each hour using a uniform distribution (see Example 4.27). Specifically, the number of seconds x from the start of the hour that the request is made is uniformly distributed between 0 and 3,600 seconds. It can be shown that the distribution of x has mean μ=1,800 and variance σ²=1,080,000. In a random sample of n=60 Web server requests, let x̄ represent the sample mean number of seconds from the start of the hour that the request is made.

    1. Find E(x̄) and interpret its value.

    2. Find Var(x̄).

    3. Describe the shape of the sampling distribution of x̄.

    4. Find the probability that x̄ is between 1,700 and 1,900 seconds.

    5. Find the probability that x̄ exceeds 2,000 seconds.

Applying the Concepts—Intermediate

  1. 4.168 Shell lengths of sea turtles. Refer to the Aquatic Biology (Vol. 9, 2010) study of green sea turtles inhabiting the Grand Cayman South Sound lagoon, Exercise 2.85 (p. 69). Research shows that the curved carapace (shell) lengths of these turtles have a distribution with mean μ=50 cm and standard deviation σ=10 cm. In the study, n=76 green sea turtles were captured from the lagoon; the mean shell length for the sample was x̄=55.5 cm. How likely is it to observe a sample mean of 55.5 cm or larger?

  2. 4.169 Tomato as a taste modifier. Miraculin is a protein naturally produced in a rare tropical fruit that can convert a sour taste into a sweet taste. Refer to the Plant Science (May 2010) investigation of the ability of a hybrid tomato plant to produce miraculin, Exercise 5.38 (p. 272). Recall that the amount x of miraculin produced in the plant had a mean of 105.3 micrograms per gram of fresh weight with a standard deviation of 8.0. Consider a random sample of n=64 hybrid tomato plants and let x̄ represent the sample mean amount of miraculin produced. Would you expect to observe a value of x̄ less than 103 micrograms per gram of fresh weight? Explain.

  3. 4.170 Motivation of drug dealers. Refer to the Applied Psychology in Criminal Justice (Sept. 2009) investigation of the personality characteristics of drug dealers, Exercise 2.102 (p. 77). Convicted drug dealers were scored on the Wanting Recognition (WR) Scale, a scale that provides a quantitative measure of a person’s level of need for approval and sensitivity to social situations. (Higher scores indicate a greater need for approval.) Based on the study results, we can assume that the WR scores for the population of convicted drug dealers have a mean of 40 and a standard deviation of 5. Suppose that in a sample of n=100 people, the mean WR scale score is x̄=42. Is this sample likely to have been selected from the population of convicted drug dealers? Explain.

  4. 4.171 Critical part failures in NASCAR vehicles. The Sport Journal (Winter 2007) published a study of critical part failures at NASCAR races. The researchers found that the time x (in hours) until the first critical part failure has a skewed distribution with μ=.10 and σ=.10. Now, consider a random sample of n=50 NASCAR races and let x̄ represent the sample mean time until the first critical part failure.

    1. Find E(x̄) and Var(x̄).

    2. Although x has an exponential distribution, the sampling distribution of x̄ is approximately normal. Why?

    3. Find the probability that the sample mean time until the first critical part failure exceeds .13 hour.

  5. 4.172 Characteristics of antiwar demonstrators. Refer to the American Journal of Sociology (Jan. 2014) study of the characteristics of antiwar demonstrators in the United States, Exercise 2.106 (p. 77). The researchers found that the mean number of protest organizations joined by antiwar demonstrators over a recent 3-year period was .90 with a standard deviation of 1.10. Assume that these values represent the true mean μ and true standard deviation σ of the population of all antiwar demonstrators over the 3 years.

    1. In a sample of 50 antiwar demonstrators selected from the population, what is the expected value of x̄, the sample mean number of protest organizations joined by the demonstrators?

    2. Find values of the sample mean, L and U, such that P(L<x̄<U)=.95.

  6. 4.173 Is exposure to a chemical in Teflon-coated cookware hazardous? Perfluorooctanoic acid (PFOA) is a chemical used in Teflon®-coated cookware to prevent food from sticking. The Environmental Protection Agency is investigating the potential risk of PFOA as a cancer-causing agent (Science News Online, Aug. 27, 2005). It is known that the blood concentration of PFOA in the general population has a mean of μ=6 parts per billion (ppb) and a standard deviation of σ=10 ppb. Science News Online reported on tests for PFOA exposure conducted on a sample of 326 people who live near DuPont’s Teflon-making Washington (West Virginia) Works facility.

    1. What is the probability that the average blood concentration of PFOA in the sample is greater than 7.5 ppb?

    2. The actual study resulted in x̄=300 ppb. Use this information to make an inference about the true mean (μ) PFOA concentration for the population that lives near DuPont’s Teflon facility.

Applying the Concepts—Advanced

  1. 4.174 Video game players and divided attention tasks. Human Factors (May 2014) published the results of a study designed to determine whether video game players are better than non-video game players at crossing the street when presented with distractions. Participants (college students) entered a street crossing simulator. The simulator was designed to have cars traveling at various high rates of speed in both directions. During the crossing, the students also performed a memory task as a distraction. The researchers found that students who are video game players took an average of 5.1 seconds to cross the street, with a standard deviation of .8 seconds. Assume that the time, x, to cross the street for the population of video game players has μ=5.1 and σ=.8. Now consider a sample of 30 students and let x̄ represent the sample mean time (in seconds) to cross the street in the simulator.

    1. Find P(x̄>5.5).

    2. The 30 students in the sample are all non-video game players. What inference can you make about μ and/or σ for the population of non-video game players? Explain.

  2. 4.175 Hand washing versus hand rubbing. Refer to the British Medical Journal (Aug. 17, 2002) study comparing the effectiveness of hand washing with soap and hand rubbing with alcohol, presented in Exercise 2.107 (p. 78). Health care workers who used hand rubbing had a mean bacterial count of 35 per hand with a standard deviation of 59. Health care workers who used hand washing had a mean bacterial count of 69 per hand with a standard deviation of 106. In a random sample of 50 health care workers, all using the same method of cleaning their hands, the mean bacterial count per hand (x̄) is less than 30. Give your opinion on whether this sample of workers used hand rubbing with alcohol or hand washing with soap.
