
7.3 Comparing Two Population Means: Paired Difference Experiments

In Example 7.4, we compared two methods of teaching reading to “slow learners” by means of a 95% confidence interval. Suppose it is possible to measure the “reading IQs” of the “slow learners” before they are subjected to a teaching method. Eight pairs of “slow learners” with similar reading IQs are found, and one member of each pair is randomly assigned to the standard teaching method while the other is assigned to the new method. The data are given in Table 7.3. Do the data support the hypothesis that the population mean reading test score for “slow learners” taught by the new method is greater than the mean reading test score for those taught by the standard method?

Table 7.3 Reading Test Scores for Eight Pairs of “Slow Learners”

Pair	New Method (1)	Standard Method (2)
1	77	72
2	74	68
3	82	76
4	73	68
5	87	84
6	69	68
7	66	61
8	80	76

Data Set: PAIREDSCORES

We want to test

$\begin{array}{l} H_{0} : (μ_{1} - μ_{2}) = 0 \\ H_{a} : (μ_{1} - μ_{2}) > 0 \end{array}$

Many researchers mistakenly use the t statistic for two independent samples (Section 7.2) to conduct this test. This invalid analysis is shown on the MINITAB printout of Figure 7.10. The test statistic, $t = 1.26$, and the p-value of the test, $p = .115$, are highlighted on the printout. At $α = .10$, the p-value exceeds $α$. Thus, from this analysis, we might conclude that we do not have sufficient evidence to infer a difference in the mean test scores for the two methods.

Figure 7.10

MINITAB printout of an invalid analysis of reading test scores in Table 7.3

If you examine the data in Table 7.3 carefully, however, you will find this result difficult to accept. The test score of the new method is larger than the corresponding test score for the standard method for every one of the eight pairs of “slow learners.” This, in itself, seems to provide strong evidence to indicate that $μ_{1}$ exceeds $μ_{2}$. Why, then, did the t-test fail to detect the difference? The answer is, the independent samples t-test is not a valid procedure to use with this set of data.

The t-test is inappropriate because the assumption of independent samples is invalid. We have randomly chosen pairs of test scores; thus, once we have chosen the sample for the new method, we have not independently chosen the sample for the standard method. The dependence between observations within pairs can be seen by examining the pairs of test scores, which tend to rise and fall together as we go from pair to pair. This pattern provides strong visual evidence of a violation of the assumption of independence required for the two-sample t-test of Section 7.2. Note also that

$s_{p}^{2} = \frac{(n_{1} - 1) s_{1}^{2} + (n_{2} - 1) s_{2}^{2}}{n_{1} + n_{2} - 2} = \frac{(8 - 1) (6.93)^{2} + (8 - 1) (7.01)^{2}}{8 + 8 - 2} = 48.58$

Hence, there is a large variation within samples (reflected by the large value of $s_{p}^{2}$) in comparison to the relatively small difference between the sample means. Because $s_{p}^{2}$ is so large, the t-test of Section 7.2 is unable to detect a difference between $μ_{1}$ and $μ_{2}$.
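The pooled-variance arithmetic above can be reproduced with a short Python sketch. It uses only the scores from Table 7.3; the variable names are illustrative and not part of the text. Working from the unrounded data gives a pooled variance of about 48.56, slightly below the 48.58 obtained above from the rounded standard deviations, while the t statistic still rounds to 1.26.

```python
import math
import statistics

# Reading test scores from Table 7.3 (eight matched pairs).
new_method = [77, 74, 82, 73, 87, 69, 66, 80]
standard_method = [72, 68, 76, 68, 84, 68, 61, 76]

n1, n2 = len(new_method), len(standard_method)
s1_sq = statistics.variance(new_method)       # sample variance, divisor n1 - 1
s2_sq = statistics.variance(standard_method)  # sample variance, divisor n2 - 1

# Pooled variance from the two sample variances.
pooled_var = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)

# Independent-samples t statistic. This is the INVALID analysis for these
# paired data; it is computed here only to show why it fails.
mean_diff = statistics.mean(new_method) - statistics.mean(standard_method)
t_independent = mean_diff / math.sqrt(pooled_var * (1 / n1 + 1 / n2))

print(f"s1 = {math.sqrt(s1_sq):.2f}, s2 = {math.sqrt(s2_sq):.2f}")
print(f"pooled variance = {pooled_var:.2f}")
print(f"t = {t_independent:.2f}")
```

The difference between the sample means is only 4.375 points, while the pooled standard deviation is roughly 7 points, which is why the statistic is so small.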

Pair	New Method	Standard Method	Difference (New Method − Standard Method)
1	77	72	5
2	74	68	6
3	82	76	6
4	73	68	5
5	87	84	3
6	69	68	1
7	66	61	5
8	80	76	4

We now consider a valid method of analyzing the data of Table 7.3. In Table 7.4, we add the column of differences between the test scores of the pairs of “slow learners.” We can regard these differences in test scores as a random sample of differences for all pairs (matched on reading IQ) of “slow learners,” past and present. Then we can use this sample to make inferences about the mean of the population of differences, $μ_{d}$, which is equal to the difference $(μ_{1} - μ_{2})$. That is, the mean of the population (and sample) of differences equals the difference between the population (and sample) means. Thus, our test becomes

$\begin{array}{l} H_{0} : μ_{d} = 0 \quad (μ_{1} - μ_{2} = 0) \\ H_{a} : μ_{d} > 0 \quad (μ_{1} - μ_{2} > 0) \end{array}$

The test statistic is a one-sample t (Section 8.4), since we are now analyzing a single sample of differences for small n. Thus,

$\text{Test statistic: } t = \frac{\bar{x}_{d} - 0}{s_{d} / \sqrt{n_{d}}}$

where

$\begin{array}{lcl} \bar{x}_{d} & = & \text{Sample mean difference} \\ s_{d} & = & \text{Sample standard deviation of differences} \\ n_{d} & = & \text{Number of differences} = \text{Number of pairs} \end{array}$

Assumptions: The population of differences in test scores is approximately normally distributed. The sample differences are randomly selected from the population of differences. [Note: We do not need to make the assumption that $σ_{1}^{2} = σ_{2}^{2}$.]
Rejection region: At significance level $α = .05$, we will reject $H_{0}$ if $t > t_{.05}$, where $t_{.05}$ is based on $(n_{d} - 1)$ degrees of freedom.

Referring to Table II in Appendix B, we find the t-value corresponding to $α = .05$ and $n_{d} - 1 = 8 - 1 = 7$ df to be $t_{.05} = 1.895$. Then we will reject the null hypothesis if $t > 1.895$. (See Figure 7.11.) Note that the number of degrees of freedom decreases from $n_{1} + n_{2} - 2 = 14$ to 7 when we use the paired difference experiment rather than the two independent random samples design.

Summary statistics for the $n_{d} = 8$ differences are shown in the MINITAB printout of Figure 7.12. Note that $\bar{x}_{d} = 4.375$ and $s_{d} = 1.685$. Substituting these values into the formula for the test statistic, we have

$t = \frac{\bar{x}_{d} - 0}{s_{d} / \sqrt{n_{d}}} = \frac{4.375}{1.685 / \sqrt{8}} = 7.34$

Because this value of t falls into the rejection region, we conclude (at $α = .05$) that the population mean test score for “slow learners” taught by the new method exceeds the population mean score for those taught by the standard method. We can reach the same conclusion by noting that the p-value of the test, highlighted in Figure 7.12, is much smaller than $α = .05$.
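As a check on the calculation, the paired analysis can be sketched in Python from the raw scores in Table 7.3. The critical value 1.895 is the Table II value quoted in the text; the variable names are illustrative only.

```python
import math
import statistics

# Reading test scores from Table 7.3 (eight matched pairs).
new_method = [77, 74, 82, 73, 87, 69, 66, 80]
standard_method = [72, 68, 76, 68, 84, 68, 61, 76]

# One difference per pair: new method minus standard method.
differences = [x1 - x2 for x1, x2 in zip(new_method, standard_method)]

n_d = len(differences)
mean_d = statistics.mean(differences)  # sample mean difference
s_d = statistics.stdev(differences)    # sample standard deviation of differences

# One-sample t statistic on the differences, testing H0: mu_d = 0.
t_paired = (mean_d - 0) / (s_d / math.sqrt(n_d))

T_05_DF7 = 1.895  # t_.05 with 7 df, as quoted from Table II in the text
reject_h0 = t_paired > T_05_DF7

print(f"mean difference = {mean_d:.3f}, s_d = {s_d:.3f}")
print(f"t = {t_paired:.2f}, reject H0 at alpha = .05: {reject_h0}")
```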

Figure 7.12

MINITAB paired difference analysis of reading test scores

Now Work Exercises 7.35a and b

This kind of experiment, in which observations are paired and the differences are analyzed, is called a paired difference experiment. In many cases, a paired difference experiment can provide more information about the difference between population means than an independent samples experiment can. The idea is to compare population means by comparing the differences between pairs of experimental units (objects, people, etc.) that were similar prior to the experiment. The differencing removes sources of variation that tend to inflate $σ^{2}$. For example, when two children are taught to read by two different methods, the observed difference in achievement may be due to a difference in the effectiveness of the two teaching methods, or it may be due to differences in the initial reading levels and IQs of the two children (random error). To reduce the effect of differences in the children on the observed differences in reading achievement, the two methods of reading are imposed on two children who are more likely to possess similar intellectual capacity, namely, children with nearly equal IQs. The effect of this pairing is to remove the larger source of variation that would be present if children with different abilities were randomly assigned to the two samples. Making comparisons within groups of similar experimental units is called blocking, and the paired difference experiment is a simple example of a randomized block experiment. In our example, pairs of children with matching IQ scores represent the blocks.

Some other examples for which the paired difference experiment might be appropriate are the following:

Suppose you want to estimate the difference $(μ_{1} - μ_{2})$ in mean price per gallon between two major brands of premium gasoline. If you choose two independent random samples of stations for each brand, the variability in price due to geographic location may be large. To eliminate this source of variability, you could choose pairs of stations of similar size, one station for each brand, in close geographic proximity and use the sample of differences between the prices of the brands to make an inference about $(μ_{1} - μ_{2})$.
Suppose a college placement center wants to estimate the difference $(μ_{1} - μ_{2})$ in mean starting salaries for men and women graduates who seek jobs through the center. If it independently samples men and women, the starting salaries may vary because of their different college majors and differences in grade point averages. To eliminate these sources of variability, the placement center could match male and female job seekers according to their majors and grade point averages. Then the differences between the starting salaries of each pair in the sample could be used to make an inference about $(μ_{1} - μ_{2})$.
Suppose you wish to estimate the difference $(μ_{1} - μ_{2})$ in mean absorption rate into the bloodstream for two drugs that relieve pain. If you independently sample people, the absorption rates might vary because of age, weight, sex, blood pressure, etc. In fact, there are many possible sources of nuisance variability, and pairing individuals who are similar in all the possible sources would be quite difficult. However, it may be possible to obtain two measurements on the same person. First, we administer one of the two drugs and record the time until absorption. After a sufficient amount of time, the other drug is administered and a second measurement on absorption time is obtained. The differences between the measurements for each person in the sample could then be used to estimate $(μ_{1} - μ_{2})$. This procedure would be advisable only if the amount of time allotted between drugs is sufficient to guarantee little or no carry-over effect. Otherwise, it would be better to use different people matched as closely as possible on the factors thought to be most important.

Now Work Exercise 7.35

The hypothesis-testing procedures and the method of forming confidence intervals for the difference between two means in a paired difference experiment are summarized in the following boxes for both large and small n:

Paired Difference Confidence Interval for $μ_{d} = μ_{1} - μ_{2}$

Large Sample, Normal (z) Statistic

$\bar{x}_{d} \pm z_{α / 2} \frac{σ_{d}}{\sqrt{n_{d}}} \approx \bar{x}_{d} \pm z_{α / 2} \frac{s_{d}}{\sqrt{n_{d}}}$

Small Sample, Student’s t-Statistic

$\bar{x}_{d} \pm t_{α / 2} \frac{s_{d}}{\sqrt{n_{d}}}$

where $t_{α / 2}$ is based on $(n_{d} - 1)$ degrees of freedom

Paired Difference Test of Hypothesis for $μ_{d} = μ_{1} - μ_{2}$

	One-Tailed Tests		Two-Tailed Test
	$H_{0} : μ_{d} = D_{0}$	$H_{0} : μ_{d} = D_{0}$	$H_{0} : μ_{d} = D_{0}$
	$H_{a} : μ_{d} < D_{0}$	$H_{a} : μ_{d} > D_{0}$	$H_{a} : μ_{d} \neq D_{0}$
Large Sample, Normal (z) Test Statistic: $z_{c} = \frac{\bar{x}_{d} - D_{0}}{σ_{d} / \sqrt{n_{d}}} \approx \frac{\bar{x}_{d} - D_{0}}{s_{d} / \sqrt{n_{d}}}$
Rejection region:	$z_{c} < - z_{α}$	$z_{c} > z_{α}$	$\lvert z_{c} \rvert > z_{α / 2}$
p-value:	$P (z < z_{c})$	$P (z > z_{c})$	$2 P (z > z_{c})$ if $z_{c}$ is positive
			$2 P (z < z_{c})$ if $z_{c}$ is negative
Small Sample, Student’s t-Test Statistic: $t_{c} = \frac{\bar{x}_{d} - D_{0}}{s_{d} / \sqrt{n_{d}}}$
Rejection region:	$t_{c} < - t_{α}$	$t_{c} > t_{α}$	$\lvert t_{c} \rvert > t_{α / 2}$
p-value:	$P (t < t_{c})$	$P (t > t_{c})$	$2 P (t > t_{c})$ if $t_{c}$ is positive
			$2 P (t < t_{c})$ if $t_{c}$ is negative
Decision: Reject $H_{0}$ if $α >$ p-value or if the test statistic falls in the rejection region, where $P (z > z_{α}) = α$, $P (z > z_{α / 2}) = α / 2$, $P (t > t_{α}) = α$, $P (t > t_{α / 2}) = α / 2$, the distribution of t is based on $(n_{d} - 1)$ df, and $α = P (\text{Type I error}) = P (\text{Reject } H_{0} \mid H_{0} \text{ true})$.

[Note: The symbol for the numerical value assigned to the difference $μ_{d}$ under the null hypothesis is $D_{0}$. For testing equal population means, $D_{0} = 0$.]
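The small-sample test in the box can be sketched as a small Python function. This is only an illustration under stated assumptions: the helper names `t_upper_tail` and `paired_t_test` are hypothetical, and the tail probability is approximated by Simpson's rule applied to the Student's t density rather than taken from statistical software or a table.

```python
import math
import statistics


def t_upper_tail(t_value, df, steps=2000):
    """Approximate P(T > t_value) for Student's t with df degrees of freedom.

    Integrates the t density from 0 to |t_value| with Simpson's rule
    (steps must be even) and uses the symmetry of the distribution.
    """
    const = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def density(x):
        return const * (1 + x * x / df) ** (-(df + 1) / 2)

    upper = abs(t_value)
    h = upper / steps
    total = density(0.0) + density(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    tail = 0.5 - total * h / 3
    return tail if t_value >= 0 else 1 - tail


def paired_t_test(sample1, sample2, d0=0.0, alternative="greater"):
    """Return (t_c, df, p_value) for H0: mu_d = d0 using paired differences."""
    differences = [a - b for a, b in zip(sample1, sample2)]
    n_d = len(differences)
    df = n_d - 1
    t_c = (statistics.mean(differences) - d0) / (statistics.stdev(differences) / math.sqrt(n_d))
    if alternative == "greater":      # Ha: mu_d > d0
        p_value = t_upper_tail(t_c, df)
    elif alternative == "less":       # Ha: mu_d < d0
        p_value = 1 - t_upper_tail(t_c, df)
    elif alternative == "two-sided":  # Ha: mu_d != d0
        p_value = 2 * t_upper_tail(abs(t_c), df)
    else:
        raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")
    return t_c, df, p_value


# Usage with the reading scores of Table 7.3 (Ha: mu_d > 0).
new_method = [77, 74, 82, 73, 87, 69, 66, 80]
standard_method = [72, 68, 76, 68, 84, 68, 61, 76]
t_c, df, p_value = paired_t_test(new_method, standard_method)
print(f"t = {t_c:.2f}, df = {df}, p-value = {p_value:.5f}")
```

The decision rule is then the one stated in the box: reject $H_{0}$ when the returned p-value is smaller than the chosen $α$.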

Conditions Required for Valid Large-Sample Inferences about $μ_{d}$

A random sample of differences is selected from the target population of differences.
The sample size $n_{d}$ is large (i.e., $n_{d} \geq 30$). (By the Central Limit Theorem, this condition guarantees that the test statistic will be approximately normal, regardless of the shape of the underlying probability distribution of the population.)

Conditions Required for Valid Small-Sample Inferences about $μ_{d}$

A random sample of differences is selected from the target population of differences.
The population of differences has a distribution that is approximately normal.

GRADS Example 7.5 Confidence Interval for $μ_{d}$: Comparing Mean Salaries of Males and Females

Problem

An experiment is conducted to compare the starting salaries of male and female college graduates who find jobs. Pairs are formed by choosing a male and a female with the same major and similar grade point averages (GPAs). Suppose a random sample of 10 pairs is formed in this manner and the starting annual salary of each person is recorded. The results are shown in Table 7.5. Compare the mean starting salary $μ_{1}$ for males with the mean starting salary $μ_{2}$ for females, using a 95% confidence interval. Interpret the results.

Pair	Male	Female	Difference (Male − Female)
1	$29,300	$28,800	$500
2	41,500	41,600	−100
3	40,400	39,800	600
4	38,500	38,500	0
5	43,500	42,600	900
6	37,800	38,000	−200
7	69,500	69,200	300
8	41,200	40,100	1,100
9	38,400	38,200	200
10	59,200	58,500	700

Data Set: GRADS

Solution

Since the data on annual salary are collected in pairs of males and females matched on GPA and major, a paired difference experiment is performed. To conduct the analysis, we first compute the differences between the salaries, as shown in Table 7.5. Summary statistics for these $n = 10$ differences are displayed at the top of the SAS printout shown in Figure 7.13.

Figure 7.13

SAS analysis of salary differences

The 95% confidence interval for $μ_{d} = (μ_{1} - μ_{2})$ for this small sample is

$\bar{x}_{d} \pm t_{α / 2} \frac{s_{d}}{\sqrt{n_{d}}}$

where $t_{α / 2} = t_{.025} = 2.262$ (obtained from Table II, Appendix B) is based on $n_{d} - 1 = 9$ degrees of freedom. Substituting the values of $\bar{x}_{d}$ and $s_{d}$ shown on the printout, we obtain

$\begin{array}{rcl} \bar{x}_{d} \pm 2.262 \frac{s_{d}}{\sqrt{n_{d}}} & = & 400 \pm 2.262 \left( \frac{434.613}{\sqrt{10}} \right) \\ & = & 400 \pm 310.88 \approx 400 \pm 311 = (\$89, \$711) \end{array}$

[Note: This interval is also shown highlighted at the bottom of the SAS printout of Figure 7.13.] Our interpretation is that the true mean difference between the starting salaries of males and females falls between $89 and $711, with 95% confidence. Since the interval falls above 0, we infer that $μ_{1} - μ_{2} > 0$; that is, the mean salary for males exceeds the mean salary for females.
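The interval can be reproduced from the salaries in Table 7.5 with a brief Python sketch. The critical value 2.262 is the Table II value quoted in the solution; the variable names are illustrative only.

```python
import math
import statistics

# Starting salaries from Table 7.5 (10 matched male/female pairs).
male = [29300, 41500, 40400, 38500, 43500, 37800, 69500, 41200, 38400, 59200]
female = [28800, 41600, 39800, 38500, 42600, 38000, 69200, 40100, 38200, 58500]

# One difference per pair: male minus female.
differences = [m - f for m, f in zip(male, female)]

n_d = len(differences)
mean_d = statistics.mean(differences)
s_d = statistics.stdev(differences)

T_025_DF9 = 2.262  # t_.025 with 9 df, as quoted from Table II in the text

# Small-sample 95% confidence interval for mu_d.
margin = T_025_DF9 * s_d / math.sqrt(n_d)
lower, upper = mean_d - margin, mean_d + margin

print(f"mean difference = {mean_d:.0f}, s_d = {s_d:.3f}")
print(f"95% CI: ({lower:.0f}, {upper:.0f})")
```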

Look Back

Remember that $μ_{d} = μ_{1} - μ_{2}$. So if $μ_{d} > 0$, then $μ_{1} > μ_{2}$. Alternatively, if $μ_{d} < 0$, then $μ_{1} < μ_{2}$.

Now Work Exercise 7.41

To measure the amount of information about $(μ_{1} - μ_{2})$ gained by using a paired difference experiment in Example 7.5 rather than an independent samples experiment, we can compare the relative widths of the confidence intervals obtained by the two methods. A 95% confidence interval for $(μ_{1} - μ_{2})$ obtained from a paired difference experiment is, from Example 7.5, ($89, $711). If we mistakenly analyzed the same data as though this were an independent samples experiment,* we would first obtain the descriptive statistics shown in the SPSS printout of Figure 7.14. Then we substitute the sample means and standard deviations shown on the printout into the formula for a 95% confidence interval for $(μ_{1} - μ_{2})$ using independent samples. The result is

$(\bar{x}_{1} - \bar{x}_{2}) \pm t_{.025} \sqrt{s_{p}^{2} \left( \frac{1}{n_{1}} + \frac{1}{n_{2}} \right)}$

where

$s_{p}^{2} = \frac{(n_{1} - 1) s_{1}^{2} + (n_{2} - 1) s_{2}^{2}}{n_{1} + n_{2} - 2}$

Figure 7.14

SPSS analysis of salaries, assuming independent samples

SPSS performed these calculations and obtained the interval ($−10,537.50, $11,337.50), highlighted in Figure 7.14.

Notice that the independent samples interval includes 0. Consequently, if we were to use this interval to make an inference about $(μ_{1} - μ_{2})$, we would incorrectly conclude that the mean starting salaries of males and females do not differ! You can see that the confidence interval for the independent sampling experiment is about 35 times wider than the corresponding paired difference confidence interval. Blocking out the variability due to differences in majors and grade point averages significantly increases the information about the difference in males’ and females’ mean starting salaries by providing a much more precise estimate (a narrower confidence interval for the same confidence coefficient) of $(μ_{1} - μ_{2})$.
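The width comparison can be checked with the following Python sketch, which computes both margins of error from the salaries in Table 7.5. The value 2.101 for $t_{.025}$ with 18 df is a standard t-table value that is not quoted in the text, so it is an assumption here; because it is rounded, the independent-samples endpoints come out within about a dollar of the SPSS interval.

```python
import math
import statistics

# Starting salaries from Table 7.5 (10 matched male/female pairs).
male = [29300, 41500, 40400, 38500, 43500, 37800, 69500, 41200, 38400, 59200]
female = [28800, 41600, 39800, 38500, 42600, 38000, 69200, 40100, 38200, 58500]
n = len(male)

# Paired difference margin of error (valid analysis), 9 df.
differences = [m - f for m, f in zip(male, female)]
T_025_DF9 = 2.262  # quoted in Example 7.5
margin_paired = T_025_DF9 * statistics.stdev(differences) / math.sqrt(n)

# Independent-samples margin of error (invalid for these data), 18 df.
pooled_var = ((n - 1) * statistics.variance(male)
              + (n - 1) * statistics.variance(female)) / (2 * n - 2)
T_025_DF18 = 2.101  # assumed standard table value, not quoted in the text
margin_independent = T_025_DF18 * math.sqrt(pooled_var * (1 / n + 1 / n))

center = statistics.mean(male) - statistics.mean(female)
interval_independent = (center - margin_independent, center + margin_independent)
width_ratio = margin_independent / margin_paired

print(f"paired margin = {margin_paired:.2f}")
print(f"independent interval = ({interval_independent[0]:.0f}, {interval_independent[1]:.0f})")
print(f"width ratio = {width_ratio:.1f}")
```

Both intervals are centered at the same point, 400, so the ratio of the margins is also the ratio of the interval widths.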

You may wonder whether a paired difference experiment is always superior to an independent samples experiment. The answer is, most of the time, but not always. We sacrifice half the degrees of freedom in the t-statistic when a paired difference design is used instead of an independent samples design. This is a loss of information, and unless that loss is more than compensated for by the reduction in variability obtained by blocking (pairing), the paired difference experiment will result in a net loss of information about $(μ_{1} - μ_{2})$. Thus, we should be convinced that the pairing will significantly reduce variability before performing a paired difference experiment. Most of the time, this will happen.
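One way to see this trade-off is to compute, for the reading scores of Table 7.3, the smallest sample mean difference each one-tailed test would declare significant at $α = .05$ (critical value times standard error). The sketch below is illustrative only; the 14-df critical value 1.761 is a standard t-table value not quoted in the text and is therefore an assumption.

```python
import math
import statistics

# Reading test scores from Table 7.3 (eight matched pairs).
new_method = [77, 74, 82, 73, 87, 69, 66, 80]
standard_method = [72, 68, 76, 68, 84, 68, 61, 76]
n = len(new_method)
differences = [x1 - x2 for x1, x2 in zip(new_method, standard_method)]

# Standard errors of the estimated mean difference under each analysis.
pooled_var = (statistics.variance(new_method) + statistics.variance(standard_method)) / 2
se_independent = math.sqrt(pooled_var * (1 / n + 1 / n))  # equal sample sizes
se_paired = statistics.stdev(differences) / math.sqrt(n)

T_05_DF14 = 1.761  # assumed standard table value (14 df), not quoted in the text
T_05_DF7 = 1.895   # quoted in the text (7 df)

# Smallest observed mean difference that would lead to rejecting H0.
threshold_independent = T_05_DF14 * se_independent
threshold_paired = T_05_DF7 * se_paired
observed = statistics.mean(differences)

print(f"independent: need more than {threshold_independent:.2f} points")
print(f"paired:      need more than {threshold_paired:.2f} points")
print(f"observed mean difference: {observed:.3f}")
```

Halving the degrees of freedom raises the critical value only from 1.761 to 1.895, while pairing cuts the standard error from about 3.48 to about 0.60, so for these data the gain from blocking far outweighs the loss.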

Ethics in Statistics

In a two-group analysis, intentionally pairing observations after the data have been collected in order to produce a desired result is considered unethical statistical practice.

One final note: The pairing of the observations is determined before the experiment is performed (i.e., by the design of the experiment). A paired difference experiment is never obtained by pairing the sample observations after the measurements have been acquired.

What Do You Do When the Assumption of a Normal Distribution for the Population of Differences Is Not Satisfied?

Answer: Use the nonparametric Wilcoxon signed rank test for the paired difference design. (See optional Section 7.6.)

Exercises 7.29–7.52

Understanding the Principles

7.29 In a paired difference experiment, when should the observations be paired, before or after the data are collected?
7.30 What are the advantages of using a paired difference experiment over an independent samples design?
7.31 True or False. In a paired difference experiment, $\bar{x}_{d} = \bar{x}_{1} - \bar{x}_{2}$.
7.32 What conditions are required for valid large-sample inferences about $μ_{d}$? Small-sample inferences?

Learning the Mechanics

7.33 A paired difference experiment yielded $n_{d}$ pairs of observations. In each case, what is the rejection region for testing $H_{0} : μ_{d} = 2$ against $H_{a} : μ_{d} > 2$?
a. $n_{d} = 10, α = .05$
b. $n_{d} = 20, α = .10$
c. $n_{d} = 5, α = .025$
d. $n_{d} = 9, α = .01$
7.34 A paired difference experiment produced the following data:

$n_{d} = 16, \quad \bar{x}_{1} = 143, \quad \bar{x}_{2} = 150, \quad \bar{x}_{d} = -7, \quad s_{d}^{2} = 64$
a. Determine the values of t for which the null hypothesis $μ_{1} - μ_{2} = 0$ would be rejected in favor of the alternative hypothesis $μ_{1} - μ_{2} < 0$. Use $α = .10$.
b. Conduct the paired difference test described in part a. Draw the appropriate conclusions.
c. What assumptions are necessary so that the paired difference test will be valid?
d. Find a 90% confidence interval for the mean difference $μ_{d}$.
e. Which of the two inferential procedures, the confidence interval of part d or the test of hypothesis of part b, provides more information about the difference between the population means?
L07035 7.35 The data for a random sample of six paired observations are shown in the following table.

Pair	Sample from Population 1	Sample from Population 2
1	7	4
2	3	1
3	9	7
4	6	2
5	4	4
6	8	7

a. Calculate the difference between each pair of observations by subtracting observation 2 from observation 1. Use the differences to calculate $\bar{x}_{d}$ and $s_{d}^{2}$.
b. If $μ_{1}$ and $μ_{2}$ are the means of populations 1 and 2, respectively, express $μ_{d}$ in terms of $μ_{1}$ and $μ_{2}$.
c. Form a 95% confidence interval for $μ_{d}$.
d. Test the null hypothesis $H_{0} : μ_{d} = 0$ against the alternative hypothesis $H_{a} : μ_{d} \neq 0$. Use $α = .05$.
L07036 7.36 The data for a random sample of 10 paired observations are shown in the following table.

Pair	Population 1	Population 2
1	19	24
2	25	27
3	31	36
4	52	53
5	49	55
6	34	34
7	59	66
8	47	51
9	17	20
10	51	55

a. If you wish to test whether these data are sufficient to indicate that the mean for population 2 is larger than that for population 1, what are the appropriate null and alternative hypotheses? Define any symbols you use.
b. Conduct the test from part a, using $α = .10$. What is your decision?
c. Find a 90% confidence interval for $μ_{d}$. Interpret this interval.
d. What assumptions are necessary to ensure the validity of the preceding analysis?
7.37 A paired difference experiment yielded the following results:

$n_{d} = 40, \quad \bar{x}_{d} = 11.7, \quad s_{d} = 6$
a. Test $H_{0} : μ_{d} = 10$ against $H_{a} : μ_{d} \neq 10$, where $μ_{d} = (μ_{1} - μ_{2})$. Use $α = .05$.
b. Report the p-value for the test you conducted in part a. Interpret the p-value.

Applying the Concepts—Basic

7.38 Summer weight-loss camp. Camp Jump Start is an 8-week summer camp for overweight and obese adolescents. Counselors develop a weight-management program for each camper that centers on nutrition education and physical activity. In a study published in Pediatrics (Apr. 2010), the body mass index (BMI) was measured for each of 76 campers both at the start and end of camp. Summary statistics on BMI measurements are shown in the table.

	Mean	Standard Deviation
Starting BMI	34.9	6.9
Ending BMI	31.6	6.2
Paired Differences	3.3	1.5

Based on Huelsing, J., Kanafani, N., Mao, J., and White, N. H. “Camp Jump Start: Effects of a residential summer weight-loss camp for older children and adolescents.” Pediatrics, Vol. 125, No. 4, Apr. 2010 (Table 3).
a. Give the null and alternative hypotheses for determining whether the mean BMI at the end of camp is less than the mean BMI at the start of camp.
b. How should the data be analyzed, as an independent-samples t-test or as a paired-difference t-test? Explain.
c. Calculate the test statistic using the formula for an independent-samples t-test. (Note: This is not how the test should be conducted.)
d. Calculate the test statistic using the formula for a paired-difference t-test.
e. The p-value of the test, part d, was reported as $p < .0001$. Interpret this result, assuming $α = .01$.
f. Do the differences in BMI values need to be normally distributed in order for the inference, part e, to be valid? Explain.
g. Find a 99% confidence interval for the true mean change in BMI for Camp Jump Start campers. Interpret the result.
7.39 Packaging of a children’s health food. Refer to the Journal of Consumer Behaviour (Vol. 10, 2011) study of packaging of a children’s health food product, Exercise 8.42 (p. 391). Recall that a fictitious brand of a healthy food product (sliced apples) was packaged to appeal to children (a smiling cartoon apple on the front of the package). The researchers compared the appeal of this fictitious brand to a commercially available brand of sliced apples that was not packaged for children. Each of 408 schoolchildren rated both brands on a 5-point “willingness to eat” scale, with 1 = “not willing at all” and 5 = “very willing.” The fictitious brand had a sample mean score of 3.69, while the commercially available brand had a sample mean score of 3.00. The researchers wanted to compare the population mean score for the fictitious brand, $μ_{F}$, to the population mean score for the commercially available brand, $μ_{C}$. They theorized that $μ_{F}$ will be greater than $μ_{C}$.
a. Specify the null and alternative hypotheses for the test.
b. Explain how the researchers should analyze the data and why.
c. The researchers reported a test statistic value of 5.71. Interpret this result. Use $α = .05$ to draw your conclusion.
d. Find the approximate p-value of the test.
e. Could the researchers have tested at $α = .01$ and arrived at the same conclusion?

7.40 DRILL2 Twinned drill holes. A traditional method of verifying mineralization grades in mining is to drill twinned holes, i.e., the drilling of a new hole, or “twin,” next to an earlier drillhole. The use of twinned drill holes was investigated in Exploration and Mining Geology (Vol. 18, 2009). Geologists use data collected at both holes to estimate the total amount of heavy minerals (THM) present at the drilling site. The data in the next table (based on information provided in the journal article) represent THM percentages for a sample of 15 twinned holes drilled at a diamond mine in Africa. The geologists want to know if there is any evidence of a difference in the true THM means of all original holes and their twin holes drilled at the mine.

a. Explain why the data should be analyzed as paired differences.
b. Compute the difference between the “1st hole” and “2nd hole” measurements for each drilling location.
c. Find the mean and standard deviation of the differences, part b.
d. Use the summary statistics, part c, to find a 90% confidence interval for the true mean difference (“1st hole” minus “2nd hole”) in THM measurements.
e. Interpret the interval, part d. Can the geologists conclude that there is no evidence of a difference in the true THM means of all original holes and their twin holes drilled at the mine?

Location	1st Hole	2nd Hole
1	5.5	5.7
2	11.0	11.2
3	5.9	6.0
4	8.2	5.6
5	10.0	9.3
6	7.9	7.0
7	10.1	8.4
8	7.4	9.0
9	7.0	6.0
10	9.2	8.1
11	8.3	10.0
12	8.6	8.1
13	10.5	10.4
14	5.5	7.0
15	10.0	11.2

MUSEUM 7.41 Healing potential of handling museum objects. Does handling a museum object have a positive impact on a sick patient’s well-being? To answer this question, researchers at the University College London collected data from 32 sessions with hospital patients (Museum & Society, Nov. 2009). Each patient’s health status (measured on a 100-point scale) was recorded both before and after handling museum objects such as archaeological artifacts and brass etchings. The data (simulated) are listed in the accompanying table.

Session	Before	After
1	52	59
2	42	54
3	46	55
4	42	51
5	43	42
6	30	43
7	63	79
8	56	59
9	46	53
10	55	57
11	43	49
12	73	83
13	63	72
14	40	49
15	50	49
16	50	64
17	65	65
18	52	63
19	39	50
20	59	69
21	49	61
22	59	66
23	57	61
24	56	58
25	47	55
26	61	62
27	65	61
28	36	53
29	50	61
30	40	52
31	65	70
32	59	72

a. Explain why the data should be analyzed as paired differences.
b. Compute the difference between the “before” and “after” measurements for each session.
c. Find the mean and standard deviation of the differences, part b.
d. Use the summary statistics, part c, to find a 90% confidence interval for the true mean difference (“before” minus “after”) in health status scale measurements.
e. Interpret the interval, part d. Does handling a museum object have a positive impact on a sick patient’s well-being?

7.42 Laughter among deaf signers. The Journal of Deaf Studies and Deaf Education (Fall 2006) published an article on vocalized laughter among deaf users of American Sign Language (ASL). In videotaped ASL conversations among deaf participants, 28 laughed at least once. The researchers wanted to know if they laughed more as speakers (while signing) or as audience members (while listening). For each of the 28 deaf participants, the number of laugh episodes as a speaker and the number of laugh episodes as an audience member were determined. One goal of the research was to compare the mean numbers of laugh episodes of speakers and audience members.
a. Explain why the data should be analyzed as a paired difference experiment.
b. Identify the study’s target parameter.
c. The study yielded a sample mean of 3.4 laughter episodes for speakers and a sample mean of 1.3 laughter episodes for audience members. Is this sufficient evidence to conclude that the population means are different? Explain.
d. A paired difference t-test resulted in $t = 3.14$ and p-value $< .01$. Interpret the results in the words of the problem.
7.43 The placebo effect and pain. According to research published in Science (Feb. 20, 2004), the mere belief that you are receiving an effective treatment for pain can reduce the pain you actually feel. Researchers tested this placebo effect on 24 volunteers as follows: Each volunteer was put inside a magnetic resonance imaging (MRI) machine for two consecutive sessions. During the first session, electric shocks were applied to their arms and the blood oxygen level–dependent (BOLD) signal (a measure related to neural activity in the brain) was recorded during pain. The second session was identical to the first, except that, prior to applying the electric shocks, the researchers smeared a cream on the volunteer’s arms. The volunteers were informed that the cream would block the pain when, in fact, it was just a regular skin lotion (i.e., a placebo). If the placebo is effective in reducing the pain experience, the BOLD measurements should be higher, on average, in the first MRI session than in the second.
1. Identify the target parameter for this study.
2. What type of design was used to collect the data?
3. Give the null and alternative hypotheses for testing the placebo effect theory.
4. The differences between the BOLD measurements in the first and second sessions were computed and summarized in the study as follows: $n_{d} = 24$, $\overline{x}_{d} = .21$, $s_{d} = .47$. Use this information to calculate the test statistic.
5. The p-value of the test was reported as p-value $= .02$. Make the appropriate conclusion at $α = .05$.
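
As a sketch of the arithmetic behind part 4 of Exercise 7.43, assuming the usual paired difference test statistic with a hypothesized mean difference of 0, the summary values can be substituted as follows:

$t = \frac{\overline{x}_{d} - 0}{s_{d} / \sqrt{n_{d}}} = \frac{.21}{.47 / \sqrt{24}} \approx \frac{.21}{.0959} \approx 2.19$

The statistic has $n_{d} - 1 = 23$ degrees of freedom.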

Applying the Concepts—Intermediate

SHALLOW 7.44 Settlement of shallow foundations. Structures built on a shallow foundation (e.g., a concrete slab-on-grade foundation) are susceptible to settlement. Consequently, accurate settlement prediction is essential in the design of the foundation. Several methods for predicting settlement of shallow foundations on cohesive soil were compared in Environmental & Engineering Geoscience (Nov. 2012). Settlement data for a sample of 13 structures built on a shallow foundation were collected. The actual settlement values (measured in millimeters) for each structure were compared to settlement predictions made using a formula that accounts for dimension, rigidity, and embedment depth of the foundation. The data are listed in the table.

Structure	Actual	Predicted
1	11	11
2	11	11
3	10	12
4	8	6
5	11	9
6	9	10
7	9	9
8	39	51
9	23	24
10	269	252
11	4	3
12	82	68
13	250	264

Source: Ozur, M. “Comparing methods for predicting immediate settlement of shallow foundations on cohesive soils based on hypothetical and real cases.” Environmental & Engineering Geoscience, Vol. 18, No. 4, Nov. 2012 (from Table 4).

1. What type of design was employed to collect the data?
2. Use the information in the accompanying SAS printout to construct a 99% confidence interval for the mean difference between actual and predicted settlement value. Give a practical interpretation of the interval.
3. Explain the meaning of “99% confidence” for this application.
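
The SAS printout is not reproduced here, so the following is only a rough sketch of how the 99% interval for Exercise 7.44 could be computed directly from the table. The variable names are illustrative, and the critical value 3.055 is the tabled $t_{.005}$ for 12 degrees of freedom, entered by hand rather than computed.

```python
import math
import statistics

# Settlement values (mm) for the 13 structures in the table of Exercise 7.44
actual = [11, 11, 10, 8, 11, 9, 9, 39, 23, 269, 4, 82, 250]
predicted = [11, 11, 12, 6, 9, 10, 9, 51, 24, 252, 3, 68, 264]

# Paired differences: actual minus predicted
diffs = [a - p for a, p in zip(actual, predicted)]
n = len(diffs)

mean_d = statistics.mean(diffs)   # sample mean of the differences
sd_d = statistics.stdev(diffs)    # sample standard deviation (n - 1 divisor)

# Assumption: t_{.005} with n - 1 = 12 df, taken from a t table
T_CRIT = 3.055

half_width = T_CRIT * sd_d / math.sqrt(n)
lower, upper = mean_d - half_width, mean_d + half_width

print(f"mean difference = {mean_d:.3f}, s_d = {sd_d:.3f}")
print(f"99% CI for mu_d: ({lower:.2f}, {upper:.2f})")
```

An interval that contains 0 would indicate no detectable difference, at this confidence level, between mean actual and mean predicted settlement.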

SOLAR 7.45 Solar energy generation along highways. The potential of using solar panels constructed above national highways to generate energy was explored in the International Journal of Energy and Environmental Engineering (Dec. 2013). Two-layer solar panels (with 1 meter separating the panels) were constructed above sections of both east-west and north-south highways in India. The amount of energy (kilowatt hours) supplied to the country’s grid by the solar panels above the two types of highways was determined each month. The data for several randomly selected months are provided in the table. The researchers concluded that the “two-layer solar panel energy generation is more viable for the north-south oriented highways as compared to east-west oriented roadways.” Do you agree?

Month	East-West	North-South
February	8658	8921
April	7930	8317
July	5120	5274
September	6862	7148
October	8608	8936

Source: Sharma, P., and Harinarayana, T. “Solar energy generation potential along national highways.” International Journal of Energy and Environmental Engineering, Vol. 49, No. 1, Dec. 2013 (Table 3).
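
One way to check the researchers' claim in Exercise 7.45 is a paired difference t-test on the monthly differences. The sketch below assumes a one-tailed test at $α = .05$ (the exercise does not fix a significance level) and uses the tabled critical value $t_{.05} = 2.132$ for 4 degrees of freedom; the variable names are illustrative.

```python
import math
import statistics

# Energy supplied (kilowatt hours) in the five sampled months
east_west = [8658, 7930, 5120, 6862, 8608]
north_south = [8921, 8317, 5274, 7148, 8936]

# Paired differences: north-south minus east-west, month by month
diffs = [ns - ew for ns, ew in zip(north_south, east_west)]
n = len(diffs)

mean_d = statistics.mean(diffs)
sd_d = statistics.stdev(diffs)

# Test statistic for H0: mu_d = 0 against Ha: mu_d > 0
t_stat = mean_d / (sd_d / math.sqrt(n))

# Assumption: alpha = .05, one-tailed, df = 4, value read from a t table
T_CRIT = 2.132
reject_h0 = t_stat > T_CRIT

print(f"mean difference = {mean_d:.1f}, s_d = {sd_d:.2f}, t = {t_stat:.2f}")
print("Reject H0" if reject_h0 else "Fail to reject H0")
```

With only five pairs, the conclusion depends on the population of differences being approximately normal.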

SKIN 7.46 Estimating well scale deposits. Scale deposits can cause a serious reduction in the flow performance of a well. A study published in the Journal of Petroleum and Gas Engineering (Apr. 2013) compared two methods of estimating the damage from scale deposits (called skin factor). One method of estimating the well skin factor uses a series of Excel spreadsheets, while the second method employs EPS computer software. Skin factor data was obtained from applying both methods to 10 randomly selected oil wells: 5 vertical wells and 5 horizontal wells. The results are supplied in the accompanying table.

1. Compare the mean skin factor values for the two estimation methods using all 10 sampled wells. Test at $α = .05$. What do you conclude?
2. Repeat part 1, but analyze the data for the 5 horizontal wells only.
3. Repeat part 1, but analyze the data for the 5 vertical wells only.

Skin Factor Values
Well (Type)	Excel Spreadsheet	EPS Software
1 (Horizontal)	44.48	37.77
2 (Horizontal)	18.34	13.31
3 (Horizontal)	19.21	7.02
4 (Horizontal)	11.70	4.77
5 (Horizontal)	9.25	1.96
6 (Vertical)	317.40	281.74
7 (Vertical)	181.44	192.16
8 (Vertical)	154.65	140.84
9 (Vertical)	77.43	56.86
10 (Vertical)	49.37	45.01

Source: Rahuma, K. M., et al. “Comparison between spreadsheet and specialized programs in calculating the effect of scale deposition on the well flow performance.” Journal of Petroleum and Gas Engineering, Vol. 4, No. 4, Apr. 2013 (Table 2).
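
The three analyses requested in Exercise 7.46 differ only in which wells are included, so a small helper can be reused for each subset. This is a minimal sketch under stated assumptions: `paired_t` is a hypothetical helper name, the tests are two-tailed at $α = .05$, and the critical values 2.262 (9 df) and 2.776 (4 df) are read from a t table.

```python
import math
import statistics

# Skin factor values: (well type, Excel spreadsheet, EPS software)
wells = [
    ("Horizontal", 44.48, 37.77),
    ("Horizontal", 18.34, 13.31),
    ("Horizontal", 19.21, 7.02),
    ("Horizontal", 11.70, 4.77),
    ("Horizontal", 9.25, 1.96),
    ("Vertical", 317.40, 281.74),
    ("Vertical", 181.44, 192.16),
    ("Vertical", 154.65, 140.84),
    ("Vertical", 77.43, 56.86),
    ("Vertical", 49.37, 45.01),
]


def paired_t(rows):
    """Return (mean difference, t statistic) for Excel minus EPS."""
    diffs = [excel - eps for _, excel, eps in rows]
    mean_d = statistics.mean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean_d, mean_d / se


horizontal = [w for w in wells if w[0] == "Horizontal"]
vertical = [w for w in wells if w[0] == "Vertical"]

mean_all, t_all = paired_t(wells)
mean_h, t_h = paired_t(horizontal)
mean_v, t_v = paired_t(vertical)

# Assumed two-tailed critical values at alpha = .05 (from a t table)
CRIT_DF9, CRIT_DF4 = 2.262, 2.776

for label, t, crit in [("all 10", t_all, CRIT_DF9),
                       ("horizontal", t_h, CRIT_DF4),
                       ("vertical", t_v, CRIT_DF4)]:
    decision = "reject H0" if abs(t) > crit else "fail to reject H0"
    print(f"{label}: t = {t:.2f}, critical value = {crit}, {decision}")
```

Comparing the three results shows how the well-to-well variability within each subset affects the conclusion.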

MAWASH 7.47 Acidity of mouthwash. Acid has been found to be a primary cause of dental caries (cavities). It is theorized that oral mouthwashes contribute to the development of caries due to the antiseptic agent oxidizing into acid over time. This theory was tested in the Journal of Dentistry, Oral Medicine and Dental Education (Vol. 3, 2009). Three bottles of mouthwash, each of a different brand, were randomly selected from a drugstore. The pH level (where lower pH levels indicate higher acidity) of each bottle was measured on the date of purchase and after 30 days. The data are shown in the next table. Conduct an analysis to determine if the mean initial pH level of mouthwash differs significantly from the mean pH level after 30 days. Use $α = .05$ as your level of significance.

Mouthwash Brand	Initial pH	Final pH
LMW	4.56	4.27
SMW	6.71	6.51
RMW	5.65	5.58

Based on Chunhye, K. L., and Schmitz, B. C., “Determination of pH, total acid, and total ethanol in oral health products: Oxidation of ethanol and recommendations to mitigate its association with dental caries.” Journal of Dentistry, Oral Medicine and Dental Education, Vol. 3, No. 1, 2009 (Table 1).
7.48 Visual search and memory study. In searching for an item (e.g., a roadside traffic sign, a lost earring, or a tumor in a mammogram), common sense dictates that you will not reexamine items previously rejected. However, researchers at Harvard Medical School found that a visual search has no memory (Nature, Aug. 6, 1998). In their experiment, nine subjects searched for the letter “T” mixed among several letters “L.” Each subject conducted the search under two conditions: random and static. In the random condition, the locations of the letters were changed every 111 milliseconds; in the static condition, the locations of the letters remained unchanged. In each trial, the reaction time in milliseconds (i.e., the amount of time it took the subject to locate the target letter) was recorded.
1. One goal of the research was to compare the mean reaction times of subjects in the two experimental conditions. Explain why the data should be analyzed as a paired difference experiment.
2. If a visual search has no memory, then the mean reaction times in the two conditions will not differ. Specify $H_{0}$ and $H_{a}$ for testing the “no-memory” theory.
3. The test statistic was calculated as $t = 1.52$ with p-value $= .15$. Draw the appropriate conclusion.

DEMENT 7.49 Linking dementia and leisure activities. Does participation in leisure activities in your youth reduce the risk of Alzheimer’s disease and other forms of dementia? To answer this question, a group of university researchers studied a sample of 107 same-sex Swedish pairs of twins (Journal of Gerontology: Psychological Sciences and Social Sciences, Sept. 2003). Each pair of twins was discordant for dementia; that is, one member of each pair was diagnosed with Alzheimer’s disease while the other member (the control) was nondemented for at least five years after the sibling’s onset of dementia. The level of overall leisure activity (measured on an 80-point scale, where higher values indicate higher levels of leisure activity) of each twin of each pair 20 years prior to the onset of dementia was obtained from the Swedish Twin Registry database. The leisure activity scores (simulated on the basis of summary information presented in the journal article) are saved in the DEMENT file. The first five and last five observations are shown in the following table.

Pair	Control	Demented
1	27	13
2	57	57
3	23	31
4	39	46
5	37	37
$⋮$	$⋮$	$⋮$
103	22	14
104	32	23
105	33	29
106	36	37
107	24	1

1. Explain why the data should be analyzed as a paired difference experiment.
2. Conduct the appropriate analysis, using $α = .05$. Make an inference about which member of the pair, the demented or control (nondemented) twin, had the larger average level of leisure activity.

7.50 Ethical sensitivity of teachers toward racial intolerance. Many high schools have education programs that encourage teachers to embrace racial tolerance. To gauge the effectiveness of one such program that utilizes two videos of teachers engaging in racial stereotypes of their students, researchers recruited 238 high school professionals (including teachers and counselors) to participate in a study (Journal of Moral Education, Mar. 2010). Teachers watched the first video, then were given a pretest—the Quick-REST Survey—designed to measure ethical sensitivity toward racial intolerance. The teachers next participated in an all-day workshop on cultural competence. At the end of the workshop, the teachers watched the second video and again were given the Quick-REST Survey (the posttest). To determine whether the program was effective, the researchers compared the mean scores on the Quick-REST Survey using a paired-difference t-test. (Note: The higher the score on the Quick-REST Survey, the greater the level of racial tolerance.)
1. The researchers reported the sample means for the pretest and posttest as 75.85 and 80.35, respectively. Why is it dangerous to gauge the effectiveness of the program based only on these summary statistics?
2. The paired-difference t-test (posttest minus pretest) was reported as $t = 4.50$ with an associated observed significance level of p-value $< .001$. Interpret this result.
3. What assumptions, if any, are necessary for the validity of the inference in part 2?

REDLIT 7.51 Impact of red light cameras on car crashes. To combat red-light-running crashes, many states are adopting photo-red enforcement programs. In these programs, red light cameras installed at dangerous intersections photograph the license plates of vehicles that run the red light. How effective are photo-red enforcement programs in reducing red-light-running crash incidents at intersections? The Virginia Department of Transportation (VDOT) conducted a comprehensive study of its newly adopted photo-red enforcement program and published the results in a June 2007 report. In one portion of the study, the VDOT provided crash data both before and after installation of red light cameras at several intersections. The data (measured as the number of crashes caused by red light running per intersection per year) for 13 intersections in Fairfax County, Virginia, are given in the table. Analyze the data for the VDOT. What do you conclude?

Intersection	Before Camera	After Camera
1	3.60	1.36
2	0.27	0
3	0.29	0
4	4.55	1.79
5	2.60	2.04
6	2.29	3.14
7	2.40	2.72
8	0.73	0.24
9	3.15	1.57
10	3.21	0.43
11	0.88	0.28
12	1.35	1.09
13	7.35	4.92

Based on Virginia Transportation Research Council, “Research report: The impact of red light cameras (photo-red enforcement) on crashes in Virginia.” June 2007.

Applying the Concepts—Advanced

WINE40 7.52 Alcoholic fermentation in wines. Determining alcoholic fermentation in wine is critical to the wine-making process. Must/wine density is a good indicator of the fermentation point, since the density value decreases as sugars are converted into alcohol. For decades, winemakers have measured must/wine density with a hydrometer. Although accurate, the hydrometer employs a manual process that is very time consuming. Consequently, large wineries are searching for more rapid measures of density measurement. An alternative method utilizes the hydrostatic balance instrument (similar to the hydrometer, but digital). A winery in Portugal collected must/wine density measurements on white wine samples randomly selected from the fermentation process for a recent harvest. For each sample, the density of the wine at 20°C was measured with both the hydrometer and the hydrostatic balance. The densities for 40 wine samples are saved in the WINE40 file. The first five and last five observations are shown in the accompanying table. The winery will use the alternative method of measuring wine density only if it can be demonstrated that the mean difference between the density measurements of the two methods does not exceed .002. Perform the analysis for the winery. Provide the winery with a written report of your conclusions.

Sample	Hydrometer	Hydrostatic
1	1.08655	1.09103
2	1.00270	1.00272
3	1.01393	1.01274
4	1.09467	1.09634
5	1.10263	1.10518
$⋮$	$⋮$	$⋮$
36	1.08084	1.08097
37	1.09452	1.09431
38	0.99479	0.99498
39	1.00968	1.01063
40	1.00684	1.00526

Based on Cooperative Cellar of Borba (Adega Cooperativa de Borba), Portugal.
