6.4 Test of Hypothesis about a Population Mean: Normal (z) Statistic

When testing a hypothesis about a population mean μ, the test statistic we use will depend on whether the sample size n is large (say, n ≥ 30) or small and whether we know the value of the population standard deviation, σ. In this section, we consider the large-sample case.

Because the sample size is large, the Central Limit Theorem guarantees that the sampling distribution of x̄ is approximately normal. Consequently, the test statistic for a test based on large samples will be based on the normal z-statistic. Although the z-statistic requires that we know the true population standard deviation σ, this is rarely, if ever, the case. However, we established in Chapter 5 that when n is large, the sample standard deviation s provides a good approximation to σ, and the z-statistic can be approximated as follows:

z = (x̄ − μ0)/σ_x̄ = (x̄ − μ0)/(σ/√n) ≈ (x̄ − μ0)/(s/√n)

where μ0 represents the value of μ specified in the null hypothesis.

The setup of a large-sample test of hypothesis about a population mean is summarized in the following boxes. Both the one- and two-tailed tests are shown.

Large-Sample Test of Hypothesis about μ Based on a Normal (z) Statistic

Test statistic:
    σ known:    zc = (x̄ − μ0)/(σ/√n)
    σ unknown:  zc ≈ (x̄ − μ0)/(s/√n)

                    One-Tailed Tests      Two-Tailed Test
H0:                 μ = μ0                μ = μ0
Ha:                 μ > μ0                μ ≠ μ0
Rejection region:   zc > zα               |zc| > zα/2
p-value:            P(z > zc)             2P(z > zc) if zc is positive
                                          2P(z < zc) if zc is negative

For a lower-tailed alternative (Ha: μ < μ0), the rejection region is zc < −zα and the p-value is P(z < zc).

Decision: Reject H0 if α > p-value or if the test statistic (zc) falls in the rejection region, where P(z > zα) = α, P(z > zα/2) = α/2, and α = P(Type I error) = P(Reject H0 | H0 true).

[Note: The symbol for the numerical value assigned to μ under the null hypothesis is μ0.]
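The box above can be sketched in code. The following is a minimal illustration, not a replacement for statistical software: the function name `large_sample_z_test`, its parameters, and the numbers in the usage lines (x̄ = 52, μ0 = 50, standard deviation 10, n = 100) are hypothetical and chosen only to show the mechanics. It uses `NormalDist` from Python's standard `statistics` module for the standard normal distribution.

```python
from math import sqrt
from statistics import NormalDist


def large_sample_z_test(xbar, mu0, sd, n, alpha=0.05, tail="two"):
    """Large-sample z test for a population mean (illustrative sketch).

    sd is sigma when it is known, otherwise the sample standard deviation s.
    tail is "upper" (Ha: mu > mu0), "lower" (Ha: mu < mu0), or "two" (Ha: mu != mu0).
    Returns (zc, critical value, p-value, reject H0?).
    """
    std_normal = NormalDist()                # standard normal: mean 0, sd 1
    zc = (xbar - mu0) / (sd / sqrt(n))       # test statistic
    if tail == "upper":
        critical = std_normal.inv_cdf(1 - alpha)       # z_alpha
        p_value = 1 - std_normal.cdf(zc)               # P(z > zc)
        reject = zc > critical
    elif tail == "lower":
        critical = -std_normal.inv_cdf(1 - alpha)      # -z_alpha
        p_value = std_normal.cdf(zc)                   # P(z < zc)
        reject = zc < critical
    elif tail == "two":
        critical = std_normal.inv_cdf(1 - alpha / 2)   # z_{alpha/2}
        p_value = 2 * (1 - std_normal.cdf(abs(zc)))    # double the tail area
        reject = abs(zc) > critical
    else:
        raise ValueError("tail must be 'upper', 'lower', or 'two'")
    return zc, critical, p_value, reject


# Hypothetical numbers, for illustration only.
zc, critical, p_value, reject = large_sample_z_test(52, 50, 10, 100, alpha=0.05, tail="upper")
print(f"zc = {zc:.2f}, critical value = {critical:.3f}, p-value = {p_value:.4f}, reject H0: {reject}")
```

Note that the rejection-region rule and the p-value rule always give the same decision; the function returns both so that either can be reported.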

Conditions Required for a Valid Large-Sample Hypothesis Test for μ

  1. A random sample is selected from the target population.

  2. The sample size n is large (e.g., n ≥ 30). (Due to the Central Limit Theorem, this condition guarantees that the test statistic will be approximately normal regardless of the shape of the underlying probability distribution of the population.)
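One way to see what the second condition buys us is a small simulation. The sketch below is an illustration under assumed settings, not part of the original analysis: it draws repeated samples of size n = 100 from a right-skewed (exponential) population whose true mean equals the hypothesized value, so H0 is true, and records how often a two-tailed z test at α = .05 rejects. The function name, the choice of population, the number of repetitions, and the seed are all assumptions made for the example.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev


def type_i_error_rate(n=100, reps=2000, alpha=0.05, seed=1):
    """Estimate how often a two-tailed z test rejects a true H0
    when sampling from a right-skewed (exponential) population."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    mu0 = 1.0                      # true mean of an exponential population with rate 1
    rejections = 0
    for _ in range(reps):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        zc = (mean(sample) - mu0) / (stdev(sample) / sqrt(n))   # s estimates sigma
        if abs(zc) > z_crit:
            rejections += 1
    return rejections / reps


rate = type_i_error_rate()
print(f"Proportion of samples rejecting a true H0: {rate:.3f}")
```

If the large-sample approximation is working, the printed proportion should be in the neighborhood of α = .05 even though the population is far from normal.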

Once the test has been set up, the sampling experiment is performed and the test statistic calculated. The next box contains possible conclusions for a test of hypothesis, depending on the result of the sampling experiment.

Possible Conclusions for a Test of Hypothesis

  1. If the calculated test statistic falls in the rejection region, reject H0 and conclude that the alternative hypothesis Ha is true. State that you are rejecting H0 at the α level of significance. Remember that the confidence is in the testing process, not the particular result of a single test.

  2. If the test statistic does not fall in the rejection region, conclude that the sampling experiment does not provide sufficient evidence to reject H0 at the α level of significance. [Generally, we will not “accept” the null hypothesis unless the probability β of a Type II error has been estimated.]

DRUGRAT Example 6.5 Carrying Out a Hypothesis Test for μ—Mean Drug Response Time

Problem

  1. Refer to the neurological response-time test set up in Example 6.3 (p. 315). The sample of 100 drug-injected rats yielded the results (in seconds) shown in Table 6.3. At α=.01, use these data to conduct the test of hypothesis,

    H0: μ = 1.2
    Ha: μ ≠ 1.2

    Table 6.3 Drug Response Times for 100 Rats, Example 6.5

    1.90 2.17 0.61 1.17 0.66 1.86 1.41 1.30 0.70 0.56
    2.00 1.27 0.98 1.55 0.64 0.60 1.55 0.93 0.48 0.39
    0.86 1.19 0.79 1.37 1.31 0.85 0.71 1.21 1.23 0.89
    1.84 0.80 0.64 1.08 0.74 0.93 1.71 1.05 1.44 0.42
    0.70 0.54 1.40 1.06 0.54 0.17 0.98 0.89 1.28 0.68
    0.98 1.14 1.16 1.64 1.16 1.01 1.09 0.77 1.58 0.99
    0.57 0.27 0.51 1.27 1.81 0.88 0.31 0.92 0.93 1.66
    0.21 0.79 0.94 0.45 1.19 1.60 0.14 0.99 1.08 1.57
    0.55 1.65 0.81 1.00 2.55 1.96 1.31 1.88 1.51 1.48
    0.61 0.05 1.21 0.48 1.63 1.45 0.22 0.49 1.29 1.40

    Data Set: DRUGRAT

Solution

  1. To carry out the test, we need to find the values of x̄ and s. (In this study, σ is obviously unknown, so we will use s to estimate σ.) These values, x̄ = 1.0517 and s = .4982, are shown (highlighted) on the MINITAB printout of Figure 6.13. Now we substitute these sample statistics into the test statistic and obtain

    z = (x̄ − 1.2)/σ_x̄ = (x̄ − 1.2)/(σ/√n) ≈ (1.0517 − 1.2)/(.4982/√100) = −2.98

    The implication is that the sample mean, 1.0517, is (approximately) three standard deviations below the null-hypothesized value of 1.2 in the sampling distribution of x̄.

    Recall from Example 6.3 that the rejection region for the test at α=.01 is

    Rejection region: |z| > 2.575

    From Figure 6.5 (p. 316), you can see that z = −2.98 falls in the lower-tail rejection region, which consists of all values of z < −2.575. Therefore, this sampling experiment provides sufficient evidence to reject H0 and conclude, at the α = .01 level of significance, that the mean response time for drug-injected rats differs from the control mean of 1.2 seconds. It appears that the rats receiving an injection of the drug have a mean response time that is less than 1.2 seconds.
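The arithmetic in this solution can be checked with a few lines of code. This is a minimal sketch that starts from the summary statistics reported above (x̄ = 1.0517, s = .4982, n = 100) rather than from the raw data; the variable names are ours. The software critical value is about 2.576, slightly different from the table value 2.575 used in the text, which does not change the decision.

```python
from math import sqrt
from statistics import NormalDist

# Summary statistics reported for the DRUGRAT sample
xbar, s, n = 1.0517, 0.4982, 100
mu0, alpha = 1.2, 0.01

z = (xbar - mu0) / (s / sqrt(n))                  # s stands in for the unknown sigma
z_crit = NormalDist().inv_cdf(1 - alpha / 2)      # z_{alpha/2} for a two-tailed test
reject = abs(z) > z_crit

print(f"z = {z:.2f}, rejection region: |z| > {z_crit:.3f}, reject H0: {reject}")
```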

Figure 6.13

MINITAB Analysis of Drug Response Times, Example 6.5

Look Back

Four points about the test of hypothesis in this example apply to all statistical tests:

  1. Since z is less than −2.575, it is tempting to state our conclusion at a significance level lower than α = .01. We resist this temptation because the level of α is determined before the sampling experiment is performed. If we decide that we are willing to tolerate a 1% Type I error rate, the result of the sampling experiment should have no effect on that decision. In general, the same data should not be used both to set up and to conduct the test.

  2. When we state our conclusion at the .01 level of significance, we are referring to the failure rate of the procedure, not the result of this particular test. We know that the test procedure will lead to the rejection of the null hypothesis only 1% of the time when in fact μ = 1.2. Therefore, when the test statistic falls into the rejection region, we infer that the alternative μ ≠ 1.2 is true and express our confidence in the procedure by quoting either the α level of significance or the 100(1 − α)% confidence level.

  3. Although a test may lead to a “statistically significant” result (i.e., rejecting H0 at significance level α, as in the preceding test), it may not be “practically significant.” For example, suppose the neurologist tested n = 100,000 drug-injected rats, resulting in x̄ = 1.1995 and s = .05. Now a two-tailed hypothesis test of H0: μ = 1.2 results in a test statistic of

    z = (1.1995 − 1.2)/(.05/√100,000) = −3.16

    This result at α = .01 leads us to “reject H0” and conclude that the mean μ is “statistically different” from 1.2. However, for all practical purposes, the sample mean x̄ = 1.1995 and the hypothesized mean μ = 1.2 are the same. Because the result is not “practically significant,” the neurologist is not likely to consider a unit dose of the drug as an inhibitor to response time in rats. Consequently, not all “statistically significant” results are “practically significant.”

  4. Finally, the inference derived from a two-tailed test of hypothesis will match the inference obtained from a confidence interval whenever the same value of α is used for both. To see this, a 99% confidence interval for μ is shown on the MINITAB printout, Figure 6.13. Note that the interval, (.92, 1.18), does not include the hypothesized value, μ = 1.2. Consequently, both the confidence interval (using α = .01) and the two-tailed hypothesis test (using α = .01) lead to the same conclusion: that the true mean drug response time is not equal to 1.2 seconds.
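Points 3 and 4 lend themselves to a quick numerical check. The sketch below, with a helper name `z_interval` of our own choosing, recomputes the 99% large-sample confidence interval from the reported summary statistics and confirms that it excludes 1.2; it then recomputes the test statistic for the hypothetical sample of 100,000 rats to show how a trivially small difference becomes statistically significant.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(xbar, s, n, conf):
    """Large-sample confidence interval for mu: xbar +/- z_{alpha/2} * s / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    half_width = z * s / sqrt(n)
    return xbar - half_width, xbar + half_width


# Point 4: the 99% interval and the two-tailed test at alpha = .01 agree.
lo, hi = z_interval(1.0517, 0.4982, 100, 0.99)
contains_null = lo <= 1.2 <= hi
print(f"99% CI: ({lo:.2f}, {hi:.2f}); contains 1.2: {contains_null}")

# Point 3: a tiny difference with a huge sample is still statistically significant.
z_big = (1.1995 - 1.2) / (0.05 / sqrt(100_000))
print(f"z for n = 100,000: {z_big:.2f}; difference from 1.2: {1.2 - 1.1995:.4f} seconds")
```

The interval excludes μ0 exactly when |z| exceeds zα/2, which is why the two procedures cannot disagree when they use the same α.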

Now Work Exercise 6.40

DRUGRAT Example 6.6 Using p-Values—Test of Mean Drug Response Time

Problem

  1. Find the observed significance level (p-value) for the test of the mean drug response time in Examples 6.3 and 6.5. Interpret the result.

Solution

  1. Again, we are testing H0: μ = 1.2 seconds versus Ha: μ ≠ 1.2 seconds. The observed value of the test statistic in Example 6.5 was z = −2.98, and any value of z less than −2.98 or greater than 2.98 (because this is a two-tailed test) would be even more contradictory to H0. Therefore, the observed significance level for the test is

    p-value = P(z < −2.98 or z > 2.98) = P(|z| > 2.98)

    Thus, we calculate the area below the observed z-value, z = −2.98, and double it. Consulting Table II in Appendix B, we find that P(z < −2.98) = .5 − .4986 = .0014. Therefore, the p-value for this two-tailed test is

    2P(z < −2.98) = 2(.0014) = .0028

    This p-value can also be obtained using statistical software. The rounded p-value is shown (highlighted) on the MINITAB printout, Figure 6.13. Since α = .01 is greater than p-value = .0028, our conclusion is identical to that in Example 6.5: reject H0.
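As a rough illustration of the software route, the two-tailed p-value can be computed directly from the standard normal distribution. This sketch uses the rounded test statistic z = −2.98; because software does not round the tail area to four decimals the way the table does, the result may differ from .0028 in the fourth decimal place.

```python
from statistics import NormalDist

z = -2.98                               # observed test statistic from Example 6.5
alpha = 0.01

lower_tail = NormalDist().cdf(z)        # P(z < -2.98)
p_value = 2 * lower_tail                # two-tailed: double the tail area

print(f"P(z < {z}) = {lower_tail:.4f}, p-value = {p_value:.4f}, reject H0: {p_value < alpha}")
```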

Look Back

We can interpret this p-value as a strong indication that the drug has an effect on mean rat response because we would observe a test statistic this extreme or more extreme only 28 in 10,000 times if the drug had no effect (μ=1.2). The extent to which the mean differs from 1.2 could be better determined by calculating a confidence interval for μ. As we saw in Example 6.5, the 99% confidence interval for μ is (.92, 1.18).

Now Work Exercise 6.36

LOS Example 6.7 Using p-Values—Test of Mean Hospital Length of Stay

Problem

  1. Knowledge of the amount of time a patient occupies a hospital bed—called length of stay (LOS)—is important for allocating resources. At one hospital, the mean LOS was determined to be 5 days. A hospital administrator believes the mean LOS may now be less than 5 days due to a newly adopted managed health care system. To check this, the LOSs (in days) for 100 randomly selected hospital patients were recorded; these data are listed in Table 6.4. Suppose we want to test the hypothesis that the true mean length of stay at the hospital is less than 5 days; that is,

    H0: μ = 5 (Mean LOS after adoption is 5 days)
    Ha: μ < 5 (Mean LOS after adoption is less than 5 days)

    Assuming that σ=3.68, use the data in the table to conduct the test at α=.05.

    Table 6.4 Lengths of Stay for 100 Hospital Patients

    2 3 8 6 4 4 6 4 2 5
    8 10 4 4 4 2 1 3 2 10
    1 3 2 3 4 3 5 2 4 1
    2 9 1 7 17 9 9 9 4 4
    1 1 1 3 1 6 3 3 2 5
    1 3 3 14 2 3 9 6 6 3
    5 1 4 6 11 22 1 9 6 5
    2 2 5 4 3 6 1 5 1 6
    17 1 2 4 5 4 4 3 2 3
    3 5 2 3 3 2 10 2 4 2

    Data Set: LOS

Solution

  1. The data were entered into a computer, and MINITAB was used to conduct the analysis. The MINITAB printout for the lower-tailed test is displayed in Figure 6.14. Both the test statistic, z = −1.28, and the p-value of the test, p = .101, are highlighted on the MINITAB printout. Since the p-value exceeds our selected α value, α = .05, we cannot reject the null hypothesis. Hence, there is insufficient evidence (at α = .05) to conclude that the true mean LOS at the hospital is less than 5 days.

    Figure 6.14

    MINITAB lower-tailed test of mean LOS, Example 6.7
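The same lower-tailed analysis can be sketched without MINITAB. The code below enters the 100 lengths of stay from Table 6.4 row by row, takes σ = 3.68 as given in the problem, and computes the test statistic and the lower-tail p-value; the variable names are ours, and the output is meant only as a check against the printout.

```python
from math import sqrt
from statistics import NormalDist, fmean

# Lengths of stay (days) from Table 6.4, entered row by row
los = [
    2, 3, 8, 6, 4, 4, 6, 4, 2, 5,
    8, 10, 4, 4, 4, 2, 1, 3, 2, 10,
    1, 3, 2, 3, 4, 3, 5, 2, 4, 1,
    2, 9, 1, 7, 17, 9, 9, 9, 4, 4,
    1, 1, 1, 3, 1, 6, 3, 3, 2, 5,
    1, 3, 3, 14, 2, 3, 9, 6, 6, 3,
    5, 1, 4, 6, 11, 22, 1, 9, 6, 5,
    2, 2, 5, 4, 3, 6, 1, 5, 1, 6,
    17, 1, 2, 4, 5, 4, 4, 3, 2, 3,
    3, 5, 2, 3, 3, 2, 10, 2, 4, 2,
]

sigma, mu0, alpha = 3.68, 5, 0.05       # sigma is assumed known in this example
n = len(los)
xbar = fmean(los)

z = (xbar - mu0) / (sigma / sqrt(n))
p_value = NormalDist().cdf(z)           # lower-tailed test: P(z < zc)
reject = p_value < alpha

print(f"n = {n}, xbar = {xbar:.2f}, z = {z:.2f}, p-value = {p_value:.3f}, reject H0: {reject}")
```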

Look Back

A hospital administrator, desirous of a mean length of stay less than 5 days, may be tempted to select an α level that leads to a rejection of the null hypothesis after determining p-value=.101. There are two reasons one should resist this temptation. First, the administrator would need to select α>.101 (say, α=.15) in order to conclude that Ha:μ<5 is true. A Type I error rate of 15% is considered too large by most researchers. Second, and more importantly, such a strategy is considered unethical statistical practice. (See marginal note, p. 384.)

Now Work Exercise 6.45

In the next section, we demonstrate how to conduct a test for μ when the sample size is small.

Statistics in Action Revisited

Testing a Population Mean in the KLEENEX® Survey

Refer to Kimberly-Clark Corporation’s survey of 250 people who kept a count of their use of KLEENEX® tissues in diaries (p. 316). We want to test the claim made by marketing experts that μ, the average number of tissues used by people with colds, is greater than 60 tissues. That is, we want to test

H0: μ = 60
Ha: μ > 60

We will select α=.05 as the level of significance for the test.

The survey results for the 250 sampled KLEENEX® users are stored in the TISSUES data file. A MINITAB analysis of the data yielded the printout displayed in Figure SIA6.1.

The observed significance level of the test, highlighted on the printout, is p-value = .000 (to three decimal places). Since this p-value is less than α = .05, we have sufficient evidence to reject H0; therefore, we conclude that the mean number of tissues used by a person with a cold is greater than 60 tissues. This result supports the company’s decision to put more than 60 (in fact, 68) tissues in a cold-care (anti-viral) box of KLEENEX.

Figure SIA6.1

MINITAB test of μ=60 for KLEENEX survey

Data Set: TISSUES

Exercises 6.31–6.50

Understanding the Principles

  1. 6.31 What conditions are required for a valid large-sample test for μ?

  2. 6.32 For what values of the test statistic do you reject H0? fail to reject H0?

Learning the Mechanics

  1. 6.33 A random sample of 100 observations from a population with standard deviation 60 yielded a sample mean of 110.

    1. Test the null hypothesis that μ=100 against the alternative hypothesis that μ>100, using α=.05. Interpret the results of the test.

    2. Test the null hypothesis that μ = 100 against the alternative hypothesis that μ ≠ 100, using α = .05. Interpret the results of the test.

    3. Compare the p-values of the two tests you conducted. Explain why the results differ.

  2. 6.34 A random sample of 64 observations produced the following summary statistics: x̄ = .323 and s² = .034.

    1. Test the null hypothesis that μ=.36 against the alternative hypothesis that μ<.36, using α=.10.

    2. Test the null hypothesis that μ = .36 against the alternative hypothesis that μ ≠ .36, using α = .10. Interpret the result.

  3. 6.35 Suppose you are interested in conducting the statistical test of H0:μ=200 against Ha:μ>200, and you have decided to use the following decision rule: Reject H0 if the sample mean of a random sample of 100 items is more than 215. Assume that the standard deviation of the population is 80.

    1. Express the decision rule in terms of z.

    2. Find α, the probability of making a Type I error, by using this decision rule.

Applet Exercise 6.2

Use the applet entitled Hypotheses Test for a Mean to investigate the effect of the underlying distribution on the proportion of Type I errors. For this exercise, take null mean = 50 and alternative <.

  1. Select the normal distribution and run the applet several times without clearing. What happens to the proportion of times the null hypothesis is rejected at the .05 level as the applet is run more and more times?

  2. Clear the applet and then repeat part a, using the right-skewed distribution. Do you get similar results? Explain.

  3. Describe the effect that the underlying distribution has on the probability of making a Type I error.

Applet Exercise 6.3

Use the applet entitled Hypotheses Test for a Mean to investigate the effect of the underlying distribution on the proportion of Type II errors. For this exercise, take null mean = 52 and alternative <.

  1. Select the normal distribution and run the applet several times without clearing. What happens to the proportion of times the null hypothesis is rejected at the .01 level as the applet is run more and more times? Is this what you would expect? Explain.

  2. Clear the applet and then repeat part a, using the right-skewed distribution. Do you get similar results? Explain.

  3. Describe the effect that the underlying distribution has on the probability of making a Type II error.

Applet Exercise 6.4

Use the applet entitled Hypotheses Test for a Mean to investigate the effect of the null mean on the probability of making a Type II error. For this exercise, take standard deviation = 10 and alternative < with the normal distribution. Set the null mean to 55 and run the applet several times without clearing. Record the proportion of Type II errors that occurred at the .01 level. Clear the applet and repeat for null means of 54, 53, 52, and 51. What can you conclude about the probability of a Type II error as the null mean gets closer to the actual mean? Can you offer a reasonable explanation for this behavior?

Applying the Concepts—Basic

  1. FUP 6.36 Stability of compounds in new drugs. Refer to the ACS Medicinal Chemistry Letters (Vol. 1, 2010) study of the metabolic stability of drugs, Exercise 2.35 (p. 49). Recall that two important values computed from the testing phase are the fraction of compound unbound to plasma (fup) and the fraction of compound unbound to microsomes (fumic). A key formula for assessing stability assumes that the fup/fumic ratio is 1. Pharmacologists at Pfizer Global Research and Development tested 416 drugs and reported the fup/fumic ratio for each. These data are saved in the FUP file, and summary statistics are provided in the accompanying MINITAB printout. Suppose the pharmacologists want to determine if the true mean ratio, μ, differs from 1.

    1. Specify the null and alternative hypothesis for this test.

    2. Descriptive statistics for the sample ratios are provided in the accompanying MINITAB printout. Note that the sample mean, x̄ = .327, is less than 1. Consequently, a pharmacologist wants to reject the null hypothesis. What are the problems with using such a decision rule?

    3. Locate values of the test statistic and corresponding p-value on the printout.

    4. Select a value of α, the probability of a Type I error. Interpret this value in the words of the problem.

    5. Give the appropriate conclusion, based on the results of parts c and d.

    6. What conditions must be satisfied for the test results to be valid?

  2. SUSTAIN 6.37 Corporate sustainability of CPA firms. Refer to the Business and Society (Mar. 2011) study on the sustainability behaviors of CPA corporations, Exercise 5.18 (p. 262). Recall that the level of support for corporate sustainability (measured on a quantitative scale ranging from 0 to 160 points) was obtained for each in a sample of 992 senior managers at CPA firms. The data (where higher point values indicate a higher level of support for sustainability) are saved in the referenced file. The CEO of a CPA firm claims that the true mean level of support for sustainability is 75 points.

    1. Specify the null and alternative hypotheses for testing this claim.

    2. For this problem, what is a Type I error? A Type II error?

    3. The SPSS printout at the bottom of this page gives the results of the test. Locate the p-value on the printout.

    4. At α=.05, give the appropriate conclusion.

    5. What assumptions, if any, about the distribution of support levels must hold true in order for the inference derived from the test to be valid? Explain.

  3. 6.38 Speeding and young drivers. Psychologists conducted a survey of 258 student drivers and their attitudes toward speeding and reported the results in the British Journal of Educational Psychology (Vol. 80, 2010). One of the variables of interest was the response to the question, “Are you confident that you can resist your friends’ persuasion to drive faster?” Each response was measured on a 7-point scale, from 1 = “definitely no” to 7 = “definitely yes.” The data were collected 5 months after the students had attended a safe-driver presentation. The psychologists reported a sample mean response of 4.98 and a sample standard deviation of 1.62. Suppose it is known that the true mean response of students who do not attend a safe-driver presentation is μ = 4.7.

    1. Set up the null and alternative hypotheses for testing whether the true mean student-driver response 5 months after a safe-driver presentation is larger than 4.7.

    2. Calculate the test statistic for the hypothesis test.

    3. Find the rejection region for the hypothesis test, using α=.05.

    4. State the appropriate conclusion, in the words of the problem.

    5. Do the test results indicate that the safe-driver presentation was effective in helping students feel more confident that they can resist their friends’ persuasion to drive faster? Explain.

    6. The distribution of response scores (on a 7-point scale) for all student drivers is unlikely to be normal. Does this impact the validity of the hypothesis test? Why or why not?

  4. 6.39 Emotional empathy in young adults. According to a theory in psychology, young female adults show more emotional empathy toward others than do males. The Journal of Moral Education (June 2010) tested this theory by examining the attitudes of a sample of 30 female college students. Each student completed the Ethic of Care Interview, which consisted of a series of statements on empathy attitudes. For the statement on emotional empathy (e.g., “I often have tender, concerned feelings for people less fortunate than me”), the sample mean response was 3.28. Assume the population standard deviation for females is .5. [Note: Empathy scores ranged from 0 to 4, where 0=never and 4=always.] Suppose it is known that male college students have an average emotional empathy score of μ=3.

    1. Specify the null and alternative hypotheses for testing whether female college students score higher than 3.0 on the emotional empathy scale.

    2. Compute the test statistic.

    3. Find the observed significance level (p-value) of the test.

    4. At α=.01, what is the appropriate conclusion?

    5. How small of an α-value can you choose and still have sufficient evidence to reject the null hypothesis?

  5. 6.40 Heart rate during laughter. Laughter is often called “the best medicine,” since studies have shown that laughter can reduce muscle tension and increase oxygenation of the blood. In the International Journal of Obesity (Jan. 2007), researchers at Vanderbilt University investigated the physiological changes that accompany laughter. Ninety subjects (18–34 years old) watched film clips designed to evoke laughter. During the laughing period, the researchers measured the heart rate (beats per minute) of each subject, with the following summary results: x̄ = 73.5, s = 6. It is well known that the mean resting heart rate of adults is 71 beats per minute.

    1. Set up H0 and Ha for testing whether the true mean heart rate during laughter exceeds 71 beats per minute.

    2. If α = .05, find the rejection region for the test.

    3. Calculate the value of the test statistic.

    4. Make the appropriate conclusion.

  6. 6.41 Facial structure of CEOs. Refer to the Psychological Science (Vol. 22, 2011) study on using a chief executive officer’s facial structure to predict a firm’s financial performance, Exercise 5.21 (p. 263). Recall that the facial width-to-height ratio (WHR) for each in a sample of 55 CEOs at publicly traded Fortune 500 firms was determined. The sample resulted in x̄ = 1.96 and s = .15. An analyst wants to predict the financial performance of a Fortune 500 firm based on the value of the true mean facial WHR of CEOs. The analyst wants to use the value of μ = 2.2. Do you recommend he use this value? Conduct a test of hypothesis for μ to help you answer the question. Specify all the elements of the test: H0, Ha, test statistic, p-value, α, and your conclusion.

Applying the Concepts—Intermediate

  1. 6.42 Packaging of a children’s health food. Junk foods (e.g., potato chips) are typically packaged to appeal to children. Can similar packaging of a healthy food product influence children’s desire to consume the product? This was the question of interest in an article published in the Journal of Consumer Behaviour (Vol. 10, 2011). A fictitious brand of a healthy food product, sliced apples, was packaged to appeal to children (a smiling cartoon apple on the front of the package). The researchers showed the packaging to a sample of 408 schoolchildren and asked each whether he or she was willing to eat the product. Willingness to eat was measured on a 5-point scale, with 1 = not willing at all and 5 = very willing. The data are summarized as follows: x̄ = 3.69, s = 2.44. Suppose the researchers know that the mean willingness to eat an actual brand of sliced apples (which is not packaged for children) is μ = 3.

    1. Conduct a test to determine whether the true mean willingness to eat the brand of sliced apples packaged for children exceeds 3. Use α=.05 to make your conclusion.

    2. The data (willingness to eat values) are not normally distributed. How does this affect (if at all) the validity of your conclusion in part a? Explain.

  2. 6.43 Dating and disclosure. Refer to the Journal of Adolescence (Apr. 2010) study of adolescents’ disclosure of their dating and romantic relationships, Exercise 1.33 (p. 23). Recall that a sample of 222 high school students was recruited to participate in the study. One of the variables of interest was the level of disclosure to an adolescent’s mother (measured on a 5-point scale, where 1 = never tell, 2 = rarely tell, 3 = sometimes tell, 4 = almost always tell, and 5 = always tell). The sampled high school students had a mean disclosure score of 3.26 and a standard deviation of .93. The researchers hypothesize that the true mean disclosure score of all adolescents will exceed 3. Do you believe the researchers? Conduct a formal test of hypothesis using α = .01.

  3. 6.44 Birth order and IQ. An international team of economists investigated the possible link between IQ and birth order in CESifo Economic Studies (Vol. 57, 2011). The data source for the research was the Medical Birth Registry of Norway. It is known that the mean IQ (measured in stanines) for all Norway residents is 5.2 points. In the study, a sample of 581 Norway residents who were the 6th-born or later in their families had a mean IQ score of 4.7 points with a standard deviation of 1.8 points. Is this sufficient evidence to conclude that the mean IQ score of all Norway residents who were the 6th-born or later in their families is lower than the country mean of 5.2 points? Use α=.01 as a measure of reliability for your inference.

  4. BONES 6.45 Bone fossil study. Archeologists have found that humerus bones from the same species of animal tend to have approximately the same length-to-width ratios. It is known that species A exhibits a mean ratio of 8.5. Suppose 41 fossils of humerus bones were unearthed at an archeological site in East Africa, where species A is believed to have lived. (Assume that the unearthed bones were all from the same unknown species.) The length-to-width ratios of the bones are listed in the following table.

    10.73 8.89 9.07 9.20 10.33 9.98 9.84 9.59
    8.48 8.71 9.57 9.29 9.94 8.07 8.37 6.85
    8.52 8.87 6.23 9.41 6.66 9.35 8.86 9.93
    8.91 11.77 10.48 10.39 9.39 9.17 9.89 8.17
    8.93 8.80 10.02 8.38 11.67 8.30 9.17 12.00
    9.38

    1. Test whether the population mean ratio of all bones of this particular species differs from 8.5. Use α=.01.

    2. What are the practical implications of the test you conducted in part a?

  5. TURBINE 6.46 Cooling method for gas turbines. A popular cooling method for a gas turbine engine uses high-pressure inlet fogging. The performance of a sample of 67 gas turbines augmented with high-pressure inlet fogging was investigated in the Journal of Engineering for Gas Turbines and Power (Jan. 2005). One measure of performance is heat rate (kilojoules per kilowatt per hour). Heat rates for the 67 gas turbines are listed in the next table. Suppose that a standard gas turbine has, on average, a heat rate of 10,000 kJ/kWh.

    1. Conduct a test to determine whether the mean heat rate of gas turbines augmented with high-pressure inlet fogging exceeds 10,000 kJ/kWh. Use α=.05.

    2. Identify a Type I error for this study. Identify a Type II error.

    14622 13196 11948 11289 11964 10526 10387 10592 10460 10086
    14628 13396 11726 11252 12449 11030 10787 10603 10144 11674
    11510 10946 10508 10604 10270 10529 10360 14796 12913 12270
    11842 10656 11360 11136 10814 13523 11289 11183 10951 9722
    10481 9812 9669 9643 9115 9115 11588 10888 9738 9295
    9421 9105 10233 10186 9918 9209 9532 9933 9152 9295
    16243 14628 12766 8714 9469 11948 12414
  6. ISR 6.47 Irrelevant speech effects. Refer to the Acoustical Science & Technology (Vol. 35, 2014) study of irrelevant speech effects, Exercise 5.14 (p. 261). Recall that subjects performed a memorization task under two conditions: (1) with irrelevant background speech and (2) in silence. The difference in the error rates for the two conditions, called the relative difference in error rate (RDER), was computed for each subject. Descriptive statistics for the RDER values are reproduced in the following SAS printout. Conduct a test to determine if the average difference in error rates for all subjects who perform the memorization tasks exceeds 75 percent. Use α = .01 and interpret the results practically.

  7. 6.48 Time required to complete a task. When asked, “How much time will you require to complete this task?”, cognitive theory posits that people will typically underestimate the time required. Would the opposite theory hold if the question was phrased in terms of how much work could be completed in a given amount of time? This was the question of interest to researchers writing in Applied Cognitive Psychology (Vol. 25, 2011). For one study conducted by the researchers, each in a sample of 40 University of Oslo students was asked how many minutes it would take to read a 32-page report. In a second study, 42 students were asked how many pages of a lengthy report they could read in 48 minutes. (The students in either study did not actually read the report.) Numerical descriptive statistics (based on summary information published in the article) for both studies are provided in the accompanying table.

    Estimated Time (minutes) Estimated Number of Pages
    Sample size, n 40 42
    Sample mean, x̄ 60 28
    Sample standard deviation, s 41 14
    1. The researchers determined that the actual mean time it takes to read the report is μ=48 minutes. Is there evidence to support the theory that the students, on average, will overestimate the time it takes to read the report? Test using α=.10.

    2. The researchers also determined that the actual mean number of pages of the report that is read within the allotted time is μ=32 pages. Is there evidence to support the theory that the students, on average, will underestimate the number of report pages that can be read? Test using α=.10.

    3. The researchers noted that the distributions of both estimated time and estimated number of pages are highly skewed (i.e., not normally distributed). Does this fact affect the inferences derived in parts a and b? Explain.

Applying the Concepts—Advanced

  1. 6.49 Social interaction of mental patients. The Community Mental Health Journal (Aug. 2000) presented the results of a survey of over 6,000 clients of the Department of Mental Health and Addiction Services (DMHAS) in Connecticut. One of the many variables measured for each mental health patient was frequency of social interaction (on a 5-point scale, where 1=very infrequently, 3=occasionally, and 5=very frequently). The 6,681 clients who were evaluated had a mean social interaction score of 2.95 with a standard deviation of 1.10.

    1. Conduct a hypothesis test (at α = .01) to determine whether the true mean social interaction score of all Connecticut mental health patients differs from 3.

    2. Examine the results of the study from a practical view, and then discuss why “statistical significance” does not always imply “practical significance.”

    3. Because the variable of interest is measured on a 5-point scale, it is unlikely that the population of ratings will be normally distributed. Consequently, some analysts may perceive the test from part a to be invalid and search for alternative methods of analysis. Defend or refute this position.

  2. 6.50 Instructing English-as-a-first-language learners. A study published in the journal Applied Linguistics (Feb. 2014) investigated the effects of implicit and explicit classroom interventions during the instruction of Japanese to English-as-a-first-language (EFL) students. EFL students at a Japanese university were divided into implicit (n = 44) and explicit (n = 37) groups. [Note: Implicit instruction emphasized reading comprehension only, while explicit instruction included both deductive and inductive activities designed to bring learners’ attention to rules and patterns.] Following an instruction period, all students were tested on their ability to use epistemic terms such as think, maybe, seem, and may in an essay. A summary of the epistemic type scores for the two groups of EFL students is provided in the table on p. 393.

    Table for Exercise 6.50

                        Mean Score (points)   Standard Deviation
    Implicit (n = 44)   5.9                   2.4
    Explicit (n = 37)   9.7                   3.1

    Source: Fordyce, K. “The differential effects of explicit and implicit instruction on EFL learners’ use of epistemic stance.” Applied Linguistics, Vol. 35, No. 1, Feb. 2014 (Table 6).

    1. A 95% confidence interval for μImp, the true mean epistemic type score of EFL students taught with implicit instruction, is (5.17, 6.63). Based on this interval, is there evidence that μImp differs from 6?

    2. Use the summary information in the table to test the hypothesis that μImp differs from 6 at α=.05.

    3. Explain why the inferences in parts a and b agree.

    4. A 95% confidence interval for μExp, the true mean epistemic type score of EFL students taught with explicit instruction, is (8.67, 10.73). Based on this interval, is there evidence that μExp differs from 6?

    5. Use the summary information in the table to test the hypothesis that μExp differs from 6 at α=.05.

    6. Explain why the inferences in parts d and e agree.
