Chapter 13
IN THIS CHAPTER
Understanding the basic parts of a hypothesis test
Setting up your null and alternative hypotheses
Calculating test statistics
Determining the critical values
Drawing conclusions and making interpretations
A hypothesis test is a statistical procedure designed to test a claim about a population. In other words, someone proposes a certain parameter, and you want to test the claim by using data. Testing a hypothesis is different from a confidence interval situation, where you have no idea what the parameter is and you use your data to estimate it.
In this chapter, you review the basic ideas and steps of conducting a hypothesis test, and you practice setting up and carrying out the most common hypothesis tests: the tests for a population mean, a population proportion, two population means, and two population proportions. You also practice working with the t-distribution, which you use in situations where the sample sizes are too small to use the Z-distribution, or if the population standard deviation, $\sigma$, is unknown (see Chapters 7 and 8 for more on the Z- and t-distributions).
The most common hypothesis test is the test for one population mean. Someone is claiming that the population mean is one value, and you are testing that claim because you believe it to be false. In this first section, you go through each step of a hypothesis test using the situation where you are testing one population mean. You also assume that the population standard deviation, $\sigma$, is known and that you have either a normal distribution for your population or a large enough sample size (say, 30 or more).
Every hypothesis test contains two hypotheses. The first hypothesis is called the null hypothesis, denoted Ho (pronounced “H naught”). The null hypothesis states that the population parameter is equal to the claimed value. For example, if you claim that the average test score for a population of students is 75, you have Ho: $\mu = 75$. In general, when you are conducting a hypothesis test about the population mean, you use $\mu_0$ to denote the claimed value of $\mu$ in the null hypothesis (for example, 75).
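Along with every null hypothesis is an alternative hypothesis, denoted Ha (or $H_1$). If you find enough evidence to reject Ho, you conclude in favor of the alternative hypothesis. Three possibilities exist for the alternative hypothesis:
The population parameter is not equal to the claimed value (Ha: $\mu \neq \mu_0$).
The population parameter is greater than the claimed value (Ha: $\mu > \mu_0$).
The population parameter is less than the claimed value (Ha: $\mu < \mu_0$).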
Which alternative hypothesis you choose when setting up your hypothesis test depends on what you want to conclude, should you have enough evidence to refute the claim (Ho). If someone claims that the average test score is higher than 75, you have Ha: $\mu > 75$. If someone thinks that the percentage of students who own cellphones is less than 65 percent, you have Ha: $p < 0.65$.
After you set up the hypotheses, the next step is to collect the data and calculate your sample statistic (the sample mean, $\bar{x}$). Convert your statistic to a standard score so you can interpret it on a Z-table (see the Appendix). To convert your sample statistic to a test statistic:
1. Subtract the claimed value, $\mu_0$, from your sample statistic, $\bar{x}$.
2. Divide the result by the standard error of the statistic.
Assuming you have a normal distribution and the population standard deviation is known, the standard error for the sample mean is $\sigma / \sqrt{n}$ (see Chapter 10).
The test statistic is $z = \dfrac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$.
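The conversion from a sample mean to a test statistic can be sketched in a few lines of Python. This is only an illustration, not part of the workbook: the function name `z_statistic` and its parameter names are assumptions made here, and the inputs are the figures from the football practice problem later in this chapter (claimed mean 3 hours, sample mean 3.25 hours, population standard deviation 1 hour, 35 games).

```python
import math

def z_statistic(sample_mean, claimed_mean, sigma, n):
    """Standard score of a sample mean: (x-bar - mu_0) / (sigma / sqrt(n))."""
    standard_error = sigma / math.sqrt(n)  # standard error of the sample mean
    return (sample_mean - claimed_mean) / standard_error

# Football problem: Ho says the mean is 3 hours; the sample of 35 games averages 3.25.
z = z_statistic(sample_mean=3.25, claimed_mean=3.0, sigma=1.0, n=35)
print(round(z, 2))
```

The function assumes the population standard deviation is known, which matches the setting of this section; it does not check that assumption for you.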
After you calculate your test statistic, the hard part is over. Now all you have to do is make your conclusion by seeing where the test statistic falls on the Z-distribution. You can do this in one of two ways: using critical values or p-values. This chapter deals with the critical value method; the p-value method is in Chapter 14.
Under the critical value method, before you collect your data, you set one (or two) cutoff point(s) on the Z-distribution so that if your test statistic falls beyond the cutoff point(s), you reject Ho; otherwise, you fail to reject Ho. The cutoff points are called critical values. (A critical value is much like the goal line in football: it's the place you have to reach to “score” a significant result and reject Ho.)
The critical values are determined by the significance level (or alpha level) of the test, denoted by $\alpha$. Alpha levels differ for each situation, but most researchers are happy with an alpha level of 0.05, much in the same way they’re happy with a 95 percent confidence level for a confidence interval. (Notice that $1 - \alpha$ equals the confidence level of a confidence interval.)
Table 13-1 shows some critical values for one-sided and two-sided hypothesis tests that use the Z-distribution. My computer software generated the values, so they’re more precise than what you find in the Z-table (refer to the Appendix). Many more alpha levels are possible, but this table gives you a good start. Note that the not-equal-to alternative has $1 - \alpha$ of the probability lying between two critical values, rather than all below one value (as in the greater-than alternative) or all above one value (as in the less-than alternative).
Table 13-1 Critical Values for Hypothesis Tests Using the Z-Distribution
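Alpha Level | Alternative Hypothesis | Critical Value(s)
0.01 | Greater than ($>$) | $+2.326$
0.01 | Less than ($<$) | $-2.326$
0.01 | Not equal to ($\neq$) | $+2.576$ and $-2.576$
0.05 | Greater than ($>$) | $+1.645$
0.05 | Less than ($<$) | $-1.645$
0.05 | Not equal to ($\neq$) | $+1.960$ and $-1.960$
0.10 | Greater than ($>$) | $+1.282$
0.10 | Less than ($<$) | $-1.282$
0.10 | Not equal to ($\neq$) | $+1.645$ and $-1.645$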
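If you want to check critical values like these yourself, one option is Python's standard library, whose `statistics.NormalDist` class provides an inverse cumulative distribution function. The sketch below is an illustration only: the helper name `z_critical_values` and the tail labels `"right"`, `"left"`, and `"two"` are assumptions made for this example, not notation from the workbook.

```python
from statistics import NormalDist

def z_critical_values(alpha, tail):
    """Critical value(s) on the standard normal (Z) distribution.

    tail is "right" (Ha: >), "left" (Ha: <), or "two" (Ha: not equal).
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "right":
        return (std_normal.inv_cdf(1 - alpha),)
    if tail == "left":
        return (std_normal.inv_cdf(alpha),)
    if tail == "two":
        upper = std_normal.inv_cdf(1 - alpha / 2)  # alpha is split between the tails
        return (-upper, upper)
    raise ValueError("tail must be 'right', 'left', or 'two'")

for alpha in (0.01, 0.05, 0.10):
    for tail in ("right", "left", "two"):
        values = ", ".join(f"{v:+.3f}" for v in z_critical_values(alpha, tail))
        print(alpha, tail, values)
```

The two-tailed branch splits alpha in half before looking up the cutoff, which is why its critical values sit farther from 0 than the one-tailed values at the same alpha level.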
After you set up the critical value(s), if the test statistic falls beyond the critical value(s), your conclusion is “reject Ho at level $\alpha$.” This means that the test statistic falls into the rejection region. If the test statistic doesn’t go beyond the critical value(s), you conclude “fail to reject Ho at level $\alpha$.” This means that the test statistic falls into the nonrejection region.
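The decision rule can be written as a small function. This is a hedged sketch rather than anything the workbook prescribes: the name `decide`, the convention of passing the cutoff as a positive number, and the tail labels are all assumptions made for the example. The first and last calls use test statistics and cutoffs that appear in this chapter's answers; the middle call uses made-up numbers.

```python
def decide(test_statistic, cutoff, tail):
    """Apply the critical value rule.

    cutoff is the positive critical value (for example, 1.96);
    tail is "right", "left", or "two".
    """
    if tail == "right":
        reject = test_statistic > cutoff
    elif tail == "left":
        reject = test_statistic < -cutoff
    elif tail == "two":
        reject = abs(test_statistic) > cutoff
    else:
        raise ValueError("tail must be 'right', 'left', or 'two'")
    return "reject Ho" if reject else "fail to reject Ho"

print(decide(1.48, 1.96, "right"))    # football problem in this chapter
print(decide(-2.5, 1.645, "left"))    # made-up left-tailed example
print(decide(0.53, 1.833, "right"))   # t-test answer in this chapter (9 degrees of freedom)
```

"Beyond" is treated as a strict inequality here, so a test statistic that lands exactly on the cutoff is not rejected; that boundary choice is an assumption of this sketch.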
See the following for an example of setting up and conducting a hypothesis test for a population mean.
Q. Suppose that you test Ho: versus Ha: , and your sample mean is 6.5 with a sample of 10, and the population standard deviation is 0.5. Assume that your data come from a normal distribution.
A. You use this test when you’re trying to estimate a population mean.
1 Explain what’s wrong with the following hypotheses: Ho: versus Ha: .
2 Suppose that a pizza place claims its average pizza delivery time is 30 minutes, but you believe it takes longer than that. Your sample of 10 pizzas has an average delivery time of 40 minutes. Assume that the population standard deviation is 15 minutes and the times have a normal distribution. Use .
3 Suppose that a sports reporter claims the average football game lasts 3 hours, and you believe it’s more than that. Your random sample of 35 games has an average time of 3.25 hours. Assume that the population standard deviation is 1 hour. Use $\alpha = 0.025$. What do you conclude?
4 Show how you get critical values of 1.65, $-1.65$, and $\pm 1.96$ for a right-tailed, left-tailed, and two-tailed hypothesis test, respectively (use $\alpha = 0.05$ and assume a large sample size).
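It is not always the case that $\sigma$ is known, that the population has a normal distribution, or that n is large enough to use the central limit theorem (CLT). The formula for the test statistic for one population mean in those cases is $t = \dfrac{\bar{x} - \mu_0}{s / \sqrt{n}}$. Here, s is the sample standard deviation. To calculate the test statistic, you subtract the claimed value, $\mu_0$, from the sample mean, $\bar{x}$, and divide the result by the standard error, $s / \sqrt{n}$.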
Compare your test statistic to the critical value(s) from the t-distribution with $n - 1$ degrees of freedom, as described in Chapter 8. Use the t-table (see the Appendix). If $\sigma$ is unknown, or the sample size is small, use the sample standard deviation, s, instead of $\sigma$ and use the t-distribution with $n - 1$ degrees of freedom. If your test statistic is beyond the critical value(s), reject Ho; otherwise, fail to reject Ho.
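Here is a minimal Python sketch of the t-test statistic computed from raw data. The function name `t_statistic` and the ten data values are made up for illustration. Python's standard library has no t-distribution quantile function, so the critical value is still looked up in the t-table; 1.833 is the value this chapter's answers give for a right-tailed test at the .05 level with 9 degrees of freedom.

```python
import math
import statistics

def t_statistic(data, claimed_mean):
    """Return (t, degrees of freedom) for a one-sample t-test."""
    n = len(data)
    x_bar = statistics.mean(data)
    s = statistics.stdev(data)          # sample standard deviation (divides by n - 1)
    standard_error = s / math.sqrt(n)
    return (x_bar - claimed_mean) / standard_error, n - 1

# Hypothetical sample of 10 values, testing Ho: mu = 100 versus Ha: mu > 100.
sample = [102, 98, 105, 110, 95, 101, 99, 104, 97, 103]
t, df = t_statistic(sample, claimed_mean=100)

critical_value = 1.833                   # t-table: 9 degrees of freedom, column .05
print(round(t, 2), df, "reject Ho" if t > critical_value else "fail to reject Ho")
```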
See the following for an example of setting up a t-test for the mean.
Q. Suppose that you hear a claim that the average score on a national exam is 78. You think the average is higher than that, and your sample of 100 students produces an average of 81.
A. Setting up the hypotheses correctly is critical to your success with hypothesis tests.
5 Conduct the hypothesis test Ho: $\mu = 7$ versus Ha: $\mu > 7$, where , , and . Use $\alpha = 0.01$.
6 Conduct the hypothesis test Ho: $\mu = 75$ versus Ha: $\mu \neq 75$, where , , and . Use .
7 Conduct the hypothesis test Ho: $\mu = 100$ versus Ha: $\mu > 100$, where , , and . Use $\alpha = 0.05$.
8 Suppose that your critical value for a left-tailed hypothesis test is . For what values of the test statistic would you reject Ho?
You use a population proportion hypothesis test when the variable is categorical (such as gender, political party, support/oppose, and so on) and you want to study only one population or group (such as all U.S. citizens or all registered voters). The test looks at the percentage (p) of individuals in the population that have a certain characteristic; for example, the percentage of households that have cellphones. The null hypothesis is Ho: $p = p_0$, where $p_0$ is a certain value. For example, if the claim is that 20 percent of homes have cellphones, $p_0$ is 0.20. The alternative hypothesis is one of the following: $p \neq p_0$, $p > p_0$, or $p < p_0$.
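The formula for the test statistic for a single proportion is $z = \dfrac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}}$. To find it, follow these steps:
1. Calculate the sample proportion, $\hat{p}$, by dividing the number of individuals in the sample who have the characteristic of interest by the sample size, n.
2. Find $\hat{p} - p_0$.
3. Calculate the standard error, $\sqrt{p_0(1 - p_0)/n}$.
4. Divide the result from Step 2 by the standard error from Step 3.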
Compare your test statistic to the critical value that you previously set. If the test statistic is beyond the critical value(s), reject Ho.
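As a rough illustration of the test statistic for one proportion, the sketch below uses the claimed proportion in the standard error, as the null hypothesis dictates. The function name `one_proportion_z` and the sample figures (60 of 100 with the characteristic, claimed proportion 0.50) are invented for the example.

```python
import math

def one_proportion_z(successes, n, claimed_p):
    """Test statistic for one proportion: (p-hat - p0) / sqrt(p0 * (1 - p0) / n)."""
    p_hat = successes / n                                   # sample proportion, as a decimal
    standard_error = math.sqrt(claimed_p * (1 - claimed_p) / n)
    return (p_hat - claimed_p) / standard_error

# Hypothetical data: 60 of 100 people have the characteristic; Ho: p = 0.50.
z = one_proportion_z(successes=60, n=100, claimed_p=0.50)
print(round(z, 2))
```

Note that the inputs are counts and decimals, not percents; passing 50 instead of 0.50 for the claimed proportion would make the square root undefined.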
See the following for an example of setting up and conducting a hypothesis test for a proportion.
Q. Suppose that a political candidate claims the percentage of uninsured drivers is 30 percent, but you believe the percentage is more than 30.
A. First, make sure you can identify that this is a hypothesis test about a proportion. It has a claim that’s being challenged or tested, and the claim is about a percentage (or proportion). That’s how you know.
9 Carry out the hypothesis test of Ho: $p = 0.50$ versus Ha: $p > 0.50$, where and . Use .
10 Carry out the hypothesis test of Ho: $p = 0.5$ versus Ha: $p < 0.5$, where and . Use .
11 Carry out the hypothesis test of Ho: $p = 0.50$ versus Ha: $p \neq 0.50$, with and , where x is the number of people in the sample that have the characteristic of interest. Use .
12 Suppose that you want to test the fairness of a single die, so you concentrate on the proportion of 1s that come up. Write down the null and alternative hypotheses for this test.
This test is used when the variable is numerical (such as income, cholesterol level, or miles per gallon) and two populations or groups are being compared (such as men versus women, athletes versus nonathletes, or cars versus SUVs). Two separate random samples need to be selected, one from each population, to collect the data needed for this test. The null hypothesis is that the two population means are the same; in other words, their difference is equal to 0. The notation for the null hypothesis is Ho: $\mu_1 - \mu_2 = 0$, where $\mu_1$ represents the mean of the first population, and $\mu_2$ represents the mean of the second population.
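The formula for the test statistic comparing two means when both populations are normal (or approximately normal) and both population standard deviations are known is $z = \dfrac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$. To find it, follow these steps:
1. Calculate the sample means, $\bar{x}_1$ and $\bar{x}_2$.
2. Find the difference between the two sample means, $\bar{x}_1 - \bar{x}_2$.
3. Calculate the standard error, $\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$.
4. Divide the result from Step 2 by the standard error from Step 3.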
Compare your test statistic to the critical value from the Z-distribution (Table 13-1).
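A short Python sketch of the two-means test statistic follows. It assumes, as this section does, that both population standard deviations are known. The function name `two_mean_z` and all of the numbers are hypothetical, chosen so the arithmetic is easy to follow by hand.

```python
import math

def two_mean_z(mean1, mean2, sigma1, sigma2, n1, n2):
    """Test statistic for Ho: mu1 - mu2 = 0 with known population standard deviations."""
    standard_error = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    return (mean1 - mean2) / standard_error

# Hypothetical summary data: the standard error is sqrt(9/25 + 16/25) = 1.
z = two_mean_z(mean1=10, mean2=8, sigma1=3, sigma2=4, n1=25, n2=25)
print(z)
```

The sign of the result follows the order of subtraction: swapping the two groups flips the sign of the test statistic without changing its size.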
See the following for an example of conducting a hypothesis test for two means.
Q. A teacher instructs two statistics classes with two different teaching methods (Group 1: computer versus Group 2: pencil/paper). She wants to see whether the computer method works better by comparing average final exam scores for the two groups. She selects volunteers to be in each group.
A. The teacher uses a test for two population means, because she compares the average exam scores, and exam scores are a quantitative variable.
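13 Conduct the hypothesis test Ho: $\mu_1 - \mu_2 = 0$ versus Ha: $\mu_1 - \mu_2 < 0$, where , , , , , and . Use .
14 Conduct the hypothesis test Ho: $\mu_1 - \mu_2 = 0$ versus Ha: $\mu_1 - \mu_2 > 0$, where , , , , , . Use .
15 Conduct the hypothesis test Ho: $\mu_1 = \mu_2$ versus Ha: $\mu_1 \neq \mu_2$, where , , , , , . Use .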
16 Suppose that you conducted a hypothesis test for two means (group one mean minus group two mean) and you reject Ho: $\mu_1 - \mu_2 = 0$ in favor of Ha: $\mu_1 - \mu_2 \neq 0$. You conclude that the two population means aren’t equal. Can you say a little more? Explain how you can tell from the sign on the test statistic which group probably has the higher mean.
You use this test when the variable is numerical (such as income, cholesterol level, or miles per gallon) and when you pair up the individuals in the sample in some way (identical twins are often used) or use the same people twice (with a pre- and post-test, for example). Researchers typically use paired tests for medical studies when they test to see whether a certain treatment works, without having to worry about other factors associated with the subjects that may influence the results. For example, you want to compare a new blood pressure drug to an existing one to see whether it does a better job. To make it a fair test, you pair up the people in the study according to their weight, age, fitness level, and severity of blood pressure problems. With these pairing parameters, you can attribute any difference in blood pressure to the drug. (See Chapter 16 for more information on designed experiments like this one.)
You collect the data in pairs, and for each pair, you find the difference between the values. For example, suppose that you have a pair of subjects in the blood pressure test. Suppose that the first person in the pair is in the “current drug” group, and her blood pressure is 190 after the experiment. Suppose that the second person is in the “new drug” group, and her blood pressure is 180 after the experiment. The difference in blood pressures for this pair is $190 - 180$, or 10. (That means the new drug drops the blood pressure 10 points more than the current drug for this pair.) You make the same calculations for each pair of subjects in the study.
The set of all the paired differences becomes your new (single) data set. This test is now the same as a test for a single population mean, and the null hypothesis is that the mean is equal to 0. (The average of all the differences should be 0 if the null hypothesis is true.) The notation for the null hypothesis is Ho: $\mu_d = 0$, where $\mu_d$ is the mean of the paired differences.
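The formula for the test statistic for paired differences is $t = \dfrac{\bar{d} - \mu_d}{s_d / \sqrt{n}}$, where $\bar{d}$ is the mean of the sample differences, $s_d$ is their standard deviation, and n is the number of pairs. To calculate it, run through the following steps:
1. For each pair of data, take the first value in the pair minus the second value in the pair to find the paired difference.
2. Calculate the mean, $\bar{d}$, and the standard deviation, $s_d$, of all the differences.
3. Find the standard error, $s_d / \sqrt{n}$.
4. Find $\bar{d}$ divided by the standard error from Step 3.
Remember, $\mu_d = 0$ under the assumption that Ho is true.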
If the number of pairs (n) is 30 or more, compare your test statistic to the critical value(s) from the standard normal distribution (see the Z-table in the Appendix or Table 13-1 earlier in this chapter). If the number of pairs (n) is less than 30, compare your test statistic to the critical value(s) from the t-distribution with $n - 1$ degrees of freedom (see the t-table in the Appendix). If your test statistic is beyond the critical value(s), reject Ho. If not, fail to reject Ho.
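The paired calculation reduces to a one-sample test on the differences, which the following Python sketch tries to make concrete. The function name `paired_t` and the four before/after weights are hypothetical; with real data you would still take the critical value from the Z-table or t-table as described above.

```python
import math
import statistics

def paired_t(first, second):
    """Return (test statistic, degrees of freedom) for paired differences first - second."""
    if len(first) != len(second):
        raise ValueError("each value in first needs a partner in second")
    differences = [a - b for a, b in zip(first, second)]
    n = len(differences)
    d_bar = statistics.mean(differences)
    s_d = statistics.stdev(differences)       # standard deviation of the differences
    standard_error = s_d / math.sqrt(n)
    return d_bar / standard_error, n - 1      # mu_d is 0 under Ho

# Hypothetical weights before and after a program; the differences are 5, 2, 10, and 1.
before = [200, 180, 220, 190]
after = [195, 178, 210, 189]
t, df = paired_t(before, after)
print(round(t, 2), df)
```

Taking before minus after means a positive test statistic points toward weight loss; reversing the order would reverse the sign, and the alternative hypothesis would have to be reversed with it.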
See the following for an example involving a matched-pairs test.
Q. Suppose that you use a paired t-test using matched pairs to find out whether a certain weight-loss method works. You measure the participants’ weights before and after the study, and you take weight before minus weight after as your pairs of data.
A. The signs on the differences are important when making comparisons. If you subtract two numbers and get a positive result, that means the first number is larger than the second. If the result is negative, the second is larger than the first.
Note: If you switch the data around and take weight after minus weight before, you have to switch the sign in the alternative hypothesis to be $<$ (less than). Most people like to use positive values, and greater-than signs ($>$) produce them. If you have a choice, always order the groups so the one that may have the higher average is Group 1.
17 Conduct the hypothesis test Ho: $\mu_d = 0$ versus Ha: $\mu_d > 0$, where , , and . Use $\alpha = 0.05$.
18 Conduct the hypothesis test Ho: $\mu_d = 0$ versus Ha: $\mu_d > 0$, where , , and . Use $\alpha = 0.05$.
You use this test when the variable is categorical (such as smoker/nonsmoker, political party, support/oppose an opinion, and so on) and when you want to know the percentage of individuals with a certain characteristic (such as the percentage of smokers). In this case, you compare two populations or groups (such as men versus women or Democrats versus Republicans). To conduct this test, you need to select two separate random samples, one from each population. The null hypothesis is that the two population proportions are the same; in other words, their difference is equal to 0. The notation for the null hypothesis is Ho: $p_1 - p_2 = 0$, where $p_1$ is the percentage from the first population and $p_2$ is the percentage from the second population.
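The formula for the test statistic comparing two proportions is $z = \dfrac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$, where $\hat{p}$ is the overall sample proportion from the two samples combined. To find it, follow these steps:
1. Calculate the sample proportions, $\hat{p}_1$ and $\hat{p}_2$, one for each sample.
2. Find the difference between the two sample proportions, $\hat{p}_1 - \hat{p}_2$.
3. Calculate the overall sample proportion, $\hat{p}$: the total number of individuals in both samples who have the characteristic of interest divided by the total number of individuals in both samples, $n_1 + n_2$.
4. Calculate the standard error, $\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$.
5. Divide the result from Step 2 by the standard error from Step 4.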
Compare your test statistic to the critical value that you previously set. If the test statistic is beyond the critical value(s), reject Ho. If not, fail to reject Ho. In most cases, the critical value will be on the Z-distribution because the sample sizes will be large in these situations. To be sure, check that $n\hat{p}$ and $n(1 - \hat{p})$ are both at least 10 for each sample.
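The two-proportion test statistic can be sketched in Python as follows. This example assumes the usual pooled version of the standard error, in which the two samples are combined into one overall sample proportion because Ho says the population proportions are equal. The function name `two_proportion_z` and the counts are made up for illustration.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Test statistic for Ho: p1 - p2 = 0, using the pooled sample proportion."""
    p_hat1 = x1 / n1
    p_hat2 = x2 / n2
    pooled = (x1 + x2) / (n1 + n2)            # overall proportion, both samples combined
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p_hat1 - p_hat2) / standard_error

# Hypothetical counts: 60 of 100 in group 1 and 50 of 100 in group 2.
z = two_proportion_z(x1=60, n1=100, x2=50, n2=100)
print(round(z, 2))
```

As with the two-means test, a negative result simply means the first sample proportion is the smaller one.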
See the following for an example of setting up a hypothesis test for two proportions.
Q. Suppose that you want to test whether there’s a higher percentage of males who are Democrat than females who are Democrat.
A. Your two populations are males and females, and you compare the percentage of Democrats in each group. This means $p_1$ equals the percentage of all males who are Democrat, and $p_2$ equals the percentage of all females who are Democrat.
Note: Saying Ho: $p_1 = p_2$ is the same as saying Ho: $p_1 - p_2 = 0$. Take the first equation and subtract $p_2$ from each side. The second version gives you a number to put in the null hypothesis (0), which is nice. That’s because if the proportions are equal, their difference has to be 0.
19 Conduct the hypothesis test Ho: $p_1 - p_2 = 0$ versus Ha: $p_1 - p_2 > 0$, where , , , , . Use .
20 Conduct the hypothesis test Ho: $p_1 - p_2 = 0$ versus Ha: $p_1 - p_2 \neq 0$, where , , , . Use .
1 You see $\bar{x}$ in both hypotheses, which is wrong. The null and alternative hypotheses make statements about the population parameter, not about the sample statistic.
Instructors get really bent out of shape when they see this kind of mistake, so avoid it at all costs. Always keep sample statistics like $\bar{x}$ and $\hat{p}$ out of Ho and Ha. Use population parameters like $\mu$ and p instead.
2 The process of doing a hypothesis test in full requires several steps: setting up the hypotheses, finding the critical value, calculating the test statistic, and making the decision.
Make sure you can identify these problems out of context — in other words, when the problems are all mixed up on an exam. You may want to copy some problems, mark where they come from, mix them up, and try to solve them. Also, as you go through the problems in this workbook, always think about how to recognize the types of problems in a test situation and how to approach them. I give you clues to look for; write them down in a quick outline to help you study.
3 Because the claim is that the average football game lasts 3 hours, and you believe it’s more than that, the hypotheses are Ho: $\mu = 3$ and Ha: $\mu > 3$. Your random sample of 35 games has an average time of 3.25 hours, and the population standard deviation is 1 hour, so the test statistic is $z = \dfrac{3.25 - 3}{1/\sqrt{35}} \approx 1.48$. Using $\alpha = 0.025$, the critical value is 1.96 from the Z-distribution, so the conclusion is fail to reject Ho because the test statistic is less than the critical value. You can’t say with these data that the average football game lasts more than 3 hours.
4 In a right-tailed test, Ha has a greater-than sign ($>$) in it. Your critical value on the Z-distribution is 1.64 (1.645 to three decimal places, which some tables round to 1.65), because the probability of being below 1.64 is about 0.95 (see Table 13-1 earlier in this chapter), and the probability of being beyond (in this case, “above”) 1.64 is $1 - 0.95 = 0.05$. If you run a left-tailed test (Ha has a less-than [$<$] sign in it), your critical value is $-1.64$, because the area beyond (in this case, “below”) $-1.64$ is 0.05. If you run a two-tailed test (Ha has a not-equal-to sign [$\neq$] in it), you have two critical values, one positive and one negative, and the total area beyond them (in both directions) is 0.05. That means the area above the positive one is $0.05 \div 2 = 0.025$, and the area below the negative one is 0.025. The critical values in that case are $\pm 1.96$, as seen in Table 13-1.
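You can confirm these tail areas numerically. The sketch below is an optional check, not part of the workbook's solution; it uses the `cdf` method of `statistics.NormalDist` from Python's standard library, and the variable names are assumptions made for this example. Each of the three areas should come out close to 0.05.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

area_above_right = 1 - std_normal.cdf(1.64)        # right-tailed cutoff
area_below_left = std_normal.cdf(-1.64)            # left-tailed cutoff
area_two_tails = 2 * (1 - std_normal.cdf(1.96))    # both tails of the two-tailed test

print(round(area_above_right, 2), round(area_below_left, 2), round(area_two_tails, 2))
```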
5 You have a right-tailed test for one population mean with , , , and . The test statistic is . The critical value for this right-tailed test with is (see t-table in the appendix with 29 degrees of freedom, column for .01.). Conclusion: You shouldn’t reject Ho, because the test statistic is less than the critical value. Interpretation: According to your data, you don’t have enough evidence to say the mean is more than 7.
Don’t stop with the statistically correct conclusion: reject Ho or fail to reject Ho. Always go back to the question and try to answer it in the context of the problem as best as you can. Your instructor will love you for it.
6 You have a two-tailed test for one population mean with , , , and . The test statistic is . The critical values for this two-tailed test with are and (see Table 13-1 since the t-distribution is about the same as the Z-distribution when n is 100.) Conclusion: You shouldn’t reject Ho, because the test statistic is between rather than beyond the critical values. Interpretation: You don’t have enough evidence to say that the mean for this population is anything but 75.
When doing problems involving hypothesis tests, I recommend you immediately write down what type of test you have and what the given information is, as I do in these solutions. It helps your instructor see where you’re going, and it helps you keep it all straight.
7 You have a right-tailed test for one population mean with , , , and . Notice the sample size is too small to use a Z-distribution, so you have to use the t-distribution. Instructors typically have you use the p-value approach to solving problems involving the t-distribution, but in this case, I want to walk you through one example where you use the critical method (so bear with me!).
The degrees of freedom for your t-distribution are $n - 1 = 10 - 1 = 9$, so you can denote this as $t_9$. The test statistic is $t = \dfrac{\bar{x} - \mu_0}{s / \sqrt{n}} = 0.53$. The critical value for this right-tailed test with $\alpha = 0.05$ is 1.833 on the $t_9$ distribution. Here’s why: Because this is a right-tailed test, the area beyond (in this case, “above”) the critical value must be $\alpha$, which means the area above it must be .05. Find column .05 and the row for 9 degrees of freedom. You find 1.833, so you have your critical value (whew!). Conclusion: You shouldn’t reject Ho, because the test statistic (0.53) isn’t beyond the critical value. Interpretation: You don’t have enough evidence to say that the mean is more than 100.
8 Anything beyond the critical value leads to a rejection of Ho. In this case, the critical value is , so beyond means “less than”; therefore, any test statistic that comes in less than leads you to reject Ho.
9 You have a right-tailed test for one population proportion, with , , , and .
The test statistic is . The critical value for this right-tailed test with is (see Table 13-1). Conclusion: You should reject Ho, because the test statistic is beyond the critical value. Interpretation: According to your data, the proportion of people in the population who have the characteristic of interest is more than 0.50.
Always use the decimal version of p when you work with hypothesis tests for one or two proportions. The formulas don’t work if you use percents.
10 You have a left-tailed test for one population proportion, with , , , and .
The test statistic is . The critical value for this left-tailed test with is (see Table 13-1). Conclusion: You should reject Ho, because the test statistic is beyond the critical value. Interpretation: According to your data, the proportion of people in the population who have the characteristic of interest is less than 0.5.
11 You have a two-tailed test for one population proportion, with , , , and . The sample proportion, , is the number of people in the sample who have the characteristic of interest divided by n, so in this case, . The test statistic is . The critical values for this two-tailed test with are and (see Table 13-1). Conclusion: You shouldn’t reject Ho, because the test statistic is between rather than beyond the critical values. Interpretation: You don’t have enough evidence to say that the proportion of this population who fall in the group of interest is anything but 0.50.
12 If the die is fair, each face should show up one-sixth of the time. If you let p equal the proportion of times this die will show a 1, you have Ho: $p = 1/6$. The alternative is that the die isn’t fair, so either $p < 1/6$ or $p > 1/6$ fits that case, which means you have Ha: $p \neq 1/6$.
13 You have a left-tailed test for two population means with , , , , , , and . The test statistic is . The critical value for this left-tailed test with is (see Table 13-1). Conclusion: You shouldn’t reject Ho, because the test statistic isn’t beyond the critical value. Interpretation: According to your data, you can’t say the difference in the means of these two populations is less than 0 (indicating no statistically significant difference between the means of these two populations).
If you use in Question 13, the critical value is , and the test statistic of is very close to this. You still can’t reject Ho, but you can argue that the close number is a marginal result. The p-value approach (Chapter 14) helps solve the problem of getting different conclusions when you use different levels. It allows you to report the strength of your test statistic and to let other people decide for themselves whether the info is enough to reject Ho.
14 You have a right-tailed test for two population means, with , , , , , and .
The test statistic is . The critical value for this right-tailed test with is (see Table 13-1). Conclusion: You should reject Ho, because the test statistic is beyond the critical value. Interpretation: According to your data, the difference in the means of these two populations is greater than 0 (indicating a statistically significant difference between the means of these two populations).
15 This is the same hypothesis test as Question 14, except you run a two-tailed test rather than a right-tailed test. Ho and Ha appear different from usual in this problem, but Ho: $\mu_1 = \mu_2$ is the same as Ho: $\mu_1 - \mu_2 = 0$ (just subtract $\mu_2$ from each side of the original equation). This is another way your professor may write these hypotheses, so be ready for it.
To work the problem, note that, again, you have a two-tailed test for two population means, with , , , , , , and . Again, the test statistic is 2.02. However, the critical values for this (now) two-tailed test with are and (see Table 13-1). Conclusion: You should still reject Ho, because the test statistic is beyond the critical value, but it isn’t as far beyond the critical value as it is in Question 14. Interpretation: According to your data, the difference in the means of these two populations is greater than 0, indicating a statistically significant difference between the means of these two populations. (And because the difference is positive, you can say that the first population has a larger mean than the second population.)
Two-tailed tests require you to split the $\alpha$ level in half. The half level means the probability of being outside the critical values is cut in half for each side, making the critical values push out farther on each edge. In other words, if you can make a goal by going to either end of the field, you have to push both goal posts out farther to make up for it. So the result you have with a one-tailed test isn’t as strong when you go to the two-tailed test. A one-tailed test puts all your eggs in one basket. A two-tailed test makes you divide your eggs into two baskets, so to speak.
16 After Ho has been rejected and you conclude that the groups don’t have the same mean, the numerator of your test statistic tells you which group had the larger mean. If the numerator of your test statistic is positive, $\bar{x}_1 - \bar{x}_2 > 0$, which means $\bar{x}_1 > \bar{x}_2$. That means in terms of the populations, the mean of group one (which is $\mu_1$) is likely to be larger than the mean of group two (which is $\mu_2$). If the numerator of your test statistic is negative, $\bar{x}_1 - \bar{x}_2 < 0$, which means $\bar{x}_1 < \bar{x}_2$. That means in terms of the populations, the mean of group one (which is $\mu_1$) is likely to be smaller than the mean of group two (which is $\mu_2$). Of course, these results all depend on the samples being representative of their populations.
17 You have a right-tailed test for the average difference with , , , and . You do this test the same way you conduct a test for one population mean. The test statistic is . Because the sample size is under 30, you compare your test statistic to the t-distribution with degrees of freedom, denoted . On that distribution, 1.27 falls between column .25 and .10. The p-value falls between these numbers. Conclusion: You shouldn’t reject Ho, because the p-value isn’t smaller than .05. Interpretation: According to your data, you can’t say the mean difference for this population is greater than 0 (indicating no statistically significant mean difference for this population).
18 You have the same hypothesis test as Question 17 here, except the sample size is larger for this test. You have a right-tailed test for the average difference with , , , and .
The test statistic is , which is larger than the test statistic in Question 17. Because the sample size is 30 or more, you compare your test statistic to the Z-distribution. The critical value for this right-tailed test with is 1.64. Conclusion: You should reject Ho, because the test statistic is beyond the critical value. Interpretation: According to your data, the mean difference for this population is greater than 0 (indicating a statistically significant and positive mean difference for this population).
A larger sample size makes it easier to reject Ho. It makes your test statistic more extreme, which increases its chances of crossing over into the rejection region. And in the case of Question 18, where the sample size goes from 10 to 30, the critical value decreases, because you don’t have to use the t-distribution anymore. Remember, the t-distribution makes you pay a penalty for having less data, and that penalty involves pushing out the tails of the t-distribution so you have to get farther out there to reject Ho. A wider boundary makes it harder to reject Ho. (See Chapter 8 for more on the t-distribution.)
19 You have a right-tailed test for two population proportions with , , , , , .
The test statistic is .
The critical value for this right-tailed test with is (see Table 13-1). Conclusion: You shouldn’t reject Ho, because the test statistic isn’t beyond the critical value. Interpretation: According to your data, we can’t say the difference in the proportions for these two populations is greater than 0 (indicating no statistically significant difference between the proportions for these two populations).
20 You have a two-tailed test for two population proportions with , , . The test statistic is .
The critical values for this two-tailed test with are (see Table 13-1). Conclusion: You should reject Ho, because the test statistic is beyond the critical values. Interpretation: According to your data, the difference in the proportions for these two populations isn’t equal to 0, indicating a statistically significant difference between the proportions for these two populations. (And because the test statistic is negative [taking Group 1 – Group 2], the proportion in the first population who have that characteristic of interest is likely to be lower than the proportion in the second population who have the characteristic of interest.)