8.3 Testing Category Probabilities: Multinomial Experiment

Recall from Section 1.4 (p. 10) that observations on a qualitative variable can only be categorized. For example, consider the highest level of education attained by a professional hockey player. Level of education is a qualitative variable with several categories, including some high school, high school diploma, some college, college undergraduate degree, and graduate degree. If we were to record education level for all professional hockey players, the result of the categorization would be a count of the numbers of players falling into the respective categories.

When the qualitative variable of interest results in one of two responses (e.g., yes or no, success or failure, favor or do not favor), the data—called counts—can be analyzed with the binomial probability distribution discussed in Section 4.3. However, qualitative variables, such as level of education, that allow for more than two categories for a response are much more common, and these must be analyzed by a different method.

Qualitative data with more than two levels often result from a multinomial experiment. The characteristics of a multinomial experiment with k outcomes are described in the next box. You can see that the binomial experiment of Chapter 4 is a multinomial experiment with $k = 2$.

Teaching Tip

The multinomial experiment should be looked at as an extension of the binomial experiment studied earlier in the text. The number of outcomes has been expanded to k (instead of just 2).

Properties of the Multinomial Experiment

1. The experiment consists of n identical trials.
2. There are k possible outcomes to each trial. These outcomes are sometimes called classes, categories, or cells.
3. The probabilities of the k outcomes, denoted by $p_{1}, p_{2}, \dots, p_{k}$, where $p_{1} + p_{2} + \dots + p_{k} = 1$, remain the same from trial to trial.
4. The trials are independent.
5. The random variables of interest are the cell counts $n_{1}, n_{2}, \dots, n_{k}$, the numbers of observations that fall into each of the k categories.
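
The five properties can be made concrete with a small simulation. The following is a minimal sketch, not part of the original text: the function name `simulate_multinomial` and the seed value are illustrative assumptions, and the probabilities are simply chosen to be equal. Each trial is drawn independently with the same outcome probabilities, and the result is the list of k cell counts.

```python
import random
from collections import Counter

def simulate_multinomial(n, probs, seed=None):
    """Run n independent, identical trials and return the k cell counts."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("outcome probabilities must sum to 1")
    rng = random.Random(seed)
    categories = range(len(probs))
    # Each trial lands in category i with probability probs[i];
    # the probabilities do not change from trial to trial.
    outcomes = rng.choices(categories, weights=probs, k=n)
    tally = Counter(outcomes)
    # Cell counts n_1, ..., n_k (a category that never occurs gets 0).
    return [tally[i] for i in categories]

# Illustrative run: n = 150 trials, k = 3 equally likely outcomes.
counts = simulate_multinomial(n=150, probs=[1/3, 1/3, 1/3], seed=1)
print(counts, sum(counts))
```

Whatever the individual counts turn out to be, they always add up to n, which is why the cell counts are dependent random variables.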

Example 8.4 Identifying a Multinomial Experiment

Problem

Consider the problem of determining the highest level of education attained by each of a sample of $n = 40$ National Hockey League (NHL) players. Suppose we categorize level of education into one of five categories (some high school, high school diploma, some college, college undergraduate degree, and graduate degree) and count the number of the 40 players that fall into each category. Is this a multinomial experiment, to a reasonable degree of approximation?

Solution

Checking the five properties of a multinomial experiment shown in the box, we have the following:
1. The experiment consists of $n = 40$ identical trials, each of which is undertaken to determine the education level of an NHL player.
2. There are $k = 5$ possible outcomes to each trial, corresponding to the five education-level responses.
3. The probabilities of the $k = 5$ outcomes $p_{1}, p_{2}, p_{3}, p_{4},$ and $p_{5}$, where $p_{i}$ represents the true probability that an NHL player attains level-of-education category i, remain the same from trial to trial (to a reasonable degree of approximation).
4. The trials are independent; that is, the education level attained by one NHL player does not affect the level attained by any other player.
5. We are interested in the count of the number of hockey players who fall into each of the five education-level categories. These five cell counts are denoted $n_{1}, n_{2}, n_{3}, n_{4},$ and $n_{5}$.
Thus, the properties of a multinomial experiment are satisfied.

In this section, we consider a multinomial experiment with k outcomes that correspond to the categories of a single qualitative variable. The results of such an experiment are summarized in a one-way table. The term one-way is used because only one variable is classified. Typically, we want to make inferences about the true percentages that occur in the k categories on the basis of the sample information in the one-way table.

To illustrate, suppose three political candidates are running for the same elective position. Prior to the election, we conduct a survey to determine the voting preferences of a random sample of 150 eligible voters. The qualitative variable of interest is preferred candidate, which has three possible outcomes: candidate 1, candidate 2, and candidate 3. Suppose the number of voters preferring each candidate is tabulated and the resulting count data appear as in Table 8.3.

Table 8.3 Results of Voter Preference Survey

Candidate
1	2	3
61 votes	53 votes	36 votes

Note that our voter preference survey satisfies the properties of a multinomial experiment for the qualitative variable, preferred candidate. The experiment consists of randomly sampling $n = 150$ voters from a large population of voters containing an unknown proportion $p_{1}$ that favors candidate 1, a proportion $p_{2}$ that favors candidate 2, and a proportion $p_{3}$ that favors candidate 3. Each voter sampled represents a single trial that can result in one of three outcomes: The voter will favor candidate 1, 2, or 3 with probabilities $p_{1}$, $p_{2}$, and $p_{3}$, respectively. (Assume that all voters will have a preference.) The voting preference of any single voter in the sample does not affect the preference of any other; consequently, the trials are independent. Finally, you can see that the recorded data are the numbers of voters in each of the three preference categories. Thus, the voter preference survey satisfies the five properties of a multinomial experiment.

In this survey, and in most practical applications of the multinomial experiment, the k outcome probabilities $p_{1}, p_{2}, \dots, p_{k}$ are unknown and we want to use the survey data to make inferences about their values. The unknown probabilities in the voter preference survey are

Teaching Tip

The null hypothesis to be tested allows for the unique specification of the population proportions (e.g., $H_{0} : p_{1} = .2, p_{2} = .3, p_{3} = .5$). In many cases, however, the appropriate test specifies that all of the proportions are equal.

$\begin{array}{lcl} p_{1} & = & \text{Proportion of all voters who favor candidate 1} \\ p_{2} & = & \text{Proportion of all voters who favor candidate 2} \\ p_{3} & = & \text{Proportion of all voters who favor candidate 3} \end{array}$

To decide whether the voters, in total, have a preference for any one of the candidates, we will test the null hypothesis that the candidates are equally preferred (i.e., $p_{1} = p_{2} = p_{3} = \frac{1}{3}$) against the alternative hypothesis that one candidate is preferred (i.e., at least one of the probabilities $p_{1}$, $p_{2}$, and $p_{3}$ exceeds $\frac{1}{3}$). Thus, we want to test

$\begin{array}{l} H_{0} : p_{1} = p_{2} = p_{3} = \frac{1}{3} \text{ (no preference)} \\ H_{a} : \text{At least one of the proportions exceeds } \frac{1}{3} \text{ (a preference exists)} \end{array}$

If the null hypothesis is true and $p_{1} = p_{2} = p_{3} = \frac{1}{3}$, then the expected value (mean value) of the number of voters who prefer candidate 1 is given by

$E_{1} = n p_{1} = (n) \frac{1}{3} = (150) \frac{1}{3} = 50$

Similarly, $E_{2} = E_{3} = 50$ if the null hypothesis is true and no preference exists.

Biography Karl Pearson (1857–1936)

The Father of Statistics

While attending college, London-born Karl Pearson exhibited a wide range of interests, including mathematics, physics, religion, history, socialism, and Darwinism. After earning a law degree at Cambridge University and a Ph.D. in political science at the University of Heidelberg (Germany), Pearson became a professor of applied mathematics at University College in London. His 1892 book The Grammar of Science illustrated his conviction that statistical data analysis lies at the foundation of all knowledge; consequently, many consider Pearson to be the “father of statistics.” Among Pearson’s many contributions to the field are introducing the term standard deviation and its associated symbol $(σ)$; developing the distribution of the correlation coefficient; cofounding and editing the prestigious statistics journal Biometrika; and (what many consider his greatest achievement) creating the first chi-square “goodness-of-fit” test. Pearson inspired his students (including his son, Egon, and William Gosset) with his wonderful lectures and enthusiasm for statistics.

The chi-square test measures the degree of disagreement between the data and the null hypothesis:

$\begin{array}{rcl} χ^{2} & = & \frac{[n_{1} - E_{1}]^{2}}{E_{1}} + \frac{[n_{2} - E_{2}]^{2}}{E_{2}} + \frac{[n_{3} - E_{3}]^{2}}{E_{3}} \\ & = & \frac{(n_{1} - 50)^{2}}{50} + \frac{(n_{2} - 50)^{2}}{50} + \frac{(n_{3} - 50)^{2}}{50} \end{array}$

Note that the farther the observed numbers $n_{1}$, $n_{2}$, and $n_{3}$ are from their expected value (50), the larger $χ^{2}$ will become. That is, large values of $χ^{2}$ provide evidence that the null hypothesis is false.

We have to know the distribution of $χ^{2}$ in repeated sampling before we can decide whether the data indicate that a preference exists. When $H_{0}$ is true, $χ^{2}$ can be shown to have (approximately) a chi-square distribution (see Section 6.7). For this one-way classification, the $χ^{2}$ distribution has $(k - 1)$ degrees of freedom.* The rejection region for the voter preference survey for $α = .05$ and $k - 1 = 3 - 1 = 2$ df is

$\text{Rejection region: } χ^{2} > χ_{.05}^{2}$

This value of $χ_{.05}^{2}$ (found in Table IV of Appendix B) is 5.99147. (See Figure 8.3.) The computed value of the test statistic is

$\begin{array}{rcl} χ^{2} & = & \frac{(n_{1} - 50)^{2}}{50} + \frac{(n_{2} - 50)^{2}}{50} + \frac{(n_{3} - 50)^{2}}{50} \\ & = & \frac{(61 - 50)^{2}}{50} + \frac{(53 - 50)^{2}}{50} + \frac{(36 - 50)^{2}}{50} = 6.52 \end{array}$

Figure 8.3 Rejection region for voter preference survey

Since the computed $χ^{2} = 6.52$ exceeds the critical value of 5.99147, we conclude at the $α = .05$ level of significance that there does exist a voter preference for one or more of the candidates.
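
The voter preference calculation can be reproduced in a few lines. This is a sketch under one stated assumption that goes beyond the text: for exactly 2 degrees of freedom the chi-square upper-tail probability has the closed form $P(χ^{2} > x) = e^{-x/2}$, which lets the critical value and p-value be computed without a table. The variable names are illustrative.

```python
import math

observed = [61, 53, 36]              # Table 8.3 counts for candidates 1, 2, 3
null_probs = [1/3, 1/3, 1/3]         # H0: no preference
n = sum(observed)                    # 150 voters sampled

# Expected cell counts under H0: E_i = n * p_i
expected = [n * p for p in null_probs]

# Chi-square statistic: sum of (n_i - E_i)^2 / E_i over the cells
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Closed form valid ONLY for df = 2: P(chi2 > x) = exp(-x / 2)
alpha = 0.05
critical = -2 * math.log(alpha)      # upper-tail critical value for df = 2
p_value = math.exp(-chi_sq / 2)

print(f"chi-square = {chi_sq:.2f}, critical = {critical:.4f}, p = {p_value:.3f}")
print("reject H0" if chi_sq > critical else "fail to reject H0")
```

The statistic rounds to 6.52 and the critical value agrees with the tabled 5.99147 to about four decimal places, so the code reaches the same decision as the text.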

Now that we have evidence to indicate that the proportions $p_{1}$, $p_{2}$, and $p_{3}$ are unequal, we can use the methods of Section 5.4 to make inferences concerning their individual values. [Note: We cannot use the methods of Section 7.2 to compare two proportions because the cell counts are dependent random variables.] The general form for a test of hypothesis concerning multinomial probabilities is shown in the following box:

Teaching Tip

Use plenty of in-class examples to illustrate the one-way test. Lay the groundwork for the test of independence in the two-way analysis in the next section.

A Test of a Hypothesis about Multinomial Probabilities: One-Way Table

$H_{0} : p_{1} = p_{1,0}, p_{2} = p_{2,0}, \dots, p_{k} = p_{k,0}$

where $p_{1,0}, p_{2,0}, \dots, p_{k,0}$ represent the hypothesized values of the multinomial probabilities

$H_{a} :$ At least one of the probabilities does not equal its hypothesized value

$\text{Test statistic: } χ_{c}^{2} = \sum \frac{[n_{i} - E_{i}]^{2}}{E_{i}}$

where $E_{i} = n p_{i,0}$ is the expected cell count, that is, the expected number of outcomes of type i, assuming that $H_{0}$ is true. The total sample size is n.

Rejection region: $χ_{c}^{2} > χ_{α}^{2}$, where $χ_{α}^{2}$ has $(k - 1)$ df

$\text{p-value: } P(χ^{2} > χ_{c}^{2})$

Conditions Required for a Valid $χ^{2}$ Test: One-Way Table

1. A multinomial experiment has been conducted. This is generally satisfied by taking a random sample from the population of interest.
2. The sample size n is large enough so that, for every cell, the expected cell count $E_{i}$ is equal to 5 or more.*
*The assumption that all expected cell counts are at least 5 is necessary in order to ensure that the $χ^{2}$ approximation is appropriate. Exact methods for conducting the test of hypothesis exist and may be used for small expected cell counts, but these methods are beyond the scope of this text.
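
The boxed test and its sample-size condition translate directly into a short function. The sketch below is illustrative only: the name `one_way_chi_square`, its return layout, and the `min_expected` parameter are assumptions, not something defined in the text. It returns the statistic, the degrees of freedom, the expected counts, and the indices of any cells whose expected count falls below 5.

```python
def one_way_chi_square(counts, null_probs, min_expected=5):
    """Return (statistic, df, expected, small_cells) for a one-way table.

    small_cells lists the indices of cells whose expected count is below
    min_expected, where the chi-square approximation is doubtful.
    """
    if len(counts) != len(null_probs):
        raise ValueError("need one hypothesized probability per cell")
    if abs(sum(null_probs) - 1.0) > 1e-9:
        raise ValueError("hypothesized probabilities must sum to 1")
    n = sum(counts)
    expected = [n * p for p in null_probs]            # E_i = n * p_{i,0}
    statistic = sum((n_i - e_i) ** 2 / e_i
                    for n_i, e_i in zip(counts, expected))
    df = len(counts) - 1                              # k - 1
    small_cells = [i for i, e_i in enumerate(expected) if e_i < min_expected]
    return statistic, df, expected, small_cells

# Voter preference data from Table 8.3: every expected count is 50.
stat, df, expected, small = one_way_chi_square([61, 53, 36], [1/3, 1/3, 1/3])
print(round(stat, 2), df, small)

# Hypothetical small sample (not from the text): n = 12 over 4 equal cells,
# so every expected count is 3 and the large-sample condition fails.
_, _, tiny_expected, tiny_small = one_way_chi_square([5, 3, 2, 2], [0.25] * 4)
print(tiny_expected, tiny_small)
```

Returning the flagged cells rather than raising an error leaves the decision to the analyst, since the text notes that exact methods exist for small expected counts.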

Example 8.5 A One-Way $χ^{2}$ Test: Effectiveness of a TV Program on Marijuana

Problem

Suppose an educational television station in a state that has not legalized marijuana has broadcast a series of programs on the physiological and psychological effects of smoking marijuana. Now that the series is finished, the station wants to see whether the citizens within the viewing area have changed their minds about how the possession of marijuana should be considered legally. Before the series was shown, it was determined that 7% of the citizens favored legalization, 18% favored decriminalization, 65% favored the existing law (an offender could be fined or imprisoned), and 10% had no opinion.

A summary of the opinions (after the series was shown) of a random sample of 500 people in the viewing area is given in Table 8.4. Test at the $α = .01$ level to see whether these data indicate that the distribution of opinions differs significantly from the proportions that existed before the educational series was aired.

Table 8.4 Distribution of Opinions about Marijuana Possession

Data Set: MARIJ

Legalization	Decriminalization	Existing Laws	No Opinion
39	99	336	26

Solution

Define the proportions after the airing to be
$\begin{array}{lcl} p_{1} & = & \text{Proportion of citizens favoring legalization} \\ p_{2} & = & \text{Proportion of citizens favoring decriminalization} \\ p_{3} & = & \text{Proportion of citizens favoring existing laws} \\ p_{4} & = & \text{Proportion of citizens with no opinion} \end{array}$
Then the null hypothesis representing no change in the distribution of percentages is

$H_{0} : p_{1} = .07, p_{2} = .18, p_{3} = .65, p_{4} = .10$

and the alternative is

$H_{a} :$ At least one of the proportions differs from its null hypothesized value

Thus, we have

$\text{Test statistic: } χ^{2} = \sum \frac{[n_{i} - E_{i}]^{2}}{E_{i}}$

where

$\begin{array}{lclclcl} E_{1} & = & n p_{1,0} & = & 500 (.07) & = & 35 \\ E_{2} & = & n p_{2,0} & = & 500 (.18) & = & 90 \\ E_{3} & = & n p_{3,0} & = & 500 (.65) & = & 325 \\ E_{4} & = & n p_{4,0} & = & 500 (.10) & = & 50 \end{array}$

Since all these values are larger than 5, the $χ^{2}$ approximation is appropriate. Also, if the citizens in the sample were randomly selected, then the properties of the multinomial probability distribution are satisfied.

Rejection region: For $α = .01$ and $df = k - 1 = 3$, reject $H_{0}$ if $χ^{2} > χ_{.01}^{2}$, where (from Table IV in Appendix B) $χ_{.01}^{2} = 11.3449$.

We now calculate the test statistic:

$χ^{2} = \frac{(39 - 35)^{2}}{35} + \frac{(99 - 90)^{2}}{90} + \frac{(336 - 325)^{2}}{325} + \frac{(26 - 50)^{2}}{50} = 13.249$

Since this value exceeds the table value of $χ^{2}$ (11.3449), the data provide sufficient evidence $(α = .01)$ that the opinions on the legalization of marijuana have changed since the series was aired.

The $χ^{2}$ test can also be conducted with the use of an available statistical software package. Figure 8.4 is an SPSS printout of the analysis of the data in Table 8.4. The test statistic and p-value of the test are highlighted on the printout. Since $α = .01$ exceeds $p = .004$, there is sufficient evidence to reject $H_{0}$.

Figure 8.4

SPSS analysis of data in Table 8.4
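
For readers without SPSS, the test statistic and p-value for Table 8.4 can be checked with the Python standard library. This sketch relies on an assumption not stated in the text: for exactly 3 degrees of freedom, the chi-square upper-tail probability can be written as $\text{erfc}(\sqrt{x/2}) + \sqrt{2x/π}\, e^{-x/2}$. The helper name `chi_square_sf_3df` is hypothetical, and the formula should not be reused for other degrees of freedom.

```python
import math

def chi_square_sf_3df(x):
    """Upper-tail probability P(chi-square > x); valid only for df = 3."""
    return (math.erfc(math.sqrt(x / 2))
            + math.sqrt(2 * x / math.pi) * math.exp(-x / 2))

# Table 8.4: legalization, decriminalization, existing laws, no opinion
observed = [39, 99, 336, 26]
null_probs = [0.07, 0.18, 0.65, 0.10]   # proportions before the series aired
n = sum(observed)                        # 500 people sampled

expected = [n * p for p in null_probs]   # approximately 35, 90, 325, 50
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
p_value = chi_square_sf_3df(chi_sq)

alpha = 0.01
print(f"chi-square = {chi_sq:.3f}, p-value = {p_value:.3f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```

Comparing the p-value with α gives the same decision as comparing the statistic with the tabled critical value of 11.3449.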

Look Back

If the conclusion for the $χ^{2}$ test is “fail to reject $H_{0}$,” then there is insufficient evidence to conclude that the distribution of opinions differs from the proportions stated in $H_{0}$. Be careful not to “accept $H_{0}$” and conclude that $p_{1} = .07$, $p_{2} = .18$, $p_{3} = .65$, and $p_{4} = .10$. The probability $(β)$ of a Type II error is unknown.

Now Work Exercise 8.41

If we focus on one particular outcome of a multinomial experiment, we can use the methods developed in Section 7.4 for a binomial proportion to establish a confidence interval for any one of the multinomial probabilities.* For example, if we want a 95% confidence interval for the proportion of citizens in the viewing area who have no opinion about the issue, we calculate

$\hat{p}_{4} \pm 1.96 σ_{\hat{p}_{4}}$

where

$\hat{p}_{4} = \frac{n_{4}}{n} = \frac{26}{500} = .052 \quad \text{and} \quad σ_{\hat{p}_{4}} \approx \sqrt{\frac{\hat{p}_{4} (1 - \hat{p}_{4})}{n}}$

Thus, we get

$.052 \pm 1.96 \sqrt{\frac{(.052) (.948)}{500}} = .052 \pm .019$

or (.033, .071). Consequently, we estimate that between 3.3% and 7.1% of the citizens now have no opinion on the issue of the legalization of marijuana. The series of programs may have helped citizens who formerly had no opinion on the issue to form an opinion, since it appears that the proportion of “no opinions” is now less than 10%.
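
The interval arithmetic above can be verified with a short sketch. The function name `proportion_ci` and its return layout are illustrative assumptions; the formula is the large-sample interval for a single proportion used in the text, with z = 1.96 for 95% confidence.

```python
import math

def proportion_ci(count, n, z=1.96):
    """Large-sample confidence interval for one multinomial probability."""
    p_hat = count / n                                 # sample proportion
    std_err = math.sqrt(p_hat * (1 - p_hat) / n)      # estimated sigma of p_hat
    half_width = z * std_err
    return p_hat, half_width, (p_hat - half_width, p_hat + half_width)

# "No opinion" cell of Table 8.4: 26 of the 500 people sampled.
p_hat, half_width, (lower, upper) = proportion_ci(26, 500)
print(f"{p_hat:.3f} +/- {half_width:.3f} -> ({lower:.3f}, {upper:.3f})")
```

The interval lies entirely below the pre-series value of .10, which is the basis for the remark that the proportion of "no opinions" appears to have dropped.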

Exercises 8.34–8.53

Understanding the Principles

8.34 What are the characteristics of a multinomial experiment? Compare the characteristics with those of a binomial experiment.
8.35 What conditions must n satisfy to make the $χ^{2}$ test for a one-way table valid?

Learning the Mechanics

8.36 Use Table IV of Appendix B to find each of the following $χ^{2}$ values:
1. $χ_{.05}^{2}$ for $df = 10$
  
  18.3070
2. $χ_{.990}^{2}$ for $df = 50$
  
  29.7067
3. $χ_{.10}^{2}$ for $df = 16$
  
  23.5418
4. $χ_{.005}^{2}$ for $df = 50$
  
  79.4900
8.37 Find the rejection region for a one-dimensional $χ^{2}$ test of a null hypothesis concerning $p_{1}, p_{2}, \dots, p_{k}$ if
1. $k = 3; α = .05$
  
  $χ^{2} > 5.99147$
2. $k = 5; α = .10$
  
  $χ^{2} > 7.77944$
3. $k = 4; α = .01$
  
  $χ^{2} > 11.3449$
8.38 A multinomial experiment with $k = 3$ cells and $n = 320$ produced the data shown in the accompanying table. Do these data provide sufficient evidence to contradict the null hypothesis that $p_{1} = .25$, $p_{2} = .25$, and $p_{3} = .50$? Test, using $α = .05$.

$χ^{2} = 8.075$

Cell	1	2	3
$n_{i}$	78	60	182

8.39 A multinomial experiment with $k = 4$ cells and $n = 205$ produced the data shown in the following table.

Cell	1	2	3	4
$n_{i}$	43	56	59	47

1. Is there sufficient evidence to conclude that the multinomial probabilities differ? Test, using $α = .05$.
  
  $χ^{2} = 3.293$
2. What are the Type I and Type II errors associated with the test of part a?
3. Construct a 95% confidence interval for the multinomial probability associated with cell 3.
  
  $.288 \pm .062$

8.40 A multinomial experiment with $k = 4$ cells and $n = 400$ produced the data shown in the accompanying table. Do these data provide sufficient evidence to contradict the null hypothesis that $p_{1} = .2$, $p_{2} = .4$, $p_{3} = .1$, and $p_{4} = .3$? Test, using $α = .05$.

Cell	1	2	3	4
$n_{i}$	70	196	46	88

Applying the Concepts—Basic

8.41 Jaw dysfunction study. A report on dental patients with temporomandibular (jaw) joint dysfunction (TMD) was published in General Dentistry (Jan/Feb. 2004). A random sample of 60 patients was selected for an experimental treatment of TMD. Prior to treatment, the patients filled out a survey on two nonfunctional jaw habits—bruxism (teeth grinding) and teeth clenching—that have been linked to TMD. Of the 60 patients, 3 admitted to bruxism, 11 admitted to teeth clenching, 30 admitted to both habits, and 16 claimed they had neither habit.
1. Describe the qualitative variable of interest in the study. Give the levels (categories) associated with the variable.
2. Construct a one-way table for the sample data.
3. Give the null and alternative hypotheses for testing whether the percentages associated with the admitted habits are the same.
  
  $H_{0} : p_{1} = p_{2} = p_{3} = p_{4} = .25$
4. Calculate the expected numbers for each cell of the one-way table.
  
  15
5. Calculate the appropriate test statistic.
  
  $χ^{2} = 25.73$
6. Give the rejection region for the test at $α = .05$.
  
  $χ^{2} > 7.81473$
7. Give the appropriate conclusion in the words of the problem.
  
  Reject $H_{0}$
8. Find and interpret a 95% confidence interval for the true proportion of dental patients who admit to both habits.
  
  $.5 \pm .13$
SLIME 8.42 Beetles and slime molds. Myxomycetes are mushroom-like slime molds that are a food source for insects. The Journal of Natural History (May 2010) published the results of a study that investigated which of six species of slime molds are most attractive to beetles inhabiting an Atlantic rain forest. A sample of 19 beetles feeding on slime mold was obtained and the species of slime mold was determined for each beetle. The numbers of beetles captured on each of the six species are given in the accompanying table. The researchers want to know if the relative frequency of occurrence of beetles differs for the six slime mold species.

Slime mold species:	LE	TM	AC	AD	HC	HS
Number of beetles:	3	2	7	3	1	3
1. Identify the categorical variable (and its levels) of interest in this study.
2. Set up the null and alternative hypotheses of interest to the researchers.
  
  $H_{0} : p_{1} = p_{2} = p_{3} = p_{4} = p_{5} = p_{6} = 1/6$
3. Find the test statistic and corresponding p-value.
  
  $χ^{2} = 6.58$
  
  MINITAB Output for Exercise 8.44
4. The researchers found “no significant differences in the relative frequencies of occurrence” using $α = .05$. Do you agree?
  
  Yes
5. Comment on the validity of the inference, part d. (Determine the expected cell counts.)
8.43 Do social robots walk or roll? Refer to the International Conference on Social Robotics (Vol. 6414, 2010) study of how engineers design social robots, Exercise 2.7 (p. 38). Recall that a social (or service) robot is designed to entertain, educate, and care for human users. In a random sample of 106 social robots obtained through a Web search, the researchers found that 63 were built with legs only, 20 with wheels only, 8 with both legs and wheels, and 15 with neither legs nor wheels. Prior to obtaining these sample results, a robot design engineer stated that 50% of all social robots produced have legs only, 30% have wheels only, 10% have both legs and wheels, and 10% have neither legs nor wheels.
1. Explain why the data collected for each sampled social robot are categorical in nature.
2. Specify the null and alternative hypotheses for testing the design engineer’s claim.
  
  $H_{0} : p_{1} = .5, p_{2} = .3, p_{3} = p_{4} = .1$
3. Assuming the claim is true, determine the number of social robots in the sample you expect to fall into each design category.
  
  53; 31.8; 10.6; 10.6
4. Use the results to compute the chi-square test statistic.
5. Make the appropriate conclusion using $α = .05$.
  
  8.73
MMC 8.44 Museum management. Refer to the Museum Management and Curatorship (June 2010) worldwide survey of 30 leading museums of contemporary art, Exercise 2.22 (p. 41). Recall that each museum manager was asked to provide the performance measure used most often for internal evaluation. A summary of the results is provided in the table. The data were analyzed using a chi-square test for a multinomial experiment. The results are shown in the MINITAB printout below.
1. Is there evidence to indicate that one performance measure is used more often than any of the others? Test using $α = .10$.
  
  No; $χ^{2} = 1.67$
2. Find a 90% confidence interval for the proportion of museums worldwide that use total visitors as their performance measure. Interpret the result.
  
  $.267 \pm .133$

Performance Measure	Number of Museums
Total visitors	8
Paying visitors	5
Big shows	6
Funds raised	7
Members	4
BOYGRL 8.45 Gender in two-child families. Refer to the Human Biology (Feb. 2009) study on the gender of children in two-child families, Exercise 4.33 (p. 180). The article reported on the results of the National Health Interview Survey (NHIS) of 42,888 two-child families. The table below gives the number of families with each gender configuration.

Gender Configuration	Number of Families
Girl-girl (GG)	9,523
Boy-girl (BG)	11,118
Girl-boy (GB)	10,913
Boy-boy (BB)	11,334
1. If it is just as likely to have a boy as a girl, find the probability of each of the gender configurations for a two-child family.
  
  1/4
2. Use the probabilities, part a, to determine the expected number of families for each gender configuration.
  
  10,722
3. Compute the chi-square test statistic for testing the hypothesis that it is just as likely to have a boy as a girl.
  
  $χ^{2} = 187.04$
4. Interpret the result, part c, if you conduct the test using $α = .10$.
  
  Reject $H_{0}$
5. Recent research indicates that the ratio of boys to girls in the world population is not 1 to 1, but instead higher (e.g., 1.06 to 1). Using a ratio of 1.06 to 1, the researchers showed that the probabilities of the different gender configurations are: GG: .23795, BG: .24985, GB: .24985, and BB: .26235. Repeat parts b–d using these probabilities.
  
  $χ^{2} = 64.95$; reject $H_{0}$

Applying the Concepts—Intermediate

8.46 Mobile device typing strategies. Researchers estimate that in a typical month about 75 billion text messages are sent in the United States. Text messaging on mobile devices (e.g., cell phones, smartphones) often requires typing in awkward positions that may lead to health issues. A group of Temple University public health professors investigated this phenomenon and published their results in Applied Ergonomics (Mar. 2012). One portion of the study focused on the typing styles of mobile device users. Typing style was categorized as (1) device held with both hands/both thumbs typing, (2) device held with right hand/right thumb typing, (3) device held with left hand/left thumb typing, (4) device held with both hands/right thumb typing, (5) device held with left hand/right index finger typing, or (6) other. In a sample of 859 college students observed typing on their mobile devices, the professors observed 396, 311, 70, 39, 18, and 25, respectively, in the six categories. Is this sufficient evidence to conclude that the proportions of mobile device users in the six texting style categories differ? Use $α = .10$ to answer the question.

Yes
8.47 Curbing street gang gun violence. Refer to the Journal of Quantitative Criminology (Mar. 2014) study of street gun violence in Boston, Exercise 2.18 (p. 41). Recall that over a 5-year period (2006–2010), 80 shootings involving a Boston street gang were sampled and analyzed. Of these, 37 occurred in 2006, 30 in 2007, 4 in 2008, 5 in 2009, and 4 in 2010. A program designed to reduce street gang violence was implemented at the end of 2007. If the program is effective, then researchers believe that the percentage of all gang shootings over the 5-year period will break down as follows: 40% in 2006, 40% in 2007, 10% in 2008, 5% in 2009, and 5% in 2010. Give your opinion on the effectiveness of the program.

$χ^{2} = 3.16$, fail to reject $H_{0}$
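The statistic reported for Exercise 8.47 can be reproduced with a short calculation. The Python sketch below is illustrative only: the variable names are hypothetical, and because the exercise does not state a significance level, $α = .05$ is assumed here, with the critical value taken from a standard chi-square table for $k - 1 = 4$ degrees of freedom.

```python
# Observed shootings per year, 2006-2010 (from the exercise statement)
observed = [37, 30, 4, 5, 4]
# Hypothesized proportions if the program is effective (null hypothesis)
p0 = [0.40, 0.40, 0.10, 0.05, 0.05]

n = sum(observed)                      # total number of trials
expected = [n * p for p in p0]         # expected cell counts under H0

# Chi-square statistic: sum over cells of (observed - expected)^2 / expected
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

df = len(observed) - 1                 # k - 1 degrees of freedom
critical = 9.48773                     # assumed alpha = .05, df = 4 (table value)

reject_h0 = chi_sq > critical
print(f"chi-square = {chi_sq:.2f}, df = {df}, reject H0: {reject_h0}")
```

With these counts the expected cell counts are 32, 32, 8, 4, and 4, and the statistic falls below the assumed critical value.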
PONDICE 8.48 Characteristics of ice-melt ponds. The National Snow and Ice Data Center (NSIDC) collected data on 504 ice-melt ponds in the Canadian Arctic. One variable of interest to environmental engineers studying the ponds is the type of ice observed in each. Ice type is classified as first-year ice, multiyear ice, or landfast ice. The SAS summary table for the types of ice of the 504 ice-melt ponds is reproduced below.
1. Use a 90% confidence interval to estimate the proportion of ice-melt ponds in the Canadian Arctic that have first-year ice.
  
  $.175 \pm .028$
2. Suppose environmental engineers hypothesize that 15% of Canadian Arctic ice-melt ponds have first-year ice, 40% have landfast ice, and 45% have multiyear ice. Test the engineers’ theory, using $α = .01$.
  
  $χ^{2} = 2.39$

E4E4 8.49 Detecting Alzheimer’s disease at an early age. Geneticists at Australian National University are studying whether the cognitive effects of Alzheimer’s disease can be detected at an early age (Neuropsychology, Jan. 2007). One portion of the study focused on a particular strand of DNA extracted from each person in a sample of 2,097 young adults between the ages of 20 and 24. The DNA strand was classified into one of three genotypes: $E4^{+}/E4^{+}$, $E4^{+}/E4^{-}$, and $E4^{-}/E4^{-}$. The number of young adults with each genotype is shown in the table. Suppose that in adults who are not afflicted with Alzheimer’s disease, the distribution of genotypes for this strand of DNA is 2% with $E4^{+}/E4^{+}$, 25% with $E4^{+}/E4^{-}$, and 73% with $E4^{-}/E4^{-}$. If differences in this distribution are detected, then this strand of DNA could lead researchers to an early test for the onset of Alzheimer’s. Conduct a test (at $α = .05$) to determine if the distribution of E4/E4 genotypes for the population of young adults differs from the norm.

8.50 Traffic sign maintenance. Refer to the Journal of Transportation Engineering (June 2013) study of traffic sign maintenance, Exercise 8.17 (p. 457). Recall that civil engineers estimated the proportion of traffic signs maintained by the North Carolina Department of Transportation (NCDOT) that fail minimum retroreflectivity requirements. The researchers were also interested in the proportions of NCDOT signs with background colors white (regulatory signs), yellow (warning/caution), red (stop/yield/wrong way), and green (guide/information). In a random sample of 1,000 road signs maintained by the NCDOT, 373 were white, 447 were yellow, 88 were green, and 92 were red. Suppose that NCDOT stores new signs in a warehouse for use as replacement signs; of these, 35% are white, 45% are yellow, 10% are green, and 10% are red. Does the distribution of background colors for all road signs maintained by NCDOT match the color distribution of signs in the warehouse? Test, using $α = .05$.

$χ^{2} = 3.61$
INTERACT 8.51 Interactions in a children’s museum. Refer to the Early Childhood Education Journal (Mar. 2014) study of interactions in a children’s museum, Exercise 2.19 (p. 41). Recall that interactions by visitors to the museum were classified as (1) show-and-tell, (2) learning/teaching, (3) refocusing, (4) participatory play, or (5) advocating/disciplining. Over a 3-month period, the researchers observed 170 meaningful interactions, of which 81 were led by children and 89 were led by adult caregivers. The number of interactions observed in each category is provided in the accompanying table.

Type of Interaction	Child-Led	Adult-Led
Show-and-tell	26	0
Learning/Teaching	21	64
Refocusing	21	10
Participatory Play	12	9
Advocating/Disciplining	1	6
Totals	81	89

Source: McMunn-Dooley, C. and Welch, M. M. “Nature of interactions among young children and adult caregivers in a children’s museum.” Early Childhood Education Journal, Vol. 42, No. 2, Mar. 2014 (adapted from Figure 2).
1. For child-led interactions, is there evidence of differences in the proportions associated with the types of interactions? Test, using $α = .01$.
  
  yes, $χ^{2} = 24.12$
2. Repeat part 1 for adult-led interactions.
  
  $χ^{2} = 153.3$
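Both parts of Exercise 8.51 test a null hypothesis of equal proportions across the five interaction types, so every expected cell count is $n/k$. The Python sketch below shows one way to carry out that arithmetic. The function name `chi_square_equal` is hypothetical, and the critical value is the standard table value for 4 degrees of freedom at $α = .01$.

```python
# Counts by interaction type, in table order: show-and-tell, learning/teaching,
# refocusing, participatory play, advocating/disciplining
child_led = [26, 21, 21, 12, 1]
adult_led = [0, 64, 10, 9, 6]


def chi_square_equal(observed):
    """Chi-square statistic for H0: all k category probabilities equal 1/k."""
    n = sum(observed)
    k = len(observed)
    expected = n / k  # same expected count in every cell under H0
    return sum((o - expected) ** 2 / expected for o in observed)


critical = 13.2767  # chi-square table value, df = 5 - 1 = 4, alpha = .01

chi_child = chi_square_equal(child_led)
chi_adult = chi_square_equal(adult_led)

for label, stat in [("child-led", chi_child), ("adult-led", chi_adult)]:
    print(f"{label}: chi-square = {stat:.2f}, reject H0: {stat > critical}")
```

The expected count per cell is 16.2 for child-led interactions (81/5) and 17.8 for adult-led interactions (89/5). Both statistics exceed the critical value.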


Applying the Concepts—Advanced

USHOR 8.52 Political representation of religious groups. Do those elected to the U.S. House of Representatives really “represent” their constituents demographically? This was a question of interest in Chance (Summer 2002). One of several demographics studied was religious affiliation. The accompanying table gives the proportion of the U.S. population for several religions, as well as the number of the 435 seats in the House of Representatives affiliated with that religion. Give your opinion on whether the members of the House of Representatives are statistically representative of the religious affiliation of their constituents in the United States.

Religion	Proportion of U.S. Population	Number of Seats in House
Catholic	.28	117
Methodist	.04	61
Jewish	.02	30
Other	.66	227
Totals	1.00	435


SCRABBLE 8.53 Analysis of a Scrabble game. In the board game Scrabble™, a player initially draws a “hand” of seven tiles at random from 100 tiles. Each tile has a letter of the alphabet, and the player attempts to form a word from the letters in his or her hand. A few years ago, a handheld electronic version of the game, called ScrabbleExpress™, was developed and sold. However, pure Scrabble players complained that the handheld game produced too few vowels in the 7-letter draws. For each of the 26 letters (and “blank” for any letter), the accompanying table gives the true relative frequency of the letter in the board game. To investigate the validity of the complaint, assume that we record the frequency of occurrence of the letter in a sample of 350 tiles (i.e., 50 “hands”) randomly drawn in the electronic game. These results are also shown in the table.

1. Do the data support the claim that ScrabbleExpress™ produces letters at a different rate than the Scrabble™ board game? Test, using $α = .05$.
  
  $χ^{2} = 190.9$
2. Use a 95% confidence interval to estimate the true proportion of letters drawn in the electronic game that are vowels. Compare the results with the true relative frequency of a vowel in the board game.
  
  $.197 \pm .042$

Letter	Relative Frequency in Board Game	Frequency in Electronic Game
A	.09	20
B	.02	8
C	.02	15
D	.04	14
E	.12	16
F	.02	10
G	.03	16
H	.02	11
I	.09	12
J	.01	9
K	.01	13
L	.04	10
M	.02	15
N	.06	17
O	.08	10
P	.02	16
Q	.01	6
R	.06	13
S	.04	14
T	.06	12
U	.04	11
V	.02	17
W	.02	15
X	.01	8
Y	.02	16
Z	.01	7
#(blank)	.02	19
Total		350
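The confidence interval for the proportion of vowels in Exercise 8.53 can be checked directly from the table. The Python sketch below is a minimal illustration with hypothetical variable names. It assumes that the vowels are A, E, I, O, and U only (Y and the blank tile are not counted) and uses the large-sample interval $\hat{p} \pm z_{.025}\sqrt{\hat{p}(1-\hat{p})/n}$ with $z_{.025} = 1.96$.

```python
import math

# Electronic-game counts for the vowels, taken from the table
# (assumption: vowels are A, E, I, O, U; Y and blank are excluded)
vowel_counts = {"A": 20, "E": 16, "I": 12, "O": 10, "U": 11}
# Board-game relative frequencies for the same letters, from the table
board_freq = {"A": 0.09, "E": 0.12, "I": 0.09, "O": 0.08, "U": 0.04}

n = 350                                    # tiles sampled in the electronic game
p_hat = sum(vowel_counts.values()) / n     # sample proportion of vowels

z = 1.96                                   # z-value for 95% confidence
half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
lower, upper = p_hat - half_width, p_hat + half_width

board_vowel_freq = sum(board_freq.values())  # true vowel frequency, board game

print(f"p-hat = {p_hat:.3f}, interval = ({lower:.3f}, {upper:.3f})")
print(f"board-game vowel frequency = {board_vowel_freq:.2f}")
```

Under these assumptions, 69 of the 350 sampled tiles are vowels, and the whole interval lies well below the board-game vowel frequency of .42.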


Genotype counts for Exercise 8.49:

Genotype:	$E4^{+}/E4^{+}$	$E4^{+}/E4^{-}$	$E4^{-}/E4^{-}$
Number of young adults:	56	517	1,524
