We will make statements about a ≥ 2 populations (distributions). We first discuss selection procedures and later multiple comparisons of means. Gupta and Huang (1981) give a good overview of multiple decision problems.
We start with a set G of a populations, i.e. G = {Pi, i = 1, … , a}. These populations correspond to random variables with parameter vectors of which at least one component is unknown. This general approach is described in Rasch and Schott (2018). In this book we restrict ourselves to a special case: in Pi we consider stochastically independent random variables with an N(μi, σ²)-distribution.
It is our aim to select the population with the largest expectation or the populations with the t < a largest expectations. In Section 10.7 we discuss selection procedures for variances.
We first order the populations in G by magnitude of their expectations. For this, we need an order relation.
Selection Problem 10.1 (Bechhofer, 1954). For a given risk of wrong decision β (with β smaller than 1 − 1/C(a, t), where C(a, t) is the number of subsets of size t of G) and δ > 0, a subset MB of size t from G has to be selected. Selection is based on random samples from the Ai with normally distributed components yi. Select MB in such a way that the probability P(CS) of a correct selection is
In (5.1), d(G1, G2) is the distance between Aa − t + 1 and Aa − t. The value δ is given in advance and, besides β, is part of the precision requirement.
This condition on β is reasonable, because otherwise no real statistical problem exists: without experimenting, one can declare any of the C(a, t) = a!/[t!(a − t)!] subsets of size t to be MB, and with probability 1/C(a, t) ≥ 1 − β it fulfils (5.1).
The region [μa − t + 1 − δ, μa − t + 1] is the indifference zone, and Selection Problem 10.1 is often called the indifference zone formulation of selection problems.
Following Guiard (1994), a modified problem formulation is:
Selection Problem 10.1A. Select a subset MB of size t as in Selection Problem 10.1, but in such a way that in place of (5.1)
is used. Here the reference set is the subset of G containing all Ai with μi ≥ μa − t + 1 − δ.
For practical purposes, this formulation is easier to understand. Later, we show how results for Selection Problem 10.1 apply to Selection Problem 10.1A.
For the selection procedure, take from each of the a normal populations (with random variables yi, E(yi) = μi, var(yi) = σ²) a random sample. These a random samples are assumed to be stochastically independent, and the components yij are assumed to be distributed like yi. We base decisions on the estimators of the μi or on their realisations.
We first assume that the variances of the a populations are equal and known. In practice this is rarely the case but this approach gives a better understanding.
Selection Rule 10.1. From the a independent random samples the sample means …, are calculated and then we select the t populations with the t largest means into the set MB, see Bechhofer (1954).
Selection Rule 10.1 can only be applied if σ² is known. If σ² is unknown, we apply the multi-stage selection procedures in Section 5.4.1.
Bechhofer showed for Selection Rule 10.1 that P(CS) attains the maximal lower bound in (5.2) if we have normal distributions with known and equal variances and use ni = n, i = 1, … , a, for fixed a, t, and δ.
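For t = 1, the required common sample size n can be sketched numerically. The following stdlib-only Python sketch (our illustration, not part of the original text) computes the smallest n with P(CS) ≥ 1 − β in the least favourable configuration μ1 = … = μa−1 = μa − δ, using the standard integral representation of P(CS) as the integral of Φ(z + δ√n/σ)^(a−1) φ(z) over z:

```python
import math

def phi(x):
    # standard normal density
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    # standard normal distribution function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def p_correct_selection(a, lam, steps=4000):
    """P(CS) in the least favourable configuration for t = 1:
    integral of Phi(z + lam)^(a-1) * phi(z) dz, with lam = delta*sqrt(n)/sigma.
    Trapezoidal rule on [-8, 8] is accurate enough here."""
    h = 16.0 / steps
    total = 0.0
    for k in range(steps + 1):
        z = -8.0 + k * h
        w = 0.5 if k in (0, steps) else 1.0
        total += w * Phi(z + lam) ** (a - 1) * phi(z)
    return total * h

def bechhofer_n(a, delta_over_sigma, beta):
    """Smallest n with P(CS) >= 1 - beta under Selection Rule 10.1, t = 1."""
    n = 1
    while p_correct_selection(a, delta_over_sigma * math.sqrt(n)) < 1.0 - beta:
        n += 1
    return n
```

For a = 2 the integral reduces to Φ(δ√n/(σ√2)), so with δ = σ and β = 0.05 one needs √n ≥ √2 · 1.645, i.e. n = 6.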
If σ2 is unknown a two‐stage selection rule is proposed.
Selection Rule 10.2. Calculate from observations (realisations of yij) yij(i = 1, … , a; j = 1, … , n0) from the a populations A1, … , Aa with
the estimate
with df = a(n0 − 1) degrees of freedom, as in Table 5.2 with N = an0. For given a, c, t, and β we calculate using the R‐command of Problem 5.6 the sample size n. If n > n0 we take from each of the a populations n − n0 additional observations, otherwise n0 is the final sample size. With n or n0 we continue as in Selection Rule 10.1.
In place of Selection Problem 10.1, Selection Problem 10.1A can always be used; this has advantages in applications. The researcher could ask what can be said about the probability that we really selected the t best populations if μa − t + 1 − μa − t < δ. An answer to this question is not possible. It is better to formulate Selection Problem 10.1A, which can be interpreted more easily, and where we know that, at least with probability 1 − β, we selected exactly t populations that are not more than δ worse than Aa − t + 1.
Guiard (1994) showed that the least favourable cases concerning the values of P(CS) for Selection Problems 10.1 and 10.1A are identical. Hence the lower bounds 1 − β in (10.2) are identical for Selection Problems 10.1 and 10.1A. For Selection Problem 10.1A we call this the probability of a δ-precise selection.
Now it is again our aim to select the population with the largest expectation. The size r of the selected subset MG is random. Selection is based on a stochastically independent random samples; their components yij are assumed to be distributed like yi, which are N(μi, σ²)-distributed. We base decisions on the estimators of the μi or on their realisations.
Selection Problem 5.6 (Gupta 1956).
For a given risk β of incorrect decision with
select from G a subset MG of random size r so that
Selection Problem 5.6 is the subset formulation of the selection problem.
The following selection rule stems from Gupta and Panchapakesan (1970, 1979).
Selection Rule 10.3. We use the estimators based on a samples of equal size n, which are N(μi, σ²/n)-distributed.
All those Ai are put into MG for which
We have to choose the constant D so that
where Φ and ϕ are the distribution function and the density function of the standardised normal distribution, respectively.
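The determining equation for D — in its standard form, the integral of Φ(z + D)^(a−1) φ(z) over z set equal to 1 − β for equal sample sizes and known σ² — can be solved by bisection. The following stdlib-only sketch is our own illustration:

```python
import math

def Phi(x):
    # standard normal distribution function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def phi(x):
    # standard normal density
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def p_subset(a, D, steps=4000):
    """Trapezoidal approximation of integral Phi(z + D)^(a-1) * phi(z) dz."""
    h = 16.0 / steps
    total = 0.0
    for k in range(steps + 1):
        z = -8.0 + k * h
        w = 0.5 if k in (0, steps) else 1.0
        total += w * Phi(z + D) ** (a - 1) * phi(z)
    return total * h

def gupta_D(a, beta):
    """Solve p_subset(a, D) = 1 - beta; the integral is increasing in D."""
    lo, hi = 0.0, 10.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if p_subset(a, mid) < 1.0 - beta:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For a = 2 the equation reduces to Φ(D/√2) = 1 − β, so for β = 0.05 we get D = √2 · 1.645 ≈ 2.33; D grows with a.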
If σ² is unknown, we proceed approximately with an estimate s² of σ², based on f degrees of freedom. Then (5.6) is replaced by
where hf(y) is the density function of a CS(f)-distributed (central χ²) random variable.
Which of the two problem formulations is better suited for practical purposes? Often experiments with a technologies, a varieties, or a medical treatments have the purpose of selecting the best of them (i.e. t = 1). If we have a huge list of candidates at the beginning – say about 500, as in drug screening – then it is reasonable first to reduce the number of candidates by a subset procedure down to, say, r = 20. In a second step we then use an indifference zone procedure with a = r. We present an optimal combination of the two approaches, taken from simulation results in Rasch and Yanagida (2019).
A Simulation Experiment. (Rasch and Yanagida, 2019)
We choose t = 1 and as final probability of a correct selection PC = 0.95. Further, we assume that the μi, i = 1, … , a, are expectations of normal distributions with equal variances.
We start with Gupta's approach with a probability of correct selection PCGu ≥ 0.95 and a sample size nGu as small as possible, both found by systematic search. This results in a subset of size r for each run. If Aa is not in that subset, the simulation run is finished with the result 'incorrect selection'. If Aa is in the subset and r = 1, the simulation run is finished with the result 'correct selection'. If, however, Aa is in the subset and r > 1, we continue with Bechhofer's approach with a sample size nB from the R-program of Selection Problem 10.1.
How can the free parameters of both problems be combined in an optimal way so that the overall probability of correct selection is PC = 0.95 and the total experimental size
is as small as possible?
Rasch and Yanagida (2019) showed that for a ≥ 30 nearly no difference occurs between known and unknown variances, due to the large degrees of freedom of the t-distribution. Therefore, for a not too small, the results are valid for unknown variances too.
In the simulation experiment, we used Gupta's approach with sample sizes nGu shown in Table 10.2.
Table 10.2 Values of nGu used in the simulation experiment.

                  a
  δ           30    50    100    200
  δ = σ/2     19    20     20     23
  δ = σ        5     5      5      6
The simulated observations in each sample and each run are xij, i = 1, … , a; j = 1, … , nGu. Each xij is a realisation of a normally distributed random variable with expectation μi; i = 1, … , a; μa = 1; μj = 0, j = 1, … , a−1 and variance σ2 = 1.
Using Gupta's approach for the a realised sample means results in a subset of size r.
If Aa is in this subset and r > 1, we simulate from the r populations in this subset samples of size nB obtained by R.
Using Bechhofer's approach then results in one best-selected population; if it is not Aa, the selection is incorrect. Besides the probability of correct selection PC = 0.95, we used a = 30, 50, 100, 200 in the simulation experiment. All simulations had 100 000 runs, which means that the relative frequencies of correct selection are not far from 0.95. When in Nf of the runs the selection was incorrect (either already after Gupta's approach or at the end), then
is a good estimate of PC.
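A stripped-down, stdlib-only sketch of the first (Gupta) stage of such a run may be helpful. Writing the cutoff as Dσ/√n and taking D = 3.0 as an illustrative constant are our own assumptions; in practice D is solved from Gupta's integral equation:

```python
import random, math

def gupta_stage(a, n, D, mu, sigma=1.0, runs=2000, seed=42):
    """Simulate Gupta's subset rule: keep population i when its sample mean
    is at least max(all means) - D*sigma/sqrt(n).  Returns the relative
    frequency with which the best population A_a is retained and the
    average subset size r."""
    random.seed(seed)
    kept_best = 0
    sum_r = 0
    cut = D * sigma / math.sqrt(n)
    for _ in range(runs):
        # sample means have standard deviation sigma/sqrt(n)
        means = [random.gauss(mu[i], sigma / math.sqrt(n)) for i in range(a)]
        top = max(means)
        subset = [i for i in range(a) if means[i] >= top - cut]
        sum_r += len(subset)
        if a - 1 in subset:          # A_a is stored as the last population
            kept_best += 1
    return kept_best / runs, sum_r / runs

# configuration of the simulation experiment: mu_a = 1, all others 0, sigma = 1
a, n = 10, 5
pc, r_bar = gupta_stage(a, n, D=3.0, mu=[0.0] * (a - 1) + [1.0])
```

The run returns the retention frequency of Aa (close to one here) and the average subset size r, the quantity reported in Table 10.3.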
Table 10.3 shows the average size of the selected subset, which must be used as the number of populations in the indifference zone selection.
Table 10.3 Values of the average subset size found in the subset selection.

                  a
  δ           30       50       100      200
  δ = σ/2     12.25    18.81    32.08    59.81
  δ = σ       12.48    18.70    31.91    62.07
Table 10.4 shows the optimal values of PB. Table 10.5 shows the average total number of observations needed for both selection procedures, and Table 10.6 the estimated overall probability of correct selection.
Table 10.4 Optimal values of PB used in the simulation experiment.

                  a
  δ           30       50       100      200
  δ = σ/2     0.981    0.983    0.991    0.988
  δ = σ       0.975    0.981    0.991    0.982
Table 10.5 Average total size of the simulation experiment (upper entry) and the size needed for Bechhofer's approach only (lower entry).

                  a
  δ           30         50         100        200
  δ = σ/2     1151.7     2013.70    4253.95    8836.00
              1830       3350       7500       16 400
  δ = σ       288.38     500.25     1071.70    2215.90
              480        850        1900       4200
Table 10.6 Relative frequencies of correct selection calculated from 100 000 runs.

                  a
  δ           30        50        100       200
  δ = σ/2     0.9519    0.9512    0.9510    0.9504
  δ = σ       0.9504    0.9516    0.9507    0.9502
We learn from the simulation experiment that a combination of Gupta's and Bechhofer's approach leads to a smaller total sample size than the use of Bechhofer's approach alone.
In the meantime, further results have been obtained by simulation for t > 1 and for non-normal distributions. For these, see the special issue of JSTP and a Springer proceedings volume of the 10th International Workshop on Simulation and Statistics, Salzburg 2019.
Let the random variable x in Pi be N(μi, σi²)-distributed with known μi. From n observations from each population Pi, i = 1, … , a, we calculate
In the case of unknown μi we use estimates of the μi and calculate
The yi will be used to select the population with the smallest variance; each yi has the same number f of degrees of freedom (if μi is known we have f = n, and if μi is unknown then f = n − 1).
Selection for the smallest variance follows from Selection Rule 10.4.
Selection Rule 10.4.
Put into MG all Pi whose sample variance is at most the smallest of the a sample variances divided by z*, where z* = z(f, a, β) ≤ 1 depends on the degrees of freedom f, the number a of populations and on β.
For z* we choose the largest number such that the right-hand side of (5.7) equals 1 − β. We have to calculate P(CS) for the least favourable case (monotonicity of the likelihood ratio is given). We denote the estimates as usual and formulate the following theorem (for the proof see Rasch and Schott (2018, Theorem 11.5)).
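The effect of z* can be illustrated by a small Monte Carlo study (a stdlib-only sketch of our own). In the least favourable case all a variances are equal, so the standardised sample variances behave like independent χ²(f)/f variables:

```python
import random

def p_cs_variance_rule(a, f, z_star, runs=20_000, seed=3):
    """Estimate P(CS) for Selection Rule 10.4 in the least favourable case
    (all sigma_i^2 equal): the best population (index 0) is correctly
    selected when its sample variance is <= min of all sample variances
    divided by z_star."""
    random.seed(seed)
    hits = 0
    for _ in range(runs):
        # s_i^2/sigma^2 ~ chi2(f)/f; chi2(f) is Gamma(shape f/2, scale 2)
        s2 = [random.gammavariate(f / 2.0, 2.0) / f for _ in range(a)]
        if s2[0] <= min(s2) / z_star:
            hits += 1
    return hits / runs
```

With z* = 1 only the minimum itself is selected, so P(CS) ≈ 1/a in the least favourable case; decreasing z* enlarges the selected subset and raises P(CS) towards 1 − β.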
Real multiple decision problems (with more than two decisions) arise if we consider the results of several tests simultaneously; their risks must then be evaluated jointly. Of course, we cannot give a full overview of the available methods. For more details, see Miller (1981) and Hsu (1996).
We consider the situation that a populations Pi, i = 1, … , a are given, in which the random variables yi, i = 1, … , a are independent of each other and normally distributed with expectations μi and common but unknown variance σ². When the variances are unequal, special methods are applied, and we discuss only some of them. The situation is similar to that of the one-way ANOVA in Chapter 5; the a populations are the levels of a fixed factor A.
We assume that from a populations independent random samples of size ni are drawn. Some of the methods assume the balanced case with equal sample sizes. We consider several multiple comparison (MC) problems.
MC Problem 5.3. Test the null hypothesis
against the alternative hypothesis
HA: there exists at least one pair (i,j) with i ≠ j so that μi ≠ μj.
The first kind risk is defined not for a single pair (i,j) but for all possible pairs of the experiment and this risk is therefore called an experiment‐wise risk αe.
MC Problem 5.6. Test each of the null hypotheses
against the corresponding alternative hypothesis
Each pair (H0,ij; HA,ij) of these hypotheses is tested independently of the others. This corresponds to the situation of Chapter 3, for which the two-sample t-test was applied.
The first kind risk αij is defined for the pair (H0,ij; HA,ij) of hypotheses and may differ for each pair. Often we choose all αij = αc and call it a comparison-wise first kind risk. If we perform all tests using the two-sample t-test or the Welch test, then we speak of the multiple t-procedure or the W-procedure.
MC Problem 5.10. One of the populations (without loss of generality Pa) is prominent (a standard method, a control treatment, and so on). Test each of the a − 1 null hypotheses
against the alternative hypothesis
The first kind risk αi for the pair (H0,i; HA,i) of hypotheses is independent of those of the other pairs and is therefore also called comparison-wise. Often we choose αi = αc.
If we use the term experiment-wise, the risk of the first kind αe means the probability that in at least one of the pairs (H0,ij; HA,ij) of hypotheses in MC Problem 5.6, or in at least one of the a − 1 pairs (H0,i; HA,i) of hypotheses in MC Problem 5.10, the null hypothesis is erroneously rejected. Note, however, that such a risk is not really the risk of a statistical test but a probability in a multiple decision problem, because when we consider all possible pairs (null hypothesis, alternative hypothesis) of MC Problem 5.6 or 5.10, we have a multiple decision problem with more than two possible decisions if a > 2.
In general we cannot convert αe and αc into each other. The asymptotic (for known σ2) relations for k orthogonal contrasts
follow from elementary rules of probability theory, because we can assign to the independent contrasts independent F‐tests (transformed z‐tests) with f1 = 1, f2 = ∞ degrees of freedom.
To solve MC Problems 5.6 and 5.10, we first construct confidence intervals for differences of expectations as well as for linear contrasts in these expectations. With these confidence intervals, the problems are easily handled.
As already mentioned at the start of Section 5.1.1, we assume that from a populations independent random samples of size ni are drawn. We consider the a populations Pi as the a levels of a fixed factor A in a one‐way analysis of variance as discussed in Section 5.3, therefore we write
and call μ the overall expectation and μ + ai the expectation of the ith level, i.e. of population Pi. The total size of the experiment is N = n1 + ⋯ + na. In applied statistics, we use the notation 'multiple comparison of means' synonymously with 'multiple comparison of expectations'.
We can solve MC Problem 5.3 by several methods. At first, we use the F‐test from Chapter 5.
If MC Problem 5.3 is handled by the F‐test, we use the notations of Table 5.2. H0 : μ1 = μ2 = ⋯ = μa is rejected, if
The method proposed in Scheffé (1953) allows the calculation of simultaneous confidence intervals for all linear contrasts of the μ + ai in (10.17).
We now reformulate MC Problem 5.3 as an MC problem to construct confidence intervals. If H0 is correct, then all linear contrasts in the μi = μ + ai equal zero. Conversely, the validity of H0 follows from the fact that all linear contrasts vanish.
Therefore, confidence intervals Kr for all linear contrasts Lr can be constructed in such a way that the probability that Lr ∈ Kr for all r is at least 1 − αe. We then reject H0 with a first kind risk αe if at least one of the Kr does not cover zero.
Confidence intervals by Scheffé's method are not optimal if confidence intervals for k special but not for all contrasts are wanted. Sometimes we obtain shorter intervals using the Bonferroni inequality (Bonferroni, 1936).
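The comparison can be sketched numerically. Since the Python standard library has no t- or F-quantiles, the following illustration of our own (with assumed example values a = 5, N = 50 and k = 4 pre-chosen contrasts) estimates both critical values by Monte Carlo:

```python
import random

def mc_quantile(draws, p):
    """Empirical p-quantile of a list of simulated draws."""
    draws = sorted(draws)
    return draws[int(p * len(draws))]

def chi2(f):
    # chi-square(f) as Gamma(shape f/2, scale 2)
    return random.gammavariate(f / 2.0, 2.0)

random.seed(1)
a, N, k, alpha = 5, 50, 4, 0.05
f = N - a                       # residual degrees of freedom
reps = 200_000

# |t| with f df, and F with (a-1, f) df
abs_t = [abs(random.gauss(0, 1)) / (chi2(f) / f) ** 0.5 for _ in range(reps)]
F = [(chi2(a - 1) / (a - 1)) / (chi2(f) / f) for _ in range(reps)]

# Bonferroni: two-sided level alpha/k for each of the k contrasts
bonferroni = mc_quantile(abs_t, 1.0 - alpha / k)
# Scheffe: sqrt((a-1) * F_{1-alpha}(a-1, f))
scheffe = ((a - 1) * mc_quantile(F, 1.0 - alpha)) ** 0.5
```

Here the Bonferroni critical value (roughly 2.6) is smaller than Scheffé's (roughly 3.2), so for only k = 4 pre-chosen contrasts the Bonferroni intervals are shorter; for very many contrasts the comparison reverses.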
Tukey's (1953) method is applied for equal numbers of observations in all a normally distributed samples.
Spjøtvoll and Stoline (1973) generalised Tukey's method without assuming ni = n.
If all conditions of Section 10.8.1.2 are fulfilled, all linear contrasts are simultaneously covered with probability 1 − αe by intervals
In (10.29), (a, f | 1 − α) is the (1 − α)-quantile of the distribution of the augmented studentised range (see Rasch and Schott (2018)). These intervals, contrary to those of the Tukey method, depend on the degree of unbalancedness.
A further generalisation of the Tukey method can be found in Hochberg (1974) and Hochberg and Tamhane (1987) – see also Hochberg and Tamhane (2008).
In the case of equal sample sizes a multiple Welch test analogous to the two‐sample Welch test is possible.
Each of the null hypotheses
has to be tested against the corresponding alternative hypothesis
Each pair (H0,ij; HA,ij) of these hypotheses is tested independently of the others. This corresponds with the situation of Chapter 3 for which the two‐sample t‐test was applied.
In contrast to the two-sample t-test, the multiple t-test uses, in place of the variances of the two samples under comparison, the overall variance estimator, which equals the residual mean square of the one-way ANOVA in Chapter 5. Notice that it is assumed that all a sample sizes are equal to n > 1. The test statistic for testing H0,ij is
with MSres from (5.8).
H0,ij is rejected if the absolute value of the realisation t of t in (10.31) is larger than the (1 − α/2)-quantile of the central t-distribution with N − a degrees of freedom.
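As a small stdlib-only illustration (with data invented for the example), the statistic in (10.31) can be computed as:

```python
import math

def multiple_t(groups, i, j):
    """t statistic for H0: mu_i = mu_j in the multiple t-procedure,
    using the pooled residual mean square of the one-way ANOVA.
    groups: a list of a samples of equal size n."""
    a = len(groups)
    n = len(groups[0])
    means = [sum(g) / n for g in groups]
    # residual sum of squares and MS_res with N - a = a*(n-1) df
    ss_res = sum(sum((y - means[k]) ** 2 for y in g)
                 for k, g in enumerate(groups))
    ms_res = ss_res / (a * (n - 1))
    return (means[i] - means[j]) / math.sqrt(2.0 * ms_res / n)

# three invented samples of size n = 4 with means 1, 2, 3
data = [[0, 0, 2, 2], [1, 1, 3, 3], [2, 2, 4, 4]]
t12 = multiple_t(data, 0, 1)
```

Because all pairwise statistics share the same pooled denominator, they are not independent of each other, which is exactly why the experiment-wise risk must be controlled.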
Sometimes a – 1 treatments have to be compared with a standard procedure called control. We assume first that we have equal sample sizes n for the treatments.
If one of the a populations P, without loss of generality let it be Pa, is considered as a standard or control, we use in case of a pairwise comparison in place of (10.31)
to test each of the a − 1 null hypotheses
against the corresponding alternative hypothesis
When we use an overall significance level α* for the tests of the a − 1 differences and a significance level α for each single test, then the Bonferroni inequality gives α ≤ α* ≤ (a − 1)α. When we use for the test of each of the a − 1 differences the significance level α = α*/(a − 1), we obtain an approximate overall significance level of α*. To test with an exact overall significance level α*, we must use the multivariate Student distribution t(a − 1, f) with f = df(MSres) = a(n − 1). Dunnett (1955) solved this problem. Let us now derive the optimum choice of sample sizes for the multiple t-test with a control.
For example, with a = 10 groups and n = 23 observations per group, the total sample size is 10 · 23 = 230.
Often we have a fixed set of resources and a budget that allows for only N observations. To maximise the power of the tests, we want to minimise the total variance. If we use na observations for the control population Pa and, for the other treatments, an equal sample size of n observations, then N = (a − 1)n + na. The variance of the difference of the sample means of Pi and Pa is σ²(1/n + 1/na). The total variance (TSV) is the sum of the a − 1 such terms, hence TSV = (a − 1)σ²(1/n + 1/na), and we have to minimise it subject to the constraint N = (a − 1)n + na. This is a Lagrange multiplier problem: we must minimise M = TSV + λ[N − (a − 1)n − na]. Setting the partial derivatives with respect to n and na equal to zero yields the equations:
Hence na = N − (a − 1)n, and inserting this in TSV we obtain a function f(n) of n; setting its first derivative to zero, f′(n) = 0, gives the stationary point n = N/[(a − 1) + √(a − 1)]. To decide what type of extremum this is, calculate f″(n) = (a − 1)σ²{2/n³ + 2(a − 1)²/[N − (a − 1)n]³}; this is positive for all n with 0 < n < N/(a − 1), hence the stationary point is a minimum of f(n). Of course, we must take integer values for n and na.
In the R package OPDOE we can solve the minimal sample size problem, taking integer values for na = n√(a − 1) and n = N/[(a − 1) + √(a − 1)].
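As a check of the derivation (a stdlib-only sketch of our own), the continuous solution can be compared with a brute-force search over all feasible integer allocations:

```python
import math

def optimal_allocation(N, a):
    """Integer pair (n, n_a) minimising TSV = (a-1)*sigma^2*(1/n + 1/n_a)
    subject to N = (a-1)*n + n_a; sigma^2 drops out of the comparison."""
    best = None
    for n in range(1, (N - 1) // (a - 1) + 1):
        na = N - (a - 1) * n
        if na < 1:
            break
        tsv = (a - 1) * (1.0 / n + 1.0 / na)
        if best is None or tsv < best[0]:
            best = (tsv, n, na)
    return best[1], best[2]

N, a = 230, 10
n_cont = N / ((a - 1) + math.sqrt(a - 1))   # continuous optimum, ~19.17
n_int, na_int = optimal_allocation(N, a)    # integer optimum
```

For N = 230 and a = 10 the continuous optimum n ≈ 19.17 rounds to the integer optimum n = 19, na = 59, so the control group gets roughly √(a − 1) = 3 times as many observations as each treatment.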
Simultaneous 1 − αe‐confidence intervals for the a − 1 differences
are constructed for equal sample sizes n of the a samples. (After renumbering, μa is always the expectation of the control.)
We consider a independent N(μi, σ²)-distributed random variables yi, independent of a CS(f)-distributed random variable.
Dunnett (1955) derived the distribution of
which is the multivariate Student distribution t(a − 1, f) with f = df(MSres) = a(n − 1); he called it the distribution d(a − 1, f).
Dunnett (1964) and Bechhofer and Dunnett (1988) present the quantiles d(a − 1, f| 1 − αe) of the distribution of
We see that d ≤ d(a − 1, f| 1 − αe) is necessary and sufficient for
For all i, by
a class of confidence intervals is given, covering all differences μi − μa with probability 1 − αe.
For the one-way classification with the notation of Example 5.7 we obtain, for equal sub-class numbers n, the class of confidence intervals
If ni = n for i = 1, …, a, we use the Dunnett procedure, based on the confidence intervals of the Dunnett method. Then H0,i is rejected if
If the ni are not equal, we use a method proposed by Dunnett (1964) with modified quantiles. For example, when the optimal sample size of the control is taken and ni = n for i = 1, 2, …, a − 1, tables of critical values are given in Dunnett (1964) and Bechhofer and Dunnett (1988).
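When no tables are at hand, the quantile d(a − 1, f | 1 − αe) can be approximated by simulation directly from its definition. The following stdlib-only sketch of our own covers the two-sided balanced case:

```python
import random, math

def dunnett_quantile_mc(a, n, prob=0.95, reps=100_000, seed=7):
    """Monte Carlo approximation of the two-sided Dunnett quantile
    d(a-1, f | prob) for a-1 treatments vs. one control with equal
    sample sizes n and f = a*(n-1) residual degrees of freedom."""
    random.seed(seed)
    f = a * (n - 1)
    draws = []
    for _ in range(reps):
        z = [random.gauss(0.0, 1.0) for _ in range(a)]        # standardised means
        s = math.sqrt(random.gammavariate(f / 2.0, 2.0) / f)  # s/sigma
        d = max(abs(z[i] - z[-1]) for i in range(a - 1)) / (s * math.sqrt(2.0))
        draws.append(d)
    draws.sort()
    return draws[int(prob * reps)]
```

For a = 2 this reduces to the two-sided t quantile with f degrees of freedom, which provides a convenient check of the simulation.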
Dunnett and Tamhane (1992) proposed a further modification, called a step-up procedure: one starts by testing the hypothesis with the least significant of the a − 1 test statistics on the left-hand side of (5.25) and then proceeds with the next larger one, and so on. Analogously, a step-down procedure is defined. Dunnett and Tamhane (1992) showed that the step-up procedure is uniformly more powerful than a method proposed by Hochberg (1988) and preferable to the step-down procedure. As a disadvantage they mentioned the greater difficulty of calculating critical values and minimum sample sizes in the standard and other groups. This problem is of no importance if we use the R-package DunnettTests
together with the R‐package mvtnorm.
Finally, we compare in Table 10.9 the minimal sample sizes of the methods used for Example 5.7.
Table 10.9 Minimal sample sizes for several multiple decision problems.

  Method                           n               Remarks
  Selection Rule 10.1 (t = 1)      12              t = 1
  Dunnett's stepwise procedure     27.7 average    9 comparisons
  Multiple t-procedure             23              45 comparisons
  F-test                           11 ≤ n ≤ 49     One test
Average means .