Chapter 9
Tests on scale difference
In this chapter we present nonparametric tests for the scale parameter. More precisely, these tests assess whether two samples come from the same population, with alternatives characterized by differences in dispersion. They are called tests on scale, spread, or dispersion. The most famous one is the Siegel–Tukey test (Test 9.1.1). The tests presented here can be employed if the samples are not normally distributed, but the assumption of equal medians is crucial.
9.1 Two-sample tests
9.1.1 Siegel–Tukey test
Description: |
Tests if the scale (variance) of two independent populations is the same. |
Assumptions: |
- Data are measured at least on an ordinal scale.
- Samples $X_1,\ldots,X_{n_1}$ and $Y_1,\ldots,Y_{n_2}$ are independently drawn from the two populations, $N = n_1 + n_2$.
- The random variables $X$ and $Y$ are independent with continuous distribution functions $F_1$ and $F_2$, scale parameters $\theta_1$ and $\theta_2$, and medians $M_1$ and $M_2$. It holds that $M_1 = M_2$.
- $F_1$ and $F_2$ belong to the same family of distribution functions with possible differences in scale and location. Under the assumption of equal medians, the hypothesis $F_1 = F_2$ reduces to $\theta_1 = \theta_2$.
|
Hypotheses: |
(A) $H_0: \theta_1 = \theta_2$ vs $H_1: \theta_1 \neq \theta_2$ |
|
(B) $H_0: \theta_1 \geq \theta_2$ vs $H_1: \theta_1 < \theta_2$ |
|
(C) $H_0: \theta_1 \leq \theta_2$ vs $H_1: \theta_1 > \theta_2$ |
Test statistic: |
For $N = n_1 + n_2$ even, ranks $a_i$ are assigned to the positions $i = 1, \ldots, N$ of the combined and ordered sample: |
$a_i = 2i$ for $i$ even and $1 < i \leq N/2$; $a_i = 2i - 1$ for $i$ odd and $1 \leq i \leq N/2$; $a_i = 2(N-i) + 2$ for $i$ even and $N/2 < i \leq N$; $a_i = 2(N-i) + 1$ for $i$ odd and $N/2 < i < N$. |
The test statistic $S$ is the sum of the ranks $a_i$ of the $X$-observations in the combined sample. |
If $N$ is odd, the above ranking is applied after the middle observation of the combined and ordered sample is discarded and the sample size is reduced to $N - 1$.
Test decision: |
Reject $H_0$ if for the observed value $s$ of $S$ |
|
(A) $s \leq s_{\alpha/2}$ or $s \geq s_{1-\alpha/2}$ |
|
(B) $s \geq s_{1-\alpha}$ |
|
(C) $s \leq s_{\alpha}$ |
p-value: |
(A) $p = 2 \min(P(S \leq s), 1 - P(S \leq s))$ |
|
(B) $p = P(S \geq s)$ |
|
(C) $p = P(S \leq s)$ |
Annotations: |
- Tables with critical values can be found in Siegel and Tukey (1960). Due to the ranking procedure used, the same tables of critical values can be used as for the Wilcoxon rank sum test for location.
- For the calculation of the test statistic, first combine both samples and rank the combined sample from the lowest to the highest values according to the above ranking scheme. Hence, the lowest value gets the rank 1, the highest value the rank 2, the second highest value the rank 3, the second lowest value the rank 4, the third lowest value the rank 5, and so forth. The above test statistic is the sum of the ranks of the $X$-sample, based on the assumption $n_1 \leq n_2$. The test can also be based on the ranks of the $Y$-observations in the combined sample. Usually the sum of ranks of the sample with the smaller sample size is used due to arithmetic convenience (Siegel and Tukey 1960).
- The distribution with the larger scale will have the lower sum of ranks, because the lower ranks are on both ends of the combined sample.
- It is not strictly necessary to remove the middle observation if the combined sample size is odd. The advantage of removing it is that the sum of the ranks of adjacent observations is always the same, and therefore the distribution of the rank sum is symmetric under $H_0$.
- For large samples the test statistic $z = (2S - n_1(N+1) \pm 1)/\sqrt{n_1 n_2 (N+1)/3}$ can be used, which is approximately standard normally distributed. The sign has to be chosen such that $|z|$ is smaller (Siegel and Tukey 1960).
|
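The alternating ranking scheme described above can be sketched in a few lines. Python is used here purely as a neutral illustration, and the function name is ours, not part of SAS or R:

```python
def siegel_tukey_ranks(N):
    # Assign Siegel-Tukey ranks to positions 1..N of the ordered
    # combined sample (N assumed even): 1, 4, 5, 8, 9, ... counted
    # from below and 2, 3, 6, 7, ... counted from above.
    ranks = []
    for i in range(1, N + 1):
        if i <= N / 2:
            ranks.append(2 * i if i % 2 == 0 else 2 * i - 1)
        else:
            ranks.append(2 * (N - i) + 2 if i % 2 == 0 else 2 * (N - i) + 1)
    return ranks

# For N = 10 the scheme yields 1, 4, 5, 8, 9, 10, 7, 6, 3, 2.
```

Summing these ranks over the positions occupied by the $X$-observations gives $S$; note that the ranks are a permutation of $1, \ldots, N$, which is why the Wilcoxon critical value tables apply.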
Example
To test the hypothesis that the dispersion of the systolic blood pressure in the two populations of healthy subjects (status=0) and subjects with hypertension (status=1) is the same. The dataset contains 25 observations for status=0 and 30 observations for status=1 (dataset in Table A.1).
SAS code
proc npar1way data=blood_pressure correct=no st;
var mmhg;
class status;
exact st;
run;
SAS output
The NPAR1WAY Procedure
Siegel–Tukey Scores for Variable mmhg
Classified by Variable status
Sum of Expected Std Dev Mean
status N Scores Under H0 Under H0 Score
-------------------------------------------------------
0 25 655.0 700.0 59.001584 26.20
1 30 885.0 840.0 59.001584 29.50
Average scores were used for ties.
Siegel–Tukey Two-Sample Test
Statistic 655.0000
Z -0.7627
One-Sided Pr < Z 0.2228
Two-Sided Pr > |Z| 0.4456
Remarks:
- The parameter st enables the Siegel–Tukey test of the procedure NPAR1WAY.
- correct=value is optional. If value is YES then a continuity correction for the normal approximation is used. The default is NO.
- exact st is optional and requests an additional exact test. Note that the computation of an exact test can be very time consuming. This is the reason why in this example no exact p-values are given in the output.
- Besides the two-sided p-value SAS also reports a one-sided p-value; which one is printed depends on the Z-statistic: if it is greater than zero the right-sided p-value is printed, and if it is less than or equal to zero the left-sided p-value is printed.
- In this example the sum of scores for the healthy subjects is 655.0 compared with 885.0 for the people with hypertension. So there is evidence that the scale of the healthy subjects is higher than the scale of the unhealthy subjects. In fact, the variance of the healthy subjects is 124.41 and the variance of the unhealthy subjects is 120.05. Hence the p-value for hypothesis (C) is 0.2228 and the p-value for hypothesis (B) is 1 − 0.2228 = 0.7772.
- In the case of odd sample sizes SAS does not delete the middle observation.
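The one-sided and two-sided p-values can be recovered directly from the reported Z-statistic. The following sketch (Python, with variable names of our choosing) reproduces the SAS value 0.2228:

```python
from statistics import NormalDist

phi = NormalDist().cdf   # standard normal CDF
z = -0.7627              # Z-statistic from the SAS output above
p_C = phi(z)             # one-sided: scale of status=0 higher, hypothesis (C)
p_B = 1 - phi(z)         # opposite one-sided alternative, hypothesis (B)
p_A = 2 * min(p_B, p_C)  # two-sided p-value, hypothesis (A)
print(round(p_C, 4))     # 0.2228, as printed by SAS
```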
R code
# Helper functions to find even or odd numbers
is.even <- function(x) x %% 2 == 0
is.odd <- function(x) x %% 2 == 1
# Create a sorted matrix with first column the blood
# pressure and second column the status
data<-blood_pressure[order(blood_pressure$mmhg),]
x<-c(data$mmhg)
x<-cbind(x,data$status)
# If the sample size is odd then remove the observation
# in the middle
if (is.odd(nrow(x))) x<-x[-c(nrow(x)/2+0.5),]
# Calculate the (remaining) sample size
n<-nrow(x)
# y returns the Siegel–Tukey scores
y<-rep(0,times=n)
# Assigning the scores
for (i in 1:n) {
if (1<i & i <= n/2 & is.even(i))
{
y[i]<-2*i
}
else if (n/2<i & i<=n & is.even(i))
{
y[i]<-2*(n-i)+2
}
else if (1<=i & i <=n/2 & is.odd(i))
{
y[i]<-2*i-1
}
else if (n/2<i & i < n & is.odd(i))
{
y[i]<-2*(n-i)+1
}
}
# Now mean scores must be created if necessary
t<-tapply(y,x[,1],mean) # Get mean scores for tied values
v<-strsplit(names(t), " ") # Get mmhg values
# r holds the (mean) Siegel-Tukey rank of each observation
r<-rep(0,times=n)
# Assign ranks and mean ranks to r
for (i in seq(along=r))
{
for (j in seq(along=v))
{
if (x[i,1]==as.numeric(v[[j]])) r[i]=t[j]
}
}
# Now calculate the test statistics S_0 (status 0)
# and S_1 (status 1) for both samples
S_0<-0
S_1<-0
for (i in seq(along=r)) {
if(x[i,2]==0) S_0=S_0+r[i]
if(x[i,2]==1) S_1=S_1+r[i]
}
# Calculate sample sizes for status=0 and status=1
n1<-sum(x[,2]==0)
n2<-sum(x[,2]==1)
# Choose the test statistic which belongs to the smallest
# sample size
if (n1<=n2) {
# Choose the smaller |z| value (continuity correction)
z1<-(2*S_0-n1*(n+1)+1)/sqrt((n1*n2*(n+1)/3))
z2<-(2*S_0-n1*(n+1)-1)/sqrt((n1*n2*(n+1)/3))
if (abs(z1)<=abs(z2)) z=z1 else z=z2
pvalue_B=1-pnorm(z)
pvalue_C=pnorm(z)
}
if (n1>n2) {
# Choose the smaller |z| value (continuity correction)
z1<-(2*S_1-n2*(n+1)+1)/sqrt((n1*n2*(n+1)/3))
z2<-(2*S_1-n2*(n+1)-1)/sqrt((n1*n2*(n+1)/3))
if (abs(z1)<=abs(z2)) z=z1 else z=z2
pvalue_B=pnorm(z)
pvalue_C=1-pnorm(z)
}
pvalue_A=2*min(pnorm(z),1-pnorm(z))
# Output results
print("Siegel–Tukey test")
n
S_0
S_1
z
pvalue_A
pvalue_B
pvalue_C
R output
[1] "Siegel–Tukey test"
> n
[1] 54
> S_0
[1] 600.5
> S_1
[1] 884.5
> z
[1] -1.027058
> pvalue_A
[1] 0.3043931
> pvalue_B
[1] 0.8478035
> pvalue_C
[1] 0.1521965
Remarks:
- There is no basic R function to calculate this test directly.
- In this implementation of the test the observation in the middle of the sorted sample is removed. This differs from SAS, and therefore the calculated values of the test statistic are not the same.
- In the case of ties, as in the above sample, the ranks must be constructed in two passes. First the ranks are assigned in the ordered combined sample; afterwards the means of the ranks of the tied observations are calculated.
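The two-pass mid-rank construction for ties can be sketched as follows (Python used for illustration; the function name is ours):

```python
from collections import defaultdict

def midrank_scores(values, scores):
    # values: ordered combined sample (may contain ties)
    # scores: Siegel-Tukey ranks assigned by position in pass one
    # Pass two: replace each score by the mean score of its tie group.
    groups = defaultdict(list)
    for v, s in zip(values, scores):
        groups[v].append(s)
    means = {v: sum(s) / len(s) for v, s in groups.items()}
    return [means[v] for v in values]
```

For example, `midrank_scores([1, 2, 2, 3], [1, 4, 5, 8])` averages the scores 4 and 5 of the tied value 2 and returns `[1.0, 4.5, 4.5, 8.0]`.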
9.1.2 Ansari–Bradley test
Description: |
Tests if the scale (variance) of two independent populations is the same. |
Assumptions: |
- Data are measured at least on an ordinal scale.
- Samples $X_1,\ldots,X_{n_1}$ and $Y_1,\ldots,Y_{n_2}$ are independently drawn from the two populations, $N = n_1 + n_2$.
- The random variables $X$ and $Y$ are independent with continuous distribution functions $F_1$ and $F_2$, scale parameters $\theta_1$ and $\theta_2$, and medians $M_1$ and $M_2$. It holds that $M_1 = M_2$.
- $F_1$ and $F_2$ belong to the same family of distribution functions with possible differences in scale and location. Under the assumption of equal medians, the hypothesis $F_1 = F_2$ reduces to $\theta_1 = \theta_2$.
|
Hypotheses: |
(A) $H_0: \theta_1 = \theta_2$ vs $H_1: \theta_1 \neq \theta_2$ |
|
(B) $H_0: \theta_1 \geq \theta_2$ vs $H_1: \theta_1 < \theta_2$ |
|
(C) $H_0: \theta_1 \leq \theta_2$ vs $H_1: \theta_1 > \theta_2$ |
Test statistic: |
For $N = n_1 + n_2$ the rank $\min(i, N + 1 - i)$ is assigned to the $i$th observation of the combined and ordered sample, and the test statistic is given by: |
|
$AB =$ sum of the ranks of the $X$-observations in the combined sample. |
Test decision: |
Reject $H_0$ if for the observed value $ab$ of $AB$ |
|
(A) $ab \leq ab_{\alpha_1}$ or $ab \geq ab_{1-\alpha_2}$ with $\alpha_1 + \alpha_2 = \alpha$ |
|
(B) $ab \geq ab_{1-\alpha}$ |
|
(C) $ab \leq ab_{\alpha}$ |
p-value: |
(A) $p = 2 \min(P(AB \leq ab), P(AB \geq ab))$ |
|
(B) $p = P(AB \geq ab)$ |
|
(C) $p = P(AB \leq ab)$ |
Annotations: |
- For the calculation of the test statistic, first combine both samples and rank the combined sample from the lowest to the highest values according to the above ranking scheme. This means that for even sample size $N$ the series of ranks is $1, 2, \ldots, N/2, N/2, \ldots, 2, 1$ and for odd sample size it is $1, 2, \ldots, (N-1)/2, (N+1)/2, (N-1)/2, \ldots, 2, 1$ (Ansari and Bradley 1960). The distribution with the larger scale will have the lower sum of ranks because the lower ranks are at both ends of the combined sample.
- Here, $ab_{\alpha}$ denotes the $\alpha$-quantile of the null distribution of the Ansari–Bradley statistic calculated for the sample with the smaller sample size; tables are given in Ansari and Bradley (1960) as well as in Hollander and Wolfe (1999, Table A.8). In general, the test can alternatively be set up by using the sum of ranks of the sample with the larger sample size as the test statistic.
- In the case of tied observations mean ranks are used.
- For large sample sizes the standardized test statistic $z = (AB - E(AB))/\sqrt{Var(AB)}$ is asymptotically standard normally distributed. If no ties are present and $N$ is even, then $E(AB) = n_1(N+2)/4$ and $Var(AB) = n_1 n_2 (N+2)(N-2)/(48(N-1))$. If no ties are present and $N$ is odd, then $E(AB) = n_1(N+1)^2/(4N)$ and $Var(AB) = n_1 n_2 (N+1)(3+N^2)/(48N^2)$. In the case of ties the expectation is the same, but the variance is somewhat different. Let $g$ be the number of tied groups, $t_j$ the number of tied observations in group $j$, and $\bar{r}_j$ the average rank in group $j$. If $N$ is even, then $Var(AB) = n_1 n_2 [16 \sum_{j=1}^{g} t_j \bar{r}_j^2 - N(N+2)^2] / (16N(N-1))$. If $N$ is odd, then $Var(AB) = n_1 n_2 [16N \sum_{j=1}^{g} t_j \bar{r}_j^2 - (N+1)^4] / (16N^2(N-1))$ (Hollander and Wolfe 1999, p. 145).
|
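The Ansari–Bradley scores and the no-ties moments quoted above can be sketched as follows; Python is used for illustration and both function names are ours:

```python
def ansari_scores(N):
    # Score of position i in the ordered combined sample:
    # 1, 2, ... rising to the middle and falling back to 1.
    return [min(i, N + 1 - i) for i in range(1, N + 1)]

def ab_moments(n1, n2):
    # Null mean and variance of the AB rank sum of the first
    # sample, assuming no ties, as in the annotation above.
    N = n1 + n2
    if N % 2 == 0:
        e = n1 * (N + 2) / 4
        v = n1 * n2 * (N + 2) * (N - 2) / (48 * (N - 1))
    else:
        e = n1 * (N + 1) ** 2 / (4 * N)
        v = n1 * n2 * (N + 1) * (3 + N ** 2) / (48 * N ** 2)
    return e, v
```

A quick sanity check: the mean must equal $n_1/N$ times the total of all scores, since each subset of positions is equally likely under $H_0$.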
Example
To test the hypothesis that the dispersion of the systolic blood pressure in the two populations of healthy subjects (status=0) and subjects with hypertension (status=1) is the same. The dataset contains 25 observations for status=0 and 30 observations for status=1 (dataset in Table A.1).
R code
x<-blood_pressure$mmhg[blood_pressure$status==0]
y<-blood_pressure$mmhg[blood_pressure$status==1]
ansari.test(x,y,exact=NULL,alternative="two.sided")
R output
Ansari–Bradley test
data: x and y
AB = 334, p-value = 0.4489
alternative hypothesis: true ratio of scales is not
equal to 1
Remarks:
- exact=value is optional. If value is not specified (or NULL), an exact p-value is computed if both samples contain less than 50 values; exact=TRUE forces the exact test and exact=FALSE the normal approximation. In the case of ties R cannot compute an exact test.
- R tests equivalent hypotheses of the type $H_0: \theta_1/\theta_2 = 1$ vs $H_1: \theta_1/\theta_2 \neq 1$ for hypothesis (A), and so on.
- alternative="value" is optional and defines the type of alternative hypothesis: "two.sided"=true ratio of scales is not equal to 1 (A); "greater"=true ratio of scales is greater than 1 (C); "less"=true ratio of scales is less than 1 (B). Default is "two.sided".
9.1.3 Mood test
Description: |
Tests if the scale (variance) of two independent populations is the same. |
Assumptions: |
- Data are measured at least on an ordinal scale.
- Samples $X_1,\ldots,X_{n_1}$ and $Y_1,\ldots,Y_{n_2}$ are independently drawn from the two populations, $N = n_1 + n_2$.
- The random variables $X$ and $Y$ are independent with continuous distribution functions $F_1$ and $F_2$, scale parameters $\theta_1$ and $\theta_2$, and medians $M_1$ and $M_2$. It holds that $M_1 = M_2$.
- $F_1$ and $F_2$ belong to the same family of distribution functions with possible differences in scale and location. Under the assumption of equal medians, the hypothesis $F_1 = F_2$ reduces to $\theta_1 = \theta_2$.
|
Hypotheses: |
(A) $H_0: \theta_1 = \theta_2$ vs $H_1: \theta_1 \neq \theta_2$ |
|
(B) $H_0: \theta_1 \geq \theta_2$ vs $H_1: \theta_1 < \theta_2$ |
|
(C) $H_0: \theta_1 \leq \theta_2$ vs $H_1: \theta_1 > \theta_2$ |
Test statistic: |
For $N = n_1 + n_2$ the test statistic is given by: |
|
$M = \sum_{i=1}^{n_1} \left( R(X_i) - \frac{N+1}{2} \right)^2$ |
|
where $R(X_i)$ is the rank of the $i$th $X$-observation in the combined sample |
Test decision: |
Reject $H_0$ if for the observed value $m$ of $M$ |
|
(A) $m \leq m_{\alpha_1}$ or $m \geq m_{1-\alpha_2}$ with $\alpha_1 + \alpha_2 = \alpha$ |
|
(B) $m \leq m_{\alpha}$ |
|
(C) $m \geq m_{1-\alpha}$ |
p-value: |
(A) $p = 2 \min(P(M \leq m), P(M \geq m))$ |
|
(B) $p = P(M \leq m)$ |
|
(C) $p = P(M \geq m)$ |
Annotations: |
- Tables with critical values can be found in Laubscher et al. (1968).
- For the calculation of the test statistic, first combine both samples and rank the combined sample from the lowest to the highest values. The above test statistic is the sum of the squared distances of the ranks of the $X$-observations from the median of all ranks, based on the assumption $n_1 \leq n_2$. The test can also be based on the ranks of the $Y$-observations in the combined sample. Usually the sum for the sample with the smaller sample size is used.
- In the case of tied observations mid ranks are used. However, tied observations only influence the test statistics if they are between the - and -observations.
- For large sample sizes the standardized test statistic $z = (M - E(M))/\sqrt{Var(M)}$ is asymptotically standard normally distributed with $E(M) = n_1(N^2-1)/12$ and $Var(M) = n_1 n_2 (N+1)(N^2-4)/180$ (Mood 1954).
|
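The statistic and its normal approximation can be sketched as follows (Python used for illustration; the function names are ours, and the ranking here assumes no ties, where mid-ranks would otherwise be needed):

```python
def mood_statistic(x, y):
    # M = sum over x-observations of (rank in combined sample - (N+1)/2)^2.
    # Assumes no ties: each value maps to a unique rank.
    combined = sorted(x + y)
    N = len(combined)
    ranks = {v: i + 1 for i, v in enumerate(combined)}
    return sum((ranks[v] - (N + 1) / 2) ** 2 for v in x)

def mood_z(m, n1, n2):
    # Standardize with E(M) = n1(N^2-1)/12 and
    # Var(M) = n1*n2*(N+1)*(N^2-4)/180, as in the annotation above.
    N = n1 + n2
    e = n1 * (N ** 2 - 1) / 12
    v = n1 * n2 * (N + 1) * (N ** 2 - 4) / 180
    return (m - e) / v ** 0.5
```

For instance, with $x = (1, 5)$ and $y = (2, 3, 4)$ the $X$-ranks 1 and 5 lie at the extremes, giving $M = (1-3)^2 + (5-3)^2 = 8$.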
Example
To test the hypothesis that the dispersion of the systolic blood pressure in the two populations of healthy subjects (status=0) and subjects with hypertension (status=1) is the same. The dataset contains 25 observations for status=0 and 30 observations for status=1 (dataset in Table A.1).
SAS code
proc npar1way data=blood_pressure correct=no mood;
var mmhg;
class status;
exact mood;
run;
R code
x<-blood_pressure$mmhg[blood_pressure$status==0]
y<-blood_pressure$mmhg[blood_pressure$status==1]
mood.test(x,y,alternative="two.sided")
R output
Mood two-sample test of scale
data: x and y
Z = 0.6765, p-value = 0.4987
alternative hypothesis: two.sided
Remarks:
- R handles ties differently from SAS. Instead of mid-ranks, a procedure by Mielke (1967) is used.
- alternative="value" is optional and defines the type of alternative hypothesis: "two.sided"=true ratio of scales is not equal to 1 (A); "greater"=true ratio of scales is greater than 1 (C); "less"=true ratio of scales is less than 1 (B). Default is "two.sided".
References
Ansari A.R. and Bradley R.A. 1960 Rank-sum tests for dispersion. Annals of Mathematical Statistics 31, 1174–1189.
Hollander M. and Wolfe D.A. 1999 Nonparametric Statistical Methods, 2nd edn. John Wiley & Sons, Ltd.
Laubscher N.F., Steffens F.E. and DeLange E.M. 1968 Exact critical values for Mood's distribution-free test statistic for dispersion and its normal approximation. Technometrics 10, 497–508.
Mielke P.W. 1967 Note on some squared rank tests with existing ties. Technometrics 9, 312–314.
Mood A.M. 1954 On the asymptotic efficiency of certain nonparametric two-sample tests. Annals of Mathematical Statistics 25, 514–522.
Siegel S. and Tukey J.W. 1960 A nonparametric sum of ranks procedure for relative spread in unpaired samples. Journal of the American Statistical Association 55, 429–445.