Chapter 3

Tests on the variance

This chapter contains statistical tests on the variance of normal populations. In the one-sample case, the question of interest is whether the variance of a single population differs from a pre-specified value, where the mean of the underlying Gaussian distribution may be known or unknown. SAS and R do not provide ready-to-use procedures or functions for the resulting $\chi^2$-tests. In the two-sample case, one must distinguish between independent and dependent samples: for independent samples an F-test is appropriate, for dependent samples a t-test. The SAS procedure proc ttest calculates the F-test for the two-sided hypothesis only; we additionally show how the test can be performed for the one-sided hypotheses. In R, the function var.test calculates the test for all hypotheses. Neither SAS nor R offers a convenient way to calculate the t-test for dependent samples, so we provide code for it. For k-sample variance tests (Levene test, Bartlett test), please refer to Chapter 17, which covers ANOVA tests.

3.1 One-sample tests

This section deals with the question of whether the variance of a population differs from a predefined value.

3.1.1 $\chi^2$-test on the variance (mean known)

Description: Tests if a population variance $\sigma^2$ differs from a specific value $\sigma_0^2$.
Assumptions:
  • Data are measured on an interval or ratio scale.
  • Data are randomly sampled from a Gaussian distribution.
  • The mean $\mu$ of the underlying Gaussian distribution is known.
Hypotheses: (A) $H_0: \sigma^2 = \sigma_0^2$ vs $H_1: \sigma^2 \neq \sigma_0^2$
(B) $H_0: \sigma^2 \le \sigma_0^2$ vs $H_1: \sigma^2 > \sigma_0^2$
(C) $H_0: \sigma^2 \ge \sigma_0^2$ vs $H_1: \sigma^2 < \sigma_0^2$
Test statistic: $X^2 = \dfrac{1}{\sigma_0^2}\sum_{i=1}^{n}(X_i - \mu)^2$
Test decision: Reject $H_0$ if for the observed value $x^2$ of $X^2$
(A) $x^2 < \chi^2_{\alpha/2;n}$ or $x^2 > \chi^2_{1-\alpha/2;n}$
(B) $x^2 > \chi^2_{1-\alpha;n}$
(C) $x^2 < \chi^2_{\alpha;n}$
p-value: (A) $p = 2\min\left(P(X^2 \le x^2),\, 1 - P(X^2 \le x^2)\right)$
(B) $p = 1 - P(X^2 \le x^2)$
(C) $p = P(X^2 \le x^2)$
Annotations:
  • The test statistic $X^2$ is $\chi^2$-distributed with $n$ degrees of freedom.
  • $\chi^2_{\alpha;n}$ is the $\alpha$-quantile of the $\chi^2$-distribution with $n$ degrees of freedom.
  • The test is very sensitive to violations of the Gaussian assumption, especially if the sample size is small [see Sheskin (2007) for details].
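The computation behind the SAS and R code of this section can also be sketched outside both systems. The following Python code is a minimal, self-contained sketch (standard library only), assuming the data are available as a plain list of numbers; the function names `chi2_cdf` and `chisq_variance_test` are hypothetical and belong to no library. The $\chi^2$ CDF is evaluated through the regularized lower incomplete gamma function (series for small arguments, continued fraction otherwise, the classical Numerical Recipes scheme). Passing `mu=None` switches to the unknown-mean version of Section 3.1.2, which uses the sample mean and $n-1$ degrees of freedom.

```python
import math


def chi2_cdf(x: float, df: int) -> float:
    """P(X <= x) for a chi-square variable with df degrees of freedom,
    computed as the regularized lower incomplete gamma P(df/2, x/2)."""
    if x <= 0.0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    log_prefactor = a * math.log(z) - z - math.lgamma(a)
    if z < a + 1.0:
        # Series: P(a, z) = prefactor * sum_k z^k / (a (a+1) ... (a+k))
        term = total = 1.0 / a
        denom = a
        for _ in range(10_000):
            denom += 1.0
            term *= z / denom
            total += term
            if term < total * 1e-16:
                break
        return total * math.exp(log_prefactor)
    # Continued fraction (modified Lentz) for the upper tail Q(a, z)
    tiny = 1e-300
    b = z + 1.0 - a
    c, d = 1.0 / tiny, 1.0 / b
    h = d
    for i in range(1, 10_000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return 1.0 - math.exp(log_prefactor) * h


def chisq_variance_test(data, sigma0, mu=None) -> dict:
    """Chi-square test of H0: sigma^2 = sigma0^2 for Gaussian data.

    mu given   -> sum((x - mu)^2) / sigma0^2 with n degrees of freedom.
    mu is None -> (n - 1) * s^2 / sigma0^2 with n - 1 degrees of freedom.
    """
    n = len(data)
    if mu is None:
        center, df = sum(data) / n, n - 1
    else:
        center, df = mu, n
    stat = sum((x - center) ** 2 for x in data) / sigma0 ** 2
    cdf = chi2_cdf(stat, df)
    return {
        "chisq": stat,
        "df": df,
        "p_value_A": 2 * min(cdf, 1 - cdf),  # H1: sigma^2 != sigma0^2
        "p_value_B": 1 - cdf,                # H1: sigma^2 >  sigma0^2
        "p_value_C": cdf,                    # H1: sigma^2 <  sigma0^2
    }


# Hypothetical measurements (not the data of Table A.1)
sample = [128.0, 141.0, 119.0, 135.0, 150.0, 122.0, 131.0]
print(chisq_variance_test(sample, sigma0=20.0, mu=130.0))
print(chisq_variance_test(sample, sigma0=20.0))
```

As a rough check, `chi2_cdf(49.595, 55)` and `chi2_cdf(49.595, 54)` should agree, up to rounding, with the values of p_value_C reported in the two blood pressure examples of this chapter (0.3195 and 0.3552).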

Example
We test the hypothesis that the variance of the blood pressure in a certain population equals 400 (i.e., the standard deviation is 20), where the mean is known to be 130 mmHg. The dataset contains 55 patients (dataset in Table A.1).


SAS code
* Calculate squared deviations from the known mean;
data chi01;
 set blood_pressure;
 mean0=130;     * Set the known mean;
 square_diff=(mmhg-mean0)**2;
run;
* Sum the squared deviations;
proc summary data=chi01;
 var square_diff;
 output out=chi02 sum=sum_square_diff;
run;
* Calculate test-statistic and p-values;
data chi03;
 set chi02;
 format p_value_A p_value_B p_value_C pvalue.;
 df=_FREQ_;
 sigma0=20;    * Set std under the null hypothesis;
 chisq=sum_square_diff/(sigma0**2);
 * p-value for hypothesis (A);
 p_value_A=2*min(probchi(chisq,df),1-probchi(chisq,df));
 * p-value for hypothesis (B);
 p_value_B=1-probchi(chisq,df);
 * p-value for hypothesis (C);
 p_value_C=probchi(chisq,df);
run;
* Output results;
proc print;
 var chisq df p_value_A p_value_B p_value_C;
run;
SAS output
chisq   df   p_value_A   p_value_B   p_value_C
49.595  55    0.6390      0.6805      0.3195
Remarks:
  • There is no SAS procedure to calculate this $\chi^2$-test directly.


R code
mean0<-130 # Set known mean
sigma0<-20 # Set std under the null hypothesis
# Calculate sum of squared deviations from the known mean
sum_squared_diff<-sum((blood_pressure$mmhg-mean0)^2)
# Calculate test statistic and p-values
df<-length(blood_pressure$mmhg)
chisq<-sum_squared_diff/(sigma0^2)
# p-value for hypothesis (A)
p_value_A=2*min(pchisq(chisq,df),1-pchisq(chisq,df))
# p-value for hypothesis (B)
p_value_B=1-pchisq(chisq,df)
# p-value for hypothesis (C)
p_value_C=pchisq(chisq,df)
# Output results
chisq
df
p_value_A
p_value_B
p_value_C
R output
> chisq
[1] 49.595
> df
[1] 55
> p_value_A
[1] 0.6389885
> p_value_B
[1] 0.6805057
> p_value_C
[1] 0.3194943
Remarks:
  • There is no base R function to calculate this $\chi^2$-test directly.

3.1.2 $\chi^2$-test on the variance (mean unknown)

Description: Tests if a population variance $\sigma^2$ differs from a specific value $\sigma_0^2$.
Assumptions:
  • Data are measured on an interval or ratio scale.
  • Data are randomly sampled from a Gaussian distribution.
  • The mean $\mu$ of the underlying Gaussian distribution is unknown.
Hypotheses: (A) $H_0: \sigma^2 = \sigma_0^2$ vs $H_1: \sigma^2 \neq \sigma_0^2$
(B) $H_0: \sigma^2 \le \sigma_0^2$ vs $H_1: \sigma^2 > \sigma_0^2$
(C) $H_0: \sigma^2 \ge \sigma_0^2$ vs $H_1: \sigma^2 < \sigma_0^2$
Test statistic: $X^2 = \dfrac{(n-1)S^2}{\sigma_0^2}$, where $S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$ is the sample variance
Test decision: Reject $H_0$ if for the observed value $x^2$ of $X^2$
(A) $x^2 < \chi^2_{\alpha/2;n-1}$ or $x^2 > \chi^2_{1-\alpha/2;n-1}$
(B) $x^2 > \chi^2_{1-\alpha;n-1}$
(C) $x^2 < \chi^2_{\alpha;n-1}$
p-value: (A) $p = 2\min\left(P(X^2 \le x^2),\, 1 - P(X^2 \le x^2)\right)$
(B) $p = 1 - P(X^2 \le x^2)$
(C) $p = P(X^2 \le x^2)$
Annotations:
  • The test statistic $X^2$ is $\chi^2$-distributed with $n-1$ degrees of freedom.
  • $\chi^2_{\alpha;n-1}$ is the $\alpha$-quantile of the $\chi^2$-distribution with $n-1$ degrees of freedom.
  • The test is very sensitive to violations of the Gaussian assumption, especially if the sample size is small (Sheskin 2007).
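The relation between the statistics of Sections 3.1.1 and 3.1.2 follows from the usual decomposition of the sum of squares about the true mean $\mu$, shown here as a short worked identity. With $\bar{X}$ the sample mean and $S^2$ the sample variance, the cross term $2(\bar{X}-\mu)\sum_{i=1}^{n}(X_i-\bar{X})$ vanishes, so that

```latex
\sum_{i=1}^{n}(X_i-\mu)^2
  = \sum_{i=1}^{n}(X_i-\bar{X})^2 + n(\bar{X}-\mu)^2
  = (n-1)S^2 + n(\bar{X}-\mu)^2 .
```

Dividing by $\sigma_0^2$: under $H_0$ and Gaussian data, the left-hand side is $\chi^2$-distributed with $n$ degrees of freedom, while the two terms on the right are independent and $\chi^2$-distributed with $n-1$ and $1$ degrees of freedom. Replacing the known $\mu$ by $\bar{X}$ therefore drops the second term and costs one degree of freedom.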

Example
We test the hypothesis that the variance of the blood pressure in a certain population equals 400 (i.e., the standard deviation is 20), where the mean is unknown. The dataset contains 55 patients (dataset in Table A.1).


SAS code
* Calculate sample std and sample size;
proc means data=blood_pressure std;
 var mmhg;
 output out=chi01 std=std_sample n=n_total;
run;
* Calculate test-statistic and p-values;
data chi02;
 set chi01;
 format p_value_A p_value_B p_value_C pvalue.;
 df=n_total-1;
 sigma0=20;    * Set std under the null hypothesis;
 chisq=(df*(std_sample**2))/(sigma0**2);
 * p-value for hypothesis (A);
 p_value_A=2*min(probchi(chisq,df),1-probchi(chisq,df));
 * p-value for hypothesis (B);
 p_value_B=1-probchi(chisq,df);
 * p-value for hypothesis (C);
 p_value_C=probchi(chisq,df);
run;
* Output results;
proc print;
 var chisq df p_value_A p_value_B p_value_C;
run;
SAS output
chisq   df   p_value_A   p_value_B   p_value_C
49.595  54    0.71039     0.64480     0.35520
Remarks:
  • There is no SAS procedure to calculate this $\chi^2$-test directly.


R code
# Calculate sample std and sample size;
std_sample<-sd(blood_pressure$mmhg)
n<-length(blood_pressure$mmhg)
# Set std under the null hypothesis
sigma0<-20
# Calculate test-statistic and p-values;
df=n-1
chisq<-(df*std_sample^2)/(sigma0^2)
# p-value for hypothesis (A)
p_value_A=2*min(pchisq(chisq,df),1-pchisq(chisq,df))
# p-value for hypothesis (B)
p_value_B=1-pchisq(chisq,df)
# p-value for hypothesis (C)
p_value_C=pchisq(chisq,df)
# Output results
chisq
df
p_value_A
p_value_B
p_value_C
R output
> chisq
[1] 49.595
> df
[1] 54
> p_value_A
[1] 0.7103942
> p_value_B
[1] 0.6448029
> p_value_C
[1] 0.3551971
Remarks:
  • There is no base R function to calculate this $\chi^2$-test directly.

3.2 Two-sample tests

This section covers two-sample tests, which enable us to test if the variances of two populations differ from each other.

3.2.1 Two-sample $F$-test on variances of two populations

Description: Tests if two population variances $\sigma_1^2$ and $\sigma_2^2$ differ from each other.
Assumptions:
  • Data are measured on an interval or ratio scale.
  • Data are randomly sampled from two independent Gaussian distributions with standard deviations $\sigma_1$ and $\sigma_2$.
Hypotheses: (A) $H_0: \sigma_1^2 = \sigma_2^2$ vs $H_1: \sigma_1^2 \neq \sigma_2^2$
(B) $H_0: \sigma_1^2 \le \sigma_2^2$ vs $H_1: \sigma_1^2 > \sigma_2^2$
(C) $H_0: \sigma_1^2 \ge \sigma_2^2$ vs $H_1: \sigma_1^2 < \sigma_2^2$
Test statistic: $F = \dfrac{S_1^2}{S_2^2}$, where $S_1^2$ and $S_2^2$ are the sample variances of the two samples of sizes $n_1$ and $n_2$
Test decision: Reject $H_0$ if for the observed value $f$ of $F$
(A) $f < f_{\alpha/2;n_1-1,n_2-1}$ or $f > f_{1-\alpha/2;n_1-1,n_2-1}$
(B) $f > f_{1-\alpha;n_1-1,n_2-1}$
(C) $f < f_{\alpha;n_1-1,n_2-1}$
p-value: (A) $p = 2\min\left(P(F \le f),\, 1 - P(F \le f)\right)$
(B) $p = 1 - P(F \le f)$
(C) $p = P(F \le f)$
Annotations:
  • The test statistic $F$ is $F_{n_1-1,n_2-1}$-distributed.
  • $f_{\alpha;n_1-1,n_2-1}$ is the $\alpha$-quantile of the F-distribution with $n_1-1$ and $n_2-1$ degrees of freedom.
  • The test is very sensitive to violations of the Gaussian assumption.
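To make the computation behind var.test and the SAS code of this section explicit, here is a minimal Python sketch (standard library only). It evaluates the F CDF through the regularized incomplete beta function (continued fraction as in Numerical Recipes) and always places sample 1 in the numerator, so that hypotheses (B) and (C) keep their direction. All function names are hypothetical, and the sample values are invented for illustration.

```python
import math


def _betacf(a: float, b: float, x: float) -> float:
    """Continued fraction of the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (tiny if abs(d) < tiny else d)
    h = d
    for m in range(1, 10_000):
        m2 = 2 * m
        # Even step of the continued fraction
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 / (tiny if abs(1.0 + aa * d) < tiny else 1.0 + aa * d)
        c = 1.0 + aa / c
        c = tiny if abs(c) < tiny else c
        h *= d * c
        # Odd step of the continued fraction
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 / (tiny if abs(1.0 + aa * d) < tiny else 1.0 + aa * d)
        c = 1.0 + aa / c
        c = tiny if abs(c) < tiny else c
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return h


def betainc_reg(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_cdf(f: float, d1: int, d2: int) -> float:
    """P(F <= f) for an F-distribution with d1 and d2 degrees of freedom."""
    if f <= 0.0:
        return 0.0
    return betainc_reg(d1 / 2.0, d2 / 2.0, d1 * f / (d1 * f + d2))


def sample_variance(xs) -> float:
    """Unbiased sample variance (denominator n - 1)."""
    mean = sum(xs) / len(xs)
    return sum((v - mean) ** 2 for v in xs) / (len(xs) - 1)


def f_variance_test(x, y) -> dict:
    """F-test of H0: sigma_1^2 = sigma_2^2; x is sample 1, y is sample 2."""
    df_num, df_den = len(x) - 1, len(y) - 1
    f = sample_variance(x) / sample_variance(y)
    cdf = f_cdf(f, df_num, df_den)
    return {
        "f": f, "df_num": df_num, "df_den": df_den,
        "p_value_A": 2 * min(cdf, 1 - cdf),  # H1: sigma_1^2 != sigma_2^2
        "p_value_B": 1 - cdf,                # H1: sigma_1^2 >  sigma_2^2
        "p_value_C": cdf,                    # H1: sigma_1^2 <  sigma_2^2
    }


# Hypothetical samples (not the blood pressure data of Table A.1)
group1 = [4.1, 5.3, 6.0, 4.8, 5.5]
group2 = [3.9, 6.8, 5.1, 7.2, 4.0, 5.9]
print(f_variance_test(group1, group2))
```

For the blood pressure example, `1 - f_cdf(1.03634, 24, 29)` should reproduce the one-sided p-value of about 0.459, up to the rounding of the printed F value.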

Example
We test the hypothesis that the variances of the systolic blood pressure of healthy subjects (status=0) and subjects with hypertension (status=1) are equal. The dataset contains 25 subjects with status 0 and 30 subjects with status 1 (dataset in Table A.1).


SAS code
*** Variant 1 ***;
* Only for hypothesis (A);
proc ttest data=blood_pressure h0=0 sides=2;
 class status;
 var mmhg;
run;
*** Variant 2 ***;
* For hypotheses (A),(B), and (C);
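* Calculate the two standard deviations and;
* sample sizes (BY-group processing requires;
* the data to be sorted by status);
proc sort data=blood_pressure;
 by status;
run;
proc means data=blood_pressure std;
 var mmhg;
 by status;
 output out=ftest01 std=stdvalue n=n_total;
run;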
* Output the std in two different datasets;
data ftest02 ftest03;
 set ftest01;
 if status=0 then output ftest02;
 if status=1 then output ftest03;
run;
* Rename std and sample size of the subjects with;
* status=0;
data ftest02;
 set ftest02;
 rename stdvalue=std_status0
        n_total=n_status0;
run;
* Rename std and sample size of subjects with;
* status=1;
data ftest03;
 set ftest03;
 rename stdvalue=std_status1
        n_total=n_status1;
run;
* Calculate test statistic and p-values;
data ftest04;
 merge ftest02 ftest03;
 format p_value_A p_value_B p_value_C pvalue.;
* Sample 1 (numerator): status=0;
* Sample 2 (denominator): status=1;
* Keeping this order fixed gives hypotheses (B);
* and (C) their intended direction;
 df_num=n_status0-1;
 df_den=n_status1-1;
* Calculate the test statistic;
 f=std_status0**2/std_status1**2;
* p-value for hypothesis (A);
 p_value_A=2*min(probf(f,df_num,df_den),
                       1-probf(f,df_num,df_den));
* p-value for hypothesis (B);
 p_value_B=1-probf(f,df_num,df_den);
* p-value for hypothesis (C);
 p_value_C=probf(f,df_num,df_den);
run;
* Output results;
proc print;
 var f df_num df_den p_value_A p_value_B p_value_C;
run;
SAS output
Variant 1
           Equality of Variances
 Method      Num DF    Den DF    F Value    Pr> F
Folded F       24        29       1.04      0.9180
Variant 2
  f    df_num  df_den  p_value_A  p_value_B p_value_C
1.03634  24      29     0.9180      0.4590    0.5410
Remarks:
  • Variant 1 calculates only the p-value for hypothesis (A), because proc ttest reports the test on variances only as additional information, using the folded test statistic $F' = \max(S_1^2, S_2^2)/\min(S_1^2, S_2^2)$.
  • Variant 2 calculates p-values for all three hypotheses.
  • In some situations variant 1 yields an incorrect p-value. This occurs if the numerator degrees of freedom are greater than the denominator degrees of freedom and the test statistic $F'$ lies between 1 and the median of the F-distribution; details are given by Gallagher (2006). In this case, either use variant 2, or take the F value and the degrees of freedom reported by proc ttest and compute the two-sided p-value with the formula of variant 2.
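A short calculation shows when doubling an upper tail of the folded statistic goes wrong. Let $F'$ be the folded statistic (larger sample variance over smaller), $f'$ its observed value, $d_1$ and $d_2$ the degrees of freedom of the samples in the numerator and denominator, and $G(f') = P(F_{d_1,d_2} \le f')$. Compare the doubled upper tail with the min-based two-sided p-value:

```latex
p_{\mathrm{upper}} = 2\bigl(1 - G(f')\bigr),
\qquad
p = 2\min\bigl\{G(f'),\, 1 - G(f')\bigr\}.
```

If $f' \ge m$, where $m$ is the median of $F_{d_1,d_2}$, then $1 - G(f') \le 1/2 \le G(f')$ and both expressions coincide. If $1 \le f' < m$, which can only happen when $m > 1$, then $1 - G(f') > 1/2$, so $p_{\mathrm{upper}}$ exceeds 1, whereas the two-sided p-value is $2G(f')$. Since $P(F_{d_2,d_1} \le 1/f) = 1 - P(F_{d_1,d_2} \le f)$, the min-based formula gives the same value whichever sample variance is placed in the numerator.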


R code
status0<-blood_pressure$mmhg[blood_pressure$status==0]
status1<-blood_pressure$mmhg[blood_pressure$status==1]
var.test(status0,status1,alternative="two.sided")
R output
F = 1.0363, num df = 24, denom df = 29, p-value = 0.918
Remarks:
  • The argument alternative is optional and specifies the alternative hypothesis: "two.sided" (A), "greater" (B), or "less" (C). The default is "two.sided".

3.2.2 $t$-test on variances of two dependent populations

Description: Tests if two population variances $\sigma_1^2$ and $\sigma_2^2$ differ from each other.
Assumptions:
  • Data are measured on an interval or ratio scale and are randomly sampled in pairs $(X_1, Y_1), \ldots, (X_n, Y_n)$.
  • $X$ follows a Gaussian distribution with mean $\mu_1$ and variance $\sigma_1^2$. $Y$ follows a Gaussian distribution with mean $\mu_2$ and variance $\sigma_2^2$.
Hypotheses: (A) $H_0: \sigma_1^2 = \sigma_2^2$ vs $H_1: \sigma_1^2 \neq \sigma_2^2$
(B) $H_0: \sigma_1^2 \le \sigma_2^2$ vs $H_1: \sigma_1^2 > \sigma_2^2$
(C) $H_0: \sigma_1^2 \ge \sigma_2^2$ vs $H_1: \sigma_1^2 < \sigma_2^2$
Test statistic:

$T = \dfrac{(S_1^2 - S_2^2)\sqrt{n-2}}{\sqrt{4(1-r^2)\,S_1^2 S_2^2}}$, where $S_1^2$ and $S_2^2$ are the sample variances of $X_1, \ldots, X_n$ and $Y_1, \ldots, Y_n$.

Test decision: Reject $H_0$ if for the observed value $t$ of $T$
(A) $t < t_{\alpha/2;n-2}$ or $t > t_{1-\alpha/2;n-2}$
(B) $t > t_{1-\alpha;n-2}$
(C) $t < t_{\alpha;n-2}$
p-value: (A) $p = 2P(T \le -|t|)$
(B) $p = 1 - P(T \le t)$
(C) $p = P(T \le t)$
Annotations:
  • The test statistic $T$ is t-distributed with $n-2$ degrees of freedom.
  • $t_{\alpha;n-2}$ is the $\alpha$-quantile of the t-distribution with $n-2$ degrees of freedom.
  • This test is very sensitive to violations of the Gaussian assumption (Sheskin 2007, pp. 754–755).
  • Here, $r$ denotes the sample correlation coefficient between $X$ and $Y$.
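One way to see why the statistic $(S_1^2 - S_2^2)\sqrt{n-2}\,/\sqrt{4(1-r^2)S_1^2S_2^2}$ is t-distributed with $n-2$ degrees of freedom is the classical argument behind this test (often called the Pitman–Morgan test). Here $S_1^2$, $S_2^2$ are the sample variances and $r$ the sample correlation of the pairs. Form sums and differences of each pair; their covariance equals the difference of the variances, and the same identities hold for the sample quantities:

```latex
\begin{aligned}
U_i &= X_i + Y_i, \qquad V_i = X_i - Y_i, \qquad
  \operatorname{Cov}(U, V) = \sigma_1^2 - \sigma_2^2,\\
r_{UV} &= \frac{S_1^2 - S_2^2}{\sqrt{(S_1^2 + S_2^2)^2 - 4r^2 S_1^2 S_2^2}}, \qquad
1 - r_{UV}^2 = \frac{4(1 - r^2)\, S_1^2 S_2^2}{(S_1^2 + S_2^2)^2 - 4r^2 S_1^2 S_2^2},\\
\frac{r_{UV}\sqrt{n-2}}{\sqrt{1 - r_{UV}^2}}
  &= \frac{(S_1^2 - S_2^2)\sqrt{n-2}}{\sqrt{4(1 - r^2)\, S_1^2 S_2^2}} .
\end{aligned}
```

The left-hand side of the last line is the familiar t-statistic for testing zero correlation between $U$ and $V$. If the pairs are jointly Gaussian, it follows a t-distribution with $n-2$ degrees of freedom under $H_0: \sigma_1^2 = \sigma_2^2$. Because the denominator contains a square root, the statistic does not change when both samples are multiplied by the same constant.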

Example
We test the hypothesis that the variance of the intelligence quotient is the same before training (IQ1) and after training (IQ2). The dataset contains 20 subjects (dataset in Table A.2).


SAS code
* Calculate sample standard deviations;
* and sample size;
proc means data=iq std;
 var iq1;
 output out=std1 std=std1 n=n_total;
run;
proc means data=iq std;
 var iq2;
 output out=std2 std=std2 n=n_total;
run;
data ttest01;
 merge std1 std2;
run;
* Calculate correlation coefficient;
proc corr data=iq OUTP=corr01;
 var iq1 iq2;
run;
data corr02;
 set corr01;
 if _TYPE_='CORR' and _NAME_='IQ1';
 rename IQ2 = r;
 drop _TYPE_;
run;
data ttest02;
 merge ttest01 corr02;
run;
* Calculate test statistic and two-sided p-value;
data ttest03;
 set ttest02;
 format p_value pvalue.;
 df=n_total-2;
 t=((df**0.5)*(std1**2-std2**2))/
                      sqrt(4*(1-r**2)*(std1**2)*(std2**2));
 p_value=2*probt(-abs(t),df);
run;
* Output results;
proc print;
 var t df p_value;
run;
SAS output
     t        df   p_value
0.007821987   18   0.9938
Remarks:
  • There is no SAS procedure to calculate this test directly.
  • The one-sided p-value for hypothesis (B) can be calculated with p_value_B=1-probt(t,df) and the p-value for hypothesis (C) with p_value_C=probt(t,df).


R code
# Calculate sample standard deviations
# and sample size
std1=sd(iq$IQ1)
std2=sd(iq$IQ2)
n_total<-length(iq$IQ1)
# Calculate correlation coefficient
r<-cor(iq$IQ1,iq$IQ2)
# Calculate test statistic and two-sided p-value
df<-n_total-2;
t<-(sqrt(df)*(std1^2-std2^2))/sqrt(4*(1-r^2)*std1^2*std2^2)
p_value=2*pt(-abs(t),df)
# Output results
t
df
p_value
R output
> t
[1] 0.007821987
> df
[1] 18
> p_value
[1] 0.993845
Remarks:
  • There is no basic R function to calculate this test directly.
  • The one-sided p-value for hypothesis (B) can be calculated with p_value_B=1-pt(t,df) and the p-value for hypothesis (C) with p_value_C=pt(t,df).
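The statistic of this section can also be computed as the usual t-statistic for zero correlation between the sums $X+Y$ and the differences $X-Y$ of the pairs. The following Python sketch (standard library only) computes it both ways as a consistency check; the function names and the paired scores are hypothetical, not the data of Table A.2. The p-value would then come from the t-distribution with $n-2$ degrees of freedom, for example with probt in SAS or pt in R.

```python
import math


def _cov(xs, ys) -> float:
    """Sample covariance with denominator n - 1."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / (n - 1)


def pitman_morgan_t(x, y):
    """Statistic for H0: sigma_1^2 = sigma_2^2 with paired samples x, y.

    T = (s1^2 - s2^2) * sqrt(n - 2) / sqrt(4 * (1 - r^2) * s1^2 * s2^2),
    to be compared with a t-distribution with n - 2 degrees of freedom.
    """
    n = len(x)
    s1_sq, s2_sq = _cov(x, x), _cov(y, y)
    r = _cov(x, y) / math.sqrt(s1_sq * s2_sq)
    t = (s1_sq - s2_sq) * math.sqrt(n - 2) / math.sqrt(
        4.0 * (1.0 - r ** 2) * s1_sq * s2_sq)
    return t, n - 2


def t_from_sums_and_differences(x, y) -> float:
    """Same statistic, computed as the t-test for zero correlation
    between U = X + Y and V = X - Y."""
    u = [a + b for a, b in zip(x, y)]
    v = [a - b for a, b in zip(x, y)]
    r_uv = _cov(u, v) / math.sqrt(_cov(u, u) * _cov(v, v))
    return r_uv * math.sqrt(len(x) - 2) / math.sqrt(1.0 - r_uv ** 2)


# Hypothetical paired scores (not the data of Table A.2)
iq_before = [98, 105, 110, 92, 101, 120, 99, 87]
iq_after = [102, 103, 118, 95, 99, 125, 104, 90]
t_direct, df = pitman_morgan_t(iq_before, iq_after)
t_check = t_from_sums_and_differences(iq_before, iq_after)
print(t_direct, t_check, df)
```

Both routes should print the same value; the second one makes the link to the correlation t-test visible without any extra distribution code.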

References

Gallagher J. 2006 The F test for comparing two normal variances: correct and incorrect calculation of the two-sided p-value. Teaching Statistics 28, 58–60.

Sheskin D.J. 2007 Handbook of Parametric and Nonparametric Statistical Procedures. Chapman & Hall.
