Chapter 3 Tests on the variance

This chapter contains statistical tests on the variance of normal populations. In the one-sample case the question is whether the variance of a single population differs from a pre-specified value, where the mean of the underlying Gaussian distribution may be known or unknown. SAS and R do not provide ready-to-use procedures or functions for the resulting $\chi^2$-tests. In the two-sample case, independent and dependent samples must be distinguished: an F-test is appropriate for independent samples and a t-test for dependent samples. The SAS procedure proc ttest calculates the F-test for the two-sided hypothesis only; we additionally show how to perform the test for the one-sided hypotheses. In R the function var.test calculates the F-test for all hypotheses. Neither SAS nor R offers a convenient way to calculate the t-test for dependent samples, so we provide code for it. For k-sample variance tests (Levene test, Bartlett test) please refer to Chapter 17, which covers ANOVA tests.

3.1 One-sample tests

This section deals with the question of whether the variance of a population differs from a predefined value.

3.1.1 $\chi^2$-test on the variance (mean known)

Description:	Tests if a population variance $\sigma^2$ differs from a specific value $\sigma_0^2$.
Assumptions:	Data are measured on an interval or ratio scale. Data are randomly sampled from a Gaussian distribution. The mean $\mu$ of the underlying Gaussian distribution is known.
Hypotheses:	(A) $H_0: \sigma^2 = \sigma_0^2$ vs $H_1: \sigma^2 \neq \sigma_0^2$
	(B) $H_0: \sigma^2 \leq \sigma_0^2$ vs $H_1: \sigma^2 > \sigma_0^2$
	(C) $H_0: \sigma^2 \geq \sigma_0^2$ vs $H_1: \sigma^2 < \sigma_0^2$
Test statistic:	$X^2 = \frac{1}{\sigma_0^2} \sum_{i=1}^{n} (X_i - \mu)^2$
Test decision:	Reject $H_0$ if for the observed value $X_0^2$ of $X^2$
	(A) $X_0^2 < \chi^2_{\alpha/2;n}$ or $X_0^2 > \chi^2_{1-\alpha/2;n}$
	(B) $X_0^2 > \chi^2_{1-\alpha;n}$
	(C) $X_0^2 < \chi^2_{\alpha;n}$
p-value:	(A) $p = 2 \min\left(P(X^2 \leq X_0^2),\, 1 - P(X^2 \leq X_0^2)\right)$
	(B) $p = 1 - P(X^2 \leq X_0^2)$
	(C) $p = P(X^2 \leq X_0^2)$
Annotations:	The test statistic $X^2$ is $\chi^2$-distributed with $n$ degrees of freedom. $\chi^2_{\alpha;n}$ is the $\alpha$-quantile of the $\chi^2$-distribution with $n$ degrees of freedom. The test is very sensitive to violations of the Gaussian assumption, especially if the sample size is small [see Sheskin (2007) for details].

Example

We test the hypothesis that the variance of the blood pressure in a certain population equals 400 (i.e., the standard deviation is 20), given a known mean of 130 mmHg. The dataset contains 55 patients (dataset in Table A.1).

SAS code

*Calculate squared sum;
data chi01;
 set blood_pressure;
 mean0=130;     * Set the known mean;
 square_diff=(mmhg-mean0)**2;
run;
proc summary;
 var square_diff;
 output out=chi02 sum=sum_square_diff;
run;
* Calculate test-statistic and p-values;
data chi03;
 set chi02;
 format p_value_A p_value_B p_value_C pvalue.;
 df=_FREQ_;
 sigma0=20;    * Set std under the null hypothesis;
 chisq=sum_square_diff/(sigma0**2);
 * p-value for hypothesis (A);
 p_value_A=2*min(probchi(chisq,df),1-probchi(chisq,df));
 * p-value for hypothesis (B);
 p_value_B=1-probchi(chisq,df);
 * p-value for hypothesis (C);
 p_value_C=probchi(chisq,df);
run;
* Output results;
proc print;
 var chisq df p_value_A p_value_B p_value_c;
run;

SAS output

chisq   df   p_value_A   p_value_B   p_value_C
49.595  55    0.6390      0.6805      0.3195

Remarks:

There is no SAS procedure to calculate this $\chi^2$-test directly.

R code

mean0<-130 # Set known mean
sigma0<-20 # Set std under the null hypothesis
# Calculate squared sum
sum_squared_diff<-sum((blood_pressure$mmhg-mean0)^2)
# Calculate test-statistic and p-values
df<-length(blood_pressure$mmhg)
chisq<-sum_squared_diff/(sigma0^2)
# p-value for hypothesis (A)
p_value_A=2*min(pchisq(chisq,df),1-pchisq(chisq,df))
# p-value for hypothesis (B)
p_value_B=1-pchisq(chisq,df)
# p-value for hypothesis (C)
p_value_C=pchisq(chisq,df)
# Output results
chisq
df
p_value_A
p_value_B
p_value_C

R output

> chisq
[1] 49.595
> df
[1] 55
> p_value_A
[1] 0.6389885
> p_value_B
[1] 0.6805057
> p_value_C
[1] 0.3194943

Remarks:

There is no basic R function to calculate this $\chi^2$-test directly.
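Since the test has to be coded by hand, the steps of the R listing can be wrapped in a small helper. The following is only a sketch: the function name `chisq_var_known_mean`, its arguments, and the simulated data are illustrative assumptions and not part of base R or of the blood pressure dataset. Besides the p-value it also evaluates the rejection rule with quantiles from `qchisq`.

```r
# Sketch: chi-square test on the variance with known mean.
# `chisq_var_known_mean` is a hypothetical helper, not a base R function.
chisq_var_known_mean <- function(x, mu, sigma0,
                                 alternative = c("two.sided", "greater", "less"),
                                 alpha = 0.05) {
  alternative <- match.arg(alternative)
  df <- length(x)                      # n degrees of freedom (mean is known)
  stat <- sum((x - mu)^2) / sigma0^2   # test statistic
  cdf <- pchisq(stat, df)
  # p-values for hypotheses (A), (B), (C)
  p_value <- switch(alternative,
                    two.sided = 2 * min(cdf, 1 - cdf),
                    greater   = 1 - cdf,
                    less      = cdf)
  # Rejection rule based on chi-square quantiles
  reject <- switch(alternative,
                   two.sided = stat < qchisq(alpha / 2, df) ||
                     stat > qchisq(1 - alpha / 2, df),
                   greater   = stat > qchisq(1 - alpha, df),
                   less      = stat < qchisq(alpha, df))
  list(statistic = stat, df = df, p_value = p_value, reject = reject)
}

# Simulated data (an assumption; not the data of Table A.1)
set.seed(1)
x <- rnorm(55, mean = 130, sd = 20)
res <- chisq_var_known_mean(x, mu = 130, sigma0 = 20)
res
```

For a continuous test statistic the quantile rule and the comparison of the p-value with `alpha` lead to the same decision, so `reject` is returned mainly for illustration.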

3.1.2 $\chi^2$-test on the variance (mean unknown)

Description:	Tests if a population variance $\sigma^2$ differs from a specific value $\sigma_0^2$.
Assumptions:	Data are measured on an interval or ratio scale. Data are randomly sampled from a Gaussian distribution. The mean $\mu$ of the underlying Gaussian distribution is unknown.
Hypotheses:	(A) $H_0: \sigma^2 = \sigma_0^2$ vs $H_1: \sigma^2 \neq \sigma_0^2$
	(B) $H_0: \sigma^2 \leq \sigma_0^2$ vs $H_1: \sigma^2 > \sigma_0^2$
	(C) $H_0: \sigma^2 \geq \sigma_0^2$ vs $H_1: \sigma^2 < \sigma_0^2$
Test statistic:	$X^2 = \frac{(n-1) S^2}{\sigma_0^2}$ with $S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$
Test decision:	Reject $H_0$ if for the observed value $X_0^2$ of $X^2$
	(A) $X_0^2 < \chi^2_{\alpha/2;n-1}$ or $X_0^2 > \chi^2_{1-\alpha/2;n-1}$
	(B) $X_0^2 > \chi^2_{1-\alpha;n-1}$
	(C) $X_0^2 < \chi^2_{\alpha;n-1}$
p-value:	(A) $p = 2 \min\left(P(X^2 \leq X_0^2),\, 1 - P(X^2 \leq X_0^2)\right)$
	(B) $p = 1 - P(X^2 \leq X_0^2)$
	(C) $p = P(X^2 \leq X_0^2)$
Annotations:	The test statistic $X^2$ is $\chi^2$-distributed with $n-1$ degrees of freedom. $\chi^2_{\alpha;n-1}$ is the $\alpha$-quantile of the $\chi^2$-distribution with $n-1$ degrees of freedom. The test is very sensitive to violations of the Gaussian assumption, especially if the sample size is small (Sheskin 2007).

Example

We test the hypothesis that the variance of the blood pressure in a certain population equals 400 (i.e., the standard deviation is 20), with the mean unknown. The dataset contains 55 patients (dataset in Table A.1).

SAS code

* Calculate sample std and sample size;
proc means data=blood_pressure std;
 var mmhg;
 output out=chi01 std=std_sample n=n_total;
run;
* Calculate test-statistic and p-values;
data chi02;
 set chi01;
 format p_value_A p_value_B p_value_C pvalue.;
 df=n_total-1;
 sigma0=20;    * Set std under the null hypothesis;
 chisq=(df*(std_sample**2))/(sigma0**2);
 * p-value for hypothesis (A);
 p_value_A=2*min(probchi(chisq,df),1-probchi(chisq,df));
 * p-value for hypothesis (B);
 p_value_B=1-probchi(chisq,df);
 * p-value for hypothesis (C);
 p_value_C=probchi(chisq,df);
run;
* Output results;
proc print;
 var chisq df p_value_A p_value_B p_value_c;
run;

SAS output

chisq   df   p_value_A   p_value_B   p_value_C
49.595  54    0.71039     0.64480     0.35520

Remarks:

There is no SAS procedure to calculate this $\chi^2$-test directly.

R code

# Calculate sample std and sample size;
std_sample<-sd(blood_pressure$mmhg)
n<-length(blood_pressure$mmhg)
# Set std under the null hypothesis
sigma0<-20
# Calculate test-statistic and p-values;
df=n-1
chisq<-(df*std_sample^2)/(sigma0^2)
# p-value for hypothesis (A)
p_value_A=2*min(pchisq(chisq,df),1-pchisq(chisq,df))
# p-value for hypothesis (B)
p_value_B=1-pchisq(chisq,df)
# p-value for hypothesis (C)
p_value_C=pchisq(chisq,df)
# Output results
chisq
df
p_value_A
p_value_B
p_value_C

R output

> chisq
[1] 49.595
> df
[1] 54
> p_value_A
[1] 0.7103942
> p_value_B
[1] 0.6448029
> p_value_C
[1] 0.3551971

Remarks:

There is no basic R function to calculate this $\chi^2$-test directly.
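The annotation for this test states that it is very sensitive to violations of the Gaussian assumption. A small simulation can make this visible. The sketch below is an illustration under assumed settings (sample size 30, 2000 replications, a rescaled t-distribution with 5 degrees of freedom as the heavy-tailed alternative); none of these choices come from the text. In both scenarios the true variance equals the hypothesized one, so every rejection is a type I error.

```r
# Sketch: empirical type I error of the chi-square variance test
# (mean unknown) under normal and heavy-tailed data.
# All simulation settings are illustrative assumptions.
set.seed(123)
n <- 30         # sample size
n_sim <- 2000   # number of simulated samples
alpha <- 0.05   # nominal significance level
sigma0 <- 1     # std under the null hypothesis

# Two-sided p-value of the chi-square test with unknown mean
p_two_sided <- function(x, sigma0) {
  df <- length(x) - 1
  cdf <- pchisq(df * var(x) / sigma0^2, df)
  2 * min(cdf, 1 - cdf)
}

# Normal data with variance 1: the null hypothesis is true
p_normal <- replicate(n_sim, p_two_sided(rnorm(n), sigma0))
# t-distributed data (5 df) rescaled to variance 1: the null
# hypothesis is still true, but the tails are heavier
p_heavy <- replicate(n_sim, p_two_sided(rt(n, df = 5) / sqrt(5 / 3), sigma0))

rate_normal <- mean(p_normal < alpha)
rate_heavy <- mean(p_heavy < alpha)
c(normal = rate_normal, heavy_tailed = rate_heavy)
```

Under these assumptions the rejection rate for normal data should stay close to the nominal level, while the heavy-tailed data should be rejected noticeably more often although the variance is correct.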

3.2 Two-sample tests

This section covers two-sample tests, which enable us to test if the variances of two populations differ from each other.

3.2.1 Two-sample $F$-test on variances of two populations

Description:	Tests if two population variances $\sigma_1^2$ and $\sigma_2^2$ differ from each other.
Assumptions:	Data are measured on an interval or ratio scale. Data are randomly sampled from two independent Gaussian distributions with standard deviations $\sigma_1$ and $\sigma_2$.
Hypotheses:	(A) $H_0: \sigma_1^2 = \sigma_2^2$ vs $H_1: \sigma_1^2 \neq \sigma_2^2$
	(B) $H_0: \sigma_1^2 \leq \sigma_2^2$ vs $H_1: \sigma_1^2 > \sigma_2^2$
	(C) $H_0: \sigma_1^2 \geq \sigma_2^2$ vs $H_1: \sigma_1^2 < \sigma_2^2$
Test statistic:	$F = S_1^2 / S_2^2$ with $S_j^2 = \frac{1}{n_j-1} \sum_{i=1}^{n_j} (X_{ji} - \bar{X}_j)^2$, $j = 1, 2$
Test decision:	Reject $H_0$ if for the observed value $F_0$ of $F$
	(A) $F_0 < f_{\alpha/2;n_1-1,n_2-1}$ or $F_0 > f_{1-\alpha/2;n_1-1,n_2-1}$
	(B) $F_0 > f_{1-\alpha;n_1-1,n_2-1}$
	(C) $F_0 < f_{\alpha;n_1-1,n_2-1}$
p-value:	(A) $p = 2 \min\left(P(F \leq F_0),\, 1 - P(F \leq F_0)\right)$
	(B) $p = 1 - P(F \leq F_0)$
	(C) $p = P(F \leq F_0)$
Annotations:	The test statistic $F$ is $F_{n_1-1,n_2-1}$-distributed. $f_{\alpha;n_1-1,n_2-1}$ is the $\alpha$-quantile of the F-distribution with $n_1-1$ and $n_2-1$ degrees of freedom. The test is very sensitive to violations of the Gaussian assumption.

Example

We test the hypothesis that the variances of the systolic blood pressure of healthy subjects (status=0) and subjects with hypertension (status=1) are equal. The dataset contains $n_1 = 25$ subjects with status 0 and $n_2 = 30$ with status 1 (dataset in Table A.1).

SAS code

*** Variant 1 ***;
* Only for hypothesis (A);
proc ttest data=blood_pressure h0=0 sides=2;
 class status;
 var mmhg;
run;
*** Variant 2 ***;
* For hypotheses (A),(B), and (C);
* Calculate the two standard deviations and;
* sample size;
proc means data=blood_pressure std;
 var mmhg;
 by status;
 output out=ftest01 std=stdvalue n=n_total;
run;
* Output the std in two different datasets;
data ftest02 ftest03;
 set ftest01;
 if status=0 then output ftest02;
 if status=1 then output ftest03;
run;
* Rename std and sample size of the subjects with;
* status=0;
data ftest02;
 set ftest02;
 rename stdvalue=std_status0
        n_total=n_status0;
run;
* Rename std and sample size of subjects with;
* status=1;
data ftest03;
 set ftest03;
 rename stdvalue=std_status1
        n_total=n_status1;
run;
* Calculate test statistic and p-values;
data ftest04;
 merge ftest02 ftest03;
 format p_value_A p_value_B p_value_C pvalue.;
* Calculate numerator and denominator of the;
* F-statistic;
 std_num=max(std_status0,std_status1);
 std_den=min(std_status0,std_status1);
* Calculate the appropriate degrees of freedom;
 if std_num=std_status0 then
   do;
    df_num=n_status0-1;
    df_den=n_status1-1;
   end;
  else
   do;
    df_num=n_status1-1;
    df_den=n_status0-1;
   end;
* Calculate the test-statistic;
 f=std_num**2/std_den**2;
* p-value for hypothesis (A);
 p_value_A=2*min(probf(f,df_num,df_den),
                       1-probf(f,df_num,df_den));
* p-value for hypothesis (B);
 p_value_B=1-probf(f,df_num,df_den);
* p-value for hypothesis (C);
 p_value_C=probf(f,df_num,df_den);
run;
* Output results;
proc print;
 var f df_num df_den p_value_A p_value_B p_value_C;
run;

SAS output

Variant 1
           Equality of Variances
 Method      Num DF    Den DF    F Value    Pr> F
Folded F       24        29       1.04      0.9180
Variant 2
  f    df_num  df_den  p_value_A  p_value_B p_value_C
1.03634  24      29     0.9180      0.4590    0.5410

Remarks:

Variant 1 calculates only the p-value for hypothesis (A), because proc ttest reports this test merely as additional information, using the folded test statistic $F = \max(s_1^2, s_2^2) / \min(s_1^2, s_2^2)$.
Variant 2 calculates p-values for all three hypotheses.
In some situations SAS calculates an erroneous p-value with variant 1. This occurs if the degrees of freedom of the numerator are greater than the degrees of freedom of the denominator and the test statistic $F$ lies between 1 and the median of the F-distribution. Details are given by Gallagher (2006). In this case, either use variant 2, or take the F-value that proc ttest provides and apply the formula of variant 2 for the two-sided p-value.

R code

status0<-blood_pressure$mmhg[blood_pressure$status==0]
status1<-blood_pressure$mmhg[blood_pressure$status==1]
var.test(status0,status1,alternative="two.sided")

R output

F = 1.0363, num df = 24, denom df = 29, p-value = 0.918

Remarks:

alternative="value" is optional and indicates the type of alternative hypothesis: "two.sided" (A); "greater" (B); "less" (C). Default is "two.sided".
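The one-sided calls can be sketched as follows, together with a manual calculation from the F-distribution that reproduces the values returned by `var.test`. The data are simulated and only illustrative (they are not the data of Table A.1); the variable names are assumptions.

```r
# Sketch: one-sided F-tests with var.test and a manual cross-check.
# Simulated data, used here only for illustration.
set.seed(42)
x <- rnorm(25, mean = 120, sd = 15)
y <- rnorm(30, mean = 150, sd = 15)

# Hypothesis (B): alternative is var(x) > var(y)
res_B <- var.test(x, y, alternative = "greater")
# Hypothesis (C): alternative is var(x) < var(y)
res_C <- var.test(x, y, alternative = "less")
# Hypothesis (A): two-sided (default)
res_A <- var.test(x, y)

# Manual calculation of the same quantities
f <- var(x) / var(y)       # ratio of the sample variances
df_num <- length(x) - 1    # numerator degrees of freedom
df_den <- length(y) - 1    # denominator degrees of freedom
p_B <- 1 - pf(f, df_num, df_den)
p_C <- pf(f, df_num, df_den)
p_A <- 2 * min(p_B, p_C)

c(F = f, p_A = p_A, p_B = p_B, p_C = p_C)
```

Note that `var.test` always puts the variance of its first argument in the numerator, so the direction of "greater" and "less" depends on the order in which the two samples are passed.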

3.2.2 $t$-test on variances of two dependent populations

Description:	Tests if two population variances $\sigma_X^2$ and $\sigma_Y^2$ differ from each other.
Assumptions:	Data are measured on an interval or ratio scale and are randomly sampled in pairs $(x_1, y_1), \ldots, (x_n, y_n)$. $X$ follows a Gaussian distribution with mean $\mu_X$ and variance $\sigma_X^2$. $Y$ follows a Gaussian distribution with mean $\mu_Y$ and variance $\sigma_Y^2$.
Hypotheses:	(A) $H_0: \sigma_X^2 = \sigma_Y^2$ vs $H_1: \sigma_X^2 \neq \sigma_Y^2$
	(B) $H_0: \sigma_X^2 \leq \sigma_Y^2$ vs $H_1: \sigma_X^2 > \sigma_Y^2$
	(C) $H_0: \sigma_X^2 \geq \sigma_Y^2$ vs $H_1: \sigma_X^2 < \sigma_Y^2$
Test statistic:	$c03-math-0111$

equation

Test decision:	Reject $H_0$ if for the observed value $t$ of $T$
	(A) $t < t_{\alpha/2;n-2}$ or $t > t_{1-\alpha/2;n-2}$
	(B) $t > t_{1-\alpha;n-2}$
	(C) $t < t_{\alpha;n-2}$
p-value:	(A) $p = 2 P(T \leq -|t|)$
	(B) $p = 1 - P(T \leq t)$
	(C) $p = P(T \leq t)$
Annotations:	The test statistic $T$ is t-distributed with $n-2$ degrees of freedom. $t_{\alpha;n-2}$ is the $\alpha$-quantile of the t-distribution with $n-2$ degrees of freedom. This test is very sensitive to violations of the Gaussian assumption (Sheskin 2007, pp. 754–755). Here, $r$ denotes the correlation coefficient between $X$ and $Y$.

Example

We test the hypothesis that the variance of the intelligence quotient is the same before training (IQ1) and after training (IQ2). The dataset contains 20 subjects (dataset in Table A.2).

SAS code

* Calculate sample standard deviations;
* and sample size;
proc means data=iq std;
 var iq1;
 output out=std1 std=std1 n=n_total;
run;
proc means data=iq std;
 var iq2;
 output out=std2 std=std2 n=n_total;
run;
data ttest01;
 merge std1 std2;
run;
* Calculate correlation coefficient;
proc corr data=iq OUTP=corr01;
 var iq1 iq2;
run;
data corr02;
 set corr01;
 if _TYPE_='CORR' and _NAME_='IQ1';
 rename IQ2 = r;
 drop _TYPE_;
run;
data ttest02;
 merge ttest01 corr02;
run;
* Calculate test statistic and two-sided p-value;
data ttest03;
 set ttest02;
 format p_value pvalue.;
 df=n_total-2;
 t=((df**0.5)*(std1**2-std2**2))/
                      (4*(1-r**2)*(std1**2)*(std2**2));
 p_value=2*probt(-abs(t),df);
run;
* Output results;
proc print;
 var t df p_value;
run;

SAS output

     t        df   p_value
0.007821987   18   0.9938

Remarks:

There is no SAS procedure to calculate this test directly.
The one-sided p-value for hypothesis (B) can be calculated with p_value_B=1-probt(t,df) and the p-value for hypothesis (C) with p_value_C=probt(t,df).

R code

# Calculate sample standard deviations
# and sample size
std1=sd(iq$IQ1)
std2=sd(iq$IQ2)
n_total<-length(iq$IQ1)
# Calculate correlation coefficient
r<-cor(iq$IQ1,iq$IQ2)
# Calculate test statistic and two-sided p-value
df<-n_total-2;
t<-(sqrt(df)*(std1^2-std2^2))/(4*(1-r^2)*std1^2*std2^2)
p_value=2*pt(-abs(t),df)
# Output results
t
df
p_value

R output

> t
[1] 0.007821987
> df
[1] 18
> p_value
[1] 0.993845

Remarks:

There is no basic R function to calculate this test directly.
The one-sided p-value for hypothesis (B) can be calculated with p_value_B=1-pt(t,df) and the p-value for hypothesis (C) with p_value_C=pt(t,df).
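As a cross-check, the test can also be sketched in the form usually given for the Pitman-Morgan test of two dependent variances, $t = \sqrt{n-2}\,(s_1^2 - s_2^2) / \sqrt{4 (1-r^2) s_1^2 s_2^2}$, which is algebraically identical to a t-test on the correlation between the sums $x+y$ and the differences $x-y$. This form is an assumption brought in here: it places the whole denominator under a square root, so its value will generally differ from the `t` computed in the listing above, whose denominator is not rooted. The function name `pitman_morgan` and the simulated data are hypothetical and are not the IQ data of Table A.2.

```r
# Sketch: t-test on the variances of two dependent samples in the
# Pitman-Morgan form (denominator under a square root; an assumption).
# `pitman_morgan` is a hypothetical helper, not a base R function.
pitman_morgan <- function(x, y) {
  df <- length(x) - 2    # n - 2 degrees of freedom
  v1 <- var(x)           # sample variance of the first measurement
  v2 <- var(y)           # sample variance of the second measurement
  r <- cor(x, y)         # correlation between the paired measurements
  t <- sqrt(df) * (v1 - v2) / sqrt(4 * (1 - r^2) * v1 * v2)
  list(statistic = t, df = df,
       p_value_A = 2 * pt(-abs(t), df),  # hypothesis (A)
       p_value_B = 1 - pt(t, df),        # hypothesis (B)
       p_value_C = pt(t, df))            # hypothesis (C)
}

# Simulated paired data (an assumption, for illustration only)
set.seed(7)
iq1 <- rnorm(20, mean = 100, sd = 15)
iq2 <- 0.8 * iq1 + rnorm(20, mean = 25, sd = 8)
res <- pitman_morgan(iq1, iq2)

# Cross-check: correlation test between sums and differences
check <- cor.test(iq1 + iq2, iq1 - iq2)
c(t = res$statistic, t_check = unname(check$statistic))
```

The agreement with `cor.test` follows because the covariance of $x+y$ and $x-y$ equals $s_1^2 - s_2^2$, so equal variances correspond to zero correlation between sums and differences.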

References

Gallagher J. 2006 The F test for comparing two normal variances: correct and incorrect calculation of the two-sided p-value. Teaching Statistics 28, 58–60.

Sheskin D.J. 2007 Handbook of Parametric and Nonparametric Statistical Procedures. Chapman & Hall.
