
Chapter 4 Tests on proportions

In this chapter we present tests for the parameter of a binomial distribution. We first treat a test on the population proportion in the one-sample case. We further cover tests for the difference of two proportions using the pooled as well as the unpooled variances. The last test in this chapter deals with the equality of proportions for the multi-sample case. Not all tests are covered by a SAS procedure or R function. We give the appropriate sample code to perform all discussed tests.

4.1 One-sample tests

In this section we deal with the question of whether a population proportion differs from a predefined value between 0 and 1.

4.1.1 Binomial test

Description:	Tests if a population proportion $p$ differs from a value $p_0$.
Assumptions:	Data are randomly sampled from a large population with two possible outcomes. Let $X=1$ denote “success” and $X=0$ “failure”. The parameter of interest $p$ is the proportion of successes in the population. The number of successes $\sum_{i=1}^{n} X_i$ in a random sample of size $n$ follows a binomial distribution $B(n,p)$.
Hypothesis:	(A) $H_0: p = p_0$ vs $H_1: p \neq p_0$
	(B) $H_0: p \leq p_0$ vs $H_1: p > p_0$
	(C) $H_0: p \geq p_0$ vs $H_1: p < p_0$
Test statistic:	$Z = \dfrac{\sum_{i=1}^{n} X_i - n p_0}{\sqrt{n p_0 (1-p_0)}}$

Test decision:	Reject $H_0$ if for the observed value $z$ of $Z$
	(A) $z < z_{\alpha/2}$ or $z > z_{1-\alpha/2}$
	(B) $z > z_{1-\alpha}$
	(C) $z < z_{\alpha}$
p-value:	(A) $p = 2\Phi(-|z|)$
	(B) $p = 1 - \Phi(z)$
	(C) $p = \Phi(z)$
Annotation:	This is the large sample test. If the sample size is large [rule of thumb: $c04-math-0026$ ] the test statistic $Z$ approximately follows a standard normal distribution. For small samples an exact test with $Y = \sum_{i=1}^{n} X_i$ as test statistic and critical regions based on the binomial distribution is used.
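
As a rough illustration of the large sample test, the following R sketch computes the test statistic and the three p-values by hand, with and without continuity correction. The counts (15 successes in 40 trials) are the ones reported in the output of the example in this section; the helper name p_values is hypothetical and only serves this illustration.

```r
# Counts as reported in the example output (15 successes, 40 trials)
x  <- 15
n  <- 40
p0 <- 0.5

# Standard deviation of the number of successes under H0
se <- sqrt(n * p0 * (1 - p0))

# Asymptotic test statistic without continuity correction
z <- (x - n * p0) / se

# With continuity correction: shift the numerator by 0.5 towards zero
z_corr <- (x - n * p0 - sign(x - n * p0) * 0.5) / se

# Hypothetical helper returning the p-values for hypotheses (A), (B), (C)
p_values <- function(z) {
  c(A = 2 * pnorm(-abs(z)),  # two-sided
    B = 1 - pnorm(z),        # H1: p > p0
    C = pnorm(z))            # H1: p < p0
}

round(p_values(z), 5)
round(p_values(z_corr), 5)
```

Under these assumptions the values should agree with the rows "Asymptotic" and "Asymptotic with corr" of the SAS output, Version 2.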

Example

We test the hypothesis that the proportion of defective workpieces of a machine equals 50%. The available dataset contains $n = 40$ observations (dataset in Table A.4).

SAS code

*** Version 1 ***;
* Only for hypothesis (A) and (C);
proc freq data=malfunction;
  tables malfunction / binomial(level='1' p=.5 correct);
  exact binomial;
run;
*** Version 2 ***;
* For hypothesis (A), (B), and (C);
* Calculate the numbers of successes and failures;
proc sort data=malfunction;
 by malfunction;
run;
proc summary data=malfunction n;
 var malfunction;
 by malfunction;
 output out=ptest01  n=n;
run;
* Retrieve the number of successes and failures;
data ptest02 ptest03;
 set ptest01;
 if malfunction=0 then output ptest02;
 if malfunction=1 then output ptest03;
run;
* Rename number of failures;
data ptest02;
 set ptest02;
 rename n=failures;
 drop malfunction _TYPE_ _FREQ_;
run;
* Rename number of successes;
data ptest03;
 set ptest03;
 rename n=successes;
 drop malfunction _TYPE_ _FREQ_;
run;
* Calculate test statistic and p-values;
data ptest04;
  merge ptest02 ptest03;
  format test $20.;
  n=successes+failures;
 * Estimated Proportion;
  p_estimate=successes/n;
 * Proportion to test;
  p0=0.5;
 * Perform exact test;
  test="Exact";
 * Upper tail: probability of at most n-successes failures with failure probability 1-p0;
  p_value_B=probbnml(1-p0,n,failures);
 * Lower tail: probability of at most the observed number of successes;
  p_value_C=probbnml(p0,n,successes);
  p_value_A=2*min(p_value_B,p_value_C);
  output;
 * Perform asymptotic test;
  test="Asymptotic";
  Z=(successes-n*p0)/sqrt(n*p0*(1-p0));
  p_value_A=2*probnorm(-abs(Z));
  p_value_B=1-probnorm(Z);
  p_value_C=probnorm(Z);
  output;
 * Perform asymptotic test with continuity correction;
 * The numerator is shifted by 0.5 towards zero;
  test="Asymptotic with correction";
  Z=(successes-n*p0-0.5*sign(successes-n*p0))/sqrt(n*p0*(1-p0));
  p_value_A=2*probnorm(-abs(Z));
  p_value_B=1-probnorm(Z);
  p_value_C=probnorm(Z);
  output;
run;
* Output results;
proc print;
  var test Z p_estimate p0 p_value_A p_value_B p_value_C;
run;

SAS output

Version 1
    Test of H0: Proportion = 0.5
ASE under H0                 0.0791
Z                           -1.4230
One-sided Pr <  Z            0.0774
Two-sided Pr > |Z|           0.1547
Exact Test
One-sided Pr <=  P           0.0769
Two-sided = 2 * One-sided    0.1539
The asymptotic confidence limits and test
    include a continuity correction.
Version 2
test                  p_value_A   p_value_B  p_value_C
Exact                 0.15386     0.95965    0.076930
Asymptotic            0.11385     0.94308    0.056923
Asymptotic with corr  0.15473     0.92264    0.077364

Remarks:

PROC FREQ is the easiest way to perform the binomial test, but the procedure calculates p-values only for hypotheses (A) and (C).
level= indicates the variable level for successes.
p= specifies p₀. The default is 0.5.
correct requests the asymptotic test with continuity correction. This yields a better approximation in some cases by subtracting 0.5 in the numerator if $c04-math-0030$ and adding 0.5 otherwise. Omitting this option will result in a test without continuity correction.
exact binomial forces SAS to perform the exact test as well.

R code

# Number of observations
n<-length(malfunction$malfunction)
# Number of successes
d<-length(malfunction$malfunction
                [malfunction$malfunction==1])
# Proportion to test
p0<-0.5
# Exact test
binom.test(d,n,p0,alternative="two.sided")
# Asymptotic test
prop.test(d,n,p0,alternative="two.sided",correct=TRUE)

R output

Exact binomial test
number of successes = 15, number of trials = 40,
                                  p-value = 0.1539
1-sample proportions test with continuity correction
X-squared = 2.025, df = 1, p-value = 0.1547

Remarks:

The function binom.test calculates the exact test and the function prop.test the asymptotic test.
The first parameter of both functions is for the number of successes, the second parameter for the number of trials and the third parameter for the proportion to test for.
alternative="value" is optional and indicates the type of alternative hypothesis: "two.sided" = two-sided (A); "greater" = true proportion is greater (B); "less" = true proportion is lower (C). The default is "two.sided".
The asymptotic test provides an additional parameter. With correct=TRUE the test with continuity correction is applied. This yields a better approximation in some cases. Yates' continuity correction is applied, but only if $c04-math-0031$ . The default value is correct=TRUE.
Because the test statistics of the one-sample proportion test and the $\chi^2$-test for one-way tables are equivalent, R uses the latter test.
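
The R code above only requests the two-sided alternative. As a minimal sketch, the exact one-sided p-values for hypotheses (B) and (C) can be obtained either from the binomial distribution function directly or through the alternative argument of binom.test. The counts (15 successes in 40 trials) are taken from the R output above; the variable names are chosen for illustration only.

```r
x  <- 15   # number of successes, as reported in the R output
n  <- 40   # number of trials
p0 <- 0.5  # proportion to test

# Exact p-values from the binomial distribution B(n, p0)
p_B <- pbinom(x - 1, n, p0, lower.tail = FALSE)  # P(X >= x), H1: p > p0
p_C <- pbinom(x, n, p0)                          # P(X <= x), H1: p < p0

# The same one-sided p-values via binom.test
b_greater <- binom.test(x, n, p0, alternative = "greater")$p.value
b_less    <- binom.test(x, n, p0, alternative = "less")$p.value

c(p_B = p_B, p_C = p_C)
all.equal(c(p_B, p_C), c(b_greater, b_less))
```

Both routes should reproduce the "Exact" row of the SAS output, Version 2.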

4.2 Two-sample tests

In this section we deal with the question of whether the proportions of two independent populations differ from each other. We present two tests for this problem (Keller and Warrack 1997). In the first case the standard deviations of both distributions may differ from each other. In the second case equal but unknown standard deviations are assumed, so that both samples can be pooled to obtain a better estimate of the standard deviation. Both tests are based on an asymptotic standard normal distribution.

4.2.1 z-test for the difference of two proportions (unpooled variances)

Description:	Tests if two population proportions $p_1$ and $p_2$ differ by a specific value $d_0$.
Assumptions:	Data are randomly sampled with two possible outcomes. Let $X=1$ denote “success” and $X=0$ “failure”. The parameters $p_1$ and $p_2$ are the proportions of success in the two populations. Data are randomly sampled from two populations with sample sizes $n_1$ and $n_2$. The number of successes $\sum_{i=1}^{n_j} X_{ji}$ in the $j$th sample follows a binomial distribution $B(n_j, p_j)$, $j = 1, 2$.
Hypothesis:	(A) $H_0: p_1 - p_2 = d_0$ vs $H_1: p_1 - p_2 \neq d_0$
	(B) $H_0: p_1 - p_2 \leq d_0$ vs $H_1: p_1 - p_2 > d_0$
	(C) $H_0: p_1 - p_2 \geq d_0$ vs $H_1: p_1 - p_2 < d_0$
Test statistic:	$Z = \dfrac{(\hat{p}_1 - \hat{p}_2) - d_0}{\sqrt{\dfrac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1-\hat{p}_2)}{n_2}}}$
	where $\hat{p}_1 = \dfrac{1}{n_1}\sum_{i=1}^{n_1} X_{1i}$ and $\hat{p}_2 = \dfrac{1}{n_2}\sum_{i=1}^{n_2} X_{2i}$
Test decision:	Reject $H_0$ if for the observed value $z$ of $Z$
	(A) $z < z_{\alpha/2}$ or $z > z_{1-\alpha/2}$
	(B) $z > z_{1-\alpha}$
	(C) $z < z_{\alpha}$
p-value:	(A) $p = 2\Phi(-|z|)$
	(B) $p = 1 - \Phi(z)$
	(C) $p = \Phi(z)$
Annotation:	This is a large sample test. If the sample sizes are large enough the test statistic $Z$ approximately follows a standard normal distribution. As a rule of thumb $c04-math-0066$ , $c04-math-0067$ , $c04-math-0068$ and $c04-math-0069$ should all be $c04-math-0070$ .

Example

We test the hypothesis that the proportions of defective workpieces of company A and company B differ by 10%. The dataset contains $c04-math-0071$ observations from company A and $c04-math-0072$ observations from company B (dataset in Table A.4).

SAS code

* Determining sample sizes and number of successes;
proc means data=malfunction n sum;
 var malfunction;
 by company;
 output out=prop1 n=n sum=success;
run;
* Retrieve these results as two separate datasets;
data propA propB;
 set prop1;
 if company="A" then output propA;
 if company="B" then output propB;
run;
* Relative frequencies of successes for company A;
data propA;
 set propA;
 keep n success p1;
 rename n=n1
        success=success1;
 p1=success/n;
run;
* Relative frequencies of successes for company B;
data propB;
 set propB;
 keep n success p2;
 rename n=n2
        success=success2;
 p2=success/n;
run;
* Merge datasets of company A and B;
data prop2;
 merge propA propB;
run;
* Calculate test statistic and p-value;
data prop3;
 set prop2;
 format p_value pvalue.;
 p_diff=p1-p2; *Difference of proportions;
 d0=0.10;      *Difference to be tested;
 * Test statistic and p-values;
 z=(p_diff-d0)/sqrt((p1*(1-p1))/n1 + (p2*(1-p2))/n2);
 p_value=2*probnorm(-abs(z));
run;
proc print;
 var z p_value;
run;

SAS output

   z        p_value
1.75142     0.0799

Remarks:

There is no SAS procedure to calculate this test directly.
The data do not fulfill the criteria that ensure the test statistic $c04-math-0073$ approximately follows a Gaussian distribution, because $c04-math-0074$ ; the p-value is therefore questionable.

R code

# Number of observations for company A
n1<-length(malfunction$malfunction
                            [malfunction$company=='A'])
# Number of successes for company A
s1<-length(malfunction$malfunction[malfunction$company=='A'
                             & malfunction$malfunction==1])
# Number of observations for company B
n2<-length(malfunction$malfunction
                            [malfunction$company=='B'])
# Number of successes for company B
s2<-length(malfunction$malfunction[malfunction$company=='B'
                             & malfunction$malfunction==1])
# Proportions
p1=s1/n1
p2=s2/n2
# Difference of proportions
p_diff=p1-p2
# Difference to test
d0=0.10
# Test statistic and p-values
z=(p_diff-d0)/sqrt((p1*(1-p1))/n1 + (p2*(1-p2))/n2)
p_value=2*pnorm(-abs(z))
# Output results
z
p_value

R output

> z
[1] 1.751424
> p_value
[1] 0.07987297

Remarks:

There is no R function to calculate this test directly.
The data do not fulfill the criteria that ensure the test statistic $c04-math-0075$ approximately follows a Gaussian distribution, because $c04-math-0076$ ; the p-value is therefore questionable.
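
The code above reports only the two-sided p-value. The following sketch wraps the same calculation in a small function that also covers the one-sided hypotheses (B) and (C). The function name z_test_unpooled is hypothetical and not part of base R. The counts used in the call (11 of 20 for company A, 4 of 20 for company B) are not stated in the text; they are inferred from the reported output and should be read as an assumption.

```r
# Unpooled z-test for H0: p1 - p2 = d0 (illustrative helper, not a base R function)
z_test_unpooled <- function(s1, n1, s2, n2, d0 = 0,
                            alternative = c("two.sided", "greater", "less")) {
  alternative <- match.arg(alternative)
  # Estimated proportions
  p1 <- s1 / n1
  p2 <- s2 / n2
  # Test statistic with separately estimated variances
  z <- (p1 - p2 - d0) / sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
  p_value <- switch(alternative,
                    two.sided = 2 * pnorm(-abs(z)),  # hypothesis (A)
                    greater   = 1 - pnorm(z),        # hypothesis (B)
                    less      = pnorm(z))            # hypothesis (C)
  list(z = z, p_value = p_value)
}

# Assumed counts: 11/20 defective for company A, 4/20 for company B
res <- z_test_unpooled(11, 20, 4, 20, d0 = 0.10)
res$z
res$p_value
```

With the assumed counts the call should return the same z and p_value as the R output above.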

4.2.2 z-test for the equality between two proportions (pooled variances)

Description:	Tests if two population proportions $p_1$ and $p_2$ differ from each other.
Assumptions:	Data are randomly sampled with two possible outcomes. Let $X=1$ denote “success” and $X=0$ “failure”. The parameters $p_1$ and $p_2$ are the proportions of success in the two populations. Data are randomly sampled from two populations with sample sizes $n_1$ and $n_2$. The number of successes $\sum_{i=1}^{n_j} X_{ji}$ in the $j$th sample follows a binomial distribution $B(n_j, p_j)$, $j = 1, 2$.
Hypothesis:	(A) $H_0: p_1 = p_2$ vs $H_1: p_1 \neq p_2$
	(B) $H_0: p_1 \leq p_2$ vs $H_1: p_1 > p_2$
	(C) $H_0: p_1 \geq p_2$ vs $H_1: p_1 < p_2$
Test statistic:	$Z = \dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$
	where $\hat{p} = \dfrac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2}$ and $\hat{p}_j = \dfrac{1}{n_j}\sum_{i=1}^{n_j} X_{ji}$, $j = 1, 2$
Test decision:	Reject $H_0$ if for the observed value $z$ of $Z$
	(A) $z < z_{\alpha/2}$ or $z > z_{1-\alpha/2}$
	(B) $z > z_{1-\alpha}$
	(C) $z < z_{\alpha}$
p-value:	(A) $p = 2\Phi(-|z|)$
	(B) $p = 1 - \Phi(z)$
	(C) $p = \Phi(z)$
Annotation:	This is a large sample test. If the sample sizes are large enough the test statistic $Z$ approximately follows a standard normal distribution. As a rule of thumb, $c04-math-0109$ , $c04-math-0110$ , $c04-math-0111$ and $c04-math-0112$ should all be $c04-math-0113$ . This test is equivalent to the $\chi^2$-test of a $2 \times 2$ table, that is, $c04-math-0116$ . The advantage of the $\chi^2$-test is that there exists an exact test for small samples, which calculates the p-values from the exact distribution. This test is the famous Fisher's exact test. More information is given in Chapter 14.

Example

We test the hypothesis that the proportions of defective workpieces of company A and company B are equal. The dataset contains $c04-math-0118$ observations from company A and $c04-math-0119$ observations from company B (dataset in Table A.4).

SAS code

* Determining sample sizes and number of successes;
proc means data=malfunction n sum;
 var malfunction;
 by company;
 output out=prop1 n=n sum=success;
run;
* Retrieve these results in two separate datasets;
data propA propB;
 set prop1;
 if company="A" then output propA;
 if company="B" then output propB;
run;
* Relative frequencies of successes for company A;
data propA;
 set propA;
 keep n success p1;
 rename n=n1
        success=success1;
 p1=success/n;
run;
* Relative frequencies of successes for company B;
data propB;
 set propB;
 keep n success p2;
 rename n=n2
        success=success2;
 p2=success/n;
run;
* Merge datasets of company A and B;
data prop2;
 merge propA propB;
run;
* Calculate test statistic and p-value;
data prop3;
 set prop2;
 format p_value pvalue.;
 * Test statistic and p-values;
 p=(p1*n1+p2*n2)/(n1+n2);
 z=(p1-p2)/sqrt((p*(1-p))*(1/n1+1/n2));
 p_value=2*probnorm(-abs(z));
run;
proc print;
 var z p_value;
run;

SAS output

   z       p_value
2.28619    0.0222

Remarks:

There is no SAS procedure to calculate this test directly.
The data do not fulfill the criteria that ensure the test statistic $c04-math-0120$ approximately follows a Gaussian distribution, because $c04-math-0121$ . In this case it is better to use Fisher's exact test, see Chapter 14.

R code

# Number of observations for company A
n1<-length(malfunction$malfunction
                       [malfunction$company=='A'])
# Number of successes for company A
s1<-length(malfunction$malfunction[malfunction$company=='A'
                        & malfunction$malfunction==1])
# Number of observations for company B
n2<-length(malfunction$malfunction
                       [malfunction$company=='B'])
# Number of successes for company B
s2<-length(malfunction$malfunction[malfunction$company=='B'
                       & malfunction$malfunction==1])
# Proportions
p1=s1/n1
p2=s2/n2
# Test statistic and p-value
p=(p1*n1+p2*n2)/(n1+n2)
z=(p1-p2)/sqrt((p*(1-p))*(1/n1+1/n2))
p_value=2*pnorm(-abs(z))
# Output results
z
p_value

R output

> z
[1] 2.286190
> p_value
[1] 0.02224312

Remarks:

There is no R function to calculate this test directly.
The data do not fulfill the criteria that ensure the test statistic $c04-math-0122$ approximately follows a Gaussian distribution, because $c04-math-0123$ . In this case it is better to use Fisher's exact test, see Chapter 14.
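
The equivalence with the $\chi^2$-test of a 2x2 table mentioned in the annotation can be checked numerically. The sketch below compares the squared pooled z statistic with the statistic of prop.test without continuity correction, and then runs Fisher's exact test on the same table. The counts (11 of 20 for company A, 4 of 20 for company B) are inferred from the reported output and are an assumption, as are the variable names.

```r
# Assumed counts: successes and sample sizes for companies A and B
s <- c(11, 4)
n <- c(20, 20)

# Pooled z statistic, computed as in the code above
p_pool <- sum(s) / sum(n)
z <- (s[1] / n[1] - s[2] / n[2]) / sqrt(p_pool * (1 - p_pool) * sum(1 / n))

# Chi-square test of the 2x2 table without continuity correction
pt <- prop.test(s, n, correct = FALSE)

# The squared z statistic equals the chi-square statistic
c(z_squared = z^2, X_squared = unname(pt$statistic))
# and the two-sided p-values agree
c(z_test = 2 * pnorm(-abs(z)), prop_test = pt$p.value)

# Fisher's exact test on the corresponding 2x2 table
tab <- cbind(success = s, failure = n - s)
fisher.test(tab)$p.value
```

Note that prop.test only reproduces the two-sided pooled test this way; for a hypothesized difference other than zero the unpooled calculation of Test 4.2.1 is still needed.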

4.3 $c04-math-0124$ -sample tests

Next we present the population proportion equality test for $c04-math-0125$ samples [see Bain and Engelhardt (1991) for further details]. If we have $c04-math-0126$ independent binomial samples we can arrange them in a $c04-math-0127$ table and take advantage of results on contingency tables. We concentrate on the $\chi^2$-test based on asymptotic results, although Fisher's exact test can be used as well. More details are given in Chapter 14.

4.3.1 $c04-math-0129$ -sample binomial test

Description:	Tests if $c04-math-0130$ population proportions, $c04-math-0131$ , differ from each other.
Assumptions:	Data are randomly sampled with two possible outcomes. Let $c04-math-0132$ be denoted as “success” and $c04-math-0133$ as “failure”. The parameters $c04-math-0134$ are the proportions of success in the $c04-math-0135$ populations, $c04-math-0136$ . Data are randomly sampled from the $c04-math-0137$ populations with sample sizes $c04-math-0138$ , $c04-math-0139$ . The number of successes $c04-math-0140$ in the $c04-math-0141$ th sample follows a binomial distribution $c04-math-0142$ , $c04-math-0143$ .
Hypothesis:	$c04-math-0144$ vs $c04-math-0145$ for at least one $c04-math-0146$
Test statistic:	$c04-math-0147$
	where $c04-math-0148$ , $c04-math-0149$ ,
	$c04-math-0150$ , $c04-math-0151$ , $c04-math-0152$ , $c04-math-0153$ .
Test decision:	Reject $c04-math-0154$ if for the observed value $c04-math-0155$ of $c04-math-0156$
	$c04-math-0157$
p-value:	$c04-math-0158$

Annotation:

The test statistic $c04-math-0159$ is $c04-math-0160$ -distributed.
$c04-math-0161$ is the $(1-\alpha)$-quantile of the $\chi^2$-distribution with $c04-math-0164$ degrees of freedom.
If not all expected absolute frequencies $c04-math-0165$ are larger than or equal to $c04-math-0166$ , use Fisher's exact test (see Test 14.1.1).

Example

The proportions of male carp in three ponds are tested for equality. The observed relative frequency of male carp in pond one is 10/19, in pond two 12/20, and in pond three 14/21.

SAS code

data counts;
input r c counts;
datalines;
1 1 10
1 0  9
2 1 12
2 0  8
3 1 14
3 0  7
;
run;
proc freq;
 tables r*c /chisq;
 weight counts;
run;

SAS output

Statistic      DF     Value     Prob
Chi-Square     2      0.8187    0.6641

Remarks:

The data step constructs a $3 \times 2$ contingency table, with r for rows (ponds 1 to 3) and c for columns (1 for male and 0 for female carp). The variable counts holds the count for each combination of pond and sex. In proc freq these counts are passed by using the weight statement.
With proc freq it is also possible to use raw data instead of a predefined contingency table to perform these tests. In this case there must be one variable for the ponds, one for the sex, and one row for each carp. Use the same SAS statements but omit the weight statement.
Because the null hypothesis is rejected if $c04-math-0168$ , the p-value must be calculated as 1-probchi(0.8187,2).

R code

x1 <- matrix(c(10, 12, 14, 9, 8, 7), ncol = 2)
chisq.test(x1)

R output

X-squared = 0.8187, df = 2, p-value = 0.6641

Remarks:

The matrix command constructs a matrix x1 with the ponds in the rows, the male carp counts in the first column, and the female carp counts in the second column. This matrix can then be passed on to the chisq.test function.
Because the null hypothesis is rejected if $c04-math-0170$ , the p-value must be calculated as 1-pchisq(0.8187,2).
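
To make the calculation behind chisq.test visible, the following sketch recomputes the test statistic for the carp example from the observed and expected frequencies. The variable names and the dimension labels are chosen for illustration. The threshold of 5 in the last line is the commonly used rule for expected frequencies and is an assumption here.

```r
# Observed counts: ponds in rows, sex in columns
x1 <- matrix(c(10, 12, 14, 9, 8, 7), ncol = 2,
             dimnames = list(c("pond1", "pond2", "pond3"),
                             c("male", "female")))

# Expected counts under H0: row total * column total / grand total
expected <- outer(rowSums(x1), colSums(x1)) / sum(x1)

# Pearson chi-square statistic and its degrees of freedom
chi2 <- sum((x1 - expected)^2 / expected)
df   <- (nrow(x1) - 1) * (ncol(x1) - 1)

# Upper tail probability of the chi-square distribution
p_value <- pchisq(chi2, df, lower.tail = FALSE)

expected
c(chi2 = chi2, df = df, p_value = p_value)

# Check whether all expected frequencies reach the assumed threshold of 5
all(expected >= 5)
```

The same expected frequencies are available from chisq.test(x1)$expected.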

References

Bain L.J. and Engelhardt M. 1991 Introduction to Probability and Mathematical Statistics, 1st edn. Duxbury Press.

Keller G. and Warrack B. 1997 Statistics for Management and Economics, 4th edn. Duxbury Press.
