In this chapter we present tests for the parameter of a binomial distribution. We first treat a test on the population proportion in the one-sample case. We then cover tests for the difference of two proportions using the pooled as well as the unpooled variance. The last test in this chapter deals with the equality of proportions in the multi-sample case. Not all tests are covered by a SAS procedure or R function; we give appropriate sample code to perform all discussed tests.
In this section we deal with the question of whether a population proportion differs from a predefined value between 0 and 1.
Description: | Tests if a population proportion $p$ differs from a value $p_0$. |
Assumptions: | Data are a random sample $X_1,\ldots,X_n$ of independent Bernoulli($p$) observations; |
| $X=\sum_{i=1}^n X_i$ is the number of successes. |
Hypotheses: | (A) $H_0: p=p_0$ vs $H_1: p\neq p_0$ |
| (B) $H_0: p\leq p_0$ vs $H_1: p>p_0$ |
| (C) $H_0: p\geq p_0$ vs $H_1: p<p_0$ |
Test statistic: | $Z=(X-np_0)/\sqrt{np_0(1-p_0)}$ |
Test decision: | Reject $H_0$ if for the observed value $z$ of $Z$: |
| (A) $z<z_{\alpha/2}$ or $z>z_{1-\alpha/2}$ |
| (B) $z>z_{1-\alpha}$ |
| (C) $z<z_{\alpha}$ |
p-value: | (A) $p=2\,\Phi(-|z|)$ |
| (B) $p=1-\Phi(z)$ |
| (C) $p=\Phi(z)$ |
Annotation: | $z_{\alpha}$ denotes the $\alpha$-quantile and $\Phi$ the distribution function of the standard normal distribution. |
| $Z$ is only asymptotically N(0,1) distributed. An exact test evaluates the Binomial($n,p_0$) distribution of $X$ directly; a continuity correction of $0.5$ in the numerator of $Z$ improves the normal approximation for small $n$. |
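The exact and asymptotic p-values of this test can be computed directly from the binomial and normal distribution functions. The following R sketch uses the values of the malfunction example below (15 successes in 40 trials, $p_0=0.5$); it is a hand computation for illustration, not a replacement for `binom.test`.

```r
# Sketch: one-sample proportion test computed from first principles.
# Values taken from the malfunction example: 15 successes in 40 trials.
s  <- 15     # observed successes
n  <- 40     # sample size
p0 <- 0.5    # proportion under H0

# Exact p-values from the Binomial(n, p0) distribution
p_C <- pbinom(s, n, p0)            # (C): P(X <= s)
p_B <- 1 - pbinom(s - 1, n, p0)    # (B): P(X >= s)
p_A <- min(1, 2 * min(p_B, p_C))   # (A): two-sided

# Asymptotic p-value from the normal approximation
z      <- (s - n * p0) / sqrt(n * p0 * (1 - p0))
p_asym <- 2 * pnorm(-abs(z))

round(c(exact = p_A, asymptotic = p_asym), 4)
```

This reproduces the exact two-sided p-value 0.1539 and the uncorrected asymptotic p-value 0.1138 reported in the output below.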
*** Version 1 ***;
* Only for hypothesis (A) and (C);
proc freq data=malfunction;
 tables malfunction / binomial(level='1' p=.5 correct);
 exact binomial;
run;

*** Version 2 ***;
* For hypotheses (A), (B), and (C);
* Calculate the numbers of successes and failures;
proc sort data=malfunction;
 by malfunction;
run;

proc summary data=malfunction n;
 var malfunction;
 by malfunction;
 output out=ptest01 n=n;
run;

* Retrieve the number of successes and failures;
data ptest02 ptest03;
 set ptest01;
 if malfunction=0 then output ptest02;
 if malfunction=1 then output ptest03;
run;

* Rename number of failures;
data ptest02;
 set ptest02;
 rename n=failures;
 drop malfunction _TYPE_ _FREQ_;
run;

* Rename number of successes;
data ptest03;
 set ptest03;
 rename n=successes;
 drop malfunction _TYPE_ _FREQ_;
run;

* Calculate test statistic and p-values;
data ptest04;
 merge ptest02 ptest03;
 format test $20.;
 n=successes+failures;
 * Estimated proportion;
 p_estimate=successes/n;
 * Proportion to test;
 p0=0.5;
 * Perform exact test;
 test="Exact";
 p_value_B=1-probbnml(p0,n,successes-1);
 p_value_C=probbnml(p0,n,successes);
 p_value_A=min(1,2*min(p_value_B,p_value_C));
 output;
 * Perform asymptotic test;
 test="Asymptotic";
 Z=(successes-n*p0)/sqrt(n*p0*(1-p0));
 p_value_A=2*probnorm(-abs(Z));
 p_value_B=1-probnorm(Z);
 p_value_C=probnorm(Z);
 output;
 * Perform asymptotic test with continuity correction;
 * The correction shrinks the numerator towards zero while keeping its sign;
 test="Asymptotic with correction";
 Z=sign(successes-n*p0)*(abs(successes-n*p0)-0.5)/sqrt(n*p0*(1-p0));
 p_value_A=2*probnorm(-abs(Z));
 p_value_B=1-probnorm(Z);
 p_value_C=probnorm(Z);
 output;
run;

* Output results;
proc print;
 var test Z p_estimate p0 p_value_A p_value_B p_value_C;
run;
Version 1

Test of H0: Proportion = 0.5
ASE under H0               0.0791
Z                         -1.4230
One-sided Pr <  Z          0.0774
Two-sided Pr > |Z|         0.1547

Exact Test
One-sided Pr <= P          0.0769
Two-sided = 2 * One-sided  0.1539

The asymptotic confidence limits and test include a continuity correction.

Version 2

test                         p_value_A    p_value_B    p_value_C
Exact                        0.15386      0.95965      0.076930
Asymptotic                   0.11385      0.94308      0.056923
Asymptotic with corr         0.15473      0.92264      0.077364
# Number of observations
n<-length(malfunction$malfunction)
# Number of successes
d<-length(malfunction$malfunction[malfunction$malfunction==1])
# Proportion to test
p0<-0.5
# Exact test
binom.test(d,n,p0,alternative="two.sided")
# Asymptotic test
prop.test(d,n,p0,alternative="two.sided",correct=TRUE)
Exact binomial test
number of successes = 15, number of trials = 40, p-value = 0.1539

1-sample proportions test with continuity correction
X-squared = 2.025, df = 1, p-value = 0.1547
In this section we deal with the question of whether the proportions of two independent populations differ from each other. We present two tests for this problem (Keller and Warrack 1997). In the first case the standard deviations of the two distributions may differ from each other. In the second case equal but unknown standard deviations are assumed, so that both samples can be pooled to obtain a better estimate of the common standard deviation. Both tests are based on an asymptotic standard normal distribution.
Description: | Tests if two population proportions $p_1$ and $p_2$ differ by a specific value $d_0$. |
Assumptions: | Two independent random samples of sizes $n_1$ and $n_2$ from Bernoulli($p_1$) and Bernoulli($p_2$) populations; |
| $X_1$ and $X_2$ are the respective numbers of successes. |
Hypotheses: | (A) $H_0: p_1-p_2=d_0$ vs $H_1: p_1-p_2\neq d_0$ |
| (B) $H_0: p_1-p_2\leq d_0$ vs $H_1: p_1-p_2>d_0$ |
| (C) $H_0: p_1-p_2\geq d_0$ vs $H_1: p_1-p_2<d_0$ |
Test statistic: | $Z=\dfrac{(\hat{p}_1-\hat{p}_2)-d_0}{\sqrt{\hat{p}_1(1-\hat{p}_1)/n_1+\hat{p}_2(1-\hat{p}_2)/n_2}}$ |
| where $\hat{p}_1=X_1/n_1$ and $\hat{p}_2=X_2/n_2$ |
Test decision: | Reject $H_0$ if for the observed value $z$ of $Z$: |
| (A) $z<z_{\alpha/2}$ or $z>z_{1-\alpha/2}$ |
| (B) $z>z_{1-\alpha}$ |
| (C) $z<z_{\alpha}$ |
p-value: | (A) $p=2\,\Phi(-|z|)$ |
| (B) $p=1-\Phi(z)$ |
| (C) $p=\Phi(z)$ |
Annotation: | The test is based on the asymptotic standard normal distribution of $Z$; the variances of the two samples are estimated separately (unpooled). |
|
* Determining sample sizes and number of successes;
proc means data=malfunction n sum;
 var malfunction;
 by company;
 output out=prop1 n=n sum=success;
run;

* Retrieve these results as two separate datasets;
data propA propB;
 set prop1;
 if company="A" then output propA;
 if company="B" then output propB;
run;

* Relative frequencies of successes for company A;
data propA;
 set propA;
 keep n success p1;
 rename n=n1 success=success1;
 p1=success/n;
run;

* Relative frequencies of successes for company B;
data propB;
 set propB;
 keep n success p2;
 rename n=n2 success=success2;
 p2=success/n;
run;

* Merge datasets of company A and B;
data prop2;
 merge propA propB;
run;

* Calculate test statistic and p-value;
data prop3;
 set prop2;
 format p_value pvalue.;
 * Difference of proportions;
 p_diff=p1-p2;
 * Difference to be tested;
 d0=0.10;
 * Test statistic and p-value;
 z=(p_diff-d0)/sqrt((p1*(1-p1))/n1 + (p2*(1-p2))/n2);
 p_value=2*probnorm(-abs(z));
run;

proc print;
 var z p_value;
run;
z          p_value
1.75142    0.0799
# Number of observations for company A
n1<-length(malfunction$malfunction[malfunction$company=='A'])
# Number of successes for company A
s1<-length(malfunction$malfunction[malfunction$company=='A' & malfunction$malfunction==1])
# Number of observations for company B
n2<-length(malfunction$malfunction[malfunction$company=='B'])
# Number of successes for company B
s2<-length(malfunction$malfunction[malfunction$company=='B' & malfunction$malfunction==1])
# Proportions
p1=s1/n1
p2=s2/n2
# Difference of proportions
p_diff=p1-p2
# Difference to test
d0=0.10
# Test statistic and p-value
z=(p_diff-d0)/sqrt((p1*(1-p1))/n1 + (p2*(1-p2))/n2)
p_value=2*pnorm(-abs(z))
# Output results
z
p_value
> z
[1] 1.751424
> p_value
[1] 0.07987297
Description: | Tests if two population proportions $p_1$ and $p_2$ differ from each other. |
Assumptions: | Two independent random samples of sizes $n_1$ and $n_2$ from Bernoulli($p_1$) and Bernoulli($p_2$) populations; |
| $X_1$ and $X_2$ are the respective numbers of successes. |
Hypotheses: | (A) $H_0: p_1=p_2$ vs $H_1: p_1\neq p_2$ |
| (B) $H_0: p_1\leq p_2$ vs $H_1: p_1>p_2$ |
| (C) $H_0: p_1\geq p_2$ vs $H_1: p_1<p_2$ |
Test statistic: | $Z=\dfrac{\hat{p}_1-\hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\,(1/n_1+1/n_2)}}$ |
| where $\hat{p}_i=X_i/n_i$ and $\hat{p}=\dfrac{n_1\hat{p}_1+n_2\hat{p}_2}{n_1+n_2}$ is the pooled proportion |
Test decision: | Reject $H_0$ if for the observed value $z$ of $Z$: |
| (A) $z<z_{\alpha/2}$ or $z>z_{1-\alpha/2}$ |
| (B) $z>z_{1-\alpha}$ |
| (C) $z<z_{\alpha}$ |
p-value: | (A) $p=2\,\Phi(-|z|)$ |
| (B) $p=1-\Phi(z)$ |
| (C) $p=\Phi(z)$ |
Annotation: | Under $H_0$ both samples share a common proportion, so a pooled estimate $\hat{p}$ is used for the variance; the test is asymptotic. |
|
* Determining sample sizes and number of successes;
proc means data=malfunction n sum;
 var malfunction;
 by company;
 output out=prop1 n=n sum=success;
run;

* Retrieve these results in two separate datasets;
data propA propB;
 set prop1;
 if company="A" then output propA;
 if company="B" then output propB;
run;

* Relative frequencies of successes for company A;
data propA;
 set propA;
 keep n success p1;
 rename n=n1 success=success1;
 p1=success/n;
run;

* Relative frequencies of successes for company B;
data propB;
 set propB;
 keep n success p2;
 rename n=n2 success=success2;
 p2=success/n;
run;

* Merge datasets of company A and B;
data prop2;
 merge propA propB;
run;

* Calculate test statistic and p-value;
data prop3;
 set prop2;
 format p_value pvalue.;
 * Pooled proportion, test statistic, and p-value;
 p=(p1*n1+p2*n2)/(n1+n2);
 z=(p1-p2)/sqrt((p*(1-p))*(1/n1+1/n2));
 p_value=2*probnorm(-abs(z));
run;

proc print;
 var z p_value;
run;
z          p_value
2.28619    0.0222
# Number of observations for company A
n1<-length(malfunction$malfunction[malfunction$company=='A'])
# Number of successes for company A
s1<-length(malfunction$malfunction[malfunction$company=='A' & malfunction$malfunction==1])
# Number of observations for company B
n2<-length(malfunction$malfunction[malfunction$company=='B'])
# Number of successes for company B
s2<-length(malfunction$malfunction[malfunction$company=='B' & malfunction$malfunction==1])
# Proportions
p1=s1/n1
p2=s2/n2
# Pooled proportion, test statistic, and p-value
p=(p1*n1+p2*n2)/(n1+n2)
z=(p1-p2)/sqrt((p*(1-p))*(1/n1+1/n2))
p_value=2*pnorm(-abs(z))
# Output results
z
p_value
> z
[1] 2.286190
> p_value
[1] 0.02224312
Next we present the test for the equality of population proportions in $k$ samples [see Bain and Engelhardt (1991) for further details]. If we have $k$ independent binomial samples, we can arrange them in a $2\times k$ contingency table and take advantage of results on contingency tables. We concentrate on the $\chi^2$-test based on asymptotic results, although Fisher's exact test can be used as well. More details are given in Chapter 14.
Description: | Tests if $k$ population proportions $p_1,\ldots,p_k$ differ from each other. |
Assumptions: | $k$ independent random samples; the $j$th sample has size $n_j$, stems from a Bernoulli($p_j$) population, and contains $X_j$ successes. |
Hypothesis: | $H_0: p_1=\cdots=p_k$ vs $H_1: p_i\neq p_j$ for at least one pair $i\neq j$ |
Test statistic: | $\chi^2=\sum_{i=1}^{2}\sum_{j=1}^{k}\dfrac{(O_{ij}-E_{ij})^2}{E_{ij}}$ |
| where $O_{1j}=X_j$, $O_{2j}=n_j-X_j$, |
| $E_{ij}=\dfrac{R_i C_j}{N}$, $R_i=\sum_{j=1}^{k} O_{ij}$, $C_j=n_j$, $N=\sum_{j=1}^{k} n_j$. |
Test decision: | Reject $H_0$ if for the observed value of $\chi^2$: $\chi^2>\chi^2_{1-\alpha;k-1}$ |
p-value: | $p=1-P(\chi^2_{k-1}\leq \chi^2)$ |
Annotation: | $\chi^2_{1-\alpha;k-1}$ is the $(1-\alpha)$-quantile of the $\chi^2$-distribution with $k-1$ degrees of freedom. |
| The test is asymptotic; the expected cell counts $E_{ij}$ should be at least 5, otherwise Fisher's exact test is preferable. |
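The $\chi^2$ statistic can also be computed by hand from the observed and expected cell counts. The sketch below uses the table of the example that follows (one row per sample: successes 10, 12, 14 and failures 9, 8, 7) and reproduces the statistic reported by PROC FREQ and chisq.test.

```r
# Sketch: chi-squared statistic for equality of k = 3 proportions,
# computed directly from the observed/expected cell counts.
O <- matrix(c(10, 12, 14,   # successes per sample
               9,  8,  7),  # failures per sample
            ncol = 2)
# Expected counts under H0: (row total * column total) / grand total
E <- outer(rowSums(O), colSums(O)) / sum(O)
chi2 <- sum((O - E)^2 / E)
df   <- nrow(O) - 1            # k - 1 degrees of freedom
p_value <- 1 - pchisq(chi2, df)
round(c(chi2 = chi2, df = df, p_value = p_value), 4)
```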
data counts;
 input r c counts;
 datalines;
1 1 10
1 0 9
2 1 12
2 0 8
3 1 14
3 0 7
;
run;

proc freq;
 tables r*c / chisq;
 weight counts;
run;
Statistic     DF    Value     Prob
Chi-Square    2     0.8187    0.6641
# Arrange the k = 3 samples as a matrix: successes in column 1, failures in column 2
x1 <- matrix(c(10, 12, 14, 9, 8, 7), ncol = 2)
chisq.test(x1)
X-squared = 0.8187, df = 2, p-value = 0.6641
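As noted above, Fisher's exact test is an alternative when the asymptotic $\chi^2$ approximation is doubtful, e.g., for small expected cell counts. In R it applies directly to the same matrix:

```r
# Sketch: Fisher's exact test on the same 3x2 table as above
x1 <- matrix(c(10, 12, 14, 9, 8, 7), ncol = 2)
fisher.test(x1)
```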
Bain, L.J. and Engelhardt, M. (1991) Introduction to Probability and Mathematical Statistics, 1st edn. Duxbury Press.
Keller, G. and Warrack, B. (1997) Statistics for Management and Economics, 4th edn. Duxbury Press.