Chapter 10

Other Tests

In this chapter we present a well-known test for the problem of whether two independent samples are drawn from the same population. The test rests on very few assumptions; for example, the distributions need not be specified beyond the fact that they are continuous.

10.1 Two-sample tests

10.1.1 Kolmogorov–Smirnov two-sample test (Smirnov test)

Description: Tests whether two independent samples are drawn from the same distribution.
Assumptions:
  • Data are at least measured on an ordinal scale.
  • The random variables c10-math-0001 and c10-math-0002 are independent with continuous distribution functions c10-math-0003 and c10-math-0004.
  • Samples c10-math-0005 and c10-math-0006 are independently drawn from the two populations.
Hypotheses: (A) c10-math-0007 vs c10-math-0008 for at least one c10-math-0009
(B) c10-math-0010 vs c10-math-0011 with c10-math-0012 for at least one c10-math-0013
(C) c10-math-0014 vs c10-math-0015 with c10-math-0016 for at least one c10-math-0017
Test statistic: (A) $D = \max_t |\hat{F}_1(t) - \hat{F}_2(t)|$
(B) $D^+ = \max_t (\hat{F}_1(t) - \hat{F}_2(t))$
(C) $D^- = \max_t (\hat{F}_2(t) - \hat{F}_1(t))$
where $\hat{F}_1$ and $\hat{F}_2$ denote the empirical distribution functions based on the two samples.
Test decision: Reject c10-math-0023 if for the observed value c10-math-0024 of c10-math-0025
(A) c10-math-0026
(B) c10-math-0027
(C) c10-math-0028
The critical values c10-math-0029, c10-math-0030, c10-math-0031 can be found for instance in Sheskin (2007, table A.23).
p-values: (A) c10-math-0032
(B) c10-math-0033
(C) c10-math-0034
Annotations:
  • The test statistics evaluate the maximum distances between the two empirical distribution functions.
  • The Smirnov test can be presented as a rank test, because the statistics can be written as the supremum of linear rank statistics (Steck 1969).
  • The test is known as the Kolmogorov–Smirnov test as well as the Smirnov test for two samples.
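The first annotation can be made concrete with a short R sketch. It is an illustration only: the vectors x and y are hypothetical data invented for this example (not the blood pressure data of Table A.1), and the names F1_hat, F2_hat, pts and diffs are assumptions of the sketch. It assumes the one-sided statistics are defined as in the D+ and D- lines of the SAS output in the example below, and it relies on the fact that the difference of two step functions can only change at observed values.

```r
# Hypothetical data without ties (not the dataset of Table A.1)
x <- seq(100, 124, by = 1)       # 25 values
y <- seq(120.5, 149.5, by = 1)   # 30 values

# Empirical distribution functions of the two samples
F1_hat <- ecdf(x)
F2_hat <- ecdf(y)

# Evaluate the difference at every observed value
pts   <- sort(c(x, y))
diffs <- F1_hat(pts) - F2_hat(pts)

D       <- max(abs(diffs))  # largest absolute distance (two-sided)
D_plus  <- max(diffs)       # largest amount by which F1_hat exceeds F2_hat
D_minus <- max(-diffs)      # largest amount by which F2_hat exceeds F1_hat

# Compare with the statistics reported by ks.test()
c(by_hand = D,       ks_test = unname(ks.test(x, y)$statistic))
c(by_hand = D_plus,  ks_test = unname(ks.test(x, y, alternative = "greater")$statistic))
c(by_hand = D_minus, ks_test = unname(ks.test(x, y, alternative = "less")$statistic))
```

For these made-up data the sample x lies almost entirely to the left of y, so D and D_plus coincide and D_minus is zero.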

Example
We test the hypothesis that the two populations of healthy subjects (status=0) and subjects with hypertension (status=1) do not differ with respect to the distribution of their systolic blood pressure. The dataset contains 25 observations for status=0 and 30 observations for status=1 (dataset in Table A.1).


SAS code
proc npar1way data=blood_pressure D;
 class status;
 var mmhg;
 exact edf;
run;
SAS output
             The NPAR1WAY Procedure
    Kolmogorov–Smirnov Test for Variable mmhg
          Classified by Variable status
                     EDF at    Deviation from Mean
status       N       Maximum        at Maximum
---------------------------------------------------
0           25      0.880000          2.218182
1           30      0.066667         -2.024914
Total       55      0.436364
     Maximum Deviation Occurred at Observation 25
          Value of mmhg at Maximum = 125.0
              KS  0.4050    KSa  3.0034
    Kolmogorov–Smirnov Two-Sample Test (Asymptotic)
            D = max |F1 - F2|     0.8133
            Pr > D                <.0001
            D+ = max (F1 - F2)    0.8133
            Pr > D+               <.0001
            D- = max (F2 - F1)    0.0000
            Pr > D-               1.0000
Remarks:
  • The option D requests the one-sided tests (B) and (C) in addition to the two-sided test (A). If only the two-sided test is desired, either omit the option or use the option EDF.
  • exact edf is optional and requests an additional exact test. Note that computing an exact test can be very time-consuming. Although the statement appears in the listing, the output shown was generated without it, because calculating the exact p-values would have taken too long even for this small dataset.
  • $D^+$ is the test statistic for hypothesis (B) and $D^-$ is the test statistic for hypothesis (C). Figure 10.1 shows that the empirical distribution function of the healthy subjects lies above that of the subjects with hypertension. Accordingly, the null hypothesis of (B) is rejected while the null hypothesis of (C) is not.
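The KSa value and the asymptotic p-value in the SAS output can be cross-checked by hand. The R sketch below assumes the usual large-sample approximation: the statistic is scaled by the square root of n1*n2/(n1+n2), and the two-sided p-value comes from the Kolmogorov limiting series. The text does not state which formula SAS uses, so this is a plausibility check only; the variable names are assumptions of the sketch.

```r
# Values read off the SAS output above
n1 <- 25
n2 <- 30
D  <- 0.880000 - 0.066667   # difference of the two "EDF at Maximum" entries

# Scaled statistic (assumed to correspond to KSa in the SAS output)
KSa <- D * sqrt(n1 * n2 / (n1 + n2))

# Asymptotic two-sided p-value: 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 KSa^2)
k <- 1:100
p_two_sided <- 2 * sum((-1)^(k - 1) * exp(-2 * k^2 * KSa^2))

# Asymptotic one-sided p-value for the same scaled statistic
p_one_sided <- exp(-2 * KSa^2)

round(KSa, 4)
signif(p_two_sided, 4)
signif(p_one_sided, 4)
```

Under these assumptions the scaled statistic should agree with the printed KSa of 3.0034 to the digits shown, and both p-values fall well below the <.0001 threshold reported by SAS.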


R code
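# Split systolic blood pressure by group
x <- blood_pressure$mmhg[blood_pressure$status == 0]  # healthy subjects
y <- blood_pressure$mmhg[blood_pressure$status == 1]  # subjects with hypertension
# Two-sided test (A) with the asymptotic p-value
ks.test(x, y, alternative = "two.sided", exact = FALSE)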
R output
Two-sample Kolmogorov–Smirnov test
data:  x and y
D = 0.8133, p-value = 2.923e-08
alternative hypothesis: two-sided
Remarks:
  • alternative=“value” is optional and defines the type of alternative hypothesis: “two.sided” = the cumulative distribution functions of x and y differ (A); “greater” = the cumulative distribution function of x lies above that of y (B); “less” = the cumulative distribution function of x lies below that of y (C). Default is “two.sided”.
  • exact=value is optional. If value is not specified, an exact p-value is computed when the product of the sample sizes is less than 10 000 and the approximate p-value otherwise; exact=TRUE requests the exact p-value explicitly. In the case of ties or a one-sided alternative no exact test is computed.
  • $D^+$ is the test statistic for hypothesis (B) with option alternative=“greater” and $D^-$ is the test statistic for hypothesis (C) with option alternative=“less”. Figure 10.1 shows that the empirical distribution function of the healthy subjects lies above that of the subjects with hypertension. Accordingly, the null hypothesis of (B) is rejected while the null hypothesis of (C) is not.
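To see how the alternative argument behaves, the following R sketch runs ks.test once per alternative. The data are hypothetical (not the dataset of Table A.1) and are chosen so that x tends to take smaller values than y; the names alternatives and results are assumptions of the sketch. In ks.test, “greater” refers to the distribution function of x lying above that of y.

```r
# Hypothetical data: x tends to be smaller than y, so the empirical
# distribution function of x lies above that of y
x <- seq(100, 124, by = 1)       # 25 values
y <- seq(120.5, 149.5, by = 1)   # 30 values

# Run the test once per alternative, using the asymptotic p-value
alternatives <- c("two.sided", "greater", "less")
results <- sapply(alternatives, function(a) {
  res <- ks.test(x, y, alternative = a, exact = FALSE)
  c(statistic = unname(res$statistic), p.value = res$p.value)
})

# One column per alternative, rows "statistic" and "p.value"
print(results)
```

With these data the “two.sided” and “greater” columns show a large statistic and a small p-value, whereas the “less” column shows a statistic of zero and a p-value of one, the same pattern as in the blood pressure example.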

Figure 10.1 Cumulative empirical distribution functions of the blood pressure of healthy subjects (bold lines) and subjects with hypertension (non-bold lines).


References

Sheskin D. 2007 Handbook of Parametric and Nonparametric Statistical Procedures, 4th edn. Chapman & Hall.

Steck G.P. 1969 The Smirnov two sample tests as rank tests. The Annals of Mathematical Statistics 40, 1449–1466.
