Non-parametric methods

When a training dataset violates the assumptions of a specific probability distribution, tests built on that distribution are no longer reliable, and non-parametric methods are the usual alternative. Non-parametric methods do not assume that the data follow any particular probability distribution. Using them, one can draw inferences and perform hypothesis testing without distributional assumptions, although conditions such as independent observations still apply. Now let's look at a set of non-parametric tests that can be used when a dataset does not conform to the assumptions of any specific probability distribution.
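Before switching to a non-parametric test, it can help to check whether the normality assumption is actually in doubt. The following is a minimal sketch, assuming the Cars93 dataset is loaded from the MASS package and using the Shapiro-Wilk test from base R as one possible check (the variable name sw is only illustrative):

```r
library(MASS)  # assumption: Cars93 comes from the MASS package

# Shapiro-Wilk test: the null hypothesis is that the sample is drawn
# from a normal distribution.
sw <- shapiro.test(Cars93$MPG.city)

# A small p-value is evidence against normality, which would motivate
# the rank-based tests discussed in this section.
print(sw$p.value)
```

A p-value below the chosen significance level suggests that a normal-theory test may not be appropriate for this variable.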

Wilcoxon signed-rank test

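If the assumption of normality is violated, non-parametric methods are required to answer a question such as: is there any difference in city mileage between cars with and without a manual transmission available? Because the two groups are independent, the formula interface of wilcox.test runs the rank sum form of the test, which compares the location of the two groups rather than their means: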

> library(MASS)  # provides the Cars93 dataset

> wilcox.test(Cars93$MPG.city~Cars93$Man.trans.avail, correct = FALSE)

Wilcoxon rank sum test

data: Cars93$MPG.city by Cars93$Man.trans.avail

W = 380, p-value = 1e-06

alternative hypothesis: true location shift is not equal to 0

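Set the argument paired to TRUE if the two samples are matched pairs and do not satisfy the assumption of normality. Here, city and highway mileage are measured on the same cars, so wilcox.test performs the signed rank test on the paired differences: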

> wilcox.test(Cars93$MPG.city, Cars93$MPG.highway, paired = TRUE)

Wilcoxon signed rank test with continuity correction

data: Cars93$MPG.city and Cars93$MPG.highway

V = 0, p-value <2e-16

alternative hypothesis: true location shift is not equal to 0
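To see where the V statistic comes from, the signed rank computation can be reproduced by hand. This is a sketch under the assumption that Cars93 is loaded from MASS; the names d, r, v_manual, and v_r are illustrative only. V is the sum of the ranks of the absolute differences that belong to positive differences, after zero differences are dropped:

```r
library(MASS)  # assumption: Cars93 comes from the MASS package

# Paired differences, one per car.
d <- Cars93$MPG.city - Cars93$MPG.highway

# Zero differences carry no sign information and are removed.
d <- d[d != 0]

# Rank the absolute differences; ties receive average ranks.
r <- rank(abs(d))

# V is the rank sum over the positive differences only.
v_manual <- sum(r[d > 0])

# exact = FALSE requests the normal approximation, which avoids the
# warning R raises when an exact p-value is requested with ties.
v_r <- wilcox.test(Cars93$MPG.city, Cars93$MPG.highway,
                   paired = TRUE, exact = FALSE)$statistic

print(c(manual = v_manual, wilcox.test = unname(v_r)))
```

A value of V = 0, as in the output above, means that no car has a city mileage higher than its highway mileage, so there are no positive differences to sum.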

Mann-Whitney-Wilcoxon test

If two samples are independent rather than matched and do not follow a normal distribution, use the Mann-Whitney-Wilcoxon test, which R reports as the Wilcoxon rank sum test, to test the hypothesis that the two samples differ in location, that is, that one distribution is shifted relative to the other:

> wilcox.test(Cars93$MPG.city~Cars93$Man.trans.avail, data=Cars93)

Wilcoxon rank sum test with continuity correction

data: Cars93$MPG.city by Cars93$Man.trans.avail

W = 380, p-value = 1e-06

alternative hypothesis: true location shift is not equal to 0
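The W statistic reported by wilcox.test can also be reproduced from the ranks directly. The sketch below assumes Cars93 is loaded from MASS, and the names groups, x, y, r, w_manual, and w_r are illustrative. All observations are ranked together, and W is the rank sum of the first group minus its smallest possible value, n1(n1 + 1)/2:

```r
library(MASS)  # assumption: Cars93 comes from the MASS package

# Split city mileage by the two levels of Man.trans.avail.
groups <- split(Cars93$MPG.city, Cars93$Man.trans.avail)
x <- groups[[1]]  # first factor level
y <- groups[[2]]  # second factor level

# Rank the pooled sample; ties receive average ranks.
r <- rank(c(x, y))

# Rank sum of the first group, shifted so that its minimum is 0.
n1 <- length(x)
w_manual <- sum(r[seq_along(x)]) - n1 * (n1 + 1) / 2

# exact = FALSE avoids the warning about exact p-values with ties.
w_r <- wilcox.test(MPG.city ~ Man.trans.avail, data = Cars93,
                   exact = FALSE)$statistic

print(c(manual = w_manual, wilcox.test = unname(w_r)))
```

Defined this way, W coincides with the Mann-Whitney U statistic for the first group, which is why the two test names are used interchangeably.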

Kruskal-Wallis test

To compare more than two groups, we can use the Kruskal-Wallis test, the non-parametric counterpart of one-way ANOVA. It is a rank-based, distribution-free test of whether the groups differ in location, rather than a comparison of group means:

> kruskal.test(Cars93$MPG.city~Cars93$Cylinders, data= Cars93)

Kruskal-Wallis rank sum test

data: Cars93$MPG.city by Cars93$Cylinders

Kruskal-Wallis chi-squared = 68, df = 5, p-value = 3e-13
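The Kruskal-Wallis statistic can be rebuilt from the pooled ranks, which shows why it is compared against a chi-squared distribution with one degree of freedom fewer than the number of groups. This is a sketch assuming Cars93 is loaded from MASS; all variable names are illustrative. The statistic is H = 12 / (N(N + 1)) * sum(R_j^2 / n_j) - 3(N + 1), divided by a correction factor for ties:

```r
library(MASS)  # assumption: Cars93 comes from the MASS package

x <- Cars93$MPG.city
g <- Cars93$Cylinders
n <- length(x)

# Rank the pooled observations; ties receive average ranks.
r <- rank(x)

# Rank sum and size of each cylinder group.
rank_sums   <- tapply(r, g, sum)
group_sizes <- tapply(r, g, length)

# Uncorrected H statistic.
h_raw <- 12 / (n * (n + 1)) * sum(rank_sums^2 / group_sizes) - 3 * (n + 1)

# Tie correction: t is the number of observations sharing each value.
ties <- table(x)
h <- as.numeric(h_raw / (1 - sum(ties^3 - ties) / (n^3 - n)))

# Degrees of freedom and the chi-squared upper-tail p-value.
df <- nlevels(g) - 1
p  <- pchisq(h, df, lower.tail = FALSE)

kw <- kruskal.test(MPG.city ~ Cylinders, data = Cars93)
print(c(manual = h, kruskal.test = unname(kw$statistic)))
```

A significant result only indicates that at least one group differs from the others; it does not say which one.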