How to do it...

In this example, we will use the very famous iris dataset, which contains 50 observations for each of three flower species (Iris setosa, Iris virginica, and Iris versicolor), 150 observations in total. There are four measurements for each observation: the length and width of the sepals and petals. Let's get started:
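Before running the test, it may help to take a quick look at the data. The following lines use only base R and are just an optional sanity check of the structure described above; they are not part of the recipe's output:

str(iris)             # 150 obs. of 5 variables: four numeric measurements plus the Species factor
table(iris$Species)   # 50 observations per species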

  1. First, we run our test. The formula needs to list all of the response variables, separated by |, followed by ~ and the grouping variable. The function returns four different test statistics, each with two p-values (a standard one and a permutation-based one, with the latter being more precise), which gives 4 x 2 = 8 p-values. In this case, all of the statistics agree that the joint distributions differ across the three groups. The second part of the output contains the relative treatment effects, which are very different for each flower type. These are probabilities: each one can be interpreted as the probability that a randomly chosen observation from that group has a greater value of that variable than a randomly chosen observation from the pooled sample (all groups combined). For example, virginica/Sepal.Length=0.75 means that there is a 0.75 probability that a random virginica Sepal.Length is greater than a random Sepal.Length drawn from any of the groups (including virginica itself). These are known as empirical nonparametric relative treatment effects (a small sketch reproducing this interpretation by hand follows this step's output):
library(npmv)    # nonpartest() and ssnonpartest() come from the npmv package
nonpartest(Sepal.Length | Sepal.Width | Petal.Length | Petal.Width ~ Species, data = iris, permreps = 2000)

The preceding code displays the following output of nonpartest: 

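To make the interpretation of the relative treatment effects more concrete, here is a minimal sketch that reproduces the virginica/Sepal.Length value by hand, using only base R. It counts, over all pairs, how often a virginica Sepal.Length exceeds a Sepal.Length drawn from the pooled sample, with ties counted as one half; the variable names are illustrative, and the result should be close (though not necessarily identical, depending on how ties are handled internally) to the value reported by nonpartest:

# Sketch: empirical relative treatment effect for virginica / Sepal.Length
x_all  <- iris$Sepal.Length                               # pooled sample (all three species)
x_virg <- iris$Sepal.Length[iris$Species == "virginica"]  # virginica only
# Probability that a random virginica value exceeds a random pooled value,
# counting ties with weight 0.5
mean(outer(x_virg, x_all, ">") + 0.5 * outer(x_virg, x_all, "=="))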
  2. After (and if) the test rejects, we can do a more focused analysis to identify the groups or variables causing the differences. Because we have four statistics, we need to choose one; this is done via the test= parameter, which should be a vector of 0s with a 1 in the position of the test statistic to be used (in this case, we chose the ANOVA-type one). If we specify factors.and.variables = TRUE, the subset analysis is run over both the factor levels (groups) and the variables; otherwise, only the factor levels are examined. The procedure starts with a global test involving all of the groups and cascades down until the significant differences are pinpointed. What we see here is that we first reject the equality of the distributions for the three flowers, and this can be traced back to the fact that the setosa-versicolor and setosa-virginica pairs are different. After that, the same procedure is applied to the variables: all four of them (Petal.Width, Petal.Length, Sepal.Width, and Sepal.Length) differ from group to group, as shown in the following example (a small univariate sanity check follows this step's output):
ssnonpartest(Sepal.Length | Sepal.Width | Petal.Length | Petal.Width ~ Species, data = iris, test = c(1, 0, 0, 0), alpha = 0.05,
             factors.and.variables = TRUE)

The preceding command displays the following output: 

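As an optional sanity check of the pairwise differences reported above, we can look at a single variable for a single pair of groups with a plain univariate rank test from base R. This is not part of the npmv workflow, just a quick way to confirm that the differences picked up by the multivariate procedure are also visible one variable at a time (the variable names here are illustrative):

# Sketch: univariate check of the setosa vs. versicolor difference on Petal.Length
setosa_pl     <- iris$Petal.Length[iris$Species == "setosa"]
versicolor_pl <- iris$Petal.Length[iris$Species == "versicolor"]
wilcox.test(setosa_pl, versicolor_pl)   # Wilcoxon rank-sum test; expect a tiny p-value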
  3. We finally conclude that all of the groups are different, and that these differences are driven by all of the variables.
