How to do it...

In this example, we will use a dataset that contains several days of data for a website selling furniture. On each day, the website users are split evenly between two background colors (red or blue). Within each color group, 50% of them are shown a classic font type and 50% a modern font type. The objective is to understand whether these two factors have any impact on the sales of this website. Finally, if there are differences, we want to characterize them:

Load the data, as follows:

library(dplyr)
r = read.csv("./2wayanova.csv")
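If the CSV file is not at hand, the rest of the recipe can still be followed on a stand-in dataset. The sketch below simulates one; the column names Colour, Font, and Sales are taken from the formula used in this recipe, while the cell size, effect sizes, and noise level are assumptions made purely for illustration and do not reproduce the book's data:

```r
# Hypothetical stand-in for 2wayanova.csv: simulated data, NOT the book's dataset
set.seed(1)                        # make the simulation reproducible
n_per_cell <- 25                   # assumed number of observations per cell
sim <- expand.grid(
  Colour = c("Red", "Blue"),
  Font   = c("Classic", "Modern"),
  rep    = seq_len(n_per_cell)
)

# Assumed effects: two main effects, no interaction, Gaussian noise
sim$Sales <- 100 +
  5 * (sim$Colour == "Blue") +
  3 * (sim$Font == "Modern") +
  rnorm(nrow(sim), sd = 4)
sim$rep <- NULL                    # drop the helper column

str(sim)                           # Colour and Font are factors, Sales is numeric
table(sim$Colour, sim$Font)        # balanced design: same count in every cell
```

Assigning sim to r would let the remaining steps run unchanged, although the p-values and the row numbers of any unusual observations will differ from the ones quoted in this recipe.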

We need to prepare the ANOVA table via the aov() function. The formula that we will use is Sales ~ Colour + Font + Font*Colour, meaning that we want to model Sales as a function of two factors: the color and the font type. In R's formula syntax, Font*Colour already expands to both main effects plus the Font:Colour interaction, so this formula is equivalent to Sales ~ Colour * Font. We want to test the interaction and proceed with the main effects only if it is found to be not significant. Then, we draw a quantile-quantile plot and run a Shapiro-Wilk test to determine whether the residuals are Gaussian (ANOVA needs the residuals to be Gaussian and to have the same variance in each group):

d = aov(Sales ~ Colour + Font + Font*Colour, data = r)
plot(d, 2)
shapiro.test(residuals(d))
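As a side check on the formula itself, the terms that R derives from it can be inspected without fitting anything. This small sketch compares the formula used above with the shorthand Colour * Font; the variable names long, short, and same_terms are only illustrative:

```r
# Compare the model terms R derives from the recipe's formula and the shorthand
long  <- Sales ~ Colour + Font + Font*Colour
short <- Sales ~ Colour * Font

attr(terms(long),  "term.labels")   # main effects plus one interaction term
attr(terms(short), "term.labels")

# TRUE when both formulas describe the same set of model terms
same_terms <- identical(attr(terms(long),  "term.labels"),
                        attr(terms(short), "term.labels"))
same_terms
```

Either spelling can therefore be passed to aov(); the longer one simply makes the two main effects explicit.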

The plot(d, 2) call generates the following quantile-quantile plot. Normality doesn't seem reasonable here, as the residuals don't lie along a line:

The Shapiro-Wilk test rejects the null hypothesis of normality:

Evidently, the quantile-quantile plot indicates that the residuals are not Gaussian, mainly because of a few observations on the left. We can identify them by clicking on them and pressing Esc (plot() also labels the most extreme points with their observation numbers). Hopefully, after removing them, we can rerun our analysis and get correct estimates. Another way of testing normality is the Shapiro-Wilk statistic, which tests the null hypothesis that the residuals are Gaussian. In this case, we reject it since we get a p-value of 8.624e-05, confirming what we saw in the quantile-quantile plot. So, we remove observations 22 and 44:

r = r[-c(22, 44), ]
d = aov(Sales ~ Colour + Font + Font*Colour, data = r)
plot(d, 2)
shapiro.test(residuals(d))
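The two row numbers removed above were read off the plot. A non-interactive alternative, sketched here under stated assumptions, is to rank the observations by their absolute standardized residuals. The data below is simulated (it is not the book's dataset), and the two outliers are planted deliberately at rows 7 and 30 so that the result can be checked:

```r
# Simulated stand-in data (NOT the book's dataset) with two planted low outliers
set.seed(42)
toy <- expand.grid(Colour = c("Red", "Blue"),
                   Font   = c("Classic", "Modern"),
                   rep    = 1:15)
toy$Sales <- 100 + rnorm(nrow(toy), sd = 3)
toy$Sales[c(7, 30)] <- toy$Sales[c(7, 30)] - 40   # planted outliers

fit <- aov(Sales ~ Colour * Font, data = toy)

# Rank observations by absolute standardized residual, largest first
std_res  <- rstandard(fit)
suspects <- order(abs(std_res), decreasing = TRUE)[1:2]
suspects                       # row positions of the two most extreme residuals
round(std_res[suspects], 2)    # their standardized residuals
```

How many points to inspect, and whether removing them is justified, remains a judgment call that depends on why those observations are unusual.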

The plot(d, 2) call now generates the following quantile-quantile plot. The residuals lie along the line once we remove the two abnormal values:

The following screenshot shows that the Shapiro-Wilk result looks much better now; we do not reject the normality of the residuals:

This looks much better since most of the points lie near the line. This is confirmed by the Shapiro-Wilk p-value of 0.8688 (we don't reject the null hypothesis of normality). Once we are satisfied with the normality of the residuals, we can evaluate whether the variance is similar across the different groups. Here, we can conclude that, except for a few observations, the spread of the data is fairly similar:

plot(d, 1)

The plot(d, 1) call generates the following plot of residuals versus fitted values. There is no obvious structure in the residuals, and the variance seems to be constant as we move along the x axis:
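The visual check of equal variance can be complemented with a formal test. The sketch below uses Bartlett's test from base R on one grouping factor per Colour/Font cell; this test is not part of the original recipe, it is sensitive to departures from normality, and the data is simulated with equal variance by construction rather than taken from the book:

```r
# Simulated stand-in data (NOT the book's dataset), equal variance by construction
set.seed(7)
toy <- expand.grid(Colour = c("Red", "Blue"),
                   Font   = c("Classic", "Modern"),
                   rep    = 1:20)
toy$Sales <- 100 + 5 * (toy$Colour == "Blue") + rnorm(nrow(toy), sd = 4)

# One grouping factor with a level for each Colour/Font cell
cell <- interaction(toy$Colour, toy$Font)

# Bartlett's test: the null hypothesis is equal variance in all cells
bt <- bartlett.test(toy$Sales, cell)
bt$p.value

# Per-cell standard deviations, for a direct look at the spread
tapply(toy$Sales, cell, sd)
```

On the recipe's data, the same two calls would take r$Sales and interaction(r$Colour, r$Font) as arguments.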

Finally, we get the ANOVA table using the following code. The interaction is clearly not significant:

anova(d)

This is the resulting ANOVA table:
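Since the stated goal is also to characterize any differences, one possible follow-up, once the interaction is judged negligible, is to compare the levels of each factor with Tukey's honest significant differences. This step is not shown in the recipe; the sketch below is an assumption about how it could be done with base R's TukeyHSD(), again on simulated data rather than the book's dataset:

```r
# Simulated stand-in data (NOT the book's dataset)
set.seed(3)
toy <- expand.grid(Colour = c("Red", "Blue"),
                   Font   = c("Classic", "Modern"),
                   rep    = 1:20)
toy$Sales <- 100 + 5 * (toy$Colour == "Blue") + rnorm(nrow(toy), sd = 4)

# Additive model, as one might fit after judging the interaction negligible
fit <- aov(Sales ~ Colour + Font, data = toy)

# Pairwise differences between the levels of each factor, with adjusted p-values
hsd <- TukeyHSD(fit)
hsd$Colour    # columns: diff, lwr, upr, p adj
hsd$Font
```

The diff column gives the estimated difference in mean Sales between two levels, and lwr and upr bound its confidence interval.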
