How to do it...

In this exercise, we will focus on a simple ANOVA example and show the mechanics of the sums of squares and F-tests. We will use a balanced dataset and calculate the sums of squares and the F-tests manually. The dataset contains weight measurements for animals, which depend on the food type they were fed and the lot they were assigned to.

In R, we can compute the ANOVA table by calling the anova function on a linear model fitted with the lm or aov functions:

  1. Load the dataset, as follows:
data = read.csv("./anova__lot_type.csv") 
  1. We need to compute the sum of squares for the first factor. This is defined as the sum of squared deviations between the model's predictions and the overall mean of the target variable:
result = lm(Result ~ Lot,data=data)
SS_LOT = sum((predict(result)-mean(data$Result))**2)
  1. Now, we will compute the sum of squares for the second factor. For the Lot factor, we fitted a model containing only Lot and computed its sum of squares. Here, we add the food type to a model that already contains Lot, so the sum of squares for the food type is calculated after controlling for the Lot effect. This is called the sequential sum of squares, and it is what R reports by default when we use the automatic ANOVA functions. In ANOVA terminology, this is the Type I sum of squares. Its major drawback is that, in general, it depends on the order in which we specify the factors: the first factor is tested in isolation, and the second factor is tested after the first has been added. In this case, we purposely added the block/control factor first and the main effect second. We usually want to test the main effect conditional on the block effects (blocking just means adding an extra nuisance factor to gain precision, in our case the Lot); we don't care much about the block effect per se. But what would have happened if we had two main effects? In what order should they have been added? Take a look at the following code:
result = lm(Result ~ Lot + Food.Type,data=data)
SS_FOODTYPE = sum((predict(result)-mean(data$Result))**2) - SS_LOT
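
The question of ordering can be explored with a small experiment. The following is a minimal sketch, not part of the original recipe: it uses simulated data (the level names North, A, B, and C, the effect sizes, and the helper seq_ss are all hypothetical) and only borrows the column names Result, Lot, and Food.Type from the dataset above. It suggests that in a balanced design the sequential sums of squares are the same in either order, whereas in an unbalanced design they change:

```r
# Simulated, balanced data: 2 lots x 3 food types x 10 animals = 60 rows.
# Level names and effect sizes are made up for illustration.
set.seed(1)
balanced <- expand.grid(Lot = c("North", "South"),
                        Food.Type = c("A", "B", "C"),
                        rep = 1:10)
balanced$Result <- rnorm(nrow(balanced), mean = 50, sd = 5) +
  ifelse(balanced$Lot == "South", 2, 0) +
  ifelse(balanced$Food.Type == "C", 4, 0)

# Hypothetical helper: sequential (Type I) sums of squares, named by term
seq_ss <- function(formula, data) {
  tab <- anova(lm(formula, data = data))
  setNames(tab[["Sum Sq"]], rownames(tab))
}

# Balanced design: Lot gets the same sum of squares in either order
b1 <- seq_ss(Result ~ Lot + Food.Type, balanced)
b2 <- seq_ss(Result ~ Food.Type + Lot, balanced)
same_balanced <- isTRUE(all.equal(b1[["Lot"]], b2[["Lot"]]))

# Unbalance the design by thinning out one cell (South lot, food type C)
drop <- balanced$Lot == "South" & balanced$Food.Type == "C" & balanced$rep > 3
unbalanced <- balanced[!drop, ]

# Unbalanced design: the sum of squares for Lot now depends on the order
u1 <- seq_ss(Result ~ Lot + Food.Type, unbalanced)
u2 <- seq_ss(Result ~ Food.Type + Lot, unbalanced)
same_unbalanced <- isTRUE(all.equal(u1[["Lot"]], u2[["Lot"]]))

print(c(balanced = same_balanced, unbalanced = same_unbalanced))
```

Because the dataset in this recipe is balanced, the order of Lot and Food.Type would not be expected to change the results here; the order matters once the cell counts differ.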

  1. Now, we will compute the residual sum of squares. It is defined as the sum of the squared differences between the predictions and the target variable. Note that this is the variability in the target variable that the model can't explain:
SS_ERROR = sum((predict(result)-data$Result)**2)
  1. We now compute the F statistics. Each one is defined as the ratio between a factor's sum of squares and the residual sum of squares, with each sum of squares first divided by its degrees of freedom. The Lot effect has 1 degree of freedom, and the food type has 2 (in each case, the number of levels minus one). The residual degrees of freedom are equal to the number of observations minus the number of estimated parameters (in this case, we have 60 observations and 4 estimated parameters: an intercept, a coefficient for the south lot, and two coefficients for the food type). Note that the p-values are calculated as the area to the right of the test statistic under an F distribution. It can be shown that, because the F statistic is the ratio of two independent sums of squares (each following a scaled chi-square distribution under the null hypothesis) divided by their respective degrees of freedom, it follows an F distribution.
FF_LOT = (SS_LOT/1)/(SS_ERROR/56)
FF_FOODTYPE = (SS_FOODTYPE/2)/(SS_ERROR/56)
pval_LOT = 1-pf(FF_LOT,1,56)
pval_FOODTYPE = 1-pf(FF_FOODTYPE,2,56)
  1. Let's print all of the values that we have calculated, as follows:
print(paste("SS(ERROR) = ",SS_ERROR))  
print(paste("SS(LOT) =",SS_LOT,"/F(LOT) = ",FF_LOT,"pvalue = ",pval_LOT))
print(paste("SS(FOODTYPE) =",SS_FOODTYPE,"/F(FOODTYPE) = ",FF_FOODTYPE,"pvalue = ",pval_FOODTYPE))
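
The manual steps above can also be checked on data we generate ourselves. The following is a minimal sketch under stated assumptions: the data frame sim is simulated (its level names and values are hypothetical), and only the column names follow the recipe. It verifies that the three sums of squares add up to the total variability around the grand mean, and it reads the degrees of freedom from the data and the fitted model instead of typing them in:

```r
# Simulated stand-in for the recipe's dataset (values are made up)
set.seed(42)
sim <- expand.grid(Lot = c("North", "South"),
                   Food.Type = c("A", "B", "C"),
                   rep = 1:10)
sim$Result <- rnorm(nrow(sim), mean = 50, sd = 5)

fit_lot  <- lm(Result ~ Lot, data = sim)
fit_full <- lm(Result ~ Lot + Food.Type, data = sim)

grand_mean <- mean(sim$Result)
ss_total <- sum((sim$Result - grand_mean)^2)                 # total variability
ss_lot   <- sum((fitted(fit_lot) - grand_mean)^2)            # Lot on its own
ss_food  <- sum((fitted(fit_full) - grand_mean)^2) - ss_lot  # food type after Lot
ss_error <- sum(residuals(fit_full)^2)                       # unexplained part

# The pieces should add up to the total sum of squares
decomposition_ok <- isTRUE(all.equal(ss_total, ss_lot + ss_food + ss_error))

# Degrees of freedom taken from the data and the fitted model
df_lot   <- nlevels(sim$Lot) - 1        # 2 levels - 1 = 1
df_food  <- nlevels(sim$Food.Type) - 1  # 3 levels - 1 = 2
df_error <- df.residual(fit_full)       # 60 observations - 4 parameters = 56

f_food <- (ss_food / df_food) / (ss_error / df_error)
# lower.tail = FALSE gives the area to the right of the statistic
p_food <- pf(f_food, df_food, df_error, lower.tail = FALSE)
print(c(F = f_food, p = p_food, decomposition_ok = decomposition_ok))
```

Using pf with lower.tail = FALSE is equivalent to 1 - pf(...), but it tends to be more accurate for very small p-values.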

The following screenshot shows the sums of squares, F values, and p-values:

  1. The ANOVA table can be printed using the following syntax. Its sums of squares, F values, and p-values should match our manual calculations:
anova(result)

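The preceding code prints the ANOVA table, with one row each for Lot, Food.Type, and the residuals, and columns for the degrees of freedom, sums of squares, mean squares, F values, and p-values.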
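
Rather than comparing the printed numbers by eye, the table can be queried directly. The following is a minimal sketch, again on simulated data (the data frame demo and its level names are hypothetical): it pulls individual entries out of the object returned by anova, rebuilds one F value from the table's own sums of squares, and checks that fitting the same model with aov leads to the same F values:

```r
# Simulated stand-in data; only the column names follow the recipe
set.seed(7)
demo <- expand.grid(Lot = c("North", "South"),
                    Food.Type = c("A", "B", "C"),
                    rep = 1:10)
demo$Result <- rnorm(nrow(demo), mean = 50, sd = 5)

fit <- lm(Result ~ Lot + Food.Type, data = demo)
tab <- anova(fit)  # a data frame with one row per term plus Residuals

# Individual entries can be addressed by row and column name
ss_food  <- tab["Food.Type", "Sum Sq"]
df_food  <- tab["Food.Type", "Df"]
ms_error <- tab["Residuals", "Mean Sq"]

# Rebuild the F value for the food type from those entries
f_manual <- (ss_food / df_food) / ms_error
f_matches <- isTRUE(all.equal(f_manual, tab["Food.Type", "F value"]))

# The same model fitted with aov should give the same F values
aov_tab <- summary(aov(Result ~ Lot + Food.Type, data = demo))[[1]]
aov_matches <- isTRUE(all.equal(tab[["F value"]][1:2],
                                aov_tab[["F value"]][1:2]))

print(c(f_matches = f_matches, aov_matches = aov_matches))
```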
