Calculating ANOVA sum of squares and F tests

Analysis of Variance (ANOVA) is a technique for analyzing the differences between the means of several groups (it is essentially an extension of the t-test to more than two samples). It is closely tied to a statistical discipline known as experimental design, which studies how to collect the data, how to lay out an experiment, and which variables should be measured.

In statistics, correlation is not the same as causality: two phenomena might be correlated, but deducing causality out of that correlation is usually wrong. For example, most animals wake up just before dawn, but we can't deduce that waking up causes the sunlight to appear.

A very important question, then, is: how can we determine causality within a statistical framework? The way we identify causality in statistics is by first laying out an experiment in a structured way. We identify the relevant factors that we think cause a response in a target variable, we identify blocking factors (factors that we are not interested in per se, but that could explain part of the variability of the data), and we collect the data in accordance with the framework and budget that we defined.

The most important part is that we define one level of the factor to be a baseline/placebo that we can compare all of the treatments against. We do this because we usually want to test applying a treatment/action versus not doing anything. Because applying the treatment will usually be more expensive than not doing anything, we want to be sure that there is a measurable effect (a statistically significant effect).

For example, if we have a website, we can change the background color to red, green, or blue. We will be interested in evaluating whether the background color has any impact on the purchases that people make through the website. In statistical terms, our null hypothesis is that there is no difference that's attributable to the colors. Our alternative hypothesis is that there is at least one color that induces a different response from the rest of them.  
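Written out, the two hypotheses for the color example look as follows. This is only a sketch of the notation: the symbols $\mu_{\text{red}}$, $\mu_{\text{green}}$, and $\mu_{\text{blue}}$ are assumed here to denote the mean purchases under each background color, and they do not appear in the original text.

```latex
\begin{aligned}
H_0 &: \mu_{\text{red}} = \mu_{\text{green}} = \mu_{\text{blue}} \\
H_1 &: \mu_i \neq \mu_j \quad \text{for at least one pair of colors } i \neq j
\end{aligned}
```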

If we find evidence against the null hypothesis, we will then be interested in finding which colors lead to a statistically different amount of purchases. Many websites are tuned using this data-driven approach, which is known as A/B testing. Of course, ANOVA can be used in any context and in any industry. The central assumption is that the data within each group follows a Gaussian distribution, which is a fairly lax assumption that holds approximately in many situations.

ANOVA tests these hypotheses by decomposing the variability in the data into two parts: one part that is related to the variability between groups (color, in our example), and another part that is related to the internal variability within each group. The former is usually referred to as explained variability (because we can attribute it to a group, such as a color effect), while the latter is referred to as unexplained variability (we can't explain why the data deviates from the group means). If the null hypothesis were true (the color is not related to the sales), we would expect both variabilities to be quite similar once each is scaled by its degrees of freedom. This would imply that the fluctuations of the data around the group means are similar to the fluctuations of the data with respect to the global mean. If there were a color effect, we should see that the between-group variability is substantially greater than the within-group variability. Operationally, this is done by computing the sum of the squared deviations for the within and between components. If the relevant statistic (the F statistic, which is the ratio of the between mean square to the within mean square) is large, we can conclude that the factor (in this case, the color) is statistically significant.
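The decomposition described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the book's own code: the function name `one_way_anova` and the purchase counts are made up for the example, and only the standard library is used.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (ss_between, ss_within, f_stat) for a dict of group -> values.

    Hypothetical helper written for this illustration only.
    """
    all_values = [x for values in groups.values() for x in values]
    grand_mean = mean(all_values)
    k = len(groups)        # number of groups (colors)
    n = len(all_values)    # total number of observations

    # Explained variability: group means around the global mean,
    # weighted by the size of each group.
    ss_between = sum(
        len(values) * (mean(values) - grand_mean) ** 2
        for values in groups.values()
    )

    # Unexplained variability: observations around their own group mean.
    ss_within = sum(
        (x - mean(values)) ** 2
        for values in groups.values()
        for x in values
    )

    # Scale each sum of squares by its degrees of freedom.
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ss_between, ss_within, ms_between / ms_within


# Made-up purchase counts per background color (assumption, not real data).
purchases = {
    "red": [10, 12, 11, 13],
    "green": [14, 15, 13, 16],
    "blue": [10, 11, 12, 11],
}

ss_between, ss_within, f_stat = one_way_anova(purchases)
print(f"SS between = {ss_between:.3f}")
print(f"SS within  = {ss_within:.3f}")
print(f"F          = {f_stat:.3f}")
```

To turn the F statistic into a p-value, it would be compared against an F distribution with k - 1 and n - k degrees of freedom (2 and 9 here), for example with `scipy.stats.f.sf` if SciPy is available.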

The following screenshot shows ANOVA when the group means are the same:

The following screenshot shows ANOVA when the group means are different:
