Groups comparison

One pretty common statistical analysis is group comparison. We may be interested in how well patients respond to a certain drug, the reduction of car accidents by the introduction of a new traffic regulation, student performance under different teaching approaches, and so on.

Sometimes, this type of question is framed under the hypothesis testing scenario with the goal of declaring a result statistically significant. Relying only on statistical significance can be problematic for many reasons: on the one hand, statistical significance is not equivalent to practical significance; on the other hand, a really small effect can be declared significant just by collecting enough data. The idea of hypothesis testing is connected to the concept of p-values. This is not a fundamental connection but a cultural one; people are used to thinking that way mostly because that's what they learn in most introductory statistics courses. There is a long record of studies and essays showing that, more often than not, p-values are used and interpreted the wrong way, even by people who use them on a daily basis.

Instead of doing hypothesis testing, we are going to take a different route and focus on estimating the effect size, that is, on quantifying the difference between two groups. One advantage of thinking in terms of effect size is that we move away from yes-no questions like Does it work? Is there any effect? toward more nuanced questions like How well does it work? How large or small is the effect?

The effect size is just a way to quantify the size of the difference between two groups.

Sometimes, when comparing groups, people talk about a control group and a treatment group (or maybe more than one control and treatment group). This makes sense, for example, when we want to test a new drug: because of the placebo effect and other reasons, we want to compare the new drug (the treatment) against a control group (a group not receiving the drug). In this case, we want to know how well the drug works compared to doing nothing (or, as is generally done, against the placebo effect). One interesting alternative question would be to ask how good a new drug is compared with the (already approved) most popular drug used to treat that illness. In such a case, the control group cannot be a placebo; it should be the other drug. Bogus control groups are a splendid way to lie using statistics.

For example, imagine you work for a dairy product company that wants to sell overly sugared yogurts to kids by telling their dads and moms that this particular yogurt boosts the immune system or helps their kids grow stronger. One way to cheat and falsely back up your claims with data is by using milk or even water as a control group, instead of another cheaper, less sugary, less marketed yogurt. It may sound silly when I put it this way, but a lot of actual research is done this way. In fact, I am describing actual papers, not imaginary hypothetical scenarios. When someone says something is harder, better, faster, stronger, remember to ask what the baseline used for the comparison was.

To compare groups, we must decide which feature (or features) we are going to use for the comparison. A very common feature is the mean of each group. Because we are Bayesian, we will work to obtain a posterior distribution of the differences of means between groups and not just a point-estimate of the differences. To help us see and interpret such a posterior, we are going to use three tools:

  • A posterior plot with a reference value
  • Cohen's d
  • The probability of superiority

In the previous chapter, we already saw an example of how to use the function az.plot_posterior with a reference value; we will see another example soon. The novelties here are Cohen's d and the probability of superiority, which are two popular ways to express the effect size. Let's take a look at them.
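Before going into the details, here is a minimal sketch of both quantities computed with NumPy on two synthetic groups (the data and group names are made up for illustration, not taken from the book). Cohen's d scales the difference of means by a pooled standard deviation, and the probability of superiority is the chance that a randomly chosen member of one group exceeds a randomly chosen member of the other:

```python
# Sketch (synthetic data): Cohen's d and the probability of superiority
# for two groups, e.g. a treatment group and a control group.
import math
import numpy as np

rng = np.random.default_rng(42)
treatment = rng.normal(loc=1.0, scale=1.0, size=1000)  # hypothetical treatment data
control = rng.normal(loc=0.0, scale=1.0, size=1000)    # hypothetical control data

# Cohen's d: difference of means divided by the pooled standard deviation
pooled_sd = math.sqrt((treatment.var() + control.var()) / 2)
cohen_d = (treatment.mean() - control.mean()) / pooled_sd

# Probability of superiority, two ways:
# 1) analytic, assuming both groups are normal: ps = Phi(d / sqrt(2)),
#    where Phi is the standard normal CDF (written here with erf)
z = cohen_d / math.sqrt(2)
ps_analytic = 0.5 * (1 + math.erf(z / math.sqrt(2)))
# 2) empirical: fraction of all (treatment, control) pairs where
#    the treatment value is larger than the control value
ps_empirical = (treatment[:, None] > control[None, :]).mean()

print(f"Cohen's d: {cohen_d:.2f}")
print(f"probability of superiority (analytic): {ps_analytic:.2f}")
print(f"probability of superiority (empirical): {ps_empirical:.2f}")
```

In a Bayesian workflow, we would compute these quantities from the posterior samples of the group means and standard deviations rather than from the raw data, which yields a full distribution of effect sizes instead of a single number.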
