Have you ever heard of the “butterfly effect”? It is an image from chaos theory in which
a butterfly flaps its wings in one part of the world, setting off a cascade of events
that leads to a tidal wave somewhere else.
It is just a metaphor, but consider the following. Say that I tell you that a butterfly
flaps its wings. Imagine this as a metaphor for movement in one variable. Now I tell
you that every time the butterfly flaps its wings, a body of water moves elsewhere. This is a metaphor for movement in a second variable.
Is there a correlation? Yes: if one variable (the water) tends to change when the other
(the butterfly) changes, then the two are associated and you have a correlation.
However, surely you should have further questions? Most importantly, I have told you
that when the butterfly flaps its wings the water moves, but I have not told you by how
much the water moves. Surely this is the important thing. A butterfly that causes a small
ripple in the water every time it flaps is very different from one that causes
a tidal wave that kills thousands.
So, correlations deal with associations, but they do not really deal with how big
the associations are in the context of the variables.
Covariances are simply correlations that have been adjusted so that they include a
sense of the size of the relative association.
Let us use a business example too. Imagine that I tell you I have a variable measured
in US dollars, and that this variable changes by $1 when something else
happens (e.g. when the oil price drops by 10%). What does this mean to you? Surely
you would tell me that it depends on what a $1 change means in the overall context of my
variable. Specifically, it would depend on the spread of my variable (as measured, for
example, by its standard deviation). For example, the following are two dollar-based variables:
- The USD/EUR exchange rate. At the time of writing this book this exchange rate was
  around $0.8 to the euro, and doesn’t change by more than about $0.25. In this context,
  a change of $1.00 to the USD/EUR exchange rate is massive!
- The gold price. Over the past few years the gold price has varied from about $400 to
  $1,500 per ounce, so a $1,000 range or more. In this context, a $1 change is not very
  big at all!
Do you see that a change of $1 means a very different thing depending on what the
spread/range of the variable is? In the USD/EUR exchange rate, a change of $1 is huge,
whereas in the gold price such a change is small. A correlation coefficient describing
a linear relationship in which $1 is the scale of change is therefore an incomplete statistic
until we know more about the standard deviations of the variables.
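To make the comparison concrete, here is a rough sketch of the arithmetic in Python (not SAS). It uses only the figures quoted above, and it treats the "$0.25" and "$1,000" as stand-ins for each variable's spread. That is my simplifying assumption: they are not measured standard deviations.

```python
# Figures quoted in the text; used as rough spreads, not exact statistics.
change = 1.00            # the $1 change under discussion
usd_eur_spread = 0.25    # exchange rate "doesn't change by more than about $0.25"
gold_spread = 1000.00    # gold price range of roughly $1,000

# Express the same $1 change relative to each variable's spread.
usd_eur_relative = change / usd_eur_spread
gold_relative = change / gold_spread

print(f"USD/EUR: a $1 change is {usd_eur_relative:.1f} times the spread")
print(f"Gold:    a $1 change is {gold_relative:.1%} of the spread")
```

The same $1 is several times the whole spread of one variable and a tiny fraction of the spread of the other, which is the point of the two examples.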
To adjust for this we sometimes use a measure of association called the “covariance.”
The (true) covariance between two variables is a correlation scaled by the standard
deviations of the variables. You don’t need to know exactly how this looks, but just
remember the basic lesson that covariances reflect how much the one variable moves (varies)
given a certain amount of movement in the other variable, expressed in the variables’ own units.
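For readers who do want to see it, the relationship can be written in one line. The symbols are my own notation rather than anything used elsewhere in this book: r_XY is the correlation between X and Y, and s_X and s_Y are their standard deviations.

```latex
\operatorname{cov}(X, Y) = r_{XY}\, s_X\, s_Y
\qquad\Longleftrightarrow\qquad
r_{XY} = \frac{\operatorname{cov}(X, Y)}{s_X\, s_Y}
```

Read from right to left, this says that a correlation is a covariance with the scales of the two variables divided out.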
Covariances are better than correlations – they give more information, and pay attention
to the relative scales of the variables.
However, unlike the correlation, it is not really possible to look at a set of covariances
and see immediately what they mean about the association. Therefore, we use covariances
as the real raw material for more complex statistics (like regression as discussed
later in the book), but we typically do not analyze them directly.
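If you would like to see why raw covariances are hard to read at a glance, the following is a minimal sketch in Python (not SAS). All the data are made up purely for illustration, and the helper functions `covariance` and `correlation` are my own names, not part of any SAS or Python library.

```python
from statistics import mean, stdev

def covariance(xs, ys):
    """Sample covariance (divides by n - 1)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def correlation(xs, ys):
    """Pearson correlation: covariance divided by both standard deviations."""
    return covariance(xs, ys) / (stdev(xs) * stdev(ys))

# Made-up illustrative data, not taken from any real market.
oil_change_pct = [-10.0, -5.0, 0.0, 5.0, 10.0]
small_moves = [-0.9, -0.6, 0.1, 0.4, 1.0]      # a variable with a narrow spread
large_moves = [v * 100 for v in small_moves]    # same pattern, 100 times the spread

r_small = correlation(oil_change_pct, small_moves)
r_large = correlation(oil_change_pct, large_moves)
cov_small = covariance(oil_change_pct, small_moves)
cov_large = covariance(oil_change_pct, large_moves)

print(f"correlation: {r_small:.3f} vs {r_large:.3f}")
print(f"covariance:  {cov_small:.3f} vs {cov_large:.3f}")
```

The two correlations come out the same, while the second covariance is a hundred times the first. The covariance carries the scale of the variables, which is useful as raw material for later procedures but means the number has no fixed range to judge it against.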
It’s more important to understand the notion of covariances than to actually calculate
them – the essential thing is to know that they underlie many critical statistical
procedures. If you do want to generate them, add the keyword COV in the SAS correlation
module.