Correlation, causation, and the messiness of life

Suppose we want to predict how much we are going to pay for gas to heat our home during winter, and suppose we know the amount of sun radiation in the area where we live. In this example, the sun radiation is the independent variable, x, and the bill is the dependent variable, y. It is very important to note that nothing forbids us from inverting the question and asking about the amount of sun radiation, given the bill. If we establish a linear relationship (or any other relation, for that matter), we can go from x to y or vice versa. We call a variable independent because its values are not predicted by the model; instead, it is an input of the model, and the dependent variable is the output. We say that the value of one variable depends on the value of the other because we build a model specifying such a dependency. We are not establishing a causal relationship between the variables, and we are not saying that x causes y. Always remember the following mantra: correlation does not imply causation. Let's develop this idea a little further. Even if we are able to predict the gas bill of a home from the sun radiation, and the sun radiation from the gas bill, we can agree that we cannot control the amount of radiation emitted by the sun by changing the thermostat of our house! However, it is true that higher sun radiation can be related to a lower gas bill.
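A minimal sketch with synthetic data shows that nothing in the statistics forces a direction between sun radiation and the gas bill. The variable names, the linear coefficients, and the noise level below are all hypothetical, made up for illustration rather than taken from real measurements. The correlation coefficient is the same whichever variable comes first, and a least-squares line can be fitted in either direction.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical data: sun radiation (arbitrary units) and gas bill.
# The coefficients (120, -8) and the noise scale (5) are invented for illustration.
n = 200
radiation = rng.uniform(1.0, 10.0, size=n)
bill = 120.0 - 8.0 * radiation + rng.normal(0.0, 5.0, size=n)

# Pearson correlation is symmetric: corr(x, y) == corr(y, x)
r_xy = np.corrcoef(radiation, bill)[0, 1]
r_yx = np.corrcoef(bill, radiation)[0, 1]

# A straight line can be fitted in either direction.
# np.polyfit returns coefficients from highest degree down: [slope, intercept].
slope_bill, intercept_bill = np.polyfit(radiation, bill, deg=1)  # bill from radiation
slope_rad, intercept_rad = np.polyfit(bill, radiation, deg=1)    # radiation from bill

print(f"r(radiation, bill) = {r_xy:.3f}, r(bill, radiation) = {r_yx:.3f}")
print(f"bill      ~ {intercept_bill:.1f} + {slope_bill:.2f} * radiation")
print(f"radiation ~ {intercept_rad:.2f} + {slope_rad:.4f} * bill")
```

The reverse fit is not simply the algebraic inverse of the forward fit: for least squares, the product of the two slopes equals r², so the two lines coincide only when |r| = 1. Either way, the fitted model says nothing about which variable, if any, causes the other.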

It is therefore important to remember that the statistical model we are building is one thing and the physical mechanism relating the variables is another. To establish that a correlation can be interpreted in terms of causation, we need to add a plausible physical mechanism to the description of the problem; a correlation is simply not enough. A very nice and amusing page showing clear examples of correlated variables with no causal relationship can be found at http://www.tylervigen.com/spurious-correlations.
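One common way such spurious correlations arise is that both quantities simply drift over time. The sketch below, under the assumption that random walks are a fair stand-in for trending series, generates pairs of series that are independent by construction and compares how often they show a strong correlation (|r| > 0.5, an arbitrary threshold) when they are plain noise versus when they are random walks. All names and sizes are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

n_pairs, n_steps = 200, 100


def pairwise_r(a, b):
    """Pearson correlation between matching rows of a and b."""
    return np.array([np.corrcoef(x, y)[0, 1] for x, y in zip(a, b)])


# Two sets of series generated independently of each other:
# no series has any influence on its partner.
noise_a = rng.normal(size=(n_pairs, n_steps))
noise_b = rng.normal(size=(n_pairs, n_steps))

# Random walks: cumulative sums of that noise, so each series drifts over time.
walk_a = np.cumsum(noise_a, axis=1)
walk_b = np.cumsum(noise_b, axis=1)

r_noise = pairwise_r(noise_a, noise_b)
r_walk = pairwise_r(walk_a, walk_b)

print(f"share of |r| > 0.5, white noise:  {np.mean(np.abs(r_noise) > 0.5):.2f}")
print(f"share of |r| > 0.5, random walks: {np.mean(np.abs(r_walk) > 0.5):.2f}")
```

Strong correlations are rare between the independent noise series but common between the independent random walks, even though in neither case is there any mechanism linking the two members of a pair.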

Is a correlation useless when it comes to establishing a causal link? Not at all: a correlation can, in fact, support a causal link if we perform a carefully designed experiment. For example, we know that global warming is highly correlated with the increasing levels of atmospheric CO2. From this observation alone, we cannot conclude whether higher temperatures are causing an increase in the levels of CO2 or whether higher levels of CO2 are increasing the temperature.

Moreover, there could be a third variable that we are not taking into account, and this variable could be producing both higher temperatures and higher levels of CO2. However, and pay attention to this, we can do an experiment to gain insight into this problem. One possible experiment is the following: we build a set of glass tanks filled with different quantities of CO2. We can have one with regular air (which contains ~0.04% CO2) and the others with increasing amounts of CO2. We then let these tanks receive sunlight for, let's say, three hours. If we do this, we will verify that tanks with higher levels of CO2 reach higher final temperatures. Hence, we will conclude that CO2 is indeed a greenhouse gas. Using the same experiment, we can also measure the concentration of CO2 at the end of the experiment to check that the rise in temperature does not cause the CO2 level to increase, at least not in air. It is this kind of experimental evidence, together with statistical models, that supports the claim that CO2 emissions contribute to global warming.
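The logic of such an experiment can be illustrated with a toy simulation. This is a minimal sketch under invented assumptions, not a model of the climate: a hidden variable `z` drives both `x` and `y`, and `x` affects `y` with a strength `effect` that we choose (0 means no causal link). In the observational data, `x` follows `z`; in the experimental data, we set `x` ourselves, much as we choose how much CO2 goes into each tank, so `x` no longer depends on `z`. The function name, noise scales, and sample size are hypothetical.

```python
import numpy as np


def simulate(effect, n=10_000, seed=0):
    """Return (observational r, experimental r) for a toy causal model.

    Hypothetical model: a hidden variable z drives both x and y,
    and x affects y with strength `effect` (0 means no causal link).
    """
    rng = np.random.default_rng(seed)

    # Observational data: x is whatever the confounder z makes it.
    z = rng.normal(size=n)
    x_obs = z + rng.normal(scale=0.5, size=n)
    y_obs = effect * x_obs + z + rng.normal(scale=0.5, size=n)

    # Experimental data: we assign x ourselves, independently of z.
    z = rng.normal(size=n)
    x_exp = rng.normal(size=n)
    y_exp = effect * x_exp + z + rng.normal(scale=0.5, size=n)

    r_obs = np.corrcoef(x_obs, y_obs)[0, 1]
    r_exp = np.corrcoef(x_exp, y_exp)[0, 1]
    return r_obs, r_exp


for effect in (0.0, 1.0):
    r_obs, r_exp = simulate(effect)
    print(f"effect={effect}: observational r={r_obs:.2f}, experimental r={r_exp:.2f}")
```

With `effect=0.0` the observational correlation is strong even though `x` has no influence on `y`, while the experimental correlation is close to zero. With `effect=1.0` the experimental correlation stays clearly positive. Assigning `x` by design removes the path through the hidden variable, which is what lets the experiment tell the two situations apart.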

Another important lesson from these examples is that relationships are usually more complicated than a single pair of variables suggests, and other variables are involved. In fact, higher temperatures can also contribute to higher levels of CO2, because oceans are a reservoir of CO2 and CO2 is less soluble in water when temperatures increase. Similarly, even if the sun radiation and the gas bill are connected, and the sun radiation can be used to predict the gas bill, the path between them has several steps. A higher sun radiation means that more energy is delivered to a home. Part of that energy is reflected and part is turned into heat; part of the heat is absorbed by the house, and part is lost to the environment. The amount of heat lost depends on several factors, such as the outside temperature and the wind speed. Then, the gas bill could also be affected by other factors, such as the international price of oil and gas, the costs/profits of the company (and its level of greediness), and how tightly the government regulates the company.

In summary, life is messy, problems are not generally simple to understand, and context is always important. Statistical models can help us achieve better interpretations, reduce the risk of making nonsensical statements, and get better predictions, but none of this is automatic.
