Dealing with correlated variables

We know from Chapter 3, Modeling with Linear Regression, that tricky things await us when we deal with (highly) correlated variables. Correlated variables translate into wider ranges of coefficient combinations that are able to explain the data; from a complementary point of view, correlated data has less power to restrict the model. A similar problem occurs when the classes become perfectly separable, that is, when there is no overlap between the classes given the linear combination of the variables in our model.

Using the iris dataset, you can try running model_1, but this time using the petal_width and petal_length variables. You will find that the β coefficients are broader than before, and also that the 94% HPD black band in Figure 4.5 is much wider.
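The following is a minimal sketch of that experiment, assuming model_1 takes the same form as the multiple logistic regression defined earlier in the chapter (names such as x_1, y_1, and trace_1 follow that convention):

import pandas as pd
import pymc3 as pm
import seaborn as sns

iris = sns.load_dataset('iris')  # the chapter loads the same data from a CSV file
df = iris[iris['species'] != 'virginica']
y_1 = pd.Categorical(df['species']).codes
x_n = ['petal_length', 'petal_width']  # two highly correlated predictors
x_1 = df[x_n].values

with pm.Model() as model_1:
    α = pm.Normal('α', mu=0, sd=10)
    β = pm.Normal('β', mu=0, sd=2, shape=len(x_n))
    μ = α + pm.math.dot(x_1, β)
    θ = pm.Deterministic('θ', pm.math.sigmoid(μ))
    yl = pm.Bernoulli('yl', p=θ, observed=y_1)
    trace_1 = pm.sample(1000)

Plotting the posterior (for example, with az.plot_forest) and comparing it against the fit obtained with the sepal variables should make the widening of the marginals evident. To see why this happens, we can compute the pairwise correlations between the four variables: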

import numpy as np
import seaborn as sns

corr = iris[iris['species'] != 'virginica'].corr()
mask = np.tri(*corr.shape).T  # hides the upper triangle and the diagonal
sns.heatmap(corr.abs(), mask=mask, annot=True, cmap='viridis')

Figure 4.7 is a heat map showing that for the sepal_length and sepal_width variables used in the first example, the correlation is not as high as the correlation between the petal_length and petal_width variables used in the second example:

Figure 4.7

To generate Figure 4.7, we have used a mask to remove the upper triangle and the diagonal elements of the heat map, since these are uninformative, given the lower triangle. Also note that we have plotted the absolute value of the correlation, since at this moment we do not care about the sign of the correlation between variables, only about its strength.

One solution when dealing with (highly) correlated variables is simply to remove one (or more) of the correlated variables. Another option is to put more information into the prior; this can be achieved using informative priors if we have useful information. For weakly-informative priors, Andrew Gelman and the Stan Team recommend (https://github.com/stan-dev/stan/wiki/Prior-Choice-Recommendations) scaling all non-binary variables to have a mean of 0 and then using:

$\beta \sim \text{Student-}t(\nu, \mu=0, sd)$

Here, sd should be chosen in order to weakly inform us about the expected values for the scale. The normality parameter, ν, is suggested to be around 3-7. This prior is saying that, in general, we expect the coefficient to be small, but we use fat tails because occasionally we will find some larger coefficients. As we saw in Chapter 2, Programming Probabilistically, and Chapter 3, Modeling with Linear Regression, using a Student's t-distribution leads to a more robust model than using a Gaussian distribution.
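As a sketch, this recommendation could look as follows in PyMC3, reusing the x_1 and y_1 data from before; the scale of 2.5 and ν = 4 are illustrative choices, not part of the recommendation:

import pymc3 as pm

x_c = x_1 - x_1.mean(axis=0)  # center the non-binary predictors

with pm.Model() as model_wi:
    α = pm.Normal('α', mu=0, sd=10)
    # weakly-informative, fat-tailed prior on the coefficients
    β = pm.StudentT('β', nu=4, mu=0, sd=2.5, shape=x_c.shape[1])
    μ = α + pm.math.dot(x_c, β)
    θ = pm.math.sigmoid(μ)
    yl = pm.Bernoulli('yl', p=θ, observed=y_1)
    trace_wi = pm.sample(1000)

The fat tails of the Student's t-distribution let the sampler move toward larger coefficient values when the data demands it, while still concentrating most of the prior mass near zero.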
