Modifying the data before running models

One simple way to remove the correlation between $\alpha$ and $\beta$ is to center the $x$ variable. For each $x$ data point, we subtract the mean of the $x$ variable ($\bar{x}$):

$$\tilde{x} = x - \bar{x}$$

As a result, $\tilde{x}$ will be centered at 0, and hence the pivot point when changing the slope is exactly the intercept; thus, the plausible parameter space becomes more circular and less correlated. Be sure to complete exercise 6 (in the Exercises section) to see the differences between centering and not centering the data.
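As a quick numerical illustration of centering, here is a least-squares sketch with synthetic data (the variable names and numbers are hypothetical, and ordinary least squares stands in for the full model): a predictor far from zero is centered, the slope is unaffected, and the intercept becomes the fitted value of $y$ at the mean of $x$.

```python
import numpy as np

# Synthetic data (hypothetical): the predictor x lives far from zero,
# which is what couples the intercept and the slope in the first place.
rng = np.random.default_rng(42)
x = rng.uniform(10, 20, size=200)
y = 2.0 + 0.5 * x + rng.normal(0, 1, size=200)

x_c = x - x.mean()  # centered predictor, now with mean 0

# Ordinary least squares on raw and centered data
# (np.polyfit with degree 1 returns [slope, intercept]).
beta, alpha = np.polyfit(x, y, 1)
beta_c, alpha_c = np.polyfit(x_c, y, 1)

# Centering leaves the slope unchanged; the new intercept is the
# fitted value of y at the mean of x.
print(np.isclose(beta, beta_c))                      # → True
print(np.isclose(alpha_c, alpha + beta * x.mean()))  # → True
```

The same geometry applies to the posterior of a Bayesian linear model: after centering, changing the slope pivots the line around the data's center of mass rather than a distant $x = 0$.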

Centering data is not only a computational trick; it can also help in interpreting the results. The intercept is the value of $y$ when $x = 0$. For many problems, this interpretation has no real meaning. For example, for quantities such as height or weight, a value of zero is meaningless, and hence the intercept does not help to make sense of the data. Instead, when we center the variables, the intercept is always the value of $y$ at the mean value of $x$.

For some problems, estimating the intercept is useful precisely because it is not feasible to experimentally measure the value of $y$ at $x = 0$, and so the estimated intercept can provide us with valuable information. However, extrapolations can be problematic, so be careful when doing this!

We may want to report the estimated parameters in terms of the centered data or in terms of the uncentered data, depending on our problem and audience. If we need to report the parameters as if they were determined in the original scale, we can do the following to put them back into the original scale:

$$\alpha = \tilde{\alpha} - \tilde{\beta}\,\bar{x}$$

This correction is the result of the following algebraic reasoning:

$$y = \tilde{\alpha} + \tilde{\beta}\,\tilde{x} = \tilde{\alpha} + \tilde{\beta}\,(x - \bar{x}) = \tilde{\alpha} - \tilde{\beta}\,\bar{x} + \tilde{\beta}\,x$$

Therefore, it follows that equation 3.5 is true and also that:

$$\beta = \tilde{\beta}$$
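The back-transformation can be checked numerically. The sketch below (hypothetical data and names, with least squares standing in for the full model) fits on the centered scale, applies $\alpha = \tilde{\alpha} - \tilde{\beta}\,\bar{x}$ and $\beta = \tilde{\beta}$, and confirms the result matches a fit on the original scale.

```python
import numpy as np

# Hypothetical data for illustration.
rng = np.random.default_rng(0)
x = rng.normal(50, 5, size=150)
y = 1.5 + 0.8 * x + rng.normal(0, 0.5, size=150)

x_bar = x.mean()
# Parameters on the centered scale (alpha~, beta~).
beta_t, alpha_t = np.polyfit(x - x_bar, y, 1)

# Back to the original scale: alpha = alpha~ - beta~ * x_bar, beta = beta~.
alpha = alpha_t - beta_t * x_bar
beta = beta_t

# The recovered parameters match a fit done directly on the uncentered data.
beta_raw, alpha_raw = np.polyfit(x, y, 1)
print(np.allclose([alpha, beta], [alpha_raw, beta_raw]))  # → True
```

In a Bayesian workflow, the same correction can be applied draw by draw to the posterior samples of $\tilde{\alpha}$ and $\tilde{\beta}$.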

We can go even further than centering and transform the data by standardizing it before running models. Standardizing is a common practice for linear regression models in statistics and ML, since many algorithms behave better when the data is standardized. This transformation is achieved by centering the data and then dividing it by the standard deviation. Mathematically, we have:

$$\tilde{x} = \frac{x - \bar{x}}{\sigma_x} \qquad \tilde{y} = \frac{y - \bar{y}}{\sigma_y}$$

One advantage of standardizing the data is that we can always use the same weakly informative priors without having to think about the scale of the data. For standardized data, the intercept will always be around 0 and the slope will be restricted to the interval [-1, 1]. Additionally, standardizing the data allows us to talk in terms of Z-scores, that is, in units of standard deviations. If someone says the value of a parameter is -1.3 in Z-score units, we automatically know that the value in question is 1.3 standard deviations below the mean, irrespective of the actual mean or standard deviation of the data. A change of one Z-score unit is a change of one standard deviation, whatever the scale of the original data. This can be very useful when working with several variables; having all of the variables on the same scale can simplify the interpretation of the data.
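A short sketch with made-up data (all names hypothetical) shows why the slope stays in [-1, 1]: on standardized variables, the least-squares slope coincides with Pearson's correlation coefficient.

```python
import numpy as np

# Hypothetical data: e.g. heights in cm and a linearly related response.
rng = np.random.default_rng(1)
x = rng.normal(170, 10, size=500)
y = -50 + 0.6 * x + rng.normal(0, 4, size=500)

# Standardize: center, then divide by the standard deviation.
zx = (x - x.mean()) / x.std()
zy = (y - y.mean()) / y.std()

# The standardized variables have mean 0 and standard deviation 1.
print(np.isclose(zx.mean(), 0, atol=1e-9), np.isclose(zx.std(), 1))  # → True True

# On standardized data, the least-squares slope equals Pearson's r,
# which is why it is guaranteed to lie in [-1, 1].
slope, intercept = np.polyfit(zx, zy, 1)
print(np.isclose(slope, np.corrcoef(x, y)[0, 1]))  # → True
```

Note that `np.std` uses the population standard deviation by default (`ddof=0`); either convention works here as long as it is applied consistently.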
