Robust inferences

One objection you may have to model_g is that we are assuming a normal distribution, yet two data points lie in the tails of the distribution, which makes the normality assumption a little forced. Since the tails of the normal distribution fall off quickly as we move away from the mean, the normal distribution (at least an anthropomorphized one) is surprised by those two points and reacts by moving toward them and increasing its standard deviation. We can think of those points as having excessive weight in determining the parameters of the normal distribution. So, what can we do?
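As a rough illustration of that pull, the sketch below compares the sample mean and standard deviation of a small data set with and without two points in the upper tail. The values are hypothetical (they are not the data behind model_g), and plain sample estimates stand in for the posterior of the model, so this shows only the direction of the effect.

```python
import statistics

# Hypothetical data: eight tightly clustered values plus two points
# far out in the upper tail (made-up numbers, not the book's dataset)
bulk = [52.1, 52.8, 53.0, 53.3, 53.5, 53.9, 54.2, 54.6]
tail_points = [63.4, 68.6]


def summarize(values):
    """Return the sample mean and sample standard deviation."""
    return statistics.mean(values), statistics.stdev(values)


mean_bulk, sd_bulk = summarize(bulk)
mean_all, sd_all = summarize(bulk + tail_points)

print(f"without tail points: mean={mean_bulk:.2f}, sd={sd_bulk:.2f}")
print(f"with tail points:    mean={mean_all:.2f}, sd={sd_all:.2f}")
```

With the two tail points included, both the mean and the standard deviation come out larger, which is the behavior described above.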

One option is to declare those points outliers and remove them from the data. We may have a valid reason to discard them, such as an equipment malfunction or a human error while taking those two measurements. Sometimes we can even fix such data points, for example, if we realize they are just the result of a coding error made while cleaning the data. On many occasions, we may also want to automate the outlier elimination process by using one of the many outlier rules. Two of them are:

  • Any data point more than 1.5 times the interquartile range below the lower quartile, or more than 1.5 times the interquartile range above the upper quartile, is an outlier
  • Any data point more than two standard deviations below or above the mean of the data should be declared an outlier and banished from our data
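The two rules above can be sketched in a few lines of Python. This is a minimal illustration, not code from the book: the function names `iqr_outliers` and `sd_outliers`, the multiplier argument `k`, and the sample values are all hypothetical, and quartiles are computed with the standard library's `statistics.quantiles` using linear interpolation (other quartile conventions can flag slightly different points).

```python
import statistics


def iqr_outliers(data, k=1.5):
    """Return the points lying more than k * IQR outside the quartiles."""
    # method="inclusive" interpolates linearly between the sorted data points
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < lower or x > upper]


def sd_outliers(data, k=2.0):
    """Return the points lying more than k standard deviations from the mean."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)  # sample standard deviation
    return [x for x in data if abs(x - mean) > k * sd]


# Hypothetical measurements: made-up values, not the book's dataset
sample = [51.0, 52.5, 53.1, 53.4, 54.0, 54.2, 55.0, 55.3, 56.1, 68.0]

print(iqr_outliers(sample))  # → [68.0]
print(sd_outliers(sample))   # → [68.0]
```

Here both rules flag the same point, but in general they need not agree. The mean and standard deviation used by the second rule are themselves inflated by the extreme values, so that rule can miss points the interquartile rule catches.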

Instead of using one of these outlier rules to manipulate the data, we can change the model, as explained in the next section.
