Data preparation

As we said previously, there is simply no substitute for data quality. Is any of your data missing, malformed, or incorrect? And let's not forget another term you'll become familiar with: data outliers. Those are the nasty little pieces of data that simply don't fit nicely with the rest of your data! Do you have those? If so, should they be there, and how will they be treated? If you are not sure what an outlier looks like, picture a plot of your data in which one point sits far away from all the others.

In statistics, an outlier is an observation that lies distant from the other observations, sometimes dramatically and sometimes only slightly. An outlier may be due to variability in measurement, it may indicate an experimental error, or it might in fact be valid. If you see outliers in your data, you need to understand why. They can indicate some form of measurement error, and the algorithm you are using may not be robust enough to handle them.
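If you want a quick way to flag candidate outliers before deciding what to do with them, one common rule of thumb is the interquartile range (IQR) test. The text above does not prescribe a specific method, so treat the following Python snippet as a minimal sketch under that assumption. The function name `iqr_outliers`, the multiplier `k=1.5`, and the sample `readings` are all hypothetical and chosen only for illustration.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR].

    k=1.5 is a conventional default, not a rule; tune it for your data.
    """
    # quantiles(..., n=4) returns the three quartile cut points Q1, Q2, Q3.
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical sensor readings with one suspicious value.
readings = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 25.0]
flagged = iqr_outliers(readings)
print(flagged)  # prints [25.0]
```

A flagged value is only a candidate. Whether you remove it, correct it, or keep it still depends on understanding why it is there.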
