Getting ready

Why even worry about preprocessing? It's easy to overlook the simple steps. As we feed data into our algorithms, we need to ensure that each data point is both useful and accurate. In a supervised learning problem, this means that both the X data and the Y labels must be correct before they reach a learner. So, how do we ensure that each data point is correct? For large datasets, we can look at macro-level statistics, such as flagging values that fall more than three standard deviations (three sigma) from the mean. For smaller datasets, visually inspecting a percentage of the training data from each class or type is another option. In essence, the point of this recipe is to introduce you to some of these techniques; we will then apply them throughout the chapters as we need them.
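As a rough illustration of the two checks mentioned above, here is a minimal NumPy sketch. The function names `three_sigma_mask` and `sample_per_class` are hypothetical helpers invented for this example, not part of any library, and the three-sigma test assumes each feature is roughly normally distributed, which may not hold for your data.

```python
import numpy as np


def three_sigma_mask(X):
    """Flag rows where any feature lies more than three standard
    deviations from that feature's mean (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    # Constant features have zero spread; use 1.0 to avoid dividing by zero.
    std = np.where(std == 0, 1.0, std)
    z_scores = np.abs((X - mean) / std)
    return (z_scores > 3).any(axis=1)


def sample_per_class(y, fraction=0.1, seed=0):
    """Pick a random fraction of indices from each class so that a
    person can inspect those samples by eye (hypothetical helper)."""
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    picked = []
    for label in np.unique(y):
        idx = np.flatnonzero(y == label)
        # Always take at least one sample per class.
        n = max(1, int(round(fraction * idx.size)))
        picked.append(rng.choice(idx, size=n, replace=False))
    return np.sort(np.concatenate(picked))
```

Rows flagged by `three_sigma_mask` are candidates for a closer look rather than automatic deletion, since a legitimate but rare value can also sit far from the mean. The indices returned by `sample_per_class` can be used to pull a small, class-balanced set of examples to plot or print.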

Remember the preceding section (Data types: There's more...), where we asked you to read up on Python, NumPy, and all of those other fancy pieces of technology? Well, now you are going to get the opportunity to apply those techniques to practical, real-world problems.
