Summary

In this chapter, we have dealt with many different problems that you may encounter when preparing your data to be analyzed by a linear model.

We started by rescaling variables, and saw that the new scales not only give better insight into the data, but also help us handle unexpectedly missing values.
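As a reminder of why rescaling and missing data are connected, here is a minimal sketch under one stated assumption: that the rescaling in question is standardization (subtract the mean, divide by the standard deviation). After that step the mean of the variable is zero, so filling a missing entry with zero amounts to mean imputation. The function name `standardize_with_missing` is hypothetical and used only for illustration.

```python
from statistics import mean, pstdev

def standardize_with_missing(values):
    """Standardize a list of numbers, treating None as a missing value.

    Assumption: the observed values are not all identical, otherwise the
    standard deviation is zero and the division below would fail.
    """
    observed = [v for v in values if v is not None]
    mu = mean(observed)
    sigma = pstdev(observed)  # population standard deviation of observed values
    # A missing value becomes 0.0, which is the mean on the standardized scale.
    return [0.0 if v is None else (v - mu) / sigma for v in values]

scaled = standardize_with_missing([2.0, 4.0, None, 6.0])
print(scaled)
```

The missing entry ends up at the center of the rescaled variable, so it neither pushes the prediction up nor down in a linear model.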

Then, we learned how to encode qualitative variables, and how the hashing trick lets us cope with variables whose levels are extremely varied or unpredictable, as well as with textual information. We then returned to quantitative variables and learned how to transform them so that their relationship with the response becomes linear, which leads to better regression models.
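The idea behind the hashing trick can be recalled with a short, simplified sketch. It uses only the standard library and an MD5 digest as the hash function, which is an assumption made for illustration; library implementations such as scikit-learn's `HashingVectorizer` and `FeatureHasher` use a different hash and more refined handling of collisions. The function name `hash_features` and the token format are hypothetical.

```python
import hashlib

def hash_features(tokens, n_features=16):
    """Map any list of string tokens to a fixed-length count vector."""
    vector = [0] * n_features
    for token in tokens:
        digest = hashlib.md5(token.encode("utf-8")).digest()
        # Use the first 4 bytes of the digest as an integer, then fold it
        # into the available number of columns.
        index = int.from_bytes(digest[:4], "little") % n_features
        vector[index] += 1
    return vector

# A level never seen before still lands in one of the existing columns.
seen = hash_features(["color=red", "size=large"])
unseen = hash_features(["color=ultraviolet", "size=large"])
print(len(seen), len(unseen))
```

Because the output length depends only on `n_features`, no dictionary of known levels has to be built or stored; the price is that two different tokens may occasionally collide in the same column.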

Finally, we dealt with two common data pathologies, missing values and outliers, showing a few quick fixes that, despite their simplicity, are extremely effective and performant.

At this point, before proceeding to more sophisticated linear models, we just need to illustrate the data science principles that can help you build predictive engines that really work, rather than mere mathematical curve-fitting exercises. That is precisely the topic of the next chapter.
