There's more...

Quite a few libraries focus solely on balancing data. Class imbalance is a fairly common problem in the machine learning space. Consider the anomaly detection side of discriminative modeling. Typically, anomalies make up perhaps 10% to 20% of the base data, and sometimes the imbalance is far worse. There have been some cases in my career where the defect rate was around 1%. If we created a classifier in this case that strictly predicted no defect, we would get 99% accuracy without ever detecting a single defect. For this reason, we have to pay very careful attention to the structure and distribution of our data as we attempt to learn the underlying distribution.
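To make the 99% accuracy trap concrete, here is a minimal sketch in plain Python. The dataset is hypothetical (1,000 parts with a 1% defect rate), and the helper functions `accuracy` and `recall` are written here for illustration rather than taken from any library.

```python
# Hypothetical data: 1,000 parts, 10 of which (1%) are defective (label 1).
y_true = [1] * 10 + [0] * 990

# A "classifier" that strictly predicts no defect (label 0) for every part.
y_pred = [0] * len(y_true)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of the actual positives that the model identified."""
    actual_pos = sum(t == positive for t in y_true)
    true_pos = sum(
        t == positive and p == positive for t, p in zip(y_true, y_pred)
    )
    return true_pos / actual_pos if actual_pos else 0.0


acc = accuracy(y_true, y_pred)
rec = recall(y_true, y_pred)

print(f"accuracy:      {acc:.2f}")  # 0.99
print(f"defect recall: {rec:.2f}")  # 0.00
```

Accuracy looks excellent, yet recall on the defect class is zero. On imbalanced data, it is usually worth checking a per-class metric such as recall alongside accuracy.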

Here's another really simple way to visualize the outliers in a dataset:

https://www.itl.nist.gov/div898/handbook/prc/section1/prc16.htm.
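As a rough companion to visual inspection, the sketch below flags outliers numerically with the interquartile range (IQR) fences that box plots commonly draw. The sample values are hypothetical, and the 1.5 multiplier is the usual convention rather than a fixed rule, so treat both as assumptions.

```python
import statistics

# Hypothetical measurements with one suspiciously large value.
values = [12, 13, 13, 14, 14, 15, 15, 16, 17, 48]

# Quartiles: statistics.quantiles returns the three cut points for n=4.
q1, _median, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1

# Assumed convention: points beyond 1.5 * IQR from the quartiles are outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [v for v in values if v < lower_fence or v > upper_fence]
print(f"fences: [{lower_fence}, {upper_fence}]")
print(f"outliers: {outliers}")
```

A box plot of the same values would draw the point flagged here beyond its whiskers, which is what makes it easy to spot by eye.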

The following link is to an article that focuses on practical graphing techniques for multivariate problems:

https://machinelearningmastery.com/visualize-machine-learning-data-python-pandas/.
