Using machine learning to select features

CountVectorizer's built-in feature selection tools are great when you are dealing with text; however, we are usually dealing with data already arranged in a row/column structure. We've seen the power of purely statistical methods for feature selection; now let's see how we can invoke the power of machine learning to, hopefully, do even more. The two main types of machine learning models that we will use in this section for feature selection are tree-based models and linear models. Both have a notion of feature ranking that is useful when subsetting feature sets.

Before we go further, it is worth repeating that these methods, although they differ in how they select features, all attempt to find the optimal subset of features to improve our machine learning pipelines. The first method we will dive into uses the internal importance metrics that algorithms such as decision trees and random forests compute while fitting to training data.
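As a rough illustration of what these internal importance metrics look like, here is a minimal sketch using scikit-learn. The synthetic dataset, the choice of a random forest, and the "mean" threshold are all assumptions made for this example rather than settings taken from this chapter.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

# Synthetic stand-in data (an assumption): 10 columns, of which only the
# first 3 are informative because shuffle=False keeps them in front.
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=3, n_redundant=0,
    shuffle=False, random_state=0,
)

# Fitting the forest computes one importance score per feature.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X, y)

# The scores are normalized to sum to 1; sort them to get a feature ranking.
importances = forest.feature_importances_
ranking = np.argsort(importances)[::-1]
print("Features ranked by importance:", ranking)

# Keep only the features whose importance exceeds the mean importance.
selector = SelectFromModel(forest, prefit=True, threshold="mean")
X_selected = selector.transform(X)
print("Shape before:", X.shape, "after:", X_selected.shape)
```

The threshold is a tuning choice: a stricter value keeps fewer features, and in practice it would be chosen by cross-validating the whole pipeline rather than fixed in advance.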
