Feature Selection

We're halfway through our text, and we have already gotten our hands dirty with about a dozen datasets and a great deal of feature engineering methods that we, as data scientists and machine learning engineers, may utilize in our work to ensure that we are getting the most out of our predictive modeling. So far, in dealing with data, we have worked with methods including:

  • Feature understanding through the identification of levels of data
  • Feature improvements and imputing missing values
  • Feature standardization and normalization

Each of the preceding methods has a place in our data pipeline and, more often than not, two or more methods are used in tandem with one another.

The remainder of this text will focus on methods of feature engineering that are, by nature, more mathematical and complex than those in the first half of this book. As the work grows more involved, we will do our best to spare the reader the inner workings of each and every statistical test we invoke and instead convey a broader picture of what each test is trying to achieve. As authors and instructors, we are always open to your questions about any of the inner mechanisms of this work.

We have come across one problem quite frequently in our discussion of features, and that problem is noise. Often, we are left working with features that are not highly predictive of the response and that can sometimes even hinder our models' performance in predicting the response. We used tools such as standardization and normalization to try to mitigate such damage, but at the end of the day, the noise must be dealt with.

In this chapter, we will address a subset of feature engineering called feature selection: the process of selecting which features, out of the original set of features, are best for the model prediction pipeline. More formally, given n features, we search for a subset of k features, where k < n, that improves our machine learning pipeline. This generally comes down to the following statement:

Feature selection attempts to identify the noise in our data and remove it.
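As a concrete illustration of choosing k of n features, here is a minimal sketch, assuming scikit-learn is available; SelectKBest with an ANOVA F-test and a synthetic dataset from make_classification are illustrative choices, not the only options:

```python
# A minimal sketch of selecting k of n features, assuming scikit-learn.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Hypothetical data: n = 20 features, only a handful of which are informative
X, y = make_classification(n_samples=500, n_features=20,
                           n_informative=5, random_state=0)

# Keep the k = 5 features that score best on an ANOVA F-test against y
selector = SelectKBest(score_func=f_classif, k=5)
X_reduced = selector.fit_transform(X, y)

print(X.shape)          # (500, 20) -- the original n features
print(X_reduced.shape)  # (500, 5)  -- the selected subset of k features
```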

The definition of feature selection touches on two major points that must be addressed:

  • The methods in which we may find the subset of k features
  • The definition of better in the context of machine learning

The majority of this chapter is dedicated to the methods in which we may find such subsets of features and the basis on which such methods work. This chapter will break up the methods of feature selection into two broad subsections: statistical-based and model-based feature selection. This separation may not fully capture the complexity of the science and art of feature selection, but it will work to drive real and actionable results in our machine learning pipeline.
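To preview the contrast with the statistical sketch shown earlier, the following is a similarly minimal, illustrative sketch of a model-based selector, again assuming scikit-learn; SelectFromModel wrapped around a random forest is one possible choice among many:

```python
# A minimal sketch of model-based feature selection, assuming scikit-learn.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

X, y = make_classification(n_samples=500, n_features=20,
                           n_informative=5, random_state=0)

# Keep only the features whose importance, as judged by the fitted forest,
# exceeds the default threshold (the mean feature importance)
selector = SelectFromModel(RandomForestClassifier(random_state=0))
X_reduced = selector.fit_transform(X, y)

print(X_reduced.shape)  # fewer columns than the original 20
```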

Before we dive into the deep end of many of these methods, let's first discuss how we may better understand and define the idea of better, as it will frame the remainder of this chapter as well as the rest of this text.

We will cover the following topics in this chapter:

  • Achieving better performance in feature engineering
  • Creating a baseline machine learning pipeline
  • The types of feature selection
  • Choosing the right feature selection method