contents

front matter

preface

acknowledgments

about this book

about the author

about the cover illustration

1 Introduction to feature engineering

1.1 What is feature engineering, and why does it matter?

Who needs feature engineering?

What feature engineering cannot do

Great data, great models

1.2 The feature engineering pipeline

The machine learning pipeline

1.3 How this book is organized

The five types of feature engineering

A brief overview of this book’s case studies

2 The basics of feature engineering

2.1 Types of data

Structured data

Unstructured data

2.2 The four levels of data

Qualitative data vs. quantitative data

The nominal level

The ordinal level

The interval level

The ratio level

2.3 The types of feature engineering

Feature improvement

Feature construction

Feature selection

Feature extraction

Feature learning

2.4 How to evaluate feature engineering efforts

Evaluation metric 1: Machine learning metrics

Evaluation metric 2: Interpretability

Evaluation metric 3: Fairness and bias

Evaluation metric 4: ML complexity and speed

3 Healthcare: Diagnosing COVID-19

3.1 The COVID flu diagnostic dataset

The problem statement and defining success

3.2 Exploratory data analysis

3.3 Feature improvement

Imputing missing quantitative data

Imputing missing qualitative data

3.4 Feature construction

Numerical feature transformations

Constructing categorical data

3.5 Building our feature engineering pipeline

Train/test splits

3.6 Feature selection

Mutual information

Hypothesis testing

Using machine learning

3.7 Answers to exercises

4 Bias and fairness: Modeling recidivism

4.1 The COMPAS dataset

The problem statement and defining success

4.2 Exploratory data analysis

4.3 Measuring bias and fairness

Disparate treatment vs. disparate impact

Definitions of fairness

4.4 Building a baseline model

Feature construction

Building our baseline pipeline

Measuring bias in our baseline model

4.5 Mitigating bias

Preprocessing

In-processing

Postprocessing

4.6 Building a bias-aware model

Feature construction: Using the Yeo-Johnson transformer to treat the disparate impact

Feature extraction: Learning fair representations using AIF360

4.7 Answers to exercises

5 Natural language processing: Classifying social media sentiment

5.1 The tweet sentiment dataset

The problem statement and defining success

5.2 Text vectorization

Feature construction: Bag of words

Count vectorization

TF-IDF vectorization

5.3 Feature improvement

Cleaning noise from text

Standardizing tokens

5.4 Feature extraction

Singular value decomposition

5.5 Feature learning

Introduction to autoencoders

Training an autoencoder to learn features

Introduction to transfer learning

Transfer learning with BERT

Using BERT’s pretrained features

5.6 Text vectorization recap

5.7 Answers to exercises

6 Computer vision: Object recognition

6.1 The CIFAR-10 dataset

The problem statement and defining success

6.2 Feature construction: Pixels as features

6.3 Feature extraction: Histogram of oriented gradients

Optimizing dimension reduction with PCA

6.4 Feature learning with VGG-11

Using a pretrained VGG-11 as a feature extractor

Fine-tuning VGG-11

Using fine-tuned VGG-11 features with logistic regression

6.5 Image vectorization recap

6.6 Answers to exercises

7 Time series analysis: Day trading with machine learning

7.1 The TWLO dataset

The problem statement

7.2 Feature construction

Date/time features

Lag features

Rolling/expanding window features

Domain-specific features

7.3 Feature selection

Selecting features using ML

Recursive feature elimination

7.4 Feature extraction

Polynomial feature extraction

7.5 Conclusion

7.6 Answers to exercises

8 Feature stores

8.1 MLOps and feature stores

Benefits of using a feature store

Wikipedia, MLOps, and feature stores

8.2 Setting up a feature store with Hopsworks

Feature groups

Using feature groups to select data

8.3 Creating training data in Hopsworks

Training datasets

Provenance

8.4 Answer to exercise

9 Putting it all together

9.1 Revisiting the feature engineering pipeline

9.2 Key takeaways

Feature engineering is as crucial as ML model choice

Feature engineering isn’t a one-size-fits-all solution

9.3 Recap of feature engineering

Feature improvement

Feature construction

Feature selection

Feature extraction

Feature learning

9.4 Data type-specific feature engineering techniques

Structured data

Unstructured data

9.5 Frequently asked questions

When should I dummify categorical variables vs. leave them as a single column?

How do I know if I need to deal with bias in my data?

9.6 Other feature engineering techniques

Categorical dummy bucketing

Combining learned features with conventional features

Other raw data vectorizers

9.7 Further reading material

index
