Data mining for feature extraction

The cost-effective evaluation of large, complex datasets requires the detection of signals at scale. There are several examples throughout the book:

  • Information theory is a useful tool to extract features that capture potential signals and can be used in ML models. In Chapter 4, Alpha Factor Research, we use mutual information to assess the potential value of individual features for a supervised learning algorithm that predicts asset returns.
  • In Chapter 12, Unsupervised Learning, we introduce various techniques to create features from high-dimensional datasets. In Chapter 14, Topic Modeling, we apply these techniques to text data.
  • We emphasize model-specific ways to gain insights into the predictive power of individual variables. We use a novel game-theoretic approach called SHapley Additive exPlanations (SHAP) to attribute predictive performance to individual features in complex gradient boosting machines with a large number of input variables.
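
To make the first point concrete, the following is a minimal sketch of how mutual information can rank candidate features against a return target. It is an illustration under stated assumptions, not the procedure from Chapter 4: the data is synthetic, the names signal, noise, and fwd_return are hypothetical, and the estimator is a simple plug-in estimate on quantile-binned values using only the Python standard library.

```python
import math
import random
from collections import Counter


def mutual_information(x, y):
    """Plug-in estimate of I(X; Y) in nats for two discrete sequences."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    mi = 0.0
    for (xi, yi), count in joint.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), written in terms of counts
        mi += (count / n) * math.log(count * n / (px[xi] * py[yi]))
    return mi


def quantile_bins(values, n_bins=5):
    """Assign each value an equal-frequency bin label in 0..n_bins-1 by rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    labels = [0] * len(values)
    for rank, idx in enumerate(order):
        labels[idx] = rank * n_bins // len(values)
    return labels


# Synthetic data (assumption): the return depends nonlinearly on `signal`
# and not at all on `noise`.
random.seed(42)
n_obs = 2000
signal = [random.gauss(0, 1) for _ in range(n_obs)]
noise = [random.gauss(0, 1) for _ in range(n_obs)]
fwd_return = [s ** 2 + 0.5 * random.gauss(0, 1) for s in signal]

target = quantile_bins(fwd_return)
mi_scores = {
    "signal": mutual_information(quantile_bins(signal), target),
    "noise": mutual_information(quantile_bins(noise), target),
}

# Rank features from most to least informative about the target
for name, score in sorted(mi_scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.4f} nats")
```

Because the dependence here is quadratic, a linear correlation between signal and fwd_return would be close to zero, while the mutual information score still separates the informative feature from the noise. The plug-in estimate is biased upward in small samples, so scores near zero should be read as "no detectable dependence" rather than compared in absolute terms.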