Many tutorials show you how to develop ML systems from ideation to deployed models. But with constant changes in tooling, those systems can quickly become outdated. Without an intentional design to hold the components together, these systems will become a technical liability, prone to errors and quick to fall apart.

In this book, Chip Huyen provides a framework for designing real-world ML systems that are quick to deploy, reliable, scalable, and iterative. These systems have the capacity to learn from new data, improve on past mistakes, and adapt to changing requirements and environments. You'll learn everything from project scoping, data management, model development, deployment, and infrastructure to team structure and business analysis.

  • Learn the challenges and requirements of an ML system in production
  • Build training data with different sampling and labeling methods
  • Leverage the best techniques to engineer features for your ML models to avoid data leakage
  • Select, develop, debug, and evaluate ML models that are best suited for your tasks
  • Deploy different types of ML systems for different hardware
  • Explore major infrastructural choices and hardware designs
  • Understand the human side of ML, including integrating ML into business, user experience, and team structure

Table of Contents

  1. Machine Learning in Production
    1. When and When Not to Use Machine Learning
    2. When to Use Machine Learning
    3. When Not to Use Machine Learning
    4. Machine Learning Use Cases
    5. Understanding Machine Learning Systems
    6. Machine Learning in Research vs. in Production
    7. Machine Learning Systems vs. Traditional Software
    8. Designing ML Systems in Production
    9. Requirements for ML Systems
    10. Iterative Process
    11. Summary
  2. Data Engineering: Fundamentals
    1. Mind vs. Data
    2. Data Sources
    3. Data Formats
    4. JSON
    5. Row-major vs. Column-major Format
    6. Text vs. Binary Format
    7. Data Processing
    8. OLTP vs. OLAP
    9. ETL: Extract, Transform, Load
    10. Summary
  3. Data Engineering: Training Data
    1. Sampling
    2. Non-Probability Sampling
    3. Simple Random Sampling
    4. Stratified Sampling
    5. Weighted Sampling
    6. Importance Sampling
    7. Reservoir Sampling
    8. Labeling
    9. Hand Labels
    10. Handling the Lack of Hand Labels
    11. Class Imbalance
    12. Challenges of Class Imbalance
    13. Handling Class Imbalance