MLOps Engineering at Scale teaches you how to implement efficient machine learning systems using pre-built services from AWS and other cloud vendors. This easy-to-follow book guides you step by step as you set up your serverless ML infrastructure, even if you’ve never used a cloud platform before. You’ll also explore tools like PyTorch Lightning, Optuna, and MLflow that make it easy to build pipelines and scale your deep learning models in production.

Table of Contents

  1. MLOps Engineering at Scale
  2. Copyright
  3. contents
  4. front matter
    1. preface
    2. acknowledgments
    3. about this book
    4. Who should read this book
    5. How this book is organized: A road map
    6. About the code
    7. liveBook discussion forum
    8. about the author
    9. about the cover illustration
  5. Part 1 Mastering the data set
  6. 1 Introduction to serverless machine learning
    1. 1.1 What is a machine learning platform?
    2. 1.2 Challenges when designing a machine learning platform
    3. 1.3 Public clouds for machine learning platforms
    4. 1.4 What is serverless machine learning?
    5. 1.5 Why serverless machine learning?
    6. 1.5.1 Serverless vs. IaaS and PaaS
    7. 1.5.2 Serverless machine learning life cycle
    8. 1.6 Who is this book for?
    9. 1.6.1 What you can get out of this book
    10. 1.7 How does this book teach?
    11. 1.8 When is this book not for you?
    12. 1.9 Conclusions
    13. Summary
  7. 2 Getting started with the data set
    1. 2.1 Introducing the Washington, DC taxi rides data set
    2. 2.1.1 What is the business use case?
    3. 2.1.2 What are the business rules?
    4. 2.1.3 What is the schema for the business service?
    5. 2.1.4 What are the options for implementing the business service?
    6. 2.1.5 What data assets are available for the business service?
    7. 2.1.6 Downloading and unzipping the data set
    8. 2.2 Starting with object storage for the data set
    9. 2.2.1 Understanding object storage vs. filesystems
    10. 2.2.2 Authenticating with Amazon Web Services
    11. 2.2.3 Creating a serverless object storage bucket
    12. 2.3 Discovering the schema for the data set
    13. 2.3.1 Introducing AWS Glue
    14. 2.3.2 Authorizing the crawler to access your objects
    15. 2.3.3 Using a crawler to discover the data schema
    16. 2.4 Migrating to columnar storage for more efficient analytics
    17. 2.4.1 Introducing column-oriented data formats for analytics
    18. 2.4.2 Migrating to a column-oriented data format
    19. Summary
  8. 3 Exploring and preparing the data set
    1. 3.1 Getting started with interactive querying
    2. 3.1.1 Choosing the right use case for interactive querying
    3. 3.1.2 Introducing AWS Athena
    4. 3.1.3 Preparing a sample data set
    5. 3.1.4 Interactive querying using Athena from a browser
    6. 3.1.5 Interactive querying using a sample data set
    7. 3.1.6 Querying the DC taxi data set
    8. 3.2 Getting started with data quality
    9. 3.2.1 From “garbage in, garbage out” to data quality
    10. 3.2.2 Before starting with data quality
    11. 3.2.3 Normative principles for data quality
    12. 3.3 Applying VACUUM to the DC taxi data
    13. 3.3.1 Enforcing the schema to ensure valid values
    14. 3.3.2 Cleaning up invalid fare amounts
    15. 3.3.3 Improving the accuracy
    16. 3.4 Implementing VACUUM in a PySpark job
    17. Summary
  9. 4 More exploratory data analysis and data preparation
    1. 4.1 Getting started with data sampling
    2. 4.1.1 Exploring the summary statistics of the cleaned-up data set
    3. 4.1.2 Choosing the right sample size for the test data set
    4. 4.1.3 Exploring the statistics of alternative sample sizes
    5. 4.1.4 Using a PySpark job to sample the test set
    6. Summary
  10. Part 2 PyTorch for serverless machine learning
  11. 5 Introducing PyTorch: Tensor basics
    1. 5.1 Getting started with tensors
    2. 5.2 Getting started with PyTorch tensor creation operations
    3. 5.3 Creating PyTorch tensors of pseudorandom and interval values
    4. 5.4 PyTorch tensor operations and broadcasting
    5. 5.5 PyTorch tensors vs. native Python lists
    6. Summary
  12. 6 Core PyTorch: Autograd, optimizers, and utilities
    1. 6.1 Understanding the basics of autodiff
    2. 6.2 Linear regression using PyTorch automatic differentiation
    3. 6.3 Transitioning to PyTorch optimizers for gradient descent
    4. 6.4 Getting started with data set batches for gradient descent
    5. 6.5 Data set batches with PyTorch Dataset and DataLoader
    6. 6.6 Dataset and DataLoader classes for gradient descent with batches
    7. Summary
  13. 7 Serverless machine learning at scale
    1. 7.1 What if a single node is enough for my machine learning model?
    2. 7.2 Using IterableDataset and ObjectStorageDataset
    3. 7.3 Gradient descent with out-of-memory data sets
    4. 7.4 Faster PyTorch tensor operations with GPUs
    5. 7.5 Scaling up to use GPU cores
    6. Summary
  14. 8 Scaling out with distributed training
    1. 8.1 What if the training data set does not fit in memory?
    2. 8.1.1 Illustrating gradient accumulation
    3. 8.1.2 Preparing a sample model and data set
    4. 8.1.3 Understanding gradient descent using out-of-memory data shards
    5. 8.2 Parameter server approach to gradient accumulation
    6. 8.3 Introducing logical ring-based gradient descent
    7. 8.4 Understanding ring-based distributed gradient descent
    8. 8.5 Phase 1: Reduce-scatter
    9. 8.6 Phase 2: All-gather
    10. Summary
  15. Part 3 Serverless machine learning pipeline
  16. 9 Feature selection
    1. 9.1 Guiding principles for feature selection
    2. 9.1.1 Related to the label
    3. 9.1.2 Recorded before inference time
    4. 9.1.3 Supported by abundant examples
    5. 9.1.4 Expressed as a number with a meaningful scale
    6. 9.1.5 Based on expert insights about the project
    7. 9.2 Feature selection case studies
    8. 9.3 Feature selection using guiding principles
    9. 9.3.1 Related to the label
    10. 9.3.2 Recorded before inference time
    11. 9.3.3 Supported by abundant examples
    12. 9.3.4 Numeric with meaningful magnitude
    13. 9.3.5 Bring expert insight to the problem
    14. 9.4 Selecting features for the DC taxi data set
    15. Summary
  17. 10 Adopting PyTorch Lightning
    1. 10.1 Understanding PyTorch Lightning
    2. 10.1.1 Converting PyTorch model training to PyTorch Lightning
    3. 10.1.2 Enabling test and reporting for a trained model
    4. 10.1.3 Enabling validation during model training
    5. Summary
  18. 11 Hyperparameter optimization
    1. 11.1 Hyperparameter optimization with Optuna
    2. 11.1.1 Understanding loguniform hyperparameters
    3. 11.1.2 Using categorical and log-uniform hyperparameters
    4. 11.2 Neural network layers configuration as a hyperparameter
    5. 11.3 Experimenting with the batch normalization hyperparameter
    6. 11.3.1 Using Optuna study for hyperparameter optimization
    7. 11.3.2 Visualizing an HPO study in Optuna
    8. Summary
  19. 12 Machine learning pipeline
    1. 12.1 Describing the machine learning pipeline
    2. 12.2 Enabling PyTorch-distributed training support with Kaen
    3. 12.2.1 Understanding PyTorch-distributed training settings
    4. 12.3 Unit testing model training in a local Kaen container
    5. 12.4 Hyperparameter optimization with Optuna
    6. 12.4.1 Enabling MLFlow support
    7. 12.4.2 Using HPO for DcTaxiModel in a local Kaen provider
    8. 12.4.3 Training with the Kaen AWS provider
    9. Summary
  20. Appendix A. Introduction to machine learning
    1. A.1 Why machine learning?
    2. A.2 Machine learning at first glance
    3. A.3 Machine learning with structured data sets
    4. A.4 Regression with structured data sets
    5. A.5 Classification with structured data sets
    6. A.6 Training a supervised machine learning model
  21. Appendix B. Getting started with Docker
    1. B.1 Getting started with Docker
    2. B.2 Building a custom image
    3. B.3 Sharing your custom image with the world
  22. index