MLOps Engineering at Scale

Author: Carl Osipov

Release Date: 2022/02/01

ISBN: 9781617297762

MLOps Engineering at Scale teaches you how to implement efficient machine learning systems using pre-built services from AWS and other cloud vendors. This easy-to-follow book guides you step by step as you set up your serverless ML infrastructure, even if you’ve never used a cloud platform before. You’ll also explore tools like PyTorch Lightning, Optuna, and MLflow that make it easy to build pipelines and scale your deep learning models in production.

MLOps Engineering at Scale
Copyright
contents
front matter
preface
acknowledgments
about this book
Who should read this book
How this book is organized: A road map
About the code
liveBook discussion forum
about the author
about the cover illustration
Part 1 Mastering the data set
1 Introduction to serverless machine learning
1.1 What is a machine learning platform?
1.2 Challenges when designing a machine learning platform
1.3 Public clouds for machine learning platforms
1.4 What is serverless machine learning?
1.5 Why serverless machine learning?
1.5.1 Serverless vs. IaaS and PaaS
1.5.2 Serverless machine learning life cycle
1.6 Who is this book for?
1.6.1 What you can get out of this book
1.7 How does this book teach?
1.8 When is this book not for you?
1.9 Conclusions
Summary
2 Getting started with the data set
2.1 Introducing the Washington, DC taxi rides data set
2.1.1 What is the business use case?
2.1.2 What are the business rules?
2.1.3 What is the schema for the business service?
2.1.4 What are the options for implementing the business service?
2.1.5 What data assets are available for the business service?
2.1.6 Downloading and unzipping the data set
2.2 Starting with object storage for the data set
2.2.1 Understanding object storage vs. filesystems
2.2.2 Authenticating with Amazon Web Services
2.2.3 Creating a serverless object storage bucket
2.3 Discovering the schema for the data set
2.3.1 Introducing AWS Glue
2.3.2 Authorizing the crawler to access your objects
2.3.3 Using a crawler to discover the data schema
2.4 Migrating to columnar storage for more efficient analytics
2.4.1 Introducing column-oriented data formats for analytics
2.4.2 Migrating to a column-oriented data format
Summary
3 Exploring and preparing the data set
3.1 Getting started with interactive querying
3.1.1 Choosing the right use case for interactive querying
3.1.2 Introducing AWS Athena
3.1.3 Preparing a sample data set
3.1.4 Interactive querying using Athena from a browser
3.1.5 Interactive querying using a sample data set
3.1.6 Querying the DC taxi data set
3.2 Getting started with data quality
3.2.1 From “garbage in, garbage out” to data quality
3.2.2 Before starting with data quality
3.2.3 Normative principles for data quality
3.3 Applying VACUUM to the DC taxi data
3.3.1 Enforcing the schema to ensure valid values
3.3.2 Cleaning up invalid fare amounts
3.3.3 Improving the accuracy
3.4 Implementing VACUUM in a PySpark job
Summary
4 More exploratory data analysis and data preparation
4.1 Getting started with data sampling
4.1.1 Exploring the summary statistics of the cleaned-up data set
4.1.2 Choosing the right sample size for the test data set
4.1.3 Exploring the statistics of alternative sample sizes
4.1.4 Using a PySpark job to sample the test set
Summary
Part 2 PyTorch for serverless machine learning
5 Introducing PyTorch: Tensor basics
5.1 Getting started with tensors
5.2 Getting started with PyTorch tensor creation operations
5.3 Creating PyTorch tensors of pseudorandom and interval values
5.4 PyTorch tensor operations and broadcasting
5.5 PyTorch tensors vs. native Python lists
Summary
6 Core PyTorch: Autograd, optimizers, and utilities
6.1 Understanding the basics of autodiff
6.2 Linear regression using PyTorch automatic differentiation
6.3 Transitioning to PyTorch optimizers for gradient descent
6.4 Getting started with data set batches for gradient descent
6.5 Data set batches with PyTorch Dataset and DataLoader
6.6 Dataset and DataLoader classes for gradient descent with batches
Summary
7 Serverless machine learning at scale
7.1 What if a single node is enough for my machine learning model?
7.2 Using IterableDataset and ObjectStorageDataset
7.3 Gradient descent with out-of-memory data sets
7.4 Faster PyTorch tensor operations with GPUs
7.5 Scaling up to use GPU cores
Summary
8 Scaling out with distributed training
8.1 What if the training data set does not fit in memory?
8.1.1 Illustrating gradient accumulation
8.1.2 Preparing a sample model and data set
8.1.3 Understanding gradient descent using out-of-memory data shards
8.2 Parameter server approach to gradient accumulation
8.3 Introducing logical ring-based gradient descent
8.4 Understanding ring-based distributed gradient descent
8.5 Phase 1: Reduce-scatter
8.6 Phase 2: All-gather
Summary
Part 3 Serverless machine learning pipeline
9 Feature selection
9.1 Guiding principles for feature selection
9.1.1 Related to the label
9.1.2 Recorded before inference time
9.1.3 Supported by abundant examples
9.1.4 Expressed as a number with a meaningful scale
9.1.5 Based on expert insights about the project
9.2 Feature selection case studies
9.3 Feature selection using guiding principles
9.3.1 Related to the label
9.3.2 Recorded before inference time
9.3.3 Supported by abundant examples
9.3.4 Numeric with meaningful magnitude
9.3.5 Bring expert insight to the problem
9.4 Selecting features for the DC taxi data set
Summary
10 Adopting PyTorch Lightning
10.1 Understanding PyTorch Lightning
10.1.1 Converting PyTorch model training to PyTorch Lightning
10.1.2 Enabling test and reporting for a trained model
10.1.3 Enabling validation during model training
Summary
11 Hyperparameter optimization
11.1 Hyperparameter optimization with Optuna
11.1.1 Understanding loguniform hyperparameters
11.1.2 Using categorical and log-uniform hyperparameters
11.2 Neural network layers configuration as a hyperparameter
11.3 Experimenting with the batch normalization hyperparameter
11.3.1 Using Optuna study for hyperparameter optimization
11.3.2 Visualizing an HPO study in Optuna
Summary
12 Machine learning pipeline
12.1 Describing the machine learning pipeline
12.2 Enabling PyTorch-distributed training support with Kaen
12.2.1 Understanding PyTorch-distributed training settings
12.3 Unit testing model training in a local Kaen container
12.4 Hyperparameter optimization with Optuna
12.4.1 Enabling MLFlow support
12.4.2 Using HPO for DcTaxiModel in a local Kaen provider
12.4.3 Training with the Kaen AWS provider
Summary
Appendix A. Introduction to machine learning
A.1 Why machine learning?
A.2 Machine learning at first glance
A.3 Machine learning with structured data sets
A.4 Regression with structured data sets
A.5 Classification with structured data sets
A.6 Training a supervised machine learning model
Appendix B. Getting started with Docker
B.1 Getting started with Docker
B.2 Building a custom image
B.3 Sharing your custom image with the world
index
