Part 1 An introduction to machine learning engineering
1 What is a machine learning engineer?
1.2 The core tenets of ML engineering
1.3 The goals of ML engineering
2 Your data science could use some engineering
2.1 Augmenting a complex profession with processes to increase project success
2.2 A foundation of simplicity
2.3 Co-opting principles of Agile software engineering
Embracing and expecting change
2.4 The foundation of ML engineering
3 Before you model: Planning and scoping a project
3.1 Planning: You want me to predict what?!
Experimentation by solution building: Wasting time for pride’s sake
3.2 Experimental scoping: Setting expectations and boundaries
Experimental scoping for the ML team: Research
Experimental scoping for the ML team: Experimentation
4 Before you model: Communication and logistics of projects
4.1 Communication: Defining the problem
Setting critical discussion boundaries
4.2 Don’t waste our time: Meeting with cross-functional teams
Experimental update meeting: Do we know what we’re doing here?
SME review/prototype review: Can we solve this?
Development progress review(s): Is this thing going to work?
MVP review: Did you build what we asked for?
Preproduction review: We really hope we didn’t screw this up
4.3 Setting limits on your experimentation
Can you put this into production? Would you want to maintain it?
TDD vs. RDD vs. PDD vs. CDD for ML projects
4.4 Planning for business rules chaos
Embracing chaos by planning for it
5 Experimentation in action: Planning and researching an ML project
Perform basic research and planning
Forget the blogs—read the API docs
Draw straws for an internal hackathon
5.2 Performing experimental prep work
Moving from script to reusable code
One last note on building reusable code for experimentation
6 Experimentation in action: Testing and evaluating a project
Running quick forecasting tests
Whittling down the possibilities
Evaluating prototypes properly
Making a call on the direction to go in
7 Experimentation in action: Moving from prototype to MVP
7.1 Tuning: Automating the annoying stuff
Using Hyperopt to tune a complex forecasting problem
7.2 Choosing the right tech for the platform and the team
Handling tuning from the driver with SparkTrials
Handling tuning from the workers with a pandas_udf
Using new paradigms for teams: Platforms and technologies
8 Experimentation in action: Finalizing an MVP with MLflow and runtime optimization
8.1 Logging: Code, metrics, and results
Please stop printing and log your information
Version control, branch strategies, and working with others
8.2 Scalability and concurrency
What you can (and can’t) run asynchronously
Part 2 Preparing for production: Creating maintainable ML
9 Modularity for ML: Writing testable and legible code
9.1 Understanding monolithic scripts and why they are bad
Considerations for monolithic scripts
9.4 Using test-driven development for ML
10 Standards of coding and creating maintainable ML code
10.2 Naming, structure, and code architecture
Naming conventions and structure
10.3 Tuple unpacking and maintainable alternatives
A solid alternative to tuple unpacking
10.4 Blind to issues: Eating exceptions and other bad practices
Try/catch with the precision of a shotgun
Exception handling with laser precision
10.5 Use of global mutable objects
Encapsulation to prevent mutable side effects
11 Model measurement and why it’s so important
11.1 Measuring model attribution
Measuring prediction performance
Clarifying correlation vs. causation
11.2 Leveraging A/B testing for attribution calculations
Using alternative displays and tests
Evaluating categorical metrics
12 Holding on to your gains by watching for drift
13.1 Elegant complexity vs. overengineering
Lightweight scripted style (imperative)
13.2 Unintentional obfuscation: Could you read this if you didn’t write it?
Troublesome coding habits recap
13.3 Premature generalization, premature optimization, and other bad ways to show how smart you are
Generalization and frameworks: Avoid them until you can’t
13.4 Do you really want to be the canary? Alpha testing and the dangers of the open source coal mine
13.5 Technology-driven development vs. solution-driven development
Part 3 Developing production machine learning code
Find a source of truth and align on it
Don’t embed data cleansing into your production code
14.3 Monitoring everything else in the model life cycle
14.4 Keeping things as simple as possible
Simplicity in problem definitions
14.6 Avoiding cargo cult ML behavior
15 Quality and acceptance testing
A brief intro to feature stores
15.2 Fallbacks and cold starts
15.3 End user vs. internal use testing
Interfacing with the model registry
What a feature store is used for
16.3 Prediction serving architecture
Integrated models (edge deployment)
Appendix A Big O(no) and how to think about runtime performance
Appendix B Setting up a development environment