contents

front matter

preface

acknowledgments

about this book

about the author

about the cover illustration

Part 1 An introduction to machine learning engineering

1 What is a machine learning engineer?

1.1 Why ML engineering?

1.2 The core tenets of ML engineering

Planning

Scoping and research

Experimentation

Development

Deployment

Evaluation

1.3 The goals of ML engineering

2 Your data science could use some engineering

2.1 Augmenting a complex profession with processes to increase project success

2.2 A foundation of simplicity

2.3 Co-opting principles of Agile software engineering

Communication and cooperation

Embracing and expecting change

2.4 The foundation of ML engineering

3 Before you model: Planning and scoping a project

3.1 Planning: You want me to predict what?!

Basic planning for a project

That first meeting

Plan for demos—lots of demos

Experimentation by solution building: Wasting time for pride’s sake

3.2 Experimental scoping: Setting expectations and boundaries

What is experimental scoping?

Experimental scoping for the ML team: Research

Experimental scoping for the ML team: Experimentation

4 Before you model: Communication and logistics of projects

4.1 Communication: Defining the problem

Understanding the problem

Setting critical discussion boundaries

4.2 Don’t waste our time: Meeting with cross-functional teams

Experimental update meeting: Do we know what we’re doing here?

SME review/prototype review: Can we solve this?

Development progress review(s): Is this thing going to work?

MVP review: Did you build what we asked for?

Preproduction review: We really hope we didn’t screw this up

4.3 Setting limits on your experimentation

Set a time limit

Can you put this into production? Would you want to maintain it?

TDD vs. RDD vs. PDD vs. CDD for ML projects

4.4 Planning for business rules chaos

Embracing chaos by planning for it

Human-in-the-loop design

What’s your backup plan?

4.5 Talking about results

5 Experimentation in action: Planning and researching an ML project

5.1 Planning experiments

Perform basic research and planning

Forget the blogs—read the API docs

Draw straws for an internal hackathon

Level the playing field

5.2 Performing experimental prep work

Performing data analysis

Moving from script to reusable code

One last note on building reusable code for experimentation

6 Experimentation in action: Testing and evaluating a project

6.1 Testing ideas

Setting guidelines in code

Running quick forecasting tests

Whittling down the possibilities

Evaluating prototypes properly

Making a call on the direction to go in

So . . . what’s next?

7 Experimentation in action: Moving from prototype to MVP

7.1 Tuning: Automating the annoying stuff

Tuning options

Hyperopt primer

Using Hyperopt to tune a complex forecasting problem

7.2 Choosing the right tech for the platform and the team

Why Spark?

Handling tuning from the driver with SparkTrials

Handling tuning from the workers with a pandas_udf

Using new paradigms for teams: Platforms and technologies

8 Experimentation in action: Finalizing an MVP with MLflow and runtime optimization

8.1 Logging: Code, metrics, and results

MLflow tracking

Please stop printing and log your information

Version control, branch strategies, and working with others

8.2 Scalability and concurrency

What is concurrency?

What you can (and can’t) run asynchronously

Part 2 Preparing for production: Creating maintainable ML

9 Modularity for ML: Writing testable and legible code

9.1 Understanding monolithic scripts and why they are bad

How monoliths come into being

Walls of text

Considerations for monolithic scripts

9.2 Debugging walls of text

9.3 Designing modular ML code

9.4 Using test-driven development for ML

10 Standards of coding and creating maintainable ML code

10.1 ML code smells

10.2 Naming, structure, and code architecture

Naming conventions and structure

Trying to be too clever

Code architecture

10.3 Tuple unpacking and maintainable alternatives

Tuple unpacking example

A solid alternative to tuple unpacking

10.4 Blind to issues: Eating exceptions and other bad practices

Try/catch with the precision of a shotgun

Exception handling with laser precision

Handling errors the right way

10.5 Use of global mutable objects

How mutability can burn you

Encapsulation to prevent mutable side effects

10.6 Excessively nested logic

11 Model measurement and why it’s so important

11.1 Measuring model attribution

Measuring prediction performance

Clarifying correlation vs. causation

11.2 Leveraging A/B testing for attribution calculations

A/B testing

Evaluating continuous metrics

Using alternative displays and tests

Evaluating categorical metrics

12 Holding on to your gains by watching for drift

12.1 Detecting drift

What influences drift?

12.2 Responding to drift

What can we do about it?

Responding to drift

13 ML development hubris

13.1 Elegant complexity vs. overengineering

Lightweight scripted style (imperative)

An overengineered mess

13.2 Unintentional obfuscation: Could you read this if you didn’t write it?

The flavors of obfuscation

Troublesome coding habits recap

13.3 Premature generalization, premature optimization, and other bad ways to show how smart you are

Generalization and frameworks: Avoid them until you can’t

Optimizing too early

13.4 Do you really want to be the canary? Alpha testing and the dangers of the open source coal mine

13.5 Technology-driven development vs. solution-driven development

Part 3 Developing production machine learning code

14 Writing production code

14.1 Have you met your data?

Make sure you have the data

Check your data provenance

Find a source of truth and align on it

Don’t embed data cleansing into your production code

14.2 Monitoring your features

14.3 Monitoring everything else in the model life cycle

14.4 Keeping things as simple as possible

Simplicity in problem definitions

Simplicity in implementation

14.5 Wireframing ML projects

14.6 Avoiding cargo cult ML behavior

15 Quality and acceptance testing

15.1 Data consistency

Training and inference skew

A brief intro to feature stores

Process over technology

The dangers of a data silo

15.2 Fallbacks and cold starts

Leaning heavily on prior art

Cold-start woes

15.3 End user vs. internal use testing

Biased testing

Dogfooding

SME evaluation

15.4 Model interpretability

Shapley additive explanations

Using shap

16 Production infrastructure

16.1 Artifact management

MLflow’s model registry

Interfacing with the model registry

16.2 Feature stores

What a feature store is used for

Using a feature store

Evaluating a feature store

16.3 Prediction serving architecture

Determining serving needs

Bulk external delivery

Microbatch streaming

Real-time server-side

Integrated models (edge deployment)

Appendix A Big O(no) and how to think about runtime performance

Appendix B Setting up a development environment

index
