front matter

preface

Even as a young boy, I was stubborn. When people would suggest simple ways of doing things, I would ignore their advice, always choosing to do things the hard way. Decades later, not much had changed as I shifted through increasingly challenging careers, eventually landing in the realm of data science (DS) and machine learning (ML) engineering, and now ML software development. As a data scientist in industry, I always felt the need to build overly complex solutions, working in isolation to solve a given problem in the way that I felt was best.

I had some successes but many failures, and generally left a trail of unmaintainable code in my wake as I moved from job to job. It’s not something that I’m particularly proud of. I’ve been contacted by former colleagues, years after leaving a position, to have them tell me that my code is still running every day. When I’ve asked each one of them why, I’ve gotten the same demoralizing answer that has made me regret my implementations: “No one can figure it out to make changes to it, and it’s too important to turn off.”

I’ve been a bad data scientist. I’ve been an even worse ML engineer. It took me years to learn why that is. That stubbornness and resistance to solving problems in the simplest way created a lot of headaches for others, both in the sheer number of cancelled projects while I was at companies and in the unmaintainable technical debt that I left in my wake.

It wasn’t until my most recent job, working as a resident solutions architect at Databricks (essentially a vendor field consultant), that I started to learn where I had gone wrong and to change how I approached solving problems. Likely because I was now working as an advisor to help others who were struggling with data science problems, I was able to see my own shortcomings through the abstract reflection of what they were struggling through. Over the past few years, I’ve helped quite a few teams avoid many pitfalls that I’ve experienced (and created through my own stubbornness and hubris). I figured that writing down some of this advice that I give people regularly could benefit a broader audience, beyond my individual conversations with isolated teams in the context of my job.

After all, applying machine learning to a real-world use case is hard enough when following along with examples and books on the concepts of applied ML. When you introduce the staggering complexity of end-to-end project work (which is the focus of this book), it comes as little surprise that many companies fail to realize the potential of ML in their businesses. It’s just hard. It’s easier if you have a guide, though.

This book doesn’t aim to be a guide to applied ML. We’re not going to be covering algorithms or theories on why one model is better than another for a particular use case, nor will we delve into all the details to solve individual problems. Rather, this book is a guide to avoid the pitfalls that I’ve seen so many teams fall into (and ones that I’ve had to claw my way out of as a practitioner). It is a generalized approach to using DS techniques to solve problems in a way that you, your customers (the internal ones at your company), and your peers will not regret. It’s a guide to help you avoid making some of the really stupid mistakes that I’ve made.

In the words of two of my relatively recently acquired favorite proverbs:

Ask the experienced rather than the learned.

—Arab proverb

It is best to learn wisdom by the experience of others.

—Latin proverb

acknowledgments

There’s absolutely no way that this book would have been possible without the support of my truly staggeringly amazing wife, Julie. She’s had to endure countless evenings of me toiling away in my office well past midnight, hammering away at drafts, edits, and code refactoring. I’m not sure if you’ll ever get the chance to meet her, but she’s truly incredible. Not only is she my soulmate, but she’s one of the few people on this planet capable of making me genuinely laugh and is a constant inspiration to me. I could argue that most of the wisdom that I’ve learned about how to influence and interact with people in a positive manner comes directly from me observing her.

I’d like to thank Patrick Barb, my development editor at Manning. He’s been invaluable in getting this book into the state that it’s in, has consistently challenged me to reduce my verbosity, and has been a great resource for helping me distill the points I’ve tried to make throughout the book. He, Brian Sawyer, my acquisitions editor, and Marc-Philippe Huget, my technical development editor, have been an immense help throughout this entire process. In addition, a sincere thank you to Sharon Wilkey, the copy editor for this book, for incredible insight and fantastic skill in making the tone and flow of the book much better, and to all of the Manning team for their hard work in producing this book.

I’d also like to thank the reviewers who have provided great feedback throughout the process of building this book: Dae Kim, Denis Shestakov, Grant van Staden, Ignacio A. Ruiz-Reyes, Ioannis Atsonios, Jaganadh Gopinadhan, Jesús Antonino Juárez Guerrero, Johannes Verwijnen, John Bassil, Lara Thompson, Lokesh Kumar, Matthias Busch, Mirerfan Gheibi, Ninoslav Čerkez, Peter Morgan, Rahul Jain, Rui Liu, Taylor Delehanty, and Xiangbo Mao. Their candid and relevant opinions have been incredibly helpful in condensing a sprawling, tone-deaf, and overly verbose ramble into something that I’m fairly proud of.

I’d like to thank a few colleagues who have helped influence many of the stories and examples and who have been a sounding board for me during the development of this book: Jas Bali, Amir Issaei, Brooke Wenig, Alex Narkaj, Conor Murphy, and Niall Turbitt. I’d also like to acknowledge the creators and fantastic world-class engineers and product team members at Databricks ML engineering who have designed, built, and maintained much of the tech that is featured in parts of this book. It’s an absolute honor to count you all as colleagues.

Finally, thank you to Willy, our dog, who is heavily featured in this book. Yes, his favorite food is my Bolognese. To the curious, yes, he gets enough treats (although he might argue with that statement), and is thanked, repeatedly, through the judicious offerings of such.

about this book

Machine Learning Engineering in Action is an extension of the recommendations, hard-earned wisdom, and general tips that I’ve been sharing with clients for the past few years. This isn’t a book on theory, nor is it going to make you build the best models for a given problem. Those books have already been, and continue to be, written by great authors. This is a book focused on the “other stuff.”

Who should read this book

This book is intended to reach a rather large audience in the ML community. It is neither too in the weeds to be exclusive to ML engineers, nor too high-level to be exclusively written for the benefit of a layperson. My intention in writing it in the way that I did is to make it approachable for anyone who is involved in the process of using ML to solve business problems.

I’ve been pleasantly surprised by some of the early-stage feedback during development of this book. One of the first questions that I ask people who have reached out is, “What do you do?” I’ve received a far wider range of job titles and industries than I ever would have imagined—venture capitalists with PhDs in economics, ML engineers with 20 years of industry experience at some of the most prestigious tech companies, product managers at Silicon Valley startups, and undergrad university students in their freshman year. This lets me know that the book offers a bit of something for everyone to learn in terms of using ML engineering to build something successful.

How this book is organized: A road map

This book has three main parts that address milestones in any ML project. From the initial scoping stages of “What are we trying to solve?” to the final stage of “How are we keeping this solution relevant for years to come?,” the book moves through each of these major epochs in the same logical order that you would consider these topics while working through a project:

  • Part 1 (chapters 1-8) is focused primarily on the management of ML projects from the perspective of a team lead, manager, or project lead. It lays out a blueprint for scoping, experimentation, prototyping, and inclusive feedback to help you avoid falling into solution-building traps.
  • Part 2 (chapters 9-13) covers the development process of ML projects. With examples (both good and bad) of ML solution development, this section carries you through proven methods of building, tuning, logging, and evaluating an ML solution to ensure that you’re building the simplest and most maintainable code possible.
  • Part 3 (chapters 14-16) focuses on “the after”: specifically, considerations related to streamlining production release, retraining, monitoring, and attribution for a project. With examples focused on A/B testing, feature stores, and a passive retraining system, you’ll be shown how to implement systems and architectures that can ensure that you’re building the minimally complex solution to solve a business problem with ML.

About the code

This book contains many examples of source code, both in numbered listings and inline with normal text. In both cases, source code is formatted in a fixed-width font like this to separate it from ordinary text.

In many cases, the original source code has been reformatted; we’ve added line breaks and reworked indentation to accommodate the available page space in the book. Code annotations accompany many of the listings, highlighting important concepts.

You can get executable snippets of code from the liveBook (online) version of this book at https://livebook.manning.com/book/machine-learning-engineering-in-action. The complete code for the examples in the book is available for download from the Manning website at www.manning.com/books/machine-learning-engineering-in-action, and from GitHub at https://github.com/BenWilson2/ML-Engineering.

liveBook discussion forum

Purchase of Machine Learning Engineering in Action includes free access to liveBook, Manning’s online reading platform. Using liveBook’s exclusive discussion features, you can attach comments to the book globally or to specific sections or paragraphs. It’s a snap to make notes for yourself, ask and answer technical questions, and receive help from the author and other users. To access the forum, go to https://livebook.manning.com/book/machine-learning-engineering-in-action/discussion. You can also learn more about Manning’s forums and the rules of conduct at https://livebook.manning.com/discussion.

Manning’s commitment to our readers is to provide a venue where a meaningful dialogue between individual readers and between readers and the author can take place. It is not a commitment to any specific amount of participation on the part of the author, whose contribution to the forum remains voluntary (and unpaid). We suggest you try asking the author some challenging questions lest his interest stray! The forum and the archives of previous discussions will be accessible from the publisher’s website as long as the book is in print.

about the author

Ben Wilson is an ML engineer who has served as a nuclear engineering technician, a semiconductor process engineer, and a data scientist. He’s been solving problems with data and open source tooling for over a decade, helping others do the same for the last four years. He enjoys building ML framework code, helping people think through challenging DS problems, and having a good chuckle.

about the cover illustration

The figure on the cover of Machine Learning Engineering in Action, “Hiatheo ou Esclave Chinoise,” or “Hia Theo, a Chinese servant,” is taken from a collection by Jacques Grasset de Saint-Sauveur, published in 1788. Each illustration is finely drawn and colored by hand.

In those days, it was easy to identify where people lived and what their trade or station in life was just by their dress. Manning celebrates the inventiveness and initiative of the computer business with book covers based on the rich diversity of regional culture centuries ago, brought back to life by pictures from collections such as this one.
