Lost (and found) in the cloud

Having a beefy desktop machine with a GPU and an Ubuntu build is great for prototyping and research, but when it comes time to get your model into production and make the day-to-day predictions your use case requires, you need compute resources that are highly available and scalable. What does that actually mean?

Imagine you've taken our Convolutional Neural Network (CNN) example, tweaked the model and trained it on your own data, and created a simple REST API frontend to call the model. You want to build a little business around providing clients with a service whereby they pay some money, get an API key, and can submit an image to an endpoint and get a reply stating what that image contains. Image recognition as a service! Does this sound good?
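
To make the idea concrete, here is a minimal sketch of what such an endpoint could look like. It is not the book's implementation: it assumes a Flask web server, a hypothetical `predict_label()` helper standing in for whatever trained CNN you ended up with, and a single hard-coded API key. A real service would validate keys against a proper key store and add rate limiting, logging, and error handling.

```python
# Minimal image-recognition-as-a-service endpoint (sketch, not production code).
# Assumes Flask is installed; predict_label() is a placeholder for your trained CNN.
from flask import Flask, request, jsonify

app = Flask(__name__)

VALID_API_KEYS = {"demo-key-123"}  # placeholder; use a real key store in production


def predict_label(image_bytes: bytes) -> str:
    """Stand-in for the trained CNN: replace with your model's inference call."""
    return "cat"  # dummy label so the sketch runs end to end


@app.route("/predict", methods=["POST"])
def predict():
    # Clients authenticate with an API key sent in a request header.
    api_key = request.headers.get("X-API-Key")
    if api_key not in VALID_API_KEYS:
        return jsonify({"error": "invalid or missing API key"}), 401

    # The image arrives as a multipart file upload.
    image = request.files.get("image")
    if image is None:
        return jsonify({"error": "no image supplied"}), 400

    label = predict_label(image.read())
    return jsonify({"label": label})


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```

A paying client would then call something like `curl -H "X-API-Key: demo-key-123" -F image=@photo.jpg http://your-host:8080/predict` and get back a JSON label.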

How would you make sure your service is always available and fast? After all, people are paying you good money, and even a small outage or dip in reliability could cause you to lose customers to a competitor. Traditionally, the solution was to buy a bunch of expensive server-grade hardware, usually a rack-mounted server with multiple power supplies and network interfaces, to ensure service continuity in the case of hardware failure. You'd need to examine options for redundancy at every level, from disk or storage all the way through to the network and even the internet connection.

The rule of thumb was that you needed two of everything, and this came at a considerable, even prohibitive, cost. A large, well-funded start-up had plenty of options, but as the funding curve dropped off, so did those options. It was inevitable that self-hosting gave way to managed hosting (not always, but for most small or start-up use cases), which in turn gave way to a standardized layer of compute running in someone else's data center, to the point where you simply didn't need to care about the underlying hardware or infrastructure at all.

Of course, in reality, this is not always the case. A cloud provider such as AWS takes most of the boring, painful (but necessary) stuff, such as hardware replacements and general maintenance, out of the equation. You're not going to lose a disk or fall prey to a faulty network cable, and if you decide (hey, this is all working well) to serve 100,000 customers a day, then you can push a simple infrastructure spec change. No calls to a hosting provider, negotiating outages, or trips to the computer hardware store required.
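
As one illustration of what such an infrastructure spec change might look like, the sketch below assumes your prediction service runs behind an AWS Auto Scaling group and uses boto3 to raise its instance counts. The group name and sizes are placeholder assumptions, and the same change could just as well be expressed in a CloudFormation or Terraform template.

```python
# Sketch: scaling up the prediction fleet with a code change instead of new hardware.
# Assumes an existing Auto Scaling group named "image-api-asg" and configured AWS credentials.
import boto3

autoscaling = boto3.client("autoscaling")

# Raise the floor, ceiling, and target size of the fleet in one API call.
autoscaling.update_auto_scaling_group(
    AutoScalingGroupName="image-api-asg",  # placeholder name for your service's group
    MinSize=4,
    MaxSize=20,
    DesiredCapacity=10,
)
```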

This is an incredibly powerful idea; the literal nuts and bolts of your solution (the mix of silicon and gadgetry that your model will use to make predictions) can almost be treated as an afterthought, at least compared to a few short years ago. The skill set, or approach, generally required to maintain cloud infrastructure is called DevOps. It means that an individual has a foot in two (or more!) camps: they understand what all these AWS resources represent (servers, switches, and load balancers) and know how to write the code necessary to specify and manage them.

An evolving role is that of the machine learning engineer. This is the traditional DevOps skill set, but as more of the Ops side becomes automated or abstracted away, the individual can also focus on model training, deployment, and, indeed, scaling. It is beneficial to have engineers involved across the entire stack. Understanding how parallelizable a model is, what memory requirements it has, and how to build the distributed infrastructure needed to perform inference at scale results in a model-serving infrastructure whose design elements are not the product of siloed domain specialization but an integrated whole.
