Despite the vast potential of AI, it hasn’t caught hold in most industries. Sure, it has transformed consumer internet companies such as Google, Baidu, and Amazon—all massive and data rich, with hundreds of millions of users. But industries such as manufacturing, agriculture, and health care still need to find ways to make this technology work for them. Here’s the problem: The playbook that these consumer internet companies use to build their AI systems—where a single one-size-fits-all AI system can serve massive numbers of users—won’t perform well for these other industries.
Instead, these legacy industries will need a large number of bespoke solutions that are adapted to their many diverse use cases. This doesn’t mean that AI won’t work for them, however. It just means they need to take a different approach.
To bridge this gap and unleash AI’s full potential, executives in all industries should adopt a new, data-centric approach to building AI. They should aim to build AI systems with careful attention to ensuring that the data clearly conveys what they need the AI to learn. This requires focusing on data that covers important cases and is consistently labeled, so that the AI can learn from this data what it is supposed to do. In other words, the key to creating these valuable AI systems is programming with data rather than with code.
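Why isn’t AI widely used outside consumer internet companies? The top challenges facing AI adoption in other industries include small data sets, the high cost of customization, and the long road from proof of concept to production.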
For AI to realize its full potential, we need a systematic approach to solving these problems across all industries. The data-centric approach to AI, supported by tools designed for building, deploying, and maintaining AI applications—called machine learning operations (MLOps) platforms—will make this possible.
AI systems are made up of software—the computer program that includes an AI model—and data, the information used to train the model. For example, to build an AI system for automated inspection in manufacturing, an AI engineer might create software that implements a deep learning algorithm, which is then shown a data set comprising pictures of good and defective parts so it can learn to distinguish between them.
Over the last decade, a lot of AI research was driven by software-centric development (also called model-centric development) in which the data is fixed and teams attempt to optimize or invent new programs to learn from the available data. Many tech companies had large data sets from millions of consumers, and they used these to drive a lot of innovation in AI.
But at AI’s current level of sophistication, the bottleneck for many applications is getting the right data to feed to the software. We’ve heard about the benefits of big data, but we now know that for many applications, it is more fruitful to focus on making sure we have good data—data that clearly illustrates the concepts we need the AI to learn. This means, for example, the data should be reasonably comprehensive in its coverage of important cases and labeled consistently. Data is food for AI, and modern AI systems need not only calories but also high-quality nutrition.
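To make "comprehensive coverage" and "consistent labels" concrete, here is a minimal sketch of a data audit in Python. It assumes each image is labeled by more than one inspector; the function name `audit_labels`, the class names, and the `min_per_class` cutoff are all hypothetical choices for illustration, not part of any particular MLOps product.

```python
from collections import defaultdict


def audit_labels(annotations, expected_classes, min_per_class=2):
    """Flag inconsistently labeled items and thinly covered classes.

    annotations: iterable of (item_id, annotator, label) tuples.
    expected_classes: the classes the AI is supposed to learn (an assumption
        supplied by the domain experts).
    min_per_class: hypothetical minimum number of consistently labeled
        examples wanted for each class.
    """
    labels_by_item = defaultdict(list)
    for item_id, _annotator, label in annotations:
        labels_by_item[item_id].append(label)

    # Consistency: items whose annotators did not all give the same label.
    inconsistent = sorted(
        item for item, labels in labels_by_item.items() if len(set(labels)) > 1
    )

    # Coverage: count only items on which all annotators agreed.
    coverage = {cls: 0 for cls in expected_classes}
    for labels in labels_by_item.values():
        if len(set(labels)) == 1 and labels[0] in coverage:
            coverage[labels[0]] += 1

    underrepresented = sorted(
        cls for cls in expected_classes if coverage[cls] < min_per_class
    )
    return inconsistent, coverage, underrepresented


# Made-up example data: two inspectors label five images.
sample = [
    ("img_001", "inspector_a", "scratch"), ("img_001", "inspector_b", "scratch"),
    ("img_002", "inspector_a", "ok"),      ("img_002", "inspector_b", "scratch"),
    ("img_003", "inspector_a", "dent"),    ("img_003", "inspector_b", "dent"),
    ("img_004", "inspector_a", "ok"),      ("img_004", "inspector_b", "ok"),
    ("img_005", "inspector_a", "scratch"), ("img_005", "inspector_b", "scratch"),
]

inconsistent, coverage, underrepresented = audit_labels(
    sample, expected_classes=["ok", "scratch", "dent"]
)
print("Relabel or discuss:", inconsistent)
print("Coverage:", coverage)
print("Collect more examples of:", underrepresented)
```

The items the audit flags are the ones worth putting in front of the inspectors again: either the labeling rule is ambiguous, or there are too few clear examples for the AI to learn the case.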
Shifting your focus from software to data offers an important advantage: It relies on the people you already have on staff. At a time when AI talent is in short supply, a data-centric approach allows many subject matter experts who have vast knowledge of their respective industries to contribute to AI system development.
For example, most factories have workers who are highly skilled at defining and identifying what counts as a defect (is a 0.2 mm scratch a defect or is it so small that it doesn’t matter?). If we expect each factory to ask its workers to invent new AI software as a way to get that factory the bespoke solution it needs, progress will be slow. But if we instead build and provide tools to empower these domain experts to engineer the data—by allowing them to express their knowledge about manufacturing through providing data to the AI—their odds of success will be much higher.
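One way such a tool might let inspectors express their knowledge is to write down the defect rule they agree on and then check existing labels against it. The Python sketch below assumes a single measurable attribute (scratch length) and a made-up threshold of 0.3 mm; the threshold, the `Inspection` record, and the function names are hypothetical, and a real factory's experts would set their own rule.

```python
from dataclasses import dataclass

# Hypothetical threshold: the value the inspectors agree on, not a standard.
SCRATCH_DEFECT_MIN_MM = 0.3


@dataclass
class Inspection:
    part_id: str
    scratch_length_mm: float
    label: str  # "defect" or "ok", as recorded by an inspector


def expected_label(scratch_length_mm, threshold_mm=SCRATCH_DEFECT_MIN_MM):
    """Apply the agreed rule: scratches at or above the threshold are defects."""
    return "defect" if scratch_length_mm >= threshold_mm else "ok"


def find_rule_violations(inspections, threshold_mm=SCRATCH_DEFECT_MIN_MM):
    """Return the IDs of parts whose recorded label contradicts the agreed rule."""
    return [
        insp.part_id
        for insp in inspections
        if insp.label != expected_label(insp.scratch_length_mm, threshold_mm)
    ]


# Made-up records for illustration.
records = [
    Inspection("P1", 0.2, "defect"),  # below threshold but marked as a defect
    Inspection("P2", 0.5, "defect"),
    Inspection("P3", 0.1, "ok"),
    Inspection("P4", 0.4, "ok"),      # above threshold but marked as ok
]

print("Labels to review:", find_rule_violations(records))
```

The point of the sketch is the division of labor: the domain experts decide the threshold and review the flagged parts, and no one on the factory floor has to write new AI software.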
The shift toward data-centric AI development is being enabled by the emerging field of MLOps, which provides tools that make building, deploying, and maintaining AI systems easier than ever before. Tools that are geared to help produce high-quality data sets, in particular, hold the key to addressing the challenges of small data sets, high cost of customization, and the long road to getting an AI project into production outlined above.
How, exactly? First, ensuring high-quality data means that AI systems will be able to learn from the smaller data sets available in most industries. Second, by making it possible for a business’s domain experts, rather than AI experts, to engineer the data, the ability to use AI will become more accessible to all industries. And third, MLOps platforms provide much of the scaffolding software needed to take an AI system to production, so teams no longer have to develop this software themselves. This allows teams to deploy AI systems, and bridge the gap between proof of concept and production, in weeks or months rather than years.
The vast majority of valuable AI projects have yet to be imagined. And even for projects that teams are already working on, the gap between proof of concept and deployment in production remains to be bridged. Indeed, Accenture estimates that 80% to 85% of companies’ AI projects are in the proof-of-concept stage.
Here are some things companies can do right now:
It’s possible for AI to become a thriving asset outside of data-rich consumer internet businesses, but it has yet to hit its stride in other industries. A new data-centric mindset, coupled with MLOps tools that allow industry domain experts to participate in the creation, deployment, and maintenance of AI systems, will ensure that all industries can reap the rewards that AI can offer.
__________
Andrew Ng is the founder and CEO of Landing AI, the former VP and chief scientist of Baidu, cochairman and cofounder of Coursera, the former founding lead of Google Brain, and an adjunct professor at Stanford University.
Adapted from content posted on hbr.org, July 29, 2021 (product #H06HSP).