© Geoff Hulten 2018
Geoff Hulten, Building Intelligent Systems, https://doi.org/10.1007/978-1-4842-3432-7_24

24. Dealing with Mistakes

Geoff Hulten
Lynnwood, Washington, USA
There will be mistakes. Humans make them. Artificial intelligences make them, too—and how. Mistakes can be irritating or they can be disastrous.
Every Intelligent System should have a strategy for identifying mistakes; for example, by monitoring critical metrics and giving users easy ways to report problems.
Every Intelligent System should also have a strategy for dealing with mistakes. Perhaps it’s done by updating intelligence; perhaps by having humans override certain behaviors by hand; perhaps by offering a workaround or refund to affected users.
Some mistakes will be very hard to find. Some will be very hard to fix (without introducing new mistakes). And some will take a very long time to fix (hours to deploy a new model or months to come up with a new intelligence strategy).
This chapter will discuss ways of dealing with mistakes, including these topics:
  • The types of mistakes the system might make (especially the bad ones).
  • Reasons intelligences might make mistakes.
  • Ways to mitigate mistakes.
Every orchestrator of an Intelligent System should embrace the reality of mistakes.

The Worst Thing That Could Happen

Ask yourself: What is the worst thing my Intelligent System could do?
  • Maybe your Intelligent System will make minor mistakes, like flashing a light the user doesn’t care about or playing a song they don’t love.
  • Maybe it could waste time and effort, automating something that a user has to undo, or causing your user to take their attention off the thing they actually care about and look at the thing the intelligence is making a mistake about.
  • Maybe it could cost your business money by deciding to spend a lot of CPU or bandwidth, or by accidentally hiding your best (and most profitable) content.
  • Maybe it could put you at legal risk by taking an action that is against the law somewhere, or by shutting down a customer or a competitor’s ability to do business, causing them damages you might end up being liable for.
  • Maybe it could do irreparable harm by deleting things that are important, melting a furnace, or sending an offensive communication from one user to another.
  • Maybe it could hurt someone—even get someone killed.
Most of the time, when you think about your system you are going to think about how amazing it will be, all the good it will do, all the people who will love it. You’ll want to dismiss its problems; you’ll even try to ignore them.
Don’t.
Find the worst thing your system can do.
Then find the second worst.
Then the third worst.
Then get five other people to do the same thing. Embrace their ideas and accept them.
And then when you have fifteen really bad things your Intelligent System might do, ask yourself: is that OK?
Because these types of mistakes are going to happen, and they will be hard to find, and they will be hard to correct.
If the worst thing your system might do is too bad to contemplate, you might want to design a different system—one that couldn’t do that bad thing, ever, no matter what the intelligence says. Maybe you make sure a human is part of the decision process. Maybe you use a less forceful experience. Maybe you find something completely different to do with your life…
Because the intelligence will make mistakes and, eventually, the worst thing will happen.

Ways Intelligence Can Break

An Intelligent System will make mistakes for many different reasons. Some of them are implementation or management problems; some of them are intelligence problems. This section discusses these potential problem sources, including the following:
  • System outages
  • Model outages
  • Intelligence errors
  • Intelligence degradation
The first step to fixing a mistake is understanding what is causing it.

System Outage

Sometimes computers crash. Sometimes the Internet is slow. Sometimes network cables get cut. Sometimes a system has subtle bugs in the way its subsystems interact. These are problems with the implementation or the operation of your Intelligent System, but they might show up the same way intelligence mistakes do, in user reports, escalations, and degrading metrics.
Isolating these types of problems can be difficult in large systems, particularly when intelligence is spread between clients (which are in different states of upgrade) and multiple servers (which can live in various data centers).
Catastrophic outages are usually easy to find—because everything tanks. But partial outages can be more subtle. For example, suppose 1% of your traffic is going to a particular server and the server bombs out in a crazy way. One percent of your users are getting a bad experience, and maybe they are reporting it, over and over… But that’s just 1% of your user base. 99% of your users aren’t bothered. Would you ever notice?
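As a rough sketch of how you might catch this kind of partial outage, the Python below compares each segment's (for example, each server's) problem rate to the overall rate, so a broken 1% slice doesn't hide in the average. The event format, names, and thresholds are illustrative, not taken from any particular system.

```python
from collections import defaultdict

def find_suspect_segments(events, min_events=500, ratio_threshold=5.0):
    """Flag segments (e.g., individual servers) whose problem rate is far
    above the overall rate. `events` is an iterable of
    (segment_id, had_problem) pairs taken from telemetry; both names are
    placeholders for whatever your system actually logs."""
    totals = defaultdict(int)
    problems = defaultdict(int)
    for segment_id, had_problem in events:
        totals[segment_id] += 1
        if had_problem:
            problems[segment_id] += 1

    overall_rate = sum(problems.values()) / max(1, sum(totals.values()))
    suspects = []
    for segment_id, count in totals.items():
        if count < min_events:
            continue  # not enough traffic to judge this segment
        rate = problems[segment_id] / count
        if rate > ratio_threshold * max(overall_rate, 1e-6):
            suspects.append((segment_id, rate, count))
    return sorted(suspects, key=lambda s: s[1], reverse=True)
```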
System outages should be rare, and they should be fixed immediately. If they become prevalent they will paralyze intelligence work—and they will be just plain bad for morale.

Model Outage

Related to system outages, a model outage is more an implementation problem than an intelligence problem—but it will have similar symptoms.
Model outages can occur when:
  • A model file is corrupted in deployment.
  • A model file goes out of sync with the code that turns contexts into features.
  • The intelligence creation environment goes out of sync with the intelligence runtime environment.
  • An intelligence goes out of sync with the experience.
These problems can be very hard to find—imagine if some feature code gets updated in the intelligence creation environment, but not in the intelligence runtime. Then when a new model (using the updated feature code) is pushed to the runtime (using the out-of-date feature code) it will be confused. It will get feature values it doesn’t expect. It will make mistakes. Because of this, maybe the accuracy is 5% worse in the runtime than it is in the lab. All the testing in the lab shows that the intelligence is working fine, but users are getting a slightly worse experience.
Because these problems are so hard to find, every intelligence implementation should have checks and double-checks to make sure the intelligence-creation environment is in sync with the runtime environment, that everything is deployed correctly, and that all components are in sync.
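Here is a minimal sketch, in Python, of one such check: fingerprint the feature-extraction code in the intelligence-creation environment, ship the fingerprint alongside the model, and refuse to load the model if the runtime's feature code doesn't match. The manifest format and function names are assumptions made for illustration.

```python
import hashlib
import json

def featurizer_fingerprint(source_paths):
    """Hash the feature-extraction source files so the training environment
    and the runtime can prove they are running the same code."""
    digest = hashlib.sha256()
    for path in sorted(source_paths):
        with open(path, "rb") as f:
            digest.update(f.read())
    return digest.hexdigest()

def write_manifest(model_path, source_paths, manifest_path):
    """Run in the intelligence-creation environment when a model ships."""
    manifest = {
        "model_path": model_path,
        "featurizer_fingerprint": featurizer_fingerprint(source_paths),
    }
    with open(manifest_path, "w") as f:
        json.dump(manifest, f)

def check_manifest(manifest_path, runtime_source_paths):
    """Run at model-load time in the runtime; refuse to serve a model that
    was built against different feature code than the runtime has deployed."""
    with open(manifest_path) as f:
        manifest = json.load(f)
    if featurizer_fingerprint(runtime_source_paths) != manifest["featurizer_fingerprint"]:
        raise RuntimeError(
            "Feature code mismatch between intelligence creation and runtime.")
```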

Intelligence Errors

When the models that make up intelligence don’t match the world perfectly (and they don’t), there will be mistakes. Recall that creating intelligence is a balancing act between learning a very complex model that can represent the problem and learning a model that can generalize well to new contexts. There will always be gaps—places where the model isn’t quite right.
And these gaps cause mistakes, mistakes that are hard to correct through intelligence creation. You can try another type of model, but that will make its own (new) types of mistakes. You can get more data, but that has diminishing returns. You can try more feature engineering—and it usually helps. But these types of mistakes will always exist.
They will appear a bit random. They will change over time (as the training data changes). They aren’t easy to correct—it will require sustained effort, and it will get harder the further you go.
One additional challenge for intelligence errors is figuring out which part of the intelligence is responsible. When models are loosely coupled (for example, when they have an order of execution and the first model to make a statement about a context wins), it can be easy to determine exactly which model gave the incorrect answer. But when models are tightly coupled (for example, when the output of several models is combined using a complex heuristic or a meta-model), a mistake will be harder to track. If six models are each partially responsible for a mistake, where do you begin?
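One practical aid is to log every component model's output next to the combined decision, so a reported mistake can be traced back to the models that drove it. A small sketch of what that telemetry might look like follows; the model names, the combination rule, and the assumption that a context is a dictionary with an "id" field are all placeholders.

```python
import json
import time

def combined_decision(context, models, combine, telemetry_log):
    """Run every component model, combine their outputs, and record each
    model's individual score next to the final answer. `models` maps a model
    name to a scoring function; `combine` is whatever heuristic or meta-model
    merges the scores."""
    scores = {name: model(context) for name, model in models.items()}
    decision = combine(scores)
    telemetry_log.write(json.dumps({
        "timestamp": time.time(),
        "context_id": context.get("id"),
        "per_model_scores": scores,
        "decision": decision,
    }) + "\n")
    return decision
```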

Intelligence Degradation

When an open-ended, time-changing problem changes, the intelligence you had yesterday will not be as good today. Changing problems compound generic intelligence errors because new mistakes will occur even when you don’t change anything. Further, training data for the “new” problem will take time to accumulate, meaning you may need to wait to respond and you may never be able to get enough training data to learn any particular version of the problem well (by the time you learn it, it isn’t relevant any more).
There are two main categories of change:
  1. Where new contexts appear over time (or old ones disappear), in which case you will need to create new intelligence to work on the new contexts, but existing training data can still be used on old contexts.
  2. Where the meaning of contexts changes over time, in which case existing training data can be misleading, and you’ll need to focus on new telemetry to create effective intelligence.
One way to understand the degradation in your Intelligent System is to preserve old versions of your intelligence and run a spectrum of previous intelligences on current data—the intelligence from yesterday, from five days ago, from ten days ago, and so on. By looking at how mistakes change, you can gain intuition about the way your problem is evolving and use that intuition when choosing how to adapt.
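A small sketch of what that measurement might look like, assuming you have preserved the old models and have some labeled data from today; the names and data formats are placeholders:

```python
def degradation_report(archived_models, evaluation_data):
    """Evaluate a spectrum of preserved models (yesterday's, last week's, and
    so on) against current labeled data to see how quickly quality decays.
    `archived_models` maps an age label to a predict(context) function and
    `evaluation_data` is a list of (context, correct_answer) pairs."""
    report = {}
    for age_label, predict in archived_models.items():
        correct = sum(1 for context, answer in evaluation_data
                      if predict(context) == answer)
        report[age_label] = correct / max(1, len(evaluation_data))
    return report

# Example (hypothetical): a steep drop-off in accuracy from "today" to
# "10_days_ago" tells you the problem is changing quickly.
# accuracies = degradation_report(
#     {"today": model_today, "5_days_ago": model_5d, "10_days_ago": model_10d},
#     labeled_telemetry_from_today)
```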

Mitigating Mistakes

Random, low-cost mistakes are to be expected. But when mistakes spike, when they become systematic, or when they become risky or expensive, you might consider mitigations.
This section discusses the following approaches to mitigating errors:
  • Investing in intelligence
  • Balancing the experience
  • Adjusting intelligence management parameters
  • Implementing guardrails
  • Overriding errors

Invest in Intelligence

In a healthy Intelligent System, the intelligence will be constantly improving. One way to deal with mistakes is to wait for the intelligence to catch up.
In fact, almost every other approach to mitigating mistakes degrades the value of the Intelligent System for users who aren’t having problems (by watering down the experience); or it adds complexity and maintenance cost in the long run (by adding manual tweaks that must be maintained). Because of this, improving the intelligence is a great way to deal with mistakes—when it is possible. The best ways to invest in improving intelligence with respect to mistakes are these:
  1. Get more relevant telemetry or training data that contains the contexts where mistakes are occurring. This might allow the intelligence creation to start solving the problem with very little work.
  2. Help intelligence creators prioritize the parts of the system they spend time on by grouping mistakes into categories (by locale, age, user properties, and so on) and prioritizing the categories. Intelligence creators can then work on features and modeling that help in those specific areas, maybe via partitioning and focused modeling.
  3. Provide more resources to intelligence creation in terms of people and tools.
These investments will improve the overall quality of the intelligence and, over time, a rising tide will raise all ships.
And perhaps the worst way to invest in intelligence is to track intelligence errors as if they were software defects and hold intelligence creators accountable to fix them in order, one after the next, until there aren’t any more. That’s not the way it works. If there are errors that you absolutely must fix, then you should consider one of the other mitigation approaches discussed in this section.

Balance the Experience

If the errors are low-grade, random intelligence errors or are caused by intelligence degradation, they will be hard to solve. In these cases, you might want to rebalance the experience, making it less forceful and making the errors less costly.
If the problems are bad enough, you could consider essentially turning off an intelligent experience until you can get on top of the problem.
There are many chapters that discuss ways to balance intelligence and experience, so I won’t cover them again here.

Adjust Intelligence Management Parameters

If the errors are because of degradation—that is, because new contexts are showing up quickly or old contexts are changing meaning—you might be able to address them by training and deploying new models faster.
You might also change what training data you use, for example by phasing out old training data more quickly (when contexts are changing meaning) or up-sampling new contexts in telemetry or training (which helps with changing meanings and when new contexts are appearing quickly).
These approaches are similar to investing in intelligence, but they can be more reactive. For example, when a holiday comes around, a bad weather event occurs, or a new product is launched, your problem might change more quickly than usual. An orchestrator might know this and tweak some knobs rather than waiting for intelligence creators to learn how to predict these types of events.
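As an illustration of one such knob, the sketch below weights training examples by age with a tunable half-life, so old data is phased out smoothly and an orchestrator can shorten the half-life when the problem starts moving quickly. The function and its parameters are hypothetical, not part of any particular toolkit.

```python
import math

def recency_weights(example_ages_days, half_life_days=30.0):
    """Weight training examples so newer data counts more, phasing out old
    data smoothly instead of cutting it off at a fixed date. Shorten
    `half_life_days` when the problem is changing quickly (a holiday, a bad
    weather event, a new product launch)."""
    return [math.exp(-math.log(2) * age / half_life_days)
            for age in example_ages_days]

# Example: recency_weights([0, 1, 45, 200], half_life_days=15)
# Many training pipelines accept per-example weights; for instance, many
# scikit-learn estimators take a sample_weight argument in fit().
```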

Implement Guardrails

Sometimes you encounter categories of mistakes that are just silly. Any human would look at the mistake and know that it couldn’t possibly be right. For example:
  • When the pellet griller is at 800 degrees you never want to add more fuel to the fire.
  • When the user is 10 years old you never want to show them a horror movie.
In these cases, you could try to trick the intelligence-creation algorithms into learning these things. You could gather focused training data. You could hound the intelligence creators. You could invest months of work…
Or you could implement a simple heuristic to override the intelligence when it is about to do something that is obviously crazy—a guardrail.
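A guardrail can be as small as a few lines of code. Here is a sketch, in Python, of what guardrails for the two examples above might look like; the field names, decision labels, and thresholds are illustrative, not from any real system.

```python
def apply_guardrails(context, intelligence_decision):
    """Override the intelligence only when it is about to do something a
    human would immediately recognize as wrong."""
    # Never add fuel when the griller is already dangerously hot.
    if context.get("griller_temp_f", 0) >= 800 and intelligence_decision == "add_fuel":
        return "do_nothing"

    # Never recommend a horror movie to a young child.
    if context.get("user_age", 99) < 13 and intelligence_decision == "recommend_horror":
        return "recommend_family_title"

    # Otherwise, trust the intelligence.
    return intelligence_decision
```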
When using guardrails, make sure to:
  1. Be conservative—only override obvious problems and don’t get drawn into creating sophisticated intelligence by hand.
  2. Revisit your decisions—by tracking the performance (and cost) of guardrails and removing or relaxing ones that become less important as intelligence improves or the problem changes.

Override Errors

Sometimes there is no way around it; your system will make expensive mistakes that can’t be mitigated any other way, and you’ll have to override these mistakes by hand. For example:
  • You run a search engine and the top response for the query “games” is not about games.
  • You run an anti-spam service and it is deleting all the mail from a legitimate business.
  • You run an e-commerce site and it removed a product for “violating policy,” but the product wasn’t violating policy.
  • You run a funny-webpage finder and it is marking the Internet’s most popular joke site as not-funny.
When these mistakes are important enough, you might want to have a special user experience to allow users to report problems. And you might want to have some processes around responding to these reports. For example, you might create a support group with the right tools and workflows to examine every reported mistake within an hour, 24 hours a day, 7 days a week.
As with guardrails, make sure to use overriding sparingly and to track the quality and cost of overrides over time.
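One simple way to support this is a hand-maintained override table that is consulted before the intelligence’s answer is used. The sketch below is illustrative; the keying scheme, the ticket reference, and the workflow around it are assumptions that would depend on your system.

```python
class ManualOverrides:
    """A small table of hand-entered corrections consulted before the
    intelligence's answer is used. Entries should be rare, dated, and
    reviewed so they can be retired as the intelligence improves."""

    def __init__(self):
        self._overrides = {}  # maps a context key to (answer, reason, date)

    def add(self, context_key, answer, reason, date):
        self._overrides[context_key] = (answer, reason, date)

    def remove(self, context_key):
        self._overrides.pop(context_key, None)

    def apply(self, context_key, intelligence_answer):
        if context_key in self._overrides:
            return self._overrides[context_key][0]
        return intelligence_answer

# Hypothetical usage: after a support escalation, pin the answer for one query.
# overrides = ManualOverrides()
# overrides.add("query:games", answer="top_games_page",
#               reason="support escalation", date="2018-05-01")
```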

Summary

Mistakes are part of Intelligent Systems, and you should have a plan to measure mistakes and deal with them. Part of this plan is to understand what types of bad things your system can do. Be honest with yourself. Be creative in imagining problems.
In order to fix a problem, it’s helpful to know what is causing the problem. Mistakes can occur when:
  • A part of your Intelligent System has an outage.
  • Your model is created, deployed, or interpreted incorrectly.
  • Your intelligence isn’t a perfect match for the problem (and it isn’t).
  • The problem or user-base changes.
Once you’ve found a problem you can mitigate it in a number of ways:
  • By investing more in intelligence.
  • By rebalancing the experience.
  • By changing intelligence management parameters.
  • By implementing guardrails.
  • By overriding errors.
An active mistake mitigation plan can allow the rest of your Intelligent System to be more aggressive—and achieve more impact. Embracing mistakes, and being wise and efficient at mitigating them, is an important part of orchestrating an Intelligent System.

For Thought…

After reading this chapter, you should:
  • Understand when and how mistakes put an Intelligent System at risk.
  • Understand how to tell whether an Intelligent System is working and how to identify the common ways it might fail.
  • Be able to mitigate mistakes using a collection of common approaches, and know when to use the various approaches.
You should be able to answer questions like these:
  • What is the most widespread Intelligent System mistake you are aware of?
  • What is the most expensive one?
  • Design a system to address one of these two mistakes (the widespread one or the expensive one).
  • Would it work for the other mistake? Why or why not?