© Geoff Hulten 2018
Geoff Hulten, Building Intelligent Systems, https://doi.org/10.1007/978-1-4842-3432-7_14

14. Intelligence Management

Geoff Hulten
The intelligence in an Intelligent System takes a journey from creation, to verification, to deployment, to lighting up for users, and finally to being monitored over time. Intelligence management bridges the gap between intelligence creation (which is discussed in Part IV of this book) and intelligence orchestration (which is discussed in Part V), by making it safer and easier to deploy new intelligence and enable it for users.
At its simplest, intelligence management might involve hand-copying model files to a deployment directory where the runtime picks them up and exposes them to users. But in large, complex Intelligent Systems, the process of managing intelligence can (and probably should) be much more involved.
This chapter will discuss some of the challenges with intelligence management. Then it will provide an overview of ways intelligence management can support agility in your Intelligent System while verifying intelligence and lighting it up with users.

Overview of Intelligence Management

Intelligence management involves all the work to take intelligence from where it is created and put it where it will impact users. This includes:
  • Sanity checking the intelligence to make sure it will function correctly.
  • Deploying the intelligence to the runtimes it needs to execute in.
  • Lighting up the intelligence in a controlled fashion.
  • Turning off intelligence that is no longer helpful.
An intelligence management system can be simple (like a set of instructions an intelligence operator must execute manually for each of these steps); it can be partially automated (like a set of command-line tools that do the work); or it can be very slick (like a graphical console that lets intelligence orchestrators inject, move, and light up or disable intelligence with a click). A good intelligence management system will do the following:
  • Provide enough support to match the skills and scenarios for intelligence management in your environment.
  • Make it hard to make mistakes.
  • Not introduce too much latency.
  • Make a good tradeoff between human involvement and implementation cost.
Intelligence management is challenging because of complexity, frequency, and human systems. We’ll discuss these in turn.

Complexity in Intelligence Management

Intelligent Systems can be quite complex. For example:
  • The intelligence might need to live at multiple places between the client and server, including client-side intelligence, one or more server-side intelligences, and cached intelligence.
  • Intelligence might come from dozens of different sources—some of them machine-learned, some created by humans, and some created by processes outside of your organization.
  • Various parts of intelligence might depend on one another, and might be updated at different frequencies by different people.
As Intelligent Systems grow over time, simply getting a new piece of intelligence correctly deployed to users can be difficult and error-prone.

Frequency in Intelligence Management

Intelligent Systems will have their intelligence updated many times during their life-cycles. Consider that:
  • Updating intelligence once a week for three years is about a hundred sixty times (52 × 3 = 156).
  • Updating intelligence once a day for three years is about a thousand times (365 × 3 = 1,095).
  • Updating intelligence once an hour for three years is about twenty-six thousand times (24 × 365 × 3 = 26,280).
  • Updating intelligence once a minute for three years is about one and a half million times (60 × 24 × 365 × 3 = 1,576,800).
These are pretty big numbers. They mean that the intelligence management process needs to be reliable (have a low error rate), and it probably can’t take much human effort.

Human Systems

Intelligence might be deployed by people with all sorts of skill levels and backgrounds:
  • Experts who understand the implementation of the Intelligent System well.
  • Machine-learning practitioners who are not great engineers.
  • Nontechnical people who are hand-correcting costly mistakes the system is making.
  • New employees.
  • Disgruntled employees.
Making intelligence management easier and less error-prone can pay large dividends in the agility with which your Intelligent System evolves over time.

Sanity-Checking Intelligence

A rapid, automated sanity-checking system is a safety net for intelligence creators, allowing them to innovate with confidence and focus their energy on building better intelligence (not on cross-checking a bunch of details and remembering how to run all the tests they should run). An effective safety net will verify that new intelligence:
  • Is compatible with the Intelligent System.
  • Executes and meets any runtime requirements.
  • Doesn’t make obvious mistakes.
A good intelligence management system will make deploying intelligence through these automated checks easier than bypassing them.
We’ll now explore these categories of checks in turn.

Checking for Compatibility

Mistakes happen. Sometimes intelligence creators format things wrong, or forget to run a converter on their model files, or train a model from a corrupted telemetry file, or the training environment breaks in an odd way that outputs a corrupted model. Intelligence management is a great chokepoint where lots of simple, common mistakes can be caught before they turn into damaging problems. Here are some things to check to ensure that intelligence is compatible:
  • The intelligence data file is properly formatted and will load in the intelligence runtime.
  • The new intelligence is in sync with the feature extractor that is currently deployed.
  • The new intelligence is in sync with the other intelligence in the system, or dependent intelligence is being deployed simultaneously.
  • The new intelligence deployment contains all the required meta-data (such as any thresholds needed to hook it to the intelligent experience).
  • The cost of deploying the new intelligence will be reasonable (in terms of bandwidth costs and the like).
These are all static tests that should be simple to automate, should not introduce much latency, and don’t require much human oversight (unless there is a problem)—it is easy to determine automatically whether they pass or fail.
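To make this concrete, here is a minimal sketch of such a compatibility gate in Python. It assumes a hypothetical model format—a JSON envelope with a metadata block—and illustrative version strings and budgets; a real system would use its runtime's own loader and its own limits:

import json
import os

# Illustrative constants—set these to match your own system.
MAX_DEPLOYMENT_BYTES = 50 * 1024 * 1024     # bandwidth budget per push
FEATURE_EXTRACTOR_VERSION = "2.3"           # version deployed in the runtime
REQUIRED_METADATA = {"threshold", "model_version", "feature_extractor_version"}

def check_compatibility(model_path):
    """Return a list of problems; an empty list means the model passed."""
    errors = []

    # The file is properly formatted and loads. This sketch assumes a
    # JSON envelope; a real runtime would use its own loader here.
    try:
        with open(model_path) as f:
            model = json.load(f)
    except (OSError, json.JSONDecodeError) as e:
        return ["model failed to load: %s" % e]

    metadata = model.get("metadata", {})

    # In sync with the feature extractor that is currently deployed.
    if metadata.get("feature_extractor_version") != FEATURE_EXTRACTOR_VERSION:
        errors.append("feature extractor version mismatch")

    # Contains all the required meta-data (such as thresholds).
    missing = REQUIRED_METADATA - metadata.keys()
    if missing:
        errors.append("missing metadata: %s" % sorted(missing))

    # The cost of deploying will be reasonable.
    if os.path.getsize(model_path) > MAX_DEPLOYMENT_BYTES:
        errors.append("model file exceeds distribution budget")

    return errors

Because every check is static, a gate like this can run in milliseconds on every deployment with no human in the loop.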

Checking for Runtime Constraints

It’s also important to check that the new intelligence meets any constraints from the environment where it will execute, including that:
  • The new intelligence doesn’t use too much RAM when loaded into memory in the runtime.
  • The new intelligence meets the runtime performance targets for the execution environment (across a wide range of contexts).
  • The new intelligence will run exactly the same way when users interact with it as it did in the intelligence creation environment (context handling, feature creation, intelligence execution, and so on).
These tests require:
  • Executing intelligence in a test environment that mirrors the environment where users will interact with the intelligence.
  • A facility to load contexts, execute the intelligence on them, measure resource consumption, and compare the results to known correct answers.
  • A set of test contexts that provide good coverage over the situations your users encounter.
These are dynamic tests that can be automated. They will introduce some latency (depending on how many test contexts you use). They don’t require much human oversight (unless there is a problem)—it is easy to determine automatically whether they pass or fail.
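A minimal sketch of such a harness follows. It assumes a load_model callable and a model that exposes predict(context), plus a list of (context, expected_score) pairs captured in the intelligence-creation environment—all illustrative. The budgets are made up, and tracemalloc only measures Python-level allocations, a rough proxy for RAM:

import time
import tracemalloc

LATENCY_BUDGET_MS = 10.0               # illustrative per-execution target
RAM_BUDGET_BYTES = 100 * 1024 * 1024   # illustrative load-time budget
TOLERANCE = 1e-6                       # allowed creation/runtime score drift

def check_runtime_constraints(load_model, test_contexts):
    failures = []

    # Doesn't use too much RAM when loaded into memory.
    tracemalloc.start()
    model = load_model()
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    if peak_bytes > RAM_BUDGET_BYTES:
        failures.append("loading used %d bytes, over RAM budget" % peak_bytes)

    for context, expected in test_contexts:
        # Meets the runtime performance targets for this environment.
        start = time.perf_counter()
        score = model.predict(context)
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        if elapsed_ms > LATENCY_BUDGET_MS:
            failures.append("%.2f ms execution, over latency budget" % elapsed_ms)

        # Runs the same way it did in the creation environment.
        if abs(score - expected) > TOLERANCE:
            failures.append("score drift: %r vs expected %r" % (score, expected))

    return failures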

Checking for Obvious Mistakes

Intelligence creators shouldn’t create intelligences that make obvious mistakes. You can tell them that (and I’ll tell them that later in this book)—but it never hurts to check. Intelligence management should verify that:
  • The new intelligence has “reasonable” accuracy on a validation set (contexts that the intelligence creators never get to see—no cheating).
  • The new intelligence doesn’t make any mistakes on a set of business-critical contexts (that should never be wrong).
  • The new intelligence doesn’t make significantly more costly mistakes than the previous intelligence did.
  • The new intelligence doesn’t focus its new mistakes on any critical sub-population of users or contexts.
If any of these tests fail, the intelligence deployment should be paused for further review by a human.
These are dynamic tests that can be automated. They will introduce some latency (depending on how many test contexts you use). They are somewhat subjective, in that humans may need to consider the meaning of fluctuations in accuracy over time.
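Here is a minimal sketch covering the first three of these checks, assuming labeled (context, label) pairs for the validation set and the business-critical contexts, models that expose predict(context), and illustrative thresholds:

MIN_ACCURACY = 0.90      # "reasonable" accuracy floor—illustrative
MAX_REGRESSION = 0.01    # allowed accuracy drop vs. the old intelligence

def accuracy(model, dataset):
    correct = sum(1 for context, label in dataset
                  if model.predict(context) == label)
    return correct / len(dataset)

def check_for_obvious_mistakes(new_model, old_model,
                               validation_set, critical_contexts):
    problems = []

    # "Reasonable" accuracy on contexts the creators never saw.
    new_accuracy = accuracy(new_model, validation_set)
    if new_accuracy < MIN_ACCURACY:
        problems.append("validation accuracy %.3f below floor" % new_accuracy)

    # No mistakes on business-critical contexts that must never be wrong.
    for context, label in critical_contexts:
        if new_model.predict(context) != label:
            problems.append("missed business-critical context: %r" % (context,))

    # Not significantly worse than the previous intelligence.
    if accuracy(old_model, validation_set) - new_accuracy > MAX_REGRESSION:
        problems.append("accuracy regressed vs. previous intelligence")

    return problems   # any entry should pause deployment for human review

The accuracy floor and allowed regression are judgment calls; a human should review any deployment this gate pauses rather than simply retrying it.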

Lighting Up Intelligence

Once intelligence has passed its offline sanity checks, it can be verified against real users. Ways of doing this include the following:
  • Single Deployment
  • Silent Intelligence
  • Controlled Rollout
  • Flighting
  • Reversion
This section will discuss these as well as some of their pros and cons, so you can decide which is right for your Intelligent System.

Single Deployment

In the simplest case, intelligence can be deployed all at once to all users simultaneously in any of several ways:
  • By bundling the new intelligence into a file, pushing the file to the runtimes on clients, and overwriting the old intelligence.
  • By copying the new intelligence onto the server that is hosting the runtime and restarting the runtime process (sketched after this list).
  • By partitioning the intelligence into the part that runs on the client, the part that runs on the service, and the part that runs on the back-end, and deploying the right pieces to the right places.
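Here is a minimal sketch of the server-side variant, assuming the runtime loads its intelligence from a fixed, illustrative path. Staging to a temporary file and swapping with os.replace keeps the runtime from ever observing a half-written model:

import os
import shutil

DEPLOYED_PATH = "/srv/intelligence/model.bin"   # illustrative path

def deploy_to_all(new_model_path):
    # Stage the new intelligence next to the live file, then swap.
    tmp_path = DEPLOYED_PATH + ".tmp"
    shutil.copyfile(new_model_path, tmp_path)
    os.replace(tmp_path, DEPLOYED_PATH)   # atomic on POSIX filesystems
    # A real system would now signal the runtime to reload, for example
    # by touching a marker file or restarting the serving process.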
Pushing the intelligence all at once is simple to manage and relatively simple to implement. But it isn’t very forgiving. If there is a problem, all of your users will see the problem at once.
For example, imagine you’ve built a smart clothes-washing machine. Put in clothes, shut the door, and this machine washes them—no more messing with dials and settings. Imagine the system is working well, but you decide to improve the intelligence with a single deployment. You push a new intelligence out to tens of thousands of smart washing machines—and then start getting reports that the washing machines are ruining users’ clothes. How is it happening? You aren’t sure. But the problem is affecting all your users and you don’t have a good solution.
Single Deployment can be effective when:
  • You want to keep things simple.
  • You have great offline tests to catch problems.
Single deployment can be problematic when:
  • Your system makes high-cost mistakes.
  • Your ability to identify and correct problems is limited/slow.

Silent Intelligence

Silent intelligence deploys new intelligence in parallel to the existing intelligence and runs both of them for every interaction. The existing intelligence is used to control the intelligent experience (what users see). The silent intelligence does not affect users; its predictions are simply recorded in telemetry so you can examine them and see if the new intelligence is doing a good job or not.
One helpful technique is to examine contexts where the existing intelligence and the silent intelligence make different decisions. These are the places where the new intelligence is either better or worse than the old one. Inspecting a few hundred of these contexts by hand can give a lot of confidence that the new intelligence is safe to switch on (or that it isn’t).
Intelligence can be run in silent mode for any amount of time: a thousand executions, a few hours, days, or weeks; as long as it takes for you to gain confidence in it.
If the new intelligence proves itself during the silent evaluation, it can replace the previous intelligence. But if the new intelligence turns out to be worse, it can be deleted without ever impacting a user—no problem!
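A minimal sketch of the execution path, assuming models that expose predict(context) and whatever telemetry logger your system already has (both illustrative, as is the context_id field):

def execute_with_silent_intelligence(live_model, silent_model,
                                     context, telemetry):
    live_prediction = live_model.predict(context)
    silent_prediction = silent_model.predict(context)

    # Record both predictions; flagging disagreements makes it easy to
    # pull the contexts worth inspecting by hand.
    telemetry.log({
        "context_id": context.id,   # assumes contexts carry an ID
        "live": live_prediction,
        "silent": silent_prediction,
        "disagree": live_prediction != silent_prediction,
    })

    # Only the live intelligence ever affects what users see.
    return live_prediction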
Silent intelligence can be effective when:
  • You want an extra check on the quality of your intelligence.
  • You want to confirm that your intelligence gives the same answers at runtime as it did when you created it.
  • You have a very big or open-ended problem and you want to gain confidence that your intelligence will perform well on new and rare contexts (which may not appear in your intelligence-creation environment).
Silent intelligence can be problematic when:
  • You don’t want the complexity (or resource cost) of running multiple intelligences at the same time.
  • Latency is critical, and you can’t afford to wait to verify your new intelligence in silent mode.
  • It is hard to evaluate the effect of the silent intelligence without exposing it to users—you can see what it would have done, but not the outcome the user would have gotten.

Controlled Rollout

A controlled rollout lights up new intelligence for a fraction of users, while leaving the rest of the users with the old intelligence. It collects telemetry from the new users and uses it to verify that the new intelligence is performing as expected. If the new intelligence is good, it is rolled out to more users; if the new intelligence has problems, it can be reverted without causing too much damage.
This is different from silent intelligence in two important ways:
  1. Telemetry from a controlled rollout includes the effect the intelligence has on user behavior. You can know both what the intelligence did and how users responded.
  2. A controlled rollout runs a single intelligence per client, but runs multiple intelligences across the user base—it uses fewer resources per client, but may be more complex to manage.
Intelligence can be rolled out using various policies to balance latency and safety (a sketch of stable user bucketing follows this list), including:
  • Rolling out to an additional small fraction of your users every few hours as long as telemetry indicates things are going well.
  • Rolling out to a small test group for a few days, then going to everyone as long as no problems were discovered.
  • Rolling out to alpha testers for a while, then to beta testers, then to early adopters, and finally to everyone.
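Whatever the policy, the assignment itself should be deterministic so each user keeps the same experience as the rollout ramps. Here is a minimal sketch, hashing an illustrative user ID into a stable bucket:

import hashlib

def user_bucket(user_id):
    """Map a user ID to a stable value in [0, 1)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2.0**64

def uses_new_intelligence(user_id, rollout_fraction):
    # Raise rollout_fraction (say 0.01 -> 0.05 -> 0.25 -> 1.0) while
    # telemetry stays healthy; set it back to 0.0 to revert everyone.
    return user_bucket(user_id) < rollout_fraction

Hashing the user ID (rather than choosing randomly per session) means each user stays on the same intelligence across sessions, and raising the fraction only adds users to the new population.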
A controlled rollout can be effective when:
  • You want to see how users will respond to a new intelligence while controlling the amount of damage the new intelligence can cause.
  • You are willing to let some of your users experience problems to help you verify intelligence.
A controlled rollout can be problematic when:
  • You don’t want to deal with the complexity of having multiple versions of intelligence deployed simultaneously.
  • You are worried about rare events. For example, a controlled rollout to 1% of users is unlikely to see a problem that affects only 1% of users.

Flighting

Flighting is a special type of controlled rollout that gives different versions of the intelligence to different user populations to answer statistical questions about the intelligences.
Imagine two intelligence creators who come to you and say they have a much better intelligence for your Intelligent System. One of the intelligences is fast but only so-so on the accuracy. The other is very slow, but has much better accuracy.
Which is going to do a better job at achieving your Intelligent System’s objectives? Which will users like more? Which will improve engagement? Which will result in better outcomes?
You could do focus groups. You could let the intelligence creators argue it out. Heck, you could give them battle axes, put them in an arena and let the winner choose which intelligence to ship…
Or you could deploy each version to 1,000 of your customers and track their outcomes over the following month.
  • Does one of the trial populations use the app more than the other?
  • Did one of the trial populations get better outcomes than the other?
  • Does one of the trial populations have higher sentiment for your app than the other?
A flight can help you understand how intelligence interacts with the rest of your Intelligent System to achieve your objectives.
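As a sketch of the statistics involved, here is a two-proportion z-test comparing a success metric (say, the fraction of sessions with a good outcome) across two flights of 1,000 users each; the counts are made up for illustration:

import math

def two_proportion_z(successes_a, n_a, successes_b, n_b):
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1.0 / n_a + 1.0 / n_b))
    return (p_a - p_b) / se

# Fast model vs. slow-but-accurate model, with made-up outcome counts.
z = two_proportion_z(successes_a=620, n_a=1000, successes_b=660, n_b=1000)
if abs(z) > 1.96:   # ~95% confidence, two-sided
    print("flights differ significantly (z = %.2f)" % z)
else:
    print("no significant difference yet (z = %.2f)" % z)

With these counts, even a four-point gap is not yet significant—which is exactly why flights over small differences need large populations or long observation windows.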
Flights can be effective when:
  • You are considering a small number of large changes and you want to know which of them is best.
  • You need to track changes over an extended period so you can make statistically valid statements about how changes affect outcomes, leading indicators, and organizational objectives.
Flights can be problematic when:
  • You need to iterate quickly and make many changes in a short time.
  • The difference between the options you are considering is small (as when one algorithm is a half percent more accurate than another). Flights can take a long time to determine which small change is best.

Turning Off Intelligence

No matter how safe you think you are, sometimes things will go wrong, and you might have to undo an intelligence change—fast!
One way to do this is to redeploy an old intelligence over a new one that is misbehaving.
Another approach is to keep multiple versions of the intelligence near the runtime—the new one and several old ones. If things go wrong, the runtime can load a previous intelligence without any distribution latency (or cost).
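Here is a minimal sketch of this second approach, with illustrative paths and retention count; archive_current() runs before each deployment, and revert() swaps an old version back in with no distribution latency:

import os
import shutil

LIVE_PATH = "/srv/intelligence/model.bin"       # illustrative paths
ARCHIVE_DIR = "/srv/intelligence/previous"
KEEP_VERSIONS = 5                               # illustrative retention

def archive_current():
    """Copy the live intelligence into the archive before overwriting it."""
    os.makedirs(ARCHIVE_DIR, exist_ok=True)
    versions = sorted(os.listdir(ARCHIVE_DIR))
    index = int(versions[-1].split(".")[1]) + 1 if versions else 0
    shutil.copyfile(LIVE_PATH,
                    os.path.join(ARCHIVE_DIR, "model.%04d.bin" % index))
    for old in sorted(os.listdir(ARCHIVE_DIR))[:-KEEP_VERSIONS]:
        os.remove(os.path.join(ARCHIVE_DIR, old))   # prune oldest copies

def revert(steps_back=1):
    """Swap a previous intelligence back in, staging then swapping atomically."""
    versions = sorted(os.listdir(ARCHIVE_DIR))
    target = os.path.join(ARCHIVE_DIR, versions[-steps_back])
    shutil.copyfile(target, LIVE_PATH + ".tmp")
    os.replace(LIVE_PATH + ".tmp", LIVE_PATH)       # atomic on POSIX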
Support for quick reversion can be effective when:
  • You’re human (and thus make mistakes).
  • The cost of deploying intelligence is high.
Support for quick reversions can be problematic when:
  • You’re trying to impress someone and don’t want them to think you’re a wimp.
  • Your intelligence is large, and you don’t have capacity to store multiple copies of it near the runtime.

Summary

Intelligence management takes intelligence from where it is created to where it will impact users. A good management system will make it very easy to deploy intelligence and will make it hard to make mistakes. It must do both of the following:
  • Sanity-check the intelligence; that is, perform basic checks to make sure the intelligence is usable. These include making sure it will run in the runtime, it will be performant enough, and it doesn’t make obvious terrible mistakes.
  • Light up the intelligence, which includes providing controls for intelligence to be presented to users in a measured fashion, to see what the intelligence might do, to see some small percentage of users interact with it—and to revert it quickly if there is a problem.
A successful intelligence-management system will make it easy to deploy intelligence with confidence.
It will help intelligence creators by preventing common mistakes, but also by letting them verify the behavior of their intelligence against real users in a measured fashion.
And a good intelligence-management system will support the operation of the Intelligence Service over its lifetime.

For Thought…

After reading this chapter, you should:
  • Be able to design a system to manage the intelligence in an Intelligent System.
  • Know ways to verify intelligence to ensure that it is compatible, works within constraints, and doesn’t make obvious mistakes.
  • Be prepared with a collection of ways to roll out intelligence changes safely, ensuring that the intelligence is doing what it was intended to do.
You should be able to answer questions like these:
  • Design a system for managing intelligence for an Intelligence Service where the intelligence changes monthly. What tools would you build? What facilities would you create for rolling out the intelligence to users?
  • Now imagine the intelligence needs to change twice per day. What would you do differently?