© Geoff Hulten 2018
Geoff Hulten, Building Intelligent Systems, https://doi.org/10.1007/978-1-4842-3432-7_21

21. Organizing Intelligence

Geoff Hulten
Lynnwood, Washington, USA
In most large-scale systems, intelligence creation is a team activity. Multiple people can work on the intelligence at the same time, building various parts of it, or investigating different problem areas. Multiple people can also work on the intelligence over time, taking over for team members who’ve left, or revisiting an intelligence that used to work but has started having problems. Some examples of ways to organize intelligence include these:
  • Using machine learned intelligence for most things, but using manual intelligence to override mistakes.
  • Using one machine learned intelligence for users from France, and a different one for users from Japan.
  • Using the output of heuristic intelligence as features into a machine learned intelligence.
This chapter discusses ways to organize intelligence, and the process used to create it, so that the result is robust and many people can collaborate effectively.

Reasons to Organize Intelligence

There are many reasons you might want to move from building a single monolithic intelligence (for example, a single machine-learned model) to an organized intelligence:
  • Collaboration: Large-scale intelligence construction is a collaborative activity. You may have 3, or 5, or 15 people working on the intelligence of a single Intelligent System. And if you do, you’ll need to find ways to get them all working together, efficiently, instead of competing to be the owner of the one-intelligence-to-rule-them-all.
  • Cleaning up mistakes: Every intelligence will make mistakes, and correcting one mistake will often make more mistakes crop up in other places. Also, trying to get a machine-learning-based intelligence to stop making a particular mistake isn’t easy—it often requires some experimentation and luck. Combining intelligences can provide quick mistake mitigation to backstop more complex intelligences.
  • Solving the easy part the easy way: Sometimes part of a problem is easy, where a few heuristics can do a great (or perfect) job. In those cases, you could try to trick machine learning into learning a model that does something you already know how to do, or you could partition the problem and let heuristics solve the easy part while machine learning focuses on the harder parts of the problem.
  • Incorporating legacy intelligence: There is a lot of intelligence in the world already. Existing intelligences and intelligence-creation processes can be quite valuable. You might want to incorporate them into an Intelligent System in a way that leverages their strength and helps them grow even more effective.

Properties of a Well-Organized Intelligence

Organizing intelligence can be difficult and problematic. Done wrong, the layers of intelligence come to depend on the idiosyncrasies of each other. Any change to one intelligence causes unintended (and hard to track) changes in other intelligences. You can end up with a situation where you know you need to make some changes, but you simply can’t—just like spaghetti code, you can have spaghetti intelligence.
A well-organized intelligence will be all of the following:
  • Accurate: The organization should not reduce the accuracy potential too much. It should be a good trade-off of short-term cost (in terms of lower immediate accuracy) for long-term gains (in terms of higher accuracy over the lifetime of the Intelligent System).
  • Easy to Grow: It should be easy for anyone to have an insight, create some intelligence, and drop it into the system.
  • Loosely Coupled: The ability of one intelligence to influence the behavior of other intelligences should be minimized. The interfaces between the intelligences should be clear, and the intelligences shouldn’t use information about the inner workings of one another.
  • Comprehensible: For every outcome that users have, the system should be able to pinpoint the intelligence (or intelligences) that were involved in the decision, and the number of intelligences involved in each decision/outcome should be minimized.
  • Measurable: For every part of the intelligence, it should be possible to determine how much that part of the intelligence is benefiting users.
  • Supportive of the Team: The organization strategy should work with the team. The organization should allow intelligence creators’ successes to amplify one another’s work. It should avoid putting goals in conflict or creating the need for participants to compete in unproductive ways.

Ways to Organize Intelligence

This section discusses a number of techniques for organizing intelligence and the process of creating intelligence, and evaluates them against the key properties of a well-organized intelligence. Most large Intelligent Systems will use multiple of these methods simultaneously. And there are many, many options—the techniques here are just a starting point:
  • Decouple feature engineering
  • Multiple model searches
  • Chase mistakes
  • Meta-models
  • Model sequencing
  • Partition contexts
  • Overrides
The following sections will describe these approaches and rank them according to how well they meet the criteria for well-organized intelligence. This is a subjective scale that attempts to highlight relative strengths and weaknesses, as follows:
  • ++ : A real strength
  • + : Better than average
  • Average
  • - : Worse than average
  • - - : A challenge that will need attention
All of these methods are viable, and used in practice. But you should be prepared to mitigate the challenges inherent in the organization strategy you choose.

Decouple Feature Engineering

One approach to organizing intelligence creation is to separate the feature engineering tasks so that each intelligence creator has a clear part of the context to explore and to turn into features. For example, if trying to understand a web page:
  • One intelligence creator can focus on the content of the page, using standard approaches like bag of words and n-grams to convert words into features.
  • Another intelligence creator can focus on understanding the semantics of the text on the page, using part-of-speech tagging, sentiment analysis, and so on.
  • Another could look at the history of the web site, where it is hosted, who created it, and what else they created.
  • Another could explore the images on the web page and try to create features from them.
  • Another could look at the properties of the user.
Each participant needs to be able to inject their feature extraction code into the modeling process. They need to be able to tweak model parameters to take best advantage of the new work. They need to be able to deploy their work.
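As a rough sketch of how this can look in practice (the page format and function names here are hypothetical, not from any particular system), each intelligence creator owns a feature extractor, and the shared pipeline simply concatenates their outputs:

```python
# A minimal sketch of decoupled feature engineering: each creator owns one
# extractor that maps a raw context (here, a web page dict) to named features.

def content_features(page):
    # Owner A: bag-of-words style counts over the page text.
    words = page["text"].lower().split()
    return {f"word={w}": words.count(w) for w in set(words)}

def history_features(page):
    # Owner B: properties of the site hosting the page.
    age = page.get("site_age_days", 0)
    return {"site_age_days": age, "is_new_site": 1 if age < 30 else 0}

# The shared pipeline only knows the list of extractors, not their internals.
EXTRACTORS = [content_features, history_features]

def build_feature_vector(page):
    features = {}
    for extract in EXTRACTORS:
        features.update(extract(page))  # concatenate every owner's features
    return features

print(build_feature_vector({"text": "funny cat video", "site_age_days": 12}))
```

Because each extractor is a separate function with a clear owner, a creator can add, tweak, or remove their features without touching anyone else's code; the coupling happens later, inside the model.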
Some challenges of decoupled feature engineering include these:
  • Conflict on model building: One type of model/set of parameters might work better for one type of feature than for another. Participants will need to balance trade-offs to grow the overall system, and not simply cannibalize value from existing feature sets.
  • Redundant features: Multiple approaches to feature creation could leverage the same underlying information from the context. The resulting features may be very similar to each other. Intelligence creators may have conflict about how to remove the redundancies.
  • Instability in feature value: When a new feature is presented to a machine learning algorithm it will usually change the direction of the model-building search, which can have wild impacts on the value of other features and on the types of mistakes the model makes. Adding a new feature may require some global understanding of the feature set/model and some work on other parts of the feature-creation code to keep everything in balance.
In summary, the approach of decoupling feature engineering is
  • Accurate: Average
    There isn’t much compromise in this approach, and intelligence creators can work in parallel to make gains.
  • Easy to grow: +
    The act of adding a few new features to an existing model is conceptually easy. Not the absolute simplest, but quite good.
  • Loosely coupled: Average
    Features can interact with each other, but as long as you aggressively remove redundancy, the coupling should not be a major problem.
  • Comprehensible: Average
    When trying to debug an interaction there aren’t good tools to pinpoint problematic features, and many model types make it particularly difficult. Sometimes you are left with “try removing features one at a time and retraining to see when the problem goes away.”
  • Measurable:
    It’s easy to measure improvement when the features are initially added. It isn’t so easy to track the contribution of the features over time (for example, as the problem changes).
  • Supportive of the Team: Average
    When there are clear boundaries in the context things can work well, but there are certainly plenty of ways to end up with conflict as to which features should be in and which should be out, particularly if there are any runtime constraints (CPU or RAM).

Multiple Model Searches

Another way to organize intelligence is to allow multiple creators to take a shot at the model-building process. For example, maybe one team member is an expert with linear models, while another is a master of neural networks. These practitioners can both try to create the intelligence, using whatever they are most comfortable with, and the best model wins.
Using multiple model searches can be effective when:
  • You have intelligence creators who are experienced with different approaches.
  • You are early in the process of building your Intelligent System and want to cast a wide net to see what approaches work best.
  • You have a major change in your system (such as a big change in the problem or a big increase in usage) and want to reverify that you have selected the right modeling approach.
But using multiple model searches can result in redundant work and in conflicts, because one approach will eventually win, and the others will lose.
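A minimal sketch of what this looks like in code, assuming scikit-learn is available and using a synthetic dataset as a stand-in for real training data: each creator champions a model family, and the one with the best validation score wins.

```python
# Compare several model families and keep the best one (a sketch, not a
# recommendation of these particular families or parameters).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

candidates = {
    "linear model": LogisticRegression(max_iter=1000),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "neural network": MLPClassifier(max_iter=500, random_state=0),
}

# Five-fold cross-validation gives each candidate a comparable score.
scores = {name: cross_val_score(model, X, y, cv=5).mean()
          for name, model in candidates.items()}
winner = max(scores, key=scores.get)
print(scores, "-> shipping:", winner)
```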
The approach of multiple model searches is
  • Accurate: -
    This approach makes it hard to leverage many intelligence creators over time. It should be used sparingly at critical parts of the intelligence creation process, such as when it is clear a change is needed.
  • Easy to grow: -
    To ship a new intelligence you have to beat an old one. This means that new ideas need to be quite complete, and evaluated extensively before deploying.
  • Loosely coupled: Average
    There is just one intelligence, so there isn’t any particular coupling problem.
  • Comprehensible: Average
    There is just one intelligence, so there isn’t any particular comprehension problem.
  • Measurable: Average
    There is just one intelligence, so there isn’t any particular measurability problem.
  • Supportive of the Team: - -
    This is a bit of a winner-take-all way of working, which means there is a modeling winner and a modeling loser. It also tends to promote wasted work—chasing a modeling idea that never pans out.

Chase Mistakes

Another approach is to treat intelligence problems like software bugs. Bugs can be assigned to intelligence creators, and they can go figure out whatever change they need to make to fix the problem. For example, if you’re having trouble with a sub-population—say children—send someone to figure out what to add to the context, or what features to change, or what modeling to change to do better on children.
Intelligences will always make mistakes, so this approach could go on forever.
And one of the key problems with this approach is figuring out what mistakes are just sort of random mistakes, and which are systematic problems where a change in intelligence creation could help. When using this approach, it is very easy to fall into chasing the wrong problems, making everyone upset, and getting nowhere.
In my opinion, this approach should be used infrequently and only near the beginning of the project (when there are lots of legitimate bugs) or when there is a catastrophic issue.
The approach of chasing mistakes is
  • Accurate: -
    Intelligence sometimes works this way (like with a sub-population problem), but it is easy to get drawn into chasing the wrong mistakes.
  • Easy to grow: -
    Everyone needs to know everything to find and follow mistakes, develop productive changes, and deploy the fix. Also, this approach tends to lead to poor decisions about what problems to tackle.
  • Loosely coupled: Average
    Doesn’t really affect coupling.
  • Comprehensible: Average
    Doesn’t really affect comprehensibility.
  • Measurable: Average
    Doesn’t really affect measurability.
  • Supportive of the Team: -
    This approach does not provide nice boundaries for people to work with. It is also easy to fix one mistake by causing another, and it won’t always be clear that one fix caused the other mistake until much later. Done wrong, this approach can create a miserable work environment.

Meta-Models

The meta-model approach is to treat the predictions of the various intelligences in your system as features of a meta-intelligence. Every base intelligence runs and makes its decision, and then a meta-intelligence looks at all the proposed predictions and decides what the real output should be. Using meta-models can be
  • Very accurate, because it brings together as many approaches as possible and learns which contexts each approach is effective in and which it struggles with.
  • A great way to incorporate legacy intelligence. For example, when you find a new intelligence that is better than your original heuristics, you can throw away your heuristics… or you could use them as a feature in the new intelligence.
  • A good way to get multiple intelligence creators working together. There are no constraints on what they can try. The meta-model will use the information they produce if it is valuable and ignore it if it isn’t.
But meta-models can also be a bit of a nightmare to manage. Some complexities include these:
  • The meta-intelligence and the base intelligences become tightly coupled, and changing any part of it might involve retraining and retuning all of it.
  • If any piece of the system breaks (for example, one model starts behaving poorly) the whole system can break, and it can be very hard to track where and how problems are occurring.
If you want to use meta-models you will need approaches to control the complexity that interdependent models introduce, perhaps by
  • Enforcing some structure about which models can change and when—for example, freezing the legacy intelligence and only changing it if you find severe problems.
  • Building extra machinery to help retrain and retune all the intelligences that make up the system very, very easily.
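As a rough illustration of the meta-model idea (a sketch assuming scikit-learn and a synthetic dataset; the legacy heuristic and its threshold are made up), the base intelligences each contribute one column of "features," and the meta-intelligence is trained on data the base models did not see:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=3000, n_features=20, random_state=0)
X_base, X_meta, y_base, y_meta = train_test_split(X, y, test_size=0.5,
                                                  random_state=0)

# Base intelligences: a learned model plus a hand-written legacy heuristic.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_base, y_base)

def legacy_heuristic(rows):
    # Hypothetical legacy rule: call it positive when feature 0 is large.
    return (rows[:, 0] > 0.5).astype(float)

def base_predictions(rows):
    # Each base intelligence contributes one column the meta-model can use.
    return np.column_stack([forest.predict_proba(rows)[:, 1],
                            legacy_heuristic(rows)])

# The meta-intelligence learns how much to trust each base intelligence,
# trained on data the base models never saw.
meta = LogisticRegression()
meta.fit(base_predictions(X_meta), y_meta)
```

Note the coupling the chapter warns about: if the forest or the heuristic changes, the meta-model's input distribution changes and it needs to be retrained.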
In summary, the approach of using meta-models is
  • Accurate: ++
    Short term, meta-models have the power to be the most accurate of the methods listed here. Their cost shows up in the other areas. For raw accuracy, use meta-models.
  • Easy to grow: -
    To ship a new intelligence you need to retrain the meta-intelligence, which risks instability in outcomes across the board. Careful testing is probably required.
  • Loosely coupled: - -
    Changing any base intelligence usually requires retraining the meta-intelligence. Unintended changes in any of the intelligences (e.g. some change in one of the data sources it depends on) can affect the whole system, to the point of completely breaking it.
  • Comprehensible: - -
    Every intelligence contributes to every decision. When there is a problem it can be extremely difficult (maybe impossible?) to track it down to a source.
  • Measurable: -
    It is easy to measure a new intelligence when it is added to the system. It isn’t so easy to track the contribution of the intelligences over time (for example, as the problem changes).
  • Supportive of the Team: Average
    There can be conflict between intelligences, but they can be created independently. There may also be conflicts for resources when the intelligences need to run in a resource-constrained environment.

Model Sequencing

Model sequencing is a restricted version of the meta-model approach in which the meta-model is constrained to be super-simple. In the sequencing approach, the models are put into order by the intelligence creator. Each model gets a chance to vote on the outcome. And the first model to vote with high confidence wins and gets to decide the answer.
This can be accomplished for classification by setting a default answer—if no one votes, the answer is “male”—and allowing each model to run with a high-precision operating point for the “female” answer. If any model is very sure it can give a high-precision “female” answer, then it does; if none of the models are certain, the default “male” is the return value.
Model sequencing has less accuracy potential than meta-models, which can combine all the votes simultaneously, but it is much easier to orchestrate and control.
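A minimal sketch of the sequencing logic, assuming each model exposes a hypothetical score(context) method that returns its confidence in the positive answer:

```python
# Model sequencing: models run in a creator-chosen order, and the first one
# confident enough to clear its high-precision threshold decides the answer.

def sequence_decision(context, sequence, positive="female", default="male"):
    for model, threshold in sequence:
        if model.score(context) >= threshold:  # high-confidence vote wins
            return positive
    return default  # no model was confident enough

# Hypothetical usage: a heuristic model runs first, then a learned model.
# sequence = [(heuristic_model, 0.95), (learned_model, 0.90)]
# answer = sequence_decision(user_context, sequence)
```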
The approach of model sequencing is
  • Accurate: Average
    This approach trades off some potential accuracy for ease of management and growth.
  • Easy to grow: +
    An intelligence creator can put a new model in the sequence (as long as it has high enough precision) without affecting any other part of the system.
  • Loosely coupled: ++
    Models are completely uncoupled and are combined by a simple procedure that everyone can understand.
  • Comprehensible: ++
    Every interaction can be traced to the piece of intelligence that decided the outcome, and each piece of intelligence can have a clear owner.
  • Measurable: Average
    It is easy to measure how many positive and negative interactions each intelligence gives to users. The downside is that the first confident answer is taken, so other intelligences might not get all the credit (or blame) they deserve.
  • Supportive of the Team: +
    Anyone can easily add value. Potential conflict points include what order to put the models in and what precision threshold to demand. But telemetry should provide good data to use to make these decisions empirically, so they shouldn't make people argue—much.

Partition Contexts

Partitioning by contexts is another simple way to organize multiple intelligences. It works by defining some simple rules on the contexts that split them into partitions and then having one intelligence (or model sequence, meta-model, and so on) for each of the partitions. For example:
  • One intelligence for servers in the US, one for all others.
  • One intelligence for small web sites, one for large web sites, and one for web sites that aggregate content.
  • One intelligence for grey-scale images, one for color images.
  • One intelligence for new users, one for users with a lot of history.
This approach has advantages by allowing you to use different types of intelligence on different parts of the problem, solve easy cases the easy way, and also control the incidence of mistakes made on various partitions. Of course, machine-learning algorithms can technically use this type of information internally—and probably pick better partitions with respect to raw accuracy—but manually partitioning can be very convenient for orchestration and organization.
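A minimal sketch of partitioning, with made-up partition rules and stand-in models: simple rules on the context choose the partition, and each partition has its own intelligence with its own owner.

```python
# Stand-ins for intelligences tuned to each partition; in a real system
# these would be learned models, heuristics, model sequences, and so on.
def new_user_heuristic(context):
    return 0.5  # cold-start default

def us_model(context):
    return 0.7  # placeholder for a model trained on US traffic

def global_model(context):
    return 0.6  # placeholder for a model trained on everything else

def pick_partition(context):
    # Simple, human-readable rules split the contexts into partitions.
    if context["user_history_days"] < 7:
        return "new_users"
    return "us_users" if context["country"] == "US" else "everyone_else"

PARTITION_MODELS = {"new_users": new_user_heuristic,
                    "us_users": us_model,
                    "everyone_else": global_model}

def predict(context):
    # Every prediction is traceable to exactly one partition and its owner.
    return PARTITION_MODELS[pick_partition(context)](context)

print(predict({"user_history_days": 42, "country": "US"}))
```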
The approach of partitioning is
  • Accurate: Average
    This approach turns one problem into several problems. This doesn’t have to affect accuracy, but it might, particularly as innovations in one area might not get ported to all the other areas where they could help.
  • Easy to grow: ++
    An intelligence creator can define a specific partition and give intelligence that is tuned for it without affecting other partitions.
  • Loosely coupled: +
    Models are completely uncoupled and are combined by an understandable procedure (unless the partitioning gets out of hand).
  • Comprehensible: +
    Every interaction can be traced to the piece of intelligence that decided the outcome, and each piece of intelligence can have a clear owner.
  • Measurable: +
    It is easy to measure how many positive and negative interactions each intelligence gives to users.
  • Supportive of the Team: +
    Anyone can easily add value. Also, when one team member takes on a problem (by partitioning away something that was causing trouble for other models) it can be perceived as a favor: “I’m glad you took on those mistakes so my model can focus on adding this other value…”

Overrides

Overriding is an incredibly important concept for dealing with mistakes. The override structure for organizing intelligence works by having one blessed intelligence (usually created and maintained by humans) that can override all the other intelligences (usually created by machine learning)—no matter what they say.
One way this can be used is by hand-labeling specific contexts with specific outcomes. This can be used to spot-correct specific damaging mistakes. For example:
  • This web page is not funny, no matter what any intelligence thinks.
  • When all the toaster sensors read a specific combination, toast for 2 minutes (because we know exactly what product that is), no matter what the intelligence thinks.
Another way this can be used is by creating rules that can serve as guardrails, protecting from things that are obviously wrong. For example:
  • If we’ve released 10 pellets into the pellet griller in the past 5 minutes, don’t release any more, no matter what the intelligence says.
  • If the web page has any of these 15 offensive words/phrases it is not funny, no matter what the intelligence says.
Override intelligence should be used extremely sparingly. It should not be trying to solve the problem; it should just be covering up for the worst mistakes.
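A minimal sketch of an override layer (the labels, URLs, and phrase list are made up): a small hand-maintained table of known contexts and a few guardrail rules both win over whatever the learned intelligence says.

```python
# Hand-labeled contexts: blessed answers for specific, known mistakes.
HAND_LABELED = {"http://example.com/not-funny-page": "not_funny"}

# Guardrail rules: protect against answers that are obviously wrong.
OFFENSIVE_PHRASES = {"offensive phrase one", "offensive phrase two"}

def guardrails(page):
    text = page["text"].lower()
    if any(phrase in text for phrase in OFFENSIVE_PHRASES):
        return "not_funny"
    return None  # no guardrail fired

def classify(page, learned_model):
    if page["url"] in HAND_LABELED:   # override wins, no matter what
        return HAND_LABELED[page["url"]]
    blocked = guardrails(page)
    if blocked is not None:
        return blocked
    return learned_model(page)        # otherwise trust the learned model
```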
The approach of using overrides is
  • Accurate: Average
    Accuracy holds up as long as the overrides are used sparingly. They should be a backstop and not a complicated hand-crafted intelligence.
  • Easy to grow: ++
    An intelligence creator (or an untrained person with some common sense) can define a context and specify an outcome. Tooling can help.
  • Loosely coupled: Average
    The overrides are somewhat coupled to the mistakes they are correcting and might end up living in the system for longer than they are really needed. Over time they might turn into a bit of a maintenance problem if they aren’t managed.
  • Comprehensible: +
    Every interaction can be traced to any overrides that affected it. Intelligence creators might forget to check all the overrides when evaluating their new intelligence, though, so it can lead to a bit of confusion.
  • Measurable: +
    It is easy to measure how many positive and negative interactions each override saved/gave to users.
  • Supportive of the Team: +
    As long as overrides are used sparingly, they provide a simple way to make intelligence creators more productive. There is potential conflict between the intelligence creators and the people producing the overrides.

Summary

Most large Intelligent Systems do not have monolithic intelligences; they have organized collections of intelligence. Organizing intelligence allows multiple intelligence creators to collaborate effectively, to clean up mistakes cheaply, to use the right types of intelligence to target the right parts of the problem, and to incorporate legacy intelligence.
A well-organized intelligence will be: accurate, easy to grow, loosely coupled, comprehensible, measurable, and supportive of the team. Of course, accuracy comes first, and sometimes organization needs to be sacrificed for accuracy.
There are many, many ways to organize intelligence. This chapter presented some of the prominent ones, but others are possible. Important organization techniques include: decoupling feature engineering; doing multiple model searches; chasing mistakes; meta-models; model sequencing; partitioning contexts; and overrides. Figure 21-1 shows the summary table.
Figure 21-1. Comparing approaches to intelligence organization

For Thought…

After reading this chapter, you should:
  • Understand what it takes to work on a large, complex Intelligent System, or with a team of intelligence creators.
  • Be able to implement an intelligence architecture that allows the right intelligence to attack the right parts of your problem, and all participants to work together efficiently.
You should be able to answer questions like these:
  • Describe an Intelligent System that does not need to have any intelligence organization—that is, it works with just a single model.
  • What are some of the ways this might cause problems? What problems are most likely to occur?
  • Design a simple intelligence organization plan that addresses the most likely problem.