©  Geoff Hulten 2018
Geoff Hulten, Building Intelligent Systems, https://doi.org/10.1007/978-1-4842-3432-7_4

4. Defining the Intelligent System’s Goals

Geoff Hulten
Lynnwood, Washington, USA
An Intelligent System connects intelligence with experience to achieve a desired outcome. Success comes when all of these elements are aligned: the outcome is achievable; the intelligence is targeted at the right problem; and the experience encourages the correct user behavior.
A good success criterion connects these elements. It expresses the desired outcome in plain language. It indicates what sub-problems the intelligence and experience need to solve, and it ties those solutions to the desired larger scale (organizational) outcome. Implicit in good success criteria is a framework that allows all participants to see how their work contributes to the overall goal. This helps prevent them from heading in the wrong direction, even when data and their experiences are conspiring to mislead.
This chapter discusses setting goals for Intelligent Systems, including:
  • What makes good goals.
  • Why finding good goals is hard.
  • The various types of goals a system can have.
  • Some ways to measure goals.

Criteria for a Good Goal

A successful goal will do all of the following:
  1. Clearly communicate the desired outcome to all participants. Everyone should be able to understand what success looks like and why it is important, no matter what their background or technical experience.
  2. Be achievable. Everyone on the team should believe they are set up to succeed. The goal can be difficult, but team members should be able to explain roughly how they are going to approach success and why there is a good chance it will work.
  3. Be measurable. Intelligent Systems are about optimizing, and so Intelligent Systems are about measuring. If you can't measure something, you aren't going to be able to optimize it.
It isn’t easy to know if a goal is correct. In fact, bridging the gap between high-level objectives and detailed properties of the implementation is often the key challenge to creating a successful Intelligent System. Some goals will seem perfect to some participants but make no sense to others. Some will clearly align with positive impact but be impossible to measure or achieve. There will always be trade-offs, and it is common to spend a great deal of time refining the definition of success.
But that’s OK. It’s absolutely worth the effort because failing to define success before starting a project is the easiest, most certain way to waste time and money.

An Example of Why Choosing Goals Is Hard

Consider an anti-phishing feature backed by an Intelligent System.
One form of phishing involves web sites that look like legitimate banking sites but are actually fake sites, controlled by abusers. Users are lured to these phishing sites and tricked into giving their banking passwords to criminals. Not good.
So what should an Intelligent System do?
Talk to a machine-learning person and it won’t take long to get them excited. They’ll quickly see how to build a model that examines web pages and predicts whether they are phishing sites or not. These models will consider things like the text and images on the web pages to make their predictions. If the model thinks a page is a phish, block it. If a page is blocked, a user won’t browse to it, won’t type their banking password into it. No more problem. Easy. Everyone knows what to do.
So the number of blocks seems like a great thing to measure—block more sites and the system is doing a better job.
Or is it?
What if the system is so effective that phishers quit? Every single phisher in the world gives up and finds something better to do with their time?
Perfect!
But then there wouldn’t be any more phishing sites and the number of blocks would drop to zero. The system has achieved total success, but the metric indicates total failure.
Not great.
Or what if the system blocks one million phishing sites per day, every day, but the phishers just don’t care? Every time the system blocks a site, the phishers simply make another site. The Intelligent System is blocking millions of things, everyone on the team is happy, and everyone feels like they are helping people—but the same number of users are losing their credentials to abusers after the system was built as were losing their credentials before it was built.
Not great.
One pitfall with defining success in an Intelligent System is that there are so many things that can be measured and optimized. It’s very easy to find something that is familiar to work with, choose it as an objective, and get distracted from true success.
Recall the three properties of a good success criterion:
  1. Communicate the desired outcome
  2. Be achievable
  3. Be measurable
Using the number of blocked phishing pages as a success metric hits #2 and #3 out of the park, but fails on #1.
The desired outcome of this system isn’t to block phishing sites—it is to stop abusers from getting users’ banking passwords.

Types of Goals

There are many types of things a system can try to optimize, ranging from very concrete to very abstract.
A system’s true objective tends to be very abstract (like making money next quarter), but the things it can directly affect tend to be very concrete (like deciding whether a toaster should run for 45 or 55 seconds). Finding a clear connection between the abstract and concrete is a key source of tension in setting effective goals. And it is really, really hard.
One reason it is hard is that different participants will care about different types of goals. For example:
  • Some participants will care about making money and attracting and engaging customers.
  • Some participants will care about helping users achieve what they are trying to do.
  • Some participants will care that the intelligence of the system is accurate.
These are all important goals, and they are related, but the connection between them is indirect. For example, you won’t make much money if the system is always doing the wrong thing; but making the intelligence 1% better will not translate into 1% more profit.
This section discusses different ways to consider the success of an Intelligent System, including:
  • Organizational objectives
  • Leading indicators
  • User outcomes
  • Model properties
Most Intelligent Systems use several of these on a regular basis but focus primarily on user outcomes and model properties for day-to-day optimization.

Organizational Objectives

Organizational objectives are the real reason for the Intelligent System. In a business these might be things like revenue, profit, or number of units sold. In a nonprofit organization these might be trees saved, lives improved, or other benefits to society.
Organizational objectives are clearly important to optimize. But they are problematic as direct objectives for Intelligent Systems for at least three reasons:
  1. They are very distant from what the technology can affect. For example, a person working on an Internet toaster can change the amount of time a cold piece of bread is toasted—how does that relate to number of units sold?
  2. They are affected by many things out of the system's control. For example: market conditions, marketing strategies, competitive forces, changes to user behavior over time, and so on.
  3. They are very slow indicators. It may take weeks or months to know if any particular action has impacted an organizational objective. This makes them difficult to optimize directly.
Every Intelligent System should contribute to an organizational objective, but the day-to-day orchestration of an Intelligent System will usually focus on more direct measures—like the ones we'll discuss in the next few sections (particularly user outcomes and model properties).

Leading Indicators

Leading indicators are measures that correlate with future success. For example:
  • You are more likely to make a profit when your customers like your product than when they hate it.
  • You are more likely to grow your customer base when your customers are recommending your product to their friends than when your customers are telling their friends to stay away.
  • You are more likely to retain customers when they use your product every day than when they use your product once every couple of months.
Leading indicators are a way to bridge between organizational objectives and the more concrete properties of an Intelligent System (like user outcomes and model properties). If an Intelligent System gets better, customers will probably like it more. That may lead to more sales or it might not, because other factors—like competitors, marketing activities, trends, and so on—can affect sales. Leading indicators factor some of these external forces out and can help you get quicker feedback as you change your Intelligent System.
There are two main types of leading indicators: customer sentiment and customer engagement.
Customer sentiment is a measure of how your customers feel about your product. Do they like using it? Does it make them happy? Would they recommend it to a friend (or would they rather recommend it to an enemy)?
If everyone who uses your product loves it, it is a sign that you are on the right track. Keep going, keep expanding your user base, and eventually you will have business success (make revenue, sell a lot of units, and so on).
On the other hand, if everyone who uses your product hates it you might be in for some trouble. You might have some customers, they might use your product, but they aren’t happy with it. They are looking for a way out. If you get a strong competitor your customers are ready to jump ship.
Sentiment is a fuzzy measure, because users’ feelings can be fickle. It can also be very hard to measure sentiment accurately—users don’t always want to tell you exactly what you ask them to tell you. Still, swings in sentiment can be useful indicators of future business outcomes, and Intelligent Systems can certainly affect the sentiment of users who encounter them.
Customer engagement is a measure of how much your customers use your product. This could mean the frequency of usage. It could also mean the depth of usage, as in using all the various features your product has to offer.
Customers with high engagement are demonstrating that they find value in your product. They’ve made a habit of your product, and they come back again and again. They will be valuable to you and your business over time.
Customers with low engagement use the product infrequently. These customers may be getting value from your offering, but they have other things on their minds. They might drift away and never think about you or your product again.
Leading indicators have some disadvantages as goals for Intelligent Systems, similar to those that organizational outcomes suffer from:
  • They are indirect.
  • They are affected by factors out of control of the Intelligent System.
  • They aren’t good at detecting small changes.
  • They provide slow feedback, so they are difficult to optimize directly.
  • And they are often harder to measure than organizational objectives (how many surveys do you like to answer?).
Still, leading indicators can be useful, particularly as early indicators of problems—no matter what you think your Intelligent System should be doing, if customers have much worse sentiment after an update than they had before the update, you are probably doing something wrong.

User Outcomes

Another approach for setting goals for Intelligent Systems is to look at the outcomes your users are getting. For example:
  • If your system is about helping users find information, are they finding useful information efficiently?
  • If your system is about helping users make better decisions, are they making better decisions?
  • If your system is about helping users find content they will enjoy, are they finding content that they end up liking?
  • If your system is about optimizing the settings on a computer, are the computers it is optimizing faster?
  • And if your system is about helping users avoid scams, are they avoiding scams?
Intelligent Systems can set goals around questions and decisions like these and try to optimize the outcomes users get.
This is particularly useful because outcomes rely on a combination of the intelligence and the experience of the Intelligent System. In order for a user to get a good outcome, the intelligence must be correct, and the experience must help the user benefit.
For example, in the anti-phishing example, imagine intelligence that is 100% accurate at identifying scams. If the experience blocks scam pages based on this intelligence, users will get good outcomes. But what if the experience is more subtle? Maybe it puts a little warning on the user’s browser when they visit a scam site—a little red X in the address bar. Some users won’t notice the warning. Some won’t interpret it correctly. Some will ignore it. In this case some users will get bad outcomes (and give their passwords to scammers) even though the intelligence had correctly identified the scam.
User outcomes can make very good targets for Intelligent Systems because they measure how well the intelligence and the experience work together to influence user behavior.

Model Properties

Within every Intelligent System there are concrete, direct things to optimize, for example:
  • The error rate of the model that identifies scams.
  • The probability a user will have to re-toast their bread.
  • The fraction of times a user will accept the first recommendation of what content to use.
  • The click-through rate of the ads the system decides to show.
These types of properties don’t always line up exactly with user outcomes, leading indicators, or organizational objectives, but they do make very good goals for the people who are working to improve Intelligent Systems.
For example, a model that is right 85% of the time (on test data in the lab) is clearly better than one that is right 75% of the time. Clear and concrete. Easy to get fast feedback. Easy to make progress.
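As a minimal sketch of what measuring a model property looks like, the comparison above boils down to computing accuracy on held-out test data. The function below is illustrative only; the names are my assumptions, not something from this book:

```python
def accuracy(predictions, labels):
    """Fraction of held-out test examples the model got right."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must align")
    correct = sum(p == l for p, l in zip(predictions, labels))
    return correct / len(labels)

# Comparing two models on the same held-out test set makes the better
# one obvious: 0.85 accuracy clearly beats 0.75. What accuracy doesn't
# tell you is how either number translates into user outcomes.
```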
But model properties have some disadvantages as goals for Intelligent Systems:
  1. They are not connected to actual user reality. For example, if the Internet toaster always gets within 10 seconds of the optimal toast time, will users like it? Would 5 seconds of error be better? Sure, obviously, of course. But how much better? Will that error reduction make tastier toast? What should the goal be? If we could get to a model with 4 seconds of error is that enough? Should we stop or press on for 3 seconds of error? How much investment is each additional second worth?
  2. They don't leverage the full system. A model might make a mistake, but the mistake will be perceived by users in the context of a full system. Maybe the user experience makes the mistake seem so minor that no one cares. Or maybe there is a really good way for the user to give feedback, which quickly corrects the mistake—way more cheaply than investing in optimizing the last few points of the model's performance.
  3. They are too familiar to machine-learning people. It is easy to build Intelligent Systems to optimize model properties—it is precisely what machine-learning people spend their lives doing, so it will naturally come up in any conversation about objectives. Be careful with them. They are so powerful and familiar that they may stifle and hijack the system's actual objective.
Optimizing model properties is what intelligence is about, but it is seldom the goal. A good goal will show how improving model properties contributes to having the desired impact on users and the business. A good goal will give guidance on how much model property optimization is worth.

Layering Goals

Success in an Intelligent System project is hard to define with a single metric, and the metrics that define it are often hard to measure. One good practice is to define success on different levels of abstraction and have some story about how success at one layer contributes to the others. This doesn’t have to be a precise technical endeavor, like a mathematical equation, but it should be an honest attempt at telling a story that all participants can get behind.
For example, participants in an Intelligent System might:
  • On an hourly or daily basis optimize model properties.
  • On a weekly basis review the user outcomes and make sure changes in model properties are affecting user outcomes as expected.
  • On a monthly basis review the leading indicators and make sure nothing has gone off the rails.
  • On a quarterly basis look at the organizational objectives and make sure the Intelligent System is moving in the right direction to affect them.
Revisit the goals of the Intelligent System often during the course of the project.
Because things change.

Ways to Measure Goals

One reason defining success is so hard is that measuring success is harder still.
How the heck are we supposed to know how many passwords abusers got with their phishing pages?
When we discuss intelligent experiences in Part II of this book we will discuss ways to design intelligent experiences to help measure goals and get data to make the intelligence better. This section introduces some basic approaches. Using techniques like these should allow more flexibility in defining success.

Waiting for More Information

Sometimes it is impossible to tell if an action is right or wrong at the time it happens, but a few hours or days or weeks later it becomes much easier. As time passes you’ll usually have more information to interpret the interaction. Here are some examples of how waiting might help:
  • The system recommends content to the user, and the user consumes it completely—by waiting to see if the user consumes the content, you can get some evidence of whether the recommendation was good or bad.
  • The system allows a user to type their password into a web page—by waiting to see if the user logs in from eastern Europe and tries to get all their friends to install malware, you can get some evidence if the password was stolen or not.
Waiting can be a very cheap and effective way to make a success criterion easier to measure, particularly when the user’s behavior implicitly indicates success or failure.
There are a couple of downsides.
First, waiting adds latency. This means that waiting might not help with optimizing, or making fine-grained measurements.
Second, waiting adds uncertainty. There are lots of reasons a user might change their behavior. Waiting gives more time for other factors to affect the measurement.
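The waiting approach can be sketched in a few lines. Everything here (the function names, the seven-day window, the consumption threshold) is an illustrative assumption, not something specified by the systems described above:

```python
from datetime import datetime, timedelta

# Illustrative assumptions: how long to wait, and how much of the content
# a user must consume before we call the recommendation a success.
WAIT_WINDOW = timedelta(days=7)
CONSUMED_THRESHOLD = 0.8

def label_recommendation(recommended_at, consumption_events, now):
    """Label one recommendation 'success', 'failure', or 'pending'.

    consumption_events: list of (timestamp, fraction_consumed) pairs
    observed in telemetry after the recommendation was shown.
    """
    window_end = recommended_at + WAIT_WINDOW
    if now < window_end:
        # Waiting adds latency: we can't label this interaction yet.
        return "pending"
    fractions = [f for (t, f) in consumption_events
                 if recommended_at <= t <= window_end]
    if fractions and max(fractions) >= CONSUMED_THRESHOLD:
        return "success"
    return "failure"
```

The "pending" branch is the latency cost in code form: until the window closes, the interaction simply cannot be scored.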

A/B Testing

Showing different versions of the feature/intelligence to different users can be a very powerful way to quantify the effect of the feature.
Imagine giving half the users an intelligent experience and the other half a stubbed-out (simple, default) experience. Maybe half the users of the Internet toaster get a toast time of one minute no matter what settings they use or what they put into the toaster. The other half get a toast time that is determined by all the fanciest intelligence you can find.
If users who got the stubbed experience are just as happy/engaged/effective at toasting as the ones who got the full experience—you’ve got a problem.
A/B testing can be difficult to manage, because it involves maintaining multiple versions of the product simultaneously.
It can also have trouble distinguishing small effects. Imagine testing two versions of the intelligent toaster, one that toasts 1 minute no matter what, and one that toasts 61 seconds no matter what. Is one of them better than the other? Maybe, but it will probably take a long time (and a lot of observations of user interactions) to figure out which.
A/B testing is a great way to make sure that large changes to your system are positive, but it is troublesome for day-to-day optimization.
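The toaster comparison can be made concrete with a two-proportion z-test, one common way to ask whether the difference between two groups is bigger than chance. This is a hedged sketch (the metric and the sample numbers are assumptions, and a real experiment would also need a power calculation before launch):

```python
import math

def two_proportion_z(success_a, total_a, success_b, total_b):
    """z statistic for the difference between two observed proportions.

    Larger |z| means stronger evidence the groups really differ;
    roughly |z| > 1.96 corresponds to the usual 5% significance level.
    """
    p_a = success_a / total_a
    p_b = success_b / total_b
    p_pool = (success_a + success_b) / (total_a + total_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / total_a + 1 / total_b))
    return (p_a - p_b) / se
```

With 1,000 users per arm, a 12% vs 10% happy-toast rate gives z near 1.4 (suggestive, not conclusive), while 10.1% vs 10.0% gives z near zero; distinguishing effects that small takes far more observations, which is exactly the book's point about tiny variants.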

Hand Labeling

Sometimes a computer can’t tell if an action was aligned with success, but a human can. You can hire some humans to periodically examine a small number of events/interactions and tell you if they were successful or not. In many cases, this hand labeling is easy to do, and doesn’t require any particular skill or training.
In order to hand-label interactions, the Intelligent System needs to have enough telemetry to capture and replay interactions. This telemetry must contain enough detail so a human can reliably tell what happened and whether the outcome was good or not (while preserving user privacy). This isn't always possible, particularly when it involves having to guess what the user was trying to do, or how they were feeling while they were doing it.
But it never hurts to look at what your system is doing and how it is affecting users. The scale can be small. It can be cheap. But it can also be very useful.
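The mechanics of small-scale hand labeling can be sketched simply, assuming events are already captured in telemetry. The function names and the 1.96 normal-approximation interval are my assumptions:

```python
import math
import random

def sample_for_labeling(events, n, seed=0):
    """Draw a small, reproducible random sample of events for humans to label."""
    rng = random.Random(seed)
    return rng.sample(events, min(n, len(events)))

def estimate_success_rate(labels, z=1.96):
    """Estimate overall success rate from human judgments.

    labels: list of True/False verdicts from the labelers.
    Returns (rate, margin), a normal-approximation 95% interval.
    """
    n = len(labels)
    rate = sum(labels) / n
    margin = z * math.sqrt(rate * (1 - rate) / n)
    return rate, margin
```

Even 100 labels pins the success rate down to within about eight percentage points, which is often enough to notice that something is badly wrong.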

Asking Users

Perhaps the most direct way to figure out if something is succeeding or not is to ask the user. For example, by building feedback mechanisms right into the product:
  • The user is shown several pieces of content, selects one. The system pops up a dialog box asking if the user was happy with the choices.
  • A self-driving car takes a user to their destination, and as the user is getting out it asks if the user felt safe and comfortable during the trip.
  • The toaster periodically asks, “is that toasted just the way you like it?” (And yeah, that could be pretty darn creepy, especially if it does it when you are home alone after dark and the lights aren’t on.)
A couple of things to keep in mind:
Users don’t always have the answer. For example, asking someone “did you just give your password to a criminal?” might not be very effective—and it might scare a lot of people for no good reason.
Users might not always feel like giving an answer to all of your questions. This will introduce bias. For example, users who are very engaged in their task might not pause to consider a survey, even though they are getting a good outcome.
Users will get sick of being asked questions. This type of feedback should be used sparingly.
But sometimes, asking just 0.1% of users a simple question once per month can unlock a lot of potential in helping you know if your Intelligent System is succeeding.
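A sketch of how such sparing sampling might work: hashing the user id together with the month is one common trick for picking a stable slice of users without pestering the same people every day. Every name and rate here is an illustrative assumption:

```python
import hashlib

def should_survey(user_id, month, rate=0.001):
    """Deterministically select roughly `rate` of users for a survey.

    Hashing user_id plus month means a user's selection is stable within
    a month (the prompt won't flicker) but re-rolls the next month, so
    different users are asked over time.
    """
    digest = hashlib.sha256(f"{user_id}:{month}".encode()).hexdigest()
    bucket = int(digest, 16) % 1_000_000
    return bucket < rate * 1_000_000
```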

Decoupling Goals

Some things are hard to measure directly but can be broken down into simpler pieces, and the pieces can be measured (perhaps using some of the techniques from this section) and then stitched together into an estimate of the whole.
For example, consider the phishing example. The number of credentials lost to phishers can be decoupled into the following:
  • The number of your users who visit phish sites (which is estimated by waiting for user reports to identify phishing sites and combining with traffic telemetry to estimate the number of visits).
  • The percent of users who type their passwords into phish sites when the system doesn’t provide a warning (which is estimated by doing user studies).
  • The percent of users who type their passwords into phish sites when the system does provide a warning (which is estimated using telemetry on how many users dismiss the warning and proceed to type into the password box on known phishing sites).
Multiply these together and you have an estimate of a pretty good potential goal for the Intelligent System.
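The stitching-together step can be sketched directly. Splitting visits by how often a warning is actually shown is an assumption I've added on top of the three bullets above; all parameter names and numbers are illustrative:

```python
def estimate_credentials_lost(phish_visits,
                              warning_coverage,
                              p_type_without_warning,
                              p_type_despite_warning):
    """Estimated credentials lost, stitched from separately measured pieces.

    phish_visits: estimated visits to phishing sites (from user reports
        plus traffic telemetry).
    warning_coverage: fraction of those visits where a warning was shown.
    p_type_without_warning: fraction who type their password with no
        warning (from user studies).
    p_type_despite_warning: fraction who dismiss the warning and type
        anyway (from telemetry on known phishing sites).
    """
    unwarned = phish_visits * (1 - warning_coverage) * p_type_without_warning
    warned = phish_visits * warning_coverage * p_type_despite_warning
    return unwarned + warned
```

With illustrative inputs of 10,000 visits, 90% warning coverage, a 30% unwarned typing rate, and a 5% warned typing rate, the estimate is 750 credentials lost; improving any one measurable piece visibly moves the overall goal.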
Decoupling is particularly useful when it identifies critical, measurable sub-problems and shows how they chain together with other clear sub-problems to reach overall success.

Keeping Goals Healthy

Sometimes a goal should change. But changing goals is difficult, particularly when people have organized around them. There might be a lot of inertia, many opinions, and no good answers. Having a process for reviewing goals and adapting them is a very good idea. Goals might change if any of the following happen:
  • A new data source comes on-line and shows some assumption was wrong.
  • A part of the system is functioning very well, and goals around further improvement to it should be changed (before they take on a life of their own, resulting in investing in the wrong things).
  • Someone comes up with a better idea about how to connect the Intelligent System’s work to actual customer impact.
  • The world changes and the previous goal no longer reflects success.
And even if the goals don’t have to change, it’s good to get everyone together once in a while and remind yourselves what you are all trying to accomplish together.

Summary

Having goals is crucial to success in an Intelligent System. But goals are hard to get right. Effective goals should:
  1. Communicate the desired outcome
  2. Be achievable
  3. Be measurable
Goals can be very abstract (like organizational objectives). They can be less abstract (like leading indicators). They can be sort of concrete (like user outcomes). Or they can be super concrete (like model properties).
An effective set of goals will usually tie these various types of goals together into a story that clearly heads toward success.
Most Intelligent Systems will contribute to organizational objectives and leading indicators, but the core work of day-to-day improvement will be focused on user outcomes and model properties.
Goals can be measured through telemetry, through waiting for outcomes to become clear, by using human judgment, and by asking users about their experiences.
And did I mention—goals are hard to get right. They will probably take iteration.
But without effective goals, an Intelligent System is almost certainly doomed to waste time and money—to fail.

For Thought

After reading this chapter, you should:
  • Understand the ways you can define success for an Intelligent System, and how to measure whether success is being achieved.
  • Be able to define success on several levels of abstraction and tell the story of how the different types of success contribute to each other.
You should also be able to answer questions like these:
Consider your favorite hobby that might benefit from an Intelligent System.
  • What organizational objective would the Intelligent System contribute to for its developers?
  • What leading indicators would make the most sense for it?
  • What are the specific user outcomes that the Intelligent System would be tracked on?
  • Which way would you measure these? Why?