©  Geoff Hulten 2018
Geoff Hulten, Building Intelligent Systems, https://doi.org/10.1007/978-1-4842-3432-7_9

9. Getting Data from Experience

Geoff Hulten
Lynnwood, Washington, USA
Intelligence creators can work with all sorts of data, even crummy data. But when they have good data, their job is much easier—and the potential of the intelligence they can create is much greater. An ideal intelligent experience will control the interactions between users and the Intelligent System so that the record of those interactions makes it easy to create high-quality intelligence.
Before exploring ways you can craft intelligent experiences to get data from your users, let’s consider the alternative: you could gather data yourself. Traditionally, machine learning systems have a data-collection phase. That is, if you want to build an intelligence for counting the number of cows in a picture, you’d go out and take a lot of pictures of cows. You’d travel from farm to farm, taking pictures of cows in different situations. You’d pose cows so you get pictures of their good sides, and their bad sides; their faces and their… other parts. You’d take pictures of cows behind bushes. You’d take pictures of cows lying down, running, grazing, sleeping, yawning, on sunny days, on cloudy days, at night… A traditional computer vision system might require tens of thousands of different pictures of cows. And it would get better with more pictures—hundreds of thousands or millions might help. And once you had all that data you’d need to label it. That is, you’d pay lots of people to look at all those cow pictures and draw circles around all the cows—fun!
This can get very expensive. Data collection and labeling can be the primary cost in traditional machine-learning systems. Because of this, there is a lot of value in designing experiences that leverage users, and their interactions with the Intelligent System, to produce data with the correct properties to create intelligence. Getting this right makes it possible to use intelligence and machine learning in many situations where it would have been prohibitively expensive using manual data collection.
This chapter begins with an example of what it means to collect data from experience and how various approaches can affect the quality of the resulting data. We will then explore the properties that make data good for intelligence creation and some of the options for capturing user outcomes (or opinions) on the data.

An Example: TeamMaker

Let’s consider an example: an Intelligent System designed to create teams for basketball leagues—we’ll call it TeamMaker. Every player is entered into the system, along with some basic statistics: height, experience, positions they play, and maybe something about how well they shoot the ball. Then the system divides the players up into teams. If the teams are balanced, the league will be fun to play in. If the teams are imbalanced the league might be boring. One team will always win and everyone else will feel like losers. Feuds will begin, rioting in the streets, hysteria…
So we better get this right.
We need to build an experience that collects useful feedback about the teams the system creates, so the intelligence can improve (and suggest better teams) over time.

Simple Interactions

One approach is to let users correct the teams and use the corrections to improve the intelligence.
For example, TeamMaker might produce “suggested teams” as a starting point. These proposals could be presented to a human—the league commissioner—to correct any problems. Maybe the intelligence got it wrong and put all the best players on one team. Then the commissioner could intervene and move some players around. Each time the commissioner moves a player, TeamMaker gets training data to make the intelligence better. After a few seasons, the intelligence might be so good that the commissioner doesn’t need to make any more changes.
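The correction loop could be sketched as follows. This is a minimal illustration, not the book’s implementation; `CorrectionEvent`, `record_correction`, and the team names are all hypothetical:

```python
from dataclasses import dataclass

@dataclass
class CorrectionEvent:
    """One commissioner correction, usable as a labeled training example."""
    player: str
    proposed_team: str    # what TeamMaker suggested
    corrected_team: str   # where the commissioner moved the player
    roster_context: dict  # team compositions at the moment of the move

def record_correction(log, player, proposed, corrected, context):
    # Unchanged players implicitly confirm the suggestion; only actual
    # moves produce a correction example.
    if proposed != corrected:
        log.append(CorrectionEvent(player, proposed, corrected, dict(context)))

log = []
record_correction(log, "Ana", "Hawks", "Owls", {"Hawks": 5, "Owls": 4})
record_correction(log, "Ben", "Owls", "Owls", {"Hawks": 5, "Owls": 5})
print(len(log))  # 1: only Ana's move produced a training example
```

Note that capturing the roster context at the moment of the move is what makes each correction usable as training data later.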
But, to be honest, this basketball-team–creating system sounds pretty lame. Who wants to use a tool to make teams, and then have to make them by hand anyway? This approach could fail if commissioners don’t provide any corrections to the teams, because:
  • It sounds boring.
  • They don’t feel confident they can beat the machine.
  • They actually don’t understand how to play basketball, so they make corrections that make things worse.
  • They make the corrections offline and don’t bother to enter them into TeamMaker at all.
Slow or irregular interactions will limit how quickly the intelligence can grow and limit its ultimate potential. A good experience will set up interactions that provide obvious value to the user and that the user can do a good job of getting right.

Making It Fun

Another approach would be to make the connection between usage and intelligence something that’s more fun. Something users really want to do and are motivated to do right.
For example, TeamMaker could make the teams, but it could also support betting. Most people who bet will want to win (because most people care about money), so they will try to bet on the best team—and will avoid betting on obvious losers. So TeamMaker can use the distribution of bets to figure out if it did a good job of constructing teams. If the bets are heavily skewed toward (or away) from one of the teams, TeamMaker can learn it used the wrong factors to balance the teams. It can do better next time.
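The bet-distribution signal might be computed with a sketch like this; the function name and the team names are illustrative, not from the book:

```python
def bet_skew(bets_per_team):
    """Fraction of all bets placed on the most-favored team.

    A value near 1/len(teams) suggests the teams look balanced to the
    bettors; a value near 1.0 suggests one team is an obvious winner.
    """
    total = sum(bets_per_team.values())
    if total == 0:
        return None  # no bets placed, no signal this week
    return max(bets_per_team.values()) / total

print(bet_skew({"Hawks": 12, "Owls": 11, "Foxes": 13}))  # ~0.36: balanced
print(bet_skew({"Hawks": 30, "Owls": 3, "Foxes": 3}))    # ~0.83: skewed
```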
This betting-based interaction is an improvement over the team-tweaking interaction: it should produce more data per season, more opportunities to improve the intelligence, and a better Intelligent System faster.
But the data from betting-based interaction will not be perfect. For example, some people might only bet on their own team, even if it is obvious they can’t win. Or maybe the bettors don’t know anything about basketball and are betting to pump up some good-natured office rivalry. (Or betting is illegal or against your morals, so as a developer you don’t want to use betting-based interaction with TeamMaker at all…) These types of problems can lead to bias, and bias can lead intelligence astray. When an intelligence learns from biased betting data, the resulting teams will not be optimized for creating a competitive league; they will be optimized for… something else.
When users have a different objective than the Intelligent System the data from their interactions can be misleading. A good experience will align usage with the intelligence to get unbiased observations of success.

Connecting to Outcomes

Another approach would be to connect the intelligence directly to outcomes. No human judgment needed, just the facts. When a single team wins every single game, the intelligence knows it did something wrong. When all the games are close, the intelligence can learn it did something right.
For this approach to work, someone would have to be motivated to enter all the scores into TeamMaker. So maybe TeamMaker creates some fun features around this, like integrating with the betting, or leader-boards, or scheduling. People might crowd around their computers every time a game is completed, just to see how the standings have changed, who won money, and who lost pride.
As games are played and users enter statistics about which teams won, which games were close, and which were blowouts, TeamMaker has everything it needs to improve the intelligence. If there is a team that never loses, it can learn what is special about that team and avoid doing that in the future. And next season TeamMaker would do a better job, creating leagues that are more likely to be balanced, and less likely to have an unstoppable (or a hopeless) team.
When the experience can directly track the desired outcomes, it can produce the most useful intelligence.
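A minimal sketch of an outcome-based balance signal, assuming game scores are entered as pairs; the 5-point margin is an arbitrary illustration, not a figure from the book:

```python
def close_game_rate(results, margin=5):
    """Fraction of games decided by at most `margin` points.

    results: list of (score_a, score_b) tuples for one season.
    A high rate suggests the league was balanced; many blowouts
    suggest the team-making intelligence got something wrong.
    """
    if not results:
        return None
    close = sum(1 for a, b in results if abs(a - b) <= margin)
    return close / len(results)

season = [(52, 49), (61, 38), (44, 43), (55, 51)]
print(close_game_rate(season))  # 0.75: three of four games were close
```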

Properties of Good Data

In order to solve hard, large, open-ended, time-changing problems, you’re going to need data—lots of data. But not just any data will do. To build intelligence, you’re going to need data with specific properties. The best data will:
  • Contain the context of the interaction, any actions that were taken, and the outcomes.
  • Have good coverage of what your users want to do with the Intelligent System.
  • Reflect real interactions with the system (and not guesses or surveys about how people might interact).
  • Have few (or no) biases.
  • Avoid feedback loops, where the intelligence influences the data, which influences the intelligence, which influences the data…
  • Be large enough to be relevant to the problem. (Creating intelligence for difficult problems can require incredible amounts of data.)
Achieving all of these properties is not easy. In fact, you are unlikely to get the data you need to effectively create intelligence unless you explicitly address each of these properties in your design.

Context, Actions, and Outcomes

The basic requirement for creating intelligence from data is to know:
  1. The context of what was going on when the intelligence was invoked.
  2. Any actions that were taken as a result.
  3. The outcomes of the interaction, and specifically whether the outcomes were positive or negative.
For example:
  • A self-driving car needs to know what all the sensors on the car see (the context), the things a human driver might do in that situation (the actions), and whether the car ends up crashing or getting honked at (the outcome).
  • A book-recommending system needs to know what books the user has already read and how much they enjoyed them (the context), what books might be recommended to that user, whether the user purchased any of the books or not (the actions), and which of the books the user ended up liking (the outcomes).
  • An anti-malware system needs to know what file the user downloaded and where they got it from (the context), whether they installed it or not (the action), and whether their machine ended up infected or not (the outcome).
An ideal intelligent experience will create situations that have enough context to make good decisions, both for the user and for the intelligence. For example: if the self-driving car doesn’t have any sensors other than a speedometer, then it doesn’t help to know what the user did with the steering wheel—no intelligence could learn how to steer based on speed alone.
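The three requirements can be captured in a single record type. The field names here are illustrative, not from the book, and the example follows the anti-malware scenario above:

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class InteractionRecord:
    """One context/action/outcome observation for intelligence creation."""
    context: dict[str, Any]  # everything known when the intelligence ran
    action: str              # what was done as a result
    outcome: str             # what happened afterward
    outcome_positive: bool   # was the outcome good or bad?

# An anti-malware interaction, following the example above:
rec = InteractionRecord(
    context={"file": "setup.exe", "source": "unknown-mirror.example"},
    action="installed",
    outcome="machine infected",
    outcome_positive=False,
)
print(rec.outcome_positive)  # False
```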

Good Coverage

The data should contain observations of all the situations where the intelligence will need to operate. For example:
  • If the system needs to automate lights, the data should contain the context, actions, and outcomes of controlling lights:
    • During the day.
    • At night.
    • In the winter.
    • In the summer.
    • During daylight saving time.
    • With lights that are heavily used.
    • With lights that are rarely used.
    • And so on.
  • If the lights need to work in 50 different countries around the world, the data should contain observations from those 50 countries in all of these situations.
  • If the system needs to work in a mineshaft, the data should contain observations of lights being used in a mineshaft.
  • If the system needs to work on a space station, the data should contain observations of lights being used in a space station.
An intelligence operating in a situation it was not trained (or evaluated) on is likely to make mistakes, crazy ones.
Intelligent Systems will be expected to work in new contexts over their lifetime. There will always be new books, movies, songs, web pages, documents, programs, posts, users, and so on. An effective intelligent experience will be able to put users into these new contexts with confidence that the mistakes made while collecting data will have low cost and the value of the collected data will be high.

Real Usage

The best data will come from users doing things they actually care about. For example:
  • A user driving in a car simulator might make decisions they wouldn’t make if they were driving a real car (when their life is on the line).
  • A user telling you what books they like might talk about literary classics (because they are trying to impress you) but they might never actually read those books.
  • A user might give very different answers when asked if a file is safe to install on a lab computer or on their mother’s computer.
Connecting real users to interactions they care about ensures the resulting data is honest, and that building intelligence from it will be most likely to give other users what they actually want.

Unbiased

Bias occurs when the experience influences the types of interactions users have, or influences the types of feedback users give.
One common source of bias is that different types of outcomes get reported at different rates.
Consider a spam filtering program. If a spam email gets to the user’s inbox, it is right in their face. They are likely to notice it and may press a “this is junk” button and generate useful data on a bad outcome. On the other hand, if the filter deletes a personal email, the user may never notice.
In this case, choices made in designing the experience have introduced bias and have made the resulting data much less useful for building intelligence.
Another potential source of bias is that users with strong sentiment toward the interaction (either good or bad) are more likely to give feedback than users with neutral opinions.
Another source of bias is when the experience encourages users to make certain choices over others. For example, when the experience presents choices in a list, the user is more likely to choose the first item on the list (and is very unlikely to choose an item that isn’t on the list at all).

Does Not Contain Feedback Loops

Experience and intelligence will affect one another, sometimes in negative ways.
For example, if some part of the user experience becomes more prominent, users will interact with it more. If the intelligence learns from these interactions, it might think users like the action more (when they don’t like it more, they simply notice it more because of the change to the user experience).
Conversely, if the intelligence makes a mistake and starts suppressing an action that users like, users will stop seeing the option. They will stop selecting the action (because they can’t). The action will disappear from the data. The intelligence will think users don’t like the option anymore. But it will be wrong…
Here are some ways to deal with feedback:
  • Include what the user saw in the context of the recorded data. If one option is presented in a larger font than another, record it in context. If some options were suppressed, record it in context.
  • Put a bit of randomization in what users see, for example switching the order of certain options. This helps gather data in broader situations, and it helps identify that feedback may be occurring.
  • Record the effort the user took to select the option in the context. For example, if the user had to issue a command manually (because it was suppressed in the intelligent experience) the intelligence should know it made a costly mistake. If the user had to browse through many pages of content to find the option they want, the intelligence should know.
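The three mitigations above could be sketched together like this. All names are illustrative, and the 10% swap probability is an arbitrary choice for the sketch:

```python
import random

def present_options(options, log, rng=None):
    """Show options in a lightly randomized order and log exactly what
    the user saw, so later learning can separate 'users like this option'
    from 'users merely saw it first'."""
    rng = rng or random
    shown = list(options)
    # Small randomization: occasionally swap the top two options.
    if len(shown) >= 2 and rng.random() < 0.1:
        shown[0], shown[1] = shown[1], shown[0]
    log.append({"shown_order": list(shown)})  # presentation is part of context
    return shown

def record_choice(log, chosen, effort_clicks):
    # Effort is part of the context too: a high-effort manual override
    # signals the intelligence made a costly mistake.
    log[-1].update({"chosen": chosen, "effort_clicks": effort_clicks})

log = []
shown = present_options(["reply", "archive", "delete"], log)
record_choice(log, chosen=shown[0], effort_clicks=1)
print(sorted(log[0].keys()))  # ['chosen', 'effort_clicks', 'shown_order']
```

Because the record stores what was shown (and in what order), a later learner can distinguish genuine preference from mere prominence.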

Scale

In order to improve, an Intelligent System must be used. This means that the intelligent experiences must be prominent, they must be interesting to use, they must be easy to find, and they must be the things that users do often.
Consider two options: a new Intelligent System and an established Intelligent System.
The new Intelligent System will probably not have many users. This means the intelligent experience must be central to what the users will do, so they will interact with it regularly. In these cases, the intelligent experience will need to encourage interaction, make it fun, and maybe even change the whole product around to put the intelligent interaction front and center.
The established Intelligent System will have much more usage. This means that the intelligent experiences can be more subtle. They can be put in places where fewer users will see them. This doesn’t mean they are less valuable—they may be solving very important problems, but problems that users encounter on a weekly or monthly basis instead of on a daily basis.
For some context, a system that:
  • Generates tens of interactions per day is basically useless for verifying or improving intelligence.
  • Generates hundreds of interactions per day can probably validate intelligence and produce some simple intelligence.
  • Generates thousands or tens of thousands of interactions per day can certainly validate the Intelligent System and produce intelligence for many hard, open-ended problems.
  • Generates hundreds of thousands or millions of interactions per day will probably have all the data it needs for most tasks.
An effective intelligent experience will attract users and get them engaging, building a base for gathering data and producing interesting intelligence.

Ways to Understand Outcomes

When an interaction between a user and intelligence occurs, it isn’t always easy to understand whether the outcome is positive or negative. For example, in a music recommender system—did the user push the “next” button because they hate the song (the intelligence got it wrong) or because they see the next song coming up and they really love that song (the intelligence got it right)?
The best intelligent experiences are set up to make the outcome clear implicitly—that is, without relying on the user to do anything other than use the product. But this isn’t always possible. Even when it is possible, it can be useful to have a backstop to make sure the implicit data has all the right properties to create intelligence. Methods for understanding outcomes include:
  • Implicit observations
  • User ratings
  • Problem reports
  • Escalations
  • User classifications
Many large Intelligent Systems use all of these approaches to understand as much as possible about the outcomes their users are having.

Implicit Outcomes

An ideal experience will produce good, useful data without requiring any special user action. Users will produce data simply by using the system, and that data will have all the properties required to grow intelligence.
For example, when a user sets their thermostat to 72 degrees it’s pretty clear what they would have wanted the intelligence to do. Or when the user buys a product or some content, it’s clear they were interested in it.
It’s often hard to know exactly how to interpret a user’s action. Users may not spend the time to make perfect decisions, so the way data is presented to them could bias their results. For example, they might have selected one of the five recommended movies because it was the absolute best movie for them at that moment—or because they didn’t feel like looking at more options.
Because of this, achieving a fully implicit data collection system is quite difficult, requiring careful coordination and curtailing of the experience so the usage can be effectively interpreted.

Ratings

User ratings and reviews can be a very good source of data. For example, designing experiences that allow users to:
  • Rate the content they consume with 1-5 stars.
  • Give a thumbs-up or thumbs-down on a particular interaction.
  • Leave some short text description of their experience.
Users are used to leaving ratings and many users enjoy doing it, feeling like they are helping others or personalizing their experience.
But there are some challenges with ratings:
  1. Users don’t always rate everything. They may not feel like taking the effort.
  2. There can be some time between the interaction and the rating. The user might not remember exactly what happened, or they might not attribute their outcome to the interaction.
  3. The rating might not capture what the intelligence is optimizing. For example, consider these two questions: How good is this book? Was that the right book to recommend to you yesterday?
  4. Ratings vary across different user populations—five stars in one country is not the same as five stars in another.
  5. Infrequent interactions will not get many ratings and will be hard to learn about.

Reports

An experience can allow users to report that something went wrong. For example, many email systems have a “report as spam” button for spam messages that get to the inbox.
Data collected through reports is explicitly biased, in that the only outcomes it captures are bad ones (when the intelligence made a mistake), and users won’t report every problem. For example, users might click “report as spam” on 10% of the spam they see, and just ignore or delete the rest of the spam.
Because of this, data from reports is difficult to use as an exclusive source for creating intelligence. However, reports can be very effective in verifying intelligence and in tuning the experience to control the negative effects of poor intelligence.
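If you can estimate the report rate, report counts can be scaled into a rough estimate of actual mistakes; the 10% figure above is the book’s hypothetical, and this correction assumes the rate is known and stable, which in practice it rarely is:

```python
def estimated_mistakes(reports, report_rate):
    """Scale observed reports up by an assumed reporting rate.

    reports: number of 'report as spam' clicks observed.
    report_rate: estimated fraction of mistakes users actually report
                 (e.g., 0.10 if users report ~10% of the spam they see).
    """
    if not 0 < report_rate <= 1:
        raise ValueError("report_rate must be in (0, 1]")
    return reports / report_rate

print(estimated_mistakes(42, 0.10))  # 420.0 spam messages likely reached inboxes
```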

Escalations

Escalations are an expensive form of report that occur when users get upset enough to contact customer support (or visit a support forum, and so on). Escalations are similar to reports but are out-of-band of the main product experience. And they usually represent very unhappy users and important mistakes.
Escalations tend to be messy and unfocused. The user who is having trouble won’t always know what part of the system is causing trouble. They won’t express the problem in the same terms as other users who reported it.
Because of this, it can be difficult and expensive to sort through escalations and use them to improve intelligence—they are better for identifying big problems than for refining the system.
In general, it’s better to allow users to report problems in context (with an in-line reporting experience, at the time of the problem) so the system can capture the relevant context and actions that led to the bad outcome.

User Classifications

You can simply ask users about their outcomes using specific (carefully controlled) questions, in context as users interact with your system. Sort of like a survey. Imagine:
  • Using a light automation system and once every three months it says: In order to improve our product we’d like to know—right now would you prefer the lights to be on or off?
  • Using a photo editing program and after touching up a hundred photos it says: Please help improve this program. Is there a face in the image you are working on?
Classification can produce very focused data for improving intelligence. It can also be made unobtrusive, for example by limiting questions:
  • To one question per thousand interactions.
  • To users who have volunteered to help.
  • To users who haven’t signed up for premium service.
  • And so on.
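Two of those limits can be combined in a small gating sketch; the function name, the opt-in flag, and the one-in-a-thousand rate are illustrative choices, not the book’s implementation:

```python
import random

def should_ask_question(user_opted_in, rate=1 / 1000, rng=random):
    """Decide whether to show a classification question right now.

    Combines two of the limits listed above: an opt-in gate and a
    per-interaction probability of about one in a thousand.
    """
    if not user_opted_in:
        return False
    return rng.random() < rate

rng = random.Random(7)  # fixed seed so the sketch is reproducible
asked = sum(should_ask_question(True, rng=rng) for _ in range(100_000))
print(asked)  # roughly 100 questions across 100,000 interactions
```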

Summary

The intelligent experience plays a critical role in getting data for growing intelligence. If an intelligent experience isn’t explicitly designed to produce good data for creating intelligence, it is almost certainly going to produce data that is poor (or useless) for creating intelligence.
Experiences produce better data when users interact with intelligence often, when they perceive value in the interactions, and when they have the right information to make good decisions.
Data is most useful when it:
  • Contains the context of the interaction, the action taken, and the outcomes.
  • Contains good coverage of all the situations where the Intelligent System will be used.
  • Represents real usage, that is, interactions that users care about.
  • Contains unbiased data.
  • Contains enough information to identify and break feedback loops.
  • Produces a meaningful amount of data with respect to the complexity of the problem.
The most effective data comes from implicit interactions, where the user naturally expresses their intent and their opinion of the outcome by using the product.
When implicit interactions aren’t sufficient other methods of understanding outcomes include:
  • Allowing users to explicitly rate the content or interactions they have.
  • Allowing users to report problems directly from the user experience.
  • Giving users access to support paths to report large, costly problems.
  • Asking users to answer specific, carefully controlled questions (infrequently).
Many Intelligent Systems use all of these approaches. The majority of their data might come from implicit interactions, but the other techniques will be available to monitor and make sure the implicit data is high quality (has not started to develop bias or feedback).
Producing an experience that produces excellent data without burdening users is hard—and it is very important to building great Intelligent Systems.

For Thought…

After reading this chapter you should:
  • Know how to craft experiences that collect the data needed to evaluate and grow intelligence.
  • Understand the options for collecting this data from users, which range from ones requiring no explicit user action to ones requiring extensive user involvement.
You should be able to answer questions like these:
  • Think of an intelligent service you use regularly that seems to collect all its data implicitly. How could it be changed to leverage user classifications?
  • Imagine an Intelligent System to assist your favorite hobby. Think of an implicit way to collect “good data” when it does something wrong. Now think of another way to collect data in the same situation that expresses a different user interpretation of the mistake (for example maybe that the intelligence was correct, but the user didn’t want to do the suggested action for some other reason).