Chapter 11. Delivering a product

This chapter covers

  • Understanding what the customer wants to see in results
  • Various forms that results can take, from a simple report to an analytical application
  • Why some content should or should not be included in the results product

Figure 11.1 shows where we are in the data science process: product delivery. Previous chapters of this book discuss setting project goals, asking good questions, and answering those questions through rigorous analysis of data. After all this is done, if you’re the lead data scientist you probably know more about every aspect of the project than anyone else, and you’re in a position to answer all sorts of questions about the project, ranging from the methods and tools used to the significance and impact of the results. But it’s not usually a good idea to stay in this position in perpetuity, making yourself the only possible source of information about the project and its results. Not only would you become the single point of failure (if you’re not available for some reason, what happens?), but you would also have created perpetual work for yourself whenever questions come up. Because of this, it’s usually good to create something that summarizes or catalogs your results so that customers and other people can have their questions—at least the most common ones—answered without involving you.

Figure 11.1. The first step of the finishing phase of the data science process: product delivery

In order to create an effective product that you can deliver to the customer, first you must understand the customer’s perspective. Second, you need to choose the best media for the project and for the customer. And finally, you must choose what information and results to include in the product and what to leave out. Making good choices throughout product creation and delivery can greatly improve the project’s chances for success.

11.1. Understanding your customer

In chapter 2, I discussed listening to customers and asking them questions that can help you understand their problems, as well as providing information relevant to the questions they have. Hopefully, some of the strategies I presented led to good outcomes in data gathering, exploration, design and implementation of statistical methods, and overall results. I’ll revisit that idea of understanding the customer again here, with a focus on creating a product that will most efficiently make those good results available to the customer.

11.1.1. Who is the entire audience for the results?

You probably know the customer very well by now, but there may be people other than the customer who are also interested in the results. If the customer is a leader of a group or organization, other members of that group might also be part of the audience for your results. If the customer is an organization, certain departments or individuals within that organization may be part of the audience, but others may not. If the customer is an individual or a department, results may be passed up the hierarchy, to bosses or executives, so decisions can be made. In any case, it’s best not to assume that the customer you’ve dealt with regularly is the only audience for the results you’ve generated. Consider the network of people surrounding the customer and whether they’re part of the audience. If you’re not sure, ask the customer, “Who do you foresee wanting to see these results, and why?” From there, you can form a good idea of who the audience is.

11.1.2. What will be done with the results?

Once you know the audience for your results, you’ll want to figure out what they’re going to do with them. This is often more difficult than you would think.

In chapter 2, I wrote briefly about how you might discuss deliverables with the customer, so you may already have a good idea about what types of things the customer wants to see in the results and what they might do with them. In bioinformatics, for example, a customer might intend to take the top-10 candidate genes from your results and run extensive experiments on them. If you’ve built a beer-recommendation algorithm, the customer may intend to have their friends use the algorithm and then drink the recommended beers. There are many possibilities.

In a project involving organizational behavior for which you used some techniques in social network analysis, the customer may be interested in exploring each individual’s contacts and seeing how those contacts are similar to or different from the individual. This example is less an action than an interest. If a customer begins a sentence with

  • “We would be interested in...”
  • “We want to see...”
  • “We would like to know...”

or similar, be sure to pursue the issue further and find out how exactly they intend to take action on this new knowledge. The actions they intend to take are far more important than what they’re interested in.

If, for example, the customer intends to make business decisions based on what they find out, then you should probably figure out what their tolerance is for error and incorporate that into the tailored results you present to them. Misunderstanding the intended action and its consequences can cause bigger problems later. The example at the end of this chapter gives one instance when miscommunication while delivering a product may have caused a problem.

There are countless ways that a customer might act on the results that you deliver, so it’s best to spend some time trying to pin those down before you finalize the results and their format. You might try having the customer run through a hypothetical scenario involving various types of results that are appropriate for the project, or you might even want to visit them in their workplace, observe their workflows, and witness personally how they make decisions. Talking to multiple people is also a good idea, particularly if your audience is composed of individuals with varying experience, knowledge, and interest in the results. Overall, you’d like to understand as thoroughly as possible the perspective of the customer and audience and what they expect and intend with respect to the results that you’ll deliver. This understanding can help you create and deliver a product that helps the customer accomplish their goals.

11.2. Delivery media

The thing that you create and deliver to customers—the product—can take many forms. In data science, one of the most important aspects of a product is whether the customer passively consumes information from it or actively engages with it, using the product to answer any of a multitude of possible questions. The most common example of a passive product is a report or white paper; the customer can find in it only the answers that are in the text, tables, and figures present in the document. The most common example of an active product is an application that allows customers to interact with data and analysis in order to answer some questions on their own. Various types of products can fall anywhere along the spectrum between passive and active. Each of these types has strengths and weaknesses, which I discuss in the following sections.

11.2.1. Report or white paper

Probably the simplest option for delivering results to a customer, a report or white paper includes text, tables, figures, and other information that address some or all of the questions that your project was intended to answer. Reports and white papers might be printed on paper or delivered as PDFs or in another electronic format. Because a report is a passive product, customers can read it when it’s delivered and can consult it as needed in the future, but the report will never be able to provide any new answers that weren’t included when it was written—this is an important distinction between reports and more active product types. On the other hand, reports and white papers are some of the simplest and most easily digestible product types.

Strengths

Some strengths of a report or white paper are these:

  • On paper or in electronic form, reports and white papers are portable and don’t require any special technology or knowledge in order to use them, except some general domain knowledge of the report’s topic.
  • Reports and white papers can provide the simplest and quickest way for the customer to find answers if the desired answers are present and if the report is concise and well organized. For most people, finding and reading an answer on a page is easier than, for example, opening an application or interpreting data in a spreadsheet.
  • Reports also offer the ability to construct a narrative that can be useful for effective delivery of answers, information, caveats, and impact. Some product types provide data and answers out of context, but a narrative can establish contexts that help the readers of the report make better use of the results therein. For example, classifications generated by a machine learning algorithm can be far more useful to customers if they understand the accuracies and limits of applicability of that algorithm. A narrative can provide context prior to stating results in order to prevent misinterpretation and misuse of results.

Limitations

Some limitations of a report or white paper are these:

  • The biggest limitation of reports and white papers is that they’re fully passive. You need to know before you write the report which questions the customer wants to have answered, and you need to answer these questions in a way that’s easily comprehensible. If you’re not successful in writing a good report, the customer will return to you with questions or, even worse, dismiss the project as a failure and lose confidence in you and/or your team, even if the results themselves are quite good.
  • It can be tough to include the right amount of detail so that all the major points are covered and the most important questions are answered, while avoiding details that distract from the important points.
  • Reports and papers can answer questions only as of the time they’re written and may not apply to future times or to data sets other than the one analyzed. If it’s likely that the customer will want to revisit the project’s questions in the future or use another data set, a report might not be the best choice.
  • Some people don’t like reading reports. People of various learning and leadership styles may prefer to see results in a different format, and if these people are stubborn and in a position of authority, writing a report would be a waste of time.

When to use it

A report or white paper can be a good product to deliver when

  • Your project involves a few key questions that can be answered completely and succinctly in a written report that may include tables, graphics, or other figures.
  • The main goals of your project involve answering a few questions one time, and these answers are useful by themselves, without an ongoing need to update or expand the answers.
  • The customer would like a written report, and you don’t feel that that’s an inappropriate request.

11.2.2. Analytical tool

In some data science projects, the analyses and results from the data set can also be used on data outside the original scope of the project, which might include data generated after the original data (in the future), similar data from a different source, or other data that hasn’t been analyzed yet for one reason or another. In these cases, it can be helpful to the customer if you can create a tool for them that can perform these analyses and generate results on new data sets. If the customer can use this analytical tool effectively, it might allow them to generate any number of results and continue to answer their primary questions well into the future and on various (but similar) data sets.

A simple example of such an analytical tool is a spreadsheet that makes projections based on the current financial situation and expectations of a company and its industry. Theoretically, a customer could enter a range of values into such a spreadsheet and see how the projections change if the company’s financial situation changes. Customers might not be able to create the spreadsheet themselves if it consists of complicated formulas and statistical methods, but they can understand the intent and the meaning of the results if the results, for example, conform to generally accepted financial-modeling principles.

An analytical tool that you might deliver as a product of your data science project might also be a software script that accepts a data set and analyzes it, generating results that can be used by the customer in a specific, useful way. It might also be a highly specialized database query that addresses some of the project’s questions. An analytical tool delivered as a product can take many forms, but it needs to fulfill some criteria:

  • The analytical tool needs to generate reliable results within the boundaries of the types of data sets for which it was intended.
  • The set of applicable data sets must be well specified.
  • The customer must be able to use the analytical tool correctly.

If all three of these criteria are met, then you might have a good product to deliver. The usefulness of the tool also depends on how many of the project’s questions it can answer and how important those questions are to the project’s goals and to the customer.
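
For example, a minimal sketch of a script-style analytical tool might look like the following Python program. Everything in it is hypothetical: the file name, the required columns, and the choice of a simple linear trend as the projection method.

  # project_revenue.py -- a sketch of a script-style analytical tool
  import sys
  import numpy as np
  import pandas as pd

  def project(csv_path, periods_ahead=6):
      df = pd.read_csv(csv_path)
      # Enforce the tool's stated boundaries: refuse data sets it wasn't
      # designed for rather than silently producing misleading results.
      required = {"month", "revenue"}
      if not required.issubset(df.columns):
          sys.exit(f"Input must contain columns {sorted(required)}")
      if len(df) < 12:
          sys.exit("Need at least 12 months of data to project a trend")
      # Fit a simple linear trend over time and extrapolate it forward.
      t = np.arange(len(df))
      slope, intercept = np.polyfit(t, df["revenue"], 1)
      for step in range(1, periods_ahead + 1):
          value = slope * (len(df) - 1 + step) + intercept
          print(f"month +{step}: projected revenue {value:,.0f}")

  if __name__ == "__main__":
      project(sys.argv[1])

The guard clauses are the point here: a tool earns the customer’s trust by refusing inputs outside its intended scope instead of returning plausible-looking nonsense.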

Strengths

Some strengths of analytical tools are these:

  • Analytical tools allow the customer to answer some of their own questions quickly and without involving you. This saves time and effort for everyone involved.
  • Within the intended scope of answerable questions, an analytical tool is more versatile than a report. Even within a narrow scope, analytical tools can usually give an unlimited number of results as inputs and data sets vary. It would be impossible to provide such unlimited results in a report.

Limitations

Some limitations of analytical tools are these:

  • It’s often difficult to build an analytical tool that’s good at answering important questions reliably and concisely for a customer. If at some point the tool runs into an edge case and gives incorrect or misleading results, the customer may not realize it.
  • Customers need to be able to understand the basics of how the tool works in order to know its limitations and interpret the results correctly.
  • Customers need to be able to use the tool properly, or they’ll risk getting incorrect results. If you’re not available to assist them, you need to have reasonable guarantees that they won’t mess something up.
  • If there are bugs or other problems with the tool, the customer may need support from you. Even if the analysis is good, things like data formatting, computer compatibility, and third parties whom the customer invited to share the tool can all cause unexpected problems that require your attention and slow the customer down.
  • Because it’s so hard to create a foolproof analytical tool, such a tool can typically replicate only the absolute clearest of the project’s analyses. Accuracy, significance, and impact must all generally be high, and so the scope of an analytical tool must be reduced to only those analyses and results that meet these stringent criteria.

When to use it

An analytical tool can be a good product to deliver when

  • The analysis completed within your project is conducive to being converted into such a tool, specifically that it can be made relatively easy to use and its results can be expected to be reliable.
  • The customers can be expected to understand the tool to a point that they can use it correctly and interpret results correctly.
  • A passive product such as a report isn’t sufficient for the customer’s needs, such as the case where the customer intends to replicate the project’s analysis for new data sets.

11.2.3. Interactive graphical application

If you want to deliver a product that’s a step more toward active than an analytical tool, you’ll likely need to build a full-fledged application of some sort. Although it can be argued that analytical tools like scripts and spreadsheets are also applications, I’ll draw a fuzzy distinction here between command-line-style, numbers-in-numbers-out analytical tools and graphical user interface (GUI) point-and-click-style applications. These aren’t well-defined categories, but I think the loose conceptual descriptions suffice here, because you can combine the two types in any number of ways and consider both sets of strengths and limitations listed here as appropriate. The former type (command-line style) I consider to fall into the analytical tool category of the previous section. In this section, I consider mainly GUI-based applications.

GUI-based applications, these days, are typically built on web frameworks, which I discussed earlier in this book. They don’t have to be web applications, but that type is most common right now. Such an interactive graphical application that you might deliver to your customer might include the following:

  • Graphs, charts, and tables
  • Drop-down menus that enable different analyses
  • Interactive graphics, such as a timeline with movable endpoints
  • The ability to import or select different data sets
  • A search bar
  • Results that can be filtered and/or sorted

None of these is required, but each of them enables the user (the customer) to answer more project-related questions at their leisure.
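
To make that concrete, here’s a minimal sketch of such an application, assuming a recent version of the open source Dash framework; the results table and metric names are invented stand-ins for a real project’s output.

  # A sketch of an interactive graphical application built with Dash.
  import pandas as pd
  import plotly.express as px
  from dash import Dash, dcc, html, Input, Output

  # Invented results standing in for the project's real output.
  results = pd.DataFrame({
      "month": ["Jan", "Feb", "Mar", "Apr"],
      "sales": [120, 135, 128, 151],
      "returns": [8, 11, 9, 14],
  })

  app = Dash(__name__)
  app.layout = html.Div([
      html.H3("Project results explorer"),
      # A drop-down menu that enables different analyses.
      dcc.Dropdown(id="metric", options=["sales", "returns"], value="sales"),
      dcc.Graph(id="chart"),
  ])

  @app.callback(Output("chart", "figure"), Input("metric", "value"))
  def update_chart(metric):
      # Redraw the chart whenever the user picks a different metric.
      return px.line(results, x="month", y=metric)

  if __name__ == "__main__":
      app.run(debug=True)

Even a toy like this hints at the design work involved: every widget you add multiplies the ways a user can wander off the intended path.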

The most important thing to remember about interactive graphical applications, if you’re considering delivering one, is that you have to design, build, and deploy it. Often, none of these is a small task. If you want the application to have many capabilities and be flexible, designing it and building it become even more difficult. Software design, user experience, and software engineering are each full-time jobs at software companies, and so if you have little experience with delivering applications, it’s probably best to consult someone who does and to consider carefully the time, effort, and knowledge required before you start.

The strengths and limitations of interactive graphical applications include those of analytical tools discussed in the previous section, but I’ll add more specific ones here.

Strengths

Some strengths of interactive graphical applications include the following:

  • If it’s well designed, an interactive graphical application can be the most powerful tool that you can deliver to a customer in terms of the information and answers it can convey.
  • A well-designed and well-deployed interactive graphical application is easy to access and easy to use. It can be made clear within the application itself how to use the application properly and effectively.
  • An interactive graphical application can be made portable and scalable if it’s built and deployed using common frameworks. This can be useful if you expect the number of users to grow or if you think another customer will want to use it as well.

Limitations

Some limitations of interactive graphical applications are these:

  • Interactive graphical applications are hard to design, build, and deploy. Not only are the tasks difficult, but they also can take a lot of time.
  • Interactive graphical applications often require ongoing support. The potential for bugs and problems increases with the complexity of the software and the deployment platform, and supporting the application and fixing bugs may take a considerable amount of time and resources.
  • Customers might not use the application properly. If proper use isn’t clear, or if a user isn’t careful, misleading conclusions might be drawn.

When to use it

An interactive graphical application can be a good product to deliver when

  • The guidelines from the previous section for when to use an analytical tool apply.
  • A point-and-click GUI is strongly preferred over other types of analytical tools, either for ease of use or for the effectiveness of results delivery.
  • You have the time and resources to design, build, deploy, and support such an application.

11.2.4. Instructions for how to redo the analysis

Whether or not you’ve already elected to create and deliver one of the products I’ve discussed, it can be a good idea to record the steps that you took to perform the project’s final analysis and to package it into an instruction book for the customer’s use or even for your own.

If you’re dealing with a smart and capable customer, possibly even a data scientist of some sort, they may be able to replicate your analysis if given instructions. This can be helpful for them if they want to analyze new data or other similar data in the future. As with building an analytical tool, the goal is to enable the customer to ask and answer some of their own questions without requiring much of your time. Giving them detailed instructions can accomplish this without requiring you to create and deliver a high-quality, relatively bug-free software application. If you’re giving them any code, there’s still the possibility of encountering bugs, but in this case there’s a reasonable expectation that the customer can read, edit, and fix the code themselves if they need to. You may still need to provide support sometimes, but this arrangement shifts a large part of the support burden away from you, assuming the customer is capable.

On the other hand, delivering an instruction set to a customer can create all sorts of problems if they aren’t familiar with some of the steps or if they don’t have much experience with the tools you’re using.
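
To make this concrete, a set of instructions for a hypothetical project might read as follows; every file and script name here is invented:

  1. Export last quarter’s transactions from the warehouse as transactions.csv.
  2. Run clean_data.py on the export; it writes transactions_clean.csv and reports how many rows were dropped. If more than 5% were dropped, stop and investigate before continuing.
  3. Run fit_model.py on the cleaned file to refit the model and write scores.csv.
  4. Spot-check the ten highest scores against the raw data before sharing any results.

Note that the steps include checks (step 2) as well as actions; good instructions tell the customer not only what to do but also how to recognize when something has gone wrong.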

Strengths

Some strengths of delivering a set of instructions are these:

  • It’s usually pretty easy to write down what you did, bundle it with your code or other tools, and deliver it to a customer.
  • A set of instructions can be extremely useful to you in the future, if you ever return to this project and need to analyze data in a similar way again.

Limitations

Some limitations of delivering a set of instructions are these:

  • The customer needs to understand everything you deliver, and they need to be able to replicate it, possibly with changes in data sets or other aspects, without encountering many problems that they can’t solve.
  • Delivering instructions requires a lot of time and effort on the part of the customer, because they will have to read and understand some complex analyses and tools.
  • Unclear instructions or messy code can make it nearly impossible to replicate the analysis reliably. You need to take care to avoid this possibility.

When to use it

A set of instructions for performing the analysis can be a good product to deliver when

  • The exploratory work was the challenging part, and applying the statistical methods is relatively easy for the customer to do.
  • The customer is smart and capable and shouldn’t have many problems working with the instructions, tools, and/or code that you deliver.
  • You think there’s even a remote possibility that you’ll return to this project in the future; in that case, hold onto the instructions for yourself as well.

11.2.5. Other types of products

There are many other products that might also fit your project and your customer well. Here are some:

  • A web-based API that, when queried in a certain way, returns answers and information pertaining to the query— This can be useful when a customer wants to be able to integrate your analysis into an existing piece of software (a minimal sketch of this option follows the list).
  • A software component that’s built directly into the customer’s software— This takes more coordination than a web-based API because you have to understand the architecture of the existing software, but it can still be a good idea depending on how the software will be used and deployed.
  • An extended development project in which you work with the customer’s own software engineers to build a software component that they will then maintain themselves— If you have the time now, letting others build and maintain software based on your analysis can save you a lot of time later.
  • A database populated with the most useful data and/or the results of some analyses— For customers who regularly work with databases, giving them one can sometimes be more convenient for them to work with than an API or other component, if you can figure out an efficient and relatively foolproof way to structure the database so that it’s useful and is used properly.
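
To illustrate the first of these, here’s a minimal sketch of a web-based API built with Flask. The endpoint, the stored results, and the idea of serving precomputed answers are all hypothetical, and a real service would add authentication, input validation, and error handling.

  # A sketch of a small web-based API that serves analysis results.
  from flask import Flask, jsonify

  app = Flask(__name__)

  # Invented, precomputed results from the project's analysis.
  results = {"q3_forecast": 1840000, "top_segment": "mobile"}

  @app.route("/results/<key>")
  def get_result(key):
      # Return the requested result as JSON, or a 404 for unknown keys.
      if key not in results:
          return jsonify(error=f"unknown result '{key}'"), 404
      return jsonify(key=key, value=results[key])

  if __name__ == "__main__":
      app.run(port=5000)

The customer’s software could then answer its own questions with a simple HTTP request, such as GET /results/q3_forecast, without touching your code or your data pipeline.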

Each of these has its own set of strengths and limitations, many of which correspond to those I listed for other products earlier. If you’re considering a product that doesn’t fit in any of the categories I’ve described in detail, perhaps thinking through it in much the same way that I have will lead you to your own conclusions. In particular, it’s important to consider whether a particular product is more passive or active and what the time requirements will likely be for you, both in the near future and in the long term if you’ll need to provide support. Beyond that, every project and every specific potential product will have its own nuanced situations, and if you find yourself even a little unsure of what the ramifications are, it can be very helpful to consult someone with experience and ask their opinions. The internet can also be a good source of guidance if you can distinguish the good from the bad.

11.3. Content

In addition to deciding on the medium in which to deliver your results, you must decide which results the product will contain. Once you choose a product, you have to figure out the content you’ll use to fill it.

Some results and content may be obvious choices for inclusion, but the decision may not be so obvious for other bits of information. Typically, you want to include as much helpful information and as many results as possible, but you want to avoid any possibility that the customer might misinterpret or misuse any results you choose to include. This can be a delicate balance in many situations, and it depends greatly on the specific project as well as the knowledge and experience of the customer and the rest of the audience for the results. In this section, I provide some guidance on how to make decisions about inclusion and exclusion, and I also discuss how user experience can make products more or less effective.

11.3.1. Make important, conclusive results prominent

If there are critical questions that your project was intended to answer, and you now have conclusive answers to these questions, these answers should be prominent in your product. If you’re delivering a report, a summary of these important, conclusive results should appear in the first section or on the first page, and a more thorough discussion of methods and impact should be given later in the report. If you’re delivering an interactive graphical application, these results should appear either on the main page of analytical results, or they should be very easily accessible through a few clicks, searches, or queries.

In general, it’s best to put the most important, conclusive, unmistakable, straightforward, and useful results front and center in whatever product you’re delivering so that the customer and the rest of the audience can find them without having to look for them and immediately understand the results and their impact.

11.3.2. Don’t include results that are virtually inconclusive

It can be tempting to tell the customer about all of the planned analyses that didn’t work alongside all of those that did. In a research setting, this might be a good idea, because failed experiments can sometimes give insight into how a system works and why other, positive results turned out the way they did. But in non-academic data science, recounting the things that you tried that didn’t work can distract from the important, actionable information in a report.

Most people don’t need distractions from their work; they can find them on their own. If a piece of information doesn’t support valuable intelligence that can be used directly to make business decisions or otherwise help achieve any of the stated goals of a project, it’s usually best to leave it out. Feel free, however, to make note of any interesting tidbits for your own records or a supplementary report that isn’t the primary product for the customer; sometimes these can come in handy later if a project’s direction changes or if a new, related project is begun.

11.3.3. Include obvious disclaimers for less significant results

You probably have some results that fall between the labels conclusive and inconclusive. Deciding what to do with these can be tough, and there can be many reasons both for including them and for excluding them. It’s certain, though, that you wouldn’t want the customer to confuse results that are absolutely conclusive with those that are only partially so. Therefore, both as insurance for your own reputation and as a step toward making sure that the customer understands how best to use the information you’re providing, I highly recommend including disclaimers and caveats next to every result that’s less than 99.9% statistically significant or otherwise is not quite conclusive.

For example, let’s say you’re trying to detect fraudulent credit card transactions, and the customer is immediately going to reject the transactions that your software labels as fraudulent. If your software is 99.9% accurate, the customer probably won’t complain about the 0.1% of cases that were falsely rejected. But if your software has a false positive rate of 10%, the customer might complain very strongly. In similar situations, if you do have a 10% false positive rate, it’s definitely better to communicate that rate to the customer along with its potential implications before you deliver them the results and before they act on them. If there were any misunderstandings at all, and they acted on your results before they fully understood their implications and limitations, that could be bad for the customer, bad for the project, and possibly bad for you.
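
The arithmetic behind that distinction is worth making explicit. In the following sketch the counts are invented purely for illustration; they show how a system can be better than 99.9% accurate overall while one in ten of the transactions it rejects is innocent:

  # Invented counts for 100,000 transactions, 100 of them flagged.
  true_pos, false_pos = 90, 10      # flagged as fraudulent
  true_neg, false_neg = 99880, 20   # not flagged

  total = true_pos + false_pos + true_neg + false_neg
  accuracy = (true_pos + true_neg) / total
  innocent_among_rejected = false_pos / (true_pos + false_pos)

  print(f"overall accuracy: {accuracy:.2%}")                        # 99.97%
  print(f"innocent among rejected: {innocent_among_rejected:.0%}")  # 10%

Both numbers are true at once, which is exactly why the product needs to state which one the customer should act on.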

If you want to deliver some results that aren’t conclusive but that still might be helpful and informative, make sure you include a disclaimer stating exactly how significant the results are, what the limitations are, and what the positive and negative impacts are if the customer uses those results in certain ways. Overall, though, the customer needs to understand fully that you’re not 100% sure about these results and that acting on them might have unexpected consequences. If you can communicate this effectively, the customer will be in a good position to make good decisions based on your results, and you’ll be in a position to have a successful project.

11.3.4. User experience

Most people who use the phrase “user experience” refer to the ways that a person might interact with a piece of software. Within the software industry, user experience (UX) designer has become a lucrative career, and rightly so. Understanding how people might interact with a piece of software is not an easy task, and it has been demonstrated in many contexts that how people interact with software has a large influence on whether that software is ultimately effective. User experience makes the difference between good analytical software that’s used effectively and good analytical software that most people can’t figure out how to use.

The concept of user experience also applies to a report, an analytical tool, or any other product that you might deliver to a customer. The experience that the customer or audience has with your product is the user experience, and the principal goal is to ensure that these users use the product properly in order to draw correct conclusions from it and make good business decisions. If a customer isn’t using the product properly, you might reconsider the user experience. The goal is to enable and encourage the customer and the audience to do the right thing.

Inverted pyramid of journalism

The popular concept of an inverted pyramid, as illustrated in figure 11.2, shows how a journalistic news story might be represented in order to be most effective to the reader. The implicit assumption is that a reader might not read the whole article, or at least that the reader might not read the whole article with their full attention. Based on this assumption, the most important information, the lede (or lead), should be at the very beginning of the article, followed by the body that supports the lede, and finally the tail that adds to the rest but isn’t absolutely necessary for the story to be complete.

Figure 11.2. The inverted pyramid of journalism

For reports on data science projects, this means that it might be most effective to follow the same pattern to deliver results to a customer: lead with the most important, most impactful results in clear language, then include details that directly support those results, and finally include other auxiliary results that are useful but not necessary.

For analytical tools and interactive graphical applications (and other products), it can be helpful to consider the concept of the inverted pyramid when designing and building. The most important results and information should be in the user’s face as soon as they start using the application. Supporting details might take a little more effort, but not too much, and then users might need to look around a bit before they find the less important, extra information.

Although it certainly shouldn’t be a hard rule of data science projects, following the inverted pyramid from journalism can be helpful in writing a report or designing an application that delivers the most important information to the customer first and the less important but still helpful information second.

Plain language with no jargon

Jargon is confusing to people who don’t work in the field from which it comes. You shouldn’t use jargon in your reports, your applications, or any of your other products. If you do use it, you should make the definitions clear to the customers, the audience, or the users of the product you’re delivering.

The term jargon is hard to define. For our purposes, jargon is a set of terms or phrasing that’s familiar to people of a specific training, experience, or knowledge but that isn’t familiar to people working in substantially different fields. Because you can rarely guarantee that the people you’re speaking with are people with similar backgrounds to you, it’s generally best to assume that they don’t know your jargon.

I admit to being wholly anti-jargon, but I also see the value in using jargon in highly specialized conversations and writing. Jargon allows people to communicate efficiently within their fields—or at least within the subfields for which that jargon is valid. In those situations, I fully support the use of jargon, but in any situation in which people might not understand the specialized terms, it’s best to avoid them.

When it comes to presenting your work, if you must speak with or in front of people, it’s helpful to speak more plainly than you think you need to and more slowly as well. It’s rarely advantageous to use terms that a significant portion of your audience doesn’t understand.

When you’re writing a report or text for an application, the same rules apply: the text should be comprehensible by most of the audience, even if they don’t have experience in some related areas. Most important, with respect to language and understanding, using jargon isn’t proof that someone knows what they’re talking about. It’s often the contrary, in my experience. The ability to explain complex concepts in plain language is a rare talent and in my opinion a far more valuable skill than explaining anything using jargon.

Visualizations

Like user experience design, data visualization can greatly improve your application or other product, though it isn’t usually the product’s main focus. Data visualization is also well studied, and it’s usually best to heed the warnings and follow the best practices of those who have thought deeply about it.

Edward Tufte’s The Visual Display of Quantitative Information is a must-read book for people who want to get their data visualization absolutely right. There are other great references on the topic as well, but Tufte is usually the best place to start. Not only will you learn about when it’s best to use bar charts or line graphs, but you’ll also discover some key principles that apply to any visualization—maps, timelines, scatter plots, and so on—such as “encourage the eye to compare different pieces of data” and “be closely integrated with the statistical and verbal descriptions of a data set.” Tufte’s books are packed with such tenets and plenty of examples that show exactly what he means.

Visualizations of data and results can be helpful in reports and applications, but if they’re not designed well, they can be detrimental to the product’s intent. It’s often worth taking some time to study and consider the assumptions and implications of any visualizations that you’re trying to create. Consulting some data visualization references, like Tufte’s books, or someone with experience can have large benefits later, as the visualizations continue to serve their purpose as clear, concise conveyors of useful information.
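
As a small illustration of those tenets, the following matplotlib sketch (with invented data) labels each series directly so the eye can compare them and strips away non-data ink:

  # A sketch applying two common visualization tenets in matplotlib.
  import matplotlib.pyplot as plt

  months = [1, 2, 3, 4, 5, 6]
  actual = [10, 12, 11, 14, 15, 17]
  forecast = [10, 11, 12, 13, 14, 15]

  fig, ax = plt.subplots()
  ax.plot(months, actual, color="black")
  ax.plot(months, forecast, color="gray", linestyle="--")
  # Label the lines directly rather than using a detached legend.
  ax.text(months[-1] + 0.1, actual[-1], "actual")
  ax.text(months[-1] + 0.1, forecast[-1], "forecast")
  # Remove non-data ink so attention stays on the data.
  for side in ("top", "right"):
      ax.spines[side].set_visible(False)
  ax.set_xlabel("month")
  ax.set_ylabel("revenue ($M)")
  plt.show()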

The science behind user experience

The study of user experience is a science, though some people don’t treat it as such. I didn’t realize that it was or could be more science than art until a few years ago, when I witnessed experience studies and evaluations in action. There are many well-studied principles regarding what makes an application easy to use, or powerful, or effective, and if you’re building a complex application, employing these principles can make a huge difference in the success of your project. I encourage you to consult an experienced UX designer if you’re building an application. Sometimes even a short conversation with a UX designer can lead to great improvements in your application’s usability.

11.4. Example: analyzing video game play

While working with Panopticon Laboratories, an analytic software company whose goal is to characterize and detect suspicious in-game behavior in multiplayer online video game environments, we delivered a preliminary report to a customer (a video game publisher) that included a survey of the state of their in-game community as well as a list of some of the more suspicious players. To do this, as we did with all customers, we fit a proprietary statistical model to their data; this model assigned scores to each player, indicating how suspicious or fraudulent the player appeared to be in various categories. We were highly confident that the players with the highest scores were indeed fraudulent players, but the farther we progressed down the list of the most suspicious players, the less sure we were. We were contractually obligated to deliver to the customer a list of suspicious players, and we knew that we had to convey along with this list some notion of this uncertainty.

The customer didn’t employ any data scientists in the security department with which we were dealing, so we had to be careful not to mince words about statistical significance or uncertainty. The main thing we had to decide was how many suspicious players to include on the list. If we provided a relatively short list, we would likely be leaving out some very suspicious players who would continue to cost the video game publisher money. If we included too many players in the report, we would be pointing the finger at innocent players (with a not-insignificant false positive rate) and possibly misleading the video game publisher into thinking the problem was bigger than it was and possibly also causing them to take action against those innocent players, such as banning them from game play.

To resolve this, we worked with the customer to determine that they did indeed intend to ban players who were highly suspicious and that they were willing to accept a false positive rate of less than 5%. In order to establish a false positive rate, the customer planned to sample randomly from our initial list of suspicious players and check each one manually. We could then use the feedback to reinforce the statistical models and subsequently generate a new, more accurate report. Two or three rounds of this are usually enough to obtain high accuracy.

We had a minor setback when, after using the first-round feedback to generate another report but before we were quite ready to say that the behavior models were 100% done, the customer indicated that they were ready to begin banning the suspicious players from the game based on our lists. Luckily, before they acted, we had the chance to talk with them about how the most recent report still wasn’t necessarily actionable and that their feedback on this report would be crucial to gaining the requisite actionable intelligence from the next phase of reporting (meeting the <5% false positive rate requirement) and the subsequent software deployment. This was a classic case of not knowing what the customer was going to do with the product we were delivering. I’m not sure they would have admitted their intent beforehand (maybe they didn’t even realize it themselves), and so any line of questioning may have been fruitless anyway, but it was worth trying and worth being vigilant. The only thing worse than delivering something that isn’t entirely effective is delivering something that is effective but that is then misused, to much detriment.

After the next round of feedback, we delivered a report for which we expected a false positive rate of well under 5% (to give ourselves a cushion), and we made sure that the customer understood that the list still wasn’t perfect and that they could expect that a small percentage of the players on that list weren’t bad guys. If they took action against all of those players, they should expect some adverse effects.

After delivery of the final report, we began hooking up their data source to our real-time analytic engine that powers an interactive graphical web application that provides the same information as the reports, plus the ability to interact with and learn far more about players and their behavior. The application allows the customer to see the same curated lists of suspicious players and allows them to click players’ names and get more information about them—in particular, more information about why they’re considered suspicious. In many ways, the application is superior to the reports because of the increased amount of information available, the multiple ways and formats in which the results can be viewed, and the interactive nature of the application that lets users find the answers they need when they want them. Also, the application has informative graphics and a well-designed user experience, which makes interacting with data and results both easier and more intuitive. Supporting a customer deployment of a live application is a considerable amount of work, but the app seems to be far more useful to customers than the reports have been.

Exercises

Continuing with the Filthy Money Forecasting personal finance app scenario first described in chapter 2, and relating to previous chapters’ exercises, try these exercises:

1. Suppose your boss or another management figure has asked for a report summarizing the results of your work with forecasting. What would you include in the report?

2. Suppose the lead product designer for the FMF web app asks you to write a paragraph for the app users, explaining the forecasts generated by your application, specifically regarding reliability and accuracy. What would you write?

Summary

  • The product is, in a sense, the thing that you’ve been working toward for the entire duration of the project; it’s important to get the format and content right.
  • The format and medium of the product should, as much as possible, meet the customer’s needs both now and in the foreseeable future.
  • The content of products should focus on important, conclusive results and not distract the customer with inconclusive results or other trivia.
  • It’s best to spend some time thinking formally about user experience (UX) design in order to make the product as effective as possible.
  • Consider in advance whether the product will need ongoing support and plan accordingly.