© Gavin Lew, Robert M. Schumacher Jr. 2020
G. Lew, R. M. Schumacher Jr., AI and UX, https://doi.org/10.1007/978-1-4842-5775-3_4

4. Garbage In, Garbage Out

Doing a disservice to AI
Gavin Lew (1) and Robert M. Schumacher Jr. (2)
(1) S. Barrington, IL, USA
(2) Wheaton, IL, USA
 

Given the ever-evolving nature of AI, programmers need to continuously improve and refine their algorithms. In Chapter 1, we saw how algorithms are improved and often repurposed for different tasks. For example, the credit card fraud detection system called Falcon that Craig Nies described had its roots in a visual system for detecting military targets: the foundation for pattern recognition that differentiated battlefield equipment from the surrounding landscape was applied to recognizing patterns of fraud in credit card data.

But, again, let’s assume the AI code works; that is, that the algorithms powering modern-day AI systems—whether they are called deep learning, machine learning, or some proprietary name—are capable of doing the job. If this is true, then the focus shifts from the code to the datasets that feed these systems. How much care has been placed into the data that feeds the machine?

We need to take a bit of a step back here and more clearly define the space we’re talking about. AI is a huge field. The focus of our data-centered discussion here points at AI-enabled products that rely on human behaviors, attitudes, opinions, and so on. Data that are actively solicited (e.g., surveys) have different properties (and problems) than data that are passively acquired. A lot of our discussion in this chapter focuses on data that is intentionally acquired from people.

What We Feed the Algorithms Matters

BOB: Consider Formula One racing. No matter how good the engine is, success depends on the entire ecosystem around the vehicle: the mix of the fuel, the skill of the driver, the efficiency of the pit crew, and so on.

GAVIN: An engine using low-grade fuel will underperform. In the case of AI, its fuel is data. While data scientists can massage the data to map to the algorithms for learning, how much care is placed on the dataset? That data might have been purchased from a site where it was “not perfect, but good enough.” Or it may be far removed from the researchers who collected it. What if the data is no longer high grade?

BOB: As UX researchers, we know a lot about data collected from people—it’s messy—the nuances in the questions, missing cells, context of collection, and more. The problem can be especially concerning if the dataset was not commissioned by the team using it. There’s a lot of trust that the dataset is clean.

GAVIN: AI algorithms are initially fed data to learn and to train models; those models are then applied broadly to more data to provide insights.

BOB: The data that is fed into AI is pretty important for success, especially in the training phase when AI is learning.

The point It’s not just how good the algorithms are; it’s also how good the data is: “garbage in, garbage out.” Let’s spend time giving AI the best data we can.

Swimming in data

As researchers, we (the authors) often talk to companies about their data. We ask about what they know, what they don’t know, and what is currently being collected. We look for gaps and opportunities where more or better data could answer strategic questions. A common issue is that they have more data than can be analyzed. So it’s not about collecting more data, but taking the time to think through how to better analyze what they have.

If this is the case, then the product team developing the AI-enabled technology must look critically into what data is used for training and what is used once the algorithm is trained. AI is an ecosystem of many elements—the algorithm is just one piece. It may be at the center and gets much of the attention, but success depends on all the elements being aligned and supportive of the objectives.

Companies swimming in data should think about how their data was obtained—did it come from a vast warehouse of compiled data, or was it gathered for a specific purpose? This is a critical question to help understand that data. Was the data specifically collected for the AI-enabled product that targets the key area of interest? If the data was not commissioned specifically for AI, then one must spend time to understand more about the dataset itself.

Questions to ask when evaluating a dataset are as follows (a small audit sketch follows the list):
  • Where did the dataset come from?

  • What was the method of data collection?

  • If it was survey data, what are the assumptions and conditions under which this data was obtained?

  • Were any of the data imputed (missing cells filled algorithmically)?

  • What other datasets could be joined to add supplemental context?

  • What do subject matter experts know about the data and how could this knowledge be beneficial to learning?
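
Several of these checks can be run programmatically before training ever begins. Below is a minimal sketch in Python, using pandas; the file name, column names, and the imputed_ flag convention are hypothetical and simply illustrate the kind of audit a team might run on an unfamiliar dataset.

import pandas as pd

def audit_dataset(df: pd.DataFrame) -> None:
    """Print a quick hygiene report for an unfamiliar dataset."""
    print(f"Rows: {len(df)}, Columns: {len(df.columns)}")

    # Missing cells per column: were gaps left open or silently filled?
    missing = df.isna().mean().sort_values(ascending=False)
    print("Share of missing values per column:")
    print(missing[missing > 0])

    # Hypothetical convention: imputed cells are marked by companion
    # boolean columns named 'imputed_<field>'.
    for col in [c for c in df.columns if c.startswith("imputed_")]:
        print(f"{col}: {df[col].mean():.1%} of rows were imputed")

    # Exact duplicate rows can hint at merged or resold data.
    print(f"Duplicate rows: {df.duplicated().sum()}")

# Usage sketch:
# df = pd.read_csv("survey_responses.csv")  # hypothetical file
# audit_dataset(df)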

The point

These simple questions can identify areas of improvement for the training dataset that will be used to help AI learn. This is where we give AI a fighting chance at success by potentially giving the data more context.

So, how does AI really “learn”?

Data that capture human behaviors and interactions are given to machine learning (ML) scientists to train AI systems and algorithms. Whether the data comprises a set of liver-disease diagnoses and outcomes, comes from a consumer survey on attitudes toward marijuana usage, or derives from active/passive data collection of spoken phrases, AI systems need training data to ensure their algorithms produce the right outcomes. Custom-built data for AI may not be as common as datasets that were created for other purposes, such as market research, customer segmentation, sales and financial data, health outcomes, and a lot more. Once ML scientists have acquired a dataset, they still need to consider whether it includes what the AI system needs.

An example of how AI learns

At one level, AI can be thought of as a pattern recognition system. In order for an AI system to learn, it needs lots of examples. An AI algorithm needs data to look for patterns, make mistakes, and refine its internal understanding to get better. As a fun example of this, Figure 4-1 illustrates an Internet meme that circulated a few years ago. What’s interesting is that it shows how easy it is for people to detect the signal (the Chihuahua) in a very noisy field. AI algorithms have a very difficult time with this, and such samples are useful for validating pattern recognition systems.
Figure 4-1 Example of the challenge of pattern recognition and data that might be provided for an AI to learn to distinguish a Chihuahua from a blueberry muffin
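
Such a test set can be used to score a trained classifier. Here is a minimal sketch in Python using scikit-learn; the two label lists below are invented purely for illustration and stand in for a validation set's ground truth and a model's predictions on it.

from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical ground-truth labels for ten validation images,
# and the predictions a trained image classifier produced for them.
y_true = ["chihuahua", "muffin", "chihuahua", "muffin", "chihuahua",
          "muffin", "chihuahua", "muffin", "chihuahua", "muffin"]
y_pred = ["chihuahua", "muffin", "muffin", "muffin", "chihuahua",
          "chihuahua", "chihuahua", "muffin", "chihuahua", "muffin"]

# Rows are the true class, columns the predicted class.
print(confusion_matrix(y_true, y_pred, labels=["chihuahua", "muffin"]))
print(f"Accuracy: {accuracy_score(y_true, y_pred):.0%}")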

Different ways machines learn today

In general, there are three kinds of machine learning (ML) techniques for constructing AI systems, as follows:
  • Supervised learning  – in this approach, scientists feed algorithms a dataset comprising data—for example, labels, text, numbers, or images—and then calibrate the algorithm to recognize a certain set of inputs as a particular thing. For instance, imagine feeding an algorithm a set of pictures of dogs, in which each picture contains a set of features that correspond to properties of the picture. Inputs to the algorithm could also include a number of images that are not of dogs—for example, pictures of cats, pigeons, polar bears, pickup trucks, or snow shovels—and the corresponding properties of each of the not-dogs images. Then, based on what the algorithm has learned about classifying images as dog or not dog through the features and properties of images, if you show the algorithm a picture of a dog it’s never seen before, it has the ability to identify that it is, in fact, a picture of a dog. The algorithm is successful when it can accurately recognize an image as a dog and reject images that are not dogs.

  • Unsupervised learning  – this approach attempts to find classes of similar objects in a dataset based on each object’s properties. When scientists give an algorithm a set of inputs that have parameters and values, it tries to find common features and group them. For example, scientists might feed an algorithm thousands of pictures of flowers with various tags such as color, stem length, or preferred soil. The algorithm is successful if it can group all flowers of the same type.

  • Reinforcement learning  – this approach trains an algorithm through a series of positive and negative feedback loops. Behavioral psychologists used this technique of feedback loops to train pigeons in lab studies. This is also how many pet owners train their animals to follow simple commands such as sit or stay and then reward them with a treat or reprimand them with a no. In the context of machine learning, scientists show an algorithm a series of images, and then as the algorithm classifies images—of, say, penguins—they confirm the model when the algorithm properly identifies a penguin and adjust it when the algorithm gets it wrong. When you hear about bots on Twitter that have gone awry, this is typically an example of reinforcement learning where the bots have learned to identify examples incorrectly, but the system thinks they are correct.1

Although all ML techniques are useful and applicable in various contexts, let’s focus on supervised learning.
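As a toy sketch of the supervised flow just described (in Python, using scikit-learn; the hand-coded features, values, and labels are invented for illustration, and a real image classifier would learn from raw pixels rather than a few handwritten properties), the steps look like this:

from sklearn.tree import DecisionTreeClassifier

# Hypothetical hand-labeled properties per training example:
# [weight_kg, has_fur, has_wheels]
X_train = [
    [8.0, 1, 0],     # beagle
    [30.0, 1, 0],    # labrador
    [4.5, 1, 0],     # cat
    [0.4, 0, 0],     # pigeon
    [2000.0, 0, 1],  # pickup truck
]
y_train = ["dog", "dog", "not_dog", "not_dog", "not_dog"]

# "Training" is fitting the model to the labeled examples.
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# A dog the model has never seen before; expected output: ['dog']
print(model.predict([[12.0, 1, 0]]))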

All data are not equal

Obtaining good training data is the Achilles heel of many ML scientists. Where does one get this type of data? Getting data from secondary sources is surprisingly easy. There are many sources2 that provide access to thousands of free datasets. Recently, Google launched a search tool to make publicly available databases for ML applications easier to find. But it is important to note that many of these databases are very esoteric—for example, “Leading Anti-aging Facial Brands in the U.S. Sales 2018.”3 Nonetheless, data is becoming more accessible. While this supports educational endeavors, these databases’ usefulness to businesses building mainstream applications may be low.

These databases have limitations such as the following:
  • They might not have precisely what ML researchers are seeking—for example, videos of elderly people crossing a street compared to children riding bicycles.

  • They might not be tagged appropriately or usefully with the metadata that is necessary for ML use.

  • Other ML researchers might have used them over and over again.

  • They might not represent a rich, robust sample—for example, a database might not be representative of the population.

  • They might lack enough cases/examples.

  • They might not be very clean—for example, they could have lots of missing values.

As many researchers often say, all data are not equal. The inherent assumptions and context that are associated with datasets often get overlooked. If scientists do not give sufficient care to a dataset’s hygiene before plugging it into an ML system, the AI might never learn—or worse, could learn incorrectly, as we described earlier. In cases where the quality of the data may be suspect, it’s difficult to know whether the learning is real or accurate. This is a huge risk.

Knowing what we now know about machine learning and the risks and limitations of datasets, how can we mitigate these risks? The answer involves UX.

Playing Catch-Up with Computing Speed When We Should Be Slowing Down

BOB: Recovering from failure and learning from it are necessary parts of how AI will evolve and succeed. But recovering from failure requires a significant overhaul: not only avoiding what did not “work,” but also reviewing what “worked.”

GAVIN: Yes. And consider how fast the technology is advancing. The faster AI advances, the more, in some ways, we lose the opportunity to think about ethical considerations or even to revisit the foundations.

Consider the evolution of the CPU. Under Moore’s Law, the number of transistors in a CPU doubles every 2 years, but in AI’s case, computing power took advantage of the massively parallel processing of the GPU (graphics processing unit). These are the graphics chips associated with making video games smoother and the incredible action movies we see today. The massively parallel processing required to render video games made AI training much, much faster. AI systems often took months to learn from a dataset; when graphics chips were applied to AI applications, training intervals dropped to single days, not weeks. There is barely enough time to stop and think about the results.

BOB: If AI applications are to learn, they learn from consuming data. When one thinks of data, it is easy to assume that the AI application would take all data into consideration. But, practically speaking, consuming data still takes time. If the training sets are consumed faster, have we evolved our thinking on the data itself or just the processing power?

GAVIN: This is the trap. With all the emphasis on hardware and algorithmic advances, my fear is that this only distracts from getting the foundation right first.

BOB: Simply put, “garbage in, garbage out.” We are doing a disservice to AI if we don’t think about what we are feeding it.

The point Advances in AI will come, but are we taking time to understand the data that we feed into the machine?

Getting custom data for machine learning

While not all datasets relate to human behavior, the majority of them do. Therefore, understanding the behaviors that the data capture is essential. Over the last decade, our UX agency has been engaged by many companies to collect data for custom datasets. This meant we had to collect the precise examples and attribute tags necessary to train or validate their AI algorithms. (In some cases, thousands of data points were needed, each a sample of something different.) Here are some examples of these samples:
  • Video samples of people doing indoor and outdoor activities

  • Voice and text samples of doctors and nurses making clinical requests

  • Video samples of people stealing packages from porches

  • Video samples capturing the presence or absence of people in a room

  • Thumbprint samples from specific ethnic groups

  • Video and audio samples of people knocking on doors

Note that none of this data was available publicly. We had to build each of the datasets through custom research based on the specific intentions and research objectives of our clients.

Custom Data for AI Is a Big Deal

BOB: When we received a request from a client to collect thousands of samples for a custom dataset, we raised an eyebrow at how to approach this from a practical perspective. Thousands of people—face-to-face data collection! We were to capture in situ behaviors.

GAVIN: After reviewing the specifications, the amount of precision required was immense. Because participant demographics are always important to ensure samples are representative of the target population, we would need many participants. For instance, a facial recognition AI on a smartphone or computer needs to learn to recognize the same participant in different situations. They might have changed their appearance. So, we would collect data with and without beards. They could wear different clothing, different makeup, different hairstyles, and so on. We would systematically ask participants to change their look to add additional samples to the data. This would allow the AI to learn about people, but also train it to recognize that the same person can look different. We were asked to capture participants in a variety of contexts. The amount of care placed on the ask showed how well thought through it was.

BOB: It extended to different continents too. At some point, we argued that there were more cost-effective ways to collect this massive amount of data than through a UX firm like ours, as we tended to collect data on a much smaller scale. The response was that they understood, but that most large-scale data collection lacked the experimental rigor to capture what was needed for their AI application. They wanted the precision typically used in small-sample research studies, replicated at a scale two orders of magnitude larger.

GAVIN: When we write about using datasets that were created for other projects, or datasets that are purchased and used for AI, the difference between that and commissioning a custom dataset for a specific AI application is striking. It is one thing to “dust off” an old dataset and another thing entirely to specify what your AI would need to consume to train properly.

The point While an existing dataset might describe people and behavior, a custom dataset can be tuned to the elements that make AI smarter and better. Care for the details in the data makes AI better.

Given the sheer magnitude of data collection necessary for effective ML applications, it seems obvious that these datasets should be custom. But how much time, effort, and money are actually spent on clean datasets relative to the programming?

For many scientists and researchers, the easy way is to use data that already exists. But our clients who commissioned these projects understood a key shortcoming of these methods: low data integrity. The project sponsors recognized that the underlying data had to be clean and representative of the domain they were trying to model—carefully considering the nuances of captured experiences. So, we needed to collect the behaviors in context and had to observe them—not simply ask for a number on a five-point scale—as is often the case in quantitative data collection. Apart from the obvious problems in survey research, we, as psychologists, understand that people often cannot report on their own behaviors reliably. That is, we can’t often just ask people to tell us what they did; we must observe and record. Capturing behavior is the prerogative of UX and requires research rigor and formal protocols. What we learned is that UX is uniquely positioned to collect and code these data elements through our tested research methodologies and expertise in understanding and codifying human behavior.

Data hygiene

While this section may seem somewhat redundant with some of the material covered in Chapter 3, the dataset can be fraught with concerns, such as the following:
  • Missing cells that are filled with imputed data, where the cell tags (i.e., notations that a value was imputed) are not passed on to the AI team

  • Data that are purposefully and systematically left uncollected from each respondent, so that multiple participants are combined to complete a full survey (i.e., split questionnaire survey design)

  • Surveys that are completed by bots, acting as humans4

Data scientists take data hygiene very seriously. What concerns us here is that we have heard from those immersed in AI development that sometimes the preceding issues are glossed over. Let’s not assume the data is free from elements that can skew the learning.
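
One way a team can keep imputation knowledge from being washed out is to carry explicit flag columns alongside the filled values. Here is a minimal sketch in Python with pandas; the column names and the median-fill strategy are hypothetical choices for illustration.

import pandas as pd

# A tiny hypothetical survey extract with missing cells.
df = pd.DataFrame({
    "age": [34, None, 52, None],
    "income": [48000, 61000, None, 39000],
})

# Record which cells were originally missing *before* filling them,
# so downstream AI teams can weight, exclude, or at least see imputed values.
for col in ["age", "income"]:
    df[f"imputed_{col}"] = df[col].isna()
    df[col] = df[col].fillna(df[col].median())

print(df)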

The point

Let’s not assume the data—even if no cells are missing—is free from elements that can bias outcomes and create unintended consequences.

Doing a Disservice to AI

BOB: The question for anyone who is working on AI applications is how much care is spent on the data itself?

GAVIN: This is a challenge because so many hands touch the data. When it is passed from survey designers to programmers to respondents to data scientists and then to AI technologists, who knows exactly what was done to the dataset that might leave an artifact influencing the AI application?

BOB: We know that some data scientists tag fields that have been imputed, but by the time the data is washed and formatted for training the AI application, has this knowledge been stripped away?

GAVIN: It is a disservice to AI to have it trained on datasets where there may be underlying flaws in the data.

The point The dataset deserves a lot of scrutiny—ask questions about the methodology, respondents, questions, design, and so on. This is what the AI application will use to learn, and all team members can play a role in giving AI better data.

Black box implications

As described in Chapter 3, one challenge with AI is that it does not reveal the meaning or rationale behind what it finds. It is a classic “black box” where data goes in and an answer comes out, but there is no description of why or how the answer came to be.

As mentioned above, potentially compounding the problem is the concern that data we think was obtained from humans might actually come from bots. Or, in our efforts to make a dataset complete, we use imputed data, where an equation or algorithm was used to fill in data elements. The concern is that any outcome obtained might simply be the result of the AI system reverse engineering the imputation algorithms used. Because AI is a black box, we are not able to inspect the “why” behind AI results. This takes away our ability to walk backward through the AI application’s conclusions to find the underlying rationale. This can be problematic, especially considering how fast the business world acts on AI findings.

Ethics Is Best Early, Not Late, in the Discussion

BOB: When I started my career, there were companies that were considered innovators, and there were those that adopted a “wait and see” attitude, preferring to “fast follow” innovations.

GAVIN: Today, these corporate philosophies still exist, but it seems that the brand value of being innovative is much stronger, and this is driving companies to innovate faster and faster. Consider the practice of producing the minimum viable product (MVP), where startups and monolithic companies alike are launching products with a bare minimum feature set hoping to capture the attention of the marketplace and learn quickly from customers.

BOB: One challenge of the MVP approach is what happens if the product has been pared down so much that it isn’t very compelling in its MVP state. This is not only a UX and value proposition issue; what concerns me more about AI is how quickly companies are moving to be first to market. Let’s say you are creating an AI-enabled application. You rush to get data from sources that make sense. The dataset is cleaned and used for training. After the AI “trains” and presumably “learns,” it identifies an interesting finding. What does a company do next?

GAVIN: A company that believes they are “innovative” will run to build a business case, get funding, and build a product where AI is at the core. But what if the dataset is dodgy or biased due to poor sampling?

BOB: You are talking about ethics in data. This is an area where AI has not developed fully. Companies are building AI not for foundational science, but for commercial advantage. The same sorts of issues that arise with bias in the culture also exist in the data. So the fear is that AI applications may have subtle—or even not-so-subtle—biases because the underlying data contain biases.

The point Organizations are moving fast to build applications, but social and ethical considerations inherent in the data  need to be addressed, developed, and adopted.

Next, let’s take a deeper look at ethics and AI through the lens of privacy and bias.

Ethics and AI

The ethics of AI is a relatively new area of discussion. Only recently has AI become mainstream enough that ethical considerations are beginning to take shape. There are no formal ethical standards or guidelines for AI. It is very much the proverbial “Wild West” where technology is being created without guardrails.5

The concern is that the “grist for the AI mill” (the data) could hide ethical problems. What data was used? Was the data universal? Did it have too much focus on one region or socioeconomic level? If the training data contain bias, will the AI have an opportunity to revisit the underlying training, or will it always carry that bias?

Let’s look at two important points concerning ethics and AI: privacy and bias.

Privacy

The data science revolution is the centerpiece of major tech companies. As outlined by Facebook investor Roger McNamee, web startups like PayPal, Facebook, and Google have made massive inroads through a big-data first approach—using data to build more functional and successful products, then selling that data.6 Despite being well connected in the tech industry, McNamee sounded the alarm about tech companies’ big-data focus. He invoked the idea that user privacy is actively being compromised by major tech firms in a way that outweighs any benefits of their services. While privacy concerns haven’t stopped programs like Gmail and Facebook from becoming behemoths, they are an ever-present part of the discussion around issues of big tech, and AI may only exacerbate these fears. In 2010, then-Google CEO Eric Schmidt described Google’s capabilities in terms sure to scare any user concerned about their privacy:

We don’t need you to type at all. We know where you are. We know where you’ve been. We can more or less know what you’re thinking about.7

This quote from a decade ago described how algorithms guided by fallible human beings could extract untold insights from the data we all share online. When Eric Schmidt was asked during an interview for CNBC’s “Inside the Mind of Google” special about whether users should be sharing information with Google as if it were a “trusted friend,” Schmidt responded:

If you have something that you don’t want anyone to know, maybe you shouldn’t be doing it in the first place.8

When you consider what is in a dataset and how it is derived from human behavior, this is a clear example of how behavior can be captured and then used to analyze and predict future behaviors. What Schmidt was not explicitly describing is how much information Google really has. It is certainly more than search terms: geonavigation data, actual consumer purchases, and email correspondence, at the very least. And the rub of it all is that we give our consent to have this data collected by clicking through and accepting the policies. We’re all giving up our privacy for the putative benefits that the technology offers us.

Privacy can be divided into three different types:
  • Big Brother privacy (keeping personal information from government or business entities)

  • Public privacy (keeping personal information from coworkers or community)

  • Household privacy (keeping personal information from family or roommates)

Each of these three types of privacy has different impacts on UX.

For a long time, Big Brother privacy intrusions have mostly been tolerated by users. After all, we’ve all had the experience of clicking through the Terms and Conditions for a new account or app without reading them. But, with the era of big data fully upon us, the issue seems to be gaining political salience. This is best exemplified by the European Union’s GDPR privacy law, one of the most prominent attempts to regulate big data. The GDPR is “based on the concept of privacy as a fundamental human right.”9 Privacy and policy research director Michelle Goddard views the GDPR’s regulations on data collection as an opportunity for data scientists, not a setback. She says the GDPR’s focus on ensuring privacy through “transparency” and “accountability” aligns with privacy practices necessary for ethical research, including anonymizing personal data.10 AI, similarly, can focus on transparency to dispel user concerns about Big Brother privacy.
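
One concrete practice in that spirit is pseudonymizing direct identifiers before data ever reaches an analysis or training pipeline. A minimal sketch in Python follows; the salt value and email address are placeholders, and true anonymization under the GDPR requires far more than hashing a single field.

import hashlib

def pseudonymize(identifier: str, salt: str = "rotate-this-salt") -> str:
    """Replace a direct identifier with a salted, one-way hash token."""
    digest = hashlib.sha256((salt + identifier).encode("utf-8")).hexdigest()
    return digest[:16]  # shortened token; keep the salt secret and rotate it

print(pseudonymize("jane.doe@example.com"))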

Public privacy is probably the least likely of these three forms to be violated given the current political and mainstream concerns focused on big businesses like Google and Facebook, so let’s look at household privacy.

Household privacy is most salient with programs or devices that are meant to stay at home or to be used by one user in particular, such as standalone virtual assistants. If a user buys a virtual assistant device for their household, it can lead to violations of household privacy. For example, the user’s roommate might be able to read and respond to their texts, or their spouse might stumble upon an update on the delivery status of their secret anniversary gift. The desktop computers of a bygone era were a classic case of potential household privacy violations, which were resolved by the feature of individual user profiles. A similar solution might help virtual assistants—but the technology for a convenient profile solution on virtual assistants is still evolving.

Mattel created a virtual assistant that offers a glimpse at a profile system. Aristotle was Mattel’s virtual assistant, based on Amazon Alexa, which was intended to primarily serve children. The company planned to make Aristotle capable of understanding a child’s voice and of differentiating it from adult voices. Then, the device could offer limited capabilities to child users, while also offering adults the ability to use Alexa to do more complex tasks like ordering childcare supplies.11 However, Aristotle was canceled in 2017, after consumer advocates, politicians, and pediatricians objected. Big Brother privacy concerns were one major reason for objections to Aristotle, along with concerns about child development.12

While Aristotle may not have come to fruition, an AI system like it that can differentiate users’ voices from one another and associate each with an individual’s profile is a solution to the problem of household privacy in virtual assistants. There are other possible solutions, of course—perhaps a future assistant could determine who it is talking to by discovering whose smartphone is in the room. In 2017, Google Home added a feature that could distinguish among up to six different household members,13 and Amazon’s Alexa followed suit in 2019 with “Voice Profiles.”14

Users’ expectations of privacy online can be slippery, as Microsoft principal researcher Danah Boyd has pointed out. Boyd has written that users’ expectations of privacy online are most obviously violated when the context is stripped away from their actions and those actions are released to a wider public than the user had intended. This leads the user to feel a loss of “control over how information flows,”15 which results in user mistrust of the technology that removed the context.

For an example of how to build trust, let’s turn back to Spotify. The company cites data claiming that it is more trusted than its competitors, including among millennials. They cite “discovery” features like Discover Weekly and the partially neural-network-powered recommendation engine as a primary reason why.16 In an article directed at advertisers, Spotify claims that users are willing to give a company personal information so long as it results in a useful feature. Spotify’s recommendations are that useful feature.

The Spotify recommendation engine is built only on data from Spotify itself, and it even allows users to enter a private mode in which Spotify won’t count their streams. That means users can simply take their guilty pleasures elsewhere (you might want to stream that Nickelback album in private mode or on YouTube) and make sure they don’t affect their recommendations. This helps users trust that Spotify’s data collection serves a purpose for them.

AI Does Not Know Where the “Line” Is, So We Need to Draw It

GAVIN: This is a very difficult problem for businesses to solve because companies have an obligation to their shareholders first; by that logic, AI-enabled products should be made with all available data in order to produce a compelling product.

BOB: But, if users rebel, that will hurt the shareholders. Companies must still balance privacy to not negatively impact their brand.

GAVIN: This reminds me of another quote by Eric Schmidt. When he was asked about whether Google would implant technology into the brain to get information, Schmidt said, “There is what I call the creepy line. The Google policy on a lot of things is to get right up to that creepy line and not cross it.”17

BOB: And let’s hope we can trust businesses to know where that line is.

The point The push for AI to respect privacy must come from those who develop and market AI.

Privacy implications center on whether the data should be used in AI at all. Next, let’s explore the bias that creeps into our datasets—even with the best of intentions.

Bias in datasets

Ethical considerations in artificial intelligence date back to 1960, when Arthur Samuel wrote in Science about the moral consequences of a machine simply drawing conclusions that are the logical consequences of the input it is given.18 Today, much of the focus on AI ethics is on the “what” (principles and codes) rather than on the “how” (practical application to AI). Ethics and AI have a long way to go.

Awareness of the potential issues [of AI] is increasing at a fast rate, but the AI community’s ability to take action to mitigate the associated risks is still at its infancy.

—Morley, Floridi, Kinsey, and Elhalal (2019)19

How Does AI Know What Is Important?

BOB: Let’s take a medical example where AI takes in data and learns. One could argue that the very best data is from peer-reviewed journal articles. Studies described in these articles can be replicable (in theory), and medical science and careers advance through peer-reviewed publications.

GAVIN: But let’s also consider generations of medical research from the 1960s and earlier where mostly men were participants. We have learned through the years that women have differing symptoms from men for the same disease. For example, women often delay seeking medical attention for a heart attack because they feel abdominal pain and not chest pain.20

BOB: This opens the question of whether the dataset used for AI applications adequately weighs evidence. The process of publications is to build on what is known. When a groundbreaking study is done, while it may be published in a top-tier journal, it takes time for more articles to be published to both replicate and further the science. And how does AI take groundbreaking results into its learning when a preponderance of articles exists on the older treatment?

GAVIN: Yep. When corrections to the science are made post AI learning, does the AI application get updated?

The point How does the AI application “keep up with the literature” or simply stay current when new data come to light?

As an example, in 2018, the FDA fast-tracked and approved a new “tissue agnostic” cancer drug for a specific genetic mutation. Oncologists said that this new therapy would change the game, but how many studies need to be published until AI applications adopt it as the therapy of choice?

Researchers at the Memorial Sloan Kettering Cancer Center (MSKCC) who teamed up with IBM Watson sought to address this question by creating “synthetic cases” that were put into training datasets so IBM Watson could learn from their data.21

Bias from What Some Believe to Be True

GAVIN: Essentially, MSKCC and IBM Watson added new cases to their dataset. They created records from their cases and placed them into research datasets containing other cases.

BOB: Presumably, this would make IBM Watson become smarter because it would have the benefit of MSKCC’s knowledge. This is often referred to as the “Sloan Kettering way” of treating patients.

GAVIN: So, these “synthetic cases” were given to IBM Watson so it would learn. Doesn’t this raise questions about whether these are common or unique cases, or even whether MSKCC tends to receive a certain type of patient?

BOB: And because this was the “training set” where the AI modeled and learned, the bias can permeate future findings.

The point Techniques to add “synthetic cases” to improve datasets may also add bias as well.
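
To see how quickly injected cases can tilt a training set, consider a back-of-the-envelope sketch in Python (every label and number here is invented purely for illustration) that compares a label's share of the data before and after synthetic records are added:

from collections import Counter

# Hypothetical label counts in an original multi-institution research dataset.
original = Counter({"standard_therapy": 800, "aggressive_therapy": 200})

# Hypothetical synthetic cases reflecting one institution's practice style.
synthetic = Counter({"aggressive_therapy": 400})

combined = original + synthetic
for label in combined:
    before = original[label] / sum(original.values())
    after = combined[label] / sum(combined.values())
    print(f"{label}: {before:.0%} of training data before, {after:.0%} after")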

We assume that peer-reviewed studies account for certain factors, such as representativeness, and control for bias, or at minimum state these as assumptions or qualifications to the findings. When “synthetic cases” are created, one must ask questions such as these:
  • Are these synthetic patient cases representative for the domain?

  • Are they typical cases? Or are they edge cases?

  • Did these patients transfer to the institution because they needed last-resort treatment options?

  • Is there potential for social, economic, racial, or gender bias in selection for these cases?

While this list merely nips around the edges of the potential for bias when an institution creates artificial or synthetic data to train AI, the need to apply ethical standards in AI becomes clear.

Let us be clear: MSKCC is one of the premier cancer treatment centers in the world, but as Pilar Ossorio, a professor of law and bioethics at the University of Wisconsin Law School, argues, “[AI] will learn race, gender, and class bias, basically baking those social stratifications in and making the biases even less apparent and even less easy for people to recognize.” Considering that patients who are attracted to MSKCC tend to be more affluent, have a different mix of types of cancer, and have often failed multiple lines of treatment and are looking for one last chance,22 these biases are woven into the very fabric of Watson’s AI.

When the Watson team was pressed on concerns over the use of “synthetic cases” to train IBM Watson, the response was striking. Deborah DiSanzo, general manager of IBM Watson Health, replied, “The bias is taken out by the sheer amount of data we have.”23

Considering that AI is a black box and we cannot truly know what data elements Watson’s AI algorithm used or did not use, the claim that sheer volume of data overcomes potential bias is speculation at best.

This is the issue with bias: it is often hard to see or to incorporate into one’s thinking. As an example, Dr. Andrew Seidman, the MSKCC lead trainer for IBM Watson, responded to concerns over bias in the MSKCC “synthetic cases” by proclaiming, “We are not at all hesitant about inserting our bias, because I think our bias is based on the next best thing to prospective randomized trials, which is having a vast amount of experience. So it’s a very unapologetic bias.” This is why an ethical standard is needed and should be applied. It can be difficult for some to be objective.

Training Data Sets the Foundation for AI Thinking

GAVIN: The underlying concern is how pervasive bias can be when AI learns from a dataset with a questionable foundation. AI only learns what you feed into its training dataset. There is a lot more to successful AI than simply its programming.

BOB: Whether you bought the dataset and need to manage what is inside it or you took the time to curate your own dataset, the data is critical to the process. Responsibility is on the product and data science teams to ensure good data hygiene.

GAVIN: Assume a result forms the basis for a product—one with AI at the core. How many corporations would retrain on a new dataset following the product launch?

BOB: There is a lot of risk in a complete retrain. What if the AI engine doesn’t produce the same results with new training data? If it’s bad enough, it could sink the product. There are a lot of companies or product teams that might not take that risk.

The point Ethical standards are relevant today because AI is learning from datasets now. These datasets need to consider inherent bias or risk weaving that very bias into the foundation that is used to power the AI engine.

Toward an ethical standard

Organizations are concerned with the lack of an ethical standard for AI. In 2018, the MIT Media Lab at the Massachusetts Institute of Technology joined forces with the Institute of Electrical and Electronics Engineers (IEEE), a New Jersey-based global professional organization dedicated to advancing technology for humanity, and the IEEE Standards Association to form the global Council on Extended Intelligence (CXI). CXI’s mission is to promote the responsible design and deployment of autonomous and intelligent technologies.

The IEEE welcomes engagement from those who wish to be part of the standards initiatives. The IEEE’s Global Initiative’s mission is, “To ensure every stakeholder involved in the design and development of autonomous and intelligent systems is educated, trained, and empowered to prioritize ethical considerations so that these technologies are advanced for the benefit of humanity.”

This organization drafted a downloadable report entitled Ethically Aligned Design: A Vision for Prioritizing Human Well-being with Autonomous and Intelligent Systems, First Edition (EAD1e).24 This report sets the foundation for an ethical standard for autonomous and intelligent systems. The IEEE P7000™ Standards Working Group standards projects are listed as follows:
  • IEEE P7000 – Model Process for Addressing Ethical Concerns During System Design

  • IEEE P7001 – Transparency of Autonomous Systems

  • IEEE P7002 – Data Privacy Process

  • IEEE P7003 – Algorithmic Bias Considerations

  • IEEE P7004 – Standard on Child and Student Data Governance

  • IEEE P7005 – Standard on Employer Data Governance

  • IEEE P7006 – Standard on Personal Data AI Agent Working Group

  • IEEE P7007 – Ontological Standard for Ethically driven Robotics and Automation Systems

  • IEEE P7008 – Standard for Ethically Driven Nudging for Robotic, Intelligent and Autonomous Systems

  • IEEE P7009 – Standard for Fail-Safe Design of Autonomous and Semi-Autonomous Systems

  • IEEE P7010 – Wellbeing Metrics Standard for Ethical Artificial Intelligence and Autonomous Systems

  • IEEE P7011 – Standard for the Process of Identifying and Rating the Trustworthiness of News Sources

  • IEEE P7012 – Standard for Machine Readable Personal Privacy Terms

The point

There is an effort underway to develop ethical standards for AI.

Conclusion: Where to next?

So we have covered a couple of the concerns about the inputs to AI-enabled products. But we think there’s another place where there is opportunity, one that could keep AI applications from getting a bad rap: the user experience. One underlying theme that we touch on here and there is that there is such a giddiness, an infatuation at times, with the technology that we forget that at the beginning and at the end there is a user, a person. And, because we believe an AI application is just another application, we need to ensure that the AI application is tuned not only to the data but also to the user’s needs. The final chapter describes the elements we feel will promote user engagement and ultimately increase the likelihood of marketplace success.
