© Tobias Baer 2019
Tobias Baer, Understand, Manage, and Prevent Algorithmic Bias, https://doi.org/10.1007/978-1-4842-4885-0_1

1. Introduction

Tobias Baer, Kaufbeuren, Germany
What is a bias? A widely cited source1 defines it as follows:

Inclination or prejudice for or against one person or group, especially in a way considered to be unfair.

Biases are double-edged swords. As you will see in the next chapter, biases typically are not a character flaw or rare aberration but rather the necessary cost of enabling the human mind to make thousands of decisions every day in a seemingly effortless, ultra-fast manner. Have you ever marveled at how you were able to dodge a fast-moving object, such as a car about to crash into you, in a split second? Neuroscientists and psychologists have started to unravel the mysteries of the mind and have found that the brain can achieve this speed only by taking numerous shortcuts.

A shortcut means that the mind will jump to a conclusion (e.g., deem a dish inedible or a stranger dangerous) without giving all facts due consideration. In other words, the mind uses prejudice in order to gain speed.

The use of prejudice in decision-making therefore is unfair insofar as it (willfully) disregards certain facts that may advocate a different decision. For example, if your partner once ate a bouillabaisse fish soup and became terribly sick afterwards, he or she is bound to never eat bouillabaisse again, and may refuse to even try the beautiful bouillabaisse you just cooked, blissfully ignoring the fact that you graduated with distinction from cooking school and bought the best and freshest ingredients available in the country.

Algorithms are mathematical equations or other logical rules to solve a specific problem—for example, to decide on a binary question (yes/no) or to estimate an unknown number. Just like the brain making decisions in split-seconds, algorithms promise to give an answer instantaneously (in most cases, the score value of the algorithm’s equation can be calculated in a fraction of a second), and they are also a shortcut because they consider only a limited number of factors in a predetermined fashion.
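To make this definition concrete, here is a toy decision algorithm in Python. The weights and the threshold are purely illustrative inventions for this sketch, not taken from any real scoring model:

```python
# A toy "algorithm" in the sense above: a fixed equation plus a decision rule.
# The weights (0.3, -0.7) and the threshold (10) are purely illustrative.
def approve_loan(income, debt):
    score = 0.3 * income - 0.7 * debt  # considers only two predetermined factors
    return score > 10                  # binary yes/no answer

print(approve_loan(100, 20))  # True: score = 16 exceeds the threshold
```

Note the shortcut at work: the rule looks at exactly two factors in a predetermined way, no matter what else might be known about the applicant.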

On one level, algorithms are a way for machines to emulate or replace human decision-makers. For example, a bank that needs to approve thousands of loan applications every month may turn to an algorithm applied by a computer instead of human credit officers to underwrite these loans; this often is motivated by an algorithm being both faster and cheaper than a human being.

On another level, however, algorithms also can be a way to reduce or even eliminate bias. Statisticians have developed techniques to develop algorithms specifically under the constraint of being unbiased—for example, the ordinary least squares (OLS) regression is a statistical technique defined as BLUE, the best linear unbiased estimator. Sadly, I had to write that algorithms “can” reduce or eliminate bias—algorithms can also be just as biased as human decision-making, or even worse. Several chapters of this book are dedicated to explaining the many ways an algorithm can be biased.
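A small simulation can make the “unbiased” claim concrete. The sketch below (using NumPy; all numbers are invented for illustration) fits an OLS line to many noisy samples and shows that the estimated slope is correct on average, even though each individual fit is somewhat off:

```python
import numpy as np

rng = np.random.default_rng(0)
true_slope, true_intercept = 2.0, 1.0
slopes = []
for _ in range(2000):
    x = rng.uniform(0, 10, size=50)
    noise = rng.normal(0, 1, size=50)   # zero-mean noise, a Gauss-Markov assumption
    y = true_intercept + true_slope * x + noise
    slope, intercept = np.polyfit(x, y, deg=1)  # OLS fit of a straight line
    slopes.append(slope)

# Unbiasedness: individual estimates scatter, but their average hits the truth
print(round(float(np.mean(slopes)), 2))  # close to 2.0
```

The point is not that any single estimate is right, but that the errors have no systematic direction—the defining property of an unbiased estimator.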

In the context of algorithms, however, the definition of bias should be more specific. Problems solved by algorithms have at least theoretically a correct answer. For example, if I estimate the number of hairs on the head of a well-known president, nobody may ever have counted them, but anyone with unlimited time and access to the president could verify my estimate of 107,817 hairs.

In most situations (including presidential hair), the correct answer cannot be known, at least not a priori (i.e., at the time the algorithm is applied). Algorithms therefore often are a way to make predictions. Through predictions, algorithms help to reduce and manage uncertainty. For example, if I apply for a loan, the bank doesn’t know (yet) whether I will pay back the loan, but if an algorithm tells the bank that the probability of me defaulting on the loan is 5%, the bank can decide whether it will make any profit on me if it gives me the loan at a 5.99% interest rate by comparing the expected loss with the interest charged and other costs incurred by the bank. This illustrates a typical way algorithms are used: algorithms estimate probabilities of specific events (e.g., a customer defaulting on a loan, a car being damaged in an accident, or a person dying by the end of the term of a life insurance contract), and these probabilities allow a business underwriting risks to make an approve/reject decision based on an objective expected risk-adjusted return criterion.
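The bank’s comparison boils down to a few lines of arithmetic. In the sketch below, the 5% default probability and 5.99% interest rate come from the example above, but the loss given default and cost rate are illustrative assumptions of mine:

```python
def expected_profit(amount, rate, pd, lgd, cost_rate):
    """Expected risk-adjusted profit of a loan (all parameters illustrative)."""
    interest_income = amount * rate * (1 - pd)  # only non-defaulters pay interest
    expected_loss = amount * pd * lgd           # lgd = loss given default, e.g. 50%
    costs = amount * cost_rate                  # funding and operating costs
    return interest_income - expected_loss - costs

# 5% default probability at a 5.99% rate, as in the text; 50% LGD, 2% costs assumed
print(round(expected_profit(10_000, 0.0599, 0.05, 0.5, 0.02), 2))  # 119.05
```

With these assumptions the loan clears the hurdle; double the default probability and the expected profit turns negative, flipping the decision to reject.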

Algorithms are deployed in situations with imperfect information (e.g., the bank’s credit rating algorithm doesn’t know about the gambling debt I incurred last night, nor does it know if my company will fire me next month). Algorithms therefore will make mistakes; however, they are supposed to be correct on average. A bias is present if the average of all predictions systematically deviates from the correct answer. For example, if the bank’s algorithm assigns a 5% probability of default to 10,000 different customers, one would expect that 500 of the 10,000 will default (500/10,000 = 5%). If you investigate the situation and find that in reality 10% of customers default but every time an applicant has a German passport, the algorithm cuts the true estimate by half, the algorithm is biased—in this case, in favor of Germans. (Is it a coincidence that this algorithm was created by a German guy?)
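The check described above can be simulated in a few lines. The sketch below (all numbers hypothetical, mirroring the example) builds a population with a true 10% default rate and an algorithm that halves its estimate for German passport holders, then compares the average prediction with the observed default rate per group:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000
german = rng.random(n) < 0.5          # half the applicants hold a German passport
true_pd = np.full(n, 0.10)            # everyone's true default probability is 10%
predicted = np.where(german, true_pd / 2, true_pd)  # biased: halved for Germans
defaults = rng.random(n) < true_pd    # simulate actual default outcomes

# Calibration check: average prediction vs. observed default rate, per group
for mask, name in [(german, "German"), (~german, "other")]:
    print(name, round(float(predicted[mask].mean()), 3),
          round(float(defaults[mask].mean()), 3))
```

For the German group the average prediction (5%) systematically falls short of the observed default rate (about 10%)—exactly the kind of systematic deviation that defines a bias.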

Systematic errors in predictions—whether made by humans or by algorithms—can have serious implications for businesses, and sadly they happen all the time. For example, one study of mega infrastructure projects—analyzing 258 projects in 20 different countries—found cost overruns in almost 9 out of 10 of them, indicative of a systematic underestimation of true cost.2 During the global financial crisis, banks such as Northern Rock, Lehman Brothers, and Washington Mutual went under because they had systematically underestimated credit, market, and liquidity risks.

Sometimes human bias is to blame. For example, one US bank had an economic capital model (a sophisticated model quantifying those “unexpected losses” of a given portfolio that can cause a bank run or bankruptcy) that prior to the global financial crisis hinted at the out-sized risks looming in home equity loans by estimating unexpected losses many times larger than expected losses; tragically, management dismissed those estimates because they were used to seeing unexpected losses much closer to expected losses and therefore deemed the model to be faulty.

At other times, however, algorithms themselves are flawed. For example, an Asian bank bought a scoring model for consumer credit cards that looked at the card’s utilization ratio as one of the predictors of default. The algorithm believed that customers with a low utilization (e.g., using just 10% of the credit limit) were safer than customers with a high utilization; for safe customers, the algorithm increased the limit. However, this created a circular reference: the moment the algorithm increased the credit limit, the utilization (calculated by dividing the current outstanding balance by the credit limit) dropped, causing the algorithm to further increase the limit. So if the outstanding balance was 10 and the limit was 100, utilization was 10%; if the system increased the limit by 25% from 100 to 125, utilization dropped to 8% (= 10/125), triggering another increase of the limit, and so on. This continued until credit limits reached stratospheric levels that were totally beyond the customers’ means to repay the bank. When more and more customers started to actually use their very large credit limits, unsurprisingly many defaulted, and the bank almost went bankrupt after having written off more than a billion USD in bad debt.
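The feedback loop can be reproduced in a few lines. The starting balance, limit, and 25% increase step are taken from the example; the 10%-utilization threshold for “safe” is my assumption:

```python
# Circular reference: low utilization -> higher limit -> even lower utilization.
balance, limit = 10.0, 100.0
for _ in range(10):
    utilization = balance / limit
    if utilization <= 0.10:   # deemed "safe" -> limit raised by 25%
        limit *= 1.25

print(round(limit, 2))  # 931.32: after just 10 cycles the limit has grown ~9x
```

Because the customer’s balance never changes, every limit increase makes the customer look safer, and the loop never terminates on its own—the textbook shape of a runaway feedback loop.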

Algorithmic bias comes in all kinds of shapes and colors. In 2016, ProPublica published a research report showing that COMPAS, an algorithm used by US authorities to estimate the probability that a criminal will re-offend, is racially biased against blacks.3 MIT reported on natural language processing algorithms being sexist by associating homemakers with women and programmers with men.4 And research conducted in 2014 showed that setting the user’s profile to female in Google’s Ad Settings can lead to fewer ads for high-paying jobs being shown.5 As more and more decisions are made by algorithms—affecting consumers, companies, employees, governments, the environment, even pets and inanimate objects—the dangers and impact of algorithmic bias are growing day by day. However, this is not inevitable—bias is merely a side-effect of an algorithm’s working and therefore a by-product of conscious and unconscious choices made by the creators and users of algorithms. These choices can be revisited and changed in order to reduce or even eliminate algorithmic bias.

This book is about algorithmic bias. First of all, we want to understand better what it is—where it comes from and how it can wreak havoc with important decisions. Second, we want to control its damage by exploring how you can manage algorithmic bias—be it as a user or as a regulator. And third, we want to explore ways for data scientists to prevent algorithmic bias.

The first part, Chapters 2-5, introduces the topic. I will start with a quick review of psychology and human decision biases, as algorithmic biases mirror them in more ways than meet the eye (Chapter 2), and discuss how algorithms can help to remove such biases from decisions (Chapter 3). Keeping in mind that many readers of this book are laymen and not data scientists, I’ll then review how the sausage is made—i.e., how algorithms are developed (Chapter 4)—and demystify what is behind machine learning (Chapter 5).

The second part of the book, Chapters 6-11, explores where algorithmic biases come from. Chapter 6 examines how real-world biases can be mirrored by algorithms (rather than rectified). Chapter 7 turns to the persona of the data scientist and how the data scientist’s own (human) biases can cause algorithmic biases. Chapter 8 dives deeper into the role of data, and Chapter 9 reviews how the very nature of algorithms introduces so-called stability biases. Chapter 10 looks at new biases arising from statistical artifacts, and Chapter 11 deep-dives into social media where human behavior and algorithmic bias can reinforce each other in a particularly diabolical manner.

The third part of the book, Chapters 12-17, approaches algorithmic bias from a user’s perspective. It sets out with a brief discussion of whether or not to actually use an algorithm (Chapter 12) and how to assess the severity of the risk of algorithmic bias for a particular decision problem (Chapter 13). Chapter 14 gives an overview of techniques to protect yourself from algorithmic bias. Chapter 15 more specifically describes techniques for diagnosing algorithmic bias, and Chapter 16 discusses managerial strategies for overcoming a bias ingrained in an algorithm (if not real life). Chapter 17 discusses how users of algorithms can make a critical contribution to the debiasing of algorithms by producing unbiased data.

The fourth part of the book, Chapters 18-23, addresses data scientists developing algorithms. Chapter 18 provides an overview of the various ways data scientists can guard against algorithmic bias. Chapter 19 deep-dives into specific techniques to identify biased data. Chapter 20 discusses how to choose between machine learning and other statistical techniques in developing an algorithm in order to minimize algorithmic bias, and Chapter 21 builds on this by proposing hybrid approaches combining the best of both worlds. Chapter 22 discusses how to adapt the debiasing techniques introduced by this book for the case of self-improving machine learning models that require validation “on the fly.” And Chapter 23 takes the perspective of a large organization developing numerous algorithms and describes how to embed the best practices for preventing algorithmic bias in a robust model development and deployment process at the institutional level.
