I have always thought the actions of men the best interpreters of their thoughts.
Data can be useful or anonymous, but never both.
If ethical practices are the result of ethical inquiry, then how do those practices show up in business today?
This chapter explores findings from primary and secondary research, including direct one-on-one interviews with industry thought leaders and practitioners working at companies that use big data.
Reading the privacy policies and other statements available on the websites of most organizations is a great way to understand how data-handling practices are showing up in the world today. These documents are free, available to anyone on the Web, and, although often written in fairly legalistic language, they generally state in somewhat accessible terms what policies an organization follows when handling data.
We reviewed the public-facing policy statements of the top 50 corporations in the Fortune 500 (the “Fortune 50”) to better understand the current state of data-handling practices and how they relate to users, consumers, and others.[2] The process included identifying specific policy elements across a number of data-handling practices.
Examples include:
Whether data would be sold without consent
Whether targeted advertising was used
How much control customers had over the usage of their data
Whether data would be purchased from other organizations
Whether data was shared or aggregated
Stated reasons for the policy itself
Taken together, the findings paint a picture of common practice in large enterprise environments. The broader implications, however, reveal issues with coherence and consistency in data-handling practices.
It is not surprising that different organizations have different practices—after all, different organizations have different values. What is somewhat surprising is the degree of difference across practices. There were clear trends and commonalities in many aspects, but the variations in how specific practices were carried out seem to indicate either that there is an amazingly wide variety of values driving corporate action or that organizations are just not sure exactly what they value (and, hence, what actions they should take to honor those values) in the first place.
Let’s take a look at the findings first.
Of the 50 policies surveyed, 40 indicated that the corporation would share personal data with third-party service providers, such as suppliers and shippers.
Of the remaining 10, 8 policies said nothing, and 2 stated that the corporation would not share personal information, even with third-party service providers.
Of the 50 policies, 34 explicitly stated that the corporation would not sell personal data without consent.
No policy explicitly stated that the corporation would sell personal data.
Of the 50 policies, 11 stated that the corporation would buy or otherwise “obtain” personal information from third parties.
No policy stated that a corporation would not buy personal information.
Of the 50 policies, 23 stated that the corporation did engage in targeted advertising on third-party websites and through third-party marketing networks. Of the remainder, only one policy ruled out targeted advertising, while 26 said nothing about the topic.
Of the 50 policies, 33 stated that a user could control the use of her data with respect to things like targeted advertising. Of these 33 policies, 31 explained how to opt out.
Of these 31 policies, 14 directed the user to a relatively convenient, web-based location for opting out. Of the 14 corporations offering a web-based opt-out, 5 employed the services of the Network Advertising Initiative.[3] Three of the remaining nine required the user to create an account on the site in order to opt out.
Of the 17 policies that offered an opt-out from targeted marketing that wasn’t web-based, 14 gave an email address for the purpose. None of these made clear that the email in question would trigger an automatic opt-out. Other policies directed the reader to a phone number.
Consider the unequal treatment given to selling personal data versus buying it:
34 of the 50 companies said that they would not sell personal data.
No company said explicitly that they would sell personal data.
No company made any explicit statement that they would not buy personal data.
11 policies made explicit statements that buying personal data was allowed.
Without knowing any other facts, this seems strange: if it is not OK to sell something, how could it be OK to buy it?
If selling personal data is wrong because it may harm a third party (making the individual more susceptible to harm through unintended consequences), then it would seem to follow that buying personal data contributes as much as selling personal data does to the risk of harm through unintended consequences.
It’s notoriously complicated to determine who is more responsible in exchanges that inherently contain risk, the “buyer” or the “seller.” The judicial system frequently punishes sellers more than buyers. Since the buying and selling of personal data is (currently) legal, the question becomes even more nuanced. And there is active debate, and frequently new legislation, regarding consumer rights online.
When an additional party gains access to personal data, there is almost certainly an increased risk of harm. The potential for weaker security and protection measures, differences in data-handling policy and practice, or the mere lack of insight into another organization’s data-handling processes can all contribute to the risk, which raises the question of whether acquiring personal data (buying it) contributes as much to the degree of risk as selling it does.
There is, at least, one clear take-away: buying personal data is a common practice in current business models, but none of the Fortune 50 are comfortable enough with the practice to state publicly that they’ll also sell it. This seems to indicate that organizations are more comfortable with some values than others—a value in itself that is showing up in their actions.
All of this buying and selling relates directly to one of the central topics in the debate over big-data handling today: targeted advertising. Though many people are concerned about having their browsing history follow them around the Internet, there are realistic scenarios that provide direct consumer benefit. It makes sense that, since users are going to see advertising in any case, they might as well see ads for things in which they’re more likely to be interested, and that tracking browsing behavior to infer those interests is an acceptable way to do so. For example, a recent browsing session exploring vacation activities in Bermuda can easily result in targeted ads for discounted hotel rates on a national news site, and who doesn’t want to save money on vacation?
The question here is: whose interests are being best served and are those the right priorities? In the absence of more information, any firm conclusions would be speculative. But even in the absence of explicit policy statements about selling personal data, it seems clear that somebody is selling it because a lot of organizations are buying it.
The opt-out model of providing customers control over the use of their data is the norm. In that model, the default configuration requires people to take additional action (“opt out”) to prevent their data from being used for other purposes. Frequently, merely agreeing to the Terms of Service (whether you’ve read them or not) or simply using a particular product online automatically grants permission to use personal data for purposes such as targeted advertising. Although 33 out of 50 organizations offered people a way to control the use of their data, there is less uniformity in the ease of the opt-out procedure itself.
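To make the distinction concrete, here is a minimal sketch (in Python) of how the default value in a consent record encodes the opt-out model: permission is granted by default, and withholding it requires extra effort from the user. The field names and defaults are illustrative assumptions, not drawn from any actual company’s policy or system.

```python
# Illustrative sketch only: a toy consent record in which the default value
# encodes the difference between opt-out and opt-in. Field names and defaults
# are hypothetical.
from dataclasses import dataclass

@dataclass
class ConsentRecord:
    user_id: str
    # Opt-out model: use of data for targeted advertising is ON by default;
    # the user must take additional action to turn it off.
    targeted_ads: bool = True
    # An opt-in model would flip this default to False, so nothing is used
    # until the user explicitly agrees.

def opt_out(record: ConsentRecord) -> ConsentRecord:
    """The 'additional action' the opt-out model asks of the user."""
    record.targeted_ads = False
    return record

# Simply agreeing to the Terms of Service creates a record whose defaults
# already permit targeted advertising.
alice = ConsentRecord(user_id="alice")
print(alice.targeted_ads)            # True: permission granted by default
print(opt_out(alice).targeted_ads)   # False: only after extra effort
```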
It is tempting to sympathize with this practice. It is difficult enough to get people to opt in, even in the face of clear consumer benefits. We’re all familiar with the feeling of risk that comes from not really knowing how a company is going to use our personal information, not to mention inboxes clogged with useless or irrelevant offers. That’s why many people create dummy email addresses when they sign up for some things and use other tricks to protect their privacy.
But even though that gap in customers’ understanding may be simple and easy to fix, the fear of unknown risk creates a barrier to conversion that all organizations are familiar with. Making it too easy to opt out can easily be seen as detrimental to both the consumer and the business model.
Even understanding the temptation to choose the path of least resistance in order to support specific business models, designing and implementing business processes and tools that make it less likely or more difficult for people to opt out raises the question of what values are motivating the choice of which model to implement. Which value is more important: acknowledging and respecting people’s fear of unknown risk and honoring their interest in reducing it, or protecting a business model by making opting out slightly more difficult?
These value questions become increasingly important as the evolution of big data unfolds. As more data becomes available and is more easily analyzed on commodity hardware, it becomes easier to combine initially anonymous data sets with other data sets and correlate them to reveal new patterns or information, some of which could cause unintended consequences, such as revealing damaging personal information.
Inconsistent policy statements on buying versus selling data and variations in opt-out procedures for uses such as targeted advertising indicate the need for deeper inquiry. The incoherence generates more distrust and confusion. Reducing that unknown, that is, people’s uncertainty about what will happen to their personal data, represents a substantial opportunity for organizations to share their values more broadly and align their actions with them more fully.
These can be complicated goals to achieve. Consider the aspects of the reviewed policies that concern anonymization, personally identifying information, and privacy:
47 of 50 policies made a distinction between “personally identifying information” and information that is “anonymized” and therefore not “personally identifying.” Of those 47 policies, 22 made no attempt at all to explain the distinction.
Of the remaining 25, 11 merely gave an incomplete list (e.g., “such as street address, phone number, email address…”). The remaining 14 made some attempt to explain what makes information “personally identifying.”
10 of 50 policies explicitly stated that “anonymized” data sets were not treated as protected. None of the remaining 40 policies said that “anonymized” data would not be released.
24 of 50 policies either stated or implied that user data would be aggregated with data from other sources. No policy stated that this would not happen.
16 of 50 policies stated some reason why the company protected information.
14 of these 16 policies gave some variant of “You care about privacy and we want your business.”
Of the remaining 2, one stated that protecting privacy is a matter of “respect for the individual,” and the other stated that doing so is a part of “fair information practices.”
Nearly all of the policies surveyed made some type of distinction between “personally identifying” and “anonymized” data. Nearly half of those, however, did not explain how they defined the distinction—or exactly what protections were in place.
And the distinction is critical. The anonymity of data is quickly becoming very difficult to maintain, and what constitutes “personally identifying” is a matter of wide and variable opinion. To understand how to reduce the risks of inadvertent migration from one category to the other, organizations first have to understand what those risks are and the growing number of ways anonymized data sets can be aggregated and correlated quite easily to expose personally identifiable information.
An excellent primer here is Paul Ohm’s “Broken Promises of Privacy” from the UCLA Law Review (2010; http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1450006). In a nutshell, Ohm’s point is this: for several decades, the foundation of information privacy law and practice has been the distinction between “personally identifying information” and other information.
Many of the policies reviewed explicitly state or clearly imply that “anonymized” data is not protected. However, whether a specific set of information (e.g., a street address or a set of ratings of content on a website) is “personally identifying” depends on what other information is available to someone trying to “reidentify” the people referenced in a data set (or “de-anonymize” the set).
For example, if someone doesn’t know that you live at 312 Cherry Lane, then knowing that a package is going there doesn’t associate the package with you. To someone lacking information correlating that street address with you, in that instance not even the street address is personally identifying.
Of course, data that connects street addresses to people is widely available. Ohm’s point is that all sorts of data is available that makes it easy to aggregate and connect personal data with an individual person. The more such additional data is available (in addition to more easily accessible tools and computing resources), the easier it is to reattach supposedly “anonymous” data sets to canonical “personally identifying information” such as name, address, and phone number.
In one widely cited study, for instance, researchers were able to reidentify many users from an “anonymized” data set released by Netflix for the purposes of crowd-sourcing recommendation algorithms by comparing it to user profiles on IMDB (http://www.securityfocus.com/news/11497).
You might not think that a few movie ratings could be “personally identifying information,” but given the right auxiliary information, they are.
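The mechanics of that kind of reidentification are simple enough to sketch. The toy Python example below, using entirely made-up data, links an “anonymized” ratings table to a named public profile on nothing more than a few matching ratings. It illustrates the general linkage technique, not the researchers’ actual algorithm or data.

```python
# Toy illustration of linkage-based reidentification. All data is fabricated;
# this sketches the general idea behind the Netflix/IMDB result, not the
# researchers' actual method.

# "Anonymized" ratings released for research: user IDs replaced with tokens.
anonymized_ratings = {
    "user_8472": {"The Thin Man": 5, "Brazil": 4, "Local Hero": 5},
    "user_1139": {"Heat": 4, "Ronin": 3, "Collateral": 5},
}

# Publicly visible ratings attached to a real name on another site.
public_profiles = {
    "Jane Doe": {"The Thin Man": 5, "Brazil": 4, "Gattaca": 3},
}

def overlap_score(a: dict, b: dict) -> int:
    """Count movies rated identically in both records."""
    return sum(1 for title, rating in a.items() if b.get(title) == rating)

def reidentify(anon: dict, public: dict, threshold: int = 2) -> dict:
    """Link anonymous tokens to named profiles when enough ratings match."""
    matches = {}
    for token, ratings in anon.items():
        for name, profile in public.items():
            if overlap_score(ratings, profile) >= threshold:
                matches[token] = name
    return matches

print(reidentify(anonymized_ratings, public_profiles))
# {'user_8472': 'Jane Doe'} -- a handful of matching ratings was enough.
```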
In the opening quote to this chapter, Ohm was talking specifically about the particular sense of the word “anonymous” that relates to an individual’s personal privacy. But the reality is broader than that. Any context we create to turn data into information automatically assigns new characteristics to it, making the data itself less anonymous and more meaningful. And if we have enough data, we can correlate, extrapolate, query, or extract some very useful new information by understanding the relationships between those characteristics. The loss of data anonymity is a natural consequence of placing it in a context that creates meaningful information. And while the value of that utility is growing exponentially in our time, so too is the unknown potential for unintended consequences, even as product and service innovations built on big-data technologies deliver broad social and economic benefits.
This has serious repercussions for data-handling policies based on the personally identifying/anonymized distinction. Such policies can be implemented coherently only if there really is such a distinction, and Ohm argues rather persuasively that there isn’t. As he puts it, “data can be either useful or perfectly anonymous, but never both” (http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1450006).
The broader implication is that as big data’s forcing function grows larger and more powerful, more and more information may slip into the category of “personally identifying” in practice. If this is true, the handling of personal data may become an activity with an increasingly large risk, especially as business models based on aggregating and sharing “anonymized” data see that data transition into the “personally identifying” category.
That risk is manageable only to the degree that business models relying on the use of “anonymized” data can safely maintain the distinction between it and “personally identifying” information. The policies of the Fortune 50 do little to promote confidence in their ability to maintain that distinction. Nearly half of all policies make no attempt at all to explain how they define “personally identifying” information. The rest either offer suggestive open-ended lists of the “name, address, and phone number…” variety or use vague circular phrases to illustrate the difference, such as “personally identifying information is information that can be traced back to a particular person.”
The issue with such a vague formulation is that it provides no help in determining which information does not allow such “tracing back.” Remember, the mere addition of a few IMDB movie reviews allowed researchers to identify specific individuals out of a supposedly “anonymized” set of data. In the face of growing evidence that aggregation is an increasingly powerful and available method of violating individual privacy, explicit ethical inquiry is a critical part of maintaining ethical coherence. A more mature understanding about what organizations value about privacy is needed. Either the business models that depend on the distinction between anonymized and personally identifying data need to be refined, or data-handling policies and procedures need to be developed that take account of the technological and conceptual problems with maintaining the distinction.
At AT&T Labs in Florham Park, New Jersey, big data is being used to analyze the traffic and movement patterns of people through data generated by their mobile phones, to help improve policymaking and urban and traffic planning. The research team realized they could understand deep patterns of how people moved through urban environments by analyzing the flow of mobile devices from cell tower to cell tower. And they wanted to use those insights to help improve traffic flow and to inform better urban planning, not to improve their marketing.
But, of course, AT&T, along with Verizon, Google, TomTom, NAVTEQ, and several companies that help retail malls track the traffic patterns of shoppers, very much wants to use that information to generate new streams of revenue. The question of privacy is top of mind (especially as the distinction between anonymized and personally identifying information becomes more difficult to maintain), but the question of ownership is equally compelling.
Since people buy their mobile devices, does the data generated by the use of those devices belong to the individual device owners—or to the company that owns and maintains the technological infrastructure that makes that usage possible?
The Electronic Frontier Foundation offers an interesting metaphor in response to this question:[4]
“Big data is the mantra right now. Everyone wants to go there, and everyone has these stories about how it might benefit us,” said Lee Tien, senior staff attorney with the Electronic Frontier Foundation, a San Francisco–based nonprofit organization specializing in free speech, privacy, and consumer rights.
“One of the things you learn in kindergarten is that if you want to play with somebody else’s toys, you ask them,” Tien said. “What is distressing, and I think sad, about the big data appetite is so often it is essentially saying, ‘Hey, we don’t have to ask.’”
Google explicitly states that they “don’t sell [their] users’ personal information.” However, they make no statement about who owns the information in the first place, which leaves the door wide open for them to utilize that information in their business model (notably the sale of online advertising) without denying or rejecting your claim to it.
And although Google very visibly provides information about how to “liberate” your data (http://www.dataliberation.org/), it has become common knowledge that the valuable services Google provides to millions of people every day are paid for, at least in part, through the implied (or tacit) agreement that Google can use some of the data created by your use of their products to generate revenue in their business model.
The question remains open, however, of the exact distinction between “personal information” and the set of all information that Google knows about you, which, when combined in the right way, could potentially expose enormous amounts of personal information. The tacit agreements we enter into as individuals with organizations that have access to vast amounts of personal data generate more risk than would making those agreements explicit, easily understood, and accessible.
In many ways, an organization’s business processes, technical infrastructure configuration, and data-handling procedures can be interpreted as a manifestation of their values.[5]
Seen this way, values are inherently expressed by these data-handling practices. Although it might not be completely possible to reverse-engineer a company’s values by deconstructing their data-handling practices, it certainly is possible to learn more about what has been considered important enough to include by simply reading the policy statement.
And it is fair to assume that the absence of any particular consideration in the policy statement indicates that consideration was deemed not important enough to include. Without additional information, it’s impossible to know exactly what was considered but ultimately not included or what those conversations were like. But we can know what ultimately did and did not make it into those statements, and infer some reasonable understanding of what the originating organization deems important.
Though many people hold privacy as a “right,” and rising concerns about personal data usage frequently focus on perceived violations of this right, the majority of privacy policies almost entirely fail to address their value basis. Only 2 of 50 policies stated any recognizably moral reason for having a privacy policy at all. Most policies said nothing, while a minority gave the nonmoral reason that people care about privacy and the company values their business.
This is important because it directly raises the question of how to close the gaps in data-driven business models and structure their activities in alignment with moral motives. Those who believe in “corporate social responsibility” would say there are recognizably moral reasons for businesses to act in alignment with their values. Others, most famously (or perhaps infamously) Milton Friedman, have stated that corporations have responsibilities only to their shareholders—the implication being that any legal action that generates a profit is justified if it results in returning value to shareholders (http://www.colorado.edu/studentgroups/libertarians/issues/friedman-soc-resp-business.html).
Regardless of where your organization falls on that spectrum, data-handling policies that reflect common values provide alignment benefits. And being more explicit about the value-based motivations for those policies, including any moral notion of a right to privacy, makes it easier to benefit from that alignment. So, whether the practice of providing an admittedly inconvenient method to opt out of the use of personal data for targeted advertising should continue can be answered by understanding why an organization has a privacy policy in the first place. If a company places greater value on providing increased individual control of the usage of personal data, then it’s ethically incoherent to develop data-handling practices that make it difficult to opt out. If the intent of any specific policy is merely to reduce customer complaints and comply with the law, then reflecting actual values is immaterial. If the intent of the policy is to ensure that people’s interests are respected, then simple opt-out procedures may be required to become ethically coherent.
The vast majority of the Fortune 50 (46 out of 50) referred to the documents that explain their data-handling practices exclusively as “privacy” statements. The other aspects of big-data ethics (identity, ownership, and reputation) receive virtually no consideration in their current data-handling practices. Identity, ownership, and reputation would also benefit from more explicit consideration of how an organization’s values inform their data-handling practices.
For example, there were virtually no discussions of what constitutes an “identity” in any of the reviewed policies. Even Google’s new policies, previously separate across more than 60 different products, are now streamlined and integrated into one set of policy statements (effective March 1, 2012). Google refers to these simply as its “Privacy Policy” and “Terms of Service.”
Although they explicitly define personal information as “information that you provide to us which personally identifies you, such as your name, email address or billing information, or other data which can be reasonably linked to such information by Google,” there is no discussion of how they conceive of who you are—that is, what constitutes an individual identity. There is no clear definition of the specific, individual, unique “you” to whom a name (what name?), email address, or billing information can be accurately assigned.
These actions imply that Google believes it is ethical for an organization to sell advertising based on data generated by people using their services, as long as that data is not personally identifiable—according to their own definition. This is a highly common practice, not only among Fortune 50 corporations but across the majority of business models using big data. And this practice carries with it substantial risk for at least two reasons:
The high degree of variability between organizations. For example, what Google considers Personally Identifiable Information (PII) may be substantially different from Microsoft’s definition. How are we to protect PII if we can’t agree on what we’re protecting? (See the sketch following this list.)
The increasing availability of open data (and the increasing number of data breaches) that make cross-correlation and de-anonymization an increasingly trivial task. Let’s not forget the example of the Netflix Prize.
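To see why that variability matters, consider a brief, hypothetical sketch: two organizations with different (entirely made-up) lists of fields they treat as PII can “anonymize” the same record and release very different things. Neither field list reflects Google’s, Microsoft’s, or anyone else’s actual definition.

```python
# Hypothetical illustration of how differing PII definitions change what is
# released as "anonymized" data. Neither field list is real.

ORG_A_PII = {"name", "email", "billing_address"}
ORG_B_PII = {"name", "email", "billing_address",
             "zip_code", "birth_date", "device_id"}

record = {
    "name": "Jane Doe",
    "email": "jane@example.com",
    "billing_address": "312 Cherry Lane",
    "zip_code": "97210",
    "birth_date": "1975-04-02",
    "device_id": "a1b2c3",
    "pages_viewed": 42,
}

def anonymize(rec: dict, pii_fields: set) -> dict:
    """Drop whatever this organization's policy happens to call PII."""
    return {k: v for k, v in rec.items() if k not in pii_fields}

print(anonymize(record, ORG_A_PII))
# Still contains zip_code, birth_date, and device_id -- classic
# quasi-identifiers that auxiliary data can often link back to a person.
print(anonymize(record, ORG_B_PII))
# Contains only pages_viewed.
```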
Finally, nothing in any policy statement reviewed, no matter what it was called, addressed the topic of reputation. Reputation might be considered an “aggregate” value composed of personal information that is judged in one fashion or another. Again, however, this raises the question of what values an organization is motivated by when developing the constituent policies. Reputation is tied to an individual with a unique identity. Do the values we hold about that unique individual’s privacy transfer completely to his reputation? What role do organizations play in protecting the data that might be used to assess a unique individual’s reputation?
The complexity and accessibility of these policy statements are no minor concerns either. In March 2012, Alexis Madrigal published an article in The Atlantic referencing a Carnegie Mellon research study that found it would require 76 working days to read all of the privacy policies we encounter in a year (http://www.theatlantic.com/technology/archive/2012/03/reading-the-privacy-policies-you-encounter-in-a-year-would-take-76-work-days/253851/).
That is 76 working days for every one of us who agrees to any policy whatsoever, which is pretty much everyone. Imagine the economic impact if we all stopped working and chose to actually read all of them. If that happened, corporations would almost certainly find some value in making them less complex and more accessible.
Privacy is clearly of direct, relevant concern to everyone. However, the sheer time and cost associated with actually reading those policies represents a major opportunity for organizations to streamline communication of their position on the use of personal data.
A nice start would be to make a simple change: call them something else. Data Handling Policy, Usage Agreement, or Customer Protection Commitment all broaden the scope of what organizations can consider in their policy design in order to develop deeper engagement and build more trusting relationships with their market.
By explicitly including coverage of a broad range of concerns, organizations demonstrate proactive interest in recognizing the concerns of people who use their products and services. And moving away from the long-form legalese format not only makes these documents more accessible but, as the research cited in The Atlantic article suggests, also reduces the opportunity cost of learning exactly what is being done with all that data.
This isn’t just about customer service. It is also about seizing an opportunity to benefit from aligning your organizational values with your actions. Both internal and external communication of which values are driving policy and action provide a range of benefits:
Faster adoption by consumers, through reduced fear of the unknown (how are you using my data?)
Reduction of friction from legislation, derived from a more thorough understanding of constraints and requirements
Increased pace of innovation and collaboration derived from a sense of shared purpose generated by explicitly shared values
Reduction in risk of unintended consequences from an overt consideration of long-term, far-reaching implications of the use of big-data technologies
Brand value generated from leading by example
A simple change in the title and design of a “privacy policy” makes these benefits immediately available to both organizations and their customers or constituents. Taking an active interest in addressing concerns around identity, privacy, ownership, and reputation is a low-cost, high-return way to build deeper brand engagement and loyalty.
In the absence of any clear best practice for how to communicate these values and how they drive business decisions (actions), we’re left to wonder what data-handling practices an organization values at all.
It is worth noting at this point that the majority of the Fortune 50 operates around the world. And there are a wide variety of values present in many of those other countries, cultures, and governments. And those values are reflected in their actions.
In Sweden, the FRA Law authorizes the Swedish government to tap all voice and Internet traffic that crosses its borders—without a warrant. It was met with fierce protests across the political spectrum (http://en.wikipedia.org/wiki/FRA_law). British privacy laws are a complex set of regulations that face serious challenges resulting from how people use platforms that rely on big data, such as Twitter (http://www.huffingtonpost.com/2011/05/23/uk-privacy-law-thrown-int_n_865416.html). The number of closed-circuit television (CCTV) cameras in the United Kingdom is estimated to be almost two million (http://en.wikipedia.org/wiki/Closed-circuit_television). And it is well known that the Chinese government heavily regulates Internet traffic (http://en.wikipedia.org/wiki/Internet_censorship_in_the_People%27s_Republic_of_China).
Just a few examples from only three countries show the wide variety of values at play in how technology in general, and big data in particular, are utilized and managed. The sheer variety itself shows how closely related values and actions are. And how those relationships show up is often demonstrated in our written policies, technical infrastructure, and data-handling practices and processes. There is significant value in understanding more fully how those policies, infrastructures, and practices are developed and managed.
It is clear that organizations are playing catch-up when it comes to understanding and articulating the values that drive their data-handling policies. There is a great deal of confusion about some critically important distinctions, such as what it means to be anonymized and what exactly “personally identifiable information” means, not to mention how to respond to the increasing difficulty of maintaining that distinction—whatever it turns out to be.
There are open questions about how to interpret existing policy statements in terms of what the originating organization values. In the absence of explicit and transparent statements and actions, policies inherently reflect value judgments, but in the vast majority of cases, it’s unclear what those values are. The opportunity here is to build stronger brand engagement and customer loyalty.
The benefits of big-data innovation must be balanced by understanding the risks of unintended consequences. And organizations must be intentional about the inquiry. Identify and acknowledge gaps between values and actions, and make explicit plans to close them. Expand the domain of what is included in policy statements to include consideration for other key aspects of big-data ethics such as identity, ownership, and reputation. Actively seek to understand any hierarchy of values to help prioritize business decisions.
Along the way, you will learn to understand and appreciate the benefits offered by values-to-action alignment. The next chapter focuses on how to do just that.