Chapter 2. Hazardous Material

November 14, 1980 (noon) - The phone rang at a desk at National CSS (NCSS) headquarters in Wilton, Connecticut. A woman answered. On the other end of the line, a voice asked jokingly, “What would you give for the [password] directory?”1 The caller was referring to the highly sensitive database on NCSS systems that contained all 14,000 customer user IDs and passwords. But it wasn’t funny—he really had the data.

1. “Event Report as of 1/21/81 08:45:30,” FBI file 196A-397 (New Haven), FOIA/PA #1364189-0, E3df34b6cc6c2a9a14ddc71e47c1a18b8d966c57f_Q3702_R343967_D1813129.pdf, January 21, 1981, 48 (obtained under the FOIA from the FBI; received March 2019).

At the time, NCSS was a top computer time-sharing company. A predecessor of modern cloud providers, NCSS offered digital storage space and remote processing for approximately 3,100 organizations in banking, government, engineering, finance, utilities, and more. Major customers included Bank of America and Dun & Bradstreet (which had acquired NCSS in 1979).2

2. IT History Society, National CSS, Inc. (NCSS), http://www.ithistory.org/db/companies/national-css-inc-ncss (accessed April 29, 2019).

It is impossible to know precisely what data was on the NCSS systems (and it’s unlikely that NCSS knew itself, in much the same way that most of today’s cloud providers do not take specific inventory of their customers’ data). However, given their massive clientele, the NCSS servers could easily have housed millions of people’s Social Security numbers (SSNs), bank account records, payroll information, credit details, and more. An unauthorized person with access to the directory data could read or modify any customer’s files, potentially resulting in untold damages.

The Theft

According to the FBI’s investigative report (revealed nearly 40 years later), the data was stolen by a former NCSS programmer, Bruce Ivan Paul, who had left to join a small consulting firm called the Guild. In June 1980, shortly after Paul’s departure, a mysterious intruder made multiple unauthorized connections to the NCSS mainframes, attached a backup database that contained the customer passwords, and transferred files to the “GUILD” account on NCSS systems. “The access was accomplished using userids known to Bruce Paul from his work at NCSS,” stated the company’s internal incident report. “The passwords for these userids may have remained unchanged since the time of his termination. Alternatively they may have been determined using the DIRPRINT facility” (a command known to NCSS programmers that was used to retrieve passwords).3

3. Federal Bureau of Investigation, “Prosecutive Report of Investigation Concerning Bruce Ivan Paul; National CSS - Victim; Fraud by Wire - Computer Fraud,” FBI file 196A-397 (New Haven), FOIA/PA #1364189-0, E3df34b6cc6c2a9a14ddc71e47c1a18b8d966c57f_Q3702_R343967_D1813131.pdf, October 6, 1981, 12 (obtained under the FOIA from the FBI; received March 2019).

A month after the passwords were downloaded to the “GUILD” account, Paul transferred the data to the computer system of a company called Mediametrics, based in California. Mediametrics had purchased a mini-computer from NCSS and also hired the Guild to improve its software in exchange for free disk storage space and use of its computer systems. This was a handy arrangement for the Guild, since at the time, computing resources were very limited. Personal computers had not yet become widespread. The Guild’s team eagerly took advantage of the Mediametrics disk space.

By November 1980, the relationship between the Guild and Mediametrics had become strained. The Guild was doing “very little work” on Mediametrics projects, yet routinely using the company’s computer system. On November 7, the Guild’s activities caused Mediametrics’ system to crash. In response, Mediametrics informed the Guild that it was no longer allowed to access Mediametrics’ computer and changed the Guild’s password, expecting that this would lock it out.4

4. Federal Bureau of Investigation, “FD-302,” FBI file 196A-397 (New Haven), FOIA/PA #1364189-0, E3df34b6cc6c2a9a14ddc71e47c1a18b8d966c57f_Q3702_R343967_D1813129.pdf, May 29, 1981, 85 (obtained under the FOIA from the FBI; received March 2019).

It didn’t work. Surprisingly, a few days later, a technician at Mediametrics realized that the Guild had once again gained access to the Mediametrics system. The FBI reported that the Guild must have broken in using a default administrator account.

Concerned that the Guild might have stolen proprietary data, too, Mediametrics conducted an inventory of the files in the Guild’s disk space.5 There, it discovered suspicious files that appeared to contain thousands of stolen NCSS customer IDs and passwords, as well as copies of valuable NCSS software. A technician tested three customer passwords and was able to successfully access three NCSS customer accounts. The data was valid.

5. FBI, “Prosecutive Report,” 14.

Triage

Around noon on November 14, 1980, Mediametrics called its contact at NCSS and notified her of the stolen data. She escalated to a manager, who immediately recognized the risk. According to the FBI interview, “[He] explained that, if what [she] said was true, it would have been a most serious breech [sic] of security into the National CSS system and would have monumental consequences.”6

6. FBI, “FD-302,” 101.

After an internal conference call, NCSS decided to verify that the data was real. An NCSS employee used a terminal to remotely log onto the Guild’s account at Mediametrics and began analyzing the files in its disk space. Suddenly, while she was working, a curt message appeared on her screen from “BIPPER” (associated with the Guild’s account). “Who the hell is this?” it read. Minutes later, her connection was forcibly terminated. She reconnected and finished verifying that the data did, in fact, appear to contain valid customer data. (Shortly thereafter, a different NCSS employee received a call from Bruce Ivan Paul, asking who was using the Guild account.)

Having confirmed the validity of the data, NCSS moved into the next phase: damage control. Upon request, Mediametrics agreed to temporarily remove its computer from the network, in order to prevent any further unauthorized access. The following morning, NCSS staff met with Mediametrics onsite and collected system log files, backup tapes, printouts, and other evidence. It was an early example of digital forensic evidence acquisition.

Involving Law Enforcement

Recognizing the potential risks, NCSS attorneys and their management team wrestled with the case. In 1980, there were no computer security incident response manuals, no template breach notification letters, no digital forensics experts, and no “breach coaches” to guide the company. Indeed, the very concept of hosting large quantities of digital data on behalf of another company—and therefore losing it, as well—was relatively new.7

7. Harold Feinleib, “A Technical History of National CSS,” IT Corporate Histories Collection, March 4, 2005, http://corphist.computerhistory.org/corphist/documents/doc-42ae226a5a4a1.pdf.

Hoping to recover the password files quickly from the suspected thieves, an NCSS attorney made a call he or she probably regretted later—to the FBI.8 NCSS asked the FBI to “handle the situation quietly” and recover the password data from the suspects.9

8. Federal Bureau of Investigation, “Complaint Form: FD-71,” FBI file 196A-397 (New Haven), FOIA/PA #1364189-0, E3df34b6cc6c2a9a14ddc71e47c1a18b8d966c57f_Q3702_R343967_D1813129.pdf, November 15, 1980, 3 (obtained under the FOIA from the FBI; received March 2019).

9. Vin McLellan, “Case of the Purloined Password,” New York Times, July 26, 1981, http://www.nytimes.com/1981/07/26/business/case-of-the-purloined-password.html?pagewanted=1.

But the FBI refused to guarantee secrecy and could not take quick action against the suspects to recover the stolen data. “NCSS officials suddenly became defensive and ‘uncooperative,’ hiding behind a phalanx of corporate lawyers,” reported the New York Times. In order to move forward with the investigation, the FBI ultimately resorted to threatening NCSS with grand jury subpoenas.10

10. McLellan, “Case of the Purloined Password.”

The FBI later told the press it was a “learning experience.” It certainly was for NCSS, as well: Once the FBI was notified, the investigation spun out of its control.

The First Customer Breach Notification

NCSS’s parent company, Dun & Bradstreet (D&B), stepped in to oversee the response. As one of NCSS’s largest customers, the D&B executive team undoubtedly understood the value of the data stored within corporate accounts on NCSS systems and the potential for liability if it were to be misused. According to the New York Times, “[T]he traditional D. & B. credit services have become increasingly dependent on NCSS technology and the network. . . . [A]nyone who had obtained the NCSS password directory would have been able to change or erase or create data, according to NCSS technicians familiar with the D. & B. software. In other words, temporarily, at least, a thief could create or diminish credit.”11

11. McLellan, “Case of the Purloined Password.”

Unable to recover the stolen data and faced with the ongoing risk of unauthorized access to customer accounts, D&B made the unprecedented and controversial decision to notify NCSS customers. NCSS liaisons began making phone calls to their customers. The following week, NCSS, under orders of D&B executives, sent what the New York Times later dubbed “the first ‘broadcast’ security alert to the entire customer base of a major timesharing company in the 25-year history of the industry.”12

12. McLellan, “Case of the Purloined Password.”

As shown in Figure 2-1, the letter was short and sweet, notifying customers of a problem but providing no detail. The November 20 letter included the following statement:13

13. FBI, “Prosecutive Report,” 74.

Figure 2-1. The notification letter sent to NCSS customers in November 1980 (obtained under the FOIA from the FBI; received March 2019)

It has come to our attention that a former employee may have obtained information which could potentially compromise system access security. Although a breach of any customer’s data security is highly unlikely, in line with our total commitment to maintain absolute security, we strongly urge that you immediately change all passwords by which you access the National CSS’ systems.

This landmark notification letter represented a minimal disclosure strategy, with NCSS releasing only the information necessary to minimize the risk of future unauthorized access to customer data. Importantly, NCSS did not force a password change, but instead, merely “urged” customers to do so. Forcing a password reset for 14,000 corporate accounts would likely have resulted in interruptions for customers and a high volume of complaints. By notifying its customer base of the risk (without actually stating outright that the passwords had been stolen), NCSS placed the ball in its customers’ court.

Customers understandably expressed frustration at the lack of detail and guarded wording of the notification letter. Frank Logrippo, a manager at the auditing firm Coopers & Lybrand, was told by his customer service representative over the phone that the password directory had been found at another company in California and then received the notification letter the following week. “If the passwords were found at someone else’s site,” he complained, “it’s not ‘may be compromised’—it’s compromised!”

Rival timesharing firms, too, criticized NCSS’s widespread and vague notification. Chester Bartholomew, the “protection and control director” for competitor Boeing Computer Services,14 said, “Everybody in this business has dealt with penetration. Usually, we have enough information to take a rifle shot at it rather than let loose with a shotgun blast.”15 NCSS never indicated when the breach actually occurred, how long the password file had been exposed, or how to tell whether a specific account had been accessed. In response to inquiries, NCSS refused to release any further details, stating that “the matter is still under investigation for potential criminal action.”16

14. Boeing Frontiers, “A Step Back in Virtual Time,” Boeing Frontiers 2, no. 4 (August 2003), http://www.boeing.com/news/frontiers/archive/2003/august/cover4.html.

15. McLellan, “Case of the Purloined Password.”

16. Rita Shoor, “Firm Avoids Security Breach with Customer Cooperation,” Computerworld, January 19, 1981, 13.

Downplaying Risk

It was not clear what justification, if any, NCSS had for concluding that an actual breach of customer data was “highly unlikely.” The company offered no evidence to indicate that customer accounts had not been accessed, and the evidence showed that some were. For example, a later audit uncovered unauthorized access to the systems of Marsh & McLennan, Inc. (“Marsh”), a large insurance company based in San Francisco. The intruder logged in using a default administrator password that came preconfigured on NCSS computers. Upon review, Marsh’s team was “unable to determine what the unauthorized user was doing,” but they were “99 per cent sure that the unauthorized user did not change any of the files as all things balanced out after the entry.”17

17. FBI, “FD-302,” 92.

Although NCSS conducted an audit after the Mediametrics incident, it was very limited in scope. The tools for logging and monitoring account access in 1980 were immature and had not been widely adopted. Detecting unauthorized access was a painstaking and tedious process.

What’s more, the risk of widespread account compromise at NCSS was high. At NCSS, customer passwords were accessible to many people and could have been copied and used countless times without discovery. Default passwords were widespread. Customers such as Marsh were not told that the computers they purchased had account passwords that were used on other customer systems and therefore did not consider changing them.18

18. FBI, “FD-302,” 92.

Media Manipulation

Dun & Bradstreet moved quickly to control the media response—and it was clearly well positioned to do so. The popular computer magazine Datamation prepared a news report on the incident, but D&B had previously acquired the magazine in 1977 and “vetoed” the article. Computerworld magazine published a short piece in January 1981 that perfectly toed the D&B company line with the (misleading) headline “Firm Avoids Security Breach with Customer Cooperation.”

“What can be done to prevent a security breach when an employee with access to sensitive information voluntarily leaves the company?” the article queried. According to the article, which quoted only one source (the president of NCSS, David Fehr), customers were “very cooperative about the password change.”19 Problem solved. No mention was made of the fact that the password file was found on a third party’s computer system or the possibility that customer files could have been accessed by an unauthorized person.

19. Shoor, “Firm Avoids Security Breach.”

Somehow, in the public eye, the case of the stolen password file had morphed into a story about how an ordinary employer, faced with an unpreventable “potential security problem,” did the right thing and “chose to let all of its timesharing customers in on things” so they could “change the locks.”20

20. Shoor, “Firm Avoids Security Breach,” 13.

Reporters, and the general public, didn’t know enough about technology to ask the right questions. In 1981, when the New York Times article was published, relatively few people had ever used a computer, or even knew what a password was. Home computers barely existed. Corporations, government, and research institutions rented space on timesharing servers, and only a small percentage of employees logged into the timesharing system to process data. Many people had no clue that their personal details were stored on a computer system at all. People didn’t understand what an electronic credit report was or what it could be used for, the concept of “identity theft” didn’t exist, and most thought an SSN was good for tax returns and, well, Social Security services.

As a result, the media did not effectively investigate and report on the volume of data that may have been exposed, instead focusing on the theft of the passwords themselves. It was as though the New York Times had published an article about an oil tanker hitting an iceberg and thoroughly reported the details of the accident, but failed to investigate whether any oil had actually spilled.

Skeletons in the Closet

The FBI investigation continued. Over time, former NCSS staff members and Department of Justice officials gabbed to the media. Finally, in July 1981, the New York Times published a lengthy exposé (“The Case of the Purloined Password”), and details of the breach finally—and briefly—emerged in the public eye. The New York Times article revealed a history of pervasive, and previously unreported, security issues within both NCSS and the timesharing industry as a whole.

Earlier cases from the late 1970s were suddenly exposed, indicating that customers had access to the sensitive NCSS directory—and therefore other customers’ passwords—for years. A Bank of America programmer once demonstrated that he could access the NCSS directory from the bank’s computers, which had prompted NCSS to conduct a “six-month security review.” Around the same time, in the late 1970s, NCSS discovered that a group of their own employees, based in Detroit, had hacked the system in order to routinely gain unauthorized access to customer files for more than a year.

Two NCSS executives also claimed that they had once been offered the password directory of their biggest direct competitor, the Service Bureau Corporation (SBC), for $5,000. Reportedly, SBC officials said they had “no record of the incident.”

Industry professionals waved off the problems. “Every timesharing firm in the world has these skeletons in the closet,” said Larry Smith, former employee of NCSS.21 Another professional, interviewed by the FBI, offered his opinion that the theft of the NCSS directory was not done for criminal purposes. “[H]e referred to the term ‘hackery’ which is known in the business as an attempt by someone to break the system. It was his opinion that the hackery would continue in the computer business as long as security for programs and computer information had loop holes in the design.”22

21. McLellan, “Case of the Purloined Password.”

22. FBI, “FD-302,” 65.

What We Can Learn

The NCSS breach of 1980 did not go down in history as the first “mega-breach” to capture the public’s attention—but perhaps it should have. Consider the vast troves of data that the NCSS systems almost certainly housed on behalf of its 3,100 corporate customers, which could have been accessed or modified.

Many of the issues raised in the NCSS case remain relevant in data breaches today. The NCSS breach demonstrates how classic security flaws contribute to the risk of data breaches, including:

  • Insider attacks

  • Default credentials

  • Shared passwords

  • Insecure password storage

  • Lack of effective monitoring

  • Vendor risks

In addition, the response of NCSS and its parent company, Dun & Bradstreet, included nascent elements of a modern breach response, including:

  • Digital evidence acquisition

  • Law enforcement involvement (clearly a “learning experience” at the time)

  • Formal breach notification

  • Public relations efforts specifically related to the breach

Above all, the NCSS breach was a landmark case because it illustrated how entrusting other people with data, and holding data on behalf of others, introduces risk for all. On the timesharing system, as in the modern cloud, customers fear that their data may be accessed by unauthorized parties. Hosting providers fear the potential for reputational and legal consequences in the event that a breach occurs. All parties must work together in order to minimize risk system-wide.

As we will see in this chapter, storing data inherently creates risk. As organizations rush to amass large volumes of data, data breaches naturally occur with greater frequency. In the next sections, we will learn how data collection creates risk and examine the five factors that influence the risk of a data breach. Finally, we will show how understanding these five factors can help security professionals effectively assess and manage the risk of a breach.

2.1 Data Is the New Oil

In March 1989, the massive Exxon Valdez oil spill devastated pristine Alaskan waters, immediately killing hundreds of thousands of animals and causing untold long-term damage to the marine environment. It was one of the worst environmental catastrophes ever caused by humans.

Ironically, just one month before the Exxon Valdez spill, Dun & Bradstreet executive George Feeney enthusiastically likened information to oil:23

23. Claudia H. Deutsch, “Dun & Bradstreet’s Bid to Stay Ahead,” New York Times, late ed. (East Coast), February 12, 1989, A1.

In the oil business you start off exploring for oil, you move on to producing and refining it, and only then do you worry about marketing and distributing it. . . . Well, think of the information business like the oil business. In the 1970’s and early 1980’s, we gathered data, processed it and refined it. Now the critical technology is making it available to customers.

Like early auto mechanics, the people who stored, used, and disposed of electronic data during the 1970s and 1980s did so without any thought of negative consequences. Indeed, it didn’t seem like there could be much of a downside to accumulating data; on the contrary, there was enormous potential. Published stories of computer break-ins were few and far between. There were no laws or regulations surrounding data storage or breach notification requirements. The term “data breach” didn’t even exist.

It turned out that, much like oil, data could spill and escape the confines of its containers. And spill it did.

2.1.1 Secret Data Collection

There are indications that some companies purposefully hid their data collection practices from the public, understanding that it would make people uncomfortable. For example, in 1981, the Los Angeles Times published an article called “TRW Credit-Check Unit Maintains Low Profile—and 86 Million Files.” While today this wouldn’t be considered news at all, at the time, the company’s business model was absolutely eye-opening for readers. The article began much like an exposé:24

24. Tom Furlong, “TRW Credit-Check Unit Maintains Low Profile—and 86 Million Files,” Los Angeles Times, September 18, 1981.

There are no windows facing the street, no corporate signs on the side, no markings to indicate what’s going on inside. There is virtually nothing to attract the glance of a passing motorist. The facade is no accident. Housed within are super-sensitive financial and credit records of virtually every Californian who has charged a washing machine at Montgomery Ward & Co., bought a meal on MasterCard or purchased an airline ticket with Visa.

The article goes on to explain to concerned and curious public readers how credit reports were collected, used, and updated. While credit reporting had existed for decades on a small scale, operating within specific geographic regions and industries, by the early 1980s, advancements in computer and communications technology allowed them to expand dramatically. “Histories that were once read over the phone to an inquiring business were now transmitted electronically. . . . [Credit reporting companies transformed] themselves from ‘local associations’ or ‘bureaus’ that clipped wedding announcements from newspapers to ‘efficient integrated systems serving an entire society.’”25

25. Mark Furletti, “An Overview and History of Credit Reporting,” Federal Reserve Bank of Philadelphia, June 2002.

By 1981, TRW stored “about 500 million lines of information on consumers, 25 times what it was 10 years ago, and 22 million lines on businesses, up from nothing five years ago. . . . The very existence of such a large data bank is somewhat Orwellian to those who worry the data will be misused.” No wonder the company kept a low profile.

2.1.2 The TRW Breach

The public’s fears appeared justified when TRW burst into the spotlight on June 21, 1984. “The credit ratings of the 90 million people tracked by TRW Information Services have been exposed to credit card thieves armed with simple home computers,” reported Lou Dolinar of Newsday.26 A password used by Sears to check customer credit reports had been stolen and posted to an electronic bulletin board, reportedly for as long as two and a half years.27

26. Lou Dolinar, “Computer Thieves Tamper with Credit,” Morning News (Wilmington, DE), June 21, 1984, 9.

27. Christine McGeever, “TRW Security Criticized,” InfoWorld, August 13, 1984, 14.

Unlike the New York Times reporter in the NCSS case just a few years earlier, Dolinar connected the dots. Only one TRW customer password was exposed, and yet he recognized that this password was the key to accessing all consumer data on the system—90 million records in total. The 1984 headline, “Computer Thieves Tamper with Credit,” immediately grabbed the attention of consumers. In contrast, the 1981 headline in the NCSS breach, “The Case of the Purloined Password,” held little meaning for the majority of the audience.

In fact, Dolinar’s conclusion—that the theft of the password “exposed” all of TRW’s consumer information to prospective thieves—represented the first time that the media held a company accountable due to the potential for unauthorized access to millions of accounts. TRW had the burden of proving that the accounts weren’t actually inappropriately accessed.

TRW denied that consumer data could have been exposed. “There is no evidence . . . that anyone used the code to break into the records stored in the computer, which include credit card numbers and other information on more than 100 million people throughout the nation,” said a TRW spokesperson. “All we know for sure is that the (secrecy of the) password was violated.”28

28. Marcida Dodson, “TRW Investigates ‘Stolen’ Password,” Los Angeles Times, June 22, 1984.

The hackers themselves disagreed. “I’m the one that did it,” a hacker called “Tom” told InfoWorld magazine. He added that TRW’s response was “a lie to keep themselves clean.”29

29. McGeever, “TRW Security Criticized,” 14.

A spokesperson for Sears confirmed that TRW had notified the company and changed the password. However, this fact did nothing to reassure consumers, who were the subjects of the “exposed” records and had not themselves been notified of the security breach.

Later referred to as “the first identity theft-related breach [to catch] the media’s eye” by security expert Lenny Zeltser,30 the TRW breach illustrated the risks of large-scale data accumulation. Regardless of whether all 90 million records had actually been stolen, the public held TRW accountable for the security of every record.

30. Lenny Zeltser, “Early Discussions of Computer Security in the Media,” SANS ISC InfoSec Forums, September 10, 2006, https://isc.sans.edu/forums/diary/Early+Discussions+of+Computer+Security+in+the+Media/1685.

In direct response to the TRW breach, Representative Dan Glickman of the U.S. House added an amendment to the pending Counterfeit Access Device and Abuse Act of 1984, which made it “a federal crime to obtain unauthorized computer access to information protected by the Privacy Act and the Fair Credit Reporting Act.”31 The focus of regulation in the 1980s remained squarely on punishing the hackers, rather than holding organizations accountable for implementing appropriate computer security measures to protect against data breaches. It would take two more decades before U.S. lawmakers passed regulations aimed at holding data custodians accountable.

31. Mitch Betts, “DP Crime Bill Toughened,” ComputerWorld, July 2, 1984.

Even as the volume of data collection and processing continued to increase, measures for securely storing data lagged behind. Data was stored in poorly secured containers and transported over unencrypted communications lines. Measures for detecting and responding to “data spills” were virtually nonexistent. Data breaches became a systemic, widespread, and pervasive problem.

2.2 The Five Data Breach Risk Factors

Data is hazardous material. The more you have, the greater your risk of a data breach. In order to effectively manage the risk, you must understand the factors that contribute to the risk of a data breach.

There are five general factors that influence the risk of a data breach. These risk factors are:

  1. Retention: The length of time that the data exists

  2. Proliferation: The number of copies of data that exist

  3. Access: The number of people who have access to the data, the number of ways that the data can be accessed, and the ease of obtaining access

  4. Liquidity: The time required to access, transfer, and process the data

  5. Value: The amount the data is worth

The evolution of technology has increased the risk in each of these five areas, as we will see in the next sections.
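To make the framework concrete, the five factors can be sketched as a simple scoring model. (The 1-to-5 scale, the equal weighting, and the example ratings below are illustrative assumptions of mine, not part of the framework itself; a real assessment would weight each factor for its context.)

```python
# Illustrative only: a toy additive score over the five breach risk factors.
from dataclasses import dataclass


@dataclass
class DataSetRisk:
    """Rates a data set on each factor from 1 (low) to 5 (high)."""
    retention: int      # how long the data exists
    proliferation: int  # how many copies exist
    access: int         # how many people/paths can reach it, and how easily
    liquidity: int      # how quickly it can be accessed, transferred, processed
    value: int          # how much the data is worth

    def risk_score(self) -> int:
        """Unweighted sum; higher means greater breach risk."""
        return (self.retention + self.proliferation + self.access
                + self.liquidity + self.value)


# Hypothetical rating of a long-retained, widely accessible password directory
directory = DataSetRisk(retention=5, proliferation=4, access=5,
                        liquidity=4, value=5)
print(directory.risk_score())  # 23 out of a maximum of 25
```

Even this crude arithmetic captures the chapter's point: each factor independently pushes the total upward, so reducing any one of them (shorter retention, fewer copies, tighter access) lowers overall risk.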

2.3 The Demand for Data

Today, many types of organizations—and individuals—acquire sensitive personal data. These organizations fuel the market for data. Key players include advertising agencies, media outlets, data analytics firms, software companies, and data brokers. Data from your organization may end up in their hands, either through legitimate transactions or as the result of theft and data laundering.

Understanding how sensitive data is used, and why it is valuable, will help you evaluate the risk of storing, processing, or transferring a data set. In this section, we will examine key players in the data market and analyze how their demand for sensitive data influences the risks of a data breach.

2.3.1 Media Outlets

Media outlets create strong incentives for data leaks. Many will quietly pay for confidential information, even when those providing it are breaking the law. For example, in 2008 Lawanda Jackson, an administrative specialist at UCLA Medical Center, was convicted of selling medical information about high-profile patients to the National Enquirer, including information regarding the treatment of Britney Spears, Farrah Fawcett, Maria Shriver, and others. Prosecutors said that the National Enquirer “deposited checks totaling at least $4,600 into her husband’s checking account beginning in 2006.”32

32. Shaya Tayefe Mohajer, “Former UCLA Hospital Worker Admits Selling Records,” San Diego Union-Tribune, December 2, 2008, http://www.sandiegouniontribune.com/sdut-medical-records-breach-120208-2008dec02-story.html.

The National Enquirer was caught only because celebrity Farrah Fawcett essentially set up a sting. Details about Fawcett’s medical treatment were repeatedly reported in the National Enquirer. Eventually, she became convinced that they were being leaked from the UCLA healthcare facility itself, where she was treated. When she experienced a resurgence of cancer, she spoke with her doctor and agreed that they would withhold the news from family and friends. “I set it up with the doctor,” said Fawcett. “I said, ‘OK, you know and I know.’ . . . I knew that if it came out, it was coming from UCLA.” Days later, the Enquirer ran a story about Fawcett’s latest medical diagnosis. “I couldn’t believe how fast it came out,” she said.33

33. Charles Ornstein, “Farrah Fawcett: ‘Under a Microscope’ and Holding On to Hope,” ProPublica, May 11, 2009, https://www.propublica.org/article/farrah-fawcett-under-a-microscope-and-holding-onto-hope-511.

The hospital employee was tried and convicted—but what were the consequences for the National Enquirer? Before Fawcett died, she made it clear that she wanted to see the magazine charged: “They obviously know it’s like buying stolen goods. They’ve committed a crime. They’ve paid her money,” she said.34

34. Ornstein, “Farrah Fawcett.”

The Enquirer defended its actions in a statement, saying, “[Fawcett’s] public discussion of her illness has provided a valuable and important forum for awareness about the disease.”35 Both Fawcett and Jackson died before charges were filed against the tabloid.36

35. Ornstein, “Farrah Fawcett.”

36. Jim Rutenberg, “The Gossip Machine, Churning Out Cash,” New York Times, May 21, 2011, http://www.nytimes.com/2011/05/22/us/22gossip.html.

The Farrah Fawcett case was not an isolated incident—far from it. In another case, Dawn Holland, a former employee of the Betty Ford clinic, confessed that the media outlet TMZ paid her $10,000 for information and a copy of a report that detailed an internal incident involving Lindsay Lohan. Patient confidentiality at the Betty Ford clinic is protected under state and federal regulation, according to clinic documents.37 TMZ apparently took steps to cover up the flow of money. According to the New York Times, “TMZ paid [Holland] through a bank account of her lawyer at the time, Keith Davidson, who has other clients who have appeared on TMZ. . . . She said that TMZ had called her incessantly after the incident, and that she finally agreed to talk after the treatment center suspended her.”38

37. Patient confidentiality is federally protected by Alcohol and Drug Abuse Patient Records, 42 C.F.R. pt. 2; and/or HIPAA Privacy Regulations, 45 C.F.R. pts. 160, 164. See Hazelden Betty Ford Foundation, Authorization to Disclose Medical Records, https://www.hazelden.org/web/public/document/privacy-notice.pdf (accessed May 12, 2019).

38. Rutenberg, “Gossip Machine.”

The public’s lust for personal details of celebrities’ lives, including health and medical information, creates an unyielding revenue stream for the magazines, websites, and TV shows that supply it. “An analysis of advertising estimates from those outlets shows that the revenue stream now tops more than $3 billion annually, driving the gossip industry to ferret out salacious tidbits on a scale not seen since the California courts effectively shut down the scandal sheets of the 1950s.”39

39. Rutenberg, “Gossip Machine.”

Where do media outlets get the juicy tidbits that fuel their businesses? A whole support industry has sprung up to harvest data about celebrities and other newsworthy people, generating quick cash for suppliers of sensitive data. “This new secrets exchange has its own set of bankable stars and one-hit wonders, high-rolling power brokers and low-level scammers, many of whom follow a fluid set of rules that do not always comport with those of state and federal law, let alone those of family or friendship,” reported the New York Times.40

40. Rutenberg, “Gossip Machine.”

“We pay CA$H for Valid, Accurate, Usable Tips on Celebrities,” advertised one gossip data broker, Hollywoodtip.com.41 Lured by the promise of easy money, low-wage healthcare employees like Jackson and Holland (who made only $22,000 a year) can be tempted to spill the beans.

41. Rutenberg, “Gossip Machine.”

In the days after pop star Michael Jackson’s death (in June 2009), the Los Angeles County Coroner’s Department found itself under siege. “[T]he offer for pictures of Michael Jackson in our building was worth $2 million the day after he died,” said Deputy Coroner Ed Winter. “We had to shut down public access to our building. We had people literally climb the back fence trying to break in and get what they could.”

Law enforcement officials do investigate celebrity data breach cases, with limited success. The Department of Justice has “conducted a wide-ranging investigation into illegal leaks of celebrity health records and other confidential files,” including cases involving Fawcett, Spears, and Tiger Woods. However, since payments for data are often made in cash, often through intermediaries, they are difficult to track. The emergence of intermediary data brokers that support media has made it even more challenging for law enforcement to determine how a leak occurred, let alone prosecute. “Sometimes I think we’re losing,” said one investigator.42

42. Rutenberg, “Gossip Machine.”

2.3.2 Big Advertising

Marketing agencies can find enormous value in personal data, whether they serve retailers, entertainment, healthcare, or another industry. “Name a condition—Alzheimer’s disease, a weak heart, obesity, poor bladder control, clinical depression, irritable bowel syndrome, erectile dysfunction, even HIV—and some data brokers will compile a list of people who have the condition, and will sell the list to companies for direct marketing,” says Adam Tanner, author of Our Bodies, Our Data, an exposé on the medical data market.43

43. Adam Tanner, Our Bodies, Our Data: How Companies Make Billions Selling Our Medical Records (Boston: Beacon Press, 2017), 130.

Retailers want to lure consumers who have specific needs, such as pregnancy products or diabetes support. Healthcare providers will pay for lists of potential patients, so that they can target advertising based on their specialties. Pharmaceutical companies have direct incentive to advertise to persons who suffer from illnesses that their products can treat—as well as their doctors. Attorneys engaged in class-action lawsuits might want to send a notification to persons with a specific ailment related to their case. Media providers may want to place targeted ads for films or shows that appeal to people with certain interests, drawing on sensitive data such as sexual orientation.

Today, health data is combined with consumer profile databases from big data brokers such as Acxiom to produce frighteningly comprehensive profiles on consumers. What’s more, data analytics firms can use propensity modeling to predict a subject’s ailments or health-related interests based on consumer profile data. Intimate consumer data thus fuels digital advertising and analytics, which in turn generate new data that further augments and refines consumer profiles.

One former IMS Health executive, Bob Merold, casually described how consumer medical data is used to target online advertising: “Companies like IMS are selling ‘Here [are] four million patients with erectile dysfunction and here [are] their profiles,’ and then Google puts it into their algorithms so that the Viagra ads show up when you are searching fishing or whatever the heck the things are that correlate.”44

44. Tanner, Our Bodies, Our Data, 135–36.

2.3.3 Big Data Analytics

Big data analytics is a burgeoning industry. Organizations within the healthcare ecosystem have an incentive to leverage data from all facets of patients’ lives, in order to more efficiently and effectively diagnose and serve patients, and make money. Big data analytics has created enormous advances in clinical operations, medical research and development, treatment cost predictions, and public health management, to name a few areas.

“McKinsey estimates that big data analytics can enable more than $300 billion in savings per year in U.S. healthcare, two thirds of that through reductions of approximately 8% in national healthcare expenditures. Clinical operations and R & D are two of the largest areas for potential savings with $165 billion and $108 billion in waste respectively.”45

45. Wullianallur Raghupathi and Viju Raghupathi, “Big Data Analytics in Healthcare: Promise and Potential,” Health Information Science and Systems 2, no. 1 (2014): article 3, doi: 10.1186/2047-2501-2-3.

Many other types of organizations can likewise gain advantages by leveraging health data and derived products, including advertising firms, entertainment vendors, and retailers. This expanding marketplace has increased the value of personal health data and created new incentives for selling, trading, processing, and hoarding it. As technology advances and data mining becomes increasingly sophisticated, raw data is much like crude oil: unrefined and full of potential.

Health data analytics relies on stores of personal health data, such as:

  • Prescription records

  • Lab test results

  • Sensor data, such as heart rate, blood pressure, insulin levels

  • Doctor’s notes

  • Medical images (X rays, CAT scans, MRIs, etc.)

  • Insurance information

  • Billing details

In addition, personal health data can be augmented with other types of personal information, such as:

  • Social media activity

  • Web search queries

  • Shopping history

  • Credit card transactions

  • GPS location history

  • Demographic records

  • Interests and characteristics derived from other sources

“You may soon get a call from your doctor if you’ve let your gym membership lapse, made a habit of picking up candy bars at the check-out counter or begin shopping at plus-sized stores,” reported Bloomberg News in 2014.46 At the time, Carolinas HealthCare System had just purchased consumer data on 2 million people, including shopping histories and credit card transactions.

46. Shannon Pettypiece and Jordan Robertson, “Hospitals Soon See Donuts-to-Cigarette Charges for Health,” Bloomberg, June 26, 2014, https://www.bloomberg.com/news/articles/2014-06-26/hospitals-soon-see-donuts-to-cigarette-charges-for-health.

Carolinas HealthCare used the data to assign a risk score to patients and ultimately planned to regularly share patient risk scores with doctors and nurses, so they could proactively reach out to high-risk patients.47 The Affordable Care Act increasingly tied healthcare reimbursements to quality metrics and clinical outcomes, giving hospitals increased incentive to invest in big data analytics that could help them to reduce readmission rates and improve overall patient health.

47. Shannon Pettypiece and Jordan Robertson, “Hospitals, Including Carolinas HealthCare, Using Consumer Purchase Data for Information on Patient Health,” Charlotte Observer, June 27, 2014, http://www.charlotteobserver.com/living/health-family/article9135980.html.

Of course, injecting new kinds of consumer data into the healthcare ecosystem increases the amount of sensitive data and therefore the risk of a potential data breach.

2.3.4 Data Analytics Firms

Big data analytics is increasingly conducted by specialized data analytics firms, which collect data from a variety of sources and produce derivative data products to be purchased or leveraged by clients. Processing data on a large scale requires a proportional investment in hardware and software for processing, as well as raw collections of data assets to be used for training and development purposes. Analytics firms typically have a complex web of relationships including data sources, customers, data brokers, and other analytics firms. Personal data flows through this web, often winding up in unexpected places.

Truven Health Analytics is a medical data analytics firm. According to the company’s quarterly SEC report, in 2013 Truven held approximately 3 PB of data, which included “20 billion data records on nearly 200 million de-identified patient lives.”48 Where did Truven’s patient lives come from? Originally started as MedStat Systems, the company collected and analyzed insurance claims from large enterprises, including General Electric, Federal Express, and others.49 It offered clients free analytics products in exchange for the right to resell their anonymized data. In 1994, the company was sold to Thomson, which later merged with Reuters.

48. U.S. Securities and Exchange Commission (SEC), “Truven Holding Corp./Truven Health Analytics, Inc.,” Form 10-K, 2013, https://www.sec.gov/Archives/edgar/data/1571116/000144530514001222/truvenhealthq410-k2013.htm.

49. Tanner, Our Bodies, Our Data, 69.

Adam Tanner, the author of Our Bodies, Our Data, was a journalist working for Thomson-Reuters in 2007, when the companies merged. “We journalists felt complete surprise when we learned our new combined company now had an insurance database with tens of millions of patient histories,” Tanner reflected.50

50. Tanner, Our Bodies, Our Data, 69.

Explorys was another health data analytics firm that emerged as a leader in the mid-2000s. A spinoff of the Cleveland Clinic, Explorys amassed a database containing 50 million patient lives, collected from 360 hospitals.51

51. Rajiv Leventhal, “Explorys CMO: IBM Deal Will Fuel New Predictive Power,” Healthcare Informatics, April 15, 2015, https://www.healthcare-informatics.com/article/explorys-cmio-ibm-deal-will-fuel-new-predictive-power.

Today, tech companies such as IBM purchase patient “lives” in bulk, to fuel the next generation of artificial-intelligence-driven medical diagnostic tools. In 2015, IBM launched IBM Watson Health, a cloud-based health analytics platform driven by the Watson artificial intelligence system. Subsequently, IBM invested heavily in building its collection of health data. By April 2016, it had acquired four health data companies, including Explorys (for an undisclosed sum) and Truven ($2.6 billion and 215 million patient lives).

All told, by the end of 2016 IBM Watson had amassed more than 300 million patient lives. The company touted its “HIPAA-enabled” cloud, enticing more healthcare providers to upload their data to the system and partner with the tech giant. IBM also strategically partnered with Apple, releasing a ResearchKit for developers that enabled health apps on the Apple Watch or iPhone to store and analyze personal health data using the Watson Health cloud as the back end. The first app, SleepHealth, was released in 2016.52

52. Laura Lorenzetti, “IBM Debuts Apple ResearchKit Study on Watson Health Cloud,” Fortune, March 2, 2016, http://fortune.com/2016/03/02/ibm-watson-apple-researchkit.

Big data analytics holds enormous potential. Like any powerful tool, it can be harnessed for the benefit of society or cause great damage if not carefully controlled. The newly emerging industry has incentivized retention, fueled proliferation, expanded access, increased data liquidity, and increased value of personal health data—all five factors that increase the risk of data breaches.

2.3.5 Data Brokers

Data brokers, according to the Federal Trade Commission (FTC), are “companies that collect information, including personal information about consumers, from a wide variety of sources for the purpose of reselling such information to their customers for various purposes, including verifying an individual’s identity, differentiating records, marketing products, and preventing financial fraud.”53

53. Federal Trade Commission, Protecting Consumer Privacy in an Era of Rapid Change (Washington, DC: FTC, 2012), https://www.ftc.gov/sites/default/files/documents/reports/federal-trade-commission-report-protecting-consumer-privacy-era-rapid-change-recommendations/120326privacyreport.pdf.

Data brokers are a key part of the data supply chain, which incentivizes and perpetuates data breaches simply as a natural result of its existence. For example, data such as a purchase history may be generated by consumers shopping in a store; collected by the retailer; and sold to a data broker, which analyzes it and categorizes the user as, say, an expectant mother. That data broker sells it to a larger data broker, which merges it with credit reports to generate a list of low-income expectant mothers. This list, in turn, is purchased by a marketing firm, which uses it to advertise on behalf of a diaper manufacturer.

To support their business model, data brokers amass a vast and varied trove of consumer data, including purchase histories, health issues, web browsing activity, financial details, employment records, daily habits, ethnicity, and more. The FTC conducted a study of nine data brokers in 2014 and found that “[d]ata brokers collect and store a vast amount of data on almost every U.S. household and commercial transaction. . . . [O]ne data broker’s database has information on 1.4 billion consumer transactions and over 700 billion aggregated data elements; another data broker’s database covers one trillion dollars in consumer transactions; and yet another data broker adds three billion new records each month to its databases.”54

54. Federal Trade Commission, Data Brokers: A Call for Transparency and Accountability (Washington, DC: FTC, 2014), iv, https://www.ftc.gov/system/files/documents/reports/data-brokers-call-transparency-accountability-report-federal-trade-commission-may-2014/140527databrokerreport.pdf.

This data is analyzed and distilled to create data products, such as those designed to facilitate decision making (background checks, credit scores), marketing, and more. Brokers therefore maintain not only stores of raw data collected from many sources; they also maintain neatly packaged data products that include inferences derived using data analytics. These products are valuable for a variety of reasons. “[W]hile data brokers have a data category for ‘Diabetes Interest’ that a manufacturer of sugar-free products could use to offer product discounts, an insurance company could use that same category to classify a consumer as higher risk.” 55

55. FTC, Data Brokers, vi.

No one knows exactly how many data brokers exist. Pam Dixon, executive director of the World Privacy Forum, estimated in 2013 that the industry included between 3,500 and 4,000 companies.56 The data-driven marketing economy (DDME) (which includes the subset of data brokers that help businesses select and market to consumers) was valued at $202 billion in 2015.57

56. U.S. Senate, What Information Do Data Brokers Have on Consumers, and How Do They Use It? (Washington, DC: GPO, 2013), 75, https://www.gpo.gov/fdsys/pkg/CHRG-113shrg95838/pdf/CHRG-113shrg95838.pdf.

57. John Deighton and Peter A. Johnson, “The Value of Data: 2015,” Data and Marketing Association, December 2015, https://thedma.org/wp-content/uploads/Value-of-Data-Summary.pdf.

By purchasing, selling, and sharing information, data brokers increase the number of copies of data, as well as the number of people who have access to a given piece of information. Data brokers distill huge, complex data sets into concise, highly liquid snippets of structured data, designed for easy transfer to other organizations. Many retain data “indefinitely” to facilitate future analysis or for the purposes of identity verification.58 And of course, the data broker’s goal is to maintain and increase the value of the data it holds; data is its product.

58. FTC, Data Brokers, vi.

In short, data brokers, like other key players in the nascent data economy, inherently increase all five of the data breach risk factors.

2.4 Anonymization and Renonymization

It is common practice for organizations to “anonymize” data sets, removing explicit identifiers such as names and SSNs, and often replacing them with individual identifiers such as numeric codes. This is also known as de-identification. The goal is to reduce the risk associated with data exposure, while retaining valuable data that can be mined. By removing identifying characteristics, data custodians theorize, individuals cannot be harmed by data exposure. Regulations such as HIPAA and other laws take anonymization into account; typically security and breach notification requirements do not apply to anonymized data.

Often, data custodians assume that if a data set is “anonymized,” it is safe to share and publish without risk of harm. Unfortunately, this is not the case. Anonymization is often reversible. To the naked eye, an anonymized data set might seem impossible to map back to the individual named subjects, but in many cases, such a task can be rendered trivial. How? Even anonymized data contains information that can be unique to an individual, such as the timing of hospital visits, specific combinations of “lifestyle interests,” and personal characteristics. By mapping these unique details to other data sources, such as a voter registration list, purchase histories, marketing lists, or other data sets, it is possible to link databases and ultimately identify individuals.

This means that even anonymized data carries a risk of causing a breach. To demonstrate this, in 1997 Harvard University researcher Latanya Sweeney famously identified Governor William Weld’s hospital records in a de-identified database released by the Massachusetts Group Insurance Commission (GIC). As described by law professor Paul Ohm:59

59. Paul Ohm, “Broken Promises of Privacy: Responding to the Surprising Failure of Anonymization,” UCLA Law Review 57 (2010): 1701, https://papers.ssrn.com/sol3/papers.cfm?abstract_id=1450006 (accessed January 18, 2018).

At the time GIC released the data, William Weld, then Governor of Massachusetts, assured the public that GIC had protected patient privacy by deleting identifiers. In response, then-graduate student Sweeney started hunting for the Governor’s hospital records in the GIC data. She knew that Governor Weld resided in Cambridge, Massachusetts, a city of 54,000 residents and seven ZIP codes. For twenty dollars, she purchased the complete voter rolls from the city of Cambridge, a database containing, among other things, the name, address, ZIP code, birth date, and sex of every voter. By combining this data with the GIC records, Sweeney found Governor Weld with ease. Only six people in Cambridge shared his birth date, only three of them men, and of them, only he lived in his ZIP code. In a theatrical flourish, Dr. Sweeney sent the Governor’s health records (which included diagnoses and prescriptions) to his office.
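The linkage Sweeney performed can be sketched in a few lines of code. The records below are invented stand-ins for the de-identified GIC data and the Cambridge voter rolls; only the join on the quasi-identifiers (ZIP code, birth date, sex) reflects the actual technique.

```python
# Hypothetical de-identified hospital records: names removed, but
# quasi-identifiers (ZIP, birth date, sex) retained.
hospital_records = [
    {"zip": "02138", "birth_date": "1945-07-31", "sex": "M", "diagnosis": "dx-104"},
    {"zip": "02139", "birth_date": "1951-02-14", "sex": "F", "diagnosis": "dx-207"},
]

# Hypothetical public voter roll: names attached to the same quasi-identifiers.
voter_rolls = [
    {"name": "William Weld", "zip": "02138", "birth_date": "1945-07-31", "sex": "M"},
    {"name": "Jane Smith", "zip": "02139", "birth_date": "1960-03-02", "sex": "F"},
]

def reidentify(hospital, voters):
    """Join the two data sets on (ZIP, birth date, sex). A unique match
    attaches a name to a supposedly anonymous medical record."""
    matches = []
    for rec in hospital:
        key = (rec["zip"], rec["birth_date"], rec["sex"])
        candidates = [v for v in voters
                      if (v["zip"], v["birth_date"], v["sex"]) == key]
        if len(candidates) == 1:  # exactly one voter fits: re-identified
            matches.append((candidates[0]["name"], rec["diagnosis"]))
    return matches
```

In this toy data, only the first hospital record links to a unique voter, so `reidentify()` returns a single (name, diagnosis) pair—exactly the kind of result Sweeney mailed to the governor's office.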

Some methods of anonymization leave more residual risk than others. The risk depends on precisely what information remains in the data set after anonymization. Data custodians must decide which details to remove and which to leave in the data set. If too much information remains and the data set is exposed, then it can cause a breach (at least, as defined by the public, if not the law).

2.4.1 Anonymization Gone Wrong

Netflix discovered the hard way that ineffective anonymization can lead to data exposure, public relations crises, and lawsuits. In 2006, as a publicity stunt, the company launched a contest to see who could create the best algorithm for predicting user film ratings, based on each user’s previous film ratings. In support of this, Netflix released an “anonymized” data set containing 100 million movie ratings from more than 480,000 subscribers. Each entry in the data set included a numeric identifier unique to each subscriber, details of each movie rated, the date it was rated, and the subscriber’s rating.

Researchers Arvind Narayanan and Vitaly Shmatikov weren’t as interested in film predictions as they were in privacy. They analyzed the Netflix data set and found that they could re-identify users by comparing the entries to a small sample of publicly available movie ratings from the website IMDB. Due to IMDB’s terms of service, the researchers used only a small subset of the available public reviews. Even so, based only on the dates and titles of the reviews, they linked two IMDB reviewers who also appeared in the Netflix data set, identifying them through their public IMDB profiles as a proof of concept.
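The core of the attack can be illustrated with a toy version. This is a deliberate simplification, not Narayanan and Shmatikov's actual scoring algorithm, and all of the titles, dates, and usernames below are invented: count how many of an anonymized subscriber's (title, date) pairs also appear, within a few days, in each public profile, and flag the profile with the highest overlap.

```python
from datetime import date

# Hypothetical anonymized Netflix-style history: (title, rating, date rated)
anon_history = [
    ("Movie A", 5, date(2005, 3, 1)),
    ("Movie B", 1, date(2005, 3, 4)),
    ("Movie C", 4, date(2005, 6, 20)),
]

# Hypothetical public IMDB-style reviews, keyed by username: (title, review date)
public_reviews = {
    "user_x": [("Movie A", date(2005, 3, 2)), ("Movie C", date(2005, 6, 21))],
    "user_y": [("Movie D", date(2005, 1, 5))],
}

def match_score(history, reviews, window_days=3):
    """Count titles rated in the anonymized history that a public profile
    also reviewed within a few days."""
    score = 0
    for title, _rating, d in history:
        for r_title, r_date in reviews:
            if title == r_title and abs((d - r_date).days) <= window_days:
                score += 1
    return score

# The public profile most consistent with the anonymized history
best = max(public_reviews, key=lambda u: match_score(anon_history, public_reviews[u]))
```

Even in this crude form, `best` singles out `user_x`, because two of the three anonymized ratings line up with that profile's public reviews. With enough rated titles, such a combination is effectively unique.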

“Jane Doe,” who watched movies in Netflix’s “Gay and Lesbian” categories, was among those whose movie-rating history was made public in the Netflix Prize dataset. She initiated a class-action lawsuit against Netflix, claiming that if her sexual orientation were to become known, “it would negatively affect her ability to pursue her livelihood and support her family and would hinder her and her children’s’ ability to live peaceful lives within Plaintiff Doe’s community.”60 As a provider of audiovisual recordings, Netflix was regulated under the Video Privacy Protection Act of 1988, which was established after U.S. Supreme Court nominee Robert Bork’s video rental history was leaked to the press.

60. Jane Doe v. Netflix, Inc., 2009, San Jose Division, CA, https://www.wired.com/images_blogs/threatlevel/2009/12/doev-netflix.pdf.

In court documents, the plaintiffs described what they called “The Brokeback Mountain Factor”: essentially, the concept that a person’s movie-watching history can reveal far more than just a person’s entertainment preferences. “A Netflix member’s movie data may reveal that member’s private information such as sexuality, religious beliefs, or political affiliations. Such data may also reveal a member’s personal struggles with issues such as domestic violence, adultery, alcoholism, or substance abuse.”61 Netflix, under pressure from the FTC and the public, ultimately settled the lawsuit and canceled the Netflix Prize contest.62

61. Jane Doe v. Netflix.

62. Steve Lohr, “Netflix Cancels Contest after Concerns are Raised about Privacy,” New York Times, March 12, 2010, http://www.nytimes.com/2010/03/13/technology/13netflix.html.

2.4.2 Big Data Killed Anonymity

Data brokers, which store billions of pieces of data on millions of consumers, have access to big databases that can easily facilitate renonymization and linking of disparate data sets. In fact, data brokers frequently link data sets procured from different sources and offer products designed to connect online consumers with their offline activities. Data brokers may purchase anonymized data sets from different sources and use automated tools to connect the dots if it suits their business needs.

The National Security Agency (NSA)—one of the world’s largest data aggregators—was reportedly able to identify pseudonymous Bitcoin creator Satoshi Nakamoto using stylometry, combined with access to an enormous database of writing samples.

“By taking Satoshi’s texts and finding the 50 most common words, the NSA was able to break down his text into 5,000 word chunks and analyse each to find the frequency of those 50 words,” explained entrepreneur Alexander Muse. The NSA then compared the “fingerprint” of Satoshi’s writing style with intelligence databases containing trillions of writing samples, including the PRISM and MUSCULAR programs. “[T]he NSA was able to place trillions of writings from more than a billion people in the same plane as Satoshi’s writings to find his true identity. The effort took less than a month and resulted in positive match.”64

64. Alexander Muse, “How the NSA Identified Satoshi Nakamoto,” Medium, August 26, 2017, https://medium.com/cryptomuse/how-the-nsa-caught-satoshi-nakamoto-868affcef595.
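A drastically simplified stylometric fingerprint can be sketched as follows. The word list, sample texts, and distance measure here are illustrative only; the NSA's actual corpus and method are known solely through Muse's secondhand account.

```python
import math
from collections import Counter

# A short illustrative list of common function words (a real analysis
# would use the top 50 words drawn from the corpus itself).
COMMON_WORDS = ["the", "of", "and", "to", "a", "in", "that", "is", "it", "for"]

def fingerprint(text):
    """Return the relative frequency of each common word in the text."""
    words = text.lower().split()
    counts = Counter(words)  # missing words count as zero
    total = max(len(words), 1)
    return [counts[w] / total for w in COMMON_WORDS]

def distance(fp1, fp2):
    """Euclidean distance between two fingerprints; smaller = more similar."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(fp1, fp2)))

# Invented texts: one candidate shares the sample's function-word habits,
# the other does not.
sample = "the design of the system is described in the paper"
candidate_a = "the structure of the network is outlined in the document"
candidate_b = "wow amazing stuff here totally unrelated writing style"

fp = fingerprint(sample)
closer = min([candidate_a, candidate_b], key=lambda t: distance(fp, fingerprint(t)))
```

At the scale Muse describes, the same idea is applied to trillions of writing samples: compute a fingerprint for each, then rank candidates by distance from the target's fingerprint.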

In short, the bigger the data broker, the easier it is to pick you out of a supposedly “anonymized” data set.

2.5 Follow the Data

Driven by the huge potential for efficiency gains and profit, organizations of all kinds have amassed computerized data at a tremendous pace. Handwritten records have been entered into databases; file cabinets have been scanned and ultimately emptied in the shift to electronic records. Freed of physical constraints, organizations had the capacity to store more data. What’s more, they could analyze more data, too, now that data retrieval times were measured in milliseconds rather than minutes. Data became more liquid—easier to transfer—due to digitization and the emergence of structured data formats. This made it easier to share and trade, leading to even greater proliferation and the emergence of data markets.

In this section, we will trace the flow of data through an example supply chain, in order to better understand how the risk of data breaches has expanded over the years. We will focus on personal health data for this illustration, since it is a good example that shows the complex data processing relationships within the modern economy.

2.5.1 Pharmacies: A Case Study

In the late 1970s and early 1980s, pharmacies all over the United States began to install computer systems. Voluminous file cabinets containing personal information, insurance details, and prescriptions were digitized. This enabled pharmacists to process orders much faster, detect errors, save time on billing (which was highly complex due to insurance reimbursements), and identify fraud. Customers enjoyed added perks such as the ability to fill prescriptions at multiple locations in a chain—a strong competitive advantage.

Soon, pharmacists discovered that they could leverage their computerized databases to make extra money. For example, Thomas Menighan, owner of the Medicine Shoppe in West Virginia, installed a computer system in 1978. He was quickly approached by a company called IMS Health, which “offered to pay him fifty dollars a month to copy his prescription files onto an eight-inch floppy disk and send it in by mail.”65 Excited about the revenue, Menighan copied his pharmacy’s database to the disk, mailed it in, and received $50. “I thought I was making out like a bandit!” the pharmacist exclaimed.

65. Tanner, Our Bodies, Our Data, 14.

Patients, had they known, might have agreed. Very few realized that their prescription data was shared outside the pharmacy, and fewer still knew that it was used to help pharmaceutical companies target doctors in sophisticated data-driven marketing and sales programs.

IMS Health, founded in 1954, was an early data broker. It provided market intelligence information to pharmaceutical companies and other organizations. By collecting detailed prescription and sales records from pharmacies, the company could provide drug manufacturers with reports about what products were actually moving off pharmacies’ shelves.66

66. Tanner, Our Bodies, Our Data, 14.

“Look, you are creating data as a by-product. It’s an exhaust from your system,” said IMS executive Roger Korman, describing how the company convinced sources to share their data. “Why don’t you take that thing and turn it into an asset and sell it?”67

67. Tanner, Our Bodies, Our Data, 71.

IMS reports included medication and amounts dispensed, as well as the patient’s age and other characteristics. Individual names were typically (though not always) removed before the data was sent to IMS. Importantly, the prescribing doctor’s name was included. This “doctor-identified data” enabled IMS Health to sell detailed reports of doctors’ prescription histories to drug manufacturers, who then targeted individual doctors in sophisticated sales and marketing programs. Doctors were the gatekeepers to the market, drug manufacturers recognized. “Research has shown that winning just one more prescription per week from each prescriber, yields an annual gain of $52 million in sales,” advertised IMS. “So, if you’re not targeting with the utmost precision, you could be throwing away a fortune.”68

68. Tanner, Our Bodies, Our Data, 43.

Drug manufacturers flocked to purchase IMS reports. Today, a large pharmaceutical company might pay $10 to $40 million per year for IMS products and services, according to Adam Tanner. “Drug companies have to have them,” he writes, “whatever the cost—and the price is certainly high.”69

69. Tanner, Our Bodies, Our Data, 48–49.

Pharmacy chains, in turn, now routinely plan for revenue from the sale of their databases. “Pretty much everyone who is in the business has some sort of supply arrangement for de-identified prescription data,” said CVS Health executive Peter Lofberg. “CVS Caremark is one of the providers of data into that marketplace. On the retail side of the business, they also have pretty extensive data collection ranging from loyalty cards and that sort of thing to track people’s shopping patterns. Also on the retail pharmacy side, like most retailers, they will sell certain types of data to market research companies and so on.”70 According to Tanner, today’s pharmacies can generate about one cent for each prescription, which can add up to millions of dollars for large chains.71

70. Tanner, Our Bodies, Our Data, 16.

71. Tanner, Our Bodies, Our Data, 16.

2.5.2 Data Skimming

As organizations began to leverage third-party software providers, software vendors suddenly realized the value of the data that they could access—and decided that they, too, could profit from it. The result was that sensitive data that once resided within an organization was collected and mined by third-party providers, which leveraged it and sold data products to even more organizations. Data proliferated and spread, increasing risk of a breach for all parties in the data supply chain.

“W[e] are getting tons of data in real time!” thought Fritz Krieger, who was hired in 1998 to manage data sales on behalf of a company called Cardinal Health. Cardinal Health was a drug wholesaler that also offered a service called ScriptLINE, which helped pharmacists maximize and manage their insurance reimbursements. That meant it had instant access to each transaction as it was processed. Cardinal teamed up with CVS, Wal-Mart, Kmart, and Albertson’s to create an online product called “R(x)ealTime,” which provided real-time sales data to subscribers.72

72. Bizjournals.com, “Cardinal Health, Others Form Prescription-Data Analysis Firm,” Columbus Business First, July 30, 2001, https://www.bizjournals.com/columbus/stories/2001/07/30/daily2.html.

Cardinal Health was just one of many software providers that profited from the data that flowed through their products. “As more insurance plans covered prescription drugs, a layer of data processors called clearinghouses, or switches, emerged,” explains Tanner. “Those companies route claims from the pharmacy or doctor’s office to those paying the bills such as the insurance company or . . . Medicare. Entrepreneurs running switches and pharmacy software programs learned that they could make extra cash by selling their expertise to the secondary market.”73 Often, pharmacists themselves did not even know who was selling “their” data.

73. Tanner, Our Bodies, Our Data, 17.

As the realization dawned that software providers were “skimming” data from electronic transactions, pharmacists pushed back. In 1994, two Illinois pharmacies sued a small software company, Mayberry Systems, alleging that the software provider sold their prescription data without authorization (“misappropriation of trade secrets”). The lawsuit was later expanded to include IMS Health, a purchaser of the data, and certified as a class action to include all 350 pharmacies that were Mayberry customers. Later, in 2003, two pharmacies sued IMS Health and 60 software providers from which it purchased data, alleging that they “misappropriated the trade secrets (i.e., prescription data) of thousands of pharmacies in the United States and used this information either without authorization or outside the scope of any authorization.” IMS settled both lawsuits in 2004 for approximately $10.6 million, and continued its work.74

74. U.S. Securities and Exchange Commission (SEC), “IMS Health Incorporated 2004 Annual Report to Shareholders,” Exhibit 13, https://www.sec.gov/Archives/edgar/data/1058083/000104746905006554/a2153610zex-13.htm (accessed May 12, 2019).

The maturation of AllScripts took medical data skimming to a whole new level. Developed as a service for doctors to electronically send prescriptions to pharmacies, AllScripts expanded to include electronic medical records, thereby gaining access to in-depth patient records from nearly one in three doctors’ offices and half of all hospitals in the United States.75 There was profit to be made from harvesting, mining, and selling patient data. In 2000, IMS invested $10 million in AllScripts. Glen Tullman, former CEO of AllScripts, said, “Today, if you look at AllScripts, the data business is the only thing that is driving the growth of bottom-line earnings there. That’s a key jewel in the world today, and that’s data coming from electronic health records.”76

75. Tanner, Our Bodies, Our Data, 72.

76. Tanner, Our Bodies, Our Data, 72.

Practice Fusion, a web-based electronic health records (EHR) system, now offers its software free to healthcare providers. The company generates revenue by selling ads and sharing data with third parties. “[Practice Fusion] crunches 100 million patient records it has stored remotely in an online database to alert providers when treatments or tests might be needed,” reported the Wall Street Journal in 2015. “Some of those messages are sponsored, letting marketers deliver the ultimate nudge: a subtle pitch to the right doctor, about the right patient, at the right moment.”77

77. Elizabeth Dwoskin, “The Next Marketing Frontier: Your Medical Records,” Wall Street Journal, March 3, 2015, https://www.wsj.com/articles/the-next-marketing-frontier-your-medical-records-1425408631.

Even the biggest EHR players are getting in on the action: Cerner, the market leader in the $28 billion electronic medical record system market, sells access to its patient database. According to Senior Vice President David McCallie Jr., Cerner provides access using “data enclaves,” which allow customers to remotely analyze the data without downloading the full database.78 Cerner’s website advertises, “Our strategic analytics solutions offer the ability to discover new insights by providing pre-built content and [a] variety of analytic visualization tools.”79

78. Tanner, Our Bodies, Our Data, 142.

79. Cerner, Analytics: Uncover the Value of Your Data, https://www.cerner.com/solutions/population-health-management/analytics (accessed January 8, 2018).

The emergence of “data skimming” created a whole new market for medical data. At the same time, it dramatically increased the risk of data breaches. Sensitive data proliferated and spread to many more organizations. Those that already had sensitive data discovered that they could monetize it in new ways, which gave them incentives to collect even more.

2.5.3 Service Providers

Service providers, likewise, discovered that when they received data in order to provide a service, they could often reuse that data for wholly different purposes in order to make a profit. This fuels data proliferation, creates incentive for giving more people access to sensitive data, and increases the value of the raw data used to create data products.

Laboratories are a prime example. When a patient’s test results are ready, the lab can share the outcome not just with the doctor, but also with customers that pay to receive reports of results. The patient’s identifying information is typically removed in accordance with HIPAA, but doctor-identified data remains. That means drug companies know which doctors have patients with relevant diagnoses. Sales reps can immediately reach out to the doctor to convince him or her that their drug is the right treatment option, even before the doctor has a chance to see the patient again.

Prognos is a leading broker for laboratory records, boasting that its registry contains more than “11 billion clinical diagnostics records for 175 million patients across 35 disease areas.” Where does the data come from? Quest Diagnostics, LabCorp, Cigna, and Biogen have all been publicly named as “collaborators.” The company’s main product, Prognos DxCloud, “ingests all payer lab data, including connecting and extracting from new lab sources to achieve expanded lab data coverage. . . . The result is actionable member health insights available to payers through robust, secure data connectivity access and web services ensuring delivery of lab data to the right person at the right time.”80 Prognos DxCloud is used for insurance risk assessment and cost analyses, treatment decision making, research, and more.

80. Marketwired, “New AI Cloud Platform by Prognos Transforms Member Lab Data to Address Business Challenges for Payers,” press release, May 10, 2017, http://markets.businessinsider.com/news/stocks/New-AI-Cloud-Platform-by-Prognos-Transforms-Member-Lab-Data-to-Address-Business-Challenges-for-Payers-1002000305.

2.5.4 Insurance

Insurers, too, are in on the action. Blue Health Intelligence, a spin-off of Blue Cross Blue Shield, advertises that it is “[t]he nation’s largest health information analytics data warehouse,” based on “over 10 years of claims experience from over 172 million unique members nationwide.” Other insurers, including Anthem and UnitedHealth, offer similar services.

In 2012, IMS excitedly announced that in collaboration with Blue Health Intelligence, it was releasing a product called PharMetrics Plus. The database contains “fully adjudicated pharmacy, hospital and medical claims at the anonymized patient level, sourced from commercial payers covering over 100 million enrollees from 2007 to present.”81 According to the product advertisement, the data includes:

81. B.R.I.D.G.E. To_Data, QuintilesIMS Real-World Data Adjudicated Claims: USA [QuintilesIMS PharMetrics Plus], https://www.bridgetodata.org/node/824 (accessed January 8, 2018).

  • Diagnoses

  • Procedures

  • Diagnostic & lab tests ordered (no lab values)

  • Enrollment

  • Adverse events

  • Hospitalizations

  • Office visits

  • ER visits

  • Home care

  • Cost & date of treatment

  • On/off formulary status

  • Co-pays/deductibles

  • Complete medical and pharmacy costs

IMS additionally advertises that “data from disparate sources can be linked upon request (e.g., from Electronic Medical Record, Registries, Laboratory data) to provide additional clinical detail.”82 Potential purchasers might include drug companies, marketing firms, researchers, healthcare facilities, and other analytics companies.

82. B.R.I.D.G.E. To_Data, QuintilesIMS.

Bill Saunders, executive at Kaiser Permanente, explained that insurance companies routinely share de-identified claims data with analytics companies, for direct profit or trade of services. “The Blues plans are the largest supplier of claims data. . . . There are a lot of small insurance companies that supply data to them as well so that they can get free analytical services in exchange for their claims data.”83 Analytics companies such as Milliman, Ingenix, and others process data on behalf of insurers and provide them with risk scores based on factors such as age and gender, utilization benchmarks, service cost projections, and more. According to Saunders, Kaiser does not provide claims information to data brokers.

83. Personal conversation between the author and Bill Saunders, June 2017.

Insurers also provide fully identifiable claims information to employers and groups. Employers running self-funded groups typically hire the insurance company to administer claims. In this case, since the employer owns the claims data, the insurer must provide it with fully identified claims records. That means employers with self-funded insurance policies have access to employee prescription records, medical procedures, and more. All too often, enterprise security professionals in these organizations are unaware that such granular health data exists on their network, until a breach occurs.

The U.S. government group also demands detailed claims information, and Saunders said, “we are not de-identifying the data at their mandate.” Saunders also said that insurers are required to provide detailed, identified claims information to state programs. “Hopefully they have good security systems to manage it and keep it confidential.”

Of course, insurers aren’t the only source of claims data. “The same claim form actually exists in at least three locations,” said Zach Henderson, senior vice president of Health Care Markets. “[They exist] in the system that created the claim (the provider), the clearinghouse that moved the claim and the entity that paid the claim (payer or PBM).”84 Any or all of these entities can mine the data and share the results with others, further increasing the risk of data exposure.

84. Tanner, Our Bodies, Our Data, 179.

2.5.5 State Government

State governments collect extensive details regarding prescription and hospitalization records, and they often sell or share this data with corporations or researchers. Security professionals who work in these environments (or those who have access to the data) should be aware of the extent of the data collected, as well as the limitations of de-identification techniques.

According to Harvard University researchers Sean Hooley and Latanya Sweeney, “[t]hirty-three states release hospital discharge data in some form, with varying levels of demographic information and hospital stay details such as hospital name, admission and discharge dates, diagnoses, doctors who attended to the patient, payer, and cost of the stay.” State governments are exempt from HIPAA regulations, and each state is free to decide what level of de-identification is sufficient.85

85. Sean Hooley and Latanya Sweeney, “Survey of Publicly Available State Health Databases” (whitepaper 1075-1, Data Privacy Lab, Harvard University, Cambridge, MA, June 2013), https://thedatamap.org/1075-1.pdf.

In Washington State, hospitals are required to share hospitalization details with the state, including “age, sex, zip code and billed charges of patients, as well as the codes for their diagnoses and procedures.” Washington State now has a database of hospitalization records from 1987 to the present, which it makes available to the public.86

86. Washington State Department of Health, Comprehensive Hospital Abstract Reporting System (CHARS), https://www.doh.wa.gov/DataandStatisticalReports/HealthcareinWashington/HospitalandPatientData/HospitalDischargeDataCHARS (accessed January 9, 2018).

In 2013, Sweeney purchased the Washington State hospitalization database for $50 and attempted to match medical records to news reports. She found that 43% of the time, “[n]ews information uniquely and exactly matched medical records in the State database,” enabling her to quickly and easily re-identify the records. “Employers, financial organizations and others know the same kind of information as reported in news stories,” Sweeney concluded, “making it just as easy for them to identify the medical records of employees, debtors, and others.”87

87. Latanya Sweeney, “Matching Known Patients to Health Records in Washington State Data” (Data Privacy Lab, Harvard University, Cambridge, MA, June 2013), https://dataprivacylab.org/projects/wa/1089-1.pdf.

Commercial data brokers and analytics firms IMS Health, Milliman, Ingenix, WebMD Health, and Truven Health Analytics are among the top purchasers of state hospital discharge data, according to a 2013 Bloomberg report.88 In this roundabout manner, sensitive medical information can enter the data supply chain, where it can then be combined with other data sources (such as purchase records, web surfing activity, and more) to create shockingly detailed records of individual lives. Exposure of this data is often not considered a breach under state or federal law, depending on the precise details, contractual obligations, and specific jurisdictions.

88. “Who’s Buying Your Medical Records?,” Bloomberg, https://www.bloomberg.com/graphics/infographics/whos-buying-your-medical-records.html (accessed January 9, 2018).

2.5.6 Cost/Benefit Analysis

Data has become a precious resource, as well as a valuable commodity. The expansion of computing power and digital storage space has led organizations to integrate sensitive data into everyday business processes, in order to increase efficiency and productivity. The development of data analytics tools has sparked the rise of the data brokerage industry and created strong, direct financial incentives for collecting and sharing data. This has resulted in a global increase in the volume of sensitive data that organizations collect, store, process, and transmit.

At the same time, regulations have lagged. As we will see in the following chapters, data breach laws and standards are typically applied to organizations that most visibly collect sensitive data (such as healthcare clinics and merchants that collect payment card data), while less-visible organizations that exchange data (such as analytics firms and data brokers) are largely unregulated. What’s more, the information protected by existing data breach laws and standards is very limited compared with the wide spectrum of sensitive data that is currently bought, sold, and leveraged.

Historically, relatively few organizations have been held accountable for their data spills. All too often, the costs of a data breach are borne by the data subjects themselves or society as a whole. This is slowly changing, however, as the public becomes savvier, regulations evolve, and the media digs deeper to follow the trail of sensitive data.

As more organizations bear the cost of their data spills, the cost/benefit ratios of storing data change, and reducing the risk of a data breach becomes more important.

2.6 Reducing Risk

As with any kind of hazardous material, the quickest and cheapest way for an organization to reduce the risk of a data breach is to minimize the volume of data stored. This requires a fundamental shift in the approach to data collection and transfer for most modern organizations, which have spent the last few decades stockpiling as much data as feasible and then storing it in loosely controlled locations. Any sensitive data that an organization does choose to retain needs to be carefully tracked, stored in a controlled manner, and properly disposed of when it is no longer needed.

2.6.1 Track Your Data

The first step to reducing and then securing sensitive data is to identify what you have and keep track of it. To accomplish this, you must establish a data classification program, take an inventory, and create a data map. Along the way, pay close attention to the places that data can escape from your control.

2.6.1.1 Data Classification

A data classification scheme is the foundation of every strong cybersecurity and breach response program. Typically it is advisable to classify data into three to five categories. Table 2-1 shows a sample data classification scheme with four categories: Public, Internal, Confidential, and Private (which, in this case, includes personally identifiable information and patient health information).

Table 2-1 Sample Data Classification Scheme

  • Public: Data that anyone may access. Examples: press announcements, website home page, marketing materials.

  • Internal: Data that may be accessed by anyone internal to the organization. Public release would not cause significant harm to the organization or individuals. Examples: internal website, general employee communications.

  • Confidential: Access is restricted to authorized users. Disclosure could have a serious adverse impact on the organization, a business partner, or the public through financial harm, reputational damage, or delay/failure of normal operations. Examples: proprietary or sensitive research, financial details, audit results, passwords.

  • Private: Information that identifies and describes an individual, where unauthorized disclosure, modification, destruction, or use could cause a breach of regulation or contract, and/or serious harm to the individual or organization. Examples: SSNs, payment card data, driver’s license numbers, medical information.
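Because the levels form an ordered hierarchy, it can help to encode them as comparable values so that handling rules follow mechanically from a record’s label. The Python sketch below is illustrative only: the level names come from Table 2-1, but the handling rules and function names are assumptions, not part of any standard.

```python
from enum import IntEnum

class Classification(IntEnum):
    """Sensitivity levels from Table 2-1, ordered least to most restricted."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    PRIVATE = 3

# Illustrative handling rules keyed by classification level (not a standard).
HANDLING = {
    Classification.PUBLIC:       {"encrypt_at_rest": False, "access": "anyone"},
    Classification.INTERNAL:     {"encrypt_at_rest": False, "access": "employees"},
    Classification.CONFIDENTIAL: {"encrypt_at_rest": True,  "access": "authorized users"},
    Classification.PRIVATE:      {"encrypt_at_rest": True,  "access": "need-to-know"},
}

def requires_encryption(level: Classification) -> bool:
    """Look up whether data at this level must be encrypted at rest."""
    return HANDLING[level]["encrypt_at_rest"]

print(requires_encryption(Classification.PRIVATE))  # True
```

Because the levels are an `IntEnum`, code can compare them directly (for example, `level >= Classification.CONFIDENTIAL`) when deciding whether a stricter control applies.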

2.6.1.2 Inventory Your Data

Next, take the time to create a detailed inventory of your organization’s sensitive information. Depending on the kinds of data that your organization holds, you may wish to be more or less granular. Small organizations with limited sensitive information may be able to reasonably maintain this inventory in a spreadsheet; organizations with more complex needs should consider leveraging enterprise data management software.

How much data do you have? For each type of sensitive information, estimate the volume of data. Certain types of information, such as SSNs, payment card data, or driver’s license numbers, can be measured in number of records. Other data, such as customer accounts or medical files, can be measured by number of individuals. For more complex data sets, such as legal files, it may be most useful to measure simply by volume of data (e.g., terabytes). Finally, data such as intellectual property (e.g., Coca-Cola’s secret recipe) may be most effectively measured by value, in dollars or another currency.

Most organizations tend to underestimate the amount of sensitive data that they store. When I conduct an initial interview for a cyber insurance policy review, I normally ask how many records my client maintains. The client will typically say something like, “Well, we have 40,000 customers, so about 40,000 people’s records.” Then I ask, “How long do you retain customer information?” More often than not, the answer is “forever,” or the client is unsure. Suddenly, 40,000 records balloon into hundreds of thousands because the organization has actually retained data of all previous customers over 20 to 30 years.
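For a small organization, even one structured record per data type can capture the inventory described above, including the retention period that clients so often overlook. The Python sketch below is hypothetical: the fields and example entries are invented for illustration, not a prescribed schema.

```python
from dataclasses import dataclass

@dataclass
class InventoryEntry:
    data_type: str        # e.g., "Customer SSNs", "medical files"
    classification: str   # level from your data classification scheme
    measure: str          # "records", "individuals", "terabytes", or "dollars"
    estimate: float       # estimated volume in the chosen measure
    retention: str        # documented retention period ("unknown" is a red flag)
    location: str         # system or repository where the data lives

inventory = [
    InventoryEntry("Customer SSNs", "Private", "records", 180_000, "unknown", "billing DB"),
    InventoryEntry("Legal files", "Confidential", "terabytes", 2.5, "7 years", "file server"),
]

# Flag entries with no defined retention period -- candidates for a disposal review.
flagged = [e.data_type for e in inventory if e.retention == "unknown"]
print(flagged)  # ['Customer SSNs']
```

A spreadsheet with the same columns serves the same purpose; the point is that every data type gets an explicit volume estimate and retention period, so "forever" can no longer hide.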

2.6.1.3 Map the Flow

Once you have a comprehensive list of the types of sensitive information present in your organization, map the flow of information so you understand where it lives. You may find it helpful to create a data flow diagram, which is a visual representation of the flow of information.

Many data loss prevention (DLP) systems include automated discovery of sensitive data throughout your network and can produce reports or visual maps of the information flow. Certain cloud providers also offer built-in DLP and data inventory tools. For example, Office365 includes built-in data loss prevention capabilities, which enable you to “discover documents that contain sensitive data throughout your tenant.”89

89. Form a Query to Find Sensitive Data Stored on Sites, Microsoft, https://support.office.com/en-us/article/Form-a-query-to-find-sensitive-data-stored-on-sites-3019fbc5-7f15-4972-8d0e-dc182dc7f836 (accessed January 19, 2018).
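At its core, automated discovery works by scanning stored content for patterns that look like sensitive data. The sketch below is a toy version of that idea; the regular expressions are illustrative and far cruder than what a commercial DLP product uses (real products add checksum validation, keyword context, and many more data types).

```python
import re

# Minimal discovery sketch: patterns are illustrative, not production-grade DLP.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),          # SSN in 123-45-6789 form
    "card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),        # rough payment-card shape
}

def scan(text: str) -> dict:
    """Return counts of candidate sensitive values found in a document."""
    return {name: len(pattern.findall(text)) for name, pattern in PATTERNS.items()}

sample = "Employee 123-45-6789 paid with 4111 1111 1111 1111."
print(scan(sample))  # {'ssn': 1, 'card': 1}
```

Running a scan like this across file shares is essentially what the discovery phase of a DLP deployment does, which is why the resulting reports map so naturally onto a data flow diagram.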

2.6.2 Minimize Your Data

Minimizing data is the quickest way to reduce your risk. Once you have a good handle on where data lives within your organization, you can then minimize it using one of three strategies: dispose of it, devalue it, or abstain from collecting it in the first place.

2.6.2.1 Disposal

Carefully weigh the risks and benefits for each type of data that you choose to retain, and consciously set limits. Regularly remove data from your systems when it is no longer needed. It’s important to have a formal policy that defines your data retention period and removal process, so that everyone in your organization is on the same page. Organizations typically store data in a variety of formats (paper, CDs, bits and bytes on a server, tape), and the best practices for disposal vary depending on the format. Some methods are more secure than others. Create your process and then regularly audit and report to ensure that it is being followed.
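However the retention policy is documented, enforcement reduces to a simple rule: compare each record’s age against the defined retention period and dispose of anything older. A minimal sketch, assuming a single 7-year policy (the period and record identifiers are invented for illustration):

```python
from datetime import date, timedelta

RETENTION = timedelta(days=7 * 365)  # illustrative 7-year retention policy

def past_retention(created: date, today: date) -> bool:
    """True if a record has outlived the retention period and is due for disposal."""
    return today - created > RETENTION

# Hypothetical record IDs mapped to their creation dates.
records = {"cust-001": date(2010, 3, 1), "cust-002": date(2022, 6, 15)}
today = date(2024, 1, 1)
to_dispose = [rid for rid, created in records.items() if past_retention(created, today)]
print(to_dispose)  # ['cust-001']
```

In practice, different data types carry different retention periods, and the disposal step itself (secure wipe, shredding, degaussing) depends on the storage format; the audit step then verifies that flagged records were actually destroyed.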

2.6.2.2 Devalue

Often, you can reap the benefits of storing data while reducing the risk. One method is “tokenization”: the process of replacing sensitive data fields with different, less sensitive values. Using tokenization, you can remove information that is valuable to criminals on the dark web but still retain the content that is useful for your purposes.

For example, until the early 2000s, many health insurance companies used the SSN as a policyholder’s identifier, and it was printed on health insurance cards. Over time, insurers replaced SSNs with a completely different identifier that could not be exploited or used for fraud as easily.
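In outline, tokenization substitutes a surrogate value for the sensitive field and confines the mapping back to the original inside a protected store. The Python sketch below is a simplified illustration, not a production design; the key handling, token format, and function names are all assumptions.

```python
import hashlib
import hmac
import secrets

# Minimal tokenization sketch: a real system keeps the key and vault in a
# hardened token-vault service, not in the application process.
_vault = {}                      # token -> original value (the protected mapping)
_key = secrets.token_bytes(32)   # secret key for deriving deterministic tokens

def tokenize(ssn: str) -> str:
    """Replace an SSN with a surrogate that is useless outside the vault."""
    token = "TOK-" + hmac.new(_key, ssn.encode(), hashlib.sha256).hexdigest()[:12]
    _vault[token] = ssn
    return token

def detokenize(token: str) -> str:
    """Only the vault can map a token back to the original value."""
    return _vault[token]

t = tokenize("123-45-6789")
print(t.startswith("TOK-"))  # True; downstream systems never see the raw SSN
```

Because the token is derived deterministically from the key and the input, the same SSN always maps to the same token, so records can still be joined and deduplicated without exposing the underlying number.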

2.6.2.3 Abstain

Carefully review your data collection processes. Do you need all the data you collect? Is the value of the data that you retain worth the risk? If not, don’t collect it! By abstaining from data collection, you avoid the costs of security, as well as the risk of a breach.

2.7 Conclusion

Data has emerged as a powerful new resource, driving new markets and spurring efficiency and productivity. At the same time, it is hard to control and can easily leak out. Data breaches have increased in frequency, causing reputational and financial damage to organizations and consumers.

In this chapter, we have presented this important principle:

Data = Risk: Treat data as you would any hazardous material.

We also introduced the five factors that influence the risk of a data breach. These factors are:

  1. Retention: The length of time that the data exists

  2. Proliferation: The number of copies of data that exist

  3. Access: The number of people who have access to the data, the number of ways that the data can be accessed, and the ease of obtaining access

  4. Liquidity: The time required to access, transfer, and process the data

  5. Value: The amount the data is worth

Finally, we discussed techniques for minimizing sensitive data in your environment, which will inherently reduce your risk of a data breach.
