19

Legalities

Abstract

Big Data projects always incur some legal risk. It is impossible to know all the data contained in a Big Data project, and it is impossible to know every purpose to which Big Data is used. Hence, the entities that produce Big Data may unknowingly contribute to a variety of illegal activities, chiefly: copyright and other intellectual property infringements, breaches of confidentiality, and privacy invasions. In addition, issues of data quality, data availability, and data documentation may contribute to the legal or regulatory disqualification of a Big Data resource. In this chapter, four issues will be discussed in detail: (1) responsibility for the accuracy of the contained data; (2) obtaining the rights to create, use, and share the data held in the resource; (3) intellectual property encumbrances incurred from the use of standards required for data representation and data exchange; and (4) protections for individuals whose personal information is used in the resource. Big Data managers contend with a wide assortment of legal issues, but these four issues will never go away.

Keywords

Data Quality Act; Freedom of Information Act; FOIA; Limited Data Use Agreements; Madey v. Duke; Tort; Patents; Intellectual property; Informed consent; Data ownership; Copyright; Infringement; Fair use

Section 19.1. Responsibility for the Accuracy and Legitimacy of Data

At this very moment, there's an odds-on chance that someone in your organization is making a poor decision on the basis of information that was enormously expensive to collect.

Shvetank Shah, Andrew Horne, and Jaime Capella [1]

In 2031, lawyers will be commonly a part of most development teams.

Grady Booch

I am not a lawyer, and this chapter is not intended to provide legal advice to the readers. It is best to think of this chapter as an essay that covers the issues that responsible managers of Big Data resources worry about all of the time. When I was a program director at the National Institutes of Health, I worked on resources that collected and analyzed medical data. My colleagues and I worked through the perceived legal risks that encumbered all of our projects. For the most part, our discussions focused on four issues: (1) responsibility for the accuracy of the contained data; (2) rights to create, use, and share the data held in the resource; (3) intellectual property encumbrances incurred from the use of standards required for data representation and data exchange; and (4) protections for individuals whose personal information is used in the resource. Big Data managers contend with a wide assortment of legal issues, but these four problems, which never seem to go away, will be described in this chapter.

The contents of small data resources can be closely inspected and verified. This is not the case for Big Data. Because Big Data resources are constantly growing, and because the sources of the data are often numerous and not strictly controlled, it is a safe bet that some of the data is incorrect. The reflexive position taken by some data managers can be succinctly stated as: “It is not my problem!”

To a small extent, measures taken to improve the quality of data contained in a Big Data resource will depend on how the data will be used. Will the data be used for mission-critical endeavors? In the medical realm, will the data be used to make diagnostic or treatment decisions? These contingencies raise the stakes for Big Data resources, but the data manager's responsibility is largely the same, regardless of the intended use of the resource. Every Big Data resource must have in place a system whereby data quality is constantly checked, errors are documented, corrective actions are taken, and improvement is documented. Without a quality assurance plan, the resource puts itself in great legal jeopardy. In addition to retaining legal counsel, data managers would be wise to follow a few simple measures:

  •  Make no unjustified claims.

It is important that statements issuing from the Big Data resource, including claims made in advertisements and informational brochures, and verbal or written communications with clients, should never promise data accuracy. People who insist on accuracy should confine their attention to small data resources. If your Big Data resource has made no effort to ensure that the data is true and accurate, then you owe it to your users to indicate as much.

  •  Require your sources to take necessary measures to provide accurate data.

Sources that contribute to Big Data resources should have their own operation protocols, and these protocols must be made available to the manager of the Big Data resource. In addition, sources should certify that their contributed data conforms, as best as they can ascertain, to their data policies.

  •  Have procedures in place ensuring that the data provided by outside sources is accurately represented within the resource.

Big Data managers should exercise reasonable diligence to ensure that the received data is legitimate, and to verify such data when it is received.

  •  Warn your data users that their analytic results, based on the resource's data, must be validated against external data sources.

It may seem obvious to you that conclusions drawn from the analyses of Big Data are always tentative, and must be validated against data from other sources. Sometimes data analysts need to be protected from their own naiveté, necessitating an explicit warning.

  •  Open your verification procedures to review (preferably public review).

Users find it unsettling to read exculpatory verbiage in user licenses, expressing that the data provider cannot guarantee the accuracy of the data and cannot be held liable for any negative consequences that might arise from the use of the data. At the very least, data managers should reassure their users that reasonable measures have been taken to verify the data contained in the resource. Furthermore, those measures should be available for review by any and all potential data users.

  •  Provide a method by which complainants can be heard.

This may actually be one of those rare instances when the immutability of a Big Data resource is broken. If material is known to be illegal or if the material is a potential danger to individuals, then it may be necessary to expunge the data (i.e., violate data immutability).

  •  Be prepared to defend your data and your procedures.

Big Data managers must understand their data. The conclusions drawn from their data may someday serve as evidence in legal proceedings, including all manner of arbitration and litigations, both civil and criminal. In the case of Daubert v Merrell Dow Pharmaceuticals, Inc., the U.S. Supreme Court ruled that trial judges must determine the relevance and adequacy of data-based evidence presented by expert witnesses. Judicial oversight is conducted through a pre-trial review that “entails a preliminary assessment of whether the reasoning or methodology underlying the testimony is scientifically valid and of whether that reasoning or methodology properly can be applied to the facts in issue” [2]. Hence, Big Data managers must constantly strive to assure that the data contained in their resources are fully described and linked to the protocols through which the data was obtained. Any verification processes, through which data is entered and checked into the resource, may be reviewed by government committees and courts.
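To make the point concrete, here is a minimal sketch, in Python, of the kind of documented verification step that could be produced for such a review; the field names and validation rules are illustrative assumptions, not requirements of any particular resource.

  import json
  import time

  # Hypothetical intake rules; a real resource would spell these out in its written protocol.
  REQUIRED_FIELDS = ["record_id", "source", "timestamp", "value"]

  def verify_record(record, log_path="verification_log.jsonl"):
      """Check one incoming record, log the outcome, and report whether it passed."""
      errors = [field for field in REQUIRED_FIELDS if field not in record]
      if "value" in record and not isinstance(record["value"], (int, float)):
          errors.append("value is not numeric")
      entry = {
          "checked_at": time.strftime("%Y-%m-%dT%H:%M:%S"),
          "record_id": record.get("record_id"),
          "errors": errors,
          "action": "accepted" if not errors else "returned to source for correction",
      }
      with open(log_path, "a") as log:       # append-only log: the reviewable audit trail
          log.write(json.dumps(entry) + "\n")
      return not errors

The check itself is trivial; what matters is that every record passes through it and that every outcome, including the corrective action, is written to a log that can be produced on demand.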

When Big Data resources are used to influence the governmental process, special regulatory conditions may apply. The U.S. government passed the Data Quality Act in 2001, as part of the FY 2001 Consolidated Appropriations Act (Pub. L. No. 106-554) [3,4]. The Act requires Federal Agencies to base their policies on high quality data and to permit the public to challenge and correct inaccurate data [5]. The drawback to this legislation is that science is a messy process, and data may not always attain a high quality. Data that fails to meet standards of quality may be rejected by government committees or may be seized upon by lobbyists to abrogate good policies that were based on the imperfect data [6–8]. [Glossary Data Quality Act]

Data managers chant a common lament: “I cannot be held responsible for everything!” They have a point, but their inability to control everything does not relieve them of their responsibility to exercise a high degree of data diligence.

Section 19.2. Rights to Create, Use, and Share the Resource

Free software is a matter of liberty, not price.

Richard Stallman

As mentioned earlier, ownership is a mercantile concept; the owner of an item is the person who can sell the item. If you own a cow, then you can sell the cow. Once the cow is sold, you no longer own the cow; the cow has a new owner. This simple ownership arrangement does not work well for Big Data. Data can be copied ad infinitum. In virtually all cases financial transactions that involve the transfer of data do not actually result in the loss of the data by the provider. The data provider continues to hold the data after the transaction has transpired. In the Big Data universe, Big Data is not “owned” in the usual sense of the word; data is intangible. This explains why the term “service” pops up so often in the information field (e.g., Internet Service Providers, Web Services, List Servers). Data is more often a service than an owned commodity. [Glossary Web service]

Because Big Data comes from many sources, serves many different uses, and can be retrieved via federated queries across multiple resources (Big and small), the customary laws pertaining to property rights can be difficult to apply. Big Data managers need to know whether they have the right to acquire and distribute the data held in their resources. It may be easiest to think in terms of two separable issues: laws dealing with data acquisition, and laws dealing with data distribution.

Information produced through a creative effort (e.g., books, newspapers, journal articles) usually falls under copyright law. This means that you cannot freely obtain and distribute these materials. Exceptions would include books that fall into the public domain (e.g., books produced by the federal government, and books whose copyright term has expired). Other exceptions might include copyrighted material that falls under Fair Use provisions [9]. Fair Use provisions permit the distribution of copyrighted material if it is done solely for the public good, with no profit motive, and if it can be done in a way that does not financially harm the copyright holder (e.g., does not result in the loss of sales and royalties).

Most Big Data resources are primarily composed of raw data, along with annotations to the data. The data may consist of measurements of physical objects and events, and short informational attributes appended to abstract data objects. These types of data are generally not produced through a creative effort, and would not fall under copyright law. In the United States, the most cited precedent relevant to data acquisition is Feist Publications, Inc. v. Rural Telephone Service Co. When Rural Telephone Co. refused to license their alphabetized listing of names and telephone numbers to Feist Publications, Inc., Feist proceeded to copy and use the data. Rural Telephone Co. claimed copyright infringement. The court ruled that merely collecting data into a list does not constitute a creative work and was not protected by copyright.

European courts differ somewhat from American courts with regard to copyright protections. Like their American counterparts, Europeans interpret copyright to cover creative works, not data collections. However, the 1996 European Database Directive instructs courts to extend sui generis (i.e., one of a kind or exceptional) protection to databases. In Europe, databases created with a significant investment of time, effort and money cannot be freely copied for commercial use. The idea behind such a directive is to protect the investments made by database builders. By protecting the database owner the European law attempts to promote the creation of new Big Data resources along with the commercial activities that follow.

Insofar as Big Data resources have international audiences, differences in database laws across different countries can be very frustrating for data managers who strive for legal clarity. Consequently, providers and users often develop their own solutions, as needed. Acquisition of commercial data (i.e., data that does not belong to the public domain), much like access to commercial software, is often achieved through legal agreements (e.g., licenses or contracts) between the data providers and the data users.

Regarding laws dealing with holding and distributing data, the Digital Millennium Copyright Act of 1998 (DMCA) applies in the United States. This law deals primarily with anti-piracy security measures built into commercial digital products [10]. The law also contains a section (Title II) dealing with the obligations of online service providers who inadvertently distribute copyrighted material. Service providers may be protected from copyright infringement liability if they block access to the copyrighted material when the copyright holder or the holder's agent claims infringement. To qualify for liability protection, service providers must comply with various guidelines (i.e., the so-called safe harbor guidelines) included in the Act. In most instances, compliant service providers would also be protected from infringement claims when their sites link to other sites that contain infringing materials. [Glossary DMCA]

Whereas the DMCA provides some liability relief for inadvertent copyright infringers, the United States No Electronic Theft Act of 1997 (NET Act) makes possible the criminal prosecution of infringers who distribute copyrighted material for non-commercial purposes (i.e., for free) [11]. In the early days of the Internet, there was a commonly held, but unfounded, belief that copyrighted material could be held and distributed without fear of legal retribution, if no profit was involved. This belief, perhaps based on an overly liberal interpretation of the Fair Use provisions, came to an end with the NET Act.

Without delving into legal minutiae, here are a few general suggestions for data managers:

  1. Require your sources to substantiate their claim that the data is theirs to contribute. Nobody should be submitting data that they do not own or that they do not have the right to distribute.
  2. Require your sources to indicate that the data was collected in a manner that did not harm individuals and that the data can be distributed without harming individuals.
  3. Use government data whenever feasible. Much of the best data available to Big Data resources comes absolutely free from the U.S. government and other governments that have a policy of contributing their official data to the public domain. Big Data resources can freely copy and redistribute public domain government data. Links to the major sources of prepared U.S. government data are found at: http://www.data.gov/. In addition, virtually all data collected by the government, including data collected through federal grants, and data used to determine public actions, policies, or regulations, can be requested through the Freedom of Information Act [12]. Many countries provide their citizens with the right to acquire data that was generated with government (i.e., taxpayer) funds.
  4. Pay for legitimate data when feasible. It seldom makes good sense to copy a data set into a Big Data resource, if that data requires constant updating and curation. For example, a comprehensive list of restaurants, with their addresses and phone numbers, is always a work in progress. Restaurants open, close, move their locations, acquire new phone numbers, revise their menus, and modify their hours of operation. If there is a database resource that collects and updates this information, there may be little reason to replicate these activities within another data resource. It may make much more sense to license the database or to license access to the database. A federated data service, wherein queries to your Big Data resource are automatically outsourced to other databases, depending on the query subject, may be much more feasible than expanding your resource to include every type of information (a minimal routing sketch appears after this list). In many circumstances the best and the safest method of using and distributing data may come from negotiating payments for external data.
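To illustrate the federated approach mentioned in item 4, here is a minimal Python sketch of subject-based query routing; the subjects, functions, and the licensed external service are hypothetical placeholders, not real APIs.

  # All names below are hypothetical stand-ins for a subject-based federated query service.
  def licensed_restaurant_lookup(query):
      """Stand-in for a call to an external, licensed restaurant database."""
      return {"source": "external licensed service", "query": query, "results": []}

  def local_lookup(query):
      """Stand-in for a search of the resource's own holdings."""
      return {"source": "local resource", "query": query, "results": []}

  ROUTES = {
      "restaurants": licensed_restaurant_lookup,  # curated and updated elsewhere
  }

  def federated_query(subject, query):
      """Send the query to the outside service that covers the subject, or answer it locally."""
      handler = ROUTES.get(subject, local_lookup)
      return handler(query)

  print(federated_query("restaurants", "pizza near Catonsville"))  # forwarded
  print(federated_query("gene_variants", "SCN1A"))                 # answered locally

The design choice is simply to keep constantly changing data in the hands of whoever curates it, and to pay for access rather than replicate the curation effort.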

Section 19.3. Copyright and Patent Infringements Incurred by Using Standards

She was incapable of saying please, incapable of saying thank you and incapable of saying sorry, all the while creating a surge in the demand for these expressions.

Edward St. Aubyn, in his book, “At Last”

As described in Chapter 7, the standards that you have been using in your Big Data resource may actually belong to somebody else. Strange as it may seem, standards are intellectual property and can be copyrighted, patented, or licensed. Not only may a standard be patented, but specific uses of the standard may also be patented, and the patents on uses of the standard may be held by entities who were not at all involved in the creation of the standard.

If you choose to pay a license fee for the use of a proprietary standard, you might find that the costs exceed the sticker price [13]. The license agreement for the standard may impose unwanted restrictions on the use of the standard. For example, a standard may be distributed under a license that prohibits you from freely distributing the intellectual product of the standard (i.e., materials created through the use of the standard). This may mean that your users will not be able to extract and download data that has been formatted in conformance with the standard, or annotated with codes, numbers, terms or other information that could not have been created without the use of the standard. The same restrictions might apply to licensed software.

The building blocks of Big Data resources may hide intellectual property [14,13]. This is particularly true for software, which may inadvertently contain subroutines or lines of code that fall under a claim within an issued patent. One day, you might receive a letter from a lawyer who represents a patent holder, asserting that a fragment of code included in a piece of your software infringes his client's patent. The letter may assert the patent and demand that you cease using the patent holder's intellectual property. More commonly, the letter will simply indicate that a conflict has arisen and will suggest that both parties (your Big Data resource and the patent holder) should seek a negotiated remedy. In either case, most Big Data resources will keep a law firm on retainer for such occasions. Do not despair; the ultimate goal of the patent holder is to acquire royalty payments, not to initiate a lawsuit.

Big Data resources are complex and contain many different types of data objects that may have been transformed, annotated, or formatted by many different methods. The uses of these methods may be restricted under licenses, contracts and other legal contrivances. A few precautionary steps may help reduce your risks:

  •  Whenever possible, use free and open source standards, software, nomenclatures, and ontologies for all of your data annotations. Do not disparage free and open source products. In the world of Big Data, many of the best standards, data formats, nomenclatures, classifications, software, and programming languages are free and open source [15].
  •  Inventory your standards, software, nomenclatures, and ontologies. For each item, write a description of any restrictions that might apply to your resource.
  •  Investigate on the Web. See if there are any legal actions, active or settled, involving any of the materials you might use. Visit the U.S. Patent Office to determine whether there are patent claims on the uses of the standards, software, nomenclatures and ontologies held in your resource. Most likely, your Big Data resource will send and receive data beyond the U.S. Consult the World Intellectual Property Organization (WIPO). Do not restrict your search to proprietary materials. Free and open source materials may contain embedded intellectual property and other encumbrances.
  •  Talk to your legal staff before you commit to using any proprietary product. Your law firm will need to be involved in virtually every aspect of the design and operation of your Big Data resource.
  •  If you must use licensed materials, carefully read the “Terms of Use” in the agreement. Licenses are written by lawyers who are paid to represent their client (the Licensor). In most cases, the lawyer will be unaware of the special use requirements of Big Data resources. The Terms of Use may preclude the customary activities of a Big Data resource (e.g., sharing data across networks, responding to large numbers of queries with annotated data, storing data on multiple servers in widely distributed geographic locations). As noted previously, it is important to have a lawyer review license agreements before they are signed, but the data manager is in the best position to anticipate provisions that might reduce the value of a Big Data resource.

Big Data would greatly benefit from a universal framework supporting resource interoperability [16]. At present, every data manager must fend for herself.

Section 19.4. Protections for Individuals

Everything is gone;

Your life's work has been destroyed.

Squeeze trigger (yes/no)?

Computer-inspired haiku by David Carlson

Data managers must be familiar with the concept of tort. Tort relates to acts that result in harm. Tort does not require an illegal act; it only requires a harm and a person or entity who contributes to the harm and who is liable for the damages. Tort works like this: if you are held liable for harm to another entity, then you must compensate the victim to an extent that makes the victim whole (i.e., brings the victim back to where he was before suffering harm). If the victim makes a case that the harm resulted from negligence or due to conditions that could have been corrected through customary caution, then punitive fees can be added to the victim's award. The punitive fees can greatly exceed the restorative fees. Consequently, it behooves every data manager to constantly ask themselves whether their Big Data resource can result in harm to individuals (i.e., the users of the data, or the subjects of the data). Needless to say, Big Data managers must seek specialized legal advice to minimize tort-related risks.

In the Big Data universe, tort often involves the harms that befall individuals when their confidential data files have been breached. I was raised in Baltimore, not far from the community of Catonsville. Catonsville was the site of a 1968 protest against United States involvement in the Vietnam War. Nine anti-war activists stormed into a draft office, stole files, and publicly burned the files. The Catonsville 9 attained instant international notoriety. The number of files destroyed: 379. In the Big Data era the ante has been upped by many orders of magnitude. Today, when records are stolen or destroyed, you can expect the numbers to be in the millions, or even hundreds of millions [17].

In May 2006, 26.5 million records on military veterans were stolen, including Social Security numbers and birth dates. The records had been taken home by a data analyst employed by the Department of Veterans Affairs. His laptop, containing all this information, was stolen. A class action lawsuit was brought on behalf of the 26.5 million aggrieved veterans. Three years later, the Department of Veterans Affairs paid $20 million to settle the matter [18]. In the United Kingdom, a copy of medical and banking records on 25 million Britons was lost in the mail [19]. The error led to the sudden resignation of the chairman of Her Majesty's Revenue and Customs [19].

There are occasions when security is broken, but no theft occurs. In these instances, resource managers may be unaware of the privacy breach for a surprisingly long period of time. Medical data collected on about 20,000 patients was posted on a public Web site in 2010. The data included patient names, diagnosis codes, and administrative information on admissions and discharges occurring in a six month period in 2009. The data stayed posted on the public Web site for about a year before a patient happened to see the data and reported the breach to the hospital [20]. Accidental breaches are common in many different fields [21].

Today, healthcare organizations must report data breaches that affect more than 500 people. Hundreds of such breaches have been reported. These breaches cost the healthcare industry in excess of $6 billion annually, and the costs are increasing, not decreasing [17]. Other industries have data breaches but are not required to report incidents.

Industry costs do not reflect the personal costs in time, emotional distress, and money suffered by individuals coping with identity theft. In the Big Data field, everyone's deepest fear is identity theft. None of us wants to contemplate what may happen when another person has access to our financial accounts or gains the opportunity to create new accounts under our stolen identities.

Security issues are inseparable from issues related to privacy and confidentiality. We have dealt with some of the more technical issues of data security in Section 18.3, “Data Security and Cryptographic Protocols”. In this chapter, we can review a few of the commonsense measures that will reduce the likelihood of identity theft.

  1. Do not collect or provide information that will link an individual to his or her data record unless you really need the information. If you do not have information that links a record to a named individual, then you cannot inadvertently expose the information. Names, social security numbers, credit card numbers, and birth dates constitute the core information sought by identity thieves. Big Data resources should seriously consider whether such information needs to be stored within the resource. Does your resource really need to collect social security numbers and credit card numbers? Can the person's name be adequately replaced with an internal identifier? Do you need a birth date when a birth year might suffice? When these data items are necessary, do they need to be included in data records that are accessible to employees?
  2. Work with deidentified records whenever possible. Deidentification may not be a perfect way to render records harmless, but it takes you very close to your goal. A thoughtfully deidentified data set has quite limited value to identity thieves.
  3. All files should be encrypted whenever possible. Most breaches involve the theft of unencrypted records. Breaking an encrypted record is quite difficult and far beyond the technical expertise of most thieves (a minimal encryption sketch follows this list).
  4. Back-up data should be encrypted, inventoried, and closely monitored. Back-up data is a vulnerability. Thieves would be just as happy to steal your back-up data as your original data. Because theft of back-up data does not result in a system crash, such thefts can go undetected. It is very important to secure your back-up data and to deploy a system that monitors when back-up data is removed, copied, misplaced, destroyed, or otherwise modified.
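As a minimal sketch of the file encryption recommended in item 3, the snippet below uses the open source Python cryptography package (an assumption; any vetted encryption library would do). Key management, which is the hard part in practice, is deliberately not addressed here.

  # Requires the third-party "cryptography" package (pip install cryptography).
  from cryptography.fernet import Fernet

  key = Fernet.generate_key()       # in practice, keys must be stored and guarded separately
  cipher = Fernet(key)

  record = b"patient_id=12345|diagnosis_code=K80.2|admitted=2009-03-14"
  token = cipher.encrypt(record)    # what a thief would find on a stolen laptop or backup tape
  original = cipher.decrypt(token)  # recoverable only by a holder of the key

  assert original == record

A stolen encrypted file is, for practical purposes, a stolen block of noise; the same cannot be said of a stolen spreadsheet.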

Section 19.5. Consent

MRECs [Medical Research Ethics Committees] sometimes place extreme demands on researchers. These demands have included gaining consent for each step of the research and ensuring data are destroyed on completion of a project...

Louise Corti, Annette Day, and Gill Backhouse [22]

For data managers who deal with medical data, or with any data whose use puts human subjects at risk, consent issues will loom as a dominant legal issue. The reason why consent is a consuming issue for data managers has very little to do with its risks; the risks associated with obtaining improper consent are very small. Consent issues are important because consenting data can be incredibly expensive to implement. The consent process can easily consume the major portion of the data manager's time, and cost-effective implementations are difficult to achieve.

In the context of Big Data, informed consent occurs when a human agrees to accept the risk of harm resulting from the collection and use of their personal data. In principle, every consent transaction is simple. Someone involved with the Big Data resource approaches a person and indicates the data that he would like to collect for the data project. He indicates the potential harms that may occur if consent is granted. If relevant, he indicates the measures that will be taken to minimize the risk of harm. The human subject either signs, or does not sign, the consent form. If the subject signs the form, then his data can be included in the Big Data resource. [Glossary Informed consent, Bayh-Dole Act]

It is important that data managers understand the purpose of the consent form, so that it is not confused with other types of legal agreements between data owners and data contributors. The consent form is exclusively devoted to issues of risk to human subjects. It should not be confused with a commercial agreement (i.e., financial incentives for data use), with an intellectual property agreement (i.e., specifying who controls the uses of the data), or with scientific descriptions of the project (i.e., determining how the data is to be used and for which specific purposes).

The term “informed consent” is often misinterpreted to mean that the patient must be fully informed of the details of the Big Data project with an exhaustive list of all the possible uses of their personal data. Not so. The “informed” in “informed consent” refers to knowledge of the risks involved in the study, not the details of the study itself. It is reasonable to stipulate that the data in Big Data resources is held permanently, and can be used by many individuals, for a wide variety of purposes that cannot be predetermined. Filling the consent form with detailed information about the uses of the resource is counterproductive, if it distracts from the primary purpose of the form: to explain the risks.

What are the risks to human subjects in a Big Data project? With few exceptions, Big Data risks are confined to two related consequences: loss of confidentiality and loss of privacy.

The concepts of confidentiality and of privacy are often confused, and it is useful to clarify their separate meanings. Confidentiality is the process of keeping a person's secret. Privacy is the process of ensuring that the person will not be annoyed, betrayed, or harmed as a result of his decision to give you his secret. For example, if you give me your unlisted telephone number in confidence, then I am expected to protect this confidentiality by never revealing the number to other persons. I may also be expected to protect your privacy by never using the telephone number to call you unnecessarily, at all hours of the day and night (i.e., annoying you with your private information). In this case the same information object (i.e., your unlisted telephone number) is encumbered by confidentiality (i.e., keeping the unlisted number secret) and privacy (i.e., not using the unlisted number to annoy you).

To cover confidentiality risks the consent form could indicate that personal information will be collected, but that measures will be taken to ensure that the data will not be linked to your name. In many circumstances, that may be all that is needed. Few patients really care if anyone discovers that their gall bladder was removed in 1995. When the personal information is of a highly sensitive nature, the consent form may elaborate on the security measures that ensure confidentiality.

The risk of losing privacy is a somewhat more subtle risk than the loss of confidentiality. In practical terms, for Big Data projects, loss of privacy occurs when the members of the Big Data resource come back to the human subject with a request for additional information, or with information regarding the results of the study. The consent form should indicate any constraints that the Big Data resource has put into place to ensure that subjects are not annoyed with unwelcome future contacts by members of the project. In some cases the Big Data project will anticipate the need to recontact human subjects (i.e., to invade their privacy). In this case the consent form must contain language informing the subjects that privacy will not be fully protected. In many cases subjects do not particularly care, one way or the other. They are happy to participate in projects that will benefit society, and they do not mind answering a phone call at some future time. The problem for the Big Data resource will come if and when subjects have a change of heart, and they decide to withdraw consent.

Obtaining consent from human subjects carries its own administrative and computational challenges; many of which are unanticipated by Big Data managers. Consent-related tasks include the following:

  1. Creating a legally valid consent form.

There are many ways to write a bad consent form. The most common mistake is inserting consent clauses among the fine-print verbiage of broader legal documents (e.g., contracts, agreements, licenses). This is a bad mistake for several reasons. The validity of informed consent can be challenged if an individual can claim that he or she was not adequately informed. The consent form should be devoted to a single topic, consent, and should not be inserted into other legal forms that require the subject's signature.

The consent form should be written in language that the average person can understand. In many cases, particularly in medical settings, informed consent should be read aloud by an individual who is capable of explaining difficult passages in the consent document.

Consent forms should not contain exculpatory clauses. For example, the consent form should not contain language expressing that the Big Data resource cannot be held liable for harm resulting from the use of the consenter's data. Neither should the form ask signers to waive any of their normal rights.

The consent form should have a signature section, indicating an affirmative consent. Certain types of informed consent may require the signature of a witness, and consent protocols should have provisions for surrogate signatures (e.g., of a parent or legal guardian). It is common for consent forms to provide an opportunity for subjects to respond in the negative (i.e., to sign a statement indicating that consent is denied). Doing so is seldom a good idea, for several reasons. First, the negative (non-affirmative) statement is not legally required and there are no circumstances for which a non-affirmative statement has any practical value. Secondly, individuals should not feel compelled to respond in any way to the consent form. If they freely choose to give consent, they can sign the form. If they do not wish to give consent, they should not be coerced to sign their names to a statement of denial. Thirdly, a non-affirmative statement can produce great confusion in the future, when an individual consents to having the same record used for another research project, or when the individual has a change of heart, and decides to provide consent for the same project.

The consent form should reveal circumstances that might influence a person's decision to provide consent. For example, if the investigators have a commercial interest in the outcome of the study, then that information should be included in the consent form. It is reasonable for individuals to fear that they might suffer harm if the investigators have something to gain by a particular outcome of an experiment or analysis.

Traditionally, consent is not open-ended. Consent generally applies to a particular project that is conducted over a specified period of time. Consent ends when the project ends. There has been a trend to lengthen the window of time to which consent applies, to accommodate projects that might reasonably be expected to extend over many years. For example, the Framingham study on heart disease has been in progress for more than 60 years [23]. If the Big Data project intends to use consented data for an indefinite period, as it almost always does, then the consent form must clarify this condition.

Most importantly, the consent form should carefully describe the risks of participation. In the case of Big Data analyses, the risks are typically confined to loss of confidentiality or loss of privacy.

  2. Obtaining informed consent.

The U.S. Census is an established project that occurs every decade. The methods and the goals of the census have been developed over many decades. About 600,000 census workers are involved; their jobs are to obtain signed census forms from about 250 million individuals. The cost of each census is about $14 billion. Keeping these numbers in your mind, imagine that you are a Big Data manager. You maintain and operate a global Big Data resource, with data on over 2 billion individuals (8 times the population of the United States). You are informed by your supervisor that a new project for the resource will require you to obtain informed consent on the resource's catchment population. You are told that you will be assigned ten additional part-time workers to help you. You are given a budget of $100,000 for the project. When you complain that you need more help and a larger budget, you are told that you should use the computational power of the Big Data resource to facilitate the effort. You start looking for another job.
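A back-of-the-envelope calculation, using the figures quoted above, shows the scale of the mismatch; the per-person cost is simply the census cost divided by the census population, and the other numbers restate the scenario.

  # Rough arithmetic on the consent scenario described above.
  census_cost = 14e9            # dollars spent per decennial census
  census_population = 250e6     # individuals enumerated
  cost_per_person = census_cost / census_population       # about $56 per person

  resource_population = 2e9     # individuals covered by the hypothetical Big Data resource
  budget = 100_000              # dollars allotted to the data manager

  estimated_cost = cost_per_person * resource_population  # about $112 billion
  shortfall = estimated_cost / budget                      # more than a million-fold underfunded

  print(f"cost per person: ${cost_per_person:.0f}")
  print(f"estimated cost for 2 billion people: ${estimated_cost:,.0f}")
  print(f"shortfall factor: {shortfall:,.0f}x")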

There are no easy ways to obtain informed consent. Popular marketing techniques that use automated or passive affirmations cannot be used to obtain informed consent. For example, opt-out forms in which human subjects must take an action to be excluded from participating in a potentially harmful data collection effort are unacceptable. Informed consent must be affirmative. Forms should not be promissory (i.e., should not promise a reward for participation). Informed consent must be voluntary and uncompensated.

Consent must be obtained without coercion. Individuals cannot be denied customary treatment or access to goods and services if they refuse to grant consent. There are circumstances for which the choice of person who seeks informed consent may be considered coercive. A patient might feel threatened by a surgeon who waves a research-related consent form in their face minutes before a scheduled procedure. Big Data managers must be careful to obtain consent without intimidation.

The consent form must be signed if it is to have any legal value. This means that a Web page submission is unacceptable unless it can be reasonably determined that the person providing the consent is the same person who is listed in the submitted Web page. This would usually necessitate an authenticated password, at minimum. Issues of identity theft, password insecurity, and the general difficulty of managing electronic signatures make Web-based consent a difficult process.

The process of obtaining consent has never been easy. It cannot be fully automated because there will always be people whose contact information (e.g., email accounts) is invalid or who ignore all attempts at contact. To this date, nobody has found an inexpensive or labor-free method for obtaining informed consent from large numbers of individuals.

  3. Preserving consent.

After consent has been obtained, it must be preserved. This means that the original paper document or a well-authenticated electronic document, with a verified signature, must be preserved. The consent form must be linked to the particular record for which it applies and to the protocol or protocols for which the consent applies. An individual may sign many different consent forms, for different data uses. The data manager must keep all of these forms safe and organized. If these documents are lost or stolen, then the entire resource can be jeopardized.

  4. Ensuring that the consent status is kept confidential.

The consent forms themselves are potential sources of harm to patients. They contain information related to special studies or experiments or subsets of the population that include the individual. The consent form also contains the individual's name. If an unauthorized person comes into possession of consent forms, then the confidentiality of the individual would be lost.

  5. Determining whether biases are introduced by the consent process.

After all the consents have been collected, someone must determine whether the consented population introduces bias. The data analyst would ask: “Is the group of people who provide consent in any way different from the group of people who refuse to provide consent?” and, if so, “Will differences between the consenters and the non-consenters bias analytic outcomes?” A data analyst might look for specific differences between the consented and unconsented groups in features that are relevant to the question under study. For example, for a medical disease study, are there differences in the incidence of the disease between the consenting group and the non-consenting group? Are there differences in the ages at which the disease occurs in consenters and non-consenters?
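As a minimal sketch of one such comparison, assuming hypothetical counts and the open source scipy package, a chi-square test on a two-by-two table asks whether disease incidence differs between consenters and non-consenters.

  # Requires the third-party "scipy" package. The counts are hypothetical.
  from scipy.stats import chi2_contingency

  # Rows: consenters, non-consenters. Columns: with disease, without disease.
  table = [
      [120, 880],
      [ 90, 410],
  ]

  chi2, p_value, dof, expected = chi2_contingency(table)
  print(f"chi-square = {chi2:.2f}, p = {p_value:.4f}")
  if p_value < 0.05:
      print("Disease incidence differs between groups; consent may have introduced bias.")
  else:
      print("No evidence, at this sample size, that consent biased disease incidence.")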

  6. Creating a process whereby reversals and modifications of consent can be recorded and flagged.

In most cases, consent can be retracted. Retraction is particularly important in long or indefinite studies. The data manager must have a way of tracking consents and documenting a new consent status. For any future use of the data, occurring after the consent status has changed, the subject's data records must not be available to the data analyst.
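A minimal sketch of such a tracking mechanism follows, using an in-memory registry with hypothetical identifiers; a real resource would keep a durable, audited table, but the essential operations are the same: record the reversal, and filter withdrawn subjects out of any future analysis.

  from datetime import date

  # Hypothetical consent registry: one entry per (subject, protocol) pair.
  consent_registry = {
      ("subject_001", "protocol_A"): {"status": "granted", "as_of": date(2015, 6, 1)},
      ("subject_002", "protocol_A"): {"status": "withdrawn", "as_of": date(2017, 2, 9)},
  }

  def record_withdrawal(subject_id, protocol_id, when=None):
      """Flag a consent reversal; a production system would also append to an audit trail."""
      consent_registry[(subject_id, protocol_id)] = {"status": "withdrawn",
                                                     "as_of": when or date.today()}

  def consented_subjects(subject_ids, protocol_id):
      """Return only the subjects whose consent for this protocol is still in force."""
      return [s for s in subject_ids
              if consent_registry.get((s, protocol_id), {}).get("status") == "granted"]

  record_withdrawal("subject_001", "protocol_A")
  print(consented_subjects(["subject_001", "subject_002"], "protocol_A"))   # -> []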

  7. Maintaining records of consent actions.

Tracking consent data is extremely difficult. Here are a few consent-related activities that Big Data managers must record and curate: “Does each consent form have an identifier?” “Does each consent form link to a document that describes the process by which the consent form was approved?” “If paper consent forms were used, can the data manager find and produce the physical consent document?” “Was the consent restricted, permitting certain uses of the data and forbidding other types of data uses?” “Is each consent restriction tagged for tracking?” “If the consent form was signed, is there a protocol in place by which the signature is checked to determine authenticity?” “Does the data manager have a recorded policy that covers situations wherein subjects cannot provide an informed consent (e.g., infants, patients with dementia)?” “Does the resource have protocols for using surrogate signatures for children and subjects who have guardians or assignees with power-of-attorney?” “Does the Big Data resource have policies that exclude classes of individuals from providing informed consent?” “Is there a protocol to deal with subjects who withdraw consent or modify their original consent?” “Does the resource track data related to consent withdrawals and modifications?”

  8. Educating staff on the liberties and limitations of consented research.

Many Big Data managers neglect to train their staff on legal matters, including consent-related issues. Information technologists may erect strong mental barriers to exclude the kinds of legal issues that obfuscate the field of data law. Data managers have no choice but to persevere. It is unlikely that factors such as staff indifference and workplace incompetence will serve as mitigating factors when tort claims are adjudicated.

Section 19.6. Unconsented Data

The main point in our favor is that there is little or no case law, at least in the UK, which has unearthed any complaints by research participants about misuse of their contributions.

Louise Corti, Annette Day, and Gill Backhouse [22]

There are enormous technical difficulties and legal perils in the consent process. Is there some way of avoiding the whole mess?

I have worked for decades in an information-centric culture that has elevated the consent process to an ethical imperative. It is commonly held that the consent process protects individuals from harm, and data managers from liability. In the opinion of many of my colleagues, all confidential data on individuals should be consented into the database, unless there is a very good reason to the contrary.

After many years of dealing with the consent issue, I have reached a very different conclusion. To my way of thinking, consent should be avoided, if feasible; it should only be used as a last resort. In most circumstances, it is far preferable for all concerned to simply render data records harmless, and to use them without obtaining consent. As the dependence on consent has grown over the past few decades, several new issues, all having deleterious societal effects, have arisen:

  1. Consent can be an unmerited revenue source for data managers.

When consent must be obtained on thousands or millions of individuals, the consenting costs can actually exceed the costs of preparing and using the data. When these costs are passed on to investors, or to taxpayers (in the case of public Big Data resources), it raises the perceived importance and the general cash flow for the resource. Though data managers are earnest and humble, as a rule, there are some managers who feel comfortable working on projects of dubious scientific value, and a low likelihood of success, if there is ample funding. Tasks related to the consent process cost money, without materially contributing to the research output. Because funding institutions must support consenting efforts, grant writers for Big Data projects can request and receive obscenely large awards, when consent is required.

  2. The act of obtaining consent is itself a confidentiality risk.

The moment you ask for consent, you're creating a new security weakness, because the consent form contains sensitive information about the subject and the research project. The consent form must be stored, and retrieved as needed. As more and more people have access to copies of the consent forms, the risk of a confidentiality breach increases.

An irony of Big Data research is that the potential harm associated with soliciting consent may easily exceed the potential harms of participating as a subject in a Big Data project.

  3. Consent issues may preoccupy data managers, diverting attention from other responsibilities.

There is a limit to the number of problems anyone can worry about. If half of your research effort is devoted to obtaining, storing, flagging, and retrieving consent forms, then you are less likely to pay attention to other aspects of the project. One of the chief lessons of this book is that, at the current time, most of our Big Data resources teeter on the brink of failure. The consent process can easily push a resource over the brink.

  4. Consented research has been used for unintended purposes.

Once you have received permission to use personal data in a consented study, the data remains forever. Scientists can use this data freely, for any purpose, if they deidentify the data or if the original consent form indicates that the data might be used for future unspecified purposes. The latter option fueled the Havasupai lawsuit, to be discussed in the final section of this chapter.

As it happens, consent can be avoided altogether if the data in the resource has been rendered harmless through deidentification. Let's remember that the purpose of the consent form is to provide individuals with the choice to decline the risks associated with the use of their data in the Big Data resource. If there are no risks, there is no need to obtain consent. Data managers taking the unconsented path to data use need to ask themselves the following question. “Can I devise a way by which the data can be used, without risk to the individual?”

Exceptions exist. Regulations that restrict the use of data for designated groups of individuals may apply, even when no risk of harm is ascertained. Data confidentiality and privacy concerns are among the most difficult issues facing Big Data resources. Obtaining the advice of legal counsel is always wise.

The widespread use and public distribution of fully deidentified data records is a sort of holy grail for data miners. Medical records, financial transactions, collections of private electronic communications conducted over air and wire all contribute to the dark matter of the information universe. Everyone knows that this hidden data exists (we each contribute to these data collections), that this hidden data is much bigger than the data that we actually see, and that this data is the basic glue that binds the information universe. Nonetheless, most of the data created for the information universe is considered private. Private data is controlled by a small number of corporations who guard their data against prying eyes, while they use the data, to the extent allowed by law, to suit their own agendas. Why isn't Big Data routinely deidentified, using the methods discussed in Sections 3.6 and 3.7, and distributed for public review and analysis? Here are some of the reasons:

  •  Commercially available deidentification/scrubbing software is slow. It cannot cope with the exabytes of information being produced each year.
  •  None of the commercially available deidentification/scrubbing software does a perfect job. These software applications merely reduce the number of identifiers in records; they leave behind an irreducible amount of identifying information.
  •  Even if deidentification/scrubbing software actually were to perform as claimed, removing every identifier and every byte of unwanted data from electronic records, some records might be identified through the use of external database resources that establish identities through non-identifying details contained in records.
  •  Big Data managers are highly risk averse and would rather hoard their data than face the risk, no matter how unlikely, of a possible tort suit from an aggrieved individual.
  •  Big Data managers are comfortable with restricted data sharing, through legal instruments such as Data Use Agreements. Through such agreements, selected sets of data extracted from a Big Data resource are provided to one or a few entities who use the data for their own projects and who do not distribute the data to other entities. [Glossary Data sharing]
  •  Data deidentification methods, like many of the useful methods in the information field, can be patented. Some of the methods for deidentification have fallen under patent restriction, or have been incorporated into commercial software that is not freely available to data managers [24]. For some data managers, royalty and license costs are additional reasons for abandoning the deidentification process.
  •  Big Data managers are not fully convinced that deidentification is possible, even under ideal circumstances.

It may seem impossible, but information that is not considered identifying may actually be used to discover the name of the person linked to deidentified records. Basically, deidentification is easy to break when deidentified data can be linked to a name in an identified database containing fields that are included in both databases. This is the common trick underlying virtually every method designed to associate a name with a deidentified record.
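The “common trick” can be demonstrated in a few lines. If a deidentified record and an identified external database happen to share a combination of fields (here, a hypothetical zip code, birth date, and sex), a simple join on those fields restores the name; all records below are invented for illustration only.

  # Invented records, for illustration only.
  deidentified = [
      {"record_id": "r1", "zip": "21228", "birth_date": "1955-03-02",
       "sex": "F", "diagnosis": "K80.2"},
  ]
  identified_external = [      # e.g., a public roster that contains names
      {"name": "Jane Doe", "zip": "21228", "birth_date": "1955-03-02", "sex": "F"},
  ]

  QUASI_IDENTIFIERS = ("zip", "birth_date", "sex")

  def link(deid_records, named_records):
      """Join the two data sets on the quasi-identifier fields they happen to share."""
      index = {tuple(r[k] for k in QUASI_IDENTIFIERS): r["name"] for r in named_records}
      matches = []
      for d in deid_records:
          key = tuple(d[k] for k in QUASI_IDENTIFIERS)
          if key in index:
              matches.append({**d, "reidentified_name": index[key]})
      return matches

  print(link(deidentified, identified_external))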

Data managers who provide deidentified data sets to the public must worry whether there is, or ever will be, an available identified database that can be used to link fields, or combinations of fields, to their deidentified data, and thus link their records to the names of individuals. This worry weighs so heavily on data managers and on legal consultants for Big Data resources that there are very few examples of publicly available deidentified databases. Everyone in the field of Big Data is afraid of the legal repercussions that will follow when the confidentiality of their data records is broken.

Section 19.7. Privacy Policies

No keyboard present

Hit F1 to continue

Zen engineering?

Computer-inspired haiku by Jim Griffith

Discussions of privacy and confidentiality seem to always focus on the tension that results when the interests of the data holders conflict with the interests of the data subjects. These issues can be intractable when each side has a legitimate claim to their own preferences (businesses need to make profit, and individuals need some level of privacy).

At some point, every Big Data manager must create a Privacy Policy, and abide by their own rules. It has been my experience that legal problems arise when companies have no privacy policy, or have a privacy policy that is not well-documented, or have a privacy policy that is closed to scrutiny, or have a fragmented privacy policy, or fail to follow their own policy. If the company is open with its policy (i.e., permits the policy to be scrutinized by the public), and willing to change the policy if it fails to adequately protect individuals from harm, then the company is not likely to encounter any major problems.

Privacy protection protocols do not need to be perfect. They do, however, need to be followed. Companies are much more likely to get into trouble for ignoring their own policies than for following an imperfect policy. For a policy to be followed, the policy must be simple. Otherwise, the employees will be incapable of learning the policies. Unknowable policies tend to be ignored by the unknowing staff.

Every Big Data project should make the effort to produce a thoughtful set of policies to protect the confidentiality of their records and the privacy of data subjects. These policies should be studied by every member of a Big Data project, and should be modified as needed, and reviewed at regular intervals. Every modification and review should be thoroughly documented. Every breach or failure of every policy must be investigated, promptly, and the results of the investigation, including any and all actions taken, must be documented. Competent data managers will make it their priority to see that the protocols are followed and that their review process is fully documented.

If you are a Big Data manager endowed with an overactive imagination, it is possible to envision all types of unlikely scenarios in which confidentiality can be breached. Nobody is perfect, and nobody expects perfection from any human endeavor. Much of law is based on a standard of “reasonableness.” Humans are not held to an unreasonable standard. As an example, the privacy law that applies to hospitals and healthcare organizations contains 390 occurrences of the word “reasonable” [25]. A reasonable approach to confidentiality and privacy is all that can be expected from a complex human endeavor.

Section 19.8. Case Study: Timely Access to Big Data

Don't accept your dog's admiration as conclusive evidence that you are wonderful.

Ann Landers

In the clinical bioinformatics world, testing laboratories must have access to detailed population data, on millions of gene variants, with which to correlate their findings [26–31]. Specifically, genetics laboratories need to know whether a gene variant is one that occurs in the normal population and has no clinical significance, or whether the variant is associated with disease. The lives of patients are put at risk when we are deprived of timely and open access to data relating genetic findings to clinical phenotypes.

In 2008, a 2-year-old child had a severe seizure, and died. In the prior year, the child had undergone genetic testing. The child's doctors were concerned that the patient might have Dravet syndrome, a seizure disorder in which about 80% of patients have a mutation in the SCN1A gene. The laboratory discovered a mutation in the child's SCN1A gene, but remarked in their report that the mutation was a variant of unknown significance. That is to say that the reference database of sequence variants, used by the laboratory, did not contain information that specifically linked the child's SCN1A mutation to Dravet syndrome. In this circumstance, the laboratory report indicated that the gene test was “inconclusive”; they could neither rule in nor rule out the possibility that the found mutation was diagnostic of Dravet syndrome.

Some time later, the child died.

In a wrongful death lawsuit filed by the child's mother, the complaint was made that two published reports, appearing in 2006 and 2007, had linked the specific SCN1A gene mutation subsequently found in her child's DNA with an epileptic encephalopathy [32]. According to the mother, the reporting laboratory should have known the significance of her child's mutation [32,33]. Regardless of the verdict eventually rendered, the circumstances serve as fair warning. In the era of Big Data, testing laboratories need access to the most current data available, including the data generated by competing laboratories.

Section 19.9. Case Study: The Havasupai Story

Freeing yourself was one thing; claiming ownership of that freed self was another.

Toni Morrison

For those who seek consent for research, the case of the Havasupai Tribe v. Arizona Board of Regents holds us in thrall. The facts of the case play out over a 21-year period, from 1989 to 2010. In 1989 Arizona State University obtained genetic samples from several hundred members of the Havasupai Tribe, a community with a high prevalence of Type II diabetes. In addition to their use in diabetes research, the informed consent indicated that the samples might be used for research on “behavioral and medical disorders,” not otherwise specified. The researchers tried, but failed, to link genes sampled from the Havasupai tribe with cases of diabetes. The gene samples were subsequently used for ancillary studies, including studies of schizophrenia and of demographic trends among the Havasupai. These ancillary studies were performed without the knowledge of the Havasupai. In 2003 a member of the Havasupai tribe happened to attend a lecture at Arizona State University on the various studies performed with the Havasupai DNA samples.

The Havasupai tribe was enraged. They were opposed to the use of their DNA samples for studies of schizophrenia or of demographic trends. In their opinion, these studies did not benefit the Havasupai and touched upon questions that were considered embarrassing and taboo, including the topic of consanguineous matings and the prevalence rates of mental illnesses within the tribe.

In 2004, the Havasupai Tribe filed a lawsuit alleging lapses in the informed consent process, violation of civil rights, violation of confidentiality, and unapproved use of the samples. The case was dismissed on procedural grounds, but was reinstated by the Arizona Court of Appeals in 2008 [34].

Reinstatement of the case led to lengthy and costly legal maneuvers. Eventually, the case was settled out of court. Arizona State University agreed to pay members of the Havasupai tribe a total of $700,000, an award considerably less than the legal costs already incurred by the University. Arizona State University also agreed to return the disputed DNA samples to the Havasupai tribe.

If the Havasupai tribe had won anything in this dispute, it must have been a Pyrrhic victory. Because the case was settled out of court, no legal decision was rendered, and no clarifying precedent was established.

Though I am not qualified to comment on the fine points of the law, several of the principles related to the acquisition and use of data are relevant and can be discussed as topics of general interest.

First, the purpose of an informed consent document is to list the harms that might befall the individual who gives consent, as a consequence of his or her participation as a human subject. Consent relates only to harm; consent does not relate to approval of the research itself. Laypersons should not be put into a situation wherein they must judge the value of research goals. By signing consent, the signatory indicates that he or she is aware of the potential harms of the research and agrees to accept the risk. In the case of samples or data records contributed to a Big Data resource, consenters must be warned, in writing, that the data will be used for purposes that cannot be specified in the consent form.

Second, most consent is obtained to achieve one primary purpose, and this purpose is customarily described briefly in the consent form. The person who consents often wants to know that the risks that he or she is accepting will be compensated by some potential benefit to society. In the case of Havasupai Tribe v. Arizona Board of Regents, the tribe sought to exert control over how their DNA would be used [35]. It would seem that the Havasupai Tribe members believed that their DNA should be used exclusively for scientific efforts that would benefit the tribe. There is no ethical requirement that binds scientists to conduct their research for the sole benefit of one group of individuals. A good consent form will clearly state that the research conducted cannot be expected to be of any direct value to the consenter.

Finally, the consent form should include all of the potential harms that might befall the consenter as a consequence of his or her participation. It may be impossible to anticipate every possible adverse consequence to a research participant. In this case, the scientists at Arizona State University did not anticipate that members of the Havasupai Tribe would be harmed if their gene data were used for ancillary research purposes. I would expect that the researchers at Arizona State University do not believe that their research produced any real harm. The Havasupai tribal members believe otherwise; it would seem that the Havasupai believed that their DNA samples were abused and that their trust had been violated.

Had the original consent form listed all of the potential harms, as perceived by the Havasupai, the incident could have been avoided. The Havasupai could have reached an informed decision, weighing the potential benefits of diabetes research against the uncertain consequences of using their DNA samples for future research projects that might be considered taboo.

Why had the Havasupai signed their consent forms? Had any members of the Havasupai tribe voiced concerns over the unspecified medical and behavioral disorders mentioned in the consent form, the incident might likewise have been averted.

In a sense, the Havasupai v. Arizona Board of Regents lawsuit hinged on a misunderstanding. The Havasupai did not understand how scientists use information to pursue new questions. The Board of Regents did not understand the harms that can occur even when data is used for legitimate scientific purposes. The take-home lesson for data managers is the following: to the extent humanly possible, ensure that consent documents contain a complete listing of relevant adverse consequences. In some cases, this may involve writing the consent form with the assistance of members of the group whose consent is sought.

Glossary

Bayh-Dole Act The Patent and Trademark Amendments of 1980, P.L. 96-517. Adopted in 1980, the U.S. Bayh-Dole legislation and subsequent extensions gave universities and corporations the right to keep and control any intellectual property (including data sets) developed under federal grants. The Bayh-Dole Act has provided entrepreneurial opportunities for researchers who work under federal grants, but has created conflicts of interest that should be disclosed to human subjects during the informed consent process. It is within the realm of possibility that a researcher who stands to gain considerable wealth, depending on the outcome of the project, may behave recklessly or dishonestly to achieve his or her ends.

DMCA Digital Millennium Copyright Act, signed into law in 1998. This law deals with many different areas of copyright protection, most of which are only peripherally relevant to Big Data. In particular, the law focuses on copyright protections for recorded works, particularly works that have been theft-protected by the copyright holders [10]. The law also contains a section (Title II) dealing with the obligations of online service providers who inadvertently distribute copyrighted material. Service providers may be protected from copyright infringement liability if they block access to the copyrighted material when the copyright holder or the holder's agent claims infringement. To qualify for liability protection, service providers must comply with various guidelines (i.e., the so-called safe harbor guidelines) included in the Act.

Data Quality Act In the United States the data upon which public policy is based must have quality and must be available for review by the public. Simply put, public policy must be based on verifiable data. The Data Quality Act requires the Office of Management and Budget to develop government-wide standards for data quality [3].

Data sharing Providing one's own data to another person or entity. This process may involve free or purchased data, and it may be done willingly, or under coercion, as in compliance with regulations, laws, or court orders.

Informed consent Human subjects who are put at risk must provide affirmative consent, if they are to be included in a government-sponsored study. This legally applies in the United States and most other nations, and ethically applies to any study that involves putting humans at risk. To this end, researchers provide prospective human subjects with an “informed consent” document that informs the subject of the risks of the study, and discloses foreseen financial conflicts among the researchers. The informed consent must be clear to laymen, must be revocable (i.e., subjects can change their mind and withdraw from the study, if feasible to do so), must not contain exculpatory language (e.g., no waivers of responsibility for the researchers), must not promise any benefit or monetary compensation as a reward for participation, and must not be coercive (i.e., must not suggest a negative consequence as a result of non-participation).

Web service Server-based collections of data, plus a collection of software routines operating on the data, that can be accessed by remote clients. One of the features of Web services is that they permit client users (e.g., humans or software agents) to discover the kinds of data and methods offered by the Web Service and the rules for submitting server requests. To access Web services, clients must compose their requests as messages conveyed in a language that the server is configured to accept, a so-called Web services language.
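As a minimal illustration of the request-and-reply pattern described in this entry, the Python sketch below composes a parameterized query and parses a JSON reply. The endpoint URL, parameter names, and response fields are invented for illustration and do not correspond to any real service.

# A minimal sketch of a client querying a hypothetical Web service.
import json
import urllib.parse
import urllib.request

def query_service(base_url, **params):
    """Compose a request in the form the server expects and parse the JSON reply."""
    url = base_url + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as response:
        return json.load(response)

# Example (hypothetical endpoint; not a real URL):
# result = query_service("https://example.org/api/variants", gene="SCN1A", limit=10)
# print(result)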

References

[1] Shah S., Horne A., Capella J. Good Data Won't Guarantee Good Decisions. Harvard Business Review; April 2012.

[2] Cranor C. Scientific Inferences in the Laboratory and the Law. Am J Public Health. 2005;95:S121–S128.

[3] Data Quality Act. 67 Fed. Reg. 8,452, February 22, 2002, addition to FY 2001 Consolidated Appropriations Act (Pub. L. No. 106-554 codified at 44 U.S.C. 3516).

[4] Bornstein D. The dawn of the evidence-based budget. The New York Times; 2012 May 30.

[5] Guidelines for ensuring and maximizing the quality, objectivity, utility, and integrity of information disseminated by federal agencies. Fed Regist. 2002;67(36) February 22.

[6] Sass J.B., Devine Jr. J.P. The Center for Regulatory Effectiveness invokes the Data Quality Act to reject published studies on atrazine toxicity. Environ Health Perspect. 2004;112:A18.

[7] Tozzi J.J., Kelly Jr. W.G., Slaughter S. Correspondence: data quality act: response from the Center for Regulatory Effectiveness. Environ Health Perspect. 2004;112:A18–A19.

[8] Mooney C. Interrogations: thanks to a little-known piece of legislation, scientists at the EPA and other agencies find their work questioned not only by industry, but by their own government. Boston Globe; 2005 August 28. Available from: http://archive.boston.com/news/globe/ideas/articles/2005/08/28/interrogations/?page=full [viewed November 7, 2017].

[9] Copyright Act, Section 107, Limitations on exclusive rights: fair use. Available from: http://www.copyright.gov/title17/92chap1.html [viewed May 18, 2017].

[10] The Digital Millennium Copyright Act of 1998 U.S. Copyright Office Summary. Available from: http://www.copyright.gov/legislation/dmca.pdf [viewed August 24, 2012].

[11] No Electronic Theft (NET) Act of 1997 (H.R. 2265). Statement of Marybeth Peters The Register of Copyrights before the Subcommittee on Courts and Intellectual Property Committee on the Judiciary. United States House of Representatives 105th Congress, 1st Session. September 11, 1997. Available from: http://www.copyright.gov/docs/2265_stat.html [viewed August 26, 2012].

[12] The Freedom of Information Act. 5 U.S.C. 552. Available from: http://www.nih.gov/icd/od/foia/5usc552.htm [viewed August 26, 2012].

[13] Gates S. Qualcomm v. Broadcom—The Federal Circuit Weighs in on “Patent Ambushes”; 2008 December 5. Available from: http://www.mofo.com/qualcomm-v-broadcom---the-federal-circuit-weighs-in-on-patent-ambushes-12-05-2008 [viewed January 22, 2013].

[14] Cahr D., Kalina I. Of pacs and trolls: how the patent wars may be coming to a hospital near you. ABA Health Lawyer. 2006;19:15–20.

[15] Berman J.J. Data simplification: taming information with open source tools. Waltham, MA: Morgan Kaufmann; 2016.

[16] Greenbaum D., Gerstein M. A universal legal framework as a prerequisite for database interoperability. Nat Biotechnol. 2003;21:979–982.

[17] Perlroth N. Digital data on patients raises risk of breaches. The New York Times; 2011 December 18.

[18] Frieden T. VA will pay $20 million to settle lawsuit over stolen laptop's data. CNN.; 2009 January 27.

[19] Mathieson S.A. UK government loses data on 25 million Britons: HMRC chairman resigns over lost CDs. ComputerWeekly.com; 2007 November 20.

[20] Sack K. Patient data posted online in major breach of privacy. The New York Times; 2011 September 8.

[21] Broad W.J. U.S. accidentally releases list of nuclear sites. The New York Times; 2009 June 3.

[22] Corti L., Day A., Backhouse G. Confidentiality and informed consent: issues for consideration in the preservation of and provision of access to qualitative data archives [46 paragraphs].

[23] Framingham Heart Study. NIH, U.S. National Library of Medicine. Clinical Trials.gov. Available from: http://www.clinicaltrials.gov/ct/show/NCT00005121 [viewed October 16, 2012].

[24] Berman J.J. Racing to share pathology data. Am J Clin Pathol. 2004;121:169–171.

[25] Department of Health and Human Services. 45 CFR (Code of Federal Regulations), Parts 160 through 164. Standards for Privacy of Individually Identifiable Health Information (Final Rule). Fed Regist. 2000;65(250):82461–82510 December 28.

[26] Gilissen C., Hoischen A., Brunner H.G., Veltman J.A. Disease gene identification strategies for exome sequencing. Eur J Hum Genet. 2012;20:490–497.

[27] Bodmer W., Bonilla C. Common and rare variants in multifactorial susceptibility to common diseases. Nat Genet. 2008;40:695–701.

[28] Wallis Y., Payne S., McAnulty C., Bodmer D., Sistermans E., Robertson K., Moore D., Abbs D., Deans Z., Devereau A. Practice guidelines for the evaluation of pathogenicity and the reporting of sequence variants in clinical molecular genetics. Association for Clinical Genetic Science; 2013. http://www.acgs.uk.com/media/774853/evaluation_and_reporting_of_sequence_variants_bpgs_june_2013_-_finalpdf.pdf [viewed May 26, 2017].

[29] Pritchard J.K. Are rare variants responsible for susceptibility to complex diseases? Am J Hum Genet. 2001;69:124–137.

[30] Pennisi E. Breakthrough of the year: human genetic variation. Science. 2007;318:1842–1843.

[31] MacArthur D.G., Manolio T.A., Dimmock D.P., Rehm H.L., Shendure J., Abecasis G.R., et al. Guidelines for investigating causality of sequence variants in human disease. Nature. 2014;508:469–476.

[32] Ray T. Mother's negligence suit against Quest's Athena could broadly impact genetic testing labs. GenomeWeb; 2016 March 14.

[33] Ray T. Wrongful death suit awaits input from South Carolina supreme court. Genomeweb; 2017 April 4.

[34] Appeal from the Superior Court in Maricopa County Cause No. CV2005-013190. Available from: http://www.azcourts.gov/Portals/89/opinionfiles/CV/CV070454.pdf [viewed August 21, 2012].

[35] Informed consent and the ethics of DNA research. The New York Times; 2010 April 23.
