Many organizations today analyze and share large, sensitive datasets about individuals. Whether these datasets cover healthcare details, financial records, or exam scores, it's become more difficult for organizations to protect an individual's information through deidentification, anonymization, and other traditional statistical disclosure limitation techniques. This practical book explains how differential privacy (DP) can help.

Authors Ethan Cowan, Mayana Pereira, and Michael Shoemate show how DP techniques enable data scientists, researchers, and programmers to run statistical analyses that hide the contribution of any single individual. You'll dive into basic DP concepts and understand how to use open source tools to create differentially private statistics, explore how to assess the utility/privacy trade-offs, and learn how to integrate differential privacy into workflows.

With this book, you'll learn:

  • How DP guarantees privacy when other data anonymization methods don't
  • What preserving individual privacy in a dataset entails
  • How to apply DP in several real-world scenarios and datasets
  • Potential privacy attack methods, including what it means to perform a reidentification attack
  • How to use the OpenDP library in privacy-preserving data releases
  • How to interpret guarantees provided by specific DP data releases

Table of Contents

  1. History
  2. Privacy in the Classroom
  3. Differential Privacy Fundamentals
  4. Dataset Transformations, Dataset Distance, and Stability
  5. Event-level Data and Differential Privacy
  6. A System for Relating Distances
  7. Machine Learning and Differential Privacy
  8. Machine Learning Part II
  9. Defining Privacy Loss Parameters of a Data Release: How to Choose Epsilon, Delta and Other Greek Letters
  10. Protecting Your Data Against Privacy Attacks
  11. Differentially Private Synthetic Data