Book Description

Recently, data scientists have found effective methods to generate high-quality synthetic data. That’s good news for companies seeking large amounts of data to train and build artificial intelligence and machine learning models. This report provides an overview of synthetic data generation that not only focuses on business value and use cases but also provides some practical techniques for using synthetic data.

Author Khaled El Emam, cofounder and Director of Replica Analytics and Professor at the University of Ottawa, helps data analytics leadership understand the options so they can get started building their own training sets. With the help of several industry use cases, you’ll learn how synthetic data can accelerate machine learning projects in your company. As advances in synthetic data generation continue, broad adoption of this approach will quickly follow.

  • Learn what synthetic data is and how it can accelerate machine learning model development
  • Understand how synthetic data is generated—and why these datasets are similar to real data
  • Explore the process and best practices for generating synthetic datasets
  • Examine case studies of synthetic data use in industries including manufacturing, healthcare, financial services, and transportation
  • Learn key requirements for future work and improvements to synthetic data

Table of Contents

  1. Defining Synthetic Data
    1. What Is Synthetic Data?
    2. The Benefits of Synthetic Data
      1. Improving Data Access
      2. Improving Data Quality
      3. Using Synthetic Data for Exploratory Analysis
      4. Using Synthetic Data for Full Analysis
      5. Replacing Real Data That Does Not Exist
    3. Learning to Trust Synthetic Data
    4. Other Approaches to Accessing Data
    5. Generating Synthetic Data from Real Data
    6. Conclusions
  2. The Synthesis Process
    1. Data Synthesis Projects
      1. Data Synthesis Steps
      2. Data Preparation
    2. The Data Synthesis Pipeline
    3. Synthesis Program Management
    4. Best Practices for Implementing Data Synthesis
      1. Having Sufficient Computing Capacity
      2. Synthesizing Cohorts Versus Full Datasets
      3. Performing Validation Studies to Get Buy-In
    5. Conclusions
  3. Synthetic Data Case Studies
    1. Manufacturing and Distribution
    2. Health Care
      1. Data for Cancer Research
      2. Evaluating Innovative Digital Health Technologies
    3. Financial Services
      1. Synthetic Data Benchmarks
      2. Software Testing
    4. Transportation
      1. Microsimulation Models
      2. Data Synthesis for Autonomous Vehicles
    5. Conclusions
  4. The Future of Data Synthesis
    1. Creating a Data Utility Framework
    2. Removing Information from Synthetic Data
    3. Using Data Watermarking
    4. Generating Synthesis from Simulators
    5. Conclusions