Healthcare is the next frontier for data science. Using the latest in machine learning, deep learning, and natural language processing, you'll be able to solve healthcare's most pressing problems: reducing cost of care, ensuring patients get the best treatment, and increasing accessibility for the underserved. First, though, you need to learn how to access and make sense of all that data.

This book provides pragmatic and hands-on solutions for working with healthcare data, from data extraction to cleaning and normalizing to feature engineering. Author Andrew Nguyen covers specific ML and deep learning examples with a focus on producing high-quality data. You'll discover how graph technologies help you connect disparate data sources so you can solve healthcare's most challenging problems using advanced analytics.

With this book, you'll learn:

  • The different types of healthcare data: electronic health records, clinical registries and trials, digital health tools, and claims data
  • The challenges of working with healthcare data, especially when trying to aggregate data from multiple sources
  • Current options for extracting structured data from clinical text
  • How to make trade-offs when using tools and frameworks for normalizing structured healthcare data
  • How to harmonize healthcare data using terminologies, ontologies, mappings, and crosswalks

Table of Contents

  1. Introduction to Healthcare Data
    1. The Enterprise Mindset
    2. The Complexity of Healthcare Data
    3. Sources of Healthcare Data
    4. Electronic Health Records
    5. Claims Data
    6. Clinical / Disease Registries
    7. Clinical Trials Data
    8. Other Data
    9. Data Collection and How that Affects Data Scientists
    10. Retrospective vs. Prospective Studies
  2. Technical Introduction
    1. Basic Introduction to Docker and Containers
    2. Installing and Testing Docker
    3. Conceptual Introduction to Databases
    4. ACID Compliance
    5. OLTP Systems
    6. OLAP Systems
    7. SQL vs. NoSQL
    8. SQL Databases
    9. Labeled Property Graph (LPG) Databases
    10. Resource Description Framework (RDF) Databases
    11. Hypergraph Databases
    12. Conclusion