It's no wonder data engineering expertise is in high demand, given the large costs data systems can incur and their enormous impact on decision-making. With this practical book, you'll learn industry-tested methods for taking advantage of cloud services while avoiding complexity and out-of-control costs.

Cloud service providers deliver most of the high-quality education in this space, but their offerings have a major downside: their documentation tends to focus on solving problems with their own products rather than addressing the complexity of working in an enterprise environment. Author Sahil Jhangiani provides a holistic approach to managing data in the cloud, using specific examples from the three major cloud providers (AWS, GCP, and Azure).

You'll learn how to:

  • Navigate the large swath of cloud services available
  • Cut through marketing hype to identify the root technologies and concepts behind big data tooling
  • Build modular pipelines and systems that manage change smoothly
  • Understand the tricks and pitfalls of processing large datasets, from both a cost and a performance viewpoint
  • Create systems that integrate smoothly and can adapt to ever-changing analytical workloads
  • Avoid vendor lock-in and leverage individual cloud services for what they do best

Table of Contents

  1. Service Offerings
    1. A Quick Bit on Managed Services
    2. The Borrowed Instance
    3. Storage
    4. Execution Environment
    5. Data Services
    6. DevOps
    7. Networking
    8. Data Analytics/Science
    9. Billing Structures
    10. Tradeoffs, Tradeoffs, Tradeoffs
  2. Security, Privacy, and Regulatory Considerations
    1. Security Basics
    2. Secrets Management
    3. Networking
    4. User Interaction
    5. Data Privacy and Regulatory Compliance