Book Description

Data management is undergoing a revolution in how data is collected, stored, processed, governed, managed, and provided to decision makers. The data lake is a popular approach that harnesses the power of big data and marries it with the agility of self-service. With this report, IT executives and data architects will focus on the technical aspects of building a data lake for their organization.

Alex Gorelik from Facebook explains the requirements for building a successful data lake that business users can easily access whenever they have a need. You'll learn the phases of data lake maturity, common mistakes that lead to data swamps, and the importance of aligning data with your company's business strategy and gaining executive sponsorship.

You'll explore:

  • The ingredients of modern data lakes, such as the use of different ingestion methods for different data formats, and the importance of the three Vs: volume, variety, and velocity
  • Building blocks of successful data lakes, including data ingestion, integration, persistence, data governance, and business intelligence and self-service analytics
  • State-of-the-art data lake architectures offered by Amazon Web Services, Microsoft Azure, and Google Cloud

Table of Contents

  1. Introduction to Data Lakes
    1. Building a Successful Data Lake
    2. Advantages of a Cloud Data Lake Platform
    3. The Data Swamp
    4. Conclusion
  2. Building Successful Data Lakes
    1. Ingestion and Integration
      1. ETL/ELT, MapReduce
      2. Self-Service Data Preparation
      3. Integration Platform as a Service
      4. Data Virtualization
    2. Persistence
      1. Why Use Zones?
      2. Storage Technologies
    3. Governance
      1. Regulatory Compliance
      2. Access Control
      3. Data Quality
    4. BI and Self-Service Analytics
    5. Advanced Analytics—Data Science, AI/ML
    6. Conclusion
  3. AWS, Azure, and GCP Architecture
    1. Amazon Web Services
    2. Microsoft Azure
    3. Google Cloud Platform
    4. Which Service Should You Use?
    5. Conclusion
  4. Architecting Multiple Data Lakes
    1. To Merge or Not to Merge?
      1. Reasons for Keeping Data Lakes Separate
      2. Advantages of Merging Data Lakes
      3. Building Multiple Data Lakes on the Same Cloud Platform
    2. Virtual Data Lakes
      1. Data Federation
      2. Data Fabric
      3. Catalogs and Data Oceans
    3. Conclusion