
DESIGNING BIG DATA PLATFORMS

Provides expert guidance and valuable insights on getting the most out of Big Data systems

An array of tools is currently available for managing and processing data: some are ready-to-go solutions that can be immediately deployed, while others require complex and time-intensive setups. With such a vast range of options, choosing the right tool to build a solution can be complicated, as can determining which tools work well with each other. Designing Big Data Platforms provides clear and authoritative guidance on the critical decisions necessary for successfully deploying, operating, and maintaining Big Data systems.

This highly practical guide helps readers understand how to process large amounts of data with well-known Linux tools and database solutions, use effective techniques to collect and manage data from multiple sources, transform data into meaningful business insights, and much more. Author Yusuf Aytas, a software engineer with extensive Big Data experience, discusses the design of the ideal Big Data platform: one that meets the needs of data analysts, data engineers, data scientists, software engineers, and a spectrum of other stakeholders across an organization. Detailed yet accessible chapters cover key topics such as stream data processing, data analytics, data science, data discovery, and data security. This real-world manual for Big Data technologies:

  • Provides up-to-date coverage of the tools currently used in Big Data processing and management
  • Offers step-by-step guidance on building a data pipeline, from basic scripting to distributed systems
  • Highlights and explains how data is processed at scale
  • Includes an introduction to the foundation of a modern data platform

Designing Big Data Platforms: How to Use, Deploy, and Maintain Big Data Systems is a must-have for all professionals working with Big Data, as well as researchers and students in computer science and related fields.

Table of Contents

  1. Cover
  2. Title Page
  3. Copyright
  4. List of Contributors
  5. Preface
  6. Acknowledgments
  7. Acronyms
  8. Introduction
  9. 1 An Introduction: What's a Modern Big Data Platform
    1. 1.1 Defining Modern Big Data Platform
    2. 1.2 Fundamentals of a Modern Big Data Platform
  10. 2 A Bird's Eye View on Big Data
    1. 2.1 A Bit of History
    2. 2.2 What Makes Big Data
    3. 2.3 Components of Big Data Architecture
    4. 2.4 Making Use of Big Data
  11. 3 A Minimal Data Processing and Management System
    1. 3.1 Problem Definition
    2. 3.2 Processing Large Data with Linux Commands
    3. 3.3 Processing Large Data with PostgreSQL
    4. 3.4 Cost of Big Data
  12. 4 Big Data Storage
    1. 4.1 Big Data Storage Patterns
    2. 4.2 On‐Premise Storage Solutions
    3. 4.3 Cloud Storage Solutions
    4. 4.4 Hybrid Storage Solutions
  13. 5 Offline Big Data Processing
    1. 5.1 Defining Offline Data Processing
    2. 5.2 MapReduce Technologies
    3. 5.3 Apache Spark
    4. 5.4 Apache Flink
    5. 5.5 Presto
  14. 6 Stream Big Data Processing
    1. 6.1 The Need for Stream Processing
    2. 6.2 Defining Stream Data Processing
    3. 6.3 Streams via Message Brokers
    4. 6.4 Streams via Stream Engines
  15. 7 Data Analytics
    1. 7.1 Log Collection
    2. 7.2 Transferring Big Data Sets
    3. 7.3 Aggregating Big Data Sets
    4. 7.4 Data Pipeline Scheduler
    5. 7.5 Patterns and Practices
    6. 7.6 Exploring Data Visually
  16. 8 Data Science
    1. 8.1 Data Science Applications
    2. 8.2 Data Science Life Cycle
    3. 8.3 Data Science Toolbox
    4. 8.4 Productionalizing Data Science
  17. 9 Data Discovery
    1. 9.1 Need for Data Discovery
    2. 9.2 Data Governance
    3. 9.3 Data Discovery Tools
  18. 10 Data Security
    1. 10.1 Infrastructure Security
    2. 10.2 Data Privacy
    3. 10.3 Law Enforcement
    4. 10.4 Data Security Tools
  19. 11 Putting All Together
    1. 11.1 Platforms
    2. 11.2 Big Data Systems and Tools
    3. 11.3 Challenges
  20. 12 An Ideal Platform
    1. 12.1 Event Sourcing
    2. 12.2 Kappa Architecture
    3. 12.3 Data Mesh
    4. 12.4 Data Reservoirs
    5. 12.5 Data Catalog
    6. 12.6 Self‐service Platform
    7. 12.7 Abstraction
    8. 12.8 Data Guild
    9. 12.9 Trade‐offs
    10. 12.10 Data Ethics
  21. Appendix A: Further Systems and Patterns
    1. A.1 Lambda Architecture
    2. A.2 Apache Cassandra
    3. A.3 Apache Beam
  22. Appendix B: Recipes
    1. B.1 Activity Tracking Recipe
    2. B.2 Data Quality Assurance
    3. B.3 Estimating Time to Delivery
    4. B.4 Incident Response Recipe
    5. B.5 Leveraging Spark SQL Metrics
    6. B.6 Airbnb Price Prediction
  23. Bibliography
  24. Index
  25. End User License Agreement