PEEK “UNDER THE HOOD” OF BIG DATA ANALYTICS

The world of big data analytics grows ever more complex. And while many people can work superficially with specific frameworks, far fewer understand the fundamental principles of large-scale, distributed data processing systems and how they operate. In Foundations of Data Intensive Applications: Large Scale Data Analytics under the Hood, renowned big-data experts and computer scientists Drs. Supun Kamburugamuve and Saliya Ekanayake deliver a practical guide to applying the principles of big data to software development for optimal performance.

The authors discuss foundational components of large-scale data systems and walk readers through the major software design decisions that define performance, application type, and usability. You’ll learn how to recognize problems in your applications resulting in performance and distributed operation issues, diagnose them, and effectively eliminate them by relying on the bedrock big data principles explained within.

Moving beyond individual frameworks and APIs for data processing, this book unlocks the theoretical ideas that operate under the hood of every big data processing system.

Ideal for data scientists, data architects, dev-ops engineers, and developers, Foundations of Data Intensive Applications: Large Scale Data Analytics under the Hood shows readers how to:

  • Identify the foundations of large-scale, distributed data processing systems
  • Make major software design decisions that optimize performance
  • Diagnose performance problems and distributed operation issues
  • Understand state-of-the-art research in big data
  • Explain and use the major big data frameworks and understand what underpins them
  • Use big data analytics in the real world to solve practical problems

Table of Contents

  1. Cover
  2. Title Page
  3. Introduction
    1. History of Data-Intensive Applications
    2. Data Processing Architecture
    3. Foundations of Data-Intensive Applications
    4. Who Should Read This Book?
    5. Organization of the Book
    6. Scope of the Book
    7. References
  4. CHAPTER 1: Data Intensive Applications
    1. Anatomy of a Data-Intensive Application
    2. Parallel Applications
    3. Application Classes and Frameworks
    4. What Makes It Difficult?
    5. Summary
    6. References
    7. Notes
  5. CHAPTER 2: Data and Storage
    1. Storage Systems
    2. Data Formats
    3. Data Replication
    4. Data Partitioning
    5. NoSQL Databases
    6. Message Queuing
    7. Summary
    8. References
    9. Notes
  6. CHAPTER 3: Computing Resources
    1. A Demonstration
    2. Computer Clusters
    3. Data Analytics in Clusters
    4. Distributed Application Life Cycle
    5. Computing Resources
    6. Cluster Resource Managers
    7. Job Scheduling
    8. Summary
    9. References
    10. Notes
  7. CHAPTER 4: Data Structures
    1. Virtual Memory
    2. The Need for Data Structures
    3. Object and Text Data
    4. Vectors and Matrices
    5. Table
    6. Summary
    7. References
    8. Notes
  8. CHAPTER 5: Programming Models
    1. Introduction
    2. Data Structures and Operations
    3. Message Passing Model
    4. Distributed Data Model
    5. Task Graphs (Dataflow Graphs)
    6. Batch Dataflow
    7. Streaming Dataflow
    8. SQL
    9. Summary
    10. References
    11. Notes
  9. CHAPTER 6: Messaging
    1. Network Services
    2. Messaging for Data Analytics
    3. Distributed Operations
    4. Distributed Operations on Arrays
    5. Distributed Operations on Tables
    6. Advanced Topics
    7. Summary
    8. References
    9. Notes
  10. CHAPTER 7: Parallel Tasks
    1. CPUs
    2. Accelerators
    3. Task Execution
    4. Batch Tasks
    5. Streaming Tasks
    6. Summary
    7. References
  11. CHAPTER 8: Case Studies
    1. Apache Hadoop
    2. Apache Spark
    3. Apache Storm
    4. Kafka Streams
    5. PyTorch
    6. Cylon
    7. Rapids cuDF
    8. Summary
    9. References
    10. Notes
  12. CHAPTER 9: Fault Tolerance
    1. Dependable Systems and Failures
    2. Recovering from Faults
    3. Checkpointing
    4. Streaming Systems
    5. Batch Systems
    6. Summary
    7. References
  13. CHAPTER 10: Performance and Productivity
    1. Performance Metrics
    2. Performance Factors
    3. Finding Issues
    4. Programming Languages
    5. Productivity
    6. Summary
    7. References
    8. Notes
  14. Index
  15. Copyright
  16. Dedication
  17. About the Authors
  18. About the Editor
  19. Acknowledgments
  20. End User License Agreement