Section 3: Implementing Common Use Cases and Best Practices

This part of the book explains how to implement the most common Amazon EMR use cases: batch ETL with Spark, real-time streaming with Spark Streaming, and UPSERT operations in S3 data lakes with Apache Hudi. It then shows how to orchestrate EMR jobs and how to plan the migration of an on-premises Hadoop cluster to EMR. Finally, it covers best practices and cost optimization techniques you can follow while implementing a data analytics pipeline in EMR.

This section comprises the following chapters:

  • Chapter 9, Implementing Batch ETL Pipeline with Amazon EMR and Apache Spark
  • Chapter 10, Implementing Real-Time Streaming with Amazon EMR and Spark Streaming
  • Chapter 11, Implementing UPSERT on S3 Data Lake with Apache Spark and Apache Hudi
  • Chapter 12, Orchestrating Amazon EMR Jobs with AWS Step Functions and Apache Airflow/MWAA
  • Chapter 13, Migrating On-Premises Hadoop Workloads to Amazon EMR
  • Chapter 14, Best Practices and Cost Optimization Techniques