This part of the book will explain how to implement the most common use cases of Amazon EMR, including batch ETL with Spark, real-time streaming with Spark Streaming, and handling UPSERT operations in S3 data lakes with Apache Hudi. Then it will explain how you can orchestrate your EMR jobs and how you can strategize on-premises Hadoop cluster migration to EMR, and finally, it will cover some of the best practices and cost optimization techniques you can follow while implementing your data analytics pipeline in EMR.
This section comprises the following chapters:
18.118.226.216