
Section 3: Implementing Common Use Cases and Best Practices

This part of the book explains how to implement the most common Amazon EMR use cases, including batch ETL with Spark, real-time streaming with Spark Streaming, and handling UPSERT operations in S3 data lakes with Apache Hudi. It then explains how to orchestrate your EMR jobs and how to plan the migration of an on-premises Hadoop cluster to EMR. Finally, it covers best practices and cost optimization techniques you can follow while implementing your data analytics pipeline in EMR.

This section comprises the following chapters:

Chapter 9, Implementing Batch ETL Pipeline with Amazon EMR and Apache Spark
Chapter 10, Implementing Real-Time Streaming with Amazon EMR and Spark Streaming
Chapter 11, Implementing UPSERT on S3 Data Lake with Apache Spark and Apache Hudi
Chapter 12, Orchestrating Amazon EMR Jobs with AWS Step Functions and Apache Airflow/MWAA
Chapter 13, Migrating On-Premises Hadoop Workloads to Amazon EMR
Chapter 14, Best Practices and Cost Optimization Techniques
