Amazon Elastic MapReduce (EMR)

EMR is a fully managed Hadoop framework that can be launched in minutes. It handles the tasks of node provisioning, cluster setup, configuration, and cluster tuning for you. It operates using EC2 instances and can scale from one node to thousands.

You can increase or decrease the number of instances manually, or use auto scaling to do it dynamically, even while the cluster is running. The EMR service monitors your cluster: it can retry failed tasks and automatically replaces poorly performing instances.
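As a rough illustration of the two resizing styles, the sketch below builds the request payloads for a manual resize (the `ModifyInstanceGroups` API) and for EMR managed scaling, one of the service's automatic scaling mechanisms (the `PutManagedScalingPolicy` API). The helper names, cluster and instance group IDs, and the default region are assumptions made for illustration; the actual calls require the `boto3` SDK and valid AWS credentials.

```python
def resize_request(cluster_id, instance_group_id, new_count):
    """Build keyword arguments for a manual resize (ModifyInstanceGroups)."""
    if new_count < 0:
        raise ValueError("instance count cannot be negative")
    return {
        "ClusterId": cluster_id,
        "InstanceGroups": [
            {"InstanceGroupId": instance_group_id, "InstanceCount": new_count}
        ],
    }


def managed_scaling_request(cluster_id, min_instances, max_instances):
    """Build keyword arguments for PutManagedScalingPolicy."""
    if not 1 <= min_instances <= max_instances:
        raise ValueError("need 1 <= min_instances <= max_instances")
    return {
        "ClusterId": cluster_id,
        "ManagedScalingPolicy": {
            "ComputeLimits": {
                "UnitType": "Instances",
                "MinimumCapacityUnits": min_instances,
                "MaximumCapacityUnits": max_instances,
            }
        },
    }


def apply_requests(resize=None, scaling=None, region="us-east-1"):
    """Send the requests to EMR. The region default is an assumption."""
    import boto3  # imported lazily so the builders work without the SDK

    emr = boto3.client("emr", region_name=region)
    if resize is not None:
        emr.modify_instance_groups(**resize)
    if scaling is not None:
        emr.put_managed_scaling_policy(**scaling)
```

Both requests can be sent while the cluster is running; for example, `apply_requests(resize=resize_request("j-EXAMPLE", "ig-EXAMPLE", 6))` would ask EMR to bring a hypothetical instance group to six nodes.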

Even though the service is managed, you retain complete control over the cluster, including root access, and you can install additional applications. EMR lets you choose from several Hadoop distributions and applications such as Apache Spark, Presto, and HBase.

Data storage can be linked to Amazon S3 through the EMR File System (EMRFS). You can store your data in S3 and use multiple EMR clusters to process the same dataset. This lets you configure each cluster for the task at hand rather than settling for a single configuration that is a compromise across all tasks. It also aligns with the goal of separating compute and storage.
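To make the idea of separate, task-specific clusters over one dataset concrete, here is a minimal sketch of two cluster profiles that read the same S3 prefix. The bucket name, script locations, instance types, and node counts are all hypothetical; only the shape of the `Instances` and `Steps` structures follows the EMR `RunJobFlow` API.

```python
# Hypothetical S3 location shared by every cluster; EMRFS exposes it as s3://.
SHARED_INPUT = "s3://example-bucket/events/"


def instance_groups(instance_type, core_count):
    """One primary node plus `core_count` core nodes of the given type."""
    return [
        {"Name": "Primary", "InstanceRole": "MASTER",
         "InstanceType": instance_type, "InstanceCount": 1},
        {"Name": "Core", "InstanceRole": "CORE",
         "InstanceType": instance_type, "InstanceCount": core_count},
    ]


def spark_step(name, script_uri, output_uri):
    """A Spark step whose script receives the shared input and its own output."""
    return {
        "Name": name,
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", script_uri, SHARED_INPUT, output_uri],
        },
    }


# A memory-heavy profile for ad hoc analytics (sizes are illustrative).
analytics = {
    "InstanceGroups": instance_groups("r5.2xlarge", 4),
    "Step": spark_step("analytics", "s3://example-bucket/jobs/report.py",
                       "s3://example-bucket/out/report/"),
}

# A compute-heavy profile for nightly ETL over the same data.
etl = {
    "InstanceGroups": instance_groups("c5.4xlarge", 10),
    "Step": spark_step("etl", "s3://example-bucket/jobs/etl.py",
                       "s3://example-bucket/out/etl/"),
}
```

Because the data stays in S3, each profile can be sized, started, and terminated independently without copying the dataset between clusters.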

You can also programmatically create an EMR cluster, run a job, then automatically shut it down when complete. This makes it very useful for large-scale batch processing or analytical jobs.
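The create, run, and shut down pattern might look like the following sketch, which uses the `boto3` SDK's `run_job_flow` call. Setting `KeepJobFlowAliveWhenNoSteps` to `False` is what tells EMR to terminate the cluster once its steps finish. The release label, instance types, role names, region, and S3 URIs are assumptions chosen for illustration, and the function names are hypothetical.

```python
def transient_cluster_request(name, script_uri, log_uri):
    """Build RunJobFlow arguments for a cluster that runs one step and exits."""
    return {
        "Name": name,
        "ReleaseLabel": "emr-6.15.0",         # assumed release; pick your own
        "LogUri": log_uri,
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "InstanceGroups": [
                {"Name": "Primary", "InstanceRole": "MASTER",
                 "InstanceType": "m5.xlarge", "InstanceCount": 1},
                {"Name": "Core", "InstanceRole": "CORE",
                 "InstanceType": "m5.xlarge", "InstanceCount": 2},
            ],
            # Shut the cluster down automatically when no steps remain.
            "KeepJobFlowAliveWhenNoSteps": False,
            "TerminationProtected": False,
        },
        "Steps": [{
            "Name": "batch-job",
            "ActionOnFailure": "TERMINATE_CLUSTER",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": ["spark-submit", "--deploy-mode", "cluster", script_uri],
            },
        }],
        "JobFlowRole": "EMR_EC2_DefaultRole",  # default role names, assumed to exist
        "ServiceRole": "EMR_DefaultRole",
    }


def launch_and_wait(request, region="us-east-1"):
    """Start the cluster and block until EMR reports it terminated."""
    import boto3  # imported lazily so the builder works without the SDK

    emr = boto3.client("emr", region_name=region)
    cluster_id = emr.run_job_flow(**request)["JobFlowId"]
    emr.get_waiter("cluster_terminated").wait(ClusterId=cluster_id)
    return cluster_id
```

A scheduler could call `launch_and_wait(transient_cluster_request("nightly", "s3://example-bucket/jobs/etl.py", "s3://example-bucket/logs/"))` each night, so that compute is paid for only while the job runs.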
