Elastic MapReduce

Elastic MapReduce (EMR) is a fully managed cluster platform for running big-data and analytics frameworks such as Apache Hadoop, Spark, HBase, Presto, Impala, Cascading, and Flink. Running Hadoop clusters yourself is complex and time-consuming, so EMR provisions the cluster and installs frequently used frameworks for data scientists, analysts, and engineers.

EMR gives you the flexibility to bootstrap your cluster with a series of steps that you define to install software, configure the cluster, and prepare your data for processing. For storage, EMR can use the Hadoop Distributed File System (HDFS) on EBS volumes, or EMRFS with Amazon S3 as the backing persistence service.
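As a rough illustration of the bootstrap and step definitions described above, the sketch below builds a request dictionary shaped like the arguments of boto3's EMR `run_job_flow` call. It only constructs the dictionary and does not contact AWS. The function name `build_cluster_request`, the bucket name, the script paths, the cluster name, and the release label are all assumptions made for the example. In the EMR API, bootstrap actions run on each node while the cluster is being set up, and steps are submitted as work to the cluster.

```python
def build_cluster_request(bucket, release_label="emr-5.36.0"):
    """Build keyword arguments shaped like boto3's EMR run_job_flow request.

    Nothing is sent to AWS here. The bucket, script paths, cluster name,
    and release label are hypothetical placeholders.
    """
    return {
        "Name": "example-etl-cluster",
        "ReleaseLabel": release_label,
        # Frameworks EMR installs on the cluster.
        "Applications": [{"Name": "Hadoop"}, {"Name": "Spark"}],
        # Customer-defined setup script, run on every node during provisioning.
        "BootstrapActions": [
            {
                "Name": "install-dependencies",
                "ScriptBootstrapAction": {
                    "Path": f"s3://{bucket}/bootstrap/install.sh",
                    "Args": [],
                },
            }
        ],
        # Work submitted to the cluster; s3:// paths are read through EMRFS.
        "Steps": [
            {
                "Name": "spark-etl",
                "ActionOnFailure": "TERMINATE_CLUSTER",
                "HadoopJarStep": {
                    "Jar": "command-runner.jar",
                    "Args": [
                        "spark-submit",
                        f"s3://{bucket}/jobs/etl.py",
                        "--input", f"s3://{bucket}/raw/",
                        "--output", f"s3://{bucket}/curated/",
                    ],
                },
            }
        ],
    }


if __name__ == "__main__":
    request = build_cluster_request("my-example-bucket")
    print(request["BootstrapActions"][0]["ScriptBootstrapAction"]["Path"])
```

A real request would also need an `Instances` section and IAM roles before it could be passed to `run_job_flow`; they are left out here to keep the focus on the bootstrap and step definitions.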

EMR clusters serve a variety of use cases, from ETL and batch processing to real-time applications that integrate Amazon Firehose or Apache Spark, and they support a wide range of connectors and integration architectures. A cluster on EMR can be transient, launched for a one-time use case, or persistent, meaning it stays up and processes data continuously.
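In the EMR API, the difference between a transient and a persistent cluster comes down largely to one flag, `KeepJobFlowAliveWhenNoSteps`, in the `Instances` section of a `run_job_flow` request. The sketch below builds that section as a plain dictionary. The helper name `instances_config`, the instance type, and the node counts are assumptions chosen for illustration.

```python
def instances_config(transient, core_nodes=2, instance_type="m5.xlarge"):
    """Build an Instances section shaped like boto3's run_job_flow argument.

    transient=True  -> the cluster shuts down once its steps have finished.
    transient=False -> the cluster stays up and waits for more work.
    The instance type and node counts are hypothetical defaults.
    """
    return {
        "InstanceGroups": [
            {
                "Name": "master",
                "InstanceRole": "MASTER",
                "Market": "ON_DEMAND",
                "InstanceType": instance_type,
                "InstanceCount": 1,
            },
            {
                "Name": "core",
                "InstanceRole": "CORE",
                "Market": "ON_DEMAND",
                "InstanceType": instance_type,
                "InstanceCount": core_nodes,
            },
        ],
        # The flag that separates a one-time cluster from a long-running one.
        "KeepJobFlowAliveWhenNoSteps": not transient,
    }


if __name__ == "__main__":
    nightly_batch = instances_config(transient=True)
    always_on = instances_config(transient=False, core_nodes=4)
    print(nightly_batch["KeepJobFlowAliveWhenNoSteps"])
    print(always_on["KeepJobFlowAliveWhenNoSteps"])
```

A transient configuration suits a scheduled batch job that reads from and writes to S3, since nothing needs to survive on the cluster after it terminates.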

Spot Instances can be used to lower the cost of compute nodes for large-scale processing, making EMR a cost-efficient solution. The EMR architecture is elastic: you can increase the number of processing nodes as needed, and storage is virtually unlimited and resilient when you use Amazon S3.
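To make the cost argument concrete, the sketch below places the task nodes of a cluster on the Spot market and compares the hourly cost with an all On-Demand layout. The instance group fields follow the shape used by boto3's `run_job_flow`, but the helper names and the hourly prices are invented for the example, and the calculation ignores the per-instance EMR service charge.

```python
def build_instance_groups(task_nodes, use_spot=True, instance_type="m5.xlarge"):
    """Return master, core, and task groups; only task nodes may use Spot."""
    task_market = "SPOT" if use_spot else "ON_DEMAND"
    layout = [
        ("master", "MASTER", "ON_DEMAND", 1),
        ("core", "CORE", "ON_DEMAND", 2),
        ("task", "TASK", task_market, task_nodes),
    ]
    return [
        {
            "Name": name,
            "InstanceRole": role,
            "Market": market,
            "InstanceType": instance_type,
            "InstanceCount": count,
        }
        for name, role, market, count in layout
    ]


def hourly_cost(groups, price_per_hour):
    """Sum instance count times the price for each group's market."""
    return sum(g["InstanceCount"] * price_per_hour[g["Market"]] for g in groups)


if __name__ == "__main__":
    # Hypothetical prices in USD per instance-hour, not real AWS rates.
    prices = {"ON_DEMAND": 0.20, "SPOT": 0.06}
    with_spot = build_instance_groups(task_nodes=10, use_spot=True)
    on_demand_only = build_instance_groups(task_nodes=10, use_spot=False)
    print(f"with Spot task nodes: {hourly_cost(with_spot, prices):.2f} USD/hour")
    print(f"all On-Demand:        {hourly_cost(on_demand_only, prices):.2f} USD/hour")
```

Keeping the master and core groups On-Demand is a common design choice in this kind of layout, because core nodes hold HDFS data and losing them to a Spot interruption is more disruptive than losing task nodes. Scaling out is then a matter of raising `task_nodes`.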
