Spark comparison with Hadoop MapReduce

The Hadoop MapReduce framework is most commonly compared to Apache Spark, a newer technology that addresses a similar problem space. Some of their most important attributes are summarized in the table that follows:

|                   | Hadoop MapReduce                                              | Apache Spark                                      |
| ----------------- | ------------------------------------------------------------- | ------------------------------------------------- |
| Written in        | Java                                                           | Scala                                             |
| Programming model | MapReduce                                                      | Resilient distributed dataset (RDD)               |
| Client bindings   | Most high-level languages                                      | Java, Scala, Python                               |
| Ease of use       | Moderate, with high-level abstractions (Pig, Hive, and so on)  | Good                                              |
| Performance       | High throughput in batch mode                                  | High throughput in streaming and batch mode       |
| Uses              | Disk (I/O bound)                                               | Memory, degrading performance if disk is needed   |
| Typical node      | Medium                                                         | Medium-large                                      |

As we can see from the preceding comparison, there are pros and cons to both technologies. Spark arguably has better performance, especially for problems that use fewer nodes. On the other hand, Hadoop is a mature framework with excellent tooling built on top of it that covers almost every use case.
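
To make the difference in programming models more concrete, the following is a minimal sketch of a word count expressed against Spark's RDD API in Scala. The local master setting, the application name, and the sample input lines are illustrative assumptions chosen so the example is self-contained; a real job would read its input from HDFS or another distributed store.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    // Run locally on all available cores; an assumption for a self-contained example.
    val conf = new SparkConf().setAppName("WordCount").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Build an RDD from an in-memory collection; in practice this would come from HDFS.
    val lines = sc.parallelize(Seq("to be or not to be", "that is the question"))

    val counts = lines
      .flatMap(_.split("\\s+"))   // split each line into words
      .map(word => (word, 1))     // pair each word with an initial count of 1
      .reduceByKey(_ + _)         // sum the counts for each distinct word

    counts.collect().foreach { case (word, n) => println(s"$word: $n") }

    sc.stop()
  }
}
```

The equivalent Hadoop MapReduce job spreads the same logic across a Mapper class, a Reducer class, and driver configuration code, which is part of why the table above rates Spark higher on ease of use.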
