Summary

We explored the evolution of the Hadoop and MapReduce frameworks and discussed YARN, HDFS concepts, HDFS reads and writes, and their key features and challenges. We then discussed the evolution of Apache Spark, why it was created in the first place, and the value it brings to the challenges of big data analytics and processing.

Finally, we took a peek at the various components of Apache Spark, namely Spark Core, Spark SQL, Spark Streaming, Spark GraphX, and Spark ML, as well as PySpark and SparkR, which integrate Python and R code with Apache Spark.
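To make the component names above a little more concrete before we dive in, here is a minimal sketch, not taken from the book, of the two abstractions we will meet first: an RDD (Spark Core) and a DataFrame (Spark SQL). The object name, app name, and local master setting are illustrative assumptions for running on a single machine.

```scala
// Minimal, self-contained sketch assuming Spark runs locally;
// a real deployment would typically use a YARN or standalone cluster.
import org.apache.spark.sql.SparkSession

object SparkHello {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SparkHello")      // illustrative app name
      .master("local[*]")         // run locally, using all cores
      .getOrCreate()

    // Spark Core: a resilient distributed dataset (RDD) from a local collection.
    val rdd = spark.sparkContext.parallelize(Seq(1, 2, 3, 4, 5))
    println(s"RDD sum = ${rdd.sum()}")

    // Spark SQL: the same data as a DataFrame with a named column.
    import spark.implicits._
    val df = Seq(1, 2, 3, 4, 5).toDF("n")
    df.filter($"n" > 2).show()

    spark.stop()
  }
}
```

We will build up to examples like this properly in the next chapter, starting from the Spark REPL.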

Now that we have surveyed the big data analytics space, the evolution of the Hadoop distributed computing platform, and the eventual development of Apache Spark, along with a high-level overview of how Spark might address some of these challenges, we are ready to start learning Spark and how to use it in our own use cases.

In the next chapter, Chapter 6, Start Working with Spark - REPL and RDDs, we will delve deeper into Apache Spark and look under the hood to see how it all works.
