Big Data With Hadoop

Hadoop has become the de facto standard in the world of big data, especially over the past three to four years. Hadoop started as a subproject of Apache Nutch in 2006 and introduced two key capabilities, a distributed filesystem and a distributed computing model known as MapReduce, that caught on rapidly in the open source community. Today, thousands of products have been developed by leveraging the core features of Hadoop, and it has evolved into a vast ecosystem of more than 150 major related products. Arguably, Hadoop was one of the primary catalysts of the big data and analytics industry.

In this chapter, we will discuss the background and core concepts of Hadoop and the components of the Hadoop platform, and delve deeper into the major products in the Hadoop ecosystem. We will learn the core concepts of distributed filesystems and distributed processing, as well as optimizations that improve the performance of Hadoop deployments. We'll conclude with real-world hands-on exercises using the Cloudera Distribution of Hadoop (CDH). The topics we will cover are:

  • The basics of Hadoop
    • The core components of Hadoop
    • Hadoop 1 and Hadoop 2
    • The Hadoop Distributed File System
    • Distributed computing principles with MapReduce
  • The Hadoop ecosystem
    • Overview of the Hadoop ecosystem
    • Hive, HBase, and more
  • Hadoop enterprise deployments
    • In-house deployments
    • Cloud deployments
  • Hands-on with Cloudera Hadoop
    • Using HDFS
    • Using Hive
    • MapReduce with WordCount
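Before we get to the hands-on exercises, the WordCount example mentioned above can be previewed as a local sketch in plain Python. It mimics the three MapReduce phases, map emits (word, 1) pairs, shuffle groups pairs by key, and reduce sums each group, without any Hadoop API; the function names here are illustrative, not part of Hadoop itself.

```python
# A minimal, local sketch of the MapReduce WordCount flow.
# map emits (word, 1) pairs; shuffle groups them by key; reduce sums each group.
from collections import defaultdict

def map_phase(lines):
    # Emit a (word, 1) pair for every word in the input.
    for line in lines:
        for word in line.lower().split():
            yield (word, 1)

def shuffle(pairs):
    # Group all values by key, as the Hadoop framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Sum the grouped counts to get the total per word.
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["the quick brown fox", "the lazy dog"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts["the"])  # → 2
```

In a real Hadoop job, the map and reduce functions run in parallel across the cluster and the shuffle is handled by the framework; the logic per phase, however, is exactly this simple.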