The fundamentals of Hadoop

In 2006, Doug Cutting, the creator of Hadoop, was working at Yahoo!. He was actively engaged in Nutch, an open source project to build a large-scale web crawler. A web crawler, at a high level, is software that automatically browses and indexes web pages on the internet. Intuitively, this requires efficient storage and computation across large volumes of data. In late January 2006, Doug formally announced the start of Hadoop. The first line of the request, still available at https://issues.apache.org/jira/browse/INFRA-700, reads: "The Lucene PMC has voted to split part of Nutch into a new subproject named Hadoop." And thus, Hadoop was born.

At the outset, Hadoop had two core components: the Hadoop Distributed File System (HDFS) and MapReduce. This first iteration is now known as Hadoop 1. Later, in 2012, a third component, YARN (Yet Another Resource Negotiator), was added; it decoupled resource management from job scheduling. Before we delve into the core components in more detail, it helps to understand the fundamental premises of Hadoop:

Doug Cutting's post at https://issues.apache.org/jira/browse/NUTCH-193 announced his intent to separate Nutch Distributed FS (NDFS) and MapReduce into a new subproject called Hadoop.
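To make the split between storage and computation concrete, here is a minimal sketch of the classic word-count job written against Hadoop's MapReduce API: HDFS holds the input and output files, the Mapper/Reducer pair expresses the computation, and on Hadoop 2 and later the job's tasks are scheduled by YARN. The class name and file paths are illustrative and not taken from the text.

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

      // Map phase: emit (word, 1) for every token in an input line.
      public static class TokenizerMapper
          extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
          StringTokenizer itr = new StringTokenizer(value.toString());
          while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, ONE);
          }
        }
      }

      // Reduce phase: sum the counts emitted for each word.
      public static class IntSumReducer
          extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
          int sum = 0;
          for (IntWritable val : values) {
            sum += val.get();
          }
          result.set(sum);
          context.write(key, result);
        }
      }

      public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // Input and output directories live on HDFS, passed as arguments,
        // for example /user/alice/books and /user/alice/wordcounts (hypothetical paths).
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }

A job like this is typically packaged into a JAR and submitted to the cluster with the hadoop jar command (for example, hadoop jar wordcount.jar WordCount /input /output, with a hypothetical JAR name), after which the framework handles splitting the input, scheduling the tasks, and writing the results back to HDFS.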
