HDFS High Availability

HDFS is a master-slave cluster: the NameNode is the master, and it manages hundreds, if not thousands, of DataNodes as slaves. This makes the NameNode a Single Point of Failure (SPOF): if it goes down for any reason, the entire cluster becomes unusable. HDFS 1.0 supports an additional master node, known as the Secondary NameNode, to help with recovery of the cluster. It does this by maintaining a periodically checkpointed copy of the filesystem metadata, but it is by no means a highly available system, because recovery requires manual intervention and maintenance work. HDFS 2.0 takes this to the next level by adding support for full High Availability (HA).

HA works by running two NameNodes in active-passive mode: one NameNode is active and the other is passive. When the active NameNode fails, the passive NameNode takes over the role of the master node.
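
The active-passive idea can be illustrated with a small toy model. This is only a sketch and not Hadoop code: the NameNode and HAPair classes, the node names nn1 and nn2, and the failover_if_needed method are all hypothetical names invented for illustration, and the real failover mechanism involves considerably more than a health flag.

```python
from dataclasses import dataclass


@dataclass
class NameNode:
    """Hypothetical stand-in for a NameNode; not a Hadoop class."""
    name: str
    state: str = "standby"  # either "active" or "standby"
    healthy: bool = True


class HAPair:
    """Toy model of an active-passive NameNode pair."""

    def __init__(self, first: NameNode, second: NameNode) -> None:
        # Exactly one node starts as active; the other waits as passive.
        first.state = "active"
        second.state = "standby"
        self.nodes = [first, second]

    def active(self) -> NameNode:
        """Return the node currently acting as the master."""
        return next(n for n in self.nodes if n.state == "active")

    def failover_if_needed(self) -> bool:
        """Promote the passive node if the active one is unhealthy.

        Returns True if a failover happened, False otherwise.
        """
        current = self.active()
        if current.healthy:
            return False
        standby = next(n for n in self.nodes if n is not current)
        if not standby.healthy:
            raise RuntimeError("no healthy NameNode available")
        current.state = "standby"  # demote the failed node
        standby.state = "active"   # the passive node takes over
        return True


pair = HAPair(NameNode("nn1"), NameNode("nn2"))
pair.nodes[0].healthy = False  # simulate a failure of the active node
pair.failover_if_needed()
print(pair.active().name)  # prints "nn2"
```

The model keeps the invariant that exactly one node is active at any time, which is the property an active-passive pair has to preserve across a failover.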

The following diagram shows how the active-passive pair of NameNodes is deployed:
