Firstly, we'll kill a DataNode. Recall that the DataNode process runs on each host in the HDFS cluster and is responsible for the management of blocks within the HDFS filesystem. Because Hadoop, by default, uses a replication factor of 3 for blocks, we should expect a single DataNode failure to have no direct impact on availability; rather, it will result in some blocks temporarily falling below the replication threshold. Execute the following steps to kill a DataNode process:
Firstly, check the original status of the cluster and confirm that everything is healthy. We'll use the dfsadmin command for this:

$ hadoop dfsadmin -report
Configured Capacity: 81376493568 (75.79 GB)
Present Capacity: 61117323920 (56.92 GB)
DFS Remaining: 59576766464 (55.49 GB)
DFS Used: 1540557456 (1.43 GB)
DFS Used%: 2.52%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
-------------------------------------------------
Datanodes available: 4 (4 total, 0 dead)

Name: 10.0.0.102:50010
Decommission Status : Normal
Configured Capacity: 20344123392 (18.95 GB)
DFS Used: 403606906 (384.91 MB)
Non DFS Used: 5063119494 (4.72 GB)
DFS Remaining: 14877396992 (13.86 GB)
DFS Used%: 1.98%
DFS Remaining%: 73.13%
Last contact: Sun Dec 04 15:16:27 PST 2011
…
Now log onto one of the nodes and use the jps command to determine the process ID of the DataNode process:
$ jps
2085 TaskTracker
2109 Jps
1928 DataNode
Use this process ID to kill the DataNode process:

$ kill -9 1928
Check that the DataNode process is no longer running on the host:

$ jps
2085 TaskTracker
Check the status of the cluster again by using the dfsadmin command:

$ hadoop dfsadmin -report
Configured Capacity: 81376493568 (75.79 GB)
Present Capacity: 61117323920 (56.92 GB)
DFS Remaining: 59576766464 (55.49 GB)
DFS Used: 1540557456 (1.43 GB)
DFS Used%: 2.52%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
-------------------------------------------------
Datanodes available: 4 (4 total, 0 dead)
…
After the NameNode process has timed out the dead DataNode, run the report again; the dead node's capacity is now excluded and some blocks are under-replicated:

$ hadoop dfsadmin -report
Configured Capacity: 61032370176 (56.84 GB)
Present Capacity: 46030327050 (42.87 GB)
DFS Remaining: 44520288256 (41.46 GB)
DFS Used: 1510038794 (1.41 GB)
DFS Used%: 3.28%
Under replicated blocks: 12
Blocks with corrupt replicas: 0
Missing blocks: 0
-------------------------------------------------
Datanodes available: 3 (4 total, 1 dead)
…
Repeat the report command until the under-replicated block count returns to 0:

$ hadoop dfsadmin -report
…
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
-------------------------------------------------
Datanodes available: 3 (4 total, 1 dead)
…
The high-level story is pretty straightforward; Hadoop recognized the loss of a node and worked around the problem. However, quite a lot is going on to make that happen.
When we killed the DataNode process, the process on that host was no longer available to serve or receive data blocks as part of the read/write operations. However, we were not actually accessing the filesystem at the time, so how did the NameNode process know this particular DataNode was dead?
The answer lies in the constant communication between the NameNode and DataNode processes that we have alluded to once or twice but never really explained. This occurs through a constant series of heartbeat messages from the DataNode reporting on its current state and the blocks it holds. In return, the NameNode gives instructions to the DataNode, such as notification of the creation of a new file or an instruction to retrieve a block from another node.
It all begins when the NameNode process starts up and begins receiving status messages from the DataNodes. Recall that each DataNode knows the location of its NameNode and will continuously send status reports. These messages list the blocks held by each DataNode, and from this the NameNode is able to construct a complete mapping that relates files and directories to the blocks of which they are composed and the nodes on which those blocks are stored.
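The bookkeeping described above can be sketched in a few lines. This is a toy model, not the NameNode's real data structures: the node names, block IDs, and filename are invented, and a real block report carries far more detail.

```python
# Toy model of how the NameNode could assemble its block map from
# DataNode block reports (hypothetical names; illustration only).

# Each DataNode reports the block IDs it currently holds.
block_reports = {
    "dn1": {"blk_1", "blk_2"},
    "dn2": {"blk_1", "blk_3"},
    "dn3": {"blk_2", "blk_3"},
}

# The NameNode's own metadata records which blocks make up each file.
file_to_blocks = {"/data/file.txt": ["blk_1", "blk_2", "blk_3"]}

# Invert the reports: for every block, which nodes hold a replica?
block_locations = {}
for node, blocks in block_reports.items():
    for blk in blocks:
        block_locations.setdefault(blk, set()).add(node)

# Combining both maps relates a file to the nodes storing its data.
file_map = {f: {b: block_locations[b] for b in blks}
            for f, blks in file_to_blocks.items()}
print(file_map["/data/file.txt"]["blk_1"])
```

The inversion step is the key idea: the NameNode never scans disks itself; it only ever learns block locations from what the DataNodes report.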
The NameNode process monitors the last time it received a heartbeat from each DataNode and after a threshold is reached, it assumes the DataNode is no longer functional and marks it as dead.
The exact threshold after which a DataNode is assumed to be dead is not configurable as a single HDFS property. Instead, it is calculated from several other properties, such as those defining the heartbeat interval. As we'll see later, things are a little easier in the MapReduce world, as the timeout for TaskTrackers is controlled by a single configuration property.
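As a rough guide, the commonly cited formula for this derived timeout combines the NameNode's recheck interval with the DataNode heartbeat interval. The property names and default values in the comments below are the usual Hadoop ones, but verify them against your version's documentation before relying on the numbers:

```python
# Sketch of the commonly cited dead-node timeout calculation
# (assumed defaults; check hdfs-default.xml for your release).

heartbeat_recheck_interval_ms = 5 * 60 * 1000  # heartbeat recheck interval: 5 minutes
heartbeat_interval_s = 3                        # heartbeat interval: 3 seconds

timeout_s = (2 * heartbeat_recheck_interval_ms / 1000
             + 10 * heartbeat_interval_s)
print(timeout_s)  # 630.0 seconds, i.e. 10.5 minutes with these defaults
```

This is why, in the walkthrough above, the first report after the kill still showed four live nodes: the NameNode waits out this window before declaring the node dead.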
Once a DataNode is marked as dead, the NameNode process determines the blocks which were held on that node and have now fallen below their replication target. In the default case, each block held on the killed node would have been one of the three replicas, so each block for which the node held a replica will now have only two replicas across the cluster.
In the preceding example, we captured the state when 12 blocks were still under-replicated, that is, they did not have enough replicas across the cluster to meet the replication target. When the NameNode process determines the under-replicated blocks, it assigns other DataNodes to copy these blocks from the hosts where the existing replicas reside. In this case, we only had to re-replicate a very small number of blocks; in a live cluster, the failure of a node can result in a period of high network traffic as the affected blocks are brought back up to their replication factor.
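The detection step can be sketched as follows. This is a minimal illustration with invented names, not Hadoop code: drop the dead node from every block's replica set, then flag any block that has fallen below the target.

```python
# Minimal sketch of under-replication detection after a node failure
# (illustrative only; the names are invented, not Hadoop APIs).

REPLICATION_TARGET = 3

block_locations = {
    "blk_1": {"dn1", "dn2", "dn3"},
    "blk_2": {"dn1", "dn3", "dn4"},
    "blk_3": {"dn2", "dn3", "dn4"},
}

def mark_dead(node, locations):
    """Drop a dead node from every block's replica set and return the
    blocks that have fallen below the replication target."""
    for replicas in locations.values():
        replicas.discard(node)
    return [b for b, r in locations.items()
            if len(r) < REPLICATION_TARGET]

under = mark_dead("dn4", block_locations)
print(sorted(under))  # blk_2 and blk_3 each lost their dn4 replica
```

In the real cluster, each block in this list becomes a re-replication task: a node holding a surviving replica streams the block to a newly chosen node.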
Note that if a failed node returns to the cluster, we have the situation of blocks having more than the required number of replicas; in such a case the NameNode process will send instructions to remove the surplus replicas. The specific replica to be deleted is chosen randomly, so the result will be that the returned node will end up retaining some of its blocks and deleting the others.
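The trimming of surplus replicas can be sketched the same way. Again this is an invented illustration: the real NameNode's choice also takes block placement policy into account, not just random selection.

```python
# Sketch of trimming surplus replicas after a node rejoins
# (illustrative only; the real selection also considers rack placement).
import random

REPLICATION_TARGET = 3

def trim_surplus(replicas, target=REPLICATION_TARGET):
    """Return the replica set reduced to the target by removing
    randomly chosen surplus replicas."""
    replicas = set(replicas)
    while len(replicas) > target:
        replicas.discard(random.choice(sorted(replicas)))
    return replicas

# A rejoined node briefly gives this block four replicas.
kept = trim_surplus({"dn1", "dn2", "dn3", "dn4"})
print(len(kept))  # 3
```

Because the deletions are spread randomly across the surplus replicas, the rejoined node typically keeps some of its old blocks and is told to delete the rest, matching the behaviour described above.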
We configured the NameNode process to log all its activities. Have a look through these very verbose logs and attempt to identify the replication requests being sent.
The final output shows the status after the under-replicated blocks have been copied to the live nodes. The cluster is down to only three live nodes but there are no under-replicated blocks.