Let's take a look at how the default rack configuration is set up in our cluster.
$ hadoop fsck / -racks
Default replication factor:	3
Average block replication:	3.3045976
Corrupt blocks:		0
Missing replicas:		18 (0.5217391 %)
Number of data-nodes:		4
Number of racks:		1
The filesystem under path '/' is HEALTHY
Both the tool used and its output are of interest here. The tool is hadoop fsck, which can be used to examine and fix filesystem problems. As can be seen, its output includes some information not dissimilar to that of our old friend hadoop dfsadmin, though that tool focuses on the detailed state of each node, while hadoop fsck reports on the internals of the filesystem as a whole.
One of the things it reports is the total number of racks in the cluster, which, as seen in the preceding output, has the value 1, as expected.
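Because the fsck summary is plain text, scripting against it comes down to simple line parsing. As a rough illustration only (the field labels are taken from the sample output above and may vary between Hadoop versions; the function name is made up for this sketch), the summary could be pulled apart like this:

```python
# Sketch: parse the plain-text summary printed by `hadoop fsck`.
# Field labels match the sample output above; real output may differ
# between Hadoop versions, so treat this as illustrative only.

def parse_fsck_summary(text):
    """Return a dict mapping each summary label to its numeric value."""
    fields = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, _, value = line.partition(":")
        tokens = value.split()
        if not tokens:
            continue
        # Keep only the first token, e.g. "18 (0.5217391 %)" -> "18"
        token = tokens[0]
        try:
            fields[key.strip()] = float(token) if "." in token else int(token)
        except ValueError:
            pass  # non-numeric lines such as the HEALTHY verdict
    return fields

sample = """\
Default replication factor: 3
Average block replication: 3.3045976
Corrupt blocks: 0
Missing replicas: 18 (0.5217391 %)
Number of data-nodes: 4
Number of racks: 1
"""

summary = parse_fsck_summary(sample)
print(summary["Number of racks"])  # -> 1
```

In practice the text would come from running the command, for example via subprocess, rather than a hard-coded string.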
This command was executed on a cluster that had recently been used for some HDFS resilience testing. This explains the figures for average block replication and under-replicated blocks.
A block can also end up with more than the required number of replicas. If a host fails, HDFS creates new replicas of its blocks elsewhere to restore the replication factor; when that host comes back into service, its copies push those blocks above the required count. Just as Hadoop adds replicas to bring under-replicated blocks up to the replication factor, it also deletes excess replicas to bring over-replicated blocks back down to it.
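The bookkeeping described above can be sketched as a per-block decision function. This is a toy model of the behaviour, not Hadoop's actual NameNode code; the function name and return values are invented for illustration:

```python
# Toy model of HDFS replica reconciliation: under-replicated blocks
# gain copies, over-replicated blocks lose them. Names are illustrative,
# not Hadoop's internal API.

def reconcile_replicas(live_replicas, replication_factor):
    """Decide the action for one block given its live replica count."""
    if live_replicas < replication_factor:
        return ("replicate", replication_factor - live_replicas)
    if live_replicas > replication_factor:
        return ("delete", live_replicas - replication_factor)
    return ("ok", 0)

# A host holding one replica fails: 3 -> 2 live copies, so re-replicate.
print(reconcile_replicas(2, 3))  # -> ('replicate', 1)
# The host returns: the block now has 4 copies, so one excess is removed.
print(reconcile_replicas(4, 3))  # -> ('delete', 1)
```

This mirrors the sequence seen in the fsck output: the average block replication above 3 reflects blocks still carrying excess replicas that Hadoop will trim back to the replication factor.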