Time for action – examining the default rack configuration

Let's take a look at how the default rack configuration is set up in our cluster.

  1. Execute the following command:
    $ hadoop fsck / -racks
    
  2. The result should include output similar to the following:
    Default replication factor:    3
    Average block replication:     3.3045976
    Corrupt blocks:                0
    Missing replicas:              18 (0.5217391 %)
    Number of data-nodes:          4
    Number of racks:               1
    The filesystem under path '/' is HEALTHY
    

What just happened?

Both the tool used and its output are of interest here. The tool is hadoop fsck, which can be used to examine and fix filesystem problems. As can be seen, its output includes some information not dissimilar to that of our old friend hadoop dfsadmin, though that tool focuses on the detailed state of each node, while hadoop fsck reports on the internals of the filesystem as a whole.
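If you want more than the summary, fsck can also walk the filesystem tree and report on individual files and blocks. The following sketch shows one such invocation; the path /user/hadoop is just an example, and the exact output format varies between Hadoop versions:

    # List each file under the path, its blocks, and the datanodes
    # holding each replica, prefixed with each node's rack location
    $ hadoop fsck /user/hadoop -files -blocks -racks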

One of the things it reports is the total number of racks in the cluster, which, as seen in the preceding output, has the value 1, as expected.
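That single rack is the default: with no rack-mapping script configured, Hadoop assigns every datanode to a rack named /default-rack, so fsck reports a rack count of 1. For reference, such a script follows the contract sketched below; this is a hypothetical example that simply reproduces the default behavior:

    #!/bin/bash
    # Hypothetical topology script: Hadoop invokes it with datanode
    # addresses as arguments and expects one rack name per node on
    # standard output; mapping everything to /default-rack mimics
    # the out-of-the-box single-rack setup
    for node in "$@"; do
        echo "/default-rack"
    done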

Note

This command was executed on a cluster that had recently been used for some HDFS resilience testing. This explains the figures for average block replication and missing replicas.

If a host fails temporarily, Hadoop re-replicates its blocks elsewhere to restore the replication factor; when the host comes back into service, those blocks briefly have more replicas than required. Just as Hadoop adds replicas to under-replicated blocks, it also deletes excess replicas to bring over-replicated blocks back down to the replication factor.
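You can observe this rebalancing directly by changing a file's target replication factor and letting HDFS add or remove replicas to match. A minimal sketch, where the file path is hypothetical:

    # Raise the target replication for one file to 4; -w waits until
    # HDFS has actually created the additional replica
    $ hadoop fs -setrep -w 4 /user/hadoop/data/example.txt
    # Dropping it back to 3 causes the excess replica to be deleted
    $ hadoop fs -setrep -w 3 /user/hadoop/data/example.txt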
