The next step should be obvious; let's kill three DataNodes in quick succession.
The following are the steps to kill three DataNodes in quick succession:

1. Restart all the nodes:

   $ start-all.sh

2. Wait until hadoop dfsadmin -report shows four live nodes.

3. Put the test file onto HDFS:

   $ hadoop fs -put file1.data file1.new

4. Kill three DataNode processes as before.

5. Repeatedly run hadoop dfsadmin -report until you get output similar to the following that reports the missing blocks:

   …
   Under replicated blocks: 123
   Blocks with corrupt replicas: 0
   Missing blocks: 33
   -------------------------------------------------
   Datanodes available: 1 (4 total, 3 dead)
   …

6. Try to retrieve the test file:

   $ hadoop fs -get file1.new file1.new
   11/12/04 16:18:05 INFO hdfs.DFSClient: No node available for block: blk_1691554429626293399_1003 file=/user/hadoop/file1.new
   11/12/04 16:18:05 INFO hdfs.DFSClient: Could not obtain block blk_1691554429626293399_1003 from any node: java.io.IOException: No live nodes contain current block
   …
   get: Could not obtain block: blk_1691554429626293399_1003 file=/user/hadoop/file1.new

7. Restart the dead nodes by running the start-all.sh script:

   $ start-all.sh

8. Repeatedly check the block status as the cluster recovers:

   $ hadoop dfsadmin -report | grep -i blocks
   Under replicated blocks: 69
   Blocks with corrupt replicas: 0
   Missing blocks: 35
   $ hadoop dfsadmin -report | grep -i blocks
   Under replicated blocks: 0
   Blocks with corrupt replicas: 0
   Missing blocks: 30

9. Retrieve the test file again:

   $ hadoop fs -get file1.new file1.new

10. Confirm that it matches the original:

   $ md5sum file1.*
   f1f30b26b40f8302150bc2a494c1961d  file1.data
   f1f30b26b40f8302150bc2a494c1961d  file1.new
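The final md5sum comparison can also be scripted so that a mismatch is caught automatically rather than eyeballed. This is a minimal sketch, not part of the original walkthrough; it creates stand-in copies of file1.data and file1.new so it can run without a cluster:

```shell
# stand-in copies so the sketch runs anywhere; in the walkthrough these would
# be file1.data (the original) and file1.new (the copy retrieved from HDFS)
printf 'example data' > file1.data
cp file1.data file1.new

# compare checksums; report the result and exit non-zero on any mismatch
if [ "$(md5sum < file1.data)" = "$(md5sum < file1.new)" ]; then
    echo "files identical"
else
    echo "files differ" >&2
    exit 1
fi
```

A check like this is handy at the end of any restore or recovery procedure, where a silent partial copy is worse than an obvious failure.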
After restarting the killed nodes, we copied the test file onto HDFS again. This isn't strictly necessary, as we could have used the existing file, but because the replicas get shuffled around, a clean copy gives the most representative results.
We then killed three DataNodes as before and waited for HDFS to respond. Unlike the previous examples, killing this many nodes made it certain that some blocks would have all of their replicas on the killed nodes. As we can see, this is exactly the result; the remaining single-node cluster shows over a hundred under-replicated blocks (only one replica of each remains), but there are also 33 missing blocks.
Talking of blocks is a little abstract, so we then tried to retrieve our test file which, as we know, effectively has 33 holes in it. The attempt failed, as Hadoop could not find the missing blocks required to deliver the file.
We then restarted all the nodes and tried to retrieve the file again. This time it was successful, but we took the added precaution of performing an MD5 check on the file to confirm that it was bitwise identical to the original, which it was.
This is an important point: though node failure may result in data becoming unavailable, there may not be a permanent data loss if the node recovers.
Do not assume from this example that it's impossible to lose data in a Hadoop cluster. For general use it is very hard, but disaster often has a habit of striking in just the wrong way.
As seen in the previous example, a parallel failure of a number of nodes equal to or greater than the replication factor has a chance of resulting in missing blocks. In our example of three dead nodes in a cluster of four, the chances were high; in a cluster of 1,000 nodes they would be much lower, but still non-zero. As the cluster size increases, so does the overall failure rate, and having three node failures in a narrow window of time becomes more and more likely. Conversely, the impact of any one such event also decreases, but rapid multiple failures will always carry a risk of data loss.
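To put rough numbers on this, assume replication factor 3 and purely random block placement (real HDFS placement is rack-aware, so this is only an approximation). The chance that a given block loses every replica when f nodes fail simultaneously in an n-node cluster is C(f,3)/C(n,3), which a one-liner can evaluate:

```shell
# probability that every replica of a given block lands on a failed node:
# P = C(f,r) / C(n,r) for n nodes, r replicas, f simultaneous failures
awk 'function C(n,k, i,r){r=1;for(i=0;i<k;i++)r=r*(n-i)/(i+1);return r}
BEGIN{
    printf "4-node cluster, 3 failures:    %.4f\n", C(3,3)/C(4,3)
    printf "1000-node cluster, 3 failures: %.2e\n", C(3,3)/C(1000,3)
}'
```

For the four-node cluster this gives 0.25 per block, consistent with the scale of missing blocks seen in the example; for 1,000 nodes it is on the order of one in a hundred million per block, small but, multiplied across every block in a large cluster, not zero.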
Another, more insidious, problem is recurring or partial failures, for example, when power issues across the cluster cause nodes to crash and restart. Hadoop can end up chasing replication targets, constantly asking the recovering hosts to replicate under-replicated blocks, only to see them fail midway through the task. Such a sequence of events also raises the potential for data loss.
Finally, never forget the human factor. Having a replication factor equal to the size of the cluster—ensuring every block is on every node—won't help you when a user accidentally deletes a file or directory.
The summary is that data loss through system failure is pretty unlikely but is possible through almost inevitable human action. Replication is not a full alternative to backups; ensure that you understand the importance of the data you process and the impact of the types of loss discussed here.
The reports from each DataNode also included a count of corrupt blocks, which we have not yet discussed. When a block is first stored, a hidden file containing checksums for the block is written to the same HDFS directory. By default, there is a checksum for each 512-byte chunk within the block.
Whenever any client reads a block, it will also retrieve the list of checksums and compare these to the checksums it generates on the block data it has read. If there is a checksum mismatch, the block on that particular DataNode will be marked as corrupt and the client will retrieve a different replica. On learning of the corrupt block, the NameNode will schedule a new replica to be made from one of the existing uncorrupted replicas.
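The verification step can be mimicked locally. This sketch is only an illustration: it uses md5sum as a stand-in for the checksums HDFS actually stores, and all the file names (blk_demo, chunk_, blk_demo.meta) are made up for the demo. It writes a dummy 2 KB "block", checksums it in 512-byte chunks, and shows a corrupted chunk being caught on re-verification:

```shell
# build a dummy 2 KB "block" and checksum it in 512-byte chunks,
# mirroring HDFS's default chunk size (illustration only)
dd if=/dev/zero of=blk_demo bs=512 count=4 2>/dev/null
split -b 512 blk_demo chunk_
md5sum chunk_* > blk_demo.meta        # stand-in for the hidden checksum file

# a "reader" re-verifies every chunk before trusting the data
md5sum -c --quiet blk_demo.meta && echo "block verified"

# flip one byte in the third chunk; re-verification now flags the mismatch
printf 'X' | dd of=chunk_ac bs=1 seek=100 conv=notrunc 2>/dev/null
md5sum -c --quiet blk_demo.meta || echo "corruption detected"
```

In HDFS the equivalent of that final failure is what triggers the client to fall back to another replica and the NameNode to schedule re-replication from a good copy.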
If this scenario seems unlikely, consider that faulty memory, a failing disk drive, a flaky storage controller, or any number of other issues on an individual host could corrupt a block as it is initially written, while it is stored, or when it is read. These are rare events, and the chance of the same corruption occurring on every DataNode holding a replica of the same block is exceptionally remote. However, remember, as previously mentioned, that replication is not a full alternative to backups; if you need 100 percent data availability, you likely need to think about off-cluster backup.