Let's now configure our NameNode to simultaneously write multiple copies of fsimage
to give us our desired data resilience. To do this, we require an NFS-exported directory.
$ stop-all.sh
Add the following property to Hadoop/conf/core-site.xml, modifying the second path to point to an NFS-mounted location to which the additional copy of the NameNode data can be written:

<property>
  <name>dfs.name.dir</name>
  <value>${hadoop.tmp.dir}/dfs/name,/share/backup/namenode</value>
</property>
$ mkdir -p /share/backup/namenode
$ rm -rf /share/backup/namenode/*
$ start-all.sh
Verify that fsimage is being written to both the specified locations by running the md5sum command against the two files specified before (change the following paths depending on your configured locations):

$ md5sum /var/hadoop/dfs/name/image/fsimage
a25432981b0ecd6b70da647e9b94304a  /var/hadoop/dfs/name/image/fsimage
$ md5sum /share/backup/namenode/image/fsimage
a25432981b0ecd6b70da647e9b94304a  /share/backup/namenode/image/fsimage
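This manual comparison can also be scripted. The following is a minimal sketch; the function name is our own, and the example paths in the comment are assumptions that should be replaced with your configured dfs.name.dir locations:

```shell
#!/bin/sh
# compare_fsimage: succeed only if two fsimage copies have identical
# MD5 checksums.
compare_fsimage() {
    sum1=$(md5sum "$1" | awk '{print $1}')
    sum2=$(md5sum "$2" | awk '{print $1}')
    [ -n "$sum1" ] && [ "$sum1" = "$sum2" ]
}

# Example invocation (paths are assumptions; use your own locations):
# compare_fsimage /var/hadoop/dfs/name/image/fsimage \
#                 /share/backup/namenode/image/fsimage
```

Running a check like this periodically and alerting on a non-zero exit code gives early warning if one of the copies stops being written.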
Firstly, we ensured the cluster was stopped; though changes to the core configuration files are not reread by a running cluster, it's a good habit to get into in case that capability is ever added to Hadoop.
We then added a new property to our cluster configuration, specifying a value for the dfs.name.dir
property. This property takes a list of comma-separated values and writes fsimage
to each of these locations. Note how the hadoop.tmp.dir
property discussed earlier is de-referenced, as would be seen when using Unix variables. This syntax allows us to base property values on others and inherit changes when the parent properties are updated.
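As an illustration of this dereferencing (the value /var/hadoop is an assumption, not a required setting), if hadoop.tmp.dir were defined as follows, the ${hadoop.tmp.dir}/dfs/name entry shown earlier would resolve to /var/hadoop/dfs/name:

```xml
<!-- Assumed value, for illustration only -->
<property>
  <name>hadoop.tmp.dir</name>
  <value>/var/hadoop</value>
</property>
<!-- ${hadoop.tmp.dir}/dfs/name then resolves to /var/hadoop/dfs/name -->
```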
Before starting the cluster, we ensure the new directory exists and is empty. If the directory doesn't exist, the NameNode will fail to start, as you would expect. If, however, the directory was previously used to store NameNode data, Hadoop will also fail to start, as it will detect that the two directories contain different NameNode data and will not know which is correct.
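These two preconditions can be checked with a small guard script before starting the cluster. This is a minimal sketch under our own assumptions; the function name is ours, and the path in the example would be your additional dfs.name.dir entry:

```shell
#!/bin/sh
# check_namenode_dir: confirm a NameNode data directory exists and is
# empty -- the two conditions required before first use.
check_namenode_dir() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        echo "ERROR: $dir does not exist" >&2
        return 1
    fi
    if [ -n "$(ls -A "$dir")" ]; then
        echo "ERROR: $dir is not empty" >&2
        return 1
    fi
}

# Example (path is an assumption):
# check_namenode_dir /share/backup/namenode && start-all.sh
```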
Be careful here! Especially if you are experimenting with various NameNode data locations or swapping back and forth between nodes; you really do not want to accidentally delete the contents from the wrong directory.
After starting the HDFS cluster, we wait for a moment and then use MD5 cryptographic checksums to verify that both locations contain an identical fsimage.
The recommendation is to write fsimage to at least two locations, one of which should be on a remote (such as an NFS-mounted) filesystem, as in the previous example. fsimage is only updated periodically, so the filesystem does not need high performance.
In our earlier discussion regarding the choice of hardware, we alluded to other considerations for the NameNode host. Because of the criticality of fsimage, it may be useful to ensure it is written to more than one disk, to invest in disks with higher reliability, or even to write fsimage to a RAID array. If the host fails, using the copy written to the remote filesystem will be the easiest option; but just in case that copy has also experienced problems, it's good to have the option of pulling another disk from the dead host and using it on another machine to recover the data.
We have ensured that fsimage is written to multiple locations; this is the single most important prerequisite for managing a swap to a different NameNode host. Now we need to actually do it.
This is something you really should not do on a production cluster, and certainly not when trying it for the first time; even beyond that, it is not a risk-free process. Do practice on other clusters to get a feel for what you'll do when disaster strikes.
You don't want to be exploring this topic for the first time when you need to recover the production cluster. There are several things to do in advance that will make disaster recovery much less painful, not to mention possible:

- Write fsimage to multiple locations, as done before.
- Take copies of the core-site.xml and hdfs-site.xml files, place them (ideally) on an NFS location, and update them to point to the new host. Any time you modify the current configuration files, remember to make the same changes to these copies.
- Copy the slaves file from the NameNode onto either the new host or the NFS share. Also, make sure you keep it updated.

Ready? Let's do it!