A distributed processing framework wouldn't be complete without distributed storage. One of them is HDFS. Even if Spark is run on local mode, it can still use a distributed file system at the backend. Like Spark breaks computations into subtasks, HDFS breaks a file into blocks and stores them across a set of machines. For HA, HDFS stores multiple copies of each block, the number of copies is called replication level, three by default (refer to Figure 3-5).
NameNode is managing the HDFS storage by remembering the block locations and other metadata such as owner, file permissions, and block size, which are file-specific. Secondary Namenode is a slight misnomer: its function is to merge the metadata modifications, edits, into fsimage, or a file that serves as a metadata database. The merge is required, as it is more practical to write modifications of fsimage to a separate file instead of applying each modification to the disk image of the fsimage directly (in addition to applying the corresponding changes in memory). Secondary Namenode cannot serve as a second copy of the Namenode. A Balancer is run to move the blocks to maintain approximately equal disk usage across the servers—the initial block assignment to the nodes is supposed to be random, if enough space is available and the client is not run within the cluster. Finally, the Client communicates with the Namenode to get the metadata and block locations, but after that, either reads or writes the data directly to the node, where a copy of the block resides. The client is the only component that can be run outside the HDFS cluster, but it needs network connectivity with all the nodes in the cluster.
If any of the node dies or disconnects from the network, the Namenode notices the change, as it constantly maintains the contact with the nodes via heartbeats. If the node does not reconnect to the Namenode within 10 minutes (by default), the Namenode will start replicating the blocks in order to achieve the required replication level for the blocks that were lost on the node. A separate block scanner thread in the Namenode will scan the blocks for possible bit rot—each block maintains a checksum—and will delete corrupted and orphaned blocks:
$ wget ftp://apache.cs.utah.edu/apache.org/hadoop/common/h/hadoop-2.6.4.tar.gz --2016-05-12 00:10:55-- ftp://apache.cs.utah.edu/apache.org/hadoop/common/hadoop-2.6.4/hadoop-2.6.4.tar.gz => 'hadoop-2.6.4.tar.gz.1' Resolving apache.cs.utah.edu... 155.98.64.87 Connecting to apache.cs.utah.edu|155.98.64.87|:21... connected. Logging in as anonymous ... Logged in! ==> SYST ... done. ==> PWD ... done. ==> TYPE I ... done. ==> CWD (1) /apache.org/hadoop/common/hadoop-2.6.4 ... done. ==> SIZE hadoop-2.6.4.tar.gz ... 196015975 ==> PASV ... done. ==> RETR hadoop-2.6.4.tar.gz ... done. ... $ wget ftp://apache.cs.utah.edu/apache.org/hadoop/common/hadoop-2.6.4/hadoop-2.6.4.tar.gz.mds --2016-05-12 00:13:58-- ftp://apache.cs.utah.edu/apache.org/hadoop/common/hadoop-2.6.4/hadoop-2.6.4.tar.gz.mds => 'hadoop-2.6.4.tar.gz.mds' Resolving apache.cs.utah.edu... 155.98.64.87 Connecting to apache.cs.utah.edu|155.98.64.87|:21... connected. Logging in as anonymous ... Logged in! ==> SYST ... done. ==> PWD ... done. ==> TYPE I ... done. ==> CWD (1) /apache.org/hadoop/common/hadoop-2.6.4 ... done. ==> SIZE hadoop-2.6.4.tar.gz.mds ... 958 ==> PASV ... done. ==> RETR hadoop-2.6.4.tar.gz.mds ... done. ... $ shasum -a 512 hadoop-2.6.4.tar.gz 493cc1a3e8ed0f7edee506d99bfabbe2aa71a4776e4bff5b852c6279b4c828a0505d4ee5b63a0de0dcfecf70b4bb0ef801c767a068eaeac938b8c58d8f21beec hadoop-2.6.4.tar.gz $ cat !$.mds hadoop-2.6.4.tar.gz: MD5 = 37 01 9F 13 D7 DC D8 19 72 7B E1 58 44 0B 94 42 hadoop-2.6.4.tar.gz: SHA1 = 1E02 FAAC 94F3 35DF A826 73AC BA3E 7498 751A 3174 hadoop-2.6.4.tar.gz: RMD160 = 2AA5 63AF 7E40 5DCD 9D6C D00E EBB0 750B D401 2B1F hadoop-2.6.4.tar.gz: SHA224 = F4FDFF12 5C8E754B DAF5BCFC 6735FCD2 C6064D58 36CB9D80 2C12FC4D hadoop-2.6.4.tar.gz: SHA256 = C58F08D2 E0B13035 F86F8B0B 8B65765A B9F47913 81F74D02 C48F8D9C EF5E7D8E hadoop-2.6.4.tar.gz: SHA384 = 87539A46 B696C98E 5C7E352E 997B0AF8 0602D239 5591BF07 F3926E78 2D2EF790 BCBB6B3C EAF5B3CF ADA7B6D1 35D4B952 hadoop-2.6.4.tar.gz: SHA512 = 493CC1A3 E8ED0F7E DEE506D9 9BFABBE2 AA71A477 6E4BFF5B 852C6279 B4C828A0 505D4EE5 B63A0DE0 DCFECF70 B4BB0EF8 01C767A0 68EAEAC9 38B8C58D 8F21BEEC $ tar xf hadoop-2.6.4.tar.gz $ cd hadoop-2.6.4
core-site.xml
and hdfs-site.xml
files, as follows:$ cat << EOF > etc/hadoop/core-site.xml <configuration> <property> <name>fs.defaultFS</name> <value>hdfs://localhost:8020</value> </property> </configuration> EOF $ cat << EOF > etc/hadoop/hdfs-site.xml <configuration> <property> <name>dfs.replication</name> <value>1</value> </property> </configuration> EOF
This will put the Hadoop HDFS metadata and data directories under the /tmp/hadoop-$USER
directories. To make this more permanent, we can add the dfs.namenode.name.dir
, dfs.namenode.edits.dir
, and dfs.datanode.data.dir
parameters, but we will leave these out for now. For a more customized distribution, one can download a Cloudera version from http://archive.cloudera.com/cdh.
$ bin/hdfs namenode -format 16/05/12 00:55:40 INFO namenode.NameNode: STARTUP_MSG: /************************************************************ STARTUP_MSG: Starting NameNode STARTUP_MSG: host = alexanders-macbook-pro.local/192.168.1.68 STARTUP_MSG: args = [-format] STARTUP_MSG: version = 2.6.4 STARTUP_MSG: classpath = ...
namenode
, secondarynamenode
, and datanode
Java processes (I usually open three different command-line windows to see the logs, but in a production environment, these are usually daemonized):$ bin/hdfs namenode & ... $ bin/hdfs secondarynamenode & ... $ bin/hdfs datanode & ...
$ date | bin/hdfs dfs –put – date.txt ... $ bin/hdfs dfs –ls Found 1 items -rw-r--r-- 1 akozlov supergroup 29 2016-05-12 01:02 date.txt $ bin/hdfs dfs -text date.txt Thu May 12 01:02:36 PDT 2016
datanode
on (localhost). In my case, it is the following:$ cat /tmp/hadoop-akozlov/dfs/data/current/BP-1133284427-192.168.1.68-1463039756191/current/finalized/subdir0/subdir0/blk_1073741827 Thu May 12 01:02:36 PDT 2016
http://localhost:50070
and displays a host of information, including the HDFS usage and the list of DataNodes, the slaves of the HDFS Master node as follows:The preceding figure shows HDFS Namenode HTTP UI in a single node deployment (usually, http://<namenode-address>:50070
). The Utilities | Browse the file system tab allows you to browse and download the files from HDFS. Nodes can be added by starting DataNodes on a different node and pointing to the Namenode with the fs.defaultFS=<namenode-address>:8020
parameter. The Secondary Namenode HTTP UI is usually at http:<secondarynamenode-address>:50090
.
Scala/Spark by default will use the local file system. However, if the core-site/xml
file is on the classpath or placed in the $SPARK_HOME/conf
directory, Spark will use HDFS as the default.
18.222.120.93