Commands that perform filesystem-related tasks begin with hdfs dfs. The filesystem commands have been designed to behave similarly to the corresponding Unix/Linux filesystem commands.
What is a URI? URI stands for Uniform Resource Identifier. The commands listed below use URIs for file locations. The URI syntax to access a file in HDFS is hdfs://namenodehost/parent/child/<file>.
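To make the parts of that URI concrete, here is a minimal sketch that splits an HDFS URI into its scheme, namenode host, and path using plain shell parameter expansion. The host name and file name are hypothetical placeholders, not values from a real cluster.

```shell
#!/bin/sh
# Split an HDFS URI of the form hdfs://namenodehost/parent/child/<file>.
# The URI below is a hypothetical placeholder used only for illustration.
uri="hdfs://namenodehost/parent/child/sample.txt"

scheme="${uri%%://*}"      # everything before "://"
rest="${uri#*://}"         # everything after "://"
namenode="${rest%%/*}"     # up to the first "/" (the namenode host)
path="/${rest#*/}"         # from the first "/" onwards (the file path)

echo "scheme:   $scheme"
echo "namenode: $namenode"
echo "path:     $path"
```

The path part is what the HDFS commands operate on; the scheme and namenode tell the client which filesystem to contact.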
The following are some of the most commonly used HDFS commands:
ls: This command lists files in HDFS. The syntax of the ls command is hdfs dfs -ls <args>. The following is the screenshot showing an example of the ls command:
cat: This command displays the contents of one or more files in the terminal. The syntax of the cat command is hdfs dfs -cat URI [URI …]. The following is a sample output of the cat command:
copyFromLocal: This command copies one or more files from the local filesystem to HDFS. The syntax of the copyFromLocal command is hdfs dfs -copyFromLocal <localsrc> URI. The following is the screenshot showing an example of the copyFromLocal command:
copyToLocal: This command copies one or more files from HDFS to the local filesystem. The syntax of the copyToLocal command is hdfs dfs -copyToLocal URI <localdst>. The following is the screenshot showing an example of the copyToLocal command:
cp: This command copies files within HDFS. The syntax of the cp command is hdfs dfs -cp URI [URI …] <dest>. The following is the screenshot showing an example of the cp command:
mkdir: This command creates a directory in HDFS. The syntax of the mkdir command is hdfs dfs -mkdir <paths>. The following is the screenshot showing an example of the mkdir command:
mv: This command moves files within HDFS. The syntax of the mv command is hdfs dfs -mv URI [URI …] <dest>. The following is the screenshot showing an example of the mv command:
rm: This command deletes files from HDFS. The syntax of the rm command is hdfs dfs -rm URI [URI …]. The following is the screenshot showing an example of the rm command:
rm -r: This command deletes a directory and its contents from HDFS. The syntax of the rm -r command is hdfs dfs -rm -r URI [URI …]. The following is the screenshot showing an example of the rm -r command:
setrep: This command sets the replication factor for a file in HDFS. The syntax of the setrep command is hdfs dfs -setrep [-R] <rep> <path>, where <rep> is the desired replication factor. The following is the screenshot showing an example of the setrep command:
tail: This command displays the last kilobyte of a file in HDFS. The syntax of the tail command is hdfs dfs -tail [-f] URI. The following is the screenshot showing an example of the tail command:
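The filesystem commands listed above can be strung together into a short session. The following is a sketch only: the directory /tmp/demo and the file notes.txt are hypothetical names, and the script assumes that /tmp already exists in HDFS. Because the commands need a configured Hadoop client, the script prints them by default instead of running them.

```shell
#!/bin/sh
# Walk through the filesystem commands in a typical order.
# DRY_RUN defaults to 1, so each command is only printed. Set DRY_RUN=0 on
# a machine with a configured Hadoop client to execute the commands.
# /tmp/demo and notes.txt are hypothetical names chosen for this sketch;
# with DRY_RUN=0, notes.txt must exist in the current local directory.
DRY_RUN="${DRY_RUN:-1}"
DEMO_DIR="/tmp/demo"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"      # show the command without running it
    else
        "$@"             # run the command exactly as given
    fi
}

run hdfs dfs -mkdir "$DEMO_DIR"                                  # create a directory
run hdfs dfs -copyFromLocal notes.txt "$DEMO_DIR"                # upload a local file
run hdfs dfs -ls "$DEMO_DIR"                                     # list the directory
run hdfs dfs -cat "$DEMO_DIR/notes.txt"                          # print the file
run hdfs dfs -cp "$DEMO_DIR/notes.txt" "$DEMO_DIR/copy.txt"      # copy within HDFS
run hdfs dfs -mv "$DEMO_DIR/copy.txt" "$DEMO_DIR/old.txt"        # move within HDFS
run hdfs dfs -setrep 2 "$DEMO_DIR/notes.txt"                     # set replication to 2
run hdfs dfs -tail "$DEMO_DIR/notes.txt"                         # show the last kilobyte
run hdfs dfs -copyToLocal "$DEMO_DIR/notes.txt" ./notes-copy.txt # download the file
run hdfs dfs -rm "$DEMO_DIR/old.txt"                             # delete one file
run hdfs dfs -rm -r "$DEMO_DIR"                                  # delete the directory
```

Keeping the dry run as the default is a deliberate choice here: rm -r removes a whole directory tree, so it is worth reading the printed commands before running them against a real cluster.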
Hadoop provides several commands to administer HDFS. The following are two of the commonly used administration commands in HDFS:
balancer: New datanodes can be added to a cluster to provide more storage space. However, a newly added datanode does not hold any files, so the data blocks are no longer evenly spread across the datanodes and the cluster is in a state of imbalance. The administrator can use the balancer command to balance the cluster. The syntax of the balancer command is hdfs balancer -threshold <threshold>. Here, threshold is the balancing threshold expressed as a percentage. It is specified as a float value that ranges from 0 to 100, and the default value is 10. The balancer tries to distribute blocks to the underutilized datanodes. For example, if the average utilization of all the datanodes in the cluster is 50 percent, the balancer, by default, will try to pick up blocks from nodes that have a utilization of above 60 percent (50 percent + 10 percent) and move them to nodes that have a utilization of below 40 percent (50 percent - 10 percent).
dfsadmin: The dfsadmin command is used to run administrative commands on HDFS. The syntax of the dfsadmin command is hdfs dfsadmin <options>. Let's understand a few of the important command options and the actions they perform:
[-report]: This generates a report of the basic filesystem information and statistics.
[-safemode <enter | leave | get | wait>]: This controls safe mode, a namenode state in which the namenode does not accept changes to the namespace (it is read-only) and does not replicate or delete blocks.
[-saveNamespace]: This saves the current state of the namespace to a storage directory and resets the edits log.
[-rollEdits]: This forces a rollover of the edits log, that is, it saves the state of the current edits log and creates a fresh edits log for new transactions.
[-restoreFailedStorage true|false|check]: This turns the automatic attempt to restore failed storage replicas on or off, or checks the current setting.
[-refreshNodes]: This updates the namenode daemon with the set of datanodes allowed to connect to the namenode daemon.
[-setQuota <quota> <dirname>...<dirname>]: This sets the quota (the number of items) for the directory/directories.
[-clrQuota <dirname>...<dirname>]: This clears the set quota for the directory/directories.
[-setSpaceQuota <quota> <dirname>...<dirname>]: This sets the disk space quota for the directory/directories.
[-clrSpaceQuota <dirname>...<dirname>]: This clears the disk space quota for the directory/directories.
[-refreshServiceAcl]: This refreshes the service-level authorization policy file. We will be learning more about authorization later.
[-printTopology]: This prints the tree of the racks and their nodes as reported by the namenode daemon.
[-refreshNamenodes datanodehost:port]: This reloads the configuration files for a datanode daemon, stops serving the removed block pools, and starts serving new block pools. A block pool is a set of blocks that belong to a single namespace. We will be looking into this concept a bit later.
[-deleteBlockPool datanodehost:port blockpoolId [force]]: This deletes a block pool of a datanode daemon.
[-setBalancerBandwidth <bandwidth>]: This sets the bandwidth limit to be used by the balancer. The bandwidth is the value in bytes per second that the balancer should use for moving data blocks.
[-fetchImage <local directory>]: This gets the latest fsimage file from the namenode and saves it to the specified local directory.
[-help [cmd]]: This displays help for the given command, or for all commands if a command is not specified.
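The balancer arithmetic and a few of the dfsadmin options described above can be sketched in one small script. The utilization figure, the quota value, the bandwidth value, and the /tmp/demo directory are illustrative assumptions. The script uses whole percentages for simplicity, although the balancer itself accepts a float threshold. As the commands need administrator access to a live cluster, they are printed rather than executed by default.

```shell
#!/bin/sh
# Part 1: the utilization bounds the balancer works with.
avg=50          # hypothetical average datanode utilization, in percent
threshold=10    # default balancer threshold, in percent

upper=$((avg + threshold))   # nodes above this give up blocks
lower=$((avg - threshold))   # nodes below this receive blocks
echo "Move blocks from nodes above ${upper}% to nodes below ${lower}%"

# Part 2: a few administration commands, printed unless DRY_RUN=0.
# On Hadoop 2 and later, dfsadmin is run through the hdfs script;
# older releases used "hadoop dfsadmin".
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"      # show the command without running it
    else
        "$@"             # run the command exactly as given
    fi
}

run hdfs balancer -threshold "$threshold"          # rebalance with the threshold above
run hdfs dfsadmin -report                          # filesystem information and statistics
run hdfs dfsadmin -safemode get                    # query the safe mode state
run hdfs dfsadmin -setQuota 1000 /tmp/demo         # allow at most 1000 items (hypothetical)
run hdfs dfsadmin -clrQuota /tmp/demo              # remove that quota again
run hdfs dfsadmin -setBalancerBandwidth 10485760   # 10 * 1024 * 1024 bytes per second
run hdfs dfsadmin -printTopology                   # racks and nodes known to the namenode
```

Lowering the threshold asks the balancer for a more even spread at the cost of moving more blocks, which is why the bandwidth limit is usually considered together with it.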