Adding new nodes to an existing cluster

Hadoop supports adding new nodes to an existing cluster without shutting down or restarting any service. This recipe will outline the steps required to add a new node to a pre-existing cluster.

Getting ready

Ensure that you have a Hadoop cluster up and running. In addition, ensure that you have the Hadoop distribution extracted, and the configuration files have been updated with the settings from the recipe titled Starting Hadoop in distributed mode.

We will use the following terms for our imaginary cluster:

Server name | Purpose                                        | Number of dedicated machines
------------|------------------------------------------------|-----------------------------
head        | Will run the NameNode and JobTracker services  | 1
secondary   | Will run the Secondary NameNode service        | 1
worker(n)   | Will run the TaskTracker and DataNode services | 3 or greater

How to do it...

Follow these steps to add new nodes to an existing cluster:

  1. From the head node, update the slaves configuration file with the hostname of the new node:
    $ vi conf/slaves
    worker1
    worker2
    worker3
    worker4
    
  2. Log in to the new node and start the DataNode and TaskTracker services:
    $ ssh hadoop@worker4
    $ cd /path/to/hadoop
    $ bin/hadoop-daemon.sh start datanode
    $ bin/hadoop-daemon.sh start tasktracker
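The two steps above can be wrapped in a small helper on the head node. This is only a sketch: the `add_slave` function name is an invention for illustration, and the hostnames, SSH user, and Hadoop path are the placeholder values used in this recipe.

```shell
#!/bin/sh
# Sketch of a helper for registering a new worker on the head node.

# add_slave: append a hostname to the slaves file only if it is not
# already listed, so re-running the script is harmless.
add_slave() {
    node="$1"
    slaves_file="$2"
    if ! grep -qx "$node" "$slaves_file"; then
        echo "$node" >> "$slaves_file"
    fi
}

# Usage on the head node, following the recipe's example values:
#   add_slave worker4 conf/slaves
#   ssh hadoop@worker4 "cd /path/to/hadoop && \
#       bin/hadoop-daemon.sh start datanode && \
#       bin/hadoop-daemon.sh start tasktracker"
```

Guarding the append with `grep -qx` (exact whole-line match) avoids adding a duplicate entry if the node is already in conf/slaves.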

How it works...

We updated the slaves configuration file on the head node to tell the Hadoop framework that a new node exists in the cluster. However, this file is only read when the Hadoop services are started (for example, by executing the bin/start-all.sh script). In order to add the new node to the cluster without having to restart all of the Hadoop services, we logged into the new node, and started the DataNode and TaskTracker services manually.

Note

The DataNode and TaskTracker services will automatically start the next time the cluster is restarted.

There's more...

When you add a new node to the cluster, the cluster is not properly balanced. HDFS will not automatically redistribute any existing data to the new node in order to balance the cluster. To rebalance the existing data in the cluster, you can run the following command from the head node:

$ bin/start-balancer.sh
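The balancer also accepts a threshold argument, a percentage that controls how close each DataNode's disk utilization must be to the cluster average before the balancer considers the cluster balanced. A tighter threshold moves more data; the value 5 below is just an example:

```shell
# Balance until every DataNode's utilization is within 5 percentage
# points of the cluster-wide average (the default threshold is 10).
$ bin/start-balancer.sh -threshold 5
```

The balancer runs as a background process and can be stopped at any time with `bin/stop-balancer.sh`, so it is safe to schedule it during low-traffic windows.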

Note

Rebalancing a Hadoop cluster is a network-intensive task. Depending on how much data the cluster holds and how many nodes were added, terabytes of data may need to be moved between nodes. Job performance can degrade while a cluster is rebalancing, so regular rebalancing maintenance should be properly planned.

See also

  • Safely decommissioning nodes