Chapter 6. Managing a Cluster – Scaling, Node Repair, and Backup

As a system grows, an application matures, the cloud infrastructure starts to warn of impending hardware failure, or perhaps your site gets hit by the TechCrunch effect, you may need to do one of these things: repair, back up, or scale up/down. Or perhaps the management decides to set up another data center just for the analysis of data (maybe using Hadoop) without affecting the user experience served from the existing data center. These tasks are an integral part of a system administrator's day job. Fortunately, all of them are fairly easy in Cassandra, and there is a lot of documentation available for them.

In this chapter, we will go through Cassandra's built-in DevOps tooling and then discuss scaling a cluster up and shrinking it down. We will also see how one can replace a dead node, or just remove it and let the other nodes bear the extra load. Further, we will briefly look at backup and restoration. We will observe that although Cassandra tries its best to accommodate changes, most of these operations leave the cluster unbalanced, in the sense that tokens are not uniformly distributed across the machines. The load balancing section shows how easy it is to rebalance the cluster.

Although most of the tasks are mechanical and really simple to automate, maintaining a large cluster can still be a burden. The last section briefly introduces Priam, a Java-based web application that simplifies many operational tasks.

Note

The TechCrunch effect is a sudden surge of traffic to your website that occurs when the popular technology news site http://techcrunch.com features your company or application. The term is used more generally to indicate the tsunami of traffic that arrives via all PR sources.

Scaling

Adding more nodes to Cassandra (scaling up) or shrinking the number of nodes (scaling down) is a pretty straightforward task. In a small or moderate-sized Cassandra cluster (say, fewer than 25 nodes), these tasks can easily be managed manually. In larger clusters, the whole process can be automated by writing appropriate shell scripts.

Adding nodes to a cluster

Cassandra is one of the purest distributed systems, where all the nodes are identical. So, adding a new node is just a matter of launching the Cassandra service with almost the same parameters as any other machine in the ring. In a cloud environment such as Amazon Web Services, it is a pretty common practice to have a machine image that contains the blueprint of a Cassandra node. Each time you have to add a node to the cluster, you launch the AMI, tweak a couple of node-specific parameters, and you are done. It is as simple as that.

To add a new node to the cluster, you need a Cassandra setup that has the following:

  • Node's identity: Edit cassandra.yaml to set the following appropriately:
    • cluster_name: Set it to the same value as the other nodes in the cluster that this node is joining.
    • listen_address: Set it to the IP address or hostname that other nodes use to connect to the Cassandra service on this node. Be warned that leaving this field empty may not be a good idea: Cassandra will fall back to the machine's hostname, which may or may not resolve correctly. In Amazon EC2, it is usually just right.
    • broadcast_address: It may need to be set for a multi-data-center Cassandra installation.
  • Seed node: Each node must know a seed node to be able to initialize gossip (refer to the Gossip section of Chapter 2, Cassandra Architecture), learn about the topology, and let the other nodes know about itself.
  • Initial token: This is the data range this node is going to be responsible for. One can leave the initial token unset, and Cassandra will assign a token by bisecting the token range of the most loaded node. Relying on this is the fastest way to end up with a lopsided cluster; the nodes should be assigned tokens so that they stay well balanced.

Apart from these settings, any customization made in the other nodes' cassandra.yaml should be carried over to the new node's configuration.
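Put together, the relevant part of the new node's cassandra.yaml might look like the following sketch. The cluster name, addresses, and token here are illustrative placeholders, not values from any real cluster:

```yaml
# Must match the existing cluster exactly, or the node will refuse to join
cluster_name: 'MyCluster'

# Address other nodes use to reach the Cassandra service on this node;
# avoid leaving it blank
listen_address: 10.0.0.12

# Usually needed only for multi-data-center setups (for example, behind NAT)
# broadcast_address: 203.0.113.10

# Token range this node will own (a RandomPartitioner-style value)
initial_token: 127605887595351923798765477786913079296

# At least one existing node, so gossip can bootstrap
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "10.0.0.1"
```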

Now that the node is ready, here are the steps to add new nodes:

  1. Initial tokens: Depending on the type of partitioner that you are using for key distribution, you will need to recalculate the initial tokens for each node in the system (refer to the Initial token section of Chapter 4, Deploying a Cluster, for initial token calculation). This means the older nodes are going to own different data ranges than they originally did. However, there are a few smart tricks for initial token assignment.
    1. N-folding the capacity: If you are doubling, tripling, or increasing the capacity N times, you'll find that the newly generated initial tokens include the older initial tokens. Say, for example, you had a 3-node cluster with initial tokens 0, t/3, and 2t/3. If you decide to triple the capacity by adding six more nodes, the new tokens are 0, t/9, 2t/9, t/3, 4t/9, 5t/9, 2t/3, 7t/9, and 8t/9. The trick here is to leave the tokens already in use in the existing cluster as they are, and assign the remaining tokens to the new nodes. This saves the extra move commands needed to adjust the tokens. You just launch the new nodes and wait till the data streams out to all the nodes.
    2. Rebalance later: This is the most common technique among those who are starting out with Cassandra. The idea is to not bother about imbalance. You just launch the new nodes, and Cassandra assigns each one a token at the midpoint of the token range of the most loaded node. This, as expected, does a pretty decent job of removing hotspots from the cluster (and many times this is exactly what you want when you are adding a new node). Once the data streaming between the nodes is done, the cluster may or may not be well balanced. You may want to load balance at that point. (Refer to the Load balancing section in this chapter.)
    3. Right token to right node: This is the most complex but also a very common case. Usually, you do not get to double or quadruple the cluster; it is more likely that you are asked to add, say, two new nodes. In this case, you calculate the tokens for the new configuration, edit the new nodes' cassandra.yaml files to set their initial tokens, start them, and then move the data around so that the nodes comply with the new initial tokens that were calculated. (We'll see how to do this later in this chapter.)
  2. Start the new nodes: With or without initial tokens assigned to the new nodes, start them one by one. It is recommended to pause for at least two minutes between two consecutive node starts. These two minutes make sure that the other nodes learn about the new node via gossip.
  3. Move data: If adding a new node has skewed the data distribution in the cluster, we may need to move the data around in such a way that each node has an equal share of the token range. This can be done using nodetool: run nodetool move NEW_INITIAL_TOKEN on each node that needs a new token.
  4. Cleanup: Move does not really move the data from one machine to another; it copies the data instead. This leaves the nodes with unused old data. To clean up this data, execute nodetool cleanup on each node.
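The N-folding trick described in the steps above can be sanity checked with a short script. This is an illustrative sketch, not a tool shipped with Cassandra; it uses the RandomPartitioner formula, where node i of N gets the token i * 2**127 / N:

```python
def initial_tokens(node_count):
    """RandomPartitioner initial tokens: split the 0..2**127 range evenly."""
    return [i * (2 ** 127) // node_count for i in range(node_count)]

old = initial_tokens(3)   # existing 3-node cluster: 0, t/3, 2t/3
new = initial_tokens(9)   # tripled capacity: 0, t/9, 2t/9, ..., 8t/9

# Every old token reappears in the new layout, so the existing nodes keep
# their tokens and only the six new nodes need token assignments.
assert set(old).issubset(set(new))
print(sorted(set(new) - set(old)))  # tokens for the six new nodes
```

For a capacity increase that is not an exact multiple, the old tokens will generally not reappear, which is exactly why the "right token to right node" case needs explicit moves.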

The following is a demonstration of adding a node to a 3-node cluster, that is, the expansion of a 3-node cluster into a 4-node cluster.

Ring status: Use nodetool -h HOSTNAME ring to see the current ring distribution.

$ /opt/cassandra/bin/nodetool -h 10.99.9.67 ring 
Address         Effective-Ownership   Token
                                      113427455640312821154458202477256070485
10.99.9.67       66.67%               0
10.147.171.159   66.67%               56713727820156410577229101238628035242
10.114.189.54    66.67%               113427455640312821154458202477256070485
# Some columns are removed from the result to fit the width

The previous sample looks pretty balanced with three nodes and a replication factor of 2.

New tokens: Adding an additional node is going to split the token range into four. Instead of calculating the tokens manually, we'll let the token-generator tool that ships with Cassandra do it for us:

$ /opt/cassandra/tools/bin/token-generator 4 

DC #1: 
  Node #1:   0 
  Node #2:   42535295865117307932921825928971026432 
  Node #3:   85070591730234615865843651857942052864 
  Node #4:   127605887595351923798765477786913079296
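These values can be reproduced by hand: for RandomPartitioner, token-generator simply splits the 0..2**127 token space evenly. A quick sketch to cross-check its output:

```python
NODE_COUNT = 4
RING_SIZE = 2 ** 127  # RandomPartitioner token space

# Node i gets the token i * RING_SIZE / NODE_COUNT
tokens = [i * RING_SIZE // NODE_COUNT for i in range(NODE_COUNT)]
for n, t in enumerate(tokens, start=1):
    print("Node #%d: %d" % (n, t))
```

Running this prints the same four tokens as the token-generator output above.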

Note

A couple of things: be aware that token values are partitioner dependent, and in this case the partitioner was RandomPartitioner. Also, if you compare the old and new tokens, you will realize that the first node is not going to be touched; it is already set to the correct value. The old nodes 2 and 3 get assigned the tokens of nodes 2 and 3 in the new configuration, which are the new tokens closest to their current ones. This way, we minimize data movement (streaming) across the nodes. The new node gets the initial token shown for node 4 in the previous result.

Start the new node: Edit cassandra.yaml of the new node to set the appropriate values for the cluster name, initial token, seed node, and listen address, plus any other customization as per the environment (such as broadcast address, snitch, security, data file locations, and so on). Now, start the node by issuing the cassandra command or starting the Cassandra service. Wait for a couple of minutes while the new node gets introduced to the cluster. The cluster now looks pretty lopsided:

$ /opt/cassandra/bin/nodetool -h 10.99.9.67 ring 

Address           EO*      Token
                           127605887595351923798765477786913079296
10.99.9.67       33.33%    0
10.147.171.159   58.33%    56713727820156410577229101238628035242
10.114.189.54    66.67%   113427455640312821154458202477256070485
10.166.54.134    41.67%   127605887595351923798765477786913079296
*EO = Effective ownership

Move tokens: Let's balance the nodes by moving data around. We need not touch Node #1 and Node #4. We need to move data from Node #2 and Node #3. They are the ones with wrong tokens. Here is how:

# Move data on Node #2
$ /opt/cassandra/bin/nodetool -h 10.147.171.159 move 42535295865117307932921825928971026432

# Cassandra is still unbalanced.
# Move data on Node #3
$ /opt/cassandra/bin/nodetool -h 10.114.189.54 move 85070591730234615865843651857942052864

This is a blocking operation, which means you will need to wait till the process finishes. In a really large cluster with huge data, moving the data may take some time; be patient. This operation moves data: it puts a heavy load on the network, and the data size on the disks may change. So, it is not ideal to perform this task when your site is running at peak traffic. Do it at a relatively low-traffic time.
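Deciding which nodes need a move can itself be scripted. The following is an illustrative sketch; plan_moves is a hypothetical helper, not part of Cassandra or nodetool. It compares the current ring to an evenly spaced RandomPartitioner layout and prints the required nodetool move commands for the 4-node ring shown above:

```python
def plan_moves(current, ring_size=2 ** 127):
    """Given {host: current_token}, return {host: target_token} for every
    host whose token must change to balance the ring (RandomPartitioner).
    Hosts keep their ring order, so each one is assigned the nearest slot
    in the evenly spaced target layout, minimizing streaming."""
    hosts = sorted(current, key=current.get)          # ring order
    n = len(hosts)
    targets = [i * ring_size // n for i in range(n)]  # balanced tokens
    return {h: t for h, t in zip(hosts, targets) if current[h] != t}

# The ring from the example above, after the fourth node joined
ring = {
    "10.99.9.67":     0,
    "10.147.171.159": 56713727820156410577229101238628035242,
    "10.114.189.54":  113427455640312821154458202477256070485,
    "10.166.54.134":  127605887595351923798765477786913079296,
}

for host, token in sorted(plan_moves(ring).items()):
    print("nodetool -h %s move %d" % (host, token))
```

For this ring, only the two middle nodes show up in the plan, matching the two move commands issued above; the first and last nodes already sit on balanced tokens.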

It may be useful to watch streaming statistics on the node by using nodetool netstats. Here is an example of how that looks (sampled every one second):

$ for i in {1..300} ; do  /opt/cassandra/bin/nodetool -h 10.114.189.54 netstats; sleep 1; done 
Mode: NORMAL
Not sending any streams.
Not receiving any streams. 
Pool Name        Active   Pending      Completed
Commands         n/a      0            1371882
Responses        n/a      0            7871820
Mode: MOVING
Not sending any streams.
Not receiving any streams. 

Pool Name        Active   Pending      Completed
Commands         n/a      0            1371882
Responses        n/a      0            7871823
[-- snip --]
Mode: MOVING
Streaming to: /10.99.9.67
/mnt/cassandra-data/data/Keyspace1/Standard1/Keyspace1-Standard1-hf-1-Data.db sections=1 progress=8126464/20112794 – 40%
/mnt/cassandra-data/data/Keyspace1/Standard1/Keyspace1-Standard1-hf-2-Data.db sections=1 progress=0/15600228 - 0% 

Not receiving any streams. 

Pool Name      Active   Pending      Completed
Commands       n/a      0            1371882
Responses      n/a      0            7871925
Mode: NORMAL
Not sending any streams.
Not receiving any streams. 

Pool Name    Active   Pending      Completed
Commands     n/a      0            1371882
Responses    n/a      0            7871934

After the moves are done, the cluster is balanced. The ring now looks much better:

$ /opt/cassandra/bin/nodetool -h 10.99.9.67 ring 

Address          EO*       Token
                           127605887595351923798765477786913079296
10.99.9.67      50.00%    0
10.147.171.159  50.00%    42535295865117307932921825928971026432
10.114.189.54   50.00%    85070591730234615865843651857942052864
10.166.54.134   50.00%    127605887595351923798765477786913079296

*EO = Effective-Ownership

Cleanup: Now that everything is done and there is relatively low traffic on the database, it is a good time to clean the useless data from each node.

$ /opt/cassandra/bin/nodetool -h 10.114.189.54 cleanup
$ /opt/cassandra/bin/nodetool -h 10.99.9.67 cleanup
$ /opt/cassandra/bin/nodetool -h 10.147.171.159 cleanup
$ /opt/cassandra/bin/nodetool -h 10.166.54.134 cleanup

Now, we are done with adding a new node to the system.

Removing nodes from a cluster

It may not always be desirable to have a high number of nodes up all the time; it adds cost and maintenance overhead. In many situations, where one has scaled up to cope with a sudden surge in traffic (for high I/O) or to avoid a hotspot for a while, it may be necessary to retire some machines and return to the normal mode of operation. Another reason to remove a node is hardware or communication failure, as with a dead node that needs to be ejected from the ring.

Removing a live node

Removing a live node means streaming the data out of that node to its neighbors. The command to remove a live node is nodetool decommission. That's all; you are done with removing a live node. It will take some time to stream the data out, and you may need to rebalance the cluster afterwards.

Here is what decommissioning a node looks like. Assume that the ring is the same as when we added one node to a 3-node cluster. The following command will show the process of decommissioning a live node:

$ /opt/cassandra/bin/nodetool -h 10.166.54.134 decommission

This will decommission the node at 10.166.54.134. It is a blocking process, which means the command-line interface (CLI) will wait till the decommissioning gets done. Here is what netstats on the node looks like:

$ for i in {1..300} ; do  /opt/cassandra/bin/nodetool -h 10.166.54.134 netstats; sleep 1; done 

Mode: NORMAL
Not sending any streams. 
Not receiving any streams. 

Pool Name    Active   Pending      Completed
Commands     n/a      0            7
Responses    n/a      0            139736
Mode: LEAVING
Not sending any streams.
Not receiving any streams. 

Pool Name    Active   Pending      Completed
Commands     n/a      0            7
Responses    n/a      0            139784
Mode: LEAVING 

Streaming to: /10.99.9.67
/mnt/cassandra-data/data/Keyspace1/Standard1/Keyspace1-Standard1-hf-10-Data.db sections=1 progress=9014392/9014392 – 100%
/mnt/cassandra-data/data/Keyspace1/Standard1/Keyspace1-Standard1-hf-9-Data.db sections=1 progress=0/33779859 – 0% 
/mnt/cassandra-data/data/Keyspace1/Standard1/Keyspace1-Standard1-hf-8-Data.db sections=1 progress=0/7298715 - 0% 

Streaming to: /10.147.171.159 
/mnt/cassandra-data/data/Keyspace1/Standard1/Keyspace1-Standard1-hf-9-Data.db sections=1 progress=15925248/53814752 - 29% 

Not receiving any streams. 

Pool Name    Active   Pending      Completed
Commands     n/a      0            7
Responses    n/a      0            139886
Mode: DECOMMISSIONED
Not sending any streams. 
Not receiving any streams. 

Pool Name    Active   Pending      Completed
Commands     n/a      0            7
Responses    n/a      0            139967

Obviously, it leaves the ring imbalanced:

$ /opt/cassandra/bin/nodetool -h 10.99.9.67 ring 

Address          EO*      Token
                          85070591730234615865843651857942052864
10.99.9.67       75.00%   0
10.147.171.159   75.00%   42535295865117307932921825928971026432
10.114.189.54    50.00%   85070591730234615865843651857942052864

*EO = Effective-Ownership

Removing a dead node

Removing a dead node is similar to decommissioning, except that the data is streamed from the replica nodes to the other nodes instead of from the node that is being removed. The command to remove a dead node from the cluster is:

$ nodetool -h HOSTNAME removetoken TOKEN_ASSIGNED_TO_THE_NODE

So, if you've decided to remove a node that owned the token 85070591730234615865843651857942052864, you just run:

$ nodetool -h 10.99.9.67 removetoken 85070591730234615865843651857942052864

It has a similar effect on the ring as nodetool decommission, but decommission (or disablegossip) cannot be used with a dead node. The cluster tokens may need to be moved/rebalanced after this.

Note

It must be noted that decommissioning a node or removing its token does not remove the data from the node being removed from the system. If you plan to reuse the node, you must clean the data directories manually.
