As a system grows, an application matures, the cloud infrastructure starts to warn of impending hardware failure, or you get hit by the TechCrunch effect, you may need to do one of these things: repair, back up, or scale up or down. Or perhaps management decides to set up another data center just for data analysis (maybe using Hadoop) without affecting the user experience served from the existing data center. These tasks are an integral part of a system administrator's day job. Fortunately, all of them are fairly easy in Cassandra, and there is plenty of documentation available for them.
In this chapter, we will go through Cassandra's built-in DevOps tooling and discuss scaling a cluster up and shrinking it down. We will also see how to replace a dead node, or simply remove it and let the other nodes bear the extra load. Further, we will briefly look at backup and restoration. We will observe that although Cassandra tries its best to accommodate changes, most of these operations leave the cluster unbalanced, in the sense that tokens are not uniformly distributed across the machines. The load balancing section shows how easy it is to rebalance the cluster.
Although most of the tasks are mechanical and really simple to automate, it may be a burden to maintain a large cluster. The last section briefly introduces Priam, a Java-based web application, to simplify many operational tasks.
The TechCrunch effect is a sudden surge of traffic to your website when the popular technology news site http://techcrunch.com features your company or application. The term is generally used to refer to the tsunami of traffic that arrives via all PR sources.
Adding more nodes to Cassandra (scaling up) or shrinking the number of nodes (scaling down) is a pretty straightforward task. In a small or moderate-sized Cassandra cluster (say, fewer than 25 nodes), it can easily be managed by performing the tasks manually. In larger clusters, the whole process can be automated by writing appropriate shell scripts.
Cassandra is one of the purest distributed systems, in that all the nodes are identical. So, adding a new node is just a matter of launching a Cassandra service with almost the same parameters as any other machine in the ring. In a cloud environment such as Amazon Web Services, it is a pretty common practice to keep a machine image that contains the blueprint of a Cassandra node. Each time you have to add a node to the cluster, you launch the AMI, tweak a couple of node-specific parameters, and you are done. It is as simple as that.
To add a new node to the cluster, you need a Cassandra setup whose cassandra.yaml sets the following appropriately:

cluster_name: The same as the other nodes in the cluster that this node is joining.

listen_address: Set it to the IP address or hostname that other nodes use to connect to the Cassandra service on this node. Be warned that leaving this field empty may not be a good idea: Cassandra will then assume listen_address is the same as the hostname, which may or may not be correct. In Amazon EC2, it is usually just right.

broadcast_address: It may need to be set for a multi data center Cassandra installation.

Apart from these settings, any customization made in the other nodes' cassandra.yaml should be carried over to the new node's configuration.
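As a sketch of how the node-specific tweaks might be automated at launch time (for example, in an AMI bootstrap script), the following patches a freshly copied cassandra.yaml with sed. The file path, cluster name, and token-passing convention here are assumptions for illustration, not values from this example:

```shell
#!/bin/sh
# Hypothetical bootstrap sketch: patch the node-specific fields of a
# freshly copied cassandra.yaml. CONF path and cluster name are assumptions.
CONF=${CONF:-/etc/cassandra/cassandra.yaml}
CLUSTER_NAME="Test Cluster"
LISTEN_ADDRESS=$(hostname -i)   # IP that other nodes use to reach this node
INITIAL_TOKEN="$1"              # token computed for this node, passed as arg

sed -i \
  -e "s/^cluster_name:.*/cluster_name: '$CLUSTER_NAME'/" \
  -e "s/^listen_address:.*/listen_address: $LISTEN_ADDRESS/" \
  -e "s/^initial_token:.*/initial_token: $INITIAL_TOKEN/" \
  "$CONF"
```

Run once per node with that node's token, before starting the Cassandra service.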
Now that the node is ready, here are the steps to add new nodes:

1. Calculate the new initial tokens for the expanded cluster.
2. Edit cassandra.yaml on the new nodes to set their initial tokens, and start them.
3. Move the data around so that the existing nodes comply with the new initial tokens that we calculated, by running nodetool move NEW_INITIAL_TOKEN on each node. (We'll see how to do this later in this chapter.)
4. Run nodetool cleanup on each node.

Following is a demonstration of adding a node to a 3-node cluster, that is, the expansion of a 3-node cluster into a 4-node cluster.
Ring status: Use nodetool -h HOSTNAME ring to see the current ring distribution.
$ /opt/cassandra/bin/nodetool -h 10.99.9.67 ring
Address          Effective-Ownership  Token
                                      113427455640312821154458202477256070485
10.99.9.67       66.67%               0
10.147.171.159   66.67%               56713727820156410577229101238628035242
10.114.189.54    66.67%               113427455640312821154458202477256070485
# Some columns are removed from the result to fit the width
The previous sample looks pretty balanced with three nodes and a replication factor of 2.
New tokens: Adding the new node is going to split the token range into four. Instead of calculating the tokens manually, we'll let the tools provided by Cassandra do it for us. Let's see what they are.
$ /opt/cassandra/tools/bin/token-generator 4
DC #1:
  Node #1: 0
  Node #2: 42535295865117307932921825928971026432
  Node #3: 85070591730234615865843651857942052864
  Node #4: 127605887595351923798765477786913079296
A couple of things to note: token values are partitioner dependent, and in this case the partitioner was RandomPartitioner. Also, if you compare the old and new tokens, you will see that the first node does not need to be touched; it is already set to the correct value. The old nodes 2 and 3 get assigned the token values of nodes 2 and 3 in the new configuration, which minimizes data movement across the nodes (streaming). The new node will have the initial token of node 4 in the previous result.
Start the new node: Edit cassandra.yaml of the new node to set the appropriate values for the cluster name, initial token, seed nodes, listen address, and any other customization as per the environment (such as broadcast address, snitch, security, data file locations, and so on). Now, start the node by issuing the cassandra command or starting the Cassandra service. Wait a couple of minutes while the new node gets introduced to the cluster. The cluster now looks pretty lopsided:
$ /opt/cassandra/bin/nodetool -h 10.99.9.67 ring
Address          EO*     Token
                         127605887595351923798765477786913079296
10.99.9.67       33.33%  0
10.147.171.159   58.33%  56713727820156410577229101238628035242
10.114.189.54    66.67%  113427455640312821154458202477256070485
10.166.54.134    41.67%  127605887595351923798765477786913079296
*EO = Effective ownership
Move tokens: Let's balance the nodes by moving data around. We need not touch Node #1 and Node #4; we need to move data from Node #2 and Node #3. They are the ones with wrong tokens. Here is how:
# Move data on Node #2
$ /opt/cassandra/bin/nodetool -h 10.147.171.159 move 42535295865117307932921825928971026432

# Cassandra is still unbalanced.
# Move data on Node #3
$ /opt/cassandra/bin/nodetool -h 10.114.189.54 move 85070591730234615865843651857942052864
This is a blocking operation, which means you will need to wait till the process finishes. In a really large cluster with huge amounts of data, the move may take a while; be patient. Because this operation moves data, it puts a heavy burden on the network, and the data size on disk may change. So it is not ideal to perform this task while your site is running at peak traffic; do it at a relatively low-traffic time.
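Since nodetool move blocks until streaming completes, the moves can be scripted to run back to back. A sketch using the host:token pairs from this example (treat the paths and addresses as placeholders for your own cluster):

```shell
#!/bin/sh
# Run the pending moves one after another; nodetool move blocks,
# so each move finishes streaming before the next one starts.
MOVES="10.147.171.159:42535295865117307932921825928971026432
10.114.189.54:85070591730234615865843651857942052864"

for pair in $MOVES; do
  host=${pair%%:*}    # part before the colon
  token=${pair#*:}    # part after the colon
  echo "Moving $host to token $token"
  /opt/cassandra/bin/nodetool -h "$host" move "$token"
done
```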
It may be useful to watch streaming statistics on the node by using nodetool netstats. Here is an example of how that looks (sampled every second):
$ for i in {1..300} ; do /opt/cassandra/bin/nodetool -h 10.114.189.54 netstats; sleep 1; done
Mode: NORMAL
Not sending any streams.
Not receiving any streams.
Pool Name    Active  Pending  Completed
Commands     n/a     0        1371882
Responses    n/a     0        7871820

Mode: MOVING
Not sending any streams.
Not receiving any streams.
Pool Name    Active  Pending  Completed
Commands     n/a     0        1371882
Responses    n/a     0        7871823
[-- snip --]
Mode: MOVING
Streaming to: /10.99.9.67
  /mnt/cassandra-data/data/Keyspace1/Standard1/Keyspace1-Standard1-hf-1-Data.db sections=1 progress=8126464/20112794 - 40%
  /mnt/cassandra-data/data/Keyspace1/Standard1/Keyspace1-Standard1-hf-2-Data.db sections=1 progress=0/15600228 - 0%
Not receiving any streams.
Pool Name    Active  Pending  Completed
Commands     n/a     0        1371882
Responses    n/a     0        7871925

Mode: NORMAL
Not sending any streams.
Not receiving any streams.
Pool Name    Active  Pending  Completed
Commands     n/a     0        1371882
Responses    n/a     0        7871934
After the move is done, the cluster is balanced. The ring now looks much better:
$ /opt/cassandra/bin/nodetool -h 10.99.9.67 ring
Address          EO*     Token
                         127605887595351923798765477786913079296
10.99.9.67       50.00%  0
10.147.171.159   50.00%  42535295865117307932921825928971026432
10.114.189.54    50.00%  85070591730234615865843651857942052864
10.166.54.134    50.00%  127605887595351923798765477786913079296
*EO = Effective-Ownership
Cleanup: Now that everything is done and there is relatively low traffic on the database, it is a good time to remove from each node the data that it no longer owns:
$ /opt/cassandra/bin/nodetool -h 10.114.189.54 cleanup
$ /opt/cassandra/bin/nodetool -h 10.99.9.67 cleanup
$ /opt/cassandra/bin/nodetool -h 10.147.171.159 cleanup
$ /opt/cassandra/bin/nodetool -h 10.166.54.134 cleanup
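The per-node cleanup commands lend themselves to a simple loop (hosts here are the ones from this example):

```shell
#!/bin/sh
# Run cleanup on every node in the ring, one at a time.
for h in 10.99.9.67 10.147.171.159 10.114.189.54 10.166.54.134; do
  /opt/cassandra/bin/nodetool -h "$h" cleanup
done
```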
Now, we are done with adding a new node to the system.
It may not always be desirable to keep a large number of nodes up all the time; it adds cost and maintenance overhead. In many situations where one has scaled up to cope with a sudden surge in traffic (for high I/O) or to avoid a hotspot for a while, it may be required to retire some machines and return to the normal operation mode. Another reason to remove a node is hardware or communication failure, such as a dead node that needs to be ejected from the ring.
Removing a live node means streaming its data out to its neighbors. The command to remove a live node is nodetool decommission. That's all; you are done with removing a live node. It will take some time to stream the data, and you may need to rebalance the cluster afterward.
Here is what decommissioning a node looks like. Assume that the ring is the same as when we added one node to a 3-node cluster. The following command will show the process of decommissioning a live node:
$ /opt/cassandra/bin/nodetool -h 10.166.54.134 decommission
This will decommission the node at 10.166.54.134. It is a blocking process, which means the command-line interface (CLI) will wait till the decommissioning gets done. Here is what netstats on the node looks like:
$ for i in {1..300} ; do /opt/cassandra/bin/nodetool -h 10.166.54.134 netstats; sleep 1; done
Mode: NORMAL
Not sending any streams.
Not receiving any streams.
Pool Name    Active  Pending  Completed
Commands     n/a     0        7
Responses    n/a     0        139736

Mode: LEAVING
Not sending any streams.
Not receiving any streams.
Pool Name    Active  Pending  Completed
Commands     n/a     0        7
Responses    n/a     0        139784

Mode: LEAVING
Streaming to: /10.99.9.67
  /mnt/cassandra-data/data/Keyspace1/Standard1/Keyspace1-Standard1-hf-10-Data.db sections=1 progress=9014392/9014392 - 100%
  /mnt/cassandra-data/data/Keyspace1/Standard1/Keyspace1-Standard1-hf-9-Data.db sections=1 progress=0/33779859 - 0%
  /mnt/cassandra-data/data/Keyspace1/Standard1/Keyspace1-Standard1-hf-8-Data.db sections=1 progress=0/7298715 - 0%
Streaming to: /10.147.171.159
  /mnt/cassandra-data/data/Keyspace1/Standard1/Keyspace1-Standard1-hf-9-Data.db sections=1 progress=15925248/53814752 - 29%
Not receiving any streams.
Pool Name    Active  Pending  Completed
Commands     n/a     0        7
Responses    n/a     0        139886

Mode: DECOMMISSIONED
Not sending any streams.
Not receiving any streams.
Pool Name    Active  Pending  Completed
Commands     n/a     0        7
Responses    n/a     0        139967
Obviously, it leaves the ring imbalanced:
$ /opt/cassandra/bin/nodetool -h 10.99.9.67 ring
Address          EO*     Token
                         85070591730234615865843651857942052864
10.99.9.67       75.00%  0
10.147.171.159   75.00%  42535295865117307932921825928971026432
10.114.189.54    50.00%  85070591730234615865843651857942052864
*EO = Effective-Ownership
Removing a dead node is similar to decommissioning, except that data is streamed to the other nodes from the replica nodes instead of from the node that is being removed. The command to remove a dead node from the cluster is:
$ nodetool -h HOSTNAME removetoken TOKEN_ASSIGNED_TO_THE_NODE
So, if you've decided to remove a dead node that owned the token 85070591730234615865843651857942052864, you just run:
$ nodetool -h 10.99.9.67 removetoken 85070591730234615865843651857942052864
It has a similar effect on the ring as nodetool decommission. But decommission or disablegossip cannot be used with a dead node. It may require moving/rebalancing the cluster tokens afterward.