Once in a while, a dead node needs to be replaced. That is, you want an exact replacement rather than just a removal. Here are the steps:

1. Set up a new node, configuring its cassandra.yaml appropriately, similar to the way we did when adding a new node. (Refer to the Adding nodes to a cluster section in this chapter.)
2. Check the nodetool ring listing to verify that the new node has joined.
3. Run nodetool repair for each keyspace for integrity.
4. Run nodetool removetoken for the old node.

Let us see this in action. The example cluster here has three nodes and a replication factor of 2. One of the nodes is down. We will replace it with a new node.
Here is what the current ring looks like:
$ bin/nodetool -h 10.99.9.67 ring
Address         Status  EO*     Token
                                113427455640312821154458202477256070484
10.99.9.67      Up      66.67%  0
10.147.171.159  Up      66.67%  56713727820156410577229101238628035242
10.114.189.54   Down    66.67%  113427455640312821154458202477256070484
# * EO stands for Effective-ownership
As you can see, we need to replace the third node, 10.114.189.54
. We fired up a new machine, installed Cassandra, altered cassandra.yaml
to match the cluster specifications, and set up the listen address and data directory. We also made sure that the data directories (commit log, saved caches, and data) are blank. Since this node is going to replace a node with token 113427455640312821154458202477256070484
, we are setting the new node's initial_token
as 113427455640312821154458202477256070483
. By default, auto_bootstrap is true, which is what we want.
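The chosen initial_token, 113427455640312821154458202477256070483, is simply the dead node's token minus one, which places the new node immediately before the dead node on the ring. Since these tokens are 128-bit integers, ordinary shell arithmetic overflows; one way to sketch the calculation is to delegate to python3 for arbitrary precision:

```shell
# Compute the replacement node's initial_token as the dead node's token minus one.
# Shell integer arithmetic cannot handle 128-bit tokens, so use python3.
DEAD_TOKEN=113427455640312821154458202477256070484
NEW_TOKEN=$(python3 -c "print($DEAD_TOKEN - 1)")
echo "$NEW_TOKEN"   # 113427455640312821154458202477256070483
```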
# cassandra.yaml on replacement node
initial_token: 113427455640312821154458202477256070483
We start the Cassandra service on the node. Looking at the logs, it seems the node has all the information it needs:
# Cassandra log, joining the cluster with dead-node
…
INFO 08:30:34,841 JOINING: schema complete
INFO 08:30:34,842 JOINING: waiting for pending range calculation
INFO 08:30:34,842 JOINING: calculation complete, ready to bootstrap
INFO 08:30:34,843 JOINING: getting bootstrap token
INFO 08:30:34,849 Enqueuing flush of Memtable-LocationInfo@2039197494(36/45 serialized/live bytes, 1 ops)
INFO 08:30:34,849 Writing Memtable-LocationInfo@2039197494(36/45 serialized/live bytes, 1 ops)
INFO 08:30:34,873 Completed flushing /mnt/cassandra-data/data/system/LocationInfo/system-LocationInfo-hf-6-Data.db (87 bytes) for commitlog position ReplayPosition(segmentId=1371630629459, position=30209)
INFO 08:30:34,875 JOINING: sleeping 30000 ms for pending range setup
INFO 08:30:43,856 InetAddress /10.114.189.54 is now dead.
INFO 08:31:04,875 JOINING: Starting to bootstrap...
INFO 08:31:05,685 Finished streaming session 4 from /10.147.171.159
INFO 08:31:09,108 Finished streaming session 3 from /10.99.9.67
INFO 08:31:10,613 Finished streaming session 1 from /10.99.9.67
INFO 08:31:10,622 Finished streaming session 2 from /10.147.171.159
…
Now that the new node has joined, however, the nodetool ring output still does not look right:
$ bin/nodetool -h 10.99.9.67 ring
Address         Status  EO*     Token
                                113427455640312821154458202477256070484
10.99.9.67      Up      33.33%  0
10.147.171.159  Up      66.67%  56713727820156410577229101238628035242
10.166.54.134   Up      66.67%  113427455640312821154458202477256070483
10.114.189.54   Down    33.33%  113427455640312821154458202477256070484
# * EO stands for Effective-ownership
This means we need to remove the dead node from the cluster. But before we go ahead and remove the node, let's just repair the keyspaces to make sure that the nodes are consistent.
$ bin/nodetool -h 10.99.9.67 repair Keyspace1
[2013-06-19 08:40:55,336] Starting repair command #1, repairing 2 ranges for keyspace Keyspace1
[2013-06-19 08:41:01,297] Repair session f4e1f830-d8bb-11e2-0000-23f6cbfa94fd for range (113427455640312821154458202477256070484,0] finished
[2013-06-19 08:41:01,298] Repair session f86bbb30-d8bb-11e2-0000-23f6cbfa94fd for range (113427455640312821154458202477256070483,113427455640312821154458202477256070484] failed with error java.io.IOException: Cannot proceed on repair because a neighbor (/10.114.189.54) is dead: session failed
[2013-06-19 08:41:01,298] Repair command #1 finished

$ bin/nodetool -h 10.99.9.67 repair mytest
[2013-06-19 08:41:25,867] Starting repair command #2, repairing 2 ranges for keyspace mytest
[2013-06-19 08:41:26,377] Repair session 0712f3b0-d8bc-11e2-0000-23f6cbfa94fd for range (113427455640312821154458202477256070484,0] finished
[2013-06-19 08:41:26,377] Repair session 075ea2b0-d8bc-11e2-0000-23f6cbfa94fd for range (113427455640312821154458202477256070483,113427455640312821154458202477256070484] failed with error java.io.IOException: Cannot proceed on repair because a neighbor (/10.114.189.54) is dead: session failed
[2013-06-19 08:41:26,377] Repair command #2 finished
Now, let's remove the dead node. We will use the same technique of nodetool removetoken
as we did in the Removing a dead node section in this chapter. The cluster looks good after removal.
# Remove dead node
$ bin/nodetool -h 10.99.9.67 removetoken 113427455640312821154458202477256070484

# Ring status
$ bin/nodetool -h 10.99.9.67 ring
Address         Status  EO*     Token
                                113427455640312821154458202477256070483
10.99.9.67      Up      66.67%  0
10.147.171.159  Up      66.67%  56713727820156410577229101238628035242
10.166.54.134   Up      66.67%  113427455640312821154458202477256070483
# * EO stands for Effective-ownership
If, for some reason, you are unable to perform the replacement using the previous method, there is an alternative approach. Here are the steps to replace a node with a new one:

1. Set up a new node, configuring its cassandra.yaml appropriately.
2. Set its initial_token the same as the token assigned to the dead node.
3. Start the substitute node. The cluster will assume that the dead node came alive.
4. Run nodetool repair on all the keyspaces. This will stream the data to the new node.
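Condensed into commands, the alternative approach looks roughly like the following sketch. The token, seed address, and keyspace names are the ones from this chapter's example; the commands are echoed rather than executed, so the sequence reads as a dry run to adapt to your cluster:

```shell
# Dry-run sketch of replacing a dead node by reusing its exact token.
DEAD_TOKEN=113427455640312821154458202477256070484   # the dead node's token
SEED=10.99.9.67                                      # any live node

# Steps 1-2: on the replacement node, cassandra.yaml gets the same token:
echo "initial_token: $DEAD_TOKEN"

# Step 3: start the substitute node; the cluster treats it as the dead node
# coming back to life:
echo "bin/cassandra"

# Step 4: repair every keyspace so the replicas stream the data back onto it:
for ks in Keyspace1 mytest; do
  echo "bin/nodetool -h $SEED repair $ks"
done
```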
In Cassandra 1.0 and onwards, a dead node can be replaced with a new one using the cassandra.replace_token=<Token> property. Set this property with the -D option while starting Cassandra. Make sure the data directories on the new node are empty, and run a nodetool repair after the node is up.
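As a sketch, starting the fresh node with this property might look like the following. The token is the dead node's token from this example, and the launch path will vary by installation; the command is built into a variable and echoed rather than run:

```shell
# Sketch: replace a dead node via the replace_token system property.
# The new node's data directories must be empty before starting.
DEAD_TOKEN=113427455640312821154458202477256070484
START_CMD="bin/cassandra -Dcassandra.replace_token=$DEAD_TOKEN"
echo "$START_CMD"
# Once the node is up, follow with: bin/nodetool repair
```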
The two approaches (inserting a node with a new token versus replacing the node with the same token) fix the problem from different perspectives. The former inserts a node between the dead node and its predecessor, leaving just one token to the dead node; that one token gets assigned to the next node in the ring when we remove the dead node. The latter, however, is like saying the dead node came back to life but lost its memory, so the replica nodes fill it in. Neither method is inherently preferable; choose the one convenient to you.