A balanced Cassandra cluster is one where each node owns an equal share of the token range, and hence roughly an equal number of keys. This means that when you run nodetool ring, a balanced cluster shows the same percentage for every node under the Owns or Effective Ownership column. Even with equal ownership, if the data is not uniformly distributed across the keys, you will see some nodes holding more data than others. We use RandomPartitioner or Murmur3Partitioner, which hash keys uniformly across the token range, to avoid this sort of lopsided cluster.
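To make the Owns percentage concrete, here is a minimal Python sketch (not how nodetool computes it internally) that derives each node's share of a RandomPartitioner ring from its token. It assumes each node owns the range from the previous node's token up to its own, and it ignores replication, which Effective Ownership takes into account. The node names and tokens are hypothetical.

```python
# Sketch: per-node ownership of a RandomPartitioner ring (tokens 0 .. 2**127 - 1).
# Replication is ignored, so this corresponds to plain "Owns", not
# "Effective Ownership". Node names and tokens are hypothetical.
RING_SIZE = 2**127  # number of tokens in the RandomPartitioner range


def ownership(tokens):
    """Return {node: percent of the ring owned} for a {node: token} mapping.

    Each node owns (previous token, own token]; the node with the lowest
    token also owns the wrap-around range after the highest token.
    """
    ordered = sorted(tokens.items(), key=lambda item: item[1])
    if len(ordered) == 1:
        return {ordered[0][0]: 100.0}  # a single node owns the whole ring
    result = {}
    for i, (node, token) in enumerate(ordered):
        prev_token = ordered[i - 1][1]  # i == 0 wraps to the highest token
        span = (token - prev_token) % RING_SIZE
        result[node] = 100.0 * span / RING_SIZE
    return result


if __name__ == "__main__":
    balanced = {"node1": 0, "node2": 2**127 // 3, "node3": 2 * (2**127 // 3)}
    lopsided = {"node1": 0, "node2": 2**125, "node3": 2**126}
    for name, ring in (("balanced", balanced), ("lopsided", lopsided)):
        shares = ownership(ring)
        print(name, {node: f"{pct:.2f}%" for node, pct in shares.items()})
```

With evenly spaced tokens every node gets about 33.33%. With tokens 0, 2**125 and 2**126 the shares are 50%, 25% and 25%, because the node at token 0 also owns the wrap-around range: this is the kind of lopsided ring that rebalancing fixes.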
Any time a node is added or decommissioned, the token distribution becomes skewed. You normally want Cassandra to be fairly load balanced to avoid hotspots, and fortunately, load balancing is very easy. Here is the two-step load balancing process:
1. Generate a new token for every node. You can use tools/bin/token-generator to generate the tokens for you. For example, the following command generates tokens for two data centers with three nodes each:

       $ tools/bin/token-generator 3 3
       DC #1:
         Node #1: 0
         Node #2: 56713727820156410577229101238628035242
         Node #3: 113427455640312821154458202477256070484
       DC #2:
         Node #1: 169417178424467235000914166253263322299
         Node #2: 55989722784154413846455963776007251813
         Node #3: 112703450604310824423685065014635287055

   Note that token-generator produces tokens for RandomPartitioner. That means it is suitable for a default Cassandra 1.1.x cluster, but not for a default Cassandra 1.2.x or later cluster, because Cassandra 1.2.x and later use Murmur3Partitioner by default. Murmur3Partitioner has a different token range: -2^63 to 2^63 - 1, instead of 0 to 2^127 - 1 for RandomPartitioner.

2. Move each node to its new token:

       bin/nodetool -h <node_to_move> move <token_number>
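The tokens in the first step are simply evenly spaced points on the partitioner's token range. The following Python sketch shows that arithmetic for a single data center under both partitioners. It is an illustration, not the token-generator source: it does not reproduce the per-data-center offset the tool applies to DC #2, and the function names are hypothetical.

```python
# Sketch: evenly spaced initial tokens for one data center.
# RandomPartitioner tokens lie in [0, 2**127 - 1];
# Murmur3Partitioner tokens lie in [-2**63, 2**63 - 1].


def random_partitioner_tokens(node_count):
    """Token i is i * (2**127 // node_count)."""
    step = 2**127 // node_count
    return [i * step for i in range(node_count)]


def murmur3_tokens(node_count):
    """Token i is -2**63 + i * (2**64 // node_count)."""
    step = 2**64 // node_count
    return [-(2**63) + i * step for i in range(node_count)]


if __name__ == "__main__":
    print("RandomPartitioner:", random_partitioner_tokens(3))
    print("Murmur3Partitioner:", murmur3_tokens(3))
```

For three nodes, random_partitioner_tokens(3) returns the same values as DC #1 in the token-generator output above, while murmur3_tokens(3) returns -9223372036854775808, -3074457345618258603 and 3074457345618258602.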
The trick here is to assign each node the new token that is closest to its current token. This makes balancing faster, because less data has to move. A live example of load balancing is covered in the topic Adding nodes to a cluster in this chapter, where adding a node makes the cluster lopsided and we then rebalance it by moving tokens around.
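One way to follow the closest-token advice is to keep the nodes in their current ring order and pick the rotation of the new tokens that minimizes the total distance the nodes have to move. The Python sketch below does that for a RandomPartitioner ring. The heuristic (order-preserving rotation with total ring distance as the cost) and all names are assumptions, and token distance is only a rough proxy for how much data actually streams.

```python
# Sketch: match current node tokens to new balanced tokens, keeping ring order
# and minimizing total token distance. Assumes RandomPartitioner (0 .. 2**127 - 1).
RING_SIZE = 2**127


def ring_distance(a, b):
    """Shortest distance between two tokens, going either way around the ring."""
    d = (a - b) % RING_SIZE
    return min(d, RING_SIZE - d)


def plan_moves(current, new_tokens):
    """Return {node: new_token}, given {node: current_token} and the new tokens."""
    nodes = sorted(current, key=current.get)  # nodes in current ring order
    targets = sorted(new_tokens)
    n = len(nodes)
    if n != len(targets):
        raise ValueError("need exactly one new token per node")
    best_plan, best_cost = None, None
    for shift in range(n):  # try every rotation of targets against the nodes
        plan = {node: targets[(i + shift) % n] for i, node in enumerate(nodes)}
        cost = sum(ring_distance(current[node], tok) for node, tok in plan.items())
        if best_cost is None or cost < best_cost:
            best_plan, best_cost = plan, cost
    return best_plan


if __name__ == "__main__":
    step = 2**127 // 3
    current = {"a": step - 1, "b": 2 * step - 1, "c": 2**127 - 1}
    print(plan_moves(current, [0, step, 2 * step]))
```

In the example, each node sits just below a balanced token, so the chosen rotation moves every node forward by a single token (node c wraps around to token 0) instead of shifting all of them by a third of the ring.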
It is actually very easy to write a shell or Python script that reads the ring and balances it automatically. If you use RandomPartitioner, there is a GitHub project, Cassandra-Balancer (https://github.com/tivv/cassandra-balancer), that calculates the token for a node and moves the data. So, instead of writing your own, you can just use this Groovy script: execute it on each node, one by one, and you are done.
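If you would rather script it yourself, the final step can be as simple as turning a node-to-token plan into nodetool move invocations and running them one at a time. This Python sketch is hypothetical: it assumes the plan is already computed (for example with token-generator), that bin/nodetool is reachable from the current directory, and that tokens are non-negative RandomPartitioner tokens. With dry_run=True it only prints the commands.

```python
# Sketch: turn a {host: new_token} plan into `nodetool move` commands and run
# them one by one. Assumes bin/nodetool is in the working directory and tokens
# are non-negative (RandomPartitioner). Host names are hypothetical.
import shlex
import subprocess

NODETOOL = "bin/nodetool"


def move_commands(current, plan):
    """Build one move command per host whose token actually changes."""
    commands = []
    for host, new_token in plan.items():
        if current.get(host) == new_token:
            continue  # already on its target token; nothing to move
        commands.append([NODETOOL, "-h", host, "move", str(new_token)])
    return commands


def run(commands, dry_run=True):
    """Print each command; unless dry_run, execute it and wait for it to exit."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises if nodetool exits non-zero


if __name__ == "__main__":
    current = {"node1": 0, "node2": 2**125, "node3": 2**126}
    plan = {"node1": 0, "node2": 2**127 // 3, "node3": 2 * (2**127 // 3)}
    run(move_commands(current, plan), dry_run=True)
```

Running the moves sequentially, and skipping nodes that are already in place, mirrors the one-node-at-a-time approach described above.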