A balanced Cassandra cluster is one in which each node owns an equal number of keys. This means that when you run nodetool status, a balanced cluster shows the same percentage for every node under the Owns or Effective Ownership column. Note that even with equal ownership, if the data is not uniformly distributed across the keys, some nodes will hold more data than others. We use RandomPartitioner or Murmur3Partitioner to avoid this sort of lopsided cluster.
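To see why a hash-based partitioner spreads keys evenly, consider RandomPartitioner, which derives each key's token from its MD5 digest, so even sequential or skewed key names scatter across the ring. The following is an illustrative simplification, not Cassandra's actual implementation:

```python
import hashlib

RING_SIZE = 2 ** 127  # RandomPartitioner tokens fall in 0 .. 2**127 - 1

def random_partitioner_token(key: bytes) -> int:
    """Map a partition key to a token via its MD5 digest
    (a simplification of what RandomPartitioner does)."""
    digest = hashlib.md5(key).digest()
    return int.from_bytes(digest, "big") % RING_SIZE

# Even sequential, similar-looking keys land far apart on the ring:
tokens = [random_partitioner_token(b"user:%d" % i) for i in range(4)]
```

Because the token depends only on the hash of the key, hot key names do not cluster on one node.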
Any time a new node is added or an existing node is decommissioned, the token distribution gets skewed. You generally want Cassandra to stay fairly load balanced to avoid hotspots. Fortunately, load balancing is easy. The two-step process is: first, calculate a balanced set of tokens; second, move each node to its new token. To calculate the tokens:
If you are using RandomPartitioner, you can use tools/bin/token-generator to generate tokens for you. For example, the following command generates tokens for two data centers with three nodes each:
$ tools/bin/token-generator 3 3
DC #1:
  Node #1: 0
  Node #2: 56713727820156410577229101238628035242
  Node #3: 113427455640312821154458202477256070484
DC #2:
  Node #1: 169417178424467235000914166253263322299
  Node #2: 55989722784154413846455963776007251813
  Node #3: 112703450604310824423685065014635287055
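If token-generator is not at hand, the even spacing it produces for the first data center can be reproduced with a few lines of Python. This is an illustrative sketch only; the tool's offset scheme for the second data center is omitted here:

```python
RING = 2 ** 127  # RandomPartitioner token range is 0 .. 2**127 - 1

def random_partitioner_tokens(nodes_count: int) -> list:
    """Evenly spaced initial_token values for RandomPartitioner,
    matching token-generator's spacing for a single data center."""
    step = RING // nodes_count
    return [step * i for i in range(nodes_count)]

print(random_partitioner_tokens(3))
```

For three nodes this yields 0, 56713727820156410577229101238628035242, and 113427455640312821154458202477256070484, the same values shown for DC #1 above.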
For Murmur3Partitioner (the default partitioner since Cassandra version 1.2), you can use this Python script:
python -c 'print([str(((2**64 // nodes_count) * i) - 2**63) for i in range(nodes_count)])'
For example, if you have six nodes, run the following command:
$ python -c 'print([str(((2**64 // 6) * i) - 2**63) for i in range(6)])'
['-9223372036854775808', '-6148914691236517206', '-3074457345618258604', '-2', '3074457345618258600', '6148914691236517202']
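For reuse, the one-liner can be wrapped in a small Python 3 function (note that Python 3 needs integer division, `//`, where Python 2's `/` sufficed):

```python
def murmur3_tokens(nodes_count: int) -> list:
    """Evenly spaced initial_token values for Murmur3Partitioner,
    whose ring spans -2**63 .. 2**63 - 1."""
    step = 2 ** 64 // nodes_count
    return [str(step * i - 2 ** 63) for i in range(nodes_count)]

print(murmur3_tokens(6))
```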
If you have nodes distributed across two data centers, you may want to assign alternate tokens to each data center:
DC #1: -9223372036854775808 -3074457345618258604 3074457345618258600
DC #2: -6148914691236517206 -2 6148914691236517202
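Dealing the sorted tokens out to data centers in round-robin fashion, as above, is simple to script; a minimal sketch:

```python
def alternate_tokens(tokens: list, dc_count: int = 2) -> dict:
    """Deal sorted tokens round-robin across data centers:
    DC 0 gets tokens 0, dc_count, 2*dc_count, ..., DC 1 the next slice, etc."""
    return {dc: tokens[dc::dc_count] for dc in range(dc_count)}

six_tokens = ['-9223372036854775808', '-6148914691236517206', '-3074457345618258604',
              '-2', '3074457345618258600', '6148914691236517202']
per_dc = alternate_tokens(six_tokens)
```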
Once the tokens are calculated, move each node to its assigned token:
bin/nodetool -h <node_to_move> move <token_number>
The trick here is to assign each node the new token closest to its current token. Less data then needs to move, so the cluster balances faster. A live example of load balancing is covered in the Adding nodes to a cluster section, where we add a node to the cluster, making it lopsided, and then balance it by moving tokens around.
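Matching each node to its nearest new token can be sketched as a ring-distance comparison. The helper below is hypothetical, not part of Cassandra's tooling, and assumes the Murmur3 ring:

```python
RING = 2 ** 64  # width of the Murmur3 token ring (-2**63 .. 2**63 - 1)

def ring_distance(a: int, b: int) -> int:
    """Shortest distance between two tokens on the wrapping ring."""
    d = abs(a - b) % RING
    return min(d, RING - d)

def closest_target(current: int, targets: list) -> int:
    """Pick the ideal token nearest a node's current token, which
    minimizes the data that nodetool move has to stream."""
    return min(targets, key=lambda t: ring_distance(current, t))
```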
It is actually quite easy to write a shell or Python script that reads the ring and balances it automatically. If you are using RandomPartitioner, there is a GitHub project, Cassandra-Balancer (https://github.com/tivv/cassandra-balancer), which calculates a node's tokens and moves the data. So, instead of writing your own, you can just run this Groovy script on each node, one by one, and you are done.