Load balancing

A balanced Cassandra cluster is one where each node owns an equal share of the token range, and therefore roughly an equal number of keys. When you run nodetool ring on a balanced cluster, every node shows the same percentage under the Owns or Effective Ownership column. Even with equal ownership, however, some nodes will hold more data than others if the keys are not uniformly distributed across the token range. We use RandomPartitioner or Murmur3Partitioner, which hash each key to a token, to avoid this sort of lopsided cluster.
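To make the Owns column concrete, here is a minimal Python sketch (our own illustration, not code from Cassandra or nodetool) that computes each node's share of the ring from its token. It assumes a single replica, so it ignores the replication factor that Effective Ownership takes into account, and it treats each node as owning the range from the previous token (exclusive) up to its own token (inclusive).

```python
# Illustration only: approximates the "Owns" column of nodetool ring,
# ignoring replication (which "Effective Ownership" accounts for).
RANDOM_RING_SIZE = 2**127   # RandomPartitioner token space
MURMUR3_RING_SIZE = 2**64   # Murmur3Partitioner token space (-2**63 .. 2**63 - 1)


def ownership(tokens, ring_size=RANDOM_RING_SIZE):
    """Return {token: fraction of the ring owned by the node with that token}.

    Each node owns the range (previous token, its own token]; the first
    node's range wraps around from the last token on the ring.
    """
    ordered = sorted(tokens)
    if len(ordered) == 1:
        return {ordered[0]: 1.0}
    return {
        # ordered[i - 1] is the last token when i == 0, giving the wrap-around
        tok: ((tok - ordered[i - 1]) % ring_size) / ring_size
        for i, tok in enumerate(ordered)
    }


if __name__ == "__main__":
    third = RANDOM_RING_SIZE // 3
    balanced = [0, third, 2 * third]
    # A fourth node placed halfway into one range makes the ring lopsided
    lopsided = balanced + [third // 2]
    for name, ring in (("balanced", balanced), ("lopsided", lopsided)):
        shares = ownership(ring)
        print(name, ", ".join(f"{shares[t]:.2%}" for t in sorted(shares)))
```

Run as a script, it prints 33.33% for each node of the balanced ring and 33.33%, 16.67%, 16.67%, 33.33% for the lopsided one. For Murmur3Partitioner tokens, pass MURMUR3_RING_SIZE as the second argument; Python's modulo handles the negative tokens.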

Any time a new node is added or a node is decommissioned, the token distribution becomes skewed. You normally want a Cassandra cluster to be fairly well balanced to avoid hotspots, and fortunately, balancing it is easy. Here is the two-step load balancing process:

  1. Calculate the initial tokens for the partitioner that you are using. You can generate them manually by dividing the partitioner's token range equally among the nodes, or you can use tools/bin/token-generator to generate them for you. For example, the following snippet generates the tokens for two data centers with three nodes each:
    $ tools/bin/token-generator 3 3 
    DC #1:
      Node #1:  0 
      Node #2:  56713727820156410577229101238628035242
      Node #3:  113427455640312821154458202477256070484 
    
    DC #2: 
      Node #1:  169417178424467235000914166253263322299 
      Node #2:  55989722784154413846455963776007251813 
      Node #3:  112703450604310824423685065014635287055
     Note that these tokens are generated for RandomPartitioner, whose token range is 0 to 2^127 - 1. They suit Cassandra 1.1.x with its default partitioner, but not the default configuration of Cassandra 1.2.x or later, which uses Murmur3Partitioner. Murmur3Partitioner has a different token range: -2^63 to 2^63 - 1.
  2. Now that we have the tokens, run the following command for each node to assign it its new token:
    bin/nodetool -h <node_to_move> move <token_number>
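Dividing the range by hand is simple arithmetic. The following Python sketch (our own, not the token-generator's source) spaces n tokens evenly across a partitioner's range. For RandomPartitioner with three nodes it yields the same tokens as DC #1 in the output above. It covers a single data center only and does not reproduce the offsets that token-generator applies to the second data center.

```python
# Sketch: divide a partitioner's token range evenly among node_count nodes.
RANDOM_PARTITIONER = (0, 2**127)        # tokens 0 .. 2**127 - 1
MURMUR3_PARTITIONER = (-2**63, 2**63)   # tokens -2**63 .. 2**63 - 1


def initial_tokens(node_count, token_range=RANDOM_PARTITIONER):
    """Return node_count evenly spaced tokens, starting at the range minimum."""
    low, high = token_range
    step = (high - low) // node_count   # integer division keeps tokens exact
    return [low + i * step for i in range(node_count)]


if __name__ == "__main__":
    print("RandomPartitioner:")
    for i, tok in enumerate(initial_tokens(3), start=1):
        print(f"  Node #{i}: {tok}")
    print("Murmur3Partitioner:")
    for i, tok in enumerate(initial_tokens(3, MURMUR3_PARTITIONER), start=1):
        print(f"  Node #{i}: {tok}")
```

Python integers have arbitrary precision, which is why this is easier in Python than in plain shell arithmetic: RandomPartitioner tokens do not fit in a 64-bit integer.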

The trick here is to assign each node the new token that is closest to its current token. This speeds up balancing because less data has to move. A live example of load balancing is covered under the topic Adding nodes to a cluster in this chapter, where adding a node makes the cluster lopsided and we then balance it by moving tokens around.
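One way to pick the closest tokens, sketched below in Python under our own assumptions (this is not a Cassandra API), is to sort both the current and the target tokens, try every rotation of the targets so that ring order is preserved, and keep the rotation with the smallest total token movement. Token distance is only a rough proxy for the data that must be streamed, and it is only a fair proxy when data is spread uniformly over the ring.

```python
# Sketch: pair each node with a nearby target token so that less data moves.
RING_SIZE = 2**127  # RandomPartitioner; use 2**64 for Murmur3Partitioner


def ring_distance(a, b, ring_size=RING_SIZE):
    """Shortest distance between two tokens on a wrapping ring."""
    d = (a - b) % ring_size
    return min(d, ring_size - d)


def assign_targets(current, targets, ring_size=RING_SIZE):
    """Map each current token to a target token, keeping ring order.

    Both lists are sorted, then every cyclic rotation of the targets is
    tried; the rotation with the smallest total movement wins.
    """
    cur = sorted(current)
    tgt = sorted(targets)
    if len(cur) != len(tgt):
        raise ValueError("need exactly one target token per node")
    best_cost, best = None, None
    for shift in range(len(tgt)):
        rotated = tgt[shift:] + tgt[:shift]
        cost = sum(ring_distance(c, t, ring_size) for c, t in zip(cur, rotated))
        if best_cost is None or cost < best_cost:
            best_cost, best = cost, rotated
    return dict(zip(cur, best))


if __name__ == "__main__":
    third = RING_SIZE // 3
    current = [0, third // 2, third, 2 * third]   # lopsided ring after adding a node
    quarter = RING_SIZE // 4
    targets = [i * quarter for i in range(4)]     # evenly spaced target tokens
    for old, new in assign_targets(current, targets).items():
        print(f"{old} -> {new}")
```

Trying all rotations costs O(n^2) for n nodes, which is negligible for ring sizes where manual token management makes sense.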

It is actually easy to write a shell or Python script that reads the ring and balances it automatically. If you use RandomPartitioner, there is a GitHub project, Cassandra-Balancer (https://github.com/tivv/cassandra-balancer), which calculates the tokens for a node and moves the data. So, instead of writing your own script, you can just use this Groovy script. Run it on each node, one at a time, and you are done.
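The driving loop of such a script can be quite small. Below is a minimal Python sketch, not the Cassandra-Balancer project's code, that runs bin/nodetool -h <host> move <token> for each node in turn and stops at the first failure. The host names in it are hypothetical, it defaults to a dry run that only prints the commands, and it assumes nodetool move returns only after the move has finished, so each node starts moving only once the previous one is done. It passes tokens unchanged; with negative Murmur3Partitioner tokens, check how your nodetool version handles a leading minus sign.

```python
# Sketch only: a sequential "nodetool move" driver; not Cassandra-Balancer's code.
import subprocess


def move_commands(plan, nodetool="bin/nodetool"):
    """Build one 'nodetool -h <host> move <token>' command per (host, token) pair."""
    return [[nodetool, "-h", host, "move", str(token)] for host, token in plan]


def run_moves(plan, nodetool="bin/nodetool", dry_run=True):
    """Move nodes one at a time; with dry_run=True, only print the commands."""
    for cmd in move_commands(plan, nodetool):
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError, so no later move starts
            # after a failed one. Assumes nodetool move blocks until done.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical hosts paired with the RandomPartitioner tokens for three nodes
    plan = [
        ("node1.example.com", 0),
        ("node2.example.com", 56713727820156410577229101238628035242),
        ("node3.example.com", 113427455640312821154458202477256070484),
    ]
    run_moves(plan)  # dry run: prints the three commands without executing them
```

Keeping the dry run as the default lets you review the whole plan before passing dry_run=False.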
