Rebalancing the parallelism of a topology

As explained in the previous section, one of the key features of Storm is that it allows us to modify the parallelism of a topology at runtime. The process of updating a topology's parallelism at runtime is called a rebalance. If we add new supervisor nodes to a Storm cluster and don't rebalance the topology, the new nodes will remain idle.

There are two ways to rebalance the topology:

  • Using the Storm Web UI
  • Using the Storm CLI

The Storm Web UI will be covered in detail in the next chapter. This section covers how we can rebalance the topology using the Storm CLI tool. The following is the command we need to run to rebalance the topology:

bin/storm rebalance [TopologyName] -n [NumberOfWorkers] -e [Spout]=[NumberOfExecutors] -e [Bolt1]=[NumberOfExecutors] -e [Bolt2]=[NumberOfExecutors]

The rebalance command first deactivates the topology for the duration of the message timeout and then redistributes the workers evenly around the Storm cluster. After a few seconds or minutes, the topology returns to its previous activation state and resumes processing the input streams.
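As a minimal sketch of how the command line above is assembled, the following shell function prints (but does not run) a rebalance command with one -e option per component. The function name rebalance_cmd and the WAIT_SECS variable are hypothetical helpers introduced here for illustration. The -w option, which overrides how long Storm waits before redistributing the workers, is accepted by the storm rebalance command, but it is worth checking the CLI help for the Storm version in use.

```shell
# Print (without running) a rebalance command line.
# Usage: rebalance_cmd TOPOLOGY WORKERS COMPONENT=EXECUTORS...
# rebalance_cmd and WAIT_SECS are hypothetical names used for illustration.
rebalance_cmd() {
  topology=$1
  workers=$2
  shift 2
  cmd="bin/storm rebalance $topology"
  # Optional: override the wait before the workers are redistributed.
  if [ -n "${WAIT_SECS:-}" ]; then
    cmd="$cmd -w $WAIT_SECS"
  fi
  cmd="$cmd -n $workers"
  # Every component needs its own -e option.
  for pair in "$@"; do
    cmd="$cmd -e $pair"
  done
  printf '%s\n' "$cmd"
}
```

Printing the command first makes it easy to review the options before running it against a live cluster.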

Rebalancing the parallelism of the sample topology

Let's first check the number of worker processes that are running in the Storm cluster by running the jps command on each supervisor machine:

  1. Run the jps command on the first supervisor machine:
    jps
    

    The following information is displayed:

    24347 worker
    23940 supervisor
    24593 Jps
    24349 worker
    

    Two worker processes are assigned to the first supervisor machine.

  2. Run the jps command on the second supervisor machine:
    jps
    

    The following information is displayed:

    24344 worker
    23941 supervisor
    24543 Jps
    

    One worker process is assigned to the second supervisor machine.

In total, three worker processes are running on the Storm cluster.
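Counting the worker lines by eye works for a small cluster. As a rough sketch, the same count can be taken from the jps output with a small filter. The function name count_workers is a hypothetical helper, and the sketch assumes that worker processes appear in the second column as worker, as in the output shown above.

```shell
# Count the worker processes in jps output read from standard input.
# count_workers is a hypothetical helper name used for illustration.
count_workers() {
  # jps prints "<pid> <name>"; awk ignores any leading whitespace.
  awk '$2 == "worker" { n++ } END { print n + 0 }'
}

# Usage on a supervisor machine:
#   jps | count_workers
```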

Let's reconfigure the LearningStormClusterTopology topology to use two worker processes, and the LearningStormSpout spout and the LearningStormBolt bolt to use four executors each, using the following command:

bin/storm rebalance LearningStormClusterTopology -n 2 -e LearningStormSpout=4 -e LearningStormBolt=4

The following is the output displayed:

0    [main] INFO  backtype.storm.thrift  - Connecting to Nimbus at nimbus.host.ip:6627
58   [main] INFO  backtype.storm.command.rebalance  - Topology LearningStormClusterTopology is rebalancing
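When the rebalance is triggered from a script, the confirmation line can be checked automatically. The following is only a sketch: rebalance_accepted is a hypothetical helper, and it assumes that the wording of the confirmation message matches the output shown above, which may differ between Storm versions.

```shell
# Succeed if the text on standard input confirms that the given
# topology is rebalancing.
# rebalance_accepted is a hypothetical helper name used for illustration.
rebalance_accepted() {
  # -F matches the message literally; -q suppresses the output.
  grep -qF "Topology $1 is rebalancing"
}

# Usage (2>&1 is an assumption, in case the log goes to standard error):
#   bin/storm rebalance LearningStormClusterTopology -n 2 2>&1 \
#     | rebalance_accepted LearningStormClusterTopology
```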

Rerun the jps commands on the supervisor machines to view the number of worker processes as follows:

  1. Run the jps command on the first supervisor machine:
    jps
    

    The following information is displayed:

    24377 worker
    23940 supervisor
    24593 Jps
    

    One worker process is assigned to the first supervisor machine.

  2. Run the jps command on the second supervisor machine:
    jps
    

    The following information is displayed:

    24353 worker
    23941 supervisor
    24543 Jps
    

    One worker process is assigned to the second supervisor machine.

In total, two worker processes are running on the Storm cluster.
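Because the rebalance takes a few seconds or minutes to complete, a check made immediately after the command may still show the old workers. One possible approach, sketched here under assumptions, is to poll until the expected count appears. The function name wait_for_count and the POLL_SECS variable are hypothetical; the function runs whatever counting command it is given and compares that command's output with the expected value.

```shell
# Poll a command until its output equals the expected value.
# Usage: wait_for_count EXPECTED ATTEMPTS COMMAND [ARGS...]
# wait_for_count and POLL_SECS are hypothetical names used for illustration.
wait_for_count() {
  expected=$1
  attempts=$2
  shift 2
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # Run the counting command and compare its output.
    if [ "$("$@")" = "$expected" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "${POLL_SECS:-5}"
  done
  return 1
}

# Usage on a supervisor machine, expecting one worker process:
#   wait_for_count 1 12 sh -c 'jps | grep -c " worker$"'
```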
