As explained in the previous section, one of the key features of Storm is that it allows us to modify the parallelism of a topology at runtime. The process of updating a topology's parallelism at runtime is called a rebalance. If we add new supervisor nodes to a Storm cluster and don't rebalance the topology, the new nodes will remain idle.
There are two ways to rebalance the topology: using the Storm Web UI or using the Storm CLI tool.
The Storm Web UI will be covered in detail in the next chapter. This section covers how we can rebalance the topology using the Storm CLI tool. To do so, run the following command:
bin/storm rebalance [TopologyName] -n [NumberOfWorkers] -e [Spout]=[NumberOfExecutors] -e [Bolt1]=[NumberOfExecutors] -e [Bolt2]=[NumberOfExecutors]
The rebalance command first deactivates the topology for the duration of the message timeout and then redistributes the workers evenly around the Storm cluster. After a few seconds or minutes, the topology returns to its previous activation state and resumes processing the input streams.
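Because every component needs its own -e flag, the command line is easy to get wrong by hand. The following is a minimal sketch of a helper that composes the command from a topology name, a worker count, and any number of component=executors pairs. The function name build_rebalance_cmd and the names MyTopology, MySpout, and MyBolt are hypothetical and used only for illustration; the helper prints the command rather than running it, so it can be reviewed first.

```shell
# Compose a `storm rebalance` command line without executing it.
# Usage: build_rebalance_cmd TOPOLOGY WORKERS [COMPONENT=EXECUTORS ...]
# (build_rebalance_cmd is a hypothetical helper, not part of Storm.)
build_rebalance_cmd() {
  topology=$1
  workers=$2
  shift 2
  cmd="bin/storm rebalance $topology -n $workers"
  # Each remaining argument is one COMPONENT=EXECUTORS pair,
  # and each pair gets its own -e flag.
  for pair in "$@"; do
    cmd="$cmd -e $pair"
  done
  printf '%s\n' "$cmd"
}

# Hypothetical topology and component names, for illustration only.
build_rebalance_cmd MyTopology 3 MySpout=2 MyBolt=6
```

Once the printed command looks right, it can be pasted into a shell in the Storm installation directory.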
Let's first check the number of worker processes that are running in the Storm cluster by running the jps command on each supervisor machine:
Run the jps command on the first supervisor machine:
jps
The following information is displayed:
24347 worker
23940 supervisor
24593 Jps
24349 worker
Two worker processes are assigned to the first supervisor machine.
Run the jps command on the second supervisor machine:
jps
The following information is displayed:
24344 worker
23941 supervisor
24543 Jps
One worker process is assigned to the second supervisor machine.
In total, three worker processes are running on the Storm cluster.
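Counting worker lines by eye becomes tedious on larger clusters. As a rough sketch, the jps output can be piped through a small filter that counts the lines whose second column is exactly worker. The function name count_workers is hypothetical, and the sample input is the first supervisor's output shown above.

```shell
# Count the Storm worker JVMs listed in jps output read from stdin.
# (count_workers is a hypothetical helper, not part of Storm or the JDK.)
count_workers() {
  # jps prints "<pid> <main class>"; compare the class column exactly
  # so that "supervisor" and "Jps" lines are ignored.
  awk '$2 == "worker" { n++ } END { print n + 0 }'
}

# Sample input: the first supervisor's jps output.
printf '24347 worker\n23940 supervisor\n24593 Jps\n24349 worker\n' | count_workers
```

On a live machine the same filter would be used as jps | count_workers. Running it remotely, for example through ssh against each supervisor host, is an assumption about how the cluster is reached and is not covered here.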
Let's try to reconfigure the LearningStormClusterTopology topology to use two worker processes, the LearningStormSpout spout to use four executors, and the LearningStormBolt bolt to use four executors using the following command:
bin/storm rebalance LearningStormClusterTopology -n 2 -e LearningStormSpout=4 -e LearningStormBolt=4
The following is the output displayed:
0 [main] INFO backtype.storm.thrift - Connecting to Nimbus at nimbus.host.ip:6627
58 [main] INFO backtype.storm.command.rebalance - Topology LearningStormClusterTopology is rebalancing
Rerun the jps command on the supervisor machines to view the number of worker processes:
Run the jps command on the first supervisor machine:
jps
The following information is displayed:
24377 worker
23940 supervisor
24593 Jps
One worker process is assigned to the first supervisor machine.
Run the jps command on the second supervisor machine:
jps
The following information is displayed:
24353 worker
23941 supervisor
24543 Jps
One worker process is assigned to the second supervisor machine.
In total, two worker processes are running on the Storm cluster.
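To confirm that the cluster-wide total matches the value passed with -n, the outputs from all supervisors can be concatenated and compared against the expected count. The following sketch assumes the jps outputs have already been collected; the function name expect_workers is hypothetical, and the sample input is the two post-rebalance outputs shown above.

```shell
# Succeed only if the combined jps output on stdin lists exactly
# the expected number of worker JVMs.
# (expect_workers is a hypothetical helper, not part of Storm.)
expect_workers() {
  expected=$1
  # Count lines whose main-class column is exactly "worker".
  actual=$(awk '$2 == "worker" { n++ } END { print n + 0 }')
  if [ "$actual" -eq "$expected" ]; then
    echo "OK: $actual worker(s) running"
  else
    echo "MISMATCH: expected $expected, found $actual" >&2
    return 1
  fi
}

# Sample input: jps output of both supervisors after the rebalance.
printf '24377 worker\n23940 supervisor\n24593 Jps\n24353 worker\n23941 supervisor\n24543 Jps\n' | expect_workers 2
```

A mismatch shortly after issuing the command does not necessarily indicate a failure, because the topology stays deactivated for the message timeout before the workers are redistributed.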