Apache Storm is a distributed real-time computation framework that can process huge volumes of data with low latency. Storm was recently accepted by the Apache Software Foundation as an incubating project, and its development now continues as Apache Storm. You can read more about Apache Storm's features at http://storm.incubator.apache.org/.
Apache Storm can be used to process massive streams of data in a distributed manner, which makes it well suited to time-sensitive analytics. With Apache Solr and Storm together, organizations can process big data in real time. For example, an industrial plant whose systems continuously emit raw data can process that stream to support real-time analytics, such as identifying the most problematic systems or looking for recent errors and failures. Apache Solr and Storm can work together to perform this kind of processing on big data in real time.
Apache Storm runs in a cluster mode in which multiple nodes participate in the computation in real time. It supports two types of nodes: a master node (which runs the Nimbus daemon) and worker nodes (each of which runs a supervisor daemon). As the names suggest, Nimbus is responsible for distributing code around the cluster, assigning tasks to machines, and monitoring for failures, whereas each supervisor listens for work assigned to its machine and starts and stops worker processes as necessary, based on what Nimbus has assigned to it. Apache Storm uses ZooKeeper to perform all the coordination between Nimbus and the supervisors. The data in Apache Storm is read as a stream, which is simply an unbounded sequence of tuples of name-value pairs, such as:
{id: 1748, author_name: "hrishi", full_name: "Hrishikesh Karambelkar"}
Apache Storm uses the concepts of spouts and bolts. All work is executed as part of a Storm topology. The following screenshot shows a Storm topology for a word count example:
Spouts are the data inputs; this is where data arrives in the Storm cluster. Bolts process the streams that are piped into them; they can be fed data from spouts or from other bolts. Bolts can form a chain of processing, with each bolt performing a unit task. This concept is similar to MapReduce, which we will discuss in the following chapters.
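To make the spout-bolt chain concrete, here is a minimal single-process sketch of the word count dataflow. This is plain Java, not actual Storm code: the class and method names are illustrative stand-ins for a sentence spout, a splitting bolt, and a counting bolt.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch of a Storm-style pipeline: a "spout" emits
// sentences, a "split" bolt tokenizes them into words, and a "count"
// bolt keeps running totals. Names are hypothetical, not Storm APIs.
public class WordCountSketch {

    // Spout: the data input; here it just replays a fixed list of sentences.
    static List<String> sentenceSpout() {
        return Arrays.asList("the cow jumped over the moon",
                             "an apple a day keeps the doctor away");
    }

    // Bolt 1: split each sentence tuple into individual word tuples.
    static List<String> splitBolt(List<String> sentences) {
        List<String> words = new ArrayList<>();
        for (String sentence : sentences) {
            words.addAll(Arrays.asList(sentence.split("\\s+")));
        }
        return words;
    }

    // Bolt 2: aggregate the word tuples into running counts.
    static Map<String, Integer> countBolt(List<String> words) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String word : words) {
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countBolt(splitBolt(sentenceSpout()));
        System.out.println(counts);
    }
}
```

In a real topology, each stage would run as many parallel tasks across the cluster, with Storm routing tuples between them.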
Let's install Apache Storm and try out a simple word count example:
First, start Apache ZooKeeper. Copy zoo.cfg from the book's codebase (or rename zoo_sample.cfg to zoo.cfg) into ZooKeeper's conf directory, and then start the server:
$ bin/zkServer.sh start
Next, download and extract Apache Storm, and go to the $STORM_HOME/conf folder. Edit storm.yaml and put in the correct Nimbus host. You can use the configuration file provided along with the book. If you are running it in a cluster environment, nimbus.host needs to point to the correct master. In this configuration, you may also provide multiple ZooKeeper servers for failover. Then, set the JAVA_HOME and STORM_HOME environment variables:
$ export STORM_HOME=/home/hrishi/storm
$ export JAVA_HOME=/usr/share/jdk
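For reference, a minimal storm.yaml for a single-node setup might look like the following. The key names are standard for this generation of Storm, but the host names and the local directory are placeholders you should adjust for your environment:

```yaml
# Placeholder values for a single-node setup; replace with your hosts.
storm.zookeeper.servers:
  - "localhost"
nimbus.host: "localhost"
storm.local.dir: "/var/storm"
```

In a cluster, list each ZooKeeper server under storm.zookeeper.servers and point nimbus.host at the master node.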
Now start the Nimbus, supervisor, and UI daemons:
$ $STORM_HOME/bin/storm nimbus
$ $STORM_HOME/bin/storm supervisor
$ $STORM_HOME/bin/storm ui
Access http://localhost:8080 from your browser. A screen similar to the following screenshot should be visible now. Finally, run the word count example by submitting the storm-starter jar to the cluster:
$ bin/storm jar storm-starter-0.0.1-SNAPSHOT-jar-with-dependencies.jar storm.starter.WordCountTopology WordCount -c nimbus.host=<host>
In the word count example, you will find different classes mapped to different roles, as shown in the following code snippet: