Organizations that develop and operate Big Data applications very likely already have a Hadoop cluster deployed. Many of them also run real-time stream processing applications alongside the batch applications running on Hadoop.
It would be great if we could leverage the already deployed YARN cluster to run Storm topologies as well. This reduces the operational cost of maintenance, because there is only one cluster to manage instead of two.
Storm-YARN is a project developed by Yahoo! that enables the deployment of Storm topologies over YARN clusters by running the Storm processes on nodes managed by YARN.
The following diagram illustrates how the Storm processes are deployed on YARN:
In the following section, we will see how to set up Storm-YARN.
Since Storm-YARN is still in alpha, we will proceed with the master branch of the Git repository. In Git repositories, the master branch is conventionally where the main line of development takes place; it is equivalent to the trunk in SVN repositories. Make sure you have Git installed on your system. If it is not, install it with the following command:
yum install git-core
Also make sure that you have Apache ZooKeeper and Apache Maven installed on your system. Refer to previous chapters for their setup instructions.
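Before starting, it can be useful to confirm that the command-line tools used in this section are actually on the PATH. The following is a minimal sketch; the helper name check_tools is hypothetical, and the list of tools is an assumption based on the commands that appear in the steps, so adjust it to your setup:

```shell
#!/bin/sh
# Report which of the named commands are missing from PATH.
# Prints one line per missing command and returns non-zero if any is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Assumption: these are the commands the steps in this section invoke directly.
check_tools git mvn hdfs yarn || echo "Install the tools listed above before continuing."
```

ZooKeeper is not included in the check because it is started later through the full path to its zkServer.sh script rather than through the PATH.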
The following are the steps for deploying Storm-YARN:
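Clone the Storm-YARN repository with the following commands:
cd ~/opt
git clone https://github.com/yahoo/storm-yarn.git
cd storm-yarn
Build Storm-YARN by running the following Maven command:
mvn package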
We will get the following output:
[INFO] Scanning for projects...
[INFO]
[INFO] ----------------------------------------------------
[INFO] Building storm-yarn 1.0-alpha
[INFO] ----------------------------------------------------
…
[INFO] ----------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ----------------------------------------------------
[INFO] Total time: 32.049s
[INFO] Finished at: Fri Apr 04 09:45:06 IST 2014
[INFO] Final Memory: 14M/152M
[INFO] ----------------------------------------------------
Copy the storm.zip file from storm-yarn/lib to HDFS using the following commands:
hdfs dfs -mkdir -p /lib/storm/0.9.0-wip21
hdfs dfs -put lib/storm.zip /lib/storm/0.9.0-wip21/storm.zip
The exact version might be different in your case from 0.9.0-wip21.
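Because the version string appears in several paths, one option is to keep it in a single variable and derive the paths from it. The following is only a sketch; the variable names STORM_VERSION and HDFS_STORM_DIR are hypothetical, and the commands are printed rather than executed so that the paths can be reviewed first:

```shell
#!/bin/sh
# Assumption: STORM_VERSION matches the version of the storm.zip you built.
STORM_VERSION="0.9.0-wip21"
HDFS_STORM_DIR="/lib/storm/${STORM_VERSION}"

# Print the upload commands instead of running them.
echo "hdfs dfs -mkdir -p ${HDFS_STORM_DIR}"
echo "hdfs dfs -put lib/storm.zip ${HDFS_STORM_DIR}/storm.zip"
```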
Create a local directory for the Storm distribution and extract storm.zip into it:
mkdir -p ~/storm-data
cp lib/storm.zip ~/storm-data/
cd ~/storm-data/
unzip storm.zip
Add the following configuration to the storm.yaml file located at ~/storm-data/storm-0.9.0-wip21/conf:
storm.zookeeper.servers:
     - "localhost"
nimbus.host: "localhost"
master.initial-num-supervisors: 2
master.container.size-mb: 128
If required, change the values as per your setup.
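After editing the file, a quick check that all four settings are present can catch typos early. The following is a minimal sketch; the helper name has_keys is hypothetical, and the check runs against a temporary copy of the settings shown above rather than your real storm.yaml:

```shell
#!/bin/sh
# Check that a YAML file defines each of the given top-level keys.
# Prints one line per missing key and returns non-zero if any is missing.
has_keys() {
    file=$1
    shift
    status=0
    for key in "$@"; do
        # Split each line on ':' and compare the first field with the key.
        if ! awk -v k="$key" -F: '$1 == k { found = 1 } END { exit !found }' "$file"; then
            echo "missing key: $key"
            status=1
        fi
    done
    return "$status"
}

# Demonstration file holding the settings from this section.
demo_conf=$(mktemp)
cat > "$demo_conf" <<'EOF'
storm.zookeeper.servers:
     - "localhost"
nimbus.host: "localhost"
master.initial-num-supervisors: 2
master.container.size-mb: 128
EOF

has_keys "$demo_conf" storm.zookeeper.servers nimbus.host \
    master.initial-num-supervisors master.container.size-mb \
    && echo "all expected keys are present"
```

To check the real file, pass its path as the first argument instead of the temporary one. The check only looks for the key names; it does not validate the values.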
Add the storm-0.9.0-wip21/bin and storm-yarn/bin folders to your path by adding the following line to the ~/.bashrc file:
export PATH=$PATH:/home/anand/storm-data/storm-0.9.0-wip21/bin:/home/anand/opt/storm-yarn/bin
Reload the ~/.bashrc file with the following command:
source ~/.bashrc
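A plain export line appends the same directories again every time ~/.bashrc is sourced. If that matters in your environment, a guarded variant can be used instead. The following is a sketch; the helper name path_append is hypothetical, and the directories assume the layout used in this section under your own home directory:

```shell
#!/bin/sh
# Append a directory to PATH only if it is not already present.
path_append() {
    case ":${PATH}:" in
        *":$1:"*) ;;                  # already on PATH, nothing to do
        *) PATH="${PATH}:$1" ;;
    esac
}

# Assumption: these locations mirror the layout used in this section.
path_append "$HOME/storm-data/storm-0.9.0-wip21/bin"
path_append "$HOME/opt/storm-yarn/bin"
export PATH
```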
Start ZooKeeper with the following command:
~/opt/zookeeper-3.4.5/bin/zkServer.sh start
Launch Storm-YARN with the following command:
storm-yarn launch ~/storm-data/storm-0.9.0-wip21/conf/storm.yaml
We will get the following output:
14/04/15 10:14:49 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
14/04/15 10:14:49 INFO yarn.StormOnYarn: Copy App Master jar from local filesystem and add to local environment
…
…
14/04/15 10:14:51 INFO impl.YarnClientImpl: Submitted application application_1397537047058_0001 to ResourceManager at /0.0.0.0:8032
application_1397537047058_0001
The Storm-YARN application has been submitted with the application_1397537047058_0001 application ID.
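The application ID is needed again in later steps, so it can be convenient to capture it in a variable instead of copying it by hand. The following is a minimal sketch that extracts the ID from a line of the launch output shown above; the helper name extract_app_id is hypothetical:

```shell
#!/bin/sh
# Pull the first YARN application ID out of text read from standard input.
extract_app_id() {
    grep -o 'application_[0-9][0-9]*_[0-9][0-9]*' | head -n 1
}

# Sample line taken from the launch output in this section.
sample='14/04/15 10:14:51 INFO impl.YarnClientImpl: Submitted application application_1397537047058_0001 to ResourceManager at /0.0.0.0:8032'
APP_ID=$(printf '%s\n' "$sample" | extract_app_id)
echo "$APP_ID"
```

On a real cluster, the same function could read the output of the launch command through a pipe. Whether the log lines arrive on standard output or standard error is an assumption to verify on your installation.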
We can retrieve the status of our application using the following yarn command:
yarn application -list
We will get the status of our application as follows:
14/04/15 10:23:13 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
Total number of applications (application-types: [] and states: [SUBMITTED, ACCEPTED, RUNNING]):1
Application-Id                    Application-Name    Application-Type    User     Queue      State      Final-State    Progress    Tracking-URL
application_1397537047058_0001    Storm-on-Yarn       YARN                anand    default    RUNNING    UNDEFINED      50%         N/A
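When scripting the deployment, the State column can be read from this listing with awk. The following is a sketch; the helper name app_state is hypothetical, the sample row is modelled on the listing above, and the approach assumes that the application name contains no spaces, so that State is the sixth whitespace-separated field:

```shell
#!/bin/sh
# Print the State column (sixth field) of the row whose first field is the
# given application ID, reading the listing from standard input.
app_state() {
    awk -v id="$1" '$1 == id { print $6 }'
}

# Sample row modelled on the listing in this section.
sample_row='application_1397537047058_0001 Storm-on-Yarn YARN anand default RUNNING UNDEFINED 50% N/A'
STATE=$(printf '%s\n' "$sample_row" | app_state application_1397537047058_0001)
echo "$STATE"
```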
We can also see Storm-YARN running in the ResourceManager web UI at http://localhost:8088/cluster/. You should be able to see something similar to the following screenshot:
You can explore the various metrics exposed by clicking on various links on the UI.
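The Storm UI can be accessed at http://localhost:7070/. You should be able to see something similar to the following screenshot:
Download the Storm configuration of the running application with the following commands:
mkdir ~/.storm
storm-yarn getStormConfig --appId application_1397537047058_0001 --output ~/.storm/storm.yaml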
We will get the following output:
14/04/15 10:32:01 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
14/04/15 10:32:02 INFO yarn.StormOnYarn: application report for application_1397537047058_0001 :localhost.localdomain:9000
14/04/15 10:32:02 INFO yarn.StormOnYarn: Attaching to localhost.localdomain:9000 to talk to app master application_1397537047058_0001
14/04/15 10:32:02 INFO yarn.StormMasterCommand: storm.yaml downloaded into /home/anand/.storm/storm.yaml
Make sure that you pass the correct application ID to the --appId parameter.
Now that we have successfully deployed Storm-YARN, we will see how to run our topologies on this Storm cluster.