Integration of Storm with Hadoop

Organizations that develop and operate Big Data applications are very likely to have a Hadoop cluster deployed already. There is also a good chance that they run real-time stream processing applications alongside the batch applications on Hadoop.

It would be useful to run Storm topologies on the YARN cluster that is already deployed. This would lower maintenance costs, because we would have one cluster to manage instead of two.

Storm-YARN is a project developed by Yahoo! that enables the deployment of Storm topologies on YARN clusters by running the Storm processes on nodes managed by YARN.

The following diagram illustrates how the Storm processes are deployed on YARN:

Storm processes on YARN

In the following section, we will see how to set up Storm-YARN.

Setting up Storm-YARN

Since Storm-YARN is still in alpha, we will work with the master branch of its Git repository. By convention, the master branch is where ongoing development takes place; it is the Git counterpart of trunk in an SVN repository. Make sure you have Git installed on your system. If it is not installed, install it with the following command (on yum-based distributions such as CentOS or Red Hat Enterprise Linux):

yum install git-core

Also make sure that you have Apache ZooKeeper and Apache Maven installed on your system. Refer to previous chapters for their setup instructions.
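Before you start the steps, you may want to confirm that the command-line tools they depend on are available. The following is a minimal shell sketch. check_prereqs is a hypothetical helper written for this illustration and is not part of Storm-YARN. The tool list in the example (git, java, mvn, hdfs, yarn) is an assumption based on the commands used in this section.

```shell
#!/usr/bin/env bash
# check_prereqs CMD...: hypothetical helper (not part of Storm-YARN).
# Reports every command that is not on PATH and returns 1 if any is missing.
check_prereqs() {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example (tool list is an assumption based on the commands in this section):
#   check_prereqs git java mvn hdfs yarn || echo "install the missing tools first"
```

The helper checks every tool instead of stopping at the first failure, so a single run lists everything that still needs to be installed.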

The following are the steps for deploying Storm-YARN:

  1. Clone the Storm-YARN repository with the following commands:
    cd ~/opt
    git clone https://github.com/yahoo/storm-yarn.git
    cd storm-yarn
    
  2. Build Storm-YARN by running the following Maven command:
    mvn package
    

    We will get the following output:

    [INFO] Scanning for projects...
    [INFO]                                                                         
    [INFO] ----------------------------------------------------
    [INFO] Building storm-yarn 1.0-alpha
    [INFO] ----------------------------------------------------
    
    [INFO] ----------------------------------------------------
    [INFO] BUILD SUCCESS
    [INFO] ----------------------------------------------------
    [INFO] Total time: 32.049s
    [INFO] Finished at: Fri Apr 04 09:45:06 IST 2014
    [INFO] Final Memory: 14M/152M
    [INFO] ----------------------------------------------------
    
  3. Copy the storm.zip file from storm-yarn/lib to HDFS using the following commands:
    hdfs dfs -mkdir -p /lib/storm/0.9.0-wip21
    hdfs dfs -put lib/storm.zip /lib/storm/0.9.0-wip21/storm.zip
    

    The exact version in your case may differ from 0.9.0-wip21; adjust the paths in this and the following steps accordingly.

  4. Create a directory to hold the Storm configuration, and extract storm.zip into it:
    mkdir -p ~/storm-data
    cp lib/storm.zip ~/storm-data/
    cd ~/storm-data/
    unzip storm.zip
    
  5. Add the following configuration to the storm.yaml file located at ~/storm-data/storm-0.9.0-wip21/conf:
    storm.zookeeper.servers:
         - "localhost"
    
    nimbus.host: "localhost"
    
    master.initial-num-supervisors: 2
    master.container.size-mb: 128

    Change these values as required for your setup.

  6. Add the Storm bin folder and the storm-yarn/bin folder to your PATH by adding the following line to the ~/.bashrc file:
    export PATH=$PATH:$HOME/storm-data/storm-0.9.0-wip21/bin:$HOME/opt/storm-yarn/bin
  7. Reload ~/.bashrc in your current shell with the following command:
    source ~/.bashrc
    
  8. Make sure ZooKeeper is running on your system. If not, start ZooKeeper by running the following command:
    ~/opt/zookeeper-3.4.5/bin/zkServer.sh start
    
  9. Launch Storm-YARN using the following command:
    storm-yarn launch ~/storm-data/storm-0.9.0-wip21/conf/storm.yaml
    

    We will get the following output:

    14/04/15 10:14:49 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
    14/04/15 10:14:49 INFO yarn.StormOnYarn: Copy App Master jar from local filesystem and add to local environment
    … …
    14/04/15 10:14:51 INFO impl.YarnClientImpl: Submitted application application_1397537047058_0001 to ResourceManager at /0.0.0.0:8032
    application_1397537047058_0001
    

    The Storm-YARN application has been submitted with the application ID application_1397537047058_0001.

  10. We can retrieve the status of our application using the following yarn command:
    yarn application -list
    

    We will get the status of our application as follows:

    14/04/15 10:23:13 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
    Total number of applications (application-types: [] and states: [SUBMITTED, ACCEPTED, RUNNING]):1
                    Application-Id    Application-Name    Application-Type      User     Queue             State       Final-State       Progress                       Tracking-URL
    application_1397537047058_0001       Storm-on-Yarn                YARN     anand   default           RUNNING         UNDEFINED            50%                                N/A
    
  11. We can also see Storm-YARN running on the ResourceManager Web UI at http://localhost:8088/cluster/. You should be able to see something similar to the following screenshot:
    Storm-YARN on the ResourceManager Web UI

    You can explore the metrics it exposes by following the links in the UI.

  12. Nimbus should also be running now, and you should be able to see it through the Nimbus Web UI at http://localhost:7070/. You should be able to see something similar to the following screenshot:
    The Nimbus Web UI running on YARN

  13. Next, we need to fetch the Storm configuration to use when deploying topologies to this Storm cluster running on YARN. To do so, execute the following commands:
    mkdir -p ~/.storm
    storm-yarn getStormConfig --appId application_1397537047058_0001 --output ~/.storm/storm.yaml
    

    We will get the following output:

    14/04/15 10:32:01 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
    14/04/15 10:32:02 INFO yarn.StormOnYarn: application report for application_1397537047058_0001 :localhost.localdomain:9000
    14/04/15 10:32:02 INFO yarn.StormOnYarn: Attaching to localhost.localdomain:9000 to talk to app master application_1397537047058_0001
    14/04/15 10:32:02 INFO yarn.StormMasterCommand: storm.yaml downloaded into /home/anand/.storm/storm.yaml
    
    Make sure that you pass the application ID retrieved in step 9 to the --appId parameter.
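You can also script steps 9 to 13, so the application ID is captured automatically instead of being copied by hand. The following is a minimal bash sketch under stated assumptions. The function names are hypothetical and are not part of Storm-YARN. It assumes the launch output contains the ID in the application_<digits>_<digits> form shown in step 9. It also assumes that State is the sixth whitespace-separated column of the yarn application -list output, as in step 10. An application name containing spaces would break this assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for scripting steps 9-13; none of these names are part of Storm-YARN.

# extract_app_id: read launch output on stdin and print the last YARN application ID found.
extract_app_id() {
  grep -oE 'application_[0-9]+_[0-9]+' | tail -n 1
}

# is_valid_app_id ID: succeed only if ID has the form application_<digits>_<digits>.
is_valid_app_id() {
  printf '%s\n' "$1" | grep -qE '^application_[0-9]+_[0-9]+$'
}

# app_state ID: read `yarn application -list` output on stdin and print the State
# column for ID. Assumption: State is the 6th whitespace-separated field.
app_state() {
  awk -v id="$1" '$1 == id { print $6 }'
}

# launch_storm_on_yarn STORM_YAML: launch Storm-YARN, wait until the application is
# RUNNING, download the client configuration, and print the application ID.
launch_storm_on_yarn() {
  local storm_yaml=$1 app_id state tries=0
  # Merge stderr into stdout so the ID is found whichever stream it is printed on.
  app_id=$(storm-yarn launch "$storm_yaml" 2>&1 | extract_app_id)
  if ! is_valid_app_id "$app_id"; then
    echo "no application ID found in the launch output" >&2
    return 1
  fi
  # Poll every 5 seconds, giving up after 60 attempts (about 5 minutes).
  while :; do
    state=$(yarn application -list 2>/dev/null | app_state "$app_id")
    [ "$state" = "RUNNING" ] && break
    tries=$((tries + 1))
    if [ "$tries" -ge 60 ]; then
      echo "$app_id did not reach RUNNING (last state: ${state:-unknown})" >&2
      return 1
    fi
    sleep 5
  done
  mkdir -p ~/.storm
  # Send the command's own output to stderr so stdout carries only the ID.
  storm-yarn getStormConfig --appId "$app_id" --output ~/.storm/storm.yaml >&2 || return 1
  echo "$app_id"
}

# Example:
#   APP_ID=$(launch_storm_on_yarn ~/storm-data/storm-0.9.0-wip21/conf/storm.yaml)
```

Only the application ID is printed on stdout, so the caller can capture it in a variable, as in the commented example. The ID is then available for later commands instead of being retyped.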

Now that we have successfully deployed Storm-YARN, we will see how to run our topologies on this Storm cluster.
