Learning Storm
by Anand Nalya, Ankit Jain
Table of Contents
Credits
About the Authors
About the Reviewers
www.PacktPub.com
Support files, eBooks, discount offers, and more
Why subscribe?
Free access for Packt account holders
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Errata
Piracy
Questions
1. Setting Up Storm on a Single Machine
Features of Storm
Storm components
Nimbus
Supervisor nodes
The ZooKeeper cluster
The Storm data model
Definition of a Storm topology
Operation modes
Setting up your development environment
Installing Java SDK 6
Installing Maven
Installing Git – distributed version control
Installing the STS IDE
Developing a sample topology
Setting up ZooKeeper
Setting up Storm on a single development machine
Deploying the sample topology on a single-node cluster
Summary
2. Setting Up a Storm Cluster
Setting up a distributed Storm cluster
Deploying a topology on a remote Storm cluster
Deploying the sample topology on the remote cluster
Configuring the parallelism of a topology
The worker process
The executor
Tasks
Configuring parallelism at the code level
Distributing worker processes, executors, and tasks in the sample topology
Rebalancing the parallelism of a topology
Rebalancing the parallelism of the sample topology
Stream grouping
Shuffle grouping
Fields grouping
All grouping
Global grouping
Direct grouping
Local or shuffle grouping
Custom grouping
Guaranteed message processing
Summary
3. Monitoring the Storm Cluster
Starting to use the Storm UI
Monitoring a topology using the Storm UI
Cluster statistics using the Nimbus thrift client
Fetching information with the Nimbus thrift client
Summary
4. Storm and Kafka Integration
The Kafka architecture
The producer
Replication
Consumers
Brokers
Data retention
Setting up Kafka
Setting up a single-node Kafka cluster
Setting up a three-node Kafka cluster
Running multiple Kafka brokers on a single node
A sample Kafka producer
Integrating Kafka with Storm
Summary
5. Exploring High-level Abstraction in Storm with Trident
Introducing Trident
Understanding Trident's data model
Writing Trident functions, filters, and projections
Trident functions
Trident filters
Trident projections
Trident repartitioning operations
The shuffle operation
The partitionBy operation
The global operation
The broadcast operation
The batchGlobal operation
The partition operation
Trident aggregators
The partition aggregate
The aggregate
The ReducerAggregator interface
The Aggregator interface
The CombinerAggregator interface
The persistent aggregate
Aggregator chaining
Utilizing the groupBy operation
A non-transactional topology
A sample Trident topology
Maintaining the topology state with Trident
A transactional topology
The opaque transactional topology
Distributed RPC
When to use Trident
Summary
6. Integration of Storm with Batch Processing Tools
Exploring Apache Hadoop
Understanding HDFS
Understanding YARN
Installing Apache Hadoop
Setting up password-less SSH
Getting the Hadoop bundle and setting up environment variables
Setting up HDFS
Setting up YARN
Integration of Storm with Hadoop
Setting up Storm-YARN
Deploying Storm-Starter topologies on Storm-YARN
Summary
7. Integrating Storm with JMX, Ganglia, HBase, and Redis
Monitoring the Storm cluster using JMX
Monitoring the Storm cluster using Ganglia
Integrating Storm with HBase
Integrating Storm with Redis
Summary
8. Log Processing with Storm
Server log-processing elements
Producing the Apache log in Kafka
Splitting the server log line
Identifying the country, the operating system type, and the browser type from the logfile
Extracting the searched keyword
Persisting the process data
Defining a topology and the Kafka spout
Deploying a topology
MySQL queries
Calculating the page hits from each country
Calculating the count for each browser
Calculating the count for each operating system
Summary
9. Machine Learning
Exploring machine learning
Using Trident-ML
The use case – clustering synthetic control data
Producing a training dataset into Kafka
Building a Trident topology to build the clustering model
Summary
Index