Chapter 5. Exploring High-level Abstraction in Storm with Trident

In the previous chapter, we learned how to set up a Kafka cluster, how to write a Kafka producer, and how to integrate Kafka with Storm.

In this chapter, we will cover the following topics:

  • Introducing Trident
  • Trident's data model
  • Trident functions, filters, and projections
  • Trident repartitioning operations
  • Trident aggregators
  • Trident's groupBy operation
  • A non-transactional topology
  • A sample Trident topology
  • Trident's state
  • Distributed RPC
  • When to use Trident

Introducing Trident

Trident is a high-level abstraction built on top of Storm. While plain Storm is a stateless processing framework, Trident supports stateful stream processing. The main advantage of using Trident is that it guarantees that every message entering the topology is processed exactly once, which is difficult to achieve with vanilla Storm. Conceptually, Trident is similar to high-level batch processing tools, such as Cascading and Pig, that were built on top of Hadoop. To achieve exactly-once processing semantics in Storm, Trident processes the input stream as a series of small batches. We will cover this in greater detail in the Maintaining the topology state with Trident section of this chapter.

So far, we have learned that in a vanilla Storm topology, the spout is the source of tuples, a tuple is the unit of data that a Storm application processes, and the bolt is the processing powerhouse where we write the transformation logic. In a Trident topology, however, bolts are replaced with higher-level semantics: functions, aggregates, filters, and states.
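To make this concrete, the following is a minimal sketch of a Trident word-count topology. It is not a definitive implementation: it assumes the `org.apache.storm.trident` package layout used by recent Storm releases (older releases used the `storm.trident` prefix), and the stream name `"sentences"` and field names `"sentence"`, `"word"`, and `"count"` are illustrative choices, not part of the API. Note how a `BaseFunction` subclass and the built-in `Count` aggregator take the place of the bolts we would write in a vanilla Storm topology.

```java
import org.apache.storm.trident.TridentTopology;
import org.apache.storm.trident.operation.BaseFunction;
import org.apache.storm.trident.operation.TridentCollector;
import org.apache.storm.trident.operation.builtin.Count;
import org.apache.storm.trident.spout.IBatchSpout;
import org.apache.storm.trident.testing.MemoryMapState;
import org.apache.storm.trident.tuple.TridentTuple;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;

public class TridentWordCountSketch {

    // A function plays the role of bolt logic: for each input tuple,
    // it may emit zero or more output tuples.
    public static class Split extends BaseFunction {
        @Override
        public void execute(TridentTuple tuple, TridentCollector collector) {
            for (String word : tuple.getString(0).split(" ")) {
                collector.emit(new Values(word));
            }
        }
    }

    public static TridentTopology buildTopology(IBatchSpout spout) {
        TridentTopology topology = new TridentTopology();
        topology.newStream("sentences", spout)
                // each() applies the function to every tuple, appending the new field
                .each(new Fields("sentence"), new Split(), new Fields("word"))
                // groupBy() partitions the stream by word...
                .groupBy(new Fields("word"))
                // ...and persistentAggregate() maintains the running counts as
                // Trident state, which is what gives exactly-once semantics
                .persistentAggregate(new MemoryMapState.Factory(),
                        new Count(), new Fields("count"));
        return topology;
    }
}
```

The chain of `each`, `groupBy`, and `persistentAggregate` calls replaces what would otherwise be two or three separate bolt classes plus hand-written state handling; each of these operations is covered in its own section later in this chapter.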
