In the previous chapter, we learned how to set up a Kafka cluster, how to write a Kafka producer, how to integrate Kafka with Storm, and so on.
In this chapter, we will cover the following topics:
groupBy operation
Trident is a high-level abstraction built on top of Storm. Trident supports stateful stream processing, whereas pure Storm is a stateless processing framework. The main advantage of Trident is that it guarantees that every message entering the topology is processed exactly once, which is difficult to achieve with vanilla Storm. Conceptually, Trident is similar to high-level batch processing tools built on top of Hadoop, such as Cascading and Pig. Trident processes the input stream as small batches to achieve exactly-once processing in Storm. We will cover this in greater detail in the Maintaining the topology state with Trident section of this chapter.
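To make the batching idea concrete, here is a minimal plain-Java sketch of how numbered batches can make state updates effectively exactly-once. It does not use the Trident API. The class `TransactionalCountState` and its methods are hypothetical.

The sketch assumes, as Trident's transactional spouts do, that a replayed batch keeps the same transaction id (txid) and the same tuples. Next to each count, the state stores the txid of the batch that last updated it, so a replayed batch is recognized and skipped. This is the same idea behind Trident's transactional state.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Conceptual sketch in plain Java (NOT the Trident API; all names are hypothetical).
// Each key stores its count together with the txid of the batch that last updated it,
// so a replayed batch (same txid, same tuples) is detected and not applied twice.
class TransactionalCountState {

    // Value kept per key: the running count plus the txid that last wrote it.
    static final class StoredValue {
        final long count;
        final long txid;

        StoredValue(long count, long txid) {
            this.count = count;
            this.txid = txid;
        }
    }

    private final Map<String, StoredValue> store = new HashMap<>();

    // Applies one batch of words identified by its transaction id.
    void applyBatch(long txid, List<String> words) {
        // First aggregate the batch locally: word -> occurrences in this batch.
        Map<String, Long> batchCounts = new HashMap<>();
        for (String w : words) {
            batchCounts.merge(w, 1L, Long::sum);
        }
        // Then update the stored state key by key.
        for (Map.Entry<String, Long> e : batchCounts.entrySet()) {
            StoredValue current = store.get(e.getKey());
            if (current != null && current.txid == txid) {
                continue; // this batch already updated this key: skip the replay
            }
            long base = (current == null) ? 0L : current.count;
            store.put(e.getKey(), new StoredValue(base + e.getValue(), txid));
        }
    }

    // Returns the stored count for a key, or 0 if the key was never seen.
    long count(String key) {
        StoredValue v = store.get(key);
        return (v == null) ? 0L : v.count;
    }
}
```

Consider this sequence:

1. Apply batch 1 with the words "storm", "trident", "storm".
2. Replay batch 1.
3. Apply batch 2 with "storm".

The count for "storm" ends at 3 rather than 5. Storing the txid next to each value means the replay check and the update always read the same record.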
So far, we have learned that in a vanilla Storm topology, the spout is the source of tuples, a tuple is a unit of data that a Storm application can process, and the bolt is the processing powerhouse where we write the transformation logic. In a Trident topology, however, the bolt is replaced by the higher-level semantics of functions, aggregates, filters, and states.
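The following plain-Java sketch shows how a single batch might flow through the kinds of operations that replace bolts in Trident: a filter, a function, and a groupBy followed by an aggregation. It does not use the Trident API. `WordCountBatch` and `process` are hypothetical names, and the word-count logic is an assumed example. In a real Trident topology, each step would be a separate operation on the stream, and the aggregated result would typically be persisted to a state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Conceptual sketch in plain Java (NOT the Trident API; names are hypothetical).
// One batch of sentence tuples passes through a filter, a function, and a
// groupBy + aggregate step, the kinds of operations Trident uses in place of bolts.
class WordCountBatch {

    static Map<String, Long> process(List<String> batch) {
        // Filter: keep or drop each tuple; here blank sentences are dropped.
        List<String> kept = new ArrayList<>();
        for (String sentence : batch) {
            if (!sentence.isBlank()) {
                kept.add(sentence);
            }
        }

        // Function: each input tuple may emit zero or more output tuples;
        // here a sentence is split into lower-cased words.
        List<String> words = new ArrayList<>();
        for (String sentence : kept) {
            for (String w : sentence.trim().toLowerCase().split("\\s+")) {
                words.add(w);
            }
        }

        // groupBy + aggregate: group the words by value and count each group.
        // A TreeMap keeps the keys sorted so the result prints in a stable order.
        Map<String, Long> counts = new TreeMap<>();
        for (String w : words) {
            counts.merge(w, 1L, Long::sum);
        }
        return counts;
    }
}
```

For example, passing the batch "Storm and Trident", "  ", "storm" returns {and=1, storm=2, trident=1}. The blank sentence is filtered out, the remaining sentences are split into words, and the words are grouped and counted.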