Data stream life cycle

Apache Kafka is a technology that enables us to handle data streams. Before getting into the workings of Kafka, let's look at the life cycle events that occur when a data stream flows. Rassul Fazelat, in one of his LinkedIn blog posts, has explained this in detail, as shown pictorially in the following figure.

Figure 04: Life cycle of Data Stream

As shown in the preceding figure, the life cycle of a data stream has three components, each with a definite job to do:

  • Create: The most important component. It produces the data streams from a variety of internal business applications, external partners, and other applications. Examples include server logs from the servers where business applications are hosted, behavioural data collected from various business applications in the form of click streams and page views, social data coming from various social sites, readings emitted by various (IoT) sensors, and so on.
  • Collect: The component that collects this data and makes it available for processing. This capability is provided by the technology we dive deep into in this chapter, Apache Kafka (a minimal producer sketch follows this list). Other options that give you this capability do exist, such as ActiveMQ, HornetQ, and so on.
  • Process: The component that processes the data stream and derives meaningful data streams for various analyses. In our Data Lake architecture, we have this capability requirement, and we also have a technology in mind, which will be delved into deeply in the following chapters. Some example technologies in this space are Apache Spark, Apache Flink, and so on.
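
To make the Create and Collect steps a little more concrete, here is a minimal sketch of a Kafka producer written in Java, assuming a broker running at localhost:9092 and a hypothetical topic named clickstream (both are illustrative choices, not part of the original text). A business application creates an event and publishes it to the topic; a Process component, such as Apache Spark or Apache Flink, would then consume it from the same topic.

    // A minimal sketch of the Collect step, assuming a local Kafka broker at
    // localhost:9092 and a hypothetical topic named "clickstream".
    import java.util.Properties;

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.Producer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class ClickstreamProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            // Address of the Kafka broker the producer connects to
            props.put("bootstrap.servers", "localhost:9092");
            // Serializers for the record key and value (plain strings here)
            props.put("key.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");

            try (Producer<String, String> producer = new KafkaProducer<>(props)) {
                // "Create": an event produced by a business application,
                // here a simple page-view record keyed by a user id
                String key = "user-42";
                String value = "{\"page\": \"/home\", \"action\": \"view\"}";

                // "Collect": publish the event to the Kafka topic, where it is
                // retained until a downstream "Process" component consumes it
                producer.send(new ProducerRecord<>("clickstream", key, value));
            }
        }
    }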