A brief introduction to streaming

In today's world of interconnected devices and services, it is hard to spend even a few hours without reaching for a smartphone to check Facebook, order an Uber ride, tweet about the burger you just bought, or catch the latest news or sports updates on your favorite team. We depend on our phones and the Internet for a great deal, whether it is to get work done, browse, or e-mail a friend. There is simply no way around this phenomenon, and the number and variety of applications and services will only grow over time.

As a result, smart devices are everywhere, and they generate a lot of data all the time. This phenomenon, broadly referred to as the Internet of Things, has changed the dynamics of data processing forever. Whenever you use any of the services or apps on your iPhone, Droid, or Windows phone, real-time data processing is at work in some shape or form. Since so much depends on the quality and value of these apps, there is a lot of emphasis on how startups and established companies alike tackle the complex challenges of Service Level Agreements (SLAs), usefulness, and the timeliness of the data.

One of the paradigms being researched and adopted by organizations and service providers is to build highly scalable, near real-time or real-time processing frameworks on cutting-edge platforms and infrastructure. Everything must be fast, and reactive to changes and failures. You would not like it if your Facebook feed updated only once every hour, or if you received e-mail only once a day; so it is imperative that data flow, processing, and usage are all as close to real time as possible. Many of the systems we are interested in monitoring or implementing generate a lot of data as an indefinite, continuous stream of events.

As in any other data processing system, we face the same fundamental challenges of data collection, storage, and processing. The additional complexity, however, comes from the real-time needs of the platform. In order to collect such indefinite streams of events and then process them to generate actionable insights, we need highly scalable, specialized architectures that can deal with tremendous event rates. Many such systems have been built over the decades, starting from AMQ, RabbitMQ, Storm, Kafka, Spark, Flink, Gearpump, Apex, and so on.

Modern systems built to deal with such large amounts of streaming data come with very flexible and scalable technologies that are not only efficient but also help realize business goals far better than before. Using such technologies, it is possible to consume data from a variety of data sources and then use it in a variety of use cases, either almost immediately or at a later time, as needed.

Let us talk about what happens when you take out your smartphone and book an Uber ride to the airport. With a few touches on the smartphone screen, you're able to select a pickup point, choose a credit card, make the payment, and book the ride. Once you're done with the transaction, you get to monitor the progress of your car in real time on a map on your phone. As the car makes its way toward you, you can see exactly where it is, and you can even decide to pick up a coffee at the local Starbucks while you wait for the car.

You can also make informed decisions about the trip to the airport by looking at the car's expected time of arrival. If it looks like the car is going to take quite a while to pick you up, and this poses a risk to the flight you are about to catch, you can cancel the ride and hop into a taxi that happens to be nearby. Alternatively, if the traffic situation is not going to let you reach the airport on time, again putting your flight at risk, you can decide to reschedule or cancel the flight.

Now, in order to understand how such real-time streaming architectures work to provide such invaluable information, we need to understand the basic tenets of streaming architectures. On the one hand, it is very important for a real-time streaming architecture to consume extreme amounts of data at very high rates; on the other hand, it must also provide reasonable guarantees that the data being ingested is actually processed.

The following diagram shows a generic stream processing system, with a producer putting events into a messaging system while a consumer reads from that messaging system:
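The same producer/consumer relationship can also be sketched in code. The following is a minimal, self-contained illustration, assuming a bounded in-memory queue as a stand-in for the messaging system (a real deployment would use a broker such as Kafka or RabbitMQ):

    import queue
    import threading

    # A bounded in-memory queue stands in for the messaging system
    # (in practice this would be a broker such as Kafka or RabbitMQ).
    events = queue.Queue(maxsize=1000)

    def producer(n_events):
        """Puts events into the messaging system."""
        for i in range(n_events):
            events.put({"event_id": i, "payload": "reading-%d" % i})
        events.put(None)  # sentinel: signals the end of the stream

    def consumer():
        """Reads events from the messaging system and processes them."""
        while True:
            event = events.get()
            if event is None:
                break
            print("processed event", event["event_id"], event["payload"])

    p = threading.Thread(target=producer, args=(10,))
    c = threading.Thread(target=consumer)
    p.start(); c.start()
    p.join(); c.join()

The important point is the decoupling: the producer only knows how to put events into the messaging system, and the consumer only knows how to read them back, which is what allows each side to scale and fail independently.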

Processing of real-time streaming data can be categorized into the following three essential paradigms:

  • At least once processing
  • At most once processing
  • Exactly once processing

Let's look at what these three stream processing paradigms mean for our business use cases.
While exactly once processing of real-time events is the ultimate nirvana, it is very difficult to always achieve this goal across different scenarios. We have to compromise on the guarantee of exactly once processing in cases where its benefit is outweighed by the complexity of the implementation.
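To see how these guarantees differ in practice, consider a consumer that reads events from a broker and acknowledges (commits) how far it has read. The sketch below is illustrative only: the broker object and its fetch() and commit(offset) operations are hypothetical, though real message brokers expose similar primitives. Whether the consumer commits before or after processing determines whether events can be lost (at most once) or duplicated (at least once); exactly once is typically approximated by combining at-least-once delivery with deduplication or transactional writes.

    # The broker object and its fetch()/commit() operations used below
    # are hypothetical; real brokers expose similar fetch-and-acknowledge
    # primitives.

    def consume_at_most_once(broker, process):
        """Commit before processing: a crash after the commit but before
        processing completes loses the event, but it is never processed twice."""
        offset, event = broker.fetch()
        broker.commit(offset)   # acknowledge first
        process(event)          # may never complete if we crash here

    def consume_at_least_once(broker, process):
        """Commit after processing: a crash after processing but before the
        commit replays the event on restart, so duplicates are possible
        but nothing is lost."""
        offset, event = broker.fetch()
        process(event)          # may run more than once after a restart
        broker.commit(offset)   # acknowledge only after processing succeeded

    def consume_effectively_once(broker, process, seen_ids):
        """At-least-once delivery plus idempotent (deduplicated) processing
        approximates exactly once results downstream."""
        offset, event = broker.fetch()
        if event["event_id"] not in seen_ids:   # skip replayed duplicates
            process(event)
            seen_ids.add(event["event_id"])
        broker.commit(offset)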
