Understanding micro batching

Micro batching is a technique in which an incoming stream of messages is processed by dividing it into groups of small batches. This provides the performance benefits of batch processing while, at the same time, keeping the processing latency of each individual message low.

Here, the incoming stream of messages is treated as a flow of small batches of messages, and each batch is passed to a processing engine, which outputs a processed stream of batches.

In Spark Streaming, a micro batch is created based on time rather than size; that is, events received within a certain time interval, usually measured in milliseconds, are grouped together into a batch. This ensures that no message has to wait long before being processed and helps keep the processing latency under control.
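
To make this concrete, the following is a minimal sketch of a Spark Streaming job with a one-second batch interval. The socket source on localhost:9999 and the word-count logic are illustrative assumptions, not taken from the original text; the point is only that every record arriving within a given one-second window is processed together as one micro batch.

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object MicroBatchExample {
  def main(args: Array[String]): Unit = {
    // The batch interval drives micro batching: all events that arrive
    // within each 1-second window are grouped into one micro batch.
    val conf = new SparkConf().setAppName("MicroBatchExample").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(1))

    // Illustrative source: a text stream read from a socket.
    val lines = ssc.socketTextStream("localhost", 9999)

    // Each micro batch is processed as a small RDD of lines.
    val wordCounts = lines.flatMap(_.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    // Print the result of every processed micro batch.
    wordCounts.print()

    ssc.start()
    ssc.awaitTermination()
  }
}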

Another advantage of micro batching is that it helps keep the volume of control messages low. For example, if a system requires the processing engine to send an acknowledgement for every message processed, then with micro batching only one acknowledgement needs to be sent per batch of messages rather than per message.
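
Continuing the sketch above, a per-batch acknowledgement might look like the following; sendAck is a hypothetical stand-in for whatever acknowledgement mechanism the system uses and is not part of the Spark Streaming API.

// Hypothetical helper: one acknowledgement per micro batch, not per record.
def sendAck(time: org.apache.spark.streaming.Time, count: Long): Unit =
  println(s"ack: batch $time processed $count records")

// foreachRDD is invoked once per micro batch, so the acknowledgement
// is sent once for the whole batch rather than once per message.
lines.foreachRDD { (rdd, batchTime) =>
  val processed = rdd.count()   // process the batch; here we only count records
  sendAck(batchTime, processed)
}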

Micro batching comes with a disadvantage as well. In failure scenarios, the whole batch needs to be replayed even if only one message within the batch has failed.
