Producer and consumer reliability

In distributed systems, components fail. It's common practice to design your code to handle these failures seamlessly, that is, to be fault-tolerant.

One of the ways Kafka tolerates failure is by replicating messages. Each topic partition is stored as several replicas; Kafka automatically elects one replica as the leader, and the other (follower) replicas simply copy the leader. The leader also maintains a list of the replicas that are in sync with it, so that a sufficiently up-to-date copy is always available to take over when a failure occurs.

The producer sends messages to a topic (hosted on the Kafka brokers in a Kafka cluster), and durability can be configured using the producer configuration request.required.acks (named acks in the newer Java producer client), which takes the following values:

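  • 0: the producer does not wait for any acknowledgement; the message is considered sent as soon as it is written to the network buffer
  • 1: the producer gets an acknowledgement once the partition leader has written the message to its log
  • all: the producer gets an acknowledgement only when all in-sync replicas (ISRs) have received the message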
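The difference between the three settings can be sketched with a small toy model. This is not Kafka client code: the function send_is_acknowledged and its parameters are hypothetical names introduced only for illustration, and the model ignores retries, timeouts, and broker-side settings.

```python
def send_is_acknowledged(acks, leader_wrote, isr_followers_wrote):
    """Toy model: would the producer treat this send as successful?

    acks                -- "0", "1", or "all" (the values listed above)
    leader_wrote        -- True if the partition leader appended the message
    isr_followers_wrote -- list of booleans, one per in-sync follower
    (All names here are hypothetical, not part of any Kafka API.)
    """
    if acks == "0":
        # Fire and forget: the producer never waits for a broker response.
        return True
    if acks == "1":
        # Only the leader's write is confirmed; followers may still lag.
        return leader_wrote
    if acks == "all":
        # The leader and every in-sync follower must have the message.
        return leader_wrote and all(isr_followers_wrote)
    raise ValueError(f"unsupported acks value: {acks!r}")


# Scenario: the leader has the message, but one in-sync follower does not yet.
for setting in ("0", "1", "all"):
    print(setting, send_is_acknowledged(setting, True, [True, False]))
```

In this scenario only the "all" setting withholds the acknowledgement, which is the trade-off in practice: stronger durability guarantees in exchange for waiting on more replicas.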

Consumers read data from topics, and in Kafka the state of what a consumer has read from a topic is tracked by the consumer itself rather than by the broker. This relieves Kafka of managing message consumption for each consumer. Each consumer is responsible for this and does so using what is called the consumer offset (the sequential message ID up to which the consumer has read). Messages in a topic are kept as they are and are not deleted once the subscribed consumers have read them; they are deleted only according to the retention settings configured on the broker or topic. So, even if a consumer is dead or otherwise unable to consume messages, the messages are still kept in the topic. If the retention period is set at a reasonable level, then when the consumer comes back online it can use its offset to read all the messages from that offset onward without much of a problem. This is how consumer reliability is achieved in Kafka.
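The idea of resuming from a consumer-held offset can be illustrated with a minimal in-memory sketch. ToyTopic and ToyConsumer are hypothetical classes invented for this example, not Kafka APIs; the sketch assumes a single partition and does not model retention or offset commits.

```python
class ToyTopic:
    """In-memory stand-in for a single-partition topic (hypothetical)."""

    def __init__(self):
        self.log = []

    def append(self, message):
        # Messages are only ever appended; reading never removes them.
        self.log.append(message)
        return len(self.log) - 1  # offset assigned to the new message


class ToyConsumer:
    """Consumer that tracks its own offset (hypothetical)."""

    def __init__(self, topic):
        self.topic = topic
        self.offset = 0  # next offset to read, owned by the consumer

    def poll(self):
        # Read everything from the stored offset onward, then advance it.
        messages = self.topic.log[self.offset:]
        self.offset += len(messages)
        return messages


topic = ToyTopic()
consumer = ToyConsumer(topic)

topic.append("m0")
topic.append("m1")
first = consumer.poll()

# The consumer is "offline" here while the producer keeps writing.
topic.append("m2")
topic.append("m3")

# Back online: the consumer resumes from its own offset and misses nothing.
second = consumer.poll()
print(first, second)
```

Because the topic never deletes a message on read, the second poll picks up exactly where the first one stopped.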

The following figure shows the various positions that a consumer uses while traversing a topic partition in a broker.

Figure 11: Offsets which consumer uses to track message consumption

The figure shows some of the important positions maintained by a consumer in a log partition, as summarized here:

  • Last committed offset: Offset up to which the consumer has confirmed (committed) its progress. If the consumer fails, or the partition is reassigned to another consumer, this offset is used as the starting point.
  • Current position: Offset at which the consumer is currently reading.
  • High watermark: Offset of the last message that has been successfully copied to all in-sync replicas of the partition. A consumer can read only up to this offset; later messages are not exposed for consumption until every replica has them, at which point the high watermark moves forward.
  • Log end offset: Offset of the last message written by the producer to the log.
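How the high watermark limits what a consumer can read may be easier to see in a small sketch. The function describe_partition is a hypothetical helper, not part of Kafka. It assumes every follower passed in is an in-sync replica, treats position as the offset of the next message the consumer will read, and follows the convention of the list above, where each offset names the last message that qualifies.

```python
def describe_partition(leader_log, follower_logs, position):
    """Return (log_end_offset, high_watermark, readable) for a toy log.

    Each replica is modeled as a plain list of messages. An offset of -1
    means "no such message yet". All followers are assumed to be in sync.
    """
    # Last message the producer wrote to the leader's log.
    log_end_offset = len(leader_log) - 1

    # A message counts as replicated once every replica holds it.
    replicated_count = min(len(log) for log in [leader_log, *follower_logs])
    high_watermark = replicated_count - 1

    # The consumer may read from its position up to the high watermark only.
    readable = leader_log[position:high_watermark + 1]
    return log_end_offset, high_watermark, readable


leader = ["m0", "m1", "m2", "m3", "m4"]
followers = [["m0", "m1", "m2", "m3"], ["m0", "m1", "m2"]]

log_end, hw, readable = describe_partition(leader, followers, position=1)
print(log_end, hw, readable)
```

Here the producer has written up to offset 4, but the slowest replica holds only offsets 0 to 2, so messages m3 and m4 stay hidden from the consumer until replication catches up.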