Introducing Spark and Kafka | 147
6.2.5 Why is Kafka so Fast?
Kafka relies heavily on the OS kernel to move data around quickly. It relies on the principle of
zero copy, in which the kernel transfers data directly from the file system cache to the network
socket without copying it through application memory. Kafka lets you batch data records into
chunks, and these batches stay intact end to end, from the Producer to the file system (the Kafka
Topic Log) to the Consumer. Batching allows for more efficient data compression and reduces I/O
latency. Kafka writes to its immutable commit log on disk sequentially, which avoids random disk
access and slow disk seeks. Kafka provides horizontal scale through sharding: it shards a Topic Log
into hundreds, potentially thousands, of partitions spread across thousands of servers. This
sharding allows Kafka to handle massive load.
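The effect of batching on compression can be illustrated with a minimal sketch. It is not Kafka code: the records are invented for illustration, and Python's standard `zlib` module stands in for the codecs a Kafka producer can be configured with (such as gzip, snappy, lz4 or zstd). The sketch compares compressing many small records one by one against compressing them as a single batch.

```python
import json
import zlib

# Hypothetical records: 200 small JSON events that share the same structure.
records = [
    json.dumps({"user_id": i, "event": "page_view", "page": "/home"}).encode("utf-8")
    for i in range(200)
]

# Size of the uncompressed data.
raw_size = sum(len(r) for r in records)

# Compress every record on its own, as if each one were sent separately.
individual_size = sum(len(zlib.compress(r)) for r in records)

# Compress the same records as one newline-delimited batch.
batch = b"\n".join(records)
batch_size = len(zlib.compress(batch))

print(f"raw bytes:               {raw_size}")
print(f"compressed individually: {individual_size}")
print(f"compressed as one batch: {batch_size}")
```

Because the records repeat the same field names and values, the compressor finds far more redundancy in the batch than in any single record, so the batch should come out much smaller than the sum of the individually compressed records.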
6.2.6 Kafka Needs ZooKeeper
Kafka uses ZooKeeper for leader election of Kafka broker and topic partition pairs. Kafka also
uses ZooKeeper to manage service discovery for the Kafka brokers that form the cluster. ZooKeeper
sends topology changes to Kafka, so each node in the cluster knows when a new broker joins, a
broker dies, a topic is removed, a topic is added, and so on. ZooKeeper provides an in-sync view
of the Kafka cluster configuration.
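The notification pattern described above can be sketched with a toy in-memory model. This is an illustration only: `ToyCoordinator` and `ToyNode` are hypothetical names, and nothing here uses or mirrors the real ZooKeeper API. It only shows how nodes that register a callback end up with the same view of which brokers are alive.

```python
from typing import Callable, List, Set


class ToyCoordinator:
    """In-memory stand-in for a coordination service (NOT the ZooKeeper API)."""

    def __init__(self) -> None:
        self._brokers: Set[int] = set()
        self._watchers: List[Callable[[str, int], None]] = []

    def watch(self, callback: Callable[[str, int], None]) -> None:
        # Register a callback to be invoked on every topology change.
        self._watchers.append(callback)

    def register_broker(self, broker_id: int) -> None:
        self._brokers.add(broker_id)
        self._notify("broker_joined", broker_id)

    def remove_broker(self, broker_id: int) -> None:
        self._brokers.discard(broker_id)
        self._notify("broker_left", broker_id)

    def _notify(self, event: str, broker_id: int) -> None:
        for callback in self._watchers:
            callback(event, broker_id)


class ToyNode:
    """A cluster node that keeps its own view of the live brokers."""

    def __init__(self, name: str, coordinator: ToyCoordinator) -> None:
        self.name = name
        self.known_brokers: Set[int] = set()
        coordinator.watch(self._on_change)

    def _on_change(self, event: str, broker_id: int) -> None:
        if event == "broker_joined":
            self.known_brokers.add(broker_id)
        elif event == "broker_left":
            self.known_brokers.discard(broker_id)


coordinator = ToyCoordinator()
node_a = ToyNode("a", coordinator)
node_b = ToyNode("b", coordinator)

coordinator.register_broker(1)  # a new broker joins
coordinator.register_broker(2)  # another broker joins
coordinator.remove_broker(1)    # broker 1 dies

print(node_a.known_brokers, node_b.known_brokers)
```

After the three changes, both nodes hold the same set of live brokers, which is the "in-sync view" idea in miniature.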
6.2.7 Different Components in Kafka
Message: Messages are simply byte arrays, so developers can store any object in any format, such
as String, JSON or Avro.
Every message has two components: a key, which represents the data about the message, and a
value, which represents the body of the message.
Kaa Cluster: In Apache Kafka, more than one brokers, i.e., a set of servers is collectively known
as Kafka cluster.
Kafka Producer: A Kafka producer creates a record and publishes it to the broker. It mainly acts
as the source system of the data: the producer generates the data and sends it to the Kafka Broker
or Server (Ref. Figure 6.14).
FIGURE 6.14 KAFKA components
[Figure: a KAFKA cluster containing KAFKA Node 1, KAFKA Node 2 and KAFKA Node 3, shown with three
Producers and three Consumers.]
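The message and producer roles described above can be sketched with a toy in-memory model. This is not the Kafka client API: `Message`, `ToyBroker` and `ToyProducer` are hypothetical names, the broker keeps one append-only list per partition, and the choice of partition by hashing the key with CRC32 is an assumption made for illustration (the text above does not describe how a partition is chosen).

```python
import json
import zlib
from typing import List, NamedTuple, Tuple


class Message(NamedTuple):
    key: bytes    # data about the message
    value: bytes  # body of the message


def make_message(key: str, payload: dict) -> Message:
    """Serialize a key and a JSON payload into the byte arrays a message holds."""
    return Message(key.encode("utf-8"), json.dumps(payload).encode("utf-8"))


class ToyBroker:
    """Hypothetical in-memory broker: one append-only list per partition."""

    def __init__(self, num_partitions: int) -> None:
        self.partitions: List[List[Message]] = [[] for _ in range(num_partitions)]

    def append(self, partition: int, message: Message) -> int:
        log = self.partitions[partition]
        log.append(message)
        return len(log) - 1  # offset of the new record within its partition


class ToyProducer:
    """Hypothetical producer: creates records and publishes them to the broker."""

    def __init__(self, broker: ToyBroker) -> None:
        self.broker = broker

    def send(self, message: Message) -> Tuple[int, int]:
        # Assumption: CRC32 stands in for whatever key hash a real client uses.
        partition = zlib.crc32(message.key) % len(self.broker.partitions)
        return partition, self.broker.append(partition, message)


broker = ToyBroker(num_partitions=3)
producer = ToyProducer(broker)

first = producer.send(make_message("user-42", {"event": "login"}))
second = producer.send(make_message("user-42", {"event": "logout"}))
print(first, second)  # each result is a (partition, offset) pair
```

Because both messages carry the same key, they hash to the same partition and are appended one after the other, so their relative order is preserved within that partition.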