148 | Big Data Simplified
Kafka Server/Broker: A Kafka cluster consists of one or more servers, called Kafka brokers, each
of which runs Kafka.
Kafka Topic: A topic is a named feed to which messages are published and in which they are
stored. Messages are byte arrays that can store any object in any format. As noted earlier, all
Kafka messages are organized into topics. To send a message, you send it to a specific topic,
and to read a message, you read it from a specific topic. Producer applications write data to
topics and consumer applications read from topics. Messages published to the cluster stay in
the cluster until a configurable retention period has passed. Because Kafka retains all messages
for a set amount of time, regardless of whether they have been read, consumers are responsible
for tracking their own position in the topic.
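The retention behaviour described above can be illustrated with a small in-memory model. This is only a sketch: the class ToyTopic, its methods and the topic name "page-views" are hypothetical names invented for this illustration and are not part of Kafka's API. It assumes that retention is purely time-based and that timestamps are passed in explicitly.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTopic:
    """In-memory stand-in for a single topic (hypothetical, not Kafka's API)."""
    name: str
    retention_seconds: float
    # Each log entry is a (timestamp, payload) pair; payloads are raw bytes.
    _log: list = field(default_factory=list)

    def publish(self, payload: bytes, now: float) -> None:
        # Producers append messages to the end of the topic.
        self._log.append((now, payload))

    def expire(self, now: float) -> None:
        # Drop messages older than the retention period,
        # whether or not any consumer has read them.
        cutoff = now - self.retention_seconds
        self._log = [(ts, p) for ts, p in self._log if ts >= cutoff]

    def messages(self) -> list:
        return [payload for _, payload in self._log]


topic = ToyTopic(name="page-views", retention_seconds=60.0)
topic.publish(b'{"user": 1}', now=0.0)
topic.publish(b'{"user": 2}', now=50.0)
topic.expire(now=70.0)  # the message published at t=0 is now 70 s old
remaining = topic.messages()
print(remaining)
```

Only the second message survives the expiry step, because the first one is older than the 60-second retention period assumed here.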
Kafka Consumer: Consumers can read messages starting from a specific offset and are allowed to
read from any offset they choose. This allows consumers to join the cluster at any point in
time.
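The idea that each consumer keeps its own offset can be sketched as follows. ToyConsumer and its poll method are hypothetical names used only for this illustration, and a plain Python list stands in for the stored messages. A real Kafka client is considerably more involved.

```python
class ToyConsumer:
    """Hypothetical consumer that tracks its own read position (offset)."""

    def __init__(self, log: list, start_offset: int = 0):
        self.log = log
        self.offset = start_offset

    def poll(self, max_messages: int = 10) -> list:
        # Pull up to max_messages starting at the current offset,
        # then advance the offset past what was read.
        batch = self.log[self.offset:self.offset + max_messages]
        self.offset += len(batch)
        return batch


log = [b"m0", b"m1", b"m2", b"m3"]       # messages already stored
early = ToyConsumer(log)                 # starts from the beginning
late = ToyConsumer(log, start_offset=2)  # joins later, at offset 2

first_batch = early.poll(max_messages=3)
late_batch = late.poll()
print(first_batch, early.offset)
print(late_batch, late.offset)
```

The two consumers read the same stored messages independently. Neither read removes anything from the list, which is why a consumer that joins late can still start from any offset that has been retained.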
Consumers pull messages from topic partitions. Different consumers can be responsible for
different partitions. Kafka can support a large number of consumers and retain large amounts of
data with very little overhead. By using consumer groups, consumers can be parallelized so that
multiple consumers read from multiple partitions of a topic, allowing a very high message
processing throughput.
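One way to picture how a consumer group shares the partitions of a topic is a simple round-robin assignment. The function assign_partitions and the consumer names below are hypothetical and serve only as a sketch. Kafka's actual assignment strategies are configurable and handle consumers joining and leaving the group.

```python
def assign_partitions(partitions: list, consumers: list) -> dict:
    """Spread partitions over consumers round-robin (illustrative only)."""
    if not consumers:
        raise ValueError("a consumer group needs at least one consumer")
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        # Each partition goes to exactly one consumer in the group.
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


# A topic with six partitions read by a group of three consumers.
assignment = assign_partitions([0, 1, 2, 3, 4, 5], ["c1", "c2", "c3"])
print(assignment)
```

Because every partition has exactly one owner within the group, the three consumers can process their partitions in parallel without reading the same message twice.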
6.2.8 Difference between Apache Kafka and Apache Flume
Apache Kafka                                   | Apache Flume
-----------------------------------------------|-----------------------------------------------
General-purpose tool for multiple producers    | Special-purpose tool for specific applications.
and consumers.                                 |
Data flow: Pull                                | Data flow: Push
Replicates the events using ingest pipelines.  | Does not replicate the events.
Velocity is higher.                            | Velocity is high.
6.2.9 Kafka Demonstration: How Messages Pass from Publisher to Consumer through a Topic
Step 1: Start the ZooKeeper server.
Kafka uses ZooKeeper, so you need to start a ZooKeeper server first.
> bin/zookeeper-server-start.sh config/zookeeper.properties