148 | Big Data Simplied
Kafka Server/Broker: A Kafka cluster consists of one or more servers, known as Kafka brokers,
which run Kafka.
Kafka Topic: A topic is a feed name to which messages are stored and published. Messages are
byte arrays that can store any object in any format. As said before, all Kafka messages are
organized into topics. If you wish to send a message, you send it to a specific topic, and if
you wish to read a message, you read it from a specific topic. Producer applications write
data to topics and consumer applications read from topics. Messages published to the cluster
will stay in the cluster until a configurable retention period has passed. Kafka retains all
messages for a set amount of time and, therefore, consumers are responsible for tracking their
own location (offset).
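The retention and offset behaviour described above can be illustrated with a small, self-contained Python sketch. This is a toy in-memory model written for this explanation, not the Kafka API: the broker keeps every message until its retention period expires, and each consumer reads from whatever offset it chooses.

```python
import time

class TopicLog:
    """Toy append-only log with time-based retention (illustrative only)."""
    def __init__(self, retention_seconds):
        self.retention = retention_seconds
        self.messages = []          # list of (offset, timestamp, payload)
        self.next_offset = 0

    def append(self, payload):
        """The broker appends each message at the next offset."""
        self.messages.append((self.next_offset, time.time(), payload))
        self.next_offset += 1

    def expire(self, now=None):
        """Drop messages older than the retention period."""
        now = now if now is not None else time.time()
        self.messages = [m for m in self.messages
                         if now - m[1] < self.retention]

    def read_from(self, offset):
        """Consumers choose their own starting offset; the broker
        does not track their position for them."""
        return [payload for off, _, payload in self.messages if off >= offset]

log = TopicLog(retention_seconds=3600)
for msg in (b"a", b"b", b"c"):
    log.append(msg)

print(log.read_from(0))   # [b'a', b'b', b'c']  (read from the beginning)
print(log.read_from(2))   # [b'c']              (a consumer further along)
```

Note that deleting consumed messages is never the broker's job in this model, just as in Kafka: only the retention clock removes data.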
Kafka Consumer: Consumers can read messages starting from a specific offset and are allowed to
read from any offset point they choose. This allows consumers to join the cluster at any point
in time.
Consumers pull messages from topic partitions. Different consumers can be responsible for
different partitions. Kafka can support a large number of consumers and retain large amounts of
data with very little overhead. By using consumer groups, consumption can be parallelized so that
multiple consumers read from multiple partitions of a topic, allowing very high message-processing
throughput.
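The parallelism that consumer groups provide can be sketched in a few lines of Python. This is a simplified, hypothetical model of what Kafka's group coordinator does (round-robin assignment of partitions to the consumers in one group), not the real rebalancing protocol:

```python
def assign_partitions(partitions, consumers):
    """Round-robin assignment of topic partitions to the consumers of one
    group: each partition is read by exactly one consumer in the group."""
    assignment = {c: [] for c in consumers}
    for i, partition in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(partition)
    return assignment

# A topic with 6 partitions shared by a group of 3 consumers:
# each consumer owns 2 partitions, so the group reads in parallel.
print(assign_partitions(list(range(6)), ["c1", "c2", "c3"]))
# {'c1': [0, 3], 'c2': [1, 4], 'c3': [2, 5]}
```

Because each partition belongs to exactly one consumer within a group, adding consumers (up to the partition count) increases throughput without two consumers processing the same message.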
6.2.8 Difference between Apache Kafka and Apache Flume
Apache Kafka                                      Apache Flume
General-purpose tool for multiple producers       Special-purpose tool for specific
and consumers.                                    applications.
Data flow: Pull                                   Data flow: Push
Replicates the events using ingest pipelines.     Does not replicate the events.
Velocity is higher.                               Velocity is high.
6.2.9 Kafka Demonstration—How Messages Pass from Publisher
to Consumer through a Topic
Step 1: Start the Zookeeper server.
Kafka uses ZooKeeper so you need to first start a ZooKeeper server.
> bin/zookeeper-server-start.sh config/zookeeper.properties
Introducing Spark andKafka | 149
Step 2: Now start the Kafka server.
> bin/kafka-server-start.sh config/server.properties
Step 3: Create a topic.
Let’s create a topic named ‘test’ with a single partition and only one replica.
> bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
We can now see that topic if we run the list topic command.
> bin/kafka-topics.sh --list --zookeeper localhost:2181
Step 4: Start Kafka producer and send some messages through topic ‘test’.
Kafka comes with a command line client that will take input from a file or from standard input
and send it out as messages to the Kafka cluster.
Run the producer and then type a few messages into the console to send to the server.
> bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
Step 5: Start a Kafka consumer and consume messages from producer through topic ‘test’.
Kafka also has a command line consumer that will dump out messages to standard output.
> bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning
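The message flow of Steps 3–5 (create a topic, list it, produce messages, then consume them from the beginning) can be mirrored in a short, self-contained Python sketch. This is a toy single-node model written for illustration, not the real Kafka client or broker:

```python
class MiniBroker:
    """Toy single-node broker: each topic is an append-only message list."""
    def __init__(self):
        self.topics = {}

    def create_topic(self, name):
        self.topics.setdefault(name, [])

    def list_topics(self):
        return sorted(self.topics)

    def produce(self, topic, message):
        """A producer appends a message to the end of the topic's log."""
        self.topics[topic].append(message)

    def consume(self, topic, from_offset=0):
        """from_offset=0 corresponds to the --from-beginning flag."""
        return self.topics[topic][from_offset:]

broker = MiniBroker()
broker.create_topic("test")                  # Step 3: create topic 'test'
print(broker.list_topics())                  # ['test']
broker.produce("test", "first message")      # Step 4: producer sends messages
broker.produce("test", "second message")
print(broker.consume("test"))                # Step 5: consumer reads them all
# ['first message', 'second message']
```

A late-joining consumer would simply call `consume("test", from_offset=1)` to skip what it has already processed, which is exactly the offset-tracking responsibility described earlier.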