Setting up MongoDB and Apache Kafka

MongoDB is a document-oriented, schema-free NoSQL database. Document-oriented means that the unit of storage in MongoDB is a document; think of a document as roughly equivalent to a row in a table of a MySQL database. Documents are organized into collections, which keeps the database structure simple. However, we can still keep different kinds of documents in the same collection. Why? Because MongoDB is schema-free: no two documents need to have exactly the same fields. Moreover, fields can be nested. In short, MongoDB is a schema-free, document-oriented NoSQL database that supports nested data structures, which gives a developer a great deal of power and flexibility.
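To make the idea of schema-free, nested documents concrete, here is a minimal sketch that models documents as plain Scala maps. It does not use MongoDB or any driver; the names DocumentModelSketch, user, event, and lookup, as well as the sample field values, are made up for illustration only.

```scala
// A rough model of schema-free documents using plain Scala maps.
// No MongoDB driver is involved; every name here is illustrative only.
object DocumentModelSketch {
  type Document = Map[String, Any]

  // Two documents with different fields that could share one collection.
  val user: Document = Map(
    "name" -> "alice",
    "address" -> Map("city" -> "Pune", "zip" -> "411001") // nested document
  )
  val event: Document = Map("type" -> "login", "ts" -> 1420070400L)

  // A "collection" is just a bag of documents; nothing enforces a shared schema.
  val collection: Seq[Document] = Seq(user, event)

  // Look up a value by a dotted path such as "address.city".
  def lookup(doc: Document, path: String): Option[Any] =
    path.split('.').foldLeft(Option(doc: Any)) {
      case (Some(m: Map[_, _]), key) => m.asInstanceOf[Map[String, Any]].get(key)
      case _                         => None
    }
}
```

Here, lookup(user, "address.city") reaches into the nested document, while the same path on event simply yields None because that document has no such field.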

Setting up MongoDB

In this chapter, we will use the Scala binding for MongoDB. First, install MongoDB on your system by following the installation guide for your OS and machine type: http://docs.mongodb.org/manual/installation/.

Let's now confirm that MongoDB is up and running:

$ mongo
MongoDB shell version: 2.6.3
connecting to: test
> show dbs
admin  (empty)
local  0.078GB

Once we have MongoDB installed on the machine, we configure Casbah (an official Scala binding for MongoDB). Add the following line to build.sbt to pull in Casbah:

libraryDependencies += "org.mongodb" %% "casbah" % "2.8.1"
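With that dependency in place, a first round trip to MongoDB might look roughly like the following sketch. It assumes a mongod instance listening on the default localhost:27017; the object name CasbahSketch, the collection name "people", and the sample fields are hypothetical and chosen only for illustration.

```scala
import com.mongodb.casbah.Imports._

object CasbahSketch {
  // Build a document with a nested sub-document; no server is needed for this step.
  def sampleDoc: DBObject =
    MongoDBObject("name" -> "alice", "address" -> MongoDBObject("city" -> "Pune"))

  def main(args: Array[String]): Unit = {
    // Assumption: mongod is running locally on its default port.
    val client = MongoClient("localhost", 27017)
    val coll = client("test")("people") // "people" is a hypothetical collection
    coll.insert(sampleDoc)
    coll.find().foreach(println) // print every document in the collection
    client.close()
  }
}
```

Note that the collection does not have to be created beforehand; MongoDB creates it on the first insert.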

MongoDB will serve as our persistent, structured datastore. For streaming tasks, however, we also need a queuing mechanism. We could use Apache Kafka, RabbitMQ, Apache ActiveMQ, or something similar, or even roll our own. We chose Apache Kafka for this purpose.

Setting up Apache Kafka

Apache Kafka is an efficient, distributed, and persistent producer-consumer queuing system. Let's set it up. First, download the tarball and extract it. Then start ZooKeeper in one terminal and the Kafka server in another, as shown in the following commands:

$ wget -c http://apache.mesi.com.ar/kafka/0.8.2.0/kafka_2.10-0.8.2.0.tgz
$ tar zxf kafka_2.10-0.8.2.0.tgz
$ cd kafka_2.10-0.8.2.0
# Start ZooKeeper
$ bin/zookeeper-server-start.sh config/zookeeper.properties
# In a separate terminal (from the same directory), start the Kafka server
$ bin/kafka-server-start.sh config/server.properties

Did you notice we just mentioned ZooKeeper? It is the coordination service that keeps distributed components in sync with each other, and Kafka depends on it. For our purposes, we need to ensure that ZooKeeper is running before we start Kafka.
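One simple way to check this ordering from code is to test whether anything is listening on ZooKeeper's client port before launching Kafka. The following is a minimal sketch using only the JDK; the object name PortCheck and the method isListening are hypothetical, and port 2181 is assumed because it is the clientPort in the stock config/zookeeper.properties.

```scala
import java.net.{InetSocketAddress, Socket}
import scala.util.Try

object PortCheck {
  // Returns true if a TCP connection to host:port succeeds within timeoutMs.
  def isListening(host: String, port: Int, timeoutMs: Int = 1000): Boolean = {
    val socket = new Socket()
    try {
      Try(socket.connect(new InetSocketAddress(host, port), timeoutMs)).isSuccess
    } finally {
      socket.close()
    }
  }

  def main(args: Array[String]): Unit = {
    // Assumption: ZooKeeper uses the default clientPort of 2181.
    if (isListening("127.0.0.1", 2181))
      println("Something is listening on 2181; safe to start Kafka")
    else
      println("Nothing is listening on 2181; start ZooKeeper first")
  }
}
```

An open port only tells us that some process is listening there, not that it is a healthy ZooKeeper, so treat this as a quick sanity check rather than a full health probe.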

Additional configurations for both MongoDB and Kafka, such as replication, are beyond the scope of this book. Please read some additional material to learn about those topics.

Once we have Kafka installed on the machine, we add the following line to build.sbt:

libraryDependencies += "org.apache.kafka" % "kafka_2.11" % "0.8.2.0"

This adds the Kafka library, so that our code can connect to Kafka through its API. Note that the kafka_2.11 suffix assumes the project is built with Scala 2.11.
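As a rough illustration of what connecting through the API can look like, the sketch below sends a single message using the Java producer client that ships with this Kafka version. It assumes a broker on localhost:9092 (the port in the stock config/server.properties); the object name KafkaProducerSketch, the topic "test-topic", and the message contents are hypothetical.

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object KafkaProducerSketch {
  // Producer settings; the default broker address is an assumption.
  def producerProps(brokers: String = "localhost:9092"): Properties = {
    val props = new Properties()
    props.setProperty("bootstrap.servers", brokers)
    props.setProperty("key.serializer",
      "org.apache.kafka.common.serialization.StringSerializer")
    props.setProperty("value.serializer",
      "org.apache.kafka.common.serialization.StringSerializer")
    props
  }

  def main(args: Array[String]): Unit = {
    val producer = new KafkaProducer[String, String](producerProps())
    // "test-topic" is a hypothetical topic name.
    producer.send(new ProducerRecord[String, String]("test-topic", "key", "hello"))
    producer.close() // close the producer and release its resources
  }
}
```

Both ZooKeeper and the Kafka server need to be running before this is executed.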
