MongoDB is a document-oriented, schema-free NoSQL database. Document-oriented means that the unit of storage in MongoDB is a document. Think of a document as the equivalent of a row in a MySQL table. Documents are organized into collections to keep the database structure simple. However, we can still keep different kinds of documents in the same collection. Why? Because it is schema-free: no two documents need to have exactly the same fields. Moreover, fields can be nested. So we can say it is a schema-free, document-oriented NoSQL database that even supports nested data structures. This is pretty neat, and gives great power and flexibility to a developer.
In this chapter, we will use the Scala binding for MongoDB. First, install MongoDB on your system; see the installation guide for your OS and machine type: http://docs.mongodb.org/manual/installation/.
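To make the idea of schema-free, nested documents concrete, here is a small analogy in plain Scala. It does not touch MongoDB at all: the Document alias, the sample people, and the nested helper are all hypothetical names made up for this illustration.

```scala
// Plain-Scala analogy only: no MongoDB involved, all names are illustrative.
object SchemaFreeSketch {
  // A "document" is modeled as a map from field names to arbitrary values.
  type Document = Map[String, Any]

  // Two documents with different fields; the second has a nested field.
  val alice: Document = Map("name" -> "Alice", "age" -> 30)
  val bob: Document = Map(
    "name" -> "Bob",
    "address" -> Map("city" -> "Pune", "zip" -> "411001")
  )

  // A "collection" simply holds documents, whatever their shape.
  val people: List[Document] = List(alice, bob)

  // Follow a path of field names into nested documents.
  def nested(doc: Document, path: String*): Option[Any] =
    path.toList match {
      case Nil        => Some(doc)
      case key :: Nil => doc.get(key)
      case key :: rest =>
        doc.get(key) match {
          case Some(inner: Map[_, _]) =>
            nested(inner.asInstanceOf[Document], rest: _*)
          case _ => None
        }
    }
}
```

Both documents live in the same list even though only one of them has an address, which is roughly what keeping different kinds of documents in one collection means.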
Let's now confirm if MongoDB is up and running:
$ mongo
MongoDB shell version: 2.6.3
connecting to: test
> show dbs
admin  (empty)
local  0.078GB
Once we have MongoDB installed on the machine, we configure casbah (an official Scala binding for MongoDB). Add the following line to build.sbt:
libraryDependencies += "org.mongodb" %% "casbah" % "2.8.1"
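With the dependency in place, a quick way to check the setup is to insert and read back a couple of documents. The following is a minimal sketch, not the code used later in the book. It assumes that MongoDB is listening on localhost at the default port 27017, and the collection name people_sketch and the sample documents are made up for illustration.

```scala
import com.mongodb.casbah.Imports._

object CasbahSketch {
  // Two documents of different shapes; the second carries a nested document.
  val alice: DBObject = MongoDBObject("name" -> "Alice", "age" -> 30)
  val bob: DBObject =
    MongoDBObject("name" -> "Bob", "address" -> MongoDBObject("city" -> "Pune"))

  def main(args: Array[String]): Unit = {
    // Assumption: a local MongoDB instance on the default port.
    val client = MongoClient("localhost", 27017)
    try {
      // "test" is the database the shell connected to above;
      // "people_sketch" is a hypothetical collection name.
      val coll = client("test")("people_sketch")
      coll.insert(alice)
      coll.insert(bob)
      // Print every document stored in the collection.
      coll.find().foreach(println)
    } finally {
      client.close()
    }
  }
}
```

Running this more than once inserts the same two documents again, because nothing here removes them.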
MongoDB will serve as our persistent, structured datastore. For streaming tasks, however, we need a queuing mechanism. We could use Apache Kafka, RabbitMQ, Apache ActiveMQ, and so on, or just roll our own. We chose Apache Kafka for this purpose.
Apache Kafka is an efficient, distributed, and persistent producer-consumer queuing system. Let's set it up. First, download the tarball and extract it. Then start ZooKeeper in one terminal, and start the Kafka server in another, as shown in the following commands:
$ wget -c http://apache.mesi.com.ar/kafka/0.8.2.0/kafka_2.10-0.8.2.0.tgz
$ tar zxf kafka_2.10-0.8.2.0.tgz
$ cd kafka_2.10-0.8.2.0
# Start ZooKeeper
$ bin/zookeeper-server-start.sh config/zookeeper.properties
# In a separate terminal, start the Kafka server
$ bin/kafka-server-start.sh config/server.properties
Did you notice we just mentioned ZooKeeper? It is the core component that coordinates distributed components and keeps them in sync with each other. For our purposes, we need to ensure that ZooKeeper is running before we start Kafka.
Additional configurations for both MongoDB and Kafka, such as replication, are beyond the scope of this book. Please read some additional material to learn about those topics.
Once we have Kafka installed on the machine, we add the following line to build.sbt:
libraryDependencies += "org.apache.kafka" % "kafka_2.11" % "0.8.2.0"
This adds the Kafka library, so that we can connect to Kafka through its API.
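As a quick check that the library and the broker work together, we can send a single message. This is only a sketch under several assumptions: it uses the Java producer client that ships with Kafka 0.8.2.0, it assumes the broker started above is reachable at localhost:9092 (the port set in the stock config/server.properties), and the topic name sketch-topic is hypothetical and relies on the broker being allowed to create topics automatically. The API used later in the book may differ.

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object KafkaProducerSketch {
  // Build the minimal configuration for a producer of String keys and values.
  def producerProps(brokers: String): Properties = {
    val props = new Properties()
    props.setProperty("bootstrap.servers", brokers)
    props.setProperty("key.serializer",
      "org.apache.kafka.common.serialization.StringSerializer")
    props.setProperty("value.serializer",
      "org.apache.kafka.common.serialization.StringSerializer")
    props
  }

  def main(args: Array[String]): Unit = {
    // Assumption: the broker from the previous step runs on localhost:9092.
    val producer =
      new KafkaProducer[String, String](producerProps("localhost:9092"))
    try {
      // "sketch-topic" is a made-up topic name for this illustration.
      val record =
        new ProducerRecord[String, String]("sketch-topic", "key-1", "hello")
      // send() is asynchronous; get() blocks until the broker acknowledges.
      producer.send(record).get()
    } finally {
      producer.close()
    }
  }
}
```

Blocking on get() keeps the sketch simple; a real streaming producer would normally send without waiting on each message.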