Chapter 1. Instant MongoDB

Welcome to Instant MongoDB. This book has been developed to provide you with all the information that you need to get started with MongoDB. You will learn the basics of MongoDB, get started by installing it, and then perform various operations on it such as inserting, updating, and querying data and discover some tips and tricks for using MongoDB.

This document contains the following sections:

So what is MongoDB? explains what MongoDB actually is, what you can do with it, and why it's so great.

Installation shows you how to download and install MongoDB with minimum effort, introducing the important configuration parameters and then how to set it up so that you can use it as soon as possible.

Quick start – setting up database and querying starts off by giving a brief comparison of terminologies from the Mongo world and their equivalent in the relational world. We then import some data in the database; this is the data we would be playing around with for most of the book. We conclude this section by connecting to the MongoDB from the Mongo shell and executing some queries to get a feel of how the queries and the data look. We basically would only be scratching the surface of this technology in this section.

Top 4 features you need to know about helps us learn how to perform various operations on MongoDB from the Mongo shell. By the end of this section you will be able to connect to a database from the shell, perform insert, update, and upsert (update + insert) operations, execute advanced queries, schema design concepts, and creating indexes for performance. Also, you will finally learn about the new aggregation framework and Map reduce operations.

People and places you should get to know lists many useful links to the project page and forums. Also, since every Open Source project is centered on a community, it provides a number of helpful articles, tutorials, and blogs which will enrich your learning process.

So, what is MongoDB?

Put simply, MongoDB is a Documented Oriented Database.

What is a document?

While it may vary for various implementations of different Document Oriented Databases available, as far as MongoDB is concerned it is a BSON document, which stands for Binary JSON. JSON (JavaScript Object Notation) is an open standard developed for human readable data exchange. Though a thorough knowledge of JSON is not really important to understand MongoDB, for keen readers the URL to its RFC is http://tools.ietf.org/html/rfc4627. Also, the BSON specification can be found at http://bsonspec.org/. Since MongoDB stores the data as BSON documents, it is a Document Oriented Database.

What does a document look like?

Consider the following example where we represent a person using JSON:

{
  "firstName":"Jack",
  "secondName":"Jones",
  "age":30,
  "phoneNumbers":[
    {fixedLine:"1234"},
    {mobile:"5678"}
  ],
  "residentialAddress":{
lineOne:"…", 
lineTwo:"…", 
city:"…", 
state:"…", 
zip:"…", 
country:"…"
  }
}

As we can see, a JSON document always starts and ends with curly braces and has all the content within these braces. Multiple fields and values are separated by commas, with a field name always being a string value and the value being of any type ranging from string, numbers, date, array, another JSON document, and so on. For example in "firstName":"Jack", the firstName is the name of the field whereas Jack is the value of the field.

Need for MongoDB

Many of you would probably be wondering why we need another database when we already have good old relational databases. We will try to see a few drivers from its introduction back in 2009.

Relational databases are extremely rich in features. But these features don't come for free; there is a price to pay and it is done by compromising on the scalability and flexibility. Let us see these one by one.

Scalability

It is a factor used to measure the ease with which a system can accommodate the growing amount of work or data. There are two ways in which you can scale your system: scale up, also known as scale vertically or scale out, also known as scale horizontally. Vertical scalability can simply be put up as an approach where we say "Need more processing capabilities? Upgrade to a bigger machine with more cores and memory". Unfortunately, with this approach we hit a wall as it is expensive and technically we cannot upgrade the hardware beyond a certain level. You are then left with an option to optimize your application, which might not be a very feasible approach for some systems which are running in production for years.

On the other hand, Horizontal scalability can be described as an approach where we say "Need more processing capabilities? Simple, just add more servers and multiply the processing capabilities". Theoretically this approach gives us unlimited processing power but we have more challenges in practice. For many machines to work together, there would be a communication overhead between them and the probability of any one of these machines being down at a given point of time is much higher.

MongoDB enables us to scale horizontally easily, and at the same time addresses the problems related to scaling horizontally to a great extent. The end result is that it is very easy to scale MongoDB with increasing data as compared to relational databases.

Ease of development

MongoDB doesn't have the concept of creation of schema as we have in relational databases. The document that we just saw can have an arbitrary structure when we store them in the database. This feature makes it very easy for us to model and store relatively unstructured/complex data, which becomes difficult to model in a relational database. For example, product catalogues of an e-commerce application containing various items and each having different attributes. Also, it is more natural to use JSON in application development than tables from relational world.

Ok, it looks good, but what is the catch? Where not to use MongoDB?

To achieve the goal of letting MongoDB scale out easily, it had to do away with features like joins and multi document/distributed transactions. Now, you must be wondering it is pretty useless as we have taken away two of the most important features of the relational database. However, to mitigate the problems of joins is one of the reasons why MongoDB is document oriented. If you look at the preceding JSON document for the person, we have the address and the phone number as a part of the document. In relational database, these would have been in separate tables and retrieved by joining these tables together.

Distributed/Multi document transactions inhibit MongoDB to scale out and hence are not supported and nor there is a way to mitigate it. MongoDB still is atomic but the atomicity for inserts and updates is guaranteed at document level and not across multiple documents. Hence, MongoDB is not a good fit for scenarios where complex transactions are needed, such as in an OLTP banking applications. This is an area where good old relational database still rules.

To conclude, let us take a look at the following image. This graph is pretty interesting and was presented by Dwight Merriman, Founder and CEO of 10gen, the MongoDB company in one of his online courses.

Ease of development

As we can see, we have on one side some products like Memcached which is very low on functionality but high on scalability and performance. On the other end we have RDBMS (Relational Database Management System) which is very rich in features but not that scalable. According to the research done while developing MongoDB, this graph is not linear and there is a point in it after which the scalability and performance fall steeply on adding more features to the product. MongoDB sits on this point where it gives maximum possible features without compromising too much on the scalability and performance.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.16.79.65