As you have seen with relational databases, indexes are important structures for boosting performance. In fact, for most database administrators, they are a critical tool in the continuous improvement of database performance.
In NoSQL databases such as MongoDB, indexing is part of a bigger strategy that lets us achieve significant performance gains and assign important behaviors to our database, which will be essential to the data model's maintenance.
This is possible because indexes in MongoDB can have very special properties. For example, we can define an index on a date-typed field that controls when a document should be removed from the collection.
So, in this chapter we will see:
Out of all the subjects we have been discussing in this book so far, this is where we will be the most at ease. The index concept is present in almost every relational database, so if you have any previous basic knowledge on the matter, you will most likely have no difficulty in this chapter.
But in case you feel that you are not familiar enough with the concept of indexes, an easy way to understand them is to draw a parallel with books. Suppose that we have a book with an index like this:
With this in hand, if we decide to read about the Internet, we know that on page 4, we will find information on the subject. On the other hand, how would we be able to find information we are looking for without the page number? The answer is quite simple: by going through the entire book, page by page, until we find the word "Internet."
As you might already know, indexes are data structures that hold part of the data from our main data source. In relational databases, indexes hold parts of a table, while in MongoDB, since indexes are on a collection level, these will hold part of a document. Similar to relational databases, indexes use a B-Tree data structure at implementation level.
Depending on our application's requirements, we can create indexes of fields or fields of embedded documents. When we create an index, it will hold a sorted set of values of the fields we choose.
Thus, when we execute a query, if there is an index that covers the query criteria, MongoDB will use the index to limit the number of documents to be scanned.
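To make the effect of an index concrete, the following JavaScript sketch models an index as a sorted array of keys and compares it with scanning every document. It is an illustration only: MongoDB uses a B-Tree rather than a sorted array, and the helper names buildIndex, collectionScan, and indexScan, as well as the sample documents, are assumptions made for this example.

```javascript
// Illustration only: a sorted array stands in for MongoDB's B-Tree.
// The documents and helper names below are hypothetical.
const customers = [
  { _id: 1, username: "customer3" },
  { _id: 2, username: "customer1" },
  { _id: 3, username: "customer4" },
  { _id: 4, username: "customer2" },
];

// Build an "index": (key, position) pairs sorted by key.
function buildIndex(docs, field) {
  return docs
    .map((doc, position) => ({ key: doc[field], position }))
    .sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
}

// Without an index, every document has to be inspected.
function collectionScan(docs, field, value) {
  let documentsInspected = 0;
  const matches = [];
  for (const doc of docs) {
    documentsInspected += 1;
    if (doc[field] === value) matches.push(doc);
  }
  return { matches, documentsInspected };
}

// With an index, a binary search finds the first matching key,
// and only the documents that the index points to are fetched.
function indexScan(docs, index, value) {
  let low = 0;
  let high = index.length;
  while (low < high) {
    const mid = (low + high) >> 1;
    if (index[mid].key < value) low = mid + 1;
    else high = mid;
  }
  const matches = [];
  for (let i = low; i < index.length && index[i].key === value; i += 1) {
    matches.push(docs[index[i].position]);
  }
  return { matches, documentsInspected: matches.length };
}
```

With the four sample documents, collectionScan inspects all four of them to find customer1, while indexScan locates the key in the sorted structure and fetches only the single matching document.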
We have the customers collection that we used in Chapter 3, Querying Documents, which contains these documents:
{ "_id" : ObjectId("54aecd26867124b88608b4c9"), "username" : "customer1", "email" : "[email protected]", "password" : "b1c5098d0c6074db325b0b9dddb068e1" }
We can create an index on the username field in the mongo shell by using the createIndex method:
db.customers.createIndex({username: 1})
The following query will use the previously created index:
db.customers.find({username: "customer1"})
We could state that this is the simplest way to create and use an index in MongoDB. In addition to this, we can create indexes on multikey fields or in embedded documents' fields, for instance.
In the next section, we will go through all these index types.
As we already stated in the last section, the simplest way to create an index on MongoDB is to do so in a single field. The index could be created on a field of any type in the collection of documents.
Consider the customers collection we used before, with some modifications for this section:
{ "_id" : ObjectId("54aecd26867124b88608b4c9"), "username" : "customer1", "email" : "[email protected]", "password" : "b1c5098d0c6074db325b0b9dddb068e1", "age" : 25, "address" : { "street" : "Street 1", "zipcode" : "87654321", "state" : "RJ" } }
The following command creates an ascending index on the username field:
db.customers.createIndex({username: 1})
In order to create an index in MongoDB, we use the createIndex method. In the preceding code, we passed a single document as a parameter to the createIndex method. The document {username: 1} contains a reference to the field on which the index should be created and the order: 1 for ascending or -1 for descending.
Another way to create the same index, but in descending order, is:
db.customers.createIndex({username: -1})
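As a rough illustration of what the 1 and -1 values mean, the sketch below sorts the keys of a few hypothetical documents in each direction. The sortedKeys helper and the sample data are assumptions for this example and do not reflect how MongoDB stores an index internally.

```javascript
// Hypothetical sample data for the illustration.
const sampleCustomers = [
  { username: "customer2" },
  { username: "customer3" },
  { username: "customer1" },
];

// direction is 1 (ascending) or -1 (descending), as in createIndex.
function sortedKeys(docs, field, direction) {
  return docs
    .map((doc) => doc[field])
    .sort((a, b) => direction * (a < b ? -1 : a > b ? 1 : 0));
}

const ascendingKeys = sortedKeys(sampleCustomers, "username", 1);
const descendingKeys = sortedKeys(sampleCustomers, "username", -1);

// Reading the ascending keys backwards yields the descending order.
const ascendingReadBackwards = [...ascendingKeys].reverse();
```

The last line hints at why the direction rarely matters for a single field index: an index sorted in one direction can be traversed in the opposite direction. The direction becomes more relevant for compound indexes, where the fields can be sorted in different directions.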
In the following query, MongoDB will use the index created on the username field to reduce the number of documents in the customers collection that it should inspect:
db.customers.find({username: "customer1"})
Besides creating indexes on string or number fields in the collection's documents, we can create an index on a field of an embedded document. The following command creates an index on the state field of the embedded address document:
db.customers.createIndex({"address.state": 1})
Therefore, queries such as this will use the created index:
db.customers.find({"address.state": "RJ"})
While a bit more complex, we can also create an index of the entire embedded document:
db.customers.createIndex({address: 1})
The following query will use the index:
db.customers.find( { "address" : { "street" : "Street 1", "zipcode" : "87654321", "state" : "RJ" } } )
But neither of these queries will use it:
db.customers.find({state: "RJ"})
db.customers.find({address: {zipcode: "87654321"}})
This happens because in order to match an embedded document, we have to match exactly the entire document, including the field order. The following query will not use the index either:
db.customers.find( { "address" : { "state" : "RJ", "street" : "Street 1", "zipcode" : "87654321" } } )
Although the document contains all the fields, these are in a different order.
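The exact-match rule for embedded documents can be sketched in plain JavaScript. In this example, JSON.stringify is used as a stand-in for MongoDB's comparison of embedded documents because it also respects the order in which the fields were written; the matchesEmbedded helper is hypothetical.

```javascript
// The address as it is stored in the customers collection.
const storedAddress = { street: "Street 1", zipcode: "87654321", state: "RJ" };

// Assumption for this sketch: serializing both sides and comparing the
// strings mimics an exact match that is sensitive to field order.
function matchesEmbedded(stored, criteria) {
  return JSON.stringify(stored) === JSON.stringify(criteria);
}

// Same fields, same order.
const sameOrder = matchesEmbedded(storedAddress, {
  street: "Street 1",
  zipcode: "87654321",
  state: "RJ",
});

// Same fields, different order.
const differentOrder = matchesEmbedded(storedAddress, {
  state: "RJ",
  street: "Street 1",
  zipcode: "87654321",
});

// Only one of the fields.
const partialDocument = matchesEmbedded(storedAddress, { zipcode: "87654321" });
```

Only sameOrder evaluates to true. The reordered and the partial criteria both fail the comparison, which mirrors the behavior described above.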
Before moving on to the next type of index, let's review a concept that you learned in Chapter 3, Querying Documents: the _id field. Every new document created in a collection should have an _id field. If we do not specify it, MongoDB automatically creates one of type ObjectId for us. Furthermore, every collection automatically gets a unique ascending index on the _id field. That being said, we can state that the _id field is the document's primary key.
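The primary key behavior of _id can be pictured with a small in-memory model. The TinyCollection class below is hypothetical, and its generated identifiers are simple strings rather than real ObjectId values; it only illustrates the two rules stated above, namely that a missing _id is generated and that a duplicate _id is rejected.

```javascript
// Hypothetical in-memory model; not MongoDB's implementation.
class TinyCollection {
  constructor() {
    this.byId = new Map(); // plays the role of the unique _id index
    this.nextId = 1;
  }

  insert(doc) {
    // Generate an identifier when the caller did not provide one.
    const _id = doc._id !== undefined ? doc._id : `generated-${this.nextId++}`;
    if (this.byId.has(_id)) {
      throw new Error(`duplicate key: _id ${_id}`);
    }
    this.byId.set(_id, { ...doc, _id });
    return _id;
  }

  findById(_id) {
    return this.byId.get(_id);
  }
}

const tinyCustomers = new TinyCollection();
const generatedId = tinyCustomers.insert({ username: "customer1" });
const explicitId = tinyCustomers.insert({ _id: "my-key", username: "customer2" });
```

Inserting a second document with the _id value "my-key" into tinyCustomers throws an error, in the same spirit as the unique index that MongoDB maintains on _id.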
In MongoDB, we can create an index that holds values for more than one field. This kind of index is called a compound index. There is no big difference between a single field index and a compound index; the biggest difference is in the sort order. Before we move on to the particularities of compound indexes, let's use the customers collection to create our first compound index:
{ "_id" : ObjectId("54aecd26867124b88608b4c9"), "username" : "customer1", "email" : "[email protected]", "password" : "b1c5098d0c6074db325b0b9dddb068e1", "age" : 25, "address" : { "street" : "Street 1", "zipcode" : "87654321", "state" : "RJ" } }
We can imagine that an application that wants to authenticate a customer uses the username and password fields together in a query like this:
db.customers.find( { username: "customer1", password: "b1c5098d0c6074db325b0b9dddb068e1" } )
To enable better performance when executing this query, we can create an index on both the username and password fields:
db.customers.createIndex({username: 1, password: 1})
Nevertheless, for the following queries, does MongoDB use the compound index?
// Query 1
db.customers.find({username: "customer1"})
// Query 2
db.customers.find({password: "b1c5098d0c6074db325b0b9dddb068e1"})
// Query 3
db.customers.find( { password: "b1c5098d0c6074db325b0b9dddb068e1", username: "customer1" } )
The answer is yes for Query 1 and Query 3. As mentioned before, the order of the fields is very important when creating a compound index. The index created will have references to the documents sorted by the username field and, within each username entry, sorted by password. Thus, a query with only the password field as the criteria will not use the index. Query 3 still qualifies because the order in which the fields appear in the query does not matter; only their order in the index does.
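The sort order of a compound index can be sketched as follows. The entries and the compareBy and isSorted helpers are hypothetical, and the passwords are shortened placeholders; the point is only to show that the second field is sorted within each value of the first field, not across the whole index.

```javascript
// Hypothetical index entries (passwords shortened for readability).
const entries = [
  { username: "customer2", password: "aaa" },
  { username: "customer1", password: "zzz" },
  { username: "customer1", password: "mmm" },
];

// Compare by each field in turn; direction is 1 or -1 as in createIndex.
function compareBy(fields) {
  return (a, b) => {
    for (const [field, direction] of fields) {
      if (a[field] < b[field]) return -direction;
      if (a[field] > b[field]) return direction;
    }
    return 0;
  };
}

function isSorted(values) {
  return values.every((value, i) => i === 0 || values[i - 1] <= value);
}

// Equivalent in spirit to the index {username: 1, password: 1}.
const compoundEntries = [...entries].sort(
  compareBy([["username", 1], ["password", 1]])
);

const usernamesInIndexOrder = compoundEntries.map((entry) => entry.username);
const passwordsInIndexOrder = compoundEntries.map((entry) => entry.password);
```

Here usernamesInIndexOrder is sorted, so a search on username can jump straight to the right place. passwordsInIndexOrder is not sorted as a whole, so a search on password alone would have to look at every entry.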
Let's assume for a moment that we have the following index in the customers collection:
db.customers.createIndex( { "address.state":1, "address.zipcode": 1, "address.street": 1 })
You might be asking which queries will use our new compound index. Before answering that question, we need to understand a compound index concept in MongoDB: the prefix. A prefix of a compound index is a subset of the indexed fields that starts at the beginning of the index definition. As its name suggests, these are the fields that take precedence over the other fields in the index. In our example, both {"address.state":1} and {"address.state":1, "address.zipcode": 1} are index prefixes.
A query that has any index prefix will use the compound index. Therefore, we can deduce that:
Queries that include the address.state field will use the compound index
Queries that include the address.state and address.zipcode fields will also use the compound index
Queries with address.state, address.zipcode and address.street will also use the compound index
Queries with address.state and address.street will also use the compound index
The compound index will not be used on queries that:
Have only the address.zipcode field
Have only the address.street field
Have only the address.zipcode and address.street fields
We should notice that, although a query that has both the address.state and address.street fields uses the index, we could achieve better performance in this query with an index that is dedicated to these two fields. This is explained by the fact that the compound index is sorted first by address.state, then by address.zipcode, and finally by address.street. Because address.zipcode sits between the two queried fields, MongoDB can narrow the search only by address.state and must then inspect the entries of every zipcode in that state to find the street.
So, for this query:
db.customers.find( { "address.state": "RJ", "address.street": "Street 1" } )
It would be more efficient if we have this index:
db.customers.createIndex({"address.state": 1, "address.street": 1})
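The prefix rule can be expressed as a small function. The usablePrefix helper below is hypothetical and deliberately simplified: it considers only which fields appear in the query criteria and ignores operators, sorting, and everything else that MongoDB's query planner takes into account.

```javascript
// Returns the leading index fields that the query criteria cover.
// An empty result means that the query matches no index prefix.
function usablePrefix(indexFields, queryFields) {
  const queried = new Set(queryFields);
  const prefix = [];
  for (const field of indexFields) {
    if (!queried.has(field)) break; // a gap ends the usable prefix
    prefix.push(field);
  }
  return prefix;
}

const threeFieldIndex = ["address.state", "address.zipcode", "address.street"];
const twoFieldIndex = ["address.state", "address.street"];

const stateAndStreet = ["address.state", "address.street"];

// Only address.state can be used: address.zipcode is missing from the query.
const onThreeFieldIndex = usablePrefix(threeFieldIndex, stateAndStreet);

// Both fields can be used with the dedicated index.
const onTwoFieldIndex = usablePrefix(twoFieldIndex, stateAndStreet);

// No prefix at all: the query does not include address.state.
const withoutState = usablePrefix(threeFieldIndex, [
  "address.zipcode",
  "address.street",
]);
```

Under these assumptions, the query on address.state and address.street covers one field of the three-field index but both fields of the two-field index, which is why the second index serves this query better.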
Another way to create indexes in MongoDB is to create an index on an array field, known as a multikey index. These indexes can hold arrays of primitive values, such as strings and numbers, or even arrays of documents.
We must be particularly attentive when creating multikey indexes, especially compound multikey indexes: it is not possible to create a compound index on two array fields.
Consider the customers collection with documents like this one:
{ "_id" : ObjectId("54aecd26867124b88608b4c9"), "username" : "customer1", "email" : "[email protected]", "password" : "b1c5098d0c6074db325b0b9dddb068e1", "age" : 25, "address" : { "street" : "Street 1", "zipcode" : "87654321", "state" : "RJ" }, "followedSellers" : [ "seller1", "seller2", "seller3" ], "wishList" : [ { "sku" : 123, "seller" : "seller1" }, { "sku" : 456, "seller" : "seller2" }, { "sku" : 678, "seller" : "seller3" } ] }
We can create the following indexes for this collection:
db.customers.createIndex({followedSellers: 1})
db.customers.createIndex({wishList: 1})
db.customers.createIndex({"wishList.sku": 1})
db.customers.createIndex({"wishList.seller": 1})
But the following index cannot be created:
db.customers.createIndex({followedSellers: 1, wishList: 1})
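One way to picture a multikey index is that it holds one entry per array element. The indexEntries helper below is a hypothetical sketch of that idea; it handles only top-level fields (no dotted paths) and refuses to combine two array fields, which mirrors the restriction described above.

```javascript
// Hypothetical, trimmed-down customer document.
const customer = {
  username: "customer1",
  followedSellers: ["seller1", "seller2", "seller3"],
  wishList: [
    { sku: 123, seller: "seller1" },
    { sku: 456, seller: "seller2" },
  ],
};

// Produce the index entries for one document: one entry per element
// of the indexed array field, combined with the scalar fields.
function indexEntries(doc, fields) {
  const arrayFields = fields.filter((field) => Array.isArray(doc[field]));
  if (arrayFields.length > 1) {
    throw new Error(`cannot index parallel arrays: ${arrayFields.join(", ")}`);
  }
  const [arrayField] = arrayFields;
  if (arrayField === undefined) {
    return [fields.map((field) => doc[field])];
  }
  return doc[arrayField].map((element) =>
    fields.map((field) => (field === arrayField ? element : doc[field]))
  );
}

const singleFieldEntries = indexEntries(customer, ["followedSellers"]);
const compoundEntriesWithArray = indexEntries(customer, [
  "username",
  "followedSellers",
]);
```

In this model, the document contributes three entries to an index on followedSellers, one per seller. Asking for entries on both followedSellers and wishList throws an error, because every combination of elements from the two arrays would otherwise need its own entry.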
Since version 2.4, MongoDB gives us the chance to create indexes that help us with text search. Although there is a wide variety of specialized tools for this, such as Apache Solr, Sphinx, and ElasticSearch, most relational and NoSQL databases offer full text search natively.
It is possible to create a text index on a string field or on an array of strings in a collection. For the following examples, we will use the products collection that we also used in Chapter 3, Querying Documents, but with some modifications:
{ "_id" : ObjectId("54837b61f059b08503e200db"), "name" : "Product 1", "description" : "Product 1 description", "price" : 10, "supplier" : { "name" : "Supplier 1", "telephone" : "+552199998888" }, "review" : [ { "customer" : { "email" : "[email protected]" }, "stars" : 5 } ], "keywords" : [ "keyword1", "keyword2", "keyword3" ] }
We can create a text index just by specifying text as the value of the field in the createIndex method:
db.products.createIndex({name: "text"})
db.products.createIndex({description: "text"})
db.products.createIndex({keywords: "text"})
Any of the preceding commands could create a text index on the products collection. However, MongoDB has a limitation: we can only have one text index per collection. Thus, only one of the previous commands could be executed for the products collection.
Despite the limitation to create only one text index per collection, it is possible to create a compound text index:
db.products.createIndex({name: "text", description: "text"})
The preceding command creates a text index on the name and description fields.
For a query to use a text index, we should use the $text operator in it. To better understand how to create an effective query, it is good to know how the indexes are created, because the same process is applied to the search string when a query uses the $text operator.
To sum up the process, we can split it into three phases:
Tokenization
Stemming, that is, the removal of suffixes and/or prefixes
Removal of stop words
In order to optimize our queries, we can specify the language we are using in our text fields, and consequently in our text index, so that MongoDB uses the word lists and rules of that language in all three phases of the indexing process.
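The three phases of text indexing (tokenization, stemming, and stop word removal) can be approximated in a few lines of JavaScript. Everything in this sketch is an assumption made for illustration: the stop word list is a tiny hypothetical sample, the stem function strips only two English suffixes, the tokenizer handles only ASCII letters and digits, and stop words are removed before stemming for simplicity. MongoDB's real language rules are considerably more elaborate.

```javascript
// Hypothetical, very small English stop word list.
const STOP_WORDS = new Set(["a", "an", "and", "the", "of", "for", "is"]);

// Phase 1: split the text into lowercase tokens.
function tokenize(text) {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((token) => token.length > 0);
}

// Naive stemming: strip "ing" or a trailing "s" from longer words.
function stem(token) {
  for (const suffix of ["ing", "s"]) {
    if (token.endsWith(suffix) && token.length - suffix.length >= 3) {
      return token.slice(0, -suffix.length);
    }
  }
  return token;
}

// Tokenize, drop stop words, then stem what is left.
function textIndexTerms(text) {
  return tokenize(text)
    .filter((token) => !STOP_WORDS.has(token))
    .map(stem);
}

const documentTerms = textIndexTerms("Searching the products of Supplier 1");
const queryTerms = textIndexTerms("product search");

// A document is a candidate when it contains a processed query term.
const sharedTerms = queryTerms.filter((term) => documentTerms.includes(term));
```

Because the same processing is applied to the stored text and to the search string, the query "product search" ends up sharing terms with a document that contains "Searching" and "products", even though neither word appears in that exact form.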
Since its 2.6 version, MongoDB supports the following languages:
da or danish
nl or dutch
en or english
fi or finnish
fr or french
de or german
hu or hungarian
it or italian
nb or norwegian
pt or portuguese
ro or romanian
ru or russian
es or spanish
sv or swedish
tr or turkish
An example of an index creation with language could be:
db.products.createIndex({name: "text"},{ default_language: "pt"})
We can also opt to not use any language, by creating the index with the none value:
db.products.createIndex({name: "text"},{ default_language: "none"})
By using the none value, MongoDB will perform only simple tokenization; it will not apply stemming or load any stop words list.
When we decide to use a text index, we should be doubly careful. Every detail has side effects on the way we design our documents. In previous versions of MongoDB, before creating a text index, we had to change the allocation method for all collections to usePowerOf2Sizes. This is because text indexes are considered larger indexes.
Another major concern arises at the moment we create the index. Depending on the size of the existing collection, the index could be very large, and building a very large index takes a lot of time. Thus, it is better to schedule this process for a more convenient time.
Finally, we have to predict the impact that the text indexes will have on our write operations. This happens because, for each new record created in our collection, there will also be an entry created in the index referencing all the indexed value fields.
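To get a feeling for this write overhead, the following sketch counts how many text index entries a single product document might produce. It rests on simplifying assumptions: terms are split on whitespace only, no stemming or stop word removal is applied, and one entry per unique term per document is counted. The termsOf and countTextIndexEntries helpers are hypothetical.

```javascript
// Hypothetical, trimmed-down product document.
const product = {
  name: "Product 1",
  description: "Product 1 description",
  keywords: ["keyword1", "keyword2", "keyword3"],
};

// Split a string, or every string of an array, into lowercase terms.
function termsOf(value) {
  const strings = Array.isArray(value) ? value : [value];
  return strings.flatMap((text) =>
    String(text).toLowerCase().split(/\s+/).filter(Boolean)
  );
}

// Count the unique terms that one insert adds to the text index.
function countTextIndexEntries(doc, fields) {
  const uniqueTerms = new Set();
  for (const field of fields) {
    if (doc[field] === undefined) continue;
    for (const term of termsOf(doc[field])) uniqueTerms.add(term);
  }
  return uniqueTerms.size;
}

const entriesForNameAndDescription = countTextIndexEntries(product, [
  "name",
  "description",
]);
const entriesWithKeywords = countTextIndexEntries(product, [
  "name",
  "description",
  "keywords",
]);
```

Even for this tiny document, one insert translates into several index entries, and the number grows with every text field that is added to the index. For documents with long descriptions, this extra work per write can become noticeable.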