Chapter 4. Indexing

As you have seen with the subject of relational databases, indexes are important structures when we think of a performance boost. In fact, indexes are so important that for most database administrators, they are a critical tool in their search for the continuous improvement of database performance.

In NoSQL databases such as MongoDB, indexing is part of a bigger strategy that will allow us to achieve many gains in performance and assign important behaviors to our database, which will be essential to the data model's maintenance.

This happens because we can have indexes with very special properties in MongoDB. For example, we can define an index of a date typed field that will control when a document should be removed from the collection.

So, in this chapter we will see:

  • Indexing documents
  • Index types
  • Special index properties

Indexing documents

Out of all the subjects we have been discussing in this book so far, this is where we will be the most at ease. The index concept is present in almost every relational database, so if you have any previous basic knowledge on the matter, you will most likely have no difficulty in this chapter.

But in case you feel that you are not familiar enough with the concept of indexes, an easy way to understand them is to draw a parallel with books. Suppose that we have a book with an index like this:

Indexing documents

With this in hand, if we decide to read about the Internet, we know that on page 4, we will find information on the subject. On the other hand, how would we be able to find information we are looking for without the page number? The answer is quite simple: by going through the entire book, page by page, until we find the word "Internet."

As you might already know, indexes are data structures that hold part of the data from our main data source. In relational databases, indexes hold parts of a table, while in MongoDB, since indexes are on a collection level, these will hold part of a document. Similar to relational databases, indexes use a B-Tree data structure at implementation level.

Depending on our application's requirements, we can create indexes of fields or fields of embedded documents. When we create an index, it will hold a sorted set of values of the fields we choose.

Thus, when we execute a query, if there is an index that covers the query criteria, MongoDB will use the index to limit the number of documents to be scanned.

We have the customers collection that we used in Chapter 3, Querying Documents, which contains these documents:

{
   "_id" : ObjectId("54aecd26867124b88608b4c9"),
   "username" : "customer1",
   "email" : "[email protected]",
   "password" : "b1c5098d0c6074db325b0b9dddb068e1"
}

We can create an index in the mongo shell on the username field, by using the createIndex method:

db.customers.createIndex({username: 1})

The following query will use the previously created index:

db.customers.find({username: "customer1"})

Note

Since Version 3.0.0, the ensureIndex method is deprecated and is an alias for the createIndex method.

We could state that this is the simplest way to create and use an index in MongoDB. In addition to this, we can create indexes on multikey fields or in embedded documents' fields, for instance.

In the next section, we will go through all these index types.

Indexing a single field

As we already stated in the last section, the simplest way to create an index on MongoDB is to do so in a single field. The index could be created on a field of any type in the collection of documents.

Consider the customers collection we used before, with some modification to work in this section:

{
   "_id" : ObjectId("54aecd26867124b88608b4c9"),
   "username" : "customer1",
   "email" : "[email protected]",
   "password" : "b1c5098d0c6074db325b0b9dddb068e1",
   "age" : 25,
   "address" : {

      "street" : "Street 1",
      "zipcode" : "87654321",
      "state" : "RJ"

   }
}

The following command creates an ascending index in the username field:

db.customers.createIndex({username: 1})

In order to create an index in MongoDB, we use the createIndex method. In the preceding code, we just passed a single document as a parameter to the createIndex method. The document {username: 1} contains a reference to the field that the index should be creating and the order: 1 for ascending or -1 for descending.

Another way to create the same index, but in descending order, is:

db.customers.createIndex({username: -1})

In the following query, MongoDB will use the index created in the username field to reduce the number of documents in the customers collection that it should inspect:

db.customers.find({username: "customer1"})

Besides the creation of indexes on a string or the number fields in the collection document, we could create an index of a field in an embedded document. Therefore, queries such as this will use the created index:

db.customers.createIndex({"address.state": 1})

The following code creates an index of the state field of the embedded address document:

db.customers.find({"address.state": "RJ"})

While a bit more complex, we can also create an index of the entire embedded document:

db.customers.createIndex({address: 1})

The following query will use the index:

db.customers.find(
{
   "address" : 
   { 
      "street" : "Street 1", 
      "zipcode" : "87654321", 
      "state" : "RJ"
   }
}
)

But none of these queries will do this:

db.customers.find({state: "RJ"})

db.customers.find({address: {zipcode: "87654321"}})

This happens because in order to match an embedded document, we have to match exactly the entire document, including the field order. The following query will not use the index either:

db.customers.find(
{
   "address" : 
   { 
      "state" : "RJ", 
      "street" : "Street 1", 
      "zipcode" : "87654321" 
   }
}
)

Although the document contains all the fields, these are in a different order.

Before moving on to the next type of index, let's review a concept that you learned in Chapter 3, Querying Documents, the _id field. For every new document created in a collection, we should specify the _id field. If we do not specify it, MongoDB automatically creates one ObjectId typed for us. Furthermore, every collection automatically creates a unique ascending index of the _id field. That being said, we can state that the _id field is the document's primary key.

Indexing more than one field

In MongoDB, we can create an index that holds values for more than one field. We should call this kind of index a compound index. There is no big difference between a single field index and a compound index. The biggest difference is in the sort order. Before we move on to the particularities of compound indexes, let's use the customers collection to create our first compound index:

{
   "_id" : ObjectId("54aecd26867124b88608b4c9"),
   "username" : "customer1",
   "email" : "[email protected]",
   "password" : "b1c5098d0c6074db325b0b9dddb068e1",
   "age" : 25,
   "address" : {
      "street" : "Street 1",
      "zipcode" : "87654321",
      "state" : "RJ"
   }
}

We can imagine that an application that wants to authenticate a customer uses the username and password fields together in a query like this:

db.customers.find(
{
username: "customer1", 
password: "b1c5098d0c6074db325b0b9dddb068e1"
}
)

To enable better performance when executing this query, we can create an index of both the username and password fields:

db.customers.createIndex({username: 1, password: 1})

Nevertheless, for the following queries, does MongoDB use the compound index?

#Query 1
db.customers.find({username: "customer1"})
#Query 2
db.customers.find({password: "b1c5098d0c6074db325b0b9dddb068e1"})
#Query 3
db.customers.find(
{
   password: "b1c5098d0c6074db325b0b9dddb068e1", 
   username: "customer1"
}
)

The answer is yes for Query 1 and Query 3. As mentioned before, the order is very important in the creation of a compound index. The index created will have references to the documents sorted by the username field, and within each username entry, sorted by password entries. Thus, a query with only the password field as the criteria will not use the index.

Let's assume for a moment that we have the following index in the customers collection:

db.customers.createIndex(
{
   "address.state":1, 
   "address.zipcode": 1, 
   "address.street": 1
})

You might be asking which queries will use our new compound index? Before answering that question, we need to understand a compound index concept in MongoDB: the prefix. The prefix in a compound index is a subset of the indexed fields. As its name suggests, it is the fields that take precedence over other fields in the index. In our example, both {"address.state":1} and {"address.state":1, "address.zipcode": 1} are index prefixes.

A query that has any index prefix will use the compound index. Therefore, we can deduce that:

  • Queries that include the address.state field will use the compound index
  • Queries that include both the address.state and address.zipcode fields will also use the compound index
  • Queries with address.state, address.zipcode and address.street will also use the compound index
  • Queries with both address.state and address.street will also use the compound index

The compound index will not be used on queries that:

  • Have only the address.zipcode field
  • Have only the address.street field
  • Have both the address.zipcode and address.street fields

Note

We should notice that, despite a query that has both address.state and address.street fields using the index, we could achieve a better performance in this query if we have single indexes for each field. This is explained by the fact that the compound index will be first sorted by address.state, followed by a sort on the address.zipcode field, and finally a sort on the address.street field. Thus, it is much more expensive for MongoDB to inspect this index than to inspect the other two indexes individually.

So, for this query:

db.customers.find(
{
   "address.state": "RJ", 
   "address.street": "Street 1"
}
)

It would be more efficient if we have this index:

db.customers.createIndex({"address.state": 1, "address.street": 1})

Indexing multikey fields

Another way to create indexes in MongoDB is to create an index of an array field. These indexes can hold arrays of primitive values, such as strings and numbers, or even arrays of documents.

We must be particularly attentive while creating multikey indexes. Especially when we want to create a compound multikey index. It is not possible to create a compound index of two array fields.

Note

The main reason why we could not create an index of parallel arrays is because they will require the index to include an entry in the Cartesian product of the compound keys, which will result in a large index.

Consider the customers collection with documents like this one:

{
   "_id" : ObjectId("54aecd26867124b88608b4c9"),
   "username" : "customer1",
   "email" : "[email protected]",
   "password" : "b1c5098d0c6074db325b0b9dddb068e1",
   "age" : 25,
   "address" : {
      "street" : "Street 1",
      "zipcode" : "87654321",
      "state" : "RJ"
   },
   "followedSellers" : [
      "seller1",
      "seller2",
      "seller3"
   ],
   "wishList" : [
      {
         "sku" : 123,
         "seller" : "seller1"
      },
      {
         "sku" : 456,
         "seller" : "seller2"
      },
      {
         "sku" : 678,
         "seller" : "seller3"
      }
   ]
}

We can create the following indexes for this collection:

db.customers.createIndex({followedSellers: 1})

db.customers.createIndex({wishList: 1})

db.customers.createIndex({"wishList.sku": 1})

db.customers.createIndex({"wishList.seller": 1})

But the following index cannot be created:

db.customers.createIndex({followedSellers: 1, wishList: 1}

Indexing for text search

Since its 2.4 version, MongoDB gives us the chance to create indexes that will help us in a text search. Although there are a wide variety of specialized tools for this, such Apache Solr, Sphinx, and ElasticSearch, most of the relational and NoSQL databases have full text searching natively.

It is possible to create a text index of a string or an array of string fields in a collection. For the following examples, we will use the products collection that we also used in Chapter 3, Querying Documents, but with some modification:

{ 
   "_id" : ObjectId("54837b61f059b08503e200db"), 
   "name" : "Product 1", 
   "description" : 
   "Product 1 description", 
   "price" : 10, 
   "supplier" : { 
      "name" : "Supplier 1", 
      "telephone" : "+552199998888" 
   }, 
   "review" : [ 
   { 
      "customer" : { 
         "email" : "[email protected]" 
      }, 
      "stars" : 5 
   }
   ],
   "keywords" : [ "keyword1", "keyword2", "keyword3" ] 
}

We can create a text index just by specifying the text parameter in the createIndex method:

db.products.createIndex({name: "text"})

db.products.createIndex({description: "text"})

db.products.createIndex({keywords: "text"})

All the preceding commands could create a text index of the products collection. But, MongoDB has a limitation, in that we can only have one text index per collection. Thus, only one of the previous commands could be executed for the products collection.

Despite the limitation to create only one text index per collection, it is possible to create a compound text index:

db.products.createIndex({name: "text", description: "text"})

The preceding command creates a text index field for the name and description fields.

Note

A common and useful way of creating a text index of a collection is to create an index for all text fields of the collection. There is a special syntax for creating this index, which you can see as follows:

db.products.createIndex({"$**","text"})

For a query to use a text index, we should use the $text operator in it. And, to better understand how to create an effective query, it is good to know how the indexes are created. As a matter of fact, the same process is used to execute the query using the $text operator.

To sum up the process, we can split it into three phases:

  • Tokenization
  • Removal of suffix and/or prefix, or stemming
  • Removal of stop words

In order to optimize our queries, we can specify the language we are using in our text fields, and consequently in our text index, so that MongoDB will use a list of words in all three phases of the indexing process.

Since its 2.6 version, MongoDB supports the following languages:

  • da or danish
  • nl or dutch
  • en or english
  • fi or finnish
  • fr or french
  • de or german
  • hu or hungarian
  • it or italian
  • nb or norwegian
  • pt or portuguese
  • ro or romanian
  • ru or russian
  • es or spanish
  • sv or swedish
  • tr or turkish

An example of an index creation with language could be:

db.products.createIndex({name: "text"},{ default_language: "pt"})

We can also opt to not use any language, by just creating the index with a none value:

db.products.createIndex({name: "text"},{ default_language: "none"})

By using the none value option, MongoDB will simply perform tokenization and stemming; it will not load any stop words list.

When we decide to use a text index, we should always double our attention. Every single detail will have a side effect on the way we design our documents. In previous versions of MongoDB, before creating a text index, we should change the allocation method for all collections to usePowerOf2Sizes. This is because text indexes are considered larger indexes.

Another major concern occurs at the moment we create the index. Depending on the size of the existing collection, the index could be very large, and to create a very large index we will need a lot of time. Thus, it is better to schedule this process to take place at a more timely opportunity.

Finally, we have to predict the impact that the text indexes will have on our write operations. This happens because, for each new record created in our collection, there will also be an entry created in the index referencing all the indexed value fields.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.116.19.75