© Subhashini Chellappan and Dharanitharan Ganesan 2020
S. Chellappan, D. GanesanMongoDB Recipeshttps://doi.org/10.1007/978-1-4842-4891-1_4

4. Indexes

Subhashini Chellappan1  and Dharanitharan Ganesan2
(1)
Bangalore, India
(2)
Krishnagiri, Tamil Nadu, India
 
In Chapter 3, we discussed data modeling patterns and the aggregation framework. In this chapter, we to discuss the following topics:
  • Indexes.

  • Types of indexes.

  • Index properties.

  • The various indexing strategies to be considered.

Indexes are used to improve the performance of the query. Without indexes, MongoDB must search the entire collection to select those documents that match the query statement. MongoDB therefore uses indexes to limit the number of documents that it must scan.

Indexes are special data structures that store a small portion of the collection’s data set in an easy-to-transform format. The index stores a set of fields ordered by the value of the field. This ordering helps to improve the performance of equality matches and range-based query operations.

MongoDB defines indexes at the collection level and indexes can be created on any field of the document. MongoDB creates an index for the _id field by default.

Note

MongoDB creates a default unique index _id field, which helps to prevent inserting two documents with the same value of the _id field.

Recipe 4-1. Working with Indexes

In this recipe, we are going to discuss how to work with indexes in MongoDB.

Problem

You want to create an index.

Solution

Use the following command.
db.collection.createIndex()

How It Works

Let’s follow the steps in this section to create an index.

Consider the following employee collection.
db.employee.insert({empId:1,empName:"John",state:"KA",country:"India"})
db.employee.insert({empId:2,empName:"Smith",state:"CA",country:"US"})
db.employee.insert({empId:3,empName:"James",state:"FL",country:"US"})
db.employee.insert({empId:4,empName:"Josh",state:"TN",country:"India"})
db.employee.insert({empId:5,empName:"Joshi",state:"HYD",country:"India"})

Step 1: Create an Index

To create single-field index on the empId field, use the following command.
db.employee.createIndex({empId:1})

Here, the parameter value “1” indicates that empId field values will be stored in ascending order.

Here is the output,
> db.employee.createIndex({empId:1})
{
        "createdCollectionAutomatically" : false,
        "numIndexesBefore" : 1,
        "numIndexesAfter" : 2,
        "ok" : 1
}
>
To create an index on multiple fields, known as a compound index, use this command.
db.employee.createIndex({empId:1,empName:1})
Here is the output,
> db.employee.createIndex({empId:1,empName:1})
{
        "createdCollectionAutomatically" : false,
        "numIndexesBefore" : 2,
        "numIndexesAfter" : 3,
        "ok" : 1
}
>
To display a list of indexes, this is the syntax.
db.employee.getIndexes()
Here is the output,
> db.employee.getIndexes()
[
        {
                "v" : 2,
                "key" : {
                        "_id" : 1
                },
                "name" : "_id_",
                "ns" : "test.employee"
        },
        {
                "v" : 2,
                "key" : {
                        "empId" : 1
                },
                "name" : "empId_1",
                "ns" : "test.employee"
        },
        {
                "v" : 2,
                "key" : {
                        "empId" : 1,
                        "empName" : 1
                },
                "name" : "empId_1_empName_1",
                "ns" : "test.employee"
        }
]
>
To drop a compound index, use the following command.
db.employee.dropIndex({empId:1,empName:1})
Here is the output,
> db.employee.dropIndex({empId:1,empName:1})
{ "nIndexesWas" : 3, "ok" : 1 }
>
To drop all the indexes, use this command.
db.employee.dropIndexes()
Here is the output,
> db.employee.dropIndexes()
{
        "nIndexesWas" : 2,
        "msg" : "non-_id indexes dropped for collection",
        "ok" : 1
}
>

Note

We can’t drop _id indexes. MongoDB creates an index for the _id field by default.

Recipe 4-2. Index Types

In this recipe, we are going to discuss various types of indexes.

Problem

You want to create different types of indexes.

Solution

Use the following command.
db.collection.createIndex()

How It Works

Let’s follow the steps in this section to create the different types of indexes shown in Figure 4-1.
../images/475461_1_En_4_Chapter/475461_1_En_4_Fig1_HTML.png
Figure 4-1

Types of indexes

Step 1: Multikey Index

Multikey indexes are useful to create an index for a field that holds an array value. MongoDB creates an index key for each element in the array.

Consider the below collection, you can create if the collection is not available.
db.employeeproject.insert({empId:1001,empName:"John",projects:["Hadoop","MongoDB"]})
db.employeeproject.insert({empId:1002,empName:"James",projects:["MongoDB","Spark"]})
To create an index on the projects field, use the following command.
db.employeeproject.createIndex({projects:1})
Here is the output,
> db.employeeproject.createIndex({projects:1})
{
        "createdCollectionAutomatically" : false,
        "numIndexesBefore" : 1,
        "numIndexesAfter" : 2,
        "ok" : 1
}

Note

You can’t create a compound multikey index.

Step 2: Text Indexes

MongoDB provides text indexes to support text search queries on string content. You can create a text index on a field that takes as its value a string or an array of string elements.

Consider this post collection.
db.post.insert({
   "post_text": "Happy Learning",
   "tags": [
      "mongodb",
      "10gen"
   ]
})
To create text indexes, use the following command.
db.post.createIndex({post_text:"text"})

This command creates a text index for the field post_text.

Here is the output,
> db.post.createIndex({post_text:"text"})
{
        "createdCollectionAutomatically" : false,
        "numIndexesBefore" : 1,
        "numIndexesAfter" : 2,
        "ok" : 1
}
>
To perform a search, use the command shown here.
db.post.find({$text:{$search:"Happy"}})
Here is the output,
> db.post.find({$text:{$search:"Happy"}})
{ "_id" : ObjectId("5bb215286d8d957bcedc225e"), "post_text" : "Happy Learning", "tags" : [ "mongodb", "10gen" ] }
>

Step 3: Hashed Indexes

The size of the indexes can be reduced with the help of hashed indexes. Hashed indexes store the hashes of the values of the indexed field. Hashed indexes support sharding using hashed shard keys. In hashed-based sharding, a hashed index of a field is used as the shard key to partition data across the sharded cluster. We discuss sharding in Chapter 5.

Hashed indexes do not support multikey indexes.

Consider the user collection shown here.
db.user.insert({userId:1,userName:"John"})
db.user.insert({userId:2,userName:"James"})
db.user.insert({userId:3,userName:"Jack"})
To create a hashed-based index on the field userId, use the following command.
db.user.createIndex( { userId: "hashed" } )
Here is the output,
> db.user.createIndex( { userId: "hashed" } )
{
        "createdCollectionAutomatically" : false,
        "numIndexesBefore" : 1,
        "numIndexesAfter" : 2,
        "ok" : 1
}
>

Step 4: 2dsphere Index

The 2dsphere index is useful to return queries on geospatial data.

Consider the schools collection given here.
db.schools.insert( {
   name: "St.John's School",
   location: { type: "Point", coordinates: [ -73.97, 40.77 ] },
   } );
db.schools.insert( {
   name: "St.Joseph's School",
   location: { type: "Point", coordinates: [ -73.9928, 40.7193 ] },
   } );
db.schools.insert( {
   name: "St.Thomas School",
   location: { type: "Point", coordinates: [ -73.9375, 40.8303 ] },
   } );
Use the following syntax to create a 2dsphere index.
db.schools.createIndex( { location : "2dsphere" } )
The following code uses the $near operator to return documents that are at least 500 meters from and at most 1,500 meters from the specified GeoJSON point.
db.schools.find({location:{$near:{$geometry: { type: "Point",  coordinates: [ -73.9667, 40.78 ] },$minDistance: 500,$maxDistance: 1500}}})
Here is the output,
> db.schools.find({location:{$near:{$geometry: { type: "Point",  coordinates: [ -73.9667, 40.78 ] },$minDistance: 500,$maxDistance: 1500}}})
{ "_id" : ObjectId("5ca47a184b034d4cc1345f45"), "name" : "St.John's School", "location" : { "type" : "Point", "coordinates" : [ -73.97, 40.77 ] } }
>

Recipe 4-3. Index Properties

Indexes can also have properties. The index properties define certain characteristics and behaviors of an indexed field at runtime. For example, a unique index ensures the indexed fields do not support duplicates. In this recipe, we are going to discuss various index properties.

Problem

You want to work with index properties.

Solution

Use this command.
db.collection.createIndex()

How It Works

Let’s follow the steps in this section to work with index properties.

Step 1: TTL Indexes

Time to Live (TTL) indexes are single-field indexes that are used to remove documents from a collection after a certain amount of time. Data expiration is useful for certain types of information such as logs, machine-generated data, and so on.

Consider the sample collection shown here.
db.credit.insert({credit:16})
db.credit.insert({credit:18})
db.credit.insert({credit:12})
To create a TTL index on the field credit, issue the following command.
db.credit.createIndex( { credit: 1 }, { expireAfterSeconds: 35 } );
Here is the output,
> db.credit.createIndex( { credit: 1 }, { expireAfterSeconds:35 } );
{
        "createdCollectionAutomatically" : false,
        "numIndexesBefore" : 1,
        "numIndexesAfter" : 2,
        "ok" : 1
}
>

Step 2: Unique Indexes

A unique index ensures that the indexed fields do not contain any duplicate values. By default, MongoDB creates a unique index on the _id field.

Consider the following student collection.
db.student.insert({_id:1,studid:101,studname:"John"})
db.student.insert({_id:2,studid:102,studname:"Jack"})
db.student.insert({_id:3,studid:103,studname:"James"})
To create unique index for the field studId, use this command.
db.student.createIndex({"studid":1}, {unique:true})
Here is the output,
> db.student.createIndex({"studid":1}, {unique:true})
{
        "createdCollectionAutomatically" : false,
        "numIndexesBefore" : 1,
        "numIndexesAfter" : 2,
        "ok" : 1
}
>
When we try to insert the value studid:101,
db.student.insert([{_id:1,studid:101,studname:"John"}])
throws the following error message:
"errmsg" : "E11000 duplicate key error collection: test.student index: _id_ dup key: { : 1.0 }",

Step 3: Partial Indexes

Partial indexes are useful when you want to index the documents in a collection that meet a specific filter condition. The filter condition could be specified using any operators. For example, db.person.find( { age: { $gt: 15 } } ) can be used to find the documents that have an age greater than 15 in the person collection. Partial indexes reduce storage requirements and performance costs because they store only a subset of the documents.

Use the db.collection.createIndex() method with the partialFilterExpression option to create a partial index.

The partialFilterExpression option accepts a document that specifies the filter condition using:
  • equality expressions (i.e., field: value or using the $eq operator).

  • $exists: true expression.

  • $gt, $gte, $lt, and $lte expressions.

  • $type expressions.

  • $and operator at the top level only.

Consider the following documents of a person collection.
db.person.insert({personName:"John",age:16})
db.person.insert({personName:"James",age:15})
db.person.insert({personName:"John",hobbies:["sports","music"]})
To index only those documents in the person collection where the value in the age field is greater than 15, use the following command.
db.person.createIndex(  { age: 1},{partialFilterExpression: { age: { $gt: 15 }}})
The query shown here uses a partial index to return those documents in the person collection where the age value is greater than 15.
db.person.find( { age: { $gt: 15 } } )
Here is the output,
> db.person.find( { age: { $gt: 15 } } )
{ "_id" : ObjectId("5ca453954b034d4cc1345f3b"), "personName" : "John", "age" : 16 }
>

Step 4: Sparse Indexes

Sparse indexes store entries only for the documents that have the indexed field, even if it contains null values. A sparse index skips any documents that do not have the indexed field. The index is considered sparse because it does not include all the documents of a collection.

Consider the following person collection.
db.person.insert({personName:"John",age:16})
db.person.insert({personName:"James",age:15})
db.person.insert({personName:"John",hobbies:["sports","music"]})
To create a sparse index on the age field, issue this command.
db.person.createIndex( { age: 1 }, { sparse: true } );
Here is the output,
> db.person.createIndex( { age: 1 }, { sparse: true } );
{
        "createdCollectionAutomatically" : false,
        "numIndexesBefore" : 1,
        "numIndexesAfter" : 2,
        "ok" : 1
}
>
To return all documents in the collection named person using the index on the age field, use hint () to specify a sparse index.
db.person.find().hint( { age: 1 } ).count();
Here is the output,
> db.person.find().hint( { age: 1 } ).count();
2
>
To perform a correct count, use this code.
db.person.find().count();
Here is the output,
> db.person.find().count();
3
>

Note

Partial indexes determine the index entries based on the filter condition, whereas sparse indexes select the documents based on the existence of the indexed field.

Recipe 4-4. Indexing Strategies

We must follow different strategies to create the right index for our requirements. In this recipe, we are going to discuss various indexing strategies.

Problem

You want to learn about indexing strategies to ensure you are creating the right type of index for different purposes.

Solution

The best indexing strategy is determined by different factors, including
  • Type of executing query.

  • Number of read/write operations.

  • Available memory.

Figure 4-2 illustrates the different indexing strategies.
../images/475461_1_En_4_Chapter/475461_1_En_4_Fig2_HTML.png
Figure 4-2

Indexing strategies

How It Works

Let’s follow the steps in this section to work with different indexing strategies.

Step 1: Create an Index to Support Your Queries

Creating the right index to support the queries increases the query execution performance and results in great performance.

Create a single-field index if all the queries use the same single key to retrieve the documents.
> db.employee.createIndex({empId:1})
Create a multifield compound index if all the queries use more than one key (multiple filter condition) to retrieve the documents.
> db.employee.createIndex({empId:1,empName:1})

Step 2: Using an Index to Sort the Query Results

Sort operations use indexes for better performance. Indexes determine the sort order by fetching the documents based on the ordering in the index.

Sorting can be done in the following scenarios:
  • Sorting with a single-field index.

  • Sorting on multiple fields.

Sorting with a Single-Field Index
The index can support ascending or descending order on a single field while retrieving the documents.
> db.employee.createIndex({empId:1})
The preceding index can support ascending order sorting.
> db.employee.find().sort({empId:1})
Here is the output,
MongoDB Enterprise > db.employee.createIndex({empId:1})
{
        "createdCollectionAutomatically" : false,
        "numIndexesBefore" : 1,
        "numIndexesAfter" : 2,
        "ok" : 1
}
MongoDB Enterprise > db.employee.find()
{ "_id" : ObjectId("5d50ed688dcf280c50fde439"), "empId" : 1, "empName" : "John", "state" : "KA", "country" : "India" }
{ "_id" : ObjectId("5d50ed688dcf280c50fde43a"), "empId" : 2, "empName" : "Smith", "state" : "CA", "country" : "US" }
{ "_id" : ObjectId("5d50ed688dcf280c50fde43b"), "empId" : 3, "empName" : "James", "state" : "FL", "country" : "US" }
{ "_id" : ObjectId("5d50ed688dcf280c50fde43c"), "empId" : 4, "empName" : "Josh", "state" : "TN", "country" : "India" }
{ "_id" : ObjectId("5d50ed688dcf280c50fde43d"), "empId" : 5, "empName" : "Joshi", "state" : "HYD", "country" : "India" }
MongoDB Enterprise > db.employee.find().sort({empId:1})
{ "_id" : ObjectId("5d50ed688dcf280c50fde439"), "empId" : 1, "empName" : "John", "state" : "KA", "country" : "India" }
{ "_id" : ObjectId("5d50ed688dcf280c50fde43a"), "empId" : 2, "empName" : "Smith", "state" : "CA", "country" : "US" }
{ "_id" : ObjectId("5d50ed688dcf280c50fde43b"), "empId" : 3, "empName" : "James", "state" : "FL", "country" : "US" }
{ "_id" : ObjectId("5d50ed688dcf280c50fde43c"), "empId" : 4, "empName" : "Josh", "state" : "TN", "country" : "India" }
{ "_id" : ObjectId("5d50ed688dcf280c50fde43d"), "empId" : 5, "empName" : "Joshi", "state" : "HYD", "country" : "India" }
The same index can also support sorting of documents in descending order.
MongoDB Enterprise > db.employee.find().sort({empId:-1})
{ "_id" : ObjectId("5d50ed688dcf280c50fde43d"), "empId" : 5, "empName" : "Joshi", "state" : "HYD", "country" : "India" }
{ "_id" : ObjectId("5d50ed688dcf280c50fde43c"), "empId" : 4, "empName" : "Josh", "state" : "TN", "country" : "India" }
{ "_id" : ObjectId("5d50ed688dcf280c50fde43b"), "empId" : 3, "empName" : "James", "state" : "FL", "country" : "US" }
{ "_id" : ObjectId("5d50ed688dcf280c50fde43a"), "empId" : 2, "empName" : "Smith", "state" : "CA", "country" : "US" }
{ "_id" : ObjectId("5d50ed688dcf280c50fde439"), "empId" : 1, "empName" : "John", "state" : "KA", "country" : "India" }
Sorting on Multiple Fields
We can create a compund index to support sorting on multiple fields.
> db.employee.createIndex({empId:1,empName:1})
Here is the output,
MongoDB Enterprise > db.employee.createIndex({empId:1,empName:1})
{
        "createdCollectionAutomatically" : false,
        "numIndexesBefore" : 2,
        "numIndexesAfter" : 3,
        "ok" : 1
}
MongoDB Enterprise > db.employee.find().sort({empId:1,empName:1})
{ "_id" : ObjectId("5d50ed688dcf280c50fde439"), "empId" : 1, "empName" : "John", "state" : "KA", "country" : "India" }
{ "_id" : ObjectId("5d50ed688dcf280c50fde43a"), "empId" : 2, "empName" : "Smith", "state" : "CA", "country" : "US" }
{ "_id" : ObjectId("5d50ed688dcf280c50fde43b"), "empId" : 3, "empName" : "James", "state" : "FL", "country" : "US" }
{ "_id" : ObjectId("5d50ed688dcf280c50fde43c"), "empId" : 4, "empName" : "Josh", "state" : "TN", "country" : "India" }
{ "_id" : ObjectId("5d50ed688dcf280c50fde43d"), "empId" : 5, "empName" : "Joshi", "state" : "HYD", "country" : "India" }
Index to Hold Recent Values in Memory

When using multiple collections, we must consider the size of indexes on all collections and ensure the index fits in memory to avoid the system reading the index from the disk.

Use the following query to check the size of the index for any collection.
> db.employee.totalIndexSize()

When we ensure the index fits entirely into the RAM, that ensures faster system processing.

Here is the output,
MongoDB Enterprise > db.employee.totalIndexSize()
49152
Create Queries to Ensure Selectivity
The ability of any query to narrow down the results using the created index is called selectivity. Writing queries that limit the number of possible documents with the indexed field and the queries that are appropriately selective relative to your indexed data ensures selectivity.
> db.employee.find({empId:{$gt:1},country:"India"})

This query must scan all the documents to return the result of empId values greater than 1.

Here is the output,
MongoDB Enterprise > db.employee.find({empId:{$gt:1},country:"India"})
{ "_id" : ObjectId("5d50ed688dcf280c50fde43c"), "empId" : 4, "empName" : "Josh", "state" : "TN", "country" : "India" }
{ "_id" : ObjectId("5d50ed688dcf280c50fde43d"), "empId" : 5, "empName" : "Joshi", "state" : "HYD", "country" : "India" }
> db.employee.find({empId:4})

This query must scan only one document to return the result empId:4.

Here is the output,
MongoDB Enterprise > db.employee.find({empId:4})
{ "_id" : ObjectId("5d50ed688dcf280c50fde43c"), "empId" : 4, "empName" : "Josh", "state" : "TN", "country" : "India" }
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.220.245.140