17
Advanced MongoDB Concepts

There is much more to MongoDB than can be fully covered in this book. This chapter covers some additional fundamentals beyond the normal database create, access, and delete operations. Designing and implementing indexes allows you to improve database performance, and implementing replica sets and sharding provides additional performance improvements as well as high availability.

Adding Indexes

MongoDB allows you to index fields in your collections, which makes it faster to find documents. When an index is added in MongoDB, a special data structure is created in the background that stores a small portion of a collection's data in an optimized form, making it faster to find specific documents.

For example, applying an _id index basically creates a sorted array of _id values. Once the index has been created, the following benefits occur:

When looking up a document by _id, an optimized search can be performed on the ordered index to find the document in question.

If you want documents returned sorted by _id, the sort has already been performed on the index, so it does not need to be done again. MongoDB just reads the documents in the order the _id values appear in the index.

If you want documents 10–20 sorted by _id, the operation is just a matter of slicing that chunk out of the index to get the _id values of the documents to look up.

If all you need is a list of sorted _id values, MongoDB does not even need to read the documents at all. It can return the values directly from the index.
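The following shell commands sketch two of these cases, assuming a hypothetical collection named myCollection that has only the default _id index:

// Documents 10-20 sorted by _id: the slice can be taken straight from the _id index.
db.myCollection.find().sort({_id: 1}).skip(10).limit(10)

// Only sorted _id values are requested, so the query can typically be
// answered from the index alone without reading the documents.
db.myCollection.find({}, {_id: 1}).sort({_id: 1})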

Keep in mind, however, that those benefits come at a cost. The following are some of the costs associated with indexes:

Indexes take up space on disk and in memory.

Indexes take up processing time when inserting and updating documents, which means that writes to collections with a large number of indexes can suffer performance hits.

The larger the collection, the greater the cost in resources and performance. Extremely large collections may make it impractical to apply some indexes.
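If you suspect that an index is costing more than it is worth, the MongoDB shell lets you inspect and remove the indexes on a collection. A minimal sketch, assuming a hypothetical collection named myCollection with an index on the name field:

// List the indexes currently defined on the collection.
db.myCollection.getIndexes()

// Drop an index that is no longer worth its write and storage cost.
db.myCollection.dropIndex({name: 1})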

Several different types of indexes can be applied to fields in a collection to support various design requirements. Table 17.1 lists the different index types.

Table 17.1 Types of Indexes Supported by MongoDB

Default _id: All MongoDB collections have an index on the _id field by default. If applications do not specify a value for _id, the driver or the mongod process creates an _id field with an ObjectId value. The _id index is unique and prevents clients from inserting two documents with the same value for _id.

Single field: The most basic type of index is one on a single field. This is similar to the _id index but on any field that you need. The index can be sorted in ascending or descending order, and the values of the field do not need to be unique. For example:

{name: 1}

Compound: Specifies an index on multiple fields. The index is sorted on the first field value, then the second, and so on. You can also mix the sort direction; for example, one field can be sorted ascending and another descending:

{name: 1, value: -1}

Multikey: If you index a field that stores an array of items, MongoDB creates a separate index entry for every element in the array. This allows you to find documents more quickly by the values contained in the array. For example, consider an array of objects named myObjs where each object has a score field:

{"myObjs.score": 1}

Geospatial: MongoDB allows you to create a geospatial index based on 2d or 2dsphere coordinates. This allows you to more effectively store and retrieve data that has a reference to a geospatial location. For example:

{"locs": "2d"}

Text: MongoDB also supports adding a text index that supports faster lookup of string fields by the words they contain. The index does not store common words such as the, a, or and. For example:

{comment: "text"}

Hashed: When using hash-based sharding, MongoDB allows you to use a hashed index, which stores hashes of the field values rather than the values themselves and supports hash-based sharding on that field. For example:

{key: "hashed"}

Indexes can also have special properties that define how MongoDB handles the index. These properties are

unique: Forces the index to only include a single instance of each field value. MongoDB rejects any document whose value for the indexed field duplicates one that is already in the index.

sparse: Ensures that the index only contains entries for documents that have the indexed field. Documents that do not have the indexed field are skipped.

TTL: TTL, or Time To Live, indexes allow documents to remain in the collection only for a certain amount of time; log entries or event data that should be cleaned up after a given period are typical examples. The index keeps track of insertion time and removes the earliest items after they have expired.

The unique and sparse properties can be combined so that the index rejects documents that have a duplicate value for the indexed field but still accepts documents that do not include that field; those documents are simply left out of the index.

Indexes can be created from the MongoDB shell, MongoDB Node.js native client, or Mongoose. To create an index from the MongoDB shell, you use the ensureIndex(index, properties) method. For example:

db.myCollection.ensureIndex({name:1}, {background:true, unique:true, sparse: true})

The background option specifies whether the index build should take place in the foreground or the background. Building in the foreground completes faster but takes up more system resources, so it's not a good idea on a production system during peak times.

To create an index from the MongoDB Node.js native driver, you can call the ensureIndex(collection, index, options, callback) method on an instance of the Db object. For example:

var MongoClient = require('mongodb').MongoClient;
MongoClient.connect("mongodb://localhost/", function(err, db) {
  // Build a unique, sparse index on the name field of myCollection in the background.
  db.ensureIndex('myCollection', {name: 1},
                 {background: true, unique: true, sparse: true},
                 function(err){
    if(!err) console.log("Index Created");
  });
});

To create an index using the Schema object in Mongoose, you set the index options on the field in the schema. For example:

var s = new Schema({ name: { type: String, index: true, unique: true, sparse: true } });

You can also add the index to the Schema object later using the index() method, for example:

s.path('some.path').index({unique: true, sparse: true});
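When indexes are defined in the schema, Mongoose builds them when the model is first used. A minimal sketch of catching index-build errors, assuming a hypothetical model named Thing compiled from the schema s defined above:

var mongoose = require('mongoose');

// Compile a model from the schema; index builds start once the model is
// used after connecting.
var Thing = mongoose.model('Thing', s);

// Mongoose emits an 'index' event on the model when the background index
// build finishes; err is set if the build failed, for example because
// existing documents violate the unique constraint.
Thing.on('index', function(err) {
  if (err) console.log('Index build failed: ' + err.message);
  else console.log('Indexes built for Thing');
});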

Using Capped Collections

Capped collections are fixed-size collections that insert, retrieve, and delete documents based on insertion order. This allows the capped collection to support high throughput operations. Capped collections work similarly to circular buffers in that once a collection fills its allocated space, it makes room for new documents by overwriting the oldest documents in the collection.

Capped collections can also be limited based on a maximum number of documents. This is useful in reducing the indexing overhead that can occur when storing large numbers of documents in a collection.

Capped collections are useful for rotating event logs or caching data because you do not need to worry about expending the overhead and effort of implementing code in your application to clean up the collection.

To create a capped collection from the MongoDB shell, you use the createCollection() method on the db object, specify the capped property, and set the size in bytes as well as the optional maximum number of documents. For example:

db.createCollection("log", { capped : true, size : 5242880, max : 5000 } )

From the MongoDB Node.js native driver, you can also specify a capped collection in the db.createCollection() method described in Chapter 13, “Getting Started with MongoDB and Node.js.” For example:

db.createCollection("newCollection", { capped : true, size : 5242880, max : 5000 }
                   function(err, collection){ });

From Mongoose, you can define the collection as capped in the schema options. For example:

var s = new Schema({ name: String, value: Number },
                   { capped : { size : 5242880, max : 5000 } });
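Because documents in a capped collection are kept in insertion order, reading the most recent entries does not require an index. A minimal sketch from the MongoDB shell, assuming the log collection created above:

// Return the ten most recently inserted log entries, newest first.
db.log.find().sort({ $natural: -1 }).limit(10)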

Applying Replication

Replication is one of the most critical aspects of high-performance databases. Replication is the process of defining multiple MongoDB servers that have the same data. The MongoDB servers in the replica set will be one of three types, as illustrated in Figure 17.1:

Primary: The primary server is the only server in a replica set that can be written to, which ensures data integrity during write operations. A replica set can have only one primary server.

Figure 17.1 Implementing a replica set in MongoDB

Secondary: Secondary servers contain a duplicate of the data on the primary server. To ensure the data is accurate, the secondary servers apply the operations log, or oplog, from the primary server, ensuring that every write operation on the primary also happens on the secondaries in the same order. Clients can read from secondary servers but not write to them.

Arbiter: The arbiter server is a special case. It does not contain a replica of the data but participates in electing a new primary if the primary server experiences a problem. When the primary fails, the failure is detected, and the other servers in the replica set elect a new primary using a heartbeat protocol between the primary, secondary, and arbiter servers. Figure 17.2 shows an example of a configuration that uses an arbiter server.

Replication provides two benefits: performance and high availability. Replica sets provide better performance because although clients cannot write to secondary servers, they can read from them, which allows you to provide multiple read sources for your applications.

Replica sets provide high availability because if the primary server happens to fail, other servers that have a copy of the data can take over. The replica set uses a heartbeat protocol to communicate between the servers and determine whether the primary server has failed, at which point a new primary is elected.

Figure 17.2 Implementing an arbiter server in a MongoDB replica set to ensure an odd number of servers

You should have at least three servers in the replica set, and you should also try to have an odd number. This makes it easier for the servers to elect a primary. This is where arbiter servers come in handy: they require few resources but can save time when electing a new primary. Figure 17.2 shows the replica set configuration with an arbiter. Notice that the arbiter does not have a replica of the data; it only participates in the heartbeat protocol.

Replication Strategy

There are a few concepts to apply when you are determining how to deploy a MongoDB replica set. The following sections discuss a few of the different things you should consider before implementing a MongoDB replica set.

Number of Servers

The first question is how many servers should be included in the replica set. This depends on the nature of data interaction from clients. If the data from clients is mostly writes, you are not going to get a big benefit from a large number of servers. However, if your data is mostly static and you have a high number of read requests, more secondary servers will definitely make a difference.

Number of Replica Sets

Also consider the data. In some instances, it makes more sense to break up the data into multiple replica sets, each containing a different segment of the data. This allows you to fine-tune the servers in each set to meet the data and performance needs. Only consider this if there is no correlation between the data, so clients accessing the data would rarely need to connect to both replica sets at the same time.

Fault Tolerance

How important is the fault tolerance to your application? It will likely be a rare occurrence for your primary server to go down. If it doesn’t really affect your application too much and the data can easily be rebuilt, you may not need replication. However, if you promise your customer Seven Nines availability, any outage is bad, and an extended outage is unacceptable. In those cases, it makes sense to add additional servers to the replica set to ensure availability.

Another thing to consider is placing one of the secondary servers in an alternative data center to support instances when your entire data center fails. However, for the sake of performance, you should keep the majority of secondary servers in your primary data center.

If you are concerned about fault tolerance, you should also enable journaling as described in Chapter 12, “Getting Started with MongoDB.” Enabling journaling allows transactions to be replayed even if the power fails in your data center.

Deploying a Replica Set

Implementing a replica set is simple in MongoDB. The following steps take you through the process of prepping and deploying the replica set.

1. First ensure that the members of the replica set are accessible to each other using DNS or hostnames. Adding a virtual private network for the replica servers to communicate on enhances the performance of the system because the replication process is not affected by other traffic on the network. If the servers are not behind a firewall that keeps the data communications safe, you should also configure auth and a keyFile so the servers can communicate with each other securely.

2. Configure the replSet value, which is a unique name for the replica set, either in the mongodb.conf file or on your command line for each server in the replica set, for example:
mongod --port 27017 --dbpath /srv/mongodb/db0 --replSet rs0

3. Start the MongoDB shell using the mongo command on the server that you want to become the primary, and execute the following command to initiate the replica set:
rs.initiate()

4. Use the MongoDB shell to connect to the MongoDB server that acts as the primary and execute the following command for each secondary host:
rs.add(<secondary_host_name_or_dns>)

5. Use the following command to view the configuration on each server:
rs.conf()

6. Inside your application, define the read preference for reading data from the replica set. The previous chapters already described how to do this by setting the preference to primary, primaryPreferred, secondary, secondaryPreferred, or nearest.
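As a concrete illustration of step 6, the following sketch connects to the replica set from the MongoDB Node.js native driver with a read preference, assuming a set named rs0 with three hypothetical hosts and a database named mydb:

var MongoClient = require('mongodb').MongoClient;
// replicaSet names the set configured in step 2; readPreference lets the
// driver route reads to a secondary when one is available.
var url = "mongodb://dbs1.test.net:27017,dbs2.test.net:27017,dbs3.test.net:27017/mydb" +
          "?replicaSet=rs0&readPreference=secondaryPreferred";
MongoClient.connect(url, function(err, db) {
  if (!err) console.log("Connected to replica set rs0");
});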

Implementing Sharding

A serious problem that many large-scale applications encounter is that the data stored in MongoDB is so enormous that it severely impacts performance. When a single collection of data becomes too large, indexing can cause a severe performance hit, the amount of data on disk can cause a system performance hit, and the number of requests from clients can quickly overwhelm the server. The application gets slower and slower at an accelerated rate when reading from and writing to the database.

MongoDB solves this problem through sharding. Sharding is the process of storing documents across multiple MongoDB servers running on different machines. This allows the MongoDB database to scale horizontally. The more MongoDB servers you add, the larger the number of documents that can be supported by your application. Figure 17.3 illustrates the concept of sharding. From the application’s perspective there is a single collection; however, there are actually four MongoDB shard servers, and each contains a portion of the documents in the collections.

Figure 17.3 From the application’s perspective there is only a single collection to access; however, the documents for that collection are split across multiple MongoDB shard servers

Sharding Server Types

Three types of MongoDB servers are involved when sharding your data. These servers each play a specific role to present a unified view to the applications. The following list describes each of the server types, and the diagram in Figure 17.4 illustrates the interaction between the different types of sharding servers.

Shard: A shard actually stores the documents that make up the collection. The shard can be an individual server; however, to provide high availability and data consistency in production, consider using a replica set that provides primary and secondary copies of the shard.

Query router: The query router runs an instance of mongos. The query router provides the interface for client applications to interact with the collection and obfuscates the fact that the data is in fact sharded. The query router processes the request, sends targeted operations to the shards, and then combines the shard responses into a single response to the client. A sharded cluster can contain more than one query router, which is a good way to load balance large numbers of client requests.

Config: Config servers store the metadata about the sharded cluster that contains a mapping of the cluster's data set to the shards. The query router uses this metadata when targeting operations to specific shards. Production sharded clusters should have exactly three config servers.

Figure 17.4 The router servers accept requests from the MongoDB clients and then communicate with the individual shard servers to read or write data

Choosing a Shard Key

The first step in sharding a large collection is to decide on a shard key that is used to determine which documents should be stored in which shard. The shard key is an indexed field or an indexed compound field that must be included in every document in the collection. MongoDB uses the value of the shard key to split the collection between the shards in the cluster.

Selecting a good shard key can be critical to achieving the performance that you need from MongoDB. A bad key can seriously impact the performance of the system, whereas a good key can improve performance and ensure future scalability. If a good key does not exist in your documents, you may want to consider adding a field specifically to be a sharding key.

When selecting a shard key, keep in mind the following considerations:

Easily divided: The shard key needs to be easily divided into chunks.

Randomness: When using range-based sharding, random keys help ensure that documents are more evenly distributed, so no one server is overloaded.

Compound keys: It is best to shard using a single field when possible; however, if a good single-field key doesn't exist, you can still get better performance from a good compound-field key than from a bad single-field key.

Cardinality: Cardinality defines the uniqueness of the values of the field. A field has high cardinality if its values are mostly unique, for example, a Social Security number among a million people. A field has low cardinality if its values are generally not unique, for example, eye color among a million people. Typically, fields that have high cardinality provide much better options for sharding.

Query targeting: Take a minute and look at the queries your applications need. Queries perform better if the data can be collected from a single shard in the cluster. If you can arrange for the shard key to match the most common query parameters, you get better performance, as long as all queries are not going to the same field value. Consider the example of distributing documents based on the zip code of the user when all your queries look up users by zip code. All the users for a given zip code exist on the same shard server. If your queries are fairly distributed across zip codes, a zip code key is a good idea. However, if most of your queries are on a few zip codes, a zip code key is actually a bad idea.

To illustrate shard keys better, consider the following keys:

{ "zipcode": 1 }: This shard key distributes documents by the value of the zipcode field. That means that all lookups based on a specific zipcode go to a single shard server.

{ "zipcode": 1, "city": 1 }: This shard key first distributes documents by the value of the zipcode field. If a number of documents have the same value for zipcode, they can be split off to other shards based on the city field value. That means you are no longer guaranteed that a query on a single zipcode will hit only one shard. However, queries based on zipcode and city will go to the same shard.

{ "_id": "hashed" }: This shard key distributes documents by a hash of the value of the _id field. This ensures a more even distribution across all shards in the cluster. However, it makes it impossible to target queries so that they hit only a single shard server.

Selecting a Partitioning Method

The next step in sharding a large collection is to decide how to partition the documents based on the shard key. You can use two methods to distribute the documents into different shards based on the shard key value. Which method you use depends on the type of shard key you select:

Range-based sharding: Divides the data set into specific ranges based on the value of the shard key. This method works well for shard keys that are numeric. For example, if you have a collection of products and each product is given a specific product ID from 1 to 1,000,000, you could shard the products in ranges of 1–250,000; 250,001–500,000; and so on.

Hash-based sharding: Uses a hash function computed on the field value to create chunks. The hash function ensures that shard keys with close values end up in different shards, giving a good distribution.

It is vital that you select a shard key and distribution method that distributes documents as evenly as possible across the shards; otherwise, one server ends up overloaded while another is relatively unused.

The advantage of range-based sharding is that it is often easy to define and implement. Also if your queries are often range-based as well, it is more performant than hash-based sharding. However, it is difficult to get an even distribution with range-based sharding unless you have all the data up front and the shard key values will not change going forward.

Hash-based sharding takes more understanding of the data but typically provides the best overall approach to sharding because it ensures a much more evenly spaced distribution.

The index used when enabling sharding on the collection determines which partitioning method is used. If the index is a standard ascending or descending index on the field values, MongoDB uses range-based sharding. For example, the following implements a range-based shard on the zip and name fields of the document:

db.myDB.myCollection.ensureIndex({"zip": 1, "name":1})

To shard using the hash-based method, you need to define the index using the hashed index type. For example:

db.myDB.myCollection.ensureIndex({"name":"hash"})

Deploying a Sharded MongoDB Cluster

The process of deploying a sharded MongoDB cluster involves several steps to set up the different types of servers and then configure the databases and collections. To deploy a sharded MongoDB cluster, you need to

1. Create the config server database instances.

2. Start query router servers.

3. Add shards to the cluster.

4. Enable sharding on a database.

5. Enable sharding on a collection.

The following sections describe each of these steps in more detail.

Warning

All members of a sharded cluster must be able to connect to all other members of a sharded cluster, including all shards and all config servers. Make sure that the network and security systems, including all interfaces and firewalls, allow these connections.

Creating the Config Server Database Instances

The config server processes are simply mongod instances that store the cluster’s metadata instead of the collections. Each config server stores a complete copy of the cluster’s metadata. In production deployments, you must deploy exactly three config server instances, each running on different servers to ensure high availability and data integrity.

To implement the config servers, perform the following steps on each:

1. Create a data directory to store the config database.

2. Start the config server instances passing the path to the data directory created in step 1, and also include the --configsvr option to denote that this is a config server. For example:
mongod --configsvr --dbpath <path> --port <port>

3. Once the mongod instance starts up, the config server is ready.

Note

The default port for config servers is 27019.
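Putting these pieces together, a config server might be started as follows; the data path here is hypothetical and the port is the default:

mongod --configsvr --dbpath /srv/mongodb/cfg0 --port 27019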

Starting Query Router Servers (mongos)

The query router (mongos) servers do not require database directories because the configuration is stored on the config servers and the data is stored on the shard server. The mongos servers are lightweight, and therefore it is acceptable to have a mongos instance on the same system that runs your application server.

You can create multiple instances of the mongos servers to route requests to the sharded cluster. However, to ensure high availability, these instances shouldn't all be running on the same system.

To start an instance of the mongos server, you need to pass in the --configdb parameter with a list of the DNS names/hostnames of the config servers you want to use for the cluster. For example:

mongos --configdb c1.test.net:27019,c2.test.net:27019,c3.test.net:27019

By default, a mongos instance runs on port 27017. However, you can also configure a different port address using the --port <port> command line option.
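For example, a second router could be started on an alternative port; the config server hostnames are the same hypothetical ones used above:

mongos --configdb c1.test.net:27019,c2.test.net:27019,c3.test.net:27019 --port 27020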

Tip

To avoid downtime, give each config server a logical DNS name (unrelated to the server’s physical or virtual hostname). Without logical DNS names, moving or renaming a config server requires shutting down every mongod and mongos instance in the sharded cluster.

Adding Shards to the Cluster

The shard servers in a cluster are just standard MongoDB servers. They can be a standalone server or a replica set. To add the MongoDB servers as shards in the cluster, all you need to do is access the mongos server from the MongoDB shell and use the sh.addShard() command.

The syntax for the sh.addShard() command is:

sh.addShard(<replica_set_or_server_address>)

For example, to add a replica set named rs1 with a member on a server named mgo1.test.net as a shard in the cluster, execute the following command from the MongoDB shell on the mongos server:

sh.addShard( "rs1/mgo1.test.net:27017" )

For example, to add a standalone server named mgo1.test.net as a shard in the cluster, execute the following command from the MongoDB shell on the mongos server:

sh.addShard( "mgo1.test.net:27017" )

Once you have added all the shards to the cluster, the servers begin communicating and sharding the data, although for preexisting data it takes some time for the chunks to be fully distributed.
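To verify which shards have been added, and later to watch how chunks are distributed, you can run the following from the MongoDB shell on the mongos server:

sh.status()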

Enabling Sharding on a Database

Prior to sharding a collection you need to enable the database it resides in to handle sharding. Enabling sharding doesn’t automatically redistribute the data, but instead just assigns a primary shard for the database and makes other configuration adjustments that make it possible to enable the collections for sharding.

To enable the database for sharding, you need to connect to a mongos instance using the MongoDB shell and issue the sh.enableSharding(database) command. For example, to enable a database named bigWords you would use:

sh.enableSharding("bigWords");
Enabling Sharding on a Collection

Once the database has been enabled for sharding, you are ready to enable sharding at the collection level. You do not need to enable sharding for all collections in the database, just the ones for which it makes sense.

Use the following steps to enable sharding on a collection:

1. Determine which field(s) will be used for the shard key as described above.

2. Create an index on the key field(s) using the ensureIndex() method described earlier in this chapter. For example:
db.myDB.myCollection.ensureIndex( { _id : "hashed" } )

3. Enable sharding on the collection using sh.shardCollection("<database>.<collection>", shard_key). The shard_key is the pattern used to create the index. For example:
sh.shardCollection("myDB.myCollection", { "_id": "hashed" } )

Set Up Shard Tag Ranges

Once you have enabled sharding on a collection, you might want to add tags to target specific ranges of the shard key values. A really good example of this is where the collection is sharded by zip codes. To improve performance, tags can be added for specific city codes, such as NYC and SFO, and the zip code ranges for those cities specified. This ensures that documents for a specific city are stored on a single shard in the cluster, which can improve performance for queries based on multiple zip codes for the same city.

To set up shard tags, you simply need to add the tag to the shard using the sh.addShardTag(shard_server, tag_name) command from a mongos instance. For example:

sh.addShardTag("shard0001", "NYC")
sh.addShardTag("shard0002", "SFO")

Then, to specify a range for a tag, in this case the zip code ranges for each city tag, you use the sh.addTagRange(collection_path, startValue, endValue, tag_name) command from the mongos instance. For example:

sh.addTagRange("records.users", { zipcode: "10001" }, { zipcode: "10281" }, "NYC")
sh.addTagRange("records.users", { zipcode: "11201" }, { zipcode: "11240" }, "NYC")
sh.addTagRange("records.users", { zipcode: "94102" }, { zipcode: "94135" }, "SFO")

Notice that multiple ranges are added for NYC. This allows you to specify multiple ranges within the same tag, which is assigned to a single shard.

If you need to remove a shard tag later, you can do so using the sh.removeShardTag(shard_server, tag_name) method. For example:

sh.removeShardTag("shard0002", "SFO")

Repairing a MongoDB Database

There are a couple of reasons to run a repair on a MongoDB database: for example, the system crashed, a data integrity problem has shown up in the application, or you simply want to reclaim some unused disk space.

You can initiate a repair of a MongoDB database from the MongoDB shell or from the mongod command line. To execute a repair from the command line, use the --repair and --repairpath <repair_path> options. The <repair_path> specifies the location to store temporary repair files. For example:

mongod --repair --repairpath /tmp/mongodb/data

To execute a repair from the MongoDB shell, you can use the db.repairDatabase() method or run the underlying repairDatabase command, which accepts additional options. For example:

db.runCommand({ repairDatabase: 1,
  preserveClonedFilesOnFailure: <boolean>,
  backupOriginalFiles: <boolean> })

When a repair is initiated, all collections in the database are compacted, which reduces their size on disk. Also, any invalid records in the database are deleted. Therefore, if you have an intact backup, it may be better to restore from it rather than run a repair.

The time it takes to run the repair depends on the size of the data. A repair impacts performance on the system and should be run during off-peak hours.

Warning

If you are trying to repair a member of a replica set and you have access to an intact copy of your data on another replica, you should restore from that intact copy because repairDatabase will delete the corrupt data and it will be lost.

Backing Up MongoDB

The best backup strategy for MongoDB is to implement high availability using a replica set. This keeps the data as up-to-date as possible and ensures that it is always available. However, also consider the following if your data is critical and cannot be replaced:

What if the data center fails? In this case you can back up the data periodically and store it off-site, or you can add a replica somewhere off-site.

What if something happens to corrupt the actual application data that gets replicated? This is always a concern. In this instance, the only option is to have a backup from a previous point in time.

If you decide that you need to implement periodic backups of data, also consider the impact that backups will have on the system and decide on a strategy. For example:

Production impact: Backups are often resource intensive and need to be performed at a time when they have the least impact on your environment.

Requirements: If you plan on implementing something like a block-level snapshot to back up the database, you need to make sure the system infrastructure supports it.

Sharding: If you are sharding the data, all shards must be consistent. You cannot back up one without backing up all of them. Also, you must stop writes to the cluster to generate a point-in-time backup.

Relevant data: You can also reduce the impact that backups have on your system by only backing up the data that is critical to it. For example, if a database will never change, it only needs to be backed up once; and if data in a database can easily be regenerated but is very large, it may be worth accepting the cost of regeneration rather than the cost of frequent backups.

There are two main approaches to backing up MongoDB. The first is to perform a binary dump of the data using the mongodump command. The binary data can be stored off-site for later use. For example, to dump the databases of a replica set named rset1, reachable through the members mg1.test.net:27018 and mg2.test.net, to a folder called /opt/backup/current, use the following command:

mongodump --host rset1/mg1.test.net:27018,mg2.test.net --out /opt/backup/current
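The dump created by mongodump can later be loaded back into a MongoDB instance with the mongorestore command. A minimal sketch, assuming a hypothetical target server:

mongorestore --host mg3.test.net:27017 /opt/backup/current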

The second method for backing up MongoDB databases is to use a file system snapshot. The snapshots are quick to complete; however, they are also much larger. You need to have journaling enabled, and the system has to support the block-level backups. If you are interested in implementing a snapshot method for backups, check out the guide at the following location: http://docs.mongodb.org/manual/tutorial/back-up-databases-with-filesystem-snapshots/.

Summary

This chapter finished off the MongoDB introduction by adding some more advanced concepts. You learned how to define different types of indexes to improve the speed of queries. You learned how to deploy a MongoDB replica set to ensure high availability and improve read performance. The replica set has a read/write primary and read-only secondaries.

You were introduced to the concept of partitioning the data in large collections into shards that live on separate servers, allowing your implementation to scale horizontally. You also looked at different backup strategies and options to protect the most critical data in your MongoDB databases.

Next

In the next chapter you get back to the Node.js world with the express module. The express module allows you to more easily implement a web server running on Node.js by supporting routes and other functionality.
