Hour 23. Implementing Replication and Sharding in MongoDB


What You’ll Learn in This Hour:

▶ What types of servers are involved in a MongoDB replica set

▶ How replica servers provide fault tolerance

▶ How to deploy a MongoDB replica set

▶ What types of servers are involved in a MongoDB sharded cluster

▶ How to choose a shard key

▶ Ways to choose a partitioning method

▶ How to deploy a MongoDB sharded cluster


When implementing MongoDB as a high-performance database, you should seriously consider applying replication and sharding. When implemented correctly, MongoDB’s replication and sharding features can provide scalability and high availability for your data.

Replication is the process of setting up multiple MongoDB servers that act as one. The data stored in MongoDB is replicated to each server in the replica set. This provides the benefit of having multiple copies of your data in case of server failure. Replication also enables you to distribute the request load on read requests because data can be read from any of the servers in the replica set.

Sharding is the process of splitting up large datasets onto multiple servers. This enables you to scale data that otherwise would be too large for a single server to handle. The following sections discuss designing and implementing replication and sharding in MongoDB.

Applying Replication in MongoDB

One of the most critical aspects of high-performance databases is replication, the process of defining multiple MongoDB servers that have the same data. The MongoDB servers in the replica set are one of three types, as Figure 23.1 illustrates.

FIGURE 23.1 Implementing a replica set in MongoDB.

▶ Primary: The primary server is the only server in a replica set that can be written to, which ensures data integrity during write operations. A replica set can have only one primary server.

▶ Secondary: Secondary servers contain a duplicate of the data on the primary server. To keep the data accurate, the secondary servers apply the operations from the primary server’s oplog so that every write operation on the primary server also happens on the secondary servers in the same order. Clients can read from secondary servers but not write to them.

▶ Arbiter: The arbiter server is a special case. It does not contain a replica of the data, but it participates in electing a new primary if the primary server experiences a problem. When the primary server fails, the failure is detected, and the other servers in the replica set elect a new primary using a heartbeat protocol among the primary, secondary, and arbiter servers. Figure 23.2 shows an example of a configuration that uses an arbiter server.

FIGURE 23.2 Implementing an arbiter server in a MongoDB replica set to ensure an odd number of servers.

Replication provides two benefits: performance and high availability. Replica sets provide better performance because, although clients cannot write to secondary servers, they can read from them. This enables you to provide multiple read sources for your applications.

Replica sets provide high availability because if the primary server happens to fail, other servers have a copy of the data and can take over. The replica set uses a heartbeat protocol to communicate between the servers and determine whether the primary server has failed. If so, a new primary is elected.

You should have at least three servers in the replica set. You should also try to have an odd number, which makes it easier for the servers to elect a primary. This is where arbiter servers come in handy. Arbiter servers require few resources, but they can save time when electing a new primary. Figure 23.2 shows the replica set configuration with an arbiter. Notice that the arbiter does not have a replica—it only participates in the heartbeat protocol.
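
Adding an arbiter to an existing replica set is a single command from the MongoDB shell connected to the primary. Here is a minimal sketch, assuming a hypothetical host named arb1.test.net:

rs.addArb("arb1.test.net:27017")

The arbiter then participates in the heartbeat protocol and in elections without storing a copy of the data.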

Understanding Replication Strategy

When determining how to deploy a MongoDB replica set, you need to apply a few concepts. The following sections discuss some of the considerations in implementing a MongoDB replica set.

Number of Servers

The first question is how many servers to include in the replica set. This depends on the nature of data interaction from clients. If the data from clients is mostly writes, you will not gain a big benefit from a large number of servers. However, if your data is mostly static and you have a lot of read requests, having more secondary servers definitely makes a difference.

Number of Replica Sets

Also consider the data. In some instances, it makes more sense to break up the data into multiple replica sets, each containing a different segment of the data. This enables you to fine-tune the servers in each set to meet the data and performance needs. Consider this only if there is no correlation between the datasets and if the clients accessing the data would rarely need to connect to both replica sets at the same time.

Fault Tolerance

How important is fault tolerance to your application? Your primary server likely will go down only rarely. If this wouldn’t really affect your application and you can easily rebuild the data, you might not need replication. However, if you promise your customer seven 9s of availability, then any outage is extremely bad, and an extended outage is unacceptable. In those cases, it makes more sense to add servers to the replica set to ensure availability.

Another consideration involves placing one of the secondary servers in an alternative datacenter, in case your entire datacenter fails. However, for the sake of performance, you should keep the majority of your servers in your primary datacenter.
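
One way to sketch this, assuming a hypothetical host dr1.test.net in the alternative datacenter and that member _ids 1 through 3 are already in use, is to add the remote member with a priority of 0 so it keeps a full copy of the data but is never elected primary:

rs.add({ _id: 4, host: "dr1.test.net:27017", priority: 0 })

Because its priority is 0, elections always favor the servers in your primary datacenter, while the remote member remains available for disaster recovery.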


By the Way

If you are concerned about fault tolerance, you should also enable journaling (see Hour 1, “Introducing NoSQL and MongoDB”). Journaling allows write operations to be replayed even if the power fails in your datacenter.


Deploying a Replica Set

Implementing a replica set is simple in MongoDB. The following steps describe the process of prepping and deploying the replica set:

1. Ensure that each member in the replica set is accessible to the others using DNS or hostnames. Adding a virtual private network for the replica servers to communicate on enhances the performance of the system because other traffic on the network will not affect the replication process. If the servers are not on a protected network where data communication is safe, you should also configure auth and a keyFile so that the servers communicate with each other securely.

2. Configure the replSet value, which is a unique name for the replica set, either in the mongodb.conf file or on your command line for each server in the replica set. For example, the following command line starts a database and assigns it to replica set rs0:

mongod --port 27017 --dbpath /srv/mongodb/db0 --replSet rs0

3. Initiate the replica set. Connect the MongoDB shell to the server that will act as the primary and run the rs.initiate() command. You can run rs.initiate() with no arguments and add the remaining members afterward, or you can use rs.initiate(config) and pass in a configuration object that contains the members you want to include in the replica set. The following shows an example of defining a configuration and passing it to rs.initiate():

rsconf = { _id: "rs0",
   members: [ { _id: 1, host: "localhost:27017" },
              { _id: 2, host: "localhost:27018" },
              { _id: 3, host: "localhost:27019" }]}
rs.initiate(rsconf)

4. If you did not pass a configuration object in the previous step, you need to add the secondary servers to the replica set by connecting the MongoDB shell to the server that will act as the primary. Then you execute the following command for each secondary server:

rs.add("<secondary_host_name_or_dns>:<port>")

5. Use the following command to view the configuration on each server:

rs.conf()

6. Inside your application, define the read preference for reading data from the replica set. The previous hours have already described how to do this by setting the preference to primary, primaryPreferred, secondary, secondaryPreferred, or nearest.
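
How you set the read preference depends on the driver covered in those hours, but as a rough sketch from the MongoDB shell (the collection name here is only illustrative):

db.getMongo().setReadPref("secondaryPreferred")   // applies to subsequent reads on this connection
db.myCollection.find().readPref("secondary")      // per-query override on a cursor

Most drivers also accept a readPreference option in the connection settings or connection string.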

Implementing Sharding in MongoDB

A serious problem many large-scale applications encounter is that the data stored in MongoDB is so enormous that it severely impacts performance. When a single collection of data becomes too large, indexing can cause a severe performance hit, the amount of data on disk can cause a system performance hit, and the number of requests from clients can quickly overwhelm the server. The application then gets slower at an accelerated rate when reading from and writing to the database.

MongoDB solves this problem through sharding. Sharding is the process of storing documents across multiple MongoDB servers running on different machines. This allows the MongoDB database to scale horizontally. The more MongoDB servers you add, the more documents your application can support. Figure 23.3 illustrates the concept of sharding. From the application’s perspective, there is a single collection, but there are actually four MongoDB shard servers, each containing a portion of the documents in the collection.

FIGURE 23.3 From the application’s perspective, there is only a single collection to access, but the documents for that collection actually are split across multiple MongoDB shard servers.

Understanding Sharding Server Types

Three types of MongoDB servers are involved when sharding your data. These servers each play a specific role to present a single unified view to the applications. The following list describes each server type. Figure 23.4 illustrates the interactions among the different types of sharding servers.

▶ Shard: A shard stores the documents that make up the collection. The shard can be an individual server, but to provide high availability and data consistency in production, consider using a replica set that provides primary and secondary copies of the shard.

▶ Query router: The query router runs an instance of mongos. It provides the interface for client applications to interact with the collection and hides the fact that the data is sharded. The query router processes the request, sends targeted operations to the shards, and then combines the shard responses into a single response to the client. A sharded cluster can contain more than one query router, which is a great way to load-balance large numbers of client requests.

▶ Config: Config servers store the metadata about the sharded cluster, including a mapping of the cluster’s dataset to the shards. The query router uses this metadata when targeting operations to specific shards. Production sharded clusters should have exactly three config servers.

FIGURE 23.4 The router servers accept requests from the MongoDB clients and then communicate with the individual shard servers to read or write data.

Choosing a Shard Key

The first step in sharding a large collection is to decide on a shard key to determine which documents should be stored in which shard. The shard key is an indexed field or an indexed compound field that must be included in every document in the collection. MongoDB uses the value of the shard key to split the collection over the shards in the cluster.

Selecting a good shard key can be critical to achieving the performance that you need from MongoDB. A bad key can seriously impact the performance of the system, whereas a good key can improve performance and ensure future scalability. If you do not have a good key in your documents, you might want to consider adding a field specifically to be a sharding key.

When selecting a shard key, keep in mind the following considerations:

▶ Easily divided: The shard key needs to be easily divided into chunks.

▶ Randomness: When using range-based sharding, random keys can ensure that documents are more evenly distributed so that no one server is overloaded.

▶ Compound keys: It is best to shard using a single field when possible, but if a good single-field key doesn’t exist, you can still get better performance from a good compound-field key than from a bad single-field key.

▶ Cardinality: Cardinality defines the uniqueness of the values of the field. A field has high cardinality if its values are mostly unique (for example, Social Security numbers for a million people). A field has low cardinality if its values are generally not unique (for example, eye color for a million people). Typically, fields that have high cardinality provide much better options for sharding. (A quick way to check cardinality from the shell appears after this list.)

▶ Query targeting: Look at the queries you are using in your applications. Queries perform better if the data can be collected from a single shard in the cluster. If you can arrange for the shard key to match the most common query parameters, you will get better performance. For example, you could shard documents based on the zip code of the user when all your queries look up users by zip code. That way, all the users for a given zip code will exist on the same shard server. If your queries are fairly evenly distributed across zip codes, a zip code key is a good idea. However, if most of your queries involve only a few zip codes, a zip code key is a bad idea because most of the queries will go to the same server.
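
When weighing cardinality, a quick check from the MongoDB shell can give you a rough count of distinct values for a candidate field. This is only a sketch, assuming a hypothetical users collection with zipcode and eyeColor fields:

db.users.distinct("zipcode").length    // many distinct values: high cardinality
db.users.distinct("eyeColor").length   // few distinct values: low cardinality, a poor shard key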

To illustrate shard keys better, consider the following example keys:

▶ { "zipcode": 1 }: This shard key distributes documents by the value of the zipcode field. This means that all lookups based on a specific zipcode will go to a single shard server.

▶ { "zipcode": 1, "city": 1 }: This shard key first distributes documents by the value of the zipcode field. If a number of documents have the same value for zipcode, they can be split off to other shards based on the city field value. That means you are no longer guaranteed that a query on a single zip code will hit only one shard. However, queries based on both zipcode and city will go to the same shard.

▶ { "_id": "hashed" }: This shard key distributes documents by a hash of the value of the _id field. This ensures a more even distribution across all shards in the cluster. However, it makes it impossible to target range-based queries so that they hit only a single shard server.

Selecting a Partitioning Method

The next step in sharding a large collection is to decide how to partition the documents, based on the shard key. You can use two methods to distribute the documents into different shards, based on the shard key value. Which method you use depends on the type of shard key you select:

▶ Range-based sharding: Divides the dataset into specific ranges, based on the value of the shard key. This method works well for shard keys that are numeric. For example, if you have a collection of products and each product is given a specific product ID from 1 to 1,000,000, you could shard the products in ranges of 1–250,000, 250,001–500,000, and so on.

▶ Hash-based sharding: Uses a hash function computed on the shard key value to create chunks. The hash function ensures that documents whose shard key values are close together end up in different shards, which promotes an even distribution.

It is vital that you select a shard key and distribution method that distribute documents as evenly as possible across the shards; otherwise, one server ends up overloaded while another is relatively unused.

The advantage of range-based sharding is that it is easy to define and implement. Also, if your queries are often range based as well, this method performs better than hash-based sharding. However, getting an even distribution is difficult with range-based sharding unless you have all the data up front and the shard key values will not change going forward.

The hash-based sharding method takes more understanding of the data, but it typically provides the best overall approach to sharding because it ensures a much more even distribution.

The index used when enabling sharding on the collection determines which partitioning method is used. If the shard key index is a standard index on the field values, MongoDB uses range-based sharding. For example, the following implements a range-based shard on the zip and name fields of the document:

db.myDB.myCollection.ensureIndex({"zip": 1, "name":1})

To shard using the hash-based method, you need to define the index using the hashed index type. For example:

db.myDB.myCollection.ensureIndex({"name":"hash"})

Deploying a Sharded MongoDB Cluster

The process of deploying a sharded MongoDB cluster involves several steps to set up the different types of servers and then configure the databases and collections. To deploy a sharded MongoDB cluster, follow these steps:

1. Create the config server database instances.

2. Start the query router servers.

3. Add shards to the cluster.

4. Enable sharding on a database.

5. Enable sharding on a collection.

The following sections describe each of these steps in more detail.


Watch Out!

All members of a sharded cluster must be able to connect to all other members of a sharded cluster, including all shards and all config servers. Ensure that the network and security systems, including all interfaces and firewalls, allow these connections.


Creating the Config Server Database Instances

The config server processes are mongod instances that store the cluster’s metadata instead of the collections. Each config server stores a complete copy of the cluster’s metadata. In production deployments, you must deploy exactly three config server instances, each running on a different server, to ensure high availability and data integrity.

To implement the config servers, perform the following steps on each:

1. Create a data directory to store the config database.

2. Start config server instances, passing the path to the data directory created in Step 1 and also including the --configsvr option to denote that this is a config server. For example:

mongod --configsvr --dbpath <path> --port <port>

3. When the mongod instance starts up, the config server is ready.
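
As a concrete sketch of Steps 1 and 2, assume the three config servers run on hosts named c1.test.net, c2.test.net, and c3.test.net (the same hypothetical names used in the mongos example later in this hour) with a data directory of /srv/mongodb/configdb:

# Run on each of c1.test.net, c2.test.net, and c3.test.net
mkdir -p /srv/mongodb/configdb
mongod --configsvr --dbpath /srv/mongodb/configdb --port 27019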


By the Way

The default port for config servers is 27019.


Starting Query Router Servers (mongos)

The query router (mongos) servers do not require database directories because the configuration is stored on the config servers and the data is stored on the shard servers. The mongos servers are lightweight, so it is acceptable to run a mongos instance on the same system that runs your application server.

You can create multiple instances of the mongos servers to route requests to the sharded cluster. However, to ensure high availability, these instances shouldn’t all run on the same system.

To start an instance of the mongos server, pass in the --configdb parameter with a list of the DNS/host names of the config servers you want to use for the cluster. For example:

mongos --configdb c1.test.net:27019,c2.test.net:27019,c3.test.net:27019

By default, a mongos instance runs on port 27017. However, you can also configure a different port address using the --port <port> command-line option.


Did You Know?

To avoid downtime, give each config server a logical DNS name (unrelated to the server’s physical or virtual hostname). Without logical DNS names, moving or renaming a config server requires shutting down every mongod and mongos instance in the sharded cluster.


Adding Shards to the Cluster

The shard servers in a cluster are just standard MongoDB servers. They can be a standalone server or a replica set. To add the MongoDB servers as shards in the cluster, all you need to do is access the mongos server from the MongoDB shell and use the sh.addShard() command.

The syntax for the sh.addShard() command follows:

sh.addShard("<replica_set_or_server_address>")

For example, to add a replica set named rs1 on a server named mgo1.test.net as a shard in the cluster, execute the following command from the MongoDB shell on the mongos server:

sh.addShard( "rs1/mgo1.test.net:27017" )

Similarly, to add a standalone server named mgo1.test.net as a shard in the cluster, execute the following command from the MongoDB shell on the mongos server:

sh.addShard( "mgo1.test.net:27017" )

After you have added all the shards to the cluster, the servers will be communicating and sharding the data. For preexisting data, it will take some time for the chunks to be fully distributed.
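
To confirm that the shards were added and to watch the chunks being distributed, connect the MongoDB shell to the mongos server and check the cluster status:

sh.status()                  // lists the shards, sharded databases, and chunk ranges
db.printShardingStatus()     // more verbose form of the same information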

Enabling Sharding on a Database

Before sharding a collection, you need to enable the database it resides in to handle sharding. Enabling sharding doesn’t automatically redistribute the data; it just assigns a primary shard for the database and makes other configuration adjustments that enable the collections for sharding.

To enable the database for sharding, you need to connect to a mongos instance using the MongoDB shell and issue the sh.enableSharding(database) command. For example, to enable a database named bigWords, you would use

sh.enableSharding("bigWords");

Enabling Sharding on a Collection

After the database has been enabled for sharding, you are ready to enable sharding at the collection level. You do not need to enable sharding for all collections in the database; do so only for the collections where it makes sense.

Use the following steps to enable sharding on a collection:

1. Determine which field(s) will be used for the shard key, as described earlier.

2. Create an index on the key field(s) using ensureIndex(), described earlier in this hour. For example, for a hashed key on the _id field:

db.getSiblingDB("myDB").myCollection.ensureIndex({ _id: "hashed" })

3. Enable sharding on the collection using sh.shardCollection("<database>.<collection>", shard_key), where shard_key is the pattern used to create the index. For example:

sh.shardCollection("myDB.myCollection", { "_id": "hashed" } )
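
For comparison, a minimal range-based sketch of the same two steps reuses the zip and name fields shown earlier in this hour (the database and collection names remain the illustrative myDB and myCollection):

db.getSiblingDB("myDB").myCollection.ensureIndex({ "zip": 1, "name": 1 })
sh.shardCollection("myDB.myCollection", { "zip": 1, "name": 1 })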

Setting Up Shard Tag Ranges

After you have enabled sharding on a collection, you might want to add tags to target specific ranges of the shard key values. A good example of this is a collection sharded by zip code. To improve performance, you can add tags for specific cities (NYC and SFO) and specify the zip code ranges for those cities. Documents for a specific city then will be stored on a single shard in the cluster, which can improve performance for queries that span multiple zip codes in the same city.

To set up shard tags, add the tag to the shard using the sh.addShardTag(shard_server, tag_name) command from a mongos instance. For example:

sh.addShardTag("shard0001", "NYC")
sh.addShardTag("shard0002", "SFO")

Then to specify the range for a specific tag (in this case, the zip code ranges for each city tag), you use the sh.addTagRange(collection_path, startValue, endValue, tag_name) command from the mongos instance. For example:

sh.addTagRange("records.users", { zipcode: "10001" }, { zipcode: "10281" }, "NYC")
sh.addTagRange("records.users", { zipcode: "11201" }, { zipcode: "11240" }, "NYC")
sh.addTagRange("records.users", { zipcode: "94102" }, { zipcode: "94135" }, "SFO")

Notice that multiple ranges were added for NYC. This enables you to specify multiple ranges within the same tag that is assigned to a single shard.

If you need to remove a shard tag later, you can do so using the sh.removeShardTag(shard_server, tag_name) method. For example:

sh.removeShardTag("shard0002", "SFO")
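
The tags and ranges you define are stored in the cluster's config database, so you can review them at any time from the MongoDB shell connected to a mongos instance. A brief sketch:

use config
db.shards.find()   // each shard document lists the tags assigned to it
db.tags.find()     // the tag ranges created with sh.addTagRange()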

Summary

In this hour, you learned how to use replication to set up multiple MongoDB servers that act as one. You learned that there are primary, secondary, and arbiter replica servers. The data stored in MongoDB is replicated to each server in the replica set. This provides the benefit of having multiple copies of your data, in case of server failure, and distributes the request load on read requests because data can be read from any of the servers in the replica set.

You also learned that MongoDB sharding is the process of splitting up large datasets onto multiple database servers. You learned that sharding requires a key to determine which server to place a document on and how to choose and implement the shard key. You also implemented a sharded cluster on the example dataset.

Q&A

Q. How does data get moved between servers in the MongoDB sharded cluster?

A. An automatic process called the balancer runs. The balancer automatically moves chunks of data among the MongoDB shard servers to balance the data across the multiple servers.
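
You can also check on the balancer from the MongoDB shell connected to the mongos server. A brief sketch:

sh.getBalancerState()     // true if the balancer is enabled
sh.isBalancerRunning()    // true if a balancing round is currently in progress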

Q. What happens to an unsharded collection in a sharded database?

A. Sharded databases have a primary shard. Collections that are not sharded are stored in the primary shard only.

Workshop

The workshop consists of a set of questions and answers designed to solidify your understanding of the material covered in this hour. Try answering the questions before looking at the answers.

Quiz

1. How do you check the status of servers in a sharded cluster?

2. How do you check the status of servers in a replica set?

3. What is an arbiter server for?

4. Which server do you connect your applications to in a sharded cluster: the mongos, config, or shard server?

Quiz Answers

1. Use sh.status() from the MongoDB shell connected to the mongos server.

2. Use rs.status() from the MongoDB shell connected to the primary server.

3. The arbiter server helps in electing a new primary server in a replica set.

4. The mongos server.

Exercises

1. Try starting and stopping the replica set and sharded cluster a few times, to become familiar with the process. Remember that, because the ports overlap, they cannot be running at the same time.

2. Create a config file for a new shard server named shard3.conf with port 27023 and database code/hour23/data/shard3. Start the new server. Add it to the sharded cluster using the following command:

sh.addShard("localhost:27023")

3. With the replica set started, go back and run some of your applications that access the example words database and verify that they run against the replica set.

4. With the sharded cluster started, go back and run some of your applications that access the example words database and verify that they run against the sharded cluster.
