Replication

Replication keeps copies of your data on multiple servers. These copies provide redundancy and increase data availability.

There are many reasons to replicate your data, but they generally boil down to two:

  • Availability
  • Scalability

If, for any reason, your main server goes down, then you don't want to lose any data, and you probably want another server to immediately start serving data to the clients. If your dataset is reasonably small, then in the event of a failure, you could just spin up a new server and restore data from a backup. However, if the dataset is large, the restore process could take hours! To avoid this downtime, it's a good idea to replicate your data.

The other main reason is scalability. If your database has lots of traffic, it may become slow and unresponsive, degrading the clients' experience. This is especially true if your application requires lots of read queries. In this situation, setting up replication provides you with another copy of data (a read replica) so that the traffic can be effectively load balanced between multiple servers. As your traffic grows, you can add more and more read replicas.

In RethinkDB, you set up replication by creating table replica sets. A replica set is a group of servers with one primary and one or more secondary servers that keep a copy of the primary's data. Suppose we set up a replica set with one primary and one secondary. If one of the servers goes down, you can still access your data from the other replica in the set. Additionally, RethinkDB automatically load balances queries across replicas, reducing traffic on each node and boosting overall cluster performance.

Adding a secondary replica

In this section, we will get started with replication by setting up a two-member replica set. The first thing we are going to do is create a new table called people. You can do so by executing the following query:

r.db('test').tableCreate('people')

If the query succeeds, you will get an output similar to this:

[Screenshot: result of the tableCreate query]

As we can see from the query result, the primary replica has been created on the rethink2 server. As we have two servers connected to our cluster, the maximum number of replicas that can be created for each shard is two.

To create a secondary replica, select the people table from the Tables section of the web interface. On this page, you can configure sharding and replication settings for the table:

[Screenshot: the people table page in the web interface]

As you can see from the previous screenshot, our table currently has one shard and one replica. Let's add another replica by clicking on the Reconfigure button and setting the replicas to 2 as shown in the following screenshot:

[Screenshot: the Reconfigure dialog with replicas set to 2]

As you can see, the administration interface tells us exactly what's going to happen when we apply the changes. The rethink2 server acts as the primary replica, and a secondary replica will be created on the rethink1 server. Let's go ahead and apply the new configuration.
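If you prefer a query to the web interface, the ReQL reconfigure command should achieve the same result. The following is a minimal sketch, assuming a RethinkDB 2.x JavaScript driver; the helper name buildReconfigure is hypothetical, and r is passed in as a parameter so that the snippet does not depend on an open connection:

```javascript
// Sketch: build the ReQL query that mirrors the Reconfigure dialog.
// `r` is the RethinkDB query namespace (the global `r` in the Data Explorer,
// or require('rethinkdb') in Node.js). The function name is hypothetical.
function buildReconfigure(r, dryRun) {
  return r.db('test').table('people').reconfigure({
    shards: 1,      // keep the single shard
    replicas: 2,    // one replica on each server of our two-server cluster
    dryRun: dryRun  // true = only report the planned changes, apply nothing
  });
}
```

In the Data Explorer, the equivalent one-liner is r.db('test').table('people').reconfigure({shards: 1, replicas: 2}). Passing dryRun: true plays the same role as the preview in the web interface: the new configuration is returned but not applied.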

If you receive no errors, the secondary replica will have been created, and the people table is now replicated across two servers. Congratulations! We can confirm this by visiting the Tables section of the web interface:

[Screenshot: the Tables section of the web interface]

As you can see from the preceding screenshot, the people table has one shard and two replicas, one on each server of the cluster.
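The same check can be made with a query: r.db('test').table('people').config() returns a document whose shards field lists the primary replica and all replicas of each shard. The sketch below summarises such a document. The helper name describeReplication is hypothetical, and sampleConfig is an illustrative document trimmed to the relevant fields, not output captured from a real cluster:

```javascript
// Summarise the `shards` field of a table configuration document.
// The field names (shards, primary_replica, replicas) follow the documents
// returned by table.config(); the function name is hypothetical.
function describeReplication(config) {
  return config.shards.map((shard, index) => ({
    shard: index + 1,
    primary: shard.primary_replica,
    replicaCount: shard.replicas.length
  }));
}

// Illustrative configuration for the setup described in this section.
const sampleConfig = {
  name: 'people',
  db: 'test',
  shards: [{ primary_replica: 'rethink2', replicas: ['rethink2', 'rethink1'] }]
};

console.log(describeReplication(sampleConfig));
```

For our table, the summary contains a single shard whose primary is rethink2 and whose replica count is 2.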

Now that we have replicated our table, let's add some data to it. For example, we can add a document to the table by running the following query:

[Screenshot: query inserting a document into the people table]

For the purpose of this example, I have executed this query five times, each with different values, so the table now contains five different documents. You may be wondering what is happening under the hood. As we write data to the table, the secondary replica receives every operation applied to the primary and applies it to its own copy, so that the secondary's dataset mirrors the primary's. This means that we now have a complete copy of the table's data on a different server. So, if the primary server becomes unavailable due to a failure, we will still be able to read data from the secondary replica.
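To make the idea concrete, here is a toy, in-memory model of a primary forwarding writes to a secondary. It is only a sketch and not how RethinkDB is implemented: the Replica and ReplicatedTable classes are hypothetical, the sample document is made up, and real replication also has to deal with acknowledgements, ordering, and servers that rejoin the cluster:

```javascript
// Toy model of primary/secondary replication (NOT RethinkDB's implementation).
class Replica {
  constructor(serverName) {
    this.serverName = serverName;
    this.online = true;
    this.docs = new Map(); // primary key -> document
  }

  apply(doc) {
    this.docs.set(doc.id, { ...doc });
  }
}

class ReplicatedTable {
  constructor(primary, secondaries) {
    this.primary = primary;
    this.secondaries = secondaries;
  }

  // Writes go to the primary, which forwards each operation to the secondaries.
  insert(doc) {
    if (!this.primary.online) {
      throw new Error('primary replica is unavailable');
    }
    this.primary.apply(doc);
    for (const secondary of this.secondaries) {
      if (secondary.online) secondary.apply(doc);
    }
  }

  // Reads use the primary when it is up and fall back to an online secondary.
  read(id) {
    const source = this.primary.online
      ? this.primary
      : this.secondaries.find((secondary) => secondary.online);
    if (!source) throw new Error('no replica is available');
    return source.docs.get(id);
  }
}

const table = new ReplicatedTable(new Replica('rethink2'), [new Replica('rethink1')]);
table.insert({ id: 1, name: 'Alice' }); // sample document, made up for illustration
table.primary.online = false;           // simulate the failure of rethink2
console.log(table.read(1).name);        // prints "Alice", served by rethink1
```

Because every write was also applied to the secondary, the read still succeeds after the primary has been marked offline.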

Failover

Let's put this into practice by simulating a database failure. In the following example, I will simulate the failure of the primary replica, that is, server rethink2. To simulate a failure, we can simply stop the database by running the following command from the terminal:

sudo /etc/init.d/rethinkdb stop

This will completely shut down the database on the rethink2 server. If you now check the web interface, you will see a screen similar to the following one:

[Screenshot: the web interface reporting one unavailable server]

As you can see, the web interface reports that one server is unavailable and that this affects one of the tables. The advantage of implementing replication is that in this situation, with one server offline, we can still read data from the secondary replica.

We can try this by running the following query from the Data Explorer:

r.db('test').table('people', {readMode: 'outdated'})

As our primary replica is offline, there is a possibility that our queries return outdated data; for this reason, we must tell RethinkDB that we're willing to receive outdated data. We do this by setting the readMode option (read_mode in the Python driver) to 'outdated' when running the query. Running the previous query will produce the following result:

[Screenshot: result of the query on the people table]

As you can see, even if one of the servers is offline, we can still read data and run queries on the table as we have a secondary replica. Now that we have successfully implemented replication, it's time to look at sharding.
