Sharding

Sharding (or partitioning) is the process of splitting a table across multiple machines: the table is divided into parts, and each machine stores a subset of the data. Sharding a table allows us to store more data and handle more load without scaling vertically (that is, without moving to larger and more powerful machines).

Note

If you haven't worked on scaling a database before, you may be confused about the differences between replication and sharding. Replication creates an exact copy of a table on a different server, whereas sharding distributes the table such that each server has a portion of the data of each table.
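The difference can be made concrete with a toy sketch. The code below is only an illustration: plain Python dictionaries stand in for servers, and the helper names replicate and shard are hypothetical, not part of RethinkDB.

```python
# Toy illustration only: plain dicts stand in for servers.
people = {1: "Alice", 2: "Bob", 3: "Carol", 4: "Dave"}


def replicate(table, server_count):
    """Replication: every server holds a full copy of the table."""
    return [dict(table) for _ in range(server_count)]


def shard(table, server_count):
    """Sharding: each server holds one contiguous slice of the primary keys."""
    keys = sorted(table)
    # Ceiling division, so that the last slice absorbs any remainder.
    size = max(1, -(-len(keys) // server_count))
    return [
        {k: table[k] for k in keys[i:i + size]}
        for i in range(0, len(keys), size)
    ]


replicas = replicate(people, 2)  # two servers, four documents each
shards = shard(people, 2)        # two servers, two documents each
print(replicas)
print(shards)
```

With replication, losing one server loses no data; with sharding, each server stores and serves only part of the table. The two techniques are complementary, which is why a table can be both replicated and sharded.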

Sharding addresses the challenges of scaling to very large datasets. Because each shard handles only a fraction of the operations, the overall capacity of the cluster increases. Sharding also reduces the amount of data that each server needs to store.

One of the benefits of RethinkDB's sharding implementation is that it is completely managed by the database. Even if your cluster has dozens of machines and hundreds of shards, it will still look like a single database to your application. There is no difference in the way we write queries for a sharded database. RethinkDB automatically routes queries to the appropriate shard.
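To get an intuition for what routing a query to the appropriate shard involves, here is a minimal sketch of range-based routing on the primary key. It is a simplified model, not RethinkDB's implementation; the ShardedTable class and its split points are assumptions made for the example.

```python
from bisect import bisect_right


class ShardedTable:
    """Toy table that routes each primary key to one of several shards."""

    def __init__(self, split_points):
        # N split points divide the key space into N + 1 contiguous ranges.
        self.split_points = sorted(split_points)
        self.shards = [{} for _ in range(len(self.split_points) + 1)]

    def _shard_for(self, key):
        # Keys below the first split point go to shard 0; a key equal to
        # a split point belongs to the shard on its right.
        return bisect_right(self.split_points, key)

    def insert(self, key, doc):
        self.shards[self._shard_for(key)][key] = doc

    def get(self, key):
        # The caller never says which shard to use: routing is internal.
        return self.shards[self._shard_for(key)].get(key)


table = ShardedTable(split_points=["m"])
table.insert("alice", {"age": 30})
table.insert("zoe", {"age": 25})
print(table.get("zoe"))
print([sorted(s) for s in table.shards])
```

The application code calls insert and get exactly as it would on an unsharded table, which mirrors the point made above: the sharding is invisible to the queries.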

Sharding a table

RethinkDB greatly simplifies the way in which we administer the database. As you have seen in the previous sections, replicating a table is as easy as clicking a few buttons. Sharding is another example of RethinkDB's ease of use, as it can also be done through the administration web interface.

For example, suppose we want to shard the people table that we created in this chapter into two shards. To do this, open the tables section of the web interface, and click on the people table to enter its settings. Scrolling down, you will see the replication and sharding card, which we have already encountered earlier on:

Sharding a table

As you can see from the previous screenshot, our table currently has one shard and two replicas. Let's add another shard by clicking on the Reconfigure button and setting the shards to 2 as shown in the following screenshot:

Sharding a table

When you've set the number of shards, apply the new configuration. You may have noticed that there is a limit on the number of shards that you can have. This limit is equal to the number of machines in your cluster.
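The same change can also be requested from code rather than from the web interface. The following is a hedged sketch using the reconfigure command of ReQL through the Python driver. The helper name shard_table is hypothetical, and the example assumes that the people table lives in the connection's default database and that a server is reachable on localhost at the default driver port.

```python
def shard_table(r, conn, table_name="people", shards=2, replicas=2):
    """Ask RethinkDB to reconfigure a table and return the server's response.

    `r` is the ReQL namespace object and `conn` an open connection; both are
    passed in so that the function does not depend on a driver version.
    """
    return (
        r.table(table_name)
        .reconfigure(shards=shards, replicas=replicas)
        .run(conn)
    )


# Typical wiring, left commented out because it needs a running server:
#
# from rethinkdb import RethinkDB   # driver 2.4+; older drivers use
# r = RethinkDB()                   # `import rethinkdb as r` instead
# conn = r.connect("localhost", 28015)
# print(shard_table(r, conn))
```

Here replicas is set to 2 to match the two replicas that the table already has; passing a different value would change the replication settings as well.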

When you apply the new configuration, RethinkDB will analyze the table and search for the best split points to break up the table into multiple shards. Currently, all sharding is based on the primary key and is completely automatic: the user cannot specify custom split points, and these cannot be automatically changed after sharding.
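As an illustration of what choosing split points means, the sketch below picks the primary keys that divide a sorted key list into equally sized ranges. This is one simple strategy chosen for the example, not a description of the algorithm RethinkDB actually uses; the function names are hypothetical.

```python
from bisect import bisect_right
from collections import Counter


def choose_split_points(primary_keys, shard_count):
    """Pick shard_count - 1 keys that cut the sorted keys into even ranges."""
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    keys = sorted(primary_keys)
    if not keys:
        return []
    return [keys[len(keys) * i // shard_count] for i in range(1, shard_count)]


def shard_sizes(primary_keys, split_points):
    """Count how many keys fall into each range defined by the split points."""
    counts = Counter(bisect_right(split_points, k) for k in primary_keys)
    return [counts[i] for i in range(len(split_points) + 1)]


ids = range(1, 11)                     # primary keys 1 through 10
points = choose_split_points(ids, 2)
print(points)                          # [6]
print(shard_sizes(ids, points))        # [5, 5]
```

Because the split points are derived from the keys present at the time of sharding, a table whose key distribution later shifts can end up with uneven shards.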

If you receive no errors, the sharding has succeeded, and the people table is now split between two shards, each of which has a secondary replica. We can check this by looking at the data distribution within the administration interface:

Sharding a table

As you can see now, each of the two shards contains almost half of the data. This results in less data on each server, thus increasing capacity and enhancing database performance. Congratulations! You've successfully replicated and sharded a table!
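The distribution can also be summarized numerically. The sketch below turns per-shard document counts into percentages. It assumes the counts come from the doc_count_estimates field that the ReQL info command reports for a table; the numbers used here are made up for the example, and the function name shard_shares is hypothetical.

```python
def shard_shares(doc_count_estimates):
    """Return each shard's share of the documents, as a percentage."""
    total = sum(doc_count_estimates)
    if total == 0:
        return [0.0 for _ in doc_count_estimates]
    return [round(100 * count / total, 1) for count in doc_count_estimates]


# Made-up figures, shaped like the assumed `doc_count_estimates` field
# (one estimate per shard).
info = {"doc_count_estimates": [5012, 4988]}
print(shard_shares(info["doc_count_estimates"]))  # [50.1, 49.9]
```

Shares close to an even split, as in this example, indicate that the split points divided the table well.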
