Performance comparison with replica sets

Developers and architects are always looking out for ways to compare performance between replica sets and sharded configurations.

The way MongoDB implements sharding, it is based on top of replica sets. Every shard in production should be a replica set.

The main difference in performance comes from fan out queries. When we are querying without the shard key, MongoDB's execution time is limited by the worst-performing replica set.

In addition, when using sorting without the shard key, the primary server has to implement the distributed merge sort on the entire dataset. This means that it has to collect all data from different shards, merge-sort them, and pass them as sorted to mongos.

In both cases, network latency and limitations in bandwidth can slow down operations as opposed to a replica set.

On the flip side, by having three shards, we can distribute our working set requirements across different nodes, thus serving results from RAM instead of reaching out to the underlying storage, HDD or SSD.

On the other hand, writes can be sped up significantly since we are no longer bound by a single node's I/O capacity but we can have writes in as many nodes as there are shards.

Summing up, in most cases and especially for the cases that we are using the shard key, both queries and modification operations will be significantly sped up by sharding.

The shard key is the single most important decision in sharding and should reflect and apply to our most common application use cases.

Table of Contents for Performance comparison with replica sets

Create new playlist

Sign In

Sign Up

Table of Contents for
Performance comparison with replica sets