Taking a backup of a sharded cluster

If we want to take a backup of an entire sharded cluster, we need to stop the balancer before starting. The reason is that, if chunks are migrating between shards at the moment we take our snapshot, the database will end up in an inconsistent state, with either incomplete or duplicated copies of the chunks that were in flight.

Backups of an entire sharded cluster will only be approximate-in-time. If we need point-in-time precision, we have to stop all writes to our database, something that is generally not possible for production systems.

First we need to disable the balancer by connecting to our mongos through the mongo shell:

> use config
> sh.stopBalancer()
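
Before moving on, it is worth confirming that no migration is still in progress. A minimal sketch of such a check, assuming the shell helpers sh.isBalancerRunning() and sleep() behave as in the 3.x-era mongo shell (where the former returns a boolean), could look like this:

> while (sh.isBalancerRunning()) {
...   print("waiting for in-flight migrations to finish...");
...   sleep(1000);
... }

Once this loop exits, no balancing round is running and it is safe to start taking snapshots.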

Then, if we don't have journaling enabled on our secondaries, or if the journal and data files reside on different volumes, we need to lock the secondary mongod instances of all shards and of the config server replica set.

We also need a sufficiently large oplog on these servers so that they can catch up with their primaries once we unlock them; otherwise, we will need to resync them from scratch.
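
If locking is required, a minimal sketch of the procedure, connecting to each affected secondary in turn, would be to call db.fsyncLock() before taking that member's snapshot and db.fsyncUnlock() once the snapshot has completed:

> db.fsyncLock()
> // ...take the snapshot of this member's data volume...
> db.fsyncUnlock()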

Assuming that we don't need to lock our secondaries, the next step is to back up the config server. On Linux, using LVM, this would look similar to the following:

$ lvcreate --size 100M --snapshot --name snap-14082017 /dev/vg0/mongodb
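
The snapshot itself is not an archive. A common follow-up, sketched here with a hypothetical output file name derived from the snapshot name above, is to stream the snapshot volume into a compressed file for safekeeping:

$ dd if=/dev/vg0/snap-14082017 | gzip > snap-14082017.gz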

Then we need to repeat the same process for a single member from each replica set in each shard.

Finally, we need to restart the balancer using the same mongo shell that we used to stop it:

> sh.setBalancerState(true)
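
As a quick sanity check, sh.getBalancerState() should now report that the balancer is enabled again:

> sh.getBalancerState()
true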

Without going into further detail, it's evident that taking a backup of a sharded cluster is a complicated and time-consuming procedure. It needs prior planning and extensive testing to make sure not only that it works with minimal disruption, but also that our backups are usable and can be restored back to our cluster.
