Automatic failover

In the rare case of a system outage, automatic failover tries to recover by reassigning traffic to the remaining available regions. There are generally two cases where this applies:

Read region failover
Write region failover

If any read region fails, Cosmos DB automatically marks it offline so that existing traffic is no longer routed to that region (which avoids 404 and 502 errors). Then, using its internal consensus, it determines the next read region and directs the traffic there.
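The read-failover behavior described above can be pictured with a small toy model. This is only a sketch under stated assumptions: the `ReadRouter` class, its method names, and the region names are hypothetical, and a simple preference-ordered list stands in for the internal consensus that Cosmos DB actually uses to choose the next read region.

```python
from dataclasses import dataclass, field


@dataclass
class ReadRouter:
    """Toy model of read routing; NOT the Cosmos DB implementation."""

    # Read regions in preference order (assumption: a fixed ordered list).
    preferred: list[str]
    # Regions currently marked offline.
    offline: set[str] = field(default_factory=set)

    def mark_offline(self, region: str) -> None:
        # Stop routing reads to a failed region.
        self.offline.add(region)

    def mark_online(self, region: str) -> None:
        # Make a recovered region eligible for reads again.
        self.offline.discard(region)

    def pick_read_region(self) -> str:
        # Return the first preferred region that is not offline.
        for region in self.preferred:
            if region not in self.offline:
                return region
        raise RuntimeError("no read region available")


router = ReadRouter(["West US", "East US", "North Europe"])
router.mark_offline("West US")                      # simulate an outage
selected_after_outage = router.pick_read_region()   # next healthy region
```

In this model, reads move to the second region in the list while the first one is offline, and return to it once `mark_online` is called.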

This is pretty much what Cosmos DB offers and, I believe, what every other database provider offers as well. The key feature is the re-syncing of data when the affected region comes back online: Cosmos DB updates that region with the data it missed during the outage, and everything returns to a normal state.
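To make the re-sync idea concrete, here is a minimal sketch of a recovered replica catching up from a log of changes. The `resync` function, the tuple layout of the log, and the use of sequence numbers are all assumptions made for illustration; the source does not describe how Cosmos DB tracks missed updates internally.

```python
def resync(replica: dict[str, str],
           change_log: list[tuple[int, str, str]],
           last_applied: int) -> int:
    """Apply every logged change newer than last_applied to the replica.

    change_log holds (sequence_number, key, value) tuples and is assumed
    to be sorted by sequence number. Returns the new last-applied number.
    """
    for seq, key, value in change_log:
        if seq > last_applied:
            replica[key] = value   # replay the write the region missed
            last_applied = seq
    return last_applied


# The replica saw change 1, then went offline while changes 2 and 3 happened.
log = [(1, "a", "1"), (2, "b", "2"), (3, "a", "3")]
replica = {"a": "1"}
last = resync(replica, log, last_applied=1)
```

After the call, the replica holds the same data as the regions that stayed online, which is the "normal state" the paragraph above refers to.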

If a write region fails, Cosmos DB looks for another write region. If your application has only one write region and it fails, Cosmos DB promotes one of the read regions to a write region, chosen according to their relative priorities, so that the application keeps running smoothly.
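The priority-based promotion can be sketched as follows. This is an illustrative model only: the `Region` class and `promote_on_write_failure` function are hypothetical names, and the convention that a lower priority number is promoted first is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass
class Region:
    name: str
    priority: int      # assumption: lower number is promoted first
    role: str          # "write" or "read"
    online: bool = True


def promote_on_write_failure(regions: list[Region], failed_name: str) -> Region:
    """Mark the failed region offline and return the region that serves writes."""
    failed = next(r for r in regions if r.name == failed_name)
    failed.online = False

    # If another write region is still online, it simply keeps serving writes.
    for region in regions:
        if region.online and region.role == "write":
            return region

    # Otherwise promote the online read region with the best priority.
    candidates = [r for r in regions if r.online and r.role == "read"]
    if not candidates:
        raise RuntimeError("no region available to take over writes")
    new_writer = min(candidates, key=lambda r: r.priority)
    new_writer.role = "write"
    return new_writer


regions = [
    Region("West US", 0, "write"),
    Region("East US", 1, "read"),
    Region("North Europe", 2, "read"),
]
new_writer = promote_on_write_failure(regions, "West US")
```

Here the single write region fails, so the read region with the next-best priority takes over writes while the remaining region continues to serve reads.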

When the original write region comes back online, Cosmos DB re-syncs its data, and the region becomes a write region again, as it was supposed to be.

We have seen automatic failover; now let's look at the manual failover.
