Chapter 8. Advanced Database Features

We are on the last leg of our journey. So far, we have seen how Trove helps users create, configure, resize, back up, and restore different data stores. However, all of these tasks deal with a single instance.

With something as important as data (especially production data), few organizations will risk running a single instance. Therefore, in a production setup, it is imperative to introduce some form of high availability for databases.

While there are several options, the two most commonly used for databases are replication and clustering.

In this chapter, we will deal with these features of Trove. Currently, these features are only available for some of the databases that Trove supports.

Another point to keep in mind is that Trove itself does not provide these features; it merely provides a platform to help configure them when the underlying database supports them. This means that if database engine type X doesn't support a feature (replication or clustering), then Trove cannot be used to set it up.

Trove enables these features using strategies (the same concept that was seen in the previous chapter for backups).

In this chapter, we will go over the following topics:

  • Understanding replication and clustering
  • How to set up replication in Trove and the different failover options available to the administrator
  • How to set up clustering in Trove

The replication example will use the MySQL data store, for which we have already created an image. For the clustering piece, we will use MongoDB. As we have not yet created an image for it, we will also be creating an image for the MongoDB data store.

Replication and clustering

While the detailed discussion on this topic is beyond the scope of this book, it makes logical sense to briefly look at what these mean before we get into the nitty gritty of configuring the two using Trove.

Please do remember that this is a general understanding and certain advanced features provided by some database engines may follow a different pattern.

Replication

Replication, defined in the simplest terms, is the process of keeping a copy of the data available on another node. A replication setup typically has two or more nodes, where one is the master (where reads and writes happen) and the others are slaves (where only reads can happen). Master-master replication also exists, but that's beyond the scope of this book.

There are two main reasons/benefits for which one could opt for replication:

  • For failover (Business Continuity Plan):
    • In the event the master fails, the slave can be promoted and the applications can continue to work
    • The failover is mostly manual, but can be automated with scripts
    • There can be consistency issues with the data: replication typically lags behind the master, so any writes that have not yet reached the slave when the master fails may be lost
  • For performance improvement:
    • In order to share the load of data reads (for reports), slaves could serve the purpose
    • In such scenarios, masters are used for database writes and real-time data reads, while slaves can be used for near real-time data reads
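The behavior described in the preceding points can be sketched with a small toy model. This is only an illustration under simplifying assumptions: the Node and ReplicaSet classes and their methods are hypothetical names invented for this example and are not part of Trove or MySQL. The sketch shows writes going to the master, near real-time reads served by a slave, and the writes that are lost when a failover happens before replication catches up.

```python
# Toy model of asynchronous master-slave replication (illustrative only;
# all class and method names are hypothetical, not a Trove or MySQL API).

class Node:
    def __init__(self, name):
        self.name = name
        self.data = {}


class ReplicaSet:
    def __init__(self, master, slaves):
        self.master = master
        self.slaves = list(slaves)
        self.pending = []  # writes accepted by the master but not yet shipped

    def write(self, key, value):
        # Writes go only to the master; slaves catch up later.
        self.master.data[key] = value
        self.pending.append((key, value))

    def read(self, key, realtime=False):
        # Real-time reads hit the master; other reads go to a slave if one exists.
        node = self.master if realtime or not self.slaves else self.slaves[0]
        return node.data.get(key)

    def replicate(self):
        # Ship every pending write to every slave, then clear the backlog.
        for key, value in self.pending:
            for slave in self.slaves:
                slave.data[key] = value
        self.pending.clear()

    def failover(self):
        # Promote the first slave; writes that were never replicated are lost.
        lost = list(self.pending)
        self.pending.clear()
        self.master = self.slaves.pop(0)
        return lost


rs = ReplicaSet(Node("master"), [Node("slave-1")])
rs.write("order:1", "paid")
rs.replicate()                              # slave now has order:1
rs.write("order:2", "paid")                 # accepted, but not yet replicated
stale = rs.read("order:2")                  # served by the slave
fresh = rs.read("order:2", realtime=True)   # served by the master
lost = rs.failover()                        # master fails before replicating
```

Here the slave read of order:2 finds nothing while the master read sees the new value, and the failover reports order:2 as a lost write. Real engines track replication position with logs rather than an in-memory list, but the trade-off is the same.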

Clustering

Clustering focuses on a single goal: availability. Clustering is available at various levels, from hardware clusters to operating system clusters to application clusters. In terms of databases, however, a cluster ensures that a transaction is treated as complete only when the data has been written to all the nodes.

Clusters are used where high availability is desired without any loss of data.
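The "written on all the nodes" rule can be sketched as a minimal all-or-nothing write. This is a simplified illustration, not how any particular database engine implements clustering: ClusterNode and cluster_write are hypothetical names, and a node's failure is modeled with a simple healthy flag.

```python
# Toy model of a synchronous cluster write (illustrative only; names are
# hypothetical and real engines use more elaborate protocols).

class ClusterNode:
    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.data = {}    # committed values
        self.staged = {}  # values prepared but not yet committed

    def prepare(self, key, value):
        # A node that is down cannot accept the write.
        if not self.healthy:
            return False
        self.staged[key] = value
        return True

    def commit(self, key):
        self.data[key] = self.staged.pop(key)

    def rollback(self, key):
        self.staged.pop(key, None)


def cluster_write(nodes, key, value):
    """Commit the write only if every node accepts it; otherwise undo it."""
    prepared = []
    for node in nodes:
        if node.prepare(key, value):
            prepared.append(node)
        else:
            # One node refused: discard the staged value everywhere.
            for p in prepared:
                p.rollback(key)
            return False
    for node in nodes:
        node.commit(key)
    return True


nodes = [ClusterNode("a"), ClusterNode("b"), ClusterNode("c")]
ok = cluster_write(nodes, "k", 1)       # all nodes healthy
nodes[2].healthy = False
failed = cluster_write(nodes, "k", 2)   # node "c" is down, write is rejected
```

After the rejected write, every node still holds the earlier value for k, so no node ever serves data that the others lack. The cost of this guarantee is that a single unavailable node blocks writes in this sketch; real clustering implementations usually soften this with quorum or membership rules.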
