The CAP theorem

First introduced in the late 1990s by Eric Allen Brewer, the CAP theorem describes the constraints, or more generally the characteristics, of distributed database systems. In brief, the theorem states that a distributed database system can guarantee at most two of the following three properties at the same time:

  • Consistency: The data is consistent across all instances of the database, so a query returns the same, coherent result regardless of which node answers it.
  • Availability: Irrespective of the state of any individual node, the system always responds to a query with a result (whether or not that result reflects the most recent commit).
  • Partition tolerance: When nodes are separated across a network, the system continues to function even if some nodes lose connectivity to others.

Because the nodes in a cluster are connected over a network, which by nature can be disrupted, partition tolerance has to be guaranteed for the system to keep working. The real choice therefore lies between consistency and availability. If the system has to be consistent, that is, show the most recent commit on every node, then not all nodes can be available at the same time, because some of them might not yet have the most recent commit. In that case, a query that touches a new update will not execute until all nodes have been updated with the new data. Conversely, if the system has to be available at all times, consistency cannot be guaranteed: a node that has not yet received a new update will answer with data that differs from the other nodes.
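
The trade-off between consistency and availability can be sketched with a toy, in-memory simulation. This is only an illustration under simplifying assumptions, not how any real database is implemented: the names Node, Cluster, reachable, and demo are hypothetical, a partition is modeled by a simple flag, and the cluster is assumed to know the latest version number globally, which a real system cannot know for free.

```python
class Node:
    """One replica holding a single value (hypothetical toy model)."""

    def __init__(self, name):
        self.name = name
        self.value = None
        self.version = 0        # version of the last update this node received
        self.reachable = True   # False simulates a network partition


class Cluster:
    """A set of replicas that favors either consistency (CP) or availability (AP)."""

    def __init__(self, nodes, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.nodes = nodes
        self.mode = mode
        self.latest_version = 0  # assumption: globally known, for illustration only

    def write(self, value):
        # The update only reaches nodes on our side of the partition.
        self.latest_version += 1
        for node in self.nodes:
            if node.reachable:
                node.value = value
                node.version = self.latest_version

    def read(self, node):
        is_stale = node.version < self.latest_version
        if self.mode == "CP" and is_stale:
            # Consistency first: refuse to answer rather than return old data.
            raise RuntimeError(f"{node.name} is stale; read refused")
        # Availability first: always answer, even if the data is old.
        return node.value


def demo(mode):
    """Write twice, cutting node b off before the second write, then read from b."""
    a, b = Node("a"), Node("b")
    cluster = Cluster([a, b], mode)
    cluster.write("v1")
    b.reachable = False          # partition: b misses the next update
    cluster.write("v2")
    try:
        return cluster.read(b)
    except RuntimeError as err:
        return f"error: {err}"
```

Calling demo("AP") returns the stale value "v1", because the cut-off node still answers, while demo("CP") returns an error string, because the same node refuses to answer until it has caught up. Both modes tolerate the partition; they differ only in which of the other two properties they give up.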

There is a great deal of confusion, as well as contention, over whether to ensure consistency or availability, and databases are accordingly categorized as either CP or AP. For the purpose of this exercise, we need not get caught up in the terminology, as that would lead to a rather abstract and philosophical discussion. The terms are introduced here mainly to reflect on some of the foundational theories driving the development of databases.
