13 Consistency

In a distributed database system with replicated data, consistency has a wider connotation than consistency in single-server systems (as in the ACID properties; see Chapter 2). For example, a user may write a new value for a data item on one replica (residing at one server); but the same user may later on read an older value from another server where the replica has not yet been updated due to delays or failures in the system. Moreover, different users might try to update data items concurrently on different replicas, leading to a conflict. Different notions of consistency have been devised to specify desired properties in replicated database systems. A database system without replication (a “one-copy” database) nevertheless serves as the gold standard of consistency for distributed systems; that is, ideally all updates should be immediately visible at all replicas and should be applied in the same order.

13.1 Strong Consistency

Strong consistency denotes the ideal world for users of a replicated database system. It demands that users never read stale data and that writes are applied at all replicas in the same order, essentially instantaneously after the user issued the write request. However, such ideal system behavior can never be achieved in a distributed shared-nothing database system due to network delays, failures and concurrent operations at different replicas.

To see how sequences of user requests at different replicas can interfere with each other, consider a replicated data item x. Reads are only local, but writes are propagated to all other replicas. For example, in Figure 13.1, Replica 2 reads x, processes a write request from Replica 1 and then writes its own result (based on a now stale value of x); this result is then propagated to all other replicas, overwriting the previous value. That is, we observe a stale read at Replica 2 and a lost update of Replica 1’s write operation, because that write is not taken into account by subsequent read operations – although logically some process might have to operate on the value written by Replica 1.

While in reality instantaneous writes at all replicas and always up-to-date reads are impossible, some definitions of strong consistency aim at an appropriate global ordering of writes at all replicas at the cost of high synchronization requirements between the replicas. One early definition of strong consistency originating from distributed programming is called sequential consistency. Sequential consistency was originally applied in multiprocessor systems. To improve performance of a distributed program running on multiple processors, reordering of individual operations (without however changing the final output) might be possible. Lamport [Lam79] observed that this reordering may lead to erroneous behavior on a multiprocessor system and required that the ordering of each processor’s program be maintained.


Fig. 13.1. Interfering operations at three replicas

Transferred to a distributed database system, sequential consistency demands a global order of the write requests that are propagated to all replicas. With sequential consistency, transactions are not considered. Instead, at each replica, sequences of independent read and write operations are executed. The write operations are propagated to all other replicas and hence a global interleaving of all writes issued by the different replicas has to be found. Sequential consistency ensures that

there is a global ordering of all writes; that is, all replicas apply all writes in the same order;

the local operation order at each replica is preserved; in other words, if two operations occur in sequence locally at one replica, they cannot be swapped in the global ordering.

The sequence of operations shown in Figure 13.1 complies with the definition of sequential consistency, because all write operations are executed in the same order at all replicas while also respecting the correct order of read and write operations locally at each replica. If – in contrast – the order of the two write operations were swapped at one of the replicas, then sequential consistency would be violated. However, we see that sequential consistency cannot avoid stale reads because it is defined on individual operations: sequential consistency has no notion of transactions and with it no means to define indivisible sequences of read and write operations.
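As a concrete illustration of the first condition, consider the following minimal Python sketch (the log layout is a hypothetical assumption: each replica log lists the identifiers of the writes in the order in which the replica applied them). It checks whether all replicas applied the writes in the same global order; checking the second condition would additionally require the per-replica issue order of the operations.

```python
# A minimal sketch (hypothetical log layout): each replica log lists the
# identifiers of the writes in the order in which the replica applied them.
# Sequential consistency requires that this order is identical everywhere.

def same_global_write_order(replica_logs):
    """Return True if all replicas applied the writes in the same order."""
    reference = replica_logs[0]
    return all(log == reference for log in replica_logs[1:])

# Situation of Figure 13.1: both writes are applied as w1 before w2 everywhere.
logs = [
    ["w1", "w2"],  # Replica 1
    ["w1", "w2"],  # Replica 2
    ["w1", "w2"],  # Replica 3
]
print(same_global_write_order(logs))  # True
```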

A common definition for strong consistency respecting transactions is one-copy serializability. A one-copy database is one that does not replicate data and hence a write of a data item will only be directed to the server maintaining the single copy of the data item. Serializability has been defined in Section 2.5.2 for non-replicated databases: every interleaving of transactions must be such that it corresponds (that is, is equivalent) to a serial execution of the transactions one after the other. One-copy serializability extends this definition to multiple replicas: all replicas see interleavings of different transactions (that is, request sequences of different users) in the same order. This order must be serializable in the usual sense: the values read and written must be the same as if the transactions were executed serially one after the other. However, even in single-copy databases, serializability is hard to verify and instead locking or timestamping are used. In distributed database systems, the situation is even worse: the coordination overhead required between the replicas does not scale well.


Fig. 13.2. Serial execution at three replicas

To illustrate one-copy serializability, we can consider two simple transactions: one executed by Replica 1 as T1 : 〈read1(x), write1(x)〉 and one executed by Replica 2 as T2 : 〈read2(x), write2(x)〉. To fulfill the requirement of one-copy serializability, we see that no interleaving is possible: if we interleaved these transactions, for example as 〈read1(x), read2(x), write1(x), write2(x)〉 (with a distributed execution as shown in Figure 13.1), the read values would be different from the values read in a serial execution of these transactions, where a write has to occur in between the reads – thus violating the serializability requirement. That is, to achieve one-copy serializability we indeed have to execute the transactions serially: either in the order 〈T1, T2〉 (as shown in Figure 13.2), or in the order 〈T2, T1〉. Hence, the stale read at Replica 2 is avoided, because the write in T1 is executed before the read in T2.
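To make the lost update tangible, the following minimal Python sketch (under the simplifying assumption that each transaction writes back its read value incremented by one) replays the interleaved schedule of Figure 13.1 on a single copy of x and contrasts it with a serial execution.

```python
# A minimal sketch: each transaction reads x and writes back the read value
# plus one. The interleaved schedule loses one of the two updates, while any
# serial execution applies both.

def run(schedule, x0=0):
    x = x0
    read_values = {}           # value each transaction has read
    for op, t in schedule:     # op is "read" or "write", t the transaction id
        if op == "read":
            read_values[t] = x
        else:                  # write back the incremented read value
            x = read_values[t] + 1
    return x

interleaved = [("read", 1), ("read", 2), ("write", 1), ("write", 2)]
serial      = [("read", 1), ("write", 1), ("read", 2), ("write", 2)]

print(run(interleaved))  # 1 -- the update of T1 is lost
print(run(serial))       # 2 -- the outcome any serial order produces
```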


Fig. 13.3. Read-one write-all quorum (left) and majority quorum (right)

13.1.1 Write and Read Quorums

A flexible mechanism to avoid stale reads and lost updates among a group of servers in a replicated database system is to use quorums when reading and writing data:

a read quorum is defined as a subset of replicas that have to be contacted when reading data; for a successful read, all replicas in a read quorum have to return the same answer value.

a write quorum is defined as a subset of replicas that have to be contacted when writing data; for a successful write, all replicas in the write quorum have to acknowledge the write request.

More generally, let R denote the size of a read quorum, W the size of a write quorum, and N the replication factor. A usual requirement for a quorum-based system is that any read quorum and any write quorum overlap; in other words, the sum of the read quorum size and the write quorum size is larger than the replication factor – that is, as a formula, R + W > N. In this way, it can be ensured that at least one replica (indeed, all replicas in the overlap) has acknowledged all previous writes and hence is able to return the most recent value. Two typical variants of quorum-based systems (ROWA and majority) are defined as follows. Read-one write-all (ROWA) requires that writes are acknowledged by all replicas, but for reads it suffices to contact one replica. Hence in a ROWA system, reads are fast but writes are slow. A majority quorum for both reads and writes requires that (for N replicas) at least ⌊N/2⌋ + 1 replicas acknowledge the writes and at least ⌊N/2⌋ + 1 replicas are asked when reading a value. Both variants are shown in Figure 13.3.
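The quorum arithmetic itself is easy to express in code. The following minimal Python sketch (the function names are illustrative and not taken from any particular system) checks the overlap condition R + W > N and derives the ROWA and majority configurations for a given replication factor.

```python
# A minimal sketch of the quorum arithmetic for a replication factor N.

def quorums_overlap(r: int, w: int, n: int) -> bool:
    """Overlap condition for strongly consistent reads: R + W > N."""
    return r + w > n

def rowa(n: int):
    return 1, n               # read-one, write-all

def majority(n: int):
    m = n // 2 + 1            # smallest majority of N replicas
    return m, m

n = 5
for name, (r, w) in [("ROWA", rowa(n)), ("majority", majority(n))]:
    print(name, r, w, quorums_overlap(r, w, n))
# ROWA 1 5 True
# majority 3 3 True
```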

When all replicas in a read quorum return an identical value, the requesting client can be sure that this is the most recent value and hence the read value is strongly consistent. It might of course happen that a read quorum contains replicas with stale data. For example, the majority read quorum in Figure 13.3 contains servers S1 and S2 which might (not yet) have received the most recent write. In this case, the returned values will be ambiguous. To then determine the most recent value, it is common practice to combine quorums with version vectors (see Section 12.3.4). Hence strong consistency for reads can be achieved with quorums: provided that the read quorum and the write quorum of each data item overlap, the value with the highest version vector can be chosen as the most recently written value.
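As a sketch of how a client could resolve a read quorum with version vectors (the response format below is a hypothetical assumption), the following Python snippet keeps only those returned values whose version vectors are not dominated by any other response; if exactly one survivor remains, it is the most recently written value.

```python
# A minimal sketch of read resolution with version vectors: a vector dominates
# another if it is component-wise greater or equal and strictly greater in at
# least one entry.

def dominates(v1: dict, v2: dict) -> bool:
    keys = set(v1) | set(v2)
    ge = all(v1.get(k, 0) >= v2.get(k, 0) for k in keys)
    gt = any(v1.get(k, 0) > v2.get(k, 0) for k in keys)
    return ge and gt

def freshest(responses):
    """Return the (value, version vector) pairs not dominated by any other response."""
    return [(val, vv) for val, vv in responses
            if not any(dominates(other_vv, vv) for _, other_vv in responses)]

# S1 and S2 missed the latest write, S3 has processed it: S3's value wins.
responses = [("old", {"S3": 1}), ("old", {"S3": 1}), ("new", {"S3": 2})]
print(freshest(responses))  # [('new', {'S3': 2})]
```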

Even with intersecting read and write quorums, cases of concurrent writes can occur: the most recent value is not unique because there are concurrent version vectors. That is why a similar intersection requirement can be put on the write quorum sizes to achieve strong consistency during write operations. The requirement is that any two write quorums must overlap: in other words, twice the write quorum size is larger than the replication factor: 2W > N. Now, in order to avoid concurrent writes, write quorum intersection can again be combined with version vectors to achieve strong consistency for the writes: The servers in a write quorum can enforce total order of the writes by rejecting any write requests that are incomparable to the current version vector stored on each server. This forces the requesting client to read the current version and retry the write with the up-to-date version vector.
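A minimal sketch of the write-side rule could look as follows (the server state and method names are assumptions, not a concrete system’s API): a replica accepts a write only if the version vector supplied by the client is at least as new as the one it stores; an incomparable vector indicates a concurrent write and leads to rejection, forcing the client to re-read and retry.

```python
# A minimal sketch: a replica in the write quorum rejects writes whose version
# vector is not comparable to (and at least as new as) its stored vector.

def leq(v1: dict, v2: dict) -> bool:
    """True if v1 <= v2 component-wise (v2 has seen everything v1 has seen)."""
    return all(v1.get(k, 0) <= v2.get(k, 0) for k in set(v1) | set(v2))

class Replica:
    def __init__(self):
        self.value, self.version = None, {}

    def write(self, value, version: dict) -> bool:
        if leq(self.version, version):         # client has seen our current version
            self.value, self.version = value, version
            return True
        return False                           # stale or concurrent: reject, client must re-read

r = Replica()
print(r.write("a", {"c1": 1}))               # True
print(r.write("b", {"c2": 1}))               # False -- concurrent with {"c1": 1}
print(r.write("b", {"c1": 1, "c2": 1}))      # True  -- after re-reading and merging
```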

Quorums have the following positive effects in a distributed database system:

Quorums enforce the same order of write operations on all replicas when combined with version vectors. Note however, that this order is not necessarily serializable; to achieve one-copy serializability extra synchronization effort between the replicas is required.

Availability of data is ensured as long as the desired quorum is reachable by the client.

Latency is reduced when the quorum size is smaller than the replication factor, because not all replicas need to be contacted.

Partition-tolerance is achieved as long as the desired quorum is part of an entire partition and the partition is reachable for the client.

Flexibility comes from the fact that different quorums can be chosen for each data item. These quorums can also be of different size for each data item.

Quorums can also be used when strong consistency is not required: indeed, quorum sizes can be chosen on every read and write request. If the requirement that quorums overlap is dropped, the quorums are called partial quorums. Without overlapping quorums, strong consistency can no longer be ensured – and hence stale reads or conflicting updates can occur. Some database systems can be configured to aim at majority quorums in the optimal case – but if only partial quorums can be established, the database system accepts them as a way to react to failures. In this case, the database system continues nevertheless after a timeout even if no majority of acknowledging servers can be reached. In this way, weak consistency (see Section 13.2) can be achieved on demand to improve the latency of the read or write operation.

13.1.2 Snapshot Isolation

One property of multiversion concurrency control (MVCC; see Section 12.2.3) is called snapshot isolation [BBG+95, EPZ05]. A snapshot xi of a data item x is a version of the data item that was written by transaction Ti. The snapshot for one transaction is the set of snapshots of those data items that the transaction accesses.

Following the customary notation [ASS13, SPAL11], we write transactions as sequences of operations. An interleaving of several transactions is called a history. A history consists of several read (ri) and write (wi) operations which happen in a transaction Ti. Each transaction ends with either a commit (ci) or abort (ai) operation. Moreover, for each transaction Ti there is one snapshot operation si which happens before any other operation in the transaction. The write set of a transaction Ti is the set of data items written by the transaction; it is denoted as writeset(Ti). The read set (denoted readset(Ti)) is the set of data items read by the transaction. For simplicity of notation we assume that every transaction first reads a data item before writing it and each transaction writes a data item at most once. We write o < o' when operation o occurs before operation o' in a history.

Snapshot isolation ensures the following two properties for any transactions Ti, Tj, Tk (with i, j and k pairwise distinct) that are interleaved in a history h:

1.Read Rule: Whenever a transaction Ti reads a version xj written by another transaction Tj (that is, ri(xj) ∊ h), and furthermore another transaction Tk writes data item x (that is, wk(xk) ∊ h) and later on commits (that is, ck ∊ h), then the following holds

Transaction Tj commits: cj ∊ h

Transaction Tj commits before transaction Ti takes its snapshot: cj < si

Transaction Tk commits before transaction Tj commits, or transaction Ti takes its snapshot before transaction Tk commits: ck < cj or si < ck

2.Write Rule: For two transactions Ti and Tj that both commit (that is, ci ∊ h and cj ∊ h) and whose write sets intersect (that is, writeset(Ti) ⋂ writeset(Tj) ≠ ∅), one must have committed before the other takes its snapshot: ci < sj or cj < si.

The read rule enforces an ordering of the history such that a transaction only sees values in its snapshot that have been written by a transaction that actually committed before; and, if one transaction Ti sees the value xj, then any transaction Tk that also writes x either must have committed before Tj committed (so that Tj overwrites the value written by Tk) or will commit later so that Ti will not observe any values written by Tk at all.

The write rule ensures that whenever two transactions modify the same data item then one must have committed before the other one takes its snapshot. In this way, the write rule also enforces a “first committer wins” strategy: if two transactions concurrently try to commit although one has not seen the effects of the other, then only the first committing transaction succeeds while the second one is aborted.
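The “first committer wins” strategy can be sketched as a simple check over transaction records (the record format with snapshot and commit timestamps below is an assumption for illustration): a transaction is allowed to commit only if no transaction with an overlapping write set committed between its snapshot and its commit attempt.

```python
# A minimal sketch of "first committer wins" over hypothetical transaction
# records (tid, snapshot_time, commit_time, writeset).

def first_committer_wins(transactions):
    """Return a dict mapping each tid to True (commit) or False (abort)."""
    committed = []                      # (commit_time, writeset) of successful commits
    outcome = {}
    for tid, snap, commit, ws in sorted(transactions, key=lambda t: t[2]):
        conflict = any(snap < c <= commit and ws & other_ws
                       for c, other_ws in committed)
        outcome[tid] = not conflict
        if not conflict:
            committed.append((commit, ws))
    return outcome

# T1 and T2 both take their snapshot at time 0 and write x:
# only the transaction committing first succeeds.
txns = [("T1", 0, 5, {"x"}), ("T2", 0, 7, {"x"})]
print(first_committer_wins(txns))  # {'T1': True, 'T2': False}
```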

Snapshot isolation does not provide serializability. In particular, an anomaly called write skew may occur under snapshot isolation but not under serializability. A further property [EPZ05] can be checked at runtime to ensure serializability:

Dynamic Serializability Rule For any two transactions Ti and Tj that commit concurrently, Ti may not read data items that Tj writes; that is, the read set of one transaction may not intersect with the write set of the other: for si ∊ h, ci ∊ h and cj ∊ h, if si < cj < ci, then readset(Ti) ⋂ writeset(Tj) = ∅.
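The rule lends itself to a direct runtime check. The following minimal sketch (with illustrative timestamps) flags a violation when Tj commits inside Ti’s snapshot-to-commit interval while writing something that Ti reads.

```python
# A minimal sketch of the runtime check behind the dynamic serializability rule.

def violates_dynamic_serializability(si, ci, readset_i, cj, writeset_j):
    """True if Tj commits between Ti's snapshot and commit (si < cj < ci)
    while Ti reads a data item that Tj writes."""
    concurrent = si < cj < ci
    return concurrent and bool(readset_i & writeset_j)

# Ti reads x under a snapshot taken at time 1 and commits at time 10;
# Tj writes x and commits at time 5 -- the rule is violated (write skew risk).
print(violates_dynamic_serializability(1, 10, {"x"}, 5, {"x"}))  # True
```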

One-copy serializability would require that even in a distributed system, a transaction always takes a snapshot based on a global real time. As this is impossible to achieve, snapshot isolation has been generalized [EPZ05] to be allowed to take a snapshot based on any older transaction that committed previously – not necessarily being the latest transaction in the distributed database system. In this way, snapshot isolation supports lazy replication: a transaction can take a snapshot of the current state of any replica locally – although another transaction might have committed on a remote replica at a later point of time.

It has been shown [ASS13] that snapshot isolation can be decomposed into four properties. In other words, instead of ensuring the read rule and the write rule above, we can as well ensure the four properties of avoiding cascading aborts (ACA), strictly consistent snapshots (SCONS), snapshot monotonicity (MON), and write-conflict freedom (WCF):

1.ACA: A history h avoids cascading aborts, if for every read ri(xj) in h, commit cj occurs before it: cj < ri(xj).

2.SCONS: A strictly consistent snapshot reads all data items that the transaction accesses at the same point in time; this can be expressed by the two properties that have to hold for any transactions Ti, Tj, Tk, and Tl with k ≠ j:

(a)SCONSa: when transaction Ti observes writes of both Tj and Tl, then Tl may not commit after Ti has read the modified value; more formally, when ri(xj) ∊ h and ri(yl) ∊ h, then ri(xj) is not allowed to precede the commit cl, that is, ri(xj) ≮ cl; however, they may be concurrent.

(b)SCONSb: when transaction Ti observes writes of both Tj and Tl, and when Tk writes the same data item that Tj writes and Ti reads, and if it then holds that Tk commits before Tl, then Tk also commits before Tj: when ri(xj) ∊ h and ri(yl) ∊ h and wk(xk) ∊ h, if ck < cl then also ck < cj.

3.MON: Snapshots in a history h are monotonic if they can be partially ordered so that, if a transaction Ti takes its snapshot before another transaction Tl commits, then Ti will never read a value written by Tl or by any other transaction Tj that reads a value written by Tl.

4.WCF: A history h is write-conflict free if two independent transactions never write to the same object; two transactions Ti1 and Tin are independent if there is no information flow between them via a cascade of read operations; in other words, there does not occur a sequence of reads ri2(xi1), ri3(yi2), …, rin(zin−1) in h.

13.2 Weak Consistency

Ensuring strong consistency is usually costly (in terms of latency) or may even lead to indefinite blocking or aborting of operations (hence reducing availability for these operations). Achieving strong consistency might also be overly restrictive (or overly pessimistic) in the sense that operations are suspended which could actually be executed immediately without causing any problems. For example, a replica in a quorum might respond with a delay larger than the others and the slower replica then keeps the other replicas waiting. Moreover, strong consistency requires a high amount of synchronization between replicas. For a write-heavy system this will turn out to be a bottleneck. Reducing the synchronization requirements leads to weaker forms of consistency. Weak consistency can improve performance of the overall system (in terms of latency or availability); however, weak consistency can cause conflicts and inconsistencies of the data and may even lead to data loss.

Optimistic replication [SS05] takes the perspective that inconsistencies and conflicts may occur but they occur only rarely and they can be resolved after they have been detected. However with optimistic replication only weaker forms of consistency can be ensured. Weak consistency leads to advantages for other properties of replicated systems like:

Availability: Requests are not blocked but every request can be completed; for example, [COK86] measure availability as the fraction of requests (or transactions) in the entire system that complete.

Reduced latency: Requests return faster without waiting for acknowledgements of (all) other replicas.

Failure tolerance: Under several failure scenarios, the system (or parts of it) still remains functional; as a special case, partition tolerance means that even when the distributed system is split into several subsystems (the partitions) with no means of communication between the partitions, at least a part of the system is still able to provide the requested functionality – for example, at least one of the partitions can still accept write requests.

Scalability: a larger number of replicas (hence a larger replication factor for the individual data items) can be supported because less synchronization between replicas is needed.

There are different ways to weaken the notion of strong consistency. These weaker definitions of consistency are easier to implement than strong consistency but provide fewer consistency guarantees with respect to the supported operations, the number of records that can be accessed, or the operation ordering that is enforced:

Operations: Enforcing consistency for sequences of individual operations (operations are seen as individual commands not belonging to a larger transaction) at the replicas is a weaker requirement than enforcing consistency for read-only transactions which is again a weaker requirement than enforcing consistency for read-write transactions.

Records: Consistency for operations or transactions may be restricted to span only one individual record which is weaker than enforcing consistency when accessing multiple records in an operation or transaction.

Ordering: different operation orderings may be enforced by the consistency definitions – like real-time ordering (which requires a fully synchronized global clock); causality ordering (based on some causality relation between operations where some operations might be concurrent); transaction ordering (taking into account the ordering inside transactions as well as between transactions); operation sequence ordering (as prescribed by the operations issued at the individual replicas); or arbitrary ordering.

For strong consistency, eager replication is necessary: write operations have to be propagated to the replicas (or at least a quorum of replicas); these replicas have to acknowledge that the write succeeded before any other operation can be executed. In contrast, weak consistency can also rely on lazy replication: only one replica handling the write request has to successfully execute it; then the write operation is propagated to the other replicas, but it is optimistically assumed that the propagated write will succeed and hence there is no need to wait for acknowledgements. That is also why with lazy replication the propagation need not occur instantaneously but could, for example, be executed in a batch. Lazy replication improves the latency of writes, but requires conflict handling should a conflicting write be detected at a later point in time. Moreover, with lazy replication, stale reads might happen more often, because some replicas may not have received the most recent update before answering read requests. The duration from the acceptance of a write request by the first replica until the last successful write execution by all replicas is called the inconsistency window. Ideally, the inconsistency window is very short in order to reduce the amount of stale reads.
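The effect of lazy propagation and the resulting inconsistency window can be sketched as follows (the classes and the explicit flush call are purely illustrative assumptions): the primary applies a write locally and returns immediately, and only a later batch propagation brings the followers up to date.

```python
# A minimal sketch of lazy replication with batched propagation.

class Replica:
    def __init__(self):
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value

class LazyPrimary(Replica):
    def __init__(self, followers):
        super().__init__()
        self.followers = followers
        self.pending = []                  # writes not yet propagated (the batch)

    def write(self, key, value):
        self.apply(key, value)             # local apply only: low write latency
        self.pending.append((key, value))  # propagate later, e.g. in a batch

    def flush(self):
        """Propagate the batch; until this runs, followers can serve stale reads."""
        for key, value in self.pending:
            for follower in self.followers:
                follower.apply(key, value)
        self.pending.clear()

followers = [Replica(), Replica()]
primary = LazyPrimary(followers)
primary.write("x", 42)
print(followers[0].data)  # {} -- we are inside the inconsistency window
primary.flush()
print(followers[0].data)  # {'x': 42}
```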

Several forms of weak consistency have been proposed in the literature. They are customarily divided into data-centric and client-centric consistency models. Data-centric consistency models focus on the propagation and ordering of read and write operations between the replicas in a distributed database system. Client-centric consistency models in contrast analyze the effects of consistency maintenance that are visible to a user of the distributed database system.

13.2.1 Data-Centric Consistency Models

Data-centric consistency definitions look at the internals of the communication between the replicas. They aim to achieve consistency by restricting the order of read and write operations on the replicas.

Eventual consistency: As defined in [TDP+94], replicas may hold divergent versions of a data item due to concurrent updates and propagation delays. Eventual consistency demands that these versions converge in case no new updates arrive. In other words, replicas agree on a version after some time of inconsistency and as long as no new updates are issued by users. Hence, eventual consistency requires the two properties (1) total propagation of writes to all replicas and (2) convergence of all replicas towards a unique common value.

Causal consistency: Causal consistency relies on the happened-before relation that is also used for the ordering of events by logical clocks (see Section 12.3.1). The three properties of the causality relation can simply be restated in terms of read and write operations on replicas as:

1.ordering of reads and writes on a single replica must be maintained

2.reads-from relation: any read operation accessing a value of a write operation propagated by another replica must be scheduled after the write (where the propagated write operation corresponds to the receive event)

3.transitivity must be maintained

The ordering requirement of causal consistency is that concurrent operations can be executed in any order, while for causally related operations the same ordering is required at all replicas.
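The delivery condition behind causal consistency is often implemented with vector clocks. The following minimal sketch (the message format is assumed for illustration) applies a remote write only once all writes it causally depends on have been applied locally; otherwise the write has to be buffered.

```python
# A minimal sketch of the causal delivery test: a write from replica `sender`
# carrying vector clock `vm` is applied at the local replica only if it is the
# next write from that sender and all other causal dependencies are satisfied.

def can_apply(local_vc: dict, sender: str, vm: dict) -> bool:
    others_ok = all(local_vc.get(k, 0) >= v
                    for k, v in vm.items() if k != sender)
    return vm[sender] == local_vc.get(sender, 0) + 1 and others_ok

# The local replica has not yet seen B's first write, so a write from A that
# causally depends on it must be buffered.
local = {"A": 1, "B": 0}
print(can_apply(local, "A", {"A": 2, "B": 1}))  # False -- missing B's write
print(can_apply(local, "B", {"A": 1, "B": 1}))  # True
```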

As shown in Section 12.3.1, Lamport’s happened-before relation [Lam78] covers all possible causes of an operation. Each operation then has to wait for all those previous operations to complete that the current operation causally depends on. In complex systems, it is impractical to track all the causal dependencies of all events: in other words, it is difficult to keep track of complex causality graphs. As a semantic refinement of Lamport’s happened-before relation, the notion of effective causality restricts the set of possible causes to a set of actual causal dependencies between events. With this refinement it is possible to only track the much smaller set of effective causes which can be specified by the user for each event.

With causal consistency, eventual consistency cannot be guaranteed, because replicas can execute concurrent writes in an arbitrary order. That is, the replicas may not converge towards a unique value because no dependencies might exist between writes to two different replicas of a data item. Ensuring convergence of replicas has therefore been defined and analyzed in [LFKA11, MSL+11].

Parallel snapshot isolation (PSI): PSI [SPAL11] relaxes the properties of snapshot isolation. While conventional snapshot isolation requires all commits to occur in the same order on all replicas, with PSI replicas may use different orderings of the commit operations in a history.

Non-monotonic snapshot isolation (NMSI): NMSI [ASS13] disregards condition MON and replaces condition SCONS by a relaxed consistency condition CONS: A transaction Ti in a history h observes a consistent snapshot if for ri(xj) ∊ h and wk(xk) ∊ h, when Ti depends on Tk (that is, there is a cascade of reads between a value written by Tk and Ti), then ck < cj.

13.2.2 Client-Centric Consistency Models

Client-centric consistency takes the perspective of the user interacting with the database system. For the user, the internal ordering of read and write operations in a replicated database system is irrelevant as long as the database system presents a consistent view to the user. The user can for example read a value from one replica, update the value on another replica and then read the value again from a third replica. In this case, client-centric consistency should ensure that the interaction of the user makes sense and the database system avoids any inconsistent behavior (like returning a stale value after an update). This can be achieved by restricting the read access to those replica servers that have already processed the update; in other words, not all replicas will be available for the read access. In particular, with client-centric consistency it is allowed that different users indeed see different orderings of read and write operations, because different users may require different guarantees.

One approach to client-centric consistency is session guarantees [TDP+94]: they automatically guarantee certain properties inside a session – where a session is a time-restricted interaction sequence of a single user. Session guarantees such as the following can be combined to yield a stronger notion of consistency; a small client-side sketch of RYW and MR follows the list:

Read your writes (RYW) Once the user has written a value, subsequent reads will return this value (or newer versions if other writes occurred in between); the user will never see versions older than his last write. Note however that RYW does not ensure isolation of different users: if other users write the same data item, values are simply overwritten.

Monotonic reads (MR) Once a user has read a version of a data item on one replica server, it will never see an older version on any other replica server; in other words, any subsequent read will return the same or a more recent version even when reading from a different replica server. This is ensured by requiring that all write accesses that are relevant for the first read R1 on server S1 will also be processed by any server S2 before S2 can serve a subsequent read R2.

Writes follow reads (WFR) If a user reads a data item from one replica server, but subsequently writes a new value for the data item on a different replica server, the second server must have processed all those writes that are relevant for the read first, before processing the write.

Monotonic writes (MW) Once a user has written a new value for a data item in a session, any previous write has to be processed before the current one. In other words, MW strictly maintains the order of writes inside the session.

Consistent prefix (CP) Each replica has processed a subset of all writes according to a common global order of all writes. That is, some writes may not yet be visible at some replicas so that stale reads are possible; but those writes that have already been processed comply with the global order.

Bounded staleness (BS) BS puts a limit on the staleness of each read value (see for example [BVF+12, BZS14]). This can be done in terms of real time or versions (corresponding to logical time for counting the versions):

a time-based definition of BS is t-visibility: the inconsistency window comprises at most t time units; that is, any value that is returned upon a read request was up to date t time units ago.

a version-based definition of BS is k-staleness: the inconsistency window comprises at most k versions; that is, any returned value lags at most k versions behind the most recent version.
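As announced above, here is a minimal client-side sketch of RYW and MR (with simple per-replica version counters instead of full version vectors, purely for illustration): the session remembers the newest version it has observed, and a replica may serve a read only if it has caught up to that version.

```python
# A minimal sketch of client-side session guarantees (RYW and MR) with
# illustrative per-replica version counters.

class Session:
    def __init__(self):
        self.last_seen = 0          # highest version read or written so far

    def read(self, replica):
        if replica["version"] < self.last_seen:
            raise RuntimeError("replica too stale for this session")  # MR/RYW violated
        self.last_seen = replica["version"]
        return replica["value"]

    def write(self, replica, value):
        replica["value"] = value
        replica["version"] += 1
        self.last_seen = replica["version"]   # later reads must see this write (RYW)

r1 = {"value": "a", "version": 3}
r2 = {"value": "old", "version": 1}

s = Session()
s.write(r1, "b")        # version 4 written on r1
try:
    s.read(r2)          # r2 has not processed the write yet
except RuntimeError as e:
    print(e)            # replica too stale for this session
```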

13.3 Consistency Trade-offs

Trade-offs between consistency and other desired properties in distributed systems have been discussed for a long time in several papers like [RG77, TGGL82, FM82, BG84, DGMS85, COK86, GHOS96, FB99, GL02]. An early comprehensive survey of protocols for consistency in partitioned networks was given in [DGMS85]. Even in non-replicated single-server databases, the so-called isolation levels of RDBMS have been discussed for decades as an improvement of latency by reducing consistency requirements; these isolation levels have been recently surveyed in [BFG+13].

The discussion centered around the relation of the three properties strong consistency (C) versus high availability (A) versus partition tolerance (P) has gained new momentum with the formulation of the strong CAP principle in [FB99]. The strong CAP principle says that in a distributed system from the three properties C, A and P at most two of them can be achieved at the same time.

A more diplomatic formulation of this conjecture is also given in [FB99] with the weak CAP principle: if higher guarantees are required for two of the three properties, only weaker guarantees can be assured for the third. Based on this distinction, distributed systems can roughly be categorized into the following three types:

AP systems: Whenever a network partition occurs, an AP system prefers to be available at the cost of inconsistencies that may be introduced. For example, in a quorum-based system, partitions with only a minority of replica servers might still accept write operations, although the write operations issued to different partitions might be conflicting. Inconsistencies must then be resolved later, as soon as the partition has healed.

CP systems: Whenever a network partition occurs, a CP system prefers to maintain consistency at the cost of reduced availability. For example, in a system using majority quorum, any minority partition has to reject write and read operations (in effect making them unavailable); only a partition with a majority of replica servers can still process incoming user requests. As a second example, a ROWA quorum system can still answer read requests (with any running replica server) – but all write requests have to be rejected, as long as at least one replica server is partitioned from the rest.

CA systems: As long as there is no partition, a CA system should achieve as much consistency and availability as possible. However, as soon as a partition happens, the system can no longer give any guarantees regarding consistency and availability.

Note however that there are no clear boundaries between these categories and systems usually offer different guarantees in each of the categories. In particular, in distributed systems, partitions (or server crashes resulting in singleton partitions) cannot be avoided. Moreover, partitions can usually not be distinguished from arbitrary message losses. Hence any reliable distributed system must take precautions against these communication failures; if a distributed system forsakes partition tolerance, it might fail entirely due to a partition, resulting in an unavailable system. As a consequence, a common interpretation of the CAP principle can be stated as: in case of a network partition, a distributed system can choose between maintaining either high availability or strong consistency.

It has also been advocated that – instead of availability in general – the tradeoff is more between latency and consistency during normal (partition-free) operation [Aba12]: the PACELC notion states that if there is a partition (P), how does the system trade off availability and consistency (A and C); else (E), when the system is running normally in the absence of partitions, how does the system trade off latency (L) and consistency (C)?

Knowing about these trade-offs is important to configure the read and write behavior of distributed database systems. From a practitioner’s perspective, a good option is to use an adaptable consistency level for each individual read and write call – if this is offered by the database API.

13.4 Bibliographic Notes

Strong consistency has long been analyzed from a mostly theoretical perspective [BG85]. Numerous weaker consistency models have been proposed in the last decades. The ones surveyed here can for example be found in [TDP+94, BVF+12, BZS14]. Gray et al [GHOS96] compare eager and lazy replication in the settings of the multi-master (update-anywhere) and the single-master replication cases. [WPS+00] survey safety properties of distributed protocols.

The notion of atomicity can be used to define staleness in a formal way. Starting from the basic form of atomicity in [Lam86], extended definitions of atomicity include k-atomicity [AAB05] (bounding staleness of read operations to the last k versions) and ∆-atomicity [GLS11] (bounding staleness of read operations to ∆ time units). Bailis et al [BVF+12] predict staleness in a probabilistic model.

Snapshot isolation as one way to implement multiversion concurrency control has been used and extended in several approaches; for example non-monotonic snapshot isolation [ASS13], snapshot isolation with vector clocks [PRT14] or serializable generalized snapshot isolation [BHEF11]. Lin et al [LKPMJP05] describe a middleware system that provides snapshot isolation for read-one-write-all replication. They call a system one-copy snapshot isolation if it provides local snapshot isolation at each replica and all local schedules can be merged into a single snapshot-isolation schedule.

Although the trade-offs between consistency, availability and partition tolerance have been discussed for decades (for example in [RG77, TGGL82, FM82, BG84, DGMS85, COK86, GHOS96, FB99, GL02]), the strict formulation as the strong CAP principle in [FB99] has spawned several discussions. Brewer [Bre12] addresses some of these discussions in retrospect. Gilbert and Lynch [GL02] argue that the principle holds in two different theoretical models (the asynchronous and the partially synchronous network model) when arbitrary message loss may occur. The PACELC view on distributed systems was introduced by Abadi in [Aba12].

Eventual consistency has been discussed from various perspectives for example in [BG13, BGY13, BD13].
