Chapter 11. Datastore Field Guide

Technically, a datastore is just that—storage of data and the associated software and structure that allow it to be stored, modified, and accessed. But we are specifically speaking of the datastores that today’s organizations use to fulfill these purposes, with nontrivial numbers of users accessing nontrivial amounts of data at nontrivial levels of concurrency.

A field guide is traditionally carried by a reader looking to identify flora, fauna, or other objects in nature. Carried out into the field, it helps the user distinguish between a wide range of similar objects. Our goal in this chapter is to help you understand the identifying characteristics of various datastores. Armed with this information we hope that you can go into the world understanding the best use cases for these datastores—as well as appropriate care and feeding.

In this chapter, we begin by defining attributes and categories of a datastore that are pertinent to the developers of applications that write and consume data. After this, we dive into the categories that would be of greater interest to architects and operators of datastores. Although we believe anyone developing, designing, or operating datastores should be aware of all of the attributes of that datastore, we recognize that people often evaluate these things from their specific job roles. Our goal is not to be comprehensive here, because there are any number of datastores out there in use. Instead, we hope to familiarize you with a good sampling and to give you the tools to do further investigation based on your own needs and objectives.

Conceptual Attributes of a Datastore

There are numerous ways to categorize a datastore. How you do so really depends on your job and how you might interact with the datastore. Do you build features in applications that query, store, and modify data? Do you query and analyze data for decision making? Do you design the systems on which the database will run? Do you administer, tune, or monitor the database? Each role has a certain view of the database and the data within.

In the world of ORMs and serverless architectures exposing APIs, there has been a movement toward abstracting datastores away from the consumers who use them. We don’t agree with this. Understanding each attribute of the datastore you (or someone else) are choosing, and the implications thereof, is critical to doing your job well. There is no such thing as a free lunch, and each attractive feature will come with a trade-off or caveat. Ensuring that the teams working with these datastores are fully educated about this is a crucial function.

The Data Model

For most software engineers (SWEs), the data model is one of the most important categorizations. How the data is structured and how relationships are managed is crucial to those building applications on top of it. This also significantly affects how you manage database changes and migrations, as the different models often manage such changes very differently.

This section covers four prevalent data models: relational, key–value, document, and navigational (or graph). Each has its own uses, limitations, and quirks. The relational model has historically been the most prevalent. With significant time in production in a huge number of shops, it can be considered the most well understood, the most stable, and the least risky of the choices available.

The relational model

The relational model has been around since its initial proposal by E.F. Codd, who issued his paper “A Relational Model of Data for Large Shared Data Banks” in 1970 after an internal IBM paper one year earlier. Because the purpose of this guide is not to give you a full background but rather to help you to understand systems you encounter today, we will focus on relational systems in modern organizations.

The basic premise of relational database models is that data is represented as a series of relationships, based around unique keys that are the core identifiers for a piece of data. The relational model creates consistency of data across tables with constraints on relationships, cardinality, values, and the requirements for certain attributes to exist or not. The relational model is formalized and includes various levels of strictness, also known as normalization. The reality is that many of these theoretical requirements fall by the wayside as performance and concurrency come into play.1
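The key-and-constraint consistency described above can be sketched with SQLite; the tables and names here are illustrative, not from the text:

```python
import sqlite3

# Minimal sketch: a foreign key constraint keeps rows in one table
# consistent with the unique keys of another.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " user_id INTEGER NOT NULL REFERENCES users(id))"
)
conn.execute("INSERT INTO users VALUES (1, 'alice')")
conn.execute("INSERT INTO orders VALUES (10, 1)")  # valid: user 1 exists

try:
    conn.execute("INSERT INTO orders VALUES (11, 99)")  # no such user
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)  # the constraint keeps the data consistent
```

The rejected insert is the point: the database itself, not the application, refuses data that would break the declared relationship.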

Well-known relational databases include Oracle, MySQL, PostgreSQL, DB2, SQL Server, and Sybase. More alternative players in the field include Google Spanner, Amazon Redshift, NuoDB, and Firebird. Many of these alternative systems are classified as NewSQL. These are considered to be a subclass of relational database management systems that seek to break some of the barriers of concurrency and scale while maintaining consistency guarantees. This will be discussed further in this chapter.2

The relational model provides a very well-known approach to data retrieval. By supporting joins, one-to-many, and many-to-many relationships, developers have a high level of flexibility in how they define their data model. This can also lead to much more challenging approaches to schema evolution, as the addition, modification, or removal of tables, relationships, and attributes can all require a large amount of coordination and many moving parts to accomplish. This can lead to expensive and risky changes, as discussed in Chapter 8.

Many software teams choose to operate an object-relational mapping (ORM) layer, which facilitates work by mapping the relational model to the object model defined at the software layer. Such ORMs can be great tools for developer velocity, but they can prove problematic to the database reliability engineer (DBRE) team in multiple ways.

All of this leads many software engineers and architects to think of relational systems as inflexible and an impediment to developer velocity. This is far from accurate, however, and later in this chapter we present a more accurate list of pros and cons and bust some of the myths prevalent in many such lists.

The key–value model

A key–value model stores data as a dictionary or hash. A dictionary is analogous to a table and contains any number of objects. Each object can store any number of attributes or fields within it. Like a relational database, these records are uniquely identified with a key. Unlike relational databases, there is no way to create mappings between objects based on those keys.

The key–value datastore sees an object as a blob of data. It isn’t inherently aware of the data it holds, and thus each object can have different fields, nested objects, and an infinite amount of variety. This variety comes at a cost, including the potential for inconsistency because rules are not enforced at the common storage layer. Similarly, efficiencies in datatypes and indexing are unavailable. On the other hand, a lot of the overhead inherent to managing various datatypes, constraints, and relationships are gone. If the application doesn’t need this, efficiencies can be realized.
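As a toy illustration of these properties, a key–value store behaves much like a language-level dictionary; the helper names and records below are our own, not from any particular product:

```python
# Toy key–value "dictionary": the store treats each value as an opaque
# blob, so objects under the same key space can differ freely.
store = {}

def put(key, value):
    store[key] = value   # no schema, no type checks, no relationships

def get(key):
    return store.get(key)

put("user:1", {"name": "alice", "email": "a@example.com"})
put("user:2", {"name": "bob", "tags": ["admin"], "profile": {"age": 30}})

# Each object carries different fields and nesting. The store does not
# notice, which is both the flexibility and the inconsistency risk.
assert get("user:1").keys() != get("user:2").keys()
```

Nothing stops the second object from having a shape the first lacks; any enforcement of structure must live in the application.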

Examples of key–value stores can be quite varied. One example is Dynamo. In 2007, Amazon published the Dynamo paper as a set of techniques to build a highly available, distributed datastore. After we’ve gone over all of the attributes, we will discuss Dynamo in more detail. Dynamo-based systems include Aerospike, Cassandra, Riak, and Voldemort. Other key–value implementations include Redis, Oracle NoSQL Database, and Tokyo Cabinet.

The document model

The document model is technically a subset of the key–value model. The difference with the document model is that the database maintains metadata about the structure of the document. This allows for datatype optimization, secondary indexing, and other optimizations. Document stores store all information about the object together, rather than across tables. This allows for all data to be retrieved from one call, rather than requiring joins, which, although declaratively easy, can consume significantly more resources. This also typically eliminates the need for an ORM layer.
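The single-call retrieval idea can be sketched with a plain JSON-style document; the order/customer structure is a made-up example:

```python
import json

# Sketch: an order stored as one document, embedding the customer and
# line items, so a single fetch returns everything.
order_doc = {
    "_id": "order-10",
    "customer": {"id": 1, "name": "alice"},
    "items": [
        {"sku": "book", "qty": 2, "price": 15.0},
        {"sku": "pen", "qty": 1, "price": 2.5},
    ],
}

# One lookup replaces the users/orders/order_items joins a relational
# model would need; the cost is that customer data is duplicated
# (denormalized) in every order document.
total = sum(i["qty"] * i["price"] for i in order_doc["items"])
print(json.dumps({"order": order_doc["_id"], "total": total}))
```

The duplication visible here (the embedded customer) is exactly the denormalization and consistency risk discussed next.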

On the other hand, this means that document stores inherently require denormalization if there are different views of the object required. This can cause bloat and create consistency issues. Additionally, external tools are required to enforce data governance, as the schema no longer exists as a self-documenting system.4

Data Governance

Data governance is the management of the availability, integrity, and security of the data that an organization saves and uses. Introduction of new data attributes is something that should be considered carefully and documented. The use of JSON for data storage allows new data attributes to be introduced too easily and even accidentally.

The navigational model

Navigational models began with hierarchical and network databases. Today, when referring to navigation models, we are almost always discussing the graph data model. A graph database uses nodes, edges, and properties to represent and store data and the connections between objects. The node holds the data about a specific object, the edge is the relationship to another object, and properties allow additional data about the node to be added. Because relationships are directly stored as part of the data, links can easily be followed. Often, an entire graph can be retrieved in one call.

Graph stores, like document stores, often map more directly to the structure of object-oriented applications. They also eliminate the need for joins and can prove to have more flexibility in terms of data model evolution. Of course, this works only for data that is ideal for graph-appropriate queries. Traditional queries can prove to be far less performant.5
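A toy sketch of nodes, edges, and properties, with traversal implemented as direct lookups rather than joins (all names are illustrative):

```python
# Toy property graph: nodes carry properties, and edges are stored with
# the data, so following a link is a lookup, not a join.
nodes = {
    "alice": {"type": "person"},
    "bob": {"type": "person"},
    "acme": {"type": "company"},
}
edges = {  # adjacency: node -> list of (relationship, target)
    "alice": [("KNOWS", "bob"), ("WORKS_AT", "acme")],
    "bob": [("WORKS_AT", "acme")],
}

def neighbors(node, rel):
    """Follow edges of one relationship type out of a node."""
    return [target for r, target in edges.get(node, []) if r == rel]

print(neighbors("alice", "KNOWS"))  # ['bob']
# Two hops: companies where the people alice knows work.
print([c for p in neighbors("alice", "KNOWS")
         for c in neighbors(p, "WORKS_AT")])  # ['acme']
```

The two-hop query illustrates why graph-shaped access is cheap here, while a query that scans all nodes by property (a "traditional" query) gets no such help.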

Each of these models has its place in a certain subset of applications. We will summarize the options and trade-offs within. First, let’s look at transactional support and implementation attributes.

Transactions

How a datastore handles transactions is also a considerably important attribute to understand and consider. A transaction is effectively a logical unit of work within a database that can be considered to be indivisible. All of the operations in the transaction must be executed, or rolled back, to maintain consistency within the datastore. Being able to trust that all aspects of a transaction will be committed or rolled back greatly simplifies error handling logic in database-driven applications. These guarantees of the transactional model allow developers to ignore certain aspects of failure and concurrency that would consume significant developer cycles and resources.

If you’ve worked predominantly with traditional relational datastores, you probably take the existence of transactions for granted. This is because almost all of these datastores are built on the ACID model, explained next, whose transactional foundations were introduced by IBM in 1975. All reads and writes are considered to be transactions, and they utilize the underlying architecture of database concurrency to achieve this.

ACID

An ACID database provides a set of guarantees that, when put together, create the acronym ACID. These guarantees are (A)tomicity, (C)onsistency, (I)solation, and (D)urability. In 1983, Andreas Reuter and Theo Härder coined the acronym, building on work by Jim Gray, who enumerated Atomicity, Consistency, and Durability but left out Isolation. These four properties describe the major guarantees of the transaction paradigm, which has influenced many aspects of development in database systems.

It is of the utmost importance when working with a datastore to understand how it defines and implements these concepts because there can be a significant amount of ambiguity and diversity. With this in mind, it behooves us to consider each property and to understand the variations to be found in the wild.6

Atomicity

Atomicity refers to the guarantee that an entire transaction will be committed, or written, to the datastore or that the entire transaction will be rolled back. There is no such thing as a partial write or rollback in an atomic database. Atomicity, in this context, does not refer to atomic operations as you might find in software engineering. That term refers to the guarantee of isolation from concurrent processes seeing work in progress rather than only the before and after results.

There are many reasons that a transaction might fail and require rollback. The client process might terminate mid-transaction, or perhaps a network fault could terminate the connection. Similarly, database crashes, server faults, and numerous other operations could require a partially completed transaction to be rolled back.
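The all-or-nothing behavior can be demonstrated with SQLite, whose connection object rolls back the active transaction when an error interrupts it; the accounts table is a made-up example:

```python
import sqlite3

# Sketch of atomicity: a mid-transaction failure rolls back every
# statement in the unit of work, never leaving a partial write.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('a', 100), ('b', 100)")
conn.commit()

try:
    with conn:  # transaction scope: commits on success, rolls back on error
        conn.execute("UPDATE accounts SET balance = balance - 50 "
                     "WHERE name = 'a'")
        raise RuntimeError("client crashed mid-transfer")  # simulated fault
except RuntimeError:
    pass

# Neither half of the transfer survives: 'a' is back to its old balance.
print(conn.execute("SELECT balance FROM accounts "
                   "WHERE name = 'a'").fetchone()[0])  # 100
```

The simulated crash stands in for any of the failure causes above: client termination, network fault, or server crash.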

PostgreSQL implements this by using a commit log (pg_clog, renamed pg_xact in version 10). Transactions are recorded in the commit log and given a state of in progress, committed, or aborted. Should a client abandon or roll back a transaction, it is marked as aborted. Backend processes will also periodically mark transactions as aborted if no backend is mapped to them.

It is important to note that you can consider writes atomic only if the underlying disk page writes are atomic. There is significant disagreement on the atomicity of sector writes. Most modern disks will channel power to writing a sector even during a disk failure. But depending on the layers of abstraction between the physical drive and the actual writes being flushed to disk, there are still plenty of opportunities for data loss.

Consistency

The guarantee of consistency is a guarantee that any transaction will bring the database from one valid state to another. A transaction can assume that it will not be allowed to violate defined rules. Technically, consistency is defined at the application level rather than the database level. Traditional databases do, however, give the developer tools to enforce this consistency. Those tools, which can be guaranteed effective, include constraints and triggers. Constraints can include foreign keys with cascading, not null, uniqueness constraints, datatypes and lengths, and even specific values being allowed in a specific field.

It is interesting and frustrating that consistency is used elsewhere in the realms of databases and software. The CAP theorem uses the term consistency also but in a very different way. Similarly, you will hear this term when discussing hashing and replication.

Isolation

The isolation guarantee is a promise that the concurrent execution of transactions results in the same state that would occur if you were to run those transactions serially and sequentially. ACID databases do this via a combination of techniques that can include write locks, read locks, and snapshots. Collectively this is called concurrency control. In practice, there are multiple types of concurrency control that can lead to different behaviors in the database. Stricter versions can significantly impact performance of concurrent transactions, whereas more relaxed ones might lead to better performance at the cost of weaker isolation.7

The ANSI/ISO SQL standard defines four possible levels of transaction isolation. Each level would potentially provide a different outcome for the same transaction. These levels are defined in terms of three potential occurrences that are permitted or not at each isolation level:

Dirty read

With a dirty read, you can potentially read uncommitted, or dirty, data that is being written in another transaction from another client.

Nonrepeatable read

With a nonrepeatable read, within the context of a transaction, if you perform the same read twice, you could potentially get different results based on other concurrent activities in the database.

Phantom read

With a phantom read, within the context of a transaction, you perform the same read twice, and the data returned the second time is different from the first. This is different from a nonrepeatable read because with a phantom read, data you have already queried does not change, but more data is returned by your query than before.

To avoid these phenomena, there are four potential isolation levels that can be utilized:

Read Uncommitted

This is the lowest isolation level. Here, dirty reads, dirty writes, and nonrepeatable and phantom reads are all allowed.

Read Committed

In this isolation level, the goal is to avoid dirty reads and dirty writes. In other words, you should not be able to read, or overwrite, uncommitted data. Some databases avoid dirty writes via write locks acquired on selected data; write locks are held until the data is committed, whereas read locks are released after the select. Dirty reads are usually avoided by keeping two copies of the data being written in the transaction: one of older, committed data to be used for reads from other transactions, and one for the data that has been written but not committed.

In read committed isolation, you can still experience nonrepeatable reads, however. If data is read once and then read again after another transaction has committed a change to it, you will see different values within the context of your own transaction.

Repeatable Reads

To build on read committed isolation and avoid nonrepeatable reads, you must implement additional controls. If a database is using locks to manage concurrency control, a client would need to keep read and write locks until the end of the transaction. This would not maintain a range lock, though, so it would still be possible to get phantom reads. As you can imagine, this lock-based approach is heavy handed and can lead to significant performance impact on highly concurrent systems.

The other way to accomplish this is via snapshot isolation. In snapshot isolation, after a transaction is started, the client will see an image of the database based on the current time. Additional writes will not show in the snapshot, allowing for long-running queries to have consistent, repeatable reads. Snapshot isolation uses write locks but not read locks. The goal is to ensure that reads do not block writers, and vice versa. Since this requires more than just two copies, it is referred to as multiversion concurrency control (MVCC).
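A toy sketch of the multiversion idea: versions are stamped with a logical clock, and a snapshot reader only ever sees versions committed at or before its start. This is a deliberate simplification; real MVCC implementations also track per-transaction visibility and uncommitted versions:

```python
# Toy MVCC store: every write appends a new version with a logical
# timestamp; a reader sees the latest version at or before its snapshot.
class MVCCStore:
    def __init__(self):
        self.versions = {}  # key -> list of (commit_ts, value)
        self.clock = 0

    def write(self, key, value):
        self.clock += 1
        self.versions.setdefault(key, []).append((self.clock, value))

    def snapshot(self):
        return self.clock  # a reader remembers the clock at txn start

    def read(self, key, snap_ts):
        visible = [v for ts, v in self.versions.get(key, []) if ts <= snap_ts]
        return visible[-1] if visible else None

db = MVCCStore()
db.write("x", 1)
snap = db.snapshot()    # long-running read transaction starts here
db.write("x", 2)        # a concurrent writer commits a newer version
print(db.read("x", snap))            # 1 -- repeatable, pre-snapshot value
print(db.read("x", db.snapshot()))   # 2 -- a fresh snapshot sees the write
```

Note the key property: the writer never blocked the reader, and the reader's view stayed stable for the life of its snapshot.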

In repeatable read snapshot isolation, write skew can still occur. In write skew, two transactions each read overlapping data and then write updates based on what they read; because neither sees the other’s change, both writes are allowed. The result is rows containing data from two transactions—a combined state that no serial execution could have produced.

Serializable

This is the highest isolation level and is meant to avoid all of the aforementioned phenomena. As in repeatable read, if locks are the focus of concurrency control, read and write locks are held for the duration of the transaction. There are additions, however, and the locking strategy is called two-phase locking (2PL).

In 2PL, a lock can be shared or exclusive. Multiple readers can hold shared locks for reading. However, to acquire an exclusive lock for a write, all shared read locks on that data must first be released; similarly, while a write holds its exclusive lock, shared locks for reads cannot be acquired. In high-concurrency environments, it is quite common for transactions to be stuck waiting for locks, and when two transactions each hold a lock that the other needs, neither can ever proceed. This is called a deadlock. Additionally, range locks must be acquired for queries using ranges in their WHERE clauses; otherwise, phantom reads can occur.
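One common way databases detect such lock cycles is a wait-for graph; here is a minimal sketch of the cycle check (our own illustrative code, not any particular engine's implementation):

```python
# Deadlock detection sketch: build a wait-for graph mapping each
# transaction to the transactions whose locks it is waiting on, then
# look for a cycle. A cycle means no one in it can ever proceed.
def has_deadlock(waits_for):
    """Return True if the wait-for graph contains a cycle."""
    def visit(node, on_path):
        if node in on_path:
            return True          # we reached a txn already on this path
        for nxt in waits_for.get(node, []):
            if visit(nxt, on_path | {node}):
                return True
        return False
    return any(visit(txn, frozenset()) for txn in waits_for)

# T1 holds a lock T2 needs, and T2 holds a lock T1 needs: deadlock.
print(has_deadlock({"T1": ["T2"], "T2": ["T1"]}))  # True
# T1 waiting on T2 alone is just contention, not deadlock.
print(has_deadlock({"T1": ["T2"], "T2": []}))      # False
```

On detection, a real engine breaks the cycle by aborting one victim transaction so the others can make progress.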

2PL can dramatically affect latency for transactions. When many transactions are waiting, system-wide latency can increase significantly. Thus, many systems do not truly implement serializability and stick to repeatable read.

The non-lock-based approach builds on snapshot isolation and is called serializable snapshot isolation (SSI). This approach is optimistic: the database waits until commit to see whether any activity has occurred that would cause a serializability issue, most often a write collision. This can significantly reduce latency in systems in which concurrency violations are few. If such conflicts are frequent, however, the cost of constant rollbacks and retries can be quite significant.

Because each isolation level is stronger than those below in that no higher isolation level allows an action forbidden by a lower one, the standard permits a DBMS to run a transaction at an isolation level stronger than that requested (e.g., a “Read Committed” transaction may actually be performed at a “Repeatable Read” isolation level).

We have only scratched the surface of isolation, isolation anomalies, and isolation implementations. We have a few delightful recommended reads for you for further dives at the end of the chapter.

Durability

The durability guarantee promises us that as soon as a transaction has been committed, it remains committed. Whether there is a power loss, database crash, hardware fault, or any other issue, the transaction stays durable. Obviously, the database cannot promise that the underlying hardware will support this durability. As discussed in Chapter 5, there are numerous opportunities for the database to believe it has synchronized to disk, when the reality is very different.

Durability is linked closely to atomicity, as durability is required for atomicity. Many databases implement a write-ahead log (WAL) to capture all writes before they are pushed to disk. This log is used to undo a transaction as well as to reapply it. If there is a failure, upon restarting, the database can check this log against the system to determine whether the transaction must be undone, completed, or ignored.
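The recovery role of the log can be sketched in a few lines. This is a deliberately simplified redo-only model; real WALs also carry undo information, checkpoints, and page-level records:

```python
# Toy write-ahead log: every change is appended to the log before the
# "data pages" are touched, so recovery can replay committed work and
# discard uncommitted work.
log = []   # durable log of (txn, op, key, value) records
data = {}  # in-memory "data pages"

def begin(txn):
    log.append((txn, "BEGIN", None, None))

def write(txn, key, value):
    log.append((txn, "SET", key, value))  # log first...
    data[key] = value                     # ...then apply

def commit(txn):
    log.append((txn, "COMMIT", None, None))

def recover(log):
    """Rebuild state from the log, replaying only committed txns."""
    committed = {t for t, op, _, _ in log if op == "COMMIT"}
    state = {}
    for t, op, key, value in log:
        if op == "SET" and t in committed:
            state[key] = value
    return state

begin("t1"); write("t1", "x", 1); commit("t1")
begin("t2"); write("t2", "y", 2)      # crash before commit
print(recover(log))  # {'x': 1} -- t2's write is discarded
```

After the simulated crash, recovery keeps t1's committed write and ignores t2's, which is exactly the undo/complete/ignore decision described above.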

Much like isolation levels, there are times when durability can and should be relaxed to accommodate performance. For true durability, a flush to disk must occur on every commit. This can become prohibitively expensive and is not required for all transactions and writes. For instance, in MySQL, you can tune the InnoDB log flush to occur periodically rather than after each commit. Similarly, you can do this for replication logs.9
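As a concrete illustration, the relevant option is InnoDB’s innodb_flush_log_at_trx_commit; the option names are real MySQL settings, though the values chosen here are only an example of relaxed durability, not a recommendation:

```ini
# my.cnf sketch. innodb_flush_log_at_trx_commit = 1 flushes the redo
# log on every commit (full durability); 2 writes at commit but flushes
# roughly once per second, risking up to about a second of committed
# transactions on power loss in exchange for throughput.
[mysqld]
innodb_flush_log_at_trx_commit = 2
sync_binlog = 0   # let the OS decide when to flush the binary log
```

Whether such a trade is acceptable depends entirely on which writes your application can afford to lose.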

Even though we have stayed fairly high-level here, it should be apparent just how much detail is hidden and taken for granted in systems that support transactions. As a DBRE in an organization, it is critical for you to ensure familiarity with the implementation not only for yourself, but also for the development organization. Often the details of these implementations are not readily apparent from documentation, and further tests via such tools as Jepsen and Hermitage can assist you in this discovery process.

Similarly, this knowledge can help you in choosing appropriate configurations when there are options to relax durability or use weaker isolation. Alternatively, knowing when database defaults do not meet your application’s needs can be just as important.

BASE

As engineers have looked to alternatives to traditional relational systems, the term BASE has begun to be used as a foil to ACID. BASE stands for basically available, soft state, and eventual consistency. This focuses on nontransactional systems that are distributed and might have fairly nontraditional replication and synchronization capabilities. Unlike ACID systems, there might not ever be a clear state while the system is up and taking traffic. Similarly, without concurrency control needs for transactions, write throughput and concurrency can be dramatically increased at the expense of atomicity, isolation, and consistency.10

Having looked at the data models and transactional models available to datastores, we’ve covered the conceptual attributes most relevant to developers. Still, there are numerous other attributes that must be considered when evaluating not only database choice but also the entire operational ecosystem and infrastructure around those databases (Table 11-1).

Table 11-1. Datastore conceptual attribute summary
Attribute               MySQL               Cassandra             MongoDB             Neo4J
Data model              Relational          Key–value             Document            Navigational
Model maturity          Mature              2008                  2007                2010
Object relationships    Foreign keys        None                  DBRefs              Core to model
Atomicity               Supported           Partition level       Document level      Object level
Consistency (node)      Supported           Unsupported           Unsupported         Strong consistency
Consistency (cluster)   Replication based   Eventual (tunable)    Eventual            XA transaction support
Isolation               MVCC                Serializable option   Read-uncommitted    Read committed
Durability              For DML, not DDL    Supported, tunable    Supported, tunable  Supported, WAL

Now, a lot of this is overly simplified, and a trusty skepticism of the functionality of a feature, supported with testing the efficacy of those claims, will help to clarify things as you evaluate a datastore for your application. Even with these caveats, there are some definite differences that will help to determine the appropriate choice for your application. The next step is to evaluate the internal attributes of a datastore to get the full picture.

Internal Attributes of a Datastore

There are numerous ways to describe and categorize a datastore. The data model and transactional structures are attributes that directly affect application architecture and logic. They thus tend to be a big focus of developers who are looking for velocity and flexibility. The internal, architectural implementations of these databases tend to be black boxes or, at best, features on a glossy marketing brochure. Still, they are crucial to choosing the appropriate datastore for the long term.

Storage

We went over storage in detail in Chapter 10. Each datastore will have one or more options for laying data down on disk that are available to it. This often comes in the form of storage engines. The storage engine manages the reading and writing of data, locking, concurrency access to data, and any processes needed to manage data structures, such as B-tree indexes, log structured merge (LSM) trees, and bloom filters.

Some databases, like MySQL and MongoDB, offer multiple storage engine options. For example, in MongoDB, you can use MMAPv1; WiredTiger, which supports both B-tree and LSM structures; or RocksDB, which is based on LSM trees. Storage engine implementations vary significantly, but their attributes can generally be broken down into the following:

  • Write performance

  • Read performance

  • Durability of writes

  • Storage size

Evaluating storage engines based on these attributes will help determine which to choose for your datastore. There are often trade-offs between read and write performance as well as durability. There are also features that can be implemented to increase the effective durability of the storage engine. Understanding these, and of course benchmarking and testing the veracity of durability claims, is of the utmost importance.

The Ubiquitous CAP Theorem Section

Often, when people discuss these attributes, they will refer to Eric Brewer’s CAP theorem (see Figure 11-1). The CAP theorem states that any networked shared-data system can have at most two of three properties or guarantees: (C)onsistency, (A)vailability, or network (P)artition tolerance. As with the terms in ACID, these terms are overly generalized. Each one is not truly either/or and is, in fact, a continuum. Many will refer to a system as CP or AP, meaning that they are designed to encompass two specific properties while trading off another. Yet, if you dive into those systems, you will find their implementations of each specific attribute to be incomplete, having achieved only a portion of Availability or Consistency.11

Figure 11-1. Brewer’s CAP theorem: Consistency, Availability, and Partition tolerance

CAP is meant to help designers understand the trade-offs between consistency and availability. Network partitions in distributed systems are an inevitability; networks are inherently unreliable. When a partition occurs, the node(s) on one side of it will inevitably lose consistency if they allow state to be updated. If consistency is preferred, one side of the partition must become unavailable. Let’s look at each term more carefully to understand what those attributes can potentially encompass.

Consistency

Recall that we discussed consistency in the section on transactions. It is the C in ACID. Annoyingly, ACID consistency is not the same as CAP consistency. In ACID, consistency means that a transaction preserves all database rules and constraints. In CAP, consistency means linearizability. Linearizability guarantees that a set of operations on an object in a distributed database will occur in real-time order. Because operations can be reads and writes, this means that these operations must appear as they occur to the rest of the users in the system. Linearizability is a guarantee of sequential consistency within real time.12

ACID consistency, like CAP consistency, cannot be maintained across a network partition. This means that ACID-based transactional datastores can guarantee consistency only by sacrificing availability in the face of a network partition.13 BASE systems were developed, amongst other reasons, to be able to tolerate network partitions without sacrificing availability.

Availability

The availability aspect of the CAP theorem refers to the ability to process requests. Normally, most distributed systems can provide consistency and availability. But, in the face of a network partition where a subset of nodes is split off from another subset of nodes, the decision must be made to remain available at the expense of consistency. Of course, no system can maintain 100% availability over time, reflecting what we said before regarding availability as a continuum rather than either/or.

Partition tolerance

A network partition is a temporary or permanent disruption in connectivity that ends up disrupting communication between two subsets of the network infrastructure. In effect, this often will create two smaller clusters. Each of these clusters can believe it is the last cluster standing, allowing writes to continue. This leads to two divergent datasets and is also referred to as a split brain.

The CAP theorem was published to help people understand the trade-offs between consistency and availability in distributed datastores. In practice, network partitions encompass a small amount of time in the life cycle of a datastore. Consistency and availability can and should be delivered together. When partitions do occur, however, the system must be able to detect, manage, and recover to restore consistency and availability.

It is worth noting that the CAP theorem does not account for latency or performance at all. Latency can be as crucial as availability, and bad latency can also be a potential cause of consistency issues. Long enough latency can pass a boundary that forces a system to enter into the failure state associated with network partitions. There is a trade-off with regard to latency that is often made more explicitly than those of Consistency and Availability. In fact, the other important reason that BASE systems and the NoSQL movement came about was because of the requirements for increased performance at scale.

Now that we have familiarized ourselves with the CAP theorem, let’s discuss how that might influence our database taxonomy. We could go with the concept of CP versus AP, but we’ve already discussed the oversimplification of such an approach. Rather, let’s look at how distributed systems maintain both consistency and availability.

Consistency Latency Trade-offs

In a distributed system, the system is said to be strongly consistent if all nodes see all transactions in the same order in which they were written. In other words, the system is linearizable. The CAP theorem specifically discusses how the distributed datastore favors consistency or availability in the event of a network partition. Consistency is required throughout the life cycle of the datastore, however, and cannot be looked at just through the CAP paradigm.14

Everyone would like a strongly consistent distributed datastore, but few people are actually willing to accept the impacts to latency and availability that come with it. So, trade-offs are made. In this section, we evaluate the trade-offs that are made and how they affect overall consistency in the cluster. This allows us to clearly evaluate a datastore to see whether it meets our needs.

When writing data to a node in a distributed datastore, the data must be replicated to meet availability guarantees. As reviewed earlier, we can replicate this data in a few different ways:

  • Send writes to all nodes at once, synchronously.

  • Send writes to one node, which takes the role of primary. Replication to the remaining nodes occurs asynchronously, semi-synchronously, or synchronously.

  • Send writes to any node, which functions as primary for that transaction only. Replication occurs asynchronously, semi-synchronously, or synchronously.
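The three write paths above can be sketched as a single dispatch function. This is a minimal illustration, not any real datastore's API; the node structures, mode names, and ack counts are all hypothetical.

```python
import random

def replicate(nodes, record, mode):
    """Illustrative dispatch for the three replication styles.
    Each node is a dict with a "data" list standing in for a replica's log.
    Returns the number of synchronous acknowledgments received."""
    if mode == "all-sync":
        # Every node must acknowledge before the write is confirmed.
        for n in nodes:
            n["data"].append(record)
        return len(nodes)
    elif mode == "primary":
        # One designated primary accepts the write; replication to the
        # others happens separately (async, semi-sync, or sync).
        nodes[0]["data"].append(record)
        return 1
    elif mode == "any-node":
        # Any node may act as primary for this transaction only.
        random.choice(nodes)["data"].append(record)
        return 1
    raise ValueError(f"unknown mode: {mode}")
```

Note how the synchronous path trades latency (waiting on every node) for consistency, while the other two confirm after a single node and defer the rest.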

When writing to any node in a cluster, consistency can be broken unless some coordinator process, such as a consensus protocol like Paxos, orders the writes effectively. That coordination inherently adds latency to the transaction. This is one way in which strong consistency is maintained while latency is traded off. Inversely, if latency is more crucial than ordering, consistency can be sacrificed at this stage. This is the ordering-latency trade-off.
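The ordering half of this trade-off can be shown with a toy sequencer that totally orders writes, in the spirit of the ordering role a consensus protocol plays. This sketch is not Paxos; a real coordinator pays network round trips for this ordering, where here the cost is only a lock.

```python
import threading

class Sequencer:
    """Toy coordinator assigning a total order to writes.
    The lock is the serialization point: every write waits its turn,
    which is where the latency cost of ordering comes from."""
    def __init__(self):
        self._lock = threading.Lock()
        self._next = 0

    def order(self, write):
        with self._lock:
            seq = self._next
            self._next += 1
        return (seq, write)
```

Every node applying writes in sequence-number order sees the same history, which is the linearizable behavior described above; skipping the sequencer removes the wait but lets nodes apply writes in different orders.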

When sending writes to one node that must propagate to other nodes, the primary node accepting writes can become unavailable, either because it has shut down or crashed, or because it is under enough load to cause time-outs. Retries or waits increase latency. You can, however, configure load balancers or proxies to send writes to another node after a time-out. Writes going to another node can cause consistency issues, because conflicts can occur if the original transaction was processed but never confirmed. The number of retries and the size of the time-out window affect latency, and eventually availability, while maintaining consistency. This is the primary time-out retry trade-off.
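A client-side view of this trade-off can be sketched as a retry loop with failover. The `primary` and `fallback` callables, the retry counts, and the time-out value are all hypothetical; the point is where latency accrues and where the consistency risk appears.

```python
import time

def write_with_retry(primary, fallback, record, timeout=0.05, retries=2):
    """Retry the primary, then fail over. Each retry adds latency;
    failing over preserves availability but risks a conflicting
    duplicate if the primary applied the write before timing out."""
    for _attempt in range(retries + 1):
        try:
            return primary(record)      # happy path: primary acks
        except TimeoutError:
            time.sleep(timeout)         # wait before retrying
    # Consistency risk lives here: we cannot know whether the primary
    # silently applied `record` before its time-out.
    return fallback(record)
```

Raising `retries` or `timeout` keeps writes on the primary (consistency) at the cost of latency; lowering them fails over sooner (availability) at the cost of possible conflicts.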

When reading from a node, you can also experience time-outs or unavailability. Sending reads to other nodes in asynchronously replicated environments can lead to stale reads and thus consistency issues. Increasing time-outs and retries reduces the risk of inconsistent results but at the cost of increased latency. This is the reader time-out retry trade-off.

When writing to all nodes synchronously, whether via an ordering coordinator or via replication, you also incur additional latency from shipping every transaction to the other nodes. If these nodes are on a congested network, or are communicating across networks, this latency can be very high. This is the synchronous replication-latency trade-off. A compromise for this trade-off is semi-synchronous replication, which reduces potential latency impacts by reducing the number of nodes and network connections that each write must wait on. Semi-synchronous replication is a compromise, however, because in reducing latency, you increase the risk of data loss, trading off availability. This is the semi-synchronous availability-latency trade-off.
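The semi-synchronous compromise can be sketched as waiting for only a subset of replica acknowledgments before confirming, shipping the rest asynchronously. The list-based replicas and the `acks_required` knob are illustrative, not a specific product's configuration.

```python
def semi_sync_write(replicas, record, acks_required=1):
    """Confirm the write once `acks_required` replicas have it;
    ship to the remaining replicas asynchronously. Fewer required
    acks means lower latency but a wider window for data loss."""
    acked = 0
    pending = []
    for replica in replicas:
        if acked < acks_required:
            replica.append(record)   # synchronous leg: wait for ack
            acked += 1
        else:
            pending.append(replica)  # asynchronous leg
    for replica in pending:
        replica.append(record)       # would happen in the background
    return acked
```

Setting `acks_required` to `len(replicas)` recovers fully synchronous replication and its latency; setting it to zero is fully asynchronous replication and its data-loss risk.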

Each of these trade-offs is an opportunity to tune a system toward greater consistency or reduced latency. They are crucial for the times when the system is operating outside of a network partition and is servicing requests.

Availability

Just as consistency has a relationship to latency, so does availability. There is availability in the face of a network partition, as in the CAP theorem. But there is also day-to-day availability in the face of node-level, multinode, or entire-cluster failures. When discussing availability in distributed systems, we find it useful to refer to yield and harvest rather than simply availability. Yield refers to the ability to get an answer to your question. Harvest refers to the completeness of the dataset behind that answer. Rather than simply considering whether a system is up or down, you can evaluate which approach is best in the face of failures: reducing yield or reducing harvest.

The first question to ask yourself in a distributed system is whether it is acceptable to reduce the harvest to maintain the yield. For example, is it acceptable to deliver 75% of the data in a query if 25% of your node capacity is down? If you are delivering a large amount of search results, this might be acceptable. If so, this allows a greater tolerance for failure, which could mean reducing replication factors in your Cassandra ring. Similarly, if your harvest must stay close to 100%, you need to distribute more copies of your data. This means not just more replicas but more availability zones within which replicas should exist.
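The yield/harvest framing can be made concrete with a small sketch. Assume, hypothetically, that each node returns its slice of the result set or `None` if it is down; the function and field names are illustrative.

```python
def answer(node_results):
    """Yield/harvest sketch. `node_results` maps node name -> list of
    rows, or None if that node failed. Yield is whether we answer at
    all; harvest is the fraction of nodes contributing to the answer."""
    up = {n: rows for n, rows in node_results.items() if rows is not None}
    if not up:
        return {"yield": False, "harvest": 0.0, "rows": []}
    harvest = len(up) / len(node_results)
    rows = [row for rows in up.values() for row in rows]
    return {"yield": True, "harvest": harvest, "rows": rows}
```

With one of four nodes down, this returns an answer (yield preserved) carrying 75% of the data (harvest reduced), matching the search-results example above; a system that instead refused to answer would preserve harvest at the cost of yield.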

You can also see this in the decomposition of applications into their own sub-applications. Whether you see this in functional partitioning or microservices, the result is that one failure can be isolated from the rest of the system. This will often require programming work, as well, but it is an example of reducing harvest to maintain yield.

Understanding the storage mechanisms, and the way your datastore implements the trade-offs of consistency, availability, and latency, gives you the “under the covers” understanding of the datastore that complements the conceptual attributes we’ve already covered. The engineers and architects who are responsible for the performance and functionality of the application are most concerned with the conceptual attributes. Operational and database engineers are often focused on making sure that the internal attributes (see Table 11-2) meet the Service Level Objectives (SLOs) that have been set forth for them by the business.

Table 11-2. Datastore internal attribute summary

Attribute               | MySQL                       | Cassandra                          | MongoDB                  | Neo4j
Storage engines         | Plugins, primarily B-tree   | LSM only                           | Plugins, B-tree or LSM   | Native graph storage
Distributed consistency | Focused on consistency      | Eventual, secondary to availability | Focused on consistency   | Focused on consistency
Distributed availability | Secondary to consistency   | Focused on availability            | Secondary to consistency | Secondary to consistency
Latency                 | Tunable based on durability | Optimized for writes               | Tunable for consistency  | Optimized for reads

Wrapping Up

Hopefully this field guide has given you a good list of attributes, and the variety therein, for the wild datastore. This should be useful to you whether you are considering a new application, learning an existing one, or evaluating the request by a developer team for the newest cool datastore. Now that we have climbed the ladder from storage to datastore, it’s time to move on to data architectures and pipelines.

1 Codd, E.F., “The relational model for database management: version 2”, ACM Digital Library.

2 Ibid.

3 Ireland, Christopher, et al., “A Classification of Object-Relational Impedance Mismatch”, IEEE Xplore.

4 Vera, Harley, et al., “Data Modeling for NoSQL Document-Oriented Databases”.

5 Stonebraker, Michael, and Held, Gerald, “Networks, Hierarchies and Relations in Data Base Management Systems”.

6 Vieira, Marco, et al., “Timely ACID Transactions in DBMS”.

7 Adya, Atul et al., “Generalized Isolation Level Definitions”.

8 See the post “If Eventual Consistency Seems Hard, Wait Till You Try MVCC” on Baron Schwartz’s blog.

9 Sears, Russell, and Brewer, Eric, “Segment-Based Recovery: Write-ahead logging revisited”.

10 Roe, Charles, “The Question of Database Transaction Processing: An ACID, Base, NoSQL Primer”.

11 See Martin Kleppmann’s article “Please stop calling databases CP or AP”.

12 See Peter Bailis’s article “Linearizability versus Serializability”.

13 See Eric Brewer’s post “CAP Twelve Years Later: How the ‘Rules’ Have Changed”.

14 See Daniel Abadi’s cover feature, “Consistency Tradeoffs in Modern Distributed Database System Design”.
