© Navin Sabharwal, Shakuntala Gupta Edward 2020
N. Sabharwal, S. G. Edward, Hands On Google Cloud SQL and Cloud Spanner, https://doi.org/10.1007/978-1-4842-5537-7_5

5. Cloud Spanner

Navin Sabharwal1  and Shakuntala Gupta Edward2
(1)
New Delhi, India
(2)
Ghaziabad, India
 

Cloud Spanner is Google’s cloud-native, enterprise-grade, always-on, fully managed NewSQL database service. It offers high availability with an industry-leading 99.999% availability SLA and horizontal scalability with consistent global ACID transactions.

Before getting started with Cloud Spanner, you need to be familiar with NewSQL, so this chapter first takes a look at it. With digital transformation, many companies are building cloud applications. However, when building these applications, they have been forced to choose between traditional SQL databases (which guarantee ACID-based transactional consistency) and newer NoSQL databases (which provide horizontal scaling). NewSQL brings SQL-ization to the NoSQL world.

NewSQL is a class of relational database management systems that seek to provide the scalability of NoSQL systems for online transaction processing (OLTP) workloads while maintaining the ACID guarantees of a traditional database system.

—Wikipedia

This chapter covers:
  • Evolution of NewSQL

  • An introduction to Cloud Spanner

  • Spanner’s availability and where it fits in the CAP theorem (the theorem for distributed databases)

  • Design decisions

  • Best fits

The next section looks at the history of databases and discusses the evolution of NewSQL.

New in NewSQL

Traditional database management systems were born in the mid-1960s out of a need to separate code from data. Correctness and consistency were the two important metrics. The number of users querying these databases was considerably low, but the querying requirement was extensive—unlimited ad hoc queries could be run against the databases. As data grew, vertical scaling was a feasible solution. In addition, the downtime required for database migration and recovery was acceptable to users.

A couple of decades later, the Internet, big data, and the cloud added new sets of requirements, which fell primarily into two categories: OLAP and OLTP.

OLAP (online analytical processing) systems, commonly known as data warehouses, deal with historical data for analytical processing and reporting. The workload is primarily read-only and the user base is limited, so this requirement still fits traditional RDBMSs. In contrast, OLTP (online transaction processing) corresponds to highly concurrent data processing, characterized by short-lived, predefined queries run by real-time users. The queries are not read-only but write-intensive as well.

While each OLTP user accesses a smaller dataset than an OLAP user, the user base is considerable. At any given time, hundreds or thousands of users may be querying the database concurrently, performing both read and write operations. With this scale of users and this mix of operations, the need for high availability increases, as every minute of downtime can cost thousands or even millions of dollars.

In effect, the important requirements for an OLTP database are scalability, high availability, concurrency, and performance. This gave birth to NoSQL databases. In contrast to the relational data model of RDBMS systems, NoSQL uses varied data models (e.g., document, key-value, wide-column, and graph), each purpose-built to meet a unique set of requirements. These databases are inherently schema-less by design and are not normalized.

Although these databases bring higher availability, easier scalability, and better performance, they compromise on the strong consistency offered by RDBMSs, offering eventual consistency instead. This is fine for applications such as social media sites, where eventual consistency is acceptable; users do not notice if they see an inconsistent view of the data. But it will not work where, in addition to scalability and high availability, consistency is also critical (as with e-commerce platforms).

The expectation of combining the scalability and high availability of NoSQL with the relational model, transactional support, and SQL of RDBMSs gave birth to NewSQL databases. This type of database is a game changer for those who need both the consistency of an RDBMS and scale. The next section looks at the origins of Cloud Spanner at Google.

Origins of Cloud Spanner

Developers relied on traditional relational databases for decades to build applications that met their business needs. In 2007, when work on Spanner began, most of Google’s critical applications—such as AdWords and Google Play—were running on massive, manually sharded MySQL implementations.

Although manual sharding gave Google a scale-out mechanism that MySQL didn’t support natively, it was unwieldy—so much so that re-sharding the database was a multi-year process. Google needed a database that had native, flexible sharding capabilities, adhered to relational schema and storage, was ACID-compliant, and supported zero-downtime operation.

Faced with this need and two sub-optimal choices, a team of Google engineers and researchers set out to develop a globally distributed database that could bridge the gap between SQL and NoSQL.

In 2012, Google published a paper about Spanner, a database that offers the best of both worlds. Table 5-1 lists its features.
Table 5-1

Feature-Wise Comparison of Cloud Spanner, RDBMS, and NoSQL

Feature        Cloud Spanner   RDBMS          NoSQL
Schema         Yes             Yes            No
SQL            Yes             Yes            No
Consistency    Strong          Strong         Eventual
Availability   High            Failover       High
Scalability    Horizontal      Vertical       Horizontal
Replication    Automatic       Configurable   Configurable

That same year, Spanner went into internal use at Google to handle the workloads of its critical applications, such as AdWords and Google Play. It supports tens of millions of queries per second.

Over the years, it has been battle-tested within Google by hundreds of different applications and petabytes of data across datacenters around the world. After this internal use, Google announced Cloud Spanner for GCP users in February 2017.

The company saw its potential to handle the explosion of data coming from new information sources such as IoT, while providing the consistency and high availability needed when using that data. Now that you are familiar with its origins, the next section explains Cloud Spanner in more detail.

Google Cloud Spanner

Spanner was built from the ground up to be a widely distributed database, as it had to handle the demanding uptime and scaling requirements imposed by Google’s critical business applications. It can span across multiple machines, datacenters, and regions. This distribution was leveraged to handle huge datasets and workloads while still maintaining very high availability.

Spanner also aimed to provide the same strict consistency guarantees as other enterprise-grade databases. In effect, Cloud Spanner is a fully managed, globally distributed, strongly consistent database service, built specifically from a cloud/distributed design perspective.

Being a managed service, it enables developers to focus on application logic and value-added innovation, letting Google take care of the mundane yet important tasks of maintenance and administration. In addition, it enables you to do the following:
  • Scale out your RDBMS solutions without complex sharding or clustering

  • Gain horizontal scaling without migrating to a NoSQL landscape

  • Maintain high availability and protect against disasters without needing to engineer a complex replication and failover infrastructure

  • Gain integrated security with data-layer encryption

  • Control access with identity and access management and audit logging

You also need to note that Cloud Spanner is not a
  • Simple scale-up relational database

  • Data warehouse

  • NoSQL database

The next section quickly familiarizes you with the CAP theorem, an important concept when dealing with distributed databases. It explains where Spanner fits in the CAP theorem.

Spanner and CAP Theorem

The CAP theorem states that a database can have only two of the three following desirable properties:
  • C: Consistency, which implies a single value for shared data

  • A: 100% availability, for both reads and updates

  • P: Tolerance to network partition

This leads to three kinds of systems, as shown in Figure 5-1:
  • CA: Systems that provide consistency and availability

  • CP: Systems that provide consistency and partition tolerance

  • AP: Systems that provide availability and partition tolerance

[Figure 5-1: CAP theorem Venn diagram]

The following sections draw on insights from the Google whitepaper1. For distributed systems over a wide area, partitions are inevitable, although not necessarily common. If partitions are inevitable, any distributed system must be prepared to forfeit either consistency (becoming AP) or availability (becoming CP) when a partition occurs.

Despite being a globally distributed system, Spanner claims to be both consistent and highly available. Does this make Spanner a CA system? The answer is no: in the event of a network partition, Spanner chooses C and forfeits A, making it a CP system at heart. Google’s strategy with Spanner is instead to push availability so high that it is effectively a CA system, and it has introduced many mechanisms toward that end.

One basis for the claim of effective CA is ensuring a low number of outages due to partitions—that is, high network availability, which is a major contributor to overall availability. For Spanner, this network availability is helped enormously by Google’s wide area network.

Google runs its own private global network, custom-architected to limit partitions and tuned for the high availability and performance needs of systems like Spanner. Each datacenter is connected to the network by at least three independent fibers, ensuring path diversity for every pair of datacenters. There is also significant redundancy of equipment and paths within each datacenter, so that normally catastrophic events, such as cut fiber lines, do not lead to outages.

Another way Spanner gets around CAP is through its use of TrueTime.

TrueTime is a service that exposes globally synchronized atomic clocks. It allows events to be ordered in real time, enabling Spanner to achieve consistency across regions and even continents, with many nodes. TrueTime also enables taking snapshots across multiple independent systems, as long as they use (monotonically increasing) TrueTime timestamps for commits, agree on a snapshot time, and store multiple versions over time (typically in a log). This improves recovery time and overall availability.
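The commit-wait idea that TrueTime enables can be sketched in a few lines. This is a simulation, not Google's implementation: tt_now() and the 4 ms uncertainty bound are hypothetical stand-ins for TrueTime's interval API and its real clock uncertainty. The key property is that a transaction's commit is not acknowledged until its timestamp is guaranteed to be in the past everywhere, so timestamp order matches real-time commit order.

```python
import time

# Simulated TrueTime: returns an interval [earliest, latest] that is
# guaranteed to contain the true time (bound chosen for illustration).
CLOCK_UNCERTAINTY = 0.004  # assumed +/- 4 ms uncertainty (epsilon)

def tt_now():
    t = time.time()
    return (t - CLOCK_UNCERTAINTY, t + CLOCK_UNCERTAINTY)

def commit(transaction_work):
    """Apply the work, pick a commit timestamp, then commit-wait:
    do not acknowledge until the timestamp is certainly in the past."""
    transaction_work()
    commit_ts = tt_now()[1]          # latest bound becomes the timestamp
    while tt_now()[0] <= commit_ts:  # wait out the clock uncertainty
        time.sleep(0.001)
    return commit_ts

ts1 = commit(lambda: None)
ts2 = commit(lambda: None)
assert ts1 < ts2  # later commit always gets a strictly later timestamp
```

Because of the commit-wait, any transaction that starts after another one finishes is guaranteed a larger timestamp, which is what makes globally consistent snapshots possible.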

The third way Spanner gets around CAP is its use of the Paxos algorithm, which is used to reach consensus in a distributed environment. Paxos is key to making everything work, particularly in the way transactions are committed. Geographically distributed traditional systems use a two-phase commit protocol, which requires every site to finish its own work before the transaction can finally be marked as completed. Spanner instead makes each site a full replica of the others and uses Paxos consensus to commit a transaction once a majority of sites have completed their work. Users of a site that hasn’t finished updating can be rerouted to a site that has, until their own site catches up. Although this eliminates the gridlock, it introduces slight latency during those intervals.
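The majority rule described above can be illustrated with a minimal sketch. This is only the quorum-counting step, not Paxos itself (which also covers leader election and log replication); quorum_commit and the replica list are invented for the example.

```python
# Majority-quorum commit: a write is acknowledged once a majority of
# replicas have accepted it, so a minority of slow or partitioned
# sites cannot block progress.
def quorum_commit(replica_acks):
    """replica_acks: list of booleans, one per replica site."""
    majority = len(replica_acks) // 2 + 1
    return sum(replica_acks) >= majority

# Five replicas: the transaction commits even though two sites lag.
assert quorum_commit([True, True, True, False, False])
# With only two of five acks, the write cannot commit yet.
assert not quorum_commit([True, True, False, False, False])
```

Contrast this with two-phase commit, where a single unreachable site stalls the whole transaction.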

Along with these approaches, other software tricks help too. During write operations, Spanner locks only a cell—a particular column in a particular row—rather than the entire row. This not only accelerates the commit, it also minimizes contention while preserving full database consistency. In addition, for read-only operations that can tolerate slightly stale data, an older version of the data can be served. Another way Spanner speeds things up is by physically co-locating child data with its parent data. This allows queries over hierarchical data (such as purchase orders and their line items) to be answered in one sweep rather than requiring the database to traverse a join between the two tables.
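The parent-child co-location idea can be shown with a toy key-ordering sketch. This is purely illustrative, not Spanner's actual storage format; the table and helper names are invented for the example. The point is that prefixing a child row's key with its parent's key keeps them adjacent in key order, so one range scan returns the parent followed by its children.

```python
# Toy model of interleaved storage: rows keyed by tuples, scanned in
# sorted key order. A child key shares its parent key as a prefix.
rows = {}

def put_order(order_id, data):
    rows[(order_id,)] = data

def put_line_item(order_id, item_id, data):
    # Child key is prefixed with the parent key, co-locating the rows.
    rows[(order_id, item_id)] = data

put_order(1, "order 1")
put_line_item(1, 1, "item 1.1")
put_line_item(1, 2, "item 1.2")
put_order(2, "order 2")

# One sweep over order 1's key range yields the order and its items,
# with no join required.
scan = [rows[k] for k in sorted(rows) if k[0] == 1]
assert scan == ["order 1", "item 1.1", "item 1.2"]
```

In Cloud Spanner itself this layout is declared in the schema (a child table is interleaved in its parent), but the access pattern benefit is the same: hierarchical reads become a single contiguous scan.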

While the CAP theorem states that a distributed database can achieve only two of the three properties, Spanner cheats in a good way: through optimizations that side-step some of the normal constraints imposed on distributed databases, it achieves five 9s (99.999%) availability. Before you delve deeper into Cloud Spanner, the next section looks at its best-fit workloads.

Best Fit

The database industry now offers a variety of database solutions, each viable in its own solution space and each a fit for different workloads.

As an OLTP solution, Google Spanner is ideal for workloads traditionally supported by relational databases, such as inventory management and financial transactions. Other examples of its solution space include applications providing probabilistic assessments, such as those based on AI and advanced analytics.

By probabilistic, I mean that a methodology—an algorithm—is chosen on the fly to compute and return output quickly. Various algorithms may be available for finding a solution; the system picks, on the fly, the one that returns output quickly enough and whose output is good enough. Examples include real-time price updates, or deciding in real time the price to bid to deliver an advertisement to an end user. One example inside Google is the challenge in AdWords of keeping track of billions of clicks and rolling them up into advertisement placements and billing. Much of this is probabilistic, spans large countries, and has low-latency requirements.

Google’s development of Spanner is a tribute to the technical inventiveness of Google’s engineers, striving to solve the challenges of emerging probabilistic systems. Another potential use case for Spanner is large-scale public cloud email systems such as Gmail.

Development Support

Cloud Spanner keeps application development simple by supporting standard tools and languages. It supports schemas and DDL statements, SQL queries, distributed transactions, and JDBC drivers and offers client libraries for the most popular languages, including Java, Go, Python, PHP, Node.js, C#, and Ruby.

Summary

This chapter provided an overview of Cloud Spanner. To summarize
  • Is it a distributed database? Yes

  • Is it a relational database? Partially yes

  • Is it an ACID compliant database? Yes

  • Is it a SQL database? Mostly yes

  • Is it CP or AP? CP at heart, but effectively CA, given its 99.999% availability

The next chapter explains the way data is modeled, stored, and queried in Cloud Spanner.
