© Navin Sabharwal, Shakuntala Gupta Edward 2020
N. Sabharwal, S. G. Edward, Hands On Google Cloud SQL and Cloud Spanner, https://doi.org/10.1007/978-1-4842-5537-7_5

5. Cloud Spanner

Navin Sabharwal1  and Shakuntala Gupta Edward2
(1)
New Delhi, India
(2)
Ghaziabad, India
 

Cloud Spanner is Google’s cloud-native, enterprise-grade, always-on, fully managed NewSQL database service. It offers high availability with an industry-leading 99.999% availability SLA and horizontal scalability with consistent global ACID transactions.

Before getting started with Cloud Spanner, you need to be familiar with NewSQL, so this chapter first takes a look at it. With digital transformation, many companies are building cloud applications. However, when building these applications, they have been forced to choose between traditional SQL databases (which guarantee ACID-based transactional consistency) and newer NoSQL databases (which provide horizontal scaling). NewSQL brings SQL-ization to the NoSQL world.

NewSQL is a class of relational database management systems that seek to provide the scalability of NoSQL systems for online transaction processing (OLTP) workloads while maintaining the ACID guarantees of a traditional database system.

—Wikipedia

This chapter covers:
  • Evolution of NewSQL

  • An introduction to Cloud Spanner

  • Spanner’s availability and where it fits in the CAP theorem (the theorem for distributed databases)

  • Design decisions

  • Best fits

The next section looks at the history of databases and discusses the evolution of NewSQL.

New in NewSQL

Traditional database management systems were born in the mid-1960s out of a need to separate code from data. Correctness and consistency were the two important metrics. The number of users querying these databases was considerably low, but the querying requirement was extensive—unlimited ad hoc queries could be run against the databases. As data grew, vertical scaling was a feasible solution. In addition, the downtime required for database migration and recovery was acceptable to users.

A couple of decades later, the Internet, big data, and the cloud added new sets of requirements, which fell primarily into two categories: OLAP and OLTP.

OLAP (online analytical processing) systems, commonly known as data warehouses, deal with historical data for analytical processing and reporting. The workload is primarily read-only and the user base is limited, so this requirement still fits traditional RDBMSs. In contrast, OLTP (online transaction processing) corresponds to highly concurrent data processing, characterized by short-lived, predefined queries run by real-time users. The queries are not read-only but write-intensive as well.

While each OLTP user accesses a smaller dataset than an OLAP user, the user base is considerable. At any given time, hundreds or thousands of users may be querying the database concurrently, performing both read and write operations. With this scale of users and this mix of operations, the need for high availability increases, as every minute of downtime can cost thousands or even millions of dollars.

In effect, the important requirements for an OLTP database are scalability, high availability, concurrency, and performance. This gave birth to NoSQL databases. In contrast to the relational data model of RDBMS systems, NoSQL uses varied data models (e.g., document, key-value, wide-column, and graph), each purpose-built to meet a unique set of requirements. These databases are inherently schema-less by design and are not normalized.

Although these databases bring higher availability, easier scalability, and better performance, they compromise on the strong consistency offered by RDBMSs, offering eventual consistency instead. This is fine for applications such as social media sites, where eventual consistency is acceptable; users do not notice if they see an inconsistent view of the data. But it will not work where, in addition to scalability and high availability, consistency is also critical (as with e-commerce platforms).

The expectation of combining the scalability and high availability of NoSQL with the relational model, transactional support, and SQL of RDBMSs gave birth to NewSQL databases. This type of database is a game changer for those who need both the consistency of an RDBMS and scale. The next section looks at the origins of Cloud Spanner at Google.

Origins of Cloud Spanner

Developers relied on traditional relational databases for decades to build applications that met their business needs. In 2007, when work on Spanner began, most of Google’s critical applications—such as AdWords and Google Play—were running on massive, manually sharded MySQL implementations.

Although manual sharding gave Google a scale-out mechanism that MySQL didn’t support natively, it was unwieldy—so much so that re-sharding the database was a multi-year process. Google needed a database that had native, flexible sharding capabilities, adhered to relational schema and storage, was ACID-compliant, and supported zero-downtime operation.

Faced with this need and two sub-optimal choices, a team of Google engineers and researchers set out to develop a globally distributed database that could bridge the gap between SQL and NoSQL.

In 2012, Google published a paper about Spanner, a database that offers the best of both worlds. Table 5-1 lists its features.
Table 5-1

Feature-Wise Comparison of Cloud Spanner, RDBMS, and NoSQL

Feature        Cloud Spanner   RDBMS          NoSQL
Schema         Yes             Yes            No
SQL            Yes             Yes            No
Consistency    Strong          Strong         Eventual
Availability   High            Failover       High
Scalability    Horizontal      Vertical       Horizontal
Replication    Automatic       Configurable   Configurable

That same year, Spanner went into internal use at Google to handle the workloads of its critical applications, such as AdWords and Google Play. It supports tens of millions of queries per second.

Over the years, it has been battle-tested within Google by hundreds of different applications and petabytes of data across datacenters around the world. After this internal use, Google announced Cloud Spanner for GCP users in February 2017.

The company saw its potential to handle the explosion of data coming from new information sources such as IoT, while providing the consistency and high availability needed when using that data. Now that you are familiar with its origins, the next section explains Cloud Spanner in more detail.

Google Cloud Spanner

Spanner was built from the ground up to be a widely distributed database, as it had to handle the demanding uptime and scaling requirements imposed by Google’s critical business applications. It can span across multiple machines, datacenters, and regions. This distribution was leveraged to handle huge datasets and workloads while still maintaining very high availability.

Spanner also aimed to provide the same strict consistency guarantees as other enterprise-grade databases. In effect, Cloud Spanner is a fully managed, globally distributed, strongly consistent database service, built specifically from a cloud/distributed design perspective.

Being a managed service, it enables developers to focus on application logic and value-added innovation, letting Google take care of the mundane yet important tasks of maintenance and administration. In addition, it enables you to do the following:
  • Scale out your RDBMS solutions without complex sharding or clustering

  • Gain horizontal scaling without migrating to a NoSQL landscape

  • Maintain high availability and protect against disasters without needing to engineer a complex replication and failover infrastructure

  • Gain integrated security with data-layer encryption

  • Control access with identity and access management and audit logging

You also need to note that Cloud Spanner is not a
  • Simple scale-up relational database

  • Data warehouse

  • NoSQL database

The next section quickly familiarizes you with the CAP theorem, an important concept when dealing with distributed databases. It explains where Spanner fits in the CAP theorem.

Spanner and CAP Theorem

The CAP theorem states that a database can have only two of the three following desirable properties:
  • C: Consistency, which implies a single value for shared data

  • A: 100% availability, for both reads and updates

  • P: Tolerance to network partition

This leads to three kinds of systems, as shown in Figure 5-1:
  • CA: Systems that provide consistency and availability

  • CP: Systems that provide consistency and partition tolerance

  • AP: Systems that provide availability and partition tolerance

[Figure 5-1: CAP theorem Venn diagram]

The following sections draw on insights from the Google whitepaper1. For distributed systems over a wide area, partitions are inevitable, although not necessarily common. If partitions are inevitable, any distributed system must be prepared to forfeit either consistency (becoming AP) or availability (becoming CP) when a partition occurs.

Despite being a globally distributed system, Spanner claims to be both consistent and highly available. Does this make Spanner a CA system? The answer is no: in the event of a network partition, Spanner chooses C and forfeits A, making it a CP system at heart. Google’s strategy with Spanner is instead to push availability so high that it is effectively a CA system, and it has introduced many mechanisms toward that end.

One basis for the claim of effective CA is ensuring a low number of outages due to partitions—that is, high network availability, which is a major contributor to overall availability. For Spanner, this network availability is helped enormously by Google’s wide area network.

Google runs its own private global network, custom-architected to limit partitions and tuned for the high availability and performance needs of systems like Spanner. Each datacenter is connected to the network by at least three independent fibers, ensuring path diversity for every pair of datacenters. There is also significant redundancy of equipment and paths within each datacenter, so that normally catastrophic events, such as cut fiber lines, do not lead to outages.

Another way Spanner gets around CAP is through its use of TrueTime.

TrueTime is a service that exposes globally synchronized atomic clocks. It allows events to be ordered in real time, enabling Spanner to achieve consistency across regions and even continents, with many nodes. TrueTime also enables taking snapshots across multiple independent systems, as long as they use (monotonically increasing) TrueTime timestamps for commits, agree on a snapshot time, and store multiple versions over time (typically in a log). This improves recovery time and overall availability.
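The commit-wait idea that TrueTime enables can be sketched in a few lines. This is a simulation, not Google's implementation: tt_now() and the 4 ms uncertainty bound are hypothetical stand-ins for TrueTime's interval API and its real clock uncertainty. The key property is that a transaction's commit is not acknowledged until its timestamp is guaranteed to be in the past everywhere, so timestamp order matches real-time commit order.

```python
import time

# Simulated TrueTime: returns an interval [earliest, latest] that is
# guaranteed to contain the true time (bound chosen for illustration).
CLOCK_UNCERTAINTY = 0.004  # assumed +/- 4 ms uncertainty (epsilon)

def tt_now():
    t = time.time()
    return (t - CLOCK_UNCERTAINTY, t + CLOCK_UNCERTAINTY)

def commit(transaction_work):
    """Apply the work, pick a commit timestamp, then commit-wait:
    do not acknowledge until the timestamp is certainly in the past."""
    transaction_work()
    commit_ts = tt_now()[1]          # latest bound becomes the timestamp
    while tt_now()[0] <= commit_ts:  # wait out the clock uncertainty
        time.sleep(0.001)
    return commit_ts

ts1 = commit(lambda: None)
ts2 = commit(lambda: None)
assert ts1 < ts2  # later commit always gets a strictly later timestamp
```

Because of the commit-wait, any transaction that starts after another one finishes is guaranteed a larger timestamp, which is what makes globally consistent snapshots possible.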

The third way Spanner gets around CAP is its use of the Paxos algorithm, which is used to reach consensus in a distributed environment. Paxos is key to making everything work, particularly in the way transactions are committed. Geographically distributed traditional systems use a two-phase commit protocol, which requires every site to finish its own work before the transaction can finally be marked as completed. Spanner instead makes each site a full replica of the others and uses Paxos consensus to commit a transaction once a majority of sites have completed their work. Users of a site that hasn’t finished updating can be rerouted to a site that has, until their own site catches up. Although this eliminates the gridlock, it introduces slight latency during those intervals.
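The majority rule described above can be illustrated with a minimal sketch. This is only the quorum-counting step, not Paxos itself (which also covers leader election and log replication); quorum_commit and the replica list are invented for the example.

```python
# Majority-quorum commit: a write is acknowledged once a majority of
# replicas have accepted it, so a minority of slow or partitioned
# sites cannot block progress.
def quorum_commit(replica_acks):
    """replica_acks: list of booleans, one per replica site."""
    majority = len(replica_acks) // 2 + 1
    return sum(replica_acks) >= majority

# Five replicas: the transaction commits even though two sites lag.
assert quorum_commit([True, True, True, False, False])
# With only two of five acks, the write cannot commit yet.
assert not quorum_commit([True, True, False, False, False])
```

Contrast this with two-phase commit, where a single unreachable site stalls the whole transaction.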

Along with these approaches, other software tricks help too. During write operations, Spanner locks only a cell—a particular column in a particular row—rather than the entire row. This not only accelerates the commit, it also minimizes contention while preserving full database consistency. In addition, for read-only operations that can tolerate slightly stale data, an older version of the data can be served. Another way Spanner speeds things up is by physically co-locating child data with its parent data. This allows queries over hierarchical data (such as purchase orders and their line items) to be answered in one sweep rather than requiring the database to traverse a join between the two tables.
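The parent-child co-location idea can be shown with a toy key-ordering sketch. This is purely illustrative, not Spanner's actual storage format; the table and helper names are invented for the example. The point is that prefixing a child row's key with its parent's key keeps them adjacent in key order, so one range scan returns the parent followed by its children.

```python
# Toy model of interleaved storage: rows keyed by tuples, scanned in
# sorted key order. A child key shares its parent key as a prefix.
rows = {}

def put_order(order_id, data):
    rows[(order_id,)] = data

def put_line_item(order_id, item_id, data):
    # Child key is prefixed with the parent key, co-locating the rows.
    rows[(order_id, item_id)] = data

put_order(1, "order 1")
put_line_item(1, 1, "item 1.1")
put_line_item(1, 2, "item 1.2")
put_order(2, "order 2")

# One sweep over order 1's key range yields the order and its items,
# with no join required.
scan = [rows[k] for k in sorted(rows) if k[0] == 1]
assert scan == ["order 1", "item 1.1", "item 1.2"]
```

In Cloud Spanner itself this layout is declared in the schema (a child table is interleaved in its parent), but the access pattern benefit is the same: hierarchical reads become a single contiguous scan.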

While the CAP theorem states that a distributed database can achieve only two of the three properties, Spanner cheats in a good way: through optimizations that side-step some of the normal constraints imposed on distributed databases, it achieves five 9s (99.999%) availability. Before you delve deeper into Cloud Spanner, the next section looks at its best-fit workloads.

Best Fit

The database industry now offers a variety of database solutions, each viable in its own solution space and each a fit for different workloads.

As an OLTP solution, Google Spanner is ideal for workloads traditionally supported by relational databases, such as inventory management and financial transactions. Other examples of its solution space include applications providing probabilistic assessments, such as those based on AI and advanced analytics.

By probabilistic, I mean that a methodology—an algorithm—is chosen on the fly to compute and return output quickly. Various algorithms may be available for finding a solution; the system picks, on the fly, the one that returns output quickly enough and whose output is good enough. Examples include real-time price updates, or deciding in real time the price to bid to deliver an advertisement to an end user. One example inside Google is the challenge in AdWords of keeping track of billions of clicks and rolling them up into advertisement placements and billing. Much of this is probabilistic, spans large countries, and has low-latency requirements.

Google’s development of Spanner is a tribute to the technical inventiveness of Google’s engineers, striving to solve the challenges of emerging probabilistic systems. Another potential use case for Spanner is large-scale public cloud email systems such as Gmail.

Development Support

Cloud Spanner keeps application development simple by supporting standard tools and languages. It supports schemas and DDL statements, SQL queries, distributed transactions, and JDBC drivers and offers client libraries for the most popular languages, including Java, Go, Python, PHP, Node.js, C#, and Ruby.

Summary

This chapter provided an overview of Cloud Spanner. To summarize
  • Is it a distributed database? Yes

  • Is it a relational database? Partially yes

  • Is it an ACID compliant database? Yes

  • Is it a SQL database? Mostly yes

  • Is it CP or AP? CP at heart, but effectively CA, given its 99.999% availability

The next chapter explains the way data is modeled, stored, and queried in Cloud Spanner.
