© Navin Sabharwal, Shakuntala Gupta Edward 2020
N. Sabharwal, S. G. Edward, Hands On Google Cloud SQL and Cloud Spanner, https://doi.org/10.1007/978-1-4842-5537-7_6

6. Cloud Spanner Explained

Navin Sabharwal (1) and Shakuntala Gupta Edward (2)
(1) New Delhi, India
(2) Ghaziabad, India

Now that you are familiar with Spanner, it’s time to look more closely at its underlying concepts and features. As with any database, it is beneficial to do a deep dive before starting to work with it. This chapter explores the concepts that are important when working with Spanner, starting with instances. The chapter covers pointers for databases, tables, and keys. Along the way, it also covers the way operations are handled in Spanner.

This chapter covers:
  • Data modeling concepts

  • Design considerations to keep in mind when modeling with Spanner

  • Data replication

  • Transactions and understanding concurrent transactions, with an example

  • Operations in the distributed environment

The next section begins with instances.

Instance

A Cloud Spanner instance is roughly analogous to an RDBMS server. The instance contains one or more databases and has allocated resources (compute and storage) that are used by all its databases. Figure 6-1 shows a hypothetical representation of an instance.
Figure 6-1. Cloud Spanner instance

The resources are allocated during instance creation and are controlled by two main configurations, which determine the compute capacity and the way data is replicated and stored.

Spanner replicates data automatically. The number of replicas (or copies) to be created and their placement are controlled by the first configuration, the instance configuration. Two options are available: Regional and Multi-Regional. In a Regional configuration, data is replicated across three zones within a single region selected by the user. In a Multi-Regional configuration, data is replicated across four zones spread over different regions, which are determined by the continent selected by the user. In effect, Regional protects against the failure of an entire zone, whereas Multi-Regional provides higher availability guarantees by also protecting against the failure of an entire region. Figure 6-2 shows a hypothetical representation of an instance with a Regional configuration.
Figure 6-2. Regional Cloud Spanner instance

Figure 6-3 shows a hypothetical representation of an instance with a Multi-Regional configuration.
Figure 6-3. Multi-Regional Cloud Spanner instance

The first configuration tells us the number of replicas to be created and where the data will be placed. The second configuration (node allocation) determines the instance's serving (compute) and storage capacity. The resources are allotted per zone. Figure 6-4 shows a hypothetical representation of a regional instance with three nodes.
Figure 6-4. 3-node Regional Cloud Spanner instance

As you can see, you get equal resources (both compute and storage) across all zones. Within each zone, the data of all the instance's databases is distributed across these nodes, which are responsible for processing their part of the data.
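To make this concrete, here is a minimal sketch that creates such an instance with the Cloud Spanner Python client. The instance ID and display name are hypothetical, and the sketch assumes a Google Cloud project and credentials are already configured; the configuration name and node count map to the two configurations just described.

```python
from google.cloud import spanner

# Assumes a Google Cloud project and application default credentials are set up.
spanner_client = spanner.Client()

# First configuration: the instance configuration (replication and placement).
# "regional-us-central1" replicates data across three zones in us-central1.
config_name = "{}/instanceConfigs/regional-us-central1".format(
    spanner_client.project_name
)

# Second configuration: node allocation (compute and storage, allotted per zone).
instance = spanner_client.instance(
    "test-instance",                    # hypothetical instance ID
    configuration_name=config_name,
    display_name="Test Instance",
    node_count=3,
)

operation = instance.create()           # long-running operation
operation.result(120)                   # wait up to 120 seconds for completion
```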

Note

Although nine nodes might seem like overkill, this is in line with the Cloud Spanner guarantee of rich querying, strong consistency, high availability, and performance.

Later sections will discuss the way data distribution happens across the nodes. The next section looks at the data modeling concepts found in Spanner.

Data Modeling

Cloud Spanner behaves like an RDBMS in many ways but with a few differences. This section explains Spanner’s data modeling concepts, covering the differences wherever applicable. It begins with databases and tables.

Databases and Tables

As in an RDBMS, in Spanner:
  • Databases are created to hold related data together in the form of tables.

  • Tables store data in a structured format using rows and columns.

  • Schemas are predefined, specifying the columns, their names, and their data types and sizes (where applicable).

However, in Spanner you define only one constraint, the NULLABILITY constraint, which controls whether a column can contain NULL values. In addition, Spanner supports a limited set of data types, as listed in Table 6-1.
Table 6-1. Spanner Supported Data Types

Category  | Data Type
----------|----------------
Number    | INT64, FLOAT64
Text      | STRING
Date Time | TIMESTAMP, DATE
Boolean   | BOOL
Bytes     | BYTES
Complex   | ARRAY

Primary Keys

While defining a schema in Spanner, it's mandatory to specify a primary key. As in an RDBMS, Spanner's primary key uniquely identifies each row and can be made of a single column (single key) or a group of columns (composite key). But the primary key in Spanner also has a few differences.
  • Primary key creation is optional in an RDBMS, but in Spanner it's mandatory.

  • The primary key of RDBMS contains one or more columns. The Cloud Spanner primary key can have zero columns as well. In this case, the table will contain only one row.

  • The primary key of RDBMS can never be NULL; however, Spanner’s primary key column can be NULLABLE. Since it’s a unique identifier, only one row can have the NULL value.

In addition, the Spanner primary key controls data locality, as data is stored in sorted order (this defaults to Ascending) by the primary key values in key-value format. The value includes all of the respective non-key columns. Let’s look at an example and create a table called Artists. The table schema looks like Table 6-2.
Table 6-2. Artists Table Schema

Column Name | Data Type  |
------------|------------|------------
Artist ID   | INT64      | Primary Key
FirstName   | STRING(10) |
LastName    | STRING(20) |
EMailId     | STRING(50) |
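As a sketch, the Artists table could be created with DDL like the following, issued through the Python client. The instance and database IDs are hypothetical, NOT NULL is how the NULLABILITY constraint is expressed in DDL, and the PRIMARY KEY clause fixes the sort order described above.

```python
from google.cloud import spanner

spanner_client = spanner.Client()
instance = spanner_client.instance("test-instance")   # hypothetical instance ID

# Create a database along with the Artists table. The PRIMARY KEY clause sits
# outside the column list and also determines the physical sort order.
database = instance.database(
    "testdb",                                          # hypothetical database ID
    ddl_statements=[
        """CREATE TABLE Artists (
               ArtistId  INT64 NOT NULL,
               FirstName STRING(10),
               LastName  STRING(20),
               EMailId   STRING(50)
           ) PRIMARY KEY (ArtistId)"""
    ],
)
operation = database.create()
operation.result(120)
```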
The Artists table contains the data shown in Table 6-3.
Table 6-3. Artists Data

Artist ID | First Name | Last Name | EMailId
----------|------------|-----------|------------------
1         | Mike       | Jonas     | [email protected]
2         | Steffi     | Graph     | [email protected]
3         | Stephan    | Chart     | [email protected]
4         | Leslie     | John      | [email protected]
5         | Kate       | Charlie   | [email protected]

Figure 6-5 shows a hypothetical representation of the physical layout of the Artists table, where with the Artist ID as the key, the remaining column data is stored as the value.
Figure 6-5. Physical layout of the Artists table

Being distributed, Spanner partitions the data automatically and distributes it across the instance's allotted nodes. The partitioning is done along the primary key dimension, so it's important to choose the primary key carefully.

Choose a Key to Avoid Hotspots

Keys should be chosen so that they enable even distribution of the data and do not create a hotspot. So what is hot spotting? Assume that the database hosting the Artists table is on a two-node instance. This implies that the table’s data will be distributed across the two nodes along the primary key dimension, ArtistId. Figure 6-6 shows a hypothetical distribution of data across the two nodes.
Figure 6-6. Primary key ArtistId distribution across two nodes in Zone 1

Since the primary key ArtistId is a monotonically increasing integer ID, as you start inserting new data, all the new additions happen at the end of the key space. In effect, all inserts land only on Node 2. Figure 6-7 shows inserts of the new Artist IDs 6, 7, 8, 9, 10, 11, and 12.
Figure 6-7. Inserted new Artists with IDs 6, 7, 8, 9, 10, 11, and 12

This scenario illustrates hot spotting, as the load weighs down a single node only. This effectively kills parallelization in Spanner and therefore hurts performance. To handle such scenarios, avoid keys that are sequentially ordered or monotonically increasing, like the ArtistId column. Instead, choose a key that distributes the data, and therefore the workload (reads and writes), evenly and does not create a hotspot.

However, if it is unavoidable and you have to use a sequentially ordered key, you should see if you can hash it. Hashing distributes the keys across different servers quite evenly. In the previous example, if you hash the Artist IDs, even distribution can be achieved, as the hash values will be effectively random and highly unlikely to be monotonically increasing or decreasing. As a result, the layout will be free of hot spotting.
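As a purely illustrative sketch, one way to apply this in application code is to derive the key from a hash or a random UUID instead of a sequence; the helper functions below are hypothetical and assume the application is free to use STRING keys.

```python
import hashlib
import uuid

def hashed_artist_key(sequential_id: int) -> str:
    """Derive a key from a hash of the sequential ID so that consecutive IDs
    land far apart in the key space instead of at its end."""
    return hashlib.sha256(str(sequential_id).encode("utf-8")).hexdigest()

def random_artist_key() -> str:
    """Alternative: use a random (version 4) UUID, which carries no ordering."""
    return uuid.uuid4().hex

# Consecutive IDs now map to widely scattered keys.
print(hashed_artist_key(6))
print(hashed_artist_key(7))
print(random_artist_key())
```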

Note

The product documentation describes various other techniques that you should take into consideration when selecting an efficient key.

The next section looks at the way related tables should be created for efficient querying.

Interleaving

Spanner enables you to choose the physical representation of your table data based on your querying pattern. To handle cases where tables are related and frequently queried together, Spanner lets you define a hierarchical (parent-child) relationship between the tables. This enables efficient retrieval, as the data of the hierarchically related tables is co-located and stored together on disk.

Note

Parent/child relationships can be nested up to seven levels deep.

The relationship is defined by creating the child table (table containing related information) as an interleaved table of the parent (main) table.

To understand this, the example introduces another table called Albums, which stores album details per artist. Its schema and data are shown in Tables 6-4 and 6-5, respectively.
Table 6-4. Albums Table Schema

Column Name | Data Type  |
------------|------------|------------
AlbumId     | INT64      | Primary Key
ArtistId    | INT64      |
Album Title | STRING(50) |
Table 6-5. Albums Data

Album ID | Artist ID | Album Title
---------|-----------|---------------
1        | 1         | Junk yard
2        | 1         | Go On
3        | 2         | Girls like you
4        | 2         | Color Green
5        | 2         | Purple

As of now, the Albums table is created like any other normal table. With the key IDs of both tables (Artists and Albums), the physical layout of the data now looks like Figure 6-8.
Figure 6-8. Physical layout of the Artists and Albums tables

As you can see, the data is stored in contiguous space. In this case, it's quite possible that, while distributing the data across nodes, rows of the Albums table end up on one node and rows of the Artists table on another. This would be fine if you never queried the tables together, but since in this case you will always be querying Albums with reference to Artists, this representation will have a negative impact on performance. So here it's apt to create Albums as an interleaved table of Artists.

While you do so, note that Spanner uses the primary key of the parent table to store the child data with the corresponding parent record. So as you interleave the table, the primary key column(s) of the parent table must be prefixed to the primary key of the child table (in the same order, starting with the same column as in the parent table). The primary key of the Albums table will therefore be composed of ArtistId and AlbumId. With the interleaving defined, the table schema is as shown in Table 6-6 and the physical layout of both tables is as shown in Figure 6-9.
Table 6-6. Albums Table Schema

Column Name | Data Type  |
------------|------------|------------------------
ArtistId    | INT64      | Primary Key (Column 1)
AlbumId     | INT64      | Primary Key (Column 2)
Album Title | STRING(50) |

Figure 6-9. Interleaved Artists and Albums tables

As you can see, the data is interleaved. Every row from the Artists table is followed by all its rows from the child Albums table, which is determined by the ArtistId column value. This entire representation looks like a big flat table.

In effect, this process of inserting the child rows between the parent rows along the primary key dimension is called interleaving. The child tables are called interleaved tables and the parent table is called the root table or top-level table. As you can see, this enables fast querying, as data that's queried together is stored together.
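A hedged sketch of this interleaving in DDL might look like the following, issued via update_ddl on the database object from the earlier sketches. The column names are compacted to valid identifiers, and the ON DELETE CASCADE clause is optional; it simply removes an artist's albums when the artist row is deleted.

```python
# Assumes `database` is the Database object from the earlier sketches.
operation = database.update_ddl([
    """CREATE TABLE Albums (
           ArtistId   INT64 NOT NULL,
           AlbumId    INT64 NOT NULL,
           AlbumTitle STRING(50)
       ) PRIMARY KEY (ArtistId, AlbumId),
       INTERLEAVE IN PARENT Artists ON DELETE CASCADE"""
])
operation.result(120)
```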

So while designing your schema, it is important to study the querying pattern and identify the tables that need to be co-located. That way, when you actually create the tables, proper interleaving can be defined.

You are now familiar with two important design decisions: choosing a proper primary key to avoid hot spotting and identifying the querying pattern to interleave the related tables. The next step is to understand another interesting concept, splits.

Splits

As you know by now, Cloud Spanner uses the primary key dimension to partition the data. Splits determine the points within that dimension where the partitions happen. In effect, splits are used by Cloud Spanner to logically group rows so that the groups are independent of each other and can be moved around without impacting other groups of data or performance.

Cloud Spanner splits automatically when it deems necessary due to size or load. Splits are done at root row boundaries, where a root row is a row in the root table, so that the root row plus all its descendants (the row tree) are always in a single split. Figure 6-10 shows a row tree, i.e., a root row plus all its descendants.
Figure 6-10. Row tree

Although you don’t control when or how to split, you can control the way you define the root table and the row tree. In effect, it’s important to choose the correct root table while designing the database to enable proper scaling.

Returning to the example, you can see that there are a bunch of possible splits (as you are querying details of albums for a particular artist), each corresponding to a distinct value of the Artist ID, as shown in Figure 6-11.
Figure 6-11. Splits

These are groups of rows that can be moved around without affecting any of the other rows. Splits complement the data distribution in Spanner. Having looked at the important concepts of choosing a proper primary key, interleaving, and splits, it's time to look at indexes.

Secondary Indexes

As you know by now, data is stored in sorted order by the primary key, so any query involving the primary key field will always be fast. However, querying on non-key fields leads to a full table scan and therefore slower query performance. To cater to this scenario, like other databases, Spanner enables you to create secondary indexes on the non-key fields.

Creating a secondary index on a field stores the indexed field in sorted order (specified at creation time; ascending by default) along with the associated key field value. Let's say you want to index Artists by LastName (ascending order). Table 6-7 shows the data of the index on LastName.
Table 6-7. ArtistsByLastName (Ascending)

Last Name | Artist ID
----------|----------
Charlie   | 5
Chart     | 3
Graph     | 2
John      | 4
Jonas     | 1

So, if you query for Artist IDs with LastName values in a specified range, this index can return the results quickly without any table scan, as all the required data is available within the index itself.
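A minimal sketch of creating this index and running such a range query follows; it assumes the database object from the earlier sketches, and the FORCE_INDEX hint shown is one way to pin the query to the index.

```python
# Assumes `database` is the Database object from the earlier sketches.
operation = database.update_ddl([
    "CREATE INDEX ArtistsByLastName ON Artists(LastName)"
])
operation.result(120)

# A range query served from the index; the FORCE_INDEX hint pins the index choice.
with database.snapshot() as snapshot:
    results = snapshot.execute_sql(
        "SELECT ArtistId, LastName "
        "FROM Artists@{FORCE_INDEX=ArtistsByLastName} "
        "WHERE LastName BETWEEN 'C' AND 'J'"
    )
    for row in results:
        print(row)
```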

Storing Clause

Now rephrase the query to fetch the ArtistId and FirstName where the LastName value is in the specified range. Since FirstName is not in the index, this query requires a join back to the Artists table to fetch the FirstName, which hurts query performance.

To cater to such a scenario, Spanner enables you to use the STORING clause, which copies the column's data into the index. Table 6-8 shows the ArtistsByLastName index with the FirstName field included via the STORING clause.
Table 6-8. ArtistsByLastName (Ascending) with FirstName Field Within the STORING Clause

Last Name | Artist ID | First Name
----------|-----------|-----------
Charlie   | 5         | Kate
Chart     | 3         | Stephan
Graph     | 2         | Steffi
John      | 4         | Leslie
Jonas     | 1         | Mike
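A hedged DDL sketch of such an index follows. The index name is illustrative and kept distinct from the earlier ArtistsByLastName, since an index with that name already exists in the example.

```python
# Assumes `database` is the Database object from the earlier sketches.
# STORING (FirstName) copies the FirstName value into each index entry, so the
# query above can be answered from the index alone, without a join.
operation = database.update_ddl([
    "CREATE INDEX ArtistsByLastNameStoring "
    "ON Artists(LastName) STORING (FirstName)"
])
operation.result(120)
```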

Interleaved Indexes

As with interleaved tables, Spanner enables you to create indexes based on your usage. Say you want to index the albums by name, but instead of indexing all the albums together, you want to index them for each Artist ID separately. To cater to this, Spanner enables you to create interleaved indexes. These will be interleaved within the parent (root) table and are analogous to having a local index per root row. Table 6-9 shows the interleaved index, AlbumsByNameForEachArtist.
Table 6-9. AlbumsByNameForEachArtist

Artist ID | Album Name     | Album ID
----------|----------------|---------
1         | Go On          | 2
1         | Junk Yard      | 1
2         | Color Green    | 4
2         | Girls Like You | 3
2         | Purple         | 5

As with interleaved tables, this enables co-location of related rows near each other, in effect improving performance, as the data to be referred together is always placed together.
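A sketch of this interleaved index in DDL, again assuming the database object from the earlier sketches, might look like:

```python
# Assumes `database` is the Database object from the earlier sketches.
# The index leads with ArtistId (the parent key) and is interleaved in Artists,
# so each artist's entries live in that artist's row tree.
operation = database.update_ddl([
    "CREATE INDEX AlbumsByNameForEachArtist "
    "ON Albums(ArtistId, AlbumTitle), INTERLEAVE IN Artists"
])
operation.result(120)
```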

So, indexes can be really powerful, as they make querying faster by storing data in the format you need. The trade-off of secondary indexes is a storage penalty, since some data is stored multiple times, as with interleaved indexes and the STORING clause, and this in turn has an impact on write performance.

With all the concepts in place, it’s time to look at the way data will be distributed and replicated, with an example.

Replication

You know by now that Cloud Spanner instances automatically
  • Replicate the data based on the instance configuration chosen

  • Distribute the data based on the nodes allocated

  • Split the data along the primary key dimension

With these details, you can consider an example of a five-node Cloud Spanner Regional configuration instance and look at the way data is distributed and replicated. As mentioned, the Regional configuration implies that the data will be replicated across three zones within a region. A five-node instance means within each zone, the data will be distributed across five nodes. A hypothetical representation of the instance looks like Figure 6-12.
Figure 6-12. Five-node instance with Regional configuration in us-central1 region

Assume the instance hosts a database named TestDB, which contains a table called TableA with schema, as shown in Table 6-10.
Table 6-10. TableA Schema

Column Name | Data Type |
------------|-----------|------------
ID          | INT64     | Primary Key
Value       | STRING    |
Let’s further assume the table has the splits outlined in Table 6-11.
Table 6-11. TableA Split Details

Split | Key Range
------|------------------
0     | <100
1     | >=100 and <500
2     | >=500 and <1000
3     | >=1000 and <2000
4     | >=2000 and <2500
5     | >=2500 and <4000
6     | >=4000 and <7000
7     | >=7000

Note

This split distribution is transparent to the user and is handled automatically by Cloud Spanner. The section is for understanding purposes only.

The distribution implies that if you have to look for a row with an ID 2500, you will have to check Split 5, as this is the split that holds data with a key range >=2500 and <4000. With all the assumptions, the hypothetical split distribution will look like Figure 6-13.
Figure 6-13. Hypothetical representation of the data distribution

Here’s what you can learn from this distribution:
  • Data is stored in the form of splits.

  • A node may host one or many splits. Multiple splits per node occur when the number of splits is more than the number of nodes allotted to the instance.

  • Replication is at the split level.

  • The number of split copies (or replicas) and their placement across independent failure zones are determined by the instance configuration. In this example, you have three replicas for each split.

Each replica can serve different purposes based on its type. Table 6-12 lists the different types of Spanner replicas.
Table 6-12. Replica Types

Read/Write | Read Only | Witness
-----------|-----------|--------
Holds a full copy of the split | Holds a full copy of the split | Does not maintain a full copy of the data
Data is always up to date | Data can be stale | Not applicable
Serves both read and write traffic | Serves read traffic only; these replicas let you scale out read capacity | Serves neither reads nor writes
Can elect the leader and vote on whether a write is committed | Cannot participate in leader election or vote on commits, so adding read-only replicas neither increases the write quorum size nor adds to write latency | Can elect the leader and vote on whether a write is committed; makes it easier to achieve write quorums without the full storage and compute resources
Can be the leader | Cannot be the leader | Cannot be the leader

The replicas of each split form a replica set. Within each set, one replica is elected as the leader and is responsible for all writes to the split. All the other replicas (except witnesses) can serve read requests. Figure 6-14 shows the leaders highlighted in orange for each replica set.
Figure 6-14. Hypothetical representation of the data distribution with leaders highlighted

Spanner uses a synchronous, Paxos-based (consensus protocol) replication scheme: whenever a write is issued, the voting replicas vote before the write can be committed. The major concern Cloud Spanner addresses is database replication on a global scale while still providing data consistency, even when multiple users run transactions that require the system to connect to datacenters across the globe. Cloud Spanner eliminates this bottleneck by using Paxos, which relies on an algorithm to create consensus among globally distributed nodes to carry out the commit. Later in the chapter, you will delve deeper into the way this works while carrying out read/write operations in Spanner. Prior to that, it's important to understand the way operations happen.

Transactions

In Spanner, all operations—whether they are read, write, or read and write both—happen within a transaction. Spanner supports two modes of transaction: read/write and read only. These transaction modes differ in the way they handle access to the data. Let’s begin with the read/write transaction mode.

Read/Write

Read/write mode enables modifications (inserts, deletes, and updates) to the data. For this, these transactions acquire locks to ensure that the data remains consistent. They primarily contain write operations, which may or may not be accompanied by reads. Say you have a table that maintains savings account details. The table schema looks like Table 6-13.
Table 6-13. Savings Account Table Schema

Column Name        | Data Type
-------------------|----------
Account ID         | STRING
Amount             | INT64
Last Operation     | STRING
Last Modified Date | DATE

Figure 6-15 shows a read/write transaction and the operations that occur within it.
Figure 6-15. Read/write transaction

It first reads the amount from the table. Then it processes the data: say, for example, that you check the amount and, if it is greater than $100, you add $100 to it. Finally, you save the updated data. Figure 6-16 shows the locking mechanism; locks are acquired at each step.
Figure 6-16. Locks acquired

As you can see, the read acquires a shared lock. Within the database, multiple concurrent transactions can be running at any point in time, and this lock lets the other transactions continue reading; in effect, it does not block other reads. When commit is initiated, the transaction is ready with the data to be applied back to the database. At this point it tries to acquire an exclusive lock on the data item. This blocks all new read requests for the data and waits for the existing locks to be cleared. Once all the locks are cleared, it places the exclusive lock and applies all the writes. Once this is done, the lock is released and the Amount field shows the updated data.
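A hedged sketch of this flow with the Python client follows. The instance and database IDs are hypothetical, the table and column names compact the schema above into valid identifiers, and Spanner acquires the locks described here on your behalf inside run_in_transaction.

```python
import datetime

from google.cloud import spanner

spanner_client = spanner.Client()
database = spanner_client.instance("test-instance").database("testdb")  # hypothetical IDs

def credit_if_above_100(transaction):
    # Read: acquires a shared lock on the data it touches.
    rows = transaction.read(
        table="SavingsAccount",
        columns=("AccountId", "Amount"),
        keyset=spanner.KeySet(keys=[["1"]]),
    )
    for account_id, amount in rows:
        if amount > 100:
            # Buffered write: the exclusive lock is taken when the commit runs.
            transaction.update(
                table="SavingsAccount",
                columns=("AccountId", "Amount", "LastOperation", "LastModifiedDate"),
                values=[(account_id, amount + 100, "Credit", datetime.date.today())],
            )

# The commit happens when the callable returns; the client retries it on aborts.
database.run_in_transaction(credit_if_above_100)
```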

Note that updates in Cloud Spanner are not in-place updates; instead, Spanner uses MVCC (Multiversion Concurrency Control) to modify the data. This implies that writes do not overwrite the data. Instead, an immutable copy is created, timestamped with the write's transaction commit time. In effect, Cloud Spanner maintains multiple immutable versions of data. The next section looks at the read only transaction mode.

Read Only

This is the preferred transaction mode when the operations are just read operations and do not involve an update. This mode does not acquire locks and is also not blocked by any read/write transaction. Whenever it starts, it takes a snapshot of the data as valid at that point in time and all reads within the transaction will return the same value. In effect, all reads within the transaction will have a consistent view of the values. Let’s further assume that the Account table contains data, as shown in Table 6-14.
Table 6-14. Accounts Data

Account ID | Amount | Last Operation | Last Modified Date
-----------|--------|----------------|-------------------
1          | $1000  | Credit         | 23rd August 2019
2          | $90    | Debit          | 18th August 2019

Figure 6-17 shows a read only transaction called TxnR.
Figure 6-17. Read only transaction

As seen, the transaction contains only reads, with no updates. In general, you can use such reads to process and display output to end users. When you have multiple reads happening together, it's best to use the read only transaction mode. In TxnR, all reads return $1000, as the Begin transaction captured a snapshot of the data and all reads refer to that snapshot.

There might be cases where you need to do single read calls or read data in parallel. Cloud Spanner enables this with single read operations, which are reads outside the context of a transaction. Single reads are the simplest and fastest operations in Spanner.
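Here is a minimal sketch of both, assuming the same hypothetical database as before: a multi-use snapshot corresponds to a read only transaction such as TxnR, while a plain single-use snapshot corresponds to a single read.

```python
from google.cloud import spanner

spanner_client = spanner.Client()
database = spanner_client.instance("test-instance").database("testdb")  # hypothetical IDs

# Read only transaction: every read sees the same snapshot and no locks are taken.
with database.snapshot(multi_use=True) as snapshot:
    first = list(snapshot.execute_sql(
        "SELECT Amount FROM SavingsAccount WHERE AccountId = '1'"))
    second = list(snapshot.execute_sql(
        "SELECT Amount FROM SavingsAccount WHERE AccountId = '1'"))
    # Both reads return the same value, even if a write commits in between.

# Single read: a one-off read outside any transaction context.
with database.snapshot() as snapshot:
    row = list(snapshot.execute_sql(
        "SELECT Amount FROM SavingsAccount WHERE AccountId = '1'"))
```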

By default, the reads within a read only transaction and single reads pick the most recent timestamp; however, Cloud Spanner enables you to control the freshness of the data. At any point in time, Cloud Spanner can perform reads up to one hour in the past.

Note

As you saw earlier, write leads to a creation of an immutable copy. In effect, at any given time, you will have multiple copies maintained. Cloud Spanner continuously garbage collects the old copy in the background to reclaim the storage space. This process is known as “version GC”. By default, version GC reclaims the versions after they are one hour old. Because of this, Cloud Spanner cannot perform reads at read timestamps more than one hour in the past.

In effect, Cloud Spanner enables two types of reads: strong reads and stale reads. Strong reads are the default; they return the freshest data, so a read sees the effect of all transactions committed before the read started. Stale reads let you read past data and come in two flavors: exact staleness and bounded staleness. Exact staleness reads data at a user-specified timestamp, e.g., 2019-09-16T15:01:23. With bounded staleness, instead of specifying a timestamp, the user specifies a bound, such as 7s, and Spanner chooses the newest timestamp within that bound when the read runs. Bounded staleness is slower than exact staleness; however, it returns fresher data. The read type is controlled by specifying a timestamp bound on the reads.

The read types are useful when you use read replicas to scale out reads. Read replicas can be a bit outdated, but if the application is willing to tolerate staleness, it can get faster responses, as it can read from the nearest read replica and does not have to communicate with the leader replica, which may be geographically distant.
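A sketch of the two timestamp bounds with the Python client follows, using the same hypothetical database; the staleness values are arbitrary.

```python
import datetime

from google.cloud import spanner

spanner_client = spanner.Client()
database = spanner_client.instance("test-instance").database("testdb")  # hypothetical IDs

# Exact staleness: read as of a fixed duration (or timestamp) in the past.
with database.snapshot(exact_staleness=datetime.timedelta(seconds=15)) as snapshot:
    rows = list(snapshot.execute_sql("SELECT AccountId, Amount FROM SavingsAccount"))

# Bounded staleness: Spanner picks the newest timestamp within the given bound.
with database.snapshot(max_staleness=datetime.timedelta(seconds=7)) as snapshot:
    rows = list(snapshot.execute_sql("SELECT AccountId, Amount FROM SavingsAccount"))
```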

Now that you are familiar with the transaction modes available, let’s next look at the way Spanner handles concurrent transactions across its distributed environment.

Handling Multiple Transactions

Cloud Spanner provides the strictest concurrency control: external consistency. This implies that multiple transactions can run side-by-side on different servers (possibly spread across datacenters in different time zones) without causing any inconsistencies. Proper timestamping is mandatory for achieving external consistency. For this, Spanner uses TrueTime, a Google-developed technology: a highly available, distributed clock that enables applications running on Google servers to generate monotonically increasing timestamps without any need for global communication.

You can understand this further with an example of an employee payroll system. It contains three tables. The table schemas are shown in Tables 6-15 through 6-17.
Table 6-15. Salary Table Schema

Column Name  | Data Type
-------------|----------
Employee ID  | INT64
Month        | STRING
Salary       | INT64
Processed On | DATE

Table 6-16. Advance Taken Table Schema

Column Name   | Data Type
--------------|----------
Employee ID   | INT64
Month         | STRING
Advance Taken | INT64
Is Adjusted   | BOOL
Processed On  | DATE

Table 6-17. Reimbursement Table Schema

Column Name          | Data Type
---------------------|----------
Employee ID          | INT64
Month                | STRING
Reimbursement Amount | INT64
Is Processed         | BOOL
Processed On         | DATE

Read/Write Transactions

Let’s further assume you have two transactions running. The first transaction, called Tx1, is responsible for adjusting the advances taken and the second transaction, Tx2, adjusts the reimbursement amount. Figure 6-18 shows operations in transaction Tx1.
Figure 6-18. Transaction for adjusting advances

Figure 6-19 shows the operations in transaction Tx2.
Figure 6-19. Transaction for adjusting reimbursements

Sequential Occurrence
Let's first look at the two transactions occurring one after the other, as shown in Figure 6-20.
Figure 6-20. Execution of read/write transactions one after the other

Because TrueTime is used to ensure proper timestamping, and Tx2 starts to commit after Tx1 finishes, all readers of the database will observe that Tx1 occurred before Tx2. This guarantees that no update is ever lost.

Simultaneous Occurrence
Let’s next look at the transactions starting side-by-side, as shown in Figure 6-21.
Figure 6-21. Transactions occurring side-by-side

The reads acquire shared locks in both transactions. As the write of Tx1 starts to commit prior to Tx2, Tx1 tries to acquire an exclusive lock and must wait for the read lock acquired by Tx2 to be released. In this scenario, Spanner uses the wound-and-wait algorithm and aborts transaction Tx2 so that Tx1 can acquire the lock and continue with its update. Figure 6-22 shows the read locks acquired.
Figure 6-22. Read locks acquired

Figure 6-23 shows transaction Tx2 being aborted so that Tx1 can continue with its updates.
Figure 6-23. Transaction aborted (wound-and-wait)

In this process, the Tx2 commit fails and is retried later. This is done to avoid deadlock. It also guarantees that no updates are ever lost when multiple concurrent transactions start, as at any point in time only one is updating the data and the other retries after some time. Under the hood, the age of each transaction is used to resolve conflicting locks: a younger transaction waits, but an older transaction will wound (abort) a younger one. By giving priority to the older transaction, Spanner ensures that eventually every transaction gets a chance to update.

Data Invalidated
Let’s next look at the case shown in Figure 6-24, when Tx1 reads and changes Salary data and Tx2 reads the same data.
Figure 6-24. Transactions (data invalidated)

Since Tx2 started earlier than the Tx1 commit, it reads a Salary value that is invalidated by the changes Tx1 makes. In this case as well, Tx2 is terminated so that the update of Tx1 is not overwritten. In effect, prior to making its update, a transaction waits for the read locks to be released and also validates that all the read locks it acquired within the transaction are still valid, meaning no change was made to the data after it was read.

Different Fields Updates
Take a look at the transaction shown in Figure 6-25.
Figure 6-25. Transactions updating different fields simultaneously

Transaction Tx1 is working on the Salary field, whereas Tx2 works on the Processed On field. It is quite likely that both transactions will attempt to update the same row of data simultaneously. However, unlike the previous scenario, where transaction Tx2 was aborted to avoid the deadlock situation, here, both transactions will be executed successfully. This is due to the fact that Cloud Spanner locks at the cell level (a row and a column). The row is the same but the columns that both transactions are working on are different. So, there will be no conflict, as the cells that will be updated are different.

Blind Writes

Let’s now look at the scenario of concurrent read/write transactions, where you have only write operations that are not followed by any reads. Such operations are called blind writes. Inserting the current month’s salary details in the Salary table is an example. Write locks are usually exclusive, where only one writer updates the data. Although this prevents data corruption by preventing the race condition on the data, it does impact the throughput.

With blind writes in Spanner, the write locks are allowed to be shared. This is possible due to the use of TrueTime: TrueTime guarantees a unique timestamp to each transaction, so multiple blind writes can run in parallel, updating the same data without risking data corruption or loss. This feature not only avoids race conditions but also brings significant throughput gains for blind writes.
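As a sketch, inserting the current month's salary rows as a blind write could use the client's batch mutations, which buffer the writes and commit them in one request. The table and column names compact the schema in Table 6-15 into valid identifiers, and the values are made up.

```python
import datetime

from google.cloud import spanner

spanner_client = spanner.Client()
database = spanner_client.instance("test-instance").database("testdb")  # hypothetical IDs

# Blind write: inserts only, no reads. Several such writers can run in parallel.
with database.batch() as batch:
    batch.insert(
        table="Salary",
        columns=("EmployeeId", "Month", "Salary", "ProcessedOn"),
        values=[
            (1, "Sep 2019", 1000, datetime.date(2019, 9, 23)),
            (2, "Sep 2019", 1200, datetime.date(2019, 9, 23)),
        ],
    )
```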

Read/Write with Read Only

Now you have a scenario of read/write and read only transactions running side-by-side, as shown in Figure 6-26.
Figure 6-26. Multiple read/write and read only transactions occurring side-by-side

As you can see, the read only transactions Txr1 and Txr2 are interleaved in between the two read/write transactions, Tx1 and Tx2. You can further assume the Salary table data is as shown in Tables 6-18 through 6-20 at various timestamps.
Table 6-18. Salary Data at t0

Employee ID | Month    | Salary | Processed On
------------|----------|--------|--------------
1           | Aug 2019 | $1000  | 23rd Aug 2019

Table 6-19. Salary Data at t1

Employee ID | Month    | Salary | Processed On
------------|----------|--------|--------------
1           | Aug 2019 | $500   | 23rd Aug 2019

Table 6-20. Salary Data at t2

Employee ID | Month    | Salary | Processed On
------------|----------|--------|--------------
1           | Aug 2019 | $10000 | 23rd Aug 2019

Here are values read in each:
  • In Txr1, all reads show the value as of timestamp t0, which is $1000, because the transaction started prior to the Tx1 commit and its snapshot was taken at t0. All reads within the transaction return the value from that snapshot, so even though the second read happens after the Tx1 commit timestamp, the change is not reflected in this transaction's reads.

  • The change of the Tx1 commit will be visible in the Txr2 read, as the snapshot for this transaction is after the Tx1 commit. Every read in this transaction will show $500. As with Txr1, reads in Txr2 will not show the changes made in transaction Tx2.

This guarantees safety against non-repeatable reads and dirty reads. Non-repeatable reads are reads within a transaction that see different values due to interleaved read/write transactions. Dirty reads are reads that see a change that has not yet been committed.

Having looked at the way operations happen in Spanner, it’s time to go back and look at the way Spanner carries out the read/write operations across its distributed dataset.

Distributed Transactions Explained

To understand distributed transactions, you need to return to the hypothetical data distribution, as shown in Figure 6-27.
Figure 6-27. Hypothetical representation of the data distribution with leaders highlighted

The distributed data table schema and hypothetical split distribution are shown in Tables 6-21 and 6-22.
Table 6-21. TableA Schema

Column Name | Data Type |
------------|-----------|------------
ID          | INT64     | Primary Key
Value       | STRING    |

Table 6-22. TableA Split Details

Split | Key Range
------|------------------
0     | <100
1     | >=100 and <500
2     | >=500 and <1000
3     | >=1000 and <2000
4     | >=2000 and <2500
5     | >=2500 and <4000
6     | >=4000 and <7000
7     | >=7000

Let’s begin with writes.

Writes

With data being distributed across splits, writes can have two main use cases:
  • Involving a single split. An update statement is issued for row ID 7.

  • Involving multiple splits. An update is issued for row IDs 410, 500, 1300, and 2300.

Let’s begin with the first use case. The transaction issues the COMMIT statement to begin writing in the database. The following happens:
  • Step 1: The leader replica for the split is identified. In this example, the split is 0 and the leader replica resides on node 1.

  • Step 2: The leader replica tries to acquire an exclusive lock on its local copy of the data. In addition, it forwards requests for agreement on committing to the voting replicas.

  • Step 3: If the lock is acquired and a majority of votes are received, the leader assigns a timestamp for the commit and waits for its certainty. In parallel, it sends the writes to the other replicas hosting copies of the data. The replicas store the writes to stable storage and respond to the leader. Once the majority of replicas have responded and the commit wait is over, the leader knows the commit can proceed.

  • Step 4: The leader communicates the commit timestamp back to the client. The other replicas start applying the changes.

  • Step 5: After completion, the locks are released.

Since this write involves only a single split, it is the cheapest and fastest write. Let’s now look at the second use case, which involves multiple splits. This kind of update uses an extra layer of coordination via the standard two-phase commit algorithm. Like in a single node operation, the transaction issues the COMMIT statement to begin writing in the database. The following happens:
  • Step 1: Identify the leader replica for the splits. The splits are 1, 2, 3, and 4. The leader replicas reside across nodes 1, 2, and 3.

  • Step 2: Since multiple splits are involved, one split leader picks up the role of the coordinator and the remaining leaders become the participants. For this example, assume that split 1 becomes the coordinator. As coordinator, its job is to ensure the atomicity of the transaction—the changes are applied on all the splits or no splits at all.

  • Step 3: The first phase of the two phase commit.

    As with the single-split write, all the respective split leaders (irrespective of their role) try to acquire an exclusive lock on their local data copy. In this example, the splits try to acquire write locks on the following row IDs: Split 1 => row ID 410, Split 2 => row ID 500, Split 3 => row ID 1300, and Split 4 => row ID 2300.

    Each split leader records its set of locks by replicating them to at least a majority of its split replicas. Once they are able to hold the locks, they notify the coordinator.

    If all the split leaders are able to successfully acquire the lock, the coordinator moves to the next step.

    As in the single-split write, the coordinator then assigns a timestamp to the commit and waits for its certainty. In parallel, it sends the writes to all the replicas. The replicas store the writes to stable storage and respond to the leader. Once the majority has responded and the commit wait is over, the second phase of the commit begins.

  • Step 4: The second phase of the two phase commit.

    The coordinator communicates the outcome to all the participant leaders, and the participant leaders communicate it to all their replicas.

    The coordinator and all its participants apply the changes to their respective data. The coordinator leader communicates to the client that the transaction has been committed.

  • Step 5: After completion, the locks are released.

Irrespective of whether the write involves a single split or multiple splits, a write is marked as committed only when a write quorum is achieved; in other words, the majority of voting replicas (replicas that are eligible to vote for a write commit) must agree to commit the write. Writes always happen on the leader replica and the non-witness replicas of the replica set. The next section covers read operations.

Reads

As you know by now, reads can be part of read/write transactions as well as read only transactions. Reads that are part of read/write transactions are always served by the leader replica, as they require locking. Reads that are part of read only transactions (or single reads) can be served from any replica. Whether the replica needs to communicate with the leader depends on the read type: strong or stale.

Strong reads can go to any replica, but stale reads always go to the closest available replica. Here’s a strong read operation that reads rows with IDs >=0 and < 700.
  1. Splits are identified. In this case, the splits are 0, 1, and 2. Being a strong read, the current TrueTime timestamp is picked and the request is sent to any replica of splits 0, 1, and 2, along with the read timestamp.

  2. This step depends on the replica type. In this case, there are only two possibilities: the replica is the leader or it is not.

    a) If the replica is the leader, reads can proceed directly, as data is always up to date on the leader.

    b) If it's not the leader replica, there are two further possibilities: the data is up to date or it is not.

      i. If the data is up to date, reads are served as is.

      ii. If it's not up to date, the replica interacts with the leader to get the latest updates. Once the updates are applied, the data is current and the reads are served.

  3. Results received from the replicas are combined and returned to the client.

If you do stale reads, then you can proceed without Step 2.b.(ii).

Summary

This chapter covered the way data is stored and looked at important modeling concepts, including a brief look at the way transactions are handled in Spanner. The next chapter moves on to developing with Spanner.
