© Navin Sabharwal, Shakuntala Gupta Edward 2020
N. Sabharwal, S. G. Edward, Hands On Google Cloud SQL and Cloud Spanner, https://doi.org/10.1007/978-1-4842-5537-7_8

8. Best Practices

Navin Sabharwal1  and Shakuntala Gupta Edward2
(1)
New Delhi, India
(2)
Ghaziabad, India
 

Now that you are familiar with the relational database offerings of GCP—Cloud SQL and Cloud Spanner—it’s time to look at some of the best practices that will enable you to plan the usage of these cloud databases efficiently. Let’s begin with Cloud SQL.

Cloud SQL Best Practices

This section covers the best practices for efficient usage of the Cloud SQL instance.

Plan Resources Efficiently

Ensure sufficient resources are available as per the workload.

Storage

If an instance runs out of storage, there is a possibility that it might stop accepting requests or go offline. You should either ensure that you have selected Enable Automatic Storage Increase for the instance or manually monitor the usage and take necessary actions.

When automatic storage increase is enabled, available storage is checked every 30 seconds, and if it falls below the threshold size, additional storage is automatically added to the instance. However, keep in mind that storage can only be increased; once increased, it cannot be decreased. You should also enforce a limit, because a sudden spike due to a temporary increase in traffic can lead to a permanent increase in the instance's storage cost.

When monitoring storage manually, keep track of usage and employ techniques (such as purging activities) that routinely delete unused data and drop tables that are no longer required. If storage cannot be freed up, you can check against a threshold and run scripts to increase the storage incrementally. The storage usage status can be viewed in the Cloud Console.
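The manual monitoring approach can be sketched as a small policy function. This is an illustrative sketch, not a GCP API; the threshold, step, and cap values are assumptions you would tune for your own instance:

```python
# Hypothetical sketch: decide when and by how much to grow storage manually.
# Threshold, step, and cap are illustrative assumptions, not GCP defaults.
def storage_action(used_gb, provisioned_gb, threshold=0.9, step_gb=10, cap_gb=500):
    """Return the new size to provision, or None if no action is needed.

    Grows in small increments and never exceeds cap_gb, mirroring the advice
    to enforce a limit so that a temporary traffic spike cannot inflate
    storage costs permanently (storage can be increased but never decreased).
    """
    if used_gb < provisioned_gb * threshold:
        return None  # still under the threshold, nothing to do
    new_size = min(provisioned_gb + step_gb, cap_gb)
    return None if new_size == provisioned_gb else new_size
```

A monitoring script would call this periodically and, when it returns a size, issue the actual resize (for example via the Cloud SQL Admin API or `gcloud`).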

Navigate to the Instance Details page. In the Overview tab, you can select the parameter to chart; select Storage Usage. By default, it shows the current hour's usage; however, you can extend the range up to the last 30 days and analyze the behavior. Figure 8-1 shows the Storage Usage option.
../images/489070_1_En_8_Chapter/489070_1_En_8_Fig1_HTML.jpg
Figure 8-1

Storage Usage option in the Overview tab of the Instance Detail page

CPU

CPUs should be planned and assigned based on the workload. If the workload consists of CPU-intensive queries, such as sorting, regexes, and other complex functions, the instance might be throttled. As per the recommendation, CPU utilization should not stay above 98% for six hours; if it does, the instance is improperly sized for the workload it's handling. Like storage usage, CPU utilization can be analyzed in the Overview tab of the Instance Details page. Figure 8-2 shows the CPU utilization. If you find that CPU is becoming a bottleneck, move to a higher tier machine or a machine with more CPU capacity.
../images/489070_1_En_8_Chapter/489070_1_En_8_Fig2_HTML.jpg
Figure 8-2

CPU Utilization option in the Overview tab of Instance Detail page

RAM

Caching is extremely important for read performance. Compare the size of your dataset to the RAM of your instance. Ideally, the entire dataset should fit in 70% of the instance's RAM; that way, queries will not be constrained by I/O performance. If this is not the case, consider moving to a larger instance tier. RAM usage can also be analyzed using the first card in the Overview tab of the Instance Details page. Figure 8-3 shows the Memory Usage option.
../images/489070_1_En_8_Chapter/489070_1_En_8_Fig3_HTML.jpg
Figure 8-3

Memory Usage option in the Overview tab of Instance Detail page

Note

Stackdriver monitoring can be used to monitor usage and raise an alert if a specified threshold is breached. Accordingly, actions can be taken to either kill the processes consuming the resources or increase the resources themselves.

Maintenance Activities

Maintenance activities comprise disruptive updates that can lead to system restarts. If a maintenance window is not specified, disruptive updates can occur at any time of the day. Specify a window in which these updates have the least (or no) impact on your application, business, and end users.

Use Read Replicas

If you have read-heavy workloads, create read replicas so as to offload the read traffic from the primary instance.

Instance Location

Ensure that the database is located near the writer/reader. Otherwise, sending data across datacenters introduces latency and leads to slower performance.

Keep Tables to a Reasonable Number

While designing your schema, keep a check on the number of tables being created across all databases in an instance. The count should be less than 10,000.

Avoid using too many tables, as more tables can lead to the instance being unresponsive and can have a negative performance impact. Keep tabs on the tables getting created. Use the following queries to determine the table count on your instance:
SELECT COUNT(*)
FROM information_schema.tables;
Run the following command to check database-wise tables:
SELECT TABLE_SCHEMA, COUNT(*)
FROM information_schema.tables
GROUP BY TABLE_SCHEMA;

You should try to reduce the number of tables, but if your data architecture genuinely requires many tables, consider splitting them across multiple instances such that the split data is independent and does not require inter-instance querying.

Sharding at Design Time

Instead of one large instance, consider using multiple small Cloud SQL instances where possible, i.e., shard across instances. Otherwise, everything happens on a single server, and you will hit performance bottlenecks once the instance reaches its maximum limit (since there is no support for horizontal scaling of writes).

Connection Management

There’s a limit to the number of concurrent connections that can be made to the instances. So, you need to keep in mind the following while managing them:
  • Use connection management practices (such as connection pooling and exponential backoff) to help improve the application’s use of resources, thus helping to stay within the connection limit.

  • Avoid connection leakages. Ensure applications are closing the connections properly and not leading to leakages.

  • Test application behavior during outages, failovers, and maintenance updates. During these events, there's a high possibility of the primary instance being unavailable. The application should be coded to handle such outages by employing connection management techniques such as exponential backoff.

  • Keep transactions small and short.
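The exponential backoff mentioned above can be sketched as a small retry wrapper. This is a generic illustration (the function name and defaults are assumptions), not a Cloud SQL library API:

```python
import random
import time

def with_backoff(operation, max_attempts=5, base_delay=0.5, max_delay=30.0):
    """Retry `operation` with exponential backoff and jitter.

    `operation` is any callable that raises on failure (e.g. opening a
    database connection during a failover window). Delays grow as
    base_delay * 2**attempt, capped at max_delay, with random jitter
    so that many clients do not all retry at the same instant.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the failure
            delay = min(base_delay * (2 ** attempt), max_delay)
            time.sleep(delay * random.uniform(0.5, 1.0))
```

Wrapping connection acquisition in such a helper (ideally combined with a connection pool) keeps the application within the connection limit and resilient during failovers.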

Certificates

All certificates in use should be kept up-to-date. For example, if you are using the Cloud SQL Proxy, you should keep the proxy binary up-to-date.

Long-Running Operations

Long-running operations cannot be cancelled or stopped until they complete. In addition, only one operation can run against an instance at a time, so long-running operations should be planned so that they don't overlap with other operations on the instance.

Use Unique Instance Names

Instance names cannot be reused immediately after deletion, because Cloud SQL reserves the names for a few days. If you have to frequently create and delete instances, you should consider using a timestamp as part of the name to avoid conflicts.
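A timestamp suffix like the one suggested above can be generated as follows (the prefix and format are illustrative assumptions; Cloud SQL instance names use lowercase letters, digits, and hyphens):

```python
from datetime import datetime, timezone

def unique_instance_name(prefix):
    """Build an instance name with a UTC timestamp suffix so that a freshly
    created instance never collides with a recently deleted name (Cloud SQL
    reserves deleted instance names for a few days)."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")
    return f"{prefix}-{stamp}"
```

For example, `unique_instance_name("billing-db")` yields something like `billing-db-20240101-120000`.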

MySQL-Instance Specific Pointers

This section includes a few pointers specific to the MySQL database engine.

Limit the Replication Lag (<1200 Seconds)

A high replication lag can lead to downtime during failover. Failover is initiated when the primary instance is unresponsive, and a high replication lag makes the failover replica unusable until it catches up, extending the downtime. In effect, ensure the lag stays under the limit, which is capped at 1200 seconds. Keep monitoring the lag using the Replication Delay metric in the Overview tab of the failover replica's Instance Details page.

If it seems to be on the high side, take corrective action (find ways to throttle the incoming load on the master or design a data architecture by sharding the data at the design level itself).

Database Flags

As you know by now, database flags are used to configure and tune the instance by adjusting parameters and options. Various flags are available; however, you need to keep the following in mind when adjusting the values of the flags listed in Table 8-1.
Table 8-1

Database Flags

Flag

Description

Recommended Setting

Impact if Set Otherwise

general_log

Enables the MySQL general log

If On then set the log_output flag to FILE

Slow restarts

slow_query_log

Enables logging to identify the slow performing queries

If On then set the log_output flag to FILE

Slow restarts

max_heap_table_size

Determines size of the memory table

Retain the default

Instance outage due to out of memory error

tmp_table_size

Determines size of the temp table

Retain the default or carefully plan the workload so as not to exceed the instance capacity

Instance outage due to out of memory error

query_cache_size and query_cache_type

Together they determine the size of the query cache

Retain the default or carefully plan the workload so as not to exceed the instance capacity

Instance outage due to out of memory error

Having covered the Cloud SQL pointers, the next section looks at some Cloud Spanner pointers.

Cloud Spanner Pointers

This section covers pointers that you need to keep in mind to efficiently design for a Cloud Spanner instance.

Slower Transactions with Scale Out

Spanner provides cross-node transactions with ACID guarantees and can scale out write transactions linearly. However, this scale is achieved by accepting trade-offs: each additional node participating in a transaction adds network round trips to collect commit votes, increasing write latency. This means transactions become slower as you scale out further.

Migrating Workloads from Existing SQL Setups Is a Cumbersome Task

If you are migrating workloads from an existing SQL setup to Spanner, you need to note that Spanner is neither MySQL nor ANSI SQL compliant, so migrating the workload from MySQL or any ANSI SQL compliant RDBMS requires some level of re-architecture or re-platforming.

Specify Referential Integrity at Design Time

As you know, when designing a data model, Spanner enables you to define interleaved parent/child relationships at the table level, co-locating data that will be queried together for efficient retrieval. While defining the model, you should also check whether referential integrity applies; that is, whether a deletion in the parent table should cascade to the child table. It's important to define the tables accordingly at design time. If you fail to do so, referential integrity won't be defined and won't be enforced automatically, offloading this usual RDBMS-native responsibility to the application programmers, which is not an ideal scenario. In addition, if you later need to change a regular table into a child table, you will have to recreate the table, leading to downtime.

Co-Located Joins Offer Performance Benefits

While designing, you need to ensure that tables that will be referred together are co-located using the interleaved options. Co-located joins offer performance benefits over joins that require indexes or back joins. Note that JOINs across non-interleaved tables suffer performance penalties.

Be Careful When Choosing Your Primary Key

The choice of columns for the primary key is very important, as changing the primary key (adding/removing columns) at a later stage is not supported in Spanner. You will end up dropping and recreating the table, which means downtime.

In addition, when choosing a primary key, it is recommended not to choose a column with values that are monotonically increasing or decreasing (such as auto-increment IDs, timestamp columns, and so on), as this can accidentally lead to hotspots. If you are forced to use such IDs because of applications or other constraints, use hashing techniques to avoid the pitfalls of hotspots.
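The hashing technique mentioned above can be sketched as follows. This is a common pattern rather than an official Spanner API; the shard count and function name are assumptions:

```python
import hashlib

def spread_key(ticket_id, shards=16):
    """Prefix a monotonically increasing ID with a deterministic hash bucket.

    Writes for sequential IDs then land on `shards` different key ranges
    instead of all hitting the tail split of the table, avoiding a hotspot.
    The returned tuple would serve as a composite primary key
    (ShardId, TicketId) in the Spanner schema.
    """
    digest = hashlib.sha256(str(ticket_id).encode()).hexdigest()
    return (int(digest, 16) % shards, ticket_id)
```

Because the bucket is derived deterministically from the ID, the application can always recompute the full key for point lookups.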

Because of its primary-key-oriented design, Cloud Spanner is very fast when accessing data by primary key. However, you need to keep the following issues in mind when working with it:
  • The primary key value cannot be updated easily. If you must update it, you must first delete the original primary key and then insert the updated value.

  • Any UPDATE and DELETE operation must specify a primary key in the where clause. There can’t be a DELETE ALL or UPDATE ALL statement.

Specify the FORCE_INDEX Directive

Cloud Spanner chooses a secondary index automatically only in rare circumstances; in particular, if a query requests a column that is not stored in the index, Spanner will not pick that secondary index on its own. In effect, you should use the FORCE_INDEX directive in your queries wherever applicable to force use of the secondary index.
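The directive is written as a table hint in the query text. Here is a sketch that builds such a query string; the table, index, and parameter names are hypothetical, while the `@{FORCE_INDEX=...}` hint itself is Cloud Spanner SQL syntax:

```python
# Hypothetical table/index names; @{FORCE_INDEX=...} is Spanner's table hint
# for forcing a secondary index.
def query_with_index(table, index, columns, predicate):
    cols = ", ".join(columns)
    return (f"SELECT {cols} FROM {table}@{{FORCE_INDEX={index}}} "
            f"WHERE {predicate}")

sql = query_with_index("Tickets", "TicketsByCustomer",
                       ["TicketId", "Subject"], "CustomerId = @cust")
```

The resulting string could then be passed to the client library's `execute_sql` with `@cust` bound as a query parameter.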

Multitenancy Support Availability

For the multitenancy requirement, you should include a customer ID column in the table and make it the first key column, so that each customer has good locality.

Cloud Spanner will automatically split the data across the nodes based on the size and load patterns. For example, say you have a Tickets table that stores user tickets that are being created. Ticket ID is the primary key for that table. If you have to introduce multitenancy support, introduce a Customer ID column in the table. The primary key becomes a composite key of Customer ID as the first column followed by the Ticket ID. This way, you can have a single table for all your customers while the data is still uniquely identifiable for each customer.
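The composite-key layout described above can be sketched in Spanner DDL, held here in a Python list such as could be passed to the client library's `update_ddl`. The Tickets table and key ordering follow the chapter's example; the Customers parent table, column names, and sizes are illustrative assumptions:

```python
# Sketch of the multitenant schema: Customer ID first in the composite key,
# with Tickets interleaved in a (hypothetical) Customers parent table so each
# customer's rows are co-located and parent deletes cascade.
ddl = [
    """CREATE TABLE Customers (
         CustomerId INT64 NOT NULL,
         Name       STRING(100)
       ) PRIMARY KEY (CustomerId)""",
    """CREATE TABLE Tickets (
         CustomerId INT64 NOT NULL,
         TicketId   INT64 NOT NULL,
         Subject    STRING(MAX)
       ) PRIMARY KEY (CustomerId, TicketId),
         INTERLEAVE IN PARENT Customers ON DELETE CASCADE""",
]
```

Leading with CustomerId gives each tenant good key locality while keeping every row uniquely identifiable per customer.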

Why not opt for a separate database per customer? If you do, you will be restricted to 100 customers per instance due to the Spanner limit of 100 databases per instance. However, if your application requires storing each customer's data in a separate database, you should opt for a database per customer.

Avoid Existing Schema Changes

Although schema updates are allowed in Cloud Spanner, note that the time it takes to update the schema depends on whether the update requires validation of existing data or backfill of any data.

For example, if you want to change an existing column that allows NULL values to NOT NULL, Cloud Spanner validates all the existing data in the column before marking the change as successful. The time this takes depends on the amount of data in the table, the number of nodes in the instance, and the load on them. Similarly, if you add a new index to an existing table, the index must be backfilled, which is a time-consuming activity.

Some schema updates can also change the behavior of requests to the database before the update completes. Continuing the previous example, where you add NOT NULL to a column, Cloud Spanner almost immediately begins rejecting writes that use NULL for that column. If the schema update then ultimately fails data validation, there will have been a period of time when writes were blocked, even though the old schema would have accepted them.

If schema changes are unavoidable, the best practice is:
  • Ensure that existing data meets the constraints imposed by the schema update. For example, if you’re adding a NOT NULL annotation to an existing column, check that the column does not contain any existing NULL values.

  • If you’re writing to a column, table, or index that is undergoing a schema update, ensure that the values being written meet the new constraints.

  • If you’re dropping a column, table, or index, ensure that no write or read transactions are in process.
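The first check in the list above can be sketched as a precheck step: build the validation query, run it, and only issue the ALTER when it returns zero. The function names are hypothetical helpers, not a Spanner API:

```python
# Hypothetical precheck before adding NOT NULL to an existing column:
# compose the validation query the best practice recommends running first.
def not_null_precheck(table, column):
    return f"SELECT COUNT(*) AS nulls FROM {table} WHERE {column} IS NULL"

def safe_to_add_not_null(null_count):
    # The ALTER should only be issued when no existing row would violate it,
    # avoiding a window in which writes are rejected by a doomed update.
    return null_count == 0
```

For example, `not_null_precheck("Students", "Name")` yields the query to run against the instance before altering the Name column.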

Note

Any schema change that does not require validation or backfill of existing data (such as adding a new nullable column) completes quickly.

Random Batch Ordering

When you’re issuing multiple batches for the schema change, the order in which the changes are applied is random. Thus, you have to ensure idempotent changes so that changes applied in any order lead to the same state.

Say, for example, you issue two batches where one batch contains ALTER TABLE Students ALTER COLUMN Name STRING (100) and the other batch contains the statement ALTER TABLE Students ALTER COLUMN Name STRING (50). Cloud Spanner will leave the column in one of these states, but which one is not specified.

Data Loading

The best practices for bulk loading data into Spanner are as follows:
  • Sort the data by primary key

  • Divide it into 10 * number of nodes separate sections

  • Create sets of worker tasks to upload the data in parallel
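The first two steps above can be sketched as a partitioning helper that prepares one section of sorted rows per parallel worker task. This is an illustrative sketch; the function name and row representation are assumptions:

```python
def plan_bulk_load(rows, nodes, key=lambda r: r[0]):
    """Sort rows by primary key and split them into 10 * nodes contiguous
    sections, one per parallel worker task, as recommended for bulk loads."""
    ordered = sorted(rows, key=key)
    sections = 10 * nodes
    size, extra = divmod(len(ordered), sections)
    out, start = [], 0
    for i in range(sections):
        # Early sections absorb the remainder so sizes differ by at most one.
        end = start + size + (1 if i < extra else 0)
        if start < end:
            out.append(ordered[start:end])
        start = end
    return out
```

Each returned section would then be handed to a separate worker, keeping every worker writing a contiguous, sorted key range.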

Since scaling the number of Cloud Spanner nodes up and down just requires a click, for quickly loading the data, you can consider boosting the instance to the maximum. Once the data is in the database, you can scale it back down to the number of nodes suitable for the usual load.

Limited Access Control

Google’s IAM (Identity & Access Management) is used by Cloud Spanner for access control. It enables you to set permissions at a very high level,1 which doesn’t fit many of the production use cases, such as limiting user access to a specific table or to a subset of table columns, and so on. This limitation forces you to handle security measures through coding or configuration.

In summary, Google Cloud Spanner offers scale-out right out of the box, which specifically means that Spanner is capable of the following:
  • Scales out reads and writes without any change in the application

  • Maintains transactional ACID guarantees across all nodes in the cluster

Cloud Spanner is best suited when you know your queries in advance. However, it is not suitable for workloads or business rules that require the flexibility of ongoing schema changes.

Summary

This chapter covered some of the best practices you should be aware of for better, more efficient use of Google Cloud SQL and Cloud Spanner. With this, we conclude our coverage of GCP's relational database services: Cloud SQL and Cloud Spanner.
