Authorizing access to database using a secure connection
Enabling backups for data recovery
Running maintenance activities to ensure that the instance is up to date with version upgrades and security patching
Managing instance resources
Tuning the database parameters for performance optimization
Ensuring that the instance remains highly available
Analyzing logs for diagnosing issues with the instance
These administrative activities are configurable, and Google enables us to manage the configurations using the Advanced Configuration options, either while provisioning the instance or later from the Instance Detail page.
Many of these come pre-configured with default options. Let’s run through each activity and look at its configuration.
Note
All the activities discussed below are common to both CloudSQL database engines – MySQL and PostgreSQL. Where there are differences, we will highlight them as we proceed.
Instance access management
High Availability and Scalability configuration options
Backup and restores
Maintenance and resource management
Logs for diagnosing issues
Let’s first start with securing access to the instance.
Authorization and Security
Securing access to the database instance is of utmost importance. In development or testing environments, we can temporarily relax security and allow open access to the instance; however, once we move to a live (production) environment and start dealing with actual data, an insecure configuration becomes risky. Consider, for example, an application handling credit card details: leaving the database open to all can end in disaster by compromising the database.
Let’s look at the ways of securing access to our CloudSQL instance. The appropriate method depends on how the instance is accessed.
Authorized Network
We add networks using CIDR notation. Let’s look at a few examples. If we need to allow access from a single instance, e.g. 199.27.25.21, we specify the CIDR notation 199.27.25.21/32. The /32 suffix implies that the entry matches that exact IP.
If we have a range of instances following a pattern, e.g. instances with IP addresses from 199.27.25.1 to 199.27.25.24, where only the last number varies, we can add the network using the CIDR notation 199.27.25.0/24. The /24 suffix implies that the first three numbers are fixed and only the last number varies.
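The two notations can be checked with Python's standard ipaddress module:

```python
import ipaddress

# /32 authorizes exactly one address.
single = ipaddress.ip_network("199.27.25.21/32")
print(single.num_addresses)  # 1

# /24 fixes the first three numbers; the last may vary (256 addresses).
subnet = ipaddress.ip_network("199.27.25.0/24")
print(ipaddress.ip_address("199.27.25.24") in subnet)  # True
print(ipaddress.ip_address("199.27.26.1") in subnet)   # False
```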
Once we are done with the configuration, click Save. This updates the list of authorized networks. The configuration limits access at the network level, making the instance unreachable from all other networks.
Though the above configuration restricts access to our CloudSQL instance, communication from the authorized networks remains unencrypted (this is the default behavior of CloudSQL). This default setting can lead to data being intercepted in transit, resulting in a data breach.
When working with sensitive data, e.g. credit card details, bank account details, or personally identifiable information, securing data in transit is extremely important. Let’s next look at the CloudSQL configuration for securing data in transit.
Data in Transit
SSL (Secure Sockets Layer) is recommended to be used together with authorized networks to secure data in transit.
For those new to SSL: it is a standard way of safely sending sensitive data from point A (the web server) to point B (the CloudSQL instance) over an unsecured public network, where someone could otherwise intercept and capture the information, or act as a man in the middle and manipulate the data while in transit.
Server CA’s certificate
Client certificate and
Client Private Key
Click the Download button. This downloads the certificate as the file server-ca.pem.
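As a sketch of how these three files are typically used on the client side, a Python TLS context can be configured along the following lines. The file names match the defaults CloudSQL offers for download; the load_* calls are commented out because the paths are illustrative:

```python
import ssl

# Build a TLS client context that requires a verified server certificate.
ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
ctx.verify_mode = ssl.CERT_REQUIRED
ctx.check_hostname = False  # trust is established via the server CA, not the hostname

# Wire in the three downloaded files (illustrative paths):
# ctx.load_verify_locations("server-ca.pem")                # server CA's certificate
# ctx.load_cert_chain("client-cert.pem", "client-key.pem")  # client cert + private key
```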
Though authorized networks with SSL encryption provide a secure method of establishing connections, this model does not fit well in a dynamic environment.
Such environments have no static IPs; they are dynamic in nature, clients may live only for a short period, and the environment may rapidly scale to large numbers of instances.
To address this, CloudSQL provides the Cloud SQL Proxy, which provides secure access without having to authorize (whitelist) individual IP addresses. It doesn’t even require SSL configuration.
Cloud Proxy
Secure connections: The proxy shields the real IP addresses of your instances and automatically encrypts traffic to and from the database.
Easier connection management: The proxy handles authentication with Google Cloud SQL, removing the need to provide static IP addresses for authorizing the network.
The Cloud SQL Proxy works by having a proxy client running on the client machine in the local environment. The client application communicates with the proxy client using a standard database protocol, e.g. JDBC/ODBC. The client never talks to CloudSQL directly, which adds a great deal of security. The proxy client in turn communicates with CloudSQL over a secure TCP tunnel.
While setting up the proxy on the client, we need to provide the details of the instance to connect to and specify where to find the credentials that will be used to authenticate the client applications. We covered all the steps required for connecting using the Cloud SQL Proxy in the previous chapter.
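Conceptually, the application then connects to the proxy's local endpoint rather than to the instance's public IP. A hypothetical connection-string builder makes the idea concrete (the names, host, and port here are assumptions for the sketch):

```python
def proxy_dsn(user: str, db: str, host: str = "127.0.0.1", port: int = 3306) -> str:
    """Build a MySQL-style connection string pointing at the local proxy endpoint."""
    return f"mysql://{user}@{host}:{port}/{db}"

# The application sees only the local proxy, never the instance's real address.
print(proxy_dsn("appuser", "orders"))  # mysql://appuser@127.0.0.1:3306/orders
```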
In the above examples we used public IP addresses and the Cloud SQL Proxy to connect to our CloudSQL instance. In addition to these two options, Google also offers a private networking connectivity option. A private network, also termed a Virtual Private Cloud (VPC), enables an easier, faster, and more secure connection to CloudSQL.
Private IP Address
Note
Creation of the network is not in scope of this book.
We leave the default selection and click Allocate and Connect. Within a few seconds, a private IP for the selected network is allocated to the instance. Click Save for the details to be updated. To validate the allocated private IP, navigate to the Overview tab and look at the Connect to this instance card. The allocated private IP will be visible, as shown in Figure 4-17.
Note
This option is only visible if Private IP is selected for the instance.
Enhanced network security, as the instance is no longer exposed to the outside world
Reduced network latency
Authorization
Instance management, e.g. start/stop the instance, create databases, etc., and
Data management, e.g. the ability to insert, delete, and select data
The first is controlled via GCP IAM access control, and the second via database engine user creation. Let’s first look at IAM access control.
IAM is the identity and access management service of GCP which enables us to control actions on our resources e.g. our CloudSQL instance.
In the Name field, enter the name of the member and select the role, as we did while editing. Click Save to add the user with the selected permissions.
Enter the user name and password. Users created this way have the same privileges as the default user. To manage access at a finer grain, we have to use GRANT/REVOKE in the case of MySQL2 and ALTER ROLE in the case of PostgreSQL.3 As admins, it’s important for us to provide granular control over the actions a user can perform on our instance.
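As an illustration of that granular control, here is a small hypothetical helper (not part of any library) that renders such MySQL statements; the database and user names are made up:

```python
def grant_sql(privileges, db, user, host="%"):
    """Render a MySQL GRANT limiting a user to the given privileges on one database."""
    return f"GRANT {', '.join(privileges)} ON {db}.* TO '{user}'@'{host}';"

def revoke_sql(privileges, db, user, host="%"):
    """Render the matching REVOKE statement."""
    return f"REVOKE {', '.join(privileges)} ON {db}.* FROM '{user}'@'{host}';"

# A reporting user that may read and insert, but not update or delete:
print(grant_sql(["SELECT", "INSERT"], "orders", "report_user"))
# GRANT SELECT, INSERT ON orders.* TO 'report_user'@'%';
```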
With the access and authorization part handled, the next important activity is to ensure that if at any point our database becomes inconsistent because of an accidental update or delete, we are able to get it back to a stable state.
Backups and Restores
In the database world, the backup-and-restore mechanism enables us to get our instance back to a stable state at a point in time. The trick is not in taking backups but in scheduling them in such a way that we can restore to the latest stable data without much data loss.
In CloudSQL, automated daily backups are selected by default while provisioning the instance, and they are scheduled to start within a defined window. We should change the window to the period that best suits our environment’s workload.
Note
Automated backups can be disabled as well. However, this is appropriate only when dealing with a development environment and is not recommended in a live production environment.
Though the backups don’t have any impact on write operations, it’s still best to choose off-peak hours so that the backup doesn’t overlap with users modifying data in the system.
CloudSQL retains the seven most recent automated backups per instance. These backups are incremental: the oldest backup is a full backup, and each subsequent backup captures only the data changed since the previous backup. This helps save the storage required for backups.
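The incremental scheme can be sketched in a few lines: restoring means applying the oldest full backup and then layering each incremental (changed data only) on top. The data model here is a toy assumption, not how CloudSQL stores backups internally:

```python
def restore_chain(backups):
    """backups: list ordered oldest-first; the first is full, the rest incremental."""
    state = {}
    for b in backups:
        state.update(b["changes"])  # each incremental holds only the changed data
    return state

chain = [
    {"kind": "full",        "changes": {"a": 1, "b": 2}},
    {"kind": "incremental", "changes": {"b": 3}},
    {"kind": "incremental", "changes": {"c": 4}},
]
print(restore_chain(chain))  # {'a': 1, 'b': 3, 'c': 4}
```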
On-demand backups come in handy in the following situations:
Our application requires backups to be retained for more than seven days
We are about to take actions that can impact data consistency and stability, and we cannot wait for the automated backup
We want the flexibility to take backups more than once per day
All the backups we create are deleted when we delete the instance. CloudSQL stores backups in two regions for redundancy, ensuring safety in disaster situations. However, if the region where the backup was originally taken is unavailable, we will not be able to restore from it. This limitation should be kept in mind while devising the backup and high availability strategy.
Note that while a restore is in progress, the target instance remains unavailable. Within a few minutes the data is restored and the instance becomes available again.
CloudSQL MySQL - Binary Logging
Binary logging is an option that is enabled by default, along with automated backups, while provisioning a CloudSQL MySQL instance. Binary logging records every data modification operation in an independent log after the last stable backup.
This is helpful in situations where we accidentally end up modifying some critical information: we can use the last stable backup, plus the log-file coordinates up to that operation, to restore the data to its state prior to the operation.
It is recommended to keep it on if the application performs lots of data operations, as it is extremely helpful for restoring state to a specific point in time. However, note that since binary logging records all operations, it impacts write performance.
This change restarts the instance, so a few minutes’ downtime should be expected when changing this configuration.
Note
Automated backups must be enabled in order to enable binary logging.
Database restores using binary logging differ from restores from a backup. A point-in-time recovery creates a new instance with the recovered data. While restoring, we choose the stable backup and specify the binary log file name and the position of the event up to which we want to recover. If we don’t specify the coordinates, the instance is recovered up to the latest binary log coordinates.
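The coordinate-based recovery can be sketched as replaying logged events up to, but not including, the bad operation. The positions and statements here are made up for illustration:

```python
def replay_until(events, stop_position=None):
    """Replay binlog-style (position, statement) events up to stop_position.

    stop_position=None means replay everything, mirroring recovery up to the
    latest coordinates when none are specified.
    """
    applied = []
    for pos, stmt in events:
        if stop_position is not None and pos >= stop_position:
            break  # stop before the event at the given coordinates
        applied.append(stmt)
    return applied

events = [(120, "INSERT ok"), (240, "UPDATE ok"), (360, "DELETE by mistake")]
print(replay_until(events, stop_position=360))  # stops before the bad DELETE
```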
With backups enabled and scheduled, let’s next look at maintenance for our instance. Maintenance is important for the proper functioning of the instance. It involves tasks such as applying security patches and upgrading the instance to newer releases so that known bugs and issues are fixed.
Maintenance
CloudSQL handles the maintenance activities for us. By default, when an instance is provisioned, maintenance is scheduled to run on any day at any time. We should set the maintenance window to a day and time when it’s acceptable to carry out maintenance activities, so that the end users of our application are not impacted.
For example, if the application is a reporting and analytics application used by business users, then late night at the start of the weekend is a good time; however, if it is a social networking kind of application, then early on weekdays will be a better fit.
Also note that the time is in our local time zone, so while choosing the time we need to take the location of our end users into consideration.
This decides when the upgrades will be applied: as soon as a new release happens (Earlier) or with a delay (Later). For this example we choose Later, which is also the recommended option.
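Since the window is expressed in our local time zone, it is worth checking what it means for users elsewhere. A quick sketch with Python's zoneinfo (the zones and the Sunday 01:00 window are arbitrary examples):

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# A hypothetical maintenance window: Sunday 01:00 local time in New York.
window_local = datetime(2023, 1, 1, 1, 0, tzinfo=ZoneInfo("America/New_York"))

# The same instant for end users in India falls in the middle of their day:
print(window_local.astimezone(ZoneInfo("Asia/Kolkata")).strftime("%H:%M"))  # 11:30
```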
With the schedule set, Google takes care of the rest. With connections secured, backups and maintenance scheduled, let’s look at managing resources.
As some of us may already be aware, as the load increases, whether in data volume or in end users accessing our application, instances start to show performance issues. We then have to increase the resources accordingly (vertical scaling) to maintain optimal performance.
Managing Resources
Instance type, which determines the compute resources – CPU and memory
Storage capacity, which determines disk performance
If our load increases (or decreases), we can choose to upgrade (or downgrade) to any of the available instance types in a single click. The choice of instance type is completely dependent on your application workload.
For example, for memory-intensive applications it’s best to choose from the high-memory instance types. Such applications typically pin a lot of data in memory so that results are returned from the database faster.
We can look at the variants db-n1-standard-4 and db-n1-highmem-4: though both have the same number of cores, they differ in the memory allocated.
By default, storage is set to 10 GB, with automatic storage increase enabled to handle growth in data load.
In effect, in CloudSQL the probability of running into low disk space is rare, as the storage is automatically increased as we near capacity. The growth limit is set to indefinite by default, i.e. there is no restriction.
However, as a best practice we should limit the growth, because although increasing the disk size is instantaneous (the change doesn’t require a restart), going back is not possible. If a temporary data fluctuation happens, we can end up with a storage capacity much larger than the actual requirement.
Though we normally don’t need to change the disk size, we should note that size is proportional to performance: the larger the disk, the better the IOPS capacity and throughput. So, if our application does frequent disk access, it’s good to start with a larger disk size for better disk performance.
To increase the disk size, just enter the desired size next to the storage capacity and click Save. The change is almost instantaneous, as no restart is required.
As already mentioned, while increasing the size takes just a few clicks, once increased we will not be able to decrease it. Keep this in mind while changing the capacity, as storage on the cloud is charged on a pay-per-use basis.
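The size/performance relationship can be sketched roughly as follows; the ~30 IOPS per GB figure is an assumption based on SSD persistent disk characteristics and is subject to per-instance caps:

```python
def approx_iops(size_gb, iops_per_gb=30):
    """Rough sketch: disk IOPS scale linearly with provisioned size (up to caps)."""
    return size_gb * iops_per_gb

# A 100 GB disk offers roughly 10x the IOPS of the default 10 GB disk:
print(approx_iops(10))   # 300
print(approx_iops(100))  # 3000
```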
Now that we have an upgraded instance, that doesn’t imply that the database engine will automatically tune itself to the changes. As with an on-premises database, we have to tune the parameters to effectively utilize the increased resources for maximum performance. Let’s understand this with an example.
We have a reporting application using the mysqllive instance for data analysis. It ends up creating in-memory temporary tables for data processing. The in-memory temporary table size is controlled by the max_heap_table_size parameter of the MySQL database engine, which is set to a 16 MB limit by default.
When a temporary table crosses this limit, it is converted from an in-memory to an on-disk table, impacting the performance of the processing query, as in-memory processing is significantly faster than disk processing.
Say that, observing the increase in data size, we upgrade the instance to a high-memory instance type. We now have sufficient memory at our disposal, but to reap the benefit of this memory increase for temporary table processing, we have to tune the max_heap_table_size parameter. We choose to set it to 256 MB.
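One subtlety worth noting: MySQL spills an in-memory temporary table to disk once it exceeds the smaller of tmp_table_size and max_heap_table_size, so the two flags are usually raised together. A quick sketch of the arithmetic:

```python
MB = 1024 * 1024

def spill_threshold(tmp_table_size, max_heap_table_size):
    """The size at which MySQL converts an in-memory temp table to an on-disk one."""
    return min(tmp_table_size, max_heap_table_size)

# Raising only max_heap_table_size to 256 MB leaves the threshold at 16 MB:
print(spill_threshold(16 * MB, 256 * MB) // MB)   # 16
# Raising both lets temporary tables actually use the extra memory:
print(spill_threshold(256 * MB, 256 * MB) // MB)  # 256
```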
We choose a value in the displayed range, and that’s all there is to it. To add more flags, click Add Item again. Once done setting all the flags, click Save.
Click Save and Restart, and within a few minutes the parameters are configured. An exhaustive list of parameters is available on the respective database engines’ documentation sites.
Let’s finally look at the most important activity: being highly available, i.e. ensuring our instance is fault tolerant. The goal is to ensure that the instance continues to run and remains available despite component or service failures. This is extremely important when dealing with critical applications.
High Availability
Replication is the mechanism of copying data across multiple instances, ensuring data redundancy. This helps alleviate system failures; in effect, replication enables a highly available system. Let’s look at replication in CloudSQL.
Replication
CloudSQL provides a clustered environment for replication, comprising a primary (or master) instance, a failover replica for fault tolerance, and one or more read replicas. Let’s first look at read replicas.
Read Replica
MySQL read replicas sync with the primary instance using asynchronous replication. In asynchronous replication, the primary instance keeps writing events to its binary log, but it does not track whether or when a replica has retrieved and processed those events.
Note
Replication in MySQL expects binary logging to be enabled. Please refer to the section on binary logging for more details.
PostgreSQL read replicas use PostgreSQL’s default replication methodology: write-ahead log (WAL) shipping. Every transaction in PostgreSQL is written to a transaction log called the WAL (write-ahead log) to achieve durability. A read replica uses these WAL segments to continuously replicate changes from its primary instance.
In effect, with either database engine, creating a read replica does not impact the performance of the primary instance, as operations don’t have to wait for any acknowledgement from the read replicas.
Read replicas can be used to horizontally scale our read loads and offload read requests from the primary instance. If our application performs more reads than writes, we can turn on multiple read replica instances and route read traffic to them.
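A minimal sketch of such read/write splitting at the application layer (the endpoint addresses are hypothetical, and real routers inspect queries far more carefully):

```python
import itertools

PRIMARY = "10.0.0.5"                      # primary instance (writes)
READ_REPLICAS = ["10.0.0.6", "10.0.0.7"]  # read replicas (reads)
_next_replica = itertools.cycle(READ_REPLICAS)

def route(statement: str) -> str:
    """Send reads to a replica (round-robin) and everything else to the primary."""
    if statement.lstrip().upper().startswith(("SELECT", "SHOW")):
        return next(_next_replica)
    return PRIMARY

print(route("SELECT * FROM orders"))           # one of the read replicas
print(route("INSERT INTO orders VALUES (1)"))  # the primary
```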
Fault Tolerance Replica
A fault tolerance replica is the same as a read replica; however, its primary job is to stand ready as a replacement for the primary (or master) instance, reducing downtime and ensuring that data continues to be available to client applications in case of a disaster, e.g. the zone where the primary instance is located experiences an outage, or the primary instance becomes unresponsive.
CloudSQL continuously checks the health of the primary (or master) instance. If the instance becomes unresponsive, an automatic switchover to the standby fault tolerance replica happens so that it can serve data. This entire process is called a failover. Let’s first look at the failover process of the MySQL database engine.
MySQL
CloudSQL waits for the failover replica to catch up to the primary instance’s state.
Note
This step is affected by replication lag, which is the time difference between when the primary instance makes an update and when the failover replica catches up with that update from its log. The replication lag must be within acceptable bounds for the failover to be initiated.
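The decision can be sketched as a simple guard: failover proceeds only when the replica's lag is within an acceptable bound (the threshold here is an arbitrary assumption, not a CloudSQL setting):

```python
def safe_to_failover(replica_lag_seconds, max_acceptable_lag=30):
    """Return True when the replica has caught up enough to take over."""
    return replica_lag_seconds <= max_acceptable_lag

print(safe_to_failover(5))    # True  – replica nearly caught up
print(safe_to_failover(120))  # False – too far behind, failover would lose data
```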
Let’s next look at the failover process of PostgreSQL database engine.
PostgreSQL
As with MySQL, the failover process is initiated when the primary instance becomes unresponsive. During failover, the IP address and name of the primary instance are transferred to the standby instance, and the instance is restarted. After failover, connected applications resume their connections to the new primary instance without noticing any change, as the connection details remain the same. Let’s now enable high availability.
Enable High Availability
This implies that the failover replica will be created in the same region as the primary instance but in a different zone. By creating the failover replica in a different zone, we can be certain that our instance remains available even if a zone outage occurs.
There is an Any option in the zone selection. Selecting it implies that we have no zone preference and leave it to GCP to place our replica in any of the zones listed in the dropdown. Let’s retain the default selection, us-central1-b.
The remaining parameters come configured the same as the primary instance. Since the prime responsibility of this instance is to act as a standby for the primary, we retain all the configurations as is and click Create.
As can be seen, unlike MySQL there is no separate instance provisioning involved. Let’s next create read replicas.
Read Replicas
First, we have to specify the instance ID. As with the failover replica form, the read replica instance ID comes pre-populated; let’s retain the default. Next, we specify the location for the replica. As with the failover replica, the region remains disabled; however, note that unlike the failover replica, the primary instance’s zone is listed in the Zone dropdown. This implies we can create the read replica in any zone in the region, including the primary instance’s zone. We retain the primary instance’s zone.
Like the failover replica form, the read replica form also comes pre-populated with the primary instance’s configuration. However, since read replicas are not a standby for the primary instance and will primarily be used to scale read requests, we can customize them.
The primary configuration we may change is the instance type. This choice depends on whether we require a more powerful (upgrade) or less powerful (downgrade) read replica, which in turn is determined by our application workload; if it’s read-heavy, we can opt for a powerful read replica.
The steps for creating a PostgreSQL read replica are the same as above.
With the HA-enabled system set up, let’s next look at how we can diagnose issues.
Logging
However, we need to note that these logs cover only operations performed using the Console, the gcloud utility, or the CloudSQL API.
To view errors and logs from clients, e.g. the MySQL client, we can use the Logs Viewer of GCP Stackdriver Logging. Stackdriver Logging enables us to log and analyze our log data. It also enables us to create metrics and notifications as events occur, for proactive resolution.
As we can see, it enables us to run analyses using filters. We can also create a metric that can be monitored for notification purposes. To learn more about working with the Logs Viewer and monitoring, refer to Stackdriver Logging.4
We can enable log collection for our instance using the database flags option.
Note
Stackdriver logging charges will be applicable.
For this example, let’s enable slow query logging for our instance. This is important for individuals responsible for managing the MySQL database, as it enables us to identify inefficient or time-consuming queries that need optimization.
log_output='FILE'
slow_query_log=on
long_query_time=2 (value is in seconds)
With this, we have covered the important activities for setting up and managing CloudSQL in a live (production) environment.
Summary
In this chapter, we configured CloudSQL to work in a high availability architecture. We also saw how to configure CloudSQL in a scale-out architecture for high-volume read applications using read replicas, and looked at ways of diagnosing issues with the instance. With this, we conclude our coverage of CloudSQL. Next, let’s begin with CloudSpanner.