10 Using a relational database service: RDS

This chapter covers

  • Launching and initializing relational databases with RDS
  • Creating and restoring database snapshots
  • Setting up a highly available database
  • Monitoring database metrics
  • Tweaking database performance

WordPress is a content management system that powers substantial parts of the internet. Like many other applications, WordPress uses a relational database to store articles, comments, users, and many other data. It is fair to say that relational databases are the de facto standard for storing and querying structured data, and many applications are built on top of a relational database system such as MySQL. Typically, relational databases focus on data consistency and guarantee ACID (atomicity, consistency, isolation, and durability) database transactions. A typical task is storing and querying structured data, such as the accounts and transactions in an accounting application. If you want to use a relational database on AWS, you have two options:

  • Use the managed relational database service Amazon RDS, which is offered by AWS.

  • Operate a relational database yourself on top of virtual machines.

The Amazon Relational Database Service (Amazon RDS) offers ready-to-use relational databases such as PostgreSQL, MySQL, MariaDB, Oracle Database, and Microsoft SQL Server. Beyond that, AWS offers its own engine called Amazon Aurora, which is MySQL and PostgreSQL compatible. As long as your application uses one of these database systems, it is not a big deal to switch to RDS. The trickiest part is migrating the data, which you will learn about in this chapter as well.

RDS is a managed service. The managed service provider—in this case, AWS—is responsible for providing a defined set of services—in this case, operating a relational database system. Table 10.1 compares using an RDS database and hosting a database yourself on virtual machines.

Table 10.1 Managed service RDS vs. a self-hosted database on virtual machines

 

Amazon RDS

Self-hosted on virtual machines

Cost for AWS services

Higher because RDS costs more than virtual machines (EC2)

Lower because virtual machines (EC2) are cheaper than RDS

Total cost of ownership

Lower because operating costs are split among many customers

Much higher because you need your own manpower to manage your database

Quality

AWS professionals are responsible for the managed service.

You’ll need to build a team of professionals and implement quality control yourself.

Flexibility

High, because you can choose a relational database system and most of the configuration parameters

Higher, because you can control every part of the relational database system you installed on virtual machines

You’d need considerable time and know-how to build a comparable relational database environment based on virtual machines, so we recommend using Amazon RDS for relational databases whenever possible to decrease operational costs and improve quality. That’s why we won’t cover hosting your own relational database on VMs in this book. Instead, we’ll introduce Amazon RDS in detail.

In this chapter, you’ll launch a MySQL database with the help of Amazon RDS. Chapter 2 introduced a WordPress setup like the one shown in figure 10.1 and described next; you’ll use this example in this chapter, focusing on the database part:

  1. The user sends an HTTP request to the load balancer.

  2. The load balancer distributes the incoming request to a fleet of virtual machines.

  3. Each virtual machine runs a web server, which connects to a MySQL database as well as a network filesystem.

Figure 10.1 The company’s blogging infrastructure consists of two load-balanced web servers running WordPress and a MySQL database server.

After the MySQL database is up and running, you’ll learn how to import, back up, and restore data. More advanced topics like setting up a highly available database and improving the performance of the database will follow.

Not all examples are covered by the Free Tier

The examples in this chapter are not all covered by the Free Tier. A warning message appears when an example incurs costs. Nevertheless, as long as you don’t run all other examples longer than a few days, you won’t pay anything for them. Keep in mind that this applies only if you created a fresh AWS account for this book and nothing else is going on in your AWS account. Try to complete the chapter within a few days; you’ll clean up your account at the end.

10.1 Starting a MySQL database

In the following section, you will launch the infrastructure required to run WordPress on AWS. In this chapter, we will focus on the MySQL database provided by RDS, but you can easily transfer what you learn to other database engines such as Aurora, PostgreSQL, MariaDB, Oracle, and Microsoft SQL Server, as well as to applications other than WordPress.

When you follow the official “How to Install WordPress” tutorial (see http://mng.bz/G1vV), one of the first steps is setting up a MySQL database. Formerly, you might have installed the database system on the same virtual machine that also runs the web server. However, operating a database system is not trivial. You have to implement a solid backup and recovery strategy, for example. Also, when running a database on a single virtual machine, you are introducing a single point of failure into your system that could cause downtimes of your website.

To overcome these challenges, you will use a fully managed MySQL database provided by RDS. AWS provides backup and restore functionality and offers database systems distributed among two data centers as well as the ability to recover from failure automatically.

10.1.1 Launching a WordPress platform with an RDS database

Launching a database consists of two steps:

  1. Launching a database instance

  2. Connecting an application to the database endpoint

Next, you’ll use the same CloudFormation template you used in chapter 2 to spin up the cloud infrastructure for WordPress. The template can be found on GitHub and on S3. You can download a snapshot of the repository at https://github.com/AWSinAction/code3/archive/main.zip. The file we’re talking about is located at chapter10/template .yaml. On S3, the same file is located at http://s3.amazonaws.com/awsinaction-code3/chapter10/template.yaml.

Execute the following command to create a CloudFormation stack containing an RDS database instance with a MySQL engine and web servers serving the WordPress application:

$ aws cloudformation create-stack --stack-name wordpress --template-url 
 https:/ /s3.amazonaws.com/awsinaction-code3/chapter10/template.yaml 
 --parameters "ParameterKey=WordpressAdminPassword,ParameterValue=test1234" 
 --capabilities CAPABILITY_IAM

You’ll have to wait several minutes while the CloudFormation stack is created in the background, which means you’ll have enough time to learn the details of the RDS database instance while the template is launching. The next listing shows parts of the CloudFormation template used to create the wordpress stack. Table 10.2 shows the attributes you need when creating an RDS database instance using CloudFormation or the Management Console.

Table 10.2 Attributes needed to connect to an RDS database

Attribute

Description

AllocatedStorage

Storage size of your database in GB

DBInstanceClass

Size (also known as instance type) of the underlying virtual machine

Engine

Database engine (Aurora, PostgreSQL, MySQL, MariaDB, Oracle Database, or Microsoft SQL Server) you want to use

DBName

Identifier for the database

MasterUsername

Name for the admin user

MasterUserPassword

Password for the admin user

It is possible to configure a database instance as publicly accessible, but we generally do not recommend enabling access from the internet to your database, to prevent unwanted access. Instead, as shown in listing 10.1, an RDS instance should be accessible only within the VPC.

To connect to an RDS database instance, you need an EC2 instance running in the same VPC. First, connect to the EC2 instance. From there, you can then connect to the database instance.

Listing 10.1 Excerpt from the CloudFormation template for setting up an RDS database

Resources:
  # [...]
  DatabaseSecurityGroup:                                     
    Type: 'AWS::EC2::SecurityGroup'
    Properties:
      GroupDescription: 'awsinaction-db-sg'
      VpcId: !Ref VPC
      SecurityGroupIngress:
      - IpProtocol: tcp
        FromPort: 3306                                       
        ToPort: 3306
        SourceSecurityGroupId: !Ref WebServerSecurityGroup   
  Database:
    Type: 'AWS::RDS::DBInstance'                             
    DeletionPolicy: Delete                                   
    Properties:
      AllocatedStorage: 5                                    
      BackupRetentionPeriod: 0                               
      DBInstanceClass: 'db.t2.micro'                         
      DBName: wordpress                                      
      Engine: MySQL                                          
      MasterUsername: wordpress                              
      MasterUserPassword: wordpress                          
      VPCSecurityGroups:
      - !Sub ${DatabaseSecurityGroup.GroupId}                
      DBSubnetGroupName: !Ref DBSubnetGroup                  
    DependsOn: VPCGatewayAttachment
  DBSubnetGroup:
    Type: 'AWS::RDS::DBSubnetGroup'                          
    Properties:
      Description: DB subnet group
      SubnetIds:
      - Ref: SubnetA                                         
      - Ref: SubnetB

Security group for the database instance, allowing incoming traffic on the MySQL default port for web servers

The default MySQL port is 3306.

References the security group of EC2 instances running a web server

Creates a database instance with Amazon RDS

Disables backups. (Set to “Snapshot” for production workloads.)

The database provides 5 GB of storage.

Disables backups. (Turn this on in production.)

Creates a default database named wordpress

Uses MySQL as the database engine

The size of the database instance is t2.micro, the smallest available size.

The username for the admin user of the MySQL database

The password for the admin user of the MySQL database

References the security group for the database instance

Defines the subnets the RDS database instance will launch into

The subnet group ...

... consists of subnet A and B, so RDS will distribute database instances among those subnets.

See if the CloudFormation stack named wordpress has reached the state CREATE _COMPLETE with the following command:

$ aws cloudformation describe-stacks --stack-name wordpress

Search for StackStatus in the output, and check whether the status is CREATE_ COMPLETE. If not, you need to wait a few minutes longer (it can take up to 15 minutes to create the stack) and rerun the command. If the status is CREATE_COMPLETE, you’ll find the key OutputKey in the output section. The corresponding OutputValue contains the URL for the WordPress blogging platform. The following listing shows the output in detail. Open this URL in your browser; you’ll find a running WordPress setup.

Listing 10.2 Checking the state of the CloudFormation stack

$ aws cloudformation describe-stacks --stack-name wordpress
{
  "Stacks": [{
    "StackId": "[...]",
    "Description": "AWS in Action: chapter 10",
    "Parameters": [...],
    "Tags": [],
    "Outputs": [
        {
            "Description": "WordPress URL",
            "OutputKey": "URL",
            "OutputValue": "http://[...].us-east-1.elb.amazonaws.com"  
        }
    ],
    "CreationTime": "2017-10-19T07:12:28.694Z",
    "StackName": "wordpress",
    "NotificationARNs": [],
    "StackStatus": "CREATE_COMPLETE",                                  
    "DisableRollback": false
  }]
}

Opens this URL in your browser to open the WordPress application

Waits for state CREATE_COMPLETE for the CloudFormation stack

Launching and operating a relational database like MySQL is that simple. Of course, you can also use the Management Console (https://console.aws.amazon.com/rds/) to launch an RDS database instance instead of using a CloudFormation template. RDS is a managed service, and AWS handles most of the tasks necessary to operate your database in a secure and reliable way. You need to do only two things:

  • Monitor your database’s available storage, and make sure you increase the allocated storage as needed.

  • Monitor your database’s performance, and make sure you increase I/O and computing performance as needed.

Both tasks can be handled with the help of CloudWatch monitoring, as you’ll learn later in the chapter.

10.1.2 Exploring an RDS database instance with a MySQL engine

The CloudFormation stack created an RDS database instance with the MySQL engine. Each database instance offers an endpoint for clients. Clients send their SQL queries to this endpoint to query, insert, delete, or update data. For example, to retrieve all rows from a table, an application sends the following SQL request: SELECT * FROM table. You can request the endpoint and detailed information of an RDS database instance with the describe-db-instances command:

$ aws rds describe-db-instances --query "DBInstances[0].Endpoint"
{
  "HostedZoneId": "Z2R2ITUGPM61AM",
  "Port": 3306,                                                          
  "Address": "wdwcoq2o8digyr.cqrxioeaavmf.us-east-1.rds.amazonaws.com"   
}

Port number of database endpoint

Host name of database endpoint

The RDS database is now running, but what does it cost?

10.1.3 Pricing for Amazon RDS

What does it cost to host WordPress on AWS? We discussed this question in chapter 2 in detail. Here, we want to focus on the costs for RDS.

Databases on Amazon RDS are priced according to the size of the underlying virtual machine and the amount and type of allocated storage. Compared to a database running on a plain EC2 VM, the hourly price of an RDS instance is higher. In our opinion, the Amazon RDS service is worth the extra charge because you don’t need to perform typical DBA tasks like installation, patching, upgrades, migration, backups, and recovery.

Table 10.3 shows a pricing example for a medium-sized RDS database instance with a standby instance for high availability. All prices in USD are for US East (N. Virginia) as of March 11, 2022. Get the current prices at https://aws.amazon.com/rds/pricing/.

Table 10.3 Monthly costs for a medium-sized RDS instance

Description

Monthly price

Database instance db.t4g.medium

$94.17 USD

50 GB of general purpose (SSD)

$11.50 USD

Additional storage for database snapshots (100 GB)

$9.50 USD

Total

$115.17 USD

You’ve now launched an RDS database instance for use with a WordPress web application. You’ll learn about importing data to the RDS database in the next section.

10.2 Importing data into a database

Imagine you are already running WordPress on a virtual machine in your on-premises data center, and you have decided to move the application to AWS. To do so, you need to move the data from the on-premises MySQL database to RDS. You will learn how to do that in this section.

A database without data isn’t useful. In many cases, you’ll need to import data into a new database by importing a dump from the old database. This section will guide you through the process of importing a MySQL database dump to an RDS database with a MySQL engine. The process is similar for all other database engines (Aurora, PostgreSQL, MySQL, MariaDB, Oracle Database, and Microsoft SQL Server).

To import a database from your on-premises environment to Amazon RDS, follow these steps:

  1. Export the database.

  2. Start a virtual machine in the same region and VPC as the RDS database.

  3. Upload the database dump to the virtual machine.

  4. Run an import of the database dump to the RDS database on the virtual server.

We’ll skip the first step of exporting a MySQL database, because the RDS instance we created in our example is empty and you may not have access to an existing WordPress database. The next sidebar gives you hints on how to create a database dump in case you need that for your real-world systems later.

Exporting a MySQL database

MySQL (and every other database system) offers a way to export and import databases. We recommend using the command-line tools from MySQL for exporting and importing databases. You may need to install the MySQL client, which comes with the mysqldump tool.

The following command exports all databases from localhost and dumps them into a file called dump.sql. Replace $UserName with the MySQL admin user, and enter the password when prompted:

$ mysqldump -u $UserName -p --all-databases > dump.sql

You can also specify only some databases for the export, as shown next. To do so, replace $DatabaseName with the name of the database you want to export:

$ mysqldump -u $UserName -p $DatabaseName > dump.sql

And, of course, you can export a database over a network connection as follows. To connect to a server to export a database, replace $Host with the host name or IP address of your database:

$ mysqldump -u $UserName -p $DatabaseName --host $Host > dump.sql

See the MySQL documentation if you need more information about the mysqldump tool.

Theoretically, you could import a database to RDS from any machine from your on-premises or local network, but the higher latency over the internet or VPN connection will slow down the import process dramatically. Because of this, we recommend adding a second step: upload the database dump to a virtual machine running in the same AWS region and VPC, and import the database into RDS from there.

AWS Database Migration Service

When migrating a huge database to AWS with minimal downtime, the Database Migration Service (DMS) can help. We do not cover DMS in this book, but you can learn more on the AWS website: https://aws.amazon.com/dms.

To do so, we’ll guide you through the following steps:

  1. Connect to the virtual machine that is running WordPress.

  2. Download a database dump from S3 to the VM. (If you are using your own database dump, we recommend uploading it to S3 first.)

  3. Import the database dump into the RDS database from the virtual machine.

Fortunately, you already started two virtual machines that you know can connect to the MySQL database on RDS, because they’re running the WordPress application. Go through the following steps to open a terminal session:

  1. Open the EC2 service via AWS Management Console: https://console.aws.amazon.com/ec2/.

  2. Select one of the two EC2 instances named wordpress.

  3. Click the Connect button.

  4. Select Session Manager and click the Connect button.

Because you are connected to a virtual machine with access to the RDS database instance, you are ready to import the database dump. First, change into the home directory of the ssm-user as follows:

$ cd /home/ssm-user/

We prepared a MySQL database dump of a WordPress blog as an example. The dump contains a blog post and a few comments. Download this database dump from S3 using the following command on the virtual machine:

$ wget https:/ /s3.amazonaws.com/awsinaction-code3/chapter10/wordpress-import.sql

Next, you’ll need the port and hostname, also called the endpoint, of the MySQL database on RDS. Don’t remember the endpoint? The following command will print it out for you. Run this on your local machine:

$ aws rds describe-db-instances --query "DBInstances[0].Endpoint"

Run the following command on the VM to import the data from the file wordpress-import.sql into the RDS database instance; replace $DBAddress with the Address you printed to the terminal with the previous command. The Address will look similar to wdtq7tf5caejft.cd0o57zo3ohr.us-east-1.rds.amazonaws.com. Also, type in wordpress when asked for a password:

$ mysql --host $DBAddress --user wordpress -p < wordpress-import.sql

Point your browser to the WordPress blog again, and you’ll now find many new posts and comments there. If you don’t remember the URL, run the following command on your local machine to fetch it again:

$ aws cloudformation describe-stacks --stack-name wordpress 
 --query "Stacks[0].Outputs[0].OutputValue" --output text

10.3 Backing up and restoring your database

Over the years, your WordPress site has accumulated hundreds of blog posts and comments from the community. That’s a valuable asset. Therefore, it is key that you back up the data.

Amazon RDS is a managed service, but you still need backups of your database in case something or someone harms your data and you need to restore it, or you need to duplicate a database in the same or another region. RDS offers manual and automated snapshots for recovering RDS database instances. In this section, you’ll learn how to use RDS snapshots to do the following:

  • Configuring the retention period and time frame for automated snapshots

  • Creating snapshots manually

  • Restoring snapshots by starting new database instances based on a snapshot

  • Copying a snapshot to another region for disaster recovery or relocation

10.3.1 Configuring automated snapshots

The RDS database you started in section 10.1 can automatically create snapshots if the BackupRetentionPeriod is set to a value between 1 and 35. This value indicates how many days the snapshot will be retained (the default is 1). Automated snapshots are created once a day during the specified time frame. If no time frame is specified, RDS picks a random 30-minute time frame during the night. A new random time frame will be chosen each night.

Creating a snapshot requires all disk activity to be briefly frozen. Requests to the database may be delayed or even fail because of a timeout, so we recommend that you choose a time frame for the snapshot that has the least effect on applications and users (e.g., late at night). Automated snapshots are your backup in case something unexpected happens to your database. This could be a query that deletes all your data accidentally or a hardware failure that causes data loss.

The following command changes the time frame for automated backups to 05:00–06:00 UTC and the retention period to three days. Use the terminal on your local machine to execute it:

$ aws cloudformation update-stack --stack-name wordpress --template-url 
 https:/ /s3.amazonaws.com/awsinaction-code3/chapter10/
 template-snapshot.yaml 
 --parameters ParameterKey=WordpressAdminPassword,UsePreviousValue=true 
 --capabilities CAPABILITY_IAM

The RDS database will be modified based on a slightly modified CloudFormation template, as shown in the following listing.

Listing 10.3 Modifying an RDS database’s snapshot time frame and retention time

Database:
  Type: 'AWS::RDS::DBInstance'
  DeletionPolicy: Delete
  Properties:
    AllocatedStorage: 5
    BackupRetentionPeriod: 3               
    PreferredBackupWindow: '05:00-06:00'   
    DBInstanceClass: 'db.t2.micro'
    DBName: wordpress
    Engine: MySQL
    MasterUsername: wordpress
    MasterUserPassword: wordpress
    VPCSecurityGroups:
    - !Sub ${DatabaseSecurityGroup.GroupId}
    DBSubnetGroupName: !Ref DBSubnetGroup
  DependsOn: VPCGatewayAttachment

Keeps snapshots for three days

Creates snapshots automatically between 05:00 and 06:00 UTC

If you want to disable automated snapshots, you need to set the retention period to 0. As usual, you can configure automated backups using CloudFormation templates, the Management Console, or SDKs. Keep in mind that automated snapshots are deleted when the RDS database instance is deleted. Manual snapshots stay. You’ll learn about them next.

10.3.2 Creating snapshots manually

You can trigger manual snapshots whenever you need, for example, before you update to the latest WordPress version, migrate a schema, or perform some other activity that could damage your database. To create a snapshot, you have to know the instance identifier. The following command extracts the instance identifier from the first RDS database instance:

$ aws rds describe-db-instances --output text 
 --query "DBInstances[0].DBInstanceIdentifier"

The next command creates a manual snapshot called wordpress-manual-snapshot. Replace $DBInstanceIdentifier with the output of the previous command:

$ aws rds create-db-snapshot --db-snapshot-identifier 
 wordpress-manual-snapshot 
 --db-instance-identifier $DBInstanceIdentifier

In case you get a “Cannot create a snapshot because the database instance .. is not currently in the available state.” error, retry after five minutes—your database is still initializing.

It will take a few minutes for the snapshot to be created. You can check the current state of the snapshot with this command:

$ aws rds describe-db-snapshots 
 --db-snapshot-identifier wordpress-manual-snapshot

RDS doesn’t delete manual snapshots automatically; you need to delete them yourself if you don’t need them any longer. You’ll learn how to do this at the end of the section.

Copying an automated snapshot as a manual snapshot

There is a difference between automated and manual snapshots. Automated snapshots are deleted automatically after the retention period is over, but manual snapshots aren’t. If you want to keep an automated snapshot even after the retention period is over, you have to copy the automated snapshot to a new manual snapshot.

Get the snapshot identifier of an automated snapshot from the RDS database you started in section 10.1 by running the following command at your local terminal. Replace $DBInstanceIdentifier with the output of the describe-db-instances command:

$ aws rds describe-db-snapshots --snapshot-type automated 
 --db-instance-identifier $DBInstanceIdentifier 
 --query "DBSnapshots[0].DBSnapshotIdentifier" 
 --output text

The next command copies an automated snapshot to a manual snapshot named wordpress-copy-snapshot. Replace $SnapshotId with the output from the previous command:

$ aws rds copy-db-snapshot 
 --source-db-snapshot-identifier $SnapshotId  
 --target-db-snapshot-identifier wordpress-copy-snapshot

The copy of the automated snapshot is named wordpress-copy-snapshot. It won’t be removed automatically.

10.3.3 Restoring a database

Imagine a scary scenario: you have accidentally deleted all of the blog posts from your WordPress site. Of course, you want to restore the data as fast as possible. Lucky for you, RDS has you covered.

If you restore a database from an automated or manual snapshot, a new database will be created based on the snapshot. As figure 10.2 shows, you can’t restore a snapshot to an existing database.

Figure 10.2 A snapshot can’t be restored into an existing database.

Instead, a new database is created when you restore a database snapshot, as figure 10.3 illustrates.

Figure 10.3 A new database is created to restore a snapshot.

To create a new database in the same VPC as the WordPress platform you started in section 10.1, you need to find out the existing database’s subnet group. Execute this command to do so:

$ aws cloudformation describe-stack-resource  
 --stack-name wordpress --logical-resource-id DBSubnetGroup 
 --query "StackResourceDetail.PhysicalResourceId" --output text

You’re now ready to create a new database based on the manual snapshot you created at the beginning of this section. Execute the following command, replacing $SubnetGroup with the output of the previous command:

$ aws rds restore-db-instance-from-db-snapshot 
 --db-instance-identifier awsinaction-db-restore 
 --db-snapshot-identifier wordpress-manual-snapshot 
 --db-subnet-group-name $SubnetGroup

You might get an “DBSnapshot must have state available but actually has creating” error if your snapshot has not been created yet. In this case, retry the command after five minutes.

A new database named awsinaction-db-restore is created based on the manual snapshot. In theory, after the database is created, you could switch the WordPress application to the new endpoint by modifying the /var/www/html/wp-config.php file on both virtual machines.

If you’re using automated snapshots, you can also restore your database from a specified moment, because RDS keeps the database’s change logs. This allows you to jump back to any point in time from the backup retention period to the last five minutes.

Execute the following command, replacing $DBInstanceIdentifier with the output of the earlier describe-db-instances command, $SubnetGroup with the output of the earlier describe-stack-resource command, and $Time with a UTC timestamp from five minutes ago (e.g., 2022-03-11T09:30:00Z):

$ aws rds restore-db-instance-to-point-in-time 
 --target-db-instance-identifier awsinaction-db-restore-time 
 --source-db-instance-identifier $DBInstanceIdentifier 
 --restore-time $Time --db-subnet-group-name $SubnetGroup

A new database named awsinaction-db-restore-time is created based on the source database from five minutes ago. In theory, after the database is created, you could switch the WordPress application to the new endpoint by modifying the /var/www/ html/wp-config.php file on both virtual machines.

10.3.4 Copying a database to another region

When you created the cloud infrastructure for WordPress, you assumed that most readers will come from the United States. It turns out, however, that most readers access your site from Europe. Therefore, you decide to move your cloud infrastructure to reduce latency for the majority of your readers.

Copying a database to another region is possible with the help of snapshots as well. The main reasons you might do so follow:

  • Disaster recovery—You can recover from an unlikely region-wide outage.

  • Relocating—You can move your infrastructure to another region so you can serve your customers with lower latency.

The second command copies the snapshot named wordpress-manual-snapshot from the region us-east-1 to the region eu-west-1. You need to replace $SourceSnapshotArn with the Amazon Resource Name (ARN) of the snapshot. Use the following command to get the ARN of your manual snapshot:

$ aws rds describe-db-snapshots 
 --db-snapshot-identifier wordpress-manual-snapshot 
 --query "DBSnapshots[0].DBSnapshotArn" --output text

Compliance Moving data from one region to another may violate privacy laws or compliance rules. Make sure you’re allowed to copy the data to another region if you’re working with real data.

$ aws rds copy-db-snapshot 
 --source-db-snapshot-identifier $SourceSnapshotArn 
 --target-db-snapshot-identifier wordpress-manual-snapshot 
 --region eu-west-1

After the snapshot has been copied to the region eu-west-1, you can restore a database from it as described in the previous section.

10.3.5 Calculating the cost of snapshots

Snapshots are billed based on the storage they use. You can store snapshots up to the size of your database instance for free. In our WordPress example, you can store up to 5 GB of snapshots for free. On top of that, you pay per GB per month of used storage. As we’re writing this book, the cost is $0.095 for each GB every month.

Cleaning up

It’s time to clean up the snapshots and delete the restored database instances. Execute the following commands step by step, or jump to the shortcuts for Linux and macOS after the listing:

$ aws rds delete-db-instance --db-instance-identifier 
 awsinaction-db-restore --skip-final-snapshot               
$ aws rds delete-db-instance --db-instance-identifier 
 awsinaction-db-restore-time --skip-final-snapshot          
$ aws rds delete-db-snapshot --db-snapshot-identifier 
 wordpress-manual-snapshot                                  
$ aws rds delete-db-snapshot --db-snapshot-identifier 
 wordpress-copy-snapshot                                    
$ aws --region eu-west-1 rds delete-db-snapshot --db-snapshot-identifier 
 wordpress-manual-snapshot                                  

Deletes the database with data from the snapshot restore

Deletes the database with data from the point-in-time restore

Deletes the manual snapshot

Deletes the copied snapshot

Deletes the snapshot copied to another region

You can avoid typing these commands manually at your terminal by using the following command to download a bash script and execute it directly on your local machine. The bash script contains the same steps as shown in the previous snippet:

$ curl -s https:/ /raw.githubusercontent.com/AWSinAction/
 code3/main/chapter10/cleanup.sh | bash -ex

Keep the rest of the setup, because you’ll use it in the following sections.

10.4 Controlling access to a database

Every day, you can read about WordPress sites that got hacked. One essential aspect of protecting your WordPress site is controlling access to your cloud infrastructure and database.

The shared-responsibility model applies to the RDS service as well as to AWS services in general. AWS is responsible for the security of the cloud in this case—for example, for the security of the underlying OS. You, the customer, need to specify the rules controlling access to your data and RDS database.

Figure 10.4 shows the following three layers that control access to an RDS database:

  • Controlling access to the configuration of the RDS database

  • Controlling network access to the RDS database

  • Controlling data access with the database’s own user and access management features

Figure 10.4 Your data is protected by the database itself, security groups, and IAM.

10.4.1 Controlling access to the configuration of an RDS database

Access to the RDS service is controlled using the IAM service. IAM is responsible for controlling access to actions like creating, updating, and deleting an RDS database instance. IAM doesn’t manage access inside the database; that’s the job of the database engine. IAM policies define which configuration and management actions an identity is allowed to execute on RDS. You attach these policies to IAM users, groups, or roles to control what actions they can perform on the database.

The following listing shows an IAM policy that allows access to all RDS configuration and management actions. You could use this policy to limit access by attaching it only to trusted IAM users and groups.

Listing 10.4 Allowing access to all RDS service configuration and management actions

{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",    
    "Action": "rds:*",    
    "Resource": "*"       
  }]
}

Allows the specified actions on the specified resources

All possible actions on RDS service are specified (e.g., changes to the database configuration).

All RDS databases are specified.

Only people and machines that really need to make changes to RDS databases should be allowed to do so. The following listing shows an IAM policy that denies all destructive actions to prevent data loss by human failure.

Listing 10.5 IAM policy denying destructive actions

{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",                           
    "Action": "rds:*",                           
    "Resource": "*"                              
  }, {
    "Effect": "Deny",                            
    "Action": ["rds:Delete*", "rds:Remove*"],    
    "Resource": "*"                              
  }]
}

Allows access ...

... to all actions related to RDS ...

... and all resources.

But, denies access ...

... to all destructive actions on the RDS service (e.g., delete database instance) ...

... for all resources.

As discussed in chapter 5, when introducing IAM, a Deny statement overrides any Allow statement. Therefore, a user or role with the IAM policy attached does have limited access to RDS, because all actions are allowed except the ones that are destructive.

10.4.2 Controlling network access to an RDS database

An RDS database is linked to security groups. Each security group consists of rules for a firewall controlling inbound and outbound database traffic. You already know about using security groups in combination with virtual machines.

The next listing shows the configuration of the security group attached to the RDS database in our WordPress example. Inbound connections to port 3306 (the default port for MySQL) are allowed only from virtual machines linked to the security group called WebServerSecurityGroup.

Listing 10.6 CloudFormation template extract: Firewall rules for an RDS database

DatabaseSecurityGroup:                                      
  Type: 'AWS::EC2::SecurityGroup'
  Properties:
    GroupDescription: 'awsinaction-db-sg'
    VpcId: !Ref VPC
    SecurityGroupIngress:
    - IpProtocol: tcp
      FromPort: 3306                                        
      ToPort: 3306
      SourceSecurityGroupId: !Ref WebServerSecurityGroup    

The security group for the database instance, allowing incoming traffic on the MySQL default port for web servers

The default MySQL port is 3306.

References the security group for web servers

Only machines that really need to connect to the RDS database should be allowed to do so on the network level, such as EC2 instances running your web server or application server. See chapter 5 if you’re interested in more details about security groups (firewall rules).

10.4.3 Controlling data access

A database engine also implements access control itself. User management of the database engine has nothing to do with IAM users and access rights; it’s only responsible for controlling access to the database. For example, you typically define a user for each application and grant rights to access and manipulate tables as needed. In the WordPress example, a database user called wordpress is created. The WordPress application authenticates itself to the database engine (MySQL, in this case) with this database user and a password.

IAM database authentication

AWS provides an IAM database authentication mechanism for MariaDB, MySQL, and PostgreSQL. With IAM database authentication, you no longer need to create users with a username and password in the database engine. Instead, you create a database user that uses a plug-in called AWSAuthenticationPlugin for authentication. You then log in to the database with the username and a token that is generated with your IAM identity. The token is valid for 15 minutes, so you have to renew it from time to time. You can learn more about IAM database authentication in the AWS documentation at http://mng.bz/z57r.

Typical use cases follow:

  • Limiting write access to a few database users (e.g., only for an application)

  • Limiting access to specific tables to a few users (e.g., to one department in the organization)

  • Limiting access to tables to isolate different applications (e.g., hosting multiple applications for different customers on the same database)

User and access management varies between database systems. We don’t cover this topic in this book; refer to your database system’s documentation for details.

10.5 Building on a highly available database

The availability of our blog cloudonaut.io is key to our business success. That’s the case for your WordPress site as well. Therefore, you should avoid downtimes when possible. This chapter is about increasing the availability of your database.

The database is typically the most important part of a system. Applications won’t work if they can’t connect to the database, and the data stored in the database is mission critical, so the database must be highly available and store data durably.

Amazon RDS lets you launch highly available (HA) databases. Compared to a default database consisting of a single database instance, an HA RDS database consists of two database instances: a primary and a standby database instance. You will be paying for both instances. All clients send requests to the primary database. Data is replicated between the primary and the standby database synchronously, as shown in figure 10.5.

We strongly recommend using high-availability deployment for all databases that handle production workloads. If you want to save money, you can choose not to deploy a highly available database for your test systems.

If the primary database becomes unavailable due to hardware or network failures, RDS starts the failover process. The standby database then becomes the primary database. As figure 10.6 shows, the DNS name is updated and clients begin to use the former standby database for their requests.

RDS detects the need for a failover automatically and executes it without human intervention.

Figure 10.5 The primary database is replicated to the standby database when running in high-availability mode.

Figure 10.6 The client fails over to the standby database if the primary database fails, using DNS resolution.

Aurora is different

Aurora is an exception to the way that highly available databases operate in AWS. It does not store your data on a single EBS volume. Instead, Aurora stores data on a cluster volume. A cluster volume consists of multiple disks, with each disk having a copy of the cluster data. This implies that the storage layer of Aurora is not a single point of failure. But still, only the primary Aurora database instance accepts write requests. If the primary goes down, it is automatically re-created, which typically takes less than 10 minutes. If you have replica instances in your Aurora cluster, a replica is promoted to be the new primary instance, which usually takes around one minute and is much faster than primary re-creation.

  

Multi-AZ with two standby instances

AWS introduced a new option for multi-AZ deployments of RDS databases: Multi-AZ with two standby instances. The advantages compared to the single standby instance that we discussed follow:

  1. Both standby instances can be used as read replicas to increase capacity for read-only queries. You will learn more about read replicas later.

  2. Lower latency and jitter for transaction commits, which improves the write performance.

  3. Faster failover in less than 60 seconds.

Right now, this option is available only for PostgreSQL and MySQL engines. Check out https://aws.amazon.com/rds/features/multi-az/ to learn more.

10.5.1 Enabling high-availability deployment for an RDS database

Warning Starting a highly available RDS database will incur charges, about USD $0.017000 per hour. See https://aws.amazon.com/rds/pricing/ if you want to find out the current hourly price.

Execute the following command at your local terminal to enable high-availability deployment for the RDS database you started in section 10.1:

$ aws cloudformation update-stack --stack-name wordpress --template-url 
 https:/ /s3.amazonaws.com/awsinaction-code3/
 chapter10/template-multiaz.yaml 
 --parameters ParameterKey=WordpressAdminPassword,UsePreviousValue=true 
 --capabilities CAPABILITY_IAM

The RDS database is updated based on a slightly modified CloudFormation template as shown in the next listing.

Listing 10.7 Modifying the RDS database by enabling high availability

Database:
  Type: 'AWS::RDS::DBInstance'
  DeletionPolicy: Delete
  Properties:
    AllocatedStorage: 5
    BackupRetentionPeriod: 3
    PreferredBackupWindow: '05:00-06:00'
    DBInstanceClass: 'db.t2.micro'
    DBName: wordpress
    Engine: MySQL
    MasterUsername: wordpress
    MasterUserPassword: wordpress
    VPCSecurityGroups:
    - !Sub ${DatabaseSecurityGroup.GroupId}
    DBSubnetGroupName: !Ref DBSubnetGroup
    MultiAZ: true                          
  DependsOn: VPCGatewayAttachment

Enables high-availability deployment for the RDS database

It will take several minutes for the database to be deployed in HA mode. There is nothing more you need to do—the database is now highly available.

What is Multi-AZ?

Each AWS region is split into multiple independent data centers, which are also called availability zones. We introduced the concept of availability zones in chapters 4 and 5, but skipped one aspect of HA deployment that is used only for RDS: the primary and standby databases are launched into two different availability zones. AWS calls the high-availability deployment of RDS Multi-AZ deployment for this reason.

In addition to the fact that a high-availability deployment increases your database’s reliability, it offers another important advantage: reconfiguring or maintaining a single-mode database causes short downtimes. High-availability deployment of an RDS database solves this problem because AWS switches to the standby database during maintenance.

10.6 Tweaking database performance

When search engines decide in which order to present the search results, the loading speed of a website is an important factor. Therefore, it is important to optimize the performance of your WordPress site. You will learn how to make sure the MySQL database is not slowing down your website in this section.

The easiest way to scale a RDS database, or an SQL database in general, is to scale vertically. Scaling a database vertically means increasing the following resources of your database instance:

  • Faster CPU

  • More memory

  • Faster storage

Keep in mind that you can’t scale vertically (which means increasing resources) without limits. One of the largest RDS database instance types comes with 32 cores and 244 GiB memory. In comparison, an object store like S3 or a NoSQL database like DynamoDB can be scaled horizontally without limits, because they add more machines to the cluster if additional resources are needed.

10.6.1 Increasing database resources

When you start an RDS database, you choose an instance type. The instance type defines the computing power and memory of your virtual machine (such as when you start an EC2 instance). Choosing a bigger instance type increases computing power and memory for RDS databases.

You started an RDS database with instance type db.t2.micro, the smallest available instance type. You can change the instance type using a CloudFormation template, the CLI, the Management Console, or AWS SDKs. You may want to increase the instance type if performance is inadequate. You will learn how to measure performance in section 10.7. Listing 10.8 shows how to change the CloudFormation template to increase the instance type from db.t2.micro with one virtual core and 615 MB memory to db.m3.large with two faster virtual cores and 7.5 GB memory. You’ll do this only in theory. Don’t do this to your running database because it is not covered by the Free Tier and will incur charges. Keep in mind that modifying the instance type causes a short downtime.

Listing 10.8 Modifying the instance type to improve performance of an RDS database

Database:
  Type: 'AWS::RDS::DBInstance'
  DeletionPolicy: Delete
  Properties:
    AllocatedStorage: 5
    BackupRetentionPeriod: 3
    PreferredBackupWindow: '05:00-06:00'
    DBInstanceClass: 'db.m3.large'       
    DBName: wordpress
    Engine: MySQL
    MasterUsername: wordpress
    MasterUserPassword: wordpress
    VPCSecurityGroups:
    - !Sub ${DatabaseSecurityGroup.GroupId}
    DBSubnetGroupName: !Ref DBSubnetGroup
    MultiAZ: true
  DependsOn: VPCGatewayAttachment

Increases the size of the underlying virtual machine for the database instance from db.t2.micro to db.m3.large

Because a database has to read and write data to a disk, I/O performance is important for the database’s overall performance. RDS offers the following three different types of storage, as you already know from reading about the block storage service EBS:

  • General purpose (SSD)

  • Provisioned IOPS (SSD)

  • Magnetic

You should choose general-purpose (SSD) or even provisioned IOPS (SSD) storage for production workloads. The options are exactly the same as when using EBS for virtual machines. If you need to guarantee a high level of read or write throughput, you should use provisioned IOPS (SSD). The general-purpose (SSD) option offers moderate baseline performance with the ability to burst. The throughput for general purpose (SSD) depends on the amount of initialized storage size. Magnetic storage is an option if you need to store data at a low cost, or if you don’t need to access it in a predictable, performant way. The next listing shows how to enable general-purpose (SSD) storage using a CloudFormation template.

Listing 10.9 Modifying the storage type to improve performance of an RDS database

Database:
  Type: 'AWS::RDS::DBInstance'
  DeletionPolicy: Delete
  Properties:
    AllocatedStorage: 5
    BackupRetentionPeriod: 3
    PreferredBackupWindow: '05:00-06:00'
    DBInstanceClass: 'db.m3.large'
    DBName: wordpress
    Engine: MySQL
    MasterUsername: wordpress
    MasterUserPassword: wordpress
    VPCSecurityGroups:
    - !Sub ${DatabaseSecurityGroup.GroupId}
    DBSubnetGroupName: !Ref DBSubnetGroup
    MultiAZ: true
    StorageType: 'gp2'              
  DependsOn: VPCGatewayAttachment

Uses general-purpose (SSD) storage to increase I/O performance

10.6.2 Using read replication to increase read performance

A database suffering from too many read requests can be scaled horizontally by adding additional database instances for read traffic and enabling replication from the primary (writable) copy of the database instance. As figure 10.7 shows, changes to the database are asynchronously replicated to an additional read-only database instance. The read requests can be distributed between the primary database and its read-replication databases to increase read throughput. Be aware that you need to implement the distinction between read and write requests on the application level.

Figure 10.7 Read requests are distributed between the primary and read-replication databases for higher read performance.

Tweaking read performance with replication makes sense only if the application generates many read requests and few write requests. Fortunately, most applications read more than they write.

Creating a read-replication database

Amazon RDS supports read replication for MySQL, MariaDB, PostgreSQL, Oracle, and SQL Server databases. To use read replication, you need to enable automatic backups for your database, as shown in section 10.3.

Warning Starting an RDS read replica will incur charges. See https://aws.amazon.com/rds/pricing/ if you want to find out the current hourly price.

Execute the following command from your local machine to create a read-replication database for the WordPress database you started in section 10.1. Replace the $DBInstanceIdentifier with the value from aws rds describe-db-instances --query "DBInstances[0].DBInstanceIdentifier" --output text:

$ aws rds create-db-instance-read-replica 
 --db-instance-identifier awsinaction-db-read 
 --source-db-instance-identifier $DBInstanceIdentifier

RDS automatically triggers the following steps in the background:

  1. Creating a snapshot from the source database, also called the primary database instance

  2. Launching a new database based on that snapshot

  3. Activating replication between the primary and read-replication database instances

  4. Creating an endpoint for SQL read requests to the read-replication database instances

After the read-replication database is successfully created, it’s available to answer SQL read requests. The application using the SQL database must support the use of read-replication databases. WordPress, for example, doesn’t support read replicas by default, but you can use a plug-in called HyperDB to do so; the configuration is tricky, so we’ll skip this part. You can get more information here: https://wordpress.org/plugins/hyperdb/. Creating or deleting a read replica doesn’t affect the availability of the primary (writable) database instance.

Using read replication to transfer data to another region

RDS supports read replication between regions for Aurora, MariaDB, MySQL, Oracle, and PostgreSQL databases. You can replicate your data from the data centers in North Virginia to the data centers in Ireland, for example. Three major use cases for this feature follow:

  • Backing up data to another region for the unlikely event of an outage covering a complete region

  • Transferring data to another region to be able to answer read requests with lower latency

  • Migrating a database to another region

Creating read replication between two regions incurs an additional cost because you have to pay for the transferred data.

Promoting a read replica to a standalone database

Imagine your WordPress site became really popular. On the one hand, that’s a great success. On the other hand, you are struggling with handling all that load with your cloud infrastructure. You are thinking about adding read replicas to your database to decrease the load on the primary database.

If you create a read-replication database to migrate a database from one region to another, or if you have to perform heavy and load-intensive tasks on your database, such as adding an index, it’s helpful to switch your workload from the primary database to a read-replication database. The read replica must become the new primary database.

The following command promotes the read-replica database you created in this section to a standalone primary database. Note that the read-replication database will perform a restart and be unavailable for a few minutes:

$ aws rds promote-read-replica --db-instance-identifier awsinaction-db-read

The RDS database instance named awsinaction-db-read will accept write requests after the transformation is successful.

Cleaning up

It’s time to clean up, to avoid unwanted expense. Execute the following command:

$ aws rds delete-db-instance --db-instance-identifier 
 awsinaction-db-read --skip-final-snapshot

You’ve gained experience with the AWS relational database service in this chapter. We’ll end the chapter by taking a closer look at the monitoring capabilities of RDS.

10.7 Monitoring a database

To avoid downtime of your WordPress site, it is imperative that you monitor all important parts of your cloud infrastructure. The database is definitely one of the key components. That’s why you will learn about RDS monitoring in this section.

RDS is a managed service. Nevertheless, you need to monitor some metrics yourself to make sure your database can respond to all requests from applications. RDS publishes several metrics for free to AWS CloudWatch, a monitoring service for the AWS cloud. You can watch these metrics through the Management Console, as shown in figure 10.8, and define alarms for when a metric reaches a threshold.

Figure 10.8 Metrics to monitor an RDS database from the Management Console

Table 10.4 shows the most important metrics; we recommend that you monitor them by creating alarms.

Table 10.4 Important metrics for RDS databases from CloudWatch

Name

Description

FreeStorageSpace

Available storage in bytes. Make sure you don’t run out of storage space. We recommend setting the alarm threshold to < 2147483648 (2 GB).

CPUUtilization

The usage of the CPU as a percentage. High utilization can be an indicator of a bottleneck due to insufficient CPU performance. We recommend setting the alarm threshold to > 80%.

FreeableMemory

Free memory in bytes. Running out of memory can cause performance problems. We recommend setting the alarm threshold to < 67108864 (64 MB).

DiskQueueDepth

Number of outstanding requests to the disk. A long queue indicates that the database has reached the storage’s maximum I/O performance. We recommend setting the alarm threshold to > 64.

SwapUsage

If the database has insufficient memory, the OS starts to use the disk as memory (this is called swapping). Using the disk as memory is slow and will cause performance problems. We recommend setting the alarm threshold to > 268435456 (256 MB).

We recommend that you monitor these metrics in particular, to make sure your database isn’t the cause of application performance problems.

Cleaning up

It’s time to clean up to avoid unwanted expense. Execute the following command to delete all resources corresponding to the WordPress blogging platform based on an RDS database:

$ aws cloudformation delete-stack --stack-name wordpress

In this chapter, you’ve learned how to use the RDS service to manage relational databases for your applications. The next chapter will focus on in-memory caches.

Summary

  • RDS is a managed service that provides relational databases.

  • You can choose between PostgreSQL, MySQL, MariaDB, Oracle Database, and Microsoft SQL Server databases. Aurora is the database engine built by Amazon.

  • The fastest way to import data into an RDS database is to copy it to a virtual machine in the same region and load it into the RDS database from there.

  • RDS comes with built-in backup and restore functionality allowing you to create and restore snapshots on demand as well as to restore a database to a certain point in time.

  • You can control access to data with a combination of IAM policies and firewall rules and on the database level.

  • You can restore an RDS database to any time in the retention period (a maximum of 35 days).

  • RDS databases can be highly available. You should launch RDS databases in Multi-AZ mode for production workloads.

  • Read replication can improve the performance of read-intensive workloads on an SQL database.

  • CloudWatch metrics allow you to monitor your database, for example, to debug performance problems or decide when to increase or decrease the size of your database instance.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.19.30.232