Amazon Aurora is a fully managed relational database offered by Amazon Web Services (AWS). It has many similarities to Amazon Relational Database Service (RDS), which we learned about in the previous chapter, but it also has many exclusive features. Aurora is a major topic within the AWS Certified Database – Specialty exam and as it features many of the same technologies as RDS it is highly recommended that you study Chapter 4, Relational Database Service, before this one.
In this chapter, we will learn about Amazon Aurora's architecture and how it differs from RDS, how we can achieve high availability and design Aurora to allow rapid disaster recovery, and we'll learn about some advanced options and features that only exist within Aurora.
This chapter includes a hands-on lab where we will deploy, configure, and explore an Aurora cluster, including how we can monitor it.
In this chapter, we're going to cover the following main topics:
Let's start by making sure we understand what Aurora is, which database types it supports, and how it differs from RDS.
You will require an AWS account with root access. Note that Aurora is not included in the AWS Free Tier, so the hands-on lab in this chapter will incur a small charge unless you delete the resources promptly once you have finished. You will also require AWS Command-Line Interface (CLI) access. The AWS CLI Configuration Guide (https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html) will explain the steps required, but I will summarize them here:
You will also require a VPC that meets the minimum requirements for an RDS instance, which you can read about here: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_VPC.WorkingWithRDSInstanceinaVPC.html. If you completed the steps in Chapter 3, Understanding AWS Infrastructure, you will already have a VPC that meets the requirements.
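As a quick sanity check that both prerequisites are in place, you can run something like the following (the `--query` filters are just for readability; no resource names are assumed):

```shell
# Confirm the CLI is configured and can authenticate against your account.
aws sts get-caller-identity

# List your VPCs, then check you have subnets in at least two different
# Availability Zones -- the minimum RDS/Aurora needs for a DB subnet group.
aws ec2 describe-vpcs --query "Vpcs[].VpcId"
aws ec2 describe-subnets --query "Subnets[].[SubnetId,AvailabilityZone]" --output table
```

If the first command returns your account ID and ARN, the CLI is working; if the subnet list shows two or more AZs within one VPC, the networking prerequisite is met.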
Amazon Aurora is a managed database service. This means that AWS offers a wrapper around a relational database that handles many of the functions normally carried out by a Database Administrator (DBA). Where Aurora differs from RDS is performance: Aurora can run up to five times faster than a non-Aurora version of the same database. Aurora achieves these speeds by using a distributed storage system to avoid bandwidth and disk-read bottlenecks. Aurora has many benefits compared to RDS:
However, Aurora is more limited in the databases it supports compared to RDS and it can be harder to accurately calculate how much it will cost in advance.
Let's take a look at what database types Aurora supports and how this is decided.
Aurora is a type of RDS, so, therefore, it also only supports relational databases. However, because of the way in which Aurora works, it is described as being compatible with a database engine rather than using it. Currently, only two different database engines are compatible with Aurora:
As you can see, compared to RDS, the choices are much more limited with Aurora. You will also find that newer versions of PostgreSQL and MySQL typically take longer to be supported in Aurora than RDS because the Aurora code wrapper has to be rewritten to support any changes.
As Aurora only supports open source databases, there are no licensing considerations to worry about. With Aurora, you only pay for what you use and you do not need a third-party license.
Aurora is very similar to RDS for both compute and access restrictions, so we'll give a brief reminder here. If you are not using a Serverless Aurora cluster (which we will talk about in more depth in the Using Aurora's Global Database and Serverless options section of this chapter), the compute considerations are the same as for RDS. You will need to decide the size of the instance you need to handle your workload. The instance class can be changed after the database has been created.
Aurora also has similar restrictions on access as RDS; there is no access to the operating system or to root or sys accounts, and some functionality you would have on-premises has been changed to use Aurora-specific functions instead.
Aurora also has certain service limits. Service limits indicate the maximums that you can use within Aurora and include the maximum amount of storage you can assign to the database, the maximum number of database connections, and even the number of Aurora instances you can run in your account. Here are some of the most common limits you may come across. These are often asked about in the exam, so it is worth trying to remember these numbers:
There are many other less-common service limits on Aurora but these will not come up in the exam. Please refer to the AWS documentation for the full list, which you can find at the following link: https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/CHAP_Limits.html.
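You can also inspect the limits that apply to your own account programmatically via the Service Quotas service. A minimal sketch (the `--query` filter is illustrative and may need adjusting to match the exact quota names in your region):

```shell
# List RDS/Aurora quotas whose names mention clusters, e.g. the maximum
# number of DB clusters per account and region.
aws service-quotas list-service-quotas \
  --service-code rds \
  --query "Quotas[?contains(QuotaName, 'cluster')].[QuotaName,Value]" \
  --output table
```

This is handy for checking whether a quota has been raised from its default before you plan a large deployment.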
Now that we understand what Aurora is and some of its benefits and limitations, we can start to learn some of the specific features, starting with clusters and replicas.
Amazon Aurora has been designed to benefit from cloud technology and as a result, it can use cloud ideologies such as auto-scaling (both horizontal and vertical) and decoupling of different parts of the application to improve resilience in a deeper manner than RDS. Let's take a closer look at an Aurora cluster to see how it decouples the compute layer and the storage layer to offer high redundancy and fast scaling.
An Aurora cluster is made up of two different types of nodes:
The following diagram shows how an Aurora cluster is arranged:
All of the data is stored within the cluster volumes; the database instances themselves only hold transient data in memory, which is lost if the instance reboots. The database instances can be read/write (primary) nodes or read replicas, and they communicate with the cluster volumes in the same AZ to improve latency and minimize cross-AZ traffic. The cluster volumes handle all replication, removing this work from the database layer to further aid performance and to take advantage of the AWS storage backbone network. The database instances do not share data with each other. It is worth noting that even if you have a single database instance, it is still called a cluster, as it will contain the single database instance and six cluster volumes. However, an Aurora cluster can contain up to 15 read replicas, which we will learn about now.
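You can see this topology for yourself with the CLI. A sketch, assuming a cluster identifier of `my-aurora-cluster` (a placeholder):

```shell
# Show each member of the cluster, whether it is the writer, and its
# promotion tier (used to choose the failover order, covered shortly).
aws rds describe-db-clusters \
  --db-cluster-identifier my-aurora-cluster \
  --query "DBClusters[0].DBClusterMembers[].[DBInstanceIdentifier,IsClusterWriter,PromotionTier]" \
  --output table
```

Exactly one member should report `IsClusterWriter` as `True`; all others are readers.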
Amazon Aurora allows the creation of up to 15 read replicas, sometimes called reader instances, in the same region as the primary/writer instance. The reader instances can be used for two purposes:
A reader instance can be a different instance class to the writer instance, allowing you to optimize the compute capabilities between reads and writes.
Let's look in more depth at what happens when a writer instance fails.
In the case of a failure of the writer instance, Aurora will automatically promote one of the reader instances to become the new writer. Promoting means changing the instance's role from reader to writer. As you can create up to 15 reader instances, you can control the order in which they are promoted by assigning a tier to each reader instance. The lowest tier number denotes the highest priority for promotion, starting at tier zero. If more than one reader instance has the same tier, Aurora will promote the instance that is the same instance class size as the original writer if one exists; if not, it will pick one at random. This process takes less than 30 seconds to complete.
If you do not have a reader instance running when the writer instance fails, Aurora will recreate the writer instance for you. This will have considerably longer downtime than promoting a reader instance.
You can also manually promote a reader instance at any time without a failover. This creates a new standalone Aurora cluster to which you can now add reader instances.
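The tiering and failover behavior described above can be driven from the CLI. A sketch using placeholder identifiers:

```shell
# Make a specific reader the preferred failover target (lower tier number
# means higher promotion priority; tier 0 is promoted first).
aws rds modify-db-instance \
  --db-instance-identifier my-aurora-reader-1 \
  --promotion-tier 0 \
  --apply-immediately

# Trigger a manual failover to test the process -- the named reader is
# promoted to writer and the old writer becomes a reader.
aws rds failover-db-cluster \
  --db-cluster-identifier my-aurora-cluster \
  --target-db-instance-identifier my-aurora-reader-1
```

Running a test failover like this in a non-production cluster is a good way to measure the real downtime your applications would experience.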
We've looked at how read replicas or read instances work with Aurora but we haven't learned how we connect to them and how an application will send its read-only traffic to the right database instance. We do this via endpoints.
We learned in the previous section that Aurora uses a cluster endpoint that points to the writer instance. This endpoint is automatically moved if the writer instance fails and a reader instance is promoted. You can also create additional reader endpoints that will act as a load balancer to all the reader instances in your cluster.
You are also able to create custom endpoints for specific scenarios. For example, if you had a web application with read and write traffic and a reporting server with only reads, you might want to ensure the read traffic from each goes to different reader instances to balance the load. This method is also useful if you create reader instances of different sizes to suit different applications, so the reporting server might need to connect to a larger reader instance than the web application. You can also use custom endpoints to create groups of reader and writer instances, allowing for a highly specific configuration. Aurora will automatically stop sending traffic to a promoted, deleted, or shut-down instance. You can also tell Aurora to add new instances to a custom endpoint automatically, based on either a static member list or an exclusion list.
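Creating a custom endpoint for the reporting scenario above might look like this (cluster and instance identifiers are placeholders):

```shell
# Create a READER endpoint that only routes to one specific large reader,
# keeping reporting queries away from the web application's readers.
aws rds create-db-cluster-endpoint \
  --db-cluster-identifier my-aurora-cluster \
  --db-cluster-endpoint-identifier reporting-endpoint \
  --endpoint-type READER \
  --static-members my-aurora-reader-2
```

The reporting server would then connect to the DNS name this endpoint returns, while the web application keeps using the cluster's default reader endpoint.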
An example of how the endpoints can be configured is shown in the following diagram:
In this section, we've learned how an Aurora cluster works, how endpoints are used to control and configure access to the Aurora instances, and how they work with reader instances to split application traffic. In the next section, we are going to look at how Aurora is backed up and restored, and how you can migrate from RDS.
Ensuring your data is secure and can be restored rapidly is a critical part of any reliable and resilient database system. Aurora has multiple options for backup and recovery strategies.
Amazon Aurora is backed up continually and automatically, and a full system backup is also taken daily. The continual backups are taken throughout the day and do not have an impact on the performance of the database; this is a major advantage of the cluster volumes Aurora uses. The daily backup is taken during the defined backup window and can have a small impact on performance, so the backup window should be set to a non-peak time. The backups are held in S3 until the retention time is reached, at which point they are deleted. The retention time can be set between 1 and 35 days, and the default is one day regardless of whether the database is provisioned via the console or the awscli.
You can also make ad hoc backups at any time. These are called snapshots, and they are not deleted unless you do so manually, so they are often used to hold a backup beyond the retention period. Snapshots can also be used to create a new Aurora cluster, and they can be shared with other accounts.
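Taking a manual snapshot is a single CLI call. A sketch with placeholder identifiers:

```shell
# Take an ad hoc, manually managed snapshot of the whole cluster.
aws rds create-db-cluster-snapshot \
  --db-cluster-identifier my-aurora-cluster \
  --db-cluster-snapshot-identifier my-adhoc-snapshot

# List manual snapshots to confirm it was created.
aws rds describe-db-cluster-snapshots \
  --snapshot-type manual \
  --query "DBClusterSnapshots[].[DBClusterSnapshotIdentifier,Status]" \
  --output table
```

Unlike automated backups, this snapshot will persist after the retention period and even after the cluster itself is deleted, until you remove it.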
An Aurora cluster can be restored from any Aurora snapshot. To restore the system, you use the snapshot to create a new cluster, which allows you to change the name. The new cluster will be associated with the default parameter group unless you override it. If you need the same parameters or parameter group to be used, it is recommended that you do not remove the old group, as the new cluster can be associated with it again.
A snapshot can only be restored in the same region and account in which it is currently stored. If you wish to restore to a different account or region, you will need to copy or share the snapshot first. Let's learn how to do that now.
First, let's look at copying a snapshot to a different Region in the same account.
To copy a snapshot to a different region, you simply issue a copy with the new region as the destination. If the Aurora cluster is encrypted, then all of its backups will be too. As Key Management Service (KMS) keys are region-specific, the snapshot will need to be encrypted with a KMS key from the target region before you can copy it. Once the snapshot has been copied, you can use it to create a new Aurora cluster in the new region.
The snapshots do not automatically expire, so you will need to manually delete them to clear space if required; however, you must ensure the transfer has been completed fully before deleting the source snapshot, as its removal while the cross-region transfer is taking place can cause it to fail.
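A cross-region copy might look like the following sketch. The region names, account ID, snapshot names, and key alias are all placeholders:

```shell
# Run from the destination region (eu-west-1 here); the source snapshot is
# referenced by its full ARN. For an encrypted snapshot you must supply a
# KMS key that lives in the destination region.
aws rds copy-db-cluster-snapshot \
  --region eu-west-1 \
  --source-db-cluster-snapshot-identifier \
      arn:aws:rds:us-east-2:111122223333:cluster-snapshot:my-adhoc-snapshot \
  --target-db-cluster-snapshot-identifier my-adhoc-snapshot-copy \
  --kms-key-id alias/my-eu-west-1-key \
  --source-region us-east-2
```

Once the copy shows a status of available in the destination region, it is safe to delete the source snapshot if it is no longer needed.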
You can also share a snapshot with other AWS accounts within the same region. You can share an unencrypted snapshot publicly, which means any other AWS account can access it. If you wish to share an encrypted snapshot, you must also share the KMS key it was encrypted with. For security reasons, you cannot share an encrypted snapshot that was encrypted with the account default KMS key, as this could grant access to decrypt other databases or systems that used the same key. By default, you can share a snapshot with up to 20 other AWS accounts.
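Sharing is done by modifying the snapshot's restore attribute. A sketch, with a placeholder target account ID:

```shell
# Allow account 444455556666 to restore from this snapshot.
aws rds modify-db-cluster-snapshot-attribute \
  --db-cluster-snapshot-identifier my-adhoc-snapshot \
  --attribute-name restore \
  --values-to-add 444455556666
```

Passing `--values-to-add all` instead would make an unencrypted snapshot public; `--values-to-remove` revokes access again.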
If you wish to share with a different account in a different region, you must take a two-step approach, by doing either of the following:
Now we've learned how to work with Aurora backups, let's learn how to migrate an RDS database to Aurora.
Amazon Aurora is fully compatible with RDS MySQL and PostgreSQL. This means you can quickly and easily migrate from RDS to Aurora with minimal downtime. An RDS instance allows you to create an Aurora read replica that is designed specifically for migration.
When you first create an Aurora read replica, AWS takes a snapshot of your RDS instance and copies it to Aurora. This can take some time, up to several hours per tebibyte. Once the read replica is created, RDS will start sending its transaction logs to Aurora so that the data stays up to date. This is asynchronous replication, which means the databases will not always be in sync, and at busy times replication lag can build up. You should monitor the lag, and when it reaches zero, you can promote the Aurora read replica to become a standalone Aurora cluster. At this point, you can switch the application to the new Aurora cluster and delete the RDS database.
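The migration flow can be sketched from the CLI as follows. The ARN and identifiers are placeholders, and the exact engine version must match your source RDS instance:

```shell
# Create an Aurora cluster that acts as a read replica of an existing
# RDS MySQL instance, identified by its ARN.
aws rds create-db-cluster \
  --db-cluster-identifier aurora-migration-cluster \
  --engine aurora-mysql \
  --replication-source-identifier \
      arn:aws:rds:us-east-2:111122223333:db:my-rds-mysql

# Once replication lag has reached zero, detach the cluster from its
# source, making it a standalone, writable Aurora cluster.
aws rds promote-read-replica-db-cluster \
  --db-cluster-identifier aurora-migration-cluster
```

After the promotion, you would repoint the application's connection string at the new cluster endpoint and, once satisfied, delete the original RDS instance.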
Backtrack
Aurora MySQL offers a feature called Backtrack, which lets you rewind a database to a prior point in time without having to restore the entire database. If you are used to working with Oracle databases, you can consider it a similar feature to Oracle Flashback. You enable Backtrack by setting the Backtrack window for your database when the cluster is created or restored. Backtrack has a maximum window of 72 hours.
If you need to rewind the database, you can choose the exact moment at any time in your Backtrack window and the database will be put back as it was.
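A rewind might look like this sketch (the cluster identifier and timestamp are placeholders, and the cluster must have been created with a Backtrack window):

```shell
# Rewind the cluster to a specific moment within the Backtrack window.
# The timestamp is in UTC, ISO 8601 format.
aws rds backtrack-db-cluster \
  --db-cluster-identifier my-aurora-cluster \
  --backtrack-to 2023-05-17T10:00:00Z
```

Because Backtrack rewinds the existing cluster in place rather than creating a new one, there is no endpoint change for the application, which is what makes it so much faster than a point-in-time restore.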
After migrating from RDS to Aurora, you can take advantage of two Aurora-specific features that do not exist in RDS: Global Database and Serverless. Let's learn about them now.
Aurora offers two advanced features that can make a huge difference for certain use cases. In particular, customers who have a worldwide customer base can use Global Database options to reduce the latency between the database and applications around the world, improving performance. Customers with unpredictable or intermittent workloads can benefit from Aurora Serverless, where they can use a database without having to define the compute. Let's start by looking at Aurora Global Database in more depth.
Aurora Global Database allows you to create a cross-region Aurora cluster where you can send read requests all over the world. This allows you to have read replicas in the same regions as your applications and users to greatly reduce latency times and improve the performance of your applications.
Aurora Global Database can also offer rapid recovery from a region outage, as any of the secondary/read regions can be promoted to a primary writer region in under a minute.
There is no performance impact in enabling Global Database as the replication is handled at the cluster volume layer. The cross-region replication is asynchronous but typically suffers lag times of under a second, making it a good solution for read-heavy global workloads.
You are limited to a maximum of five secondary regions, allowing you to operate in six regions at any one time (including the primary region). The nodes in the secondary regions can differ in size and type from the primary, allowing for high customization to fit your use case and usage patterns. For example, if you wanted to run in three regions but the third region had far fewer customers, you could provision a t3.medium instance there instead of an m5.xlarge instance in the other two regions.
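Building a global database from an existing cluster can be sketched in two CLI calls. All identifiers, regions, and the ARN below are placeholders:

```shell
# Wrap an existing regional cluster in a new global database.
aws rds create-global-cluster \
  --global-cluster-identifier my-global-db \
  --source-db-cluster-identifier \
      arn:aws:rds:us-east-2:111122223333:cluster:my-aurora-cluster

# Add a read-only secondary cluster in another region by attaching it
# to the global database at creation time.
aws rds create-db-cluster \
  --region eu-west-1 \
  --db-cluster-identifier my-aurora-secondary \
  --engine aurora-mysql \
  --global-cluster-identifier my-global-db
```

You would then add instances to the secondary cluster as usual; they serve local reads until the region is promoted.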
Let's now look at another feature of Aurora: Aurora Serverless.
Aurora Serverless is an on-demand, auto-scaling version of Aurora. This means that you do not need to specify the compute or instance class, as Aurora Serverless does not run on a virtual machine you manage, but on AWS-managed capacity instead. Aurora Serverless scales automatically in a fraction of a second and can go from handling a few hundred transactions to hundreds of thousands. Aurora Serverless will also pause when not in use, making it cost-efficient. When Aurora Serverless pauses, it can take several seconds to wake up and allow transactions to start again. This is important, as the restart period is not instant, and therefore you need to carefully decide whether your workload will operate effectively with Aurora Serverless. The best use cases for Aurora Serverless are unpredictable workloads with sharp spikes and drops in usage. Aurora Serverless can almost instantly scale up and down to maintain the same performance level for end users, regardless of the workload.
Aurora Serverless offers the same features as Aurora, including Global Database and read replicas. You can also mix a cluster to feature both Serverless and standard provisioned nodes. This can be used to rapidly add fully automated scaling to any Aurora cluster, even one that's already running.
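Creating a Serverless cluster can be sketched as follows. Identifiers and credentials are placeholders, and the supported engine versions for Serverless mode change over time, so check the current list first:

```shell
# Create a Serverless Aurora MySQL cluster that scales between 1 and 64
# ACUs and pauses after 5 minutes of inactivity.
aws rds create-db-cluster \
  --db-cluster-identifier my-serverless-cluster \
  --engine aurora-mysql \
  --engine-mode serverless \
  --master-username admin \
  --master-user-password 'Str0ngPassw0rd!' \
  --scaling-configuration \
      MinCapacity=1,MaxCapacity=64,AutoPause=true,SecondsUntilAutoPause=300
```

Note that there is no `create-db-instance` step here: in this mode, the cluster itself is the compute, and AWS manages the capacity behind the endpoint.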
We've now learned about the key features of Aurora and how they can be used to meet different use cases. Questions around Global Database and Aurora Serverless do appear in the exam. Let's now look at how Aurora is priced in both provisioned mode and Serverless.
Aurora pricing differs between provisioned mode and Serverless. In provisioned mode, Aurora is priced in a similar way to RDS: you decide how much resource (CPU and memory) you need, as well as how much storage. In Serverless mode, you are billed in Aurora Capacity Units (ACUs), which are priced as a combination of CPU and memory. On top of this, you pay for any specific features you use, such as Global Database and Backtrack, and you also pay for read/write I/O usage in Aurora, which is included as standard in RDS.
To calculate your total Aurora costs, you will need to choose an instance size, database engine (MySQL or PostgreSQL), storage size, and I/O requirements. You can use the AWS Calculator to help you build your estimate. The following screenshot shows the figures you need to add to the Calculator for storage and I/O rates:
The Calculator URL is https://calculator.aws/.
Aurora provisioned pricing is very similar to RDS, which we covered in Chapter 4, Relational Database Service, so let's look more closely at Aurora Serverless pricing to understand how it differs.
You do not choose an instance class for Aurora Serverless; instead, you can set two optional parameters to control the minimum and maximum amount of CPU and memory resources available to your database. The resources are called ACUs. If you do not set these ACU values, then your Aurora instance can grow from zero ACUs (that is, the database will be shut down) up to 256 ACUs, which equates to an r5.16xlarge instance class. On top of this, Aurora can also automatically scale your reader nodes to the same size. If you recall, you can have 15 reader nodes, giving you the equivalent in processing power of 16 r5.16xlarge instances, which is enough to consistently manage 96,000 simultaneous connections.
Aurora bills each ACU's usage by the second, so you will only pay for what you use. If the database is shut down because it is not being used, you will only pay for the storage being used and no charges will apply for any ACUs.
If you decide to use Global Database, you will need to pay for resource usage in the secondary regions in the same way as in your primary region. This could be Aurora Serverless ACUs or provisioned compute. In addition, you will need to pay transfer fees. Transfer fees are paid when data is moved between regions or outside of an AWS data center, for example, if your application sends data back to an on-premises server. To calculate the charges for Global Database, you need to work out the write I/O in your primary region and then multiply it by the number of secondary regions. Once you have this figure, you can use the AWS Pricing Calculator to find the specific cost for your regions.
High-level pricing questions come up in the exam, often focused on how a customer is billed for using Serverless and what they would need to consider when using global tables, so it's important to understand these costs, but you will not be asked to calculate the actual costs.
We've now looked at all the key Aurora features, how it differs from RDS, and how it's priced. Aurora questions are featured heavily in the exam and you will often be asked workload-specific questions where you need to be able to differentiate between a workload only suitable for RDS versus when you might want to use Aurora Serverless. Let's now practice creating and working with an Aurora cluster in a hands-on lab.
Now we have learned about Aurora and its features, let's deploy our own cluster to practice and to see how the topics we've covered in this chapter work together. We will be deploying an Aurora cluster using the MySQL engine in Serverless mode and we'll then add Global Database. We'll use both the console and awscli for these steps.
We'll start by provisioning an Aurora MySQL cluster. We'll be using the Ohio (us-east-2) region. It is important to switch off encryption for this cluster, otherwise we will get an error when creating a global database. In a production environment, we would create a custom KMS key to be used for our multi-region databases, but for now, we will turn off encryption:
aws rds describe-db-instances
You will see output similar to the following:
{
    "DBInstances": [
        {
            "DBInstanceIdentifier": "dbcertaur1",
            "DBInstanceClass": "db.r5.large",
            "Engine": "aurora-mysql",
            "DBInstanceStatus": "available",
            ...
        }
    ]
}
Once the database has been provisioned and shows an AVAILABLE status, we can modify it.
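Rather than repeatedly polling `describe-db-instances` by hand, you can use the CLI's built-in waiter, which blocks until the instance is ready:

```shell
# Wait (polling in the background) until the instance reports 'available'.
# The command returns as soon as the status is reached, or errors out
# if the maximum number of polling attempts is exceeded.
aws rds wait db-instance-available --db-instance-identifier dbcertaur1
```

This is particularly useful in scripts, where the next modification step must not run until provisioning has finished.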
We are now going to add a new reader instance to our cluster, but we are going to deploy it using an auto-scaling policy:
Note
Take a look at the settings and options to ensure you understand what they mean and what they do before continuing.
If this is not visible immediately, go to the DB instance, and then the Logs and Events tab. You should see two entries like this, showing that the auto-scaling policy and the event to create a new read replica have been triggered:
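If you prefer the CLI to the console, the same kind of read replica auto-scaling policy can be sketched with Application Auto Scaling. The capacity bounds and target value here are illustrative choices, not required settings:

```shell
# Register the cluster's replica count as a scalable target (1-3 readers).
aws application-autoscaling register-scalable-target \
  --service-namespace rds \
  --resource-id cluster:dbcertaur1 \
  --scalable-dimension rds:cluster:ReadReplicaCount \
  --min-capacity 1 \
  --max-capacity 3

# Attach a target-tracking policy that adds or removes readers to keep
# average reader CPU utilization near 60%.
aws application-autoscaling put-scaling-policy \
  --service-namespace rds \
  --resource-id cluster:dbcertaur1 \
  --scalable-dimension rds:cluster:ReadReplicaCount \
  --policy-name reader-cpu-60 \
  --policy-type TargetTrackingScaling \
  --target-tracking-scaling-policy-configuration \
      '{"TargetValue":60.0,"PredefinedMetricSpecification":{"PredefinedMetricType":"RDSReaderAverageCPUUtilization"}}'
```

Aurora then creates and deletes reader instances automatically within the bounds you set, which is exactly the behavior you observed in the Logs and Events tab.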
Now we've got our cluster up and running with read replicas, let's make it global.
Now we have our Aurora cluster running and the read replica auto-scaling policy in place, let's create a global database. The first thing we need to do is change to a self-managed key; AWS default keys cannot be shared cross-region, so we would hit an error if we tried to use one:
You may get an error saying that the database version chosen doesn't meet the requirements for global databases. If this is the case, then click the Modify button and change to a supported type. You can see the supported types for Global Database by selecting the option. You may also need to change the instance size to a higher class if you did not select the r5.large option:
The compatible options change regularly so you'll need to refer to the AWS guides (https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/Concepts.DBInstanceClass.html) to find the right combination.
You've now created an Aurora global database across two regions. If you wish, you can now delete the Aurora cluster to save costs, as Aurora is not available on the Free Tier.
In this chapter, we have learned about Amazon Aurora. We have learned how Aurora differs from RDS, what database types are supported, how to deploy both a provisioned and Serverless Aurora cluster, and how to carry out some common maintenance and configuration tasks. We learned how to use both the AWS console and awscli to interact with our databases. These skills will enable us to work with Amazon Aurora databases confidently, as well as describe the use cases and benefits of Aurora compared to RDS.
During the AWS Certified Database – Specialty exam, your knowledge of Aurora will be tested heavily with questions around troubleshooting, service limits, Serverless and Global Database features, and migrating from RDS.
In the next chapter, we will be learning about AWS DynamoDB, which is a NoSQL database designed and fully managed by AWS. DynamoDB is very different from both RDS and Aurora as it supports unstructured data and does not rely on complex queries with joins.
This cheat sheet reminds you of the high-level topics and points covered in this chapter and should act as a revision guide and refresher:
Let's now check your knowledge of what you have learned during this chapter.
To check your knowledge from this chapter, here are five questions that you should now be able to answer. Remember the exam techniques from Chapter 1, AWS Certified Database – Specialty Exam Overview, and remove the clearly incorrect answers first to help you:
Answers with explanations can be found in Chapter 17, Answers.
In this chapter, we have covered the most common Aurora topics. In the AWS Certified Database – Specialty exam, you will be expected to know and understand how other areas of AWS interact with Aurora, which we will cover in more depth in later chapters. However, for a deeper understanding of how the underlying storage and network configuration of Aurora works, refer to the book AWS: Security Best Practices on AWS (https://subscription.packtpub.com/book/virtualization_and_cloud/9781789134513/2/ch02lvl1sec19/aws-kms).