Chapter 2. Amazon Machine Images

A beginning is a very delicate time

Princess Irulan Corrino

At its core, the EC2 service is a self-service interface to provision virtual machines, either automatically or in interactive fashion. We have already seen how with nothing more than an AWS account and a credit card, an administrator can launch a system (or a few dozen) in seconds. In this chapter we examine how virtual machines can be manipulated in their offline state, through their Amazon Machine Image (AMI) representation. The operator can derive a machine image from another using snapshots, limiting repetitive work and opportunities for error.

What is an AMI?

Some AMIs are virtual appliances—preconfigured server images running a variety of operating systems and software stacks. Amazon provides a number of its own images, running open source and commercial software, and allows any third party to distribute their images through the AWS Marketplace. You can also create your own images, configured exactly to meet your requirements, and share them with a few selected accounts or choose to make them public altogether.

Building your own AMIs has a number of benefits. You get to customize the software selection and configure which services will start when the instance is launched. Any services that are not required can be disabled to cut down on wasted resources. Later chapters show how to launch instances automatically in response to external conditions such as traffic levels (when instances are launched in response to growing demand, it is important they are ready for service as soon as possible).

Once an instance has been configured and an image created from it, that configuration is baked into the AMI. As we look at configuration management tools in Chapter 5, we will see how tools like Puppet can be used to dynamically configure an instance. This raises the question of how much of the configuration should be baked into the AMI, and how much should be dynamically configured.

At one end of the spectrum, you can deploy an entirely vanilla Ubuntu image, automatically install a configuration management tool such as Puppet, and then apply your desired configuration to start up the correct services (such as Nginx for a web server). At the other end of the spectrum, you could create a custom AMI for each specific role within the application: one for the database server, one for the web server, and so on. In the latter case, all configuration options are baked into the AMI, and no dynamic configuration is performed when the instance is launched.
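As an illustration of the vanilla-image end of this spectrum, the dynamic approach can be sketched as a cloud-init user-data file passed to the instance at launch (via the --user-data option of aws ec2 run-instances). Everything below is illustrative: the Puppet server name is a placeholder, and the puppet-agent package assumes Puppet's apt repository has been configured on the image.

```yaml
#cloud-config
# Bootstrap a configuration management agent on first boot
package_update: true
packages:
  - puppet-agent
runcmd:
  # puppet.example.com stands in for your own Puppet server
  - /opt/puppetlabs/bin/puppet agent --server puppet.example.com --onetime --no-daemonize
```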

In our experience, the best option is somewhere in the middle: some roles have their own AMI, whereas other AMIs perform multiple roles. The most efficient choice will depend on various factors, including the type of software you deploy and how frequently you modify the server configuration. If it is important for newly launched instances to start serving requests as quickly as possible (which includes practically all uses of Auto Scaling), you’ll want to reduce the amount of automatic configuration that takes place on boot.

At its core, an AMI is essentially a disk image and a metadata file describing how that disk image can be used to launch a virtual server. The metadata file keeps track of internal information that is required when launching instances from this AMI. Multiple disk images can be included within a single AMI, with the metadata file mapping how volumes are attached to their instance. It is important to note that the current Nitro virtualization system regards AMIs just as if they were real disks, and no special provisions are necessary to use custom kernels — to the point that even UEFI boot modes are optionally available to replace the traditional BIOS.

Tip

EC2 supports the execution of user-provided operating system kernels. The mechanism to accomplish this has evolved over time, originally employing Amazon Kernel Images (AKI), and the PV-GRUB bootloader to support paravirtualized bootstrap.

This approach became unnecessary with the arrival of HVM virtualization, which regards the AMI disk image just as if it were a physical device. Older legacy instance types were retrofitted in December of 2017 to support HVM virtualization in addition to the original legacy hypervisor, rendering any new use of paravirtualized kernels obsolete.

Note

Ubuntu introduced the use of AWS-tuned Linux kernels in April of 2017. The purpose-built package boots 30% faster and includes EC2-specific storage and network support. Official Ubuntu images are configured to use the AWS-tuned kernel by default, and should you need to include a custom-compiled kernel in an AMI for any reason, the linux-aws package would be the starting point to build it.

Note that the default linux-aws became a rolling kernel in 2020, meaning it can upgrade to newer upstream versions as part of its regular release stream. A linux-aws-lts kernel flavor is separately maintained for operators wishing to minimize changes.

AMI Types

In the early days of EC2, the only available AMI type was what is now known as an instance store-backed AMI. As the Elastic Block Store service was introduced and evolved, an additional type of AMI was created: the EBS-backed AMI. The key architectural difference between the two is in where the disk image that contains the root volume is stored.

For EBS-backed AMIs, the disk image is simply an EBS snapshot. When launching a new instance from such an image, a volume is created using this snapshot, and this new volume is used as the root device on the instance.

Instance store-backed AMIs are created from template disk images stored in S3, which means the disk image must be copied from S3 each time an instance is launched, introducing a startup delay over EBS-backed instances. Because the image must be downloaded from S3 at launch, the root volume size is also limited to 10 GB, whereas EBS-backed instances have their root volumes limited to a more generous 16 TB. The choice of storage technology also affects instance boot time: EBS-backed AMIs typically start in under one minute, whereas instance store start-up times can reach as high as five minutes.

In practice, an EBS-backed AMI is nearly always the best option. Instances launched from this type of AMI can be temporarily stopped and restarted without losing any data, whereas instances launched from instance store-backed AMIs can only be terminated, at which point all data stored on the volume is lost.

It is easy to identify what type of storage is in use by a given AMI. In the AWS Console, this attribute is found in the Root Device Type field in the Details tab of an AMI. The command line approach is equally straightforward:

$ aws ec2 describe-images --image-ids ami-0be3f0371736d5394 \
    | jq '.. | .RootDeviceType?'
"ebs"
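jq's recursive-descent filter (..) visits every node in the response. As a rough offline illustration of what that filter does, here is the same search implemented in Python against a skeletal stand-in for the describe-images response (the document below is abbreviated, not real API output):

```python
import json

# Skeletal stand-in for the describe-images response shown above
doc = json.loads('{"Images": [{"ImageId": "ami-0be3f0371736d5394", '
                 '"RootDeviceType": "ebs"}]}')

def find_key(node, key):
    """Recursively yield every value stored under the given key,
    mimicking jq's '.. | .key?' recursive descent."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            yield from find_key(v, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_key(item, key)

print(list(find_key(doc, "RootDeviceType")))  # → ['ebs']
```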

Building Your Own AMI

AMI builds should be automated as soon as possible, if they take place with any kind of regularity. It is tedious work and involves a lot of waiting around. Automating the process means you’ll probably update AMIs more frequently, reducing a barrier to pushing out new features and software upgrades. Imagine you learn of a critical security flaw in your web server software that must be updated immediately. Having a procedure in place to create new AMIs and push them into production will help you respond to such scenarios rapidly and without wasting lots of time.

To demonstrate the procedures for creating an AMI and some of the useful features that AMIs provide, let’s create an AMI using the command-line tools. This AMI will run an Nginx web server that displays a simple welcome page. We will look at a method of automating this procedure later in the book, in Chapter 10.

Begin by selecting an AMI to use as a base. We will be using our usual Ubuntu 20.04 image with the ID ami-0be3f0371736d5394. Launch an instance of this AMI with aws ec2 run-instances, remembering to specify a valid key pair name and security group granting access to SSH, then use aws ec2 describe-instances to find out the public DNS name for the instance:

$ # if you have not created a security group for SSH access yet,
$ # you need to do that first:

$ aws ec2 create-security-group --group-name ssh --description "SSH Access"
{
    "GroupId": "sg-07ec8513a0bf9acf2"
}

$ aws ec2 authorize-security-group-ingress --group-name ssh --protocol tcp \
--port 22 --cidr 0.0.0.0/0

$ aws ec2 describe-security-groups --group-names ssh --output text
SECURITYGROUPS	SSH Access	sg-07ec8513a0bf9acf2	ssh	740376006796	vpc-934935f7
IPPERMISSIONS	22	tcp	22
IPRANGES	0.0.0.0/0
IPPERMISSIONSEGRESS	-1
IPRANGES	0.0.0.0/0

$ aws ec2 run-instances --image-id ami-0be3f0371736d5394 --region us-east-1 \
--key federico  --security-groups ssh --instance-type t3.micro
740376006796	r-070f3b8d826755da5
INSTANCES	0	x86_64	5b656d7d-6324-466a-a6b5-5e581fcc4201	False	True	xen	ami-0be3f0371736d5394	i-0d3f9965712b45500	t3.micro	federico	2021-01-09T16:09:06.000Z	ip-172-31-89-214.ec2.internal	172.31.89.214		/dev/sda1	ebs	True		subnet-c2b704ce	hvm	vpc-934935f7
[ output truncated ]

$ aws ec2 describe-instances --instance-ids i-0d3f9965712b45500 \
--region us-east-1 --output text
RESERVATIONS	740376006796	r-070f3b8d826755da5
INSTANCES	0	x86_64	5b656d7d-6324-466a-a6b5-5e581fcc4201	False	True	xen	ami-0be3f0371736d5394	i-0d3f9965712b45500	t3.micro	federico	2021-01-09T16:09:06.000Z	ip-172-31-89-214.ec2.internal	172.31.89.214	ec2-34-236-36-177.compute-1.amazonaws.com	34.236.36.177	/dev/sda1	ebs	True		subnet-c2b704ce	hvm	vpc-934935f7
[ output truncated ]

Once the instance has launched, we need to log in via SSH to install Nginx. If you are not using Ubuntu, the installation instructions will differ slightly. On Ubuntu, update the package repositories and install Nginx as follows:

$ ssh ubuntu@ec2-34-236-36-177.compute-1.amazonaws.com
The authenticity of host 'ec2-34-236-36-177.compute-1.amazonaws.com (34.236.36.177)' can't be established.
ECDSA key fingerprint is SHA256:ERzqPqdJzxJwYYqmCF0uYof4y9K4NDTo1pBFkCE7exQ.
Are you sure you want to continue connecting (yes/no)? yes

$ sudo apt update
[ output truncated ]

$ sudo apt install nginx-full --assume-yes
[ output truncated ]

By default, Nginx is installed with a welcome page stored at /usr/share/nginx/html/index.html. If you like, you can modify this file to contain some custom content.

Once the instance is configured, we need to create a matching AMI using aws ec2 create-image. This command will automatically create an AMI from a running instance. Doing so requires that the instance be stopped and restarted, so your SSH session will be terminated when you run this command. In the background, a snapshot of the EBS volumes used by your instance will be made. This snapshot will be used when launching new instances under the newly created AMI ID. Because it can take some time before snapshots are ready to use, your new AMI will remain in the pending state for a while after aws ec2 create-image completes. The image cannot be used until it enters the available state. You can check on its status in the Management Console or with the aws ec2 describe-images command:

$ aws ec2 create-image --instance-id i-0d3f9965712b45500 --region us-east-1 \
--name test-image --output text
ami-0bfd16f7927a830a5

$ aws ec2 describe-images --region us-east-1 --image-ids ami-0bfd16f7927a830a5 \
--output text
IMAGES	x86_64	2021-01-10T02:31:57.000Z	True	xen	ami-0bfd16f7927a830a5	740376006796/test-image	machine	test-image	740376006796	Linux/UNIX	False	/dev/sda1	ebs	simple	pending	RunInstances	hvm
BLOCKDEVICEMAPPINGS	/dev/sda1
EBS	True	False	snap-0e8fda8383ec4176a	8	gp2
BLOCKDEVICEMAPPINGS	/dev/sdb	ephemeral0
BLOCKDEVICEMAPPINGS	/dev/sdc	ephemeral1

When your new image is ready, it can be launched by any of the means described previously. Launch a new instance based on this image and get the public DNS name with aws ec2 describe-instances. Connect via SSH, then confirm that Nginx has started automatically:

$ service nginx status
● nginx.service - A high performance web server and a reverse proxy server
     Loaded: loaded (/lib/systemd/system/nginx.service; enabled; vendor preset: enabled)
     Active: active (running) since Sun 2021-01-10 04:16:04 UTC; 2min 39s ago
       Docs: man:nginx(8)
   Main PID: 645 (nginx)
      Tasks: 3 (limit: 1134)
     Memory: 11.8M
     CGroup: /system.slice/nginx.service
             ├─645 nginx: master process /usr/sbin/nginx -g daemon on; master_process on;
             ├─646 nginx: worker process
             └─647 nginx: worker process

[ output truncated ]

Although we have configured Nginx and have a running web server, you can’t access the Nginx welcome page just yet. If you try to visit the instance’s public DNS name in your web browser, the request will eventually time out. This is because EC2 instances are, by default, protected by a firewall that allows only connections from instances in the same security group—incoming HTTP connections have to be explicitly enabled with the same processes we used to allow inbound SSH connections. These firewalls, known as security groups, are discussed in the next chapter.

Tip

An EC2 instance’s boot time is brief enough to be of no concern for most operators, but the software stack running on it may have a lengthier initialization time than a web server. EC2 addresses the need for prompt startup of complex workloads with creative use of Linux’s hibernation support, which is also a handy strategy to preserve some application caches with long warm-up times.

Hibernate requires an OS supporting it (Ubuntu and Amazon Linux both do), launched on an instance type capable of hibernation, with appropriate options (--hibernation-options Configured=true), and a custom block device mapping file allocating disk space to persist RAM to disk. Additional configuration steps apply: on Ubuntu, disabling kernel ASLR is required. While not the simplest feature to use, hibernate resolves startup speed requirements that could otherwise go unresolved. Fortunately, Amazon has documented all the required steps in detail in this example.

Remember that both this instance and the original instance from which we created the image are still running. You might want to terminate them before moving on to the next section. A two-line script can be used to terminate all running EC2 instances in your account, cleaning the slate after running a few experiments.
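A sketch of such a cleanup script follows, slightly expanded with a safety check so it does nothing when no instances are running. Be warned that it terminates every running instance in the region, so reserve it for throwaway experiments:

```shell
# Collect the IDs of all running instances, then terminate them
KILL_LIST=$(aws ec2 describe-instances --output text \
    --filters Name=instance-state-name,Values=running \
    --query 'Reservations[].Instances[].InstanceId')
if [ -n "$KILL_LIST" ]; then
    aws ec2 terminate-instances --instance-ids $KILL_LIST
fi
```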

Tagging your images is a good way to keep track of them. This can be done with the aws ec2 create-tags command. By using backticks to capture the output of shell commands, you can quickly add useful information, such as who created the AMI, as well as static information like the role:

$ aws ec2 create-tags --resources ami-0bfd16f7927a830a5 \
--tags Key=role,Value=webserver Key=created-by,Value=`whoami` \
Key=stage,Value=production

$ aws ec2 describe-tags --output text | grep ami-0bfd16f7927a830a5
TAGS	created-by	ami-0bfd16f7927a830a5	image	federico
TAGS	role	ami-0bfd16f7927a830a5	image	webserver
TAGS	stage	ami-0bfd16f7927a830a5	image	production

Deregistering AMIs

Once an AMI is no longer required, it should be deregistered, which means it will no longer be available to use for launching new instances. Although AMIs are not particularly expensive, it is good practice to regularly remove old ones: they clutter up the interface, and the snapshots behind them accumulate into a gradual increase of your AWS costs.

Tip

A good way to identify snapshots in your account ripe for deletion is to retrieve the complete listing of snapshots associated with your OwnerID and apply additional filtering. The OwnerID for your account can be found in the Account Identifiers section of the Security Credentials page, but the handy alias self is always available. To list all your snapshots, enter:

aws ec2 describe-snapshots --owner-ids self \
--output text

You must also delete the snapshot used to create the root volume. This will not happen automatically.

AWS used to allow you to delete the snapshot before deregistering the AMI. Doing so left you with an AMI that looked as though it was available and ready for use, but would, in fact, fail when you tried to launch an instance. This has since changed, in one of countless examples of AWS transparently improving infrastructure behind the scenes. Deleting the snapshot backing an AMI now fails with a helpful error message:

$ aws ec2 delete-snapshot --region us-east-1 --snapshot-id snap-0e8fda8383ec4176a

An error occurred (InvalidSnapshot.InUse) when calling the DeleteSnapshot operation: The snapshot snap-0e8fda8383ec4176a is currently in use by ami-0bfd16f7927a830a5

However, operators can still get themselves in trouble by carelessly decommissioning AMIs. If a deregistered AMI is referenced in an Auto Scaling group, it might be some time before you notice the problem. The only option in that case is to quickly create a new AMI and update the Auto Scaling group.

You can check to see whether a particular AMI is in use by running instances with the aws ec2 describe-instances command. For example:

$ aws ec2 describe-instances --output text --filters \
Name=image-id,Values=ami-0bfd16f7927a830a5
RESERVATIONS	740376006796	043234062703	r-04284239167699bbd
INSTANCES	0	x86_64	examp-Ec2In-1RM2I5Y54AWJ1	False	True	xen	ami-0bfd16f7927a830a5	i-0015bd2909173d777	t3.micro	2021-01-11T04:05:23.000Z	ip-172-31-88-197.ec2.internal	172.31.88.197	ec2-3-239-84-160.compute-1.amazonaws.com	3.239.84.160	/dev/sda1	ebs	True		subnet-c2b704ce	hvm	vpc-934935f7
[ output truncated ]
TAGS	aws:cloudformation:stack-id	arn:aws:cloudformation:us-east-1:740376006796:stack/example-stack/37295ab0-53c2-11eb-b1cc-0a23b021aa67
TAGS	aws:cloudformation:stack-name	example-stack
TAGS	aws:cloudformation:logical-id	Ec2Instance

This works for individual instances. For instances that were launched as part of an Auto Scaling group, we can instead use the aws autoscaling describe-launch-configurations command. Unfortunately, this command does not accept a filter argument, so it cannot be used in quite the same way. As a workaround, you can grep the output of aws autoscaling describe-launch-configurations for the AMI ID.
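Alternatively, because --query expressions are evaluated client-side by the CLI itself (unlike --filters), they can stand in for the missing server-side filter. A sketch, using the AMI ID from our example:

```shell
# List the names of any launch configurations still referencing the AMI
IN_USE=$(aws autoscaling describe-launch-configurations --output text \
    --query "LaunchConfigurations[?ImageId=='ami-0bfd16f7927a830a5'].LaunchConfigurationName")
echo "$IN_USE"
```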

Performing these checks before deleting AMIs en masse can save you from a rather irritating cleanup exercise.

Once you are sure the AMI is safe to deregister, you can do so with aws ec2 deregister-image:

$ aws ec2 deregister-image --image-id ami-0bfd16f7927a830a5 --region us-east-1 

Remember to delete the snapshot that was used as the root volume of the AMI. You can find it through the aws ec2 describe-snapshots command. When AWS creates a new snapshot, it uses the description field to store the ID of the AMI it was created for, as well as the instance and volume IDs referencing the resources it was created from. Therefore, we can use the AMI ID as a filter in our search, returning the ID of the snapshot we want to delete:

$ aws ec2 describe-snapshots --region us-east-1 --output text --filters \
Name=description,Values="Created by CreateImage*for ami-0bfd16f7927a830a5*"

SNAPSHOTS	Created by CreateImage(i-0d3f9965712b45500) for ami-0bfd16f7927a830a5 from vol-089c950bd54753f4e	False	740376006796	100%	snap-0e8fda8383ec4176a	2021-01-10T02:33:27.412Z	completed	vol-089c950bd54753f4e	8

$ aws ec2 delete-snapshot --snapshot-id snap-0e8fda8383ec4176a \
--region us-east-1

Filter string wildcards (the * character) are a really handy feature, and are needed here to match the unknown volume and instance identifiers found in the description string.
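These wildcards follow familiar shell globbing semantics. Python's fnmatch module implements the same rules, which makes for a quick offline demonstration of how the filter matches our snapshot's description:

```python
from fnmatch import fnmatchcase

# The description AWS generated for our snapshot, and the wildcard
# pattern used in the describe-snapshots filter above
description = ("Created by CreateImage(i-0d3f9965712b45500) "
               "for ami-0bfd16f7927a830a5 from vol-089c950bd54753f4e")
pattern = "Created by CreateImage*for ami-0bfd16f7927a830a5*"

matched = fnmatchcase(description, pattern)
print(matched)  # → True
```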

Automatic snapshot management long featured on everyone's list of missing AWS features. A Google search produces a long list of (mostly out-of-date) user-contributed scripts built on the EC2 API to work around this deficiency. In 2019 Amazon finally introduced the eagerly awaited Data Lifecycle Manager service to address this gap, which is the object of the next section. Before we take the easy way out, let's examine snapshot management as a great example of how API access and automation can combine to fill any kind of "missing feature" gap.

Querying Snapshots

The AWS CLI has improved over the years to the point where finding snapshots has become much easier; it just requires a little skill with the --query filter. For example, one can find all the snapshots taken before a certain date with the following:

$ aws ec2 describe-snapshots --owner-ids self --output text \
--query 'Snapshots[?StartTime<=`2020-05-31`]'

Created by CreateImage(i-b0b59281) for ami-7c81c46b from vol-03d9b5d6	False	740376006796	100%	snap-514c5be2	2016-09-25T23:22:52.000Z	completed	vol-03d9b5d6	8
Created by CreateImage(i-0bd62c3b) for ami-cc491edb from vol-44fce591	False	740376006796	100%	snap-6d9afde5	2016-10-16T07:39:19.000Z	completed	vol-44fce591	8
TAGS	Name	Mezzanine test snap page 190 at exit, before db is deleted
Created by CreateImage(i-a6844bbe) for ami-6a5d367d from vol-73fe60d4	False	740376006796	100%	snap-428f90da	2016-09-03T20:49:26.000Z	completed	vol-73fe60d4	8

We just found some snapshots Federico forgot to delete after finishing work on the first edition of the Peccary Book. Similarly, we can tabulate all snapshots belonging to our account with only selected attributes included in our query’s results. Just like jq, the query option lets us re-label fields on the fly using a dictionary:

$ aws ec2 describe-snapshots --owner-ids self \
--query 'Snapshots[*].{Label:Description,ID:SnapshotId,Time:StartTime}'

[
    {
        "Label": "Created by CreateImage(i-b0b59281) for ami-7c81c46b from vol-03d9b5d6",
        "ID": "snap-514c5be2",
        "Time": "2016-09-25T23:22:52.000Z"
    },
    {
        "Label": "Created by CreateImage(i-0bd62c3b) for ami-cc491edb from vol-44fce591",
        "ID": "snap-6d9afde5",
        "Time": "2016-10-16T07:39:19.000Z"
    },
    {
        "Label": "Created by CreateImage(i-a6844bbe) for ami-6a5d367d from vol-73fe60d4",
        "ID": "snap-428f90da",
        "Time": "2016-09-03T20:49:26.000Z"
    },
    {
        "Label": "Created by CreateImage(i-0d3f9965712b45500) for ami-0bfd16f7927a830a5 from vol-089c950bd54753f4e",
        "ID": "snap-0e8fda8383ec4176a",
        "Time": "2021-01-10T02:33:27.412Z"
    }
]
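Incidentally, the StartTime comparison in the first query works because ISO-8601 timestamps sort lexicographically in the same order as chronologically, so plain string comparison is enough. A quick offline check with two of the records above:

```python
# Two snapshot records lifted from the listing above
snapshots = [
    {"SnapshotId": "snap-514c5be2", "StartTime": "2016-09-25T23:22:52.000Z"},
    {"SnapshotId": "snap-0e8fda8383ec4176a",
     "StartTime": "2021-01-10T02:33:27.412Z"},
]

# String comparison reproduces the JMESPath StartTime<=`2020-05-31` test
old = [s["SnapshotId"] for s in snapshots if s["StartTime"] <= "2020-05-31"]
print(old)  # → ['snap-514c5be2']
```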

Combining and extending these commands, we can automate our annual snapshot cleanup with just a little bit of help from the Linux shell to manipulate dates:

#! /bin/bash
REGION=us-east-1

echo "Clearing all EC2 snapshots older than one year from $REGION"

AGE_FILTER=$(date +%Y-%m-%d --date '1 year ago')
SNAPSHOTS=$(aws ec2 describe-snapshots --owner-ids self \
--query "Snapshots[?StartTime<=\`$AGE_FILTER\`].{ID:SnapshotId}" --output text)
for i in $SNAPSHOTS
do
 echo "deleting $i"
 aws ec2 delete-snapshot --region $REGION --snapshot-id $i
done

The results are quite self-explanatory:

$ ./snapshot-annual-purge.sh
Clearing all EC2 snapshots older than one year from us-east-1
deleting snap-514c5be2
deleting snap-6d9afde5
deleting snap-428f90da

The application of --filters also enables searching for items tagged with a certain value. The following searches for any images tagged as retired from our production environment:

$ aws ec2 describe-images --filters Name=tag-key,Values="stage" \
Name=tag-value,Values="retired" --output text
IMAGES	x86_64	2017-05-29T04:16:43.000Z	xen	ami-838ac495	740376006796/test-image	machine	test-image	740376006796	False	/dev/sda1	ebs	simple	available	hvm
BLOCKDEVICEMAPPINGS	/dev/sda1
EBS	True	False	snap-1b793584	8	standard
BLOCKDEVICEMAPPINGS	/dev/sdb	ephemeral0
BLOCKDEVICEMAPPINGS	/dev/sdc	ephemeral1
TAGS	stage	retired

Automating Deletion

The cleanup we performed manually can be automated with a single consolidated Boto script, shown in Example 2-1. This script will delete all images with a staging lifecycle tag set to a value of retired.

Example 2-1. Deleting images with a Python script
#!/usr/bin/env python

from boto.ec2 import connect_to_region

ec2_conn = connect_to_region('us-east-1')

print 'Deleting retired AMI images.\n'

for image in ec2_conn.get_all_images(owners='self', filters={'tag:stage': 'retired'}):
    print ' Deleting image %s and associated snapshot' % (image.id)
    image.deregister(delete_snapshot=True)

This script relies on your AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables being set—Boto will attempt to read these automatically. It will delete all images (and the associated snapshots) that have been placed in the retired staging environment. To use this script, make sure your instances follow the tagging strategy described in “Tagging Strategy”. Save this file as delete-retired-amis.py and use chmod to make it executable.

The call to get_all_images specifies some filter conditions: we are interested in images that have a stage tag with a value of retired.

Deleting an image does not automatically delete the snapshot it uses for its root volume, so we must do this ourselves by setting the delete_snapshot parameter of deregister to True.

Warning

A snapshot can be shared with other organizations by modifying its permissions to include another account’s ID in the AWS console. Make sure you avoid making public any snapshots containing private data, even for short intervals. Hackers can use the AWS API to trivially discover and instantly clone any such snapshots in their quest for valuable data and credentials.

Security researchers have demonstrated exfiltration of SSH login keys, AWS credentials, API access keys, confidential genome sequences, and even the full payroll of a Fortune 100 company from snapshots that were lazily configured with public access permissions.

The Data Lifecycle Manager

The natural destiny of the automation approach designed in the last section is to become a cloud-hosted service, ideally an AWS Lambda function. Amazon did just that for us in 2019, introducing the Data Lifecycle Manager service to automate the management of EBS artifacts.

The Data Lifecycle Manager service provides operators with a mechanism to automatically manage snapshots, as well as AMIs backed by EBS. DLM works by defining policies for their automated creation or deletion, and even supports management actions on volume copies incoming from other AWS accounts.

Figure 2-1. Defining a lifecycle policy purging snapshots older than a year

The new service can be controlled from the AWS CLI through the aws dlm set of commands, but we find that initial policy creation is most easily carried out with the guided wizard in the AWS Console (Figure 2-1).

To define a snapshot creation cycle with cleanup logic matching our scripted automation example, choose EBS snapshots as the object of the policy. Select “instance” to limit its scope to instances the volumes are attached to (an alternative is to use volumes already pre-tagged by separate automation). We are targeting instances tagged with a stage value of production, and are not tagging the lifecycle policy itself at this time. DLM automatically applies the dlm:managed tag to label all entities it manages, and includes additional policy ID, schedule ID, and expiration time metadata by default in other tags. Further on in the wizard, toggle the option to copy tags from the source volume to maintain the consistency of stage labels found in this deployment.

The next section of the wizard requires the IAM role these operations are to execute under—we will discuss roles in the next chapter as we illustrate AWS security, but for now the default option will suffice. At least one schedule needs to be attached to the policy, and up to four can be defined. In Figure 2-2 we define a policy taking snapshots every Thursday, to be retained for a year. A useful alternative is count-based retention, where a maximum total number of snapshots is preserved and the oldest are automatically deleted once that threshold is exceeded.

Activate the policy by selecting the “enable policy” radio button and clicking on “create policy”. Snapshots of production EBS volumes will now be taken weekly on Thursday, and discarded automagically a year later. Automating these steps saves the operator from the need to track and perform manual cleanups, and rescues the CIO from incurring needless additional cost.

Figure 2-2. Defining a schedule for execution of the lifecycle policy
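For reference, the same weekly policy can be expressed as the JSON policy document accepted by aws dlm create-lifecycle-policy --policy-details. The values below are an illustrative sketch of the wizard's choices, not the console's exact output:

```json
{
  "ResourceTypes": ["INSTANCE"],
  "TargetTags": [{"Key": "stage", "Value": "production"}],
  "Schedules": [{
    "Name": "WeeklyThursdaySnapshots",
    "CopyTags": true,
    "CreateRule": {"CronExpression": "cron(0 9 ? * 5 *)"},
    "RetainRule": {"Interval": 365, "IntervalUnit": "DAYS"}
  }]
}
```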

DLM is a great instrument to automate backup snapshots, ensuring that the cleanup of older artifacts is also taken care of as part of the process. This new service includes advanced options to replicate snapshots to alternate regions, supporting disaster recovery architectures, and even supports copying EBS artifacts across accounts, providing a logical substitute for offsite backups.

Pets versus Cattle

Microsoft’s Bill Baker is credited with originating the metaphor popularized by OpenStack’s Randy Bias that so vividly illustrates two radically opposite approaches to managing servers. In this tale, pets are lovingly cared for, taken to the vet when they get sick, and tenderly nursed back to health—cattle, on the other hand, are replaced without a second thought, even slaughtered. This distinction is humorously used to illustrate the more formal distinction delineated by Gartner in IT operations: traditional Mode 1 IT servers are highly managed assets that scale up to bigger, more costly hardware and are carefully restored to health should anything go amiss. Mode 2 IT, on the other hand, espouses a radically different operational philosophy: servers are highly disposable entities that are instantiated through automation, eliminated at the drop of a hat when no longer needed, and “scale out” in herds. Replacing a server with an expensive many-socket system and more complex memory architecture is decidedly Mode 1, while adding as many equally sized web servers behind a load balancer as required to deliver today’s service load is the Mode 2 way.

Mode 1 IT is the mainstay of traditional datacenter operations, whereas Mode 2 has emerged as the prim and proper way to design and operate applications in a public cloud environment, now known as the “cloud native” approach. AWS gives you plenty of choices in how you achieve your goals, and we have been introducing all the technical details you need in order to scale services either up or out, but as we proceed to design a realistic application in the coming chapters, we will decidedly lean the way a cloud architect would, and adopt a Mode 2 mindset in our design exclusively. You should do the same; pets do not belong in your cloud architecture.
