Rolling out AMI upgrades with Terraform

Remember that we used the aws_ami data source to pull the latest AMI belonging to the AWS account configured in the template? At that moment, we didn't put much effort into it, blindly pulling any existing AMI, as long as it was the most recently updated one:

data "aws_ami" "app-ami" { 
  most_recent = true 
  owners = ["self"] 
} 

With Packer building our AMIs, we can put a bit more effort into this resource. We need to make sure that it pulls an image suitable for this application. First, simplify the Packer template: remove any variables and make sure that the "ami_name" key looks as simple as the following:

"ami_name": "centos-7-base-puppet-{{timestamp}}", 

Rebake the image and then modify the Terraform application module so that the data source looks up images by this name:

data "aws_ami" "app-ami" { 
  most_recent = true 
  owners = ["self"] 
  filter { 
    name = "name" 
    values = ["centos-7-base-puppet*"] 
  } 
} 

From the aws_instance resource, we can now remove the provisioner: it was only installing Puppet on the machine, and Puppet is already installed inside the freshly baked AMI:

resource "aws_instance" "app-server" { 
  ami = "${data.aws_ami.app-ami.id}" 
  instance_type = "${lookup(var.instance_type, var.environment)}" 
  subnet_id = "${element(var.subnets, count.index % 2)}" 
  vpc_security_group_ids = ["${concat(var.extra_sgs, aws_security_group.allow_http.*.id)}"] 
  user_data = "${data.template_file.user_data.rendered}" 
  key_name = "${var.keypair}" 
  tags { 
    Name = "${var.name}" 
  } 
  count = "${var.instance_count}" 
} 

We are still keeping user_data, in case extra on-boot modifications to the server are required. Make sure that you've destroyed all previously created resources with the terraform destroy command, and then run terraform apply. As a result, you will get two instances built from the AMI created by Packer.
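
If it helps, the full sequence of commands looks like this (run from your Terraform working directory; terraform destroy will ask for confirmation before removing anything):

$> terraform destroy
$> terraform apply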

Now to the interesting part: what if we update the AMI? Rerun packer build base.json, give it a few minutes to finish, and then execute the terraform plan command to see what Terraform is going to do:

$> terraform plan
~ module.mighty_trousers.aws_elb.load-balancer
    instances.#: "" => "<computed>"
-/+ module.mighty_trousers.aws_instance.app-server.0
    ami:                               "ami-7f7fba10" => "ami-707cb91f" (forces new resource)
....
-/+ module.mighty_trousers.aws_instance.app-server.1
    ami:                               "ami-7f7fba10" => "ami-707cb91f" (forces new resource)

Apparently, our instances will be recreated because the latest AMI has changed. Knowing the nature of Terraform, it will try to recreate them in parallel, which could lead to a long downtime. This is not what we would consider a smooth update. So how do we make it as painless as possible?

We talked about it in one of the initial chapters, but most likely you have already forgotten about the lifecycle block, specifically the create_before_destroy option. It will first create a new EC2 instance for us, and only then remove the old one. Let's add it:

resource "aws_instance" "app-server" { 
  ami = "${data.aws_ami.app-ami.id}" 
  # ... 
  count = "${var.instance_count}" 
  lifecycle { 
    create_before_destroy = true 
  } 
} 

With this in place, the time required to switch AMIs will be much shorter. But it's still not perfect, because we can easily end up with two instances being unavailable simultaneously. That can be acceptable for some applications and a complete disaster for others. What we should do is roll out the update by replacing instances one by one. And that's where we hit the limitations of Terraform pretty hard, because it doesn't have any automated way to perform such rolling upgrades.

Terraform allows you to apply changes to a single resource using the -target argument. This is quite handy because it lets us build a chain of commands, each of them changing only one instance, as follows:

$> terraform apply  -target "module.mighty_trousers.aws_instance.app-server[0]"
$> terraform apply  -target "module.mighty_trousers.aws_instance.app-server[1]"

This is handy, though, only if you have two or maybe a dozen servers. More than 15, or 100? Good luck doing that manually. We don't have to do it manually, though: we can script it.

As a simplistic (and admittedly ugly) example, take a look at this tiny Ruby script:

# The backslash is doubled so that grep receives \- and does not mistake the pattern for an option
plan = `terraform plan | grep "\\-/+ module.mighty_trousers.aws_instance.app-server"`

plan.split("\n").each do |line|
  # Strip the "-/+ " prefix, leaving module.mighty_trousers.aws_instance.app-server.N
  line = line.gsub(/.+module/, "module")
  components = line.split(".")
  # Turn the trailing ".N" index into "[N]" so that -target accepts the reference
  resource = "#{components[0..-2].join(".")}[#{components.last}]"
  puts `terraform apply -target "#{resource}"`
end

This script runs the terraform plan command, finds all the instances that Terraform wants to replace, loops through them, builds a valid resource reference for each one, and feeds it into the terraform apply command. As a result, instances are replaced one by one, reducing the risk of downtime.

Note

This script leaves much to be desired, of course. As one example of its roughness, it doesn't stream output from the Terraform commands until their execution has finished.
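
If that buffering bothers you, one small improvement (a sketch, not part of the original script) is to replace the backtick call inside the loop with Kernel#system, which streams the child process output to your terminal as it runs:

# Replaces: puts `terraform apply -target "#{resource}"`
# system() inherits stdout, so Terraform's output appears as it is produced
system(%Q(terraform apply -target "#{resource}"))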

After looking at this, even though it is the smallest of example scripts, you might cry out in horror: that's not the way I want to handle my infrastructure updates! And you would be right to scream so. But as long as Terraform doesn't have handlers for update scenarios built in, you have to wrap it with extra tools and scripts, written in whichever scripting language you are most fond of.

Update scenarios differ from business to business, of course, so even if and when Terraform gets new features for updates, you will most likely still have to build tooling around it in order to make it fit your organization in the best way possible. It's a good idea to wrap common infrastructure operations into a Makefile inside your Terraform working directory, as sketched below.
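
As a rough sketch only (the target names and the rolling_update.rb filename are invented for this example, and recipe lines must be indented with tabs), such a Makefile could look like the following:

.PHONY: plan apply destroy rolling-update

plan:
	terraform plan

apply:
	terraform apply

destroy:
	terraform destroy

# Wraps the one-by-one replacement script shown earlier in this chapter
rolling-update:
	ruby rolling_update.rb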

There is another way to update your infrastructure: so-called blue-green deployments.
