Chapter 4. Core Practice: Define everything as code

In Chapter 1, I identified three core practices that help you to change infrastructure rapidly and reliably:

  • Define everything as code

  • Continuously validate all work in progress

  • Build small, simple pieces that you can change independently

In this chapter, I explore the first of these core practices. Why would you want to define your infrastructure as code? What do you need to do this? Then I explore some issues about the nature of infrastructure coding languages.

This seemingly banal subject is a hot topic in the industry at the time of this writing, with debate raging over the nature of configuration vs. coding, and special-purpose languages vs. standard programming languages.

I’ll close this chapter with some implementation principles to guide you in deciding how to organize your infrastructure code.

The goal of this chapter is to lay out the fundamental concepts of coding infrastructure. With these in place, later chapters offer specific patterns and recommendations for implementation. I explain how to organize infrastructure resources into useful units that I call stacks in Chapter 5 and the following chapters. I expand these to servers, clusters, and other application runtime platforms in later chapters.

Why you should define your infrastructure as code

Even given a dynamic cloud platform, there are simpler ways to provision infrastructure than writing code and running a tool. Go to the platform’s web-based user interface and poke and click an application server cluster into being. Drop to the prompt, and using your command-line prowess, wield the vendor’s CLI (Command-Line Interface) tool to forge an unbreakable network boundary.

But seriously, the previous chapters have explained why it’s better to use code to build your systems. As a quick recap of “Core practice: Define everything as code”, some of the benefits of defining things as code are:

Reusability

If you define a thing as code, you can create many instances of it. You can repair and rebuild your things quickly. Other people can build identical instances of the thing.

Consistency

Things built from code are built the same way every time. This makes system behavior predictable. This makes testing more reliable. This enables continuous validation.

Transparency

Everyone can see how the thing is built by looking at the code, which helps in many ways. People can review the code and suggest improvements. They can learn things to use in other code. They gain insight to help with troubleshooting. They can review and audit for compliance.

What you can define as code

Every infrastructure tool has a different name for its source code: for example, playbooks, cookbooks, manifests, and templates. I refer to these in a general sense as infrastructure code, or sometimes as an infrastructure definition.

Infrastructure code specifies both the infrastructure elements you want and how you want them configured. You run an infrastructure tool to apply your code to an instance of your infrastructure. The tool either creates new infrastructure, or it modifies existing infrastructure to match what you’ve defined in your code.

Some of the things you might define as code include:

  • An infrastructure stack is a collection of elements provisioned from an infrastructure cloud platform. I write about infrastructure platforms in Chapter 3, and discuss stacks in Chapter 5.

  • Elements of a server’s configuration, such as packages, files, user accounts, and services. ([Link to Come])

  • A server role is a collection of server elements that are applied together to a single server instance. ([Link to Come])

  • A server image definition generates an image for building multiple server instances. ([Link to Come])

  • An application package defines how to build a deployable application artifact, including containers. ([Link to Come])

  • Configuration and scripts for delivery services, which include pipelines and deployment. ([Link to Come])

  • Configuration for operations services, such as monitoring checks. ([Link to Come])

  • Validation rules, which include automated tests and compliance rules. (Chapter 9)

Choose tools that are configured with code

Infrastructure as Code, by definition, involves specifying your infrastructure in text-based files. You manage these files separately from the tools that you use to apply them to your system. You can read, edit, analyze, and manipulate your specifications using any tools you want.

Other infrastructure automation tools store your infrastructure specifications as data that you can’t access directly. Instead, you can only use and edit the specifications by using the tool itself. The tool may have some combination of GUI, API, and command-line.

The issue with these black-box tools is that they limit the practices and workflows you can use:

  • You can only version your infrastructure specifications if the tool has built-in versioning.

  • You can only use Continuous Integration if the tool has a way to trigger a job automatically when you make a change.

  • You can only create delivery pipelines if the tool makes it easy to version and promote your infrastructure specifications.

  • You can only split monolithic infrastructure into independent pieces if the tool supports it.

Lessons from Software Source Code

The externalized configuration pattern mirrors the way most software source code works. Some development environments keep source code hidden away, such as Visual Basic for Applications. But for non-trivial systems, developers prefer keeping their source code in external files.

It is challenging to use agile engineering practices such as Test-Driven Development, Continuous Integration, and Continuous Delivery with black-box infrastructure management tools.

A tool that uses external code for its specifications doesn’t constrain you to use a specific workflow. You can use an industry-standard source control system, text editor, CI server, and automated testing framework. You can build delivery pipelines using the tool that works best for you.

Manage your code in a version control system

If you’re defining your stuff as code, then putting that code into a version control system (VCS) is simple and powerful. By doing this, you get:

Traceability

VCS provides a history of changes, who made them, and context about why1. This history is invaluable when debugging problems.

Rollback

When a change breaks something—and especially when multiple changes break something—it’s useful to be able to restore things to exactly how they were before.

Correlation

Keeping scripts, specifications, and configuration in version control helps when tracing and fixing gnarly problems. You can correlate across pieces with tags and version numbers.

Visibility

Everyone can see each change committed to the version control system, giving the team situational awareness. Someone may notice that a change has missed something important. If an incident happens, people are aware of recent commits that may have triggered it.

Actionability

The VCS can trigger an action automatically for each change committed. Triggers enable CI jobs and CD pipelines.

Recommendation: Avoid branching

Branching is a version control system feature that allows people to work on code in separate streams. There are many popular workflows based around branches. I’ll explain how these workflows can conflict with the core practice of Continuous Integration in [Link to Come].

Secrets and source code

Systems need various secrets. Your stack tool may need a password or key to use your platform’s API to create and change infrastructure. You may also need to provision secrets into environments, for example making sure an application has the password for its database.

It’s essential to handle these types of secrets in a secure way from the very beginning. Whether you are using a public cloud or a private cloud a leaked password can have terrible consequences. So even when you are only writing code to learn how to use a new tool or platform, you should never put secrets into code. There are many stories of people who checked a secret into a source repository they thought was private, only to find it had been discovered by hackers who exploited it to run up huge bills.

One solution to this is to encrypt secrets in order to store them in code2. But infrastructure developers and unattended systems need a key to decrypt those secrets, so you still have at least one secret to manage outside of source control!

There are a few approaches for handling secrets needed by infrastructure code without actually putting them into code. These include secretless authorization, runtime secret injection, and disposable secrets.

Secretless authorization

Many services and systems provide ways to authorize actions without using secrets. Most cloud platforms can mark a compute service, such as a virtual machine or container instance, as authorized for privileged actions.

For example, an AWS EC2 instance can be assigned an IAM instance profile that gives processes on the instance rights to carry out a set of API commands. If you configure a stack management tool to run on one of these instances, you avoid the need to manage a secret that attackers might exploit.
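To make this concrete, here is a sketch in this book’s fictional declarative language; the api_role attribute and the stack_manager role name are invented for illustration, not real platform syntax:

```
virtual_machine:
  name: stack_runner
  source_image: 'base_linux'
  api_role: stack_manager
```

A stack tool running on this instance inherits the role’s API rights, so there is no key to leak, rotate, or manage.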

In some cases, secretless authorization can be used to avoid the need to provision secrets on infrastructure when it is created. For example, an application server might need to access a database instance. Rather than a server configuration tool provisioning a password onto the application server, the database server might be configured to authorize connections from the application server, perhaps based on its network address.

Tying privileges to a compute instance or network address only shifts the possible attack vector. Anyone gaining access to that instance can exploit those privileges. You need to put in the work to protect access to privileged instances. On the other hand, someone gaining access to an instance may be able to access secrets stored there, so giving privileges to the instance may not be any worse. And a secret can potentially be exploited from other locations, so removing the use of secrets entirely is generally a good thing.

Injecting secrets at runtime

When you can’t avoid using secrets for stacks or other infrastructure code, you can explore ways to inject secrets at runtime. You’ll normally implement it as stack parameters, which is the topic of Chapter 8. I describe the details of handling secrets as parameters with each of the patterns and antipatterns in that chapter.

There are two different runtime situations to consider: local development and unattended agents. People who work on infrastructure code3 will often keep secrets in a local file that isn’t stored in version control. The stack tool could read that file directly, which is especially appropriate if you’re using the stack configuration file pattern (“Pattern: Stack Configuration Files”). Or the file could be a script that sets the secrets in environment variables, which works well with the stack environment variables pattern (“Pattern: Stack Environment Variables”).

These approaches also work on unattended agents, such as those used for CI testing or CD delivery pipelines4. But you need to store the secrets on the server or container that runs the agent. Alternatively, you can use secrets management features of your agent software to provide secrets to the stack command, as with the pipeline stack parameters pattern (“Pattern: Pipeline Stack Parameters”). Another option is to pull secrets from a secrets management service (of the type described in “Secrets management”), which aligns to the stack parameter registry pattern (“Pattern: Stack Parameter Registry”).
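As a minimal sketch of the local-development approaches described above, assuming a hypothetical secrets.env file format and variable names, a wrapper around a stack tool might load secrets like this:

```python
import os

def load_secrets(path="secrets.env"):
    """Read KEY=VALUE pairs from a local file kept out of version
    control, then let environment variables override them."""
    secrets = {}
    if os.path.exists(path):
        with open(path) as f:
            for line in f:
                line = line.strip()
                # Skip blank lines and comments
                if line and not line.startswith("#") and "=" in line:
                    key, _, value = line.partition("=")
                    secrets[key.strip()] = value.strip()
    # Environment variables take precedence, matching the stack
    # environment variables pattern
    for key in ("CLOUD_API_KEY", "DB_PASSWORD"):
        if key in os.environ:
            secrets[key] = os.environ[key]
    return secrets
```

On an unattended agent, the same function works with secrets provided purely through environment variables, with no local file present.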

Disposable secrets

A cool thing you can do with dynamic platforms is to create secrets on the fly, and only use them on a “need-to-know” basis. In the database password example, the code that provisions the database automatically generates a password and passes it to the code that provisions the application server. Humans don’t ever need to see the secret, so it’s never stored anywhere else.

You can apply the code to reset the password as needed. If the application server is rebuilt, you can re-run the database server code to generate a new password for it.
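The disposable-secret flow for the database password example can be sketched as follows; the provisioning functions are stand-ins for calls to a real platform API:

```python
import secrets

def provision_database(name):
    # Generate a throwaway password at provisioning time;
    # no human ever sees or stores it
    password = secrets.token_urlsafe(24)
    # A real implementation would call the platform API here to
    # create the database instance and set its password
    return {"name": name, "password": password}

def provision_app_server(db):
    # The generated password is passed directly to the code that
    # configures the application server, and nowhere else
    return {"db_host": db["name"], "db_password": db["password"]}

db = provision_database("orders-db")
app = provision_app_server(db)
```

Re-running provision_database generates a fresh password, which is how rebuilding the application server resets the credentials.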

Secrets management services, such as Hashicorp Vault, can also generate and set a password in other systems and services on the fly. The service can then make the password available either to the stack tool when it provisions the infrastructure, or directly to the service that uses it, such as the application server.

Infrastructure coding languages

System administrators have been using scripts to automate infrastructure management tasks for decades. General-purpose scripting languages like Bash, Perl, Powershell, Ruby, and Python are still an essential part of an infrastructure team’s toolkit.

CFEngine pioneered the use of declarative, domain-specific languages (DSLs; see “DSLs for infrastructure” later in this chapter) for infrastructure management. Puppet and then Chef emerged alongside mainstream server virtualization and IaaS cloud. Ansible, Saltstack, and others followed.

Stack-oriented tools like Terraform and CloudFormation arrived a few years later, following the same declarative DSL model.

Recently5, a trend has emerged of new infrastructure tools that use existing general-purpose programming languages to define infrastructure. Pulumi and the AWS CDK (Cloud Development Kit) are examples, supporting languages like TypeScript, Python, and Java. These newer tools use procedural language structures rather than being declarative.

The principles, practices, and patterns in this book should be relevant regardless of what language you use to implement them. The languages we use should make it easy to write code that is easy to understand, test, maintain, and improve. Let’s review the evolution of infrastructure coding languages and consider how different aspects of language choice affect this goal.

Scripting your infrastructure

Before standard tools appeared for provisioning cloud infrastructure declaratively, we wrote scripts in general-purpose, procedural languages. These used SDK (Software Development Kit) libraries to interact with the cloud provider’s API.

Example 4-1 uses pseudo-code, but is similar to scripts that I wrote in Ruby with the AWS SDK. It creates a server named my_application_server and then runs the (fictional) servermaker tool to configure it.

Example 4-1. Example of procedural code that creates a server
import 'cloud-api-library'

network_segment = CloudApi.find_network_segment('private')

app_server = CloudApi.find_server('my_application_server')
if(app_server == null) {
  app_server = CloudApi.create_server(
    name: 'my_application_server',
    image: 'base_linux',
    cpu: 2,
    ram: '2GB',
    network: network_segment
  )
  while(app_server.ready == false) {
    wait 5
  }
  app_server.provision(
    provisioner: servermaker,
    role: tomcat_server
  )
}

This script combines what and how. It specifies attributes of the server, including the CPU and memory resources to provide it, what OS image to start from, and what Servermaker role to apply to the server. It also implements logic: it checks whether a server named my_application_server already exists, to avoid creating a duplicate, and it waits for the server to become ready before running Servermaker on it. The script would need additional logic to handle errors, which I haven’t included in the example.

The example code also doesn’t handle changes to the server’s attributes. What if you need to increase the RAM? You could change the script so that if the server exists, the script will check each attribute and change it if necessary. Or you could write a new script to find and change existing servers.

More realistic scenarios include multiple servers of different types. In addition to our application server, my team had web servers and database servers. We also had multiple environments, which meant multiple instances of each server.

Teams I worked with often turned simplistic scripts like the one in this example into a multi-purpose script. This kind of script would take arguments specifying the type of server and the environment, and use these to create the appropriate server instance. We evolved this into a script that would read configuration files that specify various server attributes.

I was working on a script like this, wondering if it would be worth releasing it as an open-source tool, when Hashicorp released the first version of Terraform.

Building infrastructure with declarative code

Terraform, like most other stack provisioning tools and server configuration tools, uses a declarative language. Rather than a procedural language, which executes a series of statements using control flow logic like if statements and while loops, a declarative language is a set of statements that declare the result you want.

Example 4-2 creates the same server as Example 4-1. The code in this example (as with most code examples in this book) is written in a fictional language6.

Example 4-2. Example of declarative code
virtual_machine:
  name: my_application_server
  source_image: 'base_linux'
  cpu: 2
  ram: 2GB
  network: private_network_segment
  provision:
    provisioner: servermaker
    role: tomcat_server

This code doesn’t include any logic to check whether the server already exists or to wait for the server to come up before running Servermaker. The tool that applies the code implements this logic, along with error handling. The tool also checks the current attributes of the infrastructure against what is declared, and works out what changes to make to bring the infrastructure in line. So to increase the RAM of the application server in this example, you edit the file and re-run the tool.

Declarative infrastructure tools like Terraform and Chef keep the what and how separate. The code you write as a user of the tool only declares the attributes of your infrastructure. The tool implements the logic for how to make it happen. As a result, your code is cleaner and more direct.
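A toy sketch of the reconciliation logic such a tool implements for you; real tools also handle dependency ordering, error recovery, and the platform API calls themselves:

```python
def plan_changes(declared, actual):
    """Compare declared attributes against the actual resource state
    and return only the changes needed to bring them in line."""
    if actual is None:
        return {"action": "create", "attributes": declared}
    changes = {key: value for key, value in declared.items()
               if actual.get(key) != value}
    if not changes:
        return {"action": "none"}
    return {"action": "update", "attributes": changes}

# Editing the declared RAM and re-running yields a single update
declared = {"name": "my_application_server", "cpu": 2, "ram": "3GB"}
actual = {"name": "my_application_server", "cpu": 2, "ram": "2GB"}
plan = plan_changes(declared, actual)
```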

DSLs for infrastructure

In addition to being declarative, infrastructure tools often use their own DSL7.

The advantage of a DSL for infrastructure code is keeping the code simple and focused.

A general-purpose language needs extra syntax, such as variable declarations and references to class structures:

import 'cloud-api-library'
app_server = CloudApi.find_server('my_application_server')
if(app_server == null) {
  app_server = CloudApi.create_server(name: 'my_application_server')
}

A DSL can strip this to the most relevant elements:

virtual_machine:
  name: my_application_server

The return of general-purpose languages for infrastructure

Newer tools, such as Pulumi and the AWS CDK, bring general-purpose languages back to infrastructure coding. There are several arguments for doing this, some more convincing than others.

Configuration isn’t real code

Some folks take the phrase “Infrastructure as Code” to heart. They argue that declarative languages are just configuration, not a “real” language. Personally, I’m not bothered if someone disparages my code as being mere configuration. I still find it useful to keep “what” and “how” separate, and to avoid writing repetitive, verbose code.

It’s useful not to have to learn a new language

Using a popular language like JavaScript means more people can learn to write infrastructure as code since they don’t need to learn a peculiar special-purpose language. I have sympathy for making it easy for people to adopt infrastructure as code. But I don’t think a new language syntax, especially a simple declarative language, is as hard to learn as the domain-specific aspects of infrastructure code like networking constructs.

Infrastructure DSLs are not well-supported by development tools

This is generally true. There are many mature IDEs8 for languages like JavaScript, Python, and Ruby. These have loads of features, like syntax highlighting and code refactoring, that help developers to be more productive. Rather than discouraging the use of new languages (of any kind), I would love to see vendors improve their support for infrastructure coding languages.

Proper support for libraries helps you to simplify code

Most infrastructure DSLs let you write modules, but these are not as rich and flexible as libraries in mature, general-purpose languages like JavaScript, Python, and Ruby. I discuss reusable modules and libraries for stacks in Chapter 6.

Using the same language lets you combine infrastructure and application code

Examples of this show application code that provisions its own infrastructure. People who’ve been developing web applications for a while recall how nifty it was to be able to embed SQL statements into HTML code. Experience has shown that this does not lead to cleanly designed, maintainable systems. It is often useful to specify actions for a tool to trigger on specific changes to infrastructure, such as provisioning newly created resources. But this can be done without intermingling the different styles of code.

Infrastructure languages are less testable

Also true, although there are several reasons for this. One is that the ecosystem of testing tools for infrastructure is not as mature as that for other types of code. Another reason is that testing declarations requires a different approach to testing logic. A third reason is that testing code that provisions infrastructure has much longer feedback loops. I delve into these in “Challenges with testing infrastructure code”.

The swing back towards general-purpose languages for infrastructure is new9. Some of the arguments threaten to regress us to codebases filled with verbose spaghetti code mingling configuration, application logic, and repetitive utility code. But I expect that this is just one step on a path to languages that support better coding.

For many teams today, the challenges with their codebase do not come from the language they use. Regardless of language, they need ways to keep their code clean, well-organized, and easy to maintain.

Implementation Principles for defining infrastructure as code

To update and evolve your infrastructure systems easily and safely, you need to keep your codebase clean: easy to understand, test, maintain, and improve. Code quality is a familiar theme in software engineering. The following implementation principles are guidelines for designing and organizing your code to support this goal.

Implementation Principle: Avoid mixing different types of code

As discussed earlier, some infrastructure coding languages are procedural, and some are declarative10. Each of these paradigms has its strengths and weaknesses.

A particular scourge of infrastructure is code that mixes declarative and procedural styles, as in Example 4-3. This example includes declarative code that defines a server, as well as procedural code that determines which attributes to assign to the server depending on the server role and environment.

Example 4-3. Example of mingled procedural and declarative code
for ${env} in ["test", "staging", "prod"] {
  for ${server_role} in ["app", "web", "db"] {
    server:
      name: ${server_role}-${env}
      image: 'base_linux'
      cpu: if(${env} == "prod" || ${env} == "staging") {
        4
      } else {
        2
      }
      ram: if(${server_role} == "app") {
        "4GB"
      } else if (${server_role} == "db") {
        "2GB"
      } else {
        "1GB"
      }
      provision:
        provisioner: servermaker
        role: ${server_role}
  }
}

Mixed declarative and procedural code is a design smell11. It’s not easy to understand this code, which makes it harder to debug, and harder to change without breaking something.

The messiness of this code comes from intermingling two different concerns, which leads to the next implementation principle.

Implementation Principle: Separate infrastructure code concerns

A common reason for messy code is that it is doing multiple things. These things may be related, but teasing them apart can make them easier to distinguish, and improve the readability and maintainability of the code.

Example 4-3 does two things: it specifies the infrastructure resource to create, and it configures that resource differently in different contexts. The example illustrates two of the four most common concerns of infrastructure code:

Specification

Specifications define the shape of your infrastructure. Your server has specific packages, configuration files, and user accounts. Declarative languages work well for this because specifications are what you want. Specifications are what most people mean when they talk about infrastructure code.

Configuration

Configuration defines the things that vary when you provision different instances of infrastructure. Different application servers built from a single specification may need different amounts of RAM. You may want to deploy different applications onto otherwise identical servers. Configuration is almost always declarative.

Execution

Execution applies specification and configuration to the actual infrastructure resources. Together, the specification and configuration declare what you want; execution is about how to make that happen. Procedural or functional code usually works best for execution. Ideally, you should use an off-the-shelf tool for this rather than writing your own code.

Orchestration

Orchestration combines multiple specifications and configurations. For example, you may need to create a server in the cloud platform, and then install packages on the server. Or you may need to create networking structures, and then create multiple servers attached to those structures.

Separating specification and configuration

Let’s take a simple example of code that mixes concerns and split it into two cleaner pieces of code. This example specifies an application server, assigning it to a different network depending on which customer uses it:

virtual_machine:
  name: application_server_${CUSTOMER}
  source_image: 'base_linux'
  provision:
    tool: servermaker
    role: foodspin_application
  network: $(switch ${CUSTOMER}) {
    "bomber_burrito": us_network
    "curry_hut": uk_network
    "burger_barn": au_network
  }

The code uses the CUSTOMER parameter to choose the network for the server.

This code is hard to test, because any test instance needs to specify a customer. Using a real customer couples your test code to that customer’s configuration, which makes the tests brittle. You could create a fake test customer instead, but then you need to add the configuration for your fake customer to your infrastructure code. Mingling test and production code and configuration descends even further into the depths of poor code.

Let’s go in the opposite direction, and pull the specification into its own code:

virtual_machine:
  name: application_server_${CUSTOMER}
  source_image: 'base_linux'
  provision:
    tool: servermaker
    role: foodspin_application
  network: ${NETWORK}

This code is more straightforward than the previous example. It only includes the things that are common to all application server instances. Because these don’t vary, there’s no need for logic, so the code fits nicely into a declarative language.

The only parts of the specification that vary are set using variables: CUSTOMER to give the server a unique name, and NETWORK to assign it to the right network. There are different patterns for assigning these variables, which are the subject of an entire chapter of this book (Chapter 8).

To illustrate the separation of concerns, let’s use a script to work out the configuration for the server. The script works out which network to assign the server to based on the customer, as before:

switch (${CUSTOMER}) {
  case 'bomber_burrito':
    return 'us_network'
  case 'curry_hut':
    return 'uk_network'
  case 'burger_barn':
    return 'au_network'
}

Each of these two pieces of code is easier to understand. If you need to add a new customer, you can easily add it to the configuration code, without confusing things. This example doesn’t really need procedural code to work out the configuration. But it illustrates that if you find the need for more complicated logic, it’s cleaner to separate it from the declarative code.
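In fact, since no real logic is needed here, the configuration could be reduced to pure data; this sketch uses the fictional customer and network names from the example:

```python
# Configuration as data: one entry per customer, no branching logic
CUSTOMER_NETWORKS = {
    "bomber_burrito": "us_network",
    "curry_hut": "uk_network",
    "burger_barn": "au_network",
}

def network_for(customer):
    # Fail loudly on an unknown customer rather than silently
    # provisioning into the wrong network
    try:
        return CUSTOMER_NETWORKS[customer]
    except KeyError:
        raise ValueError(f"no network configured for customer '{customer}'")
```

Adding a new customer is then a one-line data change rather than a new branch in procedural code.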

[Link to Come] gives more detailed advice on how to implement separation of concerns in your codebase.

Implementation Principle: Treat infrastructure code like real code

To keep an infrastructure codebase clean, you need to treat it as a first-class concern. Too often, people don’t consider infrastructure code to be “real” code. They don’t give it the same level of engineering discipline as application code.

Design and manage your infrastructure code so that it is easy to understand and maintain. Follow code quality practices, such as code reviews, pair programming, and automated testing. Your team should be aware of technical debt and strive to minimize it.

Conclusion

In this chapter, I explored the core practice of defining your system as code. This included the key prerequisites for this practice, considerations of different types of infrastructure coding languages, and a few principles for good code design. The next core practice, continuously validating your code, builds on this material.

But before I cover that in Chapter 9, I’ll describe specific patterns and antipatterns for implementing this first practice in the context of infrastructure stacks.

Chapter 5 explains how to use infrastructure code to provision resources from your cloud platform into useful groups I call stacks. Chapter 6 covers the use of modules within stacks. Then, Chapter 7 describes how to structure your stack code to create multiple environments. After that, Chapter 8 offers patterns for managing the configuration of different stack instances across environments.

1 Context about why depends on people writing useful commit messages.

2 git-crypt, blackbox, sops, and transcrypt are a few tools that help you to encrypt secrets in a git repository. Some of these tools integrate with cloud platform authorization, so unattended systems can decrypt them.

3 I explain how people can work on stack code locally in more detail in [Link to Come].

4 I describe how these are used in [Link to Come]

5 “Recently” as I write this in late 2019

6 I use this pseudo-code language to illustrate the concepts I’m trying to explain, without tying them to any specific tool.

7 Martin Fowler and Rebecca Parsons define a DSL as a “small language, focused on a particular aspect of a software system,” in their book Domain-Specific Languages (Addison-Wesley Professional).

8 Integrated Development Environment, a specialized editor for programming languages.

9 Again, I’m writing this in late 2019. By the time you read this, things will have moved forward in one direction or another.

10 Object-Oriented Programming (OOP) and Functional Programming are two other programming paradigms used in application software development. Although there are a few examples of tools that use each of these (Riemann), neither is common with infrastructure code. There is no reason tools couldn’t use them with the domain. But even with OOP or functional programming, the advice in this section would still apply: don’t write code that mixes language paradigms.

11 The term design smell derives from code smell (https://martinfowler.com/bliki/CodeSmell.html). A “smell” is some characteristic of a system that you observe that suggests there is an underlying problem. Code that mixes declarative and procedural constructs suggests that it may be trying to do multiple things, and that it may be better to pull them apart into different pieces of code, perhaps in different languages.
