Chapter 6. A Test-Driven Infrastructure Framework

At the time of the first edition of this book, there was only one tool and a handful of people exploring the ideas of infrastructure testing. The first edition covered that tool—a tool written by me as a proof of concept to demonstrate that the project of testing infrastructure code was achievable. This tool, Cucumber-Chef, was intentionally narrow in its purview, in that it attempted to explore one particular aspect of the broader infrastructure-testing landscape, in a way that reduced the commitment in terms of acquiring new machines to zero. Based around Opscode’s Hosted Chef service and Amazon’s EC2 platform, it set out to open the discussion and get the conversation moving.

The testing ecosystem has blossomed since the first edition of this book. Mature frameworks are emerging, significant community adoption of the testing of cookbooks and infrastructure is taking place, and helper tools and knife plug-ins specifically targeted at infrastructure testing are released regularly.

This chapter takes a high-level philosophical overview of the business of testing infrastructure code. It sets out a vision for what the landscape should look like. This is a landscape that changes day by day. At the time of this writing—early summer 2013—there is a profound level of interest in infrastructure testing. Discussions abound on the mailing lists, IRC, Twitter, and in various podcasts. It’s a dynamic, exciting, and fast-moving subject area.

That said, I believe it is possible both to set a conceptual framework for what needs to be in place, and to outline a workflow based on the current best-of-breed tooling available. Having presented a conceptual framework, we will survey a selection of the currently available tools, providing examples of each tool together with a discussion of its merits and demerits, and of how it fits into an overarching testing strategy.

Naturally, in a fast-moving technology space such as infrastructure as code, the state of the art is in flux; however, I think we can be confident of synthesizing a philosophy, a methodology, and a list of requirements against which we can continue to measure tools as they emerge.

Test-Driven Infrastructure: A Conceptual Framework

I’ll start by setting out a high-level vision. I’m not a believer in luck, although I share the observation of the legendary South African golfer Gary Player, who maintained, “The harder I practice, the luckier I get.” That said, I think it does no harm, as a community or a movement, to have a mascot. The MASCOT I propose upholds the following six objectives:

Test-driven infrastructure should be:

  • Mainstream
  • Automated
  • Side-effect aware
  • Continuously integrated
  • Outside-in
  • Test-first

Test-Driven Infrastructure Should Be Mainstream

My vision is that soon it won’t even be questioned that developing infrastructure is done in a test-driven way. Although a very strong case can be made for the approach, it will never become mainstream until the barriers to entry are lowered. It’s no surprise that, of modern languages, Ruby has most comprehensively embraced test-driven engineering. The quality of tooling is tremendously high, with innovation and improvement seen on a regular basis. The passion and enthusiasm of the community have made testing a popular topic; within the web development world, Ruby leads the way, and test-driven development is mainstream. Within our world of infrastructure as code, the tooling we have isn’t yet sufficiently powerful or easy to use to encourage mass adoption, but we’re on the right trajectory.

In order for testing to become mainstream, it’s necessary to agree on a set of standards around which to organize. Of particular concern is community agreement about the general syntax and style of cookbooks. When developing infrastructure code in a shared environment, enforcing a house style can be very valuable. It encourages the team to work in a consistent way and ensures that code is maximally shareable and portable.

Test-Driven Infrastructure Should Be Automated

In order for testing to become mainstream and effective, it’s essential that it’s automated. This is especially the case for long-running, complex integration tests. Without a workflow that includes automation of these high-value, but labor-intensive tests, they simply won’t be run with sufficient frequency to deliver consistent improvements.

Automation takes place at a number of levels. To an extent, the very act of writing test code is a kind of automation. We’re encoding the steps that need to be taken to verify that a given state has been achieved, or that a given behavior is being exhibited. However, it’s not just the encoding of the steps required to carry out the test that needs to be automated. We also need to automate the running of the tests with a degree of frequency that is meaningful, and a degree of feedback that is noticeable and unignorable.

To draw a parallel with the mainstream software development world, when writing tests, some tests are harder to write than others. Specifically, writing unit tests is pretty easy. Writing integration tests is harder. Writing end-to-end acceptance tests is hardest. This means that sometimes the hardest tests are simply not automated—in some cases the testing is left to the customer. The same applies when testing infrastructure. It’s not difficult to write a test that asserts that a resource has been brought into the correct state. It’s harder to test connectivity between two layers of infrastructure, such as between database and web server. It’s hardest of all to verify that the infrastructure behaves as it should, from monitoring to backups, from top to bottom. In both worlds, the most value is in the hardest stuff.

Martin Fowler likes the sound bite, “If it hurts, do more of it.” The logic behind this seemingly paradoxical statement is that there’s an exponential relationship between the amount of pain experienced and the amount of time between occurrences of the thing that causes pain. This is the case for converging nodes, rebuilding servers, migrating databases, speaking to stakeholders, releasing software, and of course, running tests. Thus it stands to reason that by doing it more frequently, it will, in fact, start to hurt less.

If there’s any pain associated with the frequent running of tests (unreliable tests, flaky interfaces, slow test machines, very long-running tests, or the like), it’s especially important to automate the running of those tests.

Our infrastructure tests should run automatically—ideally on every commit. Even better would be to move to a continuous deployment model, where every commit not only kicks off a test run, but deploys the code on a test environment and then traverses a build pipeline with appropriate yes/no gates, ultimately resulting in an update of the production infrastructure. This is the current state of the art in the software development world. If infrastructure is code, we should give serious thought to adopting the same mentality when writing Chef recipes and cookbooks.

Test-Driven Infrastructure Should Be Side-Effect Aware

In his State of the Union presentation at the inaugural ChefConf event in Burlingame, CA, Adam Jacob made the observation that configuration management is effectively the study of side effects. When we write infrastructure code to capture a set of complex requirements, what we’re really doing is commanding one system to take action in a way that affects another system, which in turn impacts other systems in such a way as to bring the world into a desired state. Chef takes this challenge in its stride: it aims to make systems easy to reason about, predictable, and understandable in the event of a mistake.

The bigger challenge comes in the inherent portability of Chef cookbooks and recipes. Especially amongst the popular community cookbooks (such as Apache or MySQL, with dozens of contributors across a range of Linux, Unix, and Windows systems), it’s entirely possible that a change or improvement introduced for one platform will have an unexpected and adverse side effect on users on a different platform. Our test-driven infrastructure vision needs to acknowledge and mitigate this risk.

Fundamentally, we want to be confident that seemingly trivial changes to our cookbooks don’t have unwanted side effects. This becomes more of a challenge if our cookbooks grow to support multiple platforms. The possibility that a trivial change for a system running on Red Hat breaks compatibility for FreeBSD is something that needs to be guarded against. Naturally this can be achieved manually, by spinning up a virtual machine, running Chef, and looking at the output, but automating this makes it far more likely that it will happen as a matter of course. This is especially valuable as the number and complexity of our cookbooks grow, and more valuable still in an environment in which many different developers are cooperating.

Test-Driven Infrastructure Should Be Continuously Integrated

A key component of constructing a world in which our infrastructure testing is both automated and side-effect aware is that the code we write should be continuously integrated.

Continuous integration is another core practice from Extreme Programming. The idea lies in the recognition that the traditional approach of periodically integrating the code of a number of different people is invariably an error-prone, time-consuming, and painful endeavor. Ron Jeffries quips, on the C2 wiki:

I’ve been working on my classes and think they are perfect. You’ve been working on yours and I suppose you think they’re pretty good, too. Carl has been working on his, and you know how that goes.

Now we have to integrate them to build a new system. Carl’s code, as usual, breaks everything. It looks to me as if you have a few problems, too. My code is solid, I know that because I worked hard on it.

What I can’t understand is why you think there might be something wrong with my code, and Carl, the idiot, is after both of us.

We’re in for a few really unpleasant days. Maybe next time we shouldn’t wait so long to integrate…

Ron Jeffries

The response is the principle that developers should be integrating and committing code very frequently. This avoids diverging or fragmented development efforts, especially where team members are not in direct communication with each other. In a community development effort, such as cookbooks, this is even more vital.

In an XP team, the process of integrating the code means gathering the latest code, and running all the tests. If tests fail, collaborating on what caused the failure and committing a fix becomes the priority task.

If we’re to be serious about developing quality infrastructure code, we need to bring the same practices to bear. This means that our tests need to be run automatically on commits, and the results shared visibly and publicly.

Test-Driven Infrastructure Should Be Outside-In

One of the maxims of BDD is that we take an outside-in approach. Imagine I give a group of people in a room a programming task of moderate complexity. If you were to watch each person in the room after I’d finished explaining the task, I think you’d find that the most natural approach, and statistically the most likely one, would be for each person to open up their editor of choice and start hacking away. You might find some people opening up some kind of interactive REPL and experimenting. Those with a grounding in agile programming approaches might even start writing some basic unit tests. This kind of approach is what I call “inside-out.” Straight away we’re starting to write the code to solve the problem, even if we’re writing tests first.

BDD encourages thinking about the problem a different way. This is the great thing about Cucumber: it allows, and to an extent even forces, the developer to step right away from the implementation details and think about how the software should look, feel, behave, and act. This is outside-in. We describe a feature that delivers value as an executable specification. Only once we have this feature described, and failing a test, do we start to think about how to make it pass.

The same approach makes a great deal of sense when we are doing infrastructure development. If I set a task, such as setting up an issue tracker, and asked a number of people in a room to carry this out, you’d see similar behavior. Most would start by installing Apache and PHP, and then maybe think about a user, and hack forward from there. A smaller number would start to write or even reuse Chef cookbooks and recipes. The outside-in approach starts by writing the feature that defines how the piece of infrastructure should behave.

We want to ensure that our cookbooks deliver the intended behavior—that they solve the particular problem we have in mind when we set out.

I’ve already covered the foundational principles of behavior-driven development, but I will re-emphasize the fact that none of our development efforts are worth a thing if they don’t address a specific business value. Test-driven infrastructure means committing to build the right thing, not just build the thing right.

Test-Driven Infrastructure Should Be Test-First

The final objective of my MASCOT manifesto is that, as we write our infrastructure, not only should we ensure our code is under test, but those tests should be written before we write any Chef code. This discipline recognizes that the tests we write are actually a development tool in themselves. The benefits are clear:

  • It focuses attention on precisely what the cookbook/recipe needs to do.
  • It makes it very clear where the development should start.
  • There is never any question about the definition of done—the test owns this.
  • It encourages a lean and efficient development approach: we build only as much infrastructure as is needed to make the tests pass.
  • In the spirit of Chef, it makes our code easy to reason about—the target is reproducible, predictable results.
  • Dependencies are flushed out early, and their minimization is a core activity.
  • It surfaces good design decisions by encouraging the creation of solutions that are simple enough to make the test pass, but no simpler.
  • In the event of unexpected failures, the debugging process is targeted.
  • It encourages refactoring—as we write code to make our tests pass, so we should identify hints that refactoring is needed.

I asked my family what animal they felt would be appropriate as a mascot for a test-driven infrastructure manifesto. They gave it careful deliberation before suggesting that the best choice was a tortoise. Their reasoning was that tortoises like eating cucumbers, don’t dash head first into things, but take a measured and careful approach, and in fine Aesopian tradition, win the race anyway.

I’m not sure it’ll catch on, but I am sure that to achieve this sextuplet of objectives, we need to overcome a number of technical hurdles.

The Pillars of Test-Driven Infrastructure

What, then, should be the conceptual framework that informs our choice of tools? How do we go about ensuring that TDI is MASCOT? If we want TDI to be mainstream, what needs to be in place? If we want our testing to be automated, what do we need to accomplish that? What does it mean for our tests to be side-effect aware? What specifically do we need to make sure this is happening? What about continuous integration—how do we go about that? Are there tools? Workflows? What do we require, and what do we lack? In order to perform the whole endeavor test-first, what do we need? Or perhaps, what does the beginning infrastructure developer lack, and if they were to be handed a starter pack, with “TDI Essentials: A Toolkit for Success” written on the front, what would it contain?

I think it makes sense to try to break the requirements down into four broad areas. Obviously we need to write the tests themselves, which requires that we have access to a testing framework, and supporting tools and documentation to help us write those tests. Naturally we need to run our tests, and indeed have them run in our absence, without our constant input. Given that we’re testing infrastructure, we need to be able to set up and tear down—to build a test infrastructure for the purposes of testing. This is effectively a provisioning problem. And finally, we need to be told the results, in a meaningful way, in a timely manner, and in such a way that encourages us to take action. That is to say, the feedback we get needs to be directed, relevant, and accurate.

Let’s unpack these four supporting, conceptual pillars:

  • Writing
  • Running
  • Provisioning
  • Feedback

Writing Tests

The process of testing code consists of setting up state, introducing some input that changes that state, and then comparing the resulting state with our expectations. As discussed in Chapter 5, it’s apparent that this test writing needs to take place at several different levels—from the high-level behavior of the overall system we’re building, to the verification that distribution-specific variables are evaluated correctly.
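As a minimal sketch of that three-step shape, consider the following Ruby. The converge method is a hypothetical stand-in for a Chef run, and the node state is modelled as a plain Hash purely for illustration; neither is part of Chef itself.

```ruby
# Hypothetical stand-in for a Chef run: returns a new state with the
# requested packages added. It does not mutate the state it is given.
def converge(state, packages_to_install)
  { packages: (state[:packages] + packages_to_install).uniq }
end

# Walks through the three steps and reports whether the expectation held.
def run_example
  # 1. Set up the initial state.
  initial = { packages: ["openssh-server"] }

  # 2. Introduce some input that changes that state.
  actual = converge(initial, ["apache2"])

  # 3. Compare the resulting state with our expectations.
  expected = { packages: ["openssh-server", "apache2"] }
  actual == expected
end
```

A real infrastructure test would replace the Hash with an inspection of an actual machine, but the setup, input, and comparison steps would keep the same shape.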

The main challenge here is in making this process easy. Having to write verbose, manual expectations to assert that, for example, a package was installed, is tiresome. Such expectations and assertions can be simplified and shared. The more complex, end-to-end systems are more likely to pose involved and bespoke challenges; however, as the corpus of tests in the community grows, so will the body of experience and confidence.

It needs to be easy for infrastructure developers to assert that a resource is in the desired state. Ideally this should take the form of potted assertions that can be reused, rather than requiring developers to create this scaffolding themselves.
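To make the idea of reusable assertions concrete, here is a small sketch in plain Ruby. Everything in it is an assumption made for illustration: the PottedAssertions module, its method names, the ExpectationFailed error, and the Hash used to represent a node’s state are all hypothetical and do not belong to any existing testing library.

```ruby
# Raised when a node is not in the expected state (hypothetical error class).
class ExpectationFailed < StandardError; end

# A hypothetical library of shared assertions. The node state is a plain
# Hash here; a real framework would query the machine under test instead.
module PottedAssertions
  module_function

  # Passes if the named package appears in the node's package list.
  def assert_package_installed(state, name)
    return true if state.fetch(:packages, []).include?(name)

    raise ExpectationFailed, "expected package '#{name}' to be installed"
  end

  # Passes if the named service is recorded as :running.
  def assert_service_running(state, name)
    return true if state.fetch(:services, {})[name] == :running

    raise ExpectationFailed, "expected service '#{name}' to be running"
  end
end
```

With helpers like these, a test shrinks to a single readable line, such as PottedAssertions.assert_package_installed(state, "apache2"), and the failure message is written once rather than in every cookbook.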

Running Tests

Once infrastructure developers feel confident in writing tests, they need to establish the most effective way to run both their tests, and in some cases Chef itself, on or against a range of systems.

I don’t think it’s unfair to claim that the mechanism by which tests are run automatically is pretty much a solved problem. There are mature job runners and continuous integration frameworks and even online services that are designed specifically for this task and are used every day by countless software development organizations.

However, orchestrating the running of tests we have written is not without its own challenges, especially if the tests are to be run on remote machines or to span multiple systems. In line with our desire for maximum automation, we also need to establish the most effective way to run tests in an unattended way, on commit, or with predictable periodicity.

Of course, a prerequisite for being able to run these tests is the ability to provision test infrastructure rapidly and painlessly. We consider this next.

Provisioning Machines

The holy grail of infrastructure testing is the ability to specify an infrastructure feature, provision some hardware, apply Chef, and verify that the intended behavior has been met, all quickly and automatically.

Primitive testing can be carried out on one’s own development workstation, but pretty soon the ability to provision fresh machines, run Chef against them, and then test them becomes a clear requirement.

As soon as the infrastructure we’re building has one or more of the following characteristics, we need to solve this problem:

  • The infrastructure runs on a different OS from that of the developer’s workstation.
  • The infrastructure runs on more than one OS or distribution.

And, in fact, there is always going to be a need for a complete end-to-end test, which at the very least demands a brand new, fresh, unadulterated machine from which to start, and which may involve a large number of different machines.

Advances in desktop virtualization technology, and the ready availability of highly powered laptops and workstations, do make keeping this test environment on one’s local machine more achievable than it was a few years ago. Indeed the ready availability of local test machines has brought about a significant upsurge in people starting to take infrastructure testing seriously. However, we need to think beyond our local machines to facilitate unattended testing and shared infrastructure, and to accommodate the reality of a world in which some developers suffer under highly restrictive IT policies, and in which some organizations (especially charities, non-profits, and businesses in the developing world) simply don’t have the same degree of power and freedom with their local machines as others.

The requirement, therefore, is to be able to provision machines, install Chef, create and apply appropriate run lists, and then run Chef to bring the machines in line with our stated policy. This is a pretty in-depth process. Assuming the Chef code has been written, we still need to make the latest version of the code available, and then converge the node or nodes. We then need to be able to run some kind of test against the converged nodes, from a machine that behaves like an external client.
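The order of those steps can be sketched as a simple pipeline that stops at the first failure. This is only an outline under stated assumptions: the step names, the run_pipeline method, and the idea of passing each step in as a lambda are hypothetical, and the real work of each step (calling a cloud API, running Chef, and so on) is deliberately left to the caller.

```ruby
# The steps from the paragraph above, in the order they must happen.
STEP_NAMES = [:provision, :install_chef, :apply_run_list, :converge, :verify].freeze

# Runs each step in order against the named node. `steps` is a Hash mapping
# every step name to a callable that takes the node name and returns true on
# success. Later steps are skipped as soon as one step fails.
def run_pipeline(node, steps)
  completed = []
  STEP_NAMES.each do |name|
    ok = steps.fetch(name).call(node)
    return { ok: false, failed_at: name, completed: completed } unless ok

    completed << name
  end
  { ok: true, failed_at: nil, completed: completed }
end
```

Stopping early matters here: there is little point running tests from an external client against a node that failed to converge, and the failed_at value tells us which stage to investigate.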

Virtualization has made this process much simpler than it was even 10 years ago, and excellent network APIs exist for many cloud providers, which makes automated provisioning as a part of the testing process well within our capabilities.

The ability to create snapshots, roll forwards and backwards, or clone or freshly provision machines makes the setting up of a platform on which or against which to run tests an achievable aim. One variable to consider, however, is the number of machines required. If your infrastructure supports three or more different underlying platforms (such as two different Linux distributions and a flavor of BSD, or Windows), the requirement is now to be able to run and work with at least three machines. If these machines are to be reasonably responsive and performant, resources in terms of processor power and memory need to be appropriately allocated. Modern hardware brings this within reach, with multi-core laptops with 8 GB of memory now not uncommon, but insufficiently powerful developer workstations are still common, so alternative approaches need to be considered.

Feedback of Results

It’s actually the speed of the tests that represents one of the biggest challenges. We want the feedback time to be sufficiently quick as to be rewarding and not frustrating.

The main constraining factor when testing Chef code, which impacts the speed of tests, is the time taken to converge the node during a Chef run. A complex cookbook, making use of search, and using the Opscode Hosted Chef platform, could take a minute or more to run per node. Unit tests that take three minutes to run have a high likelihood of being skipped or ignored, so working out the most effective way to converge nodes is highly significant.

Related to the running of the tests is the mechanism for extracting the results. Again, at small scale, running tests and observing the results is trivial. However, storing these results for later analysis, or running the tests and being able to see the results some hours later requires more thought.

In order to achieve continuous integration, we need to make the connection between a line of text on a console indicating a failed test, and something that an automated test runner can understand to mean that the build failed.
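By convention, the signal that automated runners understand is the exit status of the test process: zero means success, and anything else is treated as a failed build. The following sketch shows one way the console output and the exit status might be tied together. The report and exit_status_for methods, and the representation of results as name and pass/fail pairs, are assumptions made for illustration.

```ruby
# Each result is a hypothetical [name, passed] pair.
# Returns 0 when every test passed, and 1 otherwise.
def exit_status_for(results)
  results.all? { |_name, passed| passed } ? 0 : 1
end

# Prints one human-readable line per test, then returns the status that
# the process should exit with so that a job runner can act on it.
def report(results, out = $stdout)
  results.each do |name, passed|
    out.puts "#{passed ? 'PASS' : 'FAIL'}: #{name}"
  end
  exit_status_for(results)
end

# A wrapper script would typically end with a line such as:
#   exit report(results)
```

The human reads the PASS and FAIL lines; the machine reads only the number. Keeping both derived from the same results means they cannot disagree.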

Finally, in line with our desire to encourage and enforce shared standards for quality, it’s necessary to provide a means of both defining and assessing compliance against those standards. This has both a community and technological aspect—the standards need to be discussed and agreed, and then an approach to validating code against those standards that is flexible, fast, and automated is required.

Having drawn up a framework for test-driven infrastructure, we now turn to building a toolkit.

