Caring for Your Tests

The benefit of automating your features is that you’ll be able to trust them as living documentation in the long run, because you’ll be checking each scenario against the production code to make sure it’s still valid. For the programmers working on the code, there’s another benefit too: those tests act as a safety net when they’re working on the system, alerting them to any mistakes they make that break existing behavior.

So, your features work as a feedback mechanism to the whole team about the behavior of the system and to the programmers about whether they’ve broken anything. For these feedback loops to be useful, the tests need to be fast and they need to be reliable. Let’s start by looking at problems that affect the reliability of your tests.

Leaky Scenarios

Cucumber scenarios are basically state-transition tests: you put the system into a known state A (Given), perform an action X (When), and then check that it has moved into the expected state B (Then). So, each scenario needs the system to be in a certain state before it begins, and yet each scenario also leaves the system in a new, dirty state when it’s finished.
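For example, a scenario for a hypothetical money-transfer feature makes the three states explicit (the feature and wording here are invented purely for illustration):

    Scenario: Transfer money between accounts
      Given my checking account has a balance of $50
      When I transfer $20 to my savings account
      Then my checking account should have a balance of $30

The Given puts the system into state A, the When performs action X, and the Then checks for the expected state B.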

When the state of the system is not reset between tests, we say that they allow state to leak between them. This is a major cause of brittle tests.

When one scenario depends upon state left behind by another earlier scenario in order for it to pass, you’ve created a dependency between the two scenarios. When you have a chain of several scenarios depending on each other like this, it’s only a matter of time before you end up with a train wreck.

If that first scenario, the one that happens to leave the system in just the right state for the next one to pick it up, is ever changed, the next scenario will suddenly start to fail. Even if you don’t change that earlier scenario, what happens if you want to run only the second scenario on its own? Without the state leaked out by the earlier scenario, it will fail.

The opposite of this is independent scenarios, which first put the system into a known, clean state and then add their own data on top of it. This allows them to stand on their own, rather than being coupled to the data left behind by other tests or to shared fixture data. Investing in building up a good, reliable library of Test Data Builders makes this much easier to achieve.
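One common way to guarantee that clean starting point is a Before hook in your support code that wipes the database before every scenario. Here’s a minimal sketch using the DatabaseCleaner gem; the file name is just a convention, and truncation is only one of several strategies it offers:

    # features/support/hooks.rb
    require 'database_cleaner'

    DatabaseCleaner.strategy = :truncation

    Before do
      # Start every scenario from an empty database; the scenario then
      # builds only the data it needs on top of this clean slate.
      DatabaseCleaner.clean
    end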

Matt says:
Fixture Is an Overloaded Term

The word fixture has at least three meanings in the domain of automated testing, which can cause confusion. We’ve used the term fixture data in this chapter to mean data that’s used to set up the context for a scenario or test case. This is the common meaning of the term as used in various xUnit testing tools[27] and by the Ruby on Rails framework.

There is a long tradition (coming from the hardware world, where test fixtures originated) of calling the link between the test system and the system under test a fixture. This is the “glue code” role that we’ve referred to in this book as automation code. The FIT testing framework[28] uses this meaning of the term.

Some unit testing tools (such as NUnit) have further confused the issue by referring to the test case class itself as a fixture. So much for a ubiquitous language!

We can’t stress enough how fundamental independent scenarios are to successful automated testing. Apart from the added reliability of scenarios that independently set up their own data, they’re also clearer to read and debug. When you can see precisely what data is used by a scenario just by reading it, rather than having to root around in a fixture data script or, worse, in the database itself, you’ll be able to understand and diagnose a failure much more easily.

Race Conditions and Sleepy Steps

If you write end-to-end integration tests for a reasonably complex system, you’ll eventually encounter this problem. Race conditions occur when two or more parts of the system are running in parallel, but success depends on a particular one of them finishing first. In the case of a Cucumber test, your When step might cause the system to start some work that it runs in the background, such as generating a PDF or updating a search index. If this background task happens to finish before Cucumber runs your Then step, the scenario will pass. If Cucumber wins the race and the Then step executes before the background task is finished, the scenario will fail.

When it’s a close race, you’ll have a flickering scenario, where the scenario will pass and fail intermittently. If there’s a clear winner, a race condition can go unnoticed for a long time, until a new change to the system evens up the odds, and the scenario starts to fail at random.

A crude solution to this problem is to introduce a fixed-length pause or sleep into the scenario to give the system time to finish processing the background task. Although this is definitely a useful technique in the very short term to diagnose a race condition, you should resist the temptation to leave a sleep in your tests once you understand the cause of the problem. Introducing sleepy steps won’t solve the flickering problem; it will just make it less likely to happen. In the meantime, you’ve added a few extra seconds to your total test run time, swapping one set of problems for another.

If we had to choose, we’d choose slow, reliable tests over faster unreliable ones, but there’s no need to make this compromise. When testers and programmers pair up to automate scenarios, they can craft tests that are built with knowledge of how the system works. This means they can make use of cues from the system to let the tests know when it’s safe to proceed, so instead of using crude fixed-length sleeps, the tests can proceed as quickly as possible. For an example of working with asynchronous code and further detail, see Chapter 9, Dealing with Message Queues and Asynchronous Components.
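As a taste of what that can look like, a step can poll for the expected outcome and give up only after a sensible timeout, rather than sleeping for a fixed length of time. Here’s a minimal sketch of such a helper; the eventually method, its parameters, and the @pdf_path variable are our own inventions for the example, not part of Cucumber:

    # features/support/eventually.rb
    module AsyncSupport
      # Keep checking the block until it returns a truthy value, or
      # give up after `timeout` seconds. This replaces a crude
      # fixed-length sleep with a poll that proceeds as soon as the
      # system is ready.
      def eventually(timeout = 5, interval = 0.1)
        deadline = Time.now + timeout
        until yield
          raise "condition not met within #{timeout} seconds" if Time.now > deadline
          sleep interval
        end
      end
    end
    World(AsyncSupport)

A Then step can then wait just as long as it needs to:

    Then(/^the PDF should have been generated$/) do
      eventually { File.exist?(@pdf_path) }
    end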

Shared Environments

This is a problem that we’ve often found in teams that are transitioning from a manual acceptance testing regime to using automated acceptance tests. Traditionally, the manual testers on the team would have a special environment, often called system test, where a recent build of the system would be deployed. They’d run their manual tests in that environment, reporting bugs back to the development team. If more than one team member needed to run tests in that same environment, they’d communicate between each other to make sure they didn’t tread on one another’s toes.

If it’s even slightly awkward to install the system in a new environment, the likelihood is that when the team starts to automate their tests, they’ll follow the path of least resistance and point their test scripts at this existing system test environment. Now the environment is shared between not only the human members of the team but the test scripts too. Suppose a developer gets a bug report and wants to reproduce it for himself. He logs in to the system test environment and clicks a few buttons, but he doesn’t realize the automated tests are running at the same time. As part of the steps to reproduce the bug, the developer unwittingly deletes a database record that the automated test relied on, and the automated test fails. This kind of situation is a classic cause of flickering scenarios.

The shared use of a single environment can also contribute to unreliable tests by causing heavy and inconsistent load on in-demand resources like databases. When the shared database is under excessive load, normally reliable tests will time out and fail.

To deal with this problem, it needs to be so easy to spin up the system in a new environment that you can do it for fun. You need a One-Click System Setup.
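What that looks like depends entirely on your stack, but as a rough sketch (the task name and commands here are purely illustrative), a single Rake task that takes a bare checkout to a running, private environment goes a long way:

    # lib/tasks/environment.rake
    namespace :env do
      desc "Build a fresh, private environment from scratch"
      task :setup do
        # Everything needed to stand up the system locally, so nobody
        # is tempted to point their tests at the shared system test box.
        sh "bundle install"
        sh "bundle exec rake db:create db:schema:load db:seed"
      end
    end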

Tester Apartheid

Testers are too often unfairly regarded as second-class citizens on a software team. As we’ll explain in Chapter 8, Support Code, developing a healthy suite of Cucumber features requires not only testing skill but programming skill too. When testers are left alone to build their own Cucumber tests, they may lack the software engineering skill to keep their step definition and support code well organized. Before you know it, the tests are a brittle muddle that people are scared to change.

Combat this problem by encouraging programmers and testers to work together when writing step definition and support code. The programmers can show the tester how to keep the code organized and factor out reusable components or libraries that other teams can use. This is precisely how libraries like Capybara (see Chapter 15, Using Capybara to Test Ajax Web Applications) were created, when a programmer extracted reusable code from their team’s step definitions. By pairing with testers like this, programmers also develop a better understanding of what it takes to make their code testable.

When Cucumber is being used to good effect on a team, a tester should be able to delegate the work of running basic checks to Cucumber. This frees them up to do the more interesting, creative work of exploratory testing, as explained in Agile Testing: A Practical Guide for Testers and Agile Teams [CG08].

Fixture Data

When manually testing a system, it’s useful to populate it with realistic data so that you can wander around the system just like you would in the live application. When your team transitions from manual to automated tests, it’s tempting to just port over a subset of production data so that the automated tests have a functioning system to work with straightaway.

The alternative, having each test set up its own data, might seem like it’s just too hard. In a legacy system, especially one whose design has evolved organically, creating the single object a test actually needs can mean creating a huge tree of other dependent objects, so the easiest option feels like creating that tree once in the fixture data and sharing it with all the other tests.

This approach has a couple of significant problems. A set of fixture data, even if it starts out relatively lean, will tend to grow in size over time. As more and more scenarios come to rely on the data, each needing their own specific tweaks, the size and complexity of the fixture data grows and grows. You’ll start to feel the pain of brittle features when you make a change to the data for one scenario, but that change causes other scenarios to fail. Because you have so many different scenarios depending on the fixture data, you’ll tend to plaster on more data because it’s safer than changing the existing data. When a scenario does fail, it’s hard to know which of the masses of data in the system could be relevant to the failure, making diagnosis much more difficult.

When you have a large set of fixture data, it can be slow to set it up between each test. This puts pressure on you to write leaky scenarios that don’t reset the state of the system between each scenario, because it’s quicker not to do so. We’ve already explained where this can lead.

We consider fixture data to be an antipattern. We much prefer using Test Data Builders like FactoryGirl[30] where the relevant data is created within the test itself, rather than being buried away in a big tangled set of fixture data.
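To give a flavor of what this looks like (the factory, its attributes, and the step wording are invented for the example), a scenario can build exactly the record it needs through a factory, right in the step definition:

    # features/support/factories.rb
    FactoryGirl.define do
      factory :customer do
        name  { "Alice" }
        email { "alice@example.com" }
      end
    end

    # features/step_definitions/customer_steps.rb
    Given(/^a customer named "(.*?)"$/) do |name|
      # Only the data this scenario cares about is created here, so the
      # scenario documents its own starting state.
      @customer = FactoryGirl.create(:customer, name: name)
    end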

Lots of Scenarios

It might seem like stating the obvious, but having a lot of scenarios is by far the easiest way to give yourself a slow overall feature run. We’re not trying to suggest you give up on BDD and go back to cowboy coding, but we do suggest you treat a slow feature run as a red flag. Having lots of tests has other disadvantages besides a long wait for feedback. It’s hard to keep a large set of features organized, making them awkward for readers to navigate. Maintenance of the underlying step definitions and support code also becomes harder.

We find that teams that have a single humongous build also tend to have an architecture that could best be described as a big ball of mud. Because all of the behavior in the system is implemented in one place, all the tests have to live in one place, too, and have to all be run together as one big lump. This is a classic ailment of long-lived Ruby on Rails applications, which tend to grow organically without obvious interfaces between their subsystems.

We’ll talk more about what to do with big balls of mud in the next section. It’s really important to face up to this problem and tackle it once you realize it’s happening, but it isn’t a problem you’ll be able to solve overnight.

In the meantime, you can keep your features organized using subfolders and tags (see Chapter 5, Expressive Scenarios). Tagging is especially helpful, because you can use tags to partition your tests. You can choose to run partitioned sets of tests in parallel or even demote some of them to run in a Nightly Build.
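For example, you might tag your slowest features and keep them out of the everyday run (the tag name is arbitrary, and the exact --tags syntax depends on your Cucumber version):

    @nightly
    Feature: Month-end reporting

The day-to-day run can then exclude those features with something like cucumber --tags ~@nightly (newer Cucumber versions use --tags "not @nightly" instead), while the nightly build runs the whole suite.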

It’s also worth thinking about whether some of the behavior you’ve specified in Cucumber scenarios could be pushed down and expressed in fast unit tests instead. Teams that enthusiastically embrace Cucumber sometimes forget to write unit tests as well and rely too much on slow integration tests for feedback. Try to think of your Cucumber scenarios as broad brush strokes that communicate the general behavior of the code to the business, but still try to get as good coverage as you can from fast unit tests. Help make this happen by having testers and programmers work in pairs when implementing Cucumber scenarios. This pair can make good decisions about whether a piece of behavior really needs to be specified in a slow end-to-end Cucumber scenario or whether it can be driven out with a fast unit test instead.
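For example, a single Cucumber scenario might check that an invalid order is rejected at all, while the individual validation rules are each pinned down by fast unit tests. Here’s a hypothetical RSpec sketch; the Order model and its validations are invented for illustration:

    # spec/models/order_spec.rb
    require 'spec_helper'

    describe Order do
      it "is invalid without a delivery address" do
        expect(Order.new(delivery_address: nil)).not_to be_valid
      end

      it "is invalid when the quantity is zero" do
        expect(Order.new(quantity: 0)).not_to be_valid
      end
    end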

Big Ball of Mud

The Big Ball of Mud[31] is an ironic name given to the type of software design you see when nobody has really made much effort to actually do any software design. In other words, it’s a big, tangled mess.

We have already explained how a big ball of mud manifests itself as problems in your Cucumber tests: slow features, fixture data, and shared environments are all examples of the trouble it can cause. Look out for these signals, and be brave about making changes to your system’s design to make it easier to test.

We suggest Alistair Cockburn’s ports and adapters architecture[32] as a way of designing your system to be testable. Michael Feathers’s Working Effectively with Legacy Code [Fea04] gives many practical examples of breaking up large systems that weren’t designed to be tested.
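To give a flavor of the idea (the class names here are invented), the core of the application talks to the outside world only through a narrow port, and each environment plugs in its own adapter; it’s this seam that lets fast tests drive the core without the surrounding infrastructure:

    # The port: the only thing the domain knows about taking payments.
    class PaymentGateway
      def charge(amount, card)
        raise NotImplementedError
      end
    end

    # Production adapter, wrapping the real payment provider.
    class StripePaymentGateway < PaymentGateway
      def charge(amount, card)
        # ...call the external service...
      end
    end

    # Test adapter, used by fast tests and most Cucumber scenarios.
    class FakePaymentGateway < PaymentGateway
      attr_reader :charges

      def initialize
        @charges = []
      end

      def charge(amount, card)
        @charges << [amount, card]
      end
    end

    # The domain code depends only on the port it is handed.
    class CheckoutService
      def initialize(gateway)
        @gateway = gateway
      end

      def place_order(order)
        @gateway.charge(order.total, order.card)
      end
    end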

Hold regular sessions with your team to talk about your architecture: what you like, what you don’t like, and where you’d like to take it. It’s easy to get carried away with ambitious ideas that evaporate into thin air soon after you get back to your desks, so make sure you try to leave these sessions with realistic, practical steps that will move you in the right direction.

That covers the most common problems you’re likely to hit as you adopt Cucumber into your team. It’s one thing to understand these problems, however, and quite another to get the time to do anything about them. Next we’ll talk about an important technique for making that time.
