Caring for Your Tests

The benefit of automating your features is that you’ll be able to trust them as living documentation in the long run, because you’ll be checking each scenario against the production code to make sure it’s still valid. For the programmers working on the code, there’s another benefit too: those tests act as a safety net when they’re working on the system, alerting them to any mistakes they make that break existing behavior.

So, your features work as a feedback mechanism to the whole team about the behavior of the system and to the programmers about whether they’ve broken anything. For these feedback loops to be useful, the tests need to be fast and they need to be reliable. Let’s start by looking at problems that affect the reliability of your tests.

Leaky Scenarios

Cucumber scenarios are basically state-transition tests: you put the system into a known state A (Given), perform an action X (When), and then check that it has moved into the expected state B (Then). So, each scenario needs the system to be in a certain state before it begins, and yet each scenario also leaves the system in a new, dirty state when it’s finished.
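For example, a scenario for a hypothetical money-transfer feature makes the three states explicit (the feature and wording here are invented purely for illustration):

    Scenario: Transfer money between accounts
      Given my checking account has a balance of $50
      When I transfer $20 to my savings account
      Then my checking account should have a balance of $30

The Given puts the system into state A, the When performs action X, and the Then checks for the expected state B.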

When the state of the system is not reset between tests, we say that they allow state to leak between them. This is a major cause of brittle tests.

When one scenario depends upon state left behind by another earlier scenario in order for it to pass, you’ve created a dependency between the two scenarios. When you have a chain of several scenarios depending on each other like this, it’s only a matter of time before you end up with a train wreck.

If that first scenario, the one that happens to leave the system in just the right state for the next one to pick it up, is ever changed, the next scenario will suddenly start to fail. Even if you don’t change that earlier scenario, what happens if you want to run only the second scenario on its own? Without the state leaked out by the earlier scenario, it will fail.

The opposite of this is independent scenarios, which first put the system into a known, clean state and then add their own data on top of it. This allows them to stand on their own, rather than being coupled to the data left behind by other tests or to shared fixture data. Investing in building up a good, reliable library of Test Data Builders makes this much easier to achieve.
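One common way to guarantee that clean starting point is a Before hook in your support code that wipes the database before every scenario. Here’s a minimal sketch using the DatabaseCleaner gem; the file name is just a convention, and truncation is only one of several strategies it offers:

    # features/support/hooks.rb
    require 'database_cleaner'

    DatabaseCleaner.strategy = :truncation

    Before do
      # Start every scenario from an empty database; the scenario then
      # builds only the data it needs on top of this clean slate.
      DatabaseCleaner.clean
    end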

Matt says:
Fixture Is an Overloaded Term

The word fixture has at least three meanings in the domain of automated testing, which can cause confusion. We’ve used the term fixture data in this chapter to mean data that’s used to set up the context for a scenario or test case. This is the common meaning of the term as used in various xUnit testing tools[27] and by the Ruby on Rails framework.

There is a long tradition (coming from the hardware world, where test fixtures originated) of calling the link between the test system and the system under test a fixture. This is the “glue code” role that we’ve referred to in this book as automation code. The FIT testing framework[28] uses this meaning of the term.

Some unit testing tools (such as NUnit) have further confused the issue by referring to the test case class itself as a fixture. So much for a ubiquitous language!

We can’t stress enough how fundamental independent scenarios are to successful automated testing. Apart from the added reliability of scenarios that independently set up their own data, they’re also clearer to read and debug. When you can see precisely what data is used by a scenario just by reading it, rather than having to root around in a fixture data script or, worse, in the database itself, you’ll be able to understand and diagnose a failure much more easily.

Race Conditions and Sleepy Steps

If you write end-to-end integration tests for a reasonably complex system, you’ll eventually encounter this problem. Race conditions occur when two or more parts of the system are running in parallel, but success depends on a particular one of them finishing first. In the case of a Cucumber test, your When step might cause the system to start some work that it runs in the background, such as generating a PDF or updating a search index. If this background task happens to finish before Cucumber runs your Then step, the scenario will pass. If Cucumber wins the race and the Then step executes before the background task is finished, the scenario will fail.

When it’s a close race, you’ll have a flickering scenario, where the scenario will pass and fail intermittently. If there’s a clear winner, a race condition can go unnoticed for a long time, until a new change to the system evens up the odds, and the scenario starts to fail at random.

A crude solution to this problem is to introduce a fixed-length pause or sleep into the scenario to give the system time to finish processing the background task. Although this is definitely a useful technique in the very short term to diagnose a race condition, you should resist the temptation to leave a sleep in your tests once you understand the cause of the problem. Introducing sleepy steps won’t solve the flickering problem; it will just make it less likely to happen. In the meantime, you’ve added a few extra seconds to your total test run time, swapping one set of problems for another.

If we had to choose, we’d choose slow, reliable tests over faster unreliable ones, but there’s no need to make this compromise. When testers and programmers pair up to automate scenarios, they can craft tests that are built with knowledge of how the system works. This means they can make use of cues from the system to let the tests know when it’s safe to proceed, so instead of using crude fixed-length sleeps, the tests can proceed as quickly as possible. For an example of working with asynchronous code and further detail, see Chapter 9, Dealing with Message Queues and Asynchronous Components.
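As a taste of what that can look like, a step can poll for the expected outcome and give up only after a sensible timeout, rather than sleeping for a fixed length of time. Here’s a minimal sketch of such a helper; the eventually method, its parameters, and the @pdf_path variable are our own inventions for the example, not part of Cucumber:

    # features/support/eventually.rb
    module AsyncSupport
      # Keep checking the block until it returns a truthy value, or
      # give up after `timeout` seconds. This replaces a crude
      # fixed-length sleep with a poll that proceeds as soon as the
      # system is ready.
      def eventually(timeout = 5, interval = 0.1)
        deadline = Time.now + timeout
        until yield
          raise "condition not met within #{timeout} seconds" if Time.now > deadline
          sleep interval
        end
      end
    end
    World(AsyncSupport)

A Then step can then wait just as long as it needs to:

    Then(/^the PDF should have been generated$/) do
      eventually { File.exist?(@pdf_path) }
    end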

Shared Environments

This is a problem that we’ve often found in teams that are transitioning from a manual acceptance testing regime to using automated acceptance tests. Traditionally, the manual testers on the team would have a special environment, often called system test, where a recent build of the system would be deployed. They’d run their manual tests in that environment, reporting bugs back to the development team. If more than one team member needed to run tests in that same environment, they’d communicate between each other to make sure they didn’t tread on one another’s toes.

If it’s even slightly awkward to install the system in a new environment, the likelihood is that when the team starts to automate their tests, they’ll follow the path of least resistance and point their test scripts at this existing system test environment. Now the environment is shared between not only the human members of the team but the test scripts too. Suppose a developer gets a bug report and wants to reproduce it for himself. He logs in to the system test environment and clicks a few buttons, but he doesn’t realize the automated tests are running at the same time. As part of the steps to reproduce the bug, the developer unwittingly deletes a database record that the automated test relied on, and the automated test fails. This kind of situation is a classic cause of flickering scenarios.

The shared use of a single environment can also contribute to unreliable tests by causing heavy and inconsistent load on in-demand resources like databases. When the shared database is under excessive load, normally reliable tests will time out and fail.

To deal with this problem, it needs to be so easy to spin up the system in a new environment that you can do it for fun. You need a One-Click System Setup.
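What that looks like depends entirely on your stack, but as a rough sketch (the task name and commands here are purely illustrative), a single Rake task that takes a bare checkout to a running, private environment goes a long way:

    # lib/tasks/environment.rake
    namespace :env do
      desc "Build a fresh, private environment from scratch"
      task :setup do
        # Everything needed to stand up the system locally, so nobody
        # is tempted to point their tests at the shared system test box.
        sh "bundle install"
        sh "bundle exec rake db:create db:schema:load db:seed"
      end
    end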

Tester Apartheid

Testers are too often unfairly regarded as second-class citizens on a software team. As we’ll explain in Chapter 8, Support Code, developing a healthy suite of Cucumber features requires not only testing skill but programming skill too. When testers are left alone to build their own Cucumber tests, they may lack the software engineering skill to keep their step definition and support code well organized. Before you know it, the tests are a brittle muddle that people are scared to change.

Combat this problem by encouraging programmers and testers to work together when writing step definition and support code. The programmers can show the tester how to keep the code organized and factor out reusable components or libraries that other teams can use. This is precisely how libraries like Capybara (see Chapter 15, Using Capybara to Test Ajax Web Applications) were created, when a programmer extracted reusable code from their team’s step definitions. By pairing with testers like this, programmers also develop a better understanding of what it takes to make their code testable.

When Cucumber is being used to good effect on a team, a tester should be able to delegate the work of running basic checks to Cucumber. This frees them up to do the more interesting, creative work of exploratory testing, as explained in Agile Testing: A Practical Guide for Testers and Agile Teams [CG08].

Fixture Data

When manually testing a system, it’s useful to populate it with realistic data so that you can wander around the system just like you would in the live application. When your team transitions from manual to automated tests, it’s tempting to just port over a subset of production data so that the automated tests have a functioning system to work with straightaway.

The alternative, having each test set up its own data, might seem like it’s just too hard. In a legacy system, especially one whose design has evolved organically, creating the single object a test actually needs can mean creating a huge tree of other dependent objects, so the easiest option feels like creating that tree once in the fixture data and sharing it with all the other tests.

This approach has a couple of significant problems. A set of fixture data, even if it starts out relatively lean, will tend to grow in size over time. As more and more scenarios come to rely on the data, each needing their own specific tweaks, the size and complexity of the fixture data grows and grows. You’ll start to feel the pain of brittle features when you make a change to the data for one scenario, but that change causes other scenarios to fail. Because you have so many different scenarios depending on the fixture data, you’ll tend to plaster on more data because it’s safer than changing the existing data. When a scenario does fail, it’s hard to know which of the masses of data in the system could be relevant to the failure, making diagnosis much more difficult.

When you have a large set of fixture data, it can be slow to set it up between each test. This puts pressure on you to write leaky scenarios that don’t reset the state of the system between each scenario, because it’s quicker not to do so. We’ve already explained where this can lead.

We consider fixture data to be an antipattern. We much prefer using Test Data Builders like FactoryGirl[30] where the relevant data is created within the test itself, rather than being buried away in a big tangled set of fixture data.
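To give a flavor of what this looks like (the factory, its attributes, and the step wording are invented for the example), a scenario can build exactly the record it needs through a factory, right in the step definition:

    # features/support/factories.rb
    FactoryGirl.define do
      factory :customer do
        name  { "Alice" }
        email { "alice@example.com" }
      end
    end

    # features/step_definitions/customer_steps.rb
    Given(/^a customer named "(.*?)"$/) do |name|
      # Only the data this scenario cares about is created here, so the
      # scenario documents its own starting state.
      @customer = FactoryGirl.create(:customer, name: name)
    end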

Lots of Scenarios

It might seem like stating the obvious, but having a lot of scenarios is by far the easiest way to give yourself a slow overall feature run. We’re not trying to suggest you give up on BDD and go back to cowboy coding, but we do suggest you treat a slow feature run as a red flag. Having lots of tests has other disadvantages besides a long wait for feedback. It’s hard to keep a large set of features organized, making them awkward for readers to navigate. Maintenance of the underlying step definitions and support code also becomes harder.

We find that teams that have a single humongous build also tend to have an architecture that could best be described as a big ball of mud. Because all of the behavior in the system is implemented in one place, all the tests have to live in one place, too, and have to all be run together as one big lump. This is a classic ailment of long-lived Ruby on Rails applications, which tend to grow organically without obvious interfaces between their subsystems.

We’ll talk more about what to do with big balls of mud in the next section. It’s really important to face up to this problem and tackle it once you realize it’s happening, but it isn’t a problem you’ll be able to solve overnight.

In the meantime, you can keep your features organized using subfolders and tags (see Chapter 5, Expressive Scenarios). Tagging is especially helpful, because you can use tags to partition your tests. You can choose to run partitioned sets of tests in parallel or even demote some of them to run in a Nightly Build.
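For example, you might tag your slowest features and keep them out of the everyday run (the tag name is arbitrary, and the exact --tags syntax depends on your Cucumber version):

    @nightly
    Feature: Month-end reporting

The day-to-day run can then exclude those features with something like cucumber --tags ~@nightly (newer Cucumber versions use --tags "not @nightly" instead), while the nightly build runs the whole suite.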

It’s also worth thinking about whether some of the behavior you’ve specified in Cucumber scenarios could be pushed down and expressed in fast unit tests instead. Teams that enthusiastically embrace Cucumber sometimes forget to write unit tests as well and rely too much on slow integration tests for feedback. Try to think of your Cucumber scenarios as broad brush strokes that communicate the general behavior of the code to the business, but still try to get as good coverage as you can from fast unit tests. Help make this happen by having testers and programmers work in pairs when implementing Cucumber scenarios. This pair can make good decisions about whether a piece of behavior really needs to be specified in a slow end-to-end Cucumber scenario or whether it can be driven out with a fast unit test instead.
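For example, a single Cucumber scenario might check that an invalid order is rejected at all, while the individual validation rules are each pinned down by fast unit tests. Here’s a hypothetical RSpec sketch; the Order model and its validations are invented for illustration:

    # spec/models/order_spec.rb
    require 'spec_helper'

    describe Order do
      it "is invalid without a delivery address" do
        expect(Order.new(delivery_address: nil)).not_to be_valid
      end

      it "is invalid when the quantity is zero" do
        expect(Order.new(quantity: 0)).not_to be_valid
      end
    end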

Big Ball of Mud

The Big Ball of Mud[31] is an ironic name given to the type of software design you see when nobody has really made much effort to actually do any software design. In other words, it’s a big, tangled mess.

We have already explained how a big ball of mud manifests itself as problems in your Cucumber tests: slow features, fixture data, and shared environments are all examples of the trouble it can cause. Look out for these signals, and be brave about making changes to your system’s design to make it easier to test.

We suggest Alistair Cockburn’s ports and adapters architecture[32] as a way of designing your system to be testable. Michael Feathers’s Working Effectively with Legacy Code [Fea04] gives many practical examples of breaking up large systems that weren’t designed to be tested.
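To give a flavor of the idea (the class names here are invented), the core of the application talks to the outside world only through a narrow port, and each environment plugs in its own adapter; it’s this seam that lets fast tests drive the core without the surrounding infrastructure:

    # The port: the only thing the domain knows about taking payments.
    class PaymentGateway
      def charge(amount, card)
        raise NotImplementedError
      end
    end

    # Production adapter, wrapping the real payment provider.
    class StripePaymentGateway < PaymentGateway
      def charge(amount, card)
        # ...call the external service...
      end
    end

    # Test adapter, used by fast tests and most Cucumber scenarios.
    class FakePaymentGateway < PaymentGateway
      attr_reader :charges

      def initialize
        @charges = []
      end

      def charge(amount, card)
        @charges << [amount, card]
      end
    end

    # The domain code depends only on the port it is handed.
    class CheckoutService
      def initialize(gateway)
        @gateway = gateway
      end

      def place_order(order)
        @gateway.charge(order.total, order.card)
      end
    end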

Hold regular sessions with your team to talk about your architecture: what you like, what you don’t like, and where you’d like to take it. It’s easy to get carried away with ambitious ideas that evaporate into thin air soon after you get back to your desks, so make sure you try to leave these sessions with realistic, practical steps that will move you in the right direction.

That covers the most common problems you’re likely to hit as you adopt Cucumber into your team. It’s one thing to understand these problems, however, and quite another to get the time to do anything about them. Next we’ll talk about an important technique for making that time.
