Fast Tests, Slow Tests, Filters, and Suites

If you test-drive small, isolated units of code, each of your tests can run blazingly fast. A typical test might run in less than a millisecond on a decent machine. At that speed, you can run thousands of unit tests in a couple of seconds.

Dependencies on collaborators that interact with slow resources such as databases and external services will slow things down. Simply establishing a database connection might take 50ms. If a large percentage of your tests must interact with a database, your thousands of tests will require minutes to run. Some shops wait more than a half hour for all their tests to run.
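To make the arithmetic concrete, here is a back-of-the-envelope sketch. The one-millisecond and 50ms figures come from the discussion above; the suite size of 3,000 tests and the assumption that half of them open a database connection are hypothetical numbers chosen only for illustration, and suiteMillis is a made-up helper name.

```cpp
#include <cstddef>

// Rough estimate of a suite's run time in milliseconds.
// Every test pays baseMs; each of the slowCount tests additionally
// pays overheadMs (for example, the cost of opening a database connection).
double suiteMillis(std::size_t testCount, double baseMs,
                   std::size_t slowCount, double overheadMs) {
    return static_cast<double>(testCount) * baseMs +
           static_cast<double>(slowCount) * overheadMs;
}
```

With these assumed numbers, suiteMillis(3000, 1.0, 0, 50.0) comes to 3,000ms, or three seconds, while suiteMillis(3000, 1.0, 1500, 50.0) comes to 78,000ms, well over a minute, and that counts only the cost of connecting.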

You’ll learn how to begin breaking those dependencies on slow collaborators in Chapter 5, Test Doubles. Building fast tests might represent the difference between succeeding with TDD and abandoning it prematurely. Why?

A core goal of doing TDD is to obtain as much feedback as possible as often as possible. When you change a small bit of code, you want to know immediately whether your change was correct. Did you break something in a far-flung corner of the codebase?

You want to run all of your unit tests with each small change. A significant payoff of TDD is the ability to obtain this ridiculously rapid and powerful feedback. If you build fast unit tests, it’s absolutely possible, as suggested, to run all your tests in a few seconds. Running all your tests all the time is not unreasonable with such short wait times.

If instead your tests take more than a few seconds to run, you won’t run them as often. How often would you wait two minutes for your tests to run? Perhaps five times an hour? What if the tests take twenty minutes to run? You might run those tests a handful of times per day.

TDD begins to diminish in power as the feedback cycle lengthens. The longer the time between feedback, the more questionable code you will create. It’s easy and typical to introduce small, or even large, problems as you code. In contrast, running tests after every small change allows you to incrementally address these problems. If your test reveals a defect after you’ve added a couple minutes of new code, you can easily pinpoint code that caused the problem. If you created a few lines of difficult code, you can clean it up readily and safely.

Slow tests create such a problem for TDD that some folks no longer call them unit tests but instead refer to them as integration tests. (See Unit Tests, Integration Tests, and Acceptance Tests.)

Running a Subset of the Tests

You might already have a set of tests (a suite) that verifies a portion of your system. Chances are good that your test suite isn’t so fast, since most existing systems exhibit countless dependencies on slow collaborators.

The first natural reaction to a slow test suite is to run tests less frequently. That strategy doesn’t work if you want to do TDD. The second reaction is to run a subset of the tests all the time. While less than ideal, this strategy might just work, as long as you understand the implications.

Google Mock makes it easy to run a subset of the tests by specifying what is known as a test filter. You specify a test filter as a command-line argument to your test executable. The filter allows you to specify tests to execute in the form TestCaseName.TestName, that is, the test case name, a period, and the test name. For example, to run a single, specific test, do this:

./test --gtest_filter=ATweet.CanBeCopyConstructed # weak

Don’t get in a regular habit of running only one test at a time, however. You can use a wildcard character (*) to run numerous tests. Let’s at least run all of the tests associated with the ATweet test case.

./test --gtest_filter=ATweet.* # slightly less weak

If you’re test-driving a class related to tweets, perhaps the best thing you can do is find a way to run all other tests related to tweets, too. You have wildcards, dude (or dude-ess)! Take advantage of them.

./test --gtest_filter=*weet*.*

Now we’re talking! That filter also includes all of the tests against RetweetCollection. (I cleverly omitted the capital T from the pattern, since a capital T would not match the lowercase t in RetweetCollection.)

What if you want to run all tests related to tweets but avoid any construction tests for the Tweet class? Google Mock lets you create complex filters.

./test --gtest_filter=*Retweet*.*:ATweet.*:-ATweet*.*Construct*

You use a colon (:) to separate individual filters. Once Google Mock encounters a negative sign (hyphen), all filters thereafter are subtractive. In the previous example, -ATweet*.*Construct* tells Google Mock to ignore all ATweet tests with the word Construct somewhere in their name.
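If the rules for combining filters seem fuzzy, a small model can help. The following is a minimal sketch of the semantics just described, not Google Mock's actual implementation: patterns separated by colons, an asterisk as a wildcard, and everything after the first hyphen treated as subtractive. The function names (matchesPattern, splitPatterns, filterSelects) are hypothetical, and the sketch assumes an empty additive part means "run everything."

```cpp
#include <cstddef>
#include <string>
#include <vector>

// True if name matches pattern, where '*' matches any run of characters
// (including none) and '?' matches exactly one character.
bool matchesPattern(const std::string& pattern, const std::string& name,
                    std::size_t p = 0, std::size_t n = 0) {
    if (p == pattern.size()) return n == name.size();
    if (pattern[p] == '*')
        return matchesPattern(pattern, name, p + 1, n) ||
               (n < name.size() && matchesPattern(pattern, name, p, n + 1));
    if (n == name.size()) return false;
    if (pattern[p] != '?' && pattern[p] != name[n]) return false;
    return matchesPattern(pattern, name, p + 1, n + 1);
}

// Splits text on ':' into individual patterns, skipping empty pieces.
std::vector<std::string> splitPatterns(const std::string& text) {
    std::vector<std::string> patterns;
    std::size_t start = 0;
    while (start <= text.size()) {
        std::size_t end = text.find(':', start);
        if (end == std::string::npos) end = text.size();
        if (end > start) patterns.push_back(text.substr(start, end - start));
        start = end + 1;
    }
    return patterns;
}

// True if the test named fullName ("TestCaseName.TestName") would run:
// it must match at least one additive pattern and no subtractive pattern.
bool filterSelects(const std::string& filter, const std::string& fullName) {
    const std::size_t dash = filter.find('-');
    std::string additive = filter.substr(0, dash);
    const std::string subtractive =
        dash == std::string::npos ? std::string() : filter.substr(dash + 1);
    if (additive.empty()) additive = "*";  // assumption: nothing listed means everything

    bool selected = false;
    for (const auto& pattern : splitPatterns(additive))
        if (matchesPattern(pattern, fullName)) selected = true;
    for (const auto& pattern : splitPatterns(subtractive))
        if (matchesPattern(pattern, fullName)) selected = false;
    return selected;
}
```

Given the complex filter shown earlier, filterSelects reports that ATweet.CanBeCopyConstructed is excluded, while a test named RetweetCollection.IsEmptyWhenCreated (a made-up name) is still selected.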

So, what’s the problem with running a subset of the tests? The more tests you can run in a short time, the more likely you will know the moment you introduce a defect. The fewer tests you run, the longer it will be on average before you discover a defect. In general, the longer the time between introducing a defect and discovering it, the longer it will take to correct. There are a couple of simple reasons. First, if you’ve worked on other things in the interim, you will often expend additional time to understand the original solution. Second, newer code in the area may increase the comprehension effort required as well as the difficulty of making a fix.

Many unit test tools (but not Google Mock) directly support the ability to permanently specify arbitrary suites. For example, CppUnit provides a TestSuite class that allows you to programmatically add tests to a collection of tests to run.

Test Triaging

I worked recently with a customer that had a sizable C++ codebase. They had embarked on TDD not long before my arrival and had built a few hundred tests. The majority of their tests ran slowly, and the entire test run took about three minutes, far too long for the number of tests in place. Most tests took 300 milliseconds or more to execute.

Using Google Mock, we quickly wrote a test listener whose job was to produce a file, slowTests.txt, containing a list of tests that took longer than 10ms to execute. We then altered Google Mock code to support reading a list of filters from a file. That change essentially provided suite support for Google Mock. We then changed Google Mock to support running the converse of a specified filter. Using the slowTests.txt file, the team could run either the slow subset of all tests or the fast subset of all tests. The slow suite took most of the three minutes to run (see Unit Tests, Integration Tests, and Acceptance Tests); the fast suite took a couple seconds.
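The customer's changes to Google Mock aren't reproduced here, but the bucketing logic is simple enough to sketch. The following assumes you already have each test's name and elapsed time in hand (for instance, gathered by a test listener as each test ends). Rather than altering Google Mock, it builds ordinary filter strings, relying on the fact that a filter beginning with a hyphen runs everything except the listed tests. TestTiming and the function names are hypothetical.

```cpp
#include <string>
#include <vector>

// One recorded result. A real listener would capture these values as each
// test finishes; here they are plain data so the sketch stands alone.
struct TestTiming {
    std::string fullName;  // "TestCaseName.TestName"
    long elapsedMs;
};

// Names of the tests that took longer than thresholdMs, in run order.
// These are the lines you would write to a file such as slowTests.txt.
std::vector<std::string> slowTestNames(const std::vector<TestTiming>& timings,
                                       long thresholdMs) {
    std::vector<std::string> slow;
    for (const auto& timing : timings)
        if (timing.elapsedMs > thresholdMs) slow.push_back(timing.fullName);
    return slow;
}

// Joins names with ':' so they can be passed as a single filter.
std::string joinWithColons(const std::vector<std::string>& names) {
    std::string joined;
    for (const auto& name : names) {
        if (!joined.empty()) joined += ':';
        joined += name;
    }
    return joined;
}

// Filter selecting only the slow tests. Callers should skip the slow run
// entirely when the list is empty.
std::string slowSuiteFilter(const std::vector<std::string>& slowNames) {
    return joinWithColons(slowNames);
}

// Converse filter: everything except the slow tests.
std::string fastSuiteFilter(const std::vector<std::string>& slowNames) {
    return slowNames.empty() ? std::string("*")
                             : "-" + joinWithColons(slowNames);
}
```

You would pass the resulting string to the test executable as the value of --gtest_filter. With hundreds of slow tests the command line gets unwieldy, which is presumably one reason reading the list from a file was attractive.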

We altered Google Mock to fail if an individual test exceeded a specified number of milliseconds (passed in as a command-line argument named slow_test_threshold). The customer then configured the continuous integration (CI) build to first run the fast test suite, triggering it to fail if any test was too slow, and then to run the slow suite.
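As a rough illustration of the threshold check, here is a minimal sketch. The argument's name, slow_test_threshold, comes from the customer's build; its exact command-line syntax is not shown in this chapter, so the --slow_test_threshold=N form below is an assumption, as are the function names.

```cpp
#include <cstdlib>
#include <string>

// Returns the threshold in milliseconds if an argument of the assumed form
// --slow_test_threshold=N is present, or -1 if no threshold was given.
long parseSlowTestThreshold(int argc, const char* const argv[]) {
    const std::string prefix = "--slow_test_threshold=";
    for (int i = 1; i < argc; ++i) {
        const std::string arg = argv[i];
        if (arg.compare(0, prefix.size(), prefix) == 0)
            return std::strtol(arg.c_str() + prefix.size(), nullptr, 10);
    }
    return -1;
}

// True if a test that ran for elapsedMs should fail the run for being too slow.
// A negative threshold means the check is turned off.
bool exceedsThreshold(long elapsedMs, long thresholdMs) {
    return thresholdMs >= 0 && elapsedMs > thresholdMs;
}
```

The test runner would call exceedsThreshold as each test completes and report a failure for any test that returns true, so that a developer finds out about a newly slow test on the very next run.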

We told the developers to run the fast suite continually at their desks as they test-drove code. The developers were to specify the slow_test_threshold value as they ran fast tests, so they knew as soon as they introduced an unacceptably slow test. Upon check-in, they were to run both slow and fast suites. We asked the developers to try to shrink the size of slowTests.txt over time, by finding ways to eliminate bad dependencies that made those tests run slowly (see Chapter 5, Test Doubles).

The team got the message and ended up with the bulk of their unit tests as fast tests. Last I heard, they were rapidly growing their test-driven codebase successfully.

I’ve helped a few teams through a similar process of test triaging. Often it’s the 80-20 rule: 20 percent of the tests take 80 percent of the execution time. Bucketing the tests into fast and slow suites allows a team to quickly improve their ability to do TDD.

The ability to define suites provides the basis for splitting tests so that you can run fast tests only. The identification of the slow tests allows you to incrementally transform crummy slow tests into spiffy fast tests.

You will usually need integration tests (tests that must integrate with external services; see Unit Tests, Integration Tests, and Acceptance Tests). They will be slow. Having the ability to arbitrarily define suites allows you to maintain many collections of tests that you run in the CI build as appropriate.

