8

What Makes a Good Test?

A project developed with TDD will have a lot of tests, but don’t assume that more tests or longer tests are always better. You need good tests. So what makes a good test?

We’re not going to be writing more code in this chapter. Instead, this chapter looks back at some of the situations we’ve already encountered and refers ahead to some tests in upcoming chapters. It’s a chance to reflect on what you’ve learned so far and to look forward to upcoming topics.

A good test should do the following:

  • Be easy to understand – a good understanding will lead to better ideas for more tests and make tests easier to maintain.
  • Be focused on a specific scenario – don’t try to test everything in one giant test. Doing too much in a test breaks the first guideline, understandability.
  • Be repeatable – tests that rely on random behavior to sometimes catch problems can miss issues at the worst times.
  • Be kept close to the project – make sure that tests belong to the project they are testing.
  • Test what should happen instead of how it happens – if a test relies too much on internal workings, then it will be brittle and cause more work when the code is refactored.

Each of these topics will be explained in this chapter with examples.

Technical requirements

All code in this chapter comes from other chapters in this book and is used here as example code to reinforce ideas that make good tests.

Making tests easy to understand

Using descriptive test names is probably the single best thing you can do to improve your tests. I like to name my tests using simple sentences whenever possible. For example, one of the earliest tests we created is called "Test will pass without any confirms" and looks like this:

TEST("Test will pass without any confirms")
{
}

A good pattern to follow is this:

<object><does something><qualification>

Each of the three sections should be replaced with something specific to what you’re doing. For the example just given, <object> is Test, <does something> is will pass, and <qualification> is without any confirms.
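
To give one more illustration of the pattern, here’s a hypothetical test name (not one of the book’s actual tests) with the three parts marked in a comment:

TEST("Log appends to an existing file")
{
    // <object> is Log, <does something> is appends,
    // and <qualification> is to an existing file.
}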

I don’t always follow this pattern, especially when testing an object or a type for several different and related results. For example, a simple test immediately following the previous test looks like this:

TEST("Test bool confirms")
{
    bool result = isNegative(0);
    CONFIRM_FALSE(result);
    result = isNegative(-1);
    CONFIRM_TRUE(result);
}

For this simple test, there are only two possibilities. The bool is either false or true. The test is focused on the bool type, and the name fully describes what the test does. My advice is to follow the naming pattern when it makes sense.

The following are ways in which descriptive names help improve your tests:

  • The name is what you see in the summary description and will help anyone understand what is being tested with just a glance.
  • A descriptive name will help you spot holes in your testing because it’s easier to see what’s missing when you can clearly see what is already being tested.
  • A descriptive name that follows the given pattern will help you focus when writing the test. It’s easy to lose track of what a test is supposed to do and start including other things. A descriptive name helps you move related checks into their own tests, where they no longer clutter what is being tested and get their own descriptive names that help them stand out.

Putting all three benefits together, you get a feedback loop that reinforces the need for good naming. You’ll naturally create more tests because each one is focused. This leads to a better understanding of what is being tested, which helps you to find missing tests. And then, when writing the new tests, you’ll stay on track and create yet more tests as new ideas come to mind.

Imagine if, instead, we had taken a different approach and created a test called "Confirm". What would it do? Does it inspire you to think of more tests? And what code would you write in the test? This is a name that stops the cycle of better tests. No one will know what the test does without reading the code. No one will think of new scenarios because the focus is dragged into the code. And the test code itself will likely be all over the place yet still not cover everything that should be tested.

And let’s not forget that using TDD is supposed to help drive our designs, improve the quality of our software, and let us refactor and enhance the code with confidence. Descriptive names help with all of this.

You might find that after a major refactoring, certain tests are no longer applicable. That’s fine, and they can be deleted. Descriptive names will help us spot these outdated tests. Sometimes, instead of deleting tests, they can be updated, and tests that are focused will be easier to update.

The next step to creating good tests is to keep them simple. A complicated test is usually a symptom of a bad design. TDD helps improve designs. So when you find a complicated test, that’s your signal to simplify the design of the project being tested.

If you can make changes that simplify the tests, then that’s usually a double win. You get tests that are easier to understand, which leads to higher quality software that is easier to use. Remember that a test is a consumer of your software, just like any other component. So when you can simplify a test, you’re also simplifying other code that uses the same code being tested.

A big part of simplifying tests is to make use of setup and teardown code. This lets the test focus on what it needs to do and lets us read and understand the main point of the test without getting distracted with other code that just gets things ready.

For example, in Chapter 14, How to Test Services, I show you the test that I first created to test a service. The test created a local instance of the service and called start. I realized that other tests would likely need to start a service, so they might as well share the same service that has already been started with some setup code. The new test uses a test suite that allows multiple tests to share the same setup and teardown code. The test looks like this:

TEST_SUITE("Request can be sent and response received", "Service 1")
{
    std::string user = "123";
    std::string path = "";
    std::string request = "Hello";
    std::string expectedResponse = "Hi, " + user;
    std::string response = gService1.service().handleRequest(
        user, path, request);
    CONFIRM_THAT(response, Equals(expectedResponse));
}

This test has a descriptive name and focuses on what needs to be tested instead of what is needed to create and start the service. The test uses the global gService1 instance, which exposes the already running service through the service method.

By providing descriptive names and keeping your tests as simple as possible, you’ll find better results with TDD that will lead to better software designs. The next section goes into more detail about how to focus on specific scenarios.

Keeping tests focused on specific scenarios

The previous section explained that one of the benefits of descriptive names is that they help keep your tests focused. We’re going to explore in this section various scenarios that will give you ideas for what to focus on.

Saying that a test should be focused is great. But if you don’t know how to figure out what to focus on, then it won’t help you. The advice becomes empty and frustrating.

These five cases will make the advice more meaningful. Not all of them may apply to all situations. But having these will help you, sort of like a checklist. All you need to do is think about each one and write specific tests that cover the case. The cases are as follows:

  1. Happy or normal: This is a common use case.
  2. Edge: This is a case near the transition between normal and error cases.
  3. Error: This is a common problem that needs to be handled.
  4. Not normal: This is a valid but uncommon use case.
  5. Deliberate misuse: This is an error case designed to cause problems on purpose.

Let’s start with the happy or normal case first. This one should be easy, but it often gets over-complicated by including some of the other cases in the same test. Another way it can be over-complicated is by creating a test that is too vague or that doesn’t make it clear that it’s the happy or normal case.

The actual name for this should probably be the normal case since that matches the style of the other cases. But I so often think of this as the happy case that I included both names. You might also think of this as a typical case. However you think of it, all you need to do is pick a scenario that best describes a common way that your code will be used. I think of it as a happy case because there should not be any errors. This should represent usage that is expected and typical and should succeed. For example, in Chapter 13, How to Test Floating Point and Custom Values, there’s a test for float values that covers 1,000 typical values from 0.1 up to 100 in increments of 0.1. The test looks like this:

TEST("Test many float comparisons")
{
    int totalCount {1'000};
    int passCount = performComparisons<float>(totalCount);
    CONFIRM_THAT(passCount, Equals(totalCount));
}

An edge case is right on the borderline between a happy case and a problem or error case. You’ll often need two edge cases: one that represents the most extreme usage that is still within normal bounds, and another that marks the beginning of the error conditions. An edge case is a transition between good and bad results, and there can be more than one such transition.

Edge cases are extremely valuable to include in your testing because they tend to find a lot of bugs, and maybe even more importantly, they make you think about your design. When you consider edge cases, you’ll often either accept the edge case or change your design so that the edge case doesn’t apply anymore.

The edge cases for the previous float comparisons are to test a very small float value and a very large float value. These are two separate tests and look like this:

TEST("Test small float values")
{
    // Based on float epsilon = 1.1920928955078125e-07
    CONFIRM_THAT(0.000001f, NotEquals(0.000002f));
}
TEST("Test large float values")
{
    // Based on float epsilon = 1.1920928955078125e-07
    CONFIRM_THAT(9'999.0f, Equals(9'999.001f));
}

Edge cases can sometimes be a bit more technical because there’s usually a reason for the test to be an edge case. For float values, the edge cases are based on the epsilon value. Epsilon values are explained in Chapter 13, How to Test Floating Point and Custom Values. Adding tests for small and large floating point values will cause us to change the entire way that we compare floating point values in Chapter 13. This is why edge cases are so valuable in testing.
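
As a rough standalone check of the arithmetic behind the large-value test (this is not the book’s comparison code, just a sketch of the float spacing involved):

#include <cmath>
#include <iostream>

int main()
{
    // Near 9'999.0f, one representable step (ULP) is
    // 2^-10 = 0.0009765625, so 9'999.0f and 9'999.001f
    // differ by about one step. A comparison that uses a
    // relative epsilon treats values this close as equal.
    float a = 9'999.0f;
    float b = 9'999.001f;
    std::cout << std::nextafter(a, 10'000.0f) - a << "\n"; // ~0.00098
    std::cout << b - a << "\n"; // also ~0.00098 after rounding
}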

The error case is like the happy case turned sad. Think of a typical problem that your code might need to handle and write a test for that specific problem. Just like the happy case, this one can also be over-complicated. You don’t need to include minor variations of an error case just for the sake of variation alone. Just pick what you think represents the most common or middle case that should result in an error and create a test for just that one case. Of course, you will want to give the test a descriptive name that explains the case.

For example, in Chapter 11, Managing Dependencies, there’s a normal test to make sure that tags can be used to filter messages. An error case is almost the opposite and makes sure that an overridden default tag is not used to filter the message. The test might not make sense without first reading Chapter 11. I’m including it here as an example of an error case. Notice CONFIRM_FALSE at the end of the test, which is the part that ensures the log message does not appear in the log file. The test looks like this:

TEST("Overridden default tag not used to filter messages")
{
    MereTDD::SetupAndTeardown<TempFilterClause> filter;
    MereMemo::addFilterLiteral(filter.id(), info);
    std::string message = "message ";
    message += Util::randomString();
    MereMemo::log(debug) << message;
    bool result = Util::isTextInFile(message,
        "application.log");
    CONFIRM_FALSE(result);
}

If there are multiple error cases that you think are all important enough to be included, put them in separate tests and ask what makes each one different. This can lead to insights that cause you to change your design, or it might lead to more tests.

I like to include a few tests that are just outside of a normal case but not borderline or edge. These are still within valid use that should succeed but might cause your code to do a little extra work. This case can be valuable in helping to catch regressions. A regression is a new bug in something that previously worked, and regressions are most common after making a large design change. Having some tests that are not normal but still expected to succeed will improve your confidence that your code continues to work after major changes are made.
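
The book doesn’t show a test for this case here. As a hypothetical sketch in the same style as the earlier float tests, a not normal but valid case might exercise negative values, which fall outside the 0.1 to 100 range that the normal test covers:

TEST("Test float comparisons with negative values")
{
    // Negative values are valid input but outside the
    // range exercised by the normal comparison test.
    CONFIRM_THAT(-0.1f, NotEquals(0.1f));
    CONFIRM_THAT(-1.5f, Equals(-1.5f));
}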

The last case is deliberate misuse and is important for security reasons. This is not just an error case; it’s an error case crafted to try to cause your code to fail in predictable ways that an attacker can use for their own purposes. For cases like this, instead of creating tests that you know will fail, try to think of what would cause your code to fail spectacularly. Maybe your code treats negative numbers as errors. Then for deliberate misuse, maybe consider using really large negative numbers.
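
Continuing that hypothetical negative-number example, a deliberate misuse test might look like this (a sketch; processAmount is an invented function that treats negative input as an error):

TEST("processAmount rejects the most negative int")
{
    // Negating the most negative int overflows, so code
    // that handles negative input by negating it first
    // would misbehave on this value.
    int extreme = std::numeric_limits<int>::min(); // from <limits>
    CONFIRM_FALSE(processAmount(extreme));
}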

In Chapter 14, How to Test Services, there is mention of a possible test for deliberate misuse. We don’t actually create this test, but I do describe what the test would look like. In the service, there is a string value that represents the request being made. The code handles unrecognized request strings, and I mentioned that a good test would try to call the service with some request that doesn’t exist to make sure that the service properly handles the ill-formed request.

For a final piece of advice about focusing on specific scenarios, I’d like to recommend that you avoid retesting. This is not one of the five cases just mentioned because it applies to all of them.

Retesting is where you check the same thing or make the same confirmations over and over in many tests. You don’t need to do this, and it just distracts you from what each test should be focused on.

If there’s a property or result that you need to confirm works, then create a test for it. And then you can trust that it will be tested. You don’t need to confirm again and again that the property works as expected each time it’s used. Once you have a test for something, then you don’t need to verify it works in other tests.

Use random behavior only in this way

The previous chapter mentioned using random behavior in tests, and it’s important for you to understand more about this so that your tests are predictable and repeatable.

Predictability and randomness are about as opposite as you can get. How should we reconcile these two properties? The first thing to understand is that the tests you write should be predictable. If a test passes, then it should always pass unless something outside of your control fails, such as a hard drive crashing in the middle of your tests. There’s no way to predictably handle accidents like that, and that’s not what I’m talking about. I mean that if a test passes, then it should continue to pass until some code change causes it to fail.

And when a test fails, then it should continue to fail until the problem is fixed. The last thing you want is to add random behavior to your tests so that you sometimes do one thing and other times do another. That’s because the first behavior might pass, while the second behavior goes down a different code path that fails.

If you get a failure, make a change that you think fixes the problem, and then get a pass, you might conclude that your change worked. But what if the second test run just happened to use the random behavior that was always passing? Random behavior makes it hard to verify that a code change actually fixed a problem.

And worse yet, what happens when some random failure condition never gets run? You think all possible code path combinations are being run, but by chance, one or more conditions are skipped. This can cause you to miss a bug that should have been caught.

I hope I’ve convinced you to stay away from random test behavior. If you want to test different scenarios, then write multiple tests so that each scenario is covered by its own test that will reliably be run.
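
As a hypothetical sketch, instead of letting one test randomly pick a code path, give each path its own test that always runs (parseValue is an invented function for illustration):

// Avoid: a test that randomly picks one of two inputs
// might leave the other path untested for many runs.

// Prefer: each scenario always runs in its own test.
TEST("parseValue accepts quoted input")
{
    CONFIRM_TRUE(parseValue("\"abc\""));
}

TEST("parseValue accepts unquoted input")
{
    CONFIRM_TRUE(parseValue("abc"));
}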

Why, then, did I mention using randomness in the previous chapter? I actually do suggest that you use randomness but not in a way that determines what a test does; rather, in a way that helps prevent collisions between different test runs. The random behavior is mentioned in one of the helper functions to create a temporary table like this:

std::string createTestTable ()
{
    // If this was real code, it might open a
    // connection to a database, create a temp
    // table with a random name, and return the
    // table name.
    return "test_data_01";
}

Let’s say that you have a test that needs some data. You create the data in setup and delete it when the test is done in teardown. What happens if the test program crashes during the test and the teardown never gets a chance to delete the data? The next time you run your test, it’s likely that the setup will fail because the data still exists. This is the collision I’m talking about.

Maybe you think that you can just enhance the setup to succeed if it finds the data already exists. Well, you can also get collisions in other ways, such as when two members of a team happen to run the tests at almost the same time. Both setup steps run, and one of them finds the data already exists and continues. But before that test can begin to use the data, the other team member’s run finishes, and its teardown code deletes the data. The team member whose tests are still running will now see failures because the data no longer exists.

You can almost entirely eliminate this problem by generating random data: not so random that the behavior of the test is affected, but just random enough to avoid conflicts. Maybe the data is identified by a name. As long as the name is not part of what is being tested, the name can be varied slightly so that each time the test is run, the data will have a different name. The createTestTable function returns a hardcoded name, but the comment mentions that a random name might be better.
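
A minimal sketch of what that might look like, reusing the Util::randomString helper that appears in the book’s logging tests (an assumption; the original function only hints at this in its comment):

std::string createTestTable ()
{
    // A random suffix means that two simultaneous test
    // runs, or a run after a crash, won't collide on the
    // same table name.
    return "test_data_" + Util::randomString();
}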

There is a place for using fully random behavior in tests, such as penetration or fuzz testing, where you alter the data to simulate scenarios that you otherwise would not be able to write specific test cases for. The number of possible combinations could be too many to handle with specific named test cases. In these situations, it is a good idea to write tests that use random data that can change the behavior and outcome of the tests. But these types of tests won’t help you with TDD to improve your designs. They have a place that supplements TDD.

When writing tests that use random behavior, such as when handling uncountable combinations, you’ll need to capture the failures because each one will need to be analyzed to find the problem. This is a time-consuming process. It’s valuable, but not what you need when writing a test to help you figure out what design to use or when evaluating the results of a major design change to see if anything was broken.

For the types of tests that are most beneficial to TDD, avoid any random behavior that can change the outcome of the tests. This will keep your tests reproducible and predictable.

Only test your project

Your project will use other components and libraries, and they might fail. How should you handle these failures in your tests? My advice is to assume that only your code needs to be tested. You should assume that the components and libraries you are using have already been tested and are working correctly.

For one thing, remember that we are using TDD to improve our own code. If you were to write a test for some code that you bought or found online, how would this affect your own code?

There’s always a possibility that you are using an open source library and you have a good idea for improvement. That’s great! But that improvement belongs in that other project. It has no place in your own project’s tests. Even if you find a bug in a commercial software package, all you can do is report the problem and hope it gets fixed.

The last thing you want to do is put a confirmation in your own test project that confirms some other code is working as expected. Not only does this do nothing to improve your own designs, it actually makes your tests less focused. It takes away from the clarity you should be aiming for by adding distractions that don’t directly benefit your own project.

The next chapter of this book begins Part 2, where we’ll build a logging library. The logging library will be a separate project with its own set of tests. The logging library will also use the testing library we’ve been building. Imagine how confusing it would be if we were to add a new feature to the testing library and then test that new feature from the logging library.

Test what should happen instead of how

A common problem I see is when a test tries to verify expected results by checking internal steps along the way. The test is checking how something is done. This type of test is fragile and often needs frequent updates and changes.

A better approach is to test what happens as a final result. Because then the internal steps can change and adapt as needed. The test remains valid the entire time without further maintenance.

If you find yourself going through your tests and frequently updating them so that they pass again, then your tests might be testing how something is done instead of what is done.

For example, in Chapter 10, The TDD Process in Depth, there’s a section called When is testing too much? that explains the idea of what to test in greater detail.

The general idea is this. Let’s say you have a function that adds a filter to a collection. If you write a test that’s focused on how the code works, then you might go through the items in the collection to make sure the filter that was just added really appears in the collection. The problem with this approach is that the collection is an internal step and might change, which will cause the test to fail. A better approach is to first add the filter and then try to perform an operation that would be affected by the filter. Make sure that the filter affects the code as you expect, and leave the internal details of how it works to the code being tested. This is testing what should happen and is a much better approach.
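
A minimal sketch of the two styles, using invented names (the real filter tests appear in Chapter 11):

// Brittle: tests how it works by inspecting internal state.
TEST("Filter is stored in the filter collection")
{
    int id = addFilter(myFilter);
    // Coupled to an implementation detail that might
    // change during a refactoring.
    CONFIRM_TRUE(internalFilterCollection().contains(id));
}

// Robust: tests what should happen by observing the effect.
TEST("Filtered messages do not appear in the log")
{
    addFilter(myFilter);
    MereMemo::log(debug) << "should be filtered";
    bool result = Util::isTextInFile("should be filtered",
        "application.log");
    CONFIRM_FALSE(result);
}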

Summary

This has been more of a reflective chapter where you learned some tips to help you write better tests. Examples from earlier and later chapters were used to help reinforce the ideas and guidance. You’ll write better tests if you make sure to consider the following items:

  • The tests should be easy to understand with descriptive names.
  • Prefer small and focused tests instead of large tests that try to do everything.
  • Make sure that tests are repeatable. If a test fails once, then it should continue to fail until the code is fixed.
  • Once you test something, you don’t need to keep testing the same thing. And if you have some useful code that other tests can share, then consider putting the code in its own project with its own set of tests. Only test the code that is in your project.
  • Test what should happen instead of how it should happen. In other words, focus less on the internal steps and instead verify the results you are most interested in.

There are many ways to write better tests. This chapter should not be considered to include the only things you need to consider. Instead, this chapter identifies some common issues that cause problems with tests and gives you tips and advice to improve your tests. In the next chapter, we’re going to use TDD to create something that will use a unit test library just like any other project you’ll be working on.
