Chapter 13. Testing: Early, Often, and Automated

Defects destroy the trust required for effective software development. The customers need to be able to trust the software. The managers need to be able to trust reports of progress. The programmers need to be able to trust each other. Defects destroy this trust. Without trust, people spend much of their time defending themselves against the possibility that someone else may have made a mistake.

It is impossible to eliminate all defects, however. Increasing the mean time between failures from one month to one year is expensive, and the cost of increasing it to one century, as is required for code like that flying in the space shuttle, is astronomical.

Here is the dilemma in software development: defects are expensive, but so is eliminating them. When defects do occur they carry both the direct costs of fixing them and the indirect costs of damaged relationships, lost business, and lost development time; most defects end up costing more than it would have cost to prevent them. The XP practices are aimed at communicating clearly so defects don't arise in the first place and, when they do, at making sure the team uses them to learn how to avoid similar problems in the future.

There will always be defects. Unexpected circumstances will arise. In novel, unanticipated situations, the software is likely to do something its author would not have intended had the situation been foreseen.

Acceptable defect levels vary. One goal of development is to reduce the occurrence of defects to an economically sustainable level. This level is different for different kinds of software. The world's largest web site may have a hundred software errors a second and remain economically viable because 99.99% of the pages appear correctly. Any given user experiences the web site as being reliable. The space shuttle, on the other hand, might be limited to one software-related failure per century to remain viable.

Another goal of development is to reduce the occurrence of defects to a level where trust can reasonably grow on the team. Investment in defect reduction makes sense as an investment in teamwork. Mistakes introduced by one programmer make it harder for everyone else to do their work. Every mistake by one team member that affects another costs the team time, energy, and trust. Good work and good teamwork build morale and confidence. If you can respect and trust your colleagues, you can be more productive and enjoy your work more. Hiding errors to protect yourself, while sometimes seemingly necessary, is a tremendous waste of time and energy. Trust energizes participants. We feel good when things work smoothly. We need to be safe to experiment and make mistakes. We need testing to bring accountability to our experimentation so that we can be sure we are doing no harm.

Until recently, most teams have chosen to live with defects. Having essentially defect-free code, code in which you have a defect a month or a defect a year, has been considered impossible. Even cutting defect rates in half seemed to be too expensive, both in dollars and in schedule time. Many of the social practices in XP, like pair programming, tend to reduce defects. Testing in XP is a technical activity that directly addresses defects. XP applies two principles to increase the cost-effectiveness of testing: double-checking and the Defect Cost Increase.

If you add a column of numbers one way, there are many errors that could cause your sum to be wrong. Add the numbers two different ways, say from the top and then from the bottom, and if you get the same answer both ways it is very likely to be the right answer. Your chances of making two precisely offsetting errors are small.

Software testing is double-checking. You say what you want a computation to do once when you write a test. You say it quite differently when you implement the computation. If the two expressions of the computation match, the code and the tests are in harmony and likely to be correct.
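
A minimal sketch of that double-checking, using a hypothetical order_total function and made-up figures: the test states the expectation one way, the implementation states it quite differently, and the two either agree or they don't.

```python
import unittest

def order_total(prices, tax_rate):
    # One expression of the computation: the implementation.
    return round(sum(prices) * (1 + tax_rate), 2)

class OrderTotalTest(unittest.TestCase):
    def test_total_with_tax(self):
        # A second, independent expression of the same computation:
        # 10.00 + 5.00 is 15.00; with 8% tax that comes to 16.20,
        # worked out by hand rather than copied from the code.
        self.assertAlmostEqual(order_total([10.00, 5.00], 0.08), 16.20, places=2)

if __name__ == "__main__":
    unittest.main()
```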

Defect Cost Increase (DCI) is the second principle applied in XP to increase the cost-effectiveness of testing. DCI is one of the few empirically verified truths about software development: the sooner you find a defect, the cheaper it is to fix. If you find a defect after a decade of deployment you'll have to reconstruct a lot of history and context to figure out what the code was supposed to do in the first place, which of those assumptions are in error, and what should be fixed so the rest of the (presumably correct) program remains undisturbed. Catch the same defect the minute it is created and the cost to fix it will be minimal.

DCI implies that software development styles with long feedback loops (Figure 14) will be expensive and have many residual defects. The budget for finding and fixing defects is limited. The more finding and fixing defects costs, the more defects will remain in deployed code.

Figure 14. Late, expensive testing leaves many defects

XP uses DCI in reverse to reduce both the cost of fixing defects and the number of deployed defects. By bringing automated testing into the inner loop of programming (Figure 15), XP attempts to fix defects sooner and more cheaply. This gives XP teams a chance to inexpensively develop software with very few defects by the standards of their contemporaries.

Figure 15. Frequent testing reduces costs and defects

There are several implications to more frequent testing. One is that the same people who make the mistakes have to write the tests. If the time interval between creating and detecting a defect is months, it makes perfect sense for those two activities to belong to different people. If the gap is minutes, the cost of communicating expectations between two people would be prohibitive, even with a tester dedicated to each programmer. At some point the cost of coordination overwhelms the value gained by further shortening the gap unless you have programmers write tests.

If programmers write tests, there may still be the need for another perspective on the system. A programmer, or even a pair, brings a single point of view on how the system works to both the code and the tests, losing some of the value of double-checking. Double-checking works best when two distinct thought processes arrive at the same answer. That's why it is dangerous to run a calculation and paste its output into the test as the expected value: you've only thought the problem through once. It's much better to work the example out by hand to get a second perspective.
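
A hypothetical sketch of why the hand calculation matters: with a buggy discounted_price, an expected value pasted back from the code's own output agrees with the bug, while a value worked out by hand exposes it.

```python
import unittest

def discounted_price(price, discount):
    # Hypothetical buggy implementation: it adds the discount
    # instead of subtracting it.
    return price * (1 + discount)

class DiscountTest(unittest.TestCase):
    def test_expected_value_copied_from_the_code(self):
        # Weak double-check: 250.0 came from running discounted_price
        # and pasting its output back in, so the test reproduces the
        # bug faithfully and passes.
        self.assertEqual(discounted_price(200, 0.25), 250.0)

    def test_expected_value_worked_out_by_hand(self):
        # Real double-check: 25% off 200 should be 150.
        # This test fails and exposes the bug.
        self.assertEqual(discounted_price(200, 0.25), 150.0)

if __name__ == "__main__":
    unittest.main()
```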

To gain the full benefits of double-checking, in XP there are two sets of tests: one set is written from the perspective of the programmers, testing the system's components exhaustively, and another set is written from the perspective of customers or users, testing the operation of the system as a whole. These tests double-check each other. If the programmers' tests were perfect, the customer tests wouldn't catch any errors.
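
A sketch of the two perspectives, using a hypothetical library-fine example (the functions, loan period, and fee schedule are all invented for illustration):

```python
import unittest

def late_fee(days_overdue):
    # Component: 50 cents per overdue day, capped at $10.
    return min(days_overdue * 0.50, 10.00)

def return_book(checkout_day, return_day):
    # Whole operation: 14-day loan period, then the fee schedule applies.
    overdue = max(0, (return_day - checkout_day) - 14)
    return late_fee(overdue)

class ProgrammerTests(unittest.TestCase):
    """Programmers' perspective: the components, exhaustively."""
    def test_no_fee_when_not_overdue(self):
        self.assertEqual(late_fee(0), 0.0)

    def test_fee_is_capped(self):
        self.assertEqual(late_fee(100), 10.00)

class CustomerTests(unittest.TestCase):
    """Customers' perspective: the operation of the system as a whole."""
    def test_returning_five_days_late_costs_two_fifty(self):
        # Checked out on day 1, returned on day 20: 5 days overdue,
        # so the fee should be $2.50 (worked out by hand).
        self.assertEqual(return_book(1, 20), 2.50)

if __name__ == "__main__":
    unittest.main()
```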

The immediacy of testing in XP also implies that tests must be automated. I've read involved debates about automated versus manual testing. In XP, there is no contest. Over time, by improving the design and customizing the development tools, the team reduces the cost of automating tests to the point that all testing is automated. Automated tests break the stress cycle (Figure 16).

Figure 16. The stress cycle

With manual testing, the more stressed the team, the more mistakes the team members make in both coding and testing. With automated testing, running the tests is itself a stress-reliever. The more stressed the team, the more tests it runs. The tests also reduce the number of errors that escape detection by the programmers.

Beta testing is a symptom of weak testing practices and poor communication with customers. However, during the transition to earlier and more frequent testing, it is wise to leave current testing practices in place. The team's goal is to eliminate all post-development testing and shift testing resources to more highly leveraged parts of the development lifecycle. If there are forms of testing, like stress and load testing, that find defects after development is “complete,” bring them into the development cycle. Run load and stress tests continuously and automatically.
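
A minimal sketch of what pulling load testing into the development cycle might look like: a hypothetical process_batch function exercised at a realistic volume inside the ordinary automated suite, with an assumed two-second budget.

```python
import time
import unittest

def process_batch(records):
    # Hypothetical stand-in for real batch processing.
    return [r.upper() for r in records]

class LoadTest(unittest.TestCase):
    def test_typical_daily_volume_stays_within_budget(self):
        records = ["record-%d" % i for i in range(100_000)]
        start = time.perf_counter()
        process_batch(records)
        elapsed = time.perf_counter() - start
        # The 2-second budget is an assumption; the point is that the
        # check runs automatically on every build, not after "completion."
        self.assertLess(elapsed, 2.0,
                        "batch took %.2fs, over the 2s budget" % elapsed)

if __name__ == "__main__":
    unittest.main()
```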

Static verification is a valid form of double-checking, particularly for defects that are hard to reproduce dynamically. For static checking to be most valuable it must become faster, part of the inner loop of development. Static checkers should provide feedback in seconds based on changes to a program, much as incremental compilers do now. Even in its current state, where statically verifying a substantial program can take days, it can still be valuable for providing confidence in the concurrency properties of a program. Like other tests, write static verification statements a little at a time, as the program demonstrates the need for double-checking.
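
One everyday, lightweight form of static double-checking is type annotations verified by a checker such as mypy; the functions below are hypothetical, and richer properties, like the concurrency guarantees mentioned above, call for more specialized tools.

```python
# Requires Python 3.9+ for the built-in generic list[float].

def average_latency_ms(samples: list[float]) -> float:
    """Mean of latency samples, in milliseconds."""
    return sum(samples) / len(samples)

def report(latency: float) -> str:
    return f"average latency: {latency:.1f} ms"

print(report(average_latency_ms([12.0, 15.5, 9.5])))
# If the types were mixed up (say, report([12.0, 15.5, 9.5])), a static
# checker would reject the program in seconds, before any test ran.
```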

DCI tells us to put testing near coding, but it doesn't say exactly when to test. Testing after implementation seems like the natural order. After all, you can't check the tolerances of a physical part you haven't yet made. This is where the physical metaphor behind the word “testing” is misleading. Because software is a virtual world, “testing” before or after makes equal sense. You can write code to fit a mold or a mold to fit code. You can do whichever creates the most benefit. In the end you put the two together and see if they match (Figure 17).

Figure 17. Code and test in either order

Code and tests can be written in either order. In XP, when possible, tests are written in advance of implementation. There are several advantages to writing the tests first. Folk wisdom in software development teaches that interfaces shouldn't be unduly influenced by implementations; writing a test first is a concrete way to achieve this separation. Tests also serve the human need for certainty. When I have written all the tests I can imagine that could possibly break and they all pass, I'm as certain as I can be that my code is correct. Other tests I can't yet imagine might break, but at least I can point to what the system actually does, as demonstrated by the tests.

Tests can provide a measure of progress. If I have ten tests broken and I fix one, then I've made progress. That said, I try to have only one broken test at a time. If I am programming test-first, I write one failing test, make it work, and then write the next failing test. My experience with getting the first test to work often informs my writing of the second. If I write tests based on unvalidated assumptions, I have to modify them all when the assumptions turn out to be wrong. System-level tests give me a sense of certainty that the whole system is working at the end of the week.
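
A sketch of that rhythm with a hypothetical fizz function, reading the test file in the order the work happened:

```python
import unittest

class FizzTest(unittest.TestCase):
    # Step 1: written first, and failing, because fizz() did not exist yet.
    def test_multiples_of_three_are_fizz(self):
        self.assertEqual(fizz(3), "Fizz")

    # Step 3: only written after the first test passed; what step 2
    # taught about the code informed this next failing test.
    def test_other_numbers_come_back_as_strings(self):
        self.assertEqual(fizz(4), "4")

# Steps 2 and 4: just enough implementation to make the newest
# failing test pass.
def fizz(n):
    return "Fizz" if n % 3 == 0 else str(n)

if __name__ == "__main__":
    unittest.main()
```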

In XP, testing is as important as programming. Writing and running the tests gives the team a chance to do work it can be proud of. Running tests gives the team a valid basis for confidence as it moves quickly in unanticipated directions. Tests contribute value to development by strengthening the trust relationships within the team and with customers.
