The practices in this chapter describe a radical departure from the way most programmers have behaved during the last 70 years. They enforce a profound minute-by-minute and second-by-second set of ritualistic behaviors that most programmers initially consider absurd. Many programmers have therefore attempted to do Agile without these practices. They fail, however, because these practices are the very core of Agile. Without TDD, Refactoring, Simple Design, and yes, even Pair Programming, Agile becomes an ineffective, flaccid shell of what it was intended to be.
Test-Driven Development is a rich and complex topic that will require an entire book to cover properly. This chapter is merely an overview that focuses more on justification and motivation than on the deeper technical aspects of the practice. In particular, this chapter will not work through code in any depth.
Programmers are engaged in a unique profession. We produce huge documents of deeply technical and arcane symbology. Every symbol written into these documents must be correct; otherwise, truly terrible things can happen. One wrong symbol can result in the loss of fortunes and lives. What other profession is like that?
Accounting. Accountants produce huge documents of deeply technical and arcane symbology. Every symbol written into their documents must be correct lest fortunes, and possibly even lives, be lost. How do accountants ensure that every symbol is correct?
Accountants have a discipline that was invented 1000 years ago. It’s called double-entry bookkeeping.1 Every transaction they enter into their books is entered twice: once as a credit in one set of accounts, and again as a complementary debit in another set of accounts. These accounts eventually flow into a single document called the balance sheet, which subtracts the sum of liabilities and equities from the sum of assets. That difference must be zero. If it is not zero, then an error has been made.2
2. If you have studied accounting, your hair is likely now on fire. Yes, this was a gross simplification. On the other hand, had I described TDD in a single paragraph, all the programmers would set their hair on fire.
Accountants are taught, in the early days of their schooling, to enter the transactions one at a time and compute the balance each time. This allows them to catch errors quickly. They are taught to avoid entering a batch of transactions between balance checks, since then errors would be hard to find. This practice is so essential to the proper accounting of monies that it has become law in virtually all parts of the world.
Test-Driven Development is the corresponding practice for programmers. Every required behavior is entered twice: once as a test, and then again as production code that makes the test pass. The two entries are complementary, just as assets are complementary to liabilities and equities. When executed together, the two entries produce a zero result: Zero tests failed.
Programmers who learn TDD are taught to enter every behavior one at a time—once as a failing test, and then again as production code that passes the test. This allows them to catch errors quickly. They are taught to avoid writing a lot of production code and then adding a batch of tests, since errors would then be hard to find.
These two disciplines, double-entry bookkeeping and TDD, are equivalent. They both serve the same function: to prevent errors in critically important documents where every symbol must be correct. Despite how essential programming has become to our society, we have not yet imbued TDD with the force of law. But given that lives and fortunes have already been lost to the vagaries of poorly written software, can that law be far away?
TDD can be described with three simple rules.
1. Do not write any production code until you have first written a test that fails due to the lack of that code.
2. Do not write more of a test than is sufficient to fail—and failing to compile counts as a failure.
3. Do not write more production code than is sufficient to pass the currently failing test.
A programmer with more than a few months’ experience will likely consider these rules to be outlandish, if not downright stupid. They imply a cycle of programming that is perhaps five seconds long. The programmer begins by writing some test code for production code that does not yet exist. This test fails to compile almost immediately because it mentions elements of the production code that have not yet been written. The programmer must stop writing the test and start writing production code. But after only a few keystrokes, the test that failed to compile now compiles properly. This forces the programmer to return to the test and continue to add to it.
This oscillation between the test and the production code is just a few seconds long, and the programmers are trapped within this cycle. The programmers will never again be able to write an entire function, or even a simple if statement or while loop, without interrupting themselves by writing the complementary test code.
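The chapter makes its case without relying on code, but a brief sketch may make the rhythm of the Three Rules concrete. The following shows roughly where a few cycles might leave a programmer. The `Stack` class, its method names, and the `run_all_tests` helper are hypothetical choices for illustration; Python has no compile step, so a `NameError` or `AttributeError` at run time plays the role of "failing to compile."

```python
# Tests: each one is assumed to have been written first and seen to fail
# before the production code below was extended just enough to pass it.

def test_new_stack_is_empty():
    # Cycle 1: fails with NameError until a Stack class exists.
    assert Stack().is_empty()


def test_stack_is_not_empty_after_push():
    # Cycle 2: fails until push() exists and is_empty() reflects real state.
    stack = Stack()
    stack.push(42)
    assert not stack.is_empty()


def test_pop_returns_values_in_reverse_order():
    # Cycle 3: fails until pop() exists and pushed items are actually stored.
    stack = Stack()
    stack.push(1)
    stack.push(2)
    assert stack.pop() == 2
    assert stack.pop() == 1
    assert stack.is_empty()


# Production code: grown only as far as the tests above demanded.

class Stack:
    def __init__(self):
        self._items = []

    def is_empty(self):
        return not self._items

    def push(self, item):
        self._items.append(item)

    def pop(self):
        return self._items.pop()


def run_all_tests():
    """Run every test; any failure raises. Returns the number of tests run."""
    tests = [
        test_new_stack_is_empty,
        test_stack_is_not_empty_after_push,
        test_pop_returns_values_in_reverse_order,
    ]
    for test in tests:
        test()
    return len(tests)
```

Notice that nothing in `Stack` exists without a test that once failed for lack of it, which is the property the rules are meant to guarantee.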
Most programmers initially view this as a disruption of their thought processes. This continual interruption imposed by the Three Rules prevents them from properly thinking through the code they are writing. They often feel that the Three Rules create an intolerable distraction.
However, imagine a group of programmers following these Three Rules. Choose any one of those programmers you like, at any time. Everything that programmer was working on executed and passed all its tests less than a minute ago. It doesn’t matter whom you choose or when you choose them—everything worked less than a minute ago.
What would it be like if everything always worked a minute ago? How much debugging would you have to do? If everything worked a minute ago, then almost any failure you encounter will be less than a minute old. Debugging a failure that was added in the last minute is often trivial. Indeed, using a debugger to find the problem is probably overkill.
Are you skilled at operating the debugger? Do you have the debugger’s hot keys memorized? Does your muscle memory automatically know how to hit those keys to set breakpoints, single-step, step-into, and step-over? When you are debugging, do you feel like you are in your element? This is not a skill to be desired.
The only way you get good at using a debugger is by spending a lot of time debugging. Spending a lot of time debugging implies that there are always a lot of bugs. Test-Driven Developers are not skilled at operating the debugger because they simply don’t use a debugger that often; and when they do, it is typically for a very brief amount of time.
Now I don’t want to create a false impression. Even the best Test-Driven Developer still encounters difficult bugs. This is still software; it’s still hard. But the incidence and severity of bugs are vastly reduced by practicing the Three Rules.
Have you ever integrated a third-party package? It likely came in a zip file that contained some source code, DLLs, JAR files, etc. One of the files in that archive was likely a PDF that contained the instructions for integration. At the end of the PDF, there was probably an ugly appendix with all the code examples.
What was the first thing you read in that document? If you are a programmer, you skipped right to the back and read the code examples because the code will tell you the truth.
When you follow the Three Rules, the tests you end up writing become the code examples for the whole system. If you want to know how to call an API function, there are tests that call that function every way it can be called, catching every exception it can throw. If you want to know how to create an object, there are tests that create that object every way it can be created.
The tests are a form of documentation that describes the system being tested. This documentation is written in a language that the programmers know fluently. It is utterly unambiguous, it is so formal it executes, and it cannot get out of sync with the application code. The tests are the perfect kind of documentation for programmers: code.
What’s more, the tests do not form a system in and of themselves. The tests don’t know about each other. There are no dependencies between the tests. Each test is a small and independent unit of code that describes the way one small part of the system behaves.
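As a small illustration of tests acting as code examples, consider the sketch below. The function `parse_percentage` and its behavior are invented for this example; the point is only that each test name and body shows one way the function can be called, or one way it can fail, and that no test depends on any other.

```python
def parse_percentage(text):
    """Hypothetical production function: turn '50%' or '50' into 0.5."""
    value = float(text.strip().rstrip("%"))  # float() raises ValueError on junk
    if not 0 <= value <= 100:
        raise ValueError(f"percentage out of range: {text!r}")
    return value / 100


def raises_value_error(text):
    """Test helper: True if parse_percentage rejects the input."""
    try:
        parse_percentage(text)
    except ValueError:
        return True
    return False


# Each test is independent and documents exactly one behavior.

def test_accepts_trailing_percent_sign():
    assert parse_percentage("50%") == 0.5


def test_accepts_bare_number():
    assert parse_percentage("25") == 0.25


def test_ignores_surrounding_whitespace():
    assert parse_percentage("  100% ") == 1.0


def test_rejects_values_over_one_hundred():
    assert raises_value_error("150%")


def test_rejects_non_numeric_text():
    assert raises_value_error("abc")
```

A reader who wants to know whether the percent sign is required, or what happens with bad input, can find the answer in the test names alone.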
If you have ever written tests after the fact, you know that it’s not a lot of fun. It’s not fun because you already know the code works. You’ve tested it manually. You are likely writing those tests because someone told you that you had to. It feels like busy work. It’s boring.
When you write the tests first according to the Three Rules, it’s fun. Every new test is a challenge. Every time you make a test pass, it’s a small success. Your work, as you follow the Three Rules, is a chain of those small challenges and successes. It doesn’t feel like busy work—it feels like getting stuff working.
Now let’s return to after-the-fact tests. You somehow feel obligated to write these tests even though you’ve tested the system manually, and you already know it works. You proceed from test to test being unsurprised by the fact that the tests pass.
Inevitably, you will come to a test that’s hard to write. It’s hard to write because when you wrote the code you weren’t thinking about testability, and you did not design it to be testable. To write a test for this code, you are going to have to change the structure by breaking some couplings, adding some abstractions, and/or rerouting some function calls and arguments. This feels like a lot of effort, especially because you already know the code works.
The schedule is tight, and you know you have more pressing things to do. So, you set that test aside. You convince yourself either that it is unnecessary or that you’ll go back and write it later. Thus, you leave a hole in the test suite.
And since you have left holes in the test suite, you suspect everyone else has, too. When you execute the test suite and see it pass, you laugh, smirk, or derisively wave your hand because you know that the passing of the suite doesn’t mean that the system works.
When such a test suite passes, there is no decision you can make. The only information that the passing tests give you is that nothing tested is broken. The incompleteness of the test suite leaves you with no options. However, if you follow the Three Rules, then every line of production code was written in order to make a test pass. Therefore, the test suite is very complete. When it passes, you can make a decision. That decision is to deploy.
That’s the goal. We want to create a suite of automated tests that tells us that it is safe to deploy the system.
Now again, I don’t want to paint a false picture. Following the Three Rules will give you a very complete test suite, but it is probably not 100% complete. This is because there are situations in which following the Three Rules is not practical. These situations are outside the scope of this book except to say that they are limited, and there are solutions that mitigate them. The result is that even the most diligent adherents to the Three Rules are not likely to produce a test suite that is 100% complete.
But 100% completeness is not necessary for the deployment decision. Coverage in the high 90s is likely all that is required—and that kind of completeness is eminently achievable.
I have created test suites that are so complete that they allow the deployment decision to be made. I have seen many others do so as well. In each of those cases, the completeness was less than 100%, but it was high enough to make the deployment decision.
Test coverage is a team metric, not a management metric. Managers are unlikely to know what the metric actually means. Managers should not use this metric as a goal or a target. The team should use it solely to inform their testing strategy.
Do not fail the build based on insufficient coverage. If you do this, then the programmers will be forced to remove enough assertions from their tests in order to get the coverage numbers high enough. Code coverage is a complex topic that can only be understood in the context of a deep knowledge of the code and tests. Don’t let it become a management metric.
Remember that function that’s hard to test after the fact? It may be hard to test because it is coupled to behaviors that you’d rather not execute in the test. For example, it might turn on the x-ray machine or delete rows out of the database. The function is hard to test because you did not design it to be easy to test. You wrote the code first, and you are now writing the tests as an afterthought. Designing for testability was the furthest thing from your mind when you wrote the code.
Now you are faced with redesigning the code in order to test it. You look at your watch and realize that this whole testing thing has taken too long already. Since you’ve already tested it manually, and you know it works, you walk away, leaving yet another hole in the test suite.
However, when you write the test first, something very different happens. You cannot write a function that is hard to test. Since you are writing the test first, you will naturally design the function you are testing to be easy to test. How do you keep functions easy to test? You decouple them. Indeed, testability is just a synonym for decoupling.
By writing the tests first, you will decouple the system in ways that you had never thought about before. The whole system will be testable; therefore, the whole system will be decoupled.
It is for this reason that TDD is often called a design technique. The Three Rules force you into a much higher degree of decoupling.
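To make the link between testability and decoupling concrete, here is a minimal sketch built on the x-ray example. All names (`take_image`, `FakeXRayMachine`, `expose`) are hypothetical. Because the test is written first, the function cannot reach for real hardware; it has to accept the machine as an argument, and that argument is the decoupling.

```python
class FakeXRayMachine:
    """Hypothetical test double: records requests instead of emitting x-rays."""

    def __init__(self):
        self.exposures = []

    def expose(self, seconds):
        self.exposures.append(seconds)


def take_image(machine, seconds):
    """Production code: depends only on 'something with an expose() method'."""
    if seconds <= 0:
        raise ValueError("exposure time must be positive")
    machine.expose(seconds)


def test_take_image_exposes_for_requested_time():
    machine = FakeXRayMachine()
    take_image(machine, 2)
    assert machine.exposures == [2]


def test_take_image_refuses_non_positive_exposure():
    machine = FakeXRayMachine()
    try:
        take_image(machine, 0)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError")
    # The machine must never have been asked to fire.
    assert machine.exposures == []
```

In production, the same `take_image` would presumably be handed an object that drives the real machine; the function itself neither knows nor cares which one it receives.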
So far, we’ve seen that following the Three Rules provides a number of powerful benefits: less debugging, good low-level documentation, fun, completeness, and decoupling. But these are just ancillary benefits; none of these is the driving reason for practicing TDD. The real reason is courage.
I told you the following story at the beginning of the book, but it bears repeating here.
Imagine that you are looking at some old code on your screen, and it’s a mess. You think to yourself, “I should clean this up.” But your next thought is, “I’m not touching it!” Because you know if you touch it, you will break it; and if you break it, it becomes yours. So you back away from the code, leaving the mess to fester and rot.
This is a fear reaction. You fear the code. You fear touching it. You fear what will happen to you if you break it. So, you fail to do the one thing that could improve the code—you fail to clean it.
If everyone on the team behaves this way, then the code must rot. No one will clean it. No one will improve it. Every feature added will be added in such a way as to minimize the immediate risk to the programmers. Couplings and duplications will be added because they minimize the immediate risk, even though they corrupt the design and integrity of the code.
Eventually the code will become such a horrible mass of unmaintainable spaghetti that little to no progress can be made with it. Estimates will grow exponentially. Managers will become desperate. More and more programmers will be hired in the hopes of increasing productivity, but that increase will not be realized.
Finally, in utter desperation, the managers will agree to the programmers’ demands that the whole system should be rewritten from scratch, and the cycle will begin again.
Imagine a different scenario. Go back to that screen with the messy code. Your first thought was to clean it. What if you had a test suite that was so complete that you trusted it when it passed? What if that test suite ran quickly? What would your next thought be? It would be something like this:
Gosh, I think I’ll change the name of that variable. Ah, the tests still pass. OK, now I’ll split that big function into two smaller functions… Good, the tests still pass…. OK, now I think I can move one of those new functions over into a different class. Whoops! The tests failed. Put it back… Ah, I see, I needed to move that variable as well. Yes, the tests still pass…
When you have a complete test suite, you lose your fear of changing the code. You lose your fear of cleaning the code. So, you will clean the code. You will keep the system neat and orderly. You will keep the design of the system intact. You will not create the festering mass of spaghetti that would drag the team into the doldrums of low productivity and eventual failure.
That is why we practice TDD. We practice it because it gives us the courage to keep the code clean and orderly. It gives us the courage to act like professionals.
Refactoring is another one of those topics that requires a book to describe. Fortunately, Martin Fowler has done a superb job with just such a book.3 In this chapter I’ll simply discuss the discipline, not the specific techniques. Again, this chapter does not work through code in any depth.
3. Fowler, M. 2019. Refactoring: Improving the Design of Existing Code, 2nd ed. Boston, MA: Addison-Wesley.
Refactoring is the practice of improving the structure of the code without altering the behavior, as defined by the tests. In other words, we make changes to the names, the classes, the functions, and the expressions without breaking any of the tests. We improve the structure of the system, without affecting the behavior.
Of course, this practice couples strongly with TDD. To fearlessly refactor the code, we need a test suite that gives us very high confidence that we aren’t breaking anything.
The kinds of changes made during refactoring range from trivial cosmetics to deep restructurings. The changes might be simple name changes or complex reshufflings of switch statements to polymorphic dispatches. Large functions will be split into smaller, better-named, functions. Argument lists will be changed into objects. Classes with many methods will be split into many smaller classes. Functions will be moved from one class to another. Classes will be extracted into subclasses or inner classes. Dependencies will be inverted, and modules will be moved across architectural boundaries.
And while all this is taking place, we keep the tests continuously passing.
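A small sketch of what "improving the structure without altering the behavior" can look like follows. The pricing rule, the names, and the use of integer cents are assumptions made for illustration. The same checks are run against the code before and after the refactoring, and they pass in both cases.

```python
# Before: it works, but the intent is buried in a terse name and magic numbers.
def price_before(quantity, unit_cents):
    t = quantity * unit_cents
    if quantity > 10:
        t = t - t // 10
    return t


# After: same behavior, with the names and constants revealing the intent.
BULK_THRESHOLD = 10          # orders above this quantity earn the discount
BULK_DISCOUNT_DIVISOR = 10   # one tenth off, i.e. a 10% discount


def qualifies_for_bulk_discount(quantity):
    return quantity > BULK_THRESHOLD


def bulk_discount(subtotal_cents):
    return subtotal_cents // BULK_DISCOUNT_DIVISOR


def price_after(quantity, unit_cents):
    subtotal = quantity * unit_cents
    if qualifies_for_bulk_discount(quantity):
        return subtotal - bulk_discount(subtotal)
    return subtotal


def check_pricing(price):
    """The tests that define the behavior; they do not care about structure."""
    assert price(1, 500) == 500
    assert price(10, 500) == 5000    # exactly at the threshold: no discount
    assert price(11, 500) == 4950    # 5500 minus 550
    assert price(20, 500) == 9000    # 10000 minus 1000
    return True
```

In practice each small step (rename, extract constant, extract function) would be followed by a test run, so a failure points at the step just taken.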
The process of refactoring is woven intrinsically into the Three Rules of TDD in what is known as the Red/Green/Refactor cycle (Figure 5.1).
1. First, we create a test that fails.
2. Then we make the test pass.
3. Then we clean up the code.
4. Return to step 1.
The idea here is that writing code that works and writing code that is clean are two separate dimensions of programming. Attempting to control both dimensions at the same time is difficult at best, so we separate the two dimensions into two different activities.
To say this differently, it is hard enough to get code working, let alone getting the code to be clean. So we first focus on getting the code working by whatever messy means occur to our meager minds. Then, once working, with tests passing, we clean up the mess we made.
This makes it clear that refactoring is a continuous process, and not one that is performed on a scheduled basis. We don’t make a huge mess for days and days, and then try to clean it up. Rather, we make a very small mess, over a period of a minute or two, and then we clean up that small mess.
The word Refactoring should never appear on a schedule. Refactoring is not the kind of activity that appears on a plan. We do not reserve time for refactoring. Refactoring is simply part of our minute-by-minute, hour-by-hour approach to writing software.
Sometimes the requirements change in such a way that you realize the current design and architecture of the system is suboptimal, and you need to make a significant change to the structure of the system. Such changes are made within the Red/Green/Refactor cycle. We do not create a project specifically to change the design. We do not reserve time in the schedule for such large refactorings. Instead, we migrate the code one small step at a time, while continuing to add new features during the normal Agile cycle.
Such a change to the design may take place over several days, weeks, or even months. During that time, the system continues to pass all its tests and may be deployed to production, even though the design transition is only partially complete.
The practice of Simple Design is one of the goals of Refactoring. Simple Design is the practice of writing only the code that is required with a structure that keeps it simplest, smallest, and most expressive.
Kent Beck’s rules of Simple Design are as follows:
1. Pass all the tests.
2. Reveal the intent.
3. Remove duplication.
4. Decrease elements.
The numbers are both the order in which these are executed and the priority they are given.
Point 1 is self-evident. The code must pass all the tests. The code must work.
Point 2 says that after the code is made to work, it should then be made expressive. It should reveal the intention of the programmer. It should be easy to read and self-descriptive. This is where we apply many of the simpler and more cosmetic refactorings. We also split large functions into smaller, better-named functions.
Point 3 says that after we have made the code as descriptive and expressive as possible, we hunt for and remove any duplication within that code. We don’t want the code to say the same thing more than once. During this activity, the refactorings are usually more complicated. Sometimes removing duplication is as simple as moving the duplicate code into a function and calling it from many places. Other times it requires more interesting solutions, such as the Design Patterns:4 Template Method, Strategy, Decorator, or Visitor.
4. Design Patterns are beyond the scope of this book. See Gamma, E., R. Helm, R. Johnson, and J. Vlissides. 1995. Design Patterns: Elements of Reusable Object-Oriented Software. Reading, MA: Addison-Wesley.
Point 4 says that once we have removed all the duplication, we should strive to decrease the number of structural elements, such as classes, functions, variables, etc.
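The simplest case mentioned under Point 3, moving duplicate code into a function and calling it from several places, might look like the following sketch. The function names and the name-formatting rule are hypothetical.

```python
# Before: the same name-formatting logic is stated twice.
def greeting_before(first, last):
    return "Hello, " + first.strip().title() + " " + last.strip().title()


def label_before(first, last):
    return first.strip().title() + " " + last.strip().title() + " (member)"


# After: the rule is stated once and called from both places.
def full_name(first, last):
    return f"{first.strip().title()} {last.strip().title()}"


def greeting(first, last):
    return f"Hello, {full_name(first, last)}"


def label(first, last):
    return f"{full_name(first, last)} (member)"
```

If the formatting rule changes later, there is now one place to change it and one place for a test to pin it down.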
The goal of Simple Design is to keep the design weight of the code as small as practicable.
The designs of a software system range from quite simple to extraordinarily complex. The more complex the design, the greater the cognitive load placed on the programmers. That cognitive load is Design Weight. The greater the weight of that design, the more time and effort are required for the programmers to understand and manipulate the system.
Similarly, the complexity of the requirements also ranges from very small to very great. The greater the complexity of the requirements, the more time and effort are required to understand and manipulate the system.
However, the two factors are not additive. Complex requirements can be simplified by employing a more complex design. Often the trade-off is favorable. The overall complexity of the system can be reduced by choosing the appropriate design for an existing set of features.
Achieving this balance of design and feature complexity is the goal of Simple Design. Using this practice, programmers continuously refactor the design of the system to keep it in balance with the requirements and, therefore, keep productivity maximized.
The practice of Pair Programming has enjoyed a substantial amount of controversy and disinformation over the years. Many folks react negatively to the idea that two (or more) people can work productively together on the same problem.
First of all, pairing is optional. No one should be forced to pair. Secondly, pairing is intermittent. There are many good reasons to code alone from time to time. The amount of pairing a team should engage in is 50% or so. That number is not critical. It could be as low as 30% or as much as 80%. For the most part, this is an individual and team choice.
Pairing is the act of two people working together on a single programming problem. The pair may work together at the same workstation, sharing the screen, keyboard, and mouse. Or they may work on two connected workstations so long as they see and manipulate the same code. The latter option works nicely with popular screen-sharing software. That software also allows the partners to be remote from each other, so long as they have a good data and voice link.
Pairing programmers sometimes adopt different roles. One may be the driver and the other the navigator. The driver has the keyboard and mouse; the navigator takes a longer view and makes recommendations. Another role option is for one programmer to write a test, and the other to make it pass and write the next test for the first programmer to pass. This is sometimes called Ping-Pong.
Most often, however, there are no roles at all. The programmers are simply co-equal authors sharing the mouse and keyboard in a cooperative manner.
Pairs are not scheduled. They form and break up according to the programmers’ preference. Managers should not try to create pairing schedules or pairing matrices.
Pairs are generally short-lived. A pairing session can last as long as a day, but more often sessions last no more than an hour or two. Even pairings as short as 15 to 30 minutes can be beneficial.
Stories are not assigned to pairs. Individual programmers, and not pairs, are responsible for completing stories. The duration of a story is generally much longer than a single pairing.
Over the course of a week, each programmer will spend about half of their pairing time on their own tasks, recruiting the help of several others. The other half of their pairing time will be spent helping others with their tasks.
Seniors should take care to pair with juniors more often than they pair with other seniors. Juniors should request the help of seniors more often than they request the help of other juniors. Programmers with specialties should spend significant amounts of their pairing time working with programmers outside of their specialty. The goal is to spread and exchange knowledge, not concentrate it.
We pair so that we behave like a team. The members of a team do not work in isolation from each other. Instead, they collaborate on a second-by-second basis. When a member of a team goes down, the other team members cover the hole left by that member and keep making progress towards the goal.
Pairing is the best way, by far, to share knowledge between team members and prevent knowledge silos from forming. It is the best way to make sure that nobody on the team is indispensable.
Many teams have reported that pairing reduces errors and improves design quality. This is likely true in most cases. It is generally better to have more than one set of eyes on any given problem. Indeed, many teams have replaced code reviews with pairing.
Pairing is a form of code review, but with a significant advantage. The pairing programmers are co-authors during the time they are pairing. They see older code and review it as a matter of course, but with the intention of authoring new code. Thus, the review is not simply a static check to ensure that the team’s coding standards are applied. Rather, it is a dynamic review of the current state of the code with an eye to where the code needs to be in the near future.
It is hard to measure the cost of pairing. The most direct cost is that there are two people working on a single problem. It should be obvious that this does not double the effort to solve the problem; however, it does likely cost something. Various studies have indicated that the direct cost might be about 15%. In other words, it would require 115 pairing programmers to do the same work as 100 individuals (without code reviews).
A naive calculation would suggest that a team that pairs 50% of the time would suffer a productivity loss of something less than 8%. On the other hand, if the practice of pairing replaces code reviews, then there is likely no reduction in productivity at all.
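The arithmetic behind that estimate can be spelled out in a few lines. The two functions below are one possible reading of the naive calculation, not a model taken from the studies: the first simply applies the 15% overhead to the half of the time spent pairing; the second takes the "115 programmers for the work of 100" framing literally. Both land under 8%.

```python
PAIRING_OVERHEAD = 0.15   # direct cost of pairing cited in the text
PAIRING_FRACTION = 0.50   # share of time the team spends pairing


def naive_loss(overhead, fraction):
    """The overhead applies only to the paired share of the time."""
    return overhead * fraction


def headcount_loss(overhead, fraction):
    """Alternative reading: if 115 pairing programmers match 100 individuals,
    each pairing programmer delivers 1 / 1.15 of a solo programmer's output."""
    paired_output = 1 / (1 + overhead)
    return fraction * (1 - paired_output)


def summarize():
    """Return both estimates as percentages, rounded to one decimal place."""
    return (
        round(100 * naive_loss(PAIRING_OVERHEAD, PAIRING_FRACTION), 1),
        round(100 * headcount_loss(PAIRING_OVERHEAD, PAIRING_FRACTION), 1),
    )
```

Under these assumptions the first reading gives 7.5% and the second about 6.5%. Neither accounts for the savings from dropping separate code reviews.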
Then we must consider the benefits of the cross-training knowledge exchange and intense collaboration. These benefits are not easily quantifiable, but they are also likely significant.
My experience and the experience of many others is that pairing, when done informally and at the programmers’ discretion, is quite beneficial to the whole team.
The word “pair” implies that there are just two programmers involved in a pairing session. While this is typically true, it is not a rule. Sometimes three, four, or more will decide to work together on a given problem. (Again, this is at the programmers’ discretion.) This is sometimes known as “mob programming.”5,6
Programmers often fear that managers will frown upon pairing and might even demand that pairs break up and stop wasting time. I’ve never seen this happen. In the half-century that I’ve been writing code, I’ve never seen a manager interfere at such a low level. Generally, in my experience, managers are pleased to see programmers collaborating and working together. It creates the impression that work is being done.
If, however, you are a manager who is tempted to interfere with pairing because you fear it is inefficient, then set your fears aside and let the programmers figure this out. They are, after all, the experts. And if you are a programmer whose manager has told you to stop pairing, remind that manager that you are the expert, and that therefore you, and not the manager, must be in charge of the way you work.
Finally, never, ever, ever, ask for permission to pair. Or test. Or refactor. Or… You are the expert. You decide.
The technical practices of Agile are the most essential ingredient of any Agile effort. Any attempt to employ Agile practices without the technical practices is doomed to fail. The reason for this is simply that Agile is an efficient mechanism for making a big mess in a big hurry. Without the technical practices to keep the technical quality high, the productivity of the team will quickly falter and begin an inexorable death spiral.