1

Building the Case for TDD

Before we dive into what test-driven development (TDD) is and how to use it, we’re going to need to understand why we need it. Every seasoned developer knows that bad code is easier to write than good code. Even good code seems to get worse over time. Why?

In this chapter, we will review the technical failures that make source code difficult to work with. We’ll consider the effect that bad code has on both the team and the business bottom line. By the end of the chapter, we’ll have a clear picture of the anti-patterns we need to avoid in our code.

In this chapter, we’re going to cover the following main topics:

  • Writing code badly
  • Recognizing bad code
  • Decreasing team performance
  • Diminishing business outcomes

Writing code badly

As every developer knows, it seems a lot easier to write bad code than to engineer good code. We can define good code as being easy to understand and safe to change. Bad code is the opposite: it is very difficult to read and to understand what problem it is supposed to be solving. We fear changing bad code – we know that we are likely to break something.

My own troubles with bad code go all the way back to my first program of note. This was a program written for a school competition, which aimed to assist realtors to help their customers find the perfect house. Written on the 8-bit Research Machines 380Z computer at school, this was 1981’s answer to Rightmove.

In those pre-web days, it existed as a simple desktop application with a green-screen text-based user interface. It did not have to handle millions, never mind billions, of users. Nor did it have to handle millions of houses. It didn’t even have a nice user interface.

As a piece of code, it was a couple of thousand lines of Microsoft Disk BASIC 9 code. There was no code structure to speak of, just thousands of lines resplendent with uneven line numbers and festooned with global variables. To add an even greater element of challenge, BASIC limited every variable to a two-letter name. This made every name in the code utterly incomprehensible. The source code was intentionally written to have as few spaces in it as possible in order to save memory. When you only had 32KB of RAM to fit all of the program code, the data, and the operating system in, every byte mattered.

The program only offered its user basic features. The user interface was of its time, using only text-based forms. It predated graphical operating systems by a decade. The program also had to implement its own data storage system, using files on 5.25-inch floppy disks. Again, affordable database components were of the future. The main feature of the program in question was that users could search for houses within certain price ranges and feature sets. They could filter by terms such as the number of bedrooms or price range.

However, the code itself really was a mess. See for yourself – here is a photograph of the original listing:

Figure 1.1 – The estate agent code listing

This horror is the original paper listing of one of the development versions. It is, as you can see, completely unreadable. It’s not just you. Nobody would be able to read it easily. I can’t and I wrote it. I would go as far as to say it is a mess, my mess, crafted by me, one keystroke at a time.

This kind of code is a nightmare to work with. It fails our definition of good code. It is not at all easy to read that listing and understand what the code is supposed to be doing. It is not safe to change that code. If we attempted to, we would find that we could never be certain about whether we have broken some feature or not. We would also have to manually retest the entire application. This would be time-consuming.

Speaking of testing, I never thoroughly tested that code. It was all manually tested without even following a formal test plan. At best, I would have run a handful of happy path manual tests. These were the kind of tests that would confirm that you could add or delete a house, and that some representative searches worked, but that was all. There was no way I ever tested every path through that code. I just guessed that it would work.

If the data handling had failed, I would not have known what had happened. I never tried it. Did every possible search combination work? Who knew? I certainly had no idea. I had even less patience to go through all that tedious manual testing. It worked, enough to win an award of sorts, but it was still bad code.

Understanding why bad code is written

In my case, it was simply down to a lack of knowledge. I did not know how to write good code. But there are also other reasons unrelated to skill. Nobody ever sets out to write bad code intentionally. Developers do the best job they can with the tools available and to the best of their ability at that time.

Even with the right skills, several common issues can result in bad code:

  • A lack of time to refine the code due to project deadlines
  • Working with legacy code whose structure prevents new code from being added cleanly
  • Adding a short-term fix for an urgent production fault and then never reworking it
  • Unfamiliarity with the subject area of the code
  • Unfamiliarity with the local idioms and development styles
  • Inappropriately using idioms from a different programming language

Now that we’ve seen an example of code that is difficult to work with, and understood how it came about, let’s turn to the obvious next question: how can we recognize bad code?

Recognizing bad code

Admitting that our code is difficult to work with is one thing, but to move past that and write good code, we need to understand why code is bad. Let’s identify the technical issues.

Bad variable names

Good code is self-describing and safe to change. Bad code is not.

Names are the most critical factor in deciding whether code will be easy to work with or not. Good names tell the reader clearly what to expect. Bad names do not. Variables should be named according to what they contain. They should answer the questions “Why would I want to use this data? What will it tell me?”

A string variable that has been named string is badly named. All we know is that it is a string. This does not tell us what is in the variable or why we would want to use it. If that string represented a surname, then by simply calling it surname, we would have helped future readers of our code understand our intentions much better. They would be able to easily see that this variable holds a surname and should not be used for any other purpose.

The two-letter variable names we saw in the listing in Figure 1.1 represented a limitation of the BASIC language. It was not possible to do better at the time, but as we could see, they were not helpful. It is much harder to understand what sn means than surname, if that’s what the variable stores. To carry that even further, if we decide to hold a surname in a variable named x, we have made things really difficult for readers of our code. They now have two problems to solve:

  • They have to reverse-engineer the code to work out that x is used to hold a surname
  • They have to mentally map x with the concept of surname every time that they use it

It is so much easier when we use descriptive names for all our data, such as local variables, method parameters, and object fields. In terms of more general guidelines, the following Google style guide is a good source: https://google.github.io/styleguide/javaguide.html#s5-naming.

Best practice for naming variables

Describe the data contained, not the data type.
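As a minimal sketch of this guideline, the following compares badly named and well-named parameters for the same data. The CustomerGreeting class and both method names are hypothetical, invented purely for illustration:

```java
// Hypothetical example: the same logic with bad and good parameter names.
class CustomerGreeting {

    // Hard to read: 'string' only repeats the type, and 'x' says nothing at all.
    static String formatGreetingUnclear(String string, String x) {
        return "Dear " + string + " " + x;
    }

    // Easier to read: each name describes the data it contains.
    static String formatGreeting(String title, String surname) {
        return "Dear " + title + " " + surname;
    }
}
```

Both methods behave identically. Only the second tells the reader which argument is the title and which is the surname without reverse-engineering the body.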

We now have a better idea of how to go about naming variables. Now, let’s look at how to name functions, methods, and classes properly.

Bad function, method, and class names

The names of functions, methods, and classes all follow a similar pattern. In good code, function names tell us why we should call that function. They describe what they will do for us as users of that function. The focus is on the outcome – what will have happened by the time the function returns. We do not describe how that function is implemented. This is important. It allows us to change our implementation of that function later if that becomes advantageous, and the name will still describe the outcome clearly.

A function named calculateTotalPrice is clear about what it is going to do for us. It will calculate the total price. It won’t have any surprising side effects. It won’t try and do anything else. It will do what it says it will. If we abbreviate that name to ctp, then it becomes much less clear. If we call it func1, then it tells us absolutely nothing at all that is useful.
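To make this concrete, here is a small sketch. The Basket class, its itemPricesInCents field, and the choice of holding prices as whole cents are all assumptions made for illustration; only the three method names come from the discussion above:

```java
import java.util.List;

// Hypothetical class used only to contrast the three method names.
class Basket {
    private final List<Integer> itemPricesInCents;

    Basket(List<Integer> itemPricesInCents) {
        this.itemPricesInCents = itemPricesInCents;
    }

    // Outcome-focused name: callers know what they will get back.
    // The implementation could change later without the name becoming wrong.
    int calculateTotalPrice() {
        return itemPricesInCents.stream().mapToInt(price -> price).sum();
    }

    // Same behavior, but the abbreviation forces readers to open the method.
    int ctp() {
        return calculateTotalPrice();
    }

    // Same behavior again, with a name that tells us nothing useful.
    int func1() {
        return calculateTotalPrice();
    }
}
```

At a call site, basket.calculateTotalPrice() reads as a statement of intent, while basket.ctp() and basket.func1() send the reader off to look at the implementation.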

Bad names force us to reverse-engineer every decision made every time we read the code. We have to pore through the code to try and find out what it is used for. We should not have to do this. Names should be abstractions. A good name will speed up our ability to understand code by condensing a bigger-picture understanding into a few words.

You can think of the function name as a heading. The code inside the function is the body of text. It works just the same way that the text you’re reading now has a heading, Recognizing bad code, which gives us a general idea of the content in the paragraphs that follow. From reading the heading, we expect the paragraphs to be about recognizing bad code, nothing more and nothing less.

We want to be able to skim-read our software through its headings – the function, method, class, and variable names – so that we can focus on what we want to do now, rather than relearning what was done in the past.

Method names are treated identically to function names. Both describe an action to be taken, so the same rules apply to each.

Best practice for method and function names

Describe the outcome, not the implementation.

Again, class names follow descriptive rules. A class often represents a single concept, so its name should describe that concept. If a class represents the user profile data in our system, then a class name of UserProfile will help readers of our code to understand that.

A name’s length depends on namespacing

One further tip applies to all names with regard to their length. The name should be fully descriptive but its length depends on a few factors. We can choose shorter names when one of the following applies:

  • The named variable has a small scope of only a few lines
  • The class name itself provides the bulk of the description
  • The name exists within some other namespace, such as a class name

Let’s look at a code example for each case to make this clear.

The following code calculates the total of a list of values, using a short variable name, total:

int calculateTotal(List<Integer> values) {
    int total = 0;
    for (Integer v : values) {
        total += v;
    }
    return total;
}

This works well because it is clear that total represents the total of all values. We do not need a name that is any longer given the context around it in the code. Perhaps an even better example lies in the v loop variable. It has a one-line scope, and within that scope, it is quite clear that v represents the current value within the loop. We could use a longer name such as currentValue instead. However, does this add any clarity? Not really.

In the following method, we have a parameter with the short name gc:

private void draw(GraphicsContext gc) {
    // code using gc omitted
}

The reason we can choose such a short name is that the GraphicsContext class carries most of the description already. If this were a more general-purpose class, such as String, for example, then this short name technique would be unhelpful.

In this final code example, we are using the short method name of draw():

public class ProfileImage {
    public void draw(WebResponse wr) {
        // Code omitted
    }
}

The class name here is highly descriptive. The ProfileImage class name we’ve used in our system is one that is commonly used to describe the avatar or photograph that shows on a user’s profile page. The draw() method is responsible for writing the image data to a WebResponse object. We could choose a longer method name, such as drawProfileImage(), but that simply repeats information that has already been made clear given the name of the class. Details such as this are what give Java its reputation for being verbose, which I feel is unfair; it is often us Java programmers who are verbose, rather than Java itself.

We’ve seen how properly naming things makes our code easier to understand. Let’s take a look at the next big problem that we see in bad code – using constructs that make logic errors more likely.

Error-prone constructs

Another tell-tale sign of bad code is that it uses error-prone constructs and designs. There are always several ways of doing the same thing in code. Some of them provide more scope to introduce mistakes than others. It therefore makes sense to choose ways of coding that actively avoid errors.

Let’s compare two different versions of a function to calculate a total value and analyze where errors might creep in:

int calculateTotal(List<Integer> values) {
    int total = 0;
    for (int i = 0; i < values.size(); i++) {
        total += values.get(i);
    }
    return total;
}

The previous listing is a simple method that will take a list of integers and return their total. It’s the sort of code that has been around since Java 1.0.2. It works, yet it is error prone. In order for this code to be correct, we need to get several things right:

  • Making sure that total is initialized to 0 and not some other value
  • Making sure that our i loop index is initialized to 0
  • Making sure that we use < and not <= or == in our loop comparison
  • Making sure that we increment the i loop index by exactly one
  • Making sure that we add the value from the current index in the list to total

Experienced programmers do tend to get all this right first time. My point is that there is a possibility of getting any or all of these things wrong. I’ve seen mistakes made where <= has been used instead of < and the code fails with an IndexOutOfBoundsException as a result. Another easy mistake is to use = in the line that adds to the total value instead of +=. This has the effect of returning only the last value, not the total. I have even made that mistake as a pure typo – I honestly thought I had typed the right thing but I was typing quickly and I hadn’t.
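The = versus += typo is easy to demonstrate. The following sketch places a deliberately broken version next to the intended one. The TotalTypoDemo class and the calculateTotalWithTypo name are hypothetical, chosen only to keep the two versions apart:

```java
import java.util.List;

// Hypothetical demo class: one deliberate typo, one intended version.
class TotalTypoDemo {

    // Buggy on purpose: '=' was typed where '+=' was intended.
    static int calculateTotalWithTypo(List<Integer> values) {
        int total = 0;
        for (int i = 0; i < values.size(); i++) {
            total = values.get(i); // overwrites total instead of adding to it
        }
        return total;
    }

    // The intended version accumulates every value.
    static int calculateTotal(List<Integer> values) {
        int total = 0;
        for (int i = 0; i < values.size(); i++) {
            total += values.get(i);
        }
        return total;
    }
}
```

For the list 1, 2, 3, the version with the typo returns 3, the last value, while the intended version returns 6. Both compile without complaint, which is exactly why this kind of mistake is so easy to miss.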

It is clearly much better for us to avoid these kinds of errors entirely. If an error cannot happen, then it will not happen. This is a process I call designing out errors. It is a fundamental clean-code practice. To see how we could do this to our previous example, let’s look at the following code:

int calculateTotal(List<Integer> values) {
    return values.stream().mapToInt(v -> v).sum();
}

This code does the same thing, yet it is inherently safer. We have no total variable, so we cannot initialize that incorrectly, nor can we forget to add values to it. We have no loop and so no loop index variable. We cannot use the wrong comparison for the loop end and so cannot get an IndexOutOfBoundsException. There is simply far less that can go wrong in this implementation of the code. It generally makes the code clearer to read as well. This, in turn, helps with onboarding new developers, code reviews, adding new features, and pair programming.

Whenever we have a choice to use code with fewer parts that could go wrong, we should choose that approach. We can make life easier for ourselves and our colleagues by choosing to keep our code as error-free and simple as possible. We can use more robust constructs to give bugs fewer places to hide.

It is worth mentioning that both versions of the code have an integer overflow bug. If we add integers together whose total is beyond the allowable range of -2147483648 to 2147483647, then the code will produce the wrong result. The point still stands, however: the later version has fewer places where things can go wrong. Structurally, it is simpler code.
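If the overflow matters for our application, there are at least two ways we might design it out. The sketch below is one possible approach, not the only one, and the OverflowSafeTotal class and its method names are hypothetical. The first method widens each value to long before summing. The second keeps the int result but uses the standard Math.addExact method, which throws an ArithmeticException instead of silently wrapping around:

```java
import java.util.List;

// Hypothetical class showing two ways to handle the overflow.
class OverflowSafeTotal {

    // Option 1: sum as long, so two or more ints cannot overflow the result
    // for any list size we are realistically going to see.
    static long calculateTotalAsLong(List<Integer> values) {
        return values.stream().mapToLong(Integer::longValue).sum();
    }

    // Option 2: keep an int result, but fail loudly on overflow.
    // Math.addExact throws ArithmeticException when the int range is exceeded.
    static int calculateTotalExact(List<Integer> values) {
        return values.stream().mapToInt(Integer::intValue).reduce(0, Math::addExact);
    }
}
```

Which option is appropriate depends on what callers expect. Widening changes the return type, while the exact version turns a wrong answer into a visible failure.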

Now that we have seen how to avoid the kinds of errors that are typical of bad code, let’s turn to other problem areas: coupling and cohesion.

Coupling and cohesion

If we have a number of Java classes, coupling describes the relationship between those classes, while cohesion describes the relationships between the methods inside each one.

Our software designs become easier to work with once we get the amounts of coupling and cohesion right. We will learn techniques to help us do this in Chapter 7, Driving Design–TDD and SOLID. For now, let’s understand the problems that we will face when we get this wrong, starting with the problem of low cohesion.

Low cohesion inside a class

Low cohesion describes code that has many different ideas all lumped together in it in a single place. The following UML class diagram shows an example of a class with low cohesion among its methods:

Figure 1.2 – Low cohesion

The code in this class attempts to combine too many responsibilities. They are not all obviously related – we are writing to a database, sending out welcome emails, and rendering web pages. This large variety of responsibilities makes our class harder to understand and harder to change. Consider the different reasons we may need to change this class:

  • Changes to the database technology
  • Changes to the web view layout
  • Changes to the web template engine technology
  • Changes to the email template engine technology
  • Changes to the news feed generation algorithm

There are many reasons why we would need to change the code in this class. It is always better to give classes a more precise focus, so that there are fewer reasons to change them. Ideally, any given piece of code should only have one reason to be changed.

Understanding code with low cohesion is hard. We are forced to understand many different ideas at once. Internally, the code is very interconnected. Changing one method often forces a change in others because of this. Using the class is difficult, as we need to construct it with all its dependencies. In our example, we have a mixture of templating engines, a database, and code for creating a web page. This also makes the class very difficult to test. We need to set up all these things before we can run test methods against that class. Reuse is limited with a class like this. The class is very tightly bound to the mix of features that are rolled into it.
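As a rough sketch of the alternative, we could imagine the responsibilities split into separate, focused classes. Every name below is hypothetical and is not taken from the class diagram. The in-memory list merely stands in for a database, and the string formats stand in for real templating engines:

```java
import java.util.ArrayList;
import java.util.List;

// One reason to change: how users are stored.
class UserStore {
    private final List<String> userNames = new ArrayList<>();

    void save(String userName) {
        userNames.add(userName);
    }

    boolean contains(String userName) {
        return userNames.contains(userName);
    }
}

// One reason to change: the wording of the welcome email.
class WelcomeEmailComposer {
    String compose(String userName) {
        return "Welcome, " + userName + "!";
    }
}

// One reason to change: how the profile page is rendered.
class ProfilePageRenderer {
    String render(String userName) {
        return "<h1>" + userName + "</h1>";
    }
}
```

Each class can now be constructed, understood, and tested on its own. A change to the email wording no longer touches the storage or rendering code.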

High coupling between classes

High coupling describes where one class needs to connect to several others before it can be used. This makes it difficult to use in isolation. We need those supporting classes to be set up and working correctly before we can use our class. For the same reason, we cannot fully understand that class without understanding the many interactions it has. As an example, the following UML class diagram shows classes with a high degree of coupling between each other:

Figure 1.3 – High coupling

In this fictitious example of a sales tracking system, several of the classes need to interact with each other. The User class in the middle couples to four other classes: Inventory, EmailService, SalesAppointment, and SalesReport. This makes it harder to use and test than a class that couples to fewer other classes. Is the coupling here too high? Maybe not, but we can imagine other designs that would reduce it. The main thing is to be aware of the degree of coupling that classes have in our designs. As soon as we spot classes with many connections to others, we know we are going to have a problem understanding, maintaining, and testing them.
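One possible way to reduce coupling of this kind is to have a class depend on a small interface rather than on a concrete collaborator. The sketch below assumes names that are not in the diagram: WelcomeNotifier, SalesUser, and the register() method are all hypothetical and only illustrate the idea:

```java
// Hypothetical narrow interface: the only thing our class needs from outside.
interface WelcomeNotifier {
    void sendWelcome(String emailAddress);
}

// Hypothetical class that couples to one small interface, not a concrete service.
class SalesUser {
    private final String emailAddress;
    private final WelcomeNotifier notifier;

    SalesUser(String emailAddress, WelcomeNotifier notifier) {
        this.emailAddress = emailAddress;
        this.notifier = notifier;
    }

    // Delegates the notification without knowing how it is delivered.
    void register() {
        notifier.sendWelcome(emailAddress);
    }
}
```

Because WelcomeNotifier has a single method, a test can supply a lambda in place of a real email service. This lets us use the class in isolation, which is precisely what high coupling prevents.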

We’ve seen how the technical elements of high coupling and low cohesion make our code difficult to work with, but there is a social aspect to bad code as well. Let’s consider the effect bad code has on the development team.

Decreasing team performance

A good way to look at bad code is as code that lacks the technical practices that help other developers understand what it is doing.

When you’re coding solo, it doesn’t matter so much. Bad code will just slow you down and feel a little demoralizing at times. It does not affect anybody else. However, most professionals code in development teams, which is a whole different ball game. Bad code really slows a team down.

Two studies are interesting as far as this is concerned.

The first study shows that developers waste up to 23% of their time on bad code. The second study shows that in 25% of cases of working with bad code, developers are forced to increase the amount of bad code still further. In these two studies, the term technical debt is used, rather than referring to bad code. There is a difference in intention between the two terms. Technical debt is code that is shipped with known technical deficiencies in order to meet a deadline. It is tracked and managed with the intention that it will later be replaced. Bad code might have the same defects, but it lacks the redeeming quality of intentionality.

It is all too easy to check in code that has been easy to write but will be hard to read. When I do that, I have effectively placed a tax on the team. The next developer to pull my changes will have to figure out what on earth they need to do and my bad code will have made that much harder.

We’ve all been there. We start a piece of work, download the latest code, and then just stare at our screens for ages. We see variable names that make no sense, mixed up with tangled code that really does not explain itself very well at all. It’s frustrating for us personally, but it has a real cost in a programming business. Every minute we spend not understanding code is a minute where money is being spent on us achieving nothing. It’s not what we dreamed of when we signed up to be a developer.

Bad code disrupts every future developer who has to read the code, even us, the original authors. We forget what we previously meant. Bad code means more time spent by developers fixing mistakes, instead of adding value. It means more time is lost on fixing bugs in production that should have been easily preventable.

Worse still, this problem compounds. It is like interest on a bank loan. If we leave bad code in place, the next feature will involve adding workarounds for the bad code. You may see extra conditionals appear, giving the code yet more execution paths and creating more places for bugs to hide. Future features build on top of the original bad code and all of its workarounds. It creates code where most of what we read is simply working around what never worked well in the first place.

Code of this kind drains the motivation out of developers. The team starts spending more time working around problems than they spend adding value to the code. None of this is fun for the typical developer. It’s not fun for anybody on the team.

Project managers lose track of the project status. Stakeholders lose confidence in the team’s ability to deliver. Costs overrun. Deadlines slip. Features get quietly cut, just to claw back a little slack in the schedule. Onboarding new developers becomes painful, to the point of awkwardness, whenever they see the awful code.

Bad code leaves the whole team unable to perform to the level they are capable of. This, in turn, does not make for a happy development team. Beyond unhappy developers, it also negatively impacts business outcomes. Let’s understand those consequences.

Diminishing business outcomes

It’s not just the development team who suffers from the effects of bad code. It’s bad for the entire business.

Our poor users end up paying for software that doesn’t work, or at least that doesn’t work properly. There are many ways that bad code can mess up a user’s day, whether as a result of lost data, unresponsive user interfaces, or any kind of intermittent fault. Each one of these can be caused by something as trivial as setting a variable at the wrong time or an off-by-one error in a conditional somewhere.

Users see none of that, and none of the thousands of lines of code that we got right. They just see their missed payment, their lost document that took 2 hours to type, or that fantastic last-chance ticket deal that simply never happened. Users have little patience for things like this. Defects of this kind can easily lose us a valuable customer.

If we are lucky, users will fill out a bug report. If we are really lucky, they will let us know what they were doing at the time and provide us with the right steps to reproduce the fault. But most users will just hit delete on our app. They’ll cancel future subscriptions and ask for refunds. They’ll go to review sites and let the world know just how useless our app and company are.

At this point, it isn’t merely bad code; it is a commercial liability. The failures and honest human errors in our code base are long forgotten. Instead, we were just a competitor business that came and went in a blaze of negativity.

Decreased revenue leads to decreased market share, a reduced Net Promoter Score®™ (NPS), disappointed shareholders, and all the other things that make your C-suite lose sleep at night. Our bad code has become a problem at the business level.

This isn’t hypothetical. There have been several incidents where software failures have cost the business. Security breaches for Equifax, Target, and even the Ashley Madison site all resulted in losses. The failure of the Ariane rocket resulted in the loss of both spacecraft and satellite payload, a total cost of billions of dollars! Even minor incidents resulting in downtime for e-commerce systems can soon have costs mounting, while consumer trust crashes down.

In each case, the failures may have been small errors in comparatively few lines of code. Certainly, they will have been avoidable in some way. We know that humans make mistakes, and that all software is built by humans, yet a little extra help may have been all it would have taken to stop these disasters from unfolding.

The advantage of finding failures early is shown in the following diagram:

Figure 1.4 – Costs of defect discovery

In the previous figure, the cost of the repair of a defect gets higher the later it is found:

  • Found by a failing test before code:

The cheapest and fastest way to discover a defect is by writing a test for a feature before we write the production code. If we write the production code that we expect should make the test pass, but instead the test fails, we know there is a problem in our code.

  • Found by a failing test after code:

If we write the production code for a feature, and then write a test afterward, we may find defects in our production code. This happens a little later in the development cycle. We will have wasted a little more time before discovering the defect.

  • Found during manual QA:

Many teams include Quality Assurance (QA) engineers. After code has been written by a developer, the QA engineer will manually test the code. If a defect is found here, this means significant time has passed since the developer first wrote the code. Rework will have to be done.

  • Found by the end user once code is in production:

This is as bad as it gets. The code has been shipped to production and end users are using it. An end user finds a bug. The bug has to be reported and triaged, a fix has to be scheduled for development, and the fix then has to be retested by QA and redeployed to production. This is the slowest and most expensive path to discovering a defect.

The earlier we find the fault, the less time and money we will have to spend on correcting it. The ideal is to have a failing test before we even write a line of code. This approach also helps us design our code. The later we leave it to find a mistake, the more trouble it causes for everyone.
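To give a flavor of what a test written before the code might look like, here is a minimal sketch in plain Java with no test framework. The TotalCalculator and TotalCalculatorTest classes are hypothetical names used for illustration. The idea is that totalsAllValues() is written first and fails until calculateTotal() behaves correctly:

```java
import java.util.Arrays;
import java.util.List;

// Production code, written second, to make the test below pass.
class TotalCalculator {
    static int calculateTotal(List<Integer> values) {
        return values.stream().mapToInt(v -> v).sum();
    }
}

// The test, written first. It describes the outcome we want before any
// production code exists.
class TotalCalculatorTest {

    // Returns true when the production code gives the expected total.
    static boolean totalsAllValues() {
        return TotalCalculator.calculateTotal(Arrays.asList(1, 2, 3)) == 6;
    }

    public static void main(String[] args) {
        if (!totalsAllValues()) {
            throw new AssertionError("expected a total of 6");
        }
        System.out.println("totalsAllValues passed");
    }
}
```

If we had mistyped the production code, this test would fail within seconds of writing it, long before manual QA or an end user had a chance to find the defect.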

We’ve seen how low-quality code gives rise to defects and is bad for business. The earlier we detect failures, the better it is for us. Defects left in production code are both difficult and expensive to fix, and they negatively affect our business reputation.

Summary

We can now recognize bad code from its technical signs and appreciate the problems that it causes for both development teams and business outcomes.

What we need is a technique to help us avoid these problems. In the next chapter, we’ll take a look at how TDD helps us deliver clean, correct code that is a true business asset.

Questions and answers

  1. Isn’t it enough to have working code?

Sadly not. Code that meets user needs is an entry-level step with professional software. We also need code that we know works, and that the team can easily understand and modify.

  2. Users don’t see the code. Why does it matter to them?

This is true. However, users expect things to work reliably, and they expect our software to be updated and improved continuously. This is only possible when the developers can work safely with the existing code.

  3. Is it easier to write good code or bad code?

It is much harder to write good code, unfortunately. Good code does more than simply work correctly. It must also be easy to read, easy to change, and safe for our colleagues to work with. That’s why techniques such as TDD have an important role to play. We need all the help we can get to write clean code that helps our colleagues.

Further reading
