As soon as we started programming, we found to our surprise that it wasn’t as easy to get programs right as we had thought. Debugging had to be discovered. I can remember the exact instant when I realized that a large part of my life from then on was going to be spent in finding mistakes in my own programs.
—Maurice Wilkes, 1949
It is a painful thing to look at your own trouble and know that you yourself and no one else has made it.
—Sophocles
Congratulations! You’ve finished writing your code so now it’s time to get it working. I know. You’re thinking, “I can write perfect code; I’m careful. I won’t have any errors in my program.” Get over it. Every programmer thinks this at one point or another. There’s just no such thing as a perfect program. Humans are imperfect (thankfully). So we all make mistakes when we write code. After writing code for over 40 years I’ve gotten to the point where most of the time my programs that are less than about 20 lines long don’t have any obvious errors in them and lots of times they even compile the first time. I think that’s a pretty good result. You should shoot for that.
Getting your program to work is a process with three parts, the order of which is the subject of some debate. The three parts are debugging, reviewing (or inspecting), and testing.
Debugging is the process of finding the root cause of an error and fixing it. This doesn’t mean treating the symptoms of an error by coding around it to make it go away; it means finding the real reason for the error and fixing that piece of code so the error is removed. Debugging is normally done once you finish writing the code and before you do a code review or unit testing (but see test-driven development later in this chapter).
Reviewing (or inspecting) is the process of reading the code as it sits on the page and looking for errors. The errors can include errors in how you’ve implemented the design, other kinds of logic errors, wrong comments, etc. Reviewing code is an inherently static process because the program isn’t running on a computer – you’re reading it off a screen or a piece of paper. So although reviewing is very good for finding static errors, it can’t find dynamic or interaction errors in your code. That’s what testing is for. We’ll talk more about reviews and inspections in the next chapter.
Testing, of course, is the process of finding errors in the code, as opposed to fixing them, which is what debugging is all about. Testing occurs, at minimum, at three different levels, from unit tests of individual pieces of code up through system tests of the whole program.
We’ll focus on debugging in this chapter.
We define three types of errors in code: syntactic errors, semantic errors, and logic errors.
Syntactic errors are errors you make with respect to the syntax of the programming language you’re using. Spelling a keyword wrong, failing to declare a variable before you use it, forgetting to put that closing curly brace in a block, forgetting the return type of a function, and forgetting that semi-colon at the end of a statement are all typical examples of syntactic errors. Syntactic errors are by far the easiest to find, because the compiler finds nearly all of them for you. Compilers are very rigid taskmasters when it comes to enforcing lexical and grammar rules of a language so if you get through the compilation process with no errors and no warnings, then it’s very likely your program has no syntax errors left. Notice the “and no warnings” in the previous sentence. You should always compile your code with the strictest syntax checking turned on, and you should always eliminate all errors and warnings before you move on to reviews or testing. If you are sure you’ve not done anything wrong syntactically, then that’s just one less thing to worry about while you’re finding all the other errors! And the good news is that modern integrated development environments (IDEs) do this for you automatically once you’ve set up the compiler options. So once you set the warning and syntax checking levels, every time you make a change, the IDE will automatically re-compile your file and let you know about any syntactic errors!
Semantic errors, on the other hand, occur when you fail to create a proper sentence in the programming language. You do this because you have some basic misunderstanding about the grammar rules of the language. Not putting curly braces around a block, accidentally putting a semi-colon after the condition in an if or while statement in C/C++ or Java, and forgetting to use a break; statement at the end of a case statement inside a switch are all classic examples of semantic errors. Semantic errors are harder to find because they are normally syntactically correct pieces of code, so the compiler passes your program and it compiles correctly into an object file. It’s only when you try to execute your program that semantic errors surface. The good news is that they’re usually so egregious that they show up pretty much immediately. The bad news is they can be very subtle. For example, when the conditional expression of a while statement is accidentally followed by a semi-colon, the semi-colon is usually very hard to see; your eyes will just slide right over it. Its effect, though, is either to put the program into an infinite loop, because the loop control variable (call it j) is never incremented, or to never execute the loop but then erroneously execute the block, because the block is no longer semantically connected to the while statement.
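A small Java sketch can make the effect of a stray semi-colon visible without hanging the program. It uses a for loop rather than a while loop so that the empty loop still terminates; the class and method names are hypothetical and exist only for this illustration.

```java
class StraySemicolon {
    // Buggy version: the semi-colon after the loop header IS the loop body,
    // so the block below is no longer connected to the loop.
    static int bodyRunsWithStraySemicolon(int limit) {
        int runs = 0;
        int j;
        for (j = 0; j < limit; j++);   // <-- stray semi-colon, empty loop body
        {
            runs++;                    // an ordinary block, executed exactly once
        }
        return runs;
    }

    // Intended version: the block is the loop body.
    static int bodyRunsCorrect(int limit) {
        int runs = 0;
        for (int j = 0; j < limit; j++) {
            runs++;
        }
        return runs;
    }

    public static void main(String[] args) {
        System.out.println(bodyRunsWithStraySemicolon(5)); // prints 1
        System.out.println(bodyRunsCorrect(5));            // prints 5
    }
}
```

Both versions compile cleanly, which is exactly why this kind of error has to be found by running or reading the code.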
The third type of error, logic errors, are by far the most difficult to find and eradicate. A logic error is one that occurs because you’ve made a mistake in translating the design into code. These errors include things like computing a result incorrectly, off-by-one errors in loops (which can also be a semantic error if your off-by-one error is because you didn’t understand array indexing, for example), misunderstanding a network protocol, returning a value of the wrong type from a method, and so on. With a logic error, either your program seems to execute normally, but you get the wrong answers, or it dies a sudden and horrible death because you’ve walked off the end of an array, tried to dereference a null pointer, or tried to go off and execute code in the middle of a data area. It’s not pretty.
Unit testing involves finding the errors in your program, and debugging involves finding the root cause and fixing those errors. Debugging is about finding out why an error occurs in your program. You can look at errors as opportunities to learn more about the program, and about how you work and approach problem solving. Because after all, debugging is a problem solving activity, just as developing a program is problem solving. Look at debugging as an opportunity to learn about yourself and improve your skill set.
Just like in any endeavor, particularly problem solving endeavors, there’s a wrong way and a right way to approach the task. Here are a few things you shouldn’t do as you approach a debugging problem.1
First of all, don’t guess about where the error might be. This implies that (1) you don’t know anything about the program you’re trying to debug, and (2) you’re not going about the job of finding the root cause of the error systematically. Stop, take a deep breath, and start again.
Don’t fix the symptom, fix the problem. Lots of times you can “fix” a problem by forcing the error to go away by adding code. This is particularly true if the error involves an outlier in a range of values. The temptation here is to special case the outlier by adding code to handle just that case. Don’t do it! You haven’t fixed the underlying problem here; you’ve just painted over it. Trust me, there’s some other special case out there waiting to break free and squash your program. Study the program, figure out what it’s doing at that spot, and fix the problem. You’ll thank me later.
__________
1 McConnell, S. Code Complete 2: A Practical Handbook of Software Construction. (Redmond, WA: Microsoft Press, 2004).
Avoid denial. It’s always tempting to say “the compiler must be wrong” or “the system must be broken” or “Ralph’s module is obviously sending me bad data” or “that’s impossible” or some such excuse. Buck up here, developer. If you just “changed one thing” and the program breaks, then guess who probably just injected an error into the program? Or at the very least uncovered one? Review the quote from Sophocles at the beginning of this chapter, “... you yourself and no one else has made it.” You will make mistakes. We all do. The best attitude to display is, “by golly, this program can’t beat me, I’m going to fix this thing!” One of the best discussions of careful coding and how hard it is to write correct programs is the discussion of how to write binary search in Column 5 of Jon Bentley’s Programming Pearls.2 You should read it.
Here’s an approach to debugging that will get the job done. Remember, you’re solving a problem here and the best way to do that is to have a systematic way of sneaking up on the problem and whacking it on the head. The other thing to remember about debugging is that, like a murder mystery, you’re working backwards from the conclusion.3 The bad thing has already happened – your program failed. Now you need to examine the evidence and work backwards to a solution.
Reproducing the problem reliably is the key first step. If your error only shows up periodically it will be much, much harder to find. The classic example of how hard this can be is the “but it works fine on my computer” problem. This is the one sentence you never want to hear. This is why people in tech support retire early. Reproducing the problem – in different ways if possible – will allow you to see what’s happening and will give you a clear indication of where the problem is occurring. Luckily for you, most errors are easy to find. Either you get the wrong answer and you can look for where the print statement is located and work backwards from there, or your program dies a horrible death and the system generates a stack trace for you. The Java Virtual Machine does this automatically for you. With other languages, you may need to use a debugger to get the stack trace.
Remember, errors are not random events. If you think the problem is random, then it’s usually one of the following:
__________
2 Bentley, J. Programming Pearls, 2nd Edition. (Reading, MA, Addison-Wesley: 2000).
3 Kernighan, B. W. and R. Pike. The Practice of Programming. (Boston, MA, Addison-Wesley, 1999).
Reproducing the problem is not enough, however. You should reproduce it using the simplest test case that will cause the error to occur. It’s a matter of eliminating all the other possibilities so you can focus on the single one (well, maybe one or two) that probably causes the error. One way to do this is to try to reproduce the problem using half the data you had the first time. Pick one half or the other. If the error still occurs, try it again. If the error doesn’t happen, try the other half of the data. If there’s still no error, then try with three-quarters of the data. You get the idea. You’ll know when you’ve found the simplest case because with anything smaller the behavior of the program will change; either the error will disappear, or you’ll get a slightly different error.
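The halving idea can be sketched in a few lines of Java. This is a simplified illustration, not a complete minimizer: it only tries each half in turn and stops as soon as neither half fails on its own (the point at which the text suggests trying three-quarters of the data). The class name, the method name, and the use of a Predicate to stand in for "run the program and see whether it fails" are all assumptions made for the sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

class InputShrinker {
    // Repeatedly halves the input, keeping whichever half still triggers
    // the failure. 'fails' is a hypothetical stand-in for running the
    // program on the given data and observing the error.
    static <T> List<T> shrink(List<T> data, Predicate<List<T>> fails) {
        List<T> current = new ArrayList<>(data);
        while (current.size() > 1) {
            int mid = current.size() / 2;
            List<T> first = new ArrayList<>(current.subList(0, mid));
            List<T> second = new ArrayList<>(current.subList(mid, current.size()));
            if (fails.test(first)) {
                current = first;       // error still occurs: keep this half
            } else if (fails.test(second)) {
                current = second;      // try the other half
            } else {
                break;                 // the error needs data from both halves
            }
        }
        return current;
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        for (int i = 1; i <= 16; i++) {
            data.add(i);
        }
        // Pretend the program fails whenever the value 13 is in its input.
        System.out.println(shrink(data, list -> list.contains(13))); // prints [13]
    }
}
```

When the loop stops early, the remaining data is not yet minimal, and the manual approach described above takes over.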
Once you can reproduce the problem from the outside, you can now find where the error is occurring. Once again, we need to do this systematically. For most errors this is easy. There are a number of techniques you can use.
One technique is to insert print statements at key places in the code: inside if-statements, in the default case of a switch statement, etc. Unless something very spooky is going on you should be able to isolate where the error is occurring pretty quickly using this method. Once again, work your way backwards from the point where you think the error makes itself known. Remember that the place where an error exhibits its behavior may be many lines of code after the place where the error actually occurs.

In C and C++ you can also wrap debugging statements in a conditional-compilation block so that they can be compiled in or out:

#ifdef DEBUG
printf("Debug statement in sort routine\n");
#endif

To turn the debugging statement on, you can either put

#define DEBUG

in a header file or you can compile using gcc -DDEBUG foo.c and the printf function call will be included in your program. Leaving out the #define or the -DDEBUG will remove the printf function call from the executable program (but not your source). Beware though that this technique makes your program harder to read because of all the DEBUG blocks scattered around the code. You should remove DEBUG blocks before your program releases. Unfortunately, Java doesn’t have this facility because it doesn’t have a pre-processor. However, all is not lost. You can get the same effect as the #ifdef DEBUG by using a named boolean constant. Here’s an example of code:
public class IfDef {
    // Set to false to leave the debugging statements out.
    static final boolean DEBUG = true;

    public static void main(String[] args) {
        System.out.printf("Hello, World\n");
        if (DEBUG) {
            System.out.printf("max(5, 8) is %d\n", Math.max(5, 8));
            System.out.printf("If this prints, the code was included\n");
        }
    }
}
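In this example we set the boolean constant DEBUG to true when we want to turn the DEBUG blocks on, and we then set it to false when we want to turn them off. This isn’t perfect because you have to re-compile every time you want to turn debugging on and off, but you have to do that with the C/C++ example above as well.

It also pays to look for patterns in the errors you make. One classic mistake is the off-by-one error in a loop bound; here the <= lets j run one element past the end of the array:

for (int j = 0; j <= myArray.length; j++) {
    // some code here
}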
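Another classic mistake is using the assignment operator = where the equality operator == was intended:

for (int j = 0; j < length; j++) {
    if (c = myArray[j]) {
        pos = j;
        break;
    }
}

In Java the compiler catches this one for you, because the assignment produces a char where a boolean is required:

TstEql.java:10: incompatible types
found   : char
required: boolean
        if (c = myArray[j]) {
              ^
1 error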
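The compiler message above appears only because a char is not a boolean. As a hedged illustration (the class and method names are invented for this sketch), the same slip compiles without complaint in Java when the variable being assigned is itself a boolean:

```java
class AssignmentInCondition {
    // Buggy: '=' assigns true to done, so the condition is always true,
    // no matter what value the caller passed in.
    static String buggyStatus(boolean done) {
        if (done = true) {
            return "finished";
        }
        return "running";
    }

    // Intended: test the flag directly instead of comparing it to true.
    static String status(boolean done) {
        if (done) {
            return "finished";
        }
        return "running";
    }

    public static void main(String[] args) {
        System.out.println(buggyStatus(false)); // prints finished (wrong)
        System.out.println(status(false));      // prints running
    }
}
```

Testing a boolean directly, rather than comparing it with true or false, removes the opportunity to make this mistake at all.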
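Yet another classic mistake is forgetting the break; at the end of a case in a switch statement; here the 'd' case falls through into the default case:

switch (selectOne) {
    case 'p': operation = "print";
              break;
    case 'd': operation = "display";
    default:  operation = "blank";
              break;
}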
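To see what the switch fragment above actually does, it can be wrapped in a method and run. The sketch below keeps the same case labels and strings as the fragment; the class name and the two method names are hypothetical.

```java
class MissingBreak {
    // Same shape as the fragment above: case 'd' has no break.
    static String buggyOperation(char selectOne) {
        String operation;
        switch (selectOne) {
            case 'p': operation = "print";
                      break;
            case 'd': operation = "display";   // falls through into default
            default:  operation = "blank";
                      break;
        }
        return operation;
    }

    // With the missing break restored.
    static String operation(char selectOne) {
        String operation;
        switch (selectOne) {
            case 'p': operation = "print";
                      break;
            case 'd': operation = "display";
                      break;
            default:  operation = "blank";
                      break;
        }
        return operation;
    }

    public static void main(String[] args) {
        System.out.println(buggyOperation('d')); // prints blank
        System.out.println(operation('d'));      // prints display
    }
}
```

The buggy version never reports an error; it simply overwrites "display" with "blank", which is why this mistake tends to show up as a wrong answer rather than a crash.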
Don’t be discouraged, though. Most errors you’ll make really are simple. Most of them you’ll catch during code reviews and unit tests. The ones that escape into system test or (heaven forbid) released code are the really interesting ones. Debugging is a great problem solving exercise. Revel in it.
So far the only debugging tools we’ve talked about using are compilers to remove syntax errors and warnings, print statements you can insert in your code to give you data on what is happening where, and inline debugging statements that you can compile in or out. There are other tools you can use that will help you find the source of an error. The first among these are debuggers.
Debuggers are special programs that execute instrumented code and allow you to peek inside the code as it’s running to see what’s going on. Debuggers allow you to stop your running code (breakpoints), examine variable values as the code executes (watchpoints), step into and out of functions, and even make changes to the code and the data while the program is running. Debuggers are the easiest way to get a stack trace for C and C++ programs. For C and C++ developers, the gdb debugger that comes with nearly all Unix and Linux systems (and the development tool packages for Mac OS X and Windows) is usually the debugger of choice. Gdb is also integrated into some integrated development environments like Eclipse (www.eclipse.org/), and it comes with a graphical user interface in the DDD debugger (www.gnu.org/software/ddd/). The NetBeans IDE (www.netbeans.org) comes with its own graphical debugger. The Java debuggers in Eclipse and NetBeans allow you to set breakpoints at individual lines of code, they let you watch variable values change via watchpoints, and they allow you to step through the code one line or one method at a time. Gdb does all the things mentioned above and more, but you should use it, and any other debugger, cautiously. Debuggers, by their nature, have tunnel vision when it comes to looking at code. They are great at showing you all the code for the current function, but they don’t give you a feel for the organization of the program as a whole. They also don’t give you a feel for complicated data structures, and it’s hard to debug multi-threaded and multi-process programs using a debugger. Multi-threaded programs are particularly hard for a number of reasons, one of which is that timing is crucial to how the different threads execute, and running a multi-threaded program in a debugger changes the timing.
Once you’ve found where the error is, you need to come up with a fix for it. Most of the time the fix is obvious and simple because the error is simple. That’s the good news. But sometimes while you can find the error, the fix isn’t obvious, or the fix will entail rewriting a large section of code. In cases like this be careful! Take the time necessary to understand the code, and then rewrite the code and fix the error correctly. The biggest problem in debugging is haste.
When you are fixing errors, remember two things: fix the actual error rather than its symptom, and make only one fix at a time.
This second item is particularly important. We’ve all been in situations where you’re fixing an error and you find another one in the same piece of code. The temptation is to fix them both right then and there. Resist! Fix the error you came to fix. Test it and make sure the fix is correct. Integrate the new code back into the source code base. Then you can go back to step 1 and fix the second error. You might ask, “Why do all this extra work when I can just make the fix right now?”
Well, here’s the situation. By the time you get to this step in the debugging process you already have a test for the first error, you’ve educated yourself about the code where the error occurs, you’re ready to make that one fix. Why should you confuse the issue by fixing two things now? Besides, you don’t have a test for the second error. So how do you test that fix? Trust me, it’s a little more work, but doing the fixes one at a time will save you lots of headaches down the road.
Testing the fix sounds obvious, doesn’t it? But you’d be surprised how many fixes don’t get tested. Or if they’re tested, it’s a simple test with generic sample data and no attempt to see if your fix broke anything else.
First of all, re-run the original test that uncovered the error. Not just the minimal test that you came up with in step 1, but the first test that caused the error to appear. If that test now fails (in the sense that the error does not occur any more), then that’s a good sign you’ve at least fixed the proximate cause of the error. Then run every other test in your regression suite (see the next chapter for more discussion on regression tests) so you can make sure you’ve not re-broken something that was already fixed. Finally, integrate your code into the source code base, check out the new version and test the entire thing. If all that still works, then you’re in good shape. Go have a beer.
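The testing sequence described above can be sketched in plain Java. Everything here is hypothetical: the sum method stands in for the code that was fixed (assuming the bug was a loop bound that walked off the end of the array), and the inputs stand in for the original failing test, the minimal test case, and the rest of a regression suite. A real project would more likely use a test framework than a hand-written check method.

```java
class FixVerification {
    // The fixed method. Assumption for this sketch: the original bug was
    // the loop bound j <= values.length instead of j < values.length.
    static int sum(int[] values) {
        int total = 0;
        for (int j = 0; j < values.length; j++) {
            total += values[j];
        }
        return total;
    }

    // Minimal stand-in for a test framework's assertion.
    static void check(boolean condition, String name) {
        if (!condition) {
            throw new AssertionError("FAILED: " + name);
        }
        System.out.println("passed: " + name);
    }

    public static void main(String[] args) {
        // 1. Re-run the original test that first exposed the error.
        check(sum(new int[] {4, 8, 15, 16, 23, 42}) == 108, "original failing input");
        // 2. Re-run the minimal test case found while isolating the error.
        check(sum(new int[] {7}) == 7, "minimal reproduction");
        // 3. Run the rest of the regression suite to catch anything re-broken.
        check(sum(new int[] {}) == 0, "empty array");
        check(sum(new int[] {-3, 3}) == 0, "mixed signs");
    }
}
```

Keeping the first two checks in the suite permanently is what turns a one-time fix into a regression test.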
Well, if there was one error in a particular function or method, then there might be another, right? So while you’re here, you might as well take a look at the code in the general vicinity of the error you just fixed and see if anything like it happens again. This is another example of looking for patterns. Patterns are there because developers make the same mistakes over and over again (we’re human, after all). Grab another cup of coffee and a doughnut and read some more code. It won’t hurt to take a look at the whole module or class and see if there are other errors or opportunities for change. In the agile world, this is called refactoring. This means rewriting the code to make it simpler. Making your code simpler will make it clearer, easier to read, and it will make finding that next error easier. So have some coffee and read some code.
In some of the paragraphs above we’ve made mention of a source code base and integrating changes into that base. That is a sneaky way of starting a brief discussion of source code control, also known as software version control.
Whenever you work on a project, whether you are the only developer or you are part of a team, you should keep backups of the work you’re doing. That’s what a version control system (VCS) does for you, but with a twist. A VCS will not only keep a backup of all the files you create during a project, but it will keep track of all the changes you’ve made to them, so that in addition to saying, “Give me the latest version of PhoneContact.java,” you can say, “I want the version of PhoneContact.java from last Thursday.”
A VCS keeps a repository of all the files you’ve created and added to it for your project. The repository can be a flat file or a more sophisticated database. A client program allows you to access the repository and retrieve different versions of one or more of the files stored there. Normally, if you just ask the VCS for a particular file or files, you get the latest version. Whatever version of the file you extract from the repository, it’s called the working copy in VCS-speak. Extracting the file is called a check out.
If you are working on a project all alone, then the working copy you check out from the VCS repository is the only one out there and any changes that you make will be reflected in the repository when you check the file back in. The cool part of this is that if you make a change and it’s wrong, you can just check out a previous version that doesn’t have the change in it. The other interesting part of a VCS is when there is more than one developer working on a project. When you’re working on a development team, it’s quite likely that somebody else on the team may check out the same file that you did. This brings up the problem of file sharing. The problem here is if both of you make changes to the file and then both want to check the file back into the repository who gets to go first and whose changes end up in the repository? Ideally, both, right?
Well, maybe not. Say Alice and Bob both check out PhoneContact.java from the repository and each of them makes changes to it. Bob checks his version of PhoneContact.java back into the repository and goes to lunch. A few minutes later Alice checks in her version of PhoneContact.java. Two problems occur. (1) If Alice hasn’t made any changes in the same lines of code that Bob did, her version is still newer than Bob’s and it hides Bob’s version in the repository. Bob’s changes are still there, but they are now in an older version than Alice’s. (2) Worse, if Alice did make changes to some of the same code that Bob did, then her changes actually overwrite Bob’s and PhoneContact.java is now a very different file. Bummer. So we don’t want either of these situations to occur. How do we avoid this problem?
Version control systems use two different strategies to avoid this collision problem: lock-modify-unlock and copy-modify-merge.
The first strategy is lock-modify-unlock. In this strategy, Bob checks out PhoneContact.java and locks it for edit. This means that Bob now has the only working copy of PhoneContact.java that can be changed. If Alice tries to check out PhoneContact.java she gets a message that she can only check out a read-only version and so can’t check it back in until Bob gives up his lock. Bob makes his changes, checks PhoneContact.java back in, and then releases the lock. Alice can now check out and lock an editable version of PhoneContact.java (which now includes Bob’s changes) and make her own changes and check the file back in, giving up her lock. The lock-modify-unlock strategy has the effect of serializing changes in the repository.
This serialization of changes is the biggest problem with lock-modify-unlock. While Bob has the file checked out for editing, Alice can’t make her changes. She just sits around twiddling her thumbs until Bob is done. Alice’s boss doesn’t like this thumb twiddling stuff. However, there is an alternative.
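A toy model of lock-modify-unlock shows why it serializes changes. This is only a sketch under stated assumptions: the LockingRepository class, its method names, and the in-memory maps are all invented for illustration and do not correspond to the API of any real version control system.

```java
import java.util.HashMap;
import java.util.Map;

class LockingRepository {
    private final Map<String, String> contents = new HashMap<>();  // file -> text
    private final Map<String, String> lockOwner = new HashMap<>(); // file -> user

    void add(String file, String text) {
        contents.put(file, text);
    }

    // Anyone may read the latest version (a read-only check out).
    String read(String file) {
        return contents.get(file);
    }

    // Returns true only if 'user' now holds the edit lock on 'file'.
    boolean checkOutForEdit(String file, String user) {
        String owner = lockOwner.putIfAbsent(file, user);
        return owner == null || owner.equals(user);
    }

    // Only the lock holder may check in; checking in releases the lock.
    boolean checkIn(String file, String user, String newText) {
        if (!user.equals(lockOwner.get(file))) {
            return false;
        }
        contents.put(file, newText);
        lockOwner.remove(file);
        return true;
    }

    public static void main(String[] args) {
        LockingRepository repo = new LockingRepository();
        repo.add("PhoneContact.java", "original");
        System.out.println(repo.checkOutForEdit("PhoneContact.java", "Bob"));   // true
        System.out.println(repo.checkOutForEdit("PhoneContact.java", "Alice")); // false
        repo.checkIn("PhoneContact.java", "Bob", "Bob's changes");
        System.out.println(repo.checkOutForEdit("PhoneContact.java", "Alice")); // true
        System.out.println(repo.read("PhoneContact.java"));                     // Bob's changes
    }
}
```

Alice's second attempt succeeds only after Bob checks in, and the copy she gets already contains his changes.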
The second strategy is copy-modify-merge. In this strategy, Alice and Bob are both free to check out editable copies of PhoneContact.java. Let’s say that Alice makes her changes first and checks her new version of the file back into the repository and goes out for cocktails. When Bob is finished making his changes he tries to check his new version of PhoneContact.java into the repository only to have the VCS tell him his version of the file is “out of date;” Bob can’t check in. What happened here? Well, the VCS stamps each file that’s checked out with a timestamp and a version number. It also keeps track of what is checked out and who checked it out and when. It checks those values when you try to check in.
When Bob tried to check in, his VCS realized that the version of the code he was trying to check in was older than the current version (the new one that Alice had checked in earlier), so it let him know that. So what is Bob to do? That’s where the third part of copy-modify-merge comes in. Bob needs to tell the VCS to merge his changes with the current version of PhoneContact.java and then check in the updated version. This all works just fine if Alice and Bob have changed different parts of the file. If their changes do not conflict, then the VCS can just do the merge automatically and check in the new file. A problem occurs if Alice and Bob have made changes to the same lines of code in the file. In that case, Bob must do a manual merge of the two files. Bob has to do this because the VCS isn’t smart enough to choose between the conflicting changes. Usually, a VCS will provide some help in doing the merge, but ultimately the merge decision must be Bob’s.
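The out-of-date check and the merge step can be sketched with a toy repository. The assumptions are significant and worth stating: the MergingRepository class and its methods are hypothetical, versions are simply positions in a list, and the merge only handles edits that change lines in place (no inserted or deleted lines). Real version control systems use far more capable merge algorithms.

```java
import java.util.ArrayList;
import java.util.List;

class MergingRepository {
    private final List<String> versions = new ArrayList<>(); // index = version number

    MergingRepository(String initialText) {
        versions.add(initialText);
    }

    int headVersion() {
        return versions.size() - 1;
    }

    String checkOut(int version) {
        return versions.get(version);
    }

    // A check-in succeeds only if the working copy was based on the head.
    boolean checkIn(int baseVersion, String newText) {
        if (baseVersion != headVersion()) {
            return false;            // working copy is "out of date"
        }
        versions.add(newText);
        return true;
    }

    // Line-by-line three-way merge; returns null when a manual merge is needed.
    static String merge(String base, String mine, String theirs) {
        String[] b = base.split("\n");
        String[] m = mine.split("\n");
        String[] t = theirs.split("\n");
        if (b.length != m.length || b.length != t.length) {
            return null;             // this sketch handles in-place edits only
        }
        List<String> merged = new ArrayList<>();
        for (int i = 0; i < b.length; i++) {
            if (m[i].equals(t[i])) {
                merged.add(m[i]);    // same on both sides
            } else if (m[i].equals(b[i])) {
                merged.add(t[i]);    // only they changed this line
            } else if (t[i].equals(b[i])) {
                merged.add(m[i]);    // only I changed this line
            } else {
                return null;         // both changed the same line: conflict
            }
        }
        return String.join("\n", merged);
    }

    public static void main(String[] args) {
        MergingRepository repo = new MergingRepository("name\nphone\nemail");
        String base = repo.checkOut(0);                                  // both check out version 0
        System.out.println(repo.checkIn(0, "name\nmobile\nemail"));     // Alice: true
        String bobs = "name\nphone\ne-mail";
        System.out.println(repo.checkIn(0, bobs));                      // Bob: false, out of date
        String merged = merge(base, bobs, repo.checkOut(repo.headVersion()));
        System.out.println(repo.checkIn(repo.headVersion(), merged));   // Bob after merging: true
    }
}
```

When merge returns null, the decision falls back to the person doing the check-in, which mirrors the manual merge Bob has to perform when the changes conflict.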
Copy-modify-merge is the strategy used by most version control systems these days, including the popular open-source version control system, Subversion (http://subversion.apache.org).4 There is one problem (well, okay, more than one, but we’ll just talk about this one) with copy-modify-merge. If your repository allows you to store binary files, you can’t merge them. Say you have two versions of the same jpg file. How do you decide which of the bits is correct? So in this case the VCS (Subversion included) will require you to use lock-modify-unlock.
Git (http://git-scm.com), the other candidate for most popular open-source version control system, uses a model in which each developer has a local repository containing the entire development history. When a developer commits a change to a file, the change is recorded in that developer’s local repository, and it can then be shared with the other repositories by pushing or pulling. Git uses a model called an incomplete merge along with a number of plug-in merge tools to coordinate merges across repositories. Git’s main virtue is speed. It may be the fastest distributed VCS around.
Pair programming is a technique to improve software quality and programmer performance. It’s been around for many years, but has only recently been formalized [Williams00]. In pair programming two people share one computer and one keyboard. One person “drives,” controlling the keyboard and writing the code, and the other “navigates,” watching for errors in the code and suggesting changes and test cases. Periodically the driver and the navigator switch places. Pairs can work together for long periods of time on a project, or pairs can change with each programming task. Pair programming is particularly popular in agile development environments; in the Extreme Programming process, all developers are required to pair program and no code that has not been written by two people is allowed to be integrated into the project [Beck00]. There have been several studies5 that show that pair programming decreases the number of errors in code and improves the productivity of programmers. So this is our final debugging technique – pair program!
Just like writing good, efficient code, debugging is a skill that all programmers need to acquire. Being a careful coder will mean you have less debugging to do, but there will always be debugging. Programmers are all human and we’ll always make mistakes. Having a basket of debugging skills will help you find the root causes of errors in your code faster and it will help keep you from injecting more errors. The combination of reviews (Chapter 15), debugging and unit testing – as we’ll see in the next chapter – is the knock-out punch that a developer uses to release defect-free code.
__________
4 Collins-Sussman, B., Fitzpatrick, B. W., and Pilato, C. M. Version Control with Subversion. (Sebastopol, CA: O’Reilly Press, 2010). Retrieved from http://svnbook.red-bean.com/ on 15 October 2010.
5 Cockburn, A. and L. Williams. The Costs and Benefits of Pair Programming. Extreme Programming Examined. (Boston, MA: Addison-Wesley Longman, 2001). Page 592.
Bentley, J. Programming Pearls, 2nd Edition. (Reading, MA, Addison-Wesley: 2000).
Chelf, B. “Avoiding the most common software development goofs.” Retrieved from www.embedded.com/show/Article.jhtml?articleID=192800005 on October 2, 2006.
Cockburn, A. and L. Williams. The Costs and Benefits of Pair Programming. Extreme Programming Examined. (Boston, MA: Addison-Wesley Longman, 2001). Page 592.
Collins-Sussman, B., Fitzpatrick, B. W., and Pilato, C. M. Version Control with Subversion. (Sebastopol, CA: O’Reilly Press, 2010). Retrieved from http://svnbook.red-bean.com/ on 15 October 2010.
Kernighan, B. W. and R. Pike. The Practice of Programming. (Boston, MA, Addison-Wesley, 1999).
McConnell, S. Code Complete 2: A Practical Handbook of Software Construction. (Redmond, WA: Microsoft Press, 2004).