Chapter 9. Debugging

I write perfect code.

Anonymous

If you write perfect code, skip to the next chapter. This chapter is for everyone else: those who do not write perfect code. There is one other person who writes perfect code: the source of the quotation. Why is the quotation anonymous? Because an experienced developer would never claim to write perfect code. The perfect developer does not exist, which is an unfortunate truth. The perfect developer can be found in the annals of history next to the Abominable Snowman, Bigfoot, and other mythical figures. Perfect code is not the goal of this chapter or this book, because it is not possible. However, minimizing potential problems and bugs is a valid goal, and it is achievable by following certain best practices and policies, some of which are described in this chapter. This chapter is about embracing your bugs. Debugging is another perspective that should be considered throughout the software development life cycle, like security and deployment. It should not be left as an oversight, waiting for the first crisis to occur.

When planning a software project, time is often not set aside for debugging—a major omission since debugging is a significant portion of application development, deployment, and ongoing maintenance. Fifty percent (on average) of the effort to maintain a software application is debugging. Furthermore, debugging is an iterative process. This means debugging is a non-trivial task. Not setting this time aside means problems tend to be uncovered later in testing or, even worse, at deployment, which can be costly. 2-1 in Chapter 2, highlights cost savings from resolving problems early in the software development life cycle.

Effective debugging is about creating and maintaining a stable, robust, efficient, and correct application. A stable application does not fail indiscriminately or at the slightest provocation. A robust application handles exceptional events at run time in a stable manner. An efficient application scales as either the number of users or the throughput increases. A correct application responds as expected to user interaction and other stimuli. Effective debugging is similar to being an auto mechanic. An auto mechanic is responsible for keeping a car running well through preemptive and reactive maintenance and repair. You are the software mechanic responsible for keeping your software application running well.

Some debugging techniques are universal and applicable to both desktop and Web-based applications. For example, debugging an overflow exception is virtually the same for Web-based and desktop applications. You check local variables and parameters for invalid values, validate results of expressions, and so on. However, there are circumstances where debugging a Web-based versus desktop application is different. Tracing, for example, can inadvertently display sensitive information of a Web-based application—information that is inappropriate to present to a remote user. This chapter focuses primarily on Web-based applications, such as ASP.NET applications. We have already mentioned one debugging difference. Here are other differences between debugging a Web-based versus desktop application.

  • Web applications reside within an application domain of the worker process, which affects how the application is debugged. Desktop applications are isolated applications.

  • Unhandled exceptions are handled differently in a Web application. Unhandled exceptions are routed to the page, application, and possibly a custom error page. This is somewhat different from resolving unhandled exceptions in a desktop application.

  • Automatic recycling of the worker process, which hosts Web-based applications, can make debugging deadlocks and memory leaks more challenging than with a desktop application.
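The routing of unhandled exceptions described in the list above can be modeled in a few lines. What follows is a minimal Python sketch of the idea only, not ASP.NET code: the names handle_request, log_and_pass, CUSTOM_ERROR_PAGE, and server_log are hypothetical. It assumes a page-level handler is tried first, then an application-level handler, and that a generic custom error page is the last resort, so that details stay on the server and are not shown to a remote user.

```python
from typing import Callable, List, Optional

# A handler returns a response string if it resolves the error, else None.
Handler = Callable[[Exception], Optional[str]]

CUSTOM_ERROR_PAGE = "Sorry, something went wrong."  # shows no internal details
server_log: List[str] = []  # stand-in for a server-side log


def handle_request(render: Callable[[], str],
                   page_handler: Optional[Handler] = None,
                   app_handler: Optional[Handler] = None) -> str:
    """Render a page; route an unhandled exception to the page handler,
    then the application handler, then the custom error page."""
    try:
        return render()
    except Exception as exc:
        for handler in (page_handler, app_handler):
            if handler is not None:
                response = handler(exc)
                if response is not None:  # the handler resolved the error
                    return response
        return CUSTOM_ERROR_PAGE


def log_and_pass(exc: Exception) -> Optional[str]:
    """Application-level handler: record details on the server only."""
    server_log.append(repr(exc))
    return None  # fall through to the custom error page


def failing_page() -> str:
    """Hypothetical page whose error message contains sensitive data."""
    raise ValueError("connection string: secret")


print(handle_request(failing_page, app_handler=log_and_pass))
```

In this sketch the sensitive message ends up in server_log, while the remote user sees only the generic page.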

As mentioned, this chapter focuses on debugging Web applications. I suggest the Debugging Microsoft .NET 2.0 Applications book (Microsoft Press, 2006), written by John Robbins, for a comprehensive discussion of debugging. In addition to being technically superior, the book contains several interesting and fun war stories from John’s experience in the trenches debugging with customers.

Debugging a developmental version of a Web application differs from debugging a production version. A developmental version is probably a debug version and is used during active product development. It runs on a local machine and not on a production server. A production version of a Web application should be a release version that is deployed to a production server. Developers typically use the developmental version of a Web application, while remote users are the primary clients of a production version. For these reasons and more, there are salient differences between debugging a developmental version of a Web application and debugging a production version. For example, requests on the developmental version (debug version) of a Web application do not time out. This provides an opportunity to debug an errant request. However, production versions (release versions) of a Web application set a time-out on requests to prevent stalling the application. Here are additional differences:

  • Production versions are optimized, which can make debugging more challenging.

  • Production servers may not have debugging tools, including Visual Studio, which, of course, makes debugging more difficult to accomplish on the local server.

  • A production server is often less accessible. This may require remote debugging, with related security issues, to debug the Web-based application.

  • Production debugging of a Web-based application is typically done post-mortem, while development debugging may entail live debugging.

Beneath the managed layer of a managed application is a native application. Understanding the native underpinning of a managed application can be helpful when debugging. Debugging at this level may be challenging. However, it can provide important information when debugging. The Shared Source Common Language Infrastructure 2.0 (project name is Rotor) is an open source version of the Common Language Runtime (CLR). Shared Source CLI Essentials, by David Stutz, Ted Neward, and Geoff Shilling (O’Reilly, 2003), is an excellent book on this topic. "Production Debugging for .NET Framework Applications" is an article published on the Microsoft Developer Network (MSDN) and another source of excellent information on the topic, which you can find here: http://msdn.microsoft.com/en-us/library/ms954594.aspx.

For example, when a managed application deadlocks, the native perspective can contain a wealth of helpful information. Windbg is a native debugging tool that is discussed in further detail later in the chapter. The following listing is the native call stack of a managed thread. From this information, I know that this thread was waiting on a Monitor when the program hung. A Monitor is a managed class and wrapper for a critical section, which is a native construct. Toward the top of the call stack is the ntdll!ZwWaitForMultipleObjects call, where ntdll is the module name and ZwWaitForMultipleObjects is the function. From the arguments of this function, the number of objects being waited on and their related handles can be found. As expected, the thread is waiting on one object. This is only a peek into the sort of information available when native debugging a managed application. There is much more.

ChildEBP RetAddr Args to Child
03bcf110 77aa0690 76927e09 00000001 03bcf164 ntdll!KiFastSystemCallRet
03bcf114 76927e09 00000001 03bcf164 00000001 ntdll!ZwWaitForMultipleObjects+0xc
03bcf1b0 79ed98fd 03bcf164 004b8dd0 00000000 KERNEL32!WaitForMultipleObjectsEx+0x11d
03bcf218 79ed9889 00000001 004b8dd0 00000000 mscorwks!WaitForMultipleObjectsEx_SO_TOLERANT+0x6f
03bcf238 79ed9808 00000001 004b8dd0 00000000 mscorwks!Thread::DoAppropriateAptStateWait+0x3c
03bcf2bc 79ed96c4 00000001 004b8dd0 00000000 mscorwks!Thread::DoAppropriateWaitWorker+0x13c
03bcf30c 79ed9a62 00000001 004b8dd0 00000000 mscorwks!Thread::DoAppropriateWait+0x40
03bcf368 79e78944 ffffffff 00000001 00000000 mscorwks!CLREvent::WaitEx+0xf7
03bcf37c 79ed7b37 ffffffff 00000001 00000000 mscorwks!CLREvent::Wait+0x17
03bcf408 79ed7a9e 004cb148 ffffffff 004cb148 mscorwks!AwareLock::EnterEpilog+0x8c
03bcf424 79ebd7e4 c6efd76b 00000000 0139916c mscorwks!AwareLock::Enter+0x61
03bcf4c4 009a03f6 013627f4 793b0d1f 013991a0 mscorwks!JIT_MonEnterWorker_Portable+0xb3
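Reading the arguments out of the ZwWaitForMultipleObjects frame can also be scripted. The following is a minimal Python sketch, not a Windbg feature. It assumes every frame follows the column layout of the listing above (ChildEBP, RetAddr, three argument words, then module!symbol and an optional offset). The names parse_frame and wait_summary are hypothetical. It also assumes the first argument is the object count and the second is the address of the handle array, as in the native NtWaitForMultipleObjects signature.

```python
import re
from typing import Iterable, NamedTuple, Optional, Tuple

# Column layout assumed from the listing:
# ChildEBP RetAddr Arg1 Arg2 Arg3 module!symbol[+0xoffset]
FRAME_RE = re.compile(
    r"^(?P<ebp>[0-9a-f]{8}) (?P<ret>[0-9a-f]{8}) "
    r"(?P<a1>[0-9a-f]{8}) (?P<a2>[0-9a-f]{8}) (?P<a3>[0-9a-f]{8}) "
    r"(?P<module>[^!\s]+)!(?P<symbol>[^+\s]+)(?:\+0x(?P<offset>[0-9a-f]+))?$",
    re.IGNORECASE,
)


class Frame(NamedTuple):
    module: str
    symbol: str
    args: Tuple[int, int, int]


def parse_frame(line: str) -> Optional[Frame]:
    """Parse one stack line; return None for headers or unrecognized lines."""
    m = FRAME_RE.match(line.strip())
    if m is None:
        return None
    args = tuple(int(m.group(k), 16) for k in ("a1", "a2", "a3"))
    return Frame(m.group("module"), m.group("symbol"), args)


def wait_summary(stack_lines: Iterable[str]) -> Optional[Tuple[int, int]]:
    """Return (object count, handle array address) from the first
    ZwWaitForMultipleObjects frame, or None if there is no such frame."""
    for line in stack_lines:
        frame = parse_frame(line)
        if frame and frame.symbol == "ZwWaitForMultipleObjects":
            return frame.args[0], frame.args[1]
    return None


# The first two frames of the listing above.
STACK = [
    "03bcf110 77aa0690 76927e09 00000001 03bcf164 ntdll!KiFastSystemCallRet",
    "03bcf114 76927e09 00000001 03bcf164 00000001 ntdll!ZwWaitForMultipleObjects+0xc",
]
count, handles_at = wait_summary(STACK)
print(count, hex(handles_at))
```

For the listing above, the count is 1, which matches the observation that the thread is waiting on one object.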

Some things begin at birth and continue to the grave: aging, relatives wanting to borrow frequent flyer miles, and, my personal favorite, taxes. Debugging falls into this category. Bottom line, you are debugging an application from cradle to grave. This is an activity that does not end when an application is placed into production. Actually, most applications are in production for a period of time much longer than the initial development. During this time, you are receiving support calls from customers, building the next release of the product, and of course continuing testing. These activities will undoubtedly uncover a few more bugs and require resolution. In addition, the deployment location is typically dynamic. Operating system upgrades, service packs, hardware upgrades, and more can cause unanticipated problems and necessitate debugging. For these reasons, post-production debugging must be planned, budgeted, and have resources reserved. Otherwise, the shiny new polished application you delivered to clients may eventually run off its wheels.

The culture at your company should not place the focus entirely on new product development, where current products are essentially abandoned at product release. The key word is current. Your current products, not future products, have a greater impact on your profitability, brand, and customer satisfaction. Therefore, if only for self-preservation, you should have a credible plan for debugging those products into the future. In design and implementation, plan for debugging released products. Plan to reinvest 10 percent to 15 percent of time back into the code base for fixing and enhancing problem areas. Here are some examples:

  • Design and implement a plan for reporting the health of the application.

  • Have a plan to enable and disable instrumentation when needed.

  • Plan to incorporate hooks in your application to be used when debugging. For example, build in the ability to dump the state of various important components.
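The three examples above can be combined in one small component. The following is a minimal Python sketch under stated assumptions, not a prescribed design: the class DebugHooks, its methods, and the components order_cache and db_pool are all hypothetical. It shows a registry of state-dump hooks, a switch that enables or disables instrumentation at run time, and a coarse health report.

```python
import json
from typing import Callable, Dict, List


class DebugHooks:
    """Hypothetical registry: state-dump hooks plus an instrumentation switch."""

    def __init__(self) -> None:
        self._dumpers: Dict[str, Callable[[], dict]] = {}
        self.instrumentation_enabled = False
        self.trace_log: List[str] = []  # stand-in for a real trace sink

    def register(self, component: str, dumper: Callable[[], dict]) -> None:
        """Register a callback that returns a component's current state."""
        self._dumpers[component] = dumper

    def set_instrumentation(self, enabled: bool) -> None:
        """Turn tracing on or off at run time."""
        self.instrumentation_enabled = enabled

    def trace(self, message: str) -> None:
        """Record a trace message only while instrumentation is enabled."""
        if self.instrumentation_enabled:
            self.trace_log.append(message)

    def dump_state(self) -> str:
        """Return a JSON snapshot of every registered component."""
        snapshot = {}
        for name, dumper in self._dumpers.items():
            try:
                snapshot[name] = dumper()
            except Exception as exc:
                # One failing hook must not hide the state of the others.
                snapshot[name] = {"dump_error": repr(exc)}
        return json.dumps(snapshot, sort_keys=True)

    def health(self) -> dict:
        """Report which components, if any, failed to dump their state."""
        state = json.loads(self.dump_state())
        failing = sorted(n for n, s in state.items() if "dump_error" in s)
        return {"healthy": not failing, "failing": failing}


hooks = DebugHooks()
hooks.register("order_cache", lambda: {"entries": 3})       # hypothetical
hooks.register("db_pool", lambda: {"open_connections": 2})  # hypothetical

hooks.trace("ignored: instrumentation is off")
hooks.set_instrumentation(True)
hooks.trace("recorded: instrumentation is on")

print(hooks.dump_state())
print(hooks.health())
```

Catching the exception inside dump_state is a deliberate choice in this sketch: when an application is already misbehaving, a dump hook is itself likely to fail, and a partial snapshot is more useful than none.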

This chapter provides concrete steps, practices, and policies for effective debugging. However, debugging is as much philosophical as it is tactical. A debugging-for-quality mindset must be adopted by everyone on the team and reinforced by management. You don’t wake up one morning and decide to be concerned about debugging. It must be ongoing and intrinsic to product design, development, and deployment, where every member of the team participates. Management must have an equal commitment to debugging for quality and to supporting the software team in this goal. The software architect should ask important questions during the design phase. Can the product be designed in a manner to proactively prevent future problems? What remote functionality is required to debug the application on a production machine? When implementing the product, the developers must also ask important questions. If the application crashes, is there a way of collecting and reporting critical information on the problem? The point is that you need to think about debugging throughout the software development life cycle to be properly prepared to resolve the inevitable bug.

Not being prepared to handle software problems can cause great harm to your business. The following sections showcase two stories from recent history where software bugs received widespread attention. Yes, software bugs can be headline news and extensively reported. Loss of product sales can be the least of your problems. The long-term harm to the overall brand of the company could be more damaging. The two software problems have become known as the Overflow Bug and the Pentium FDIV bug.

Overflow Bug

In December 2004, thousands of travelers were stranded in airports across the country because of an overflow in a field of a software application. The problem was widely reported during the holiday season, which is peak travel time for airlines. Here is what happened. Comair, a commuter airline for Delta Air Lines, was forced to cancel hundreds of flights, stranding angry passengers over the Christmas holiday. The root of the problem was a snowstorm that forced the rescheduling of airline crews. The storm caused an abnormally high number of crew reassignments, which in turn caused an overflow in a field. The overflow caused an essential software system to crash and otherwise perform incorrectly. The software system was a legacy application written in Fortran. Few at Comair understood the legacy application or the language, which compounded the problem. The cost to Comair was $20 million. Furthermore, the Overflow Bug and its ramifications led to the resignation of two career executives. "Comair’s Christmas Disaster: Bound to Fail" is an excellent article about this event: www.cio.com/article/112103.
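To make the failure mode concrete, here is a minimal Python sketch of a counter stored in a fixed-width field. The text above does not give the width of the field involved, so the signed 16-bit width used here is an assumption chosen only for illustration, and the names store_int16 and checked_increment are hypothetical. The sketch contrasts a store that wraps silently with an increment that fails loudly before the limit is crossed.

```python
# Assumed field width for illustration: signed 16-bit.
INT16_MIN, INT16_MAX = -32768, 32767


def store_int16(value: int) -> int:
    """Mimic writing value into a signed 16-bit field that wraps silently."""
    return (value - INT16_MIN) % 65536 + INT16_MIN


def checked_increment(count: int, limit: int = INT16_MAX) -> int:
    """Increment count, raising an error instead of wrapping past limit."""
    if count >= limit:
        raise OverflowError(f"counter would exceed field limit {limit}")
    return count + 1


# One change too many: the silent store turns the count negative.
print(store_int16(INT16_MAX + 1))

# The checked version reports the problem at the moment it occurs.
try:
    checked_increment(INT16_MAX)
except OverflowError as exc:
    print("rejected:", exc)
```

A checked increment does not remove the limit, but it turns silent corruption into an explicit error that can be reported and handled.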
