Chapter 6. Using Effective Tracing

 

“A complex system that does not work is invariably found to have evolved from a simple system that works just fine.”

 
 --Murphy's Technology Laws

Introduction to Tracing

Tracing is writing debug messages to an output stream during the execution of an application. It provides a recorded track of operations performed by the running application in real time. Tracing does not require the application to be running in debug mode. Also, because the trace messages are typically written to a log file, tracing provides information for studying the behavior and troubleshooting the application. Tracing has been used for many years, but it became especially popular in server-side programming. When the application is small, it can easily be loaded into a debugger and stepped through to investigate a problem. Distributed systems tend to be much more complex on the server side and often run inside middleware products such as an application or a Web server. This makes using a debugger impractical, and stepping through code can interfere with multithreading. Debugging is not an option at all for running production applications, so tracing is the only plausible method for providing information about what is going on in the system.

Traces are inserted in code almost like comments, and good tracing makes comments unnecessary. Because tracing results in method calls and messages written to a stream, it can become expensive—especially if placed inside loops and frequently called methods. Logging frameworks are designed to add minimum overhead to code execution, and they can be configured to reduce the level of trace messages that are written to the output stream. It is important to realize, however, that tracing that is not very well optimized is still better than no tracing at all. If a problem exists in a production application without traces, the only source of information is user feedback, which as we all know, is often unreliable or inaccurate. Good use of traces allows tracking the user's action and the application's response to it, which should be adequate to identify the source of the problem in most cases. You should include some sort of user identification in each trace so the log file can be filtered on operations performed by a specific user.

Java did not have a standard logging API until JDK 1.4. This is somewhat unfortunate given the popularity and importance of logging for J2EE applications. As a result of this late standardization, each vendor and many application developers had to create their own version of the tracing and logging API. Today, it is common to see various versions of the logging API, sometimes even in the same application. For example, a J2EE application running inside WebLogic might be using Log4J for its own debug traces but using WebLogic logging API to report system-level critical errors. All logging APIs support a concept of log message levels that control how many messages are written to the log file.

Tracing As an Effective Method of Learning the Software

We already talked about the importance of tracing for the troubleshooting of large systems. The information provided in trace logs is also invaluable for understanding the execution flow of an application and identifying the starting points for reverse engineering. Because traces are designed to provide a human-readable history of operations performed by a product, they are easier to read than decompiled Java code. The log file presents a snapshot of the work done so far. Thus, if a bug results in an exception with no stack trace, you can turn on the most detailed level of tracing, re-create the bug, and find the last message that made it into the log file before the exception. The same technique can be used to locate code that needs to be patched. If the application does not provide adequate tracing, you can add trace messages and even thread stack dumps to classes using patching. Although tedious, it can be the only effective way of studying obfuscated code, especially if control flow obfuscation was used. Tracing takes the guessing out of the picture because trace messages provide undisputable evidence that a certain piece of code executed at a given moment in time.

If the application is not printing log messages already, inserting traces can be time-consuming, and—let's admit it—it is not the type of challenge a good developer is looking for. A quick-and-dirty way of getting a glimpse at the runtime state is printing the call stack of the current thread from a method under investigation, as described in Chapter 5, “Replacing and Patching Application Classes.” Recall that adding a call to the dumpStack() method of java.lang.Thread causes it to print the names of methods that form the call stack. With just one line of code, you can get a good understanding of which calls led to the invocation of the method in question.

Tracing and Logging Tools and APIs

There are two dominant logging APIs today. The first one is Apache's Log4J, which dates back to 1999 when it was started by IBM's AlphaWorks group. It has come a long way and today is probably the most advanced and flexible logging API. It is free and can be distributed as a standalone JAR file, which is compatible with all versions of JDK starting from 1.1. Because of the Apache group's brand recognition and the great set of features found in Log4J, it has gained enormous popularity among Java developers and can be considered the de facto standard for Java logging.

The second framework is Sun's Java Logging API, which has been made the official standard. For better or worse, JCP has decided not to use Log4J but to create a new standard. The Java Logging API is also powerful and flexible, although it lacks some of the advanced features found in Log4J, including

  • Logging to system logs such as the Windows NT Event log and Unix Syslog

  • Log file rotation based on the date

  • C language printf-like formatting patterns

  • Reloading the configuration file

Reloading the configuration file is an important feature for highly available applications. Typically, the configuration is read when a framework is initialized, which is usually at the application startup. Changes to the log levels and categories are not picked up until the application is restarted. Troubleshooting a running production system rules out the possibility of frequent restarts, so without the reloading feature the logging level has to be decided before you start the application.

For the purposes of hacking and reverse engineering, Log4J is clearly a better choice because it can be used with all versions of JDK. The download and online documentation are available at http://jakarta.apache.org/log4j.

Tracing Do's and Don'ts

Tracing is extremely important for large distributed applications, but if it's not used carefully, it can significantly degrade the application performance. Following is a set of simple rules for effective tracing.

Tracing Do's

  • Do use tracing as much as possible.

  • Make sure each trace includes the time, class, and method name that produced the trace.

  • Do write traces before and after critical parts of code, such as external system calls, database calls, and disk access. This enables easy identification of the point of failure.

  • Do include values of parameters and context variables that will help you understand the execution context at the time of the trace. For example, including the user ID in each trace helps filter the log file on operations performed on behalf of a specific user.

  • Do trace out exceptions with the stack trace when it is first caught.

  • Do be sure to use different trace levels when writing messages. Use the critical level for system and error messages and the debug level for noisy messages that are not very important for troubleshooting the system. This enables you to control the size of the log file and the overhead on the application.

Tracing Don'ts

  • Don't use System.out.println to write traces. This imposes permanent overhead on the application and does not allow flexibility. Use a logging framework instead.

  • Don't use Exception.printStackTrace() to output an exception because it always writes the output to the System.out. Use methods provided by the logging framework.

  • Don't insert traces into loops with hundreds or thousands of repetitions.

  • Don't insert traces into small methods that are called frequently.

Another practice to avoid is using the + operator to concatenate strings when passing parameters to the logging framework. Even if the tracing level is raised, the runtime will still perform the expensive operation of allocating a new buffer for a resulting string and copying the argument strings into it. Thus, instead of using

logger.debug("Message received from host: " + host + ", text: " + text)

use

if (logger.isDebugEnabled() == true)
    logger.debug("Message received from host: " + host + ", text: " + text)

The cost of executing the second piece of code is much lower because the string concatenation does not take place unless the debugging traces are enabled.

Quick Quiz

1:

How is tracing different from debugging?

2:

How can tracing be used to hack an application?

3:

Why is the reloading of the logging configuration at runtime important?

4:

Why is using + for string concatenation dangerous?

In Brief

  • Tracing is writing debug messages to an output stream during the execution of an application.

  • Traces are inserted as API calls into the application code.

  • Traces add performance overhead, but it is justified by the improved troubleshooting.

  • Adding traces to an application helps in understanding its execution flow.

  • Log4J and Java Logging API are both good choices for logging frameworks. Log4J is more advanced and can be used with JDKs prior to 1.1, which makes it a better choice for hacking.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.22.181.47