The main focus of most profiling tools is to provide a profile of method calls. This gives you a good idea of where the bottlenecks in your code are and is probably the most important way to pinpoint where to target your efforts. By showing which methods and lines take the most time, a good profiling tool can save you time and effort in locating bottlenecks.
Most method profilers work by sampling the call stack at regular
intervals and recording the methods on the stack.[8] This
regular snapshot identifies the method currently being executed (the
method at the top of the stack) and all the methods below, to the
program’s entry point. By accumulating the number of hits on
each method, the resulting profile usually identifies where the
program is spending most of its time. This profiling technique
assumes that the sampled methods are representative, i.e., if 10% of
sampled stacks show method foo( ) at the top of the stack, then the
assumption is that method foo( ) takes 10% of the running time.
However, this is a sampling technique, and so it is not foolproof:
methods can be missed altogether or have their weighting misrecorded
if some of their execution calls are missed. But usually only the
shortest tests are skewed. Any reasonably long test (i.e., over
seconds, rather than milliseconds) will normally give correct
results.
This sampling technique can be difficult to get right. It is not enough to simply sample the stack. The profiler must also ensure that it has a coherent stack state, so the call must be synchronized across the stack activities, possibly by temporarily stopping the thread. The profiler also needs to make sure that multiple threads are treated consistently, and that the timing involved in its activities is accounted for without distorting the regular sample time. Also, too short a sample interval causes the program to become extremely slow, while too long an interval results in many method calls being missed and hence misrepresentative profile results being generated.
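The basic mechanism is simple enough to sketch. The following minimal sampler is my own illustration, using the modern Thread.getStackTrace( ) API (not something available to the profilers described in this chapter): it periodically records the method at the top of a target thread's stack and accumulates hit counts per method.

```java
import java.util.HashMap;
import java.util.Map;

public class MiniSampler {
    // Take 'samples' snapshots of the target thread's stack, at the
    // given interval, counting hits on the top-of-stack method.
    public static Map<String, Integer> sample(Thread target, int samples,
                                              long intervalMillis)
            throws InterruptedException {
        Map<String, Integer> hits = new HashMap<>();
        for (int i = 0; i < samples; i++) {
            StackTraceElement[] stack = target.getStackTrace();
            if (stack.length > 0) {
                // The top frame is the method currently executing.
                String top = stack[0].getClassName() + "."
                           + stack[0].getMethodName();
                hits.merge(top, 1, Integer::sum);
            }
            Thread.sleep(intervalMillis);  // the sampling interval
        }
        return hits;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> {
            double x = 0;
            while (!Thread.currentThread().isInterrupted())
                x += Math.sqrt(x + 1);   // busy work to be sampled
        });
        worker.setDaemon(true);
        worker.start();
        Map<String, Integer> profile = sample(worker, 50, 2);
        worker.interrupt();
        System.out.println(profile);  // hit counts per method
    }
}
```

A real profiler must additionally pause threads to get coherent stacks, treat multiple threads consistently, and subtract its own overhead without distorting the sample interval, exactly the difficulties described above.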
The JDK comes with a minimal profiler, obtained by running a program
using the java
executable with the
-Xrunhprof
option
(-prof
before JDK 1.2,
-Xprof
with HotSpot). The result of running with
this option is a file with the profile data in it. The default name
of the file is java.hprof.txt
(java.prof
before 1.2). This filename can be
specified by using the modified option,
-Xrunhprof:file=<filename>
(-prof:<filename>
before 1.2). The
output using these options is discussed in detail shortly.
When using a method profiler, the most useful technique is to target the top five to ten methods and choose the quickest to fix. The reason for this is that once you make one change, the profile tends to be different the next time, sometimes markedly so. This way, you can get the quickest speedup for a given effort.
However, it is also important to consider what you are changing, so you know what your results are. If you select a method that is taking up 10% of the execution time, then if you halve the time that method takes, you have speeded up your application by 5%. On the other hand, targeting a method that takes up only 1% of execution time is going to give you a maximum of only 1% speedup to the application, no matter how much effort you put in to speed up that method.
Similarly, if you have a method that takes 10% of the time but is called a huge number of times so that each individual method call is quite short, you are less likely to speed up that method. On the other hand, if you can eliminate some significant fraction of the calling methods (the methods that call the method that takes 10% of the time), you might gain a good speedup in that way.
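One common way to eliminate such calls, sketched here under the assumption that the same values recur (the class and method names are my own, for illustration), is to cache the result of the expensive method so that repeat callers never reach it:

```java
import java.util.HashMap;
import java.util.Map;

public class CachedConversion {
    // Cache of previously converted values: repeat callers skip the
    // expensive Double.toString( ) machinery entirely.
    private static final Map<Double, String> CACHE = new HashMap<>();

    public static String toString(double d) {
        Double key = d;              // autoboxed lookup key
        String s = CACHE.get(key);
        if (s == null) {
            s = Double.toString(d);  // the expensive conversion, done once
            CACHE.put(key, s);
        }
        return s;
    }

    public static void main(String[] args) {
        // Converting the same value repeatedly now costs one real
        // conversion plus cheap hash lookups.
        for (int i = 0; i < 5; i++)
            System.out.println(toString(3.14));
    }
}
```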
Let’s look at the profile output from a short program that repeatedly converts some numbers to strings and also inserts them into a hash table:
package tuning.profile;

import java.util.*;

public class ProfileTest
{
  public static void main(String[] args)
  {
    //Repeat the loop this many times
    int repeat = 2000;

    //Two arrays of numbers, eight doubles and ten longs
    double[] ds = {Double.MAX_VALUE, -3.14e-200D,
        Double.NEGATIVE_INFINITY, 567.89023D, 123e199D,
        -0.000456D, -1.234D, 1e55D};
    long[] ls = {2283911683699007717L, -8007630872066909262L,
        4536503365853551745L, 548519563869L, 45L, Long.MAX_VALUE,
        1L, -9999L, 7661314123L, 0L};

    //initializations
    long time;
    StringBuffer s = new StringBuffer( );
    Hashtable h = new Hashtable( );
    System.out.println("Starting test");
    time = System.currentTimeMillis( );

    //Repeatedly add all the numbers to a stringbuffer,
    //and also put them into a hash table
    for (int i = repeat; i > 0; i--)
    {
      s.setLength(0);
      for (int j = ds.length-1; j >= 0; j--)
      {
        s.append(ds[j]);
        h.put(new Double(ds[j]), Boolean.TRUE);
      }
      for (int j = ls.length-1; j >= 0; j--)
      {
        s.append(ls[j]);
        h.put(new Long(ls[j]), Boolean.FALSE);
      }
    }
    time = System.currentTimeMillis( ) - time;
    System.out.println("  The test took " + time + " milliseconds");
  }
}
The relevant output from running this program with the JDK 1.2 method profiling option follows. (See Section 2.3.2 for a detailed explanation of the 1.2 profiling option and its output.)
CPU SAMPLES BEGIN (total = 15813) Wed Jan 12 11:26:47 2000
rank   self   accum   count  trace  method
   1  54.79%  54.79%   8664    204  java/lang/FloatingDecimal.dtoa
   2  11.67%  66.46%   1846    215  java/lang/Double.equals
   3  10.18%  76.64%   1609    214  java/lang/FloatingDecimal.dtoa
   4   3.10%  79.74%    490    151  java/lang/FloatingDecimal.dtoa
   5   2.90%  82.63%    458    150  java/lang/FloatingDecimal.<init>
   6   2.11%  84.74%    333    213  java/lang/FloatingDecimal.<init>
   7   1.23%  85.97%    194    216  java/lang/Double.doubleToLongBits
   8   0.97%  86.94%    154    134  sun/io/CharToByteConverter.convertAny
   9   0.94%  87.88%    148    218  java/lang/FloatingDecimal.<init>
  10   0.82%  88.69%    129    198  java/lang/Double.toString
  11   0.78%  89.47%    123    200  java/lang/Double.hashCode
  12   0.70%  90.17%    110    221  java/lang/FloatingDecimal.dtoa
  13   0.66%  90.83%    105    155  java/lang/FloatingDecimal.multPow52
  14   0.62%  91.45%     98    220  java/lang/Double.equals
  15   0.52%  91.97%     83    157  java/lang/FloatingDecimal.big5pow
  16   0.46%  92.44%     73    158  java/lang/FloatingDecimal.constructPow52
  17   0.46%  92.89%     72    133  java/io/OutputStreamWriter.write
In this example, I have extracted only the top few lines from the profile summary table. The methods are ranked according to the percentage of time they take. Note that the trace does not identify actual method signatures, only method names. The top three methods take, respectively, 54.79%, 11.67%, and 10.18% of the time taken to run the full program.[9] The fourth method in the list takes 3.10% of the time, so clearly you need look no further than the top three methods to optimize the program. The methods ranked first, third, and fourth are the same method, possibly called in different ways. Obtaining the traces for these three entries from the relevant section of the profile output (trace 204 for the first entry, and traces 214 and 151 for the third and fourth entries), you get:
TRACE 204:
 java/lang/FloatingDecimal.dtoa(FloatingDecimal.java:Compiled method)
 java/lang/FloatingDecimal.<init>(FloatingDecimal.java:Compiled method)
 java/lang/Double.toString(Double.java:Compiled method)
 java/lang/String.valueOf(String.java:Compiled method)
TRACE 214:
 java/lang/FloatingDecimal.dtoa(FloatingDecimal.java:Compiled method)
TRACE 151:
 java/lang/FloatingDecimal.dtoa(FloatingDecimal.java:Compiled method)
 java/lang/FloatingDecimal.<init>(FloatingDecimal.java:Compiled method)
 java/lang/Double.toString(Double.java:132)
 java/lang/String.valueOf(String.java:2065)
In fact, both traces 204 and 151 are the same stack, but trace 151 provides line numbers for two of the methods. Trace 214 is a truncated entry, and is probably the same stack as the other two (these differences are one of the limitations of the JDK profiler, i.e., that information is sometimes lost).
So all three entries refer to the same stack: an inferred call from
the StringBuffer to append a double, which calls
String.valueOf( ), which calls Double.toString( ), which in turn
creates a FloatingDecimal object. (<init> is the standard way to
write a constructor call; <clinit> is the standard way to show a
class initializer being executed. These are also the actual names
for constructors and static initializers in the class file.)
FloatingDecimal
is a class that is private to the
java.lang
package, which handles most of the
logic involved in converting floating-point numbers.
FloatingDecimal.dtoa( )
is the method called by the
FloatingDecimal
constructor that converts the
binary floating-point representation of a number into its various
parts of digits before the decimal point, after the decimal point,
and the exponent. FloatingDecimal
stores the
digits of the floating-point number as an array of chars when the
FloatingDecimal is created; no strings are created until the
floating-point number is converted to a string.
Since this stack includes a call to a constructor, it is worth
checking the object-creation profile to see whether you are
generating an excessive number of objects: object
creation
is expensive, and a method that
generates many new objects is often a performance bottleneck. (I show
the object-creation profile and how to generate it in Section 2.4.) The
object-creation profile shows that a large number of extra objects
are being created, including a large number of
FDBigInt
objects that are created by the new
FloatingDecimal
objects.
Clearly, FloatingDecimal.dtoa( )
is the primary
method to try to optimize in this case. Almost any improvement in
this one method translates directly to a similar improvement in the
overall program. However, normally only Sun can modify this method,
and even if you want to modify it, it is long and complicated and
takes an excessive amount of time to optimize unless you are already
familiar with both floating-point binary representation and
converting that representation to a string format.
Normally when tuning, the first alternative to optimizing
FloatingDecimal.dtoa( )
is to examine the other
significant bottleneck method, Double.equals( )
,
which came second in the summary. Even though this entry takes up
only 11.67% compared to over 68% for the
FloatingDecimal.dtoa( )
method, it may be an
easier optimization target. But note that while a small 10%
improvement in the FloatingDecimal.dtoa( )
method
translates into a 6% improvement for the program as a whole, the
Double.equals( )
method needs to be speeded up to
be more than twice as fast to get a similar 6% improvement for the
full program.
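The arithmetic behind these trade-offs is worth making explicit: the overall gain is the method's share of the runtime multiplied by the fractional improvement to that method (a simple Amdahl's-law-style calculation; the percentages below come from the profile shown earlier, and the helper class is my own):

```java
public class SpeedupMath {
    // Fraction of total runtime saved when a method occupying
    // 'share' of the runtime is made 'localGain' fraction faster.
    public static double overallGain(double share, double localGain) {
        return share * localGain;
    }

    public static void main(String[] args) {
        // FloatingDecimal.dtoa( ) stacks: ~68% of the runtime.
        // A 10% local improvement yields ~6.8% overall.
        System.out.println(overallGain(0.68, 0.10));
        // Double.equals( ): 11.67% of the runtime. Even halving it
        // yields only ~5.8% overall.
        System.out.println(overallGain(0.1167, 0.50));
    }
}
```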
The trace corresponding to this second entry in the summary example
turns out to be another truncated trace, but the example shows the
same method in 14th position, and the trace for that entry identifies
the Double.equals( )
call as coming from the
Hashtable.put( )
call. Unfortunately for tuning
purposes, the Double.equals( )
method itself is
already quite fast and cannot be optimized further.
When methods cannot be directly optimized, the next best choice is to
reduce the number of times they are called or even avoid the methods
altogether. (In fact, eliminating method calls is actually the better
tuning choice, but is often considerably more difficult to achieve
and so is not a first-choice tactic for optimization.) The
object-creation profile and the method profile together point to the
FloatingDecimal
class as being a huge bottleneck, so
avoiding this class is the obvious tuning tactic here. In Chapter 5, I employ this technique, avoiding the default
call through the FloatingDecimal
class for the
case of converting floating-point numbers to
String
s, and I obtain an order-of-magnitude
improvement. Basically, the strategy is to create a more efficient
routine to run the equivalent conversion functionality, and then
replace the calls to the underperforming FloatingDecimal methods
with calls to the more efficient optimized methods.
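To give a flavor of what such a replacement routine can look like (this is my own simplified sketch, not the Chapter 5 implementation), a converter restricted to a fixed number of decimal places can work through long arithmetic and bypass FloatingDecimal completely for in-range values:

```java
public class FastDoubleAppend {
    // Append 'value' to 'buf' with exactly two decimal places,
    // avoiding Double.toString( )/FloatingDecimal. Only valid for
    // values whose scaled form fits comfortably in a long.
    public static void appendFixed2(StringBuffer buf, double value) {
        long scaled = Math.round(value * 100.0);  // e.g., 3.25 -> 325
        if (scaled < 0) {
            buf.append('-');
            scaled = -scaled;
        }
        buf.append(scaled / 100);    // integer part
        buf.append('.');
        long frac = scaled % 100;    // two fractional digits
        if (frac < 10)
            buf.append('0');         // pad so 3.05 is not printed as 3.5
        buf.append(frac);
    }

    public static void main(String[] args) {
        StringBuffer s = new StringBuffer();
        appendFixed2(s, 3.25);
        s.append(' ');
        appendFixed2(s, -1.5);
        System.out.println(s);  // prints "3.25 -1.50"
    }
}
```

The point is not this particular routine but the tactic: a limited, application-specific conversion can be far cheaper than the fully general one.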
The best way to avoid the Double.equals( )
method
is to replace the hash table with another implementation that stores
double primitive data types directly rather than requiring the
double
s to be wrapped in a
Double
object. This allows the == operator to make
the comparison in the put( )
method, thus
completely avoiding the Double.equals( )
call:
this is another standard tuning tactic, where a data structure is
replaced with a more appropriate and faster one for the
task.
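A minimal sketch of such a replacement structure follows (my own illustration; no such class exists in the JDK). It stores the raw bit patterns of the doubles in an open-addressed table and compares them with ==, so no Double wrapper is created and Double.equals( ) is never called:

```java
public class DoubleSet {
    // Open-addressed hash set of double primitives. Keys are compared
    // by their raw bit patterns with ==, never via Double.equals( ).
    private final long[] bits;
    private final boolean[] used;

    public DoubleSet(int capacity) {
        bits = new long[capacity * 2];   // keep load factor <= 0.5
        used = new boolean[capacity * 2];
    }

    public boolean add(double d) {
        long key = Double.doubleToLongBits(d);  // canonical bit pattern
        int i = indexOf(key);
        if (used[i])
            return false;    // already present
        used[i] = true;
        bits[i] = key;
        return true;
    }

    public boolean contains(double d) {
        return used[indexOf(Double.doubleToLongBits(d))];
    }

    // Linear probing; stops at the key's slot or the first free slot.
    private int indexOf(long key) {
        int i = (int) ((key ^ (key >>> 32)) & 0x7fffffff) % bits.length;
        while (used[i] && bits[i] != key)
            i = (i + 1) % bits.length;
        return i;
    }

    public static void main(String[] args) {
        DoubleSet set = new DoubleSet(16);
        set.add(567.89023D);
        set.add(-0.000456D);
        System.out.println(set.contains(567.89023D));  // true
        System.out.println(set.contains(1.234D));      // false
    }
}
```

Because Double.doubleToLongBits( ) canonicalizes NaN values, == on the stored bit patterns gives the same membership semantics that Double.equals( ) would. (This sketch omits resizing, so it must be sized for its data.)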
The 1.1 profiling output is quite different and much less like a standard profiler’s output. Running the 1.1 profiler with this program (details of this output are given in Section 2.3.4) gives:
count callee caller time
21 java/lang/System.gc( )V java/lang/FloatingDecimal.dtoa(IJI)V 760
8 java/lang/System.gc( )V java/lang/Double.equals(Ljava/lang/Object;)Z 295
2 java/lang/Double.doubleToLongBits(D)J java/lang/Double.equals(Ljava/lang/Object;)Z 0
I have shown only the top four lines from the output. This output
actually identifies both the FloatingDecimal.dtoa()
and the Double.equals( )
methods as
taking the vast majority of the time, and the percentages (given by
the reported times) are listed as around 70% and 25% of the total
program time for the two methods, respectively. Since the
“callee” for these methods is listed as
System.gc( )
, this also identifies that the methods
are significantly involved in memory creation and suggests that the
next tuning step might be to analyze the object-creation output for
this program.
The
default profile output gained from executing with
-Xrunhprof
in Java 2 is not useful for method profiling. The default output
generates object-creation statistics from the heap as the dump
(output) occurs. By default, the dump occurs when the application
terminates; you can modify the dump time by typing Ctrl-\ on Solaris
and other Unix systems, or Ctrl-Break on Win32.
To get a useful method profile, you need to
modify the profiler options to specify method profiling. A typical
call to achieve this is:
java -Xrunhprof:cpu=samples,thread=y <classname>
(Note that in a Windows command-line prompt, you need to surround the option with double quotes because the equals sign is considered a meta character.)
Note that -Xrunhprof
has an “h” in it. There seems to be an undocumented feature of the VM in which the option -Xrun<something>
makes the VM try to load a shared library called <something>
, e.g., using -Xrunprof
results in the VM trying to load a shared library called “prof.” This can be quite confusing if you are not expecting it. In fact, -Xrunhprof
loads the “hprof” shared library.
The profiling option in JDK 1.2/1.3 can be pretty flaky. Several of the options can cause the runtime to crash (core dump). The output is a large file, since huge amounts of trace data are written rather than summarized. Since the profile option is essentially a Sun engineering tool, it has had limited resources applied to it, especially as Sun has a separate (not free) profile tool that Sun engineers would normally use. Another tool that Sun provides to analyze the output of the profiler is the Heap Analysis Tool (search http://www.java.sun.com for "HAT"). But this tool analyzes only the object-creation statistics output gained with the default profile output, and so is not that useful for method profiling (see Section 2.4 for slightly more about this tool).
Nevertheless, I expect the free profiling option to stabilize and be
more useful in future versions. The output when run with the options
already listed (cpu=samples, thread=y) results in fairly usable
information. This profiling mode operates by periodically sampling
the stack. Each unique stack trace provides a TRACE entry in the
second section of the file, describing the method calls on the stack
for that trace. Multiple identical samples are not listed; instead,
the number of their "hits" is summarized in the third
section of the file. The profile output file in this mode has three
sections:
A standard header section describing possible monitored entries in the file. For example:
WARNING!  This file format is under development, and is subject to
change without notice.
This file contains the following types of records:

THREAD START
THREAD END      mark the lifetime of Java threads

TRACE           represents a Java stack trace.  Each trace consists
                of a series of stack frames.  Other records refer to
                TRACEs to identify (1) where object allocations have
                taken place, (2) the frames in which GC roots were
                found, and (3) frequently executed methods.
Individual entries describing monitored events, i.e., threads starting and terminating, but mainly sampled stack traces. For example:
THREAD START (obj=8c2640, id = 6, name="Thread-0", group="main")
THREAD END (id = 6)
TRACE 1:
 <empty>
TRACE 964:
 java/io/ObjectInputStream.readObject(ObjectInputStream.java:Compiled method)
 java/io/ObjectInputStream.inputObject(ObjectInputStream.java:Compiled method)
 java/io/ObjectInputStream.readObject(ObjectInputStream.java:Compiled method)
 java/io/ObjectInputStream.inputArray(ObjectInputStream.java:Compiled method)
TRACE 1074:
 java/io/BufferedInputStream.fill(BufferedInputStream.java:Compiled method)
 java/io/BufferedInputStream.read1(BufferedInputStream.java:Compiled method)
 java/io/BufferedInputStream.read(BufferedInputStream.java:Compiled method)
 java/io/ObjectInputStream.read(ObjectInputStream.java:Compiled method)
A summary table of methods ranked by the number of times the unique stack trace for that method appears. For example:
CPU SAMPLES BEGIN (total = 512371) Thu Aug 26 18:37:08 1999
rank   self   accum   count  trace  method
   1  16.09%  16.09%  82426   1121  java/io/FileInputStream.read
   2   6.62%  22.71%  33926    881  java/io/ObjectInputStream.allocateNewObject
   3   5.11%  27.82%  26185    918  java/io/ObjectInputStream.inputClassFields
   4   4.42%  32.24%  22671    887  java/io/ObjectInputStream.inputObject
   5   3.20%  35.44%  16392    922  java/lang/reflect/Field.set
Section 3 is the place to start when analyzing this profile output.
It consists of a table with six fields, headed rank, self, accum,
count, trace, and method, as shown.
These fields are used as follows:
rank
This column simply counts the entries in the table, starting with 1 at the top, and incrementing by 1 for each entry.
self
The self
field is usually interpreted as a
percentage of the total running time spent in this method. More
accurately, this field reports the percentage of samples that have
the stack given by the trace
field. Here’s a
one-line example:
rank   self   accum   count  trace  method
   1  11.55%  11.55%  18382    545  java/lang/FloatingDecimal.dtoa
This example shows that stack trace 545 occurred in 18,382 of the sampled stack traces, and this is 11.55% of the total number of stack trace samples made. It indicates that this method was probably executing for about 11.55% of the application execution time, because the samples are at regular intervals. You can identify the precise trace from the second section of the profile output by searching for the trace with identifier 545. For the previous example, this trace was:
TRACE 545: (thread=1)
 java/lang/FloatingDecimal.dtoa(FloatingDecimal.java:Compiled method)
 java/lang/FloatingDecimal.<init>(FloatingDecimal.java:Compiled method)
 java/lang/Double.toString(Double.java:Compiled method)
 java/lang/String.valueOf(String.java:Compiled method)
This TRACE entry clearly identifies the exact method and its caller.
Note that the stack is reported to a depth of four methods. This is
the default depth: the depth can be changed using the depth parameter
to the -Xrunhprof option, e.g.,
-Xrunhprof:depth=6,cpu=samples,...
accum
This field is a running additive total of all the
self
field percentages as you go down the table:
for the Section 3 example shown previously, the third line lists
27.82% for the accum
field, indicating that the
sum total of the first three lines of the self
field is 27.82%.
count
This field indicates how many times the unique stack trace that gave rise to this entry was sampled while the program ran.
trace
This field shows the unique trace identifier from the second section
of profile output that generated this entry. The trace is recorded
only once in the second section no matter how many times it is
sampled; the number of times that this trace has been sampled is
listed in the count
field.
method
This field shows the method name from the top line of the stack trace
referred to from the trace
field, i.e., the method
that was running when the stack was sampled.
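The arithmetic relating the count, self, and accum fields can be checked directly; the numbers below are the counts from the first summary table shown earlier (total = 15813 samples), and the helper class is my own:

```java
public class SampleMath {
    // self% is simply the trace's sample count over the total samples.
    public static double selfPercent(int count, int totalSamples) {
        return 100.0 * count / totalSamples;
    }

    public static void main(String[] args) {
        // 8664 of 15813 samples -> 54.79% (rank 1, FloatingDecimal.dtoa)
        System.out.printf("%.2f%%%n", selfPercent(8664, 15813));
        // 1846 of 15813 samples -> 11.67% (rank 2, Double.equals)
        System.out.printf("%.2f%%%n", selfPercent(1846, 15813));
        // accum for rank 2 is the running total of self%: 66.46%
        System.out.printf("%.2f%%%n", selfPercent(8664 + 1846, 15813));
    }
}
```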
This summary table lists only the method name and not its argument
types. Therefore, it is frequently necessary to refer to the stack
itself to determine the exact method, if the method is an overloaded
method with several possible argument types. (The stack is given by
the trace identifier in the trace
field, which in
turn references the trace from the second section of the profile
output.) If a method is called in different ways, it may also give
rise to different stack traces. Sometimes the same method call can be
listed in different stack traces due to lost information. Each of
these different stack traces results in a different entry in the
third section of the profiler’s output, even though the
method
field is the same. For example, it is
perfectly possible to see several lines with the same
method
field, as in the following table segment:
rank   self   accum  count  trace  method
  95   1.1%  51.55%    110    699  java/lang/StringBuffer.append
 110   1.0%  67.35%    100    711  java/lang/StringBuffer.append
 128   1.0%  85.35%     99    332  java/lang/StringBuffer.append
When traces 699, 711, and 332 are analyzed, one trace might be
StringBuffer.append(boolean)
, while the other two
traces could both be StringBuffer.append(int)
, but
called from two different methods (and so giving rise to two
different stack traces and consequently two different lines in the
summary example). Note that the trace does not identify actual method
signatures, only method names. Line numbers are given if the class
was compiled so that line numbers remain. This ambiguity can be a
nuisance at times.
The profiler in this mode (cpu=samples
) is useful
enough to suffice when you have no better alternative. It does have
an effect on real measured times, slowing down operations by variable
amounts even within one application run. But it normally indicates
major bottlenecks, although sometimes a little extra work is
necessary to sort out multiple identical method-name references.
Using the alternative cpu=times
mode, the profile
output gives a different view of application execution. In this mode,
the method times are measured from method entry to method exit,
including the time spent in all other calls the method makes. This
profile of an application gives a tree-like view of where the
application is spending its time. Some developers are more
comfortable with this mode for profiling the application, but I find
that it does not directly identify bottlenecks in the code.
HotSpot does not support the standard Java 2 profiler detailed in the
previous section; it supports a separate profiler using the
-Xprof
option. JDK 1.3
supports the HotSpot profiler as well as the standard Java 2 profiler
detailed in the previous section. The HotSpot profiler has no further
options available to modify its behavior; it works by sampling the
stack every 10 milliseconds.
The output, printed to standard out, consists of a number of sections. Each section lists entries in order of the number of ticks counted while the method was executed. The various sections include methods executing in interpreted and compiled modes, and VM runtime costs as well:
One-line header, for example:
Flat profile of 7.55 secs (736 total ticks): main
A list of methods sampled while running in interpreted mode. The methods are listed in order of the total number of ticks counted while the method was at the top of the stack. For example:
     Interpreted + native   Method
 3.7%    23  +     4    tuning.profile.ProfileTest.main
 2.4%     4  +    14    java.lang.FloatingDecimal.dtoa
 1.4%     3  +     7    java.lang.FDBigInt.<init>
A list of methods sampled while running in compiled mode. The methods are listed in order of the total number of ticks counted while the method was at the top of the stack. For example:
     Compiled + native   Method
13.5%    99  +     0    java.lang.FDBigInt.quoRemIteration
 9.8%    71  +     1    java.lang.FDBigInt.mult
 9.1%    67  +     0    java.lang.FDBigInt.add
A list of external (non-Java) method stubs, defined using the
native
keyword. Listed in order of the total
number of ticks counted while the method was at the top of the stack.
For example:
     Stub + native   Method
 2.6%    11  +     8    java.lang.Double.doubleToLongBits
 0.7%     2  +     3    java.lang.StrictMath.floor
 0.5%     3  +     1    java.lang.Double.longBitsToDouble
A list of internal VM function calls. Listed in order of the total number of ticks counted while the method was at the top of the stack. Not tuneable. For example:
     Runtime stub + native   Method
 0.1%     1  +     0    interpreter_entries
 0.1%     1  +     0    Total runtime stubs
Other miscellaneous entries not included in the previous sections:
Thread-local ticks:
  1.4%    10    classloader
  0.1%     1    Interpreter
 11.7%    86    Unknown code
A global summary of ticks recorded. This includes ticks from the garbage collector, thread-locking overheads, and other miscellaneous entries:
Global summary of 7.57 seconds:
 100.0%   754    Received ticks
   1.9%    14    Received GC ticks
   0.3%     2    Other VM operations
The entries at the top of Section 3 are the methods that probably need tuning. Any method listed near the top of Section 2 should have been targeted by the HotSpot optimizer and may be listed lower down in Section 3. Such methods may still need to be optimized, but it is more likely that the methods at the top of Section 3 are what need optimizing. The ticks for the two sections are the same, so you can easily compare the time taken up by the top methods in the different sections and decide which to target.
The JDK 1.1.x method-profiling output, obtained by running with the
-prof
option, is quite different from the normal 1.2 output. This output
format is supported in Java 2, using the cpu=old
variation of the -Xrunhprof
option. This output
file consists of four sections:
The method profile table showing cumulative times spent in each
method executed. The table is sorted on the first field,
count; for example:
count callee caller time
29 java/lang/System.gc( )V java/io/FileInputStream.read([B)I 10263
1 java/io/FileOutputStream.writeBytes([BII)V java/io/FileOutputStream.write([BII)V 0
One line describing high-water gross memory usage. For example:
handles_used: 1174, handles_free: 339046, heap-used: 113960, heap-free: 21794720
The line reports the number of handles and the number of bytes used by the heap memory storage over the application’s lifetime. A handle is an object reference. The number of handles used is the maximum number of objects that existed at any one time in the application (handles are recycled by the garbage collector, so over its lifetime the application could have used many more objects than are listed). The heap measurements are in bytes.
Reports the number of primitive data type arrays left at the end of the process, just before process termination. For example:
sig    count    bytes    indx
[C       174    19060       5
[B         5    19200       8
This section has four fields. The first field is the primitive data
type (array dimensions and data type given by letter codes listed
shortly), the second field is the number of arrays, and the third is
the total number of bytes used by all the arrays. This example shows
174 char arrays taking a combined space of 19,060 bytes, and 5 byte
arrays taking a combined space of 19,200 bytes.
The reported data does not include any arrays that may have been
garbage collected before the end of the process. For this reason, the
section is of limited use. You could use the
-noasyncgc
option to try to eliminate garbage
collection (if you have enough memory; you may also need
-mx
with a large number to boost the maximum
memory available). If you do, also use -verbosegc
so that if garbage collection is forced, you at least know
that garbage collection has occurred and can get the basic number of
objects and bytes reclaimed.
The fourth section of the profile output is the per-object memory dump. Again, this includes only objects left at the end of the process just before termination, not objects that may have been garbage-collected before the end of the process. For example:
*** tab[267] p=4bba378 cb=1873248 cnt=219 ac=3 al=1103
    Ljava/util/HashtableEntry;  219  3504
    [Ljava/util/HashtableEntry;  3  4412
This dump is a snapshot of the actual object table. The fields in the first line of an entry are:
***tab[<index>]
The entry location as listed in the object table. The index is of no use for performance tuning.

p=<hex value>
Internal memory locations for the instance and class; of no use for performance tuning.

cb=<hex value>
Internal memory locations for the instance and class; of no use for performance tuning.

cnt=<integer>
The number of instances of the class reported on the next line.

ac=<integer>
The number of instances of arrays of the class reported on the next line.

al=<integer>
The total number of array elements for all the arrays counted in the previous (ac) field.
This first line of the example is followed by lines consisting of
three fields: first, the class name prefixed by the array dimension
if the line refers to the array data; next, the number of instances
of that class (or array class); and last, the total amount of space
used by all the instances, in bytes. So the example reports that
there are 219 HashtableEntry
instances taking a
total of 3504 bytes between them,[10] and three HashtableEntry
arrays having
1103 array indexes between them (which amounts to 4412 bytes between
them, since each entry is a 4-byte object handle).
The last two sections, Sections 3 and 4, give snapshots of the object table memory and can be used in an interesting way: to run a garbage collection just before termination of your application. That leaves in the object table all the objects that are rooted[11] by the system and by your application (from static variables). If this snapshot shows significantly more objects than you expect, you may be referencing more objects than you realized.
The first section of the profile output is the most useful,
consisting of multiple lines, each of which specifies a method and
its caller, together with the total cumulative time spent in that
method and the total number of times it was called from that caller.
The first line of this section specifies the four fields in the
profile table: count, callee, caller, and time. They are detailed
here:
count
The total number of times the callee
method was
called from the caller
method, accumulating
multiple executions of the caller
method. For
example, if foo1( )
calls foo2( )
10 times every time foo1( )
is executed, and
foo1( )
was itself called three times during the
execution of the program, the count
field should
hold the value 30 for the callee-caller pair foo2( )-foo1()
. The line in the table should look like this:
30 x/y/Z.foo2( )V x/y/Z.foo1( )V 1263
(assuming the foo*( )
methods are in class
x.y.Z
and they both have a void return). The
actual reported numbers may be less than the true number of calls:
the profiler can miss calls.
callee
The method that was called count
times in total
from the caller
method. The callee can be listed
in other entries as the callee
method for
different caller
methods.
caller
The method that called the callee
method
count
times in total.
time
The cumulative time (in milliseconds) spent in the
callee
method, including time when the
callee
method was calling other methods (i.e.,
when the callee
method was in the stack but not at
the top, and so was not the currently executing method).
If each of the count
calls in one line took
exactly the same amount of time, then one call from caller to callee
took time
divided by count
milliseconds.
This first section is normally sorted into count order. However, for this profiler, the time spent in methods tends to be more useful. Because the times in the time field include the total time that the callee method was anywhere on the stack, interpreting the output of complex programs can be difficult without processing the table to subtract subcall times. This format is different from the 1.2 output with cpu=samples specified, and is more equivalent to a 1.2 profile with cpu=times specified.
The lines in the profile output are unique for each callee-caller pair, but any one callee method and any one caller method can (and normally do) appear in multiple lines. This is because any particular method can call many other methods, so it registers as the caller in multiple callee-caller pairs; and any particular method can itself be called by many other methods, so it also registers as the callee in multiple callee-caller pairs.
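Because one callee is spread across many lines, its total call count has to be summed over all of its callee-caller pairs. A minimal sketch of that aggregation (the class name is mine; it assumes each data line is whitespace-separated as count callee caller time, and that signatures contain no internal spaces, i.e., foo2()V rather than the typographic "foo2( )V"):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ProfAggregator {
    // Sums the count field per callee across all of its callee-caller pairs.
    static Map<String, Long> totalCallsPerCallee(String[] lines) {
        Map<String, Long> totals = new LinkedHashMap<>();
        for (String line : lines) {
            String[] f = line.trim().split("\\s+");
            if (f.length != 4) continue;          // skip the header or malformed lines
            totals.merge(f[1], Long.parseLong(f[0]), Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        String[] sample = {
            "30 x/y/Z.foo2()V x/y/Z.foo1()V 1263",
            "12 x/y/Z.foo2()V x/y/Z.bar()V 501",  // same callee, different caller
        };
        System.out.println(totalCallsPerCallee(sample)); // prints {x/y/Z.foo2()V=42}
    }
}
```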
The methods are written out using the internal Java syntax listed in Table 2-1.
Table 2-1. Internal Java Syntax for -prof Output Format

Internal Symbol | Java Meaning
---|---
/ | Replaces the . character in package names (e.g., java/lang/String stands for java.lang.String)
B | byte
C | char
D | double
F | float
I | int
J | long
S | short
V | void
Z | boolean
[ | One array dimension (e.g., [[B stands for a two-dimensional array of bytes, such as new byte[3][4])
L<classname>; | A class (e.g., Ljava/lang/String; stands for java.lang.String)
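The descriptor syntax in Table 2-1 can be decoded mechanically. A sketch (the class and method names are mine, and only single type descriptors are handled, not full method signatures):

```java
public class SigDecoder {
    // Decodes one internal type descriptor into Java source syntax.
    static String decode(String sig) {
        int dims = 0;
        while (sig.startsWith("[")) {            // each [ adds one array dimension
            dims++;
            sig = sig.substring(1);
        }
        String base;
        switch (sig.charAt(0)) {
            case 'B': base = "byte";    break;
            case 'C': base = "char";    break;
            case 'D': base = "double";  break;
            case 'F': base = "float";   break;
            case 'I': base = "int";     break;
            case 'J': base = "long";    break;
            case 'S': base = "short";   break;
            case 'V': base = "void";    break;
            case 'Z': base = "boolean"; break;
            case 'L':                            // class type: L<classname>; with / for .
                base = sig.substring(1, sig.length() - 1).replace('/', '.');
                break;
            default:
                throw new IllegalArgumentException("unknown descriptor: " + sig);
        }
        StringBuilder out = new StringBuilder(base);
        for (int i = 0; i < dims; i++) out.append("[]");
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("[[B"));                // prints byte[][]
        System.out.println(decode("Ljava/lang/String;")); // prints java.lang.String
    }
}
```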
There are free viewers, with source code available, for files in this format:
Vladimir Bulatov’s HyperProf (search for HyperProf on the Web)
Greg White’s ProfileViewer (search for ProfileViewer on the Web)
My own viewer (see ProfileStack: A Profile Viewer for Java 1.1)
The biggest drawback to the 1.1 profile output is that threads are not indicated at all. This means that it is possible to get time values for method calls that are longer than the total time spent in running the application, since all the call times from multiple threads are added together. It also means that you cannot determine from which thread a particular method call was made. Nevertheless, after re-sorting the section on the time field, rather than the count field, the profile data is useful enough to suffice as a method profiler when you have no better alternative.
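That re-sort on the time field is a one-liner once the lines are parsed. A sketch (the class name is mine; it assumes whitespace-separated count callee caller time lines, with the time field fourth):

```java
import java.util.Arrays;
import java.util.Comparator;

public class ProfSorter {
    // Re-sorts profile data lines on the time field (fourth column), descending,
    // instead of the count order the profiler emits.
    static String[] sortByTime(String[] lines) {
        String[] out = lines.clone();
        Arrays.sort(out, Comparator.comparingLong(
                (String line) -> Long.parseLong(line.trim().split("\\s+")[3])).reversed());
        return out;
    }

    public static void main(String[] args) {
        String[] sample = {
            "500 x/y/Z.tick()V x/y/Z.loop()V 80",   // high count, low time
            "30 x/y/Z.foo2()V x/y/Z.foo1()V 1263",
        };
        for (String line : sortByTime(sample)) System.out.println(line);
        // the 1263 ms line now sorts first despite its lower count
    }
}
```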
One problem I’ve encountered is the limited size of the list of methods that can be held by the internal profiler. Technically, this limitation is 10,001 entries in the profile table, and there is presumably one entry per method. There are four methods that help you avoid the limitation by profiling only a small section of your code:
sun.misc.VM.suspendJavaMonitor()
sun.misc.VM.resumeJavaMonitor()
sun.misc.VM.resetJavaMonitor()
sun.misc.VM.writeJavaMonitorReport()
These methods also allow you some control over which parts of your application are profiled and when to dump the results.
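These are internal 1.1-only methods, so code that names them directly will not compile on later JDKs. One way to bracket a section of code with them while remaining portable is reflection; a sketch (the class and method names below are mine, and on any JVM without these methods the calls simply report failure):

```java
import java.lang.reflect.Method;

public class ProfControl {
    // Invokes one of the sun.misc.VM monitor-control methods by reflection so the
    // code still compiles and runs on JVMs where these internal 1.1 methods no
    // longer exist. Returns true only if the call actually succeeded.
    static boolean invokeMonitorControl(String methodName) {
        try {
            Class<?> vm = Class.forName("sun.misc.VM");
            Method m = vm.getMethod(methodName);
            m.invoke(null);
            return true;
        } catch (Exception e) {
            return false; // class or method absent: not running the 1.1 profiler
        }
    }

    public static void main(String[] args) {
        // Profile only the section you care about, to stay under the table limit.
        boolean on = invokeMonitorControl("resumeJavaMonitor");
        // ... code to profile would run here ...
        invokeMonitorControl("suspendJavaMonitor");
        System.out.println(on ? "profiling controlled" : "1.1 profiler not present");
    }
}
```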
[8] A variety of profiling metrics, including the way different metrics can be used, are reported in the paper “A unifying approach to performance analysis in the Java environment,” by Alexander, Berry, Levine, and Urquhart, in the IBM Systems Journal, Vol. 39, No. 1. This paper can be found at http://www.research.ibm.com/journal/sj/391/alexander.html. Specifically, see Table 4 in this paper.
[9] The samples that count towards a particular method’s execution time are those where the method itself is executing at the time of the sample. If method foo() was calling another method when the sample was taken, that other method would be at the top of the stack instead of foo(). So you do not need to worry about the distinction between foo()’s execution time and the time spent executing foo()’s callees. Only the method at the top of the stack is tallied.
[10] A HashtableEntry has one int and three object handle instance variables, each of which takes 4 bytes, so each HashtableEntry is 16 bytes.
[11] Objects rooted by the system are objects the JVM runtime keeps alive as part of its runtime system. Rooted objects are generally objects that cannot be garbage collected because they are referenced in some way from other objects that cannot be garbage collected. The roots of these non-garbage-collectable objects are normally objects referenced from the stack, objects referenced from static variables of classes, and special objects the runtime system ensures are kept alive.