© Erik Ostermueller 2017
Erik Ostermueller, Troubleshooting Java Performance, https://doi.org/10.1007/978-1-4842-2979-8_12

12. Heap, the “h” in P.A.t.h.

Erik Ostermueller, Little Rock, Arkansas, USA
This is the last of the four P.A.t.h. Checklist chapters. Remember that until the root of the problem is found, you need to check all four parts of the Checklist. This chapter starts out with a very basic Garbage Collection health check—it should take just a few minutes to capture the data and make a healthy/not-healthy assessment.
But by the end of the chapter, I will have you doing more involved tasks, like rummaging through heap dumps to find the root cause of memory leaks that come with this book’s sample applications.
The objectives for this chapter are:
  • Learn how to quickly assess GC Health in any JVM. No extra configuration is required, no need to restart the JVM.
  • Understand the high-level approach to troubleshooting performance issues with the heap.
  • Fix the most common GC inefficiencies by first understanding which heap space is causing your problem, the old space or the new space.
  • Pinpoint the cause of a memory leak by discovering the names of classes with high memory consumption, and by understanding how consumption for individual classes trends over time.
  • Take a first look at using a heap analysis tool to discover the root cause of a memory leak.
The Quick GC Health Check in this chapter boils down GC performance into two easy-to-capture metrics that are available from tools right there in the JDK. There are no extra JVM parameters to configure, no JVM restart is required. These are plug-it-in-now metrics, available right when you want them. I’ll provide my own thresholds on what numbers translate into “healthy” or not; you can adjust them based on your own experience.
Along with this health check, this chapter provides a review of a number of other GC/heap analysis techniques. Ultimately I will roll them up into a nice little set of steps that will guide you through fixing the majority of all the GC/heap performance problems you encounter.
One of my goals for this chapter is to show that a great amount of GC/heap tuning can be done without knowing much at all about GC algorithms.

Quick GC Health Check

Understanding how much improvement can be made on a particular slow-looking GC metric, or any slow-looking metric for that matter, is a bit challenging. So when it comes to assessing GC health, I make it easy on myself with a simple RED-YELLOW-GREEN approach.
It starts with a simple “GC overhead” metric, which is the amount of time spent garbage collecting measured over an elapsed time. So if 150ms (0.150 seconds) of the last 1000ms have been spent in GC, then that’s 15% GC overhead. 20ms (0.02 seconds) out of 1000ms is 2% overhead.
Here is how I use the GC overhead metric to triage performance problems with the heap:
  • RED: If sustained GC overhead is greater than 8%, then high priority should be given to improving GC performance.
  • YELLOW: If sustained GC overhead is greater than 2% (but less than 8%), then start researching improvements, but there is no rush to implement a GC change. Moving from GREEN to YELLOW will often be an early warning sign of a memory leak.
  • GREEN: When sustained GC overhead is less than 2%, GC performance is considered healthy and no GC tuning is required. However, systems that require faster response times (perhaps less than 25ms) will still see benefit if GC is tuned further, perhaps down to 0.5%.
The phrase “sustained GC overhead” roughly means sustained for 5 minutes or more. If you don’t like the percentages I’ve assigned to RED-YELLOW-GREEN, I promise I won’t be offended if you alter them to accommodate your experience. The point is that development and operations teams need a consistent approach for determining when corrective action should be taken.
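To make the thresholds concrete, here is a minimal sketch of the triage logic in Java. The class and method names are my own invention (not part of any tool mentioned in this book), and the thresholds are simply the ones suggested above.
public class GcTriage {
    enum Health { RED, YELLOW, GREEN }
    // gcSeconds: GC time observed during the interval; intervalSeconds: elapsed wall-clock time
    static Health assess(double gcSeconds, double intervalSeconds) {
        double overheadPct = 100.0 * gcSeconds / intervalSeconds;
        if (overheadPct > 8.0) return Health.RED;     // sustained: improve GC with high priority
        if (overheadPct > 2.0) return Health.YELLOW;  // start researching, but no rush
        return Health.GREEN;                          // healthy; no GC tuning required
    }
    public static void main(String[] args) {
        System.out.println(assess(0.150, 1.0));  // 150ms of GC in 1 second = 15% = RED
        System.out.println(assess(0.015, 1.0));  // 15ms of GC in 1 second = 1.5% = GREEN
    }
}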
Even if aggregate overhead is low, occasional spikes can be worrisome. Consider using a similar grading scale, like this:
  • RED: More than five 3-second pauses an hour
  • YELLOW: Between two and five 3-second pauses an hour
  • GREEN: One or fewer 3-second pauses an hour
Keep in mind that the G1 garbage collector algorithm was designed to minimize these spikes.
We will discuss the tools you can use to capture this GC overhead metric, but first let’s look at the traditional approach/tooling to capturing GC metrics.

The Traditional Approach to GC Metrics

To use the latest and greatest GC performance analysis programs, you need to configure your JVM to capture a verbose GC log file and then analyze that file. First, add a few parameters to your JVM startup:
-Xloggc:gc.log
-verbose:gc
Then restart the JVM and reproduce the problem. After that, you transfer the gc.log file back to your desktop, and download and install an analysis program that graphs the GC data.
That is a lot of work, but often, additional JVM settings like the following are required to get the full picture of the problem:
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-XX:+PrintGCTimeStamps
-XX:+PrintTenuringDistribution
-XX:+PrintClassHistogram
-XX:+PrintGCApplicationStoppedTime
-XX:+PrintPromotionFailure
-verbose:sizes
I’m often unsure whether I know the current, full list of parameters to fully vet GC performance or whether the full list even exists. Furthermore, for operational harmony and to avoid filling the hard disk with great performance data, these parameters are essential:
-XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles=5
-XX:GCLogFileSize=10m
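For reference, a full Java 8 HotSpot startup that combines the logging and rotation flags might look something like the following. The jar name is just a placeholder, and flag availability varies by JVM vendor and version.
java -Xmx1g -Xloggc:gc.log -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=10m -jar myApp.jar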
It takes an unusually patient snake charmer to collect verbose GC metrics, with the many and varied JVM parameters and the requisite JVM restarts. Making sure all the right parameters stay in place across all environments seems difficult at best, even though real performance analysis stalls until the JVM snake charmers have done their work. When performance analysis stalls during a crisis, angst and frustration rise.
To the good people who run the OpenJDK project, I think it is reasonable to ask that all out-of-the-box JVMs capture these low-overhead metrics by default, with parameters to tweak or turn them off as necessary. How about, as a default, a modest 10MB max footprint for the data files?
New open source and freely available graphing and analysis tools show up every year that provide better understanding of verbose GC data. This is a nice, open and growing tool ecosystem, even if snake charming is required to capture data. It would be nice if the graphing tools displayed a live view of the data, instead of the current batched approach, processing from a file.
Live GC data, graphed, can be retrieved from a different toolset: the Visual GC and gcViewer plug-ins for JVisualVM, which comes with the JDK. JConsole provides metrics, too. But these facilities lack a number of important metrics, they are less mature, and enhancements are infrequent.
Here are four main facilities for capturing GC performance data:
  • Verbose GC
  • JAVA_HOME/bin/jvisualvm and JAVA_HOME/bin/jconsole
  • JAVA_HOME/bin/jstat
  • Third-party tools like Application Performance Management (APM) tools, Dynatrace, AppDynamics, and so on
Ultimately, this third option, jstat, will be the tool we use to capture metrics for the RED-YELLOW-GREEN approach. But before we jump into that, let’s quickly review a few high-level things in the next section about the GC.

Garbage Collector Overview

Of all the many JVM heap parameters, the -Xmx parameter plays the main role in heap performance. For example, -Xmx2048m defines a 2GB maximum for the chunk of RAM, called the heap, that is at your Java program’s disposal. Note that unlike many GC parameters, this one uses no equal sign (=)—it’s a bit of an oddball that way. Never set this value higher than your available RAM. OutOfMemory errors show up when your program tries to use more heap than your -Xmx max allows. Additional RAM outside of the heap can also be used, as it is by implementers of JSR-107 caches; however, that is out of scope for this small book.
The JVM’s garbage collector (GC) is really just a recycler. Once your Java program is finished using a variable, the GC discards it by making the underlying memory available for subsequent variable declarations. It takes work for the GC to identify variables (and memory) still in-use by your application to ensure that they are not discarded. When I say “work,” I mean it’s going to eat CPU and it will cause some processing slowdowns.
GC algorithms, which are heavily configurable, rely on various carefully sized compartments of memory to pull off this recycling process efficiently. When certain of these compartments aren’t sized right, end-user slowdowns ensue.
There are a number of different GC algorithms that can be used, and JVM start-up parameters decide which one is in use for a given Java process. Even though descriptions for the GC algorithms are available everywhere, somehow that doesn’t translate into being able to quickly assess whether GC performance is healthy or not, and whether an inefficiency lies in the “old” or the “new” generation, which are two of the main heap compartments that I mentioned earlier. All modern GC algorithms split the heap into old and new generations and have different approaches to managing each one.
This document from Oracle provides detail on all the GC algorithms:

Plug-It-in-Now GC Metrics with jstat

I have mentioned elsewhere that if metrics aren’t easy to capture, then performance problems often go unfixed. That’s why I rely on JAVA_HOME/bin/jstat -gc to capture metrics from any running JVM—no need to add JVM GC parameters, no need to restart the JVM and reproduce the problem. Just pass in the PID of the Java process and review the data. The tool summarizes all its metrics for the time interval you give it. I told jstat to use an interval of 1 second (1s), like this:
# jstat -gc <myPid> 1s
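If you don't know the PID of your Java process, JAVA_HOME/bin/jps will list the process IDs of all JVMs running on the machine:
# jps -l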
Yes, this data is more convenient to retrieve, but there is nothing attractive about the presentation in Listing 12-1, and very few tools create graphs from this format. It looks like a lot of data already, but there is more: I trimmed five columns from the left side of this listing.
YGCT   FGCT
-----  -----
1.719  1.028
1.721  1.028
1.724  1.028
1.728  1.028
1.734  1.028
1.738  1.028
1.744  1.028
1.748  1.028
Listing 12-1.
Data from JAVA_HOME/bin/jstat, one new line of data every 1 second
Ultimately, we need to somehow get the GC overhead metric out of this data for the RED-YELLOW-GREEN scale discussed earlier, and we need it for both the young gen and the old gen. Fortunately, there is one column for each: YGCT (young generation GC time) and FGCT (full GC time, which covers the old generation), and they are documented here:
But there is one last headache to deal with: each row in these two columns shows GC time accumulated since the JVM started. Instead, we need the GC time that happened in each 1-second interval, which will allow us to easily calculate an average.
I created a four-line awk script to get around these headaches. The script selects only the two columns we care about (the ones highlighted in Listing 12-1), and it also calculates the differences between each row.
The readme.txt at this link has all the details on the gctime script:
As a point of interest, let’s say that all you have is stdout from jstat with a 1-second interval, as you might get from an operations team. You can get the right gctime output by post-processing the stdout file with the (almost trivial) awk script at the link.
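If awk is not your thing, the same post-processing can be sketched in a few lines of Java. This is my own rough equivalent of the gctime idea, not the actual script; it locates the YGCT and FGCT columns by name from the jstat header line and prints the per-interval deltas.
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.Arrays;
import java.util.List;

public class GcDeltas {
    public static void main(String[] args) throws Exception {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        int ygctCol = -1, fgctCol = -1;
        double prevY = -1, prevF = -1;
        String line;
        System.out.println("dYGCT  dFGCT");
        while ((line = in.readLine()) != null) {
            List<String> cols = Arrays.asList(line.trim().split("\\s+"));
            if (cols.contains("YGCT")) {          // header line: locate the columns by name
                ygctCol = cols.indexOf("YGCT");
                fgctCol = cols.indexOf("FGCT");
                continue;
            }
            if (ygctCol < 0 || cols.size() <= fgctCol) continue;
            double y = Double.parseDouble(cols.get(ygctCol));
            double f = Double.parseDouble(cols.get(fgctCol));
            if (prevY >= 0) {                     // cumulative seconds -> per-interval delta
                System.out.printf("%.3f  %.3f%n", y - prevY, f - prevF);
            }
            prevY = y;
            prevF = f;
        }
    }
}
You would pipe the jstat output into it, for example: jstat -gc <myPid> 1s | java GcDeltas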
Let’s do some tuning using gctime. Both of the following examples started out with the GC parameters in Listing 12-2, which means that the heap is split right down the center: 512MB old gen, 512MB young gen. For brevity, I have not shown the <argument> XML tags that surround each parameter in the pom.xml file. Results for this configuration are in Listing 12-3, following.
-Xmx1g -XX:NewSize=512m -XX:MaxNewSize=512m -XX:+UseConcMarkSweepGC -XX:ConcGCThreads=4
Listing 12-2.
GC parameters dialed into littleMock’s pom.xml for this example
These parameters are kept toward the end of littleMock’s pom.xml (and they’re all wrapped in XML <argument> tags).
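As a rough illustration only (the surrounding element names are assumptions; check littleMock's actual pom.xml for the real structure), the wrapped parameters look something like this:
<arguments>
  <argument>-Xmx1g</argument>
  <argument>-XX:NewSize=512m</argument>
  <argument>-XX:MaxNewSize=512m</argument>
  <argument>-XX:+UseConcMarkSweepGC</argument>
  <argument>-XX:ConcGCThreads=4</argument>
</arguments>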

Configuring littleMock Performance with Test Keys

We have two sets of performance examples, right? jpt and littleMock. With jpt, when you launch ./startWar.sh 05b and ./load.sh 05b, you see the performance of a really slow query without an index. When you instead run with 05a, you get the same test but with the index in place (and better performance). The test ID passed into startWar.sh and load.sh determines the performance.
The littleMock startWar.sh and load.sh are different—they don’t take any parameters. To “dial in” a particular performance problem or optimization, you instead change the ‘performance key’ or other options on the littleMock web page:
http://localhost:8080/ui
With jpt, we are limited to 12 pairs of tests. littleMock performance keys, on the other hand, enable countless permutations of performance test scenarios that can be shared and explored together, so we can discuss the best tools and remedies for very specific performance scenarios.
Don’t forget that Glowroot is enabled—just browse to http://localhost:4000. When the ./load.sh script is running, you can see interactively how performance changes as you click options in the littleMock web page.

Optimizing Young Gen GC

Listings 12-3 and 12-4 show gctime GC metrics before and after tuning. Before tuning, the young gen GC health is YELLOW because dYGCT (the left column) has 4%-5% overhead. After tuning, the young GC health is GREEN; about 1% GC overhead. (Output from gctime.sh.)
dYGCT  dFGCT
-----  -----
0.047  0.000
0.052  0.000
0.054  0.000
0.047  0.000
0.052  0.000
0.042  0.000
0.043  0.000
0.054  0.000
0.07   0.000
0.048  0.000
Listing 12-3.
Data from gctime.sh. GC Metrics Before Tuning
dYGCT  dFGCT
-----  -----
0.010  0.000
0.005  0.000
0.011  0.000
0.006  0.000
0.010  0.000
0.012  0.000
0.005  0.000
0.012  0.000
0.005  0.000
0.009  0.000
Listing 12-4.
Data from gctime.sh. GC Metrics after Tuning
Listing 12-3, with New GC Health = YELLOW, was captured using a littleMock test key that simulates mostly stateless activity with little or no long-term cache. The scenario is basically an SOA system with no session management. Note that the B65535 means that there are 64K allocations during each request, which are then discarded after each request—definitely work for the young gen collection process. Here is the performance key that was used:
X2,J25,K25,L0,Q,R,S,A10,B65535,C0,D10
The NewSize and MaxNewSize values both started out at 512MB. Bumping them both to 750MB brought this 4%–5% down to 3.5% (not shown). After further increases to 1.5GB of new space, I got just below 2% new GC overhead and GC health was officially GREEN (also not shown). But just for grins, I increased the heap to 3GB, with 2.5GB of it as new space (Listing 12-5), to lower the overhead to the 1% shown in Listing 12-4. All of these configurations kept roughly 512MB of old gen space.
-Xmx3g -XX:NewSize=2500m -XX:MaxNewSize=2500m -XX:+UseConcMarkSweepGC -XX:ConcGCThreads=4
Listing 12-5.
GC Settings That Produced the Improved Results in Listing 12-4

Optimizing Old Gen GC

This next scenario starts with old gen GC health of RED. It simulates a large, ∼500MB cache that was not designed to expire—like sticking 500MB into a singleton but inadvertently not allocating enough old gen in the JVM parameters to hold it. A very slow memory leak produces a similar memory profile. Notice in Listing 12-6 that dFGCT (the right column) shows greater than 100% overhead. jstat, how is this possible? Then, in Listing 12-7, the off-the-charts dFGCT metrics from Listing 12-6 came down to zeroes, with just 256MB added to the old gen.
dYGCT  dFGCT
-----  -----
0.000  0.000
0.000  1.413
0.000  0.000
0.000  1.932
0.000  0.000
0.004  1.604
0.014  0.034
0.050  0.034
0.000  1.758
0.000  0.000
Listing 12-6.
Data from gctime.sh. Before: Old Gen GC Health Is RED
dYGCT  dFGCT
-----  -----
0.010  0.000
0.008  0.000
0.010  0.000
0.009  0.000
0.010  0.000
0.009  0.000
0.010  0.000
0.008  0.000
0.010  0.000
0.008  0.000
Listing 12-7.
Data from gctime.sh. After: Health Is GREEN
The data in Listing 12-6 was captured by first running with a test key that added garbage that would not expire for a long time—an hour, longer than the duration of my little test. The C and D parameters set how long to keep the objects around; I set them to more than an hour (3,600 seconds in an hour, then tack on three zeros to convert to milliseconds, giving C3600000 and D3600001).
To fill up 512MB of old gen space as described earlier, it took littleMock a few minutes of runtime with this performance key:
X2,J25,K25,L0,Q,R,S,A4096,B100,C3600000,D3600001
As I was running this key, I closely watched Glowroot’s JVM Gauge feature (Figure 12-1) to determine when the old gen part of the heap filled up.
Figure 12-1.
The Gauges part of Glowroot displays CMS old gen bytes used. Here we create ∼500 MB to simulate an in-memory cache of something and its effects on GC. This graph shows that my old gen is larger than 512MB. It isn’t very germane, but I still wonder how that happened.
When the old gen got to about 500MB (between 10:49pm and 10:50pm in Figure 12-1), I applied the A0 setting (Figure 12-2), probably right before poor littleMock started throwing out-of-memory errors.
Figure 12-2.
littleMock’s A0=0 setting that turns off Extra Garbage Collection
Once A0=0 was applied and the rising heap usage abruptly leveled out, I captured the overhead metrics in Listing 12-6, which are off the charts, with overhead over 100%. Dear jstat, how can a one-second interval (specified on the command line) contain more than a second of old gen GC activity, like the 1.x numbers in the dFGCT column of Listing 12-6? I suppose it’s enough to know that the numbers are bad and need improvement.
Ultimately, in littleMock’s pom.xml I added 256MB to the old gen, restarted littleMock, carefully put the roughly 500MB cache back in place, and used A0 to turn off Extra Garbage Collection to bring the heap to steady state. The off-the-chart metrics from gctime.sh came down to zeroes, so we got GREEN GC health for the old gen; see Listing 12-7 for details.
One moral of the story here is to triple check that your old gen has enough space to handle your big allocations, like this 500MB cache. Wouldn’t you be a little embarrassed if you were the last one in the room to find out there was a huge elephant sitting in the seat right behind you? This is the same thing. Also keep in mind that performance degrades when available heap space shrinks.

Heap Trending

Having immediate access to GC performance with jstat is important. For example, if GC health is GREEN for gctime.sh’s old and new metrics, then you can move on to the next item in the P.A.t.h check list. But knowing the history of GC performance provides key troubleshooting details that can help solve problems.
For example, if GC performance was fine for days but abruptly degraded, then your search for a slow memory leak should be put on hold, because an abrupt change like this is more likely to be caused by a one-time event, perhaps a single “poison-pill” request that led to an open-ended query with massive result set that used all available memory.
Heap trending can also be helpful during development to see whether any memory leaks have sneaked into the code base. For instance, before the code is released, you could run a load test and make sure that the Used Tenured Heap metric more closely resembles the sawtooth in Figure 12-3 than the memory leak in Figure 12-4. These figures show verbose GC data (see the second section of this chapter) graphed by GCViewer, which is available from https://github.com/chewiebug/GCViewer .
Figure 12-3.
This sawtooth pattern in GCViewer is a healthy sign that the system-under-load does not have any memory leaks
Figure 12-4.
When the Used Tenured Heap metric is missing a sawtooth, it generally looks like this, which is most likely a memory leak. Screenshot from GCViewer.
The GC metrics available at http://gceasy.io are very helpful, but I’m not crazy about the “upload your GC data to our site or pay for your own server” business model. GarbageCat is another great option:

Using JVM Parameters to Size the Heap

In the world of micro-services, many more JVMs have to compete on a single machine for a fixed amount of RAM, which makes sizing the heap with the -Xmx parameter a critical task. Allocate too little RAM and starve yourself. If you allocate too much, you starve others.
I like to keep the min and max heap size parameters, -Xmx and -Xms, identical. For example, -Xmx1g -Xms1g is good, and I shy away from -Xmx1g -Xms512m. This makes things a bit simpler so I do not have to comprehend a heap that is perpetually expanding/contracting. Also, at JVM start-up time, keeping these two parameters the same makes an “early claim” to the operating system for every last drop of RAM that tests showed was needed.
This keeps other processes from staking a claim to RAM ultimately required by this JVM process. When more micro-service JVMs are playing in the same sandbox, someone is more likely to get sand in their eyes.
In my head, I use a shoebox metaphor to understand how to use JVM parameters to grow and shrink the old and new spaces. The -Xmx determines the size of the shoebox, and the shoebox has a cardboard divider that separates the old and new spaces. As shown in Figure 12-5, if I explicitly set the new size to a particular value, then the old space gets all of the remaining RAM, just as moving the divider to one side of the shoe box increases the size of the other side.
Figure 12-5.
The vertical line is like a divider in a shoe box
Consider the parameters -Xmx512m -XX:NewSize=412m depicted in Figure 12-6. For starters, note that the -Xmx parameter does not use an equal sign and -XX:NewSize does—an important syntactical difference. Outside of that small detail, these parameters define a 512MB total heap and allocate most of it (412MB) as the minimum size of the new space. The JVM automatically allocates the remaining 100MB to the old space, as shown in the figure. This is my shoebox metaphor, and the line separating the two spaces is my cardboard divider.
Figure 12-6.
Depiction of larger NewSize with JVM settings -Xmx512m -XX:NewSize=412m
Changing the value of the -XX:NewSize parameter moves the divider inside the shoe box (Figure 12-7), but leaves the overall box size unchanged, like this: -Xmx512m -XX:NewSize=100m.
Figure 12-7.
Depiction of larger old size, with JVM settings -Xmx512m -XX:NewSize=100m. Changes to the -XX:NewSize parameter only move the shoe box cardboard divider. They do not change overall size of the heap.
If I accidentally (or mistakenly) ask for a -XX:NewSize larger than the entire heap (larger than the -Xmx), leaving no room for the old space, then the JVM gives a not-so-helpful error message:
Invalid maximum heap size: -Xmx=512m
This basically says it is impossible to move the shoebox divider outside of the shoebox. Your shoebox divider, the -XX:NewSize parameter, must leave some room for both the old and new spaces.
So remember: the -Xmx parameter sets the size of the shoe box, and the NewSize and MaxNewSize parameters are the divider, which in effect determines the sizes of both the young and the old generation.
Before finishing up this section I would like to quickly mention that the -XX:NewSize parameter is actually the minimum size for the new space.1 To help keep the complexity down and to avoid twisting my brain into a knot, I like to set the -XX:MaxNewSize to the same value as -XX:NewSize. The -XX:MaxNewSize also provides a sanity check for invalid parameters. The 1.8 JDK allows the following, even though it doesn’t make sense to have a new space that is twice the size of the entire heap it’s supposed to be contained in.
java -Xmx512m -XX:NewSize=1024m Test
But if you add the MaxNewSize parameter, like this:
java -Xmx512m -XX:NewSize=1024m -XX:MaxNewSize=1024m Test
you get a warning telling you about the problem:
Java HotSpot(TM) 64-Bit Server VM warning: MaxNewSize (1048576k) is equal to or greater than the entire heap (524288k). A new max generation size of 523776k will be used.

Caution Using NewRatio

My shoebox approach implicitly sets the ratio between the old and the new spaces by specifying absolute byte sizes with -XX:NewSize. The experienced GC tuner knows there is an alternative where a ratio is used instead of absolute sizes: the -XX:NewRatio JVM parameter.
“Setting -XX:NewRatio=3 means that the ratio between the young and old generation is 1:3, the combined size of eden and the survivor spaces will be one-fourth of the total heap size…”
The JVM was designed to use either NewRatio or NewSize/MaxNewSize to size your old/new spaces, but not both at the same time. I ran a test, and NewRatio is ignored if it is supplied along with NewSize/MaxNewSize, which can be pretty confusing.
But it gets worse—a lot worse. In the new gen tuning we did for Listings 12-3 and 12-4, we shrank the old space to be pretty small compared to the size of the entire heap. In fact, the old space was just 1/7th of the entire heap when performance was at its best.
My point is that shrinking your old size to be less than 50% of your heap is a completely valid and often desirable approach, especially with stateless SOA systems, but unfortunately there are no valid values for -XX:NewRatio that let you do this, as shown in Table 12-1. For these reasons, I suggest avoiding -XX:NewRatio altogether.
Table 12-1.
Old Gen Percentages for Various NewRatio Settings
NewRatio   Old size % (implicit)
--------   ---------------------
1          50%
2          67%
3          75%
4          80%
I have wondered why NewRatio is so limiting; perhaps outdated, historical performance concerns are at fault. Note that the implied old gen percentage is NewRatio/(NewRatio+1), so it can never drop below 50%.
Some heap tuning tips have a short shelf-life. What’s valid in one version of Java might very well be counter-productive in the next one. Old hands at Java will know this already, but newbies may tend to believe the advice they recently studied is going to be good for J8, J9, and so on.
I’d like to close this section with comments from Oracle tuning experts Charlie Hunt and Tony Printezis, who also stress the importance of the new generation:
“You should try to maximize the number of objects reclaimed in the young generation. This is probably the most important piece of advice when sizing a heap and/or tuning the young generation.”

Blindly Increasing -Xmx

A lazy way to try to fix a heap efficiency problem is to simply increase the -Xmx value without understanding which GC space is having the problem. For example, consider test 12a in the javaPerformanceTroubleshooting tests.
If you search for 12a in pom-startWar.xml, you’ll find this:
<profile>
  <id>12a</id>
  <properties>
    <profile.specific.arg.01>-Xmx512m</profile.specific.arg.01>
If you quadruple the -Xmx value (to 2048MB), you get a little improvement but are still left with 10-15% new GC overhead. Try it. This happens because the space with the high overhead that is dragging down performance, the new space, never gets more RAM; the shoebox divider was not adjusted when -Xmx was increased, so all of the extra space goes to the old gen, and the stateless activity in test 12a barely uses the old gen.
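The fix, in shoebox terms, is to move the divider when you grow the box. Something along these lines would hand the extra RAM to the new space (the exact values are illustrative, not a tested recommendation for test 12a):
-Xmx2048m -XX:NewSize=1536m -XX:MaxNewSize=1536m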

Reclaiming Unused RAM

The previous section showed how under-allocation of RAM caused GC overhead that limited throughput by about 25%. Over-allocation may or may not cause performance problems, but it is a waste of RAM either way.
Here is a plug-it-in-now approach to detecting when you have allocated lots of RAM that your system never uses.
The jpt tests 06a and 06b get the same throughput, same response time, and same CPU consumption. If all these metrics are the same, why do we care? The answer is that test a achieves exactly the same throughput and response time as test b, but it does so with less RAM, and the following data from JAVA_HOME/bin/jstat tells the story. jstat provides two metrics that are very helpful, OGCMX and OGC. As described at this URL:
these values stand for “Maximum old generation capacity (kB)” and “Current old generation capacity (kB),” respectively.
The underlined data on the right in Listing 12-8 shows metrics that can help you discover RAM that is allocated but never used. The first data is from test 06a, and the second is from 06b of the javaPerformanceTroubleshooting examples.
Listing 12-8. jstat Metrics (Underlined) Help Discover Allocated but Unused RAM
Using the test data in Listing 12-8 from the command-line tool jstat, Figure 12-8 shows that test b wastefully allocates about 1GB of RAM that is never used, and its performance is no better than that of test a.
Figure 12-8.
Histogram of data from Listing 12-8. The tall blue bar shows RAM that is allocated but unused.
Ultimately, it is up to you to determine how much RAM is considered wastefully allocated. But be careful here. If your performance tests somehow do not reproduce some condition where a great deal of memory is used in production, you could accidentally deallocate too much memory and run into an out-of-memory condition. But you can safely use jstat in production to gather these same metrics to help avoid this problem.
Also, the utility http://gceasy.io does a great job of providing charts that show this same thing (allocated but unused heap space), but it is not a plug-it-in-now tool. You must add the JVM parameters for verbose GC collection and restart your JVM. Once verbose GC has been enabled, you will need to rerun your load test and then upload your verbose GC data file to someone's third-party website for the analysis. The jstat approach gets around all of these hassles!

Memory Leaks

The most common kind of memory leak is an ever-growing singleton of some kind, and these kinds of leaks often take hours, days, or longer to amass enough memory in the singleton to start causing problems. Imagine a singleton with a java.util.Map or a java.util.List, where memory continually grows over time as .put() or .add() is called until all of the allocated heap (the -Xmx JVM parameter) is used up. Fast leaks are less common, but are more dangerous, because they can lock up a JVM in seconds without warning. These happen when a defenseless piece of code is forced to process much more data than it was expecting, like mistakenly returning all the rows in a multi-million-row table to the end user.
Diagnostic data for troubleshooting memory leaks comes in a few layers that get increasingly more detailed:
  1. The “Heap Trending” section earlier in this chapter will help you learn how quickly the heap fills before throwing out-of-memory exceptions. This is based on verbose GC data. This is great, but no class names are mentioned in this data, so it is tough to identify which class is leaking.
  2. Identifying specific classes that have the largest instance counts and the largest amounts of “referenced” memory, as they were at a single point in time. JAVA_HOME/bin/jmap -histo is one common source of this data. Glowroot provides this as well.
  3. Understanding trends in the monitored data. For instance, which classes have instance counts that grow perpetually over time? Manual work can be done to compare jmap -histo output (histograms). A tool called heapSpank will do some of this processing for you.
  4. Tracking ownership of proliferating objects back to the parents that store/reference them. Sometimes you know which objects are littering the heap, but you don’t know how they got there or why the GC hasn’t already discarded them. Locating exactly where the objects reside in the heap helps identify how they got there in the first place. Heap dumps and heap analysis tools like Eclipse MAT fill this role.
Keep in mind that capturing a heap dump is an expensive operation. It can shut down (basically pause) all activity in a JVM for dozens of seconds or more.

Troubleshooting Memory Leaks

Troubleshooting can be done using class histograms.
The JDK has a great plug-it-in-now approach to getting point-in-time instance counts of all classes loaded in the JVM. Just invoke JAVA_HOME/bin/jmap -histo <myPid> and redirect the stdout to a text file. Here is the doc:
If you have 2,000 classes loaded, the jmap -histo output will have one line for each class, showing the instance count and the total bytes for that class. This is the bulk of the output. A full heap dump dwarfs a histogram in size and complexity, so capturing a heap dump causes much more overhead; that said, I have seen even jmap -histo pause a JVM for dozens of seconds. Sometimes important troubleshooting data is worth the temporary pause.
I suggest grepping the jmap -histo output for your package space. Keep in mind that primitive arrays (int[], byte[], char[], and so on) show up with the JVM's odd internal syntax, like the [I shown in Listing 12-9. Generally, knowing these primitive counts is only a little helpful, because there is no indication in this data of what allocated the primitives. When class and package names show up, whether in your own package or a dependency's package, that provides some context to answer questions like, “Where might there be a singleton that would be amassing objects?” and “What user data would be collecting over time?” If you saw class names or packages dealing with session data, it would be worth your time to check whether HTTP session expiration was working as expected.
$ jmap -histo 29620
num  #instances  #bytes class name
--------------------------------------
 1:   1414   6013016 [I
 2:    793   482888 [B
 3:   2502   334928 <constMethodKlass>
 4:    280   274976 <instanceKlassKlass>
 5:    324   227152 [D
 6:   2502   200896 <methodKlass>
 7:   2094   187496 [C
Listing 12-9.
Very Top Lines of jmap Output
The main limitation with jmap -histo is that it provides only point-in-time data. Capturing and reviewing jmap -histo multiple times can be helpful, but it is easy to drown in the data when you are looking for increases in object counts over multiple jmap -histo output files.
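To give you an idea of what that manual comparison looks like, here is a small, hypothetical helper (mine, not heapSpank) that diffs two jmap -histo capture files and prints the classes whose byte counts grew the most between them.
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HistoDiff {
    // Parse lines like "   1:   1414   6013016 [I" into class name -> bytes
    static Map<String, Long> parse(String file) throws Exception {
        Map<String, Long> bytesByClass = new HashMap<>();
        for (String line : Files.readAllLines(Paths.get(file))) {
            String[] p = line.trim().split("\\s+");
            if (p.length >= 4 && p[0].endsWith(":")) {
                bytesByClass.put(p[3], Long.parseLong(p[2]));
            }
        }
        return bytesByClass;
    }

    public static void main(String[] args) throws Exception {
        final Map<String, Long> before = parse(args[0]);
        Map<String, Long> after = parse(args[1]);
        List<Map.Entry<String, Long>> growth = new ArrayList<>(after.entrySet());
        // Sort by growth (after minus before), biggest first
        growth.sort((a, b) -> {
            long ga = a.getValue() - before.getOrDefault(a.getKey(), 0L);
            long gb = b.getValue() - before.getOrDefault(b.getKey(), 0L);
            return Long.compare(gb, ga);
        });
        for (int i = 0; i < Math.min(15, growth.size()); i++) {
            Map.Entry<String, Long> e = growth.get(i);
            long g = e.getValue() - before.getOrDefault(e.getKey(), 0L);
            System.out.printf("%,12d bytes growth  %s%n", g, e.getKey());
        }
    }
}
Run it with the earlier capture first, for example: java HistoDiff histo_before.txt histo_after.txt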
To get a better understanding of how class counts and memory consumption trends over time, I wrote heapSpank, available at heapSpank.org. This is a command-line tool that shows the percentage of time that byte counts are on the rise for the 15 classes most likely to be leaking.
I ran a littleMock test with this key to produce the memory leak:
X2,J25,K25,L0,Q,R,S,A4096,B65535,C120000,D240000
Figure 12-9 shows how heapSpank identified the leak class, but not the parent container; this is a nice head start.
Figure 12-9.
heapSpank from heapSpank.org has detected the littleMock memory leak. See how the MyMemoryLeak class bubbles to the top of the display?

Heap Dump Analysis

Earlier in the chapter, I mentioned the layers of metrics that are available. jstat provides instantaneous metrics to assess health. The verbose GC logs and trending tools like GCViewer show trends over time; these indicate whether you have a slow leak building. jmap provides per-class counts, and heapSpank then helps you understand whether those class counts and byte counts continue to rise over time.
All that data is very helpful, but sometimes even all of this is not enough information to fix a memory leak. Let me explain. Just because instances of one or a few classes grow over time doesn't point you to the exact spot where they are accumulating, or even why they are accumulating. Always with chagrin we ask why the garbage collector hasn't already recycled a particular set of objects highlighted by jmap, heapSpank, or Eclipse MAT.
Analyzing a heap dump helps answer these questions.

Capturing a Heap Dump

Especially with a heap filled to capacity with leaky objects, capturing a heap dump can be a high-overhead operation. It can halt JVM activity for seconds or even minutes. That said, some problems just have to be fixed. So I never capture a heap dump unless the jstat and verbose GC data indicate there is an upward trend of byte counts and/or instance counts. If there have already been out-of-memory errors, forget about the overhead and capture the dump.
I simulated a memory leak using this littleMock key—the same one used in the previous section.
X2,J25,K25,L0,Q,R,S,A4096,B65535,C120000,D240000
I let this test run for just a few minutes. Then I captured a heap dump using the command shown in Listing 12-10.
jcmd <myJavaPid> GC.heap_dump /path/to/myDump8.dat
Listing 12-10.
One of Many Ways to Capture a Heap Dump
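Since the caption says one of many ways, another option that works on HotSpot JDKs is jmap; the live option triggers a full GC first so that only reachable objects land in the dump:
jmap -dump:live,format=b,file=/path/to/myDump8.hprof <myJavaPid>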
Don't forget to include these among the JVM start-up arguments for your application in production; they automatically capture a heap dump when an out-of-memory error occurs:
-XX:+HeapDumpOnOutOfMemoryError and -XX:HeapDumpPath=/some/path/

Using Eclipse MAT to Find Memory Leaks

Generally if you are going to fix a leak, code changes will have to be made. The goal is to find all the leaks that won’t be collected by the garbage collector, but mainly to figure out why the objects are not being collected. Invariably, the answer lies in the object hierarchy—which classes own and hold instances of the classes littering the heap. This is essentially a search for parentage—who owns whom.
This section will use Eclipse MAT to find the source of a leak. The IBM Heap Analyzer is an OK alternative. It sometimes has better auto-leak detection, but the Swing GUI is slower and the navigation around the object graph is more difficult and displays less information than MAT, which is coded in Eclipse RCP/SWT. Let’s get back to MAT.
Download and install MAT from this URL: http://www.eclipse.org/mat/ . I then chose the File ➤ Open Heap Dump menu option and pointed to myDump8.dat that I captured in Listing 12-10.
Running MAT on your desktop takes a lot of heavy processing, especially with heaps larger than 1GB.
I suggest getting 32GB RAM on your desktop and allocating perhaps half of that to MAT in the -Xmx parameter in MAT’s MemoryAnalyzer.ini.

Eclipse MAT Automatic Leak Detection

MAT has three reports that basically do automatic leak detection. Don’t ignore these reports. They are:
  • Heap Dump Overview
  • Top Components
  • Leak Suspects
I will say there is significant overlap between their functionality and purpose.
When MAT opens your heap dump, it asks whether you want to create a Leak Suspects report. I chose that option, and MAT displayed Figure 12-10. The Leak Suspects view is not a very interactive pie chart, but the Details hyperlinks at the bottom do allow you to explore the object graph of the heap.
Figure 12-10.
Eclipse MAT Leak Suspects looking at a heap dump from littleMock. In perhaps a third to a half of the heap dumps I have seen, this screen points reasonably close to the leaky instances and/or the object containing them.
In fact, the screenshot in Figure 12-10 has already found the container holding all the leaked objects. See the OldGenerationRepo at the bottom? littleMock stores all the MyMemoryLeak objects in this object, and the exact collection is in Listing 12-11, taken from the source code of OldGenerationRepo.java in the littleMock GitHub repo. This queue is the object storing all the leaked objects.
Queue<OldGenerationData> allOldGenData = new LinkedBlockingQueue<OldGenerationData>();
Listing 12-11.
The OldGenerationData Queue
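The remedy for this kind of leak is usually a code change that bounds or expires the collection, not a GC setting. Here is a minimal sketch of one common pattern; it is my own illustration with an arbitrary cap, not littleMock's actual fix.
import java.util.concurrent.LinkedBlockingQueue;

public class BoundedRepo<T> {
    private static final int MAX_ENTRIES = 10_000;   // arbitrary example cap
    private final LinkedBlockingQueue<T> entries = new LinkedBlockingQueue<>(MAX_ENTRIES);

    public void add(T item) {
        // Evict the oldest entry when full, so memory use stays roughly constant
        while (!entries.offer(item)) {
            entries.poll();
        }
    }

    public int size() {
        return entries.size();
    }
}
An expiring cache is often an even better fit; the underlying point is that the GC can never reclaim what a live singleton still references.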

Exploring in Eclipse MAT

Right under the name of your dump file is a nice set of icons that I rely on heavily (Figure 12-11).
Figure 12-11.
The usability of Eclipse MAT is OK. I normally rely on the icons here to help me find my way around.
Table 12-2 details the location, name and purpose of each icon in Figure 12-11.
Table 12-2.
Eclipse MAT Icons Displayed in Figure 12-11
Location on toolbar
Name of Report
Description
Leftmost icon
Overview
Shows a pie-chart with biggest consuming classes/packages.
Icon second from the left
Histogram
Right-click to drill down into incoming and outgoing references.
Icon third from the left
Dominator Tree
Very much like histogram but with left-click drill down.
Icon fourth from the left
Object Query Language
SQL-like language that lets you query the object ownership tree.
Icon fifth from the left
Double Gears / Thread Overview
Basically includes a fancy thread dump that also shows allocations in thread and thread local storage.
Very helpful for finding the stack trace in progress of creating a fast memory leak.
Icon sixth from the left
Reports
Leak Suspects and other reports.
Icon seventh from the left
Database Can / Query Brower
Provides the same right-click drill down options available in Histogram view.
Icon eighth from the left
Search / Magnifying Glass
Find objects by hex object address.
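As a tiny taste of that fourth icon, an OQL query like SELECT * FROM java.util.concurrent.LinkedBlockingQueue lists every LinkedBlockingQueue instance in the dump, and from each result you can drill into incoming and outgoing references just as you can from the Histogram.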
The Histogram in Figure 12-12 looks very similar to jmap output, but it is very interactive. For instance, the MAT GUI does an excellent job of managing massive lists of objects: it displays a single screenful and allows you to click for more. But Figure 12-12 uses a different feature to manage the massive quantity of objects—a simple filter. See the top left, where I entered a regular expression to find all objects matching *stermueller*?
Figure 12-12.
Eclipse MAT Histogram that shows counts and byte counts of all instances. Note that the MyMemoryLeak class shows up right at the top. This is very much like jmap, but interactive. You can sort it in various ways and easily right click and drill down to see all the objects this class points to, and all the objects that point to the selected class.
That is a very handy feature. In this screenshot, MAT has already detected the 2-3MB of data in my package space that is causing the leak. This is an indispensable tool.
In Figure 12-13, I am right-clicking and drilling down to see what “container” class might be holding references to me, keeping “me” from getting collected. This is called looking for GC roots. In my simple example, we already know the answer to this question. But perhaps half the time I am looking at very unfamiliar code and need this feature very much.
Figure 12-13.
I right-clicked a leaky object to find the “roots”—what object is keeping this from getting collected
The result is this very nice tree view in Figure 12-14, where we finally see the gory detail of the object parentage that we have been seeking.
Figure 12-14.
The ‘Merge Shortest Path to GC Roots’ screen from Eclipse

Kinds of Leaks

Eclipse MAT does a great job of finding garden-variety leaks, like the MyMemoryLeak shown earlier. This is in part because that leak is anchored to a singleton—the GC root. Imagine a small, stateless process that queries all the rows in a multi-million-row database table. There is no convenient singleton GC root to point to in that case. So I have created one of these scenarios for you to experiment with.
Invoking this URL will essentially crash any jpt JVM by selecting millions of rows from a database table:
https://localhost:8675/crashJvmWithMonsterResultSet
Perhaps MAT could be improved to better detect the leak in the heap dump created by this scenario, or perhaps I just missed it.
Metaspace leaks are also difficult to solve, but I can give you a little head start on the troubleshooting. The following command shows metaspace capacity, so you can see whether you are close to filling up the space:
jstat -gcmetacapacity <myPid> 1s
But you will have to use the jstat reference to understand all the field names:
To find out how much you allocated in the first place, look for this parameter in your JVM arguments:
-XX:MaxMetaspaceSize=1g
Finally, to find out what class might be leaking, use these instructions to turn on verbose class loading:
Generally after a system has been warmed up, class loading should come to a near halt. If the verbose class loading continues to show activity, especially loading a few classes or more a second, you have a meta-space leak.
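The standard JVM switch for this is -verbose:class, which prints a line for every class that is loaded (and, on most JVMs, unloaded), making the continued-loading pattern easy to spot.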
The leaks we have discussed so far live in the heap used by our Java programs. Back before Java 7, we also saw a number of memory leaks in the native C++ code that is the JVM itself. This is called a native memory leak; it is not easy to debug and is well beyond the scope of this small book.

Heap Troubleshooting, Step-by-Step

I generally troubleshoot heap issues by first checking to see whether there are easy solutions. Only when those easy solutions are not successful do I go on to explore things like memory leaks. The following outline describes this rough troubleshooting progression.
Start by abiding by these general rules:
  • -Xmx must not exceed available RAM.
  • Avoid NewRatio.
  • Love the one you’re with: stick with your favorite GC algorithm (except serial). If you’re on IBM J9, then only use gencon (default) or the new Balanced, which is a ‘regional’ algorithm, similar to G1.
Basic GC troubleshooting steps follow:
  1. Start out with a not-huge heap, something between 512MB (perhaps for micro-services) and 4GB.
  2. While under load, assess GC generational overhead. Use jstat and gctime.sh or gctime.cmd.
  3. If one or both of the generations have sustained RED and/or YELLOW overhead, configure an additional chunk of RAM (perhaps 256MB?) for that generation; then restart and retest. Repeat until GC overhead percentage results are in the desired range.
  4. If infrequent but long old gen pauses (slower than 5 seconds) are unacceptable, consider moving to one of the following:
    1. HotSpot G1, because old gen collections are broken up into smaller collections, yielding smaller pauses. Here is the detailed G1 tuning doc:
    2. For IBM J9, the Balanced collector, whose “regional” approach is very similar to G1.
Advanced GC troubleshooting steps follow:
  1. If old gen continues to have YELLOW or RED overhead, or out-of-memory errors, check for memory leaks.
    1. Use jmap -histo and/or heapSpank to assess whether there are one or more classes whose byte counts or instance counts are continually on the rise.
    2. Use PMAT or GCViewer to check verbose GC. Check the graphs in these tools for upward trends in old gen memory consumption, and consider the amount of time the graphs show it takes to fill the heap. If the heap fills in a matter of seconds or minutes, I consider that a fast leak and look for individual requests that allocate massive amounts of data; a massive JDBC result set is one example. If instead it takes hours or days to fill the heap, look for a small number of objects that collect in a java.util collection held in memory by a singleton.
    3. If heapSpank doesn't provide enough data to find the cause of the leak, capture a heap dump and try one of the following:
      1. MAT Leak Suspects.
      2. Histogram.
      3. Use MAT to find GC roots.
  2. Metaspace leaks: Look for “MC”, “MU”, “CCSC”, and “CCSU” in the output of “jstat -gc <myPid> 1000”.
  3. Sometimes, a single huge allocation of memory can fill the heap and crash your JVM. To learn how to troubleshoot this situation, invoke the following URL from any browser during any jpt test:
    https://localhost:8675/crashJvmWithMonsterResultSet
    This executes a JDBC query that returns a massive result set. The browser will hang and the JVM will produce “java.lang.OutOfMemoryError: Java heap space” errors. Create a heap dump using “jcmd <myPid> GC.heap_dump myDump.bin”. Use MAT to detect the leak.
  4. Look for failure messages in the verbose GC logs. Normally, I use PMAT or GCViewer to ‘look’ at my logs. But when I'm working on a real tough GC problem that defies explanation, I get out my text editor and look through the logs for errors like “[ParNew (promotion failed)”. Be sure to do an internet search for similar errors.

Don’t Forget

The world of micro-services is coming, where many more JVMs will run on a single operating system instance. Instead of 4-8 JVMs on a single machine, perhaps there will be 10 to 30. All of this will require tighter memory management, discovering heap over-allocations, and watching the performance health of many more processes.
For these reasons it would be prudent to learn some of the heap troubleshooting techniques in this chapter. Earlier in this chapter I said that diagnostic data for troubleshooting heap issues comes in a few layers that get increasingly more detailed. Being cognizant of these layers helps you notice troubleshooting data you may have missed. For example, plug-it-in-now GC data from jstat is great for immediacy, but PMAT and GCViewer are better for capturing trends, and those trends indicate whether you have a fast or a slow memory leak.
That is the end of the P.A.t.h. Checklist. What’s next is getting real-world practice learning how to turn Dark Environments into environments with good visibility, and to teach your co-workers how to use these plug-it-in-now techniques to make that happen.
Footnotes
1. This link ( https://docs.oracle.com/cd/E19900-01/819-4742/abeik/index.html ) says: “The NewSize and MaxNewSize parameters control the new generation’s minimum and maximum size.”
 