Erik Ostermueller
Troubleshooting Java Performance
Detecting Anti-Patterns with Open Source Tools
Erik Ostermueller
Little Rock, Arkansas, USA
Any source code or other supplementary material referenced by the author in this book is available to readers on GitHub via the book's product page, located at www.apress.com/9781484229781. For more detailed information, please visit http://www.apress.com/source-code.
ISBN 978-1-4842-2978-1
e-ISBN 978-1-4842-2979-8
Library of Congress Control Number: 2017954917
© Erik Ostermueller 2017
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.
Trademarked names, logos, and images may appear in this book. Rather than use a trademark symbol with every occurrence of a trademarked name, logo, or image we use the names, logos, and images only in an editorial fashion and to the benefit of the trademark owner, with no intention of infringement of the trademark. The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.
While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.
Printed on acid-free paper
Distributed to the book trade worldwide by Springer Science+Business Media New York, 233 Spring Street, 6th Floor, New York, NY 10013. Phone 1-800-SPRINGER, fax (201) 348-4505, e-mail [email protected], or visit www.springeronline.com. Apress Media, LLC is a California LLC and the sole member (owner) is Springer Science + Business Media Finance Inc (SSBM Finance Inc). SSBM Finance Inc is a Delaware corporation.
To John, Owen, and Joan.
Introduction
With just 30 minutes of troubleshooting, how close can you get to finding the root cause of a Java performance problem? What observability tools would you use? What subsystems would you investigate?
This book is a short curriculum in Java performance tuning for Java server-side developers. It explores one methodical approach to getting the most out of your 30 minutes and aims to show that much more is possible than is generally thought, even for Java developers with little exposure to performance tuning.
The brevity of this book comes from a sharp focus on only the worst problems seen in the author’s 10 years of working exclusively as a lead Java performance engineer on Java distributed systems. That said, the tools and techniques can be used to find pretty much any defect.
This book is heavy on walkthroughs of performance problems that you can download from github.com and run on your own machine. The hands-on examples provide a rich, in-the-trenches experience that a book-only approach can’t provide, not even a much larger book.
The reader will learn a methodical, easy-to-remember four-step tuning approach, called the P.A.t.h. Checklist, that directs the reader’s attention to the right parts of the system and the right tools to find the performance defects. If you’re wondering why I’ve chosen to capitalize the acronym that way, you’ll find out in Chapter 4 . Only open-source and freely available tools are used. In most cases, you will even see how the monitoring data looks before and after a performance fix is applied. Here is the checklist:
  • P: Persistence. Learn how to recognize and fix the most common JDBC performance issues, ones that also apply to the NoSQL world.
  • A: Alien systems. Detect when network calls to other systems cause slowdowns.
  • t: threads. Learn how to identify CPU and response time issues using a low overhead tool that can be used in any environment, even production.
  • h: heap. With the Quick GC Health Check, the reader will use a red-yellow-green approach to assess whether GC performance is healthy. The check also points the way to more advanced techniques, like finding and fixing memory leaks.
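To give a flavor of the “t” item, here is a minimal sketch of mine (not code from the book’s downloadable examples) showing that any JVM can dump its own threads’ stacks with plain JDK APIs. This is the same data a jstack-style thread dump provides, with negligible overhead for an occasional snapshot:

```java
import java.util.Map;

public class ThreadDumpSketch {
    public static void main(String[] args) {
        // Works in any running JVM: no agent, no reconfiguration, no restart.
        Map<Thread, StackTraceElement[]> dump = Thread.getAllStackTraces();
        for (Map.Entry<Thread, StackTraceElement[]> e : dump.entrySet()) {
            Thread t = e.getKey();
            System.out.printf("\"%s\" state=%s%n", t.getName(), t.getState());
            for (StackTraceElement frame : e.getValue()) {
                System.out.println("    at " + frame);
            }
        }
    }
}
```

A handful of these snapshots taken a few seconds apart is often enough to spot where response time is being spent.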
Generating a production-like workload is required to identify these defects, so there are a few chapters to get you up and going quickly to create load scripts to stress out your system. Among other topics like load scripting priorities and avoiding common pitfalls, the reader will learn a unique approach to deciding exactly how many threads of load to apply to show whether your system is scalable.

Themes

There are a number of recurring themes that we bump into while tuning software.

Dark Environments

Performance defects thrive in “Dark” environments, ones without the right monitoring. This is a problem because so many of our environments are “Dark”: the QA environment, the demo environment, and the desktop of the developer on your team who has never used a Java profiler.
Using low-overhead tools that mostly come with the JDK, this book will help flush the performance defects out of any environment, dark or not.
Figure 1.
“Dark” environments with little monitoring. APM = Application Performance Management tools.

Plug-it-in-now

There are many outdated and productivity-draining monitoring tools that force us to reconfigure and restart a JVM before gleaning performance data. The dozen or more JVM parameters required for verbose garbage collection (GC) performance data are one such example. Wherever possible, this book uses metrics that you capture from any running JVM anywhere, without reconfiguration, without a JVM restart. I call this the plug-it-in-now approach.
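As a tiny illustration of the plug-it-in-now idea (again, a sketch of mine rather than the book’s code), the JVM’s garbage collection counters are always on and can be read at any moment from inside the process, with no flags and no restart:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcSnapshot {
    public static void main(String[] args) {
        // GC counters are maintained by every JVM whether or not
        // verbose GC logging was enabled at startup.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("collector=%s collections=%d timeMs=%d%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```

For a JVM you don’t control, command-line tools shipped with the JDK (such as jstat and jcmd) expose similar data by attaching to the running process.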

Code First

Most performance engineers would love to have massive amounts of hardware for the software tuning process. Personally, I can’t stand it. Having countless CPUs and unlimited RAM sounds nice, but in reality, once you have lots of hardware, you can’t escape Big Environment Paralysis (my term), which includes:
  • Reduced monitoring/visibility because of heightened security that limits access to machines.
  • Infrequent test runs due to slower ops processes. Initial environment build is complex and takes weeks. Code redeploys take multiple hours. Cluster-wide restarts take 20-30 minutes. System backups and maintenance cause hours of downtime. Most individual load tests run > 60 minutes.
  • Large environments have a ton of extra components with questionable performance: load balancers, firewalls, content switches, clusters, application server configuration, and so on.
  • Larger staffing costs to babysit the extra infrastructure.
All of these things keep me from getting excited about large environments. I’m a little surprised anyone chooses to work like this because progress is excruciatingly slow. Ultimately, what really gets me excited is more tuning in less time.
What if we could get 5-10 fix-test tuning cycles completed a day, instead of the traditional 2-3? That would be a faster and more impressive way to tackle poor performance.
What if we could carefully design a tuning environment that minimized the Big Environment Paralysis? This would enable us developers to shift focus away from managing an environment and towards the performance of our deliverable, the Java code itself. I call this the Code First approach to tuning, and it’s important because it’s the fastest way to tune a system.
Chapter 2 lays out a very detailed design for such a tuning environment, one with a very special mystery feature that really minimizes Big Environment Paralysis. But aside from all of this, one particular requirement for this environment is key: when tuning here, we must be able to reproduce most of the defects that we’d normally find in production. The environment would be worthless without it. So to concretely demonstrate that production-like performance defects can be found (and fixed) in this environment, I have coded two sets of server-side Java code examples that demonstrate these performance defects (under load) and their fixes.

Write Once Run Anywhere (WORA) Performance Defects

When Java was first released in the mid-1990s, compiling and running a ‘C’ program on multiple operating systems (OSs) was a royal pain. In response, the Java platform went to great lengths to ensure that Java code written on one OS could easily compile and run on all other OSs where Java was available. Sun Microsystems called this Write Once Run Anywhere (WORA).
But it just so happens that WORA has an undesirable and little-known side-effect: if there is a performance defect in your code, then that defect manifests itself not only on multiple platforms, but also in environments of different sizes, large and small.
Of course there are exceptions, but this has proven out repeatedly in my ten years as a Java performance engineer. Since I expect this will be hard for most readers to believe, I have coded more than a dozen performance defects to prove the point. Chapter 2 provides a quick overview of the performance results of these load tests. Chapter 8 details the architecture of the tests and describes how to run these examples on your own machine, with whatever capacity it might have.
Because most performance defects are reproducible in both large and small environments, we have a critical choice to make about where we do our performance tuning. This part is very important: Why tune in a big environment and suffer through all the ‘Big Environment Paralysis’ problems, when we could be much more productive and get 5-10 fix-test cycles a day in a small environment?
Keep in mind that even though most tuning is done in a small environment for expediency, a larger environment is often required to demonstrate the full throughput and other performance requirements. What are performance requirements? We’ll get to that real soon.

Three Threads of Load, Zero Think Time (3t0tt)

Part of my day job is training developers to tune their code. One of the most frequent mistakes I see is people stressing out their Java server-side systems with unrealistically large amounts of load. There are so many guys in this field; perhaps this is a testosterone thing?
Both sets of code that come with this book execute small load tests that apply stress to a server-side system. Each one of these tests is configured to launch exactly three Java threads that are simultaneously and perpetually submitting HTTP network requests. No ‘think time’ is configured between the HTTP requests. I call this amount of load 3t0tt, because it is 3 threads of load with zero think time. This is pronounced ‘three tot.’ It is used as a quantity of load, like this: How much load are you applying? 3t0tt.
This particular level of load is designed to have small enough CPU and RAM consumption to run on just about any workstation, but large enough to reproduce most multithreaded/synchronization problems. Even though this small amount of load is rarely enough stress to cause a system to go berserk, throughout this book, the reader is encouraged to evaluate a few specific parts of the system and determine which one is causing the largest slowdown. This is called ‘the lowest-hanging fruit.’ After identifying the issue, a fix is deployed and retested. If performance improves, the next lowest-hanging-fruit issue is investigated and the process repeats.
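The 3t0tt recipe can be sketched in a few lines of Java. This is my own illustration, not one of the book’s downloadable examples, and sendRequest() here is a hypothetical stand-in counter rather than a real HTTP call:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicLong;

public class ThreeTot {
    static final AtomicLong requests = new AtomicLong();

    // Stand-in for a real HTTP request (a real load script would use
    // JMeter or java.net.http.HttpClient); a counter keeps this
    // sketch self-contained.
    static void sendRequest() {
        requests.incrementAndGet();
    }

    public static void main(String[] args) throws InterruptedException {
        final int THREADS = 3;          // three threads of load...
        final long DURATION_MS = 200;   // brief; a real test runs for minutes
        final long end = System.currentTimeMillis() + DURATION_MS;
        CountDownLatch done = new CountDownLatch(THREADS);
        for (int i = 0; i < THREADS; i++) {
            new Thread(() -> {
                while (System.currentTimeMillis() < end) {
                    sendRequest();      // ...with zero think time: no sleep()
                }
                done.countDown();
            }).start();
        }
        done.await();
        System.out.println("requests=" + requests.get());
    }
}
```

The absence of any sleep() between requests is what makes three threads surprisingly stressful.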
Choosing the right amount of load to apply is tough for newcomers to performance. 3t0tt provides a very easy starting point to a difficult training problem. In Chapter 6 , you will learn the next lesson in this realm, which is determining how much load to apply to quickly assess scalability.

The Discipline of Performance Engineering

This book is more of a troubleshooting guide than a guide to the entire discipline of performance tuning. This little section provides a brief look at how the performance process should work and how to know when you’re done tuning a system.
If we’re building a software system, the specs that define precisely what we’re going to code are called the ‘functional requirements.’ They detail the items of data entry on each screen and the results that should be displayed when the user hits the ‘submit’ button. When all functionality in the requirements has been delivered, the application is functionally complete.
Performance requirements, on the other hand, define how quickly the application must respond, and how many business processes the system must handle in a particular time interval (like in an hour) in order to satisfy client needs. These are called, respectively, response time and throughput requirements. A performance requirements document starts by listing all/most of the business processes (from the functional requirements) and then assigning response time and throughput requirements to each.
For example, if the response time requirement for your funds transfer is 2 seconds and its throughput requirement is 3,600 an hour, then a steady state load test (see Chapters 4 , 5 , 6 , and 7 for details on load testing) must demonstrate that 90% of funds transfer response times are less than 2 seconds and that at least 3,600 of them were successfully executed in 60 minutes.
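That pass/fail check can be expressed directly in code. A minimal sketch of mine follows; the response time samples and completion count are hypothetical, not data from any real test:

```java
import java.util.Arrays;

public class RequirementsCheck {
    public static void main(String[] args) {
        // Hypothetical funds-transfer response times (seconds) captured
        // during a 60-minute steady state load test.
        double[] seconds = {0.8, 1.1, 1.3, 1.5, 1.6, 1.7, 1.8, 1.9, 1.95, 3.0};
        Arrays.sort(seconds);
        // 90th percentile: the value that 90% of samples fall at or below.
        double p90 = seconds[(int) Math.ceil(0.90 * seconds.length) - 1];
        long completed = 3_700;  // transfers completed in the hour (hypothetical)

        System.out.println("p90=" + p90 + "s meetsResponseTime=" + (p90 <= 2.0));
        System.out.println("throughput=" + completed
                + "/hr meetsThroughput=" + (completed >= 3_600));
    }
}
```

Real load generators compute these percentiles for you, but it is worth seeing how little arithmetic the requirement actually boils down to.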
Sometimes performance requirements even place a cap on how much money can be spent on hardware. If five nodes in a cluster were required to meet the above goals, and the performance requirements said we could buy at most four nodes, then more tuning would need to be done (probably to lower CPU consumption) so that the response time and throughput goals could be met with just four nodes.
It is also common to stipulate that all performance requirements must be met while keeping CPU consumption below X, where X is whatever the operations folks feel comfortable with in production.
One of the most common mistakes in this area is not having any performance requirements at all. Of course these requirements are imprecise and imperfect. Start with something and refine them over time to best approximate client performance and stability needs without wasting time over-tuning the system.
Here is an example of the kind of refinement I’m talking about. Time should be spent understanding how closely the performance requirements match up to response time and throughput data gathered from production. If production data shows that the system processes 7,200 funds transfers on the busiest hour of the busiest day of the year, and the throughput requirement is less than that, then the throughput requirement should be raised to at least this number (7,200) and probably a little (perhaps 25%) higher, just to be safe.
For a more detailed look at the process, there is a great 20-page PDF whitepaper on the topic by Walter Kuketz on the www.cgi.com website. Walter actually helped me develop the little section above. To find his whitepaper, do an internet search using these terms:
“Walter Kuketz” “Guidebook for the Systems Development Life Cycle”

Draw the Line

Whose responsibility is it to make the system perform? The developers who coded it, or a performance engineer? If the performance engineer is your answer, then who is going to keep the developer from littering other code bases with the same kind of performance defects that caused the first system to tank? Keep in mind that laziness that drives automation is good. Laziness that avoids accountability for performance is bad.
Making the performance engineer solely responsible turns developers into performance-defect progenitors, and the small cadre of performance engineers doesn’t have a chance of keeping up with such a huge number of defects.
Finally, if a performance engineer is brought in at the end of the development cycle to discover/fix performance issues, then there isn’t time to tune, much less replace, a poorly performing technical approach that was months or years in the making.

Organization of Chapters

This book is divided into three parts.

Part I: Getting Started with Performance Tuning

The three chapters in Part I set the stage for performance testing:
Chapter 1 details four performance anti-patterns that lie at the heart of most performance defects. Understanding these anti-patterns makes the defects much easier to recognize when we’re looking at raw performance metrics.
Chapter 2 makes the case that a very small computing environment, like a developer workstation, is the most productive place to tune a Java server-side system. It also details all the steps necessary to essentially collapse all the parts of a larger system into the smaller one.
Chapter 3 is about the vast sea of performance metrics and how to choose the right ones. If you’ve ever been confused about which metrics to use during performance testing, this chapter is for you.

Part II: Creating Load Scripts and Load Testing

Part II of this book is about creating load scripts and load testing in general:
Chapter 4 details both a “First Priority” and a “Second Priority” approach to creating network load scripts, ones that simulate a production load. The First Priority gets you up and testing quickly. The Second Priority shows how to enhance your scripts to more closely model the production load of real users.
Chapter 5 details all the right criteria you need to evaluate objectively whether the script you built in Chapter 4 simulates a “valid” test or not. This chapter helps you understand quickly whether you messed up your load test and the results must be discarded. No one wants to make big tuning/development decisions based on invalid results.
Chapter 6 details the fastest approach ever to assessing scalability. It describes a test called the Scalability Yardstick. If scalability is important to you, you should run this test once a week or more to help steer your app’s performance in the right direction.
Chapter 7 is my love letter to JMeter, the open-source load generator. jmeter-plugins.org is important, too! Commercial load generators can’t keep up with all of JMeter’s great features.

Part III: The P.A.t.h. Checklist and Performance Troubleshooting

The chapters in Part III are about performance troubleshooting. They detail the tools and techniques used to identify the root causes of Java server-side performance issues.
Chapter 8 provides an overview of the P.A.t.h. Checklist and of the architecture of the two sample applications that you can run on your own machine. Downloadable from github.com, these sample applications provide a hands-on experience for collecting the right performance data and locating the root cause of a problem.
The items in the P.A.t.h. Checklist were described above so I won’t repeat the descriptions, but here are the chapters for each one:
  • P: Persistence. Chapter 9
  • A: Alien systems. Chapter 10
  • t: threads. Chapter 11
  • h: heap. Chapter 12
Acknowledgments
Thanks to Apress for trusting me as a first-time author; your support means everything to me.
Shawn McKinney of the Apache Foundation was the first person who thought I could write a book. Shawn, your early encouragement and honest feedback on crazy ideas have been invaluable, and you were always the first to provide feedback on any new code. Thanks. Mike Scheuter is one of the smartest people I know; he’s a longtime friend, mentor, and colleague from FIS who first taught me to fall in love with stack traces and so many other things, work-related and otherwise. Thanks to my employer FIS and to our PerfCoE team. FIS’ continued executive-level support for great performance throughout the enterprise is unique in the industry. Thank you for that.
Dr. Liz Pierce, the Chair of the Information Science department at UA Little Rock, orchestrated an 8-hour Java performance workshop that I gave in early June, 2017. Thanks, Dr. Liz, for your support, and thanks to the 15 or so students, faculty, and others who gave up two Saturday afternoons to geek out on performance. Loved it. The CMG Canada group in Toronto was also kind enough to vet some of these ideas a month or two prior. Thanks.
In 2011, I won a best paper and best speaker award at a cmg.org international performance conference. The famous American computer scientist Jeff Buzen, who made many contributions to the field of queueing theory, led the committee that selected me for those awards. CMG’s support (monetary and otherwise) way back then provided the confidence I needed to publish this book many years later. Thank you Jeff, thank you CMG.
There are many others who have helped in various ways: Joyce Fletcher and Rod Rowley who gave me my first tuning job way back in 2006. Thanks to Mike Dunlavey, Nick Seward, Stellus Pereira, David Humphrey, Dan Sobkoviak, and Mike McMillan for their support. To my Dad, Ralph Ostermueller, for your sustained interest and support over many, many months, and to Mom, too.
I’d also like to quickly mention just a few of the open-source projects that I benefit from every day. Thanks to Trask at Glowroot.org for last-minute fixes, and to JMeter and JMeter-Plugins. You all rock.
Thanks to Walter Kuketz, Jeremiah Bentch, Erik Nilsson, and Juan Carlos Terrazas for reading late drafts of this book. Erik, your demands for clarity in the early chapters haunted me. Your input forced me to raise my standards a bit; I hope it shows, thanks. Jeremiah, your veteran commentary on SELECT N+1, JPA and other issues helped me fill in big gaps. Thanks. Lastly, to Walter: Your decades of performance/SDLC experience, succinctly imparted, really helped me avoid derailment at the end. Thanks.
To my technical reviewer, Rick Wagner: Rick, what I most loved about working with you, beyond your extensive Java/RedHat experience and beyond your unique experience reviewing so many technical books, was your ability to regularly guide me to paint a more complete picture of software performance for the reader, instead of the half-painted one I had started with. Thanks.
Lastly, thanks to my family. My older son Owen’s editing skills are really on display in the introduction, Chapters 1 and 8, and other places as well. He’s 20, knows nothing about programming, but gave me masterful lessons on how to gradually build complex ideas. Who’d a thunk it. Contact him before you write your next book. John, my younger son, helped test the code examples and put up with many brainstorming sessions that finally knocked the right idea out of my head. John’s editing skills lie in smoothing out individual sentences. It turns out, there is a pattern here. My incredible wife Joan Dudley, who teaches college literature, is a big-picture editor for both this book and her students. Joan’s editing contributions are “every paragraph is important” and “clarity takes a back seat to no one.” Joan, you made many sacrifices, holding the fort down while I worked on this insanely long project. I love you and thank you for encouraging me to attempt great things.
Contents
Part I: Getting Started with Performance Tuning
Part II: Creating Load Scripts and Load Testing
Part III: The P.A.t.h. Checklist and Performance Troubleshooting
Index
About the Author and About the Technical Reviewer
About the Author
Erik Ostermueller
is a Java architect who is passionate about performance engineering. He has spent the last 10 years leading international performance engineering teams, tuning high-throughput Java financial systems in North and South America, Europe, and Asia. In 2011, he presented a paper entitled “How to Help Developers (Finally) Find Their Own Performance Defects” at the Computer Measurement Group’s annual conference, where he won Best Paper and the Mullen Award for best speaker. The proceeds of this award financed an eight-city speaking tour in the US, Canada, the UK, and Italy.
Erik is the technical lead for the Performance Center of Excellence at FIS Global. He is the founder of heapSpank.org and wuqiSpank.org, and a contributor to jmeter-plugins.org and other open source projects. He lives in Little Rock, Arkansas and plays soccer, tennis, and the piano.
 
About the Technical Reviewer
Rick Wagner
has been a software developer, architect, and maintenance engineer for 27 years. Rick has produced production applications in the mainframe era, through client/server, past the dawn of internet applications, and now into the age of containerization. Rick hopes a strict regimen of coding and reading of technical books will allow him to remain relevant as future generations of computer programming unfold.
 