TDD and Performance

Acceptable performance is an important requirement in any system. It’s also likely that many of you are programming in C++ expressly because of its potential for high performance. Throughout the book I’ve dismissed concerns about performance and directed you to read this section, but that’s not because performance isn’t important. It is.

Most of what falls under the umbrella of performance testing is neither TDD nor unit testing. This section presents a test-focused strategy for performance optimization and then discusses how unit-level testing can help you execute that strategy. It also discusses how design and performance relate, emphasizing that you should seek optimal design before you attempt to address performance concerns.

Performance considerations are generally nonfunctional requirements. The system needs to respond within half a second to user interaction under a load of up to 10,000 concurrent users. The system needs to process a batch within a four-hour overnight window. And so on. These are integration-level concerns (see Unit Tests, Integration Tests, and Acceptance Tests) that require an integrated and deployed system. You can’t test these concerns with tests that focus on isolated pieces of logic.

From a (unit-level) test-driven standpoint, you will almost never have the knowledge up front to be able to say, for example, “This function must respond in five microseconds or less.” Determining that need would require that you know how the performance characteristics of the function relate to an end-to-end behavioral need. Even if you could derive a specific micro-level performance specification, you’d find it difficult to determine a consistent measurement that would support all your platforms (development, integration, production, and so on) equally well, given variant machine characteristics.

A Performance Optimization Test Strategy

The general strategy for performance optimization is as follows:

  • Using a test framework, build and execute driver code that baselines the existing performance of your system for the underperforming case.

  • Ensure you have tests that demonstrate the proper behavior of the feature functionality—it’s fairly easy to break things when optimizing a system.

  • Change the driver code into a test that specifies the current performance baseline. This baseline test should fail if an attempted optimization degrades performance.

  • Add a second, goal test that executes the same functionality but passes only if the desired new performance is met. (This might be a second assertion in the baseline test. A sketch of both tests appears after this list.)

  • Determine the performance bottleneck.

  • Attempt to optimize code in the area of the bottleneck. You should be able to discern whether an algorithmic-level optimization is possible. (For example, replace an O(n²) algorithm with one that’s O(n log n)). If so, start there. Otherwise, start with optimizations that retain high-quality design and expressiveness. Often, suboptimal use of C++ can be a culprit (for example, how you pass arguments, use assignment, construct new objects, and make misguided attempts to do better than STL containers and/or Boost).

  • Ensure your unit and acceptance tests still pass.

  • Run the baseline test; if it fails (in other words, if the new performance is worse), discard the modifications and try again.

  • Run the goal test; if it passes, ship it!

  • Otherwise, you might be able to solve the performance challenge by identifying the next-biggest bottleneck, attempting to improve its performance, and so on. It’s also possible that your optimization attempt was simply an inappropriate choice. Either way, note the relative improvement it produced and shelve the code changes. Seek another optimization and repeat, checking each time whether the accumulated optimizations add up to the performance goal.

    If you do incrementally incorporate an optimization, ensure you update the criteria in the baseline test.
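
Here’s a rough sketch of what the baseline and goal tests might look like. It assumes a CppUTest-style test, a hypothetical respondToQuery() entry point into the underperforming behavior, a hypothetical millisecondsToRun() helper, and made-up thresholds (a measured 900ms baseline and the half-second goal mentioned earlier); none of these names or numbers come from the GeoServer example.

#include "CppUTest/TestHarness.h"
#include <chrono>

// Hypothetical: stands in for driver code that exercises the real, integrated behavior.
void respondToQuery() { /* exercise the underperforming feature */ }

// Times a single run of the supplied behavior, in milliseconds.
long long millisecondsToRun(void (*behavior)()) {
   auto start = std::chrono::steady_clock::now();
   behavior();
   auto stop = std::chrono::steady_clock::now();
   return std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
}

TEST_GROUP(QueryPerformance) {};

// Baseline test: fails if an attempted optimization makes performance worse.
TEST(QueryPerformance, DoesNotDegradeBelowMeasuredBaseline) {
   CHECK(millisecondsToRun(respondToQuery) <= 900);
}

// Goal test: passes only once the desired new performance is met.
TEST(QueryPerformance, MeetsHalfSecondGoal) {
   CHECK(millisecondsToRun(respondToQuery) <= 500);
}

Remember that these remain end-to-end concerns: the numbers mean something only when the tests run against an integrated, deployed system on production-like hardware.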

Here are some extremely important themes as you attempt to optimize:

  • Run the performance tests on a machine with the same characteristics as the production target. Results from tests run elsewhere may not accurately depict the impact of optimizations in production, making such optimizations potentially a waste of time or worse.

  • Don’t assume anything. Your notions as to what should be optimized are often wrong. Always measure before and after.

  • Get the design right first, and only then introduce optimizations. Introduce optimizations that sacrifice maintainable design and readability only if you absolutely must. Get the design right first!

Relative Unit-Level Performance Tests

Unit-level performance tests can help you along the way, but you can’t use them to determine whether you’ve met the performance goal. Instead, you’ll use them as tools to help you probe at pieces of the puzzle.

In this section, you’ll learn a simple technique for obtaining the average execution time of a tested function. The time will only have meaning as it relates to optimization attempts against that same function.

In the rare case where you are able to define a unit-level need up front, you can test-drive that need using the Relative Unit-Level Performance Tests (RUPTs, I’ll call them). Otherwise, you’ll be in the realm of Test-After Development (TAD).

The steps for a RUPT are much as you would expect.

  1. Create a loop that repeatedly executes the behavior you want to time, perhaps 50,000 times. Looping should smooth out any aberrations due to startup overhead or clock resolution. You’ll want to make sure the compiler does not optimize away any of the behavior you want to time. (A skeleton of steps 1 through 3 appears after this list.)

  2. Just prior to the loop, capture the current time in a variable called start.

  3. Just after the code that executes the behavior, capture the current time in stop. Your relative measurement is the elapsed time of stop - start.

  4. Run the RUPT and note the elapsed time. Seek an elapsed time of a few seconds, and alter the number of loop iterations if needed.

  5. Increase the number of iterations by an order of magnitude. Run the test and ensure that the elapsed time similarly increases. If not, your RUPT cannot accurately characterize your optimization attempt. Determine the reason and fix it.

  6. Run the RUPT a few more times. If the elapsed times vary wildly, you do not have a valid RUPT. Determine the reason and fix it. Otherwise, note the average.

  7. Attempt to optimize the code.

  8. Run the RUPT several times and note the average.

  9. If the improvement was considerable, run your performance and goal baselines. Otherwise, discard the change.
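
Here’s a minimal skeleton of steps 1 through 3, assuming a CppUTest-style test and a made-up computeSomething() function standing in for the behavior you want to time; the volatile sink is one way to keep the compiler from discarding the loop body.

#include "CppUTest/TestHarness.h"
#include <chrono>
#include <iostream>

// Hypothetical behavior to probe; substitute the function you care about.
int computeSomething(unsigned int i) { return static_cast<int>(i * i); }

TEST_GROUP(RelativePerformance) {};

TEST(RelativePerformance, ComputeSomething) {
   const unsigned int iterations{50000};
   volatile int sink{0}; // keeps the loop body from being optimized away

   auto start = std::chrono::system_clock::now();   // step 2
   for (unsigned int i{0}; i < iterations; i++)     // step 1
      sink += computeSomething(i);
   auto stop = std::chrono::system_clock::now();    // step 3

   auto elapsed =
      std::chrono::duration_cast<std::chrono::milliseconds>(stop - start);
   std::cout << std::endl << "elapsed time = " << elapsed.count() << "ms" << std::endl;
}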

The RUPTs are probes that you should discard or relegate to a slush pile of meaningless code that you might plunder later. By no means should they appear in your production unit test suite.

Seeking to Optimize GeoServer Code

Let’s work through a short example of creating a RUPT.

c9/24/GeoServerTest.cpp

TEST(AGeoServer_Performance, LocationOf) {
   const unsigned int lots{50000};
   addUsersAt(lots, Location{aUserLocation.go(TenMeters, West)});

   TestTimer t;
   for (unsigned int i{0}; i < lots; i++)
      server.locationOf(userName(i));
}

The TestTimer class is a simple class that spits out a performance measurement on the console once it goes out of scope. Refer to the following section (The TestTimer Class) for its implementation.

Here’s the code we’re testing. Both locationOf and isTracking execute a find call. Is this an unacceptable performance sink?

c9/24/GeoServer.cpp

bool GeoServer::isTracking(const string& user) const {
   return find(user) != locations_.end();
}

Location GeoServer::locationOf(const string& user) const {
   if (!isTracking(user)) return Location{}; // TODO performance cost?
   return find(user)->second;
}

We set the number of iterations to 50,000 and run the test a few times. We note an average time (50ms on my machine).

We bump the number of iterations up to 500,000 and run the tests another few times, again noting the average. We expect to see the average correspondingly increase roughly by an order of magnitude, and it does. The average of three runs clocks in at 574ms. If it hadn’t increased, we would have needed to figure out how to prevent the C++ compiler from cleverly optimizing the operations executed in the loop. (Under gcc, you can add an assembler instruction: asm("");.)
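
For example, you might drop it into the timed loop:

for (unsigned int i{0}; i < lots; i++) {
   server.locationOf(userName(i));
   asm(""); // gcc: an empty assembler statement the optimizer won't remove
}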

We change the code to eliminate the second call to the find function.

c9/25/GeoServer.cpp

Location GeoServer::locationOf(const string& user) const {
   // optimized
   auto it = find(user);
   if (it == locations_.end()) return Location{};
   return it->second;
}

Yes, a comment is prudent (you might provide a bit more explanation, though). Programmers in a good TDD shop should always be seeking to improve the quality of the code. Without a comment to indicate why you coded it that way, a good programmer is likely to clean up a messier, performance-optimized chunk of code. And since you don’t typically run performance-related acceptance tests continually, it may be difficult to discern the code change that caused a goal performance test to fail.

We rerun the performance tests and note a new average of 488ms, which is 86ms faster than before. The math says that introducing the redundant call to find incurs a cost of almost 18 percent performance degradation per request. It sounds substantial and may well be, but remember that we’re running half a million requests. Per request, we’re talking 0.17 microseconds difference.

These are facts about the changes in behavior from a performance perspective. While they provide only relative, isolated meaning, they’re not suppositions. We know that our attempt at code optimization was successful: it improved the execution time of this small unit of code. That’s more than we knew before. It’s also more than most developers know after they attempt to optimize a solution.

The question becomes, is it useful? At this point, we would run our baseline and goal performance tests and determine whether the optimization is necessary. If not, it serves only to make the code more difficult, and we happily discard it.

The cost of retaining the optimization appears minimal. The locationOf function increases by only a line of code, to three simple lines. Many useful optimizations create code that’s considerably harder to decipher and maintain.

Yet there’s another potential optimization route that would be easy to apply, given that we have a clean design. In a GeoServer that tracks tens or hundreds of thousands of users, a user cache might make a lot more sense. During any given time period, the server will likely be asked the locations of a much smaller subset of users, and many requests will duplicate a prior request. Currently, the lookups into the locations_ map all funnel through the accessor function find. We could change the code in find to use a cache. Client code would retain its current, expressive design. In contrast, introducing a cache in a class where code always directly accesses member variables can represent a prolonged effort.
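
Here’s a rough sketch of what that might look like, assuming a hypothetical mutable cache_ member declared in GeoServer.h (an unordered_map from user names to previously located iterators); it’s not part of the GeoServer code, and cached entries would need to be discarded whenever locations_ changes.

std::unordered_map<std::string, Location>::const_iterator
GeoServer::find(const std::string& user) const {
   auto cached = cache_.find(user);                // hypothetical cache_ member
   if (cached != cache_.end()) return cached->second;

   auto it = locations_.find(user);
   if (it != locations_.end()) cache_[user] = it;  // remember the lookup
   return it;
}

Client code (locationOf, isTracking, and anything else that calls find) would not change at all.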

A clean design helps with performance optimization in a couple of ways. First, it’s easier to pinpoint performance problems using a profiler when you have small functions. Second, small classes and functions increase your potential to consider creative optimizations. They also increase the ease of making the changes once you’ve identified the problem. In contrast, imagine a 500-line function that hides a performance bottleneck. It will take you longer both to determine the problem and to resolve it. (And a 500-line function will almost never have sufficient tests to give you the confidence to make appropriate optimization changes.)

The TestTimer Class

The TestTimer class is a hastily coded, simple tool that you can place at any appropriate point in your test. It prints the elapsed time when it goes out of scope, as well as explanatory text passed to the constructor. Using the no-arg constructor results in the name of the current test being printed.

c9/25/TestTimer.h

#ifndef TestTimer_h
#define TestTimer_h

#include <string>
#include <chrono>

struct TestTimer {
   TestTimer();
   TestTimer(const std::string& text);
   virtual ~TestTimer();

   std::chrono::time_point<std::chrono::system_clock> Start;
   std::chrono::time_point<std::chrono::system_clock> Stop;
   std::chrono::microseconds Elapsed;
   std::string Text;
};

#endif

c9/25/TestTimer.cpp

#include "TestTimer.h"
#include "CppUTest/Utest.h"

#include <iostream>

using namespace std;

TestTimer::TestTimer()
   : TestTimer(UtestShell::getCurrent()->getName().asCharString()) {}

TestTimer::TestTimer(const string& text)
   : Start{chrono::system_clock::now()}
   , Text{text} {}

TestTimer::~TestTimer() {
   Stop = chrono::system_clock::now();
   Elapsed = chrono::duration_cast<chrono::microseconds>(Stop - Start);
   cout << endl <<
      Text << " elapsed time = " << Elapsed.count() * 0.001 << "ms" << endl;
}

You can and should enhance the timer class to suit your needs. You might want to make it thread-safe (it is not). You might prefer using a different platform-specific timing API, or your system might provide a separate implementation for C++11’s high-resolution clock. You might be able to measure using a smaller duration (nanoseconds!), or perhaps you need to use larger durations. Or you might choose to simply insert the three or four lines required directly into your tests, though that seems like unnecessary effort.
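
For example, here’s a rough sketch of a steady-clock, nanosecond-reporting variant (a hypothetical stand-alone class, not a drop-in replacement for TestTimer):

#include <chrono>
#include <iostream>
#include <string>

struct SteadyTestTimer {
   explicit SteadyTestTimer(const std::string& text = "timer")
      : Start{std::chrono::steady_clock::now()}, Text{text} {}

   ~SteadyTestTimer() {
      auto stop = std::chrono::steady_clock::now();
      auto elapsed =
         std::chrono::duration_cast<std::chrono::nanoseconds>(stop - Start);
      std::cout << std::endl
                << Text << " elapsed time = " << elapsed.count() << "ns" << std::endl;
   }

   std::chrono::time_point<std::chrono::steady_clock> Start;
   std::string Text;
};

A steady clock also sidesteps surprises if the system clock gets adjusted in the middle of a measurement.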

Performance and Small Functions

C++ programmers burn billions of anxiety calories annually over the performance cost of a member function call. For that reason, many programmers resist the notion of creating small functions and classes. “I don’t want to extract that code into a separate function; it might represent a performance problem.” Yet compilers today are very smart beasts, able to optimize code in many cases better than you ever could by hand.

Rather than base your resistance to small functions on old wives’ tales, consider real data.

c9/26/GeoServer.cpp

Location GeoServer::locationOf(const string& user) const {
   // optimized
   auto it = locations_.find(user);
   if (it == locations_.end()) return Location{};
   return it->second;
}

Before manually inlining the find function, the average execution time was 488ms. After inlining, the average execution time was 476ms, a statistically insignificant difference across a half-million executions.

Was the find function inlined by the compiler in the first place? If we force the issue and tell gcc to not inline the function, as follows, there is no substantial difference in execution time (474ms):

c9/27/GeoServer.h

std::unordered_map<std::string, Location>::const_iterator
   find(const std::string& user) const __attribute__((noinline));

One other interesting aspect of small functions is that C++ compilers are more likely to be able to inline them in the first place. With larger functions, you actually decrease the compiler’s chances of optimizing the code.

The reality is that not extracting code to smaller methods represents poor design, and it does virtually nothing to improve the performance of your application. Performance experts already know this.

And don’t trust me; I could well be an old wife. Trust your own measurements.

Recommendations

Many thoughts on performance optimization are based on folklore and the experience of others. Don’t trust the experiences of others. Then again, since most everyone is saying the same thing, it’s probably worth listening to what they consistently say. And I’ll add my experiences to the mix.

My Experiences with Optimization

As a programmer, I’ve been involved in a number of optimization attempts. As a consultant, I’ve worked with several programmers for whom optimization was their primary job (one on a system needing to consistently process 20,000+ transactions per second). In both realms, I’ve experienced and witnessed successes that stemmed from a disciplined approach similar to the previous recommendations. I’ve also seen a spectacular failure as one company hired high-priced consultants to desperately attempt to fix a live, production-scaling challenge by stabbing haphazardly at optimization attempts.

A few key elements appear to provide the best defense against performance challenges.

  • A solid architecture, where the word architecture means the layout of all those things that will be very difficult to change once the system is in place. Specifically, where are the communication points between components (distributed across clients and servers), and how does the architecture support scaling without requiring code changes (in other words, by beefing up hardware)?

  • A solid but flexible design with clean code, complete with tests that provide the flexibility to make confident, dramatic changes when needed.

  • Performance goal tests from day one that specify future scaling expectations. If you expect to deploy your application initially to a dozen users and then ultimately to a hundred, you want to know as soon as possible when new code puts the scaling target at risk.

As far as code-level optimization goes, I have yet to see evidence, or hear it from a performance expert, that refutes the classic advice of getting the design right before attempting optimization and then optimizing only if absolutely necessary.

I’ve witnessed many wrong-headed optimization attempts. In some cases, they were based on misguided or downright false folklore (sometimes even based on another language!). In other cases, the performance recommendations were once true, but later compiler and runtime improvements rendered them obsolete.

Some code-level optimizations do fall in the category of “free.” For example, passing by reference in C++ is usually more efficient than passing by value, and it costs nothing in expressiveness. Where such optimizations do not degrade readability or ease of maintenance, go for ’em. Otherwise, save the optimization attempts for later, much later.
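
A trivial illustration, using made-up functions rather than GeoServer code:

#include <string>

// Copies the string argument on every call.
bool isKnownUserByValue(std::string user) { return !user.empty(); }

// Usually cheaper (no copy is made) and no less expressive.
bool isKnownUser(const std::string& user) { return !user.empty(); }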
