Assert Performance

We know that assert_performance should take the current performance measurement, compare it with the performance from the previous run, and store the current measurement as the reference value for the next run. Of course, the first test run should just store the results because there’s no previous data to compare with.

Now let’s think through success and failure scenarios for such tests. Failure is easy. If performance is significantly worse, then report the failure. The success scenario, though, has two possible outcomes: one when performance is not significantly different, and another when it has significantly improved.

It looks like it’s not enough just to report failure/success. We need to report the current measurement, as well as any significant difference in performance.

So let’s get back to the editor and try to do exactly that.

chp8/assert_performance.rb
 
require 'minitest/autorun'

class Minitest::Test
  def assert_performance(current_performance)
    self.assertions += 1 # increase Minitest assertion counter

    benchmark_name, current_average, current_stddev = *current_performance
    past_average, past_stddev = load_benchmark(benchmark_name)
    save_benchmark(benchmark_name, current_average, current_stddev)

    optimization_mean, optimization_standard_error = compare_performance(
      past_average, past_stddev, current_average, current_stddev
    )

    optimization_confidence_interval = [
      optimization_mean - 2*optimization_standard_error,
      optimization_mean + 2*optimization_standard_error
    ]

    conclusion = if optimization_confidence_interval.all? { |i| i < 0 }
      :slowdown
    elsif optimization_confidence_interval.all? { |i| i > 0 }
      :speedup
    else
      :unchanged
    end

    print "%-28s %0.3f ± %0.3f: %-10s" %
      [benchmark_name, current_average, current_stddev, conclusion]
    if conclusion != :unchanged
      print " by %0.3f..%0.3f with 95%% confidence" %
        optimization_confidence_interval
    end
    print "\n"

    if conclusion == :slowdown
      raise Minitest::Assertion.new("#{benchmark_name} got slower")
    end
  end

  private

  def load_benchmark(benchmark_name)
    return [nil, nil] unless File.exist?("benchmarks/#{benchmark_name}")
    benchmark = File.read("benchmarks/#{benchmark_name}")
    benchmark.split(" ").map { |value| value.to_f }
  end

  def save_benchmark(benchmark_name, current_average, current_stddev)
    File.open("benchmarks/#{benchmark_name}", "w+") do |f|
      f.write "%0.3f %0.3f" % [current_average, current_stddev]
    end
  end

  def compare_performance(past_average, past_stddev,
                          current_average, current_stddev)
    # when there's no past data, just report no performance change
    past_average ||= current_average
    past_stddev ||= current_stddev

    optimization_mean = past_average - current_average
    optimization_standard_error = (current_stddev**2/30 +
                                   past_stddev**2/30)**0.5

    # drop non-significant digits that our calculations might introduce
    optimization_mean = optimization_mean.round(3)
    optimization_standard_error = optimization_standard_error.round(3)

    [optimization_mean, optimization_standard_error]
  end
end

Again, this includes some simplifications you can easily undo. First, we save the benchmark results to a file in a predefined, hard-coded location. Second, we hardcode the number of measurement repetitions to 30, exactly as in the performance_benchmark function. And third, our assert_performance works only with Minitest 5.0 and later, so we need to install the minitest gem.
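For example, here’s one way you could undo the first two simplifications. This is only a sketch of the idea; I’m naming the settings BENCHMARK_DIR and REPETITIONS here, and you’d also need to use REPETITIONS inside compare_performance in place of the hard-coded 30.

require 'fileutils'
require 'minitest/autorun'

class Minitest::Test
  # Sketch: overridable settings instead of hard-coded values.
  # REPETITIONS would replace the hard-coded 30 in compare_performance.
  BENCHMARK_DIR = ENV.fetch("BENCHMARK_DIR", "benchmarks")
  REPETITIONS   = Integer(ENV.fetch("BENCHMARK_REPETITIONS", "30"))

  private

  def benchmark_path(benchmark_name)
    File.join(BENCHMARK_DIR, benchmark_name)
  end

  def load_benchmark(benchmark_name)
    return [nil, nil] unless File.exist?(benchmark_path(benchmark_name))
    File.read(benchmark_path(benchmark_name)).split(" ").map(&:to_f)
  end

  def save_benchmark(benchmark_name, current_average, current_stddev)
    FileUtils.mkdir_p(BENCHMARK_DIR)  # create the directory on first use
    File.write(benchmark_path(benchmark_name),
               "%0.3f %0.3f" % [current_average, current_stddev])
  end
end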

But now that we have our assert, we can write our first performance test.

chp8/test_assert_performance1.rb
 
require 'assert_performance'
require 'performance_benchmark'

class TestAssertPerformance < Minitest::Test

  def test_assert_performance
    actual_performance = performance_benchmark("string operations") do
      result = ""
      700.times do
        result += ("x"*1024)
      end
    end
    assert_performance actual_performance
  end

end

Let’s run it (don’t forget to gem install minitest first).

 
$ ruby -I . test_assert_performance1.rb
# Running:
string operations 0.172 ± 0.011: unchanged
.
Finished in 2.294557s, 0.4358 runs/s, 0.4358 assertions/s.
1 runs, 1 assertions, 0 failures, 0 errors, 0 skips

The first run will save the measurements to the benchmarks/string operations file. If we rerun the test without making any changes, it should report no change.
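Before we do, it’s worth a quick peek at what got saved. The reference is just the average and the standard deviation written as text by save_benchmark, so reading it back is trivial (the exact numbers will differ on your machine):

# the file name matches the benchmark name passed to performance_benchmark
File.read("benchmarks/string operations")  # => something like "0.172 0.011"

Now let’s rerun the test.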

 
$ ruby -I . test_assert_performance1.rb
# Running:
string operations 0.168 ± 0.016: unchanged
.
Finished in 2.313815s, 0.4322 runs/s, 0.4322 assertions/s.
1 runs, 1 assertions, 0 failures, 0 errors, 0 skips

As expected, the test reports that performance hasn’t changed despite the difference in average numbers. That’s statistical analysis at work! Now you know why we spent so much time talking about it.

Now let’s optimize the program. I’ll take my own advice from Chapter 2 and replace String#+= with String#<<.

chp8/test_assert_performance2.rb
 
require 'assert_performance'
require 'performance_benchmark'

class TestAssertPerformance < Minitest::Test

  def test_assert_performance
    actual_performance = performance_benchmark("string operations") do
      result = ""
      700.times do
        result << ("x"*1024)
      end
    end
    assert_performance actual_performance
  end

end

Let’s run the performance test again.

 
$ bundle exec ruby -I . test_assert_performance2.rb
# Running:
string operations 0.004 ± 0.000: speedup by 0.161..0.167 with 95% confidence
.
Finished in 1.089948s, 0.9175 runs/s, 0.9175 assertions/s.
1 runs, 1 assertions, 0 failures, 0 errors, 0 skips

And of course the test reports the huge optimization. That’s exactly what we like to see when we optimize.

However, if the execution environment isn’t perfect, our performance test might report a slowdown or optimization even if we did nothing. For example, I can get the slowdown error from the first unoptimized test on my laptop when it gets busy doing something else. This is one such test run:

 
$ ruby -I . test_assert_performance1.rb
# Running:
string operations 0.201 ± 0.059: slowdown by -0.044..-0.022 with 95% confidence
F
Finished in 2.456716s, 0.4070 runs/s, 0.4070 assertions/s.

1) Failure:
TestAssertPerformance#test_assert_performance [test_assert_performance1.rb:10]:
string operations got slower

1 runs, 1 assertions, 1 failures, 0 errors, 0 skips

See how big my standard deviation is? It’s more than a quarter of my average. This means that some of the measurements were outliers, and they made the test fail.

We already talked about two ways of dealing with that. One is to further minimize external factors. Another is to exclude outliers. But there’s one more: you can increase the confidence level for the optimization interval.

The 95% confidence interval we use is roughly plus/minus two standard errors from the mean of the difference between before and after numbers. We can demand 99% confidence. This increases the interval to about plus/minus three standard errors.
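The change in assert_performance is tiny: build the interval with three standard errors instead of two. Here’s a sketch; the CONFIDENCE_MULTIPLIER constant is my own naming, not something in the code above.

# ~95% confidence: 2 standard errors; ~99% confidence: 3 standard errors
CONFIDENCE_MULTIPLIER = 3

optimization_confidence_interval = [
  optimization_mean - CONFIDENCE_MULTIPLIER*optimization_standard_error,
  optimization_mean + CONFIDENCE_MULTIPLIER*optimization_standard_error
]
# remember to update the "95%" in the printed message as well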

Let’s do some quick math to see whether that helps with my failing test. My before and after numbers are 0.168 ± 0.016 and 0.201 ± 0.059.

The mean of the difference is

0.168 - 0.201 = -0.033

The standard error of the mean of the difference is

sqrt(0.016²/30 + 0.059²/30) ≈ 0.011

The three standard error interval is (-0.066..0). This means that we can’t be 99% confident that the second test run was slower or faster. So the new conclusion is that nothing has changed.
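You can reproduce this arithmetic in irb; the snippet below just redoes the calculation from compare_performance by hand with the numbers above.

# before: 0.168 ± 0.016, after: 0.201 ± 0.059, 30 measurements each
past_average,    past_stddev    = 0.168, 0.016
current_average, current_stddev = 0.201, 0.059

optimization_mean = past_average - current_average                   # ≈ -0.033
standard_error    = (past_stddev**2/30 + current_stddev**2/30)**0.5  # ≈ 0.011

# the ~99% interval is about three standard errors around the mean
[optimization_mean - 3*standard_error, optimization_mean + 3*standard_error]
# ≈ (-0.066..0.000), so we can no longer call this run a slowdown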

Note how a simple tweak of the confidence level changed the test outcome. I recommend that you play with it and settle on a confidence level that works reliably for your performance tests.

There’s of course a limit to confidence level increases. See how we were barely able to determine that performance in our test stayed the same. Had the standard deviation been one millisecond less, we would have declared this run as a slowdown.

You might be tempted to increase the interval size to four or five standard errors from the mean. But in practice, three standard errors (99%) is the highest confidence you should aim for. You can’t demand the confidence of the Large Hadron Collider experiments from your Ruby tests. If your tests are still not reliable, step back and look for more external factors, or start excluding outliers in measurements.
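If you go the outlier route, one simple option is to trim the extreme measurements before averaging. The sketch below assumes you keep the raw per-iteration timings around (our performance_benchmark returns only the name, the average, and the standard deviation, so you’d have to expose them first); trim_outliers is my own helper name, and the timings are made-up illustration values.

# Sketch: drop the slowest and fastest 10% of runs before averaging.
def trim_outliers(measurements, fraction: 0.1)
  drop = (measurements.size * fraction).floor
  sorted = measurements.sort
  sorted[drop...(sorted.size - drop)]
end

# made-up timings with one obvious outlier (0.402)
raw = [0.171, 0.168, 0.173, 0.169, 0.402, 0.170, 0.167, 0.172, 0.169, 0.171]
trimmed = trim_outliers(raw)
average = trimmed.sum / trimmed.size   # the outlier no longer skews the average

Either way, the goal is the same: keep the measurement noise small enough that the confidence interval, not a random spike, decides the test outcome.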
