We know that assert_performance should measure the current performance, compare it with the performance from the previous run, and store the current measurements as the reference value for the next run. Of course, the first test run should just store the results because there’s no previous data to compare with.
Now let’s think through success and failure scenarios for such tests. Failure is easy. If performance is significantly worse, then report the failure. The success scenario, though, has two possible outcomes: one when performance is not significantly different, and another when it has significantly improved.
It looks like it’s not enough just to report failure/success. We need to report the current measurement, as well as any significant difference in performance.
So let’s get back to the editor and try to do exactly that.
chp8/assert_performance.rb | |
| require 'minitest/autorun' |
| |
| class Minitest::Test |
| def assert_performance(current_performance) |
| self.assertions += 1 # increase Minitest assertion counter |
| |
| benchmark_name, current_average, current_stddev = *current_performance |
| past_average, past_stddev = load_benchmark(benchmark_name) |
| save_benchmark(benchmark_name, current_average, current_stddev) |
| |
| optimization_mean, optimization_standard_error = compare_performance( |
| past_average, past_stddev, current_average, current_stddev |
| ) |
| |
| optimization_confidence_interval = [ |
| optimization_mean - 2*optimization_standard_error, |
| optimization_mean + 2*optimization_standard_error |
| ] |
| |
| conclusion = if optimization_confidence_interval.all? { |i| i < 0 } |
| :slowdown |
| elsif optimization_confidence_interval.all? { |i| i > 0 } |
| :speedup |
| else |
| :unchanged |
| end |
| |
| print "%-28s %0.3f ± %0.3f: %-10s" % |
| [benchmark_name, current_average, current_stddev, conclusion] |
| if conclusion != :unchanged |
| print " by %0.3f..%0.3f with 95%% confidence" % |
| optimization_confidence_interval |
| end |
| print "
" |
| |
| if conclusion == :slowdown |
| raise Minitest::Assertion.new("#{benchmark_name} got slower") |
| end |
| end |
| |
| private |
| |
| def load_benchmark(benchmark_name) |
| return [nil, nil] unless File.exist?("benchmarks/#{benchmark_name}") |
| benchmark = File.read("benchmarks/#{benchmark_name}") |
| benchmark.split(" ").map { |value| value.to_f } |
| end |
| |
| def save_benchmark(benchmark_name, current_average, current_stddev) |
| File.open("benchmarks/#{benchmark_name}", "w+") do |f| |
| f.write "%0.3f %0.3f" % [current_average, current_stddev] |
| end |
| end |
| |
| def compare_performance(past_average, past_stddev, |
| current_average, current_stddev) |
| # when there's no past data, just report no performance change |
| past_average ||= current_average |
| past_stddev ||= current_stddev |
| |
| optimization_mean = past_average - current_average |
| optimization_standard_error = (current_stddev**2/30 + |
| past_stddev**2/30)**0.5 |
| |
| # drop non-significant digits that our calculations might introduce |
| optimization_mean = optimization_mean.round(3) |
| optimization_standard_error = optimization_standard_error.round(3) |
| |
| [optimization_mean, optimization_standard_error] |
| end |
| end |
Again, this includes some simplifications you can easily undo. First, we save the benchmark results to a file in a predefined, hard-coded location. Second, we hard-code the number of measurement repetitions to 30, exactly as in the performance_benchmark function. And third, our assert_performance works only with Minitest 5.0 and later, so you may need to install the minitest gem if your Ruby ships with an older version.
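If you want to undo the first two simplifications, one possibility is to read the storage directory and the repetition count from environment variables with sensible defaults. This is only a sketch, not part of the listing above, and the BENCHMARK_DIR and BENCHMARK_REPETITIONS names are made up here:
| require 'fileutils' |
| |
| # Hypothetical knobs: override the storage location and repetition count |
| # via environment variables, falling back to the book's defaults. |
| BENCHMARK_DIR = ENV.fetch("BENCHMARK_DIR", "benchmarks") |
| BENCHMARK_REPETITIONS = Integer(ENV.fetch("BENCHMARK_REPETITIONS", "30")) |
| |
| def load_benchmark(benchmark_name) |
|   path = File.join(BENCHMARK_DIR, benchmark_name) |
|   return [nil, nil] unless File.exist?(path) |
|   File.read(path).split(" ").map { |value| value.to_f } |
| end |
| |
| def save_benchmark(benchmark_name, current_average, current_stddev) |
|   FileUtils.mkdir_p(BENCHMARK_DIR) # create the directory on the first run |
|   File.open(File.join(BENCHMARK_DIR, benchmark_name), "w+") do |f| |
|     f.write "%0.3f %0.3f" % [current_average, current_stddev] |
|   end |
| end |
You would then also replace the hard-coded 30 in compare_performance's standard error formula with BENCHMARK_REPETITIONS, and pass the same count to performance_benchmark.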
But now that we have our assert, we can write our first performance test.
chp8/test_assert_performance1.rb | |
| require 'assert_performance' |
| require 'performance_benchmark' |
| |
| class TestAssertPerformance < Minitest::Test |
| |
| def test_assert_performance |
| actual_performance = performance_benchmark("string operations") do |
| result = "" |
| 700.times do |
| result += ("x"*1024) |
| end |
| end |
| assert_performance actual_performance |
| end |
| |
| end |
Let’s run it (don’t forget to gem install minitest first).
| $ ruby -I . test_assert_performance1.rb |
| # Running: |
| string operations 0.172 ± 0.011: unchanged |
| . |
| Finished in 2.294557s, 0.4358 runs/s, 0.4358 assertions/s. |
| 1 runs, 1 assertions, 0 failures, 0 errors, 0 skips |
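The first run saves these measurements to the benchmarks/string operations file. If you're curious, you can peek at what was stored: save_benchmark writes the average and the standard deviation as two space-separated numbers, so after the run above the file should contain something like this (your numbers will match your own first run):
| $ cat "benchmarks/string operations" |
| 0.172 0.011 |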
If we rerun the test without making any changes, it should report no change.
| $ ruby -I . test_assert_performance1.rb |
| # Running: |
| string operations 0.168 ± 0.016: unchanged |
| . |
| Finished in 2.313815s, 0.4322 runs/s, 0.4322 assertions/s. |
| 1 runs, 1 assertions, 0 failures, 0 errors, 0 skips |
As expected, the test reports that performance hasn’t changed despite the difference in average numbers. That’s statistical analysis at work! Now you know why we spent so much time talking about it.
Now let’s optimize the program. I’ll take my own advice from Chapter 2 and replace String#+= with String#<<.
chp8/test_assert_performance2.rb | |
| require 'assert_performance' |
| require 'performance_benchmark' |
| |
| class TestAssertPerformance < Minitest::Test |
| |
| def test_assert_performance |
| actual_performance = performance_benchmark("string operations") do |
| result = "" |
| 700.times do |
* | result << ("x"*1024) |
| end |
| end |
| assert_performance actual_performance |
| end |
| |
| end |
Let’s run the performance test again.
| $ bundle exec ruby -I . test_assert_performance2.rb |
| # Running: |
| string operations 0.004 ± 0.000: speedup by 0.161..0.167 with 95% confidence |
| . |
| Finished in 1.089948s, 0.9175 runs/s, 0.9175 assertions/s. |
| 1 runs, 1 assertions, 0 failures, 0 errors, 0 skips |
And of course the test reports the huge optimization. That’s exactly what we like to see when we optimize.
However, if the execution environment isn’t perfect, our performance test might report a slowdown or optimization even if we did nothing. For example, I can get the slowdown error from the first unoptimized test on my laptop when it gets busy doing something else. This is one such test run:
| $ ruby -I . test_assert_performance1.rb |
| # Running: |
| string operations 0.201 ± 0.059: slowdown by -0.044..-0.022 with 95% confidence |
| F |
| Finished in 2.456716s, 0.4070 runs/s, 0.4070 assertions/s. |
| |
| 1) Failure: |
| TestAssertPerformance#test_assert_performance [test_assert_performance1.rb:10]: |
| string operations got slower |
| |
| 1 runs, 1 assertions, 1 failures, 0 errors, 0 skips |
See how big my standard deviation is? It's almost a third of my average. This means that some of the measurements were outliers, and they made the test fail.
We already talked about two ways of dealing with that. One is to further minimize external factors. Another is to exclude outliers. But there’s one more: you can increase the confidence level for the optimization interval.
The 95% confidence interval we use is roughly plus/minus two standard errors from the mean of the difference between before and after numbers. We can demand 99% confidence. This increases the interval to about plus/minus three standard errors.
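In assert_performance that's a one-line change: widen the interval from two to three standard errors. Here's a minimal sketch of that change; the CONFIDENCE_MULTIPLIER constant is my own addition, not part of the listing above, but it makes the knob easy to experiment with:
| # roughly 95% confidence: 2 standard errors; roughly 99%: 3 standard errors |
| CONFIDENCE_MULTIPLIER = 3 |
| |
| optimization_confidence_interval = [ |
|   optimization_mean - CONFIDENCE_MULTIPLIER*optimization_standard_error, |
|   optimization_mean + CONFIDENCE_MULTIPLIER*optimization_standard_error |
| ] |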
Let’s do some quick math to see whether that helps with my failing test. My before and after numbers are 0.168 ± 0.016 and 0.201 ± 0.059.
The mean of the difference is 0.168 - 0.201 = -0.033.
The standard error of the mean of the difference is √(0.016²/30 + 0.059²/30) ≈ 0.011.
The three standard error interval is (-0.066..0). Because that interval includes zero, we can't be 99% confident that the last run was either slower or faster. So the new conclusion is that nothing has changed.
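You can double-check this arithmetic in a couple of lines of Ruby; the values in the comments are rounded to three decimals, the same way compare_performance rounds them:
| mean = (0.168 - 0.201).round(3)                       # -0.033 |
| se   = Math.sqrt(0.016**2/30 + 0.059**2/30).round(3)  # 0.011 |
| [mean - 3*se, mean + 3*se]                            # roughly [-0.066, 0.0] |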
Note how a simple tweak of the confidence level changed the test outcome. So I recommend that you play with it and come up with the confidence level that works reliably for your performance tests.
There is, of course, a limit to how far you can increase the confidence level. See how we were barely able to determine that performance in our test stayed the same: had the standard deviation been one millisecond less, we would have declared this run a slowdown.
You might be tempted to increase the interval size to four or five standard errors from the mean. But in practice, three standard errors (99%) is the highest confidence you should aim for. You can't demand the confidence of the Large Hadron Collider experiments from your Ruby tests. If your tests are still not reliable, step back and look for more external factors, or start excluding outliers in measurements.
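Outlier exclusion has to happen before the numbers reach assert_performance, that is, on the raw timings that performance_benchmark collects. Here's one possible approach, purely illustrative and not this book's implementation: sort the raw timings, drop the extreme 10% on each side, and compute the average and standard deviation from what's left. The trim_outliers and average_and_stddev helpers are made-up names for this sketch:
| # Drop the fastest and slowest timings before computing statistics. |
| def trim_outliers(measurements, trim_fraction = 0.1) |
|   drop = (measurements.size * trim_fraction).floor |
|   sorted = measurements.sort |
|   drop.zero? ? sorted : sorted[drop...-drop] |
| end |
| |
| # Sample average and standard deviation of the remaining timings. |
| def average_and_stddev(measurements) |
|   average = measurements.sum / measurements.size |
|   variance = measurements.sum { |m| (m - average)**2 } / (measurements.size - 1) |
|   [average, Math.sqrt(variance)] |
| end |
| |
| # With 30 raw timings and a 10% trim, 24 measurements remain: |
| # average, stddev = average_and_stddev(trim_outliers(raw_timings)) |
If you go this way, remember that compare_performance divides by a hard-coded 30 when it computes the standard error, so you'd adjust that denominator to the trimmed measurement count as well.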