Pick Low-Hanging Fruit

Finally, it’s time to optimize. We’ll start by looking at the flat profile in KCachegrind and sort it by Self time, as shown in the figure.

[Figure: KCachegrind flat profile sorted by Self time at the start of the optimization session]

Object#parse_col comes first, taking more than half a second of execution time. The function itself is not that slow, but we call it 150,000 times. Also, it calls two slow functions: <Class::Date>#parse and Regexp#match.

We should definitely concentrate our optimization on this group of three functions. Why is column parsing so slow? It’s not because it takes too much time to parse one row. We do that in about 3 microseconds, but repeat that 150,000 times (50,000 rows multiplied by 3 columns). Can we reduce the number of repetitions? Not unless we change the way we parse the data or the amount of data itself.
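As a quick sanity check on those numbers, here's the arithmetic in runnable form. The 0.5 s total is taken from the profile discussed above and is only illustrative:

```ruby
# Rough arithmetic behind the claim above. The 0.5 s total comes from the
# profile discussed in the text and is only illustrative.
total_time   = 0.5                      # seconds spent in parse_col and its callees
rows         = 50_000
cols_per_row = 3
calls        = rows * cols_per_row      # 150,000 parse_col invocations
per_call     = total_time / calls

puts "calls: #{calls}"
puts format("per call: %.1f microseconds", per_call * 1_000_000)
```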

Before we make any big architectural changes to the program, let’s search for the low-hanging fruit in the parse_col function itself.

chp4/app.rb

def parse_col(col)
  if col =~ /^\d+$/
    col.to_i
  elsif col =~ /^\d{4}-\d{2}-\d{2}$/
    Date.parse(col)
  else
    col
  end
end
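To make the three branches concrete, here's the same function in self-contained, runnable form with one example call per branch (the sample values are made up; it needs only the standard library's date):

```ruby
require "date"

# Runnable copy of the function from the text, with one example per branch.
def parse_col(col)
  if col =~ /^\d+$/
    col.to_i
  elsif col =~ /^\d{4}-\d{2}-\d{2}$/
    Date.parse(col)
  else
    col
  end
end

p parse_col("26")          # integer column -> 26
p parse_col("1987-04-13")  # date column    -> Date instance
p parse_col("Alice")       # anything else stays a string
```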

Inside it we see the call to Date#parse that's responsible for about 800 ms on its own. Yes, we can expect date parsing to be slow, but let's see what it's doing internally. Select Date#parse in the flat profile and look at the call graph in the bottom-right view, shown in the image.

[Figure: call graph of Date#parse in KCachegrind]

OK, it uses regular expressions to extract the year, month, and day numbers. But wait, we already do that ourselves in the elsif condition; Date#parse just repeats the same work. So why don't we use the results of our own matching and create the Date instance ourselves? Let's try that.

chp5/app_optimized1.rb

def parse_col(col)
  if col =~ /^\d+$/
    col.to_i
  elsif match = /^(\d{4})-(\d{2})-(\d{2})$/.match(col)
    Date.new(match[1].to_i, match[2].to_i, match[3].to_i)
  else
    col
  end
end

In addition to changing parsing to object creation, we had to make two other changes. First, the regular expression now has three groups to capture: year, month, and day. Second, we converted the captured results to integers because the Date constructor doesn’t accept string arguments.
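A minimal way to check this idea outside the profiler is a micro-benchmark comparing Date.parse with the manual Date.new construction. The sample date and iteration count here are arbitrary, and absolute timings will vary by machine and Ruby version:

```ruby
require "date"
require "benchmark"

# The pattern mirrors the one in parse_col; sample date and iteration
# count are arbitrary choices for this sketch.
DATE_RE = /^(\d{4})-(\d{2})-(\d{2})$/

col = "1961-04-12"
n = 10_000

slow = Benchmark.realtime do
  n.times { Date.parse(col) }
end

fast = Benchmark.realtime do
  n.times do
    m = DATE_RE.match(col)
    Date.new(m[1].to_i, m[2].to_i, m[3].to_i)
  end
end

puts format("Date.parse: %.4f s, manual Date.new: %.4f s", slow, fast)
```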

We’ll need to compare profiles before and after optimization, so we’ll save the new profile to a different file.

chp5/app_optimized1.rb

printer = RubyProf::CallTreePrinter.new(result)
printer.print(File.open("callgrind.out.app_optimized1", "w+"))

Before reprofiling, we run the tests to make sure we didn’t break anything while optimizing.

 
$ bundle exec ruby app.rb --test
# Running tests:
..
Finished tests in 4.115975s, 0.4859 tests/s, 0.7289 assertions/s.
2 tests, 3 assertions, 0 failures, 0 errors, 0 skips

Let’s run the application with bundle exec ruby app.rb, go to KCachegrind, open the new profile, and compare it to the original one.

[Figure: profile after replacing Date#parse with Date.new, compared to the original]

Hey, we actually optimized something! Look at the Incl. time for Global#[No method]: we reduced it by about 500 ms, a 20% speedup. Yes, we added about 100 ms for string-to-integer conversion, and we take additional time to extract groups from the matched strings. But we eliminated the 800 ms that date parsing took, so we came out net positive despite the additional work our program has to do.

What’s next? Let’s look at the flat profile one more time. Object#parse_col is still slow for no obvious reason. Remember, it’s just an if/else statement that doesn’t do anything on its own. So we’ll skip it and look at what’s next: regexp matching and string-to-integer conversion. We can’t really get rid of the latter because it’s part of our new manual date parsing code. So let’s see what we can do with the former. Go back to the source.

chp5/app_optimized1.rb

def parse_col(col)
  if col =~ /^\d+$/
    col.to_i
  elsif match = /^(\d{4})-(\d{2})-(\d{2})$/.match(col)
    Date.new(match[1].to_i, match[2].to_i, match[3].to_i)
  else
    col
  end
end

Let’s take a wild guess. What if we merge the two regexp matches in the if and elsif branches like this, retest, and reprofile?

chp5/app_optimized2.rb

def parse_col(col)
  if match = /^(\d+)$|^(\d{4})-(\d{2})-(\d{2})$/.match(col)
    if match[1]
      match[1].to_i
    else
      Date.new(match[2].to_i, match[3].to_i, match[4].to_i)
    end
  else
    col
  end
end

[Figure: profile of the merged-regexp version]

But this time there’s no improvement. We even lost about 200 ms, if you compare Incl. time for Global#[No method] between this and the previous optimization attempt. Why? The flat profile can tell us. The increased complexity of the regular expression negated any potential benefit: matching now takes twice as much time.
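To convince ourselves that the merged-regexp version at least behaves the same, here's a quick equivalence check of the two approaches. The function names are made up for this sketch, and it says nothing about speed, which only profiling can settle:

```ruby
require "date"

SIMPLE_INT  = /^\d+$/
SIMPLE_DATE = /^(\d{4})-(\d{2})-(\d{2})$/
MERGED      = /^(\d+)$|^(\d{4})-(\d{2})-(\d{2})$/

# Two-regexp version, as in app_optimized1.
def parse_with_two_regexps(col)
  if col =~ SIMPLE_INT
    col.to_i
  elsif (m = SIMPLE_DATE.match(col))
    Date.new(m[1].to_i, m[2].to_i, m[3].to_i)
  else
    col
  end
end

# Merged-regexp version, as in app_optimized2.
def parse_with_merged_regexp(col)
  if (m = MERGED.match(col))
    m[1] ? m[1].to_i : Date.new(m[2].to_i, m[3].to_i, m[4].to_i)
  else
    col
  end
end

samples = ["26", "1987-04-13", "Alice"]
samples.each do |s|
  raise "mismatch for #{s}" unless parse_with_two_regexps(s) == parse_with_merged_regexp(s)
end
puts "both versions agree on #{samples.size} samples"
```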

So this optimization didn’t work. What lesson have we just learned? Always complete the cycle: profile, optimize, and profile again. Your “optimization” might not be an optimization at all, and you never know unless you reprofile.

What else can we optimize? Let’s tackle the issue of Object#parse_col’s large self time. It takes 27% of total execution time. That’s too much self time for a function that’s just an if/else statement that calls other functions. What’s the deal?

Since there’s no code in it other than function calls, it must be the function calls themselves that take this much time. Ruby is an interpreted language, and every function call has a cost because of that.

Let’s look again at the code and estimate how many functions Object#parse_col calls.

chp5/app_optimized3.rb

def parse_col(col)
  if col =~ /^\d+$/
    col.to_i
  elsif match = /^(\d{4})-(\d{2})-(\d{2})$/.match(col)
    Date.new(match[1].to_i, match[2].to_i, match[3].to_i)
  else
    col
  end
end

- Two calls, to Regexp#initialize and String#=~. These are part of the if condition and are executed every time we call Object#parse_col.

- One call to String#to_i. It’s executed in only one-third of the cases, because we have three columns to parse in our example data and only one of them is a number.

- Again two calls, to Regexp#initialize and Regexp#match. These are executed in two-thirds of the cases.

- One call to the Date constructor. We have one date column out of three, so it’s executed in only one-third of the cases.

- And while it’s not exactly a function call, there’s some work that Ruby does to return the value from the function.

In total we have an average of 2 + 1/3 + 2 × 2/3 + 1/3 = 4 function calls, plus one value returned, for each Object#parse_col invocation.

To test our theory we’ll add a fake column parsing function that will call another dummy function 2 2/3 times on average. We’ll make 150,000 calls to our fake function, the same number as the real one gets, and compare their self times.

chp5/app_optimized3.rb

def dummy
end

def fake_parse_col
  dummy; dummy
  dummy if rand < 2/3.0
end

# ...

data = generate_test_data
GC.disable
result = RubyProf.profile do
  people = parse_data(data)
  find_youngest(people)
  150000.times { fake_parse_col }
end

printer = RubyProf::CallTreePrinter.new(result)
printer.print(File.open("callgrind.out.app_optimized3", "w+"))

Our fake column parsing function makes three function calls every time: two to dummy and one to rand. In two-thirds of the cases there’s one more dummy invocation, so dummy is called 2 2/3 times on average and the total comes to about 3 2/3 function calls per fake_parse_col, close to the real function’s average.
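We can sanity-check the 2 2/3 figure for the dummy calls with a short simulation. The seed and trial count are arbitrary choices for this sketch:

```ruby
# Simulate the average number of dummy calls per fake_parse_col.
# The 2/3 threshold mirrors the code above; seed and trial count are arbitrary.
rng    = Random.new(42)
trials = 100_000

total_dummy_calls = trials.times.sum do
  2 + (rng.rand < 2 / 3.0 ? 1 : 0)
end

average = total_dummy_calls.to_f / trials
puts format("average dummy calls per fake_parse_col: %.3f", average)
```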

Let’s look at the flat profile in KCachegrind.

[Figure: flat profile showing Object#fake_parse_col next to Object#parse_col]

It turns out our theory is correct. Look how similar the self times are for Object#parse_col and Object#fake_parse_col. Does this mean a function call in Ruby has a large cost? Not necessarily. Remember, we’re running our code under the profiler, which does extra work (usually measurements) for each function call.

What part of that cost is the profiler upkeep? It’s easy to find out—we’ll just run the same code from the Ruby console without any profiler and use Benchmark.realtime for measurements.

 
$ irb
irb(main):001:0> def dummy
irb(main):002:1> end
=> :dummy
irb(main):003:0> def fake_parse_col
irb(main):004:1> dummy; dummy;
irb(main):005:1* dummy if rand < 2/3.0
irb(main):006:1> end
=> :fake_parse_col
irb(main):007:0> require 'benchmark'
=> true
irb(main):008:0> Benchmark.realtime { 150000.times { fake_parse_col } }
=> 0.049100519

That’s approximately 10 times faster than under the profiler. What’s our conclusion? The profiler adds up to a tenfold overhead to the Ruby function call cost, so we don’t need to optimize functions just because of their large self times. Without the profiler, they won’t cause a performance problem.
