Visualize with Ruby-Prof Printers

Ruby-prof can generate several types of human-readable reports. I find three of them to be the most useful: flat, call graph, and call stack reports. (You can see the full list in the ruby-prof documentation.)[10] Each serves a different purpose. But we’re learning by example here, and to show them to you, I’ll need a more sophisticated example to profile.

chp4/app.rb

require 'date'
require 'rubygems'
require 'ruby-prof'

# This generates CSV like
# 1, John McTest, 1980-07-01
# 2, Peter McGregor, 1985-12-23
# 3, Sylvia McIntosh, 1989-06-13
def generate_test_data
  50000.times.map do |i|
    name = ["John", "Peter", "Sylvia"][rand(3)] + " " +
           ["McTest", "McGregor", "McIntosh"][rand(3)]
    [i, name, Time.at(rand * Time.now.to_i).strftime("%Y-%m-%d")].join(',')
  end.join("\n")
end

def parse_data(data)
  data.split("\n").map! { |row| parse_row(row) }
end

def parse_row(row)
  row.split(",").map! { |col| parse_col(col) }
end

def parse_col(col)
  if col =~ /^\d+$/
    col.to_i
  elsif col =~ /^\d{4}-\d{2}-\d{2}$/
    Date.parse(col)
  else
    col
  end
end

def find_youngest(people)
  people.map! { |person| person[2] }.max
end

data = generate_test_data

GC.disable
result = RubyProf.profile do
  people = parse_data(data)
  find_youngest(people)
end

printer = RubyProf::FlatPrinter.new(result)
printer.print(File.open("app_flat_profile.txt", "w+"), min_percent: 3)

printer = RubyProf::GraphPrinter.new(result)
printer.print(File.open("app_graph_profile.txt", "w+"), min_percent: 3)

printer = RubyProf::CallStackPrinter.new(result)
printer.print(File.open("app_call_stack_profile.html", "w+"))

Our program generates random CSV with a list of people and their birth dates, parses it, and finds the youngest person on that list. Here I parse the CSV by hand to make the profile sophisticated, yet still simple enough to fully understand. At the end, I print the three reports I mentioned earlier, limiting the flat and graph reports to print only functions that take more than 3% of the total time.

The only external dependency of this example is the ruby-prof gem. I have a Gemfile like the following and run the application with bundle exec ruby app.rb.

chp4/Gemfile

source 'https://rubygems.org'

gem 'ruby-prof', require: false

All clear? OK, let’s run the app. It will generate three files: app_flat_profile.txt, app_graph_profile.txt, and app_call_stack_profile.html. These correspond to the three reports: flat, graph, and call stack. After you run the app, open these reports in your editor or web browser, and let’s see what useful information we can extract from each of them.

Flat Report: Find Which Functions Are Slow

The flat profile shows the amount of time spent in each function. For our example application, the flat profile looks like this:

chp4/app_flat_profile.txt

Thread ID: 70137029546800
Fiber ID: 70137039955760
Total: 1.882948
Sort by: self_time

 %self     total      self      wait     child     calls   name
 27.33     1.354     0.515     0.000     0.839    150000   Object#parse_col
 22.31     0.806     0.420     0.000     0.386     50000   <Class::Date>#parse
  8.59     0.162     0.162     0.000     0.000    100000   Regexp#match
  5.11     1.707     0.096     0.000     1.611     50000   Object#parse_row
  4.79     1.797     0.090     0.000     1.707     50002  *Array#map!
  4.66     0.088     0.088     0.000     0.000     50001   String#split
  4.46     0.084     0.084     0.000     0.000     50000   String#gsub!

* indicates recursively called methods

Let me first explain what the columns in the report mean.

%self

The percentage of the total time spent in this function alone; see the definition of self below.

total

The total time spent in this function, including the execution time of functions that it calls.

self

The time spent only in this function, excluding the execution time of functions that it calls.

wait

The time spent waiting for other threads. This will always be zero for single-threaded apps. I’ll sometimes omit this column from profiles included in this book to save some space.

child

The time spent in functions that are called from the current function.

calls

The total number of calls to this function.

The flat report is sorted by self time. So functions at the top of the report are the ones where our program spends most of the time.

For any given function, self time in the report is a sum of self times of all calls to that function, no matter where this call is in the code. For example, String#split is called once from the parse_data function and 50,000 times in the loop from parse_row. In the flat profile we see one line that reports all 50,001 String#split calls. All other metrics are also aggregated.

Also remember that self time doesn’t include the time of nested function calls. For example, self time for Object#parse_col doesn’t include the time of regular expression matching or date parsing. Despite being a simple-looking function with just an if-else statement, Object#parse_col still takes a considerable amount of time on its own. We’ll discuss the reasons shortly, but for now just keep this in mind.

As we now know, the functions at the top are the slowest. So the first thing you should do is to optimize them. There’s a chance this will make your code fast enough right away.

But often optimization doesn’t end here. It’s not enough to optimize the slowest functions by self time, for two reasons:

  • The profiler may report a large self time for a function even when there is nothing left in it to optimize. For example, our top offender, Object#parse_col, does almost nothing on its own: all it really executes is an if-else statement. And it's still slow!

  • A slow function may be a library function, or already so simple that you cannot optimize it further. In our example, Array#map! is the fifth slowest function by self time. But it's a library routine that we can't really improve.

In both cases the slowness is caused by the number of function calls rather than by the function code itself. The obvious way to optimize in this situation is to reduce the number of function calls. For that you need to know where the function is called and how often. But our flat profile doesn’t show that. It’s time to look at another profile: the graph report.
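To make "reduce the number of function calls" concrete, here's a sketch of what such an optimization could look like for our example (the name parse_row_positional is mine, and this is not the chapter's eventual solution): every row in our CSV has the same id,name,date shape, so we can parse columns positionally instead of dispatching to parse_col for each of the 150,000 columns.

```ruby
require 'date'

# Sketch only (my naming, not the chapter's solution): every row is
# "id,name,date", so parse the columns positionally and skip the
# per-column parse_col call -- 50,000 calls instead of 150,000,
# and no regexp matching at all.
def parse_row_positional(row)
  id, name, date = row.split(',')
  [id.to_i, name, Date.parse(date)]
end

parse_row_positional('1,John McTest,1980-07-01')
# => [1, "John McTest", #<Date: 1980-07-01 ...>]
```

Whether a change like this helps enough is exactly what a re-run of the profiler would tell you.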

Graph Report: Understand Which Parts of the Code Are Slow

The graph report is sorted by the total time spent in the function, including the time from functions that it calls. Let’s look at the graph profile of our example application and figure out what it means.

chp4/app_graph_profile.txt

Thread ID: 70137029546800
Fiber ID: 70137039955760
Total Time: 1.8829480049999994
Sort by: total_time

 %total  %self     total      self     child        calls  Name
------------------------------------------------------------------------
100.00%   0.01%    1.883     0.000     1.883            1  Global#[No method]
                   1.777     0.000     1.777          1/1    Object#parse_data
                   0.106     0.000     0.106          1/1    Object#find_youngest
------------------------------------------------------------------------
                   0.000     0.000     0.000  50000/50002    Object#parse_row
                   0.031     0.031     0.000      1/50002    Object#find_youngest
                   1.767     0.060     1.707      1/50002    Object#parse_data
 95.46%   4.79%    1.797     0.090     1.707        50002  *Array#map!
                   1.707     0.096     1.611  50000/50000    Object#parse_row
                   1.354     0.515     0.839 150000/150000   Object#parse_col
------------------------------------------------------------------------
                   1.777     0.000     1.777          1/1    Global#[No method]
 94.39%   0.00%    1.777     0.000     1.777            1  Object#parse_data
                   1.767     0.060     1.707      1/50002    Array#map!
                   0.010     0.010     0.000      1/50001    String#split
------------------------------------------------------------------------
                   1.707     0.096     1.611  50000/50000    Array#map!
 90.67%   5.11%    1.707     0.096     1.611        50000  Object#parse_row
                   0.077     0.077     0.000  50000/50001    String#split
                   0.000     0.000     0.000  50000/50002    Array#map!
------------------------------------------------------------------------
                   1.354     0.515     0.839 150000/150000   Array#map!
 71.88%  27.33%    1.354     0.515     0.839       150000  Object#parse_col
                   0.806     0.420     0.386  50000/50000    <Class::Date>#parse
                   0.033     0.033     0.000  50000/50000    String#to_i
------------------------------------------------------------------------
                   0.806     0.420     0.386  50000/50000    Object#parse_col
 42.79%  22.31%    0.806     0.420     0.386        50000  <Class::Date>#parse
                   0.162     0.162     0.000 100000/100000   Regexp#match
                   0.084     0.084     0.000  50000/50000    String#gsub!
                   0.046     0.046     0.000  50000/50000    String#[]=
                   0.039     0.039     0.000  50000/50000    MatchData#begin
                   0.031     0.031     0.000  50000/50000    Fixnum#div
                   0.024     0.024     0.000  50000/50000    MatchData#end
------------------------------------------------------------------------
                   0.162     0.162     0.000 100000/100000   <Class::Date>#parse
  8.59%   8.59%    0.162     0.162     0.000       100000  Regexp#match
------------------------------------------------------------------------
                   0.106     0.000     0.106          1/1    Global#[No method]
  5.61%   0.00%    0.106     0.000     0.106            1  Object#find_youngest
                   0.075     0.000     0.075          1/1    Enumerable#max
                   0.031     0.031     0.000      1/50002    Array#map!
------------------------------------------------------------------------
                   0.010     0.010     0.000      1/50001    Object#parse_data
                   0.077     0.077     0.000  50000/50001    Object#parse_row
  4.66%   4.66%    0.088     0.088     0.000        50001  String#split
------------------------------------------------------------------------
                   0.084     0.084     0.000  50000/50000    <Class::Date>#parse
  4.46%   4.46%    0.084     0.084     0.000        50000  String#gsub!
------------------------------------------------------------------------
                   0.075     0.000     0.075          1/1    Object#find_youngest
  3.98%   0.00%    0.075     0.000     0.075            1  Enumerable#max
                   0.075     0.035     0.040          1/1    Array#each
------------------------------------------------------------------------
                   0.075     0.035     0.040          1/1    Enumerable#max
  3.98%   1.87%    0.075     0.035     0.040            1  Array#each
                   0.040     0.040     0.000  49999/49999    Date#<=>

For this book I’ve generated the graph profile in the plain text format because it looks better on paper. But an HTML representation of the same report is easier to understand and more convenient to navigate because all function names are hyperlinks to their sections of the profile. To get the HTML report, replace the printer = RubyProf::GraphPrinter.new(result) line in our example with printer = RubyProf::GraphHtmlPrinter.new(result). It’s a good idea to always use the HTML report for your own profiling needs.

The meaning of the columns is the same as for the flat report. I removed the wait column from graph reports included in this book so that they fit the page. We won’t profile multithreaded applications, so the wait values will always be zero for us anyway.

Now let me guide you through this report. Each row contains the same function profiling data as in the flat profile, but with immediate function callers listed above and callees listed below.

The topmost function, called Global#[No method], represents either the part of the code between RubyProf.start and RubyProf.stop or the block passed to RubyProf.profile. It has no callers, so it comes first in its row. Below Global#[No method] you see its two callees: Object#parse_data and Object#find_youngest. Each is called exactly once. Hence the profile reports 1/1 calls, meaning that the function is called one time from the global block and there is only one such function call in total.

The second row, for Array#map!, requires more effort to decipher. The function data line is now in the middle because there are three callers above it and two callees below it. It may be confusing that the Array#map! data line is not the first in its row, but it's easy to tell it apart from the callers and callees: it's the only line in the row that has %total and %self values.

Total, self, and child times for the function data line have the same meaning as in the flat profile. These are, respectively, the total time that your program spends in the map iterator, the self time of the iterator function, and the time spent in the block that is passed to map and executed in a loop.

Let’s look at Array#map! caller lines and try to make sense out of them.

The Calls column shows that we executed Array#map! 50,002 times: 50,000 of them from Object#parse_row, one from Object#find_youngest, and another one from Object#parse_data. A quick look at the application code confirms these numbers.

The timing columns for the caller lines have a slightly different meaning. For each caller line these show the total, self, and child times that Array#map! took while that caller executed it.

The timing for Object#parse_data and Object#find_youngest makes perfect sense. Parsing took an order of magnitude more time than selecting an element from the array in the find_youngest function.

But the zeroes in total, self, and child times for Object#parse_row make no sense at first sight. Row parsing is one of the slowest operations according to our flat profile, isn't it? The trick is that the Object#parse_data caller line already includes the row parsing times, because it calls parse_row in a loop. To prevent double counting, the profiler allocates the time only to the topmost function in the call stack, which happens to be Object#parse_data.

What useful information can we extract from this report? First, we get a high-level overview of where our program spends its time. For example, our program spends almost 90% of its time iterating over the data to parse CSV rows. Second, we can walk through the profile top-down to get an idea of which part of the code to optimize. For example, we see that the slowest function by total time, Array#map!, calls Object#parse_row. That, in turn, calls String#split, which has a significant self time. So one way to optimize would be to reduce the number of String#split calls or to replace them with something else.
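As a sketch of that idea (parse_data_lines is my name; this is not the book's solution), we could stream over the input with String#each_line instead of materializing a 50,000-element array of row strings with a whole-input String#split:

```ruby
# Sketch (my naming): iterate rows with each_line rather than splitting
# the entire input into a 50,000-element array first. The per-row split
# remains, but the big up-front String#split call and its intermediate
# array of row strings are gone.
def parse_data_lines(data)
  rows = []
  data.each_line { |line| rows << line.chomp.split(',') }
  rows
end

parse_data_lines("1,a\n2,b\n")   # => [["1", "a"], ["2", "b"]]
```

Only a new profile run would show whether this particular change pays off; the point is that the graph report tells you where to look.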

But because of aggregation and double counting prevention, the graph report is still not good enough for top-down performance review. There’s a better report for that: the call stack report.

Call Stack Report: Find Which Execution Path Is Slowest

This report really shows the call tree (not a stack, despite the name). Each node in the tree looks like this:

 
% of total time (% of caller time) Function [# of calls, # of calls total]

You get the percentage of this function's execution time (including its callees) relative to the total execution time and relative to the caller function's time. You also get the number of calls to this function within the current execution path and the total number of calls.

Let’s see how the call stack profile looks for our example application.

chp4/app_call_stack_profile.html

Call tree for application app.rb
Generated on 2014-10-02 08:53:27 -0500 with options {}
Threshold: [1.0] [Apply] [Expand All] [Collapse All] [Show Help]
Thread: 70137029546800, Fiber: 70137039955760 (100.00% ~ 1.8829480049999994)

* [-] 100.00% (100.00%) Global#[No method] [1 calls, 1 total]
  o [-] 94.39% (94.39%) Object#parse_data [1 calls, 1 total]
    # [-] 93.83% (99.41%) Array#map! [1 calls, 50002 total]
      # [-] 90.67% (96.63%) Object#parse_row [50000 calls, 50000 total]
        # [-] 81.45% (89.84%) Array#map! [50000 calls, 50002 total]
          # [-] 71.88% (88.25%) Object#parse_col [150000 calls, 150000 total]
            # [-] 42.79% (59.52%) <Class::Date>#parse [50000 calls, 50000 total]
            # [ ]  1.76% (2.45%)  String#to_i [50000 calls, 50000 total]
        # [ ]  4.10% (4.52%)  String#split [50000 calls, 50001 total]
    # [ ]  0.56% (0.59%)  String#split [1 calls, 50001 total]
  o [+] 5.61% (5.61%) Object#find_youngest [1 calls, 1 total]
    # [+] 3.98% (70.93%) Enumerable#max [1 calls, 1 total]
      # [+] 3.98% (99.98%) Array#each [1 calls, 1 total]
        # [ ]  2.11% (52.96%) Date#<=> [49999 calls, 49999 total]
    # [ ]  1.63% (29.06%) Array#map! [1 calls, 50002 total]

You can read it top-down exactly as you would read the Ruby code. The code snippet we profile calls the parse_data and find_youngest functions. parse_data calls parse_row in a loop, which in turn calls parse_col in another loop.
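A quick way to sanity-check your reading of the tree: a node's percent-of-total equals its parent's percent-of-total multiplied by the node's percent-of-caller. Checking that relationship with the Array#map! node under parse_data (just arithmetic, nothing profiler-specific):

```ruby
# The call stack report's two percentages are related: a node's share of
# total time is its parent's share of total time multiplied by the
# node's share of the caller's time.
parse_data_pct_total = 94.39   # Object#parse_data, % of total time
map_pct_of_caller    = 99.41   # Array#map! under parse_data, % of caller
map_pct_total = parse_data_pct_total * map_pct_of_caller / 100.0
puts map_pct_total.round(2)    # prints 93.83, matching the Array#map! node
```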

What I like about this type of profile is that the total time percentages are not aggregated across code branches. Here's what I mean: both flat and graph reports have only one line for Array#map!, even though it's actually called from three different places in the code. The call stack profile, in contrast, shows each call site separately. In our example profile you clearly see that the map! from find_youngest takes less than 2% of the execution time, whereas the map! that parses columns accounts for about 90% of the total time.

The percentage of caller time is useful to see which branch of code is slower. For example, in the find_youngest branch, two-thirds of the time is spent finding the maximum element in the array and only one-third of the time preparing that array.

The call stack profile is definitely easier to understand than the flat or graph profiles. I would even call it intuitive. But such a report works best for small chunks of code. Profiling a large codebase produces a huge report, better suited for learning how unfamiliar code works than for optimization.

That’s why my recommendation is to start with aggregated flat and graph reports and refer to the call stack report only when the data from those two is unclear.

And even when you can’t seem to make sense out of the aggregated reports, a proper visualization tool can help you more than the call stack report will. Let me show you one such tool.
