Profile Memory with Ruby-Prof

We can use the ruby-prof profiler for memory optimization too. We can ask it to measure either the size or the number of memory allocations, get a profile similar to those we saw earlier, and visualize it with the same tools. Even the approach to memory profiling is going to be similar. But there’s one catch: we’ll need a patched Ruby interpreter for memory profiling to work. Why? Because Ruby doesn’t give the profiler enough information about memory allocation by default—exactly for performance reasons.

If you aren’t adventurous enough to patch and recompile Ruby, you can skip this section and go directly to the next one, where I’ll explain how to incorporate memory measurements into your code. If you’re still with me here, I have good news. It’s not that hard to get the patched Ruby, at least on development boxes.

Memory profiling patches are included in the RailsExpress patch set[19] maintained by Stefan Kaes along with other performance-related patches. I won’t describe the whole patch set, but be sure to look at it. You may find some of those patches useful.

We’ll continue to use rbenv here. We can compile Ruby with RailsExpress patches with both rbenv and rvm. The former needs an additional plug-in[20] with build scripts that I wrote and maintain. If you, like me, use rbenv, install the plug-in now.

At this point we’ve installed the plug-in, and rbenv install -l will show us which RailsExpress versions we can install.

 
$ ​rbenv install -l | grep railsexpress
 
1.9.3-p551-railsexpress
 
2.0.0-p598-railsexpress
 
2.1.5-railsexpress
 
$ ​rbenv install -k 1.9.3-p551-railsexpress
 
$ ​rbenv install -k 2.0.0-p598-railsexpress
 
$ ​rbenv install -k 2.1.5-railsexpress

These versions were the latest as of this writing. Run rbenv install -l to find the latest available patched Ruby version for you.

Also make sure you have ruby-prof gem installed for all patched Rubys:

 
$ ​rbenv shell 1.9.3-p551-railsexpress
 
$ ​gem install ruby-prof

As of this writing, memory profiling patches work only with Ruby 1.8.7, 1.9.3, and 2.0.0; they don’t support 2.1 and 2.2. When patches don’t work, you get a memory profile with all measurements equal to zero. Try memory profiling the simple application with your Ruby version and see whether you get the results.

If you use Ruby 2.1 or 2.2, try memory profiling your app under 2.0. These versions are mostly compatible and differ only by the garbage collector. That difference has no effect on most types of memory profiling.

Now that you have a working memory profiling config, let’s explore memory profiling. Here’s an extremely simple program that allocates a 10 MB string in memory and then transforms it to uppercase.

chp6/memprof_app.rb
 
require ​'rubygems'
 
require ​'ruby-prof'
 
require ​'benchmark'
 
 
exit(0) ​unless​ ARGV[0]
 
 
GC.enable_stats
 
RubyProf.measure_mode = RubyProf.const_get(ARGV[0])
 
 
result = RubyProf.profile ​do
 
 
str = ​'x'​*1024*1024*10
 
str.upcase
 
 
end
 
 
printer = RubyProf::FlatPrinter.new(result)
 
printer.print(File.open(​"​#{ARGV[0]}​_profile.txt"​, ​"w+"​), min_percent: 1)
 
 
printer = RubyProf::CallTreePrinter.new(result)
 
printer.print(File.open(​"callgrind.out.memprof_app"​, ​"w+"​))

When profiling your application, don’t forget to call the GC.enable_stats function before profiling as we did in this example. Otherwise all your measurements will be zero.

Ruby-prof comes with four memory profiling modes:

  • RubyProf::MEMORY measures the total memory usage.

  • RubyProf::ALLOCATIONS measures the number of object allocations.

  • RubyProf::GC_RUNS measures the number of GC runs.

  • RubyProf::GC_TIME measures the GC time.

Our program takes memory profiling mode as an argument and produces the flat profile named by that mode. So, let’s look at what profiles we can get from all these modes.

Joe asks:
Joe asks:
Should I Disable GC During Memory Profiling?

No, we disabled GC for CPU profiling because it adds extra time to random places. But garbage collection neither allocates extra objects nor uses additional memory in a way that the profiler can detect, so it doesn’t have any effect on memory profiles. And for some profiles like GC runs and time, we even have to keep GC turned on.

Profile Total Memory Usage

This profile will show you how much memory your program allocates and where. This is the most important report because there’s a direct relationship between the memory usage, number of garbage runs, and performance.

Let’s run the example program and look at the report. This examples uses Ruby 1.9.3 since it’s supported best by RailsExpress memory patches.

 
$ ​rbenv shell 1.9.3-p551-railsexpress
 
$ ​ruby memprof_app.rb MEMORY

Here’s the report—yours might have slightly different numbers.

 
Thread ID: 4082460
 
Fiber ID: 5403500
 
Total: 21508.534180
 
Sort by: self_time
 
 
%self total self wait child calls name
 
52.38 11265.339 11265.339 0.000 0.000 3 String#*
 
47.61 10240.337 10240.337 0.000 0.000 1 String#upcase
 
 
* indicates recursively called methods

The flat profile measures memory usage in kilobytes. As we’d expect, the program takes slightly more than 10 MB during the large string creation, and 10 MB more to do string replacement.

Note that the numbers we see in the profiler represent only new memory allocated and used in any given function. This has no relation to the current memory usage during the function execution. The profiler shows you neither how much memory the garbage collector reclaimed nor how much memory was allocated before the function was called. The only way to know current memory usage is to ask the operating system.

For optimization, the current memory usage is less relevant than the amount of additional memory used by a function. Even if you temporarily allocate 100 MB to do some operation, you still create 100 MB worth of work for the garbage collector. Current memory usage might have a peak, but only if you allocate more than the current Ruby heap size. Often in production you won’t see the peak at all, and your code will keep being slow until you optimize those 100 MB.

You now understand what you see in the profile. So let’s optimize memory.

As you’ll recall from Modify Strings in Place, we can modify our string in place to reduce the memory consumption. Let’s do that and reprofile.

chp6/memprof_app_optimized.rb
 
require ​'rubygems'
 
require ​'ruby-prof'
 
require ​'benchmark'
 
 
exit(0) ​unless​ ARGV[0]
 
 
GC.enable_stats
 
RubyProf.measure_mode = RubyProf.const_get(ARGV[0])
 
 
result = RubyProf.profile ​do
 
 
str = ​'x'​*1024*1024*10
 
str.upcase!
 
 
end
 
 
printer = RubyProf::FlatPrinter.new(result)
 
printer.print(File.open(​"​#{ARGV[0]}​_optimized_profile.txt"​, ​"w+"​),
 
min_percent: 1)

This is how the profile looks for the optimized program:

chp6/MEMORY_optimized_profile.txt
 
Thread ID: 3407160
 
Fiber ID: 4578980
 
Total: 11269.050781
 
Sort by: self_time
 
 
%self total self wait child calls name
 
99.97 11265.378 11265.378 0.000 0.000 3 String#*
 
 
* indicates recursively called methods

As expected, the program takes half as much memory. String#upcase! doesn’t even appear in the profile because we’ve limited it to functions that take more than 1% of memory.

There’s always some hidden knowledge that the profiler will reveal—especially the memory profiler. Remember iterators that allocate extra objects from Chapter 2, Fix Common Performance Problems? Guess how I learned about them? By profiling a slow piece of my own code, of course. That time I profiled allocations, not memory usage. So let’s see how to do that with ruby-prof.

Ruby-Prof Object Allocations

Garbage collection time increases not only with increased memory usage, but also with more allocations. If you think about it, it’s natural that the more objects the garbage collector has to process, the more time it will take. Take a peek at Chapter 10, Tune Up the Garbage Collector if you can’t wait to learn more about that.

Let’s profile allocations in our example program before String#upcase! optimization.

 
$ ​rbenv shell 1.9.3-p551-railsexpress
 
$ ​ruby memprof_app.rb ALLOCATIONS
chp6/ALLOCATIONS_profile.txt
 
Thread ID: 15433020
 
Fiber ID: 16613520
 
Total: 7.000000
 
Sort by: self_time
 
 
%self total self wait child calls name
 
42.86 3.000 3.000 0.000 0.000 3 String#*
 
28.57 2.000 2.000 0.000 0.000 1 String#upcase
 
28.57 7.000 2.000 0.000 5.000 1 Global#[No method]
 
 
* indicates recursively called methods

In our simple case the allocations are not the problem. We create just seven objects. Two objects created in Global#[No method] are str and ’x’. String multiplication requires three more objects for each of the three “star” operators. String#upcase produces two more objects. One of them is the resulting string in uppercase. Another one is a temporary object that Ruby creates internally when the original string doesn’t fit into the Ruby object and requires extra heap memory (when string length is more than 23 characters on most modern computers).

Our optimized program does exactly two allocations less.

 
$ ​rbenv shell 1.9.3-p551-railsexpress
 
$ ​ruby memprof_app_optimized.rb ALLOCATIONS
chp6/ALLOCATIONS_optimized_profile.txt
 
Thread ID: 13865760
 
Fiber ID: 15186560
 
Total: 5.000000
 
Sort by: self_time
 
 
%self total self wait child calls name
 
60.00 3.000 3.000 0.000 0.000 3 String#*
 
40.00 5.000 2.000 0.000 3.000 1 Global#[No method]
 
 
* indicates recursively called methods

There’s no line for String#upcase! in the profile. That’s because it doesn’t need any extra memory.

I keep saying that it’s best to use less memory and create fewer objects so that garbage collection has less work to do. So why don’t we measure the number of GC runs and time directly? Let’s see.

Ruby-Prof GC Runs and Times

Yes, ruby-prof can measure the number of GC runs and the amount of time GC took.

 
$ ​rbenv shell 1.9.3-p551-railsexpress
 
$ ​ruby memprof_app.rb GC_RUNS
 
$ ​ruby memprof_app.rb GC_TIME
chp6/GC_RUNS_profile.txt
 
Thread ID: 9671460
 
Fiber ID: 10992080
 
Total: 3.000000
 
Sort by: self_time
 
 
%self total self wait child calls name
 
66.67 2.000 2.000 0.000 0.000 1 String#upcase
 
33.33 1.000 1.000 0.000 0.000 3 String#*
 
 
* indicates recursively called methods
chp6/GC_TIME_profile.txt
 
Thread ID: 16753440
 
Fiber ID: 18074020
 
Total: 0.005989
 
Sort by: self_time
 
 
%self total self wait child calls name
 
62.35 0.004 0.004 0.000 0.000 1 String#upcase
 
37.65 0.002 0.002 0.000 0.000 3 String#*
 
 
* indicates recursively called methods

Why didn’t we start by measuring those in the first place? It seems that these numbers are exactly what we need to know—except that they aren’t. Ruby GC is an adaptive system. It adjusts its settings during the program runtime depending on current memory usage, memory allocation pattern, age of allocated objects, original GC settings, and many other factors.

This means that we need to replicate the same program’s state in memory before each profiling session in order to expose the same GC behavior. That is of course not possible except in the simplest cases. Any production application is guaranteed to be in a different state in memory before running the same piece of code that we’re trying to profile. And the same code that triggers GC several times in one case may not trigger it at all in another.

That’s why the numbers of GC runs and time are useless for optimization. We should optimize the memory usage instead because it doesn’t change no matter when the code is executed.

Visualize Memory Profile with KCachegrind

We can visualize all memory profiles with KCachegrind in the same way as CPU profiles. The optimization process is also the same, and will look exactly like the one we went through earlier in this chapter. Let’s look at the unoptimized app memory profile in KCachegrind.

images/src/kcachegrind_memory_profile.png

As you can see, it’s the same information that we saw in the flat text report from ruby-prof. The units (KB) are also the same.

As long as we have patched Ruby and can use it to run our application, memory profiling is easy. But what if we can’t or don’t want to run patched Ruby? Well, there’s still one option open to us. Let’s look at it next.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.216.96.94