Detect Excessive Memory Usage

But before we get to that, let’s talk about how to determine that memory usage is in fact a problem in your application. For that you’ll need two kinds of tools: one for monitoring and one for profiling.

No matter where you deploy, on servers or on customers’ workstations, you’ll need a monitoring tool to show your application’s memory usage. In the Ruby on Rails world the go-to monitoring tool is New Relic.[13] Some deployment platforms, such as Heroku, have their own monitoring. If you prefer to set up your own solution, tools are available for that too. Good examples are Nagios[14] and Munin.[15]
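If you just need a quick number before setting up any of these tools, you can poll your process's memory from Ruby itself. This is a minimal sketch, not a monitoring solution; `current_rss_mb` is a hypothetical helper name, and it shells out to `ps`, which is available on Linux and Mac OS.

```ruby
# Minimal sketch: sample this process's resident set size (RSS) with ps.
# current_rss_mb is a hypothetical helper, not part of any monitoring tool.
def current_rss_mb
  `ps -o rss= -p #{Process.pid}`.to_i / 1024.0  # ps reports RSS in KB
end

puts format("RSS: %.1f MB", current_rss_mb)
```

A real monitoring tool would sample a value like this periodically and chart it over time, which is exactly what the tools above do for you.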

So what can a monitoring tool show you? Something like this:

[Image: images/src/heroku_monitoring_memory_leak.png — Heroku memory metrics chart]

That’s the Heroku memory metrics chart. You can see that once the application starts, its memory usage grows steadily over the course of the next few hours. When it reaches 512 MB, Heroku restarts the application, and then the growth begins again. This definitely smells like either a memory leak or a memory-intensive operation on a large dataset.
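A chart like that usually means some data structure keeps growing and surviving every GC run. Here's a hedged illustration of the classic pattern, with hypothetical names that are not from the application above:

```ruby
# Hypothetical illustration of the leak pattern behind such a chart:
# a global structure that grows on every request and is never evicted.
REQUEST_LOG = []

def handle_request(payload)
  REQUEST_LOG << payload   # retained forever; RSS climbs until a restart
  payload.upcase
end

3.times { |i| handle_request("request-#{i}") }
puts REQUEST_LOG.size
```

Because `REQUEST_LOG` is always reachable, GC can never reclaim its contents, and memory grows with every request until the platform restarts the process.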

Should you see something like that happening in your application, your first task is to investigate what is consuming that memory. So next we’ll talk about the two tools you can use for that.

Profile Memory with Valgrind Massif

Valgrind[16] is a profiler for C and C++ programs that collects different data depending on the “tool” that you use. For example, Memcheck records memory leaks, Callgrind records execution times (in the same way ruby-prof does), Massif records heap usage, and so on.

Valgrind Massif produces a chart showing heap usage over time, including information about which parts of the program are responsible for the most memory allocations. Raw Massif output is not human readable, so again you’ll need a visualization tool. The best multiplatform one is ms_print, which is included in the Valgrind suite. The best third-party tool is Massif Visualizer.[17] The latter works on Linux, with packages available for major distributions, and on Mac OS via my Homebrew KDE tap.[18]

Let me show you a sample Massif profile for our—already optimized—application from the previous chapter. But first let’s strip everything related to testing and ruby-prof profiling from it. The program should look like this:

chp6/app_optimized4.rb

require 'date'

# This generates CSV like
# 1,John McTest,1980-07-01
# 2,Peter McGregor,1985-12-23
# 3,Sylvia McIntosh,1989-06-13
def generate_test_data
  50000.times.map do |i|
    name = ["John", "Peter", "Sylvia"][rand(3)] + " " +
           ["McTest", "McGregor", "McIntosh"][rand(3)]
    [i, name, Time.at(rand * Time.now.to_i).strftime("%Y-%m-%d")].join(',')
  end.join("\n")
end

def parse_data(data)
  data.split("\n").map! { |row| parse_row(row) }
end

def parse_row(row)
  col1, col2, col3 = row.split(",")
  [
    col1.to_i,
    col2,
    Date.new(col3[0,4].to_i, col3[5,2].to_i, col3[8,2].to_i)
  ]
end

def find_youngest(people)
  people.map! { |person| person[2] }.max
end

data = generate_test_data
people = parse_data(data)
find_youngest(people)
One more thing: you can run only native executables in Valgrind. So if you are using rbenv or any other wrapper script, as I do, you’ll need to extract the full path to the Ruby executable from it.

 
$ valgrind --tool=massif `rbenv which ruby` app_optimized4.rb
==17814== Massif, a heap profiler
==17814== Copyright (C) 2003-2013, and GNU GPL'd, by Nicholas Nethercote
==17814== Using Valgrind-3.10.0 and LibVEX; rerun with -h for copyright info
==17814== Command: ~/.rbenv/versions/2.2.0/bin/ruby app_optimized4.rb
==17814==
==17814==

This will produce an output file named massif.out.PID; in my case, it’s called massif.out.17814. Massif prints the process identifier (PID) of the profiled program to the console, so look there to locate the correct profile.

Let’s open our profile in Massif Visualizer. The result is shown in the screenshot.

[Image: images/src/massif-visualizer-example-app-optimized.png — Massif Visualizer showing the profile of the optimized application]

What we see here is a memory consumption chart for the duration of the program’s execution, plus a series of snapshots that Massif takes periodically. Each snapshot records how much memory the program was using at the time. Some snapshots are detailed and include a call stack, so you can figure out which part of the code took the most memory. Unfortunately, you will see only C/C++ functions in that stack: Valgrind can’t dig into the Ruby code. Even so, you can still figure out what happens there. Let’s look closely at one such snapshot.

Particularly interesting is the peak snapshot, which I highlighted in the screenshot. In the following image you’ll see its call stack from the Massif Data view.

[Image: images/src/massif-visualizer-example-app-optimized-snapshot.png — call stack of the peak snapshot in the Massif Data view]

The first thing we see here is that our code doesn’t use that much memory, peaking at only 21 MB. Roughly half of that stores our unparsed and parsed data; you will see it under the ruby_xmalloc2 branch. Most of the remaining half is the Ruby object heap, the heap_assign_page branch.

Looking through the Ruby internals stack can be daunting at first. But if you know even a bit about Ruby internals, some of which you’ll learn about later in this book in Chapter 10, Tune Up the Garbage Collector, you should be able to figure it out. For example, how do we know that heap_page_allocate represents Ruby object creation? Because we see a newobj_of call beneath it, and we know that Ruby stores objects in its own heap.
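You can watch the effect of heap_page_allocate from Ruby itself: GC.stat reports how many object-heap pages exist. This is a hedged sketch; `:heap_allocated_pages` is the key name as of Ruby 2.2, and earlier versions use different names.

```ruby
# Sketch: observe Ruby's object heap growing as objects are created.
# :heap_allocated_pages is the GC.stat key name as of Ruby 2.2.
pages_before = GC.stat(:heap_allocated_pages)
objects = Array.new(200_000) { Object.new }  # forces new heap pages
pages_after = GC.stat(:heap_allocated_pages)

puts "heap pages: #{pages_before} -> #{pages_after}"
```

Each of those pages is carved into fixed-size object slots, which is why object creation in the Massif stack bottoms out in page-allocation functions rather than in per-object mallocs.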

The ruby_xmalloc2 branch is even more obvious. Inside it we only see what looks like constructor calls for arrays of strings. We don’t even have to know what exactly those internal Ruby functions do. What else can they be other than constructor calls for our unparsed and parsed data objects?

From the chart itself we see that up until snapshot #51 the memory consumption was growing linearly. If we expand that snapshot, we’ll see only the functions that create arrays of strings. That is the result of the data generation that we do in the generate_test_data function.

Once data is ready, our program takes 21 MB. Next we see a drop in memory usage by about 7 MB. It looks like GC was able to reclaim some memory. Then we see the 7 MB spike again, a result of data parsing and searching for the youngest person on the list. We can’t optimize data generation in our example, so if we were to optimize memory, data parsing and searching is where we’d concentrate our efforts.

Massif is a great tool for looking at memory consumption. But it lacks insight into your Ruby code, and you have to guess what happens there. It would be great to look into the Ruby call stack, wouldn’t it? It turns out there’s a tool that almost does that.

Profile Object Allocations with Stackprof

Stackprof is a Ruby tool that implements the same idea as Massif. Like Massif, it takes snapshots (samples, in Stackprof’s parlance) during program execution and produces a similar chart as the result. What’s the catch? It samples the number of object allocations, not memory consumption. Also, it works only with Ruby 2.1 and later.

For some programs the number of objects allocated is roughly proportional to total memory consumption. That’s the case for our example: all our strings are small, fit inside the Ruby object itself, and require no additional memory on the heap. But if your program allocates a small number of large objects, then Stackprof will not help you with memory optimization.
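The distinction is easy to see with ObjectSpace.memsize_of from the standard library's objspace extension. This is my example, not the book's: both strings below count as one allocation to Stackprof, yet their memory footprints differ by several orders of magnitude.

```ruby
require 'objspace'

# One allocation each, but very different memory footprints (my example).
small = "a" * 10          # embedded in the Ruby object slot itself
large = "a" * 1_000_000   # one object backed by ~1 MB of malloc'ed memory

puts ObjectSpace.memsize_of(small)
puts ObjectSpace.memsize_of(large)
```

A program full of `large`-style objects would look cheap in Stackprof's allocation counts while dominating your memory charts.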

Let’s see what Stackprof can find in our example.

First, let’s install Stackprof as a gem:

 
$ ​gem install stackprof

Now let’s require it in our example and wrap the code into the StackProf.run block:

chp6/app_optimized_stackprof.rb

require 'rubygems'
require 'stackprof'

# generate_test_data, parse_data, and find_youngest as defined earlier

StackProf.run(
    mode: :object,
    out: 'stackprof-object-app-optimized.dump',
    raw: true) do
  data = generate_test_data
  people = parse_data(data)
  find_youngest(people)
end

After we run the program, it creates the stackprof-object-app-optimized.dump file with the profiling results. It’s a binary file that we must view with Stackprof’s own visualization tools.

The plain-text report aggregates object allocation data either for the whole program or for any given function. Look at the report for our example program:

 
$ stackprof stackprof-object-app-optimized.dump --text
==================================
  Mode: object(1)
  Samples: 6149204 (0.00% miss rate)
  GC: 0 (0.00%)
==================================
     TOTAL    (pct)     SAMPLES    (pct)     FRAME
   5599193  (91.1%)     5599193  (91.1%)     block in Object#generate_test_data
    500000   (8.1%)      500000   (8.1%)     Object#parse_row
    550003   (8.9%)       50003   (0.8%)     Object#parse_data
   5599199  (91.1%)           6   (0.0%)     Object#generate_test_data
         2   (0.0%)           2   (0.0%)     Object#find_youngest
   6149204 (100.0%)           0   (0.0%)     block in main
   6149204 (100.0%)           0   (0.0%)     main
   6149204 (100.0%)           0   (0.0%)     main
    500000   (8.1%)           0   (0.0%)     block in Object#parse_data

Here, about 91% of all object allocations happen when we generate the test data. Parsing takes the rest, and only two allocations happen inside the find_youngest function that returns the program’s output.

So by looking at the Stackprof output you can determine which functions to optimize to reduce the memory consumption. In our example these are clearly Object#generate_test_data and Object#parse_row.

You can also dig deeper into the functions and see which lines generate too many objects. For example, let’s look at Object#parse_row.

 
$ stackprof stackprof-object-app-optimized.dump \
    --text --method "Object#parse_row"
samples:  500000 self (8.1%) / 500000 total (8.1%)
callers:
  500000 (100.0%) block in Object#parse_data
code:
                                 |  19  | def parse_row(row)
  250000 (4.1%) / 250000 (4.1%)  |  20  |   col1, col2, col3 = row.split(",")
                                 |  21  |   [
   50000 (0.8%) /  50000 (0.8%)  |  22  |     col1.to_i,
                                 |  23  |     col2,
  200000 (3.3%) / 200000 (3.3%)  |  24  |     Date.new(...)
                                 |  25  |   ]

Here date object creation is the biggest offender, with type conversion coming second.
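One hedged way to act on that finding (my suggestion, not the book's next step): ISO-8601 date strings like 1980-07-01 sort lexicographically, so for find_youngest alone we could compare the raw strings and skip the Date allocations entirely.

```ruby
# Sketch (my suggestion, not the book's): ISO-8601 date strings sort
# lexicographically, so finding the youngest needs no Date objects.
rows = ["1,John McTest,1980-07-01",
        "2,Peter McGregor,1985-12-23",
        "3,Sylvia McIntosh,1989-06-13"]

youngest = rows.map { |row| row.split(",")[2] }.max
puts youngest  # => "1989-06-13"
```

That trades flexibility (no Date arithmetic possible later) for cutting the single biggest allocation site the profile revealed.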

There are more ways to visualize Stackprof data. For example, you can generate an HTML page with a flame graph, much as Massif Visualizer does for Massif profiles.

The process of generating a flame graph may be a bit confusing. Once we have a Stackprof dump, we need to generate a JavaScript file with flame graph data:

 
$ stackprof stackprof-object-app-optimized.dump \
    --flamegraph > stackprof-object-app-optimized-flamegraph.js

Then we get a link to the HTML page that does visualization:

 
$ stackprof stackprof-object-app-optimized.dump \
    --flamegraph-viewer stackprof-object-app-optimized-flamegraph.js
open file:///path_to_stackprof/viewer.html?data=
  /path_to_book/chp6/stackprof-object-app-optimized-flamegraph.js

Finally, we copy the link from the output and open it in the browser.

Unlike Massif, Stackprof is OK to use in production. It’s built on top of rb_profile_frames(), a C API for fetching Ruby backtraces. The API performs no allocations and adds minimal CPU overhead. So unlike Massif or ruby-prof, which slow us down by 2 to 10 times, Stackprof is safe to use in a live system.

Massif only tells us when and where, approximately, memory gets used. Sometimes we’ll be able to guess what takes memory. Stackprof may point at the actual location in the code that takes the memory. But that only works if we’re allocating too many objects, and it won’t be useful if we allocate just a few heavy objects. In practice, to understand and fix the problem we’ll need to dig deeper. There are better tools for that, so let’s look at the available options next.
