Write Less Ruby

One of the gurus who taught me programming used to say that the best code is the code that does not exist. If we could solve the problem without writing any code, then we wouldn’t have to optimize it. Right?

Unfortunately, in the real world we still write code to solve our problems. But that doesn’t mean that it has to be Ruby code. Other tools do certain things better. We have seen that Ruby is especially bad in two areas: large dataset processing and complex computations. So let’s see what you can use instead, and how that improves performance.

Offload Work to the Database

The Ruby community tends to view databases only as data storage tools. Rails developers are especially prone to this because they often use ActiveRecord and ActiveModel abstractions without having to interface with the database directly. So yes, you can build a Rails application without knowing any SQL or understanding the differences between MySQL and PostgreSQL. But by doing this, you’ll trade performance for convenience and miss out on the data processing power that databases provide.

It turns out—surprise, surprise—that databases are really good at complex computations and other kinds of data manipulation. Let me show you just how good they are.

Let’s imagine we have a large database of company employees: say, 10,000 people working in 25 departments. We know each person’s salary, and we want to calculate each employee’s rank within their department by salary.

I’ll use PostgreSQL for this example and will create random data for simplicity. To reproduce this example, you should install and launch the PostgreSQL database server.

 
$ createdb company_data
$ psql company_data

create table empsalaries(
  department_id integer,
  employee_id integer,
  salary integer);

-- floor(random()*25) yields 0..24, so department_id falls in 1..25;
-- the salary expression yields 50,000..250,000
insert into empsalaries (
  select (1 + floor(random()*25)), *, (50000 + round(random()*200000))
  from generate_series(1, 10000)
);

create index empsalaries_department_id_idx on empsalaries (department_id);

Let me explain this in case you’re not familiar with PostgreSQL. The insert statement will generate a series of 10,000 rows (our employee IDs), and then for each of those rows will assign a random department ID from 1 to 25 and a random salary from $50,000 to $250,000.
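If it helps to see the shape of that data outside of SQL, here is a rough Ruby sketch that builds equivalent rows in memory. The sample_empsalaries helper is hypothetical and not part of the book's code; it only mirrors the ranges described above (department IDs 1 to 25, salaries from $50,000 to $250,000).

```ruby
# Hypothetical helper, not part of the book's code: builds rows shaped like
# the ones the insert statement is described as generating.
def sample_empsalaries(count, rng = Random.new)
  (1..count).map do |employee_id|
    {
      :department_id => 1 + rng.rand(25),           # integer in 1..25
      :employee_id   => employee_id,                # 1..count, like generate_series
      :salary        => 50_000 + rng.rand(200_001)  # integer in 50,000..250,000
    }
  end
end
```

Passing a seeded generator, for example sample_empsalaries(5, Random.new(42)), makes the sample repeatable between runs.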

Let’s first use ActiveRecord to calculate an employee rank. For that we’ll create a folder called group_rank with Gemfile and group_rank.rb in it.

chp2/group_rank/Gemfile

source 'https://rubygems.org'

gem 'activerecord'
gem 'pg'

chp2/group_rank/group_rank.rb

require 'rubygems'
require 'active_record'
require 'benchmark'

ActiveRecord::Base.establish_connection(
  :adapter => "postgresql",
  :database => "company_data"
)

# Maps to the empsalaries table; rank is computed in Ruby, not stored
class Empsalary < ActiveRecord::Base
  attr_accessor :rank
end

time = Benchmark.realtime do
  # Load all rows, sorted so each department's employees are adjacent
  salaries = Empsalary.all.order(:department_id, :salary)

  key, counter = nil, nil
  salaries.each do |s|
    # Restart the counter whenever a new department begins
    if s.department_id != key
      key, counter = s.department_id, 0
    end
    counter += 1
    s.rank = counter
  end
end

puts "Group rank with ActiveRecord: %5.3fs" % time

Now let’s run bundler to install all the required gems and launch the application to see how long it takes to execute:

 
$ cd group_rank
$ rbenv shell 2.2.0
$ bundle install --path .bundle/gems
$ bundle exec ruby group_rank.rb
Group rank with ActiveRecord: 0.264s

Taking 264 ms to process a mere 10,000 rows is pretty bad. Now try to do the same thing with 100,000 rows and 1 million rows. Ruby 2.0 and later will take 2.4 and 24 seconds, respectively. Older Rubies like 1.8 and 1.9 might not even finish because GC will kick in too often. I was patient enough to wait 110 seconds for Ruby 1.9 to process 1 million rows. I’m quite sure the users of my code are not that patient.

Now let’s see how fast PostgreSQL can do the same thing on 10,000 rows:

 
$ psql company_data
=# \timing
Timing is on.
=# select department_id, employee_id, salary,
     rank() over(partition by department_id order by salary desc)
   from empsalaries;
Time: 22.573 ms

PostgreSQL is more than ten times faster: about 23 ms versus 264 ms. As a bonus, it also scales nicely. It needs 280 ms for 100,000 rows and 2.3 seconds for 1 million rows.

Notice that PostgreSQL is consistently about ten times faster than the best Ruby result. Yes, my example relies on window functions, which not every database supports. But that’s exactly my point. The database is much better at data processing, and that makes a huge difference. Ten times is not the limit, either. Sometimes it’s the difference between never finishing the task in Ruby and completing it in several seconds simply by letting your database do what it’s good at.
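To keep the ranking in the database while still driving it from Ruby, the same window-function query can be sent over the existing connection. The following is a minimal sketch, not the book's code: the ranked_salaries helper and the salary_rank alias are names made up for this illustration, and it assumes a connection object that responds to select_all, such as ActiveRecord::Base.connection after establish_connection. Note that rank() gives tied salaries the same rank and this query sorts by salary in descending order, so its numbers will not match the counter from the Ruby loop exactly.

```ruby
# The window-function query from the text, with an alias (an assumption)
# added so the computed column has a predictable name.
RANK_SQL = <<-SQL
  select department_id, employee_id, salary,
         rank() over (partition by department_id order by salary desc) as salary_rank
  from empsalaries
SQL

# Hypothetical helper: `connection` is anything that responds to select_all,
# for example ActiveRecord::Base.connection. Returns an array of row hashes.
def ranked_salaries(connection)
  connection.select_all(RANK_SQL).to_a
end
```

With the connection from group_rank.rb established, ranked_salaries(ActiveRecord::Base.connection) returns one hash per employee, and Ruby only has to read results that are already ranked.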

Rewrite in C

Ruby is implemented in C, so it has an easy way to interface with C code. So if your Ruby code is slow, you can always rewrite it in C. Wait! What? Fear not, I’m not going to try to talk you into writing the C code yourself. You can certainly do that, but it’s out of the scope of this book. Instead I’d like to point out that there are plenty of Ruby gems written in C that do the job faster than their counterparts.

I divide these native code gems into two types:

  1. Gems that rewrite slow parts of Ruby or Ruby on Rails in C

  2. Gems that implement a specific task in C

The Date::Performance gem[4] is a good example of the first type. It’s an old gem that all Ruby 1.8 developers should use. It transparently replaces the slow Ruby Date and DateTime libraries with a similar implementation written in C.

Note that the Date::Performance gem is Ruby 1.8 only. Ruby 1.9 and later have a date library that is much faster.

Let me show how much faster Date::Performance is. For that, we’ll switch to Ruby 1.8, install the date-performance gem, and measure the execution time (without GC, to factor it out) of a program that creates a lot of Date objects.

 
$ rbenv shell 1.8.7-p375
$ gem install date-performance
Fetching: date-performance-0.4.8.gem (100%)
Building native extensions. This could take a while...
Successfully installed date-performance-0.4.8
1 gem installed

Let’s see how Date from the standard library performs.

chp2/date_without_date_performance.rb

require 'date'
require 'benchmark'

GC.disable

memory_before = `ps -o rss= -p #{Process.pid}`.to_i/1024

time = Benchmark.realtime do
  100000.times do
    Date.new(2014,5,1)
  end
end

memory_after = `ps -o rss= -p #{Process.pid}`.to_i/1024

puts "time: #{time}, memory: #{"%d MB" % (memory_after - memory_before)}"

$ ruby date_without_date_performance.rb
time: 2.19644594192505, memory: 262 MB

We need 2.2 seconds to create 100,000 dates. Now let’s compare this with Date::Performance.

chp2/date_with_date_performance.rb

require 'benchmark'
require 'rubygems'
require 'date/performance'

GC.disable

memory_before = `ps -o rss= -p #{Process.pid}`.to_i/1024

time = Benchmark.realtime do
  100000.times do
    Date.new(2014,5,1)
  end
end

memory_after = `ps -o rss= -p #{Process.pid}`.to_i/1024

puts "time: #{time}, memory: #{"%d MB" % (memory_after - memory_before)}"

$ ruby -I . date_with_date_performance.rb --no-gc
time: 0.294741868972778, memory: 84 MB

The same code written in C is about 7.5 times faster! And as a bonus it uses 178 MB less memory. Both are great improvements. That’s why I advise everybody who is stuck with good old Ruby 1.8 to use the Date::Performance gem.
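Both listings repeat the same measurement steps, so they could be wrapped in a small helper. This is only a sketch under a few assumptions: rss_mb and measure are hypothetical names, ps must be available on the system as it is in the listings, and unlike the listings the helper turns GC back on when it finishes.

```ruby
require 'benchmark'
require 'date'

# Resident set size of this process in MB, read the same way as in the listings.
def rss_mb
  `ps -o rss= -p #{Process.pid}`.to_i / 1024
end

# Hypothetical helper: runs the block with GC disabled and returns the
# elapsed time in seconds and the RSS growth in MB.
def measure
  GC.disable
  memory_before = rss_mb
  time = Benchmark.realtime { yield }
  memory_after = rss_mb
  { :time => time, :memory_mb => memory_after - memory_before }
ensure
  GC.enable # an addition: the original scripts simply exit with GC off
end

result = measure { 100000.times { Date.new(2014, 5, 1) } }
puts "time: #{result[:time]}, memory: #{result[:memory_mb]} MB"
```

Swapping require 'date' for require 'date/performance' on Ruby 1.8 would let the same script time both implementations.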

There are also gems that implement a specific task in C. Markdown libraries are the best example: some of them are written in C, some in Ruby. Here’s the performance comparison made by Jashank Jeremy, one of the Jekyll blog engine contributors:

Gem        Language  Speed, posts/second
BlueCloth  C         60.7 ± 17.8
RedCarpet  C         56.1 ± 16.5
RDiscount  C         54.9 ± 16.6
Kramdown   Ruby      40.1 ± 8.4
Maruku     Ruby      17.1 ± 6.5

The slowest C implementation (RDiscount) is 1.4 times faster than the fastest Ruby one (Kramdown). The difference between the fastest and slowest is an impressive 3.5 times. As you can see, it makes total sense to search for gems that do the hard work in native code.
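To run a rough comparison like this on your own content, something along these lines may help. It is a sketch rather than the benchmark behind the table: time_markdown_renderers and the RENDERERS table are hypothetical names, only two of the five gems are covered, and a gem that is not installed is skipped silently.

```ruby
require 'benchmark'

# Each entry: the library to require and a lambda that renders Markdown to HTML.
RENDERERS = {
  'redcarpet (C)'   => ['redcarpet', lambda { |text| Redcarpet::Markdown.new(Redcarpet::Render::HTML).render(text) }],
  'kramdown (Ruby)' => ['kramdown',  lambda { |text| Kramdown::Document.new(text).to_html }]
}

# Returns { name => seconds } for the renderers that are installed.
def time_markdown_renderers(text, iterations = 100)
  results = {}
  RENDERERS.each do |name, (library, render)|
    begin
      require library
    rescue LoadError
      next # gem not installed, skip it
    end
    results[name] = Benchmark.realtime { iterations.times { render.call(text) } }
  end
  results
end
```

Calling time_markdown_renderers(File.read('post.md')), where post.md stands in for one of your own documents, returns the total seconds each installed renderer needed for 100 passes.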
