Make ActiveRecord Faster

ActiveRecord is a wrapper around your data. By definition that should take memory, and oh indeed it does. It turns out the overhead is quite significant, in both the number of objects and in raw memory.

To see the overhead, let’s create a database table with 10 string columns and fill it with 10,000 rows, each row containing 10 strings of 100 chars.

chp3/app/db/migrate/20140722140429_large_tables.rb
 
class LargeTables < ActiveRecord::Migration
  def up
    create_table :things do |t|
      10.times do |i|
        t.string "col#{i}"
      end
    end

    execute <<-END
      insert into things(col0, col1, col2, col3, col4,
                         col5, col6, col7, col8, col9) (
        select
          rpad('x', 100, 'x'), rpad('x', 100, 'x'), rpad('x', 100, 'x'),
          rpad('x', 100, 'x'), rpad('x', 100, 'x'), rpad('x', 100, 'x'),
          rpad('x', 100, 'x'), rpad('x', 100, 'x'), rpad('x', 100, 'x'),
          rpad('x', 100, 'x')
        from generate_series(1, 10000)
      );
    END
  end

  def down
    drop_table :things
  end
end

This migration creates 10 million bytes of data (10,000 * 10 * 100), approximately 9.5 MB. A database is quite efficient at storing that. For example, my PostgreSQL installation uses just 11 MB:

 
$ psql app_development
app_development=# select pg_size_pretty(pg_relation_size('things'));
 pg_size_pretty
----------------
 11 MB

Let’s see how memory-efficient ActiveRecord is. We’ll need to create a Thing model:

chp3/app/app/models/thing.rb
 
class Thing < ActiveRecord::Base
end

And we’ll need to adapt our wrapper.rb measurement helper from the previous chapter to Rails:

chp3/app/lib/measure.rb
 
require 'benchmark'
require 'json'

class Measure
  def self.run(options = {gc: :enable})
    if options[:gc] == :disable
      GC.disable
    elsif options[:gc] == :enable
      # collect memory allocated during library loading
      # and our own code before the measurement
      GC.start
    end

    memory_before = `ps -o rss= -p #{Process.pid}`.to_i / 1024
    gc_stat_before = GC.stat
    time = Benchmark.realtime do
      yield
    end
    gc_stat_after = GC.stat
    GC.start if options[:gc] == :enable
    memory_after = `ps -o rss= -p #{Process.pid}`.to_i / 1024

    puts({
      RUBY_VERSION => {
        gc: options[:gc],
        time: time.round(2),
        gc_count: gc_stat_after[:count].to_i - gc_stat_before[:count].to_i,
        memory: "%d MB" % (memory_after - memory_before)
      }
    }.to_json)
  end
end

For this to work, add the lib directory to Rails’ autoload_paths in config/application.rb.

chp3/app/config/application.rb
 
config.autoload_paths << Rails.root.join('lib')

Got that? Good. Now we can run our migration and measure the memory usage. Note that this needs to be done in production mode to make sure we do not include any of Rails development mode’s side effects.

 
$ RAILS_ENV=production bundle exec rake db:create
$ RAILS_ENV=production bundle exec rake db:migrate
$ RAILS_ENV=production bundle exec rails console
2.2.0 :001 > Measure.run { Thing.all.load }
{"2.2.0":{"gc":"enable","time":0.32,"gc_count":1,"memory":"33 MB"}}
 => nil

ActiveRecord uses 3.5 times more memory than the size of the data. It also triggers one garbage collection during loading.
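To get a feel for where per-row overhead comes from, you can measure raw object sizes with objspace from Ruby's standard library. This is a plain-Ruby sketch with no ActiveRecord involved: one row of our table, rebuilt as an array of 10 strings of 100 chars, already takes noticeably more heap space than its 1,000 bytes of payload.

```ruby
require 'objspace'

# One table row as plain Ruby data: 10 strings of 100 chars
row = Array.new(10) { 'x' * 100 }
payload = 10 * 100 # 1,000 bytes of actual data

# Actual heap bytes: the array object itself plus each string object
actual = ObjectSpace.memsize_of(row) +
         row.sum { |s| ObjectSpace.memsize_of(s) }

puts "payload: #{payload} bytes, in memory: #{actual} bytes"
```

And that is before ActiveRecord adds its own attribute bookkeeping on top of every row.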

ActiveRecord is convenient, but that convenience comes at a steep price. I realize I’m not going to convince you to avoid ActiveRecord. But you do need to understand the consequences of using it. In 80% of cases, the speed of development is worth more than the cost in execution speed. In the remaining 20% of cases, you have other options. Let me show them to you.

Load Only the Attributes You Need

Your first option is to load only the data you intend to use. Rails makes this very easy to do, like this:

 
$ RAILS_ENV=production bundle exec rails console
Loading production environment (Rails 4.1.4)
2.2.0 :001 > Measure.run { Thing.all.select([:id, :col1, :col5]).load }
{"2.2.0":{"gc":"enable","time":0.21,"gc_count":1,"memory":"7 MB"}}
 => nil

This uses 5 times less memory and runs 1.5 times faster than Thing.all.load. The more columns you have, the more it makes sense to add select into the query, especially if you join tables.

Preload Aggressively

Another best practice is preloading. Every time you iterate over a has_many or belongs_to association, preload it.

For example, let’s add a has_many relationship call to our Thing. We’ll need to set up the migration and ActiveRecord model.

chp3/app/db/migrate/20140724142101_minions.rb
 
class Minions < ActiveRecord::Migration
  def up
    create_table :minions do |t|
      t.references :thing
      10.times do |i|
        t.string "mcol#{i}"
      end
    end

    execute <<-END
      insert into minions(thing_id,
                          mcol0, mcol1, mcol2, mcol3, mcol4,
                          mcol5, mcol6, mcol7, mcol8, mcol9) (
        select
          things.id,
          rpad('x', 100, 'x'), rpad('x', 100, 'x'), rpad('x', 100, 'x'),
          rpad('x', 100, 'x'), rpad('x', 100, 'x'), rpad('x', 100, 'x'),
          rpad('x', 100, 'x'), rpad('x', 100, 'x'), rpad('x', 100, 'x'),
          rpad('x', 100, 'x')
        from things, generate_series(1, 10)
      );
    END
  end

  def down
    drop_table :minions
  end
end

chp3/app/app/models/minion.rb

class Minion < ActiveRecord::Base
  belongs_to :thing
end

chp3/app/app/models/thing.rb

class Thing < ActiveRecord::Base
  has_many :minions
end

Run the migration with RAILS_ENV=production bundle exec rake db:migrate and you will get 10 Minions for each Thing in the database.

Iterating over that data without preloading is not such a good idea.

 
$ RAILS_ENV=production bundle exec rails console
Loading production environment (Rails 4.1.4)
2.2.0 :001 > Measure.run { Thing.all.each { |thing| thing.minions.load } }
{"2.2.0":{"gc":"enable","time":272.93,"gc_count":16,"memory":"478 MB"}}
 => nil

Good luck waiting for this one line of code to finish. It needs not only to load everything into memory, but also to execute 10,000 queries against the database to fetch the minions for each thing.

Preloading is the better way.

 
$ RAILS_ENV=production bundle exec rails console
Loading production environment (Rails 4.1.4)
2.2.0 :001 > Measure.run { Thing.all.includes(:minions).load }
{"2.2.0":{"gc":"enable","time":11.59,"gc_count":19,"memory":"518 MB"}}
 => nil

Depending on the Rails version, this might be slightly less memory efficient. But the code finishes 25 times faster because Rails performs only two database queries—one to load things, and another to load minions.
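Under the hood, preloading is essentially a hash join done in Ruby: after the second query, Rails groups all the child rows by their foreign key and attaches them to the parents in memory. A minimal plain-Ruby sketch of that grouping step, with hashes standing in for Thing and Minion records (no database involved):

```ruby
# Parent and child rows as plain hashes standing in for Thing and Minion
things  = (1..3).map { |id| { id: id } }
minions = things.flat_map do |t|
  (1..2).map { |i| { thing_id: t[:id], name: "minion-#{t[:id]}-#{i}" } }
end

# One pass over all child rows instead of one query per parent
minions_by_thing = minions.group_by { |m| m[:thing_id] }
things.each { |t| t[:minions] = minions_by_thing.fetch(t[:id], []) }
```

This is why preloading issues only two queries no matter how many parents you load.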

Combine Selective Attribute Loading and Preloading

Even better is to take my advice from the Load Only the Attributes You Need section and select only the columns we need. But there’s a catch. Rails does not have a convenient way of selecting a subset of columns from the dependent model. For example, this will fail:

 
Thing.all.includes(:minions).select("col1", "minions.mcol4").load

It fails because includes(:minions) runs an additional query to fetch minions for the things it selected, and Rails is not smart enough to figure out which of the selected columns belong to the minions table.

If we queried from the side of the belongs_to association, we would use joins.

 
Minion.where(id: 1).joins(:thing).select("things.col1", "minions.mcol4")

From the has_many side joins will return duplicates of the same Thing object, 10 duplicates in our case. To combat that, we can use the PostgreSQL-specific array_agg feature that aggregates an array of columns from the joined table.

 
$ RAILS_ENV=production bundle exec rails console
Loading production environment (Rails 4.1.4)
2.2.0 :001 > query = "select id, col1, array_agg(mcol4) from things
2.2.0 :002">   inner join
2.2.0 :003">     (select thing_id, mcol4 from minions) minions
2.2.0 :004">   on (things.id = minions.thing_id)
2.2.0 :005">   group by id, col1"
 => "select id, col1, array_agg(mcol4) from things\n  inner join\n    (select thing_id, mcol4 from minions) minions\n  on (things.id = minions.thing_id)\n  group by id, col1"
2.2.0 :006 > Measure.run { Thing.find_by_sql(query) }
{"2.2.0":{"gc":"enable","time":0.62,"gc_count":1,"memory":"8 MB"}}
 => nil

Just look at the memory consumption: 8 MB instead of 518 MB from a full select with preloading. As a bonus, this runs 20 times faster.

Restricting the number of columns you select can save you seconds of execution time and hundreds of megabytes of memory.

Use the Each! Pattern for Rails with find_each and find_in_batches

It is expensive to instantiate a lot of ActiveRecord models. Rails developers knew that and added two functions to loop through large datasets in batches: find_each and find_in_batches. Both load 1,000 objects at a time by default; the former yields them one by one, the latter yields the whole batch at once. You can ask for smaller or larger batches with the :batch_size option.

find_each and find_in_batches will still have to load all the objects in memory. So how do they improve performance? The effect is the same as with the each! pattern from Use the Each! Pattern. Once you’re done with the batch, GC can collect it. Let’s see how that works.

 
$ RAILS_ENV=production bundle exec rails console
Loading production environment (Rails 4.1.4)
2.2.0 :001 > ObjectSpace.each_object(Thing).count
 => 0
2.2.0 :002 > Thing.find_in_batches { |batch|
2.2.0 :003?>   GC.start
2.2.0 :004?>   puts ObjectSpace.each_object(Thing).count
2.2.0 :005?> }
1000
2000
… 6 lines elided
2000
2000
 => nil
2.2.0 :006 > GC.start
 => nil
2.2.0 :007 > ObjectSpace.each_object(Thing).count
 => 0

GC indeed collects objects from previous batches, so no more than two batches are in memory during the iteration. Compare this with the regular each iterator over the list of objects returned by Thing.all.

 
$ RAILS_ENV=production bundle exec rails console
Loading production environment (Rails 4.1.4)
2.2.0 :001 > ObjectSpace.each_object(Thing).count
 => 0
2.2.0 :002 > Thing.all.each_with_index { |thing, i|
2.2.0 :003?>   if i % 1000 == 0
2.2.0 :004?>     GC.start
2.2.0 :005?>     puts ObjectSpace.each_object(Thing).count
2.2.0 :006?>   end
2.2.0 :007?> }; nil
10000
10000
… 6 lines elided
10000
10000
 => nil

Here we keep 10,000 objects for the whole duration of the each loop. This increases both total memory consumption and GC time. It also increases the risk of running out of memory if the dataset is too big (remember, ActiveRecord needs 3.5 times more space to store your data).
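The batching idea itself is plain Ruby: process one slice, let it go out of scope, and GC can reclaim it before you touch the next one. A sketch with each_slice standing in for find_in_batches, and strings standing in for instantiated models (no database involved):

```ruby
# 10,000 ids standing in for the rows of our table
ids = (1..10_000).to_a

batches = 0
ids.each_slice(1_000) do |batch|
  rows = batch.map { |id| "row-#{id}" } # stand-in for instantiated models
  batches += 1
  # rows goes out of scope at the end of each iteration,
  # so GC can reclaim the previous batch before the next one is built
end

puts batches # => 10
```

find_each and find_in_batches add one thing this sketch cannot: they fetch each slice from the database on demand, so the full result set never has to exist in memory at all.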

Use ActiveRecord without Instantiating Models

If all you need is to run a database query or update a column in the table, consider using the following ActiveRecord functions that do not instantiate models.

  • ActiveRecord::Base.connection.execute("select * from things")

    This function executes the query and returns its result unparsed.

  • ActiveRecord::Base.connection.select_values("select col5 from things")

    Similar to the previous function, but returns an array of values only from the first column of the query result.

  • Thing.all.pluck(:col1, :col5)

    A variation of the previous two functions. Returns an array of values that contains either the whole row or just the columns you pass as arguments to pluck.

  • Thing.where("id < 10").update_all(col1: 'something')

    Updates columns in the table.

These not only save you memory, but also run faster because they neither instantiate models nor execute before/after filters. All they do is run plain SQL queries and, in some cases, return arrays as the result.
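Conceptually, pluck is just a map over the raw result rows: values go straight from the adapter into arrays, and no model object, attribute tracking, or callback ever happens. A plain-Ruby sketch of that idea, with hashes standing in for the rows a database adapter would return:

```ruby
# Raw result rows as a database adapter might hand them back
rows = [
  { 'id' => 1, 'col1' => 'a', 'col5' => 'e' },
  { 'id' => 2, 'col1' => 'b', 'col5' => 'f' }
]

# What pluck(:col1, :col5) does conceptually: extract values only,
# skipping model instantiation and callbacks entirely
values = rows.map { |r| r.values_at('col1', 'col5') }
# => [["a", "e"], ["b", "f"]]
```

That is the whole trick: arrays of values are far cheaper than ActiveRecord objects, both to build and for GC to clean up.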
