Waiting for railsapplication.com...
--The status bar of your user’s web browser
On the web, your users find out that your application is working at exactly one time—when your program responds to a request. The classic example of this is credit card processing. Which would you prefer to use: a site that says “Now processing your transaction” alongside a soothing animation, or one that shows a blank page?
In addition to such user experience situations, your application may have requirements that simply cannot be satisfied in a few seconds. Perhaps you run a popular site that allows users to upload video files and share them with others. You’ll need to convert various types of video content into Flash. No server you can buy is fast enough to perform this work while the user’s web browser waits.
Do either of these scenarios sound familiar? If so, it is probably time to think about performing work in the background of your application. In this chapter, background refers to anything that happens outside of the normal HTTP request/response cycle. Most developers will need to design and implement background processing at some point. Luckily, Rails and Ruby have several libraries and techniques for background processing, including:
script/runner
DRb
BackgrounDRb
Daemons
With these tools, you can easily add background processing to your Rails applications. This chapter aims to teach you enough about each one that you can decide which makes sense for your particular application.
Rails comes with a built-in tool for running tasks independent of the web cycle. The runner script simply loads the default Rails environment and then executes some specified Ruby code. Popular uses include:
Importing “batch” external data
Executing any (class) method in your models
Running intensive calculations, delivering e-mails in batches, or executing scheduled tasks
Uses of script/runner that you should avoid at all costs include:
Processing incoming e-mail
Tasks that take longer to run as your database grows
For example, suppose that you have a model called “Report.” The Report model has a class method called generate_rankings, which you can call from the command line using:
$ ruby script/runner 'Report.generate_rankings'
Since we have access to all of Rails, we can even use the ActiveRecord finder methods to extract data from our application:[1]
$ ruby script/runner 'User.find(:all).map(&:email).each { |e| puts "<#{e}>"}'
<[email protected]>
<[email protected]>
<[email protected]>
# ...
<[email protected]>
This example demonstrates that we have access to the User model and are able to execute arbitrary Rails code. In this case, we’ve collected some e-mail addresses that we can now spam to our heart’s content. (Just kidding!)
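The map(&:email) call in that one-liner relies on Ruby’s Symbol#to_proc shorthand: &:email is equivalent to the block { |u| u.email }. The sketch below shows the equivalence in plain Ruby, with a hypothetical Struct standing in for the User model so it runs without Rails or a database.

```ruby
# Hypothetical stand-in for the User model: anything that responds to #email.
User = Struct.new(:name, :email)

users = [
  User.new("alice", "alice@example.com"),
  User.new("bob", "bob@example.com")
]

# &:email converts the symbol into a block equivalent to { |u| u.email }
long_form  = users.map { |u| u.email }
short_form = users.map(&:email)

# Same output format as the script/runner example
short_form.each { |e| puts "<#{e}>" }
```
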
There are some things to remember when using script/runner. You must specify the production environment using the -e option; otherwise, it defaults to development. The script/runner help option tells us:
$ script/runner -h
Usage: script/runner [options] ('Some.ruby(code)' or a filename)
    -e, --environment=name           Specifies the environment for the runner to operate in (test/development/production)
                                     Default: development
You can also use runner as a shebang line for your scripts like this:
#!/usr/bin/env /path/to/script/runner
Using script/runner, we can easily script any batch operations that need to run using cron or another system scheduler.
For example, you might calculate the most popular or highest-ranking product in your e-commerce application every few minutes or nightly, rather than make an expensive query on every request:
$ script/runner -e production 'Product.calculate_top_ranking'
A sample entry for the system-wide crontab (/etc/crontab, whose format includes a user field, here root) to run that script might look like this:
0 */5 * * * root /usr/local/bin/ruby /apps/exampledotcom/current/script/runner -e production 'Product.calculate_top_ranking'
The script will run at the top of the hour at 00:00, 05:00, 10:00, 15:00, and 20:00 (every hour divisible by five) to update the Product model’s top rankings.
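Under standard cron semantics, a step value such as */5 in the hour field matches every hour from 0 to 23 that is divisible by 5, and the schedule restarts at midnight. The small sketch below expands the field and measures the gaps between runs, which shows that the interval from 20:00 back to 00:00 is four hours, not five. The expand_cron_field helper is hypothetical and handles only * and */N, not lists or ranges.

```ruby
# Expand a cron field that is "*" or "*/N" over the field's value range.
# Hypothetical helper: lists ("1,5") and ranges ("1-5") are not supported.
def expand_cron_field(expr, range)
  case expr
  when "*"
    range.to_a
  when %r{\A\*/(\d+)\z}
    step = Regexp.last_match(1).to_i
    range.select { |v| ((v - range.first) % step).zero? }
  else
    raise ArgumentError, "unsupported cron field: #{expr}"
  end
end

hours = expand_cron_field("*/5", 0..23)
puts hours.inspect # the hours at which "0 */5 * * *" fires

# Hours between consecutive runs, wrapping around midnight
gaps = hours.each_with_index.map do |h, i|
  nxt = hours[(i + 1) % hours.size]
  (nxt - h) % 24
end
puts gaps.inspect
```

If you need a strict five-hour cadence, cron alone cannot express it in the hour field; one option is to have the script itself check how long ago it last ran.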
On the positive side: It doesn’t get any easier and there are no additional libraries to install. That’s about it.
As for negatives: The script/runner process loads the entire Rails environment. For some tasks, particularly short-lived ones, that can be quite wasteful of resources.
Also, nothing prevents multiple copies of the same script from running simultaneously, which can be catastrophically bad, depending on the contents of the script.
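One common way to keep a cron-launched script from overlapping with itself is to take an exclusive, non-blocking lock on a file with Ruby’s File#flock and exit if the lock is already held. This is a technique we are adding for illustration, not something script/runner provides; the lock file path and the acquire_lock helper are hypothetical.

```ruby
require 'tmpdir'

# Hypothetical lock file location; any path writable by the cron user works.
LOCK_PATH = File.join(Dir.tmpdir, "calculate_top_ranking.lock")

# Returns the open File holding an exclusive lock, or nil when another
# process (or another open handle to the same file) already holds it.
def acquire_lock(path)
  file = File.open(path, File::RDWR | File::CREAT, 0o644)
  if file.flock(File::LOCK_EX | File::LOCK_NB)
    file
  else
    file.close
    nil
  end
end

lock = acquire_lock(LOCK_PATH)
if lock
  puts "Got the lock; doing the work"
  # ... the body of the batch job would go here ...
else
  puts "Another copy is already running; skipping this run"
end
```

Because the operating system releases an flock lock when the holding process exits, a run that crashes does not leave a stale lock behind.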
Wilson Says...
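Do not process incoming e-mail with script/runner. If your mail server hands each incoming message to script/runner, every message starts a new process that loads the entire Rails environment. This is a Denial of Service attack waiting to happen.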
Use Fetcher (or something like it) instead:
The bottom line is, use script/runner for short tasks that need to run infrequently.
You might already know that you can use DRb as a session container for Rails with a little bit of configuration, but out of the box, it comes ready to process simple TCP/IP requests and perform some background heavy lifting.
DRb literally stands for “Distributed Ruby.” It is a library that allows you to send and receive messages from remote Ruby objects via TCP/IP. Sound kind of like RPC, CORBA, or Java’s RMI? Probably so. This is Ruby’s simple as dirt answer to all of the above.
--Chad Fowler, Intro to DRb (http://chadfowler.com/ruby/drb.html)
Let’s create a DRb server that performs a simple calculation. We will run this server on localhost, but keep in mind that it could be run on one or more remote servers to distribute the load or provide fault tolerance.
Create a file named distributed_server.rb and give it the contents of Listing 22.1.
Listing 22.1. A Simple DRb Calculation Service
#!/usr/bin/env ruby
# DRb server
$VERBOSE = true # enable warnings, as ruby -w would

# load DRb
require 'drb'

class DistributedServer
  def perform_calculation(num)
    num * num
  end
end

DRb.start_service("druby://localhost:9000", DistributedServer.new)
puts "Starting DRb server at: #{DRb.uri}"
DRb.thread.join
After making this file executable (chmod +x distributed_server.rb, or equivalent), run it so that it listens on port 9000 for requests:
$ ./distributed_server.rb
Starting DRb server at: druby://localhost:9000
Now, to call this code from Rails, we can require the DRb library at the top of a controller where we plan to use it:
require 'drb'

class MessagesController < ApplicationController
  # ... actions go here ...
end
To add an action in the controller to invoke a method on our distributed server, you would write an action method such as this one:
def calculation
  DRb.start_service
  drb_client = DRbObject.new(nil, 'druby://localhost:9000')
  @calculation = drb_client.perform_calculation(5)
end
We now have access to a @calculation instance variable that the distributed server actually processed for us. This is a trivial example, but it demonstrates how simple it is to farm out processes to a distributed server.
This code will still be executed as part of the normal Rails request/response cycle. Rails will wait for the DRb perform_calculation method to complete before processing any view templates or sending any data to the user agent. We may be able to leverage the power of several other servers by using this technique, but it’s still not precisely what most people mean by background processing. To complete our journey to the dark side, we need to implement some kind of job control to wrap around this code.
The good news is that it’s easy to do, but the better news is that someone’s already done it. More on that in the next section, “BackgrounDRb.”
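To see roughly what “job control” means, here is a minimal sketch built on plain DRb. It is our own illustration, not BackgrounDRb’s API: JobServer, start_job, and status are hypothetical names. start_job hands back a key immediately and does the work in a server-side thread, and the caller polls status with that key instead of waiting. To keep it self-contained, server and client run in one process; in practice they would be separate, as with distributed_server.rb.

```ruby
require 'drb'

# Hypothetical job-controlled service (not BackgrounDRb's API).
class JobServer
  def initialize
    @jobs = {}
    @next_key = 0
    @lock = Mutex.new
  end

  # Registers a job, starts the work in a server-side thread, and returns
  # a key right away so the caller is never blocked by the calculation.
  def start_job(num)
    key = @lock.synchronize do
      @next_key += 1
      @jobs[@next_key] = { status: :running, result: nil }
      @next_key
    end
    Thread.new do
      sleep 0.1         # stands in for slow work
      value = num * num # same calculation as Listing 22.1
      @lock.synchronize { @jobs[key] = { status: :done, result: value } }
    end
    key
  end

  # Returns a copy of the job's state, or nil for an unknown key.
  def status(key)
    @lock.synchronize { @jobs[key] && @jobs[key].dup }
  end
end

# Port 0 asks the OS for any free port; DRb.uri reports the one chosen.
DRb.start_service("druby://localhost:0", JobServer.new)
client = DRbObject.new_with_uri(DRb.uri)

key = client.start_job(5) # returns immediately with a job key

# Poll, as a later request would, with a safety deadline.
deadline = Time.now + 5
until client.status(key)[:status] == :done || Time.now > deadline
  sleep 0.05
end
puts client.status(key).inspect
```

A real job server also needs to expire finished jobs, report errors, and survive restarts, which is the kind of work a library like BackgrounDRb takes on.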
On the positive side: DRb is part of the Ruby Standard Library, so there is nothing extra to install. Extremely reliable. Suitable for persistent processes that can return results quickly to the caller.
On the negative side: DRb is a relatively “low-level” library and does not provide any job control or configuration file support. Using it directly requires you to invent your own conventions for port numbers, class names, and so on.
Use DRb when you need to implement your own load balancing, or when no other solution offers enough control.
For a more in-depth understanding of how DRb operates, and what is going on in these code samples, see the following web articles:
An Introduction to DRb by Eric Hodel at http://segment7.net/projects/ruby/drb/introduction.html
Intro to DRb by Chad Fowler at http://chadfowler.com/ruby/drb.html
Distributed Ruby in a Nutshell by Frank Spychalski at http://amazing-development.com/archives/2006/03/16/rails-and-distributed-ruby-in-a-nutshell/
BackgrounDRb is a “Ruby job server and scheduler” available at http://backgroundrb.devjavu.com/. The principal use case for the BackgrounDRb plugin for Rails is “divorcing long-running tasks from the Rails request/response cycle.”[2]
In addition to supporting asynchronous background processing, BackgrounDRb (along with Ajax code in your Rails application) is commonly used to support status updates and indicators. BackgrounDRb is frequently used to provide progress bars during large file uploads.
BackgrounDRb received a major rewrite for the 0.2.x branch that completely altered the previous version’s job creation and execution. Job processing now uses multiple processes instead of a single, threaded process. Results are also stored in a Result worker, which gives each job, running in its own process, a place to store and retrieve results. The project has an active community and an open source repository with good test/RSpec coverage.
BackgrounDRb can be run standalone or as a Rails plugin. It has two package dependencies, installable as gems: Slave 1.1.0 (or higher) and Daemons 1.0.2 (or higher). Install it into an existing Rails application by running the following command:
svn co http://svn.devjavu.com/backgroundrb/tags/release-0.2.1 vendor/plugins/backgroundrb
Note that using the following command
script/plugin install svn://rubyforge.org//var/svn/backgroundrb
installs the older, single-process version of BackgrounDRb, which you don’t want. We’ll cover only the newer 0.2.x version, since current documentation and development occur there.
Verify that the tests pass by changing to the plugin directory (vendor/plugins/backgroundrb) and running rake. You will need the RSpec gem installed if you wish to do this.
$ rake
(in /Users/your_login/your_app/vendor/plugins/backgroundrb)
/usr/local/bin/ruby -Ilib:lib "test/backgroundrb_test.rb" "test/scheduler_test.rb"
Loaded suite /usr/local/lib/ruby/gems/1.8/gems/rake-0.7.1/lib/rake/rake_test_loader
Started
..................
Finished in 3.107323 seconds.

18 tests, 26 assertions, 0 failures, 0 errors
Assuming that all tests pass, change back to your RAILS_ROOT and run rake backgroundrb:setup to install BackgrounDRb’s configuration file, scripts, and directories for tasks and workers.
The default config/backgroundrb.yml file will look like this:
---
:rails_env: development
:host: localhost
:port: 2000
The default BackgrounDRb server runs in the development environment and listens on localhost, port 2000. A move to production requires you to change the :rails_env setting. The official BackgrounDRb documentation included with the distribution has more details.
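The leading colons in that file are YAML’s notation for Ruby symbol keys, which is why the settings are read as :rails_env, :host, and :port. A small plain-Ruby sketch of loading the default file and producing a production variant follows; the file content is copied from above, and generating the production version in code is just one hypothetical way to manage it.

```ruby
require 'yaml'

# The default config/backgroundrb.yml from above, inlined for the sketch.
CONFIG_TEXT = <<~YAML
  ---
  :rails_env: development
  :host: localhost
  :port: 2000
YAML

# ":rails_env"-style keys load as Ruby symbols, so Symbol has to be
# explicitly permitted when loading safely.
config = YAML.safe_load(CONFIG_TEXT, permitted_classes: [Symbol])
puts config[:rails_env] # => development

# A production variant of the same settings, serialized back to YAML.
production = config.merge(rails_env: "production")
puts production.to_yaml
```
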
The heart of BackgrounDRb is the MiddleMan class, which facilitates the creation of workers, keeps track of them, and provides access to their results.
BackgrounDRb allows us to define workers, which are classes containing the code that we would like to execute in the background. By default they will be stored in the lib/workers directory of your Rails project.
These workers will be subclasses of one of two base classes provided by the plugin:
BackgrounDRb::Worker::Base: Simple workers needing minimal environmental setup
BackgrounDRb::Worker::RailsBase: Workers that need access to a fully configured Rails environment
Workers that subclass RailsBase will consume more resources than Base workers, so if you do not need access to ActiveRecord models or other Rails facilities, try to use the simple worker class.
If workers need to return their output to our application, we can use their results method when we invoke them. It operates like a normal Hash object, but behind the scenes it is a special Result worker. We can also create log messages via the BackgrounDRb logger method.
Each worker needs to define a do_work method that accepts a single args parameter. BackgrounDRb will automatically call this method when a worker is initialized. Typically this method should be kept simple, and will call other methods you define in order to perform its work.
Let’s create a worker in our new lib/workers directory. We’ll use the provided generator to create the base class:
$ script/generate worker Counter
We’ll add some code to make it count to 10,000, to simulate a long-running task. Real-life examples include processing an uploaded file, converting an image, or generating and sending a report. In Listing 22.2, we will shove all of the code into the do_work method, but in your own code you will want to adhere to normal model design principles and factor out your code appropriately.
Listing 22.2. CounterWorker Class Counts Up to 10,000
class CounterWorker < BackgrounDRb::Worker::RailsBase
  def do_work(args)
    logger.info 'Starting the CounterWorker'
    1.upto 10_000 do |x|
      results[:count] = x
      logger.info "Count: #{x}"
    end
    logger.info 'Finished counting to 10,000'
  end
end

CounterWorker.register
With a worker ready to go, we can fire up the BackgrounDRb server:
$ ruby script/backgroundrb start
Check to see that the BackgrounDRb processes are running by using the ps command:[3]
$ ps aux | grep background
you  617  0.6 -0.2  3628 ??  R  4:20PM  0:00.23 backgroundrb
you  618  0.0 -0.7 14640 ??  S  4:20PM  0:00.10 backgroundrb_logger
you  619  0.0 -0.7 14572 ??  S  4:20PM  0:00.09 backgroundrb_results
Now, we can trigger the worker from a controller action. The new_worker class method of MiddleMan instantiates a new worker and returns a “key” that will allow us to refer to it later.
Here we create a new CounterWorker and store its key in the session for later use:
def start_counting
  session[:key] = MiddleMan.new_worker(:class => :counter_worker)
  redirect_to :action => 'check_counter'
end
We’ll go ahead and create another action to check the status of the worker. We must use the key that we saved moments ago to fetch the running worker, and then use the results method to access the current value of the counter:
def check_counter
  count_worker = MiddleMan.worker(session[:key])
  @count = count_worker.results[:count]
end
The corresponding view (for check_counter) could be this simple:
<p>We're currently counting. We're at <%= @count %>.</p>
Inside the start_counting action, the new_worker method immediately calls the do_work method we defined in the CounterWorker class. This is a nonblocking call, and our web application happily continues along and redirects us, while the worker chugs along counting.
If we reload the check_counter page in the browser, it will show the @count variable increasing as the background process progresses with its job.
Unfortunately, changes to the workers require BackgrounDRb to be restarted. They are loaded once and then cached, just like your ActiveRecord models in production mode.
If you get an error like this:
/usr/local/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:27:in `gem_original_require': no such file to load -- slave (LoadError)
remember that BackgrounDRb depends on the slave and daemons gems.
If the backgroundrb process should exit or die, the process ID files will need to be cleaned up. You’ll know that it happened if subsequent attempts to start the service result in:
ERROR: there is already one or more instance(s) of the program running
To remove the log/backgroundrb.pid and log/backgroundrb.ppid files, we can use the convenient, built-in zap command:
$ script/backgroundrb zap
BackgrounDRb should start normally after the old files are zapped.
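Before deleting PID files, it can help to confirm that the process they name is really gone. The sketch below uses Process.kill with signal 0, which only checks that the process exists and may be signaled, without sending anything. The pid_file_stale? helper and file name are hypothetical, and the sketch makes no claim about how the zap command itself works.

```ruby
require 'tmpdir'

# Returns true when the PID file names a process that no longer exists.
# Hypothetical helper for illustration only.
def pid_file_stale?(path)
  return false unless File.exist?(path) # no file, nothing to clean up
  pid = File.read(path).to_i
  return true if pid <= 0               # empty or garbage contents
  Process.kill(0, pid)                  # signal 0: existence check only
  false                                 # process is alive
rescue Errno::ESRCH
  true                                  # no such process: stale file
rescue Errno::EPERM
  false                                 # alive, but owned by another user
end

path = File.join(Dir.tmpdir, "example_backgroundrb.pid")
File.write(path, Process.pid.to_s)      # our own PID is certainly alive
puts pid_file_stale?(path)              # => false
File.delete(path)
```
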
On the positive side:
Provides job control and asynchronous invocation right out of the box.
Popular, with many code samples posted on the web.
Optimal for “event-based” tasks, such as those that occur every time a user hits a particular action.
As for negatives:
The current version is considered “experimental” by the maintainers. You may end up needing to change your worker or action code as the API evolves.
Support for scheduled tasks is new, and may not be as stable as the rest of the codebase.
Some configuration options are baked in and may be difficult to customize if your production environment is unusual.
All things considered, BackgrounDRb seems perfect for tasks that need to be initiated from a controller action or a model callback.
The website http://daemons.rubyforge.org/ offers an excellent Ruby library that lets you “daemonize” your script for easy management and maintainability.
The script in Listing 22.3 is a simple example of how to use the daemons library to run a scheduled task.
The script defines a simple task, update_rss_feeds, and runs it in a loop. If you save it as background_tasks.rb and run it without any options like this:
script/runner background_tasks.rb
it will show you all options provided by the daemons library:
Usage: BackgroundTasks <command> <options> -- <application options>

* where <command> is one of:
  start         start an instance of the application
  stop          stop all instances of the application
  restart       stop all instances and restart them afterwards
  run           start the application and stay on top
  zap           set the application to a stopped state

* and where <options> may contain several of the following:

    -t, --ontop                      Stay on top (does not daemonize)
    -f, --force                      Force operation

Common options:
    -h, --help                       Show this message
        --version                    Show version
You can control your background task process using simple commands.
The Daemons library also guarantees that only one copy of your task is running at a time, which removes the need for control logic that tends to creep into script/runner or cron scripts.
The preceding example demonstrates the control that the Daemons library provides. However, as written, it doesn’t do much. Let’s modify the script to make it fetch e-mails from an external server as well (as shown in Listing 22.4). Since fetching e-mail happens to use the network, we’ll use threads to get more work done in less time.
Listing 22.4. The Threaded E-mail Fetcher
require 'thread'
require 'singleton' # needed for include Singleton
require 'daemons'

# Run via script/runner so that ActiveRecord and the models are loaded.
class BackgroundTasks
  include Singleton

  def initialize
    ActiveRecord::Base.allow_concurrency = true
  end

  # Starts one thread per task and waits on all of them.
  def run
    threads = []
    [:update_rss_feeds, :update_emails].each do |task|
      threads << Thread.new do
        self.send task
      end
    end
    threads.each { |t| t.join }
  end

  protected

  def update_rss_feeds
    loop do
      Feed.update_all
      sleep 10
    end
  end

  def update_emails
    loop do
      User.find(:all, :conditions => "email IS NOT NULL").each do |user|
        user.fetch_emails
      end
      sleep 60
    end
  end
end

Daemons.run_proc('BackgroundTasks') do
  BackgroundTasks.instance.run # the class defines run, not start
end
An important thing to notice about the code in Listing 22.4 is that we added ActiveRecord::Base.allow_concurrency = true to the initialize method. That is a critical step for using ActiveRecord concurrently in multiple threads. Among other things, the setting gives each thread its own database connection. Forgetting this step can lead to data corruption and other horrors. Consider yourself warned!
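To make “each thread gets its own database connection” concrete, here is the general idea in plain Ruby: a per-thread slot that lazily creates a connection object the first time a thread asks for one. FakeConnection and current_connection are hypothetical names, and no real database is involved; this illustrates the concept, not ActiveRecord’s internals.

```ruby
# Hypothetical stand-in for a database connection.
class FakeConnection
  attr_reader :owner

  def initialize(owner)
    @owner = owner
  end
end

# Each thread lazily gets its own connection, stored in a thread variable,
# so two threads never share (and corrupt) the same connection object.
def current_connection
  conn = Thread.current.thread_variable_get(:fake_connection)
  unless conn
    conn = FakeConnection.new(Thread.current)
    Thread.current.thread_variable_set(:fake_connection, conn)
  end
  conn
end

threads = Array.new(2) { Thread.new { current_connection } }
connections = threads.map(&:value)

puts connections.map(&:object_id).uniq.size # => 2, one per thread
```
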
The daemon we have just written has only the most trivial scheduling support. Your application may need something more robust than sleep 60. If this is the case, you may want to consider using the unfortunately named OpenWFEru library available at http://openwferu.rubyforge.org/scheduler.html, which provides a wide variety of scheduling possibilities.
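One specific weakness of “do the work, then sleep 60” is drift: each cycle takes 60 seconds plus however long the work ran. The minimal fixed-rate loop below avoids that by scheduling against an absolute next-run time on the monotonic clock. It is a simpler technique we are substituting for illustration, not how OpenWFEru works, and run_every is a hypothetical helper.

```ruby
# Runs the block `iterations` times, starting a new run every `interval`
# seconds regardless of how long each run takes (unless a run overruns).
def run_every(interval, iterations)
  next_run = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  iterations.times do |i|
    yield i
    next_run += interval
    delay = next_run - Process.clock_gettime(Process::CLOCK_MONOTONIC)
    sleep(delay) if delay > 0 # if the work overran, start the next run at once
  end
end

started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
ticks = []
run_every(0.1, 3) do |i|
  ticks << i
  sleep 0.03 # stands in for Feed.update_all or fetching e-mail
end
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started

puts ticks.inspect # => [0, 1, 2]
puts elapsed       # about 0.3 seconds, rather than 3 * (0.1 + 0.03)
```

In the daemon, the same idea would replace the bare sleep 60 inside update_emails, with a much larger interval and an endless loop instead of a fixed count.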
Daemons are the most cost-effective way to implement background-processing code that needs to run continuously, and they offer precise control over which libraries you load, and which settings you configure.
Daemons are also easy to manage with monitoring tools like monit: http://www.tildeslash.com/monit/.
On the negative side, setting up daemons is not as automatic as BackgrounDRb or as simple as script/runner. (Fundamentalist programmers might be scared to work on them too.)
Consider using Daemons whenever you need something to run continuously.
In this chapter, our final one of the book, we’ve covered extending Rails with behavior that runs in a context external to normal request processing, that is, in the background. The topic runs deep, and we’ve just skimmed across the surface of what is possible.