Chapter 22. Background Processing

 

Waiting for railsapplication.com...

 
 --The status bar of your user’s web browser

On the web, your users find out that your application is working at exactly one time—when your program responds to a request. The classic example of this is credit card processing. Which would you prefer to use: a site that says “Now processing your transaction” alongside a soothing animation, or one that shows a blank page?

In addition to such user experience situations, your application may have requirements that simply cannot be satisfied in a few seconds. Perhaps you run a popular site that allows users to upload video files and share them with others. You’ll need to convert various types of video content into Flash. No server you can buy is fast enough to perform this work while the user’s web browser waits.

Do either of these scenarios sound familiar? If so, it is probably time to think about performing work in the background of your application. In this chapter, background refers to anything that happens outside of the normal HTTP request/response cycle. Most developers will need to design and implement background processing at some point. Luckily, Rails and Ruby have several libraries and techniques for background processing, including:

  • script/runner—Built into Rails

  • DRb—A proven distributed processing library by Masatoshi Seki

  • BackgrounDRb—A plugin written by Ezra Zygmuntowicz and maintained by Skaar

  • Daemons—Makes it easy to create long-running system services. Written by Thomas Uehlinger

With these tools, you can easily add background processing to your Rails applications. This chapter aims to teach you enough about each one that you can decide which makes sense for your particular application.

script/runner

Rails comes with a built-in tool for running tasks independent of the web cycle. The runner script simply loads the default Rails environment and then executes some specified Ruby code. Popular uses include

  • Importing “batch” external data

  • Executing any (class) method in your models

  • Running intensive calculations, delivering e-mails in batches, or executing scheduled tasks

Usages involving script/runner that you should avoid at all costs are

  • Processing incoming e-mail

  • Tasks that take longer to run as your database grows

Getting Started

For example, let us suppose that you have a model called “Report.” The Report model has a class method called generate_rankings, which you can call from the command line using

$ ruby script/runner 'Report.generate_rankings'

Since we have access to all of Rails, we can even use the ActiveRecord finder methods to extract data from our application:[1]

$ ruby script/runner 'User.find(:all).map(&:email).each { |e| 
puts "<#{e}>"}'
<[email protected]>
<[email protected]>
<[email protected]>
# ...
<[email protected]>

This example demonstrates that we have access to the User model and are able to execute arbitrary Rails code. In this case, we’ve collected some e-mail addresses that we can now spam to our heart’s content. (Just kidding!)

Usage Notes

There are some things to remember when using script/runner. You must specify the production environment using the -e option; otherwise, it defaults to development. The script/runner help option tells us:

$ script/runner -h

Usage: script/runner [options] ('Some.ruby(code)' or a filename)

  -e, --environment=name   Specifies the environment for the runner
                           to operate in (test/development/production)
                           Default: development

You can also use runner as a shebang line for your scripts like this:

#!/usr/bin/env /path/to/script/runner

Using script/runner, we can easily script any batch operations that need to run using cron or another system scheduler.

For example, you might calculate the most popular or highest-ranking product in your e-commerce application every few minutes or nightly, rather than make an expensive query on every request:

$ script/runner -e production 'Product.calculate_top_ranking'

A sample crontab to run that script might look like this:

0 */5 * * *   root  /usr/local/bin/ruby /apps/exampledotcom/current/script/runner -e production 'Product.calculate_top_ranking'

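This entry runs the script at the top of every fifth hour (00:00, 05:00, 10:00, and so on) to update the Product model’s top rankings. The user field (root) is only valid in the system-wide crontab, such as /etc/crontab; per-user crontabs omit it.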

script/runner Considerations

On the positive side: It doesn’t get any easier and there are no additional libraries to install. That’s about it.

As for negatives: The script/runner process loads the entire Rails environment. For some tasks, particularly short-lived ones, that can be quite wasteful of resources.

Also, nothing prevents multiple copies of the same script from running simultaneously, which can be catastrophically bad, depending on the contents of the script.
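One common guard against overlapping runs is an advisory lock file. The following is a minimal sketch written for a current Ruby, not something script/runner provides: the with_exclusive_lock helper and the lock path are hypothetical names chosen for illustration, and the approach assumes a local filesystem where File#flock is honored.

```ruby
require 'tmpdir'

# Runs the block only if nothing else holds the lock file.
# Returns the block's value, or :already_running if the lock is taken.
def with_exclusive_lock(lock_path)
  File.open(lock_path, File::RDWR | File::CREAT, 0644) do |lock|
    # LOCK_NB makes flock return false at once instead of waiting.
    return :already_running unless lock.flock(File::LOCK_EX | File::LOCK_NB)
    yield
  end
end

# Hypothetical lock location; any path writable by the job would do.
LOCK_PATH = File.join(Dir.tmpdir, 'calculate_top_ranking.lock')

outcome = with_exclusive_lock(LOCK_PATH) do
  # A real script would call Product.calculate_top_ranking here.
  :finished
end
puts outcome
```

The lock is released when the file is closed, including when the process dies, so a crashed run does not block the next one.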

Wilson Says...

Do not process incoming e-mail with script/runner.

This is a Denial of Service attack waiting to happen.

Use Fetcher (or something like it) instead:

http://slantwisedesign.com/rdoc/fetcher/

The bottom line is, use script/runner for short tasks that need to run infrequently.

DRb

You might already know that you can use DRb as a session container for Rails with a little bit of configuration, but out of the box, it comes ready to process simple TCP/IP requests and perform some background heavy lifting.

DRb literally stands for “Distributed Ruby.” It is a library that allows you to send and receive messages from remote Ruby objects via TCP/IP. Sound kind of like RPC, CORBA, or Java’s RMI? Probably so. This is Ruby’s simple as dirt answer to all of the above.

 --Chad Fowler’s Intro to DRb (http://chadfowler.com/ruby/drb.html)

A Simple DRb Server

Let’s create a DRb server that performs a simple calculation. We will run this server on localhost, but keep in mind that it could be run on one or more remote servers to distribute the load or provide fault tolerance.

Create a file named distributed_server.rb and give it the contents of Listing 22.1.

Example 22.1. A Simple DRb Calculation Service

#!/usr/bin/env ruby -w
# DRb server

# load DRb
require 'drb'

class DistributedServer
 def perform_calculation(num)
  num * num
 end
end

DRb.start_service("druby://localhost:9000", 
DistributedServer.new)
puts "Starting DRb server at: #{DRb.uri}"

DRb.thread.join

After making this file executable (chmod +x, or equivalent), run it so that it listens on port 9000 for requests:

$ ./distributed_server.rb
Starting DRb server at: druby://localhost:9000

Using DRb from Rails

Now, to call this code from Rails, we can require the DRb library at the top of a controller where we plan to use it:

require 'drb'
class MessagesController < ApplicationController

To add an action in the controller to invoke a method on our distributed server, you would write an action method such as this one:

def calculation
  DRb.start_service
  drb_client = DRbObject.new(nil, 'druby://localhost:9000')
  @calculation = drb_client.perform_calculation(5)
end

We now have access to a @calculation instance variable that the distributed server actually processed for us. This is a trivial example, but it demonstrates how simple it is to farm out processes to a distributed server.

This code will still be executed as part of the normal Rails request/response cycle. Rails will wait for the DRb perform_calculation method to complete before processing any view templates or sending any data to the user agent. We may be able to leverage the power of several other servers by using this technique, but it’s still not precisely what most people mean by background processing. To complete our journey to the dark side, we need to implement some kind of job control to wrap around this code.
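To make the difference concrete, here is a rough sketch in plain Ruby, outside Rails. It runs the server and the client in one process for brevity, and the SlowServer class with its artificial delay is a stand-in invented for this example. The first call blocks the caller; the second hands the call to a thread and collects the value later, which is about the smallest form of job control one could write.

```ruby
require 'drb'

# Stand-in for the calculation service; the sleep mimics slow work.
class SlowServer
  def perform_calculation(num)
    sleep 0.2
    num * num
  end
end

# Port 0 asks the OS for any free port; DRb.uri reports the one chosen.
DRb.start_service('druby://localhost:0', SlowServer.new)
client = DRbObject.new_with_uri(DRb.uri)

# Synchronous: the caller waits here until the result comes back.
blocking_result = client.perform_calculation(5)

# Rudimentary job control: start the call in a thread, keep the handle,
# and pick up the value when it is needed.
job = Thread.new { client.perform_calculation(6) }
# ... the caller is free to do other work here ...
background_result = job.value   # joins the thread and returns its result

puts blocking_result
puts background_result
DRb.stop_service
```

A thread handle like this lives only as long as the process that created it, which is one reason a separate job server is the more usual choice for a web application.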

The good news is that it’s easy to do, but the better news is that someone’s already done it. More on that in the next section, “BackgrounDRb.”

DRb Considerations

On the positive side: DRb is part of the Ruby Standard Library, so there is nothing extra to install. It is extremely reliable, and it is well suited to persistent processes that can return results quickly to the caller.

On the negative side: DRb is a relatively “low-level” library and does not provide any job control or configuration file support. Using it directly requires you to invent your own conventions for port numbers, class names, and so on.

Use DRb when you need to implement your own load balancing, or when no other solution offers enough control.

Resources

For a more in-depth understanding of how DRb operates, and what is going on in these code samples, see Chad Fowler’s Intro to DRb, quoted at the start of this section.

BackgrounDRb

BackgrounDRb is a “Ruby job server and scheduler” available at http://backgroundrb.devjavu.com/. The principal use case for the BackgrounDRb plugin for Rails is “divorcing long-running tasks from the Rails request/response cycle.”[2]

In addition to supporting asynchronous background processing, BackgrounDRb (along with Ajax code in your Rails application) is commonly used to support status updates and indicators. BackgrounDRb is frequently used to provide progress bars during large file uploads.

BackgrounDRb received a major rewrite for the 0.2.x branch that completely altered the previous version’s job creation and execution. Job processing now uses multiple processes instead of a single, threaded process. Results are also stored in a Result worker, to allow each job its own process from which to store and retrieve results. It has an active community, and an open source repository with good test/rspec coverage.

Getting Started

BackgrounDRb can be run standalone or as a Rails plugin. It has two package dependencies, installable as gems: Slave 1.1.0 (or higher) and Daemons 1.0.2 (or higher). Install it into an existing Rails application by running the following command:

svn co http://svn.devjavu.com/backgroundrb/tags/release-0.2.1 \
  vendor/plugins/backgroundrb

Note that using the following command

script/plugin install svn://rubyforge.org//var/svn/backgroundrb

installs the older, single-process version of BackgrounDRb, which you don’t want. We’ll cover the newer 0.2.x version only, since current documentation and development occurs there.

Verify that the tests pass by changing to the plugin directory and running rake. You will need the RSpec gem installed if you wish to do this.

$ rake
(in /Users/your_login/your_app/vendor/plugins/backgroundrb)
/usr/local/bin/ruby -Ilib:lib "test/backgroundrb_test.rb" "test/scheduler_test.rb"
Loaded suite /usr/local/lib/ruby/gems/1.8/gems/rake-0.7.1/lib/rake/rake_test_loader
Started
..................
Finished in 3.107323 seconds.

18 tests, 26 assertions, 0 failures, 0 errors

Assuming that all tests pass, change back to your RAILS_ROOT and run rake backgroundrb:setup to install BackgrounDRb’s configuration file, scripts, and directories for tasks and workers.

Configuration

The default config/backgroundrb.yml file will look like this:

---
:rails_env: development
:host: localhost
:port: 2000

The default BackgrounDRb server runs in the development environment, and listens on the localhost server on port 2000. A move to production requires you to update this rails_env variable. The official BackgrounDRb documentation included with the distribution has more details.

Understanding BackgrounDRb

The heart of BackgrounDRb is the MiddleMan class, which facilitates the creation of workers, keeps track of them, and provides access to their results.

BackgrounDRb allows us to define workers, which are classes containing the code that we would like to execute in the background. By default they will be stored in the lib/workers directory of your Rails project.

These workers will be subclasses of one of two base classes provided by the plugin:

  • BackgrounDRb::Worker::Base, for simple workers needing minimal environmental setup

  • BackgrounDRb::Worker::RailsBase, for workers that need access to a fully configured Rails environment

Workers that subclass RailsBase will consume more resources than Base workers, so if you do not need access to ActiveRecord models or other Rails facilities, try to use the simple worker class.

If workers need to return their output to our application, we can use their results method when we invoke them. It operates like a normal Hash object, but behind the scenes it is a special Result worker. We can also create log messages via the BackgrounDRb logger method.

Each worker needs to define a do_work method that accepts a single args parameter. BackgrounDRb will automatically call this method when a worker is initialized. Typically this method should be kept simple, and will call other methods you define in order to perform its work.

Using the MiddleMan

Let’s create a worker in our new lib/workers directory. We’ll use the provided generator to create the base class:

$ script/generate worker Counter

We’ll add some code to make it count to 10000, to simulate a long-running task. Real-life examples include processing an uploaded file, converting an image, or generating and sending a report. In Listing 22.2, we will shove all of the code into the do_work method, but in your own code you will want to adhere to normal model design principles and factor out your code appropriately.

Example 22.2. CounterWorker Class Counts Up to 10,000

class CounterWorker < BackgrounDRb::Worker::RailsBase
 def do_work(args)
  logger.info 'Starting the CounterWorker'
  1.upto 10_000 do |x|
   results[:count] = x
   logger.info "Count: #{x}"
  end
  logger.info 'Finished counting to 10,000'
 end

end

CounterWorker.register

With a worker ready to go, we can fire up the BackgrounDRb server:

$ ruby script/backgroundrb start

Check to see that the BackgrounDRb processes are running by using the ps command:[3]

$ ps aux | grep background
you    617  0.6 -0.2  3628  ?? R   4:20PM  0:00.23 backgroundrb
you    618  0.0 -0.7  14640 ?? S   4:20PM  0:00.10 backgroundrb_logger
you    619  0.0 -0.7  14572 ?? S   4:20PM  0:00.09 backgroundrb_results

Now, we can trigger the worker from a controller action. The new_worker class method of MiddleMan instantiates a new worker and returns a “key” that will allow us to refer to it later.

Here we create a new CounterWorker and store its key in the session for later use:

def start_counting
  session[:key] = MiddleMan.new_worker(:class => 
:counter_worker)
  redirect_to :action => 'check_counter'
end

We’ll go ahead and create another action to check the status of the worker. We must use the key that we saved moments ago to fetch the running worker, and then use the results method to access the current value of the counter:

def check_counter
  count_worker = MiddleMan.worker(session[:key])
  @count = count_worker.results[:count]
end

The corresponding view (for check_counter) could be this simple:

<p>We're currently counting. We're at <%= @count %>.</p>

Inside the start_counting action, the new_worker method immediately calls the do_work method we defined in the CounterWorker class. This is a nonblocking call, and our web application happily continues along and redirects us, while the worker chugs along counting.

If we hit the Refresh button on the check_counter action to reload the results of the worker, it will show the @count variable increasing, as the background process progresses with its job.
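The key-and-results pattern can be imitated in a few lines of plain Ruby. This is only an illustration of the idea, not how BackgrounDRb is implemented: SketchWorker, CounterSketch, and SketchMiddleMan are hypothetical names, and the sketch uses threads inside one process, whereas BackgrounDRb 0.2.x runs workers in separate processes.

```ruby
require 'securerandom'

# Hypothetical base class: runs do_work in a background thread.
class SketchWorker
  attr_reader :results

  def initialize(args = nil)
    @results = {}
    @thread = Thread.new { do_work(args) }
  end

  def finished?
    !@thread.alive?
  end

  # Blocks until do_work has returned, then hands back the worker.
  def wait
    @thread.join
    self
  end
end

class CounterSketch < SketchWorker
  def do_work(_args)
    1.upto(10_000) { |x| results[:count] = x }
  end
end

# Hypothetical registry that hands out keys for later lookup.
module SketchMiddleMan
  @workers = {}
  @lock = Mutex.new

  def self.new_worker(klass, args = nil)
    key = SecureRandom.hex(8)
    @lock.synchronize { @workers[key] = klass.new(args) }
    key
  end

  def self.worker(key)
    @lock.synchronize { @workers[key] }
  end
end

key = SketchMiddleMan.new_worker(CounterSketch)   # returns without waiting
worker = SketchMiddleMan.worker(key)
puts worker.results[:count].inspect   # whatever the count is right now, possibly nil
puts worker.wait.results[:count]      # 10000 once the worker has finished
```

The key is a plain string, so it can be stored in the session just as the controller actions in this section do.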

Caveats

Unfortunately, changes to the workers require BackgrounDRb to be restarted. They are loaded once and then cached, just like your ActiveRecord models in production mode.

If you get an error like this

/usr/local/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:27:in `gem_original_require': no such file to load -- slave (LoadError)

remember that BackgrounDRb depends on the slave and daemons gems.

If the backgroundrb process should exit or die, the process ID files will need to be cleaned up. You’ll know that it happened if subsequent attempts to start the service result in

ERROR: there is already one or more instance(s) of the program running

To remove the stale log/backgroundrb.pid and log/backgroundrb.ppid files, we can use the convenient, built-in zap command:

$ script/backgroundrb zap

BackgrounDRb should start normally after the old files are zapped.
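If you want to confirm that a PID file really is left over before removing it, a check along the following lines can help. It is a sketch for Unix-like systems and is not part of BackgrounDRb; the stale_pid_file? helper is a hypothetical name, and the code assumes the file holds nothing but the process ID as an integer.

```ruby
require 'tmpdir'

# Returns true when the PID recorded in the file no longer belongs to a
# running process. Assumes the file contains a single integer.
def stale_pid_file?(path)
  pid = Integer(File.read(path).strip)
  Process.kill(0, pid)   # signal 0 sends nothing; it only checks existence
  false
rescue Errno::ESRCH
  true                   # no such process, so the file is left over
rescue Errno::EPERM
  false                  # the process exists but belongs to another user
end

# Hypothetical PID file, written here only to exercise the helper.
pid_file = File.join(Dir.tmpdir, "sketch_#{Process.pid}.pid")
File.write(pid_file, Process.pid.to_s)
puts stale_pid_file?(pid_file)   # this process is alive, so this prints false
File.delete(pid_file)
```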

BackgrounDRb Considerations

On the positive side:

  • Provides job control and asynchronous invocation right out of the box.

  • Popular, with many code samples posted on the web.

  • Optimal for “event-based” tasks, such as those that occur every time a user hits a particular action.

As for negatives:

  • The current version is considered “experimental” by the maintainers. You may end up needing to change your worker or action code as the API evolves.

  • Support for scheduled tasks is new, and may not be as stable as the rest of the codebase.

  • Some configuration options are baked in and may be difficult to customize if your production environment is unusual.

All things considered, BackgrounDRb seems perfect for tasks that need to be initiated from a controller action or a model callback.

Daemons

The website http://daemons.rubyforge.org/ offers an excellent Ruby library that lets you “daemonize” your script for easy management and maintainability.

Usage

The script in Listing 22.3 is a simple example of how to use the daemons library to run a scheduled task.

Example 22.3. A Simple Use of Daemons to Update RSS Feeds in the Background

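require 'singleton'
require 'daemons'

class BackgroundTasks
 include Singleton

 # Refresh the feeds, pause for ten seconds, and repeat forever.
 def update_rss_feeds
  loop do
   Feed.update_all
   sleep 10
  end
 end
end

# Daemons wraps the block in a process that it can start and stop.
Daemons.run_proc('BackgroundTasks') do
 BackgroundTasks.instance.update_rss_feeds
end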

The script defines a simple task, update_rss_feeds, and runs it in a loop. If you save it as background_tasks.rb and run it without any options like this:

script/runner background_tasks.rb

it will show you all options provided by the daemons library:

Usage: BackgroundTasks <command> <options> -- <application options>

* where <command> is one of:
 start     start an instance of the application
 stop      stop all instances of the application
 restart   stop all instances and restart them afterwards
 run       start the application and stay on top
 zap       set the application to a stopped state

* and where <options> may contain several of the following:

  -t, --ontop           Stay on top (does not daemonize)
  -f, --force           Force operation

Common options:
  -h, --help            Show this message
    --version          Show version

You can control your background task process using simple commands.

The Daemons library also guarantees that only one copy of your task is running at a time, which removes the need for the control logic that tends to creep into script/runner or cron scripts.

Introducing Threads

The preceding example demonstrates the control that the Daemons library provides. However, as written, it doesn’t do much. Let’s modify the script to make it fetch e-mails from an external server as well (as shown in Listing 22.4). Since fetching e-mail happens to use the network, we’ll use threads to get more work done in less time.

Example 22.4. The Threaded E-mail Fetcher

require 'thread'
require 'singleton'
require 'daemons'

class BackgroundTasks
 include Singleton

 def initialize
  # Needed before ActiveRecord is used from more than one thread.
  ActiveRecord::Base.allow_concurrency = true
 end

 # Start one thread per task, then wait on all of them.
 def run
  threads = []
  [:update_rss_feeds, :update_emails].each do |task|
   threads << Thread.new do
    self.send task
   end
  end
  threads.each {|t| t.join }
 end

 protected

 def update_rss_feeds
  loop do
   Feed.update_all
   sleep 10
  end
 end

 def update_emails
  loop do
   User.find(:all, :conditions => "email IS NOT NULL").each do |user|
    user.fetch_emails
   end
   sleep 60
  end
 end
end

Daemons.run_proc('BackgroundTasks') do
 # Call run, the public entry point defined above.
 BackgroundTasks.instance.run
end

An important thing to notice about the code in Listing 22.4 is that we added

ActiveRecord::Base.allow_concurrency = true

to the initialize method. That is a critical step for using ActiveRecord concurrently in multiple threads. Among other things, the setting gives each thread its own database connection. Forgetting this step can lead to data corruption and other horrors. Consider yourself warned!
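The reason threads pay off for network-bound work is that the waiting overlaps. The following sketch shows the effect in plain Ruby, with no ActiveRecord involved; fake_fetch is a hypothetical stand-in that simply sleeps where a real task would wait on a mail server.

```ruby
# Each fake task just waits, as a network fetch would.
def fake_fetch(seconds)
  sleep seconds
  seconds
end

delays = [0.2, 0.2, 0.2]

# One after another: the waits add up.
started = Time.now
serial = delays.map { |d| fake_fetch(d) }
serial_elapsed = Time.now - started

# One thread per task: the waits overlap.
started = Time.now
threaded = delays.map { |d| Thread.new { fake_fetch(d) } }.map(&:value)
threaded_elapsed = Time.now - started

puts format('serial:   %.2fs', serial_elapsed)
puts format('threaded: %.2fs', threaded_elapsed)
```

The same results come back either way; only the elapsed time differs. Work that is bound by the CPU rather than by waiting would not see this kind of gain from threads in the standard Ruby interpreter.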

The daemon we have just written has only the most trivial scheduling support. Your application may need something more robust than sleep 60. If this is the case, you may want to consider using the unfortunately named OpenWFEru library available at http://openwferu.rubyforge.org/scheduler.html, which provides a wide variety of scheduling possibilities.
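One weakness of a bare sleep is that the period becomes the sleep time plus however long the task took, so the schedule drifts. A small step up, short of a full scheduling library, is to sleep until the next planned start instead. The sketch below assumes a current Ruby; the every helper is a hypothetical name, and the short intervals are only there to keep the example quick to run.

```ruby
# Runs the block `times` times, aiming for a fixed period between starts.
def every(period, times)
  next_run = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  times.times do |i|
    yield i
    next_run += period
    pause = next_run - Process.clock_gettime(Process::CLOCK_MONOTONIC)
    sleep pause if pause > 0   # skip the sleep when the task overran
  end
end

starts = []
every(0.1, 3) do
  starts << Process.clock_gettime(Process::CLOCK_MONOTONIC)
  sleep 0.03   # stand-in for the real work
end

# Gaps between consecutive starts stay close to the period,
# rather than the period plus the time spent working.
gaps = starts.each_cons(2).map { |a, b| b - a }
puts gaps.map { |g| g.round(2) }.inspect
```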

Daemon Considerations

Daemons are the most cost-effective way to implement background-processing code that needs to run continuously, and they offer precise control over which libraries you load, and which settings you configure.

Daemons are also easy to manage with monitoring tools like monit: http://www.tildeslash.com/monit/.

On the negative side, setting up daemons is not as automatic as BackgrounDRb or as simple as script/runner. (Fundamentalist programmers might be scared to work on them too.)

Consider using Daemons whenever you need something to run continuously.

Conclusion

In this chapter, our final one of the book, we’ve covered extending Rails with behavior that runs in a context external to normal request processing, that is, in the background. The topic runs deep, and we’ve just skimmed across the surface of what is possible.

References

1. Be careful to escape any characters that have specific meaning to your shell.

2. http://backgroundrb.rubyforge.org/

3. Windows users can use the tasklist command to get similar results.
