Chapter 20. Background Processing

People count up the faults of those who keep them waiting.

—French Proverb

Users of modern websites have high expectations of application responsiveness; most will expect behavior and speed similar to that of desktop applications. Good user experience guidelines dictate that no HTTP request/response cycle should take more than a second to execute. However, some actions simply cannot meet this time constraint.

Tasks of this nature range from simple tasks that run long because of network latency to more complex tasks that require heavy processing on the server; sending an email and processing video are respective examples. In these situations, it is best to have the actions execute asynchronously, so that the application stays responsive while the procedures run.

In this chapter these types of tasks are referred to as background jobs. They include any execution that is handled in a separate process from the Rails application. Rails and Ruby have several libraries and techniques for performing this work, most notably:

• Delayed Job

• Resque

• Rails Runner

This chapter will cover each of these tools, discussing the strengths and weaknesses of each one so that you may determine what is appropriate for your application.

20.1 Delayed Job

Delayed Job is a robust background processing library that is essentially a highly configurable priority queue. It provides various approaches to handling asynchronous actions, including:

• Custom background jobs

• Permanently marked background methods

• Background execution of methods at runtime

By default, Delayed Job relies on Active Record to store all queue-related operations and requires a relational database to store job information. However, it can be configured to use other persistence frameworks, including ones backed by non-relational databases. Supported options are:

• DataMapper

• MongoMapper (for use with MongoDB)

• CouchREST (for use with CouchDB)

20.1.1 Getting Started

Add the delayed_job gem to your application’s Gemfile, then run the generator to create your execution and migration scripts.

rails generate delayed_job

This will create the database migration that will need to be run to set up the delayed_jobs table in the database, as well as a script to run Delayed Job. If you are using MongoMapper or CouchREST as the persistence framework, you may run the command with a --skip-migration option supplied since no migration will be needed.

To change the default settings for Delayed Job, first add a delayed_job.rb in your config/initializers directory. Options then can be configured by calling various methods on Delayed::Worker, which include settings for changing the behavior of the queue with respect to tries, timeouts, maximum run times, sleep delays and other options.

Delayed::Worker.backend = :mongo_mapper
Delayed::Worker.destroy_failed_jobs = false
Delayed::Worker.sleep_delay = 30
Delayed::Worker.max_attempts = 5
Delayed::Worker.max_run_time = 1.hour
Delayed::Worker.max_priority = 10

20.1.2 Creating Jobs

Delayed Job can create background jobs using three different techniques; which one you use depends on your personal style.

The first option is to chain any method that you wish to execute asynchronously after a call to Object#delay. This is good for cases where some common functionality needs to execute in the background in certain situations, but is acceptable to run synchronously in others.

[Listing not reproduced: a method call chained after Object#delay]
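As a rough illustration of this first technique, the sketch below makes the same call both synchronously and through delay. The Notifier class and its deliver_welcome method are hypothetical; only Object#delay comes from Delayed Job, and the guard on the last line is there so the file also loads when the gem is absent.

```ruby
# Hypothetical class standing in for any object with a slow method.
class Notifier
  def deliver_welcome(email)
    # Imagine a slow SMTP round trip here.
    "welcome sent to #{email}"
  end
end

notifier = Notifier.new

# Synchronous call: blocks until the work is done and returns its result.
result = notifier.deliver_welcome("user@example.com")

# Asynchronous call: with the delayed_job gem loaded, objects respond to
# #delay, and the chained call is stored as a job instead of running inline.
notifier.delay.deliver_welcome("user@example.com") if notifier.respond_to?(:delay)
```

Because the delayed call is stored as a job rather than run immediately, its return value is not the method's result, so this style suits work done for its side effects.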

The second technique is to tell Delayed Job to execute every call to a method in the background via the Object.handle_asynchronously macro.

[Listing not reproduced: a method marked with handle_asynchronously]
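A minimal sketch of the macro style might look like the following. ReportBuilder and build are hypothetical names, and the respond_to? guard is only there so the class also loads in plain Ruby without Delayed Job.

```ruby
# Hypothetical class; any class in an application using Delayed Job would do.
class ReportBuilder
  def build(month)
    "report for #{month}"
  end

  # The macro is declared after the method it wraps. With the gem loaded,
  # every later call to #build is queued rather than executed inline.
  handle_asynchronously :build if respond_to?(:handle_asynchronously)
end

# Without the gem, the method behaves like any other.
report = ReportBuilder.new.build("May")
```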


Durran says ...

When using handle_asynchronously, make sure the declaration is after the method definition, since Delayed Job uses alias_method_chain internally to set up the behavior.


Lastly, you may create a custom job by creating a separate Ruby object that only needs to respond to perform. That job can then be run at any point by telling Delayed Job to enqueue the action.

[Listing not reproduced: a custom job object and the call that enqueues it]
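The custom job technique can be sketched as a Struct with a perform method. NewsletterJob and its fields are hypothetical; Delayed::Job.enqueue is the call that places the object on the queue, and it is guarded here so the example also runs without the gem.

```ruby
# Hypothetical custom job: the only requirement is that it responds to #perform.
NewsletterJob = Struct.new(:subject, :emails) do
  def perform
    # Stand-in for the real delivery work.
    emails.map { |email| "#{subject} -> #{email}" }
  end
end

job = NewsletterJob.new("Weekly digest", ["a@example.com", "b@example.com"])

if defined?(Delayed::Job)
  # Inside an application with Delayed Job loaded, hand the job to the queue.
  Delayed::Job.enqueue(job)
else
  # Outside of it, run the same object inline.
  deliveries = job.perform
end
```

Keeping the work in a small object like this also makes it easy to exercise perform directly in a test, without a queue or a worker.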

20.1.3 Running

To start up Delayed Job workers, use the delayed_job script created by the generator. This allows for starting a single worker or multiple workers, each in its own process, and also provides the ability to stop all workers.

[Listing not reproduced: starting and stopping workers with the delayed_job script]


Durran says ...

Delayed Job workers generally have a lifecycle that is equivalent to an application deployment. Because of this, their memory consumption grows over time and may eventually have high swap usage, causing workers to become unresponsive. A good practice is to have a monitoring tool like God or monit watching jobs, and restarting them when their memory usage hits a certain point.


20.1.4 Summary

Delayed Job is an excellent choice when you want ease of setup, need to schedule jobs for later dates, or want to add priorities to jobs in your queue. It works well in situations where the total number of jobs is low and the tasks they execute are neither long-running nor memory-intensive.

Do note that if you are using Delayed Job with a relational database backend and have a large number of jobs, performance issues may arise due to the table locking the framework employs. Since workers may have a long lifecycle, be wary of resource consumption due to workers not releasing memory once jobs are finished executing. Also, where job execution can take a long period of time, higher priority jobs will still wait for the other jobs to complete before being processed. In these cases, using a non-relational backend such as MongoDB or potentially another library such as Resque may be advisable.

20.2 Resque

Resque is a background processing framework that supports multiple queues and is optimized for handling extremely large numbers of jobs efficiently. It uses Redis for its persistent storage and comes with a Sinatra web application to monitor the queues and jobs.

Resque actions are Ruby objects or modules that respond to a perform class method. Jobs are stored in the database as JSON objects, and because of this only primitives can be passed as arguments to the actions. Resque also provides hooks into the worker and job lifecycles, as well as the ability to configure custom failure mechanisms.

Due to Resque’s use of Redis as its storage engine, the overhead of job processing is negligible. It is currently the best performing background processing library for the feature set, and its parent/child forking architecture makes its resource consumption predictable and easily managed.

20.2.1 Getting Started

First, add the resque gem to your Gemfile, then configure Resque by creating a Rails initializer and a resque.yml to store the configuration options. The YAML should be key/value pairs of environment name with the Redis host and port, and the initializer should load the YAML and set up the Redis options.

Configuring failure backends can also be done in the same manner. Resque supports persistence to Redis or Hoptoad notifications out of the box, but custom backends can be easily created by inheriting from Resque::Failure::Base.

In config/resque.yml:

development: localhost:6379
staging:     localhost:6379
production:  localhost:6379

In config/initializers/resque.rb:

[Listing not reproduced: initializer that loads resque.yml and sets the Redis options]
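One way such an initializer could work is sketched below. It is an assumption about the shape of the code, not the original listing: the YAML is inlined so the example is self-contained, redis_location is a hypothetical helper, and the environment is passed explicitly where a real initializer would use Rails.env. Resque.redis= accepts a "host:port" string.

```ruby
require "yaml"

# Inline stand-in for the contents of config/resque.yml.
RESQUE_YAML = <<~CONFIG
  development: localhost:6379
  staging:     localhost:6379
  production:  localhost:6379
CONFIG

# Hypothetical helper: look up the "host:port" entry for one environment.
def redis_location(yaml_text, env)
  config = YAML.safe_load(yaml_text)
  config.fetch(env) { raise ArgumentError, "no Redis entry for #{env}" }
end

# In a real initializer the YAML would be read from disk and the
# environment name would come from Rails.env.
location = redis_location(RESQUE_YAML, "production")

# Point Resque at that Redis server when the gem is loaded.
Resque.redis = location if defined?(Resque)
```

Failing loudly on a missing environment key tends to be easier to debug than silently connecting to a default Redis instance.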

20.2.2 Creating Jobs

Jobs in Resque are plain old Ruby objects that respond to a perform class method and define which queue they should be processed in. The simplest way to define the queue is to set a class-level instance variable named @queue on the job itself.

[Listing not reproduced: a job class that sets @queue and defines self.perform]
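A minimal sketch of such a job follows, with a plain-Ruby approximation of the JSON round trip that explains why only primitives can be passed as arguments. Archive, its queue name, and its arguments are hypothetical; the decoding at the end only approximates what a worker does and is not Resque's own code.

```ruby
require "json"

# Hypothetical job class.
class Archive
  # Class-level instance variable naming the queue for this job.
  @queue = :file_serve

  def self.perform(repo_id, branch = "master")
    "archived #{repo_id}@#{branch}"
  end
end

# With Resque loaded, the job would be queued like this:
#   Resque.enqueue(Archive, 42, "stable")

# Approximation of the round trip: class name and arguments are stored as JSON...
payload = JSON.generate("class" => Archive.name, "args" => [42, "stable"])

# ...and later decoded by a worker and turned back into a call to perform.
decoded = JSON.parse(payload)
outcome = Object.const_get(decoded["class"]).perform(*decoded["args"])
```

Since the arguments travel as JSON, passing a record's id and reloading the record inside perform is the usual alternative to passing the object itself.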

20.2.3 Hooks

Resque provides lifecycle hooks that can be used to add additional behavior, for example adding an automatic retry for a failed job. There are two categories of hooks: worker hooks and job hooks.

The available worker hooks are before_first_fork, before_fork, and after_fork. The before hooks are executed in the parent process, whereas the after hook executes in the child process. This is important to note since changes in the parent process will be permanent for the life of the worker, whereas changes in the child process will be lost when the job completes.

[Listing not reproduced: registering worker hooks]
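The parent/child distinction can be seen without Resque at all. The following plain-Ruby sketch forks once and shows that a change made in the child never reaches the parent; the settings hash is a hypothetical stand-in for any state a hook might touch, and the example assumes a platform that supports fork.

```ruby
# Plain-Ruby illustration of the parent/child split; no Resque involved.
settings = { connections: 0 }

reader, writer = IO.pipe

pid = fork do
  # Child process: where an after_fork hook and the job itself would run.
  reader.close
  settings[:connections] += 1      # visible only inside the child
  writer.puts settings[:connections]
  writer.close
  exit!(0)                         # leave the child without running at_exit handlers
end

# Parent process: where before_first_fork and before_fork hooks would run.
writer.close
child_value = reader.read.to_i     # 1, as reported by the child
reader.close
Process.wait(pid)

parent_value = settings[:connections]  # still 0; the child's change is gone
```

This is why state that should last for the life of the worker belongs in a before hook, while per-job setup belongs in after_fork.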

Job hooks differ slightly from worker hooks in that they are defined on the action classes themselves and are defined as class methods with the hook name as the prefix. The available hooks for jobs are: before_perform, after_perform, around_perform, and on_failure.

An example job that needs to retry itself automatically on failure, and log some information before it starts processing, would look like:

[Listing not reproduced: a job class with before_perform and on_failure hooks]
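A sketch of such a job, under stated assumptions, is given below. FlakyImport, its queue, and the hook suffixes are hypothetical names; the hook signatures follow Resque's convention of passing the job arguments to before_perform hooks and the exception plus the arguments to on_failure hooks. The re-enqueue is guarded so the class also loads without the gem.

```ruby
require "logger"

# Hypothetical job class with two job hooks.
class FlakyImport
  @queue = :imports
  LOG = Logger.new($stdout)

  # Job hooks are class methods whose names begin with the hook name.
  def self.before_perform_log_start(*args)
    LOG.info("starting #{name} with #{args.inspect}")
  end

  # Called with the exception and the original job arguments.
  def self.on_failure_retry(exception, *args)
    LOG.warn("#{exception.class}: #{exception.message}; re-enqueueing")
    Resque.enqueue(self, *args) if defined?(Resque)
  end

  def self.perform(path)
    raise ArgumentError, "missing path" if path.to_s.empty?
    "imported #{path}"
  end
end
```

Note that an unconditional retry like this one will loop forever on a job that can never succeed, so a real application would probably cap the number of attempts.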

20.2.4 Plugins

Resque has a very good plugin ecosystem to provide it with additional useful features. Most plugins are modules that are included in your job classes, only to be used on specific jobs that need the extra functionality. Plugins of note are listed below and a complete list can be found at http://wiki.github.com/defunkt/resque/plugins.

resque-lock: Allows for only a single instance of a job to be running at a time.

resque-retry: Adds configurable retry and exponential backoff behavior for failed jobs.

resque-restriction: Provides configurable limits to job execution within given time frames.

resque-scheduler: Adds recurring jobs and the ability to schedule jobs in the future.

20.2.5 Running

Resque comes with two rake tasks that can be used to run workers: one to run a single worker for one or more queues, and a second to run multiple workers. Configuration options are supplied as environment variables when running the tasks and allow for defining the queues for the workers to monitor, logging verbosity, and the number of workers to start.

[Listing not reproduced: starting workers with the Resque rake tasks]

Stopping jobs involves sending signals to the parent Resque workers, which then take the appropriate action on the child and themselves:

QUIT waits for the forked child to finish processing, then exits

TERM/INT immediately kills the child process and exits

USR1 immediately kills the child process, but leaves the parent worker running

USR2 finishes processing the child action, then waits for CONT before spawning another

CONT continues to start jobs again if it was halted by a USR2

20.2.6 Monitoring

One of the really nice features of Resque is the web interface that it ships with for monitoring your queues and jobs. It can run standalone or be mounted with your Rails application using Rack::URLMap in your app’s config.ru.

To run standalone, simply run resque-web from the command line. If you prefer to mount with your Rails application, modify your config.ru to add the Resque server.

[Listing not reproduced: config.ru mounting the Resque server with Rack::URLMap]

20.2.7 Summary

Resque is recommended where a large number of jobs are in play and unwanted memory growth is a concern. Each child process releases its memory when its job completes, so long-running workers do not have the negative effect on system resources that they can have with other frameworks. Resque does not support priority queueing, but it does support multiple queues, which is advantageous when jobs can be categorized together and given pools of workers to run them.

With a Redis backend, Resque does not suffer from the potential database locking issues that can arise when using Delayed Job and has significantly better performance with respect to queue management.

Do note that Redis stores all of its data in memory, so if you are expecting a large number of jobs but do not have a significant amount of RAM to spare, you may need to look at a different framework.

20.3 Rails Runner

Rails comes with a built-in tool for running tasks independent of the web cycle. The rails runner command simply loads the default Rails environment and then executes some specified Ruby code. Popular uses include:

• Importing batch external data

• Executing any (class) method in your models

• Running intensive calculations, delivering e-mails in batches, or executing scheduled tasks

Usages involving rails runner that you should avoid at all costs are:

• Processing incoming e-mail

• Tasks that take longer to run as your database grows

20.3.1 Getting Started

For example, let us suppose that you have a model called Report. The Report model has a class method called generate_rankings, which you can call from the command line using:

$ rails runner 'Report.generate_rankings'

Since we have access to all of Rails, we can even use the Active Record finder methods to extract data from our application.

[Listing not reproduced: a rails runner command that collects e-mail addresses from the User model]

This example demonstrates that we have access to the User model and are able to execute arbitrary Rails code. In this case, we’ve collected some e-mail addresses that we can now spam to our heart’s content. (Just kidding!)

20.3.2 Usage Notes

There are some things to remember when using rails runner. You must specify the production environment using the -e option; otherwise, it defaults to development. The rails runner help option tells us:

[Listing not reproduced: help output of the rails runner command]

Using rails runner, we can easily script any batch operations that need to run using cron or another system scheduler. For example, you might calculate the most popular or highest-ranking product in your e-commerce application every few minutes or nightly, rather than make an expensive query on every request:

$ rails runner -e production 'Product.calculate_top_ranking'

A sample crontab to run that script might look like:

[Listing not reproduced: sample crontab entry]

The script will run every five hours to update the Product model’s top rankings.

20.3.3 Considerations

On the positive side, it doesn’t get any easier and there are no additional libraries to install. That’s about it.

As for negatives, the rails runner process loads the entire Rails environment. For some tasks, particularly short-lived ones, that can be quite wasteful of resources. Also, nothing prevents multiple copies of the same script from running simultaneously, which can be catastrophically bad, depending on the contents of the script.
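One common way to guard against overlapping runs, offered here as a sketch rather than anything rails runner provides, is to take an exclusive, non-blocking file lock at the top of the script. The with_exclusive_lock helper and the lock file name are hypothetical; File#flock is part of Ruby's standard library.

```ruby
require "tmpdir"

# Hypothetical helper: run the block only if no other process holds the lock.
def with_exclusive_lock(path)
  File.open(path, File::RDWR | File::CREAT, 0o644) do |file|
    # LOCK_NB makes flock return false instead of waiting when the lock is taken.
    return :already_running unless file.flock(File::LOCK_EX | File::LOCK_NB)

    yield
  end
  # The lock is released when the file is closed at the end of the block.
end

lock_path = File.join(Dir.tmpdir, "calculate_top_ranking.lock")

status = with_exclusive_lock(lock_path) do
  # In a runner script, Product.calculate_top_ranking would go here.
  :finished
end
```

If a second copy of the script starts while the first is still running, it gets :already_running back and can exit quietly instead of doing the work twice.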


Wilson says ...

Do not process incoming e-mail with rails runner. It’s a Denial of Service attack waiting to happen. Use Fetcher (or something like it) instead: http://slantwisedesign.com/rdoc/fetcher/.


20.3.4 Summary

The Rails Runner is useful for short tasks that need to run infrequently, but jobs that require more heavy lifting, reporting, and robust failover mechanisms are best handled by other libraries.

20.4 Conclusion

Most web applications today will need to incorporate some form of asynchronous behavior, and we’ve covered some of the important libraries available when needing to implement background processing. There are many other frameworks and techniques available for handling this, so choose the solution that is right for your needs—just remember to never make your users wait.
