Fork to Run Heavy Jobs

Cycling long-running Ruby instances helps to deal with sudden increases in memory consumption. But often we know beforehand that the code we’re going to execute will need a lot of memory. For example, a database query returns 100,000 rows, and we need to compute complex statistics based on that data.

We can let that memory-heavy operation run and then let our infrastructure restart the Ruby process. But there’s a better solution. We can fork our process and execute the memory-heavy code in the child process. This way, only the child process will grow in memory, and when it exits, the parent process remains unaffected.

The simplest possible implementation looks like this:

 
pid = fork do
  heavy_function
end
Process.waitpid(pid)

You might recognize this code from the performance_benchmark function in the previous chapter. We used the same fork-and-run approach to isolate benchmarks from the parent process, and from each other.

You might also recall the downside of this approach. Such code has no easy way of returning data to the parent process. If you need to, you’ll have to open a pipe between parent and child, use temporary storage, or store the results in the database.

In the previous chapter we used temporary storage to communicate between the forked process and its parent. So now let’s see how to send the data through an I/O pipe.

chp9/forked_process_io_pipe_example.rb
 
require 'bigdecimal'

def heavy_function
  # this allocates approx. 450,000 extra objects before returning the result
  Array.new(100000) { BigDecimal(rand(), 3) }.inject(0) { |sum, i| sum + i }
end

# disable GC to compute object allocation statistics
GC.disable
puts "Total Ruby objects before operation: #{ObjectSpace.count_objects[:TOTAL]}"

# open pipe, then close "read" end on child side,
# and "write" end on parent side
read, write = IO.pipe

pid = fork do
  # child may run GC as usual
  GC.enable

  read.close
  result = heavy_function
  # use Marshal.dump to save Ruby objects into the pipe
  Marshal.dump(result, write)

  exit!(0)
end

write.close
result = read.read
# make sure we wait until the child finishes
Process.wait(pid)

# use Marshal.load to load Ruby objects from the pipe
puts Marshal.load(result).inspect

# this number should be not too different from the previous one
puts "Total Ruby objects after operation: #{ObjectSpace.count_objects[:TOTAL]}"

When we run the code, we see that although the child allocates 400,000 to 450,000 objects, the parent’s object count doesn’t change at all.

 
$ ruby forked_process_io_pipe_example.rb
Total Ruby objects before operation: 30163
#<BigDecimal:7f99b3a612e8,'0.5016076916 4137E5',18(27)>
Total Ruby objects after operation: 30163
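
If you use the pipe pattern in more than one place, it may be worth wrapping it in a small helper. What follows is only a sketch: the name run_in_fork is hypothetical, it assumes a platform where fork is available, and it assumes the block returns something Marshal can serialize. Unlike the example above, it also checks the child’s exit status, so a crash in the child surfaces in the parent instead of producing an empty result.

```ruby
# Hypothetical helper (not part of the example file): run a block in a
# forked child and hand its result back to the parent through a pipe.
def run_in_fork
  read, write = IO.pipe
  read.binmode
  write.binmode

  pid = fork do
    read.close
    begin
      # the block's return value must be serializable with Marshal
      Marshal.dump(yield, write)
      write.close
      exit!(0)
    rescue Exception
      # any failure in the child is reported through the exit status
      exit!(1)
    end
  end

  write.close
  # read until EOF before waiting, so a large result cannot block the child
  payload = read.read
  read.close

  _, status = Process.wait2(pid)
  raise "child process failed (#{status.inspect})" unless status.success?

  Marshal.load(payload)
end

total = run_in_fork { (1..100_000).inject(:+) }
puts total
```

The parent reads the pipe before calling Process.wait2 on purpose. A pipe has a limited buffer, so if the parent waited first, a child that writes a large result could block forever.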

This technique is very useful for long-running Ruby applications that occasionally have to perform memory-heavy operations. But for Rails, there are usually better solutions.

Most modern deployments support the idea of background jobs. For example, the delayed_job gem[33] essentially implements the same idea. It lets you delay any function call by serializing the function and its data into the database, and then executing the code in a separate, short-lived process (usually launched by a rake task).

There are many other background job implementations that do the same thing. You can use any of them.

But beware of the ones that use threads instead of separate processes. A notable example is Sidekiq.[34] It usually runs as one Ruby process with several dozen Ruby threads. All of these share the same ObjectSpace, so when a job in one thread allocates a lot of memory, the whole process grows and needs a restart. So make sure you use one of the process management tools we talked about earlier to monitor and restart the Sidekiq worker.
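
The difference between threads and processes is easy to see in a few lines. This is a toy illustration, not Sidekiq code: the CACHE constant is made up and stands in for whatever memory a job allocates. Objects allocated by a thread stay in the parent’s heap, while objects allocated by a forked child disappear when the child exits.

```ruby
# Illustrative only: CACHE stands in for memory that a job allocates.
CACHE = []

# A thread runs inside the same process, so whatever it allocates
# stays in this process after the thread finishes.
Thread.new { CACHE.concat(Array.new(3) { |i| "thread-#{i}" }) }.join

# A forked child works on its own copy of the heap; its allocations
# are gone when it exits.
pid = fork do
  CACHE.concat(Array.new(3) { |i| "child-#{i}" })
  exit!(0)
end
Process.wait(pid)

puts CACHE.inspect
```

Only the entries added by the thread are printed. The child’s additions never reach the parent.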

Both cycling and forking keep the Ruby process under a certain memory limit, so that GC has less work to do and takes less time to complete. It’s GC time that we’re really optimizing here.
