Fail Gracefully

Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

Previous Chapter

Tame Complexity

Next Chapter

Be Stylish

Tip 5

Fail Gracefully

[White Belt] Writing code that fails well is just as important as writing code that works well.

What happens when code fails? It’s going to. Even if you wrote your part perfectly, there are all kinds of conditions that could cause the overall system to fail:

A rogue mail daemon on computer, busy sending offers of great wealth from some foreign country, consumes all the RAM and swap. Your next call to malloc() returns ETOOMUCHSPAM.
Java Update 134,001 fills up the system’s hard drive. You call write(), and the system returns ESWITCHTODECAF.
You try to pull data off a tape, but the tape robot is on a ship at sea, rolling waves cause the robot to drop the tape, and the driver returns EROBOTDIZZY.
Cosmic rays flip a bit in memory, causing a memory access to return 0x10000001 instead of 0x1, and you discover that this makes for a very bad parameter to pass into memcpy() after it returns EMEMTRASHED.

You may think, “Yeah, right” but all these cases actually happened. (Yes, I had to fix a tape robot controller because it would drop tapes when on a Navy ship.) Your code cannot naively assume that the world around it is sane—the world will take every opportunity to prove it wrong.

How your code fails is just as important as how it works. You may not be able to fix the failure, but if nothing else, your code should strive to fail gracefully.

Order of Operations

In many textbook programs, their environment is a clean slate, and the program runs to completion. In many messy, nontextbook programs, the environment is a rugby match of threads and resources, all seemingly trying to beat each other into submission.

Consider the following example: you’re creating a list of customer names and addresses that will be fed to a label printer. Your code gets passed a customer ID and a database connection, so you need to query the database for what you need. You create a linked list whose add() method looks like this:

ListUpdate.rb
	def add(customer_id) # BAD BAD BAD, see text
	begin
	@mutex.lock
	old_head = @head
	@head = Customer.new
	@head.name =
	@database.query(customer_id, :name)
	@head.address =
	@database.query(customer_id, :address)
	@head.next = old_head
	ensure
	@mutex.unlock
	end
	end

(Yes, I know this example is contrived. Bear with me.)

This code works in the happy path: the new element is put at the head of the list, it gets filled in, and everything is happy. But what if one of those queries to the database raises an exception? Take a look at the code again.^[9]

This code doesn’t fail gracefully. In fact, it does collateral damage by allowing a database failure to destroy the customer list. The culprit is the order of operations:

The list @head and @head.next are absolutely vital to the list’s integrity. These shouldn’t be monkeyed with until everything else is ready.
The new object should be fully constructed before inserting into the list.
The lock should not be held during operations that could block. (Assume there’s other threads wanting to read the list.)

Transactions

In the previous section, the example had only one essential bit of state that needed to stay consistent. What about cases where there’s more than one? Consider the classic example of moving money between two bank accounts:

Transaction.rb
	savings.deduct(100)
	checking.deposit(100)

What happens if the database croaks right after the money has been deducted and the deposit into checking fails? Where did the money go? Perhaps you try to solve that case by putting it back into the savings account:

Transaction.rb
	savings.deduct(100) # Happily works

	begin
	checking.deposit(100) # Fails: database went down!
	rescue
	begin
	# Put money back
	savings.deposit(100) # Fails: database still dead
	rescue
	# Now what???
	end
	end

Nice try, but that doesn’t help if the second deposit fails, too.

The tool you need here is a transaction. Its purpose is to allow several operations, potentially to several objects, to be either fulfilled completely or rolled back.

Transactions (here in a made-up system) would allow our previous example to look like this:

Transaction.rb
	t = Transaction.new(savings, checking)
	t.start

	# Inject failure
	checking.expects(:deposit).with(100).raises

	begin
	savings.deduct(100)
	checking.deposit(100)
	t.commit
	rescue
	t.rollback
	end

You’ll usually find transactions in databases, because our example scenario is exceedingly common in that field. You may find variations on this theme anywhere systems require an all-or-nothing interlock.

Failure Injection

So far, we’ve talked about how your code responds to likely failures. For purposes of testing, how do you ensure your code responds well when an essential resource dies, passes on, is no more, ceases to be, pushes up daisies, and becomes an ex-resource?

The solution is to inject failures using an automated test harness. This is easiest with a mock object framework, because you can tell the mock to return good data several times and then return something bogus or throw an exception. Likewise, in the code under test, you assert that the appropriate exception is raised.

Revisiting our list update problem, here’s some test code that simulates a valid database response for key 1 and a failure on the query for key 2:

ListUpdate2.rb
	require 'rubygems'
	require 'test/unit'
	require 'mocha'

	class ListUpdateTest < Test::Unit::TestCase
	def test_database_failure
	database = mock()
	database.expects(:query).with(1, :name).
	returns('Anand')
	database.expects(:query).with(1, :address).
	returns('')
	database.expects(:query).with(2, :name).
①	raises

	q = ShippingQueue.new(database)
	q.add(1)

	assert_raise(RuntimeError) do
②	q.add(2)
	end

	# List is still okay
③	assert_equal 'Anand', q.head.name
	assert_equal nil, q.head.next
	end
	end

① Injection of RuntimeError exception.

② Call will raise; the assert_raise is expecting it (and will trap the exception).

③ Verify that the list is still intact, as if q.add(2) were never called.

Failure injection of this sort allows you to think through—and verify—each potential scenario of doom. Test in this manner just as often as you test the happy path.

Test Monkeys

You can think through scenarios all day long and build tremendously robust code. Yet most fool-proof programs can be foiled by a sufficiently talented fool. If you don’t have such a fool handy, the next best thing is a test monkey.

In my first job working on handheld computers, we had a program called Monkey that would inject random taps and drags into the UI layer, as if they had come from the touchscreen. There was nothing fancier than that. We’d run Monkey until the system crashed.

Monkey may not have been a talented fool, but a whole bunch of monkeys tapping like mad, 24 hours a day, makes up for lack of talent. Alas, no Shakespeare (but perhaps some E. E. Cummings) and a whole bunch of crashes. The crashes were things we couldn’t have envisioned—that was the point.

In the same way, can you create a test harness that beats the snot out of your program with random (but valid) data? Let it run thousands or millions of cycles; you never know what might turn up. I used this technique on a recent project and discovered that once in a blue moon, a vendor API function would return “unknown” for the state of a virtual machine. What do they mean, they don’t know the state? I had no idea the function could return that. My program crashed when it happened. Lesson learned…again.

Actions

Revisit the previous code with the customer list. How would you fix it? Here’s a shell to work with:

ListUpdate2.rb
	require 'thread'

	class Customer
	attr_accessor :name, :address, :next

	def initialize
	@name = nil
	@address = nil
	@next = nil
	end
	end

	class ShippingQueue
	attr_reader :head

	def initialize(database)
	@database = database
	@head = nil
	@mutex = Mutex.new
	end

	def add(customer_id)
	# Fill in this part
	end
	end

Use the test code from Failure Injection to see whether you got it right.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

Table of Contents for Fail Gracefully

Create new playlist

Sign In

Sign Up

Order of Operations

Transactions

Failure Injection

Test Monkeys

Actions

Table of Contents for
Fail Gracefully