An Ounce of Prevention?

When such staggering costs result from such a small error, the natural response is to say, “This must never happen again.” (I’ve seen ops managers pound their shoes on a table like Nikita Khrushchev while declaring, “This must never happen again.”) But how can it be prevented? Would a code review have caught this bug? Only if one of the reviewers knew the internals of Oracle’s JDBC driver or the review team spent hours on each method. Would more testing have prevented this bug? Perhaps. Once the problem was identified, the team performed a test in the stress test environment that did demonstrate the same error. The regular test profile didn’t exercise this method enough to show the bug. In other words, once you know where to look, it’s simple to make a test that finds it.

Ultimately, it’s just fantasy to expect every single bug like this one to be driven out. Bugs will happen. They cannot be eliminated, so they must be survived instead.

The worst problem here is that the bug in one system could propagate to all the other affected systems. A better question to ask is, “How do we prevent bugs in one system from affecting everything else?” Inside every enterprise today is a mesh of interconnected, interdependent systems. They cannot—must not—allow bugs to cause a chain of failures. We’re going to look at design patterns that can prevent this type of problem from spreading.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.137.175.113