Use Supervisory Control

Sometimes, when you are testing, you just cannot find a seam. Maybe a class is too well encapsulated. Perhaps the designer of the software under test did not recognize a need to allow the calling code to insert behavioral or monitoring hooks into the software. Maybe the software is already highly synchronized to a point at which it “couldn’t possibly have a race condition,” at least until the ongoing evolution of the system changes the assumptions under which it was originally thread-safe.

The last of these happened in a system I once worked with. Imagine a system in which autonomous agents10 collaborate in a manner governed by a tightly designed state machine to collectively achieve the application goal. In this particular system, the agents were organized to model cause–effect relationships, and each waited patiently on a blocking queue for messages from its causative predecessors. Life was good, orderly, and well synchronized in the agents’ world.

10. Software components that operate independently with a locally defined set of tasks to collectively effect emergent behaviors. The typical examples of autonomous agents from the natural world are ants, bees, and birds. For example, you can algorithmically define the behavior of a single bird relative to the movements of its nearest birds to effectively and accurately simulate the behavior of the entire flock.

Then we recognized the need for human influence on the process. Suddenly, events could come in sequences that were not anticipated by the state machine because of the inherently asynchronous nature of human interaction. One particularly inventive system tester created a test harness that rapidly and randomly exerted external influence on the system like an angry child at a keyboard. This system test alone generated a multitude of parallelism bugs literally overnight. Given that the system had very high robustness requirements and hundreds of users interacting with it at a time, the test was only a mild exaggeration of anticipated usage.

The challenge was that the race condition lay in the interaction between a very simple loop and an off-the-shelf concurrency component, much like the Java code11 in Listing 13-14.

11. You may want to reference the Javadoc for the BlockingQueue interface if you are interested in the details. It can be found at http://docs.oracle.com/javase/6/docs/api/java/util/concurrent/BlockingQueue.html.

Listing 13-14: A race condition in working with thread-safe components

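import java.util.concurrent.BlockingQueue;

class Agent implements Runnable {
  private BlockingQueue<Event> incoming;
  ...
  @Override
  public void run() {
    processEvents();
  }

  public void processEvents() {
    try {
      while (true) {
        Event event = incoming.take(); // Blocks until an event arrives
        processEvent(event);
      }
    } catch (InterruptedException e) {
      // take() is interruptible; restore the flag and leave the loop
      Thread.currentThread().interrupt();
    }
  }

  public void addEvent(Event event) {
    try {
      incoming.put(event); // May block if the queue is bounded and full
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  private void processEvent(Event event) {
    // Do something with the event
  }
}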

One problem in particular occurred when two events of a particular type arrived in rapid succession. In the wild, two such events could arrive at effectively the same time, and both would be sitting in the queue before the agent processed either one. In isolated unit testing, we could never stuff the queue fast enough to recreate the condition.
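To see why stuffing the queue from a test is unreliable, consider the following minimal sketch. It is not the original Agent: the class name NaiveStuffingDemo, the use of LinkedBlockingQueue, and the String events are all assumptions made for illustration. A consumer blocks on the queue, the test adds two events back to back, and the consumer records how many events were still queued right after each take().

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical demo class, not part of the original system.
public class NaiveStuffingDemo {

  // Adds two events back to back while a consumer is blocked on the queue
  // and returns the backlog the consumer saw right after each take().
  static List<Integer> run() throws InterruptedException {
    BlockingQueue<String> queue = new LinkedBlockingQueue<>(); // assumed implementation
    List<Integer> backlogAfterTake = new CopyOnWriteArrayList<>();

    Thread consumer = new Thread(() -> {
      try {
        for (int i = 0; i < 2; i++) {
          queue.take(); // Blocks while the queue is empty
          // A backlog of 1 means the second event was already waiting
          backlogAfterTake.add(queue.size());
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    consumer.start();

    Thread.sleep(100); // Crude: give the consumer time to block in take()
    queue.put("first");
    queue.put("second"); // The consumer may already have taken "first"
    consumer.join();
    return backlogAfterTake;
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("Backlog seen after each take: " + run());
  }
}
```

Whether the first recorded backlog is 0 or 1 depends entirely on how the scheduler interleaves the two threads, which is exactly why a test built this way cannot reproduce the double-event condition on demand.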

So where is the seam? We have an almost trivial, very tight loop operating on a black-box data structure supplied by the system. The only apparent seam is a private method that we would rather not expose.12 At this point, I would encourage you to put down the book for a moment, think about the nature of preemptive multitasking, and see if you can figure out where the invisible seam resides.

12. In the real-life system, the equivalent of processEvent() was protected, but the bug occurred in a specific implementation of it that was itself strongly encapsulated and complex enough that it wouldn’t have been a good test to try to modify a copy of it.

The answer is in how the thread itself time slices. The point at which the loop blocks waiting for an event is a well-defined state. Because the thread life cycle in Java is explicit, we know the thread state when the loop is blocking. Java has methods to suspend and resume threads, Thread.suspend() and Thread.resume(). These methods have long been deprecated because in production systems they are a deadlock-prone way to synchronize tasks, and JDK 20 and later no longer support them, so the technique as shown applies to older runtimes. However, for our purposes this is ideal and not susceptible to deadlock because of the simple and linear nature of our test. The test to deterministically reproduce the double-event bug might look like Listing 13-15.

Listing 13-15: A test to deterministically reproduce the bug in Listing 13-14

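import org.junit.Test; // JUnit 4 assumed

public class AgentTest {
  @Test
  public void testProcessEvents() {
    Agent sut = new Agent();
    Thread thread = new Thread(sut);
    thread.start();
    // Spin until the agent is parked inside take(). With the standard
    // java.util.concurrent queues a parked thread reports WAITING;
    // BLOCKED is reported only while waiting for a monitor lock.
    while (thread.getState() != Thread.State.WAITING) {
      Thread.yield();
    }
    thread.suspend(); // Freeze the thread (deprecated API)
    // Queue up the double event
    sut.addEvent(new ProblematicEvent());
    sut.addEvent(new ProblematicEvent());
    thread.resume(); // Bug triggered!
  }
}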

By putting the thread into a known state and suspending it, we guarantee that no processing will occur while we set up the aberrant conditions. Of course, this test is missing assertions about the outcome of successful event processing, but you can add those when you apply this to your own real-world situation.
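The claim that the blocking point is a well-defined, observable state can be checked in isolation. The following is a minimal sketch, assuming a LinkedBlockingQueue as the queue implementation (the original system's choice is not stated) and using a hypothetical class named ParkedStateDemo. It starts a consumer on an empty queue and polls Thread.getState() until the consumer has parked inside take().

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical demo class, not part of the original system.
public class ParkedStateDemo {

  // Starts a consumer that blocks in take() on an empty queue and returns
  // the thread state observed once it has parked, or the last state seen
  // if it has not parked within about five seconds.
  static Thread.State observeParkedState() throws InterruptedException {
    BlockingQueue<String> queue = new LinkedBlockingQueue<>(); // assumed implementation
    Thread consumer = new Thread(() -> {
      try {
        queue.take(); // Blocks: nothing is ever added to the queue
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // Expected during cleanup
      }
    });
    consumer.setDaemon(true);
    consumer.start();

    long deadline = System.nanoTime() + 5_000_000_000L;
    Thread.State state = consumer.getState();
    while (state != Thread.State.WAITING && System.nanoTime() < deadline) {
      Thread.sleep(1);
      state = consumer.getState();
    }

    consumer.interrupt(); // Release the consumer so the demo exits cleanly
    consumer.join(1000);
    return state;
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("State while parked in take(): " + observeParkedState());
  }
}
```

With the standard java.util.concurrent queues, a thread parked in take() reports Thread.State.WAITING; Thread.State.BLOCKED is reserved for threads contending for a monitor lock. A test that polls for the blocking point should therefore look for the state this sketch reports for the queue implementation actually in use.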
