Synchronize through Common Lock

Existing explicit synchronization gives us one of the easiest seams to exploit. The most common form occurs when the code under test has a critical section that uses another class that is already thread-safe. In such cases, you can use the synchronization in the other class to suspend within the critical section. Consider the code in Listing 13-7.

Listing 13-7: An example for discussion on common lock synchronization

public final class HostInfoService { // Singleton
  private static HostInfoService instance;
  private HostInfoService() {
   ...
  }

  public static synchronized HostInfoService getInstance() {
    if (instance == null) {
      instance = new HostInfoService();
    }
    return instance;
  }

  public HostInfo getHostInfo(String hostname) {
    HostInfo hostInfo = new HostInfo();
    ...
    return hostInfo;
  }
}

public class HostCache {
  private Map<String, HostInfo> hostInfoCache;

  public HostInfo getHostInfo(String hostname) {
    HostInfo info = hostInfoCache.get(hostname);
    if (info == null) {
      info = HostInfoService.getInstance()
          .getHostInfo(hostname);
      hostInfoCache.put(hostname, info);
    }
    return info;
  }
}

First of all, where is the race condition? Introspecting on the intent of HostCache, this kind of functionality is typically designed to be a singleton instance, even if it is not implemented as a Singleton pattern. This means that the application will ensure that there is one and only one of these, at least within some shared scope. As a result, the hostInfoCache map is a shared data structure, even though it is not static. In Java, Map<K,V> is an interface defining a standard key-value storage contract. I have not shown the implementation actually allocated because the implementation type and its synchronization are not the source of our race condition.7 It also is not the seam we will exploit for our example, although it should be seen as a viable alternative.

7. Assuming we are using a thread-safe Map implementation. HashMap (http://docs.oracle.com/javase/6/docs/api/java/util/HashMap.html), perhaps the most commonly used Map implementation, is not thread-safe.
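As an aside on the footnote's caveat: in Java 8 and later, ConcurrentHashMap.computeIfAbsent() makes the whole check-then-act sequence atomic per key, which would remove this particular race at the map level. A minimal sketch, using String in place of HostInfo for brevity (the class and method names here are hypothetical, not from the example above):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class ConcurrentHostCache {
    private final Map<String, String> hostInfoCache =
            new ConcurrentHashMap<>();

    public String getHostInfo(String hostname) {
        // computeIfAbsent() invokes the mapping function at most once per
        // absent key, so two racing threads receive the same cached value.
        return hostInfoCache.computeIfAbsent(hostname, this::lookupHost);
    }

    private String lookupHost(String hostname) {
        return "info for " + hostname; // stand-in for HostInfoService
    }
}
```

Note that this removes the race rather than exposing it; for the purpose of this chapter, we want to reproduce the race, not fix it yet.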

Assuming that shared scope involves multiple threads, as is often the case in modern networked applications, the getHostInfo() method has a race condition similar to that in our pseudo-singleton from the introduction but a little bit broader.

Assuming the map implementation itself is thread-safe, the start of our critical section is the initial assignment to info, where the value obtained from the map is stored locally. From that point until the new value is stored defines the extent of our critical section. The value stored for the key can change between these two points on a different thread and give us two different cached HostInfos for the same key.

So what seams do we have in the critical section? There are three that are immediately apparent by nature of the method calls involved, as follows.

1. HostInfoService.getInstance()

2. HostInfoService.getHostInfo()

3. Map.put()

For this example, we will use the first one, HostInfoService.getInstance(). Our makeshift implementation of getHostInfo() is not a very appealing seam, although a more realistic implementation could provide one. Map.put() could be difficult to use because we have only exposed the interface, not the implementation, and we are not in control of that code. Keep that thought in mind while we look at the test we would write to reproduce the race condition.

First, let’s define a Callable to encapsulate the usage that will reproduce the race condition (Listing 13-8). We will create this as a private inner class of our test class.

Listing 13-8: A Callable to help reproduce the race condition in Listing 13-7

private class GetHostInfoRaceTask
    implements Callable<HostInfo> {
  private final HostCache cache;
  private final String hostname;
  public GetHostInfoRaceTask(HostCache cache,
      String hostname) {
    this.cache = cache;
    this.hostname = hostname;
  }

  @Override
  public HostInfo call() throws Exception {
    return cache.getHostInfo(hostname);
  }
}

Passing the HostCache into the constructor supports the intended singleton usage of the cache. The hostname parameter supports defining the same lookup key for each thread, which is a necessary characteristic for the race condition.

Now that we have our task, the test method to reproduce the race condition is shown in Listing 13-9.

Listing 13-9: A test to reproduce the race condition in Listing 13-7

@Test
public void testGetHostInfo_Race()
    throws InterruptedException, ExecutionException {
  HostCache cache = new HostCache();
  String testHost = "katana";
  FutureTask<HostInfo> task1 =
    new FutureTask<HostInfo>(
        new GetHostInfoRaceTask(cache, testHost)
    );
  FutureTask<HostInfo> task2 =
    new FutureTask<HostInfo>(
        new GetHostInfoRaceTask(cache, testHost)
    );

  Thread thread1 = new Thread(task1);
  Thread thread2 = new Thread(task2);

  thread1.start();
  thread2.start();
  HostInfo result1 = task1.get();
  HostInfo result2 = task2.get();
  Assert.assertNotNull(result1);
  Assert.assertNotNull(result2);
  Assert.assertSame(result1, result2);
}

In the pattern of the Four-Phase Test, everything before we start the threads sets up the fixture, starting the threads and obtaining the results from the futures exercises the SUT, the assertions verify the results, and we rely on Garbage-Collected Teardown. We also choose to declare exceptions in the throws clause to keep our test as simple as possible. If you run this test several times, it should fail a significant percentage of the runs, always at the assertSame() assertion.

We now have a unit test that sporadically fails. However, statistics are not good enough for what we want. It may be tolerable to rely on statistics when tests fail as much as 25–50% of the time like this one. When subtler race conditions manifest in failure rates below 1%, and the overall test bed takes even a moderate time to run—say five minutes—it would take on average over four hours8 to start to have reasonable confidence of a fix. Let’s make the race condition predictable.

8. Five minutes times 100 gives 500 minutes, or just over eight hours. Statistically, there is a 50% probability that we would see the failure in the first four hours or so.
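The footnote's arithmetic can be spelled out: with a per-run failure probability p, the expected number of runs before the first observed failure is 1/p. A quick sketch of the footnote's numbers (the values are illustrative, not measured):

```java
class RaceOdds {
    public static void main(String[] args) {
        double p = 0.01;          // assumed per-run failure probability (<1%)
        double runMinutes = 5.0;  // assumed duration of one full test run

        // Expected number of runs before the first failure is 1/p.
        double expectedRuns = 1.0 / p;                       // 100 runs
        double expectedMinutes = expectedRuns * runMinutes;  // 500 minutes

        System.out.printf("%.0f runs, %.0f minutes (%.1f hours)%n",
                expectedRuns, expectedMinutes, expectedMinutes / 60.0);
    }
}
```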

Looking at the implementation of the getInstance() method, we see that the method is synchronized. The method could have also used explicit synchronization on a specific static object instead, and many would say that that is better practice, but for our purposes it does not matter, so we use the briefer formulation.
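For comparison, the explicit-lock formulation mentioned above might look like the following sketch (the class and field names are hypothetical). One reason it is often considered better practice is that a private lock object cannot be acquired by outside code; note that this very property would defeat the test technique we are about to use, which relies on synchronizing on the publicly reachable class object.

```java
final class HostInfoServiceAlt {
    // A dedicated private lock: outside code cannot synchronize on it,
    // unlike the publicly reachable HostInfoServiceAlt.class monitor.
    private static final Object LOCK = new Object();
    private static HostInfoServiceAlt instance;

    private HostInfoServiceAlt() {
    }

    public static HostInfoServiceAlt getInstance() {
        synchronized (LOCK) {
            if (instance == null) {
                instance = new HostInfoServiceAlt();
            }
            return instance;
        }
    }
}
```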

In Java, a synchronized method synchronizes on its object instance. But this is a static synchronized method, so it synchronizes on its class object. In either case, the object acts as a monitor to serialize execution of all similarly synchronized methods. The synchronization primitives themselves are built into the Object class from which every Java class derives.
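These equivalences can be made concrete in a small sketch (the class name is hypothetical). Both static forms acquire the same monitor, the Class object, which Thread.holdsLock() can confirm:

```java
class MonitorDemo {
    // Equivalent to wrapping the body in synchronized (MonitorDemo.class).
    static synchronized boolean inStaticMethod() {
        return Thread.holdsLock(MonitorDemo.class);
    }

    // Equivalent to wrapping the body in synchronized (this).
    synchronized boolean inInstanceMethod() {
        return Thread.holdsLock(this);
    }

    public static void main(String[] args) {
        System.out.println(inStaticMethod());                     // true
        System.out.println(new MonitorDemo().inInstanceMethod()); // true
    }
}
```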

We will use this fact about Java synchronization to trap our test threads in the middle of the critical section, ensuring that they trigger the race condition failure mode. How can we ensure that the monitor for the HostInfoService is taken when the threads get to it? Let’s enclose the two calls to Thread.start() in a synchronized block.

But wait, that only guarantees that the class monitor is locked when the threads are started, not that it will be locked when they arrive at the call to the getInstance() method. We have a race condition in our test setup! We can remedy this easily. I anticipated this and chose to use explicit Thread instances to run the tasks rather than using something like a ThreadPoolExecutor, as might be suggested by the use of Callable. While the executor does a nice job of encapsulating and managing the threads for production purposes, we want to have a little more insight and control than that encapsulation allows.

Java has a well-defined state machine for a thread’s life cycle. Specifically, when a Java thread is waiting on a monitor, it enters the Thread.State.BLOCKED state until it is released. It would be bad style to create a busy wait for this state in production code, but our test is simple and deterministic. The two lines to start the thread now become part of the fixture setup and look like Listing 13-10 (added lines in bold type).

Listing 13-10: Ensuring that test threads are blocked before testing. The bold-faced lines are added around the corresponding lines from Listing 13-9.

synchronized (HostInfoService.class) {
  thread1.start();
  while (thread1.getState() != Thread.State.BLOCKED) { }
  thread2.start();
  while (thread2.getState() != Thread.State.BLOCKED) { }
}

By the time our test leaves the synchronized block, both of our threads are waiting on the HostInfoService.class monitor in the middle of the critical section. One of the threads will obtain the lock and continue. The other will wait until the lock is released. Either way, we have deterministically reproduced the race condition. The test now fails 100% of the time in its original implementation without any changes to the code under test.

We can now easily fix the race condition in any of several ways. Opting for the easiest for demonstration purposes, let’s just add synchronized to the method definition for HostCache.getHostInfo() so that it looks like

public synchronized HostInfo getHostInfo(String hostname) {

You could refine the scope a little by putting an explicit synchronized block only around the critical section, leaving the return outside and requiring info to be declared, but not assigned, before the block. Depending on the overall synchronization needs of the class, the block could synchronize on this or on a special lock object allocated just to protect this critical section.
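That narrower formulation might look like the following sketch, again with String standing in for HostInfo and a placeholder for the service lookup (the class name is hypothetical):

```java
import java.util.HashMap;
import java.util.Map;

class ScopedHostCache {
    private final Map<String, String> hostInfoCache =
            new HashMap<String, String>();

    public String getHostInfo(String hostname) {
        String info; // declared, but not assigned, outside the block
        // Only the check-then-act sequence is the critical section, so the
        // synchronized block covers exactly that and nothing more.
        synchronized (this) {
            info = hostInfoCache.get(hostname);
            if (info == null) {
                info = lookupHost(hostname); // stand-in for HostInfoService
                hostInfoCache.put(hostname, info);
            }
        }
        return info;
    }

    private String lookupHost(String hostname) {
        return "info for " + hostname; // placeholder lookup result
    }
}
```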

Although the specifics of Object monitors are very Java-centric, similar mechanisms exist in all thread-synchronization libraries. Not all are as tightly integrated to the object model, but monitors, mutexes, and critical sections are some of the universal building blocks of thread programming.

As a final note, more complicated locking schemes may require a higher level of ingenuity. For example, we implemented a busy wait in the test until the thread was blocked. How do we know it was blocked at the point at which we thought it was? Does it matter? The short answers are, “We don’t” and “Maybe.”

In our simple example, it made our test deterministically fail, and inspection tells us that there is no other hypothesis to explain it. We could also put the test in the debugger and verify that the threads are blocked at the expected location before leaving the synchronized block. If the code does not change—a very poor assumption to make about any production code—then simply making the test reproducible should be sufficient evidence. However, for code maintenance purposes, we should know that it is behaving the way we designed it to behave, but there is no programmatic way to assert that.
