Test Harnesses

As you’ve seen in previous chapters, distributed systems have failure modes that are difficult to provoke in development or QA environments. To be more thorough about testing various components together, we often resort to an “integration testing” environment. In this environment, our system is fully integrated to all the other systems it interacts with.

Integration testing presents problems of its own, however. What version should we test against? For greatest assurance, we’d like to test against the versions of our dependencies that will be current when we release our system. We could prove by induction that this approach constrains the entire company to testing only one new piece of software at a time. (Naturally, the proof itself is left as an exercise for the reader.) Furthermore, today’s systems are so interdependent that an integration testing environment really becomes unitary: one global integration test that duplicates the real production systems of the entire enterprise. Such a unitary environment would need change control at least as rigorous as that of the actual production environments, and perhaps more so.

There is a more abstract difficulty. Integration test environments can verify only what the system does when its dependencies are working correctly. Although it may be possible to provoke the remote system into returning errors, it’s still functioning more or less within specifications. If the specifications say, “The system shall return an error code 14916 unless the request includes the date of the last telephone sanitization,” then the caller can force that error condition to occur. Nevertheless, the remote system is still operating within specifications.

The main theme of this book, however, is that every system will eventually end up operating outside of spec; therefore, it’s vital to test the local system’s behavior when the remote system goes wonky. Unless the designers of the remote system built in modes that simulate the whole range of out-of-spec failures that can occur naturally in production, there will be behaviors that integration testing does not verify.

A better approach to integration testing would allow you to test most or all of these failure modes. It should preserve or enhance system isolation to avoid the version-locking problem and allow testing in many locations instead of the unitary enterprise-wide integration testing environment I described earlier.

To do that, you can create test harnesses to emulate the remote system on the other end of each integration point. Hardware and mechanical engineers have used test harnesses for a long time. Software engineers have used test harnesses, but not as maliciously as they should. A good test harness should be devious. It should be as nasty and vicious as real-world systems will be. The test harness should leave scars on the system under test. Its job is to make the system under test cynical.

Consider building a test harness that substitutes for the remote end of every web services call. Because the remote call uses the network, the socket connection is susceptible to the following failures:

  • It can be refused.

  • It can sit in a listen queue until the caller times out.

  • The remote end can reply with a SYN/ACK and then never send any data.

  • The remote end can send nothing but RESET packets.

  • The remote end can report a full receive window and never drain the data.

  • The connection can be established, but the remote end never sends a byte of data.

  • The connection can be established, but packets could be lost, causing retransmit delays.

  • The connection can be established, but the remote end never acknowledges receiving a packet, causing endless retransmits.

  • The service can accept a request, send response headers (supposing HTTP), and never send the response body.

  • The service can send one byte of the response every thirty seconds.

  • The service can send a response of HTML instead of the expected XML.

  • The service can send megabytes when kilobytes are expected.

  • The service can refuse all authentication credentials.

These failures fall into distinct categories: network transport problems, network protocol problems, application protocol problems, and application logic problems. With a little mental exercise, you can find failure modes in every layer of the seven-layer OSI model. It would be costly and bizarre to add switches and flags to applications that would allow them to simulate all of these failures. Who would want to risk turning on a “simulated failure” once the system is promoted into production? Integration testing environments are good at examining failures only in the seventh layer—the application layer—and not even all of those.

A test harness “knows” that it’s meant for testing; it has no other role to play. Although the real application wouldn’t be written to call the low-level network APIs directly, the test harness can be. Therefore, it’s able to send bytes too quickly, or very slowly. It can set up extremely deep listen queues. It can bind to a socket and then never service a single connection attempt. The test harness should act like a little hacker, trying all kinds of bad behavior to break callers.
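As a minimal sketch of the “bind and never service” behavior, assuming Python and only its standard socket module (the function name and its defaults are illustrative, not taken from any real harness):

```python
import socket


def listen_but_never_accept(host="127.0.0.1", port=0, backlog=1):
    """Bind and listen, but never call accept().

    Hypothetical helper for illustration. Passing port=0 asks the OS for
    any free port; read the real one back with getsockname().
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, port))
    sock.listen(backlog)
    # No accept() loop, on purpose: connections just wait in the listen queue.
    return sock  # keep a reference; closing it ends the experiment
```

A caller that connects to this socket typically sees the connection succeed, because the operating system completes the handshake on the harness’s behalf, and then waits for a reply that never comes. What happens once the listen queue fills up depends on the operating system, so treat that part as something to observe rather than assume.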

Many kinds of bad behavior will be similar for different applications and protocols. For example, refusing connections, connecting slowly, and accepting requests without reply would apply to any socket protocol: HTTP, RMI, or RPC. For these, a single test harness can simulate many types of bad network behavior. One trick I like is to have different port numbers indicate different kinds of misbehavior. On port 10200, it would accept connections but never reply. Port 10201 gets a connection and a reply, but the reply will be copied from /dev/random. Port 10202 will open a connection, then drop it immediately, and so on. That way, I don’t need to change modes on the test harness and a single test harness can break many applications. It can even help with functional testing in the development environment by letting multiple developers hit the test harness from their workstations. (Of course, it’s also worthwhile to let the developers run their own instances of the killer test harness.)
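The port-per-misbehavior trick could be sketched roughly as follows. This assumes Python’s standard library; the function names, the `BEHAVIORS` mapping, and the use of `os.urandom` as a stand-in for reading /dev/random are all choices made for this sketch.

```python
import os
import socket
import threading


def never_reply(conn):
    """Read and discard whatever the caller sends; never answer."""
    while conn.recv(4096):
        pass


def garbage_reply(conn):
    """Wait for a request, then answer with random bytes."""
    conn.recv(4096)
    conn.sendall(os.urandom(1024))  # stand-in for copying from /dev/random


def drop_immediately(conn):
    """Do nothing; the connection is closed as soon as this returns."""


# Port numbers follow the text; the mapping itself is an assumption.
BEHAVIORS = {10200: never_reply, 10201: garbage_reply, 10202: drop_immediately}


def serve(behavior, port=0, host="127.0.0.1"):
    """Start one misbehaving listener in the background; return its port."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind((host, port))
    listener.listen()

    def handle(conn):
        with conn:  # closes the socket when the behavior returns
            try:
                behavior(conn)
            except OSError:
                pass  # the caller hung up first; nothing to clean up

    def accept_loop():
        while True:
            conn, _peer = listener.accept()
            threading.Thread(target=handle, args=(conn,), daemon=True).start()

    threading.Thread(target=accept_loop, daemon=True).start()
    return listener.getsockname()[1]


def serve_all(behaviors=BEHAVIORS):
    """Start every listener in the mapping; return the ports in use."""
    return [serve(behavior, port) for port, behavior in behaviors.items()]
```

Calling `serve_all()` from a long-running script would bring up all three ports at once. Adding a new kind of misbehavior then means writing one more small function and giving it a port number.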

Bear in mind that your test harness might be really, really good at breaking, even killing applications. It’s not a bad idea to have the test harness log requests, in case your application dies without so much as a whimper to indicate what killed it.
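One way to get that logging, sketched here under the assumption that each misbehavior is written as a function taking a connected socket (a convention invented for this example), is to wrap the behavior so the start of every request is recorded before anything nasty happens. The `logged` wrapper and the `"harness"` logger name are hypothetical.

```python
import logging
import socket

log = logging.getLogger("harness")  # hypothetical logger name


def logged(behavior, peek_bytes=256, wait_seconds=1.0):
    """Wrap a behavior so the start of each request is logged before it runs."""

    def wrapper(conn):
        try:
            peer = conn.getpeername()
        except OSError:
            peer = "unknown"
        conn.settimeout(wait_seconds)
        try:
            # MSG_PEEK leaves the bytes in the socket buffer, so the wrapped
            # behavior still sees the full request.
            head = conn.recv(peek_bytes, socket.MSG_PEEK)
        except OSError:
            head = b""  # the caller sent nothing in time
        finally:
            conn.settimeout(None)
        log.info("peer=%s request starts with %r", peer, head)
        return behavior(conn)

    return wrapper
```

Because the log line is written before the misbehavior runs, it survives even when the application under test dies partway through the exchange.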

A test harness that injects faults will unearth many hidden dependencies. Injecting latency in requests will uncover many more. Reordering TCP packets will uncover more again. The only limit is your imagination.

The test harness can be designed like an application server; it can have pluggable behavior for the tests that are related to the real application. A single framework for the test harness can be subclassed to implement any application-level protocol, or any perversion of the application-level protocol, necessary. Broadly speaking, a test harness leads toward “chaos engineering,” which we explore in Chapter 17, Chaos Engineering.
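To make the “single framework, subclassed per misbehavior” idea concrete, here is a rough sketch built on Python’s standard `socketserver` module. The class names, the canned responses, and the `start` helper are assumptions for illustration; a real harness would likely need more care around partial reads and shutdown.

```python
import socketserver
import threading
import time


class MisbehavingHandler(socketserver.BaseRequestHandler):
    """Base class: read the request, then hand off to a subclass."""

    def handle(self):
        try:
            self.request.recv(65536)  # read the request and ignore it
            self.misbehave()
        except OSError:
            pass  # the caller gave up first

    def misbehave(self):
        raise NotImplementedError


class HeadersThenSilence(MisbehavingHandler):
    """Send HTTP response headers, then never send the promised body."""

    def misbehave(self):
        self.request.sendall(
            b"HTTP/1.1 200 OK\r\nContent-Type: text/xml\r\n"
            b"Content-Length: 1000\r\n\r\n"
        )
        while self.request.recv(4096):  # hold on until the caller leaves
            pass


class HtmlInsteadOfXml(MisbehavingHandler):
    """Answer with an HTML page where the caller expects XML."""

    def misbehave(self):
        body = b"<html><body>Service Unavailable</body></html>"
        head = (
            b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n"
            b"Content-Length: %d\r\n\r\n" % len(body)
        )
        self.request.sendall(head + body)


class Trickle(MisbehavingHandler):
    """Send a valid response one byte at a time, pausing before each byte."""

    delay = 30.0  # seconds between bytes
    body = b"<ok/>"

    def misbehave(self):
        self.request.sendall(
            b"HTTP/1.1 200 OK\r\nContent-Type: text/xml\r\n"
            b"Content-Length: %d\r\n\r\n" % len(self.body)
        )
        for i in range(len(self.body)):
            time.sleep(self.delay)
            self.request.sendall(self.body[i:i + 1])


def start(handler_class, port=0, host="127.0.0.1"):
    """Serve one misbehavior in a background thread; port=0 picks a free port."""
    server = socketserver.ThreadingTCPServer((host, port), handler_class)
    server.daemon_threads = True
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

For example, `start(HeadersThenSilence).server_address` gives the host and port to point the application under test at. Each new perversion of the protocol is one more small subclass.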

Remember This

Emulate out-of-spec failures.

Calling real applications lets you test only those errors that the real application can deliberately produce. A good test harness lets you simulate all sorts of messy, real-world failure modes.

Stress the caller.

The test harness can produce slow responses, no responses, or garbage responses. Then you can see how your application reacts.

Leverage shared harnesses for common failures.

You don’t necessarily need a separate test harness for each integration point. A “killer” server can listen on several ports, creating different failure modes depending on which port you connect to.

Supplement, don’t replace, other testing methods.

The Test Harness pattern augments other testing methods. It does not replace unit tests, acceptance tests, penetration tests, and so on. Each of those techniques helps verify functional behavior. A test harness helps verify “nonfunctional” behavior while maintaining isolation from the remote systems.
