Slow Responses

As you saw in Socket-Based Protocols, generating a slow response is worse than refusing a connection or returning an error, particularly in the context of middle-layer services.

A quick failure allows the calling system to finish processing the transaction rapidly. Whether that is ultimately a success or a failure depends on the application logic. A slow response, on the other hand, ties up resources in the calling system and the called system.

Slow responses usually result from excessive demand. When all available request handlers are already working, there’s no slack to accept new requests. Slow responses can also happen as a symptom of some underlying problem. Memory leaks often manifest via Slow Responses as the virtual machine works harder and harder to reclaim enough space to process a transaction. This will appear as high CPU utilization, but it is all due to garbage collection, not work on the transactions themselves. I have occasionally seen Slow Responses resulting from network congestion. This is relatively rare inside a LAN but can definitely happen across a WAN, especially if the protocol is too chatty. More frequently, however, I see applications letting their sockets’ send buffers drain and their receive buffers fill up, causing a TCP stall. This usually happens in a hand-rolled, low-level socket protocol, in which the read routine does not loop until the receive buffer is drained.
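To make the last point concrete, here is a minimal Python sketch of a read routine that does loop until a whole message has been pulled out of the receive buffer. The framing (a 4-byte big-endian length prefix followed by the payload) and the names recv_exactly and read_message are assumptions made for illustration; they do not come from any particular protocol.

```python
import socket
import struct


def recv_exactly(sock, size):
    """Loop until exactly `size` bytes have been read from the socket."""
    chunks = []
    remaining = size
    while remaining > 0:
        # recv may return fewer bytes than requested, so keep reading.
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("peer closed the connection mid-message")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def read_message(sock):
    # Assumed framing: 4-byte big-endian length prefix, then the payload.
    (length,) = struct.unpack("!I", recv_exactly(sock, 4))
    return recv_exactly(sock, length)
```

A routine that calls recv once and assumes it got the whole message leaves the remainder sitting in the receive buffer, which is the kind of behavior that can end in a stall.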

Slow responses tend to propagate upward from layer to layer in a gradual form of cascading failure.

You should give your system the ability to monitor its own performance, so it can also tell when it isn’t meeting its service-level agreement. Suppose your system is a service provider that’s required to respond within one hundred milliseconds. When a moving average over the last twenty transactions exceeds one hundred milliseconds, your system could start refusing requests. This could be at the application layer, in which the system would return an error response within the defined protocol. Or it could be at the connection layer, by refusing new socket connections. Of course, any such refusal to provide service must be well documented and expected by the callers. (Since the developers of that system will surely have read this book, they’ll already be prepared for failures, and their system will handle them gracefully.)
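The moving-average check described above could be sketched roughly as follows. This is an illustration under stated assumptions, not a production design: the class name ResponseTimeMonitor and its methods are hypothetical, and the window of twenty samples and the one hundred millisecond limit are simply the figures from the example.

```python
from collections import deque


class ResponseTimeMonitor:
    """Tracks a moving average over the most recent response times."""

    def __init__(self, window=20, limit_ms=100.0):
        self.limit_ms = limit_ms
        # A bounded deque discards the oldest sample automatically.
        self.samples = deque(maxlen=window)

    def record(self, elapsed_ms):
        self.samples.append(elapsed_ms)

    def average_ms(self):
        if not self.samples:
            return 0.0
        return sum(self.samples) / len(self.samples)

    def should_refuse(self):
        # Judge only once the window is full, so that one slow
        # request at start-up does not trigger refusals.
        window_full = len(self.samples) == self.samples.maxlen
        return window_full and self.average_ms() > self.limit_ms
```

A request handler would call record after each transaction and consult should_refuse before accepting the next one. Note that if refused requests are never measured, the average never changes and the system refuses forever, so a real implementation needs some way to recover, such as letting an occasional request through.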

Remember This

Slow Responses trigger Cascading Failures.

Upstream systems experiencing Slow Responses will themselves slow down and might be vulnerable to stability problems when the response times exceed their own timeouts.

For websites, Slow Responses cause more traffic.

Users waiting for pages frequently hit the Reload button, generating even more traffic to your already overloaded system.

Consider Fail Fast.

If your system tracks its own responsiveness, then it can tell when it’s getting slow. Consider sending an immediate error response when the average response time exceeds the system’s allowed time (or at the very least, when the average response time exceeds the caller’s timeout!).

Hunt for memory leaks or resource contention.

Contention for an inadequate supply of database connections produces Slow Responses. Slow Responses also aggravate that contention, leading to a self-reinforcing cycle. Memory leaks cause excessive effort in the garbage collector, resulting in Slow Responses. Inefficient low-level protocols can cause network stalls, also resulting in Slow Responses.
