Create Back Pressure

Every performance problem starts with a queue backing up somewhere. Maybe it’s a socket’s listen queue. Maybe it’s the OS’s run queue or the database’s I/O queue.

If a queue is unbounded, it can consume all available memory. As the queue grows, the time it takes for a piece of work to get all the way through it grows too. (See Little’s law.[16]) So as a queue’s length reaches toward infinity, response time also heads toward infinity. We really don’t want unbounded queues in our systems.
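To make that concrete, Little’s law says L = λW: the average number of items in a system equals the average arrival rate times the average time each item spends inside. Rearranged, W = L / λ. If a server drains 100 requests per second and its queue has backed up to 1,000 items, each new arrival waits about 10 seconds; at 100,000 items, more than 16 minutes. Unbounded queue length means unbounded response time.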

On the other hand, if the queue is bounded, we have to decide what to do when it’s full and a producer tries to stuff one more thing into it. Even if the object is wafer-thin, the queue has no space.

We really have only a few options:

  • Pretend to accept the new item but actually drop it on the floor.

  • Actually accept the new item and drop something else from the queue on the floor.

  • Refuse the item.

  • Block the producer until there is room in the queue.

For some use cases, dropping the new item may be the best choice. For data whose value decreases rapidly with age, dropping the oldest item in the queue may be better.

Blocking the producer is a kind of flow control. It allows the queue to apply “back pressure” upstream. Presumably that back pressure propagates all the way to the ultimate client, who will be throttled down in speed until the queue releases.
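In code, those four options map directly onto operations against a bounded queue. Here is a minimal sketch using Java’s java.util.concurrent; the class around it is illustrative:

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    public class BoundedIntake {
        private final BlockingQueue<String> queue = new ArrayBlockingQueue<>(1000);

        // Option 1: pretend to accept, silently drop if full.
        void acceptAndMaybeDrop(String item) {
            queue.offer(item); // result ignored; the caller believes it worked
        }

        // Option 2: accept the new item, drop the oldest instead.
        void acceptDropOldest(String item) {
            while (!queue.offer(item)) {
                queue.poll(); // evict from the head until there is room
            }
        }

        // Option 3: refuse the item and let the caller decide what to do.
        boolean tryAccept(String item) {
            return queue.offer(item); // false means "queue full"
        }

        // Option 4: block the producer until there is room (back pressure).
        void acceptBlocking(String item) throws InterruptedException {
            queue.put(item); // the caller's thread waits for a free slot
        }
    }

Only the last of these slows the producer down; the first three either lose data or push the decision upstream.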

TCP uses the window field in each packet’s header to create back pressure. Once the receiver’s window is full, senders are not allowed to send anything more until released. Back pressure from the TCP window can cause the sender to fill up its transmit buffers, in which case subsequent calls to write to the socket will block. The mechanisms change, but the idea is still to slow the producer down until the consumer can catch up.
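The same mechanism is visible from application code. A plain blocking socket write in Java, for instance, will park the calling thread once the remote window and the local send buffer are both full. A sketch, where the host and port are placeholders:

    import java.io.IOException;
    import java.io.OutputStream;
    import java.net.Socket;

    public class TcpBackPressureDemo {
        public static void main(String[] args) throws IOException {
            // If the peer stops reading, its receive window closes, our send
            // buffer fills, and write() simply blocks the calling thread:
            // TCP's back pressure surfacing inside our process.
            try (Socket socket = new Socket("storage.example.com", 9000)) {
                OutputStream out = socket.getOutputStream();
                byte[] chunk = new byte[64 * 1024];
                while (true) {
                    out.write(chunk); // blocks once window and buffers are full
                }
            }
        }
    }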

Obviously back pressure can lead to blocked threads. It’s important to distinguish back pressure due to a temporary condition from back pressure because a consumer is just broken. The Back Pressure pattern works best with asynchronous calls and programming. One of the many Rx frameworks can help here, as can actors or channels, if your language supports those.
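For instance, RxJava’s Flowable type carries a back pressure protocol end to end, and its operators let you choose a strategy when a consumer falls behind. A minimal sketch using RxJava 3; the rates and buffer size are illustrative:

    import java.util.concurrent.TimeUnit;
    import io.reactivex.rxjava3.core.Flowable;
    import io.reactivex.rxjava3.schedulers.Schedulers;

    public class RxBackPressureSketch {
        public static void main(String[] args) throws InterruptedException {
            Flowable.interval(1, TimeUnit.MILLISECONDS)             // fast producer
                    .onBackpressureDrop(t -> System.out.println("dropped " + t))
                    .observeOn(Schedulers.computation(), false, 16) // small buffer
                    .subscribe(t -> Thread.sleep(100));             // slow consumer
            Thread.sleep(5_000); // keep the JVM alive while the pipeline runs
        }
    }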

Back pressure helps manage load only when the pool of consumers is finite. When the whole Internet is upstream, the callers are so numerous and diverse that slowing any one of them has no systemic effect on the rest. We can illustrate this with an example. Suppose your system provides an API for user-created “tags” at a specific location. It is used by native apps and web apps.

Internally, there’s a certain rate at which you can create and index new tags. That’s going to be limited by your storage and indexing technology. When the rate of “create tag” calls exceeds the storage engine’s limit, what happens? The calls get slower and slower. Without back pressure, this would lead to a progressive slowdown until the API seems to be offline.

Instead, we can create back pressure by use of a blocking queue for “create tag” calls. Let’s say each API server is allowed 100 simultaneous calls to the storage engine. When the 101st call arrives at the API server, the calling thread blocks until there is an open slot in the queue. That blocking is the back pressure. The API server cannot make calls any faster than it is allowed.
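A counting semaphore is the simplest way to implement that bound. In this sketch, the storage engine interface is a stand-in for your real client:

    import java.util.concurrent.Semaphore;

    // Cap each API server at 100 in-flight calls to the storage engine.
    // The 101st caller blocks on acquire(): that blocking is the back pressure.
    public class BoundedStorageClient {
        private final Semaphore slots = new Semaphore(100);
        private final StorageEngine storage; // stand-in for the real client

        public BoundedStorageClient(StorageEngine storage) {
            this.storage = storage;
        }

        public void createTag(String tag) throws InterruptedException {
            slots.acquire();            // blocks when all 100 slots are busy
            try {
                storage.createTag(tag); // the actual outbound call
            } finally {
                slots.release();        // free the slot even on failure
            }
        }

        interface StorageEngine { void createTag(String tag); }
    }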

In this case, a flat limit of 100 calls per server is very crude. It means that one API server may have blocked threads while another has free slots available. We could make this smarter by letting the API servers make as many calls as they want but put the blocking on the receiver’s end. In that case, our off-the-shelf storage engine must be wrapped with a service to receive calls, measure response times, and adjust its internal queue size to maximize throughput and protect the engine.
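One way to sketch that receiver-side wrapper is a gate that widens or narrows its own limit based on measured response time. The thresholds and step sizes here are purely illustrative; a real implementation would use smoothed averages and hard bounds:

    import java.util.concurrent.Semaphore;

    public class AdaptiveGate {
        // Semaphore.reducePermits is protected, so expose it via a subclass.
        private static class Adjustable extends Semaphore {
            Adjustable(int permits) { super(permits); }
            void reduce(int n) { super.reducePermits(n); }
        }

        private final Adjustable slots = new Adjustable(100);

        public void call(Runnable storageCall) throws InterruptedException {
            slots.acquire();
            long start = System.nanoTime();
            try {
                storageCall.run();
            } finally {
                slots.release();
                adjust((System.nanoTime() - start) / 1_000_000);
            }
        }

        private synchronized void adjust(long millis) {
            if (millis > 200) {
                slots.reduce(1);  // engine is straining: narrow the gate
            } else if (millis < 50) {
                slots.release();  // headroom available: widen the gate
            }
        }
    }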

At some point, though, the API server still has a thread waiting on a call. As we saw in Blocked Threads, blocked threads are a quick path to downtime. At the edge of your system boundary, blocked threads will frustrate a user or provoke a retry loop. As such, back pressure works best within a system boundary. At the edges, you also need load shedding and asynchronous calls.

In our example, the API server should accept calls on one thread pool and then issue the outbound call to storage on another set of threads. That way, when the outbound call blocks, the request-handling thread can time out, unblock, and respond with an HTTP 503. Alternatively, it could drop a “create tag” command in a queue for later indexing. Then an HTTP 202 would be more appropriate.
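One way to arrange this is to hand the outbound call to a second pool and put a timeout on the wait. A sketch returning the status codes from the example above; the timeout and pool size are illustrative:

    import java.util.concurrent.ExecutionException;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.TimeoutException;

    // Request-handling threads never block indefinitely on storage.
    public class CreateTagHandler {
        private final ExecutorService storagePool = Executors.newFixedThreadPool(100);
        private final BoundedStorageClient storage; // from the earlier sketch

        public CreateTagHandler(BoundedStorageClient storage) {
            this.storage = storage;
        }

        // Returns an HTTP status code; a real handler would write the response.
        public int handleCreateTag(String tag) throws InterruptedException {
            Future<?> call = storagePool.submit(() -> {
                storage.createTag(tag);
                return null;
            });
            try {
                call.get(250, TimeUnit.MILLISECONDS); // bounded wait
                return 201;                           // created
            } catch (TimeoutException e) {
                call.cancel(true); // give up and unblock the request thread
                return 503;        // shed the load
            } catch (ExecutionException e) {
                return 500;        // the storage call itself failed
            }
        }
    }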

A consumer inside your system boundary will experience back pressure as a performance problem or as timeouts. In fact, it does indicate a real performance problem: the consumers collectively generated more load than the provider can handle! That doesn’t always mean the provider is to blame, though. It might have enough capacity for “normal” traffic, but one consumer went nuts and started eating Cincinnati. It could be due to an attack of self-denial or just organic changes in traffic patterns.

When Back Pressure kicks in, monitoring needs to know about it. That way you can tell whether it’s a random fluctuation or a trend.
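Even a simple counter, exposed to your metrics system, is enough to tell those apart. A sketch that instruments the semaphore gate from the earlier example:

    import java.util.concurrent.Semaphore;
    import java.util.concurrent.atomic.AtomicLong;

    // Count how often callers actually had to wait, so monitoring can
    // distinguish a random fluctuation from a trend.
    public class InstrumentedSlots {
        private final Semaphore slots = new Semaphore(100);
        private final AtomicLong backPressureEvents = new AtomicLong();

        public void acquire() throws InterruptedException {
            if (!slots.tryAcquire()) {                // fast path: a slot was free
                backPressureEvents.incrementAndGet(); // back pressure engaged
                slots.acquire();                      // now wait for a slot
            }
        }

        public void release() { slots.release(); }

        public long backPressureEvents() { return backPressureEvents.get(); }
    }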

Remember This

Back Pressure creates safety by slowing down consumers.

Consumers will experience slowdowns. The only alternative is to let them crash the provider.

Apply Back Pressure within a system boundary.

Across boundaries, look at load shedding instead. This is especially true when the Internet at large is your user base.

Queues must be finite for response times to be finite.

You only have a few options when a queue is full. All of them are unpleasant: drop data, refuse work, or block. Consumers must be careful not to block forever.
