How a queue works

The Simple Queue Service (SQS) is a highly available message queue service that operates over a standard HTTP access model. It can provide effectively unlimited message capacity for an application of any size and is priced per request, so the cost of running the queue scales automatically with the size of your application.

A queue is essentially a message repository that stores messages on a distributed cluster of servers. Once a message is stored, it can be made visible for consumers to read, or it can be made invisible, meaning it is stored but not yet ready to be read. When messages are produced, they are stored on the cluster with a randomized distribution, as shown in this diagram:

Here, we see the messages A, B, C, D, and E, which were produced in that sequence, being randomly distributed across a set of hosts in the cluster. Not all messages are present on all hosts, but every message is present on multiple hosts for high availability and durability.
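To make the producer side concrete, here is a minimal sketch using the AWS SDK for Python (boto3). The queue URL and region are hypothetical placeholders; the sketch simply sends the messages A through E and lets SQS store each one redundantly across hosts.

import boto3

# Hypothetical queue URL and region, used only for illustration.
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/example-queue"
sqs = boto3.client("sqs", region_name="us-east-1")

# Produce messages A through E; SQS stores each message redundantly
# across multiple hosts in the cluster.
for body in ["A", "B", "C", "D", "E"]:
    sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=body)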

Messages are consumed by polling the cluster. With short polling, which is the default for SQS, we call the service, sample a few nodes in the cluster, and retrieve whatever messages those nodes hold. For example, a short polling request might sample just a small subset of hosts, each serving one message, so the reader would retrieve the messages B, A, and D from the example diagram before closing the connection to the server.
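As a rough sketch of short polling with boto3, again assuming a placeholder queue URL, setting WaitTimeSeconds to 0 makes the request return immediately with whatever the sampled hosts hold:

import boto3

QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/example-queue"
sqs = boto3.client("sqs", region_name="us-east-1")

# Short polling: WaitTimeSeconds=0 returns immediately after sampling a
# subset of hosts, so some queued messages may not appear in the response.
response = sqs.receive_message(
    QueueUrl=QUEUE_URL,
    MaxNumberOfMessages=10,
    WaitTimeSeconds=0,
)
for message in response.get("Messages", []):
    print(message["Body"])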

If we want to read the maximum possible number of messages, we can use long polling in our read requests. Long polling lets a request wait for up to 20 seconds and samples all the servers in the cluster, and it allows us to read a batch of up to 10 messages at a time. In our example, the servers would serve perhaps B, A, D, E, and lastly C as the messages from the queue.
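A long polling read differs from the previous sketch only in the WaitTimeSeconds parameter. The sketch below, under the same placeholder assumptions, waits up to 20 seconds and asks for a batch of up to 10 messages:

import boto3

QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/example-queue"
sqs = boto3.client("sqs", region_name="us-east-1")

# Long polling: wait up to 20 seconds for messages and return a batch
# of up to 10 of them in a single response.
response = sqs.receive_message(
    QueueUrl=QUEUE_URL,
    MaxNumberOfMessages=10,
    WaitTimeSeconds=20,
)
for message in response.get("Messages", []):
    print(message["Body"])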

Long polling also helps reduce the number of requests sent to the queue when it is empty. Remember that we pay for every request against the queue service, even if the response is empty. So when responses come back empty, we should perform the next read with exponential back-off so that we aren't reading an empty queue at a constant rate. For example, if the response is empty, we retry in 2 seconds instead of 1; if the second attempt is also empty, we retry in 4 seconds, then 8, then 16, and so on.
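One way to combine long polling with exponential back-off is sketched below. The queue URL is again a placeholder, and the 64-second cap on the back-off is an arbitrary choice for illustration:

import time
import boto3

QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/example-queue"
sqs = boto3.client("sqs", region_name="us-east-1")

delay = 1
while True:
    response = sqs.receive_message(
        QueueUrl=QUEUE_URL,
        MaxNumberOfMessages=10,
        WaitTimeSeconds=20,
    )
    messages = response.get("Messages", [])
    if messages:
        for message in messages:
            print(message["Body"])
        delay = 1  # messages arrived, so reset the back-off
    else:
        # Empty response: double the wait (2s, 4s, 8s, ...) up to a 64-second cap.
        delay = min(delay * 2, 64)
        time.sleep(delay)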
