Decoupling with message queues

Decoupling encapsulated analytics processes with message queues has several advantages. Any process can change without requiring the others to adjust, because there is no direct link between them.

It also builds in some robustness in case one process fails. The queue can keep growing while the failed process restarts, so no data is lost once things get going again.

What is a message queue?

Simple diagram of a message queue

New data comes into a queue as a message, waits in line for delivery, and is delivered to the receiving server when its turn comes. The process adding a message is called the publisher, and the process receiving the message is called the subscriber.
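The publisher and subscriber roles can be sketched in a few lines of Python. This is only an in-process illustration using the standard library's queue.Queue, not a real message broker; the names publisher, subscriber, and the sample readings are made up for the example.

```python
import queue
import threading

def publisher(q, readings):
    """Add each reading to the queue as a message, then signal completion."""
    for reading in readings:
        q.put(reading)
    q.put(None)  # sentinel (an assumption of this sketch): no more messages

def subscriber(q, out):
    """Take messages off the queue in arrival order until the sentinel shows up."""
    while True:
        message = q.get()  # blocks until a message is available
        if message is None:
            break
        out.append(message)

message_queue = queue.Queue()  # unbounded, first in, first out
received = []

# The subscriber runs on its own thread; it never talks to the publisher directly.
worker = threading.Thread(target=subscriber, args=(message_queue, received))
worker.start()
publisher(message_queue, ["reading-1", "reading-2", "reading-3"])
worker.join()

print(received)
```

The two functions share nothing except the queue, which is the decoupling described above: either side could be rewritten without the other noticing.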

The message queue exists regardless of whether the publisher or subscriber is connected and online. This makes it robust against intermittent connections (intentional or unintentional). The subscriber does not have to wait until the publisher is willing to chat and vice versa.

The size of the queue can also grow and shrink as needed. If the subscriber gets behind, the queue just grows to compensate until it can catch up. This can be useful if there is a sudden burst of messages from the publisher. The queue will act as a buffer and expand to capture the messages while the subscriber is working through the sudden influx.

There is a limit, of course. If the queue reaches some set threshold, it will reject (and you will most likely lose) any incoming messages until the queue gets back under control.
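A bounded queue makes the limit easy to see. The sketch below again uses Python's in-process queue.Queue as a stand-in for a real queue service; the maxsize of 3 and the publish helper are assumptions chosen for illustration.

```python
import queue

def publish(q, message):
    """Try to enqueue a message. Return False if the queue is full and it is rejected."""
    try:
        q.put_nowait(message)
        return True
    except queue.Full:
        return False

# A queue that holds at most 3 messages (hypothetical threshold).
bounded = queue.Queue(maxsize=3)

# Five messages arrive before any subscriber has taken one off the queue.
results = [publish(bounded, f"msg-{i}") for i in range(5)]
dropped = results.count(False)

print(results)  # [True, True, True, False, False]
print(dropped)  # 2
```

Unless the publisher keeps its own copy and retries, the rejected messages are gone.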

Here is a contrived but real-world example of how this can happen:

Joe Cut-rate (the developer): Hey, when do you want this doo-hickey device to wake up and report?

Jim Unawares (the engineer): Every 4 hours

Joe Cut-rate: No sweat. I'll program it to start at 12 a.m. UTC, then every 4 hours after. How many of these you gonna sell again?

Jim Unawares: About 20 million.

Joe Cut-rate: Um… friggin awesome! I better hardcode that 12 a.m. UTC then, huh?

4 months later

Jim Unawares: We're only getting data from 10% of the devices. And it is never the same 10%. What the heck?

Angela the analyst: Every device in the world reports at exactly the same time. That's the first thing I checked. The message queues are filling up because our subscribers can't process that fast, and new messages are dropped. If you hardcoded the report time, we're going to have to get the checkbook out to buy a ton of bandwidth for the queues. And we need to do it now, since we are losing 90% of the data every 4 hours. You guys didn't do that, did you?
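The effect Angela describes can be reproduced with a toy simulation. All of the numbers below (1,000 devices, a queue capacity of 100, a subscriber that drains 100 messages per tick) are invented for illustration, and the simulate function is a hypothetical helper, not a model of any particular queue product.

```python
def simulate(arrivals_per_tick, capacity, drain_per_tick):
    """Return (accepted, dropped) for a bounded queue drained at a fixed rate."""
    depth = 0
    accepted = 0
    dropped = 0
    for arriving in arrivals_per_tick:
        room = capacity - depth
        taken = min(arriving, room)        # only what fits is accepted
        accepted += taken
        dropped += arriving - taken        # the rest is rejected
        depth += taken
        depth -= min(depth, drain_per_tick)  # subscriber works through the queue
    return accepted, dropped

DEVICES = 1000
TICKS = 10

# Every device reports in the same tick (the hardcoded 12 a.m. UTC case).
synchronized = [DEVICES] + [0] * (TICKS - 1)
# The same devices spread evenly across the reporting window.
staggered = [DEVICES // TICKS] * TICKS

print(simulate(synchronized, capacity=100, drain_per_tick=100))  # (100, 900)
print(simulate(staggered, capacity=100, drain_per_tick=100))     # (1000, 0)
```

With these made-up numbers, the synchronized case delivers only 10% of the messages, while the same volume spread over the window is delivered in full by the same queue and subscriber.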

Although queues in practice typically operate with little lag, make sure the origination time of the data is tracked and not just the time the data was pulled off the queue. It can be tempting to just capture the time the message was processed to save space, but this can cause problems for your analytics.

Why is this important for analytics? If you only have the date and time the message was received by the subscribing server, it may not be as close as you think to the time the message was generated at the originating device. If there are recurring problems with message queues, the spread in time difference would ebb and flow without you being aware of it.

You will be using time values extensively in predictive modeling. If the time values are sometimes accurate and sometimes off, the models will have a harder time finding predictive value in your data.

Your potential revenue from re-purposing the data can also be affected. Customers are unlikely to pay for a service tracking event times for them if it is not always accurate.

There is a simple solution. Make sure the time the device sends the data is tracked along with the time the data is received. You can monitor delivery times to diagnose issues and keep a close eye on information lag times. For example, if you notice the delivery time steadily increases just before you get a data loss, it is probably the message queue filling up. If there is no change in delivery time before a loss, it is unlikely to be the queue.
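As a rough sketch of this kind of monitoring, the example below stores both timestamps on each message and checks whether the delivery time has been climbing. The field names sent_at and received_at, the window of 3, and the sample lag values are all assumptions for illustration; it also assumes device and server clocks are reasonably well synchronized.

```python
from datetime import datetime, timedelta, timezone

def delivery_lags(messages):
    """Seconds between when each message was sent and when it was received."""
    return [(m["received_at"] - m["sent_at"]).total_seconds() for m in messages]

def lag_is_rising(lags, window=3):
    """True if the last `window` lags are strictly increasing."""
    recent = lags[-window:]
    return len(recent) == window and all(a < b for a, b in zip(recent, recent[1:]))

def make_messages(lag_seconds):
    """Build sample messages sent one minute apart with the given delivery lags."""
    base = datetime(2024, 1, 1, tzinfo=timezone.utc)  # arbitrary start time
    messages = []
    for i, lag in enumerate(lag_seconds):
        sent = base + timedelta(minutes=i)
        messages.append({"sent_at": sent, "received_at": sent + timedelta(seconds=lag)})
    return messages

climbing = delivery_lags(make_messages([1, 1, 2, 5, 12]))
steady = delivery_lags(make_messages([1, 1, 1, 1]))

print(lag_is_rising(climbing))  # True: the queue may be filling up
print(lag_is_rising(steady))    # False: look elsewhere for the cause of a loss
```

A production check would likely smooth the lags or compare against a baseline rather than require strictly increasing values, but the idea is the same: none of this is possible if only the processing time is stored.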

Another benefit of the cloud is that a managed queue service offers (virtually) unlimited queue sizes. This makes the situation described above much less likely to occur.
