Reliability

Applications may crash unexpectedly, stop responding, leak memory, or run slowly because of a bug. Beyond the problems an application itself may have, we may also experience hardware failures or network problems. We need to be sure that messages arrive at their destination no matter what problems our infrastructure experiences. Reliability means that every event is guaranteed to arrive at its destination.

Most message queue implementations rely on a broker for reliability: messages are queued at the broker and then delivered to their destinations. In ZeroMQ, by contrast, applications communicate directly with each other, and messages are resent if they are lost for some reason.

With the request-reply pattern, it is easy to figure out whether the server or the client has stopped responding. If either side stops receiving messages from the other, there is a problem.

If you recall from Chapter 2, Introduction to Sockets, we said that the publisher does not know whether a subscriber is connected or not. This also means that if a subscriber starts experiencing problems, the publisher will not know about it and the subscriber will miss messages the publisher has been transmitting.

When it comes to reliability in the publish-subscribe pattern, we need bidirectional communication between the publisher and the subscribers. However, the publish-subscribe pattern does not support bidirectional communication in ZeroMQ, so the alternative is to use the dealer-router pattern.

Achieving reliability in the request-reply pattern is easier than in the publish-subscribe pattern. We can simply resend the message if we have not received a reply yet. If we still do not get a reply after a number of attempts, we can give up on the request.
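The retry logic described above can be sketched independently of any socket code. The following is a minimal illustration, not a complete client: `request_with_retries`, `try_request_fn`, and the `flaky_peer` simulation are hypothetical names introduced here, and the callback stands in for whatever sends the request and waits for the reply.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback: send the request once and wait up to timeout_ms
   for a reply. Returns true if a reply arrived. In a ZeroMQ client this
   could wrap zmq_msg_send() followed by zmq_poll() on the socket. */
typedef bool (*try_request_fn)(void *state, int timeout_ms);

/* Tries the request up to max_attempts times.
   Returns the 1-based attempt that got a reply, or 0 if every attempt failed. */
int request_with_retries(try_request_fn try_request, void *state,
                         int max_attempts, int timeout_ms) {
  assert(try_request != NULL);
  assert(max_attempts > 0);

  for (int attempt = 1; attempt <= max_attempts; attempt++) {
    if (try_request(state, timeout_ms)) {
      return attempt;   /* got a reply, stop retrying */
    }
  }
  return 0;             /* no reply after all attempts: give up */
}

/* Simulated peer used only for demonstration: it "loses" the first
   failures_left requests and answers every request after that. */
typedef struct {
  int failures_left;
} flaky_peer;

bool flaky_try(void *state, int timeout_ms) {
  (void)timeout_ms;     /* the simulation does not actually wait */
  flaky_peer *peer = state;
  if (peer->failures_left > 0) {
    peer->failures_left--;
    return false;
  }
  return true;
}
```

With a real REQ socket, the callback would typically also have to close and reopen the socket after a timeout, because a REQ socket cannot send a second request until it has received a reply to the first.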

Heartbeating is a layer that can be used to detect whether a worker is alive or has died. Heartbeats travel asynchronously between peers, which is why heartbeating should not be used with the request-reply pattern.
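To make the idea concrete, here is a minimal sketch of the bookkeeping one side could keep for a peer. It assumes that heartbeats are expected at a fixed interval and that a peer is declared dead after a chosen number of silent intervals; the `peer_monitor` type and its functions are hypothetical names, and the caller supplies the current time in milliseconds.

```c
#include <assert.h>
#include <stdbool.h>

/* Liveness state for one peer (hypothetical structure for illustration). */
typedef struct {
  long interval_ms;   /* expected gap between heartbeats */
  int  max_missed;    /* silent intervals tolerated before giving up */
  long last_seen_ms;  /* time the peer was last heard from */
} peer_monitor;

void peer_monitor_init(peer_monitor *m, long interval_ms, int max_missed,
                       long now_ms) {
  assert(m != 0);
  assert(interval_ms > 0 && max_missed > 0);
  m->interval_ms = interval_ms;
  m->max_missed = max_missed;
  m->last_seen_ms = now_ms;   /* give the peer a full grace period */
}

/* Call whenever a heartbeat (or any other message) arrives from the peer. */
void peer_monitor_heard(peer_monitor *m, long now_ms) {
  m->last_seen_ms = now_ms;
}

/* The peer counts as alive while the silence is no longer than
   max_missed heartbeat intervals. */
bool peer_monitor_alive(const peer_monitor *m, long now_ms) {
  return now_ms - m->last_seen_ms <= m->interval_ms * m->max_missed;
}
```

Treating any incoming message as a sign of life, not only explicit heartbeats, keeps a busy peer from being declared dead just because its heartbeats are delayed behind real traffic.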

If only a limited number of subscribers are connected to the publisher, TCP is fine. With a massive number of subscribers, it is a better idea to use PGM (Pragmatic General Multicast).
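In ZeroMQ the transport is selected by the endpoint string passed to zmq_bind() or zmq_connect(), so switching from TCP to PGM is mostly a matter of building a different endpoint. The sketch below only formats such a string; `make_pub_endpoint` is a hypothetical helper, and the interface name, multicast group, and port in the usage note are placeholder values. It also assumes a libzmq build with PGM support.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Writes a publisher endpoint into buf.
   use_multicast == 0: "tcp://*:<port>"
   use_multicast != 0: "pgm://<iface>;<group>:<port>"
   (the "epgm://" transport, PGM encapsulated in UDP, uses the same layout).
   Returns the number of characters written, or -1 if buf is too small. */
int make_pub_endpoint(char *buf, size_t size, int use_multicast,
                      const char *iface, const char *group, int port) {
  assert(buf != NULL && size > 0);
  assert(!use_multicast || (iface != NULL && group != NULL));

  int n = use_multicast
    ? snprintf(buf, size, "pgm://%s;%s:%d", iface, group, port)
    : snprintf(buf, size, "tcp://*:%d", port);

  return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```

For example, make_pub_endpoint(ep, sizeof ep, 1, "eth0", "239.192.1.1", 5555) would fill ep with pgm://eth0;239.192.1.1:5555, which could then be passed to zmq_bind() on the publisher.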

Slow subscribers in a publish-subscribe pattern

A serious issue with the publish-subscribe pattern is slow subscribers. In a flawless environment, the publisher would send messages to the subscribers at full speed, but that is a utopia. In reality, subscribers often cannot keep up with the publisher, because they are poorly implemented, have network issues, or are held back for some other reason.

Let's consider the following example, in which the subscriber runs slower than the publisher and aborts itself once it has fallen too far behind. First, let's look at the server code:

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/time.h>
#include <time.h>
#include "zmq.h"

int main (int argc, char const *argv[]) {
  void* context = zmq_ctx_new();
  void* publisher = zmq_socket(context, ZMQ_PUB);
  printf("Starting Server...\n");

  zmq_bind(publisher, "tcp://*:4040");

  for(;;) {
    /* Seconds since midnight (UTC): at most five digits, so str is large enough */
    unsigned long current_time = (unsigned long)(time(NULL) % 86400);
    char str[11];
    snprintf(str, sizeof str, "%lu", current_time);

    /* Send the timestamp as the message body, without the terminating NUL */
    size_t s_len = strlen(str);
    zmq_msg_t message;
    zmq_msg_init_size(&message, s_len);
    memcpy(zmq_msg_data(&message), str, s_len);
    zmq_msg_send(&message, publisher, 0);
    zmq_msg_close(&message);
    sleep(1);
  }
  /* Not reached: the loop above runs until the process is killed */
  zmq_close(publisher);
  zmq_ctx_destroy(context);

  return 0;
}

And here is the subscriber, which runs slower than the publisher:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/time.h>
#include <time.h>
#include "zmq.h"

#define DELAY 4

int main (int argc, char const *argv[]) {

  void* context = zmq_ctx_new();
  void* subscriber = zmq_socket(context, ZMQ_SUB);

  printf("Getting data...\n");

  zmq_connect(subscriber, "tcp://localhost:4040");
  /* An empty filter subscribes to every message */
  zmq_setsockopt(subscriber, ZMQ_SUBSCRIBE, "", 0);

  int i;

  for(i = 0; i < 10; i++) {
    zmq_msg_t reply;
    zmq_msg_init(&reply);
    zmq_msg_recv(&reply, subscriber, 0);
    size_t length = zmq_msg_size(&reply);

    /* Copy the payload and NUL-terminate it so that sscanf can parse it */
    char* value = malloc(length + 1);
    memcpy(value, zmq_msg_data(&reply), length);
    value[length] = '\0';
    zmq_msg_close(&reply);

    unsigned long t_timer = 0;
    sscanf(value, "%lu", &t_timer);
    free(value);

    /* Local time at which the message is processed, on the same
       seconds-since-midnight scale as the publisher's timestamp */
    long current_time = (long)(time(NULL) % 86400);
    long res = labs(current_time - (long)t_timer);

    if(res > DELAY) {
      printf("Subscriber is too slow. Aborting.\n");
      break;
    }
    /* Simulate slow processing: the publisher sends once per second */
    sleep(3);
  }

  zmq_close(subscriber);
  zmq_ctx_destroy(context);

  return 0;
}

We have defined a DELAY constant in our subscriber code. For every message, we compare the timestamp that the server put in the message with the local time of the subscriber. If the difference between these values is larger than the DELAY constant, the subscriber has fallen too far behind, so we abort it. This is known as the Suicidal Snail pattern in ZeroMQ terminology.
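The comparison at the heart of this pattern can be isolated into a small function, which also makes it easy to handle a detail the example glosses over: the timestamps are seconds since midnight, so they wrap around once a day. The sketch below is one possible way to do this, assuming that publisher and subscriber clocks agree; `lag_seconds` and `too_slow` are hypothetical names, not part of ZeroMQ.

```c
#include <assert.h>
#include <stdbool.h>

#define SECONDS_PER_DAY 86400L

/* Distance in seconds between two seconds-since-midnight timestamps,
   measured around the 24-hour clock so that a message sent just before
   midnight and read just after it is not treated as a day old. */
long lag_seconds(long sent, long now) {
  assert(sent >= 0 && sent < SECONDS_PER_DAY);
  assert(now >= 0 && now < SECONDS_PER_DAY);

  long d = now - sent;
  if (d < 0) {
    d += SECONDS_PER_DAY;   /* wrapped past midnight, or slight clock skew */
  }
  /* Take the shorter way around the clock */
  return d < SECONDS_PER_DAY - d ? d : SECONDS_PER_DAY - d;
}

/* True when the subscriber is further behind than max_delay seconds,
   mirroring the "res > DELAY" test in the subscriber. */
bool too_slow(long sent, long now, long max_delay) {
  return lag_seconds(sent, now) > max_delay;
}
```

In the subscriber, the check would then read too_slow((long)t_timer, current_time, DELAY). Because the distance is symmetric, a message that appears to come from slightly in the future is treated the same way as one that is slightly late, just as abs() does in the original code.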
