Cloning and load balancing

Traditional, multithreaded web servers are usually scaled only when the resources assigned to a machine cannot be upgraded any more or when doing so would cost more than simply launching another machine. By using multiple threads, traditional web servers can take advantage of all the processing power of a server, using all the available processors and memory. A single Node.js process, however, cannot do the same: it is single-threaded and has, by default, a memory limit of 1.7 GB on 64-bit machines (which can be raised only with the special --max_old_space_size command-line option). This means that Node.js applications are usually scaled much sooner than traditional web servers, even in the context of a single machine, in order to take advantage of all its resources.

Note

In Node.js, vertical scaling (adding more resources to a single machine) and horizontal scaling (adding more machines to the infrastructure) are almost equivalent concepts; both in fact involve similar techniques to leverage all the available processing power.

Don't be fooled into thinking about this as a disadvantage. On the contrary, being almost forced to scale has beneficial effects on other attributes of an application, in particular availability and fault tolerance. In fact, scaling a Node.js application by cloning is relatively simple, and it's often implemented even when there is no need to harvest more resources, just for the purpose of having a redundant, fault-tolerant setup.

This also pushes the developer to take scalability into account from the early stages of an application, making sure that it does not rely on any resource that cannot be shared across multiple processes or machines. In fact, an absolute prerequisite to scaling an application is that each instance does not store common information on resources that cannot be shared, usually hardware such as memory or disk. For example, in a web server, storing the session data in memory or on disk is a practice that does not work well with scaling; using a shared database instead ensures that each instance has access to the same session information, wherever it is deployed.

Let's now introduce the most basic mechanism for scaling Node.js applications: the cluster module.

The cluster module

In Node.js, the simplest pattern to distribute the load of an application across different instances running on a single machine is by using the cluster module, which is part of the core libraries. The cluster module simplifies the forking of new instances of the same application and automatically distributes incoming connections across them, as shown in the following figure:

The cluster module

The Master process is responsible for spawning a number of processes (workers), each representing an instance of the application we want to scale. Each incoming connection is then distributed across the cloned workers, spreading the load across them.

Notes on the behavior of the cluster module

In Node.js 0.8 and 0.10, the cluster module shares the same server socket across the workers and leaves to the operating system the job of load balancing incoming connections across the available workers. However, there is a problem with this approach: the algorithms used by the operating system to distribute the load across the workers are not meant to load balance network requests, but rather to schedule the execution of processes. As a result, the distribution is not always uniform across all the instances; often, a fraction of the workers receives most of the load. This type of behavior can make sense for the operating system scheduler because it focuses on minimizing the context switches between different processes. The short story is that the cluster module does not work at its full potential in Node.js <= 0.10.

However, the situation changes starting from version 0.11.2, where an explicit round robin load-balancing algorithm is included inside the master process, which makes sure the requests are evenly distributed across all the workers. The new load-balancing algorithm is enabled by default on all platforms except Windows, and it can be globally modified by setting the variable cluster.schedulingPolicy, using the constants cluster.SCHED_RR (round robin) or cluster.SCHED_NONE (handled by the operating system).
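As a quick illustration, this is how the policy could be set programmatically; it has to happen before any worker is forked (a minimal sketch, not part of the book's example code). The same setting can also be controlled through the NODE_CLUSTER_SCHED_POLICY environment variable, using the values rr or none:

const cluster = require('cluster');

// The policy must be set before the first call to cluster.fork()
cluster.schedulingPolicy = cluster.SCHED_RR;      // explicit round robin
// cluster.schedulingPolicy = cluster.SCHED_NONE; // leave the distribution to the OS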

Note

The round robin algorithm distributes the load evenly across the available servers on a rotational basis. The first request is forwarded to the first server, the second to the next server in the list, and so on. When the end of the list is reached, the iteration starts again from the beginning. This is one of the simplest and most used load balancing algorithms; however, it's not the only one. More sophisticated algorithms allow assigning priorities, selecting the least loaded server or the one with the fastest response time.

You can find more details about the evolution of the cluster module in these two Node.js issues:

https://github.com/nodejs/node-v0.x-archive/issues/3241

https://github.com/nodejs/node-v0.x-archive/issues/4435

Building a simple HTTP server

Let's now start working on an example. Let's build a small HTTP server, cloned and load-balanced using the cluster module. First of all, we need an application to scale; for this example we don't need too much, just a very basic HTTP server.

Let's create a file called app.js containing the following code:

const http = require('http'); 
const pid = process.pid; 
 
http.createServer((req, res) => { 
  for (let i = 1e7; i > 0; i--) {} 
  console.log(`Handling request from ${pid}`); 
  res.end(`Hello from ${pid}\n`); 
}).listen(8080, () => { 
  console.log(`Started ${pid}`); 
}); 

The HTTP server we just built responds to any request by sending back a message containing its PID; this will be useful to identify which instance of the application is handling the request. Also, to simulate some actual CPU work, we perform an empty loop 10 million times; without this, the server load would be almost nothing considering the small scale of the tests we are going to run for this example.

Tip

The app module we want to scale can be anything and can also be implemented using a web framework, for example, Express.

We can now check if all works as expected by running the application as usual and sending a request to http://localhost:8080 using either a browser or curl.
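For example, after starting the server with node app, a quick check from another terminal could be:

curl http://localhost:8080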

We can also try to measure the requests per second that the server is able to handle using only one process; for this purpose, we can use a network benchmarking tool such as siege (http://www.joedog.org/siege-home) or Apache ab (http://httpd.apache.org/docs/2.4/programs/ab.html):

siege -c200 -t10S http://localhost:8080

With ab, the command line would be very similar:

ab -c200 -t10 http://localhost:8080/

The preceding commands will load the server with 200 concurrent connections for 10 seconds. As a reference, the result for a system with 4 processors is in the order of 90 transactions per second, with an average CPU utilization of only 20%.

Note

Please remember that the load tests we will perform in this chapter are intentionally simple and minimal and are provided only for reference and learning purposes. Their results cannot provide a 100% accurate evaluation of the performance of the various techniques we are analyzing.

Scaling with the cluster module

Let's now try to scale our application using the cluster module. Let's create a new module called clusteredApp.js:

const cluster = require('cluster'); 
const os = require('os'); 
 
if(cluster.isMaster) { 
  const cpus = os.cpus().length; 
  console.log(`Clustering to ${cpus} CPUs`); 
  for (let i = 0; i < cpus; i++) {    // [1] 
    cluster.fork(); 
  } 
} else { 
  require('./app');                   // [2] 
} 

As we can see, using the cluster module requires very little effort. Let's analyze what is happening:

  • When we launch clusteredApp from the command line, we are actually executing the master process. The cluster.isMaster variable is set to true and the only work we are required to do is forking the current process using cluster.fork(). In the preceding example, we are starting as many workers as the number of CPUs in the system to take advantage of all the available processing power.
  • When cluster.fork() is executed from the master process, the current main module (clusteredApp) is run again, but this time in worker mode (cluster.isWorker is set to true, while cluster.isMaster is false). When the application runs as a worker, it can start doing some actual work. In our example, we load the app module, which actually starts a new HTTP server.

Note

It's important to remember that each worker is a different Node.js process with its own event loop, memory space, and loaded modules.
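A minimal sketch to make this concrete (a hypothetical variant of app.js, not part of the example code): a module-level counter incremented on every request would hold a separate value in each worker, so different PIDs report totals that are independent of each other.

const http = require('http');
let requestCount = 0;               // lives in the memory of a single worker

http.createServer((req, res) => {
  requestCount++;                   // each worker keeps its own independent count
  res.end(`Worker ${process.pid} served ${requestCount} requests\n`);
}).listen(8080);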

It's interesting to notice that the usage of the cluster module is based on a recurring pattern, which makes it very easy to run multiple instances of an application:

if(cluster.isMaster) { 
  // fork() 
} else { 
  //do work 
} 

Tip

Under the hood, the cluster module uses the child_process.fork() API (we already met this API in Chapter 9, Advanced Asynchronous Recipes); therefore, we also have a communication channel available between the master and the workers. The instances of the workers can be accessed from the variable cluster.workers, so broadcasting a message to all of them would be as easy as running the following lines of code:

Object.keys(cluster.workers).forEach(id => {
  cluster.workers[id].send('Hello from the master');
});
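On the receiving side, each worker can listen for these messages on its own process object; a minimal sketch, assuming it is added to the worker code (for example, in app.js):

process.on('message', message => {
  console.log(`Worker ${process.pid} received: ${message}`);
});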

Now, let's try to run our HTTP server in cluster mode. We can do that by starting the clusteredApp module as usual:

node clusteredApp

If our machine has more than one processor, we should see a number of workers being started by the master process, one after the other. For example, in a system with four processors, the terminal should look like this:

Started 14107
Started 14099
Started 14102
Started 14101

If we now try to hit our server again using the URL http://localhost:8080, we should notice that each request will return a message with a different PID, which means that these requests have been handled by different workers, confirming that the load is being distributed among them.

Now we can try to load test our server again:

siege -c200 -t10S http://localhost:8080

This way, we should be able to discover the performance increase obtained by scaling our application across multiple processes. As a reference, by using Node.js 6 in a Linux system with 4 processors, the performance increase should be around 3x (270 trans/sec versus 90 trans/sec) with an average CPU load of 90%.

Resiliency and availability with the cluster module

As we already mentioned, scaling an application also brings other advantages, in particular the ability to maintain a certain level of service even in the presence of malfunctions or crashes. This property is also known as resiliency and it contributes towards the availability of a system.

By starting multiple instances of the same application, we are creating a redundant system, which means that if one instance goes down for whatever reason, we still have other instances ready to serve requests. This pattern is pretty straightforward to implement using the cluster module. Let's see how it works!

Let's take the code from the previous section as the starting point. In particular, let's modify the app.js module so that it crashes after a random interval of time:

// ... 
// At the end of app.js 
setTimeout(() => { 
  throw new Error('Ooops'); 
}, Math.ceil(Math.random() * 3) * 1000); 

With this change in place, our server exits with an error after a random number of seconds between 1 and 3. In a real-life situation, this would cause our application to stop working and, of course, to stop serving requests, unless we use some external tool to monitor its status and restart it automatically. However, if we only have one instance, there may be a non-negligible delay between restarts caused by the startup time of the application. This means that during those restarts, the application is not available. Having multiple instances instead will make sure we always have a backup system to serve an incoming request, even when one of the workers fails.

With the cluster module, all we have to do is spawn a new worker as soon as we detect that one is terminated with an error code. Let's then modify the clusteredApp.js module to take this into account:

if(cluster.isMaster) { 
  // ... 
 
  cluster.on('exit', (worker, code) => { 
    if (code !== 0 && !worker.exitedAfterDisconnect) { 
      console.log('Worker crashed. Starting a new worker'); 
      cluster.fork(); 
    } 
  }); 
} else { 
  require('./app'); 
} 

In the preceding code, as soon as the master process receives an 'exit' event, we check whether the process is terminated intentionally or as the result of an error; we do this by checking the status code and the flag worker.exitedAfterDisconnect, which indicates whether the worker was terminated explicitly by the master. If we confirm that the process was terminated because of an error, we start a new worker. It's interesting to notice that while the crashed worker restarts, the other workers can still serve requests, thus not affecting the availability of the application.

To test this assumption, we can try to stress our server again using siege. When the stress test completes, we notice that among the various metrics produced by siege, there is also an indicator that measures the availability of the application. The expected result would be something similar to this:

Transactions:           3027 hits
Availability:           99.31 %
[...]
Failed transactions:         21

Bear in mind that this result can vary a lot; it greatly depends on the number of running instances and how many times they crash during the test, but it should give a good indicator of how our solution works. The preceding numbers tell us that despite the fact that our application is constantly crashing, we only had 21 failed requests over 3,027 hits. In the example scenario we built, most of the failing requests will be caused by the interruption of already established connections during a crash. In fact, when this happens, siege will print an error like the following:

[error] socket: read error Connection reset by peer sock.c:479: Connection reset by peer

Unfortunately, there is very little we can do to prevent these types of failures, especially when the application terminates because of a crash. Nonetheless, our solution proves to be working and its availability is not bad at all for an application that crashes so often!

Zero-downtime restart

A Node.js application might also need to be restarted when its code needs to be updated. So, also in this scenario, having multiple instances can help maintain the availability of our application.

When we have to intentionally restart an application to update it, there is a small window in which the application restarts and is unable to serve requests. This can be acceptable if we are updating our personal blog, but it's not even an option for a professional application with a Service Level Agreement (SLA) or one that is updated very often as part of a continuous delivery process. The solution is to implement a zero-downtime restart where the code of an application is updated without affecting its availability.

With the cluster module, this is again a pretty easy task; the pattern consists of restarting the workers one at a time. This way, the remaining workers can continue to operate and keep the services of the application available.

Let's then add this new feature to our clustered server; all we have to do is add some new code to be executed by the master process (the clusteredApp.js file):

if (cluster.isMaster) {
  // ...
 
  process.on('SIGUSR2', () => { //[1]
    console.log('Restarting workers');
    const workers = Object.keys(cluster.workers);

    function restartWorker(i) { //[2]
      if (i >= workers.length) return;
      const worker = cluster.workers[workers[i]];
      console.log(`Stopping worker: ${worker.process.pid}`);
      worker.disconnect(); //[3]

      worker.on('exit', () => {
        if (!worker.exitedAfterDisconnect) return;
        const newWorker = cluster.fork(); //[4]
        newWorker.on('listening', () => {
          restartWorker(i + 1); //[5]
        });
      });
    }
    restartWorker(0);
  });
} else {
  require('./app');
}
 

This is how the preceding block of code works:

  1. The restarting of the workers is triggered on receiving the SIGUSR2 signal.
  2. We define an iterator function called restartWorker(). This implements an asynchronous sequential iteration pattern over the items of the cluster.workers object.
  3. The first task of the restartWorker() function is stopping a worker gracefully by invoking worker.disconnect().
  4. When the terminated process exits, we can spawn a new worker.
  5. Only when the new worker is ready and listening for new connections can we proceed with restarting the next worker by invoking the next step of the iteration.

Note

As our program makes use of UNIX signals, it will not work properly on Windows systems (unless you are using the Windows Subsystem for Linux in Windows 10). Signals are the simplest mechanism to implement our solution; however, they are not the only one. Other approaches include listening for a command coming from a socket, a pipe, or the standard input.
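As a minimal sketch of one of these alternatives (not part of the book's code), the same restart sequence could be triggered by a command typed on the standard input, which also works on Windows; here, restartWorkers() is a hypothetical function wrapping the logic of the SIGUSR2 handler shown above:

process.stdin.resume();
process.stdin.setEncoding('utf8');
process.stdin.on('data', chunk => {
  if (chunk.trim() === 'restart') {    // type "restart" followed by Enter
    restartWorkers();                  // hypothetical wrapper around the restart logic
  }
});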

Now we can test our zero-downtime restart by running the clusteredApp module and then sending a SIGUSR2 signal. However, first we need to obtain the PID of the master process; the following command can be useful to identify it from the list of all the running processes:

ps af

The master process should be the parent of a set of node processes. Once we have the PID we are looking for, we can send the signal to it:

kill -SIGUSR2 <PID>

Now the output of the clusteredApp application should display something like this:

Restarting workers
Stopping worker: 19389
Started 19407
Stopping worker: 19390
Started 19409

We can try to use siege again to verify that we don't have any considerable impact on the availability of our application during the restart of the workers.

Tip

pm2 (https://github.com/Unitech/pm2) is a small utility, based on cluster, which offers load balancing, process monitoring, zero-downtime restarts, and other goodies.

Dealing with stateful communications

The cluster module does not work well with stateful communications where the state maintained by the application is not shared between the various instances. This is because different requests belonging to the same stateful session may potentially be handled by a different instance of the application. This is not a problem limited only to the cluster module, but in general it applies to any kind of stateless, load balancing algorithm. Consider, for example, the situation described by the following figure:

Dealing with stateful communications

The user John initially sends a request to our application to authenticate himself, but the result of the operation is registered locally (for example, in memory), so only the instance of the application that receives the authentication request (Instance A) knows that John is successfully authenticated. When John sends a new request, the load balancer might forward it to a different instance of the application, which actually doesn't possess the authentication details of John, hence refusing to perform the operation. The application we just described cannot be scaled as it is, but luckily, there are two easy solutions we can apply to solve the problem.

Sharing the state across multiple instances

The first option we have to scale an application using stateful communications is sharing the state across all the instances. This can be easily achieved with a shared datastore, as, for example, a database such as PostgreSQL (http://www.postgresql.org), MongoDB (http://www.mongodb.org), or CouchDB (http://couchdb.apache.org), or even better, we can use an in-memory store such as Redis (http://redis.io) or Memcached (http://memcached.org).

The following diagram outlines this simple and effective solution:

Sharing the state across multiple instances
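As a minimal sketch of the idea (assuming the classic callback API of the redis package, and not part of the book's example code), the session data could be saved and loaded as follows, so that any instance can access it regardless of which one handled the login:

const redis = require('redis');
const client = redis.createClient();          // connects to localhost:6379 by default

// Save the session data under a key shared by all the instances
function saveSession(sessionId, data, callback) {
  client.set(`session:${sessionId}`, JSON.stringify(data), callback);
}

// Load the session data from any instance handling a subsequent request
function loadSession(sessionId, callback) {
  client.get(`session:${sessionId}`, (err, raw) => {
    if (err) return callback(err);
    callback(null, raw ? JSON.parse(raw) : null);
  });
}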

The only drawback of using a shared store for the communication state is that it's not always possible, for example, we might be using an existing library that keeps the communication state in memory; anyway, if we have an existing application, applying this solution requires a change in the code of the application (if it's not already supported). As we will see next, there is a less invasive solution.

Sticky load balancing

The other alternative we have to support stateful communications is having the load balancer always route all the requests associated with a session to the same instance of the application. This technique is also called sticky load balancing.

The following figure illustrates a simplified scenario involving this technique:

Sticky load balancing

As we can see from the preceding figure, when the load balancer receives a request associated with a new session, it creates a mapping with one particular instance selected by the load-balancing algorithm. The next time the load balancer receives a request from that same session, it bypasses the load-balancing algorithm, selecting the application instance that was previously associated with the session. The particular technique we just described involves the inspection of the session ID associated with the requests (usually included in a cookie by the application or the load balancer itself).

A simpler alternative to associate a stateful connection to a single server is by using the IP address of the client performing the request. Usually, the IP is provided to a hash function that generates an ID representing the application instance designated to receive the request. This technique has the advantage of not requiring the association to be remembered by the load balancer. However, it doesn't work well with devices that frequently change IP as, for example, when roaming on different networks.
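A minimal sketch of this technique (with a hardcoded list of instances, purely for illustration): hashing the client IP and taking the result modulo the number of instances always maps the same address to the same server:

const crypto = require('crypto');
const instances = ['127.0.0.1:8081', '127.0.0.1:8082', '127.0.0.1:8083'];

function selectInstance(clientIp) {
  const hash = crypto.createHash('md5').update(clientIp).digest('hex');
  const index = parseInt(hash.substring(0, 8), 16) % instances.length;
  return instances[index];          // the same IP always maps to the same instance
}

console.log(selectInstance('203.0.113.7'));   // always prints the same entry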

Tip

Sticky load balancing is not supported by default by the cluster module; however, it can be added with an npm library called sticky-session (https://www.npmjs.org/package/sticky-session).

One big problem with sticky load balancing is the fact that it nullifies most of the advantages of having a redundant system, where all the instances of the application are the same and where an instance can eventually replace another one that stopped working. For these reasons, the recommendation is to avoid sticky load balancing whenever possible and to prefer applications that maintain their session state in a shared store, or that don't require stateful communications at all (for example, by including the state in the request itself).
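As a minimal sketch of the last option (a simplified illustration, not a complete authentication scheme), the relevant state can travel inside a signed token that every instance can verify on its own, using nothing but a shared secret:

const crypto = require('crypto');
const SECRET = 'a-secret-shared-by-all-instances';

function sign(payload) {                     // issued by whichever instance handles the login
  const data = JSON.stringify(payload);
  const signature = crypto.createHmac('sha256', SECRET).update(data).digest('hex');
  return Buffer.from(data).toString('base64') + '.' + signature;
}

function verify(token) {                     // any instance can verify it without shared memory
  const [encoded, signature] = token.split('.');
  const data = Buffer.from(encoded, 'base64').toString();
  const expected = crypto.createHmac('sha256', SECRET).update(data).digest('hex');
  return signature === expected ? JSON.parse(data) : null;
}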

Tip

For a real example of a library requiring sticky load balancing, we can mention Socket.io (http://socket.io/blog/introducing-socket-io-1-0/#scalability).

Scaling with a reverse proxy

The cluster module is not the only option we have to scale a Node.js web application. In fact, more traditional techniques are often preferred because they offer more control and power in highly available production environments.

The alternative to using cluster is to start multiple standalone instances of the same application running on different ports or machines, and then use a reverse proxy (or gateway) to provide access to those instances, distributing the traffic across them. In this configuration, we don't have a master process distributing requests to a set of workers, but a set of distinct processes running on the same machine (using different ports) or scattered across different machines inside a network. To provide a single access point to our application, we can then use a reverse proxy, a special device or service placed between the clients and the instances of our application, which takes any request and forwards it to a destination server, returning the result to the client as if it were itself the origin. In this scenario, the reverse proxy is also used as a load balancer, distributing the requests among the instances of the application.

Tip

For a clear explanation of the differences between a reverse proxy and a forward proxy, you can refer to the Apache HTTP server documentation at http://httpd.apache.org/docs/2.4/mod/mod_proxy.html#forwardreverse.

The next figure shows a typical multiprocess, multimachine configuration with a reverse proxy acting as a load balancer on the front:

Scaling with a reverse proxy

For a Node.js application, there are many reasons to choose this approach in place of the cluster module:

  • A reverse proxy can distribute the load across several machines, not just several processes
  • The most popular reverse proxies on the market support sticky load balancing
  • A reverse proxy can route a request to any available server, regardless of its programming language or platform
  • We can choose more powerful load balancing algorithms
  • Many reverse proxies also offer other services such as URL rewrites, caching, SSL termination point, or even the functionality of fully-fledged web servers that can be used, for example, to serve static files

That said, the cluster module could also be easily combined with a reverse proxy if necessary; for example, using cluster to scale vertically inside a single machine and then using the reverse proxy to scale horizontally across different nodes.

Note

Pattern

Use a reverse proxy to balance the load of an application across multiple instances running on different ports or machines.

We have many options to implement a load balancer using a reverse proxy; some popular solutions are the following:

  • Nginx (http://nginx.org): This is a web server, reverse proxy, and load balancer, built upon the non-blocking I/O model.
  • HAProxy (http://www.haproxy.org): This is a fast load balancer for TCP/HTTP traffic.
  • Node.js-based proxies: There are many solutions for the implementation of reverse proxies and load balancers directly in Node.js. This might have advantages and disadvantages, as we will see later.
  • Cloud-based proxies: In the era of cloud computing, it's not rare to utilize a load balancer as-a-service. This can be convenient because it requires minimal maintenance, it's usually highly scalable, and sometimes it can support dynamic configurations to enable on-demand scalability.

In the next few sections of this chapter, we will analyze a sample configuration using Nginx, and later on, we will also work on building our very own load balancer using nothing but Node.js!

Load balancing with Nginx

To give an idea of how dedicated reverse proxies work, we will now build a scalable architecture based on Nginx (http://nginx.org), but first we need to install it. We can do that by following the instructions at http://nginx.org/en/docs/install.html.

Tip

On the latest Ubuntu system, you can quickly install Nginx with the command:

sudo apt-get install nginx

On Mac OS X, you can use brew (http://brew.sh):

brew install nginx

As we are not going to use cluster to start multiple instances of our server, we need to slightly modify the code of our application so that we can specify the listening port using a command-line argument. This will allow us to launch multiple instances on different ports. Let's then consider again the main module of our example application (app.js):

const http = require('http'); 
const pid = process.pid; 
 
http.createServer((req, res) => { 
  for (let i = 1e7; i > 0; i--) {} 
  console.log(`Handling request from ${pid}`); 
  res.end(`Hello from ${pid}\n`); 
}).listen(process.env.PORT || process.argv[2] || 8080, () => { 
  console.log(`Started ${pid}`); 
}); 

Another important feature we lose by not using cluster is the automatic restart in case of a crash. Luckily, this is easy to fix by using a dedicated supervisor, that is, an external process that monitors our application and restarts it if necessary. Possible choices include forever (https://www.npmjs.com/package/forever) and pm2 (https://github.com/Unitech/pm2).

For this example, we are going to use forever, which is the simplest and most immediate for us to use. We can install it globally by running the following command:

npm install forever -g

The next step is to start the four instances of our application, all on different ports and supervised by forever:

forever start app.js 8081
forever start app.js 8082
forever start app.js 8083
forever start app.js 8084

We can check the list of the started processes using the command:

forever list

Now it's time to configure the Nginx server as a load balancer.

First, we need to identify the location of the nginx.conf file, which, depending on your system, can be found in /usr/local/nginx/conf, /etc/nginx, or /usr/local/etc/nginx.

Next, let's open the nginx.conf file and apply the following configuration, which is the very minimum required to get a working load balancer:

http { 
  # ... 
  upstream nodejs_design_patterns_app { 
    server 127.0.0.1:8081; 
    server 127.0.0.1:8082; 
    server 127.0.0.1:8083; 
    server 127.0.0.1:8084; 
  } 
  # ... 
  server { 
    listen 80; 
 
    location / { 
      proxy_pass http://nodejs_design_patterns_app; 
    } 
  } 
  # ... 
} 

The configuration needs very little explanation. In the upstream nodejs_design_patterns_app section, we are defining a list of the backend servers used to handle the network requests, and then, in the server section, we specify the proxy_pass directive, which essentially tells Nginx to forward any request to the server group we defined before (nodejs_design_patterns_app). That's it, now we only need to reload the Nginx configuration with the command:

nginx -s reload

Our system should now be up and running, ready to accept requests and balance the traffic across the four instances of our Node.js application. Simply point your browser to the address http://localhost to see how the traffic is balanced by our Nginx server.
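To double-check the distribution without a browser, a small script (not part of the book's code) can issue a handful of requests through the proxy and print which backend PID answered each one:

const http = require('http');

for (let i = 0; i < 8; i++) {
  http.get('http://localhost', res => {            // goes through Nginx on port 80
    let body = '';
    res.on('data', chunk => body += chunk);
    res.on('end', () => console.log(body.trim())); // e.g. "Hello from <PID>"
  });
}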

Using a service registry

One important advantage of modern cloud-based infrastructures is the ability to dynamically adjust the capacity of an application based on the current or predicted traffic; this is also known as dynamic scaling. If implemented properly, this practice can reduce the cost of the IT infrastructure enormously while still keeping the application highly available and responsive.

The idea is simple: if our application is experiencing a performance degradation caused by a peak in traffic, we automatically spawn new servers to cope with the increased load. We could also decide to shut down some servers during certain hours, for example, at night, when we know that the traffic will be lighter, and restart them again in the morning. This mechanism requires the load balancer to always be up-to-date with the current network topology, knowing at any time which server is up.

A common pattern to solve this problem is to use a central repository called a service registry, which keeps track of the running servers and the services they provide. The next figure shows a multiservice architecture with a load balancer on the front, dynamically configured using a service registry:

Using a service registry

The preceding architecture assumes the presence of two services, API and WebApp. The load balancer distributes the requests arriving on the /api endpoint to all the servers implementing the API service, while the rest of the requests are spread across the servers implementing the WebApp service. The load balancer obtains the list of servers using the service registry.

For this to work in complete automation, each application instance has to register itself to the service registry the moment it comes up online, and unregister itself when it stops. This way, the load balancer can always have an up-to-date view of the servers and the services available on the network.

Note

Pattern (service registry)

Use a central repository to store an always up-to-date view of the servers and the services available in a system.

This pattern can be applied not only to load balancing, but also more generally as a way to decouple a service type from the servers providing it. We can look at it as a service locator design pattern applied to network services.

Implementing a dynamic load balancer with http-proxy and Consul

To support a dynamic network infrastructure, we can use a reverse proxy such as Nginx or HAProxy; all we need to do is update their configuration using an automated service and then force the load balancer to pick up the changes. For Nginx, this can be done using the following command line:

nginx -s reload

The same result can be achieved with a cloud-based solution, but we have a third and more familiar alternative that makes use of our favorite platform.

We all know that Node.js is a great tool to build any sort of network application; as we said, this is exactly one of its main design goals. So, why not build a load balancer using nothing but Node.js? This would give us much more freedom and power, and would allow us to implement any sort of pattern or algorithm straight into our custom-built load balancer, including the one we are now going to explore, dynamic load balancing using a service registry. In this example we are going to use Consul (https://www.consul.io) as the service registry.

For this example, we want to replicate the multiservice architecture we saw in the figure of the previous section, and to do that, we are going to mainly use three npm packages:

  • http-proxy (https://www.npmjs.com/package/http-proxy): A library to simplify the creation of proxies and load balancers in Node.js
  • portfinder (https://www.npmjs.com/package/portfinder): A library to discover a free port in the system
  • consul (https://www.npmjs.com/package/consul): A library that allows us to interact with Consul from Node.js

Let's start by implementing our services. They are simple HTTP servers like the ones we have used so far to test cluster and Nginx, but this time we want each server to register itself into the service registry the moment it starts.

Let's see how this looks (file app.js):

const http = require('http'); 
const pid = process.pid; 
const consul = require('consul')(); 
const portfinder = require('portfinder'); 
const serviceType = process.argv[2]; 
 
portfinder.getPort((err, port) => {        // [1] 
  const serviceId = serviceType+port; 
  consul.agent.service.register({          // [2] 
    id: serviceId, 
    name: serviceType, 
    address: 'localhost', 
    port: port, 
    tags: [serviceType] 
  }, () => { 
 
    const unregisterService = (err) => {   // [3] 
      consul.agent.service.deregister(serviceId, () => { 
        process.exit(err ? 1 : 0); 
      }); 
    }; 
 
    process.on('exit', unregisterService); // [4] 
    process.on('SIGINT', unregisterService); 
    process.on('uncaughtException', unregisterService); 
 
    http.createServer((req, res) => {      // [5] 
      for (let i = 1e7; i > 0; i--) {} 
      console.log(`Handling request from ${pid}`); 
      res.end(`${serviceType} response from ${pid}\n`); 
    }).listen(port, () => { 
      console.log(`Started ${serviceType} (${pid}) on port ${port}`); 
    }); 
  }); 
}); 

In the preceding code, there are some parts that deserve our attention:

  • First, we use portfinder.getPort to discover a free port in the system (by default, portfinder starts to search from port 8000).
  • Next, we use the Consul library to register a new service in the registry. The service definition needs several attributes: id (a unique name for the service), name (a generic name that identifies the service), address and port (to identify how to access the service), and tags (an optional array of tags that can be used to filter and group services). We are using serviceType (which we receive as a command-line argument) to specify the service name and to add a tag. This will allow us to identify all the services of the same type available in the cluster.
  • At this point we define a function called unregisterService that allows us to remove the service we just registered in Consul.
  • We use unregisterService as a cleanup function, so that when the program is closed (either intentionally or by accident), the service is unregistered from Consul.
  • Finally, we start the HTTP server for our service on the port discovered by portfinder.

Now it's time to implement the load balancer. Let's do that by creating a new module called loadBalancer.js. First, we need to define a routing table to map URL paths to services:

const routing = [
  { 
    path: '/api', 
    service: 'api-service', 
    index: 0 
  },
  { 
    path: '/', 
    service: 'webapp-service', 
    index: 0 
  }
]; 

Each item in the routing array contains the service used to handle the requests arriving at the mapped path. The index property will be used to round robin the requests of a given service.

Let's see how this works by implementing the second part of loadBalancer.js:

const http = require('http'); 
const httpProxy = require('http-proxy'); 
const consul = require('consul')();                  // [1] 
 
const proxy = httpProxy.createProxyServer({}); 
http.createServer((req, res) => { 
  let route; 
  routing.some(entry => {                            // [2] 
    route = entry; 
    //Starts with the route path? 
    return req.url.indexOf(route.path) === 0; 
  }); 
 
  consul.agent.service.list((err, services) => {     // [3] 
    const servers = []; 
    Object.keys(services).forEach(id => { 
      if (services[id].Tags.indexOf(route.service) > -1) { 
        servers.push(`http://${services[id].Address}:${services[id].Port}`); 
      } 
    }); 
 
    if (!servers.length) { 
      res.writeHead(502); 
      return res.end('Bad gateway'); 
    } 
 
    route.index = (route.index + 1) % servers.length; // [4] 
    proxy.web(req, res, {target: servers[route.index]}); 
  }); 
}).listen(8080, () => console.log('Load balancer started on port 8080')); 

This is how we implemented our Node.js-based load balancer:

  1. First, we need to require consul so that we can have access to the registry. Next, we instantiate an http-proxy object and start a normal web server.
  2. In the request handler of the server, the first thing we do is match the URL against our routing table. The result will be a descriptor containing the service name.
  3. We obtain from consul the list of servers implementing the required service. If this list is empty, we return an error to the client. We use the Tags attribute to filter all the available services and find the addresses of the servers that implement the current service type.
  4. At last, we can route the request to its destination. We update route.index to point to the next server in the list, following a round robin approach. We then use the index to select a server from the list, passing it to proxy.web() along with the request (req) and the response (res) objects. This will simply forward the request to the server we chose.

It is now clear how simple it is to implement a load balancer using only Node.js and a service registry, and how much flexibility we can have by doing so. Now we should be ready to give it a go, but first, let's install the Consul server by following the official documentation at https://www.consul.io/intro/getting-started/install.html.

This allows us to start the Consul service registry on our development machine with this simple command line:

    consul agent -dev

Now we are ready to start the load balancer:

node loadBalancer

Now, if we try to access some of the services exposed by the load balancer, we will notice that it returns an HTTP 502 error, because we haven't started any servers yet. Try it yourself:

curl localhost:8080/api

The preceding command should return the following output:

Bad gateway

The situation will change if we spawn some instances of our services, for example, two api-service and one webapp-service:

forever start app.js api-service
forever start app.js api-service
forever start app.js webapp-service

Now the load balancer should automatically see the new servers and start distributing requests across them. Let's try again with the following command:

curl localhost:8080/api

The preceding command should now return this:

api-service response from 6972

By running it again, we should now receive a message from another server, confirming that the requests are being distributed evenly among the different servers:

api-service response from 6979

The advantages of this pattern are immediate. We can now scale our infrastructure dynamically, on demand, or based on a schedule, and our load balancer will automatically adjust with the new configuration without any extra effort!

Peer-to-peer load balancing

Using a reverse proxy is almost a necessity when we want to expose a complex internal network architecture to a public network such as the Internet. It helps hide the complexity, providing a single access point that external applications can easily use and rely on. However, if we need to scale a service that is for internal use only, we can have much more flexibility and control.

Let's imagine having a Service A which relies on a Service B to implement its functionality. Service B is scaled across multiple machines and it's available only in the internal network. What we have learned so far is that Service A will connect to Service B using a reverse proxy, which will distribute the traffic to all the servers implementing Service B.

However, there is an alternative. We can remove the reverse proxy from the picture and distribute the requests directly from the client (Service A), which now becomes directly responsible for load balancing its connections across the various instances of Service B. This is possible only if Service A knows the details about the servers exposing Service B, and in an internal network, this is usually known information. With this approach, we are essentially implementing peer-to-peer load balancing.

The following diagram compares the two alternatives we just described:

Peer-to-peer load balancing

This is an extremely simple and effective pattern that enables truly distributed communications without bottlenecks or single points of failure. Besides that, it also does the following:

  • Reduces the infrastructure complexity by removing a network node
  • Allows faster communications, because messages will travel through one fewer node
  • Scales better, because performance is not limited by what the load balancer can handle

On the other hand, by removing the reverse proxy, we are actually exposing the complexity of its underlying infrastructure. Also, each client has to be smarter by implementing a load-balancing algorithm and, possibly, also a way to keep its knowledge of the infrastructure up-to-date.

Tip

Peer-to-peer load balancing is a pattern used extensively in the ØMQ (http://zeromq.org) library.

Implementing an HTTP client that can balance requests across multiple servers

We already know how to implement a load balancer using only Node.js and distribute incoming requests across the available servers, so implementing the same mechanism on the client side should not be that different. All we have to do in fact is wrap the client API and augment it with a load-balancing mechanism. Take a look at the following module (balancedRequest.js):

const http = require('http'); 
const servers = [ 
  {host: 'localhost', port: '8081'}, 
  {host: 'localhost', port: '8082'} 
]; 
let i = 0; 
 
module.exports = (options, callback) => { 
  i = (i + 1) % servers.length; 
  options.hostname = servers[i].host; 
  options.port = servers[i].port; 
 
  return http.request(options, callback); 
}; 

The preceding code is very simple and needs little explanation. We wrapped the original http.request API so that it overrides the hostname and port of the request with those selected from the list of available servers using a round robin algorithm.

The new wrapped API can then be used seamlessly (client.js):

const request = require('./balancedRequest'); 
for (let i = 10; i >= 0; i--) { 
  request({method: 'GET', path: '/'}, res => { 
    let str = ''; 
    res.on('data', chunk => {
      str += chunk;
    }).on('end', () => {
      console.log(str);
    }); 
  }).end(); 
} 

To try the preceding code, we have to start two instances of the sample server provided:

node app 8081
node app 8082

Followed by the client application we just built:

node client

We should notice how each request is sent to a different server, confirming that we are now able to balance the load without a dedicated reverse proxy!

Tip

An improvement to the wrapper we created before would be to integrate a service registry directly into the client and obtain the server list dynamically. You can find an example of this technique in the code distributed with the book.
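A minimal sketch of that idea could look like the following (simplified, without any caching of the registry responses, and with a slightly different interface, since the asynchronous lookup prevents us from returning the request object synchronously):

const http = require('http');
const consul = require('consul')();

// Looks up the servers for the given service in Consul, picks one at random,
// performs the request, and hands the response to the callback
module.exports = (serviceName, options, callback) => {
  consul.agent.service.list((err, services) => {
    if (err) return callback(err);

    const servers = Object.keys(services)
      .filter(id => services[id].Tags.indexOf(serviceName) > -1)
      .map(id => services[id]);

    if (!servers.length) {
      return callback(new Error(`No servers available for ${serviceName}`));
    }

    const target = servers[Math.floor(Math.random() * servers.length)];
    options.hostname = target.Address;
    options.port = target.Port;
    http.request(options, res => callback(null, res)).end();
  });
};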
