Traditional, multithreaded web servers are usually scaled only when the resources assigned to a machine cannot be upgraded any more, or when doing so would involve a higher cost than simply launching another machine. By using multiple threads, traditional web servers can take advantage of all the processing power of a server, using all the available processors and memory. A single Node.js process, however, cannot easily do the same: it is single-threaded and, by default, it has a memory limit of 1.7 GB on 64-bit machines (which can only be increased with a special command-line option called --max_old_space_size). This means that Node.js applications are usually scaled much sooner than traditional web servers, even in the context of a single machine, to be able to take advantage of all its resources.
Don't be fooled into thinking of this as a disadvantage. On the contrary, being almost forced to scale has beneficial effects on other attributes of an application, in particular availability and fault tolerance. In fact, scaling a Node.js application by cloning is relatively simple, and it's often implemented even when there is no need to harvest more resources, just for the purpose of having a redundant, fault-tolerant setup.
This also pushes the developer to take scalability into account from the early stages of an application, making sure the application does not rely on any resource that cannot be shared across multiple processes or machines. In fact, an absolute prerequisite to scaling an application is that each instance does not store common information on resources that cannot be shared, usually hardware such as memory or disk. For example, in a web server, storing session data in memory or on disk is a practice that does not work well with scaling; using a shared database instead ensures that each instance has access to the same session information, wherever it is deployed.
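As a sketch of this idea (the names here are our own, not from the book), session storage can be hidden behind a small asynchronous interface, so that an in-memory implementation used during development can later be replaced by a shared store such as Redis without touching the rest of the application:

```javascript
// A minimal session store interface. The in-memory version below cannot be
// shared across processes; a production implementation would keep the same
// interface but delegate to a shared store such as Redis or a database.
class InMemorySessionStore {
  constructor() {
    this.sessions = new Map();
  }

  get(sessionId, callback) {
    // Asynchronous by contract, so a network-backed store can be swapped in
    process.nextTick(() => callback(null, this.sessions.get(sessionId) || null));
  }

  set(sessionId, data, callback) {
    this.sessions.set(sessionId, data);
    process.nextTick(() => callback(null));
  }
}

const store = new InMemorySessionStore();
store.set('abc123', {user: 'john'}, () => {
  store.get('abc123', (err, session) => {
    console.log(session.user); // john
  });
});
```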
Let's now introduce the most basic mechanism for scaling Node.js applications: the cluster module.
In Node.js, the simplest pattern to distribute the load of an application across different instances running on a single machine is the cluster module, which is part of the core libraries. The cluster module simplifies the forking of new instances of the same application and automatically distributes incoming connections across them, as shown in the following figure:
The Master process is responsible for spawning a number of processes (workers), each representing an instance of the application we want to scale. Each incoming connection is then distributed across the cloned workers, spreading the load across them.
In Node.js 0.8 and 0.10, the cluster module shares the same server socket across the workers and leaves to the operating system the job of load balancing incoming connections across the available workers. However, there is a problem with this approach: the algorithms used by the operating system to distribute the load across the workers are not meant to load balance network requests, but rather to schedule the execution of processes. As a result, the distribution is not always uniform across all the instances; often, a fraction of workers receives most of the load. This type of behavior can make sense for the operating system scheduler, because it focuses on minimizing context switches between different processes. The short story is that the cluster module does not work at its full potential in Node.js <= 0.10.
However, the situation changes starting from version 0.11.2, where an explicit round robin load-balancing algorithm is included inside the master process, which makes sure the requests are evenly distributed across all the workers. The new load-balancing algorithm is enabled by default on all platforms except Windows, and it can be globally modified by setting the variable cluster.schedulingPolicy, using the constants cluster.SCHED_RR (round robin) or cluster.SCHED_NONE (handled by the operating system).
The round robin algorithm distributes the load evenly across the available servers on a rotational basis. The first request is forwarded to the first server, the second to the next server in the list, and so on. When the end of the list is reached, the iteration starts again from the beginning. This is one of the simplest and most used load balancing algorithms; however, it's not the only one. More sophisticated algorithms allow assigning priorities, selecting the least loaded server or the one with the fastest response time.
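To make the rotation concrete, here is a minimal round robin selector written for illustration (our own sketch, not the actual internals of the cluster module):

```javascript
// Returns a function that cycles through the given servers in order,
// restarting from the beginning when the end of the list is reached.
function createRoundRobin(servers) {
  let index = 0;
  return function next() {
    const server = servers[index];
    index = (index + 1) % servers.length;
    return server;
  };
}

const next = createRoundRobin(['server1', 'server2', 'server3']);
console.log(next(), next(), next(), next()); // server1 server2 server3 server1
```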
You can find more details about the evolution of the cluster module in these two Node.js issues:
Let's now start working on an example. Let's build a small HTTP server, cloned and load balanced using the cluster module. First of all, we need an application to scale; for this example we don't need too much, just a very basic HTTP server. Let's create a file called app.js containing the following code:
const http = require('http');
const pid = process.pid;

http.createServer((req, res) => {
  for (let i = 1e7; i > 0; i--) {}
  console.log(`Handling request from ${pid}`);
  res.end(`Hello from ${pid}\n`);
}).listen(8080, () => {
  console.log(`Started ${pid}`);
});
The HTTP server we just built responds to any request by sending back a message containing its PID; this will be useful to identify which instance of the application is handling the request. Also, to simulate some actual CPU work, we perform an empty loop 10 million times; without this, the server load would be almost nothing considering the small scale of the tests we are going to run for this example.
We can now check that all works as expected by running the application as usual and sending a request to http://localhost:8080 using either a browser or curl.
We can also try to measure the requests per second that the server is able to handle using only one process; for this purpose, we can use a network benchmarking tool such as siege (http://www.joedog.org/siege-home) or Apache ab (http://httpd.apache.org/docs/2.4/programs/ab.html):
siege -c200 -t10S http://localhost:8080
With ab, the command line would be very similar:
ab -c200 -t10 http://localhost:8080/
The preceding commands will load the server with 200 concurrent connections for 10 seconds. As a reference, the result for a system with 4 processors is in the order of 90 transactions per second, with an average CPU utilization of only 20%.
Let's now try to scale our application using the cluster module. Let's create a new module called clusteredApp.js:
const cluster = require('cluster');
const os = require('os');

if (cluster.isMaster) {
  const cpus = os.cpus().length;
  console.log(`Clustering to ${cpus} CPUs`);
  for (let i = 0; i < cpus; i++) { // [1]
    cluster.fork();
  }
} else {
  require('./app'); // [2]
}
As we can see, using the cluster module requires very little effort. Let's analyze what is happening:

1. When we launch clusteredApp from the command line, we are actually executing the master process. The cluster.isMaster variable is set to true and the only work we are required to do is forking the current process using cluster.fork(). In the preceding example, we are starting as many workers as the number of CPUs in the system to take advantage of all the available processing power.
2. When cluster.fork() is executed from the master process, the current main module (clusteredApp) is run again, but this time in worker mode (cluster.isWorker is set to true, while cluster.isMaster is false). When the application runs as a worker, it can start doing some actual work. In our example, we load the app module, which actually starts a new HTTP server.

It's interesting to notice that the usage of the cluster module is based on a recurring pattern, which makes it very easy to run multiple instances of an application:
if (cluster.isMaster) {
  // fork()
} else {
  // do work
}
Under the hood, the cluster module uses the child_process.fork() API (we already met this API in Chapter 9, Advanced Asynchronous Recipes); therefore, we also have a communication channel available between the master and the workers. The instances of the workers can be accessed from the variable cluster.workers, so broadcasting a message to all of them would be as easy as running the following lines of code:
Object.keys(cluster.workers).forEach(id => {
cluster.workers[id].send('Hello from the master');
});
Now, let's try to run our HTTP server in cluster mode. We can do that by starting the clusteredApp module as usual:
node clusteredApp
If our machine has more than one processor, we should see a number of workers being started by the master process, one after the other. For example, in a system with four processors, the terminal should look like this:
Started 14107
Started 14099
Started 14102
Started 14101
If we now try to hit our server again using the URL http://localhost:8080, we should notice that each request returns a message with a different PID, which means that these requests have been handled by different workers, confirming that the load is being distributed among them.
Now we can try to load test our server again:
siege -c200 -t10S http://localhost:8080
This way, we should be able to discover the performance increase obtained by scaling our application across multiple processes. As a reference, by using Node.js 6 in a Linux system with 4 processors, the performance increase should be around 3x (270 trans/sec versus 90 trans/sec) with an average CPU load of 90%.
As we already mentioned, scaling an application also brings other advantages, in particular the ability to maintain a certain level of service even in the presence of malfunctions or crashes. This property is also known as resiliency and it contributes towards the availability of a system.
By starting multiple instances of the same application, we are creating a redundant system, which means that if one instance goes down for whatever reason, we still have other instances ready to serve requests. This pattern is pretty straightforward to implement using the cluster module. Let's see how it works!
Let's take the code from the previous section as the starting point. In particular, let's modify the app.js module so that it crashes after a random interval of time:
// ...
// At the end of app.js
setTimeout(() => {
  throw new Error('Ooops');
}, Math.ceil(Math.random() * 3) * 1000);
With this change in place, our server exits with an error after a random number of seconds between 1 and 3. In a real-life situation, this would cause our application to stop working (and to stop serving requests) unless we use some external tool to monitor its status and restart it automatically. However, if we only have one instance, there may be a non-negligible delay between restarts caused by the startup time of the application, which means that during those restarts the application is not available. Having multiple instances instead makes sure we always have a backup system ready to serve an incoming request, even when one of the workers fails.
With the cluster module, all we have to do is spawn a new worker as soon as we detect that one has terminated with an error code. Let's then modify the clusteredApp.js module to take this into account:
if (cluster.isMaster) {
  // ...
  cluster.on('exit', (worker, code) => {
    if (code !== 0 && !worker.exitedAfterDisconnect) {
      console.log('Worker crashed. Starting a new worker');
      cluster.fork();
    }
  });
} else {
  require('./app');
}
In the preceding code, as soon as the master process receives an 'exit' event, we check whether the process terminated intentionally or as the result of an error; we do this by checking the status code and the flag worker.exitedAfterDisconnect, which indicates whether the worker was terminated explicitly by the master. If we confirm that the process terminated because of an error, we start a new worker. It's interesting to notice that while the crashed worker restarts, the other workers can still serve requests, thus not affecting the availability of the application.
To test this assumption, we can try to stress our server again using siege. When the stress test completes, we notice that among the various metrics produced by siege, there is also an indicator that measures the availability of the application. The expected result would be something similar to this:
Transactions:          3027 hits
Availability:          99.31 %
[...]
Failed transactions:   21
Bear in mind that this result can vary a lot; it greatly depends on the number of running instances and how many times they crash during the test, but it should give a good indicator of how our solution works. The preceding numbers tell us that despite the fact that our application is constantly crashing, we only had 21 failed requests over 3,027 hits. In the example scenario we built, most of the failing requests will be caused by the interruption of already established connections during a crash. In fact, when this happens, siege will print an error like the following:
[error] socket: read error Connection reset by peer sock.c:479: Connection reset by peer
Unfortunately, there is very little we can do to prevent these types of failures, especially when the application terminates because of a crash. Nonetheless, our solution proves to be working and its availability is not bad at all for an application that crashes so often!
A Node.js application might also need to be restarted when its code needs to be updated. So, in this scenario as well, having multiple instances can help maintain the availability of our application.
When we have to intentionally restart an application to update it, there is a small window in which the application restarts and is unable to serve requests. This can be acceptable if we are updating our personal blog, but it's not even an option for a professional application with a Service Level Agreement (SLA) or one that is updated very often as part of a continuous delivery process. The solution is to implement a zero-downtime restart where the code of an application is updated without affecting its availability.
With the cluster module, this is again a pretty easy task; the pattern consists of restarting the workers one at a time. This way, the remaining workers can continue to operate and keep the services of the application available.
Let's then add this new feature to our clustered server; all we have to do is add some new code to be executed by the master process (the clusteredApp.js file):
if (cluster.isMaster) {
  // ...
  process.on('SIGUSR2', () => {                     // [1]
    const workers = Object.keys(cluster.workers);
    function restartWorker(i) {                     // [2]
      if (i >= workers.length) return;
      const worker = cluster.workers[workers[i]];
      console.log(`Stopping worker: ${worker.process.pid}`);
      worker.disconnect();                          // [3]
      worker.on('exit', () => {
        if (!worker.exitedAfterDisconnect) return;
        const newWorker = cluster.fork();           // [4]
        newWorker.on('listening', () => {
          restartWorker(i + 1);                     // [5]
        });
      });
    }
    restartWorker(0);
  });
} else {
  require('./app');
}
This is how the preceding block of code works:
1. The restarting of the workers is triggered on receiving the SIGUSR2 signal.
2. When the signal is received, we define an iterator function called restartWorker(). This implements an asynchronous sequential iteration pattern over the items of the cluster.workers object.
3. The first task of the restartWorker() function is stopping a worker gracefully by invoking worker.disconnect().
4. When the terminated process exits, we can spawn a new worker.
5. We wait for the new worker to be ready and listening for new connections before we restart the next worker.

As our program makes use of UNIX signals, it will not work properly on Windows systems (unless you are using the recent Windows Subsystem for Linux in Windows 10). Signals are the simplest mechanism to implement our solution; however, this isn't the only one. In fact, other approaches include listening for a command coming from a socket, a pipe, or the standard input.
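As a sketch of the standard input alternative (our own variation, not part of the original example), the command parsing can be isolated in a small factory so it is easy to test; restartWorkers here stands in for the restart routine of clusteredApp.js:

```javascript
// Creates a handler that triggers the given restart routine whenever the
// line 'restart' is received. This works on any platform, including Windows.
function createCommandHandler(restartWorkers) {
  return function handleCommand(line) {
    if (line.trim() === 'restart') {
      restartWorkers();
      return true;
    }
    return false;
  };
}

// Wiring it to the standard input of the master process would look like:
// const readline = require('readline');
// readline.createInterface({input: process.stdin})
//   .on('line', createCommandHandler(() => restartWorker(0)));
```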
Now we can test our zero-downtime restart by running the clusteredApp module and then sending a SIGUSR2 signal to it. However, first we need to obtain the PID of the master process; the following command can be useful to identify it from the list of all the running processes:
ps af
The master process should be the parent of a set of node processes. Once we have the PID we are looking for, we can send the signal to it:
kill -SIGUSR2 <PID>
Now the output of the clusteredApp application should display something like this:
Restarting workers
Stopping worker: 19389
Started 19407
Stopping worker: 19390
Started 19409
We can try to use siege again to verify that we don't have any considerable impact on the availability of our application during the restart of the workers.
pm2 (https://github.com/Unitech/pm2) is a small utility, based on cluster, which offers load balancing, process monitoring, zero-downtime restarts, and other goodies.
The cluster module does not work well with stateful communications, where the state maintained by the application is not shared between the various instances. This is because different requests belonging to the same stateful session may potentially be handled by different instances of the application. This problem is not limited to the cluster module; in general, it applies to any kind of stateless load-balancing algorithm. Consider, for example, the situation described by the following figure:
The user John initially sends a request to our application to authenticate himself, but the result of the operation is registered locally (for example, in memory), so only the instance of the application that received the authentication request (Instance A) knows that John is successfully authenticated. When John sends a new request, the load balancer might forward it to a different instance of the application, which doesn't actually possess the authentication details of John, and hence refuses to perform the operation. The application we just described cannot be scaled as it is; luckily, there are two easy solutions we can apply to solve the problem.
The first option we have to scale an application using stateful communications is sharing the state across all the instances. This can be easily achieved with a shared datastore, for example, a database such as PostgreSQL (http://www.postgresql.org), MongoDB (http://www.mongodb.org), or CouchDB (http://couchdb.apache.org), or, even better, an in-memory store such as Redis (http://redis.io) or Memcached (http://memcached.org).
The following diagram outlines this simple and effective solution:
The only drawback of using a shared store for the communication state is that it's not always possible; for example, we might be using an existing library that keeps the communication state in memory. In any case, if we have an existing application, applying this solution requires a change in the code of the application (if it's not already supported). As we will see next, there is a less invasive solution.
The other alternative we have to support stateful communications is having the load balancer always route all the requests associated with a session to the same instance of the application. This technique is also called sticky load balancing.
The following figure illustrates a simplified scenario involving this technique:
As we can see from the preceding figure, when the load balancer receives a request associated with a new session, it creates a mapping with one particular instance selected by the load-balancing algorithm. The next time the load balancer receives a request from that same session, it bypasses the load-balancing algorithm, selecting the application instance that was previously associated with the session. The particular technique we just described involves the inspection of the session ID associated with the requests (usually included in a cookie by the application or the load balancer itself).
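A sketch of this mapping logic follows (the names are illustrative and not taken from any specific load balancer):

```javascript
// Associates each session ID with an application instance. New sessions go
// through the regular load-balancing algorithm; known sessions bypass it.
const sessionMap = new Map();

function selectInstance(sessionId, loadBalance) {
  if (!sessionMap.has(sessionId)) {
    sessionMap.set(sessionId, loadBalance()); // first request: pick an instance
  }
  return sessionMap.get(sessionId); // subsequent requests: same instance
}

let i = 0;
const roundRobin = () => `instance${(i++ % 2) + 1}`;
console.log(selectInstance('session-a', roundRobin)); // instance1
console.log(selectInstance('session-b', roundRobin)); // instance2
console.log(selectInstance('session-a', roundRobin)); // instance1 (sticky)
```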
A simpler alternative for associating a stateful connection with a single server is using the IP address of the client performing the request. Usually, the IP is provided to a hash function that generates an ID representing the application instance designated to receive the request. This technique has the advantage of not requiring the association to be remembered by the load balancer. However, it doesn't work well with devices that frequently change IP address, for example, when roaming across different networks.
Sticky load balancing is not supported by default by the cluster module; however, it can be added with an npm library called sticky-session (https://www.npmjs.org/package/sticky-session).
One big problem with sticky load balancing is that it nullifies most of the advantages of having a redundant system, where all the instances of the application are the same and where an instance can eventually replace another one that stopped working. For these reasons, the recommendation is to always try to avoid sticky load balancing: applications that maintain any session state in a shared store, or that don't require stateful communications at all (for example, by including the state in the request itself), are preferred.
For a real example of a library requiring sticky load balancing, we can mention Socket.io (http://socket.io/blog/introducing-socket-io-1-0/#scalability).
The cluster module is not the only option we have to scale a Node.js web application. In fact, more traditional techniques are often preferred because they offer more control and power in highly available production environments.
The alternative to using cluster is to start multiple standalone instances of the same application running on different ports or machines, and then use a reverse proxy (or gateway) to provide access to those instances, distributing the traffic across them. In this configuration, we don't have a master process distributing requests to a set of workers, but a set of distinct processes running on the same machine (using different ports) or scattered across different machines inside a network. To provide a single access point to our application, we can then use a reverse proxy, a special device or service placed between the clients and the instances of our application, which takes any request and forwards it to a destination server, returning the result to the client as if it were itself the origin. In this scenario, the reverse proxy is also used as a load balancer, distributing the requests among the instances of the application.
For a clear explanation of the differences between a reverse proxy and a forward proxy, you can refer to the Apache HTTP server documentation at http://httpd.apache.org/docs/2.4/mod/mod_proxy.html#forwardreverse.
The next figure shows a typical multiprocess, multimachine configuration with a reverse proxy acting as a load balancer on the front:
For a Node.js application, there are many reasons to choose this approach in place of the cluster module:
That said, the cluster module could also easily be combined with a reverse proxy if necessary; for example, using cluster to scale vertically inside a single machine and then using the reverse proxy to scale horizontally across different nodes.
We have many options to implement a load balancer using a reverse proxy; some popular solutions are the following:
In the next few sections of this chapter, we will analyze a sample configuration using Nginx, and later on, we will also work on building our very own load balancer using nothing but Node.js!
To give an idea of how dedicated reverse proxies work, we will now build a scalable architecture based on Nginx (http://nginx.org), but first we need to install it. We can do that by following the instructions at http://nginx.org/en/docs/install.html.
On the latest Ubuntu system, you can quickly install Nginx with the command:
sudo apt-get install nginx
On Mac OS X, you can use brew (http://brew.sh):
brew install nginx
As we are not going to use cluster to start multiple instances of our server, we need to slightly modify the code of our application so that we can specify the listening port using a command-line argument. This will allow us to launch multiple instances on different ports. Let's then consider again the main module of our example application (app.js):
const http = require('http');
const pid = process.pid;

http.createServer((req, res) => {
  for (let i = 1e7; i > 0; i--) {}
  console.log(`Handling request from ${pid}`);
  res.end(`Hello from ${pid}\n`);
}).listen(process.env.PORT || process.argv[2] || 8080, () => {
  console.log(`Started ${pid}`);
});
Another important feature we lose by not using cluster is the automatic restart in case of a crash. Luckily, this is easy to fix by using a dedicated supervisor, that is, an external process that monitors our application and restarts it if necessary. Possible choices are the following:
- Node.js-based supervisors such as forever (https://npmjs.org/package/forever) or pm2 (https://npmjs.org/package/pm2)
- OS-level monitors such as upstart (http://upstart.ubuntu.com), systemd (http://freedesktop.org/wiki/Software/systemd), or runit (http://smarden.org/runit/)
- More advanced monitoring solutions such as monit (http://mmonit.com/monit) or supervisor (http://supervisord.org)

For this example, we are going to use forever, which is the simplest and most immediate for us to use. We can install it globally by running the following command:
npm install forever -g
The next step is to start the four instances of our application, all on different ports and supervised by forever:
forever start app.js 8081
forever start app.js 8082
forever start app.js 8083
forever start app.js 8084
We can check the list of the started processes using the command:
forever list
Now it's time to configure the Nginx server as a load balancer.
First, we need to identify the location of the nginx.conf file, which, depending on your system, can be found in /usr/local/nginx/conf, /etc/nginx, or /usr/local/etc/nginx.
Next, let's open the nginx.conf file and apply the following configuration, which is the very minimum required to get a working load balancer:
http {
  # ...
  upstream nodejs_design_patterns_app {
    server 127.0.0.1:8081;
    server 127.0.0.1:8082;
    server 127.0.0.1:8083;
    server 127.0.0.1:8084;
  }
  # ...
  server {
    listen 80;
    location / {
      proxy_pass http://nodejs_design_patterns_app;
    }
  }
  # ...
}
The configuration needs very little explanation. In the upstream nodejs_design_patterns_app section, we define the list of backend servers used to handle the network requests; then, in the server section, we specify the proxy_pass directive, which essentially tells Nginx to forward any request to the server group we defined before (nodejs_design_patterns_app). That's it; now we only need to reload the Nginx configuration with the command:
nginx -s reload
Our system should now be up and running, ready to accept requests and balance the traffic across the four instances of our Node.js application. Simply point your browser to http://localhost to see how the traffic is balanced by our Nginx server.
One important advantage of modern cloud-based infrastructures is the ability to dynamically adjust the capacity of an application based on the current or predicted traffic; this is also known as dynamic scaling. If implemented properly, this practice can reduce the cost of the IT infrastructure enormously while still keeping the application highly available and responsive.
The idea is simple: if our application is experiencing a performance degradation caused by a peak in traffic, we automatically spawn new servers to cope with the increased load. We could also decide to shut down some servers during certain hours, for example at night, when we know that the traffic will be lower, and restart them again in the morning. This mechanism requires the load balancer to always be up-to-date with the current network topology, knowing at any time which servers are up.
A common pattern to solve this problem is to use a central repository called a service registry, which keeps track of the running servers and the services they provide. The next figure shows a multiservice architecture with a load balancer on the front, dynamically configured using a service registry:
The preceding architecture assumes the presence of two services, API and WebApp. The load balancer distributes the requests arriving at the /api endpoint across all the servers implementing the API service, while the rest of the requests are spread across the servers implementing the WebApp service. The load balancer obtains the list of servers using the service registry.
For this to work in complete automation, each application instance has to register itself to the service registry the moment it comes up online, and unregister itself when it stops. This way, the load balancer can always have an up-to-date view of the servers and the services available on the network.
This pattern can be applied not only to load balancing, but also more generally as a way to decouple a service type from the servers providing it. We can look at it as a service locator design pattern applied to network services.
To support a dynamic network infrastructure, we can use a reverse proxy such as Nginx or HAProxy; all we need to do is update their configuration using an automated service and then force the load balancer to pick up the changes. For Nginx, this can be done using the following command line:
nginx -s reload
The same result can be achieved with a cloud-based solution, but we have a third and more familiar alternative that makes use of our favorite platform.
We all know that Node.js is a great tool to build any sort of network application; as we said, this is exactly one of its main design goals. So, why not build a load balancer using nothing but Node.js? This would give us much more freedom and power, and would allow us to implement any sort of pattern or algorithm straight into our custom-built load balancer, including the one we are now going to explore, dynamic load balancing using a service registry. In this example we are going to use Consul (https://www.consul.io) as the service registry.
For this example, we want to replicate the multiservice architecture we saw in the figure of the previous section, and to do that, we are going to mainly use three npm packages:
- http-proxy (https://npmjs.org/package/http-proxy): A library that simplifies the creation of proxies and load balancers in Node.js
- portfinder (https://npmjs.com/package/portfinder): A library that allows a free port in the system to be discovered
- consul (https://npmjs.org/package/consul): A library that allows services to be registered in Consul

Let's start by implementing our services. They are simple HTTP servers like the ones we have used so far to test cluster and Nginx, but this time we want each server to register itself into the service registry the moment it starts.
Let's see how this looks (file app.js):
const http = require('http');
const pid = process.pid;
const consul = require('consul')();
const portfinder = require('portfinder');
const serviceType = process.argv[2];

portfinder.getPort((err, port) => {                  // [1]
  const serviceId = serviceType + port;
  consul.agent.service.register({                    // [2]
    id: serviceId,
    name: serviceType,
    address: 'localhost',
    port: port,
    tags: [serviceType]
  }, () => {
    const unregisterService = (err) => {             // [3]
      consul.agent.service.deregister(serviceId, () => {
        process.exit(err ? 1 : 0);
      });
    };

    process.on('exit', unregisterService);           // [4]
    process.on('SIGINT', unregisterService);
    process.on('uncaughtException', unregisterService);

    http.createServer((req, res) => {                // [5]
      for (let i = 1e7; i > 0; i--) {}
      console.log(`Handling request from ${pid}`);
      res.end(`${serviceType} response from ${pid}\n`);
    }).listen(port, () => {
      console.log(`Started ${serviceType} (${pid}) on port ${port}`);
    });
  });
});
In the preceding code, there are some parts that deserve our attention:
1. First, we use portfinder.getPort to discover a free port in the system (by default, portfinder starts to search from port 8000).
2. Next, we register a new service in the registry. The service definition needs several attributes: id (a unique name for the service), name (a generic name that identifies the service), address and port (to identify how to access the service), and tags (an optional array of tags that can be used to filter and group services). We are using serviceType (that we get as a command-line argument) to specify the service name and to add a tag. This will allow us to identify all the services of the same type available in the cluster.
3. Then, we define a function called unregisterService that allows us to remove the service we just registered in Consul.
4. We register unregisterService as a cleanup function, so that when the program is closed (either intentionally or by accident), the service is unregistered from Consul.
5. Finally, we start the HTTP server for our service on the port discovered by portfinder.

Now it's time to implement the load balancer. Let's do that by creating a new module called loadBalancer.js. First, we need to define a routing table to map URL paths to services:
const routing = [
  { path: '/api', service: 'api-service', index: 0 },
  { path: '/', service: 'webapp-service', index: 0 }
];
Each item in the routing array contains the name of the service used to handle the requests arriving on the mapped path. The index property will be used to round robin the requests for a given service.
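To make the routing rule concrete, here is a small standalone sketch of the prefix matching the load balancer performs against this table (matchRoute is a hypothetical helper, introduced here only for illustration). It also shows why the catch-all '/' entry must come last: matching stops at the first entry whose path is a prefix of the request URL.

```javascript
const routing = [
  { path: '/api', service: 'api-service', index: 0 },
  { path: '/', service: 'webapp-service', index: 0 }
];

// Hypothetical helper: return the first routing entry whose path
// is a prefix of the requested URL.
function matchRoute(url) {
  let route;
  routing.some(entry => {
    route = entry;
    // Starts with the route path?
    return url.indexOf(entry.path) === 0;
  });
  return route;
}

console.log(matchRoute('/api/users').service);  // api-service
console.log(matchRoute('/index.html').service); // webapp-service
```

If the '/' entry were listed first, every URL would match it and the /api entry would never be selected.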
Let's see how this works by implementing the second part of loadBalancer.js:
const http = require('http');
const httpProxy = require('http-proxy');
const consul = require('consul')();                    // [1]
const proxy = httpProxy.createProxyServer({});

http.createServer((req, res) => {
  let route;
  routing.some(entry => {                              // [2]
    route = entry;
    // Starts with the route path?
    return req.url.indexOf(route.path) === 0;
  });

  consul.agent.service.list((err, services) => {       // [3]
    const servers = [];
    Object.keys(services).forEach(id => {
      if (services[id].Tags.indexOf(route.service) > -1) {
        servers.push(`http://${services[id].Address}:${services[id].Port}`);
      }
    });

    if (!servers.length) {
      res.writeHead(502);
      return res.end('Bad gateway');
    }

    route.index = (route.index + 1) % servers.length;  // [4]
    proxy.web(req, res, {target: servers[route.index]});
  });
}).listen(8080, () => console.log('Load balancer started on port 8080'));
This is how we implemented our Node.js-based load balancer:
1. First, we require consul so that we can have access to the registry. Next, we instantiate an http-proxy object and start a normal web server.
2. In the server's request handler, the first thing we do is match the URL of the request against our routing table. The result is a routing entry containing the name of the service responsible for the request.
3. We obtain from consul the list of servers implementing the required service. If this list is empty, we return an error to the client. We use the Tags attribute to filter all the available services and find the address of the servers that implement the current service type.
4. At last, we can route the request to its destination. We update route.index to point to the next server in the list, following a round robin approach. We then use the index to select a server from the list, passing it to proxy.web() along with the request (req) and the response (res) objects. This will simply forward the request to the server we chose.
It is now clear how simple it is to implement a load balancer using only Node.js and a service registry, and how much flexibility we can have by doing so. Now we should be ready to give it a go, but first, let's install the consul server by following the official documentation at https://www.consul.io/intro/getting-started/install.html.
This allows us to start the consul
service registry in our development machine with this simple command line:
consul agent -dev
Now we are ready to start the load balancer:
node loadBalancer
Now, if we try to access any of the services exposed by the load balancer, we will notice that it returns an HTTP 502 error, because we haven't started any servers yet. Try it yourself:
curl localhost:8080/api
The preceding command should return the following output:
Bad gateway
The situation will change if we spawn some instances of our services, for example, two api-service
and one webapp-service
:
forever start app.js api-service
forever start app.js api-service
forever start app.js webapp-service
Now the load balancer should automatically see the new servers and start distributing requests across them. Let's try again with the following command:
curl localhost:8080/api
The preceding command should now return this:
api-service response from 6972
By running it again, we should now receive a message from another server, confirming that the requests are being distributed evenly among the different servers:
api-service response from 6979
The advantages of this pattern are immediate. We can now scale our infrastructure dynamically, on demand, or based on a schedule, and our load balancer will automatically adjust with the new configuration without any extra effort!
Using a reverse proxy is almost a necessity when we want to expose a complex internal network architecture to a public network such as the Internet. It helps hide the complexity, providing a single access point that external applications can easily use and rely on. However, if we need to scale a service that is for internal use only, we can have much more flexibility and control.
Let's imagine having a Service A, which relies on a Service B to implement its functionality. Service B is scaled across multiple machines and is available only on the internal network. What we have learned so far is that Service A will connect to Service B using a reverse proxy, which will distribute the traffic to all the servers implementing Service B.
However, there is an alternative. We can remove the reverse proxy from the picture and distribute the requests directly from the client (Service A), which now becomes directly responsible for load balancing its connections across the various instances of Service B. This is possible only if Service A knows the details of the servers exposing Service B, and in an internal network, this is usually known information. With this approach, we are essentially implementing peer-to-peer load balancing.
The following diagram compares the two alternatives we just described:
This is an extremely simple and effective pattern that enables truly distributed communications without bottlenecks or single points of failure. Besides that, it also does the following:
On the other hand, by removing the reverse proxy, we are actually exposing the complexity of its underlying infrastructure. Also, each client has to be smarter by implementing a load-balancing algorithm and, possibly, also a way to keep its knowledge of the infrastructure up-to-date.
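To illustrate what such a "smarter" client involves, here is a minimal standalone sketch (the ServerList class is a hypothetical helper, not taken from any library): it keeps a refreshable snapshot of the available servers and cycles through them round robin. The update() method is where a real client would plug in registry data, for example from a periodic Consul query.

```javascript
// A tiny client-side balancer: a refreshable server list with
// round robin selection.
class ServerList {
  constructor(servers = []) {
    this.servers = servers;
    this.index = -1;
  }

  // Called whenever fresh registry data arrives (e.g. from Consul);
  // restarts the rotation on the new snapshot.
  update(servers) {
    this.servers = servers;
    this.index = -1;
  }

  // Round robin selection of the next server.
  next() {
    if (this.servers.length === 0) {
      throw new Error('No servers available');
    }
    this.index = (this.index + 1) % this.servers.length;
    return this.servers[this.index];
  }
}

const list = new ServerList(['http://localhost:8081', 'http://localhost:8082']);
console.log(list.next()); // http://localhost:8081
console.log(list.next()); // http://localhost:8082
list.update(['http://localhost:9090']);
console.log(list.next()); // http://localhost:9090
```

Keeping the selection logic behind a small abstraction like this makes it easy to later swap round robin for a different strategy (random, least connections) without touching the rest of the client.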
Peer-to-peer load balancing is a pattern used extensively in the ØMQ (http://zeromq.org) library.
We already know how to implement a load balancer using only Node.js and distribute incoming requests across the available servers, so implementing the same mechanism on the client side should not be that different. All we have to do in fact is wrap the client API and augment it with a load-balancing mechanism. Take a look at the following module (balancedRequest.js
):
const http = require('http');
const servers = [
  {host: 'localhost', port: '8081'},
  {host: 'localhost', port: '8082'}
];
let i = 0;

module.exports = (options, callback) => {
  i = (i + 1) % servers.length;
  options.hostname = servers[i].host;
  options.port = servers[i].port;
  return http.request(options, callback);
};
The preceding code is very simple and needs little explanation. We wrapped the original http.request
API so that it overrides the hostname
and port
of the request with those selected from the list of available servers using a round robin algorithm.
The new wrapped API can then be used seamlessly (client.js
):
const request = require('./balancedRequest');
for (let i = 10; i >= 0; i--) {
request({method: 'GET', path: '/'}, res => {
let str = '';
res.on('data', chunk => {
str += chunk;
}).on('end', () => {
console.log(str);
});
}).end();
}
To try the preceding code, we have to start two instances of the sample server provided:
node app 8081
node app 8082
Followed by the client application we just built:
node client
We should notice how each request is sent to a different server, confirming that we are now able to balance the load without a dedicated reverse proxy!