12

Scalability and Architectural Patterns

In its early days, Node.js was just a non-blocking web server written in C++ and JavaScript and was called web.js. Its creator, Ryan Dahl, soon realized the potential of the platform and started extending it with tools to enable the creation of different types of server-side applications on top of JavaScript and the non-blocking paradigm.

The characteristics of Node.js are perfect for the implementation of distributed systems, ranging from a few nodes to thousands of nodes communicating through the network: Node.js was born to be distributed.

Unlike in other web platforms, scalability is a topic that gets explored rather quickly in Node.js while developing an application. This is often because of the single-threaded nature of Node.js, which is incapable of exploiting all the resources of a multi-core machine. But this is just one side of the coin. In reality, there are more profound reasons for talking about scalability with Node.js.

As we will see in this chapter, scaling an application does not only mean increasing its capacity, enabling it to handle more requests faster: it's also a crucial path to achieving high availability and tolerance to errors.

Sometimes, we even refer to scalability when we talk about ways to split the complexity of an application into more manageable pieces. Scalability is a concept with multiple faces, six to be precise, as many as there are faces on a cube—the scale cube.

In this chapter, you will learn the following topics:

  • Why you should care about scalability
  • What the scale cube is and why it is useful to understand scalability
  • How to scale by running multiple instances of the same application
  • How to leverage a load balancer when scaling an application
  • What a service registry is and how it can be used
  • Running and scaling Node.js applications using container orchestration platforms like Kubernetes
  • How to design a microservice architecture out of a monolithic application
  • How to integrate a large number of services through the use of some simple architectural patterns

An introduction to application scaling

Scalability can be described as the capability of a system to grow and adapt to ever-changing conditions. Scalability is not limited to pure technical growth; it is also dependent on the growth of a business and the organization behind it.

If you are building the next "unicorn startup" and you expect your product to rapidly reach millions of users worldwide, you will face serious scalability challenges. How is your application going to sustain ever-increasing demand? Is the system going to get slower over time or crash often? How can you store high volumes of data and keep I/O under control? As you hire more people, how can you organize the different teams effectively and make them able to work autonomously, without contention across the different parts of the codebase?

Even if you are not working on a high-scale project, that doesn't mean that you will be free from scalability concerns. You will just face different types of scalability challenges. Being unprepared for these challenges might seriously hinder the success of the project and ultimately damage the company behind it. It's important to approach scalability in the context of the specific project and understand the expectations for current and future business needs.

Since scalability is such a broad topic, in this chapter, we will focus our attention on discussing the role of Node.js in the context of scalability. We will discuss several useful patterns and architectures used to scale Node.js applications.

With these in your toolbelt and a solid understanding of your business context, you will be able to design and implement Node.js applications that can adapt and satisfy your business needs and keep your customers happy.

Scaling Node.js applications

We already know that most of the workload of a typical Node.js application runs in the context of a single thread. In Chapter 1, The Node.js Platform, we learned that this is not necessarily a limitation but rather an advantage, because it allows the application to optimize the usage of the resources necessary to handle concurrent requests, thanks to the non-blocking I/O paradigm. This model works wonderfully for applications handling a moderate number of requests per second (usually a few hundred per second), especially if the application is mostly performing I/O-bound tasks (for example, reading and writing from the filesystem and the network) rather than CPU-bound ones (for example, number crunching and data processing).

In any case, assuming we are using commodity hardware, the capacity that a single thread can support is limited. This is regardless of how powerful a server can be, so if we want to use Node.js for high-load applications, the only way is to scale it across multiple processes and machines.

However, workload is not the only reason to scale a Node.js application. In fact, with the same techniques that allow us to scale workloads, we can obtain other desirable properties such as high availability and tolerance to failures. Scalability is also a concept applicable to the size and complexity of an application. In fact, building architectures that can grow as much as needed over time is another important factor when designing software.

JavaScript is a tool to be used with caution. The lack of type checking and its many gotchas can be an obstacle to the growth of an application, but with discipline and accurate design, we can turn some of its downsides into precious advantages. With JavaScript, we are often pushed to keep the application simple and to split it into small, manageable pieces. This mindset can make it easier to build applications that are distributed and scalable, but also easy to evolve over time.

The three dimensions of scalability

When talking about scalability, the first fundamental principle to understand is load distribution, which is the science of splitting the load of an application across several processes and machines. There are many ways to achieve this, and the book The Art of Scalability by Martin L. Abbott and Michael T. Fisher proposes an ingenious model to represent them, called the scale cube. This model describes scalability in terms of the following three dimensions:

  • X-axis — Cloning
  • Y-axis — Decomposing by service/functionality
  • Z-axis — Splitting by data partition

These three dimensions can be represented as a cube, as shown in Figure 12.1:

Figure 12.1: The scale cube

The bottom-left corner of the cube (that is, the intersection between the X-axis and the Y-axis) represents the application having all the functionality in a single code base and running on a single instance. This is what we generally call a monolithic application. This is a common situation for applications handling small workloads or at the early stages of their development. Given a monolithic application, there are three different strategies for scaling it. By looking at the scale cube, these strategies are represented as growth along the different axes of the cube: X, Y, and Z:

  • X-axis — Cloning: The most intuitive evolution of a monolithic, unscaled application is moving right along the X-axis, which is simple, most of the time inexpensive (in terms of development cost), and highly effective. The principle behind this technique is elementary, that is, cloning the same application n times and letting each instance handle 1/nth of the workload.
  • Y-axis — Decomposing by service/functionality: Scaling along the Y-axis means decomposing the application based on its functionalities, services, or use cases. In this instance, decomposing means creating different, standalone applications, each with its own codebase, possibly with its own dedicated database, and even with a separate UI.

    For example, a common situation is separating the part of an application responsible for the administration from the public-facing product. Another example is extracting the services responsible for user authentication, thereby creating a dedicated authentication server.

    The criteria to split an application by its functionalities depend mostly on its business requirements, the use cases, the data, and many other factors, as we will see later in this chapter. Interestingly, this is the scaling dimension with the biggest repercussions, not only on the architecture of an application but also on the way it is managed from a development and an operational perspective. As we will see, microservice is a term that is most commonly associated with a fine-grained Y-axis scaling.

  • Z-axis — Splitting by data partition: The last scaling dimension is the Z-axis, where the application is split in such a way that each instance is responsible for only a portion of the whole data. This is a technique often used in databases, also known as horizontal/vertical partitioning. In this setup, there are multiple instances of the same application, each of them operating on a partition of the data, which is determined using different criteria.

    For example, we could partition the users of an application based on their country (list partitioning), or based on the starting letter of their surname (range partitioning), or by letting a hash function decide the partition each user belongs to (hash partitioning).

    Each partition can then be assigned to a particular instance of the application. The use of data partitions requires each operation to be preceded by a lookup step to determine which instance of the application is responsible for a given datum. As we said, data partitioning is usually applied and handled at the data storage level because its main purpose is overcoming the problems related to handling large monolithic datasets (limited disk space, memory, and network capacity). Applying it at the application level is worth considering only for complex, distributed architectures or for very particular use cases such as, for example, when building applications relying on custom solutions for data persistence, when using databases that don't support partitioning, or when building applications at Google scale. Considering its complexity, scaling an application along the Z-axis should be taken into consideration only after the X and Y axes of the scale cube have been fully exploited.
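
To make the lookup step more concrete, the following sketch (with purely hypothetical partitions and helper names) shows how list partitioning and hash partitioning could map a user to the application instance responsible for their data:

import { createHash } from 'crypto'

// Hypothetical partition map: each partition is served by a dedicated instance
const partitions = [
  { name: 'partition-0', host: 'app0.example.com' },
  { name: 'partition-1', host: 'app1.example.com' },
  { name: 'partition-2', host: 'app2.example.com' }
]

// List partitioning: the user's country decides the partition
const countryToPartition = { IT: 0, FR: 0, US: 1, CA: 1, JP: 2 }
function lookupByCountry (user) {
  return partitions[countryToPartition[user.country] ?? 0]
}

// Hash partitioning: a hash of the user ID decides the partition
function lookupByHash (user) {
  const hash = createHash('sha256').update(user.id).digest()
  return partitions[hash.readUInt32BE(0) % partitions.length]
}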

In the following sections, we will focus on the two most common and effective techniques used to scale Node.js applications, namely, cloning and decomposing by functionality/service.

Cloning and load balancing

Traditional, multithreaded web servers are usually only scaled horizontally when the resources assigned to a machine cannot be upgraded any more, or when doing so would involve a higher cost than simply launching another machine.

By using multiple threads, traditional web servers can take advantage of all the processing power of a server, using all the available processors and memory. Conversely, Node.js applications, being single-threaded, are usually scaled much sooner compared to traditional web servers. Even in the context of a single machine, we need to find ways to "scale" an application in order to take advantage of all the available resources.

In Node.js, vertical scaling (adding more resources to a single machine) and horizontal scaling (adding more machines to the infrastructure) are almost equivalent concepts: both, in fact, involve similar techniques to leverage all the available processing power.

Don't be fooled into thinking about this as a disadvantage. On the contrary, being almost forced to scale has beneficial effects on other attributes of an application, in particular, availability and fault tolerance. In fact, scaling a Node.js application by cloning is relatively simple and it's often implemented even if there is no need to harvest more resources, just for the purpose of having a redundant, fault-tolerant setup.

This also pushes the developer to take into account scalability from the early stages of an application, making sure the application does not rely on any resource that cannot be shared across multiple processes or machines. In fact, an absolute prerequisite to scaling an application is that each instance does not have to store common information on resources that cannot be shared, such as memory or disk. For example, in a web server, storing the session data in memory or on disk is a practice that does not work well with scaling. Instead, using a shared database will ensure that each instance will have access to the same session information, wherever it is deployed.

Let's now introduce the most basic mechanism for scaling Node.js applications: the cluster module.

The cluster module

In Node.js, the simplest pattern to distribute the load of an application across different instances running on a single machine is by using the cluster module, which is part of the core libraries. The cluster module simplifies the forking of new instances of the same application and automatically distributes incoming connections across them, as shown in Figure 12.2:

Figure 12.2: Cluster module schematic

The master process is responsible for spawning a number of processes (workers), each representing an instance of the application we want to scale. Each incoming connection is then distributed across the cloned workers, spreading the load across them.

Since every worker is an independent process, you can use this approach to spawn as many workers as the number of CPUs available in the system. With this approach, you can easily allow a Node.js application to take advantage of all the computing power available in the system.

Notes on the behavior of the cluster module

In most systems, the cluster module uses an explicit round-robin load balancing algorithm. This algorithm is used inside the master process, which makes sure the requests are evenly distributed across all the workers. Round-robin scheduling is enabled by default on all platforms except Windows, and it can be globally modified by setting the variable cluster.schedulingPolicy and using the constants cluster.SCHED_RR (round robin) or cluster.SCHED_NONE (handled by the operating system).

The round-robin algorithm distributes the load evenly across the available servers on a rotational basis. The first request is forwarded to the first server, the second to the next server in the list, and so on. When the end of the list is reached, the iteration starts again from the beginning. In the cluster module, the round-robin logic is a little bit smarter than the traditional implementation. In fact, it is enriched with some extra behaviors that aim to avoid overloading a given worker process.

When we use the cluster module, every invocation to server.listen() in a worker process is delegated to the master process. This allows the master process to receive all the incoming connections and distribute them to the pool of workers. The cluster module makes this delegation process very simple for most use cases, but there are several edge cases in which calling server.listen() in a worker process might not do what you expect:

  • server.listen({fd}): If a worker listens using a specific file descriptor, for instance, by invoking server.listen({fd: 17}), this operation might produce unexpected results. File descriptors are mapped at the process level, so if a worker process maps a file descriptor, this won't match the same file in the master process. One way to overcome this limitation is to create the file descriptor in the master process and then pass it to the worker process. This way, the worker process can invoke server.listen() using a descriptor that is known to the master.
  • server.listen(handle): Listening using handle objects (FileHandle) explicitly in a worker process will cause the worker to use the supplied handle directly, rather than delegating the operation to the master process.
  • server.listen(0): Calling server.listen(0) will generally cause servers to listen on a random port. However, in a cluster, each worker will receive the same "random" port each time they call server.listen(0). In other words, the port is random only the first time; it will be fixed from the second call on. If you want every worker to listen on a different random port, you have to generate the port numbers by yourself.
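
As a minimal sketch of that last point, the master could generate the port numbers itself and pass one to each worker through its environment (the base port 8000 used here is arbitrary):

import { createServer } from 'http'
import cluster from 'cluster'

if (cluster.isMaster) {
  for (let i = 0; i < 4; i++) {
    cluster.fork({ PORT: 8000 + i })    // assign a distinct port to each worker
  }
} else {
  const port = Number.parseInt(process.env.PORT)
  createServer((req, res) => res.end(`Hello from port ${port}\n`))
    .listen(port, () => console.log(`Worker listening on ${port}`))
}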

Building a simple HTTP server

Let's now start working on an example. Let's build a small HTTP server, cloned and load balanced using the cluster module. First of all, we need an application to scale, and for this example, we don't need too much, just a very basic HTTP server.

So, let's create a file called app.js containing the following code:

import { createServer } from 'http'
const { pid } = process
const server = createServer((req, res) => {
  // simulates CPU intensive work
  let i = 1e7; while (i > 0) { i-- }
  console.log(`Handling request from ${pid}`)
  res.end(`Hello from ${pid}\n`)
})
server.listen(8080, () => console.log(`Started at ${pid}`))

The HTTP server we just built responds to any request by sending back a message containing its process identifier (PID); this is useful for identifying which instance of the application is handling the request. In this version of the application, we have only one process, so the PID that you see in the responses and the logs will always be the same.

Also, to simulate some actual CPU work, we perform an empty loop 10 million times: without this, the server load would be almost insignificant and it will be quite hard to draw conclusions from the benchmarks we are going to run.

The app module we create here is just a simple abstraction for a generic web server. We are not using a web framework like Express or Fastify for simplicity, but feel free to rewrite these examples using your web framework of choice.

You can now check if all works as expected by running the application as usual and sending a request to http://localhost:8080 using either a browser or curl.

You can also try to measure the requests per second that the server is able to handle on one process. For this purpose, you can use a network benchmarking tool such as autocannon (nodejsdp.link/autocannon):

npx autocannon -c 200 -d 10 http://localhost:8080

The preceding command will load the server with 200 concurrent connections for 10 seconds. As a reference, the result we got on our machine (a 2.5 GHz quad-core Intel Core i7 using Node.js v14) is in the order of 300 transactions per second.

Please remember that the load tests we will perform in this chapter are intentionally simple and minimal and are provided only for reference and learning purposes. Their results cannot provide a 100% accurate evaluation of the performance of the various techniques we are analyzing. When you are trying to optimize a real production application, make sure to always run your own benchmarks after every change. You might find out that, among the different techniques we are going to illustrate here, some can be more effective than others for your specific application.

Now that we have a simple test web application and some reference benchmarks, we are ready to try some techniques to improve the performance of the application.

Scaling with the cluster module

Let's now update app.js to scale our application using the cluster module:

import { createServer } from 'http'
import { cpus } from 'os'
import cluster from 'cluster'
if (cluster.isMaster) {                                    // (1)
  const availableCpus = cpus()
  console.log(`Clustering to ${availableCpus.length} processes`)
  availableCpus.forEach(() => cluster.fork())
} else {                                                   // (2)
  const { pid } = process
  const server = createServer((req, res) => {
    let i = 1e7; while (i > 0) { i-- }
    console.log(`Handling request from ${pid}`)
    res.end(`Hello from ${pid}\n`)
  })
  server.listen(8080, () => console.log(`Started at ${pid}`))
}

As we can see, using the cluster module requires very little effort. Let's analyze what is happening:

  1. When we launch app.js from the command line, we are actually executing the master process. In this case, the cluster.isMaster variable is set to true and the only work we are required to do is forking the current process using cluster.fork(). In the preceding example, we are starting as many workers as there are logical CPU cores in the system to take advantage of all the available processing power.
  2. When cluster.fork() is executed from the master process, the current module (app.js) is run again, but this time in worker mode (cluster.isWorker is set to true, while cluster.isMaster is false). When the application runs as a worker, it can start doing some actual work. In this case, it starts a new HTTP server.

It's important to remember that each worker is a different Node.js process with its own event loop, memory space, and loaded modules.

It's interesting to note that the usage of the cluster module is based on a recurring pattern, which makes it very easy to run multiple instances of an application:

if (cluster.isMaster) {
  // fork()
} else {
  // do work
}

Under the hood, the cluster.fork() function uses the child_process.fork() API; therefore, we also have a communication channel available between the master and the workers. The worker processes can be accessed from the variable cluster.workers, so broadcasting a message to all of them would be as easy as running the following line of code:

Object.values(cluster.workers).forEach(worker => worker.send('Hello from the master'))
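
On the receiving end, each worker can listen for messages from the master (and reply to it) through its process object. A minimal sketch of the worker-side code:

// Inside the worker branch of our clustered application
process.on('message', (message) => {
  console.log(`Worker ${process.pid} received: ${message}`)
})

// Workers can also send messages back to the master
process.send(`Hello from worker ${process.pid}`)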

Now, let's try to run our HTTP server in cluster mode. If our machine has more than one core, we should see a number of workers being started by the master process, one after the other. For example, in a system with four logical cores, the terminal should look like this:

Started at 14107
Started at 14099
Started at 14102
Started at 14101

If we now try to hit our server again using the URL http://localhost:8080, we should notice that each request will return a message with a different PID, which means that these requests have been handled by different workers, confirming that the load is being distributed among them.

Now, we can try to load test our server again:

npx autocannon -c 200 -d 10 http://localhost:8080

This way, we should be able to discover the performance increase obtained by scaling our application across multiple processes. As a reference, in our machine, we saw a performance increase of about 3.3x (1,000 trans/sec versus 300 trans/sec).

Resiliency and availability with the cluster module

Because workers are all separate processes, they can be killed or respawned depending on a program's needs, without affecting other workers. As long as there are some workers still alive, the server will continue to accept connections. If no workers are alive, existing connections will be dropped, and new connections will be refused. Node.js does not automatically manage the number of workers, though: it is the application's responsibility to manage the worker pool based on its own needs.

As we already mentioned, scaling an application also brings other advantages, in particular, the ability to maintain a certain level of service, even in the presence of malfunctions or crashes. This property is also known as resiliency and it contributes to the availability of a system.

By starting multiple instances of the same application, we are creating a redundant system, which means that if one instance goes down for whatever reason, we still have other instances ready to serve requests. This pattern is pretty straightforward to implement using the cluster module. Let's see how it works!

Let's take the code from the previous section as a starting point. In particular, let's modify the app.js module so that it crashes after a random interval of time:

// ...
} else {
  // Inside our worker block
  setTimeout(
    () => { throw new Error('Ooops') },
    Math.ceil(Math.random() * 3) * 1000
  )
  // ...

With this change in place, our server exits with an error after a random number of seconds between 1 and 3. In a real-life situation, this would eventually cause our application to stop serving requests, unless we use some external tool to monitor its status and restart it automatically. However, if we only have one instance, there may be a non-negligible delay between restarts caused by the startup time of the application. This means that during those restarts, the application is not available. Having multiple instances instead will make sure we always have a backup process to serve an incoming request, even when one of the workers fails.

With the cluster module, all we have to do is spawn a new worker as soon as we detect that one is terminated with an error code. Let's modify app.js to take this into account:

// ...
if (cluster.isMaster) {
  // ...
  cluster.on('exit', (worker, code) => {
    if (code !== 0 && !worker.exitedAfterDisconnect) {
      console.log(
        `Worker ${worker.process.pid} crashed. ` +
        'Starting a new worker'
      )
      cluster.fork()
    }
  })
} else {
  // ...
}

In the preceding code, as soon as the master process receives an 'exit' event, we check whether the process is terminated intentionally or as the result of an error. We do this by checking the status code and the flag worker.exitedAfterDisconnect, which indicates whether the worker was terminated explicitly by the master. If we confirm that the process was terminated because of an error, we start a new worker. It's interesting to note that while the crashed worker gets replaced, the other workers can still serve requests, thus not affecting the availability of the application.

To test this assumption, we can try to stress our server again using autocannon. When the stress test completes, we will notice that among the various metrics in the output, there is also an indication of the number of failures. In our case, it is something like this:

[...]
8k requests in 10.07s, 964 kB read
674 errors (7 timeouts)

This should amount to about 92% availability, since (8,000 - 674) / 8,000 ≈ 0.916. Bear in mind that this result can vary a lot as it greatly depends on the number of running instances and how many times they crash during the test, but it should give us a good indicator of how our solution works. The preceding numbers tell us that despite the fact that our application is constantly crashing, we only had 674 failed requests over 8,000 hits.

In the example scenario that we just built, most of the failing requests will be caused by the interruption of already established connections during a crash. Unfortunately, there is very little we can do to prevent these types of failures, especially when the application terminates because of a crash. Nonetheless, our solution proves to be working and its availability is not bad at all for an application that crashes so often!

Zero-downtime restart

A Node.js application might also need to be restarted when we want to release a new version to our production servers. So, also in this scenario, having multiple instances can help maintain the availability of our application.

When we have to intentionally restart an application to update it, there is a small window in which the application restarts and is unable to serve requests. This can be acceptable if we are updating our personal blog, but it's not even an option for a professional application with a service-level agreement (SLA) or one that is updated very often as part of a continuous delivery process. The solution is to implement a zero-downtime restart, where the code of an application is updated without affecting its availability.

With the cluster module, this is, again, a pretty easy task: the pattern involves restarting the workers one at a time. This way, the remaining workers can continue to operate and keep the services of the application available.

Let's add this new feature to our clustered server. All we have to do is add some new code to be executed by the master process:

import { once } from 'events'
// ...
if (cluster.isMaster) {
  // ...
  process.on('SIGUSR2', async () => {                        // (1)
    console.log('Restarting workers')
    const workers = Object.values(cluster.workers)
    for (const worker of workers) {                          // (2)
      console.log(`Stopping worker: ${worker.process.pid}`)
      worker.disconnect()                                    // (3)
      await once(worker, 'exit')
      if (!worker.exitedAfterDisconnect) continue
      const newWorker = cluster.fork()                       // (4)
      await once(newWorker, 'listening')                     // (5)
    }
  })
} else {
  // ...
}

This is how the preceding code block works:

  1. The restarting of the workers is triggered on receiving the SIGUSR2 signal. Note that we are using an async function to implement the event handler as we will need to perform some asynchronous tasks here.
  2. When a SIGUSR2 signal is received, we iterate over all the values of the cluster.workers object. Every element is a worker object that we can use to interact with a given worker currently active in the pool of workers.
  3. The first thing we do for the current worker is invoke worker.disconnect(), which stops the worker gracefully. This means that if the worker is currently handling a request, this won't be interrupted abruptly; instead, it will be completed. The worker exits only after the completion of all inflight requests.
  4. When the terminated process exits, we can spawn a new worker.
  5. We wait for the new worker to be ready and listening for new connections before we proceed with restarting the next worker.

Since our program makes use of Unix signals, it will not work properly on Windows systems (unless you are using the Windows Subsystem for Linux). Signals are the simplest mechanism to implement our solution. However, this isn't the only one. In fact, other approaches include listening for a command coming from a socket, a pipe, or the standard input.
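
For instance, if we extracted the body of the SIGUSR2 handler into a restartWorkers() function, a sketch of the standard input variant could look like this:

// In the master process
process.stdin.setEncoding('utf8')
process.stdin.on('data', (chunk) => {
  if (chunk.trim() === 'restart') {
    restartWorkers()    // the same loop used in the SIGUSR2 handler
  }
})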

Now, we can test our zero-downtime restart by running the application and then sending a SIGUSR2 signal. However, we first need to obtain the PID of the master process. The following command can be useful to identify it from the list of all the running processes:

ps -af

The master process should be the parent of a set of node processes. Once we have the PID we are looking for, we can send the signal to it:

kill -SIGUSR2 <PID>

Now, the output of the application should display something like this:

Restarting workers
Stopping worker: 19389
Started at 19407
Stopping worker: 19390
Started at 19409

We can try to use autocannon again to verify that we don't have any considerable impact on the availability of our application during the restart of the workers.

pm2 (nodejsdp.link/pm2) is a small utility, based on cluster, which offers load balancing, process monitoring, zero-downtime restarts, and other goodies.
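
Without going into detail, a typical pm2 workflow looks roughly like the following (the exact commands and options may vary between versions, so check the pm2 documentation):

pm2 start app.js -i 4       # start 4 instances in cluster mode
pm2 list                    # show the managed processes
pm2 reload all              # zero-downtime reload of all instances
pm2 stop all                # stop all instances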

Dealing with stateful communications

The cluster module does not work well with stateful communications where the application state is not shared between the various instances. This is because different requests belonging to the same stateful session may potentially be handled by a different instance of the application. This is not a problem limited to the cluster module only; in general, it applies to any kind of stateless load balancing algorithm. Consider, for example, the situation described by Figure 12.3:

Figure 12.3: An example issue with a stateful application behind a load balancer

The user John initially sends a request to our application to authenticate himself, but the result of the operation is registered locally (for example, in memory), so only the instance of the application that receives the authentication request (Instance A) knows that John is successfully authenticated. When John sends a new request, the load balancer might forward it to a different instance of the application, which actually doesn't possess the authentication details of John, hence refusing to perform the operation. The application we just described cannot be scaled as it is, but luckily, there are two easy solutions we can apply to solve this problem.

Sharing the state across multiple instances

The first option we have to scale an application using stateful communications is sharing the state across all the instances.

This can be easily achieved with a shared datastore, such as, for example, a database like PostgreSQL (nodejsdp.link/postgresql), MongoDB (nodejsdp.link/mongodb), or CouchDB (nodejsdp.link/couchdb), or, even better, we can use an in-memory store such as Redis (nodejsdp.link/redis) or Memcached (nodejsdp.link/memcached).

Figure 12.4 outlines this simple and effective solution:

Figure 12.4: Application behind a load balancer using a shared data store

The only drawback of using a shared store for the communication state is that applying this pattern might require a significant amount of refactoring of the code base. For example, we might be using an existing library that keeps the communication state in memory, so we have to figure out how to configure, replace, or reimplement this library to use a shared store.
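
As a simple illustration, the following sketch keeps a per-session counter in Redis instead of in the local memory of each instance. It assumes version 4 (or later) of the redis package and a Redis server running locally, and it hardcodes the session ID for brevity instead of reading it from a cookie:

import { createServer } from 'http'
import { createClient } from 'redis'

async function main () {
  const redisClient = createClient()            // connects to localhost:6379 by default
  await redisClient.connect()

  const server = createServer(async (req, res) => {
    const sessionId = 'example-session'         // normally extracted from a cookie
    // Every instance reads and writes the same shared store, so it doesn't
    // matter which one the load balancer picks for this request
    const raw = await redisClient.get(`session:${sessionId}`)
    const session = raw ? JSON.parse(raw) : { hits: 0 }
    session.hits++
    await redisClient.set(`session:${sessionId}`, JSON.stringify(session))
    res.end(`This session has made ${session.hits} requests\n`)
  })

  server.listen(8080)
}

main().catch(console.error)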

In cases where refactoring might not be feasible, for instance, because of too many changes required or stringent time constraints in making the application more scalable, we can rely on a less invasive solution: sticky load balancing (or sticky sessions).

Sticky load balancing

The other alternative we have to support stateful communications is having the load balancer always routing all of the requests associated with a session to the same instance of the application. This technique is also called sticky load balancing.

Figure 12.5 illustrates a simplified scenario involving this technique:

Figure 12.5: An example illustrating how sticky load balancing works

As we can see from Figure 12.5, when the load balancer receives a request associated with a new session, it creates a mapping with one particular instance selected by the load balancing algorithm. The next time the load balancer receives a request from that same session, it bypasses the load balancing algorithm, selecting the application instance that was previously associated with the session. The particular technique we just described involves inspecting the session ID associated with the requests (usually included in a cookie by the application or the load balancer itself).

A simpler alternative to associate a stateful connection to a single server is by using the IP address of the client performing the request. Usually, the IP is provided to a hash function that generates an ID representing the application instance designated to receive the request. This technique has the advantage of not requiring the association to be remembered by the load balancer. However, it doesn't work well with devices that frequently change IP, for example, when roaming on different networks.
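
To clarify the idea, this is a sketch of how a load balancer could pick an instance from a (hypothetical) list by hashing the client's IP address, so that the same IP is always routed to the same instance without storing any mapping:

import { createHash } from 'crypto'

const instances = ['10.0.0.1:8080', '10.0.0.2:8080', '10.0.0.3:8080']

function selectInstance (clientIp) {
  // The same IP always hashes to the same index, so no mapping table is needed
  const hash = createHash('sha256').update(clientIp).digest()
  return instances[hash.readUInt32BE(0) % instances.length]
}

console.log(selectInstance('203.0.113.7'))      // always the same instance for this IP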

Sticky load balancing is not supported by default by the cluster module, but it can be added with an npm library called sticky-session (nodejsdp.link/sticky-session).

One big problem with sticky load balancing is the fact that it nullifies most of the advantages of having a redundant system, where all the instances of the application are the same, and where an instance can eventually replace another one that stopped working. For these reasons, it is recommended to avoid sticky load balancing whenever possible and to prefer building applications that maintain any session state in a shared store. Alternatively, where feasible, you can try to build applications that don't require stateful communications at all; for example, by including the state in the request itself.

For a real example of a library requiring sticky load balancing, we can mention Socket.IO (nodejsdp.link/socket-io).

Scaling with a reverse proxy

The cluster module, although very convenient and simple to use, is not the only option we have to scale a Node.js web application. Traditional techniques are often preferred because they offer more control and power in highly-available production environments.

The alternative to using cluster is to start multiple standalone instances of the same application running on different ports or machines, and then use a reverse proxy (or gateway) to provide access to those instances, distributing the traffic across them. In this configuration, we don't have a master process distributing requests to a set of workers, but a set of distinct processes running on the same machine (using different ports) or scattered across different machines inside a network. To provide a single access point to our application, we can use a reverse proxy, a special device or service placed between the clients and the instances of our application, which takes any request and forwards it to a destination server, returning the result to the client as if it was itself the origin. In this scenario, the reverse proxy is also used as a load balancer, distributing the requests among the instances of the application.

For a clear explanation of the differences between a reverse proxy and a forward proxy, you can refer to the Apache HTTP server documentation at nodejsdp.link/forward-reverse.

Figure 12.6 shows a typical multi-process, multi-machine configuration with a reverse proxy acting as a load balancer on the front:

Figure 12.6: A typical multi-process, multi-machine configuration with a reverse proxy acting as a load balancer

For a Node.js application, there are many reasons to choose this approach in place of the cluster module:

  • A reverse proxy can distribute the load across several machines, not just several processes.
  • The most popular reverse proxies on the market support sticky load balancing out of the box.
  • A reverse proxy can route a request to any available server, regardless of its programming language or platform.
  • We can choose more powerful load balancing algorithms.
  • Many reverse proxies offer additional powerful features such as URL rewrites, caching, SSL termination point, security features (for example, denial-of-service protection), or even the functionality of fully-fledged web servers that can be used to, for example, serve static files.

That said, the cluster module could also be easily combined with a reverse proxy if necessary, for example, by using cluster to scale vertically inside a single machine and then using the reverse proxy to scale horizontally across different nodes.

Pattern

Use a reverse proxy to balance the load of an application across multiple instances running on different ports or machines.

We have many options to implement a load balancer using a reverse proxy. The following is a list of the most popular solutions:

  • Nginx (nodejsdp.link/nginx): This is a web server, reverse proxy, and load balancer, built upon the non-blocking I/O model.
  • HAProxy (nodejsdp.link/haproxy): This is a fast load balancer for TCP/HTTP traffic.
  • Node.js-based proxies: There are many solutions for the implementation of reverse proxies and load balancers directly in Node.js. This might have advantages and disadvantages, as we will see later.
  • Cloud-based proxies: In the era of cloud computing, it's not rare to utilize a load balancer as a service. This can be convenient because it requires minimal maintenance, it's usually highly scalable, and sometimes it can support dynamic configurations to enable on-demand scalability.

In the next few sections of this chapter, we will analyze a sample configuration using Nginx. Later on, we will work on building our very own load balancer using nothing but Node.js!

Load balancing with Nginx

To give you an idea of how reverse proxies work, we will now build a scalable architecture based on Nginx, but first, we need to install it. We can do that by following the instructions at nodejsdp.link/nginx-install.

On the latest Ubuntu system, you can quickly install Nginx with the command sudo apt-get install nginx. On macOS, you can use brew (nodejsdp.link/brew): brew install nginx. Note that for the following examples, we will be using the latest version of Nginx available at the time of writing (1.17.10).

Since we are not going to use cluster to start multiple instances of our server, we need to slightly modify the code of our application so that we can specify the listening port using a command-line argument. This will allow us to launch multiple instances on different ports. Let's consider the main module of our example application (app.js):

import { createServer } from 'http'
const { pid } = process
const server = createServer((req, res) => {
  let i = 1e7; while (i > 0) { i-- }
  console.log(`Handling request from ${pid}`)
  res.end(`Hello from ${pid}\n`)
})
const port = Number.parseInt(
  process.env.PORT || process.argv[2]
) || 8080
server.listen(port, () => console.log(`Started at ${pid}`))

The only difference between this version and the first version of our web server is that here, we are making the port number configurable through the PORT environment variable or a command-line argument. This is needed because we want to be able to start multiple instances of the server and allow them to listen on different ports.

Another important feature that we won't have available without cluster is the automatic restart in case of a crash. Luckily, this is easy to fix by using a dedicated supervisor, that is, an external process that monitors our application and restarts it if necessary. The following are some possible choices:

  • forever or pm2, simple command-line tools specifically designed to monitor and restart Node.js applications
  • systemd or runit, general-purpose service managers available at the operating system level
  • monit or supervisord, more advanced process monitoring solutions
  • container orchestration platforms such as Kubernetes, Nomad, or Docker Swarm

For this example, we are going to use forever, which is the simplest and most immediate for us to use. We can install it globally by running the following command:

npm install forever -g

The next step is to start the four instances of our application, all on different ports and supervised by forever:

forever start app.js 8081
forever start app.js 8082
forever start app.js 8083
forever start app.js 8084

We can check the list of the started processes using the command:

forever list

You can use forever stopall to stop all the Node.js processes previously started with forever. Alternatively, you can use forever stop <id> to stop a specific process from the ones shown with forever list.

Now, it's time to configure the Nginx server as a load balancer.

First, we need to create a minimal configuration file in our working directory that we will call nginx.conf.

Note that, because Nginx allows you to run multiple applications behind the same server instance, it is more common to use a global configuration file, which, in Unix systems, is generally located under /usr/local/nginx/conf, /etc/nginx, or /usr/local/etc/nginx. Here, by having a configuration file in our working folder, we are taking a simpler approach. This is OK for the sake of this demo, since we want to run just one application locally, but we advise you to follow the recommended best practices for production deployments.

Next, let's write the nginx.conf file and apply the following configuration, which is the very minimum required to get a working load balancer for our Node.js processes:

daemon off;                                                ## (1)
error_log /dev/stderr info;                                ## (2)
events {                                                   ## (3)
  worker_connections 2048;
}
http {                                                     ## (4)
  access_log /dev/stdout;
  upstream my-load-balanced-app {
    server 127.0.0.1:8081;
    server 127.0.0.1:8082;
    server 127.0.0.1:8083;
    server 127.0.0.1:8084;
  }
  server {
    listen 8080;
    location / {
      proxy_pass http://my-load-balanced-app;
    }
  }
}

Let's discuss this configuration together:

  1. The declaration daemon off allows us to run Nginx as a standalone process using the current unprivileged user and by keeping the process running in the foreground of the current terminal (which allows us to shut it down using Ctrl + C).
  2. We use error_log (and later in the http block, access_log) to stream errors and access logs respectively to the standard output and standard error, so we can read the logs in real time straight from our terminal.
  3. The events block allows us to configure how network connections are managed by Nginx. Here, we are setting the maximum number of simultaneous connections that can be opened by an Nginx worker process to 2048.
  4. The http block allows us to define the configuration for a given application. In the upstream my-load-balanced-app section, we are defining the list of backend servers used to handle the network requests. In the server section, we use listen 8080 to instruct the server to listen on port 8080 and finally, we specify the proxy_pass directive, which essentially tells Nginx to forward any request to the server group we defined before (my-load-balanced-app).
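
Incidentally, the upstream block is also where we could choose a different load balancing algorithm or enable sticky load balancing. For example, Nginx supports the least_conn and ip_hash directives; the following variation (shown only as a reference) would route every client to the same backend based on its IP address:

upstream my-load-balanced-app {
  ip_hash;
  server 127.0.0.1:8081;
  server 127.0.0.1:8082;
  server 127.0.0.1:8083;
  server 127.0.0.1:8084;
}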

That's it! Now, we only need to start Nginx using our configuration file with the following command:

nginx -c ${PWD}/nginx.conf

Our system should now be up and running, ready to accept requests and balance the traffic across the four instances of our Node.js application. Simply point your browser to the address http://localhost:8080 to see how the traffic is balanced by our Nginx server. You can also try again to load test this application using autocannon. Since we are still running all the processes in one local machine, your results should not diverge much from what you got when benchmarking the version using the cluster module approach.

This example demonstrated how to use Nginx to load balance traffic. For simplicity, we kept everything locally on our machine, but nonetheless, this was a great exercise to get us ready to deploy an application on multiple remote servers. If you want to try to do that, you will essentially have to follow this recipe:

  1. Provision n backend servers running the Node.js application (running multiple instances with a service monitor like forever or by using the cluster module).
  2. Provision a load balancer machine that has Nginx installed and all the necessary configuration to route the traffic to the n backend servers. Every process in every server should be listed in the upstream block of your Nginx configuration file using the correct address of the various machines in the network.
  3. Make your load balancer publicly available on the internet by using a public IP and possibly a public domain name.
  4. Try to send some traffic to the load balancer's public address by using a browser or a benchmarking tool like autocannon.

For simplicity, you can perform all these steps manually by spinning servers through your cloud provider admin interface and by using SSH to log in to those. Alternatively, you could choose tools that allow you to automate these tasks by writing infrastructure as code such as Terraform (nodejsdp.link/terraform), Ansible (nodejsdp.link/ansible), and Packer (nodejsdp.link/packer).

In this example, we used a predefined number of backend servers. In the next section, we will explore a technique that allows us to load balance traffic to a dynamic set of backend servers.

Dynamic horizontal scaling

One important advantage of modern cloud-based infrastructure is the ability to dynamically adjust the capacity of an application based on the current or predicted traffic. This is also known as dynamic scaling. If implemented properly, this practice can reduce the cost of the IT infrastructure enormously while still keeping the application highly available and responsive.

The idea is simple: if our application is experiencing a performance degradation caused by a peak in traffic, the system automatically spawns new servers to cope with the increased load. Similarly, if we see that the allocated resources are underutilized, we can shut some servers down to reduce the cost of the running infrastructure. We could also decide to perform scaling operations based on a schedule; for instance, we could shut down some servers during certain hours of the day when we know that the traffic will be lighter, and restart them again just before the peak hours. These mechanisms require the load balancer to always be up-to-date with the current network topology, knowing at any time which server is up.

Using a service registry

A common pattern to solve this problem is to use a central repository called a service registry, which keeps track of the running servers and the services they provide.

Figure 12.7 shows a multiservice architecture with a load balancer on the front, dynamically configured using a service registry:

Figure 12.7: A multiservice architecture with a load balancer on the front, dynamically configured using a service registry

The architecture in Figure 12.7 assumes the presence of two services, API and WebApp. There can be one or many instances of each service, spread across multiple servers.

When a request to example.com is received, the load balancer checks the prefix of the request path. If the prefix is /api, the request is load balanced between the available instances of the API service. In Figure 12.7, we have two instances running on the server api1.example.com and one instance running on api2.example.com. For all the other path prefixes, the request is load balanced between the available instances of the WebApp service. In the diagram, we have only one WebApp instance, which is running on the server web1.example.com. The load balancer obtains the list of servers and service instances running on every server using the service registry.

For this to work in complete automation, each application instance has to register itself to the service registry the moment it comes up online and unregister itself when it stops. This way, the load balancer can always have an up-to-date view of the servers and the services available on the network.

Pattern (service registry)

Use a central repository to store an always up-to-date view of the servers and the services available in a system.

While this pattern is useful to load balance traffic, it has the added benefit of being able to decouple service instances from the servers on which they are running. We can look at the Service Registry pattern as an implementation of the Service Locator Design pattern applied to network services.

Implementing a dynamic load balancer with http-proxy and Consul

To support a dynamic network infrastructure, we can use a reverse proxy such as Nginx or HAProxy: all we need to do is update their configuration using an automated service and then force the load balancer to pick the changes. For Nginx, this can be done using the following command line:

nginx -s reload
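
To give an idea of what such an automated service might look like, here is a minimal, non-production sketch that periodically queries the service registry (using the same consul library we will introduce shortly), regenerates the upstream block, and asks Nginx to reload its configuration. It assumes that the main nginx.conf includes the generated upstream.conf file:

import { writeFileSync } from 'fs'
import { execSync } from 'child_process'
import consul from 'consul'

const consulClient = consul()

function refreshUpstreams () {
  consulClient.agent.service.list((err, services) => {
    if (err) { return console.error(err) }
    const servers = Object.values(services)
      .map(service => `  server ${service.Address}:${service.Port};`)
      .join('\n')
    const config = `upstream my-load-balanced-app {\n${servers}\n}\n`
    writeFileSync('upstream.conf', config)      // included by the main nginx.conf
    execSync('nginx -s reload')                 // make Nginx pick up the changes
  })
}

refreshUpstreams()
setInterval(refreshUpstreams, 10000)            // refresh every 10 seconds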

The same result can be achieved with a cloud-based solution, but we have a third and more familiar alternative that makes use of our favorite platform.

We all know that Node.js is a great tool for building any sort of network application and, as we said throughout this book, this is exactly one of its main design goals. So, why not build a load balancer using nothing but Node.js? This would give us much more freedom and power and would allow us to implement any sort of pattern or algorithm straight into our custom-built load balancer, including the one we are now going to explore: dynamic load balancing using a service registry. Furthermore, working on this exercise will definitely help us to understand even better how production-grade products such as Nginx and HAProxy actually work.

In this example, we are going to use Consul (nodejsdp.link/consul) as the service registry to replicate the multiservice architecture we saw in Figure 12.7. To do that, we are mainly going to use three npm packages:

  • http-proxy: a library that simplifies the creation of proxies and load balancers in Node.js
  • portfinder: a library that allows us to discover a free port in the current system
  • consul: a library that allows us to register and discover services using Consul

Let's start by implementing our services. These are simple HTTP servers like the ones we have used so far to test cluster and Nginx, but this time, we want each server to register itself into the service registry the moment it starts.

Let's see how this looks (file app.js):

import { createServer } from 'http'
import consul from 'consul'
import portfinder from 'portfinder'
import { nanoid } from 'nanoid'
const serviceType = process.argv[2]
const { pid } = process
async function main () {
  const consulClient = consul()
  const port = await portfinder.getPortPromise()            // (1)
  const address = process.env.ADDRESS || 'localhost'
  const serviceId = nanoid()
  function registerService () {                             // (2)
    consulClient.agent.service.register({
      id: serviceId,
      name: serviceType,
      address,
      port,
      tags: [serviceType]
    }, () => {
      console.log(`${serviceType} registered successfully`)
    })
  }
  function unregisterService (err) {                        // (3)
    err && console.error(err)
    console.log(`deregistering ${serviceId}`)
    consulClient.agent.service.deregister(serviceId, () => {
      process.exit(err ? 1 : 0)
    })
  }
  process.on('exit', unregisterService)                     // (4)
  process.on('uncaughtException', unregisterService)
  process.on('SIGINT', unregisterService)
  const server = createServer((req, res) => {               // (5)
    let i = 1e7; while (i > 0) { i-- }
    console.log(`Handling request from ${pid}`)
    res.end(`${serviceType} response from ${pid}\n`)
  })
  server.listen(port, address, () => {
    registerService()
    console.log(`Started ${serviceType} at ${pid} on port ${port}`)
  })
}
main().catch((err) => {
  console.error(err)
  process.exit(1)
})

In the preceding code, there are some parts that deserve our attention:

  1. First, we use portfinder.getPortPromise() to discover a free port in the system (by default, portfinder starts to search from port 8000). We also allow the user to configure the address based on the environment variable ADDRESS. Finally, we generate a random ID to identify this service using nanoid (nodejsdp.link/nanoid).
  2. Next, we declare the registerService() function, which uses the consul library to register a new service in the registry. The service definition needs several attributes: id (a unique identifier for the service), name (a generic name that identifies the service), address and port (to identify how to access the service), and tags (an optional array of tags that can be used to filter and group services). We are using serviceType (which we get from the command-line arguments) to specify the service name and to add a tag. This will allow us to identify all the services of the same type available in the cluster.
  3. At this point, we define a function called unregisterService(), which allows us to remove the service we just registered in Consul.
  4. We use unregisterService() as a cleanup function so that when the program is closed (either intentionally or by accident), the service is unregistered from Consul.
  5. Finally, we start the HTTP server for our service on the port discovered by portfinder and the address configured for the current service. Note that when the server is started, we make sure to invoke the registerService() function to make sure that the service is registered for discovery.

With this script, we will be able to start and register different types of applications.

Now, it's time to implement the load balancer. Let's do that by creating a new module called loadBalancer.js:

import { createServer } from 'http'
import httpProxy from 'http-proxy'
import consul from 'consul'
const routing = [                                            // (1)
  {
    path: '/api',
    service: 'api-service',
    index: 0
  },
  {
    path: '/',
    service: 'webapp-service',
    index: 0
  }
]
const consulClient = consul()                                // (2)
const proxy = httpProxy.createProxyServer()
const server = createServer((req, res) => {
  const route = routing.find((route) =>                      // (3)
    req.url.startsWith(route.path))
  consulClient.agent.service.list((err, services) => {       // (4)
    const servers = !err && Object.values(services)
      .filter(service => service.Tags.includes(route.service))
    if (err || !servers.length) {
      res.writeHead(502)
      return res.end('Bad gateway')
    }
    route.index = (route.index + 1) % servers.length         // (5)
    const server = servers[route.index]
    const target = `http://${server.Address}:${server.Port}`
    proxy.web(req, res, { target })
  })
})
server.listen(8080, () => {
  console.log('Load balancer started on port 8080')
})

This is how we implemented our Node.js-based load balancer:

  1. First, we define our load balancer routes. Each item in the routing array contains the service used to handle the requests arriving on the mapped path. The index property will be used to round-robin the requests of a given service.
  2. We need to instantiate a consul client so that we can have access to the registry. Next, we instantiate an http-proxy server.
  3. In the request handler of the server, the first thing we do is match the URL against our routing table. The result will be a descriptor containing the service name.
  4. We obtain from consul the list of servers implementing the required service. If this list is empty or there was an error retrieving it, then we return an error to the client. We use the Tags attribute to filter all the available services and find the address of the servers that implement the current service type.
  5. At last, we can route the request to its destination. We update route.index to point to the next server in the list, following a round-robin approach. We then use the index to select a server from the list, passing it to proxy.web(), along with the request (req) and the response (res) objects. This will simply forward the request to the server we chose.

It is now clear how simple it is to implement a load balancer using only Node.js and a service registry, as well as how much flexibility we can have by doing so.

Note that in order to keep the implementation simple, we intentionally left out some interesting optimization opportunities. For instance, in this implementation, we are interrogating consul to get the list of registered services for every single request. This is something that can add a significant overhead, especially if our load balancer receives requests with a high frequency. It would be more efficient to cache the list of services and refresh it on a regular basis (for instance, every 10 seconds). Another optimization could be to use the cluster module to run multiple instances of our load balancer and distribute the load across all the available cores in the machine.
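
To give a rough idea of the first optimization, here is a minimal sketch of a caching helper, imagined as a separate module (servicesCache.js) that loadBalancer.js could use in place of the direct consulClient.agent.service.list() call. The 10-second TTL is an arbitrary choice:

import consul from 'consul'

const consulClient = consul()
const CACHE_TTL = 10000                 // refresh at most every 10 seconds
let cachedServices = {}
let lastRefresh = 0

// Serves the Consul service list from a local cache, refreshing it only
// when the cached copy is older than CACHE_TTL
export function getServices (cb) {
  if (Date.now() - lastRefresh < CACHE_TTL) {
    return process.nextTick(() => cb(null, cachedServices))
  }
  consulClient.agent.service.list((err, services) => {
    if (err) {
      return cb(err)
    }
    cachedServices = services
    lastRefresh = Date.now()
    cb(null, services)
  })
}

The request handler of the load balancer would then call getServices() instead of consulClient.agent.service.list(), leaving the rest of its logic untouched.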

Now, we should be ready to give our system a try, but first, let's install the Consul server by following the official documentation at nodejsdp.link/consul-install.

This allows us to start the Consul service registry on our development machine with this simple command line:

consul agent -dev

Now, we are ready to start the load balancer (using forever to make sure the application is restarted in case of a crash):

forever start loadBalancer.js

Now, if we try to access some of the services exposed by the load balancer, we will notice that it returns an HTTP 502 error, because we haven't started any servers yet. Try it yourself:

curl localhost:8080/api

The preceding command should return the following output:

Bad gateway

The situation will change if we spawn some instances of our services, for example, two api-service and one webapp-service:

forever start --killSignal=SIGINT app.js api-service
forever start --killSignal=SIGINT app.js api-service
forever start --killSignal=SIGINT app.js webapp-service

Now, the load balancer should automatically see the new servers and start distributing requests across them. Let's try again with the following command:

curl localhost:8080/api

The preceding command should now return this:

api-service response from 6972

By running this again, we should now receive a message from another server, confirming that the requests are being distributed evenly among the different servers:

api-service response from 6979

If you want to see the instances managed by forever and stop some of them, you can use the forever list and forever stop commands. To stop all running instances, you can use forever stopall. Why don't you try stopping one of the running instances of the api-service to see what happens to the whole application?

The advantages of this pattern are immediate. We can now scale our infrastructure dynamically, on demand, or based on a schedule, and our load balancer will automatically adjust with the new configuration without any extra effort!

Consul offers a convenient web UI available at localhost:8500 by default. Check it out while playing with this example to see how services appear and disappear as they get registered or unregistered.

Consul also offers a health check feature to monitor registered services. This feature could be integrated within our example to make our infrastructure even more resilient to failures. In fact, if a service does not respond to a health check, it gets automatically removed from the registry and therefore, it won't receive traffic anymore. If you are curious to see how you can implement this feature, you can check out the official documentation for Checks at nodejsdp.link/consul-checks.
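
As a hint of what this could look like, the following sketch extends the service registration we performed in app.js with an HTTP check. The /health endpoint is an assumption that our sample service would have to implement (answering with a 2xx status code when healthy), and the exact option names and calling convention should be verified against the version of the consul module you install:

// A sketch: registering a service together with an HTTP health check.
// The id, serviceType, address, and port variables mirror the attributes
// we discussed when registering the service in app.js; the /health route
// is hypothetical and must be implemented by the service itself.
consulClient.agent.service.register({
  id,
  name: serviceType,
  address,
  port,
  tags: [serviceType],
  check: {
    http: `http://${address}:${port}/health`,
    interval: '10s'
  }
}, (err) => {
  if (err) {
    console.error('Failed to register the service', err)
  }
})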

Now that we know how to perform dynamic load balancing using a load balancer and a service registry, we are ready to explore some interesting alternative approaches, like peer-to-peer load balancing.

Peer-to-peer load balancing

Using a reverse proxy is almost a necessity when we want to expose a complex internal network architecture to a public network such as the Internet. It helps hide the complexity, providing a single access point that external applications can easily use and rely on. However, if we need to scale a service that is for internal use only, we can have much more flexibility and control.

Let's imagine having a service, Service A, that relies on Service B to implement its functionality. Service B is scaled across multiple machines and it's available only in the internal network. What we have learned so far is that Service A will connect to Service B using a load balancer, which will distribute the traffic to all the servers implementing Service B.

However, there is an alternative. We can remove the load balancer from the picture and distribute the requests directly from the client (Service A), which now becomes directly responsible for load balancing its requests across the various instances of Service B. This is possible only if Service A knows the details about the servers exposing Service B, and in an internal network, this is usually known information. With this approach, we are essentially implementing peer-to-peer load balancing.

Figure 12.8 compares the two alternatives we just described:

Figure 12.8: Centralized load balancing versus peer-to-peer load balancing

This is an extremely simple and effective pattern that enables truly distributed communications without bottlenecks or single points of failure. Besides that, it also has the following properties:

  • Reduces the infrastructure complexity by removing a network node
  • Allows faster communications because messages will travel through one fewer node
  • Scales better because performance is not limited by what the load balancer can handle

On the other hand, by removing the load balancer, we are actually exposing the complexity of its underlying infrastructure. Also, each client has to be smarter by implementing a load balancing algorithm and, possibly, also a way to keep its knowledge of the infrastructure up to date.

Peer-to-peer load balancing is a pattern used extensively in the ZeroMQ (nodejsdp.link/zeromq) library, which we will use in the next chapter.

In the next section, we will showcase an example implementing peer-to-peer load balancing in an HTTP client.

Implementing an HTTP client that can balance requests across multiple servers

We already know how to implement a load balancer using only Node.js and distribute incoming requests across the available servers, so implementing the same mechanism on the client side should not be that different. All we have to do, in fact, is wrap the client API and augment it with a load balancing mechanism. Take a look at the following module (balancedRequest.js):

import { request } from 'http'
import getStream from 'get-stream'
const servers = [
  { host: 'localhost', port: 8081 },
  { host: 'localhost', port: 8082 }
]
let i = 0
export function balancedRequest (options) {
  return new Promise((resolve) => {
    i = (i + 1) % servers.length
    options.hostname = servers[i].host
    options.port = servers[i].port
    request(options, (response) => {
      resolve(getStream(response))
    }).end()
  })
}

The preceding code is very simple and needs little explanation. We wrapped the original http.request API so that it overrides the hostname and port of the request with those selected from the list of available servers using a round-robin algorithm. Note that, for simplicity, we used the module get-stream (nodejsdp.link/get-stream) to "accumulate" the response stream into a buffer that will contain the full response body.

The new wrapped API can then be used seamlessly (client.js):

import { balancedRequest } from './balancedRequest.js'
async function main () {
  for (let i = 0; i < 10; i++) {
    const body = await balancedRequest({
      method: 'GET',
      path: '/'
    })
    console.log(`Request ${i} completed:`, body)
  }
}
main().catch((err) => {
  console.error(err)
  process.exit(1)
})

To run the preceding code, we have to start two instances of the sample server provided:

node app.js 8081
node app.js 8082
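
If you don't have the sample server provided with the book's code at hand, a minimal stand-in that accepts the listening port as a command-line argument could look like this (app.js):

import { createServer } from 'http'

// Minimal stand-in for the sample server: it takes the listening port as
// the first command-line argument and reports it in the response body
const port = Number.parseInt(process.argv[2], 10) || 8080
const server = createServer((req, res) => {
  res.end(`Response from server on port ${port} (pid: ${process.pid})\n`)
})
server.listen(port, () => console.log(`Server listening on port ${port}`))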

This is followed by the client application we just built:

node client.js

Running the client, we should see that each request is sent to a different server, confirming that we are now able to balance the load without a dedicated load balancer!

An obvious improvement to the wrapper we created previously would be to integrate a service registry directly into the client and obtain the server list dynamically.

In the next section, we will explore the field of containers and container orchestration and see how, in this specific context, the runtime takes ownership of many scalability concerns.

Scaling applications using containers

In this section, we will demonstrate how using containers and container orchestration platforms, such as Kubernetes, can help us to write simpler Node.js applications that can delegate most of the scaling concerns like load balancing, elastic scaling, and high availability to the underlying container platform.

Containers and container orchestration platforms constitute a quite broad topic, largely outside the scope of this book. For this reason, here, we aim to provide only some basic examples to get you started with this technology using Node.js. Ultimately, our goal is to encourage you to explore new modern patterns in order to run and scale Node.js applications.

What is a container?

A container, specifically a Linux container, as standardized by the Open Container Initiative (OCI) (nodejsdp.link/opencontainers), is defined as "a standard unit of software that packages up code and all its dependencies so the application runs quickly and reliably from one computing environment to another."

In other words, by using containers, you can seamlessly package and run applications on different machines, from a local development laptop on your desk to a production server in the cloud.

Other than being extremely portable, applications running as containers have the advantage of having very little overhead when executed. In fact, containers run almost as fast as the same application would run natively on the operating system.

In simple terms, you can see a container as a standard unit of software that allows you to define and run an isolated process directly on a Linux operating system.

For their portability and performance, containers are considered a huge step forward when compared to virtual machines.

There are different ways and tools to create and run an OCI compliant container for an application. The most popular of them is Docker (nodejsdp.link/docker).

You can install Docker in your system by following the instructions for your operating system on the official documentation: nodejsdp.link/docker-docs.

Creating and running a container with Docker

Let's rewrite our simple web server application with some minor changes (app.js):

import { createServer } from 'http'
import { hostname } from 'os'
const version = 1
const server = createServer((req, res) => {
  let i = 1e7; while (i > 0) { i-- }
  res.end(`Hello from ${hostname()} (v${version})`)
})
server.listen(8080)

Compared to the previous versions of this web server, here, we send the machine hostname and the application version back to the user. If you run this server and make a request, you should get back something like this:

Hello from my-amazing-laptop.local (v1)

Let's see how we can run this application as a container. The first thing we need to do is create a package.json file for the project:

{
  "name": "my-simple-app",
  "version": "1.0.0",
  "main": "app.js",
  "type": "module",
  "scripts": {
    "start": "node app.js"
  }
}

In order to dockerize our application, we need to follow a two-step process:

  • Build a container image
  • Run a container instance from the image

To create the container image for our application, we have to define a Dockerfile. A container image (or Docker image) is the actual package and conforms to the OCI standard. It contains all the source code and the necessary dependencies and describes how the application must be executed. A Dockerfile is a file (actually named Dockerfile) that defines the build script used to build a container image for an application. So, without further ado, let's write the Dockerfile for our application:

FROM node:14-alpine
EXPOSE 8080
COPY app.js package.json /app/
WORKDIR /app
CMD ["npm", "start"]

Our Dockerfile is quite short, but there are a lot of interesting things here, so let's discuss them one by one:

  • FROM node:14-alpine indicates the base image that we want to use. A base image allows us to build "on top" of an existing image. In this specific case, we are starting from an image that already contains version 14 of Node.js. This means we don't have to be worried about describing how Node.js needs to be packaged into the container image.
  • EXPOSE 8080 informs Docker that the application will be listening for TCP connections on the port 8080.
  • COPY app.js package.json /app/ copies the files app.js and package.json into the /app folder of the container filesystem. Containers are isolated, so, by default, they can't share files with the host operating system; therefore, we need to copy the project files into the container to be able to access and execute them.
  • WORKDIR /app sets the working directory for the container to /app.
  • CMD ["npm", "start"] specifies the command that is executed to start the application when we run a container from an image. Here, we are just running npm start, which, in turn, will run node app.js, as specified in our package.json. Remember that we are able to run both node and npm in the container only because those two executables are made available through the base image.

Now, we can use the Dockerfile to build the container image with the following command:

docker build .

This command will look for a Dockerfile in the current working directory and execute it to build our image.

The output of this command should be something like this:

Sending build context to Docker daemon  7.168kB
Step 1/5 : FROM node:14-alpine
 ---> ea308280893e
Step 2/5 : EXPOSE 8080
 ---> Running in 61c34f4064ab
Removing intermediate container 61c34f4064ab
 ---> 6abfcdf0e750
Step 3/5 : COPY app.js package.json /app/
 ---> 9d498d7dbf8b
Step 4/5 : WORKDIR /app
 ---> Running in 70ea26158cbe
Removing intermediate container 70ea26158cbe
 ---> fc075a421b91
Step 5/5 : CMD ["npm", "start"]
 ---> Running in 3642a01224e8
Removing intermediate container 3642a01224e8
 ---> bb3bd34bac55
Successfully built bb3bd34bac55

Note that if you have never used the node:14-alpine image before (or if you have recently wiped your Docker cache), you will also see some additional output, indicating the download of this container image.

The final hash is the ID of our container image. We can use it to run an instance of the container with the following command:

docker run -it -p 8080:8080 bb3bd34bac55

This command is essentially telling Docker to run the application from image bb3bd34bac55 in "interactive mode" (which means that it will not run in the background) and that port 8080 of the container will be mapped to port 8080 of the host machine (our operating system).

Now, we can access the application at localhost:8080. So, if we use curl to send a request to the web server, we should get a response similar to the following:

Hello from f2ffa85c8ff8 (v1)

Note that the hostname is now different. This is because every container is running in a sandboxed environment that, by default, doesn't have access to most of the resources in the underlying operating system.

At this point, you can stop the container by just pressing Ctrl + C in the terminal window where the container is running.

When building an image, we can use the -t flag to tag the resulting image. A tag can be used as a more predictable alternative to a generated hash to identify and run container images. For instance, if we want to call our container image hello-web:v1, we can use the following commands:

docker build -t hello-web:v1 .
docker run -it -p 8080:8080 hello-web:v1

When using tags, you might want to follow the conventional format of image-name:version.

What is Kubernetes?

We just ran a Node.js application using containers, hooray! Even though this seems like a particularly exciting achievement, we have just scratched the surface here. The real power of containers comes out when building more complicated applications, for instance, applications composed of multiple independent services that need to be deployed and coordinated across multiple cloud servers. In this situation, Docker alone is not sufficient anymore. We need a more complex system that allows us to orchestrate all the running container instances over the available machines in our cloud cluster: we need a container orchestration tool.

A container orchestration tool has a number of responsibilities:

  • It allows us to join multiple cloud servers (nodes) into one logical cluster, where nodes can be added and removed dynamically without affecting the availability of the services running on every node.
  • It makes sure that there is no downtime. If a container instance stops or becomes unresponsive to health checks, it will be automatically restarted. Also, if a node in the cluster fails, the workload running in that node will be automatically migrated to another node.
  • It provides functionalities to implement service discovery and load balancing.
  • It provides orchestrated access to durable storage so that data can be persisted as needed.
  • It automates rollouts and rollbacks of applications with zero downtime.
  • It provides secret storage for sensitive data and configuration management systems.

One of the most popular container orchestration systems is Kubernetes (nodejsdp.link/kubernetes), originally open sourced by Google in 2014. The name Kubernetes originates from the Greek κυβερνήτης, meaning "helmsman" or "pilot", but also "governor" or, more generically, "the one in command". Kubernetes incorporates years of experience from Google engineers running workloads in the cloud at scale.

One of its peculiarities is the declarative configuration system that allows you to define an "end state" and let the orchestrator figure out the sequence of steps necessary to reach the desired state, without disrupting the stability of the services running on the cluster.

The whole idea of Kubernetes configuration revolves around the concept of "objects". An object is an element in your cloud deployment, which can be added, removed, and have its configuration changed over time. Some good examples of Kubernetes objects are:

  • Containerized applications
  • Resources for the containers (CPU and memory allocations, persistent storage, access to devices such as network interfaces or GPU, and so on)
  • Policies for the application behavior (restart policies, upgrades, fault-tolerance)

A Kubernetes object is a sort of "record of intent", which means that once you create one in a cluster, Kubernetes will constantly monitor (and change, if needed) the state of the object to make sure it stays compliant with the defined expectation.

A Kubernetes cluster is generally managed through a command-line tool called kubectl (nodejsdp.link/kubectl-install).

There are several ways to create a Kubernetes cluster for development, testing, and production purposes. The easiest way to start experimenting with Kubernetes is through a local single-node cluster, which can be easily created by a tool called minikube (nodejsdp.link/minikube-install).

Make sure to install both kubectl and minikube on your system, as we will be deploying our sample containerized app on a local Kubernetes cluster in the next section!

Another great way to learn about Kubernetes is by using the official interactive tutorials (nodejsdp.link/kubernetes-tutorials).

Deploying and scaling an application on Kubernetes

In this section, we will be running our simple web server application on a local minikube cluster. So, make sure you have kubectl and minikube correctly installed and started.

On macOS and Linux environments, make sure to run minikube start and eval $(minikube docker-env) to initialize the working environment. The second command makes sure that when you use docker and kubectl in your current terminal you will interact with the local Minikube cluster. If you open multiple terminals you should run eval $(minikube docker-env) on every terminal. You can also run minikube dashboard to run a convenient web dashboard that allows you to visualize and interact with all the objects in your cluster.

The first thing that we want to do is build our Docker image and give it a meaningful name:

docker build -t hello-web:v1 .

If you have configured your environment correctly, the hello-web image will be available to be used in your local Kubernetes cluster.

Using local images is sufficient for local development. When you are ready to go to production, the best option is to publish your images to a Docker container registry such as Docker Hub (nodejsdp.link/docker-hub), Docker Registry (nodejsdp.link/docker-registry), Google Cloud Container Registry (nodejsdp.link/gc-container-registry), or Amazon Elastic Container Registry (nodejsdp.link/ecr). Once you have your images published to a container registry, you can easily deploy your application to different hosts without having to rebuild the corresponding images each time.

Creating a Kubernetes deployment

Now, in order to run an instance of this container in the Minikube cluster, we have to create a deployment (which is a Kubernetes object) using the following command:

kubectl create deployment hello-web --image=hello-web:v1

This should produce the following output:

deployment.apps/hello-web created

This command is basically telling Kubernetes to run an instance of the hello-web:v1 container as an application called hello-web.

You can verify that the deployment is running with the following command:

kubectl get deployments

This should print something like this:

NAME        READY   UP-TO-DATE   AVAILABLE   AGE
hello-web   1/1     1            1           7s

This table is basically saying that our hello-web deployment is alive and that there is one pod allocated for it. A pod is a basic unit in Kubernetes and represents a set of containers that have to run together in the same Kubernetes node. Containers in the same pod have shared resources like storage and network. Generally, a pod contains only one container, but it's not uncommon to see more than one container in a pod when these containers are running tightly coupled applications.

You can list all the pods running in the cluster with:

kubectl get pods

This should print something like:

NAME                         READY   STATUS    RESTARTS   AGE
hello-web-65f47d9997-df7nr   1/1     Running   0          2m19s

Now, in order to be able to access the web server from our local machine, we need to expose the deployment:

kubectl expose deployment hello-web --type=LoadBalancer --port=8080
minikube service hello-web

The first command tells Kubernetes to create a LoadBalancer object that exposes the instances of the hello-web app, connecting to port 8080 of every container.

The second command is a minikube helper command that allows us to get the local address to access the load balancer. This command will also open a browser window for you, so now you should see the container response in the browser, which should look like this:

Hello from hello-web-65f47d9997-df7nr (v1)

Scaling a Kubernetes deployment

Now that our application is running and is accessible, let's actually start to experiment with some of the capabilities of Kubernetes. For instance, why not try to scale our application by running five instances instead of just one? This is as easy as running:

kubectl scale --replicas=5 deployment hello-web

Now, kubectl get deployments should show us the following status:

NAME        READY   UP-TO-DATE   AVAILABLE   AGE
hello-web   5/5     5            5           9m18s

And kubectl get pods should produce something like this:

NAME                         READY   STATUS    RESTARTS   AGE
hello-web-65f47d9997-df7nr   1/1     Running   0          9m24s
hello-web-65f47d9997-g98jb   1/1     Running   0          14s
hello-web-65f47d9997-hbdkx   1/1     Running   0          14s
hello-web-65f47d9997-jnfd7   1/1     Running   0          14s
hello-web-65f47d9997-s54g6   1/1     Running   0          14s

If you try to hit the load balancer now, chances are you will see different hostnames as the traffic gets distributed across the available instances. This should be even more apparent if you try to hit the load balancer while putting the application under stress, for instance, by running an autocannon load test against the load balancer URL.

Kubernetes rollouts

Now, let's try out another feature of Kubernetes: rollouts. What if we want to release a new version of our app?

We can set const version = 2 in our app.js file and create a new image:

docker build -t hello-web:v2 .

At this point, in order to upgrade all the running pods to this new version, we have to run the following command:

kubectl set image deployment/hello-web hello-web=hello-web:v2 --record

The output of this command should be as follows:

deployment.apps/hello-web image updated

If everything worked as expected, you should now be able to refresh your browser page and see something like the following:

Hello from hello-web-567b986bfb-qjvfw (v2)

Note the v2 flag there.

What just happened behind the scenes is that Kubernetes started to roll out the new version of our image by replacing the containers one by one. When a container is replaced, the running instance is stopped gracefully. This way requests that are currently in progress can be completed before the container is shut down.
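
While Kubernetes takes care of the orchestration, our application can cooperate with the rollout by reacting to the SIGTERM signal sent to the container before it is terminated. The following is only a sketch of what this could look like for our app.js; note that, for the signal to actually reach the Node.js process, it is generally preferable to start node directly (for example, CMD ["node", "app.js"]) rather than going through npm:

import { createServer } from 'http'
import { hostname } from 'os'

const version = 2
const server = createServer((req, res) => {
  res.end(`Hello from ${hostname()} (v${version})`)
})
server.listen(8080)

// On SIGTERM (sent by Kubernetes before terminating the container), stop
// accepting new connections and exit once in-flight requests are served
process.on('SIGTERM', () => {
  console.log('SIGTERM received, closing the server gracefully')
  server.close(() => {
    process.exit(0)
  })
})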

This completes our mini Kubernetes tutorial. The lesson here is that, when using a container orchestration platform like Kubernetes, we can keep our application code quite simple, as we don't have to include concerns such as scaling to multiple instances or dealing with soft rollouts and application restarts. This is the major advantage of this approach.

Of course, this simplicity does not come for free. It is paid by having to learn and manage the orchestration platform. If you are running small applications in production, it is probably not worth incurring the complexity and cost of installing and managing a container orchestration platform like Kubernetes. However, if you are serving millions of users every day, there is definitely a lot of value in building and maintaining such a powerful infrastructure.

Another interesting observation is that, when running containers in Kubernetes, containers are often considered "disposable," which basically means that they could be killed and restarted at any time. While this might seem like an irrelevant detail, you should actually take this behavior into account and try to keep your applications as stateless as possible. In fact, containers, by default, won't retain any change in the local filesystem, so every time you have to store some persistent information, you will have to rely on external storage mechanisms such as databases or persistent volumes.

If you want to clean up your system from the containers you just ran in the preceding examples and stop minikube, you can do so with the following commands:

kubectl scale --replicas=0 deployment hello-web
kubectl delete -n default service hello-web
minikube stop

In the next and last part of this chapter, we will explore some interesting patterns to decompose a monolithic application into a set of decoupled microservices, something that is critically important if you have built a monolithic application and are now suffering from scalability issues.

Decomposing complex applications

So far in this chapter, we have mainly focused our analysis on the X-axis of the scale cube. We saw how it represents the easiest and most immediate way to distribute the load and scale an application, also improving its availability. In the following section, we are going to focus on the Y-axis of the scale cube, where applications are scaled by decomposing them by functionality and service. As we will learn, this technique allows us to scale not only the capacity of an application, but also, and most importantly, its complexity.

Monolithic architecture

The term monolithic might make us think of a system without modularity, where all the services of an application are interconnected and almost indistinguishable. However, this is not always the case. Often, monolithic systems have a highly modular architecture and a good level of decoupling between their internal components.

A perfect example is the Linux OS kernel, which is part of a category called monolithic kernels (in perfect opposition with its ecosystem and the Unix philosophy). Linux has thousands of services and modules that we can load and unload dynamically, even while the system is running. However, they all run in kernel mode, which means that a failure in any of them could bring the entire OS down (have you ever seen a kernel panic?). This approach is opposite to the microkernel architecture, where only the core services of the operating system run in kernel mode, while the rest run in user mode, usually each one with its own process. The main advantage of this approach is that a problem in any of these services would more likely cause it to crash in isolation, instead of affecting the stability of the entire system.

The Torvalds-Tanenbaum debate on kernel design is probably one of the most famous flame wars in the history of computer science, where one of the main points of dispute was exactly monolithic versus microkernel design. You can find a web version of the discussion (it originally appeared on Usenet) at nodejsdp.link/torvalds-tanenbaum.

It's remarkable how these design principles, which are more than 30 years old, can still be applied today and in totally different environments. Modern monolithic applications are comparable to monolithic kernels: if any of their components fail, the entire system is affected, which, translated into Node.js terms, means that all the services are part of the same code base and run in a single process (when not cloned).

Figure 12.9 shows an example monolithic architecture:

Figure 12.9: Example of a monolithic architecture

Figure 12.9 shows the architecture of a typical e-commerce application. Its structure is modular: we have two different frontends, one for the main store and another for the administration interface. Internally, we have a clear separation of the services implemented by the application. Each service is responsible for a specific portion of the application business logic: Products, Cart, Checkout, Search, and Authentication and Users. However, the preceding architecture is monolithic since every module is part of the same codebase and runs as part of a single application. A failure in any of its components can potentially tear down the entire online store.

Another problem with this type of architecture is the interconnection between its modules; the fact that they all live inside the same application makes it very easy for a developer to build interactions and coupling between modules. For example, consider the use case of when a product is being purchased: the Checkout module has to update the availability of a Product object, and if those two modules are in the same application, it's too easy for a developer to just obtain a reference to a Product object and update its availability directly. Maintaining a low coupling between internal modules is very hard in a monolithic application, partly because the boundaries between them are not always clear or properly enforced.

High coupling is often one of the main obstacles to the growth of an application and prevents its scalability in terms of complexity. In fact, an intricate dependency graph means that every part of the system is a liability; it has to be maintained for the entire life of the product, and any change should be carefully evaluated because every component is like a wooden block in a Jenga tower: moving or removing one of them can cause the entire tower to collapse. This often results in building conventions and development processes to cope with the increasing complexity of the project.

The microservice architecture

Now, we are going to reveal the most important pattern in Node.js for writing big applications: avoid writing big applications. This seems like a trivial statement, but it's an incredibly effective strategy to scale both the complexity and the capacity of a software system. So, what's the alternative to writing big applications? The answer is in the Y-axis of the scale cube: decomposition and splitting by service and functionality. The idea is to break down an application into its essential components, creating separate, independent applications. It is practically the opposite of a monolithic architecture. This fits perfectly with the Unix philosophy and the Node.js principles we discussed at the beginning of the book; in particular, the motto "make each program do one thing well."

Microservice architecture is, today, the main reference pattern for this type of approach, where a set of self-sufficient services replace big monolithic applications. The prefix "micro" means that the services should be as small as possible, but always within reasonable limits. Don't be misled by thinking that creating an architecture with a hundred different applications exposing only one web service is necessarily a good choice. In reality, there is no strict rule on how small or big a service should be. It's not the size that matters in the design of a microservice architecture; instead, it's a combination of different factors, mainly loose coupling, high cohesion, and integration complexity.

An example of a microservice architecture

Let's now see what the monolithic e-commerce application would look like using a microservice architecture:

Figure 12.10: An example implementation of an e-commerce system using the Microservice pattern

As we can see from Figure 12.10, each fundamental component of the e-commerce application is now a self-sustaining and independent entity, living in its own context, with its own database. In practice, they are all independent applications exposing a set of related services.

The data ownership of a service is an important characteristic of the microservice architecture. This is why the database also has to be split to maintain the proper level of isolation and independence. If a unique shared database is used, it would become much easier for the services to work together; however, this would also introduce a coupling between the services (based on data), nullifying some of the advantages of having different applications.

The dashed lines connecting all the nodes tell us that, in some way, they have to communicate and exchange information for the entire system to be fully functional. As the services do not share the same database, there is more communication involved to maintain the consistency of the whole system. For example, the Checkout service needs to know some information about Products, such as the price and restrictions on shipping, and at the same time, it needs to update the data stored in the Products service, such as the product's availability when the checkout is complete. In Figure 12.10, we tried to represent the way the nodes communicate in a generic way. Surely, the most popular strategy is using web services, but as we will see later, this is not the only option.

Pattern (microservice architecture)

Split a complex application by creating several small, self-contained services.

Microservices – advantages and disadvantages

In this section, we are going to highlight some of the advantages and disadvantages of implementing a microservice architecture. As we will see, this approach promises to bring a radical change in the way we develop our applications, revolutionizing the way we see scalability and complexity, but on the other hand, it introduces new nontrivial challenges.

Martin Fowler wrote a great article about microservices that you can find at nodejsdp.link/microservices.

Every service is expendable

The main technical advantage of having each service living in its own application context is that crashes do not propagate to the entire system. The goal is to build truly independent services that are smaller, easier to change, or can even be rebuilt from scratch. If, for example, the Checkout service of our e-commerce application suddenly crashes because of a serious bug, the rest of the system would continue to work as normal. Some functionality may be affected; for example, the ability to purchase a product, but the rest of the system would continue to work.

Also, imagine if we suddenly realized that the database or the programming language we used to implement a component was not a good design decision. In a monolithic application, there would be very little we could do to change things without affecting the entire system. Instead, in a microservice architecture, we could more easily reimplement the entire service from scratch, using a different database or platform, and the rest of the system would not even notice it, as long as the new implementation maintains the same interface to the rest of the system.

Reusability across platforms and languages

Splitting a big monolithic application into many small services allows us to create independent units that can be reused much more easily. Elasticsearch (nodejsdp.link/elasticsearch) is a great example of a reusable search service. ORY (nodejsdp.link/ory) is another example of a reusable open source technology that provides a complete authentication and authorization service that can be easily integrated into a microservice architecture.

The main advantage of the microservice approach is that the level of information hiding is usually much higher compared to monolithic applications. This is possible because the interactions usually happen through a remote interface such as a web API or a message broker, which makes it much easier to hide implementation details and shield the client from changes in the way the service is implemented or deployed. For example, if all we have to do is invoke a web service, we are shielded from the way the infrastructure behind is scaled, from what programming language it uses, from what database it uses to store its data, and so on. All these decisions can be revisited and adjusted as needed, with potentially no impact on the rest of the system.

A way to scale the application

Going back to the scale cube, it's clear that microservices are equivalent to scaling an application along the Y-axis, so it's already a solution for distributing the load across multiple machines. Also, we should not forget that we can combine microservices with the other two dimensions of the cube to scale the application even further. For example, each service could be cloned to handle more traffic, and the interesting aspect is that they can be scaled independently, allowing better resource management.

At this point, it would look like microservices are the solution to all our problems. However, this is far from being true. Let's see the challenges we face using microservices.

The challenges of microservices

Having more nodes to manage introduces a higher complexity in terms of integration, deployment, and code sharing: it fixes some of the pains of traditional architectures, but it also opens up many new questions. How do we make the services interact? How can we sanely deploy, scale, and monitor such a high number of applications? How can we share and reuse code between services?

Fortunately, cloud services and modern DevOps methodologies can provide some answers to those questions, and also, using Node.js can help a lot. Its module system is a perfect companion to share code between different projects. Node.js was made to be a node in a distributed system such as those of a microservice architecture.

In the following sections, we will introduce some integration patterns that can help with managing and integrating services in a microservice architecture.

Integration patterns in a microservice architecture

One of the toughest challenges of microservices is connecting all the nodes to make them collaborate. For example, the Cart service of our e-commerce application would make little sense without some Products to add, and the Checkout service would be useless without a list of products to buy (a cart). As we already mentioned, there are also other factors that necessitate an interaction between the various services. For example, the Search service has to know which Products are available and must also ensure it keeps its information up to date. The same can be said about the Checkout service, which has to update the information about Product availability when a purchase is completed.

When designing an integration strategy, it's also important to consider the coupling that it's going to introduce between the services in the system. We should not forget that designing a distributed architecture involves the same practices and principles we use locally when designing a module or subsystem. Therefore, we also need to take into consideration properties such as the reusability and extensibility of the service.

The API proxy

The first pattern we are going to show makes use of an API proxy (also commonly identified as an API gateway), a server that proxies the communications between a client and a set of remote APIs. In a microservice architecture, its main purpose is to provide a single access point for multiple API endpoints, but it can also offer load balancing, caching, authentication, and traffic limiting, all of which are features that prove to be very useful to implement a solid API solution.

This pattern should not be new to us since we already saw it in action in this chapter when we built the custom load balancer with http-proxy and consul. For that example, our load balancer was exposing only two services, and then, thanks to a service registry, it was able to map a URL path to a service and hence to a list of servers. An API proxy works in the same way; it is essentially a reverse proxy and often also a load balancer, specifically configured to handle API requests. Figure 12.11 shows how we can apply such a solution to our e-commerce application:

Figure 12.11: Using the API Proxy pattern in an e-commerce application

From the preceding diagram, it should be clear how an API proxy can hide the complexity of its underlying infrastructure. This is really handy in a microservice infrastructure, as the number of nodes may be high, especially if each service is scaled across multiple machines. The integration achieved by an API proxy is therefore only structural since there is no semantic mechanism. It simply provides a familiar monolithic view of a complex microservice infrastructure.

Since the API Proxy pattern essentially abstracts the complexity of connecting to all the different APIs in the system, it might also allow for some freedom to restructure the various services. Maybe, as your requirements change, you will need to split an existing microservice into two or more decoupled microservices or, conversely, you might realize that, in your business context, it's better to join two or more services together. In both cases, the API Proxy pattern will allow you to make all the necessary changes with potentially no impact on the upstream systems accessing the data through the proxy.

The ability to enable incremental change in an architecture over time is a very important characteristic in modern distributed systems. If you are interested in studying this broad subject in greater depth, we recommend the book Building Evolutionary Architectures: nodejsdp.link/evolutionary-architectures.

API orchestration

The pattern we are going to describe next is probably the most natural and explicit way to integrate and compose a set of services, and it's called API orchestration. Daniel Jacobson, VP of Engineering for the Netflix API, in one of his blog posts (nodejsdp.link/orchestration-layer), defines API orchestration as follows:

"An API Orchestration Layer (OL) is an abstraction layer that takes generically-modeled data elements and/or features and prepares them in a more specific way for a targeted developer or application."

The "generically modeled elements and/or features" fit the description of a service in a microservice architecture perfectly. The idea is to create an abstraction to connect those bits and pieces to implement new services specific to a particular application.

Let's see an example using the e-commerce application. Refer to Figure 12.12:

Figure 12.12: An example usage of an orchestration layer to interact with multiple microservices

Figure 12.12 shows how the Store frontend application uses an orchestration layer to build more complex and specific features by composing and orchestrating existing services. The described scenario takes, as an example, a hypothetical completeCheckout() service that is invoked the moment a customer clicks the Pay button at the end of the checkout.

The figure shows how completeCheckout() is a composite operation made of three different steps:

  1. First, we complete the transaction by invoking checkoutService/pay.
  2. Then, when the payment is successfully processed, we need to tell the Cart service that the items were purchased and that they can be removed from the cart. We do that by invoking cartService/delete.
  3. Also, when the payment is complete, we need to update the availability of the products that were just purchased. This is done through productsService/update.

As we can see, we took three operations from three different services and we built a new API that coordinates the services to maintain the entire system in a consistent state.
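
To make the idea more concrete, here is a rough sketch of how the orchestration layer could implement completeCheckout(). The postJSON() helper and the three service URLs are hypothetical: in a real system, they would be replaced by the actual HTTP client and the actual endpoints exposed by the Checkout, Cart, and Products services:

import { request } from 'http'

// Hypothetical helper: POST a JSON body to the given URL and resolve
// with the parsed JSON response (error handling kept to a minimum)
function postJSON (url, body) {
  return new Promise((resolve, reject) => {
    const data = JSON.stringify(body)
    const req = request(url, {
      method: 'POST',
      headers: {
        'content-type': 'application/json',
        'content-length': Buffer.byteLength(data)
      }
    }, (res) => {
      let raw = ''
      res.on('data', chunk => { raw += chunk })
      res.on('end', () => resolve(JSON.parse(raw)))
    })
    req.on('error', reject)
    req.end(data)
  })
}

export async function completeCheckout (cartId, paymentDetails) {
  // (1) complete the payment transaction
  const paymentResult = await postJSON(
    'http://checkout-service/pay', { cartId, paymentDetails })
  // (2) tell the Cart service that the purchased items can be removed
  await postJSON('http://cart-service/delete', { cartId })
  // (3) update the availability of the purchased products
  await postJSON('http://products-service/update', {
    purchasedProducts: paymentResult.products
  })
  return paymentResult
}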

Another common operation performed by the API Orchestration Layer is data aggregation, or in other words, combining data from different services into a single response. Imagine we wanted to list all the products contained in a cart. In this case, the orchestrator would need to retrieve the list of product IDs from the Cart service, and then retrieve the complete information about the products from the Products service. The ways in which we can combine and coordinate services are infinite, but the important pattern to remember is the role of the orchestration layer, which acts as an abstraction between a number of services and a specific application.

The orchestration layer is a great candidate for further functional splitting. It is, in fact, very common to have it implemented as a dedicated, independent service, in which case it takes the name of API Orchestrator. This practice is perfectly in line with the microservice philosophy.

Figure 12.13 shows this further improvement of our architecture:

Figure 12.13: An application of the API Orchestrator pattern for our e-commerce example

Creating a standalone orchestrator, as shown in the previous figure, can help in decoupling the client application (in our case, the Store frontend) from the complexity of the microservice infrastructure. This is similar to the API proxy, but there is a crucial difference: an orchestrator performs a semantic integration of the various services, it's not just a naïve proxy, and it often exposes an API that is different from the one exposed by the underlying services.

Integration with a message broker

The Orchestrator pattern gave us a mechanism to integrate the various services in an explicit way. This has both advantages and disadvantages. It is easy to design, easy to debug, and easy to scale, but unfortunately, it has to have a complete knowledge of the underlying architecture and how each service works. If we were talking about objects instead of architectural nodes, the orchestrator would be an anti-pattern called God object, an object that knows and does too much, which usually results in high coupling, low cohesion, and, most importantly, high complexity.

The pattern we are now going to show tries to distribute, across the services, the responsibility of synchronizing the information of the entire system. However, the last thing we want to do is create direct relationships between services, which would result in high coupling and a further increase in the complexity of the system, due to the increasing number of interconnections between nodes. The goal is to keep every service decoupled: every service should be able to work, even without the rest of the services in the system or in combination with new services and nodes.

The solution is to use a message broker, a system capable of decoupling the sender from the receiver of a message, allowing us to implement a Centralized Publish/Subscribe pattern. This is, in practice, an implementation of the Observer pattern for distributed systems. We will talk more about this pattern later in Chapter 13, Messaging and Integration Patterns. Figure 12.14 shows an example of how this applies to the e-commerce application:

Figure 12.14: Using a message broker to distribute events in our e-commerce application

As we can see from Figure 12.14, the client of the Checkout service, which is the frontend application, does not need to carry out any explicit integration with the other services.

All it has to do is invoke checkoutService/pay to complete the checkout process and take the money from the customer; all the integration work happens in the background:

  1. The Store frontend invokes the checkoutService/pay operation on the Checkout service.
  2. When the operation completes, the Checkout service generates an event, attaching the details of the operation, that is, the cartId and the list of products that were just purchased. The event is published into the message broker. At this point, the Checkout service does not know who is going to receive the message.
  3. The Cart service is subscribed to the broker, so it's going to receive the purchased event that was just published by the Checkout service. The Cart service reacts by removing the cart identified with the ID contained in the message from its database.
  4. The Products service was subscribed to the message broker as well, so it receives the same purchased event. It then updates its database based on this new information, adjusting the availability of the products included in the message.

This whole process happens without any explicit intervention from external entities such as an orchestrator. The responsibility of spreading the knowledge and keeping information in sync is distributed across the services themselves. There is no god service that has to know how to move the gears of the entire system, since each service is in charge of its own part of the integration.
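
As a minimal sketch of this flow, assuming Redis is used as the message broker (with version 4 or later of the redis package; the next chapter explores more complete messaging options), the Checkout service could publish the purchased event like this:

import { createClient } from 'redis'

const publisher = createClient()
await publisher.connect()

// Invoked by the Checkout service once checkoutService/pay completes
export async function publishPurchased (cartId, products) {
  await publisher.publish('purchased', JSON.stringify({ cartId, products }))
}

The Cart service (and, in the same way, the Products service) would then subscribe to the same channel and react independently:

import { createClient } from 'redis'

const subscriber = createClient()
await subscriber.connect()

await subscriber.subscribe('purchased', (message) => {
  const { cartId } = JSON.parse(message)
  // here the Cart service would remove the cart from its own database
  console.log(`Cart ${cartId} removed after purchase`)
})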

The message broker is a fundamental element used to decouple the services and reduce the complexity of their interaction. It might also offer other interesting features, such as persistent message queues and guaranteed ordering of the messages. We will talk more about this in the next chapter.

Summary

In this chapter, we learned how to design Node.js architectures that scale both in capacity and complexity. We saw how scaling an application is not only about handling more traffic or reducing the response time, but it's also a practice to apply whenever we want better availability and tolerance to failures. We saw how these properties often are on the same wavelength, and we understood that scaling early is not a bad practice, especially in Node.js, which allows us to do it easily and with few resources.

The scale cube taught us that applications can be scaled across three dimensions. Throughout this chapter, we focused on the two most important dimensions, the X- and Y-axes, allowing us to discover two essential architectural patterns, namely, load balancing and microservices. You should now know how to start multiple instances of the same Node.js application, how to distribute traffic across them, and how to exploit this setup for other purposes, such as fault tolerance and zero-downtime restarts. We also analyzed how to handle the problem of dynamic and auto-scaled infrastructures. With this, we saw that a service registry can really come in handy in those situations. We learned how to achieve these goals by using plain Node.js, external load balancers like Nginx, and service discovery systems like Consul. We also learned the basics of Kubernetes.

At this point, we should have come to grips with some very practical approaches that allow us to face scalability much more fearlessly than before.

However, cloning and load balancing cover only one dimension of the scale cube, so we moved our analysis to another dimension, studying in more detail what it means to split an application by its constituent services by building a microservice architecture. We saw how microservices enable a complete revolution in how a project is developed and managed, providing a natural way to distribute the load of an application and split its complexity. However, we learned that this also means shifting the complexity from how to build a big monolithic application to how to integrate a set of services. This last aspect is where we focused the last part of our analysis, showing some of the architectural solutions to integrate a set of independent services.

In the next and last chapter of this book, we will have the chance to complete our Node.js Design Patterns journey by analyzing the messaging patterns we discussed in this chapter, in addition to more advanced integration techniques that are useful when implementing complex distributed architectures.

Exercises

  • 12.1 A scalable book library: Revisit the book library application we built in Chapter 10, Universal JavaScript for Web Applications, reconsidering it after what we learned in this chapter. Can you make our original implementation more scalable? Some ideas might be to use the cluster module to run multiple instances of the server, making sure you handle failures by restarting workers that might accidentally die. Alternatively, why not try to run the entire application on Kubernetes?
  • 12.2 Exploring the Z-axis: Throughout this chapter, we did not show you any examples of how to shard data across multiple instances, but we explored all the necessary patterns to build an application that achieves scalability along the Z-axis of the scale cube. In this exercise, you are challenged to build a REST API that allows you to get a list of (randomly generated) people whose first name starts with a given letter. You could use a library like faker (nodejsdp.link/faker) to generate a sample of random people, and then you could store this data in different JSON files (or different databases), splitting the data into three different groups. For instance, you might have three groups called A-D, E-P, and Q-Z. Ada will go in the first group, Peter in the second, and Ugo in the third. Now, you can run one or more instances of a web server for every group, but you should expose only one public API endpoint to be able to retrieve all the people whose name starts with a given letter (for instance, /api/people/byFirstName/{letter}). Hint: You could use just a load balancer and map all the possible letters to the respective backend instances that are responsible for the associated group. Alternatively, you could create an API orchestration layer that encodes the mapping logic and redirects the traffic accordingly. Can you also throw a service discovery tool into the mix and apply dynamic load balancing, so that groups receiving more traffic can scale as needed?
  • 12.3 Music addiction: Imagine you have to design the architecture of a service like Spotify or Apple Music. Can you try to design this service as a collection of microservices by applying some of the principles discussed in this chapter? Bonus points if you can actually implement a minimal version of this idea with Node.js! If this turns out to be the next big startup idea and makes you a millionaire, well… don't forget to thank the authors of this book. :)