Node Workers with cluster

In the early days of Node, there were many competing APIs for multithreading. Most of these solutions were clumsy, requiring users to spin up multiple instances of a server to listen on different TCP ports, which would then be hooked up to the real one via proxy. It was only in the 0.6 release that a standard was included out of the box that allowed multiple processes to bind to the same port: cluster.[51]

Typically, cluster is used to spin up one process per CPU core for optimal performance (though whether each process will actually get its own core is entirely up to the underlying OS).

Multithreading/cluster.js

    var cluster = require('cluster');
    if (cluster.isMaster) {
      // spin up workers
      var coreCount = require('os').cpus().length;
      for (var i = 0; i < coreCount; i++) {
        cluster.fork();
      }
      // bind death event
      cluster.on('death', function(worker) {
        console.log('Worker ' + worker.pid + ' has died');
      });
    } else {
      // die immediately
      process.exit();
    }

The output will look something like

    Worker 15330 has died
    Worker 15332 has died
    Worker 15329 has died
    Worker 15331 has died

with one line for each CPU core.

The code may look baffling at first. The trick is that while web workers load a separate script, cluster.fork() causes the same script that it’s run from to be loaded in a separate process. The only way the script knows whether it’s being run as the master or a worker is by checking cluster.isMaster.
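The listing above targets the 0.6 API. In later Node releases the 'death' event is named 'exit' and the process ID is read from worker.process.pid, so the same idea can be sketched as follows. This is a minimal sketch under that assumption, not a drop-in replacement; cluster.isMaster is kept for consistency with the text, although recent releases prefer the name cluster.isPrimary.

```javascript
var cluster = require('cluster');
var os = require('os');

if (cluster.isMaster) {
  // Master branch: fork one worker per reported CPU core.
  var coreCount = os.cpus().length;
  for (var i = 0; i < coreCount; i++) {
    cluster.fork();
  }
  // Later releases emit 'exit' (not 'death') when a worker terminates.
  cluster.on('exit', function(worker, code, signal) {
    console.log('Worker ' + worker.process.pid + ' has died');
  });
} else {
  // Worker branch: the same script, loaded in a separate process.
  process.exit();
}
```

Both branches live in one file because every forked worker re-runs this script from the top; only the isMaster check tells the two roles apart.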

The reason for this design decision is that multithreading in Node has a very different primary use case than multithreading in the browser. While the browser can relegate any surplus threads to background tasks, Node servers need to scale up the computational resources available for their main task: handling requests.

(External scripts can be run as separate processes using child_process.fork.[52] Its capabilities are largely identical to those of cluster.fork, which uses child_process under the hood, except that processes created this way don’t get cluster’s automatic TCP port sharing.)
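To make the contrast with cluster.fork concrete, here is a minimal sketch of child_process.fork running a separate script. The child script is hypothetical: it is written to a temporary file (the name echoChild-*.js is made up for illustration) only so that the example is self-contained. In a real project it would simply be another file in the repository.

```javascript
var childProcess = require('child_process');
var fs = require('fs');
var os = require('os');
var path = require('path');

// Hypothetical external script, generated here so the sketch can run on its own.
var scriptPath = path.join(os.tmpdir(), 'echoChild-' + process.pid + '.js');
fs.writeFileSync(scriptPath,
  "process.on('message', function(message) {\n" +
  "  process.send('Child received: ' + message);\n" +
  "});\n");

// Unlike cluster.fork(), this loads a different script in the new process.
var child = childProcess.fork(scriptPath);

child.on('message', function(message) {
  console.log(message);
  // Closing the IPC channel lets the child exit on its own.
  child.disconnect();
});

child.on('exit', function() {
  // Clean up the temporary script once the child is gone.
  fs.unlinkSync(scriptPath);
});

child.send('Hello, child!');
```

The messaging calls are the same send/'message' pair that cluster workers use; the difference is only in which script the new process loads.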

Talking to Node Workers

As with web workers, cluster workers can communicate with the master process by sending message events, and vice versa. The API is slightly different, though.

Multithreading/clusterMessage.js

    var cluster = require('cluster');
    if (cluster.isMaster) {
      // spin up workers
      var coreCount = require('os').cpus().length;
      for (var i = 0; i < coreCount; i++) {
        var worker = cluster.fork();
        worker.send('Hello, Worker!');
        worker.on('message', function(message) {
          if (message._queryId) return;
          console.log(message);
        });
      }
    } else {
      process.send('Hello, main process!');
      process.on('message', function(message) {
        console.log(message);
      });
    }

The output will look something like

    Hello, main process!
    Hello, main process!
    Hello, Worker!
    Hello, Worker!
    Hello, main process!
    Hello, Worker!
    Hello, main process!
    Hello, Worker!

where the order is unpredictable, because each process is racing to console.log first. (You’ll have to manually terminate the process with Ctrl+C.)

As with web workers, the API is symmetric, with a send call on one side triggering a 'message' event on the other side. But notice that the argument to send (or rather, a serialized copy) is given directly by the 'message' event, rather than being attached as the data property.
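A small sketch can make the "serialized copy" point visible. It assumes the default JSON-based serialization of messages, under which a Date sent by the worker arrives in the master as a string. The field names greeting and sentAt are made up for illustration.

```javascript
var cluster = require('cluster');

if (cluster.isMaster) {
  var worker = cluster.fork();
  worker.on('message', function(message) {
    // The payload is the handler argument itself; there is no .data wrapper.
    console.log(message.greeting);
    // The Date was serialized on the way over, so it is no longer a Date object.
    console.log(typeof message.sentAt);
    // Shut the worker down so the example terminates.
    worker.kill();
  });
} else {
  var payload = { greeting: 'Hello, main process!', sentAt: new Date() };
  process.send(payload);
}
```

Because the master only ever sees a copy, changing payload in the worker after the send call has no effect on the object the master receives.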

Notice the line

    if (message._queryId) return;

in the master message handler? Node sometimes sends its own messages from the workers, which always look something like this:

    { cmd: 'online', _queryId: 1, _workerId: 1 }

It’s safe to ignore these internal messages, but be aware that they’re used to perform some important magic behind the scenes. Most notably, when workers try to listen on a TCP port, Node uses internal messages to allow the port to be shared.
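The port sharing described above can be sketched as follows. This is an illustrative example, not part of the original listings: it assumes a Node release where cluster emits a 'listening' event with the bound address, it uses two workers (workerCount is an arbitrary choice), and it listens on port 0 so that the operating system picks a free port. The master makes a single request and then shuts the cluster down so that the script terminates.

```javascript
var cluster = require('cluster');
var http = require('http');

if (cluster.isMaster) {
  var workerCount = 2; // arbitrary, for illustration
  var listening = 0;
  for (var i = 0; i < workerCount; i++) {
    cluster.fork();
  }

  cluster.on('listening', function(worker, address) {
    console.log('Worker ' + worker.id + ' is listening on port ' + address.port);
    listening++;
    if (listening < workerCount) return;

    // Every worker is now bound to the same port; send one test request.
    var options = { host: '127.0.0.1', port: address.port, agent: false };
    http.get(options, function(res) {
      var body = '';
      res.setEncoding('utf8');
      res.on('data', function(chunk) { body += chunk; });
      res.on('end', function() {
        console.log(body);
        // Ask all workers to close their servers and exit.
        cluster.disconnect();
      });
    });
  });
} else {
  // Each worker calls listen() on the same port; cluster arranges the sharing.
  http.createServer(function(req, res) {
    res.setHeader('Connection', 'close');
    res.end('Handled by worker ' + cluster.worker.id);
  }).listen(0, '127.0.0.1');
}
```

Which worker answers the request is up to Node and the operating system, so the worker ID in the response can differ from run to run.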

Restrictions on Node Workers

For the most part, cluster obeys the same rules as web workers: there’s a master, and there are workers; they communicate via events with attached strings or serializable objects. However, while workers are obviously second-class citizens in the browser, Node’s workers possess all the rights and privileges of the master except, notably, the following:

  • The ability to shut down the application

  • The ability to spawn more workers

  • The ability to communicate with each other

This gives the master the burden of being a hub for all interprocess communication. Fortunately, this inconvenience can be abstracted away with a library like Roly Fentanes’ Clusterhub.[53]
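To see what being a hub means in practice, here is a minimal hand-rolled relay, sketched without any library. It does not show how Clusterhub works; the message shape (type, from, body) is invented for this example. Each worker sends one message, the master forwards it to the other worker, and each worker disconnects after receiving its relayed message.

```javascript
var cluster = require('cluster');

if (cluster.isMaster) {
  var workers = [cluster.fork(), cluster.fork()];

  workers.forEach(function(worker) {
    worker.on('message', function(message) {
      // Ignore anything that is not one of our relay requests.
      if (!message || message.type !== 'relay') return;
      // Forward the body to every worker except the sender.
      workers.forEach(function(other) {
        if (other === worker) return;
        other.send({ type: 'relayed', from: worker.id, body: message.body });
      });
    });
  });
} else {
  process.on('message', function(message) {
    if (!message || message.type !== 'relayed') return;
    console.log('Worker ' + cluster.worker.id + ' got "' + message.body +
      '" from worker ' + message.from);
    // One relayed message is all we wait for; close the channel and exit.
    process.disconnect();
  });

  // Workers cannot address each other, so the message goes to the master.
  process.send({ type: 'relay', body: 'Hello from worker ' + cluster.worker.id });
}
```

Every message makes two hops, worker to master and master to worker, which is the overhead that keeping shared state in an external store is meant to avoid.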

In this section, we’ve seen how workers have become an integral part of Node, allowing a server to utilize multiple cores without running multiple application instances. Node’s cluster API allows the same script to run concurrently, with one master process and any number of workers. To minimize the overhead of communication, shared state should be stored in an external database, such as Redis.
