In the early days of Node, there were many competing APIs for multithreading. Most of these solutions were clumsy, requiring users to spin up multiple instances of a server to listen on different TCP ports, which would then be hooked up to the real one via proxy. It was only in the 0.6 release that a standard was included out of the box that allowed multiple processes to bind to the same port: cluster.[51]
Typically, cluster is used to spin up one process per CPU core for optimal performance (though whether each process will actually get its own core is entirely up to the underlying OS).
Multithreading/cluster.js | |
| var cluster = require('cluster'); |
| if (cluster.isMaster) { |
|   // spin up workers |
|   var coreCount = require('os').cpus().length; |
|   for (var i = 0; i < coreCount; i++) { |
|     cluster.fork(); |
|   } |
|   // bind death event |
|   cluster.on('death', function(worker) { |
|     console.log('Worker ' + worker.pid + ' has died'); |
|   }); |
| } else { |
|   // die immediately |
|   process.exit(); |
| } |
The output will look something like
<= | Worker 15330 has died |
| Worker 15332 has died |
| Worker 15329 has died |
| Worker 15331 has died |
with one line for each CPU core.
The code may look baffling at first. The trick is that while web workers load a separate script, cluster.fork() causes the same script that it’s run from to be loaded in a separate process. The only way the script knows whether it’s being run as the master or a worker is by checking cluster.isMaster.
The reason for this design decision is that multithreading in Node has a very different primary use case than multithreading in the browser. While the browser can relegate any surplus threads to background tasks, Node servers need to scale up the computational resources available for their main task: handling requests.
(External scripts can be run as separate processes using child_process.fork.[52] Its capabilities are largely identical to those of cluster.fork, which uses child_process under the hood, except that child processes started this way can’t share TCP ports.)
As with web workers, cluster workers can communicate with the master process by sending message events, and vice versa. The API is slightly different, though.
Multithreading/clusterMessage.js | |
| var cluster = require('cluster'); |
| if (cluster.isMaster) { |
|   // spin up workers |
|   var coreCount = require('os').cpus().length; |
|   for (var i = 0; i < coreCount; i++) { |
|     var worker = cluster.fork(); |
|     worker.send('Hello, Worker!'); |
|     worker.on('message', function(message) { |
|       if (message._queryId) return; |
|       console.log(message); |
|     }); |
|   } |
| } else { |
|   process.send('Hello, main process!'); |
|   process.on('message', function(message) { |
|     console.log(message); |
|   }); |
| } |
The output will look something like
| Hello, main process! |
| Hello, main process! |
| Hello, Worker! |
| Hello, Worker! |
| Hello, main process! |
| Hello, Worker! |
| Hello, main process! |
| Hello, Worker! |
where the order is unpredictable, because each process is racing to console.log first. (You’ll have to manually terminate the process with Ctrl+C.)
As with web workers, the API is symmetric: a send call on one side triggers a 'message' event on the other side. But notice that the argument to send (or rather, a serialized copy of it) is passed directly to the 'message' handler, rather than being attached as the data property of an event object.
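The "serialized copy" point can be made concrete with a small sketch. It assumes the default JSON-based message serialization. The simulateSend helper is a hypothetical stand-in for the channel between processes, not part of the cluster API.

```javascript
// Hypothetical helper: approximates what happens to a message as it crosses
// the process boundary, assuming the default JSON-based serialization.
function simulateSend(message) {
  return JSON.parse(JSON.stringify(message));
}

var original = {
  task: 'resize',
  sizes: [100, 200],
  done: function() {}  // functions can't be serialized, so this is dropped
};
var received = simulateSend(original);

// The receiver gets an equal-looking copy, never the same object.
received.sizes.push(300);
console.log(original.sizes.length);  // 2: the sender's array is untouched
console.log('done' in received);     // false: the function did not survive
```

In practice this means a worker can't mutate the master's state by modifying a message it received; any change has to be sent back explicitly.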
Notice the line
| if (message._queryId) return; |
in the master message handler? Node sometimes sends its own messages from the workers, which always look something like this:
| { cmd: 'online', _queryId: 1, _workerId: 1 } |
It’s safe to ignore these internal messages, but be aware that they’re used to perform some important magic behind the scenes. Most notably, when workers try to listen on a TCP port, Node uses internal messages to allow the port to be shared.
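The port sharing described here is easiest to see in a small server. What follows is a sketch rather than code from this section: the function name runClusteredServer is made up for illustration, and it keeps the cluster.isMaster check used in the listings (recent Node releases also expose the same flag as cluster.isPrimary).

```javascript
var cluster = require('cluster');
var http = require('http');
var os = require('os');

// Hypothetical helper: forks one worker per core, then has every worker
// listen on the same port. Nothing runs until the function is called.
function runClusteredServer(port) {
  if (cluster.isMaster) {
    os.cpus().forEach(function() {
      cluster.fork();
    });
  } else {
    http.createServer(function(req, res) {
      // process.pid reveals which worker handled the request
      res.end('Handled by process ' + process.pid + '\n');
    }).listen(port);
  }
}
```

Calling runClusteredServer with a free port (say, 8000) at the bottom of the script and requesting the server a few times may show responses from more than one process ID, though how connections are spread across workers depends on the platform. Note that no worker gets an "address in use" error, which is what would happen if unrelated processes tried to bind the same port.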
For the most part, cluster obeys the same rules as web workers: there’s a master, and there are workers; they communicate via events with attached strings or serializable objects. However, while workers are obviously second-class citizens in the browser, Node’s workers possess all the rights and privileges of the master except, notably, the following:
The ability to shut down the application
The ability to spawn more workers
The ability to communicate with each other
This gives the master the burden of being a hub for all interthread communication. Fortunately, this inconvenience can be abstracted away with a library like Roly Fentanes’ Clusterhub.[53]
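Without reaching for a library, the hub pattern can also be sketched by hand. This is not Clusterhub's API. The envelope shape { to, body } and the names relay and attachHub are assumptions made for this illustration. The sketch expects a table of workers shaped like cluster.workers (an object keyed by worker ID, each worker having an id property), which is available in Node releases later than the 0.6 API shown in the listings.

```javascript
// Deliver one message from worker `fromId` to the worker named in the envelope.
// Returns false if the target is unknown (for example, it has already exited).
function relay(workers, fromId, envelope) {
  var target = workers[envelope.to];
  if (!target) return false;
  target.send({ from: fromId, body: envelope.body });
  return true;
}

// Master side: listen to every worker and forward addressed messages.
// `workers` is expected to look like cluster.workers (an object keyed by ID).
function attachHub(workers) {
  Object.keys(workers).forEach(function(key) {
    var worker = workers[key];
    worker.on('message', function(message) {
      // Skip anything that is not an addressed envelope
      if (!message || message.to === undefined) return;
      relay(workers, worker.id, message);
    });
  });
}
```

In the master, attachHub(cluster.workers) would be called after the fork loop. A worker could then address a peer with process.send({ to: 2, body: 'ping' }), and the peer would receive { from: 1, body: 'ping' } in its own 'message' handler.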
In this section, we’ve seen how workers have become an integral part of Node, allowing a server to utilize multiple cores without running multiple application instances. Node’s cluster API allows the same script to run concurrently, with one master process and any number of workers. To minimize the overhead of communication, shared state should be stored in an external database, such as Redis.