Chapter 4 looked at using the SharedArrayBuffer
object to read and write directly to a collection of shared data from across separate threads. But doing so is risky business, allowing one thread to clobber data that was written by another thread. However, thanks to the Atomics
object, you are able to perform very basic operations with that data in a way that prevents data from being clobbered.
Although the basic operations provided by Atomics
are convenient, you will often find yourself needing to perform more complex interactions with that data. For example, once you’ve serialized data as described in “Data Serialization”, like a 1 kilobyte string, you’ll then need to write that data to the SharedArrayBuffer
instance, and none of the existing Atomics
methods will allow you to set the entire value all at once.
This chapter covers additional functionality for coordinating reads and writes to shared data across threads for situations when the previously covered Atomics
methods just aren’t enough.
These methods are a little different than the ones that were already covered in “Atomic Methods for Data Manipulation”. Specificially, the methods previously covered each work with a TypedArray
of any kind and may operate on both SharedArrayBuffer
and ArrayBuffer
instances. However, the methods listed here will only work with Int32Array
and BigInt64Array
instances, and only make sense when used with SharedArrayBuffer
instances.
If you try to use these methods with the wrong type of TypedArray
you’ll get one of these errors:
# Firefox v88 Uncaught TypeError: invalid array type for the operation # Chrome v90 / Node.js v16 Uncaught TypeError: [object Int8Array] is not an int32 or BigInt64 typed array.
As far as prior art goes, these methods are modeled after a feature available in the Linux kernel called the futex, which is short for fast userspace mutex. Mutex itself is short for mutual exclusion, which is when a single thread of execution gets exclusive access to a particular piece of data. A mutex can also be referred to as a lock, where one thread locks access to the data, does its thing, and then unlocks access, allowing another thread to then touch the data. A futex is built on two basic operations, one being “wait” and the other being “wake”.
status
=
Atomics
.
wait
(
typedArray
,
index
,
value
,
timeout
=
Infinity
)
This method first checks typedArray
to see if the value at index
is equal to value
. If it is not, the function returns the value not-equal
. If the value is equal, it will then freeze the thread for up to timeout
milliseconds. If nothing happens during that time, the function returns the value timed-out
. On the other hand, if another thread calls Atomics.notify()
for that same index
within the time period, the function then returns with a value of ok
. Table 5-1 lists these return values.
Value | Meaning |
---|---|
|
The provided |
|
Another thread didn’t call |
|
Another thread did call |
You might be wondering why this method doesn’t throw an error for the first two conditions and silently succeed instead of returning an ok
. Since multithreaded programming is used for performance reasons it stands to reason that calling these Atomics
methods will be done in the hot-paths of an application, which are areas where the application spends the most time. It’s less performant in JavaScript to instantiate Error
objects and generate stack traces than to return a simple string, so the performance of this approach is pretty high. Another reason is that the not-equal
case doesn’t really represent an error case but that something you’re waiting for has already happened.
This blocking behavior might be a little shocking at first. Locking an entire thread sounds a bit intense, and in many cases it is. Another example of what can cause an entire JavaScript thread to lock is the alert()
function in a browser. When that function is called the browser displays a dialog and nothing at all can run—not even any background tasks using the event loop—until the dialog is dismissed. The Atomics.wait()
method similarly freezes the thread.
This behavior is so extreme, in fact, that the “main” thread—the default thread that is available when running JavaScript, outside of a web worker—is not allowed to call this method, at least in a browser. The reason is that locking the main thread would be such a poor user experience that the API authors didn’t even want to allow it. If you do try to call this method in the main thread of a browser you will get one of the following errors:
# Firefox Uncaught TypeError: waiting is not allowed on this thread # Chrome v90 Uncaught TypeError: Atomics.wait cannot be called in this context
Node.js, on the other hand, does allow Atomics.wait()
to be called in the main thread. Since Node.js doesn’t have a UI this isn’t necessarily a bad thing. Indeed, it can be useful when writing scripts where calling fs.readFileSync()
isn’t necessarily a bad thing.
If you’re a JavaScript developer who has ever worked at a company with mobile or desktop developers, you may have overheard them talk about “offloading work from the main thread” or “locking the main thread”. These concerns, which have traditionally belonged to developers of native apps, will be enjoyed by us JavaScript engineers more and more as the language advances. With regards to browsers this issue is often referred to as scroll jank, which is when a CPU is too busy to draw the UI while scrolling.
awaken
=
Atomics
.
notify
(
typedArray
,
index
,
count
=
Infinity
)
The Atomics.notify()
1 method attempts to awaken other threads that have called Atomics.wait()
on the same typedArray
and at the same index
. If any other threads are currently frozen then they will wake up. Multiple threads can be frozen at the same time, each waiting to be notified. The count
value then determines how many of them to awaken. The count
value defaults to Infinity
, meaning that every thread will be awakened. However, if you have four threads waiting and set the value to three, then all but one of them will be woken up. “Timing and Non-Determinism” examines the order of these waking threads.
The return value is the number of threads that have been awoken once the method is complete. If you were to pass in a TypedArray
instance that points to a non-shared ArrayBuffer
instance, this will always return a 0. If no threads happen to be listening at the time it will also return a 0. As this method doesn’t block the thread it can always be called from a main JavaScript thread.
promise
=
Atomics
.
waitAsync
(
typedArray
,
index
,
value
,
timeout
=
Infinity
)
This is essentially a promise-based version of Atomics.wait()
and is the latest addition to the Atomics
family. As of this writing it is available in Node.js v16 and Chrome v87 but not yet available in Firefox or Safari.
This method is essentially a less-performant, non-blocking version of Atomics.wait()
that returns a promise which resolves the status of the wait operation. Due to the loss of performance (a resolving promise is going to have more overhead than pausing a thread and returning a string) it isn’t necessarily as useful for the hot-path of a CPU-heavy algorithm. On the other hand, it can be useful in situations where a lock change is more convenient to signal another thread than to perform message-passing operations via postMessage()
. Since this method doesn’t block the thread it can be used in the main thread of an application.
One of the driving factors for adding this method is so that code compiled using Emscripten (covered in “Compiling C Programs to WebAssembly with Emscripten”) that makes uses of threading is allowed to execute in the main thread and not just in worker threads.
In order for an application to be correct it usually needs to behave in a deterministic fashion. The Atomics.notify()
function accepts an argument count
that contains the number of threads to wake up. The glaring question in this situation is which threads get woken up and in which order?
Threads are woken up in FIFO (First In, First Out) order, meaning the first thread that called Atomics.wait()
is the first to be woken up, the second to call is is the second to be woken up, and so on. Measuring this can be difficult though as log messages printed from different workers aren’t guaranteed to be displayed in the terminal in the true order that they were executed in. Ideally, you should build your applications in such a way that it continues to work fine regardless of the order in which threads have been awoken.
In order to test this for yourself you can create a new application. First, create a new directory named ch5-notify-order/. In it, start off by creating another basic index.html file using the content from Example 5-1.
<html>
<head>
<title>
Shared Memory for Coordination</title>
<script
src=
"main.js"
></script>
</head>
</html>
Next, create another main.js file, containing the content from Example 5-2.
if
(
!
crossOriginIsolated
)
throw
new
Error
(
'Cannot use SharedArrayBuffer'
)
;
const
buffer
=
new
SharedArrayBuffer
(
4
)
;
const
view
=
new
Int32Array
(
buffer
)
;
for
(
let
i
=
0
;
i
<
4
;
i
++
)
{
const
worker
=
new
Worker
(
'worker.js'
)
;
worker
.
postMessage
(
{
buffer
,
name
:
i
}
)
;
}
setTimeout
(
(
)
=>
{
Atomics
.
notify
(
view
,
0
,
3
)
;
}
,
500
)
;
This file first creates a 4-byte buffer, which is the smallest buffer that can support the needed Int32Array
view. Next, it creates four different dedicated workers using a for
loop. For each of the workers it immediately calls the appropriate postMessage()
call to pass in both the buffer as well as the identifier for the thread. This results in 5 different threads that we care about; namely the main thread, and threads that we’ve nicknamed 0, 1, 2, and 3.
JavaScript creates those threads, and the underlying engine goes to work assembling resources, allocating memory, and otherwise doing a lot of magic for us behind the scenes. The amount of time that it takes to perform those tasks is sadly non-deterministic. We can’t know that, for example, it always takes 100 ms to complete the preparation work. In fact, this number will change wildy across machines depending on things like core count, and how busy the machine happens to be at the time the code is run. Lucky for us the postMessage()
call is essentially queued up for us; the JavaScript engine will call the worker’s onmessage
function once it’s ready.
After that, the main thread finishes its work, then waits half a second (500 ms) using setTimeout
, and finally calls Atomics.notify()
. What would happen if the setTimeout
value were too low? Say, 10 ms or even if it were called in the same stack outside of setTimeout
? In that case the threads wouldn’t yet be initialized, the worker wouldn’t have had time to call Atomics.wait()
, and the call would immediately return with a 0. What would happen if the time value is too high? Well, the application might be painfully slow, or any timeout
value used by Atomics.wait()
might have been exceeded.
On Thomas’s laptop the threshold of readiness seems to be somewhere around 120 ms. At that point some of the threads are ready and some aren’t. At about 100 ms usually none of the threads are ready, and at 180 ms usually all of the threads are ready. But, usually is a word that we don’t like to use in programming. It is difficult to know an exact amount of time before a thread is ready. Often this is an issue when first starting an application, not one that is present throughout the lifecycle of the application.
To finish off the application, create a file named worker.js, and add the content from Example 5-3 to it.
self
.
onmessage
=
(
{
data
:
{
buffer
,
name
}
}
)
=>
{
const
view
=
new
Int32Array
(
buffer
)
;
console
.
log
(
`
Worker
${
name
}
started
`
)
;
const
result
=
Atomics
.
wait
(
view
,
0
,
0
,
1000
)
;
console
.
log
(
`
Worker
${
name
}
awoken with
${
result
}
`
)
;
}
;
The worker accepts the shared buffer and the name of the worker thread and stores the values, printing a message that the thread has been initialized. It then calls Atomics.wait()
using the 0th index of the buffer. It assumes an initial value of 0 is present in the buffer (which it is since we haven’t modified the value). The method call also uses a timeout
value of one second (1000 ms). Finally, once the method call is complete, the value is printed in the terminal.
Once you’ve finished creating these files, switch to a terminal and run another web server to view the content. Again, you can do so by running the following command:
$ npx MultithreadedJSBook/serve .
As usual, navigate to the URL printed in the terminal and open the console. If you don’t see any output you may need to refresh the page to run the application again. Table 5-2 contains the output from a test run.
Log | Location |
---|---|
Worker 1 started |
worker.js:4:11 |
Worker 0 started |
worker.js:4:11 |
Worker 3 started |
worker.js:4:11 |
Worker 2 started |
worker.js:4:11 |
Worker 0 awoken with ok |
worker.js:7:11 |
Worker 3 awoken with ok |
worker.js:7:11 |
Worker 1 awoken with ok |
worker.js:7:11 |
Worker 2 awoken with timed-out |
worker.js:7:11 |
You will most likely get different output. In fact, if you refresh the page again, you may get different output once again. Or you may even get consistent output across multiple runs. Ideally, though, the final worker name that is printed with the “started” messages will also be the worker that fails with the “timed-out” message.
This output might be a little confusing. Earlier we stated that the order seems to be FIFO ordered, but the numbers here aren’t from 0 to 3. The reason for that the order doesn’t depend on the order that the threads were created (0, 1, 2, 3), but the order in which the threads executed the Atomics.wait()
call (1, 0, 3, 2 in this case). Even with that in mind the order of the “awoken” messages is confusing (0, 3, 1, 2 in this case). This is likely due to a race condition in the JavaScript engine where different threads print messages, likely at almost the exact same moment.
Once printed, the messages don’t get displayed directly to the screen. If that could happen then the messages could overwrite each other, and you’d end up with visual tearing of pixels. Instead, the engine queues up the messages to be printed and some other mechanism internal to the browser, but hidden away from us developers, determines the order in which they’re taken from the queue and printed to the screen. For that reason, the order of the two sets of messages won’t necessarily correlate. But the only way to truly tell any order is that the message that times out happens to be from the final thread that was started. Indeed, in this case the “timed-out” message is always from the last worker that was started.
This experiment begs the question: how can an application deterministically know when a thread has finished going through initial setup and is thus prepared to take on work?
A simple way to do so is to call postMessage()
from within the worker threads to post back to the parent thread at some point during the onmessage()
handler. This works because once the onmessage()
handler has been called the worker thread has finished its initial setup and is now running JavaScript code.
Here’s an example of the quickest way to pull this off. First, copy the ch5-notify-order/ directory you created and paste it as a new ch5-notify-when-ready/ directory. Inside this directory the index.html file will remain the same, though the two JavaScript files will be updated. First, update main.js to contain the content from Example 5-4.
if
(
!
crossOriginIsolated
)
throw
new
Error
(
'Cannot use SharedArrayBuffer'
)
;
const
buffer
=
new
SharedArrayBuffer
(
4
)
;
const
view
=
new
Int32Array
(
buffer
)
;
const
now
=
Date
.
now
(
)
;
let
count
=
4
;
for
(
let
i
=
0
;
i
<
4
;
i
++
)
{
const
worker
=
new
Worker
(
'worker.js'
)
;
worker
.
postMessage
(
{
buffer
,
name
:
i
}
)
;
worker
.
onmessage
=
(
)
=>
{
console
.
log
(
`
Ready; id=
${
i
}
, count=
${
--
count
}
, time=
${
Date
.
now
(
)
-
now
}
ms
`
)
;
if
(
count
===
0
)
{
Atomics
.
notify
(
view
,
0
)
;
}
}
;
}
The script has been modified so that Atomics.notify()
will be called after each of the four workers has posted a message back to the main thread. Once the fourth and final worker has posted a message the notification is then sent. This allows the application to post a message as soon as it’s ready, likely saving hundreds of milliseconds in the best case, and preventing a failure in the worst case (like when running the code on a very slow single core computer).
The Atomics.notify()
call has also been updated to simply wake up all threads instead of just three, and the timeout has been set back to the default of Infinity
. This was done to show that every thread will receive the message on time.
Next, update worker.js to contain the content from Example 5-5.
self
.
onmessage
=
(
{
data
:
{
buffer
,
name
}
}
)
=>
{
postMessage
(
'ready'
)
;
const
view
=
new
Int32Array
(
buffer
)
;
console
.
log
(
`
Worker
${
name
}
started
`
)
;
const
result
=
Atomics
.
wait
(
view
,
0
,
0
)
;
console
.
log
(
`
Worker
${
name
}
awoken with
${
result
}
`
)
;
}
;
This time the onmessage
handler immediately calls postMessage()
to send a message back to the parent. Then, the wait call happens shortly afterward. Technically, if the parent thread were to somehow receive the message before the Atomics.wait()
call were made, then the application could conceivably break. But the code is relying on the fact that message passing is far slower than iterating over lines of code within a synchronous JavaScript function.
One thing to keep in mind is that calling Atomics.wait()
will pause the thread. This means postMessage()
can’t be called afterward.
When you run this code the new logs print out three pieces of information: the name of the thread, the countdown (always in the order of 3, 2, 1, 0), and finally the amount of time it took for the thread to be ready since the start of the script. Run the same command that you ran before and open the resulting URL in your browser. Table 5-3 contains the log output from some sample runs.
Firefox v88 | Chrome v90 |
---|---|
T1, 86ms |
T0, 21ms |
T0, 99ms |
T1, 24ms |
T2, 101ms |
T2, 26ms |
T3, 108ms |
T3, 29ms |
In this case, with a 16-core laptop, Firefox seems to take around four times as long to initialize the worker threads than Chrome does. Also, Firefox gives a more random thread order than Chrome. Each time the page is refreshed the order of threads for Firefox changes but the order in Chrome does not. This then suggests that the V8 engine used by Chrome is more optimized for starting new JavaScript environments or instantiating browser APIs than the SpiderMonkey engine used by Firefox.
Be sure to test this code in multiple browsers to compare the results that you get. Another thing to keep in mind is that the speed that it takes to initialize threads will also likely depend on the number of cores available on your computer. In fact, to have some additional fun with this program, change the value of 4 that is assigned to the count
variable and in the for
loop to a higher number, then run the code and see what happens. Upon increasing the value to 128, the amount of time it took both browsers to initialize threads jumped a lot. This also consistently breaks the order in which the threads are prepared on Chrome. Generally, performance is lost when using too many threads, and this is examined in more detail in “Low Core Count”.
Now that we’ve had a look at Atomics.wait()
and Atomics.notify()
, it’s time to look at a concrete example. We’ll use Conway’s Game of Life, a well-established concept that naturally lends itself to parallel programming.
The “game” is actually a simulation of population growth and decay. The “world” this simulation exists in is a grid of cells that are in one of two states: alive or dead. The simulation works iteratively, and on each iteration, the following algorithm is performed for each cell.
If the cell is alive:
If there are are 2 or 3 neighbors alive, the cell remains alive.
If there are 0 or 1 neighbors alive, the cell dies (this simulates “underpopulation” as a cause of death).
If there are 4 or more neighbors alive, the cell dies (this simulates “overpopulation” as a cause of death).
If the cell is dead:
If there are exactly 3 neighbors alive, the cell becomes alive (this simulates “reproduction”).
In any other case, the cell remains dead.
When talking about “neighbors alive”, we’re referring to any cell that’s at most one unit away from the current cell, including diagonals, and we’re referring to the state prior to the current iteration. We can simplify these rules to the following.
If there are exactly 3 neighbors alive, the new cell state is alive (regardless of how it started).
If the cell is alive and exactly 2 neighbors are alive, the cell remains alive.
In all other cases, the new cell state is dead.
For our implementation, we’ll make the following assumptions:
The grid is a square. This is a slight simplification so that there’s one less dimension to worry about.
The grid wraps around itself like a torus. This means that when we’re at an edge, and we need to evaluate a neighboring cell outside the bounds, we’ll look at the cell at the other end.
We’ll write our code for web browsers, since they give us a handy <canvas>
with which to plot the state of the Game of Life world. That being said, it’s relatively straightforward to adapt the example to other environments that have some kind of image rendering. In Node.js you could even write to the terminal using ANSI escape codes.
To start off, we’ll build up a Grid
class, which holds our Game of Life world as an array, and handles each iteration. We’ll build it in a frontend-agnostic way, and we’ll even make it usable without any changes in our multithreaded example. In order to properly simulate the Game of Life, we’ll need a multidimensional array to represent our grid of cells. We could use arrays of arrays, but to make things easier later on, we’ll store it in a one-dimensional array, (in fact, a Uint8Array
) and then for any cell with coordinates x
and y
, we’ll store it in the array at cells[size * x + y]
. We’ll also need two of these, since one will be for the current state, and one for the previous state. In another attempt to simplify things for later on, we’ll store both of them sequentially in the same ArrayBuffer
.
Make a directory called ch5-game-of-life/ and add the contents of Example 5-6 to gol.js in that directory.
class
Grid
{
constructor
(
size
,
buffer
,
paint
=
()
=>
{})
{
const
sizeSquared
=
size
*
size
;
this
.
buffer
=
buffer
;
this
.
size
=
size
;
this
.
cells
=
new
Uint8Array
(
this
.
buffer
,
0
,
sizeSquared
);
this
.
nextCells
=
new
Uint8Array
(
this
.
buffer
,
sizeSquared
,
sizeSquared
);
this
.
paint
=
paint
;
}
Here we’ve started off the Grid
class with a constructor. It takes in a size
, which is the width of our square, an ArrayBuffer
called buffer
, and a paint
function which we’ll use later on. We then establish our cells
and nextCells
as instances of Uint8Array
stored side-by-side in the buffer
.
Next, we can add the cell retrieval method we’ll need later on when performing iterations. Add the code in Example 5-7.
getCell
(
x
,
y
)
{
const
size
=
this
.
size
;
const
sizeM1
=
size
-
1
;
x
=
x
<
0
?
sizeM1
:
x
>
sizeM1
?
0
:
x
;
y
=
y
<
0
?
sizeM1
:
y
>
sizeM1
?
0
:
y
;
return
this
.
cells
[
size
*
x
+
y
];
}
To retrieve a cell with a given set of coordinates, we need to normalize the indices. Recall that we’re saying the grid wraps around. The normalization we’ve done here makes sure that if we’re one unit above or below the range, we instead retrieve the cell at the other end of the range.
Now, we’ll add the actual algorithm that runs on every iteration. Add the code in Example 5-8.
static
NEIGHBORS
=
[
[
-
1
,
-
1
]
,
[
-
1
,
0
]
,
[
-
1
,
1
]
,
[
0
,
-
1
]
,
[
0
,
1
]
,
[
1
,
-
1
]
,
[
1
,
0
]
,
[
1
,
1
]
]
;
iterate
(
minX
,
minY
,
maxX
,
maxY
)
{
const
size
=
this
.
size
;
for
(
let
x
=
minX
;
x
<
maxX
;
x
++
)
{
for
(
let
y
=
minY
;
y
<
maxY
;
y
++
)
{
const
cell
=
this
.
cells
[
size
*
x
+
y
]
;
let
alive
=
0
;
for
(
const
[
i
,
j
]
of
Grid
.
NEIGHBORS
)
{
alive
+=
this
.
getCell
(
x
+
i
,
y
+
j
)
;
}
const
newCell
=
alive
===
3
||
(
cell
&&
alive
===
2
)
?
1
:
0
;
this
.
nextCells
[
size
*
x
+
y
]
=
newCell
;
this
.
paint
(
newCell
,
x
,
y
)
;
}
}
const
cells
=
this
.
nextCells
;
this
.
nextCells
=
this
.
cells
;
this
.
cells
=
cells
;
}
}
The set of neighbors coordinates are used in the algorithm to look at neighboring cells in 8 directions. We’ll keep this array handy since we’ll need to use it for every cell.
The iterate()
method takes in a range to operate on in the form of minimum X and Y values (inclusive) and maximum X and Y values (exclusive). For our single-threaded example, it will always be (0, 0, size, size)
, but putting a range here will make it easier to split up when we move to a multithreaded implementation, where we’ll use these X and Y boundaries to divide the whole grid into sections for each thread to work on.
We loop over every cell in the grid, and for each one get the number of neighbors that are alive. We’re using the number 1
to represent living cells, and 0
to represent dead cells, so we can count the number of neighboring living cells by adding them all up. Once we have that, we can apply the simplified Game of Life algorithm. We store the new cell state in the nextCells
array, and then provide the new cell state and coordinates to the paint
callback, for visualization. Then we swap the cells
and nextCells
arrays for the subsequent iteration to use. That way, inside each iteration, cells
always represents the previous iteration’s result, and newCells
always represents the current iteration’s result.
All the code up until this point will be shared with our multithreaded implementation. With the Grid
class complete, we can now move on to creating and initializing a Grid
instance, and tying it to our UI. Add the code from Example 5-9.
const
BLACK
=
0xFF000000
;
const
WHITE
=
0xFFFFFFFF
;
const
SIZE
=
1000
;
const
iterationCounter
=
document
.
getElementById
(
'iteration'
)
;
const
gridCanvas
=
document
.
getElementById
(
'gridcanvas'
)
;
gridCanvas
.
height
=
SIZE
;
gridCanvas
.
width
=
SIZE
;
const
ctx
=
gridCanvas
.
getContext
(
'2d'
)
;
const
data
=
ctx
.
createImageData
(
SIZE
,
SIZE
)
;
const
buf
=
new
Uint32Array
(
data
.
data
.
buffer
)
;
function
paint
(
cell
,
x
,
y
)
{
buf
[
SIZE
*
x
+
y
]
=
cell
?
BLACK
:
WHITE
;
}
const
grid
=
new
Grid
(
SIZE
,
new
ArrayBuffer
(
2
*
SIZE
*
SIZE
)
,
paint
)
;
for
(
let
x
=
0
;
x
<
SIZE
;
x
++
)
{
for
(
let
y
=
0
;
y
<
SIZE
;
y
++
)
{
const
cell
=
Math
.
random
(
)
<
0.5
?
0
:
1
;
grid
.
cells
[
SIZE
*
x
+
y
]
=
cell
;
paint
(
cell
,
x
,
y
)
;
}
}
ctx
.
putImageData
(
data
,
0
,
0
)
;
We assign some constants for the black-and-white pixels we’ll draw to the screen, and set the size (actually, the width) of the grid we’re using. Feel free to play around with the size to see the Game of Life play out in different magnitudes.
We grab an iteration counter and canvas element from the HTML (which we’ll write later on). We’ll set our canvas width and height to SIZE
, and get a 2D context from it to work with.
We’ll use an ImageData
instance to modify the pixels on the canvas directly, via a Uint32Array
.
This paint()
function will be used both in initialization of the grid and on each iteration to modify the buffer backing the ImageData
instance. If a cell is alive, it’ll paint it black. Otherwise it’ll paint it white.
Now we create the grid instance, passing in the size, an ArrayBuffer
big enough to hold both cells
and nextCells
, and our paint()
function.
To initialize the grid, we’ll loop over all the cells and assign each one a random dead or alive state. At the same time, we’ll pass the result to our paint()
function to ensure that the image is updated.
Whenever an ImageData
is modified, we need to add it back to the canvas, so we’re doing it here now that we’re done initializing.
Finally, we’re ready to start running iterations. Add the code from Example 5-10.
let
iteration
=
0
;
function
iterate
(...
args
)
{
grid
.
iterate
(...
args
);
ctx
.
putImageData
(
data
,
0
,
0
);
iterationCounter
.
innerHTML
=
++
iteration
;
window
.
requestAnimationFrame
(()
=>
iterate
(...
args
));
}
iterate
(
0
,
0
,
SIZE
,
SIZE
);
For each iteration, we start off by calling our grid.iterate()
method, which modifies the cells as appropriate. Note that it calls the paint()
function for each cell, so once that happens, our image data is already set, so we just need to add it to the canvas context with putImageData()
. Then, we’ll update the iteration counter on the page and schedule another iteration to happen in a requestAnimationFrame()
callback. Finally we kick everything off with an initial call to iterate()
.
We’re done with the JavaScript, but now we need the supporting HTML. Fortunately, this is very short. Add the contents of Example 5-11 to a file called gol.html in the same directory, and then open that file up in your browser.
<h3>
Iteration:<span
id=
"iteration"
>
0</span></h3>
<canvas
id=
"gridcanvas"
></canvas>
<script
src=
"gol.js"
></script>
You should now see a 1,000 by 1,000 image displaying Conway’s Game of Life, going through the iterations as fast as it can. It should look something like Figure 5-1.
Depending on your computer, you may find that this lags a little bit, rather than being crisp and smooth. Iterating over all of these cells and performing calculations on them takes a lot of computing power. To speed things up a bit, let’s take advantage of some more CPU cores on your machine using WebWorker threads.
For the multithreaded version of our Game of Life implementation, we can re-use a lot of the code. In particular, the HTML doesn’t change and neither does our Grid
class. We’ll set up some worker threads, and an additional one to coordinate and modify image data. We need that additional thread because we can’t use Atomics.wait()
on the main browser thread. We’ll make use of SharedArrayBuffer
, rather than the regular ArrayBuffer
used in the single-threaded example. To coordinate the threads, we’ll need 8 bytes for coordination, specifically 4 to synchronize in each direction, since Atomics.wait()
requires at least an Int32Array
. Since our coordination thread will also be generating the image data, we’ll also need enough shared memory to hold that as well. For a grid of side length SIZE
, this means a SharedArrayBuffer
with memory laid out as in Table 5-4.
Purpose | # of Bytes |
---|---|
Cells (or next cells) |
|
Cells (or next cells) |
|
Image data |
|
Worker thread wait |
|
Coordination thread wait |
|
To get started here, copy the .html and .js files from the previous example to new files named thread-gol.html and thread-gol.js respectively. Edit thread-gol.html to make reference to this new JavaScript file.
Delete everything after the Grid
class definition. The next thing we’ll do is set up some constants. Add Example 5-12 to thread-gol.js.
const
BLACK
=
0xFF000000
;
const
WHITE
=
0xFFFFFFFF
;
const
SIZE
=
1000
;
const
THREADS
=
5
;
// must be a divisor of SIZE
const
imageOffset
=
2
*
SIZE
*
SIZE
const
syncOffset
=
imageOffset
+
4
*
SIZE
*
SIZE
;
const
isMainThread
=
!!
self
.
window
;
The BLACK
, WHITE
, and SIZE
constants have the same purpose as in the single-threaded example. We’ll set this THREADS
constant to any number that’s a divisor of SIZE
, and it will represent the number of worker threads we’ll spawn for doing the Game of Life calculation. We’ll be dividing the grid into chunks that can be handled by each thread. Feel free to play around with the THREADS
and SIZE
variables, so long as THREADS
divides SIZE
. We’ll need the offsets for where the image data and sync bytes are stored, so those are handled here as well. Finally we’re going to use the same file to run on the main thread and any worker threads, so we’ll need a way of knowing whether we’re on the main thread or not.
Next, we’ll start writing the code for the main thread. Add the contents of Example 5-13.
if
(
isMainThread
)
{
const
gridCanvas
=
document
.
getElementById
(
'gridcanvas'
)
;
gridCanvas
.
height
=
SIZE
;
gridCanvas
.
width
=
SIZE
;
const
ctx
=
gridCanvas
.
getContext
(
'2d'
)
;
const
iterationCounter
=
document
.
getElementById
(
'iteration'
)
;
const
sharedMemory
=
new
SharedArrayBuffer
(
syncOffset
+
// data + imageData
THREADS
*
4
// synchronizationi
)
;
const
imageData
=
new
ImageData
(
SIZE
,
SIZE
)
;
const
cells
=
new
Uint8Array
(
sharedMemory
,
0
,
imageOffset
)
;
const
sharedImageBuf
=
new
Uint32Array
(
sharedMemory
,
imageOffset
)
;
const
sharedImageBuf8
=
new
Uint8ClampedArray
(
sharedMemory
,
imageOffset
,
4
*
SIZE
*
SIZE
)
;
for
(
let
x
=
0
;
x
<
SIZE
;
x
++
)
{
for
(
let
y
=
0
;
y
<
SIZE
;
y
++
)
{
// 50% chance of cell being alive
const
cell
=
Math
.
random
(
)
<
0.5
?
0
:
1
;
cells
[
SIZE
*
x
+
y
]
=
cell
;
sharedImageBuf
[
SIZE
*
x
+
y
]
=
cell
?
BLACK
:
WHITE
;
}
}
imageData
.
data
.
set
(
sharedImageBuf8
)
;
ctx
.
putImageData
(
imageData
,
0
,
0
)
;
The first part of this is roughly the same as in the single-threaded example. We’re just grabbing the DOM elements and setting the grid size. Next, we set up the SharedArrayBuffer
which we’re calling sharedMemory
, and putting views on it for the cells
(which we’ll assign values to soon) and got the image data. We’ll use both a Uint32Array
and a Uint8ClampedArray
for the image data, for modification and assignment to the ImageData
instance respectively.
Then we’ll initialize the grid randomly, and at the same time modify the image data accordingly, and populate that image data to the canvas context. This sets up our initial state for the grid. At this point, we can start spawning worker threads. Add the contents of Example 5-14.
const
chunkSize
=
SIZE
/
THREADS
;
for
(
let
i
=
0
;
i
<
THREADS
;
i
++
)
{
const
worker
=
new
Worker
(
'thread-gol.js'
,
{
name
:
`gol-worker-
${
i
}
`
});
worker
.
postMessage
({
range
:
[
0
,
chunkSize
*
i
,
SIZE
,
chunkSize
*
(
i
+
1
)],
sharedMemory
,
i
});
}
const
coordWorker
=
new
Worker
(
'thread-gol.js'
,
{
name
:
'gol-coordination'
});
coordWorker
.
postMessage
({
coord
:
true
,
sharedMemory
});
let
iteration
=
0
;
coordWorker
.
addEventListener
(
'message'
,
()
=>
{
imageData
.
data
.
set
(
sharedImageBuf8
);
ctx
.
putImageData
(
imageData
,
0
,
0
);
iterationCounter
.
innerHTML
=
++
iteration
;
window
.
requestAnimationFrame
(()
=>
coordWorker
.
postMessage
({}));
});
We set up some worker threads in a loop. For each one, we give it a unique name for debugging purposes, post it a message telling it what range (i.e. the boundaries minX
, minY
, maxX
, and maxY
) of the grid we want it to operate in, and send it the sharedMemory
. Then we add coordination worker, and pass it the sharedMemory
and let it know that it’s the coordination worker via a message.
From the main browser thread, we’re only going to talk to this coordination worker. We’ll set it up so that it loops by posting a message every time it receives one, but only after grabbing the image data from SharedMemory
, making the appropriate UI updates, and requesting an animation frame.
The rest of the code runs in the other threads. Add the contents of Example 5-15.
}
else
{
let
sharedMemory
;
let
sync
;
let
sharedImageBuf
;
let
cells
;
let
nextCells
;
self
.
addEventListener
(
'message'
,
initListener
);
function
initListener
(
msg
)
{
const
opts
=
msg
.
data
;
sharedMemory
=
opts
.
sharedMemory
;
sync
=
new
Int32Array
(
sharedMemory
,
syncOffset
);
self
.
removeEventListener
(
'message'
,
initListener
);
if
(
opts
.
coord
)
{
self
.
addEventListener
(
'message'
,
runCoord
);
cells
=
new
Uint8Array
(
sharedMemory
);
nextCells
=
new
Uint8Array
(
sharedMemory
,
SIZE
*
SIZE
);
sharedImageBuf
=
new
Uint32Array
(
sharedMemory
,
imageOffset
);
runCoord
();
}
else
{
runWorker
(
opts
);
}
}
We’re on the other side of that isMainThread
condition now, so we know we’re in a worker thread or the coordination thread. Here, we declare some variables, and then add an initial listener to the message
event. Regardless of whether this is a coordination thread or a worker thread, we’ll need the sharedMemory
and sync
variables populated, so we assign those in the listener. Then we remove the initialization listener, since we won’t need it anymore. The worker threads won’t rely on message passing at all, and the coordination thread will have a different listener, as we’ll see in a moment.
If we’ve initialized a coordination thread we’ll add a new message
listener; a runCoord
function that we’ll define below. Then we’ll get references to cells
and nextCells
since we’ll need to keep track on the coordination thread separate from what’s going on in the Grid
instances in the worker threads. Since we’re generating the image on the coordination thread, we’ll need that too. Then we run the first iteration of runCoord
. If we’ve initialized a worker thread, we simply go ahead and pass the options (containing the range to operate) to runWorker()
.
Let’s go ahead and define runWorker()
right now. Add the contents of Example 5-16.
function
runWorker
({
range
,
i
})
{
const
grid
=
new
Grid
(
SIZE
,
sharedMemory
);
while
(
true
)
{
Atomics
.
wait
(
sync
,
i
,
0
);
grid
.
iterate
(...
range
);
Atomics
.
store
(
sync
,
i
,
0
);
Atomics
.
notify
(
sync
,
i
);
}
}
Worker threads are the only ones that need an instance of the Grid
class, so first we instantiate it, passing in the sharedMemory
as the backing buffer. This works, because we decided that the first part of the sharedMemory
would be the cells
and nextCells
, as it would be in the single-threaded example.
Then we start an infinite loop. The loop performs the following operations:
It performs an Atomics.wait()
on the i
-th element of the sync
array. In the coordination thread, we’ll do the appropriate Atomics.notify()
to allow this to proceed. We’re waiting for the coordination thread here because otherwise we may start changing data and swapping references to cells
and nextCells
before other threads are ready and data has made its way to the main browser thread.
+ Then it performs the iteration on the Grid
instance. Remember, we’re only operating on the range that the coordination thread said to operate on via the range
property.
Once that’s done, it notifies the main thread of having completed this task. This is done by setting the i
-th element of the sync
array to 1
with Atomics.store()
, and then waking the waiting thread via Atomics.notify()
. We’re using the transition away from the 0
state as an indicator that we should so some work, and a transition back to the 0
state to notify that we’ve finished the work.
We’re using Atomics.wait()
to stop the coordination thread from executing while the worker threads are modifying data, and then stop the worker threads with Atomics.wait()
while the coordination thread does its work. On either side, we use Atomics.notify()
to wake the other thread, and immediately go back into a waiting state, waiting for the other thread to notify back. Since we use atomic operations to both modify data and control when it is modified, we know that all the data accesses are sequentially consistent. In the interleaving program flow across threads, a deadlock cannot occur, since we’re always flipping execution back and forth from the coordination thread to the worker threads. The worker threads never execute on the same parts of memory as each other so we don’t have to worry about this concept from the perspective of solely the worker threads.
Worker threads can just run infinitely. We don’t have to be worried about that infinite loop, because it will only proceed if Atomics.wait()
returns, which requires that another thread calls Atomics.notify()
for that same array element.
Let’s wrap up the code here with the runCoord()
function, which is triggered via a message from the main browser thread after the initialization message. Add the contents of Example 5-17.
}
function
runCoord
()
{
for
(
let
i
=
0
;
i
<
THREADS
;
i
++
)
{
Atomics
.
store
(
sync
,
i
,
1
);
Atomics
.
notify
(
sync
,
i
);
}
for
(
let
i
=
0
;
i
<
THREADS
;
i
++
)
{
Atomics
.
wait
(
sync
,
i
,
1
);
}
const
oldCells
=
cells
;
cells
=
nextCells
;
nextCells
=
oldCells
;
for
(
let
x
=
0
;
x
<
SIZE
;
x
++
)
{
The first thing that happens here is the coordination thread notifying the worker threads via the i
-th element of the sync
array for each worker thread, waking them up to perform an iteration. When they’re done, they’ll notify via the same element of the sync
array, so we’ll wait on those. The fact that each of these calls to Atomics.wait()
blocks the thread execution is exactly why we need this coordination thread in the first place, rather than just doing it all on the main browser thread.
Next, we swap the cells
and nextCells
references. The workers have already done this for themselves inside the iterate()
method, so we need to follow suit here. Then we’re ready to iterate over all the cells
and convert their values to pixels in the image data. Finally, we post a message back to the main browser thread, indicating that the data is ready to be displayed in the UI. The coordination thread has nothing to do until the next time it receives a message, at which point runCoord
is run again. This method completes the conceptual loop started in Example 5-14.
Now we’re done! To view the HTML file, remember that in order to use SharedArrayBuffer
, we need a server running with particular headers set. To do this, run the following in your ch5-game-of-life directory:
$
npx MultithreadedJSBook/serve .
Then, append /thread-gol.html
to the URL provided to see our multithreaded implementation of Conway’s Game of Life running. Because we haven’t changed any UI code, it should look exactly the same as the single-threaded example in Figure 5-1. The only difference you should see is in performance. The transitions between iterations likely appear to be much smoother and quicker. You’re not imagining things! We’ve moved the work of calculating cell states and plotting pixels into separate threads, so now the main thread is free to animate more smoothly, and iterations happen faster because we’re using more CPU cores in parallel to do the work.
Most importantly, we’re avoiding most of the overhead of passing messages between threads for coordination by just using Atomics.notify()
to let other threads know that they can continue after having paused themselves with Atomics.wait()
.
At the heart of JavaScript lies the event loop, which allows the language to create new call stacks and handle events. It’s always been there and we JavaScript engineers have always depended on it. This is true for both JavaScript that runs in the browser, where you might have jQuery that listens for a click event in the DOM, or JavaScript that runs on the server, where you might have the Fastify server that waits for an incoming TCP connection to be established.
Enter the new kid on the block: Atomics.wait()
and shared memory. This pattern now allows applications to halt the execution of JavaScript, thereby causing the event loop to completely stop working. Because of this you can’t simply start throwing calls to make use of multithreading into your application and expect it to work without problem. Instead, certain restrictions must be followed to make the application behave nicely.
One such restriction is hinted upon when it comes to browsers: the main thread of the application should not call Atomics.wait()
. And, while it can be done in a simple Node.js script, you should really avoid doing so in a larger application. For example, if your main Node.js thread is handling incoming HTTP requests, or has a handler for receiving operating system signals, what’s going to happen when the event loop comes to a halt when a wait operation is started? Example 5-18 is an example of such a program.
#!/usr/bin/env node
const
http
=
require
(
'http'
)
;
const
view
=
new
Int32Array
(
new
SharedArrayBuffer
(
4
)
)
;
setInterval
(
(
)
=>
Atomics
.
wait
(
view
,
0
,
0
,
1900
)
,
2000
)
;
const
server
=
http
.
createServer
(
(
req
,
res
)
=>
{
res
.
end
(
'Hello World'
)
;
}
)
;
server
.
listen
(
1337
,
(
err
,
addr
)
=>
{
if
(
err
)
throw
err
;
console
.
log
(
'http://localhost:1337/'
)
;
}
)
;
If you feel so inclined, create a directory for this file and execute the server by running the following command:
$ node main.js
Once it’s running, execute the following command in your terminal several times, waiting a random amount of time between each invocation:
$ time curl http://localhost:1337
What this application does is first create an HTTP server and listen for requests. Then, every two seconds, a call to Atomics.wait()
is made. It’s configured in such a way that the application freezes for 1.9 seconds to exaggerate the effect of long pauses. The curl
command you’re running is prefixed with the time
command which displays the amount of time the following command takes to run. Your output will then randomly vary between 0 and 1.9 seconds, which is a huge amount of time for a web request to pause for. Even as you reduce that timeout value closer and closer to 0, you’ll still end up with micro stutters that globally affect all incoming requests. If web browsers allowed Atomics.wait()
calls in the main thread you would definitely be encountering micro stutters from this in websites you visit today.
Another question still remains: what sort of restrictions should come into play with each of the additional threads that an application spawns, considering that each thread has their own event loop?
Our recommendation is to designate ahead of time what the main purpose of each spawned thread is. Each thread either becomes a CPU-heavy thread that makes heavy used of Atomics
calls, or an event-heavy thread that makes minimal Atomics
calls. With such an approach, you might have a thread that is a worker in the truest sense, constantly performing complex calculations and writing the results to a shared array buffer. You would also have your main thread, which is then mostly communicating via message passing and doing event loop based work. It then might make sense to have simple intermediary threads that call Atomics.wait()
as they wait for another thread to finish doing work, then call postMessage()
to send the resulting data back to the main thread to handle the result at a much higher level.
To summarize the concepts in this section:
Don’t use Atomics.wait()
in the main thread.
Designate which threads are CPU-heavy and use lots of Atomics
calls and which threads are evented.
Consider using simple “bridge” threads to wait and post messages where appropriate.
These are some very high-level guidelines that you can follow when designing your application. But, sometimes some more concrete patterns really help drive the point home. Chapter 6 contains some such patterns you might find beneficial.
1 Atomics.notify()
was originally going to be called Atomics.wake()
like its Linux futex equivalent but was later renamed to prevent visual confusion between “wake” and “wait” methods.