Sending binary data in the browser

While message passing is a great way to send data, there are some problems when it comes to sending very large objects across the channel. For instance, let's say we have a dedicated worker that makes requests on our behalf and also caches some of that data inside itself. It could potentially hold thousands of records. While the worker would already be taking up quite a bit of memory, as soon as we utilize postMessage we will see two things:

  • The amount of time it takes to move the object is going to be long
  • Our memory is going to increase dramatically

The reason for this is the structured clone algorithm that browsers use to send the data. Instead of just moving the data across the channel, the browser serializes and deserializes our object, essentially creating multiple copies of it. On top of this, we have no idea when the garbage collector is going to run, since we know it is non-deterministic.
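Modern browsers expose this same algorithm through the structuredClone global, which makes the copying behavior easy to demonstrate (a minimal illustration):

const original = { nested: { value: 1 } };
const copy = structuredClone(original);    // the same algorithm that postMessage uses
copy.nested.value = 2;
console.log(original.nested.value);        // still 1 -- a deep, independent copy was made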

We can actually see the copying process in the browser. If we create a worker called largeObject.js and move a giant payload, we can measure the time it takes by utilizing the Date.now() method. On top of this, we can utilize the recording system in the developer tools, as we learned in Chapter 1, Tools for High Performance on the Web, to profile the amount of memory that we use. Let's set this test case up:

  1. Create a new worker and assign it a large object. In this case, we are going to use a 100,000-element array that is storing objects inside of it:
const dataToSend = new Array(100000);
const baseObj = {prop1 : 1, prop2 : 'one'};
for(let i = 0; i < dataToSend.length; i++) {
    dataToSend[i] = Object.assign({}, baseObj);
    dataToSend[i].prop1 = i;
    dataToSend[i].prop2 = `Data for ${i}`;
}
console.log('send at', Date.now());
postMessage(dataToSend);
  2. We now add some code to our HTML file to launch this worker and listen for the message. We will mark when the message arrives and then profile the code to see the increase in memory:
const largeWorker = new Worker('largeObject.js');
largeWorker.onmessage = function(ev) {
    console.log('the time is', Date.now());
    const obj = ev.data;
}

If we now load this up into our browser and profile our code, we should see results similar to the following. The message took anywhere between 800 ms and 1.7 s, and the heap size grew to anywhere between 80 MB and 100 MB. While this payload is definitely larger than most applications will ever send, it showcases the issues with this type of message passing.

A solution to this is to use the transferable portion of the postMessage method. This allows us to send a binary data type across the channel and, instead of copying it, the channel actually just transfers the object. This means that the sender no longer has access to it, but the receiver does. A way to think about this is that the sender puts the data in a holding location and tells the receiver where it is. At this point, the sender can no longer access it. The receiver notices that it has a location to look for the data, goes to that location, and grabs it, thereby fulfilling the data transfer mechanism.
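We can see this hand-off in just a few lines (a minimal sketch that assumes a worker instance named worker already exists):

const buffer = new ArrayBuffer(1024 * 1024);   // 1 MB of raw binary data
console.log('before:', buffer.byteLength);     // 1048576
worker.postMessage(buffer, [buffer]);          // the second argument is the transfer list
console.log('after:', buffer.byteLength);      // 0 -- the sender's buffer is now detached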

Let's go ahead and code a simple example. We will take our heavy worker and populate it with a bunch of data; in this case, a list of the numbers from 1 to 1,000,000:

  1. We create an Int32Array with 1,000,000 elements. We then add all of the numbers from 1 through 1,000,000 to it:
const viewOfData = new Int32Array(1000000);
for(let i = 1; i <= viewOfData.length; i++) {
    viewOfData[i-1] = i;
}
  2. We will then send that data by utilizing the transfer list argument of postMessage. Note that we have to pass the underlying ArrayBuffer. We will discuss this shortly:
postMessage(viewOfData, [viewOfData.buffer]);
  3. We will receive the data on the main thread and write out the byte length of that data:
// inside our onmessage handler on the main thread
const obj = ev.data;
console.log('data length', obj.byteLength);

We will notice that the time it takes to transfer this large chunk of data is almost unnoticeable. This is because of the preceding theory: the data is just boxed up and put to the side for the receiver.

An aside is needed for typed arrays and ArrayBuffers. An ArrayBuffer can be thought of as a buffer in Node.js. It is the lowest form of storing data and directly holds the raw bytes. However, to truly utilize it, we need to put a view on the ArrayBuffer; that is, we need to give meaning to those bytes. In our case, we are saying that it stores signed 32-bit integers. We can put all sorts of views over an ArrayBuffer, just like we can interpret buffers in Node.js in different ways. The best way to think about this is that the ArrayBuffer is the low-level storage that we rarely want to touch directly, while the views are the system that gives meaning to the underlying data.
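The following sketch shows two different views interpreting the same eight raw bytes:

const buf = new ArrayBuffer(8);            // 8 raw bytes with no inherent meaning
const asInt32 = new Int32Array(buf);       // view the bytes as two signed 32-bit integers
const asUint8 = new Uint8Array(buf);       // view the same bytes as eight unsigned bytes
asInt32[0] = 257;                          // 257 is 0x00000101 in hexadecimal
console.log(asUint8[0], asUint8[1]);       // 1 1 on little-endian systems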

With this in mind, if we check the byte length of the Int32Array on the worker side, we will see that it is zero. We no longer have access to that data, just as we said. To further utilize this feature before heading on to SharedWorkers and SharedArrayBuffers, we will modify our factorization program to utilize the transferable system to send the factors across:

  1. We will utilize almost exactly the same logic, except instead of sending over the plain array that we have, we will send over an Int32Array:
if( typeof ev.data === 'number' ) {
    const result = calculatePrimes(ev.data);
    const send = new Int32Array(result);       // copy the plain array into a typed array
    this.postMessage(send, [send.buffer]);     // send the typed array and transfer its buffer
}
  2. Now we will update our receiving end to handle the binary data being sent instead of a plain array:
if( typeof ev.data === 'object' ) {
    const data = new Int32Array(ev.data);      // rebuild a typed array view over the received data
    answer.innerText = data.join(' ');
}

If we test this code out, we will see that it works just the same, but we are no longer copying the data across; we are handing it to the main thread, thereby making the message passing faster and using less memory.

The main idea is that, if we are just sending results or we need to be as quick as possible, we should try to utilize the transferable system for sending data. If we have to use the data in the worker after sending it, or if there is no simple way to get the data into a binary form (we have no serialization technique), we can utilize the normal postMessage system.
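As a rule of thumb, that decision can be captured in a small helper (a hypothetical sketch; sendToWorker is our own function, not a standard API):

// Transfer when the payload is already a typed array; otherwise fall back to cloning.
function sendToWorker(worker, payload) {
    if (ArrayBuffer.isView(payload)) {
        worker.postMessage(payload, [payload.buffer]); // hand off the buffer, no copy
    } else {
        worker.postMessage(payload);                   // structured clone
    }
}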

While the transferable system can reduce our memory footprint, it can also increase transfer times depending on the amount of data transformation we need to apply. If we already have binary data, this is great, but if we have JSON data that needs to be moved, it may be better to just send it in that form instead of going through many intermediary transformations.
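For example, making JSON-like data transferable requires an encode step on one side and a decode step on the other (a sketch using the standard TextEncoder and TextDecoder APIs; the worker instance is assumed):

const records = [{ id: 1, name: 'one' }];
// Turning the records into binary just so that they can be transferred...
const encoded = new TextEncoder().encode(JSON.stringify(records)); // a Uint8Array
worker.postMessage(encoded, [encoded.buffer]);
// ...forces the receiver to reverse both steps:
// const parsed = JSON.parse(new TextDecoder().decode(ev.data));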

With all of these ideas in mind, let's take a look at the SharedWorker system and the SharedArrayBuffer system. Both of these systems, especially SharedArrayBuffer, have led to some issues in the past (we will discuss this in the following section), but if we utilize them carefully, we will be able to leverage their capability as a good message-passing and data-sharing mechanism.
