Chapter 6. Extracting data from the browser

There are two things that I absolutely love about running neural networks in the web browser: the ability to use the APIs of the browser to input data into a model for training or testing, as well as rendering the training progress or output of the models such as activations, filters, and model structures to the screen. Both tasks are incredibly powerful and fairly easy to implement in modern browsers.

Note: All of the source code used in this book can be found here: https://github.com/backstopmedia/deep-learning-browser. And, you can access the demo of our Rock Paper Scissors game here: https://reiinakano.github.io/tfjs-rock-paper-scissors/. Also, you can access the demo of our text generation model here: https://reiinakano.github.io/tfjs-lstm-text-generation/.

A great way to present a trained neural network to a user is to let the user interact with the model by creating sample data directly in the browser and feeding it to the network. Thanks to the capabilities of modern browsers, these interactions can range from capturing images with the webcam to drawing sketches on a canvas or recording audio with the built-in microphone.

Besides evaluating pretrained models in the browser, we can even train a gesture classifier using training images streamed from the webcam, train an MNIST-like model by drawing training samples on a canvas, or build a simple voice-recognition network.

In this chapter, we will first cover the loading, extraction and manipulation of image data. In the next step, you will learn how to render image data as well as any two-dimensional data to the screen. We will cover image blending as well as drawing shapes on top of the original images for visualizing object bounding boxes.

In the following section, we will learn how to access the webcam or built-in camera and decode the image data of the video stream or of single images. In addition, we will extract the raw data from the microphone stream. At the end of that section, we will load and decode audio files and output sounds on the device speakers.

In the last section, we take a look at the data retrieval and manipulation utilities of popular deep learning frameworks for the browser, such as TensorFlow.js, Keras.js, and WebDNN. We will learn how these frameworks facilitate data loading, image manipulation, and data conversion by providing useful APIs and utility tools.

Loading image data

The process of loading images in JavaScript is one of the simplest yet most useful techniques for interactively evaluating trained neural networks in the browser. To use an image as input for a machine learning algorithm, the image's pixel values have to be extracted first.

In this section, we will learn how to extract RGBA pixel data from images using the Canvas API. We will load image data not just from DOM elements but also directly from URLs. Next, we will see how to deal with the cross-origin security policy when loading remote resources from within JavaScript. At the end of this section, we will fetch binary blobs from the network and cast them into typed array data types.

Extracting pixels from an image

Let’s define an HTML image tag to load a local image to the DOM.

<img src="data/cat.jpeg" id="img"></img>

Now we can access the image element using the global DOM API document.getElementById('img'), which is provided in every browser. The image element, which is of type HTMLImageElement, does not provide a direct API to extract its pixel values. Please note that the global document object and many other JavaScript APIs are only available in the browser and NOT in JavaScript runtimes without a Document Object Model (DOM), such as Node.js.

For pixel manipulations, modern browsers provide the Canvas API, which can be used to programmatically draw pixel graphics to the screen. We will use this API to extract the pixel values from the image element. To extract data from a canvas element, we first have to create a canvas context. Within this context, we can draw the image content to the canvas and subsequently access and return the canvas pixel data.

Note: You can find an extensive overview of the Canvas API on MDN - Canvas API, including samples for styles, colors, animations, pixel manipulations and optimizations.

Let’s declare a function to implement this.

function loadRgbaDataFromImage(img) {
  // create a canvas element
  const canvas = document.createElement('canvas');

  // set the canvas dimension to the image size
  canvas.width = img.width;
  canvas.height = img.height;

  // create a 2D rendering context
  const ctx = canvas.getContext('2d');

  // render the image to the canvas context
  // at position 0,0 (left, top)
  ctx.drawImage(img, 0, 0, img.width, img.height);

  // extract the image data
  const imgData = ctx.getImageData(0, 0, canvas.width, canvas.height);

  // convert the image data to int32
  return new Int32Array(imgData.data);
}

In the above code, the getImageData function returns an element of type ImageData, which is an object with the attributes width, height, and data. The data attribute stores the pixel values in a Uint8ClampedArray typed array. To use the image data with deep learning algorithms and frameworks at a later stage, we convert the array to an Int32Array.

Finally, we can call the function once the image is loaded. Keep in mind that the image is loaded asynchronously, and therefore we can extract the pixel values only once the image is fully loaded by the browser and the onload event is triggered.

const img = document.getElementById('img');

img.onload = () => {
  const data = loadRgbaDataFromImage(img);
  // Int32Array(40000) [255, 255, 255, 254, ...]
}

The above function returns the raw RGBA pixel values of the original image as a flat array with dimensions (height, width, channel) and values in the range [0, 255].
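As a small illustration (this helper is not part of the original code and assumes that data and img from the snippet above are in scope), the flat layout means that channel c of the pixel at column x and row y lives at index (y * width + x) * 4 + c:

// read channel c (0 = R, 1 = G, 2 = B, 3 = A) of the pixel at (x, y)
// from the flat RGBA array returned by loadRgbaDataFromImage
function getChannel(data, width, x, y, c) {
  return data[(y * width + x) * 4 + c];
}

// e.g. the red value of the pixel in column 10, row 20
const red = getChannel(data, img.width, 10, 20, 0);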

Loading remote resources

Besides using local (demo) images hosted on the same web server, it is often useful to let the user load images from remote locations, for example by changing the URL of an image. However, this comes with a security risk, because content could be loaded from any other source and used within the current context. Therefore, the browser automatically blocks so-called cross-site requests to domains, protocols, or ports other than those of the current connection.

The Cross-Origin Resource Sharing (CORS) policy allows a browser to perform a cross-origin HTTP request to a resource by setting additional HTTP headers. For image elements, use the crossOrigin attribute and set it to anonymous, which explicitly allows the element to load cross-origin resources.

Note: You can find more information about CORS access control, headers, and attributes for images on MDN - HTTP CORS.

Let’s take a look at the example from the previous section. This time we are loading the sample from a different domain.

<img src="https://../cat.jpeg" crossOrigin="anonymous" id="img"></img>

The above markup can be used if the image element already exists. However, we can also easily create an image programmatically from within JavaScript. Due to the asynchronous behavior of loading the image resource, we return a so-called Promise instead of the finished resource. A Promise represents an eventually completed (resolved) or failed (rejected) operation and is often used instead of a callback function.

Let’s create an Image object, set the crossOrigin policy, and return a Promise which resolves the loaded image resource.

function loadImage(url) {
  return new Promise((resolve, reject) => {
    const img = new Image();
    img.crossOrigin = "anonymous";
    img.src = url;
    img.onload = () => resolve(img);
    img.onerror = reject;
  })
}

Finally, we can load the image from a remote resource using the function from the above code snippet and use the loadRgbaDataFromImage function from the previous section to extract the RGBA pixel values of the image. We use the Promise::then() method to continue asynchronously once the Promise resolves with the loaded resource.

const url = "https://foo.bar/cat.jpeg";
loadImage(url).then((img) => {
  const data = loadRgbaDataFromImage(img);
  console.log(data);
  // Int32Array(40000) [255, 255, 255, 254, ...]
});

Using the more elegant async/await syntax, you can consume the Promise without a nested .then() function. Please note that the await keyword can only be used within an async function, so we have to wrap the complete execution block in a self-executing async function call. This looks like overkill in this simple example, but it comes in quite handy when multiple await statements are used.

(async function(){
  const url = "https://../cat.jpeg";
  const img = await loadImage(url);
  const data = loadRgbaDataFromImage(img);
}());

Note: You can find more examples about how and when to use async functions on MDN - Async Function.

In the following code snippets, the self-executing async function block will be skipped for brevity when the await keyword is used.

Fetching binary blobs

Most deep learning frameworks generate large binary blobs for datasets, model weights, activations, and much more. JavaScript is a very versatile language and has built-in support for typed arrays and array buffers. These data structures make working with binary data in the browser quite convenient.

If the data can be dumped as a binary blob of a JavaScript-compatible data type (such as int8, int16, int32, float32, or float64) in any programming language, it can easily be loaded in JavaScript using ArrayBuffer and TypedArray objects, as long as the data type is known. The main advantage of loading binary blobs via an ArrayBuffer into a typed array is that the data does not need to be parsed in JavaScript. This leads to huge improvements over textual representation formats such as CSV or JSON, especially for large files, and it even makes loading the weights of larger models feasible in JavaScript.

Let's see how this works with a simple Python snippet and NumPy. First we generate an array of random values and then dump it to disk as a binary file.

import numpy as np

filename = "data/rand.bin"

# create an array with random values
r = np.random.rand(100, 100)

# write the array to disk
with open(filename, 'wb') as f:
  f.write(r.astype(np.float32).tobytes())

Now, we have a binary blob, rand.bin, and we can go ahead and create a function to fetch binary blobs as array buffers.

async function loadBinaryDataFromUrl(url) {
  const req = new Request(url);
  const res = await fetch(req);

  if (!res.ok) {
    throw Error(res.statusText);
  }

  // return the array buffer representation
  return res.arrayBuffer();
}

We mark the function with the keyword async in order to use the await keyword in the function body. Using await, we wait until the fetch promise is resolved before continuing with the function execution. The fetch response implements the Response::arrayBuffer() method, which returns a promise resolving to the array buffer representation of the response body.

Finally, we can load the rand.bin data using the above function and cast the array buffer into the original data type. Knowing the original array dimensions, we can also visualize the blob with the renderData function, which we will implement in the next section.

const size = 100;
const buf = await loadBinaryDataFromUrl('data/rand.bin');
const data = new Float32Array(buf);

renderData(document.body, data, size, size, false);

Rendering pixel data to the screen

A crucial step in every development, debugging, training, and evaluation process of deep learning models is visualizing results on the screen. Not only can you spot implementation and training errors by visualizing layer activations and filter weights, you can also reason about network performance and gain insights by visualizing the training progress, the class scores, or the activation of the receptive field for single input pixels and regions.

The article The Building Blocks of Interpretability (Source: https://distill.pub/2018/building-blocks/) shows the potential for visualizations in the field of deep learning in a very impressive way. Use these visualizations as motivation to learn and master the skills of rendering data to the canvas.

In this section, we will first learn how to display simple image elements on the screen. As a next step, we will render pixel data either as RGBA for color images or single layers from output activations or filter weights as grayscale images. In addition, we will cover how to blend images for displaying segmentation maps and how to draw common shapes on top of existing images for bounding box visualizations.

Displaying images

Let’s start with the simplest approach and render an image element in the browser. To do so, we only have to append the HTMLImageElement to an element in the DOM, e.g. document.body.

function renderImage(elem, img) {
  // append the image element to the DOM
  elem.append(img);
  return img;
}

Using the above snippet and the loadImage function from the previous section, we can easily load images from remote resources and render them to the screen.

const url = "data/cat.jpeg";
const img = await loadImage(url);

renderImage(document.body, img);

Rendering pixel data to canvas

We continue with a slightly more useful approach to render actual pixel data to the screen. This will be very useful whenever you want to visualize a chunk of data such as output activations or filter weights.

For the first approach, we assume that the data is stored in RGBA format with dimensions (height, width, channel) and values in the range [0, 255], e.g. in an Int32Array. This format is equivalent to the one returned by the loadRgbaDataFromImage function implemented in the previous section.

Let's write a function to render RGBA data. First, we create a canvas element and size it according to the image dimensions. We then create a rendering context and retrieve an ImageData object spanning the whole canvas. Next, we convert the values into a Uint8ClampedArray and overwrite the image data with these values. Finally, we write the image data back to the canvas and append the canvas to the DOM.

function renderRgbaData(elem, data, width, height, smooth) {
  // create a canvas element
  const canvas = document.createElement('canvas');
  canvas.width = width;
  canvas.height = height;

  // create a 2D rendering context
  const ctx = canvas.getContext('2d');

  // get the ImageData object from the canvas
  const img = ctx.getImageData(0, 0, width, height);

  // convert the pixel values
  const vals = new Uint8ClampedArray(data);

  // write the values to the image data
  img.data.set(vals);

  // write the image data to the canvas context
  ctx.putImageData(img, 0, 0);

  // enable/disable automatic smoothing
  ctx.imageSmoothingEnabled = Boolean(smooth);

  // append the canvas element to the DOM
  elem.append(canvas);
  return canvas;
}

In the above function we add a parameter smooth to control the canvas image smoothing, which is enabled by default. However, when drawing filter activations, we want to see the actual pixel values instead of interpolated smoothed values.

Now we can use the above function to render RGBA data to the screen. We use the two previously created functions, loadImage and loadRgbaDataFromImage, to retrieve the RGBA data from an existing image.

const url = "data/cat.jpeg";
const img = await loadImage(url);
const data = loadRgbaDataFromImage(img);

renderRgbaData(document.body, data, img.width, img.height);

The above function works great for RGBA data; however, there is a slight problem: actual filter weights and output activations usually consist of more than three depth channels. So, we usually visualize each channel individually as a grayscale image.

In order to extend the above function to render grayscale images, we replace the value array after the line // convert the pixel values. We need to create a target array vals with the proper dimensions and data type. Then, we iterate through the original array and write the grayscale value to all three RGB channels. The modified function looks like renderRgbaData, except for the following difference:

function renderData(elem, data, width, height, smooth) {
  ...
  const alpha = 255;
  const len = data.length * 4;
  const vals = new Uint8ClampedArray(len);

  for (let x = 0; x < width; ++x) {
    for (let y = 0; y < height; ++y) {

      // compute the index position
      let ix0 = (y*width + x);
      let ix1 = ix0 * 4;

      // transform range [0, 1] to [255, 0]
      let val = (1 - data[ix0]) * 255;

      // write the value to all RGB channels
      // to generate a grayscale image 
      vals[ix1 + 0] = val;   // R
      vals[ix1 + 1] = val;   // G
      vals[ix1 + 2] = val;   // B
      vals[ix1 + 3] = alpha; // A
    }
  }
  ...
}

Using this function, we can now visualize any two-dimensional data blob.

const size = 8;
const data = new Int32Array([
  0, 0, 0, 0, 0, 0, 0, 0,
  0, 0, 0, 0, 0, 0, 0, 0,
  0, 0, 0, 0, 0, 0, 0, 0,
  1, 1, 1, 1, 1, 1, 1, 1,
  1, 1, 1, 1, 1, 1, 1, 1,
  0, 0, 0, 0, 0, 0, 0, 0,
  0, 0, 0, 0, 0, 0, 0, 0,
  0, 0, 0, 0, 0, 0, 0, 0,
]);

renderData(document.body, data, size, size, false);

Interpolating image data

To visualize the results of a segmentation model, we often have to blend two images together, namely the original image and the segmentation mask. Let's implement this as a separate function that acts on two arrays, d0 and d1, of the same dimensions. We define a parameter alpha such that alpha=0 returns d0 and alpha=1 returns d1. All values between 0 and 1 interpolate between the two images.

function interpolateRgba(d0, d1, alpha, width, height, channels){
  const out = new Uint8ClampedArray(d0.length);
  const a0 = 1 - alpha;
  const a1 = alpha;
  for (let x = 0; x < width; ++x) {
    for (let y = 0; y < height; ++y) {
      for (let c = 0; c < channels; ++c) {
        let ix = (y*width + x) * channels + c;
        out[ix] = d0[ix] * a0 + d1[ix] * a1;
      }
    }
  }
  return out;
}

We implement a simple helper function loadRgbaDataFromUrl to retrieve the image data directly from any image by only providing the image’s URL.

async function loadRgbaDataFromUrl(url) {
  const img = await loadImage(url);
  return loadRgbaDataFromImage(img);
}

In addition, we extend the function renderRgbaData such that it can render to an existing canvas element instead of creating and appending a new one every time it gets called.
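A minimal sketch of this extension could look as follows (checking the element type with instanceof is just one possible way to distinguish the two cases, not necessarily the original implementation):

function renderRgbaData(elem, data, width, height, smooth) {
  // reuse an existing canvas if one is passed in,
  // otherwise create a new one and append it to the DOM
  const isCanvas = elem instanceof HTMLCanvasElement;
  const canvas = isCanvas ? elem : document.createElement('canvas');
  canvas.width = width;
  canvas.height = height;

  const ctx = canvas.getContext('2d');
  const img = ctx.getImageData(0, 0, width, height);
  img.data.set(new Uint8ClampedArray(data));
  ctx.putImageData(img, 0, 0);
  ctx.imageSmoothingEnabled = Boolean(smooth);

  if (!isCanvas) {
    elem.append(canvas);
  }
  return canvas;
}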

Finally, we can load the original image and the segmentation mask, overlay the two images and show the overlay whenever the cursor moves over the original image. The code snippet would look like this:

const width = 500;
const height = 375;
const channels = 4;
const canvas = document.getElementById("scene");
const pixels = await loadRgbaDataFromUrl("data/bike.jpg");
const object = await loadRgbaDataFromUrl("data/bike_object.png");

// Compute the interpolated overlay
const layover = interpolateRgba(pixels, object, 0.5, width, height, channels);

// initial render
renderRgbaData(canvas, pixels, width, height);

canvas.onmouseover = () => renderRgbaData(canvas, layover, width, height);
canvas.onmouseleave = () => renderRgbaData(canvas, pixels, width, height);

In the above code, we assume that there exists a canvas element with the id scene in which both images are rendered.
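If the canvas is not created programmatically, the corresponding markup could simply look like this (the id and dimensions match the snippet above):

<canvas id="scene" width="500" height="375"></canvas>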

Drawing shapes to canvas

To visualize the results of a localization task, you need to render geometric shapes, such as bounding boxes, to the canvas as an overlay. Using the canvas API this is quite easy to do in JavaScript.

Let's create a function that adds a rectangle stroke on top of an existing canvas image. We only have to specify the left, top, width, and height dimensions and then use ctx.rect() to define the rectangle. We outline the rectangle with ctx.stroke(), whereas ctx.fill() would fill it.

function addRect(canvas, dims, color) {
  const ctx = canvas.getContext("2d");
  const left = dims[0];
  const top = dims[1];
  const width = dims[2];
  const height = dims[3];

  ctx.strokeStyle = color || 'black';
  // start a new path so previously drawn shapes are not stroked again
  ctx.beginPath();
  ctx.rect(left, top, width, height);
  ctx.stroke();
}

Creating a circle is similar. We pass the parameters cx, cy, radius, startAngle, and endAngle to the ctx.arc() function. cx and cy define the center point of the circle, radius its radius, and startAngle and endAngle define which portion of the arc to draw, which lets us render arc segments. To draw a full circle we default startAngle to 0 and endAngle to 2 * Math.PI.

function addCircle(canvas, dims, color) {
  const ctx = canvas.getContext("2d");
  const cx = dims[0];
  const cy = dims[1];
  const radius = dims[2];
  const startAngle = dims.length > 3 ? dims[3] : 0;
  const endAngle = dims.length > 4 ? dims[4] : 2 * Math.PI;

  ctx.strokeStyle = color || 'black';
  ctx.beginPath();
  ctx.arc(cx, cy, radius, startAngle, endAngle);
  ctx.stroke();
}

Now let’s use both functions to draw a bounding box as well as a circle around the face of the cat.

const url = "data/cat.jpeg";
const img = await loadImage(url);
const data = loadRgbaDataFromImage(img);

const canvas1 = renderRgbaData(document.body, data, img.width, img.height);
addRect(canvas1, [70,20,100,100], "green");

const canvas2 = renderRgbaData(document.body, data, img.width, img.height);
addCircle(canvas2, [120,70,50], "red");

Note: You can find more information about the shapes in the canvas rendering context on MDN - CanvasRenderingContext2D.

Accessing camera, microphone and speakers

The most appealing aspect of using browsers to develop, train, and evaluate deep learning algorithms and applications is the variety and simplicity of the media APIs available in modern browsers. Training gesture recognition on images from the built-in webcam or speech recognition on audio from the built-in microphone is just a few lines of code away.

In this section, we will first access the webcam or built-in camera to display a video stream and access its image data. We can either run a model continuously on the stream of video frames or on single images captured via a button. Next, we will extract data from the microphone in order to feed the raw input data into a deep learning model. Finally, we will use the WebAudio API to load sound files, decode common audio formats such as MP3, WAV, and many more, and play those sounds on the device speakers.

Capturing images from the Webcam

Many deep learning algorithms and applications focus on two-dimensional data such as images and video frames. Most laptops and mobile devices provide a camera, and modern browsers provide fantastic APIs to easily access the camera images from within JavaScript. This greatly facilitates creating interactive and easily accessible deep learning applications for the browser.

The browser's MediaDevices API gives a user access to video and audio devices such as the camera, screen sharing, the microphone, and the speakers. It is part of the more general WebRTC (short for Web Real-Time Communication) API, a standard that enables peer-to-peer teleconferencing without intermediary servers.

Please note that, depending on the browser, you may only be able to access the camera and microphone via WebRTC in a single browser tab at a time. Due to the sensitivity of camera and audio data, WebRTC might only work over HTTPS with a valid certificate. However, most browsers also allow WebRTC access on localhost.

We can start a video stream using the MediaDevices::getUserMedia() function, which returns a promise resolving to a MediaStream object.

navigator.mediaDevices.getUserMedia({ video: true, audio: false })
  .then((stream) => { ... });

To extract the data from the MediaStream, we have to attach the stream to a video player element. Let's create such an element and feed the stream into the player. If you want the video playback to be visible, you can also append the player to the DOM.

const player = document.createElement('video');

// if the video playback should be visible
// document.body.append(player);

navigator.mediaDevices.getUserMedia({ video: true, audio: false })
  .then((stream) => {
    player.srcObject = stream;
    // start playback so the video element delivers frames
    player.play();
  });

Finally, we can extract the content from the video element the same as we did with the img element in the previous section. We render the image to a canvas in the first step, and then extract the ImageData object in the second step.

function loadRgbaDataFromImage(img, width, height) {
  const canvas = document.createElement('canvas');
  canvas.width = width;
  canvas.height = height;
  const ctx = canvas.getContext('2d');
  ctx.drawImage(img, 0, 0, width, height);
  const imgData = ctx.getImageData(0, 0, width, height);
  return new Int32Array(imgData.data);
}

const width = 240;
const height = 160;
const data = loadRgbaDataFromImage(player, width, height);

Note: You can also access the content from a video element in WebGL directly without using a canvas element. To do so, you can bind the video element to a 2D Texture using the WebGLRenderingContext.texImage2D method.
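As a rough sketch (assuming the player video element from above is already playing), uploading the current video frame into a WebGL texture could look like this:

// create a WebGL context and a texture,
// then upload the current video frame into the texture
const gl = document.createElement('canvas').getContext('webgl');
const texture = gl.createTexture();
gl.bindTexture(gl.TEXTURE_2D, texture);

// flip the y-axis so the texture matches the usual image orientation
gl.pixelStorei(gl.UNPACK_FLIP_Y_WEBGL, true);

// the video element can be passed directly as the pixel source
gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, gl.RGBA, gl.UNSIGNED_BYTE, player);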

Recording Audio from the Microphone

Using the same MediaDevices API as in the previous section, we can also access a stream from the microphone. However, instead of using a video element to process the data, we use the WebAudio API, which is a very flexible graph-based audio processing API. It allows us to create audio sources and processors, and to route audio between these nodes and to the speakers.

Let’s first start by retrieving the audio stream using the MediaDevices::getUserMedia() function. We also need to define the global AudioContext.

const audioContext = new AudioContext();

function onStream() { ... }

navigator.mediaDevices.getUserMedia({ audio: true, video: false })
  .then(onStream);

Next, we set up a very simple audio graph consisting of an input (the microphone stream), a simple processor, and the default output. We also need to define the properties of the processor, including the number of input and output channels as well as the buffer size of the audio chunks.

function onProcess() { ... }

const bufferSize = 4096; // 256, 512, 1024, 2048, 4096, 8192, 16384
const numInputChannels = 1;
const numOutputChannels = 1;

function onStream(stream) {
  const source = audioContext.createMediaStreamSource(stream);
  const processor = audioContext.createScriptProcessor(
    bufferSize, numInputChannels, numOutputChannels);

  // connect the source to the processor
  source.connect(processor);

  // connect the processor to the output (the speakers)
  processor.connect(audioContext.destination);

  processor.onaudioprocess = onProcess;
};

As we can see in the above snippet, all audio processing is performed on an audio graph, which is built on the AudioContext API. To record audio and feed it to a deep neural network, we need a source, a processor, and a destination as nodes on the audio graph, connected in series. The processing itself is then performed in the onProcess function.

function onProcess(e) {
  const data = e.inputBuffer.getChannelData(0);

  console.log(e.inputBuffer, data);
}

Using the AudioBuffer.getChannelData() function, we retrieve the raw audio data as a Float32Array of length bufferSize. The AudioBuffer object e.inputBuffer also gives us access to the duration, sampleRate, and numberOfChannels properties.

We can now use this data for further processing or feed it directly to a network. If a single chunk of bufferSize samples is not long enough, we can create a larger array and copy consecutive chunks into the corresponding positions of that array, as sketched below.
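A minimal sketch, assuming we want to collect roughly one second of mono audio before feeding it to a model (the recording length of audioContext.sampleRate samples is an arbitrary choice for this example):

// collect about one second of audio into a single Float32Array
const recording = new Float32Array(audioContext.sampleRate);
let offset = 0;

function onProcess(e) {
  const chunk = e.inputBuffer.getChannelData(0);
  if (offset + chunk.length <= recording.length) {
    // copy the current chunk to its position in the larger array
    recording.set(chunk, offset);
    offset += chunk.length;
  }
}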

Loading, decoding and playing sounds

Using the WebAudio API and the AudioContext, we can also load and decode common audio formats in the browser using the AudioContext::decodeAudioData() function. This is very useful when we want to, for example, load an MP3-encoded audio sample and feed it into a deep neural network.

In this sample we will reuse the loadBinaryDataFromUrl function from a previous section to load the sound file as an ArrayBuffer. Let's write a function that wraps the AudioContext::decodeAudioData() call in a Promise. This method can decode all audio formats that are supported by the browser's HTML5 audio and video elements.

const audioContext = new AudioContext();

function decodeAudio(data) {
  return new Promise((resolve, reject) => {
    // decode the array buffer using any supported audio format;
    // reject the promise if decoding fails
    audioContext.decodeAudioData(data, resolve, reject);
  });
}

To test the above snippet, we also implement a minimalistic audio graph that outputs audio from a buffer to the speakers.

function playSound(buffer) {
  const source = audioContext.createBufferSource();
  source.buffer = buffer;
  source.connect(audioContext.destination);
  source.start(0);
}

Finally, we can test the functions to load a sample sound and output it to the speakers.

const url = "data/Large-dog-barks.mp3";
const data = await loadBinaryDataFromUrl(url);
const audio = await decodeAudio(data);

playSound(audio);

The audio object in the above snippet is of type AudioBuffer, exactly as in the previous section. We can now use the AudioBuffer.getChannelData() method to extract the audio arrays of the two channels of this (stereo) MP3 file.

const c0 = audio.getChannelData(0);
const c1 = audio.getChannelData(1);

Utility tools in deep learning frameworks

In this section we will learn about the utility tools for data loading and manipulation offered by the popular deep learning frameworks for the browser: TensorFlow.js, Keras.js, and WebDNN. As we saw in the previous chapters, each framework uses an abstraction on top of the TypedArray object to store tensor variables in a flat array. Most of these frameworks provide utility tools to load, create, resize, and visualize data. Let's take a look!

TensorFlow.js

TensorFlow.js abstracts data as tf.Tensor objects, which contain the raw data, the tensor shape, and the data type. It doesn't provide a utility for loading images from URLs, but it provides the tf.fromPixels function to convert image and image-like elements (video, image, canvas, etc.) into tf.Tensor objects. We can also use the tf.Tensor.print() function to print a tensor to the developer console.

const url = "data/cat.jpeg";
const img = await loadImage(url);
const data = tf.fromPixels(img);

data.print();

If the image needs to be resized, we can use the tf.image.resizeBilinear function.

const dataResized = tf.image.resizeBilinear(data, [100, 100]);

Converting a single image into a batch of one image doesn't affect the underlying raw data, only the tensor shape. In TensorFlow.js we can use the tf.Tensor.expandDims function to insert a dimension of size one along a given axis; for the batch dimension, that is axis 0.

const dataBatch = data.expandDims(0); // shape [1, height, width, 3]

To load binary data and parse it into a tensor in TensorFlow.js, we can call the default tf.tensor constructor. In the following example we use the loadBinaryDataFromUrl function, implemented earlier in this chapter, to load a binary blob in JavaScript. In this case, the blob contains a matrix, and hence we pass the two-dimensional shape explicitly (we could equivalently use the tf.tensor2d constructor).

const size = 100;
const buf = await loadBinaryDataFromUrl('data/rand.bin');
const data = tf.tensor(new Float32Array(buf), [size, size]);

data.print();

To render a tensor to the screen (to a canvas element), we can use the tf.toPixels function. Let’s write a wrapper that creates a new canvas element and renders the tensor to the canvas.

async function render(rootElem, data) {
  const canvas = document.createElement('canvas');
  rootElem.append(canvas);

  await tf.toPixels(data, canvas);
  return canvas;
}

In the above code you can see that tf.toPixels returns a promise. The reason is that accessing the tensor data using tensor.data() is an asynchronous operation in TensorFlow.js. Hence we make the render function asynchronous as well. Finally, we can use this function to render tensors to the screen.

await render(document.body, data);

Keras.js

Keras.js uses the ndarray library under the hood to abstract tensors on top of TypedArrays. ndarray is a modular multidimensional array implementation for JavaScript that should make it easy for MATLAB or NumPy users to get started with array and matrix operations in JavaScript.

An ndarray object holding the pixel data of an image can be created like the following:

const url = "data/cat.jpeg";
const img = await loadImage(url);

// extract the RGBA pixels first, then wrap them in an ndarray
const pixels = loadRgbaDataFromImage(img);
const data = ndarray(new Float32Array(pixels), [img.height, img.width, 4]);

The ndarray core API provides a lot of functionality for slicing, transposing, reversing and reshaping arrays. However, much more functionality is available in multiple ndarray-* packages.
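A few examples of what the core API offers, using the data array from above (note that these calls return lightweight views on the same underlying buffer, not copies):

// view of the red channel only, shape [height, width]
const red = data.pick(null, null, 0);

// 10x10 crop of the upper-left corner (all four channels)
const topLeft = data.hi(10, 10, 4);

// vertically flipped view of the image
const flipped = data.step(-1, 1, 1);

// swap the first two axes, i.e. transpose height and width
const transposed = data.transpose(1, 0, 2);

console.log(red.get(0, 0), red.shape);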

WebDNN

WebDNN provides a lot of image loading, parsing, and transformation capabilities under the WebDNN.Image namespace. It contains the very convenient function WebDNN.Image.getImageArray for loading an image array from a URL, parsing it, and resizing it to a defined shape. In the same manner, the function WebDNN.Image.setImageArrayToCanvas provides the inverse functionality: rendering an image array to a canvas element. The following snippets also use a small createCanvas helper, sketched below.
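A minimal sketch of such a helper (createCanvas is not part of WebDNN; it simply creates a canvas of the given size, appends it to the DOM, and returns it):

function createCanvas(elem, width, height) {
  const canvas = document.createElement('canvas');
  canvas.width = width;
  canvas.height = height;
  elem.append(canvas);
  return canvas;
}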

const url = "data/cat.jpeg";
const img = await WebDNN.Image.getImageArray(url, { dstW: 256, dstH: 256 });

const canvas01 = createCanvas(document.body, 256, 256);
WebDNN.Image.setImageArrayToCanvas(img, 256, 256, canvas01)

const canvas02 = createCanvas(document.body, 100, 100);
WebDNN.Image.setImageArrayToCanvas(img, 256, 256, canvas02, { dstW: 100, dstH: 100 });

WebDNN can easily deal with binary data as well. To visualize 2D weights as a grayscale image, we can simply add the option {color: WebDNN.Image.Color.GREY}.

const size = 100;
const buf = await loadBinaryDataFromUrl('data/rand.bin');
const img = new Float32Array(buf);

const canvas01 = createCanvas(document.body, 100, 100);
WebDNN.Image.setImageArrayToCanvas(img, 100, 100, canvas01, {
  color: WebDNN.Image.Color.GREY, dstW: 100, dstH: 100, scale: [255], bias: [-1]
});

const canvas02 = createCanvas(document.body, 256, 256);
WebDNN.Image.setImageArrayToCanvas(img, 100, 100, canvas02, {
  color: WebDNN.Image.Color.GREY, dstW: 256, dstH: 256, scale: [255], bias: [-1]
});

Summary

In this chapter you learned how to extract data, such as images from URLs, images from the webcam, and audio from the microphone, all from within the browser. When loading data from other domains, we need to set the crossOrigin attribute accordingly to allow cross-site requests.

Binary blobs of data can be easily and very efficiently fetched and parsed into TypedArray data structures using the Fetch API and the Response::arrayBuffer method.

We need to use the Canvas API to transform images and videos into image data or to render image data to the screen. On top of the canvas element, we can draw shapes to visualize object positions or interpolate between two images to visualize the result of a segmentation model.

You also saw how to use the utility functions of TensorFlow.js, Keras.js and WebDNN to load and parse image, audio and binary data using the respective abstractions on top of the TypedArray object.

In the following chapter we will go into more detail and take a look at practical applications and building blocks for advanced data manipulation. We will parse complete Caffe and TensorFlow model graphs from within JavaScript using protobuf.js, learn how to draw charts using Chart.js, and extract spectrogram features from an audio feed.
