Chapter 8. Splitting Up Work Through Web Workers

JavaScript has run in a single thread since its inception. That was practical for small applications, but it now runs up against real limits as larger and larger applications are loaded into browsers. As more and more JavaScript runs, the application starts to block while it waits for code to finish.

JavaScript runs code from an event loop that takes events off a queue of all the events that have happened in the browser. Whenever the JavaScript runtime is idle, it takes the first event off the queue and runs the handler that goes with that event (see Figure 8-1). As long as those handlers run quickly, this makes for a responsive user experience.
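The toy model below is not how a browser implements its event loop; it only illustrates the run-to-completion behavior just described. EventLoop, post(), and runUntilIdle() are hypothetical names. Note that an event posted by a handler goes to the back of the queue and waits behind everything already there.

```javascript
// Toy model of an event loop: a FIFO queue of events, each handler run to completion.
function EventLoop() {
  this.queue = [];
  this.log = [];
}

// Add an event (a name plus its handler) to the back of the queue.
EventLoop.prototype.post = function (name, handler) {
  this.queue.push({ name: name, handler: handler });
};

// While the "runtime" is idle, take the first event off the queue and run it.
EventLoop.prototype.runUntilIdle = function () {
  while (this.queue.length > 0) {
    var event = this.queue.shift();
    this.log.push(event.name);
    event.handler(this); // nothing else runs until this handler returns
  }
};
```

For example, if a "click" handler posts an "ajax-done" event while a "timer" event is already waiting, the log reads click, timer, ajax-done.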


Figure 8-1. Event loop

In the past few years, the competition among browsers has in part revolved around the speed of JavaScript. In Chrome and Firefox, JavaScript can now run as much as 100 times faster than it did back in the days of IE 6. Because of this, it is possible to squeeze more into the event loop.

Thankfully, most of the things JavaScript has to do are fast. They typically involve manipulating some data and passing it into the DOM, or making an Ajax call. So the model in Figure 8-1 works pretty well. For things that would take longer than a fraction of a second to compute, a number of tricks can prevent bottlenecks from affecting the user experience.

The main trick is to break the computation into small steps and run each one as an independent job on the queue. Each step ends by scheduling the next step after a short delay, say 1/100 of a second. This prevents the task from locking up the event queue. But it’s still fundamentally unsatisfactory, as it puts the work of the task scheduler onto the programmer. Tuning this solution to make it effective is also demanding. If each step does too much work, or the delays between steps are too short, the computation can still clog up the event queue and cause other tasks to lag behind. Things will still happen, but the user will feel the lag as the system fails to respond right away to clicks and other user-visible activities. On the other hand, if the delays between steps are too long, the computation will take a very long time to complete, and the user will be left waiting for her results.
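A minimal sketch of this trick, using setTimeout() to yield to the event loop between slices of work. The function sumInChunks and its parameters are hypothetical, and summing integers simply stands in for a real computation; chunkSize and delayMs are the two knobs whose tuning the paragraph above describes.

```javascript
// Sum the integers 0..total-1, doing at most chunkSize additions per slice
// and letting other queued events run between slices.
function sumInChunks(total, chunkSize, delayMs, done) {
  var i = 0;
  var sum = 0;

  function step() {
    var end = Math.min(i + chunkSize, total);
    for (; i < end; i += 1) {
      sum += i;
    }
    if (i < total) {
      setTimeout(step, delayMs); // give the event loop a turn, then continue
    } else {
      done(sum);
    }
  }

  step(); // the first slice runs immediately
}
```

Only the first slice runs synchronously; everything after it is queued, so clicks and other events can be handled in between.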

Google Gears created the idea of the “worker pool,” which has turned into the HTML5 Web Worker. The interfaces are somewhat different, but the basic ideas are the same. A worker is a separate JavaScript process that can perform computations and pass messages back and forth with the main process and other workers. A Web Worker differs from a thread in Java or Python in one key aspect of design: there is no shared state. The workers and the main JavaScript instance can communicate only by passing messages.

That one difference leads to a number of key programming practices, most of them simpler than those of thread programming. Web Workers have no need for mutexes, locks, or synchronization, and the deadlocks and race conditions that come from threads sharing memory can’t occur. This also means you can use the huge number of JavaScript packages out there without worrying whether they are thread-safe. The only changes to the browser’s JavaScript environment are a few new methods and events.

Each worker (including the main window) maintains an independent event loop. Whenever there is no code running, the JavaScript runtime returns to this event loop and takes the first message out of the queue. If there are no events in the queue, it will wait until an event arrives and then handle it. If some piece of code is running for a long time, no events will be handled until that piece of code is finished. In the main window, this will result in the browser user interface locking up. (Some browsers will offer to let you stop JavaScript at this point.) In a worker, a long task will keep the worker from accepting any new events. However, the main window, and any other workers, will continue to be responsive.

This design choice does, however, place some restrictions on the worker processes themselves. First, workers do not have access to the DOM. This also means a worker can’t use the Firebug console interface, as Firebug communicates with JavaScript by way of the DOM. Finally, JavaScript debuggers cannot access workers, so there is no way to step through code or do any of the other things that would normally be done in the debugger.

Web Worker Use Cases

Until recently, the types of applications run on the Web, together with the limitations of the web browser environment, meant there was little computation heavy enough to call for a Web Worker. Most web applications manipulated small amounts of data consisting mostly of text and numbers, and in these cases a Web Worker type of construct is of limited use. Now JavaScript is asked to do a lot more, and many common situations can benefit from spawning new tasks.

Graphics

The HTML5 <svg> and <canvas> tags allow JavaScript to manipulate images, potentially a computationally heavy task. Although web browsers have been able to display images since the release of the Mosaic browser around 1993, the browsers couldn’t manipulate those images. If a web programmer wanted to distort an image, overlay it transparently, and so forth, it could not be done in the browser. With the <img> tag, all the browser could do was substitute a different image by changing the src attribute, or change the displayed size of the image. The browser gave JavaScript no way of knowing what the image was or of accessing the raw data that made up the image.

The recently added <canvas> tag makes it possible to import an existing image into a canvas and export the raw data back into JavaScript for processing, as long as the image was loaded from the same server as the page it is on. It is also possible to export a frame from a video in the HTML5 <video> tag.[1]

Once the data has been extracted from a graphic, you can pass it to a worker for post-processing. This could be useful for doing anything from cleaning up an image to doing a Fourier transform on a scientific data set. Canvas makes it possible to build complex image editing through various filters written in JavaScript, which should often use Web Workers for better performance.
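As a sketch of the kind of post-processing a worker could do, the function below converts the RGBA bytes returned by a canvas context's getImageData() to grayscale. The page would post imageData.data to the worker and draw the returned bytes with putImageData(); this assumes a browser that can pass typed arrays through postMessage(), since older ones accept only strings. The name toGrayscale is hypothetical, and the weights are the common Rec. 601 luma coefficients.

```javascript
// Convert RGBA pixel data (4 bytes per pixel) to grayscale in place.
function toGrayscale(pixels) {
  for (var i = 0; i < pixels.length; i += 4) {
    // Weighted sum of red, green, and blue (Rec. 601 luma)
    var gray = 0.299 * pixels[i] + 0.587 * pixels[i + 1] + 0.114 * pixels[i + 2];
    // Uint8ClampedArray rounds the value on assignment; alpha (i + 3) is untouched
    pixels[i] = pixels[i + 1] = pixels[i + 2] = gray;
  }
  return pixels;
}

// importScripts() exists only inside a worker, so this wiring runs only there.
if (typeof importScripts === "function") {
  self.onmessage = function (event) {
    self.postMessage(toGrayscale(event.data));
  };
}
```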

Maps

In addition to graphics, JavaScript now has APIs for handling map data. Being able to import a map from the Internet and find out the user’s current location via geolocation makes a wide range of web application services possible.

Suppose you build a route finder into a mobile browser. It would be very nice to be able to take your phone and tell it you wish to go to “#14 King George St, Tel Aviv” and have the browser figure out where you are, direct you to the nearest bus stop, and tell you that you should take the number 82 bus to get there from the Diamond District in Ramat Gan.

An even more complex version of that software might check traffic to tell you that a different bus might take a more roundabout route and leave you a block from your destination, but probably run faster by missing a major traffic snarl.

Using Web Workers

To start up a Web Worker, create a new Worker object and pass, as the parameter to the call, the file that contains the code (see Example 8-1). This will create a worker from the source file.

Example 8-1. Worker example

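$(document).ready(function (){
   // Start a worker running the code in worker.js
   var worker = new Worker('worker.js');
   // Called whenever the worker posts a message back to this page
   worker.onmessage = function (event){
       console.info(event.data);
   };
   worker.postMessage("World");
});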
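Example 8-1 shows only the page side; the worker.js file it loads is not shown. A minimal hypothetical version that answers each message with a greeting might look like the following (makeGreeting is an illustrative name, not part of any API).

```javascript
// worker.js: hypothetical companion to Example 8-1
function makeGreeting(name) {
  return "Hello, " + name;
}

// importScripts() exists only inside a worker, so this guard keeps the
// file loadable elsewhere (for example, in a unit test).
if (typeof importScripts === "function") {
  self.onmessage = function (event) {
    // event.data holds whatever the page passed to postMessage()
    self.postMessage(makeGreeting(event.data));
  };
}
```

With this worker, the page in Example 8-1 would log "Hello, World".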

The browser will load the worker, run any code that is not in an event handler, and then launch the event loop to wait for events. The main event to be concerned with is the message event, which is how you send data to the worker. The main thread sends the message by issuing postMessage() and passing data as the argument.

The data from the main thread is held in the event.data field. The worker receives it by assigning a handler to its onmessage property; the browser calls that handler with a message event whenever the main thread posts data.

The Worker Environment

Web Workers run in a pretty minimal environment. Many of the familiar objects and interfaces of JavaScript in the browser are missing, including the DOM, the document object, and the window object.

In addition to the standard ECMAScript objects like String, Array, and Date, the following objects and interfaces are available to the Web Worker:

  • The navigator object, which contains four properties: appName, appVersion, userAgent, and platform

  • The location object, with all properties read-only

  • The self object, which refers to the worker’s own global scope

  • The importScripts() method

  • The XMLHttpRequest interface for making Ajax requests

  • setTimeout() and setInterval()

  • The close() method, which ends the worker process

ECMAScript 5 JSON interfaces can also be used, as they are part of the language, not the browser environment. Furthermore, the worker can import library scripts from the server with the importScripts() method. This method takes a list of one or more files, which are then loaded. This has the same effect as using a <script> tag in the main user interface thread. Unlike most ways of loading resources in browser JavaScript, importScripts() is blocking: the function will not return until all the listed scripts have been loaded. importScripts() executes the loaded files in the order in which they were specified.
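Because the worker environment is so minimal, it can help to check what a given browser actually provides. The helper below is a hypothetical sketch (probeEnvironment and WORKER_APIS are not standard names) that reports which of the interfaces listed above exist in a scope; inside a worker it sends that report to the page.

```javascript
// Interfaces from the list above that a worker scope is expected to provide
var WORKER_APIS = ["navigator", "location", "importScripts", "XMLHttpRequest",
                   "setTimeout", "setInterval", "close", "JSON"];

// Return {name: true/false} for each interface, plus whether a DOM document exists.
function probeEnvironment(scope) {
  var report = {};
  WORKER_APIS.forEach(function (name) {
    report[name] = typeof scope[name] !== "undefined";
  });
  // Workers have no DOM, so this should be false inside one
  report.hasDocument = typeof scope.document !== "undefined";
  return report;
}

// Inside a worker, send the report to the page as a JSON string.
if (typeof importScripts === "function") {
  self.postMessage(JSON.stringify(probeEnvironment(self)));
}
```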

Although localStorage and sessionStorage are not accessible from the Web Worker, IndexedDB databases are (see Chapter 5). In addition, the IndexedDB specification says that the blocking forms of calls can be used in a Web Worker (but not in the main window). So if you want a worker to manipulate data through IndexedDB, it would make sense to load the new data into the database and then send an “updated” message to the main window or other workers so that they can take any needed actions.

Worker Communication

The main event that concerns a worker is the message event, which is sent to the worker from the postMessage method in the main JavaScript context to pass information. In Firefox, it is possible to pass complex JavaScript objects. However, some versions of Chrome and Safari support only simple data, such as strings, Booleans, and numbers. It is good practice to encode all data into JSON before sending it to a Web Worker.

The worker can send data back to the main thread via the same postMessage method, and receive it back in the main thread via the worker.onmessage handler.
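A minimal sketch of the JSON-everything practice, using the same {type, data} envelope that the Mandelbrot example in this chapter uses. The helpers encodeMessage and decodeMessage are hypothetical. Decoding accepts either a string or an already-structured object, so the same handler works in browsers that pass only strings and in those that pass objects.

```javascript
// Wrap a payload in a typed envelope and encode it as a JSON string.
function encodeMessage(type, payload) {
  return JSON.stringify({ type: type, data: payload });
}

// Decode an incoming event.data value, whether it arrived as a string or an object.
function decodeMessage(raw) {
  var msg = typeof raw === "string" ? JSON.parse(raw) : raw;
  if (!msg || typeof msg.type !== "string") {
    throw new Error("message has no type field");
  }
  return msg;
}
```

Either side can then call postMessage(encodeMessage("log", "started")) and dispatch on decodeMessage(event.data).type in its onmessage handler.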

The model for worker communication is that the main task creates the worker, after which they pass messages back and forth as shown in Figure 8-2.


Figure 8-2. Worker communication

Web Worker Fractal Example

Example 8-1 is the “Hello World” of Web Workers. A more complex example is called for. Figure 8-3 shows a visual representation of a Mandelbrot set computed in a Web Worker. Here the worker and the main thread split up the work to draw the fractal. The worker does the actual work of computing the Mandelbrot set, while the frontend script takes that raw data and displays it in the canvas.


Figure 8-3. Mandelbrot example

The frontend script (see Example 8-2) sets up the canvas element and scales it to fit in the page. Then it creates an object to wrap the worker interface. The wrapper object creates the worker in the wrapper’s run() method, passing to the worker a parameter block that tells it what chunk of the Mandelbrot set to compute.

The draw method takes the data, scales it to fit onto the canvas, sets a color, and then draws the pixel.

Note

The HTML Canvas does not have a “draw pixel” command, so to draw a pixel we must draw a square of size 1 with fillRect(). The locations on the canvas grid are not the pixels on the screen but the points between them, so the pixel at (20,20) is the square that extends from (20,20) to (21,21), and fillRect(20, 20, 1, 1) fills exactly that pixel. Offsetting the square by half a unit would smear it across four pixels; the half-pixel offset is needed only when stroking 1-pixel-wide lines, which are centered on their coordinates.

The onmessage handler then waits for events to be sent from the worker. If the event type is draw, the handler calls the method to draw the new data into the canvas. If the event is log, it is logged to the JavaScript console via console.info(). This provides a very simple method to log status information from a worker.

The startWorker method aliases this to a local variable named that, because this is not lexically scoped like other JavaScript variables: inside the onmessage handler, this refers to the worker, not to the wrapper object. To give the inner function access to the wrapper object, which it needs in order to draw pixels, the object must be aliased to a lexically scoped variable. By convention that variable is often called that.
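The same aliasing pattern in isolation, with a hypothetical painter object and a plain object standing in for the Worker. When the handler is invoked as source.onmessage(event), this inside it is source, so the handler reaches the painter through that.

```javascript
var painter = {
  drawn: 0,

  draw: function (points) {
    this.drawn += points.length;
  },

  listen: function (source) {
    var that = this; // inside the handler below, `this` is `source`, not `painter`
    source.onmessage = function (event) {
      that.draw(JSON.parse(event.data));
    };
  }
};
```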

Example 8-2. Mandelbrot frontend

var drawMandelSet = function drawMandelSet(){

    var mandelPanel = $('body');

    var width = mandelPanel.innerWidth();
    var height = mandelPanel.innerHeight();
 
    // Region of the complex plane to draw: [lower-left, upper-right]
    var range = [{
        x: -2,
        y: -1.4
    }, {
        x: 5,
        y: 1.4
    }];
    
    // Size the canvas bitmap (its width/height attributes), not just its
    // CSS box, so one canvas unit maps to one screen pixel
    var canvas = $("canvas#fractal")[0];
    canvas.width = width;
    canvas.height = height;
    var ctx = canvas.getContext("2d");
    var params = {
        range: range,
        startx: 0.0,
        starty: 0.0,
        width: width,
        height: height
    };

    var worker = {
        params: params,
        
        // Remember which complex coordinate each pixel came from, then paint it
        draw: function draw(data){
            data.forEach(function d(point){
                if (this.axis.x[point.drawLoc.x] === undefined) {
                    this.axis.x[point.drawLoc.x] = point.point.x;
                }
                if (this.axis.y[height - point.drawLoc.y] === undefined) {
                    this.axis.y[height - point.drawLoc.y] = point.point.y;
                }
                
                ctx.fillStyle = pickColor(point.escapeValue);
                // Flip y so the imaginary axis points up; a 1x1 square at
                // integer coordinates fills exactly one pixel
                ctx.fillRect(point.drawLoc.x, height - point.drawLoc.y, 1, 1);
            }, this);
        },
        
        axis: {
            x: [],
            y: [],
            find: function(x, y){
                return new Complex(this.x[x], this.y[y]);
            },
            
            reset: function(){
                this.x = [];
                this.y = [];
            }
        },
        myWorker: false,
        
        run: function startWorker(params){
            this.myWorker = new Worker("js/worker.js");
            
            // Inside onmessage, `this` is the Worker, so keep a reference to the wrapper
            var that = this;
            this.myWorker.postMessage(JSON.stringify(params));
            
            this.myWorker.onmessage = function(event){
                // Each message is a JSON string of the form {type: ..., data: ...}
                var data = JSON.parse(event.data);
                if (data.type === 'draw') {
                    // For draw messages, data.data is itself a JSON-encoded array
                    that.draw(JSON.parse(data.data));
                }
                else if (data.type === 'log') {
                    console.info(data.data);
                }
            };
        }
    };
    
    worker.run(params);
    return worker;
};

$(document).ready(drawMandelSet);

Function.prototype.createDelegate = function createDelegate(scope){
    var fn = this;
    return function(){
        // apply() spreads the arguments; call() would pass the arguments
        // object as a single parameter
        return fn.apply(scope, arguments);
    };
};

function pickColor(escapeValue){
    if (escapeValue === Complex.prototype.max_iteration) {
        return "black";
    }

    // Points that escape quickly are light; CSS clamps negative values to 0
    var tone = 255 - escapeValue * 10; 
    var colorCss = "rgb({r},{g},{b})".populate({
        r: tone,
        g: tone,
        b: tone
    });
    return colorCss;
}

// Replace each {name} placeholder in the string with params[name]
String.prototype.populate = function populate(params) {
    return this.replace(/\{(\w+)\}/g, function stringFormatInner(match, name) {
        return params[name];
    });
};

The actual worker (see Example 8-3) is very simple. It just loads up a few other files and then waits for a message to be sent from the user interface. When it gets one, it starts the computation.

Example 8-3. Mandelbrot startup

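// Load the helper libraries; importScripts() blocks until all are loaded
importScripts('function.js','json2.js', 'complex.js','computeMandelbrot.js', 
              'buildMaster.js');

onmessage = function(event){
    // Accept both JSON strings and structured objects
    var data = typeof event.data === 'string'? JSON.parse(event.data) :  event.data; 
    buildMaster(data);
};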

The buildMaster() function (see Example 8-4) loops over the grid of points for the Mandelbrot set, computing the escape value for each point (see Example 8-5). After every 200 points, the build function sends the results of its computation back to the main thread for drawing, and then zeros out its internal buffer of computed points. This way, instead of waiting for the entire grid to be drawn at once, the user sees the image build progressively.

Example 8-4. Mandelbrot build

var chunkSize = 200;
function buildMaster(data){

    var range = data.range;
    var width = data.width;
    var height = data.height;
    var startx = data.startx;
    var starty = data.starty;
    var dx = (range[1].x - range[0].x) / width;
    var dy = (range[1].y - range[0].y) / height;
    
    
    function send(line){
        var lineData = JSON.stringify(line.map(function makeReturnData(point){
            return {
                drawLoc: point.drawLoc,
                point: point.point,
                escapeValue: point.point.mandelbrot()
            };
        }));
        
        var json = JSON.stringify({
            type: 'draw',
            data: lineData
        });
        postMessage(json);
    };
    
    
    function xIter(x, maxX, drawX){
        var line = [];
        var drawY = starty;
        var y = range[0].y;
        var maxY = range[1].y;
        
        while (y < maxY) {
            if (line.length === chunkSize) {
                send(line);
                line = [];
                
            }
            var pt = {
                point: new Complex(x, y),
                drawLoc: {
                    x: drawX,
                    y: drawY
                }
            };
            line.push(pt);
            y += dy;
            drawY += 1;
        }
        send(line);
        if (x < maxX && drawX < width) {
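            // Schedule the next column after a short delay; setTimeout passes
            // the extra arguments on to xIter
            setTimeout(xIter, 1, x + dx, maxX, drawX + 1);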
        }
    }
    
    xIter(range[0].x, range[1].x, startx);
    
}

The final part of this application is the actual mathematical computation of the Mandelbrot set, shown in Example 8-5. The function is written as a while loop rather than as a recursive function, as would be common in functional programming, because JavaScript engines generally do not perform tail-call optimization. A recursive version would be more elegant, but each iteration would add a stack frame, risking a stack overflow.

Example 8-5. Mandelbrot computation

Complex.prototype.max_iteration = 255 * 2;
Complex.prototype.mandelbrot = function(){

    var x0 = this.x;
    var y0 = this.y;
    var x = x0;
    var y = y0;
    var count;
    var x_, y_;
    var max_iteration = this.max_iteration;
    function inSet(x, y){
        return x * x + y * y < 4;
    }
    count = 0;
    while (count < max_iteration && inSet(x, y)) {
        x_ = x * x - y * y + x0;
        y_ = 2 * x * y + y0;
        count += 1;
        x = x_;
        y = y_;
    }
    
    return count;
};
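The complex.js file that the worker loads is not shown. For comparison with the while loop in Example 8-5, here is a hypothetical minimal Complex type plus a recursive formulation of the same escape-time count (mandelbrotRecursive and escapeCount are illustration-only names). It returns the same counts as the loop, but each iteration costs a stack frame, which is exactly what the loop avoids.

```javascript
// Hypothetical stand-in for complex.js: just a real part x and an imaginary part y
function Complex(x, y) {
  this.x = x;
  this.y = y;
}

// Same logic as the while loop: stop when the point escapes |z| >= 2 or the
// iteration limit is hit, otherwise iterate z -> z*z + c and recurse.
function escapeCount(x0, y0, x, y, count, maxIteration) {
  if (count >= maxIteration || x * x + y * y >= 4) {
    return count;
  }
  return escapeCount(x0, y0, x * x - y * y + x0, 2 * x * y + y0,
                     count + 1, maxIteration);
}

Complex.prototype.max_iteration = 255 * 2;
Complex.prototype.mandelbrotRecursive = function () {
  return escapeCount(this.x, this.y, this.x, this.y, 0, this.max_iteration);
};
```

For instance, the origin never escapes and returns max_iteration (510), while (2, 2) is already outside the radius-2 circle and returns 0.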

While the worker is computing the Mandelbrot set, its event loop is busy. The UI thread can post it a new computation task, but the worker will not act on that task until the current one is finished.

To interrupt or change a worker’s behavior—for instance, to let the user in the user interface thread select which area of the Mandelbrot set to draw and then request that the worker draw that area—you have a choice among a few methods.

The simplest method would be to kill the worker and create a new one. This has the advantage that the new worker starts from a clean state, with nothing left over from prior runs. On the other hand, it also means the worker has to load all the scripts and data from scratch. So if the worker has a long startup time, this is probably not the best approach.
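A sketch of the kill-and-restart approach, using the standard Worker.terminate() method, which stops the worker at once without letting it finish its current task. The names restartWorker and createWorker are hypothetical, and the params block is assumed to be JSON-encoded as in Example 8-2.

```javascript
// Replace a (possibly busy) worker with a fresh one and hand it new parameters.
// createWorker is any function returning a Worker, for example:
//   function () { return new Worker("js/worker.js"); }
function restartWorker(oldWorker, createWorker, params) {
  if (oldWorker) {
    oldWorker.terminate(); // abandon whatever the old worker was computing
  }
  var fresh = createWorker(); // scripts are loaded again from scratch
  fresh.postMessage(JSON.stringify(params));
  return fresh;
}
```

Remember to attach an onmessage handler to the returned worker, just as for the original one.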

The second method is a little more complex: manage the task queue manually through your program. Have a data structure in the main thread or a worker that keeps a list of blocks of data to compute. When a worker needs a task, it can send a message to that queue object and have a task sent to it. This creates more complexity but has several advantages. First, the worker does not need to be restarted when the application needs it to do something different. Second, it allows the use of multiple workers. Each worker can query the queue manager when it needs the next part of the problem.
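A minimal sketch of such a queue manager living in the main thread. TaskQueue and the "next", "task", and "idle" message types are hypothetical; a real worker would also send its draw and log messages through the same onmessage handler, so the dispatch would need more branches.

```javascript
// A list of pending work items kept in the main thread.
function TaskQueue(tasks) {
  this.tasks = tasks.slice();
}

// Swap in new work (e.g. the user picked another region); pending items are dropped.
TaskQueue.prototype.replace = function (tasks) {
  this.tasks = tasks.slice();
};

// Serve tasks to a worker whenever it posts {"type": "next"}.
TaskQueue.prototype.attach = function (worker) {
  var queue = this;
  worker.onmessage = function (event) {
    var msg = JSON.parse(event.data);
    if (msg.type === "next") {
      var task = queue.tasks.shift();
      var reply = task ? { type: "task", task: task } : { type: "idle" };
      worker.postMessage(JSON.stringify(reply));
    }
  };
};
```

Several workers can be attached to the same queue; each one pulls the next piece of the problem when it is ready for it.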

You could also have the master task send a large number of events to the worker in sequence. However, this has the problem that there is no way from JavaScript to clear the event queue. So having a job queue that can be managed seems to be the best approach. We’ll explore this solution in the following section.

There is no requirement that an application restrict itself to one Web Worker. JavaScript is quite happy to let you start up a reasonable number of workers. Of course, this makes sense only if the problem can be easily partitioned into several workers, but many problems can be divided that way.

Each worker is an independent construction, so it is possible to create several workers from the same source code, or to create several workers from different scripts that do unrelated jobs.

Workers are a fairly heavy construct in JavaScript, so it is probably a bad idea to create more than, say, 10 workers on a given task. However, the optimal number is probably dependent on the user’s browser and hardware as well as the task to be performed.
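The Mandelbrot set is one of those easily partitioned problems. The sketch below assumes the params block from Example 8-2 and the buildMaster() loop from Example 8-4, which walks y from range[0].y to range[1].y and numbers rows starting at starty; partitionParams is a hypothetical helper that cuts the image into horizontal strips, one params block per worker.

```javascript
// Split one params block into up to workerCount strips covering the full width.
function partitionParams(params, workerCount) {
  var range = params.range;
  var rowsPerStrip = Math.ceil(params.height / workerCount);
  var dy = (range[1].y - range[0].y) / params.height; // same step as buildMaster
  var strips = [];

  for (var i = 0; i < workerCount; i += 1) {
    var firstRow = i * rowsPerStrip;
    var rows = Math.min(rowsPerStrip, params.height - firstRow);
    if (rows <= 0) {
      break; // more workers than rows
    }
    strips.push({
      range: [{ x: range[0].x, y: range[0].y + firstRow * dy },
              { x: range[1].x, y: range[0].y + (firstRow + rows) * dy }],
      startx: params.startx,
      starty: params.starty + firstRow, // so each strip draws in its own rows
      width: params.width,
      height: rows
    });
  }
  return strips;
}
```

Each strip would then be posted to its own worker, for example with new Worker("js/worker.js").postMessage(JSON.stringify(strip)).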

Testing and Debugging Web Workers

Over the past 10 years, the tools for JavaScript debugging have gotten quite good. Firebug and Chrome Developer Tools both are first-rate debugging tools that can be used for testing JavaScript applications. Unfortunately, neither one can access code running in a Web Worker. So you can’t set breakpoints or step through your code in a worker. Nor do workers show up in the list of loaded scripts that appears in the respective script tabs of Firebug and Chrome. Nor can Selenium or QUnit directly test code running in a Web Worker.

Errors in a worker are reported back to the console in Firefox and Chrome. Of course, in many cases, knowing the line and file where the error occurred does not help all that much, as the actual bug was somewhere else.
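In the page, a worker's uncaught errors also arrive as error events on the Worker object, whose message, filename, and lineno properties describe the failure. A minimal sketch of forwarding them to your own log, for example a status area on the page; watchWorker and describeWorkerError are hypothetical helpers.

```javascript
// Format the file, line, and message from a worker error event.
function describeWorkerError(event) {
  return event.filename + ":" + event.lineno + " " + event.message;
}

// Send every error raised inside `worker` to the supplied log function.
function watchWorker(worker, log) {
  worker.onerror = function (event) {
    log(describeWorkerError(event));
  };
}
```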

Chrome does provide the programmer a method for debugging Web Workers. The Chrome Developer Tools script panel contains a Web Workers checkbox. This option causes Chrome to simulate a worker using an iframe.

A Pattern for Reuse of Multithread Processing

Being able to use Web Workers to pull complex functions out of the user’s browser task offers great power for the programmer. Firefox has supported Web Workers since version 3.5 and Chrome has supported them since version 4. Safari and Opera have also supported them for some time. However, as of this writing, Microsoft Internet Explorer does not support Web Workers (though support may appear in IE version 10), nor does Safari on iOS, so it is not possible to use Web Workers on the iPad/iPod/iPhone platform.

What would be ideal is a library that would enable a programmer to abstract out the code to be run into a function or module, and a runner that would use the best available mechanism to run that code in the background: via a Web Worker if available, and otherwise via setTimeout(). Furthermore, the library would provide a common set of interfaces that could be used for the various interactions, such as posting a message back to the main application.

Such a library should always use feature detection rather than browser detection to figure out which version of the code to run. While a given browser may or may not support Web Workers right now, in the future that will change and a library needs to be able to work with those changes.
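A minimal sketch of that feature test, assuming a hypothetical startBackgroundTask() entry point and a caller-supplied fallback that runs the same code on timeouts (such as the runner in Example 8-8). Passing the global scope in as a parameter, rather than reading window directly, keeps the check easy to test. The test looks only at whether a Worker constructor exists, never at the user agent string.

```javascript
// Start a background task in a Web Worker if the browser has one,
// otherwise hand the initial state to a timeout-based fallback.
function startBackgroundTask(scope, scriptUrl, initialState, fallback) {
  if (typeof scope.Worker !== "undefined") {
    var worker = new scope.Worker(scriptUrl);
    worker.postMessage(JSON.stringify(initialState));
    return { kind: "worker", handle: worker };
  }
  // No Worker constructor: run the same code with setTimeout in this thread
  return { kind: "timeout", handle: fallback(initialState) };
}
```

In a page, the call would look like startBackgroundTask(window, "js/task.js", {time: 0}, startTimeoutRunner), where both the script URL and startTimeoutRunner are placeholders.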

The actual function to do the work in this pattern will be called repeatedly with the run state as a parameter. It should do whatever processing it needs to do and return a modified state parameter that will be used to call it again until it finishes its job and calls the stop() method, or is otherwise interrupted. The run function (see Example 8-6) should be treated as a pure function; it should just process its inputs and return a value, but not effect any change in global state, because a different set of interfaces will be available to it depending on whether it is running as a Web Worker or not.

Example 8-6. Run

(function ()
{
  runner.setup(function (state)
  {
     this.postMessage({state: state});
     return {
      time: state.time + 1   // return a new state; don't mutate the input
    };
  }, {
    time: 0
  });
}());

When running in a Web Worker (see Example 8-7), the run function can be called from inside an ordinary while loop. The system is started by a postMessage call from the main thread with some initial parameters, which are passed as the initial state to the run function. The while loop calls that function repeatedly until it calls the stop function, at which point the final state is posted back to the main thread.

Example 8-7. Running a function with a Web Worker

var runner =
{
  stopFlag: false,

  postMessage: function (message)
  {
    self.postMessage(message);
  },

  stop: function ()
  {
    this.stopFlag = true;
  },

  error: function (error)
  {
    this.stopFlag = true;
  },

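  setup: function (run, initialState)
  {
    this.run = run;
    this.state = initialState;
    var that = this;
    // The main thread starts the job by posting the initial state as JSON
    self.onmessage = function message(event)
    {
      that.execute(JSON.parse(event.data));
    };
  },

  execute: function (state)
  {
    if (state !== undefined)
    {
      this.state = state;
    }
    this.stopFlag = false;

    // Call run() until it calls stop() or error(). This blocks the
    // worker's event loop, but the main thread stays responsive.
    while (!this.stopFlag)
    {
      this.state = this.run.call(this, this.state);
    }
    this.postMessage(this.state);
  }
};

(function ()
{
  runner.setup(function (state)
  {
    var newstate = state;
    //modify newstate here, and call this.stop() when the job is done
    return newstate;
  }, {
    time: 0
  });
}());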

If Web Workers are not available in the browser, the run function should be called through a short, repeating timeout instead of in a while loop (see Example 8-8). A while loop would block the main thread’s event queue, freezing the user interface and keeping any messages from reaching the runner. Using a timeout frees up the main thread, which is the whole goal of this library, and also lets a message change the state of the run function as needed.

Once again, the runner calls the run function with a state parameter that should be returned by the callback function. However, because this is not a Web Worker, the runner will then call the window.setTimeout() method to delay the next iteration by some amount of time and call the function again.

Example 8-8. Running a function without a Web Worker

var runner =
{
  stopFlag: false,
  // override this function
  onmessage: function (msg)
  {
    if (msg.state)
    {
      var state = msg.state;
      $('#status').html("time: " + state.time);
    }
    if (msg.set)
    {
      this.state = msg.set;
    }
    return this.state;
  },
  postMessage: function (message)
  {
    this.onmessage(message);
  },

  stop: function ()
  {
    this.stopFlag = true;
  },
  error: function (error)
  {
    this.stopFlag = true;
  },

  setup: function (run, state)
  {
    this.run = run;
    this.state = state;
    this.execute();
  },

  execute: function ()
  {
    var that = this;

    setTimeout(function runIterator()
    {
      that.state = that.run.apply(that, [that.state]);

      if (that.stopFlag)
      {
        that.postMessage(that.state);
      }
      else
      {
        that.execute();
      }
    }, 250);
  }
};

Communications between the simulated Web Worker and the main body of the code are also somewhat different. Because there is no postMessage() method with a callback, the runner must simulate it by presenting a mechanism to register a callback that can take the same parameters as the Web Worker’s onmessage() handler.

This concept of how to make code portable between a Web Worker and regular JavaScript is presented as a model and not a full solution. It is missing some features, such as loading code. It is also missing a way to call an asynchronous method such as an Ajax call, and resume processing when done. This would be necessary because, although in general Web Workers are designed for processor-intensive work, there will be times when access to an Ajax call or IndexedDB makes sense.

Libraries for Web Workers

When programming JavaScript in the main thread, programmers use a library such as jQuery to improve the API and to hide differences between browsers. For use with Web Workers, there is a jQuery extension called jQuery Hive that provides much of this functionality. Hive includes the PollenJS library in the main JavaScript thread. The library includes interfaces to create workers.

Hive will also encode and decode messages between the main thread and worker if needed. In some browsers (notably Firefox), complex data can be sent over the postMessage() interface. However, in some versions of Chrome and Safari, postMessage() will handle only a string or other simple data.

Hive also includes a subset of the jQuery API in the worker itself. The most important methods in the Hive API are $.get() and $.post(), which mirror the APIs in jQuery. If a worker needs to access the server via Ajax, for instance, using Hive will make your life much easier.

Hive also includes access to a persistent storage interface via $.storage. To set a value, use $.storage(name, value). Calling $.storage(name) without the second value parameter will return the existing value, if set.

Also included in Hive are $.decode() and $.encode(), which can be used to decode or encode JSON messages.



[1] See HTML5 Canvas by Steve Fulton and Jeff Fulton (O’Reilly) for more information on the graphics in HTML5.
