Implementing download throttling

Node provides pause and resume methods for incoming streams, but not for outbound streams. Essentially, this means we can easily throttle upload speeds in Node, whereas download throttling requires a more creative solution.

Getting ready

We'll need a new server.js along with a good-sized file to serve. With the dd command-line program, we can generate a file for testing purposes.

dd if=/dev/zero of=50meg count=50 bs=1048576

This will create a 50 MB file named 50meg which we'll be serving.

Tip

For a similar Windows tool that can be used to generate a large file, check out http://www.bertel.de/software/rdfc/index-en.html.
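
Alternatively, since Node is already installed, we can generate the file with Node itself. The following one-off script is a minimal sketch that works on any platform (the script name is our own; any filler content will do):

//generate.js - a hypothetical helper: writes 50 MB of zero bytes to 50meg
var fs = require('fs');
var buf = new Buffer(50 * 1024 * 1024); //new Buffer contents are uninitialized
buf.fill(0); //so zero them out before writing
fs.writeFileSync('50meg', buf);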

How to do it...

To keep things as simple as possible, our download server will serve just one file, but we'll implement it in a way that would allow us to easily plug in some router code to serve multiple files. First, we will require our modules and set up an options object for file and speed settings.

var http = require('http'),
    fs = require('fs');

var options = {};
options.file = '50meg';
options.fileSize = fs.statSync(options.file).size;
options.kbps = 32;

If we were serving multiple files, our options object would be largely redundant. However, we're using it here to emulate the concept of a user-determined file choice. In a multifile situation, we would be loading file specifics based upon the requested URL instead.

Note

To see how this recipe could be configured to serve and throttle more than one file, check out the routing recipes in Chapter 1, Making a Web Server.
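
As a rough sketch of the idea (the routing recipes give the full treatment), per-request file selection might look something like the following, where the optionsFor helper and the permitted whitelist are our own illustrative names:

var fs = require('fs'),
    path = require('path');
var permitted = {'50meg': true}; //whitelist of servable files
function optionsFor(request) {
  var file = path.basename(request.url); //strip any directory components
  if (!permitted[file]) { return null; } //unknown file: the caller responds with 404
  return {file: file, fileSize: fs.statSync(file).size, kbps: 32};
}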

The http module is for the server while the fs module is for creating a readStream and grabbing the size of our file.

We're going to restrict how much data is sent out at once, but we first need to get the data in. So let's create our server and initialize a readStream.

http.createServer(function (request, response) {
  var download = Object.create(options);
  download.chunks = new Buffer(download.fileSize);
  download.bufferOffset = 0;

  response.writeHead(200, {'Content-Length': options.fileSize});

  fs.createReadStream(options.file)
    .on('data', function (chunk) {
      chunk.copy(download.chunks, download.bufferOffset);
      download.bufferOffset += chunk.length;
    })
    .once('open', function () {
      //this is where the throttling will happen
    });
}).listen(8080);

We've created our server and specified a new object called download, which inherits from our options object. We add two properties to our request-bound download object: a chunks property that collects the file chunks inside the readStream data event listener, and a bufferOffset property that keeps track of the number of bytes loaded from disk.
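
Using Object.create here means each request gets its own mutable state, while the file and speed settings are still read from the shared options object via the prototype chain. A quick illustration of the principle:

var shared = {kbps: 32};
var perRequest = Object.create(shared);
perRequest.bufferOffset = 0; //own property, unique to this request
console.log(perRequest.kbps); //32, inherited from shared
console.log(perRequest.hasOwnProperty('kbps')); //false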

All we have to do now is the actual throttling. To achieve this, we simply apportion the configured number of kilobytes from our buffer every second, thus achieving the desired kilobytes-per-second rate. We'll make a function for this, which will be placed outside of http.createServer, and we'll call our function throttle.

function throttle(download, cb) {
  var chunkOutSize = download.kbps * 1024,
      timer = 0;

  (function loop(bytesSent) {
    var remainingOffset;
    if (!download.aborted) {
      setTimeout(function () {
        var bytesOut = bytesSent + chunkOutSize;

        if (download.bufferOffset > bytesOut) {
          timer = 1000;
          cb(download.chunks.slice(bytesSent, bytesOut));
          loop(bytesOut);
          return;
        }

        if (bytesOut >= download.chunks.length) {
          remainingOffset = download.chunks.length - bytesSent;
          cb(download.chunks.slice(bytesSent, bytesSent + remainingOffset));
          return;
        }

        loop(bytesSent); //continue to loop, waiting for enough data
      }, timer);
    }
  }(0));

  return function () { //return a function to handle an abort scenario
    download.aborted = true;
  };
}

throttle interacts with the download object created on each server request to measure out each chunk according to our predetermined options.kbps speed. For its second parameter (cb), throttle accepts a callback function. cb in turn takes one parameter: the chunk of data that throttle has determined to send. Our throttle function returns a convenience function that can be used to end the loop on abort, avoiding infinite looping. We initialize download throttling by calling our throttle function in the server callback when the readStream opens.

//...previous code
  fs.createReadStream(options.file)
    .on('data', function (chunk) {
      chunk.copy(download.chunks, download.bufferOffset);
      download.bufferOffset += chunk.length;
    })
    .once('open', function () {
      var handleAbort = throttle(download, function (send) {
        response.write(send);
      });

      request.on('close', function () {
        handleAbort();
      });
    });

}).listen(8080);

How it works...

The key to this recipe is our throttle function. Let's walk through it. To achieve the specified speed, we send a chunk of data of a certain size every second. The size is determined by the desired amount of kilobytes per second. So, if download.kbps is 32, we'll send 32 KB chunks every second.
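
To make the numbers concrete: 32 KB per second is 32 × 1024 = 32,768 bytes per second, so our 50 MB file (52,428,800 bytes) would take 52,428,800 ÷ 32,768 = 1,600 seconds, or roughly 27 minutes, to deliver at the throttled rate.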

Buffers work in bytes, so we set a new variable called chunkOutSize and multiply download.kbps by 1024 to arrive at the appropriate chunk size in bytes. Next, we set a timer variable, which is passed into setTimeout. It is initially set to 0 for two reasons. First, it eliminates an unnecessary 1000 millisecond overhead, allowing our server to send the first chunk of data immediately, if available. Second, if the download.chunks buffer is not yet full enough to meet the demand of chunkOutSize, the embedded loop function recurses without changing timer. This causes the loop to spin until the buffer has loaded enough data to deliver a whole chunk (a process which should take well under a second).

Once we have enough data for the first chunk, timer is set to 1000 because from here on out we want to push a chunk every second.

loop is the guts of our throttling engine. It's a self-recursive function which calls itself with one parameter: bytesSent. The bytesSent parameter allows us to keep track of how much data has been sent so far, and we use it to determine which bytes to slice out of our download.chunks buffer using Buffer.slice. Buffer.slice takes two parameters, start and end. These two parameters are fulfilled with bytesSent and bytesOut respectively. bytesOut is also used against download.bufferOffset to ensure we have enough data loaded for a whole chunk to be sent out.
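
As a quick illustration of these slice semantics (the start index is inclusive, the end index exclusive, and the returned buffer is a view onto the same memory rather than a copy):

var buf = new Buffer('node throttling');
console.log(buf.slice(0, 4).toString()); //'node'
console.log(buf.slice(5).toString()); //'throttling' - from index 5 to the end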

If there is enough data, we proceed to set timer to 1000 to initiate our chunk-per-second policy, then pass the result of download.chunks.slice into cb, which becomes our send parameter.

Back inside our server, the send parameter is passed to response.write within our throttle callback, so each chunk is streamed to the client. Once we've passed our sliced chunk to cb, we call loop(bytesOut) for a new iteration (so bytesOut becomes the next iteration's bytesSent), then we return from the function to prevent any further execution.

The third and final place bytesOut appears is in the second conditional statement of the setTimeout callback, where we use it against download.chunks.length. This is important for handling the last chunk of data. We don't want to loop again after the final chunk has been sent, and if options.kbps doesn't divide exactly into the total file size, the final bytesOut would be larger than the size of the buffer. If passed into the slice method unchecked, this would cause an out-of-bounds error.

So if bytesOut equals, or is greater than, the memory allocated to the download.chunks buffer (that is, the size of our file), we slice the remaining bytes from our download.chunks buffer and return from the function without calling loop, effectively terminating recursion.

To prevent infinite looping when the connection is closed unexpectedly (for instance, during connection failure or client abort), throttle returns another function, which is caught in the handleAbort variable and called in the close event of the request. The function simply adds a property to the download object to say the download has been aborted. This is checked on each recursion of the loop function. As long as download.aborted isn't true it continues to iterate, otherwise the looping stops short.
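
One further caveat, beyond the scope of this recipe: response.write returns false when Node's outgoing buffer is full, so a production throttler would also want to wait for the response's drain event before pushing more data. The general pattern, as a sketch applicable to any writable stream:

//a sketch of respecting backpressure on a writable stream
function writeChunk(stream, chunk, next) {
  if (stream.write(chunk)) {
    next(); //the buffer accepted the chunk, carry on immediately
  } else {
    stream.once('drain', next); //wait until the buffer has emptied
  }
}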

Note

There are (configurable) limits on operating systems defining how many files can be opened at once. We would probably want to implement caching in a production download server to optimize file system access. For file limits on Unix systems, see http://www.stackoverflow.com/questions/34588/how-do-i-change-the-number-of-open-files-limit-in-linux.
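
A minimal sketch of such caching (our own illustration, keyed by filename) might look like this:

var fs = require('fs');
var cache = {};
function getFile(name, cb) {
  if (cache[name]) { return cb(null, cache[name]); } //serve from memory
  fs.readFile(name, function (err, data) {
    if (err) { return cb(err); }
    cache[name] = data; //each file touches the disk only once
    cb(null, data);
  });
}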

Enabling resumes from broken downloads

If a connection breaks, or a user accidentally aborts a download, the client may initiate a resume request by sending a Range HTTP header to the server. A Range header would look something like this:

Range: bytes=512-1023

When a server agrees to handle a Range header, it sends a 206 Partial Content status and adds a Content-Range header to the response. If the entire file were 1024 bytes, the Content-Range reply to the preceding Range header would look as follows:

Content-Range: bytes 512-1023/1024

Notice that there is no equals sign (=) after bytes in a Content-Range header, and that byte positions are zero-indexed with an inclusive end position, which is why the last byte of a 1024-byte file is byte 1023. We can pass an object as the second parameter of fs.createReadStream to specify where to start and end reading. Since we are simply handling resumes, we only need to set the start property.

//requires, options object, throttle function, create server etc...
  download.readStreamOptions = {};
  download.headers = {'Content-Length': download.fileSize};
  download.statusCode = 200;
  if (request.headers.range) {
    download.start = +request.headers.range.replace('bytes=', '').split('-')[0];
    download.readStreamOptions = {start: download.start};
    download.headers['Content-Range'] = 'bytes ' + download.start + '-' +
      (download.fileSize - 1) + '/' + download.fileSize;
    download.headers['Content-Length'] = download.fileSize - download.start; //bytes actually sent
    download.statusCode = 206; //partial content
  }
  response.writeHead(download.statusCode, download.headers);
  fs.createReadStream(download.file, download.readStreamOptions)
//...rest of the code....

By adding some properties to download, and using them to conditionally respond to a Range header, we can now handle resume requests.
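
To try it out, we can request a partial download with curl, whose --range option sets the Range header for us:

curl --range 512- -o tail.bin http://localhost:8080

The server should respond with a 206 status and stream the file from byte 512 onwards, still at the throttled rate.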

See also

  • Setting up a router discussed in Chapter 1, Making a Web Server
  • Caching content in memory for immediate delivery discussed in Chapter 1, Making a Web Server
  • Communicating via TCP discussed in Chapter 8, Integrating Networking Paradigms