Chapter 2. Node.js Essential Patterns

Embracing the asynchronous nature of Node.js is not trivial at all, especially for developers coming from a language such as PHP, where it is unusual to deal with asynchronous code.

In synchronous programming, we are used to imagining code as a series of consecutive computing steps that solve a specific problem. Every operation is blocking, which means that only when an operation is completed is it possible to execute the next one. This approach makes the code easy to understand and debug.

Instead, in asynchronous programming, some operations, such as reading a file or performing a network request, can be executed in the background. When an asynchronous operation is invoked, the next one is executed immediately, even if the previous operation has not finished yet. The operations pending in the background can complete at any time, and the whole application should be programmed to react properly when an asynchronous call finishes.

While this non-blocking approach can almost always provide superior performance compared to an always-blocking scenario, it introduces a paradigm that can be hard to reason about and that can become really cumbersome when dealing with more advanced applications that require complex control flows.

Node.js offers a series of tools and design patterns to deal optimally with asynchronous code. It's important to learn how to use them to gain confidence and write applications that are both performant and easy to understand and debug.

In this chapter, we will see two of the most important asynchronous patterns: callbacks and event emitters.

The callback pattern

Callbacks are the materialization of the handlers of the reactor pattern, which we introduced in the previous chapter. They are one of those imprints that give Node.js its distinctive programming style. Callbacks are functions that are invoked to propagate the result of an operation, and this is exactly what we need when dealing with asynchronous operations. In the asynchronous world, they replace the use of the return instruction, which always executes synchronously. JavaScript is a great language for representing callbacks, because, as we have seen, functions are first-class objects and can easily be assigned to variables, passed as arguments, returned from another function invocation, or stored in data structures. Another ideal construct for implementing callbacks is the closure. With closures, we can in fact reference the environment in which a function was created; this way, we can always maintain the context in which the asynchronous operation was requested, no matter when or where its callback is invoked.

If you need to refresh your knowledge about closures, you can refer to the article on the Mozilla Developer Network at https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Closures .
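
For example, here is a minimal sketch of how a closure preserves the caller's context across an asynchronous invocation (the delayedGreet() function is hypothetical, used only for illustration):

function delayedGreet(name) {
  //'name' is captured by the closure and is still available
  //when the callback is invoked later by setTimeout()
  setTimeout(() => console.log('Hello ' + name), 100);
}

delayedGreet('Node.js'); //prints "Hello Node.js" after ~100ms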

In this section, we will analyze this particular style of programming, which is made of callbacks instead of return instructions.

The continuation-passing style

In JavaScript, a callback is a function that is passed as an argument to another function and is invoked with the result when the operation completes. In functional programming, this way of propagating the result is called continuation-passing style (CPS). It is a general concept, and it is not always associated with asynchronous operations. In fact, it simply indicates that a result is propagated by passing it to another function (the callback), instead of directly returning it to the caller.

Synchronous continuation-passing style

To clarify the concept, let's take a look at a simple synchronous function:

function add(a, b) { 
  return a + b; 
} 

There is nothing special here; the result is passed back to the caller using the return instruction; this is also called direct style, and it represents the most common way of returning a result in synchronous programming. The equivalent continuation-passing style of the preceding function would be as follows:

function add(a, b, callback) { 
  callback(a + b); 
} 

The add() function is a synchronous CPS function, which means that it will return a value only when the callback completes its execution. The following code demonstrates this statement:

console.log('before'); 
add(1, 2, result => console.log('Result: ' + result)); 
console.log('after'); 

Since add() is synchronous, the previous code will trivially print the following:

before
Result: 3
after

Asynchronous continuation-passing style

Now, let's consider a case where the add() function is asynchronous, as follows:

function additionAsync(a, b, callback) { 
  setTimeout(() => callback(a + b), 100); 
} 

In the previous code, we used setTimeout() to simulate an asynchronous invocation of the callback. Now, let's try to use additionAsync and see how the order of the operations changes:

console.log('before'); 
additionAsync(1, 2, result => console.log('Result: ' + result)); 
console.log('after'); 

The preceding code will print the following output:

before
after
Result: 3

Since setTimeout() triggers an asynchronous operation, it will not wait for the callback to be executed; instead, it returns immediately, giving control back to additionAsync(), and then back to its caller. This property is crucial in Node.js, as it gives control back to the event loop as soon as an asynchronous request is sent, thus allowing a new event from the queue to be processed.

The following image shows how this works:

[Figure: Asynchronous continuation-passing style]

When the asynchronous operation completes, the execution is then resumed starting from the callback provided to the asynchronous function that caused the unwinding. The execution will start from the Event Loop, so it will have a fresh stack. This is where JavaScript comes in really handy. Thanks to closures, it is trivial to maintain the context of the caller of the asynchronous function, even if the callback is invoked at a different point in time and from a different location.

A synchronous function blocks until it completes its operations. An asynchronous function returns immediately and the result is passed to a handler (in our case, a callback) at a later cycle of the event loop.

Non-continuation-passing style callbacks

There are several circumstances in which the presence of a callback argument might make us think that a function is asynchronous or is using a continuation-passing style; that's not always true. Let's take, for example, the map() method of an Array object:

const result = [1, 5, 7].map(element => element - 1); 
console.log(result); // [0, 4, 6] 

Clearly, the callback is used just to iterate over the elements of the array, and not to pass the result of the operation. In fact, the result is returned synchronously using a direct style. The intent of a callback is usually clearly stated in the documentation of the API.

Synchronous or asynchronous?

We have seen how the order of the instructions changes radically depending on the nature of a function: synchronous or asynchronous. This has strong repercussions on the flow of the entire application, both in correctness and efficiency. The following is an analysis of these two paradigms and their pitfalls. In general, what must be avoided is creating inconsistency and confusion around the nature of an API, as doing so can lead to a set of problems that might be very hard to detect and reproduce. To drive our analysis, we will take as an example the case of an inconsistently asynchronous function.

An unpredictable function

One of the most dangerous situations is to have an API that behaves synchronously under certain conditions and asynchronously under others. Let's take the following code as an example:

const fs = require('fs'); 
const cache = {}; 
function inconsistentRead(filename, callback) { 
  if(cache[filename]) { 
    //invoked synchronously 
    callback(cache[filename]);   
  } else { 
    //asynchronous function 
    fs.readFile(filename, 'utf8', (err, data) => { 
      cache[filename] = data; 
      callback(data); 
    }); 
  } 
} 

The preceding function uses the cache variable to store the results of different file read operations. Bear in mind that this is just an example; it does not have error management, and the caching logic itself is suboptimal. Besides this, the preceding function is dangerous because it behaves asynchronously when the cache is not set (that is, until fs.readFile() returns its result), but it behaves synchronously for all the subsequent requests for a file already in the cache, triggering an immediate invocation of the callback.

Unleashing Zalgo

Now, let's see how the use of an unpredictable function, such as the one that we defined previously, can easily break an application. Consider the following code:

function createFileReader(filename) {
  const listeners = [];
  inconsistentRead(filename, value => {
    listeners.forEach(listener => listener(value));
  });

  return {
    onDataReady: listener => listeners.push(listener)
  };
} 

When the preceding function is invoked, it creates a new object that acts as a notifier, allowing us to set multiple listeners for a file read operation. All the listeners will be invoked at once when the read operation completes and the data is available. The preceding function uses our inconsistentRead() function to implement this functionality. Let's now try to use the createFileReader() function:

const reader1 = createFileReader('data.txt'); 
reader1.onDataReady(data => { 
  console.log('First call data: ' + data); 
 
  //...sometime later we try to read again from 
  //the same file 
  const reader2 = createFileReader('data.txt'); 
  reader2.onDataReady( data => { 
    console.log('Second call data: ' + data); 
  }); 
}); 

The preceding code will print the following output:

First call data: some data

As you can see, the callback of the second operation is never invoked. Let's see why:

  • During the creation of reader1, our inconsistentRead() function behaves asynchronously, because there is no cached result available. Therefore, we have all the time in the world to register our listener, as it will be invoked later in another cycle of the event loop, when the read operation completes.
  • Then, reader2 is created in a cycle of the event loop in which the cache for the requested file already exists. In this case, the inner call to inconsistentRead() will be synchronous. So, its callback will be invoked immediately, which means that all the listeners of reader2 will be invoked synchronously as well. However, we are registering the listeners after the creation of reader2, so they will never be invoked.

The callback behavior of our inconsistentRead() function is really unpredictable, as it depends on many factors, such as the frequency of its invocation, the filename passed as argument, and the amount of time taken to load the file.

The bug that we've just seen might be extremely complicated to identify and reproduce in a real application. Imagine using a similar function in a web server, where there can be multiple concurrent requests; imagine seeing some of those requests hanging, without any apparent reason and without any error being logged. This definitely falls under the category of nasty defects.

Isaac Z. Schlueter, creator of npm and former Node.js project lead, in one of his blog posts compared the use of this type of unpredictable functions to unleashing Zalgo.

Zalgo is an Internet legend about an ominous entity believed to cause insanity, death, and destruction of the world. If you're not familiar with Zalgo, you are invited to find out what it is.

You can find Isaac Z. Schlueter's original post at http://blog.izs.me/post/59142742143/designing-apis-for-asynchrony .

Using synchronous APIs

The lesson to learn from the unleashing Zalgo example is that it is imperative for an API to clearly define its nature: either synchronous or asynchronous.

One suitable fix for our inconsistentRead() function is to make it totally synchronous. This is possible because Node.js provides a set of synchronous direct style APIs for most basic I/O operations. For example, we can use the fs.readFileSync() function in place of its asynchronous counterpart. The code would now be as follows:

const fs = require('fs'); 
const cache = {}; 
function consistentReadSync(filename) { 
  if(cache[filename]) { 
    return cache[filename];   
  } else { 
    cache[filename] = fs.readFileSync(filename, 'utf8'); 
    return cache[filename]; 
  } 
} 

We can see that the entire function was also converted to a direct style. There is no reason for a function to have a continuation-passing style if it is synchronous. In fact, we can state that it is always best practice to implement a synchronous API using a direct style; this will eliminate any confusion around its nature and will also be more efficient from a performance perspective.

Note

Pattern

Prefer the direct style for purely synchronous functions.

Bear in mind that changing an API from CPS to a direct style, or from asynchronous to synchronous, or vice versa, might also require a change to the style of all the code using it. For example, in our case, we would have to totally change the interface of our createFileReader() API and adapt it to always work synchronously.
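
For example, a synchronous version of the file reader, built on top of consistentReadSync(), could look like the following (this is just a sketch and not part of the original example):

function createFileReaderSync(filename) {
  //the data is read synchronously, so it is already available
  //when a listener is attached
  const data = consistentReadSync(filename);
  return {
    onDataReady: listener => listener(data)
  };
}

Notice how onDataReady() can now invoke the listener immediately, as the data is guaranteed to be already available.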

Also, using a synchronous API instead of an asynchronous one has some caveats:

  • A synchronous API for a specific functionality might not always be available.
  • A synchronous API will block the event loop and put any concurrent request on hold. It breaks the Node.js concurrency model, slowing down the whole application. We will see later in the book what this really means for our applications.

In our consistentReadSync() function, the risk of blocking the event loop is partially mitigated because the synchronous I/O API is invoked only once per filename, while the cached value will be used for all the subsequent invocations. If we have a limited number of static files, then using consistentReadSync() won't have a big effect on our event loop. Things change quickly, however, if we have a large number of files to read, each only once. Using synchronous I/O in Node.js is strongly discouraged in many circumstances; however, in some situations, this might be the easiest and most efficient solution. Always evaluate your specific use case in order to choose the right alternative. To give a real use case: it makes perfect sense to use a synchronous blocking API to load a configuration file while bootstrapping an application.
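
As a minimal sketch (assuming a config.json file exists next to the script), loading a configuration synchronously at startup could look like this:

const fs = require('fs');

//blocking here is acceptable: the application is not serving
//any request yet while it bootstraps
const config = JSON.parse(fs.readFileSync('config.json', 'utf8'));
console.log('Loaded configuration:', config);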

Use blocking APIs only when they don't affect the ability of the application to serve concurrent requests.

Deferred execution

Another alternative for fixing our inconsistentRead() function is to make it purely asynchronous. The trick here is to schedule the synchronous callback invocation to be executed "in the future" instead of being run immediately in the same event loop cycle. In Node.js, this is possible using process.nextTick(), which defers the execution of a function until the next pass of the event loop. Its functioning is very simple; it takes a callback as an argument and pushes it to the top of the event queue, in front of any pending I/O event, and returns immediately. The callback will then be invoked as soon as the event loop runs again.

Let's apply this technique to fix our inconsistentRead() function as follows:

const fs = require('fs'); 
const cache = {}; 
function consistentReadAsync(filename, callback) { 
  if(cache[filename]) { 
    process.nextTick(() => callback(cache[filename])); 
  } else { 
    //asynchronous function 
    fs.readFile(filename, 'utf8', (err, data) => { 
      cache[filename] = data; 
      callback(data); 
    }); 
  } 
} 

Now, our function is guaranteed to invoke its callback asynchronously, under any circumstances.

Another API for deferring the execution of code is setImmediate(). While their purposes are very similar, their semantics are quite different. Callbacks deferred with process.nextTick() run before any other I/O event is fired, while with setImmediate(), the execution is queued behind any I/O event that is already in the queue. Since process.nextTick() runs before any already scheduled I/O, it might cause I/O starvation under certain circumstances, for example, a recursive invocation; this can never happen with setImmediate(). We will learn to appreciate the difference between these two APIs when we analyze the use of deferred invocation for running synchronous CPU-bound tasks later in the book.
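
A minimal example shows the relative ordering of the two APIs when they are scheduled from the main module:

setImmediate(() => console.log('setImmediate'));
process.nextTick(() => console.log('nextTick'));
console.log('synchronous');

//prints:
//  synchronous
//  nextTick
//  setImmediate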

Note

Pattern

We guarantee that a callback is invoked asynchronously by deferring its execution using process.nextTick().

Node.js callback conventions

In Node.js, continuation-passing style APIs and callbacks follow a set of specific conventions. These conventions apply to the Node.js core API but they are also followed by the vast majority of the userland modules and applications. So, it's very important that we understand them and make sure that we comply whenever we need to design an asynchronous API.

Callbacks come last

In all core Node.js methods, the standard convention is that when a function accepts a callback as input, it has to be passed as the last argument. Let's take the following Node.js core API as an example:

fs.readFile(filename, [options], callback) 

As we can see from the signature of the preceding function, the callback is always put in the last position, even in the presence of optional arguments. The reason for this convention is that the function call is more readable when the callback is defined in place.
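
For example, a hypothetical readConfig() function (not a core API, just a sketch) shows how an optional argument is typically handled while keeping the callback last:

const fs = require('fs');

function readConfig(filename, options, callback) {
  if(typeof options === 'function') {
    //the options argument was omitted, so shift the arguments
    callback = options;
    options = { encoding: 'utf8' };
  }
  fs.readFile(filename, options, callback);
}

This way, both readConfig('app.json', callback) and readConfig('app.json', { encoding: 'utf8' }, callback) are valid invocations.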

Error comes first

In CPS, errors are propagated like any other type of result, which means using callbacks. In Node.js, any error produced by a CPS function is always passed as the first argument of the callback, and any actual result is passed starting from the second argument. If the operation succeeds without errors, the first argument will be null or undefined. The following code shows you how to define a callback complying with this convention:

fs.readFile('foo.txt', 'utf8', (err, data) => { 
  if(err) 
    handleError(err); 
  else 
    processData(data); 
}); 

It is best practice to always check for the presence of an error, as not doing so will make it harder for us to debug our code and discover the possible points of failure. Another important convention to take into account is that the error must always be of type Error. This means that simple strings or numbers should never be passed as error objects.
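
On the callee side, a hypothetical divideAsync() function (just a sketch) complying with both conventions would look like this:

function divideAsync(a, b, callback) {
  process.nextTick(() => {
    if(b === 0) {
      //always pass an instance of Error, never a plain string
      return callback(new Error('Division by zero'));
    }
    //no error: pass null first, then the actual result
    callback(null, a / b);
  });
}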

Propagating errors

Propagating errors in synchronous, direct style functions is done with the well-known throw statement, which causes the error to jump up in the call stack until it is caught.
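
For example, a direct style, synchronous counterpart of the readJSON() function that we are about to define could simply let any exception bubble up to the caller (a minimal sketch):

const fs = require('fs');
function readJSONSync(filename) {
  //both fs.readFileSync() and JSON.parse() may throw; the
  //exception jumps up the call stack until a catch block is found
  return JSON.parse(fs.readFileSync(filename, 'utf8'));
}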

In asynchronous CPS however, proper error propagation is done by simply passing the error to the next callback in the chain. The typical pattern looks as follows:

const fs = require('fs'); 
function readJSON(filename, callback) { 
  fs.readFile(filename, 'utf8', (err, data) => { 
    let parsed; 
    if(err) 
      //propagate the error and exit the current function 
      return callback(err); 

    try { 
      //parse the file contents 
      parsed = JSON.parse(data); 
    } catch(err) { 
      //catch parsing errors 
      return callback(err); 
    } 
    //no errors, propagate just the data 
    callback(null, parsed); 
  }); 
}

The detail you should notice in the previous code is how the callback is invoked when we want to pass a valid result and when we want to propagate an error. Also notice that when we are propagating an error we use the return statement. We do so to exit from the function as soon as the callback function is invoked and to avoid executing the next lines in readJSON.

Uncaught exceptions

You might have noticed that in the readJSON() function, in order to avoid any exception being thrown inside the fs.readFile() callback, we put a try...catch block around JSON.parse(). Throwing inside an asynchronous callback would cause the exception to jump up to the event loop and never be propagated to the next callback.

In Node.js, this is an unrecoverable state and the application will simply shut down, printing the error to the stderr interface. To demonstrate this, let's try to remove the try...catch block from the readJSON() function defined previously:

const fs = require('fs'); 
function readJSONThrows(filename, callback) { 
  fs.readFile(filename, 'utf8', (err, data) => { 
    if(err) { 
      return callback(err); 
    } 
    //no errors, propagate just the data 
    callback(null, JSON.parse(data)); 
  }); 
}

Now, in the function we just defined, there is no way of catching an exception thrown by JSON.parse(). If we try to parse an invalid JSON file with the following code:

readJSONThrows('nonJSON.txt', err => console.log(err)); 

This would result in the application being abruptly terminated with the following exception being printed on the console:

SyntaxError: Unexpected token d
    at Object.parse (native)
    at [...]
    at fs.js:266:14
    at Object.oncomplete (fs.js:107:15)

Now, if we look at the preceding stack trace, we will see that it starts somewhere inside the fs.js module, exactly from the point at which the native API has completed reading and returned its result back to the fs.readFile() function, via the event loop. This clearly shows us that the exception traveled from our callback up the stack and then straight into the event loop, where it was finally caught and printed to the console.

This also means that wrapping the invocation of readJSONThrows() with a try...catch block will not work, because the stack in which the block operates is different from the one in which our callback is invoked. The following code shows the anti-pattern that we just described:

try { 
  readJSONThrows('nonJSON.txt', function(err, result) { 
    //... 
  }); 
} catch(err) { 
  console.log('This will not catch the JSON parsing exception'); 
} 

The preceding catch statement will never receive the JSON parsing exception, as the exception travels back up the stack in which it was thrown, and we just saw that this stack ends in the event loop, not with the function that triggered the asynchronous operation.

As said before, the application aborts the moment an exception reaches the event loop; however, we still have a chance to perform some cleanup or logging before the application terminates. In fact, when this happens, Node.js emits a special event called uncaughtException just before exiting the process. The following code shows a sample use case:

process.on('uncaughtException', (err) => { 
  console.error('This will catch at last the ' + 
    'JSON parsing exception: ' + err.message); 
  // Terminates the application with 1 (error) as exit code: 
  // without the following line, the application would continue 
  process.exit(1); 
}); 

It's important to understand that an uncaught exception leaves the application in a state that is not guaranteed to be consistent, which can lead to unforeseeable problems. For example, there might still be incomplete I/O requests running, or closures might have become inconsistent. That's why it is always advised, especially in production, to exit from the application after an uncaught exception is received.
