3

Callbacks and Events

In synchronous programming, we conceptualize code as a series of consecutive computing steps that solve a specific problem. Every operation is blocking, which means that the next operation can be executed only when the current one has completed. This approach makes the code very easy to read, understand, and debug.

On the other hand, in asynchronous programming, some operations, such as reading from a file or performing a network request, are launched and then executed "in the background." When we invoke an asynchronous operation, the instruction that follows is executed immediately, even if the previous asynchronous operation has not finished yet. In this scenario, we need a way to be notified when an asynchronous operation completes, and then continue the execution flow using the results from the operation. The most basic mechanism to get notified about the completion of an asynchronous operation in Node.js is the callback, which is nothing more than a function invoked by the runtime with the result of the asynchronous operation.

The callback is the most basic building block on which all other asynchronous mechanisms are based. In fact, without callbacks, we wouldn't have promises, and therefore no async/await either; we also wouldn't have streams or events. This is why it's important to know how callbacks work.

In this chapter, you will learn more about the Node.js Callback pattern and understand what it means, in practice, to write asynchronous code. We will make our way through conventions, patterns, and pitfalls, and by the end of this chapter, you will have mastered the basics of the Callback pattern.

You will also learn about the Observer pattern, which can be considered a close relative of the Callback pattern. The Observer pattern—embodied by the EventEmitter—uses callbacks to deal with multiple heterogeneous events and is one of the most extensively used components in Node.js programming.

To summarize, this is what you will learn in this chapter:

  • The Callback pattern, how it works, what conventions are used in Node.js, and how to deal with its most common pitfalls
  • The Observer pattern and how to implement it in Node.js using the EventEmitter class

The Callback pattern

Callbacks are the materialization of the handlers of the Reactor pattern (introduced in the previous chapter). They are one of those imprints that give Node.js its distinctive programming style.

Callbacks are functions that are invoked to propagate the result of an operation, and this is exactly what we need when dealing with asynchronous operations. In the asynchronous world, they replace the use of the return instruction, which, in turn, always executes synchronously. JavaScript is the ideal language for callbacks because functions are first-class objects and can be easily assigned to variables, passed as arguments, returned from another function invocation, or stored in data structures. Another ideal construct for implementing callbacks is closures. With closures, we can reference the environment in which a function was created; this way, we can always maintain the context in which the asynchronous operation was requested, no matter when or where its callback is invoked.

If you need to refresh your knowledge about closures, you can refer to the article on MDN Web Docs at nodejsdp.link/mdn-closures.
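As a quick illustration (a minimal sketch, not part of the chapter's official examples), the following function schedules an asynchronous callback that can still access the local variables of its enclosing function, thanks to the closure:

function processOrder (order) {
  setTimeout(() => {
    // this callback runs later, from the event loop, but the
    // closure still gives it access to the `order` variable
    console.log(`Order ${order.id} completed`)
  }, 100)
}

processOrder({ id: 42 }) // prints "Order 42 completed" after ~100ms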

In this section, we will analyze this particular style of programming, which uses callbacks instead of return instructions.

The continuation-passing style

In JavaScript, a callback is a function that is passed as an argument to another function and is invoked with the result when the operation completes. In functional programming, this way of propagating the result is called continuation-passing style (CPS).

It is a general concept, and it is not always associated with asynchronous operations. In fact, it simply indicates that a result is propagated by passing it to another function (the callback), instead of directly returning it to the caller.

Synchronous CPS

To clarify this concept, let's take a look at a simple synchronous function:

function add (a, b) {
  return a + b
}

There is nothing special going on here. The result is passed back to the caller using the return instruction. This is also called direct style, and it represents the most common way of returning a result in synchronous programming.

The equivalent CPS of the preceding function would be as follows:

function addCps (a, b, callback) {
  callback(a + b)
}

The addCps() function is a synchronous CPS function. It's synchronous because it will complete its execution only when the callback completes its execution too. The following code demonstrates this statement:

console.log('before')
addCps(1, 2, result => console.log(`Result: ${result}`))
console.log('after')

Since addCps() is synchronous, the previous code will trivially print the following:

before
Result: 3
after

Now, let's see how asynchronous CPS works.

Asynchronous CPS

Let's consider a case where the addCps() function is asynchronous:

function additionAsync (a, b, callback) {
  setTimeout(() => callback(a + b), 100)
}

In the previous code, we used setTimeout() to simulate an asynchronous invocation of the callback. setTimeout() adds a task to the event queue that is executed after the given number of milliseconds. This is clearly an asynchronous operation. Now, let's try to use additionAsync() and see how the order of the operations changes:

console.log('before')
additionAsync(1, 2, result => console.log(`Result: ${result}`))
console.log('after')

The preceding code will print the following:

before
after
Result: 3

Since setTimeout() triggers an asynchronous operation, it doesn't wait for the callback to be executed; instead, it returns immediately, giving control back to additionAsync(), and then back again to its caller. This property is crucial in Node.js, as it gives control back to the event loop as soon as an asynchronous request is sent, thus allowing a new event from the queue to be processed.

Figure 3.1 shows how this works:

Figure 3.1: Control flow of an asynchronous function's invocation

When the asynchronous operation completes, the execution is then resumed, starting from the callback provided to the asynchronous function that caused the unwinding. The execution starts from the event loop, so it has a fresh stack. This is where JavaScript comes in really handy. Thanks to closures, it is trivial to maintain the context of the caller of the asynchronous function, even if the callback is invoked at a different point in time and from a different location.

To sum this up, a synchronous function blocks until it completes its operations. An asynchronous function returns immediately, and its result is passed to a handler (in our case, a callback) at a later cycle of the event loop.

Non-CPS callbacks

There are several circumstances in which the presence of a callback argument might make us think that a function is asynchronous or is using a CPS. That's not always true. Let's take, for example, the map() method of an Array object:

const result = [1, 5, 7].map(element => element - 1)
console.log(result) // [0, 4, 6]

Clearly, the callback is used just to iterate over the elements of the array, and not to pass the result of the operation. In fact, the result is returned synchronously using a direct style. There's no syntactic difference between non-CPS callbacks and CPS ones. Therefore, the intent of a callback should be clearly stated in the documentation of the API.

In the next section, we are going to discuss one of the most important pitfalls of callbacks that every Node.js developer should be aware of.

Synchronous or asynchronous?

You have seen how the execution order of the instructions changes radically depending on the nature of a function—synchronous or asynchronous. This has strong repercussions on the flow of the entire application, both in terms of correctness and efficiency. The following is an analysis of these two paradigms and their pitfalls. In general, what must be avoided is creating inconsistency and confusion around the nature of an API, as doing so can lead to a set of problems that might be very hard to detect and reproduce. To drive our analysis, we will take, as an example, the case of an inconsistently asynchronous function.

An unpredictable function

One of the most dangerous situations is to have an API that behaves synchronously under certain conditions and asynchronously under others. Let's take the following code as an example:

import { readFile } from 'fs'
const cache = new Map()
function inconsistentRead (filename, cb) {
  if (cache.has(filename)) {
    // invoked synchronously
    cb(cache.get(filename))
  } else {
    // asynchronous function
    readFile(filename, 'utf8', (err, data) => {
      cache.set(filename, data)
      cb(data)
    })
  }
}

The preceding function uses the cache map to store the results of different file read operations. Bear in mind that this is just an example; it does not have error management, and the caching logic itself is suboptimal (in Chapter 11, Advanced Recipes, you'll learn how to handle asynchronous caching properly). But besides all this, the preceding function is dangerous because it behaves asynchronously until the file is read for the first time and the cache is set, but it is synchronous for all the subsequent requests once the file's content is already in the cache.

Unleashing Zalgo

Now, let's discuss how the use of an unpredictable function, such as the one that we just defined, can easily break an application. Consider the following code:

function createFileReader (filename) {
  const listeners = []
  inconsistentRead(filename, value => {
    listeners.forEach(listener => listener(value))
  })
  return {
    onDataReady: listener => listeners.push(listener)
  }
}

When the preceding function is invoked, it creates a new object that acts as a notifier, allowing us to set multiple listeners for a file read operation. All the listeners will be invoked at once when the read operation completes and the data is available. The preceding function uses our inconsistentRead() function to implement this functionality. Let's see how to use the createFileReader() function:

const reader1 = createFileReader('data.txt')
reader1.onDataReady(data => {
  console.log(`First call data: ${data}`)
  // ...sometime later we try to read again from
  // the same file
  const reader2 = createFileReader('data.txt')
  reader2.onDataReady(data => {
    console.log(`Second call data: ${data}`)
  })
})

The preceding code will print the following:

First call data: some data

As you can see, the callback of the second reader is never invoked. Let's see why:

  • During the creation of reader1, our inconsistentRead() function behaves asynchronously because there is no cached result available. This means that any onDataReady listener will be invoked later in another cycle of the event loop, so we have all the time we need to register our listener.
  • Then, reader2 is created in a cycle of the event loop in which the cache for the requested file already exists. In this case, the inner call to inconsistentRead() will be synchronous. So, its callback will be invoked immediately, which means that all the listeners of reader2 will be invoked synchronously as well. However, we are registering the listener after the creation of reader2, so it will never be invoked.

The callback behavior of our inconsistentRead() function is really unpredictable as it depends on many factors, such as the frequency of its invocation, the filename passed as an argument, and the amount of time taken to load the file.

The bug that you've just seen can be extremely complicated to identify and reproduce in a real application. Imagine using a similar function in a web server, where there can be multiple concurrent requests. Imagine seeing some of those requests hanging, without any apparent reason and without any error being logged. This can definitely be considered a nasty defect.

Isaac Z. Schlueter, the creator of npm and former Node.js project lead, in one of his blog posts, compared the use of this type of unpredictable function to unleashing Zalgo.

Zalgo is an internet legend about an ominous entity believed to cause insanity, death, and the destruction of the world. If you're not familiar with Zalgo, you are invited to find out what it is.

You can find Isaac Z. Schlueter's original post at nodejsdp.link/unleashing-zalgo.

Using synchronous APIs

The lesson to learn from the unleashing Zalgo example is that it is imperative for an API to clearly define its nature: either synchronous or asynchronous.

One possible fix for our inconsistentRead() function is to make it completely synchronous. This is possible because Node.js provides a set of synchronous direct style APIs for most basic I/O operations. For example, we can use the fs.readFileSync() function in place of its asynchronous counterpart. The code would become as follows:

import { readFileSync } from 'fs'
const cache = new Map()
function consistentReadSync (filename) {
  if (cache.has(filename)) {
    return cache.get(filename)
  } else {
    const data = readFileSync(filename, 'utf8')
    cache.set(filename, data)
    return data
  }
}

You can see that the entire function was also converted to direct style. There is no reason for a synchronous function to use CPS; in fact, it is always best practice to implement a synchronous API using direct style. This eliminates any confusion around its nature and is also more efficient from a performance perspective.

Pattern

Always choose a direct style for purely synchronous functions.

Bear in mind that changing an API from CPS to a direct style, or from asynchronous to synchronous or vice versa, might also require a change to the style of all the code using it. For example, in our case, we will have to totally change the interface of our createFileReader() API and adapt it so that it always works synchronously.
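For instance, a fully synchronous adaptation might look like the following sketch (hypothetical; it leans on the consistentReadSync() function defined above). Note how the notifier object disappears entirely, since the result is available immediately:

function createFileReaderSync (filename) {
  // no listeners to manage: the data is simply
  // returned to the caller in direct style
  return consistentReadSync(filename)
}

const data = createFileReaderSync('data.txt')
console.log(`First call data: ${data}`)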

Also, using a synchronous API instead of an asynchronous one has some caveats:

  • A synchronous API for a specific functionality might not always be available.
  • A synchronous API will block the event loop and put any concurrent requests on hold. This will break the Node.js concurrency model, slowing down the whole application. You will see later in this book what this really means for our applications.

In our consistentReadSync() function, the risk of blocking the event loop is partially mitigated because the synchronous I/O API is invoked only once per filename, while the cached value will be used for all the subsequent invocations. If we have a limited number of static files, then using consistentReadSync() won't have a big effect on our event loop. Things can change quickly if we have to read many files and only once.

Using synchronous I/O in Node.js is strongly discouraged in many circumstances, but in some situations, this might be the easiest and most efficient solution. Always evaluate your specific use case in order to choose the right alternative. As an example, it makes perfect sense to use a synchronous blocking API to load a configuration file while bootstrapping an application.

Pattern

Use blocking APIs sparingly and only when they don't affect the ability of the application to handle concurrent asynchronous operations.
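As an illustration of a legitimate use of blocking APIs, loading a configuration file at startup might look like the following minimal sketch (the config.json filename and the appName property are assumptions made for the sake of the example):

import { readFileSync } from 'fs'

// executed once at startup, before any request is served,
// so blocking the event loop here is acceptable
const config = JSON.parse(readFileSync('config.json', 'utf8'))
console.log(`Starting ${config.appName}...`)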

Guaranteeing asynchronicity with deferred execution

Another alternative for fixing our inconsistentRead() function is to make it purely asynchronous. The trick here is to schedule the synchronous callback invocation to be executed "in the future" instead of being run immediately in the same event loop cycle. In Node.js, this is possible with process.nextTick(), which defers the execution of a function until after the currently running operation completes. Its functionality is very simple: it takes a callback as an argument and pushes it to the top of the event queue, in front of any pending I/O event, and returns immediately. The callback will then be invoked as soon as the currently running operation yields control back to the event loop.

Let's apply this technique to fix our inconsistentRead() function, as follows:

import { readFile } from 'fs'
const cache = new Map()
function consistentReadAsync (filename, callback) {
  if (cache.has(filename)) {
    // deferred callback invocation
    process.nextTick(() => callback(cache.get(filename)))
  } else {
    // asynchronous function
    readFile(filename, 'utf8', (err, data) => {
      cache.set(filename, data)
      callback(data)
    })
  }
}

Now, thanks to process.nextTick(), our function is guaranteed to invoke its callback asynchronously, under any circumstances. Try to use it instead of the inconsistentRead() function and verify that, indeed, Zalgo has been eradicated.

Pattern

You can guarantee that a callback is invoked asynchronously by deferring its execution using process.nextTick().

Another API for deferring the execution of code is setImmediate(). While its purpose is very similar to that of process.nextTick(), its semantics are quite different. Callbacks deferred with process.nextTick() are called microtasks and they are executed just after the current operation completes, even before any other I/O event is fired. With setImmediate(), on the other hand, the execution is queued in an event loop phase that comes after all I/O events have been processed. Since process.nextTick() runs before any already scheduled I/O, it will be executed faster, but under certain circumstances, it might also delay the running of any I/O callback indefinitely (also known as I/O starvation), such as in the presence of a recursive invocation. This can never happen with setImmediate().

Using setTimeout(callback, 0) has a behavior comparable to that of setImmediate(), but in typical circumstances, callbacks scheduled with setImmediate() are executed faster than those scheduled with setTimeout(callback, 0). To see why, we have to consider that the event loop executes all the callbacks in different phases; for the type of events we are considering, we have timers (setTimeout()) that are executed before I/O callbacks, which are, in turn, executed before setImmediate() callbacks. This means that if we queue a task with setImmediate() in a setTimeout() callback, in an I/O callback, or in a microtask queued after these two phases, then the callback will be executed in a phase that comes right after the phase we are currently in. setTimeout() callbacks have to wait for the next cycle of the event loop.
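The following small experiment (not part of the chapter's official examples) demonstrates this ordering from within an I/O callback, where the relative order of timers and setImmediate() is deterministic; it assumes a data.txt file exists in the working directory:

import { readFile } from 'fs'

readFile('data.txt', () => {
  // we are now inside an I/O callback
  setTimeout(() => console.log('setTimeout'), 0)  // next timers phase
  setImmediate(() => console.log('setImmediate')) // upcoming check phase
  process.nextTick(() => console.log('nextTick')) // before leaving this phase
})
// prints: nextTick, setImmediate, setTimeout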

You will better appreciate the difference between these APIs when we analyze the use of deferred invocation for running synchronous CPU-bound tasks later in this book.

Next, we are going to explore the conventions used to define callbacks in Node.js.

Node.js callback conventions

In Node.js, CPS APIs and callbacks follow a set of specific conventions. These conventions apply to the Node.js core API, but they are also followed by the vast majority of the userland modules and applications. So, it's very important that you understand them and make sure that you comply whenever you need to design an asynchronous API that makes use of callbacks.

The callback comes last

In all core Node.js functions, the standard convention is that when a function accepts a callback as input, this has to be passed as the last argument.

Let's take the following Node.js core API as an example:

readFile(filename, [options], callback)

As you can see from the signature of the preceding function, the callback is always put in the last position, even in the presence of optional arguments. The reason for this convention is that the function call is more readable when the callback is defined in place.
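When designing our own asynchronous APIs with optional arguments, we can honor this convention by detecting the arguments at runtime. The following is a minimal sketch (readConfig() is a hypothetical function, not a core API):

function readConfig (filename, options, callback) {
  // allow options to be omitted while keeping the callback last
  if (typeof options === 'function') {
    callback = options
    options = {}
  }
  // ...perform the asynchronous operation, then invoke callback
}

readConfig('config.json', (err, config) => { /* ... */ })
readConfig('config.json', { strict: true }, (err, config) => { /* ... */ })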

Any error always comes first

In CPS, errors are propagated like any other type of result, which means using callbacks. In Node.js, any error produced by a CPS function is always passed as the first argument of the callback, and any actual result is passed starting from the second argument. If the operation succeeds without errors, the first argument will be null or undefined. The following code shows you how to define a callback that complies with this convention:

readFile('foo.txt', 'utf8', (err, data) => {
  if(err) {
    handleError(err)
  } else {
    processData(data)
  }
})

It is best practice to always check for the presence of an error, as not doing so will make it harder for you to debug your code and discover the possible points of failure. Another important convention to take into account is that the error must always be of type Error. This means that simple strings or numbers should never be passed as error objects.

Propagating errors

Propagating errors in synchronous, direct style functions is done with the well-known throw statement, which causes the error to jump up in the call stack until it is caught.

In asynchronous CPS, however, proper error propagation is done by simply passing the error to the next callback in the chain. The typical pattern looks as follows:

import { readFile } from 'fs'
function readJSON (filename, callback) {
  readFile(filename, 'utf8', (err, data) => {
    let parsed
    if (err) {
      // propagate the error and exit the current function
      return callback(err)
    }
    try {
      // parse the file contents
      parsed = JSON.parse(data)
    } catch (err) {
      // catch parsing errors
      return callback(err)
    }
    // no errors, propagate just the data
    callback(null, parsed)
  })
}

Notice how we propagate the error received by the readFile() operation. We do not throw it or return it; instead, we just use the callback as if it were any other result. Also, notice how we use the try...catch statement to catch any error thrown by JSON.parse(), which is a synchronous function and therefore uses the traditional throw instruction to propagate errors to the caller. Lastly, if everything went well, callback is invoked with null as the first argument to indicate that there are no errors.

It's also interesting to note how we refrained from invoking callback from within the try block. This is because doing so would catch any error thrown from the execution of the callback itself, which is usually not what we want.
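To see why, consider the following hypothetical variation of readJSON(), where the callback invocation was mistakenly moved inside the try block:

import { readFile } from 'fs'

function readJSONBroken (filename, callback) {
  readFile(filename, 'utf8', (err, data) => {
    if (err) {
      return callback(err)
    }
    try {
      const parsed = JSON.parse(data)
      callback(null, parsed) // if the callback itself throws...
    } catch (err) {
      callback(err) // ...it is invoked a second time, with an error!
    }
  })
}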

Uncaught exceptions

Sometimes, it can happen that an error is thrown and not caught within the callback of an asynchronous function. This could happen if, for example, we had forgotten to surround JSON.parse() with a try...catch statement in the readJSON() function we defined previously. Throwing an error inside an asynchronous callback would cause the error to jump up to the event loop, so it would never be propagated to the next callback. In Node.js, this is an unrecoverable state and the application would simply exit with a non-zero exit code, printing the stack trace to the stderr interface.

To demonstrate this, let's try to remove the try...catch block surrounding JSON.parse() from the readJSON() function we defined previously:

function readJSONThrows (filename, callback) {
  readFile(filename, 'utf8', (err, data) => {
    if (err) {
      return callback(err)
    }
    callback(null, JSON.parse(data))
  })
}

Now, in the function we just defined, there is no way of catching an exception that JSON.parse() might throw. If we try to parse an invalid JSON file with the following code:

readJSONThrows('invalid_json.json', (err) => console.error(err))

This will result in the application being abruptly terminated, with a stack trace similar to the following being printed on the console:

SyntaxError: Unexpected token h in JSON at position 1
    at JSON.parse (<anonymous>)
    at file:///.../03-callbacks-and-events/08-uncaught-errors/index.js:8:25
    at FSReqCallback.readFileAfterClose [as oncomplete] (internal/fs/read_file_context.js:61:3)

Now, if you look at the preceding stack trace, you will see that it starts from within the built-in fs module, and exactly from the point in which the native API has completed reading and returned its result back to the fs.readFile() function, via the event loop. This clearly shows that the exception traveled from our callback, up the call stack, and then straight into the event loop, where it was finally caught and thrown to the console.

This also means that wrapping the invocation of readJSONThrows() with a try...catch block will not work, because the stack in which the block operates is different from the one in which our callback is invoked. The following code shows the anti-pattern that was just described:

try {
  readJSONThrows('invalid_json.json', (err) => console.error(err))
} catch (err) {
  console.log('This will NOT catch the JSON parsing exception')
}

The preceding catch statement will never receive the JSON parsing error, as it will travel up the call stack in which the error was thrown, that is, in the event loop and not in the function that triggered the asynchronous operation.

As mentioned previously, the application will abort the moment an exception reaches the event loop. However, we still have the chance to perform some cleanup or logging before the application terminates. In fact, when this happens, Node.js will emit a special event called uncaughtException, just before exiting the process. The following code shows a sample use case:

process.on('uncaughtException', (err) => {
  console.error(`This will catch at last the JSON parsing exception: ${err.message}`)
  // Terminates the application with 1 (error) as exit code.
  // Without the following line, the application would continue
  process.exit(1)
})

It's important to understand that an uncaught exception leaves the application in a state that is not guaranteed to be consistent, which can lead to unforeseeable problems. For example, there might still be incomplete I/O requests running or closures might have become inconsistent. That's why it is always advised, especially in production, to never leave the application running after an uncaught exception is received. Instead, the process should exit immediately, optionally after having run some necessary cleanup tasks, and ideally, a supervising process should restart the application. This is also known as the fail-fast approach and it's the recommended practice in Node.js.

We'll discuss supervisors in more detail in Chapter 12, Scalability and Architectural Patterns.

This concludes our gentle introduction to the Callback pattern. Now, it's time to meet the Observer pattern, which is another critical component of an event-driven platform such as Node.js.

The Observer pattern

Another important and fundamental pattern used in Node.js is the Observer pattern. Together with the Reactor pattern and callbacks, the Observer pattern is an absolute requirement for mastering the asynchronous world of Node.js.

The Observer pattern is the ideal solution for modeling the reactive nature of Node.js and a perfect complement for callbacks. Let's give a formal definition, as follows:

The Observer pattern defines an object (called subject) that can notify a set of observers (or listeners) when a change in its state occurs.

The main difference from the Callback pattern is that the subject can actually notify multiple observers, while a traditional CPS callback will usually propagate its result to only one listener, the callback.

The EventEmitter

In traditional object-oriented programming, the Observer pattern requires interfaces, concrete classes, and a hierarchy. In Node.js, all this becomes much simpler. The Observer pattern is already built into the core and is available through the EventEmitter class. The EventEmitter class allows us to register one or more functions as listeners, which will be invoked when a particular event type is fired. Figure 3.2 visually explains this concept:

Figure 3.2: Listeners receiving events from an EventEmitter

The EventEmitter is exported from the events core module. The following code shows how we can obtain a reference to it:

import { EventEmitter } from 'events'
const emitter = new EventEmitter()

The essential methods of the EventEmitter are as follows:

  • on(event, listener): This method allows us to register a new listener (a function) for the given event type (a string).
  • once(event, listener): This method registers a new listener, which is then removed after the event is emitted for the first time.
  • emit(event, [arg1], [...]): This method produces a new event and provides additional arguments to be passed to the listeners.
  • removeListener(event, listener): This method removes a listener for the specified event type.

All the preceding methods will return the EventEmitter instance to allow chaining. The listener function has the signature function([arg1], [...]), so it simply accepts the arguments provided at the moment the event is emitted.

You can already see that there is a big difference between a listener and a traditional Node.js callback. In fact, the first argument is not an error, but it can be any data passed to emit() at the moment of its invocation.

Creating and using the EventEmitter

Let's now see how we can use an EventEmitter in practice. The simplest way is to create a new instance and use it immediately. The following code shows us a function that uses an EventEmitter to notify its subscribers in real time when a particular regular expression is matched in a list of files:

import { EventEmitter } from 'events'
import { readFile } from 'fs'
function findRegex (files, regex) {
  const emitter = new EventEmitter()
  for (const file of files) {
    readFile(file, 'utf8', (err, content) => {
      if (err) {
        return emitter.emit('error', err)
      }
      emitter.emit('fileread', file)
      const match = content.match(regex)
      if (match) {
        match.forEach(elem => emitter.emit('found', file, elem))
      }
    })
  }
  return emitter
}

The function we just defined returns an EventEmitter instance that will produce three events:

  • fileread, when a file is being read
  • found, when a match has been found
  • error, when an error occurs while reading a file

Let's now see how our findRegex() function can be used:

findRegex(
  ['fileA.txt', 'fileB.json'],
  /hello w+/g
)
  .on('fileread', file => console.log(`${file} was read`))
  .on('found', (file, match) => console.log(`Matched "${match}" in ${file}`))
  .on('error', err => console.error(`Error emitted ${err.message}`))

In the code we just defined, we register a listener for each of the three event types produced by the EventEmitter that was created by our findRegex() function.

Propagating errors

As with callbacks, the EventEmitter can't just throw an exception when an error condition occurs, as the exception would be lost in the event loop if the event is emitted asynchronously. Instead, the convention is to emit a special event, called error, and pass an Error object as an argument. That's exactly what we were doing in the findRegex() function that we defined earlier.

The EventEmitter treats the error event in a special way. It will automatically throw an exception and exit from the application if such an event is emitted and no associated listener is found. For this reason, it is recommended to always register a listener for the error event.
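A quick standalone sketch shows this special treatment in action:

import { EventEmitter } from 'events'

const emitter = new EventEmitter()
// registering a listener first would make the emission safe:
// emitter.on('error', err => console.error(err.message))

// with no 'error' listener registered, the following emission
// throws the Error object and terminates the process
emitter.emit('error', new Error('Something went wrong'))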

Making any object observable

In the Node.js world, the EventEmitter is rarely used on its own, as you saw in the previous example. Instead, it is more common to see it extended by other classes. In practice, this enables any class to inherit the capabilities of the EventEmitter, hence becoming an observable object.

To demonstrate this pattern, let's try to implement the functionality of the findRegex() function in a class, as follows:

import { EventEmitter } from 'events'
import { readFile } from 'fs'
class FindRegex extends EventEmitter {
  constructor (regex) {
    super()
    this.regex = regex
    this.files = []
  }
  addFile (file) {
    this.files.push(file)
    return this
  }
  find () {
    for (const file of this.files) {
      readFile(file, 'utf8', (err, content) => {
        if (err) {
          return this.emit('error', err)
        }
        this.emit('fileread', file)
        const match = content.match(this.regex)
        if (match) {
          match.forEach(elem => this.emit('found', file, elem))
        }
      })
    }
    return this
  }
}

The FindRegex class that we just defined extends EventEmitter to become a fully fledged observable class. Always remember to use super() in the constructor to initialize the EventEmitter internals.

The following is an example of how to use the FindRegex class we just defined:

const findRegexInstance = new FindRegex(/hello w+/)
findRegexInstance
  .addFile('fileA.txt')
  .addFile('fileB.json')
  .find()
  .on('found', (file, match) => console.log(`Matched "${match}" in file ${file}`))
  .on('error', err => console.error(`Error emitted ${err.message}`))

You will now notice how the FindRegex object also provides the on() method, which is inherited from the EventEmitter. This is a pretty common pattern in the Node.js ecosystem. For example, the Server object of the core HTTP module inherits from the EventEmitter, thus allowing it to produce events such as request (when a new request is received), connection (when a new connection is established), or close (when the server socket is closed).
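For instance, a minimal sketch of subscribing to some of these events looks as follows (port 8080 is an arbitrary choice):

import { createServer } from 'http'

const server = createServer((req, res) => res.end('hello'))
server
  .on('connection', () => console.log('New connection established'))
  .on('close', () => console.log('Server closed'))
  .listen(8080)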

Other notable examples of objects extending the EventEmitter are Node.js streams. We will analyze streams in more detail in Chapter 6, Coding with Streams.

EventEmitter and memory leaks

When subscribing to observables with a long life span, it is extremely important that we unsubscribe our listeners once they are no longer needed. This allows us to release the memory used by the objects in a listener's scope and prevent memory leaks. Unreleased EventEmitter listeners are the main source of memory leaks in Node.js (and JavaScript in general).

A memory leak is a software defect whereby memory that is no longer needed is not released, causing the memory usage of an application to grow indefinitely. For example, consider the following code:

const thisTakesMemory = 'A big string....'
const listener = () => {
  console.log(thisTakesMemory)
}
emitter.on('an_event', listener)

The variable thisTakesMemory is referenced in the listener and therefore its memory is retained until the listener is released from emitter, or until the emitter itself is garbage collected, which can only happen when there are no more active references to it, making it unreachable.

You can find a good explanation about garbage collection in JavaScript and the concept of reachability at nodejsdp.link/garbage-collection.

This means that if an EventEmitter remains reachable for the entire duration of the application, all its listeners do too, and with them all the memory they reference. If, for example, we register a listener to a "permanent" EventEmitter at every incoming HTTP request and never release it, then we are causing a memory leak. The memory used by the application will grow indefinitely, sometimes slowly, sometimes faster, but eventually it will crash the application. To prevent such a situation, we can release the listener with the removeListener() method of the EventEmitter:

emitter.removeListener('an_event', listener)

An EventEmitter has a very simple built-in mechanism for warning the developer about possible memory leaks. When the count of listeners registered to an event exceeds a specific amount (by default, 10), the EventEmitter will produce a warning. Sometimes, registering more than 10 listeners is completely fine, so we can adjust this limit by using the setMaxListeners() method of the EventEmitter.

We can use the convenience method once(event, listener) in place of on(event, listener) to automatically unregister a listener after the event is received for the first time. However, be advised that if the event we specify is never emitted, then the listener is never released, causing a memory leak.
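The following sketch recaps these facilities on a throwaway emitter:

import { EventEmitter } from 'events'

const emitter = new EventEmitter()
emitter.setMaxListeners(20) // raise the warning threshold from 10

// automatically unregistered after the first 'ready' event...
emitter.once('ready', () => console.log('ready fired'))
emitter.emit('ready')
// ...but had 'ready' never been emitted, the listener (and all
// the memory it references) would have been retained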

Synchronous and asynchronous events

As with callbacks, events can also be emitted synchronously or asynchronously with respect to the moment the tasks that produce them are triggered. It is crucial that we never mix the two approaches in the same EventEmitter, but even more importantly, we should never emit the same event type using a mix of synchronous and asynchronous code, to avoid producing the same problems described in the Unleashing Zalgo section. The main difference between emitting synchronous and asynchronous events lies in the way listeners can be registered.

When events are emitted asynchronously, we can register new listeners, even after the task that produces the events is triggered, up until the current stack yields to the event loop. This is because the events are guaranteed not to be fired until the next cycle of the event loop, so we can be sure that we won't miss any events.

The FindRegex class we defined previously emits its events asynchronously after the find() method is invoked. This is why we can register the listeners after the find() method is invoked, without losing any events, as shown in the following code:

findRegexInstance
  .addFile(...)
  .find()
  .on('found', ...)

On the other hand, if we emit our events synchronously after the task is launched, we have to register all the listeners before we launch the task, or we will miss all the events. To see how this works, let's modify the FindRegex class we defined previously and make the find() method synchronous (we'll call this variant FindRegexSync):

find () {
  for (const file of this.files) {
    let content
    try {
      content = readFileSync(file, 'utf8')
    } catch (err) {
      this.emit('error', err)
      continue // content is undefined here, so skip to the next file
    }
    this.emit('fileread', file)
    const match = content.match(this.regex)
    if (match) {
      match.forEach(elem => this.emit('found', file, elem))
    }
  }
  return this
}

Now, let's try to register a listener before we launch the find() task, and then a second listener after that to see what happens:

const findRegexSyncInstance = new FindRegexSync(/hello w+/)
findRegexSyncInstance
  .addFile('fileA.txt')
  .addFile('fileB.json')
  // this listener is invoked
  .on('found', (file, match) => console.log(`[Before] Matched "${match}"`))
  .find()
  // this listener is never invoked
  .on('found', (file, match) => console.log(`[After] Matched "${match}"`))

As expected, the listener that was registered after the invocation of the find() task is never called; in fact, the preceding code will print:

[Before] Matched "hello world"
[Before] Matched "hello NodeJS"

There are some (rare) situations in which emitting an event in a synchronous fashion makes sense, but the very nature of the EventEmitter lies in its ability to deal with asynchronous events. Most of the time, emitting events synchronously is a telltale sign that we either don't need the EventEmitter at all or that, somewhere else, the same observable is emitting another event asynchronously, potentially causing a Zalgo type of situation.

The emission of synchronous events can be deferred with process.nextTick() to guarantee that they are emitted asynchronously.
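For instance, the synchronous find() method shown above could be adapted as in the following sketch, which still reads the files synchronously but defers every emission to the next tick, so listeners registered right after the invocation are not missed:

find () {
  for (const file of this.files) {
    process.nextTick(() => {
      let content
      try {
        content = readFileSync(file, 'utf8')
      } catch (err) {
        return this.emit('error', err)
      }
      this.emit('fileread', file)
      const match = content.match(this.regex)
      if (match) {
        match.forEach(elem => this.emit('found', file, elem))
      }
    })
  }
  return this
}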

EventEmitter versus callbacks

A common dilemma when defining an asynchronous API is deciding whether to use an EventEmitter or simply accept a callback. The general differentiating rule is semantic: callbacks should be used when a result must be returned in an asynchronous way, while events should be used when there is a need to communicate that something has happened.

But besides this simple principle, a lot of confusion is generated from the fact that the two paradigms are, most of the time, equivalent and allow us to achieve the same results. Consider the following code as an example:

import { EventEmitter } from 'events'
function helloEvents () {
  const eventEmitter = new EventEmitter()
  setTimeout(() => eventEmitter.emit('complete', 'hello world'), 100)
  return eventEmitter
}
function helloCallback (cb) {
  setTimeout(() => cb(null, 'hello world'), 100)
}
helloEvents().on('complete', message => console.log(message))
helloCallback((err, message) => console.log(message))

The two functions helloEvents() and helloCallback() can be considered equivalent in terms of functionality. The first communicates the completion of the timeout using an event, while the second uses a callback. But what really differentiates them is the readability, the semantics, and the amount of code that is required for them to be implemented or used.

While a deterministic set of rules for you to choose between one style or the other can't be given, here are some hints to help you make a decision on which method to use:

  • Callbacks have some limitations when it comes to supporting different types of events. In fact, we can still differentiate between multiple events by passing the type as an argument of the callback, or by accepting several callbacks, one for each supported event. However, this can't exactly be considered an elegant API. In this situation, the EventEmitter can give a better interface and leaner code.
  • The EventEmitter should be used when the same event can occur multiple times, or may not occur at all. A callback, in fact, is expected to be invoked exactly once, whether the operation is successful or not. Having a possibly repeating circumstance should make us think again about the semantic nature of the occurrence, which is more similar to an event that has to be communicated, rather than a result to be returned.
  • An API that uses callbacks can notify only one particular callback, while using an EventEmitter allows us to register multiple listeners for the same event.

Combining callbacks and events

There are some particular circumstances where the EventEmitter can be used in conjunction with a callback. This pattern is extremely powerful as it allows us to pass a result asynchronously using a traditional callback, and at the same time return an EventEmitter, which can be used to provide a more detailed account on the status of an asynchronous process.

One example of this pattern is offered by the glob package (nodejsdp.link/npm-glob), a library that performs glob-style file searches. The main entry point of the module is the function it exports, which has the following signature:

const eventEmitter = glob(pattern, [options], callback)

The function takes a pattern as the first argument, a set of options, and a callback that is invoked with the list of all the files matching the provided pattern. At the same time, the function returns an EventEmitter, which provides a more fine-grained report about the state of the search process. For example, it is possible to be notified in real time when a match occurs by listening to the match event, to obtain the list of all the matched files with the end event, or to know whether the process was manually aborted by listening to the abort event. The following code shows what this looks like in practice:

import glob from 'glob'
glob('data/*.txt',
  (err, files) => {
    if (err) {
      return console.error(err)
    }
    console.log(`All files found: ${JSON.stringify(files)}`)
  })
  .on('match', match => console.log(`Match found: ${match}`))

Combining an EventEmitter with traditional callbacks is an elegant way to offer two different approaches to the same API. One approach is usually meant to be simpler and more immediate to use, while the other is targeted at more advanced scenarios.

The EventEmitter can also be combined with other asynchronous mechanisms such as promises (which we will look at in Chapter 5, Asynchronous Control Flow Patterns with Promises and Async/Await). In this case, just return an object (or array) containing both the promise and the EventEmitter. This object can then be destructured by the caller, like this: {promise, events} = foo().
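A minimal sketch of this combination might look as follows (processQueue() and its event names are hypothetical, chosen for illustration):

import { EventEmitter } from 'events'

function processQueue (items) {
  const events = new EventEmitter()
  const promise = new Promise(resolve => {
    let processed = 0
    for (const item of items) {
      setImmediate(() => {
        events.emit('processed', item) // fine-grained progress report
        if (++processed === items.length) {
          resolve(processed) // the overall result
        }
      })
    }
  })
  return { promise, events }
}

const { promise, events } = processQueue(['a', 'b', 'c'])
events.on('processed', item => console.log(`Processed: ${item}`))
promise.then(total => console.log(`Done, ${total} items processed`))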

Summary

In this chapter, we made our first contact with the practical aspects of writing asynchronous code. You discovered the two pillars of the entire Node.js asynchronous infrastructure—the callback and the EventEmitter—and we explored in detail their use cases, conventions, and patterns. We also explored some of the pitfalls of dealing with asynchronous code and you learned about the ways to avoid them. Mastering the content of this chapter paves the way toward learning the more advanced asynchronous techniques that will be presented throughout the rest of this book.

In the next chapter, you will learn how to deal with complex asynchronous control flows using callbacks.

Exercises

  • 3.1 A simple event: Modify the asynchronous FindRegex class so that it emits an event when the find process starts, passing the input files list as an argument. Hint: beware of Zalgo!
  • 3.2 Ticker: Write a function that accepts a number and a callback as the arguments. The function will return an EventEmitter that emits an event called tick every 50 milliseconds until the number of milliseconds is passed from the invocation of the function. The function will also call the callback when the number of milliseconds has passed, providing, as the result, the total count of tick events emitted. Hint: you can use setTimeout() to schedule another setTimeout() recursively.
  • 3.3 A simple modification: Modify the function created in exercise 3.2 so that it emits a tick event immediately after the function is invoked.
  • 3.4 Playing with errors: Modify the function created in exercise 3.3 so that it produces an error if the timestamp at the moment of a tick (including the initial one that we added as part of exercise 3.3) is divisible by 5. Propagate the error using both the callback and the event emitter. Hint: use Date.now() to get the timestamp and the remainder (%) operator to check whether the timestamp is divisible by 5.