The module system and its patterns

Modules are the bricks for structuring non-trivial applications, but they are also the main mechanism for enforcing information hiding, keeping private all the functions and variables that are not explicitly marked to be exported. In this section, we will introduce the Node.js module system and its most common usage patterns.

The revealing module pattern

One of the major problems with JavaScript is the absence of namespacing: programs run in the global scope, polluting it with data that comes from both internal application code and dependencies. A popular technique to solve this problem is called the revealing module pattern, and it looks like the following:

const myModule = (() => {
  const privateFoo = () => {};
  const privateBar = [];

  const exported = {
    publicFoo: () => {},
    publicBar: () => {}
  };

  return exported;
})();
console.log(myModule);

This pattern leverages a self-invoking function (also known as an Immediately Invoked Function Expression, or IIFE) to create a private scope, exporting only the parts that are meant to be public. In the preceding code, the myModule variable contains only the exported API, while the rest of the module content is practically inaccessible from outside. As we will see in a moment, the idea behind this pattern is used as a base for the Node.js module system.

Node.js modules explained

CommonJS is a group with the aim to standardize the JavaScript ecosystem, and one of their most popular proposals is called CommonJS modules. Node.js built its module system on top of this specification, with the addition of some custom extensions. To describe how it works, we can make an analogy with the revealing module pattern, where each module runs in a private scope, so that every variable that is defined locally does not pollute the global namespace.

A homemade module loader

To explain how this works, let's build a similar system from scratch. The code that follows creates a function that mimics a subset of the functionality of the original require() function of Node.js.

Let's start by creating a function that loads the content of a module, wraps it into a private scope, and evaluates it:

//the fs module is needed to read the module's source code
const fs = require('fs');

function loadModule(filename, module, require) {
  const wrappedSrc = `(function(module, exports, require) {
      ${fs.readFileSync(filename, 'utf8')}
    })(module, module.exports, require);`;
  eval(wrappedSrc);
}

The source code of a module is essentially wrapped into a function, as in the revealing module pattern. The difference here is that we pass a list of variables to the module, in particular module, exports, and require. Make a note of how the exports argument of the wrapping function is initialized with the value of module.exports, as we will come back to this later.

Note

Bear in mind that this is only an example, and you will rarely need to evaluate some source code in a real application. Features such as eval() or the functions of the vm module (http://nodejs.org/api/vm.html) can be easily used in the wrong way or with the wrong input, thus opening a system to code injection attacks. They should always be used with extreme care or avoided altogether.
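For reference, here is a sketch (an illustration, not the actual Node.js source) of how the same wrapper could be evaluated with the vm module instead of eval(). This is closer in spirit to what Node.js does internally, but it carries the same injection risks when used with untrusted input:

const fs = require('fs');
const vm = require('vm');

function loadModuleWithVm(filename, module, require) {
  const wrappedSrc = `(function(module, exports, require) {
      ${fs.readFileSync(filename, 'utf8')}
    })`;
  //compile the wrapper in the current V8 context instead of using eval()
  const wrapper = vm.runInThisContext(wrappedSrc, {filename});
  wrapper(module, module.exports, require);
}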

Let's now see what these variables contain by implementing our require() function:

const require = (moduleName) => { 
  console.log(`Require invoked for module: ${moduleName}`); 
  const id = require.resolve(moduleName);      //[1] 
  if(require.cache[id]) {                      //[2] 
    return require.cache[id].exports; 
  } 
 
  //module metadata 
  const module = {                             //[3] 
    exports: {}, 
    id: id 
  }; 
  //Update the cache 
  require.cache[id] = module;                  //[4] 
 
  //load the module 
  loadModule(id, module, require);             //[5] 
 
  //return exported variables 
  return module.exports;                       //[6] 
}; 
require.cache = {}; 
require.resolve = (moduleName) => { 
  /* resolve a full module id from the moduleName */ 
}; 

The previous function simulates the behavior of the original require() function of Node.js, which is used to load a module. Of course, this is just for educational purposes and it does not accurately or completely reflect the internal behavior of the real require() function, but it's great for understanding the internals of the Node.js module system, including how a module is defined and loaded. What our homemade module system does is explained as follows:

  1. A module name is accepted as input, and the very first thing that we do is resolve the full path of the module, which we call id. This task is delegated to require.resolve(), which implements a specific resolving algorithm (we will talk about it later).
  2. If the module has already been loaded in the past, it should be available in the cache. In this case, we just return it immediately.
  3. If the module was not loaded yet, we set up the environment for the first load. In particular, we create a module object that contains an exports property initialized with an empty object literal. This property will be used by the code of the module to export any public API.
  4. The module object is cached.
  5. The module source code is read from its file and the code is evaluated, as we have seen before. We provide the module with the module object that we just created, and a reference to the require() function. The module exports its public API by manipulating or replacing the module.exports object.
  6. Finally, the content of module.exports, which represents the public API of the module, is returned to the caller.
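To see the caching logic (steps 2 and 4) in action, we could load the same hypothetical module twice and compare the results; the second call never re-evaluates the file:

//'./answer' is a hypothetical module used only for illustration
const first = require('./answer');   //prints: Require invoked for module: ./answer
const second = require('./answer');  //served from require.cache, nothing printed
console.log(first === second);       //true: the very same exports object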

As we see, there is nothing magical behind the workings of the Node.js module system; the trick is all in the wrapper we create around a module's source code and the artificial environment in which we run it.

Defining a module

By looking at how our custom require() function works, we should now know how to define a module. The following code gives us an example:

//load another dependency 
const dependency = require('./anotherModule'); 
 
//a private function 
function log() { 
  console.log(`Well done ${dependency.username}`); 
} 
 
//the API to be exported for public use 
module.exports.run = () => { 
  log(); 
}; 

The essential concept to remember is that everything inside a module is private unless it's assigned to the module.exports variable. The content of this variable is then cached and returned when the module is loaded using require().

Defining globals

Even if all the variables and functions that are declared in a module are defined in its local scope, it is still possible to define a global variable. In fact, the module system exposes a special variable called global, which can be used for this purpose. Everything that is assigned to this variable will end up automatically in the global scope.
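As a minimal example (the variable name is made up for illustration), a module can publish a value that every other module will be able to see:

//file globals.js
global.appVersion = '1.0.0';

//file main.js
require('./globals');
console.log(global.appVersion);    //prints: 1.0.0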

Note

Polluting the global scope is considered bad practice and nullifies the advantage of having a module system. So, use it only if you really know what you are doing.

module.exports versus exports

For many developers who are not yet familiar with Node.js, a common source of confusion is the difference between using exports and module.exports to expose a public API. The code of our custom require function should again clear any doubt. The variable exports is just a reference to the initial value of module.exports; we have seen that such a value is essentially a simple object literal created before the module is loaded.

This means that we can only attach new properties to the object referenced by the exports variable, as shown in the following code:

exports.hello = () => { 
  console.log('Hello'); 
};

Reassigning the exports variable doesn't have any effect, because it doesn't change the content of module.exports; it will only reassign the variable itself. The following code is therefore wrong:

exports = () => { 
  console.log('Hello'); 
};

If we want to export something other than an object literal, such as a function, an instance, or even a string, we have to reassign module.exports as follows:

module.exports = () => { 
  console.log('Hello'); 
};

The require function is synchronous

Another important detail that we should take into account is that our homemade require function is synchronous. In fact, it returns the module contents using a simple direct style, and no callback is required. This is true for the original Node.js require() function too. As a consequence, any assignment to module.exports must be synchronous as well. For example, the following code is incorrect:

setTimeout(() => {
  module.exports = function() { /* ... */ };
}, 100);

This property has important repercussions in the way we define modules, as it limits us to mostly using synchronous code during the definition of a module. This is actually one of the most important reasons why the core Node.js libraries offer synchronous APIs as an alternative to most of the asynchronous ones.
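The fs module is a typical example of this duality: since only a synchronous API can produce a value in time for the module definition, a configuration file (a hypothetical config.json here) can be loaded as follows:

//file config.js
const fs = require('fs');
const path = require('path');

//a synchronous read completes before the module finishes loading,
//so module.exports can be assigned directly
module.exports = JSON.parse(
  fs.readFileSync(path.join(__dirname, 'config.json'), 'utf8')
);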

If we need some asynchronous initialization steps for a module, we can always define and export an uninitialized module that is initialized asynchronously at a later time. The problem with this approach, though, is that loading such a module using require does not guarantee that it's ready to be used. In Chapter 9, Advanced Asynchronous Recipes, we will analyze this problem in detail and present some patterns to solve this issue elegantly.
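A minimal sketch of this approach (the file name and the initialized flag are illustrative assumptions) could look like this; note that a consumer requiring the module early would observe initialized still set to false:

//file asyncModule.js
const fs = require('fs');

module.exports = {
  initialized: false,
  data: null
};

//asynchronous initialization completed at a later time
fs.readFile('data.json', 'utf8', (err, content) => {
  module.exports.data = content;
  module.exports.initialized = true;
});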

For the sake of curiosity, you might want to know that in its early days, Node.js used to have an asynchronous version of require(), but it was soon removed because it was overcomplicating a functionality that was actually meant to be used only at initialization time and where asynchronous I/O brings more complexities than advantages.

The resolving algorithm

The term dependency hell describes a situation whereby two or more dependencies of a program in turn depend on a shared dependency, but require different, incompatible versions. Node.js solves this problem elegantly by loading a different version of a module depending on where the module is loaded from. All the merits of this feature go to npm, and also to the resolving algorithm used in the require() function.

Let's now give a quick overview of this algorithm. As we saw, the resolve() function takes a module name (which we will call here, moduleName) as input and it returns the full path of the module. This path is then used to load its code and also to identify the module uniquely. The resolving algorithm can be divided into the following three major branches:

  • File modules: If moduleName starts with /, it is already considered an absolute path to the module and it's returned as it is. If it starts with ./, then moduleName is considered a relative path, which is calculated starting from the requiring module.
  • Core modules: If moduleName is not prefixed with / or ./, the algorithm will first try to search within the core Node.js modules.
  • Package modules: If no core module is found matching moduleName, then the search continues by looking for a matching module in the first node_modules directory that is found navigating up in the directory structure starting from the requiring module. The algorithm continues to search for a match by looking into the next node_modules directory up in the directory tree, until it reaches the root of the filesystem.

For file and package modules, both the individual files and directories can match moduleName. In particular, the algorithm will try to match the following:

  • <moduleName>.js
  • <moduleName>/index.js
  • The directory/file specified in the main property of <moduleName>/package.json
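The following is a much simplified sketch of those matching rules (an illustration only; the real algorithm handles many more cases, and the main property of package.json takes precedence over index.js):

const fs = require('fs');
const path = require('path');

function resolveAsFileOrDirectory(modulePath) {
  //try <moduleName> and <moduleName>.js as plain files
  for (const file of [modulePath, `${modulePath}.js`]) {
    if (fs.existsSync(file) && fs.statSync(file).isFile()) {
      return file;
    }
  }
  //try the main property of <moduleName>/package.json
  const pkgFile = path.join(modulePath, 'package.json');
  if (fs.existsSync(pkgFile)) {
    const main = JSON.parse(fs.readFileSync(pkgFile, 'utf8')).main;
    if (main) {
      return path.join(modulePath, main);
    }
  }
  //fall back to <moduleName>/index.js
  const indexFile = path.join(modulePath, 'index.js');
  return fs.existsSync(indexFile) ? indexFile : null;
}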

The complete, formal documentation of the resolving algorithm can be found at http://nodejs.org/api/modules.html#modules_all_together.

The node_modules directory is actually where npm installs the dependencies of each package. This means that, based on the algorithm we just described, each package can have its own private dependencies. For example, consider the following directory structure:

myApp 
├── foo.js 
└── node_modules 
    ├── depA 
    │   └── index.js 
    ├── depB 
    │   ├── bar.js 
    │   └── node_modules 
    │       └── depA 
    │           └── index.js 
    └── depC 
        ├── foobar.js 
        └── node_modules 
            └── depA 
                └── index.js 

In the previous example, myApp, depB, and depC all depend on depA; however, they all have their own private version of the dependency! Following the rules of the resolving algorithm, using require('depA') will load a different file depending on the module that requires it, for example:

  • Calling require('depA') from /myApp/foo.js will load /myApp/node_modules/depA/index.js
  • Calling require('depA') from /myApp/node_modules/depB/bar.js will load /myApp/node_modules/depB/node_modules/depA/index.js
  • Calling require('depA') from /myApp/node_modules/depC/foobar.js will load /myApp/node_modules/depC/node_modules/depA/index.js

The resolving algorithm is the core part behind the robustness of the Node.js dependency management, and is what makes it possible to have hundreds or even thousands of packages in an application without having collisions or problems of version compatibility.

The resolving algorithm is applied transparently for us when we invoke require(); however, if needed, it can still be used directly by any module by simply invoking require.resolve().
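For example, given the directory structure above, we could print the resolved path without loading the module:

//in /myApp/node_modules/depB/bar.js
console.log(require.resolve('depA'));
//prints: /myApp/node_modules/depB/node_modules/depA/index.js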

The module cache

Each module is only loaded and evaluated the first time it is required, since any subsequent call of require() will simply return the cached version. This should be clear by looking at the code of our homemade require function. Caching is crucial for performance, but it also has some important functional implications:

  • It makes it possible to have cycles within module dependencies
  • It guarantees, to some extent, that the same instance is always returned when requiring the same module from within a given package

The module cache is exposed via the require.cache variable, so it is possible to access it directly if needed. A common use case is to invalidate a cached module by deleting the corresponding key in the require.cache variable, a practice that is very useful during testing but very dangerous if applied in normal circumstances.
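A minimal sketch of such an invalidation (reusing the logger module from our earlier examples) looks like this:

//remove the cached entry; the next require() re-evaluates the module
delete require.cache[require.resolve('./logger')];
const freshLogger = require('./logger');   //a brand new exports object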

Circular dependencies

Many consider circular dependencies an intrinsic design issue, but they might actually happen in a real project, so it's useful for us to know at least how they work in Node.js. If we look again at our homemade require() function, we immediately get a glimpse of how this might work and what its caveats are.

Suppose we have two modules defined as follows:

  • Module a.js:
       exports.loaded = false; 
       const b = require('./b'); 
       module.exports = { 
         bWasLoaded: b.loaded, 
         loaded: true 
       }; 
  • Module b.js:
       exports.loaded = false; 
       const a = require('./a'); 
       module.exports = { 
         aWasLoaded: a.loaded, 
         loaded: true 
       }; 

Now, let's try to load these from another module, main.js, as follows:

const a = require('./a'); 
const b = require('./b'); 
console.log(a); 
console.log(b); 

The preceding code will print the following output:

{ bWasLoaded: true, loaded: true }
{ aWasLoaded: false, loaded: true }

This result reveals the caveats of circular dependencies. While both the modules are completely initialized the moment they are required from the main module, the a.js module will be incomplete when it is loaded from b.js. In particular, its state will be the one that it reached the moment it required b.js. This behavior should ring another bell, which will be confirmed if we swap the order in which the two modules are required in main.js.

If you try it, you will see that this time it will be the module a.js that will receive an incomplete version of b.js. We understand now that this can become quite a fuzzy business if we lose control of which module is loaded first, which can happen quite easily if the project is big enough.
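Indeed, swapping the two require() calls in main.js deterministically produces the mirrored result:

//file main.js, with the order of the two requires swapped
const b = require('./b');
const a = require('./a');
console.log(a);   //{ bWasLoaded: false, loaded: true }
console.log(b);   //{ aWasLoaded: true, loaded: true }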

Module definition patterns

The module system, besides being a mechanism for loading dependencies, is also a tool for defining APIs. As for any other problem related to API design, the main factor to consider is the balance between private and public functionality. The aim is to maximize information hiding and API usability, while balancing these with other software qualities such as extensibility and code reuse.

In this section, we will analyze some of the most popular patterns for defining modules in Node.js; each one has its own balance of information hiding, extensibility, and code reuse.

Named exports

The most basic method for exposing a public API is using named exports, which consists of assigning all the values we want to make public to properties of the object referenced by exports (or module.exports). In this way, the resulting exported object becomes a container or namespace for a set of related functionality.

The following code shows a module implementing this pattern:

//file logger.js 
exports.info = (message) => { 
  console.log('info: ' + message); 
}; 
 
exports.verbose = (message) => { 
  console.log('verbose: ' + message); 
}; 

The exported functions are then available as properties of the loaded module, as shown in the following code:

//file main.js 
const logger = require('./logger'); 
logger.info('This is an informational message'); 
logger.verbose('This is a verbose message'); 

Most of the Node.js core modules use this pattern.

The CommonJS specification only allows the use of the exports variable to expose public members. Therefore, the named exports pattern is the only one that is really compatible with the CommonJS specification. The use of module.exports is an extension provided by Node.js to support a broader range of module definition patterns, such as those we are going to see next.

Exporting a function

One of the most popular module definition patterns consists of reassigning the whole module.exports variable to a function. Its main strength is the fact that it exposes only a single functionality, which provides a clear entry point for the module, making it simpler to understand and use; it also honors the principle of small surface area very well. This way of defining modules is also known in the community as the substack pattern, after one of its most prolific adopters, James Halliday (nickname substack). Have a look at this pattern in the following example:

//file logger.js 
module.exports = (message) => { 
  console.log(`info: ${message}`); 
}; 

A possible extension of this pattern is using the exported function as a namespace for other public APIs. This is a very powerful combination, because it still gives the module the clarity of a single entry point (the main exported function), while also allowing us to expose other functionalities that have secondary or more advanced use cases. The following code shows you how to extend the module we defined previously by using the exported function as a namespace:

module.exports.verbose = (message) => { 
  console.log(`verbose: ${message}`); 
}; 

This code demonstrates how to use the module that we just defined:

//file main.js 
const logger = require('./logger'); 
logger('This is an informational message'); 
logger.verbose('This is a verbose message'); 

Even though just exporting a function might seem like a limitation, in reality it's a perfect way to put the emphasis on a single functionality, the most important one for the module, while giving less visibility to secondary or internal aspects, which are instead exposed as properties of the exported function itself. The modularity of Node.js heavily encourages the adoption of the Single Responsibility Principle (SRP): every module should have responsibility over a single functionality and that responsibility should be entirely encapsulated by the module.

Note

Pattern (substack)

Expose the main functionality of a module by exporting only one function. Use the exported function as a namespace to expose any auxiliary functionality.

Exporting a constructor

A module that exports a constructor is a specialization of a module that exports a function. The difference is that with this new pattern we allow the user to create new instances using the constructor, but we also give them the ability to extend its prototype and forge new classes. The following is an example of this pattern:

//file logger.js 
function Logger(name) { 
  this.name = name; 
} 
 
Logger.prototype.log = function(message) { 
  console.log(`[${this.name}] ${message}`); 
}; 
 
Logger.prototype.info = function(message) { 
  this.log(`info: ${message}`); 
}; 
 
Logger.prototype.verbose = function(message) { 
  this.log(`verbose: ${message}`); 
}; 
 
module.exports = Logger; 

And, we can use the preceding module as follows:

//file main.js 
const Logger = require('./logger'); 
const dbLogger = new Logger('DB'); 
dbLogger.info('This is an informational message'); 
const accessLogger = new Logger('ACCESS'); 
accessLogger.verbose('This is a verbose message'); 

In the same fashion we can easily export an ES2015 class:

class Logger { 
  constructor(name) { 
    this.name = name; 
  } 
 
  log(message) { 
    console.log(`[${this.name}] ${message}`); 
  } 
 
  info(message) { 
    this.log(`info: ${message}`); 
  } 
 
  verbose(message) { 
    this.log(`verbose: ${message}`); 
  } 
} 
 
module.exports = Logger; 

Given that ES2015 classes are just syntactic sugar for prototypes, the usage of this module will be exactly the same as its prototype-based alternative.

Exporting a constructor or a class still provides a single entry point for the module, but compared to the substack pattern, it exposes a lot more of the module internals; however, on the other hand it allows much more power when it comes to extending its functionality.
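For instance, the ability to forge new classes mentioned above can be exercised by any consumer of the module; the following is a sketch (the file name is a made-up example) that specializes the exported Logger:

//file timestampedLogger.js
const Logger = require('./logger');

class TimestampedLogger extends Logger {
  log(message) {
    //prefix every message with an ISO timestamp
    super.log(`${new Date().toISOString()} ${message}`);
  }
}

module.exports = TimestampedLogger;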

A variation of this pattern consists of applying a guard against invocations that don't use the new operator. This little trick allows us to use our module as a factory. Let's see how this works:

function Logger(name) {
  if(!(this instanceof Logger)) {
    return new Logger(name);
  }
  this.name = name;
}

The trick is simple: we check whether this exists and is an instance of Logger. If any of these conditions is false, it means that the Logger() function was invoked without using new; in that case, we proceed with creating the new instance properly and returning it to the caller. This technique allows us to use the module also as a factory:

//file main.js
const Logger = require('./logger');
const dbLogger = Logger('DB');
dbLogger.verbose('This is a verbose message');

A much cleaner approach to implementing the guard is offered by the ES2015 new.target syntax, which is available starting from Node.js version 6. new.target is a "meta property" available inside all functions; at runtime, it references the constructor if the function was called using the new keyword, and it is undefined otherwise.

We can use this syntax to rewrite our logger factory:

function Logger(name) {
  if(!new.target) {
    return new Logger(name);
  }
  this.name = name;
}

This code is totally equivalent to the previous one, so we can say that new.target is another piece of helpful ES2015 syntactic sugar that makes our code more readable and natural.

Exporting an instance

We can leverage the caching mechanism of require() to easily define stateful instances created from a constructor or a factory, which can be shared across different modules. The following code shows an example of this pattern:

//file logger.js 
function Logger(name) { 
  this.count = 0; 
  this.name = name; 
} 
Logger.prototype.log = function(message) { 
  this.count++; 
  console.log('[' + this.name + '] ' + message); 
}; 
module.exports = new Logger('DEFAULT'); 

This newly defined module can then be used as follows:

//file main.js 
const logger = require('./logger'); 
logger.log('This is an informational message'); 

Because the module is cached, every module that requires the logger module will actually always retrieve the same instance of the object, thus sharing its state. This pattern is very much like creating a singleton; however, it does not guarantee the uniqueness of the instance across the entire application, as it happens in the traditional singleton pattern. When analyzing the resolving algorithm, we have seen in fact, that a module might be installed multiple times inside the dependency tree of an application. This results in multiple instances of the same logical module, all running in the context of the same Node.js application. In Chapter 7, Wiring Modules, we will analyze the consequences of exporting stateful instances and some alternative patterns.

An extension to the pattern we just described consists of exposing the constructor used to create the instance, in addition to the instance itself. This allows the user to create new instances of the same object, or even to extend it if necessary. To enable this, we just need to assign a new property to the instance, as shown in the following line of code:

module.exports.Logger = Logger; 

Then, we can use the exported constructor to create other instances of the class:

const customLogger = new logger.Logger('CUSTOM'); 
customLogger.log('This is an informational message'); 

From the usability perspective, this is similar to using an exported function as a namespace; the module exports the default instance of an object—the piece of functionality we might want to use most of the time, while more advanced features, such as the ability to create new instances or extend the object, are still available through less exposed properties.

Modifying other modules or the global scope

A module can even export nothing. This can look a bit out of place; however, we should not forget that a module can modify the global scope and any object in it, including other modules in the cache. Please note that these are in general considered bad practices, but since this pattern can be useful and safe under some circumstances (for example, for testing) and is sometimes used in the wild, it is worth knowing and understanding it. We said a module can modify other modules or objects in the global scope; well, this is called monkey patching. It generally refers to the practice of modifying the existing objects at runtime to change or extend their behavior or to apply temporary fixes.

The following example shows you how we can add a new function to another module:

//file patcher.js

// ./logger is another module
require('./logger').customMessage = () =>
  console.log('This is a new functionality');

Using our new patcher module would be as easy as writing the following code:

//file main.js 
 
require('./patcher'); 
const logger = require('./logger'); 
logger.customMessage(); 

In the preceding code, patcher must be required before using the logger module for the first time in order to allow the patch to be applied.
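As mentioned earlier, testing is one of the few circumstances where this pattern is commonly considered acceptable. Here is a minimal sketch (the module under test is hypothetical) that temporarily replaces fs.readFile with a stub and then restores it:

//file test.js
const fs = require('fs');
const originalReadFile = fs.readFile;

//replace fs.readFile with a stub for the duration of the test
fs.readFile = (file, encoding, callback) => {
  callback(null, 'fake content');
};

require('./moduleUnderTest');      //hypothetical module that reads files

//restore the original behavior once the test is done
fs.readFile = originalReadFile;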

The techniques described here are all dangerous ones to apply. The main concern is that having a module that modifies the global namespace or other modules is an operation with side effects. In other words, it affects the state of entities outside their scope, which can have consequences that aren't predictable, especially when multiple modules interact with the same entities. Imagine having two different modules trying to set the same global variable, or modifying the same property of the same module; the effects might be unpredictable (which module wins?), but most importantly it would have repercussions in the entire application.
