Simplifying conditional spaghetti

To illustrate too much cyclomatic complexity and how we should approach simplifying it, we're going to be refactoring a piece of code that is responsible for deriving a set of ID numbers and types from a set of licenses:

function getIDsFromLicenses(licenses) {
  const ids = [];
  for (let i = 0; i < licenses.length; i++) {
    let license = licenses[i];
    if (license.id != null) {
      if (license.id.indexOf('c') === 0) {
        let nID = Number(license.id.slice(1));
        if (nID >= 1000000) {
          ids.push({ type: 'car', digits: nID });
        } else {
          ids.push({ type: 'car_old', digits: nID });
        }
      } else if (license.id.indexOf('h') === 0) {
        ids.push({
          type: 'hgv',
          digits: Number(license.id.slice(1))
        });
      } else if (license.id.indexOf('m') === 0) {
        ids.push({
          type: 'motorcycle',
          digits: Number(license.id.slice(1))
        });
      }
    }
  } 
  return ids;
}

This function accepts an array of licenses and then extracts the ID numbers of those licenses (avoiding cases of null or undefined IDs). We determine the type of license based on characters found within its ID. There are four types of licenses that need to be identified and extracted:

car: These are of the c{digits} form, where digits form a number greater than or equal to 1,000,000
car_old: These are of the c{digits} form, where digits form a number less than 1,000,000
hgv: These are of the h{digits}
motorcycle: These are of the m{digits}

The following is an example of the input and the derived output of the getIDsFromLicenses function:

getIDsFromLicenses([
    { name: 'Jon Smith', id: 'c32948' },
    { name: 'Marsha Brown' },
    { name: 'Leah Oak', id: 'h109' },
    { name: 'Jim Royle', id: 'c29283928' }
]);
// Outputs:
[
  {type: "car_old", digits: 32948}
  {type: "hgv", digits: 109}
  {type: "car", digits: 29283928}
]

As you may have observed, the code we've used to extract the IDs is quite cyclomatically complex. You may consider it perfectly reasonable code, and it arguably is, but it could be simpler still. Our function achieves its results imperatively, using up a lot of syntax to explain how it wants to accomplish its task instead of what it wants to accomplish.

To simplify our code, it's first useful to take a fresh look at the problem domain. The task we want to accomplish is to take an input array and, from it, derive a set of license ID types and values. The output array will be an almost 1:1 mapping from the input array, except for cases where licenses have a falsy id property (null, in this case). The following is an illustration of our I/O flow:

[INPUT LICENSES] ==> (DERIVATION LOGIC) ==> [OUTPUT ID TYPES AND DIGITS]

Looked at abstractly in this way, this seems like the perfect opportunity to use Array#map. The map method allows us to run a function on every element within an array to derive a new array containing mapped values.

The first thing we'll want to map is the license to its id:

ids = licenses.map(license => license.id)

We'll want to handle cases where there is no id. To do this, we can apply a filter on the derived IDs:

ids = ids.filter(id => id != null)

And, in fact, because we know that all valid IDs are truthy, we can simply do a Boolean check by directly passing Boolean as our filter function:

ids = ids.filter(Boolean)

From this, we'll receive an array of our licenses but only those with a truthy id property. Following this, we can consider the next transformation we wish to apply to the data. We'd like to split the id value into its constituent parts: we need the initial character of the ID (id.charAt(0)), and then we want to extract the remaining characters (the digits) and cast them to the Number type (Number(id.slice(1))). We can then pass these parts to another function, which will be responsible for extracting the correct ID fields (type and digits) from this information:

ids = ids.map(id => getIDFields(
  id.charAt(0),
  Number(id.slice(1))
));

The getIDFields function will need to determine the type from the individual character and digits for the ID, returning an object of the { type, digits } form:

function getIDFields(idType, digits) {
  switch (idType) {
    case 'c': return {
      type: digits >= 1000000 ? 'car' : 'car_old',
      digits
    };
    case 'h': return { type: 'hgv', digits };
    case 'm': return { type: 'motorcycle', digits };
  }
}

Since we've abstracted this part our logic away to an individual function, we can independently observe and test its behavior:

getIDFields('c', 1000); // => { type: "car_old", digits: 1000 }
getIDFields('c', 2000000); // => { type: "car", digits: 1000 }
getIDFields('h', 1000); // => { type: "hgv", digits: 1000 }
getIDFields('i', 1000); // => { type: "motorcycle", digits: 1000 }

Tying everything together, we end up with a new implementation of getIDsFromLicenses that looks like this:

function getIDsFromLicenses(licenses) {
  return licenses
    .map(license => license.id)
    .filter(Boolean)
    .map(id => getIDFields(
      id.charAt(0),
      Number(id.slice(1))
    ))
}

What we have achieved here is a significant reduction in the amount of cyclomatic complexity that our fellow programmers will need to contend with. We are utilizing Array#map and Array#filter to abstract away both decision-making and iteration logic. This means we end up with an implementation that is far more declarative.

You may notice, as well, that we have extracted repeated logic and generalized it. For example, in our initial implementation, we were implementing many calls to discover the first character of the ID (for example, license.id.indexOf('m') === 0). Our new implementation generalizes this by mapping to a data structure that already includes the first character as a distinct value that we can then pass through to getIDFields to get the relevant type and digits for that ID.

To summarize, our general refactoring approach has involved the following considerations:

We've considered the problem domain with a fresh perspective
We've considered whether there is a common functional or declarative idiom for our I/O
We've considered whether individual logic can be abstracted away or separated

Our code is now easier to comprehend, and hence easier to maintain and debug. It'll likely also be more reliable and stable since its individual units can be more simply tested and can hence avoid regressions in the future. There is, naturally, the potential for a slight performance decrease due to the increased usage of higher abstracted declarative idioms and functions over imperative code, but this is an incredibly marginal difference and, in the vast majority of situations, is worth implementing for the significant benefits that the refactoring produces in terms of maintainability and reliability.

Table of Contents for Simplifying conditional spaghetti

Create new playlist

Sign In

Sign Up

Table of Contents for
Simplifying conditional spaghetti