14 Advanced data manipulation

Whatever is well-conceived is clearly said

This chapter covers

  • Manipulating nested data
  • Writing clear and concise code for business logic
  • Separating business logic and generic data manipulation
  • Building custom data manipulation tools
  • Using the best tool for the job

When our business logic involves advanced data processing, the generic data manipulation functions provided by the language runtime and by third-party libraries might not be sufficient. Instead of mixing the details of data manipulation with business logic, we can write our own generic data manipulation functions and implement our custom business logic using them. Separating business logic from the internal details of data manipulation makes the business logic code concise and easy to read for other developers.

14.1 Updating a value in a map with eloquence

Dave is more and more autonomous on the Klafim project. He can implement most features on his own, typically turning to Theo only for code reviews. Dave’s code quality standards are quite high. Even when his code is functionally solid, he tends to be unsatisfied with its readability. Today, he asks for Theo’s help in improving the readability of the code that fixes a bug Theo introduced a long time ago.

  DAVE   I think I have found a bug in the code that returns book information from the Open Library API.

  THEO   What bug?

  DAVE   Sometimes, the API returns duplicate author names, and we pass the duplicates through to the client.

  THEO   It doesn’t sound like a complicated bug to fix.

  DAVE   Right, I fixed it, but I’m not satisfied with the readability of the code I wrote.

  THEO   Being critical of our own code is an important quality for a developer to progress. What is it exactly that you don’t like?

  DAVE   Take a look at this code.

Listing 14.1 Removing duplicates in a straightforward but tedious way

function removeAuthorDuplicates(book) {
  var authors = _.get(book, "authors");
  var uniqAuthors = _.uniq(authors);
  return _.set(book,"authors", uniqAuthors);
}

  DAVE   I’m using _.get to retrieve the array with the author names, then _.uniq to create a duplicate-free version of the array, and finally, _.set to create a new version of the book with no duplicate author names.

  THEO   The code is tedious because the next value of authors needs to be based on its current value.

  DAVE   But it’s a common use case! Isn’t there a simpler way to write this kind of code?

  THEO   Your astonishment definitely honors you as a developer, Dave. I agree with you that there must be a simpler way. Let me phone Joe and see if he’s available for a conference call.

    JOE   How’s it going, Theo?

  THEO   Great! Are you back from your tech conference?

    JOE   I just landed. I’m on my way home now in a taxi.

  THEO   How was your talk about DOP?

    JOE   Pretty good. At the beginning people were a bit suspicious, but when I told them the story of Albatross and Klafim, it was quite convincing.

  THEO   Yeah, adults are like children in that way; they love stories.

    JOE   What about you? Did you manage to achieve polymorphism with multimethods?

  THEO   Yes! Dave even managed to implement a feature in Klafim with multimethods.

    JOE   Cool!

  THEO   Do you have time to help Dave with a question about programming?

    JOE   Sure.

  DAVE   Hi Joe. How are you doing?

    JOE   Hello Dave. Not bad. What kind of help do you need?

  DAVE   I’m wondering if there’s a simpler way to remove duplicates inside an array value in a map. Using _.get, _.uniq, and _.set looks quite tedious.

    JOE   You should build your own data manipulation tools.

  DAVE   What do you mean?

    JOE   You should write a generic update function that updates a value in a map, applying a calculation based on its current value.1

  DAVE   What would the arguments of update be in your opinion?

    JOE   Put the cart before the horse.

  DAVE   What?!

    JOE   Rewrite your business logic as if update were already implemented, and you’ll discover what the arguments of update should be.

  DAVE   I see what you mean: the horse is the implementation of update, and the cart is the usage of update.

    JOE   Exactly. But remember, it’s better if you keep your update function generic.

  DAVE   How?

    JOE   By not limiting it to your specific use case.

  DAVE   I see. The implementation of update should not deal with removing duplicate elements. Instead, it should receive the updating function—in my case, _.uniq—as an argument.

    JOE   Exactly! Uh, sorry Dave, I gotta go, I just got home. Good luck!

  DAVE   Take care, Joe, and thanks!

Dave ends the conference call. Looking at Theo, he reiterates the conversation with Joe.

  DAVE   Joe advised me to write my own update function. For that purpose, he told me to start by rewriting removeAuthorDuplicates as if update were already implemented. That will allow us to make sure we get the signature of update right.

  THEO   Sounds like a plan.

  DAVE   Joe called it “putting the cart before the horse.”

  THEO   Joe and his funny analogies ...

TIP The best way to find the signature of a custom data manipulation function is to think about the most convenient way to use it.

  DAVE   Anyway, the way I’d like to use update inside removeAuthorDuplicates is like this.

Listing 14.2 The code that removes duplicates in an elegant way

function removeAuthorDuplicates(book) {
  return update(book, "authors", _.uniq);
}

  THEO   Looks good to me!

  DAVE   Wow! Now the code with update is much more elegant than the code with _.get and _.set!

  THEO   Before you implement update, I suggest that you write down in plain English exactly what the function does.

  DAVE   It’s quite easy: update receives a map called map, a path called path, and a function called fun. It returns a new version of map, where path is associated with fun(currentValue), and currentValue is the value associated with path in map.

Thinking out loud, Dave simultaneously draws a diagram like that in figure 14.1. Theo is becoming more and more impressed with his young protégé as he studies the figure.

Figure 14.1 The behavior of update

TIP Before implementing a custom data manipulation function, formulate in plain English exactly what the function does.

  THEO   With such a clear definition, it’s going to be a piece of cake to implement update!

After a few minutes, Dave comes up with the code. It doesn’t take long because the plain-English description and the diagram help him organize the code.

Listing 14.3 A generic update function

function update(map, path, fun) {
  var currentValue = _.get(map, path);
  var nextValue = fun(currentValue);
  return _.set(map, path, nextValue);
}

  THEO   Why don’t you see if it works with a simple case such as incrementing a number in a map?

  DAVE   Good idea! I’ll try multiplying a value in a map by 2 with update. How’s this look?

Listing 14.4 Multiplying a value in a map by 2

var m = {
  "position": "manager",
  "income": 100000
};
update(m, "income", function(x) {
  return x * 2;
});
// → {"position": "manager", "income": 200000}

  THEO   Great! It seems to work.
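
One caveat is worth checking against your own setup: listing 14.3 returns a new version of the map only if _.set leaves its argument untouched, whereas the _.set that ships with standard Lodash modifies the object it receives. The following is a minimal sketch, not the implementation used in this chapter, of an update that never modifies its input. It uses plain JavaScript instead of Lodash, handles only dot-separated string paths, and relies on two hypothetical helpers, getIn and setIn, introduced here purely for illustration.

```javascript
// Hypothetical helper: reads the value at a dot-separated path such as "a.b".
function getIn(map, path) {
  return path.split(".").reduce(function(value, key) {
    return value == null ? undefined : value[key];
  }, map);
}

// Hypothetical helper: returns a copy of map with the value at path replaced.
// Only the objects along the path are copied; the original is left untouched.
function setIn(map, path, value) {
  var keys = path.split(".");
  function go(current, index) {
    if (index === keys.length) {
      return value;
    }
    var key = keys[index];
    var copy = Array.isArray(current)
      ? current.slice()
      : Object.assign({}, current);
    copy[key] = go(current == null ? undefined : current[key], index + 1);
    return copy;
  }
  return go(map, 0);
}

// Same signature as listing 14.3, built on the non-mutating helpers above.
function update(map, path, fun) {
  return setIn(map, path, fun(getIn(map, path)));
}

var m = {
  "position": "manager",
  "income": 100000
};
var doubled = update(m, "income", function(x) {
  return x * 2;
});
// doubled → {"position": "manager", "income": 200000}
// m.income is still 100000
```

Because the signature is unchanged, removeAuthorDuplicates from listing 14.2 would read exactly the same with this version of update.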

14.2 Manipulating nested data

The next Monday, during Theo and Dave’s weekly sync meeting, they discuss the upcoming features for Klafim. Theo fondly remembers another Monday where they met at Dave’s family home in the country. Coming back to the present moment, Theo begins.

  THEO   Recently, Nancy has been asking for more and more administrative features.

  DAVE   Like what?

  THEO   I’ll give you a few examples... . Let me find the email I got from Nancy yesterday.

  DAVE   OK.

  THEO   Here it is. There are three feature requests for now: listing all the book author IDs, calculating the book lending ratio, and grouping books by a physical library.

  DAVE   What feature should I tackle first?

  THEO   It doesn’t matter, but you should deliver all three before the end of the week. Good luck, and don’t hesitate to call me if you need help.

On Tuesday, Dave asks for Theo’s help. Dave is not pleased with how his code looks.

  DAVE   I started to work on the three admin features, but I don’t like the code I wrote. Let me show you the code for retrieving the list of author IDs from the list of books returned from the database.

  THEO   Can you remind me what an element in a book list returned from the database looks like?

  DAVE   Each book is a map with an authorIds array field.

  THEO   OK, so it sounds like a map over the books should do it.

  DAVE   This is what I did, but it doesn’t work as expected. Here’s my code for listing the book author IDs.

Listing 14.5 Retrieving the author IDs in books as an array of arrays

function authorIdsInBooks(books) {
  return _.map(books, "authorIds");
}

  THEO   What’s the problem?

  DAVE   The problem is that it returns an array of arrays of author IDs instead of an array of author IDs. For instance, when I run authorIdsInBooks on a catalog with two books, I get this result.

Listing 14.6 The author IDs in an array of arrays

[
  ["sean-covey", "stephen-covey"],
  ["alan-moore", "dave-gibbons"]
]

  THEO   That’s not a big problem. You can flatten an array of arrays with _.flatten, and you should get the result you expect.

  DAVE   Nice! This is exactly what I need! Give me a moment to fix the code of authorIdsInBooks... here you go.

Listing 14.7 Retrieving the author IDs in books as an array of strings

function authorIdsInBooks(books) {
  return _.flatten(_.map(books, "authorIds"));
}

  THEO   Don’t you think that mapping and then flattening deserves a function of its own?

  DAVE   Maybe. It’s quite easy to implement a flatMap function.2 How about this?

Listing 14.8 The implementation of flatMap

function flatMap(coll, f) {
  return _.flatten(_.map(coll,f));
}

  THEO   Nice!

  DAVE   I don’t know... . It’s kind of weird to have such a small function.

  THEO   I don’t think that code size is what matters here.

  DAVE   What do you mean?

  THEO   See what happens when you rewrite authorIdsInBooks using flatMap.

  DAVE   OK, here’s how I’d use flatMap to list the author IDs.

Listing 14.9 Retrieving the author IDs as an array of strings using flatMap

function authorIdsInBooks(books) {
  return flatMap(books, "authorIds");
}
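
For readers without Lodash at hand, here is a minimal sketch of the same idea in plain JavaScript. It assumes a runtime that provides Array.prototype.flat (ES2019), and it imitates the field-name shorthand of _.map with a small check on the type of f. The two-book catalog is illustrative data shaped to produce the result shown in listing 14.6.

```javascript
// Plain-JavaScript stand-in for the flatMap of listing 14.8 (a sketch, not Lodash's).
function flatMap(coll, f) {
  // Like _.map, accept a field name as shorthand for "read that field".
  var fn = (typeof f === "string")
    ? function(item) { return item[f]; }
    : f;
  // flat() with no argument flattens exactly one level.
  return coll.map(function(item) { return fn(item); }).flat();
}

function authorIdsInBooks(books) {
  return flatMap(books, "authorIds");
}

// Illustrative catalog: only the authorIds field matters here.
var books = [
  {"authorIds": ["sean-covey", "stephen-covey"]},
  {"authorIds": ["alan-moore", "dave-gibbons"]}
];

var authorIds = authorIdsInBooks(books);
// → ["sean-covey", "stephen-covey", "alan-moore", "dave-gibbons"]
```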

  THEO   What implementation do you prefer, the one with flatten and map (in listing 14.7) or the one with flatMap (in listing 14.9)?

  DAVE   I don’t know. To me, they look quite similar.

  THEO   Right, but which implementation is more readable?

  DAVE   Well, assuming I know what flatMap does, I would say the implementation with flatMap. Because it’s more concise, it is a bit more readable.

  THEO   Again, it’s not about the size of the code. It’s about the clarity of intent and the power of naming things.

  DAVE   I don’t get that.

  THEO   Let me give you an example from our day-to-day language.

  DAVE   OK.

  THEO   Could you pass me that thing on your desk that’s used for writing?

It takes Dave a few seconds to get that Theo has asked him to pass the pen on the desk. After he passes Theo the pen, he asks:

  DAVE   Why didn’t you simply ask for the pen?

  THEO   I wanted you to experience how it feels when we use descriptions instead of names to convey our intent.

  DAVE   Oh, I see. You mean that once we use a name for the operation that maps and flattens, the code becomes clearer.

  THEO   Exactly.

  DAVE   Let’s move on to the second admin feature: calculating the book lending ratio.

  THEO   Before that, I think we deserve a short period for rest and refreshments, where we drink a beverage made by percolation from roasted and ground seeds.

  DAVE   A coffee break!

14.3 Using the best tool for the job

After the coffee break, Dave shows Theo his implementation of the book lending ratio calculation. This time, he seems to like the code he wrote.

  DAVE   I’m quite proud of the code I wrote to calculate the book lending ratio.

  THEO   Show me the money!

  DAVE   My function receives a list of books from the database like this.

Listing 14.10 A list of two books with bookItems

[
  {
    "isbn": "978-1779501127",
    "title": "Watchmen",
    "bookItems": [
      {
        "id": "book-item-1",
        "libId": "nyc-central-lib",
        "isLent": true
      } 
    ]
  },
  {
    "isbn":  "978-1982137274",
    "title": "7 Habits of Highly Effective People",
    "bookItems": [
      {
        "id": "book-item-123",
        "libId": "hudson-park-lib",
        "isLent": true
      },
      {
        "id": "book-item-17",
        "libId": "nyc-central-lib",
        "isLent": false
      }
    ]
  }
]

  THEO   Quite a nested piece of data!

  DAVE   Yeah, but now that I’m using flatMap, calculating the lending ratio is quite easy. I’m going over all the book items with forEach and incrementing either the lent or the notLent counter. At the end, I return the ratio between lent and (lent + notLent). Here’s how I do that.

Listing 14.11 Calculating the book lending ratio using forEach

function lendingRatio(books) {
  var bookItems = flatMap(books, "bookItems");
  var lent = 0;
  var notLent = 0;
  _.forEach(bookItems, function(item) {
    if(_.get(item, "isLent")) {
      lent = lent + 1;
    } else {
      notLent = notLent + 1;
    }
  });
  return lent/(lent + notLent);
}

  THEO   Would you allow me to tell you frankly what I think of your code?

  DAVE   If you are asking this question, it means that you don’t like it. Right?

  THEO   It’s nothing against you; I don’t like any piece of code with forEach.

  DAVE   What’s wrong with forEach?

  THEO   It’s too generic!

  DAVE   I thought that genericity was a positive thing in programming.

  THEO   It is when we build a utility function, but when we use a utility function, we should use the least generic function that solves our problem.

  DAVE   Why?

  THEO   Because we ought to choose the right tool for the job, as in real life.

  DAVE   What do you mean?

  THEO   Let me give you an example. Yesterday, I had to clean my drone from the inside. Do you think that I used a screwdriver or a Swiss army knife to unscrew the drone cover?

  DAVE   A screwdriver, of course! It’s much more convenient to manipulate.

  THEO   Right. Also, imagine that someone looks at me using a screwdriver. It’s quite clear to them that I am turning a screw. It conveys my intent clearly.

  DAVE   Are you saying that forEach is like the Swiss army knife of data manipulation?

  THEO   That’s a good way to put it.

TIP Pick the least generic utility function that solves your problem.

  DAVE   What function should I use then, to iterate over the book item collection?

  THEO   You could use _.reduce.

  DAVE   I thought reduce was about returning data from a collection. Here, I don’t need to return data; I need to update two variables, lent and notLent.

  THEO   You could represent those two values in a map with two keys.

  DAVE   Can you show me how to rewrite my lendingRatio function using reduce?

  THEO   Sure. The initial value passed to reduce is the map, {"lent": 0, "notLent": 0}, and inside each iteration, we update one of the two keys, like this.

Listing 14.12 Calculating the book lending ratio using reduce

function lendingRatio(books) {
  var bookItems = flatMap(books, "bookItems");
  var stats = _.reduce(bookItems, function(res, item) {
    if(_.get(item, "isLent")) {
      res.lent = res.lent + 1;
    } else {
      res.notLent = res.notLent + 1;
    }
    return res;
  }, {notLent: 0, lent:0});
  return stats.lent/(stats.lent + stats.notLent);
}

  DAVE   Instead of updating the variables lent and notLent, now we are updating lent and notLent map fields. What’s the difference?

  THEO   Dealing with map fields instead of variables allows us to get rid of reduce in our business logic code.

  DAVE   How could you iterate over a collection without forEach and without reduce?

  THEO   I can’t avoid the iteration over a collection, but I can hide reduce behind a utility function. Take a look at the way reduce is used inside the code of lendingRatio. What is the meaning of the reduce call?

Dave looks at the code in listing 14.12. He thinks for a long moment before he answers.

  DAVE   I think it’s counting the number of times isLent is true and false.

  THEO   Right. Now, let’s use Joe’s advice about building our own data manipulation tool.

  DAVE   How exactly?

  THEO   I suggest that you write a countByBoolField utility function that counts the number of times a field is true and false.

  DAVE   OK, but before implementing this function, let me first rewrite the code of lendingRatio, assuming this function already exists.

  THEO   You are definitely a fast learner, Dave!

  DAVE   Thanks! I think that by using countByBoolField, the code for calculating the lending ratio using a custom utility function would be something like this.

Listing 14.13 Calculating the book lending ratio

function lendingRatio(books) {
  var bookItems = flatMap(books, "bookItems");
  var stats = countByBoolField(bookItems, "isLent", "lent", "notLent");
  return stats.lent/(stats.lent + stats.notLent);
}

TIP Don’t use _.reduce or any other low-level data manipulation function inside code that deals with business logic. Instead, write a utility function, with a proper name, that hides _.reduce.

  THEO   Perfect. Don’t you think that this code is clearer than the code using _.reduce?

  DAVE   I do! The code is both more concise and the intent is clearer. Let me see if I can implement countByBoolField now.

  THEO   I suggest that you write a unit test first.

  DAVE   Good idea.

Dave types for a bit. When he’s satisfied, he shows Theo the result.

Listing 14.14 A unit test for countByBoolField

var input = [
  {"a": true},
  {"a": false},
  {"a": true},
  {"a": true}
];
 
var expectedRes = {
  "aTrue": 3,
  "aFalse": 1
};
 
_.isEqual(countByBoolField(input, "a", "aTrue", "aFalse"), expectedRes);

  THEO   Looks good to me. Now, for the implementation of countByBoolField, I think you are going to need our update function.

  DAVE   I think you’re right. On each iteration, I need to increment the value of either aTrue or aFalse using update and a function that increments a number by 1.

After a few minutes of trial and error, Dave comes up with the piece of code that uses reduce, update, and inc. He shows Theo the code for countByBoolField.

Listing 14.15 The implementation of countByBoolField

function inc (n) {
  return n + 1;
}
 
function countByBoolField(coll, field, keyTrue, keyFalse) {
  return _.reduce(coll, function(res, item) {
    if (_.get(item, field)) {
      return update(res, keyTrue, inc);
    }
    return update(res, keyFalse, inc);
  }, {[keyTrue]: 0,    // Creates a map with keyTrue and keyFalse associated with 0
      [keyFalse]: 0});
}
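
To see listings 14.13 and 14.15 work together, here is a self-contained sketch in plain JavaScript. It is an approximation under stated assumptions: flatMap and update are simplified stand-ins (this update handles only top-level keys and copies the map with Object.assign instead of relying on a non-mutating _.set), and the input is the two-book list from listing 14.10, trimmed to the fields that matter here.

```javascript
// Simplified stand-ins for the chapter's helpers (assumptions, not Lodash).
function flatMap(coll, field) {
  return coll.map(function(item) { return item[field]; }).flat();
}

function update(map, key, fun) {
  var copy = Object.assign({}, map);   // never modify the map we were given
  copy[key] = fun(map[key]);
  return copy;
}

function inc(n) {
  return n + 1;
}

function countByBoolField(coll, field, keyTrue, keyFalse) {
  var zeroCounts = {};
  zeroCounts[keyTrue] = 0;
  zeroCounts[keyFalse] = 0;
  return coll.reduce(function(res, item) {
    return update(res, item[field] ? keyTrue : keyFalse, inc);
  }, zeroCounts);
}

function lendingRatio(books) {
  var bookItems = flatMap(books, "bookItems");
  var stats = countByBoolField(bookItems, "isLent", "lent", "notLent");
  return stats.lent / (stats.lent + stats.notLent);
}

// The books of listing 14.10, trimmed to the fields used here.
var books = [
  {"title": "Watchmen",
   "bookItems": [{"id": "book-item-1", "isLent": true}]},
  {"title": "7 Habits of Highly Effective People",
   "bookItems": [{"id": "book-item-123", "isLent": true},
                 {"id": "book-item-17", "isLent": false}]}
];

var stats = countByBoolField(flatMap(books, "bookItems"),
                             "isLent", "lent", "notLent");
// stats → {"lent": 2, "notLent": 1}
var ratio = lendingRatio(books);
// ratio is 2/3: two of the three book items are lent
```

With an empty list of books, both counters stay at 0 and the division yields NaN. Whether that case needs special handling is a business decision that the chapter does not address.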

  THEO   Well done! Shall we move on and review the third admin feature?

  DAVE   The third feature is more complicated. I would like to use the teachings from the first two features for the implementation of the third feature.

  THEO   OK. Call me when you’re ready for the code review.

14.4 Unwinding at ease

Dave really struggled with the implementation of the last admin feature, grouping books by a physical library. After a couple of hours of frustration, Dave calls Theo for a rescue.

  DAVE   I really had a hard time implementing the grouping by library feature.

  THEO   I only have a couple of minutes before my next meeting, but I can try to help you. What’s the exact definition of grouping by library?

  DAVE   Let me show you the unit test I wrote.

Listing 14.16 Unit test for grouping books by a library

var books = [
  {
    "isbn": "978-1779501127",
    "title": "Watchmen",
    "bookItems": [
      {
        "id": "book-item-1",
        "libId": "nyc-central-lib",
        "isLent": true
      } 
    ]
  },
  {
    "isbn":  "978-1982137274",
    "title": "7 Habits of Highly Effective People",
    "bookItems": [
      {
        "id": "book-item-123",
        "libId": "hudson-park-lib",
        "isLent": true
      },
      {
        "id": "book-item-17",
        "libId": "nyc-central-lib",
        "isLent": false
      }
    ]
  }
];
 
var expectedRes = 
{
  "hudson-park-lib": [
    {
      "bookItems": {
        "id": "book-item-123",
        "isLent": true,
        "libId": "hudson-park-lib",
      },
      "isbn": "978-1982137274",
      "title": "7 Habits of Highly Effective People",
    },
  ],
  "nyc-central-lib": [
    {
      "bookItems":  {
        "id": "book-item-1",
        "isLent": true,
        "libId": "nyc-central-lib",
      },
      "isbn": "978-1779501127",
      "title": "Watchmen",
    },
    {
      "bookItems":  {
        "id": "book-item-17",
        "isLent": false,
        "libId": "nyc-central-lib",
      },
      "isbn": "978-1982137274",
      "title": "7 Habits of Highly Effective People",
    },
  ],
};
_.isEqual(booksByLib(books), expectedRes);

  THEO   Cool... . Writing unit tests before implementing complicated functions was also helpful for me when I refactored Klafim from OOP to DOP.

  DAVE   Writing unit tests for functions that receive and return data is much more fun than writing unit tests for the methods of stateful objects.

TIP Before implementing a complicated function, write a unit test for it.

  THEO   What was difficult about the implementation of booksByLib?

  DAVE   I started with a complicated implementation involving merge and reduce before I remembered that you advised me to hide reduce behind a generic function. But I couldn’t figure out what kind of generic function I needed.

  THEO   Indeed, it’s not easy to implement.

  DAVE   I’m glad to hear that. I thought I was doing something wrong.

  THEO   The challenge here is that you need to work with book items, but the book title and ISBN are not present in the book item map.

  DAVE   Exactly!

  THEO   It reminds me of a query I had to write a year ago on MongoDB, where data was laid out in a similar way.

  DAVE   And what did your query look like?

  THEO   I used MongoDB’s $unwind operator. Given a map m in which a field arr is associated with an array myArray, it returns an array with one map per element of myArray. Each of those maps is a version of m where arr is associated with that single element instead of the whole array.

  DAVE   That’s a bit abstract for me. Could you give me an example?

Theo moves to the whiteboard. He draws a diagram like the one in figure 14.2.

Figure 14.2 The behavior of unwind

  THEO   In my case, I was dealing with an online store, where a customer cart was represented as a map with a customer-id field and an items array field. Each element in the array represented an item in the cart. I wrote a query with unwind that retrieved the cart items with the customer-id field.

  DAVE   Amazing! That’s exactly what we need. Let’s write our own unwind function!

  THEO   I’d be happy to pair program with you on this cool stuff, but I’m already running late for another meeting.

  DAVE   I’m glad I’m not a manager!

When Theo leaves for his meeting, Dave goes to the kitchen and prepares himself a long espresso as a reward for all that he’s accomplished today. He thoroughly enjoys it as he works on the implementation of unwind.

As Joe advised, Dave starts by writing the code for booksByLib as if unwind were already implemented. He needs to go over each book and unwind its book items using flatMap and unwind. He then groups the book items by their libId using _.groupBy. Satisfied with the resulting code, he finishes his espresso.

Listing 14.17 Grouping books by a library using unwind

function booksByLib(books) {
  var bookItems = flatMap(books, function(book) {
    return unwind(book, "bookItems");
  });
  return _.groupBy(bookItems, "bookItems.libId");
}

Dave cannot believe that such a complicated function could be implemented so clearly and compactly. Dave says to himself that the complexity must reside in the implementation of unwind—but he soon finds out that he’s wrong; it is not going to be as complicated as he thought! He starts by writing a unit test for unwind, similar to Theo’s MongoDB customer cart scenario.

Listing 14.18 A unit test for unwind

var customer = {
  "customer-id": "joe",
  "items": [
    {
      "item": "phone",
      "quantity": 1
    },
    {
      "item": "pencil",
      "quantity": 10
    }
  ]
};
 
var expectedRes = [
  {
    "customer-id": "joe",
    "items": {
      "item": "phone",
      "quantity": 1
    }
  },
  {
    "customer-id": "joe",
    "items": {
      "item": "pencil",
      "quantity": 10
    }
  }
]
 
_.isEqual(unwind(customer, "items"), expectedRes)

The implementation of unwind is definitely not as complicated as Dave thought. It retrieves the array arr associated with field in map and creates, for each element elem of arr, a version of map where field is associated with elem. Dave is happy to remember that because data is immutable, there is no need to clone map.

Listing 14.19 The implementation of unwind

function unwind(map, field) {
  var arr = _.get(map, field);
  return _.map(arr, function(elem) {
    return _.set(map, field, elem);
  });
}
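
Listing 14.19 depends on _.set returning a fresh map for every element. With a _.set that modifies its argument, as in standard Lodash, every entry of the result would be the same object. The sketch below sidesteps the question by copying explicitly. It is plain JavaScript under stated assumptions: unwind handles only a top-level field, groupByPath is a hypothetical stand-in for _.groupBy with a dot-separated path, and the books are those of listing 14.16 trimmed to the relevant fields.

```javascript
// Returns one copy of map per element of map[field], with field holding that element.
function unwind(map, field) {
  return map[field].map(function(elem) {
    var copy = Object.assign({}, map);   // shallow copy; map itself is untouched
    copy[field] = elem;
    return copy;
  });
}

// Hypothetical stand-in for _.groupBy(coll, "a.b"): groups by the value at a path.
function groupByPath(coll, path) {
  var keys = path.split(".");
  return coll.reduce(function(groups, item) {
    var groupKey = keys.reduce(function(value, key) {
      return value[key];
    }, item);
    if (!groups[groupKey]) {
      groups[groupKey] = [];
    }
    groups[groupKey].push(item);
    return groups;
  }, {});
}

function booksByLib(books) {
  var bookItems = books.map(function(book) {
    return unwind(book, "bookItems");
  }).flat();
  return groupByPath(bookItems, "bookItems.libId");
}

var books = [
  {"title": "Watchmen",
   "bookItems": [{"id": "book-item-1", "libId": "nyc-central-lib"}]},
  {"title": "7 Habits of Highly Effective People",
   "bookItems": [{"id": "book-item-123", "libId": "hudson-park-lib"},
                 {"id": "book-item-17", "libId": "nyc-central-lib"}]}
];

var byLib = booksByLib(books);
// byLib["nyc-central-lib"] holds two entries (Watchmen, then 7 Habits)
// byLib["hudson-park-lib"] holds one entry (7 Habits)
```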

After a few moments of contemplating his beautiful code, Dave sends Theo a message with a link to the pull request that implements grouping books by a library with unwind. After that he leaves the office to go home, by bike, tired but satisfied.

Summary

  • Maintain a clear separation between the code that deals with business logic and the implementation of the data manipulation.

  • Separating business logic from data manipulation makes our code not only concise, but also easy to read because it conveys the intent in a clear manner.

  • We design and implement custom data manipulation functions in a four-step process:

    a) Discover the function signature by using it before it is implemented.

    b) Write a unit test for the function.

    c) Formulate the behavior of the function in plain English.

    d) Implement the function.

  • The best way to find the signature of a custom data manipulation function is to think about the most convenient way to use it.

  • Before implementing a custom data manipulation function, formulate in plain English exactly what the function does.

  • Pick the least generic utility function that solves your problem.

  • Don’t use _.reduce or any other low-level data manipulation function inside code that deals with business logic. Instead, write a utility function—with a proper name—that hides _.reduce.

  • Before implementing a complicated function, write a unit test for it.

Lodash functions introduced in this chapter

Function             Description
flatten(arr)         Flattens arr a single level deep
sum(arr)             Computes the sum of the values in arr
uniq(arr)            Creates an array of unique values from arr
every(coll, pred)    Checks if pred returns true for all elements of coll
forEach(coll, f)     Iterates over elements of coll and invokes f for each element
sortBy(coll, f)      Creates an array of elements, sorted in ascending order, by the results of running each element in coll through f


1 Lodash provides an implementation of update, but for the sake of teaching, we are writing our own implementation.

2 Lodash provides an implementation of flatMap, but for the sake of teaching, we are writing our own implementation.
