Standing on the shoulders of giants
In part 1, we illustrated how to manage the state of a system without mutating data, where immutability is maintained by constraining ourselves to manipulate the state only with immutable functions using structural sharing. In this chapter, we present a safer and more scalable way to preserve data immutability—representing data with so-called persistent data structures. Efficient implementations of persistent data structures exist for most programming languages via third-party libraries.
It’s at the university where Theo meets Joe this time. When Theo asks Joe if today’s topic is academic in nature, Joe tells him that the use of persistent data structures only became possible in programming languages following a discovery in 2001 by a computer researcher named Phil Bagwell.1 In 2007, Rich Hickey, the creator of Clojure, used this discovery as the foundation of persistent data structures in Clojure. Unveiling the secrets of these data structures to Theo in a university classroom is a way for Joe to honor the memory of Phil Bagwell, who unfortunately passed away in 2012. When they get to the university classroom, Joe starts the conversation with a question.
JOE Are you getting used to DOP’s prohibition against mutating data in place and creating new versions instead?
THEO I think so, but two things bother me about the idea of structural sharing that you showed me.
JOE What bothers you, my friend?
THEO Safety and performance.
JOE What do you mean by safety?
THEO I mean that using immutable functions to manipulate data doesn’t prevent it from being modified accidentally.
JOE Right! Would you like me to show you the naive way to handle immutability or the real way?
THEO What are the pros and cons of each way?
JOE The naive way is easy but not efficient, whereas the real way is efficient but not easy.
THEO Let’s start with the naive way then.
JOE Each programming language provides its own way to protect data from being mutated.
THEO How would I do that in Java, for instance?
JOE Java provides immutable collections, and there is a way to convert a list or a map to an immutable list or an immutable map.
►Note Immutable collections are not the same as persistent data structures.
Joe opens his laptop and fires it up. He brings up two code examples, one for immutable lists and one for immutable maps.
var myList = new ArrayList<Integer>();
myList.add(1);
myList.add(2);
myList.add(3);
var myImmutableList = List.of(myList.toArray());
var myMap = new HashMap<String, Object>();
myMap.put("name", "Isaac");
myMap.put("age", 42);
var myImmutableMap = Collections.unmodifiableMap(myMap);
THEO What happens when you try to modify an immutable collection?
JOE Java throws an UnsupportedOperationException.
THEO And in JavaScript?
JOE JavaScript provides an Object.freeze() function that prevents data from being mutated. It works both with JavaScript arrays and objects.
Joe takes a minute to scroll through his laptop. When he finds what he’s looking for, he shows Theo the code.
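A minimal sketch of such a snippet might look like this (the values are illustrative):

```javascript
// Illustrative data frozen with Object.freeze()
const myArr = Object.freeze([1, 2, 3]);
const myObj = Object.freeze({ "name": "Isaac", "age": 42 });

console.log(Object.isFrozen(myArr)); // → true
console.log(Object.isFrozen(myObj)); // → true
```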
THEO What happens when you try to modify a frozen object?
JOE It depends. In JavaScript strict mode, a TypeError exception is thrown; in nonstrict mode, it fails silently.
►Note JavaScript’s strict mode is a way to opt in to a restricted variant of JavaScript that changes some silent errors to throw errors.
THEO In case of a nested collection, are the nested collections also frozen?
JOE No, but in JavaScript, one can write a deepFreeze() function that freezes an object recursively. Here’s another example.
function deepFreeze(object) {
  // Retrieve the property names defined on object
  const propNames = Object.getOwnPropertyNames(object);
  // Freeze properties before freezing self
  for (const name of propNames) {
    const value = object[name];
    if (value && typeof value === "object") {
      deepFreeze(value);
    }
  }
  return Object.freeze(object);
}
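For instance, applied to a nested book object (an illustrative example), deepFreeze() freezes the inner array as well:

```javascript
// deepFreeze() as defined above (repeated so this snippet runs standalone)
function deepFreeze(object) {
  const propNames = Object.getOwnPropertyNames(object);
  for (const name of propNames) {
    const value = object[name];
    if (value && typeof value === "object") {
      deepFreeze(value);
    }
  }
  return Object.freeze(object);
}

const book = deepFreeze({
  "title": "Watchmen",
  "authorIds": ["alan-moore", "dave-gibbons"]
});

// Both the outer object and the nested array are frozen
console.log(Object.isFrozen(book));           // → true
console.log(Object.isFrozen(book.authorIds)); // → true
```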
THEO I see that it’s possible to ensure that data is never mutated, which answers my concerns about safety. Now, let me share my concerns about performance.
►Tip It’s possible to manually ensure that our data isn’t mutated, but it’s cumbersome.
THEO If I understand correctly, the main idea behind structural sharing is that most data is usually shared between two versions.
THEO This insight allows us to create new versions of our collections using a shallow copy instead of a deep copy, and you claimed that it was efficient.
THEO Now, here is my concern. In the case of a collection with many entries, a shallow copy might be expensive.
JOE Could you give me an example of a collection with many entries?
THEO A catalog with 100,000 books, for instance.
JOE On my machine, making a shallow copy of a collection with 100,000 entries doesn’t take more than 50 milliseconds.
THEO Sometimes, even 50 milliseconds per update isn’t acceptable.
JOE I totally agree with you. When one needs data immutability at scale, naive structural sharing is not appropriate.
THEO Also, shallow copying an array of 100,000 elements on each update would consume hundreds of kilobytes of memory.
JOE Indeed, at scale, we have a problem both with memory and computation.
►Tip At scale, naive structural sharing causes a performance hit, both in terms of memory and computation.
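A sketch of what naive structural sharing costs (the sizes and helper names are illustrative):

```javascript
// Naive structural sharing: every update shallow-copies the whole
// top-level array, even though the elements themselves are shared
const v1 = Array.from({ length: 100000 }, (_, i) => i);

function updateAt(arr, idx, value) {
  const copy = arr.slice(); // copies 100,000 references on every update
  copy[idx] = value;
  return copy;
}

const v2 = updateAt(v1, 75100, -1);
console.log(v1[75100]); // → 75100 (the old version is untouched)
console.log(v2[75100]); // → -1
```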
THEO Is there a better solution?
JOE Yes! For that, you’ll need to learn the real way to handle immutability. It’s called persistent data structures.
THEO In what sense are those data structures persistent?
JOE Persistent data structures are so named because they always preserve their previous versions.
►Tip Persistent data structures always preserve the previous version of themselves when they are modified.
JOE Persistent data structures address the two main limitations of naive structural sharing: safety and performance.
THEO Let’s start with safety. How do persistent data structures prevent data from being mutated accidentally?
JOE In a language like Java, they implement the mutation methods of the collection interfaces by throwing the run-time exception UnsupportedOperationException.
THEO And, in a language like JavaScript?
JOE In JavaScript, persistent data structures provide their own methods to access data, and none of those methods mutate data.
THEO Does that mean that we can’t use the dot notation to access fields?
JOE Correct. Fields of persistent data structures are accessed via a specific API.
THEO What about efficiency? How do persistent data structures make it possible to create a new version of a huge collection in an efficient way?
JOE Persistent data structures organize data in such a way that we can use structural sharing at the level of the data structure.
THEO Could you explain that with an example?
JOE Certainly. Let’s start with the simplest data structure: a linked list. Imagine that you have a linked list with 100,000 elements.
JOE What would it take to prepend an element to the head of the list?
THEO You mean to create a new version of the list with an additional element?
JOE Exactly.
THEO Well, we could copy the list and then prepend an element to the list, but it would be quite expensive.
JOE What if I tell you that the original linked list is guaranteed to be immutable?
THEO In that case, I could create a new list with a new head that points to the head of the original list.
Theo goes to the classroom blackboard. He picks up a piece of chalk and draws the diagram shown in figure 9.1.
JOE Would the efficiency of this operation depend on the size of the list?
THEO No, it would be efficient, no matter the size of the list.
JOE That’s what I mean by structural sharing at the level of the data structure itself. It relies on a simple but powerful insight—when data is immutable, it is safe to share it.
►Tip When data is immutable, it is safe to share it.
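Theo’s diagram can be sketched in code. Assuming lists are never mutated, prepending shares the entire original list:

```javascript
// An immutable linked list as nested { head, tail } pairs
function prepend(head, tail) {
  return Object.freeze({ head: head, tail: tail });
}

const original = prepend(2, prepend(3, null)); // the list (2 3)
const extended = prepend(1, original);         // the list (1 2 3)

// The new list reuses the original as its tail: one allocation,
// regardless of the original list's length
console.log(extended.tail === original); // → true
console.log(original.head);              // → 2
```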
THEO I understand how to use structural sharing at the level of the data structure for linked lists and prepend operations, but how would it work with operations like appending or modifying an element in a list?
JOE For that purpose, we need to be smarter and represent our list as a tree.
THEO How does that help?
JOE It helps because when a list is represented as a tree, most of the nodes in the tree can be shared between two versions of the list.
JOE Imagine that you take a list with 100,000 elements and split it into two lists of 50,000 elements each: elements 0 to 49,999 in list 1, and elements 50,000 to 99,999 in list 2. How many operations would you need to create a new version of the list where a single element—let’s say, element at index 75,100—is modified?
It’s hard for Theo to visualize this kind of stuff mentally. He goes back to the blackboard and draws a diagram (see figure 9.2). Once Theo looks at the diagram, it’s easy for him to answer Joe’s question.
THEO List 1 could be shared with one operation. I’d need to create a new version of list 2, where element 75,100 is modified. It would take 50,000 operations, so it’s one operation of sharing and one operation of copying 50,000 elements. Overall, it’s 50,001 operations.
JOE Correct. You see that by splitting our original list into two lists, we can create a new version of the list with a number of operations on the order of the size of the list divided by 2.
THEO I agree, but 50,000 is still a big number.
JOE Indeed, but nobody prevents us from applying the same trick again, splitting list 1 and list 2 in two lists each.
JOE We can make list 1.1 with elements 0 to 24,999, then list 1.2 with elements 25,000 to 49,999, list 2.1 with elements 50,000 to 74,999, and list 2.2 with elements 75,000 to 99,999.
THEO Can you draw that on the blackboard?
Now, it’s Joe that goes to the blackboard. He draws the diagram in figure 9.3.
THEO Let me count the number of operations for updating a single element. It takes 2 operations of sharing and 1 operation of copying 25,000 elements. Overall, it takes 25,002 operations to create a new version of the list.
JOE Correct.
THEO Let’s split the list again then!
JOE Absolutely. In fact, we can split the list again and again until the size of the lists is at most 2. Can you guess what is the complexity of creating a new version then?
THEO I’d say around log2 N operations.
JOE I see that you remember well your material from school. Do you have a gut feeling about what is log2 N when N is 100,000?
THEO Let me see ... 2 to the power of 10 is around 1,000, and 2 to the power of 7 is 128. So, it should be a bit less than 17.
JOE It’s 16.6 to be precise. It means that in order to update an element in a persistent list of 100,000 elements, we need around 17 operations. The same goes for accessing elements.
THEO Nice, but 17 is still not negligible.
JOE I agree. We can easily improve the performance of accessing elements by using a higher branching factor in our tree.
JOE Instead of splitting by 2 at each level, we could split by 32.
THEO But the running time of our algorithm would still grow with log N.
JOE You’re right. From a theoretical perspective, it’s the same. From a practical perspective, however, it makes a big difference.
JOE Because log32 N is 5 times lower than log2 N.
THEO That’s true: 2 to the power of 5 is 32.
JOE Back to our list of 100,000 elements, can you tell me how many operations are required to access an element if the branching factor is 32?
THEO With a branching factor of 2, it was 16.6. If I divide 16.6 by 5, I get 3.3.
►Tip By using a branching factor of 32, we make access to elements in persistent lists more efficient.
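The arithmetic is easy to check:

```javascript
// Tree depth for 100,000 elements at branching factors 2 and 32
const log32 = (n) => Math.log(n) / Math.log(32);

console.log(Math.log2(100000).toFixed(1)); // → "16.6"
console.log(log32(100000).toFixed(1));     // → "3.3"
```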
THEO Does this trick also improve the performance of updating an element in a list?
JOE It does.
THEO How? We’d have to copy 32 elements at each level instead of 2 elements. It’s a 16× performance hit that’s not compensated for by the fact that the tree depth is reduced by 5×!
JOE I see that you are quite sharp with numbers. There is another thing to take into consideration in our practical analysis of the performance: modern CPU architecture.
THEO Interesting. The more you tell me about persistent data structures, the more I understand why you wanted to have this session at a university: it’s because we’re dealing with all this academic stuff.
JOE Yep. So, to continue, modern CPUs read and write data from and to the main memory in units of cache lines, often 32 or 64 bytes long.
THEO What difference does that make?
JOE A nice consequence of this data access pattern is that copying an array of size 32 is much faster than copying 16 arrays of size 2 that belong to different levels of the tree.
JOE The reason is that copying an array of size 32 can be done in a single pair of cache accesses: one for the read and one for the write. For arrays that belong to different tree levels, however, each array requires its own pair of cache accesses, even if there are only 2 elements in the array.
THEO In other words, the performance of updating a persistent list is dominated by the depth of the tree.
►Tip In modern CPU architectures, the performance of updating a persistent list is dominated much more by the depth of the tree than by the number of nodes at each level of the tree.
JOE That’s correct, up to a certain point. With today’s CPUs, using a branching factor of 64 would, in fact, decrease the performance of update operations.
JOE Now, I am going to make another interesting claim that is not accurate from a theoretical perspective but accurate in practice.
JOE The number of operations it takes to get or update an element in a persistent list with branching factor 32 is constant.
THEO How can that be? You just made the point that the number of operations is log32 N.
JOE Be patient, my friend. What is the highest number of elements that you can have in a list, in practice?
THEO I don’t know. I never thought about that.
JOE Let’s assume that it takes 4 bytes to store an element in a list.
JOE Now, can you tell me how much memory it would take to hold a list with 10 billion elements?
THEO You mean 1 with 10 zeros?
THEO Each element takes 4 bytes, so it would be around 40 GB!
JOE Correct. Do you agree that it doesn’t make sense to hold a list that takes 40 GB of memory?
JOE So let’s take 10 billion as an upper bound to the number of elements in a list. What is log32 of 10 billion?
Once again, Theo uses the blackboard to clarify his thoughts. With that, he quickly finds the answer.
THEO 1 billion is approximately 2^30. Therefore, 10 billion is around 2^33. That means that log2 of 10 billion is 33, so log32 of 10 billion should be around 33/5, which is a bit less than 7.
JOE I am impressed again by your sharpness with numbers. To be precise, log32 of 10 billion is 6.64.
THEO (smiling) I didn’t get that far.
JOE Did I convince you that, in practice, accessing or updating an element in a persistent list is essentially constant?
THEO Yes, and I find it quite amazing!
►Tip Persistent lists can be manipulated in near constant time.
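To make the idea concrete, here is a toy model of a persistent vector with branching factor 32. It is only a sketch of the path-copying technique, not the implementation used by Clojure or Immutable.js:

```javascript
// A toy persistent vector: a tree with branching factor 32.
// Updating one element copies only the nodes on the path from the
// root to the leaf (about log32 N nodes); all other nodes are shared.
const BITS = 5;  // 2^5 = 32 children per node
const MASK = 31;

function assocAt(node, index, value, level) {
  const copy = node.slice(); // copy a single node of at most 32 slots
  if (level === 0) {
    copy[index & MASK] = value;
  } else {
    const child = (index >> level) & MASK;
    copy[child] = assocAt(node[child], index, value, level - BITS);
  }
  return copy;
}

// A two-level tree holding the numbers 0..1023
const v1 = Array.from({ length: 32 }, (_, i) =>
  Array.from({ length: 32 }, (_, j) => i * 32 + j));

const v2 = assocAt(v1, 100, -1, BITS);

console.log(v2[3][4]);        // → -1 (index 100 = 3 * 32 + 4)
console.log(v1[3][4]);        // → 100 (the old version is untouched)
console.log(v2[0] === v1[0]); // → true (31 of the 32 leaves are shared)
```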
THEO What about persistent maps?
JOE It’s quite similar, but I don’t think we have time to discuss it now.
Startled, Theo looks at his watch. This morning’s session has gone by so quickly. He notices that it’s time to get back to the office and have lunch.
On their way back to the office, Theo and Joe don’t talk too much. Theo’s thoughts take him back to what he learned in the university classroom. He feels a lot of respect for Phil Bagwell, who discovered how to manipulate persistent data structures efficiently, and for Rich Hickey, who created a programming language incorporating that discovery as a core feature and making it available to the world. Immediately after lunch, Theo asks Joe to show him what it looks like to manipulate persistent data structures for real in a programming language.
THEO Are persistent data structures available in all programming languages?
JOE A few programming languages like Clojure, Scala, and C# provide them as part of the language. In most programming languages, though, you need a third-party library.
THEO Could you give me a few references?
Using Theo’s laptop, Joe bookmarks some sites. He knows exactly which URLs to look for. Then, while Theo is looking over the bookmarked sites, Joe goes to the whiteboard and jots down the specific libraries in table 9.1.
Immutable.js for JavaScript at https://immutable-js.com/
Paguro for Java at https://github.com/GlenKPeterson/Paguro
Immutable Collections for C# at http://mng.bz/QW51
Pyrsistent for Python at https://github.com/tobgu/pyrsistent
Hamster for Ruby at https://github.com/hamstergem/hamster
THEO What does it take to integrate persistent data structures provided by a third-party library into your code?
JOE In an object-oriented language like Java, it’s quite straightforward to integrate persistent data structures into a program because persistent data structures implement the collection interfaces, except for the parts of the interfaces that mutate in place.
JOE Take, for instance, Paguro for Java. Paguro persistent maps implement the read-only methods of java.util.Map like get() and containsKey(), but not methods like put() and remove(). On the other hand, Paguro vectors implement the read-only methods of java.util.List like get() and size(), but not methods like set().
THEO What happens when we call put() or remove() on a Paguro map?
JOE It throws an UnsupportedOperationException.
THEO What about iterating over the elements of a Paguro collection with a forEach()?
JOE That works like it would in any Java collection. Here, let me show you an example.
var myVec = PersistentVector.ofIter(
    List.of(10, 2, 3));  ❶
for (Integer i : myVec) {
    System.out.println(i);
}

❶ Creates a Paguro vector from a Java list
JOE Paguro collections are Java collections, so they also support the Java stream interface; for example, you can call stream() on a Paguro vector and chain the standard stream operations.
►Tip Paguro collections implement the read-only parts of the Java collection interfaces. Therefore, they can be passed to any method that expects to receive a Java collection, without any risk of mutation.
THEO So far, you’ve told me how to use Paguro collections as Java read-only collections. How do I make modifications to Paguro persistent data structures?
JOE In a way similar to the _.set() function of Lodash FP that we talked about earlier. Instead of mutating in place, you create a new version.
THEO What methods does Paguro expose for creating new versions of a data structure?
JOE For vectors, you use replace(), and for maps, you use assoc().
var myMap = PersistentHashMap.of(
    Map.of("aa", 1, "bb", 2)
        .entrySet());  ❶
var myNextMap = myMap.assoc("aa", 42);

❶ Creates a Paguro map from a Java map entry set
THEO Yes! Now I see how to use persistent data structures in Java, but what about JavaScript?
JOE In a language like JavaScript, it’s a bit more cumbersome to integrate persistent data structures.
THEO Why is that?
JOE Because JavaScript objects and arrays don’t expose any interface.
JOE It’s not as terrible as it sounds because Immutable.js exposes its own set of functions to manipulate its data structures.
JOE I’ll show you in a moment. But first, let me show you how to initiate Immutable.js persistent data structures.
JOE Immutable.js provides a handy function that recursively converts a native data object to an immutable one. It’s called Immutable.fromJS().
THEO What do you mean by recursively?
JOE Consider the map that holds library data from our Library Management System: it has values that are themselves maps. Immutable.fromJS() converts the nested maps into immutable maps.
THEO Could you show me some code?
JOE Absolutely. Take a look at this JavaScript code for library data.
var libraryData = Immutable.fromJS({
  "catalog": {
    "booksByIsbn": {
      "978-1779501127": {
        "isbn": "978-1779501127",
        "title": "Watchmen",
        "publicationYear": 1987,
        "authorIds": ["alan-moore", "dave-gibbons"]
      }
    },
    "authorsById": {
      "alan-moore": {
        "name": "Alan Moore",
        "bookIsbns": ["978-1779501127"]
      },
      "dave-gibbons": {
        "name": "Dave Gibbons",
        "bookIsbns": ["978-1779501127"]
      }
    }
  }
});
THEO Do you mean that the catalog value in the libraryData map is itself an immutable map?
JOE Yes, and the same for booksByIsbn, authorIds, and so forth.
THEO Cool! So how do I access a field inside an immutable map?
JOE As I told you, Immutable.js provides its own API for data access. For instance, in order to access a field inside an immutable map, you use Immutable.get() or Immutable.getIn() like the following.
Immutable.get(libraryData, "catalog");
Immutable.getIn(libraryData,
  ["catalog", "booksByIsbn", "978-1779501127", "title"]);
// → "Watchmen"
THEO How do I make a modification to a map?
JOE Similar to what we did with Lodash FP, you use Immutable.set() or Immutable.setIn() to create a new version of the map where a field is modified. Here’s how.
Immutable.setIn(libraryData,
  ["catalog", "booksByIsbn", "978-1779501127",
   "publicationYear"],
  1988);
THEO What happens when I try to access a field in the map using JavaScript’s dot or bracket notation?
JOE You access the internal representation of the map instead of accessing a map field.
THEO Does that mean that we can’t pass data from Immutable.js to Lodash for data manipulation?
JOE Correct, but it’s quite easy to convert any immutable collection to and from a native JavaScript object.
JOE Immutable.js provides a toJS() method that converts an arbitrary, deeply nested immutable collection into a JavaScript object.
THEO But if I have a huge collection, it could take lots of time to convert it, right?
JOE True. We need a better solution. Fortunately, Immutable.js provides its own set of data manipulation functions like map(), filter(), and reduce().
THEO What if I need more data manipulation functions, like Lodash’s _.groupBy()?
JOE You could write your own data manipulation functions that work with the Immutable.js collections or use a library like mudash, which provides a port of Lodash to Immutable.js.
►Note You can access the mudash library at https://github.com/brianneisler/mudash.
JOE Let’s grab a cup of coffee first. Then I’ll show you how to port functions from Lodash to Immutable.js and how to adapt the code from your Library Management System. You can decide which approach works best for your current project.
JOE Let’s start with our search query. Can you look at the current code and tell me the Lodash functions that we used to implement the search query?
THEO Including the code for the unit tests?
JOE Yes, please.
►Note See chapter 6 for the unit test of the search query.
THEO The Lodash functions we used were get, map, filter, and isEqual.
JOE Here’s the port of those four functions from Lodash to Immutable.js.
Immutable.map = function(coll, f) {
  return coll.map(f);
};

Immutable.filter = function(coll, f) {
  if (Immutable.isMap(coll)) {
    return coll.valueSeq().filter(f);
  }
  return coll.filter(f);
};

Immutable.isEqual = Immutable.is;
THEO The code seems quite simple. But can you explain it to me, function by function?
JOE Sure. Let’s start with get. For accessing a field in a map, Immutable.js provides two functions: get for direct fields and getIn for nested fields. It’s different from Lodash, where _.get works both on direct and nested fields.
THEO And what about map?
JOE Immutable.js provides its own map function. The only difference is that it is a method of the collection, but that is something we can easily adapt.
THEO What about filter? How would you make it work both for arrays and maps like Lodash’s filter?
JOE Immutable.js provides a valueSeq method that returns the values of a map.
THEO Cool. And what about isEqual to compare two collections?
JOE That’s easy. Immutable.js provides a function named is that works exactly like isEqual.
THEO So far, so good. What do I need to do now to make the code of the search query work with Immutable.js?
JOE You simply replace each occurrence of _ with Immutable: _.map becomes Immutable.map, _.filter becomes Immutable.filter, and _.isEqual becomes Immutable.isEqual.
THEO I can’t believe it’s so easy!
JOE Try it yourself; you’ll see. Sometimes, it’s a bit more cumbersome because you need to convert the JavaScript objects to Immutable.js objects using Immutable.fromJS.
Theo copies and pastes the snippets for the code and the unit tests of the search query. Then, he uses his IDE to replace the _ with Immutable. When Theo executes the tests and they pass, he is surprised but satisfied. Joe smiles.
class Catalog {
  static authorNames(catalogData, authorIds) {
    return Immutable.map(authorIds, function(authorId) {
      return Immutable.getIn(
        catalogData,
        ["authorsById", authorId, "name"]);
    });
  }

  static bookInfo(catalogData, book) {
    var bookInfo = Immutable.Map({
      "title": Immutable.get(book, "title"),
      "isbn": Immutable.get(book, "isbn"),
      "authorNames": Catalog.authorNames(
        catalogData,
        Immutable.get(book, "authorIds"))
    });
    return bookInfo;
  }

  static searchBooksByTitle(catalogData, query) {
    var allBooks = Immutable.get(catalogData, "booksByIsbn");
    var queryLowerCased = query.toLowerCase();
    var matchingBooks = Immutable.filter(allBooks, function(book) {
      return Immutable.get(book, "title")
        .toLowerCase()
        .includes(queryLowerCased);
    });
    var bookInfos = Immutable.map(matchingBooks, function(book) {
      return Catalog.bookInfo(catalogData, book);
    });
    return bookInfos;
  }
}
var catalogData = Immutable.fromJS({
  "booksByIsbn": {
    "978-1779501127": {
      "isbn": "978-1779501127",
      "title": "Watchmen",
      "publicationYear": 1987,
      "authorIds": ["alan-moore", "dave-gibbons"]
    }
  },
  "authorsById": {
    "alan-moore": {
      "name": "Alan Moore",
      "bookIsbns": ["978-1779501127"]
    },
    "dave-gibbons": {
      "name": "Dave Gibbons",
      "bookIsbns": ["978-1779501127"]
    }
  }
});

var bookInfo = Immutable.fromJS({
  "isbn": "978-1779501127",
  "title": "Watchmen",
  "authorNames": ["Alan Moore", "Dave Gibbons"]
});

Immutable.isEqual(
  Catalog.searchBooksByTitle(catalogData, "Watchmen"),
  Immutable.fromJS([bookInfo]));
// → true

Immutable.isEqual(
  Catalog.searchBooksByTitle(catalogData, "Batman"),
  Immutable.fromJS([]));
// → true
THEO Shall we move forward and port the add member mutation?
JOE Sure. Porting the add member mutation from Lodash to Immutable.js only requires you to again replace the underscore (_) with Immutable. Let’s look at some code.
UserManagement.addMember = function(userManagement, member) {
  var email = Immutable.get(member, "email");
  var infoPath = ["membersByEmail", email];
  if (Immutable.hasIn(userManagement, infoPath)) {
    throw "Member already exists.";
  }
  var nextUserManagement = Immutable.setIn(
    userManagement,
    infoPath,
    member);
  return nextUserManagement;
};
THEO So, for the tests, I’d convert the JavaScript objects to Immutable.js objects with Immutable.fromJS(). How does this look?
var jessie = Immutable.fromJS({
  "email": "[email protected]",
  "password": "my-secret"
});

var franck = Immutable.fromJS({
  "email": "[email protected]",
  "password": "my-top-secret"
});

var userManagementStateBefore = Immutable.fromJS({
  "membersByEmail": {
    "[email protected]": {
      "email": "[email protected]",
      "password": "my-top-secret"
    }
  }
});

var expectedUserManagementStateAfter = Immutable.fromJS({
  "membersByEmail": {
    "[email protected]": {
      "email": "[email protected]",
      "password": "my-secret"
    },
    "[email protected]": {
      "email": "[email protected]",
      "password": "my-top-secret"
    }
  }
});

var result = UserManagement.addMember(
  userManagementStateBefore,
  jessie);

Immutable.isEqual(result, expectedUserManagementStateAfter);
// → true
THEO Does Immutable.js also support JSON serialization and deserialization?
JOE It supports serialization out of the box. As for deserialization, we need to write our own function.
THEO Does Immutable.js provide an Immutable.stringify() function?
JOE That’s not necessary because the native JSON.stringify() function works with Immutable.js objects. Here’s another example.
var bookInfo = Immutable.fromJS({
  "isbn": "978-1779501127",
  "title": "Watchmen",
  "authorNames": ["Alan Moore", "Dave Gibbons"]
});

JSON.stringify(bookInfo);
// → {"isbn":"978-1779501127","title":"Watchmen",
// →  "authorNames":["Alan Moore","Dave Gibbons"]}
THEO How does JSON.stringify() know how to handle an Immutable.js collection?
JOE As an OOP developer, you shouldn’t be surprised by that.
THEO Hmm ... let me think a minute. OK, here’s my guess. Is it because JSON.stringify() calls some method on its argument?
JOE Exactly! If the object passed to JSON.stringify() has a .toJSON() method, JSON.stringify() calls it.
THEO Nice. What about JSON deserialization?
JOE That needs to be done in two steps. You first convert the JSON string to a JavaScript object and then to an immutable collection.
THEO Something like Immutable.fromJS(JSON.parse(jsonString))?
JOE Exactly.
THEO So far, we have ported pieces of code that dealt with simple data manipulations. I’m curious to see how it goes with complex data manipulations such as the code that computes the structural diff between two maps.
►Note Chapter 5 introduces structural diff.
JOE That also works smoothly, but we need to port a few more functions.
Immutable.reduce = function(coll, reducer, initialReduction) {
  return coll.reduce(reducer, initialReduction);
};

Immutable.isEmpty = function(coll) {
  return coll.isEmpty();
};

Immutable.keys = function(coll) {
  return coll.keySeq();
};

Immutable.isObject = function(coll) {
  return Immutable.Map.isMap(coll);
};

Immutable.isArray = Immutable.isIndexed;

Immutable.union = function() {
  return Immutable.Set.union(arguments);
};
THEO Everything looks trivial, with one exception: the use of arguments in Immutable.union.
JOE In JavaScript, arguments is an implicit array-like object that contains the values of the function arguments.
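As a plain JavaScript illustration (independent of Immutable.js):

```javascript
// `arguments` is available inside any non-arrow function and holds
// all the values the function was called with
function collect() {
  return Array.from(arguments);
}

const collected = collect(1, 2, 3);
console.log(collected); // → [ 1, 2, 3 ]
```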
THEO I see. It’s one of those pieces of JavaScript magic!
JOE Yep. We need to use arguments because Lodash and Immutable.js differ slightly in the signature of the union function: Immutable.Set.union receives an array of lists, whereas Lodash’s _.union receives several arrays.
THEO Makes sense. Let me give it a try.
Blowing on his fingers like a seasoned safecracker, first one hand and then the other, Theo begins typing. Once again, Theo is surprised to discover that after replacing the _ with Immutable in listing 9.20, the tests pass with the code in listing 9.21.
function diffObjects(data1, data2) {
  var emptyObject = Immutable.isArray(data1) ?
    Immutable.fromJS([]) :
    Immutable.fromJS({});
  if (data1 == data2) {
    return emptyObject;
  }
  var keys = Immutable.union(Immutable.keys(data1),
                             Immutable.keys(data2));
  return Immutable.reduce(keys, function(acc, k) {
    var res = diff(Immutable.get(data1, k),
                   Immutable.get(data2, k));
    if ((Immutable.isObject(res) && Immutable.isEmpty(res)) ||
        (res == "data-diff:no-diff")) {
      return acc;
    }
    return Immutable.set(acc, k, res);
  }, emptyObject);
}

function diff(data1, data2) {
  if (Immutable.isObject(data1) && Immutable.isObject(data2)) {
    return diffObjects(data1, data2);
  }
  if (data1 !== data2) {
    return data2;
  }
  return "data-diff:no-diff";
}
var data1 = Immutable.fromJS({
  g: { c: 3 },
  x: 2,
  y: { z: 1 },
  w: [5]
});

var data2 = Immutable.fromJS({
  g: { c: 3 },
  x: 2,
  y: { z: 2 },
  w: [4]
});

Immutable.isEqual(diff(data1, data2), Immutable.fromJS({
  "w": [4],
  "y": { "z": 2 }
}));
// → true
JOE What do you think of all this, my friend?
THEO I think that using persistent data collections with a library like Immutable.js is much easier than understanding the internals of persistent data structures. But I’m also glad that I know how it works under the hood.
After accompanying Joe to the office door, Theo meets Dave. Dave had been peering through the window in Theo’s office, looking at the whiteboard, anxious to catch a glimpse of today’s topic on DOP.
DAVE What did Joe teach you today?
THEO He took me to the university and taught me the foundations of persistent data structures for dealing with immutability at scale.
DAVE What’s wrong with the structural sharing that I implemented a couple of months ago?
THEO When the number of elements in the collection is big enough, naive structural sharing has performance issues.
DAVE I see. Could you tell me more about that?
THEO I’d love to, but my brain isn’t functioning properly after this interesting but exhausting day. We’ll do it soon, promise.
DAVE No worries. Have a nice evening, Theo.
It’s possible to manually ensure that our data isn’t mutated, but it’s cumbersome.
At scale, naive structural sharing causes a performance hit, both in terms of memory and computation.
Naive structural sharing doesn’t prevent data structures from being accidentally mutated.
Immutable collections are not the same as persistent data structures.
Immutable collections don’t provide an efficient way to create new versions of the collections.
Persistent data structures provide an efficient way to create new versions of the collections.
Persistent data structures always preserve the previous version of themselves when they are modified.
Persistent data structures represent data internally in such a way that structural sharing scales well, both in terms of memory and computation.
In practice, manipulation of persistent data structures is efficient even for collections with 10 billion entries!
Due to modern architecture considerations, the performance of updating a persistent list is dominated much more by the depth of the tree than by the number of nodes at each level of the tree.
In most languages, third-party libraries provide an implementation of persistent data structures.
Paguro collections implement the read-only parts of Java collection interfaces.
Paguro collections can be passed to any methods that expect to receive a Java collection without mutating them.
1 P. Bagwell, “Ideal hash trees” (No. REP_WORK), 2001. [Online]. Available: https://lampwww.epfl.ch/papers/idealhashtrees.pdf.