7 Staying immutable with untrusted code


In this chapter

  • Make defensive copies to protect your code from legacy code and other code you don’t trust.
  • Compare deep copies to shallow copies.
  • Choose when to use defensive copies versus copy-on-write.

We’ve learned how to maintain immutability in our own code using copy-on-write. But we often have to interact with code that doesn’t follow the copy-on-write discipline. Some libraries and existing code treat data as mutable. How can we pass our immutable data to them? In this chapter, we learn a practice for maintaining immutability when interacting with code that might change our data.

Immutability with legacy code

It’s time again for MegaMart’s monthly Black Friday sale (yes, they do one every month). The marketing department wants to promote old inventory to clear it out of the warehouse. The code they have to do that is old and has been added to over time. It works and is crucial for keeping the business profitable.

Vocab time

In this book, we’ll use the term legacy code to mean existing code (perhaps written with older practices) that we can’t replace at the moment. We have to work with it as is.


All of the code we’ve been managing for the shopping cart has treated the cart as immutable using a copy-on-write discipline. However, the Black Friday promotion code does not. It mutates the shopping cart quite a lot. It was written years ago, it works reliably, and there’s just no time to go back and rewrite it all. We need a way to safely interface with this existing code.

To trigger the Black Friday promotion, we’ll need to add this line of code to add_item_to_cart().

function add_item_to_cart(name, price) {
  var item = make_cart_item(name, price);
  shopping_cart = add_item(shopping_cart, item);
  var total = calc_total(shopping_cart);
  set_cart_total_dom(total);
  update_shipping_icons(shopping_cart);
  update_tax_dom(total);
  // we need to add this line of code, but it will mutate the shopping cart
  black_friday_promotion(shopping_cart);
}

Calling this function will violate copy-on-write, and we can’t modify black_friday_promotion(). Luckily, there is another discipline that will let us call the function safely without violating copy-on-write. The discipline is called defensive copying. We use it to exchange data with code that mutates data.

Our copy-on-write code has to interact with untrusted code

The marketing team’s Black Friday sale code is untrusted. We don’t trust it because it doesn’t implement the copy-on-write immutability discipline that our code follows.

Our code forms a safe zone where we trust all of the functions to maintain immutability. We can mentally relax while we’re using code inside that circle.

The Black Friday code is outside of that safe zone, but our code still needs to run it. And in order to run it, we need to exchange data with it through its inputs and outputs.

Just to be extra clear: Any data that leaves the safe zone is potentially mutable. It could be modified by the untrusted code. Likewise, any data that enters the safe zone from untrusted code is potentially mutable. The untrusted code could keep references to it and modify it after sending it over. The challenge is to exchange data without breaking our immutability.
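To make the danger concrete, here is a small sketch. The functions untrustedTagItems() and untrustedLaterUpdate() are hypothetical stand-ins for code outside the safe zone: the first mutates the array it receives and also keeps a reference to it, and the second uses that reference to change the data after the call has returned. Because no copy is made, both changes land in data the trusted code believes is immutable.

```javascript
// Hypothetical untrusted code: it mutates its argument and keeps a reference.
let retained = null;
function untrustedTagItems(cart) {
  cart.push({ name: "free gift", price: 0 }); // mutation during the call
  retained = cart;                            // reference kept for later
}
function untrustedLaterUpdate() {
  retained[0].price = 0; // mutation long after the call returned
}

// Trusted code that assumes the cart is immutable.
const shopping_cart = [{ name: "shoes", price: 30 }];
untrustedTagItems(shopping_cart); // the original reference leaves the safe zone
untrustedLaterUpdate();

console.log(shopping_cart.length);   // 2: an item was added behind our back
console.log(shopping_cart[0].price); // 0: changed after the call returned
```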


We’ve seen the copy-on-write pattern, but it won’t quite help us here. In copy-on-write, we copy data before modifying it. We know exactly what modifications will happen, so we can reason about what needs to be copied. In this case, however, the Black Friday routine is so big and hairy that we don’t know exactly what it will do. We need a discipline with more protective power, one that completely shields our data from modification. That discipline is called defensive copying. Let’s see how it works.

Defensive copying defends the immutable original

The solution to the problem of exchanging data with untrusted code is to make copies—two, in fact. Here’s how it works.

O is for original

C is for copy

As data enters the safe zone from the untrusted code, we can’t trust that the data is immutable. We immediately make a deep copy and throw away the mutable original. Since only trusted code has a reference to that copy, it’s immutable. That protects you as data enters.


You still need protection when data leaves. As we’ve said before, any data that leaves the safe zone should be considered mutable because the untrusted code can modify it. The solution is to make a deep copy and send the copy to the untrusted code. That protects you as data leaves.


That’s defensive copying in a nutshell. You make copies as data enters; you make copies as data leaves. The goal is to keep your immutable originals inside the safe zone and to not let any mutable data inside the safe zone. Let’s apply this discipline to Black Friday.

Implementing defensive copies

We need to call a function that mutates its argument, but we don’t want to break our hard-won immutable discipline. We can use defensive copies to protect data and maintain immutability. It’s called defensive because you are defending your original from modifications.

black_friday_promotion() modifies its argument, the shopping cart. We can deep copy the shopping cart and pass the copy to the function. That way, it won’t modify the original.

Original

function add_item_to_cart(name, price) {
  var item = make_cart_item(name, price);
  shopping_cart = add_item(shopping_cart, item);
  var total = calc_total(shopping_cart);
  set_cart_total_dom(total);
  update_shipping_icons(shopping_cart);
  update_tax_dom(total);
  black_friday_promotion(shopping_cart);
}

Copy before sharing data

function add_item_to_cart(name, price) {
  var item = make_cart_item(name, price);
  shopping_cart = add_item(shopping_cart, item);
  var total = calc_total(shopping_cart);
  set_cart_total_dom(total);
  update_shipping_icons(shopping_cart);
  update_tax_dom(total);
  var cart_copy = deepCopy(shopping_cart); // copy data as it leaves
  black_friday_promotion(cart_copy);
}

That’s great, except we need the output from black_friday_promotion(). Its output is the modifications it does to the shopping cart. Luckily, it has modified cart_copy. But can we use cart_copy safely? Is it immutable? What if black_friday_promotion() keeps a reference to that shopping cart and modifies it later? These are the kinds of bugs you find weeks, months, or years later. The solution is to make another defensive copy as the data enters our code.

Copy before sharing data

function add_item_to_cart(name, price) {
  var item = make_cart_item(name, price);
  shopping_cart = add_item(shopping_cart, item);
  var total = calc_total(shopping_cart);
  set_cart_total_dom(total);
  update_shipping_icons(shopping_cart);
  update_tax_dom(total);
  var cart_copy = deepCopy(shopping_cart);
  black_friday_promotion(cart_copy);
}

Copy before and after sharing data

function add_item_to_cart(name, price) {
  var item = make_cart_item(name, price);
  shopping_cart = add_item(shopping_cart, item);
  var total = calc_total(shopping_cart);
  set_cart_total_dom(total);
  update_shipping_icons(shopping_cart);
  update_tax_dom(total);
  var cart_copy = deepCopy(shopping_cart);
  black_friday_promotion(cart_copy);
  shopping_cart = deepCopy(cart_copy); // copy data as it enters
}

And that’s the defensive copy pattern. As we’ve seen, you protect yourself by making copies. You copy data as it leaves your system, and you copy it as it comes back in.

The copies we make need to be deep copies. We’ll see how to implement that in just a moment.

The rules of defensive copying

Defensive copying is a discipline that maintains immutability when you have to exchange data with code that does not maintain immutability. We’ll call that code untrusted, or code you don’t trust. Here are the two rules:

Rule 1: Copy as data leaves your code

If you have immutable data that will leave your code and enter code that you don’t trust, follow these steps to protect your original:

  1. Make a deep copy of the immutable data.
  2. Pass the copy to the untrusted code.

Vocab time

Deep copies duplicate all levels of nested data structures, from the top all the way to the bottom.

Rule 2: Copy as data enters your code

If you are receiving data from untrusted code, that data might not be immutable. Follow these steps:

  1. Immediately make a deep copy of the mutable data passed to your code.
  2. Use the copy in your code.

If you follow these two rules, you can interact with any code you don’t trust without breaking your immutable discipline.
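Because both rules are mechanical, they can be packaged once. The sketch below is one possible approach, not something the chapter prescribes: a hypothetical helper withDefensiveCopies() takes an untrusted function and returns a version that deep copies every argument on the way out (Rule 1) and deep copies the return value on the way in (Rule 2). The untrusted legacyDiscount() is also hypothetical, and deepCopy() is a compact version of the simple teaching implementation shown later in this chapter, repeated so the block runs on its own.

```javascript
// Compact version of the chapter's simple deepCopy() (teaching only).
function deepCopy(thing) {
  if (Array.isArray(thing)) {
    const copy = [];
    for (const element of thing) copy.push(deepCopy(element));
    return copy;
  } else if (thing === null) {
    return null;
  } else if (typeof thing === "object") {
    const copy = {};
    for (const key of Object.keys(thing)) copy[key] = deepCopy(thing[key]);
    return copy;
  }
  return thing; // strings, numbers, booleans, and functions are returned as is
}

// Hypothetical helper: wraps an untrusted function with both rules.
function withDefensiveCopies(untrustedFn) {
  return function (...args) {
    const argsCopy = args.map((arg) => deepCopy(arg)); // Rule 1: copy as data leaves
    const result = untrustedFn(...argsCopy);
    return deepCopy(result);                           // Rule 2: copy as data enters
  };
}

// Hypothetical untrusted function that mutates its input and returns it.
function legacyDiscount(cart) {
  for (const item of cart) item.price = item.price / 2;
  return cart;
}

const safeDiscount = withDefensiveCopies(legacyDiscount);
const cart = [{ name: "shoes", price: 30 }];
const discounted = safeDiscount(cart);
console.log(cart[0].price);       // 30: the original is untouched
console.log(discounted[0].price); // 15: the result is a fresh copy
```

One design caveat: this generic form only captures output that the untrusted function returns. black_friday_promotion() reports its work by mutating its argument instead, so its wrapper has to copy the mutated argument back in.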

Note that these rules could be applied in any order. Sometimes you send data out, and then data comes back. That’s what happens when your code calls a function from an untrusted library.

On the other hand, sometimes you receive data before you send data out. That happens when untrusted code calls a function in your code, like if your code is part of a shared library. Just keep in mind that the two rules can be applied in either order.
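When untrusted code calls into your code, the same rules apply in the opposite order: Rule 2 first, then Rule 1. Here is a minimal sketch, assuming a hypothetical function addItemForCaller() that lives in your safe zone and keeps a hypothetical lastCartSeen reference. After the call, the caller mutates everything it can reach, and none of it leaks into the safe zone. deepCopy() is the same compact teaching version, repeated so the block runs on its own.

```javascript
// Compact version of the chapter's simple deepCopy() (teaching only).
function deepCopy(thing) {
  if (Array.isArray(thing)) {
    const copy = [];
    for (const element of thing) copy.push(deepCopy(element));
    return copy;
  } else if (thing === null) {
    return null;
  } else if (typeof thing === "object") {
    const copy = {};
    for (const key of Object.keys(thing)) copy[key] = deepCopy(thing[key]);
    return copy;
  }
  return thing;
}

// Hypothetical state kept inside the safe zone.
let lastCartSeen = null;

// Hypothetical function in your safe zone that untrusted code calls.
function addItemForCaller(cartFromCaller, item) {
  const cart = deepCopy(cartFromCaller);   // Rule 2: copy as data enters
  const itemCopy = deepCopy(item);         // Rule 2 again, for the second input
  const newCart = cart.concat([itemCopy]); // copy-on-write inside the safe zone
  lastCartSeen = newCart;                  // the safe zone keeps this reference
  return deepCopy(newCart);                // Rule 1: copy as data leaves
}

const callerCart = [{ name: "socks", price: 5 }];
const callerItem = { name: "hat", price: 12 };
const result = addItemForCaller(callerCart, callerItem);

// The untrusted caller mutates everything it can reach.
callerCart[0].price = 0;
callerItem.price = 0;
result[0].price = 1;

console.log(lastCartSeen[0].price); // 5
console.log(lastCartSeen[1].price); // 12
```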

Also note that sometimes there is no input or output to copy.

We are going to implement defensive copying a few more times. But before we move on to another implementation, let’s keep working on the code we just saw for the Black Friday promotion. We can improve it by wrapping it up.

Wrapping untrusted code

We have successfully implemented defensive copying, but the code is a bit unclear with all of the copying going on. Plus, we’re going to have to call black_friday_promotion() many times in the future. We don’t want to risk getting the defensive copying wrong. Let’s wrap up the function in a new function that includes the defensive copying inside it.


Original

function add_item_to_cart(name, price) {
  var item = make_cart_item(name, price);
  shopping_cart = add_item(shopping_cart, item);
  var total = calc_total(shopping_cart);
  set_cart_total_dom(total);
  update_shipping_icons(shopping_cart);
  update_tax_dom(total);
  // extract this code into a new function
  var cart_copy = deepCopy(shopping_cart);
  black_friday_promotion(cart_copy);
  shopping_cart = deepCopy(cart_copy);
}

Extracted safe version

function add_item_to_cart(name, price) {
  var item = make_cart_item(name, price);
  shopping_cart = add_item(shopping_cart, item);
  var total = calc_total(shopping_cart);
  set_cart_total_dom(total);
  update_shipping_icons(shopping_cart);
  update_tax_dom(total);
  shopping_cart = black_friday_promotion_safe(shopping_cart);
}

function black_friday_promotion_safe(cart) {
  var cart_copy = deepCopy(cart);
  black_friday_promotion(cart_copy);
  return deepCopy(cart_copy);
}

Now we can call black_friday_promotion_safe() without worry. It protects our data from modification. It’s also more convenient to call, and it’s clearer what’s going on.
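To see what both copies buy us, here is a runnable sketch. The real black_friday_promotion() isn’t shown in the chapter, so the version below is a hypothetical stub that behaves the way we fear: it mutates the cart it receives and keeps a reference, which the hypothetical later_mutation() uses to change it long after the call. The wrapper is the one above; deepCopy() is the compact teaching version.

```javascript
// Compact version of the chapter's simple deepCopy() (teaching only).
function deepCopy(thing) {
  if (Array.isArray(thing)) {
    const copy = [];
    for (const element of thing) copy.push(deepCopy(element));
    return copy;
  } else if (thing === null) {
    return null;
  } else if (typeof thing === "object") {
    const copy = {};
    for (const key of Object.keys(thing)) copy[key] = deepCopy(thing[key]);
    return copy;
  }
  return thing;
}

// Hypothetical stand-in for the legacy promotion code.
let promo_reference = null;
function black_friday_promotion(cart) {
  cart.push({ name: "clearance mug", price: 1 }); // mutates its argument
  promo_reference = cart;                        // and keeps a reference
}
function later_mutation() {
  promo_reference[0].price = 999; // hypothetical change made much later
}

function black_friday_promotion_safe(cart) {
  var cart_copy = deepCopy(cart);   // copy as data leaves
  black_friday_promotion(cart_copy);
  return deepCopy(cart_copy);       // copy as data enters
}

var shopping_cart = [{ name: "shoes", price: 30 }];
var original = shopping_cart;
shopping_cart = black_friday_promotion_safe(shopping_cart);
later_mutation();

console.log(original.length);        // 1: the original was never touched
console.log(shopping_cart.length);   // 2: we still get the promotion's output
console.log(shopping_cart[0].price); // 30: the later mutation doesn't reach us
```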

Let’s look at another example.

It’s your turn

MegaMart uses a third-party library for calculating payroll. You pass the function payrollCalc() an array of employee records, and it returns an array of payroll checks. The code is definitely untrusted. It will probably modify the employee array, and who knows what it does with the payroll checks.

Your job is to wrap it up in a function that makes it safe using defensive copies. Here’s the signature of payrollCalc():

// make a defensive copy version of this
function payrollCalc(employees) {
  // ...
  return payrollChecks;
}

Write the wrapper called payrollCalcSafe().

function payrollCalcSafe(employees) {
  // write your code here
}

Answer

function payrollCalcSafe(employees) {
  var copy = deepCopy(employees);
  var payrollChecks = payrollCalc(copy);
  return deepCopy(payrollChecks);
}

It’s your turn

MegaMart has another legacy system that serves up data about users of the software. You subscribe to updates to users as they change their settings.

But here’s the thing: All the parts of the code that subscribe get the same exact user data. All of the references are to the same objects in memory. Obviously, the user data is coming from untrusted code. Your task is to protect the safe zone with defensive copying. Note that there is no data going back out to the unsafe zone—there’s only mutable user data coming in.

You call it like this:

userChanges.subscribe(function(user) { // pass in a callback function
  // the function will be called with user data whenever the user information changes;
  // all callbacks will be passed a reference to the same mutable object

  // implement your defensive copying here

  processUser(user); // imagine this is an epic function in your safe zone. protect it!
});

Rules of defensive copying

  1. Copy as data leaves trusted code.
  2. Copy as data enters trusted code.

Answer

userChanges.subscribe(function(user) {
  var userCopy = deepCopy(user);
  processUser(userCopy);
  // no need to copy again because there is no data leaving the safe zone
});
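Here is a sketch of why the copy matters. makeUserChanges() is a hypothetical, minimal stand-in for the legacy system (the chapter doesn’t show its implementation): it hands every subscriber the same mutable object. A careless subscriber mutates that object, but the safe-zone subscriber, which copies first, keeps its own version intact. The array seenBySafeZone is a hypothetical stand-in for processUser().

```javascript
// Compact version of the chapter's simple deepCopy() (teaching only).
function deepCopy(thing) {
  if (Array.isArray(thing)) {
    const copy = [];
    for (const element of thing) copy.push(deepCopy(element));
    return copy;
  } else if (thing === null) {
    return null;
  } else if (typeof thing === "object") {
    const copy = {};
    for (const key of Object.keys(thing)) copy[key] = deepCopy(thing[key]);
    return copy;
  }
  return thing;
}

// Hypothetical stand-in for the legacy userChanges system.
function makeUserChanges() {
  const callbacks = [];
  return {
    subscribe(callback) { callbacks.push(callback); },
    publish(user) {
      for (const callback of callbacks) callback(user); // same reference for everyone
    },
  };
}

const userChanges = makeUserChanges();
const seenBySafeZone = [];

// Safe-zone subscriber: copies as data enters.
userChanges.subscribe(function (user) {
  const userCopy = deepCopy(user);
  seenBySafeZone.push(userCopy); // stands in for processUser(userCopy)
});

// Another, careless subscriber that mutates the shared object.
userChanges.subscribe(function (user) {
  user.settings.theme = "dark";
});

const user = { name: "Ana", settings: { theme: "light" } };
userChanges.publish(user);

console.log(user.settings.theme);              // "dark": the shared object changed
console.log(seenBySafeZone[0].settings.theme); // "light": our copy did not
```

Note that the copy protects against changes made after it is taken; it can’t undo a change that happened before the data reached our callback.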

Defensive copying you may be familiar with

Defensive copying is a common pattern that you might find outside of the traditional places. You may have to squint to see it, though.

Defensive copying in web application programming interfaces (APIs)

Most web-based APIs are doing implicit defensive copying. Here’s a scenario of how that might look.

A web request comes into your API as JSON. The JSON is a deep copy of data from the client that is serialized over the internet. Your service does its work, then sends the response back as a serialized deep copy, also in JSON. It’s copying data on the way in and on the way back.
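The same implicit copy can be reproduced inside a single process by serializing and parsing, which is roughly what happens when data crosses an HTTP boundary as JSON. This is a sketch of the idea, not a recommended general deepCopy(): JSON only carries JSON-legal values, so properties holding functions are dropped and Date objects arrive as strings. The helper name viaJson() is hypothetical.

```javascript
// Hypothetical helper: round-trip through JSON, as a request or response would be.
function viaJson(data) {
  return JSON.parse(JSON.stringify(data));
}

const request = {
  user: { name: "Ana", tags: ["vip"] },
  placedAt: new Date(0),
  format: function () { return "x"; },
};
const received = viaJson(request);

received.user.tags.push("new");           // changing what the service received...
console.log(request.user.tags.length);    // 1: ...does not affect the sender
console.log(typeof received.placedAt);    // "string": Dates become ISO strings
console.log("format" in received);        // false: function-valued properties are dropped
```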

It’s doing defensive copying. One of the benefits of a service-oriented or microservices system is that the services are doing defensive copying when they talk to each other. Services with different coding practices and disciplines can communicate without problems.

Vocab time

When modules use defensive copying to talk to each other, the design is often called a shared nothing architecture because the modules don’t share references to any data. You don’t want your copy-on-write code to share references with untrusted code.

Defensive copying in Erlang and Elixir

Erlang and Elixir (two functional programming languages) implement defensive copying as well. Whenever two processes in Erlang send messages to each other, the message (data) is copied into the mailbox of the receiver. Data is copied on the way into a process and on the way out. The defensive copying is key to the high reliability of Erlang systems.

For more information on Erlang and Elixir, see https://www.erlang.org and https://elixir-lang.org.

We can tap into the same benefits that microservices and Erlang use in our own modules.

Brain break

There’s more to go, but let’s take a break for questions.

Q: Wait! Is it really okay to have two copies of the user data at the same time? Which is the real one that represents the user?

A: That’s a great question, and it’s one of the conceptual changes that people go through when learning functional programming. Many people are used to having a user object that represents a user of their software, and it’s confusing to have two copies of the same object. Which one represents the user?

In functional programming, we don’t represent the user. We record and process data about the user. Remember the definition of data: facts about events. We record facts, like the name of the user, about events, like them submitting a form. We can copy those facts as many times as we want.

Q: Copy-on-write and defensive copying seem very similar. Are they really different? Do we need both?

A: Copy-on-write and defensive copying are both used to enforce immutability, so it seems like we should only need one. The fact is that you could get away with only doing defensive copying, even inside the safe zone. That would enforce immutability just fine.

However, defensive copying makes deep copies. Deep copies are much more expensive than shallow copies because they copy the entire nested data structure from top to bottom. We don’t need to make so many copies when we trust the code we’re passing data to. So in order to save the processing and memory of all of those copies, we use copy-on-write when we can, which is everywhere inside the safe zone. The two disciplines work together.

It’s important to compare the two approaches so that we can have a better understanding of when to use each. Let’s do that now.

Copy-on-write and defensive copying compared

When to use it

  • Copy-on-write: Use copy-on-write when you need to modify data you control.
  • Defensive copying: Use defensive copying when exchanging data with untrusted code.

Where to use it

  • Copy-on-write: Use copy-on-write everywhere inside the safe zone. In fact, copy-on-write defines your immutability safe zone.
  • Defensive copying: Use defensive copying at the borders of your safe zone for data that has to cross in or out.

Type of copy

  • Copy-on-write: Shallow copy (relatively cheap)
  • Defensive copying: Deep copy (relatively expensive)

The rules

  • Copy-on-write:
      1. Make a shallow copy of the thing to modify.
      2. Modify the copy.
      3. Return the copy.
  • Defensive copying:
      1. Make a deep copy of data as it enters the safe zone.
      2. Make a deep copy of data as it leaves the safe zone.

Deep copies are more expensive than shallow copies

The difference between a deep copy and a shallow copy is that a deep copy shares no structure with the original. Every nested object and array is copied. In a shallow copy, we can share a lot of the structure—anything that doesn’t change can be shared.

Shallow copy


In a deep copy, we make copies of everything. We use a deep copy because we don’t trust that any of it will be treated as immutable by the untrusted code.

Deep copy


Deep copies are obviously more expensive. That’s why we don’t do them everywhere. We only do them where we can’t guarantee that copy-on-write will be followed.
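The difference is easy to observe by checking object identity. In the sketch below, the shallow copy is made with slice(), the way copy-on-write code commonly copies an array (the chapter shows this only as a diagram, so the exact style is an assumption), and the deep copy uses a compact version of the simple deepCopy() from this chapter, repeated so the block runs on its own.

```javascript
// Compact version of the chapter's simple deepCopy() (teaching only).
function deepCopy(thing) {
  if (Array.isArray(thing)) {
    const copy = [];
    for (const element of thing) copy.push(deepCopy(element));
    return copy;
  } else if (thing === null) {
    return null;
  } else if (typeof thing === "object") {
    const copy = {};
    for (const key of Object.keys(thing)) copy[key] = deepCopy(thing[key]);
    return copy;
  }
  return thing;
}

const cart = [
  { name: "shoes", price: 30 },
  { name: "socks", price: 5 },
];

const shallow = cart.slice(); // new top-level array, items shared with the original
const deep = deepCopy(cart);  // new array and new item objects all the way down

console.log(shallow === cart);       // false: the arrays differ
console.log(shallow[0] === cart[0]); // true: the items are shared
console.log(deep[0] === cart[0]);    // false: nothing is shared

// Copy-on-write replaces only what changes, so the rest stays shared.
const updated = cart.slice();
updated[1] = { ...updated[1], price: 4 };
console.log(updated[0] === cart[0]); // true: unchanged item still shared
console.log(cart[1].price);          // 5: the original is untouched
```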

Implementing deep copy in JavaScript is difficult

Deep copy is a simple idea that should have a simple implementation. However, in JavaScript it is quite hard to get right because there isn’t a good standard library. Implementing a robust one is beyond the scope of this book.

I recommend using the implementation from the Lodash library (see lodash.com). Specifically, the function _.cloneDeep() (see lodash.com/docs/#cloneDeep) does a deep copy of nested data structures. The library is trusted by thousands if not millions of JavaScript developers.

However, just for completeness, here is a simple implementation that may satisfy your curiosity. It should work for all JSON-legal types and functions.

function deepCopy(thing) {
  if (Array.isArray(thing)) {
    // recursively make copies of all of the elements
    var copy = [];
    for (var i = 0; i < thing.length; i++)
      copy.push(deepCopy(thing[i]));
    return copy;
  } else if (thing === null) {
    return null;
  } else if (typeof thing === "object") {
    // recursively make copies of all of the values
    var copy = {};
    var keys = Object.keys(thing);
    for (var i = 0; i < keys.length; i++) {
      var key = keys[i];
      copy[key] = deepCopy(thing[key]);
    }
    return copy;
  } else {
    // strings, numbers, booleans, and functions are immutable so they don’t need to be copied
    return thing;
  }
}

This function will not hold up to the quirks of JavaScript. There are many more types out there that this will fail on. However, as an outline of what needs to be done, it does a decent job. It shows that arrays and objects need to be copied, but also that the function will recurse into all of the elements of those collections.
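Here is one concrete way the simple version falls short. It treats every non-array object as a plain object and copies only its own enumerable keys, so a Date or a Map comes back as an empty plain object. The function below behaves the same as the one above, restyled compactly so the example runs on its own.

```javascript
// Same behavior as the simple deepCopy() above, written compactly.
function deepCopy(thing) {
  if (Array.isArray(thing)) {
    const copy = [];
    for (const element of thing) copy.push(deepCopy(element));
    return copy;
  } else if (thing === null) {
    return null;
  } else if (typeof thing === "object") {
    const copy = {};
    for (const key of Object.keys(thing)) copy[key] = deepCopy(thing[key]);
    return copy;
  }
  return thing;
}

const copiedDate = deepCopy(new Date(0));
const copiedMap = deepCopy(new Map([["VIP", 0.1]]));

console.log(copiedDate instanceof Date); // false: the Date became a plain {}
console.log(copiedMap instanceof Map);   // false: the Map became a plain {}
console.log(copiedMap.size);             // undefined: its entries were lost
```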

I highly recommend using a robust deep copy implementation from a widely used JavaScript library like Lodash. This deep copy function is just for teaching purposes and will not work in production.

It’s your turn

The following statements are about the two types of copying we have seen, shallow and deep. Some statements are true for deep copying and some are true for shallow copies. And some are true for both! Write DC by the ones that apply to deep copying and SC for those that apply to shallow copying.

  1. It copies every level of a nested structure.
  2. It is much more efficient than the other because two copies can share structure.
  3. It copies only the parts that change.
  4. Because copies don’t share structure, it is good for protecting the original from untrusted code.
  5. It is useful for implementing a shared nothing architecture.

Key

  • DC: deep copying
  • SC: shallow copying

Answer

1. DC; 2. SC; 3. SC; 4. DC; 5. DC.

A dialogue between copy-on-write and defensive copying

The topic: Which discipline is more important?

Copy-on-write: Obviously, I’m more important. I help people keep their data immutable.

Defensive copying: That doesn’t make you more important. I help keep data immutable, too.

Copy-on-write: Well, my shallow copies are way more efficient than your deep copies.

Defensive copying: Well, you only have to worry about that because you need to make a copy EVERY SINGLE TIME data is modified. I only need to make copies when data enters or leaves the safe zone.

Copy-on-write: Exactly my point! There wouldn’t even be a safe zone without me.

Defensive copying: Well, I suppose you’re right about that. But your safe zone wouldn’t be any use at all if it couldn’t pass data to the outside. That’s where all the existing code and libraries are.

Copy-on-write: Well, I really think they should be using me in those legacy codebases and libraries, too. They could learn a lot from a discipline like me. Convert their writes to reads, and the reads naturally become calculations.

Defensive copying: Listen, that is never going to happen. Just accept it. There’s too much code out there. There aren’t enough programmers in the whole world to ever rewrite it all.

Copy-on-write: You’re right! (sobbing) I should face facts. I’m worthless without you!

Defensive copying: Oh, now I’m getting all emotional, too. (tears running down face) I can’t live without you, either!

Copy-on-write: (hugs)

Defensive copying: (hugs)

Oh, brother! Moving on…

It’s your turn

The following statements are about immutability disciplines. Some are true for defensive copying and some are true for copy-on-write. And some are true for both! Write DC by the ones that apply to defensive copying and CW for those that apply to copy-on-write.

  1. It makes deep copies.
  2. It is cheaper than the other.
  3. It is an important way to maintain immutability.
  4. It copies data before modifying the copy.
  5. It is used inside the safe zone to maintain immutability.
  6. You use it when you want to exchange data with untrusted code.
  7. It is a complete immutability solution. It can be used without the other.
  8. It uses shallow copies.
  9. It copies data before sending it to untrusted code.
  10. It copies data it receives from untrusted code.

Key

  • DC: defensive copying
  • CW: copy-on-write

Answer

1. DC; 2. CW; 3. DC and CW; 4. CW; 5. CW; 6. DC; 7. DC; 8. CW; 9. DC; 10. DC.

It’s your turn

Your team has just started using a copy-on-write discipline to create a safe zone. Every time your team writes new code, they make sure to keep it immutable. A new task requires you to write code that interacts with existing code that does not keep the immutability discipline. Which of the following courses of action would ensure you maintain immutability? Check all the statements that apply. Justify your responses.

  1. Use defensive copying when exchanging data with the existing code.
  2. Use a copy-on-write discipline when exchanging data with the existing code.
  3. Read the source code of the existing code to see if it modifies the data. If it doesn’t, we don’t need to use any special discipline.
  4. Rewrite the preexisting code to use copy-on-write and call the rewritten code without defensive copying.
  5. The code belongs to your team, so it’s already part of the safe zone.

Answer

  1. Yes. Defensive copying will protect your safe zone at the cost of memory and work to make the copies.
  2. No. Copy-on-write only works when you are calling other functions that implement copy-on-write. If you’re not sure, your existing code probably does not implement it.
  3. Maybe. If you analyze the source code, you might discover that it doesn’t modify the data you pass it. However, also be on the lookout for other things it might do, like pass the data to another part of the code.
  4. Yes. If you can afford it, a rewrite using copy-on-write would solve the problem.
  5. No. Just because you own it does not mean it enforces the immutability discipline of your team.

Conclusion

In this chapter we learned a more powerful yet more expensive discipline for maintaining immutability called defensive copying. It’s more powerful because it can implement immutability all by itself. It’s more expensive because it needs to copy more data. However, when you use defensive copying in tandem with copy-on-write, you can get all of the benefits of both—power when you need it, but shallow copies for efficiency.

Summary

  • Defensive copying is a discipline for implementing immutability. It makes copies as data leaves or enters your code.
  • Defensive copying makes deep copies, so it is more expensive than copy-on-write.
  • Unlike copy-on-write, defensive copying can protect your data from code that does not implement an immutability discipline.
  • We often prefer copy-on-write because it does not require as many copies and we use defensive copying only when we need to interact with untrusted code.
  • Deep copies copy an entire nested structure from top to bottom. Shallow copies only copy the bare minimum.

Up next…

In the next chapter, we will pull together everything we’ve learned so far and discover a way to organize our code to improve the design of our system.
