Greek philosopher Heraclitus said that we cannot step into the same river twice; the river constantly changes, so the river that was there a moment ago is no longer. Many programmers would disagree, objecting that it’s the same river but its state has changed. Functional programmers try to stay true to Heraclitus’s thinking and would create a new river with every observation.
Most programs are built to represent things and processes in the real world, and because the world constantly changes, programs must somehow represent that change. The question is how we represent change. Commercial applications written in the imperative style have state mutation at their core: objects represent entities in the business domain, and change in the world is modeled by mutating the state of these objects.
We’ll start by looking at the weaknesses we introduce in our programs when we use mutation. We’ll then see how we can avoid these problems at the source by representing change without using mutation and, more pragmatically, how to enforce immutability in C#. Finally, because much of our programs’ data is stored in data structures, we’ll introduce the concepts and techniques behind functional data structures, which are also immutable.
State mutation is when memory is updated in place, and an important problem with it is that concurrent access to a shared mutable state is unsafe. You’ve already seen examples demonstrating loss of information due to concurrent updates in chapters 1 and 3; let’s now look at a more object-oriented scenario. Imagine a Product class with an Inventory field, representing the number of units in stock:
    public class Product
    {
        public int Inventory { get; private set; }

        public void ReplenishInventory(int units) => Inventory += units;
        public void ProcessSale(int units) => Inventory -= units;
    }
If Inventory is mutable as this example shows, and you have concurrent threads updating its value, that can lead to race conditions, and the results can be unpredictable. Imagine that you have a thread replenishing the inventory, while another thread concurrently processes a sale, diminishing the inventory, as figure 11.1 shows. If both threads read the value at the same time, and the thread with the sale has the last update, you’ll end up with an overall decrease in inventory.
Not only has the update to replenish the inventory been lost, but the first thread now potentially faces a completely invalid state: a product that’s just been replenished has zero inventory.
If you’ve done some basic multithreading, you’re probably thinking, “Easy! You just need to wrap the updates to Inventory in a critical section using the lock statement.” It turns out that this solution, which works for this simple case, can become the source of some difficult bugs as the complexity of the system increases. (A sale affects not only the inventory, but the sales order, the company balance sheet, and so on.)
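For reference, the lock-based approach mentioned above could be sketched as follows. The sync field name and the test loop are assumptions made for illustration; the point is only that every read and write of the counter goes through the same lock.

```csharp
using System;
using System.Threading.Tasks;

var product = new Product();

// Run many replenishments and sales in parallel; the lock serializes each update.
Parallel.For(0, 1000, i =>
{
    product.ReplenishInventory(2);
    product.ProcessSale(1);
});

Console.WriteLine(product.Inventory); // prints 1000

public class Product
{
    private readonly object sync = new(); // hypothetical name for the lock object
    private int inventory;

    public int Inventory { get { lock (sync) return inventory; } }

    public void ReplenishInventory(int units) { lock (sync) inventory += units; }
    public void ProcessSale(int units) { lock (sync) inventory -= units; }
}
```

This protects a single field. It does nothing to coordinate an update that spans several objects, which is where the harder bugs tend to appear.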
If things can fail when a single variable is set, imagine when an update to an entity involves updating several fields. For example, imagine that when you update the inventory, you also set a flag indicating whether the product is low on inventory as the following listing shows.
    class Product
    {
        int inventory;

        public bool IsLowOnInventory { get; private set; }

        public int Inventory
        {
            get => inventory;
            private set
            {
                inventory = value;                   // ❶
                IsLowOnInventory = inventory <= 5;
            }
        }
    }
❶ At this point, the object can be in an invalid state from the perspective of any thread reading its properties.
This code defines an invariant: when inventory is 5 or less, then IsLowOnInventory must be true.
In a single-threaded setting, there aren’t any problems with the preceding code. But in a multithreaded setting, a thread could be reading the state of this object just as another thread is performing the update in the window during which Inventory has been updated but IsLowOnInventory hasn’t. (Notice that this window widens if the logic to compute IsLowOnInventory becomes more expensive.) During that window, the invariant can be broken, so the object would appear to be in an invalid state to the first thread. This will, of course, happen very rarely, and it will be nearly impossible to reproduce. This is part of the reason why bugs caused by race conditions are so hard to diagnose.
Indeed, race conditions are known to have caused some of the most spectacular failures in the software industry. If you have a system with concurrency and state mutation, it’s impossible to prove that the system is free of race conditions.1 In other words, if you want concurrency (and, given today’s tendency toward multicore processors and distributed computing, you hardly have a choice) and strong guarantees of correctness, you simply must give up mutation.
Lack of safe concurrent access may be the biggest pitfall of a shared mutable state, but it’s not the only one. Another problem is the risk of introducing coupling, that is, a high degree of interdependence between different parts of your system. In figure 11.1, Inventory is encapsulated, meaning it can only be set from within the class, and according to OOP theory, that’s supposed to give you a sense of comfort. But how many methods in the Product class can set the inventory value? How many code paths lead into these methods so that they ultimately affect the value of Inventory? How many parts of the application can get the same instance of the Product and rely on the value of Inventory, and how many will be affected if you introduce a new component that causes Inventory to change?
For a non-trivial application, it’s difficult to answer these questions completely. This is why inventory, even though it’s a private field and can be set only via a private setter, qualifies as a global mutable state; as far as we can tell, it could be mutated by any part of the program via public methods in the enclosing class. As a result, mutable state couples the behavior of the various components that read or update that state, making it difficult to reason about the behavior of the system as a whole.
Finally, shared mutable state implies loss of purity. As explained in chapter 3, mutating global state (remember, that’s all state that’s not local to a function, including private variables) constitutes a side effect. If you represent change in the world by mutating objects in your system, you lose the benefits of function purity. For these reasons, the functional paradigm discourages state mutation altogether.
NOTE In this chapter, you’ll learn how to work with immutable data objects. That’s an important technique, but keep in mind that it’s not always sufficient to represent entities that change with time. Immutable data objects can represent the state of an entity at any given point in time, somewhat like a frame in a film, but to represent the entity itself, to get the full moving picture, you need a further abstraction that links those successive states together. We’ll discuss techniques for accomplishing that in chapters 13, 15, 18, and 19.
Let’s look more closely at change and mutation.2 By change, I mean change in the real world, such as when 50 units of stock become available for sale. Mutation means data is updated in place; as you saw in the Product class, when the Inventory value is updated, the previous value for Inventory is lost.
In FP, we represent change without mutation: values aren’t updated in place. Instead, we create new instances that represent the data with the desired changes, as figure 11.2 shows. The fact that the current level of inventory is 53 doesn’t obliterate the fact that it was previously 3.
In FP, we work with immutable values: once a value is initialized, it’s never updated.
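As a minimal sketch of the idea, assume a hypothetical ProductState record and a Replenish function (neither name comes from the text). Replenishing produces a new value and leaves the earlier one readable.

```csharp
using System;

var before = new ProductState(Inventory: 3);
var after = before.Replenish(50); // a new instance; 'before' is untouched

Console.WriteLine(before.Inventory); // prints 3
Console.WriteLine(after.Inventory);  // prints 53

// Hypothetical immutable snapshot of a product's stock level.
public record ProductState(int Inventory);

public static class ProductExt
{
    // Represents the change by returning a modified copy instead of updating in place.
    public static ProductState Replenish(this ProductState p, int units)
        => p with { Inventory = p.Inventory + units };
}
```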
To refine or redefine your intuition about change and mutation, it’s useful to distinguish between things that change and things that don’t.
There are some things that we think of as inherently immutable. For example, your age may change from 30 to 31, but the number 30 is still the number 30, and 31 is still 31.
This is modeled in the Base Class Library (BCL) in that all primitive types are immutable. What about more complex types? Dates are a good example. The third of March is still the third of March, even though you may change an appointment in your calendar from the third of March to the fourth. This is also reflected in the BCL in that types that are used to represent dates such as DateTime are immutable.3 See this for yourself by typing the following in the REPL (use DateTime instead of DateOnly if you don’t have .NET 6):
    var momsBirthday = new DateOnly(1966, 12, 13);
    var johnsBirthday = momsBirthday;            // ❶

    // some time goes by...

    johnsBirthday = johnsBirthday.AddDays(1);    // ❷

    johnsBirthday // => 14/12/1966
    momsBirthday  // => 13/12/1966               ❸
❶ John has the same birthday as Mom.
❷ You realize that John’s birthday is actually one day later.
❸ Mom’s birthday was not affected.
In the preceding example, we start by saying that Mom and John have the same birthday, so we assign the same value to momsBirthday and johnsBirthday. When we then use AddDays to create a later date and assign it to johnsBirthday, this leaves momsBirthday unaffected. In this example, we are doubly protected from mutating the date:
Because System.DateOnly is a struct, it’s copied upon assignment, so momsBirthday and johnsBirthday are different instances.
Even if DateOnly were a class, so that momsBirthday and johnsBirthday pointed to the same instance, the behavior would still be the same because AddDays creates a new instance, leaving the underlying instance unaffected.
If, on the other hand, DateOnly were a mutable class and AddDays mutated the days of its instance, the value of momsBirthday would be updated as a result (or, rather, as a side effect) of updating johnsBirthday. (Imagine explaining to Mom that that’s the reason for your belated birthday wishes.)
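To make that hypothetical concrete, here is a sketch with an invented MutableDate class (it is not a BCL type, and its AddDays is deliberately naive). Because both variables refer to one instance, mutating through one is visible through the other.

```csharp
using System;

var momsBirthday = new MutableDate(1966, 12, 13);
var johnsBirthday = momsBirthday; // both variables refer to the same instance

johnsBirthday.AddDays(1);         // mutates the shared instance in place

Console.WriteLine(momsBirthday.Day); // prints 14

// Hypothetical mutable date type, for illustration only.
public class MutableDate
{
    public int Year { get; set; }
    public int Month { get; set; }
    public int Day { get; set; }

    public MutableDate(int year, int month, int day)
        => (Year, Month, Day) = (year, month, day);

    // Naive on purpose: ignores month and year rollover.
    public void AddDays(int days) => Day += days;
}
```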
Now let’s define a custom immutable type. Say we represent a Circle like so:
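A definition consistent with the description that follows might look like this sketch. The Point type and the order of the parameters are assumptions; only the Circle name, its Center and Radius members, and the readonly struct declaration come from the surrounding text.

```csharp
using System;

var circle = new Circle(new Point(0, 0), Radius: 1);
Console.WriteLine(circle.Radius); // prints 1

// circle.Radius = 2; // would not compile: the property has no ordinary setter

// Hypothetical point type used for the center.
public readonly record struct Point(double X, double Y);

// A readonly record struct (C# 10 or later): its state cannot change after creation.
public readonly record struct Circle(Point Center, double Radius);
```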
You would probably agree that it makes no sense that a circle should ever grow or shrink because it’s a completely abstract geometric entity. The preceding implementation reflects this by declaring the struct as readonly, which makes it immutable. This means that it will not be possible to update the values for Radius and Center; once created, the state of the circle can never change.4
If you have a circle and you’d like a circle double the size, you can define functions to create a new circle based on an existing one. Here’s an example:
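One way such a function could look is sketched below. The Scale name, the Point type, and the choice to interpret "double the size" as doubling the radius are all assumptions.

```csharp
using System;

var circle = new Circle(new Point(0, 0), Radius: 1);
var doubled = circle.Scale(2); // a new circle; the original is unchanged

Console.WriteLine(circle.Radius);  // prints 1
Console.WriteLine(doubled.Radius); // prints 2

public readonly record struct Point(double X, double Y);
public readonly record struct Circle(Point Center, double Radius);

public static class CircleExt
{
    // Hypothetical helper: builds a new circle from an existing one.
    public static Circle Scale(this Circle circle, double factor)
        => circle with { Radius = circle.Radius * factor };
}
```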
OK, so far we haven’t used mutation, and these examples are pretty intuitive. What do numbers, dates, and geometric entities have in common? Their value captures their identity: they are value objects. If you change the value of a date . . . well, it identifies a different date! The problems begin when we consider objects whose value and identity are different things. We’ll look at this next.
Many real-world entities change with time: your bank account, your calendar, your contacts list—all these things have a state that changes with time. Figure 11.3 illustrates this idea.
For such entities, their identity isn’t captured by their value because their identity remains constant, whereas their value changes with time. Instead, their identity is associated with different states at different points in time. Your age may change, or your salary, but your identity doesn’t. To represent such entities, programs must model not only an entity’s state (that’s the easy part), but the transitions from one state to another and often the association of an identity with the entity’s current state.
We’ve discussed some reasons why mutation provides an imperfect mechanism for managing state transitions. In FP, states are not mutated; they’re snapshots that, like the frames of a film, represent an evolving reality but are in themselves static.
To illustrate immutable data objects in C#, let’s start working on AccountState, which we’ll use to represent the state of a bank account in the BOC application. The following listing shows our model.
    public enum AccountStatus
    { Requested, Active, Frozen, Dormant, Closed }

    public record AccountState
    (
        CurrencyCode Currency,
        AccountStatus Status = AccountStatus.Requested,
        decimal AllowedOverdraft = 0m,
        IEnumerable<Transaction> TransactionHistory = null
    );

    public record Transaction
    (
        decimal Amount,
        string Description,
        DateTime Date
    );
For brevity, I’ve omitted the definition of CurrencyCode, which simply wraps a string value such as EUR or USD similarly to the ConnectionString and SqlTemplate types we saw in section 9.4.1.
Because AccountState has several fields and not all may be meaningful all the time, I have provided some reasonable default values for all fields except the currency. To create an AccountState, all you really need is its currency:
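A sketch of that call follows. The CurrencyCode shown here is a hypothetical stand-in (a record with an implicit conversion from string), since its real definition is omitted from the text.

```csharp
using System;
using System.Collections.Generic;

var newAccount = new AccountState(Currency: "EUR");

Console.WriteLine(newAccount.Status);           // prints Requested
Console.WriteLine(newAccount.AllowedOverdraft); // prints 0

// Hypothetical stand-in for the CurrencyCode wrapper type.
public record CurrencyCode(string Value)
{
    public static implicit operator CurrencyCode(string s) => new(s);
}

public enum AccountStatus { Requested, Active, Frozen, Dormant, Closed }

public record Transaction(decimal Amount, string Description, DateTime Date);

public record AccountState
(
    CurrencyCode Currency,
    AccountStatus Status = AccountStatus.Requested,
    decimal AllowedOverdraft = 0m,
    IEnumerable<Transaction> TransactionHistory = null
);
```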
This creates an AccountState with a default status of Requested. When you’re ready to activate the account, you can do this by using a with expression:
    public static AccountState Activate(this AccountState original)
        => original with { Status = AccountStatus.Active };
This creates a new instance of AccountState, populated with all the values from the original except for Status, which is set to the new value. The original object is still intact:
    var original = new AccountState(Currency: "EUR");
    var activated = original.Activate();

    original.Status    // Requested
    original.Currency  // "EUR"

    activated.Status   // Active
    activated.Currency // "EUR"
Notice that you can use with expressions that set more than one property:
    public static AccountState RedFlag(this AccountState original)
        => original with
        {
            Status = AccountStatus.Frozen,
            AllowedOverdraft = 0m
        };
Next, let’s see how we can further improve this model.
Have another look at the proposed definition of AccountState (replicated in the following snippet) and see if you can spot any potential problems with it:
    public record AccountState
    (
        CurrencyCode Currency,
        AccountStatus Status = AccountStatus.Requested,
        decimal AllowedOverdraft = 0m,
        IEnumerable<Transaction> TransactionHistory = null
    );
There are in fact a couple of issues here. One thing that immediately stands out is the default value of null for the list of transactions. The reason for providing a default value is that when a new account is created, it will have no previous transactions, so it makes sense to have this as an optional parameter. But we also don’t want null to potentially cause a NullReferenceException. Secondly, this record definition allows you to create an account by changing the currency of an existing account, like so:
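The following sketch shows the kind of expression in question. AccountState is trimmed to two fields to keep it short, and CurrencyCode is again a hypothetical stand-in with an implicit conversion from string.

```csharp
using System;

var eurAccount = new AccountState(Currency: "EUR");

// Compiles, because the generated Currency property is init-only,
// even though a EUR account should never turn into a USD account.
var usdAccount = eurAccount with { Currency = "USD" };

Console.WriteLine(eurAccount.Currency.Value); // prints EUR
Console.WriteLine(usdAccount.Currency.Value); // prints USD

// Hypothetical stand-in for the CurrencyCode wrapper type.
public record CurrencyCode(string Value)
{
    public static implicit operator CurrencyCode(string s) => new(s);
}

public enum AccountStatus { Requested, Active, Frozen, Dormant, Closed }

// Trimmed to two fields for the purposes of this sketch.
public record AccountState
(
    CurrencyCode Currency,
    AccountStatus Status = AccountStatus.Requested
);
```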
This makes no sense. Although the status of an account may go from, say, Requested to Active, once an account is opened with a given currency, that should never change. We’d like our model to represent this. Let’s see how we can address both issues, starting with the latter.
Read-only vs. init-only properties
When you use positional records, the compiler creates an init-only auto property for each parameter you declare. This is a property with a get and an init method; the latter is a setter that can only be called when the record instance is initialized. If we were to explicitly declare the Currency property as a public init-only auto property, just as the compiler would generate, it would look like this:
    public record AccountState
    (
        CurrencyCode Currency,
        AccountStatus Status = AccountStatus.Requested,
        decimal AllowedOverdraft = 0m,
        IEnumerable<Transaction> TransactionHistory = null
    )
    {
        public CurrencyCode Currency { get; init; } = Currency;
    }
The following listing breaks this down so that you can see what every bit means.
    public record AccountState(CurrencyCode Currency /*...*/)
    {
        public CurrencyCode Currency   // ❶
        {
            get;                       // ❷
            init;                      // ❸
        }
        =                              // ❹
        Currency;                      // ❺
    }

❶ Currency here refers to the name of the property.
❷ Gets the value of the property
❸ Allows the value to be set only upon record initialization
❹ Introduces the property initializer
❺ Currency here refers to the constructor parameter; this means that upon initialization the Currency property is set to the value provided for the Currency constructor parameter.
When you use a with expression to create a modified version of a record, the runtime creates a clone of the original and then calls the init method of any properties for which you’ve provided new values. Now, writing the property explicitly allows us to override the compiler’s defaults; in this case, we want to define the Currency property as a read-only auto property by removing the init method:
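A sketch of that change is shown below, with AccountState trimmed to two fields and a hypothetical CurrencyCode stand-in. Status can still be changed through a with expression, while Currency can only be set through the constructor.

```csharp
using System;

var account = new AccountState(Currency: "EUR");
var activated = account with { Status = AccountStatus.Active }; // still allowed

Console.WriteLine(activated.Status);         // prints Active
Console.WriteLine(activated.Currency.Value); // prints EUR

// var usd = account with { Currency = "USD" }; // no longer compiles: Currency has no init accessor

// Hypothetical stand-in for the CurrencyCode wrapper type.
public record CurrencyCode(string Value)
{
    public static implicit operator CurrencyCode(string s) => new(s);
}

public enum AccountStatus { Requested, Active, Frozen, Dormant, Closed }

// Trimmed to two fields for the purposes of this sketch.
public record AccountState
(
    CurrencyCode Currency,
    AccountStatus Status = AccountStatus.Requested
)
{
    // Read-only auto property: a getter only, initialized from the constructor parameter.
    public CurrencyCode Currency { get; } = Currency;
}
```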
Then a with expression attempting to create a modified version of an account with a different currency will not compile because there’s no init method for setting the Currency of the copy.
Immutable objects never change, so all properties of an immutable object must be either read-only or init-only:
Use init-only properties if it makes sense to create a copy where a property is given an updated value.
As you’ve seen, the compiler-generated properties of positional records are init-only, so you need to explicitly declare them if you want them to be read-only.
Initializing an optional list to be empty
Now let’s go back to the problem of TransactionHistory, which is initialized to be null when no value is passed to the constructor for AccountState. What we really want is to have an empty list as the default value, so ideally we’d like to write
    public record AccountState
    (
        // ...
        IEnumerable<Transaction> TransactionHistory
            = Enumerable.Empty<Transaction>()
    );
But this doesn’t compile because default values for optional arguments must be compile-time constants. The most concise solution is to explicitly define the TransactionHistory property and use a property initializer, as the following listing shows.
    public record AccountState
    (
        CurrencyCode Currency,
        AccountStatus Status = AccountStatus.Requested,
        decimal AllowedOverdraft = 0m,
        IEnumerable<Transaction> TransactionHistory = null
    )
    {
        public IEnumerable<Transaction> TransactionHistory { get; init; }
            = TransactionHistory                  // ❶
            ?? Enumerable.Empty<Transaction>();   // ❷
    }
❶ Refers to the constructor parameter
❷ Uses an empty list if the constructor was given null
While default values for method arguments must be compile-time constants, property initializers don’t have this constraint. Therefore, we can include some logic in the property initializer. The previous code replaces the auto-generated property for TransactionHistory with an explicit declaration; it’s essentially saying, “When a new AccountState is created, use the value given for the optional TransactionHistory constructor parameter to populate the TransactionHistory property, but use an empty list if it’s null.”
There are other possible approaches: you could explicitly define a constructor and have this logic in the constructor, or define a full property with a backing field and have this logic in the property’s init method.
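The first of these alternatives could be sketched as follows. This is one possible shape under stated assumptions: a non-positional record trimmed to two properties, with lowercase parameter names and a hypothetical CurrencyCode stand-in.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var account = new AccountState("EUR");
Console.WriteLine(account.TransactionHistory.Any()); // prints False

// Hypothetical stand-in for the CurrencyCode wrapper type.
public record CurrencyCode(string Value)
{
    public static implicit operator CurrencyCode(string s) => new(s);
}

public record Transaction(decimal Amount, string Description, DateTime Date);

// Non-positional record: properties and constructor are written by hand.
public record AccountState
{
    public CurrencyCode Currency { get; }
    public IEnumerable<Transaction> TransactionHistory { get; init; }

    public AccountState
    (
        CurrencyCode currency,
        IEnumerable<Transaction> transactionHistory = null
    )
    {
        Currency = currency;
        // The null check lives in the constructor instead of a property initializer.
        TransactionHistory = transactionHistory ?? Enumerable.Empty<Transaction>();
    }
}
```

The trade-off is verbosity: you give up the positional syntax and write each property yourself.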
There is one more tweak. For an object to be immutable, all its members must be immutable. If you look at the definition for AccountState, there’s a catch. TransactionHistory is defined as an IEnumerable<Transaction>, and while Transaction is immutable, there are many mutable lists that implement IEnumerable. For example, consider the following code:
    var mutableList = new List<Transaction>();

    var account = new AccountState
    (
        Currency: "EUR",
        TransactionHistory: mutableList
    );

    account.TransactionHistory.Count() // => 0

    mutableList.Add(new(-1000, "Create trouble", DateTime.Now));

    account.TransactionHistory.Count() // => 1
This code creates an AccountState with a mutable list; the calling code then still holds a reference to that list, so the list can still be mutated. As a result, we cannot say that our definition of AccountState is truly immutable.
There are two possible solutions. You could change the type definition, declaring TransactionHistory to be an ImmutableList rather than an IEnumerable. Alternatively, you could rewrite the property as the following listing shows.
    using System.Collections.Immutable;

    public record AccountState // ...
    {
        public CurrencyCode Currency { get; } = Currency;

        public IEnumerable<Transaction> TransactionHistory { get; init; }
            = ImmutableList.CreateRange
                (TransactionHistory ?? Enumerable.Empty<Transaction>());
    }
This code creates an ImmutableList from the given IEnumerable, thus making AccountState truly immutable.
TIP If given an ImmutableList, CreateRange will just return it so that you don’t incur any overhead by using this approach. Otherwise, it will create a defensive copy, ensuring that any subsequent mutation to the given list does not affect AccountState.
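Both behaviors described in the tip can be observed with a short experiment. This is a sketch using lists of integers; the reference-equality result reflects the behavior the tip describes rather than a documented guarantee.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.Immutable;

// Given an ImmutableList, CreateRange hands back the same instance.
var immutable = ImmutableList.Create(1, 2, 3);
var fromImmutable = ImmutableList.CreateRange(immutable);
Console.WriteLine(ReferenceEquals(immutable, fromImmutable)); // prints True

// Given a mutable list, CreateRange copies the elements.
var mutable = new List<int> { 1, 2, 3 };
var copy = ImmutableList.CreateRange(mutable);
mutable.Add(4); // later mutation of the source does not reach the copy

Console.WriteLine(copy.Count); // prints 3
```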
If an account has an immutable list of transactions, how do you add a transaction to the list? You don’t. You create a new list that has the new transaction as well as all existing ones, and that will be part of a new AccountState. The following listing shows that adding a child to an immutable object involves the creation of a new parent object.
    using LaYumba.Functional;   // ❶

    public static AccountState Add
        (this AccountState account, Transaction trans)
        => account with
        {
            TransactionHistory
                = account.TransactionHistory.Prepend(trans)   // ❷
        };

❶ Includes Prepend as an extension method on IEnumerable
❷ A new IEnumerable, including existing values and the one being added
Notice that in this particular case, we’re prepending the transaction to the list. This is domain-specific; in most cases, you’re interested in the latest transactions, so it’s efficient to keep the latest ones at the front of the list.
Copying a list every time a single element is added or removed may sound terribly inefficient, but this isn’t necessarily the case. We’ll discuss why in chapter 12.
One of the ways in which FP reduces coupling in your applications, therefore making them simpler and easier to maintain, is that it naturally leads to a separation between data and logic. This is the approach we’ve been following in the preceding section:
AccountState, which we defined in listing 11.2, only contains data.
Business logic, such as activating an account or adding a transaction, is modeled through functions.
We can group all these functions into a static Account class, including logic for creating new and updated versions of AccountState, as the following listing demonstrates.
    public static class Account
    {
        public static AccountState Create(CurrencyCode ccy)
            => new(ccy);

        public static AccountState Activate(this AccountState account)
            => account with { Status = AccountStatus.Active };

        public static AccountState Add
            (this AccountState account, Transaction trans)
            => account with
            {
                TransactionHistory
                    = account.TransactionHistory.Prepend(trans)
            };
    }
Account is a static class for representing changes to an account, including a factory function. While AccountState represents the state of the account at a given time, the functions in Account represent state transitions. This is illustrated in figure 11.4.
When we write logic at a high level, we only rely on Account: for example,
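A sketch of such high-level code might read as follows. The transaction values are invented, CurrencyCode is a hypothetical stand-in, and Prepend here is the one from System.Linq rather than the LaYumba.Functional extension used in the listings.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Each call returns a new AccountState; no instance is ever modified.
var account = Account.Create("USD")
    .Activate()
    .Add(new Transaction(100m, "Opening deposit", new DateTime(2021, 1, 1)));

Console.WriteLine(account.Status);                     // prints Active
Console.WriteLine(account.TransactionHistory.Count()); // prints 1

// Hypothetical stand-in for the CurrencyCode wrapper type.
public record CurrencyCode(string Value)
{
    public static implicit operator CurrencyCode(string s) => new(s);
}

public enum AccountStatus { Requested, Active, Frozen, Dormant, Closed }

public record Transaction(decimal Amount, string Description, DateTime Date);

public record AccountState
(
    CurrencyCode Currency,
    AccountStatus Status = AccountStatus.Requested,
    decimal AllowedOverdraft = 0m,
    IEnumerable<Transaction> TransactionHistory = null
)
{
    public CurrencyCode Currency { get; } = Currency;

    public IEnumerable<Transaction> TransactionHistory { get; init; }
        = TransactionHistory ?? Enumerable.Empty<Transaction>();
}

public static class Account
{
    public static AccountState Create(CurrencyCode ccy) => new(ccy);

    public static AccountState Activate(this AccountState account)
        => account with { Status = AccountStatus.Active };

    // Uses System.Linq's Prepend (an assumption made for this sketch).
    public static AccountState Add(this AccountState account, Transaction trans)
        => account with
        { TransactionHistory = account.TransactionHistory.Prepend(trans) };
}
```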
This means that FP allows you to treat representing state and representing state transitions as separate concerns. Also, business logic is higher-level compared to the data (Account depends on the lower-level AccountState).
Account is a class because C# syntax requires it (with the exception of top-level statements, you cannot declare methods or delegates outside of a class), but conceptually, it’s just a grouping of related functions. This can be referred to as a module. These functions don’t rely on any state in the enclosing class, so you can think of them as free-standing functions and of the class name as part of the namespace.
This separation between data (which is inert) and functions (which perform data transformations) is typical of FP. This is in stark contrast with OOP, where objects include both data and methods that mutate that data.
Separating data from logic results in simpler systems with less coupling that are, therefore, easier to understand and to maintain. It is also a logical choice when programming with distributed systems, where data structures need to be easy to serialize and pass between applications, while logic resides within those applications.
FP discourages state mutation, which avoids several of its drawbacks, such as lack of thread safety, coupling, and impurity.
For a type to be immutable, all its children, including lists and other data structures, must also be immutable.
You can simplify your application and promote loose coupling by separating data from logic.
1 The preceding examples refer to multithreading, but the same problems can arise if the source of concurrency is asynchrony or parallelism (these terms were described in the sidebar on the “Meaning and types of concurrency” in chapter 3).
2 The fundamental techniques I discuss in this section are ubiquitous in FP, but the concepts and metaphors I use to explain them are largely inspired by Rich Hickey, the creator of the Clojure programming language.
3 The creators of .NET took inspiration from Java, but in this case, they also learned from Java’s mistakes (Java had mutable dates until Java 8).
4 In reality, you can still mutate read-only variables by using reflection. But making a field read-only is a clear signal to any clients of your code that the field isn’t meant to be mutated.