5 Modeling the possible absence of data

This chapter covers

  • Using Option to represent the possible absence of data
  • Understanding why null is a terrible idea
  • Whether you should use C# 8 nullable reference types

In chapter 4, I introduced you to the idea that types should precisely represent the data they encapsulate in order to write expressive function signatures. One particularly thorny issue is that of representing data that may not be available. For instance, when you register on a website, you typically have to provide your email address, but other details like your age and gender are optional. The website owner may want to process and analyze this data if it’s available.

“Wait a minute,” you’re probably thinking, “don’t we use null for this?” I’ll discuss null in section 5.5, but for the first part of this chapter, you could just pretend that null doesn’t exist and that we have to come up with a way to represent the possible absence of data.

When coding functionally, you never use null, ever. Instead, FP uses the Option type to represent optionality. I hope to show you that Option provides a much more robust and expressive representation. If you’ve never heard of Option before, I ask you to suspend judgment, as the added value of Option may not be clear until you see it used in the next couple of chapters.

5.1 The bad APIs you use every day

The problem of representing the possible absence of data isn’t handled gracefully in .NET libraries. Imagine you go for a job interview and are given the following quiz:

Question: What does this program print?

using System;
using System.Collections.Generic;
using System.Collections.Specialized;
using static System.Console;
 
class IndexerIdiosyncracy
{
   public static void Main()
   {
      try
      {
         var empty = new NameValueCollection();
         var green = empty["green"];             
         WriteLine("green!");
 
         var alsoEmpty = new Dictionary<string, string>();
         var blue = alsoEmpty["blue"];           
         WriteLine("blue!");
      }
      catch (Exception ex)
      {
         WriteLine(ex.GetType().Name);
      }
   }
}

Take a moment to read through the code. Note that NameValueCollection is simply a map from string to string.1 Then, write down what you think the program prints (make sure nobody’s looking). Now, how much would you be willing to bet that you got the right answer? If you’re like me and have a nagging feeling that as a programmer you should really be concerned with other things than these annoying details, the rest of this section will help you see why the problem lies with the APIs themselves and not with your lack of knowledge.

The code uses indexers to retrieve items from two empty collections, so both operations fail. Indexers are, of course, just normal functions (the [] syntax is just sugar), so both indexers are functions of type string → string and both are dishonest. Why do I say dishonest?

The NameValueCollection indexer returns null if a key isn’t present. It’s somewhat open to debate whether null is actually a string, but I tend to say no.2 You give the indexer a perfectly valid input string, and it returns the useless null value—not what the signature claims.

The Dictionary indexer throws a KeyNotFoundException, so it’s a function that says, “Give me a string, and I’ll return you a string,” when it should actually say, “Give me a string, and I may return you a string, or I may throw an exception instead.”

To add insult to injury, the two indexers are dishonest in inconsistent ways. Now that you know this, it’s easy to see that the program prints:

green!
KeyNotFoundException

The interface exposed by two different associative collections in .NET is inconsistent. Who’d have thought? And the only way to find out is by looking at the documentation (boring) or stumbling on a bug (worse). Let’s look at the functional approach to representing the possible absence of data.

5.2 An introduction to the Option type

Option is essentially a container that wraps a value ... or no value. It’s like a box that may contain a thing, or it could be empty. The symbolic definition for Option is as follows:

Option<T> = None | Some(T)

Let’s see what that means. T is a type parameter (the type of the inner value), so an Option<int> may contain an int. The | sign means or, so the definition says that an Option<T> can be one of two things:

  • None—A special value indicating the absence of a value. If the Option has no inner value, we say that the Option is None.

  • Some(T)—A container that wraps a value of type T. If the Option has an inner value, we say that the Option is Some.

(In case you’re wondering, in Option<T>, I use angle brackets to indicate that T is a type parameter; in Some(T), I use parentheses to indicate that Some is a function that takes a T and returns an Option<T>, wrapping the given value.)

In terms of sets, Option<T> is the union of the set Some(T) with the singleton set None (see figure 5.1). Option is a good example of a sum type, which we discussed in section 4.2.4.

Figure 5.1 Option<T> is the union of the set Some(T) with the singleton set None.

If bool has two possible values, then Some<bool> also has two possible values, but Option<bool> has three possible values because it also includes None. Similarly, Option<DayOfWeek> has eight possible values, and so on.

We’ll look at implementing Option in the next subsection, but first, let’s take a look at its basic usage so you’re familiar with the API. I recommend you follow along in the REPL, but you’ll need a bit of setup, and that’s described in the following sidebar.

Using the LaYumba.Functional library in the REPL

I developed my own functional library, LaYumba.Functional, to support the teaching of many of the techniques in this book. It would be useful for you to play with the constructs included in LaYumba.Functional in the REPL. This requires you to import it in the REPL:

  1. If you haven’t done so already, download and compile the code samples from https://github.com/la-yumba/functional-csharp-code-2.

  2. Reference the LaYumba.Functional library in your REPL. Just how this works depends on your setup. On my system (using the C# Interactive window in Visual Studio with the code samples solution open), I can do so by typing the following:

    #r "functional-csharp-code-2\LaYumba.Functional\bin\Debug\net6.0\LaYumba.Functional.dll"

  3. Type the following imports into the REPL:

    using LaYumba.Functional;
    using static LaYumba.Functional.F;

Once you’re set up, you can create some Options:

Option<string> _ = None;               // Creates a None

Option<string> john = Some("John");    // Creates a Some

That was easy! Now that you know how to create Options, how can you interact with them? At the most basic level, you can do so with Match, a method that performs pattern matching. Simply put, it allows you to run different code depending on whether the Option is None or Some.

For example, if you have an optional name, you can write a function that returns a greeting for that name or a general-purpose message if no name is given. Type the following into the REPL:

string Greet(Option<string> greetee)
   => greetee.Match(
      // If greetee is None, Match evaluates this function.
      None: () => "Sorry, who?",
      // If greetee is Some, Match evaluates this function,
      // passing it greetee's inner value.
      Some: (name) => $"Hello, {name}");

Greet(Some("John")) // => "Hello, John"

Greet(None) // => "Sorry, who?"

As you can see, Match takes two functions: the first one says what to do in the None case; the second, what to do in the Some case. In the Some case, the function is given the inner value of the Option.

In the preceding call to Match, the named arguments None: and Some: are used for extra clarity. It’s possible to omit those:

string greet(Option<string> greetee)
   => greetee.Match
   (
      () => "Sorry, who?",
      (name) => $"Hello, {name}"
   );

In general, I omit them because the empty parens () in the first lambda already suggest an empty container (that is, an Option in the None state), whereas the parens with an argument inside, (name), suggest a container with a value inside. (The parens are optional in the Some case, as with any unary lambda, but I keep them here to maintain this graphic analogy.)

If this is all a bit confusing right now, don’t worry; things will fall into place as we go along. For now, these are the things to remember:

  1. Use Some(value) to wrap a value into an Option.

  2. Use None to create an empty Option.

  3. Use Match to run some code depending on the state of the Option.

For now, you can think of None as a replacement for null, and Match as a replacement for a null check. You’ll see in subsequent sections why using Option is actually preferable to null, and why, eventually, you won’t need to use Match very often.

5.3 Implementing Option

Feel free to skip to section 5.4 or skim over this section on first reading. To start with, it’s important that you understand enough to be able to use Option. But if you’d like to see what’s under the hood, in this section, I’ll show you the techniques I used in the implementation of Option that I included in LaYumba.Functional. This is both to show you that there’s little magic involved and to show you ways to work around some limitations of the C# type system. You might like to type this code into an empty project as you follow along.

5.3.1 An idealized implementation of Option

In many typed functional languages, Option can be defined with a one-liner along these lines:

type Option t = None | Some t

The closest equivalent in C# is the following:

interface Option<T> { }
record None : Option<T>;
record Some<T>(T Value) : Option<T>;

That is, we define Option<T> as a marker interface and then provide minimal implementations for None and Some<T>, saying that each of them is a valid Option<T>. Some<T> contains a T, and None contains nothing.

Here we already run into a problem: because None does not actually contain a T, we’d like to say that None is a valid Option<T> regardless of what type T eventually resolves to. Unfortunately, the C# compiler does not allow this, so in order to make the code compile, we need to provide a generic parameter for None as well.

record None<T> : Option<T>;

We now have a basic, working implementation.

5.3.2 Consuming an Option

Next, we want to write code that consumes an Option using pattern matching. Ideally, I’d like it to look like this:

string Greet(Option<string> greetee)
   => greetee switch
   {
      None => "Sorry, who?",
      Some(name) => $"Hello, {name}"
   };

Unfortunately, this does not compile. If we are to satisfy the syntax for pattern matching in C#, we need to rewrite the code as follows:

string Greet(Option<string> greetee)
   => greetee switch
   {
      None<string> => "Sorry, who?",
      Some<string>(var name) => $"Hello, {name}"
   };

This is definitely less elegant (imagine if you have a long type name instead of string), but at least it compiles. It does, however, generate a compiler warning, saying that “the switch expression does not handle all possible values of its input type.” This is because, in theory, some other implementation of Option<string> could exist, and the switch expression in our example does not cater to this. Unfortunately, there is no way to tell C# that we never want anything other than Some and None to implement Option.

We can mitigate both issues by defining our own adapter function Match that includes a discard pattern. This allows us to perform exhaustive pattern matching and gives us an interface that’s easy to consume:

static R Match<T, R>(this Option<T> opt, Func<R> None, Func<T, R> Some)
   => opt switch
   {
      None<T> => None(),
      Some<T>(var t) => Some(t),
      _ => throw new ArgumentException("Option must be None or Some")
   };

Then we can consume an Option like this:

string Greet(Option<string> greetee)
   => greetee.Match
   (
      None: () => "Sorry, who?",
      Some: (name) => $"Hello, {name}"
   );

Now we have an elegant, concise way to consume an Option. (Notice that we also need an overload of Match that takes two actions, allowing us to do something depending on the state of the Option. This can easily be done following the approach described in section 4.3.2.)
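The Action-based overload mentioned above could be sketched as follows. This is only an illustration built on the record-based Option from this section: the class name OptionExt is a placeholder, and the sketch dispatches directly on the two cases rather than following the approach of section 4.3.2.

```csharp
using System;

// Usage: perform a side effect depending on the state of the Option.
Option<string> john = new Some<string>("John");
Option<string> nobody = new None<string>();

john.Match(
   None: () => Console.WriteLine("Sorry, who?"),
   Some: (name) => Console.WriteLine($"Hello, {name}"));   // prints "Hello, John"

nobody.Match(
   None: () => Console.WriteLine("Sorry, who?"),           // prints "Sorry, who?"
   Some: (name) => Console.WriteLine($"Hello, {name}"));

// The record-based Option from section 5.3.1.
interface Option<T> { }
record None<T> : Option<T>;
record Some<T>(T Value) : Option<T>;

static class OptionExt   // hypothetical name, used only in this sketch
{
   // Runs exactly one of the two actions and returns nothing.
   public static void Match<T>(this Option<T> opt, Action None, Action<T> Some)
   {
      if (opt is Some<T>(var t)) Some(t);
      else if (opt is None<T>) None();
      else throw new ArgumentException("Option must be None or Some");
   }
}
```

Because both lambdas return nothing, this overload is useful for logging, printing, or other side effects, whereas the Func-based Match shown earlier is the one to use when you need a result.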

5.3.3 Creating a None

Let’s move on to creating Options. To explicitly create a None—say, for testing that Greet works with None—we have to write this:

var greeting = Greet(new None<string>());

This is not nice. I particularly dislike that we have to specify the string parameter: when calling a method, we’d like to have type inference resolve our generic parameters. What we need, ideally, is a value that can be converted to a None<T>, regardless of the type of T.

While you can’t do this with inheritance, it turns out you can do it with type conversion. To achieve this, we need to define a dedicated, non-generic type, NoneType:

struct NoneType { }

Next, we change Option<T> to include implicit conversion from NoneType to None<T>:

abstract record Option<T>
{
   public static implicit operator Option<T>(NoneType _)
      => new None<T>();
}

This effectively tells the compiler that an instance of NoneType can be used where an Option<T> is expected and that it should convert the NoneType to a None<T>. Finally, we include a convenience field called None that stores a NoneType:

public static readonly NoneType None = default;

You can now create a None<T> by simply typing None:

Greet(None) // => "Sorry, who?"

Much better! Note that this assumes that the None field is in scope, which can be achieved with using static.

In the previous snippet, None returns a NoneType. Seeing that Greet expects an Option<string>, the compiler applies the implicit conversion we defined in Option<T>, which yields a None<string>. When all is said and done, you can forget that the NoneType exists and just code knowing that None returns a None<T> for the expected T.

5.3.4 Creating a Some

Now for creating a Some. First, because Some indicates the presence of a value, it should not be possible to wrap a null into a Some. To do this, instead of relying on the automatic methods generated for records by the compiler, we’ll explicitly define the constructor:

record Some<T> : Option<T>
{
   private T Value { get; }
 
   public Some(T value)
      => Value = value ?? throw new ArgumentNullException();
 
   public void Deconstruct(out T value)
      => value = Value;
}

Here I also made the Option's inner value private so that it can only be accessed when the Option is deconstructed in pattern matching. We can then define a convenience function, Some, that wraps a given value into a Some:

public static Option<T> Some<T>(T t) => new Some<T>(t);

With this in place, we can create a Some like so:

Greet(Some("John")) // => "Hello, John"

Now we have nice, clean syntax for creating both a None and a Some. To put the icing on the cake, I’m also going to define an implicit conversion from T to Option<T>:

abstract record Option<T>
{
   public static implicit operator Option<T>(T value)
      => value is null ? new None<T>() : new Some<T>(value);
}

This means that a T can be used where an Option<T> is expected and will automatically be wrapped into a Some<T>—unless it’s null, in which case it will be a None<T>. This snippet saves us from explicitly calling Some:

Greet(None)   // => "Sorry, who?"
Greet("John") // => "Hello, John"

It also allows us to trivially convert a function that returns null to one that returns an Option:

var empty = new NameValueCollection();
Option<string> green = empty["green"];
 
green // => None

5.3.5 Optimizing the Option implementation

For a number of reasons, in my LaYumba.Functional library, I’ve chosen to use a slightly different approach and define Option as in the following listing.

Listing 5.1 An implementation of Option optimized for C#

public struct Option<T>
{
   readonly T? value;        // The value wrapped by a Some
   readonly bool isSome;     // Indicates whether the Option is Some or None

   internal Option(T value)  // Constructs an Option in the Some state
   {
      this.value = value ?? throw new ArgumentNullException();
      this.isSome = true;
   }

   public static implicit operator Option<T>(NoneType _)
      => default;            // Constructs an Option in the None state

   public static implicit operator Option<T>(T value)
      => value is null ? None : Some(value);

   // Once an Option is constructed, the only way to interact with it is with Match.
   public R Match<R>(Func<R> None, Func<T, R> Some)
       => isSome ? Some(value!) : None();
}

In this implementation, instead of using different types, I use state (namely, the isSome flag) to indicate whether the Option is Some or None. I’m providing a single constructor that creates an Option in the Some state. That’s because I’ve defined Option as a struct, and structs have an implicit parameterless constructor that initializes all fields to their default values. In this case, the isSome flag is initialized to false, indicating that the Option is None. This implementation has several advantages:

  • Performance is better because a struct is a value type, so creating an Option doesn’t require a separate heap allocation.

  • Being a struct, an Option cannot be null.

  • The default value of an Option is None (with records, it was null).

Everything else (the NoneType, implicit conversion, and the interface of Match) is the same as discussed previously. Finally, I’ve defined the Some function and the None value in the F class, which allows you to easily create Options:

namespace LaYumba.Functional;
 
public static partial class F
{
   public static Option<T> Some<T>(T value) => new Option<T>(value);
   public static NoneType None => default;
}
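To see why it matters that the default value of a struct-based Option is None, here is a small sketch. It uses a trimmed-down copy of the types above (without the implicit conversion from T, and with the nullable annotations simplified), plus a hypothetical Describe helper that exists only for this example.

```csharp
using System;
using static F;

Option<string> unassigned = default;   // no constructor runs, so isSome is false
var slots = new Option<int>[2];        // array elements start out as default(Option<int>)
slots[1] = Some(42);

Console.WriteLine(Describe(unassigned));   // prints "None"
Console.WriteLine(Describe(slots[0]));     // prints "None"
Console.WriteLine(Describe(slots[1]));     // prints "Some(42)"

// Hypothetical helper: renders an Option as text using Match.
static string Describe<T>(Option<T> opt)
   => opt.Match(None: () => "None", Some: (v) => $"Some({v})");

public struct NoneType { }

// Trimmed-down version of listing 5.1, for illustration only.
public readonly struct Option<T>
{
   private readonly T value;
   private readonly bool isSome;

   internal Option(T value)
   {
      if (value is null) throw new ArgumentNullException(nameof(value));
      this.value = value;
      isSome = true;
   }

   public static implicit operator Option<T>(NoneType _) => default;

   public R Match<R>(Func<R> None, Func<T, R> Some)
      => isSome ? Some(value) : None();
}

public static class F
{
   public static Option<T> Some<T>(T value) => new Option<T>(value);
   public static NoneType None => default;
}
```

With the record-based implementation, the same uninitialized variable and array slots would hold null, and calling Match on them would fail.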

Now that you have seen all the pieces of the puzzle, take another look at the example I showed earlier. It should be clearer now:

using LaYumba.Functional;
using static LaYumba.Functional.F;
 
string Greet(Option<string> greetee)
   => greetee.Match
   (
      None: () => "Sorry, who?",
      Some: (name) => $"Hello, {name}"
   );
 
Greet(Some("John")) // => "Hello, John"
 
Greet(None) // => "Sorry, who?"

As you’ve seen, there are different possible ways to implement Option in C#. I’ve chosen this particular implementation because it allows the cleanest API from the perspective of client code. But Option is a concept, not a particular implementation, so don’t be alarmed if you see a different implementation in another library or tutorial.3 It will still have the defining features of an Option:

  • A value None that indicates the absence of a value

  • A function Some that wraps a value, indicating the presence of a value

  • A way to execute code depending on whether a value is present (in our case, Match)

Option is also called Maybe

Different functional frameworks use varying terminology to express similar concepts. A common synonym for Option is Maybe, with the Some and None states called Just and Nothing, respectively.

Such naming inconsistencies are unfortunately quite common in FP, and this doesn’t help in the learning process. In this book, I’ll try to present the most common synonyms for each pattern or technique and then stick with one name. From now on, I’ll stick to Option. Just know that if you run across Maybe (say, in a JavaScript or Haskell library), it’s the same concept.

Let’s now look at some practical scenarios in which you can use Option.

5.4 Option as the natural result type of partial functions

We’ve discussed how functions map elements from one set to another and how types represent these sets. There’s an important distinction to make between total and partial functions:

  • Total functions—Mappings that are defined for every element of the domain

  • Partial functions—Mappings that are defined for some but not all elements of the domain

Partial functions are problematic because it’s not clear what the function should do when given an input for which it can’t compute a result. The Option type offers a perfect solution to model such cases: if the function is defined for the given input, it returns a Some wrapping the result; otherwise, it returns None. Let’s look at some common use cases in which we can use this approach.

5.4.1 Parsing strings

Imagine a function that parses a string representation of an integer. You could model this as a function of type string → int. This is clearly a partial function because not all strings are valid representations of integers. In fact, there are infinitely many strings that can’t be mapped to an int.

You can provide a safer representation of parsing by having the parser function return an Option<int>. This will be None if the given string can’t be parsed, as figure 5.2 illustrates.

Figure 5.2 Using Option to convey that parsing is a partial function. For input strings that provide a valid representation of an integer, the parsing function wraps the parsed int into a Some. Otherwise, it returns None.

A parser function with the signature string → int is partial, and it’s not clear from the signature what will happen if you supply a string that can’t be converted to an int. On the other hand, a parser function with signature string → Option<int> is total because, for any given string, it returns a valid Option<int>. Here’s an implementation that uses a BCL method to do the grunt work but exposes an Option-based API:

public static class Int
{
   public static Option<int> Parse(string s)
      => int.TryParse(s, out int result)
         ? Some(result) : None;
}

The helper functions in this subsection are included in LaYumba.Functional. You can try them out in the REPL:

Int.Parse("10")    // => Some(10)
Int.Parse("hello") // => None

Similar methods are defined to parse strings into other commonly used types like doubles and dates and, more generally, to convert data in one form to another, more restrictive form.
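As an illustration of what such parsers might look like for doubles and dates, here is a minimal sketch. The names Parsers, ParseDouble, ParseDate, and Describe are made up for this example and are not necessarily those used in LaYumba.Functional; the sketch also fixes the culture to InvariantCulture (an assumption, so that the results don’t depend on the machine’s settings) and bundles a trimmed-down Option so that it can run on its own.

```csharp
using System;
using System.Globalization;
using static F;

Console.WriteLine(Describe(Parsers.ParseDouble("3.5")));      // prints "Some(3.5)"
Console.WriteLine(Describe(Parsers.ParseDouble("hello")));    // prints "None"
Console.WriteLine(Parsers.ParseDate("2021-03-21")
   .Match(None: () => "None", Some: (d) => $"Some(year {d.Year})"));   // prints "Some(year 2021)"
Console.WriteLine(Describe(Parsers.ParseDate("not a date")));  // prints "None"

// Hypothetical helper: renders an Option as culture-independent text.
static string Describe<T>(Option<T> opt)
   => opt.Match(
      None: () => "None",
      Some: (v) => "Some(" + Convert.ToString(v, CultureInfo.InvariantCulture) + ")");

public static class Parsers   // hypothetical name
{
   // Wraps double.TryParse in an Option-returning function.
   public static Option<double> ParseDouble(string s)
      => double.TryParse(s, NumberStyles.Float, CultureInfo.InvariantCulture, out double result)
         ? Some(result) : None;

   // Wraps DateTime.TryParse in an Option-returning function.
   public static Option<DateTime> ParseDate(string s)
      => DateTime.TryParse(s, CultureInfo.InvariantCulture, DateTimeStyles.None, out DateTime result)
         ? Some(result) : None;
}

public struct NoneType { }

// Trimmed-down Option, included only to keep the sketch self-contained.
public readonly struct Option<T>
{
   private readonly T value;
   private readonly bool isSome;

   internal Option(T value)
   {
      if (value is null) throw new ArgumentNullException(nameof(value));
      this.value = value;
      isSome = true;
   }

   public static implicit operator Option<T>(NoneType _) => default;

   public R Match<R>(Func<R> None, Func<T, R> Some)
      => isSome ? Some(value) : None();
}

public static class F
{
   public static Option<T> Some<T>(T value) => new Option<T>(value);
   public static NoneType None => default;
}
```

In each case, the TryParse method from the BCL does the actual work, and the wrapper only changes how failure is reported.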

5.4.2 Looking up data in a collection

In section 5.1, I showed you that some collections expose an API that’s neither honest nor consistent in representing the absence of data. The gist was as follows:

new NameValueCollection()["green"]
// => null
 
new Dictionary<string, string>()["blue"]
// => runtime error: KeyNotFoundException

The fundamental problem is the following. An associative collection maps keys to values and can, therefore, be seen as a function of type TKey → TValue. But there’s no guarantee that the collection contains a value for every possible key, so looking up a value is a partial function.

A better, more explicit way to model the retrieval of a value is by returning an Option. It’s possible to write adapter functions that expose an Option-based API, and I generally name these Option-returning functions Lookup:

Lookup : (NameValueCollection, string) → Option<string>

Lookup takes a NameValueCollection and a string (the key) and returns a Some wrapping the value if the key exists and None otherwise. The following listing shows the implementation.

Listing 5.2 Changing a null-returning function to return an Option

public static Option<string> Lookup
   (this NameValueCollection collection, string key)
   => collection[key];

That’s it! The expression collection[key] is of type string, whereas the declared return value is Option<string>, so the string value will be implicitly converted into an Option<string>, with null being replaced by None. With minimal effort, we’ve gone from a null-based API to an Option-based API.

Here’s an overload of Lookup that takes an IDictionary. The signature is similar:

Lookup : (IDictionary<K, T>, K) → Option<T>

We can implement the Lookup function as follows:

public static Option<T> Lookup<K, T>(this IDictionary<K, T> dict, K key)
   => dict.TryGetValue(key, out T value) ? Some(value) : None;

We now have an honest, clear, and consistent API to query both collections:

new NameValueCollection().Lookup("green")
// => None
 
new Dictionary<string, string>().Lookup("blue")
// => None

No more KeyNotFoundException or NullReferenceException because you asked for a key that wasn’t present in the collection. We can apply the same approach when querying other data structures.
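For instance, the same idea could be applied to searching a sequence for the first element that satisfies a predicate. The following is a sketch under stated assumptions: this Lookup overload on IEnumerable<T>, the EnumerableExt class, and the Describe helper are hypothetical names introduced for illustration, and a trimmed-down Option is bundled so that the example can run on its own.

```csharp
using System;
using System.Collections.Generic;
using static F;

var numbers = new List<int> { 1, 3, 8, 11 };

Console.WriteLine(Describe(numbers.Lookup(n => n % 2 == 0)));   // prints "Some(8)"
Console.WriteLine(Describe(numbers.Lookup(n => n > 100)));      // prints "None"

// Hypothetical helper: renders an Option as text using Match.
static string Describe<T>(Option<T> opt)
   => opt.Match(None: () => "None", Some: (v) => $"Some({v})");

public static class EnumerableExt   // hypothetical name
{
   // Returns the first matching element wrapped in a Some, or None if nothing matches.
   // (Some throws if the matching element is itself null.)
   public static Option<T> Lookup<T>(this IEnumerable<T> items, Func<T, bool> predicate)
   {
      foreach (var item in items)
         if (predicate(item)) return Some(item);
      return None;
   }
}

public struct NoneType { }

// Trimmed-down Option, included only to keep the sketch self-contained.
public readonly struct Option<T>
{
   private readonly T value;
   private readonly bool isSome;

   internal Option(T value)
   {
      if (value is null) throw new ArgumentNullException(nameof(value));
      this.value = value;
      isSome = true;
   }

   public static implicit operator Option<T>(NoneType _) => default;

   public R Match<R>(Func<R> None, Func<T, R> Some)
      => isSome ? Some(value) : None();
}

public static class F
{
   public static Option<T> Some<T>(T value) => new Option<T>(value);
   public static NoneType None => default;
}
```

Unlike a search that falls back to a default value, the Option-returning version can tell “no match” apart from “the match happens to be 0.”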

5.4.3 The smart constructor pattern

In section 4.2.2, we defined the Age type, a type more restrictive than int, in that not all ints represent a valid age. You can, again, model this with Option, as figure 5.3 shows.

Figure 5.3 Converting from int to Age can also be modeled with Option.

If you need to create an Age from an int, instead of calling the constructor, which has to throw an exception if it’s unable to create a valid instance, you can define a function that returns Some or None to indicate the successful creation of an Age. This is known as a smart constructor: it’s smart in the sense that it’s aware of some rules and can prevent the construction of an invalid object. The following listing shows this approach.

Listing 5.3 Implementing a smart constructor for Age

public struct Age
{
   private int Value { get; }

   // A smart constructor returning an Option
   public static Option<Age> Create(int age)
      => IsValid(age) ? Some(new Age(age)) : None;

   // The constructor should now be marked as private.
   private Age(int value)
      => Value = value;

   private static bool IsValid(int age)
      => 0 <= age && age < 120;
}

If you now need to obtain an Age from an int, you’ll get an Option<Age>, which forces you to account for the failure case.
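Client code that uses the smart constructor might look like the following sketch. The Check function is a made-up example, and a trimmed-down Option is bundled alongside the Age type from listing 5.3 so that the sketch can run on its own.

```csharp
using System;
using static F;

Console.WriteLine(Check(30));    // prints "30 is a valid age"
Console.WriteLine(Check(150));   // prints "150 is not a valid age"

// Hypothetical client code: the caller must say what happens in both cases.
static string Check(int input)
   => Age.Create(input).Match(
      None: () => $"{input} is not a valid age",
      Some: (_) => $"{input} is a valid age");

// The Age type from listing 5.3.
public struct Age
{
   private int Value { get; }

   public static Option<Age> Create(int age)
      => IsValid(age) ? Some(new Age(age)) : None;

   private Age(int value)
      => Value = value;

   private static bool IsValid(int age)
      => 0 <= age && age < 120;
}

public struct NoneType { }

// Trimmed-down Option, included only to keep the sketch self-contained.
public readonly struct Option<T>
{
   private readonly T value;
   private readonly bool isSome;

   internal Option(T value)
   {
      if (value is null) throw new ArgumentNullException(nameof(value));
      this.value = value;
      isSome = true;
   }

   public static implicit operator Option<T>(NoneType _) => default;

   public R Match<R>(Func<R> None, Func<T, R> Some)
      => isSome ? Some(value) : None();
}

public static class F
{
   public static Option<T> Some<T>(T value) => new Option<T>(value);
   public static NoneType None => default;
}
```

Because the Age constructor is private, Create is the only way for client code to turn an int into an Age, so the validation can’t be bypassed by calling new Age(150).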

5.5 Dealing with null

At the beginning of this chapter, I asked you to pretend there was no null in C# and that we had to come up with a way to represent optional values. Truly functional languages don’t have null and model optional values with the Option type. However, some of the most popular programming languages, including C#, not only allow for null, but use it as the default value for all reference types. In this section, I’ll show you why this is a problem and how it can be tackled.

5.5.1 Why null is such a terrible idea

Let’s look at some of the reasons why null causes so many problems.

Sloppy data modeling

In section 4.2.4, you saw that the tuple (Age, Gender) has (120 × 2) = 240 possible values. The same is true if you store those two values in a struct. Now, if you define a class or record to hold these values like so

record HealthData(Age Age, Gender Gender);

then there are actually 241 possible values because reference types can be null. If you refactor Age to be a class, you now have 121 possible values for Age and 243 possible values for HealthData! Not only is null polluting the mathematical representation of the data, but we also have to write code to handle all those possible values.

Ambiguous function signatures

You may have heard that the NullReferenceException is the single most common source of bugs. But why is it so common? The answer lies, I believe, in a fundamental ambiguity:

  • Because reference types are null by default, your program may encounter a null as a result of a programming error, where a required value was simply not initialized.

  • Other times, null is considered a legal value; for example, the authors of NameValueCollection decided it was OK to represent that a key is not present by returning null.

Because there is no way to declare whether a null value is deliberate or the result of a programming error (at least before C# 8’s nullable reference types, which I’ll discuss in section 5.5.3), you’re often in doubt as to how to treat a null value. Should you allow for null? Should you throw an ArgumentNullException? Should you let the NullReferenceException bubble up? Essentially, every function that accepts or returns a reference type is ambiguous because it’s unclear whether a null value is a legal input or output.

Defensive null-checking

The ambiguity between legal and unintentional nulls does not only cause bugs. It has another effect, which may be even more damaging: it leads to defensive programming. To prevent the lurking NullReferenceException, developers litter their code with null checks and assertions against null arguments. While there is a case for using these assertions (see section 5.5.4), if used throughout the codebase, they create a lot of noise.

5.5.2 Gaining robustness by using Option instead of null

The main step to address these problems is to never use null as a legal value. Instead, use Option to represent optional values. This way, any occurrence of null is the result of a programming error. (This means that you never need to check for null; just let the NullReferenceException bubble up.) Let’s see an example.

Imagine you have a form on your website that allows people to subscribe to a newsletter. A user enters their name and email, and this causes the instantiation of a Subscriber, which is then persisted to the database. Subscriber is defined as follows:

public record Subscriber
(
   string Name,
   string Email
);

When it’s time to send out the newsletter, a custom greeting is computed for the subscriber, which is prepended to the body of the newsletter:

public string GreetingFor(Subscriber subscriber)
   => $"Dear {subscriber.Name.ToUpper()},";

This all works fine. Name can’t be null because it’s a required field in the signup form, and it’s not nullable in the database.

Some months later, the rate at which new subscribers sign up drops, so the business decides to lower the barrier to entry by no longer requiring new subscribers to enter their name. The name field is removed from the form, and the database is modified accordingly.

This should be considered a breaking change because it’s no longer possible to make the same assumptions about the data. And yet, the code still happily compiles. When the time comes for the newsletter to be sent, GreetingFor throws an exception when it receives a Subscriber without a Name.

By this time, the person responsible for making the name optional in the database may be on a different team than the person maintaining the code that sends out the newsletter. The code may be in different repositories. In short, it may not be simple to look up all the usages of Name. Instead, it’s better to explicitly indicate that Name is now optional. That is, Subscriber should be changed to

public record Subscriber
(
   Option<string> Name,    // Name is now explicitly marked as optional.
   string Email
);

This not only clearly conveys the fact that a value for Name may not be available, it causes GreetingFor to no longer compile. GreetingFor and any other code that was accessing the Name property will have to be modified to take into account the possibility of the value being absent. For example, you might modify it like so:

public string GreetingFor(Subscriber subscriber)
   => subscriber.Name.Match
   (
      () => "Dear Subscriber,",
      (name) => $"Dear {name.ToUpper()},"
   );

By using Option, you’re forcing the users of your API to handle the case in which no data is available. This places greater demands on the client code, but it effectively removes the possibility of a NullReferenceException occurring.

Changing a string to an Option<string> is a breaking change: in this way, you’re trading run-time errors for compile-time errors, thus making a compiling application more robust.

5.5.3 Non-nullable reference types?

It has become widely accepted that having nullable types is a flaw in the language design. This is somewhat confirmed by the fact that so many releases of C# have introduced new syntax for dealing with null, gradually making the language more complex but without ever solving the problem at the root.

The most radical effort to take a stab at the problem has been made in C# 8 by introducing a feature called nullable reference types (NRT). The name may seem odd, given that reference types were always nullable in C#; the point is that the feature allows you to mark the types you intend to be nullable, and the compiler keeps track of how you access instances of those types. For example, NRT allows you to write

#nullable enable           
 
public record Subscriber
(
   string? Name,           
   string Email            
);

Enables the NRT feature in the code that follows

A nullable field

A non-nullable field

This allows you to be explicit in your declarations on which values can be null. Furthermore, if you dereference Name without a null check, you’ll get a compiler warning telling you that Name may be null:

#nullable enable
 
public string GreetingFor(Subscriber subscriber)
   => $"Dear {subscriber.Name.ToUpper()},";
 
// => CS8602 Dereference of a possibly null reference
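One way to satisfy the compiler here is to test Name before dereferencing it. The following is a minimal sketch, assuming a generic greeting is an acceptable fallback; the Greetings class is a hypothetical container added only to make the snippet self-contained.

```csharp
#nullable enable

public record Subscriber(string? Name, string Email);

// Hypothetical static class, used only to host the function.
public static class Greetings
{
   // The type pattern only matches when Name is not null, so 'name'
   // is a non-nullable string inside the first branch.
   public static string GreetingFor(Subscriber subscriber)
      => subscriber.Name is string name
         ? $"Dear {name.ToUpper()},"
         : "Dear Subscriber,";
}
```

Because the check and the dereference sit in the same expression, the compiler's flow analysis can follow them, and the dereference is no longer flagged.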

On the face of it, you might think this feature supersedes Option, and to a certain extent, it does. When you look deeper, however, you’ll find a few problems:

  • You need to explicitly opt into the feature by adding the Nullable element to your project file (or adding the #nullable directive in your files as shown previously).

  • Even when you’ve opted into NRT at project level, it’s still possible to override this within a file by using the #nullable disable directive. This means that you cannot reason about code in isolation: you now need to look in different places to see whether a string is nullable or not.

  • The compiler warnings only appear if both the nullable value declaration and the code where the value is dereferenced are in an NRT-enabled context, again making it difficult to reason about code in isolation.

  • Unless you’re treating warnings as errors, your code will still compile after changing, say, string to string?, which is, therefore, not a breaking change and will go unnoticed in a codebase with lots of warnings.

  • The compiler can’t always keep track of the null checks you’ve made along the way. For example,

    public string GreetingFor(Subscriber subscriber)
       => IsValid(subscriber)                        
          ? $"Dear {subscriber.Name.ToUpper()},"     
          : "Dear Subscriber";

    Checks that subscriber.Name is not null

    Still warns that you may be dereferencing a null

    results in a compiler warning even if IsValid checks that Name is not null. To fix this, you have to learn an obscure set of attributes to keep the compiler from warning you about these false positives.4

  • Fields that are not marked as nullable can still end up being null (for example, when deserializing an object):

    #nullable enable
     
    var json = @"{""Name"":""Enrico"", ""Email"":null}";
    var subscriber = JsonSerializer.Deserialize<Subscriber>(json);
     
    if (subscriber is not null)
       WriteLine(subscriber.Email.ToLower());
    // => throws NullReferenceException
  • The feature doesn’t allow you to deal with optionality in a way that is uniform between value and reference types. Despite the syntactic similarity between, say, int? and string?, they are completely different: int? is shorthand for Nullable<int>, so we have a structure wrapping the int, somewhat similarly to Option. On the other hand, string? is an annotation telling the compiler that the value could be null.
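To give a flavor of the attributes mentioned in the point about false positives, here is a sketch that uses MemberNotNullWhen from System.Diagnostics.CodeAnalysis (available from .NET 5). It is not the IsValid function from the example above: the check is restructured as a hypothetical HasName property on the record, because this attribute describes members of the instance it's declared on.

```csharp
#nullable enable
using System.Diagnostics.CodeAnalysis;

public record Subscriber(string? Name, string Email)
{
   // Hypothetical helper: promises the compiler that Name is not null
   // whenever HasName returns true.
   [MemberNotNullWhen(true, nameof(Name))]
   public bool HasName => Name is not null;
}

// Hypothetical static class, used only to host the function.
public static class Greetings
{
   public static string GreetingFor(Subscriber subscriber)
      => subscriber.HasName
         ? $"Dear {subscriber.Name.ToUpper()},"   // Name is treated as non-null here
         : "Dear Subscriber,";
}
```

The annotation has to be written, and kept truthful, by hand; the compiler trusts it at the call site, so a wrong attribute silently reintroduces the risk it was meant to remove.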

Notice that none of those limitations apply when using the Option type. Overall, despite my initial excitement as NRT was being developed, I’m now inclined to find it’s too little, too late. It seems that the language team set out with a bold agenda for this feature, but then watered it down to allow users to migrate their existing codebases to C# 8 without too much effort.

If you’re working on a team that embraces NRT and opts to use it everywhere, or if in a few years’ time adoption becomes ubiquitous, then NRT will certainly add value. But at the time of writing, if you’re working on a variety of projects and consuming a variety of libraries, not all of which use NRT throughout, I don’t see NRT bringing a real benefit.

5.5.4 Bulletproof against NullReferenceException

Given all that we discussed previously, in my opinion the most robust approach to prevent null values from wreaking havoc is as follows. Firstly

  • If you’re using C# 8 or later, enable NRT. This helps to ensure that required values are always initialized. More importantly, it conveys intent to consumers of your code that also have NRT enabled.

  • For optional values, use Option<T> rather than T?.

This means that, inside the boundaries of your code, you can be confident that no value is ever null. You should need neither null checks nor code that throws ArgumentNullException.

Secondly, identify the boundaries of your code. This includes

  • Public methods exposed by libraries that you intend to publish or share across projects

  • Web APIs

  • Listeners to messages from message brokers or persisted queues

In those boundaries, prevent null values from seeping in

  • For required values

    • Throw an ArgumentNullException.
    • Return a response with a status code of 400 (Bad Request).
    • Reject the message.
  • For optional values, convert null values into Options:

    • In C#, this can be done trivially with implicit conversion.
    • If your boundary involves deserializing data sent in another format, you can add the conversion logic to your formatter.
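The two rules for a boundary can be sketched as follows. The Option type below is a deliberately minimal stand-in written for this illustration, not the implementation in LaYumba.Functional, and SubscriberApi.Register is a hypothetical public entry point.

```csharp
#nullable enable
using System;

// Minimal illustrative Option: just enough to show the conversion.
public readonly struct Option<T>
{
   private readonly T? value;
   private readonly bool isSome;

   private Option(T value) { this.value = value; isSome = true; }

   // null becomes None (the default struct); anything else becomes Some.
   public static implicit operator Option<T>(T? value)
      => value is null ? default : new Option<T>(value);

   public R Match<R>(Func<R> None, Func<T, R> Some)
      => isSome ? Some(value!) : None();
}

public record Subscriber(Option<string> Name, string Email);

public static class SubscriberApi
{
   // Arguments arrive from outside our code, so either may be null.
   public static Subscriber Register(string? email, string? name)
   {
      // Required value: reject null at the boundary.
      if (email is null) throw new ArgumentNullException(nameof(email));

      // Optional value: the implicit conversion turns null into None.
      return new Subscriber(name, email);
   }
}
```

Past this point, code that receives a Subscriber can rely on Email being set and must go through Match to read Name.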

Thirdly, where you consume .NET or third-party libraries, you also need to prevent null from seeping in. You saw an example of how to do this in listing 5.2, where we defined the Option-returning Lookup method on NameValueCollection.

Converting JSON null to C# Option

For convenience, my LaYumba.Functional library includes a formatter that works with .NET’s System.Text.Json and illustrates how null in JSON objects can be translated into C# Option and back. Here’s an example of how to use it:

using System.Text.Json;
using LaYumba.Functional.Serialization.Json;
 
record Person
(
   string FirstName,
   Option<string> MiddleName,
   string LastName
);
 
JsonSerializerOptions ops = new()
{
   Converters = { new OptionConverter() }
};
 
var json = @"{""FirstName"":""Virginia"",
   ""MiddleName"":null, ""LastName"":""Woolf""}";
var deserialized = JsonSerializer.Deserialize<Person>(json, ops);
 
deserialized.MiddleName // => None
 
json = @"{""FirstName"":""Edgar"",
   ""MiddleName"":""Allan"", ""LastName"":""Poe""}";
deserialized = JsonSerializer.Deserialize<Person>(json, ops);
 
deserialized.MiddleName // => Some("Allan")
 

In summary, Option should be your default choice when representing a value that’s, well, optional. Use it in your data objects to model the fact that a property may not be set and in your functions to indicate the possibility that a suitable value may not be returned. Apart from reducing the chance of a NullReferenceException, this will enrich your model and make your code more self-documenting. Using Option in your function signature is one way of attaining the overarching recommendation of chapter 4: designing function signatures that are honest and highly descriptive of what the caller can expect.

In upcoming chapters, we’ll look at how to work effectively with Options. Although Match is the basic way of interacting with an Option, we’ll build a rich, high-level API starting in the next chapter. Option will be your friend, not only when you use it in your programs, but also as a simple structure through which I’ll illustrate many FP concepts.

Exercises

  1. Write a generic Parse function that takes a string and parses it as a value of an enum. It should be usable as follows:

    Enum.Parse<DayOfWeek>("Friday")  // => Some(DayOfWeek.Friday)
     
    Enum.Parse<DayOfWeek>("Freeday") // => None
  2. Write a Lookup function that takes an IEnumerable and a predicate and returns the first element in the IEnumerable that matches the predicate or None, if no matching element is found. Write its signature in arrow notation:

    bool isOdd(int i) => i % 2 == 1;
     
    new List<int>().Lookup(isOdd)     // => None
    new List<int> { 1 }.Lookup(isOdd) // => Some(1)
  3. Write a type Email that wraps an underlying string, enforcing that it’s in a valid format. Ensure that you include the following:

    • A smart constructor
    • Implicit conversion to string so that it can easily be used with the typical API for sending emails
  4. Take a look at the extension methods defined on IEnumerable in System.Linq.Enumerable.5 Which ones could potentially return nothing or throw some kind of not-found exception and would, therefore, be good candidates for returning an Option<T> instead?

Summary

  • Use the Option type to express the possible absence of a value. An Option can be in one of two states:

    • None, indicating the absence of a value
    • Some, a simple container wrapping a non-null value
  • To execute code conditionally, depending on the state of an Option, use Match with the functions you’d like to evaluate in the None and Some cases.

  • Use Option as a return value when a function cannot guarantee a valid output for all possible inputs, including

    • Looking up values in collections
    • Creating objects requiring validation (smart constructors)
  • Identify the boundaries of your code and prevent any null values from seeping in:

    • Enforce required values.
    • Convert optional values to Option.

1 In the early days of .NET, NameValueCollection was used quite frequently because it was common to use ConfigurationManager.AppSettings to get configuration settings from a .config file. This was superseded by the more recent configuration providers, so you may not encounter NameValueCollection often, even though it’s still part of .NET.

2 In fact, the language specification itself says so: if you assign null to a variable as in string s = null;, then s is string evaluates to false.

3 For example, the popular mocking framework NSubstitute includes an implementation of Option.

4 For more details, see http://mng.bz/10XQ.

5 See the Microsoft documentation of enumerable methods: http://mng.bz/PXd8.
