Not even a thought has arisen; is there still a sin or not?
—Zen Koan, 10th Century CE
This is a chapter about nothing! Specifically, it’s about how we handle the absence of data in our programs. It’s a more important topic than you might think at first: bugs caused by incorrect handling of missing data, typically manifested as “null reference errors,” are distressingly common in Object-Oriented programs. And this still happens, despite code to avoid such errors forming a significant proportion of the line count of many C# code bases.
In this chapter I’ll try to convince you how serious a problem this is and show you the many features and idioms that F# offers to mitigate and even eliminate this class of error.
A Brief History of Null
Some ALGOL W code that uses null
Here, the ability to have a null or an actual value is used to model, for example, the fact that a person might or might not have an elder sibling. Null and nonnull instance values are used as flags to go down various branches of code. The modeling is definitely a bit fuzzy: for instance, FATHER and MOTHER are also nullable, even though everyone has a mother and father. Perhaps this models the fact that we might not know who they are. This kind of ambiguity was excusable in the 1960s, but coding patterns in the style of Listing 3-1 are still surprisingly common, even though there are now well-known techniques for modeling such relationships much more explicitly.
Of course, things have improved somewhat since 1965: in C#, for example, we now have the null coalescing operator ??, which allows us to retrieve either the nonnull value of some source item, or some other value, typically a default. As of C# 6.0, we also have the null-conditional operators ?. and ?[] that allow us to reach into an object or array for a property or indexed item and safely return null if either the object with the property, or the property itself, is null.
What has happened in these cases (typically) is that code has tried to access some property or method of an object, which is itself null, such as the arrival time of the first train when there is no known first train.
Null Reference Mentions in a Major C# Code Base Issue List
Search Term | Open | Closed |
---|---|---|
NullReferenceException | 201 | 521 |
null reference | 264 | 811 |
null-ref | 73 | 191 |
nullref | 2 | 24 |
Incidentally, when I updated these figures for the new edition of this book, the figures in all but two categories had gone up since 2018. Clearly, it isn’t just “bad programmers” making these mistakes: null reference errors are accidents waiting to happen. Rather than blaming the operator, we should follow the basic principles of ergonomics and design such errors out of the technology at the language level.
At the time of writing, the primary approach in C# is still to “code around” the problem of null, which works (if you remember to do it) but does have a cost. I analyzed several open-source C# code bases and found that the proportion of lines involved in managing nulls (null checks, uses of null-coalescing and null-conditional operators) amounted to between 3% and 5% of the significant lines of code. Not crippling by any means, but certainly a significant distraction. Anything we can do to make this process easier has a worthwhile payoff.
The conclusion must be that paying attention to missing data, and spending some time learning the techniques to handle it correctly or to avoid it completely, are among the most useful things you can do as you learn idiomatic F# coding.
Option Types vs. Null
F#’s answer to the problem of potentially absent values is the option type. If you’ve coded in F# at all, you are probably familiar with option types, but please bear with me for a few moments while I establish very clearly what option types are and what they are not.
Example of a Discriminated Union
The Option type viewed as a Discriminated Union
One obvious difference between Shape and Option is that one of the cases of Option takes no payload at all - which makes sense because we can’t know the value of something that, by definition, doesn’t exist. DU cases without payloads are perfectly fine.
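The comparison can be sketched roughly as follows. The Shape cases and their payloads are assumptions made for illustration, and MyOption is a hypothetical stand-in written only to show the general form of the built-in option type; it is not the library's actual definition.

```fsharp
// A DU in which every case carries a payload (case names and payloads are assumed).
type Shape =
    | Square of side: float
    | Rectangle of width: float * height: float
    | Circle of radius: float

// Hypothetical stand-in mirroring the general form of the built-in option type:
// one case with a payload, one case with no payload at all.
type MyOption<'T> =
    | MySome of 'T
    | MyNone

// Each case is handled by pattern matching on the DU.
let area shape =
    match shape with
    | Square side -> side * side
    | Rectangle (width, height) -> width * height
    | Circle radius -> System.Math.PI * radius * radius

// The payload-free case simply has nothing to bind in its pattern.
let describe (value: MyOption<string>) =
    match value with
    | MySome s -> sprintf "has value: %s" s
    | MyNone -> "no value"
```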
Creating and using the Shape DU
Creating and using the Option DU
I could have done this because the compiler offers a special keyword for option types. I only used the Option<string> version in Listing 3-5 to highlight the fact that option types are DUs. In practice, you should use the option keyword as this is built into the language, making it widely understood and performant.
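As a small sketch of the two spellings, using made-up values: both annotations below name the same type, so values declared either way are interchangeable.

```fsharp
// Both annotations name the same type; 'string option' is the idiomatic spelling.
let longForm : Option<string> = Some "Oxford"
let shortForm : string option = Some "Oxford"
let missing : string option = None

// Because the types are identical, the values can be compared directly.
let sameValue = (longForm = shortForm)
```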
Consuming Option Types
Modeling an optional delivery address using an Option type
Note how at the end of Listing 3-6, we try to treat the orders’ delivery addresses as strings, not as string options, which are a different type. This causes a compiler error for both the myOrder and hisOrder cases, not just a runtime error in the myOrder case. This is the option type protecting us by forcing us to consider the has-data and no-data possibilities at the point of consumption.
This raises the question: How are we supposed to access the underlying value or payload? There are several ways to do this, some more straightforward than others, so in the next few sections, we’ll go through these and examine their benefits and costs.
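A minimal sketch of this kind of model, assuming a BillingDetails record whose field names and sample addresses are invented here. The line that would fail to compile is left commented out so that the rest can be run.

```fsharp
// Field names and sample data are assumptions made for this sketch.
type BillingDetails =
    { Name: string
      Billing: string
      Delivery: string option }

let myOrder =
    { Name = "Kit Eason"
      Billing = "112 Fibonacci Street\nErehwon\n35813"
      Delivery = None }

let hisOrder =
    { Name = "John Doe"
      Billing = "314 Pi Avenue\nErewhon\n15926"
      Delivery = Some "16 Planck Parkway\nErewhon\n62291" }

// Uncommenting either line below produces a compiler error, because
// Delivery is a 'string option' and an option has no Length property:
// printfn "%i" myOrder.Delivery.Length
// printfn "%i" hisOrder.Delivery.Length
```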
Pattern Matching on Option Types
Accessing an option type’s payload using pattern matching
Consuming option types using explicit pattern matching in this way has clear trade-offs. The big advantage is that it’s simple: everyone familiar with the basics of F# syntax will be familiar with it, and the reader doesn’t require knowledge of other libraries (or even computer science theory!) to understand what is going on. The disadvantage is that it’s a little verbose and pipeline unfriendly.
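A sketch of explicit pattern matching along these lines. The function name addressForPackage comes from the text, but its body, the record's field names, and the fallback to the billing address are assumptions.

```fsharp
// Field names are assumptions made for this sketch.
type BillingDetails =
    { Name: string
      Billing: string
      Delivery: string option }

// Use the delivery address when there is one; otherwise fall back to the
// billing address (the fallback rule is assumed for illustration).
let addressForPackage (details: BillingDetails) =
    let address =
        match details.Delivery with
        | Some s -> s
        | None -> details.Billing
    sprintf "%s\n%s" details.Name address
```

Both branches must produce a string, so the compiler will not let the no-data case be forgotten.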
I’ll present alternatives in future sections, but before I do, let me say this: if you, and anyone maintaining your code, aren’t completely comfortable with the basics of option types – comfortable to the extent that everyone is ready and keen to move onto more fluent methods of consumption – I’d advise that you stick with good old-fashioned pattern matching, at least for a while. As with many other areas of F# coding, trying to get too clever too quickly can lead to some pretty obscure code and a definite blurring of the principles of motivational transparency and semantic focus.
The Option Module
Once you are ready to go beyond pattern matching, you can start using some of the functions available in the Option module. I personally found the Option module functions a little hard to get my head around at first. I suspect this is because English language descriptions of these functions don’t make much sense without examples – so proceed with this section slowly!
The Option.defaultValue Function
Defaulting an Option Type Instance using Option.defaultValue
The usage of addressForPackage is exactly the same as in Listing 3-7, so I haven’t repeated the usage here.
Using Option.defaultValue in a pipeline
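The two usages might look something like the following sketch. The record's field names are assumptions, and addressForPackagePiped is a hypothetical name used only to keep the two variants side by side.

```fsharp
// Field names are assumptions made for this sketch.
type BillingDetails =
    { Name: string
      Billing: string
      Delivery: string option }

// Direct call: the default comes first, the option second.
let addressForPackage (details: BillingDetails) =
    let address = Option.defaultValue details.Billing details.Delivery
    sprintf "%s\n%s" details.Name address

// Pipeline form (hypothetical name): the option flows in from the left.
let addressForPackagePiped (details: BillingDetails) =
    let address =
        details.Delivery
        |> Option.defaultValue details.Billing
    sprintf "%s\n%s" details.Name address
```

The argument order, default first and option last, is what makes the function fit naturally at the end of a forward pipe.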
The Option.iter Function
Using Option.iter to take an imperative action if a value is populated
There are additional Option module functions analogous to their collection-based cousins. These include Option.count, which produces 1 if the value is Some, otherwise 0, and Option.toArray and Option.toList, which produce a collection of length 1 containing the underlying value, otherwise an empty collection.
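A short sketch of these functions, using invented sample values. printDeliveryAddress is a hypothetical helper name.

```fsharp
let present : string option = Some "16 Planck Parkway"
let absent : string option = None

// Option.iter runs the action only for Some; for None it does nothing.
let printDeliveryAddress (delivery: string option) =
    delivery
    |> Option.iter (fun address -> printfn "Delivery address:\n%s" address)

// The collection-like functions treat an option as a collection of 0 or 1 items.
let counts = (Option.count present, Option.count absent)   // (1, 0)
let asArray = Option.toArray present                        // [| "16 Planck Parkway" |]
let asList = Option.toList absent                           // []
```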
Option.map and Option.bind
Documented Behavior of the Option.map and Option.bind Functions
Function | Description |
---|---|
Option.map | Transforms an option value by using a specified mapping function |
Option.bind | Invokes a function on an optional value that itself yields an option |
The Option.map Function
Using Option.map to optionally apply a function, returning an option type
Here, the requirement is to print a delivery address in capitals if it exists, otherwise to do nothing. We combine Option.map, to do the uppercasing when necessary, with Option.iter, to do the printing.
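A sketch of that combination, with a hypothetical function name and invented sample values:

```fsharp
// Uppercase the address if there is one, then print it if there is one.
let printDeliveryAddress (delivery: string option) =
    delivery
    |> Option.map (fun address -> address.ToUpper())
    |> Option.iter (fun address -> printfn "Delivery address:\n%s" address)

// The mapping step on its own, to show what flows between the two stages.
let upper = Some "planck parkway" |> Option.map (fun s -> s.ToUpper())       // Some "PLANCK PARKWAY"
let stillNone = (None: string option) |> Option.map (fun s -> s.ToUpper())   // None
```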
In the None case (top of the diagram), the None effectively passes through untouched and never goes near the uppercasing operation. In the Some case (bottom of diagram), the payload is uppercased and comes out as a Some value. At this point, we see the beginnings of the “Railway Oriented Programming” paradigm, which we’ll discuss in detail in Chapter 11.
The Option.bind Function
Type Signatures for Option.map and Option.bind
Function | Signature |
---|---|
Option.map | ('T -> 'U) -> 'T option -> 'U option |
Option.bind | ('T -> 'U option) -> 'T option -> 'U option |
Look at them carefully: the only difference is that the “binder” function needs to return an option type ('U option) rather than an unwrapped type ('U). The usefulness of this is that if you have a series of operations, each of which might succeed (returning Some value) or fail (returning None), you can pipe them together without any additional ceremony. Execution of your pipeline effectively “bails out” after the first step that returns None, because subsequent steps just pass the None through to the end without attempting to do any processing.
The delivery address might not be specified (i.e., have a value of None).
The delivery address might exist but be an empty string, hence having no last line from which to get the postal code.
The last line might not be convertible to a postal code.
Using Option.bind to create a pipeline of might-fail operations
In Listing 3-12, we have a tryLastLine function that splits the address by line breaks and returns the last line if there is one, otherwise None. Similarly, tryPostalCode attempts to convert a string to an integer and returns Some value only if that succeeds. The postalCodeHub function does a super-naive lookup (in reality, it would be some kind of database lookup) and always returns a value. We bring all these together in tryHub, which uses two Option.bind calls and an Option.map call to apply each of these operations in turn to get us from an optional delivery address to an optional delivery hub.
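A pipeline of this kind might be sketched as follows. The function names follow the description above, but the bodies, the record's field names, and the hub names and postal code are assumptions made for illustration.

```fsharp
open System

// Field names are assumptions made for this sketch.
type BillingDetails =
    { Name: string
      Billing: string
      Delivery: string option }

// Last line of the address, or None if the address has no lines.
let tryLastLine (address: string) =
    let parts = address.Split([| '\n' |], StringSplitOptions.RemoveEmptyEntries)
    match parts with
    | [||] -> None
    | lines -> lines |> Array.last |> Some

// Some postal code only if the text parses as an integer.
let tryPostalCode (codeString: string) =
    match Int32.TryParse(codeString) with
    | true, code -> Some code
    | false, _ -> None

// Naive lookup with made-up values; always returns a hub.
let postalCodeHub (code: int) =
    if code = 62291 then "Hub 1" else "Hub 2"

// bind for steps that can fail, map for the step that cannot.
let tryHub (details: BillingDetails) =
    details.Delivery
    |> Option.bind tryLastLine
    |> Option.bind tryPostalCode
    |> Option.map postalCodeHub
```

Option.bind is used where the step itself returns an option, and Option.map where the step returns a plain value; swapping them would either fail to compile or produce a nested option.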
This is a really common pattern in idiomatic F# code: a series of Option.bind and Option.map calls to get from one state to another, using several steps, each of which can fail. Common though it is, it is quite a high level of abstraction, and it’s one of those things where you have to understand everything before you understand anything. So if you aren’t comfortable using it for now – don’t. A bit of nested pattern matching isn’t the worst thing in the world! I’ll return to this topic in Chapter 11 when we talk about “Railway Oriented Programming,” at which point perhaps it’ll make a little more sense.
Option Type No-Nos
Antipattern: accessing Option type payloads using IsSome and Value
Don’t do this! You’d be undermining the whole infrastructure we have built up for handling potentially missing values in a composable way.
Some people would also consider explicit pattern matching using a match expression (in the manner of Listing 3-7) an antipattern too and would have you always use the equivalent functions from the Option module. I think this is advice that’s great in principle but isn’t always easy to follow; you’ll get to fluency with Option.map, Option.bind, and so forth in due course. In the meantime, a bit of pattern matching isn’t going to hurt anyone, and the lower level of abstraction may make your code more comprehensible to nonadvanced collaborators.
Designing Out Missing Data
The BillingDetails type
There must always be a billing address.
There might be a different delivery address but….
There must be no delivery address if the product isn’t a physically deliverable one, such as a download.
Modeling delivery address possibilities using a DU
Consuming the improved BillingDetails type
The tryDeliveryLabel function uses a match expression to extract the relevant address. Then (when it exists), it uses Option.map to pair this with the customer name to form a complete label.
The deliveryLabels function takes a sequence of billingDetails items and applies tryDeliveryLabel to each item. Then it uses Seq.choose both to pick out those items where Some was returned and to extract the payloads of these Some values. (I go into more detail about Seq.choose and related functions in Chapter 4.)
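The design described above could be sketched like this. The function names tryDeliveryLabel and deliveryLabels come from the text; the DU case names, the record's field names, and the label format are assumptions.

```fsharp
// Case names are assumptions: each case stores an address only where one is needed.
type Delivery =
    | AsBilling
    | Physical of string
    | Download

// Field names are assumptions made for this sketch.
type BillingDetails =
    { Name: string
      Billing: string
      Delivery: Delivery }

// Extract the relevant address (if any), then pair it with the customer name.
let tryDeliveryLabel (billingDetails: BillingDetails) =
    let address =
        match billingDetails.Delivery with
        | AsBilling -> Some billingDetails.Billing
        | Physical address -> Some address
        | Download -> None
    address
    |> Option.map (fun a -> sprintf "%s\n%s" billingDetails.Name a)

// Seq.choose keeps the Some results and unwraps their payloads in one step.
let deliveryLabels (billingDetails: BillingDetails seq) =
    billingDetails
    |> Seq.choose tryDeliveryLabel
```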
It has good semantic focus. You can tell without looking elsewhere what functions such as tryDeliveryLabel will do and why.
It has good revisability. Let’s say you realize that you want to support an additional delivery mechanism: so-called “Click and Collect,” where the customer comes to a store to collect their item. You might start by adding a new case to the Delivery DU, maybe with a store ID payload. From then on, the compiler would tell you all the points in existing code that you needed to change, and it would be pretty obvious how to add new features such as a function to list click-and-collect orders and their store IDs.
It has good motivational transparency. You aren’t left wondering why a particular delivery address is None. The reasons why an address might or might not exist are right there in the code. Other developers both “above you” in the stack (e.g., someone designing a view model for a UI) and “below you” (e.g., someone consuming the data to generate back-end fulfilment processes) can be clear about when and why certain items should and should not be present.
Modeling like this, where we use DUs to provide storage only for the DU cases where a value is required, brings us toward the nirvana of “Making Illegal State Unrepresentable,” an approach that I believe does more to eliminate bugs than any other coding philosophy I’ve come across.
Interoperating with the Nullable World
In this section, I’ll talk a bit about the implications of nullability when interoperating between F# and C#. There shouldn’t be anything too unexpected here, but when working in F#, it’s always worth bearing in mind the implications of interop scenarios.
Leaking In of Null Values
A null hiding inside an Option type
(As an aside, and perhaps a little surprisingly, doing a printfn "%s" null or a sprintf "%s" null is fine – formatting a string with %s produces output as if the string was a nonnull, empty string. The problem in Listing 3-17 is the call to the ToUpper() method of a null instance.)
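The hazard can be reproduced in a few lines. The names below are hypothetical, and the sketch assumes the default compiler settings, under which a string may be null.

```fsharp
// Nothing stops a null reference from being wrapped in Some.
let nullInDisguise : string option = Some null

// Pattern matching takes the Some branch, so the null reaches the consumer.
let shout (text: string option) =
    match text with
    | Some s -> s.ToUpper()   // throws NullReferenceException when s is null
    | None -> ""
```

Calling shout nullInDisguise fails at runtime even though every branch of the match was handled.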
Obviously, you wouldn’t knowingly write code exactly like Listing 3-17, but it does indicate how we are at the mercy of anything calling our code that might pass us a null. This doesn’t mean that the whole exercise of using option types or DUs is worthless. Option types and other DU wrappers are primarily useful because they make the intention of our code clear. But it does mean that, at the boundary of the code we consider to be safe, we need to validate for or otherwise deal with null values.
Defining a SafeString Type
Validating strings on construction
Having done this, one would need to require all callers to provide us with a SafeString rather than a string type.
It’s a tempting pattern, but frankly, things like nullable strings are so ubiquitous in .NET code that hardly anyone bothers. The overhead of switching to and from such null-safe types so that one can consume them and use them in .NET calls requiring string arguments is just too much to cope with. This is particularly true in the case of mixed-language code bases, where, like it or not, nullable strings are something of a lingua franca.
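One possible shape for such a type is sketched below. The choice of ArgumentException and its message text are assumptions; the essential point is only that construction fails when given null.

```fsharp
// A wrapper that can only be constructed from a non-null string.
type SafeString(s: string) =
    do
        if isNull s then
            // Exception type and message are assumptions for this sketch.
            raise (System.ArgumentException("SafeString cannot be constructed from null"))
    member this.Value = s
    override this.ToString() = s
```

Any function that accepts a SafeString can then use its Value without a null check, because the check has already happened at construction.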
Using Option.ofObj
Using Option.ofObj
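A minimal sketch, with invented values standing in for strings arriving from code that might pass null. describeAddress is a hypothetical helper.

```fsharp
// Stands in for a value arriving from code that might hand us null.
let fromOutside : string = null

let wrappedNull = Option.ofObj fromOutside            // None
let wrappedText = Option.ofObj "16 Planck Parkway"    // Some "16 Planck Parkway"

// Convert at the boundary, then stay in the option world.
let describeAddress (address: string) =
    address
    |> Option.ofObj
    |> Option.map (fun a -> a.ToUpper())
    |> Option.defaultValue "(no address)"
```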
Using Option.ofNullable
Using Option.ofNullable
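A sketch of the same idea for System.Nullable values. The function name, the output format, and the sample reading are assumptions.

```fsharp
open System

// Hypothetical reading from a device that reports a Nullable<int>.
let showHeartRate (rate: Nullable<int>) =
    rate
    |> Option.ofNullable
    |> Option.map (fun r -> sprintf "%i bpm" r)
    |> Option.defaultValue "N/A"
```

For example, showHeartRate (Nullable 96) gives "96 bpm", while showHeartRate (Nullable<int>()) gives "N/A".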
Incidentally, Listing 3-20 was inspired by my exercise watch, which occasionally tells me that my heart rate is null.
Leaking Option Types and DUs Out
Clearly, the flipside of letting nulls leak into our F# code is the potential for leakage outward of F#-specific types such as the option type and Discriminated Unions in general. It’s possible to create and consume these types in languages such as C# using compiler-generated sugar such as the NewCase constructor and the .IsCase, .Tag, and .Item properties, plus a bit of casting. However, it’s generally regarded as bad manners to force callers to do so, if those callers might not be written in F#. Again, some functions in the Option module come to the rescue.
Using Option.toObj
Using Option.toObj
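A sketch of converting on the way out, using a hypothetical lookup with made-up data. The internal function returns an option, and the outward-facing one returns a string that may be null.

```fsharp
// Hypothetical internal lookup with made-up data; returns an option.
let tryLocationDescription (locationId: int) =
    match locationId with
    | 1 -> Some "Platform 2"
    | _ -> None

// Outward-facing version for callers that expect null to mean "not found".
let locationDescriptionOrNull (locationId: int) : string =
    locationId
    |> tryLocationDescription
    |> Option.toObj
```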
Returning success or failure as a Boolean, with result in a reference parameter
Using Option.toNullable
Using Option.toNullable
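The value-type counterpart might look like this sketch, in which the function names and the sample reading are invented:

```fsharp
open System

// Hypothetical internal function returning an option of a value type.
let tryHeartRate (readingAvailable: bool) : int option =
    if readingAvailable then Some 96 else None

// Outward-facing version returning a Nullable<int> for non-F# callers.
let heartRateOrNull (readingAvailable: bool) : Nullable<int> =
    readingAvailable
    |> tryHeartRate
    |> Option.toNullable
```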
The Future of Null
C# 8.0 Syntax for nullable and nonnullable types
The ValueOption Type
Using the ValueOption type
There is also a ValueOption module that contains useful functions like ValueOption.bind, ValueOption.map, ValueOption.count, and ValueOption.iter, which behave in the same way that we described for the Option module previously.
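A brief sketch of the struct-based equivalent, using invented values. The cases are ValueSome and ValueNone, and the type can be written with the voption keyword.

```fsharp
let present : int voption = ValueSome 96
let absent : int voption = ValueNone

// Pattern matching works just as it does for ordinary option types.
let describe (rate: int voption) =
    match rate with
    | ValueSome r -> sprintf "%i bpm" r
    | ValueNone -> "N/A"

// The ValueOption module mirrors the Option module.
let doubled = present |> ValueOption.map (fun r -> r * 2)             // ValueSome 192
let counts = (ValueOption.count present, ValueOption.count absent)    // (1, 0)
```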
Using ValueOption values can have performance benefits in some kinds of code. To quote the documentation for value option types:
Not all performance-sensitive scenarios are “solved” by using structs. You must consider the additional cost of copying when using them instead of reference types. However, large F# programs commonly instantiate many optional types that flow through hot paths, and in such cases, structs can often yield better overall performance over the lifetime of a program.
The only way to be sure is to experiment with realistic volumes and processing paths.
Recommendations
Avoid using null values to represent things that legitimately might not be set. Instead, use Discriminated Unions to model explicit cases when a value is or is not relevant, and only have storage for the value in the cases where it is relevant. If DUs make things too complicated, or if it is obvious from the immediate context why a value might not be set, model it as an option type.
To make your option-type handling more fluent, consider using functions from the Option module such as Option.bind, Option.map, and Option.defaultValue to create little pipelines that get you safely through one or more processing stages, each of which might fail. But don’t get hung up on this – pattern matching is also fine. What’s not fine is accessing the .IsSome and .Value properties of an option type!
At the boundary of your system, consider using Option.ofObj and Option.ofNullable to move incoming nullable values into the option world and Option.toObj and Option.toNullable for option values leaving your code for other languages.
Avoid exposing option types and DUs in APIs if callers might be written in C# or other languages that might not understand F# types.
Remember the voption type and ValueOption module for optional values you want to be stored as structs. Using voption may have performance benefits.
Summary
In this chapter, you learned how to stop thinking of null values and other missing data items as rare cases to be fended off as an afterthought in your code. You found out how to embrace and handle missing data stylishly using F#’s rich toolbox, including option types, value option types, Discriminated Unions, pattern matching, and the Option and ValueOption modules. These techniques may not come easily at first, but after a while, you’ll wonder how you managed in any other way.
In the next chapter, we’ll look at how to use F#’s vast range of collection functions, functions that allow you to process collections such as arrays, lists, and IEnumerable values with extraordinary fluency.
Exercises
Take the code from Listing 3-16 and update it to support the following scenario:
There is an additional delivery type called “Click and Collect.”
When a BillingDetails instance’s delivery value is “Click and Collect,” we need to store an integer StoreId value but no delivery address. (We still store a billing address as for the other cases.)
Write and try out a function called collectionsFor. It needs to take an integer StoreId and a sequence of BillingDetails instances and return a sequence of “Click-and-Collect” instances for the specified store.
What is the most concise function you can write to count the number of BillingDetails instances that have a nonnull billing address? (Ignore the delivery address.)
Hint: One way to solve this is using two functions from the Option module. Option.ofObj is one of them. The other one we only mentioned in passing, earlier in this chapter. You might also want to use Seq.map and Seq.sumBy.
Exercise Solutions
This section shows solutions for the exercises in this chapter.
You can achieve the requirement by adding a new case called ClickAndCollect of int to the Delivery DU (or ClickAndCollect of storeId:int).
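One way this might look is sketched below. The other DU case names and the record's field names are assumptions, and the body of collectionsFor is only one of several reasonable ways to meet the stated requirement.

```fsharp
// Case names other than ClickAndCollect are assumptions.
type Delivery =
    | AsBilling
    | Physical of string
    | Download
    | ClickAndCollect of storeId: int

// Field names are assumptions made for this sketch.
type BillingDetails =
    { Name: string
      Billing: string
      Delivery: Delivery }

// Keep only the click-and-collect items for the requested store.
let collectionsFor (storeId: int) (billingDetails: BillingDetails seq) =
    billingDetails
    |> Seq.filter (fun d ->
        match d.Delivery with
        | ClickAndCollect s when s = storeId -> true
        | _ -> false)
```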