When you remove layers, simplicity and speed happen.
—Ginni Rometty, CEO, IBM
Object-Oriented (OO) programming is currently the dominant design approach in almost all software development. In OO, the natural unit of work is, unsurprisingly, the “object” or “class,” and design effort is focused on defining classes that have the right shapes, behaviors, and relationships for the tasks at hand. In F#, by contrast, the natural units of work are types, which describe the shape of data, and functions, units of code that take some (typed) input and produce some (typed) output in a predictable fashion. It makes sense, therefore, to start our journey into stylish F# coding by looking at how best to design and code relatively simple types and functions. It’s a surprisingly rich and rewarding topic.
Miles and Yards (No, Really!)
Some Rail Units of Distance
Name | Equal to |
---|---|
Yard | 0.9144 meters |
Mile | 1760 yards |
That’s simple enough, but it might surprise you to learn how miles and yards are recorded in some British railway systems. They use a single floating-point value, where the whole miles are in the whole part of the number and the yards are in the fractional part, using .0 for zero yards and .1759 for 1,759 yards. For example, a mile and a half would be 1.0880 because half a mile is 880 yards. A fractional part greater than .1759 would be invalid because at 1,760 yards, we are at the next mile.
Now you know why I chose British railway mileages as a nice gnarly domain for our coding examples. Clearly, some rather specific coding is needed to allow railway systems to do apparently straightforward things like reading, calculating with, storing, and printing such miles.yards distances. This gives us a great opportunity to exercise our type- and function-design skills.
Converting Miles and Yards to Decimal Miles
Miles Terminology
Term | Example Value | Real-World Meaning |
---|---|---|
miles.yards | 1.0880 | One and a half miles |
decimal miles | 1.5 | One and a half miles |
How to Design a Function
Sketch the signature of the function – naively, what types of inputs does it take, and what type does it return? What should the function itself be called? Does the planned signature fit well into code that would need to call it?
Code the body of the function, perhaps making some deliberately naive assumptions if this helps get quickly to a “first cut.”
Ask, does the sketched signature cover the use cases and eliminate as many potential errors as possible? If not, refine the signature and then the body to match.
In coding the body, did you learn anything about the domain? Did you think of some new error cases that could have been eliminated at the signature level? Is the function name still a good reflection of what it does? Refine the name, signature, and body accordingly.
Rinse and repeat as necessary.
In outlining these steps, I’ve dodged the whole issue of tests. How and when unit tests are written is an important topic, but I’m not getting into that here.
Now let us apply these steps to the miles.yards to decimal miles problem.
Sketch the Signature of the Function
You can sketch out the signature of a function straight into code by typing the let binding of the function, using specified rather than inferred types, and making the body of the function simply raise an exception. Listing 2-1 shows my initial thought on the miles.yards to decimal miles converter.
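A minimal sketch of such a signature-only function might look like the following. The parameter name milesPointYards and the choice of NotImplementedException are assumptions made for illustration.

```fsharp
// Signature sketch only: the types are spelled out explicitly, and the body
// fails immediately if the function is ever called.
let convertMilesYards (milesPointYards : float) : float =
    raise (System.NotImplementedException())
```

Because raise can return any type, this type-checks as float -> float even though no real logic exists yet.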
Here we are saying, “We’ll have a function called convertMilesYards that takes a floating-point input and returns a floating-point result.” The function will compile, meaning that you could even experiment with calling it in other code if you wanted. But there is no danger of forgetting to code the logic of the body because it will immediately fail if actually called.
Naively Code the Body of the Function
Now we can replace the exception in the body of the function with some real code. In the miles.yards example, this means separating the “whole miles” element (for instance, the “1” part of 1.0880) from the fractional part (the 0.0880) and dividing the fractional part by 0.1760 (remembering that there are 1,760 yards in a mile). Listing 2-2 shows how this looks in code.
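The calculation just described can be sketched as follows. This is a naive first cut under the stated assumptions (no validation, and the usual small inaccuracies of binary floating point), not a finished implementation.

```fsharp
// Naive first cut: split off the whole miles, then scale the fractional
// part so that .1760 would correspond to one full mile.
let convertMilesYards (milesPointYards : float) : float =
    let wholeMiles = milesPointYards |> floor
    let fraction = milesPointYards - wholeMiles
    wholeMiles + (fraction / 0.1760)

// One and a half miles: the result should be very close to 1.5.
let decimalMiles = convertMilesYards 1.0880
printfn "%f" decimalMiles
```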
As you can see from the example at the end of Listing 2-2, this actually works fine. If you wanted, you could stop at this point, add some unit tests if you hadn’t written these already, and move on to another task. In fact, for many purposes, particularly scripts and prototypes, the code as it is would be perfectly acceptable. As you go through the next few sections of this chapter, please bear in mind that the changes we make there are refinements rather than absolute necessities. You should make a mental cost-benefit analysis at every stage, depending on how polished and “bullet proof” you need the code to be.
Review the Signature for Type Safety
The next step in the refinement process is to reexamine the signature, to check whether there are any errors we could eliminate using the signature alone. It’s all very well to detect errors using if/then style logic in the body of a function, but it would be much better to make these errors impossible to even code. Prominent OCaml developer Yaron Minsky calls this “making illegal state unrepresentable.” It’s an important technique for making code motivationally transparent and revisable, but it can be a little hard to achieve in code where numeric values are central.
In our example, think about what would happen if we called our naive function with an argument of 1.1760. If you try this, you’ll see that you get a result of 2.0, which is understandable because (fraction / 0.1760) is 1.0 and, in case you’d forgotten, 1.0 + 1.0 is 2.0. But we already said that fractional parts over 0.1759 are invalid because from 0.1760 onward, we are into the next mile. If this happened in practice, it would probably indicate that we were calling the conversion function using some other floating-point value that wasn’t intended to represent miles.yards distances, perhaps because we accessed the wrong field in a hypothetical railway GIS. Our current code leaves the door open to this kind of thing happening silently, and when a bug like that gets embedded deep in a system, it can be very hard to find.
A traditional way of handling this would be to check the fractional part in the body of the conversion function and to raise an exception when it was out of range. Listing 2-3 shows that being done. (As a brief digression, note how we use nameof when raising the exception so that the correct name is output even if the parameter is renamed.)
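A sketch of this in-body check is shown below. One detail is an assumption of this sketch rather than something stated in the text: the fraction is rounded to whole yards before the range check, so that binary floating-point noise cannot reject a valid input such as 1.1759. The error message wording is also illustrative.

```fsharp
open System

// Validation inside the function body: bad input is detected only at run time.
let convertMilesYards (milesPointYards : float) : float =
    let wholeMiles = milesPointYards |> floor
    let fraction = milesPointYards - wholeMiles
    // Assumption: convert to whole yards first, then range-check the yards.
    let yards = Math.Round(fraction * 10000.0)
    if yards > 1759.0 then
        // nameof (F# 5.0 or later) keeps the reported name correct after a rename.
        raise (ArgumentOutOfRangeException(nameof(milesPointYards), "Fractional part must be <= 0.1759"))
    wholeMiles + (yards / 1760.0)
```

With this version, convertMilesYards 1.0880 evaluates to 1.5, while convertMilesYards 1.1760 raises an ArgumentOutOfRangeException.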
But this isn’t making illegal state unrepresentable; it’s detecting an invalid state after it has happened. It’s not obvious how to fix this because the milesPointYards input is inherently a floating-point value, and (in contrast to, say, Discriminated Unions) we don’t have a direct way to restrict the range of values that can be expressed. Nonetheless, we can bring the error some way forward in the chain.
We start the process by noting that miles.yards could be viewed as a pair of integers, one for the miles and one for the yards. (In railway miles.yards distances, we disregard fractional yards.) This leads naturally to representing miles.yards as a Single-Case Discriminated Union (Listing 2-4).
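As a minimal sketch, a Single-Case Discriminated Union with two named integer fields could be declared like this. The value name oneAndAHalf is purely illustrative.

```fsharp
// A Single-Case Discriminated Union: one case, carrying two named int fields.
type MilesYards = MilesYards of wholeMiles : int * yards : int

// Constructing an instance for one mile and 880 yards.
let oneAndAHalf = MilesYards(1, 880)
```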
Just in case you aren’t familiar with Discriminated Unions, we are declaring a type called MilesYards, with two integer fields called wholeMiles and yards. From a construction point of view, it’s broadly the same as the C# in Listing 2-5. Consumption-wise though, it’s very different, as we’ll discover in a moment.
I should also mention that in Discriminated Union declarations, the field names (in this case, wholeMiles and yards) are optional, so you will often encounter declarations without them, as in Listing 2-6. I prefer to include field names, even though it’s a little wordier, because this improves motivational transparency.
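For comparison, the same kind of declaration without field names might be sketched as follows. The reader then has to infer from context which integer is which.

```fsharp
// The same shape without field names: shorter, but the meaning of each
// int is no longer stated in the declaration.
type MilesYards = MilesYards of int * int

let distance = MilesYards(1, 880)
```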
Going back to our function design task, we’ve satisfied the need for a type that models the fact that miles.yards is really two integers. How do we integrate that with the computation we set out to do? The trick is to isolate the construction of a MilesYards instance from any computation. This is an extreme version of “separation of concerns”: here the concern of constructing a valid instance of miles.yards is a separate one from the concern of using it in a computation. Listing 2-7 shows the construction phase.
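The construction phase can be sketched as below. As an assumption of this sketch, the range check is applied to the rounded yards value rather than to the raw fraction, which avoids spurious failures caused by floating-point representation; the error message text is illustrative.

```fsharp
open System

type MilesYards = MilesYards of wholeMiles : int * yards : int

// Construction only: turn a loosely typed float into a validated MilesYards.
let create (milesPointYards : float) : MilesYards =
    let wholeMiles = milesPointYards |> floor |> int
    let fraction = milesPointYards - float wholeMiles
    // Assumption: the fractional digits encode whole yards, so round them.
    let yards = Math.Round(fraction * 10000.0) |> int
    if yards > 1759 then
        raise (ArgumentOutOfRangeException(nameof(milesPointYards), "Fractional part must be <= 0.1759"))
    MilesYards(wholeMiles, yards)
```

Under these assumptions, create 1.0880 produces MilesYards(1, 880), and create 1.1760 raises an exception.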
Note the carefully constructed signature of the create function: it takes a floating-point value (from some external, less strictly typed source like a GIS) and returns our nice strict MilesYards type. For the body, we’ve brought across some of the code from the previous iteration of our function, including the bits that validate the range of the fractional part. Finally, we’ve constructed a MilesYards instance using whole miles and yards.
Separating construction from computation in this way has several benefits:
The mapping from floating point to MilesYards is separately testable from the conversion to decimal miles.
We could use the independent MilesYards type in other useful ways, such as overriding its ToString() method to provide a standard string representation.
The signature and implementation are motivationally transparent. Even if a reader wasn’t familiar with the strange miles.yards convention in British railways, they’d see instantly what we were trying to do, and they’d be very clear that we were doing it deliberately.
Likewise, it’s semantically focused: the reader only has to worry about one thing at a time.
The code is also revisable. For example, if a new requirement surfaced to create distance values from miles and chains (a chain in railways is 22 yards, and yes, this unit is widely used), it would be obvious what to do.
Now it only remains to implement the computation. Listing 2-8 shows a first cut of code to do that.
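A first cut of the computation might look like the following sketch, which uses a match expression to recover the payload values.

```fsharp
type MilesYards = MilesYards of wholeMiles : int * yards : int

// Computation only: MilesYards -> float.
let milesYardsToDecimalMiles (milesYards : MilesYards) : float =
    match milesYards with
    | MilesYards(wholeMiles, yards) ->
        // Explicit casts are needed because F# does not mix int and float.
        (float wholeMiles) + ((float yards) / 1760.)
```

For example, milesYardsToDecimalMiles (MilesYards(1, 880)) evaluates to 1.5.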
Again, the signature is super explicit: MilesYards -> float. In the body, we use pattern matching to recover the wholeMiles and yards payload values from the MilesYards instance. Then we use the recovered values in a simple computation to produce decimal miles. Incidentally, if you aren’t familiar with Discriminated Unions, the match expression is how we get at the fields of the DU. This is one way in which a DU differs from an immutable class such as the C# example in Listing 2-5.
Review and Refine
At this point, we have a somewhat safer and more explicit implementation. But it’s not time to rest yet: we should still ruthlessly review the signature, naming, and implementation to ensure they are the best they can be.
The first thing that might jump out at you is the naming of the create function. “Create” is rather a vague word. What if we wanted to create an instance from some other type, such as a string? We could perhaps rename create to fromMilesPointYards, but that still leaves open the issue of what we are creating. And if we incorporated the result type in the name as well, it would be too long. How about moving the function into a module with the same name as the type and naming it fromMilesPointYards (Listing 2-9)?
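One way to arrange this is sketched below: the type stays at the top level, and a module of the same name holds the functions that act on it. As before, validating the rounded yards value is an assumption of this sketch.

```fsharp
open System

type MilesYards = MilesYards of wholeMiles : int * yards : int

// A module with the same name as the type groups its supporting functions.
module MilesYards =

    let fromMilesPointYards (milesPointYards : float) : MilesYards =
        let wholeMiles = milesPointYards |> floor |> int
        let fraction = milesPointYards - float wholeMiles
        let yards = Math.Round(fraction * 10000.0) |> int
        if yards > 1759 then
            raise (ArgumentOutOfRangeException(nameof(milesPointYards), "Fractional part must be <= 0.1759"))
        MilesYards(wholeMiles, yards)

    let toDecimalMiles (milesYards : MilesYards) : float =
        match milesYards with
        | MilesYards(wholeMiles, yards) ->
            (float wholeMiles) + ((float yards) / 1760.)

// Callers qualify the functions with the module name.
let decimalMiles = MilesYards.fromMilesPointYards 1.0880 |> MilesYards.toDecimalMiles
```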
This style of creation, using a from... function within a module, is nice because it leaves open the possibility that we might add additional ways of creating a MilesYards instance. For example, we might later add a fromString function. From the point of view of the caller, they would be doing a MilesYards.fromMilesPointYards or a MilesYards.fromString, which is just about as motivationally transparent as you could wish. We were also able to simplify the name of the conversion function from milesYardsToDecimalMiles to toDecimalMiles.
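One remaining weakness is that a caller could still construct a MilesYards instance directly from the DU case, for example MilesYards(1, 1760), without any validation. Thus, they’d bypass our carefully crafted fromMilesPointYards function. If this really bothers you, you can move the Single-Case Discriminated Union inside the module and make its case private (Listing 2-10).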
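A sketch of the private-case arrangement follows. The type now lives inside the module, and the private keyword restricts use of the case to code within that module. The validation details are the same assumptions as in earlier sketches.

```fsharp
open System

module MilesYards =

    // The case is private: code outside this module cannot construct
    // or pattern match on MilesYards directly.
    type MilesYards = private MilesYards of wholeMiles : int * yards : int

    let fromMilesPointYards (milesPointYards : float) : MilesYards =
        let wholeMiles = milesPointYards |> floor |> int
        let fraction = milesPointYards - float wholeMiles
        let yards = Math.Round(fraction * 10000.0) |> int
        if yards > 1759 then
            raise (ArgumentOutOfRangeException(nameof(milesPointYards), "Fractional part must be <= 0.1759"))
        MilesYards(wholeMiles, yards)

    let toDecimalMiles (milesYards : MilesYards) : float =
        match milesYards with
        | MilesYards(wholeMiles, yards) ->
            (float wholeMiles) + ((float yards) / 1760.)

// Outside the module, instances can only come from the creation function.
let validated = MilesYards.fromMilesPointYards 1.0880
let asDecimal = MilesYards.toDecimalMiles validated
```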
Now the only way to create a MilesYards instance is to go via the fromMilesPointYards function or via any other creation functions we might add in the future.
Sometimes, making a DU case constructor private in this way can cause problems. For example, test code or serialization/deserialization sometimes needs to see the constructor. Also, you won’t be able to pattern match to recover the underlying values. If using private constructors causes more problems than it solves, just put the type outside the module again, and don’t worry too much about it.
A Final Polish
Time for a last look at the code to see if there is anything we can improve or simplify. Listing 2-11 shows where we are so far. (I have reverted to the type-outside-module style we were using prior to Listing 2-10, as this is what I find myself doing most in practice.)
I now only have a couple of objections to this code, and they are both in the area of conciseness. The first is that we can avoid the match expression in the body of toDecimalMiles. Perhaps surprisingly, the way to do that is to move the pattern matching into the parameter declaration! Listing 2-12 shows before-and-after versions of the function.
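The two styles can be sketched side by side. The name toDecimalMilesWithMatch is hypothetical and is used here only so that both versions can coexist in one script.

```fsharp
type MilesYards = MilesYards of wholeMiles : int * yards : int

// Before: pattern match in the body of the function.
let toDecimalMilesWithMatch (milesYards : MilesYards) : float =
    match milesYards with
    | MilesYards(wholeMiles, yards) ->
        (float wholeMiles) + ((float yards) / 1760.)

// After: pattern match in the parameter declaration itself.
let toDecimalMiles (MilesYards(wholeMiles, yards)) : float =
    (float wholeMiles) + ((float yards) / 1760.)
```

Both functions have the signature MilesYards -> float, so callers see no difference.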
This trick, which only works safely with Single-Case Discriminated Unions, causes the pattern match to occur at the caller/callee function boundary, rather than within the body of the callee. From the caller’s point of view, the type they have to provide (a MilesYards DU instance) is unchanged; but within the callee, we have direct access to the fields of the DU, in this case, the wholeMiles and yards values. I’m laboring this point slightly because the first time you see this approach in the wild, it can be incredibly confusing.
My second objection is the explicit casting to float in the body of toDecimalMiles. This casting is necessary because F# is stricter when mixing integers and floating-point types than, for example, C#. You have to explicitly cast in one direction or the other, which is intended to help you focus on your code’s intentions and thus to avoid subtle floating-point bugs. However, all those brackets and float keywords do make the code a bit wordy. We can get around this by creating a little operator to do the work. Listing 2-13 shows how this looks. (Obviously, you can put the operator in a different scope if you want to use it more widely.)
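A sketch of such an operator is shown below. Declaring it private to the module and restricting it to int input are choices made for this illustration.

```fsharp
type MilesYards = MilesYards of wholeMiles : int * yards : int

module MilesYards =

    // A prefix operator that converts an int to a float.
    let private (~~) (x : int) : float = float x

    let toDecimalMiles (MilesYards(wholeMiles, yards)) : float =
        ~~wholeMiles + (~~yards / 1760.)
```

The body of toDecimalMiles now reads much closer to the underlying arithmetic.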
The reason I chose ~~ as the name of this operator is that the wavy characters are reminiscent of an analog signal.
I personally find this a very useful trick when writing computational code. That said, many F# developers are reluctant to create their own operators, as it can obfuscate code as much as it simplifies. I’ll leave the choice to you.
Recommendations
To write a function, first define the required signature and then write the body. Refine the signature and body until as many errors as possible are eliminated declaratively at the signature (type) level, and remaining errors are handled imperatively in the function body.
To model a business type, consider using a Single-Case Discriminated Union. Provide functions to act on the type (e.g., to create instances and to convert to other types) in a module with the same name as the type. For extra safety, optionally put the type inside the module and make its single case private.
Consider using operators (sparingly) to simplify code. In particular, consider declaring conversion operators such as ~~ to simplify code that mixes floating-point and integer values.
Summary
In this chapter, you learned how to design and write a function. You started by thinking about types: what type or types the function should take as parameters and what type it should return. Then you coded the body of the function, before circling back to the type signature to try and eliminate possible errors. You learned how to define a Single-Case Discriminated Union type representing some business item together with supporting functions to instantiate the type and to transform the type to another type. You learned the importance of Single-Case Discriminated Unions and about the usefulness of hiding the constructor to maximize type safety. Finally, you learned a couple of tricks to simplify your code: doing pattern matching in the declaration of a function parameter and using operators to simplify common operations such as casting to float.
In the next chapter, we’ll look at missing data: how best to express the concept that a data item is missing or irrelevant in a particular context.
Exercises
Here are some exercises to help you hone the skills you’ve gained so far. Exercise solutions are at the end of the chapter.
There’s a hole in the validation presented previously: we haven’t said anything about what happens when the input distance is negative. If we decided that negative distances simply aren’t valid (because miles.yards values always represent a physical position on a railway network), what would you need to change in the code to prevent negative values entering the domain?
Hint: You could do this around the same point in the code where we already check the range of the yards value.
Write a new type and module that can create and represent a distance in whole miles and chains and convert such a miles-and-chains distance to decimal miles. The only way to create the new MilesChains distance should be by supplying a whole miles and a chains input (i.e., two positive integers), so unlike MilesYards, you won’t need a fromMilesPointYards function.
Hint: There are 80 chains in a mile.
Exercise Solutions
To complete this exercise, you need to create a Single-Case Discriminated Union much like the MilesYards DU, but with wholeMiles and chains as its fields. Since the exercise states that you should only be able to create valid instances, put the DU into a module and make its case private. Add a fromMilesChains function that range-validates the wholeMiles and chains arguments and then use them to make a MilesChains instance.
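One possible shape for this solution is sketched below. The exact validation rules (non-negative whole miles, chains from 0 to 79) and the error messages are assumptions of this sketch.

```fsharp
open System

module MilesChains =

    // Private case: instances can only be created via fromMilesChains.
    type MilesChains = private MilesChains of wholeMiles : int * chains : int

    let fromMilesChains (wholeMiles : int, chains : int) : MilesChains =
        if wholeMiles < 0 then
            raise (ArgumentOutOfRangeException(nameof(wholeMiles), "Whole miles must be >= 0"))
        // There are 80 chains in a mile, so valid chain values are 0 to 79.
        if chains < 0 || chains >= 80 then
            raise (ArgumentOutOfRangeException(nameof(chains), "Chains must be >= 0 and < 80"))
        MilesChains(wholeMiles, chains)

    let toDecimalMiles (MilesChains(wholeMiles, chains)) : float =
        float wholeMiles + (float chains / 80.)

// One mile and 40 chains is one and a half miles.
let halfWay = MilesChains.fromMilesChains(1, 40) |> MilesChains.toDecimalMiles
```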