When you remove layers, simplicity and speed happen.
—Ginni Rometty, CEO, IBM
Object Oriented (OO) programming is currently the dominant design approach in almost all software development. In OO, the natural unit of work is, unsurprisingly, the “object” or “class,” and design effort is focused on defining classes that have the right shapes, behaviors, and relationships for the tasks at hand. In F#, by contrast, the natural units of work are types, which describe the shape of data; and functions, units of code that take some (typed) input and produce some (typed) output in a predictable fashion. It makes sense, therefore, to start our journey into stylish F# coding by looking at how best to design and code relatively simple types and functions. It’s a surprisingly rich and rewarding topic.
Miles and Yards (No, Really!)
Some Rail Units of Distance
Name | Equal to |
---|---|
yard | 0.9144 meters |
mile | 1760 yards |
That’s simple enough, but it might surprise you to learn how miles and yards are recorded in some British railway systems. They use a single floating-point value, where the whole miles are in the whole part of the number, and the yards are in the fractional part, using .0 for zero yards and .1759 for 1,759 yards. For example, a mile and a half would be 1.0880, because half a mile is 880 yards. A fractional part greater than .1759 would be invalid, because at 1,760 yards we are at the next mile.
Now you know why I chose British railway mileages as a nice gnarly domain for our coding examples. Clearly some rather specific coding is needed to allow railway systems to do apparently straightforward things like reading, calculating with, storing, and printing such miles.yards distances. This gives us a great opportunity to exercise our type- and function-design skills.
Converting Miles and Yards to Decimal Miles
Miles Terminology
Term | Example Value | Real-World Meaning |
---|---|---|
miles.yards | 1.0880 | One and a half miles |
decimal miles | 1.5 | One and a half miles |
How to Design a Function
Sketch the signature of the function – naïvely, what types of inputs does it take, and what type does it return? What should the function itself be called? Does the planned signature fit well into code that would need to call it?
Code the body of the function, perhaps making some deliberately naïve assumptions if this helps get quickly to a “first cut.”
Ask, does the sketched signature cover the use cases, and eliminate as many potential errors as possible? If not, refine the signature, then the body to match.
In coding the body, did you learn anything about the domain? Did you think of some new error cases that could have been eliminated at the signature level? Is the function name still a good reflection of what it does? Refine the name, signature, and body accordingly.
Rinse and repeat as necessary.
In outlining these steps, I’ve dodged the whole issue of tests. How and when unit tests are written is an important topic, but I’m not getting into that here.
Now let us apply these steps to the miles.yards to decimal miles problem.
Sketch the Signature of the Function
Sketching out a function signature
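A minimal sketch of such a signature might look like this. The placeholder body that throws NotImplementedException is an assumption made for illustration; any body that compiles and fails loudly when called would serve the same purpose.

```fsharp
open System

// Signature only: a float in, a float out. The body is a placeholder that
// throws if called, so the missing logic cannot be forgotten silently.
let convertMilesYards (milesPointYards : float) : float =
    raise (NotImplementedException())
```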
Here we are saying, “We’ll have a function called convertMilesYards that takes a floating-point input and returns a floating-point result.” The function will compile, meaning that you could even experiment with calling it in other code if you wanted. But there is no danger of forgetting to code the logic of the body, because it will immediately fail if actually called.
Naïvely Code the Body of the Function
Naïvely coded function body
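A naive first cut could be sketched as follows. It assumes the fractional part can simply be rescaled so that .1760 would correspond to one whole mile; the local names wholeMiles and fraction are illustrative.

```fsharp
// Naive first cut: split the input into whole miles and a fractional
// part, then rescale the fraction so that .1760 would equal one mile.
let convertMilesYards (milesPointYards : float) : float =
    let wholeMiles = milesPointYards |> floor
    let fraction = milesPointYards - wholeMiles
    wholeMiles + (fraction / 0.1760)

// One mile and 880 yards: approximately 1.5 decimal miles
// (subject to ordinary floating-point rounding).
let decimalMiles = 1.0880 |> convertMilesYards
```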
As you can see from the example at the end of Listing 2-2, this actually works fine. If you wanted, you could stop at this point, add some unit tests if you hadn’t written these already, and move on to another task. In fact, for many purposes, particularly scripts and prototypes, the code as it is would be perfectly acceptable. As you go through the next few sections of this chapter, please bear in mind that the changes we make there are refinements rather than absolute necessities. You should make a mental cost-benefit analysis at every stage, depending on how polished and “bulletproof” you need the code to be.
Review the Signature for Type Safety
The next step in the refinement process is to reexamine the signature, to check whether there are any errors we could eliminate using the signature alone. It’s all very well to detect errors using if/then style logic in the body of a function, but how much better it would be to make these errors impossible to even code. Prominent OCaml developer Yaron Minsky calls this “making illegal state unrepresentable.” It’s an important technique for making code motivationally transparent and revisable – but it can be a little hard to achieve in code where numeric values are central.
In our example, think about what would happen if we called our naïve function with an argument of 1.1760. If you try this, you’ll see that you get a result of 2.0, which is understandable because (fraction / 0.1760) is 1.0 and, in case you’d forgotten, 1.0 + 1.0 is 2.0. But we already said that fractional parts over 0.1759 are invalid, because from 0.1760 onward, we are into the next mile. If this happened in practice, it would probably indicate that we were calling the conversion function using some other floating-point value that wasn’t intended to represent miles.yards distances, perhaps because we accessed the wrong field in that hypothetical railway GIS. Our current code leaves the door open to this kind of thing happening silently, and when a bug like that gets embedded deep in a system, it can be very hard to find.
Bounds checking within the conversion function
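One way to add such a check inside the function body is sketched below. The choice of ArgumentOutOfRangeException and the wording of the message are assumptions. Note also that inputs sitting exactly on the .1759 boundary may be affected by floating-point rounding, which this sketch does not attempt to handle.

```fsharp
open System

let convertMilesYards (milesPointYards : float) : float =
    let wholeMiles = milesPointYards |> floor
    let fraction = milesPointYards - wholeMiles
    // Reject fractional parts beyond .1759: those belong to the next mile.
    if fraction > 0.1759 then
        raise (ArgumentOutOfRangeException("milesPointYards", "Fractional part must be <= 0.1759"))
    wholeMiles + (fraction / 0.1760)
```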
But this isn’t making illegal state unrepresentable; it’s detecting an invalid state after it has happened. It’s not obvious how to fix this, because the milesPointYards input is inherently a floating-point value, and (in contrast to, say, Discriminated Unions) we don’t have a direct way to restrict the range of values that can be expressed. Nonetheless, we can bring the error some way forward in the chain.
Miles and yards as a Single-Case Discriminated Union
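A sketch of such a type, assuming two labelled integer fields named wholeMiles and yards:

```fsharp
// A single-case Discriminated Union carrying two labelled integer fields.
type MilesYards = MilesYards of wholeMiles : int * yards : int

// Constructing a value directly: one mile and 880 yards.
let oneAndAHalf = MilesYards(1, 880)
```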
An immutable class in C#
A Single-Case Discriminated Union without field names
Constructing and validating a MilesYards instance
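The following sketch shows one way a validating create function could be written. The exception type, the message, and the rounding step used to turn the fractional part into whole yards are assumptions.

```fsharp
open System

type MilesYards = MilesYards of wholeMiles : int * yards : int

// float -> MilesYards: the loosely typed input is validated once, here.
let create (milesPointYards : float) : MilesYards =
    let wholeMiles = milesPointYards |> floor |> int
    let fraction = milesPointYards - float wholeMiles
    if fraction > 0.1759 then
        raise (ArgumentOutOfRangeException("milesPointYards", "Fractional part must be <= 0.1759"))
    // Scale the fractional part up to whole yards, rounding to the
    // nearest integer to absorb floating-point noise.
    let yards = fraction * 10000.0 |> round |> int
    MilesYards(wholeMiles, yards)
```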
Note the carefully constructed signature of the create function: it takes a floating-point value (from some external, less strictly-typed source like a GIS) and returns our nice strict MilesYards type. For the body, we’ve brought across some of the code from the previous iteration of our function, including the range validation of the fractional part. Finally, we’ve constructed a MilesYards instance using whole miles and yards.
The mapping from floating point to MilesYards is separately testable from the conversion to decimal miles.
We could use the independent MilesYards type in other useful ways, such as overriding its ToString() method to provide a standard string representation.
The signature and implementation are motivationally transparent. Even if a reader wasn’t familiar with the strange miles.yards convention in British railways, they’d see instantly what we were trying to do, and they’d be very clear that we were doing it deliberately.
Likewise, it’s semantically focused: the reader only has to worry about one thing at a time, in this case the construction of a miles and yards figure consisting of two integers.
The code is also revisable. For example, if a new requirement appeared to create distance values from miles and chains (a chain in railways is 22 yards, and yes, this unit is widely used), it would be obvious what to do.
Computing decimal miles from a MilesYards instance
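A sketch of the conversion, assuming the MilesYards type has labelled wholeMiles and yards fields:

```fsharp
type MilesYards = MilesYards of wholeMiles : int * yards : int

// MilesYards -> float
let milesYardsToDecimalMiles (milesYards : MilesYards) : float =
    // Pattern matching recovers the two payload values from the DU.
    match milesYards with
    | MilesYards(wholeMiles, yards) ->
        float wholeMiles + (float yards / 1760.0)
```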
Again, the signature is super explicit: MilesYards -> float. In the body we use pattern matching to recover the wholeMiles and yards payload values from the MilesYards instance. Then we use the recovered values in a simple computation to produce decimal miles. Incidentally, if you aren’t familiar with Discriminated Unions, the match expression is how we get at the fields of the DU. This is one way in which a DU differs from an immutable class such as the C# example in Listing 2-5.
Review and Refine
At this point, we have a somewhat safer and more explicit implementation. But it’s not time to rest yet: we should still ruthlessly review the signature, naming, and implementation to ensure they are the best they can be.
Using a module to represent a business class
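The arrangement might be sketched like this, with the type and its supporting functions gathered in one module. The validation details are assumptions carried over from the earlier discussion of range checking.

```fsharp
open System

module MilesYards =

    type MilesYards = MilesYards of wholeMiles : int * yards : int

    // Creation function: validates the raw float before building the type.
    let fromMilesPointYards (milesPointYards : float) : MilesYards =
        let wholeMiles = milesPointYards |> floor |> int
        let fraction = milesPointYards - float wholeMiles
        if fraction > 0.1759 then
            raise (ArgumentOutOfRangeException("milesPointYards", "Fractional part must be <= 0.1759"))
        let yards = fraction * 10000.0 |> round |> int
        MilesYards(wholeMiles, yards)

    // Conversion function: the module name already says "MilesYards".
    let toDecimalMiles (milesYards : MilesYards) : float =
        match milesYards with
        | MilesYards(wholeMiles, yards) ->
            float wholeMiles + (float yards / 1760.0)

// At the call site the module name qualifies each function.
let decimalMiles =
    MilesYards.fromMilesPointYards 1.0880
    |> MilesYards.toDecimalMiles
```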
This style of creation, using a from... function within a module, is nice because it leaves open the possibility that we might add additional ways of creating a MilesYards instance. For example, we might later add a fromString function. From the point of view of the caller, they would be doing a MilesYards.fromMilesPointYards or a MilesYards.fromString, which is just about as motivationally transparent as you could wish. We were also able to simplify the name of the conversion function from milesYardsToDecimalMiles to toDecimalMiles.
Avoiding repetitive naming using the T convention
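In sketch form, the T convention names the module's principal type T, so that callers refer to MilesYards.T instead of MilesYards.MilesYards. The fields and the conversion function here are illustrative assumptions.

```fsharp
module MilesYards =

    // "T" stands in for the principal type of the module.
    type T = MilesYards of wholeMiles : int * yards : int

    let toDecimalMiles (milesYards : T) : float =
        match milesYards with
        | MilesYards(wholeMiles, yards) ->
            float wholeMiles + (float yards / 1760.0)

// Outside the module, the type is written MilesYards.T.
let distance : MilesYards.T = MilesYards.MilesYards(1, 880)
```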
In cases like this, it’s probably best to go with what the community is doing. If you do use T, make sure you explain it to anyone who has to maintain the code, especially if they aren’t familiar with this convention. Another alternative, which you might have to resort to if calling your code from another language that is confused by the double naming, is to add the word Module to the module name, for example, module MilesYardsModule. In the rest of this chapter, I stick with the MilesYards.MilesYards style of naming.
Hiding the DU constructor
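A sketch of how the case constructor might be hidden, assuming the type lives inside a MilesYards module alongside its creation and conversion functions:

```fsharp
open System

module MilesYards =

    // "private" hides the case constructor from code outside this module.
    type MilesYards = private MilesYards of wholeMiles : int * yards : int

    let fromMilesPointYards (milesPointYards : float) : MilesYards =
        let wholeMiles = milesPointYards |> floor |> int
        let fraction = milesPointYards - float wholeMiles
        if fraction > 0.1759 then
            raise (ArgumentOutOfRangeException("milesPointYards", "Fractional part must be <= 0.1759"))
        let yards = fraction * 10000.0 |> round |> int
        MilesYards(wholeMiles, yards)

    // Code inside the module can still pattern match on the case.
    let toDecimalMiles (milesYards : MilesYards) : float =
        match milesYards with
        | MilesYards(wholeMiles, yards) ->
            float wholeMiles + (float yards / 1760.0)
```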
Now the only way to create a MilesYards instance is to go via the fromMilesPointYards function or via any other creation functions we might add in the future.
Note
Sometimes making a DU case constructor private in this way can cause problems. For example, test code or serialization/deserialization sometimes needs to see the constructor. Also, you won’t be able to pattern match to recover the underlying values. If using private constructors causes more problems than it solves, just make the case public again (i.e., remove the keyword private), and don’t worry too much about it.
A Final Polish
A pretty good implementation of miles.yards conversion
Pattern matching in parameter declarations
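A short sketch of the technique, assuming a MilesYards type with wholeMiles and yards fields:

```fsharp
type MilesYards = MilesYards of wholeMiles : int * yards : int

// The parameter is written as a pattern, so wholeMiles and yards are
// bound directly and no match expression is needed in the body.
let toDecimalMiles (MilesYards(wholeMiles, yards)) : float =
    float wholeMiles + (float yards / 1760.0)

// The caller still passes an ordinary MilesYards value.
let decimalMiles = MilesYards(1, 880) |> toDecimalMiles
```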
This trick, which only works safely with single-case Discriminated Unions, causes the pattern match to occur at the caller/callee function boundary, rather than within the body of callee. From the caller’s point of view, the type they have to provide (a MilesYards DU instance) is unchanged; but within the callee we have direct access to the fields of the DU, in this case the wholeMiles and yards values. I’m laboring this point slightly, because the first time you see this approach in the wild, it’s incredibly confusing.
Using an operator to simplify mixing floating point and integer values
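As a sketch, a prefix operator that converts an integer to a float could be declared and used like this. Restricting the operator to int arguments is an assumption made to keep the example simple.

```fsharp
// Prefix operator that converts an int to a float.
let (~~) (x : int) : float = float x

type MilesYards = MilesYards of wholeMiles : int * yards : int

// The arithmetic reads more cleanly without repeated calls to float.
let toDecimalMiles (MilesYards(wholeMiles, yards)) : float =
    ~~wholeMiles + (~~yards / 1760.0)
```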
The reason I chose ~~ as the name of this operator is that the wavy characters are reminiscent of an analog signal.
I personally find this a very useful trick when writing computational code. That said, many F# developers are reluctant to create their own operators, as it can obfuscate code as much as it simplifies. I’ll leave the choice to you.
Recommendations
To write a function, first define the required signature, then write the body. Refine the signature and body until as many errors as possible are eliminated declaratively at the signature (type) level, and the remaining errors are handled imperatively in the function body.
To model a business type, consider using a Single-Case Discriminated Union, perhaps with a private case constructor, embedded in a module. Only allow instances to be created via functions in the same module, and validate inputs in those functions. Provide other functions to act on the type (for example, to convert to other types) in that same module.
Be aware of the existence of the “T” convention for naming business types, but be cautious about using it in new code.
Consider using operators – sparingly – to simplify code. In particular, consider declaring conversion operators such as ~~ to simplify code that mixes floating point and integer values.
Summary
In this chapter you learned how to design and write a function. You started by thinking about types: what type or types the function should take as parameters, and what type it should return. Then you coded the body of the function, before circling back to the type signature to try to eliminate possible errors. You learned how to embed a type representing some business item in a module, together with supporting functions to instantiate the type, and to transform the type to another type. You learned the importance of single-case Discriminated Unions, and about the usefulness of hiding the constructor to maximize type safety. Finally, you learned a couple of tricks to simplify your code: doing pattern matching in the declaration of a function parameter, and using operators to simplify common operations such as casting to float.
In the next chapter, we’ll look at missing data: how best to express the concept that a data item is missing or irrelevant in a particular context.
Exercises
Here are some exercises to help you hone the skills you’ve gained so far. Exercise solutions are at the end of the chapter.
Exercise 2-1 – Handling Negative Distances
There’s a hole in the validation presented above: we haven’t said anything about what happens when the input distance is negative. If we decided that negative distances simply aren’t valid (because miles.yards values always represent a physical position on a railway network), what would you need to change in the code to prevent negative values entering the domain?
Hint: You could do this around the same point in the code where we already check the range of the yards value.
Exercise 2-2 – Handling Distances Involving Chains
Write a new module that can create a distance in whole miles and chains, and convert such a miles-and-chains distance to decimal miles. The only way to create the new MilesChains distance should be by supplying a whole miles and a chains input (i.e., two non-negative integers), so unlike MilesYards you won’t need a fromMilesPointYards function.
Hint: There are 80 chains in a mile.
Exercise Solutions
Exercise 2-1 – Handling Negative Distances
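One possible solution, offered as a sketch: add a check for negative input next to the existing range check on the fractional part. The exception type and messages are assumptions.

```fsharp
open System

module MilesYards =

    type MilesYards = private MilesYards of wholeMiles : int * yards : int

    let fromMilesPointYards (milesPointYards : float) : MilesYards =
        // New check: negative distances never enter the domain.
        if milesPointYards < 0.0 then
            raise (ArgumentOutOfRangeException("milesPointYards", "Must be >= 0.0"))
        let wholeMiles = milesPointYards |> floor |> int
        let fraction = milesPointYards - float wholeMiles
        if fraction > 0.1759 then
            raise (ArgumentOutOfRangeException("milesPointYards", "Fractional part must be <= 0.1759"))
        let yards = fraction * 10000.0 |> round |> int
        MilesYards(wholeMiles, yards)
```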
Exercise 2-2 – Handling Distances Involving Chains
To complete this exercise, you need to create a Single-Case Discriminated Union much like the MilesYards DU, but with wholeMiles and chains as its fields. Add a fromMilesChains function that range-validates the wholeMiles and chains arguments, then uses them to make a MilesChains instance.
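A sketch of one such module follows. The tupled argument style, the exception type, and the toDecimalMiles function name are assumptions; the range check on chains relies on there being 80 chains in a mile.

```fsharp
open System

module MilesChains =

    type MilesChains = private MilesChains of wholeMiles : int * chains : int

    // The only way to create an instance: two range-checked integers.
    let fromMilesChains (wholeMiles : int, chains : int) : MilesChains =
        if wholeMiles < 0 then
            raise (ArgumentOutOfRangeException("wholeMiles", "Must be >= 0"))
        if chains < 0 || chains >= 80 then
            raise (ArgumentOutOfRangeException("chains", "Must be >= 0 and < 80"))
        MilesChains(wholeMiles, chains)

    // 80 chains make one mile.
    let toDecimalMiles (MilesChains(wholeMiles, chains)) : float =
        float wholeMiles + (float chains / 80.0)
```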