I’m an intuitive musician. I have no real technical skills. I can only play six chords on the guitar.
—Patti Smith, Musician
Collection functions are central to productive F# coding. Trying to code without them is as crazy as trying to play the guitar without learning chords. Though you don’t need to know all the chords to be a decent guitarist, you do need the basic ones, and to be able to use them instinctively. The same is true for using collection functions when coding in F#.
But how to start with this seemingly mountainous task? There are over 100 collection functions in the Array module alone! This chapter will get you familiar with the most useful collection functions and show you how to combine them to achieve complex processing tasks with just a few lines of code. I’ll also show you how to spot and recover from mistakes commonly made when using collection functions.
Anatomy of a Collection Function
If you’ve coded at all in F#, you’re probably already familiar with the concept of collection functions, at least through examples such as Array.map and Array.filter. Likewise, if you’re primarily a C# developer, you’ll be familiar with the equivalents in LINQ: Select and Where. But just in case you aren’t familiar with collection functions, here’s a quick primer.
The collection functions in F# are a set of functions that are always available in F# (you don’t have to bring in any extra dependencies) and which let you “do something” with a collection. A collection in this context is a grouping of values of the same type, such as an array, an F# list, or any type that implements IEnumerable. The kinds of operations you can perform are things like filtering, sorting, or transforming. Listing 4-1 shows an example of filtering.
In the numeric literals such as 250_000m in Listing 4-1, the underscores are just a readability aid, equivalent to the commas we might use when handwriting the numbers. The m suffix specifies that these are decimal values, which is a good choice when handling money amounts.
Filtering example
In the C# LINQ and SQL worlds, this is known as a Where operation rather than a filter operation – it’s the same thing.
A function that defines details of the operation we want to perform.
For example, the Array.filter function takes a function that itself takes a collection element and returns a Boolean value. Elements where that function returns true are returned in the result of the filter operation.
An instance of the collection we want to work on – for example, an array.
This is a different approach from the one in C#, where collection functions are normally defined as extension methods on the type. For example, using LINQ in C#, we would do a houses.Where to perform filtering, "Where" being an extension method on the type of houses.
Collection functions that take and use a function argument are “higher-order functions.” But some collection functions, such as Array.sort, don’t take a function argument (in the case of sort because the sorting is done using default comparison). These ones are collection functions but aren’t higher-order functions.
Some of the collection functions need additional arguments. For instance, Array.init, which creates an array, needs to be told how many elements to create.
- 1.
Any parameters that don’t fall into the other two categories – for example, the length of the required collection
- 2.
The function to be applied
- 3.
The collection itself, always as the last parameter
This ordering makes collection functions “pipeline friendly” because the forward-pipe operator passes the result of the preceding operation (typically in this context, a collection) into the last parameter of the next operation.
When you define your own collection functions, use the same parameter ordering style as the built-in functions so that yours will also be pipeline friendly.
The other essential property of collection functions is that they are designed to be used in an immutable way. For example, if you filter a collection, you end up with a new collection containing only the desired elements. The original collection is unaffected. The one slight exception is iter, which returns unit and therefore doesn’t convey any useful information back to the caller. Instead, you would use iter to do something in the “outside world” like printing output or sending messages.
Picking the Right Collection Function
Commonly Used Collection Functions
Begin with | End up with | Functions | |
---|---|---|---|
Many | Equally Many | map, mapi, sort, sortBy, rev | |
Many | Fewer | filter, choose, distinct, take, truncate, tail, sub | |
Many | One | length, fold, reduce, average, head, sum, max, maxBy, min, minBy, find, pick | |
Many | Boolean | exists, forall, isEmpty | |
Nothing | Many | init, create, unfold1 | |
Many | Nothing (except side effects) | iter, iteri | |
Many of Many | Many | concat, collect | |
Many | Groupings | groupBy | |
2 of Many | Many | append, zip | |
Many | 2 of Many | partition |
How many elements do I want to start off with?
How many elements do I want to end up with?
We’re not talking absolute numbers here, but in terms of options such as no elements , exactly one element (or value), or up-to-n elements. In the table, I’ve called these cardinalities Nothing, One, and Many. When I say Many-to-Equally Many, I mean that the number of elements returned is the same as the number provided. In cases where the function will return at most the same number of elements (but probably fewer), I’ve called the cardinality Fewer.
Oddly enough, thinking first about the cardinality of the operation you want is better than thinking first about the specific operation.
Table 4-1 doesn’t cover all the collection functions, just the ones that are most widely used. Once you get used to thinking in terms of collection operations by practicing with the common ones listed previously, you’ll find it relatively easy to scan the documentation for the more exotic functions, such as Array.sortInPlaceWith.
Many-to-Equally Many Collection Functions
Function | Description | Useful Variants |
---|---|---|
map | Takes each of the input values, applies the provided function to it, and returns all the results | Array.Parallel.map |
mapi | As map, but the provided function is called with two arguments: an index value starting with 0 and ending with n-1 and the current element | Parallel.mapi |
rev | Returns a collection containing the original elements in reverse order | |
sort | Returns a collection containing all the elements, but sorted using the default comparer for the element type | sortBy |
sortBy | As sort, but compares using not the elements, but the results of sending the elements to the provided function | sortByDescending, sortWith |
Many-to-Fewer Collection Functions
Function | Description | Useful Variants and Alternatives |
---|---|---|
filter | Returns only those elements that return true when the provided filtering function is applied to them | |
choose | Applies the provided function to each element and returns the values of function results when those function results were Some(value) | |
distinct | Returns the elements after eliminating any duplicates, where duplicates are identified using the default comparer for the element type | distinctBy |
sub | Returns a subset of the elements, starting with the element at the specified index and continuing for the specified length (available for arrays only) | Array slicing syntax, for example, let arr2 = arr.[3..5] |
take | Returns the first n elements | takeWhile, truncate |
truncate | Returns at most the first n elements (fewer if the collection contains fewer than n elements) | |
tail | Returns all elements after the first element |
Many-to-One Collection Functions
Function | Description | Useful Variants |
---|---|---|
length | Calculates the number of elements in the collection | Also available as a property on arrays and F# lists, for example, arr.Length |
fold | Starts with an initial value, applies the provided function to that value and the first element of the collection, then applies the function to the previous result and the second element of the collection, and so forth until all the elements have been used. Returns the final accumulated result of all these operations | foldBack |
reduce | Like fold, but takes its initial state from the first element | reduceBack |
average | Computes the average value of the elements | averageBy |
head | Gets the first element | |
sum | Computes the total value of the elements | sumBy |
max | Gets the maximum element | maxBy |
min | Gets the minimum element | minBy |
find | Gets the first element for which the provided function returns true | tryFind, pick |
pick | Returns the first result for which the provided function returns Some | tryPick, find |
Many-to-Boolean Collection Functions
Function | Description | Useful Variants |
---|---|---|
exists | Returns true if any of the elements returns true when passed into the provided function | |
forall | Returns true if all the elements return true when passed into the provided function | |
isEmpty | Returns true if the collection has no elements |
Nothing-to-Many Collection Functions
Function | Description | Useful Variants and Alternatives |
---|---|---|
init | Creates a collection with n elements, where each element value is created by calling the provided function. An index parameter (starting at 0 and ending at n-1) is provided to each call to the function | initInfinite (for sequences) |
create | Creates a collection with n elements, whose elements are initially the specified single value (available for arrays only) | zeroCreate |
unfold | Creates a collection by taking a specified initial value and passing it to the provided “generator” function. If the generator function returns, say, Some(x,y), then x is added to the sequence and y is passed into the next iteration. If the function returns None, then the sequence ends | Array and list comprehensions |
Many-to-Nothing Collection Functions
Function | Description | Useful Variants and Alternatives |
---|---|---|
iter | Takes each collection element in turn and executes the provided function using the element. The provided function needs to return nothing (in F# terms, unit, denoted by the literal ()). Thus, the only way iter can affect the outside world is via “side effects,” such as writing files, printing lines to the console, and updating mutable values | iteri |
iteri | As iter, but the provided function is called with two arguments: the current element and an index value starting with 0 and ending with n-1 |
Many-of-Many to Many Collection Functions
Function | Description | Useful Variants and Alternatives |
---|---|---|
concat | Takes a collection of collections and returns a single collection of all the input elements. Note the distinction between concat and append. concat takes a collection of collections, whereas append takes exactly two collections | |
collect | Takes a collection, applies the provided function to each of the elements (where the function itself returns a collection) and returns a single collection of all the results. Whenever you find yourself using a map followed by a concat, it’s likely you can replace this with collect. Strictly speaking, this isn’t a “Many-of-Many to Many” operation, but it feels most natural to put it in this category |
Many-to-Groupings Collection Functions
Function | Description | Useful Variants and Alternatives |
---|---|---|
groupBy | Takes a collection, applies the provided function to each of the elements, and returns the distinct values of the results, together with all the elements that resulted in each key result |
2-of-Many to Many Collection Functions
Function | Description | Useful Variants and Alternatives |
---|---|---|
append | Creates a collection consisting of all the elements from both the input collections | |
zip | Takes two collections and returns a single collection, each of whose elements is a tuple of the corresponding values from each of the input collections | zip3 |
Many to 2-of-Many Collection Functions
Function | Description | Useful Variants and Alternatives |
---|---|---|
partition | Takes a collection and returns a tuple of two collections, the first of which contains elements that returned true when the provided function was applied, and the second contains those which returned false (available for arrays and F# lists only) |
Detailed Collection Function Tables
Just skim these detailed tables for now and come back to them as reference when you do the exercises that follow.
Practicing with Collection Functions
Much of the productivity gain from programming in F# comes from effective use of collection functions. So I want to spend a little time practicing how to choose and combine them in a variety of situations. This section contains some exercises that will let you do just that. They get progressively more difficult, so please make sure you take the time to go through them in order so that your skills in this area are really secure.
Remember to refer back to Tables 4-1 through 4-11 to help you find the right collection function in each case. All the exercises in this section can be solved with a call to a single collection function. We’ll explore tasks needing several collection functions in the next section.
Exercise Setup
Setup code for exercises
Take a moment to read the code in Listing 4-2. It provides a type called House, which has an address and a price. (This is a very naive model but will do for the topics covered in this chapter.) There is also a function called House.getRandom, which will create some House instances for you, with random prices and ascending street numbers. I’ve hardwired the seed of the random number generator so you always get the same results, which will make debugging some of your exercise solutions easier. The usage of the Distance.tryToSchool and PriceBand.fromPrice functions will become apparent as you go through the exercises.
As you tackle the exercises, structure your code as I do when I go through Exercise 4-1 with you. This will help you concentrate on the logic of the collection functions. Solutions for the exercises are at the end of the chapter.
Single Collection Function Exercises
Each of the exercises in this section can be solved using just one collection function.
Note I’ll do this exercise with you to help you get used to working with the provided code and the collection functions tables.
The number of decimal places displayed for the price doesn’t matter.
Binding a value
Getting the houses sample
Calling the map function
In Listing 4-5, I’ve typed the final closing bracket for the map’s lambda function, even though the function doesn’t yet have a body. Doing this helps to ensure that Intellisense works correctly while you type the body of the lambda function.
Providing a lambda body for the map function
Output from a successful run of the exercise code
Exercise solved!
Now tackle the remaining exercises on your own. You can solve each of them with a single collection function, and the code in each case can be structured in a very similar way to Listing 4-6.
Take a sample of 20 houses and calculate the average of their prices.
You can assume the list isn’t empty (you know it has 20 houses!).
Take a sample of 20 houses and get all the houses that cost over $250,000.
Take a sample of 20 houses and return an array of tuples, each tuple containing a house and the distance to the nearest school. Use the Distance.tryToSchool function to calculate the distance. Exclude houses for which this function returns None.
Clue: Although you can achieve this in a single collection function, the lambda it uses will need to do some pattern matching on the Some/None cases returned by Distance.tryToSchool.
Multiple Collection Function Exercises
By now, you should be pretty slick at selecting and using an individual collection function to solve a problem. Now it’s time to practice combining collection functions to solve slightly more complex problems. You’ll need to use more than one collection function for each of the exercises in this section. When tackling these exercises, remember to think about the cardinality (e.g., Many-to-Fewer) of each step you need to achieve the goal.
Note I’ll do this exercise with you to help you get used to combining collection functions.
Take a sample of 20 houses, find the ones that cost over $100,000, and iterate over the results printing (not returning) their addresses and prices. The exact format doesn’t matter, as long as the address and price are printed in some form.
You should be able to complete this exercise using two collection functions.
If you remember the previous section, you’ll know immediately that the first function you’ll need is the Many-to-Fewer function filter.
Filtering the sample
Reading the second part of the exercise, you might notice that you aren’t required to create (in F# terms, bind) an actual value, but merely to print results. (In functional programming terms, we are using side effects.) This means that we need a Many-to-None operation, which should help you quickly narrow your choice down to the iter function.
Iterating over an array
Now go on to complete the other multifunction exercises yourself. In each case, you should be able to solve the exercise by using two or more collection functions, piped together as we did here.
Extend the previous exercise, this time ensuring that the houses are printed in descending order of price.
You should be able to complete this exercise using three collection functions (including the two already used in Exercise 4-5).
Take a sample of 20 houses and find the average price of all the houses that cost over $200,000.
You can assume for this exercise that there will be at least one house that fulfills the criterion.
You should be able to complete this exercise using two collection functions.
Take a sample of 20 houses and find the first house that costs less than $100,000 and for which we can calculate the distance to a school. The results should include the house instance and the calculated distance to school.
You can assume for this exercise that there will be at least one house that fulfills the criteria.
You should be able to complete this exercise using two collection functions.
Clue: You can reuse some of the solution code from Exercise 4-4 to help complete this exercise.
Take a sample of 20 houses, and create an array of tuples, where the first element of each tuple is a price band. A price “band” is a range of prices created using the provided PriceBand.fromPrice function. The second element of the tuple is a sequence of all the houses that fall into the band.
It’s OK if a band is omitted when there are no houses in that band. Within a grouping, the houses should be in ascending order of price.
You should be able to complete this exercise using three collection functions.
Partial Functions
There’s another characteristic of collection functions that I omitted from the tables and exercises shown previously for simplicity, but which you must always bear in mind. That characteristic is whether the collection function is partial. (This is a separate concept from the concept of partial application, which I tackle in Chapter 9.)
In this context, a function is partial if it can cause an error rather than return a value, even when given a logically possible input. We’re talking here about errors that are inherent to the input and the function in question, not externally induced conditions such as running out of memory or having one’s network connection fall over. A good example of a function that is partial in this sense is the head function (e.g., Array.head). Using head on an empty collection will cause an ArgumentException. An empty collection doesn’t have a first element.
Another example is the zip function, but here the situation is a little trickier. It’s an error to perform an Array.zip when the input arrays are different lengths. But it’s fine to use Seq.zip when the input sequences are different lengths: the leftover elements in the longer sequence will just be ignored.
Partial Collection Functions to Watch Out For
Function | Error Condition | Ways to Avoid |
---|---|---|
average, max, maxBy, min, minBy | Collection is empty | Check length first and define a suitable value (e.g., 0 or None) in that situation. Also be sure to check whether there is an official implementation of tryAverage, tryMax, etc. (At the time of writing, these have not been implemented) |
find | No elements matched the condition (or the collection was empty) | Use tryFind and handle the Some() and None cases when consuming the result |
pick | No elements matched the condition (or the collection was empty) | Use tryPick and handle the Some() and None cases when consuming the result |
reduce | Collection is empty, so there is no way to get an initial state for the accumulator | Use fold and provide an explicit initial state |
sub | Collection doesn’t have enough elements | Check ranges first Use filter to select elements instead |
zip (Array and List versions) | Collections are different lengths | Check lengths are equal Use Seq.zip and accept that “leftover” elements will be lost |
head and last | Collection is empty | Use tryHead or tryLast Check length first and define a suitable value in that situation |
tail | Collection is empty | Check length first and define a suitable value in that situation |
By the way, any kind of function can be partial. The issue doesn’t just affect collection functions, but it does crop up most commonly in practice when using collection functions.
Get in the habit of thinking about partiality whenever using a collection function, and handle the failure cases explicitly.
Don’t think about a function being partial as a bug in the function: all these cases are inherent in the nature of the function. How can you possibly get the maximum value in an empty list?
Incidentally, a function that isn’t partial in this sense is known as a total function, though you will hardly ever hear this term used outside a math or computer science context (I had to look it up).
Coding Around Partial Functions
As you can see from Table 4-12, many built-in collection functions have try... equivalents (e.g., tryFind), which return None if there is no value to return, or Some(value) if there is, thus making them nice safe total functions. When no such function is available (or you don’t want to use it), there are several other things you can do.
A function to compute an array average, or zero when the array is empty
A generic function to compute an array average, or zero when the array is empty
A function to compute an array average, or a caller-supplied default when the array is empty
Note that in both Listings 4-11 and 4-12, I’ve had to “inline” the functions using the inline keyword because they call Array.average, which has a static parameter.
Remember the principle of semantic focus: the place to handle, for example, the empty collection case is right here in the code where it could occur. Don’t rely on the caller to condition your inputs to prevent conditions such as an empty collection. The calling code might get changed, or your function might get used in new code, and in either case, the input prechecking might be forgotten about.
Using the “try” Idiom for Partial Functions
Defining an idiomatic tryAverage function
Notice that in the averageOr and tryAverage example in Listings 4-10 through 4-13, I put the function in a module called Array. This means that, for example, tryAverage will be available elsewhere as Array.tryAverage and can thus be used in exactly the same way as the built-in functions such as Array.tryFind.
It displays good semantic focus because everything about the process of calculating an average (and returning None when not possible) is handled in one place in the code. The decision as to what to do when the result is None (use a default, raise an error, or whatever) is delegated back to where it should be, in the caller, which is likely to have more “knowledge” about the particular case where the averaging is required.
It displays good motivational transparency: you are saying to the reader, “Here I intend to define a function which behaves like other, similarly named functions such as tryHead.” This leverages the reader’s existing knowledge of how such functions behave, making the code a lot easier to read.
Consuming Values from try… Functions
Consuming option type results using match expressions
There are arguably nicer alternatives to this, which are discussed in Chapters 3 and 11.
Try… Function Exercises
These exercises are variations on some of the previous exercises, except here we remove the assumption that the relevant collection is nonempty.
Take a sample of 20 houses and find the average price of all the houses that cost over $200,000.
You’ll need to make sure you handle the case where no houses in the sample cost over $200,000. (You will need to change the price criterion a little to test this.)
You should be able to complete this exercise using two collection functions, but you may need to define one of these functions yourself.
Take a sample of 20 houses and find the first house that costs less than $100,000 and for which we can calculate the distance to a school. The results should include the house instance and the calculated distance to school.
You’ll need to make sure you handle the case where no houses meet the criteria. (You will need to change the price criterion a little to test this.)
You should be able to complete this exercise using two collection functions.
Clue: You can reuse some of the solution code from previous exercises to help complete this exercise.
Functions for Other Kinds of Collections
Less-Well-Known Collection Functions
Module | Purpose |
---|---|
Array2D, Array3D, Array4D | Basic operations on n-dimensional arrays |
Map | Basic operations on the Map type |
Set | Basic operations on the Set type |
Using Set.map
Note that the Set.map operation is strictly speaking a Many-to-Fewer operation, since it produces a Set, and sets inherently eliminate duplicates. For example, if the input set contained “The” and “the,” the output set would contain only “the.”
When the Collection Function Is Missing
Using Array.partition on a sequence
You can convert the result back to the original collection type if necessary. For example, you could add |> Seq.ofArray to the end of Listing 4-16.
Common Mistakes
Forgetting which functions are partial: I covered this in the section on partial functions previously. Always handle the error cases (such as an empty collection) explicitly, typically by using the try... version of the function.
Not using the choose function: In my early days with F#, I would often write pipelines that called a function that might return None, then filtered for the Some cases, and finally recovered the underlying values by using pattern matching or the Option.Value property. When you catch yourself doing this, use the choose function instead. It does the Some filtering and the value recovery for you. See Listing 4-17.
Not using the collect function: You may find yourself writing a pipeline that produces a collection of collections and then immediately joins these into a single collection using the concat function. Instead, use the collect function to achieve this in a single step.
Long lambda bodies: If the body of a lambda function gets beyond two or three lines, consider pulling it out into a separate, named function and calling that. This will help you mentally isolate the logic of the function from the logic of the pipeline as a whole. It’ll also reduce indenting! See Listing 4-18.
Lambdas that could be direct calls: Whenever your code contains code like (fun x -> doSomething x), it can be replaced simply with doSomething. Listing 4-17 contains an example of this more concise approach, where we do |> Array.map trySchoolDistance instead of |> Array.map (fun h -> trySchoolDistance h).
Overlong pipelines: Pipelines that contain more than a handful of forward-pipe operations can be hard to read and debug. Consider breaking them up, perhaps by binding an intermediate value and then passing this into a separate pipeline. This problem can be mitigated by using anonymous record types instead of tuples to carry structured values between pipeline stages. This can make the meaning of intermediate values clearer by careful naming. We’ll discuss this properly in Chapter 7.
Overlong or obscure tuples: Certain operations naturally produce tuples, which you will then want to pattern match back into individual values, for processing in the next step of your pipeline. Again, the solution to this is often anonymous records. See Listing 4-19.
Using the choose function
Avoiding long lambda functions
Replacing tuples with anonymous records
Recommendations
Become familiar with the many collection functions available to you in modules such as Array, List, and Seq and the more specialized modules such as Map and Set.
Learn how to map from the problem you are trying to solve (e.g., “I have an array of numbers and I want the average of the largest three”) to the type signatures that are likely to help solve them (e.g., 'T [] -> 'T [] for the sorting, 'T [] -> 'T [] for the top-three selection, and ^T [] -> ^T for the average) and from there to the specific collections you are going to need (Array.sort, Array.truncate, and Array.average).
If you find type signatures a little inaccessible, refer back to Tables 4-2 through 4-11 for a more visual reference to the most useful collection functions.
Beware of collection functions that are partial, such as Array.head, which raises an exception when the array is empty. Use the try... version (e.g., Array.tryHead), or if there isn’t one, consider writing one using the try... naming style.
Think of explicit looping, especially using mutable values, as a last resort. There’s usually a collection function or a combination of them that will do the job more simply.
Use pipelines of collection functions, but don’t let them get too long. Remember that anonymous records, discussed at length in Chapter 7, are often preferable to tuples when passing values between stages of a pipeline.
Summary
Collection functions are the guitar chords of the F# world. You simply can’t get by without a good working knowledge of the basic functions and how to fit them together. In most domains, a large proportion of your F# code should consist of pipelines of collection functions that map, filter, summarize, and group data to get from the inputs you have to the outputs you want. Enjoy the feeling of using collection functions to achieve F#’s enduring goal: to solve complex problems with simple code.
In the next chapter, we’ll look at immutability, the curious notion that we should write programs that don’t change anything; and we’ll also learn when to break back out of this mindset and use mutation.
Exercise Solutions
This section shows solutions for the exercises in this chapter. For the code shown here to run, you’ll also need the code for the Houses module in Listing 4-2.