© The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature 2022
K. EasonStylish F# 6https://doi.org/10.1007/978-1-4842-7205-3_4

4. Working Effectively with Collection Functions

Kit Eason1  
(1)
Farnham, Surrey, UK
 

I’m an intuitive musician. I have no real technical skills. I can only play six chords on the guitar.

—Patti Smith, Musician

Collection functions are central to productive F# coding. Trying to code without them is as crazy as trying to play the guitar without learning chords. Though you don’t need to know all the chords to be a decent guitarist, you do need the basic ones, and to be able to use them instinctively. The same is true for using collection functions when coding in F#.

But how to start with this seemingly mountainous task? There are over 100 collection functions in the Array module alone! This chapter will get you familiar with the most useful collection functions and show you how to combine them to achieve complex processing tasks with just a few lines of code. I’ll also show you how to spot and recover from mistakes commonly made when using collection functions.

Anatomy of a Collection Function

If you’ve coded at all in F#, you’re probably already familiar with the concept of collection functions, at least through examples such as Array.map and Array.filter. Likewise, if you’re primarily a C# developer, you’ll be familiar with the equivalents in LINQ: Select and Where. But just in case you aren’t familiar with collection functions, here’s a quick primer.

The collection functions in F# are a set of functions that are always available in F# (you don’t have to bring in any extra dependencies) and which let you “do something” with a collection. A collection in this context is a grouping of values of the same type, such as an array, an F# list, or any type that implements IEnumerable. The kinds of operations you can perform are things like filtering, sorting, or transforming. Listing 4-1 shows an example of filtering.

Note

In the numeric literals such as 250_000m in Listing 4-1, the underscores are just a readability aid, equivalent to the commas we might use when handwriting the numbers. The m suffix specifies that these are decimal values, which is a good choice when handling money amounts.

type House = { Address : string; Price : decimal }
let houses =
    [|
        { Address = "1 Acacia Avenue"; Price = 250_000m }
        { Address = "2 Bradley Street"; Price = 380_000m }
        { Address = "1 Carlton Road"; Price = 98_000m }
    |]
let cheapHouses =
    houses |> Array.filter (fun h -> h.Price < 100_000m)
// [|{Address = "1 Carlton Road"; Price = 98000M;}|]
printfn "%A" cheapHouses
Listing 4-1

Filtering example

Note

In the C# LINQ and SQL worlds, this is known as a Where operation rather than a filter operation – it’s the same thing.

The collection functions come in a number of flavors, based on the collection type to which they are applicable. Each flavor lives in its own module, so there is an Array module for working on .NET arrays, a Seq module for working on IEnumerables (known as “sequences” in the F# world), and a List module for working on F# lists. (There are some other flavors that I’ll come to later.) Typically, a collection function takes at least two arguments:
  • A function that defines details of the operation we want to perform.

For example, the Array.filter function takes a function that itself takes a collection element and returns a Boolean value. Elements where that function returns true are returned in the result of the filter operation.

In the example in Listing 4-1, we defined the element-processing function anonymously, by saying (fun h -> h.Price < 100_000m). When defined anonymously in this way, the function is known as a lambda function or lambda expression.
  • An instance of the collection we want to work on – for example, an array.

This is a different approach from the one in C#, where collection functions are normally defined as extension methods on the type. For example, using LINQ in C#, we would do a houses.Where to perform filtering, "Where" being an extension method on the type of houses.

Collection functions that take and use a function argument are “higher-order functions.” But some collection functions, such as Array.sort, don’t take a function argument (in the case of sort because the sorting is done using default comparison). These ones are collection functions but aren’t higher-order functions.

Some of the collection functions need additional arguments. For instance, Array.init, which creates an array, needs to be told how many elements to create.

Typically, here is how the ordering of the parameters of a collection function goes:
  1. 1.

    Any parameters that don’t fall into the other two categories – for example, the length of the required collection

     
  2. 2.

    The function to be applied

     
  3. 3.

    The collection itself, always as the last parameter

     

This ordering makes collection functions “pipeline friendly” because the forward-pipe operator passes the result of the preceding operation (typically in this context, a collection) into the last parameter of the next operation.

Note

When you define your own collection functions, use the same parameter ordering style as the built-in functions so that yours will also be pipeline friendly.

The other essential property of collection functions is that they are designed to be used in an immutable way. For example, if you filter a collection, you end up with a new collection containing only the desired elements. The original collection is unaffected. The one slight exception is iter, which returns unit and therefore doesn’t convey any useful information back to the caller. Instead, you would use iter to do something in the “outside world” like printing output or sending messages.

Picking the Right Collection Function

I personally find lists that show the signature of the function, for example ('T -> 'Key) -> 'T [] -> 'T [], not particularly useful in finding the right function, so in Table 4-1, I have put together a more human-friendly reference, which should help you identify the right tool for the job.
Table 4-1

Commonly Used Collection Functions

 

Begin with

End up with

Functions

../images/462726_2_En_4_Chapter/462726_2_En_4_Figa_HTML.gif

Many

Equally Many

map, mapi, sort, sortBy, rev

../images/462726_2_En_4_Chapter/462726_2_En_4_Figb_HTML.gif

Many

Fewer

filter, choose, distinct, take, truncate, tail, sub

../images/462726_2_En_4_Chapter/462726_2_En_4_Figc_HTML.gif

Many

One

length, fold, reduce, average, head, sum, max, maxBy, min, minBy, find, pick

../images/462726_2_En_4_Chapter/462726_2_En_4_Figd_HTML.gif

Many

Boolean

exists, forall, isEmpty

../images/462726_2_En_4_Chapter/462726_2_En_4_Fige_HTML.gif

Nothing

Many

init, create, unfold1

../images/462726_2_En_4_Chapter/462726_2_En_4_Figf_HTML.gif

Many

Nothing (except side effects)

iter, iteri

../images/462726_2_En_4_Chapter/462726_2_En_4_Figg_HTML.gif

Many of Many

Many

concat, collect

../images/462726_2_En_4_Chapter/462726_2_En_4_Figh_HTML.gif

Many

Groupings

groupBy

../images/462726_2_En_4_Chapter/462726_2_En_4_Figi_HTML.gif

2 of Many

Many

append, zip

../images/462726_2_En_4_Chapter/462726_2_En_4_Figj_HTML.gif

Many

2 of Many

partition

To use this table, you need to think about just two things:
  • How many elements do I want to start off with?

  • How many elements do I want to end up with?

We’re not talking absolute numbers here, but in terms of options such as no elements , exactly one element (or value), or up-to-n elements. In the table, I’ve called these cardinalities Nothing, One, and Many. When I say Many-to-Equally Many, I mean that the number of elements returned is the same as the number provided. In cases where the function will return at most the same number of elements (but probably fewer), I’ve called the cardinality Fewer.

Oddly enough, thinking first about the cardinality of the operation you want is better than thinking first about the specific operation.

Table 4-1 doesn’t cover all the collection functions, just the ones that are most widely used. Once you get used to thinking in terms of collection operations by practicing with the common ones listed previously, you’ll find it relatively easy to scan the documentation for the more exotic functions, such as Array.sortInPlaceWith.

As an example of using Table 4-1, say we have a collection of houses, and we want to retrieve just those houses that have a sale price of less than $100,000. Our “Begin with” cardinality is Many, and our “End up with” cardinality is Fewer. A quick glance at Table 4-1 shows us that the functions that fit this profile are filter, choose, distinct, and sub. At this point, it’s probably pretty obvious from the name which one of these we need (it’s filter), but if it isn’t, at least we only have four functions to consider. How do we choose between these? In Tables 4-2 through 4-11, I give a breakdown of the commonly used functions in each classification. The table you would use for the house example is Table 4-3 because that is the one for Many-to-Fewer operations.
Table 4-2

Many-to-Equally Many Collection Functions

../images/462726_2_En_4_Chapter/462726_2_En_4_Figk_HTML.gif

Function

Description

Useful Variants

map

Takes each of the input values, applies the provided function to it, and returns all the results

Array.Parallel.map

mapi

As map, but the provided function is called with two arguments: an index value starting with 0 and ending with n-1 and the current element

Parallel.mapi

rev

Returns a collection containing the original elements in reverse order

 

sort

Returns a collection containing all the elements, but sorted using the default comparer for the element type

sortBy

sortBy

As sort, but compares using not the elements, but the results of sending the elements to the provided function

sortByDescending, sortWith

Table 4-3

Many-to-Fewer Collection Functions

../images/462726_2_En_4_Chapter/462726_2_En_4_Figl_HTML.gif

Function

Description

Useful Variants and Alternatives

filter

Returns only those elements that return true when the provided filtering function is applied to them

 

choose

Applies the provided function to each element and returns the values of function results when those function results were Some(value)

 

distinct

Returns the elements after eliminating any duplicates, where duplicates are identified using the default comparer for the element type

distinctBy

sub

Returns a subset of the elements, starting with the element at the specified index and continuing for the specified length (available for arrays only)

Array slicing syntax, for example,

let arr2 = arr.[3..5]

take

Returns the first n elements

takeWhile, truncate

truncate

Returns at most the first n elements (fewer if the collection contains fewer than n elements)

 

tail

Returns all elements after the first element

 
Table 4-4

Many-to-One Collection Functions

../images/462726_2_En_4_Chapter/462726_2_En_4_Figm_HTML.gif

Function

Description

Useful Variants

length

Calculates the number of elements in the collection

Also available as a property on arrays and F# lists, for example, arr.Length

fold

Starts with an initial value, applies the provided function to that value and the first element of the collection, then applies the function to the previous result and the second element of the collection, and so forth until all the elements have been used. Returns the final accumulated result of all these operations

foldBack

reduce

Like fold, but takes its initial state from the first element

reduceBack

average

Computes the average value of the elements

averageBy

head

Gets the first element

 

sum

Computes the total value of the elements

sumBy

max

Gets the maximum element

maxBy

min

Gets the minimum element

minBy

find

Gets the first element for which the provided function returns true

tryFind, pick

pick

Returns the first result for which the provided function returns Some

tryPick, find

Table 4-5

Many-to-Boolean Collection Functions

../images/462726_2_En_4_Chapter/462726_2_En_4_Fign_HTML.gif

Function

Description

Useful Variants

exists

Returns true if any of the elements returns true when passed into the provided function

 

forall

Returns true if all the elements return true when passed into the provided function

 

isEmpty

Returns true if the collection has no elements

 
Table 4-6

Nothing-to-Many Collection Functions

../images/462726_2_En_4_Chapter/462726_2_En_4_Figo_HTML.gif

Function

Description

Useful Variants and Alternatives

init

Creates a collection with n elements, where each element value is created by calling the provided function. An index parameter (starting at 0 and ending at n-1) is provided to each call to the function

initInfinite (for sequences)

create

Creates a collection with n elements, whose elements are initially the specified single value (available for arrays only)

zeroCreate

unfold

Creates a collection by taking a specified initial value and passing it to the provided “generator” function. If the generator function returns, say, Some(x,y), then x is added to the sequence and y is passed into the next iteration. If the function returns None, then the sequence ends

Array and list comprehensions

Table 4-7

Many-to-Nothing Collection Functions

../images/462726_2_En_4_Chapter/462726_2_En_4_Figp_HTML.gif

Function

Description

Useful Variants and Alternatives

iter

Takes each collection element in turn and executes the provided function using the element. The provided function needs to return nothing (in F# terms, unit, denoted by the literal ()). Thus, the only way iter can affect the outside world is via “side effects,” such as writing files, printing lines to the console, and updating mutable values

iteri

iteri

As iter, but the provided function is called with two arguments: the current element and an index value starting with 0 and ending with n-1

 
Table 4-8

Many-of-Many to Many Collection Functions

../images/462726_2_En_4_Chapter/462726_2_En_4_Figq_HTML.gif

Function

Description

Useful Variants and Alternatives

concat

Takes a collection of collections and returns a single collection of all the input elements. Note the distinction between concat and append. concat takes a collection of collections, whereas append takes exactly two collections

 

collect

Takes a collection, applies the provided function to each of the elements (where the function itself returns a collection) and returns a single collection of all the results. Whenever you find yourself using a map followed by a concat, it’s likely you can replace this with collect. Strictly speaking, this isn’t a “Many-of-Many to Many” operation, but it feels most natural to put it in this category

 
Table 4-9

Many-to-Groupings Collection Functions

../images/462726_2_En_4_Chapter/462726_2_En_4_Figr_HTML.gif

Function

Description

Useful Variants and Alternatives

groupBy

Takes a collection, applies the provided function to each of the elements, and returns the distinct values of the results, together with all the elements that resulted in each key result

 
Table 4-10

2-of-Many to Many Collection Functions

../images/462726_2_En_4_Chapter/462726_2_En_4_Figs_HTML.gif

Function

Description

Useful Variants and Alternatives

append

Creates a collection consisting of all the elements from both the input collections

 

zip

Takes two collections and returns a single collection, each of whose elements is a tuple of the corresponding values from each of the input collections

zip3

Table 4-11

Many to 2-of-Many Collection Functions

../images/462726_2_En_4_Chapter/462726_2_En_4_Figt_HTML.gif

Function

Description

Useful Variants and Alternatives

partition

Takes a collection and returns a tuple of two collections, the first of which contains elements that returned true when the provided function was applied, and the second contains those which returned false (available for arrays and F# lists only)

 

Detailed Collection Function Tables

Just skim these detailed tables for now and come back to them as reference when you do the exercises that follow.

Practicing with Collection Functions

Much of the productivity gain from programming in F# comes from effective use of collection functions. So I want to spend a little time practicing how to choose and combine them in a variety of situations. This section contains some exercises that will let you do just that. They get progressively more difficult, so please make sure you take the time to go through them in order so that your skills in this area are really secure.

Remember to refer back to Tables 4-1 through 4-11 to help you find the right collection function in each case. All the exercises in this section can be solved with a call to a single collection function. We’ll explore tasks needing several collection functions in the next section.

Exercise Setup

You’ll need a little code to provide data and useful functions to work with. Either download the source code for this chapter or create an F# script file and add the code from Listing 4-2, replacing anything the creation template added for you.
type House = { Address : string; Price : decimal }
module House =
    let private random = System.Random(Seed = 1)
    /// Make an array of 'count' random houses.
    let getRandom count =
        Array.init count (fun i ->
            { Address = sprintf "%i Stochastic Street" (i+1)
              Price = random.Next(50_000, 500_000) |> decimal })
module Distance =
    let private random = System.Random(Seed = 1)
    /// Try to get the distance to the nearest school.
    /// (Results are simulated)
    let tryToSchool (house : House) =
        // Because we simulate results, the house
        // parameter isn’t actually used.
        let dist = random.Next(10) |> double
        if dist < 5. then
            Some dist
        else
            None
type PriceBand = | Cheap | Medium | Expensive
module PriceBand =
    // Return a price band based on price.
    let fromPrice (price : decimal) =
        if price < 100_000m then
            Cheap
        elif price < 200_000m then
            Medium
        else
            Expensive
Listing 4-2

Setup code for exercises

Take a moment to read the code in Listing 4-2. It provides a type called House, which has an address and a price. (This is a very naive model but will do for the topics covered in this chapter.) There is also a function called House.getRandom, which will create some House instances for you, with random prices and ascending street numbers. I’ve hardwired the seed of the random number generator so you always get the same results, which will make debugging some of your exercise solutions easier. The usage of the Distance.tryToSchool and PriceBand.fromPrice functions will become apparent as you go through the exercises.

As you tackle the exercises, structure your code as I do when I go through Exercise 4-1 with you. This will help you concentrate on the logic of the collection functions. Solutions for the exercises are at the end of the chapter.

Single Collection Function Exercises

Each of the exercises in this section can be solved using just one collection function.

Exercise 4-1 – Transforming Data Items

Note I’ll do this exercise with you to help you get used to working with the provided code and the collection functions tables.

Take a sample of 20 houses, and for each house, produce a string in the form
Address: 1 Stochastic Street - Price: 123456.00000

The number of decimal places displayed for the price doesn’t matter.

To tackle this exercise, first, make sure you have the code from Listing 4-2 in a notebook cell or an F# script. The downloadable notebook for this chapter already has the necessary code in one of the cells. Begin by binding a value called housePrices (Listing 4-3).
// Listing 4-2 code above here
let housePrices =
Listing 4-3

Binding a value

Now take a careful look at the exercise requirements. It doesn’t require you to write a general-purpose function, just to take one sample of 20 houses and produce some results. That means you can just make a value (not a function) using a let binding with no arguments. You can start to define the required value by getting the sample of 20 houses, using the getRandom function in the House module (Listing 4-4).
let housePrices =
    House.getRandom 20
Listing 4-4

Getting the houses sample

Looking back at the exercise requirement again, you are required to produce one string for each input house. In terms of Table 4-1, this is clearly a ManytoEqually Many operation. There are only a few collection functions in the table that match this profile: map, mapi, sort, sortBy, and rev. It’s probably obvious which of these are definitely not the one we need, as this isn’t a sorting or reversing operation. You can refine your choice further by looking at Table 4-2 to review what each function does. The map function looks promising, since its description, “Takes each of the input values, applies the specified function to it, and returns all the results,” looks very similar to what you want to achieve. You don’t want the mapi function, as there is no need for the index value it provides. To use Array.map, you can continue your code as in Listing 4-5.
let housePrices =
    House.getRandom 20
    |> Array.map (fun h -> ) // Still needs a body...
Listing 4-5

Calling the map function

Note

In Listing 4-5, I’ve typed the final closing bracket for the map’s lambda function, even though the function doesn’t yet have a body. Doing this helps to ensure that Intellisense works correctly while you type the body of the lambda function.

Now you need a body for the lambda function, which needs to take a House instance and produce a string in the required format. That’s easy using the sprintf function (Listing 4-6).
let housePrices =
    House.getRandom 20
    |> Array.map (fun h ->
        sprintf "Address: %s - Price: %f" h.Address h.Price)
housePrices
Listing 4-6

Providing a lambda body for the map function

To test the function, execute the relevant notebook cell, or select all the code in your F# script (including the setup code) and send it to F# Interactive. Your output should look something like Listing 4-7. If using a notebook, the output will be shown as a table, but the content should be similar. Your house prices might vary from the listing, depending on the implementation of System.Random in your environment.
  val housePrices : string [] =
    [|"Address: 1 Stochastic Street - Price: 161900.000000";
      ...
      "Address: 20 Stochastic Street - Price: 365808.000000"|]
Listing 4-7

Output from a successful run of the exercise code

Exercise solved!

Now tackle the remaining exercises on your own. You can solve each of them with a single collection function, and the code in each case can be structured in a very similar way to Listing 4-6.

Exercise 4-2 – Calculating an Average

Take a sample of 20 houses and calculate the average of their prices.

You can assume the list isn’t empty (you know it has 20 houses!).

Exercise 4-3 – Selecting Based on a Condition

Take a sample of 20 houses and get all the houses that cost over $250,000.

Exercise 4-4 – Attempting a Calculation and Choosing Successes

Take a sample of 20 houses and return an array of tuples, each tuple containing a house and the distance to the nearest school. Use the Distance.tryToSchool function to calculate the distance. Exclude houses for which this function returns None.

Sample output:
val housesNearSchools : (Houses.House * double) [] =
    [|({Address = "1 Stochastic Street";
        Price = 161900M;}, 2.0); ({Address = "3 Stochastic Street";
                                   Price = 99834M;}, 2.0); ...

Clue: Although you can achieve this in a single collection function, the lambda it uses will need to do some pattern matching on the Some/None cases returned by Distance.tryToSchool.

Multiple Collection Function Exercises

By now, you should be pretty slick at selecting and using an individual collection function to solve a problem. Now it’s time to practice combining collection functions to solve slightly more complex problems. You’ll need to use more than one collection function for each of the exercises in this section. When tackling these exercises, remember to think about the cardinality (e.g., Many-to-Fewer) of each step you need to achieve the goal.

Exercise 4-5 – Filtering and Iterating

Note I’ll do this exercise with you to help you get used to combining collection functions.

Take a sample of 20 houses, find the ones that cost over $100,000, and iterate over the results printing (not returning) their addresses and prices. The exact format doesn’t matter, as long as the address and price are printed in some form.

You should be able to complete this exercise using two collection functions.

If you remember the previous section, you’ll know immediately that the first function you’ll need is the Many-to-Fewer function filter.

You can begin coding by getting the house sample and calling filter (Listing 4-8).
House.getRandom 20
|> Array.filter (fun h -> h.Price > 100_000m)
Listing 4-8

Filtering the sample

Reading the second part of the exercise, you might notice that you aren’t required to create (in F# terms, bind) an actual value, but merely to print results. (In functional programming terms, we are using side effects.) This means that we need a Many-to-None operation, which should help you quickly narrow your choice down to the iter function.

To implement this second operation, simply use the forward-pipe operator (|>) to send the results from the filtering operation to the iteration operation (Listing 4-9).
House.getRandom 20
|> Array.filter (fun h -> h.Price > 100_000m)
|> Array.iter (fun h ->
    printfn "Address: %s Price: %f" h.Address h.Price)
Listing 4-9

Iterating over an array

If you run this, you should get output that looks something like this.
Address: 1 Stochastic Street Price: 161900.000000
Address: 3 Stochastic Street Price: 260154.000000
...
Address: 20 Stochastic Street Price: 365808.000000

Now go on to complete the other multifunction exercises yourself. In each case, you should be able to solve the exercise by using two or more collection functions, piped together as we did here.

Exercise 4-6 – Ordering

Extend the previous exercise, this time ensuring that the houses are printed in descending order of price.

You should be able to complete this exercise using three collection functions (including the two already used in Exercise 4-5).

Exercise 4-7 – Filtering and Averaging

Take a sample of 20 houses and find the average price of all the houses that cost over $200,000.

You can assume for this exercise that there will be at least one house that fulfills the criterion.

You should be able to complete this exercise using two collection functions.

Exercise 4-8 – Finding a Single Element

Take a sample of 20 houses and find the first house that costs less than $100,000 and for which we can calculate the distance to a school. The results should include the house instance and the calculated distance to school.

You can assume for this exercise that there will be at least one house that fulfills the criteria.

You should be able to complete this exercise using two collection functions.

Clue: You can reuse some of the solution code from Exercise 4-4 to help complete this exercise.

Exercise 4-9 – Grouping

Take a sample of 20 houses, and create an array of tuples, where the first element of each tuple is a price band. A price “band” is a range of prices created using the provided PriceBand.fromPrice function. The second element of the tuple is a sequence of all the houses that fall into the band.

It’s OK if a band is omitted when there are no houses in that band. Within a grouping, the houses should be in ascending order of price.

Example output:
  val housesByBand : (Houses.PriceBand * Houses.House []) [] =
    [|(Medium,
       [|{Address = "12 Stochastic Street";
          Price = 161613M;};
     ...
         {Address = "13 Stochastic Street";
          Price = 194049M;}|]);
      (Cheap,
       [|{Address = "11 Stochastic Street";
          Price = 62886M;};
      ...
         {Address = "2 Stochastic Street";
          Price = 99834M;}|]);
      (Expensive,
       [|{Address = "7 Stochastic Street";
          Price = 209337M;};
      ...
         {Address = "14 Stochastic Street";
          Price = 495395M;}|])|]

You should be able to complete this exercise using three collection functions.

Partial Functions

There’s another characteristic of collection functions that I omitted from the tables and exercises shown previously for simplicity, but which you must always bear in mind. That characteristic is whether the collection function is partial. (This is a separate concept from the concept of partial application, which I tackle in Chapter 9.)

In this context, a function is partial if it can cause an error rather than return a value, even when given a logically possible input. We’re talking here about errors that are inherent to the input and the function in question, not externally induced conditions such as running out of memory or having one’s network connection fall over. A good example of a function that is partial in this sense is the head function (e.g., Array.head). Using head on an empty collection will cause an ArgumentException. An empty collection doesn’t have a first element.

Another example is the zip function, but here the situation is a little trickier. It’s an error to perform an Array.zip when the input arrays are different lengths. But it’s fine to use Seq.zip when the input sequences are different lengths: the leftover elements in the longer sequence will just be ignored.

Table 4-12 gives a list of the functions from Table 4-1 that are partial. Whenever you use a function from this list, think carefully about whether the input could ever cause an error condition.
Table 4-12

Partial Collection Functions to Watch Out For

Function

Error Condition

Ways to Avoid

average, max, maxBy,

min,

minBy

Collection is empty

Check length first and define a suitable value (e.g., 0 or None) in that situation. Also be sure to check whether there is an official implementation of tryAverage, tryMax, etc. (At the time of writing, these have not been implemented)

find

No elements matched the condition (or the collection was empty)

Use tryFind and handle the Some() and None cases when consuming the result

pick

No elements matched the condition (or the collection was empty)

Use tryPick and handle the Some() and None cases when consuming the result

reduce

Collection is empty, so there is no way to get an initial state for the accumulator

Use fold and provide an explicit initial state

sub

Collection doesn’t have enough elements

Check ranges first

Use filter to select elements instead

zip (Array and List versions)

Collections are different lengths

Check lengths are equal

Use Seq.zip and accept that “leftover” elements will be lost

head and last

Collection is empty

Use tryHead or tryLast

Check length first and define a suitable value in that situation

tail

Collection is empty

Check length first and define a suitable value in that situation

By the way, any kind of function can be partial. The issue doesn’t just affect collection functions, but it does crop up most commonly in practice when using collection functions.

Note

Get in the habit of thinking about partiality whenever using a collection function, and handle the failure cases explicitly.

Don’t think about a function being partial as a bug in the function: all these cases are inherent in the nature of the function. How can you possibly get the maximum value in an empty list?

Incidentally, a function that isn’t partial in this sense is known as a total function, though you will hardly ever hear this term used outside a math or computer science context (I had to look it up).

Coding Around Partial Functions

As you can see from Table 4-12, many built-in collection functions have try... equivalents (e.g., tryFind), which return None if there is no value to return, or Some(value) if there is, thus making them nice safe total functions. When no such function is available (or you don’t want to use it), there are several other things you can do.

For example, let’s say you have some transaction values and a business rule that says, “When there are no transactions, the average transaction value is considered to be zero.” Listing 4-10 shows how you might define an averaging function that meets this requirement.
module Average =
    let averageValue (values : decimal[]) =
        if values.Length = 0 then
            0.m
        else
            values |> Array.average
    // 370.m
    let ex1 = [|10.m; 100.m; 1000.m|] |> averageValue
    // 0.m
    let ex2 = [||] |> averageValue
Listing 4-10

A function to compute an array average, or zero when the array is empty

This would work fine in the specific case of an array of decimal values, but since this is a book on style, I probably ought to mention a couple of alternatives. The first (Listing 4-11) takes an array of any type (not just decimal) and uses the GenericZero function to return a suitably typed zero value.
module Average =
    let inline averageOrZero (values : 'T[]) =
        if values.Length = 0 then
            LanguagePrimitives.GenericZero<'T>
        else
            values |> Array.average
    // 370.m
    let ex3 = [|10.m; 100.m; 1000.m|] |> averageOrZero
    // 370.f
    let ex3f = [|10.f; 100.f; 1000.f|] |> averageOrZero
    // 0.m
    let ex4:decimal = [||] |> averageOrZero<decimal>
    // 0.f
    let ex4f:decimal = [||] |> averageOrZero<float32>
Listing 4-11

A generic function to compute an array average, or zero when the array is empty

Another possibility (Listing 4-12) is to allow the caller to specify what should be returned when the collection is empty.
module Average =
    let inline averageOr (defaultValue : 'T) (values : 'T[]) =
        if values.Length = 0 then
            defaultValue
        else
            values |> Array.average
    // 370.m
    let ex5 = [|10.m; 100.m; 1000.m|] |> averageOr 0.m
    // 370.f
    let ex5f = [|10.f; 100.f; 1000.f|] |> averageOr 0.f
    // 0.m
    let ex6 = [||] |> averageOr 0.m
    // 0.f
    let ex6f = [||] |> averageOr 0.f
Listing 4-12

A function to compute an array average, or a caller-supplied default when the array is empty

Note that in both Listings 4-11 and 4-12, I’ve had to “inline” the functions using the inline keyword because they call Array.average, which has a static parameter.

Note

Remember the principle of semantic focus: the place to handle, for example, the empty collection case is right here in the code where it could occur. Don’t rely on the caller to condition your inputs to prevent conditions such as an empty collection. The calling code might get changed, or your function might get used in new code, and in either case, the input prechecking might be forgotten about.

Using the “try” Idiom for Partial Functions

Another way of coding around partial functions is to define your own try... version that returns Some(value) when a sensible value can be returned and None when it cannot. For example, at the time of writing, there is no built-in Array.tryAverage function. Listing 4-13 shows how to code your own in just a few lines.
module Array =
    let inline tryAverage (values : 'T[]) =
        if values.Length = 0 then
            None
        else
            values |> Array.average |> Some
Listing 4-13

Defining an idiomatic tryAverage function

Note

Notice that in the averageOr and tryAverage example in Listings 4-10 through 4-13, I put the function in a module called Array. This means that, for example, tryAverage will be available elsewhere as Array.tryAverage and can thus be used in exactly the same way as the built-in functions such as Array.tryFind.

I definitely prefer this final approach: defining your own try... function. This is because it is in line with a couple of our coding principles:
  • It displays good semantic focus because everything about the process of calculating an average (and returning None when not possible) is handled in one place in the code. The decision as to what to do when the result is None (use a default, raise an error, or whatever) is delegated back to where it should be, in the caller, which is likely to have more “knowledge” about the particular case where the averaging is required.

  • It displays good motivational transparency: you are saying to the reader, “Here I intend to define a function which behaves like other, similarly named functions such as tryHead.” This leverages the reader’s existing knowledge of how such functions behave, making the code a lot easier to read.

Consuming Values from try… Functions

Whether you use a built-in try... function like tryFind, or one you defined yourself, you must explicitly handle both the Some and None possibilities when consuming the result. Listing 4-14 shows the one simple way of calling the tryAverage function and dealing with the return value. Here, we use an explicit match statement on the Some and None cases .
    // "The average was 370.000000"
    match [|10.m; 100.m; 1000.m|] |> Array.tryAverage with
    | Some av -> printfn "The average was %f" av
    | None -> printfn "There was no average."
    // "There was no average."
    match [||] |> Array.tryAverage with
    | Some av -> printfn "The average was %f" av
    | None -> printfn "There was no average."
Listing 4-14

Consuming option type results using match expressions

There are arguably nicer alternatives to this, which are discussed in Chapters 3 and 11.

Try… Function Exercises

These exercises are variations on some of the previous exercises, except here we remove the assumption that the relevant collection is nonempty.

Exercise 4-10 – Filtering, Averaging, and Try

Take a sample of 20 houses and find the average price of all the houses that cost over $200,000.

You’ll need to make sure you handle the case where no houses in the sample cost over $200,000. (You will need to change the price criterion a little to test this.)

You should be able to complete this exercise using two collection functions, but you may need to define one of these functions yourself.

Exercise 4-11 – Finding a Single Element If Any

Take a sample of 20 houses and find the first house that costs less than $100,000 and for which we can calculate the distance to a school. The results should include the house instance and the calculated distance to school.

You’ll need to make sure you handle the case where no houses meet the criteria. (You will need to change the price criterion a little to test this.)

You should be able to complete this exercise using two collection functions.

Clue: You can reuse some of the solution code from previous exercises to help complete this exercise.

Functions for Other Kinds of Collections

Although most F# developers are familiar with the most widely used modules of collection functions, the Array, Seq, and List modules, they sometimes forget that similar functions are available for more specialized collections (Table 4-13).
Table 4-13

Less-Well-Known Collection Functions

Module

Purpose

Array2D, Array3D, Array4D

Basic operations on n-dimensional arrays

Map

Basic operations on the Map type

Set

Basic operations on the Set type

For example, let’s say you have a word list generated from some natural language text (like the text of a novel), and this word list is stored as a Set to guarantee uniqueness. Now you want to create another set that contains only lowercased versions of the inputs. Listing 4-15 shows how you might do that.
let novelWords = Set ["The";"the";"quick";"brown";"Fox";"fox"]
// set ["brown"; "fox"; "quick"; "the"]
let lowerWords =
    novelWords
    |> Set.map (fun w -> w.ToLowerInvariant())
lowerWords
Listing 4-15

Using Set.map

Note that the Set.map operation is strictly speaking a Many-to-Fewer operation, since it produces a Set, and sets inherently eliminate duplicates. For example, if the input set contained “The” and “the,” the output set would contain only “the.”

When the Collection Function Is Missing

Collection functions normally exist in all the flavors you are likely to need – that is, for arrays, F# lists, and sequences. However, in some cases, the one you might want is missing: for example, there is an Array.partition and a List.partition, but no Seq.partition. When you need such a missing function, simply convert the collection you are working with into a collection type for which the function you need is available, using either the Collection.ofOtherCollection or Collection.toOtherCollection functions. For instance, Array.ofSeq or Seq.toArray. See Listing 4-16.
type House = { Address : string; Price : decimal }
module House =
    /// Make a sequence of 'count' random houses.
    let getRandomSeq count =
        let random = System.Random(Seed = 1)
        Seq.init count (fun i ->
            { Address = sprintf "%i Stochastic Street" (i+1)
              Price = random.Next(50_000, 500_000) |> decimal })
// Convert a sequence of houses into an array, so that we
// can use Array.partition to divide them into affordable and
// unaffordable. (There is no Seq.partition.)
let affordable, unaffordable =
    House.getRandomSeq 20
    |> Array.ofSeq
    |> Array.partition (fun h -> h.Price < 150_000m)
affordable, unaffordable
Listing 4-16

Using Array.partition on a sequence

You can convert the result back to the original collection type if necessary. For example, you could add |> Seq.ofArray to the end of Listing 4-16.

Common Mistakes

There are a few mistakes that are commonly made when using collection functions. Quite often, these don’t really matter as the output is the same, but it’s worth watching out for them, so as to keep your code as robust and stylish as possible.
  • Forgetting which functions are partial: I covered this in the section on partial functions previously. Always handle the error cases (such as an empty collection) explicitly, typically by using the try... version of the function.

  • Not using the choose function: In my early days with F#, I would often write pipelines that called a function that might return None, then filtered for the Some cases, and finally recovered the underlying values by using pattern matching or the Option.Value property. When you catch yourself doing this, use the choose function instead. It does the Some filtering and the value recovery for you. See Listing 4-17.

  • Not using the collect function: You may find yourself writing a pipeline that produces a collection of collections and then immediately joins these into a single collection using the concat function. Instead, use the collect function to achieve this in a single step.

  • Long lambda bodies: If the body of a lambda function gets beyond two or three lines, consider pulling it out into a separate, named function and calling that. This will help you mentally isolate the logic of the function from the logic of the pipeline as a whole. It’ll also reduce indenting! See Listing 4-18.

  • Lambdas that could be direct calls: Whenever your code contains code like (fun x -> doSomething x), it can be replaced simply with doSomething. Listing 4-17 contains an example of this more concise approach, where we do |> Array.map trySchoolDistance instead of |> Array.map (fun h -> trySchoolDistance h).

  • Overlong pipelines: Pipelines that contain more than a handful of forward-pipe operations can be hard to read and debug. Consider breaking them up, perhaps by binding an intermediate value and then passing this into a separate pipeline. This problem can be mitigated by using anonymous record types instead of tuples to carry structured values between pipeline stages. This can make the meaning of intermediate values clearer by careful naming. We’ll discuss this properly in Chapter 7.

  • Overlong or obscure tuples: Certain operations naturally produce tuples, which you will then want to pattern match back into individual values, for processing in the next step of your pipeline. Again, the solution to this is often anonymous records. See Listing 4-19.

module Array =
    let inline tryAverage (a : 'T[]) =
        if a.Length = 0 then
            None
        else
            a |> Array.average |> Some
// Calculate the average known distance to school
// in a sample of 20 houses.
let averageDistanceToSchool =
    House.getRandom 20
    |> Array.map Distance.tryToSchool
    |> Array.filter (fun d -> d.IsSome)
    |> Array.map (fun d -> d.Value)
    |> Array.tryAverage
// As previous function, but use Array.choose instead
// of map, filter and map.
let averageDistanceToSchool2 =
    House.getRandom 20
    |> Array.choose Distance.tryToSchool
    |> Array.tryAverage
averageDistanceToSchool, averageDistanceToSchool2
Listing 4-17

Using the choose function

// Get houses with their price bands the long-winded way:
let housesWithBands =
    House.getRandom 20
    |> Array.map (fun h ->
        let band =
            if h.Price < 100_000m then
                Cheap
            elif h.Price < 200_000m then
                Medium
            else
                Expensive
        h, band)
// Most of the code above could be pulled into a fromPrice function:
// (Here we use the one that is already defined in the PriceBand module
// in a previous listing.)
let housesWithBands2 =
    House.getRandom 20
    |> Array.map (fun h ->
        h, h.Price |> PriceBand.fromPrice)
housesWithBands, housesWithBands2
Listing 4-18

Avoiding long lambda functions

module PriceBand =
    let order = function
    | Cheap -> 0 | Medium -> 1 | Expensive -> 2
// A report of price bands and the houses that fall into them:
House.getRandom 20
|> Seq.groupBy (fun h -> h.Price |> PriceBand.fromPrice)
|> Seq.sortBy (fun (band, _) -> band |> PriceBand.order)
|> Seq.iter (fun (band, houses) ->
    printfn "---- %A ----" band
    houses
    |> Seq.iter (fun h -> printfn "%s - %f" h.Address h.Price))
// Like the previous report, but using an anoymous record to
// reduce use of tuples:
House.getRandom 20
|> Seq.groupBy (fun h -> h.Price |> PriceBand.fromPrice)
|> Seq.map (fun (band, houses) ->
    {| PriceBand = band; Houses = houses |})
|> Seq.sortBy (fun group -> group.PriceBand |> PriceBand.order)
|> Seq.iter (fun group ->
    printfn "---- %A ----" group.PriceBand
    group.Houses
    |> Seq.iter (fun h -> printfn "%s - %f" h.Address h.Price))
---- Cheap ----
2 Stochastic Street - 99834.000000
9 Stochastic Street - 95569.000000
...
---- Medium ----
1 Stochastic Street - 161900.000000
12 Stochastic Street - 161613.000000
...
---- Expensive ----
3 Stochastic Street - 260154.000000
4 Stochastic Street - 397221.000000
...
Listing 4-19

Replacing tuples with anonymous records

Recommendations

Here are some key points to take away from this chapter:
  • Become familiar with the many collection functions available to you in modules such as Array, List, and Seq and the more specialized modules such as Map and Set.

  • Learn how to map from the problem you are trying to solve (e.g., “I have an array of numbers and I want the average of the largest three”) to the type signatures that are likely to help solve them (e.g., 'T [] -> 'T [] for the sorting, 'T [] -> 'T [] for the top-three selection, and ^T [] -> ^T for the average) and from there to the specific collections you are going to need (Array.sort, Array.truncate, and Array.average).

  • If you find type signatures a little inaccessible, refer back to Tables 4-2 through 4-11 for a more visual reference to the most useful collection functions.

  • Beware of collection functions that are partial, such as Array.head, which raises an exception when the array is empty. Use the try... version (e.g., Array.tryHead), or if there isn’t one, consider writing one using the try... naming style.

  • Think of explicit looping, especially using mutable values, as a last resort. There’s usually a collection function or a combination of them that will do the job more simply.

  • Use pipelines of collection functions, but don’t let them get too long. Remember that anonymous records, discussed at length in Chapter 7, are often preferable to tuples when passing values between stages of a pipeline.

Summary

Collection functions are the guitar chords of the F# world. You simply can’t get by without a good working knowledge of the basic functions and how to fit them together. In most domains, a large proportion of your F# code should consist of pipelines of collection functions that map, filter, summarize, and group data to get from the inputs you have to the outputs you want. Enjoy the feeling of using collection functions to achieve F#’s enduring goal: to solve complex problems with simple code.

In the next chapter, we’ll look at immutability, the curious notion that we should write programs that don’t change anything; and we’ll also learn when to break back out of this mindset and use mutation.

Exercise Solutions

This section shows solutions for the exercises in this chapter. For the code shown here to run, you’ll also need the code for the Houses module in Listing 4-2.

Exercise 4-1 – Transforming Data Items
let housePrices =
    House.getRandom 20
    |> Array.map (fun h ->
        sprintf "Address: %s - Price: %f" h.Address h.Price)
housePrices
|> Array.iter (printfn "%O")
Alternatively, you could have used string interpolation, like this:
let housePrices2 =
    House.getRandom 20
    |> Array.map (fun h ->
        sprintf $"Address: {h.Address} - Price: {h.Price}")
Exercise 4-2 – Calculating an Average
let averagePrice =
    House.getRandom 20
    |> Array.averageBy (fun h -> h.Price)
Exercise 4-3 – Selecting Based on a Condition
let expensive =
    House.getRandom 20
    |> Array.filter (fun h -> h.Price > 250_000m)
Exercise 4-4 – Attempting a Calculation and Choosing Successes
let housesNearSchools =
    House.getRandom 20
    |> Array.choose (fun h ->
        // See also the "Missing Data" chapter
        match h |> Distance.tryToSchool with
        | Some d -> Some(h, d)
        | None -> None)
Exercise 4-5 – Filtering and Iterating
House.getRandom 20
|> Array.filter (fun h -> h.Price > 100_000m)
|> Array.iter (fun h ->
    printfn "Address: %s Price: %f" h.Address h.Price)
Exercise 4-6 – Ordering
House.getRandom 20
|> Array.filter (fun h -> h.Price > 100_000m)
|> Array.sortByDescending (fun h -> h.Price)
|> Array.iter (fun h ->
    printfn "Address: %s Price: %f" h.Address h.Price)
Exercise 4-7 – Filtering and Averaging
let averageOver200K =
    House.getRandom 20
    |> Array.filter (fun h -> h.Price > 200_000m)
    |> Array.averageBy (fun h -> h.Price)
Exercise 4-8 – Finding a Single Element
let cheapHouseWithKnownSchoolDistance =
    House.getRandom 20
    |> Array.filter (fun h -> h.Price < 100_000m)
    |> Array.pick (fun h ->
        match h |> Distance.tryToSchool with
        | Some d -> Some(h, d)
        | None -> None)
Exercise 4-9 – Grouping
let housesByBand =
    House.getRandom 20
    |> Array.groupBy (fun h -> h.Price |> PriceBand.fromPrice)
    |> Array.map (fun group ->
        let band, houses = group
        band, houses |> Array.sortBy (fun h -> h.Price))
You can also “pattern match” in the lambda declaration of the Array.map call for a more concise solution:
    let housesByBand2 =
    House.getRandom 20
    |> Array.groupBy (fun h -> h.Price |> PriceBand.fromPrice)
    |> Array.map (fun (band, houses) ->
        band, houses |> Array.sortBy (fun h -> h.Price))
Exercise 4-10 – Filtering, Averaging, and Try
To test this solution, you’ll need to increase the price criterion so that the sample is empty.
module Array =
    let inline tryAverageBy f (a : 'T[]) =
        if a.Length = 0 then
            None
        else
            a |> Array.averageBy f |> Some
let averageOver200K =
    House.getRandom 20
    |> Array.filter (fun h -> h.Price > 200_000m)
    |> Array.tryAverageBy (fun h -> h.Price)
Exercise 4-11 – Finding a Single Element If Any
To test this solution, you’ll need to decrease the price criterion so that the sample is empty.
let cheapHouseWithKnownSchoolDistance =
    House.getRandom 20
    // Try lower price values to explore what happens
    // when the filter returns no results.
    |> Array.filter (fun h -> h.Price < 100_000m)
    |> Array.tryPick (fun h ->
        match h |> Distance.tryToSchool with
        | Some d -> Some(h, d)
        | None -> None)
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.188.50.29