© The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature 2022
K. Eason, Stylish F# 6, https://doi.org/10.1007/978-1-4842-7205-3_12

12. Performance

Kit Eason, Farnham, Surrey, UK

Since the engine has a mode of acting peculiar to itself, it will in every particular case be necessary to arrange the series of calculations conformably to the means which the machine possesses; for such or such a process which might be very easy for a [human] calculator may be long and complicated for the engine, and vice versâ.

—L. F. Menabrea, describing Charles Babbage’s Analytical Engine, 1842

(Translated by Ada Lovelace)

Design Is Compromise

In programming, there is always a tension between abstraction and efficiency. Code that has a higher level of abstraction is less likely to define the minimum number of operations needed to achieve the correct result in a specific situation. Conversely, code that is written at a lower level will often be faster but will be less widely applicable, leading to more repetition and sometimes worse maintainability. As a language that encourages you to work at a relatively high level of abstraction, F# can sometimes leave you at the wrong end of this trade-off. This chapter aims to give you the tools to recognize common performance bottlenecks in F# code and the skills needed to resolve these to a reasonable degree, without fatally compromising the readability of your code.

Getting from correct code to correct, efficient code is one of the coding tasks that I find the most satisfying. I hope that by the end of this chapter, you’ll feel the same way.

Some Case Studies

.NET performance, and code performance generally, is a huge topic. Rather than getting lost in a sea of performance-related issues, I’m going to focus on a few case studies that represent mistakes I commonly see being made (often by me). Incidentally, you might notice that I haven’t included asynchronous code in these case studies: this topic was covered in Chapter 10. For each case study, I’m going to present some correctly functioning but inefficient code. Then I’ll help you identify why it’s inefficient and show you the steps you can go through to make it relatively fast, but still correct and maintainable.

But before we can embark on the case studies, we need a framework for measuring and comparing performance. Enter “BenchmarkDotNet.”

BenchmarkDotNet

Through most of this book, I’ve avoided using third-party libraries, as I wanted to focus on the language itself. But in the case of performance, we need something that can provide a fair measure of execution speed, which includes running our code a number of times and performing back-to-back comparisons of different versions. BenchmarkDotNet does exactly this and works nicely with F#.

You can either use the source code provided with this book or create your own project using the following steps:
  • Create a command-line F# project.

mkdir performance
cd performance
dotnet new console -lang F#
  • Add the package “BenchmarkDotNet.”

dotnet add package BenchmarkDotNet
  • Add a file called Dummy.fs before Program.fs and populate it with the code from Listing 12-1.

  • Replace the code in Program.fs with the code from Listing 12-2.

module Dummy
let slowFunction() =
    System.Threading.Thread.Sleep(100)
    99
let fastFunction() =
    System.Threading.Thread.Sleep(10)
    99
Listing 12-1

Dummy functions to benchmark (Dummy.fs)

The code in Listing 12-1, Dummy.fs, is the test subject – the actual code whose performance we want to check. Initially, this will be a dummy, but later we’ll add real code to test.
open System
open BenchmarkDotNet.Running
open BenchmarkDotNet.Attributes
module Harness =
    [<MemoryDiagnoser>]
    type Harness() =
        [<Benchmark>]
        member _.Old() =
            Dummy.slowFunction()
        [<Benchmark>]
        member _.New() =
            Dummy.fastFunction()
[<EntryPoint>]
let main _ =
    BenchmarkRunner.Run<Harness.Harness>()
    |> printfn "%A"
    0
Listing 12-2

Executing the benchmarks (Program.fs)

The code in Listing 12-2, Program.fs, is the “boiler plate” we need to get BenchmarkDotNet to call our code repeatedly to measure its performance.

Once you have all the source in place, run the project, making sure you force the build to be in Release configuration:
dotnet run -c release
You should get a large volume of diagnostic output and, toward the end, a table of timings as in Listing 12-3.
| Method |      Mean |    Error |   StdDev |    Median | Allocated |
|------- |----------:|---------:|---------:|----------:|----------:|
|    Old | 110.93 ms | 2.193 ms | 3.283 ms | 110.66 ms |         - |
|    New |  18.62 ms | 0.458 ms | 1.349 ms |  19.11 ms |         - |
Listing 12-3

Dummy benchmark output

The “Method” column contains the names of the methods in the Harness class that we used to call our actual test-subject functions. As we are going to be back-to-back testing original vs. performance-enhanced versions, I’ve called these “Old” and “New.” The “Mean” column shows the average time needed to execute the functions we are testing. Not surprisingly, the “Old” function (slowFunction()) takes more time than the “New” function (fastFunction()). The difference in means is roughly 10:1, reflecting the fact that the slow dummy function sleeps for ten times as long. (It’s not exactly 10:1 because of other overheads that are the same for each version.)

The “Error,” “StdDev,” and “Median” columns give other relevant statistical measures of runtimes. The “Allocated” column shows how much managed memory – if any – was allocated per invocation of the functions under test. For the purposes of this chapter we’ll focus mainly on the “Mean” column. If the function being tested causes garbage collections, there will be additional columns in the table showing how many Generation 0, 1, and 2 collections occurred.

Case Study: Inappropriate Collection Types

Now that we have some nice benchmarking infrastructure in place, it’s time to look at common performance antipatterns and their remedies. We’ll start with what happens if you use an inappropriate collection type or access it inappropriately.

Imagine you need to create a “sample” function. It takes a collection of values and returns only every n’th value, for some provided value of n, which we’ll call interval. For example, if you gave it the collection ['a';'b';'c';'d'] and an interval of 3, it would return ['a';'d']. The requirement doesn’t say anything about what type of collection contains the input, so you decide to be idiomatic and use F# lists as both the input and the return values. Listing 12-4 shows your first cut of this logic.
let sample interval data =
    [
        let max = (List.length data) - 1
        for i in 0..interval..max ->
            data.[i]
    ]
Listing 12-4

First cut of a sample function

We want to generate an F# list, so we use a list comprehension (the whole body of the function is in []). We use a for loop with a skip value of interval as the sampling mechanism. Items are returned from the input list using the -> operator (a shortcut for yield in for-loops) together with indexed access to the list, that is, data.[i]. Seems reasonable – but does it perform?
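Incidentally, the `->` shorthand and an explicit `yield` are interchangeable inside a comprehension; a minimal illustration:

```fsharp
// `->` in a for-comprehension is shorthand for `do yield`:
let a = [ for i in 0 .. 2 -> i * i ]
let b = [ for i in 0 .. 2 do yield i * i ]
// Both produce [0; 1; 4]
```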

To find out, we’ll need to integrate it with the project we put together in Listings 12-1 and 12-2. Add another file called InappropriateCollectionType.fs and ensure that it is first in the compilation order. Populate it with the code from Listing 12-5.
module InappropriateCollectionType
module Old =
    let sample interval data =
        [
            let max = (List.length data) - 1
            for i in 0..interval..max ->
                data.[i]
        ]
module New =
    let sample interval data =
        [
            let max = (List.length data) - 1
            for i in 0..interval..max ->
                data.[i]
        ]
Listing 12-5

Integrating a function with benchmarking

In Listing 12-5, we declare modules Old and New to hold baseline and (in the future) improved versions of our function-under-test. At this stage, the Old and New implementations of sample are the same.

To make the test harness call these functions and to give them something to work on, change the Harness module within Program.fs to look like Listing 12-6.
module Harness =
    [<MemoryDiagnoser>]
    type Harness() =
        let r = Random()
        let list = List.init 1_000_000 (fun _ -> r.NextDouble())
        [<Benchmark>]
        member _.Old() =
            list
            |> InappropriateCollectionType.Old.sample 1000
            |> ignore
        [<Benchmark>]
        member _.New() =
            list
            |> InappropriateCollectionType.New.sample 1000
            |> ignore
Listing 12-6

Modifying the test harness

In Listing 12-6, we create some test data, called list, for the test functions to work on, and we link up the Old and New benchmark functions to the Old and New implementations in the InappropriateCollectionType module.

Note

BenchmarkDotNet offers ways to ensure that time-consuming initializations occur only once globally, or once per iteration. Search online for “benchmarkdotnet setup and cleanup” for details. I haven’t done this here for simplicity. This won’t greatly affect the measurements, but it will have some impact on the time the overall benchmarking process takes to run.
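For reference, here is a sketch of how a one-time initialization might look using BenchmarkDotNet's GlobalSetup attribute, as an alternative to the field initialization used in Listing 12-6. The type name is illustrative, and the details are worth confirming against the BenchmarkDotNet documentation:

```fsharp
open System
open BenchmarkDotNet.Attributes

[<MemoryDiagnoser>]
type HarnessWithSetup() =
    let mutable list : float list = []

    // Runs once, before any benchmark iterations are measured,
    // so initialization cost is excluded from the timings.
    [<GlobalSetup>]
    member _.Setup() =
        let r = Random()
        list <- List.init 1_000_000 (fun _ -> r.NextDouble())

    [<Benchmark>]
    member _.Old() =
        list
        |> InappropriateCollectionType.Old.sample 1000
        |> ignore
```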

With the code from Listings 12-5 and 12-6 in place, you can run the project and check the results. Your timings should look something like Listing 12-7, though obviously the absolute values will depend on the speed of your machine, what .NET and compiler version you are using, and so forth.
| Method |    Mean |    Error |   StdDev | Allocated |
|------- |--------:|---------:|---------:|----------:|
|    Old | 1.103 s | 0.0112 s | 0.0110 s |     31 KB |
|    New | 1.091 s | 0.0062 s | 0.0058 s |     33 KB |
Listing 12-7

Baseline timings

The key points are that the Old and New methods take similar times (to be expected as they are currently calling identical functions) and that the amount of time per iteration, at over a second, is significant. We have a baseline from which we can optimize!

Avoiding Indexed Access to Lists

One red flag in this code is that it uses indexed access into an F# list: data.[i]. Indexed access into arrays is fine: the runtime can calculate an offset from the beginning of the array using the index and retrieve the element directly from the calculated memory location. (The situation might be a little more complex for multidimensional arrays.) But indexed access to an F# list is a really bad idea. The runtime will have to start at the head of the list and repeatedly get the next item until it has reached the n’th item. This is an inherent property of linked lists such as F# lists.

Indexed access to an F# list element is a so-called O(n) operation; that is, the time it takes on average is directly proportional to the length of the list. By contrast, indexed access to an array element is an O(1) operation: the time it takes on average is independent of the size of the array. Also, it takes no longer to retrieve the last element of an array than the first (ignoring any effects of the low-level caching that might go on in the processor).
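You can get an informal feel for this difference without full benchmarking; a quick sketch (not a rigorous measurement – for that, use BenchmarkDotNet as described earlier):

```fsharp
// Informal comparison of O(n) list access vs. O(1) array access.
let bigList = List.init 1_000_000 id
let bigArray = Array.init 1_000_000 id

let sw = System.Diagnostics.Stopwatch.StartNew()
bigList.[999_999] |> ignore    // walks ~1,000,000 links to reach the element
printfn "List access:  %d ticks" sw.ElapsedTicks

sw.Restart()
bigArray.[999_999] |> ignore   // a single offset calculation
printfn "Array access: %d ticks" sw.ElapsedTicks
```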

So can we still use an F# list (which, rightly or wrongly, was our original design decision) while avoiding indexed access? My first thought on this was Listing 12-8, which I wrote almost as a “straw man,” not expecting it to be particularly effective.
    let sample interval data =
        data
        |> List.indexed
        |> List.filter (fun (i, _) ->
            i % interval = 0)
        |> List.map snd
Listing 12-8

First attempt at optimization

In Listing 12-8, we use List.indexed to make a copy of the original list but containing tuples of an index and the original value, for example, [(0, 1.23); (1, 0.98); ...]. Then we use List.filter to pick out the values whose indexes are a multiple of the required interval. Finally, we use List.map snd to recover just the element values as we no longer need the index values.

I had low expectations of this approach, as it involves making a whole additional list (the one with the indexes tupled in); filtering it (with some incidental pattern matching), which will create another list; and mapping to recover the filtered values, which will create a third list. Also, a bit more vaguely, this version is very functional, and we’ve been conditioned over the years to expect that functional code is inherently less efficient than imperative code.

To check my expectations, add the code from Listing 12-8 into the New module in InappropriateCollectionType.fs replacing the existing sample implementation, and run the project. Were you surprised? Listing 12-9 shows the results I got. (I’ve omitted some of the statistical columns to save space.)
| Method |       Mean |     Gen 0 |     Gen 1 |     Gen 2 | Allocated |
|------- |-----------:|----------:|----------:|----------:|----------:|
|    Old | 1,099.6 ms |         - |         - |         - |     33 KB |
|    New |   134.0 ms | 9000.0000 | 5200.0000 | 1400.0000 | 62,563 KB |
Listing 12-9

Results of first optimization attempt

The good – and perhaps surprising – news is that this is very nearly an order of magnitude faster: 134ms vs. 1,100ms. The takeaway here is that indexed access to F# lists is a disaster for performance. But the fix has come at a cost: there is a great deal of garbage collection going on, and in all three generations. This aspect should not be a surprise: as we just said, the code in Listing 12-8 creates no fewer than three lists to do its work, only one of which is needed in the final result.
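Incidentally, if we wanted to stay with F# lists, one way to avoid the intermediate lists would be a single pass with List.fold – a hypothetical variant, not benchmarked here:

```fsharp
// Single-pass sampling over an F# list: no intermediate indexed or
// filtered lists, at some cost to readability.
let sample interval data =
    let _, acc =
        data
        |> List.fold (fun (i, acc) x ->
            if i % interval = 0 then (i + 1, x :: acc)
            else (i + 1, acc)) (0, [])
    List.rev acc
```

This keeps one traversal (plus a final reverse) but is arguably less transparent than the pipeline version.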

Using Arrays Instead of Lists

Time for another optimization. What if we revoke our initial design decision to use F# lists for the input and output and work with arrays instead? An array is an inherently more efficient data structure for many operations because it is a contiguous block of memory. There is no overhead, as there is with lists, for pointers from the n’th to the n+1’th element. Changing the code to use arrays simply means changing all the references to the List module to use the Array module instead (Listing 12-10).
    let sample interval data =
        data
        |> Array.indexed
        |> Array.filter (fun (i, _) ->
            i % interval = 0)
        |> Array.map snd
Listing 12-10

Directly replacing lists with arrays

You’ll also have to add a line to the test harness (in Program.fs) to make and use an array version of the test data (Listing 12-11).
    type Harness() =
        let r = Random()
        let list = List.init 1_000_000 (fun _ -> r.NextDouble())
        let array = list |> Array.ofList
        [<Benchmark>]
        member _.Old() =
            list
            |> InappropriateCollectionType.Old.sample 1000
            |> ignore
        [<Benchmark>]
        member _.New() =
            array
            |> InappropriateCollectionType.New.sample 1000
            |> ignore
Listing 12-11

Providing an array in the test harness

This improves on the elapsed time of the list-based version by about half but still does a considerable amount of garbage collection (Listing 12-12).
| Method |        Mean |     Gen 0 |     Gen 1 |    Gen 2 | Allocated |
|------- |------------:|----------:|----------:|---------:|----------:|
|    Old | 1,098.67 ms |         - |         - |        - |     33 KB |
|    New |    67.00 ms | 4500.0000 | 2625.0000 | 750.0000 | 39,200 KB |
Listing 12-12

Results of using arrays instead of lists

Again, perhaps this isn’t too surprising: array creation might be a bit more efficient than list creation, but we are still creating three arrays and using two of them only for a brief moment.

Using Sequences Instead of Arrays

There’s tension between the fact that we’d quite like to keep the code idiomatic (a pipeline of collection functions) and the fact that the current version creates some big transient objects. Is there any way to reconcile that? Whenever we want a collection to exist but not exist, we should think about F# sequences. What happens if we replace all the Array module references with Seq references? (Listing 12-13).
    let sample interval data =
        data
        |> Seq.indexed
        |> Seq.filter (fun (i, _) ->
            i % interval = 0)
        |> Seq.map snd
Listing 12-13

Using sequences instead of arrays
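It’s worth remembering why this helps: Seq pipelines are lazy, so no intermediate collection is materialized. A minimal demonstration (names illustrative):

```fsharp
// Sequence pipelines are lazy: building the pipeline does no work.
let pipeline =
    [ 1 .. 5 ]
    |> Seq.map (fun x ->
        printfn "mapping %d" x   // side effect to reveal when work happens
        x * 2)

// Nothing has printed yet. Consuming the sequence triggers the work:
let results = pipeline |> Array.ofSeq
```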

To make this a fair test, we ought to make sure the sequence is actually retrieved, so add an Array.ofSeq to the calling code. We can also revert to the original list as input, as lists can be used as sequences (Listing 12-14).
        [<Benchmark>]
        member __.New() =
            list
            |> InappropriateCollectionType.New.sample 1000
            |> Array.ofSeq
            |> ignore
Listing 12-14

Retrieving the sequence

This makes a very satisfactory difference (Listing 12-15).
| Method |        Mean |     Gen 0 |   Gen 1 | Allocated |
|------- |------------:|----------:|--------:|----------:|
|    Old | 1,097.62 ms |         - |       - |     31 KB |
|    New |    19.97 ms | 3812.5000 | 31.2500 | 31,275 KB |
Listing 12-15

Results of using sequences instead of lists

We reduced the average elapsed time by 70% compared with the previous iteration, and although there is still a lot of garbage collection going on, there is less of it and it is almost all in Generation 0, with none at all in Generation 2. Compared with the original baseline, our code is now more than 50 times faster!

Avoiding Collection Functions

We’ve spent quite a long time tweaking a collection-function-based implementation of sample . The current implementation has the advantage that it is highly idiomatic and has a reasonable degree of motivational transparency. But is it really the best we can do?

If we go back to Listing 12-4, we observe that the main problem was that we were using indexed access to an F# list. Now that we have relaxed the requirement to have lists as inputs and outputs, what happens if we restate the same code in array terms? (Listing 12-16).
    let sample interval data =
        [|
            let max = (Array.length data) - 1
            for i in 0..interval..max ->
                data.[i]
        |]
Listing 12-16

Array comprehension instead of list comprehension

In Listing 12-16, I’ve highlighted all the differences from Listing 12-4: all we have to do is use array clamps ( [|...|] ) instead of list brackets ( [...] ) to enclose the comprehension, and Array.length instead of List.length to measure the length of the data. You’ll also need to undo the changes to Program.fs we made in Listing 12-14, as we’re back to accepting and returning an array rather than a sequence, as in Listing 12-11.

With those simple changes, we’ve fixed the main issue with the baseline version: indexed access into a linked list structure. How does it perform? (Listing 12-17).
| Method |            Mean |  Gen 0 |  Gen 1 | Allocated |
|------- |----------------:|-------:|-------:|----------:|
|    Old | 1,130,741.04 us |      - |      - |     31 KB |
|    New |        13.91 us | 3.9215 | 0.1221 |     32 KB |
Listing 12-17

Results of using array comprehension instead of list comprehension

Note that in Listing 12-17, the measurements are now shown in microseconds (us) rather than milliseconds (ms) because the New measurement is too small to measure in milliseconds. This is impressive: we’ve now improved execution time from the baseline by an astonishing factor of 80,000, nearly five orders of magnitude. It’s as if we found a shortcut on the journey from New York to Paris that shortened the distance from 6000 kilometers to less than 100 meters. And we’ve done so using only functional constructs: an array comprehension and a yielding-for-loop. I’ve sometimes encountered people who consider this style nonfunctional, but I disagree. There is no mutation, no declare-then-populate patterns, and it’s very concise.

Avoiding Loops with Skips

I’ve heard it said that loops with skips (for i in 1..10..1000 ...) compile to less efficient IL than loops with an implicit skip size of 1 (for i in 1..100 ...). I’ve no idea if this is still true (we’re not going to get into the whole business of inspecting IL in this book), but it’s relatively easy to check whether this makes a difference in practice. Listing 12-18 shows an implementation that avoids a skipping for-loop. We calculate an array index by multiplying the loop counter by the interval. The hard part is defining the upper bound of the loop.
    let sample interval data =
        [|
            let max =
                ( (data |> Array.length |> float) / (float interval)
                  |> ceil
                  |> int ) - 1
            for i in 0..max ->
                 data.[i * interval]
        |]
Listing 12-18

Avoiding a skipping for-loop

This makes no significant difference to performance (Listing 12-19). Either it’s no longer true that skipping loops are substantially less efficient or the overhead of multiplying up the array index has overwhelmed any gains from avoiding a skipping loop.
| Method |            Mean |  Gen 0 |  Gen 1 | Allocated |
|------- |----------------:|-------:|-------:|----------:|
|    Old | 1,109,881.03 us |      - |      - |     32 KB |
|    New |        13.31 us | 3.9215 | 0.1221 |     32 KB |
Listing 12-19

Results of avoiding a skipping loop

Apart from the fact that it makes no difference to performance, there are several reasons why I’d be reluctant to go this far in real code:
  • It lowers the motivational transparency of the code, by making it a little bit less obvious what the author was intending to do.

  • It’s a true microoptimization, with effects that could easily change between architectures or compiler versions. By working at this level, we deny ourselves any potential improvements in the way the compiler and runtime work with respect to skipping loops.

  • The code is much riskier, with a complicated calculation for defining the upper bound of the loop. (It took me no less than six attempts to get it right!) In going down this route, we are laying ourselves open to off-by-one errors and other silly bugs: exactly the kind of thing that idiomatic F# code excels at avoiding.

Inappropriate Collection Types – Summary

Figure 12-1 shows a chart of the effects of our various changes.
../images/462726_2_En_12_Chapter/462726_2_En_12_Fig1_HTML.png
Figure 12-1

Impact of various improvements to collection usage

The improvements are dominated by the simple change of not using indexing into an F# list. Figure 12-2 shows the same measurements on a logarithmic scale, which makes it easier to compare the last few items.
../images/462726_2_En_12_Chapter/462726_2_En_12_Fig2_HTML.png
Figure 12-2

Impact of various improvements to collection usage on a log scale

The takeaways from this section are as follows:
  • Don’t do indexed access into F# lists – that is, myList.[i]. Either use a different collection type or find another way of processing the list.

  • Be familiar with the performance characteristics of the collection data structures and functions that you are using. At the time of writing, these functions are somewhat patchily documented regarding their time complexity (O(n), O(1), etc.), so you may have to do a little online searching or experimentation to pin this down. Don’t default to using F# lists just because they might be considered more idiomatic. Unless you are playing to the strengths of lists (which boils down to use of the head::tail construct), arrays are often the better choice.

  • Pipelines of collection functions (.filter, .map, and so forth) can have decent performance, provided you choose the right collection type in the first place.

  • Sequences (and functions in the Seq module) can sometimes be a way of expressing your logic in terms of pipelines of collection functions, without the overhead of creating and destroying short-lived collection instances.

  • Comprehensions (e.g., placing code in array clamps and using yield or a for...-> loop) can have stellar performance. Don’t be fooled into thinking such code is in some way “not functional” just because the for keyword is involved.

  • Beware of low-level microoptimizations: Are you denying yourself the benefits of potential future compiler or platform improvements? Have you introduced unnecessary risk into the code?

Case Study: Short-Term Objects

An oft-quoted dictum in .NET programming is that you shouldn’t unnecessarily create and destroy large numbers of reference types because of the overhead of allocating them and later garbage collecting them. How true is this in practice, and how can we avoid it?

Imagine you are tasked with taking in a large number of three-dimensional points (x, y, and z positions) and identifying those which are within a given radius of some other fixed point. (For example, you might be trying to identify all the stars that fall within a certain radius of the Sun.) We’ll assume that the API of the function must take a radius value, a tuple of three floats for the “fixed” point, and an array of tuples of three floats for the candidate positions (Listing 12-20).
    let withinRadius
         (radius : float)
         (here : float*float*float)
         (coords : (float*float*float)[]) : (float*float*float)[] =
         ...
Listing 12-20

The required API for a point-searching function

As a further given, you have access to a class that can do 3D distance calculations (Listing 12-21).
type Point3d(x : float, y : float, z : float) =
    member __.X = x
    member __.Y = y
    member __.Z = z
    member val Description = "" with get, set
    member this.DistanceFrom(that : Point3d) =
        (that.X - this.X) ** 2. +
        (that.Y - this.Y) ** 2. +
        (that.Z - this.Z) ** 2.
        |> sqrt
    override this.ToString() =
        sprintf "X: %f, Y: %f, Z: %f" this.X this.Y this.Z
Listing 12-21

A 3D point class that can do distance calculations

The type from Listing 12-21 can do the required distance calculation, but you might notice it contains other things – a mutable Description field and a ToString override – which we don’t particularly need for the requirement. This is pretty typical in an object-oriented scenario: the functionality you need is coupled with a certain amount of other stuff you don’t need.

To start exploring this requirement, add another file called ShortTermObjects.fs to your project, and populate it with the code from Listings 12-21 and 12-22.
module ShortTermObjects
type Point3d(x : float, y : float, z : float) =
    // Code as Listing 12-21
    ...
type Float3 = (float * float * float)
module Old =
    let withinRadius (radius : float) (here : Float3) (coords : Float3[]) =
        let here = Point3d(here)
        coords
        |> Array.map Point3d
        |> Array.filter (fun there ->
            there.DistanceFrom(here) <= radius)
        |> Array.map (fun p3d -> p3d.X, p3d.Y, p3d.Z)
module New =
    let withinRadius (radius : float) (here : Float3) (coords : Float3[]) =
        let here = Point3d(here)
        coords
        |> Array.map Point3d
        |> Array.filter (fun there ->
            there.DistanceFrom(here) <= radius)
        |> Array.map (fun p3d -> p3d.X, p3d.Y, p3d.Z)
Listing 12-22

First cut of the withinRadius function

As in the previous section, the Old and New implementations are the same initially. Note also that we use a type alias (type Float3 = (float * float * float)) to avoid repeating the tuple of three floats throughout the code.

We do the required selection by mapping the incoming array of tuples into Point3d instances and filtering the result using the DistanceFrom instance method. Finally, we map back to an X, Y, Z tuple, as the requirement states we have to return tuples, not Point3d instances.

To integrate with the benchmarking, you’ll also need to alter Program.fs so that the Harness module looks like Listing 12-23.
module Harness =
    [<MemoryDiagnoser>]
    type Harness() =
        let r = Random(1)
        let coords =
            Array.init 1_000_000 (fun _ ->
                r.NextDouble(), r.NextDouble(), r.NextDouble())
        let here = (0., 0., 0.)
        [<Benchmark>]
        member __.Old() =
            coords
            |> ShortTermObjects.Old.withinRadius 0.1 here
            |> ignore
        [<Benchmark>]
        member __.New() =
            coords
            |> ShortTermObjects.New.withinRadius 0.1 here
            |> ignore
Listing 12-23

Integrating the 3D distance calculation with benchmarking

When I ran this code, I didn’t have particularly high hopes: this was going to create a million instances of Point3d just so that we could use the DistanceFrom method for each instance. Listing 12-24 shows the results. (Old and New are roughly the same here, as the same function is being used in this first version.)
| Method |     Mean |     Gen 0 |     Gen 1 |     Gen 2 | Allocated |
|------- |---------:|----------:|----------:|----------:|----------:|
|    Old | 166.7 ms | 6666.6667 | 3666.6667 | 1000.0000 |     54 MB |
|    New | 160.6 ms | 6250.0000 | 3500.0000 |  750.0000 |     54 MB |
Listing 12-24

Results of a baseline run

The statistics in Listing 12-24 aren’t as bad as I’d feared – the average execution time works out at about 0.17 microseconds per input position. Not terrible, though of course that depends entirely on your objectives. Over 50 MB of memory is being allocated during processing, which might have an effect on the wider system, and there are garbage collection “survivors” into Generations 1 and 2. The .NET garbage collector is pretty good at collecting so-called “Generation 0” items, but for every extra generation that an object survives, it will have been marked and copied, and all pointers to it will have been updated. This is costly! So can we improve on our baseline?
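If you want a rough sense of allocation pressure outside BenchmarkDotNet, .NET exposes counters such as GC.GetTotalAllocatedBytes (available from .NET Core 3.0 onward). A quick, informal sketch, assuming the Point3d class from Listing 12-21 is in scope:

```fsharp
// Informal allocation check – not a substitute for BenchmarkDotNet's
// MemoryDiagnoser, which accounts for warm-up and per-iteration noise.
let before = System.GC.GetTotalAllocatedBytes()
let points = Array.init 1_000_000 (fun i -> Point3d(float i, 0.0, 0.0))
let after = System.GC.GetTotalAllocatedBytes()
printfn "Allocated roughly %d bytes for %d points" (after - before) points.Length
```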

Sequences Instead of Arrays

We learned earlier that sequences can sometimes be a better choice than arrays (or other concrete collections) for pipeline operations that create reference types. It’s simple enough to apply this to the current example (Listing 12-25).
    let withinRadius (radius : float) (here : Float3) (coords : Float3[]) =
        let here = Point3d(here)
        coords
        |> Seq.map Point3d
        |> Seq.filter (fun there ->
            there.DistanceFrom(here) <= radius)
        |> Seq.map (fun p3d -> p3d.X, p3d.Y, p3d.Z)
        |> Seq.toArray
Listing 12-25

Using sequences instead of arrays

This runs a little faster and allocates a bit less memory than the baseline example, and no objects survive into Generation 1, but the overall improvement is nothing to write home about (Listing 12-26).
| Method |     Mean |     Gen 0 |     Gen 1 |     Gen 2 | Allocated |
|------- |---------:|----------:|----------:|----------:|----------:|
|    Old | 171.7 ms | 6666.6667 | 3666.6667 | 1000.0000 |     54 MB |
|    New | 117.1 ms | 5600.0000 |         - |         - |     46 MB |
Listing 12-26

Results of using sequences instead of arrays

Avoiding Object Creation

Maybe it’s time to question the whole approach of creating Point3d instances just so we can use one of Point3d’s methods. Even if you didn’t have access to Point3d’s source code, you’d probably be able to code the calculation for a 3D distance yourself, based on the widely known formula √((x₁ − x₂)² + (y₁ − y₂)² + (z₁ − z₂)²).

Listing 12-27 shows what happens when we do this.
    let withinRadius (radius : float) (here : Float3) (coords : Float3[]) =
        let distance (p1 : float*float*float) (p2: float*float*float) =
            let x1, y1, z1 = p1
            let x2, y2, z2 = p2
            (x1 - x2) ** 2. +
            (y1 - y2) ** 2. +
            (z1 - z2) ** 2.
            |> sqrt
        coords
        |> Array.filter (fun there ->
            distance here there <= radius)
Listing 12-27

Avoiding object creation

This shaves about 50% off the execution time and is vastly lighter on memory allocation. There is no recorded garbage collection (Listing 12-28).
| Method |      Mean |     Gen 0 |     Gen 1 |     Gen 2 | Allocated |
|------- |----------:|----------:|----------:|----------:|----------:|
|    Old | 169.84 ms | 6666.6667 | 3666.6667 | 1000.0000 | 54,839 KB |
|    New |  83.58 ms |         - |         - |         - |    126 KB |
Listing 12-28

Results of avoiding object creation

Reducing Tuples

You might be wondering what happens if we simplify the signature of the distance function so that it takes six separate floating-point values instead of two tuples of three floating-point values. This enables us to decompose here into x, y, and z only once, though we still have to decompose each candidate point, now using pattern matching in the filter lambda (Listing 12-29).
    let withinRadius (radius : float) (here : Float3) (coords : Float3[]) =
        let distance x1 y1 z1 x2 y2 z2 =
            (x1 - x2) ** 2. +
            (y1 - y2) ** 2. +
            (z1 - z2) ** 2.
            |> sqrt
        let x1, y1, z1 = here
        coords
        |> Array.filter (fun (x2, y2, z2) ->
            distance x1 y1 z1 x2 y2 z2 <= radius)
Listing 12-29

Reducing tuples

This makes no useful difference as compared with the results in Listing 12-28 (Listing 12-30).
| Method |      Mean |     Gen 0 |     Gen 1 |     Gen 2 | Allocated |
|------- |----------:|----------:|----------:|----------:|----------:|
|    Old | 172.00 ms | 6666.6667 | 3666.6667 | 1000.0000 | 54,839 KB |
|    New |  83.50 ms |         - |         - |         - |    126 KB |
Listing 12-30

Result of reducing tuples

Using Struct Tuples

F# has the concept of “struct tuples” – tuples that are value types rather than reference types. Would using struct tuples improve the performance of our withinRadius function? Listing 12-31 shows the new code. Note how we have to use the struct keyword everywhere we instantiate or pattern match on a struct tuple.
type Float3s = (struct(float * float * float))
module New =
    let withinRadius
        (radius : float)
        (here : Float3s)
        (coords : Float3s[]) =
        let distance p1 p2 =
            let struct(x1, y1, z1) = p1
            let struct(x2, y2, z2) = p2
            (x1 - x2) ** 2. +
            (y1 - y2) ** 2. +
            (z1 - z2) ** 2.
            |> sqrt
        coords
        |> Array.filter (fun there ->
            distance here there <= radius)
Listing 12-31

Using struct tuples

For this change, we’ll also have to amend Program.fs, as the signature of the function being tested has changed slightly (Listing 12-32). (This would be a practical disadvantage of this optimization if the original source of the data couldn’t be changed to produce struct tuples: you’d have to map all your tuples to struct tuples before calling withinRadius.)
module Harness =
    [<MemoryDiagnoser>]
    type Harness() =
        let r = Random(1)
        let coords =
            Array.init 1_000_000 (fun _ ->
                r.NextDouble(), r.NextDouble(), r.NextDouble())
        let here = (0., 0., 0.)
        let coordsStruct =
            coords
            |> Array.map (fun (x, y, z) -> struct(x, y, z))
        let hereStruct = struct(0., 0., 0.)
        [<Benchmark>]
        member __.Old() =
            coords
            |> ShortTermObjects.Old.withinRadius 0.1 here
            |> ignore
        [<Benchmark>]
        member __.New() =
            coordsStruct
            |> ShortTermObjects.New.withinRadius 0.1 hereStruct
            |> ignore
Listing 12-32

Providing struct tuples

Unfortunately, the move to struct tuples doesn’t make much difference for this benchmark (Listing 12-33).
| Method |      Mean |     Gen 0 |     Gen 1 |     Gen 2 | Allocated |
|------- |----------:|----------:|----------:|----------:|----------:|
|    Old | 170.76 ms | 6666.6667 | 3666.6667 | 1000.0000 | 54,839 KB |
|    New |  85.28 ms |         - |         - |         - |    135 KB |
Listing 12-33

Results of moving to struct tuples

Operator Choice

We seem to be scraping the bottom of the barrel in relation to memory management. Does anything else stand out as being capable of improvement? What is the code doing most?

One thing it is doing a lot is squaring, in the lines that look like this: (x1 - x2) ** 2. + .... This seems pretty innocent, but there is a tiny clue to a potential problem – the fact that we are squaring by raising to a floating-point exponent, 2.0. Maybe the ** operator is more general than it needs to be. What if we use the pown function, which raises to an integer exponent? It’s a simple change (Listing 12-34).
    let withinRadius (radius : float) (here : Float3) (coords : Float3[]) =
        let distance x1 y1 z1 x2 y2 z2 =
            pown (x1 - x2) 2 +
            pown (y1 - y2) 2 +
            pown (z1 - z2) 2
            |> sqrt
        let x1, y1, z1 = here
        coords
        |> Array.filter (fun (x2, y2, z2) ->
            distance x1 y1 z1 x2 y2 z2 <= radius)
Listing 12-34

Using pown instead of the ** operator

You’ll also have to undo the changes to Program.fs that we made in Listing 12-32, as we are no longer bothering with struct tuples. The results of using pown are very satisfying (Listing 12-35)!
| Method |       Mean |     Gen 0 |     Gen 1 |    Gen 2 | Allocated |
|------- |-----------:|----------:|----------:|---------:|----------:|
|    Old | 167.257 ms | 6250.0000 | 3500.0000 | 750.0000 | 54,839 KB |
|    New |   9.113 ms |         - |         - |        - |    126 KB |
Listing 12-35

Results of using pown instead of the ** operator

This is almost ten times faster than anything we’ve achieved before and 18 times faster than the baseline. Looking at the compiler source, perhaps this isn’t too surprising. There are several steps involved in getting to the final operation of multiplying x by itself, of which Listing 12-36 is just the last.
            let inline ComputePowerGenericInlined one mul x n =
                let rec loop n =
                    match n with
                    | 0 -> one
                    | 1 -> x
                    | 2 -> mul x x
                    | 3 -> mul (mul x x) x
                    | 4 -> let v = mul x x in mul v v
                    | _ ->
                        let v = loop (n/2) in
                        let v = mul v v in
                        if n%2 = 0 then v else mul v x in
                loop n
Listing 12-36

Part of the compiler logic behind the ** operator

Are we satisfied yet? Well, even pown x 2 is a little more general than we need, as we know that we really just want to do x*x. What if we make one last change to do exactly that (Listing 12-37)?
    let withinRadius (radius : float) (here : Float3) (coords : Float3[]) =
        let distance x1 y1 z1 x2 y2 z2 =
            let dx = x1 - x2
            let dy = y1 - y2
            let dz = z1 - z2
            dx * dx +
            dy * dy +
            dz * dz
            |> sqrt
        let x1, y1, z1 = here
        coords
        |> Array.filter (fun (x2, y2, z2) ->
            distance x1 y1 z1 x2 y2 z2 <= radius)
Listing 12-37

Avoiding using pown for squaring

This makes a further 60% difference! (Listing 12-38).
| Method |       Mean |     Gen 0 |     Gen 1 |     Gen 2 | Allocated |
|------- |-----------:|----------:|----------:|----------:|----------:|
|    Old | 172.101 ms | 6666.6667 | 3666.6667 | 1000.0000 | 54,839 KB |
|    New |   3.815 ms |         - |         - |         - |    126 KB |
Listing 12-38

Results of avoiding using pown

We’ve now achieved a gain of 98% over the original implementation. It’s probably time to stop scraping the barrel.

Short-Term Objects – Summary

Figure 12-3 shows a chart of the effects of our various changes.
Figure 12-3

Impact of various improvements to object usage

The results are dominated by one kind of change: not the way we use objects or collections, but our choice of operator to do the distance calculation.

The takeaways from this section are as follows:
  • If many reference objects are placed into collections, the collections and their functions can have a bearing on performance over and above the cost of the objects themselves. For example, when dealing with long lists of reference type instances, pipelines of sequence functions can be better than pipelines of array functions.

  • Think about why you are creating objects. Could the methods you are calling be factored out into stand-alone functions, meaning that the whole object-instantiation/collection issue goes away (unless those functions themselves allocate memory)? Refactoring into independent functions has additional benefits in terms of conciseness, decoupling, and testability.

  • Concerns about allocation of tuples, and the possible gains from using struct tuples, can be important, but quick wins are not guaranteed.

  • Though discussions of performance in .NET languages often focus on memory management, this is far from being the whole story. Consider algorithms and operators as well.

  • Only use ** for raising to noninteger exponents. Use pown when raising to integer exponents, and also consider simple self-multiplication when the exponent is known in advance (e.g., squaring, cubing, etc.). More generally, remember there is a trade-off between how generic and general-purpose things are (such as the generic ** operator) and how efficient they are.

Case Study: Naive String Building

As developers, we often find ourselves building up strings, for example, for formatting values in UIs or data exports, or sending messages to other servers. It’s easy to get wrong on .NET – but fortunately not too hard to get right either.

For this section, we’ll imagine we’ve been tasked with formatting a two-dimensional array of floating-point values as a single CSV (comma-separated values) string, with line ends between each row of data. For simplicity, we’ll assume that F#’s default floating-point formatting (with the "%f" format specifier) is sufficient. We’ll further assume that the array, while not trivial, fits in memory and that its CSV also fits in memory, so we don’t need a fancy streaming approach.

Add a new file called NaiveStringBuilding.fs to the benchmarking project and copy into it the code from Listing 12-39.
module NaiveStringBuilding
open System
module Old =
    let private buildLine (data : float[]) =
        let mutable result = ""
        for x in data do
            result <- sprintf "%s%f," result x
        result.TrimEnd(',')
    let buildCsv (data : float[,]) =
        let mutable result = ""
        for r in 0..(data |> Array2D.length1) - 1 do
            let row = data.[r, *]
            let rowString = row |> buildLine
            result <- sprintf "%s%s%s" result rowString Environment.NewLine
        result
module New =
    // Code as in Old module above.
Listing 12-39

First cut of a CSV builder

Also change the Harness module in Program.fs to look like Listing 12-40.
module Harness =
    [<MemoryDiagnoser>]
    type Harness() =
        let data =
            Array2D.init 500 500 (fun x y ->
                x * y |> float)
        [<Benchmark>]
        member __.Old() =
            data
            |> NaiveStringBuilding.Old.buildCsv
            |> ignore
        [<Benchmark>]
        member __.New() =
            data
            |> NaiveStringBuilding.New.buildCsv
            |> ignore
Listing 12-40

Integrating CSV generation with benchmarking

In Listing 12-40, we generate a 500 x 500 element array: not exactly “big data,” but it’s still a quarter of a million elements, so will give our CSV builder a decent workout. (You can reduce the number of elements if the benchmarks run too slowly for you.) How does the naive, mutation-based solution shape up? (Listing 12-41).
| Method |    Mean |       Gen 0 |       Gen 1 |       Gen 2 | Allocated |
|------- |--------:|------------:|------------:|------------:|----------:|
|    Old | 1.027 s | 340000.0000 | 149000.0000 | 140000.0000 |      3 GB |
|    New | 1.032 s | 336000.0000 | 145000.0000 | 136000.0000 |      3 GB |
Listing 12-41

Results of naive CSV string building

This is not good. Building the CSV for our relatively modest 500 x 500 array takes over a second and allocates an astonishing 3GB of memory. There is garbage collection going on in all three generations. Imagine you’d put this code on a web server for, say, generating client downloads for scientific or banking customers. You would not be popular! The reason things are so bad is the level at which we are mutating things. Every time we do a result <- sprintf..., we are discarding the string object that was previously referred to by the label result (making it available for garbage collection in due course) and creating another string object. This means allocating and almost immediately freeing vast amounts of memory.

StringBuilder to the Rescue

The problems of string mutation aren’t unique to F#. There is a nice solution in .NET called System.Text.StringBuilder, which is designed to tackle exactly this kind of situation. Listing 12-42 shows how you can use it. The code doesn’t have to change much: the mutable result is replaced by a StringBuilder instance, and the actual mutation of result in buildLine is replaced by calling the string builder’s Append() method. (Confusingly, calling Append both does an in-place append and returns the StringBuilder instance, which is why we have to pipe its result into ignore.) In the buildCsv function, we use StringBuilder.AppendLine() to get the line breaks. Finally, we call the string builder’s ToString() method to get the built-up string.
    open System.Text
    let private buildLine (data : float[]) =
        let sb = StringBuilder()
        for x in data do
            sb.Append(sprintf "%f," x) |> ignore
        sb.ToString().TrimEnd(',')
    let buildCsv (data : float[,]) =
        let sb = StringBuilder()
        for r in 0..(data |> Array2D.length1) - 1 do
            let row = data.[r, *]
            let rowString = row |> buildLine
            sb.AppendLine(rowString) |> ignore
        sb.ToString()
Listing 12-42

Using StringBuilder for string concatenation

The results are impressive: a 14-fold speedup and nearly a 40-fold improvement in memory allocation (Listing 12-43).
| Method |        Mean |       Gen 0 |       Gen 1 |       Gen 2 | Allocated |
|------- |------------:|------------:|------------:|------------:|----------:|
|    Old | 1,036.89 ms | 341000.0000 | 150000.0000 | 141000.0000 |  3,127 MB |
|    New |    72.97 ms |   9857.1429 |   2000.0000 |   1000.0000 |     81 MB |
Listing 12-43

Result of using StringBuilder for string concatenation

Using String.Join

If we now focus on the buildLine function, we notice a few things about it that should make us a little unhappy:
  • It’s too much code for what must surely be a commonly required operation: joining a set of strings together with some separator at the joins.

  • At the end of the string building process, we have to go back and trim off the final separator.

It turns out that .NET offers a built-in function for doing pretty much all we want. String.Join takes a separator and an array of strings to join, so all we need to do before calling it is map the floats into strings in the required format (Listing 12-44).
    open System.Text
    let private buildLine (data : float[]) =
        let cols = data |> Array.map (sprintf "%f")
        String.Join(',', cols)
    let buildCsv (data : float[,]) =
        let sb = StringBuilder()
        for r in 0..(data |> Array2D.length1) - 1 do
            let row = data.[r, *]
            let rowString = row |> buildLine
            sb.AppendLine(rowString) |> ignore
        sb.ToString()
Listing 12-44

Using String.Join

This gives a further incremental improvement in performance and quite a good reduction in memory allocation (Listing 12-45).
| Method |        Mean |       Gen 0 |       Gen 1 |       Gen 2 | Allocated |
|------- |------------:|------------:|------------:|------------:|----------:|
|    Old | 1,026.21 ms | 339000.0000 | 148000.0000 | 139000.0000 |  3,127 MB |
|    New |    62.46 ms |   5750.0000 |   2125.0000 |    875.0000 |     47 MB |
Listing 12-45

Result of using String.Join

Using Array.Parallel.map

If we look again at Listing 12-44, we notice that we have an array mapping operation. With such operations, you can often speed things up by using Array.Parallel.map instead (Listing 12-46). Array.Parallel.map has the same type signature and observable behavior as Array.map, but the computations it specifies are done in parallel, spread across your available cores. Obviously, we don’t want to do this until we are convinced that the operation we are doing is itself reasonably efficient, but here it seems justified.
    open System.Text
    let private buildLine (data : float[]) =
        let cols = data |> Array.Parallel.map (sprintf "%f")
        String.Join(',', cols)
    let buildCsv (data : float[,]) =
        let sb = StringBuilder()
        for r in 0..(data |> Array2D.length1) - 1 do
            let row = data.[r, *]
            let rowString = row |> buildLine
            sb.AppendLine(rowString) |> ignore
        sb.ToString()
Listing 12-46

Using Array.Parallel.map

This brings us a considerable speed improvement, a great cost-benefit given the simplicity of the code change (Listing 12-47).
| Method |        Mean |       Gen 0 |       Gen 1 |       Gen 2 | Allocated |
|------- |------------:|------------:|------------:|------------:|----------:|
|    Old | 1,013.15 ms | 338000.0000 | 147000.0000 | 138000.0000 |  3,127 MB |
|    New |    29.77 ms |   6437.5000 |   2250.0000 |    968.7500 |     50 MB |
Listing 12-47

Result of using Array.Parallel.map

There are a couple of things to be aware of when using Array.Parallel.map. First, its impact will obviously be very dependent on the number of available cores. It’s not uncommon for cheaper cloud instances (e.g., on Microsoft Azure) to have fewer cores than a typical developer machine, so the in-production speedup may be disappointing. You may have to experiment with running your benchmarks on a cloud instance to clarify this. Second, try not to nest Array.Parallel operations. You will rapidly bump into the law of diminishing returns.
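As a quick sanity check (illustrative code, not part of the benchmark project), you can confirm both the core count the runtime actually sees and the claim above that Array.Parallel.map is observably equivalent to Array.map:

```fsharp
// The speedup available from Array.Parallel.map is bounded by this number,
// which may be much lower on a cheap cloud instance than on a dev machine.
printfn "Logical processors: %d" System.Environment.ProcessorCount

// Array.Parallel.map preserves ordering and results; only the scheduling
// of the work across cores differs from Array.map.
let input  = Array.init 1_000 float
let serial = input |> Array.map sqrt
let par    = input |> Array.Parallel.map sqrt
printfn "Same results: %b" (serial = par)
```

Because the observable behavior is identical, swapping between the two is a safe, low-risk change to benchmark.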

Using String Interpolation

If you are really concentrating, you may be wondering if other point optimizations in the buildLine function might squeeze out a little more performance:
  • Using Seq.map instead of Array.map.

  • Using String.Format instead of F#’s sprintf to format the floating-point values into strings:

    let cols = data |> Array.Parallel.map (fun x -> String.Format("{0}", x))

  • Using F# string interpolation.

Of these, I found that the only substantial improvement was by using F# string interpolation (Listing 12-48) (notice the $"{x}").
    open System.Text
    let private buildLine (data : float[]) =
        let cols = data |> Array.Parallel.map (fun x -> $"{x}")
        String.Join(',', cols)
    let buildCsv (data : float[,]) =
        let sb = StringBuilder()
        for r in 0..(data |> Array2D.length1) - 1 do
            let row = data.[r, *]
            let rowString = row |> buildLine
            sb.AppendLine(rowString) |> ignore
        sb.ToString()
Listing 12-48

Using string interpolation

This improved performance by a surprising 50% and memory allocation by two-fifths (Listing 12-49).
| Method |        Mean |       Gen 0 |       Gen 1 |       Gen 2 | Allocated |
|------- |------------:|------------:|------------:|------------:|----------:|
|    Old | 1,029.77 ms | 342000.0000 | 151000.0000 | 142000.0000 |  3,127 MB |
|    New |    14.26 ms |   4109.3750 |    625.0000 |    234.3750 |     32 MB |
Listing 12-49

Result of using string interpolation

Overall, we’ve achieved approximately a 72-fold improvement in performance and two orders of magnitude less memory allocation.

Naive String Building – Summary

Figure 12-4 shows a chart of the effects of our various changes.
Figure 12-4

Impact of various improvements to string building

The takeaways from this section are as follows:
  • Mutating string instances really means discarding and replacing the entire instance with another one. This is a disaster for performance.

  • The .NET StringBuilder class is optimized for exactly this requirement and can offer a huge speed and memory efficiency boost.

  • When joining strings together with a separator, String.Join gives good performance with minimal code.

  • Using Array.Parallel.map gives a great low-code-impact speed boost. Bear in mind the number of cores on the machine where the code will be running live. Nest Array.Parallel.map operations at your peril.

  • It’s well worth experimenting with alternatives to sprintf for formatting values. In this case, string interpolation halved our execution time.

Other Common Performance Issues

I should mention a few other common performance issues. We don’t have room for case studies for these, but a brief description should be enough to help you avoid them.

Searching Large Collections

If you find yourself repeatedly searching through large collections using Array.find, Array.tryFind, Array.contains, and so forth, consider making the collection an F# Map, a .NET Dictionary, or some other collection optimized for lookup.
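A minimal sketch of the idea – the Trade type and data here are invented purely for illustration:

```fsharp
// Hypothetical example: repeated lookups by Id in a large collection.
type Trade = { Id : int; Amount : float }

let trades = Array.init 100_000 (fun i -> { Id = i; Amount = float i * 1.5 })

// Repeated linear searches: O(n) per lookup.
let findSlow id = trades |> Array.tryFind (fun t -> t.Id = id)

// Build a Map once (an O(n log n) one-off cost), then each lookup is O(log n).
let tradesById = trades |> Array.map (fun t -> t.Id, t) |> Map.ofArray
let findFast id = tradesById |> Map.tryFind id
```

If mutability and hashing are acceptable, a System.Collections.Generic.Dictionary built once up front gives O(1) average-case lookups instead.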

Comparison Operators and DateTimes

If you need to do large numbers of less-than, greater-than, or equality comparisons with DateTime or DateTimeOffset instances, consider using DateTime.CompareTo or DateTimeOffset.CompareTo. At the time of writing, this works about five times faster (in a brief informal test) than =, >, >=, <, and <=.
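For example (a minimal sketch), an operator comparison can be replaced with a single CompareTo call:

```fsharp
open System

// CompareTo returns a negative number when the receiver is earlier, zero when
// equal, and a positive number when later - one call covers <, =, and >.
let a = DateTime(2022, 1, 1)
let b = DateTime(2022, 6, 1)

let earlierViaOperator = a < b               // convenient, but slower in bulk
let earlierViaCompare  = a.CompareTo(b) < 0  // same answer, cheaper per call
```

The gain only matters when you are doing very large numbers of comparisons, e.g., sorting or filtering big collections by timestamp.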

Concatenating Lists

The @ operator concatenates two F# lists. This is a relatively expensive operation, so you may want to avoid doing it in performance-critical code. Building up lists using the cons operator (::) is OK (assuming that a list is otherwise suitable for what you are doing) because that’s what linked lists are optimized for.
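As a sketch of the difference: prepending with :: is O(1) because the new cell simply points at the existing list, whereas @ must copy every cell of its left-hand operand:

```fsharp
// Prepending with :: is O(1): no copying, the tail is shared.
let xs = [2; 3; 4]
let withHead = 1 :: xs        // [1; 2; 3; 4]

// Concatenation with @ is O(length of the left list): ys is rebuilt cell by
// cell so its last cell can point at xs.
let ys = [1; 2]
let joined = ys @ xs          // [1; 2; 2; 3; 4]

// A common pattern: build in reverse with :: inside a fold, then flip once
// at the end, avoiding repeated @ in a loop.
let built =
    (List.empty, [1 .. 5])
    ||> List.fold (fun acc i -> i :: acc)
    |> List.rev               // [1; 2; 3; 4; 5]
```

This is why appending to the *end* of a list in a loop is quadratic, while the cons-then-reverse pattern stays linear.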

For-Loop with Unexpected List Creation

What’s the practical difference between these two lines of code? (Listing 12-50).
for i in 0..9999 do...
for i in [0..9999] do...
Listing 12-50

A right way and a wrong way to write a simple indexed for-loop

The first is a simple for-loop and is correct. The second instantiates a list with 10,000 elements and then iterates over that list. Even with identical loop bodies, the second performs much more slowly and uses more memory.

F# and Span Support

Be aware that F# 4.5 introduced support for the .NET type Span<T>. To quote MSDN Magazine:

System.Span<T> is a new value type at the heart of .NET. It enables the representation of contiguous regions of arbitrary memory, regardless of whether that memory is associated with a managed object, is provided by native code via interop, or is on the stack. And it does so while still providing safe access with performance characteristics like that of arrays.

Usage of Span is too low level an undertaking to go into detail here, but if you are really struggling for performance and you think working directly with a range of memory might help, you can do so using F#’s support for Span.
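As the briefest of sketches (not from this chapter’s case studies), a ReadOnlySpan lets you work on a slice of an array without allocating an intermediate sub-array:

```fsharp
open System

// Minimal sketch: sum a slice of an array through a ReadOnlySpan. Unlike
// arr.[1..3], which copies the slice into a new array, the span is just a
// view over the original memory.
let sumSlice (arr : int[]) start length =
    let span = ReadOnlySpan(arr, start, length)
    let mutable total = 0
    for i in 0 .. span.Length - 1 do
        total <- total + span.[i]
    total

let total = sumSlice [| 1; 2; 3; 4; 5 |] 1 3   // sums 2 + 3 + 4
```

Spans are byref-like value types, so the compiler restricts them to the stack – they can’t be captured in closures or stored in fields, which is part of what makes them safe and fast.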

The Importance of Tests

I’ve done all this benchmarking without having shown any tests to prove that the functions being tested still work correctly. This is simply to keep down the amount of code included in the book. In reality, it would be important to have passing tests before you started the optimization process – and to keep running them for each optimization pass. Broadly, your workflow should be like Figure 12-5. You might prefer to write at least some of the tests before writing the implementation, but my key point is they should be in place before you start optimizing.
Figure 12-5

Workflow for running tests and checking performance

You might shortcut the process a bit by doing several performance optimization passes before rerunning tests. But the important thing is that nothing gets included in your product that doesn’t both pass functional tests and have acceptable performance.

This is especially true when replacing pipelines of collection functions with more handcrafted logic. The lower the level of abstraction you are working at, the more potential there is for silly off-by-one errors and so forth: just the kinds of things that one avoids if one sticks to using collection functions.
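One cheap but effective safety net is an equivalence test that pins the optimized implementation to the known-good one on randomized input. This is only a sketch – the two implementations below are stand-ins for the chapter’s Old and New modules, whose test code isn’t shown in the book:

```fsharp
// Sketch: before trusting the benchmark numbers for an optimized version,
// check it against the known-good original on many randomized inputs.
let oldImpl (xs : int[]) = xs |> Array.map (fun x -> x * x)
let newImpl (xs : int[]) = xs |> Array.Parallel.map (fun x -> x * x)

let implementationsAgree =
    let r = System.Random(1)   // fixed seed so any failure is reproducible
    Seq.init 100 (fun _ -> Array.init 1_000 (fun _ -> r.Next(-50, 50)))
    |> Seq.forall (fun input -> oldImpl input = newImpl input)
```

The fixed seed matters: a failing case can then be reproduced and turned into a permanent regression test.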

Recommendations

Here are some lessons that are worth taking away from this chapter:
  • When performance is important, have a method for measuring it in a simple and repeatable way. BenchmarkDotNet is a good choice.

  • Be keenly aware of the performance characteristics of any collection types and functions you are using. Indexed access into F# lists and list concatenation with @ are traps that are particularly easy to fall into.

  • Instantiating and later destroying reference values (i.e., classes) have a cost. Be mindful of whether those objects need to exist – could a function do the work instead?

  • When instances are in a collection, the type of collection used can also affect memory behavior. Using sequence functions instead of concrete collection functions for intermediate steps in a pipeline can sometimes help (e.g., Seq.map instead of Array.map).

  • Although discussion of .NET performance often focuses on the memory footprint and life cycles of objects, other considerations, such as the choice of operators, can sometimes have a greater impact. Remember the impact of using ** vs. pown or simple multiplication.

  • Naive string building is a common source of performance problems. StringBuilder and String.Join can help. String interpolation can be faster than sprintf.

  • Array.Parallel.map can have a big impact on performance when multiple cores are available. Add it as a last step when you are sure the mapping function itself is efficient.

  • When dealing with DateTimes and DateTimeOffsets, CompareTo is currently faster than comparison operators such as <, >, and =.

  • Don’t use for x in [y..z] unless you really did intend to create a collection of values to iterate over. Omit the brackets.

  • You can get great improvements in performance without moving away from a functional, immutable style. Beware of microoptimizations that make your code less reliable, less maintainable, and less likely to benefit from future compiler or platform enhancements.

Summary

Optimizing F# code can be a pleasure rather than a chore, provided you set up good benchmarks and code with a degree of mechanical sympathy. Code a baseline version that works, bearing in mind the principles of motivational transparency and semantic focus. While you should avoid obvious howlers (like indexed access into F# lists), you shouldn’t worry overly much about performance during this step. Ensure tests and benchmarks are in place for this baseline version. Then tackle bottlenecks. You can often achieve improvements of several orders of magnitude without compromising the clarity and maintainability of your code.

Finally, I want to say a big thanks to the authors and maintainers of BenchmarkDotNet. It’s an awesome library, and we’ve only skimmed the surface of its capabilities here.

In the next chapter, we’ll move our focus back from the computer to the human and discuss how to use code layout and naming to maximize the readability and hence the revisability of our code.

Exercises

Exercise 12-1 – Concatenating Lists
You come across the following code, which adds some new transactions to an existing collection of transactions. It seems to be a bottleneck in your system.
type Transaction = { Id : int } // Would contain more fields in reality
let addTransactions
    (oldTransactions : Transaction list)
    (newTransactions : Transaction list) =
    oldTransactions @ newTransactions
let transactions1 = List.init 10_000_000 (fun i -> { Id = i})
let transactions2 = List.init 10_000_000 (fun i -> { Id = i+1_000_000})
let stopwatch = System.Diagnostics.Stopwatch.StartNew()
let allTransactions = addTransactions transactions1 transactions2
sprintf "That took %ims" stopwatch.ElapsedMilliseconds

Assuming that the old and new transaction collections don’t have to be F# lists, how could you speed up the system with minimal code changes?

Exercise 12-2 – Speeding Up Filtering
A colleague suggests that you could speed up the following code (from Listing 12-37) by mapping to the distance in parallel and then filtering. (At the time of writing, there is no Array.Parallel.filter function, which is why you’d have to map first.)
type Float3 = (float * float * float)
let withinRadius (radius : float) (here : Float3) (coords : Float3[]) =
    let distance x1 y1 z1 x2 y2 z2 =
        let dx = x1 - x2
        let dy = y1 - y2
        let dz = z1 - z2
        dx * dx +
        dy * dy +
        dz * dz
        |> sqrt
    let x1, y1, z1 = here
    coords
    // Original code:
    // |> Array.filter (fun (x2, y2, z2) ->
    //    distance x1 y1 z1 x2 y2 z2 <= radius)
    |> Array.Parallel.map (fun (x2, y2, z2) ->
        distance x1 y1 z1 x2 y2 z2)
    |> Array.filter (fun d -> d <= radius)
let r = Random(1)
let coords =
    Array.init 1_000_000 (fun _ ->
        r.NextDouble(), r.NextDouble(), r.NextDouble())
let here = (0., 0., 0.)
let stopwatch = System.Diagnostics.Stopwatch.StartNew()
let result =
    coords
    |> withinRadius 0.1 here
sprintf "That took %ims" stopwatch.ElapsedMilliseconds

Would you expect this to improve performance? Why/why not?

Exercise 12-3 – Changing the Approach to CSV Generation
How could you change the following code (originally from Listing 12-46) so that the entire 2D array is mapped into string representations of the numbers in one step and only then converted into CSV lines?
let buildLine (data : float[]) =
    let cols = data |> Array.Parallel.map (sprintf "%f")
    String.Join(',', cols)
let buildCsv (data : float[,]) =
    let sb = StringBuilder()
    for r in 0..(data |> Array2D.length1) - 1 do
        let row = data.[r, *]
        let rowString = row |> buildLine
        sb.AppendLine(rowString) |> ignore
    sb.ToString()
let data =
    Array2D.init 500 500 (fun x y ->
        x * y |> float)
let stopwatch = System.Diagnostics.Stopwatch.StartNew()
let csv =
    data
    |> buildCsv
    |> ignore
sprintf "That took %ims" stopwatch.ElapsedMilliseconds

What impact does doing this have on performance?

Hints :
  • You can get rid of the buildLine function in your new version.

  • Remember there is an Array2D module.

  • You won’t be able to work in parallel.

Exercise Solutions

Exercise 12-1 – Concatenating Lists
Concatenating lists with @ tends to be slow. Given that we are not required to use lists, it’s simple to replace them with arrays and to use Array.append to perform the joining. Depending on the point in your code at which you wanted to get the results, you could also experiment with using sequences.
type Transaction = { Id : int } // Would contain more fields in reality
let addTransactions
    (oldTransactions : Transaction[])
    (newTransactions : Transaction[]) =
    Array.append oldTransactions newTransactions
let transactions1 = Array.init 10_000_000 (fun i -> { Id = i})
let transactions2 = Array.init 10_000_000 (fun i -> { Id = i+1_000_000})
let stopwatch = System.Diagnostics.Stopwatch.StartNew()
let allTransactions = addTransactions transactions1 transactions2
sprintf "That took %ims" stopwatch.ElapsedMilliseconds
Exercise 12-2 – Speeding Up Filtering

Generally speaking, the suggested change would be slower. This is because the Array.Parallel.map operation creates a whole new array, which we then filter. Note also that the suggestion isn’t a drop-in replacement: it returns the qualifying distances rather than the coordinates, so it changes the function’s observable behavior.

Exercise 12-3 – Changing the Approach to CSV Generation
This can be achieved by doing an Array2D.map to generate the string representation of every array value and then iterating over the result row-wise, doing a String.Join and an AppendLine in a single line of code.
open System.Text
let buildCsv (data : float[,]) =
    let dataStrings =
        data |> Array2D.map (sprintf "%f")
    let sb = StringBuilder()
    for r in 0..(dataStrings |> Array2D.length1) - 1 do
        sb.AppendLine(String.Join(',', dataStrings.[r, *])) |> ignore
    sb.ToString()
let data =
    Array2D.init 500 500 (fun x y ->
        x * y |> float)
let stopwatch = System.Diagnostics.Stopwatch.StartNew()
let csv =
    data
    |> buildCsv
    |> ignore
sprintf "That took %ims" stopwatch.ElapsedMilliseconds

I found the performance results were considerably worse than the prior (Listing 12-46) version, taking 78ms when run in a notebook, rather than 38ms.

This is at least partly because we are no longer doing an Array.Parallel.map to generate the string representations. There is no Array2D.Parallel.map.
