© Robert Pickering and Kit Eason 2016

Robert Pickering and Kit Eason, Beginning F# 4.0, 10.1007/978-1-4842-1374-2_13

13. Type Providers

Robert Pickering and Kit Eason

(1)St. Germain-En-Laye, France

In this chapter, you will learn about F# type providers. Type providers let you bridge the gap between language and data —a gap you may not even have realized existed until you give type providers a try. You’ll learn to apply type providers to CSV files, HTML sources, and SQL Server databases. You’ll learn a bit about the potential pitfalls of type providers and how to mitigate them. Finally, you’ll learn how to use a query expression to combine the results of two type providers.

What Are Type Providers?

Type providers are a feature that is more or less unique to the F# language, and this fact makes them a little hard to explain in the abstract. Instead, let’s start by looking at a concrete example. Figure 13-1 shows the CSV type provider in action, allowing you to access the data fields in a CSV file about London Underground stations. See how the columns from the CSV file are available in IntelliSense, just as if you’d written a whole infrastructure of classes and methods to read the CSV file, parsed it into fields, and exposed the fields in objects? But you never had to write this infrastructure!

A340906_2_En_13_Fig1_HTML.jpg
Figure 13-1. Using the CSV type provider

You’ll go on to replicate this example yourself in the next section, but take a moment first to appreciate the elements in play here:

  • An external data source that has some kind of implicit or explicit schema. In this case, the data source is CSV, but given the appropriate type provider it could be XML, JSON, HTML, a SQL database, or even another language. (Yes, you can use type providers to access other languages! This is too advanced a topic for this book, but if you are interested you should look at the type provider for the R statistical programming language at http://bluemountaincapital.github.io/FSharpRProvider/ ).

  • A type provider for the kind of data source being used. Someone needs to have written that type provider; fortunately the open source community has produced type providers for most common scenarios. If there isn’t one in existence, you can actually write your own type provider. This is beyond the scope of this book but is a well-documented process.

  • Your own code, as shown in Figure 13-1. Here you can access fields (and sometimes methods) from the data source as if the data source had been encapsulated in classes.

You really only need to make an effort in the last of these steps, although that effort often amounts to simply pressing the period (.) key.

Using the CSV Type Provider

In this section, you use the CSV type provider to access data about the London Underground train system’s stations.

Note

The CSV files for this example are provided with the code downloads for this book.

Here’s an extract from the CSV:

Station,Line(s),Local Authority,Zone(s),Usage
Acton Town,District; Picadilly,Ealing,3,6.06
Aldgate,Metropolitan; Circle,City of London,1,7.22
Aldgate East,Hammersmith & City; Dictrict,Tower Hamlets,1,12.25
Alperton,Piccadilly,Brent,4,3.17

It’s fairly straightforward but there are also some subtleties. The column separator is obviously a comma, but there are two columns that can hold multiple values separated by semicolons: the Line(s) column and the Zone(s) column. The good news is that there is a column headers line at the beginning of the file.

It’s time to apply the CSV type provider to this dataset . Place the CSV file in a suitable location, such as c:DataUndergroundStations.csv. (If you use the data from this book’s code downloads, use the version in the file Stations_short.csv to begin with.)

Create a new F# library project , and use NuGet or your favorite package manager to add the package FSharp.Data to your project. You will then see the warning dialog in Figure 13-2. (The example shown is for Visual Studio but other environments will show similar dialogs.)

A340906_2_En_13_Fig2_HTML.jpg
Figure 13-2. Type provider security warning

This dialog is shown because type providers are powerful beasts, so, as indicated in the dialog, you should only download them from trustworthy sources. FSharp.Data from NuGet is trustworthy, so click the Enable button.

Rename Library1.fs to UndergroundCSV.fsand edit it so that it looks like the code in Listing 13-1.

Listing 13-1. Listing London Underground Stations from a CSV File
#if INTERACTIVE
// You may have to alter this path depending on the version
// of FSharp.Data downloaded and on you project structure
#r @"..packagesFSharp.Data.2.2.5lib et40FSharp.Data.dll"
#else
module UndergroundCSV
#endif


open FSharp.Data

type Stations = CsvProvider< @"c:DataUndergroundStations.csv",
                   HasHeaders=true>


let ListStations() =
    let stations = Stations.Load(@"c:DataUndergroundStations.csv")


    for station in stations.Rows do
        printfn "%s is in %s" station.Station station.``Local Authority``

Select all the code and send it to F# Interactive. Switch to the F# Interactive screen and type ListStations();;. You should see a list of London Underground stations and the Local Authority in which each falls.

> ListStations();;
Acton Town is in Ealing
Aldgate is in City of London
Aldgate East is in Tower Hamlets
Alperton is in Brent
>

What’s going on? First is a little section that, for F# Interactive only, references the FSharp.Data library . (In compiled code, this reference is provided at the project level.) You also open the FSharp.Data namespace for convenience.

#if INTERACTIVE
// You may have to alter this path depending on the version
// of FSharp.Data downloaded and on you project structure
#r @"..packagesFSharp.Data.2.2.5lib et40FSharp.Data.dll"
#else
module UndergroundCSV
#endif


open FSharp.Data

Then you create a new type called Stations, which looks like a type abbreviation for the CsvProvider type from FSharp.Data:

type Stations = CsvProvider< @"c:DataUndergroundStations.csv",
                  HasHeaders=true>

Note how you must give the path of a sample of the CSV data you want to read. You give this as a static generic parameter, hence the angle brackets, <>. Note also the space between the opening angle bracket and the following @. The @ is needed so that the backslashes in the following literal string aren’t interpreted as escape characters, but <@ in the F# syntax means “open code quotation.” So you use a space to separate the angle bracket and the @ sign, and thus prevent this misinterpretation.

Also in the angle brackets is HasHeaders=true. This tells the type provider that the CSV sample file you are providing has a header line giving column names.

Next is some code to actually use the data:

let ListStations() =
    let stations = Stations.Load(@"c:DataUndergroundStations.csv")

You call the Load method in your Stations type, providing the path of a CSV file as an ordinary function parameter value, and binding the result to a label called stations. In this case you use the same file for the sample and for the data, but in a production scenario you would likely use a small but representative sample at the earlier declaration stage and a larger but identically structured file when getting the actual data. The sample file will need to be available at compile time, so in cases where you are using a non-local build pipeline, it has to be checked in.

Then you iterate over the rows, printing out two of the properties:

for station in stations.Rows do
    printfn "%s is in %s" station.Station station.``Local Authority``

See how the type provider is clever enough to enclose property names in back-quotes in cases, such as `` Local Authority``, where the column name from the CSV file wouldn’t constitute a valid F# identifier, typically because it includes spaces.

To get a feel for how type providers interact with IntelliSense, try deleting the text Station in station.Station and pressing Ctrl+Spacebar (or whatever shortcut activates IntelliSense in your environment). You should see a list of properties, as shown in Figure 13-3. This list was derived from the column names in the first line of the sample CSV file you specified.

A340906_2_En_13_Fig3_HTML.jpg
Figure 13-3. IntelliSense from the CSV header

Now let’s look at some of the real-world complications you might encounter with the CSV type provider and how to deal with them. I mentioned earlier that some of the columns in the CSV contain potentially multiple values. One of these is the Zone column, because some Underground stations are in more than one ticketing zone. This isn’t apparent in the preceding examples because the stations used in the sample and the actual data all happened to have a single zone. Let’s see what happens when that isn’t the case. To set this up, make a copy of the Stations.csv file and call the copy StationsSample.csv. Edit the Stations.csv file (but not StationsSample.csv) to include the following additional station:

Archway,Northern,Islington,2;3,8.94

As you can see, Archway is in zones 2 and 3. Now edit your F# source so that it uses the sample file (StationsSample.csv) when setting up the type provider and the file including Archway (Stations.csv) as the data file. Also, change it to print out Zone as an integer (Listing 13-2).

Listing 13-2. Printing Out Zone Using Separate Sample and Data Files
open FSharp.Data

type Stations = CsvProvider< @"c:DataUndergroundStationsSample.csv",
                  HasHeaders=true>


let ListStations() =
    let stations = Stations.Load(@"c:DataUndergroundStations.csv")


    for station in stations.Rows do
        printfn "%s is in %i" station.Station station.Zone

This compiles absolutely fine. The type provider inferred that Zone is an integer because all the values in the sample were integers. (Actually, if you’re super observant, you might notice that it infers that the Zone is an integer with a unit-of-measure of seconds. This is because the column title was Zone(s), and the type provider assumed that (s) meant “seconds.”) Now try sending the amended code to F# Interactive and running ListStations().

As you might expect, you get an error trying to print out the new station:

> ListStations();;
Acton Town is in 3
Aldgate is in 1
Aldgate East is in 1
Alperton is in 4
System.Exception: Couldn't parse row 5 according to schema: Expecting Int32 in Zone, got 2;3
>

There are several approaches you can take to fix this. One is to add “Archway” to the sample file, and more generally to ensure that the sample file contains examples of every kind of data item that might be in the data at runtime. (These examples will have to appear in the first 1000 rows because that is the size of sample that the type provider uses by default.) Another is to coerce the types of any columns that might cause problems. To force the type of the Zone(s) column, simply edit your Stations declaration to include a Schema assignment, as shown in Listing 13-3.

Listing 13-3. Forcing the Type of a Column
type Stations = CsvProvider< @"c:DataUndergroundStationsSample.csv",
                             HasHeaders=true,
                             Schema="Zone(s)=string">

As soon as you do this, you should get an error in your printfn statement because the format string you used was treating Zone as a string. (And also because the column is now called Zone(s) because there is no longer an assumption that (s) stands for seconds.) Edit the print statement to fix this, and you should find that the ListStations() function runs reliably:

for station in stations.Rows do
    printfn "%s is in %s" station.Station station.``Zone(s)``
> ListStations();;
Acton Town is in 3
Aldgate is in 1
Aldgate East is in 1
Alperton is in 4
Archway is in 2;3
>
Note

The CSV type provider contains an incredibly rich set of features for specifying the types and optionality of columns, for the treatment of missing data, and so on. You can also override the separator so, for example, files with tab or | separators can be treated in the same way as comma-separated files. These settings are documented on the type provider’s GitHub page at https://fsharp.github.io/FSharp.Data/library/CsvProvider.html .

Using the HTML Type Provider

In this section, you’ll learn how to use the HTML type provider. You’ll return to the London Underground stations example, but this time you’ll get the data from a Wikipedia page , which means accessing HTML. If you visit the Wikipedia page called “List of London Underground Stations” at https://en.wikipedia.org/wiki/List_of_London_Underground_stations , you’ll see that it contains a nicely-curated list of stations with all sorts of useful information. But how easy is it to access this information programmatically, so that you can rank the stations by traffic, find all the stations on the Northern line, or answer trivia questions like which station has the most distinct letters in its name?

Once again, start by creating an F# Library project and adding FSharp.Data. Then add the code in Listing 13-4.

Listing 13-4. Accessing Wikipedia Information on London Underground Stations
#if INTERACTIVE
// You may have to alter this path depending on the version
// of FSharp.Data downloaded and on you project structure
#r @"..packagesFSharp.Data.2.2.5lib et40FSharp.Data.dll"
#else
module UndergroundHTML
#endif


open FSharp.Data

type Stations = HtmlProvider< @"https://en.wikipedia.org/wiki/List_of_London_Underground_stations">

let ListStations() =
    let stations = Stations.Load(@"https://en.wikipedia.org/wiki/List_of_London_Underground_stations")


    let list = stations.Tables.Stations
    for station in list.Rows do
        printfn "%s is in %s" station.Station station.``Zone(s)[†]``

Note how close in structure this version is to the CSV example in Listing 13-1. Generally, F# type provider usage always follows this pattern: a setup phase that looks like a type alias, but with a static parameter indicating some location where a sample or schema information can be obtained (and maybe other static parameters specifying options); then the runtime initialization with real data; and finally the use of that data. So similar is this to the CSV example that you really only need to pull out one line for further examination:

let list = stations.Tables.Stations

This is where you specify the specific table that you want from the HTML: the table named Stations. Once again you see how a type provider can blur the distinction between data and code : you didn’t have to say something like stations.Tables("Stations"). The Stations property is right there in IntelliSense.

You’d better check that the example works! Send all the code to F# Interactive and run ListStations():

> ListStations();;
Acton Town is in 3
Aldgate is in 1
Aldgate East is in 1
Alperton is in 4
Amersham is in 9
Angel is in 1
Archway is in 2.1 !2&3
...
Woodford is in 4
Woodside Park is in 4
>

That was pretty successful! However, if you look at the entry for Archway , you’ll see that the Zones value is shown as 2.1 !2&3. This is because the table in Wikipedia contains two elements in this cell: a sort key to facilitate correct sorting by zone (this is where the 2.1 ! comes from), and the actual zone information. Fortunately, with a little extra code you can tidy this up (see Listing 13-5).

Listing 13-5. Tidying Up the Zones Column
for station in list.Rows do

    let zones =
        match station.``Zone(s)[†]`` |> Seq.tryFindIndex ((=) '!') with
        | Some(idx) -> station.``Zone(s)[†]``.[idx+1 ..]
        | None -> station.``Zone(s)[†]``


    printfn "%s is in %s" station.Station zones
Note

I am indebted to Evelina Gabasova for the code in Listing 13-5. Her blog at http://evelinag.com/ is a must for all students of F#.

Here you are binding a new value called zones. To make a value for zones, you start by treating the raw value from the table as a sequence of characters, and trying to find an exclamation point. If one is found, you get all the characters after the index position of the exclamation point. If no exclamation point is found, you use the original string. Then you use your cleaned-up zones value in the printfn statement. This produces a nicer value for Archway and other station’s zones:

Archway is in 2&3

In real life, you might go further and parse the &-separated values into a little array.

Answering Some Questions with the HTML Type Provider

So how about those challenges we mentioned at the beginning of this section? Once you have the data from the CSV type provider, it’s just a matter of using the appropriate F# language constructs to select and manipulate the data as required.

Rank the Stations by Traffic

To rank the stations by passenger traffic (the Usage column) you need to get the usage as a floating point value. Unfortunately, you hit the same problem as earlier for ticketing zones: the Wikipedia table has some additional hidden information in that column, which appears in the text retrieved by the type provider. This time you have both sorting information (in the form 7001523100000000000♠), and for some columns, a note in square brackets. For example, the usage column for Bank Station contains the string 700152310000000000052.31 [note 3]. This means you have to beef up your clean-up code to get the substring after any heart symbol and before any [ symbol, and then try to convert that to a floating point value. Then you sort by the resulting value. This code is shown in Listing 13-6.

Listing 13-6. Stations by Usage
let StationsByUsage() =
    let getAfter (c : char) (s : string) =
        match s |> Seq.tryFindIndex ((=) c) with
            | Some(idx) -> s.[idx+1 ..]
            | None -> s


    let getBefore (c : char) (s : string) =
        match s |> Seq.tryFindIndex ((=) c) with
            | Some(idx) -> s.[.. idx-1]
            | None -> s


    let floatOrZero (s : string) =
        let ok, f = System.Double.TryParse(s)
        if ok then f else 0.


    let usageAsFloat (s : string) =
        s |> getAfter '♠' |> getBefore '[' |> floatOrZero


    let stations =
        Stations.Load(@"https://en.wikipedia.org/wiki/List_of_London_Underground_stations")


    let list = stations.Tables.Stations

    let ranked =  
        list.Rows
        |> Array.map (fun s -> s.Station, s.``Usage[5]`` |> usageAsFloat)
        |> Array.sortByDescending snd


    for station, usage in ranked do
        printfn "%s, %f" station usage

Finding All of the Stations on the Northern Line

Finding all the stations on a particular line is a bit easier. You just need to filter for stations whose Line(s) column contains the name in question (see Listing 13-7). If you want perfect safety, you need to be a bit more careful about parsing the Line(s) column into separate lines, but the code works in practice.

Listing 13-7. Stations on a Specific Underground Line
let StationsOnLine (lineName : string) =
    let stations =
        Stations.Load( @"https://en.wikipedia.org/wiki/List_of_London_Underground_stations")


    let list = stations.Tables.Stations

    let ranked =  
        list.Rows
        |> Array.filter (fun s -> s.``Line(s)[*]``.Contains lineName)


    for station in ranked do
        printfn "%s, %s" station.Station station.``Line(s)[*]``

Which Station Has the Most Distinct Letters in Its Name?

This is a matter of performing a descending sort by a count of the distinct letters in the station name, and getting the first resulting value. In F#, strings can be treated as sequences of characters, so you can directly call functions in the Seq module to do the work. See Listing 13-8.

Listing 13-8. The Station with the Most Distinct Letters
let StationWithMostLetters() =
    let stations =
        Stations.Load( @"https://en.wikipedia.org/wiki/List_of_London_Underground_stations")


    let list = stations.Tables.Stations

    let byChars =  
        list.Rows
        |> Array.sortByDescending
            (fun s -> s.Station |> Seq.distinct |> Seq.length)


    byChars.[0]

Using the SQL Client Type Provider

In this section, you’ll learn how to use the SQL Client type provider, which gives you frictionless, strongly typed access to SQL Server databases. In many scenarios, this type provider is a great alternative to heavyweight solutions like Entity Framework.

Note

To use this example, you need to install SQL Server (the Express edition is suitable, but you can also use the full versions) or have access to SQL Server on another machine. You’ll also need the AdventureWorks sample database or some other interesting dataset. You can readily adapt the code in this example to your own data.

If you need the AdventureWorks sample, you can find it at http://msftdbprodsamples.codeplex.com/ .

To get started, create a new F# Library project, and add a reference to System.Data. Use NuGet or your favorite package manager to install the package FSharp.Data.SqlClient. Rename Library.fs to AdventureWorks.fs and add the code in Listing 13-9. Edit the connectionString value to reflect the requirements of the SQL Server installation to which you are connecting.

Listing 13-9. Using the SQL Client Type Provider
#if INTERACTIVE
// You may have to alter this path depending on the version
// of FSharp.Data.SqlClient downloaded and on your project structure
#r @"..packagesFSharp.Data.SqlClient.1.8lib et40FSharp.Data.SqlClient.dll"
#else
module AdventureWorks
#endif


open FSharp.Data

[<Literal>]
let connectionString =
    @"Data Source=.SQLEXPRESS;Initial Catalog=AdventureWorks2014;Integrated Security=True"


let FindPeople surnameWildCard =
    use cmd = new SqlCommandProvider<"""
        SELECT
            BusinessEntityId,
            Title,
            FirstName,
            LastName
        FROM
            Person.Person
        WHERE
            Person.LastName LIKE @surnameWildCard
        ORDER BY
            LastName,
            FirstName
        """ , connectionString>()


    cmd.Execute(surnameWildCard = surnameWildCard)

Try the code by selecting all and sending to F# Interactive. Then switch to the F# Interactive window and run the function by typing FindPeople "%sen";; in the window, like so:

> FindPeople "%sen";;
val it :
  System.Collections.Generic.IEnumerable<SqlCommandProvider<...>.Record> =
  seq
    [{ BusinessEntityId = 12548; Title = None; FirstName = "Alejandro"; LastName = "Andersen" }
       {Item = ?;};
     { BusinessEntityId = 19795; Title = None; FirstName = "Alicia"; LastName = "Andersen" }
       {Item = ?;};
     { BusinessEntityId = 12018; Title = None; FirstName = "Alisha"; LastName = "Andersen" }
       {Item = ?;};
     { BusinessEntityId = 16657; Title = None; FirstName = "Alison"; LastName = "Andersen" }
       {Item = ?;}; ...]
>

As you can see, the function returns a sequence of records, and each record has a field that comes straight from the corresponding database column. Also note how the optional field Title is mapped as an option type, so it gets the value None when the underlying database row contains a database NULL value for that row.

There are some other points to note about this particular type provider. It doesn’t provide you with direct IntelliSense-style support while you write your SQL. Nor does it try to “abstract away” SQL. The benefit of this is that you get the full richness of the SQL language in all its raw power. Also, you are free to write the most efficient queries possible, something which is always compromised to some extent when SQL is abstracted away.

Having said this, you do get feedback about whether your SQL is valid as you type, provided that you are using an environment like Visual Studio, which offers “red wiggly line” support for language syntax errors . Prove this by editing the SQL from Listing 13-9 to introduce a deliberate error. After a short delay you’ll see a red wiggly line below all your SQL code (Figure 13-4). If you hover the cursor above any of the highlighted code, you’ll see details of the error.

A340906_2_En_13_Fig4_HTML.jpg
Figure 13-4. Errors in SQL are highlighted

This is usually enough information to home in on the mistake.

Another thing to note is that any SQL variables you include (such as @surnameWildCard in this example) are automatically picked up by the type provider and become required parameters of the Execute method:

cmd.Execute(surnameWildCard = surnameWildCard)

If you activate IntelliSense for Execute in this code (in Visual Studio, place the cursor just after the opening bracket and press Ctrl+Shift+Space), you’ll see a tool tip indicating that Execute takes a string parameter called surnameWildCard (Figure 13-5). The parameter list will update every time you add, rename, or remove SQL @ variables in your SQL code. Amazing!

A340906_2_En_13_Fig5_HTML.jpg
Figure 13-5. A parameter automatically generated from an SQL variable

Now take a moment to explore the implications of the fact that nullable database columns are exposed by the type provider as option types. Add the code from Listing 13-10 below your existing code.

Listing 13-10. Attempting to List People Separated by Tabs
let ShowPeople =
    FindPeople
    >> Seq.iter (fun person ->
        printfn "%s %s %s"
            person.Title person.FirstName person.LastName)

Here you are attempting to write a function that calls FindPeople and then lists out some fields from the result in a tab-separated form. (Incidentally, you partially apply FindPeople and use function composition via the >> operator, meaning that you don’t have to repeat the surnameWildCard parameter in this function.) As soon as you paste this code, you should get an error because the format string is expecting person.Title to be a string when in fact it is an Option<string>. This doesn’t happen for the FirstName and LastName fields because these are mandatory in the database, and so can be mapped directly to strings.

You can fix this by providing a little function that maps from an Option<string> to either the underlying string if the input is Some, or to an empty string if it is None. Listing 13-11 shows such a function, named as an operator (∼∼) so that it appears super-lightweight at the call site.

Listing 13-11. Dealing with Optional Strings
let (∼∼) (s : string option) =
    match s with
    | Some s -> s
    | None -> ""

Paste this into your code above ShowPeople, and amend ShowPeople so that it uses the ∼∼ operator for the person.Title field:

printfn "%s	%s	%s"
   ∼∼person.Title person.FirstName person.LastName

Send all your code to F# Interactive and try it out by calling ShowPeople:

> ShowPeople("%sen");;
        Alejandro     Andersen
        Alicia        Andersen
        Alisha        Andersen
        Alison        Andersen
        Alvin         Andersen
        ...
Ms.     Dorothy       Wollesen
>

Joining Datasets from Differing Data Sources

A common scenario, particularly in reporting and data cleansing situations, is the need to combine datasets from completely different sources. You can use F# type providers and F# query expressions to do this simply and effectively.

Let’s imagine you are doing a data cleansing exercise on the data in the AdventureWorks2014 database . You need to see any country names in Person.CountryRegion where the country names don’t match up with a reference list, and take a look at any notes that might indicate that a country code has been reassigned or is otherwise unusual. You can do this using the HTML type provider to get the country code’s reference information from Wikipedia, the SQL Client type provider to get the data to be checked from the database, and an F# query expression to combine them.

Start by creating a new F# library project; rename Library1.fs to VerifyCountryCodes.fs. Use NuGet or your favorite package manager to add the FSharp.Data and FSharp.Data.SqlClient packages. Then add setup information for your Wikipedia and SQL connections, much like you did in the preceding exercises for the HTML and SQL Client type providers (Listing 13-12).

Listing 13-12. Setting up the HTML and SQL Client Type Providers
#if INTERACTIVE
// You may have to alter this path depending on the version
// of FSharp.Data downloaded and on you project structure
#r @"..packagesFSharp.Data.2.2.5lib et40FSharp.Data.dll"
#r @"..packagesFSharp.Data.SqlClient.1.8lib et40FSharp.Data.SqlClient.dll"
#else
module VerifyCountryCodes
#endif


open FSharp.Data

[<Literal>]
let url = "https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2"
[<Literal>]
let connectionString =
    @"Data Source=.SQLEXPRESS;Initial Catalog=AdventureWorks2014;Integrated Security=True"


type WikipediaPage =
    HtmlProvider<url>

Now make some record types to hold the fields that you want to get from each source and the fields you want in the final report (Listing 13-13).

Listing 13-13. Record Types for Data Sources and the Final Report
type WikiCode =
    {
        Code : string
        Name : string
        Notes : string
    }


type DbCode =
    {
        Code : string
        Name : string
    }


type ReportItem =
    {
        Code : string
        WikiName : string
        DBName : string
        Notes : string
    }

Now you need a function that will get the list of codes from the relevant Wikipedia page (Listing 13-14). Note how, as before, you use page.Tables.<Table Name> to get all the rows from a named table within the page. Then you iterate over the rows, creating a WikiCode instance for each row and returning the results as an F# list.

Listing 13-14. Getting County Codes from Wikipedia
let CountryCodesWikipedia() =
    let page = WikipediaPage.Load(url)


    let codes = page.Tables.``Officially assigned code elements``

    [ for row in codes.Rows ->
        { WikiCode.Code = row.Code
          Name = row.``Country name``
          Notes = row.Notes } ]

If you like, you can now send this code to F# Interactive and check that the CountryCodesWikipedia function works as expected.

> CountryCodesWikipedia();;
val it : WikiCode list =
  [{Code = "AD";
    Name = "Andorra";
    Notes = "";};
...

Now you need a function to get country codes and names from the database. This is very similar to CountryCodesWikipedia except that you use the SQL Client type provider (Listing 13-15).

Listing 13-15. Getting Country Codes from the Database
let CountryCodesDatabase() =
    use cmd = new SqlCommandProvider<"""
        SELECT
                CountryRegionCode,
                Name
        FROM
                Person.CountryRegion
        ORDER BY
                CountryRegionCode
        """ , connectionString>()


    let data = cmd.Execute()

    [ for row in data ->
        { DbCode.Code = row.CountryRegionCode
          Name = row.Name } ]

Once again you can send this to F# Interactive and give it a try:

> CountryCodesDatabase();;
val it : DbCode list =
  [{Code = "AD";
    Name = "Andorra";};
...

Now you need to tie these two data sources together, and filter for suspicious results, which in this scenario are any codes with notes on Wikipedia and any codes whose country names differ between Wikipedia and the AdventureWorks2014 database. A good way of joining IEnumerable datasets is F#’s query expression syntax, as shown in Listing 13-16.

Listing 13-16. Using a Query Expression to Join Two Datasets
let Report() =
    query {
        for dbc in CountryCodesDatabase() do
        join wkc in CountryCodesWikipedia() on
            (dbc.Code = wkc.Code)
        where (wkc.Notes <> "" || wkc.Name <> dbc.Name)
        sortBy (wkc.Code)
        select
            {
                ReportItem.Code = wkc.Code
                WikiName = wkc.Name
                DBName = dbc.Name
                Notes = wkc.Notes
            }
    }
    |> Array.ofSeq

The main logic is enclosed in query {}, which is a computation expression like seq {} in which certain keywords and constructs have a special meaning. In the case of query {}, these are keywords that let you get, join, filter, and project data from IEnumerables. You start by calling CountryCodesDatabaseand defining a label dbc for items from IEnumerable:

for dbc in CountryCodesDatabase() do

Then you call CountryCodesWikipedia(), stating that you want to join the Wikipedia results to the database results where the two match on code:

join wkc in CountryCodesWikipedia() on
    (dbc.Code = wkc.Code)

Next, you filter the results for items that are suspicious according to the required logic: different country codes or notes in Wikipedia. You also sort the results by code:

where (wkc.Notes <> "" || wkc.Name <> dbc.Name)
sortBy (wkc.Code)

Finally, within the query expression you project the results into instances of your ReportItem record type :

select
    {
        ReportItem.Code = wkc.Code
        WikiName = wkc.Name
        DBName = dbc.Name
        Notes = wkc.Notes
    }

All that remains is to convert the results from an F# sequence to an array. The advantages of doing this are that it prevents any laziness from leaking into the wider scope, it’s friendlier to other calling languages, and it also makes it easier to see the full results in F# Interactive.

Send all your code to F# Interactive and run it by calling Report(). You get a nice dataset that can be used as a great basis for reviewing the validity of the country codes and names in the database.

> Report();;
val it : ReportItem [] =
  [|{Code = "AI";
     WikiName = "Anguilla";
     DBName = "Anguilla";
     Notes = "AI previously represented French Afar and Issas";};
    {Code = "AQ";
     WikiName = "Antarctica";
     DBName = "Antarctica";
     Notes =
      "Covers the territories south of 60° south latitude Code taken from name in French: Antarctique";};
...
    {Code = "VN";
     WikiName = "Viet Nam";
     DBName = "Vietnam";
     Notes = "ISO country name follows UN designation (common name: Vietnam)";};
...

Summary

In this chapter, you learned about the concept of type providers, and how that concept plays out in the CSV type provider, the HTML type provider, and the SQL Client type provider. Type providers yield an amazing productivity gain in a wide variety of situations where you are required to access external, structured data .

You also learned about F# query expressions , and how to apply them to join datasets coming from two different type providers into a single stream of data.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.227.79.241