© Robert Pickering and Kit Eason 2016

Robert Pickering and Kit Eason, Beginning F# 4.0, 10.1007/978-1-4842-1374-2_11

11. Language-Oriented Programming

Robert Pickering and Kit Eason

(1)St. Germain-En-Laye, France

In this chapter, you will begin by taking a look at what I mean by language-oriented programming, a term that has been used by many people to mean different things. I’ll also briefly discuss its advantages and disadvantages. Next, you’ll look at several different approaches to language-oriented programming in F#. These techniques include using F# literals to create little languages and using F# quotations. You’ll spend the bulk of this chapter looking at examples where you create a language and then create an interpreter to execute that language. Finally, you’ll take a more detailed look at how languages are executed, including a performance comparison of interpreted or compiled execution techniques.

What Is Language-Oriented Programming?

People use the term language-oriented programming to describe many different programming techniques , but the techniques they refer to tend to share a common theme. It’s quite common for programmers to have to implement a predefined language; often this is because you need to extract structured data from information stored or received as string or XML that conforms to this predefined language. The techniques introduced in this chapter will help you do this more reliably. Related to this is the idea of little languages, or domain-specific languages (DSLs); you might want to create a DSL when the best way to solve a problem is to create a custom language to describe the problem and then use this language to solve that problem. Note that you don’t have to create an entire, compilable programming language to implement a DSL. A DSL is simply a set of labels, data structures, and functions that you can use in a language-like way from your “normal” code. Functional programming has always had a strong relationship with language-oriented programming because functional programming languages generally have features that are well suited to creating parsers and compilers.

Data Structures as Little Languages

Taking advantage of language-oriented development doesn’t necessarily mean writing your own parser or compiler. You can accomplish a lot by creating data structures that describe what you want to do and then creating functions or modules that define how the structure should be interpreted. You can create data structures that represent a program in just about any language, but F# lends itself well to this approach. F#’s literal lists and arrays are easy to define and require no bulky type annotations. Its discriminated union types allow you to create structures that express concepts that are related, but which do not necessarily contain the same types of data. You can take advantage of this to build tree-like structures, which prove useful when creating languages. Finally, you can treat functions as values, so you can easily embed functions within data structures. This means F# expressions can become part of your language, usually as an action in response to some particular condition of the language.

Let’s start by taking a look at a beautiful and very practical library that lets you write your own tiny DSL for parsing command line arguments. It’s called Argu(formerly UnionArgParser) and you can install it using NuGet or your favorite package manager. Let’s imagine that you want to create a command line application that can list the contents of a folder in CSV format, allowing the user to specify (using command line parameters) which information about each file to display, and what separator to use (comma, tab, and so forth). Even for this relatively simple requirement, parsing the command line into flags you can use to control the behavior of your program would be a pain. But solving the problem by thinking of the arguments as a DSL makes it super simple.

Start by creating an F# command line application called DirList and install Argu. At the beginning of Program.fs, replace everything before the main function and its EntryPoint attribute with the code from Listing 11-1.

Listing 11-1. A DSL as a Discriminated Union
open Argu

type Arguments =
| Dir of string
| Sep of string
| Date of bool
| Size of bool
with
    interface IArgParserTemplate with
        member s.Usage =
            match s with
            | Dir _ -> "specify a directory to list."
            | Sep _ -> "specify a separator."
            | Date _ -> "specify whether to include the date."
            | Size _ -> "specify whether to include the size."

This is a classic way of getting started with a DSL : defining a discriminated union that indicates the main keywords of the language and the argument types they take. In this case, you are saying “the arguments can contain a directory name that is a string; a CSV separator that is a string; a date flag that is a bool;” and so forth. When getting started in this way, you firmly avoid thinking about how you are going to implement the DSL; you just declare its shape.

In the latter part of Listing 11-1, you also implement an interface called IArgParserTemplate, whose Usage member will come in handy when telling the user what arguments are available. Binding the DSL keywords directly with their documentation is where you start getting the benefits of the Argulibrary.

Now let’s think about how you are going to implement the actual listing of the files. Listing 11-2 shows a simple function that, given the appropriate arguments, will list the files in the specified way.

Listing 11-2. A Function to List Files in a CSV-Friendly Style
open System.IO

let ListFiles
    (directory : string) (sep : string)
    (includeDate : bool) (includeSize : bool) =
    directory
    |> Directory.EnumerateFiles
    // If you are limited to F# 3.x you will have to replace the following line with
    // |> Seq.map (fun name -> FileInfo name)
    |> Seq.map FileInfo
    |> Seq.iter (fun info ->
        printfn "%s%s%s"
            info.Name
            (if includeDate then
                sprintf "%s%s" sep (info.LastWriteTime.ToString())
             else
                "")
            (if includeSize then
                sprintf "%s%s" sep (info.Length.ToString())
             else
                "")
    )

This code performs a Directory.EnumerateFilesto list the names of files; maps those names into FileInfo instances using Seq.map; then iterates over the results, printing out the required columns, separated by the required separators.

You haven’t joined this function onto the parsed arguments yet, but you can still send the function to F# Interactive and execute it manually as a simple exploratory test .

>

val ListFiles :
  directory:string ->
    sep:string -> includeDate:bool -> includeSize:bool -> unit


> ListFiles @"c: emp" "," true true;;
akkaissue.png,01/06/2015 09:03:26,24917
animals.xml,08/09/2015 07:00:03,81
archie.png,19/10/2015 12:02:49,41092
asyncdemo.txt,27/01/2015 09:09:04,41
bike.jpg,28/02/2015 21:17:17,20953
bootstrap.sh,26/10/2015 15:36:21,47
bootstrap3.sh,10/11/2015 09:18:30,167
cluster.txt,08/12/2015 11:38:14,17
createuat.sql,24/09/2015 08:33:31,2341
CreatingAnArray.fs,18/04/2014 19:23:38,1269

At this point, you’ve got both ends of the pipeline defined. Next, you need to do the bit in the middle: mapping from the command line arguments the user typed to the parameters the ListFiles function requires. Here’s where you need Argu magic! Place the contents of Listing 11-3 into Program.fs, replacing the existing main function.

Listing 11-3. Linking Your Arguments DSL with the ListFiles Function
[<EntryPoint>]
let main argv =
    let parser = ArgumentParser.Create<Arguments>()
    let args =
        try
            parser.Parse argv |> Some
        with
        | _ ->
            printfn "Usage: %s" (parser.Usage())
            None
    match args with
    | Some a ->
        let dir = a.GetResult(<@ Dir @>, defaultValue = ".")
        let incDate = a.GetResult(<@ Date @>, defaultValue = false)
        let incSize = a.GetResult(<@ Size @>, defaultValue = false)
        let sep = a.GetResult(<@ Sep @>, defaultValue = ",")
        ListFiles dir sep incDate incSize
        0
    | None -> 1

Here you create a parser using ArgumentParser.Createand, crucially, you provide the Arguments discriminated union as a type argument. You are telling Arguthe definition of your tiny DSL, so it knows what elements it is expected to parse from the argument string you will give it later.

Next (and in a try block in case of invalid arguments) you call parser.Parse, providing the contents of argv. In case you didn’t know, argv is a parameter that is automatically populated for you by the runtime environment, with an array of strings representing the arguments that the user typed in the command line. The command line string is just split up on spaces; there is no intelligence at this stage regarding which items are flag names (e.g. –date) and which are values (e.g. true).

Because it’s possible that the arguments couldn’t be parsed, you handle the resulting exception by returning a None. This means that on success you must pass the output of parser.Parse to Some.

If the arguments were successfully parsed, you can start to access the parameter values. Argu lets you do that using the F# Quotation mechanism, which you first explored in Chapter 6. All you have to do is pass in the discriminated union case (in this example, Dir), which represents the argument whose value you want to get:

let dir = a.GetResult(<@ Dir @>, defaultValue = ".")

See how you can also specify a default value . This will be used if the relevant argument doesn’t appear in the command line.

Now that you have all of the argument values, you can finally call your ListFiles function . You return 0 for success or 1 for failure, in case your program is being used a part of a larger pipeline that needs to know if the file listing succeeded.

On Windows, you can try your file list out by compiling it, opening a command line window, and navigating to the directory where your compiler outputs its executable. Then type something like this, substituting in whatever directory path is appropriate in your environment):

dirlist --dir c:	emp
C:DirListin>dirlist --dir c: emp
akkaissue.png
animals.xml
archie.png
asyncdemo.txt
bike.jpg
bootstrap.sh
bootstrap3.sh
cluster.txt
createuat.sql
CreatingAnArray.fs

On Linux and OS X, you need to run a console window and then execute the following commands, varying the path in the cd command to suit where you created the project:

cd ∼/Code/DirList/DirList/bin/Debug
chmod 744  DirList.exe
./DirList.exe --dir ∼

This should show you a list of the files in your home directory.

Now experiment with the parameters (if you can’t remember then, type dirlist –help). For example,

dirlist --dir c:	emp --date true --size true

On Linux and OS X this would be

./DirList.exe --dir ∼ --date true --size true

Also, see what happens when you type incorrect arguments, such as:

dirlist --foo bar

I am particularly fond of this kind of DSL because I think it makes it clear what arguments the program is expecting and what processing should take place if that argument is received. The fact that the help text is bound more or less directly to the union cases serves a double purpose; it allows the function processing the command line arguments to print out help text automatically if anything goes wrong, and it also reminds you what the argument is in case you forget. I also like this method of creating a command line interpreter because I have written several command line interpreters in imperative languages, and it is not a satisfying experience—you end up having to write lots of code to detail how your command line should be broken up. If you write that code in a traditional imperative way, then you usually spend way too much time calling the string type’s IndexOfand Substring methods .

A Data Structure–Based Language Implementation

Creating any DSL should start with defining what problem you need to solve; in this case, let’s imagine that you need to define a DSL library (sometimes called a combinators library) for drawing 2D images . This is something of an obvious choice. This example demonstrates how you can build up complicated structures out a number of simple primitives. An image on a computer screen is essentially just a collection of lines and polygons, although the image displayed might be extremely intricate. You begin by walking through the main points of the design process and conclude by looking at the full listings.

Note

This example was largely inspired by work done on a similar but much larger system by Chance Coble and Roger Castillo.

You start by designing a set of types that that will describe your picture; these types form the primitives of your image:

// represents the basic shapes that will make up the scene
type Shape =
    | Line of Position * Position
    | Polygon of List<Position>
    | CompositeShape of List<Shape>

This type is recursive, and the CompositeShape union case contains a list of shapes that it will use to form a tree-like structure. In compiler development, this tree-like structure is referred to as the abstract syntax tree (AST) . You’ll see another example of using an AST to represent a program at the end of the chapter.

So far, you have created your picture using three basic elements: lines, polygons, and shapes. The fact that your type is made up of just three simple elements is an important design decision; making your primitives simple makes implementing the engine that will render the image much simpler. The fact that your primitives are so simple means you don’t expect your user to spend time interacting with them directly; instead you’ll provide a set of higher-level wrapper functions that return values of type shape. These are your combinators. The CompositeShapecase in your union is an important example of this; it allows you to build up more complicated shapes out of simpler elements. You expose this through the compose function:

// allows us to compose a list of elements into a
// single shape
let compose shapes = CompositeShape shapes

You use this function to implement a number of higher-level functions. For example, the lines function , which takes a list of positions and returns a shape that is a path through those positions, takes advantage of the compose function to combine a number of individual lines into a single line:

// a line composed of two or more points
let lines posList =
    // grab first value in the list
    let initVal =
        match posList with
        | first :: _ -> first
        | _ -> failwith "must give more than one point"
    // creates a new link in the line
    let createList (prevVal, acc) item =
        let newVal = Line(prevVal, item)
        item, newVal :: acc
    // folds over the list accumlating all points into a
    // list of line shapes
    let _, lines = List.fold createList (initVal, []) posList
    // compose the list of lines into a single shape
    compose lines

Next, you use this lines function in the implementation of several high-level shapes, such as the square function:

let square filled (top, right) size =
    let pos1, pos2 = (top, right), (top, right + size)
    let pos3, pos4 = (top + size, right + size), (top + size, right)
    if filled then
        polygon [ pos1; pos2; pos3; pos4; pos1 ]
    else
        lines [ pos1; pos2; pos3; pos4; pos1 ]

The square function uses the lines function to plot the outline of a square with the calculated points. You can see the full module in Listing 11-4, although a more realistic implementation would probably contain more basic shapes for the users of the library to choose from.

To get this code working, start by creating a new F# Console Application . Add references to System.Drawing and System.Windows.Forms. Add new source files called Combinators.fs and Form.fs, and ensure that the three source files appear in the project in the order Combinators.fs, Form.fs, and then Program.fs. (In Visual Studio, you can reorder in Solution Explorer by right-clicking a source file and choosing “Move Up” or “Move Down.” In Xamarin Studio and MonoDevelop, it’s even easier: just drag source files into the required order in Solution Explorer.) Listing 11-4 goes in Combinators.fs, Listing 11-5 goes in Form.fs, and Listing 11-6 goes in Program.fs.

Listing 11-4. A Combinator Library for Creating Images
namespace Strangelights.GraphicDSL
open System.Drawing


// represents a point within the scene
type Position = int * int


// represents the basic shapes that will make up the scene
type Shape =
    | Line of Position * Position
    | Polygon of List<Position>
    | CompositeShape of List<Shape>


// allows us to give a color to a shape
type Element = Shape * Color


module Combinators =
    // allows us to compose a list of elements into a
    // single shape
    let compose shapes = CompositeShape shapes


    // a simple line made from two points
    let line pos1 pos2 = Line (pos1, pos2)


    // a line composed of two or more points
    let lines posList =
        // grab first value in the list
        let initVal =
            match posList with
            | first :: _ -> first
            | _ -> failwith "must give more than one point"
        // creates a new link in the line
        let createList (prevVal, acc) item =
            let newVal = Line(prevVal, item)
            item, newVal :: acc
        // folds over the list accumlating all points into a
        // list of line shapes
        let _, lines = List.fold createList (initVal, []) posList
        // compose the list of lines into a single shape
        compose lines


    // a polygon defined by a set of points
    let polygon posList = Polygon posList    


    // a triangle that can be either hollow or filled
    let triangle filled pos1 pos2 pos3 =
        if filled then
            polygon [ pos1; pos2; pos3; pos1 ]
        else
            lines [ pos1; pos2; pos3; pos1 ]


    // a square that can either be hollow or filled
    let square filled (top, right) size =
        let pos1, pos2 = (top, right), (top, right + size)
        let pos3, pos4 = (top + size, right + size), (top + size, right)
        if filled then
            polygon [ pos1; pos2; pos3; pos4; pos1 ]
        else
            lines [ pos1; pos2; pos3; pos4; pos1 ]

You now have the basic elements of your language; next you need to implement an interpreter to display the image. The interpreter described in this chapter is a WinForm . The advantage to this approach is that you might also implement an interpreter in WPF or some other graphics environment, which means that it is quite portable between GUI libraries and platforms. Implementing the interpreter is straightforward. You just need to implement each of your union cases. In the case of Line and Polygon, you draw these shapes using the GDI+ objects that WinForms are based on. Fortunately, GDI+ makes it straightforward to draw a line or polygon. The third CompositeShapecase is also straightforward; you simply call your drawing function recursively. You can see the full source code for this in Listing 11-5.

Listing 11-5. An Interpreter to Render Images from Your Combinator Library
namespace Strangelights.GraphicDSL

open System.Drawing
open System.Windows.Forms


// a form that can be used to display the scene
type EvalForm(items: List<Element>) as x =
    inherit Form()
    // handle the paint event to draw the scene
    do x.Paint.Add(fun ea ->
        let rec drawShape (shape, (color: Color)) =
            match shape with
            | Line ((x1, y1), (x2, y2)) ->
                // draw a line
                let pen = new Pen(color)
                ea.Graphics.DrawLine(pen, x1, y1, x2, y2)
            | Polygon points ->
                // draw a polygon
                let points =
                    points
                    |> List.map (fun (x,y) -> new Point(x, y))
                    |> Array.ofList
                let brush = new SolidBrush(color)
                ea.Graphics.FillPolygon(brush, points)
            | CompositeShape shapes ->
                // recursively draw the other contained elements
                List.iter (fun shape -> drawShape(shape, color)) shapes
        // draw all the items we have been passed
        items |> List.iter drawShape)

Putting together a simple image composed of two squares and a triangle now becomes straightforward. You simply call the appropriate functions from your combinator library and then combine them with a color to make a full description of the scene. Listing 11-6 shows how to do this; you can see the resulting image in Figure 11-1.

A340906_2_En_11_Fig1_HTML.jpg
Figure 11-1. A scene rendered by your combinator library
Listing 11-6. Calling Your Combinator Library
open System.Drawing
open System.Windows.Forms
open Strangelights.GraphicDSL


// two test squares
let square1 = Combinators.square true (100, 50) 50
let square2 = Combinators.square false (50, 100) 50


// a test triangle
let triangle1 =
    Combinators.triangle false
        (150, 200) (150, 150) (250, 200)


// compose the basic elements into a picture
let scence = Combinators.compose [square1; square2; triangle1]


// create the display form
let form = new EvalForm([scence, Color.Red])


[<EntryPoint>]
let main argv =
    // show the form
    Application.Run form
    0

The simple example given in Listing 11-6 probably doesn’t represent how you would create images using the combinator library. You’ll take look at a more realistic scenario in Listing 11-7. The best approach to using the combinator library would probably be to carry on programming in the style you wrote the original combinator library in; that is, you would build up simple elements that you can reuse in your image. Next, you’ll look at how to create a scene composed of seven stars. The obvious place to start is with the creation of the star; you can see how to create the star function defined in Listing 11-7. This function creates triangles that are mirror images and combines them with a slight offset to form a six-sided star. This example might give you some idea of how to build up more complex shapes out of simpler ones. Once you have the definition of your star, you simply need a list of positions that tell where to print the stars. You can see this list of points in Listing 11-7. Once you have these two elements, you can combine them using the List.map function and the compose function to create your scene. Next, you can display your scene the same way you did in the previous listing.

Listing 11-7. Creating a More Complex Image Using Your Combinator Library
open System.Drawing
open System.Windows.Forms
open Strangelights.GraphicDSL


// define a function that can draw a 6 sided star
let star (x, y) size =
    let offset = size / 2
    // calculate the first triangle
    let t1 =
        Combinators.triangle false
            (x, y - size - offset)
            (x - size, y + size - offset)
            (x + size, y + size - offset)
    // calculate another inverted triangle
    let t2 =
        Combinators.triangle false
            (x, y + size + offset)
            (x + size, y - size + offset)
            (x - size, y - size + offset)
    // compose the triangles
    Combinators.compose [ t1; t2 ]    


// the points where stars should be plotted
let points = [ (10, 20); (200, 10);
               (30, 160); (100, 150); (190, 150);
               (20, 300); (200, 300);  ]


// compose the stars into a single scene
let scence =
    Combinators.compose
        (List.map (fun pos -> star pos 5) points)


// show the scene in red on the EvalForm
let form = new EvalForm([scence, Color.Red],
                        Width = 260, Height = 350)


[<EntryPoint>]
let main argv =
    // show the form
    Application.Run form
    0

Replace the contents of Program.fs with Listing 11-7 and run your project. Figure 11-2 shows the resulting image.

A340906_2_En_11_Fig2_HTML.jpg
Figure 11-2. Another scene rendered by your combinator library

You’ve now seen two approaches for creating combinator libraries (libraries that create little languages though data structures). At this point, you’re probably beginning to see how you can break a problem down into an abstract description of the problem based on a small set of primitives, possibly with aid of other libraries that you build on these primitives.

Note

If you’re looking for a more in-depth view of a combinator library, take a look at the paper by Simon Peyton Jones, Jean-Marc Eber, and Julian Seward called “Composing contracts: an adventure in financial engineering.” The paper gives an in-depth, yet understandable, study of a combinator library for describing derivatives contracts. The examples in the paper are in Haskell rather than F#, but you could translate them to F# with some effort. You can read this paper at http://research.microsoft.com/en-us/um/people/simonpj/papers/financial-contracts/contracts-icfp.htm .

Metaprogramming with Quotations

In Chapter 6, you used quotations ; these are quoted sections of F# code where the quote operator instructs the compiler to generate data structures representing the code, rather than IL representing the code. This means you have a data structure that represents the code that was coded, rather than code you can execute, and you’re free to do what you want with it. You can either interpret it, performing the actions you require as you go along, or you can compile it into another language. Or you can simply ignore it if you want. You could, for example, take a section of quoted code and compile it for another runtime, such as the Java Virtual Machine (JVM ).

In the next example, you’ll write an interpreter for integer-based arithmetic expressions in F#. This might be useful for learning how stack-based calculations work. Here, your language is already designed for you; it is the syntax available in F#. You’ll work exclusively with arithmetic expressions of the form <@ (2 * (2 - 1)) / 2 @>. This means you need to generate an error whenever you come across syntax that is neither an integer nor an operation. Quotations are based on discriminated union types. When working with quotations, you have to query the expression that you receive using F#’s pattern matching and active patterns. For example, here you query an expression using an active pattern and a when guard to see whether it is an integer; if it is, you push it onto the stack:

| Value (x,ty) when ty = typeof<int> ->
                                        let i = x :?> int
                                        printfn "Push %i" i
                                        operandsStack.Push(x :?> int)

If it isn’t an integer, you could go on to check whether it is one of several other types. There also several parameterized active patterns that you might find useful. For example, SpecificCallaccepts a quotation that is a function expression and allows you to query whether the quotation being matched over is a call to that function. You use this to determine whether a call to an operator is made. For example, the following example checks whether a call to the plus operator is made:

| SpecificCall <@ (+) @> (_,_, [l;r])  -> interpretInner l
                                          interpretInner r
                                          preformOp (+) "Add"

You can see the full code in Listing 11-8.

Listing 11-8. Stack-Based Evaluation of F# Quoted Arithmetic Expressions
open System.Collections.Generic
open Microsoft.FSharp.Quotations
open Microsoft.FSharp.Quotations.Patterns
open Microsoft.FSharp.Quotations.DerivedPatterns


let interpret exp =
    let operandsStack = new Stack<int>()
    let preformOp f name =
        let x, y = operandsStack.Pop(), operandsStack.Pop()
        printfn "%s %i, %i" name x y
        let result = f x y
        operandsStack.Push(result)
    let rec interpretInner exp =
        match exp with
        | SpecificCall <@ (*) @> (_,_, [l;r])  -> interpretInner l
                                                  interpretInner r
                                                  preformOp (*) "Mult"
        | SpecificCall <@ (+) @> (_,_, [l;r])  -> interpretInner l
                                                  interpretInner r
                                                  preformOp (+) "Add"
        | SpecificCall <@ (-) @> (_,_, [l;r])  -> interpretInner l
                                                  interpretInner r
                                                  preformOp (-) "Sub"
        | SpecificCall <@ (/) @> (_,_, [l;r])  -> interpretInner l
                                                  interpretInner r
                                                  preformOp (/) "Div"
        | Value (x,ty) when ty = typeof<int>    ->
                                                   let i = x :?> int
                                                   printfn "Push: %i" i
                                                   operandsStack.Push(x :?> int)
        | _ -> failwith "not a valid op"
    interpretInner exp
    printfn "Result: %i" (operandsStack.Pop())


interpret <@ (2 * (2 - 1)) / 2 @>

You can run this code in F# Interactive, producing the following results:

Push: 2
Push: 2
Push: 1
Sub 1, 2
Multi 1, 2
Push: 2
Div 2, 2
Result: 1

You are always working with F# syntax when you use quotations, which is both an advantage and a disadvantage. The advantage is that you can produce powerful libraries based on this technique that integrate well with F# code, but without having to create a parser. The disadvantage is that it is difficult to produce tools suitable for end users based on this technique. However, libraries that consume or transform F# quotations can still be used from other .NET languages because the F# libraries include functions and samples to convert between F# quotations and other common metaprogramming formats, such as LINQ quotations. For some interesting uses of quotations as little languages, you can see the F# DSL in Microsoft Solver Foundation at https://msdn.microsoft.com/en-us/library/ff524501(v=vs.93).aspx . You can also see my discussion of it on my blog at http://strangelights.com/blog/archive/2008/09/21/1628.aspx .

This concludes your examination of DSLs; the rest of the chapter will dig a bit deeper into the implementation of an interpreter or compiler for your language.

Implementing a Compiler and an Interpreter for an Arithmetic Language

So far, we’ve focused more on the design of the languages themselves, the front end, rather than the implementation of the compiler or interpreter for the language, the back end. In this section, you’ll focus on the implementation of a back end for a simple arithmetic language defined by an AST. The AST syntax tree shown in the first section is based on a union type.

To build a front end for this little language, in other words a parser that translates text input in the language, is beyond the scope of this book. If you want to take the example further, look at the NuGet package named fparsec ( www.quanttec.com/fparsec/ ).

You have two distinct modes of acting on the results of the parser: compiling the results and interpreting them. Compiling refers to changing the AST into some other format that is faster or easier for a machine to execute. Originally, this nearly always meant native code; these days, it’s more likely to refer to something a little more abstract, such as IL, F#, or even C#. Interpreting the results means acting on the results straightaway, without any transformation of the AST. You’ll look briefly at both of these topics in the “Interpreting the AST” and “Compiling the AST” sections; then you’ll compare the two approaches to get some idea of when to use each in the “Compilation vs. Interpretation” section.

The Abstract Syntax Tree

An AST is a representation of the construct that makes up the program; it’s intended to be easy for the programmer to use. One reason that F# is good for this kind of development is its union type. This type is great for representing languages because you can use it to represent items that are related yet do not share the same structure. The following example shows how to use an AST:

type Ast =
  | Ident of string
  | Val of System.Double
  | Multi of Ast * Ast
  | Div of Ast * Ast
  | Plus of Ast * Ast
  | Minus of Ast * Ast

The tree consists of only one type because it is quite simple. A complicated tree would contain many more types , but it would still follow this basic pattern. Here you can see that the tree, which is of the type Ast, will consist of either identifiers (the Ident type); the names of the identifiers represented by a string; or values (the Val type), which will be values represented by a System.Double. The type also includes four more types (Multi, Div, Plus, and Minus) that represent the arithmetic operations. They use recursion, allowing them to be composed of other expressions.

Interpreting the AST

Once you create your AST, you have two choices: you can either interpret it or compile it. Interpreting it simply means walking the tree and performing actions as you go. Compiling it means changing it into some other form that is easier, or more typically, faster, for the machine to execute. This section will examine how to interpret the results ; the next section will look at the options for compiling them; and finally, you will look at when you should use interpretation and when you should use compilation.

The following example shows a short interpreter for your program. The main work of interpreting the AST is done by the function interpret, which walks the tree, performing the necessary actions as it goes. The logic is quite simple. If you find a literal value or an identifier, you return the appropriate value :

| Ident (s) -> variableDict.[s]
| Val (v) -> v

If you find an operand, you recursively evaluate the expressions it contains to obtain its values and then perform the operation:

| Multi (e1, e2) -> (interpretInner e1) * (interpretInner e2)

You can see the complete interpreter in Listing 11-9.

Listing 11-9. Interpreting an AST Generated from Command-Line Input
open System

type Ast =
   | Ident of string
   | Val of System.Double
   | Multi of Ast * Ast
   | Div of Ast * Ast
   | Plus of Ast * Ast
   | Minus of Ast * Ast


// requesting a value for variable from the user
let getVariableValues e =
    let rec getVariableValuesInner input (variables : Map<string, float>) =
        match input with
        | Ident (s) ->
            match variables.TryFind(s) with
            | Some _ -> variables
            | None ->
                printfn "%s: " s
                let v = float(Console.ReadLine())
                variables.Add(s,v)
        | Multi (e1, e2) ->
            variables
            |> getVariableValuesInner e1
            |> getVariableValuesInner e2
        | Div (e1, e2) ->
            variables
            |> getVariableValuesInner e1
            |> getVariableValuesInner e2
        | Plus (e1, e2) ->
            variables
            |> getVariableValuesInner e1
            |> getVariableValuesInner e2
        | Minus (e1, e2) ->
            variables
            |> getVariableValuesInner e1
            |> getVariableValuesInner e2
        | _ -> variables
    getVariableValuesInner e (Map.empty)


// function to handle the interpretation
let interpret input (variableDict : Map<string,float>) =
    let rec interpretInner input =
        match input with
        | Ident (s) -> variableDict.[s]
        | Val (v) -> v
        | Multi (e1, e2) -> (interpretInner e1) * (interpretInner e2)
        | Div (e1, e2) -> (interpretInner e1) / (interpretInner e2)
        | Plus (e1, e2) -> (interpretInner e1) + (interpretInner e2)
        | Minus (e1, e2) -> (interpretInner e1) - (interpretInner e2)
    interpretInner input


// the expression to be interpreted
let e = Multi(Val 2., Plus(Val 2., Ident "a"))


// collect the arguments from the user
let args = getVariableValues e


// interpret the expression
let v = interpret e args


// print the results
printf "result: %f" v

Executing this code in F# Interactive produces the following results:

[a]: 12
result: 28.000000

Compiling the AST

To many developers, compilation means generating native code , so it has a reputation for being difficult. But it doesn’t have to mean generating native code. For a DSL, you typically generate another, more general-purpose programming language. The .NET Framework provides several features for compiling an AST into a program.

Your choice of technology depends on several factors. For example, if you want to target your language at developers, it might be enough to generate a text file containing F#, some other language, or a compiled assembly that can then used within an application. However, if you want to target end users, you will almost certainly have to compile and then execute it on the fly. Table 11-1 summarizes the various options available.

Table 11-1. .NET Code-Generation Technologies

Technology

Description

Microsoft.CSharp. CSharpCodeProvider

This class supports compilation of a C# file that has been created on the fly, either by using simple string concatenation or by using the System.CodeDom namespace. Once the code has been compiled into an assembly, it can be loaded dynamically into memory and executed via reflection. This operation is relatively expensive because it requires writing to the disk and using reflection to execute methods.

System.CodeDom

This is a set of classes aimed at abstracting between operations available in different languages. The idea is to describe your operations using the classes available in this namespace, and then use a provider to compile them into the language of your choice. Providers are available for C# and Roslyn.

FSharp.Compiler.CodeDom

This is a CodeDom provider that can be used to compile F# on the fly in a similar way to the C# CodeDom provider. It can be downloaded from GitHub at http://github.com/fsprojects/FSharp.Compiler.CodeDom .

System.Reflection.Emit

This namespace allows you to build up assemblies using IL. IL offers more features than either F#, C#, or System.CodeDom, so it provides more flexibility; however, it is lower level so it also requires more patience and will probably take more time to get right.

FSharp.Compiler.Service

F# Compiler Services is a component derived from the F# compiler source code that exposes additional functionality for implementing F# language bindings, additional tools based on the compiler, or refactoring tools. The package also includes the F# Interactive service that can be used for embedding F# scripting into your applications.

Mono.Cecil

This is a library extensively used in the Mono framework for both parsing assemblies and dynamically creating them.

You use the System.Reflection.Emit.DynamicMethod class, not because you need the flexibility of IL, but because IL includes built-in instructions for floating-point arithmetic, which makes it well-suited for implementing a little language. The DynamicMethodclass also provides a fast and easy way to let you call into the resulting program.

The method createDynamicMethod compiles the AST by walking the AST and generating code. It begins by creating an instance of the DynamicMethod class to hold the IL you define to represent the method:

let temp = new DynamicMethod("", (type float), paramsTypes, meth.Module)

Next, createDynamicMethod starts walking the tree. When you encounter an identifier, you emit some code to load an argument of your dynamic method:

| Ident name ->
    il.Emit(OpCodes.Ldarg, paramNames.IndexOf(name))

When you encounter a literal, you emit the IL code to load the literal value:

| Val x -> il.Emit(OpCodes.Ldc_R8, x)

When you encounter an operation, you must recursively evaluate both expressions and then emit the instruction that represents the required operation:

| Multi (e1 , e2) ->
    generateIlInner e1
    generateIlInner e2
    il.Emit(OpCodes.Mul)

Note how the operation is emitted last, after both expressions have been recursively evaluated. You do it this way because IL is stack-based, so data from the other operations must be pushed onto the stack before you evaluate the operator.

You can see the complete compiler in Listing 11-10.

Listing 11-10. Compiling an AST Generated from Command Line Input
open System
open System.Reflection
open System.Reflection.Emit


type Ast =
   | Ident of string
   | Val of System.Double
   | Multi of Ast * Ast
   | Div of Ast * Ast
   | Plus of Ast * Ast
   | Minus of Ast * Ast


// get a list of all the parameter names
let rec getParamList e =
    let rec getParamListInner e names =
        match e with
        | Ident name ->
            if not (List.exists (fun s -> s = name) names) then
                name :: names
            else
                names
        | Multi (e1 , e2) ->
            names
            |> getParamListInner e1
            |> getParamListInner e2
        | Div (e1 , e2) ->
            names
            |> getParamListInner e1
            |> getParamListInner e2
        | Plus (e1 , e2) ->
            names
            |> getParamListInner e1
            |> getParamListInner e2
        | Minus (e1 , e2) ->
            names
            |> getParamListInner e1
            |> getParamListInner e2
        | _ -> names
    getParamListInner e []


// create the dynamic method
let createDynamicMethod e (paramNames: string list) =
    let generateIl e (il : ILGenerator) =
        let rec generateIlInner e  =
            match e with
            | Ident name ->
                let index = List.findIndex (fun s -> s = name) paramNames
                il.Emit(OpCodes.Ldarg, index)
            | Val x -> il.Emit(OpCodes.Ldc_R8, x)
            | Multi (e1 , e2) ->
                generateIlInner e1
                generateIlInner e2
                il.Emit(OpCodes.Mul)
            | Div (e1 , e2) ->
                generateIlInner e1
                generateIlInner e2
                il.Emit(OpCodes.Div)
            | Plus (e1 , e2) ->
                generateIlInner e1
                generateIlInner e2
                il.Emit(OpCodes.Add)
            | Minus (e1 , e2) ->
                generateIlInner e1
                generateIlInner e2
                il.Emit(OpCodes.Sub)
        generateIlInner e
        il.Emit(OpCodes.Ret)


    let paramsTypes = Array.create paramNames.Length (typeof<float>)
    let meth = MethodInfo.GetCurrentMethod()
    let temp = new DynamicMethod("", (typeof<float>), paramsTypes, meth.Module)
    let il = temp.GetILGenerator()
    generateIl e il
    temp


// function to read the arguments from the command line
let collectArgs (paramNames : string list) =
    paramNames
    |> Seq.map
        (fun n ->
            printf "%s: " n
            box (float(Console.ReadLine())))
    |> Array.ofSeq


// the expression to be interpreted
let e = Multi(Val 2., Plus(Val 2., Ident "a"))


// get a list of all the parameters from the expression
let paramNames = getParamList e


// compile the tree to a dynamic method
let dm = createDynamicMethod e paramNames


// print collect arguments from the user
let args = collectArgs paramNames


// execute and print out the final result
printfn "result: %O" (dm.Invoke(null, args))

Running the code in Listing 11-10 produces the following results:

a: 14
result: 32

Compilation vs. Interpretation

So when should you use compilation and when should you use interpretation? The final result is basically the same, so the answer generally comes down to the raw speed of the final generated code, though memory usage and start-up times can also play a role in the decision. If you need your code to execute more quickly, then compilation will generally give you better results.

The test harness in Listing 11-11 enables you to execute the interpret function results of createDynamicMethod repeatedly and time how long this takes. It also tests an important variation on dynamic methods; that is where you also generate a new .NET delegate value to act as the handle by which you invoke the generated code. As you can see, it turns out that this is by far the fastest technique. Remember, you’re timing how long it takes to evaluate the AST either directly or in a compiled form; you’re not measuring the parse time or compilation time.

Listing 11-11. A Test Harness for Comparing Performance
open System
open System.Diagnostics
open Strangelights.Expression


// expression to process
let e = Multi(Val 2., Plus(Val 2., Val 2.))


// collect the inputs
printf "Interpret/Compile/Compile Through Delegate [i/c/cd]: "
let interpertFlag = Console.ReadLine()
printf "reps: "
let reps = int(Console.ReadLine())


type Df0 = delegate of unit -> float
type Df1 = delegate of float -> float
type Df2 = delegate of float * float -> float
type Df3 = delegate of float * float * float -> float
type Df4 = delegate of float * float * float * float -> float


// run the tests
match interpertFlag with
| "i" ->
    let args = Interpret.getVariableValues e
    let clock = new Stopwatch()
    clock.Start()
    for i = 1 to reps do
        Interpret.interpret e args |> ignore
    clock.Stop()
    printf "%i" clock.ElapsedTicks
| "c" ->
    let paramNames = Compile.getParamList e
    let dm = Compile.createDynamicMethod e paramNames
    let args = Compile.collectArgs paramNames
    let clock = new Stopwatch()
    clock.Start()
    for i = 1 to reps do
        dm.Invoke(null, args) |> ignore
    clock.Stop()
    printf "%i" clock.ElapsedTicks
| "cd" ->
    let paramNames = Compile.getParamList e
    let dm = Compile.createDynamicMethod e paramNames
    let args = Compile.collectArgs paramNames
    let args = args |> Array.map (fun f -> f :?> float)
    let d =
        match args.Length with
        | 0 -> dm.CreateDelegate(typeof<Df0>)
        | 1 -> dm.CreateDelegate(typeof<Df1>)
        | 2 -> dm.CreateDelegate(typeof<Df2>)
        | 3 -> dm.CreateDelegate(typeof<Df3>)
        | 4 -> dm.CreateDelegate(typeof<Df4>)
        | _ -> failwith "too many parameters"
    let clock = new Stopwatch()
    clock.Start()
    for i = 1 to reps do
        match d with
        | :? Df0 as d -> d.Invoke() |> ignore
        | :? Df1 as d -> d.Invoke(args.[0]) |> ignore
        | :? Df2 as d -> d.Invoke(args.[0], args.[1]) |> ignore
        | :? Df3 as d -> d.Invoke(args.[0], args.[1], args.[2]) |> ignore
        | :? Df4 as d -> d.Invoke(args.[0], args.[1], args.[2], args.[4]) |> ignore
        | _ -> failwith "too many parameters"
    clock.Stop()
    printf "%i" clock.ElapsedTicks
| _ -> failwith "not an option"

Table 11-2 summarizes the results of executing this program against this expression: Multi(Val 2., Plus(Val 2., Val 2.)).

Table 11-2. Summary of Processing the Expression Multi(Val 2., Plus(Val 2., Val 2 for Various Numbers of Repetitions (times in milliseconds)

Repetitions

1

10

100

1,000

10,000

100,000

1,000,000

Interpreted

6,890

6,979

6,932

7,608

14,835

84,823

799,788

Compiled via delegate

865

856

854

1,007

2,369

15,871

151,602

Compiled

1,112

1,409

2,463

16,895

151,135

1,500,437

14,869,692

Table 11-2 and Figure 11-3 indicate that Compiled and Compiled via delegate are much faster over a small number of repetitions. But notice that over 1, 10, and 100 repetitions, the amount of time required grows negligibly. This is because over these small numbers of repetitions, the time taken for each repetition is insignificant. It is only the time that the JIT compiler takes to compile the IL code into native code that is significant. This is why the Compiled and Compiled via delegatetimes are so close. They both have a similar amount of code to JIT compile. The Interpreted time takes longer because you must JIT compile more code, specifically the interpreter. But JIT is a one-off cost because you need to JIT each method only once; therefore, as the number of repetitions goes up, this one-off cost is paid for, and you begin to see a truer picture of the relative performance cost.

A340906_2_En_11_Fig3_HTML.jpg
Figure 11-3. The evaluation time in machine ticks of the expression 1 + 1 against the number of evaluations of the express

You can see clearly from Figure 11-3 that, as the number of repetitions goes up, the cost of Compiled goes up steeply. This is because accessing the compiled DynamicMethodthrough its Invoke method is expensive, and you incur this cost on every repetition. This means that the time taken for a Compiled method increases at the same rate as the number of repetitions. However, the problem lies not with compilation, but with how you invoke the compiled code. It turns out that calling a DynamicMethod through a delegate rather than the Invoke member on the dynamic delegate allows you to pay only once for the cost of binding to the method, so executing a DynamicMethod this way is much more efficient if you intend to evaluate the expression multiple times. Based on these results, compilation with invocation via a delegate provides the best option in terms of speed.

This analysis shows the importance of measurement : don’t assume that compilation has given you the expected performance gains until you actually see the benefits on realistic data sets and have used all the available techniques to ensure no unnecessary overhead is lurking in your code. However, many other factors can affect your performance in the real world. For example, if your expressions change often, your interpreter will need to be JIT compiled only once, but each compiled expression will need to be JIT compiled, which means you’ll need to run your compiled code many times if you want to see any performance gains. Given that interpreted code is usually easier to implement, and that compiled code provides significant performance gains only in certain situations, interpreted code is often your best choice.

When dealing with situations that require code to perform as quickly as possible, it’s generally best to try a few different approaches and then profile your application to see which approach gives the best results.

Summary

In this chapter, you looked at the main features and techniques for language-oriented programming in F# . You saw various techniques; some use data structures as little languages or work with quotations, which involve working with the existing F# syntax to change or extend it. Others, such as implementing a parser, enable you to work with just about any language that is text-based, whether this language is of your own design (or perhaps more commonly) a preexisting language. All these techniques can lead to big productivity gains if you use them correctly.

The next chapter will look at how F# interacts with other existing languages, such as C# and C++.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.149.213.44