C H A P T E R  2

images

DLR Expression

DLR Expression is the backbone of the DLR. It is a separate feature that you can use without involving the rest of the DLR. If you do use it with other DLR features, there are basically two usage scenarios. One scenario involves defining a language's semantics in terms of DLR expressions. The other is defining the late binding logic of binders and dynamic objects. Don't worry if these don't make much sense to you right now. In this chapter, you will learn how to use DLR Expression by itself, while later chapter will cover the two usage scenarios. Once you get a good grasp of using DLR Expression by itself, you'll be in a good position to use DLR Expression and the other DLR features together.

DLR Expression as a Language

Let's take a look at what DLR Expression is first, before getting into the examples. DLR Expression is much like a programming language. It has constructs like the loop expressions, assignment expressions, and method invocation expressions you normally see in other languages. For example, a Hello World program in C# looks like this:

Console.Writeline("Hello World");

The equivalent code in DLR Expression looks like this:

MethodInfo method = typeof(Console).GetMethod("WriteLine", new Type[] { typeof(String) });

Expression callExpression = Expression.Call(null,
          method,
          Expression.Constant("Hello World"));

This code snippet uses .NET reflection to create a MethodInfo instance that represents the static WriteLine method of the Console class. The code then calls the static Call method of the Expression class to create an Expression instance. The Expression instance represents a call to the Console.WriteLinemethod that will print “Hello World” to the screen. The first input parameter of the Expression.Callmethod is the target object upon which to call the method designated by the second input parameter of Expression.Call. If the method designated by the second input parameter is a static method, there's no target object to call the method on. That's why the code snippet passes null as the first input parameter to the Expression.Call method.

So what are the differences between DLR Expression and a normal programming language, other than that the code in DLR Expression looks a ton more verbose? There are three keys differences:

Code as data and data as code—code expressed in DLR Expression is data that can be more easily analyzed and worked on.

A common denominator of multiple languages—Like CLR's IL instructions, DLR expressions serve as the common denominator of multiple languages.

No concrete syntax, only abstract syntax—DLR Expression defines only abstract syntax and no concrete syntax. However, it supports serialization and we can use that to serialize abstract syntax to concrete syntax or, conversely, to deserialize concrete syntax into abstract syntax.

Let's look at each of the differences in more detail in the next few sections.

Code as Data  

One key difference between DLR Expression and a typical programming language is that code in DLR Expression is data. Code in DLR Expression exists in memory at run time as objects. In contrast, C# code exists at run time as a bunch of IL instructions. Because code in DLR Expression is data, it does not execute like the C# code does. To execute the data, we can either interpret it or we can turn it into IL code and then run it. For example, the following code turns the Hello World example above into IL code and runs it:

Action callDelegate = Expression.Lambda<Action>(callExpression).Compile();
callDelegate();

The first line in the code snippet wraps callExpression into a lambda expression and calls the Compile method to compile the lambda expression into a callable .NET delegate. This line of code essentially turns data into IL instructions. The second line in the code snippet executes the compiled code.

As you can see, instead of one line of C# code, we write a lot more code to achieve the same thing. Obviously, there has to be a reason for doing all this extra work. The reason is that once code is represented as objects in memory, it is far easier to analyze than IL instructions. Of course, if we want to execute the code, we need to turn the objects back into code by either interpreting or compiling the objects. The transformation from code to objects is based on the idea of “code as data.” Likewise, the transformation from objects to code is based on the idea of “data as code.” The word data here means the objects that are used to represent code. Those objects are referred to as data because we analyze them like data. The concept of “code as data” and “data as code” has been around for a long time. People often use the concept to do things like code transformation, rewriting, generation, analysis, etc. LINQ to SQL is an example of using DLR Expression to do code transformation. It transforms a tree of DLR expressions into SQL statements.

A Common Denominator like CLR

Another key differentiator between DLR Expression and a normal programming language is that DLR Expression, like the set of IL instructions, is designed to be the common denominator of multiple languages. IL is the common denominator of languages such as C# and VB.NET. Languages like these translate their high-level language constructs into low-level IL instructions. CLR has a runtime for executing those IL instructions, because its instructions support the high-level language constructs in those languages. DLR Expression is the common denominator of languages such as IronPython and IronRuby for the same reason. Figure 2-1 below shows the analogy between CLR IL instructions and DLR expressions in their roles of being the common denominator of multiple high-level languages.

images

Figure 2-1. The CLR and DLR as common denominators of multiple languages

Concrete Syntax and Serialization

One more difference between DLR Expression and a normal programming language is that DLR Expression defines only the abstract syntax and not the concrete syntax. The concrete syntax of a language is literally what you type in a code editor when you program in that language. For example, when I write 5 + 2 in C#, I write the code in C#'s concrete syntax in a code editor. When I compile the code, the language's parser will parse the code I write in concrete syntax and represent it in memory as a tree of objects. The tree representation is called an abstract syntax tree, and it represents in abstract syntax the same code I write in concrete syntax. So what's the big deal in this distinction between concrete syntax and abstract syntax? The idea here is that multiple concrete syntax notations can be mapped to one single abstract syntax notation.

DLR Expression supports serialization. When we serialize a DLR expression tree into a file, the textual format in which we store the expression tree is one concrete syntax notation of DLR Expression. Each person is free to define his own concrete syntax for DLR Expression and implement the serialization/deserialization code.

Unlike DLR Expression, a typical programming language like C# defines one concrete syntax and one abstract syntax. So the mapping from concrete syntax to abstract syntax is one-to-one. Of course, if you like, you can come up with a custom concrete syntax for writing C# code. You would then implement a custom parser to parse that code and transform it into C#'s abstract syntax. However, I've yet to see anybody so insane. Besides, the C# abstract syntax might not be accessible to the public.  

Expressions vs. Statements

So far, we've looked at DLR Expression as a programming language. In programming languages, there's a distinction between expressions and statements. The distinction between the two is very simple—expressions yield values; statements don't. This is best illustrated with some examples.

Each of the following lines of code is an expression, compound or not, in C#. To qualify as an expression, according to what we just said, the code must yield a value. In the first three lines, the value each expression yields is the number 7. The value the last expression yields is the return value of the method call. The third line is particularly interesting. It's a compound expression that consists of two assignment expressions: j = 7 and i = (j = 7). The expression j = 7 yields 7 as its value, which in turn is assigned to the variable i. If j = 7 were not an expression, there would be nothing to assign to variable i and the third line of code would cause a compilation error.

5 + 2
x = 7           //assignment expression
i = j = 7       //this line of code consists of two assignment expressions
someObject.Foo()      //method call expression

Let's now see some statements. The first line below shows that local variable declaration in C# is a statement. The second line shows that if we add semicolon to the end of an assignment expression, the result is an assignment statement. Similarly, if we add semicolon to the end of a method call expression, we get a method call statement as the result.

int i;   //variable declaration statement
x = 5;           //assignment statement
someObject.Foo();     //method call statement

The if language construct in C# is a statement, not an expression. The following code is valid in C#:

if (true)               //if statement
    Console.WriteLine("Hello");

The code below, however, is not valid C# code because the if statement yields no value that can be assigned to variable x.

//This code is not valid in C#
x = if (true)
    Console.WriteLine("Hello");

So what does all this discussion about expressions versus statements have to do with DLR Expression? Simply put, DLR Expression does not have statements. It has only expressions that yield values. That's why it's called DLR Expression, not DLR Statement. Let's look at an example and see what that means. Listing 2-1 shows an invalid C# code snippet. It's invalid because the if statement yields no value that can be assigned to variable y. The code in Listing 2-1 is similar to the code snippet we just looked at, so it shouldn't be anything new to us. Now let's look the almost equivalent code in DLR Expression in Listing 2-2. You can find the code for both Listing 2-1 and Listing 2-2 in the StatementVersusExpression.cs file of this chapter's source code project.

Listing 2-1. Invalid C# Code That Assigns an If Statement to a Variable.

{
  String x = "Hello";
  String y = if (true)   //This causes compilation error.
                x.ToLower();
}

Listing 2-2. Assigning an IfStatement to a Variable in DLR Expression

1)   MethodInfo method = typeof(String).GetMethod("ToLower", new Type[]{});
2)   ParameterExpression x = Expression.Variable(typeof(String), "x");
3)   ParameterExpression y = Expression.Variable(typeof(String), "y");
4)   Expression blockExpression =Expression.Block(new ParameterExpression[] {x, y},
5)       Expression.Assign(x, Expression.Constant("Hello")),
6)       Expression.Assign(y,
7)           Expression.Condition(Expression.Constant(true),
8)               Expression.Call(x, method),
9)               Expression.Default(typeof(String)),
10)              typeof(String)))
11)  );
12)
13)  Func<String> blockDelegate = Expression.Lambda<Func<String>>(blockExpression).Compile();
14)  String result = blockDelegate();
15)  Console.WriteLine(result);            

Don't be turned off by the amount of DLR Expression code that's needed to achieve what the C# code in Listing 2-1 does. Later sections will describe DLR block expressions, conditional expressions, and assignment expressions in more detail, with examples. For now, I'll explain the example code just enough to illustrate the important differences between expressions in the DLR and statements in C#.

The code in Listing 2-2 has quite some interesting points. First, notice the blockExpression variable in line 4.  A DLR block expression is a container of smaller child expressions. The value of a DLR block expression is the value of the last child expression it contains. This is the first difference from the code in Listing 2-1. In Listing 2-1, the pair of curly braces and all the code within it make up a block statement that yields no value.

The second difference from the code in Listing 2-1 is the assignment expression in line 6 that assigns a conditional expression to the parameter expression y. The conditional expression is the equivalent of the C# if statement in Listing 2-1. The parameter expression y is the equivalent of the variable y in Listing 2-1. Unlike in C#, the if conditional construct in the DLR is an expression, not a statement. It has a value that can be assigned to a variable.

Line 13 in Listing 2-2 compiles the whole block expression into code, a .NET delegate of type Func<String>. That type Func<String> means the delegate takes no input arguments and returns a String object. I didn't pick that delegate type randomly. The delegate type needs to match the type requirements of blockExpression. In our case, blockExpression takes no input arguments and returns a String. This echoes what I said about block expressions earlier. Remember I mentioned that a DLR block expression has a value and that value is the value of the last child expression it contains. In our example, the last child expression is the assignment expression that assigns the conditional expression to the variable y expression. The value of that assignment expression is of type String and therefore the value of the block expression is also of type String.

The result of running the code in Listing 2-2 is the text “Hello” displayed in the console. Our discussion of expressions versus statements leads well to the next section's topic—the Expression class.

Expression Type and Kind

DLR Expression has several classes to represent different expressions, such as the block expressions, conditional expressions, and assignment expressions we saw in the earlier sections. All those classes derive directly or indirectly from the Expression class in the System.Linq.Expressions namespace. The Expression class is the root base class that defines all the properties and operations common to all DLR expressions. Because all expressions yield a value (as we discussed in the previous section), the Expression class defines a property called Type to record the type of an expression's value. Figure 2-2 shows the key properties and methods of the Expression class this chapter covers. As the figure shows, Type is a property in the Expression class. Its type is System.Type. So for example, if a DLR expression represents the string literal “Hello”, that expression's Type property is System.String.

images

Figure 2-2. Expression class

Besides the Type property, we'll encounter two other important properties later: NodeType and CanReduce. The CanReduce property and the Reduce method are about expression reduction, which we'll discuss later in the chapter. The Accept method is related to the Visitor design pattern we can use to change DLR expressions. We will also discuss that later, and don't worry—no prior knowledge of the Visitor design pattern is required. I'll explain the Visitor design pattern in general first, then show you how the pattern is implemented in the DLR. For now, let's look at just the NodeType property, then we'll jump right into a bunch of code examples that show how to write programs in DLR Expression.

The NodeType property gives additional information about what kind of an expression you're dealing with. For example, Table 2-1 shows some C# expressions in the first column. The equivalent DLR expressions of those C# expressions are all instances of the BinaryExpression class. Although they are all instances of BinaryExpression, they have different values for the NodeType property shown in the second column.

Table 2-1. Binary Expressions and Their NodeType Properties

C# Binary Expression NodeType
5 + 2 ExpressionType.Add
5 − 2 ExpressionType.Subtract
5 * 2 ExpressionType.Multiply
5 / 2 ExpressionType.Divide

All of the C# expressions in Table 2-1 have one thing in common—they all have a left operand, a right operand, and an operator in the middle. In other words, if we visualize them as expression trees, they will all have the same shape shown in Figure 2-3. Because of this commonality, it's very typical in the design of abstract syntax trees to use something like DLR's BinaryExpression class to represent them all. The design approach is often called shape-based. Because the BinaryExpression instances all have the same shape, there needs to be a way to distinguish whether a binary expression is an addition, multiplication, subtraction, or something else. The NodeType property is there in the Expression class to carry that additional information about an expression.

images

Figure 2-3. Shape of a binary expression

The code snippet below is an example that prints out the Type and NodeType properties of a BinaryExpression instance and two ConstantExpression instances. The two ConstantExpression instances act as the left and right operands of the binary expression. The expressions in this code example match exactly the tree shape shown in Figure 2-3. In the code snippet, the binary expression is referenced by the addExpression variable. Because the binary expression represents arithmetic addition of two integers, when the code prints out the Type property of the binary expression, the text “System.Int32” will show up on the screen. As for the two ConstantExpression instances, because they represent constant integers, when the code prints out their Type property, we will see “System.Int32” on the screen too. The NodeType property of the binary expression has the value ExpressionType.Add to indicate that the binary expression is a binary addition, not a binary multiplication or any other kind of binary expression. The NodeType property of the two ConstantExpression instances has the value ExpressionType.Constant to indicate that the expressions are constant expressions.

BinaryExpression addExpression = Expression.Add(
                       Expression.Constant(10),
                       Expression.Constant(20));

Console.WriteLine(addExpression.Type);
Console.WriteLine(addExpression.Left.Type);
Console.WriteLine(addExpression.Right.Type);

Console.WriteLine(addExpression.NodeType);
Console.WriteLine(addExpression.Left.NodeType);
Console.WriteLine(addExpression.Right.NodeType);

So far we've covered the important concepts of DLR Expression. The next few sections will introduce the various DLR expression classes, much like we'd introduce the language constructs of a programming language. You'll see examples that show how to write arithmetic, if, switch, for loop, and other DLR expressions.

Binary Expressions

Binary expressions are expressions that have a left and a right child expression, such as one that adds two integers. The expression is binary. Its left child expression is the left operand of the addition operation. Its right child expression is the right operand. Binary expressions are represented as instances of the BinaryExpression class. Listing 2-3 shows first the C# code that adds two doubles and divides the result by 3. Below the C# code is the equivalent code in DLR Expression.

In the DLR Expression part of the example, the doubles are constant expressions. The example calls Expression.Constant to create them. The Expression class provides factory methods for creating instances of built-in expression classes. The factory method for ConstantExpression is Expression.Constant. Similarly, the example calls Expression.Add and Expression.Divide to create the two binary expressions that represent the addition and division.

Listing 2-3. BinaryExamples.cs

public static void CSharpExample()
{
    double result = (10d + 20d) / 3d;
    Console.WriteLine(result);
}

public static void LinqExample()
{
    Expression binaryExpression = Expression.Divide(
        Expression.Add(                 //left child of division
                Expression.Constant(10d),  //left child of addition
                Expression.Constant(20d)), //right child of addition
        Expression.Constant(3d)         //right child of division
    );

    Func<double> binaryDelegate = Expression.Lambda<Func<double>>(binaryExpression)
                                        .Compile();
    Console.WriteLine(binaryDelegate());
}

Both Expression.Add and Expression.Divide return an instance of BinaryExpression. The difference is that the BinaryExpression that Expression.Add returns has ExpressionType.Add as the value of its NodeType property whereas the BinaryExpression that Expression.Divide returns has ExpressionType.Divide. Tables 2-2 and 2-3 list all the different ExpressionType values that a binary expression's NodeType property can take. We have seen some of them in Table 2-1 already. Tables 2-2 and 2-3 present the complete list of all binary expressions and their NodeType property values. For each ExpressionType value in the first column of the tables, Table 2-2 and Table 2-3 show the equivalent C# binary operator in the second column. For each ExpressionType value, the tables don't have a column that shows the corresponding factory method in the Expression class for creating a binary expression for that ExpressionType value. I omit it because the factory method names are the same as their corresponding ExpressionType values. For example, in the first row of Table 2-2, the ExpressionType value is Add and the corresponding factory method is Expression.Add.

Table 2-2 shows the ExpressionType values for binary arithmetic expressions. Table 2-3 shows the ExpressionType values for the rest of binary expressions.

Table 2-2. Arithmetic Binary Expressions

NodeType property value Example of Equivalent
C# Operator
Add
AddAssign
AddChecked
AddAssignChecked
5 + 2
x += 2
checked (x + 2)
checked (x += 2)
Subtract
SubtractAssign
SubtractChecked
SubtractAssignChecked
5 − 2
x -= 2
checked (x - 2)
checked (x -= 2)
Multiply
MultiplyAssign
MultiplyChecked
MultiplyAssignChecked
5 * 2
x *= 2
checked (x * 2)
checked (x *= 2)
Divide
DivideAssign
5 / 2
x /= 2
Modulo
ModuloAssign
5 % 2
x %= 2
Power
PowerAssign
5 ^ 2
x ^= 2

Table 2-3. Non-Arithmetic Binary Expressions

NodeType Example of Equivalent
C# Operator
And
AndAlso
AndAssign
true & false
true && false
x &= true
Or
OrElse
OrAssign
true | false
true || false
x |= false
ExclusiveOr
ExclusiveOrAssign
true ^ false
x ^= false
LessThan
LessThanOrEqual
5 < 2
5 <= 2
GreaterThan
GreaterThanOrEqual
5 > 2
5 >= 2
Equal
NotEqual
5 == 2
5 != 2
RightShift
RightShiftAssign
5 >> 2
x >>= 2
LeftShift
LeftShiftAssign
5 << 2
x <<= 2
Assign x = 2
ArrayIndex x[2]
Coalesce (int) x

Flow Control Expressions

This section will look at examples of if and switch flow control expressions in the DLR. Because the examples will use the expression that calls the Console.WriteLine method in several places, it helps to wrap that expression into the Print helper method shown in Listing 2-4. This way the code examples will be more succinct and easier to read.

Listing 2-4. The Print helper method in ExpressionHelper.cs

public class ExpressionHelper
{
    public static Expression Print(string text)
    {
        return Expression.Call(
                   null,
                   typeof(Console).GetMethod("WriteLine", new Type[] { typeof(String) }),
                   Expression.Constant(text)
               );
    }
}

If-Then-Else Expressions

Listing 2-5 has two methods, CSharpExample()and LinqExample(). The code in CSharpExample() is a simple if-else statement that will print “true” to the screen when you run it. The equivalent code in DLR Expression is in LinqExample(),which  calls the Expression.IfThenElse factory method to create an instance of ConditionalExpression. The conditional expression has three parts—if-test, if-true, and if-false. Each of the tree parts is an expression by itself. In this example, the if-test part is a constant expression that has true as its value. The if-true expression is a method call expression that will print “true” to the screen when executed. The if-false expression is a method call expression that will print “false” to the screen when executed.

Listing 2-5. IfExamples.cs

public static void CSharpExample()
{
    if (true)
        Console.WriteLine("true");
    else
        Console.WriteLine("false");
}

public static void LinqExample()
{
    Expression ifExpression = Expression.IfThenElse(
        Expression.Constant(true),
        ExpressionHelper.Print("true"),
        ExpressionHelper.Print("false")
    );

    Action ifDelegate = Expression.Lambda<Action>(ifExpression).Compile();
    ifDelegate();
}

Switch Expressions

Now we'll look at an example of constructing a switch expression in the DLR. As before, the code in Listing 2-6 shows the example first in C# and then in DLR Expression. The code in CSharpExample() is a simple switch statement that will print “case 1” to the screen when you run it. The equivalent code in DLR Expression is in LinqExample(), which calls the Expression.Switch factory method to create an instance of the SwitchExpression class. The switch expression in the example consists of two parts—the switch value and the two cases. The switch value is an expression. The switch cases are not; they are instances of the SwitchCase class. In this example, the switch value is a constant expression that has integer 1 as its value. The example code calls the Expression.SwitchCase factory method to create the SwitchCase instances. Each SwitchCase instance consists of two parts—the conditions and the body of the case, all of which are expressions. A switch case can have one or more conditions. In our example, the first case has one condition, the integer 1, and the second case has two conditions, the integers 2 and 3.

One important difference between the C# code and the DLR expression code is that in the latter, there is no need to have a break expression to mark the end of each case.

Listing 2-6. SwitchExamples.cs

public static void CSharpExample()
{
    switch (1)
    {
        case 1:
            Console.WriteLine("case 1");
            break;
        case 2:
        case 3:
            Console.WriteLine("case 2 and 3");
            break;
    }
}

public static void LinqExample()
{
    SwitchExpression switchExpression = Expression.Switch(
        Expression.Constant(1),
        new SwitchCase[] {
            Expression.SwitchCase(
                ExpressionHelper.Print("case 1"),
                Expression.Constant(1)
            ),
            Expression.SwitchCase(
                ExpressionHelper.Print("case 2 and 3"),
                Expression.Constant(2),
                Expression.Constant(3)
            )
        }
    );

    Action switchDelegate = Expression.Lambda<Action>(switchExpression).Compile();
    switchDelegate();
}

Scopes and Name Binding

Every language defines its own scoping rules for binding names, and so does DLR Expression. Before I explain what that means, let's look at an example. Listing 2-7 shows some C# code and the scopes it defines:

Listing 2-7. Scopes

namespace ExpressionExamples
{                                                       //new scope ExpressionExamples
    public class Employee
    {                                                   //new scope Employee
        private double monthlySalaryRate = 1000d;

        public double calculateBonus(int performanceRating)
        {                                               //new scope calculateBonus
            return monthlySalaryRate * performanceRating;
        }                                               //end of scope calculateBonus
    }                                                   //end of scope Employee

    public class FatCat
    {                                                   //new scope FatCat
        private double monthlySalaryRate = 1E10;

        public double calculateBonus(int performanceRating)
        {                                               //new scope calculateBonus
            return monthlySalaryRate * 1E20;
        }                                               //end of scope calculateBonus
    }                                                   //end of scope FatCat
}                                                       //end of scope ExpressionExamples

In the example, I purposely define the same calculateBonus method names and the same monthlySalaryRate variable names to illustrate the effect scopes have on name bindings. When the same name shows up multiple times in code, the language's compiler or interpreter needs to have rules for determining what those occurrences of the same name refer to. The rules, as it turns out, are defined in terms of scopes in most if not all languages.

In general, the way it works is very simple (we'll get to some subtle details later). First, you can't define two things with the same name in one scope. In the code above, the names Employee and FatCat have to be different because they are defined in the same scope. Second, sibling scopes are totally isolated from each other. That's why we can have the same variable names and method names in Employee and FatCat because the scopes Employee and FatCat define are siblings. Last, child scope (i.e., a scope nested within another scope) has visibility into its parent and parent's parent and so on. The scope defined by the calculateBonusmethod in the Employeeclass is a child scope of the Employee class's scope. That's why in that method, when we use the name monthlySalaryRateto calculate an employee's bonus, the C# compiler knows we are referring to the Employee class's monthlySalaryRate, not FatCat's monthlySalaryRate.

Astute readers might notice that the scopes in the code example correspond to open-close curly brace pairs one to one. What's more, the nesting of curly brace pairs and the nesting of scopes match up perfectly. That can't be just coincidence, can it? No, it isn't and in fact the essence of curly braces in C# is to define scopes for name bindings. In Listing 2-7, we saw that namespaces, classes, and methods can introduce new scopes. Besides those, in C#, the flow control statements like while, if-then-else, and for can be followed by open-close curly brace pairs and thereby introduce new scopes.

Lexical vs. Dynamic Scope

We will soon get to the discussion of DLR Expression on the subject of scopes and name binding. But first I want to show you some more examples and explain the term lexical scope.

Listing 2-8 shows a C# program on the left and the equivalent Python program on the right. Listing 2-9 shows a similar C# program on the left and its equivalent Python program on the right. Notice that in each program, the function addToY is called twice—once directly by the line addToY(5) and the other time indirectly by the line add4(5).

In Listing 2-8, the function add4assigns number 4 to the outer y and then calls addToY. That does not change the fact that the name y in addToY is bound to outer y. If you run either of the two programs in Listing 2-8, you will see the numbers 7 and 9 printed on the screen.

Listing 2-8. Examples of Lexical Scopes and Name Binding with No Local Variables

images

In Listing 2-9, the function add4 assigns number 4 to a local y and then calls addToY. That again does not change the fact that the name y in addToY is bound to outer y. If you run either of the two programs in Listing 2-8, you will see the numbers 7 and 7 (because the value of outer y is never changed) printed on the screen.

Listing 2-9. Examples of Lexical Scopes and Name Binding with Local Variables

images

The point I want to emphasize with these programs is that in all four cases, (a) no matter where addToY is called (directly or indirectly within another function), and (b) no matter whether we change the value of a local y or the outer y before calling addToY within add4, the name y in function addToY always refers to the outer y. The point, in other words, is that the scope of addToY's name binding does not depend on where addToY is called. It depends on where addToY is defined. Since addToY is defined in the scope where the outer y is also defined, the name y in addToYalways refers to the outer y. This kind of scoping rule is called lexical scoping or static scoping. The opposite of lexical scoping is dynamic scoping. With dynamic scoping, the name y in addToY might not always be bound to the outer y. The binding is more fluid, and it depends on where addToY is called. All of the languages used in this book such as DLR Expression, C#, Ruby and Python use lexical scoping. Therefore, we will not discuss dynamic scoping in more detail.

BlockExpression and Lexical Scoping

So far, we've been using C# as an example for explaining scopes and name binding. Let's see some examples in DLR Expression. Listing 2-10 shows a C# example of nested scopes. Listing 2-11 shows a similar example in DLR Expression. The C# example defines a lambda function add. The body of the add function has two blocks—i.e., the two pairs of left and right curly braces. Each block defines a scope. The outer scope declares the variable x. The inner scope declares the variable y. Because the name x is declared in the outer scope, C# does not allow us to bind that name to different object in the inner scope. That's why in the inner scope if we declare a variable of the same name x, we'll get a compilation error.

Listing 2-10. C# Example in NestedBlockExamples.cs

public static void BlockLexicalScopeCSharpExample()
{
    Func<int> add = () =>
    {
        int x = 2;
        {
            //int x = 1; //Compilation error. C# compiler does not allow this.
            int y = 3;
            return x + y;
        }
    };

    int result = add();
    Console.WriteLine("result is {0}", result);
}

DLR Expression does not impose this limitation and the code in Listing 2-11 is the proof. The DLR example, like the C# example, also has two blocks. Each block defines a scope. Unlike C#, DLR Expression allows us to declare a variable x in the outer scope (line 7) and another variable x in the inner scope (line 10). Both of these are references to the same bolded ParameterExpressioninstance x in line 3. However, even though they are references to the same ParameterExpressioninstance x, the fact that they are in different scopes has an impact on name bindings. That impact is illustrated in the example with the code Expression.Add(x, y) in line 13. The result of the addition is 3, not 5. This is because the addition expression is in the scope where the name x is bound to the inner scope's local variable x. The inner scope's local variable xis implicitly initialized with the value 0. So the result of the addition is 3.

If we don't declare the local variable x in the inner scope, the result of the addition expression will be 5. The result is 5 because the addition will add the outer scope's x and the inner scope's y. To not declare the local variable x in the inner scope, all you need to do is to replace line 10 in Listing 2-11 with the following line:

//result will be 5 if you use the following line of code to replace line 10 in Listing 2-11
new ParameterExpression[] { y },

Listing 2-11. DLR Expression Example in NestedBlockExamples.cs

1)   public static void BlockLexicalScopeLinqExample()
2)   {
3)       ParameterExpression x = Expression.Variable(typeof(int), "x");
4)       ParameterExpression y = Expression.Variable(typeof(int), "y");
5)
6)       Expression add = Expression.Block(
7)               new ParameterExpression[] { x },
8)               Expression.Assign(x, Expression.Constant(2)),
9)               Expression.Block(
10)                  new ParameterExpression[] {x, y}, //Unlike C#, DLR allows this.
11)                  Expression.Assign(y, Expression.Constant(3)),
12)                  Expression.Add(x, y)
13)              )
14)         //If we print out the value of x here, it will be 2.
15)          );
16)
17)
18)      int result = Expression.Lambda<Func<int>>(add).Compile()();
19)      Console.WriteLine("result is {0}", result);
20)  }

Notice the comment in line 14 of Listing 2-11. It says in the outer scope, if we print out the outer scope's x variable, we will see the number 2—even if the inner scope declares another variable x that's initialized to 0. Let me prove that to you in code, and then I'll summarize the key takeaways from our discussion of the DLR's BlockExpression.

Listing 2-12 shows the code that prints out the outer scope's x variable. The code in Listing 2-12 is a simplified version of the code in Listing 2-11. First, we don't need the variable y from Listing 2-11 to demonstrate the point I'm trying to make. The code in Listing 2-12 simply has a variable x in the outer scope and another variable x in the inner scope. In the inner scope, the variable x is assigned the number 1. Because the inner scope's x is bound to a different object than the outer scope's x, assigning the number 1 to the inner scope's x does not change the value of the outer scope's x. So when the code prints out the outer scope's x, the number we'll see on the screen is 2, not 1. The bolded line of code in Listing 2-12 is the line that prints out the outer scope's variable x to the screen. The bolded line of code is a call to the static Print helper method in the ExpressionHelper class. This Print helper method is slightly different from the Print helper method we saw in Listing 2-4. The Print helper method in Listing 2-4 takes a String object as the input parameter. The Print helper method here takes an Expression instance as the input parameter. Listing 2-13 shows the implementation of the Print helper method that takes an Expression instance as input.

Listing 2-12. NestedBlockExamples.cs

public static void OuterScopeVariableNotChangedByInnerScopeLinqExample()
{
    ParameterExpression x = Expression.Variable(typeof(int), "x");
            
    Expression block = Expression.Block(
            new ParameterExpression[] { x },
            Expression.Assign(x, Expression.Constant(2)),
            Expression.Block(
                new ParameterExpression[] { x },
                Expression.Assign(x, Expression.Constant(1))
            ),
            ExpressionHelper.Print(x)
        );

    Expression.Lambda<Action>(block).Compile()();
}

Listing 2-13. The Print Helper Method That Takes an Expression Instance as Input

public static Expression Print(Expression expression)
{
    return Expression.Call(null,
                typeof(Console).GetMethod("WriteLine", new Type[] { typeof(int) }),
                expression);
}

Here are the key takeaways from our discussions of BlockExpression:

  • Even if an outer block already declares variable x, DLR BlockExpression allows an inner block of the outer block to declare variable x again. C# does not allow that.
  • If an outer block and an inner block both declare variable x, those two variables are bound to different objects. Changing the value of one variable does not change the value of the other variable. This is true even when the two variables are represented by the same ParameterExpression instance.

Lambda Expressions and Closure

Lambda expressions are so called because they are based on a theory called lambda calculus in mathematics. Let's look at a C# example of a lambda expression and then I'll use that example to explain some concepts in lambda calculus. Although I had a lot of fun (and pain as well) with lambda calculus while taking the course “Introduction to Programming Language Theory” at Stanford, I promise I won't digress and stray into the parts of lambda calculus that aren't necessary for our discussion in this section.

Listing 2-14 shows a C# code example that creates a lambda expression. The lambda expression is the part in bold. It is an expression and therefore can be assigned to the variable add. The lambda expression takes two input parameters x and y of type int and returns a value of type int. The body of the lambda expression is { return x + y ; }. The x and y in the body of the lambda expression are bound by the input parameters x and y. Therefore, the x and y in the body of the lambda expression are said to be bound variables.

Listing 2-14. A C# Example of a Lambda Expression

public static void LambdaCSharpExample()
{
    Func<int, int, int> add = (x, y) => { return x + y; };
    int result = add(3, 5);
    Console.WriteLine("result is {0}", result);
}

Listing 2-15 shows the DLR Expression equivalent. This code calls the Expression.Lambda<T> factory method to create an instance of Expression<T>. Here T is the delegate type of the lambda expression the factory method creates. The bolded code in line 7 of Listing 2-15 is the body of the lambda expression. The x and y in line 8 are the two input parameters of the lambda expression. The x and y in the body of the lambda expression are bound by the x and y input parameters in line 8.

Listing 2-15. A DLR LambdaExpression Example

1)   public static void LambdaLinqExample()
2)   {
3)       ParameterExpression x = Expression.Parameter(typeof(int), "x");
4)       ParameterExpression y = Expression.Parameter(typeof(int), "y");
5)
6)       Expression<Func<int, int, int>> add = Expression
7)                 .Lambda<Func<int, int, int>>(Expression.Add(x, y),
8)                                              x, y);
9)
10)      int result = add.Compile()(3, 5);
11)      Console.WriteLine("result is {0}", result);
12)  }

So far, the lambda expressions and bound variables we've discussed all have their roots in the lambda calculus theory. Lambda calculus also includes the opposite of bound variables, called free (or unbound) variables, and  C# and DLR Expression allow free variables in lambda expressions, too. Listing 2-16 shows a C# example of a lambda expression that has free variables. The code first declares two variables x and y,then it creates a lambda expression. The body of the lambda expression is the same as the lambda expression body in Listing 2-14. However, in Listing 2-16, the lambda expression does not take any input parameters and, because of this, the variables x and y in the body of the lambda expression are said to be free variables. The free variables need to be bound before the lambda expression can be executed. Because the lexical scope in which the lambda expression is created defines the variables x and y in lines 3 and 4, the x and yin the body of the lambda expression in line 7 are bound to the variables x and y in lines 3 and 4. The lambda expression and the variables x and y in line 3 and line 4 together is called a closure.

Because the lambda expression's free variables are bound in the closure, we can execute the closure. The code in Listing 2-16 purposely executes the closure within a block that spans from line 10 to line 13. The block is redundant in the code. I wrote the code in such a way so that it's as close as possible to the almost equivalent DLR code in Listing 2-17.

Listing 2-16. A C# Example of a Lambda Expression That Has Free Variables

1)   public static void ClosureLexicalScopeCSharpExample()
2)   {
3)       int x = 2;
4)       int y = 1;
5)
6)       //The 'add' delegate and variables x, y form a closure.
7)       Func<int> add = () => { return x + y; };
8)
9)       int result;
10)      {
11)          //int y = 3; //C# compiler does not allow this.
12)          result = add();
13)      }
14)
15)      Console.WriteLine("result is {0}", result);
16)  }

Listing 2-17 shows a DLR Expression example of a lambda expression that has free variables. In Listing 2-17, the code defines a lambda expression and assigns it to the variable add in line 12. The lambda expression has two free variables, x and y, that are bound to the x and y declared in line 9. The key point here is that lexical scoping determines what the free variables x and yare bound to. The free variables x and y are bound to the x and y declared in line 9 because the x and y in line 9 are declared in the same lexical scope as the lambda expression. Because the lambda expression is defined in the outer scope, the free variables x and y are bound to the outer scope's local variables x and y. To prove that, the example deliberately invokes the closure in an inner scope where the name y is bound to the inner scope's local variable y, which has value 3. It doesn't matter where the closure is invoked. The free variables are always bound to the outer scope's local variables x and y. Therefore, the result of running the code in Listing 2-17 is 3 (i.e., 2 + 1), not 5 (i.e., 2 + 3).

Listing 2-17. A DLR Expression Example of a Lambda Expression That Has Free Variables.

1)   public static void ClosureLexicalScopeLinqExample()
2)   {
3)       ParameterExpression x = Expression.Variable(typeof(int), "x");
4)       ParameterExpression y = Expression.Variable(typeof(int), "y");
5)       ParameterExpression add = Expression.Variable(typeof(Func<int>), "add");
6)
7)       Expression addExpression = Expression.Block(
8)          //add is defined in the outer scope but invoked in the inner scope.
9)          new ParameterExpression[] { x, add, y },
10)         Expression.Assign(x, Expression.Constant(2)),
11)         Expression.Assign(y, Expression.Constant(1)),
12)         Expression.Assign(add, Expression.Lambda<Func<int>>(
13)                  Expression.Add(x, y))),  //x, y here are bound to outer scope's x, y
14)         Expression.Block(
15)                  new ParameterExpression[] { y },
16)                  Expression.Assign(y, Expression.Constant(3)),
17)                  Expression.Invoke(add)  //invoke add in the inner scope.
18)              )
19)          );
20)
21)      int result = Expression.Lambda<Func<int>>(addExpression).Compile()();
22)      Console.WriteLine("result is {0}", result);
23)  }

The GotoExpression Class

Many languages have language constructs such as for, for-each, while, and do-while for performing iterations.  DLR Expression supports those, too. However, instead of providing the various high-level constructs for doing iterations, the DLR provides the GotoExpressionclass as a lower-level construct that those high-level constructs can base on. This section will look at GotoExpressionand compare it to C#'s goto statements. The next section will show you how to use GotoExpression to achieve what C#'s while statements can do.

Listing 2-18 shows a C# example of goto statements, and Listing 2-19 shows a similar example in DLR Expression. The C# example shows that C# does not allow code that jumps from an outer scope to a label declared in an inner scope. Line 4 in Listing 2-18 tries to jump from an outer block to the InnerBlock label declared in an inner block. If you uncomment line 4, you'll get compilation error.

As this demonstrates, C# does not allow code to jump from outer scope to a label declared in an inner scope. However, C# allows jumps in the other direction: It allows code to jump from an inner scope to a label declared in an outer scope. The code in line 10 does exactly that.

Listing 2-18. GotoExamples.cs

1)   public static void CSharpExample()
2)   {
3)       //C# cannot do this jump
4)       //goto InnerBlock;
5)
6)       {
7)       InnerBlock:
8)           Console.WriteLine("In inner block.");
9)           //jump to outer block
10)          goto OuterBlock;
11)          Console.WriteLine("This line is unreachable");
12)      }
13)
14)  OuterBlock:
15)      Console.WriteLine("In outer block.");
16)  }

This limitation in C# does not exist in DLR Expression. As Listing 2-19 shows, DLR Expression allows us to jump in both directions. To jump, we need to label the target we want to jump to. So the example code in Listing 2-19 creates two instances of LabelTarget, innerBlock and outerBlock, in lines 3 and 4, to represent the two targets. In lines 12 and 17, the example calls the Expression.Label method and passes it a LabelTarget instance to mark a place in code we can jump to. To jump to a target, the example calls ExpressionGoto in lines 9 and 14. Now let's trace the execution flow of the code and see where the code jumps. In line 9, the code is to jump to the innerBlock label, which is marked in line 12. So the code execution skips line 10 and jumps to line 12. It continues to line 13 and prints out “In inner block.” Then it gets to line 14. The code in line 14 is to jump to the outerBlock label, which is marked in line 17. So the code execution skips line 15 and jumps to line 17. Then it continues to line 18, prints out “In outer block” and finishes.

Listing 2-19. GotoExamples.cs

1)   public static void LinqExample()
2)   {
3)       LabelTarget innerBlock = Expression.Label();
4)       LabelTarget outerBlock = Expression.Label();
5)
6)       Expression<Action> lambda = Expression.Lambda<Action>(
7)           Expression.Block(
8)               //DLR can do this jump
9)               Expression.Goto(innerBlock),
10)              ExpressionHelper.Print("Unreachable"),
11)              Expression.Block(
12)              Expression.Label(innerBlock),
13)                  ExpressionHelper.Print("In inner block."),
14)                  Expression.Goto(outerBlock),
15)                  ExpressionHelper.Print("Unreachable")
16)              ),
17)              Expression.Label(outerBlock),
18)              ExpressionHelper.Print("In outer block.")));
19)
20)      lambda.Compile()();
21)  }

While Loops

Now that we've introduced GotoExpression, let's see how to use it to achieve the equivalent of what while statements do in C#. First, let's see the C# example that will then be translated into the DLR Expression example. Listing 2-20 shows two C# methods. The first method, CSharpExample, has a while loop that adds the numbers 0, 1, and 2. The second method, CSharpGotoExample does the same thing except that instead of using a while loop, it uses C#'s goto statements.

Listing 2-20. C# Examples in WhileExamples.cs

public static void CSharpExample()
{
    int i = 0;
    while (i < 3)
       i++;

    Console.WriteLine("i is {0}", i);
}

public static void CSharpGotoExample()
{
    int i = 0;

WhileLabel:

    if (i < 3)
    {
        i++;
        goto WhileLabel;
    }

    Console.WriteLine("i is {0}", i);
}

As you can see, if a language already defines the if and goto constructs, the while construct is merely syntactic sugar. The syntactic sugar might let users of the language write more concise and readable code, but it doesn't let them express anything that they can't express with if and goto. Given the CSharpGotoExample method in Listing 2-20, it's pretty straightforward to translate that code into the equivalent code in DLR Expression in Listing 2-21. This code creates an instance of LabelTarget called whileLabel (line 3). It uses whileLabel to mark the target of the goto expression (line 9), then it calls the Expression.Goto factory method to create a goto expression that jumps to the target (line 13).

Notice that the code in line 12 calls the PostIncrementAssign factory method to create an expression that represents the code i++. The method PostIncrementAssign returns an instance of UnaryExpression whose NodeType property is ExpressionType.PostIncrementAssign. There are many other kinds of unary expressions for representing unary operations such as ++i, --i, -i (negation), etc. You can refer to the MSDN documentation for a comprehensive list of these unary operations.

Besides GotoExpression, another way to do looping is to use the LoopExpression class in the System.Linq.Expressions namespace. After seeing the code examples in this section, it should be straightforward to learn how to use LoopExpression by reading Microsoft's MSDN documentation. The DLR does not provide anything like a WhileExpression class for doing while loops in particular yet.  

Listing 2-21. DLR Expression example in WhileExamples.cs

1)   public static void LinqExample()
2)   {
3)       LabelTarget whileLabel = Expression.Label();
4)       ParameterExpression i = Expression.Variable(typeof(int), "i");
5)
6)       Expression<Func<int>> lambda = Expression.Lambda<Func<int>>(
7)           Expression.Block(
8)               new ParameterExpression[] {i},
9)               Expression.Label(whileLabel),
10)              Expression.IfThen(Expression.LessThan(i, Expression.Constant(3)),
11)              Expression.Block(
12)                  Expression.PostIncrementAssign(i),
13)                  Expression.Goto(whileLabel))),
14)              i));
15)
16)      int result = lambda.Compile()();
17)      Console.WriteLine("i is {0}", result);
18)  }

Dynamic Expressions

So far none of our discussions about DLR Expression involves dynamic behaviors. Everything has been statically typed. Here's an example of what I mean. In the section “Expression Type and Kind,” we saw the following code snippet:

BinaryExpression addExpression = Expression.Add(Expression.Constant(10),
                   Expression.Constant(20));
Console.WriteLine(addExpression.Type);
Console.WriteLine(addExpression.Left.Type);
Console.WriteLine(addExpression.Right.Type);

This code constructs a BinaryExpression object to represent the addition of two integers. The expression has two subexpressions—the left operand expression and the right operand expression. The left operand expression represents the integer constant 10 and hence its Type property is System.Int32. Similarly, the right operand expression represents the integer constant 20 and hence its Type property is also System.Int32. Adding two integers will result in another integer. Therefore, the Type property of addExpression is System.Int32. If you run the code, you'll see the text “System.Int32” printed three times on the screen.

The point I want to stress is that all three expressions know statically the type of the value they represent. So how about the dynamic C# code shown below in Listing 2-22? Is DLR Expression capable of expressing that? The answer is yes, and that's the topic of this section.

Listing 2-22. A C# Example That Contains a Late-Bound Binary Addition.

public static void CSharpExample()
{
    dynamic x = 5;
    dynamic y = x + 2;
    Console.WriteLine("5 + 2 = {0}", y);
}

The challenge here is that when the code in the bolded line in Listing 2-22 adds x and 2, it doesn't know x's type. Well, you might think it's obvious from the code that x is an integer and has the value 5. That's true in this example. But in general, x can come to the bolded line by other means, perhaps as an input argument of the CSharpExample method. In that case, it's totally up to the caller of the CSharpExample method what the variable x will be.

Because in the bolded line we don't know the static type of x, we can't simply use BinaryExpression to represent the addition. The variable x might not be an integer. It might be some wacky object that simply knows how to add 2 to itself. In order to represent dynamic code like the bolded line in Listing 2-22, the DLR provides the DynamicExpression class. DynamicExpression works very much like what the C# compiler does when it compiles the code in Listing 2-22. So let's look at what the C# compiler does first and then use that knowledge to help us explain DynamicExpression.

When the C# compiler sees the code in Listing 2-22, it compiles it into something similar to what Listing 2-23 compiles to. The code might look baffling because it has things like call site binder and call site that I haven't explained. To fully explain those concepts requires a fair amount of background information. You'll see the detailed explanation of those concepts in Chapters 3 and 4. For now, I want to stay focused on DynamicExpression, so I'll explain those concepts just enough for our discussion.

The variable binder in Listing 2-23 knows how to do late binding for binary additions. The variable binder in our example points to an instance of the SimpleOperationBinder class, which is where the late binding logic is. Listing 2-24 shows the SimpleOperationBinder class, which returns the number 3 as the result of late binding (lines 10 to 13). So if we use an instance of SimpleOperationBinder to perform the late binding of an addition operation, no matter what the two operands of the addition are, the result will always be 3. Of course, no one would find much practical use in a binder like that. I'm using it here because it's the simplest binder I can think of for our example. The C# compiler, of course, won't compile the code in Listing 2-22 into something that uses SimpleOperationBinder. C# has a set of binder classes for performing the late binding logic it desires. We will see some examples of the C# binder classes in Chapter 4. Listing 2-23 is only a simplified illustration of what C# compiler does in compiling the dynamic code in Listing 2-22.

Once the binder is in place, to use the binder, the code in Listing 2-23 creates a call site. The call site is an instance of CallSite<T>. The generic type parameter T in this case is the delegate type Func<CallSite, object, object, object>. You can ignore it for now and be assured that it will become clear when we get to Chapters 3 and 4. The important thing to notice here is that the binder variable is passed to the CallSite<T>.Create method when the call site is created (line 6). Because of that, the call site will use the binder instance created in line 3 to do the late binding when the call site's Target method is invoked (line 9). Finally, when line 10 prints the result of the late binding, we will see the number 3 show up on the screen.

Listing 2-23. A Simplified Illustration of What C# Compiler Does with Dynamic Code (DynamicExamples.cs)

1)   public static void CSharpSimpleBinderExample()
     {
        CallSiteBinder binder = new SimpleOperationBinder();

        CallSite<Func<CallSite, object, object, object>> site =
          CallSite<Func<CallSite, object, object, object>>.Create(binder);

        //This will invoke the binder to do the binding.
        object sum = site.Target(site, 5, 2);
        Console.WriteLine("Sum is {0}", sum);
}

Listing 2-24. SimpleOperationBinder.cs

1)   class SimpleOperationBinder : BinaryOperationBinder
2)   {
3)       public SimpleOperationBinder()
4)           : base(ExpressionType.Add)
5)       { }
6)
7)       public override DynamicMetaObject FallbackBinaryOperation(DynamicMetaObject target,
8)             DynamicMetaObject arg, DynamicMetaObject errorSuggestion)
9)       {
10)          return new DynamicMetaObject(
11)                  Expression.Convert(Expression.Constant(3), typeof(object)),
12)                  BindingRestrictions.GetExpressionRestriction(
13)                      Expression.Constant(true)));
14)      }
15)  }

Now that we've seen conceptually what the C# compiler does in compiling code that involves late binding, let‘s see how DynamicExpression is related to all that. Simply put, there are two ways to do late binding in the DLR. One is to create binders and call sites and use them like the code in Listing 2-23 shows. The other way is to create binders and instances of DynamicExpression and use them like the code in Listing 2-25 shows. Internally, when a DynamicExpression instance is compiled into IL and executed like line 11 in Listing 2-25, a call site is created and the whole late binding process will take place similarly to what the code in Listing 2-23 does. The fact that when a DynamicExpression instance is executed a call site will be created is what underlies a useful late binding technique called deferred binding. We will look at deferred binding in Chapter 4. I mention it here so that when we get to that discussion in Chapter 4, you'll associate it with what you learn in this section.

To create and use a DynamicExpression instance, you first have to have a binder. So the code in line 4 of Listing 2-25 creates an instance of the SimpleOperationBinder class we saw in Listing 2-24 to serve as the binder in this example. Then the code in line 6 calls the Dynamic factory method of the Expression class, passing it the binder object as the first input parameter. The second input parameter passed to the Dynamic factory method is the return type of the late binding operation. In our example, the late binding operation returns the number 3 as an instance of System.Object in line 11 of Listing 2-24. That's why in line 7 of Listing 2-25, the code passes typeof(object) as the second input parameter the Dynamic method. The third input parameter passed to the Dynamic method is an array of the operands of the late-bound binary operation. Lines 10 and 11 compile the DynamicExpression instance into an executable delegate, which is executed in line 13. Line 14 prints the result of the execution to the screen, which will be the number 3.

Listing 2-25. An Example of DynamicExpression in DynamicExamples.cs

1)   public static void LinqExample()
2)   {
3)       //prepare a call site binder
4)       CallSiteBinder binder = new SimpleOperationBinder();
5)
6)       DynamicExpression dynamicExpression = Expression.Dynamic(
7)                   binder, typeof(object),
8)                   new [] {Expression.Constant(5), Expression.Constant(2)});
9)
10)      Func<object> compiledDelegate = Expression.
11)                     Lambda<Func<object>>(dynamicExpression).Compile();
12)
13)      object result = compiledDelegate();
14)      Console.WriteLine("result is {0}", result);
15)  }

Index Expressions

Now we'll look at an example of how to use the IndexExpression class in the System.Linq.Expressions namespace. An IndexExpression instance represents an array or property index. Listing 2-26 shows a C# example that uses an array index to change the value of an array's second element. The integer array numbers has three integers in it—7, 2, and 4. The code in Listing 2-26 changes the second element from 2 to 6. If you run the example, it will print 7, 6, 4 to the screen.

Listing 2-26. A C# Example That Uses an Array Index to Change an Array's Element

public static void CSharpExample()
{
    int[] numbers = { 7, 2, 4 };
    numbers[1] = 6;
    Console.WriteLine("{0}, {1}, {2}", numbers[0], numbers[1], numbers[2]);
}

Listing 2-27 shows the DLR Expression code equivalent to the C# code in Listing 2-26. Like the C# example in Listing 2-26, the code in Listing 2-27 also starts with the integer array numbers that has three integers in it. The code from line 5 to line 7 calls the ArrayAccess factory method of the Expression class to create an instance of the IndexExpression class. The IndexExpression instance represents index 1 of the array numbers. Then the code in lines 9 and 10 creates an assignment expression that assigns an expression representing the integer 6 to indexExpression. Because indexExpression represents index 1 of the array numbers, when the assignment expression arrayIndexExp is executed, the integer 6 will be assigned to the element whose index is 1 in the array numbers. If you run the example, it will print the same result 7, 6, 4 to the screen as the C# example in Listing 2-26 does.

Listing 2-27. A DLR Expression Example That Shows How to Use IndexExpression.

1)   public static void LinqExample()
2)   {
3)       int[] numbers = { 7, 2, 4 };
4)
5)       IndexExpression indexExpression = Expression.ArrayAccess(
6)           Expression.Constant(numbers),
7)           Expression.Constant(1));
8)
9)       Expression arrayIndexExp = Expression.Assign(
10)               indexExpression, Expression.Constant(6));
11)
12)      Action arrayIndexDelegate = Expression.Lambda<Action>(arrayIndexExp).Compile();
13)      arrayIndexDelegate();
14)      Console.WriteLine("{0}, {1}, {2}", numbers[0], numbers[1], numbers[2]);
15)  }

Expression Abstraction, Reduction and Extension

One feature of DLR Expression I like very much is expression reduction. The concept is simple and yet very powerful. The idea is that you can extend DLR Expression by defining your own expression classes. Since you are defining the classes, there is no way for the DLR Expression compiler/interpreter to know what they mean unless you tell it. And the way you tell it is by “reducing” your expressions to DLR expressions—the ones it already knows.  In other words, the set of expression classes in System.Linq.Expressions forms the baseline. If you can define the semantic meaning of your own expression classes in terms of those baseline classes, the DLR interpreter/compiler can understand and act on your expressions.

Expression reduction provides an extension mechanism we can use to define our own custom expression classes. Not only does expression reduction make DLR Expression extensible, it also allows for different levels of abstraction among expression classes. This is because if X reduces to Y, you can think of X as being more abstract than Y. So if your expression classes reduce to the baseline DLR expression classes, yours are said to have a higher level of abstraction.

Let's look at an example of expression reduction. In earlier code examples, we achieve better code quality by using the ExpressionHelper.Print helper method in instead of repeating the code inside that helper method all over the place. Let's refresh ourselves a little bit about what the ExpressionHelper.Print method does. The method returns an expression that represents a call to Console.WriteLine. That's all it does. Here we'll do the same thing again, but this time we'll use expression reduction rather than the helper method. The plan is we will define a new expression class, and we will provide the logic for reducing our expression to the same expression ExpressionHelper.Print returns. Listing 2-28 shows the code for our new expression class, PrintExpression.

Listing 2-28. PrintExpression.cs

public class PrintExpression : Expression
{
    private String text;
    
    private static MethodInfo _METHOD = typeof(Console).GetMethod(
        "WriteLine", new Type[] { typeof(String) });
    
    public PrintExpression(String text)
    {
        this.text = text;
    }

    public String Text
    {
        get { return text; }
    }
    public override bool CanReduce
    {
        get { return true; }
    }

    public override Expression Reduce()
    {
        return Expression.Call(
                    null,
                    _METHOD,
                    Expression.Constant(text));
    }

    public override ExpressionType NodeType
    {
        get { return ExpressionType.Extension; }
    }

    public override Type Type
    {
        get { return _METHOD.ReturnType; }
    }

    public override string ToString()
    {
        return "print " + text;
    }
}

There are two basic requirements we need to meet when implementing a custom expression class. First, our class must derive directly or indirectly from System.Linq.Expressions.Expression. Second, the NodeType property of our class must return ExpressionType.Extension. Beyond those requirements, since we want our PrintExpression to be able to reduce to a MethodCallExpression, we make the CanReduce property of our class return true. The actual logic that performs the reduction is in the Reduce method. As you can see in Listing 2-28, in the Reduce method, we simply invoke the Expression.Call factory method to create and then return an instance of MethodCallExpression that represents a call to Console.WriteLine. Last but not least, don't forget to take care of the Type property. In our case, our expression's value is the same as the return value of the call to Console.WriteLine. And that value's type is _METHOD.ReturnType.

Now that we have the PrintExpression class defined, let's see how it's used. Because all the printing-related code is modularized into the ExpressionHelper.Print method, we don't need to make changes all over the place in our code. All we need is modify the ExpressionHelper.Print method to use our PrintExpression class like the code snippet below shows:

public class ExpressionHelper
{
    public static Expression Print(string text)
    {
        return new PrintExpression(text);
    }
}

That's it. Just one line of code in the method body! At run time, when the DLR Expression interpreter/compiler sees an instance of PrintExpression, it knows that the expression is reducible. So it calls the Reduce method on the expression and gets back a MethodCallExpression instance. The code snippet above is good, but we can make it slightly better. If you recall, DLR Expression provides factory methods for creating expressions. Our code snippet above is a factory method for creating instances of PrintExpression. To make our factory method look more aligned with the DLR's factory methods, let's rename the class name ExpressionHelper to ExpressionEx. Let's also change the return type of the Print method from Expression to PrintExpression. After those changes, our code becomes:

public partial class ExpressionEx
{
    public static PrintExpression Print(String text)
    {
        return new PrintExpression(text);
    }
}

IronPython makes a lot of use of expression reduction. If you take a look at the IronPython source code in the IronPython.Compiler.Ast namespace, you'll see classes like ForStatement, ImportStatement, ScopeStatement, NameExpression and many others. Those classes all derive directly or indirectly from System.Linq.Expressions.Expression and implement their specific reduction logic in the Reduce method.

Immutability and the Visitor Pattern

A DLR expression can have child expressions. The child expressions can in turn have their own child expressions, and so on. All together, the expressions form a tree. There is one root expression in the tree. The leaves of the tree are the terminal expressions that don't have any child expression of their own.

Every expression in the tree is immutable. Once it's created, its states are read-only and can't be changed. The only way to achieve the effect of changing the states of an expression is to create a new expression and toss out the old one. The concept I've just described is termed immutability.

Immutability is a very generic programming concept, and it's not specific to DLR Expression. The class System.String is immutable, as are many other classes that have no relation to DLR Expression. There are many nice benefits of making a class immutable. For example, instances of an immutable class are automatically thread-safe. You don't need to synchronize access to those instances from multiple threads because those instances' states are read-only and, by virtue of that, those states can't be corrupted by thread race conditions.

Because expressions are immutable and because they almost always form a tree except in trivial cases, it's not a straightforward thing to change an expression in a tree. To change an expression in a tree, you also need to change that expression's parent, and the parent's, parent and so on. Figure 2-4 shows a pictorial view of this propagation of changes. All the nodes in the tree are immutable. When we change node 1 by creating a new node, we need to assign the new node as a new child of node 2. Because node 2 is immutable, we can't simply change its children. We have to create a new node 2 and assign new node 1 as its child. We also need to assign the other unchanged child of the old node 2 to be a child of the new node 2. And because we change node 2, we have to change node 3 by creating a new node 3. We then have to assign the new node 2 as a child of new node 3. We also need to assign the other 3 unchanged children of old node 3 to be children of the new node 3.

images

Figure 2-4. Propagation of changes in an expression tree

As you can see, it's a lot of work to change even a single node in a tree. When we make changes to a tree, there are mainly three things in the picture—the tree itself (i.e., the data structure), the tree traversal (i.e., walking the tree), and the actions we perform every time we encounter a node during tree traversal. Fortunately, this problem happens so often that a solution for it already exists—the Visitor design pattern. In the rest of this section, we will look at how the Visitor pattern provides an elegant solution for the problem of changing immutable trees. We will look at the pattern in general, as well as in the context of DLR Expression. As always, I will demonstrate how it works in practice with an example.

Visitor Pattern in General

Figure 2-5 shows the class diagram of the Visitor pattern in general. As the diagram shows, there are two class hierarchies in the Visitor pattern. The Element class hierarchy is the data structures and the Visitor class hierarchy is the algorithms that work on the data structures. The Client in Figure 2-5 is the glue that decides which algorithms to apply to which data structures. The spirit of the pattern is the decoupling between the data structures and the algorithms.

images

Figure 2-5. Class diagram of the Visitor design pattern

The Element class (which can be an interface) has an Accept method. The Accept method can be abstract in the Element class or it can have concrete implementation. Classes derived from Element can choose to override or not to override the Accept method. A typical implementation of the Accept method simply asks the visitor to visit the current element, like this:

virtual Element Accept(Visitor visitor)
{
    visitor.Visit(this);
}

From the code, it is clear that passing different visitors to the Accept method will have different results. If we need to introduce a new way of visiting the elements, we don't need to change any code in the Element class and its derivatives. We can encapsulate that new algorithm (i.e., the new way of visiting the elements) into a new subclass of the Visitor class and pass an instance of that subclass to the Accept method.

The Visitor class (it can be an interface) has one overloaded Visit method for each class derived from Element. Those overloaded Visit methods can be abstract or they can have a concrete implementation. A class derived from Visitor can choose to override those Visit methods that it wants to provide new logic for. For example, if we need VisitorX in Figure 2-5 to provide logic for visiting ElementA, then VisitorX will override only the method virtual Element Visit(ElementA element).

The previous section mentioned that when we make changes to a tree, there are mainly three things in the picture—the tree itself, the tree traversal, and the actions we perform at each node. That observation links to the Visitor pattern nicely. The tree is the data structure and is represented by the Element hierarchy in the Visitor pattern. The tree traversal plus the actions are algorithms and they are represented by the Visitor hierarchy.

Visitor Pattern in DLR Expression

DLR Expression implements a slight variation of the Visitor pattern. Figure 2-6 shows that variation in a class diagram.

images

Figure 2-6. Class diagram of the Visitor design pattern implemented in the DLR

There is one key difference to note about the class diagram in Figure 2-6, and that's the VisitChildren method in the Expression class. The VisitChildren method is about how to visit the child expressions of the current expression. In other words, the method is about tree traversal and hence should belong to the Visitor class hierarchy. However, in DLR Expression, the method is in the Expression class hierarchy. And if you look inside the source code of the DLR, you'll see that all of the VisitChildren methods implemented in BinaryExpression, BlockExpression, and other DLR expression classes delegate the real job to methods in the ExpressionVisitor class. That appears to be unnecessarily convoluted at first glance. Why does the DLR put the VisitChildren method in the Expression class hierarchy and delegate the real job to the Visitor class hierarchy? Why not simply follow the original Visitor pattern and put the VisitChildren method in the Visitor class hierarchy in the first place? The reason is expression extension.

For the built-in DLR expression classes, the DLR knows how to traverse their child expressions. For custom extensions like the PrintExpression class we implemented, the DLR does not know how we would like to traverse the child expressions, and therefore it leaves that decision to us. The DLR defines the VisitChildren method in the Expression class so that when we define a custom expression class, we can override the VisitChildren method and implement our own logic for traversing the custom expression's child expressions. This explains why the method VisitChildren exists in the Expression class hierarchy. For the built-in DLR expression classes, DLR chooses to have a more truthful implementation of the Visitor design pattern, and hence it makes the VisitChildren method in BinaryExpression, BlockExpression, and so on delegate the real job back to methods in the ExpressionVisitor class. This choice makes total sense. The intention of the DLR Expression design is to keep the Expression class hierarchy intact while allowing developers to implement whatever tree walking and visiting behavior they fancy. Because BinaryExpression, BlockExpression, and so on delegate the real job of the VisitChildren method to ExpressionVisitor, we can override those methods when we implement a class derived from ExpressionVisitor without ever needing to change anything in the Expression class hierarchy.

If you want to look at the DLR source code to see how the built-in DLR expression classes traverse their child expressions, here's a rundown of how the VisitChildren method in BinaryExpression, BlockExpression, and so on delegates the real job back to methods in the ExpressionVisitor class. The built-in expression classes like BinaryExpression and BlockExpression all inherit the VisitChildren method implementation from the base Expression class. If you look at the VisitChildren method implemented in the Expression class, you'll see that it takes a visitor (instance of ExpressionVisitor) as the input parameter and calls the Visit method on the visitor. The visitor's Visit method will in turn call back the Accept method of the expression. As you can see, the mechanism is convoluted. The expression calls the visitor and then the visitor calls back the expression. And the convolution is not finished yet. The expression's Accept method is overridden in each Expression subclass to call some method on the visitor. For example, if the expression is an instance of BinaryExpression, its Accept method will call the VisitBinary method on the visitor. And the VisitBinary method is where the logic resides for visiting a binary expression's child expressions.

Expression Visitor Examples

Let's take a look at two examples of walking and visiting expressions. The first example is a trivial case that doesn't involve visiting child expressions. I think it's helpful to use this simpler example to lead into the second example, which involves visiting child expressions. Both examples will use the PrintExpression class we built earlier.

Listing 2-29 shows the visitor class for the PrintExpression class we saw earlier. The PrintExpression class doesn't have any child expressions. Because PrintExpression is an extension expression (i.e., its NodeType property returns ExpressionType.Extension), we override the VisitExtension method in PrintExpressionVisitor. The overridden VisitExtension method checks whether the node argument, the currently visited node, is an instance of PrintExpression. If so, the method creates a new PrintExpression instance whose text is the text of the original PrintExpression instance with “Hello” appended to the beginning.

Listing 2-29. PrintExpressionVisitor.cs

public class PrintExpressionVisitor : ExpressionVisitor
{
    protected override Expression VisitExtension(Expression node)
    {
        if (!(node is PrintExpression))
            return base.VisitExtension(node);

        PrintExpression printExpression = (PrintExpression) node;
        return new PrintExpression("Hello " + printExpression.Text);
    }
}

Listing 2-30 shows the client code that uses PrintExpressionVisitor to modify an instance of PrintExpression. The PrintExpression instance has the text “Bob” at the beginning. After the visitor visits it, the text becomes “Hello Bob”, which is the text you'll see on the screen when you run the example code.

Listing 2-30. Using ExpressionVisitor to Modify a PrintExpression Instance (ExpressionVisitorExamples.cs)

public static void RunExpressionVisitorExample()
{
    PrintExpressionVisitor visitor = new PrintExpressionVisitor();
    Expression bob = ExpressionEx.Print("Bob");
    Expression visitedBob = visitor.Visit(bob);
    Action visitedBobDelegate = Expression.Lambda<Action>(visitedBob).Compile();
    visitedBobDelegate();
}

This example shows how expression visitors work without the complexity of child expressions. In reality, that rarely happens. More often than not, we need to deal with expressions that have child expressions while we walk and visit expression trees. So let's see an example of this.

First, we need an expression class that has at least one child expression. For that, let's define a new PrintExpression2 class based on PrintExpression. Listing 2-31 shows the PrintExpression2 class. As you can see, the code is largely the same as the code of PrintExpression. The only difference is that PrintExpression stores the text to print directly as a string whereas PrintExpression2 stores a ConstantExpression that in turn stores the string text. The ConstantExpression instance is the child expression of PrintExpression2.

Because PrintExpression2 has a child expression, we override the VisitChildren method that PrintExpression2 inherits from the Expression class. In the VisitChildren method, we need to specify how we would like to visit the child expression(s). In this example, there is only one child expression and we don't need to do anything special other than having the visitor visit it. To have the visitor visit the child expression, the VisitChildren method in Listing 2-31 simply calls the Visit method on the visitor variable and passes textExpression, the child expression, as an input argument to the Visit method.

Listing 2-31. PrintExpression2.cs

public class PrintExpression2 : Expression
{
    private ConstantExpression textExpression;
    
    private static MethodInfo _METHOD = typeof(Console).GetMethod(
        "WriteLine", new Type[] { typeof(String) });
    
    public PrintExpression2(ConstantExpression textExpression)
    {
        this.textExpression = textExpression;
    }

    public override bool CanReduce
    {
        get { return true; }
    }

    public override Expression Reduce()
    {
        return Expression.Call(
                    null,
                    _METHOD,
                    textExpression);
    }

    public override ExpressionType NodeType
    {
        get { return ExpressionType.Extension; }
    }

    public override Type Type
    {
        get { return _METHOD.ReturnType; }
    }

    protected override Expression VisitChildren(ExpressionVisitor visitor)
    {
        return visitor.Visit(textExpression);
    }

    public override string ToString()
    {
        return "print " + textExpression.Value;
    }
}

Now that we have an expression that has a child expression, let's see how to visit and modify it. Listing 2-32 shows the visitor class for PrintExpression2. The visitor class overrides not only the VisitExtension method but also the VisitConstant method. This is because PrintExpression2 is an extension expression and the child of PrintExpression2 is a constant expression (i.e., an instance of ConstantExpression). Notice also that the visitor class has a state, withinPrintExpression, to track whether a constant expression it encounters is a child of a PrintExpression2 expression. There might be other constant expressions in a tree that are not children of PrintExpression2 expressions and we don't want to modify those. States like withinPrintExpression are common in detecting relationships between nodes in a tree. The relationship between a PrintExpression2 expression and its child constant expression is a tree pattern we want to match in our example. We achieve that by using the withinPrintExpression member variable of PrintExpressionVisitor2.

In Listing 2-32, the VisitConstant method uses withinPrintExpression to check whether the currently visited node is a child expression of a PrintExpression2 expression. If so, it creates a new constant expression whose text is the text of the currently visited node prefixed with “Hello ”. The VisitExtension method in Listing 2-32 checks whether the currently visited node is an instance of PrintExpression2. If so, it sets withinPrintExpression to true and calls base.VisitExtension(node). The call to base.VisitExtension(node) will in turn call PrintExpression2's VisitChildren, which will call the VisitConstant method of PrintExpressionVisitor2. Finally, when all those calls return, we get back a new child constant expression whose text is prefixed with “Hello ”. The new child constant expression is assigned to the variable modifiedTextExpression. Because the child constant expression has changed, the parent needs to change too. That's why the code in line 16 creates a new instance of PrintExpression2 with modifiedTextExpression as its child expression.

Listing 2-32. PrintExpressionVisitor2.cs

1)   public class PrintExpressionVisitor2 : ExpressionVisitor
2)   {
3)       private bool withinPrintExpression = false;
4)
5)       protected override Expression VisitExtension(Expression node)
6)       {
7)           if (!(node is PrintExpression2))
8)               return base.VisitExtension(node);
9)
10)          withinPrintExpression = true;
11)
12)          ConstantExpression modifiedTextExpression =
13)  (ConstantExpression) base.VisitExtension(node);
14)
15)          //need to change the parent when the child is changed.
16)          PrintExpression2 newExpression = new
17)                PrintExpression2(modifiedTextExpression);
18)
19)          withinPrintExpression = false;
20)
21)          return newExpression;
22)      }
23)
24)      protected override Expression VisitConstant(ConstantExpression node)
25)      {
26)          if (!withinPrintExpression)
27)              return base.VisitConstant(node);
28)
29)          return Expression.Constant("Hello " + node.Value);
30)      }
31)  }

With this example, it should be clear why the DLR defines methods like VisitConstant, VisitBinary, and so on in the ExpressionVisitor class for the built-in expression classes. Had the DLR not done that, we would not be able to visit the constant expressions the way we do in this example without changing the ConstantExpression class. That would be a real mess.

Listing 2-33 shows the client code that uses PrintExpressionVisitor2 to visit and modify a PrintExpression2 instance. The code is largely the same as the client code we saw in the previous example (Listing 2-30). If you run the code, you will see “Hello Bob” printed to the screen.

Listing 2-33. Using ExpressionVisitor to Modify a PrintExpression2 Instance (ExpressionVisitorExamples.cs)

public static void RunExpressionVisitor2Example()
{
    PrintExpressionVisitor2 visitor = new PrintExpressionVisitor2();
    Expression bob = new PrintExpression2(Expression.Constant(text));
    Expression visitedBob = visitor.Visit(bob);
    Action visitedBobDelegate = Expression.Lambda<Action>(visitedBob).Compile();
    visitedBobDelegate();
}

Summary

DLR Expression is the foundation that higher-level DLR features are based on. This chapter begins with a comparison between DLR Expression and a typical programming language. In that part of the chapter, I highlighted some interesting and important characteristics of DLR Expression as a programming language. Then we went through a spate of examples showing how to use the various DLR expression classes. In the last part of this chapter, we looked at the important topic of modifying immutable expression trees using the Visitor design pattern.

This chapter is not intended to be a comprehensive reference for all expression classes that the DLR defines, and so it doesn't cover all DLR expression classes. The chapter covers expression classes such as DynamicExpression and GotoExpression because (a) they are important, (b) they are harder to understand than other expression classes, and (c) they show up more frequently in the rest of the book. For the DLR expression classes not covered in this chapter, you can refer to the MSDN documentation and use what you learn in this chapter to help you figure out how to use them.

After reading this chapter, you are in a good position to explore fascinating DLR features such as binders and dynamic objects in the next three chapters. As you will see in those chapters, we will be using DLR expressions a lot.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.12.136.119