Chapter 9. Lambda expressions and expression trees

This chapter covers

  • Lambda expression syntax

  • Conversions from lambdas to delegates

  • Expression tree framework classes

  • Conversions from lambdas to expression trees

  • Why expression trees matter

  • Changes to type inference

  • Changes to overload resolution

In chapter 5 we saw how C# 2 made delegates much easier to use due to implicit conversions of method groups, anonymous methods, and return type and parameter variance. This is enough to make event subscription significantly simpler and more readable, but delegates in C# 2 are still too bulky to be used all the time: a page of code full of anonymous methods is quite painful to read, and you certainly wouldn’t want to start putting multiple anonymous methods in a single statement on a regular basis.

One of the fundamental building blocks of LINQ is the ability to create pipelines of operations, along with any state required by those operations. These operations express all kinds of logic about data: how to filter it, how to order it, how to join different data sources together, and much more. When LINQ queries are executed “in process,” those operations are usually represented by delegates.

Statements containing several delegates are common when manipulating data with LINQ to Objects,[1] and lambda expressions in C# 3 make all of this possible without sacrificing readability. (While I’m mentioning readability, this chapter uses lambda expression and lambda interchangeably; as I need to refer to normal expressions quite a lot, it helps to use the short version in many cases.)

Note

It’s all Greek to me—The term lambda expression comes from lambda calculus, also written as λcalculus, where λis the Greek letter lambda. This is an area of math and computer science dealing with defining and applying functions. It’s been around for a long time and is the basis of functional languages such as ML. The good news is that you don’t need to know lambda calculus to use lambda expressions in C# 3.

Executing delegates is only part of the LINQ story. To use databases and other query engines efficiently, we need a different representation of the operations in the pipeline: a way of treating code as data that can be examined programmatically. The logic within the operations can then be transformed into a different form, such as a web service call, a SQL or LDAP query—whatever is appropriate.

Although it’s possible to build up representations of queries in a particular API, it’s usually tricky to read and sacrifices a lot of compiler support. This is where lambdas save the day again: not only can they be used to create delegate instances, but the C# compiler can also transform them into expression trees—data structures representing the logic of the lambda expressions so that other code can examine it. In short, lambda expressions are the idiomatic way of representing the operations in LINQ data pipelines—but we’ll be taking things one step at a time, examining them in a fairly isolated way before we embrace the whole of LINQ.

In this chapter we’ll look at both ways of using lambda expressions, although for the moment our coverage of expression trees will be relatively basic—we’re not going to actually create any SQL just yet. However, with the theory under your belt you should be relatively comfortable with lambda expressions and expression trees by the time we hit the really impressive stuff in chapter 12.

In the final part of this chapter, we’ll examine how type inference has changed for C# 3, mostly due to lambdas with implicit parameter types. This is a bit like learning how to tie shoelaces: far from exciting, but without this ability you’ll trip over yourself when you start running.

Let’s begin by seeing what lambda expressions look like. We’ll start with an anonymous method and gradually transform it into shorter and shorter forms.

Lambda expressions as delegates

In many ways, lambda expressions can be seen as an evolution of anonymous methods from C# 2. There’s almost nothing that an anonymous method can do that can’t be done using a lambda expression, and the lambda version is almost always more readable and compact. In particular, the behavior of captured variables is exactly the same in lambda expressions as in anonymous methods. In their most explicit form, not much difference exists between the two, but lambda expressions have a lot of shortcuts available to make them compact in common situations. Like anonymous methods, lambda expressions have special conversion rules: the expression doesn’t have a delegate type in itself, but it can be converted into a delegate instance in various ways, both implicitly and explicitly. The term anonymous function covers both anonymous methods and lambda expressions, and in many cases the same conversion rules apply to both of them.

We’re going to start with a very simple example, initially expressed as an anonymous method. We’ll create a delegate instance that takes a string parameter and returns an int (which is the length of the string). First we need to choose a delegate type to use; fortunately, .NET 3.5 comes with a whole family of generic delegate types to help us out.

Preliminaries: introducing the Func<...> delegate types

There are five generic Func delegate types in the System namespace of .NET 3.5. There’s nothing special about Func—it’s just handy to have some predefined generic types that are capable of handling many situations. Each delegate signature takes between zero and four parameters, the types of which are specified as type parameters. The last type parameter is used for the return type in each case. Here are the signatures of all the Func delegate types:

public delegate TResult Func<TResult>();

public delegate TResult Func<T,TResult>(T arg);

public delegate TResult Func<T1,T2,TResult>(T1 arg1, T2 arg2);

public delegate TResult Func<T1,T2,T3,TResult>
   (T1 arg1, T2 arg2, T3 arg3);

public delegate TResult Func<T1,T2,T3,T4,TResult>
   (T1 arg1, T2 arg2, T3 arg3, T4 arg4);

For example, Func<string,double,int> is equivalent to a delegate type of the form

delegate int SomeDelegate(string arg1, double arg2);

The Action<...> set of delegates provides the equivalent functionality when you want a void return type. The single-parameter form of Action existed in .NET 2.0, but the rest are new to .NET 3.5. For our example we need a type that takes a string parameter and returns an int, so we’ll use Func<string,int>.
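To see how the type arguments map onto a signature, here is a small sketch that creates a Func<string,double,int> and an Action<string>. It deliberately uses method group conversions from chapter 5 rather than lambdas, since we haven’t met lambdas yet. The method names LengthPlusWhole and Shout are hypothetical, chosen only for illustration.

```csharp
using System;

static class FuncActionDemo
{
    // Hypothetical method matching Func<string,double,int>:
    // parameters (string, double), return type int.
    static int LengthPlusWhole(string text, double extra)
    {
        return text.Length + (int) extra; // the cast truncates 2.7 to 2
    }

    // Hypothetical method matching Action<string>: void return type.
    static void Shout(string text)
    {
        Console.WriteLine(text.ToUpperInvariant());
    }

    // Method group conversions create the delegate instances.
    public static readonly Func<string, double, int> Combine = LengthPlusWhole;
    public static readonly Action<string> Print = Shout;

    static void Main()
    {
        Console.WriteLine(Combine("Hello", 2.7)); // prints 7
        Print("hello");                           // prints HELLO
    }
}
```

The last type argument of Func is always the return type; Action has no such slot because its return type is always void.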

First transformation to a lambda expression

Now that we know the delegate type, we can use an anonymous method to create our delegate instance. Listing 9.1 shows this, along with executing the delegate instance afterward so we can see it working.

Example 9.1. Using an anonymous method to create a delegate instance

Func<string,int> returnLength;
returnLength = delegate (string text) { return text.Length; };

Console.WriteLine (returnLength("Hello"));

Listing 9.1 prints “5,” just as we’d expect it to. I’ve separated out the declaration of returnLength from the assignment to it so we can keep the assignment on one line; it’s easier to keep track of that way. The anonymous method expression is everything to the right of the = in the second line, and that’s the part we’re going to convert into a lambda expression.

The most long-winded form of a lambda expression is this:

(explicitly-typed-parameter-list) => { statements }

The => part is new to C# 3 and tells the compiler that we’re using a lambda expression. Most of the time lambda expressions are used with a delegate type that has a nonvoid return type; the syntax is slightly less intuitive when there isn’t a result. This is another indication of the changes in idiom between C# 1 and C# 3. In C# 1, delegates were usually used for events and rarely returned anything. Although lambda expressions certainly can be used in this way (and we’ll show an example of this later), much of their elegance comes from the shortcuts that are available when they need to return a value.

With the explicit parameters and statements in braces, this version looks very similar to an anonymous method. Listing 9.2 is equivalent to listing 9.1 but uses a lambda expression.

Example 9.2. A long-winded first lambda expression, similar to an anonymous method

Func<string,int> returnLength;
returnLength = (string text) => { return text.Length; };

Console.WriteLine (returnLength("Hello"));

Again, the interesting part is the expression to the right of the assignment, which creates the delegate instance. When reading lambda expressions, it helps to think of the => part as “goes to,” so the example in listing 9.2 could be read as “text goes to text.Length.” As this is the only part of the listing that is interesting for a while, I’ll show it alone from now on. You can replace that expression in listing 9.2 with any of the lambda expressions listed in this section and the result will be the same.

The same rules that govern return statements in anonymous methods apply to lambdas too: you can’t try to return a value from a lambda expression with a void return type, whereas if there’s a nonvoid return type every code path has to return a compatible value.[2] It’s all pretty intuitive and rarely gets in the way.

So far, we haven’t saved much space or made things particularly easy to read. Let’s start applying the shortcuts.

Using a single expression as the body

The form we’ve seen so far uses a full block of code to return the value. This is very flexible—you can have multiple statements, perform loops, return from different places in the block, and so on, just as with anonymous methods. Most of the time, however, you can easily express the whole of the body in a single expression, the value of which is the result of the lambda. In these cases, you can specify just that expression, without any braces, return statements, or semicolons. The format then is

(explicitly-typed-parameter-list) => expression

In our case, this means that the lambda expression becomes

(string text) => text.Length

That’s starting to look simpler already. Now, what about that parameter type? The compiler already knows that instances of Func<string,int> take a single string parameter, so we should be able to just name that parameter...

Implicitly typed parameter lists

Most of the time, the compiler can guess the parameter types without you explicitly stating them. In these cases, you can write the lambda expression as

(implicitly-typed-parameter-list) => expression

An implicitly typed parameter list is just a comma-separated list of names, without the types. You can’t mix and match for different parameters—either the whole list is explicitly typed, or it’s all implicitly typed. Also, if any of the parameters are out or ref parameters, you are forced to use explicit typing. In our case, however, it’s fine—so our lambda expression is now just

(text) => text.Length

That’s getting pretty short now—there’s not a lot more we could get rid of. The parentheses seem a bit redundant, though.
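The rule that out and ref parameters force an explicitly typed parameter list is easiest to see with a delegate type of our own, because none of the Func or Action delegates have such parameters. The following is a minimal sketch; the delegate name TryParser is hypothetical.

```csharp
using System;

// Hypothetical delegate type with an out parameter.
delegate bool TryParser(string text, out int value);

static class OutParameterDemo
{
    // The parameter list has to be explicitly typed because of 'out':
    // in C# 3, (text, out value) => ... would not compile.
    public static readonly TryParser Parse =
        (string text, out int value) => int.TryParse(text, out value);

    static void Main()
    {
        int result;
        bool ok = Parse("42", out result);
        Console.WriteLine("{0} {1}", ok, result); // prints True 42
    }
}
```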

Shortcut for a single parameter

When the lambda expression only needs a single parameter, and that parameter can be implicitly typed, C# 3 allows us to omit the parentheses, so it now has this form:

parameter-name => expression

The final form of our lambda expression is therefore

text => text.Length

You may be wondering why there are so many special cases with lambda expressions—none of the rest of the language cares whether a method has one parameter or more, for instance. Well, what sounds like a very particular case actually turns out to be extremely common, and the improvement in readability from removing the parentheses from the parameter list can be significant when there are many lambdas in a short piece of code.

It’s worth noting that you can put parentheses around the whole lambda expression if you want to, just like other expressions. Sometimes this helps readability in the case where you’re assigning the lambda to a variable or property—otherwise, the equals symbols can get confusing. Listing 9.3 shows this in the context of our original code.

Example 9.3. A concise lambda expression, bracketed for clarity

Func<string,int> returnLength;
returnLength = (text => text.Length);

Console.WriteLine (returnLength("Hello"));

At first you may find listing 9.3 a bit confusing to read, in the same way that anonymous methods appear strange to many developers until they get used to them. When you are used to lambda expressions, however, you can appreciate how concise they are. It would be hard to imagine a shorter, clearer way of creating a delegate instance.[3] We could have changed the variable name text to something like x, and in full LINQ that’s often useful, but longer names give a bit more information to the reader.

The decision of whether to use the short form for the body of the lambda expression, specifying just an expression instead of a whole block, is completely independent from the decision about whether to use explicit or implicit parameters. We happen to have gone down one route of shortening the lambda, but we could have started off by making the parameters implicit.
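A quick way to convince yourself that the two decisions are independent is to write all four combinations side by side. This sketch (class and field names are illustrative only) converts each form to Func<string,int>; every one of them returns the length of its argument.

```csharp
using System;

static class LambdaFormsDemo
{
    // Same behavior, four spellings: explicit or implicit parameters,
    // combined with a block body or a single-expression body.
    public static readonly Func<string, int>[] Forms =
    {
        (string text) => { return text.Length; }, // explicit, block body
        (string text) => text.Length,             // explicit, expression body
        text => { return text.Length; },          // implicit, block body
        text => text.Length                       // implicit, expression body
    };

    static void Main()
    {
        foreach (Func<string, int> form in Forms)
        {
            Console.WriteLine(form("Hello")); // prints 5 each time
        }
    }
}
```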

Note

Higher-order functions—The body of a lambda expression can itself contain a lambda expression—and it tends to be as confusing as it sounds. Alternatively, the parameter to a lambda expression can be another delegate, which is just as bad. Both of these are examples of higher-order functions. If you enjoy feeling dazed and confused, have a look at some of the sample code in the downloadable source. Although I’m being flippant, this approach is common in functional programming and can be very useful. It just takes a certain degree of perseverance to get into the right mind-set.
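To make the idea slightly less abstract, here is a small sketch of both flavors: a lambda whose body is another lambda, and a lambda whose parameter is itself a delegate. The names AtLeast and Twice are hypothetical and not taken from the downloadable source.

```csharp
using System;

static class HigherOrderDemo
{
    // A lambda whose body is another lambda: given a minimum length,
    // it returns a predicate that tests strings against that minimum.
    public static readonly Func<int, Func<string, bool>> AtLeast =
        min => text => text.Length >= min;

    // A lambda whose first parameter is a delegate: it applies f twice.
    public static readonly Func<Func<int, int>, int, int> Twice =
        (f, x) => f(f(x));

    static void Main()
    {
        Func<string, bool> atLeastFive = AtLeast(5);
        Console.WriteLine(atLeastFive("Hello")); // True
        Console.WriteLine(atLeastFive("Hi"));    // False
        Console.WriteLine(Twice(n => n * 3, 2)); // 18
    }
}
```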


So far we’ve only dealt with a single lambda expression, just putting it into different forms. Let’s take a look at a few examples to make things more concrete before we examine the details.

Simple examples using List<T> and events

When we look at extension methods in chapter 10, we’ll use lambda expressions all the time. Until then, List<T> and event handlers give us the best examples. We’ll start off with lists, using automatically implemented properties, implicitly typed local variables, and collection initializers for the sake of brevity. We’ll then call methods that take delegate parameters—using lambda expressions to create the delegates, of course.

Filtering, sorting, and actions on lists

If you remember the FindAll method on List<T>, it takes a Predicate<T> and returns a new list with all the elements from the original list that match the predicate. The Sort method takes a Comparison<T> and sorts the list accordingly. Finally, the ForEach method takes an Action<T> to perform on each element. Listing 9.4 uses lambda expressions to provide the delegate instance to each of these methods. The sample data in question is just the name and year of release for various films. We print out the original list, then create and print out a filtered list of only old films, then sort and print out the original list, ordered by name. (It’s interesting to consider how much more code would have been required to do the same thing in C# 1, by the way.)

Example 9.4. Manipulating a list of films using lambda expressions
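The following is a minimal sketch of the steps just described, not a verbatim copy of the book’s listing. The Film class, its ToString format, and the four films are assumptions chosen for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical Film class with automatically implemented properties.
class Film
{
    public string Name { get; set; }
    public int Year { get; set; }

    public override string ToString()
    {
        return string.Format("Name={0}, Year={1}", Name, Year);
    }
}

static class FilmDemo
{
    // Sample data chosen for illustration, built with collection
    // and object initializers.
    public static List<Film> CreateFilms()
    {
        return new List<Film>
        {
            new Film { Name = "Jaws", Year = 1975 },
            new Film { Name = "Some Like It Hot", Year = 1959 },
            new Film { Name = "The Wizard of Oz", Year = 1939 },
            new Film { Name = "American Beauty", Year = 1999 }
        };
    }

    static void Main()
    {
        var films = CreateFilms();

        // One Action<Film> reused for every dump of a list.
        Action<Film> print = film => Console.WriteLine(film);

        films.ForEach(print);                               // original order

        films.FindAll(film => film.Year < 1960)             // Predicate<Film>
             .ForEach(print);                               // old films only

        films.Sort((f1, f2) => f1.Name.CompareTo(f2.Name)); // Comparison<Film>
        films.ForEach(print);                               // sorted by name
    }
}
```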


The first half of listing 9.4 involves just setting up the data. I would have used an anonymous type, but it’s relatively tricky to create a generic list from a collection of anonymous type instances. (You can do it by creating a generic method that takes an array and converts it to a list of the same type, then pass an implicitly typed array into that method. An extension method in .NET 3.5 called ToList provides this functionality too, but that would be cheating as we haven’t looked at extension methods yet!)
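The generic-method trick mentioned in the parentheses can be sketched like this. MakeList is a hypothetical helper name; the key point is that type inference supplies T from the implicitly typed array, so T can be an anonymous type we could never name ourselves.

```csharp
using System;
using System.Collections.Generic;

static class AnonymousListDemo
{
    // Hypothetical helper: T is inferred from the array argument,
    // so it can be an anonymous type.
    public static List<T> MakeList<T>(T[] items)
    {
        return new List<T>(items);
    }

    static void Main()
    {
        // An implicitly typed array of a single anonymous type.
        var films = MakeList(new[]
        {
            new { Name = "Jaws", Year = 1975 },
            new { Name = "The Wizard of Oz", Year = 1939 }
        });

        // films is a List of the anonymous type, so the lambdas are fully typed.
        films.FindAll(film => film.Year < 1960)
             .ForEach(film => Console.WriteLine(film.Name)); // The Wizard of Oz
    }
}
```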

Before we use the newly created list, we create a delegate instance, which we’ll use to print out the items of the list. We use this delegate instance three times, which is why I’ve created a variable to hold it rather than using a separate lambda expression each time. It just prints a single element, but by passing it into List<T>.ForEach we can simply dump the whole list to the console.

The first list we print out is just the original one without any modifications. We then find all the films in our list that were made before 1960 and print those out. This is done with another lambda expression, which is executed for each film in the list; it only has to determine whether or not a single film should be included in the filtered list. The source code uses the lambda expression as a method argument, but really the compiler has created a method like this:

private static bool SomeAutoGeneratedName(Film film)
{
   return film.Year < 1960;
}

The method call to FindAll is then effectively this:

films.FindAll(new Predicate<Film>(SomeAutoGeneratedName))

The lambda expression support here is just like the anonymous method support in C# 2; it’s all cleverness on the part of the compiler. (In fact, the Microsoft compiler is even smarter in this case—it realizes it can get away with reusing the delegate instance if the code is ever called again, so caches it.)

The sort is also performed using a lambda expression, which compares any two films using their names. I have to confess that explicitly calling CompareTo ourselves is a bit ugly. In the next chapter we’ll see how the OrderBy extension method allows us to express ordering in a neater way.

Let’s look at a different example, this time using lambda expressions with event handling.

Logging in an event handler

If you think back to chapter 5, in listing 5.9 we saw an easy way of using anonymous methods to log which events were occurring—but we were only able to get away with a compact syntax because we didn’t mind losing the parameter information. What if we wanted to log both the nature of the event and information about its sender and arguments? Lambda expressions enable this in a very neat way, as shown in listing 9.5.

Example 9.5. Logging events using lambda expressions

// Requires: using System; using System.ComponentModel; using System.Windows.Forms;
static void Log(string title, object sender, EventArgs e)
{
   Console.WriteLine("Event: {0}", title);
   Console.WriteLine("  Sender: {0}", sender);
   Console.WriteLine("  Arguments: {0}", e.GetType());
   foreach (PropertyDescriptor prop in
            TypeDescriptor.GetProperties(e))
   {
      string name = prop.DisplayName;
      object value = prop.GetValue(e);
      Console.WriteLine("    {0}={1}", name, value);
   }
}
...
Button button = new Button();
button.Text = "Click me";
button.Click     += (src, e) => { Log("Click", src, e); };
button.KeyPress   += (src, e) => { Log("KeyPress", src, e); };
button.MouseClick  += (src, e) => { Log("MouseClick", src, e); };

Form form = new Form();
form.AutoSize=true;
form.Controls.Add(button);
Application.Run(form);

Listing 9.5 uses lambda expressions to pass the event name and parameters to the Log method, which logs details of the event. We don’t log the details of the event’s source (the sender), beyond whatever its ToString override returns, because there’s an overwhelming amount of information associated with controls. However, we use reflection over property descriptors to show the details of the EventArgs instance passed to us. Here’s some sample output when you click the button:

Event: Click
  Sender: System.Windows.Forms.Button, Text: Click me
  Arguments: System.Windows.Forms.MouseEventArgs
    Button=Left
    Clicks=1
    X=53
    Y=17
    Delta=0
    Location={X=53,Y=17}
Event: MouseClick
  Sender: System.Windows.Forms.Button, Text: Click me
  Arguments: System.Windows.Forms.MouseEventArgs
    Button=Left
    Clicks=1
    X=53
    Y=17
    Delta=0
    Location={X=53,Y=17}

All of this is possible without lambda expressions, of course—but it’s a lot neater than it would have been otherwise. Now that we’ve seen lambdas being converted into delegate instances, it’s time to look at expression trees, which represent lambda expressions as data instead of code.

Expression trees

The idea of “code as data” is an old one, but it hasn’t been used much in popular programming languages. You could argue that all .NET programs use the concept, because the IL code is treated as data by the JIT, which then converts it into native code to run on your CPU. That’s quite deeply hidden, though, and while libraries exist to manipulate IL programmatically, they’re not widely used.

Expression trees in .NET 3.5 provide an abstract way of representing some code as a tree of objects. It’s like CodeDOM but operating at a slightly higher level, and only for expressions. The primary use of expression trees is in LINQ, and later in this section we’ll see how crucial expression trees are to the whole LINQ story.

C# 3 provides built-in support for converting lambda expressions to expression trees, but before we cover that let’s explore how they fit into the .NET Framework without using any compiler tricks.

Building expression trees programmatically

Expression trees aren’t as mystical as they sound, although some of the uses they’re put to look like magic. As the name suggests, they’re trees of objects, where each node in the tree is an expression in itself. Different types of expressions represent the different operations that can be performed in code: binary operations, such as addition; unary operations, such as taking the length of an array; method calls; constructor calls; and so forth.

The System.Linq.Expressions namespace contains the various classes that represent expressions. All of them derive from the Expression class, which is abstract and mostly consists of static factory methods to create instances of other expression classes. It exposes two properties, however:

  • The Type property represents the .NET type of the evaluated expression—you can think of it like a return type. The type of an expression that fetches the Length property of a string would be int, for example.

  • The NodeType property returns the kind of expression represented, as a member of the ExpressionType enumeration, with values such as LessThan, Multiply, and Invoke. To use the same example, in myString.Length the property access part would have a node type of MemberAccess.

There are many classes derived from Expression, and some of them can have many different node types: BinaryExpression, for instance, represents any operation with two operands: arithmetic, logic, comparisons, array indexing, and the like. This is where the NodeType property is important, as it distinguishes between different kinds of expressions that are represented by the same class.
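Here is a small sketch of the Type and NodeType properties in action, using only the Expression factory methods. It builds the text.Length property access mentioned above, plus two nodes that are both BinaryExpressions and differ only in their NodeType. The parameter name text is an arbitrary choice.

```csharp
using System;
using System.Linq.Expressions;

static class NodeTypeDemo
{
    public static readonly ParameterExpression Text =
        Expression.Parameter(typeof(string), "text");

    // text.Length: node type MemberAccess, evaluated type int.
    public static readonly Expression Length = Expression.Property(Text, "Length");

    // Both are BinaryExpressions; only NodeType tells them apart.
    public static readonly Expression Sum =
        Expression.Add(Expression.Constant(2), Expression.Constant(3));
    public static readonly Expression Less =
        Expression.LessThan(Expression.Constant(2), Expression.Constant(3));

    static void Main()
    {
        Console.WriteLine("{0}: {1}", Length.NodeType, Length.Type); // MemberAccess: System.Int32
        Console.WriteLine("{0}: {1}", Sum.NodeType, Sum.Type);       // Add: System.Int32
        Console.WriteLine("{0}: {1}", Less.NodeType, Less.Type);     // LessThan: System.Boolean
        Console.WriteLine(Sum is BinaryExpression && Less is BinaryExpression); // True
    }
}
```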

I don’t intend to cover every expression class or node type—there are far too many, and MSDN does a perfectly good job of explaining them. Instead, we’ll try to get a general feel for what you can do with expression trees.

Let’s start off by creating one of the simplest possible expression trees, adding two constant integers together. Listing 9.6 creates an expression tree to represent 2+3.

Example 9.6. A very simple expression tree, adding 2 and 3

Expression firstArg = Expression.Constant(2);
Expression secondArg = Expression.Constant(3);
Expression add = Expression.Add(firstArg, secondArg);

Console.WriteLine(add);

Running listing 9.6 will produce the output “(2 + 3),” which demonstrates that the various expression classes override ToString to produce human-readable output. Figure 9.1 depicts the tree generated by the code.


Figure 9.1. Graphical representation of the expression tree created by listing 9.6

It’s worth noting that the “leaf” expressions are created first in the code: you build expressions from the bottom up. This is enforced by the fact that expressions are immutable—once you’ve created an expression, it will never change, so you can cache and reuse expressions at will.
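Because nodes never change after construction, one node can safely appear in several trees at once. This sketch (names are illustrative) shares a single constant node between two trees and then compiles one of them.

```csharp
using System;
using System.Linq.Expressions;

static class SharedNodeDemo
{
    // One immutable constant node...
    public static readonly Expression Two = Expression.Constant(2);

    // ...reused in two different trees, and even twice in the same tree.
    public static readonly Expression Sum = Expression.Add(Two, Two);
    public static readonly Expression Product =
        Expression.Multiply(Two, Expression.Constant(5));

    static void Main()
    {
        Console.WriteLine(Sum);     // (2 + 2)
        Console.WriteLine(Product); // (2 * 5)
        Console.WriteLine(Expression.Lambda<Func<int>>(Product).Compile()()); // 10
    }
}
```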

Now that we’ve built up an expression tree, let’s try to actually execute it.

Compiling expression trees into delegates

One of the types derived from Expression is LambdaExpression. The generic class Expression<TDelegate> then derives from LambdaExpression. It’s all slightly confusing—figure 9.2 shows the type hierarchy to make things clearer.


Figure 9.2. Type hierarchy from Expression<TDelegate> up to Expression

The difference between Expression and Expression<TDelegate> is that the generic class is statically typed to indicate what kind of expression it is, in terms of return type and parameters. Fairly obviously, this is expressed by the TDelegate type parameter, which must be a delegate type. For instance, our simple addition expression is one that takes no parameters and returns an integer—this is matched by the signature of Func<int>, so we could use an Expression<Func<int>> to represent the expression in a statically typed manner. We do this using the Expression.Lambda method. This has a number of overloads—our examples use the generic method, which uses a type parameter to indicate the type of delegate we want to represent. See MSDN for alternatives.

So, what’s the point of doing this? Well, LambdaExpression has a Compile method that creates a delegate of the appropriate type. This delegate can now be executed in the normal manner, as if it had been created using a normal method or any other means. Listing 9.7 shows this in action, with the same expression as before.

Example 9.7. Compiling and executing an expression tree

Expression firstArg = Expression.Constant(2);
Expression secondArg = Expression.Constant(3);
Expression add = Expression.Add(firstArg, secondArg);

Func<int> compiled = Expression.Lambda<Func<int>>(add).Compile();
Console.WriteLine(compiled());

Arguably listing 9.7 is one of the most convoluted ways of printing out “5” that you could ask for. At the same time, it’s also rather impressive. We’re programmatically creating some logical blocks and representing them as normal objects, and then asking the framework to compile the whole thing into “real” code that can be executed. You may never need to actually use expression trees this way, or even build them up programmatically at all, but it’s useful background information that will help you understand how LINQ works.

As I said at the beginning of this section, expression trees are not too far removed from CodeDOM—Snippy compiles and executes C# code that has been entered as plain text, for instance. However, two significant differences exist between CodeDOM and expression trees.

First, expression trees are only able to represent single expressions. They’re not designed for whole classes, methods, or even just statements. Second, C# supports expression trees directly in the language, through lambda expressions. Let’s take a look at that now.

Converting C# lambda expressions to expression trees

As we’ve already seen, lambda expressions can be converted to appropriate delegate instances, either implicitly or explicitly. That’s not the only conversion that is available, however. You can also ask the compiler to build an expression tree from your lambda expression, creating an instance of Expression<TDelegate> at execution time. For example, listing 9.8 shows a much shorter way of creating the “return 5” expression, compiling it and then invoking the resulting delegate.

Example 9.8. Using lambda expressions to create expression trees

Expression<Func<int>> return5 = () => 5;
Func<int> compiled = return5.Compile();
Console.WriteLine(compiled());

In the first line of listing 9.8, the () => 5 part is the lambda expression. In this case, putting an extra pair of parentheses around it would make it look worse rather than better. Notice that we don’t need any casts because the compiler can verify everything as it goes. We could have written 2+3 instead of 5, but the compiler would have optimized the addition away for us. The important point to take away is that the lambda expression has been converted into an expression tree.
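You can check the claim about the addition being optimized away by looking at the Body of the generated tree. In this sketch, 2 + 3 is a compile-time constant, so the body should be a single constant node, whereas an addition involving a parameter has to stay as an Add node. The class and field names are illustrative.

```csharp
using System;
using System.Linq.Expressions;

static class ConstantFoldingDemo
{
    // 2 + 3 is a constant expression, so the tree holds just the value 5.
    public static readonly Expression<Func<int>> Folded = () => 2 + 3;

    // With a parameter involved, the addition survives as an Add node.
    public static readonly Expression<Func<int, int>> NotFolded = x => x + 3;

    static void Main()
    {
        Console.WriteLine(Folded.Body.NodeType);    // Constant
        Console.WriteLine(NotFolded.Body.NodeType); // Add
        Console.WriteLine(Folded.Compile()());      // 5
    }
}
```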

Note

There are limitations—Not all lambda expressions can be converted to expression trees. You can’t convert a lambda with a block of statements (even just one return statement) into an expression tree—it has to be in the form that just evaluates a single expression. That expression can’t contain assignments, as they can’t be represented in expression trees. Although these are the most common restrictions, they’re not the only ones—the full list is not worth describing here, as this issue comes up so rarely. If there’s a problem with an attempted conversion, you’ll find out at compile time.

Let’s take a look at a more complicated example just to see how things work, particularly with respect to parameters. This time we’ll write a predicate that takes two strings and checks to see if the first one begins with the second. The code is simple when written as a lambda expression, as shown in listing 9.9.

Example 9.9. Demonstration of a more complicated expression tree

Expression<Func<string,string,bool>> expression =
   ( (x,y) => x.StartsWith(y) );

var compiled = expression.Compile();

Console.WriteLine(compiled("First", "Second"));
Console.WriteLine(compiled("First", "Fir"));

The expression tree itself is more complicated, especially by the time we’ve converted it into an instance of LambdaExpression. Listing 9.10 shows how it’s built in code.

Example 9.10. Building a method call expression tree in code
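The following sketch builds the same StartsWith predicate by hand with the Expression factory methods. It follows the steps described below, but the variable names are my own choices and it should not be read as the book’s exact listing.

```csharp
using System;
using System.Linq.Expressions;
using System.Reflection;

static class StartsWithTreeDemo
{
    public static Expression<Func<string, string, bool>> Build()
    {
        // Everything the method call needs: method, target, arguments.
        MethodInfo method = typeof(string).GetMethod("StartsWith", new[] { typeof(string) });
        ParameterExpression target = Expression.Parameter(typeof(string), "x");
        ParameterExpression methodArg = Expression.Parameter(typeof(string), "y");
        Expression[] methodArgs = new Expression[] { methodArg };

        // The body of the lambda: x.StartsWith(y)
        Expression call = Expression.Call(target, method, methodArgs);

        // Bind the same ParameterExpression objects as the lambda's parameters;
        // their order here is the order of the delegate's arguments.
        ParameterExpression[] lambdaParameters = new ParameterExpression[] { target, methodArg };
        return Expression.Lambda<Func<string, string, bool>>(call, lambdaParameters);
    }

    static void Main()
    {
        Func<string, string, bool> compiled = Build().Compile();
        Console.WriteLine(compiled("First", "Second")); // False
        Console.WriteLine(compiled("First", "Fir"));    // True
    }
}
```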


As you can see, listing 9.10 is considerably more involved than the version with the C# lambda expression. However, it does make it more obvious exactly what is involved in the tree and how parameters are bound. We start off by working out everything we need to know about the method call that forms the body of the final expression: the target of the method (in other words, the string we’re calling StartsWith on); the method itself (as a MethodInfo); and the list of arguments (in this case, just the one). It so happens that our method target and argument will both be parameters passed into the expression, but they could be other types of expressions, such as constants, the results of other method calls, property evaluations, and so forth.

After building the method call as an expression, we then need to convert it into a lambda expression, binding the parameters as we go. We reuse the same ParameterExpression values we created as information for the method call: the order in which they’re specified when creating the lambda expression is the order in which they’ll be picked up when we eventually call the delegate.

Figure 9.3 shows the same final expression tree graphically. To be picky, even though it’s still called an expression tree, the fact that we reuse the parameter expressions (and we have to—creating a new one with the same name and attempting to bind parameters that way causes an exception at execution time) means that it’s not a tree anymore.
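The identity requirement is easy to demonstrate. In this sketch, the first method shares one ParameterExpression between body and lambda and works; the second declares a look-alike parameter with the same name and type, which fails when compiled. The exception type in the comment reflects the runtimes I’m aware of, so the sketch catches Exception rather than rely on it.

```csharp
using System;
using System.Linq.Expressions;

static class ParameterIdentityDemo
{
    // Works: the body and the lambda share one ParameterExpression object.
    public static Func<string, int> CompileShared()
    {
        ParameterExpression x = Expression.Parameter(typeof(string), "x");
        Expression body = Expression.Property(x, "Length");
        return Expression.Lambda<Func<string, int>>(body, x).Compile();
    }

    // Fails: the lambda declares a different object that merely has
    // the same name and type as the parameter used in the body.
    public static bool CompileWithLookAlikeSucceeds()
    {
        ParameterExpression usedInBody = Expression.Parameter(typeof(string), "x");
        ParameterExpression declared = Expression.Parameter(typeof(string), "x");
        Expression body = Expression.Property(usedInBody, "Length");
        var lambda = Expression.Lambda<Func<string, int>>(body, declared);
        try
        {
            lambda.Compile();
            return true;
        }
        catch (Exception e) // typically an InvalidOperationException
        {
            Console.WriteLine(e.GetType().Name + ": " + e.Message);
            return false;
        }
    }

    static void Main()
    {
        Console.WriteLine(CompileShared()("Hello"));       // 5
        Console.WriteLine(CompileWithLookAlikeSucceeds()); // False, after the message
    }
}
```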


Figure 9.3. Graphical representation of expression tree that calls a method and uses parameters from a lambda expression

Glancing at the complexity of figure 9.3 and listing 9.10 without trying to look at the details, you’d be forgiven for thinking that we were doing something really complicated when in fact it’s just a single method call. Imagine what the expression tree for a genuinely complex expression would look like—and then be grateful that C# 3 can create expression trees from lambda expressions!

One small point to note is that although the C# 3 compiler builds expression trees in the compiled code using code similar to listing 9.10, it has one shortcut up its sleeve: it doesn’t need to use normal reflection to get the MethodInfo for string.StartsWith. Instead, it uses the method equivalent of the typeof operator. This is only available in IL, not in C# itself, and the same operator is also used to create delegate instances from method groups.
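You can observe the result of the compiler’s work by letting it convert a lambda expression and then inspecting the tree. This is a sketch with invented names. It can’t show the IL-level method lookup itself, but it does show that the body is a single method call node for string.StartsWith, and that the call’s target and argument are the very same objects as the lambda’s parameters.

```csharp
using System;
using System.Linq.Expressions;

public static class CompilerBuiltTree
{
    // Let the C# compiler build the tree from a lambda expression
    public static readonly Expression<Func<string, string, bool>> Tree =
        (x, y) => x.StartsWith(y);

    // The body is a single method call node, as in the hand-built version
    public static MethodCallExpression Body()
    {
        return (MethodCallExpression) Tree.Body;
    }

    static void Main()
    {
        MethodCallExpression call = Body();
        // The MethodInfo embedded in the tree by the compiler
        Console.WriteLine(call.Method);
        // Target and argument are the same objects as the lambda's parameters
        Console.WriteLine(ReferenceEquals(call.Object, Tree.Parameters[0]));
        Console.WriteLine(ReferenceEquals(call.Arguments[0], Tree.Parameters[1]));
    }
}
```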

Now that we’ve seen how expression trees and lambda expressions are linked, let’s take a brief look at why they’re so useful.

Expression trees at the heart of LINQ

Without lambda expressions, expression trees would have relatively little value. They’d be an alternative to CodeDOM in cases where you only wanted to model a single expression instead of whole statements, methods, types and so forth—but the benefit would still be limited.

The reverse is also true to a limited extent: without expression trees, lambda expressions would certainly be less useful. Having a more compact way of creating delegate instances would still be welcome, and the shift toward a more functional mode of development would still be viable. Lambda expressions are particularly effective when combined with extension methods, as we’ll see in the next chapter. However, with expression trees in the picture as well, things get a lot more interesting.

So what do we get by combining lambda expressions, expression trees, and extension methods? The answer is the language side of LINQ, pretty much. The extra syntax we’ll see in chapter 11 is icing on the cake, but the story would still have been quite compelling with just those three ingredients. For a long time we’ve been able either to have nice compile-time checking or to tell another platform to run some code, usually expressed as text (SQL queries being the most obvious example). We haven’t been able to do both at the same time.

By combining lambda expressions that provide compile-time checks and expression trees that abstract the execution model away from the desired logic, we can have the best of both worlds—within reason. At the heart of “out of process” LINQ providers is the idea that we can produce an expression tree from a familiar source language (C# in our case) and use the result as an intermediate format, which can then be converted into the native language of the target platform: SQL, for example. In some cases there may not be a simple native language so much as a native API—making different web service calls depending on what the expression represents, perhaps. Figure 9.4 shows the different paths of LINQ to Objects and LINQ to SQL.


Figure 9.4. Both LINQ to Objects and LINQ to SQL start off with C# code, and end with query results. The ability to execute the code remotely comes through expression trees.

In some cases the conversion may try to perform all the logic on the target platform, whereas other cases may use the compilation facilities of expression trees to execute some of the expression locally and some elsewhere. We’ll look at some of the details of this conversion step in chapter 12, but you should bear this end goal in mind as we explore extension methods and LINQ syntax in chapters 10 and 11.
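To see the two paths side by side, here is a sketch (names invented) in which the same lambda text is converted once to a delegate and once to an expression tree. The tree can be inspected as data, or compiled and run in process, which is the local-execution option mentioned above.

```csharp
using System;
using System.Linq.Expressions;

public static class TwoPaths
{
    // Same lambda text, two different conversions
    public static readonly Func<int, bool> AsDelegate = x => x > 5;
    public static readonly Expression<Func<int, bool>> AsTree = x => x > 5;

    static void Main()
    {
        // The delegate is just executable code
        Console.WriteLine(AsDelegate(10));

        // The tree is data: a provider could walk it and translate it...
        Console.WriteLine(AsTree.Body.NodeType);

        // ...or compile it and run it in process
        Func<int, bool> compiled = AsTree.Compile();
        Console.WriteLine(compiled(3));
    }
}
```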

Note

Not all checking can be done by the compiler—When expression trees are examined by some sort of converter, there are often cases that have to be rejected. For instance, although it’s possible to convert a call to string.StartsWith into a similar SQL expression, a call to string.IsInterned doesn’t make sense in a database environment. Expression trees allow a large amount of compile-time safety, but the compiler can only check that the lambda expression can be converted into a valid expression tree; it can’t make sure that the expression tree will be suitable for its eventual use.
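Here’s a deliberately tiny, hypothetical “provider” that turns one shape of predicate into a SQL-like fragment and rejects everything else. The Person class, the ToyWhereTranslator name and the output format are all invented; real providers such as LINQ to SQL are vastly more elaborate. The point is the division of labor: both lambdas in Main compile happily, and only the translator discovers that the string.IsInterned one can’t be used.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical entity type, invented for this sketch
public class Person
{
    public string Name { get; set; }
}

public static class ToyWhereTranslator
{
    // Handles exactly one shape: p => p.SomeStringProperty.StartsWith("literal")
    public static string Translate(Expression<Func<Person, bool>> predicate)
    {
        MethodCallExpression call = predicate.Body as MethodCallExpression;
        if (call != null
            && call.Method.DeclaringType == typeof(string)
            && call.Method.Name == "StartsWith"
            && call.Arguments.Count == 1)
        {
            MemberExpression column = call.Object as MemberExpression;
            ConstantExpression prefix = call.Arguments[0] as ConstantExpression;
            if (column != null && prefix != null)
            {
                // Naive: no escaping of quotes or LIKE wildcards
                return column.Member.Name + " LIKE '" + prefix.Value + "%'";
            }
        }
        // The compiler accepted the lambda, but this "provider" can't use it
        throw new NotSupportedException("Cannot translate " + predicate.Body);
    }

    public static bool CanTranslate(Expression<Func<Person, bool>> predicate)
    {
        try
        {
            Translate(predicate);
            return true;
        }
        catch (NotSupportedException)
        {
            return false;
        }
    }

    static void Main()
    {
        Console.WriteLine(Translate(p => p.Name.StartsWith("J")));
        // Valid C# and a valid expression tree, but meaningless in a database
        Console.WriteLine(CanTranslate(p => string.IsInterned(p.Name) != null));
    }
}
```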

That finishes our direct coverage of lambda expressions and expression trees. Before we go any further, however, there are a few changes to C# that need some explanation, regarding type inference and how the compiler selects between overloaded methods.

Changes to type inference and overload resolution

The steps involved in type inference and overload resolution have been altered in C# 3 to accommodate lambda expressions and indeed to make anonymous methods more useful. This doesn’t count as a new feature of C# as such, but it can be important to understand what the compiler is going to do. If you find details like this tedious and irrelevant, feel free to skip to the chapter summary—but remember that this section exists, so you can read it if you run across a compilation error related to this topic and can’t understand why your code doesn’t work. (Alternatively, you might want to come back to this section if you find your code does compile, but you don’t think it should!)

Even within this section I’m not going to go into absolutely every nook and cranny—that’s what the language specification is for. Instead, I’ll give an overview of the new behavior, providing examples of common cases. The primary reason for changing the specification is to allow lambda expressions to work in a concise fashion, which is why I’ve included the topic in this particular chapter. Let’s look a little deeper at what problems we’d have run into if the C# team had stuck with the old rules.

Reasons for change: streamlining generic method calls

Type inference occurs in a few situations. We’ve already seen it apply to implicitly typed arrays, and it’s also required when you try to implicitly convert a method group to a delegate type as the parameter to a method. With overloading of the method being called, overloading of methods within the method group, and the possibility of generic methods getting involved, the set of potential conversions can become quite overwhelming.

By far the most common situation for type inference is when you’re calling a generic method without specifying the type arguments for that method. This happens all the time in LINQ—the way that query expressions work depends on this heavily. It’s all handled so smoothly that it’s easy to ignore how much the compiler has to work out on your behalf, all for the sake of making your code clearer and more concise.

The rules were reasonably straightforward in C# 2, although method groups and anonymous methods weren’t always handled as well as we might have liked. The type inference process didn’t deduce any information from them, leading to situations where the desired behavior was obvious to developers but not to the compiler. Life is more complicated in C# 3 due to lambda expressions—if you call a generic method using a lambda expression with an implicitly typed parameter list, the compiler needs to work out what types you’re talking about, even before it can check the lambda expression’s body.

This is much easier to see in code than in words. Listing 9.11 gives an example of the kind of issue we want to solve: calling a generic method using a lambda expression.

Example 9.11. Example of code requiring the new type inference rules

static void PrintConvertedValue<TInput,TOutput>
   (TInput input, Converter<TInput,TOutput> converter)
{
   Console.WriteLine(converter(input));
}
...
PrintConvertedValue("I'm a string", x => x.Length);

The method PrintConvertedValue in listing 9.11 simply takes an input value and a delegate that can convert that value into a different type. It’s completely generic—it makes no assumptions about the type parameters TInput and TOutput. Now, look at the types of the arguments we’re calling it with in the bottom line of the listing. The first argument is clearly a string, but what about the second? It’s a lambda expression, so we need to convert it into a Converter<TInput,TOutput>—and that means we need to know the types of TInput and TOutput.

If you remember, the type inference rules of C# 2 were applied to each argument individually, with no way of using the types inferred from one argument when examining another. In our case, these rules would have stopped us from finding the types of TInput and TOutput for the second argument, so the code in listing 9.11 would have failed to compile.

Our eventual goal is to understand what makes listing 9.11 compile in C# 3 (and it does, I promise you), but we’ll start with something a bit more modest.

Inferred return types of anonymous functions

Listing 9.12 shows an example of some code that looks like it should compile but doesn’t under the type inference rules of C# 2.

Example 9.12. Attempting to infer the return type of an anonymous method

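Judging from the error message and the workarounds below (which mention Snippet.WriteResult<T>, Snippet.MyFunc<T> and delegate { return 5; }), listing 9.12 has roughly the shape sketched here. Treat it as a reconstruction under assumptions: the body of WriteResult is a guess, and returning typeof(T) is an addition so the inferred type argument can be checked. Under C# 3 and later this compiles and infers T as int.

```csharp
using System;

public static class Snippet
{
    // Assumed from the error message: a parameterless generic delegate returning T
    public delegate T MyFunc<T>();

    // Assumed body; returning typeof(T) is an addition so the inferred T is visible
    public static Type WriteResult<T>(MyFunc<T> function)
    {
        Console.WriteLine(function());
        return typeof(T);
    }

    static void Main()
    {
        // Error CS0411 under C# 2; C# 3 infers T = int from "return 5;"
        WriteResult(delegate { return 5; });
    }
}
```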

Compiling listing 9.12 under C# 2 gives the following error:

error CS0411: The type arguments for method
'Snippet.WriteResult<T>(Snippet.MyFunc<T>)' cannot be inferred from the
usage. Try specifying the type arguments explicitly.

We can fix the error in two ways—either specify the type argument explicitly (as suggested by the compiler) or cast the anonymous method to a concrete delegate type:

WriteResult<int>(delegate { return 5; });

WriteResult((MyFunc<int>)delegate { return 5; });

Both of these work, but they’re slightly ugly. We’d like the compiler to perform the same kind of type inference as for nondelegate types, using the type of the returned expression to infer the type of T. That’s exactly what C# 3 does for both anonymous methods and lambda expressions—but there’s one catch. Although in many cases only one return statement is involved, there can sometimes be more. Listing 9.13 is a slightly modified version of listing 9.12 where the anonymous method sometimes returns an integer and sometimes returns an object.

Example 9.13. Code returning an integer or an object depending on the time of day

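Listing 9.13 follows the same pattern. The sketch below assumes the same delegate and method shapes as listing 9.12 and, to make the result testable, replaces the time-of-day check with a bool parameter named morning (an invented name); Main passes in DateTime.Now.Hour < 12 so it behaves as described.

```csharp
using System;

public static class Snippet
{
    // Assumed delegate and method shapes, as suggested by the surrounding text
    public delegate T MyFunc<T>();

    public static Type WriteResult<T>(MyFunc<T> function)
    {
        Console.WriteLine(function());
        return typeof(T);
    }

    // Return statements yield {int, object}; int converts to object, so T = object
    public static Type InferFor(bool morning)
    {
        return WriteResult(delegate
        {
            if (morning)
            {
                return 10;
            }
            else
            {
                return new object();
            }
        });
    }

    static void Main()
    {
        Console.WriteLine(InferFor(DateTime.Now.Hour < 12));
    }
}
```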

The compiler uses the same logic to determine the return type in this situation as it does for implicitly typed arrays, as described in section 8.4. It forms a set of all the types from the return statements in the body of the anonymous function[4] (in this case int and object) and checks whether exactly one of those types is a type that all the others can be implicitly converted to. There’s an implicit conversion from int to object (via boxing) but not from object to int, so the inference succeeds with object as the inferred return type. If no type matches that criterion, or more than one does, no return type can be inferred and you’ll get a compilation error.
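Because the rule is shared with implicitly typed arrays, an array initializer is a quick way to experiment with it. This hypothetical helper class simply reports the array type the compiler picked for two mixed initializers.

```csharp
using System;

public static class BestCommonType
{
    public static Type MixedIntAndObject()
    {
        var values = new[] { 1, new object() };  // candidates: int, object
        return values.GetType();
    }

    public static Type MixedIntAndLong()
    {
        var values = new[] { 1, 2L };            // candidates: int, long
        return values.GetType();
    }

    static void Main()
    {
        Console.WriteLine(MixedIntAndObject());
        Console.WriteLine(MixedIntAndLong());
        // new[] { 1, "text" } would not compile: neither int nor string
        // is implicitly convertible from the other
    }
}
```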

So, now we know how to work out the return type of an anonymous function, but what about lambda expressions whose parameters are implicitly typed?

Two-phase type inference

The details of type inference in C# 3 are much more complicated than they are for C# 2. It’s rare that you’ll need to reference the specification for the exact behavior, but if you do I recommend you write down all the type parameters, arguments, and so forth on a piece of paper, and then follow the specification step by step, carefully noting down every action it requires. You’ll end up with a sheet full of fixed and unfixed type variables, with a different set of bounds for each of them. A fixed type variable is one that the compiler has decided the value of; otherwise it is unfixed. A bound is a piece of information about a type variable. I suspect you’ll get a headache, too.

I’m going to present a more “fuzzy” way of thinking about type inference, one that is likely to serve just as well as knowing the specification and will be a lot easier to understand. The fact is, if the compiler doesn’t perform type inference in exactly the way you want it to, it will almost certainly result in a compilation error rather than code that builds but doesn’t behave properly. If your code doesn’t build, try giving the compiler more information; it’s as simple as that. However, here’s roughly what’s changed for C# 3.

The first big difference is that the method arguments work as a team in C# 3. In C# 2 every argument was used to try to pin down some type parameters exactly, and the compiler would complain if any two arguments came up with different results for a particular type parameter, even if they were compatible. In C# 3, arguments can contribute pieces of information—types that must be implicitly convertible to the final fixed value of a particular type parameter. The logic used to come up with that fixed value is the same as for inferred return types and implicitly typed arrays. Listing 9.14 shows an example of this—without using any lambda expressions or even anonymous methods.

Example 9.14. Flexible type inference combining information from multiple arguments

static void PrintType<T> (T first, T second)
{
   Console.WriteLine(typeof(T));
}
...
PrintType(1, new object());

Although the code in listing 9.14 is syntactically valid in C# 2, it wouldn’t build: type inference would fail, because the first argument would decide that T must be int and the second argument would decide that T must be object. In C# 3 the compiler determines that T should be object in exactly the same way that it did for the inferred return type in listing 9.13. In fact, the inferred return type rules are effectively one example of the more general process in C# 3.
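Here’s listing 9.14 reworked (a sketch; the class and method names are invented) so that the inferred type argument is returned rather than printed. The second call shows the same rule with two value types: int converts implicitly to long but not the other way round, so T becomes long.

```csharp
using System;

public static class TeamworkInference
{
    // Same shape as PrintType in listing 9.14, but returns the inferred T
    public static Type InferredType<T>(T first, T second)
    {
        return typeof(T);
    }

    static void Main()
    {
        // Bounds {int, object}: object is the only type both convert to
        Console.WriteLine(InferredType(1, new object()));
        // Bounds {int, long}: int converts implicitly to long, not vice versa
        Console.WriteLine(InferredType(1, 2L));
        // InferredType(1, "text") would still fail: no suitable common type
    }
}
```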

The second change is that type inference is now performed in two phases. The first phase deals with “normal” arguments where the types involved are known to begin with. This includes explicitly typed anonymous functions.

The second phase then kicks in, where implicitly typed lambda expressions and method groups have their types inferred. The idea is to see whether any of the information we’ve pieced together so far is enough to work out the parameter types of the lambda expression (or method group). If it is, the compiler is then able to examine the body of the lambda expression and work out the inferred return type—which is often another of the type parameters we’re looking for. If the second phase gives some more information, we go through it again, repeating until either we run out of clues or we’ve worked out all the type parameters involved.

Let’s look at two examples to show how it works. First we’ll take the code we started the section with—listing 9.11.

static void PrintConvertedValue<TInput,TOutput>
   (TInput input, Converter<TInput,TOutput> converter)
{
   Console.WriteLine(converter(input));
}
...
PrintConvertedValue("I'm a string", x => x.Length);

The type parameters we need to work out in listing 9.11 are TInput and TOutput. The steps performed are as follows:

  1. Phase 1 begins.

  2. The first parameter is of type TInput, and the first argument is of type string. We infer that there must be an implicit conversion from string to TInput.

  3. The second parameter is of type Converter<TInput,TOutput>, and the second argument is an implicitly typed lambda expression. No inference is performed—we don’t have enough information.

  4. Phase 2 begins.

  5. TInput doesn’t depend on any unfixed type parameters, so it’s fixed to string.

  6. The second argument now has a fixed input type, but an unfixed output type. We can consider it to be (string x) => x.Length and infer the return type as int. Therefore an implicit conversion must take place from int to TOutput.

  7. Phase 2 repeats.

  8. TOutput doesn’t depend on anything unfixed, so it’s fixed to int.

  9. There are now no unfixed type parameters, so inference succeeds.
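The outcome of those steps can be checked directly. This sketch keeps the parameter list of PrintConvertedValue but, as a hypothetical variation, returns the type arguments it was called with instead of printing the converted value. The second call shows the fallback of specifying the type arguments explicitly.

```csharp
using System;

public static class TwoPhaseDemo
{
    // Same parameters as PrintConvertedValue in listing 9.11;
    // returns the type arguments instead of printing the converted value
    public static Type[] InferredTypes<TInput, TOutput>(
        TInput input, Converter<TInput, TOutput> converter)
    {
        return new[] { typeof(TInput), typeof(TOutput) };
    }

    static void Main()
    {
        // Phase 1 records string as a bound for TInput; phase 2 fixes TInput
        // to string, then infers TOutput = int from x.Length
        Type[] inferred = InferredTypes("I'm a string", x => x.Length);
        Console.WriteLine(inferred[0] + ", " + inferred[1]);

        // Spelling out the type arguments gives the same result without inference
        Type[] explicitArgs = InferredTypes<string, int>("I'm a string", x => x.Length);
        Console.WriteLine(explicitArgs[0] + ", " + explicitArgs[1]);
    }
}
```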

Complicated, eh? Still, it does the job—the result is what we’d want (TInput=string, TOutput=int) and everything compiles without any problems. The importance of phase 2 repeating is best shown with another example, however. Listing 9.15 shows two conversions being performed, with the output of the first one becoming the input of the second. Until we’ve worked out the output type of the first conversion, we don’t know the input type of the second, so we can’t infer its output type either.

Example 9.15. Multistage type inference

static void ConvertTwice<TInput,TMiddle,TOutput>
   (TInput input,
    Converter<TInput,TMiddle> firstConversion,
    Converter<TMiddle,TOutput> secondConversion)
{
   TMiddle middle = firstConversion(input);
   TOutput output = secondConversion(middle);
   Console.WriteLine(output);
}
...
ConvertTwice("Another string",
             text => text.Length,
             length => Math.Sqrt(length));

The first thing to notice is that the method signature appears to be pretty horrific. It’s not too bad when you stop being scared and just look at it carefully—and certainly the example usage makes it more obvious. We take a string, and perform a conversion on it: the same conversion as before, just a length calculation. We then take that length (an int) and find its square root (a double).

Phase 1 of type inference tells the compiler that there must be a conversion from string to TInput. The first time through phase 2, TInput is fixed to string and we infer that there must be a conversion from int to TMiddle. The second time through phase 2, TMiddle is fixed to int and we infer that there must be a conversion from double to TOutput. The third time through phase 2, TOutput is fixed to double and type inference succeeds. When type inference has finished, the compiler can look at the code within the lambda expression properly.
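The same trick makes the three passes through phase 2 visible. This sketch (an invented variation on ConvertTwice) keeps the parameter list of listing 9.15 but returns the three inferred type arguments.

```csharp
using System;

public static class MultistageDemo
{
    // Same parameter list as ConvertTwice in listing 9.15;
    // returns the inferred type arguments instead of converting anything
    public static Type[] InferredTypes<TInput, TMiddle, TOutput>(
        TInput input,
        Converter<TInput, TMiddle> firstConversion,
        Converter<TMiddle, TOutput> secondConversion)
    {
        return new[] { typeof(TInput), typeof(TMiddle), typeof(TOutput) };
    }

    static void Main()
    {
        // Pass 1 of phase 2 fixes TInput = string, pass 2 fixes TMiddle = int,
        // pass 3 fixes TOutput = double
        Type[] types = InferredTypes("Another string",
                                     text => text.Length,
                                     length => Math.Sqrt(length));
        foreach (Type type in types)
        {
            Console.WriteLine(type);
        }
    }
}
```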

Note

Checking the body of a lambda expression—The body of a lambda expression cannot be checked until the input parameter types are known. The lambda expression x => x.Length is valid if x is an array or a string, but invalid in many other cases. This isn’t a problem when the parameter types are explicitly declared, but with an implicit parameter list the compiler needs to wait until it’s performed the relevant type inference before it can try to work out what the lambda expression means.
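A small illustration of that note: the same lambda text is valid against two different parameter types, and the compiler only checks x.Length once it knows which one applies. The names are invented for this sketch.

```csharp
using System;

public static class SameBodyDifferentTypes
{
    // The same lambda text, checked against two different parameter types
    public static readonly Func<string, int> StringLength = x => x.Length;
    public static readonly Func<int[], int> ArrayLength = x => x.Length;
    // Func<int, int> broken = x => x.Length;  // would not compile: int has no Length

    static void Main()
    {
        Console.WriteLine(StringLength("abc"));
        Console.WriteLine(ArrayLength(new int[5]));
    }
}
```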

These examples have shown only one change working at a time—in practice there can be several pieces of information about different type variables, potentially discovered in different iterations of the process. In an effort to save your sanity (and mine), I’m not going to present any more complicated examples—hopefully you understand the general mechanism, even if the exact details are hazy.

Although it may seem as if this kind of situation will occur so rarely that it’s not worth having such complex rules to cover it, in fact it’s quite common in C# 3, particularly with LINQ. Indeed, you could easily use type inference extensively without even thinking about it; it’s likely to become second nature to you. If it fails and you wonder why, however, you can always revisit this section and the language specification.

There’s one more change we need to cover, but you’ll be glad to hear it’s easier than type inference: method overloading.

Picking the right overloaded method

Overloading occurs when there are multiple methods available with the same name but different signatures. Sometimes it’s obvious which method is appropriate, because it’s the only one with the right number of parameters, or it’s the only one where all the arguments can be converted into the corresponding parameter types.

The tricky bit comes when there are multiple methods that could be the right one. The rules are quite complicated (yes, again)—but the key part is the way that each argument type is converted into the parameter type. For instance, consider these method signatures, as if they were both declared in the same type:

void Write(int x)
void Write(double y)

The meaning of a call to Write(1.5) is obvious, because there’s no implicit conversion from double to int, but a call to Write(1) is trickier. There is an implicit conversion from int to double, so both methods are possible. At that point, the compiler considers the conversion from int to int, and from int to double. A conversion from any type to itself is defined to be better than any conversion to a different type, so the Write(int x) method is better than Write(double y) for this particular call.

When there are multiple parameters, the compiler has to make sure there is exactly one method that is at least as good as all the others for every parameter. As a simple example, suppose we had

void Write(int x, double y)
void Write(double x, int y)

A call to Write(1, 1) would be ambiguous, and the compiler would force you to add a cast to at least one of the arguments to make it clear which method you meant to call.
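Both overload examples can be tried in one sketch. The methods return a label instead of writing anything, so that the chosen overload is easy to check; the class name and labels are invented.

```csharp
using System;

public static class OverloadDemo
{
    public static string Write(int x)    { return "Write(int)"; }
    public static string Write(double y) { return "Write(double)"; }

    public static string Write(int x, double y) { return "Write(int, double)"; }
    public static string Write(double x, int y) { return "Write(double, int)"; }

    static void Main()
    {
        Console.WriteLine(Write(1.5));   // only Write(double) is applicable
        Console.WriteLine(Write(1));     // int-to-int beats int-to-double
        // Write(1, 1) is ambiguous (error CS0121); a cast picks one overload
        Console.WriteLine(Write(1, (double) 1));
        Console.WriteLine(Write((double) 1, 1));
    }
}
```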

That logic still applies to C# 3, but with one extra rule about anonymous functions, which never specify a return type. In this case, the inferred return type (as described in section 9.4.2) is used in the “better conversion” rules.

Let’s see an example of the kind of situation that needs the new rule. Listing 9.16 contains two methods with the name Execute, and a call using a lambda expression.

Example 9.16. Sample of overloading choice influenced by delegate return type

static void Execute(Func<int> action)
{
   Console.WriteLine("action returns an int: "+action());
}
static void Execute(Func<double> action)
{
   Console.WriteLine("action returns a double: "+action());
}
...

Execute( () => 1 );

The call to Execute in listing 9.16 could have been written with an anonymous method or a method group instead—the same rules are applied whatever kind of conversion is involved. So, which Execute method should be called? The overloading rules say that when two methods are both applicable after performing conversions on the arguments, then those argument conversions are examined to see which one is “better.” The conversions here aren’t from a normal .NET type to the parameter type—they’re from a lambda expression to two different delegate types. So, which conversion is better?

Surprisingly enough, the same situation in C# 2 would result in a compilation error—there was no language rule covering this case. In C# 3, however, the method with the Func<int> parameter would be chosen. The extra rule that has been added can be paraphrased to this:

If an anonymous function can be converted to two delegate types that have the same parameter list but different return types, then the delegate conversions are judged by the conversions from the inferred return type to the delegates’ return types.

That’s pretty much gibberish without referring to an example. Let’s look back at listing 9.16: we’re converting from a lambda expression with no parameters and an inferred return type of int to either Func<int> or Func<double>. The parameter lists are the same (empty) for both delegate types, so the rule applies. We then just need to find the better conversion: int to int, or int to double. This puts us in more familiar territory—as we saw earlier, the int to int conversion is better. Listing 9.16 therefore prints out “action returns an int: 1.”
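Here is listing 9.16 adapted (a sketch with invented names) so that each overload returns a label. The second call shows the flip side of the rule: with an inferred return type of double, only the Func<double> overload is applicable at all.

```csharp
using System;

public static class LambdaOverloadDemo
{
    public static string Execute(Func<int> action)
    {
        action();
        return "Func<int>";
    }

    public static string Execute(Func<double> action)
    {
        action();
        return "Func<double>";
    }

    static void Main()
    {
        // Inferred return type int: int-to-int beats int-to-double
        Console.WriteLine(Execute(() => 1));
        // Inferred return type double: double-to-int isn't implicit, so only Func<double> applies
        Console.WriteLine(Execute(() => 1.5));
    }
}
```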

Wrapping up type inference and overload resolution

This section has been pretty heavy. I would have loved to make it simpler—but it’s a fundamentally complicated topic. The terminology involved doesn’t make it any easier, especially as parameter type and type parameter mean completely different things! Congratulations if you made it through and actually understood it all. Don’t worry if you didn’t: hopefully next time you read through the section, it will shed a bit more light on the topic—particularly after you’ve run into situations where it’s important in your own code. For the moment, here are the most important points:

  • Anonymous functions (anonymous methods and lambda expressions) have inferred return types based on the types of all the return statements.

  • Lambda expressions can only be understood by the compiler when the types of all the parameters are known.

  • Type inference no longer requires that each argument independently comes to exactly the same conclusion about type parameters, as long as the results stay compatible.

  • Type inference is now multistage: the inferred return type of one anonymous function may be used as a parameter type for another.

  • Finding the “best” overloaded method when anonymous functions are involved takes the inferred return type into account.

Summary

In C# 3, lambda expressions almost entirely replace anonymous methods. The only thing you can do with an anonymous method that you can’t do with a lambda expression is say that you don’t care about the parameters in the way that we saw in section 5.4.3. Of course, anonymous methods are supported for the sake of backward compatibility, but idiomatic, freshly written C# 3 code will contain very few of them.
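For comparison, this sketch (names invented) subscribes two handlers to an EventHandler: an anonymous method that omits its parameter list altogether, and a lambda expression that has to name sender and e even though it never uses them.

```csharp
using System;

public static class IgnoredParameters
{
    static int calls;

    // Invokes a combined handler once and reports how many handlers ran
    public static int InvokeBothHandlers()
    {
        calls = 0;
        EventHandler handler = null;
        // Anonymous method: the parameter list can be left out entirely
        handler += delegate { calls++; };
        // Lambda expression: the parameters must be declared, even if unused
        handler += (sender, e) => calls++;
        handler(null, EventArgs.Empty);
        return calls;
    }

    static void Main()
    {
        Console.WriteLine(InvokeBothHandlers());
    }
}
```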

We’ve seen how lambda expressions are much more than just a more compact syntax for delegate creation, however. They can be converted into expression trees, which can then be processed by other code, possibly performing equivalent actions in different execution environments. This is arguably the most important part of the LINQ story.

Our discussion of type inference and overloading was a necessary evil to some extent: no one actually enjoys discussing the sort of rules which are required, but it’s important to have at least a passing understanding of what’s going on. Before we all feel too sorry for ourselves, spare a thought for the poor language designers who have to live and breathe this kind of thing, making sure the rules are consistent and don’t fall apart in nasty situations. Then pity the testers who have to try to break the implementation!

That’s it in terms of describing lambda expressions—but we’ll be seeing a lot more of them in the rest of the book. For instance, our next chapter is all about extension methods. Superficially, they’re completely separate from lambda expressions—but in reality the two features are often used together.



[1] LINQ to Objects is the LINQ provider in .NET 3.5 that handles sequences of data within the same process. By contrast, providers such as LINQ to SQL offload the work to other “out of process” systems—databases, for example.

[2] Code paths throwing exceptions don’t need to return a value, of course, and neither do detectable infinite loops.

[3] That’s not to say it’s impossible, however. Some languages allow closures to be represented as simple blocks of code with a magic variable name to represent the common case of a single parameter.

[4] Returned expressions which don’t have a type, such as null or another lambda expression, aren’t included in this set. Their validity is checked later, once a return type has been determined, but they don’t contribute to that decision.
