Chapter 15. Lambda Expressions

Most of the new features of C# 3.0 opened up a world of expressive functional programming to the C# programmer. Functional programming, in its pure form, is a programming methodology built on top of immutable variables (sometimes called symbols), functions that can produce other functions, and recursion, just to name a few of its foundations. Some prominent functional programming languages include Lisp, Haskell, F#,[61] and Scheme.[62] However, functional programming does not require a pure functional language, and one can use and implement functional programming disciplines in traditionally imperative languages such as the C-based languages (including C#). The features added in C# 3.0 transformed the language into a more expressive hybrid language in which both imperative and functional programming techniques can be utilized in harmony. Lambda expressions are arguably the biggest piece of this functional programming pie.

Introduction to Lambda Expressions

Using lambda expressions, you can succinctly define function objects for use at any time. C# has always supported this capability via delegates, whereby you create a function object (in the form of a delegate) and wire it up to the backing code at the time of creation. Lambda expressions join these two actions—creation and connection—into one expressive statement in the code. Additionally, you can easily associate an environment with function objects using a construct called a closure. A functional is a function that takes functions in its parameter list and operates on those functions, possibly even returning another function as the result. For example, a functional could accept two functions, one performing one mathematical operation and the other performing a different mathematical operation, and return a third function that is a composite function built from the two. Lambda expressions provide a more natural way to create and invoke functionals.

In simple syntactic terms, lambda expressions are a syntax whereby you can declare anonymous functions (delegates) in a more fluid and expressive way. At the same time, they are very much more than that, as you will see. Just about every use of an anonymous method can be replaced with a lambda expression. That said, there's no reason you can't utilize functional programming techniques in C# 2.0[63]. At first, the syntax of lambda expressions might take some time to get used to. Overall, the syntax is very straightforward when you are looking at a lambda expression all by itself. However, when embedded in code, they can be a little tricky to decipher and it might take some time to get used to their syntax.

Lambda expressions really take two forms. The form that most directly replaces anonymous methods in syntax includes a statement block within braces. I like to refer to these as lambda statements. These lambda statements are a direct replacement for anonymous methods. Lambda expressions, on the other hand, provide an even more abbreviated way to declare an anonymous method and do not require code within braces nor a return statement. Both types of lambda expressions can be converted to delegates. However, lambda expressions without statement blocks offer something truly impressive. You can convert them into expression trees based on the types in the System.Linq.Expressions namespace. In other words, the function described in code is turned into data. I cover the topic of creating expression trees from lambda expressions in the section titled "Expression Trees" later in this chapter.

Lambda Expressions and Closures

First, let's look at the simpler form of lambda expressions; that is, the ones without a statement block. As mentioned in the previous section, a lambda expression is a shorthand way of declaring a simple anonymous method. The following lambda expression can be used as a delegate that accepts one parameter and returns the result of performing division by 2 on that parameter:

x => x / 2

What this says is "take x as a parameter and return the result of following operation on x." Notice that the lambda expression is devoid of any type information. It does not mean that the expression is typeless. Instead, the compiler will deduce the type of the argument x and the type of the result depending on the context where it is used. It means that if you are assigning a lambda expression to a delegate, the types of the delegate definition are used to determine the types within the lambda expression. The following code shows what happens when a lambda expression is assigned to a delegate type:

using System;
using System.Linq;

public class LambdaTest
{
    static void Main() {
        Func<int, double> expr = x => x / 2;

        int someNumber = 9;
        Console.WriteLine( "Result: {0}", expr(someNumber) );
    }
}

I have marked the lambda expression in bold so that it stands out. The Func<> type is a helper type provided by the System namespace that you can use to declare simple delegates that take up to four arguments and return a result. In this case, I am declaring a variable expr that is a delegate that accepts an int and returns a double. When the compiler assigns the lambda expression to the expr variable, it uses the type information of the delegate to determine that the type of x must be int, and the type of the return value must be double.

Now, if you execute that code, you'll notice that the result is not entirely accurate. That is, the result has been rounded. This is expected because the result of x/2 is represented as an int, which is then cast to a double. You can fix this by specifying different types in the delegate declaration, as shown here:

using System;
using System.Linq;

public class LambdaTest
{
    static void Main() {
        Func<double, double> expr = (double x) => x / 2;

        int someNumber = 9;
        Console.WriteLine( "Result: {0}", expr(someNumber) );
    }
}

For the sake of demonstration, this lambda expression has what's called an explicitly typed parameter list, and in this case, x is declared as type double. Also notice that the type of expr is now Func<double, double> rather than Func<int, double>. The compiler requires that any time you use a typed parameter list in a lambda expression and assign it to a delegate, the delegate's argument types must match exactly. However, because an int is explicitly convertible to a double, you can pass someNumber to expr at call time as shown.

Note

When using typed parameter lists, notice that the parameter list must be enclosed in parentheses. Parentheses are also required when declaring a delegate that accepts either more than one parameter or no parameters, as I'll show later on. In fact, you can use parentheses at any time; they are optional in lambda expressions of only one implicitly typed parameter.

When the lambda expression is assigned to a delegate, the return type of the expression is generally derived from the argument types. So, in the following code statement, the return type of the expression is double because the inferred type of the parameter x is double:

Func<double, int> expr = (x) => x / 2;   // Compiler Error!!!!

However, because double is not implicitly convertible to int, the compiler will complain:

error CS1662: Cannot convert 'lambda expression' to
delegate type 'System.Func<double,int>' because some of the return
types in the block are not implicitly convertible to the delegate return
type

You can "fix" this by casting the result of the lambda expression body to int:

Func<double, int> expr = (x) => (int) x / 2;

Note

Explicit types in lambda expression parameter lists are required if the delegate you are assigning them to has out or ref parameters. One could argue that fixing the parameter types explicitly within a lambda expression defeats some of the elegance of their expressive power. It definitely can make the code harder to read.

Now I want to show you a simple lambda expression that accepts no parameters:

using System;
using System.Linq;

public class LambdaTest
{
    static void Main() {
        int counter = 0;

        WriteStream( () => counter++ );

        Console.WriteLine( "Final value of counter: {0}",
                           counter );
    }

    static void WriteStream( Func<int> generator ) {
        for( int i = 0; i < 10; ++i ) {
            Console.Write( "{0}, ", generator() );
        }
        Console.WriteLine();
    }
}

Notice how simple it was to pass a function as a parameter into the WriteStream method using this lambda expression. Moreover, the function passed in captures an environment within which to run, namely, the counter value in Main. This captured environment and the function together are commonly referred to as a closure.

Finally, I want to show you an example of a lambda expression that accepts more than one parameter:

using System;
using System.Linq;
using System.Collections.Generic;
public class LambdaTest
{
    static void Main() {
        var teamMembers = new List<string> {
            "Lou Loomis",
            "Smoke Porterhouse",
            "Danny Noonan",
            "Ty Webb"
        };

        FindByFirstName( teamMembers,
                         "Danny",
                         (x, y) => x.Contains(y) );
    }

    static void FindByFirstName(
                        List<string> members,
                        string firstName,
                        Func<string, string, bool> predicate ) {
        foreach( var member in members ) {
            if( predicate(member, firstName) ) {
                Console.WriteLine( member );
            }
        }
    }
}

In this case, the lambda expression is used to create a delegate that accepts two parameters of type string and returns a bool. As you can see, lambda expressions provide a nice, succinct way of creating predicates. In a later section, "Iterators Revisited," I'll build upon an example from Chapter 14 showing how to use lambda expressions as predicates to create flexible iterators.

Closures in C# 1.0

Back in the "good old days" of C# 1.0, creating closures was a painful process indeed, and one needed to do something like the following:

using System;

unsafe public class MyClosure
{
    public MyClosure( int* counter )
    {
        this.counter = counter;
    }

    public delegate int IncDelegate();
    public IncDelegate GetDelegate() {
        return new IncDelegate( IncrementFunction );
    }
private int IncrementFunction() {
        return (*counter)++;
    }

    private int* counter;
}

public class LambdaTest
{
    unsafe static void Main() {
        int counter = 0;

        MyClosure closure = new MyClosure( &counter );

        WriteStream( closure.GetDelegate() );

        Console.WriteLine( "Final value of counter: {0}",
                           counter );
    }

    static void WriteStream( MyClosure.IncDelegate incrementor ) {
        for( int i = 0; i < 10; ++i ) {
            Console.Write( "{0}, ", incrementor() );
        }
        Console.WriteLine();
    }
}

Look at all the work involved without lambda expressions. I have bolded the extra work and other changes in the code. The first order of business is to create an object to represent the delegate and its environment. In this case, the environment is a pointer to the counter variable in the Main method. I decided to use a class to encapsulate the function and its environment. Notice the use of unsafe code in the MyClosure class to accomplish this. Unsafe code is required when using pointers in C# because the safety of the code cannot be verified by the CLR.[64] Then, in the Main method, I created an instance of MyClosure and passed a delegate created by calling GetDelegate to WriteStream.

What a lot of work! And on top of that, it sure makes for some hard-to-follow code.

Note

You might be wondering why I used a pointer in the preceding longhand example, thus forcing one to compile using the /unsafe compiler option. The reason was simply to emphasize the fact that the captured variable can be changed out of band from the code consuming it. When the C# compiler captures a variable in a closure, it does something similar, but instead of using a pointer to the captured variable, it instead initializes a public field of the generated class that implements the closure with a reference to the captured variable or a copy if the captured variable is a value type. However, any code that attempts to modify the captured variable outside the scope of the closure modifies the copy within the closure object because, after all, it is a public field. Design wonks might cry foul because public fields are considered evil. However, remember that this is part of the compiler implementation. In fact, the class the compiler generates is "unspeakable," meaning that you cannot instantiate an instance of it in C# code because the name itself, if typed in code, will generate a syntax error. I invite you to inspect the way the compiler generates closures by opening the compiled code within Intermediate Language Disassembler (ILDASM).

Closures in C# 2.0

In C# 2.0, anonymous methods were introduced to reduce the burden I just described. However, they are not as functionally expressive as lambda expressions because they still carry the old imperative programming style with them and require parameter types in the parameter list. Additionally, the anonymous method syntax is rather bulky. For good measure, the following shows how the previous example would be implemented using anonymous methods, so you can see the difference in syntax from lambda expressions:

using System;

public class LambdaTest
{
    static void Main() {
        int counter = 0;

        WriteStream( delegate () {
                        return counter++;
                     } );

        Console.WriteLine( "Final value of counter: {0}",
                           counter );
    }

    static void WriteStream( Func<int> counter ) {
        for( int i = 0; i < 10; ++i ) {
            Console.Write( "{0}, ", counter() );
        }
        Console.WriteLine();
    }
}

I have bolded the differences between this example and the original lambda expression example. It's definitely much cleaner than the way you would have implemented it in the C# 1.0 days. However, it's still not as expressive and succinct as the lambda expression version. Using lambda expressions, you have an elegant means of defining potentially very complex functions that can even be built by assembling together other functions.

Note

In the previous code example, you likely noticed the implications of referencing the counter variable within the lambda expression. After all, counter is actually a local variable within the scope of Main, yet within the scope of WriteStream it is referenced while invoking the delegate. In the Chapter 10 section "Beware the Captured Variable Surprise," I described how you can do the same thing with anonymous methods. In functional programming lingo, this is called a closure. In essence, any time a lambda expression incorporates the environment around it, a closure is the result. As I'll show in a following section, "Closures (Variable Capture) and Memoization," closures can be very useful. However, when used inappropriately, they can create some nasty surprises.

Lambda Statements

All the lambda expressions I have shown so far have been purely of the expression type. Another type of lambda expression is one I like to call a lambda statement. It is similar in form to the lambda expressions of the previous section except that it is composed of a compound statement block within curly braces. Because of that, a lambda with statement blocks must have a return statement. In general, all lambda expressions shown in the previous section can be converted to a lambda with a statement block simply by surrounding the expression with curly braces after prefixing the right side with a return statement. For example, the following lambda expression:

(x, y) => x * y

can be rewritten as the following lambda with a statement block:

(x, y) => { return x * y; }

In form, lambdas with statement blocks are almost identical to anonymous methods. But there is one major difference between lambdas with statement blocks and lambda expressions. Lambdas with statement blocks can be converted only to delegate types, whereas lambda expressions can be converted both to delegates and to expression trees typed by the family of types centered around System.Linq.Expressions.Expression<T>. I'll discuss expression trees in the next section.

Note

The big difference between lambdas with statement blocks and anonymous methods is that anonymous methods must explicitly type their parameters, whereas the compiler can infer the types of the lambda based on context in almost all cases. The abbreviated syntax offered by lambda expressions fosters a more functional programming thought process and approach.

Expression Trees

So far, I have shown you lambda expressions that replace the functionality of delegates. If I stopped there, I would be doing you a great disservice. That's because the C# compiler also has the capability to convert lambda expressions into expression trees based on the types in the System.Linq.Expressions namespace. I'll explain why this is such a great thing in a later section, "Functions as Data." For example, you've already seen how you can convert a lambda expression into a delegate as shown here:

Func<int, int> func1 = n => n+1;

In this line of code, the expression is converted into a delegate that accepts a single int parameter and returns an int. However, check out the following modification:

Expression<Func<int, int>> expr = n => n+1;

This is really cool! The lambda expression, instead of being converted into a callable delegate, is converted into a data structure that represents the operation. The type of the expr variable is Expression<T>, where T is replaced with the type of delegate the lambda can be converted to. The compiler notices that you are trying to convert the lambda expression into an Expression<Func<int,int>> instance and generates all the code internally to make it happen. At some point later in time, you can then compile the expression into a usable delegate as shown in the next example:

using System;
using System.Linq;
using System.Linq.Expressions;

public class EntryPoint
{
    static void Main() {
        Expression<Func<int, int>> expr = n => n+1;

        Func<int, int> func = expr.Compile();

        for( int i = 0; i < 10; ++i ) {
            Console.WriteLine( func(i) );
        }

    }
}

The line in bold shows the step at which the expression is compiled into a delegate. If you think about it a little bit, you might quickly start imagining how you could modify this expression tree or even combine multiple expression trees to create more complex expression trees prior to compiling them. One could even define a new expression language or implement a parser for an already existing expression language. In fact, the compiler acts as an expression parser when you assign a lambda expression into an Expression<T> type instance. Behind the scenes, it generates the code to build the expression tree and if you use ILDASM or Reflector to look at the generated code, you can see it in action.

The previous example could be rewritten without using the lambda expression as follows:

using System;
using System.Linq;
using System.Linq.Expressions;

public class EntryPoint
{
    static void Main() {
        var n = Expression.Parameter( typeof(int), "n" );
var expr = Expression<Func<int,int>>.Lambda<Func<int,int>>(
                       Expression.Add(n, Expression.Constant(1)),
                       n );

        Func<int, int> func = expr.Compile();

        for( int i = 0; i < 10; ++i ) {
            Console.WriteLine( func(i) );
        }

    }
}

The bolded lines here replace the single line in the prior example in which the expr variable is assigned the lambda expression n => n+1. I think you'll agree that the first example is much easier to read. However, this longhand example helps express the true flexibility of expression trees. Let's break down the steps of building the expression. First, you need to represent the parameters in the parameter list of the lambda expression. In this case, there is only one: the variable n. Thus we start with the following:

var n = Expression.Parameter( typeof(int), "n" );

Note

In these examples, I am using implicitly typed variables to save myself a lot of typing and to reduce clutter for readability. Remember, the variables are still strongly typed. The compiler simply infers their type at compile time rather than requiring you to provide the type.

This line of code says that we need an expression to represent a variable named n that is of type int. Remember that in a plain lambda expression, this type can be inferred based upon the delegate type provided.

Now, we need to construct a BinaryExpression instance that represents the addition operation, as shown next:

Expression.Add(n, Expression.Constant(1))

Here, I've said that my BinaryExpression should consist of adding an expression representing a constant, the number 1, to an expression representing the parameter n. You might have already started to notice a pattern. The framework implements a form of the Abstract Factory design pattern for creating instances of expression elements. That is, you cannot create a new instance of BinaryExpression, or any other building block of expression trees, using the new operator along with the constructor of the type. The constructor is not accessible, so you must use the static methods on the Expression class to create those instances. They give us as consumers the flexibility to express what we want and allow the Expression implementation to decide which type we really need.

Note

If you look up BinaryExpression, UnaryExpression, ParameterExpression, and so on in the MSDN documentation, you will notice that there are no public constructors on these types. Instead, you create instances of Expression derived types using the Expression type, which implements the factory pattern and exposes static methods for creating instances of Expression derived types.

Now that you have the BinaryExpression, you need to use the Expression.Lambda<> method to bind the expression (in this case, n+1) with the parameters in the parameter list (in this case, n). Notice that in the example I use the generic Lambda<> method so that I can create the type Expression<Func<int,int>>. Using the generic form gives the compiler more type information to catch any errors I might have introduced at compile time rather than let those errors bite me at run time.

One more point I want to make that demonstrates how expressions represent operations as data is with the Expression Tree Debugger Visualizer in Visual Studio 2010. If you execute the previous example within the Visual Studio Debugger, once you step past the point where you assign the expression into the expr variable, you will notice that in either the "Autos" or "Locals" windows, the expression is parsed and displayed as {n => (n + 1)} even though it is of type System.Linq.Expressions.Expression<System.Func<int,int>>. Naturally, this is a great help while creating complicated expression trees.

Note

If I had used the nongeneric version of the Expression.Lambda method, the result would have been an instance of LambdaExpression rather than Expression. LambdaExpression also implements the Compile method; however, instead of a strongly typed delegate, it returns an instance of type Delegate. Before you can invoke the Delegate instance, you must cast it to the specific delegate type; in this case, Func<int, int> or another delegate with the same signature, or you must call DynamicInvoke on the delegate. Either one of those could throw an exception at run time if you have a mismatch between your expression and the type of delegate you think it should generate.

Operating on Expressions

Now I want to show you an example of how you can take an expression tree generated from a lambda expression and modify it to create a new expression tree. In this case, I will take the expression (n+1) and turn it into 2*(n+1):

using System;
using System.Linq;
using System.Linq.Expressions;

public class EntryPoint
{
    static void Main() {
        Expression<Func<int,int>> expr = n => n+1;
// Now, reassign the expr by multiplying the original
        // expression by 2.
        expr = Expression<Func<int,int>>.Lambda<Func<int,int>>(
                  Expression.Multiply( expr.Body,
                                       Expression.Constant(2) ),
                  expr.Parameters );

        Func<int, int> func = expr.Compile();

        for( int i = 0; i < 10; ++i ) {
            Console.WriteLine( func(i) );
        }

    }
}

The bolded lines show the stage at which I multiply the original lambda expression by 2. It's very important to notice that the parameters passed into the Lambda<> method (the second parameter) need to be exactly the same instances of the parameters that come from the original expression; that is, expr.Parameters. This is required. You cannot pass a new instance of ParameterExpression to the Lambda<> method; otherwise, at run time you will receive an exception similar to the following because the new ParameterExpression instance, even though it might have the same name, is actually a different parameter instance:

System.InvalidOperationException: Lambda Parameter not in scope

There are many classes derived from the Expression class and many static methods for creating instances of them and combining other expressions. It would be monotonous for me to describe them all here. Therefore, I recommend that you refer to the MSDN Library documentation regarding the System.Linq.Expressions namespace for all the fantastic details.

Functions as Data

If you have ever studied functional languages such as Lisp, you might notice the similarities between expression trees and how Lisp and similar languages represent functions as data structures. Most people encounter Lisp in an academic environment, and many times concepts that one learns in academia are not directly applicable to the real world. But before you eschew expression trees as merely an academic exercise, I want to point out how they are actually very useful.

As you might already guess, within the scope of C#, expression trees are extremely useful when applied to LINQ. I will give a full introduction to LINQ in Chapter 16, but for our discussion here, the most important fact is that LINQ provides a language-native, expressive syntax for describing operations on data that are not naturally modeled in an object-oriented way. For example, you can create a LINQ expression to search a large in-memory array (or any other IEnumerable type) for items that match a certain pattern. LINQ is extensible and can provide a means of operating on other types of stores, such as XML and relational databases. In fact, out of the box, C# supports LINQ to SQL, LINQ to Dataset, LINQ to Entities, LINQ to XML, and LINQ to Objects, which collectively allow you to perform LINQ operations on any type that supports IEnumerable.

So how do expression trees come into play here? Imagine that you are implementing LINQ to SQL to query relational databases. The user's database could be half a world away, and it might be very expensive to perform a simple query. On top of that, you have no way of judging how complex the user's LINQ expression might be. Naturally, you want to do everything you can to provide the most efficient experience possible.

If the LINQ expression is represented in data (as an expression tree) rather than in IL (as a delegate), you can operate on it. Maybe you have an algorithm that can spot places where an optimization can be utilized, thus simplifying the expression. Or maybe when your implementation analyzes the expression, you determine that the entire expression can be packaged up, sent across the wire, and executed in its entirety on the server.

Expression trees give you this important capability. Then, when you are finished operating on the data, you can translate the expression tree into the final executable operation via a mechanism such as the LambdaExpression.Compile method and go. Had the expression only been available as IL code from the beginning, your flexibility would have been severely limited. I hope now you can appreciate the true power of expression trees in C#.

Useful Applications of Lambda Expressions

Now that I have shown you what lambda expressions look like, let's consider some of the things you can do with them. You can actually implement most of the following examples in C# using anonymous methods or delegates. However, it's amazing how a simple syntactic addition to the language can clear the fog and open up the possibilities of expressiveness.

Iterators and Generators Revisited

I've described how you can create custom iterators with C# in a couple of places in this book already.[65] Now I want to demonstrate how you can use lambda expressions to create custom iterators. The point I want to stress is how the code implementing the algorithm, in this case the iteration algorithm, is then factored out into a reusable method that can be applied in almost any scenario.

Note

Those of you who are also C++ programmers and familiar with using the Standard Template Library (STL) will find this notion a familiar one. Most of the algorithms defined in the std namespace in the <algorithm> header require you to provide predicates to get their work done. When the STL arrived on the scene back in the early 1990s, it swept the C++ programming community like a refreshing functional programming breeze.

I want to show how you can iterate over a generic type that might or might not be a collection in the strict sense of the word. Additionally, you can externalize the behavior of the iteration cursor as well as how to access the current value of the collection. With a little thought, you can factor out just about everything from the custom iterator creation method, including the type of the item stored, the type of the cursor, the start state of the cursor, the end state of the cursor, and how to advance the cursor. All these are demonstrated in the following example, in which I iterate over the diagonal of a two-dimensional array:

using System;
using System.Linq;
using System.Collections.Generic;

public static class IteratorExtensions
{
    public static IEnumerable<TItem>
        MakeCustomIterator<TCollection, TCursor, TItem>(
                     this TCollection collection,
                     TCursor cursor,
                     Func<TCollection, TCursor, TItem> getCurrent,
                     Func<TCursor, bool> isFinished,
                     Func<TCursor, TCursor> advanceCursor) {
        while( !isFinished(cursor) ) {
            yield return getCurrent( collection, cursor );
            cursor = advanceCursor( cursor );
        }
    }
}

public class IteratorExample
{
    static void Main() {
        var matrix = new List<List<double>> {
            new List<double> { 1.0, 1.1, 1.2 },
            new List<double> { 2.0, 2.1, 2.2 },
            new List<double> { 3.0, 3.1, 3.2 }
        };

        var iter = matrix.MakeCustomIterator(
                     new int[] { 0, 0 },
                     (coll, cur) => coll[cur[0]][cur[1]],
                     (cur) => cur[0] > 2 || cur[1] > 2,
                     (cur) => new int[] { cur[0] + 1,
                                          cur[1] + 1 } );

        foreach( var item in iter ) {
            Console.WriteLine( item );
        }
    }
}

Let's look at how reusable MakeCustomIterator<> is. Admittedly, it takes some time to get used to the lambda syntax, and those used to reading imperative coding styles might find it hard to follow. Notice that it takes three generic type arguments. TCollection is the type of the collection, which in this example is specified as List<List<double>> at the point of use. TCursor is the type of the cursor, which in this case is a simple array of integers that can be considered coordinates of the matrix variable. And TItem is the type that the code returns via the yield statement. The rest of the type arguments to MakeCustomIterator<> are delegate types that it uses to determine how to iterate over the collection.

First, it needs a way to access the current item in the collection, which, for this example, is expressed in the following lambda expression which uses the values within the cursor array to index the item within the matrix:

(coll, cur) => coll[cur[0]][cur[1]]

Then it needs a way to determine whether you have reached the end of the collection, for which I supply the following lambda expression that just checks to see whether the cursor has stepped off of the edge of the matrix:

(cur) => cur[0] > 2 || cur[1] > 2

And finally it needs to know how to advance the cursor, which I have supplied in the following lambda expression, which simply advances both coordinates of the cursor:

(cur) => new int[] { cur[0] + 1, cur[1] + 1 }

After executing the preceding code, you should see output similar to the following, which shows that you have indeed walked down the diagonal of the matrix from the top left to the bottom right. At each step along the way, MakeCustomIterator<> has delegated work to the given delegates to perform the work.

1
2.1
3.2

Other implementations of MakeCustomIterator<> could accept a first parameter of type IEnumerable<T>, which in this example would be IEnumerable<double>. However, when you impose that restriction, whatever you pass to MakeCustomIterator<> must implement IEnumerable<>. The matrix variable does implement IEnumerable<>, but not in the form that is easily usable, because it is IEnumerable<List<double>>. Additionally, you could assume that the collection implements an indexer, as described in the Chapter 4 section "Indexers," but to do so would be restricting the reusability of MakeCustomIterator<> and which objects you could use it on. In the previous example, the indexer is actually used to access the current item, but its use is externalized and wrapped up in the lambda expression given to access the current item.

Moreover, because the operation of accessing the current item of the collection is externalized, you could even transform the data in the original matrix variable as you iterate over it. For example, I could have multiplied each value by 2 in the lambda expression that accesses the current item in the collection, as shown here:

(coll, cur) => coll[cur[0]][cur[1]] * 2;

Can you imagine how painful it would have been to implement MakeCustomIterator<> using delegates in the C# 1.0 days? This is exactly what I mean when I say that even just the addition of the lambda expression syntax to C# opens one's eyes to the incredible possibilities.

As a final example, consider the case in which your custom iterator does not even iterate over a collection of items at all and is used as a number generator instead, as shown here:

using System;
using System.Linq;
using System.Collections.Generic;

public class IteratorExample
{
    static IEnumerable<T> MakeGenerator<T>( T initialValue,
                                            Func<T, T> advance ) {
        T currentValue = initialValue;
        while( true ) {
            yield return currentValue;
            currentValue = advance( currentValue );
        }
    }

    static void Main() {
        var iter = MakeGenerator<double>( 1,
                                          x => x * 1.2 );

        var enumerator = iter.GetEnumerator();
        for( int i = 0; i < 10; ++i ) {
            enumerator.MoveNext();
            Console.WriteLine( enumerator.Current );
        }
    }
}

After executing this code, you will see the following results:

1
1.2

1.44

1.728

2.0736

2.48832

2.985984

3.5831808

4.29981696
5.159780352

You could allow this method to run infinitely, and it would stop only if you experienced an overflow exception or you stopped execution. But the items you are iterating over don't exist as a collection; rather, they are generated on an as-needed basis each time you advance the iterator. You can apply this concept in many ways, even creating a random number generator implemented using C# iterators.

More on Closures (Variable Capture) and Memoization

In the Chapter 10 section titled "Beware the Captured Variable Surprise," I described how anonymous methods can capture the contexts of their lexical surroundings. Many refer to this phenomenon as variable capture. In functional programming parlance, it's also known as a closure.[66] Here is a simple closure in action:

using System;
using System.Linq;

public class Closures
{
    static void Main() {
        int delta = 1;
        Func<int, int> func = (x) => x + delta;

        int currentVal = 0;
        for( int i = 0; i < 10; ++i ) {
            currentVal = func( currentVal );
            Console.WriteLine( currentVal );
        }
    }
}

The variable delta and the delegate func embody the closure. The expression body references delta, and therefore must have access to it when it is executed at a later time. To do this, the compiler "captures" the variable for the delegate. Behind the scenes, what this means is that the delegate body contains a reference to the actual variable delta. But notice that delta is a value type on the stack. The compiler must be doing something to ensure that delta lives longer than the scope of the method within which is it declared because the delegate will likely be called later, after that scope has exited. Moreover, because the captured variable is accessible to both the delegate and the context containing the lambda expression, it means that the captured variable can be changed outside the scope and out of band of the delegate. In essence, two methods (Main and the delegate) both have access to delta. This behavior can be used to your advantage, but when unexpected, it can cause serious confusion.

Note

In reality, when a closure is formed, the C# compiler takes all those variables and wraps them up in a generated class. It also implements the delegate as a method of the class. In very rare cases, you might need to be concerned about this, especially if it is found to be an efficiency burden during profiling.

Now I want to show you a great application of closures. One of the foundations of functional programming is that the function itself is treated as a first-class object that can be manipulated and operated upon as well as invoked. You've already seen how lambda expressions can be converted into expression trees so you can operate on them, producing more or less complex expressions. But one thing I have not discussed yet is the topic of using functions as building blocks for creating new functions. As a quick example of what I mean, consider two lambda expressions:

x => x * 3
x => x + 3.1415

You could create a method to combine such lambda expressions to create a compound lambda expression as I've shown here:

using System;
using System.Linq;

public class Compound
{
    static Func<T, S> Chain<T, R, S>( Func<T, R> func1,
                                      Func<R, S> func2 ) {
        return x => func2( func1(x) );
    }

    static void Main() {
        Func<int, double> func = Chain( (int x) => x * 3,
                                        (int x) => x + 3.1415 );

        Console.WriteLine( func(2) );
    }
}

The Chain<> method accepts two delegates and produces a third delegate by combining the two. In the Main method, you can see how I used it to produce the compound expression. The delegate that you get after calling Chain<> is equivalent to the delegate you get when you convert the following lambda expression into a delegate:

x => (x * 3) + 3.1415

Having a method to chain arbitrary expressions like this is useful indeed, but let's look at other ways to produce a derivative function. Imagine an operation that takes a really long time to compute. Examples are the factorial operation or the operation to compute the nth Fibonacci number. An example that I ultimately like to show demonstrates the Reciprocal Fibonacci constant, which is

More on Closures (Variable Capture) and Memoization

where Fk is a Fibonacci number.[67]

To begin to demonstrate that this constant exists computationally, you need to first come up with an operation to compute the nth Fibonacci number:

using System;
using System.Linq;

public class Proof
{
    static void Main() {
        Func<int, int> fib = null;
        fib = (x) => x > 1 ? fib(x-1) + fib(x-2) : x;

        for( int i = 30; i < 40; ++i ) {
            Console.WriteLine( fib(i) );
        }
    }
}

When you look at this code, the first thing that jumps up and grabs you is the formation of the Fibonacci routine; that is, the fib delegate. It forms a closure on itself! This is definitely a form of recursion and behavior that I desire. However, if you execute the example, unless you have a powerhouse of a machine, you will notice how slow it is, even though all I did was output the 30th to 39th Fibonacci numbers! If that is the case, you don't even have a prayer of demonstrating the Fibonacci constant. The slowness comes from the fact that for each Fibonacci number that you compute, you have to do a little more work than you did to compute the two prior Fibonacci numbers, and you can see how this work quickly mushrooms.

You can solve this problem by trading a little bit of space for time by caching the Fibonacci numbers in memory. But instead of modifying the original expression, let's look at how to create a method that accepts the original delegate as a parameter and returns a new delegate to replace the original. The ultimate goal is to be able to replace the first delegate with the derivative delegate without affecting the code that consumes it. One such technique is called memorization.[68] This is the technique whereby you cache function return values and each return value's associated input parameters. This works only if the function has no entropy, meaning that for the same input parameters, it always returns the same result. Then, prior to calling the actual function, you first check to see whether the result for the given parameter set has already been computed and return it rather than calling the function. Given a very complex function, this technique trades a little bit of memory space for significant speed gain.

Let's look at an example:

using System;
using System.Linq;
using System.Collections.Generic;

public static class Memoizers
{
    public static Func<T,R> Memoize<T,R>( this Func<T,R> func ) {
        var cache = new Dictionary<T,R>();
        return (x) => {
            R result = default(R);
            if( cache.TryGetValue(x, out result) ) {
                return result;
            }

            result = func(x);
            cache[x] = result;
            return result;
        };
    }
}

public class Proof
{
    static void Main() {
        Func<int, int> fib = null;
        fib = (x) => x > 1 ? fib(x-1) + fib(x-2) : x;
        fib = fib.Memoize();

        for( int i = 30; i < 40; ++i ) {
            Console.WriteLine( fib(i) );
        }
    }
}

First of all, notice that in Main, I have added only one more statement where I apply the Memoize<> extension method to the delegate to produce a new delegate. Everything else stays the same, so the transparent replaceability goal is achieved. The Memoize<> method wraps the original delegate that's passed in via the func argument with another closure that includes a Dictionary<> instance to store the cached values of the given delegate func. In the process of Memoize<> taking one delegate and returning another, it has introduced a cache that greatly improves the efficiency. Each time the derivative delegate is called, it first checks the cache to see whether the value has already been computed.

Warning

Of course, memoization works only for functions that are deterministically repeatable in the sense that you are guaranteed to get the same result for the same parameters. For example, a true random number generator cannot be memoized.

Run the two previous examples on your own machine to see the amazing difference. Now you can move on to the business of computing the Reciprocal Fibonacci constant by modifying the Main method as follows:

static void Main() {
    Func<ulong, ulong> fib = null;
    fib = (x) => x > 1 ? fib(x-1) + fib(x-2) : x;
    fib = fib.Memoize();

    Func<ulong, decimal> fibConstant = null;
    fibConstant = (x) => {
        if( x == 1 ) {
            return 1 / ((decimal)fib(x));
        } else {
            return 1 / ((decimal)fib(x)) + fibConstant(x-1);
        }
    };
    fibConstant = fibConstant.Memoize();

    Console.WriteLine( "
{0}	{1}	{2}	{3}
",
                       "Count",
                       "Fibonacci".PadRight(24),
                       "1/Fibonacci".PadRight(24),
                       "Fibonacci Constant".PadRight(24) );

    for( ulong i = 1; i <= 93; ++i ) {
        Console.WriteLine( "{0:D5}	{1:D24}	{2:F24}	{3:F24}",
                           i,
                           fib(i),
                           (1/(decimal)fib(i)),
                           fibConstant(i) );
    }
}

The bold text shows the delegate I created to compute the nth Reciprocal Fibonacci constant. As you call this delegate with higher and higher values for x, you should see the result get closer and closer to the Reciprocal Fibonacci constant. Notice that I memoized the fibConstant delegate as well. If you don't do this, you might suffer a stack overflow due to the recursion as you call fibConstant with higher and higher values for x. So you can see that memoization also trades stack space for heap space. On each line of output, the code outputs the intermediate values for informational purposes, but the interesting value is in the far right column. Notice that I stopped calculation with iteration number 93. That's because the ulong will overflow with the 94th Fibonacci number. I could solve the overflow problem by using BigInteger in the System.Numeric namespace. However, that's not necessary because the 93rd iteration of the Reciprocal Fibonacci constant shown here is close enough to prove the point of this example:

3.359885666243177553039387

I have bolded the digits that are significant.[69] I think you will agree that memoization is extremely useful. For that matter, many more useful things can be done with methods that accept functions and produce other functions, as I'll show in the next section.

Currying

In the previous section on closures I demonstrated how to create a method that accepts a function, given as a delegate, and produces a new function. This concept is a very powerful one and memoization, as shown in the previous section, is a powerful application of it. In this section, I want to show you the technique of currying.[70] In short, what it means is creating an operation (usually a method) that accepts a function of multiple parameters (usually a delegate) and produces a function of only a single parameter.

Note

If you are a C++ programmer familiar with the STL, you have undoubtedly used the currying operation if you've ever utilized any of the parameter binders such as Bind1st and Bind2nd.

Suppose that you have a lambda expression that looks like the following:

(x, y) => x + y

Now, suppose that you have a list of doubles and you want to use this lambda expression to add a constant value to each item on the list, producing a new list. What would be nice is to create a new delegate based on the original lambda expression in which one of the variables is forced to a static value. This notion is called parameter binding, and those who have used STL in C++ are likely very familiar with it. Check out the next example, in which I show parameter binding in action by adding the constant 3.2 to the items in a List<double> instance:

using System;
using System.Linq;
using System.Collections.Generic;

public static class CurryExtensions
{
    public static Func<TArg1, TResult>
        Bind2nd<TArg1, TArg2, TResult>(
            this Func<TArg1, TArg2, TResult> func,
            TArg2 constant ) {
        return (x) => func( x, constant );
    }
}

public class BinderExample
{
    static void Main() {
        var mylist = new List<double> { 1.0, 3.4, 5.4, 6.54 };
        var newlist = new List<double>();

        // Here is the original expression.
        Func<double, double, double> func = (x, y) => x + y;

        // Here is the curried function.
        var funcBound = func.Bind2nd( 3.2 );

        foreach( var item in mylist ) {
            Console.Write( "{0}, ", item );
            newlist.Add( funcBound(item) );
        }

        Console.WriteLine();
        foreach( var item in newlist ) {
            Console.Write( "{0}, ", item );
        }
    }
}

The meat of this example is in the Bind2nd<> extension method, which I have bolded. You can see that it creates a closure and returns a new delegate that accepts only one parameter. Then, when that new delegate is called, it passes its only parameter as the first parameter to the original delegate and passes the provided constant as the second parameter. For the sake of example, I iterate through the mylist list, building a second list held in the newlist variable while using the curried version of the original method to add 3.2 to each item.

Just for good measure, I want to show you another way you can perform the currying, slightly different from that shown in the previous example:

using System;
using System.Linq;
using System.Collections.Generic;

public static class CurryExtensions
{
    public static Func<TArg2, Func<TArg1, TResult>>
        Bind2nd<TArg1, TArg2, TResult>(
            this Func<TArg1, TArg2, TResult> func ) {
        return (y) => (x) => func( x, y );
    }
}

public class BinderExample
{
    static void Main() {
        var mylist = new List<double> { 1.0, 3.4, 5.4, 6.54 };
        var newlist = new List<double>();
// Here is the original expression.
        Func<double, double, double> func = (x, y) => x + y;

        // Here is the curried function.
       var funcBound = func.Bind2nd()(3.2);

        foreach( var item in mylist ) {
            Console.Write( "{0}, ", item );
            newlist.Add( funcBound(item) );
        }

        Console.WriteLine();
        foreach( var item in newlist ) {
            Console.Write( "{0}, ", item );
        }
    }
}

I have bolded the parts that are different from the previous example. In the first example, Bind2nd<> returned a delegate that accepted a single integer and returned an integer. In this example, I changed Bind2nd<> to return a delegate that accepts a single parameter (the value to bind the second parameter of the original function to) and returns another delegate that is the curried function. Both forms are perfectly valid. But the purists might prefer the second form over the former.

Anonymous Recursion

In the earlier section titled "Closures (Variable Capture) and Memoization," I showed a form of recursion using closures while calculating the Fibonacci numbers. For the sake of discussion, let's look at a similar closure that one can use to calculate the factorial of a number:

Func<int, int> fact = null;
fact = (x) => x > 1 ? x * fact(x-1) : 1;

This code works because fact forms a closure on itself and also calls itself. That is, the second line, in which fact is assigned the lambda expression for the factorial calculation, captures the fact delegate itself. Even though this recursion works, it is extremely fragile, and you must be very careful when using it as written because of reasons I will describe now.

Remember that even though a closure captures a variable for use inside the anonymous method, which is implemented here as a lambda expression, the captured variable is still accessible and mutable from outside the context of the capturing anonymous method or lambda expression. For example, consider what happens if you perform the following:

Func<int, int> fact = null;
fact = (x) => x > 1 ? x * fact(x-1) : 1;
Func<int, int> newRefToFact = fact;

Because objects in the CLR are reference types, newRefToFact and fact now reference the same delegate. Now, imagine that you then do something similar to this:

Func<int, int> fact = null;
fact = (x) => x > 1 ? x * fact(x-1) : 1;
Func<int, int> newRefToFact = fact;
fact = (x) => x + 1;

Now the intended recursion is broken! Can you see why? The reason is that we modified the captured variable fact. We reassigned fact to reference a new delegate based on the lambda expression (x) => x+1. But newRefToFact still references the lambda expression (x) => x > 1 ? x * fact(x-1) : 1. However, when the delegate referenced by newRefToFact calls fact, instead of recursing, it ends up executing the new expression (x) => x+1, which is different behavior from the recursion you had before. Ultimately, the problem is caused by the fact that the closure that embodies the recursion allows you to modify the captured variable (the func delegate) externally. If the captured variable is changed, the recursion could break.

There are several ways to fix this problem, but the typical method is to use anonymous recursion.[71] What ends up happening is that you modify the preceding factorial lambda expression to accept another parameter, which is the delegate to call when it's time to recurse. Essentially, this removes the closure and converts the captured variable into a parameter to the delegate. What you end up with is something similar to the following:

delegate TResult AnonRec<TArg,TResult>( AnonRec<TArg,TResult> f, TArg arg );
AnonRec<int, int> fact = (f, x) => x > 1 ? x * f(f, x-1) : 1;

The key here is that instead of recursing by relying on a captured variable that is a delegate, you instead pass the delegate to recurse on as a parameter. That is, you traded the captured variable for a variable that is passed on the stack (in this case, the parameter f in the fact delegate). In this example, the recursion delegate is represented by the parameter f. Therefore, notice that fact not only accepts f as a parameter, but calls it in order to recurse and then passes f along to the next iteration of the delegate. In essence, the captured variable now lives on the stack as it is passed to each recursion of the expression. However, because it is on the stack, the danger of it being modified out from underneath the recursion mechanism is now gone.

For more details on this technique, I strongly suggest that you read Wes Dyer's blog entry titled "Anonymous Recursion in C#" at http://blogs.msdn.com/wesdyer. In his blog entry he demonstrates how to implement a Y fixed-point combinator that generalizes the notion of anonymous recursion shown previously.[72]

Summary

In this chapter, I introduced you to the syntax of lambda expressions, which are, for the most part, replacements for anonymous methods. In fact, it's a shame that lambda expressions did not come along with C# 2.0 because then there would have been no need for anonymous methods. I showed how you can convert lambda expressions, with and without statement bodies, into delegates. Additionally, you saw how lambda expressions without statement bodies are convertible to expression trees based on the Expression<T> type as defined in the System.Linq.Expression namespace. Using expression trees, you can apply transformations to the expression tree before actually compiling it into a delegate and calling it. I finished the chapter by showing you useful applications of lambda expressions. They included creating generalized iterators, memoization by using closures, delegate parameter binding using currying, and an introduction to the concept of anonymous recursion. Just about all these concepts are foundations of functional programming. Even though one could implement all these techniques in C# 2.0 using anonymous methods, the introduction of lambda syntax to the language makes using such techniques more natural and less cumbersome.

The following chapter introduces LINQ. I will also continue to focus on the functional programming aspects that it brings to the table.



[61] F# is an exciting new functional programming language for the .NET Framework. For more information, I invite you to read Robert Pickering's Foundations of F# (Berkeley, CA: Apress, 2007).

[62] One of the languages that I use often is C++. Those of you that are familiar with metaprogramming in C++ are definitely familiar with functional programming techniques. If you do use C++ and you're curious about metaprogramming, I invite you to check out David Abrahams' and Alexey Gurtovoy's most excellent book C++ Template Metaprogramming (Boston: Addison-Wesley Professional, 2004).

[63] I covered some examples of functional programming with anonymous methods in Chapter 14.

[64] The intricacies of unsafe coding in C# are outside the scope of this book. I encourage you to reference the MSDN documentation for further details.

[65] Chapter 9 introduces iterators via the yield statement, and Chapter 14 expanded on custom iterators in the section titled "Borrowing from Functional Programming."

[66] For a more general discussion of closures, visit http://en.wikipedia.org/wiki/Closure_%28computer_science%29.

[67] Weisstein, Eric W. "Reciprocal Fibonacci Constant." From MathWorld--A Wolfram Web Resource. http://mathworld.wolfram.com/ReciprocalFibonacciConstant.html

[68] You can read more about memoization at http://en.wikipedia.org/wiki/Memoization. Also, Wes Dyer has an excellent entry regarding memoization on his blog at http://blogs.msdn.com/wesdyer/archive/2007/01/26/function-memoization.aspx.

[69] You can see many more decimal location of the Fibonacci constant at http://www.research.att.com/~njas/sequences/A079586.

[70] For a lot more information about currying, go to http://en.wikipedia.org/wiki/Currying.

[71] For more theoretical details on anonymous recursion reference the article at http://en.wikipedia.org/wiki/Anonymous_recursion.

[72] Read more about Y fixed-point combinators at http://en.wikipedia.org/wiki/Fixed_point_combinator.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.141.197.251