Chapter 14. Extension Methods

Using extension methods, you can declare methods that appear to augment the public interface, or contract, of a type. At first glance, they may appear to provide a way to extend classes that are not meant to be extended. However, it's very important to note that extension methods cannot break encapsulation. That's because they're not really instance methods at all and thus cannot crack the shell of encapsulation on the type they are extending.

Introduction to Extension Methods

As previously mentioned, extension methods make it appear that you can modify the public interface of any type. Let's take a quick look at a small example showing extension methods in action:

using System;

namespace ExtensionMethodDemo
{

    public static class ExtensionMethods
    {
        public static void SendToLog( this String str ) {
            Console.WriteLine( str );
        }
    }

    public class ExtensionMethodIntro
    {
        static void Main() {
            String str = "Some useful information to log";

            // Call the extension method
            str.SendToLog();

            // Call the same method the old way.
            ExtensionMethods.SendToLog( str );
        }
    }

}

Take a look at the Main method first. Notice that I declared a System.String and then called the method SendToLog on the str instance. But wait! There is no method named SendToLog in the System.String type definition. The call compiles because SendToLog is an extension method declared in the preceding class, named ExtensionMethods.

At first glance, it appears that ExtensionMethods.SendToLog is just like any other plain static method. But notice two things. It is declared inside a static class, namely ExtensionMethods, and the first parameter to the static method SendToLog has its type prefixed with the keyword this. Using the this keyword in this way on a static method declared in a static class is how you tell the compiler that the method is an extension method.

Notice that at the end of the Main method, I demonstrate that you can still call the SendToLog method just like any other normal static method. In fact, extension methods do not give you any more functionality than regular static methods. Extension methods add a certain amount of syntactic sugar to the language to allow you to call them as if they were instance methods of the type instance you are operating on. But just like any other feature of the language, they can be abused. Therefore, I present some best practices and guidelines for using extension methods later in the "Recommendations for Use" section.
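To make the encapsulation point concrete, here is a minimal sketch. The Account type and the Describe extension method are hypothetical names invented for this illustration. Because Describe is just a static method in another class, it can reach only the public members of Account:

```csharp
using System;

public class Account
{
    private decimal balance;    // Not visible to extension methods.

    public Account( String owner, decimal balance ) {
        Owner = owner;
        this.balance = balance;
    }

    public String Owner { get; private set; }

    public bool IsOverdrawn {
        get { return balance < 0; }
    }
}

public static class AccountExtensions
{
    // Hypothetical extension method. Only the public members
    // of Account are reachable from here.
    public static String Describe( this Account account ) {
        // account.balance would not compile: the field is private.
        return account.Owner +
               (account.IsOverdrawn ? " (overdrawn)"
                                    : " (in good standing)");
    }
}

public class EncapsulationDemo
{
    static void Main() {
        var account = new Account( "Fred", -25m );
        Console.WriteLine( account.Describe() );
    }
}
```

If you uncomment a reference to account.balance inside Describe, the compiler rejects it exactly as it would in any other unrelated class.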

How Does the Compiler Find Extension Methods?

When you call an instance method on a type instance, the compiler must deduce which method you are actually calling by considering things such as the instance's type, its base type (if there is one), any interfaces it and its base type may implement, and so on. As shown in Chapter 5, the steps the compiler goes through to determine which method to call can be quite complex. So how does the compiler handle the added complexity of finding an extension method to call?

Extension methods are typically imported into the current compilation unit via namespaces with the using keyword. When you use the using keyword to import the types from a particular namespace into the current scope, you also make all the extension methods implemented in static classes in that namespace available to call with the instance method syntax. If you don't import the namespace containing the static classes that implement the extension methods you need, you can call them only as static methods using their fully qualified names. Remember that you are not required to import a namespace in order to use the types within it. For example, you can use the System.Console type in your application without importing the System namespace as long as you use its fully qualified name; you typically import the namespace for convenience. In contrast, to call extension methods using the instance method call syntax, you must import the namespace.
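The following sketch contrasts the two situations. The LoggingExtensions namespace, the Tagged method, and the two Caller classes are hypothetical names used only for illustration:

```csharp
using System;

namespace LoggingExtensions
{
    public static class StringLogExtensions
    {
        // Hypothetical extension method.
        public static String Tagged( this String str ) {
            return "[log] " + str;
        }
    }
}

namespace WithImport
{
    using LoggingExtensions;

    public static class Caller
    {
        public static String Run() {
            // The namespace is imported, so the instance
            // method call syntax is available.
            return "imported".Tagged();
        }
    }
}

namespace WithoutImport
{
    public static class Caller
    {
        public static String Run() {
            // "not imported".Tagged() would not compile here.
            // Only the fully qualified static call works.
            return LoggingExtensions.StringLogExtensions.Tagged(
                       "not imported" );
        }
    }
}

public class ImportDemo
{
    static void Main() {
        Console.WriteLine( WithImport.Caller.Run() );
        Console.WriteLine( WithoutImport.Caller.Run() );
    }
}
```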

Note

Starting with C# 3.0, importing a namespace as a more convenient way to address the types within it has a side effect if that namespace also contains extension methods. I discuss this more fully in the "Recommendations for Use" section later on.

When you invoke an instance method, the compiler searches the type for the matching instance method. If the search yields no matching methods, the compiler proceeds to look for matching extension methods. It starts by searching all the extension methods declared in the innermost containing namespace of the method invocation including its imported namespaces, and if it does not find a match, it searches the next outer enclosing namespace recursively up to the global namespace as it looks for a match. If it fails to find a match, the compiler will stop with a compilation error. Conversely, if the current namespace has more than one extension method imported that matches, the compiler will issue an error complaining about the ambiguity. In such cases, you must fall back to calling the method as a static method while specifying the fully qualified method name.

Note

Technically, the name lookup process is a little more complicated than described here. If the type of the instance you are trying to call an extension method on contains a property with the same name, the compiler finds the property and then stops with an error claiming that you are attempting to call a property as a method.

Consider the following example that illustrates all these points:

using System;

public static class ExtensionMethods
{
    static public void WriteLine( this String str ) {
        Console.WriteLine( "Global Namespace: " + str );
    }
}

namespace A
{
    public static class ExtensionMethods
    {
        static public void WriteLine( this String str ) {
            Console.WriteLine( "Namespace A: " + str );
        }
    }
}

namespace B
{
    public static class ExtensionMethods
    {
        static public void WriteLine( this String str ) {
            Console.WriteLine( "Namespace B: " + str );
        }
    }
}

namespace C
{
    using A;

    public class Employee
    {
        public Employee( String name ) {
            this.name = name;
        }

        public void PrintName() {
            name.WriteLine();
        }

        private String name;
    }
}

namespace D
{
    using B;

    public class Dog
    {
        public Dog( String name ) {
            this.name = name;
        }

        public void PrintName() {
            name.WriteLine();
        }

        private String name;
    }
}

namespace E
{
    public class Cat
    {
        public Cat( String name ) {
            this.name = name;
        }

        public void PrintName() {
            name.WriteLine();
        }

        private String name;
    }
}

namespace Demo
{
    using A;
    using B;

    public class EntryPoint
    {
        static void Main() {
            C.Employee fred = new C.Employee( "Fred" );
            D.Dog thor = new D.Dog( "Thor" );
            E.Cat sylvester = new E.Cat( "Sylvester" );

            fred.PrintName();
            thor.PrintName();
            sylvester.PrintName();

            // String str = "Etouffe";
            // str.WriteLine();
        }
    }
}

In this example, the same extension method, WriteLine, is declared in three different namespaces (namespace A, namespace B, and the global namespace). Additionally, three types are defined, each in its own namespace: Employee, Dog, and Cat. In the Main method, an instance of each is created, and then the PrintName method is called on each instance. The body of each type's PrintName method then calls the WriteLine extension method on the field of type String, and this is where things get interesting. If you compile and execute the code, you will see the following output on the console:

Namespace A: Fred
Namespace B: Thor
Global Namespace: Sylvester

Notice that in this case the exact implementation of the extension method that gets called is governed by which namespace is imported into the namespace where the type (Employee, Dog, or Cat) is defined. Because namespace C imports namespace A, Employee ends up calling the WriteLine defined in namespace A. Similarly, because namespace D imports namespace B, the WriteLine extension method in namespace B gets called. Because namespace E imports neither namespace A nor namespace B, the search proceeds out to the global namespace, which also includes an implementation of the WriteLine extension method. Finally, notice the commented code at the end of the Main method. If you uncomment these lines and attempt to compile, you will see the compiler complain that it cannot figure out which WriteLine to call because the Demo namespace imports both namespaces A and B. This example highlights the dangers of improperly using this new syntax.
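When you do run into such an ambiguity, the fallback mentioned earlier is to name the static class explicitly. The following sketch shows the idea under simplified assumptions: the Decorate method and the AmbiguityDemo class are hypothetical, and the methods return strings rather than writing to the console so that the result is easy to inspect.

```csharp
using System;

namespace A
{
    public static class ExtensionMethods
    {
        public static String Decorate( this String str ) {
            return "Namespace A: " + str;
        }
    }
}

namespace B
{
    public static class ExtensionMethods
    {
        public static String Decorate( this String str ) {
            return "Namespace B: " + str;
        }
    }
}

namespace Demo
{
    using A;
    using B;

    public class AmbiguityDemo
    {
        public static String FromA( String str ) {
            // str.Decorate() is ambiguous here because both
            // namespaces are imported, so qualify the call.
            return A.ExtensionMethods.Decorate( str );
        }

        public static String FromB( String str ) {
            return B.ExtensionMethods.Decorate( str );
        }

        static void Main() {
            Console.WriteLine( FromA( "Etouffe" ) );
            Console.WriteLine( FromB( "Etouffe" ) );
        }
    }
}
```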

Under the Covers

How does the compiler implement extension methods? Because extension methods are just a syntactic shortcut, no modifications were needed to the runtime to support extension methods. Instead, the compiler implements extension methods completely with metadata. Following is the IL code for the ExtensionMethods.WriteLine method from the global namespace, generated by compiling the previous example:

.method public hidebysig static void WriteLine(string str) cil managed
{
  .custom instance void [System.Core]System.Runtime.CompilerServices.ExtensionAttribute::.ctor() = ( 01 00 00 00 )
  // Code size       19 (0x13)
  .maxstack  8
  IL_0000:  nop
  IL_0001:  ldstr      "Global Namespace: "
  IL_0006:  ldarg.0
  IL_0007:  call       string [mscorlib]System.String::Concat(string,
                                                                        string)
  IL_000c:  call       void [mscorlib]System.Console::WriteLine(string)
  IL_0011:  nop
  IL_0012:  ret
} // end of method ExtensionMethods::WriteLine

The .custom directive at the top of the method body shows that the method has a new attribute applied to it. That attribute is ExtensionAttribute, and it is defined in the System.Runtime.CompilerServices namespace. The compiler also applies the attribute to the containing class, which is ExtensionMethods in this case. At compile time, the compiler references this information when searching for potential extension methods to call. This is a perfect example of the power of metadata and the kinds of things you can achieve with custom attributes in metadata.

If you look at the generated IL at the call site of an extension method, you will see that the extension method is called as a normal static method.
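You can also confirm the presence of the attribute without reading IL. The following sketch uses reflection to check for ExtensionAttribute on both the method and its containing class. The MetadataDemo class and its helper methods are hypothetical names for this illustration:

```csharp
using System;
using System.Reflection;
using System.Runtime.CompilerServices;

public static class ExtensionMethods
{
    public static void WriteLine( this String str ) {
        Console.WriteLine( "Global Namespace: " + str );
    }
}

public class MetadataDemo
{
    // True if the compiler marked the method with ExtensionAttribute.
    public static bool MethodIsMarked() {
        MethodInfo method =
            typeof( ExtensionMethods ).GetMethod( "WriteLine" );
        return method.IsDefined( typeof( ExtensionAttribute ), false );
    }

    // True if the compiler marked the containing class as well.
    public static bool ClassIsMarked() {
        return typeof( ExtensionMethods ).IsDefined(
                   typeof( ExtensionAttribute ), false );
    }

    static void Main() {
        Console.WriteLine( MethodIsMarked() );
        Console.WriteLine( ClassIsMarked() );
    }
}
```

Both checks report True, even though the source code never mentions the attribute; the this modifier on the first parameter is what causes the compiler to emit it.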

Code Readability versus Code Understandability

Let's face it. There are too many companies out there that provide very little documentation of their applications' code. Many companies may have high-level design documents that essentially show the big components of the application and some vague lines joining them together to express the relationships between them. But all too often, companies are in such a hurry to get to market that once they start coding the app, and the design makes a few turns here and there, they never bother to update the original design documents. Give the product a few versions for the code and documentation to diverge further, and then you might as well just send the documents to the recycle bin. At the other end of the spectrum, you very occasionally see organizations that document everything, even up to the point of designing every piece of the application with a UML modeling tool, and so on, before coding.

Most successful projects fall somewhere in between. You typically have just enough documentation so that if the lead developer who conjured up the guts of a rendering engine meets an unfortunate fate, the next developer can come in behind him and couple the information in the documentation with the information in the code and move forward. One key to this puzzle is easily readable code, but it should also be easy-to-understand code.

What's the difference? Code that's easily readable makes the programmer's intentions at the place you are reading easy to absorb. For example, imagine that you are reading someone's code and they have a need to send some string to some log file. You could come across something like the following:

String info = "tasty information";
Logger.LogOutput( info );

For the most part, it's easy to see what is going on. Or is it? What does LogOutput do? Does it write the string to a file? Does it send it to the console? Does it send it to the debug output stream? It's impossible to tell from looking at this code. Instead, we have to look at the Logger.LogOutput method's code to understand what's going on. So one may argue that the key to understandable code is code that is easy to navigate, thus making it easy to follow.

Consider what that code might look like with extension methods in play:

String info = "tasty information";
info.WriteTo( logFile );

Instantly, the code is much easier to read. It almost reads like a written sentence. This is a technique you see used throughout the .NET Framework. The second line starts with a subject, has a verb in the middle, and ends with an indirect object. You could read it as if it were an imperative command given to the info object.

But what if WriteTo is implemented as an extension method? Is the code just as easy to understand as it is to read? It actually might depend on your tools. IntelliSense in Visual Studio definitely can help here because it can help you find which static class implements the WriteTo extension method. But if your favorite editor is Emacs or Vim without appropriate plug-ins, you are stuck having to determine which namespaces have been imported into the current scope, and then you must start searching those namespaces for a class that implements the extension method. Depending on the complexity of the application code and how many namespaces the current compilation unit imports, that could be a nightmare!
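For reference, one way WriteTo might be implemented is sketched here. This is only an assumption about its shape: the TextWriterExtensions class is a hypothetical name, the destination is assumed to be a TextWriter, and a StringWriter stands in for the log file so that the example is self-contained.

```csharp
using System;
using System.IO;

public static class TextWriterExtensions
{
    // Hypothetical implementation of WriteTo.
    public static void WriteTo( this String str, TextWriter writer ) {
        writer.WriteLine( str );
    }
}

public class WriteToDemo
{
    static void Main() {
        String info = "tasty information";

        // A StringWriter stands in for a real log file here.
        using( var logFile = new StringWriter() ) {
            info.WriteTo( logFile );
            Console.Write( logFile.ToString() );
        }
    }
}
```

Notice that nothing at the call site tells you that WriteTo lives in TextWriterExtensions, which is exactly the navigation problem just described.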

So the moral of the story is this. Just as with any other language feature, be sure to use extension methods correctly, sparingly, and only when needed. Just because you can implement something with an extension method doesn't mean that you always should. The entire engineering discipline is built around making engineering decisions whereby we gather information and make the best decision based upon the data, even if none of the options may be perfect.[56]

Recommendations for Use

The following sections detail some best-use practices and guidelines for using extension methods. Of course, as extension method use evolves in the language, this set will grow.

Consider Extension Methods Over Inheritance

When one first encounters extension methods, it's natural to view the feature as a way to extend the public contract of a single type or a group of types. That's an obvious conclusion given the fact that extension methods can be invoked through the instance method call syntax. However, I think it's much more effective to view extension methods as a way to provide operations that you can apply to a given type, or multiple types, in a more syntactically attractive or more general way, because, again, they do not actually extend a type's contract at all.

Previously, I showed trivial examples of using extension methods to be able to call WriteLine on instances of type String. Alternatively, one could have attempted to inherit from String in order to provide the same behavior. In this case, you'll find that such a technique is impossible because String is sealed. In Chapter 13, I explain why I believe it is best to prefer sealing the types you create by default. You should unseal them only after you put in the extra thought and design (and documentation) required to make them suitable as base classes. The designers of System.String clearly had good reason to keep us from using it as a base type.

Additionally, gratuitous use of inheritance for this purpose unnecessarily complicates your design. Chapter 4 describes how you should use inheritance sparingly and how using containment is generally more flexible than inheritance. Inheritance is one of the strongest forms of static binding between two types. Overuse of that glue creates a monolith that's extremely difficult to work with.

For example, you might have the need to apply your WriteLine operation, or some other useful extension method, on a type instance that you do not create. Consider an instance of a type that's returned from some factory method, as in the following code:

public class MyFactory
{
    public Widget CreateWidget() {
        ...
    }
}

Here CreateWidget is the method that is creating the instance of the returned Widget. It's what is called a factory method. Typically, you'll have a hierarchy of type specializations derived from Widget, and CreateWidget might take some parameters telling it exactly what type of Widget to create. Regardless of that, we don't have control over how or when the object is created. Therefore, it's impossible for us to use inheritance to extend the Widget type's contract unless we also control the CreateWidget method. And even if we do, maybe that method is already published in some other assembly that's been signed and certified and cannot be easily changed. Clearly inheritance is not the correct approach to add the WriteLine functionality to types of Widget in this situation.
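An extension method sidesteps the problem because it applies to whatever instance the factory hands back. In the following sketch, the FancyWidget type, the Name property, the body of CreateWidget, and the ToLogLine extension method are all hypothetical details added so that the example can run:

```csharp
using System;

public class Widget
{
    public virtual String Name {
        get { return "Widget"; }
    }
}

public class FancyWidget : Widget
{
    public override String Name {
        get { return "FancyWidget"; }
    }
}

public class MyFactory
{
    // Assumed body: callers do not control which
    // concrete type comes back.
    public Widget CreateWidget() {
        return new FancyWidget();
    }
}

public static class WidgetExtensions
{
    // Applies to Widget and everything derived from it.
    public static String ToLogLine( this Widget widget ) {
        return "Widget created: " + widget.Name;
    }
}

public class FactoryDemo
{
    static void Main() {
        Widget widget = new MyFactory().CreateWidget();
        Console.WriteLine( widget.ToLogLine() );
    }
}
```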

Extension methods also allow you to provide an operation that one can apply to an entire hierarchy of types' instances. For example, consider an extension method whose first parameter is of type System.Object. That extension method can be called on any type. You can't achieve that with inheritance. To do so would require that you have some ability to create a derived type from System.Object, say MyObject, and then somehow get every type in the CLR to derive from MyObject rather than System.Object. Clearly, there's no reason to think too much about the impossible. But just because you can create an extension method that is callable on all objects does not mean you should. Unless the extension method operates only on the public interface of System.Object, you'll most likely have some code in the extension method to determine type at run time so that you can perform some type-specific operation. Such a coding style defeats the strongly typed nature of C# and its compiler.
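Here is a small sketch of an extension method on System.Object that stays within the bounds just described. The Describe method is a hypothetical name, and it relies only on GetType and ToString, which every object provides, so no run-time type checks are needed:

```csharp
using System;

public static class ObjectExtensions
{
    // Uses only members that every object has.
    public static String Describe( this Object obj ) {
        return obj.GetType().Name + ": " + obj.ToString();
    }
}

public class ObjectExtensionDemo
{
    static void Main() {
        Console.WriteLine( 42.Describe() );       // Int32: 42
        Console.WriteLine( "hello".Describe() );  // String: hello
    }
}
```

Keep in mind that calling such a method on a value type, as in the first call, boxes the value.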

Note

You can also use generics to apply extension methods to multiple types. You can declare generic extension methods just as easily as declaring generic instance methods or generic delegates. In the section titled "The Visitor Pattern," I show an example of using a generic extension method.
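As a quick taste of a generic extension method, consider the following sketch. The IsBetween method is a hypothetical example; the constraint on T keeps the method strongly typed while still making it available on every type that implements IComparable<T>:

```csharp
using System;

public static class ComparableExtensions
{
    // Hypothetical generic extension method. True when value
    // lies in the inclusive range from low to high.
    public static bool IsBetween<T>( this T value, T low, T high )
        where T : IComparable<T> {
        return value.CompareTo( low ) >= 0 &&
               value.CompareTo( high ) <= 0;
    }
}

public class GenericExtensionDemo
{
    static void Main() {
        Console.WriteLine( 5.IsBetween( 1, 10 ) );        // True
        Console.WriteLine( 12.5.IsBetween( 0.0, 1.0 ) );  // False
    }
}
```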

Isolate Extension Methods in a Separate Namespace

One of the fundamental disciplines in writing methods is that you should avoid side effects. What that means is that, if you create a method such as LogToFile and within its implementation you decide to modify some global state that is used by other components in the application, you have just introduced a potentially dangerous side effect. Side effects such as these are usually the cause of many hard-to-find bugs because, in this case, the modification of the global state might not be intuitive based on the method name.

In the same regard, try to avoid introducing side effects to your clients when they import namespaces. Specifically, it's best if you declare any extension methods in their own namespace separate from the namespace of the types they extend. Typically, the extension methods are in a nested namespace. Not giving your clients this granularity can cause confusion when the compiler attempts to look up a method for an instance method call.

Imagine, for a moment, the confusion that could come from defining your extension methods in the System namespace. There's no mechanism that keeps you from doing so. Just about everyone imports the System namespace into their code. Thus, if you define your extensions that way, most everyone will import them, and the only way to keep it from happening is for them to live with the inconvenience of not importing the System namespace at all!

If you think this is no big deal, allow me to paint a scenario. Imagine that you have an application that uses a library from Acme Widgets. The developers of Acme Widgets thought that it would be handy to introduce an extension method named WriteToLog so that you can have another debugging tool in your toolbox when using their library. Being good designers, they defined the extension method in a namespace called AcmeWidgetExtensions. Now, two versions later, you come across a library from Ace Objects that you just must have. Before making any changes to your code, you reference their assembly and include their namespace in your project. All of a sudden, your code won't build: the compiler is complaining with error CS0121 that calls to WriteToLog are ambiguous! Further investigation reveals that Ace Objects also thought it would be handy to provide an extension method called WriteToLog. Unfortunately, they defined it in the System namespace, which all your code files import. Ouch!

Thus, the moral of the story is to always define your extension methods in a separate namespace in order to allow your clients the granularity they need when importing them into their scope. Moreover, if you are offering a large set of extension methods, consider whether it would be appropriate to further partition them into multiple namespaces to offer greater granularity to your clients.
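The layout might look something like the following sketch. All of the names here (Acme.Widgets, Acme.Widgets.Extensions, ToLogEntry, and the Client namespace) are hypothetical. The point is that clients can import the types without the extension methods and must opt in to the latter explicitly:

```csharp
using System;

namespace Acme.Widgets
{
    public class Widget
    {
        public Widget( String id ) {
            Id = id;
        }

        public String Id { get; private set; }
    }
}

namespace Acme.Widgets.Extensions
{
    // Extension methods live in their own nested namespace.
    public static class WidgetExtensions
    {
        public static String ToLogEntry( this Widget widget ) {
            return "Widget " + widget.Id;
        }
    }
}

namespace Client
{
    using Acme.Widgets;             // The types only.
    using Acme.Widgets.Extensions;  // Opt in to the extension methods.

    public class IsolationDemo
    {
        public static String Describe( Widget widget ) {
            return widget.ToLogEntry();
        }

        static void Main() {
            Console.WriteLine( Describe( new Widget( "W-1" ) ) );
        }
    }
}
```

If you remove the second using directive in the Client namespace, the Widget type is still usable, but the call to ToLogEntry no longer compiles.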

Changing a Type's Contract Can Break Extension Methods

When the compiler looks for a name that matches an instance method call, extension methods are the last place it looks. This makes sense because if you have a class that already implements an instance method named WriteToLog, you don't want an extension method to replace that functionality. However, consider the following scenario.

You have an application that uses a library from Acme Widgets. To further help debug your system and to produce a rich logging mechanism, you created an extension method named WriteToLog that you can use to send information about a particular widget to a log file. Time passes, and now you have decided to upgrade to version 2 of the Acme Widgets library. But in the meantime, the creators of the Acme Widgets library decided to extend the public contract of some of their types and add a WriteToLog method because, before you implemented your own WriteToLog extension method, you sent them a feature request expressing how valuable such a thing would be. Without knowing that they added this method to their types, you recompile your code. There are no errors because the new instance method's signature just happens to match your extension method's signature exactly. But then the next time you run your application, you see some different behavior, and all of a sudden, the formatting in your log file is completely different! This happens because now the compiler prefers the instance method over the extension method. It turns out that similar bad things can happen if the type definition includes a new property with the same name as your extension method. But in that case, you get a compiler error as it complains that you are attempting to call a property as if it were a method.

The only real solution to this problem, if you ever come across it, is to switch to calling the extension method through the classic static method call syntax rather than the extension method instance call syntax. But if you're unlucky enough to have the switch happen silently, as in the previous scenario, it might be a little while before you realize that you need to start calling the extension method differently.
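The following sketch reproduces the silent switch in miniature. The Widget type and both WriteToLog methods are hypothetical stand-ins for the scenario just described, and they return strings so that you can see which one was chosen:

```csharp
using System;

public class Widget
{
    // Imagine that this method arrived in version 2 of the library.
    public String WriteToLog() {
        return "instance method";
    }
}

public static class WidgetExtensions
{
    // Your extension method, written against version 1.
    public static String WriteToLog( this Widget widget ) {
        return "extension method";
    }
}

public class ContractChangeDemo
{
    static void Main() {
        var widget = new Widget();

        // The instance method wins, with no complaint
        // from the compiler.
        Console.WriteLine( widget.WriteToLog() );

        // The static call syntax still reaches the extension method.
        Console.WriteLine( WidgetExtensions.WriteToLog( widget ) );
    }
}
```

The first call prints "instance method" and the second prints "extension method".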

Transforms

Even though extension methods are merely syntactic shortcuts for calling static methods using the standard method call syntax, sometimes even such seemingly insignificant features can trigger a different thought process, thus opening up a plethora of new ideas. For example, imagine that you have a collection of data. Let's say that collection implements IEnumerable<T>. Now, let's say that we want to apply an operation to each item in the collection and produce a new collection. For the sake of example, let's assume that we have a collection of integers and that we want to transform them into a collection of doubles that are one-third of the original value. You could approach the problem as shown in this example:

using System;
using System.Collections.Generic;

public class TransformExample
{
    static void Main() {
        var intList = new List<int>() { 1, 2, 3, 4, 5 };

        var doubleList = new List<double>();

        // Compute the new list.
        foreach( var item in intList ) {
            doubleList.Add( (double) item / 3 );
            Console.WriteLine( item );
        }
        Console.WriteLine();

        // Display the new list.
        foreach( var item in doubleList ) {
            Console.WriteLine( item );
        }
        Console.WriteLine();
    }
}

The technique here is a typical imperative programming style and a valid solution to the problem. Unfortunately, it's not very scalable or reusable. For example, imagine if you wanted to apply some other operation to the result of the first one, or maybe three operations chained together. Or maybe you want to make as much of this code reusable as possible.

There are really at least two fundamental operations taking place in this example. The first is that of iterating over the input collection and producing a new collection. Another operation, which is fundamentally orthogonal to the first, is that of dividing each item by 3. Wouldn't it be nice to decouple these two? Then, if coded correctly, the transformation code can be reused with a variety of operations. So, first, let's break out the operation from the transformation and see what the code may look like:

using System;
using System.Collections.Generic;

public class TransformExample
{
    delegate double Operation( int item );

    static List<double> Transform( List<int> input, Operation op ) {
        List<double> result = new List<double>();
        foreach( var item in input ) {
            result.Add( op(item) );
        }
        return result;
    }

    static double DivideByThree( int n ) {
        return (double)n / 3;
    }

    static void Main() {
        var intList = new List<int>() { 1, 2, 3, 4, 5 };

        // Compute the new list.
        var doubleList = Transform( intList, DivideByThree );

        foreach( var item in intList ) {
            Console.WriteLine( item );
        }
        Console.WriteLine();

        // Display the new list.
        foreach( var item in doubleList ) {
            Console.WriteLine( item );
        }
        Console.WriteLine();
    }
}

The new code is better. Now, the operation has been factored out and is passed via a delegate to the Transform static method.

As you can imagine, we can convert the Transform method to an extension method. But that's not all! We can also use generics to make the code even more reusable. But wait, there's even more! We can use iterators to make the Transform method calculate its items in a lazy fashion. Check out the next example for a more reusable version of Transform:

using System;
using System.Linq;
using System.Collections.Generic;

public static class MyExtensions
{
    public static IEnumerable<R> Transform<T, R>(
                     this IEnumerable<T> input,
                     Func<T, R> op ) {
        foreach( var item in input ) {
            yield return op( item );
        }
    }
}

public class TransformExample
{
    static double DivideByThree( int n ) {
        return (double)n / 3;
    }

    static void Main() {
        var intList = new List<int>() { 1, 2, 3, 4, 5 };

        // Compute the new list.
        var doubleList =
            intList.Transform( DivideByThree );

        foreach( var item in intList ) {
            Console.WriteLine( item );
        }
        Console.WriteLine();

        // Display the new list.
        foreach( var item in doubleList ) {
            Console.WriteLine( item );
        }
    }
}

Now we're getting there! First, notice that Transform<T, R> is now a generic extension method. Moreover, it takes an IEnumerable<T> and returns an IEnumerable<R>, so it can be used on any generic collection, and it accepts a delegate describing how to transform each item. The Func<> type is defined in the System namespace and makes it easier to declare delegates. Because an iterator block is used to return items from Transform<T, R> via the yield keyword, each item is processed only when the enumerator of the returned IEnumerable<R> is advanced. In this example, the computational savings are trivial, but this sort of lazy evaluation is one of the cornerstones of LINQ.

However, you can easily imagine a situation where the passed-in operation can take quite a bit of time to process each item in the input collection. The input collection could contain long strings and the operation could be an encryption operation, for example.
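To see the lazy behavior for yourself, you can count how often the operation actually runs. In the following sketch, the LazyDemo class, its callCount field, and the ExpensiveUpperCase method are hypothetical; the counter simply stands in for expensive work such as encryption:

```csharp
using System;
using System.Collections.Generic;

public static class MyExtensions
{
    public static IEnumerable<R> Transform<T, R>(
                     this IEnumerable<T> input,
                     Func<T, R> op ) {
        foreach( var item in input ) {
            yield return op( item );
        }
    }
}

public class LazyDemo
{
    static int callCount;

    // Stands in for an expensive per-item operation.
    static String ExpensiveUpperCase( String s ) {
        ++callCount;
        return s.ToUpperInvariant();
    }

    // Returns how many times the operation ran after pulling
    // the given number of items from the transformed sequence.
    public static int CountCalls( int itemsToPull ) {
        callCount = 0;
        var words =
            new List<String>() { "alpha", "beta", "gamma", "delta" };
        var transformed = words.Transform( ExpensiveUpperCase );

        using( var iter = transformed.GetEnumerator() ) {
            for( int i = 0; i < itemsToPull; ++i ) {
                iter.MoveNext();
            }
        }
        return callCount;
    }

    static void Main() {
        Console.WriteLine( CountCalls( 0 ) );   // 0
        Console.WriteLine( CountCalls( 2 ) );   // 2
    }
}
```

Calling Transform by itself does no work at all. The operation runs once per item, and only for the items that are actually requested.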

Another reason lazy evaluation is so handy is that the input collection could even be an infinite series. How? Check out the next example, which also shows a teaser for lambda expressions, covered in Chapter 15:

using System;
using System.Linq;
using System.Collections.Generic;

public static class MyExtensions
{
    public static IEnumerable<R> Transform<T, R>(
                     this IEnumerable<T> input,
                     Func<T, R> op ) {
        foreach( var item in input ) {
            yield return op( item );
        }
    }
}

public class TransformExample
{
    static IEnumerable<int> CreateInfiniteSeries() {
        int n = 0;
        while( true ) {
            yield return n++;
        }
    }

    static void Main() {
        var infiniteSeries1 = CreateInfiniteSeries();

        var infiniteSeries2 =
            infiniteSeries1.Transform( x => (double)x / 3 );

        IEnumerator<double> iter =
            infiniteSeries2.GetEnumerator();

        for( int i = 0; i < 25; ++i ) {
            iter.MoveNext();
            Console.WriteLine( iter.Current );
        }
    }
}

How cool is that? It's so easy to create an infinite series with an iterator block. Of course, in my loop I could not use a plain foreach without an exit condition; otherwise, the program would never finish and you would have to terminate it forcefully. The funny syntax within the Transform<T, R> method call is a lambda expression. A lambda expression used this way defines a function (passed as a delegate in this case). You can envision lambda expressions as a terse syntax for defining anonymous methods. If you just can't wait to see what lambda expressions are all about, jump to Chapter 15.
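If you would rather use foreach, one option is to bound the series with another lazy extension method. The following sketch adds a hypothetical TakeFirst method for that purpose; the Take method in System.Linq serves the same role, but I spell the idea out here so that you can see how little is involved. The BoundedSeriesExample class and its FirstThirds helper are also hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class MyExtensions
{
    public static IEnumerable<R> Transform<T, R>(
                     this IEnumerable<T> input,
                     Func<T, R> op ) {
        foreach( var item in input ) {
            yield return op( item );
        }
    }

    // Hypothetical helper: yields at most count items and then
    // stops without asking the source for another one.
    public static IEnumerable<T> TakeFirst<T>(
                     this IEnumerable<T> input,
                     int count ) {
        if( count <= 0 ) {
            yield break;
        }

        int taken = 0;
        foreach( var item in input ) {
            yield return item;
            if( ++taken >= count ) {
                yield break;
            }
        }
    }
}

public class BoundedSeriesExample
{
    static IEnumerable<int> CreateInfiniteSeries() {
        int n = 0;
        while( true ) {
            yield return n++;
        }
    }

    // Collects the first count items of the transformed series.
    public static List<double> FirstThirds( int count ) {
        return new List<double>(
            CreateInfiniteSeries().
                Transform( x => (double)x / 3 ).
                TakeFirst( count ) );
    }

    static void Main() {
        var bounded = CreateInfiniteSeries().
                          Transform( x => (double)x / 3 ).
                          TakeFirst( 25 );

        // Safe: the sequence ends after 25 items.
        foreach( var item in bounded ) {
            Console.WriteLine( item );
        }
    }
}
```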

Used this way, extension methods allow us to implement more of a functional programming style.[57] After all, the Transform<T, R> method just shown fits into that category. In fact, you will find that most of the new additions introduced in C# 3.0 facilitate the functional programming paradigm. Those features include extension methods, lambda expressions, and LINQ. Each of these features places the emphasis on the computational operation rather than the structure of the computation. The benefits of functional programming are numerous, and one could fill the pages of an entire book describing them. For example, functional programming facilitates parallelism because variables are typically never changed after initial assignment; thus, fewer locks and sync blocks are necessary.

Note

C++ developers familiar with template metaprogramming, as described in the excellent book C++ Template Metaprogramming: Concepts, Tools, and Techniques from Boost and Beyond, by David Abrahams and Aleksey Gurtovoy (Boston, MA: Addison-Wesley Professional, 2004), will be right at home with this style of functional programming. In fact, template metaprogramming provides a more purely functional programming environment because once a variable (or symbol, in functional programming lingo) is assigned, it can never be changed. C#, on the other hand, offers a hybrid environment in which you are free to implement functional programming if you choose. Also, those familiar with the Standard Template Library (STL) will get a familiar feeling from this style of programming. STL swept through the C++ programming community back in the early 1990s and encouraged a more functional programming thought process.

Operation Chaining

Using extension methods, operation chaining becomes a more natural process. Again, it's nothing that you could not have done in the C# 2.0 days using plain static methods and anonymous methods. However, with the streamlined syntax, chaining actually removes the clutter and can trigger some innovative thinking. Let's start with the example from the previous section, in which we took a list of integers and transformed them into a list of doubles. This time, we'll look at how we can actually chain operations in a fluid way. Let's suppose that after dividing the integers by 3, we want to then compute the square of the result. The following code shows how to do that:

using System;
using System.Linq;
using System.Collections.Generic;

public static class MyExtensions
{
    public static IEnumerable<R> Transform<T, R>(
                     this IEnumerable<T> input,
                     Func<T, R> op ) {
        foreach( var item in input ) {
            yield return op( item );
        }
    }
}

public class TransformExample
{
    static IEnumerable<int> CreateInfiniteList() {
        int n = 0;
        while( true ) yield return n++;
    }

    static double DivideByThree( int n ) {
        return (double)n / 3;
    }

    static double Square( double r ) {
        return r * r;
    }

    static void Main() {
        var divideByThree =
            new Func<int, double>( DivideByThree );
        var squareNumber =
            new Func<double, double>( Square );

        var result = CreateInfiniteList().
                        Transform( divideByThree ).
                        Transform( squareNumber );

        var iter = result.GetEnumerator();
        for( int i = 0; i < 25; ++i ) {
            iter.MoveNext();
            Console.WriteLine( iter.Current );
        }
    }
}

Isn't that cool? In one statement of code, I took an infinite list of integers and applied a divisor followed by a squaring operation, and the end result is a lazily evaluated IEnumerable<double> that computes each element as needed. Functional programming is actually pretty useful when you look at it this way. Of course, you could chain as many operations as necessary. For example, you might want to append a rounding operation at the end. Or maybe you want to append a filtering operation so that only the results that match certain criteria are considered. To do that, you could create a generic Filter<T> extension method, similar to Transform<T, R>, that takes a predicate delegate as a parameter and uses it to filter the items in the collection.
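The text does not show the filtering method, so here is a minimal sketch of what it might look like. The name Filter<T>, its Func<T, bool> predicate parameter, and the GreaterThanOne helper are my own choices for illustration. The sketch repeats Transform<T, R> from the listing above so that it compiles on its own, and then keeps only the squared results that are greater than 1:

```csharp
using System;
using System.Collections.Generic;

public static class MyExtensions
{
    // Same Transform<T, R> as in the preceding listing.
    public static IEnumerable<R> Transform<T, R>(
                     this IEnumerable<T> input,
                     Func<T, R> op ) {
        foreach( var item in input ) {
            yield return op( item );
        }
    }

    // Hypothetical Filter<T>: lazily yields only the items
    // for which the predicate returns true.
    public static IEnumerable<T> Filter<T>(
                     this IEnumerable<T> input,
                     Func<T, bool> predicate ) {
        foreach( var item in input ) {
            if( predicate(item) ) {
                yield return item;
            }
        }
    }
}

public class FilterExample
{
    public static IEnumerable<int> CreateInfiniteList() {
        int n = 0;
        while( true ) yield return n++;
    }

    static double DivideByThree( int n ) { return (double) n / 3; }
    static double Square( double r ) { return r * r; }
    static bool GreaterThanOne( double r ) { return r > 1.0; }

    static void Main() {
        var result = CreateInfiniteList().
                        Transform( new Func<int, double>(DivideByThree) ).
                        Transform( new Func<double, double>(Square) ).
                        Filter( new Func<double, bool>(GreaterThanOne) );

        // Pull only five items; the underlying sequence never ends.
        var iter = result.GetEnumerator();
        for( int i = 0; i < 5; ++i ) {
            iter.MoveNext();
            Console.WriteLine( iter.Current );
        }
    }
}
```

The first value printed comes from n = 4, because the results for n = 0 through n = 3 are not greater than 1. Be careful when filtering an infinite sequence: if the predicate stops matching, MoveNext never returns.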

At this point, I'm sure that you're thinking of all the really useful extension methods you could create to manipulate data. You might be wondering if a host of these extension methods already exists. Check out the System.Linq.Enumerable class. This class provides a whole host of extension methods that are typically used with LINQ, which I cover in Chapter 16. All these extension methods operate on types of IEnumerable<T>. Also, the System.Linq.Queryable class provides the same extension methods for types that implement IQueryable<T>, which derives from IEnumerable<T>.
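As a rough illustration of how the chaining example maps onto System.Linq.Enumerable, the following sketch uses Select where the earlier code used Transform, and Take to bound the infinite sequence so that foreach becomes safe. The class name LinqChainExample is only a placeholder; Select and Take (and Where, for filtering) are the standard extension methods:

```csharp
using System;
using System.Linq;
using System.Collections.Generic;

public class LinqChainExample
{
    public static IEnumerable<int> CreateInfiniteList() {
        int n = 0;
        while( true ) yield return n++;
    }

    static double DivideByThree( int n ) { return (double) n / 3; }
    static double Square( double r ) { return r * r; }

    static void Main() {
        // Select projects each item, just as Transform did.
        // Take(25) stops the enumeration after 25 items.
        var result = CreateInfiniteList().
                        Select( new Func<int, double>(DivideByThree) ).
                        Select( new Func<double, double>(Square) ).
                        Take( 25 );

        foreach( var item in result ) {
            Console.WriteLine( item );
        }
    }
}
```

Because Take ends the sequence, there is no need to drive the enumerator by hand with a counted for loop.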

Custom Iterators

Chapter 9 covered iterators, which were added to the language in C# 2.0. I described some ways you could create custom iterators. Extension methods offer even more flexibility to create custom iterators for collections in a very expressive way. By default, every collection that implements IEnumerable or IEnumerable<T> has a forward iterator, so a custom iterator would be necessary to walk through a collection in a different way than its default iterator. Also, you will need to create a custom iterator for types that don't support IEnumerable<T>, as I'll show in the next section, "Borrowing from Functional Programming." Let's look at how you can use extension methods to implement custom iterators on types implementing IEnumerable<T>.

For example, imagine a two-dimensional matrix implemented as a List<List<int>> type. When performing some operations on such matrices, it's common to require an iterator that walks through the matrix in row-major fashion. What that means is that the iterator walks all the items of the first row, then the second row, and so on until it reaches the end of the last row.

You could iterate through the matrix in row-major form as shown here:

using System;
using System.Collections.Generic;

public class IteratorExample
{
    static void Main() {
        var matrix = new List<List<int>> {
            new List<int> { 1, 2, 3 },
            new List<int> { 4, 5, 6 },
            new List<int> { 7, 8, 9 }
        };

        // One way of iterating the matrix.
        foreach( var list in matrix ) {
            foreach( var item in list ) {
                Console.Write( "{0}, ", item );
            }
        }

        Console.WriteLine();
    }
}

Yes, this code gets the job done, but it is not very reusable. Let's see one way this can be redone using an extension method:

using System;
using System.Collections.Generic;

public static class CustomIterators
{
    public static IEnumerable<T> GetRowMajorIterator<T>(
                                    this List<List<T>> matrix ) {
        foreach( var row in matrix ) {
            foreach( var item in row ) {
                yield return item;
            }
        }
    }
}

public class IteratorExample
{
    static void Main() {
        var matrix = new List<List<int>> {
            new List<int> { 1, 2, 3 },
            new List<int> { 4, 5, 6 },
            new List<int> { 7, 8, 9 }
        };

        // A more elegant way to enumerate the items.
        foreach( var item in matrix.GetRowMajorIterator() ) {
            Console.Write( "{0}, ", item );
        }

        Console.WriteLine();
    }
}

In this version, I have externalized the iteration into the GetRowMajorIterator<T> extension method. At the same time, I made the extension method generic so it will accept two-dimensional nested lists that contain any type, thus making it a bit more reusable.
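The same approach works for other traversal orders. As a sketch only, here is a hypothetical GetColumnMajorIterator<T> companion that walks the first column, then the second, and so on. It assumes the matrix is rectangular, meaning every row has as many items as the first row:

```csharp
using System;
using System.Collections.Generic;

public static class CustomIterators
{
    // Hypothetical companion to GetRowMajorIterator<T>.
    // Assumes every row is as long as the first row.
    public static IEnumerable<T> GetColumnMajorIterator<T>(
                                    this List<List<T>> matrix ) {
        if( matrix.Count == 0 ) {
            yield break;
        }

        int columns = matrix[0].Count;
        for( int col = 0; col < columns; ++col ) {
            foreach( var row in matrix ) {
                yield return row[col];
            }
        }
    }
}

public class IteratorExample
{
    static void Main() {
        var matrix = new List<List<int>> {
            new List<int> { 1, 2, 3 },
            new List<int> { 4, 5, 6 },
            new List<int> { 7, 8, 9 }
        };

        // Prints: 1, 4, 7, 2, 5, 8, 3, 6, 9,
        foreach( var item in matrix.GetColumnMajorIterator() ) {
            Console.Write( "{0}, ", item );
        }

        Console.WriteLine();
    }
}
```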

Borrowing from Functional Programming

You might have already noticed that many of the new features added in C# 3.0 facilitate a functional programming model. You've always been able to implement functional programming models in C#, but the new language features make it easier syntactically by making the language more expressive. Sometimes, the functional model facilitates easier solutions to various problems. Various languages are categorized as functional languages, and Lisp is one of them.

If you've ever programmed using Lisp, you know that the list is one of the core constructs in that language. In C#, we can model such a list using the following interface definition at the core:

public interface IList<T>
{
    T Head { get; }
    IList<T> Tail { get; }
}

Warning

Although I have named this type IList<T> for this example, be sure not to confuse it with IList<T> in the System.Collections.Generic namespace. If one were to implement this type as written, it would be best to define it within one's own namespace to avoid name conflict. After all, that is one of the benefits of using namespaces.

The structure of this list is a bit different from the average linked list implementation. Notice that instead of one node containing a value and a pointer to the next node, it instead contains the value at the node and then a reference to the rest of the list. In fact, it's rather recursive in nature. That's no surprise because recursive techniques are part of the functional programming model. For example, if you were to represent a list on paper by writing values within parentheses, a traditional list might look like the following:

(1 2 3 4 5 6)

Whereas a list defined using the IList<T> interface above could look like this:

(1 (2 (3 (4 (5 (6 (null null)))))))

Each set of parentheses contains two items: the value of the node and then the remainder of the list within a nested set of parentheses. So, to represent a list with just one item in it, such as just the number 1, we could represent it this way:

(1 (null null))

And of course, the empty list could be represented this way:

(null null)

In the following example code, I create a custom list called MyList<T> that implements IList<T>. The way it is built here is not very efficient, and I'll have more to say about that shortly.

using System;
using System.Collections.Generic;

public interface IList<T>
{
    T Head { get; }
    IList<T> Tail { get; }
}

public class MyList<T> : IList<T>
{
    public static IList<T> CreateList( IEnumerable<T> items ) {
        IEnumerator<T> iter = items.GetEnumerator();
        return CreateList( iter );
    }

    public static IList<T> CreateList( IEnumerator<T> iter ) {
        if( !iter.MoveNext() ) {
            return new MyList<T>( default(T), null );
        }

        return new MyList<T>( iter.Current, CreateList(iter) );
    }

    private MyList( T head, IList<T> tail ) {
        this.head = head;
        this.tail = tail;
    }

    public T Head {
        get {
            return head;
        }
    }

    public IList<T> Tail {
        get {
            return tail;
        }
    }

    private T        head;
    private IList<T> tail;
}

public static class CustomIterators
{
    public static IEnumerable<T>
        LinkListIterator<T>( this IList<T> theList ) {
        for( var list = theList;
             list.Tail != null;
             list = list.Tail ) {
            yield return list.Head;
        }
    }
}

public class IteratorExample
{
    static void Main() {
        var listInts = new List<int> { 1, 2, 3, 4 };
        var linkList =
            MyList<int>.CreateList( listInts );

        foreach( var item in linkList.LinkListIterator() ) {
            Console.Write( "{0}, ", item );
        }

        Console.WriteLine();
    }
}

First, notice in Main that I am initializing an instance of MyList<int> using a List<int>. The CreateList static method recursively populates MyList<int> using these values. Once CreateList is finished, we have an instance of MyList<int> that can be visualized as follows:

(1 (2 (3 (4 (null null)))))

You're probably wondering why the list is not represented using the following:

(1 (2 (3 (4 null))))

You could do that; however, you will find that it is not as easy to use either when composing the list or consuming it.

Speaking of consuming the list, you can imagine that there are times when you need to iterate over one of these lists. In that case, you need a custom iterator, which the example provides in the form of the LinkListIterator<T> extension method. The code in Main uses this iterator to send all the list items to the console. The output is as follows:

1, 2, 3, 4,

In the example, notice that the LinkListIterator<T> method creates a forward iterator by making some assumptions about how to determine whether it has reached the end of the list and how to increment the cursor during iteration. That is, it starts at the head and assumes it has finished iterating once the current node's tail is null. What if we externalized this information? For example, what if we wanted to allow the user to parameterize what it means to iterate, such as iterating forward, backward, circularly, and so on? How could we do that? If the idea of delegates pops into your mind, you're right on track. Check out the following revised version of the iterator extension method and the Main method:

public static class CustomIterators
{
    public static IEnumerable<T>
        GeneralIterator<T>( this IList<T> theList,
                          Func<IList<T>, bool> finalState,
                          Func<IList<T>, IList<T>> incrementer ) {
        while( !finalState(theList) ) {
            yield return theList.Head;
            theList = incrementer( theList );
        }
    }
}

public class IteratorExample
{
    static void Main() {
        var listInts = new List<int> { 1, 2, 3, 4 };
        var linkList =
            MyList<int>.CreateList( listInts );

        var iterator = linkList.GeneralIterator( delegate( IList<int> list ) {
                                                    return list.Tail == null;
                                                 },
                                                 delegate( IList<int> list ) {
                                                    return list.Tail;
                                                 } );
        foreach( var item in iterator ) {
            Console.Write( "{0}, ", item );
        }

        Console.WriteLine();
    }
}

Notice that the GeneralIterator<T> method accepts two more delegates, one of which is then called upon to check whether the cursor is at the end of the list, and the other to increment the cursor. In the Main method, I am passing two delegates in the form of anonymous methods. Now the GeneralIterator<T> method can be used to iterate over every other item in the list simply by modifying the delegate passed in through the incrementer parameter.

Note

Some of you might already be familiar with lambda expressions, which were introduced in C# 3.0. Indeed, when using lambda expressions, you can clean up this code considerably by using the lambda expression syntax to replace the previous anonymous delegates. I cover lambda expressions in Chapter 15.
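To make both points concrete, here is a sketch that passes lambda expressions to GeneralIterator<T> and changes only the incrementer to visit every other item. The skipping incrementer, including its fallback to list.Tail so that the cursor always lands on the terminating node rather than on null, is my own assumption about how one might write it. The IList<T>, MyList<T>, and GeneralIterator<T> definitions are repeated from the listings above so that the sketch compiles on its own:

```csharp
using System;
using System.Collections.Generic;

public interface IList<T>
{
    T Head { get; }
    IList<T> Tail { get; }
}

public class MyList<T> : IList<T>
{
    public static IList<T> CreateList( IEnumerable<T> items ) {
        return CreateList( items.GetEnumerator() );
    }

    public static IList<T> CreateList( IEnumerator<T> iter ) {
        if( !iter.MoveNext() ) {
            return new MyList<T>( default(T), null );
        }
        return new MyList<T>( iter.Current, CreateList(iter) );
    }

    private MyList( T head, IList<T> tail ) {
        this.head = head;
        this.tail = tail;
    }

    public T Head { get { return head; } }
    public IList<T> Tail { get { return tail; } }

    private T        head;
    private IList<T> tail;
}

public static class CustomIterators
{
    public static IEnumerable<T>
        GeneralIterator<T>( this IList<T> theList,
                          Func<IList<T>, bool> finalState,
                          Func<IList<T>, IList<T>> incrementer ) {
        while( !finalState(theList) ) {
            yield return theList.Head;
            theList = incrementer( theList );
        }
    }
}

public class IteratorExample
{
    static void Main() {
        var linkList = MyList<int>.CreateList(
                           new List<int> { 1, 2, 3, 4, 5 } );

        // Same behavior as the two anonymous methods, in lambda form.
        var all = linkList.GeneralIterator( list => list.Tail == null,
                                            list => list.Tail );

        // Skip a node on each step. If the next node is the
        // terminating node, stop there instead of stepping to null.
        var everyOther = linkList.GeneralIterator(
                             list => list.Tail == null,
                             list => list.Tail.Tail ?? list.Tail );

        foreach( var item in all ) {
            Console.Write( "{0}, ", item );        // 1, 2, 3, 4, 5,
        }
        Console.WriteLine();

        foreach( var item in everyOther ) {
            Console.Write( "{0}, ", item );        // 1, 3, 5,
        }
        Console.WriteLine();
    }
}
```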

As a final extension method example for operations on the IList<T> type, consider how we could create an extension method to reverse the list and return a new IList<T>. There are several ways one could consider doing this, and some are much more efficient than others. However, I want to show you an example that uses a form of recursion. Consider the following Reverse<T> custom method implementation:

public static class CustomIterators
{
    public static IList<T> Reverse<T>( this IList<T> theList ) {
        var reverseList = new List<T>();
        Func<IList<T>, List<T>> reverseFunc = null;

        reverseFunc = delegate(IList<T> list) {
            if( list != null ) {
                reverseFunc( list.Tail );
                if( list.Tail != null ) {
                    reverseList.Add( list.Head );
                }
            }
            return reverseList;
        };

        return MyList<T>.CreateList( reverseFunc(theList) );
    }
}

If you've never encountered this style of coding, it can surely make your brain twist inside your head. The key to the work lies in the fact that there is a delegate defined that calls itself and captures variables along the way.[58] In the preceding code, the anonymous method is assigned to the reverseFunc variable. And as you can see, the anonymous method body calls reverseFunc, or more accurately, itself! In a way, the anonymous method captures itself! The trigger to get all the work done is in the last line of the Reverse<T> method. It initiates the chain of recursive calls to the anonymous method and then passes the resulting List<T> to the CreateList method, thus creating the reversed list.
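If the self-referencing delegate is hard to see inside Reverse<T>, a stripped-down sketch may help. The CountDown method and its variable names are invented for illustration. It shows the same two ingredients: a delegate variable that must be assigned null before the anonymous method body can refer to it, and a captured local that every recursive call shares:

```csharp
using System;

public class RecursiveDelegateExample
{
    public static int CountDown( int start ) {
        int calls = 0;    // captured by the anonymous method

        // Assign null first. The compiler rejects a reference to an
        // unassigned local inside the anonymous method body.
        Func<int, int> countDown = null;

        countDown = delegate( int n ) {
            ++calls;      // every recursion updates the same variable
            if( n > 0 ) {
                return countDown( n - 1 );    // the delegate calls itself
            }
            return calls;
        };

        return countDown( start );
    }

    static void Main() {
        // Invoked for n = 3, 2, 1, 0, so this prints 4.
        Console.WriteLine( CountDown(3) );
    }
}
```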

Those who pay close attention to efficiency are likely pointing out the inefficiency of creating a temporary List<T> instance that is then passed to CreateList inside Reverse<T>. After all, if the original list is huge, creating a temporary list just to throw it away moments later will put pressure on the garbage-collected heap, among other things. If the constructor of MyList<T> is made public, you can eliminate the temporary List<T> entirely and build the new MyList<T> using a captured variable, as shown here:

public static class CustomIterators
{
    public static IList<T> Reverse<T>( this IList<T> theList ) {
        var reverseList = new MyList<T>(default(T), null);
        Func<IList<T>, MyList<T>> reverseFunc = null;

        reverseFunc = delegate(IList<T> list) {
            if( list.Tail != null ) {
                reverseList = new MyList<T>( list.Head, reverseList );
                reverseFunc( list.Tail );
            }

            return reverseList;
        };

        return reverseFunc(theList);
    }
}

The previous Reverse<T> method first creates an anonymous function and stores it in the local variable reverseFunc. It then returns the results of calling the anonymous method to the caller of Reverse<T>. All the work of building the reversed list is encapsulated into the closure created by the anonymous method and the captured local variables reverseList and reverseFunc. reverseFunc simply calls itself recursively until it is finished building the reversed list into the reverseList captured variable.

Those of you who are more familiar with functional programming are probably saying that the preceding Reverse<T> extension method can be cleaned up by eliminating the captured variable and using the stack instead. In this case, it's more of a stylistic change, but I want to show it to you for completeness' sake. Instead of having the captured variable reverseList, as in the previous implementation of Reverse<T>, I instead pass the reference to the list I am building as an argument to each recursion of the anonymous method reverseFunc. Why would you want to do this? By eliminating the captured variable reverseList, you eliminate the possibility that the reference could be inadvertently changed outside of the scope of the anonymous method. Therefore, my final example of the Reverse<T> method uses only the stack as a temporary storage location while building the new reversed list:

public static class CustomIterators
{
    public static IList<T> Reverse<T>( this IList<T> theList ) {
        Func<IList<T>, IList<T>, IList<T>> reverseFunc = null;

        reverseFunc = delegate(IList<T> list, IList<T> result) {
            if( list.Tail != null ) {
                return reverseFunc( list.Tail, new MyList<T>(list.Head, result) );
            }

            return result;
        };

        return reverseFunc(theList, new MyList<T>(default(T), null));
    }
}

Note

This code uses the Func<> definition, which is a generic delegate that is defined in the System namespace. Using Func<> is a shortcut you can employ to avoid having to declare delegate types all over the place. You use the Func<> type parameter list to declare what the parameter types (if any) and return type of the delegate are. If the delegate you need has no return value, you can use the Action<> generic delegate type.
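As a quick illustration of how the type parameter lists read, consider this small sketch. The Add and Report methods are placeholders of my own; Func<int, int, int> and Action<string> are the standard delegate types from the System namespace:

```csharp
using System;

public class FuncActionExample
{
    public static int Add( int a, int b ) { return a + b; }
    static void Report( string message ) { Console.WriteLine( message ); }

    static void Main() {
        // Func<int, int, int>: two int parameters, and the last
        // type argument (int) is the return type.
        Func<int, int, int> add = Add;

        // Action<string>: one string parameter and no return value.
        Action<string> report = Report;

        report( "2 + 3 = " + add(2, 3) );    // prints: 2 + 3 = 5
    }
}
```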

The MyList<T> class used in the previous examples builds the linked list from the IEnumerable<T> type entirely before the MyList<T> object can be used. I used a List<T> as the seed data, but I could have used anything that implements IEnumerable<T> to fill the contents of MyList<T>. But what if IEnumerable<T> were an infinite iterator similar to the one created by CreateInfiniteList in the "Operation Chaining" section of this chapter? If you fed the result of CreateInfiniteList to MyList<T>.CreateList, you would have to kill the program forcefully or wait until your memory runs out as it tries to build the MyList<T>. If you are creating a library for general use that contains a type such as MyList<T>, which builds itself given some IEnumerable<T> type, you should do your best to accommodate all scenarios that could be thrown at you. The IEnumerable<T> given to you could take a very long time to calculate each item of the enumeration. For example, it could be enumerating over a database of live data in which database access is very costly. For an example of how to create the list in a lazy fashion, in which each node is created only when needed, check out Wes Dyer's excellent blog, specifically the entry titled "Why all the love for lists?"[59] The technique of lazy evaluation while iterating is a fundamental feature of LINQ, which I cover in Chapter 16.
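To give a feel for the lazy approach, here is one possible sketch. It is not Wes Dyer's implementation; the LazyList<T> class and all of its members are my own assumptions. Each node reads one item from a shared enumerator when it is constructed and creates its tail only when the Tail property is first accessed. The sketch is not thread-safe:

```csharp
using System;
using System.Collections.Generic;

public interface IList<T>
{
    T Head { get; }
    IList<T> Tail { get; }
}

// Hypothetical lazy variant of MyList<T>.
public class LazyList<T> : IList<T>
{
    public static IList<T> CreateList( IEnumerable<T> items ) {
        return new LazyList<T>( items.GetEnumerator() );
    }

    private LazyList( IEnumerator<T> iter ) {
        this.iter = iter;
        // Read exactly one item. If none is left, this node is the
        // terminating node, with a default head and a null tail.
        isEnd = !iter.MoveNext();
        head = isEnd ? default(T) : iter.Current;
    }

    public T Head {
        get { return head; }
    }

    public IList<T> Tail {
        get {
            if( isEnd ) {
                return null;
            }
            // Create the next node on first access and cache it so
            // the shared enumerator advances only once per node.
            if( !tailCreated ) {
                tail = new LazyList<T>( iter );
                tailCreated = true;
            }
            return tail;
        }
    }

    private readonly IEnumerator<T> iter;
    private readonly T              head;
    private readonly bool           isEnd;
    private IList<T>                tail;
    private bool                    tailCreated;
}

public class LazyListExample
{
    public static IEnumerable<int> CreateInfiniteList() {
        int n = 0;
        while( true ) yield return n++;
    }

    static void Main() {
        // Returns immediately, even though the source never ends.
        var list = LazyList<int>.CreateList( CreateInfiniteList() );

        // Prints: 0, 1, 2, 3, 4,
        for( int i = 0; i < 5; ++i ) {
            Console.Write( "{0}, ", list.Head );
            list = list.Tail;
        }

        Console.WriteLine();
    }
}
```

With this design, the cost of producing each item is paid only when a consumer walks that far into the list.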

The Visitor Pattern

The Visitor pattern, as described in the seminal pattern book Design Patterns: Elements of Reusable Object-Oriented Software by the Gang of Four,[60] allows you to define a new operation on a group of classes without changing the classes. Extension methods present a handy option for implementing the Visitor pattern.

For example, consider a collection of types that might or might not be related by inheritance, and imagine that you want to add functionality to validate instances of them at some point in your application. One option, although very unattractive, is to modify the public contract of all the types, introducing a Validate method on each of them. One might even jump to the conclusion that the easiest way to do it is to introduce a new base type that derives from System.Object, implements Validate as an abstract method, and then makes all the other types derive from the new type instead of System.Object. That would be nothing more than a maintenance nightmare in the end.

By now, you should agree that an extension method or a collection of extension methods will do the trick nicely. Given a collection of unrelated types, you will probably implement a host of extension methods. But the beauty is that you don't have to change the already defined types. In fact, if they're not your types to begin with, you cannot change them anyway. Consider the following code:

using System;
using ValidatorExtensions;

namespace ValidatorExtensions
{
    public static class Validators
    {
        public static void Validate( this String str ) {
            // Do something to validate the String instance.

            Console.WriteLine( "String with \"" +
                               str +
                               "\" Validated." );
        }

        public static void Validate( this SupplyCabinet cab ) {
            // Do something to validate the SupplyCabinet instance.

            Console.WriteLine( "Supply Cabinet Validated." );
        }

        public static void Validate( this Employee emp ) {
            // Do something to validate the Employee instance.

            Console.WriteLine( "** Employee Failed Validation! **" );
        }
    }
}

public class SupplyCabinet
{
}

public class Employee
{
}

public class MyApplication
{
    static void Main() {
        String data = "some important data";

        SupplyCabinet supplies = new SupplyCabinet();

        Employee hrLady = new Employee();

        data.Validate();
        supplies.Validate();
        hrLady.Validate();
    }
}

Notice that for each type of object we want to validate (in this example there are three), I have defined a separate Validate extension method. The output from the application shows that the proper Validate method is being called for each instance and is as follows:

String with "some important data" Validated.
Supply Cabinet Validated.
** Employee Failed Validation! **

In this example, it's important to note that the visitors, in this case the extension methods named Validate, must treat the instance that they are validating as black boxes. By that I mean that they do not have the validation capabilities of a true instance method because only true instance methods have access to the internal state of the objects. Nevertheless, in this example, it might make sense to validate the instances from a client's perspective.

Note

Keep in mind that if the extension methods are defined in the same assembly as the types they extend, they can still access internal members.

Using generics and constraints, you can slightly extend the previous example and provide a generic form of the Validate extension method that can be used if the instance supports a well-known interface. In this case, the well-known interface is named IValidator. Therefore, it would be nice to create a special Validate method that will be called if the type implements the IValidator interface. Consider the following code, in which SupplyCabinet now implements IValidator and a generic Validate<T> method replaces the SupplyCabinet-specific one:

using System;
using ValidatorExtensions;

namespace ValidatorExtensions
{
    public static class Validators
    {
        public static void Validate( this String str ) {
            // Do something to validate the String instance.

            Console.WriteLine( "String with \"" +
                               str +
                               "\" Validated." );
        }

        public static void Validate( this Employee emp ) {
            // Do something to validate the Employee instance.

            Console.WriteLine( "** Employee Failed Validation! **" );
        }

        public static void Validate<T>( this T obj )
                            where T: IValidator {
            obj.DoValidation();
            Console.WriteLine( "Instance of following type" +
                               " validated: " +
                               obj.GetType() );
        }
    }
}

public interface IValidator
{
    void DoValidation();
}

public class SupplyCabinet : IValidator
{
    public void DoValidation() {
        Console.WriteLine( "\tValidating SupplyCabinet" );
    }
}

public class Employee
{
}

public class MyApplication
{
    static void Main() {
        String data = "some important data";

        SupplyCabinet supplies = new SupplyCabinet();

        Employee hrLady = new Employee();

        data.Validate();
        supplies.Validate();
        hrLady.Validate();
    }
}

Now, if the instance that we're calling Validate on happens to implement IValidator, and there is not a version of Validate that specifically takes the type as its first parameter, the generic form of Validate will be called, which then passes through to the DoValidation method on the instance.

Notice that I removed the extension method whose first parameter was of type SupplyCabinet, so that the compiler would choose the generic version. If I had left it in, the code as written in Main would call that nongeneric version, because the compiler prefers a nongeneric method over a generic one when both match equally well. However, even if I had not removed the nongeneric extension method, I could have forced the compiler to call the generic one by changing the syntax at the point of call, as shown here:

public class MyApplication
{
    static void Main() {
        String data = "some important data";

        SupplyCabinet supplies = new SupplyCabinet();

        Employee hrLady = new Employee();

        data.Validate();

        // Force generic version
        supplies.Validate<SupplyCabinet>();
        hrLady.Validate();
    }
}

In the Main method, I have given the compiler more information to limit its search of the Validate method to generic forms of the extension method that accept one generic type parameter.

Summary

In this chapter, I introduced you to extension methods, including how to declare them, how to call them, and how the compiler implements them under the covers. Additionally, you saw how they really are just syntactic sugar and don't require any changes to the underlying runtime in order to work. Extension methods can cause confusion when defined inappropriately, so we looked at some caveats to avoid. I showed you how they can be used to create useful things such as iterators (IEnumerable<T> types) on containers that are not enumerable by default. Even for types that do have enumerators, you can define enumerators that iterate in a custom way. As you'll see in Chapter 15, when they are combined with lambda expressions, extension methods provide a certain degree of expressiveness that is extremely useful. While showing how to create custom iterators, I took a slight detour (using anonymous functions rather than lambda expressions) to show you the world of functional programming that the features added to C# 3.0 unlock. The code for those examples will become much cleaner when you use lambda expressions instead of anonymous methods.

In the next chapter, I'll introduce you to lambda expressions, which really make functional programming in C# syntactically succinct. Additionally, they allow you to convert a functional expression into either code or data in the form of IL code or an expression tree, respectively.



[56] Some people in the jaded crowd call this choosing between the lesser of two evils.

[57] For more on functional programming, search "functional programming" on http://www.wikipedia.org.

[58] Computer science wonks like to call a delegate that captures variables a closure, which is a construct in which a function is packaged with an environment (such as variables).

[59] You can find Wes Dyer's blog titled "Yet Another Language Geek" at blogs.msdn.com/wesdyer/.

[60] Design Patterns: Elements of Reusable Object-Oriented Software, by Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides (Boston, MA: Addison-Wesley Professional, 1995), is cited in the references at the end of this book.
