Chapter 12. Extending LINQ

This chapter covers:

  • LINQ’s extension mechanisms
  • Custom query operators
  • The query expression pattern
  • IQueryable and IQueryProvider
  • LINQ to Amazon

When we introduced LINQ, we pointed out that one of its major features is its ability to query several kinds of data and data sources. In the chapters that followed, we focused on LINQ to Objects, LINQ to SQL, and LINQ to XML. In this chapter, we’ll analyze how these flavors extend LINQ with support for in-memory collections, XML, and relational databases. This will help you determine the techniques you can use to extend LINQ to your own data sources. LINQ’s extensibility features allow you to adapt it to particular needs and enable novel use cases that broaden LINQ’s reach.

LINQ’s extensibility allows you to create your own flavor of LINQ by creating a LINQ provider. Of course, this can be a lot of work. Most of the time you won’t need to create a complete LINQ flavor; small adaptations of LINQ’s behavior are often enough. Fortunately, LINQ lets you add new query operators or redefine some of the default ones according to your needs. Whether you’re a framework provider who wants to give your users the power of LINQ or a developer who wants to adapt LINQ to your own business context, you’ll find that LINQ can be tailored to your needs.

The goal of this chapter is to present the extensibility options that LINQ offers so you can pick the technique best suited to your situation, and to demonstrate each option through examples. We’ll start by creating custom query operators and using them as utility methods that can simplify your LINQ queries. We’ll also create domain-specific query operators that allow you to work closely with your own business objects. We’ll then see how we can rewrite the basic operators used by query expressions, such as Where or OrderBy. Finally, we’ll create a new LINQ provider: LINQ to Amazon. This example will demonstrate how to encapsulate calls to a web API into a LINQ provider that you’ll use in LINQ queries. With it, we’ll review advanced extensibility features that involve expression trees and the System.Linq.IQueryable<T> interface.

To get started, let’s review how LINQ was designed to be extensible.

12.1. Discovering LINQ’s extension mechanisms

As we explained when we introduced LINQ, it is not a closed package that works only with in-memory collections, XML, or relational databases. In fact, LINQ was designed for extensibility from the ground up. In other words, LINQ isn’t restricted to the data sources it supports out of the box: it can be adapted to work with the data sources you have to deal with.

The core of LINQ consists of query operators and query expressions. This is where the magic happens. The great news is that the query syntax offered by LINQ’s query expressions is by no means hard-wired to the standard query operators we introduced in chapters 3 and 4. Query expressions are purely a syntactic feature that applies to anything that fulfills what is known as the LINQ query expression pattern. This pattern defines the set of operators required to fully support query expressions and how they must be implemented. Implementing this pattern consists of providing methods with appropriate names and signatures. The standard query operators you are used to working with provide an implementation of the LINQ query expression pattern. They’re implemented as extension methods (see chapter 2) that augment the IEnumerable<T> interface. This is just one possible implementation of the LINQ query expression pattern.

The standard query operators implement the LINQ query expression pattern to enable querying any .NET array or collection. Developers may apply the query syntax to any class they want, as long as their implementation adheres to the LINQ pattern. Third parties are free to replace the standard query operators with their own implementations that are appropriate for a target domain or technology. Custom implementations may provide additional services such as remote evaluation, query translation, or optimization. By adhering to the conventions of the LINQ pattern, such implementations can enjoy the same language integration and tool support as the standard query operators.

Before looking at how we can extend LINQ, it’s important to understand how Microsoft’s official flavors of LINQ are built on the LINQ foundation. This will allow us to highlight the different extensibility options that are available.

12.1.1. How the LINQ flavors are LINQ implementations

The extensibility of the query architecture is used in LINQ to provide implementations that work over various data sources such as XML or SQL data. The query operators over XML (LINQ to XML) use an efficient, easy-to-use in-memory XML facility to provide XPath/XQuery functionality in the host programming language. The query operators over relational data (LINQ to SQL) build on the integration of SQL-based schema definitions into the CLR type system. This integration provides strong typing over relational data while retaining the expressive power of the relational model and the performance of query evaluation directly in the underlying data store.

The flavors of LINQ provided by Microsoft are all made possible thanks to LINQ’s extensibility. These flavors include LINQ to Objects, LINQ to XML, LINQ to DataSet, LINQ to SQL, and LINQ to Entities. In terms of implementation, each flavor comes in the form of a LINQ provider. Each provider relies on specific extensibility techniques supported by LINQ. Depending on what you wish to achieve, you’ll reuse one of the techniques that the default providers use. Reviewing how each provider is implemented will help you determine which technique to use to create your own LINQ to Whatever.

LINQ to Objects

LINQ to Objects allows us to query arrays or other collections that implement the IEnumerable<T> interface. LINQ to Objects relies on the standard query operators, which are extension methods for the IEnumerable<T> type. When we use LINQ to Objects, we’re using the set of query operators implemented by the System.Linq.Enumerable class. That’s all there is to LINQ to Objects. It’s pretty straightforward.

LINQ to DataSet

LINQ to DataSet allows us to query DataSets using LINQ. It is not much more complicated than LINQ to Objects. LINQ to DataSet is also based on the same standard query operators, but it adds a small set of extension methods for the types involved in DataSets, mostly the System.Data.DataRow class.

See our online chapter to learn more about LINQ to DataSet.

LINQ to XML

LINQ to XML is also based on the standard query operators, but adds a set of classes to deal with XML objects. LINQ to XML is used the same way LINQ to Objects is used, but this time you query and create objects such as XNode, XElement, XAttribute, and XText.

LINQ to SQL

LINQ to SQL works differently from the previous providers. While the standard query operators used with LINQ to Objects and LINQ to XML take delegates as parameters, all the query operators used by LINQ to SQL take expression trees and build up an expression tree that represents the whole query. This time, the operators are implemented by the System.Linq.Queryable class, and they work on IQueryable<T> instead of IEnumerable<T>. Expression trees and IQueryable<T> allow a LINQ to SQL query built from numerous calls to query operators to be converted into a single SQL query that gets sent to the database.
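The difference between delegates and expression trees comes down to how the compiler treats a lambda expression. The following sketch (plain C#, independent of LINQ to SQL; the names LambdaForms, AsDelegate, and AsTree are illustrative) assigns the same lambda to a Func<int, bool> and to an Expression<Func<int, bool>>. The first is compiled code that can only be invoked; the second is a data structure that a provider can inspect and translate, for instance into SQL.

```csharp
using System;
using System.Linq.Expressions;

public static class LambdaForms
{
    // Compiled to IL: it can be invoked, but its logic cannot be inspected.
    public static readonly Func<int, bool> AsDelegate = price => price < 30;

    // Compiled to an expression tree: a description of the lambda as data.
    public static readonly Expression<Func<int, bool>> AsTree = price => price < 30;

    // Walks the tree the way a provider might before translating it.
    public static string Describe()
    {
        var body = (BinaryExpression)AsTree.Body;
        return body.NodeType + " " + body.Left + " " + body.Right;
    }

    public static void Main()
    {
        Console.WriteLine(AsDelegate(25));          // True
        Console.WriteLine(Describe());              // LessThan price 30
        Console.WriteLine(AsTree.Compile()(42));    // False
    }
}
```

The operators in System.Linq.Queryable declare their lambda parameters as Expression<Func<...>>, which is why the lambdas in a LINQ to SQL query arrive at the provider as trees rather than as delegates.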

LINQ to Entities

LINQ to Entities is implemented using the same technique as LINQ to SQL. It translates LINQ expressions into the canonical query trees used throughout the ADO.NET Entity Framework. These trees are then handed off to the Entity Framework query pipeline for mapping and SQL generation.

The way the official Microsoft LINQ providers are implemented should give you a good idea of what can be achieved through LINQ’s extensibility features. Here are the options we can use to improve LINQ or to create a new LINQ provider:

  • Create query operators that implement the LINQ query expression pattern using delegates.
  • Provide classes that can be used with the standard query operators but that allow working with a specific data source or with specific data types.
  • Create query operators that implement the LINQ query expression pattern using expression trees.
  • Implement the IQueryable<T> interface.

We’ll soon demonstrate how to put these techniques into practice. Before this, let’s suggest additional usages for LINQ’s extensibility features.

12.1.2. What can be done with custom LINQ extensions

The range of possibilities offered by LINQ’s extensibility features goes from querying your custom business objects to querying...anything! LINQ has extension mechanisms suitable for the level of customization you desire.

As we’ll demonstrate through examples, you can start by simply creating additional query operators. And if you have a need for it, you can even create a custom implementation of the standard query operators. By writing extension methods that mimic what the implementation from System.Linq.Enumerable provides, you can adapt the behavior of the standard operators to your needs.

In advanced cases, you can use the technique LINQ to SQL relies on: expression trees and an implementation of the IQueryable<T> interface. This is more difficult than creating simple query operators, but it’s what you’ll need if you want to use LINQ queries with complex or remote data sources. For example, web sites and web services don’t support the kind of intensive, element-by-element interaction that LINQ to Objects implies with the standard query operators, so other techniques are required. As LINQ to SQL does, you can take advantage of expression trees and deferred query execution to query remote sources.

It may be difficult to imagine what your needs will be, but we can give you an idea of what can be achieved through extensibility. Let’s review potential uses of LINQ to help you see how far you can go with it.

Suggested use cases for LINQ’s extensibility

Here are some scenarios that could require putting LINQ’s extensibility into action:

  • Querying a custom data source (such as a filesystem, Active Directory, WMI, or Windows’s Event Log)
  • Querying web services (Amazon, other public web services, or in-house web services)
  • Allowing the developers who use your product to take advantage of LINQ—if you are a tool provider or sell a development framework (examples include object-relational frameworks)

Some of these scenarios are more difficult than others. Querying the Windows Event Log may not require more than implementing some query operators, which is not very difficult. In comparison, integrating LINQ with an object-relational framework is more involved and implies dealing with the IQueryable<T> interface and expression trees. This is what LINQ to SQL uses to generate SQL queries from LINQ queries. This is also what a framework like NHibernate could use to generate HQL queries from LINQ queries.

 

Note

The custom query operators we’ll demonstrate here apply to in-memory queries only. This means that they can work with LINQ to Objects, LINQ to DataSet, and LINQ to XML, but not with LINQ to SQL or LINQ to Entities. This is because for a query operator to be supported by LINQ to SQL or LINQ to Entities, it must be translatable into SQL or Entity SQL. LINQ to SQL and LINQ to Entities have no knowledge about your additional operators, so they wouldn’t know what to do with them.

Techniques exist to create custom query operators that can be used in LINQ to SQL, but we won’t discuss them here.

 

Enough with the preliminaries! It’s time to get our hands dirty. We’ll cover the various extensibility options, from the simplest ones to the richer and more difficult ones. We’ll use a gradual approach, starting with “light” extensions and finishing with our advanced LINQ to Amazon example. To get started, let’s see how to implement additional query operators.

12.2. Creating custom query operators

In this section, we’ll focus on LINQ to Objects. Even though LINQ comes with 51 standard query operators, in some situations this may not be enough, as you’ll see. The first way to extend LINQ is to create additional query operators. You can use this technique to overcome the limitations that you may run into when working with the standard query operators. We’ll lead you through examples that will show you how to create additional operators that supplement the standard operators. We’ll also demonstrate how custom query operators may be used to enrich your LINQ queries with domain-specific processing.

12.2.1. Improving the standard query operators

Since we’re looking at how you can overcome limitations of the standard query operators, a good example to look at is a custom implementation of the Sum operator. While using the standard query operators in his code, a C# developer named Troy Magennis noticed some limitations (see http://aspiring-technology.com/blogs/troym/archive/2006/10/06/24.aspx). One of them concerns the Sum query operator: the variant of Sum that operates on a sequence of integers can easily overflow when working with large numbers.

The following simple piece of code demonstrates this problem:

Enumerable.Sum(new int[] {int.MaxValue, 1});

Understandably, this code throws an OverflowException with the message “Arithmetic operation resulted in an overflow.”[1] The problem is that the sum of two integers can be too big to fit in an int (System.Int32) value. This is why Troy wrote LongSum, which returns a long (System.Int64) value where Sum returns an int.

1 C# statements can execute in either checked or unchecked context, depending on the use of the checked or unchecked keywords. In a checked context, arithmetic overflow raises an exception. In an unchecked context, arithmetic overflow is ignored and the result is truncated. The Sum operator is implemented using the checked keyword, hence the OverflowException.
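The footnote’s distinction can be observed with a short sketch (the OverflowDemo class and its method names are illustrative): the same addition wraps around in an unchecked context, throws in a checked one, and Enumerable.Sum throws because it performs its additions in a checked context.

```csharp
using System;
using System.Linq;

public static class OverflowDemo
{
    // Unchecked context: the overflow is ignored and the result wraps around.
    public static int UncheckedAdd(int a, int b) => unchecked(a + b);

    // Checked context: the same overflow raises an OverflowException.
    public static bool CheckedAddThrows(int a, int b)
    {
        try
        {
            _ = checked(a + b);
            return false;
        }
        catch (OverflowException)
        {
            return true;
        }
    }

    // Enumerable.Sum adds in a checked context, so it throws as well.
    public static bool SumThrows(int[] values)
    {
        try
        {
            _ = values.Sum();
            return false;
        }
        catch (OverflowException)
        {
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(UncheckedAdd(int.MaxValue, 1));           // int.MinValue
        Console.WriteLine(CheckedAddThrows(int.MaxValue, 1));       // True
        Console.WriteLine(SumThrows(new[] { int.MaxValue, 1 }));    // True
    }
}
```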

Let’s re-create the LongSum operator together. As you saw when we introduced the standard query operators in chapter 3, they consist of extension methods for the IEnumerable<T> type.

Listing 12.1 shows how the Sum operator for int comes out of the box in the System.Linq.Enumerable class.

Listing 12.1. Standard implementation of the Sum operator for int
namespace System.Linq
{
  public static class Enumerable
  {
    ...

    public static int Sum(this IEnumerable<int> source)
    {
      if (source == null)
        throw new ArgumentNullException("source");
      int sum = 0;
      checked
      {
        foreach (int v in source)
          sum += v;
      }
      return sum;
    }

    public static int? Sum(this IEnumerable<int?> source)
    {
      if (source == null)
        throw new ArgumentNullException("source");
      int? sum = 0;
      checked
      {
        foreach (int? v in source)
          if (v != null)
            sum += v;
      }
      return sum;
    }

    public static int Sum<T>(this IEnumerable<T> source,
      Func<T, int> selector)
    {
      return Enumerable.Sum(Enumerable.Select(source, selector));
    }

    public static int? Sum<T>(this IEnumerable<T> source,
      Func<T, int?> selector)
    {
      return Enumerable.Sum(Enumerable.Select(source, selector));
    }

    ...
  }
}

As you can see in the code, the Sum operator for int is implemented as four method overloads. These methods can easily be adapted to create the LongSum operator. Listing 12.2 shows the source code that implements the same four methods but with longs as the results.

Listing 12.2. LongSum, improved implementation of the Sum operator for int (SumExtensions.cs)
using System;
using System.Collections.Generic;
using System.Linq;

namespace LinqInAction.Extensibility
{
  public static class SumExtensions
  {
    public static long LongSum(this IEnumerable<int> source)
    {
      if (source == null)
        throw new ArgumentNullException("source");
      long sum = 0;
      checked
      {
        foreach (int v in source)
          sum += v;
      }
      return sum;
    }

    public static long? LongSum(this IEnumerable<int?> source)
    {
      if (source == null)
        throw new ArgumentNullException("source");
      long? sum = 0;
      checked
      {
        foreach (int? v in source)
          if (v != null)
            sum += v;
      }
      return sum;
    }

    public static long LongSum<T>(this IEnumerable<T> source,
      Func<T, int> selector)
    {
      return SumExtensions.LongSum(Enumerable.Select(source, selector));
    }

    public static long? LongSum<T>(this IEnumerable<T> source,
      Func<T, int?> selector)
    {
      return SumExtensions.LongSum(Enumerable.Select(source, selector));
    }
  }
}

The new LongSum operator we’ve just created in listing 12.2 returns the numerical sum of a sequence of ints or nullable ints as a long or nullable long. This gives more range for the results compared to the default Sum operator.
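A short usage sketch follows. To keep it self-contained, it repeats a condensed copy of the first LongSum overload from listing 12.2 in a class named LongSumDemo (an illustrative name). The input that makes Sum overflow is summed without error, and a projection through Select feeds LongSum the same way the selector overloads do.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LongSumDemo
{
    // Condensed copy of the first LongSum overload from listing 12.2.
    public static long LongSum(this IEnumerable<int> source)
    {
        if (source == null)
            throw new ArgumentNullException("source");
        long sum = 0;
        checked
        {
            foreach (int v in source)
                sum += v;   // v is widened to long before the addition
        }
        return sum;
    }

    public static void Main()
    {
        int[] values = { int.MaxValue, 1 };
        Console.WriteLine(values.LongSum());   // 2147483648 (values.Sum() would throw)

        string[] words = { "LINQ", "in", "Action" };
        Console.WriteLine(words.Select(w => w.Length).LongSum());   // 12
    }
}
```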

This demonstrates how you can improve your LINQ experience with query operators that work the way they should, or at least the way you want them to work. This kind of extensibility ensures that you are not stuck with a static predefined set of operators.

The example we’ve just seen shows how to fix a problem with a standard query operator. But this isn’t the only way we can extend LINQ to solve a problem or to improve our code. Creating custom query operators can be useful in other situations, as you’ll see next.

12.2.2. Utility or domain-specific query operators

Our first example revolved around a standard query operator. The default set of operators that comes with LINQ to Objects is useful and can be applied to a wide range of situations, mainly because these operators are generic: they can be used with any type of object. However, when you are dealing with business objects, specific operations may be required.

Imagine you are working on Book and Publisher objects. How do you determine whether a book is expensive in a LINQ query? How do you retrieve a publisher’s books? The standard query operators aren’t suited to such needs because, as we mentioned earlier, they’re generic! While being generic is a big advantage, it doesn’t help when business-specific processing or concepts are involved, because more specialized assistance is needed. In situations like this, you would want to use custom utility query operators.

When writing code, developers often create utility or helper methods. Utility methods are commonly used to simplify code and keep frequently used code in one place. In order to remove complexity from your LINQ queries, it may be useful to create utility methods. Imagine you want to create a method that deals with a collection of Book objects. You could simply create a traditional method to do this, but the best way to proceed is to create a query operator. Since a query is made of calls to query operators, utility methods integrate nicely within LINQ queries if they’re written as query operators.

To get a feel for utility query operators, we are going to go through some samples. Each of the operators we’ll introduce works on either a single business object or a collection of business objects from our LinqBooks running example. This is why we could call them “domain-specific query operators.”

Let’s start with an operator that works on a sequence of books.

IEnumerable<Book>.TotalPrice

The code in listing 12.3 shows how you can create an operator that works on a sequence of Book objects to compute a total price.

Listing 12.3. TotalPrice custom query operator (CustomQueryOperators.cs)
using System;
using System.Collections.Generic;
using System.Linq;

using LinqInAction.LinqBooks.Common;

namespace LinqInAction.Extensibility
{
  public static class CustomQueryOperators
  {
    ...

    public static Decimal TotalPrice(this IEnumerable<Book> books)
    {
      if (books == null)
        throw new ArgumentNullException("books");

      Decimal result = 0;
      foreach (Book book in books)
        if (book != null)
          result += book.Price;
      return result;
    }

    ...
  }
}

Our new TotalPrice operator can then be used in query expressions such as the following:

from publisher in SampleData.Publishers
join book in SampleData.Books
  on publisher equals book.Publisher into pubBooks
select new { Publisher = publisher.Name,
             TotalPrice = pubBooks.TotalPrice() };

The same could be done without much difficulty by using only standard operators, but you get the idea. Creating your own operators helps write shorter and clearer code. In general, it can be useful to create utility methods that you can use in your queries.
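To make the comparison concrete, the sketch below runs the same join with both TotalPrice and the standard Sum operator. The Publisher and Book classes are hypothetical, trimmed-down stand-ins for the LinqBooks types, and the sample data is invented for the example. One behavioral difference is worth noting: TotalPrice skips null books, whereas Sum(b => b.Price) would throw a NullReferenceException on a null element.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, trimmed-down stand-ins for the LinqBooks classes.
public class Publisher { public string Name { get; set; } }
public class Book
{
    public string Title { get; set; }
    public decimal Price { get; set; }
    public Publisher Publisher { get; set; }
}

public static class TotalPriceDemo
{
    // Same logic as listing 12.3: null books are skipped.
    public static decimal TotalPrice(this IEnumerable<Book> books)
    {
        if (books == null)
            throw new ArgumentNullException("books");
        decimal result = 0;
        foreach (Book book in books)
            if (book != null)
                result += book.Price;
        return result;
    }

    public static void Main()
    {
        // Invented sample data, for illustration only.
        var pubA = new Publisher { Name = "Publisher A" };
        var pubB = new Publisher { Name = "Publisher B" };
        var publishers = new[] { pubA, pubB };
        var books = new[]
        {
            new Book { Title = "Book 1", Price = 10m, Publisher = pubA },
            new Book { Title = "Book 2", Price = 25m, Publisher = pubA },
            new Book { Title = "Book 3", Price = 40m, Publisher = pubB },
        };

        var totals =
            from publisher in publishers
            join book in books
              on publisher equals book.Publisher into pubBooks
            select new
            {
                Publisher = publisher.Name,
                TotalPrice = pubBooks.TotalPrice(),     // custom operator
                SumPrice = pubBooks.Sum(b => b.Price)   // standard operator only
            };

        foreach (var row in totals)
            Console.WriteLine("{0}: {1} = {2}", row.Publisher, row.TotalPrice, row.SumPrice);
    }
}
```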

Let’s consider another utility operator that also works on a sequence of books.

IEnumerable<Book>.Min

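Let’s say we’d like to implement Min for Book objects. The standard Min operator returns the smallest value of a sequence, such as the lowest page count when given a selector, rather than the element that holds that value; its selector-less generic form also requires elements that can be compared with each other. The extension method in listing 12.4 provides an implementation of Min that works on a sequence of Book objects and returns the book that has the lowest number of pages as the result.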

Listing 12.4. Min custom query operator (CustomQueryOperators.cs)
public static Book Min(this IEnumerable<Book> source)
{
  if (source == null)
    throw new ArgumentNullException("source");

  Book result = null;
  foreach (Book book in source)
  {
    if (book == null)
      continue;   // skip null entries, as TotalPrice does
    if ((result == null) || (book.PageCount < result.PageCount))
      result = book;
  }
  return result;
}

With this custom query operator, you can write the following code, for instance:

Book minBook = SampleData.Books.Min();
Console.WriteLine(
  "Book with the lowest number of pages = {0} ({1} pages)",
  minBook.Title, minBook.PageCount);

This example shows how you can adapt a concept like Min introduced by the standard query operators to deal with domain-specific objects.
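For comparison, the sketch below shows what the standard operators offer out of the box (the Book class and titles are hypothetical). Min with a selector returns the lowest page count rather than the book, while Aggregate can return the book itself. Unlike listing 12.4, Aggregate throws an InvalidOperationException on an empty sequence instead of returning null.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, trimmed-down stand-in for the LinqBooks Book class.
public class Book
{
    public string Title { get; set; }
    public int PageCount { get; set; }
}

public static class MinDemo
{
    // Standard Min with a selector: yields the smallest value, not the book.
    public static int FewestPages(IEnumerable<Book> books) =>
        books.Min(b => b.PageCount);

    // Aggregate keeps the current "best" book; on a tie it keeps the
    // earlier book, as listing 12.4 does.
    public static Book Thinnest(IEnumerable<Book> books) =>
        books.Aggregate((best, next) => next.PageCount < best.PageCount ? next : best);

    public static void Main()
    {
        var books = new[]
        {
            new Book { Title = "Book A", PageCount = 300 },
            new Book { Title = "Book B", PageCount = 120 },
            new Book { Title = "Book C", PageCount = 120 },
        };
        Console.WriteLine(FewestPages(books));      // 120
        Console.WriteLine(Thinnest(books).Title);   // Book B
    }
}
```

Newer versions of .NET (6 and later) also provide a MinBy operator that returns the element with the smallest key.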

For a change, let’s now see a utility operator that works on a Publisher object.

Publisher.Books

You can resort to any extension method that helps you simplify your code and hide complexity. For example, in the following query, we use a join clause to get access to each publisher’s books:

from publisher in SampleData.Publishers
join book in SampleData.Books
  on publisher equals book.Publisher into books
select new {
  Publisher = publisher.Name,
  TotalPrice = books.TotalPrice()
};

We’re likely to use the same kind of join clause in every query each time we want to access a publisher’s books. It could be useful to create a utility query operator to perform this operation. The operator in listing 12.5 selects a publisher’s books from a sequence of books.

Listing 12.5. Books custom query operator (CustomQueryOperators.cs)
public static IEnumerable<Book> Books(this Publisher publisher,
  IEnumerable<Book> books)
{
  return books.Where(book => book.Publisher == publisher);
}

This new Books operator can be used to simplify our previous query expression as follows:

from publisher in SampleData.Publishers
select new {
  Publisher = publisher.Name,
  TotalPrice = publisher.Books(SampleData.Books).TotalPrice()
};

Of course this operator can be reused in other queries as well, which makes it easy to filter books by publisher.

 

Warning

This is also an interesting example of what should be avoided! The code that uses the join clause instead of our Books operator will be more efficient in most cases because it uses the GroupJoin operator behind the scenes. GroupJoin is optimized to join sequences; in our case it loops over the books only once to find their publishers. The version of the code that uses our Books operator loops over the entire collection of books once per publisher.

This example should help you to understand that while it’s easy to create new query operators, it’s not always the most efficient option. The choice is yours. Always consider the implications.
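The cost difference described in the warning can be measured with a small sketch that counts how many times a book is read from the source sequence. The classes, the counting helper, and the sample sizes are all illustrative. The first method mirrors listing 12.5 by filtering the whole book list once per publisher; the other two index the books once, either explicitly with ToLookup or implicitly through GroupJoin, which in the .NET implementation builds a lookup over the inner sequence before iterating the outer one.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, trimmed-down stand-ins for the LinqBooks classes.
public class Publisher { public string Name { get; set; } }
public class Book { public string Title { get; set; } public Publisher Publisher { get; set; } }

public static class JoinCostDemo
{
    static int visits;   // number of books read from the source sequence

    static IEnumerable<Book> Counted(IEnumerable<Book> books)
    {
        foreach (Book book in books)
        {
            visits++;
            yield return book;
        }
    }

    // Like listing 12.5: one full scan of the books for every publisher.
    public static int PerPublisherScan(Publisher[] publishers, Book[] books)
    {
        visits = 0;
        foreach (Publisher publisher in publishers)
            Counted(books).Where(book => book.Publisher == publisher).Count();
        return visits;
    }

    // Index the books once by publisher, then look each publisher up.
    public static int LookupScan(Publisher[] publishers, Book[] books)
    {
        visits = 0;
        ILookup<Publisher, Book> byPublisher = Counted(books).ToLookup(book => book.Publisher);
        foreach (Publisher publisher in publishers)
            byPublisher[publisher].Count();
        return visits;
    }

    // GroupJoin (what the join ... into clause compiles to) also reads the books once.
    public static int GroupJoinScan(Publisher[] publishers, Book[] books)
    {
        visits = 0;
        publishers.GroupJoin(Counted(books), p => p, b => b.Publisher,
                             (p, pubBooks) => pubBooks.Count()).ToList();
        return visits;
    }

    public static void Main()
    {
        var publishers = new[] { new Publisher { Name = "P1" }, new Publisher { Name = "P2" }, new Publisher { Name = "P3" } };
        var books = new[]
        {
            new Book { Title = "B1", Publisher = publishers[0] },
            new Book { Title = "B2", Publisher = publishers[1] },
            new Book { Title = "B3", Publisher = publishers[1] },
            new Book { Title = "B4", Publisher = publishers[2] },
        };
        Console.WriteLine(PerPublisherScan(publishers, books));   // 12
        Console.WriteLine(LookupScan(publishers, books));         // 4
        Console.WriteLine(GroupJoinScan(publishers, books));      // 4
    }
}
```

With 3 publishers and 4 books, the per-publisher approach reads 12 books, while the lookup-based approaches read each book once.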

 

Before we move on to other kinds of extensibility, let’s consider one more example of a domain-specific query operator.

Book.IsExpensive

The last operator in this section will show you how query operators can be used to code a specific concept only once.

The sample operator in listing 12.6 takes a book as a parameter and returns whether or not it is expensive.

Listing 12.6. IsExpensive custom query operator (CustomQueryOperators.cs)
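A minimal sketch of such an operator follows, assuming a simple fixed price threshold. The value 30, the constant name, and the trimmed-down Book class are illustrative assumptions rather than the LinqBooks code.

```csharp
using System;

// Hypothetical, trimmed-down stand-in for the LinqBooks Book class.
public class Book
{
    public string Title { get; set; }
    public decimal Price { get; set; }
}

public static class CustomQueryOperators
{
    // Illustrative threshold: the notion of "expensive" is defined in one place.
    const decimal ExpensiveThreshold = 30m;

    public static bool IsExpensive(this Book book)
    {
        if (book == null)
            throw new ArgumentNullException("book");
        return book.Price >= ExpensiveThreshold;
    }
}

public static class IsExpensiveDemo
{
    public static void Main()
    {
        var book = new Book { Title = "Book A", Price = 45m };
        Console.WriteLine(book.IsExpensive());   // True
    }
}
```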

The IsExpensive operator defined in the listing can be used in LINQ queries each time we need to know whether a book is expensive. Here is a sample query that uses this operator:

var books =
  from book in SampleData.Books
  group book.Title by book.IsExpensive() into bookGroup
  select new { Expensive = bookGroup.Key, Books = bookGroup };
ObjectDumper.Write(books, 1);

The results of this query’s execution look like this:

Expensive=True   Books=...
  Books: Funny Stories
  Books: C# on Rails
  Books: Bonjour mon Amour
Expensive=False  Books=...
  Books: LINQ rules
  Books: All your base are belong to us

The advantage of creating operators like IsExpensive is that they abstract away some notions that need to be expressed in queries. For example, IsExpensive can be reused in multiple queries without having to think each time about what “expensive” means. (Whether something is expensive is subjective, so good luck writing an actual algorithm for this!) Also, if this notion needs to be changed, it can be done in only one place: the operator’s code.

We’ve seen how you can use LINQ’s extensibility to create utility operators that help you deal with business objects. The operators we’ve demonstrated are additional operators that can be used in LINQ queries, but only through the dot notation. Only a small set of query operators can be used implicitly with the query expression syntax. This is the case for basic operators like Where, Select, or OrderBy, for example, which are transparently invoked when where, select, or orderby clauses are used in a query expression. We’ll now demonstrate another kind of extensibility supported by LINQ that allows you to reimplement the operators behind from, where, join, orderby, select, and the other keywords in a query expression.

12.3. Custom implementations of the basic query operators

In the previous section, when we demonstrated how to use our additional query operators, we used the explicit dot notation (method syntax). For example, here is a query that uses two of the operators we created, Books and TotalPrice:

from publisher in SampleData.Publishers
where publisher.Name.StartsWith("A")
select new {
  Publisher = publisher.Name,
  TotalPrice = publisher.Books(SampleData.Books).TotalPrice()
};

This query implicitly involves more operators than just ours. Namely, the Where and Select operators are also part of the query through the where and select clauses. By default, the clauses of this kind of query expression are translated into calls to standard query operators. You may wish to change how a query like this one behaves. We’ll show how you can easily provide and use your own implementations of Where and Select even if they’re used through the query expression notation. Thanks to the way the compiler resolves query operators when it translates a query expression, we can define what implementation of the basic query operators is used.

We’ll first review how query expressions are translated into method calls, which requires getting to know the query expression pattern. Once we know the basics of the query translation mechanism, we’ll go through some sample implementations of the query expression pattern.

12.3.1. Refresh on the query translation mechanism

Let’s review how the compiler translates query expressions into method calls. This is the starting point of the extensibility option that will allow you to create custom implementations of the basic query operators.

Imagine that we write the following query:

using System.Linq;
using LinqInAction.LinqBooks.Common;

static class TestCustomImplementation
{
  static void Main()
  {
    var books =
      from book in SampleData.Books
      where book.Price < 30
      select book.Title;

    ObjectDumper.Write(books);
  }
}

The code that actually gets executed for this query depends on one thing: the namespaces you import. Because query operators are extension methods, they’re brought into scope through namespaces. When the compiler sees a query expression, it converts it into method calls, which in this case resolve to extension methods.

One task that the compiler achieves is resolving where the Where and Select methods come from. If you import System.Linq, the compiler will find the Where and Select extension methods that the System.Linq.Enumerable class provides. The result is that the code that actually gets executed is the following:

var query =
  System.Linq.Enumerable.Select(
    System.Linq.Enumerable.Where(
      SampleData.Books,
      book => book.Price < 30),
    book => book.Title);

If you don’t import System.Linq, but instead a namespace of your own that also provides implementations of the Where and Select operators, the code is translated differently.

The same query expression can then be translated into something like this:

var query =
  MyNamespace.MyExtensions.Select(
    MyNamespace.MyExtensions.Where(
      SampleData.Books,
      book => book.Price < 30),
    book => book.Title);

The kind of extensibility we’re discussing in this section relies on this mechanism. Let’s now examine more precisely how the mapping between a query expression and query operators works.
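To see this resolution at work, here is a minimal sketch. A hypothetical MyNamespace.MyExtensions class provides Where and Select for IEnumerable<T> and records each call before returning a plain iterator. Because the Client namespace imports MyNamespace (and not System.Linq), the compiler binds the query expression’s where and select clauses to these methods. All names and values are illustrative.

```csharp
using System;
using System.Collections.Generic;

namespace MyNamespace
{
    // Hypothetical operators, used only to show how the compiler picks them.
    public static class MyExtensions
    {
        // Records which operators the compiler bound the clauses to.
        public static readonly List<string> Log = new List<string>();

        public static IEnumerable<T> Where<T>(this IEnumerable<T> source, Func<T, bool> predicate)
        {
            Log.Add("Where");   // logged when the query is built, not when it runs
            return WhereIterator(source, predicate);
        }

        public static IEnumerable<U> Select<T, U>(this IEnumerable<T> source, Func<T, U> selector)
        {
            Log.Add("Select");
            return SelectIterator(source, selector);
        }

        static IEnumerable<T> WhereIterator<T>(IEnumerable<T> source, Func<T, bool> predicate)
        {
            foreach (T item in source)
                if (predicate(item))
                    yield return item;
        }

        static IEnumerable<U> SelectIterator<T, U>(IEnumerable<T> source, Func<T, U> selector)
        {
            foreach (T item in source)
                yield return selector(item);
        }
    }
}

namespace Client
{
    // Importing MyNamespace here makes the query below bind to MyExtensions.
    using MyNamespace;

    public static class ResolutionDemo
    {
        public static List<int> Run()
        {
            int[] prices = { 12, 45, 28, 60 };
            IEnumerable<int> doubledCheap =
                from price in prices
                where price < 30
                select price * 2;
            return new List<int>(doubledCheap);
        }

        public static void Main()
        {
            Console.WriteLine(string.Join(", ", Run()));              // 24, 56
            Console.WriteLine(string.Join(", ", MyExtensions.Log));   // Where, Select
        }
    }
}
```

Placing the using directive inside the Client namespace also keeps the sketch unambiguous in projects that import System.Linq globally, because the compiler searches the innermost namespace scope for extension methods first.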

12.3.2. Query expression pattern specification

We’ve just seen how we can provide our own implementation for a query expression’s where and select clauses. The same mechanism applies to all the clauses. The C# 3.0 specification details which operators should be implemented to fully support query expressions and how they must be implemented. It calls the set of methods that types can implement to support query expressions the query expression pattern.

The recommended shape of a generic class C<T> that supports the query expression pattern is shown in listing 12.7.

Listing 12.7. The query expression pattern
delegate R Func<T1,R>(T1 arg1);
delegate R Func<T1,T2,R>(T1 arg1, T2 arg2);

class C
{
  public C<T> Cast<T>();
}

class C<T> : C
{
  public C<T> Where(Func<T,bool> predicate);
  public C<U> Select<U>(Func<T,U> selector);
  public C<V> SelectMany<U,V>(Func<T,C<U>> selector,
                              Func<T,U,V> resultSelector);
  public C<V> Join<U,K,V>(C<U> inner,
                          Func<T,K> outerKeySelector,
                          Func<U,K> innerKeySelector,
                          Func<T,U,V> resultSelector);
  public C<V> GroupJoin<U,K,V>(C<U> inner,
                               Func<T,K> outerKeySelector,
                               Func<U,K> innerKeySelector,
                               Func<T,C<U>,V> resultSelector);
  public O<T> OrderBy<K>(Func<T,K> keySelector);
  public O<T> OrderByDescending<K>(Func<T,K> keySelector);
  public C<G<K,T>> GroupBy<K>(Func<T,K> keySelector);
  public C<G<K,E>> GroupBy<K,E>(Func<T,K> keySelector,
                                Func<T,E> elementSelector);
}

class O<T> : C<T>
{
  public O<T> ThenBy<K>(Func<T,K> keySelector);
  public O<T> ThenByDescending<K>(Func<T,K> keySelector);
}

class G<K,T> : C<T>
{
  public K Key { get; }
}

 

Note

The query expression pattern for VB has not been provided by Microsoft at the time of this writing.

 

You should refer to the C# 3.0 specification to learn about the details of this pattern. Because query expressions are translated into method invocations by means of a syntactic mapping, types have considerable flexibility in how they implement the query expression pattern. In the context of this book, there are a few things you need to know to understand the examples we’re about to work out:

  • A generic type is used in the query expression pattern to illustrate the proper relationships between parameter and result types, but it is possible to implement the pattern for nongeneric types as well.
  • The standard query operators we described in chapters 3 and 4 provide an implementation of the query expression pattern for any type that implements the IEnumerable<T> interface. Although we’re used to working on collections with the standard query operators in the context of LINQ to Objects and LINQ to XML, you can see that IEnumerable<T> is not part of the pattern. This means that query expressions can work with any type that provides the pattern’s methods, not just with enumerations/sequences.
  • The standard query operators are implemented as extension methods, but the pattern’s methods can be implemented as extension methods or as instance methods, because the two have the same invocation syntax.
  • The methods can request delegates or expression trees as their parameters because lambda expressions are convertible to both.
  • Although recommended for completeness, providing an implementation of all the previously listed methods is not required.
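To make several of these points concrete, here is a minimal sketch of a type that supports where and select clauses without implementing IEnumerable<T>. Box<T> is a hypothetical single-value container (not part of LINQ or of this book’s code); its Where and Select are instance methods that follow the shapes shown in listing 12.7, and it implements only those two methods of the pattern.

```csharp
using System;

namespace QueryPatternSketch
{
    // Hypothetical single-value container; it does NOT implement IEnumerable<T>.
    public sealed class Box<T>
    {
        public Box(T value, bool hasValue)
        {
            Value = value;
            HasValue = hasValue;
        }

        public T Value { get; private set; }
        public bool HasValue { get; private set; }

        public static Box<T> Of(T value) { return new Box<T>(value, true); }
        public static Box<T> Empty() { return new Box<T>(default(T), false); }

        // Instance method used for the where clause: keeps the value only if the predicate holds.
        public Box<T> Where(Func<T, bool> predicate)
        {
            return HasValue && predicate(Value) ? this : Empty();
        }

        // Instance method used for the select clause: projects the value, if there is one.
        public Box<U> Select<U>(Func<T, U> selector)
        {
            return HasValue ? Box<U>.Of(selector(Value)) : Box<U>.Empty();
        }
    }

    public static class BoxDemo
    {
        public static Box<string> Describe(int price)
        {
            // Translated by the compiler into
            // Box<int>.Of(price).Where(p => p < 30).Select(p => "cheap: " + p)
            return from p in Box<int>.Of(price)
                   where p < 30
                   select "cheap: " + p;
        }
    }
}
```

No using System.Linq; directive is needed here: the compiler finds the instance methods directly on Box<T>.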

Everything we covered in the first part of this section is the foundation we needed to start creating custom implementations of the basic query operators. We are now ready to see some examples.

To give you a good overview of how it’s possible to implement the LINQ query expression pattern, here’s what we’re going to demonstrate next:

  • We’ll show you examples of generic implementation as well as nongeneric.
  • We’ll build operators that work on IEnumerable<T> as well as operators that work on other kinds of objects.
  • We’ll build operators that receive delegates as well as operators that receive expression trees.
  • Some of our operators will be defined as extension methods, some as instance methods.
  • To keep things simple, we’ll provide implementations of Where and Select only.

Let’s jump right into our examples.

12.3.3. Example 1: tracing standard query operators’ execution

In our first example, we’ll create custom implementations of the Where and Select operators. Our methods will just delegate the processing to the standard Enumerable.Where and Enumerable.Select implementations.

Listing 12.8 shows two operators implemented in a class named CustomImplementation inside the LinqInAction.Extensibility namespace.

Listing 12.8. Custom implementations of Where and Select with the standard generic signatures (CustomImplementation.csproj)
using System;
using System.Collections.Generic;
using System.Linq;

namespace LinqInAction.Extensibility
{
  public static class CustomImplementation
  {
    public static IEnumerable<TSource> Where<TSource>(
      this IEnumerable<TSource> source,
      Func<TSource, Boolean> predicate)
    {
      Console.WriteLine("in CustomImplementation.Where<TSource>");

      return Enumerable.Where(source, predicate);
    }

    public static IEnumerable<TResult> Select<TSource, TResult>(
      this IEnumerable<TSource> source,
      Func<TSource, TResult> selector)
    {
      Console.WriteLine(
        "in CustomImplementation.Select<TSource, TResult>");
      return Enumerable.Select(source, selector);
    }
  }
}

In order to use these new implementations of the two operators, all we need to do is import the LinqInAction.Extensibility namespace instead of the System.Linq namespace:

//using System.Linq;
using LinqInAction.Extensibility;
using LinqInAction.LinqBooks.Common;

class TestCustomImplementation
{
  static void Main()
  {
    var books =
      from book in SampleData.Books
      where book.Price < 30
      select book.Title;

    ObjectDumper.Write(books);
  }
}

Of course, executing this program will display our trace information:

in CustomImplementation.Where<TSource>
in CustomImplementation.Select<TSource, TResult>
Funny Stories
LINQ rules
Bonjour mon Amour

That’s it for our first example. You’ve just seen how to provide your own implementation of the Where and Select operators. Here, we’ve simply added some trace information, which can be useful if you want to better understand how queries work. Of course, you could do something completely different and more useful. Maybe you could rewrite the basic query operators to improve their performance? Let’s make that a challenge for you. Please let us know if you can imagine ways to do that!

Before moving on to our next example that shows another custom implementation of the basic query operators, we’d like to point out that the mechanism we’ve just demonstrated comes with a limitation that we’ll explain in the next section.

12.3.4. Limitation: query expression collision

There is an important limitation you need to keep in mind when implementing the query expression pattern: You cannot implement one or two operators and mix them with the default ones if the signatures of your implementations and the default ones are the same. This is due to the way extension methods are resolved.

Let’s say we change our query expression to sort the results:

var query =
  from book in SampleData.Books
  where book.Price < 30
  orderby book.Title
  select book.Title;

As you can see, a new query operator gets involved: OrderBy. The problem is that because we provide implementations only for Where and Select, and we no longer import System.Linq, the compiler complains that it can’t find an implementation for OrderBy:

'System.Collections.Generic.IEnumerable<LinqInAction.LinqBooks.Common.Book>'
does not contain a definition for 'OrderBy' and no extension method 'OrderBy'
accepting a first argument of type
'System.Collections.Generic.IEnumerable<LinqInAction.LinqBooks.Common.Book>'
could be found (are you missing a using directive or an assembly reference?)

We wanted new implementations of Where and Select, but we may not be interested in providing custom implementations for OrderBy and the other operators at this time. The natural reflex is to reuse the standard implementations to keep the default behavior. To do this, you need to import the System.Linq namespace in your code file in addition to our own namespace. If you try that, however, the compiler reports a conflict because it doesn’t know how to choose between the implementations of the Where and Select operators from our namespace and the ones from the System.Linq namespace. Here is what the compiler errors say:

error CS1940: Multiple implementations of the query pattern were found for
source type 'LinqInAction.LinqBooks.Common.Book[]'.  Ambiguous call to
'Where'.

error CS1940: Multiple implementations of the query pattern were found for
source type 'System.Collections.Generic.IEnumerable<LinqInAction.LinqBooks.Common.Book>'.
Ambiguous call to 'Select'.

We could call this a namespace collision. The way extension methods are resolved makes handling multiple extension methods with the same signature in the same scope difficult.

Unfortunately, there is no easy way to remove the ambiguity in this case. This means that in a given file, you use either only the operators you’ve implemented or only those from System.Linq. One option would be to change our versions of the operators to work with more precise types such as IEnumerable<Book> instead of IEnumerable<T>, but this would require creating an implementation for each type we want to deal with (Publisher, Author, and so on), which quickly becomes tedious.

 

Note

Sometimes the compiler silently picks one of the available operators. For example, if the replacement operators are declared in the same namespace as the calling code, the compiler finds them first and never considers the implementation from System.Linq.Enumerable, so there is no conflict. This situation is rare in practice, however, because replacement operators usually live in a different namespace from the code that uses them.
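The following sketch illustrates the rule described in this note; the names SameNamespaceSketch and TracingOperators are hypothetical. Even though System.Linq is imported at the top of the file, the compiler finds TracingOperators.Where first because it is declared in the same namespace as the calling code. For OrderBy, which that namespace doesn’t provide, the lookup continues outward and finds System.Linq.

```csharp
using System;
using System.Collections.Generic;
using System.Linq; // imported at file level, yet its Where is not used below

namespace SameNamespaceSketch
{
    // Hypothetical replacement operator declared in the caller's namespace.
    public static class TracingOperators
    {
        public static readonly List<string> Log = new List<string>();

        public static IEnumerable<T> Where<T>(
            this IEnumerable<T> source, Func<T, bool> predicate)
        {
            Log.Add("TracingOperators.Where");
            // Delegate to the standard implementation through an explicit static call.
            return Enumerable.Where(source, predicate);
        }
    }

    public static class Client
    {
        public static List<int> Run()
        {
            int[] prices = { 45, 20, 12, 35 };
            var query = from p in prices
                        where p < 30   // found in SameNamespaceSketch first
                        orderby p      // not found here, so System.Linq's OrderBy
                        select p;
            return query.ToList();
        }
    }
}
```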

 

In the usual case, as soon as you import the System.Linq namespace (just to get access to types such as IGrouping<TKey, TElement> or ILookup<TKey, TElement>, for example), you simply cannot use your own reimplementations of the standard query operators, because they conflict with the implementations provided by the System.Linq.Enumerable class.

 

Trick

One way to get access to the types declared in the System.Linq namespace, such as IGrouping<TKey, TElement> or ILookup<TKey, TElement>, without importing it is to use complete type names (types prefixed by their namespace). For example, if you don’t add using System.Linq; at the top of your C# file, you can write System.Linq.IGrouping<...> without creating a namespace collision. Note that in the released .NET Framework 3.5, the Func delegate types are declared in the System namespace, so using System; is enough to reach them.

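You can also use a namespace alias for long namespaces. For example, if you add using SysLinq = System.Linq; at the top of your C# file instead of using System.Linq;, you can use SysLinq.IGrouping<...> to reference that type. An alias gives you access to the namespace’s types but does not bring its extension methods into scope, which is why it doesn’t cause a collision.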
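As a sketch of the alias approach, here is a self-contained file with hypothetical TracingOps operators. The client namespace imports TracingOps and aliases System.Linq as SysLinq. The query uses the tracing Where and Select without ambiguity, while System.Linq members remain reachable through the alias, but only through explicit calls such as SysLinq.Enumerable.ToLookup, because the alias doesn’t import extension methods.

```csharp
using System;
using System.Collections.Generic;

namespace TracingOps
{
    // Hypothetical replacement operators living in their own namespace.
    public static class Operators
    {
        public static readonly List<string> Log = new List<string>();

        public static IEnumerable<T> Where<T>(
            this IEnumerable<T> source, Func<T, bool> predicate)
        {
            Log.Add("Where");
            return System.Linq.Enumerable.Where(source, predicate); // fully qualified
        }

        public static IEnumerable<R> Select<T, R>(
            this IEnumerable<T> source, Func<T, R> selector)
        {
            Log.Add("Select");
            return System.Linq.Enumerable.Select(source, selector);
        }
    }
}

namespace AliasClient
{
    using TracingOps;            // imports the tracing extension methods
    using SysLinq = System.Linq; // alias: types are reachable, extension methods are not imported

    public static class Demo
    {
        public static List<string> Run()
        {
            string[] titles = { "Funny Stories", "LINQ rules", "C# on Rails" };

            // Resolves to TracingOps.Operators.Where and Select: no ambiguity.
            IEnumerable<string> query =
                from t in titles
                where t.Length < 12
                select t.ToUpperInvariant();

            // System.Linq type and method used through the alias, with an explicit call.
            SysLinq.ILookup<int, string> byLength =
                SysLinq.Enumerable.ToLookup(query, s => s.Length);
            return new List<string>(byLength[10]);
        }
    }
}
```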

 

Let’s now move on to a second example.

12.3.5. Example 2: nongeneric, domain-specific operators

In this new example, we’ll create another custom implementation of the basic query operators that will show you how the query expression pattern can be implemented by domain-specific query operators.

You’ve just seen in the previous example that you can provide your own implementations of the basic query operators by creating extension methods for the IEnumerable<T> type. It’s interesting to note that you may also create query operators that work on an enumeration of a specific type and not just on a generic enumeration.

Instead of creating an extension method for IEnumerable<T>, you can create an extension method for IEnumerable<Book>. This allows you to transparently use a custom implementation of the query operators for Book objects while using the standard implementation of the query operators for objects of other types. This can be used as a workaround for the limitation we presented in the previous section, but domain-specific operators can also be useful in their own right.

Here, we’ll create implementations of the Where and Select operators that work on Book objects. We’ll adapt the generic implementations we provided in listing 12.8 as our first example and use the fact that we work with Book objects to display the title of each book that the operators process.

Listing 12.9 shows our domain-specific implementations.

Listing 12.9. Domain-specific implementations of Where and Select (DomainSpecificOperators.cs)
using System;
using System.Collections.Generic;
using System.Linq;

using LinqInAction.LinqBooks.Common;

namespace LinqInAction.Extensibility
{
  static class DomainSpecificOperators
  {
    public static IEnumerable<Book> Where(
      this IEnumerable<Book> source,
      Func<Book, Boolean> predicate)
    {
      foreach (Book book in source)
      {
        Console.WriteLine(
          "processing book \"{0}\" in " +
          "DomainSpecificOperators.Where",
          book.Title);
        if (predicate(book))
          yield return book;
      }
    }

    public static IEnumerable<TResult> Select<TResult>(
      this IEnumerable<Book> source, Func<Book, TResult> selector)
    {
      foreach (Book book in source)
      {
        Console.WriteLine(
          "processing book \"{0}\" in " +
          "DomainSpecificOperators.Select<TResult>",
          book.Title);
        yield return selector(book);
      }
    }
  }
}

Let’s reuse the same query as in our first example:

using LinqInAction.Extensibility;
using LinqInAction.LinqBooks.Common;

static class TestDomainSpecificOperators
{
  static void Main()
  {
    var books =
      from book in SampleData.Books
      where book.Price < 30
      select book.Title;

    ObjectDumper.Write(books);
  }
}

When executed, this program outputs the following kind of results:

processing book "Funny Stories" in DomainSpecificOperators.Where
processing book "Funny Stories" in DomainSpecificOperators.Select<TResult>
Funny Stories
processing book "LINQ rules" in DomainSpecificOperators.Where
processing book "LINQ rules" in DomainSpecificOperators.Select<TResult>
LINQ rules
processing book "C# on Rails" in DomainSpecificOperators.Where
processing book "All your base are belong to us" in DomainSpecificOperators.Where
processing book "Bonjour mon Amour" in DomainSpecificOperators.Where
processing book "Bonjour mon Amour" in DomainSpecificOperators.Select<TResult>
Bonjour mon Amour

The trace information in these results shows which books are processed by each operator.

In comparison to generic operators, domain-specific operators know the types they’re working on. This allows us to access specific members, like the Title property in our example.

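Also, the limitation we presented in the previous section does not exist with this kind of operator. Domain-specific operators can be used in combination with the default implementation of the other operators. When both our namespace and System.Linq are imported, C# overload resolution prefers the non-generic method over the generic one, or the method whose parameter types are more specific (IEnumerable<Book> rather than IEnumerable<TSource>), so our Where and Select are picked for Book sequences instead of causing an ambiguity.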

This time, we can use an orderby clause in our query, although we didn’t provide a custom implementation for the OrderBy operator:

var query =
  from book in SampleData.Books
  where book.Price < 30
  orderby book.Title
  select book.Title;

The only thing you need to do for this to work is to import both our operators’ namespace (LinqInAction.Extensibility) and the System.Linq namespace.
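Here is a self-contained sketch of this combination. The Book class and BookOperators are simplified, hypothetical stand-ins for the book’s types, and both namespaces are imported inside the client namespace so that they sit at the same level, as in the scenario above. The Book-specific Where is chosen for the where clause, while OrderBy and Select come from System.Linq.

```csharp
using System;
using System.Collections.Generic;

namespace DomainSpecificSketch
{
    // Simplified stand-in for the book's Book class (assumption).
    public class Book
    {
        public string Title { get; set; }
        public decimal Price { get; set; }
    }

    public static class BookOperators
    {
        public static readonly List<string> Log = new List<string>();

        // Non-generic, Book-specific Where: preferred over Enumerable.Where<TSource>.
        public static IEnumerable<Book> Where(
            this IEnumerable<Book> source, Func<Book, bool> predicate)
        {
            foreach (Book book in source)
            {
                Log.Add("Where: " + book.Title);
                if (predicate(book))
                    yield return book;
            }
        }
    }
}

namespace DomainSpecificClient
{
    using System.Linq;           // standard operators (OrderBy, Select, ToList)
    using DomainSpecificSketch;  // Book-specific Where, imported at the same level

    public static class Demo
    {
        public static List<string> Run()
        {
            Book[] books =
            {
                new Book { Title = "Funny Stories", Price = 25m },
                new Book { Title = "C# on Rails", Price = 40m },
                new Book { Title = "Bonjour mon Amour", Price = 12m }
            };

            var query =
                from book in books
                where book.Price < 30  // BookOperators.Where
                orderby book.Title     // Enumerable.OrderBy
                select book.Title;     // Enumerable.Select
            return query.ToList();
        }
    }
}
```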

 

Warning

As you can see, changing or adding a namespace import can make a serious difference in the behavior of your code. A given query can behave differently if you use System.Linq, LinqInAction.Extensibility, or another namespace!

The design decision of relying on namespace imports to reference extension methods (and query operators) is questionable. Anyway, be careful about this and double-check the namespaces you import when in doubt.

 

After demonstrating that the implementation you provide for the basic query operators doesn’t have to work on generic types, we’ll show you in a third example that your implementation doesn’t necessarily have to work on sequences either.

12.3.6. Example 3: non-sequence operator

This last example of how to provide custom implementations of the operators used in query expressions demonstrates how you can integrate single objects into queries.

The standard query operators provide an implementation of the query expression pattern for IEnumerable<T>. This allows you to work with collections like the array of Book objects provided by our SampleData.Books property. Let’s suppose we want to work with a single object and not a sequence of objects. What can we do?

In the following query, we work on a specific Publisher instance and use it in a way similar to how we’d use a sequence of Publisher objects:

from publisher in SampleData.Publishers[0]
join book in SampleData.Books
  on publisher equals book.Publisher into books
select new { Publisher = publisher.Name, Books = books};

This query seems to make sense, but the problem is that it doesn’t work as is with the standard query operators. This is because the standard query operators are designed to work only with IEnumerable<T>. The particular problem in our case is that the compiler complains that it cannot find GroupJoin for the Publisher type:

error CS1936: Could not find an implementation of the query pattern for
source type 'LinqInAction.LinqBooks.Common.Publisher'.  'GroupJoin' not
found.

Because our join clause is followed by into, the compiler translates it into a call to the GroupJoin operator, which is defined the following way:

public static IEnumerable<TResult>
  GroupJoin<TOuter, TInner, TKey, TResult>(
    this IEnumerable<TOuter> outer,
    IEnumerable<TInner> inner,
    Func<TOuter, TKey> outerKeySelector,
    Func<TInner, TKey> innerKeySelector,
    Func<TOuter, IEnumerable<TInner>, TResult> resultSelector)

You can clearly see that the outer argument is defined as a sequence (IEnumerable<TOuter>). All we need to do to make the compiler happy is provide a new implementation of GroupJoin that accepts a single element as the outer object instead of a sequence.

Listing 12.10 shows how to write this additional version of GroupJoin.

Listing 12.10. Implementation of GroupJoin for a single element (NonSequenceOperator.cs)

All we’ve done here is change the type of the first argument and adapt the code to deal with a single object.
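Based on that description, a minimal sketch of such an operator could look like the following. It is our own illustration rather than the book’s listing: Publisher and Book are simplified stand-ins for the sample types, and the operator returns a one-element sequence so the result can still be consumed like other query results (returning a single TResult would be another reasonable choice).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

namespace NonSequenceSketch
{
    // Simplified stand-ins for the book's sample types (assumption).
    public class Publisher { public string Name { get; set; } }

    public class Book
    {
        public string Title { get; set; }
        public Publisher Publisher { get; set; }
    }

    public static class NonSequenceOperators
    {
        // Same shape as Enumerable.GroupJoin, except that the outer argument
        // is a single TOuter instead of an IEnumerable<TOuter>.
        public static IEnumerable<TResult> GroupJoin<TOuter, TInner, TKey, TResult>(
            this TOuter outer,
            IEnumerable<TInner> inner,
            Func<TOuter, TKey> outerKeySelector,
            Func<TInner, TKey> innerKeySelector,
            Func<TOuter, IEnumerable<TInner>, TResult> resultSelector)
        {
            // Index the inner sequence by key, then match the single outer element.
            ILookup<TKey, TInner> lookup = inner.ToLookup(innerKeySelector);
            yield return resultSelector(outer, lookup[outerKeySelector(outer)]);
        }
    }

    public static class Demo
    {
        public static List<string> Run()
        {
            var manning = new Publisher { Name = "Manning" };
            var other = new Publisher { Name = "Other" };
            Book[] books =
            {
                new Book { Title = "Ajax in Action", Publisher = manning },
                new Book { Title = "Funny Stories", Publisher = other },
                new Book { Title = "LINQ in Action", Publisher = manning }
            };

            // The outer source is a single Publisher, not a sequence.
            var query =
                from publisher in manning
                join book in books on publisher equals book.Publisher into publisherBooks
                select new { Publisher = publisher.Name, Books = publisherBooks };

            var result = query.Single();
            return result.Books.Select(b => result.Publisher + ": " + b.Title).ToList();
        }
    }
}
```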

Until now, we’ve used only simple examples, but you should now be able to code your own query operators. We’re now going to introduce a richer example, in which the operators receive expression trees as their parameters instead of delegates.

12.4. Querying a web service: LINQ to Amazon

In the previous section, we learned how to create custom query operators or implement the standard ones differently. This is a solution that works well for objects in memory, just like what LINQ to Objects offers. In this section, we’ll consider a different scenario: We’ll query a web service. More precisely, we’ll query Amazon to get information about books.

In this section, now that you know a lot about LINQ and how it works, we’re going to create our own LINQ provider: LINQ to Amazon! In the next section, we are going to further refine our implementation.

This example will allow us to address the case of query translation to another query language and remote evaluation. The query we’ll write here will be translated into web queries and run on a remote web server. This requires a different extensibility mechanism than what we’ve seen previously.

12.4.1. Introducing LINQ to Amazon

The example we’ll introduce in this section will use LINQ’s extensibility to allow for language-integrated queries against a book catalog. LINQ queries will be converted to REST URLs, which are supported by Amazon’s web services. These services return XML data, which we’ll be able to convert from XML to .NET objects using LINQ to XML.

A use case for this example could be the following:

  1. Search for books on Amazon using a LINQ query
  2. Display the results in XHTML using LINQ to XML
  3. Import the selected books into a database using LINQ to SQL

The goal here is not to create a complete solution, so we won’t demonstrate all of this at this point. We’ll focus on the first part of the scenario. We already performed this kind of operation in the prior chapters, but this time we’ll create a LINQ provider that can be used to write queries without worrying about the details of the dialog with Amazon.

We won’t support the complete set of operators that could be used in a LINQ query. This would be too complex to present in the context of this book. Anyway, since we are calling an underlying web service, we need to restrict the query possibilities to what the service supports.

For the moment, let’s look at the client code we would like to be able to write:

var query =
  from book in new LinqToAmazon.AmazonBookSearch()
  where
    book.Title.Contains("ajax") &&
    (book.Publisher == "Manning") &&
    (book.Price <= 25) &&
    (book.Condition == BookCondition.New)
  select book;

This piece of code is nearly self-explanatory. This is LINQ to Amazon code. It expresses a query against Amazon, but does not execute it. The query variable contains...a query. The query will be executed when we start enumerating the results.

The following piece of code makes the transition from the LINQ to Amazon world to the familiar LINQ to Objects world:

var sequence = query.AsEnumerable();

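Calling AsEnumerable doesn’t execute the query by itself: it returns the same query typed as IEnumerable<AmazonBook>, so that any operators applied afterward come from LINQ to Objects. The call to Amazon happens when the results are first enumerated. The next steps could be to use LINQ to Objects to perform grouping operations on the results: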

var groups =
  from book in query.AsEnumerable()
  group book by book.Year into years
  orderby years.Key descending
  select new {
    Year = years.Key,
    Books =
      from book in years
      select new { book.Title, book.Authors }
  };

This query can be used for displaying the results like this:

Published in 2006
  Title=Ruby for Rails : Ruby Techniques for Rails Developers   Authors=...
  Title=Wxpython in Action    Authors=...

Published in 2005
  Title=Ajax in Action    Authors=...
  Title=Spring in Action (In Action series)     Authors=...

Published in 2004
  Title=Hibernate in Action (In Action series)    Authors=...
  Title=Lucene in Action (In Action series)     Authors=...

Here is the code that produces this kind of results:

foreach (var group in groups)
{
  Console.WriteLine("Published in " + group.Year);
  foreach (var book in group.Books)
  {
    Console.Write("  ");
    ObjectDumper.Write(book);
  }
  Console.WriteLine();
}

What a great way to query a catalog of books! Don’t you think that this code is comprehensible and clearly expresses the intention? It’s certainly better than having to construct a web request and having to know all the details of the Amazon API.

Let’s see what’s needed to implement LINQ to Amazon.

12.4.2. Requirements

This time, the data we’ll query will not be in memory. When the data is in memory, we can query it incrementally and retrieve the results one by one.

In our now-classic LINQ to Objects example, which keeps only the books cheaper than $30, each time we perform an iteration in foreach, a new result is pulled from the original list through our query processing.

Here is the detail of what can happen when the program is executed:

  1. First iteration of the foreach loop

    1. Is the first book cheaper than $30? No.
    2. Is the second book cheaper than $30? Yes.
    3. Process the second book.
  2. Second iteration

    1. Is the third book cheaper than $30? Yes.
    2. Process the third book.
  3. Third iteration

    1. Is the fourth book cheaper than $30? No.
    2. Etc.
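The pull model described in these steps can be observed with a small trace. This sketch uses plain decimal prices instead of the SampleData books (a simplification) and records when Where tests each price and when the foreach loop processes it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

namespace PullModelSketch
{
    public static class Demo
    {
        // Records the order in which the filter and the consumer see each item.
        public static List<string> Run()
        {
            var log = new List<string>();
            decimal[] prices = { 45m, 20m, 12m, 35m };

            IEnumerable<decimal> cheap = prices.Where(p =>
            {
                log.Add("test " + p);   // evaluated lazily, one item at a time
                return p < 30m;
            });

            foreach (decimal p in cheap)
                log.Add("process " + p); // runs before the next item is tested

            return log;
        }
    }
}
```

The resulting log interleaves filtering and processing: test 45, test 20, process 20, test 12, process 12, test 35.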

As you can see, deferred query execution implies that we work incrementally on the original data source, pulling one item at a time. In our new example, we’ll call a web service, so we can’t rely on the same kind of processing. We want to make a query over the web only once, and we don’t want to retrieve a complete list we would filter locally. Instead, we want the web service to return only those results we are interested in.

This requires the following steps:

  1. As a developer, we express a query using LINQ.
  2. At run-time, the query is translated into something the web service can understand.
  3. The web service is called and returns the results.

The key point here is that we need the web query to be completely defined before we can make the call.

12.4.3. Implementation

We’ll now start to write the code for creating LINQ to Amazon. Before getting to the details of the implementation code, let’s describe what we need to do in order to be able to use LINQ with Amazon.

First, we’ll work with books, just like in our other examples. The difference though is that a book described by Amazon is not the same as what the Book class models. For the sake of simplicity, we’ll define an AmazonBook class that represents a book as returned by Amazon’s web services:

public class AmazonBook
{
  public IList<String> Authors { get; set; }
  public BookCondition Condition { get; set; }
  public String        Isbn { get; set; }
  public UInt32        PageCount { get; set; }
  public String        Publisher { get; set; }
  public Decimal       Price { get; set; }
  public String        Title { get; set; }
  public UInt32        Year { get; set; }
}

 

Note

Here we use auto-implemented properties, a new feature of C# 3.0. We used this feature in chapters 2 and 6.

 

You can see that this class defines the members we use in our query (Title, Publisher, Price, and Condition), as well as others we’ll use later for display. Condition is of type BookCondition, which is just an enumeration defined like this:

public enum BookCondition {All, New, Used, Refurbished, Collectible}

The next and main thing we have to do is define the AmazonBookSearch class we’ll use to perform the query.

An instance of this class will represent a given query. This is why it should contain the criteria we specify in the where clause of our query. For clarity and reusability, we created the AmazonBookQueryCriteria class, which looks like this:

class AmazonBookQueryCriteria
{
  public BookCondition? Condition { get; set; }
  public Decimal? MaximumPrice { get; set; }
  public String Publisher { get; set; }
  public String Title { get; set; }
}

AmazonBookSearch contains an instance of AmazonBookQueryCriteria. Here is the first version of the AmazonBookSearch class:

public class AmazonBookSearch
{
  private AmazonBookQueryCriteria _Criteria;
}

As it stands, this class is useless. To be able to use an instance of AmazonBookSearch in a query expression, we need to provide the accompanying Where and Select query operators. For a change, we won’t create these operators as extension methods, but instead as instance methods (we used extension methods for all the examples in sections 12.2 and 12.3). This is also supported by the query expression pattern.

Here is how we’ll write the Where and Select operators:

In both methods, we just return the current AmazonBookSearch instance because we are still working on the same query.

You should notice an important thing here: Our operators are not receiving delegates as with our previous examples, but instances of the Expression<TDelegate> class. As you saw in chapter 3, the System.Linq.Expressions.Expression<TDelegate> class can be used to retrieve an expression tree. In operators that receive a delegate as a parameter, we can’t really do much more than execute the code the delegate points to. In comparison, the expression tree we receive in Where describes what is written in the where clause of a query as data instead of code. The point is that we’ll be able to analyze the predicate expression tree received as a parameter by the Where method to extract the criteria specified in the query.
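To see this mechanism in isolation, here is a minimal, self-contained sketch. BookSearch and the trimmed-down AmazonBook are hypothetical stand-ins, not the LinqToAmazon classes: Where is an instance method that receives the where clause as an Expression<Func<AmazonBook, bool>> and simply stores it, so the predicate can later be inspected as data.

```csharp
using System;
using System.Linq.Expressions;

namespace ExpressionOperatorSketch
{
    // Trimmed-down stand-in for AmazonBook (assumption).
    public class AmazonBook
    {
        public string Title { get; set; }
        public decimal Price { get; set; }
    }

    // Hypothetical query object: no IEnumerable<T>, operators are instance methods.
    public class BookSearch
    {
        public Expression<Func<AmazonBook, bool>> Predicate { get; private set; }

        // Receives the where clause as an expression tree, not as a delegate.
        public BookSearch Where(Expression<Func<AmazonBook, bool>> predicate)
        {
            Predicate = predicate;
            return this; // still the same query
        }

        // Supports only the identity projection (select book).
        public BookSearch Select(Expression<Func<AmazonBook, AmazonBook>> selector)
        {
            return this;
        }
    }

    public static class Demo
    {
        public static BookSearch Build()
        {
            return from book in new BookSearch()
                   where book.Title.Contains("ajax") && book.Price <= 25
                   select book;
        }
    }
}
```

Because the where clause is followed by the identity projection select book, the compiler omits the Select call for this query; Select would only be called for a query such as from book in new BookSearch() select book.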

The next logical step is to code the AmazonBookExpressionVisitor class used in Where. This class is used to process an expression tree and extract the query criteria it contains. Before doing so, it’s important to get an idea of what the expression tree contains. An expression tree is a hierarchy of expressions. Listing 12.11 shows the complete hierarchy received by the Where method.

Listing 12.11. Sample expression tree generated for a LINQ to Amazon query

If you look closely at this tree, you should be able to locate the criteria we’ve specified in our query: the restriction on the title, the filter on the publisher, the price limit, and the book condition. As a reminder, here is the query for which this expression tree is generated:

from book in new LinqToAmazon.AmazonBookSearch()
where
  book.Title.Contains("ajax") &&
  (book.Publisher == "Manning") &&
  (book.Price <= 25) &&
  (book.Condition == BookCondition.New)
select book;

The ProcessExpression method of the AmazonBookExpressionVisitor class should basically walk through the expression tree to extract information. Here we’ll implement the Visitor design pattern to find all the criteria the expression tree contains.

Here is the main method of the AmazonBookExpressionVisitor class, VisitExpression:

private void VisitExpression(Expression expression)
{
  if (expression.NodeType == ExpressionType.AndAlso)
  {
    VisitAndAlso((BinaryExpression)expression);
  }
  else if (expression.NodeType == ExpressionType.Equal)
  {
    VisitEqual((BinaryExpression)expression);
  }
  else if (expression.NodeType == ExpressionType.LessThanOrEqual)
  {
    VisitLessThanOrEqual((BinaryExpression)expression);
  }
  else if (expression is MethodCallExpression)
  {
    VisitMethodCall((MethodCallExpression)expression);
  }
  else if (expression is LambdaExpression)
  {
    VisitExpression(((LambdaExpression)expression).Body);
  }
}

We won’t detail every method here. You can refer to the complete source code accompanying this book to see how all these methods are implemented. Just to give you an idea, here is the VisitAndAlso method:

private void VisitAndAlso(BinaryExpression andAlso)
{
  VisitExpression(andAlso.Left);
  VisitExpression(andAlso.Right);
}

Here is the VisitEqual method, which handles the book.Publisher == "xxx" and book.Condition == BookCondition.* criteria:
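A minimal, self-contained sketch of this kind of visitor follows, including an equality handler in the spirit of VisitEqual. Its types (CriteriaVisitor, Criteria, and the trimmed-down AmazonBook) are illustrative assumptions, not the LinqToAmazon source. Two details are worth noting: the compiler typically represents a comparison with an enum value by wrapping the member in a Convert node, so the sketch skips such nodes; and values are obtained by compiling and invoking the non-book side of each comparison, which also works when a criterion uses a local variable instead of a literal.

```csharp
using System;
using System.Linq.Expressions;

namespace CriteriaVisitorSketch
{
    public enum BookCondition { All, New, Used, Refurbished, Collectible }

    // Trimmed-down stand-ins for AmazonBook and AmazonBookQueryCriteria (assumption).
    public class AmazonBook
    {
        public string Title { get; set; }
        public string Publisher { get; set; }
        public decimal Price { get; set; }
        public BookCondition Condition { get; set; }
    }

    public class Criteria
    {
        public string Title { get; set; }
        public string Publisher { get; set; }
        public decimal? MaximumPrice { get; set; }
        public BookCondition? Condition { get; set; }
    }

    // Hypothetical visitor: walks the predicate and records the supported criteria.
    public class CriteriaVisitor
    {
        private readonly Criteria _criteria = new Criteria();

        public Criteria Extract(Expression<Func<AmazonBook, bool>> predicate)
        {
            Visit(predicate.Body);
            return _criteria;
        }

        private void Visit(Expression e)
        {
            switch (e.NodeType)
            {
                case ExpressionType.AndAlso:          // a && b: visit both sides
                    var andAlso = (BinaryExpression)e;
                    Visit(andAlso.Left);
                    Visit(andAlso.Right);
                    break;
                case ExpressionType.Equal:            // book.X == value
                    VisitEqual((BinaryExpression)e);
                    break;
                case ExpressionType.LessThanOrEqual:  // book.Price <= value
                    var lessOrEqual = (BinaryExpression)e;
                    if (MemberName(lessOrEqual.Left) == "Price")
                        _criteria.MaximumPrice = Convert.ToDecimal(Evaluate(lessOrEqual.Right));
                    break;
                case ExpressionType.Call:             // book.Title.Contains(value)
                    var call = (MethodCallExpression)e;
                    if (call.Method.Name == "Contains" && MemberName(call.Object) == "Title")
                        _criteria.Title = (string)Evaluate(call.Arguments[0]);
                    break;
                default:
                    throw new NotSupportedException("Unsupported node: " + e.NodeType);
            }
        }

        // Handles book.Publisher == "..." and book.Condition == BookCondition.X.
        private void VisitEqual(BinaryExpression equal)
        {
            object value = Evaluate(equal.Right);
            switch (MemberName(equal.Left))
            {
                case "Publisher":
                    _criteria.Publisher = (string)value;
                    break;
                case "Condition":
                    // The enum value may arrive as its underlying int.
                    _criteria.Condition = (BookCondition)Convert.ToInt32(value);
                    break;
            }
        }

        // Name of the member accessed on the book, skipping Convert nodes.
        private static string MemberName(Expression e)
        {
            while (e != null && e.NodeType == ExpressionType.Convert)
                e = ((UnaryExpression)e).Operand;
            var member = e as MemberExpression;
            return member == null ? null : member.Member.Name;
        }

        // Evaluates a sub-expression that doesn't depend on the book parameter.
        private static object Evaluate(Expression e)
        {
            return Expression.Lambda(e).Compile().DynamicInvoke();
        }
    }
}
```

Applied to the predicate of our sample query, Extract fills Title with "ajax", Publisher with "Manning", MaximumPrice with 25, and Condition with BookCondition.New.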

After the execution of AmazonBookExpressionVisitor.ProcessExpression, our AmazonBookSearch instance has collected all the criteria provided in the LINQ query. At this point, the query has been parsed, but hasn’t been executed. No call has been made to Amazon.

As usual, we want the execution to happen when we start enumerating the results of the query. This is why we’ll make AmazonBookSearch implement IEnumerable<AmazonBook>. Here is how to code the two necessary methods:

As you can see, all the processing is delegated to a helper class, AmazonHelper, which knows how to build an Amazon URL and how to call Amazon and convert the results into a sequence of AmazonBook objects.
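The deferred behavior described above can be sketched independently of Amazon. In this hypothetical DeferredSearch<T> class (not the book’s AmazonBookSearch), the remote call is replaced by a fetch delegate supplied by the caller: building the object does nothing, and the fetch runs only when GetEnumerator is called. Both GetEnumerator methods are needed because IEnumerable<T> inherits the nongeneric IEnumerable interface.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

namespace DeferredSearchSketch
{
    // Hypothetical search object: no work happens until enumeration starts.
    public class DeferredSearch<T> : IEnumerable<T>
    {
        private readonly Func<IEnumerable<T>> _fetch;

        public DeferredSearch(Func<IEnumerable<T>> fetch)
        {
            _fetch = fetch;
        }

        public IEnumerator<T> GetEnumerator()
        {
            // In LINQ to Amazon, this is where the URL would be built from the
            // collected criteria and the web request performed.
            return _fetch().GetEnumerator();
        }

        // Nongeneric version required by IEnumerable; delegates to the generic one.
        IEnumerator IEnumerable.GetEnumerator()
        {
            return GetEnumerator();
        }
    }

    public static class Demo
    {
        // Returns { calls before enumeration, calls after, items enumerated }.
        public static int[] Run()
        {
            int calls = 0;
            var search = new DeferredSearch<string>(() =>
            {
                calls++;
                return new[] { "Ajax in Action", "LINQ in Action" };
            });

            int before = calls; // nothing fetched yet
            int count = 0;
            foreach (string title in search)
                count++;
            return new[] { before, calls, count };
        }
    }
}
```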

Here is the AmazonHelper.BuildUrl method, which takes the criteria and returns a URL that uses them:

// Requires: using System; using System.Globalization; using System.Web;
// URL_AWSECommerceService is a constant holding the base service URL,
// defined elsewhere in AmazonHelper.
internal static String BuildUrl(AmazonBookQueryCriteria criteria)
{
  if (criteria == null)
    throw new ArgumentNullException("criteria");

  // Append one URL parameter for each criterion found in the query.
  String url = URL_AWSECommerceService;
  if (!String.IsNullOrEmpty(criteria.Title))
    url += "&Title=" + HttpUtility.UrlEncode(criteria.Title);
  if (!String.IsNullOrEmpty(criteria.Publisher))
    url += "&Publisher=" + HttpUtility.UrlEncode(criteria.Publisher);
  if (criteria.Condition.HasValue)
    url += "&Condition=" +
             HttpUtility.UrlEncode(criteria.Condition.ToString());
  // The maximum price is sent multiplied by 100 (expressed in cents).
  if (criteria.MaximumPrice.HasValue)
    url += "&MaximumPrice=" +
             HttpUtility.UrlEncode(
              (criteria.MaximumPrice * 100)
                .Value.ToString(CultureInfo.InvariantCulture)
             );

  return url;
}

The second method of the AmazonHelper class is PerformWebQuery. This method performs the actual call to Amazon and builds the results by parsing the web response using LINQ to XML:
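The parsing half of PerformWebQuery can be sketched with LINQ to XML. The XML shape used here (Items, Item, Title, Publisher, Author, and PublicationDate elements with no XML namespace) is a simplified assumption for illustration, not the actual Amazon response schema, which is richer and uses its own XML namespace. The web request itself is left out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

namespace XmlParsingSketch
{
    // Trimmed-down stand-in for AmazonBook (assumption).
    public class AmazonBook
    {
        public IList<string> Authors { get; set; }
        public string Title { get; set; }
        public string Publisher { get; set; }
        public uint Year { get; set; }
    }

    public static class ResponseParser
    {
        // Converts a (hypothetical, simplified) XML response into AmazonBook objects.
        public static List<AmazonBook> Parse(string xml)
        {
            XDocument doc = XDocument.Parse(xml);
            return (from item in doc.Descendants("Item")
                    select new AmazonBook
                    {
                        // Explicit string conversion yields null for a missing element.
                        Title = (string)item.Element("Title"),
                        Publisher = (string)item.Element("Publisher"),
                        Authors = item.Elements("Author").Select(a => a.Value).ToList(),
                        Year = ParseYear((string)item.Element("PublicationDate"))
                    }).ToList();
        }

        // Takes the year from a date such as "2005-10-01"; returns 0 if unavailable.
        private static uint ParseYear(string date)
        {
            uint year;
            return date != null && date.Length >= 4 &&
                   uint.TryParse(date.Substring(0, 4), out year) ? year : 0;
        }
    }
}
```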

That’s all there is to it. You should now be able to use LINQ to Amazon queries! Keep in mind that this is a straightforward implementation. This implementation supports only simple queries and is likely to fail if you try to use it with different queries. Feel free to build on this example and improve it!

You can take a look at the complete source code for the details of the implementation (look for the LinqToAmazon project).

 

Note

In order to use the Amazon.com web services and test this example fully, you need to register with the Amazon Web Services program. After registering with Amazon, you’ll be assigned an access key. Edit the AmazonHelper.cs file and replace INSERT YOUR AWS ACCESS KEY HERE with your access key.

 

In some cases, creating additional query operators or reimplementing the standard ones is not enough. In these cases, you may resort to another extensibility mechanism offered by LINQ. We’ll show you an example in the next section.

12.5. IQueryable and IQueryProvider: LINQ to Amazon advanced edition

In the previous version of our LINQ to Amazon example, we implemented the Where operator in such a way that it receives the criteria expressed in our query as an expression tree. All the processing happens in this operator by analyzing the expression tree.

Our first LINQ to Amazon implementation is far from being complete. We created it to take into account only one call to the Where operator. If we were to write a complete implementation of LINQ to Amazon, we’d have to resort to an advanced technique. This technique relies on the System.Linq.IQueryable<T> interface. This is the technique used for LINQ to SQL to query data from a relational database. The use of expression trees and the IQueryable<T> interface enables rich queries, obtained by numerous calls to query operators, to be converted into a single SQL query that gets sent to the database.

In this section, we’ll create a new implementation of LINQ to Amazon that relies on IQueryable<T>. Before doing so, let’s spend some time learning more about IQueryable<T>.

12.5.1. The IQueryable and IQueryProvider interfaces

Let’s look at the query we used with the first implementation of LINQ to Amazon:

var query =
  from book in new LinqToAmazon.AmazonBookSearch()
  where
    book.Title.Contains("ajax") &&
    (book.Publisher == "Manning") &&
    (book.Price <= 25) &&
    (book.Condition == BookCondition.New)
  select book;

As usual, when the compiler converts such a query expression into calls to query operators, the key thing it looks at is the type of the object the query operates on. In our case, this is an instance of LinqToAmazon.AmazonBookSearch. The compiler notices that AmazonBookSearch provides an implementation of the Where operator, so it translates the where clause into a call to that method. Of course, the real execution happens only later, when the query is enumerated through a call to GetEnumerator.
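The compiler’s choice can be observed with a small sketch. The CapturingSource and CapturingDemo types below are hypothetical and not part of LINQ to Amazon: CapturingSource simply exposes an instance Where method that takes an Expression<Func<int, bool>>. Because that method fits the query expression pattern, the compiler binds the where clause to it, and the predicate arrives as an expression tree that is only recorded. Filtering happens later, when the object is enumerated.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq.Expressions;

// Hypothetical source type: the compiler binds the "where" clause to the
// instance Where method below because it fits the query expression pattern.
public class CapturingSource : IEnumerable<int>
{
    private readonly int[] items;

    public CapturingSource(params int[] items) { this.items = items; }

    // The predicate recorded by Where, kept as an expression tree.
    public Expression<Func<int, bool>> Predicate { get; private set; }

    // Receives the where clause as data and records it; nothing is filtered yet.
    public CapturingSource Where(Expression<Func<int, bool>> predicate)
    {
        Predicate = predicate;
        return this;
    }

    // Deferred execution: the recorded predicate is compiled and applied
    // only when the source is enumerated.
    public IEnumerator<int> GetEnumerator()
    {
        Func<int, bool> test = Predicate == null ? (n => true) : Predicate.Compile();
        foreach (int item in items)
        {
            if (test(item))
                yield return item;
        }
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

public static class CapturingDemo
{
    public static CapturingSource Run()
    {
        // Translated by the compiler into new CapturingSource(1, 2, 3, 4).Where(n => n > 2);
        // the trailing "select n" is degenerate and therefore omitted.
        var query =
            from n in new CapturingSource(1, 2, 3, 4)
            where n > 2
            select n;
        return query;
    }
}
```

Since the degenerate select is dropped, the query variable is the CapturingSource returned by Where, and its Predicate property holds the n > 2 comparison as a tree.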

To be able to support richer queries using the same technique, we would have to implement more operators than just Where. For example, with our first implementation, we get the results in an unspecified order. If we want to sort the results, we can do it locally using LINQ to Objects. If we want to be able to perform the sort operation on the server, we would have to implement the OrderBy operator in addition to Where. We would then be able to retrieve the sort information expressed in the query and transmit it as part of the web query. If the server supports sorting, the results we retrieve would be sorted without having to use LINQ to Objects afterward on the client.

Another thing that our first implementation doesn’t support is retrieving partial information. If you look at our query’s select clause, you’ll notice that we return complete information on books. What if we wanted to retrieve only the titles? It would be more efficient to ask the web server to return only the title of each book instead of the complete information about it. In order to do this, we would have to implement the Select operator.

You can probably see the problem with this approach: the analysis of the query ends up scattered across several places, namely the implementation of each operator. This complicates the analysis of the query and makes optimization more difficult, because no single piece of code sees the query as a whole.

The IQueryable interface was designed to help in situations like this. It allows us to receive all the information contained in a query as one big expression tree instead of having each operator receive partial information. Once the expression tree is ready, it can be analyzed to do whatever we want in response to the query. IQueryable defines the pattern for you to gather up a user’s query and present it to your processing engine as a single expression tree that you can either transform or interpret.

When a query works on an object that implements IQueryable, the query operators that get called do not come from the System.Linq.Enumerable class but from the System.Linq.Queryable class. This class provides the full set of standard query operators required by the LINQ query expression pattern, in versions that take expression trees (Expression<Func<...>>) instead of delegates.

The query operators in the Queryable static class do not actually perform any querying. Instead, they build an expression tree, an instance of System.Linq.Expressions.Expression representing the query to be performed, and pass that Expression object to the provider of the source IQueryable for further processing.

The operators that return sequences, such as Where, OrderBy, or Select, return a new IQueryable whose expression tree augments the source’s tree with a representation of the call to that operator. Thus, when it comes time to evaluate the query, typically because the IQueryable is enumerated, the data source can process the expression tree representing the whole query in one batch. Operators that return a single value, such as Count or First, cannot defer their work this way: they immediately ask the provider to execute the tree built so far.
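This chaining can be observed with LINQ to Objects’ own IQueryable implementation, obtained through AsQueryable. The sketch below (the ChainDemo name is just for illustration) shows that OrderBy runs nothing: it wraps the tree produced by Where, unchanged, as the first argument of a new method-call node, so the outermost node describes the whole query.

```csharp
using System.Linq;
using System.Linq.Expressions;

public static class ChainDemo
{
    public static (IQueryable<int> Filtered, IQueryable<int> Ordered) Build()
    {
        IQueryable<int> source = new[] { 5, 1, 4, 2 }.AsQueryable();

        // Each operator returns a new IQueryable; nothing is filtered or sorted yet.
        IQueryable<int> filtered = source.Where(n => n > 1);
        IQueryable<int> ordered = filtered.OrderBy(n => n);
        return (filtered, ordered);
    }

    // Name of the query operator at the root of a query's expression tree.
    public static string RootOperator(IQueryable query) =>
        ((MethodCallExpression)query.Expression).Method.Name;

    // True when the outer query's tree takes the inner query's tree, unchanged, as its source.
    public static bool Wraps(IQueryable outer, IQueryable inner) =>
        ((MethodCallExpression)outer.Expression).Arguments[0] == inner.Expression;
}
```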

The actual query execution relies on two cooperating types: a class that implements IQueryable, which carries the expression tree, and a class that implements the IQueryProvider interface, which creates and executes queries. We’ll now see how these two types work together and how to implement them.

Getting ready for the implementation

With our first implementation, the queries were applied to an instance of the LinqToAmazon.AmazonBookSearch type. This type implements IEnumerable<AmazonBook>. Here is a sample query using the first implementation:

var query =
  from book in new LinqToAmazon.AmazonBookSearch()
  where
    book.Title.Contains("ajax") &&
    (book.Publisher == "Manning") &&
    (book.Price <= 25) &&
    (book.Condition == BookCondition.New)
  select book;

In the second implementation, we’re going to create new types that implement IQueryable and IQueryProvider. The entry point type will be named AmazonBookQueryProvider. This is the class that will implement IQueryProvider. A second class will provide a generic implementation of IQueryable<T>: the Query<T> class.

Here is how these two classes will allow us to write the same query as earlier using the second implementation:

var provider = new AmazonBookQueryProvider();
var queryable = new Query<AmazonBook>(provider);
var query =
  from book in queryable
  where
    book.Title.Contains("ajax") &&
    (book.Publisher == "Manning") &&
    (book.Price <= 25) &&
    (book.Condition == BookCondition.New)
  select book;

Notice how the query is unchanged. Only the object on which we perform the query is different. Because the query is assigned to an implicitly typed local variable (declared with var), the type of its result isn’t visible in the code, but that type differs between the two implementations. With the first implementation, the type of the result is IEnumerable<AmazonBook>. With the second implementation, the type of the result is IQueryable<AmazonBook>.

As we already explained, the difference is that IEnumerable<T> represents an enumeration, while IQueryable<T> represents a query. An instance of a type that implements IQueryable<T> contains all the information needed to execute a query. Think of it as a description of what you want done when the query is enumerated.
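The difference starts with how a lambda is received. Assigned to a Func<T, bool>, the parameter type of Enumerable.Where, a lambda is compiled into executable code. Assigned to an Expression<Func<T, bool>>, the parameter type of Queryable.Where, the same lambda becomes a data structure that describes the code. A minimal sketch (the LambdaForms name is illustrative):

```csharp
using System;
using System.Linq.Expressions;

public static class LambdaForms
{
    // Compiled into IL: the lambda can only be invoked.
    public static readonly Func<int, bool> AsDelegate = price => price <= 25;

    // Compiled into an expression tree: the lambda can be inspected,
    // translated into another query language, or compiled later.
    public static readonly Expression<Func<int, bool>> AsTree = price => price <= 25;

    // Reads the operator and the constant operand back out of the tree.
    public static string Describe()
    {
        var body = (BinaryExpression)AsTree.Body;
        var limit = (ConstantExpression)body.Right;
        return body.NodeType + " " + limit.Value;
    }
}
```

Describe returns "LessThanOrEqual 25": the comparison that the delegate form can only execute is available to a provider as plain data.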

Overview of IQueryable and IQueryProvider

Before we move on to the implementation, let’s look at how the IQueryable<T> and IQueryProvider interfaces are defined.

Here is the declaration of IQueryable<T>:

interface IQueryable<T> : IEnumerable<T>, IQueryable
{
}

This means that we have to implement the members of the following interfaces:

interface IEnumerable<T> : IEnumerable
{
  IEnumerator<T> GetEnumerator();
}

interface IEnumerable
{
  IEnumerator GetEnumerator();
}

interface IQueryable : IEnumerable
{
  Type ElementType { get; }
  Expression Expression { get; }
  IQueryProvider Provider { get; }
}

The main element you should pay attention to in the interfaces is the Expression property of the IQueryable interface. It gives you the expression that corresponds to the query. The actual query underneath the hood of an IQueryable is an expression tree of LINQ query operators/method calls. This is the part of the IQueryable that your provider must comprehend in order to do anything useful.
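To see what a provider has to comprehend, the tree behind a query can be walked with System.Linq.Expressions.ExpressionVisitor. That class is public starting with .NET Framework 4; in .NET 3.5 it was internal, so providers written for 3.5 typically shipped their own visitor base class. The OperatorLister type below is hypothetical; it simply records every method call it meets.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Records the name of every method call in a query's expression tree, depth first:
// the outermost query operator comes first, followed by calls nested in its arguments,
// including calls made inside lambdas (such as string.Contains).
public class OperatorLister : ExpressionVisitor
{
    public List<string> Calls { get; } = new List<string>();

    protected override Expression VisitMethodCall(MethodCallExpression node)
    {
        Calls.Add(node.Method.Name);
        return base.VisitMethodCall(node); // keep walking into the object and arguments
    }

    public static List<string> Collect(IQueryable query)
    {
        var lister = new OperatorLister();
        lister.Visit(query.Expression);
        return lister.Calls;
    }

    public static List<string> Demo()
    {
        IQueryable<string> query = new[] { "LINQ in Action", "Ajax in Action" }.AsQueryable()
            .Where(title => title.Contains("Ajax"))
            .Select(title => title.ToUpper());
        return Collect(query);
    }
}
```

For the demo query, the list reads Select, Where, Contains, ToUpper: the outermost operator comes first, and the method calls inside the lambdas appear as well, which is exactly the material a provider must translate.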

Note that the IQueryable<T> interface inherits from IEnumerable<T> so that the results of the query it encompasses can be enumerated. Enumeration should force the execution of the expression tree associated with an IQueryable object. At that point, we’ll translate the expression tree into an Amazon web query and make the call to Amazon’s web services. This is the job of the IQueryProvider referenced by an IQueryable instance.
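The deferred nature of this execution can be checked without a custom provider, using LINQ to Objects’ AsQueryable. In the sketch below, the hypothetical Probe.Check method counts how often it runs: building the query calls it zero times, and only enumeration, here through ToList, executes the expression tree.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class Probe
{
    public static int Calls;

    // Hypothetical predicate helper; every invocation is counted.
    public static bool Check(int price)
    {
        Calls++;
        return price <= 25;
    }

    public static (int AfterBuild, int AfterEnumerate, List<int> Results) Run()
    {
        Calls = 0;
        IQueryable<int> query = new[] { 10, 30, 20 }.AsQueryable().Where(p => Check(p));
        int afterBuild = Calls;              // still 0: the tree has only been built
        List<int> results = query.ToList();  // enumeration executes the tree
        return (afterBuild, Calls, results);
    }
}
```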

We have to implement the members of the IQueryProvider interface in order to handle the execution of the queries. Here is how it is declared:

public interface IQueryProvider
{
  IQueryable CreateQuery(Expression expr);
  IQueryable<TElement> CreateQuery<TElement>(Expression expr);
  object Execute(Expression expr);
  TResult Execute<TResult>(Expression expr);
}

As you can see, the IQueryProvider interface contains two groups of methods, one for creating queries and another for executing them. Each group contains both generic and nongeneric overloads. Implementing IQueryProvider may look like a lot of work, but don’t worry: the method that really matters is Execute. It is the entry point into your provider for executing query expressions, and it is the quintessence of your LINQ provider implementation.

Now that you’ve seen what needs to be implemented to create a complete LINQ provider, you may start to wonder whether it’s difficult. Well, it is! You should never consider the creation of a LINQ provider to be an easy task. However, things should be a bit easier after you’ve looked at our sample implementation and seen how the mechanics work. The LINQ to Amazon sample is here to help you take your first steps with IQueryable<T> without too much difficulty. It contains the basics required by every LINQ provider implementation.

Let’s now see how the LINQ to Amazon provider implements IQueryable and IQueryProvider.

12.5.2. Implementation

To implement LINQ to Amazon’s query provider, we reused code provided by Matt Warren from Microsoft on his blog.[2] The code we reuse consists of a generic implementation of IQueryable<T> (the Query<T> class in the Query.cs file) and a base implementation of IQueryProvider (the QueryProvider class in the QueryProvider.cs file).

2 Matt Warren provides an introduction to the implementation of an IQueryable provider, as well as sample source code in his blog. The series of posts is available at the following address: http://blogs.msdn.com/mattwar/archive/2007/08/09/linq-building-an-iqueryable-provider-part-i.aspx

Once you have these classes at hand, what’s left is to create a class that inherits from QueryProvider and provides an implementation for the Execute method, and optionally one for the GetQueryText method. Of course, implementing Execute is the most difficult part, precisely because what a LINQ provider does is execute queries!

In our case, this is not so difficult, as you can see. Here is how we implemented the AmazonBookQueryProvider class:

You can see that the work is greatly simplified because we’d already created the useful helper classes, AmazonBookExpressionVisitor and AmazonHelper, in the previous section.

If we were to rewrite LINQ to SQL, the Execute method would convert the entire expression tree it receives as an argument into an equivalent SQL query and send that query to a database for execution. The LINQ to Amazon implementation instead needs to convert the expression tree into a web request and execute that request.
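A rough idea of such a conversion follows. This is not the book’s AmazonBookExpressionVisitor but a hand-written recursive translator, and the endpoint (https://example.com/books), its parameter names (title, publisher, maxprice), and the SampleBook type are all hypothetical. It handles only the shapes found in this chapter’s sample query: Contains on a string property, == and <= against constant values, combined with &&.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq.Expressions;

// Hypothetical book type used only for this sketch.
public class SampleBook
{
    public string Title { get; set; } = "";
    public string Publisher { get; set; } = "";
    public decimal Price { get; set; }
}

public static class RequestBuilder
{
    // Translates a predicate into a query string for a made-up search endpoint.
    public static string ToUrl(Expression<Func<SampleBook, bool>> predicate)
    {
        var parts = new List<string>();
        Collect(predicate.Body, parts);
        return "https://example.com/books?" + string.Join("&", parts);
    }

    private static void Collect(Expression node, List<string> parts)
    {
        if (node is BinaryExpression both && both.NodeType == ExpressionType.AndAlso)
        {
            // Both sides of && contribute criteria to the same request.
            Collect(both.Left, parts);
            Collect(both.Right, parts);
        }
        else if (node is MethodCallExpression call && call.Method.Name == "Contains" &&
                 call.Object is MemberExpression target)
        {
            // book.Title.Contains("ajax") becomes title=ajax
            Add(parts, target.Member.Name.ToLowerInvariant(), call.Arguments[0]);
        }
        else if (node is BinaryExpression compare && compare.Left is MemberExpression member &&
                 (compare.NodeType == ExpressionType.Equal ||
                  compare.NodeType == ExpressionType.LessThanOrEqual))
        {
            // book.Publisher == "Manning" becomes publisher=Manning; book.Price <= 25m becomes maxprice=25
            string name = member.Member.Name.ToLowerInvariant();
            if (compare.NodeType == ExpressionType.LessThanOrEqual)
                name = "max" + name;
            Add(parts, name, compare.Right);
        }
        else
        {
            // A provider should reject what it cannot translate rather than silently ignore it.
            throw new NotSupportedException("Unsupported expression: " + node);
        }
    }

    private static void Add(List<string> parts, string name, Expression valueNode)
    {
        if (!(valueNode is ConstantExpression constant))
            throw new NotSupportedException("Only constant values are supported: " + valueNode);
        string value = Convert.ToString(constant.Value, CultureInfo.InvariantCulture) ?? "";
        parts.Add(name + "=" + Uri.EscapeDataString(value));
    }

    public static string Demo() =>
        ToUrl(book => book.Title.Contains("ajax") &&
                      book.Publisher == "Manning" &&
                      book.Price <= 25m);
}
```

Values captured from local variables would appear as member accesses on a closure rather than as constants, so this sketch rejects them; a fuller provider would evaluate such subtrees locally before translating.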

We won’t give more details about the implementation here because it would be too long. You should look at the source code accompanying this book to learn more. We recommend that you also refer to Matt Warren’s blog posts to fully understand how to implement a complete LINQ provider.

Before closing this chapter, we think it may be useful to review the execution of a sample query step by step to help you better understand how an implementation of IQueryable<T> works.

12.5.3. What happens exactly

You may wonder how the mechanism enabled by IQueryable<T> works. We’ll now quickly depict this mechanism to satisfy your curiosity.

Let’s consider the following sample query that works with an AmazonBookQueryProvider:

var provider = new AmazonBookQueryProvider();
var queryable = new Query<AmazonBook>(provider);
var query =
  from book in queryable
  where
    book.Title.Contains("ajax") &&
    (book.Publisher == "Manning") &&
    (book.Price <= 25) &&
    (book.Condition == BookCondition.New)
  select book;

Each time a query such as this one is written, the compiler generates the following kind of code:

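var provider = new AmazonBookQueryProvider();
var queryable = new Query<AmazonBook>(provider);
// The lambda is compiled into an Expression<Func<AmazonBook, bool>>, not into a delegate
IQueryable<AmazonBook> query = Queryable.Where<AmazonBook>(queryable,
  book => book.Title.Contains("ajax") && (book.Publisher == "Manning") &&
    (book.Price <= 25) && (book.Condition == BookCondition.New));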

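Queryable.Where is a static method that takes as arguments an IQueryable<T> followed by an expression tree, typed as Expression<Func<T, bool>>. The Queryable.Where method builds a new expression tree that represents the call to Where itself, passes it to the CreateQuery method of the source’s provider, and returns the result.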
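The behavior of Queryable.Where can be approximated in a few lines. The SketchWhere method below is hypothetical and is not the framework’s source code; it only reproduces the pattern just described: describe the call as a MethodCallExpression (the source’s own tree as first argument, the quoted predicate as second) and hand it to the source provider’s CreateQuery method.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;
using System.Reflection;

public static class SketchOperators
{
    public static IQueryable<T> SketchWhere<T>(IQueryable<T> source, Expression<Func<T, bool>> predicate)
    {
        // Closed generic MethodInfo for Queryable.Where<T>, selected through a delegate conversion.
        MethodInfo where =
            new Func<IQueryable<T>, Expression<Func<T, bool>>, IQueryable<T>>(Queryable.Where).Method;

        // Describe the call instead of performing it, then let the provider wrap it in a new query.
        Expression call = Expression.Call(null, where, source.Expression, Expression.Quote(predicate));
        return source.Provider.CreateQuery<T>(call);
    }

    public static int[] Demo()
    {
        IQueryable<int> source = new[] { 1, 2, 3, 4 }.AsQueryable();
        return SketchWhere(source, n => n % 2 == 0).ToArray();
    }
}
```

Because the sketch places the real Queryable.Where method in the tree, any provider that understands Queryable.Where, such as the EnumerableQuery behind AsQueryable in the demo, can execute the result.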

In our case, the source IQueryable<T> is an instance of the Query<AmazonBook> class. The implementation of CreateQuery provided by the base QueryProvider class creates a new Query<AmazonBook> instance that keeps track of the expression tree. Because we don’t support complex queries, CreateQuery is called only once in our case. In richer implementations, CreateQuery could be invoked several times in a row, once per query operator, each call wrapping the previous tree to build a deeper one.

The next operation is the enumeration of the query. Typically, this happens in a foreach loop in which you process the results. Enumerating the query invokes the GetEnumerator method of the Query<AmazonBook> object.

In response to the call to GetEnumerator, the Execute method of the provider is invoked. This is where we parse the expression tree, generate the corresponding web query, call Amazon, and build a list of AmazonBook objects based on the response we get. Finally, we return the list of books as the result of the Execute method, and GetEnumerator returns an enumerator over that list. The query execution is then complete, and the books are ready to be processed.
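The whole sequence can be traced with a deliberately small, self-contained provider. It follows the same split as the classes described above (a generic query class plus a provider), but it is only a sketch, not Matt Warren’s code or the LINQ to Amazon source: the TraceQuery<T>, TracingProvider<TSource>, and TraceDemo names are hypothetical, and instead of calling Amazon, Execute swaps the root of the tree for an in-memory array and lets LINQ to Objects’ EnumerableQuery run it. Unlike the chapter’s sample query, the demo chains Where and OrderBy, so the log shows CreateQuery being called once per operator before Execute runs at enumeration time.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Generic IQueryable<T>: a provider plus the expression tree built so far.
public class TraceQuery<T> : IOrderedQueryable<T>
{
    // Root query: its tree is a constant node referring to this object.
    public TraceQuery(IQueryProvider provider)
    {
        Provider = provider;
        Expression = Expression.Constant(this);
    }

    // Query produced by an operator: holds the tree passed to CreateQuery.
    public TraceQuery(IQueryProvider provider, Expression expression)
    {
        Provider = provider;
        Expression = expression;
    }

    public Type ElementType => typeof(T);
    public Expression Expression { get; }
    public IQueryProvider Provider { get; }

    // Enumeration is what finally asks the provider to execute the tree.
    public IEnumerator<T> GetEnumerator() =>
        Provider.Execute<IEnumerable<T>>(Expression).GetEnumerator();

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

// Logs each call; instead of calling a web service, Execute hands the tree
// to LINQ to Objects' EnumerableQuery over an in-memory collection.
public class TracingProvider<TSource> : IQueryProvider
{
    private readonly IQueryable<TSource> data;

    public TracingProvider(IEnumerable<TSource> items) { data = items.AsQueryable(); }

    public List<string> Log { get; } = new List<string>();

    // Called by Queryable.Where, OrderBy, Select, and so on, once per operator.
    public IQueryable<TElement> CreateQuery<TElement>(Expression expression)
    {
        Log.Add("CreateQuery: " + ((MethodCallExpression)expression).Method.Name);
        return new TraceQuery<TElement>(this, expression);
    }

    public TResult Execute<TResult>(Expression expression)
    {
        Log.Add("Execute");
        // A real provider would translate the tree into a web request here.
        Expression rewritten = new RootSwapper(data.Expression).Visit(expression);
        return data.Provider.Execute<TResult>(rewritten);
    }

    // The nongeneric overloads are omitted from this sketch.
    public IQueryable CreateQuery(Expression expression) => throw new NotSupportedException();
    public object Execute(Expression expression) => throw new NotSupportedException();

    // Replaces the TraceQuery root constant with the in-memory source's own root.
    private sealed class RootSwapper : ExpressionVisitor
    {
        private readonly Expression replacement;
        public RootSwapper(Expression replacement) { this.replacement = replacement; }

        protected override Expression VisitConstant(ConstantExpression node) =>
            node.Value is TraceQuery<TSource> ? replacement : base.VisitConstant(node);
    }
}

public static class TraceDemo
{
    public static (List<string> Log, List<int> Results) Run()
    {
        var provider = new TracingProvider<int>(new[] { 30, 10, 20 });
        var queryable = new TraceQuery<int>(provider);

        // Two operators, so CreateQuery is called twice; nothing runs yet.
        IQueryable<int> query = queryable.Where(p => p <= 25).OrderBy(p => p);

        // ToList enumerates the query: GetEnumerator calls Execute once.
        List<int> results = query.ToList();
        return (provider.Log, results);
    }
}
```

The log produced by TraceDemo.Run reads CreateQuery: Where, CreateQuery: OrderBy, Execute, matching the sequence described above, and the results are 10 and 20.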

That’s all for our LINQ to Amazon example. Implementing IQueryable<T> enables powerful scenarios that integrate LINQ with many different data sources. This extensibility option is not easy to implement, which is why, if you plan on creating your own implementation, we recommend studying other implementations to make sure you fully understand how IQueryable works.

12.6. Summary

In this chapter, we presented options available to extend LINQ and adapt it to your needs. The sample extensions we demonstrated here are simple. It will be interesting to see how many real-life alternate implementations and extensions are released as people find flaws or shortcomings in the default set.

LINQ’s extensibility is what allows it to offer support for several data sources. It’s also what will allow wide adoption of LINQ by developers in all layers of applications. As LINQ gets adopted, we are likely to see more and more framework providers adding LINQ support to their products to offer their users the benefits of strongly typed and standard querying capabilities.
