Chapter 12. LINQ beyond collections

This chapter covers

  • LINQ to SQL
  • IQueryable and expression tree queries
  • LINQ to XML
  • Parallel LINQ
  • Reactive extensions for .NET
  • Writing your own operators

Suppose an alien visited you and asked you to describe “culture.” How could you capture the diversity of human culture in a short space of time? You may decide to spend that time showing him culture rather than just describing it in the abstract: a visit to a New Orleans jazz club, opera in La Scala, the Louvre gallery in Paris, a Shakespeare play in Stratford-upon-Avon, and so on.

Would this alien know everything about culture afterward? Could he compose a tune, write a book, dance a ballet, craft a sculpture? Absolutely not. But he’d hopefully come away with a sense of culture—its richness and variety, its ability to light up people’s lives.

So it is with this chapter. You’ve now seen all of the features of C# 3, but without seeing more of LINQ you don’t have enough context to really appreciate them. When the first edition of this book was published, not many LINQ technologies were available—now there’s a glut of them, both from Microsoft and third parties. That in itself hasn’t surprised me—but I’ve been fascinated to see the different nature of these technologies.

We’re going to look at various ways in which LINQ manifests itself, with an example of each. I’ve chosen to demonstrate Microsoft technologies in the main, because they’re the most typical ones. This isn’t meant to imply that third parties aren’t welcome in the LINQ ecosystem: there are a number of projects, both commercial and open source, providing access to varied data sources and building extra features on top of existing providers. In contrast to the rest of this book, we’ll only skim the surface of each of the topics here—the point isn’t to learn the details, but to immerse yourself in the spirit of LINQ. To investigate any of these technologies further, I recommend that you get a dedicated book or read the relevant documentation carefully. I’ve resisted the temptation to say “there’s more to LINQ to [xxx] than this” at the end of each section, but please take it as read. Each technology has many capabilities beyond querying, but I’ve focused on the areas that are directly related to LINQ.

Let’s start off with the provider that generally got the most attention when LINQ was first introduced: LINQ to SQL.

12.1. Querying a database with LINQ to SQL

I’m sure by now you’ve absorbed the message that LINQ to SQL converts query expressions into SQL, which is then executed on the database. It’s more than that—it’s a full ORM solution—but I’m going to concentrate on the query side of LINQ to SQL rather than go into concurrency handling and the other details that an ORM has to deal with. I’ll show you just enough so that you can experiment with it yourself—the database and code are available on the book’s website (http://csharpindepth.com). The database is in SQL Server 2005 format to make it easy to play with even if you don’t have the latest version of SQL Server installed, although obviously Microsoft has made sure that LINQ to SQL works against newer versions too.

 

Why LINQ to SQL Rather Than the Entity Framework?

Speaking of “newer versions,” you may be wondering why I’ve chosen to demonstrate LINQ to SQL instead of the Entity Framework, which is now Microsoft’s preferred solution (and also supports LINQ). The answer is merely simplicity: whereas the Entity Framework is undoubtedly more powerful than LINQ to SQL in various ways, it requires extra concepts that would take too much space to explain here. I’m trying to give you a sense of the consistency (and occasional inconsistencies) that LINQ provides, and that’s as applicable to LINQ to SQL as to the Entity Framework.

 

Before we start writing any queries, we need a database and a model to represent it in code.

12.1.1. Getting started: the database and model

LINQ to SQL needs metadata about the database to know which classes correspond to which database tables, and so on. There are various ways of representing that metadata: I’m going to use the LINQ to SQL designer built into Visual Studio. You can design the entities first and ask LINQ to create the database, or design your database and let Visual Studio work out what the entities should look like. Personally I favor the second approach, but there are pros and cons for both ways.

Creating the Database Schema

The mapping from the classes in chapter 11 to database tables is straightforward. Each table has an autoincrementing integer ID column with an appropriate name: ProjectID, DefectID, and so forth. The references between tables simply use the same name, so the Defect table has a ProjectID column, for instance, with a foreign key constraint. There are a few exceptions to this simple set of rules:

  • User is a reserved word in T-SQL, so the User class is mapped to the DefectUser table.
  • The enumerations (status, severity, and user type) don’t have tables: their values are simply mapped to tinyint columns in the Defect and DefectUser tables.
  • The Defect table has two links to the DefectUser table: one for the user who created the defect and one for the current assignee. These are represented with the CreatedByUserID and AssignedToUserID columns, respectively.

Creating the Entity Classes

Once our tables are created, generating the entity classes from Visual Studio is easy. Simply open Server Explorer (View -> Server Explorer) and add a data source to the SkeetySoftDefects database (right-click on Data Connections and select Add Connection). You should be able to see four tables: Defect, DefectUser, Project, and NotificationSubscription.

You can then add a new item of type “LINQ to SQL classes” to the project. This name will be the basis for a generated class representing the overall database model: I’ve used the name DefectModel, which leads to a class called DefectModelDataContext. The designer will open when you’ve created the new item. You can then drag the four tables from Server Explorer into the designer, and it’ll figure out all the associations. After that, you can rearrange the diagram and adjust various properties of the entities. Here’s a list of what I changed:

  • I renamed the DefectID property to ID to match our previous model.
  • I renamed DefectUser to User (so although the table is still called DefectUser, we’ll generate a class called User, just like before).
  • I changed the type of the Severity, Status, and UserType properties to their enum equivalents (having copied those enumerations into the project).
  • I renamed the parent and child properties used for the associations between Defect and DefectUser—the designer guessed suitable names for the other associations, but had trouble here because there were two associations between the same pair of tables. I named the relationships AssignedTo/AssignedDefects and CreatedBy/CreatedDefects.

Figure 12.1 shows the designer diagram after all of these changes. As you can see, it looks much like the class diagram we saw in figure 11.3, except without the enumerations.

Figure 12.1. The LINQ to SQL classes designer showing the rearranged and modified entities

If you look in the C# code generated by the designer (DefectModel.designer.cs), you’ll find five partial classes: one for each of the entities, and the DefectModelDataContext class I mentioned earlier. The fact that they’re partial is useful: in this case I added extra constructors to match the ones we had for our original in-memory classes, so the code from chapter 11 to create the sample data could be reused without much extra work. For the sake of brevity I haven’t included the insertion code here, but if you look at PopulateDatabase.cs in the source code, you should be able to follow it easily enough. Of course, you don’t have to run this yourself—the downloadable database is already populated.
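For example, here’s a sketch of the kind of extra constructor involved—the parameters are my assumptions, not the book’s exact code:

// A sketch, not the book's exact code: extending a generated entity
// through its partial class. Chaining to this() keeps the generated
// parameterless constructor's initialization.
public partial class User
{
    public User(string name, UserType userType) : this()
    {
        Name = name;
        UserType = userType;
    }
}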

Now that we have a schema in SQL, an entity model in C#, and some sample data, let’s get querying.

12.1.2. Initial queries

I’m sure you’ve guessed what’s coming, but hopefully that won’t make it any less impressive. We’re going to execute query expressions against our data source, watching LINQ to SQL convert the query into SQL on the fly. For the sake of familiarity, we’ll use some of the same queries we saw executing against our in-memory collections in chapter 11.

First Query: Finding Defects Assigned to Tim

I’ll skip over the trivial examples from early in the chapter, starting instead with the query from listing 11.7 that checks for open defects assigned to Tim. Here’s the query part of listing 11.7, for the sake of comparison:

User tim = SampleData.Users.TesterTim;

var query = from defect in SampleData.AllDefects
            where defect.Status != Status.Closed
            where defect.AssignedTo == tim
            select defect.Summary;

The full LINQ to SQL equivalent of listing 11.7 is shown in the following listing.

Listing 12.1. Querying the database to find all Tim’s open defects
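Here’s a sketch of the code, reconstructed from the surrounding description and the generated SQL shown later in this section; the downloadable source contains the full listing.

// Reconstructed sketch; see the downloadable source for the original.
using (var context = new DefectModelDataContext())
{
    context.Log = Console.Out;

    User tim = context.Users
                      .Where(user => user.Name == "Tim Trotter")
                      .Single();

    var query = from defect in context.Defects
                where defect.Status != Status.Closed
                where defect.AssignedTo == tim
                select defect.Summary;

    foreach (var summary in query)
    {
        Console.WriteLine(summary);
    }
}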

Listing 12.1 requires a certain amount of explanation, because it’s all new. First we create a new data context to work with. Data contexts are pretty multifunctional, taking responsibility for connection and transaction management, query translation, tracking changes in entities, and dealing with identity. For the purposes of this chapter, we can regard a data context as our point of contact with the database. We won’t be looking at the more advanced features here, but we take advantage of one useful capability: we tell the data context to write out all the SQL commands it executes to the console. The model-related properties used in the code for this section (Defects, Users, and so on) are all of type Table<T> for the relevant entity type. They act as the data sources for our queries.

We can’t use SampleData.Users.TesterTim to identify Tim in the main query because that object doesn’t know the ID of the relevant row in the DefectUser table. Instead, we use a separate query to load Tim’s user entity. I happen to have used dot notation for this, but a query expression would’ve worked just as well. The Single method just returns a single result from a query, throwing an exception if there isn’t exactly one element. In a real-life situation, you may have the entity as a product of other operations such as logging in—and if you don’t have the full entity, you may have its ID, which can be used equally well within the main query. As an alternative in this case, we could’ve changed the open defects query to filter based on the assignee’s name. That wouldn’t have quite been in the spirit of the original query, though.
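Here’s a sketch of that ID-based alternative, assuming the generated User class exposes a UserID property:

// Sketch: filtering on an ID obtained elsewhere (from a login,
// perhaps) rather than on a full entity.
int timId = 2;
var query = from defect in context.Defects
            where defect.Status != Status.Closed
            where defect.AssignedTo.UserID == timId
            select defect.Summary;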

Within the query expression, the only difference between the in-memory query and the LINQ to SQL query is the data source—instead of using SampleData.AllDefects, we use context.Defects. The final results are the same (although the ordering isn’t guaranteed), but the work has been done on the database.

As we’ve asked the data context to log the generated SQL, we can see exactly what’s going on when we run the code. The console output shows both of the queries executed on the database, along with the query parameter values:[1]

1 Additional log output is generated showing some details of the data context, which I’ve cut to avoid distracting from the SQL. The console output also contains the summaries printed by the foreach loop, of course.

SELECT [t0].[UserID], [t0].[Name], [t0].[UserType]
FROM [dbo].[DefectUser] AS [t0]
WHERE [t0].[Name] = @p0
-- @p0: Input String (Size = 11; Prec = 0; Scale = 0) [Tim Trotter]

SELECT [t0].[Summary]
FROM [dbo].[Defect] AS [t0]
WHERE ([t0].[AssignedToUserID] = @p0) AND ([t0].[Status] <> @p1)
-- @p0: Input Int32 (Size = 0; Prec = 0; Scale = 0) [2]
-- @p1: Input Int32 (Size = 0; Prec = 0; Scale = 0) [4]

Note how the first query fetches all of the properties of the user because we’re populating a whole entity—but the second query only fetches the summary, as that’s all we need. LINQ to SQL has also converted our two separate where clauses in the second query into a single filter on the database.

LINQ to SQL is capable of translating a wide range of expressions. Let’s try a slightly more complicated query from chapter 11, just to see what SQL is generated.

SQL Generation for a More Complex Query: A Let Clause

Our next query shows what happens when we introduce a sort of temporary variable with a let clause. In chapter 11 we considered a bizarre situation, if you remember—pretending that calculating the length of a string took a long time. Again, the query expression is exactly the same as in listing 11.11, with the exception of the data source. Listing 12.2 shows the LINQ to SQL code.

Listing 12.2. Using a let clause in LINQ to SQL
using (var context = new DefectModelDataContext())
{
    context.Log = Console.Out;

    var query = from user in context.Users
                let length = user.Name.Length
                orderby length
                select new { Name = user.Name, Length = length };

    foreach (var entry in query)
    {
        Console.WriteLine("{0}: {1}", entry.Length, entry.Name);
    }
}

The generated SQL is close to the spirit of the sequences we saw in figure 11.5. The innermost sequence (the first one in that diagram) is the list of users; that’s transformed into a sequence of name/length pairs (as the nested select), and then the no-op projection is applied, with an ordering by length:

SELECT [t1].[Name], [t1].[value]
FROM (
    SELECT LEN([t0].[Name]) AS [value], [t0].[Name]
    FROM [dbo].[DefectUser] AS [t0]
    ) AS [t1]
ORDER BY [t1].[value]

This is a good example of where the generated SQL is wordier than it needs to be. Although a query expression can’t reference the elements of its final output sequence when ordering, SQL can. This simpler query would’ve worked fine:

SELECT LEN([t0].[Name]) AS [value], [t0].[Name]
FROM [dbo].[DefectUser] AS [t0]
ORDER BY [value]

Of course, what’s important is what the query optimizer does on the database—the execution plan displayed in SQL Server Management Studio Express is the same for both queries, so it doesn’t look like we’re losing out.

The final set of LINQ to SQL queries we’re going to look at are all joins.

12.1.3. Queries involving joins

We’ll try both inner joins and group joins, using the examples of joining notification subscriptions against projects. I suspect you’re used to the drill now—the pattern of the code is the same for each query, so from here on I’ll just show the query expression and the generated SQL unless something else is going on.

Explicit Joins: Matching Defects With Notification Subscriptions

Our first query is the simplest kind of join—an inner equijoin using a LINQ join clause:

// Query expression (modified from listing 11.12)
from defect in context.Defects
join subscription in context.NotificationSubscriptions
    on defect.Project equals subscription.Project
select new { defect.Summary, subscription.EmailAddress }

-- Generated SQL
SELECT [t0].[Summary], [t1].[EmailAddress]
FROM [dbo].[Defect] AS [t0]
INNER JOIN [dbo].[NotificationSubscription] AS [t1]
    ON [t0].[ProjectID] = [t1].[ProjectID]

Unsurprisingly, it uses an inner join in SQL. It’d be easy to guess at the generated SQL in this case. How about a group join, though? This is where things get slightly more hectic:

// Query expression (modified from listing 11.13)
from defect in context.Defects
join subscription in context.NotificationSubscriptions
    on defect.Project equals subscription.Project
    into groupedSubscriptions
select new { Defect = defect, Subscriptions = groupedSubscriptions }

-- Generated SQL
SELECT [t0].[DefectID] AS [ID], [t0].[Created],
    [t0].[LastModified], [t0].[Summary], [t0].[Severity],
    [t0].[Status], [t0].[AssignedToUserID],
    [t0].[CreatedByUserID], [t0].[ProjectID],
    [t1].[NotificationSubscriptionID],
    [t1].[ProjectID] AS [ProjectID2], [t1].[EmailAddress],
    (SELECT COUNT(*)
     FROM [dbo].[NotificationSubscription] AS [t2]
     WHERE [t0].[ProjectID] = [t2].[ProjectID]) AS [count]
FROM [dbo].[Defect] AS [t0]
LEFT OUTER JOIN [dbo].[NotificationSubscription] AS [t1]
    ON [t0].[ProjectID] = [t1].[ProjectID]
ORDER BY [t0].[DefectID], [t1].[NotificationSubscriptionID]

That’s a major change in the amount of SQL generated! There are two important things to notice. First, it uses a left outer join instead of an inner join, so we’d still see a defect even if it didn’t have anyone subscribing to its project. If you want a left outer join but without the grouping, the conventional way of expressing this is to use a group join and then an extra from clause using the DefaultIfEmpty extension method on the embedded sequence. It looks odd, but it works well.
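Here’s a sketch of that pattern (not one of the book’s listings):

// Sketch: a left outer join without the grouping. In LINQ to SQL the
// "sub" range variable is null for defects with no subscriptions,
// which translates to a SQL NULL in the projection.
from defect in context.Defects
join subscription in context.NotificationSubscriptions
    on defect.Project equals subscription.Project
    into groupedSubscriptions
from sub in groupedSubscriptions.DefaultIfEmpty()
select new { defect.Summary, sub.EmailAddress }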

The second odd thing about the previous query is that it calculates the count for each group within the database. This is effectively a trick performed by LINQ to SQL to make sure that all the processing can be done on the server. A naive implementation would have to perform the grouping in memory, after fetching all the results. In some cases the provider could do tricks to avoid needing the count, simply spotting when the grouping ID changes, but there are issues with this approach for some queries. It’s possible that a later implementation of LINQ to SQL will be able to switch courses of action depending on the exact query.

You don’t need to explicitly write a join in the query expression to see one in the SQL. Our final queries will show joins implicitly created through property access expressions.

Implicit Joins: Showing Defect Summaries and Project Names

Let’s take a simple example. Suppose we want to list each defect, showing its summary and the name of the project it’s part of. The query expression is just a matter of a projection:

// Query expression
from defect in context.Defects
select new { defect.Summary, ProjectName = defect.Project.Name }

-- Generated SQL
SELECT [t0].[Summary], [t1].[Name]
FROM [dbo].[Defect] AS [t0]
INNER JOIN [dbo].[Project] AS [t1]
    ON [t1].[ProjectID] = [t0].[ProjectID]

Note how we’ve navigated from the defect to the project via a property—LINQ to SQL has converted that navigation into an inner join. It can use an inner join here because the schema has a non-nullable constraint on the ProjectID column of the Defect table—every defect has a project. Not every defect has an assignee, though—the AssignedToUserID field is nullable, so if we use the assignee in a projection instead, a left outer join is generated:

// Query expression
from defect in context.Defects
select new { defect.Summary, Assignee = defect.AssignedTo.Name }

-- Generated SQL
SELECT [t0].[Summary], [t1].[Name]
FROM [dbo].[Defect] AS [t0]
LEFT OUTER JOIN [dbo].[DefectUser] AS [t1]
    ON [t1].[UserID] = [t0].[AssignedToUserID]

Of course, if you navigate via more properties, the joins get more and more complicated. I’m not going into the details here—the important thing is that LINQ to SQL has to do a lot of analysis of the query expression to work out what SQL is required. In order to perform that analysis, it clearly needs to be able to look at the query we’ve specified. Let’s move away from LINQ to SQL specifically, and think in general terms about what LINQ providers of this kind need to do. This will apply to any provider that needs to introspect the query, rather than just being handed a delegate. At long last, it’s time to see why expression trees were added as a feature of C# 3.

12.2. Translations using IQueryable and IQueryProvider

In this section we’re going to find out the basics of how LINQ to SQL manages to convert our query expressions into SQL. This is the starting point for implementing your own LINQ provider, should you wish to. (Please don’t underestimate the technical difficulties involved in doing so—but if you like a challenge, implementing a LINQ provider is certainly interesting.) This is the most theoretical section in the chapter, but it’s useful to have some insight as to how LINQ decides whether to use in-memory processing, a database, or some other query engine.

In all the query expressions we’ve seen in LINQ to SQL, the source has been a Table<T>. But if you look at Table<T>, you’ll see it doesn’t have a Where method, or Select, or Join, or any of the other standard query operators. Instead, it uses the same trick that LINQ to Objects does—just as the source in LINQ to Objects always implements IEnumerable<T> (possibly after a call to Cast or OfType) and then uses the extension methods in Enumerable, so Table<T> implements IQueryable<T> and then uses the extension methods in Queryable. We’ll see how LINQ builds up an expression tree and then allows a provider to execute it at the appropriate time. Let’s start by looking at what IQueryable<T> consists of.

12.2.1. Introducing IQueryable<T> and related interfaces

If you look up IQueryable<T> in the documentation and see what members it contains directly (rather than inheriting), you may be disappointed. There aren’t any. Instead, it inherits from IEnumerable<T> and the nongeneric IQueryable, which in turn inherits from the nongeneric IEnumerable. So, IQueryable is where the new and exciting members are, right? Well, nearly. In fact, IQueryable just has three properties: Provider, ElementType, and Expression. The Provider property is of type IQueryProvider—yet another new interface to consider.

Lost? Perhaps figure 12.2 will help out—a class diagram of all the interfaces directly involved.

Figure 12.2. Class diagram based on the interfaces involved in IQueryable<T>

The easiest way of thinking about IQueryable is that it represents a query that’ll yield a sequence of results when you execute it. The details of the query in LINQ terms are held in an expression tree, as returned by the Expression property of the IQueryable. Executing a query is performed by beginning to iterate through an IQueryable (in other words, calling the GetEnumerator method and then MoveNext on the result) or by a call to the Execute method on an IQueryProvider, passing in an expression tree.

So, with at least some grasp of what IQueryable is for, what’s IQueryProvider? We can do more with a query than just execute it—we can also use it to build a bigger query, which is the purpose of the standard query operators in LINQ.[2] To build up a query, we need to use the CreateQuery method on the relevant IQueryProvider.[3]

2 Well, the ones that keep deferring execution, such as Where and Join. We’ll see what happens with the aggregations such as Count in a while.

3 Both Execute and CreateQuery have generic and nongeneric overloads. The nongeneric versions make it easier to create queries dynamically in code. Compile-time query expressions use the generic version.

Think of a data source as a simple query (SELECT * FROM SomeTable in SQL, for instance)—calling Where, Select, OrderBy, and similar methods results in a different query, based on the first one. Given any IQueryable query, you can create a new query by performing the following steps:

  1. Ask the existing query for its query expression tree (using the Expression property).
  2. Build a new expression tree that contains the original expression and the extra functionality you want (a filter, projection, or ordering, for instance).
  3. Ask the existing query for its query provider (using the Provider property).
  4. Call CreateQuery on the provider, passing in the new expression tree.

Of those steps, the only tricky one is creating the new expression tree. Fortunately, there are a bunch of extension methods on the static Queryable class that do all that for us. Enough theory—let’s start implementing the interfaces so we can see all this in action.

12.2.2. Faking it: interface implementations to log calls

Before you get too excited, we’re not going to build our own fully fledged query provider in this chapter. But if you understand everything in this section, you’ll be in a much better position to build one if you ever need to—and possibly more importantly, you’ll understand what’s going on when you issue LINQ to SQL queries. Most of the hard work of query providers goes on at the point of execution, where they need to parse an expression tree and convert it into the appropriate form for the target platform. We’re concentrating on the work that happens before that—how LINQ prepares to execute a query.

We’ll write our own implementations of IQueryable and IQueryProvider, and then try to run a few queries against them. The interesting part isn’t the results—we won’t be doing anything useful with the queries when we execute them—but the series of calls made up to the point of execution. We’ll write types FakeQueryProvider and FakeQuery. The implementation of each interface method writes out the current expression involved, using a simple logging method (not shown here). Let’s look first at FakeQuery, as shown in the following listing.

Listing 12.3. A simple implementation of IQueryable that logs method calls
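Here’s a sketch of the class, reconstructed to match the description that follows; the full version is in the downloadable source.

// Reconstructed sketch; see the downloadable source for the original.
class FakeQuery<T> : IQueryable<T>
{
    public IQueryProvider Provider { get; private set; }
    public Expression Expression { get; private set; }
    public Type ElementType { get; private set; }

    public FakeQuery()
    {
        Provider = new FakeQueryProvider();
        Expression = Expression.Constant(this);
        ElementType = typeof(T);
    }

    public FakeQuery(IQueryProvider provider, Expression expression)
    {
        Provider = provider;
        Expression = expression;
        ElementType = typeof(T);
    }

    public IEnumerator<T> GetEnumerator()
    {
        Logger.Log(this, Expression);
        return Enumerable.Empty<T>().GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }

    public override string ToString()
    {
        return "FakeQuery";
    }
}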

The property members of IQueryable are implemented in FakeQuery with automatic properties, which are set by the constructors. There are two constructors: a parameterless one that’s used by our main program to create a plain source for the query, and one that’s called by FakeQueryProvider with the current query expression.

The use of Expression.Constant(this) as the initial source expression is just a way of showing that the query initially represents the original object. (Imagine an implementation representing a table, for example—until you apply any query operators, the query would just return the whole table.) When the constant expression is logged, it uses the overridden ToString method, which is why we’ve given a short, constant description. This makes the final expression much cleaner than it would’ve been without the override. When we’re asked to iterate over the results of the query, we always just return an empty sequence to make life easy. Production implementations would parse the expression here, or (more likely) call Execute on their query provider and just return the result.

As you can see, not a lot is going on in FakeQuery, and the following listing shows that FakeQueryProvider is simple, too.

Listing 12.4. An implementation of IQueryProvider that uses FakeQuery
class FakeQueryProvider : IQueryProvider
{
    public IQueryable<T> CreateQuery<T>(Expression expression)
    {
        Logger.Log(this, expression);
        return new FakeQuery<T>(this, expression);
    }

    public IQueryable CreateQuery(Expression expression)
    {
        Type queryType = typeof(FakeQuery<>)
            .MakeGenericType(expression.Type);
        object[] constructorArgs = new object[] { this, expression };
        return (IQueryable) Activator.CreateInstance(queryType,
                                                     constructorArgs);
    }

    public T Execute<T>(Expression expression)
    {
        Logger.Log(this, expression);
        return default(T);
    }

    public object Execute(Expression expression)
    {
        Logger.Log(this, expression);
        return null;
    }
}

There’s even less to talk about in terms of the implementation of FakeQueryProvider than there was for FakeQuery<T>. The CreateQuery methods do no real processing but act as factory methods for the query. The only tricky bit is that the nongeneric overload still needs to provide the right type argument for FakeQuery<T> based on the Type property of the given expression. The Execute method overloads just return empty results after logging the call. This is where a lot of analysis would normally be done, along with the actual call to the web service, database, or whatever the target platform is.

Even though we’ve done no real work, when we start to use FakeQuery as the source in a query expression, interesting things start to happen. I’ve already let slip how we’re able to write query expressions without explicitly writing methods to handle the standard query operators: it’s all about extension methods, this time the ones in the Queryable class.

12.2.3. Gluing expressions together: the Queryable extension methods

Just as the Enumerable type contains extension methods on IEnumerable<T> to implement the LINQ standard query operators, the Queryable type contains extension methods on IQueryable<T>. There are two big differences between the implementations in Enumerable and those in Queryable.

First, the Enumerable methods all use delegates as their parameters—the Select method takes a Func<TSource, TResult>, for example. That’s fine for in-memory manipulation, but for LINQ providers that execute the query elsewhere, we need a format we can examine more closely—expression trees. For example, the corresponding overload of Select in Queryable takes a parameter of type Expression<Func<TSource, TResult>>. The compiler doesn’t mind at all—after query translation, it has a lambda expression that it needs to pass as an argument to the method, and lambda expressions can be converted to either delegate instances or expression trees.
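For comparison, here are the two Select declarations, simplified:

// Enumerable: takes a delegate, executed in memory.
public static IEnumerable<TResult> Select<TSource, TResult>(
    this IEnumerable<TSource> source,
    Func<TSource, TResult> selector);

// Queryable: takes an expression tree, for the provider to analyze.
public static IQueryable<TResult> Select<TSource, TResult>(
    this IQueryable<TSource> source,
    Expression<Func<TSource, TResult>> selector);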

This is how LINQ to SQL can work so seamlessly. The four key elements involved are all new features of C# 3: lambda expressions, the translation of query expressions into normal expressions that use lambda expressions, extension methods, and expression trees. Without all four, there’d be problems. If query expressions were always translated into delegates, for instance, they couldn’t be used with a provider such as LINQ to SQL, which requires expression trees. Figure 12.3 shows two possible paths taken by query expressions; they differ only in what interfaces their data source implements.

Figure 12.3. A query taking two paths, depending on whether the data source implements IQueryable or only IEnumerable

Note how in figure 12.3 the early parts of the compilation process are independent of the data source. The same query expression is used, and it’s translated in exactly the same way. It’s only when the compiler looks at the translated query to find the appropriate Select and Where methods to use that the data source is truly important. At that point, the lambda expressions can be converted to either delegate instances or expression trees, potentially giving radically different implementations: typically in-memory for the left path, and SQL executing against a database in the right path.

Just to hammer home a familiar point, the decision in figure 12.3 of whether to use Enumerable or Queryable has no explicit support in the C# compiler. These aren’t the only two possible paths, as we’ll see later with Parallel LINQ and the Reactive Extensions. You can create your own interface and implement extension methods following the query pattern, or even create a type with appropriate instance methods.

The second big difference between Enumerable and Queryable is that the Enumerable extension methods do the actual work associated with the corresponding query operator (or at least they build iterators that do that work). There’s code in Enumerable.Where to execute the specified filter and only yield appropriate elements as the result sequence, for example. By contrast, the query operator implementations in Queryable do little: they just create a new query based on the parameters or call Execute on the query provider, as described at the end of section 12.2.1. In other words, they’re only used to build up queries and request that they be executed—they don’t contain the logic behind the operators. This means they’re suitable for any LINQ provider that uses expression trees—but they’re useless on their own. They’re the glue between your code and the details of the provider.
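To make that concrete, here’s a simplified sketch (not the real framework source) of how such a gluing method can be written, following the four steps from section 12.2.1. The real implementations differ in detail, but the shape is the same: build a bigger tree, then hand it to the provider.

// A simplified sketch of a Queryable-style Where operator.
public static IQueryable<T> Where<T>(this IQueryable<T> source,
    Expression<Func<T, bool>> predicate)
{
    Expression call = Expression.Call(
        typeof(Queryable), "Where",          // find Queryable.Where
        new[] { typeof(T) },                 // ...with type argument T
        source.Expression,                   // step 1: the existing tree
        Expression.Quote(predicate));        // step 2: extra functionality
    // Steps 3 and 4: ask the source's provider to build the new query.
    return source.Provider.CreateQuery<T>(call);
}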

With the Queryable extension methods available and ready to use our IQueryable and IQueryProvider implementations, it’s finally time to see what happens when we use a query expression with our custom provider.

12.2.4. The fake query provider in action

Listing 12.5 shows a simple query expression, which (supposedly) finds all the strings in our fake source beginning with “abc” and projects the results into a sequence of the lengths of the matching strings. We iterate through the results, but don’t do anything with them, as we know already that they’ll be empty. Of course, we have no source data, and we haven’t written any code to do any real filtering—we’re just logging which calls are made by LINQ in the course of creating the query expression and iterating through the results.

Listing 12.5. A simple query expression using the fake query classes
var query = from x in new FakeQuery<string>()
            where x.StartsWith("abc")
            select x.Length;

foreach (int i in query) { }

What would you expect the results of running listing 12.5 to be? In particular, what would you like to be logged last, at the point where we’d normally expect to do some real work with the expression tree? Here are the results, reformatted slightly for clarity:

FakeQueryProvider.CreateQuery
Expression=FakeQuery.Where(x => x.StartsWith("abc"))

FakeQueryProvider.CreateQuery
Expression=FakeQuery.Where(x => x.StartsWith("abc"))
.Select(x => x.Length)

FakeQuery<Int32>.GetEnumerator
Expression=FakeQuery.Where(x => x.StartsWith("abc"))
.Select(x => x.Length)

There are two important things to note. First, GetEnumerator is only called at the end, not on any intermediate queries: by the time GetEnumerator is called, we have all the information present in the original query expression. Second, we haven’t had to manually keep track of earlier parts of the expression at each step—a single expression tree captures all the information so far.

Don’t be fooled by the concise output, by the way—the actual expression tree is deep and complicated, particularly due to the where clause including an extra method call. This expression tree is what LINQ to SQL would examine to work out what query to execute. LINQ providers could build up their own queries (in whatever form they may need) as calls to CreateQuery are made, but usually looking at the final tree when GetEnumerator is called is simpler, as all the necessary information is available in one place.

The final call logged by listing 12.5 was to FakeQuery.GetEnumerator, and you may be wondering why we also need an Execute method on IQueryProvider. Well, not all query expressions generate sequences—if you use an aggregation operator such as Sum, Count, or Average, we’re no longer really creating a source—we’re evaluating a result immediately. That’s when Execute is called, as shown by the following listing and its output.

Listing 12.6. IQueryProvider.Execute
var query = from x in new FakeQuery<string>()
            where x.StartsWith("abc")
            select x.Length;

double mean = query.Average();

// Output
FakeQueryProvider.CreateQuery
Expression=FakeQuery.Where(x => x.StartsWith("abc"))

FakeQueryProvider.CreateQuery
Expression=FakeQuery.Where(x => x.StartsWith("abc"))
.Select(x => x.Length)

FakeQueryProvider.Execute
Expression=FakeQuery.Where(x => x.StartsWith("abc"))
.Select(x => x.Length)
.Average()

The FakeQueryProvider can be quite useful when it comes to understanding what the C# compiler is doing behind the scenes with query expressions. It’ll show the transparent identifiers introduced within a query expression, along with the translated calls to SelectMany, GroupJoin, and the like.

12.2.5. Wrapping up IQueryable

We haven’t written any of the significant code that a real query provider would need in order to get useful work done, but hopefully our fake provider has given you insight into how LINQ providers are given the information from query expressions. It’s all built up by the Queryable extension methods, given an appropriate implementation of IQueryable and IQueryProvider.

We’ve gone into a bit more detail in this section than we will for the rest of the chapter, as it’s involved the foundations that underpin the LINQ to SQL code we saw earlier. Even though you’re unlikely to need to implement the query interfaces yourself, the steps involved in taking a C# query expression and (at execution time) running some SQL on a database are quite profound and lie at the heart of the big features of C# 3. Understanding why C# has gained these features will help keep you more in tune with the language.

This is the end of our coverage of LINQ using expression trees. The rest of the chapter involves in-process queries using delegates—but as we’ll see, there can still be a great deal of variety and innovation in how LINQ can be used. Our first port of call is LINQ to XML, which is “merely” an XML API designed to integrate well with LINQ to Objects.

12.3. LINQ-friendly APIs and LINQ to XML

LINQ to XML is by far the most pleasant XML API I’ve ever used. Whether you’re consuming existing XML, generating a new document, or a bit of both, it’s easy to use and understand. Part of that is completely independent of LINQ, but a lot of it’s due to how well it interacts with the rest of LINQ. As with section 12.1, I’ll give you just enough introductory information to understand the examples, and then see how LINQ to XML blends its own query operators with those in LINQ to Objects. By the end of the section you may have some ideas about how you can make your own APIs work in harmony with the framework.

12.3.1. Core types in LINQ to XML

LINQ to XML lives in the System.Xml.Linq assembly, and most of the types are in the System.Xml.Linq namespace too.[4] Almost all of the types in that namespace have a prefix of X; so whereas the normal DOM API has an XmlElement type, the LINQ to XML equivalent is XElement. This makes it easy to spot when code is using LINQ to XML, even if you’re not immediately familiar with the exact type involved. Figure 12.4 shows the types you’ll use most often.

4 I regularly forget whether it’s System.Xml.Linq or System.Linq.Xml. I would say that if you remember that it’s an XML API first and foremost, you should be okay—but it’s clearly not working for me. Maybe you’ll have better luck.

Figure 12.4. Class diagram for LINQ to XML, showing the most commonly used types

Here’s a brief rundown of the types shown:

  • XName is used for names of elements and attributes. Instances are usually created using an implicit conversion from a string (in which case no namespace is used) or via the +(XNamespace, string) overloaded operator.
  • XNamespace represents an XML namespace—a URI, basically. Instances are usually created by the implicit conversion from string.
  • XObject is the common ancestor of both XNode and XAttribute: unlike in the DOM API, an attribute isn’t a node in LINQ to XML. Methods returning child nodes don’t include attributes, for example.
  • XNode represents a node in the XML tree. It defines various members to manipulate and query the tree. There are various other classes derived from XNode that aren’t shown in figure 12.4, such as XComment and XDeclaration. These are used relatively infrequently—the most common node types are documents, elements, and text.
  • XAttribute is an attribute with a name and a value. The value is intrinsically text, but there are explicit conversions to many other data types, such as int and DateTime.
  • XContainer is a node in the XML tree that can have child content—it’s an element or a document, basically.
  • XText is a text node; a further derived type XCData is used to represent CDATA text nodes. (Roughly equivalent to a verbatim string literal—less escaping is required.) XText is rarely instantiated directly in user code; instead when a string is used as the content of an element or document, that’s converted into an XText instance.
  • XElement is an element. This is the most commonly used class in LINQ to XML, along with XAttribute. Unlike in the DOM API, you can create an XElement without creating a document to contain it; unless you really need a document object (for a custom XML declaration, perhaps), you can often just use elements.
  • XDocument is a document. Its root element is accessed using the Root property—this is the equivalent to XmlDocument.DocumentElement. As noted earlier, this often isn’t required.

More types are available even within the document model, and a few other types for things such as loading and saving options—but these are the most important ones. Of the preceding types, the only ones you regularly need to reference explicitly are XElement and XAttribute. If you use namespaces, you’ll use XNamespace as well, but most of the rest of the types can be ignored the rest of the time. It’s amazing how much you can do with so few types. Speaking of amazing, I can’t resist showing you how the namespace support works in LINQ to XML. We’re not going to use namespaces anywhere else, but it’s a good example of how a well-designed set of conversions and operators can make life easier. It’ll also ease us into our first topic: constructing elements.

If you only need to specify the name of an element or attribute without a namespace, you can just use a string. You won’t find any constructors for either type with parameters of type string though—they all accept an XName. An implicit conversion exists from string to XName, and also from string to XNamespace. Adding together a namespace and a string also gives you an XName. There’s a fine line between operator abuse and genius, but in this case LINQ to XML really makes it work. Here’s some code to create two elements—one within a namespace and one not:

XElement noNamespace = new XElement("no-namespace");
XNamespace ns = "http://csharpindepth.com/sample/namespace";
XElement withNamespace = new XElement(ns + "in-namespace");

This makes for readable code even when namespaces are involved—which comes as a welcome relief from some other APIs. But we’ve just created two empty elements. How do we give them some content?

12.3.2. Declarative construction

Normally in the DOM API, you create an element and then add content to it. We can do that in LINQ to XML as well, via the Add method inherited from XContainer—but that’s not the idiomatic LINQ to XML way of doing things.[5] It’s worth looking at the signature of XContainer.Add though, because it introduces us to the content model. You might’ve expected a signature of Add(XNode) or perhaps Add(XObject)—but in fact it’s just Add(object). The same pattern is used for the XElement (and XDocument) constructor signatures. After the name, you can specify nothing (to create an empty element), a single object (to create an element with a single child node), or an array of objects to create multiple child nodes. In the multiple children case, a parameter array is used (the params keyword in C#), which means the compiler will create the array for you—you can just keep listing arguments.

5 In some ways it’s a shame that XElement doesn’t implement IEnumerable—as otherwise collection initializers would be another approach to construction. Never mind—using the constructor works neatly anyway.

The use of plain object for the content type may sound crazy, but it’s incredibly useful. When you add content—whether it’s through a constructor or the Add method—the following points are considered.

  • Null references are ignored.
  • XNode and XAttribute instances are added in a relatively straightforward manner; they’re cloned if they already have parents, but otherwise no conversion is required. (Some other sanity checks are performed, for instance to make sure you don’t have duplicate attributes in a single element.)
  • Strings, numbers, dates, times, and so on are added by converting them into XText nodes using standard XML formatting.
  • If the argument implements IEnumerable (and isn’t covered by anything else) then Add will iterate over its contents and add each value in turn, recursing where necessary.
  • Anything that doesn’t have special-case handling is converted into text by just calling ToString().

This means that you often don’t need to prepare your content in a special way before adding it to an element—LINQ to XML just does the right thing for you. The details are explicitly documented, so you don’t need to worry about it being too magical—but it really works. Constructing nested elements leads to code that naturally resembles the hierarchical structure of the tree. This is best shown with an example. Here’s a snippet of LINQ to XML code:

new XElement("root",
new XElement("child",
new XElement("grandchild", "text")),
new XElement("other-child"));

And here’s the XML of the created element—note the visual similarity between the code and the output:

<root>
  <child>
    <grandchild>text</grandchild>
  </child>
  <other-child />
</root>

So far, so good—but the important part for us is the fourth bullet in the earlier list, where sequences are processed recursively... because that lets you build an XML structure out of a LINQ query in a natural way. For example, the book’s website has some code to generate an RSS feed from its database. The statement to construct the XML document is 28 lines long—which I’d normally expect to be an abomination—but it’s remarkably pleasant to read.[6] That statement contains two LINQ queries—one to populate an attribute value, and the other to provide a sequence of elements, each representing a news item. As you read the code, it’s obvious what the resulting XML will look like.

6 One contributing factor to the readability is an extension method I created to convert anonymous types into elements, using the properties for child elements. If you’re interested, the code is freely available as part of my MiscUtil project (see http://mng.bz/xDMt). It only helps when the XML structure you need fits a certain pattern, but in that case it can reduce the clutter of XElement constructor calls significantly.

To make this more concrete, let’s take two simple examples from the defect tracking system. I’ll demonstrate using the LINQ to Objects sample data, but we could use almost identical queries to work with another LINQ provider instead. First we’ll build an element containing all the users in the system. In this case we just need a projection, so the following listing uses dot notation:

Listing 12.7. Creating elements from the sample users
var users = new XElement("users",
    SampleData.AllUsers.Select(user => new XElement("user",
        new XAttribute("name", user.Name),
        new XAttribute("type", user.UserType)))
);
Console.WriteLine(users);

// Output
<users>
  <user name="Tim Trotter" type="Tester" />
  <user name="Tara Tutu" type="Tester" />
  <user name="Deborah Denton" type="Developer" />
  <user name="Darren Dahlia" type="Developer" />
  <user name="Mary Malcop" type="Manager" />
  <user name="Colin Carton" type="Customer" />
</users>

If we want to make a slightly more complex query, it’s probably worth using a query expression. Listing 12.8 creates another list of users, but this time only the developers within SkeetySoft. For a bit of variety, this time each developer’s name is a text node within an element instead of an attribute value:

Listing 12.8. Creating elements with text nodes
var developers = new XElement("developers",
    from user in SampleData.AllUsers
    where user.UserType == UserType.Developer
    select new XElement("developer", user.Name)
);
Console.WriteLine(developers);

// Output
<developers>
  <developer>Deborah Denton</developer>
  <developer>Darren Dahlia</developer>
</developers>

This sort of thing can be applied to all the sample data, leaving a document structure like this:

<defect-system>
  <projects>
    <project name="..." id="...">
      <subscription email="..." />
    </project>
  </projects>
  <users>
    <user name="..." id="..." type="..." />
  </users>
  <defects>
    <defect id="..." summary="..." created="..." project="..."
            assigned-to="..." created-by="..." status="..."
            severity="..." last-modified="..." />
  </defects>
</defect-system>

You can see the code to generate all of this in XmlSampleData.cs in the downloadable solution. It demonstrates an alternative to the one-huge-statement approach: each of the elements under the top level is created separately, then glued together like this:

XElement root = new XElement("defect-system", projects, users, defects);

We’ll use this XML to demonstrate our next LINQ integration point: queries. Let’s start with the query methods available on a single node.

12.3.3. Queries on single nodes

You may be expecting me to reveal that XElement implements IEnumerable and that LINQ queries come for free. It’s not quite that simple, because there are so many different things that an XElement could iterate through. XElement contains a number of axis methods that are used as query sources. If you’re familiar with XPath, the idea of an axis will no doubt be familiar to you. Here are the axis methods used directly for querying a single node, each of which returns an appropriate IEnumerable<T>:

  • Ancestors
  • AncestorsAndSelf
  • Annotations
  • Attributes
  • DescendantNodes
  • DescendantNodesAndSelf
  • Descendants
  • DescendantsAndSelf
  • Elements
  • ElementsAfterSelf
  • ElementsBeforeSelf
  • Nodes

All of these are fairly self-explanatory (and the MSDN documentation provides more details). There are useful overloads to retrieve only nodes with an appropriate name: calling Descendants("user") on an XElement will return all user elements underneath the element you call it on, for instance.

In addition to these calls returning sequences, some methods return a single result—Attribute and Element are the most important, returning the named attribute and the first child element with the specified name, respectively. Additionally, there are explicit conversions from an XAttribute or XElement to any number of other types, such as int, string, and DateTime. These are important for both filtering and projecting results. Each conversion to a non-nullable value type also has a conversion to its nullable equivalent—these (and the conversion to string) return a null value if you invoke them on a null reference. This null propagation means you don’t have to check for the presence or absence of attributes or elements within the query—you can use the query results instead.
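Here’s a quick sketch of those conversions in action; the attribute names are made up:

// Casting a missing attribute to string or to a nullable type yields
// null instead of throwing, so queries needn't check for absence.
XElement defect = new XElement("defect", new XAttribute("id", "5"));

int id = (int) defect.Attribute("id");                 // 5
string summary = (string) defect.Attribute("summary"); // null: absent
int? severity = (int?) defect.Attribute("severity");   // null, no exception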

What does this have to do with LINQ? Well, the fact that multiple search results are returned in terms of IEnumerable<T> means you can use the normal LINQ to Objects methods after finding some elements. The following listing shows an example of finding the names and types of the users, this time starting off with the sample data in XML.

Listing 12.9. Displaying the users within an XML structure
XElement root = XmlSampleData.GetElement();

var query = root.Element("users").Elements().Select(user => new
{
    Name = (string) user.Attribute("name"),
    UserType = (string) user.Attribute("type")
});

foreach (var user in query)
{
    Console.WriteLine("{0}: {1}", user.Name, user.UserType);
}

After creating the data at the start, we navigate down to the users element and ask it for its direct child elements. This two-step fetch could be shortened to just root.Descendants("user"), but it’s good to know about the more rigid navigation so you can use it where necessary. The two-step form is also more robust in the face of changes to the document structure, such as another (unrelated) user element being added elsewhere in the document.

The rest of the query expression is merely a projection of an XElement into an anonymous type. I’ll admit that we’re cheating slightly with the user type: we’ve kept it as a string instead of calling Enum.Parse to convert it into a proper UserType value. The latter approach works perfectly well—but it’s quite long-winded when you only need the string form, and the code becomes hard to format sensibly within the strict limits of the printed page.

There’s nothing particularly special here—returning query results as sequences is fairly common, after all. It’s worth noting how seamlessly we can go from domain-specific query operators to general-purpose ones. That’s not the end of the story though—LINQ to XML has some extra extension methods to add as well.

12.3.4. Flattened query operators

We’ve seen how the result of one part of a query is often a sequence—and in LINQ to XML it’s often a sequence of elements. What if you wanted to then perform an XML-specific query on each of those elements? To present a somewhat contrived example, we can find all the projects in our sample data with root.Element("projects").Elements(), but how can we find the subscription elements within them? We need to apply another query to each element, and then flatten the results. (Again, we could use root.Descendants("subscription")—but imagine a more complex document model where that wouldn’t work.)

This may sound familiar, and it is—LINQ to Objects already provides the SelectMany operator (represented by multiple from clauses in a query expression) to do this. So we could write our query as

from project in root.Element("projects").Elements()
from subscription in project.Elements("subscription")
select subscription

As there are no elements within a project other than subscription, we could just use the overload of Elements that doesn’t specify a name. I find it clearer to specify the element name in this case, but it’s often just a matter of taste. (The same argument could be made for calling Element("projects").Elements("project") to start with, admittedly.) Here’s the same query written using dot notation and an overload of SelectMany that only returns the flattened sequence, without performing any further projections:

root.Element("projects").Elements()
.SelectMany(project => project.Elements("subscription"))

Neither of these queries is completely unreadable by any means, but they’re not ideal. LINQ to XML provides a number of extension methods (in the System.Xml.Linq.Extensions class) which either act on a specific sequence type or are generic with a constrained type argument, to cope with the lack of generic interface covariance prior to C# 4. There’s InDocumentOrder, which does exactly what it sounds like—and most of the axis methods mentioned in section 12.3.3 are also available as extension methods. This means that we can convert our query into this simpler form:

root.Element("projects").Elements().Elements("subscription")

This sort of construction makes it easy to write XPath-like queries in LINQ to XML without everything just being a string. If you want to use XPath, that’s available too via more extension methods—but personally I’ve found that the query methods have served me well more often than not. This also supports mixing the query with the operators of LINQ to Objects. For example, to find all the subscriptions for projects with a name including “Media,” you could use

root.Element("projects").Elements()
.Where(project => ((string) project.Attribute("name"))
.Contains("Media"))
.Elements("subscription")

Before we move on to Parallel LINQ, let’s think about how the design of LINQ to XML merits the “LINQ” part of its title—and how you could potentially apply the same techniques to your own API.

12.3.5. Working in harmony with LINQ

Some of the design decisions in LINQ to XML seem odd if you take them in isolation as part of an XML API, but in the context of LINQ they make perfect sense. The designers clearly imagined how their types could be used within LINQ queries, and how they could interact with other data sources. If you’re writing your own data access API, in whatever context that might be, it’s worth taking the same things into account. If someone uses your methods in the middle of a query expression, are they left with something useful? Will they be able to use some of your query methods, then some from LINQ to Objects, then some more of yours in one fluent expression?

We’ve seen three ways in which LINQ to XML has accommodated the rest of LINQ:

  • It’s good at consuming sequences with its approach to construction. LINQ is deliberately declarative, and LINQ to XML supports this with a declarative way of creating XML structures.
  • It returns sequences from its query methods. This is probably the most obvious step that data access APIs would already take: returning query results as IEnumerable<T> or a class implementing it is pretty much a no-brainer.
  • It extends the set of queries you can perform on sequences of XML types: this makes it feel like a unified querying API, even though some of it’s XML-specific.

You may be able to think of other ways in which your own libraries can play nicely with LINQ: these aren’t the only options you should consider, but they’re a good starting point. Above all, I’d urge you to put yourself in the shoes of a developer wanting to use your API within code that’s already using LINQ. What might such a developer want to achieve? Can LINQ and your API be mixed easily, or are they really aiming for different goals?

We’re roughly halfway through our whirlwind tour of different approaches to LINQ. Our next stop is in some ways reassuring and in some ways terrifying: we’re back to querying simple sequences, but this time in parallel...

12.4. Replacing LINQ to Objects with Parallel LINQ

I’ve been following Parallel LINQ for a long time. I first came across it when Joe Duffy introduced it on his blog in September 2006 (see http://mng.bz/vYCO). The first Community Technology Preview (CTP) was released in November 2007, and the overall feature set has evolved over time too. It’s now part of a wider effort called Parallel Extensions, which is part of .NET 4, aiming to provide higher level building blocks for concurrent programming than the relatively small set of primitives we’ve had to work with until now. There’s a lot more to Parallel Extensions than Parallel LINQ—or PLINQ, as it’s often known—but we’ll only be looking at the LINQ aspect here.

The idea behind Parallel LINQ is that you should be able to take a LINQ to Objects query that’s taking a long time and make it run faster by using multiple threads to take advantage of multiple cores—with as few changes to the query as possible. As with anything to do with concurrency, it’s not quite as simple as that, but you may be surprised at just what can be achieved. Of course, we’re still trying to think bigger than individual LINQ technologies—we’re thinking about the different models of interaction involved, rather than the precise details. But if you’re interested in concurrency, I’d heartily recommend that you dive into Parallel Extensions—it’s one of the most promising approaches to parallelism that I’ve come across recently.

I’m going to use a single example for this section: rendering a Mandelbrot set image (see http://mng.bz/D6YL). Let’s start off by trying to get it right with a single thread before moving into trickier territory.

12.4.1. Plotting the Mandelbrot set with a single thread

Before any mathematicians attack me, I should point out that I’m using the term Mandelbrot set loosely here. The details aren’t really important—but these aspects are:

  • We’re trying to create a rectangular image, given various options such as width, height, origin and search depth.
  • For each pixel in the image, we’re going to calculate a byte value that will end up as the index into a 256-entry palette.
  • The calculation of one pixel value doesn’t rely on any other results.

The last point is absolutely crucial—it means this task is embarrassingly parallel. In other words, there’s nothing in the task itself that makes it hard to parallelize. We still need a mechanism for distributing the workload across threads and then gathering the results together, but the rest should be easy. PLINQ will be responsible for the distribution and collation (with a little help and care); we just need to express what we want to do. For the purposes of demonstrating multiple approaches, I’ve put together an abstract base class that’s responsible for setting things up, running the query, and displaying the results, along with a method to compute the color of an individual pixel. An abstract method is responsible for creating a byte array of values, which are then converted into the image. The first row of pixels comes first, left to right, then the second row, and so on. Each example here is just an implementation of this method.
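To give you an idea of the shape involved, here's a sketch of such a base class. The names and the exact escape-time arithmetic are my own assumptions rather than the downloadable code:

public abstract class MandelbrotGeneratorBase
{
    protected int Width { get; private set; }
    protected int Height { get; private set; }

    protected MandelbrotGeneratorBase(int width, int height)
    {
        Width = width;
        Height = height;
    }

    // Escape-time calculation for one pixel; it depends on nothing but the
    // pixel's own coordinates, which is what makes the task embarrassingly parallel.
    protected byte ComputeIndex(int row, int column)
    {
        double x = (column - Width * 0.75) * 3.0 / Width;
        double y = (row - Height * 0.5) * 3.0 / Width;
        double a = 0, b = 0;
        int iterations = 0;
        while (iterations < 255 && a * a + b * b < 4)
        {
            double temp = a * a - b * b + x;
            b = 2 * a * b + y;
            a = temp;
            iterations++;
        }
        return (byte) iterations;
    }

    // Each example in this section is just an implementation of this method:
    // the first row of pixels comes first (left to right), then the second, and so on.
    protected abstract byte[] Render();
}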

I should note that using LINQ really isn’t an ideal solution here—there are various inefficiencies in the way that I’m doing things. Don’t focus on that side of things: concentrate on the idea that we have an embarrassingly parallel query, and we want to execute it across multiple cores. The following listing shows the single-threaded version in all its simple glory:

Listing 12.10. Single-threaded Mandelbrot generation query
var query = from row in Enumerable.Range(0, Height)
            from column in Enumerable.Range(0, Width)
            select ComputeIndex(row, column);

return query.ToArray();

We iterate over every row and every column within each row, computing the index of the relevant pixel. Calling ToArray() evaluates the resulting sequence, converting it into an array. Figure 12.5 shows the beautiful results.

Figure 12.5. Mandelbrot image generated on a single thread

This took about 5.5 seconds to generate on my dual-core laptop; the ComputeIndex method performs more iterations than we really need, in order to make the timing differences more obvious.[7] Now that we have a benchmark in terms of both timing and what the results should look like, let’s try to parallelize the query.

7 Proper benchmarking is hard—particularly when threading is involved. I haven’t attempted to do rigorous measurements or anything of the kind. The timings given are just meant to be indicative of “faster” and “slower”; please take the numbers with a pinch of salt.

12.4.2. Introducing ParallelEnumerable, ParallelQuery, and AsParallel

Parallel LINQ brings with it several new types... but in many cases, you’ll never see their names mentioned. They live in the System.Linq namespace, so you don’t even need to change using directives. ParallelEnumerable is a static class, similar to Enumerable—it mostly contains extension methods, the majority of which extend ParallelQuery.

This latter type has both a nongeneric and a generic form (ParallelQuery and ParallelQuery<TSource>) but most of the time you’ll use the generic form, just as IEnumerable<T> is more widely used than IEnumerable. Additionally, there’s OrderedParallelQuery<TSource>, which is the parallel equivalent of IOrderedEnumerable<T>. The relationships between all of these types are shown in figure 12.6.

Figure 12.6. Class diagram for Parallel LINQ, including relationship to normal LINQ interfaces

As you can see, ParallelQuery<TSource> implements IEnumerable<TSource>, so once you’ve constructed a query appropriately, you can iterate through the results in the normal way. Once you have a parallel query, the extension methods in ParallelEnumerable take precedence over the ones in Enumerable (because ParallelQuery<T> is more specific than IEnumerable<T>; see section 10.2.3 if you need a reminder of the rules)—which is how the parallelism is maintained throughout a query. There’s a parallel equivalent to all the LINQ standard query operators—although you should be careful if you’ve created any of your own extension methods. You’ll still be able to call them, but they’ll force the query to be single-threaded from that point onward.

So how do you get a parallel query to start with? By calling AsParallel—an extension method in ParallelEnumerable, but which extends IEnumerable<T>. So we can parallelize our Mandelbrot query incredibly simply, as shown in the following listing.

Listing 12.11. First attempt at a multithreaded Mandelbrot generation query
var query = from row in Enumerable.Range(0, Height).AsParallel()
            from column in Enumerable.Range(0, Width)
            select ComputeIndex(row, column);

return query.ToArray();

Job done? Well, not quite. This query does run in parallel—but the results aren’t quite what we need: it doesn’t maintain the order in which we process the rows. Instead of our beautiful Mandelbrot image, we get something like figure 12.7... although the exact details change every time, of course.

Figure 12.7. Mandelbrot image generated using an unordered query, resulting in some sections being incorrectly placed

Oops. On the bright side, this rendered in about 3.2 seconds, so my machine was clearly making use of its second core. On the other hand, getting the right answer is pretty important.

You may be surprised to hear that this is a deliberate feature of Parallel LINQ. Ordering a parallel query requires more coordination between the threads, and the whole purpose of parallelization is to improve performance—so PLINQ defaults to an unordered query. It’s a bit of a nuisance in our case though.

12.4.3. Tweaking parallel queries

Fortunately, there’s a way out of this—you just need to force the query to be treated as ordered, which is available via the AsOrdered extension method. Listing 12.12 shows the fixed code, which produces the original image. It’s slightly slower than the unordered query, but still significantly faster than the single-threaded version.

Listing 12.12. Multithreaded Mandelbrot query maintaining ordering
var query = from row in Enumerable.Range(0, Height).AsParallel().AsOrdered()
            from column in Enumerable.Range(0, Width)
            select ComputeIndex(row, column);

return query.ToArray();

The nuances of ordering are beyond the scope of this book, but I recommend that you read this blog post, http://blogs.msdn.com/pfxteam/archive/2008/06/11/8592301.aspx, which goes into the gory details. A number of other methods can be used to alter how the query behaves (there's a combined sketch after this list):

  • AsUnordered—Makes an ordered query unordered; if you only need results to be ordered for the first part of a query, this allows later stages to be executed more efficiently.
  • WithCancellation—Specifies a cancellation token to be used with this query. Cancellation tokens are used throughout Parallel Extensions to allow tasks to be cancelled in a safe, controlled manner.
  • WithDegreeOfParallelism—Allows you to specify the maximum number of concurrent tasks used to execute the query. You could use this to limit the number of threads used if you wanted to avoid swamping the machine, or to increase the number of threads used for a query which wasn’t CPU-bound.
  • WithExecutionMode—Can be used to force the query to execute in parallel, even if Parallel LINQ thinks it’d execute faster as a single-threaded query.
  • WithMergeOptions—Allows you to tweak how the results are buffered: disabling buffering gives the shortest time before the first result is returned, but also lower throughput; full buffering gives the highest throughput, but no results are returned before the query has executed completely. The default is a compromise between the two.
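Here's a sketch of several of these methods chained together; the particular values are purely illustrative, not recommendations:

var query = Enumerable.Range(0, Height)
    .AsParallel()
    .AsOrdered()
    .WithDegreeOfParallelism(2)                                  // cap at two concurrent tasks
    .WithExecutionMode(ParallelExecutionMode.ForceParallelism)   // never fall back to sequential
    .WithMergeOptions(ParallelMergeOptions.NotBuffered)          // favor latency over throughput
    .SelectMany(row => Enumerable.Range(0, Width),
                (row, column) => ComputeIndex(row, column));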

The important point is that aside from ordering, these shouldn’t affect the results of the query. You can design your query and test it in LINQ to Objects, then parallelize it, work out your ordering requirements, and tweak it if necessary to perform just how you want it to. If you showed the final query to someone who knew LINQ but not PLINQ, you’d only have to explain the PLINQ-specific method calls—the rest of the query would be familiar. Have you ever seen such an easy way to achieve concurrency? (The rest of Parallel Extensions is aimed at achieving simplicity where possible, too.)

 

Play with the Code Yourself

A couple of further points are demonstrated in the downloadable source code: if you parallelize across the whole query of pixels rather than just the rows, an unordered query looks even weirder; and there’s a ParallelEnumerable.Range method that gives PLINQ a bit more information than calling Enumerable.Range(...).AsParallel(). I used AsParallel() in this section, as that’s the more general way of parallelizing a query: most queries don’t start with a range.

 

Changing the in-process query model from single-threaded to parallel is quite a small conceptual leap, really. In our next section we’ll turn the model on its head.

12.5. Inverting the query model with LINQ to Rx

All of the LINQ libraries we’ve seen so far have one thing in common: you pull data from them using IEnumerable<T>. At first sight, that seems so obvious that it’s not worth saying—what would be the alternative? Well, how about if you push the data instead of pulling it? Instead of the data consumer being in control, the provider can be in the driving seat, letting the data consumer react when new data is available. Don’t worry too much if all this sounds dauntingly different: you actually know about the fundamental concept already, in the form of events. If you’re comfortable with the idea of subscribing to an event, reacting to it, and unsubscribing later, that’s a good starting point.

Reactive Extensions for .NET is a Microsoft project on DevLabs (see http://mng.bz/R7ip and http://mng.bz/HCLP); versions are available for .NET 3.5 SP1, .NET 4, Silverlight 3 and 4, and there’s even a version targeting JavaScript. You may hear it going by various names, but Rx and LINQ to Rx are the most common abbreviations, and they’re the ones I’ll use here. Its scope is more than just the reactive side of things we’re looking at here—in particular, there’s an interesting assembly called System.Interactive that contains various extra LINQ to Objects methods; the push operations are implemented within System.Reactive. Even within the push model we’ll barely be scratching the surface here. I know this is true for everything we’ve covered in this chapter, but I think it’s particularly applicable in this section: not only is there a lot to learn about the library itself, but it’s a whole different way of thinking. There are loads of videos on Channel 9 (see http://mng.bz/QoXE)—some are based on the mathematical aspects, whereas others are more practical. In this section I’ll be emphasizing the way that the LINQ concepts can be applied to this push model for data flow.

Enough of the introduction... let’s meet the two interfaces that form the basis of LINQ to Rx.

12.5.1. IObservable<T> and IObserver<T>

The data model of LINQ to Rx is the mathematical dual of the normal IEnumerable<T> model.[8] When you iterate over a pull collection, you effectively start off by saying, “Please give me an iterator” (the call to GetEnumerator) and then repeatedly say “Is there another item? If so, I’d like it now” (via calls to MoveNext and Current). LINQ to Rx reverses this. Instead of requesting an iterator, you provide an observer. Then, instead of requesting the next item, your code is told when one is ready—or when an error occurs, or the end of the data is reached. Here are the declarations of the two interfaces involved:

8 For a more detailed examination of this duality—and the essence of LINQ itself—I recommend Bart de Smet’s “MinLinq” blog post at http://mng.bz/96Wh.

public interface IObservable<T>
{
    IDisposable Subscribe(IObserver<T> observer);
}

public interface IObserver<T>
{
    void OnNext(T value);
    void OnCompleted();
    void OnError(Exception error);
}

These interfaces are actually part of .NET 4 (in the System namespace), even though the rest of LINQ to Rx is in a separate download. In fact, they’re IObservable<out T> and IObserver<in T> in .NET 4, expressing the covariance of IObservable and the contravariance of IObserver. We’ll learn more about generic variance in the next chapter, but I’m presenting the interfaces here as if they were invariant for the sake of simplicity. One concept at a time! Figure 12.8 shows the duality in terms of how data flows in each model.

Figure 12.8. Sequence diagram showing the duality of IEnumerable<T> and IObservable<T>

I suspect I’m not alone in finding the push model harder to think about, as it has the natural ability to work asynchronously—but look at how much simpler it is than the pull model, in terms of the flow diagram. This is partly due to the multiple method approach of the pull model: if IEnumerator<T> just had a method with a signature of bool TryGetNext(out T item), it’d be somewhat simpler.

Earlier I mentioned that LINQ to Rx is similar to the events we’re already familiar with. Calling Subscribe on an observable is like using += with an event to register a handler. The disposable value returned by Subscribe remembers the observer you passed in: disposing of it is like using -= with the same handler. In many cases you don’t need to unsubscribe from the observable at all; the return value exists for when you need to unsubscribe halfway through a sequence—the equivalent of breaking out of a foreach loop early. Failing to dispose of an IDisposable value may feel like anathema, but it’s often safe in LINQ to Rx. None of the examples in this chapter uses the return value of Subscribe.
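In code, the analogy looks like this (ticks is just an illustrative observable; any IObservable<T> would do):

IObservable<int> ticks = Observable.Range(0, 100);
IDisposable subscription = ticks.Subscribe(x => Console.WriteLine(x));   // like +=
// ... later, the equivalent of breaking out of a foreach loop early:
subscription.Dispose();                                                  // like -=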

That’s all there is to IObservable<T>—but what about the observer itself? Why does it have three methods? Consider the normal pull model where for any MoveNext/Current pair of calls, three things can happen:

  • We may be at the end of the sequence, in which case MoveNext returns false.
  • We may not have reached the end of the sequence, in which case MoveNext returns true, and Current returns the new value.
  • An error may occur—we could fail to read the next line from a network connection, for example. In this case, an exception would be thrown.

The IObserver<T> interface represents each of these options as a separate method. Typically an observer will have its OnNext method called repeatedly, and then finally OnCompleted—unless there’s an error of some kind, in which case OnError will be called instead. After the sequence has completed or encountered an error, no further method calls will be made. You rarely need to implement IObserver<T> directly, though. There are many extension methods on IObservable<T>, including overloads for Subscribe. These allow you to subscribe to an observable by just providing appropriate delegates: usually you provide a delegate to be executed for each item, and then optionally one to be executed on completion, on error, or both.

With that bit of theory out of the way, we can see some actual code using LINQ to Rx.

12.5.2. Starting simply (again)

We’re going to demonstrate LINQ to Rx in the same way we started off with LINQ to Objects—using a range. Instead of Enumerable.Range, we’ll use Observable.Range, which creates an observable range. Each time an observer subscribes to the range, the numbers are emitted to that observer using OnNext, followed by OnCompleted. We’ll start off as simply as we can, just printing out each value as we receive it, and a confirmation message at the end or if an error occurs. The following listing shows that this is actually less code than you’d need for the pull model.

Listing 12.13. First contact with IObservable<T>
var observable = Observable.Range(0, 10);
observable.Subscribe(x => Console.WriteLine("Received {0}", x),
                     e => Console.WriteLine("Error: {0}", e),
                     () => Console.WriteLine("Finished"));

In this case it’s hard to see how we could get an error, but I’ve included the error notification delegate for completeness. The results are as you’d expect:

Received 0
Received 1
...
Received 9
Finished

The observable returned by the Range method is known as a cold observable: it lies dormant until an observer subscribes to it, at which point it’ll emit the values to that individual observer. If you subscribe with another observer, that will see another copy of the range. This isn’t quite the same as a normal event such as a button click, where several observers could be subscribed to the same actual sequence of values—and the values may be effectively yielded whether there are any observers or not. (You can click a button even if there aren’t any event handlers attached, after all.) Sequences like this are known as hot observables. It’s important to know which type you’re dealing with, even though the same set of operations apply to both kinds.
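You can see the cold behavior directly: each subscription below receives its own complete copy of the range.

var cold = Observable.Range(0, 3);
cold.Subscribe(x => Console.WriteLine("First: {0}", x));    // prints 0, 1, 2
cold.Subscribe(x => Console.WriteLine("Second: {0}", x));   // prints 0, 1, 2 again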

Now that we’ve done the simplest thing possible, let’s try some familiar LINQ operators.

12.5.3. Querying observables

By now I’m sure you’re familiar with the pattern—there are various extension methods in a static class (called Observable, somewhat predictably) that perform appropriate transformations. We’ll look at just a few of the available operators, and think a little about what’s not available, and why it’s not.

Filtering and Projecting

Let’s jump straight into a query expression that takes a sequence of numbers, filters out the odd ones, and squares anything that’s left. We subscribe Console.WriteLine to the final result of the query, so that any items produced will be displayed. The following listing shows the code—look at how the query expression could easily be a LINQ to Objects query.

Listing 12.14. Filtering and projecting in LINQ to Rx
var numbers = Observable.Range(0, 10);
var query = from number in numbers
            where number % 2 == 0
            select number * number;
query.Subscribe(Console.WriteLine);

For simplicity’s sake, I haven’t added handlers for completion or error—and using the conversion from the Console.WriteLine method group to an Action<int> keeps the code nice and short. This produces the same results it would in LINQ to Objects: 0, 4, 16 and so on. Let’s move on to grouping.

Grouping

A group by query expression in LINQ to Rx produces a new IGroupedObservable<T> for each group—although what you then do with the grouping isn’t always obvious. For example, it’s not uncommon to have a nested subscription so that each time a new group is produced, you subscribe an observer to that group. The results within each group are produced as they’re received by the grouping construct—effectively it acts as a sort of redirection choice, like an usher at a play examining each person’s ticket as they arrive, and directing them to the relevant section of the theatre. By contrast, LINQ to Objects collects a whole group together before returning it—which means it has to read to the end of the sequence, buffering all the results.

The following listing shows an example of this nested subscription, and also demonstrates how group results are emitted.

Listing 12.15. Grouping numbers mod 3
var numbers = Observable.Range(0, 10);
var query = from number in numbers
            group number by number % 3;
query.Subscribe(group => group.Subscribe(
    x => Console.WriteLine("Value: {0}; Group: {1}", x, group.Key)));

The best way to understand this is probably to remember that dealing with groups in LINQ to Objects often involves having a nested foreach loop—so we have nested subscriptions in LINQ to Rx. When in doubt, try to find the duality between the two data models. In LINQ to Objects we’d normally process each whole group in turn, whereas the order in LINQ to Rx means the output of listing 12.15 looks like this:

Value: 0; Group: 0
Value: 1; Group: 1
Value: 2; Group: 2
Value: 3; Group: 0
Value: 4; Group: 1
Value: 5; Group: 2
Value: 6; Group: 0
Value: 7; Group: 1
Value: 8; Group: 2
Value: 9; Group: 0

This makes perfect sense when you think of the push model—and in some cases it means that operations that would’ve required a lot of data buffering in LINQ to Objects can be implemented in LINQ to Rx much more efficiently. As a final example, let’s look at another operator that uses multiple sequences.

Flattening

LINQ to Rx supplies a few overloads of SelectMany and the idea is still the same as in LINQ to Objects: each item in the original sequence produces a new sequence, and the result is the combination of all these new sequences, flattened. The following listing shows this in action—it’s a little like listing 11.16, when we first introduced SelectMany in LINQ to Objects.

Listing 12.16. SelectMany producing multiple ranges
var query = from x in Observable.Range(1, 3)
            from y in Observable.Range(1, x)
            select new { x, y };
query.Subscribe(Console.WriteLine);

Here are the results, which should be reasonably predictable:

{ x = 1, y = 1 }
{ x = 2, y = 1 }
{ x = 2, y = 2 }
{ x = 3, y = 1 }
{ x = 3, y = 2 }
{ x = 3, y = 3 }

In this case, the results are deterministic, but that’s only because by default, Observable.Range emits items on the current thread. It’s entirely possible to have multiple sequences being produced on multiple threads. For fun, you might want to change the second call to Observable.Range to specify Scheduler.ThreadPool as a third argument. At that point, while each of the inner sequences comes out in order with respect to itself, those separate sequences can be mixed up amongst each other. Imagine a sports stadium with one official firing a starting pistol for several different races in quick succession: even if you know the winner of each race, you don’t know which race will finish first.

Apologies if this makes you want to go and lie down. If it’s any consolation, it gives me the same feeling. I do find it fascinating at the same time though.

What’s in and What’s Out?

We already know that a let clause works by just calling Select, so that’s okay—but not all LINQ to Objects operators are implemented in LINQ to Rx. The missing operators are generally the ones that would have to buffer all their output and return a new observable. For example, there’s no Reverse method, and no OrderBy. C# is quite happy with that—it just won’t let you use an orderby clause in a query expression based on observables. There’s a Join method, but that doesn’t deal with observables directly—it handles join plans. This is part of the Rx implementation of the join-calculus, and well beyond the scope of this book. Likewise there’s no GroupJoin method, so join...into isn’t supported.

For the various LINQ standard query operators that aren’t covered by the query expression syntax—and to see the range of extra methods it makes available—see the System.Reactive documentation. Although you may start off being disappointed about the familiar functionality from LINQ to Objects that’s missing in LINQ to Rx (usually because it just doesn’t make sense), you may be surprised by how rich the set of available methods really is. Many of the new methods are then ported to LINQ to Objects in the System.Interactive assembly.

12.5.4. What’s the point?

I’m well aware that I haven’t provided any compelling reasons to use LINQ to Rx yet. This is deliberate, as I don’t intend to show a full, useful example—it’s incidental to the point of this chapter, and would take too much space. But Rx provides an elegant way of thinking about all kinds of asynchronous processes—normal .NET events (which can be viewed as an observable using Observable.FromEvent), asynchronous I/O, and calls to web services, for example. It provides a way of managing the complexity and concurrency in an efficient manner. There’s no doubt that it is harder to get your head around than LINQ to Objects, but if you’re in the kind of situation where it’d be useful, you’re already facing a mountain of complexity.
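For instance, here's the rough shape of bridging a normal event into Rx. I'm assuming a Windows Forms button and the reflection-based FromEvent overload from the DevLabs-era builds that takes a target and an event name:

var clicks = Observable.FromEvent<EventArgs>(button, "Click");
clicks.Subscribe(evt => Console.WriteLine("Button clicked"));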

LINQ to Rx is a relatively young project, with the first release appearing on DevLabs in November 2009. If you’ve found this short introduction interesting, you should definitely take a closer look. The reason I wanted to cover Rx in this book, despite not being able to do it any sort of justice, is that it shows why LINQ was designed the way it was. Although there are conversion methods available between IEnumerable<T> and IObservable<T>, there’s no inheritance relationship—if the language had made any requirement that the types involved in LINQ had to be pull sequences, there would’ve been no query expression support for Rx at all. It would’ve been even more disastrous if extension methods had been limited to IEnumerable<T> in some way. Likewise, we’ve seen that not all the normal LINQ operators are applicable to Rx—which is why it’s important that the language specifies query translations in terms of a pattern that should be supported as far as it makes sense for the given provider. I hope you have a sense that even though the push and pull models are very different to work with, LINQ acts as a sort of unifying force where possible.

You may be relieved to hear that our last topic is a lot simpler—it’s back on the home ground of LINQ to Objects, but this time we’re writing our own extension methods.

12.6. Extending LINQ to Objects

One of the nice things about LINQ is that it’s extensible. Not only can you come up with your own query providers and data models, you can also add to existing ones. In my experience, the most common situation where this is useful is with LINQ to Objects. If you need a particular type of query that isn’t directly supported (or is awkward or inefficient with the standard query operators), you can write your own. Of course, writing a general-purpose generic method can be more challenging than just solving your immediate problem, but if you find yourself writing similar code a few times, it’s worth considering whether you could refactor it into a new operator.

Personally I enjoy writing query operators. There are interesting technical challenges, but it rarely requires a huge amount of code—and the results can be elegant. In this section we’ll look at some of the ways you can make your custom operators behave efficiently and predictably, followed by a full sample for selecting a random element from a sequence.

12.6.1. Design and implementation guidelines

Most of these may seem fairly obvious, but this section can form a useful checklist when you write an operator.

Unit Tests

It’s generally pretty easy to write a good set of unit tests for operators, although you may be surprised at how many you end up with for what originally appears to be simple code. Don’t forget to test corner cases such as empty sequences as well as invalid arguments. MoreLINQ has some helper methods in its unit test project that you may wish to use for your own tests.

Argument Checking

Good methods check their arguments... but there’s a problem when it comes to LINQ operators. Many operators return another sequence, as we’ve already seen—and iterator blocks are the easiest way to implement this functionality. But you should really perform the argument checking as soon as your method is called, rather than waiting until the caller decides to iterate over the results. If you’re going to use an iterator block, split your method into two: perform argument checking in a public method and then call a private method to do the iteration.
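As a sketch, the pattern looks like this for a hypothetical TakeEvery operator (the operator itself is just an illustration):

public static IEnumerable<T> TakeEvery<T>(this IEnumerable<T> source, int step)
{
    // Eager validation: this code runs as soon as the method is called...
    if (source == null)
    {
        throw new ArgumentNullException("source");
    }
    if (step <= 0)
    {
        throw new ArgumentOutOfRangeException("step");
    }
    return TakeEveryImpl(source, step);
}

private static IEnumerable<T> TakeEveryImpl<T>(IEnumerable<T> source, int step)
{
    // ...whereas this iterator block runs only when the result is iterated.
    int index = 0;
    foreach (T item in source)
    {
        if (index % step == 0)
        {
            yield return item;
        }
        index++;
    }
}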

Optimization

IEnumerable<T> itself is fairly weak in terms of the operations it supports, but the execution-time type of a sequence you’re working on may have considerably more functionality. For example, the Count() operator will always work, but it’ll generally be an O(n) operation. If you call it on an implementation of ICollection<T>, though, it can use the Count property directly—which will generally be O(1). In .NET 4, this optimization is extended to cover ICollection as well. Likewise retrieving a specific element by index is slow in the general case, but can be efficient if the sequence implements IList<T>. If your operator can benefit from these optimizations, you can have different execution paths depending on the execution-time type. To test the slow path in unit tests, you can always call Select(x => x) on a List<T> to retrieve a nonlist sequence. LinkedList<T> can test the case where you want an ICollection<T> that doesn’t implement IList<T>.
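The test itself is straightforward. For an operator that needs a count, something like this gives you the fast path when it's available:

// Fast path if the sequence is really a collection; O(n) iteration otherwise.
ICollection<T> collection = source as ICollection<T>;
int count = collection != null ? collection.Count : source.Count();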

Documentation

It’s important to document what your code will do with its inputs, and also the expected performance of the operator. This is particularly important if your method needs to work with multiple sequences: which will be evaluated first, and how far? Does your code stream its data, buffer it, or a mixture? Does it use deferred or immediate execution? Can any parameters be null, and if so, does that have a special meaning?

Iterate Once Where Possible

IEnumerable<T> will let you iterate over it multiple times—you can have multiple iterators active at the same time over the same sequence, potentially. But this is rarely a good idea within an operator. Wherever possible, it’s wise to iterate over your input sequences just once. This will mean your code will work even for nonrepeatable sequences, such as lines read from a network stream. If you do need to read the sequence multiple times (and you don’t want to buffer the whole sequence yourself like Reverse does), you should draw particular attention to this in the documentation.

Remember to Dispose of Iterators

In most cases, you can use a foreach statement to iterate over your data source. But it’s sometimes useful to treat the first item differently, in which case using an iterator directly can lead to the simplest code. In that situation, remember to include a using block for the iterator. We’re not used to disposing of iterators ourselves because normally foreach does it for us, which can make it hard to spot the bug.
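Here's a sketch of the shape, for an operator that treats the first item specially; note the using block around the manually fetched iterator:

using (IEnumerator<T> iterator = source.GetEnumerator())
{
    if (!iterator.MoveNext())
    {
        yield break;                     // empty sequence: nothing to yield
    }
    T first = iterator.Current;          // the first item gets special treatment
    yield return first;
    while (iterator.MoveNext())
    {
        yield return iterator.Current;   // the rest are processed uniformly
    }
}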

Custom Comparisons

Many LINQ operators have overloads that allow you to specify an appropriate IEqualityComparer<T> or IComparer<T>. If you’re building a general-purpose library for others (potentially developers who you aren’t in contact with), it may be worth providing similar overloads yourself. On the other hand, if you’re the sole user, or it’s just going to be members of your team, you can do this on a need-to-implement basis. It’s easy though: typically the simpler overloads just call a more complex one, passing EqualityComparer<T>.Default or Comparer<T>.Default as the comparison.
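The delegation pattern is as simple as it sounds. Here it is for a hypothetical ContainsAll operator:

public static bool ContainsAll<T>(this IEnumerable<T> source, IEnumerable<T> items)
{
    // The simple overload just supplies the default comparer.
    return source.ContainsAll(items, EqualityComparer<T>.Default);
}

public static bool ContainsAll<T>(this IEnumerable<T> source, IEnumerable<T> items,
                                  IEqualityComparer<T> comparer)
{
    HashSet<T> candidates = new HashSet<T>(source, comparer);
    return items.All(candidates.Contains);
}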

Now that I’ve talked the talk, let’s check whether I can actually walk the walk.

12.6.2. Sample extension: selecting a random element

The idea of our extension method is simple: given a sequence and an instance of Random, return a random element from the sequence. You could add an overload that didn’t require the instance of Random, but I prefer to make the dependency on a random number generator explicit. Randomness is a tricky topic for various reasons; rather than discuss it here, I’ve included an article on the book’s website (see http://mng.bz/h483). Also for reasons of space, I haven’t included the XML documentation or unit tests in listing 12.17, but of course they’re in the downloadable code.

Listing 12.17. Extension method to choose a random element from a sequence
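(The code below is a sketch reconstructed to match the discussion that follows; the downloadable code is the definitive version.)

public static T RandomElement<T>(this IEnumerable<T> source, Random random)
{
    if (source == null)
    {
        throw new ArgumentNullException("source");
    }
    if (random == null)
    {
        throw new ArgumentNullException("random");
    }
    ICollection collection = source as ICollection;
    if (collection != null)
    {
        if (collection.Count == 0)
        {
            throw new InvalidOperationException("Sequence was empty");
        }
        return source.ElementAt(random.Next(collection.Count));
    }
    using (IEnumerator<T> iterator = source.GetEnumerator())
    {
        if (!iterator.MoveNext())
        {
            throw new InvalidOperationException("Sequence was empty");
        }
        int countSoFar = 1;
        T current = iterator.Current;
        while (iterator.MoveNext())
        {
            countSoFar++;
            if (random.Next(countSoFar) == 0)
            {
                current = iterator.Current;   // replace with probability 1/countSoFar
            }
        }
        return current;
    }
}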

Listing 12.17 doesn’t show the technique of splitting an extension method into argument validation and then implementation, because it doesn’t use an iterator block. Look back at our implementation of the Where operator in section 10.3.7 for an example of this. No custom comparisons are required either—but apart from that, every item on our checklist is appropriate.

First we validate our arguments in the obvious way. In chapter 15 we’ll learn an alternative way of expressing preconditions using Code Contracts, but for now I’ve stuck with normal exceptions. Things get more interesting at the next step, where we handle the case where the source sequence implements ICollection.[9] This allows us to take the count cheaply, and then generate just a single random number to work out which element to pick. We don’t explicitly handle the case where the source sequence implements IList<T>—instead, we rely on ElementAt to do that for us (as it’s documented to do).

9 The downloadable code contains the same test for implementations of ICollection<T>, just like Count() does in .NET 4. It’s exactly the same block of code, just with a different type and a different variable name.

If we’re dealing with a noncollection sequence (such as the result of another query operator), we want to avoid taking the count and then picking an element: that would require us to either buffer the contents of the sequence or iterate over it twice. Instead we step through it once, explicitly fetching the iterator so that we can test for an empty sequence easily. The clever bit[10] is in the loop, where we replace our current idea of a random element with the element from the iterator with a probability of 1/n, where n is the number of elements we’ve seen so far. So there’s a 1/2 chance of replacing the first element with the second, a 1/3 chance of replacing the result after two elements with the third element, and so on. The final result is that each element in the sequence has an equal chance of being picked, and we’ve managed to iterate just once.

10 I’m allowed to say that it’s clever because even though it’s my implementation, it’s not my idea.

Of course the important point isn’t what this particular method does—it’s the potential issues we had to think about as we implemented it. Once you know what to look for, it really doesn’t take much effort to implement a robust method like this, and your personal toolbox will grow over time.

12.7. Summary

Phew! This chapter has been the exact opposite of most of the rest of the book. Instead of focusing on a single topic in great detail, we’ve covered a range of LINQ technologies, but at a shallow level.

I wouldn’t expect you to feel particularly familiar with any one of the specific technologies we’ve looked at here, but I hope you have a deeper understanding of why LINQ is important. It’s not about XML, or in-memory queries, SQL queries, observables, or enumerators—it’s about consistency of expression, and giving the C# compiler the opportunity to validate your queries to at least some extent, regardless of their final execution platform.

You should now appreciate why expression trees are so important that they’re among the few framework elements that the C# compiler has direct intimate knowledge of (along with strings, IDisposable, IEnumerable<T>, and Nullable<T>, for example). They’re passports for behavior, allowing it to cross the border of the local machine, expressing logic in whatever foreign tongue is catered for by a LINQ provider.

It’s not just expression trees—we’ve also relied on the query expression translation employed by the compiler, and the way that lambda expressions can be converted to both delegates and expression trees. Extension methods are also important, as without them each provider would have to give implementations of all the relevant methods. If you look back at all the new features of C#, you’ll find few that don’t contribute significantly to LINQ in some way or other. That’s part of the reason for this chapter’s existence: to show the connections between all the features.

I shouldn’t wax lyrical for too long, though—as well as the upsides of LINQ, we’ve seen a few gotchas. LINQ won’t always allow us to express everything we need in a query, nor does it hide all the details of the underlying data source. When it comes to database LINQ providers, the impedance mismatches that have caused developers so much trouble in the past are still with us: we can reduce their impact with ORM systems and the like, but without a proper understanding of the query being executed on your behalf, you’re likely to run into significant issues. In particular, don’t think of LINQ as a way of removing your need to understand SQL—just think of it as a way of hiding the SQL when you’re not interested in the details. Likewise, in order to plan an effective parallel query, you’ve got to know where ordering matters and where it doesn’t, and perhaps help the framework along a bit by giving it more tuning information.

Since .NET 3.5 came out, I’ve been delighted to see how wholeheartedly the community has embraced it. In that case I have the benefit of hindsight. I have little idea of how developers will take to the features of C# 4... but let’s dive into them in the final part of the book.
