15. LINQ with Query Expressions

The end of Chapter 14 showed a query using the standard query operators GroupJoin(), SelectMany(), and Distinct(), in addition to the creation of two anonymous types. The result was a statement that spanned multiple lines and was considerably more complex and difficult to comprehend than statements typically written using only features of C# 2.0. Modern programs that manipulate rich data sets often require such complex queries; it would therefore be nice if the language made them easier to read. Domain-specific query languages such as SQL make it much easier to read and understand a query, but lack the full power of the C# language. That is why the C# language designers added query expression syntax to C# 3.0. With query expressions, many standard query operator expressions are transformed into more readable code, much like SQL.

In this chapter, we introduce query expressions and use them to express many of the queries from the preceding chapter.

Introducing Query Expressions

Two of the most frequent operations developers perform are filtering the collection to eliminate unwanted items and projecting the collection so that the items take a different form. For example, given a collection of files, we could filter it to create a new collection of only the files with a “.cs” extension, or only files larger than 1 million bytes. We could also project the file collection to create a new collection of paths to the directories the files are located in and the corresponding directory size. Query expressions provide straightforward syntaxes for both of these common operations. Listing 15.1 shows a query expression that filters a collection of strings; Output 15.1 shows the results.

Listing 15.1. Simple Query Expression


using System;
using System.Collections.Generic;
using System.Linq;

// ...

  static string[] Keywords = {
      "abstract", "add*", "alias*", "as", "ascending*",
      "async*", "await*", "base","bool", "break",
      "by*", "byte", "case", "catch", "char", "checked",
      "class", "const", "continue", "decimal", "default",
      "delegate", "descending*", "do", "double",
      "dynamic*", "else", "enum", "event", "equals*",
      "explicit", "extern", "false", "finally", "fixed",
      "from*", "float", "for", "foreach", "get*", "global*",
      "group*", "goto", "if", "implicit", "in", "int",
      "into*", "interface", "internal", "is", "lock", "long",
      "join*", "let*", "namespace", "new", "null", "object",
      "on*", "operator", "orderby*", "out", "override",
      "params", "partial*", "private", "protected", "public",
      "readonly", "ref", "remove*", "return", "sbyte", "sealed",
      "select*", "set*", "short", "sizeof", "stackalloc",
      "static", "string", "struct", "switch", "this", "throw",
      "true", "try", "typeof", "uint", "ulong", "unchecked",
      "unsafe", "ushort", "using", "value*", "var*", "virtual",
      "void", "volatile", "where*", "while", "yield*"};

  private static void ShowContextualKeywords1()
  {
      IEnumerable<string> selection =
          from word in Keywords
              where !word.Contains('*')
              select word;


      foreach (string keyword in selection)
      {
          Console.Write(keyword + " ");
      }
  }

// ...

Output 15.1.

abstract as base bool break byte case catch char checked class const
continue decimal default delegate do double else enum event explicit
extern false finally fixed float for foreach goto if implicit in int
interface internal is lock long namespace new null object operator out
override params private protected public readonly ref return sbyte
sealed short sizeof stackalloc static string struct switch this throw
true try typeof uint ulong unchecked unsafe ushort using virtual void
volatile while

In this query expression, selection is assigned the collection of C# reserved keywords. The where clause in this example filters out the contextual keywords (those marked with an asterisk in the Keywords array), leaving only the reserved keywords.

Query expressions always begin with a “from clause” and end with a “select clause” or a “group clause,” identified by the from, select, or group contextual keyword, respectively. The identifier word in the from clause is called a range variable; it represents each item in the collection, much as the loop variable in a foreach loop represents each item in a collection.
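The comparison between a range variable and a foreach loop variable can be made concrete. The following is a minimal sketch, assuming a small hypothetical Words array (not the chapter's Keywords array) and illustrative class and method names. It places a query expression next to a hand-written loop that produces the same items. One difference to keep in mind is that the loop runs immediately, whereas the query runs only when its result is enumerated (here, by ToList()).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RangeVariableSketch
{
    // Hypothetical sample data; an asterisk marks a contextual keyword.
    public static readonly string[] Words = { "as*", "base", "by*", "bool" };

    // Query expression form: word is the range variable.
    public static List<string> WithQuery()
    {
        IEnumerable<string> selection =
            from word in Words
            where !word.Contains('*')
            select word;

        // ToList() enumerates the query, which is when it actually runs.
        return selection.ToList();
    }

    // Loop form: word is the loop variable and plays the same role.
    public static List<string> WithLoop()
    {
        List<string> result = new List<string>();
        foreach (string word in Words)
        {
            if (!word.Contains('*'))
            {
                result.Add(word);
            }
        }
        return result;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(" ", WithQuery()));  // base bool
        Console.WriteLine(string.Join(" ", WithLoop()));   // base bool
    }
}
```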

Developers familiar with SQL will notice that query expressions have a syntax that is similar to that of SQL. This design was deliberate so that LINQ would be easy to learn for programmers who already know SQL. However, there are some obvious differences. The first difference most developers familiar with SQL notice is that the C# query expression shown here has the clauses in the order from, then where, then select. The equivalent SQL query has the SELECT clause first, then the FROM clause, and ends with the WHERE clause.

One reason for this is to enable IntelliSense, the feature of the IDE whereby the editor produces helpful user interface elements such as drop-down lists that describe the members of a given object. Because from appears first and identifies the string array Keywords as the data source, the code editor can deduce that the range variable word is of type string. When you are typing the code into the editor and reach the dot following word, the editor will display only the members of string.

If the from clause appeared after the select, as it does in SQL, as you were typing in the query the editor would not know what the data type of word was, and therefore would not be able to display a list of word’s members. In Listing 15.1, for example, it wouldn’t be possible to predict that Contains() was a possible member of word.

The C# query expression order also more closely matches the order in which operations are logically performed. When evaluating the query, you begin by identifying the collection (described by the from clause), then filter out the unwanted items (with the where clause), and finally describe the desired result (with the select clause).

Finally, the C# query expression order ensures that the rules about where (range) variables are in scope are mostly consistent with the scoping rules for local variables; for example, a (range) variable must be declared by a clause (typically a from clause) before the variable can be used, much as a local variable must always be declared before it can be used.

Projection

The result of a query expression is a collection of type IEnumerable<T> or IQueryable<T>.[1] The actual type T is inferred from the select or group by clause. In Listing 15.1, for example, the compiler knows that Keywords is of type string[], which is convertible to IEnumerable<string>, and deduces that word is therefore of type string. The query ends with select word, so the result of the query expression must be a collection of strings, and the type of the query expression is IEnumerable<string>.

1. The result of a query expression is, as a practical matter, almost always IEnumerable<T> or a type derived from it. It is legal, though somewhat perverse, to create an implementation of the query methods that return other types; there is no requirement in the language that the result of a query expression be convertible to IEnumerable<T>.

In this case the “input” and “output” of the query are both a collection of strings. However, the “output” type can be quite different from the “input” type if the expression in the select clause is of an entirely different type. Consider the query expression in Listing 15.2, and its corresponding output in Output 15.2.

Listing 15.2. Projection Using Query Expressions


using System;
using System.Collections.Generic;
using System.Linq;
using System.IO;

// ...

  static void List1(string rootDirectory, string searchPattern)
  {
      IEnumerable<string> fileNames = Directory.GetFiles(
          rootDirectory, searchPattern);
      IEnumerable<FileInfo> fileInfos =
          from fileName in fileNames
          select new FileInfo(fileName);


      foreach (FileInfo fileInfo in fileInfos)
      {
            Console.WriteLine("{0}({1})",
                fileInfo.Name, fileInfo.LastWriteTime);
      }
  }

// ...

Output 15.2.

Account.cs(11/22/2011 11:56:11 AM)
Bill.cs(8/10/2011 9:33:55 PM)
Contact.cs(8/19/2011 11:40:30 PM)
Customer.cs(11/17/2011 2:02:52 AM)
Employee.cs(8/17/2011 1:33:22 AM)
Person.cs(10/22/2011 10:00:03 PM)

This query expression results in an IEnumerable<FileInfo> rather than the IEnumerable<string> data type returned by Directory.GetFiles(). The select clause of the query expression can potentially project out a data type that is different from what was collected by the from clause expression.

Notice that in this example, the type FileInfo was chosen because it has the two relevant properties needed for the desired output: the filename and the last write time. There might not be such a convenient type if you needed other information not captured in the FileInfo object. Anonymous types provide a convenient and concise way to project the exact data you need without having to find or create an explicit type. (In fact, this scenario was the key motivator for adding anonymous types to the language.) Listing 15.3 provides output similar to that in Listing 15.2, but via anonymous types rather than FileInfo.

Listing 15.3. Anonymous Types within Query Expressions


using System;
using System.Collections.Generic;
using System.Linq;
using System.IO;

// ...

  static void List2(string rootDirectory, string searchPattern)
  {
        var fileNames = Directory.GetFiles(
            rootDirectory, searchPattern);
        var fileResults =
            from fileName in fileNames
            select new
            {
                Name = fileName,
                LastWriteTime = File.GetLastWriteTime(fileName)
            };


        foreach (var fileResult in fileResults)
        {
            Console.WriteLine("{0}({1})",
                fileResult.Name, fileResult.LastWriteTime);

        }
  }

// ...

In this example, the query projects out only the filename and its last file write time. A projection such as the one in Listing 15.3 makes little difference when working with something small such as FileInfo. However, “horizontal” projection that filters down the amount of data associated with each item in the collection is extremely powerful when the amount of data is significant and retrieving it (perhaps from a different computer over the Internet) is expensive. Rather than retrieving all the data when a query executes, anonymous types make it possible to retrieve and store only the required data in the collection.

Imagine, for example, a large database that has tables with 30 or more columns. If there were no anonymous types, developers would be required to either use objects containing unnecessary information or define small, specialized classes useful only for storing the specific data required. Instead, anonymous types enable support for types to be defined by the compiler—types that contain only the data needed for their immediate scenario. Other scenarios can have a different projection of only the properties needed for that scenario.
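As an illustration of such a projection, the following is a minimal sketch, assuming a hypothetical Customer class that stands in for a wide table (only five properties are shown to keep it short). The class, its properties, and the DescribeCities() method are all invented for this example. Note that with an in-memory collection like this one the full objects already exist, so the saving is conceptual; the retrieval benefit described above applies when a query provider can translate the projection into a narrower remote query.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical "wide" type standing in for a table with many columns.
public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string Street { get; set; }
    public string City { get; set; }
    public string Notes { get; set; }
}

public static class HorizontalProjectionSketch
{
    // Projects each customer down to two properties and formats the result.
    public static List<string> DescribeCities(IEnumerable<Customer> customers)
    {
        // The anonymous type has exactly two properties, Name and City;
        // the compiler takes the property names from the member accesses.
        var cityView =
            from customer in customers
            select new { customer.Name, customer.City };

        List<string> lines = new List<string>();
        foreach (var item in cityView)
        {
            lines.Add(string.Format("{0} ({1})", item.Name, item.City));
        }
        return lines;
    }

    public static void Main()
    {
        List<Customer> customers = new List<Customer>
        {
            new Customer { Id = 1, Name = "Inigo", City = "Madrid",
                Street = "1 Main St.", Notes = "..." },
            new Customer { Id = 2, Name = "Fezzik", City = "Oslo",
                Street = "2 Elm St.", Notes = "..." }
        };

        foreach (string line in DescribeCities(customers))
        {
            Console.WriteLine(line);
        }
    }
}
```

A different scenario could define a different anonymous type over the same Customer objects, for example one carrying only Id and Notes.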


Filtering

In Listing 15.1, we include a where clause that filters out the contextual keywords, leaving only the reserved keywords. The where clause filters the collection “vertically”; if you think of the collection as a vertical list of items, the where clause makes that vertical list shorter so that there are fewer items within the collection. The filter criteria are expressed with a predicate: an expression that returns a bool, such as !word.Contains('*') (as in Listing 15.1) or File.GetLastWriteTime(file) < DateTime.Now.AddMonths(-1). The latter is shown in Listing 15.6, the output of which appears in Output 15.5.

Listing 15.6. Query Expression Filtering Using where


using System;
using System.Collections.Generic;
using System.Linq;
using System.IO;

// ...

  static void FindMonthOldFiles(
      string rootDirectory, string searchPattern)
  {
      IEnumerable<FileInfo> files =
          from fileName in Directory.GetFiles(
              rootDirectory, searchPattern)
          where File.GetLastWriteTime(fileName) <
              DateTime.Now.AddMonths(-1)

          select new FileInfo(fileName);

      foreach (FileInfo file in files)
      {
          //  As simplification, current directory is
          //  assumed to be a subdirectory of
          //  rootDirectory
          string relativePath = file.FullName.Substring(
                  Environment.CurrentDirectory.Length);
          Console.WriteLine(".{0}({1})",
              relativePath, file.LastWriteTime);
      }
  }

// ...

Output 15.5.

.\TestData\Bill.cs(8/10/2011 9:33:55 PM)
.\TestData\Contact.cs(8/19/2011 11:40:30 PM)
.\TestData\Employee.cs(8/17/2011 1:33:22 AM)
.\TestData\Person.cs(10/22/2011 10:00:03 PM)
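A predicate can be any Boolean expression, including a compound one. The following is a minimal sketch, assuming a small hypothetical Words array and illustrative method names. It shows one where clause with a compound predicate next to two consecutive where clauses, which select the same items here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FilterSketch
{
    // Hypothetical sample data; an asterisk marks a contextual keyword,
    // following the convention of the Keywords array.
    public static readonly string[] Words =
        { "as", "add*", "async*", "abstract", "by*" };

    // One where clause with a compound predicate.
    public static List<string> SingleWhere()
    {
        return (from word in Words
                where word.Contains('*') && word[0] == 'a'
                select word).ToList();
    }

    // Two where clauses; an item must pass both to reach the select.
    public static List<string> ChainedWhere()
    {
        return (from word in Words
                where word.Contains('*')
                where word[0] == 'a'
                select word).ToList();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(" ", SingleWhere()));   // add* async*
        Console.WriteLine(string.Join(" ", ChainedWhere()));  // add* async*
    }
}
```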

Sorting

To order the items using a query expression you can use the orderby clause, as shown in Listing 15.7.

Listing 15.7. Sorting Using a Query Expression with an orderby Clause


using System;
using System.Collections.Generic;
using System.Linq;
using System.IO;

// ...
  static void ListByFileSize1(
      string rootDirectory, string searchPattern)
  {
      IEnumerable<string> fileNames =
          from fileName in Directory.GetFiles(
              rootDirectory, searchPattern)
          orderby (new FileInfo(fileName)).Length descending,
              fileName

          select fileName;

      foreach (string fileName in fileNames)
      {
          Console.WriteLine("{0}", fileName);
      }
  }
// ...

Listing 15.7 uses the orderby clause to sort the files returned by Directory.GetFiles() first by file size in descending order and then by filename in ascending order. Multiple sort criteria are separated by a comma such that first the items are ordered by size, and if the size is the same they are ordered by filename. ascending and descending are contextual keywords indicating the sort order direction. Specifying the order as ascending or descending is optional; if the direction is omitted (as it is here on filename) the default is ascending.
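To see the two sort criteria without touching the file system, the following is a minimal sketch, assuming a hypothetical Names array and illustrative method names. It also shows the same ordering written with the standard query operators OrderByDescending() and ThenBy().

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OrderBySketch
{
    // Hypothetical sample data.
    public static readonly string[] Names =
        { "pear", "fig", "apple", "kiwi", "plum" };

    // Longest names first; ties are broken alphabetically
    // (ascending is the default direction).
    public static List<string> WithQueryExpression()
    {
        return (from name in Names
                orderby name.Length descending, name
                select name).ToList();
    }

    // The same ordering written with standard query operators.
    public static List<string> WithMethodCalls()
    {
        return Names
            .OrderByDescending(name => name.Length)
            .ThenBy(name => name)
            .ToList();
    }

    public static void Main()
    {
        // Both lines print: apple kiwi pear plum fig
        Console.WriteLine(string.Join(" ", WithQueryExpression()));
        Console.WriteLine(string.Join(" ", WithMethodCalls()));
    }
}
```

Writing two separate orderby clauses is not the same as separating the criteria with a comma: a second orderby clause sorts the whole sequence again by its own key rather than acting as a tiebreaker.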

The let Clause

In Listing 15.8 we have a query that is very similar to that in Listing 15.7 except that the type argument of IEnumerable<T> is FileInfo. Notice that there is a problem with this query: We have to redundantly create a FileInfo twice, in both the orderby clause and the select clause.

Listing 15.8. Projecting a FileInfo Collection and Sorting by File Size


using System;
using System.Collections.Generic;
using System.Linq;
using System.IO;

// ...
  static void ListByFileSize2(
      string rootDirectory, string searchPattern)
  {
      IEnumerable<FileInfo> files =
          from fileName in Directory.GetFiles(
              rootDirectory, searchPattern)
          orderby new FileInfo(fileName).Length, fileName
          select new FileInfo(fileName);

      foreach (FileInfo file in files)
      {
          //  As simplification, current directory is
          //  assumed to be a subdirectory of
          //  rootDirectory
          string relativePath = file.FullName.Substring(
              Environment.CurrentDirectory.Length);
          Console.WriteLine(".{0}({1})",
              relativePath, file.Length);
      }
  }
// ...

Unfortunately, although the end result is correct, Listing 15.8 ends up instantiating a FileInfo object twice for each item in the source collection, which seems wasteful and unnecessary. To avoid unnecessary and potentially expensive overhead like this, you can use a let clause, as demonstrated in Listing 15.9.

Listing 15.9. Ordering the Results in a Query Expression


     // ...
     IEnumerable<FileInfo> files =
          from fileName in Directory.GetFiles(
              rootDirectory, searchPattern)
          let file = new FileInfo(fileName)
          orderby file.Length, fileName
          select file;

      // ...

The let clause introduces a new range variable that can hold the value of an expression that is used throughout the remainder of the query expression. You can add as many let clauses as you like; simply add each as an additional clause to the query after the first from clause but before the final select/group by clause.
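The following is a minimal sketch of a query with more than one let clause, assuming a hypothetical Words array and an illustrative Describe() method. The second let clause uses the range variable introduced by the first, and both remain available to the orderby and select clauses.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LetSketch
{
    // Hypothetical sample data; an asterisk marks a contextual keyword.
    public static readonly string[] Words = { "yield*", "as", "var*", "int" };

    public static List<string> Describe()
    {
        return (from word in Words
                let name = word.TrimEnd('*')   // written once, reused below
                let length = name.Length        // a let can use an earlier let
                orderby length, name
                select name + ":" + length).ToList();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(" ", Describe()));
        // as:2 int:3 var:3 yield:5
    }
}
```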

Grouping

A common data manipulation scenario is the grouping of related items. In SQL, this generally involves aggregating the items to produce a summary or total or other aggregate value. LINQ, however, is more expressive than this. LINQ expressions allow for individual items to be grouped into a series of subcollections, and for those groups to be associated with items in the collection being queried. For example, Listing 15.10 and Output 15.6 demonstrate how to group together the contextual keywords and the regular keywords.

Listing 15.10. Grouping Together Query Results


using System;
using System.Collections.Generic;
using System.Linq;

// ...

  private static void GroupKeywords1()
  {
      IEnumerable<IGrouping<bool, string>> selection =
          from word in Keywords
          group word by word.Contains('*');

      foreach (IGrouping<bool, string> wordGroup
          in selection)
      {
          Console.WriteLine(Environment.NewLine + "{0}:",
              wordGroup.Key ?
                  "Contextual Keywords" : "Keywords");
          foreach (string keyword in wordGroup)
          {
              Console.Write(" " +
                  (wordGroup.Key ?
                      keyword.Replace("*", null) : keyword));
          }
      }
  }

// ...

Output 15.6.

Keywords:
 abstract as base bool break byte case catch char checked class const
continue decimal default delegate do double else enum event explicit
extern false finally fixed float for foreach goto if implicit in int
interface internal is lock long namespace new null object operator out
override params private protected public readonly ref return sbyte
sealed short sizeof stackalloc static string struct switch this throw
true try typeof uint ulong unchecked unsafe ushort using virtual void
volatile while
Contextual Keywords:
 add alias ascending async await by descending dynamic equals from
get global group into join let on orderby partial remove select
set value var where yield

There are several things to note in this listing. First, the query result is a sequence of elements of type IGrouping<bool, string>. The first type argument indicates that the “group key” expression following by was of type bool, and the second type argument indicates that the “group element” expression following group was of type string. That is, the query produces a sequence of groups where the Boolean key is the same for each string in the group.

Because a query with a group clause produces a sequence of collections, the common pattern for iterating over the results is to create nested foreach loops. In Listing 15.10, the outer loop iterates over the groupings and prints out the type of keyword as a header. The nested foreach loop prints each keyword in the group as an item below the header.

Since the result of this query expression is itself a sequence, you can query the resultant sequence like any other. Listing 15.11 and Output 15.7 show how to create an additional query that adds a projection onto a query that produces a sequence of groups. (The next section, on query continuations, shows a more pleasant syntax for adding additional query clauses to a complete query.)

Listing 15.11. Selecting an Anonymous Type Following the group Clause


using System;
using System.Collections.Generic;
using System.Linq;

// ...

  private static void GroupKeywords1()
  {
      IEnumerable<IGrouping<bool, string>> keywordGroups =
          from word in Keywords
          group word by word.Contains('*');

      var selection =
          from groups in keywordGroups
          select new
          {
              IsContextualKeyword = groups.Key,
              Items = groups
          };


      foreach (var wordGroup in selection)
      {
          Console.WriteLine(Environment.NewLine + "{0}:",
              wordGroup.IsContextualKeyword ?
                  "Contextual Keywords" : "Keywords");
          foreach (var keyword in wordGroup.Items)
          {
              Console.Write(" " +
                  keyword.Replace("*", null));
          }
      }
  }

// ...

Output 15.7.

Keywords:
 abstract as base bool break byte case catch char checked class const
continue decimal default delegate do double else enum event explicit
extern false finally fixed float for foreach goto if implicit in int
interface internal is lock long namespace new null object operator out
override params private protected public readonly ref return sbyte
sealed short sizeof stackalloc static string struct switch this throw
true try typeof uint ulong unchecked unsafe ushort using virtual void
volatile while
Contextual Keywords:
 add alias ascending async await by descending dynamic equals from
get global group into join let on orderby partial remove select
set value var where yield

The group clause results in a query that produces a collection of IGrouping<TKey, TElement> objects, just as the GroupBy() standard query operator did (see Chapter 14). The select clause in the subsequent query uses an anonymous type to effectively rename IGrouping<TKey, TElement>.Key to IsContextualKeyword and to name the subcollection property Items. With this change, the nested foreach uses wordGroup.Items rather than wordGroup directly, as it did in Listing 15.10. Another potential property to add to the anonymous type would be the count of items within the subcollection. However, this is already available by calling wordGroup.Items.Count(), so the benefit of adding it to the anonymous type directly is questionable.
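The correspondence between the group clause and GroupBy() can be sketched as follows, assuming a hypothetical Words array and illustrative method names. Unlike Listing 15.10, this sketch groups a transformed element (the word without its asterisk) so that both parts of the clause are visible: the expression after group becomes the element selector and the expression after by becomes the key selector.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GroupBySketch
{
    // Hypothetical sample data; an asterisk marks a contextual keyword.
    public static readonly string[] Words = { "as", "add*", "by*", "int" };

    public static IEnumerable<IGrouping<bool, string>> WithQueryExpression()
    {
        return from word in Words
               group word.TrimEnd('*') by word.Contains('*');
    }

    public static IEnumerable<IGrouping<bool, string>> WithMethodCalls()
    {
        return Words.GroupBy(
            word => word.Contains('*'),   // key selector (after "by")
            word => word.TrimEnd('*'));   // element selector (after "group")
    }

    // Formats each group as "Key: item item", with groups separated by "; ".
    public static string Format(
        IEnumerable<IGrouping<bool, string>> groups)
    {
        return string.Join("; ",
            groups.Select(g => g.Key + ": " + string.Join(" ", g)));
    }

    public static void Main()
    {
        // Both lines print: False: as int; True: add by
        Console.WriteLine(Format(WithQueryExpression()));
        Console.WriteLine(Format(WithMethodCalls()));
    }
}
```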

Query Continuation with into

As we saw in Listing 15.11, you can use an existing query as the input to a second query. However, it is not necessary to write an entirely new query expression when you want to use the results of one query as the input to another. You can extend any query with a query continuation clause using the contextual keyword into. A query continuation is nothing more than syntactic sugar for creating two queries and using the first as the input to the second. The range variable introduced by the into clause (groups in Listing 15.12) becomes the range variable for the remainder of the query; any previous range variables are logically a part of the earlier query and cannot be used in the query continuation. Listing 15.12 shows how to rewrite the code of Listing 15.11 to use a query continuation instead of two queries.

Listing 15.12. Selecting with a Query Continuation


using System;
using System.Collections.Generic;
using System.Linq;

// ...

  private static void GroupKeywords1()
  {
      var selection =
          from word in Keywords
          group word by word.Contains('*')
          into groups
              select new
              {
                  IsContextualKeyword = groups.Key,
                  Items = groups
              };


      // ...

  }

// ...

The ability to run additional queries on the results of an existing query using into is not specific to queries ending with group clauses, but rather can be used on all query expressions. Query continuation is simply a shorthand for writing query expressions that consume the results of other query expressions. You can think of into as a “pipeline operator,” because it “pipes” the results of the first query into the second query. You can arbitrarily chain together many queries in this way.
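As an illustration of a continuation on a query that ends with select rather than group, the following is a minimal sketch, assuming a hypothetical Words array and illustrative method names. The first method uses into; the second writes the same logic as two separate queries.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ContinuationSketch
{
    // Hypothetical sample data; an asterisk marks a contextual keyword.
    public static readonly string[] Words =
        { "add*", "alias*", "as", "by*", "where*", "while" };

    public static List<string> WithContinuation()
    {
        return (from word in Words
                where word.Contains('*')
                select word.TrimEnd('*')
                into name                  // word is out of scope from here on
                where name.Length > 3
                orderby name
                select name).ToList();
    }

    public static List<string> WithTwoQueries()
    {
        // First query: contextual keywords with the asterisk removed.
        IEnumerable<string> names =
            from word in Words
            where word.Contains('*')
            select word.TrimEnd('*');

        // Second query: consumes the results of the first.
        return (from name in names
                where name.Length > 3
                orderby name
                select name).ToList();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(" ", WithContinuation())); // alias where
        Console.WriteLine(string.Join(" ", WithTwoQueries()));   // alias where
    }
}
```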

“Flattening” Sequences of Sequences with Multiple from Clauses

It is often desirable to “flatten” a sequence of sequences into a single sequence. For example, a sequence of customers might each have an associated sequence of orders, or a sequence of directories might each have an associated sequence of files. The SelectMany sequence operator (discussed in Chapter 14) concatenates together all the subsequences; to do the same thing with query expression syntax you can use multiple from clauses, as shown in Listing 15.13.

Listing 15.13. Multiple Selection



     var selection =
          from word in Keywords
          from character in word
          select character;

The preceding query will produce the sequence of characters a, b, s, t, r, a, c, t, a, d, d, *, a, l, i, a, s, *, a, s, ....

Multiple from clauses can also be used to produce the Cartesian product—the set of all possible combinations of several sequences—as shown in Listing 15.14.

Listing 15.14. Cartesian Product


     var numbers = new[] { 1, 2, 3 };
     var product =
          from word in Keywords
          from number in numbers
          select new {word, number};

This would produce a sequence of pairs (abstract, 1), (abstract, 2), (abstract, 3), (add*, 1), (add*, 2), ....
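Both uses of multiple from clauses correspond to SelectMany(). The following is a minimal sketch, assuming small hypothetical Words and Numbers arrays and illustrative method names; the results are turned into strings only to make them easy to print. The method call forms shown are hand-written equivalents and are not necessarily the exact calls the compiler generates.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SelectManySketch
{
    // Hypothetical sample data.
    public static readonly string[] Words = { "as", "by*" };
    public static readonly int[] Numbers = { 1, 2 };

    // Flattening: each word contributes its characters.
    public static string FlattenWithQuery()
    {
        IEnumerable<char> characters =
            from word in Words
            from character in word
            select character;
        return new string(characters.ToArray());
    }

    public static string FlattenWithSelectMany()
    {
        // A string is an IEnumerable<char>, so it can serve as the subsequence.
        return new string(Words.SelectMany(word => word).ToArray());
    }

    // Cartesian product: every word is paired with every number.
    public static List<string> ProductWithQuery()
    {
        return (from word in Words
                from number in Numbers
                select word + number).ToList();
    }

    public static List<string> ProductWithSelectMany()
    {
        return Words.SelectMany(
            word => Numbers,                        // inner sequence per word
            (word, number) => word + number)        // result per pair
            .ToList();
    }

    public static void Main()
    {
        Console.WriteLine(FlattenWithQuery());                        // asby*
        Console.WriteLine(FlattenWithSelectMany());                   // asby*
        Console.WriteLine(string.Join(" ", ProductWithQuery()));      // as1 as2 by*1 by*2
        Console.WriteLine(string.Join(" ", ProductWithSelectMany())); // as1 as2 by*1 by*2
    }
}
```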

Query Expressions Are Just Method Invocations

Somewhat surprisingly, adding query expressions to C# 3.0 required no changes to the CLR or to the CIL language. Rather, the C# compiler simply translates query expressions into a series of method calls. Consider, for example, a variation of the query expression from Listing 15.1, shown in Listing 15.16; here the where clause keeps the contextual keywords rather than removing them.

Listing 15.16. Simple Query Expression


  private static void ShowContextualKeywords1()
  {
      IEnumerable<string> selection =
          from word in Keywords
              where word.Contains('*')
              select word;

      // ...
  }

// ...

After compilation, the expression from Listing 15.16 is converted to an IEnumerable<T> extension method call from System.Linq.Enumerable, as shown in Listing 15.17.

Listing 15.17. Query Expression Translated to Standard Query Operator Syntax


  private static void ShowContextualKeywords3()
  {
      IEnumerable<string> selection =
          Keywords.Where(word => word.Contains('*'));


      // ...
  }

// ...

As discussed in Chapter 14, the lambda expression is then itself translated by the compiler to emit a method with the body of the lambda, and the usage of it becomes allocation of a delegate to that method.
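The effect of that last translation step can be approximated by hand. The following is a rough sketch only, assuming a hypothetical Words array and an invented IsContextual() method that stands in for the method the compiler emits; the real generated method has a compiler-chosen name, and the compiler may also cache the delegate instead of allocating it on every call.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LambdaLoweringSketch
{
    // Hypothetical sample data; an asterisk marks a contextual keyword.
    public static readonly string[] Words = { "as", "add*", "by*", "int" };

    // Stand-in for the method the compiler emits for the lambda body.
    private static bool IsContextual(string word)
    {
        return word.Contains('*');
    }

    // What the programmer writes.
    public static List<string> WithLambda()
    {
        return Words.Where(word => word.Contains('*')).ToList();
    }

    // A hand-written approximation: the extension method call becomes an
    // ordinary static call, and the delegate is allocated explicitly.
    public static List<string> HandWritten()
    {
        Func<string, bool> predicate = new Func<string, bool>(IsContextual);
        return Enumerable.Where(Words, predicate).ToList();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(" ", WithLambda()));   // add* by*
        Console.WriteLine(string.Join(" ", HandWritten()));  // add* by*
    }
}
```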

Every query expression can (and must) be translated to method calls, but not every sequence of method calls has a corresponding query expression. For example, there is no query expression equivalent for the extension method TakeWhile<T>(Func<T, bool> predicate), which repeatedly returns items from the collection as long as the predicate returns true.

For those queries that do have both a method call form and a query expression form, which is better? This is a judgment call; some queries are better suited for query expressions whereas others are more readable as method invocations.


Guidelines

DO use query expression syntax to make queries easier to read, particularly if they involve complex from, let, join, or group clauses.

CONSIDER using the standard query operators (method call form) if the query involves operations that do not have a query expression syntax, such as Count(), TakeWhile(), or Distinct().
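The two forms can also be combined: a query expression can be wrapped in parentheses and followed by a standard query operator that has no query expression syntax. The following is a minimal sketch of that pattern, assuming a hypothetical Words array and illustrative method names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MixedSyntaxSketch
{
    // Hypothetical sample data; an asterisk marks a contextual keyword.
    public static readonly string[] Words =
        { "as", "add*", "by*", "class", "catch" };

    // Count() applied to the result of a query expression.
    public static int CountContextual()
    {
        return (from word in Words
                where word.Contains('*')
                select word).Count();
    }

    // TakeWhile() stops at the first sorted word starting with 'c' or later.
    public static List<string> BeforeLetterC()
    {
        return (from word in Words
                orderby word
                select word)
            .TakeWhile(word => word[0] < 'c')
            .ToList();
    }

    // Distinct() removes repeated lengths before they are counted.
    public static int DistinctLengthCount()
    {
        return (from word in Words
                select word.Length)
            .Distinct()
            .Count();
    }

    public static void Main()
    {
        Console.WriteLine(CountContextual());                   // 2
        Console.WriteLine(string.Join(" ", BeforeLetterC()));   // add* as by*
        Console.WriteLine(DistinctLengthCount());               // 4
    }
}
```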


Summary

This chapter introduced a new syntax, that of query expressions. Readers familiar with SQL will immediately see the similarities between query expressions and SQL. However, query expressions also introduce additional functionality, such as grouping into a hierarchical set of new objects, which was unavailable with SQL. All of the functionality of query expressions was already available via standard query operators, but query expressions frequently provide a simpler syntax for expressing such a query. Whether through standard query operators or query expression syntax, however, the end result is a significant improvement in the way developers are able to code against collection APIs, an improvement that ultimately provides a paradigm shift in the way object-oriented languages are able to interface with relational databases.

In the next chapter, we continue our discussion of collections: investigating some of the .NET Framework collection types as well as how to define custom collections.
