CHAPTER 21

Introduction to LINQ

  • What Is LINQ?
  • LINQ Providers
  • Query Syntax and Method Syntax
  • Query Variables
  • The Structure of Query Expressions
  • The Standard Query Operators
  • LINQ to XML

What Is LINQ?

In a relational database system, data is organized into nicely normalized tables and accessed with a very simple but powerful query language—SQL. SQL can work with any set of data in a database because the data is organized into tables, following strict rules.

In a program, however, data is stored in class objects or structs that can differ vastly from one another. As a result, there has been no general query language for retrieving data from data structures; the method of retrieving data from objects has always been custom-designed as part of the program. LINQ makes it easy to query collections of objects.

The following are the important high-level characteristics of LINQ:

  • LINQ stands for Language Integrated Query and is pronounced link.
  • LINQ is an extension of the .NET Framework that allows you to query collections of data in a manner similar to using SQL to query databases.
  • With LINQ you can query data from databases, collections of program objects, XML documents, and more.

The following code shows a simple example of using LINQ. In this code, the data source being queried is simply an array of ints. The definition of the query is the statement with the from and select keywords. Although the query is defined in this statement, it is actually performed and used in the foreach statement at the bottom.

   static void Main()
   {
      int[] numbers = { 2, 12, 5, 15 };         // Data source

      IEnumerable<int> lowNums =                // Define and store the query.
                         from n in numbers
                         where n < 10
                         select n;

      foreach (var x in lowNums)                // Execute the query.
          Console.Write("{0}, ", x);
   }

This code produces the following output:


2, 5,

LINQ Providers

In the previous example, the data source was simply an array of ints, which is an in-memory object of the program. LINQ can work with many other types of data sources, such as SQL databases and XML documents. For every data source type, however, there must be a module of code under the covers that implements the LINQ queries in terms of that data source type. These code modules are called LINQ providers. The important points about LINQ providers are the following:

  • Microsoft provides LINQ providers for a number of common data source types, as shown in Figure 21-1.
  • You can use any LINQ-enabled language (C# in our case) to query any data source type for which there is a LINQ provider.
  • New LINQ providers are constantly being produced by third parties for all sorts of data source types.

Figure 21-1. The architecture of LINQ, the LINQ-enabled languages, and LINQ providers

There are entire books dedicated to LINQ in all its forms and subtleties, but that's clearly beyond the scope of this chapter. Instead, this chapter will introduce you to LINQ and explain how to use it with program objects (LINQ to Objects) and XML (LINQ to XML).

Anonymous Types

Before getting into the details of LINQ's querying features, I'll start by covering a language feature that allows you to create unnamed class types. These are called, not surprisingly, anonymous types.

In Chapter 6 we covered object initializers, the construct that allows you to initialize the fields and properties of a new class instance when using an object-creation expression. Just to remind you, this kind of object-creation expression consists of three components: the keyword new, the class name or constructor, and the object initializer. The object initializer consists of a comma-separated list of member initializers between a set of curly braces.

Creating a variable of an anonymous type uses the same form—but without the class name or constructor. The following line of code shows the object-creation expression form of an anonymous type:

Image

The following code shows an example of creating and using an anonymous type. It creates a variable called student, with an anonymous type that has three string properties and one int property. Notice in the WriteLine statement that the instance's members are accessed just as if they were members of a named type.

Image

This code produces the following output:


Mary Jones, Age 19, Major: History
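The original listing is not reproduced in this text. The following is a minimal sketch that is consistent with the description and output above; the property names (LName, FName, Age, Major) and the class name AnonymousTypeSketch are assumptions made for illustration.

```csharp
using System;

static class AnonymousTypeSketch
{
    // Creates an anonymous-type object and formats its members.
    public static string Describe()
    {
        // Assumed property names: three string properties and one int property.
        var student = new { LName = "Jones", FName = "Mary", Age = 19, Major = "History" };

        // Members are accessed just as if they belonged to a named type.
        return string.Format("{0} {1}, Age {2}, Major: {3}",
                             student.FName, student.LName, student.Age, student.Major);
    }

    static void Main()
    {
        Console.WriteLine(Describe());
    }
}
```

Because the type has no name, the variable must be declared with var, and the object cannot be stored in a field of the enclosing class.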

Important things to know about anonymous types are the following:

  • Anonymous types can be used only with local variables—not with class members.
  • Since an anonymous type doesn't have a name, you must use the var keyword as the variable type.

When the compiler encounters the object initializer of an anonymous type, it creates a new class type with a private name that it constructs. For each member initializer, it infers its type and creates a private variable of that type in the new class, and it creates a read-only property to access the variable. The property has the same name as the member initializer. Once the anonymous type is constructed, the compiler creates an object of that type.

Besides the assignment form of member initializers, anonymous type object initializers also allow two other forms: simple identifiers and member access expressions. These two forms are called projection initializers. The following variable declaration shows all three forms. The first member initializer is in the assignment form. The second is an identifier, and the third is a member access expression.

   var student = new { Age = 19, Major, Other.Name };

For example, the following code uses all three forms. Notice that the variables and members used in the projection initializers must be declared before the anonymous type object is created. Major is a local variable, and Name is a static field of class Other.

Image

This code produces the following output:


Mary Jones, Age 19, Major: History
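The original listing is not reproduced in this text. A minimal sketch of the three initializer forms, consistent with the description and output above, might look like the following; the value assigned to Other.Name and the class name ProjectionSketch are assumptions.

```csharp
using System;

static class Other
{
    public static string Name = "Mary Jones";   // Static field, used in a member access expression
}

static class ProjectionSketch
{
    public static string Describe()
    {
        string Major = "History";                // Local variable, used as a simple identifier

        // Assignment form, identifier form, and member access form, in that order.
        var student = new { Age = 19, Major, Other.Name };

        // The projected properties take the names Major and Name.
        return string.Format("{0}, Age {1}, Major: {2}",
                             student.Name, student.Age, student.Major);
    }

    static void Main()
    {
        Console.WriteLine(Describe());
    }
}
```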

The projection initializer form of the object initializer just shown has exactly the same result as the assignment form shown here:

   var student = new { Age = 19, Major = Major, Name = Other.Name };

Although your code cannot see the anonymous type, it's visible to object browsers. If the compiler encounters another anonymous type with the same property names, with the same inferred types, and in the same order, it reuses the type and creates a new instance rather than creating a new anonymous type.

Query Syntax and Method Syntax

There are two syntactic forms you can use when writing LINQ queries—query syntax and method syntax.

  • Query syntax is a declarative form that looks very much like an SQL statement. Query syntax is written in the form of query expressions.
  • Method syntax is an imperative form, which uses standard method invocations. The methods are from a set called the standard query operators, which will be described later in the chapter.
  • You can also combine both forms in a single query.

Microsoft recommends using query syntax because it's more readable, more clearly states your query intentions, and is therefore less error-prone. There are some operators, however, that can be written only using method syntax.

Note  Queries expressed using query syntax are translated by the C# compiler into method invocation form. There is no difference in runtime performance between the two forms.

The following code shows all three query forms. In the method syntax part, you might find that the parameter of the Where method looks a bit odd. It's a lambda expression, as was described in Chapter 15. I'll cover its use in LINQ a bit later in the chapter.

   static void Main( )
   {
      int[] numbers = { 2, 5, 28, 31, 17, 16, 42 };

      var numsQuery = from n in numbers                    // Query syntax
                      where n < 20
                      select n;

      var numsMethod = numbers.Where(x => x < 20);         // Method syntax

      int numsCount = (from n in numbers                   // Combined
                      where n < 20
                      select n).Count();

      foreach (var x in numsQuery)
         Console.Write("{0}, ", x);
      Console.WriteLine();

      foreach (var x in numsMethod)
         Console.Write("{0}, ", x);
      Console.WriteLine();

      Console.WriteLine(numsCount);
   }

This code produces the following output:


2, 5, 17, 16,
2, 5, 17, 16,
4

Query Variables

LINQ queries can return two types of results: an enumeration, which lists the items that satisfy the query parameters; or a single value, called a scalar, which is some form of summary of the results that satisfied the query.

In the following example code, the following happens:

  • The first statement creates an array of ints and initializes it with three values.
  • The second statement returns an IEnumerable object, which can be used to enumerate the results of the query.
  • The third statement executes a query and then calls a method (Count) that returns the count of the items returned from the query. We'll cover operators that return scalars, such as Count, later in the chapter.

   int[] numbers = { 2, 5, 28 };

   IEnumerable<int> lowNums = from n in numbers       // Returns an enumerable
                              where n < 20
                              select n;

   int numsCount            = (from n in numbers       // Returns an int
                              where n < 20
                              select n).Count();

The variable on the left of the equals sign is called the query variable. Although the types of the query variables are given explicitly in the example statements, you could also have had the compiler infer the types of the query variables by using the var keyword in place of the type names.

It's important to understand the contents of query variables. After executing the preceding code, query variable lowNums does not contain the results of the query. Instead, it contains an object of type IEnumerable<int>, which can perform the query if it's called upon to do so later in the code. Query variable numsCount, however, contains an actual integer value, which could only have been obtained by actually running the query.

The differences in the timing of the execution of the queries can be summarized as follows:

  • If a query expression returns an enumeration, the query is not executed until the enumeration is processed.
    • If the enumeration is processed multiple times, the query is executed multiple times.
    • If the data changes between the time the enumeration is produced and the time the query is executed, the query is run on the new data.
  • If the query expression returns a scalar, the query is executed immediately, and the result is stored in the query variable.

Figure 21-2 illustrates this for the enumerable query. Variable lowNums contains a reference to the enumerable that can enumerate the query results from the array.

Figure 21-2. The compiler creates an object that implements IEnumerable<int> and stores the query in the object.
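The timing difference can be observed directly. The following sketch, which is not from the original text, changes the source array after both queries have been defined; the class name DeferredSketch and the replacement value 3 are assumptions chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class DeferredSketch
{
    public static string Run()
    {
        int[] numbers = { 2, 5, 28 };

        IEnumerable<int> lowNums = from n in numbers      // Enumerable: not executed yet
                                   where n < 20
                                   select n;

        int numsCount = (from n in numbers                // Scalar: executed immediately
                         where n < 20
                         select n).Count();

        numbers[2] = 3;                                   // Change the data afterward

        // Enumerating lowNums runs the query now, against the changed array,
        // while numsCount keeps the value computed before the change.
        return string.Format("{0} | count before change: {1}",
                             string.Join(", ", lowNums), numsCount);
    }

    static void Main()
    {
        Console.WriteLine(Run());   // 2, 5, 3 | count before change: 2
    }
}
```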

The Structure of Query Expressions

A query expression consists of a from clause followed by a query body, as illustrated in Figure 21-3. Some of the important things to know about query expressions are the following:

  • The clauses must appear in the order shown.
    • The two parts that are required are the from clause and the select...group clause.
    • The other clauses are optional.
  • In a LINQ query expression, the select clause is at the end of the expression. This is different from SQL, where the SELECT clause is at the beginning of a query. One of the reasons for using this position in C# is that it allows Visual Studio's IntelliSense to give you more options while you're entering code.
  • There can be any number of from...let...where clauses, as illustrated in the figure.

Figure 21-3. The structure of a query expression consists of a from clause followed by a query body.

The from Clause

The from clause specifies the data collection that is to be used as the data source. It also introduces the iteration variable. The important points about the from clause are the following:

  • The iteration variable sequentially represents each element in the data source.
  • The syntax of the from clause, shown after this list, has the following parts:
    • Type is the type of the elements in the collection. This is optional, because the compiler can infer the type from the collection.
    • Item is the name of the iteration variable.
    • Items is the name of the collection to be queried. The collection must be enumerable, as described in Chapter 13.

   from Type Item in Items

The following code shows a query expression used to query an array of four ints. Iteration variable item will represent each of the four elements in the array and will be either selected or rejected by the where and select clauses following it. This code leaves out the optional type (int) of the iteration variable.

Image

This code produces the following output:


10, 11, 12,
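The original listing is not reproduced in this text. A minimal sketch consistent with the description and output above follows; the array name arr1, its contents, and the condition in the where clause are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class FromClauseSketch
{
    public static IEnumerable<int> LowItems()
    {
        int[] arr1 = { 10, 11, 12, 13 };     // Assumed data source: an array of four ints

        return from item in arr1             // item represents each element in turn;
               where item < 13               // its type (int) is inferred
               select item;
    }

    static void Main()
    {
        foreach (var item in LowItems())
            Console.Write("{0}, ", item);
    }
}
```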

Figure 21-4 shows the syntax of the from clause. The type specifier is optional, since it can be inferred by the compiler. There can be any number of optional join clauses.

Figure 21-4. The syntax of the from clause

Although there is a strong similarity between the LINQ from clause and the foreach statement, there are several major differences:

  • The foreach statement executes its body at the point in the code where it is encountered. The from clause, on the other hand, does not execute anything. It creates an enumerable object that's stored in the query variable. The query itself might or might not be executed later in the code.
  • The foreach statement imperatively specifies that the items in the collection are to be considered in order, from the first to the last. The from clause declaratively states that each item in the collection must be considered but does not assume an order.

The join Clause

The join clause in LINQ is much like the JOIN clause in SQL. If you're familiar with joins from SQL, then joins in LINQ will be nothing new for you conceptually, except for the fact that you can now perform them on collections of objects as well as database tables. If you're new to joins or need a refresher, then the next section should help clear things up for you.

The first important things to know about a join are the following:

  • A join operation takes two collections and creates a new temporary collection of objects, where each object contains all the fields of one object from each of the two initial collections.
  • Use a join to combine data from two or more collections.

The syntax for a join is shown here. It specifies that the second collection is to be joined with the collection in the previous clause.

Image

Figure 21-5 illustrates the syntax for the join clause.

Figure 21-5. Syntax for the join clause

The following annotated statement shows an example of the join clause:

Image

What Is a Join?

A join in LINQ takes two collections and creates a new collection where each element has members from the elements of the two original collections.

For example, the following code declares two classes: Student and CourseStudent.

  • Objects of type Student contain a student's last name and student ID number.
  • Objects of type CourseStudent represent a student who is enrolled in a course and contain the course name and a student ID number.

   public class Student
   {
      public int    StID;
      public string LastName;
   }

   public class CourseStudent
   {
      public string CourseName;
      public int    StID;
   }

Figure 21-6 shows the situation in a program where there are three students and three courses, and the students are enrolled in various courses. The program has an array called students, of Student objects, and an array called studentsInCourses, of CourseStudent objects, which contains one object for every student enrolled in each course.

Figure 21-6. Students enrolled in various courses

Suppose now that you want to get the last name of every student in a particular course. The students array has the last names, and the studentsInCourses array has the course enrollment information. To get the information, you must combine the information in the arrays, based on the student ID field, which is common to objects of both types. You can do this with a join on the StID field.

Figure 21-7 shows how the join works. The left column shows the students array, and the right column shows the studentsInCourses array. If we take the first student record and compare its ID with the student ID in each studentsInCourses object, we find that two of them match, as shown at the top of the center column. If we then do the same with the other two students, we find that the second student is taking one course, and the third student is taking two courses.

The five grayed objects in the middle column represent the join of the two arrays on field StID. Each object contains three fields: the LastName field from the Student class, the CourseName field from the CourseStudent class, and the StID field common to both classes.

Figure 21-7. Two arrays of objects and their join on field StID

The following code puts the whole example together. The query finds the last names of all the students taking the history course.

   using System;
   using System.Linq;

   class Program
   {
      public class Student {                         // Declare classes.
         public int    StID;
         public string LastName;
      }

      public class CourseStudent {
         public string CourseName;
         public int    StID;
      }
                                                     // Initialize arrays.
      static CourseStudent[] studentsInCourses = new CourseStudent[] {
           new CourseStudent { CourseName = "Art",         StID = 1 },
           new CourseStudent { CourseName = "Art",         StID = 2 },
           new CourseStudent { CourseName = "History",     StID = 1 },
           new CourseStudent { CourseName = "History",     StID = 3 },
           new CourseStudent { CourseName = "Physics",     StID = 3 },
       };

      static Student[] students = new Student[] {
           new Student { StID = 1, LastName = "Carson"   },
           new Student { StID = 2, LastName = "Klassen"  },
           new Student { StID = 3, LastName = "Fleming"  },
       };

      static void Main( )
      {
         // Find the last names of the students taking history.
         var query = from s in students
                     join c in studentsInCourses on s.StID equals c.StID
                     where c.CourseName == "History"
                     select s.LastName;

         // Display the names of the students taking history.
         foreach (var q in query)
            Console.WriteLine("Student taking History:  {0}", q);
      }
   }

This code produces the following output:


Student taking History:  Carson
Student taking History:  Fleming

The from . . . let . . . where Section in the Query Body

The optional from...let...where section is the first section of the query body. It can have any number of any of the three clauses that comprise it—the from clause, the let clause, and the where clause. Figure 21-8 summarizes the syntax of the three clauses.

Figure 21-8. The syntax of the from . . . let . . . where clause

The from Clause

You saw that a query expression starts with a required from clause, which is followed by the query body. The body itself can start with any number of additional from clauses, where each subsequent from clause specifies an additional source data collection and introduces a new iteration variable for use in further evaluations. The syntax and meanings of all the from clauses are the same.

The following code shows an example of this use.

  • The first from clause is the required clause of the query expression.
  • The second from clause is the first clause of the query body.
  • The select clause creates objects of an anonymous type.

Image

This code produces the following output:


{ a = 5, b = 6, sum = 11 }
{ a = 5, b = 7, sum = 12 }
{ a = 5, b = 8, sum = 13 }
{ a = 6, b = 6, sum = 12 }
{ a = 6, b = 7, sum = 13 }
{ a = 6, b = 8, sum = 14 }

The let Clause

The let clause takes the evaluation of an expression and assigns it to an identifier to be used in other evaluations. The syntax of the let clause is the following:

   let Identifier = Expression

For example, the query expression in the following code pairs each member of array groupA with each element of array groupB. The where clause eliminates each set of integers from the two arrays where the sum of the two is not equal to 12.

Image

This code produces the following output:


{ a = 3, b = 9, sum = 12 }
{ a = 4, b = 8, sum = 12 }
{ a = 5, b = 7, sum = 12 }
{ a = 6, b = 6, sum = 12 }
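The original listing is not reproduced in this text. A sketch consistent with the description and output above follows; the contents of groupA and groupB are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class LetClauseSketch
{
    public static List<string> PairsSummingTo12()
    {
        var groupA = new[] { 3, 4, 5, 6 };   // Assumed data
        var groupB = new[] { 6, 7, 8, 9 };

        var someInts = from a in groupA
                       from b in groupB
                       let sum = a + b              // Store the result in a new identifier
                       where sum == 12              // ... and reuse it here
                       select new { a, b, sum };    // ... and here

        return someInts.Select(x => x.ToString()).ToList();
    }

    static void Main()
    {
        foreach (var line in PairsSummingTo12())
            Console.WriteLine(line);
    }
}
```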

The where Clause

The where clause eliminates items from further consideration if they don't meet the specified condition. The syntax of the where clause is the following:

   where BooleanExpression

Important things to know about the where clause are the following:

  • A query expression can have any number of where clauses, as long as they are in the from...let...where section.
  • An item must satisfy all the where clauses to avoid elimination from further consideration.

The following code shows an example of a query expression that contains two where clauses. One eliminates each pair of integers from the two arrays whose sum is less than 11, and the other eliminates each pair whose element from groupA is not 4. Each pair selected must satisfy the conditions of both where clauses.

Image

This code produces the following output:


{ a = 4, b = 7, sum = 11 }
{ a = 4, b = 8, sum = 12 }
{ a = 4, b = 9, sum = 13 }
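The original listing is not reproduced in this text. A sketch consistent with the description and output above follows; the contents of the arrays and the order of the two where clauses are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class TwoWhereSketch
{
    public static List<string> Pairs()
    {
        var groupA = new[] { 3, 4, 5, 6 };   // Assumed data
        var groupB = new[] { 6, 7, 8, 9 };

        var someInts = from a in groupA
                       from b in groupB
                       let sum = a + b
                       where sum >= 11              // Condition 1
                       where a == 4                 // Condition 2
                       select new { a, b, sum };

        return someInts.Select(x => x.ToString()).ToList();
    }

    static void Main()
    {
        foreach (var line in Pairs())
            Console.WriteLine(line);
    }
}
```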

The orderby Clause

The orderby clause takes an expression and returns the result items in order according to the expression.

Figure 21-9 shows the syntax of the orderby clause. The optional keywords ascending and descending set the direction of the order. Expression is generally a field of the items.

  • The default ordering of an orderby clause is ascending. You can, however, explicitly set the ordering of the elements to either ascending or descending, using the ascending and descending keywords.
  • An orderby clause can contain any number of ordering expressions, which must be separated by commas.

Figure 21-9. The syntax of the orderby clause

The following code shows an example of student records ordered by the ages of the students. Notice that the array of student information is stored in an array of anonymous types.

Image

This code produces the following output:


Jones, Mary:  19 - History
Smith, Bob:  20 - CompSci
Fleming, Carol:  21 - History
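The original listing is not reproduced in this text. The following sketch is consistent with the description and output above. The property names follow the student examples used elsewhere in the chapter, and the source array is deliberately listed out of age order (an assumption) so that the effect of the orderby clause is visible.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class OrderBySketch
{
    public static List<string> ByAge()
    {
        var students = new[]       // Array of objects of an anonymous type
        {
            new { LName = "Fleming", FName = "Carol", Age = 21, Major = "History" },
            new { LName = "Jones",   FName = "Mary",  Age = 19, Major = "History" },
            new { LName = "Smith",   FName = "Bob",   Age = 20, Major = "CompSci" }
        };

        var query = from student in students
                    orderby student.Age          // Ascending is the default direction
                    select student;

        return query.Select(s => string.Format("{0}, {1}:  {2} - {3}",
                                               s.LName, s.FName, s.Age, s.Major)).ToList();
    }

    static void Main()
    {
        foreach (var line in ByAge())
            Console.WriteLine(line);
    }
}
```

Writing orderby student.Age descending would reverse the order, and a second key could be added after a comma, for example orderby student.Major, student.Age.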

The select . . . group Clause

There are two types of clauses that make up the select...group section—the select clause and the group...by clause. While the clauses that precede the select...group section specify the data sources and which objects to choose, the select...group section does the following:

  • The select clause specifies which parts of the chosen objects should be selected. It can specify any of the following:
    • The entire data item
    • A field from the data item
    • A new object comprising several fields from the data item (or any other value, for that matter).
  • The group...by clause is optional and specifies how the chosen items should be grouped. We'll cover the group...by clause later in the chapter.

Figure 21-10 shows the syntax for the select...group clause.

Figure 21-10. The syntax of the select . . . group clause

The following code shows an example of using the select clause to select the entire data item. First, the program creates an array of objects of an anonymous type. The query expression then uses the select clause to select each item in the array.

   using System;
   using System.Linq;
   class Program {
      static void Main() {
         var students = new[]       // Array of objects of an anonymous type
         {
            new { LName="Jones",   FName="Mary",  Age=19, Major="History" },
            new { LName="Smith",   FName="Bob",   Age=20, Major="CompSci" },
            new { LName="Fleming", FName="Carol", Age=21, Major="History" }
         };

         var query = from s in students
                     select s;

         foreach (var q in query)
             Console.WriteLine("{0}, {1}: Age {2}, {3}",
                               q.LName, q.FName, q.Age, q.Major);
      }
   }

This code produces the following output:


Jones, Mary: Age 19, History
Smith, Bob: Age 20, CompSci
Fleming, Carol: Age 21, History

You can also use the select clause to choose only particular fields of the object. For example, the select clause in the following code selects only the last name of the student.

   var query = from s in students
               select s.LName;

   foreach (var q in query)
       Console.WriteLine(q);

When you substitute these two statements for the corresponding two statements in the preceding full example, the program produces the following output:


Jones
Smith
Fleming

Anonymous Types in Queries

The result of a query can consist of items from the source collections, fields from the items in the source collections, or anonymous types.

You can create an anonymous type in a select clause by placing curly braces around a comma-separated list of fields you want to include in the type. For example, to make the code in the previous section select just the names and majors of the students, you could use the following syntax:

Image

The following code creates an anonymous type in the select clause and uses it later in the WriteLine statement.

Image

This code produces the following output:


Mary Jones -- History
Bob Smith -- CompSci
Carol Fleming -- History
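The original listings are not reproduced in this text. A sketch consistent with the description and output above follows; it reuses the students array from the previous section, and the choice of projected properties and the class name are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class SelectAnonymousSketch
{
    public static List<string> NamesAndMajors()
    {
        var students = new[]
        {
            new { LName = "Jones",   FName = "Mary",  Age = 19, Major = "History" },
            new { LName = "Smith",   FName = "Bob",   Age = 20, Major = "CompSci" },
            new { LName = "Fleming", FName = "Carol", Age = 21, Major = "History" }
        };

        var query = from s in students
                    select new { s.LName, s.FName, s.Major };   // Create an anonymous type

        // The new type has only the three projected properties; Age is not part of it.
        return query.Select(q => string.Format("{0} {1} -- {2}",
                                               q.FName, q.LName, q.Major)).ToList();
    }

    static void Main()
    {
        foreach (var line in NamesAndMajors())
            Console.WriteLine(line);
    }
}
```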

The group Clause

The group clause groups the selected objects according to some criterion. For example, with the array of students in the previous examples, the program could group the students according to their majors.

The important things to know about the group clause are the following:

  • When items are included in the result of the query, they're placed in groups according to the value of a particular field. The value on which items are grouped is called the key.
  • Unlike the select clause, the group clause does not return an enumerable that can enumerate the items from the original source. Instead, it returns an enumerable that enumerates the groups of items that have been formed.
  • The groups themselves are enumerable and can enumerate the actual items.

An example of the syntax of the group clause is the following:

Image

For example, the following code groups the students according to their majors:

Image

This code produces the following output:


History
      Jones, Mary
      Fleming, Carol
CompSci
      Smith, Bob
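The original listings are not reproduced in this text. A sketch consistent with the description and output above follows; it reuses the students array from the earlier examples, and the class and variable names are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class GroupSketch
{
    public static List<string> ByMajor()
    {
        var students = new[]
        {
            new { LName = "Jones",   FName = "Mary",  Age = 19, Major = "History" },
            new { LName = "Smith",   FName = "Bob",   Age = 20, Major = "CompSci" },
            new { LName = "Fleming", FName = "Carol", Age = 21, Major = "History" }
        };

        var query = from student in students
                    group student by student.Major;     // The key is the Major value

        var lines = new List<string>();
        foreach (var g in query)                         // Enumerate the groups
        {
            lines.Add(g.Key);                            // Key of this group
            foreach (var s in g)                         // Enumerate the items in the group
                lines.Add(string.Format("      {0}, {1}", s.LName, s.FName));
        }
        return lines;
    }

    static void Main()
    {
        foreach (var line in ByMajor())
            Console.WriteLine(line);
    }
}
```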

Figure 21-11 illustrates the object that is returned from the query expression and stored in the query variable.

  • The object returned from the query expression is an enumerable that enumerates the groups resulting from the query.
  • Each group is distinguished by a property called Key.
  • Each group is itself enumerable and can enumerate its items.

Figure 21-11. The group clause returns a collection of collections of objects rather than a collection of objects.

Query Continuation

A query continuation clause takes the result of one part of a query and assigns it a name so that it can be used in another part of the query. Figure 21-12 shows the syntax for query continuation.

Figure 21-12. The syntax of the query continuation clause

For example, the following query joins groupA and groupB and names the join groupAandB. It then performs a simple select from groupAandB.

Image

This code produces the following output:


4  5  6
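The original listing is not reproduced in this text. The following sketch is consistent with the description and output above; the contents of the two arrays are assumptions. Note that when into follows a join clause, the name it introduces (groupAandB here) stands, for each element of groupA, for the sequence of matching elements from groupB.

```csharp
using System;
using System.Linq;

static class IntoSketch
{
    public static string Matches()
    {
        var groupA = new[] { 3, 4, 5, 6 };   // Assumed data
        var groupB = new[] { 4, 5, 6, 7 };

        var someInts = from a in groupA
                       join b in groupB on a equals b
                       into groupAandB               // Name the result of the join
                       from c in groupAandB          // Continue the query from that name
                       select c;

        return string.Join("  ", someInts);
    }

    static void Main()
    {
        Console.WriteLine(Matches());
    }
}
```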

The Standard Query Operators

The standard query operators comprise a set of methods called an application programming interface (API) that lets you query any .NET array or collection. Important characteristics of the standard query operators are the following:

  • The collection objects queried are called sequences and must implement the IEnumerable<T> interface, where T is a type.
  • The standard query operators use method syntax.
  • Some operators return IEnumerable objects (or other sequences), while others return scalars. Operators that return scalars execute their queries immediately and return a value instead of an enumerable object to be iterated over later.

For example, the following code shows the use of operators Sum and Count, which return ints. Notice the following about the code:

  • The operators are used as methods directly on the sequence objects, which in this case is array numbers.
  • The return type is not an IEnumerable object but an int.

Image

This code produces the following output:


Total: 12, Count: 3
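The original listing is not reproduced in this text. A minimal sketch consistent with the description and output above follows; the contents of the numbers array and the variable names are assumptions.

```csharp
using System;
using System.Linq;

static class SumCountSketch
{
    public static string Summarize()
    {
        int[] numbers = { 2, 4, 6 };      // Assumed data

        int total   = numbers.Sum();      // Scalar result, computed immediately
        int howMany = numbers.Count();    // Scalar result, computed immediately

        return string.Format("Total: {0}, Count: {1}", total, howMany);
    }

    static void Main()
    {
        Console.WriteLine(Summarize());
    }
}
```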

There are 47 standard query operators that fall into 14 different categories. These categories are shown in Table 21-1.

Table 21-1. Categories of the Standard Query Operators

Name            Number of Operators   Description
Restriction     1                     Returns a subset of the objects of the sequence, based on selection criteria
Projection      2                     Selects which parts of the objects of a sequence are finally returned
Partitioning    4                     Skips or returns objects from a sequence
Join            2                     Returns an IEnumerable object that joins two sequences, based on some criterion
Concatenation   1                     Produces a single sequence from two separate sequences
Ordering        2                     Orders a sequence based on supplied criteria
Grouping        1                     Groups a sequence based on supplied criteria
Set             4                     Performs set operations on a sequence
Conversion      7                     Converts sequences to various forms such as arrays, lists, and dictionaries
Equality        1                     Compares two sequences for equality
Element         9                     Returns a particular element of a sequence
Generation      3                     Generates sequences
Quantifiers     3                     Returns Boolean values specifying whether a particular predicate is true about a sequence
Aggregate       7                     Returns a single value representing characteristics of a sequence

Query Expressions and the Standard Query Operators

As mentioned at the beginning of the chapter, every query expression can also be written using method syntax with the standard query operators. The set of standard query operators is a set of methods for performing queries. The compiler translates every query expression into standard query operator form.

Clearly, since all query expressions are translated into the standard query operators, the operators can perform everything done by query expressions. But the operators also give additional capabilities that aren't available in query expression form. For example, operators Sum and Count, which were used in the previous example, can be expressed only using the method syntax.

The two forms, query expressions and method syntax, however, can be combined. For example, the following code shows a query expression that also uses operator Count. Notice that the query expression part of the statement is inside parentheses, which are followed by a dot and the name of the method.

Image

This code produces the following output:


Count: 3
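The original listing is not reproduced in this text. A sketch consistent with the description and output above follows; the contents of the array and the where condition are assumptions.

```csharp
using System;
using System.Linq;

static class CombinedSketch
{
    public static int CountLow()
    {
        var numbers = new[] { 2, 6, 4, 8, 10 };   // Assumed data

        int howMany = (from n in numbers           // Query expression in parentheses
                       where n < 7
                       select n).Count();          // ... followed by an operator call
        return howMany;
    }

    static void Main()
    {
        Console.WriteLine("Count: {0}", CountLow());
    }
}
```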

Signatures of the Standard Query Operators

The standard query operators are methods declared in class System.Linq.Enumerable. These methods, however, aren't just any methods: they are extension methods that extend generic interface IEnumerable<T>.

Extension methods were covered in Chapters 7 and 19, but the most important thing to remember about them is that they are public, static methods that, although defined in one class, are designed to add functionality to a different class—the one listed as the first formal parameter. This formal parameter must be preceded by the keyword this.

For example, the following are the signatures of three of the operators: Count, First, and Where. At first glance, the signatures of the operators can be somewhat intimidating. Notice the following about the signatures:

  • Since the operators are generic methods, they have a generic parameter (T) associated with their names.
  • Since the operators are extension methods that extend IEnumerable<T>, they must satisfy the following syntactic requirements:
    • They must be declared public and static.
    • They must have the this extension indicator before the first parameter.
    • They must have IEnumerable<T> as the first parameter type.

Image

For example, the following code shows the use of operators Count and First. Both operators take only a single parameter—the reference to the IEnumerable<T> object.

  • The Count operator returns a single value, which is the count of all the elements in the sequence.
  • The First operator returns the first element of the sequence.

The first two times the operators are used in this code, they are called directly, just like normal methods, passing the name of the array as the first parameter. In the following two lines, however, they are called using the extension method syntax, as if they were method members of the array, which is enumerable. Notice that in this case no parameter is supplied. Instead, the array name has been moved from the parameter list to before the method name. There it is used as if it contained a declaration of the method.

The direct syntax calls and the extension syntax calls are completely equivalent in effect—only their syntax is different.

Image
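A short sketch of the two calling styles follows. It assumes the same six-element array used later in this section; the variable names are illustrative.

```csharp
using System;
using System.Linq;

class Program
{
   static void Main()
   {
      int[] intArray = new int[] { 3, 4, 5, 6, 7, 9 };

      var count1    = Enumerable.Count(intArray);   // Direct (static method) call
      var firstNum1 = Enumerable.First(intArray);   // Direct (static method) call

      var count2    = intArray.Count();             // Extension method syntax
      var firstNum2 = intArray.First();             // Extension method syntax

      Console.WriteLine("Count: {0}, FirstNumber: {1}", count1, firstNum1);
      Console.WriteLine("Count: {0}, FirstNumber: {1}", count2, firstNum2);
   }
}
```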

This code produces the following output:


Count: 6, FirstNumber: 3
Count: 6, FirstNumber: 3

Delegates As Parameters

As you just saw in the previous section, the first parameter of every operator is a reference to an IEnumerable<T> object. The parameters following it can be of any type. Many operators take generic delegates as parameters. (Generic delegates were explained in Chapter 19.) The most important thing to recall about generic delegates as parameters is the following:

  • Generic delegates are used to supply user-defined code to the operator.

To explain this, I'll start with an example showing several ways you might use the Count operator. The Count operator is overloaded and has two forms. The first form, which was used in the previous example, has a single parameter, as shown here:

   public static int Count<T>(this IEnumerable<T> source);

Like all extension methods, you can use it in the standard static method form or in the form of an instance method on an instance of the class it extends, as shown in the following two lines of code:

   var count1 = System.Linq.Enumerable.Count(intArray); // Static method form

   var count2 = intArray.Count();                      // Instance method form

In these two instances, the query counts the number of ints in the given integer array. Suppose, however, that you only want to count the odd elements of the array. To do that, you must supply the Count method with code that determines whether an integer is odd.

To do this, you would use the second form of the Count method, which is shown following. It has a generic delegate as its second parameter. At the point it is invoked, you must supply a delegate object that takes a single input parameter of type T and returns a Boolean value. The return value of the delegate code must specify whether the element should be included in the count.

Image

For example, the following code uses the second form of the Count operator to instruct it to include only those values that are odd. It does this by supplying a lambda expression that returns true if the input value is odd and false otherwise. (Lambda expressions were covered in Chapter 15.) At each iteration through the collection, Count calls this method (represented by the lambda expression) with the current value as input. If the input is odd, the method returns true, and Count includes the element in the total.

Image
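A minimal sketch of this call, assuming the six-element array used elsewhere in this section, could look like this. The lambda parameter name n is an arbitrary choice.

```csharp
using System;
using System.Linq;

class Program
{
   static void Main()
   {
      int[] intArray = new int[] { 3, 4, 5, 6, 7, 9 };

      // The lambda expression is the predicate: true for odd values.
      var countOdd = intArray.Count(n => n % 2 != 0);

      Console.WriteLine("Count of odd numbers: {0}", countOdd);
   }
}
```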

This code produces the following output:


Count of odd numbers: 4

The LINQ Predefined Delegate Types

Like the Count operator from the previous example, many of the LINQ operators require you to supply code that directs how the operator performs its operation. You can do this by using delegate objects as parameters.

Remember from Chapter 15 that you can think of a delegate object as an object that contains a method or list of methods with a particular signature and return type. When the delegate is invoked, the methods it contains are invoked in sequence.

Two families of predefined generic delegate types are available for use with the standard query operators: the Func delegates and the Action delegates. They are declared in the System namespace, and each family has 17 members.

  • The delegate objects you create for use as actual parameters must be of these delegate types or of these forms.
  • TR represents the return type and is always last in the list of type parameters.

The first four generic Func delegates are listed here. The first form takes no method parameters and returns an object of the return type. The second takes a single method parameter and returns a value, and so forth. Notice that the return type parameter has the out keyword, making it covariant: a delegate that returns a more derived type can be used where a delegate returning the declared type is expected. The input parameters have the in keyword, making them contravariant: a delegate whose parameter is a less derived (base) type can be used where a delegate taking the declared type is expected.

Image
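The effect of the in and out keywords can be seen in a small sketch. The delegate names describe and general are hypothetical, and the example assumes C# 4.0 or later, where generic variance is supported for reference types.

```csharp
using System;

class Program
{
   static void Main()
   {
      // A delegate that accepts any object and returns a string.
      Func<object, string> describe = o => "Value: " + o;

      // Contravariant input:  a delegate taking object can be used where
      //                       a delegate taking string is expected.
      // Covariant return:     a delegate returning string can be used where
      //                       a delegate returning object is expected.
      Func<string, object> general = describe;

      Console.WriteLine(general("hello"));
   }
}
```

The assignment compiles without a cast because string derives from object in both directions the variance rules require.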

With this in mind, if you look again at the declaration of Count, which follows, you can see that the second parameter must be a delegate object that takes a single value of some type T as the method parameter and returns a value of type bool.

Image

A parameter delegate that produces a Boolean value is called a predicate.
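To make the role of the predicate concrete, here is a sketch of how an operator of this shape could be written. CountWhere and MyOperators are hypothetical names invented for this illustration; this is not the library's actual implementation of Count, only a method with the same kind of signature.

```csharp
using System;
using System.Collections.Generic;

static class MyOperators
{
   // Hypothetical operator shaped like the second form of Count.
   public static int CountWhere<T>(this IEnumerable<T> source,
                                   Func<T, bool> predicate)
   {
      int total = 0;
      foreach (T item in source)
         if (predicate(item))      // Run the caller's code on each element.
            total++;
      return total;
   }
}

class Program
{
   static void Main()
   {
      int[] intArray = new int[] { 3, 4, 5, 6, 7, 9 };

      Console.WriteLine(intArray.CountWhere(n => n % 2 != 0));
   }
}
```

The operator itself knows nothing about oddness; the caller's delegate supplies that decision for each element.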

The first four Action delegates are the following. They're the same as the Func delegates except that they have no return value and hence no return value type parameter. All their type parameters are contravariant.

   public delegate void Action                     ( );
   public delegate void Action<in T1>              ( T1 a1 );
   public delegate void Action<in T1, in T2>       ( T1 a1, T2 a2 );
   public delegate void Action<in T1, in T2, in T3>( T1 a1, T2 a2, T3 a3 );

Example Using a Delegate Parameter

Now that you better understand Count's signature and LINQ's use of generic delegate parameters, you'll be better able to understand a full example.

The following code first declares method IsOdd, which takes a single parameter of type int and returns a bool value stating whether the input parameter was odd. Method Main does the following:

  • It declares an array of ints as the data source.
  • It creates a delegate object called myDel of type Func<int, bool>, and it uses method IsOdd to initialize the delegate object. Notice that you don't need to declare the Func delegate type because, as you saw, it's already predefined.
  • It calls Count using the delegate object.

   class Program
   {
      static bool IsOdd(int x)    // Method to be used by the delegate object
      {
         return x % 2 != 0;       // Return true if x is odd (negative values too).
      }

      static void Main()
      {
         int[] intArray = new int[] { 3, 4, 5, 6, 7, 9 };

         Func<int, bool> myDel = new Func<int, bool>(IsOdd); // Delegate object
         var countOdd = intArray.Count(myDel);               // Use delegate

         Console.WriteLine("Count of odd numbers: {0}", countOdd);
      }
   }

This code produces the following output:


Count of odd numbers: 4

Example Using a Lambda Expression Parameter

The previous example used a separate method and a delegate to attach the code to the operator. This required declaring the method, declaring the delegate object, and then passing the delegate object to the operator. This works fine and is exactly the right approach to take if either of the following conditions is true:

  • The method must also be called from elsewhere in the program, not only where it's used to initialize the delegate object.
  • The code in the method body is more than a statement or two long.

If neither of these conditions is true, however, you probably want to use a more compact and localized method of supplying the code to the operator, using a lambda expression as described in Chapter 15.

We can modify the previous example to use a lambda expression by first deleting the IsOdd method entirely and placing the equivalent lambda expression directly at the declaration of the delegate object. The new code is shorter and cleaner and looks like this:

Image

Like the previous example, this code produces the following output:


Count of odd numbers: 4

We could also have used an anonymous method in place of the lambda expression, as shown following. This is more verbose, though, and since lambda expressions are equivalent semantically and are less verbose, there's little reason to use anonymous methods anymore.

Image
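A sketch of the anonymous method form, under the same assumptions as the earlier examples (the six-element array and the delegate variable myDel), might look like this:

```csharp
using System;
using System.Linq;

class Program
{
   static void Main()
   {
      int[] intArray = new int[] { 3, 4, 5, 6, 7, 9 };

      Func<int, bool> myDel = delegate(int x)      // Anonymous method
                              {
                                 return x % 2 != 0;
                              };
      var countOdd = intArray.Count(myDel);

      Console.WriteLine("Count of odd numbers: {0}", countOdd);
   }
}
```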

LINQ to XML

Extensible Markup Language (XML) is an important method of storing and exchanging data. LINQ adds features to the language that make working with XML much easier than previous methods such as XPath and XSLT. If you're familiar with these methods, you might be pleased to hear that LINQ to XML simplifies the creation, traversal, and manipulation of XML in a number of ways, including the following:

  • You can create an XML tree in a top-down fashion, with a single statement.
  • You can create and manipulate XML in-memory without having an XML document to contain the tree.
  • You can create and manipulate string nodes without having a Text subnode.

Although I won't give a complete treatment of XML, I will start by giving a very brief introduction to it before describing some of the XML manipulation features supplied by LINQ.

Markup Languages

Markup consists of tags placed in a document to give information about the information in the document. That is, the markup tags are not the data of the document; they contain data about the data. Data about data is called metadata.

A markup language is a defined set of tags designed to convey particular types of metadata about the contents of a document. HTML, for example, is the most widely known markup language. The metadata in its tags contains information about how a web page should be rendered in a browser and how to navigate among the pages using the hypertext links.

While most markup languages contain a predefined set of tags, XML contains only a few defined tags, and the rest are defined by the programmer to represent whatever kinds of metadata are required by a particular document type. As long as the writer and reader of the data agree on what the tags mean, the tags can contain whatever useful information the designers want.

XML Basics

Data in an XML document is contained in an XML tree, which consists mainly of a set of nested elements.

The element is the fundamental constituent of an XML tree. Every element has a name and can contain data. Some can also contain other, nested elements. Elements are demarcated by opening and closing tags. Any data contained by an element must be between its opening and closing tags.

  • An opening tag starts with an open angle bracket, followed by the element name, followed optionally by any attributes, followed by a closing angle bracket.
          <PhoneNumber>
  • A closing tag starts with an open angle bracket, followed by a slash character, followed by the element name, followed by a closing angle bracket.
          </PhoneNumber>
  • An element with no content can be represented by a single tag that starts with an open angle bracket, followed by the name of the element, followed by a slash, and is terminated with a closing angle bracket.
          <PhoneNumber />

The following XML fragment shows an element named EmployeeName followed by an empty element named PhoneNumber.

Image

Other important things to know about XML are the following:

  • XML documents must have a single root element that contains all the other elements.
  • XML tags must be properly nested.
  • Unlike HTML tags, XML tags are case sensitive.
  • XML attributes are name/value pairs that contain additional metadata about an element. The value part of an attribute must always be enclosed in quotation marks, which can be either double quotation marks or single quotation marks.
  • Whitespace within an XML document is maintained. This is unlike HTML, where whitespace is consolidated to a single space in the output.

The following XML document is an example of XML that contains information about two employees. This XML tree is extremely simple in order to show the elements clearly. The important things to notice about the XML tree are the following:

  • The tree contains a root node of type Employees that contains two child nodes of type Employee.
  • Each Employee node contains nodes containing the name and phone numbers of an employee.

   <Employees>
      <Employee>
         <Name>Bob Smith</Name>
         <PhoneNumber>408-555-1000</PhoneNumber>
         <CellPhone />
      </Employee>
      <Employee>
         <Name>Sally Jones</Name>
         <PhoneNumber>415-555-2000</PhoneNumber>
         <PhoneNumber>415-555-2001</PhoneNumber>
      </Employee>
   </Employees>

Figure 21-13 illustrates the hierarchical structure of the sample XML tree.

Image

Figure 21-13. Hierarchical structure of the sample XML tree

The XML Classes

LINQ to XML can be used to work with XML in two ways. The first way is as a simplified XML manipulation API. The second way is to use the LINQ query facilities you've seen throughout the earlier part of this chapter. I'll start by introducing the LINQ to XML API.

The LINQ to XML API consists of a number of classes that represent the components of an XML tree. The three most important classes you'll use are XElement, XAttribute, and XDocument. There are other classes as well, but these are the main ones.

In Figure 21-13, you saw that an XML tree is a set of nested elements. Figure 21-14 shows the classes used to build an XML tree and how they can be nested.

For example, the figure shows the following:

  • An XDocument node can have the following as its direct child nodes:
    • At most, one of each of the following node types: an XDeclaration node, an XDocumentType node, and an XElement node
    • Any number of XProcessingInstruction nodes
  • If there is a top-level XElement node under the XDocument, it is the root of the rest of the elements in the XML tree.
  • The root element can in turn contain any number of nested XElement, XComment, or XProcessingInstruction nodes, nested to any level.

Image

Figure 21-14. The containment structure of XML nodes

Except for the XAttribute class, most of the classes used to create an XML tree are derived from a class called XNode and are referred to generically in the literature as “XNodes.” Figure 21-14 shows the XNode classes in white clouds, while the XAttribute class is shown in a gray cloud.

Creating, Saving, Loading, and Displaying an XML Document

The best way to demonstrate the simplicity and usage of the XML API is to show simple code samples. For example, the following code shows how simple it is to perform several of the important tasks required when working with XML.

It starts by creating a simple XML tree consisting of a node called Employees, with two subnodes containing the names of two employees. Notice the following about the code:

  • The tree is created with a single statement that creates all the nested elements in place in the tree. This is called functional construction.
  • Each element is created in place using an object creation expression, using the constructor of the type of the node.

After creating the tree, the code saves it to a file called EmployeesFile.xml, using XDocument's Save method. It then reads the XML tree back from the file using XDocument's static Load method and assigns the tree to a new XDocument object. Finally, it uses WriteLine to display the structure of the tree held by the new XDocument object.

Image
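The steps just described can be sketched as follows. The variable names employees1 and employees2 are assumptions made for this illustration; the file name comes from the description above, and the file is written to the current directory.

```csharp
using System;
using System.Xml.Linq;                           // Required namespace

class Program
{
   static void Main()
   {
      XDocument employees1 =
         new XDocument(                          // Create the XML document.
            new XElement("Employees",            // Create the root element.
               new XElement("Name", "Bob Smith"),
               new XElement("Name", "Sally Jones")
            )
         );

      employees1.Save("EmployeesFile.xml");      // Save to a file.

      // Load the saved document into a new variable.
      XDocument employees2 = XDocument.Load("EmployeesFile.xml");

      Console.WriteLine(employees2);             // Display the document.
   }
}
```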

This code produces the following output:


<Employees>
  <Name>Bob Smith</Name>
  <Name>Sally Jones</Name>
</Employees>

Creating an XML Tree

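In the previous example, you saw that you can create an XML document in-memory by using constructors for XDocument and XElement. In the case of the XElement constructor:

  • The first parameter is the name of the element.
  • The second and following parameters contain the contents of the element, such as child nodes, attributes, or a text value. The second parameter of the constructor is a params parameter, so you can supply any number of arguments.

The XDocument constructor takes no name. Its parameters are the nodes that make up the document, supplied in the same params style.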

For example, the following code produces an XML tree and displays it using the Console.WriteLine method:

   using System;
   using System.Xml.Linq;                         // This namespace is required.

   class Program
   {
      static void Main( ) {
         XDocument employeeDoc =
            new XDocument(                     // Create the document.
               new XElement("Employees",       // Create the root element.
  
                  new XElement("Employee",     // First employee element
                     new XElement("Name", "Bob Smith"),
                     new XElement("PhoneNumber", "408-555-1000") ),

                  new XElement("Employee",     // Second employee element
                     new XElement("Name", "Sally Jones"),
                     new XElement("PhoneNumber", "415-555-2000"),
                     new XElement("PhoneNumber", "415-555-2001") )
               )
            );
         Console.WriteLine(employeeDoc);       // Displays the document
      }
   }

This code produces the following output:


<Employees>
  <Employee>
    <Name>Bob Smith</Name>
    <PhoneNumber>408-555-1000</PhoneNumber>
  </Employee>
  <Employee>
    <Name>Sally Jones</Name>
    <PhoneNumber>415-555-2000</PhoneNumber>
    <PhoneNumber>415-555-2001</PhoneNumber>
  </Employee>
</Employees>

Using Values from the XML Tree

The power of XML becomes evident when you traverse an XML tree and retrieve or modify values. Table 21-2 shows the main methods used for retrieving data.

Table 21-2. Methods for Querying XML

Method Name          Class                 Return Type             Description
Nodes                XDocument, XElement   IEnumerable<XNode>      Returns all the children of the current node, regardless of their type
Elements             XDocument, XElement   IEnumerable<XElement>   Returns all the current node's XElement child nodes or all the child nodes with a specific name
Element              XDocument, XElement   XElement                Returns the current node's first XElement child node with a specific name
Descendants          XElement              IEnumerable<XElement>   Returns all the descendant XElement nodes or all the descendant XElement nodes with a specific name, regardless of their level of nesting below the current node
DescendantsAndSelf   XElement              IEnumerable<XElement>   Same as Descendants but also includes the current node
Ancestors            XElement              IEnumerable<XElement>   Returns all the ancestor XElement nodes or all the ancestor XElement nodes above the current node that have a specific name
AncestorsAndSelf     XElement              IEnumerable<XElement>   Same as Ancestors but also includes the current node
Parent               XElement              XElement                Returns the parent node of the current node (Parent is a property rather than a method)

Some of the important things to know about the methods in Table 21-2 are the following:

  • Nodes: The Nodes method returns an object of type IEnumerable<XNode>, because the nodes returned might be of different types, such as XElement, XComment, and so on. You can use the generic method OfType<type> to specify what type of nodes to return. For example, the following line of code retrieves only the XComment nodes:
          IEnumerable<XComment> comments = xd.Nodes().OfType<XComment>();
  • Elements: Since retrieving XElements is such a common requirement, there is a shortcut for expression Nodes().OfType<XElement>()—the Elements method.
    • Using the Elements method with no parameters returns all the child XElements.
    • Using the Elements method with a single name parameter returns only the child XElements with that name. For example, the following line of code returns all the child XElement nodes with the name PhoneNumber.
            IEnumerable<XElement> empPhones = emp.Elements("PhoneNumber");
  • Element: This method retrieves just the first child XElement of the current node that has the name you supply. Unlike the Elements method, it always takes a name parameter. If no child element has that name, it returns null.
  • Descendants and Ancestors: These methods work like the Elements method and the Parent property, but instead of returning the immediate child elements or parent element, they include the elements below or above the current node, regardless of the difference in nesting level.

The following code illustrates the Element and Elements methods:

Image
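A sketch of such code follows. It assumes the employee tree built in the previous section; the variable names root, employees, empNameNode, and empPhones are illustrative choices.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

class Program
{
   static void Main()
   {
      XDocument employeeDoc =
         new XDocument(
            new XElement("Employees",
               new XElement("Employee",
                  new XElement("Name", "Bob Smith"),
                  new XElement("PhoneNumber", "408-555-1000")),
               new XElement("Employee",
                  new XElement("Name", "Sally Jones"),
                  new XElement("PhoneNumber", "415-555-2000"),
                  new XElement("PhoneNumber", "415-555-2001"))));

      XElement root = employeeDoc.Element("Employees");      // Get first child named Employees.
      IEnumerable<XElement> employees = root.Elements();     // Get all child elements.

      foreach (XElement emp in employees)
      {
         XElement empNameNode = emp.Element("Name");         // Get first child named Name.
         Console.WriteLine(empNameNode.Value);

         IEnumerable<XElement> empPhones = emp.Elements("PhoneNumber");
         foreach (XElement phone in empPhones)               // All children named PhoneNumber
            Console.WriteLine("   {0}", phone.Value);
      }
   }
}
```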

This code produces the following output:


Bob Smith
   408-555-1000
Sally Jones
   415-555-2000
   415-555-2001

Adding Nodes and Manipulating XML

You can add a child element to an existing element using the Add method. The Add method allows you to add as many elements as you like in a single method call, regardless of the node types you are adding.

For example, the following code creates a simple XML tree and displays it. It then uses the Add method to add a single node to the root element. Following that, it uses the Add method a second time to add three elements—two XElements and an XComment. Notice the results in the output:

   using System;
   using System.Xml.Linq;

   class Program
   {
      static void Main()
      {
         XDocument xd = new XDocument(               // Create XML tree
            new XElement("root",
               new XElement("first")
            )
         );

         Console.WriteLine("Original tree");
         Console.WriteLine(xd); Console.WriteLine(); // Display the tree.

         XElement rt = xd.Element("root");           // Get the first element.

         rt.Add( new XElement("second"));            // Add a child element.

         rt.Add( new XElement("third"),              // Add three more children.
                 new XComment("Important Comment"),
                 new XElement("fourth"));

         Console.WriteLine("Modified tree");
         Console.WriteLine(xd);                      // Display modified tree
      }
   }

This code produces the following output:


Original tree
<root>
  <first />
</root>

Modified tree
<root>
  <first />
  <second />
  <third />
  <!--Important Comment-->
  <fourth />
</root>

The Add method places the new child nodes after the existing child nodes, but you can place the nodes before and between the child nodes as well, using the AddFirst, AddBeforeSelf, and AddAfterSelf methods.
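As a brief sketch of these three methods, the following code starts with a root element that has one child and then places new nodes around it. The element names zeroth and second and the comment text are made up for this illustration.

```csharp
using System;
using System.Xml.Linq;

class Program
{
   static void Main()
   {
      XElement rt = new XElement("root",
                       new XElement("first"));

      XElement first = rt.Element("first");

      rt.AddFirst(new XElement("zeroth"));                 // Before all existing children
      first.AddBeforeSelf(new XComment("next is first"));  // Just before <first>
      first.AddAfterSelf(new XElement("second"));          // Just after <first>

      Console.WriteLine(rt);
   }
}
```

Notice that AddFirst is called on the parent, while AddBeforeSelf and AddAfterSelf are called on the node next to which the new nodes are placed.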

Table 21-3 lists some of the most important methods for manipulating XML. Notice that some of the methods are applied to the parent node and others to the node itself.

Table 21-3. Methods for Manipulating XML

Method Name       Call From   Description
Add               Parent      Adds new child nodes after the existing child nodes of the current node
AddFirst          Parent      Adds new child nodes before the existing child nodes of the current node
AddBeforeSelf     Node        Adds new nodes before the current node at the same level
AddAfterSelf      Node        Adds new nodes after the current node at the same level
Remove            Node        Deletes the currently selected node and its contents
RemoveNodes       Parent      Deletes all the child nodes of the current node
SetElementValue   Parent      Sets the value of a named child element, adding the element if it doesn't exist or removing it if the value is null
ReplaceNodes      Parent      Replaces the child nodes of the current node with the specified content

Working with XML Attributes

Attributes give additional information about an XElement node. They're placed in the opening tag of the XML element.

When you functionally construct an XML tree, you can add attributes by just including XAttribute constructors within the scope of the XElement constructor. There are two forms of the XAttribute constructor; one takes a name and a value, and the other takes a reference to an already existing XAttribute.

The following code adds two attributes to root. Notice that both parameters to the XAttribute constructor are strings; the first specifies the name of the attribute, and the second gives the value.

Image
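A sketch of code that builds such a tree is shown next. The child element names first and second are assumptions chosen for the illustration.

```csharp
using System;
using System.Xml.Linq;

class Program
{
   static void Main()
   {
      XDocument xd = new XDocument(
         new XElement("root",
            new XAttribute("color", "red"),      // Attribute: name, value
            new XAttribute("size", "large"),     // Attribute: name, value
            new XElement("first"),
            new XElement("second")
         )
      );

      Console.WriteLine(xd);
   }
}
```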

This code produces the following output. Notice that the attributes are placed inside the opening tag of the element.


<root color="red" size="large">
  <first />
  <second />
</root>

To retrieve an attribute from an XElement node, use the Attribute method, supplying the name of the attribute as the parameter. The following code creates an XML tree with a node with two attributes—color and size. It then retrieves the values of the attributes and displays them.

   static void Main( )
   {
      XDocument xd = new XDocument(                      // Create XML tree
         new XElement("root",
            new XAttribute("color", "red"),
            new XAttribute("size", "large"),
            new XElement("first")
         )
      );

      Console.WriteLine(xd); Console.WriteLine();        // Display XML tree

      XElement rt = xd.Element("root");                  // Get the element.

      XAttribute color = rt.Attribute("color");          // Get the attribute.
      XAttribute size =  rt.Attribute("size");           // Get the attribute.

      Console.WriteLine("color is {0}", color.Value);    // Display attr. value
      Console.WriteLine("size  is {0}", size.Value);     // Display attr. value
   }

This code produces the following output:


<root color="red" size="large">
  <first />
</root>

color is red
size  is large

To remove an attribute, you can select the attribute and use the Remove method or use the SetAttributeValue method on its parent and set the attribute value to null. The following code demonstrates both methods:

   static void Main( ) {
      XDocument xd = new XDocument(
         new XElement("root",
            new XAttribute("color", "red"),
            new XAttribute("size", "large"),
            new XElement("first")
         )
      );

      XElement rt = xd.Element("root");          // Get the element.

      rt.Attribute("color").Remove();            // Remove the color attribute.
      rt.SetAttributeValue("size", null);        // Remove the size attribute.

      Console.WriteLine(xd);
   }

This code produces the following output:


<root>
  <first />
</root>

To add an attribute to an XML tree or change the value of an attribute, you can use the SetAttributeValue method, as shown in the following code:

   static void Main( ) {
      XDocument xd = new XDocument(
         new XElement("root",
            new XAttribute("color", "red"),
            new XAttribute("size", "large"),
            new XElement("first")));

      XElement rt = xd.Element("root");            // Get the element.

      rt.SetAttributeValue("size",  "medium");     // Change attribute value
      rt.SetAttributeValue("width", "narrow");     // Add an attribute.

      Console.WriteLine(xd); Console.WriteLine();
   }

This code produces the following output:


<root color="red" size="medium" width="narrow">
  <first />
</root>

Other Types of Nodes

Three other types of nodes used in the previous examples are XComment, XDeclaration, and XProcessingInstruction. They're described in the following sections.

XComment

Comments in XML consist of text between the <!-- and --> tokens. The text between the tokens is ignored by XML parsers. You can insert a comment in an XML document using the XComment class, as shown in the following line of code:

   new XComment("This is a comment")

XDeclaration

XML documents start with a line that includes the version of XML used, the type of character encoding used, and whether the document depends on external references. This is information about the XML, so it's actually metadata about the metadata! This is called the XML declaration and is inserted using the XDeclaration class. The following shows an example of an XDeclaration statement:

   new XDeclaration("1.0", "utf-8", "yes")

XProcessingInstruction

An XML processing instruction is used to supply additional data about how an XML document should be used or interpreted. Most commonly, processing instructions are used to associate a style sheet with the XML document.

You can include a processing instruction using the XProcessingInstruction constructor, which takes two string parameters—a target and a data string. If the processing instruction takes multiple data parameters, those parameters must be included in the second parameter string of the XProcessingInstruction constructor, as shown in the following constructor code. Notice that in this example, the second parameter is a verbatim string, and literal double quotes inside the string are represented by sets of two contiguous double quote marks.

   new XProcessingInstruction( "xml-stylesheet",
                               @"href=""stories.css"" type=""text/css""")

The following code uses all three constructs:

   static void Main( )
   {
      XDocument xd = new XDocument(
         new XDeclaration("1.0", "utf-8", "yes"),
         new XComment("This is a comment"),
         new XProcessingInstruction("xml-stylesheet",
                                    @"href=""stories.css"" type=""text/css"""),
         new XElement("root",
            new XElement("first"),
            new XElement("second")
         )
      );
   }

When this document is saved to a file with the Save method, the file contains the following. Using a WriteLine of xd, however, would not show the declaration statement, even though it is included in the document file.


<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<!--This is a comment-->
<?xml-stylesheet href="stories.css" type="text/css"?>
<root>
  <first />
  <second />
</root>

Using LINQ Queries with LINQ to XML

You can combine the LINQ XML API with LINQ query expressions to produce simple yet powerful XML tree searches.

The following code creates a simple XML tree, displays it to the screen, and then saves it to a file called SimpleSample.xml. Although there's nothing new in this code, we'll use this XML tree in the following examples.

   static void Main( )
   {
      XDocument xd = new XDocument(
         new XElement("MyElements",
            new XElement("first",
               new XAttribute("color", "red"),
               new XAttribute("size",  "small")),
            new XElement("second",
               new XAttribute("color", "red"),
               new XAttribute("size",  "medium")),
            new XElement("third",
               new XAttribute("color", "blue"),
               new XAttribute("size",  "large"))));

      Console.WriteLine(xd);                      // Display XML tree
      xd.Save("SimpleSample.xml");                // Save XML tree
   }

This code produces the following output:


<MyElements>
  <first color="red" size="small" />
  <second color="red" size="medium" />
  <third color="blue" size="large" />
</MyElements>

The following example code uses a simple LINQ query to select a subset of the nodes from the XML tree and then displays them in several ways. This code does the following:

  • It selects from the XML tree only those elements whose names have five characters. Since the names of the elements are first, second, and third, only node names first and third match the search criterion, and therefore those nodes are selected.
  • It displays the names of the selected elements.
  • It formats and displays the selected nodes, including the node name and the values of the attributes. Notice that the attributes are retrieved using the Attribute method, and the values of the attributes are retrieved with the Value property.

Image
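A sketch of such a query follows. To keep it self-contained, it parses the tree from a string rather than loading SimpleSample.xml; the variable names rt and xyz are arbitrary, and the length test uses the LocalName property of the element name.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

class Program
{
   static void Main()
   {
      XDocument xd = XDocument.Parse(
         @"<MyElements>
              <first  color=""red""  size=""small""  />
              <second color=""red""  size=""medium"" />
              <third  color=""blue"" size=""large""  />
           </MyElements>");

      XElement rt = xd.Element("MyElements");        // Get the root element.

      var xyz = from e in rt.Elements()              // Select elements whose
                where e.Name.LocalName.Length == 5   // names have 5 characters.
                select e;

      foreach (XElement x in xyz)                    // Display the names.
         Console.WriteLine(x.Name);

      Console.WriteLine();
      foreach (XElement x in xyz)                    // Display names and attributes.
         Console.WriteLine("Name: {0}, color: {1}, size: {2}",
                           x.Name,
                           x.Attribute("color").Value,
                           x.Attribute("size").Value);
   }
}
```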

This code produces the following output:


first
third

Name: first, color: red, size: small
Name: third, color: blue, size: large

The following code uses a simple query to retrieve all the top-level elements of the XML tree and creates an object of an anonymous type for each one. The first use of the WriteLine method shows the default formatting of the anonymous type. The second WriteLine statement explicitly formats the members of the anonymous type objects.

Image
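A sketch of this query follows, again parsing the tree from a string so that the example stands alone. The anonymous type's member names (Name and color) and the format string are assumptions chosen to be consistent with the output described.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

class Program
{
   static void Main()
   {
      XDocument xd = XDocument.Parse(
         @"<MyElements>
              <first  color=""red""  size=""small""  />
              <second color=""red""  size=""medium"" />
              <third  color=""blue"" size=""large""  />
           </MyElements>");

      XElement rt = xd.Element("MyElements");

      var xyz = from e in rt.Elements()              // Create an anonymous type
                select new { e.Name,                 // with the element name and
                             color = e.Attribute("color") };   // its color attribute.

      foreach (var x in xyz)
         Console.WriteLine(x);                       // Default formatting

      Console.WriteLine();
      foreach (var x in xyz)                         // Explicit formatting
         Console.WriteLine("{0,-6},   color: {1}", x.Name, x.color.Value);
   }
}
```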

This code produces the following output. The first three lines show the default formatting of the anonymous type. The last three lines show the explicit formatting specified in the format string of the second WriteLine method.


{ Name = first, color = color="red" }
{ Name = second, color = color="red" }
{ Name = third, color = color="blue" }

first ,   color: red
second,   color: red
third ,   color: blue

From these examples, you can see that you can easily combine the XML API with the LINQ query facilities to produce powerful XML querying capabilities.
