Query Syntax and Method Syntax
The Structure of Query Expressions
In a relational database system, data is organized into nicely normalized tables and accessed with a very simple but powerful query language—SQL. SQL can work with any set of data in a database because the data is organized into tables, following strict rules.
In a program, as opposed to a database, however, data is stored in class objects or structs that are all vastly different. As a result, there's been no general query language for retrieving data from data structures. The method of retrieving data from objects has always been custom-designed as part of the program. LINQ, however, makes it easy to query collections of objects.
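The following are the important high-level characteristics of LINQ:
- LINQ stands for Language Integrated Query and is pronounced "link."
- LINQ is an extension of the .NET Framework that allows you to query collections of data in a manner similar to using SQL to query a database.
- With LINQ you can query data from databases, collections of objects, XML documents, and more.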
The following code shows a simple example of using LINQ. In this code, the data source being queried is simply an array of ints. The definition of the query is the statement with the from and select keywords. Although the query is defined in this statement, it is actually performed and used in the foreach statement at the bottom.
using System;
using System.Collections.Generic;
using System.Linq;
class Program
{
    static void Main()
    {
        int[] numbers = { 2, 12, 5, 15 };          // Data source
        IEnumerable<int> lowNums =                  // Define and store the query.
            from n in numbers
            where n < 10
            select n;
        foreach (var x in lowNums)                  // Execute the query.
            Console.Write("{0}, ", x);
    }
}
This code produces the following output:
2, 5,
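In the previous example, the data source was simply an array of ints, which is an in-memory object of the program. LINQ, however, can work with many different types of data sources, such as SQL databases, XML documents, and a host of others. For every data source type, however, under the covers there must be a module of code that implements the LINQ queries in terms of that data source type. These code modules are called LINQ providers. The important points about LINQ providers are the following:
- Microsoft provides LINQ providers for a number of common data source types, as shown in Figure 21-1.
- You can use any LINQ-enabled language (C#, in this case) to query any data source type for which there is a LINQ provider.
- Third parties also produce LINQ providers for other kinds of data sources.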
Figure 21-1. The architecture of LINQ, the LINQ-enabled languages, and LINQ providers
There are entire books dedicated to LINQ in all its forms and subtleties, but that's clearly beyond the scope of this chapter. Instead, this chapter will introduce you to LINQ and explain how to use it with program objects (LINQ to Objects) and XML (LINQ to XML).
Before getting into the details of LINQ's querying features, I'll start by covering a language feature that allows you to create unnamed class types. These are called, not surprisingly, anonymous types.
In Chapter 6 we covered object initializers, which are the construct that allows you to initialize the fields and properties of a new class instance when using an object-creation expression. Just to remind you, this kind of object-creation expression consists of three components: the keyword new, the class name or constructor, and the object initializer. The object initializer consists of a comma-separated list of member initializers between a set of curly braces.
Creating a variable of an anonymous type uses the same form, but without the class name or constructor. The object-creation expression of an anonymous type therefore has the form new { FieldProp = InitExpr, FieldProp = InitExpr, ... }, where the curly braces enclose the object initializer.
The following code shows an example of creating and using an anonymous type. It creates a variable called student, with an anonymous type that has three string properties and one int property. Notice in the WriteLine statement that the instance's members are accessed just as if they were members of a named type.
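The listing for this example is missing from the text. A minimal sketch that produces the output shown below might look like the following; the property names FName, LName, Age, and Major are assumptions, and the Describe helper exists only so the result can be checked.

```csharp
using System;

class Program
{
    // Hypothetical helper: builds the anonymous-type object and formats its members.
    public static string Describe()
    {
        // Anonymous type with three string properties and one int property.
        var student = new { FName = "Mary", LName = "Jones", Age = 19, Major = "History" };

        // Members are accessed exactly as if they belonged to a named type.
        return string.Format("{0} {1}, Age {2}, Major: {3}",
            student.FName, student.LName, student.Age, student.Major);
    }

    static void Main()
    {
        Console.WriteLine(Describe());
    }
}
```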
This code produces the following output:
Mary Jones, Age 19, Major: History
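Important things to know about anonymous types are the following:
- Variables of an anonymous type can only be local variables, not class members.
- Since an anonymous type doesn't have a name, you must use the var keyword as the variable type.
- You cannot assign to the properties of an object of an anonymous type. The properties the compiler creates for the type are read-only.
When the compiler encounters the object initializer of an anonymous type, it creates a new class type with a private name that it constructs. For each member initializer, it infers its type and creates a private variable of that type in the new class, and it creates a read-only property to access the variable. The property has the same name as the member initializer. Once the anonymous type is constructed, the compiler creates an object of that type.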
Besides the assignment form of member initializers, anonymous type object initializers also allow two other forms: simple identifiers and member access expressions. These two forms are called projection initializers. The following variable declaration shows all three forms. The first member initializer is in the assignment form. The second is an identifier, and the third is a member access expression.
var student = new { Age = 19, Major, Other.Name };
For example, the following code uses all three forms. Notice that the projection initializers must be defined before the declaration of the anonymous type. Major is a local variable, and Name is a static field of class Other.
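The listing is missing here as well. The following sketch assumes a class Other whose static field Name holds "Mary Jones" (a hypothetical stand-in) and a local variable Major, and it uses the same three initializer forms, in the same order, as the declaration above.

```csharp
using System;

class Other
{
    public static string Name = "Mary Jones";   // Hypothetical static field
}

class Program
{
    public static string Describe()
    {
        string Major = "History";                // Local variable
        // Assignment form, simple identifier, member access expression.
        var student = new { Age = 19, Major, Other.Name };
        return string.Format("{0}, Age {1}, Major: {2}",
            student.Name, student.Age, student.Major);
    }

    static void Main()
    {
        Console.WriteLine(Describe());
    }
}
```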
This code produces the following output:
Mary Jones, Age 19, Major: History
The projection initializer form of the object initializer just shown has exactly the same result as the following assignment form, which declares the members in the same order:
var student = new { Age = 19, Major = Major, Name = Other.Name };
Although your code cannot see the anonymous type, it's visible to object browsers. If the compiler encounters another anonymous type in the same assembly with the same property names, with the same inferred types, and in the same order, it reuses the type and creates a new instance of it rather than creating a new anonymous type.
There are two syntactic forms you can use when writing LINQ queries—query syntax and method syntax.
Microsoft recommends using query syntax because it's more readable, more clearly states your query intentions, and is therefore less error-prone. There are some operators, however, that can be written only using method syntax.
Note: Queries expressed using query syntax are translated by the C# compiler into method invocation form. There is no difference in runtime performance between the two forms.
The following code shows both forms, along with a statement that combines the two. In the method syntax part, you might find that the parameter of the Where method looks a bit odd. It's a lambda expression, as described in Chapter 15. I'll cover its use in LINQ a bit later in the chapter.
using System;
using System.Linq;
class Program
{
    static void Main()
    {
        int[] numbers = { 2, 5, 28, 31, 17, 16, 42 };
        var numsQuery = from n in numbers                // Query syntax
                        where n < 20
                        select n;
        var numsMethod = numbers.Where(x => x < 20);     // Method syntax
        int numsCount = (from n in numbers               // Combined
                         where n < 20
                         select n).Count();
        foreach (var x in numsQuery)
            Console.Write("{0}, ", x);
        Console.WriteLine();
        foreach (var x in numsMethod)
            Console.Write("{0}, ", x);
        Console.WriteLine();
        Console.WriteLine(numsCount);
    }
}
This code produces the following output:
2, 5, 17, 16,
2, 5, 17, 16,
4
LINQ queries can return two types of results: an enumeration, which lists the items that satisfy the query parameters; or a single value, called a scalar, which is some form of summary of the results that satisfied the query.
In the following example code, the following happens:
- The first statement creates an array of ints and initializes it with three values.
- The second statement specifies a query that returns an IEnumerable object, which can be used to enumerate the results of the query.
- The third statement executes a query and then calls a method (Count) that returns the count of the items returned from the query. We'll cover operators that return scalars, such as Count, later in the chapter.
int[] numbers = { 2, 5, 28 };
IEnumerable<int> lowNums = from n in numbers        // Returns an enumerable
                           where n < 20
                           select n;
int numsCount = (from n in numbers                   // Returns an int
                 where n < 20
                 select n).Count();
The variable on the left of the equals sign is called the query variable. Although the types of the query variables are given explicitly in the example statements, you could also have had the compiler infer the types of the query variables by using the var keyword in place of the type names.
It's important to understand the contents of query variables. After executing the preceding code, query variable lowNums does not contain the results of the query. Instead, it contains an object that implements IEnumerable<int>, which can perform the query if it's called upon to do so later in the code. Query variable numsCount, however, contains an actual integer value, which could have been obtained only by running the query.
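The differences in the timing of the execution of the queries can be summarized as follows:
- If a query expression returns an enumeration, the query is not executed until the enumeration is processed.
- If the enumeration is processed multiple times, the query is executed multiple times.
- If the data changes between the time the enumeration is produced and the time the query is executed, the query is run on the new data.
- If the query expression returns a scalar, the query is executed immediately, and the result is stored in the query variable.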
Figure 21-2 illustrates this for the enumerable query. Variable lowNums contains a reference to the enumerable that can enumerate the query results from the array.
Figure 21-2. The compiler creates an object that implements IEnumerable<int> and stores the query in the object.
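The following sketch makes the timing difference visible. It is an illustration rather than the chapter's own example: it swaps the array for a List<int> so an element can be added after both queries are declared, and then counts the enumerable query's results again.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    // Returns { scalar result captured earlier, enumerable result evaluated later }.
    public static int[] Run()
    {
        var numbers = new List<int> { 2, 5, 28 };

        // Enumerable query: defined here, but not executed yet.
        IEnumerable<int> lowNums = from n in numbers
                                   where n < 20
                                   select n;

        // Scalar query: Count() forces execution now, so numsCount is 2.
        int numsCount = (from n in numbers
                         where n < 20
                         select n).Count();

        numbers.Add(7);   // Change the data source after both queries exist.

        // numsCount still holds 2, but enumerating lowNums now runs the query
        // against the updated list and finds 2, 5, and 7.
        return new[] { numsCount, lowNums.Count() };
    }

    static void Main()
    {
        int[] r = Run();
        Console.WriteLine("numsCount: {0}, lowNums now yields {1} items", r[0], r[1]);
    }
}
```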
A query expression consists of a from clause followed by a query body, as illustrated in Figure 21-3. Some of the important things to know about query expressions are the following:
- The two required parts are the from clause and the select...group clause. The other clauses are optional.
- In a LINQ query expression, the select clause is at the end of the expression. This is different from SQL, where the SELECT statement is at the beginning of a query. One of the reasons for using this position in C# is that it allows Visual Studio's IntelliSense to give you more options while you're entering code.
- There can be any number of from...let...where clauses, as illustrated in the figure.
Figure 21-3. The structure of a query statement consists of a from clause followed by a query body.
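The from clause specifies the data collection that is to be used as the data source. It also introduces the iteration variable. The important points about the from clause are the following:
- The iteration variable sequentially represents each element in the data source.
- The syntax of the from clause is from Type Item in Items, where Type is the type of the elements in the collection (optional, because the compiler can infer it), Item is the name of the iteration variable, and Items is the name of the collection to be queried. The collection must be enumerable.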
The following code shows a query expression used to query an array of four ints. Iteration variable item will represent each of the four elements in the array and will be either selected or rejected by the where and select clauses following it. This code leaves out the optional type (int) of the iteration variable.
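The code for this example isn't shown in the text. A sketch that produces the output below, assuming the array holds 10 through 13 and the where clause keeps values less than 13 (both inferred from the output), might look like this:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    static readonly int[] arr1 = { 10, 11, 12, 13 };   // Assumed data source

    public static IEnumerable<int> Query()
    {
        // Iteration variable item takes each element of arr1 in turn;
        // its type (int) is inferred, so the optional type is omitted.
        return from item in arr1
               where item < 13
               select item;
    }

    static void Main()
    {
        foreach (var item in Query())
            Console.Write("{0}, ", item);
    }
}
```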
This code produces the following output:
10, 11, 12,
Figure 21-4 shows the syntax of the from clause. The type specifier is optional, since it can be inferred by the compiler. There can be any number of optional join clauses.
Figure 21-4. The syntax of the from clause
Although there is a strong similarity between the LINQ from clause and the foreach statement, there are several major differences:
- The foreach statement executes its body at the point in the code where it is encountered. The from clause, on the other hand, does not execute anything. It creates an enumerable object that's stored in the query variable. The query itself might or might not be executed later in the code.
- The foreach statement imperatively specifies that the items in the collection are to be considered in order, from the first to the last. The from clause declaratively states that each item in the collection must be considered but does not assume an order.
The join clause in LINQ is much like the JOIN clause in SQL. If you're familiar with joins from SQL, then joins in LINQ will be nothing new for you conceptually, except that you can now perform them on collections of objects as well as database tables. If you're new to joins or need a refresher, then the next section should help clear things up for you.
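The first important things to know about a join are the following:
- You use a join to combine data from two collections into a new, temporary collection of objects, where each object contains fields from objects in both of the original collections.
- You compare the values the join matches on with the contextual keyword equals. You can't use the == operator in a join clause.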
A join specifies that a second collection is to be joined with the collection in the previous clause. Its syntax is join Type Identifier in Collection on Expression1 equals Expression2, where Type is optional. Figure 21-5 illustrates the syntax for the join clause.
Figure 21-5. Syntax for the join clause
The following annotated statement shows an example of the join clause:
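Since the annotated statement itself is missing, here is a small self-contained sketch of a join clause with its parts labeled in comments. The customers and orders arrays are hypothetical sample data, not part of the chapter's example.

```csharp
using System;
using System.Linq;

class Program
{
    public static string[] Run()
    {
        // Hypothetical sample data: two collections that share a customer ID.
        var customers = new[] { new { Id = 1, Name = "Ann" }, new { Id = 2, Name = "Ben" } };
        var orders = new[]
        {
            new { CustId = 1, Item = "Pen" },
            new { CustId = 2, Item = "Pad" },
            new { CustId = 1, Item = "Ink" },
        };

        var query = from c in customers        // First collection and its iteration variable
                    join o in orders           // Keyword join, second iteration variable and collection
                    on c.Id equals o.CustId    // Values compared; must use equals, not ==
                    select c.Name + ": " + o.Item;
        return query.ToArray();
    }

    static void Main()
    {
        foreach (var line in Run())
            Console.WriteLine(line);
    }
}
```

Each customer is paired with every order whose CustId matches, so Ann appears twice and Ben once.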
A join in LINQ takes two collections and creates a new collection where each element has members from the elements of the two original collections.
For example, the following code declares two classes: Student and CourseStudent.
- Objects of type Student contain a student's last name and student ID number.
- Objects of type CourseStudent represent a student that is enrolled in a course and contain the course name and a student ID number.
public class Student
{
    public int StID;
    public string LastName;
}
public class CourseStudent
{
    public string CourseName;
    public int StID;
}
Figure 21-6 shows the situation in a program where there are three students and three courses, and the students are enrolled in various courses. The program has an array called students, of Student objects, and an array called studentsInCourses, of CourseStudent objects, which contains one object for every student enrolled in each course.
Figure 21-6. Students enrolled in various courses
Suppose now that you want to get the last name of every student in a particular course. The students array has the last names, and the studentsInCourses array has the course enrollment information. To get the information, you must combine the information in the arrays, based on the student ID field, which is common to objects of both types. You can do this with a join on the StID field.
Figure 21-7 shows how the join works. The left column shows the students array, and the right column shows the studentsInCourses array. If we take the first student record and compare its ID with the student ID in each studentsInCourses object, we find that two of them match, as shown at the top of the center column. If we then do the same with the other two students, we find that the second student is taking one course, and the third student is taking two courses.
The five grayed objects in the middle column represent the join of the two arrays on field StID. Each object contains three fields: the LastName field from the Student class, the CourseName field from the CourseStudent class, and the StID field common to both classes.
Figure 21-7. Two arrays of objects and their join on field StID
The following code puts the whole example together. The query finds the last names of all the students taking the history course.
using System;
using System.Linq;
class Program
{
    public class Student                        // Declare classes.
    {
        public int StID;
        public string LastName;
    }
    public class CourseStudent
    {
        public string CourseName;
        public int StID;
    }
    // Initialize arrays.
    static CourseStudent[] studentsInCourses = new CourseStudent[] {
        new CourseStudent { CourseName = "Art",     StID = 1 },
        new CourseStudent { CourseName = "Art",     StID = 2 },
        new CourseStudent { CourseName = "History", StID = 1 },
        new CourseStudent { CourseName = "History", StID = 3 },
        new CourseStudent { CourseName = "Physics", StID = 3 },
    };
    static Student[] students = new Student[] {
        new Student { StID = 1, LastName = "Carson" },
        new Student { StID = 2, LastName = "Klassen" },
        new Student { StID = 3, LastName = "Fleming" },
    };
    static void Main()
    {
        // Find the last names of the students taking history.
        var query = from s in students
                    join c in studentsInCourses on s.StID equals c.StID
                    where c.CourseName == "History"
                    select s.LastName;
        // Display the names of the students taking history.
        foreach (var q in query)
            Console.WriteLine("Student taking History: {0}", q);
    }
}
This code produces the following output:
Student taking History: Carson
Student taking History: Fleming
The optional from...let...where section is the first section of the query body. It can have any number of any of the three clauses that comprise it: the from clause, the let clause, and the where clause. Figure 21-8 summarizes the syntax of the three clauses.
Figure 21-8. The syntax of the from . . . let . . . where clause
You saw that a query expression starts with a required from clause, which is followed by the query body. The body itself can start with any number of additional from clauses, where each subsequent from clause specifies an additional source data collection and introduces a new iteration variable for use in further evaluations. The syntax and meanings of all the from clauses are the same.
The following code shows an example of this use. Notice the following about the code:
- The first from clause is the required clause of the query expression.
- The second from clause is the first clause of the query body.
- The select clause creates objects of an anonymous type.
This code produces the following output:
{ a = 5, b = 6, sum = 11 }
{ a = 5, b = 7, sum = 12 }
{ a = 5, b = 8, sum = 13 }
{ a = 6, b = 6, sum = 12 }
{ a = 6, b = 7, sum = 13 }
{ a = 6, b = 8, sum = 14 }
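The listing that produces this output isn't shown above. A sketch that reproduces it, assuming groupA and groupB hold the values below and a where clause restricts the pairs (all inferred from the output, not taken from the chapter), is the following:

```csharp
using System;
using System.Linq;

class Program
{
    public static string[] Run()
    {
        var groupA = new[] { 3, 4, 5, 6 };     // Assumed data
        var groupB = new[] { 6, 7, 8, 9 };

        var someInts = from a in groupA        // Required first from clause
                       from b in groupB        // First clause of the query body
                       where a > 4 && b <= 8
                       select new { a, b, sum = a + b };   // Anonymous type

        // An anonymous type's ToString() lists its properties, e.g. "{ a = 5, b = 6, sum = 11 }".
        return someInts.Select(x => x.ToString()).ToArray();
    }

    static void Main()
    {
        foreach (var s in Run())
            Console.WriteLine(s);
    }
}
```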
The let clause takes the evaluation of an expression and assigns it to an identifier to be used in other evaluations. The syntax of the let clause is the following:
let Identifier = Expression
For example, the query expression in the following code pairs each member of array groupA with each element of array groupB. The where clause eliminates each pair of integers whose sum is not equal to 12.
This code produces the following output:
{ a = 3, b = 9, sum = 12 }
{ a = 4, b = 8, sum = 12 }
{ a = 5, b = 7, sum = 12 }
{ a = 6, b = 6, sum = 12 }
The where clause eliminates items from further consideration if they don't meet the specified condition. The syntax of the where clause is the following:
where BooleanExpression
Important things to know about the where clause are the following:
- A query expression can have any number of where clauses, as long as they are in the from...let...where section.
- An item must satisfy all the where clauses to avoid elimination from further consideration.
The following code shows an example of a query expression that contains two where clauses. Together, they keep only the pairs of integers from the two arrays whose sum is greater than or equal to 11 and whose element from groupA is the value 4. Each pair selected must satisfy the conditions of both where clauses.
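A sketch of that query, assuming the same groupA and groupB values as the let example (inferred from the output), is the following:

```csharp
using System;
using System.Linq;

class Program
{
    public static string[] Run()
    {
        var groupA = new[] { 3, 4, 5, 6 };     // Assumed data
        var groupB = new[] { 6, 7, 8, 9 };

        var someInts = from a in groupA
                       from b in groupB
                       let sum = a + b
                       where sum >= 11         // Condition 1
                       where a == 4            // Condition 2
                       select new { a, b, sum };

        return someInts.Select(x => x.ToString()).ToArray();
    }

    static void Main()
    {
        foreach (var s in Run())
            Console.WriteLine(s);
    }
}
```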
This code produces the following output:
{ a = 4, b = 7, sum = 11 }
{ a = 4, b = 8, sum = 12 }
{ a = 4, b = 9, sum = 13 }
The orderby clause takes an expression and returns the result items in order according to the expression.
Figure 21-9 shows the syntax of the orderby clause. The optional keywords ascending and descending set the direction of the order. Expression is generally a field of the items.
- The default ordering of an orderby clause is ascending. You can, however, explicitly set the ordering of the elements to either ascending or descending, using the ascending and descending keywords.
- An orderby clause can contain any number of orderings, separated by commas. The first ordering is the primary sort key, and each subsequent ordering sorts items that tie on the previous ones.
Figure 21-9. The syntax of the orderby clause
The following code shows an example of student records ordered by the ages of the students. Notice that the array of student information is stored in an array of anonymous types.
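The listing is missing; a sketch follows, assuming the same three students used elsewhere in the chapter. The array is deliberately listed out of age order here so the effect of orderby is visible; the output is unchanged.

```csharp
using System;
using System.Linq;

class Program
{
    public static string[] Run()
    {
        // Array of anonymous-type objects, not in age order.
        var students = new[]
        {
            new { LName = "Smith",   FName = "Bob",   Age = 20, Major = "CompSci" },
            new { LName = "Fleming", FName = "Carol", Age = 21, Major = "History" },
            new { LName = "Jones",   FName = "Mary",  Age = 19, Major = "History" },
        };

        var query = from student in students
                    orderby student.Age        // Ascending by default
                    select student;

        return query.Select(s => string.Format("{0}, {1}: {2} - {3}",
                                               s.LName, s.FName, s.Age, s.Major))
                    .ToArray();
    }

    static void Main()
    {
        foreach (var line in Run())
            Console.WriteLine(line);
    }
}
```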
This code produces the following output:
Jones, Mary: 19 - History
Smith, Bob: 20 - CompSci
Fleming, Carol: 21 - History
There are two types of clauses that make up the select...group section: the select clause and the group...by clause. While the clauses that precede the select...group section specify the data sources and which objects to choose, the select...group section does the following:
- The select clause specifies which parts of the chosen objects should be selected. It can specify the entire data item, a field from the data item, or a new object comprising several fields from the data item (or any other value).
- The group...by clause, used in place of the select clause, specifies how the chosen items should be grouped. We'll cover the group...by clause later in the chapter.
Figure 21-10 shows the syntax for the select...group clause.
Figure 21-10. The syntax of the select . . . group clause
The following code shows an example of using the select clause to select the entire data item. First, the program creates an array of objects of an anonymous type. The query expression then uses the select clause to select each item in the array.
using System;
using System.Linq;
class Program
{
    static void Main()
    {
        var students = new[]           // Array of objects of an anonymous type
        {
            new { LName = "Jones",   FName = "Mary",  Age = 19, Major = "History" },
            new { LName = "Smith",   FName = "Bob",   Age = 20, Major = "CompSci" },
            new { LName = "Fleming", FName = "Carol", Age = 21, Major = "History" }
        };
        var query = from s in students
                    select s;
        foreach (var q in query)
            Console.WriteLine("{0}, {1}: Age {2}, {3}",
                              q.LName, q.FName, q.Age, q.Major);
    }
}
This code produces the following output:
Jones, Mary: Age 19, History
Smith, Bob: Age 20, CompSci
Fleming, Carol: Age 21, History
You can also use the select clause to choose only particular fields of the object. For example, the select clause in the following code selects only the last name of the student.
var query = from s in students
            select s.LName;
foreach (var q in query)
    Console.WriteLine(q);
When you substitute these two statements for the corresponding two statements in the preceding full example, the program produces the following output:
Jones
Smith
Fleming
The result of a query can consist of items from the source collections, fields from the items in the source collections, or anonymous types.
You can create an anonymous type in a select clause by placing curly braces around a comma-separated list of fields you want to include in the type. For example, to make the code in the previous section select just the names and majors of the students, you could use the syntax select new { s.LName, s.FName, s.Major }.
The following code creates an anonymous type in the select clause and uses it later in the WriteLine statement.
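A sketch of that code, assuming the same student array used in the full example earlier in this section; the formatting is moved into a helper here only so the results can be checked:

```csharp
using System;
using System.Linq;

class Program
{
    public static string[] Run()
    {
        var students = new[]
        {
            new { LName = "Jones",   FName = "Mary",  Age = 19, Major = "History" },
            new { LName = "Smith",   FName = "Bob",   Age = 20, Major = "CompSci" },
            new { LName = "Fleming", FName = "Carol", Age = 21, Major = "History" },
        };

        // The select clause builds a new anonymous type holding only three fields.
        var query = from s in students
                    select new { s.LName, s.FName, s.Major };

        return query.Select(q => string.Format("{0} {1} -- {2}", q.FName, q.LName, q.Major))
                    .ToArray();
    }

    static void Main()
    {
        foreach (var line in Run())
            Console.WriteLine(line);
    }
}
```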
This code produces the following output:
Mary Jones -- History
Bob Smith -- CompSci
Carol Fleming -- History
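The group clause groups the selected objects according to some criterion. For example, with the array of students in the previous examples, the program could group the students according to their majors.
The important things to know about the group clause are the following:
- When items are included in the result of the query, they're placed in groups according to the value of a particular field. The value on which items are grouped is called the key.
- Unlike the select clause, the group clause does not return an enumerable that can enumerate the items from the original source. Instead, it returns an enumerable that enumerates the groups of items that have been formed.
- The groups themselves are enumerable and can enumerate the actual items.
An example of the syntax of the group clause is group student by student.Major, where group and by are keywords, student is the item being grouped, and student.Major is the key.
For example, the following code groups the students according to their majors: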
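The listing isn't shown; a sketch that produces the output below, assuming the same student array, is the following. The outer loop enumerates the groups, Key holds the major each group was formed on, and the inner loop enumerates that group's items.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    public static List<string> Run()
    {
        var students = new[]
        {
            new { LName = "Jones",   FName = "Mary",  Age = 19, Major = "History" },
            new { LName = "Smith",   FName = "Bob",   Age = 20, Major = "CompSci" },
            new { LName = "Fleming", FName = "Carol", Age = 21, Major = "History" },
        };

        var query = from student in students
                    group student by student.Major;   // Sequence of groups

        var lines = new List<string>();
        foreach (var g in query)            // Each g is a group of students
        {
            lines.Add(g.Key);               // The major the group was formed on
            foreach (var t in g)            // Each group enumerates its own items
                lines.Add(string.Format("      {0}, {1}", t.LName, t.FName));
        }
        return lines;
    }

    static void Main()
    {
        foreach (var line in Run())
            Console.WriteLine(line);
    }
}
```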
This code produces the following output:
History
Jones, Mary
Fleming, Carol
CompSci
Smith, Bob
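Figure 21-11 illustrates the object that is returned from the query expression and stored in the query variable.
- The object returned from the query expression is an enumerable that enumerates the groups resulting from the query.
- Each group is distinguished by a property called Key.
- Each group is itself enumerable and can enumerate its items.
Figure 21-11. The group clause returns a collection of collections of objects rather than a collection of objects.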
A query continuation clause takes the result of one part of a query and assigns it a name so that it can be used in another part of the query. Figure 21-12 shows the syntax for query continuation.
Figure 21-12. The syntax of the query continuation clause
For example, the following query joins groupA and groupB and names the join groupAandB. It then performs a simple select from groupAandB.
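The listing is missing here. The following sketch assumes groupA and groupB hold the values below. It writes the into continuation after a select clause, which is where C# allows a query continuation; a join clause followed by into, by contrast, performs a group join, which is a different construct.

```csharp
using System;
using System.Linq;

class Program
{
    public static int[] Run()
    {
        var groupA = new[] { 3, 4, 5, 6 };     // Assumed data
        var groupB = new[] { 4, 5, 6, 7 };

        var someInts = from a in groupA
                       join b in groupB on a equals b
                       select a
                       into groupAandB         // Query continuation: names the result so far
                       select groupAandB;      // A simple select over that result

        return someInts.ToArray();
    }

    static void Main()
    {
        foreach (var v in Run())
            Console.Write("{0} ", v);
    }
}
```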
This code produces the following output:
4 5 6
The standard query operators are a set of methods, forming an application programming interface (API), that let you query any .NET array or collection. Important characteristics of the standard query operators are the following:
- The collection objects queried are called sequences and must implement the IEnumerable<T> interface, where T is a type.
- The standard query operators use method syntax.
- Some operators return IEnumerable objects (or other sequences), while others return scalars. Operators that return scalars execute their queries immediately and return a value instead of an enumerable object to be iterated over later.
For example, the following code shows the use of operators Sum and Count, which return ints. Notice the following about the code:
- The operators are used as methods directly on the sequence objects, in this case the array numbers.
- The return type is not an IEnumerable object but an int.
This code produces the following output:
Total: 12, Count: 3
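The listing for this output isn't shown. A sketch, assuming the array numbers holds 2, 4, and 6 (any three values summing to 12 would do), is:

```csharp
using System;
using System.Linq;

class Program
{
    static readonly int[] numbers = { 2, 4, 6 };   // Assumed data

    public static int Total() => numbers.Sum();     // Scalar: executes immediately
    public static int HowMany() => numbers.Count(); // Scalar: executes immediately

    static void Main()
    {
        Console.WriteLine("Total: {0}, Count: {1}", Total(), HowMany());
    }
}
```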
There are 47 standard query operators that fall into 14 different categories. These categories are shown in Table 21-1.
Table 21-1. Categories of the Standard Query Operators
Name | Number of Operators | Description |
Restriction | 1 | Returns a subset of the objects of the sequence, based on selection criteria |
Projection | 2 | Selects which parts of the objects of a sequence are finally returned |
Partitioning | 4 | Skips or returns objects from a sequence |
Join | 2 | Returns an IEnumerable object that joins two sequences, based on some criterion |
Concatenation | 1 | Produces a single sequence from two separate sequences |
Ordering | 2 | Orders a sequence based on supplied criteria |
Grouping | 1 | Groups a sequence based on supplied criteria |
Set | 4 | Performs set operations on a sequence |
Conversion | 7 | Converts sequences to various forms such as arrays, lists, and dictionaries |
Equality | 1 | Compares two sequences for equality |
Element | 9 | Returns a particular element of a sequence |
Generation | 3 | Generates sequences |
Quantifiers | 3 | Returns Boolean values specifying whether a particular predicate is true about a sequence |
Aggregate | 7 | Returns a single value representing characteristics of a sequence |
As mentioned at the beginning of the chapter, every query expression can also be written using method syntax with the standard query operators. The set of standard query operators is a set of methods for performing queries. The compiler translates every query expression into standard query operator form.
Since all query expressions are translated into the standard query operators, the operators can clearly perform everything done by query expressions. But the operators also give additional capabilities that aren't available in query expression form. For example, operators Sum and Count, which were used in the previous example, can be expressed only using method syntax.
The two forms, query expressions and method syntax, however, can be combined. For example, the following code shows a query expression that also uses operator Count. Notice that the query expression part of the statement is inside parentheses, which is followed by a dot and the name of the method.
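A sketch of such a statement, assuming the array values below and a where clause that keeps values less than 7 (the chapter's own values aren't shown):

```csharp
using System;
using System.Linq;

class Program
{
    public static int HowMany()
    {
        var numbers = new[] { 2, 6, 4, 8, 10 };   // Assumed data
        // Query expression in parentheses, followed by a dot and the method name.
        return (from n in numbers
                where n < 7
                select n).Count();
    }

    static void Main()
    {
        Console.WriteLine("Count: {0}", HowMany());
    }
}
```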
This code produces the following output:
Count: 3
The standard query operators are methods declared in class System.Linq.Enumerable. These methods, however, aren't just any methods; they are extension methods that extend the generic interface IEnumerable<T>.
Extension methods were covered in Chapters 7 and 19, but the most important thing to remember about them is that they are public, static methods that, although defined in one class, are designed to add functionality to a different type, the one listed as the first formal parameter. This formal parameter must be preceded by the keyword this.
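To make these rules concrete, the following sketch declares three hypothetical extension methods, MyCount, MyFirst, and MyWhere, whose shapes mirror the standard Count, First, and Where operators. They are simplified stand-ins written for illustration (for example, they skip the argument checks the real operators perform), not the System.Linq implementations.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical extension methods shaped like Enumerable.Count, First, and Where.
public static class MyOperators
{
    // public, static, 'this' before the first parameter, IEnumerable<T> as its type.
    public static int MyCount<T>(this IEnumerable<T> source)
    {
        int count = 0;
        foreach (T item in source)
            count++;
        return count;
    }

    public static T MyFirst<T>(this IEnumerable<T> source)
    {
        foreach (T item in source)
            return item;                       // First element encountered
        throw new InvalidOperationException("Sequence contains no elements.");
    }

    public static IEnumerable<T> MyWhere<T>(this IEnumerable<T> source, Func<T, bool> predicate)
    {
        foreach (T item in source)
            if (predicate(item))
                yield return item;             // Lazily yields matching elements
    }
}

class Program
{
    static void Main()
    {
        int[] intArray = { 3, 4, 5, 6, 7, 9 };
        Console.WriteLine(intArray.MyCount());                      // 6
        Console.WriteLine(intArray.MyFirst());                      // 3
        Console.WriteLine(intArray.MyWhere(x => x > 5).MyCount());  // 3
    }
}
```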
For example, the following are the signatures of three of the operators: Count, First, and Where.
public static int Count<T>(this IEnumerable<T> source);
public static T First<T>(this IEnumerable<T> source);
public static IEnumerable<T> Where<T>(this IEnumerable<T> source, Func<T, bool> predicate);
At first glance, the signatures of the operators can be somewhat intimidating. Notice the following about the signatures:
- Since the operators are generic methods, they have a generic parameter (T) associated with their names.
- Since the operators are extension methods that extend IEnumerable<T>, they must satisfy the following syntactic requirements:
  - They must be declared public and static.
  - They must have the this extension indicator before the first parameter.
  - They must have IEnumerable<T> as the first parameter type.
The following code shows the use of operators Count and First. Both operators take only a single parameter, the reference to the IEnumerable<T> object.
- The Count operator returns a single value, which is the count of all the elements in the sequence.
- The First operator returns the first element of the sequence.
The first two times the operators are used in this code, they are called directly, just like normal methods, passing the name of the array as the first parameter. In the following two lines, however, they are called using the extension method syntax, as if they were method members of the array, which is enumerable. Notice that in this case no parameter is supplied. Instead, the array name has been moved from the parameter list to before the method name, where it is used as if the array type contained a declaration of the method.
The direct syntax calls and the extension syntax calls are completely equivalent in effect—only their syntax is different.
This code produces the following output:
Count: 6, FirstNumber: 3
Count: 6, FirstNumber: 3
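The code for this example isn't reproduced in the text. A sketch that produces that output, assuming the same six-element array used later in the chapter, is the following:

```csharp
using System;
using System.Linq;

class Program
{
    static readonly int[] intArray = { 3, 4, 5, 6, 7, 9 };

    public static int[] Run()
    {
        var count1 = Enumerable.Count(intArray);      // Direct static-method call
        var firstNum1 = Enumerable.First(intArray);

        var count2 = intArray.Count();                // Extension-method syntax
        var firstNum2 = intArray.First();

        return new[] { count1, firstNum1, count2, firstNum2 };
    }

    static void Main()
    {
        var r = Run();
        Console.WriteLine("Count: {0}, FirstNumber: {1}", r[0], r[1]);
        Console.WriteLine("Count: {0}, FirstNumber: {1}", r[2], r[3]);
    }
}
```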
As you just saw in the previous section, the first parameter of every operator is a reference to an IEnumerable<T> object. The parameters following it can be of any type. Many operators take generic delegates as parameters. (Generic delegates were explained in Chapter 19.) The most important thing to recall about generic delegates as parameters is the following:
- Generic delegates are used to supply user-defined code to the operator.
To explain this, I'll start with an example showing several ways you might use the Count
operator. The Count
operator is overloaded and has two forms. The first form, which was used in the previous example, has a single parameter, as shown here:
public static int Count<T>(this IEnumerable<T> source);
Like all extension methods, you can use it in the standard static method form or in the form of an instance method on an instance of the type it extends, as shown in the following two lines of code:
var count1 = System.Linq.Enumerable.Count(intArray);   // Static method form
var count2 = intArray.Count();                         // Instance method form (requires using System.Linq)
In these two instances, the query counts the number of ints in the given integer array. Suppose, however, that you only want to count the odd elements of the array. To do that, you must supply the Count method with code that determines whether an integer is odd.
To do this, you would use the second form of the Count method, public static int Count<T>(this IEnumerable<T> source, Func<T, bool> predicate), which has a generic delegate as its second parameter. At the point it is invoked, you must supply a delegate object that takes a single input parameter of type T and returns a Boolean value. The return value of the delegate code must specify whether the element should be included in the count.
For example, the following code uses the second form of the Count operator to instruct it to include only those values that are odd. It does this by supplying a lambda expression that returns true if the input value is odd and false otherwise. (Lambda expressions were covered in Chapter 15.) At each iteration through the collection, Count calls this method (represented by the lambda expression) with the current value as input. If the input is odd, the method returns true, and Count includes the element in the total.
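A minimal sketch of that code, assuming the same array of six ints used in the full example later in this section. It tests oddness with n % 2 != 0, which also handles negative numbers correctly:

```csharp
using System;
using System.Linq;

class Program
{
    public static int CountOdd()
    {
        int[] intArray = { 3, 4, 5, 6, 7, 9 };
        // Second overload of Count: the lambda is called once per element,
        // and only elements for which it returns true are counted.
        return intArray.Count(n => n % 2 != 0);
    }

    static void Main()
    {
        Console.WriteLine("Count of odd numbers: {0}", CountOdd());
    }
}
```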
This code produces the following output:
Count of odd numbers: 4
Like the Count operator from the previous example, many of the LINQ operators require you to supply code that directs how the operator performs its operation. You can do this by using delegate objects as parameters.
Remember from Chapter 15 that you can think of a delegate object as an object that contains a method or list of methods with a particular signature and return type. When the delegate is invoked, the methods it contains are invoked in sequence.
The .NET Framework defines two families of generic delegate types for use with the standard query operators: the Func delegates and the Action delegates. Each set has 17 members.
- The Action delegates have no return value.
- In the Func delegates, type parameter TR represents the return type and is always last in the list of type parameters.
The first four generic Func delegates take zero, one, two, and three method parameters, respectively. The first form takes no method parameters and returns an object of the return type. The second takes a single method parameter and returns a value, and so forth. Notice that the return type parameter has the out keyword, making it covariant: a delegate of type Func<string> can be assigned to a variable of type Func<object>, because every string is an object. The input parameters have the in keyword, making them contravariant: a delegate whose parameter type is a base type, such as Func<object, bool>, can be assigned to a variable whose parameter type is derived from it, such as Func<string, bool>.
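The declarations of the first four Func delegates aren't shown in the text. The following sketch declares look-alike delegates in a hypothetical namespace FuncSketch so they don't clash with the real System.Func types (whose type parameters are named TResult, T1, T2, and so on), and then demonstrates the covariance and contravariance just described.

```csharp
using System;

namespace FuncSketch
{
    // Hypothetical re-declarations mirroring the first four System.Func delegates,
    // using the text's naming (TR for the return type).
    public delegate TR Func<out TR>();
    public delegate TR Func<in T1, out TR>(T1 a1);
    public delegate TR Func<in T1, in T2, out TR>(T1 a1, T2 a2);
    public delegate TR Func<in T1, in T2, in T3, out TR>(T1 a1, T2 a2, T3 a3);

    public static class Program
    {
        // Covariance (out TR): a Func<string> is usable where a Func<object> is expected.
        public static object CovariantDemo()
        {
            Func<string> getName = () => "Mary";
            Func<object> getObject = getName;
            return getObject();
        }

        // Contravariance (in T1): a Func<object, int> is usable where a Func<string, int> is expected.
        public static int ContravariantDemo()
        {
            Func<object, int> lengthOfAny = o => o.ToString().Length;
            Func<string, int> lengthOfString = lengthOfAny;
            return lengthOfString("History");
        }

        // Three-parameter form.
        public static int SumDemo()
        {
            Func<int, int, int, int> add3 = (x, y, z) => x + y + z;
            return add3(1, 2, 3);
        }

        static void Main()
        {
            Console.WriteLine(CovariantDemo());      // Mary
            Console.WriteLine(ContravariantDemo());  // 7
            Console.WriteLine(SumDemo());            // 6
        }
    }
}
```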
With this in mind, if you look again at the declaration of Count, public static int Count<T>(this IEnumerable<T> source, Func<T, bool> predicate), you can see that the second parameter must be a delegate object that takes a single value of some type T as the method parameter and returns a value of type bool.
A parameter delegate that produces a Boolean value is called a predicate.
The first four Action delegates are the following. They're the same as the Func delegates except that they have no return value and hence no return value type parameter. All their type parameters are contravariant.
public delegate void Action ( );
public delegate void Action<in T1> ( T1 a1 );
public delegate void Action<in T1, in T2> ( T1 a1, T2 a2 );
public delegate void Action<in T1, in T2, in T3>( T1 a1, T2 a2, T3 a3 );
Now that you better understand Count's signature and LINQ's use of generic delegate parameters, you'll be better able to understand a full example.
The following code first declares method IsOdd, which takes a single parameter of type int and returns a bool value stating whether the input parameter was odd. Method Main does the following:
- It declares an array of ints as the data source.
- It creates a delegate object called myDel of type Func<int, bool>, and it uses method IsOdd to initialize the delegate object. Notice that you don't need to declare the Func delegate type because, as you saw, it's already predefined in the System namespace.
- It calls Count using the delegate object.
using System;
using System.Linq;
class Program
{
    static bool IsOdd(int x)      // Method to be used by the delegate object
    {
        return x % 2 != 0;        // Return true if x is odd (also correct for negative x).
    }
    static void Main()
    {
        int[] intArray = new int[] { 3, 4, 5, 6, 7, 9 };
        Func<int, bool> myDel = new Func<int, bool>(IsOdd);   // Delegate object
        var countOdd = intArray.Count(myDel);                 // Use delegate
        Console.WriteLine("Count of odd numbers: {0}", countOdd);
    }
}
This code produces the following output:
Count of odd numbers: 4
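The previous example used a separate method and a delegate to attach the code to the operator. This required declaring the method, declaring the delegate object, and then passing the delegate object to the operator. This works fine and is exactly the right approach to take if either of the following conditions is true:
The method must be called from somewhere else in the program, not just from the place where it's used to initialize the delegate object.
The code in the method body is more than a statement or two long.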
If neither of these conditions is true, however, you probably want to use a more compact and localized method of supplying the code to the operator, using a lambda expression as described in Chapter 15.
We can modify the previous example to use a lambda expression by deleting the IsOdd method entirely and placing the equivalent lambda expression directly in the declaration of the delegate object. The new code is shorter and cleaner.
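A minimal sketch of the lambda version follows. To keep it self-contained it wraps the logic in a hypothetical static class, LambdaCountDemo, instead of Program.Main; calling LambdaCountDemo.CountOdd(new int[] { 3, 4, 5, 6, 7, 9 }) from Main and printing the result reproduces the earlier program.

```csharp
using System;
using System.Linq;

// Hypothetical wrapper class; call CountOdd from Main to reproduce the example.
public static class LambdaCountDemo
{
    public static int CountOdd(int[] intArray)
    {
        // The lambda expression replaces the separate IsOdd method.
        Func<int, bool> myDel = x => x % 2 != 0;
        return intArray.Count(myDel);
    }
}
```

The lambda could also be passed inline, as in intArray.Count(x => x % 2 != 0), which removes the named delegate variable entirely.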
Like the previous example, this code produces the following output:
Count of odd numbers: 4
We could also have used an anonymous method in place of the lambda expression. This is more verbose, though, and since lambda expressions are semantically equivalent and less verbose, there's little reason to use anonymous methods anymore.
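For comparison, here is a sketch of the same delegate written as an anonymous method. As before, the class AnonymousMethodCountDemo is a hypothetical wrapper; only the initialization of myDel differs from the lambda version.

```csharp
using System;
using System.Linq;

// Hypothetical wrapper class for the anonymous-method version.
public static class AnonymousMethodCountDemo
{
    public static int CountOdd(int[] intArray)
    {
        // Anonymous method: same behavior as the lambda, but it needs the
        // delegate keyword, an explicit parameter type, and a block body.
        Func<int, bool> myDel = delegate (int x) { return x % 2 != 0; };
        return intArray.Count(myDel);
    }
}
```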
Extensible Markup Language (XML) is an important method of storing and exchanging data. LINQ to XML makes working with XML much easier than earlier technologies such as XPath and XSLT. If you're familiar with those, you might be pleased to hear that LINQ to XML simplifies the creation, traversal, and manipulation of XML in a number of ways, including the following:
You can create an XML tree in a top-down fashion, with a single statement.
You can create and manipulate XML in memory without having an XML document to contain the tree.
You can create and manipulate string nodes without having a Text subnode.
Although I won't give a complete treatment of XML, I will start by giving a very brief introduction to it before describing some of the XML manipulation features supplied by LINQ.
A markup language is a set of tags placed in a document to give information about the information in the document. That is, the markup tags are not the data of the document—they contain data about the data. Data about data is called metadata.
Each markup language defines its own set of tags, designed to convey particular types of metadata about the contents of a document. HTML, for example, is the most widely known markup language. The metadata in its tags describes how a web page should be rendered in a browser and how to navigate among pages using hypertext links.
While most markup languages contain a predefined set of tags, XML contains only a few defined tags, and the rest are defined by the programmer to represent whatever kinds of metadata are required by a particular document type. As long as the writer and reader of the data agree on what the tags mean, the tags can contain whatever useful information the designers want.
Data in an XML document is contained in an XML tree, which consists mainly of a set of nested elements.
The element is the fundamental constituent of an XML tree. Every element has a name and can contain data. Some can also contain other, nested elements. Elements are demarcated by opening and closing tags. Any data contained by an element must be between its opening and closing tags.
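An element with no content can be written either as an opening and closing tag pair with nothing between them, such as <PhoneNumber></PhoneNumber>, or as a single self-closing tag, such as <PhoneNumber />.
The following XML fragment shows an element named EmployeeName followed by an empty element named PhoneNumber.
<EmployeeName>Sally Jones</EmployeeName>
<PhoneNumber />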
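The same two elements can be built with the LINQ to XML classes introduced later in this section. This sketch, in a hypothetical class EmptyElementDemo, shows that an element created without content is written as a self-closing tag, and that both written forms of an empty element parse to an element with no text and no child elements.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical demo class for elements with and without content.
public static class EmptyElementDemo
{
    // Element with text content: <EmployeeName>Sally Jones</EmployeeName>
    public static XElement Name() => new XElement("EmployeeName", "Sally Jones");

    // Element with no content; LINQ to XML writes it as <PhoneNumber />.
    public static XElement Phone() => new XElement("PhoneNumber");

    // Both spellings of an empty element have no text and no child elements.
    public static bool FormsHaveSameContent()
    {
        XElement pair = XElement.Parse("<PhoneNumber></PhoneNumber>");
        XElement selfClosing = XElement.Parse("<PhoneNumber />");
        return pair.Value == selfClosing.Value
            && !pair.HasElements && !selfClosing.HasElements;
    }
}
```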
Other important things to know about XML are the following:
The following XML document is an example of XML that contains information about two employees. This XML tree is extremely simple in order to show the elements clearly. The important things to notice about the XML tree are the following:
The tree contains a root node of type Employees that contains two child nodes of type Employee.
Each Employee node contains nodes containing the name and phone numbers of an employee.
<Employees>
  <Employee>
    <Name>Bob Smith</Name>
    <PhoneNumber>408-555-1000</PhoneNumber>
    <CellPhone />
  </Employee>
  <Employee>
    <Name>Sally Jones</Name>
    <PhoneNumber>415-555-2000</PhoneNumber>
    <PhoneNumber>415-555-2001</PhoneNumber>
  </Employee>
</Employees>
Figure 21-13 illustrates the hierarchical structure of the sample XML tree.
Figure 21-13. Hierarchical structure of the sample XML tree
LINQ to XML can be used to work with XML in two ways. The first way is as a simplified XML manipulation API. The second way is to use the LINQ query facilities you've seen throughout the earlier part of this chapter. I'll start by introducing the LINQ to XML API.
The LINQ to XML API consists of a number of classes that represent the components of an XML tree. The three most important classes you'll use are XElement, XAttribute, and XDocument. There are other classes as well, but these are the main ones.
In Figure 21-13, you saw that an XML tree is a set of nested elements. Figure 21-14 shows the classes used to build an XML tree and how they can be nested.
For example, the figure shows the following:
An XDocument node can have the following as its direct child nodes:
At most one each of an XDeclaration node, an XDocumentType node, and an XElement node
Any number of XProcessingInstruction nodes
If there is a top-level XElement node under the XDocument, it is the root of the rest of the elements in the XML tree.
The root element can in turn contain any number of XElement, XComment, or XProcessingInstruction nodes, nested to any level.
Figure 21-14. The containment structure of XML nodes
Except for the XAttribute class, most of the classes used to create an XML tree are derived from a class called XNode and are referred to generically in the literature as “XNodes.” Figure 21-14 shows the XNode classes in white clouds, while the XAttribute class is shown in a gray cloud.
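One quick way to check which classes are XNodes is reflection. In this sketch (XNodeHierarchyDemo is a hypothetical name), IsXNode tests whether a type derives from XNode, and ChildNodeCount shows a practical consequence: Nodes() enumerates only XNode children, so attributes are not included.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical demo class for the XNode class hierarchy.
public static class XNodeHierarchyDemo
{
    // True if the given LINQ to XML type derives from XNode.
    public static bool IsXNode(Type t) => typeof(XNode).IsAssignableFrom(t);

    // Nodes() enumerates only XNode children, so attributes are not counted.
    public static int ChildNodeCount(XElement element)
    {
        int count = 0;
        foreach (XNode node in element.Nodes())
            count++;
        return count;
    }
}
```

For example, an element holding one attribute, one child element, and one comment reports two child nodes.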
The best way to demonstrate the simplicity and usage of the XML API is to show simple code samples. For example, the following code shows how simple it is to perform several of the important tasks required when working with XML.
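It starts by creating a simple XML tree consisting of a node called Employees, with two subnodes containing the names of two employees. Notice that the whole tree is created in a single statement, by nesting the constructor calls for the child elements inside the constructor call for their parent. This style is called functional construction.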
After creating the tree, the code saves it to a file called EmployeesFile.xml, using XDocument's Save method. It then reads the XML tree back from the file using XDocument's static Load method and assigns the tree to a new XDocument object. Finally, it uses WriteLine to display the structure of the tree held by the new XDocument object.
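A minimal sketch of the steps just described follows, assuming the file EmployeesFile.xml is written to the current directory. The class SaveLoadDemo and method RoundTrip are hypothetical names; XDocument.Save and the static XDocument.Load are the LINQ to XML methods named in the text.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical wrapper class; call RoundTrip("EmployeesFile.xml") from Main.
public static class SaveLoadDemo
{
    public static XDocument RoundTrip(string path)
    {
        // Functional construction: the whole tree in one statement.
        XDocument employees1 = new XDocument(
            new XElement("Employees",
                new XElement("Name", "Bob Smith"),
                new XElement("Name", "Sally Jones")));

        employees1.Save(path);                       // Write the tree to a file.
        XDocument employees2 = XDocument.Load(path); // Read it into a new XDocument.
        Console.WriteLine(employees2);               // Display the reloaded tree.
        return employees2;
    }
}
```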
This code produces the following output:
<Employees>
  <Name>Bob Smith</Name>
  <Name>Sally Jones</Name>
</Employees>
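In the previous example, you saw that you can create an XML document in memory by using the constructors for XDocument and XElement. Notice the following about these constructors:
The first parameter of the XElement constructor is the name of the element. An XDocument has no name, so its constructor has no name parameter.
The remaining parameters hold the content of the node, such as child elements, attributes, and text. This content parameter is a params parameter, so the constructor can take any number of content arguments.
For example, the following code produces an XML tree and displays it using the Console.WriteLine method: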
using System;
using System.Xml.Linq; // This namespace is required.
class Program
{
    static void Main( ) {
        XDocument employeeDoc =
            new XDocument( // Create the document.
                new XElement("Employees", // Create the root element.
                    new XElement("Employee", // First employee element
                        new XElement("Name", "Bob Smith"),
                        new XElement("PhoneNumber", "408-555-1000") ),
                    new XElement("Employee", // Second employee element
                        new XElement("Name", "Sally Jones"),
                        new XElement("PhoneNumber", "415-555-2000"),
                        new XElement("PhoneNumber", "415-555-2001") )
                )
            );
        Console.WriteLine(employeeDoc); // Displays the document
    }
}
This code produces the following output:
<Employees>
  <Employee>
    <Name>Bob Smith</Name>
    <PhoneNumber>408-555-1000</PhoneNumber>
  </Employee>
  <Employee>
    <Name>Sally Jones</Name>
    <PhoneNumber>415-555-2000</PhoneNumber>
    <PhoneNumber>415-555-2001</PhoneNumber>
  </Employee>
</Employees>
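Because the content parameter accepts any object, it also accepts a sequence: each item of an IEnumerable passed as content is added as a child. The sketch below relies on that to build child elements from a LINQ query over an array. The class QueryConstructionDemo and its input array are hypothetical.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical demo class: functional construction from a query.
public static class QueryConstructionDemo
{
    public static XElement Build(string[] names)
    {
        // Select produces an IEnumerable<XElement>; the XElement constructor
        // adds each element in that sequence as a child of Employees.
        return new XElement("Employees",
            names.Select(n => new XElement("Name", n)));
    }
}
```

Calling QueryConstructionDemo.Build(new[] { "Bob Smith", "Sally Jones" }) yields the same Employees tree that the save-and-load example built by hand.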
The power of XML becomes evident when you traverse an XML tree and retrieve or modify values. Table 21-2 shows the main methods used for retrieving data.
Table 21-2. Methods for Querying XML
Method Name | Class | Return Type | Description |
Nodes | XDocument, XElement | IEnumerable<XNode> | Returns all the children of the current node, regardless of their type |
Elements | XDocument, XElement | IEnumerable<XElement> | Returns all the current node's XElement child nodes or all the child nodes with a specific name |
Element | XDocument, XElement | XElement | Returns the current node's first XElement child node or the first child node with a specific name |
Descendants | XDocument, XElement | IEnumerable<XElement> | Returns all the descendant XElement nodes or all the descendant XElement nodes with a specific name, regardless of their level of nesting below the current node |
DescendantsAndSelf | XElement | IEnumerable<XElement> | Same as Descendants but also includes the current node |
Ancestors | XElement | IEnumerable<XElement> | Returns all the ancestor XElement nodes or all the ancestor XElement nodes above the current node that have a specific name |
AncestorsAndSelf | XElement | IEnumerable<XElement> | Same as Ancestors but also includes the current node |
Parent | XElement | XElement | Returns the parent element of the current node (Parent is a property rather than a method) |
Some of the important things to know about the methods in Table 21-2 are the following:
Nodes: The Nodes method returns an object of type IEnumerable<XNode>, because the nodes returned might be of different types, such as XElement, XComment, and so on. You can use the type-parameterized method OfType<type> to specify what type of nodes to return. For example, the following line of code retrieves only the XComment nodes:
IEnumerable<XComment> comments = xd.Nodes().OfType<XComment>();
Elements: Since retrieving XElements is such a common requirement, there is a shortcut for the expression Nodes().OfType<XElement>(): the Elements method.
The Elements method with no parameters returns all the child XElements.
The Elements method with a single name parameter returns only the child XElements with that name. For example, the following line of code returns all the child XElement nodes with the name PhoneNumber.
IEnumerable<XElement> empPhones = emp.Elements("PhoneNumber");
Element: This method retrieves just the first child XElement of the current node. Like the Elements method, it can be called with either one or no parameters. With no parameters, it gets the first child XElement node. With a single name parameter, it gets the first child XElement node of that name.
Descendants and Ancestors: These methods work like the Elements method and the Parent property, but instead of returning only the immediate child elements or the parent element, they return all the elements below or above the current node, regardless of the difference in nesting level.
The following code illustrates the Element and Elements methods:
This code produces the following output:
Bob Smith
408-555-1000
Sally Jones
415-555-2000
415-555-2001
You can add a child element to an existing element using the Add method. The Add method allows you to add as many nodes as you like in a single method call, regardless of the node types you are adding.
For example, the following code creates a simple XML tree and displays it. It then uses the Add method to add a single node to the root element. Following that, it uses the Add method a second time to add three nodes: two XElements and an XComment. Notice the results in the output:
using System;
using System.Xml.Linq;
class Program
{
    static void Main()
    {
        XDocument xd = new XDocument( // Create XML tree
            new XElement("root",
                new XElement("first")
            )
        );
        Console.WriteLine("Original tree");
        Console.WriteLine(xd); Console.WriteLine(); // Display the tree.
        XElement rt = xd.Element("root"); // Get the root element.
        rt.Add( new XElement("second")); // Add a child element.
        rt.Add( new XElement("third"), // Add three more children.
                new XComment("Important Comment"),
                new XElement("fourth"));
        Console.WriteLine("Modified tree");
        Console.WriteLine(xd); // Display modified tree
    }
}
This code produces the following output:
Original tree
<root>
  <first />
</root>

Modified tree
<root>
  <first />
  <second />
  <third />
  <!--Important Comment-->
  <fourth />
</root>
The Add method places the new child nodes after the existing child nodes, but you can place the nodes before and between the child nodes as well, using the AddFirst, AddBeforeSelf, and AddAfterSelf methods.
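The following sketch uses all three positioning methods on a small tree. InsertPositionDemo is a hypothetical name. AddFirst is called on the parent, while AddBeforeSelf and AddAfterSelf are called on an existing child and insert siblings next to it, so the final child order is first, the comment, second, third.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical demo class for the positioning methods.
public static class InsertPositionDemo
{
    public static XElement Build()
    {
        XElement root = new XElement("root", new XElement("second"));

        root.AddFirst(new XElement("first"));                 // Before all existing children
        XElement second = root.Element("second");
        second.AddAfterSelf(new XElement("third"));           // Sibling right after <second>
        second.AddBeforeSelf(new XComment("before second")); // Sibling right before <second>

        return root;
    }
}
```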
Table 21-3 lists some of the most important methods for manipulating XML. Notice that some of the methods are applied to the parent node and others to the node itself.
Table 21-3. Methods for Manipulating XML
Method Name | Call From | Description |
Add | Parent | Adds new child nodes after the existing child nodes of the current node |
AddFirst | Parent | Adds new child nodes before the existing child nodes of the current node |
AddBeforeSelf | Node | Adds new nodes before the current node at the same level |
AddAfterSelf | Node | Adds new nodes after the current node at the same level |
Remove | Node | Deletes the currently selected node and its contents |
RemoveNodes | Parent | Deletes all the child nodes of the current node; the node itself and its attributes remain |
SetElementValue | Parent | Sets the value of the child element with a given name, adding the child if it doesn't exist or removing it if the value is null |
ReplaceNodes | Node | Replaces the contents (child nodes) of the current node |
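Here is a sketch exercising SetElementValue, ReplaceNodes, and RemoveNodes on a small element. ManipulationDemo and the element names a, b, c, and inner are hypothetical. Edit and Clear are separate methods so the intermediate state can be inspected before RemoveNodes empties the element.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical demo class for the manipulation methods.
public static class ManipulationDemo
{
    public static XElement Build() =>
        new XElement("root",
            new XAttribute("id", "1"),
            new XElement("a"),
            new XElement("b", "old text"));

    public static XElement Edit(XElement root)
    {
        root.SetElementValue("c", "hello");   // No <c> child yet, so one is added.
        root.SetElementValue("a", null);      // A null value removes the <a> child.
        root.Element("b").ReplaceNodes(new XElement("inner")); // Replace <b>'s contents.
        return root;
    }

    public static XElement Clear(XElement root)
    {
        root.RemoveNodes();   // Removes all child nodes; the element and its attributes remain.
        return root;
    }
}
```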
Attributes give additional information about an XElement node. They're placed in the opening tag of the XML element.
When you functionally construct an XML tree, you can add attributes by just including XAttribute constructors within the scope of the XElement constructor. There are two forms of the XAttribute constructor; one takes a name and a value, and the other takes a reference to an already existing XAttribute, whose name and value it copies.
The following code adds two attributes to root. Notice that both arguments passed to the XAttribute constructor are strings; the first specifies the name of the attribute, and the second gives the value.
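A minimal sketch that builds the tree shown in the output below follows. The class AttributeConstructionDemo is a hypothetical wrapper; Console.WriteLine(AttributeConstructionDemo.Build()) displays the tree. CopyAttribute illustrates the second constructor form, which copies an existing attribute.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical wrapper class for constructing attributes.
public static class AttributeConstructionDemo
{
    public static XDocument Build() =>
        new XDocument(
            new XElement("root",
                new XAttribute("color", "red"),   // Attribute name, then value
                new XAttribute("size", "large"),
                new XElement("first"),
                new XElement("second")));

    // Second constructor form: copy the name and value of an existing attribute.
    public static XElement CopyAttribute(XAttribute source) =>
        new XElement("copy", new XAttribute(source));
}
```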
This code produces the following output. Notice that the attributes are placed inside the opening tag of the element.
<root color="red" size="large">
  <first />
  <second />
</root>
To retrieve an attribute from an XElement node, use the Attribute method, supplying the name of the attribute as the parameter. The following code creates an XML tree with a node that has two attributes, color and size. It then retrieves the values of the attributes and displays them.
static void Main( )
{
    XDocument xd = new XDocument( // Create XML tree
        new XElement("root",
            new XAttribute("color", "red"),
            new XAttribute("size", "large"),
            new XElement("first")
        )
    );
    Console.WriteLine(xd); Console.WriteLine(); // Display XML tree
    XElement rt = xd.Element("root"); // Get the element.
    XAttribute color = rt.Attribute("color"); // Get the attribute.
    XAttribute size = rt.Attribute("size"); // Get the attribute.
    Console.WriteLine("color is {0}", color.Value); // Display attr. value
    Console.WriteLine("size is {0}", size.Value); // Display attr. value
}
This code produces the following output:
<root color="red" size="large">
  <first />
</root>
color is red
size is large
To remove an attribute, you can select the attribute and use its Remove method, or use the SetAttributeValue method on its parent and set the attribute value to null. The following code demonstrates both methods:
static void Main( ) {
    XDocument xd = new XDocument(
        new XElement("root",
            new XAttribute("color", "red"),
            new XAttribute("size", "large"),
            new XElement("first")
        )
    );
    XElement rt = xd.Element("root"); // Get the element.
    rt.Attribute("color").Remove(); // Remove the color attribute.
    rt.SetAttributeValue("size", null); // Remove the size attribute.
    Console.WriteLine(xd);
}
This code produces the following output:
<root>
  <first />
</root>
To add an attribute to an XML tree or change the value of an attribute, you can use the SetAttributeValue method, as shown in the following code:
static void Main( ) {
    XDocument xd = new XDocument(
        new XElement("root",
            new XAttribute("color", "red"),
            new XAttribute("size", "large"),
            new XElement("first")));
    XElement rt = xd.Element("root"); // Get the element.
    rt.SetAttributeValue("size", "medium"); // Change attribute value
    rt.SetAttributeValue("width", "narrow"); // Add an attribute.
    Console.WriteLine(xd); Console.WriteLine();
}
This code produces the following output:
<root color="red" size="medium" width="narrow">
  <first />
</root>
Three other constructs you can include in an XML tree are XComment, XDeclaration, and XProcessingInstruction. (Strictly speaking, XDeclaration is not a node, since it doesn't derive from XNode.) They're described in the following sections.
Comments in XML consist of text between the <!-- and --> tokens. The text between the tokens is ignored by XML parsers. You can insert a comment in an XML document using the XComment class, as shown in the following line of code:
new XComment("This is a comment")
XML documents typically start with a line that includes the version of XML used, the type of character encoding used, and whether the document depends on external references. This is information about the XML, so it's actually metadata about the metadata! This is called the XML declaration and is inserted using the XDeclaration class. The following shows an example of an XDeclaration constructor call:
new XDeclaration("1.0", "utf-8", "yes")
An XML processing instruction is used to supply additional data about how an XML document should be used or interpreted. Most commonly, processing instructions are used to associate a style sheet with the XML document.
You can include a processing instruction using the XProcessingInstruction constructor, which takes two string parameters: a target and a data string. If the processing instruction takes multiple data parameters, those parameters must be included in the second parameter string of the XProcessingInstruction constructor, as shown in the following constructor code. Notice that in this example, the second parameter is a verbatim string, and literal double quotes inside the string are represented by pairs of double quote marks.
new XProcessingInstruction( "xml-stylesheet",
    @"href=""stories.css"" type=""text/css""")
The following code uses all three constructs:
static void Main( )
{
    XDocument xd = new XDocument(
        new XDeclaration("1.0", "utf-8", "yes"),
        new XComment("This is a comment"),
        new XProcessingInstruction("xml-stylesheet",
            @"href=""stories.css"" type=""text/css"""),
        new XElement("root",
            new XElement("first"),
            new XElement("second")
        )
    );
}
When this document is saved to a file (for example, with xd.Save), the file contains the following. Displaying xd with WriteLine, however, would not show the XML declaration, even though the declaration is part of the document.
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<!--This is a comment-->
<?xml-stylesheet href="stories.css" type="text/css"?>
<root>
  <first />
  <second />
</root>
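The difference between displaying and saving can be checked directly. In this sketch (DeclarationDemo is a hypothetical name), AsDisplayed returns what WriteLine would show, which omits the declaration, while AsSaved saves to a StringWriter and so includes one. One caveat: when saving to a TextWriter, the encoding written in the declaration comes from the writer, which is UTF-16 for a StringWriter, rather than the "utf-8" given to XDeclaration; saving to a file by name uses the declared encoding.

```csharp
using System;
using System.IO;
using System.Xml.Linq;

// Hypothetical demo class: ToString omits the declaration, Save writes one.
public static class DeclarationDemo
{
    public static XDocument Build() => new XDocument(
        new XDeclaration("1.0", "utf-8", "yes"),
        new XComment("This is a comment"),
        new XProcessingInstruction("xml-stylesheet",
            @"href=""stories.css"" type=""text/css"""),
        new XElement("root",
            new XElement("first"),
            new XElement("second")));

    // What Console.WriteLine(xd) shows: no XML declaration.
    public static string AsDisplayed(XDocument xd) => xd.ToString();

    // Saving writes a declaration; its encoding reflects the StringWriter (UTF-16).
    public static string AsSaved(XDocument xd)
    {
        using (var writer = new StringWriter())
        {
            xd.Save(writer);
            return writer.ToString();
        }
    }
}
```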
You can combine the LINQ XML API with LINQ query expressions to produce simple yet powerful XML tree searches.
The following code creates a simple XML tree, displays it on the screen, and then saves it to a file called SimpleSample.xml. Although there's nothing new in this code, we'll use this XML tree in the following examples.
static void Main( )
{
    XDocument xd = new XDocument(
        new XElement("MyElements",
            new XElement("first",
                new XAttribute("color", "red"),
                new XAttribute("size", "small")),
            new XElement("second",
                new XAttribute("color", "red"),
                new XAttribute("size", "medium")),
            new XElement("third",
                new XAttribute("color", "blue"),
                new XAttribute("size", "large"))));
    Console.WriteLine(xd); // Display XML tree
    xd.Save("SimpleSample.xml"); // Save XML tree
}
This code produces the following output:
<MyElements>
  <first color="red" size="small" />
  <second color="red" size="medium" />
  <third color="blue" size="large" />
</MyElements>
The following example code uses a simple LINQ query to select a subset of the nodes from the XML tree and then displays them in several ways. The attributes of each selected node are retrieved with the Attribute method, and the values of the attributes are retrieved with the Value property.
This code produces the following output:
first
third
Name: first, color: red, size: small
Name: third, color: blue, size: large
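A sketch of a query that produces that output follows. The filter is an assumption: selecting child elements whose names are five characters long is one condition that keeps first and third and drops second. The class SubsetQueryDemo is hypothetical; in the full program, the document would come from XDocument.Load("SimpleSample.xml").

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical demo class; pass XDocument.Load("SimpleSample.xml") to Run.
public static class SubsetQueryDemo
{
    public static List<string> Run(XDocument xd)
    {
        var lines = new List<string>();
        XElement rt = xd.Element("MyElements");
        // Assumed filter: keep the child elements whose names are five characters long.
        var xyz = from e in rt.Elements()
                  where e.Name.LocalName.Length == 5
                  select e;
        foreach (XElement x in xyz)          // First display: just the names
            lines.Add(x.Name.LocalName);
        foreach (XElement x in xyz)          // Second display: name plus attribute values
            lines.Add(string.Format("Name: {0}, color: {1}, size: {2}",
                x.Name, x.Attribute("color").Value, x.Attribute("size").Value));
        foreach (string line in lines)
            Console.WriteLine(line);
        return lines;
    }
}
```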
The following code uses a simple query to retrieve all the top-level elements of the XML tree and creates an object of an anonymous type for each one. The first use of the WriteLine method shows the default formatting of the anonymous type. The second WriteLine statement explicitly formats the members of the anonymous type objects.
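Here is a sketch of such a query. The class AnonymousTypeQueryDemo is hypothetical, and the exact format string is an assumption chosen to match the output below: {0,-6} left-aligns each name in a six-character field. The color member holds the XAttribute itself, which is why its default formatting appears as color="red".

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical demo class; pass XDocument.Load("SimpleSample.xml") to Run.
public static class AnonymousTypeQueryDemo
{
    public static List<string> Run(XDocument xd)
    {
        var lines = new List<string>();
        XElement rt = xd.Element("MyElements");
        var xyz = from e in rt.Elements()
                  select new { e.Name, color = e.Attribute("color") };
        foreach (var x in xyz)               // Default anonymous-type formatting
            lines.Add(x.ToString());
        foreach (var x in xyz)               // Explicit formatting of the members
            lines.Add(string.Format("{0,-6}, color: {1}", x.Name, x.color.Value));
        foreach (string line in lines)
            Console.WriteLine(line);
        return lines;
    }
}
```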
This code produces the following output. The first three lines show the default formatting of the anonymous type. The last three lines show the explicit formatting specified in the format string of the second WriteLine method.
{ Name = first, color = color="red" }
{ Name = second, color = color="red" }
{ Name = third, color = color="blue" }
first , color: red
second, color: red
third , color: blue
From these examples, you can see that you can easily combine the XML API with the LINQ query facilities to produce powerful XML querying capabilities.