One of the common programming tasks C# programmers perform every day is finding and retrieving objects in memory, a database, or an XML file. For example, you may be developing a cell phone customer support system that will allow a customer to see how much each member of the family has spent in phone calls. To do so, you’ll need to retrieve records from various sources (phone company records online, phone books kept locally, etc.), filtered by various criteria (by name or by month), and sorted in various ways (e.g., by date, by family member).
One way you might have implemented this in the past would be to search a database by address, return all the matching records, and present them to the user, perhaps in a listbox. The user would pick the name she was interested in and the data of interest (e.g., the number of ringtones downloaded in the past three months), and you would go back to the database (or perhaps to a different database) and retrieve that information, using the chosen family member’s unique ID as a key.
Although C# provides support for in-memory searches such as finding a name in a collection, traditionally, you were required to turn to another technology (such as ADO.NET) to retrieve data from a database. Although ADO.NET made this fairly easy, a sharp distinction was drawn between retrieving data from in-memory collections and retrieving data from persistent storage.
In-memory searches lacked the powerful and flexible query capabilities of SQL, whereas ADO.NET was not integrated into C#, and SQL itself was not object-oriented (in fact, the point of ADO.NET was to bridge the object-to-relational model). LINQ is an integrated feature of C# 3.0 itself, and thus (at long last) brings an object-oriented bridge over the impedance mismatch between object-oriented languages and relational databases.
The goal of LINQ (Language-INtegrated Query) is to integrate extensive query capabilities into the C# language, to make SQL-like capabilities part of the language, and to remove the distinctions among searching a database, an XML document, or an in-memory data collection.
This chapter will introduce LINQ and show how it fits into C# and into your programming. Subsequent chapters will dive into the details of using LINQ to retrieve and manipulate data in databases and in other data repositories. You’ll learn about ADO.NET in Chapter 16.
In previous versions of C#, if you wanted to find an object in a database you had to leave C# and turn to the Framework (most often ADO.NET). With LINQ, you can stay within C#, and thus within a fully class-based perspective.
Many books start with anonymous methods, then introduce Lambda expressions, and finally introduce LINQ. It is my experience that it is far easier to understand each of these concepts by going in the opposite direction, starting with queries and introducing Lambda expressions for what they are: enabling technologies. Each of these topics will, however, be covered here and in subsequent chapters.
Let’s start simply by searching a collection for objects that match a given criterion, as demonstrated in Example 13-1.
using System;
using System.Collections.Generic;
using System.Linq;
namespace Programming_CSharp
{
    // Simple customer class
    public class Customer
    {
        public string FirstName { get; set; }
        public string LastName { get; set; }
        public string EmailAddress { get; set; }
        // Overrides the Object.ToString( ) to provide a
        // string representation of the object properties.
        public override string ToString( )
        {
            return string.Format("{0} {1} Email: {2}",
                FirstName, LastName, EmailAddress);
        }
    }
    // Main program
    public class Tester
    {
        static void Main( )
        {
            List<Customer> customers = CreateCustomerList( );
            // Find customer by first name
            IEnumerable<Customer> result =
                from customer in customers
                where customer.FirstName == "Donna"
                select customer;
            Console.WriteLine("FirstName == \"Donna\"");
            foreach (Customer customer in result)
                Console.WriteLine(customer.ToString( ));
            customers[3].FirstName = "Donna";
            Console.WriteLine("FirstName == \"Donna\" (take two)");
            foreach (Customer customer in result)
                Console.WriteLine(customer.ToString( ));
        }
        // Create a customer list with sample data
        private static List<Customer> CreateCustomerList( )
        {
            List<Customer> customers = new List<Customer>
            {
                new Customer { FirstName = "Orlando", LastName = "Gee", EmailAddress = "[email protected]" },
                new Customer { FirstName = "Keith", LastName = "Harris", EmailAddress = "[email protected]" },
                new Customer { FirstName = "Donna", LastName = "Carreras", EmailAddress = "[email protected]" },
                new Customer { FirstName = "Janet", LastName = "Gates", EmailAddress = "[email protected]" },
                new Customer { FirstName = "Lucy", LastName = "Harrington", EmailAddress = "[email protected]" }
            };
            return customers;
        }
    }
}
Output:
FirstName == "Donna"
Donna Carreras Email: [email protected]
FirstName == "Donna" (take two)
Donna Carreras Email: [email protected]
Donna Gates Email: [email protected]
Example 13-1 defines a simple Customer class with three properties: FirstName, LastName, and EmailAddress. It overrides the Object.ToString( ) method to provide a string representation of its instances.
The program starts by creating a customer list with some sample data, taking advantage of object initialization as discussed in Chapter 4. Once the list of customers is created, Example 13-1 defines a LINQ query:
IEnumerable<Customer> result = from customer in customers where customer.FirstName == "Donna" select customer;
The result variable is initialized with a query expression. In this example, the query will retrieve all Customer objects whose first name is “Donna” from the customer list. The result of such a query is a collection that implements IEnumerable<T>, where T is the type of the result object. In this example, because the query result is a set of Customer objects, the type of the result variable is IEnumerable<Customer>.
Let’s dissect the query and look at each part in more detail.
The first part of a LINQ query is the from clause:
from customer in customers
The from clause, also called the generator, specifies the data source and a range variable. A LINQ data source can be any collection that implements the System.Collections.Generic.IEnumerable<T> interface. In this example, the data source is customers, an instance of List<Customer>, which implements IEnumerable<T>.
You’ll see how to do the same query against a SQL database in Chapter 15.
A LINQ range variable is like an iteration variable in a foreach loop, iterating over the data source. Because the data source implements IEnumerable<T>, the C# compiler can infer the type of the range variable from the data source. In this example, because the type of the data source is List<Customer>, the range variable customer is of type Customer.
The second part of this LINQ query is the where clause, which is also called a filter. This clause is optional:
where customer.FirstName == "Donna"
The filter is a Boolean expression. It is common to use the range variable in a where clause to filter the objects in the data source. Because customer in this example is of type Customer, you use one of its properties, in this case FirstName, to apply the filter for your query.
Of course, you may use any Boolean expression as your filter. For instance, you can invoke the String.StartsWith( ) method to filter customers by the first letter of their last name:
where customer.LastName.StartsWith("G")
You can also use composite expressions to construct more complex queries. In addition, you can use nested queries where the result of one query (the inner query) is used to filter another query (the outer query).
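As a rough sketch of both ideas, the listing below applies a composite filter and a nested query to a trimmed-down customer list. The FilterSketch class, the two-property Customer stand-in, and the preferredLastNames array are assumptions made for illustration; they are not part of Example 13-1.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Trimmed-down stand-in for the Customer class of Example 13-1.
public class Customer
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

public static class FilterSketch
{
    // Sample data modeled on Example 13-1 (email addresses omitted).
    public static List<Customer> CreateCustomerList()
    {
        return new List<Customer>
        {
            new Customer { FirstName = "Orlando", LastName = "Gee" },
            new Customer { FirstName = "Keith", LastName = "Harris" },
            new Customer { FirstName = "Donna", LastName = "Carreras" },
            new Customer { FirstName = "Janet", LastName = "Gates" },
            new Customer { FirstName = "Lucy", LastName = "Harrington" }
        };
    }

    // Composite filter: both conditions must hold for a customer to pass.
    public static List<Customer> CompositeFilter(List<Customer> customers)
    {
        IEnumerable<Customer> query =
            from customer in customers
            where customer.LastName.StartsWith("G")
                  && customer.FirstName.Length > 5
            select customer;
        return query.ToList();
    }

    // Nested query: the inner query over preferredLastNames (a hypothetical
    // second data source) decides which customers the outer query keeps.
    public static List<Customer> NestedFilter(
        List<Customer> customers, string[] preferredLastNames)
    {
        IEnumerable<Customer> query =
            from customer in customers
            where (from name in preferredLastNames
                   where name != "Smith"
                   select name).Contains(customer.LastName)
            select customer;
        return query.ToList();
    }

    public static void Main()
    {
        List<Customer> customers = CreateCustomerList();
        foreach (Customer c in CompositeFilter(customers))
            Console.WriteLine("Composite: {0} {1}", c.FirstName, c.LastName);
        string[] preferred = { "Gates", "Harris", "Smith" };
        foreach (Customer c in NestedFilter(customers, preferred))
            Console.WriteLine("Nested: {0} {1}", c.FirstName, c.LastName);
    }
}
```

With this data, the composite filter keeps only Orlando Gee (Janet Gates fails the length test), and the nested filter keeps Keith Harris and Janet Gates, in list order.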
The last part of a LINQ query is the select clause (known to database geeks as the “projection”), which defines (or projects) the results:
select customer;
In this example, the query returns the customer objects that satisfy the query condition. You may constrain which fields you project, much as you would with SQL. For instance, you can return only the qualified customers’ email addresses:
select customer.EmailAddress;
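A small sketch of what this projection does to the result type: because the select clause yields a string, the query is an IEnumerable<string> and no longer an IEnumerable<Customer>. The ProjectionSketch class and the example.com addresses are placeholders invented for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in for the Customer class of Example 13-1.
public class Customer
{
    public string FirstName { get; set; }
    public string EmailAddress { get; set; }
}

public static class ProjectionSketch
{
    // Placeholder data; the email addresses are made up for this sketch.
    public static List<Customer> CreateCustomerList()
    {
        return new List<Customer>
        {
            new Customer { FirstName = "Keith", EmailAddress = "keith@example.com" },
            new Customer { FirstName = "Donna", EmailAddress = "donna@example.com" }
        };
    }

    // The element type of the result follows the select clause: string here.
    public static IEnumerable<string> EmailsOf(
        IEnumerable<Customer> customers, string firstName)
    {
        return from customer in customers
               where customer.FirstName == firstName
               select customer.EmailAddress;
    }

    public static void Main()
    {
        foreach (string email in EmailsOf(CreateCustomerList(), "Donna"))
            Console.WriteLine(email);
    }
}
```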
LINQ implements deferred query evaluation, meaning that the declaration and initialization of a query expression do not actually execute the query. Instead, a LINQ query is executed, or evaluated, when you iterate through the query result:
foreach (Customer customer in result) Console.WriteLine(customer.ToString( ));
Because the query returns a collection of Customer objects, the iteration variable is an instance of the Customer class. You can use it as you would any Customer object. This example simply calls each Customer object’s ToString( ) method to output its property values to the console.
Each time you iterate over result with a foreach loop, the query is reevaluated. If the data source has changed between executions, the result will be different. This is demonstrated in the next code section:
customers[3].FirstName = "Donna";
Here, you modify the first name of the customer “Janet Gates” to “Donna” and then iterate through the result again:
Console.WriteLine("FirstName == \"Donna\" (take two)"); foreach (Customer customer in result) Console.WriteLine(customer.ToString( ));
As shown in the sample output, you can see that the result now includes Donna Gates as well.
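One way to watch deferred evaluation happen is to count how often the filter actually runs. The sketch below does that with a hypothetical DeferredSketch class and a helper predicate, StartsWithD, neither of which appears in Example 13-1; it uses a plain list of names to keep the listing short.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredSketch
{
    // Counts how many times the filter has been evaluated.
    public static int PredicateCalls = 0;

    private static bool StartsWithD(string name)
    {
        PredicateCalls++;
        return name.StartsWith("D");
    }

    // Returns { calls after defining the query, first match count,
    //           second match count, total predicate calls }.
    public static int[] Run()
    {
        List<string> names = new List<string> { "Donna", "Keith", "Janet" };
        PredicateCalls = 0;

        // Defining the query does not run the filter.
        IEnumerable<string> query =
            from name in names
            where StartsWithD(name)
            select name;
        int callsAfterDefinition = PredicateCalls;

        // Enumerating the query (here via Count) runs the filter on 3 names.
        int firstCount = query.Count();

        // The data source changes; the next enumeration sees the new item
        // and runs the filter on all 4 names again.
        names.Add("Douglas");
        int secondCount = query.Count();

        return new[] { callsAfterDefinition, firstCount, secondCount, PredicateCalls };
    }

    public static void Main()
    {
        int[] r = Run();
        Console.WriteLine("Calls after definition: {0}", r[0]);
        Console.WriteLine("First count: {0}, second count: {1}", r[1], r[2]);
        Console.WriteLine("Total predicate calls: {0}", r[3]);
    }
}
```

The filter runs zero times when the query is defined, three times on the first enumeration, and four more on the second, for seven calls in total.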
In most situations, deferred query evaluation is desired because you want to obtain the most recent data in the data source each time you run the query. However, if you want to cache the result so that it can be processed later without having to reexecute the query, you can call either the ToList( ) or the ToArray( ) method to save a copy of the result. Example 13-2 demonstrates this technique.
using System;
using System.Collections.Generic;
using System.Linq;
namespace Programming_CSharp
{
    // Simple customer class
    public class Customer
    {
        // Same as in Example 13-1
    }
    // Main program
    public class Tester
    {
        static void Main( )
        {
            List<Customer> customers = CreateCustomerList( );
            // Find customer by first name
            IEnumerable<Customer> result =
                from customer in customers
                where customer.FirstName == "Donna"
                select customer;
            List<Customer> cachedResult = result.ToList<Customer>( );
            Console.WriteLine("FirstName == \"Donna\"");
            foreach (Customer customer in cachedResult)
                Console.WriteLine(customer.ToString( ));
            customers[3].FirstName = "Donna";
            Console.WriteLine("FirstName == \"Donna\" (take two)");
            foreach (Customer customer in cachedResult)
                Console.WriteLine(customer.ToString( ));
        }
        // Create a customer list with sample data
        private static List<Customer> CreateCustomerList( )
        {
            // Same as in Example 13-1
        }
    }
}
Output:
FirstName == "Donna"
Donna Carreras Email: [email protected]
FirstName == "Donna" (take two)
Donna Carreras Email: [email protected]
In this example, you call the ToList<T> method of the result collection to cache the result. Note that calling this method causes the query to be evaluated immediately. If the data source is changed after this, the change will not be reflected in the cached result. You can see from the output that there is no Donna Gates in the result.
One interesting point here is that the ToList<T> and ToArray<T> methods are not actually methods of IEnumerable<T>; that is, if you look in the documentation for IEnumerable<T>, you will not see them in the methods list. They are actually extension methods provided by LINQ. We will look at extension methods in more detail later in this chapter.
If you are familiar with SQL, you will notice a striking similarity between LINQ and SQL, at least in their syntax. The one oddity at this stage is that the select clause in LINQ appears at the end of a query expression, instead of at the beginning, as in SQL. Because the generator, or the from clause, defines the range variable, it must be stated first. Therefore, the projection part is pushed back.
LINQ provides many of the common SQL operations, such as join queries, grouping, aggregation, and sorting of results. In addition, it allows you to use the object-oriented features of C# in query expressions and processing, such as hierarchical query results.
You will often want to search for objects from more than one data source. LINQ provides the join clause that offers the ability to join many data sources, not all of which need be databases. Suppose you have a list of customers containing customer names and email addresses, and a list of customer home addresses. You can use LINQ to combine both lists to produce a list of customers, with access to both their email and home addresses:
from customer in customers join address in addresses on customer.Name equals address.Name ...
The join condition is specified in the on subclause, similar to SQL, except that the objects joined need not be tables or views in a database. The join clause syntax is:
[data source 1] join [data source 2] on [join condition]
Here, we are joining two data sources, customers and addresses, based on the customer name properties in each object. In fact, you can join more than two data sources using a combination of join clauses:
from customer in customers join address in addresses on customer.Name equals address.Name join invoice in invoices on customer.Id equals invoice.CustomerId join invoiceItem in invoiceItems on invoice.Id equals invoiceItem.InvoiceId
A LINQ join clause returns a result only when objects satisfying the join condition exist in all data sources. For instance, if a customer has no invoice, the query will not return anything for that customer, not even her name and email address. This is the equivalent of a SQL inner join clause.
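LINQ has no dedicated outer join keyword, and a join clause on its own always behaves as an inner join. To get the effect of a left outer join (which returns every object from the first data source, whether or not a matching object exists in the second), you combine a join ... into clause with the DefaultIfEmpty( ) method.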
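The following is a minimal sketch of a left outer join built from a group join (join ... into) followed by DefaultIfEmpty( ), which supplies null when a customer has no matching address. The OuterJoinSketch class, the simplified Customer and Address types with a single Name property, and the sample data are assumptions for illustration only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified stand-ins; both carry the full customer name.
public class Customer
{
    public string Name { get; set; }
}

public class Address
{
    public string Name { get; set; }
    public string City { get; set; }
}

public static class OuterJoinSketch
{
    public static List<string> LeftOuterJoin()
    {
        List<Customer> customers = new List<Customer>
        {
            new Customer { Name = "Janet Gates" },
            new Customer { Name = "Lucy Harrington" }   // has no address
        };
        List<Address> addresses = new List<Address>
        {
            new Address { Name = "Janet Gates", City = "Austin" }
        };

        // "into matches" collects the addresses for each customer;
        // DefaultIfEmpty() yields a single null when there are none,
        // so customers without an address still appear in the result.
        IEnumerable<string> query =
            from customer in customers
            join address in addresses
                on customer.Name equals address.Name into matches
            from match in matches.DefaultIfEmpty()
            select customer.Name + ": " +
                   (match == null ? "(no address)" : match.City);

        return query.ToList();
    }

    public static void Main()
    {
        foreach (string line in LeftOuterJoin())
            Console.WriteLine(line);
    }
}
```

Lucy Harrington, who has no address, still appears in the result; a plain join clause would have dropped her.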
You can also specify the sort order in LINQ queries with the orderby clause:
from customer in customers orderby customer.LastName select customer;
This sorts the result by customer last name in ascending order. Example 13-3 shows how you can sort the results of a join query.
using System;
using System.Collections.Generic;
using System.Linq;
namespace Programming_CSharp
{
    // Simple customer class
    public class Customer
    {
        // Same as in Example 13-1
    }
    // Customer address class
    public class Address
    {
        public string Name { get; set; }
        public string Street { get; set; }
        public string City { get; set; }
        // Overrides the Object.ToString( ) to provide a
        // string representation of the object properties.
        public override string ToString( )
        {
            return string.Format("{0}, {1}", Street, City);
        }
    }
    // Main program
    public class Tester
    {
        static void Main( )
        {
            List<Customer> customers = CreateCustomerList( );
            List<Address> addresses = CreateAddressList( );
            // Find all addresses of a customer
            var result =
                from customer in customers
                join address in addresses on
                    string.Format("{0} {1}", customer.FirstName, customer.LastName)
                    equals address.Name
                orderby customer.LastName, address.Street descending
                select new { Customer = customer, Address = address };
            foreach (var ca in result)
            {
                Console.WriteLine(string.Format("{0} Address: {1}",
                    ca.Customer, ca.Address));
            }
        }
        // Create a customer list with sample data
        private static List<Customer> CreateCustomerList( )
        {
            // Same as in Example 13-1
        }
        // Create an address list with sample data
        private static List<Address> CreateAddressList( )
        {
            List<Address> addresses = new List<Address>
            {
                new Address { Name = "Janet Gates", Street = "165 North Main", City = "Austin" },
                new Address { Name = "Keith Harris", Street = "3207 S Grady Way", City = "Renton" },
                new Address { Name = "Janet Gates", Street = "800 Interchange Blvd.", City = "Austin" },
                new Address { Name = "Keith Harris", Street = "7943 Walnut Ave", City = "Renton" },
                new Address { Name = "Orlando Gee", Street = "2251 Elliot Avenue", City = "Seattle" }
            };
            return addresses;
        }
    }
}
Output:
Janet Gates Email: [email protected] Address: 800 Interchange Blvd., Austin
Janet Gates Email: [email protected] Address: 165 North Main, Austin
Orlando Gee Email: [email protected] Address: 2251 Elliot Avenue, Seattle
Keith Harris Email: [email protected] Address: 7943 Walnut Ave, Renton
Keith Harris Email: [email protected] Address: 3207 S Grady Way, Renton
The Customer class is identical to the one used in Example 13-1. The Address class is also very simple, with a Name property holding the customer name in <first name> <last name> form, plus the street and city of the customer’s address.
The CreateCustomerList( ) and CreateAddressList( ) methods are just helper functions to create sample data for this example. This example also uses the new C# object and collection initializers, as explained in Chapter 4.
The query definition, however, looks quite different from the last example:
var result = from customer in customers join address in addresses on string.Format("{0} {1}", customer.FirstName, customer.LastName) equals address.Name orderby customer.LastName, address.Street descending select new { Customer = customer, Address = address };
The first difference is the declaration of the result. Instead of declaring the result as an explicitly typed IEnumerable<Customer> instance, this example declares the result as an implicitly typed variable using the new var keyword. We will leave this for just a moment, and jump to the query definition itself.
The generator now contains a join clause to signify that the query operates on two data sources: customers and addresses. Because the customer name property in the Address class is a concatenation of the customer’s first and last names, you format the names in the Customer objects the same way:
string.Format("{0} {1}", customer.FirstName, customer.LastName)
The dynamically constructed customer full name is then compared with the customer name property in the Address objects using the equals operator:
string.Format("{0} {1}", customer.FirstName, customer.LastName) equals address.Name
The orderby clause indicates the order in which the result should be sorted:
orderby customer.LastName, address.Street descending
In the example, the result will be sorted first by customer last name in ascending order, then by street address in descending order.
The combined customer name, email address, and home address are returned. Here you have a problem: LINQ can return a collection of objects of any type, but it can’t return multiple objects of different types in the same query, unless they are encapsulated in one type. For instance, you can select either an instance of the Customer class or an instance of the Address class, but you cannot select both, like this:
select customer, address
The solution is to define a new type containing both objects. An obvious way is to define a CustomerAddress class:
public class CustomerAddress { public Customer Customer { get; set; } public Address Address { get; set; } }
You can then return customers and their addresses from the query in a collection of CustomerAddress objects:
var result = from customer in customers join address in addresses on string.Format("{0} {1}", customer.FirstName, customer.LastName) equals address.Name orderby customer.LastName, address.Street descending select new CustomerAddress { Customer = customer, Address = address };
Another powerful feature of LINQ, commonly used by SQL programmers but now integrated into the language itself, is grouping, as shown in Example 13-4.
using System;
using System.Collections.Generic;
using System.Linq;
namespace Programming_CSharp
{
// Customer address class
public class Address
{
// Same as in Example 13-3
}
// Main program
public class Tester
{
static void Main( )
{
List<Address> addresses = CreateAddressList( );
// Find addresses grouped by customer name
var result =
from address in addresses
group address by address.Name;
foreach (var group in result)
{
Console.WriteLine("{0}", group.Key);
foreach (var a in group)
Console.WriteLine(" {0}", a);
}
}
// Create an address list with sample data
private static List<Address> CreateAddressList( )
{
// Same as in Example 13-3
}
}
}
Output:
Janet Gates
165 North Main, Austin
800 Interchange Blvd., Austin
Keith Harris
3207 S Grady Way, Renton
7943 Walnut Ave, Renton
Orlando Gee
2251 Elliot Avenue, Seattle
Example 13-4 makes use of the group keyword, a query operator that splits a sequence into groups according to a key value, in this case the customer name (address.Name). The result is a collection of groups, and you’ll need to enumerate each group to get the objects belonging to it.
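Grouping combines naturally with aggregation. As a sketch, the query below continues a group with the into keyword, sorts the groups by key, and counts the members of each. The GroupCountSketch class and the cut-down Address type are assumptions for this illustration; the data mirrors the address list of Example 13-3.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Cut-down stand-in for the Address class of Example 13-3.
public class Address
{
    public string Name { get; set; }
    public string Street { get; set; }
}

public static class GroupCountSketch
{
    public static List<Address> CreateAddressList()
    {
        return new List<Address>
        {
            new Address { Name = "Janet Gates", Street = "165 North Main" },
            new Address { Name = "Keith Harris", Street = "3207 S Grady Way" },
            new Address { Name = "Janet Gates", Street = "800 Interchange Blvd." },
            new Address { Name = "Keith Harris", Street = "7943 Walnut Ave" },
            new Address { Name = "Orlando Gee", Street = "2251 Elliot Avenue" }
        };
    }

    // One "name: count" line per customer, sorted by customer name.
    public static List<string> CountAddressesPerCustomer(List<Address> addresses)
    {
        IEnumerable<string> query =
            from address in addresses
            group address by address.Name into nameGroup   // continue the query per group
            orderby nameGroup.Key
            select string.Format("{0}: {1}", nameGroup.Key, nameGroup.Count());
        return query.ToList();
    }

    public static void Main()
    {
        foreach (string line in CountAddressesPerCustomer(CreateAddressList()))
            Console.WriteLine(line);
    }
}
```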
Often, you do not want to create a new class just for storing the result of a query. C# 3.0 provides anonymous types that allow us to declare both an anonymous class and an instance of that class using object initializers. For instance, we can initialize an anonymous customer address object:
new { Customer = customer, Address = address }
This declares an anonymous class with two properties, Customer and Address, and initializes it with an instance of the Customer class and an instance of the Address class. The C# compiler can infer the property types from the types of the assigned values, so here, the Customer property type is the Customer class, and the Address property type is the Address class. Like a normal, named class, an anonymous class can have properties of any type.
Behind the scenes, the C# compiler generates a unique name for the new type. This name cannot be referenced in application code; therefore, it is considered nameless.
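A short sketch of how the compiler treats these generated types: within one assembly, two anonymous objects whose properties have the same names and types in the same order share a single generated type, and their Equals( ) compares property values. The AnonymousTypeSketch class and the property values are illustrative assumptions.

```csharp
using System;

public static class AnonymousTypeSketch
{
    // Returns { same type?, equal values?, same type after reordering? }.
    public static bool[] Run()
    {
        var first = new { Name = "Janet Gates", City = "Austin" };
        var second = new { Name = "Janet Gates", City = "Austin" };
        // Same properties, declared in a different order.
        var reordered = new { City = "Austin", Name = "Janet Gates" };

        // Property types were inferred as string, so string members are available.
        int cityLength = first.City.Length;
        Console.WriteLine("City has {0} characters", cityLength);

        return new[]
        {
            first.GetType() == second.GetType(),    // same generated type
            first.Equals(second),                   // value-based equality
            first.GetType() == reordered.GetType()  // property order matters
        };
    }

    public static void Main()
    {
        bool[] r = Run();
        Console.WriteLine("Same type: {0}", r[0]);
        Console.WriteLine("Equal: {0}", r[1]);
        Console.WriteLine("Same type when reordered: {0}", r[2]);
    }
}
```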
Now, let’s go back to the declaration of query results where you declare the result as type var:
var result = ...
Because the select clause returns an instance of an anonymous type, you cannot define an explicit type IEnumerable<T>. Fortunately, C# 3.0 provides another feature, implicitly typed local variables, that solves this problem.
You can declare an implicitly typed local variable by specifying its type as var:
var id = 1; var name = "Keith"; var customers = new List<Customer>( ); var person = new {FirstName = "Donna", LastName = "Gates", Phone="123-456-7890" };
The C# compiler infers the type of an implicitly typed local variable from its initialized value. Therefore, you must initialize such a variable when you declare it. In the preceding code snippet, the type of id will be set as an integer, the type of name as a string, and the type of customers as a strongly typed List<T> of Customer objects. The type of the last variable, person, is an anonymous type containing three properties: FirstName, LastName, and Phone. Although this type has no name in our code, the C# compiler secretly assigns it one and keeps track of its instances. In fact, the Visual Studio IDE IntelliSense is also aware of anonymous types, as shown in Figure 13-1.
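The sketch below checks the inferred types at runtime and notes, in comments, two things the compiler rejects. The VarSketch class is hypothetical, and List<string> stands in for List<Customer> to keep the listing self-contained.

```csharp
using System;
using System.Collections.Generic;

public static class VarSketch
{
    // Returns whether each variable received the expected static type.
    public static bool[] Run()
    {
        var id = 1;                          // inferred as int
        var name = "Keith";                  // inferred as string
        var customers = new List<string>();  // inferred as List<string>

        // The following lines would not compile:
        // var unknown;        // error: an implicitly typed variable must be initialized
        // id = "two";         // error: id is an int, fixed at compile time

        return new[]
        {
            id.GetType() == typeof(int),
            name.GetType() == typeof(string),
            customers.GetType() == typeof(List<string>)
        };
    }

    public static void Main()
    {
        foreach (bool ok in Run())
            Console.WriteLine(ok);
    }
}
```

The commented-out lines show that var is resolved entirely at compile time: the variable keeps the type of its initializer for its whole lifetime.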
Back in Example 13-3, result is an instance of the constructed IEnumerable<T> that contains query results, where the type of the argument T is the anonymous type that contains two properties: Customer and Address.
Now that the query is defined, the next statement executes it using the foreach loop:
foreach (var ca in result) { Console.WriteLine(string.Format("{0} Address: {1}", ca.Customer, ca.Address)); }
As the result is an implicitly typed IEnumerable<T> of the anonymous class {Customer, Address}, the iteration variable is also implicitly typed to the same class. For each object in the result list, this example simply prints its properties.
If you already know a little SQL, the query expressions introduced in previous sections are quite intuitive and easy to understand because LINQ is similar to SQL. As C# code is ultimately executed by the .NET CLR, the C# compiler has to translate query expressions to the format understandable by .NET. Because the .NET runtime understands method calls that can be executed, the LINQ query expressions written in C# are translated into a series of method calls. Such methods are called extension methods, and they are defined in a slightly different way than normal methods.
Example 13-5 is identical to Example 13-1 except it uses query operator extension methods instead of query expressions. The parts of the code that have not changed are omitted for brevity.
using System;
using System.Collections.Generic;
using System.Linq;
namespace Programming_CSharp
{
// Simple customer class
public class Customer
{
// Same as in Example 13-1
}
// Main program
public class Tester
{
static void Main( )
{
List<Customer> customers = CreateCustomerList( );
// Find customer by first name
IEnumerable<Customer> result =
customers.Where(customer => customer.FirstName == "Donna");
Console.WriteLine("FirstName == \"Donna\"");
foreach (Customer customer in result)
Console.WriteLine(customer.ToString( ));
}
// Create a customer list with sample data
private static List<Customer> CreateCustomerList( )
{
// Same as in Example 13-1
}
}
}
Output:
(Same as in Example 13-1)
Example 13-5 searches for customers whose first name is “Donna” using the Where( ) extension method rather than a query expression with a where clause. Here’s the original code from Example 13-1:
IEnumerable<Customer> result = from customer in customers where customer.FirstName == "Donna" select customer;
Here is the equivalent using the Where( ) extension method:
IEnumerable<Customer> result = customers.Where(customer => customer.FirstName == "Donna");
You may have noticed that the select clause seems to have vanished in this example. For details on this, please see the sidebar, "Whither the select Clause?" (And try to remember, as Chico Marx reminded us, “There ain’t no such thing as a Sanity Clause.”)
Recall that customers is of type List<Customer>, which might lead you to think that List<T> must have implemented the Where method to support LINQ. It does not. The Where method is called an extension method because it extends an existing type. Before we go into more details in this example, let’s take a closer look at extension methods.
C# 3.0 introduces extension methods that provide the ability for programmers to add methods to existing types. For instance, System.String does not provide a Right( ) function that returns the rightmost n characters of a string. If you use this functionality a lot in your application, you may have considered building and adding it to your library. However, System.String is defined as sealed, so you can’t subclass it. It is not a partial class, so you can’t extend it using that feature.
Of course, you can’t modify the .NET core library directly either. Therefore, you would have to define your own helper method outside of System.String and call it with syntax such as this:
MyHelperClass.GetRight(aString, n)
This is not exactly intuitive. With C# 3.0, however, there is a more elegant solution. You can actually add a method to the System.String class; in other words, you can extend the System.String class without having to modify the class itself. Such a method is called an extension method. Example 13-6 demonstrates how to define and use an extension method.
using System;
namespace Programming_CSharp_Extensions
{
    // Container class for extension methods.
    public static class ExtensionMethods
    {
        // Returns a substring containing the rightmost
        // n characters in a specific string.
        public static string Right(this string s, int n)
        {
            if (n < 0 || n > s.Length)
                return s;
            else
                return s.Substring(s.Length - n);
        }
    }
    public class Tester
    {
        public static void Main( )
        {
            string hello = "Hello";
            Console.WriteLine("hello.Right(-1) = {0}", hello.Right(-1));
            Console.WriteLine("hello.Right(0) = {0}", hello.Right(0));
            Console.WriteLine("hello.Right(3) = {0}", hello.Right(3));
            Console.WriteLine("hello.Right(5) = {0}", hello.Right(5));
            Console.WriteLine("hello.Right(6) = {0}", hello.Right(6));
        }
    }
}
Output:
hello.Right(-1) = Hello
hello.Right(0) =
hello.Right(3) = llo
hello.Right(5) = Hello
hello.Right(6) = Hello
The first parameter of an extension method is always of the target type, which is the string class in this example. Therefore, this example effectively defines a Right( ) function for the string class. You want to be able to call this method on any string, just like calling a normal System.String member method:
aString.Right(n)
In C#, an extension method must be defined as a static method in a static class. Therefore, this example defines a static class, ExtensionMethods, and a static method in this class:
public static string Right(this string s, int n) { if (n < 0 || n > s.Length) return s; else return s.Substring(s.Length - n); }
Compared to a regular method, the only notable difference is that the first parameter of an extension method always consists of the this keyword, followed by the target type, and finally the name of the parameter that receives the target instance:
this string s
The subsequent parameters are just normal parameters of the extension method. The method body has no special treatment compared to regular methods either. Here, this function simply returns the desired substring or, if the length argument n is invalid, the original string.
To use an extension method, you must have it in scope in the client code. If the extension method is defined in another namespace, add a using directive to import the namespace where it is defined. You can’t fully qualify an extension method’s name when you call it with instance syntax, as you can with a normal static method. The use of extension methods is otherwise identical to any built-in methods of the target type. In this example, you simply call it like a regular System.String method:
hello.Right(3)
It is worth mentioning, however, that extension methods are somewhat more restrictive than regular member methods: an extension method can access only those members of the target type that are accessible to outside code (in practice, its public members), never its private or protected members. This prevents the breach of encapsulation of the target types.
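As a sketch of the scoping rule, the listing below puts an extension method in one namespace and calls it from another. The namespace names Sketch.Extensions and Sketch.Client and the class names are invented for this illustration; Right( ) is the method from Example 13-6. It also shows that an extension method remains an ordinary static method underneath, so the static-call form still works and accepts a fully qualified name.

```csharp
using System;

namespace Sketch.Extensions
{
    public static class StringExtensions
    {
        // Same logic as Right( ) in Example 13-6.
        public static string Right(this string s, int n)
        {
            if (n < 0 || n > s.Length)
                return s;
            return s.Substring(s.Length - n);
        }
    }
}

namespace Sketch.Client
{
    // Without this directive, hello.Right(3) below would not compile.
    using Sketch.Extensions;

    public static class ExtensionCallSketch
    {
        public static string[] Run()
        {
            string hello = "Hello";
            return new[]
            {
                // Instance syntax: requires the namespace to be in scope.
                hello.Right(3),
                // Static syntax: the target becomes the first argument.
                StringExtensions.Right(hello, 3),
                // Static syntax with a fully qualified name.
                Sketch.Extensions.StringExtensions.Right(hello, 3)
            };
        }

        public static void Main()
        {
            foreach (string s in Run())
                Console.WriteLine(s);
        }
    }
}
```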
Another restriction is that if an extension method conflicts with a member method in the target class, the member method is always used instead of the extension method, as you can see in Example 13-7.
using System;
namespace Programming_CSharp_Extensions
{
    // Container class for extension methods.
    public static class ExtensionMethods
    {
        // Returns a substring between the specific
        // start and end index of a string.
        public static string Substring(this string s, int startIndex, int endIndex)
        {
            if (startIndex >= 0 && startIndex <= endIndex && endIndex < s.Length)
                return s.Substring(startIndex, endIndex - startIndex);
            else
                return s;
        }
    }
    public class Tester
    {
        public static void Main( )
        {
            string hello = "Hello";
            Console.WriteLine("hello.Substring(2, 3) = {0}", hello.Substring(2, 3));
        }
    }
}
Output:
hello.Substring(2, 3) = llo
The Substring( ) extension method in this example has exactly the same signature as the built-in String.Substring(int startIndex, int length) method. As you can see from the output, it is the built-in Substring( ) method that is executed in this example. Now, we’ll go back to Example 13-5, where we used the LINQ extension method, Where, to search a customer list:
IEnumerable<Customer> result = customers.Where(customer => customer.FirstName == "Donna");
This method takes a predicate as an input argument.
In C# and LINQ, a predicate is a delegate that examines certain conditions and returns a Boolean value indicating whether the conditions are met.
The predicate performs a filtering operation on queries. The argument to this method is quite different from a normal method argument. In fact, it’s a lambda expression, which I introduced in Chapter 12.
In Chapter 12, I mentioned that you can use lambda expressions to define inline delegate definitions. In the following expression:
customer => customer.FirstName == "Donna"
the left operand, customer, is the input parameter. The right operand is the body of the lambda expression, which checks whether the customer’s FirstName property is equal to “Donna.” Therefore, for a given customer object, you’re checking whether its first name is Donna. This lambda expression is then passed into the Where method to perform this comparison operation on each customer in the customer list.
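To make the "inline delegate" point concrete, the sketch below builds the same predicate three ways (a lambda expression, an anonymous method, and a named method) and passes each to Where as a Func<Customer, bool>. The LambdaSketch class, the IsDonna helper, and the one-property Customer stand-in are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// One-property stand-in for the Customer class of Example 13-1.
public class Customer
{
    public string FirstName { get; set; }
}

public static class LambdaSketch
{
    // Named method with the shape Where expects: Customer in, bool out.
    private static bool IsDonna(Customer customer)
    {
        return customer.FirstName == "Donna";
    }

    // Returns the number of matches found with each form of the predicate.
    public static int[] Run()
    {
        List<Customer> customers = new List<Customer>
        {
            new Customer { FirstName = "Orlando" },
            new Customer { FirstName = "Donna" },
            new Customer { FirstName = "Janet" }
        };

        // 1. Lambda expression.
        Func<Customer, bool> asLambda =
            customer => customer.FirstName == "Donna";

        // 2. Anonymous method (the C# 2.0 way of writing the same thing).
        Func<Customer, bool> asAnonymousMethod =
            delegate(Customer customer) { return customer.FirstName == "Donna"; };

        // 3. Delegate created from a named method.
        Func<Customer, bool> asNamedMethod = IsDonna;

        return new[]
        {
            customers.Where(asLambda).Count(),
            customers.Where(asAnonymousMethod).Count(),
            customers.Where(asNamedMethod).Count()
        };
    }

    public static void Main()
    {
        foreach (int count in Run())
            Console.WriteLine(count);
    }
}
```

All three forms produce the same delegate type and the same result; the lambda is simply the most compact way to write it.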
Queries defined using extension methods are called method-based queries. Although the query and method syntaxes are different, they are semantically identical, and the compiler translates them into the same IL code. You can use either of them based on your preference.
Let’s start with a very simple query, as shown in Example 13-8.
using System;
using System.Linq;

namespace SimpleLamda
{
    class Program
    {
        static void Main(string[] args)
        {
            string[] names = { "Jesse", "Donald", "Douglas" };
            var dNames = names.Where(n => n.StartsWith("D"));
            foreach (string foundName in dNames)
            {
                Console.WriteLine("Found: " + foundName);
            }
        }
    }
}

Output:
Found: Donald
Found: Douglas
The statement names.Where is shorthand for:
System.Linq.Enumerable.Where(names, n => n.StartsWith("D"));
Where is an extension method, so you can write the object (names) in front of the method call instead of passing it as the first argument; and by including the namespace System.Linq, you can call Where directly on the names object rather than through Enumerable.
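The two call forms can be placed side by side. This is only a sketch (the class and method names are made up for illustration), but both forms resolve to the same static method, Enumerable.Where:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ExtensionCallDemo
{
    // Extension-method form: names appears before the dot.
    public static List<string> ViaExtensionSyntax(string[] names)
    {
        return names.Where(n => n.StartsWith("D")).ToList();
    }

    // Plain static form: names is passed explicitly as the first argument.
    public static List<string> ViaStaticCall(string[] names)
    {
        return Enumerable.Where(names, n => n.StartsWith("D")).ToList();
    }

    public static void Main()
    {
        string[] names = { "Jesse", "Donald", "Douglas" };
        Console.WriteLine(string.Join(", ", ViaExtensionSyntax(names))); // prints Donald, Douglas
        Console.WriteLine(string.Join(", ", ViaStaticCall(names)));      // prints Donald, Douglas
    }
}
```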
Further, the type of dNames is IEnumerable<string>; we are using the new ability of the compiler to infer this by using the keyword var. This does not undermine type-safety, however, because var is compiled into the type IEnumerable<string> through that inference.
Thus, you can read this line:
var dNames = names.Where(n => n.StartsWith("D"));
as “fill the IEnumerable collection dNames from the collection names with each member where the member starts with the letter D.”
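If you want to convince yourself that var really is resolved at compile time, one possible check is to pass the variable to a generic method and ask which type argument the compiler inferred. The helper StaticTypeOf below is a hypothetical name introduced for this sketch; it is not a framework method.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class VarInferenceDemo
{
    // Returns the compile-time type the compiler inferred for the argument,
    // not the runtime type of the object.
    public static Type StaticTypeOf<T>(T value)
    {
        return typeof(T);
    }

    public static Type InferredTypeOfQuery()
    {
        string[] names = { "Jesse", "Donald", "Douglas" };
        var dNames = names.Where(n => n.StartsWith("D"));
        return StaticTypeOf(dNames);
    }

    public static void Main()
    {
        // prints True
        Console.WriteLine(InferredTypeOfQuery() == typeof(IEnumerable<string>));
    }
}
```

The runtime object behind dNames is an internal iterator type, but the declared type of the variable is exactly what Where is declared to return.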
As the method syntax is closer to how the C# compiler processes queries, it is worth spending a little more time to look at how a more complex query is expressed to gain a better understanding of LINQ. Let’s translate Example 13-3 into a method-based query to see how it would look (see Example 13-9).
using System;
using System.Collections.Generic;
using System.Linq;

namespace Programming_CSharp
{
    // Simple customer class
    public class Customer
    {
        // Same as in Example 13-1
    }

    // Customer address class
    public class Address
    {
        // Same as in Example 13-3
    }

    // Main program
    public class Tester
    {
        static void Main( )
        {
            List<Customer> customers = CreateCustomerList( );
            List<Address> addresses = CreateAddressList( );

            var result = customers.Join(addresses,
                    customer => string.Format("{0} {1}", customer.FirstName, customer.LastName),
                    address => address.Name,
                    (customer, address) => new { Customer = customer, Address = address })
                .OrderBy(ca => ca.Customer.LastName)
                .ThenByDescending(ca => ca.Address.Street);

            foreach (var ca in result)
            {
                Console.WriteLine(string.Format("{0} Address: {1}", ca.Customer, ca.Address));
            }
        }

        // Create a customer list with sample data
        private static List<Customer> CreateCustomerList( )
        {
            // Same as in Example 13-3
        }

        // Create an address list with sample data
        private static List<Address> CreateAddressList( )
        {
            // Same as in Example 13-3
        }
    }
}

Output:
Janet Gates Email: [email protected] Address: 800 Interchange Blvd., Austin
Janet Gates Email: [email protected] Address: 165 North Main, Austin
Orlando Gee Email: [email protected] Address: 2251 Elliot Avenue, Seattle
Keith Harris Email: [email protected] Address: 7943 Walnut Ave, Renton
Keith Harris Email: [email protected] Address: 3207 S Grady Way, Renton
In Example 13-3, the query is written in query syntax:
var result =
    from customer in customers
    join address in addresses on
        string.Format("{0} {1}", customer.FirstName, customer.LastName)
        equals address.Name
    orderby customer.LastName, address.Street descending
    select new { Customer = customer, Address = address.Street };
It is translated into the method syntax:
var result = customers.Join(addresses,
        customer => string.Format("{0} {1}", customer.FirstName, customer.LastName),
        address => address.Name,
        (customer, address) => new { Customer = customer, Address = address })
    .OrderBy(ca => ca.Customer.LastName)
    .ThenByDescending(ca => ca.Address.Street);
The lambda expression takes some getting used to. Start with the OrderBy clause; you read that as “Order in this way: for each customer-address pair ca, get the Customer’s LastName.” You read the entire statement as, “Start with customers and join to addresses as follows: for each customer, concatenate the FirstName and LastName; for each address, fetch its Name; and join the two where those values match. Then, for each resulting pair, create an anonymous object whose Customer property holds the customer and whose Address property holds the address. Now order these first by each customer’s LastName and then by each address’s Street, in descending order.”
The main data source, the customers collection, is still the main target object. The extension method, Join( ), is applied to it to perform the join operation. Its first argument is the second data source, addresses. The next two arguments are the key selectors: lambda expressions that produce the join key for each element of the first and second data sources, respectively. The final argument is the result selector, which builds the result for each matching pair and corresponds to the select clause in the query.
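The four arguments to Join are easier to see in a stripped-down example. The sketch below uses hypothetical, minimal Customer and Address classes and invented sample data; only the shape of the Join call mirrors the chapter's example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, minimal stand-ins for the chapter's classes.
public class Customer
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

public class Address
{
    public string Name { get; set; }
    public string Street { get; set; }
}

public static class JoinDemo
{
    public static List<Customer> CreateCustomers()
    {
        return new List<Customer>
        {
            new Customer { FirstName = "Janet", LastName = "Gates" },
            new Customer { FirstName = "Orlando", LastName = "Gee" },
            new Customer { FirstName = "Lucy", LastName = "Harrington" } // no address on file
        };
    }

    public static List<Address> CreateAddresses()
    {
        return new List<Address>
        {
            new Address { Name = "Janet Gates", Street = "165 North Main" },
            new Address { Name = "Orlando Gee", Street = "2251 Elliot Avenue" },
            new Address { Name = "Janet Gates", Street = "800 Interchange Blvd." }
        };
    }

    public static List<string> JoinNamesToStreets(List<Customer> customers, List<Address> addresses)
    {
        return customers.Join(
                addresses,                                // second data source
                c => c.FirstName + " " + c.LastName,      // key taken from each customer
                a => a.Name,                              // key taken from each address
                (c, a) => c.LastName + ": " + a.Street)   // result built for each matching pair
            .ToList();
    }

    public static void Main()
    {
        foreach (string line in JoinNamesToStreets(CreateCustomers(), CreateAddresses()))
        {
            Console.WriteLine(line);
        }
        // prints:
        // Gates: 165 North Main
        // Gates: 800 Interchange Blvd.
        // Gee: 2251 Elliot Avenue
    }
}
```

Because Join performs an inner join, the customer with no matching address does not appear in the result.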
The orderby clause in the query expression indicates that you want to order by the customers’ last names in ascending order, and then by their street addresses in descending order. In the method syntax, you must specify this preference by using the OrderBy and the ThenByDescending methods.
Alternatively, you can just call OrderBy methods in sequence, but the methods must be in reverse order. That is, you must invoke the method to order the last field in the query orderby list first, and order the first field in the query orderby list last. This works because the sort that OrderBy performs over in-memory collections is stable: elements with equal keys keep their existing relative order. In this example, you will need to invoke the order by street method first, followed by the order by name method:
var result = customers.Join(addresses,
        customer => string.Format("{0} {1}", customer.FirstName, customer.LastName),
        address => address.Name,
        (customer, address) => new { Customer = customer, Address = address })
    .OrderByDescending(ca => ca.Address.Street)
    .OrderBy(ca => ca.Customer.LastName);
The results for both versions are identical. Therefore, you can choose either based on your own preference.
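The following sketch compares the two orderings on a small in-memory list. The Entry class and its data are assumptions invented for the comparison; the point is only that OrderBy followed by ThenByDescending and the reversed pair of OrderBy calls yield the same sequence for in-memory collections, where the sort is stable.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical flattened customer/address record used only in this sketch.
public class Entry
{
    public string LastName { get; set; }
    public string Street { get; set; }

    public override string ToString()
    {
        return LastName + " / " + Street;
    }
}

public static class SortOrderDemo
{
    public static List<Entry> CreateSampleEntries()
    {
        return new List<Entry>
        {
            new Entry { LastName = "Harris", Street = "3207 S Grady Way" },
            new Entry { LastName = "Gates", Street = "165 North Main" },
            new Entry { LastName = "Gee", Street = "2251 Elliot Avenue" },
            new Entry { LastName = "Harris", Street = "7943 Walnut Ave" },
            new Entry { LastName = "Gates", Street = "800 Interchange Blvd." }
        };
    }

    // Primary key first, then the secondary key.
    public static List<string> WithThenBy(List<Entry> entries)
    {
        return entries.OrderBy(e => e.LastName)
                      .ThenByDescending(e => e.Street)
                      .Select(e => e.ToString())
                      .ToList();
    }

    // Secondary key first, then the primary key; relies on the stable sort.
    public static List<string> WithReversedOrderBy(List<Entry> entries)
    {
        return entries.OrderByDescending(e => e.Street)
                      .OrderBy(e => e.LastName)
                      .Select(e => e.ToString())
                      .ToList();
    }

    public static void Main()
    {
        List<Entry> entries = CreateSampleEntries();
        bool same = WithThenBy(entries).SequenceEqual(WithReversedOrderBy(entries));
        Console.WriteLine("Same order: " + same); // prints Same order: True
    }
}
```

The ThenBy form states the intent more directly and sorts once, so it is usually the easier version to read.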
Ian Griffiths, one of the smarter C# programmers on Earth, who blogs at IanG on Tap (http://www.interact-sw.co.uk/iangblog/), makes the following point, which I will illustrate in Chapter 15, but which I did not want to leave hanging here: “You can use exactly these same two syntaxes on a variety of different sources, but the behavior isn’t always the same. The meaning of a lambda expression varies according to the signature of the function it is passed to. In these examples, it’s a succinct syntax for a delegate. But if you were to use exactly the same form of queries against a SQL data source, the lambda expression is turned into something else.”
All the LINQ extension methods (Join, Select, Where, and so on) have multiple implementations, each with different target types. Here, we’re looking at the ones that operate over IEnumerable. The ones that operate over IQueryable are subtly different. Rather than taking delegates for the join, projection, where, and other clauses, they take expressions. Those are wonderful and magical things that enable the C# source code to be transformed into an equivalent SQL query.
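A rough way to see the difference without a database is to assign the same lambda to a delegate type and to an expression type. In this sketch the class and method names are hypothetical, and AsQueryable( ) is used only to obtain an in-memory IQueryable; a real LINQ to SQL source would translate the expression into SQL rather than run it locally.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public static class ExpressionDemo
{
    // The lambda becomes compiled code: binds to Enumerable.Where.
    public static List<string> FilterWithDelegate(string[] names)
    {
        Func<string, bool> predicate = n => n.StartsWith("D");
        return names.Where(predicate).ToList();
    }

    // The same lambda becomes a data structure: binds to Queryable.Where.
    public static List<string> FilterWithExpression(string[] names)
    {
        Expression<Func<string, bool>> predicate = n => n.StartsWith("D");
        return names.AsQueryable().Where(predicate).ToList();
    }

    // An expression can be inspected at runtime; here, the kind of its body.
    public static ExpressionType DescribeLambdaBody()
    {
        Expression<Func<string, bool>> predicate = n => n.StartsWith("D");
        return predicate.Body.NodeType;
    }

    public static void Main()
    {
        string[] names = { "Jesse", "Donald", "Douglas" };
        Console.WriteLine(string.Join(", ", FilterWithDelegate(names)));   // prints Donald, Douglas
        Console.WriteLine(string.Join(", ", FilterWithExpression(names))); // prints Donald, Douglas
        Console.WriteLine(DescribeLambdaBody());                           // prints Call
    }
}
```

The source text of the lambda is identical in both methods; only the declared type of the variable it is assigned to decides whether the compiler emits a delegate or an expression tree.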