C-style languages (including C#) are imperative in nature, meaning that the emphasis is placed on the state of the system, and changes are made to that state over time. Data-acquisition languages such as SQL are functional in nature, meaning that the emphasis is placed on the operation and there is little or no mutable data used during the process. LINQ bridges the gap between the imperative programming style and the functional programming style. LINQ is a huge topic that deserves entire books devoted to it.[73] There are several implementations of LINQ readily available: LINQ to Objects, LINQ to SQL, LINQ to DataSet, LINQ to Entities, and LINQ to XML. I will focus on LINQ to Objects because it lets me get the LINQ message across without having to incorporate extra layers and technologies.
Development for LINQ started some time ago at Microsoft and was born out of the efforts of Anders Hejlsberg and Peter Golde. The idea was to create a more natural and language-integrated way to access data from within a language such as C#. However, at the same time, it was undesirable to implement it in such a way that it would destabilize the implementation of the C# compiler and become too cumbersome for the language. As it turns out, it made sense to implement some building blocks in the language in order to provide the functionality and expressiveness of LINQ. Thus we have features like lambda expressions, anonymous types, extension methods, and implicitly typed variables. All are excellent features in themselves, but arguably were precipitated by LINQ.
LINQ does a very good job of allowing the programmer to focus on the business logic while spending less time coding up the mundane plumbing that is normally associated with data access code. If you have experience building data-aware applications, think about how many times you have found yourself coding up the same type of boilerplate code over and over again. LINQ removes some of that burden.
Throughout this book, I have stressed how just about all the new features introduced by C# 3.0 foster a functional programming model. There's a good reason for that, in the sense that data query is typically a functional process. For example, a SQL statement tells the server exactly what you want and what to do. It does not really describe objects and structures and how they are related both statically and dynamically, which is typically what you do when you design a new application in an object-oriented language. Therefore, functional programming is the key here and any techniques that you might be familiar with from other functional programming languages such as Lisp, Scheme, or F# are applicable.
At first glance, LINQ query expressions look a lot like SQL expressions. But make no mistake: LINQ is not SQL. For starters, LINQ is strongly typed. After all, C# is a strongly typed language, and therefore, so is LINQ. The language adds several new keywords for building query expressions. However, their implementation from the compiler standpoint is pretty simple. LINQ query expressions typically get translated into a chain of extension method calls on a sequence or collection. That set of extension methods is clearly defined, and they are called standard query operators.
This LINQ model is quite extensible. If the compiler merely translates query expressions into a series of extension method calls, it follows that you can provide your own implementations of those extension methods. In fact, that is the case. For example, the class System.Linq.Enumerable provides implementations of those methods for LINQ to Objects, whereas System.Linq.Queryable provides implementations of those methods for querying types that implement IQueryable<T> and are commonly used with LINQ to SQL.
Let's jump right in and have a look at what queries look like. Consider the following example, in which I create a collection of Employee objects and then perform a simple query:
using System;
using System.Linq;
using System.Collections.Generic;

public class Employee
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public Decimal Salary { get; set; }
    public DateTime StartDate { get; set; }
}

public class SimpleQuery
{
    static void Main() {
        // Create our database of employees.
        var employees = new List<Employee> {
            new Employee {
                FirstName = "Joe", LastName = "Bob",
                Salary = 94000,
                StartDate = DateTime.Parse("1/4/1992") },
            new Employee {
                FirstName = "Jane", LastName = "Doe",
                Salary = 123000,
                StartDate = DateTime.Parse("4/12/1998") },
            new Employee {
                FirstName = "Milton", LastName = "Waddams",
                Salary = 1000000,
                StartDate = DateTime.Parse("12/3/1969") }
        };

        var query = from employee in employees
                    where employee.Salary > 100000
                    orderby employee.LastName, employee.FirstName
                    select new { LastName = employee.LastName,
                                 FirstName = employee.FirstName };

        Console.WriteLine( "Highly paid employees:" );
        foreach( var item in query ) {
            Console.WriteLine( "{0}, {1}",
                               item.LastName,
                               item.FirstName );
        }
    }
}
First of all, you will need to import the System.Linq namespace, as I show in the following section titled "Standard Query Operators." In this example, I marked the query expression in bold to make it stand out. It's quite shocking if it's the first time you have seen a LINQ expression! After all, C# is a language that syntactically evolved from C++ and Java, and the LINQ syntax looks nothing like those languages.
For those of you familiar with SQL, the first thing you probably noticed is that the query is backward from what you are used to. In SQL, the select clause is normally the beginning of the expression. There are several reasons why the reversal makes sense in C#. One reason is so that IntelliSense will work. In the example, if the select clause appeared first, IntelliSense would have a hard time knowing which properties employee provides because it would not even know the type of employee yet.
Prior to the query expression, I created a simple list of Employee instances just to have some data to work with.
Each query expression starts off with a from clause, which declares what's called a range variable. The from clause in our example is very similar to a foreach statement in that it iterates over the employees collection and stores each item in the collection in the variable employee during each iteration. After the from clause, the query consists of a series of clauses in which we can use various query operators to filter the data represented by the range variable. In my example, I applied a where clause and an orderby clause, as you can see. Finally, the expression closes with select, which is a projection operator. When you perform a projection in the query expression, you are typically creating another collection of information, or a single piece of information, that is a transformed version of the collection iterated by the range variable. In the previous example, I wanted just the first and last names of the employees in my results.
Another thing to note is my use of anonymous types in the select clause. I wanted the query to create a transformation of the original data into a collection of structures, in which each instance contains a FirstName property, a LastName property, and nothing more. Sure, I could have defined such a structure prior to my query and made my select clause instantiate instances of that type, but doing so defeats some of the convenience and expressiveness of the LINQ query.
And most importantly, as I'll detail a little later in the section "The Virtues of Being Lazy," the query expression does not execute at the point the query variable is assigned. Instead, the query variable in this example implements IEnumerable<T>, and the subsequent use of foreach on the query variable produces the end result of the example.
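To make the deferred behavior concrete, here is a minimal sketch that is not part of the original example. The class name DeferredDemo and its sample data are hypothetical. It adds an item to the source list after the query variable has been assigned, and the item still shows up in the results because the query runs only when it is enumerated.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class; not part of the original listing.
public static class DeferredDemo
{
    // Returns what the query yields when it is enumerated after a
    // later change to the source list.
    public static List<int> Run()
    {
        var numbers = new List<int> { 1, 2, 3 };

        // Building the query does not touch the data yet.
        var query = from n in numbers
                    where n > 1
                    select n * 10;

        // This item is added after the query variable is assigned.
        numbers.Add( 4 );

        // Enumeration happens here, so the query sees the 4 as well.
        return query.ToList();
    }

    static void Main()
    {
        foreach( var item in Run() ) {
            Console.WriteLine( item );   // prints 20, 30, 40
        }
    }
}
```

If the query had been evaluated at the point of assignment, the final value would be missing from the output.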
The end result of building the query expression culminates in what's called a query variable, which is query in this example. Notice that I reference it using an implicitly typed variable. After all, can you imagine what the type of query is? If you are so inclined, you can send the result of query.GetType() to the console and you'll see a compiler-generated type name similar to the one shown here (the exact name can vary between framework versions):
System.Linq.Enumerable+<SelectIterator>d__b`2[Employee, <>f__AnonymousType0`2[System.String,System.String]]
Before I break down the elements of a LINQ expression in more detail, I want to show you an alternate way of getting the work done. In fact, it's more or less what the compiler is doing under the covers.
The LINQ syntax is very foreign looking in a predominantly imperative language like C#. It's easy to jump to the conclusion that the C# language underwent massive modifications in order to implement LINQ. Actually, the compiler simply transforms the LINQ expression into a series of extension method calls that accept lambda expressions.
If you look at the System.Linq namespace, you'll see that there are two interesting static classes full of extension methods: Enumerable and Queryable. Enumerable defines a collection of generic extension methods usable on IEnumerable types, whereas Queryable defines the same collection of generic extension methods usable on IQueryable types. If you look at the names of those extension methods, you'll see they have names just like the clauses in query expressions. That's no accident because the extension methods implement the standard query operators I mentioned in the previous section. In fact, the query expression in the previous example can be replaced with the following code:
var query = employees
    .Where( emp => emp.Salary > 100000 )
    .OrderBy( emp => emp.LastName )
    .ThenBy( emp => emp.FirstName )
    .Select( emp => new {LastName = emp.LastName,
                         FirstName = emp.FirstName} );
Notice that it is simply a chain of extension method calls on IEnumerable, which is implemented by employees. In fact, you could go a step further and flip the statement inside out by removing the extension method syntax and simply call them as static methods, as shown here:
var query =
    Enumerable.Select(
        Enumerable.ThenBy(
            Enumerable.OrderBy(
                Enumerable.Where(
                    employees, emp => emp.Salary > 100000),
                emp => emp.LastName ),
            emp => emp.FirstName ),
        emp => new {LastName = emp.LastName,
                    FirstName = emp.FirstName} );
But why would you want to do such a thing? I merely show it here for illustration purposes so you know what is actually going on under the covers. Those who are really attached to C# 2.0 anonymous methods could even go one step further and replace the lambda expressions with anonymous methods. Needless to say, the Enumerable and Queryable extension methods are very useful even outside the context of LINQ. And as a matter of fact, some of the functionality provided by the extension methods does not have matching query keywords and therefore can only be used by invoking the extension methods directly.
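As a small illustration of operators that have no query keyword in C#, the following sketch applies Distinct and Take to the result of a query expression. The class and method names are hypothetical and exist only for this example.

```csharp
using System;
using System.Linq;

// Hypothetical demo class; not part of the original listing.
public static class NoKeywordOperators
{
    // Distinct and Take have no C# query keyword, so they are applied
    // by calling the extension methods on the query result.
    public static int[] FirstDistinctEvens( int[] source, int count )
    {
        var evens = from n in source
                    where n % 2 == 0
                    select n;

        return evens.Distinct().Take( count ).ToArray();
    }

    static void Main()
    {
        int[] numbers = { 4, 2, 4, 7, 8, 2, 6 };
        foreach( var n in FirstDistinctEvens(numbers, 3) ) {
            Console.WriteLine( n );   // prints 4, 2, 8
        }
    }
}
```

Mixing the two styles this way keeps the filtering readable while still reaching the operators that only exist as methods.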
LINQ is built upon the use of standard query operators, which are methods that operate on sequences such as collections that implement IEnumerable or IQueryable. As discussed previously, when the C# compiler encounters a query expression, it typically converts the expression into a series or chain of calls to those extension methods that implement the behavior.
There are two benefits to this approach. One is that you can generally perform the same actions as a LINQ query expression by calling the extension methods directly. The resulting code is not as easy to read as code with query expressions. However, there are times when you need only one or two of the extension methods and a complete query expression would be overkill, and other times when the operator you need is not exposed as a query keyword at all.
The greatest benefit of this approach is that LINQ is extensible. That is, you can define your own set of extension methods, and the compiler will generate calls to them while compiling a LINQ query expression. For example, suppose that you did not import the System.Linq namespace and instead wanted to provide your own implementation of Where and Select. You could do that as shown here:
using System;
using System.Collections.Generic;

public static class MySqoSet
{
    public static IEnumerable<T> Where<T> (
                            this IEnumerable<T> source,
                            System.Func<T,bool> predicate ) {
        Console.WriteLine( "My Where implementation called." );
        return System.Linq.Enumerable.Where( source,
                                             predicate );
    }

    public static IEnumerable<R> Select<T,R> (
                            this IEnumerable<T> source,
                            System.Func<T,R> selector ) {
        Console.WriteLine( "My Select implementation called." );
        return System.Linq.Enumerable.Select( source,
                                              selector );
    }
}

public class CustomSqo
{
    static void Main() {
        int[] numbers = { 1, 2, 3, 4 };

        var query = from x in numbers
                    where x % 2 == 0
                    select x * 2;

        foreach( var item in query ) {
            Console.WriteLine( item );
        }
    }
}
Notice that I did not have to import the System.Linq namespace. Aside from the added convenience, this helps prove my point because not importing the System.Linq namespace prevents the compiler from automatically finding the extension methods in System.Linq.Enumerable. In the MySqoSet static class, I provide my own implementations of the standard query operators Where and Select that simply log a message and then forward to the ones in Enumerable. If you run this example, the output will look as follows:
My Where implementation called.
My Select implementation called.
4
8
You could take this exercise a little further and imagine that you want to use LINQ against a collection that does not support IEnumerable. Although you would normally make your collection support IEnumerable, for the sake of argument, let's say it supports the custom interface IMyEnumerable instead. In that case, you can supply your own set of standard query operators that operates on IMyEnumerable rather than IEnumerable. There is one drawback, though. If your type does not derive from IEnumerable, you cannot use a LINQ query expression because the from clause requires a data source that implements IEnumerable or IEnumerable<T>. However, you can call the standard query operators on your IMyEnumerable type to achieve the same effect. I will show an example of this in the later section titled "Techniques from Functional Programming," in which I build upon an example from Chapter 14.
C# 2008 introduces a small set of new keywords for creating LINQ query expressions, some of which we have already seen in previous sections. They are from, join, where, group, into, let, ascending, descending, on, equals, by, in, orderby, and select. In the following sections, I cover the main points regarding their use.
Each query begins with a from clause. The from clause is a generator that also defines the range variable, which is a local variable of sorts used to represent each item of the input collection as the query expression is applied to it. The from clause is just like a foreach construct in the imperative programming style, and the range variable is identical in purpose to the iteration variable in the foreach statement.
A query expression might contain more than one from clause. In that case, you have more than one range variable, and it's analogous to having nested foreach loops. The next example uses multiple from clauses to generate the multiplication table you might remember from grade school, albeit not in tabular format:
using System;
using System.Linq;

public class MultTable
{
    static void Main() {
        var query = from x in Enumerable.Range(0,10)
                    from y in Enumerable.Range(0,10)
                    select new {
                        X = x,
                        Y = y,
                        Product = x * y
                    };

        foreach( var item in query ) {
            Console.WriteLine( "{0} * {1} = {2}",
                               item.X,
                               item.Y,
                               item.Product );
        }
    }
}
Remember that LINQ expressions are compiled into strongly typed code. So in this example, what is the type of x and what is the type of y? The compiler infers the types of those two range variables based upon the type argument of the IEnumerable<T> interface returned by Range. Because Range returns a type of IEnumerable<int>, the type of x and y is int. Now, you might be wondering what happens if you want to apply a query expression to a collection that only supports the nongeneric IEnumerable interface. In those cases, you must explicitly specify the type of the range variable, as shown here:
using System;
using System.Linq;
using System.Collections;

public class NonGenericLinq
{
    static void Main() {
        ArrayList numbers = new ArrayList();
        numbers.Add( 1 );
        numbers.Add( 2 );

        var query = from int n in numbers
                    select n * 2;

        foreach( var item in query ) {
            Console.WriteLine( item );
        }
    }
}
You can see where I am explicitly typing the range variable n to type int. At run time, a cast is performed, which could fail with an InvalidCastException. Therefore, it's best to strive to use the generic, strongly typed IEnumerable<T> rather than IEnumerable so these sorts of errors are caught at compile time rather than run time.
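The run-time cast can be seen directly by writing the query with extension methods. The sketch below is an illustration with hypothetical names: it uses Enumerable.Cast<T>, which is what an explicitly typed range variable translates to, and contrasts it with Enumerable.OfType<T>, which skips items of the wrong type rather than throwing.

```csharp
using System;
using System.Collections;
using System.Linq;

// Hypothetical demo class; not part of the original listing.
public static class CastDemo
{
    public static int[] DoubleWithCast( IEnumerable source )
    {
        // Same effect as "from int n in source select n * 2".
        return source.Cast<int>().Select( n => n * 2 ).ToArray();
    }

    public static int[] DoubleWithOfType( IEnumerable source )
    {
        // OfType<int> filters out items that are not ints.
        return source.OfType<int>().Select( n => n * 2 ).ToArray();
    }

    static void Main()
    {
        ArrayList mixed = new ArrayList();
        mixed.Add( 1 );
        mixed.Add( "two" );
        mixed.Add( 3 );

        foreach( var n in DoubleWithOfType(mixed) ) {
            Console.WriteLine( n );   // prints 2, 6
        }

        try {
            DoubleWithCast( mixed );
        }
        catch( InvalidCastException ) {
            Console.WriteLine( "Cast<int> failed on a non-int item." );
        }
    }
}
```

Whether silently skipping items is acceptable depends on the application, so OfType is an alternative to consider rather than a drop-in replacement.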
As I've emphasized throughout this book, the compiler is your best friend. Use as many of its facilities as possible to catch coding errors at compile time rather than run time. Strongly typed languages such as C# rely upon the compiler to verify the integrity of the operations you perform on the types defined within the code. If you cast away the type and deal with general types such as System.Object rather than the true concrete types of the objects, you are throwing away one of the most powerful capabilities of the compiler. Then, if there is a type-based mistake in your code, and quality assurance does not catch it before it goes out the door, you can bet your customer will let you know about it, in the most abrupt way possible!
Following the from clause, you might have a join clause used to correlate data from two separate sources. Join operations are not typically needed in environments where objects are linked via hierarchies and other associative relationships. However, in the relational database world, there typically are no hard links between items in two separate collections, or tables, other than the equality between items within each record. That equality operation is defined by you when you create a join clause. Consider the following example:
using System;
using System.Linq;
using System.Collections.Generic;

public class EmployeeId
{
    public string Id { get; set; }
    public string Name { get; set; }
}

public class EmployeeNationality
{
    public string Id { get; set; }
    public string Nationality { get; set; }
}

public class JoinExample
{
    static void Main() {
        // Build employee collection.
        var employees = new List<EmployeeId>() {
            new EmployeeId{ Id = "111-11-1111",
                            Name = "Ed Glasser" },
            new EmployeeId{ Id = "222-22-2222",
                            Name = "Spaulding Smails" },
            new EmployeeId{ Id = "333-33-3333",
                            Name = "Ivan Ivanov" },
            new EmployeeId{ Id = "444-44-4444",
                            Name = "Vasya Pupkin" }
        };

        // Build nationality collection.
        var empNationalities = new List<EmployeeNationality>() {
            new EmployeeNationality{ Id = "111-11-1111",
                                     Nationality = "American" },
            new EmployeeNationality{ Id = "333-33-3333",
                                     Nationality = "Russian" },
            new EmployeeNationality{ Id = "222-22-2222",
                                     Nationality = "Irish" },
            new EmployeeNationality{ Id = "444-44-4444",
                                     Nationality = "Russian" }
        };

        // Build query.
        var query = from emp in employees
                    join n in empNationalities
                        on emp.Id equals n.Id
                    orderby n.Nationality descending
                    select new {
                        Id = emp.Id,
                        Name = emp.Name,
                        Nationality = n.Nationality
                    };

        foreach( var person in query ) {
            Console.WriteLine( "{0}, {1}, {2}",
                               person.Id,
                               person.Name,
                               person.Nationality );
        }
    }
}
In this example, I have two collections. The first one contains just a collection of employees and their employee identification numbers. The second contains a collection of employee nationalities in which each employee is identified only by employee ID. To keep the example simple, every piece of data is a string. Now, I want a list of all employee names and their nationalities and I want to sort the list by their nationality but in descending order. A join clause comes in handy here because there is no single data source that contains this information. But join lets us meld the information from the two data sources, and LINQ makes this a snap! In the query expression, I have highlighted the join clause. For each item that the range variable emp references (that is, for each item in employees), it finds the item in the collection empNationalities (represented by the range variable n) where the Id is equivalent to the Id referenced by emp. Then, my projector clause, the select clause, takes data from both collections when building the result and projects that data into an anonymous type. Thus, the result of the query is a single collection where each item from both employees and empNationalities is melded into one. If you execute this example, the results are as shown here:
333-33-3333, Ivan Ivanov, Russian
444-44-4444, Vasya Pupkin, Russian
222-22-2222, Spaulding Smails, Irish
111-11-1111, Ed Glasser, American
When your query contains a join operation, the compiler converts it to a Join extension method call under the covers unless it is followed by an into clause. If the into clause is present, the compiler uses the GroupJoin extension method, which also groups the results. For more information on the more esoteric things you can do with join and into clauses, refer to the MSDN documentation on LINQ or see Pro LINQ: Language Integrated Query in C# 2008 by Joseph C. Rattz, Jr. (Apress, 2007).
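As a brief sketch of the grouping behavior that join followed by into provides, the following example counts the workers in each department. The Department and Worker classes and the sample data are hypothetical and exist only for this illustration. Note that a department with no matching workers still appears in the results, paired with an empty group.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical types for illustration only.
public class Department
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public class Worker
{
    public string Name { get; set; }
    public int DeptId { get; set; }
}

public static class GroupJoinDemo
{
    public static List<string> Summarize( List<Department> depts,
                                          List<Worker> workers )
    {
        // "join ... into members" gathers the matching workers for
        // each department into the sequence named members.
        var query = from d in depts
                    join w in workers on d.Id equals w.DeptId
                        into members
                    select d.Name + ": " + members.Count();

        return query.ToList();
    }

    static void Main()
    {
        var depts = new List<Department> {
            new Department { Id = 1, Name = "Tools" },
            new Department { Id = 2, Name = "Docs" },
            new Department { Id = 3, Name = "Empty" }
        };
        var workers = new List<Worker> {
            new Worker { Name = "Ann", DeptId = 1 },
            new Worker { Name = "Bob", DeptId = 1 },
            new Worker { Name = "Cy",  DeptId = 2 }
        };

        foreach( var line in Summarize(depts, workers) ) {
            Console.WriteLine( line );   // Tools: 2, Docs: 1, Empty: 0
        }
    }
}
```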
There's no reason you cannot have multiple join clauses within the query to meld data from multiple different collections all at once. In the previous example, you might have a collection that represents languages spoken by each nation, and you could join each item from the empNationalities collection with the items in that languages-spoken collection. To do that, you would simply have one join clause following another.
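A minimal sketch of two consecutive join clauses might look like the following. The three small classes, the NationLanguage collection, and the sample data are assumptions made up for this illustration and are simplified versions of the types used earlier.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical types for illustration only.
public class EmployeeName
{
    public string Id { get; set; }
    public string Name { get; set; }
}

public class EmployeeNation
{
    public string Id { get; set; }
    public string Nationality { get; set; }
}

public class NationLanguage
{
    public string Nationality { get; set; }
    public string Language { get; set; }
}

public static class MultiJoinDemo
{
    public static List<string> Run()
    {
        var names = new List<EmployeeName> {
            new EmployeeName { Id = "1", Name = "Ed Glasser" },
            new EmployeeName { Id = "2", Name = "Ivan Ivanov" }
        };
        var nations = new List<EmployeeNation> {
            new EmployeeNation { Id = "1", Nationality = "American" },
            new EmployeeNation { Id = "2", Nationality = "Russian" }
        };
        var languages = new List<NationLanguage> {
            new NationLanguage { Nationality = "American", Language = "English" },
            new NationLanguage { Nationality = "Russian", Language = "Russian" }
        };

        // The second join correlates on a value from the first join.
        var query = from e in names
                    join n in nations on e.Id equals n.Id
                    join l in languages
                        on n.Nationality equals l.Nationality
                    select e.Name + " speaks " + l.Language;

        return query.ToList();
    }

    static void Main()
    {
        foreach( var line in Run() ) {
            Console.WriteLine( line );
        }
    }
}
```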
Following one or more from clause generators or the join clauses if there are any, you typically place one or more filter clauses. Filters consist of the where keyword followed by a predicate expression. The where clause is translated into a call to the Where extension method, and the predicate is passed to the Where method as a lambda expression. Calls to Enumerable.Where, which are used if you are performing a query on an IEnumerable type, convert the lambda expression into a delegate. Conversely, calls to Queryable.Where, which are used if you perform a query on a collection via an IQueryable interface, convert the lambda expression into an expression tree.[74] I'll have more to say about expression trees in LINQ later, in the section titled "Expression Trees Revisited."
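The difference between the two conversions can be sketched by assigning the same lambda to a delegate type and to an expression tree type. The class and field names below are hypothetical; AsQueryable is used only to obtain an IQueryable<int> over an in-memory array for the demonstration.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical demo class; not part of the original listing.
public static class WhereOverloads
{
    // The same lambda text, once as a delegate and once as a tree.
    public static readonly Func<int, bool> AsDelegate =
        n => n % 2 == 0;
    public static readonly Expression<Func<int, bool>> AsTree =
        n => n % 2 == 0;

    static void Main()
    {
        int[] numbers = { 1, 2, 3, 4 };

        // Binds to Enumerable.Where, which takes a compiled delegate.
        var evens = numbers.Where( AsDelegate );

        // Binds to Queryable.Where, which receives the lambda as data
        // that a provider can inspect or translate.
        var evensQueryable = numbers.AsQueryable().Where( AsTree );

        Console.WriteLine( AsTree.Body.NodeType );     // Equal
        Console.WriteLine( evens.Count() );            // 2
        Console.WriteLine( evensQueryable.Count() );   // 2
    }
}
```

The delegate can only be invoked, whereas the expression tree exposes its structure, which is what allows a provider such as LINQ to SQL to translate the predicate.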
The orderby clause is used to sort the sequence of results in a query. Following the orderby keyword is the item you want to sort by, which is commonly some property of the range variable. You can sort in either ascending or descending order, and if you don't specify that with either the ascending or descending keyword, ascending is the default order. Following the orderby clause, you can have an unlimited set of subsorts simply by separating each sort item with a comma, as demonstrated here:
using System;
using System.Linq;
using System.Collections.Generic;

public class Employee
{
    public string LastName { get; set; }
    public string FirstName { get; set; }
    public string Nationality { get; set; }
}

public class OrderByExample
{
    static void Main() {
        var employees = new List<Employee>() {
            new Employee {
                LastName = "Glasser", FirstName = "Ed",
                Nationality = "American" },
            new Employee {
                LastName = "Pupkin", FirstName = "Vasya",
                Nationality = "Russian" },
            new Employee {
                LastName = "Smails", FirstName = "Spaulding",
                Nationality = "Irish" },
            new Employee {
                LastName = "Ivanov", FirstName = "Ivan",
                Nationality = "Russian" }
        };

        var query = from emp in employees
                    orderby emp.Nationality,
                            emp.LastName descending,
                            emp.FirstName descending
                    select emp;

        foreach( var item in query ) {
            Console.WriteLine( "{0}, {1}, {2}",
                               item.LastName,
                               item.FirstName,
                               item.Nationality );
        }
    }
}
Notice that because the select clause simply returns the range variable, this whole query expression is nothing more than a sort operation. But it sure is a convenient way to sort things in C#. In this example, I sort first by Nationality in ascending order, then the second expression in the orderby clause sorts the results of each nationality group by LastName in descending order, and then each of those groups is sorted by FirstName in descending order.
At compile time, the compiler translates the first expression in the orderby clause into a call to the OrderBy standard query operator extension method. Any subsequent secondary sort expressions are translated into chained ThenBy extension method calls. If orderby is used with the descending keyword, the generated code uses OrderByDescending and ThenByDescending respectively.
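The translation just described can be checked with a small sketch that sorts the same data twice, once with an orderby clause and once with the corresponding method calls. The class name and sample words are hypothetical.

```csharp
using System;
using System.Linq;

// Hypothetical demo class; not part of the original listing.
public static class OrderByTranslation
{
    public static string[] SortWithQuery( string[] words )
    {
        var query = from w in words
                    orderby w.Length, w descending
                    select w;
        return query.ToArray();
    }

    public static string[] SortWithMethods( string[] words )
    {
        // The first sort key becomes OrderBy, and the descending
        // secondary key becomes ThenByDescending.
        return words.OrderBy( w => w.Length )
                    .ThenByDescending( w => w )
                    .ToArray();
    }

    static void Main()
    {
        string[] words = { "pear", "fig", "kiwi", "apple", "nut" };

        // Both lines print: nut fig pear kiwi apple
        Console.WriteLine( string.Join(" ", SortWithQuery(words)) );
        Console.WriteLine( string.Join(" ", SortWithMethods(words)) );
    }
}
```

Using a second OrderBy in place of ThenByDescending would re-sort the whole sequence by the second key instead of refining the first sort.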
In a LINQ query, the select clause is used to produce the end result of the query. It is called a projector because it projects, or translates, the data within the query into a form desired for consumption. If there are any filtering where clauses in the query expression, they must precede the select clause. The compiler converts the select clause into a call to the Select extension method. The body of the select clause is converted into a lambda expression that is passed into the Select method, which uses it to produce each item of the result set.
Anonymous types are extremely handy here and you would be correct in guessing that the anonymous types feature was born from the select operation during the development of LINQ. To see why anonymous types are so handy in this case, consider the following example:
using System;
using System.Linq;

public class Result
{
    public Result( int input, int output ) {
        Input = input;
        Output = output;
    }

    public int Input { get; set; }
    public int Output { get; set; }
}

public class Projector
{
    static void Main() {
        int[] numbers = { 1, 2, 3, 4 };

        var query = from x in numbers
                    select new Result( x, x*2 );

        foreach( var item in query ) {
            Console.WriteLine( "Input = {0}, Output = {1}",
                               item.Input,
                               item.Output );
        }
    }
}
This works. However, notice that I had to declare a new type Result just to hold the results of the query. Now, what if I wanted to change the result to include x, x*2, and x*3 in the future? I would have to first go modify the definition of the Result class to accommodate that. Ouch! It's so much easier just to use anonymous types as follows:
using System;
using System.Linq;

public class Projector
{
    static void Main() {
        int[] numbers = { 1, 2, 3, 4 };

        var query = from x in numbers
                    select new {
                        Input = x,
                        Output = x*2 };

        foreach( var item in query ) {
            Console.WriteLine( "Input = {0}, Output = {1}",
                               item.Input,
                               item.Output );
        }
    }
}
Now that's much better! I can go and add a new property to the result type and call it Output2, for example, and it would not force any changes on anything other than the anonymous type instantiation inside the query expression. Existing code will continue to work, and anyone who wants to use the new Output2 property can use it.
Of course, there are some circumstances where you do want to use predefined types in the select clause, such as when one of those type instances has to be returned from a function. However, the more you can get away with using anonymous types, the more flexibility you will have later on.
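The case of returning query results from a function can be sketched as follows. An anonymous type has no name that could appear in a method's return type, so the method below projects into a named class instead. The NamedProjection class and its Doubled method are hypothetical names chosen for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Result
{
    public int Input { get; set; }
    public int Output { get; set; }
}

// Hypothetical demo class; not part of the original listing.
public static class NamedProjection
{
    // The return type must be nameable, so the query projects into
    // Result rather than an anonymous type.
    public static IEnumerable<Result> Doubled( IEnumerable<int> numbers )
    {
        return from x in numbers
               select new Result { Input = x, Output = x * 2 };
    }

    static void Main()
    {
        foreach( var item in Doubled(new[] { 1, 2, 3 }) ) {
            Console.WriteLine( "Input = {0}, Output = {1}",
                               item.Input,
                               item.Output );
        }
    }
}
```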
The let clause introduces a new local identifier that can subsequently be referenced in the remainder of the query. Think of it as a local variable that is visible only within the query expression, just as a local variable inside a normal code block is visible only within that block. Consider the following example:
using System;
using System.Linq;
using System.Collections.Generic;

public class Employee
{
    public string LastName { get; set; }
    public string FirstName { get; set; }
}

public class LetExample
{
    static void Main() {
        var employees = new List<Employee>() {
            new Employee { LastName = "Glasser",
                           FirstName = "Ed" },
            new Employee { LastName = "Pupkin",
                           FirstName = "Vasya" },
            new Employee { LastName = "Smails",
                           FirstName = "Spaulding" },
            new Employee { LastName = "Ivanov",
                           FirstName = "Ivan" }
        };

        var query = from emp in employees
                    let fullName = emp.FirstName +
                                   " " + emp.LastName
                    orderby fullName
                    select fullName;

        foreach( var item in query ) {
            Console.WriteLine( item );
        }
    }
}
In this example, I wanted to sort the names in ascending order, but by sorting on the full name created by putting the FirstName and LastName together. I introduce this construct by using the let clause to define the fullName variable.
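One way to picture what a let clause does is to write the query with extension methods. The sketch below is an approximation rather than the exact code the compiler emits: an extra Select pairs each item with the computed value so that later operators can refer to both. The Person and LetTranslation names are hypothetical.

```csharp
using System;
using System.Linq;

// Hypothetical types for illustration only.
public class Person
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

public static class LetTranslation
{
    public static string[] WithLet( Person[] people )
    {
        var query = from p in people
                    let fullName = p.FirstName + " " + p.LastName
                    orderby fullName
                    select fullName;
        return query.ToArray();
    }

    public static string[] WithMethods( Person[] people )
    {
        // Roughly what the let clause expands to: an anonymous type
        // carries the original item along with the new value.
        return people
            .Select( p => new { p,
                                fullName = p.FirstName + " " + p.LastName } )
            .OrderBy( t => t.fullName )
            .Select( t => t.fullName )
            .ToArray();
    }

    static void Main()
    {
        var people = new[] {
            new Person { FirstName = "Vasya", LastName = "Pupkin" },
            new Person { FirstName = "Ed", LastName = "Glasser" }
        };

        // Both loops print "Ed Glasser" and then "Vasya Pupkin".
        foreach( var name in WithLet(people) ) {
            Console.WriteLine( name );
        }
        foreach( var name in WithMethods(people) ) {
            Console.WriteLine( name );
        }
    }
}
```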
One other nice quality of local identifiers introduced by let clauses is that if they reference collections, you can use the variable as input to another from clause to create a new derived range variable. In the previous section titled "The from Clause and Range Variables," I gave an example using multiple from clauses to generate a multiplication table. Following is a slight variation of that example using a let clause:
using System;
using System.Linq;

public class MultTable
{
    static void Main() {
        var query = from x in Enumerable.Range(0,10)
                    let innerRange = Enumerable.Range(0, 10)
                    from y in innerRange
                    select new {
                        X = x,
                        Y = y,
                        Product = x * y
                    };

        foreach( var item in query ) {
            Console.WriteLine( "{0} * {1} = {2}",
                               item.X,
                               item.Y,
                               item.Product );
        }
    }
}
I have bolded the changes in this query from the earlier example. Notice that I added a new intermediate identifier named innerRange and I then iterate over that collection with the from clause following it.
The query expression can have an optional group clause, which is very powerful at partitioning the input of the query. The group clause is a projector as it projects the data into a collection of IGrouping interfaces. Because of that, the group clause can be the final clause in the query, just like the select clause. The IGrouping interface is defined in the System.Linq namespace and it also derives from the IEnumerable interface. Therefore, you can use an IGrouping interface anywhere you can use an IEnumerable interface. IGrouping comes with a property named Key, which is the object that delineates the subset. Each result set is formed by applying an equivalence operator to the input data and Key. Let's take a look at an example that takes a series of integers and partitions them into the set of odd and even numbers:[75]
using System;
using System.Linq;

public class GroupExample
{
    static void Main() {
        int[] numbers = {
            0, 1, 2, 3, 4, 5, 6, 7, 8, 9
        };

        // partition numbers into odd and
        // even numbers.
        var query = from x in numbers
                    group x by x % 2;

        foreach( var group in query ) {
            Console.WriteLine( "mod2 == {0}", group.Key );
            foreach( var number in group ) {
                Console.Write( "{0}, ", number );
            }
            Console.WriteLine( " " );
        }
    }
}
First of all, notice that there is no select clause in this query. The end result of the query is a sequence of two instances of IGrouping<int, int>, that is, IEnumerable<IGrouping<int, int>>, where the first type argument is the type of the key and the second is the type of the grouped items. The first instance in the result sequence contains the even numbers, and the second one contains the odd numbers, as shown in the following output:
mod2 == 0
0, 2, 4, 6, 8,
mod2 == 1
1, 3, 5, 7, 9,
The first foreach iterates over the two groups, or the two instances of IGrouping. And because each IGrouping implements IEnumerable, there is a nested foreach loop that iterates over all the items in the group. As you can see, this simple query iterated over all the items from the source data collection, numbers, and produced two resultant groups. Internally, the compiler translates each group clause into a call to the GroupBy standard query operator.
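For comparison, the same kind of partitioning can be written as a direct call to GroupBy. The sketch below uses hypothetical names and spells out the result type explicitly, with the first type argument of IGrouping being the key type and the second being the element type.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class; not part of the original listing.
public static class GroupByTranslation
{
    public static IEnumerable<IGrouping<int, int>> Partition(
                                                    int[] numbers )
    {
        // Method-call form of "from x in numbers group x by x % 2".
        return numbers.GroupBy( x => x % 2 );
    }

    static void Main()
    {
        int[] numbers = { 0, 1, 2, 3, 4, 5 };

        foreach( IGrouping<int, int> g in Partition(numbers) ) {
            // Prints "Key 0 has 3 items" and then "Key 1 has 3 items".
            Console.WriteLine( "Key {0} has {1} items",
                               g.Key,
                               g.Count() );
        }
    }
}
```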
The group clause can also partition the input collection using multiple keys, also known as compound keys. I prefer to think of it as partitioning on one key that consists of multiple pieces of data. In order to perform such a grouping, you can use an anonymous type to introduce the multiple keys into the query, as demonstrated in the following example:
using System;
using System.Linq;
using System.Collections.Generic;

public class Employee
{
    public string LastName { get; set; }
    public string FirstName { get; set; }
    public string Nationality { get; set; }
}

public class GroupExample
{
    static void Main() {
        var employees = new List<Employee>() {
            new Employee { LastName = "Jones", FirstName = "Ed", Nationality = "American" },
            new Employee { LastName = "Ivanov", FirstName = "Vasya", Nationality = "Russian" },
            new Employee { LastName = "Jones", FirstName = "Tom", Nationality = "Welsh" },
            new Employee { LastName = "Smails", FirstName = "Spaulding", Nationality = "Irish" },
            new Employee { LastName = "Ivanov", FirstName = "Ivan", Nationality = "Russian" }
        };

        var query = from emp in employees
                    group emp by new {
                        Nationality = emp.Nationality,
                        LastName = emp.LastName
                    };

        foreach( var group in query ) {
            Console.WriteLine( group.Key );
            foreach( var employee in group ) {
                Console.WriteLine( employee.FirstName );
            }
            Console.WriteLine();
        }
    }
}
Notice the anonymous type within the group clause. What this says is that I want to partition the input collection into groups where both the Nationality and LastName are the same. In this example, every group ends up with a single element except one: the group where Nationality is Russian and LastName is Ivanov.
Essentially, for each item the query builds an instance of the anonymous type and checks whether that key instance is equal to the key of an existing group. If so, the item goes in that group. If not, a new group is created with that instance of the anonymous type as the key.
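The key comparison works because the compiler generates Equals and GetHashCode overrides for anonymous types that compare property values rather than object references. The following sketch illustrates that behavior in isolation; AnonymousKeyDemo and KeysMatch are hypothetical names used only for this illustration.

```csharp
using System;

public static class AnonymousKeyDemo
{
    // Returns true when two compound keys built from the given values
    // would land in the same group.
    public static bool KeysMatch( string nationality1, string lastName1,
                                  string nationality2, string lastName2 ) {
        var key1 = new { Nationality = nationality1, LastName = lastName1 };
        var key2 = new { Nationality = nationality2, LastName = lastName2 };

        // Two distinct instances, compared property by property.
        return key1.Equals( key2 ) &&
               key1.GetHashCode() == key2.GetHashCode();
    }

    public static void Main() {
        Console.WriteLine( KeysMatch("Russian", "Ivanov",
                                     "Russian", "Ivanov") );   // True
        Console.WriteLine( KeysMatch("American", "Jones",
                                     "Welsh", "Jones") );      // False
    }
}
```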
If you execute the preceding code, you will see the following results:
{ Nationality = American, LastName = Jones }
Ed

{ Nationality = Russian, LastName = Ivanov }
Vasya
Ivan

{ Nationality = Welsh, LastName = Jones }
Tom

{ Nationality = Irish, LastName = Smails }
Spaulding
The grouping by itself is useful indeed. However, what if you want to operate further on each of the groups within the query, thus treating the resulting partition as an intermediate step? That's when you use the into keyword, described in the next section.
The into keyword is similar to the let keyword in that it defines an identifier local to the scope of the query. Using an into clause, you tell the query that you want to assign the results of a group or a join operation to an identifier that can then be used later on in the query. In query lingo, this is called a continuation because the group clause is not the final projector in the query. However, the into clause acts as a generator, much as from clauses do, and the identifier introduced by the into clause is similar to a range variable in a from clause. Let's look at some examples:
using System;
using System.Linq;

public class GroupExample
{
    static void Main() {
        int[] numbers = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };

        // Partition numbers into odd and even numbers.
        var query = from x in numbers
                    group x by x % 2 into partition
                    where partition.Key == 0
                    select new {
                        Key = partition.Key,
                        Count = partition.Count(),
                        Group = partition
                    };

        foreach( var item in query ) {
            Console.WriteLine( "mod2 == {0}", item.Key );
            Console.WriteLine( "Count == {0}", item.Count );
            foreach( var number in item.Group ) {
                Console.Write( "{0}, ", number );
            }
            Console.WriteLine( " " );
        }
    }
}
In this query, the continuation (the part of the query after the into clause) uses a where clause to keep only the groups whose Key is 0. That leaves just the group of even numbers; the group of odd numbers is filtered out. I then project that group out into an anonymous type, producing a count of items in the group to go along with the Key property and the items in the group. Thus the output to the console includes only one group.
But what if I wanted to add a count to each group in the partition? As I said before, the into clause is a generator. So I can produce the desired result by changing the query to this:
var query = from x in numbers
            group x by x % 2 into partition
            select new {
                Key = partition.Key,
                Count = partition.Count(),
                Group = partition
            };
Notice that I removed the where clause, thus removing any filtering. When executed with this version of the query, the example produces the following desired output:
mod2 == 0
Count == 5
0, 2, 4, 6, 8,
mod2 == 1
Count == 5
1, 3, 5, 7, 9,
In both of the previous query expressions, note that the result is not an IEnumerable<IGrouping<TKey, T>> as it commonly is when the group clause is the final projector. Rather, the end result is an IEnumerable<T> where T is replaced with our anonymous type.
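For comparison, the continuation can also be written as direct operator calls: the group clause becomes GroupBy, and everything after into becomes a Select over the resulting groups. This is only a sketch; ContinuationDemo and Summarize are illustrative names, and the method flattens each anonymous-type result into a string so that it can be returned from a method.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ContinuationDemo
{
    public static List<string> Summarize( IEnumerable<int> numbers ) {
        // Equivalent to:
        //   from x in numbers
        //   group x by x % 2 into partition
        //   select new { Key = ..., Count = ..., Group = ... }
        var query = numbers.GroupBy( x => x % 2 )
                           .Select( partition => new {
                               Key = partition.Key,
                               Count = partition.Count(),
                               Group = partition
                           } );

        return query.Select( item =>
                        string.Format("mod2 == {0}, Count == {1}",
                                      item.Key, item.Count) )
                    .ToList();
    }

    public static void Main() {
        int[] numbers = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };
        foreach( var line in Summarize(numbers) ) {
            Console.WriteLine( line );
        }
    }
}
```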
When you build a LINQ query expression and assign it to a query variable, very little code is executed in that statement. The data becomes available only when you iterate over that query variable, and each step of the iteration does just enough work to produce the next result. So, for example, if the result set consists of 100 items and you only iterate over the first 10, you don't pay the price for computing the remaining 90 items in the result set unless you apply some sort of operator such as Average, which requires you to iterate over the entire collection.
You can use the Take extension method, which produces a deferred execution enumerator, to access a specified number of elements at the head of the given stream. Similarly useful methods are TakeWhile, Skip, and SkipWhile.
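As a quick illustration of how these four operators differ, consider the following sketch. The sample data and the names PartitionOperatorsDemo and Show are my own, and Show relies on the String.Join overload for sequences that arrived in .NET 4.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PartitionOperatorsDemo
{
    public static readonly int[] Numbers = { 1, 2, 3, 10, 4, 5 };

    // Forces iteration of a deferred sequence and formats the items.
    public static string Show<T>( IEnumerable<T> items ) {
        return string.Join( ", ", items );
    }

    public static void Main() {
        // Take and Skip count items from the head of the sequence.
        Console.WriteLine( Show(Numbers.Take(2)) );                 // 1, 2
        Console.WriteLine( Show(Numbers.Skip(4)) );                 // 4, 5

        // TakeWhile and SkipWhile stop testing at the first item
        // that fails the predicate.
        Console.WriteLine( Show(Numbers.TakeWhile(x => x < 10)) );  // 1, 2, 3
        Console.WriteLine( Show(Numbers.SkipWhile(x => x < 10)) );  // 10, 4, 5
    }
}
```

Note that TakeWhile does not resume after the 10, even though 4 and 5 would pass the predicate.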
The benefits of this deferred execution approach are many. First of all, the operations described in the query expression could be quite expensive. Because those operations are provided by the user, and the designers of LINQ have no way of predicting the complexity of those operations, it's best to harvest each item only when necessary. Also, the data could be in a database halfway around the world. You definitely want lazy evaluation on your side in that case. And finally, the range variable could actually iterate over an infinite sequence. I'll show an example of that in the next section.
Internally, the query variable is implemented using C# iterators by using the yield keyword. I explained in Chapter 9 that code containing yield statements actually compiles into an iterator object. Therefore, when you assign the LINQ expression to the query variable, just about the only code that is executed is the constructor for the iterator object. The iterator might depend on other nested objects, and they are initialized as well. You get the results of the LINQ expression once you start iterating over the query variable using a foreach statement, or by using the IEnumerator interface.
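One way to observe deferred execution is to count how often a predicate runs. In the sketch below, DeferredDemo, BuildQuery, and PredicateCalls are hypothetical names introduced for the demonstration; nothing here comes from the earlier examples.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    // Counts how many times the where clause has been evaluated.
    public static int PredicateCalls;

    static bool IsEven( int x ) {
        PredicateCalls++;
        return x % 2 == 0;
    }

    public static IEnumerable<int> BuildQuery( IEnumerable<int> source ) {
        return from x in source
               where IsEven( x )
               select x;
    }

    public static void Main() {
        var query = BuildQuery( Enumerable.Range(0, 100) );

        // Building the query has not touched the predicate yet.
        Console.WriteLine( "After construction: {0}", PredicateCalls );

        foreach( var item in query.Take(3) ) {
            Console.WriteLine( item );
        }

        // Only the items needed to find three even numbers were tested,
        // not all 100.
        Console.WriteLine( "After iteration: {0}", PredicateCalls );
    }
}
```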
As an example, let's have a look at a query slightly modified from the code in the earlier section "LINQ Query Expressions." For convenience, here is the relevant code:
var query = from employee in employees
            where employee.Salary > 100000
            select new { LastName = employee.LastName,
                         FirstName = employee.FirstName };

Console.WriteLine( "Highly paid employees:" );
foreach( var item in query ) {
    Console.WriteLine( "{0}, {1}",
                       item.LastName,
                       item.FirstName );
}
Notice that the only difference is that I removed the orderby clause from the original LINQ expression; I'll explain why in the next section. In this case, the query is translated into a series of chained extension method calls on the employees variable. Each of those methods returns an object that implements IEnumerable<T>. In reality, those objects are iterators created from a yield statement.
Let's consider what happens when you start to iterate over the query variable in the foreach block. To obtain the next result, first the from clause grabs the next item from the employees collection and makes the range variable employee reference it. Then, under the covers, the where clause passes the next item referenced by the range variable to the Where extension method. If it gets trapped by the filter, execution backtracks to the from clause to obtain the next item in the collection. It keeps executing that loop until either employees is exhausted or an element of employees passes the where clause predicate. Then the select clause projects the item into the format we want by creating an anonymous type and returning it. Once it returns the item from the select clause, the enumerator's work is done until the query variable cursor is advanced by the next iteration.
LINQ query expressions can be reused. Because execution is deferred, each time you iterate over the query variable, the query runs again against the current state of its inputs. For example, any local variable used inside a where or select clause is captured by the closure that the clause compiles into, so if you change that variable between iterations, or even partway through one, the query picks up the new value without requiring you to redefine the query. Likewise, items added to the source collection show up the next time you iterate. Note, however, that the collection expression in the initial from clause is evaluated once, when the query is constructed; pointing that variable at a different collection afterward does not retarget an existing query. How is all of this possible? Hint: think about closures and variable capture and what happens if the captured variable is modified outside the context of the closure.
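The following sketch shows the closure effect in its simplest form, using a captured local variable in a where clause. The names CaptureDemo, Run, and threshold are assumptions made for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CaptureDemo
{
    public static List<int> Run() {
        int[] numbers = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
        int threshold = 5;

        // The where clause compiles to a lambda that captures the
        // variable threshold itself, not its value at this moment.
        var query = from x in numbers
                    where x > threshold
                    select x;

        var counts = new List<int>();
        counts.Add( query.Count() );   // 5 items: 6 through 10

        threshold = 8;                 // the query is not redefined
        counts.Add( query.Count() );   // 2 items: 9 and 10

        return counts;
    }

    public static void Main() {
        foreach( var count in Run() ) {
            Console.WriteLine( count );
        }
    }
}
```

The same query variable yields different results on the second pass because the lambda reads threshold each time the predicate runs.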
In the previous section, I removed the orderby clause from the query expression, and you might have been wondering why. That's because there are certain query operations that foil lazy evaluation. After all, how can orderby do its work unless it has a look at all the results from the previous clauses? Of course it can't, and therefore orderby forces the clauses prior to it to iterate to completion.
orderby is not the only clause that subverts lazy evaluation, or deferred execution, of query expressions. group . . . by and join do as well. Additionally, any time you make an extension method call on the query variable that produces a singleton value (as opposed to an IEnumerable<T> result), such as Count, you force the entire query to iterate to completion.
The original query expression used in the earlier section "LINQ Query Expressions" looked like the following:
var query = from employee in employees
            where employee.Salary > 100000
            orderby employee.LastName, employee.FirstName
            select new { LastName = employee.LastName,
                         FirstName = employee.FirstName };

Console.WriteLine( "Highly paid employees:" );
foreach( var item in query ) {
    Console.WriteLine( "{0}, {1}",
                       item.LastName,
                       item.FirstName );
}
Pay particular attention to the orderby clause. When you ask for the next item in the result set, the from clause sends the next item in employees to the where clause filter. If it passes, that is sent on to the orderby clause. However, now the orderby clause needs to see the rest of the input that passes the filter, so it forces execution back up to the from clause to get the next item that passes the filter. It continues in this loop until there are no more items left in the employees collection. Then, after ordering the items based on the criteria, it passes the first item in the ordered set to the select projector. When foreach asks for the next item in the result set, evaluation starts with the orderby clause because it has cached all the results from every clause prior. It takes the next item in its internal cache and passes it on to the select projector. This continues until the consumer of the query variable iterates over all the results, thus draining the cache formed by orderby.
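You can watch this buffering happen by logging from inside the where clause. The sketch below is my own illustration (OrderByTraceDemo, Trace, and Note are hypothetical names); it records the order in which the filter and the consumer see each item.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OrderByTraceDemo
{
    // Records a message and lets every item through the filter.
    static bool Note( List<string> log, string message ) {
        log.Add( message );
        return true;
    }

    public static List<string> Trace( int[] numbers ) {
        var log = new List<string>();

        var query = from x in numbers
                    where Note( log, "where " + x )
                    orderby x
                    select x;

        foreach( var item in query ) {
            log.Add( "result " + item );
        }

        return log;
    }

    public static void Main() {
        // Prints: where 3, where 1, where 2, result 1, result 2, result 3
        Console.WriteLine( string.Join(", ", Trace(new[] { 3, 1, 2 })) );
    }
}
```

Every "where" entry precedes the first "result" entry, which shows that orderby drained the filter completely before handing anything to the consumer.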
Now, earlier I mentioned the case where the range variable in the expression iterates over an infinite sequence. Consider the following example:
using System;
using System.Linq;
using System.Collections.Generic;

public class InfiniteList
{
    static IEnumerable<int> AllIntegers() {
        int count = 0;
        while( true ) {
            yield return count++;
        }
    }

    static void Main() {
        var query = from number in AllIntegers()
                    select number * 2 + 1;

        foreach( var item in query.Take(10) ) {
            Console.WriteLine( item );
        }
    }
}
Notice that the query expression calls AllIntegers, which is simply an iterator that iterates over all integers starting from zero. The select clause projects those integers into all the odd numbers. I then use Take and a foreach loop to display the first ten odd numbers. Notice that if I did not use Take, the program would run forever unless you compile it with the /checked+ compiler option to catch overflows.
Methods that create iterators over infinite sets like the AllIntegers method in the previous example are sometimes called streams. The Enumerable class also contains useful methods that generate finite collections. Those methods are Empty, which returns an empty set of elements; Range, which returns a sequence of numbers; and Repeat, which generates a repeated stream of constant objects given the object to return and the number of times to return it. I wish Repeat would iterate forever if a negative count is passed to it; instead, it throws an ArgumentOutOfRangeException.
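Here is a brief sketch of the three generator methods in action. GeneratorDemo and its Show helper are names I made up for the example, and Show assumes the .NET 4 String.Join overload for sequences.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GeneratorDemo
{
    public static string Show<T>( IEnumerable<T> items ) {
        return "[" + string.Join( ", ", items ) + "]";
    }

    public static void Main() {
        // An empty sequence of the requested element type.
        Console.WriteLine( Show(Enumerable.Empty<int>()) );       // []

        // Range takes a starting value and a count, not an end value.
        Console.WriteLine( Show(Enumerable.Range(5, 3)) );        // [5, 6, 7]

        // Repeat takes the element and the number of repetitions.
        Console.WriteLine( Show(Enumerable.Repeat("ab", 3)) );    // [ab, ab, ab]
    }
}
```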
Consider what would happen if I modified the query expression ever so slightly as shown here:
var query = from number in AllIntegers()
            orderby number descending
            select number * 2 + 1;
If you attempt to iterate even once over the query variable to get the first result, then you had better be ready to terminate the application. That's because the orderby clause forces the clauses before it to iterate to completion. In this case, that will never happen.
Even if your range variable does not iterate over an infinite set, the clauses prior to the orderby clause could be very expensive to execute. So the moral of the story is this: be careful of the performance penalty associated with using orderby, group . . . by, and join in your query expressions.
Sometimes you need to execute the entire query immediately. Maybe you want to cache the results of your query locally in memory, or maybe you need to minimize how long locks are held in a SQL database. You can do this in a couple of ways. You could immediately follow your query with a foreach loop that iterates over the query variable, stuffing each result into a List<T>. But that's so imperative! Wouldn't you rather be functional? Instead, you could call the ToList extension method on the query variable, which does the same thing in one simple method call. As with the orderby example in the previous section, be careful when calling ToList on a query that returns an infinite result set. There is also a ToArray extension method for converting the results into an array. I show an interesting usage of ToArray in the later section titled "Replacing foreach Statements."
Along with ToList, there are other extension methods that force immediate execution of the entire query. They include such methods as Count, Sum, Max, Min, Average, and Last, and any other method that must execute the entire query in order to produce its result. Reverse also has to consume its entire input, although it does so only when you start iterating over its result.
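The difference between a deferred query and a cached snapshot shows up as soon as the source changes. The following is a minimal sketch under my own naming (SnapshotDemo, Run); it is not taken from the earlier examples.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SnapshotDemo
{
    // Returns { snapshot count, live query count } after the source
    // list has been modified.
    public static int[] Run() {
        var numbers = new List<int> { 1, 2, 3 };

        var query = from x in numbers
                    where x > 1
                    select x;

        // ToList executes the query now and copies the results.
        List<int> snapshot = query.ToList();

        numbers.Add( 4 );

        // The snapshot is unaffected; the deferred query sees the new item.
        return new[] { snapshot.Count, query.Count() };
    }

    public static void Main() {
        int[] counts = Run();
        Console.WriteLine( "snapshot: {0}", counts[0] );   // snapshot: 2
        Console.WriteLine( "live:     {0}", counts[1] );   // live:     3
    }
}
```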
In Chapter 15, I described how lambda expressions can be converted into expression trees. I also made a brief mention of how this is very useful for LINQ to SQL.
When you use LINQ to SQL, the bodies of the LINQ clauses that boil down to lambda expressions are represented by expression trees. These expression trees are then used to convert the entire expression into a SQL statement for use against the server. When you perform LINQ to Objects, as I have done throughout this chapter, the lambda expressions are converted to delegates in the form of IL code instead. Clearly that's not acceptable for LINQ to SQL. Can you imagine how difficult it would be to convert IL into SQL?
As you know by now, LINQ clauses boil down to extension method calls implemented in either System.Linq.Enumerable or System.Linq.Queryable. But which set of extension methods is used, and when? The compiler chooses based on the static type of the source: a source typed as IQueryable<T> binds to the methods in Queryable, and a source typed only as IEnumerable<T> binds to the methods in Enumerable. If you look at the documentation for the methods in Enumerable, you can see that the predicates are converted to delegates because the methods all accept a type based on the Func<> generic delegate type. However, the extension methods in Queryable, which have the same names as those in Enumerable, all convert the lambda expressions into an expression tree because they take a parameter of type Expression<T>, where T is one of those same Func<> delegate types. Clearly, LINQ to SQL uses the extension methods in Queryable.
Incidentally, when you use the extension methods in Enumerable, you can pass either lambda expressions or anonymous methods to them because they accept a delegate in their parameter lists. However, the extension methods in Queryable can accept only lambda expressions with expression bodies, because anonymous methods and statement-bodied lambdas cannot be converted into expression trees.
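The two conversions can be seen side by side by assigning the same lambda to a delegate type and to an expression tree type. In this sketch, DelegateVersusTreeDemo and its members are hypothetical names, and AsQueryable is used only to obtain an IQueryable<int> without involving a database.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

public static class DelegateVersusTreeDemo
{
    // The same lambda text, converted two different ways by the compiler.
    public static readonly Func<int, bool> AsDelegate = x => x > 3;
    public static readonly Expression<Func<int, bool>> AsTree = x => x > 3;

    public static int CountWithEnumerable( int[] numbers ) {
        // numbers is an IEnumerable<int>, so this binds to Enumerable.Where.
        return numbers.Where( AsDelegate ).Count();
    }

    public static int CountWithQueryable( int[] numbers ) {
        // AsQueryable yields an IQueryable<int>, so this binds to
        // Queryable.Where, which accepts the expression tree.
        return numbers.AsQueryable().Where( AsTree ).Count();
    }

    public static void Main() {
        int[] numbers = { 1, 2, 3, 4, 5 };

        // The tree can be inspected as data...
        Console.WriteLine( AsTree.Body.NodeType );            // GreaterThan
        // ...or compiled into a delegate and invoked.
        Console.WriteLine( AsTree.Compile()( 5 ) );           // True

        Console.WriteLine( CountWithEnumerable(numbers) );    // 2
        Console.WriteLine( CountWithQueryable(numbers) );     // 2
    }
}
```

A provider such as LINQ to SQL walks the tree in much the same way that this sketch reads Body.NodeType, except that it emits SQL instead of printing.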
In the following sections, I want to explore some more of the functional programming concepts that are prevalent throughout the features added in C# 3.0. As you'll soon see, some problems are solved with clever use of delegates created from lambda expressions to add the proverbial extra level of indirection. I'll also show how you can replace many uses of the imperative programming style constructs such as for loops and foreach loops using a more functional style.
In this section, I will revisit an example introduced in Chapter 14, in which I showed how to implement a Lisp-style forward-linked list along with some extension methods to perform on that list. The primary interface for the list is shown here:
public interface IList<T>
{
    T Head { get; }
    IList<T> Tail { get; }
}
A possible implementation of a collection based on this type was shown in Chapter 14; I repeat it here for convenience:
public class MyList<T> : IList<T>
{
    public static IList<T> CreateList( IEnumerable<T> items ) {
        IEnumerator<T> iter = items.GetEnumerator();
        return CreateList( iter );
    }

    public static IList<T> CreateList( IEnumerator<T> iter ) {
        if( !iter.MoveNext() ) {
            return new MyList<T>( default(T), null );
        }

        return new MyList<T>( iter.Current, CreateList(iter) );
    }

    public MyList( T head, IList<T> tail ) {
        this.head = head;
        this.tail = tail;
    }

    public T Head {
        get { return head; }
    }

    public IList<T> Tail {
        get { return tail; }
    }

    private T head;
    private IList<T> tail;
}
Now, let's say that you want to implement the Where and Select standard query operators. Based on this implementation of MyList<T>, those operators could be implemented as shown here:
public static class MyListExtensions
{
    public static IEnumerable<T> GeneralIterator<T>( this IList<T> theList,
                                    Func<IList<T>, bool> finalState,
                                    Func<IList<T>, IList<T>> incrementer ) {
        while( !finalState(theList) ) {
            yield return theList.Head;
            theList = incrementer( theList );
        }
    }

    public static IList<T> Where<T>( this IList<T> theList,
                                     Func<T, bool> predicate ) {
        Func<IList<T>, IList<T>> whereFunc = null;

        whereFunc = list => {
            IList<T> result = new MyList<T>( default(T), null );

            if( list.Tail != null ) {
                if( predicate(list.Head) ) {
                    result = new MyList<T>( list.Head, whereFunc(list.Tail) );
                } else {
                    result = whereFunc( list.Tail );
                }
            }

            return result;
        };

        return whereFunc( theList );
    }

    public static IList<R> Select<T,R>( this IList<T> theList,
                                        Func<T,R> selector ) {
        Func<IList<T>, IList<R>> selectorFunc = null;

        selectorFunc = list => {
            IList<R> result = new MyList<R>( default(R), null );

            if( list.Tail != null ) {
                result = new MyList<R>( selector(list.Head),
                                        selectorFunc(list.Tail) );
            }

            return result;
        };

        return selectorFunc( theList );
    }
}
Each of the two methods, Where and Select, uses an embedded lambda expression that is converted to a delegate in order to get the work done.
Chapter 14 demonstrated a similar technique, but because lambda expressions had not been introduced yet, it used anonymous methods instead. Of course, lambda expressions clean up the syntax quite a bit.
In both methods, the embedded lambda expression is used to perform a simple recursive computation to compute the desired results. The final result of the recursion produces the product you want from each of the methods. I encourage you to follow through the execution of this code in a debugger to get a good feel for the execution flow.
The GeneralIterator method in the previous example is used to create an iterator that implements IEnumerable<T> on the MyList<T> object instances. It is virtually the same as that shown in the example in Chapter 14.
Finally, you can put all of this together and execute the following code to see it in action:
public class SqoExample
{
    static void Main() {
        var listInts = new List<int> { 5, 2, 9, 4, 3, 1 };
        var linkList = MyList<int>.CreateList( listInts );

        // Now go.
        var linkList2 = linkList.Where( x => x > 3 ).Select( x => x * 2 );
        var iterator2 = linkList2.GeneralIterator( list => list.Tail == null,
                                                   list => list.Tail );
        foreach( var item in iterator2 ) {
            Console.Write( "{0}, ", item );
        }

        Console.WriteLine();
    }
}
Of course, you will have to import the appropriate namespaces in order for the code to compile. Those namespaces are System, System.Linq, and System.Collections.Generic. If you execute this code, you will see the following results:
10, 18, 8,
There are some very important points and problems to address in this example, though. Notice that my query was not written using a LINQ query expression even though I do make use of the standard query operators Where and Select. Because the custom IList<T> interface does not derive from IEnumerable<T>, it is impossible to use foreach directly on the list, and none of the operators in the Enumerable class apply to it. You could use the GeneralIterator extension method to get an IEnumerable<T> interface on the IList<T> and then use that in the from clause of a LINQ query expression. In that case, there would be no need to implement custom Where and Select methods because you could just use the ones already implemented in the Enumerable class. However, your results of the query would be in the form of an IEnumerable<T> and not an IList<T>, so you would then have to reconvert the results of the query back to an IList<T>. Although these conversions are all possible, for the sake of example, let's assume that the requirement is that the standard query operators must accept the custom IList<T> type and return the custom IList<T> type. Under such a requirement, you must supply your own Where and Select, as I have done here. I invoke them directly in this example, although a query expression such as from x in linkList where x > 3 select x * 2 binds to the same custom methods, because the compiler translates query expressions into method calls by syntax alone and does not require the source to implement IEnumerable<T>.
You can see the power of the LINQ layered design and implementation. Even when your custom collection type does not implement IEnumerable<T>, you can still perform operations using custom-designed standard query operators.
There is one major problem with the implementation of MyList<T> and the extension methods in the MyListExtensions class as shown so far. They are grossly inefficient! One of the functional programming techniques employed throughout the LINQ implementation is that of lazy evaluation. In the section titled "The Virtues of Being Lazy," I showed that when you create a LINQ query expression, very little code is executed at that point, and operations are performed only as needed while you iterate the results of the query. The implementations of Where and Select for IList<T>, as shown so far, don't follow this methodology. For example, when you call Where, the entire input list is processed before any results are returned to the caller. That's bad because what if the input IList<T> were an infinite list? The call to Where would never return.
When developing implementations of the standard query operators or any other method in which lazy evaluation is desirable, I like to use an infinite list for input as the litmus test of whether my lazy evaluation code is working as expected. Of course, as shown in the section "Subverting Laziness," there are certain operations that just cannot be coded using lazy evaluation.
Let's turn to reimplementing the custom standard query operators in the previous example using lazy evaluation. Let's start by considering the Where operation. How could you reimplement it to use lazy evaluation? It accepts an IList<T> and returns a new IList<T>, so how is it possible that Where could return only one item at a time? The solution actually lies in the implementation of the MyList<T> class. Let's consider the typical IEnumerator<T> implementation for a moment. It has an internal cursor that points to the item that the IEnumerator<T>.Current property returns, and it has a MoveNext method to go to the next item. The IEnumerator.MoveNext method is the key to retrieving each value only when needed. When you call MoveNext, you are invoking the operation to produce the next result, but only when needed, thus using lazy evaluation.
I've mentioned Andrew Koenig's "Fundamental Theorem of Software Engineering," in which all problems can be solved by introducing an extra level of indirection.[76] Although it's not really a theorem, it is true and very useful. In the C language, that form of indirection is typically in the form of a pointer. In C++ and other object-oriented languages, that extra level of indirection is typically in the form of a class (sometimes called a wrapper class). In functional programming, that extra level of indirection is typically a function in the form of a delegate.
So how can you fix this problem in MyList<T> by adding the proverbial extra level of indirection? It's actually fundamentally quite simple. Don't compute the IList<T> that is the IList<T>.Tail until it is asked for. Consider the changes in the MyList<T> implementation as shown here:
public class MyList<T> : IList<T>
{
    public static IList<T> CreateList( IEnumerable<T> items ) {
        IEnumerator<T> iter = items.GetEnumerator();
        return CreateList( iter );
    }

    public static IList<T> CreateList( IEnumerator<T> iter ) {
        Func<IList<T>> tailGenerator = null;
        tailGenerator = () => {
            if( !iter.MoveNext() ) {
                return new MyList<T>( default(T), null );
            }

            return new MyList<T>( iter.Current, tailGenerator );
        };

        return tailGenerator();
    }

    public MyList( T head, Func<IList<T>> tailGenerator ) {
        this.head = head;
        this.tailGenerator = tailGenerator;
    }

    public T Head {
        get { return head; }
    }

    public IList<T> Tail {
        get {
            if( tailGenerator == null ) {
                return null;
            } else if( tail == null ) {
                tail = tailGenerator();
            }
            return tail;
        }
    }

    private T head;
    private Func<IList<T>> tailGenerator;
    private IList<T> tail = null;
}
The interesting portions of the code are the CreateList method, the constructor, and the Tail property. Notice that the constructor still accepts the item that is assigned to head, but instead of taking an IList<T> tail as the second argument it accepts a delegate that knows how to compute tail instead. There's the extra level of indirection! Also, notice that the get accessor of the Tail property then uses that delegate on an as-needed basis to compute tail when asked for it. And finally, the CreateList static method that builds an IList<T> from an IEnumerator<T> must pass in a delegate that simply grabs the next item out of the IEnumerator<T>. So, even if you initialize a MyList<T> with an IEnumerable<T>, the IEnumerable<T> type is not fully consumed at creation time as it was in the example from Chapter 14. That's a definite plus because even the IEnumerable<T> passed in can reference an infinite stream of objects.
Now, let's turn to the modifications necessary for the standard query operators so they can work on this new implementation of MyList<T>. Consider the modifications shown here:
public static class MyListExtensions
{
    public static IEnumerable<T> GeneralIterator<T>( this IList<T> theList,
                                    Func<IList<T>,bool> finalState,
                                    Func<IList<T>,IList<T>> incrementer ) {
        while( !finalState(theList) ) {
            yield return theList.Head;
            theList = incrementer( theList );
        }
    }

    public static IList<T> Where<T>( this IList<T> theList,
                                     Func<T, bool> predicate ) {
        Func<IList<T>> whereTailFunc = null;

        whereTailFunc = () => {
            IList<T> result = null;

            if( theList.Tail == null ) {
                // Reached the terminating node; end the result list.
                result = new MyList<T>( default(T), null );
            } else if( predicate(theList.Head) ) {
                // Test real items only, never the terminator's default head.
                result = new MyList<T>( theList.Head,
                                        whereTailFunc );
            }

            theList = theList.Tail;
            if( result == null ) {
                result = whereTailFunc();
            }

            return result;
        };

        return whereTailFunc();
    }

    public static IList<R> Select<T,R>( this IList<T> theList,
                                        Func<T,R> selector ) {
        Func<IList<R>> selectorTailFunc = null;

        selectorTailFunc = () => {
            IList<R> result = null;

            if( theList.Tail == null ) {
                result = new MyList<R>( default(R), null );
            } else {
                result = new MyList<R>( selector(theList.Head),
                                        selectorTailFunc );
            }

            theList = theList.Tail;
            return result;
        };

        return selectorTailFunc();
    }
}
The implementations for Where and Select build a delegate that knows how to compute the next item in the result set and pass that delegate to the new instance of MyList<T> that they return. If this code looks overwhelming, I encourage you to step through it within a debugger to get a better feel for the execution flow. Thus, we have achieved lazy evaluation. Notice that each lambda expression in each method forms a closure that uses the passed-in information to form the recursive code that generates the next element in the list. Test the lazy evaluation by introducing an infinite linked list of values.
Before you can prove the lazy evaluation with an infinite list, you need a way to stop after a fixed number of results, because a foreach loop will attempt to iterate to the nonexistent end. You could iterate through the results using a counted for loop. Or, instead of using a for loop, you could implement the standard query operator Take, which returns a given number of elements from the list. Following is a possible implementation of Take using the new lazy MyList<T> implementation:
public static class MyListExtensions
{
    public static IList<T> Take<T>( this IList<T> theList,
                                    int count ) {
        Func<IList<T>> takeTailFunc = null;

        takeTailFunc = () => {
            IList<T> result = null;

            if( theList.Tail == null || count-- == 0 ) {
                result = new MyList<T>( default(T), null );
            } else {
                result = new MyList<T>( theList.Head,
                                        takeTailFunc );
            }

            theList = theList.Tail;
            return result;
        };

        return takeTailFunc();
    }
}
This implementation of Take is very similar to that of Select, except that the closure formed by the lambda expression assigned to takeTailFunc also captures the count parameter.
Using Take is a more functional approach than using a for loop to count through the first few items in a collection.
Armed with the Take method, you can prove that lazy evaluation works with the following code:
public class SqoExample
{
    static IList<T> CreateInfiniteList<T>( T item ) {
        Func<IList<T>> tailGenerator = null;
        tailGenerator = () => {
            return new MyList<T>( item, tailGenerator );
        };

        return tailGenerator();
    }

    static void Main() {
        var infiniteList = CreateInfiniteList<int>( 21 );

        var linkList = infiniteList.Where( x => x > 3 )
                                   .Select( x => x * 2 )
                                   .Take( 10 );
        var iterator = linkList.GeneralIterator( list => list.Tail == null,
                                                 list => list.Tail );
        foreach( var item in iterator ) {
            Console.Write( "{0}, ", item );
        }

        Console.WriteLine();
    }
}
The Main method uses the CreateInfiniteList method to create an infinite IList<T> stream that returns the constant 21. Following the creation of infiniteList are chained calls to the custom standard query operators. Notice that the final method in the chain is the Take method, in which I am asking only for the first 10 items in the result set. Without that call, the foreach loop later on would loop indefinitely. Because the Main method actually runs to completion, it proves that the lazy evaluation coded into the new MyList<T> and the new implementations of Where, Select, and Take are working as expected. If any of them were broken, execution would get stuck in an infinite loop.
As with most of the new features added in C# 3.0, LINQ imparts a taste of functional programming on the language that, when used appropriately, can leave a sweet aftertaste on the palate. Because functional programming has, over the years, been considered less efficient in its consumption of memory and CPU resources, it's possible that inappropriate use of LINQ could actually lead to inefficiencies. As with just about anything in software development, moderation is often the key to success. With enough use and given enough functional programming examples, you might be surprised by how many problems can be solved in a different and sometimes clearer way using LINQ and functional programming practices rather than the typical imperative programming style of C-style languages such as C#, C++, and Java.
In many of the examples in this book, I send a list of items to the console to illustrate the results of the example. I have typically used a Console.WriteLine method call within a foreach statement to iterate over the results when the result set is a collection. Now I want to show you how this can be done differently using LINQ, as in the following example:
using System;
using System.Linq;
using System.Collections.Generic;

public static class Extensions
{
    public static string Join( this string str, IEnumerable<string> list ) {
        return string.Join( str, list.ToArray() );
    }
}

public class Test
{
    static void Main() {
        var numbers = new int[] { 5, 8, 3, 4 };

        Console.WriteLine(
            string.Join(", ",
                (from x in numbers
                 orderby x
                 select x.ToString()).ToArray()) );
    }
}
The interesting part of the code is the Console.WriteLine statement in Main. In one statement, I sent all the items in the numbers collection to the console separated by commas and sorted in ascending order. Isn't that cool? The way it works is that my query expression is evaluated immediately because I call the ToArray extension method on it to convert the results of the query into an array. That's where the typical foreach clause disappears to. The static method String.Join should not be confused with the LINQ join clause or the Join extension method you get when using the System.Linq namespace. What it does is intersperse the first string, in this case a comma followed by a space, between the strings in the given array, building one big string in the process. I then simply pass the result of String.Join to Console.WriteLine.
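For comparison, here is a hedged sketch of the same idea written with the extension methods directly rather than a query expression. It assumes .NET Framework 4 or later, where String.Join has overloads that accept an IEnumerable<T>, so neither the ToString projection nor the ToArray call is needed. The JoinDemo class and its SortedCsv method are hypothetical names used only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class JoinDemo
{
    // OrderBy is deferred. String.Join enumerates the sequence,
    // and that enumeration is what forces the sort to run.
    public static string SortedCsv( IEnumerable<int> numbers ) {
        return string.Join( ", ", numbers.OrderBy( x => x ) );
    }

    public static void Main() {
        var numbers = new int[] { 5, 8, 3, 4 };
        Console.WriteLine( SortedCsv( numbers ) );   // prints "3, 4, 5, 8"
    }
}
```

On earlier framework versions, which lack those overloads, the explicit ToArray call is still required.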
In my opinion, LINQ is to C# what the Standard Template Library (STL) is to C++. When STL first came out in the early 1990s, it really jolted C++ programmers into thinking more functionally. It was definitely a breath of fresh air. LINQ has this same effect on C#, and I believe that as time goes on, you will see more and more crafty usage of functional programming techniques using LINQ. For example, if a C++ programmer used the STL effectively, there was little need to write a for loop because the STL provides algorithms where one passes a function into the algorithm along with the collection to operate on, and it invokes that function on each item in the collection. One might wonder why this technique is so effective. One reason is that for loops are a common place to inadvertently introduce an off-by-one bug. Of course, the C# foreach keyword also helps alleviate that problem.
With enough thought, you could probably replace just about every foreach block in your program with a LINQ query expression. It does not necessarily make sense to do so, but it is a great mental exercise on functional programming.
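As one small instance of that exercise, the following sketch computes the same value twice: once with a foreach block that mutates an accumulator, and once with a query expression. The ForeachVsLinq class and both method names are hypothetical and exist only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ForeachVsLinq
{
    // Imperative style: loop over the items and mutate a running total.
    public static int SumOfEvenSquaresLoop( IEnumerable<int> values ) {
        int total = 0;
        foreach( var v in values ) {
            if( v % 2 == 0 ) {
                total += v * v;
            }
        }
        return total;
    }

    // Functional style: describe the filter and projection, then let
    // the Sum operator do the accumulation. No mutable local is needed.
    public static int SumOfEvenSquaresQuery( IEnumerable<int> values ) {
        return (from v in values
                where v % 2 == 0
                select v * v).Sum();
    }

    public static void Main() {
        var numbers = new int[] { 5, 8, 3, 4 };
        Console.WriteLine( SumOfEvenSquaresLoop( numbers ) );    // prints 80
        Console.WriteLine( SumOfEvenSquaresQuery( numbers ) );   // prints 80
    }
}
```

Whether the query form is clearer than the loop is a judgment call that depends on the problem at hand.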
LINQ is clearly the culmination of most of the features added in C# 3.0. Or put another way, most of the new features of C# 3.0 were born from LINQ. In this chapter, I showed the basic syntax of a LINQ query including how LINQ query expressions ultimately compile down to a chain of extension methods known as the standard query operators. I then described all the new C# keywords introduced for LINQ expressions. Although you are not required to use LINQ query expressions and you can choose to call the extension methods directly, it sure makes for easily readable code. However, I also described how when you implement standard query operators on collection types that don't implement IEnumerable, you might not be able to use LINQ query expressions.
I then explored the usefulness of lazy evaluation, or deferred execution, which is used extensively throughout the library-provided LINQ standard operators on IEnumerable and IQueryable types. And finally, I closed the chapter by exploring how to apply the concept of lazy evaluation when defining your own custom implementations of the standard query operators.
LINQ is such a huge topic that there is no way I could possibly cover every nuance in one chapter. For example, you'll notice that I covered only LINQ to Objects, not LINQ to SQL, XML, DataSet, or Entities. Entire books are devoted to LINQ. I highly suggest that you frequently reference the MSDN documentation on LINQ. Additionally, you might consider LINQ for Visual C# 2005 by Fabio Claudio Ferracchiati or Pro LINQ: Language Integrated Query in C# 2008 by Joseph C. Rattz, Jr., both published by Apress.
In the next chapter, I will introduce one of the coolest new features added in the C# 4.0 language. It is the new dynamic type and it brings interoperability in C# to a level of parity with Visual Basic, among other things.
[73] For more extensive coverage of LINQ, I suggest you check out Foundations of LINQ in C#, by Joseph C. Rattz, Jr. (Apress, 2007).
[74] In Chapter 15, I show how lambda expressions that are assigned to delegate instance variables are converted into executable IL code, whereas lambda expressions that are assigned to Expression<T> are converted into expression trees, thus describing the expression with data rather than executable code.
[75] In the discussion of the group clause, I am using the word partition in the set theory context. That is, a set partition of a space S is a set of disjoint subsets whose union produces S.
[76] I first encountered Koenig's so-called fundamental theorem of software engineering in his excellent book, coauthored with Barbara Moo, titled Ruminations on C++ (Boston: Addison-Wesley Professional, 1996).