Linq is a programming paradigm that melds a declarative approach with the more traditional1 procedural approach. Because the declarative approach is different from the procedural approach used in other C# code, it requires a different mental model. Until you have become comfortable with that new model, the code is going to be a bit puzzling at times, but hang in there. There is a lot to like in Linq; you can use it to write code that is smaller, easier to write and understand, and less likely to contain bugs.
There are three main parts of the Microsoft Linq offering2: Linq to objects, Linq to XML, and Linq to SQL.
This chapter will cover Linq to objects, and the following chapters will cover the XML and SQL variants.
Getting Started with Linq to Objects
We’ve been volunteered to help a friend keep track of the grades for a class. Our first task is to compute the current average for each student, so we open up our laptop, start up Visual Studio, and begin coding:
class Assignment
{
    public string Name { get; set; }
    public int Score { get; set; }
}
class Student
{
    public Student()
    {
        Assignments = new List<Assignment>();
    }
    public string Name { get; set; }
    public List<Assignment> Assignments { get; set; }
    public double GetAverageScore()
    {
        double sum = 0;
        foreach (Assignment assignment in Assignments)
        {
            sum += assignment.Score;
        }
        return sum / Assignments.Count;
    }
}
That works fine. But it’s a bit annoying that we have to write a method to compute the average, and we certainly don’t want to have to do it multiple times. Let’s see if we can generalize it a bit. Here’s the starting point of a more general average method:
public static double AverageV1(List<Assignment> values)
{
    double sum = 0;
    foreach (Assignment assignment in values)
    {
        sum += assignment.Score;
    }
    return sum / values.Count;
}
This only works for the Assignment type. Is there a way to get rid of the assignment.Score reference in the foreach and make it work for any type? To do so, we will need a way of reaching into an instance and pulling out (or “selecting”) the data value to perform the average on. That is a perfect place for a bit of delegate/lambda magic:
public static double AverageV2(List<Assignment> values, Func<Assignment, int> selector)
{
    double sum = 0;
    foreach (Assignment value in values)
    {
        sum += selector(value);
    }
    return sum / values.Count;
}
System.Func<T, TResult> is a delegate that defines a function that takes an argument of input type T and returns a value of the output type TResult. By making it an argument to the Average method, we allow the caller to specify how to extract the value.
To call it, we’ll need to specify the function that pulls the score out of the assignment:
double average = Helpers.AverageV2(Assignments, a => a.Score);
The second parameter is a lambda that takes an assignment and returns a score. We are still wedded to the Assignment type, but perhaps we can make it generic:
public static double AverageV3<T>(List<T> values, Func<T, int> selector)
{
    double sum = 0;
    foreach (T value in values)
    {
        sum += selector(value);
    }
    return sum / values.Count;
}
By replacing the hard-coded uses of the Assignment type with a generic argument T, this method can now be used to get the average of any list.3 There are a few more improvements we can make:
public static double AverageV4<T>(this IEnumerable<T> values, Func<T, int> selector)
{
    double sum = 0;
    int count = 0;
    foreach (T value in values)
    {
        sum += selector(value);
        count++;
    }
    return sum / count;
}
Making it an extension method makes it easier to call, and having it take an IEnumerable<T> rather than List<T> makes it work on any sequence of values. It can be called as follows:
double average = Assignments.AverageV4(a => a.Score);
At this point, our work is done; we have a method that does exactly what we want. And then a friend points us to the documentation for IEnumerable<T>, and we find that somebody else has already done this work for us, giving us a method that is used exactly like the one we just wrote:
double average = Assignments.Average(a => a.Score);
Such methods are what I call sequence methods,4 and they are the basis for everything that Linq does.
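To make the comparison concrete, here is a minimal sketch that runs our hand-written AverageV4() next to the built-in Average() on the same data. The sample assignments and the AverageDemo class are invented for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Assignment
{
    public string Name { get; set; }
    public int Score { get; set; }
}

static class Helpers
{
    // The chapter's hand-written AverageV4, reproduced for comparison.
    public static double AverageV4<T>(this IEnumerable<T> values, Func<T, int> selector)
    {
        double sum = 0;
        int count = 0;
        foreach (T value in values)
        {
            sum += selector(value);
            count++;
        }
        return sum / count;
    }
}

static class AverageDemo
{
    // Hypothetical sample data.
    public static List<Assignment> SampleAssignments()
    {
        return new List<Assignment>
        {
            new Assignment { Name = "Quiz 1", Score = 80 },
            new Assignment { Name = "Quiz 2", Score = 90 },
            new Assignment { Name = "Essay", Score = 100 }
        };
    }

    static void Main()
    {
        List<Assignment> assignments = SampleAssignments();
        Console.WriteLine(assignments.AverageV4(a => a.Score)); // 90
        Console.WriteLine(assignments.Average(a => a.Score));   // 90
    }
}
```

The two are not identical in every case: on an empty sequence our version divides zero by zero and returns NaN, while the built-in Average() throws an InvalidOperationException.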
Our next task is to count the number of missing assignments each student has. We write the following:
public int CountOfAssignmentsWithZeroScores()
{
    int count = 0;
    foreach (Assignment assignment in Assignments)
    {
        if (assignment.Score == 0)
        {
            count++;
        }
    }
    return count;
}
This sort of list traversal has been bread-and-butter code for years, and most programmers have written it thousands of times. It’s not a hard thing to do, but it is tedious to do over and over. Linq provides an interesting alternative.
First, we need to filter the assignments to only those with zero scores:
Assignments.Where(a => a.Score == 0);
and then we can get the count of those assignments:
Assignments.Where(a => a.Score == 0).Count();
The Where() method functions the same way a WHERE clause does in SQL; we start with all the assignments, but only those that pass the condition make it past the Where() method to be counted.
The previous statement can be written in a different way using another overload of the Count() method:
Assignments.Count(a => a.Score == 0);
In this version, we are counting all the items in the sequence that meet the condition.5
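The three ways of counting can be checked against each other. The following sketch works on a plain array of scores rather than on Assignment instances to keep it short; the ZeroScoreDemo class and the sample scores are made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class ZeroScoreDemo
{
    // Hand-written traversal, equivalent to the loop shown earlier.
    public static int CountWithLoop(IEnumerable<int> scores)
    {
        int count = 0;
        foreach (int score in scores)
        {
            if (score == 0)
            {
                count++;
            }
        }
        return count;
    }

    // Filter first, then count what is left.
    public static int CountWithWhere(IEnumerable<int> scores)
    {
        return scores.Where(s => s == 0).Count();
    }

    // Count() overload that takes the condition directly.
    public static int CountWithPredicate(IEnumerable<int> scores)
    {
        return scores.Count(s => s == 0);
    }

    static void Main()
    {
        int[] scores = { 0, 85, 0, 92, 70 };   // hypothetical sample scores
        Console.WriteLine(CountWithLoop(scores));       // 2
        Console.WriteLine(CountWithWhere(scores));      // 2
        Console.WriteLine(CountWithPredicate(scores));  // 2
    }
}
```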
Our next project is to produce a simple flat list that has the student’s name and average score. We’ll start by defining a class that can hold the data that we want:
class NameAndGrade
{
    public NameAndGrade(string name, double average)
    {
        Name = name;
        Average = average;
    }
    public string Name { get; set; }
    public double Average { get; set; }
}
Now that we have the class, we can create an expression that will generate a collection of NameAndGrade instances:
students.Select(s => new NameAndGrade(s.Name, s.GetAverageScore()));
This uses the Select() method, which takes the current sequence and converts it into a sequence of another type.6 In this case, a sequence of Student is transformed into a sequence of NameAndGrade.
It’s more than a bit cumbersome to have to define a separate class to hold the new information when you use a Select() method. This is the sort of tedium that programmers hate and compilers are quite good at.
C# therefore provides support to generate a new class automatically that will hold the items that result from the select, so it is possible to simply write the following:
var nameAndAverages = students.Select(
    s => new {
        Name = s.Name,
        Average = s.GetAverageScore()
    });
In this case, the use of var is required, as there is no way to specify the name of the anonymous type that is created. The compiler does generate a class, but there’s no way to find the name of it.7 That means that you can’t pass instances of the type to a method or return the type from a method; in those situations you must write the type yourself.8
It is of course possible to eliminate the call to the GetAverageScore() method and use another Linq expression in its place:
var nameAndAverages = students.Select(
    s => new
    {
        Name = s.Name,
        Average = s.Assignments.Average(a => a.Score)
    });
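Although the anonymous type has no usable name, its properties are strongly typed and can be read normally inside the method that created it. The following sketch shows one way to consume the projection; the sample students and the AnonymousTypeDemo class are hypothetical, and the method hands back plain strings because the anonymous type itself cannot appear in a signature.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Assignment
{
    public string Name { get; set; }
    public int Score { get; set; }
}

class Student
{
    public Student()
    {
        Assignments = new List<Assignment>();
    }
    public string Name { get; set; }
    public List<Assignment> Assignments { get; set; }
}

static class AnonymousTypeDemo
{
    // Hypothetical sample data.
    public static List<Student> SampleStudents()
    {
        var ann = new Student { Name = "Ann" };
        ann.Assignments.Add(new Assignment { Name = "Quiz", Score = 90 });
        ann.Assignments.Add(new Assignment { Name = "Essay", Score = 100 });

        var raj = new Student { Name = "Raj" };
        raj.Assignments.Add(new Assignment { Name = "Quiz", Score = 70 });
        raj.Assignments.Add(new Assignment { Name = "Essay", Score = 80 });
        raj.Assignments.Add(new Assignment { Name = "Lab", Score = 90 });

        return new List<Student> { ann, raj };
    }

    // The anonymous type stays inside this method; only strings leave it.
    public static List<string> Describe(IEnumerable<Student> students)
    {
        var nameAndAverages = students.Select(
            s => new
            {
                Name = s.Name,
                Average = s.Assignments.Average(a => a.Score)
            });

        var lines = new List<string>();
        foreach (var item in nameAndAverages)
        {
            // Name is a string and Average is a double, checked at compile time.
            lines.Add(item.Name + ": " + item.Average);
        }
        return lines;
    }

    static void Main()
    {
        foreach (string line in Describe(SampleStudents()))
        {
            Console.WriteLine(line);   // "Ann: 95", then "Raj: 80"
        }
    }
}
```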
The next task is to figure out the average score across all assignments. The obvious way to write it is as follows:
var a1 = students.Average(s => s.GetAverageScore());
This is incorrect; the average of all the averages may not be the overall average, since students might not have the same number of assignments. Perhaps selecting the Assignments collection can help:
var a2 = students.Select(s => s.Assignments);
This, unfortunately, is not a collection of assignments, but a collection of collections of assignments. What is needed is an operator that will enumerate across multiple collections and join them together into a single sequence. That is done with the SelectMany() method:
var a3 = students.SelectMany(s => s.Assignments).Average(a => a.Score);
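A small worked example shows how far apart the two numbers can be. In this sketch, one student has a single assignment and the other has three; the data, the MakeStudent() helper, and the OverallAverageDemo class are all invented for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Assignment
{
    public string Name { get; set; }
    public int Score { get; set; }
}

class Student
{
    public Student()
    {
        Assignments = new List<Assignment>();
    }
    public string Name { get; set; }
    public List<Assignment> Assignments { get; set; }
}

static class OverallAverageDemo
{
    // Hypothetical helper that builds a student from a list of scores.
    public static Student MakeStudent(string name, params int[] scores)
    {
        var student = new Student { Name = name };
        foreach (int score in scores)
        {
            student.Assignments.Add(new Assignment { Name = "Assignment", Score = score });
        }
        return student;
    }

    public static List<Student> SampleStudents()
    {
        return new List<Student>
        {
            MakeStudent("Ann", 100),          // one assignment, average 100
            MakeStudent("Raj", 50, 60, 70)    // three assignments, average 60
        };
    }

    // Average of the per-student averages: (100 + 60) / 2.
    public static double AverageOfAverages(IEnumerable<Student> students)
    {
        return students.Average(s => s.Assignments.Average(a => a.Score));
    }

    // Average over every assignment: (100 + 50 + 60 + 70) / 4.
    public static double OverallAverage(IEnumerable<Student> students)
    {
        return students.SelectMany(s => s.Assignments).Average(a => a.Score);
    }

    static void Main()
    {
        List<Student> students = SampleStudents();
        Console.WriteLine(AverageOfAverages(students)); // 80
        Console.WriteLine(OverallAverage(students));    // 70
    }
}
```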
Behind the Curtain
To understand all that can be done with Linq, it’s important to understand a few implementation details, namely how Linq does what it does. Consider the following expression:
var x = students.Where(s => s.Name.Length < 5).Skip(1).Average(s => s.Assignments.Average(a => a.Score));
That’s ugly. It will look better if formatted as follows:
var x = students
    .Where(s => s.Name.Length < 5)
    .Skip(1)
    .Average(s => s.Assignments.Average(a => a.Score));
The C# rule is that expressions are evaluated from left to right, so the operations will happen in the following order:
In fact, that’s almost backward from how it works, because the Linq expressions deal with IEnumerable<T>, not List<T>. The actual sequence is as follows:
This chaining together of enumerators has a number of benefits:
In the previous example, it was the Average() method that started the enumeration, and the reason it did so is that it needs all the values to calculate its return value. If the Average() was omitted:
var x = students
    .Where(s => s.Name.Length < 5)
    .Skip(1);
only steps 1 and 2 are executed, and therefore no operations have been performed. The value of x after execution of this statement is an IEnumerable<Student>, which will not do any work until somebody gets around to enumerating over it. This behavior is very important in Linq to SQL.
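The deferral is easy to observe by counting how often the Where() condition runs. This is a minimal sketch using a list of names instead of Student instances; the DeferredDemo class and its data are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class DeferredDemo
{
    // Returns the number of predicate calls observed at three points in time.
    public static List<int> Run()
    {
        var names = new List<string> { "Ann", "Roberto", "Li", "Sam" };
        int calls = 0;

        // Building the query does not run the condition.
        IEnumerable<string> query = names
            .Where(n => { calls++; return n.Length < 5; })
            .Skip(1);

        var log = new List<int>();
        log.Add(calls);            // 0: nothing has been enumerated yet

        query.ToList();            // forces enumeration
        log.Add(calls);            // 4: one call per name

        query.ToList();            // enumerating again repeats the work
        log.Add(calls);            // 8
        return log;
    }

    static void Main()
    {
        Console.WriteLine(string.Join(", ", Run())); // 0, 4, 8
    }
}
```

Note that the work is repeated on every enumeration; if the results are needed more than once, it is usually worth capturing them with ToList() or ToArray().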
The communication between methods may be something other than IEnumerable<T>. Consider the following:
students.OrderBy(s => s.Assignments.Count());
That seems straightforward enough—order the students based on the number of assignments each student has. What does the following do?
students
    .OrderBy(s => s.Assignments.Count())
    .ThenBy(s => s.Assignments.Average(a => a.Score));
It sorts first by the count of assignments, then uses the average score as a subsort for any items that are equal to the first. This is done through a clever private conversation between the OrderBy() and ThenBy() methods using the IOrderedEnumerable<T> interface, which allows the two orderings to be batched and applied together, with the count as the primary sort key and the average as the secondary sort key.
The effect is as if both of the selectors were written together:
students
    .OrderByMany(
        s => s.Assignments.Count(),
        s => s.Assignments.Average(a => a.Score));
and the single function does all the sorting,11 which is quite cool.
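To see why ThenBy() matters, compare it with simply calling OrderBy() twice. The sketch below uses a simplified, hypothetical StudentSummary class that stores the assignment count and average directly, so the sort keys are easy to read.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical flattened view of a student, used only for this example.
class StudentSummary
{
    public string Name { get; set; }
    public int AssignmentCount { get; set; }
    public double Average { get; set; }
}

static class OrderingDemo
{
    public static List<StudentSummary> Sample()
    {
        return new List<StudentSummary>
        {
            new StudentSummary { Name = "Ann", AssignmentCount = 3, Average = 90 },
            new StudentSummary { Name = "Bob", AssignmentCount = 2, Average = 75 },
            new StudentSummary { Name = "Cat", AssignmentCount = 3, Average = 80 },
            new StudentSummary { Name = "Dan", AssignmentCount = 2, Average = 95 }
        };
    }

    // Count is the primary key; average breaks ties.
    public static List<string> ByCountThenAverage(IEnumerable<StudentSummary> rows)
    {
        return rows
            .OrderBy(r => r.AssignmentCount)
            .ThenBy(r => r.Average)
            .Select(r => r.Name)
            .ToList();
    }

    // A second OrderBy() starts a new sort, so average becomes the primary key.
    public static List<string> TwoOrderByCalls(IEnumerable<StudentSummary> rows)
    {
        return rows
            .OrderBy(r => r.AssignmentCount)
            .OrderBy(r => r.Average)
            .Select(r => r.Name)
            .ToList();
    }

    static void Main()
    {
        Console.WriteLine(string.Join(", ", ByCountThenAverage(Sample()))); // Bob, Dan, Cat, Ann
        Console.WriteLine(string.Join(", ", TwoOrderByCalls(Sample())));    // Bob, Cat, Ann, Dan
    }
}
```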
As part of the goal to make Linq to SQL more SQL-like, the C# compiler supports an alternate syntax for Linq expressions. The following:
var x = students
    .Where(s => s.Name.Length < 5)
    .Select(s => new
    {
        Name = s.Name,
        Average = s.Assignments.Average(a => a.Score)
    });
can alternatively be expressed as follows:
var y = from s in students
        where s.Name.Length < 5
        select new
        {
            Name = s.Name,
            Average = s.Assignments.Average(a => a.Score)
        };
This is similar to SQL select syntax, but differs in the order in which the select, where, and from sections are specified.12
Note There is potential for confusion if both syntaxes are used in the same codebase. The standard C# approach makes more sense to me, and I find it easier to separate the Linq syntax from any SQL queries involved. However, this may not apply in Linq to SQL code, so the best approach may differ for different teams.
It’s simple to write additional sequence methods. The following is a subset method that skips the even-numbered items in a sequence:
private class EveryOtherEnumerator<T> : IEnumerable<T>
{
    IEnumerable<T> m_baseEnumerable;
    public EveryOtherEnumerator(IEnumerable<T> baseEnumerable)
    {
        m_baseEnumerable = baseEnumerable;
    }
    public IEnumerator<T> GetEnumerator()
    {
        int count = 0;
        foreach (T value in m_baseEnumerable)
        {
            if (count % 2 == 0)
            {
                yield return value;
            }
            count++;
        }
    }
    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}
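The class above still needs an extension method to make it callable in a chain. One possible shape, sketched here rather than taken from the listing, is to put the same loop in an iterator method directly, which lets the compiler generate the enumerable class. The names EveryOther() and SequenceExtensions are assumptions made for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class SequenceExtensions
{
    // Yields the 1st, 3rd, 5th, ... items and skips the even-numbered ones.
    public static IEnumerable<T> EveryOther<T>(this IEnumerable<T> values)
    {
        int count = 0;
        foreach (T value in values)
        {
            if (count % 2 == 0)
            {
                yield return value;
            }
            count++;
        }
    }
}

static class EveryOtherDemo
{
    static void Main()
    {
        int[] numbers = { 10, 20, 30, 40, 50 };

        // Used like any built-in sequence method, and chainable with them.
        Console.WriteLine(string.Join(", ", numbers.EveryOther()));                   // 10, 30, 50
        Console.WriteLine(string.Join(", ", numbers.Where(n => n > 10).EveryOther())); // 20, 40
    }
}
```

Like the built-in methods, this version is deferred: no items are read from the source until the result is enumerated.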
Sequence Method Reference
Linq provides a large set of methods to make your life easier. The following section groups them by function.
An aggregate method takes a sequence and produces a single value, causing the sequence to be enumerated. Aggregates can operate on the following data types:
In addition, nullable values of any of these types are permitted. The aggregate methods are shown in Table 28-1:
Table 28-1. Linq Aggregate Methods
The Aggregate() method can be used to construct custom aggregators, but in practice it is simpler to write a custom sequence method instead.
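As a rough comparison of the two approaches, the sketch below computes a product both ways: once with the Aggregate() overload that takes a seed and an accumulator function, and once with a hand-written sequence method. The Product() method and the AggregateDemo class are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class AggregateDemo
{
    // Custom aggregation built from Aggregate(): start at 1, multiply in each value.
    public static int ProductWithAggregate(IEnumerable<int> values)
    {
        return values.Aggregate(1, (product, value) => product * value);
    }

    // The same aggregation written as a custom sequence method.
    public static int Product(this IEnumerable<int> values)
    {
        int product = 1;
        foreach (int value in values)
        {
            product *= value;
        }
        return product;
    }

    static void Main()
    {
        int[] numbers = { 1, 2, 3, 4 };
        Console.WriteLine(ProductWithAggregate(numbers)); // 24
        Console.WriteLine(numbers.Product());             // 24
    }
}
```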
The transformational methods take a sequence of one type and transform it into a sequence of another type.
The Select() method applies the supplied expression to every item in the sequence, generating a new sequence that contains all the return values. For example:
students.Select(s => s.Name);
The SelectMany() method joins together a collection that exists on each item in the sequence into a single sequence. For example:
students.SelectMany(s => s.Assignments);
The Cast() method casts all of the items in a sequence to a specific type. For example:
List<object> objects = new List<object>();
objects.Add("A");
objects.Add("B");
objects.Add("C");
var sequenceOfString = objects.Cast<string>();
Each of the items in the objects list will be cast to the string type.
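Because Cast() is deferred like most sequence methods, a bad element is only discovered when the sequence is enumerated. The following sketch illustrates this with a list that contains a non-string; it also shows OfType(), a related method that is not covered above and that skips items of the wrong type instead of failing. The CastDemo class and its data are invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class CastDemo
{
    // Hypothetical list with one item that is not a string.
    public static List<object> Mixed()
    {
        return new List<object> { "A", "B", 3 };
    }

    // Returns true if enumerating the cast sequence throws.
    public static bool CastThrows(IEnumerable<object> objects)
    {
        // No exception here; the cast has not happened yet.
        IEnumerable<string> strings = objects.Cast<string>();
        try
        {
            strings.ToList();   // the cast of 3 to string fails during enumeration
            return false;
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }

    // OfType() keeps only the items that really are strings.
    public static List<string> OnlyStrings(IEnumerable<object> objects)
    {
        return objects.OfType<string>().ToList();
    }

    static void Main()
    {
        Console.WriteLine(CastThrows(Mixed()));                      // True
        Console.WriteLine(string.Join(", ", OnlyStrings(Mixed())));  // A, B
    }
}
```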
The ToArray() method returns an array that contains all of the items in the sequence. ToArray() will force enumeration of the sequence.
The ToDictionary() method takes a sequence of instances and adds those instances to a dictionary, using the specified value as the key for the item in the dictionary.
List<Student> students = new List<Student>();
Dictionary<string, Student> studentDictionary = students.ToDictionary(s => s.Name);
The Dictionary type requires that the key values are all unique. If the keys are not unique, the ToLookup() method may be a better choice. ToDictionary() will force enumeration of the sequence.
The ToList() method returns a List that contains all of the items in the sequence. ToList() will force enumeration of the sequence.
The ToLookup() method takes a sequence of instances and adds those instances to a lookup, using the specified value as the key for the item in the lookup. The Lookup class is similar to the Dictionary class, but instead of storing a single value, it stores an IEnumerable of that value.
The Lookup type stores a collection of values that are stored for each key, so it is a good choice if the key values are not unique. ToLookup() will force enumeration of the sequence.
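The difference shows up as soon as two items share a key. In this sketch the key is the first letter of a name; the data and the LookupDemo class are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class LookupDemo
{
    public static List<string> Names()
    {
        return new List<string> { "Ann", "Al", "Bob" };
    }

    // ToDictionary() rejects the second name that starts with 'A'.
    public static bool DictionaryRejectsDuplicates(IEnumerable<string> names)
    {
        try
        {
            names.ToDictionary(n => n[0]);
            return false;
        }
        catch (ArgumentException)
        {
            return true;
        }
    }

    // ToLookup() stores every name that shares a key.
    public static ILookup<char, string> ByFirstLetter(IEnumerable<string> names)
    {
        return names.ToLookup(n => n[0]);
    }

    static void Main()
    {
        Console.WriteLine(DictionaryRejectsDuplicates(Names()));   // True

        ILookup<char, string> lookup = ByFirstLetter(Names());
        Console.WriteLine(string.Join(", ", lookup['A']));         // Ann, Al
        Console.WriteLine(string.Join(", ", lookup['B']));         // Bob
        Console.WriteLine(lookup['Z'].Count());                    // 0: a missing key gives an empty sequence
    }
}
```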
The extraction methods are used to take a sequence and extract a single element from the sequence. For example, the First() method is used to take the first element of a sequence:
Student firstStudent = students.First();
If the sequence is empty, an exception will be thrown.
Some of the extraction methods have an “OrDefault” variant; this variant will return the default value for the type (i.e., default(T)) instead of throwing an exception.
Student firstStudent = students.FirstOrDefault();
It is sometimes desirable to filter a list before performing the extraction. For example, the following will return the first student with a name more than five characters in length:
students.Where(s => s.Name.Length > 5).First();
There is an additional overload to the First() method that allows this to be written more concisely13:
students.First(s => s.Name.Length > 5);
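The behaviors described above can be tried out in a few lines. This sketch uses a list of names rather than Student instances; the ExtractionDemo class and the data are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class ExtractionDemo
{
    public static List<string> Names()
    {
        return new List<string> { "Ann", "Roberto", "Li" };
    }

    // First() on an empty sequence throws InvalidOperationException.
    public static bool FirstThrowsOnEmpty()
    {
        var empty = new List<string>();
        try
        {
            empty.First();
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }

    static void Main()
    {
        List<string> names = Names();
        var empty = new List<string>();

        Console.WriteLine(names.First());                                   // Ann
        Console.WriteLine(names.First(n => n.Length > 5));                  // Roberto
        Console.WriteLine(FirstThrowsOnEmpty());                            // True
        Console.WriteLine(empty.FirstOrDefault() == null);                  // True: default(string) is null
        Console.WriteLine(names.FirstOrDefault(n => n.Length > 10) == null); // True: no name is that long
    }
}
```

When the element type is a value type such as int, the default value is 0 rather than null, so a result of 0 does not by itself tell you whether the sequence was empty.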
The extraction methods are shown in Table 28-2.
Table 28-2. Linq Extraction Methods
The subset methods are used to produce a sequence that is a subset of the original sequence. The most common of these is the Where() method. For example, the following generates all students who have names that are fewer than five characters in length:
var shortNameStudents = students.Where(s => s.Name.Length < 5);
The subset methods are shown in Table 28-3.
Table 28-3. Linq Subset Methods
Ordering methods are used to reorder the elements in a sequence. The ordering methods are listed in Table 28-4.
Table 28-4. Linq Ordering Methods
Whole sequence methods perform operations on a whole sequence. For example, the following produces the set union of two lists of integers:
int[] list1 = { 1, 3, 5, 7, 9 };
int[] list2 = { 1, 2, 3, 5, 8, 13 };
var unionOfLists = list1.Union(list2);
The whole sequence methods are shown in Table 28-5.
Table 28-5. Linq Whole Sequence Methods
Comparisons can be performed either using the default equality comparer or by specifying an equality comparer.
The order of the elements in the resulting sequence is not defined. All of these methods force enumeration of the sequence.
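Here is a short sketch of the set operations applied to the two lists used above. It assumes that Intersect() and Except() are among the methods in the table, and it sorts each result before printing so that the output does not depend on the order in which elements come back. The SetDemo class is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class SetDemo
{
    static readonly int[] list1 = { 1, 3, 5, 7, 9 };
    static readonly int[] list2 = { 1, 2, 3, 5, 8, 13 };

    // Sort the result so the output does not rely on any particular ordering.
    static List<int> Sorted(IEnumerable<int> values)
    {
        return values.OrderBy(n => n).ToList();
    }

    // Everything that appears in either list, with duplicates removed.
    public static List<int> Union()
    {
        return Sorted(list1.Union(list2));
    }

    // Only the values that appear in both lists.
    public static List<int> Intersection()
    {
        return Sorted(list1.Intersect(list2));
    }

    // The values in list1 that do not appear in list2.
    public static List<int> Difference()
    {
        return Sorted(list1.Except(list2));
    }

    static void Main()
    {
        Console.WriteLine(string.Join(", ", Union()));        // 1, 2, 3, 5, 7, 8, 9, 13
        Console.WriteLine(string.Join(", ", Intersection())); // 1, 3, 5
        Console.WriteLine(string.Join(", ", Difference()));   // 7, 9
    }
}
```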
The conditional methods evaluate a sequence and return a boolean value. For example, the Any() method can be used to determine if any students have a name longer than 30 characters in length:
bool veryLongNameStudents = students.Any(s => s.Name.Length > 30);
The conditional methods are shown in Table 28-6.
Table 28-6. Linq Conditional Methods
Comparisons can be performed either using the default equality comparer or by specifying an equality comparer.
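A few of the conditional methods in action, as a minimal sketch. It assumes All() and Contains() are listed in the table alongside Any(); the ConditionalDemo class and the names are made up.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class ConditionalDemo
{
    public static List<string> Names()
    {
        return new List<string> { "Ann", "Roberto", "Li" };
    }

    static void Main()
    {
        List<string> names = Names();
        var empty = new List<string>();

        Console.WriteLine(names.Any(n => n.Length > 30));  // False: no name is that long
        Console.WriteLine(names.All(n => n.Length > 0));   // True: every name has at least one character
        Console.WriteLine(names.Contains("Li"));           // True: uses the default equality comparer
        Console.WriteLine(empty.Any());                    // False: the sequence has no elements
        Console.WriteLine(empty.All(n => n.Length > 30));  // True: no element fails the condition
    }
}
```

The last line is worth remembering: All() on an empty sequence returns true, because there is no element that fails the test.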
Generator methods are used to generate sequences. The generator methods are shown in Table 28-7.
Table 28-7. Linq Generator Methods
Method | Description |
---|---|
Range() | Generates a sequence of integers. |
Repeat() | Returns a sequence that contains one element repeated a specific number of times. |
Empty() | Returns the empty sequence. |
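Unlike the other sequence methods, the generators are ordinary static methods on the Enumerable class, since there is no existing sequence to call them on. A minimal sketch (the GeneratorDemo class is a hypothetical name):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class GeneratorDemo
{
    static void Main()
    {
        // Range(start, count): the second argument is a count, not an end value.
        IEnumerable<int> oneToFive = Enumerable.Range(1, 5);
        Console.WriteLine(string.Join(", ", oneToFive));              // 1, 2, 3, 4, 5

        // Repeat(element, count)
        IEnumerable<string> xs = Enumerable.Repeat("x", 3);
        Console.WriteLine(string.Join(", ", xs));                     // x, x, x

        // Empty<T>() needs the element type spelled out.
        Console.WriteLine(Enumerable.Empty<int>().Count());           // 0

        // Generators combine with the other sequence methods as usual.
        Console.WriteLine(Enumerable.Range(1, 5).Sum(n => n * n));    // 55
    }
}
```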
The Join() method takes two sequences and joins them together using a specific key; this is the “inner join” method that is done by databases. Consider the following:
class StudentEmail
{
    public string Name { get; set; }
    public string EmailAddress { get; set; }
}
List<Student> students = ...
List<StudentEmail> studentEmails = ...
var joined = students.Join(studentEmails,
    s => s.Name,
    e => e.Name,
    (s, e) => new
    {
        Name = s.Name,
        EmailAddress = e.EmailAddress,
        Average = s.GetAverageScore()
    });
The Student and StudentEmail classes both store the student’s name, and the name can therefore be used to hook them together.14 The Join() specifies the second sequence, the two selectors that are used to determine what values to compare, and a function to create the resulting element.
Sample data for this join are listed in Tables 28-8 and 28-9.
Table 28-8. Student Sample Data
Name | Average |
---|---|
John | 20 |
Bob | 15 |
Sally | 18 |
Table 28-9. StudentEmail Sample Data
Name | Email Address |
---|---|
John | [email protected] |
Bob | [email protected] |
Tony | [email protected] |
The result of the join is shown in Table 28-10.
Table 28-10. Result of Join
Name | Average | Email Address |
---|---|---|
John | 20 | [email protected] |
Bob | 15 | [email protected] |
The result set contains only those items where the key values match in both sequences.
Inside of the Join(), it creates a Lookup of the keys in the second sequence and then uses that to find the values that match the key from the first sequence, so it’s reasonably efficient. Note that like the SQL operation, the Join() method is combinatorial; if there are two “John Smith” entries in the student sequence and three in the email sequence, the final sequence will contain six “John Smith” entries.
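The join and its combinatorial behavior can both be reproduced with a small program. This sketch uses a simplified Student class that stores the average directly, and the example.com email addresses are placeholders invented for the example; neither comes from the sample tables.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified for the example: the average is stored rather than computed.
class Student
{
    public string Name { get; set; }
    public double Average { get; set; }
}

class StudentEmail
{
    public string Name { get; set; }
    public string EmailAddress { get; set; }
}

static class JoinDemo
{
    public static List<string> InnerJoin(IEnumerable<Student> students, IEnumerable<StudentEmail> emails)
    {
        return students.Join(emails,
            s => s.Name,
            e => e.Name,
            (s, e) => s.Name + ", " + s.Average + ", " + e.EmailAddress)
            .ToList();
    }

    public static List<string> SampleJoin()
    {
        var students = new List<Student>
        {
            new Student { Name = "John", Average = 20 },
            new Student { Name = "Bob", Average = 15 },
            new Student { Name = "Sally", Average = 18 }
        };
        var emails = new List<StudentEmail>
        {
            new StudentEmail { Name = "John", EmailAddress = "john@example.com" },
            new StudentEmail { Name = "Bob", EmailAddress = "bob@example.com" },
            new StudentEmail { Name = "Tony", EmailAddress = "tony@example.com" }
        };
        return InnerJoin(students, emails);
    }

    // Two matching students and three matching emails produce 2 x 3 results.
    public static int DuplicateJoinCount()
    {
        var students = Enumerable.Repeat(new Student { Name = "John Smith", Average = 20 }, 2);
        var emails = Enumerable.Repeat(
            new StudentEmail { Name = "John Smith", EmailAddress = "jsmith@example.com" }, 3);
        return InnerJoin(students, emails).Count;
    }

    static void Main()
    {
        foreach (string line in SampleJoin())
        {
            Console.WriteLine(line);   // John and Bob only; Sally and Tony have no match
        }
        Console.WriteLine(DuplicateJoinCount()); // 6
    }
}
```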
Note If you are experienced with SQL, you are probably asking yourself “where are the other types of joins?” Only the inner join is supported directly through a sequence operator, although the same operations can be performed in other ways. If you are not experienced with SQL, don’t worry about it; this topic will be covered in much more detail in Chapter 30.
The GroupBy() method is used to take an entire sequence and group it into different buckets, based on a specific key value. Consider the following data:
List<Demographics> data = new List<Demographics>();
data.Add(new Demographics("Fred", 55, 98008, 55000));
data.Add(new Demographics("Barney", 58, 98052, 125000));
data.Add(new Demographics("Wilma", 38, 98008, 250000));
data.Add(new Demographics("Dino", 12, 98001, 12000));
data.Add(new Demographics("George", 55, 98001, 80000));
data.Add(new Demographics("Elroy", 8, 98008, 8000));
data.Add(new Demographics("Judy", 16, 98008, 18000));
data.Add(new Demographics("Jane", 48, 98008, 251000));
The GroupBy() can be used to bucket the data by zip code:
var x = data.GroupBy(d => d.ZipCode);
The result looks a lot like a Lookup structure; there is a separate entry for each unique key value, and all the elements that were grouped are stored under one of the keys.
The following can be used to output the result:
foreach (var group in x)
{
    Console.WriteLine(group.Key);
    foreach (var item in group)
    {
        Console.Write(" ");
        Console.WriteLine(item);
    }
}
Output:
98008
Fred, 55, 98008, 55000
Wilma, 38, 98008, 250000
Elroy, 8, 98008, 8000
Judy, 16, 98008, 18000
Jane, 48, 98008, 251000
98052
Barney, 58, 98052, 125000
98001
Dino, 12, 98001, 12000
George, 55, 98001, 80000
Once the data are grouped, they can be summarized with one of the aggregate methods:
var x = data
    .GroupBy(d => d.ZipCode)
    .Select(d => new
    {
        ZipCode = d.Key,
        AverageSalary = d.Average(d2 => d2.Salary)
    });
This results in a sequence of elements, each containing a zip code and the average salary of all the people in that zip code.
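The Demographics class is not shown in the listings, so the following sketch supplies a guess at its shape (name, age, zip code, salary, in constructor order) in order to run the grouping end to end. The property names other than ZipCode and Salary, and the GroupByDemo class, are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed definition; only ZipCode and Salary are named in the text.
class Demographics
{
    public Demographics(string name, int age, int zipCode, int salary)
    {
        Name = name;
        Age = age;
        ZipCode = zipCode;
        Salary = salary;
    }
    public string Name { get; private set; }
    public int Age { get; private set; }
    public int ZipCode { get; private set; }
    public int Salary { get; private set; }
}

static class GroupByDemo
{
    public static List<Demographics> Sample()
    {
        return new List<Demographics>
        {
            new Demographics("Fred", 55, 98008, 55000),
            new Demographics("Barney", 58, 98052, 125000),
            new Demographics("Wilma", 38, 98008, 250000),
            new Demographics("Dino", 12, 98001, 12000),
            new Demographics("George", 55, 98001, 80000),
            new Demographics("Elroy", 8, 98008, 8000),
            new Demographics("Judy", 16, 98008, 18000),
            new Demographics("Jane", 48, 98008, 251000)
        };
    }

    // Group by zip code, then reduce each group to its average salary.
    public static Dictionary<int, double> AverageSalaryByZip(IEnumerable<Demographics> data)
    {
        return data
            .GroupBy(d => d.ZipCode)
            .ToDictionary(g => g.Key, g => g.Average(d => d.Salary));
    }

    static void Main()
    {
        Dictionary<int, double> averages = AverageSalaryByZip(Sample());
        Console.WriteLine(averages[98008]); // 116400
        Console.WriteLine(averages[98052]); // 125000
        Console.WriteLine(averages[98001]); // 46000
    }
}
```

ToDictionary() is safe here because GroupBy() guarantees that each key appears exactly once in its output.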
1 More traditional for those of us who grew up with procedural languages.
2 The paradigm that Linq uses can easily be extended and applied to other areas.
3 You may have noticed that hard-coded int as the second type parameter. What would be very nice would be to write an average method that worked across all “numeric” types, but—unlike some other languages—C# and .NET don’t have a type system that makes that possible, so we’re stuck with separate overloads if we want to use different numeric types. We can kind of get around this by defining our own data types that implement an interface that defines the arithmetic operations, but it’s easier and cleaner to define the methods multiple times, one for each numeric type.
4 From what I can tell, there is no generally accepted term. From the language perspective, they’re just methods, and therefore they don’t have separate names.
5 This overloading of method names to do two different things is confusing. A name like CountWhere() makes more sense to me.
6 It would be clearer if the method was named Transform() rather than Select(), but the Select term came from the SQL world, so we’re stuck with it, unless you can go back to the 1970s and change history.
7 Technically, this is false. You can run ILDASM on the assembly and figure out what the name is, and then use that in your code, although it’s a really bad idea to do so, so bad that you should forget I mentioned it. In this case the name is <>f__AnonymousType0'@<'<Name>j__TPar','<Average>j__TPar'>.
8 There was considerable discussion during the design of Linq about ways to name the anonymous type (so it could be used as a function parameter), but we were unable to come up with a satisfactory solution.
9 More specifically, it creates an instance of a class that implements IEnumerable<Student> and returns that instance.
10 Some methods, such as Reverse(), require full enumeration to generate the list of items.
11 If you are curious, the method is defined as follows: public static IEnumerable<T> OrderByMany<T>(this IEnumerable<T> values, params Func<T, double>[] selector) Implementation is left as an exercise for the reader.
12 The order is rearranged so that IntelliSense can function in the select clause; in a normal SQL syntax order (select, from, where), IntelliSense cannot be provided in the select clause because the from clause has not yet been written.
13 The naming of this method is unfortunate; it would be clearer if this method were named FirstWhere(), so that it would be obviously distinct from First().
14 In the real world, it would be advisable to use something more unique, such as a student identification number.