CHAPTER 28

Linq to Objects

Linq is a programming paradigm that melds a declarative approach with the more traditional[1] procedural approach. Because the declarative approach is different from the procedural approach used in other C# code, it requires a different mental model. Until you have become comfortable with that new model, the code is going to be a bit puzzling at times, but hang in there. There is a lot to like in Linq; you can use it to write code that is smaller, easier to write and understand, and less likely to contain bugs.

There are three main parts of the Microsoft Linq offering[2]:

  • Linq to objects, which operates against collections and collection-like classes.
  • Linq to XML, which is used to perform XML processing.
  • Linq to SQL, which is used to execute database queries.

This chapter will cover Linq to objects, and the following chapters will cover the XML and SQL variants.

Getting Started with Linq to Objects

We’ve been volunteered to help a friend keep track of the grades for a class. Our first task is to compute the current average for each student, so we open up our laptop, start up Visual Studio, and begin coding:

using System.Collections.Generic;

class Assignment
{
    public string Name { get; set; }
    public int Score { get; set; }
}
class Student
{
    public Student()
    {
       Assignments = new List<Assignment>();
    }

    public string Name { get; set; }
    public List<Assignment> Assignments { get; set; }

    public double GetAverageScore()
    {
       double sum = 0;
       foreach (Assignment assignment in Assignments)
       {
            sum += assignment.Score;
       }

       return sum / Assignments.Count;
    }
}

That works fine. But it’s a bit annoying that we have to write a method to compute the average, and we certainly don’t want to have to do it multiple times. Let’s see if we can generalize it a bit. Here’s the starting point of a more general average method:

public static double AverageV1(List<Assignment> values)
{
    double sum = 0;
    foreach (Assignment assignment in values)
    {
       sum += assignment.Score;
    }

    return sum / values.Count;
}

This only works for the Assignment type. Is there a way to get rid of the assignment.Score reference in the foreach and make it work for any type? To do so, we will need a way of reaching into an instance and pulling out (or “selecting”) the data value to perform the average on. That is a perfect place for a bit of delegate/lambda magic:

public static double AverageV2(List<Assignment> values, Func<Assignment, int> selector)
{
    double sum = 0;
    foreach (Assignment value in values)
    {
       sum += selector(value);
    }

    return sum / values.Count;
}

System.Func<T, TResult> is a delegate that defines a function that takes an argument of input type T and returns a value of the output type TResult. By making it an argument to the Average method, we allow the caller to specify how to extract the value.

To call it, we’ll need to specify the function that pulls the score out of the assignment:

double average = Helpers.AverageV2(Assignments, a => a.Score);

The second parameter is a lambda that takes an assignment and returns a score. We are still wedded to the Assignment type, but perhaps we can make it generic:

public static double AverageV3<T>(List<T> values, Func<T, int> selector)
{
    double sum = 0;
    foreach (T value in values)
    {
       sum += selector(value);
    }

    return sum / values.Count;
}

By replacing the hard-coded uses of the Assignment type with a generic argument T, this method can now be used to get the average of any list.[3] There are a few more improvements we can make:

public static double AverageV4<T>(this IEnumerable<T> values, Func<T, int> selector)
{
    double sum = 0;
    int count = 0;
    foreach (T value in values)
    {
       sum += selector(value);
       count++;
    }

    return sum / count;
}

Making it an extension method makes it easier to call, and having it take an IEnumerable<T> rather than List<T> makes it work on any sequence of values. It can be called as follows:

double average = Assignments.AverageV4(a => a.Score);

At this point, our work is done; we have a method that does exactly what we want. And then a friend points us to the documentation for IEnumerable<T>, and we find that somebody else has already done this work for us, giving us a method that is used exactly like the one we just wrote:

double average = Assignments.Average(a => a.Score);

Such methods are what I call sequence methods,[4] and they are the basis for everything that Linq does.
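
As a quick check that the hand-written method and the built-in one agree, here is a minimal sketch that runs both over the same data. The SequenceHelpers and AverageComparison class names and the sample scores are assumptions made for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Assignment
{
    public string Name { get; set; }
    public int Score { get; set; }
}

static class SequenceHelpers
{
    // Same shape as AverageV4: walk any sequence, apply the selector, and average.
    public static double AverageV4<T>(this IEnumerable<T> values, Func<T, int> selector)
    {
        double sum = 0;
        int count = 0;
        foreach (T value in values)
        {
            sum += selector(value);
            count++;
        }

        return sum / count;
    }
}

static class AverageComparison
{
    public static bool ResultsMatch()
    {
        // Sample data invented for this sketch.
        var assignments = new List<Assignment>
        {
            new Assignment { Name = "Quiz 1", Score = 80 },
            new Assignment { Name = "Quiz 2", Score = 90 },
            new Assignment { Name = "Essay", Score = 100 },
        };

        double ours = assignments.AverageV4(a => a.Score);    // hand-written version
        double builtIn = assignments.Average(a => a.Score);   // System.Linq.Enumerable.Average

        return ours == 90 && builtIn == 90;
    }
}
```

One difference worth knowing about: on an empty sequence the hand-written version divides zero by zero and returns NaN, while the built-in Average() throws an InvalidOperationException.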

Filtering Data

Our next task is to count the number of missing assignments each student has. We write the following:

public int CountOfAssignmentsWithZeroScores()
{
    int count = 0;
    foreach (Assignment assignment in Assignments)
    {
       if (assignment.Score == 0)
       {
            count++;
       }
    }

    return count;
}

This sort of list traversal has been bread-and-butter code for years, and most programmers have written it thousands of times. It’s not a hard thing to do, but it is tedious to do it over and over. Linq provides an interesting alternative.

First, we need to filter the assignments to only those with a score of zero:

Assignments.Where(a => a.Score == 0);

and then we can get the count of those assignments:

Assignments.Where(a => a.Score == 0).Count();

The Where() method functions the same way a WHERE clause does in SQL; we start with all the assignments, but only those that pass the condition make it past the Where() method to be counted.

The previous statement can be written in a different way using another overload of the Count() method:

Assignments.Count(a => a.Score == 0);

In this version, we are counting all the items in the sequence that meet the condition.[5]

Transforming Data

Our next project is to produce a simple flat list that has the student’s name and average score. We’ll start by defining a class that can hold the data that we want:

class NameAndGrade
{
    public NameAndGrade(string name, double average)
    {
       Name = name;
       Average = average;
    }

    public string Name { get; set; }
    public double Average { get; set; }
}

Now that we have the class, we can create an expression that will generate a collection of NameAndGrade instances:

students.Select(s => new NameAndGrade(s.Name, s.GetAverageScore()));

This uses the Select() method, which takes the current sequence and converts it into a sequence of another type.[6] In this case, a sequence of Student is transformed into a sequence of NameAndGrade.

It’s more than a bit cumbersome to have to define a separate class to hold the new information when you use a Select() method. This is the sort of tedium that programmers hate and compilers are quite good at.

C# therefore provides support to generate a new class automatically that will hold the items that result from the select, so it is possible to simply write the following:

var nameAndAverages = students.Select(
    s => new {
       Name = s.Name,
       Average = s.GetAverageScore()
    });

In this case, the use of var is required, as there is no way to specify the name of the anonymous type that is created. The compiler does generate a class, but there’s no way to find the name of it.[7] That means that you can’t pass instances of the type to a method or return the type from a method; in those situations you must write the type yourself.[8]

It is of course possible to eliminate the call to the GetAverageScore() method and use another Linq expression in its place:

var nameAndAverages = students.Select(
       s => new
       {
            Name = s.Name,
            Average = s.Assignments.Average(a => a.Score)
       });

Stringing Sequences Together

The next task is to figure out the average score across all assignments. The obvious way to write it is as follows:

var a1 = students.Average(s => s.GetAverageScore());

This is incorrect; the average of all the averages may not be the overall average, since students might not have the same number of assignments. Perhaps selecting the Assignments collection can help:

var a2 = students.Select(s => s.Assignments);

This, unfortunately, is not a collection of assignments, but a collection of collections of assignments. What is needed is an operator that will enumerate across multiple collections and join them together into a single sequence. That is done with the SelectMany() method:

var a3 = students.SelectMany(s => s.Assignments).Average(a => a.Score);
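
To make the difference concrete, here is a small sketch with two students who have different numbers of assignments. The names and scores are invented for the example, and the Student class is trimmed to what the comparison needs.

```csharp
using System.Collections.Generic;
using System.Linq;

class Assignment
{
    public int Score { get; set; }
}

class Student
{
    public string Name { get; set; }
    public List<Assignment> Assignments { get; set; } = new List<Assignment>();
}

static class OverallAverageDemo
{
    // Invented sample data: one student with a single assignment, one with three.
    public static List<Student> SampleStudents()
    {
        var alice = new Student { Name = "Alice" };
        alice.Assignments.Add(new Assignment { Score = 100 });

        var bob = new Student { Name = "Bob" };
        bob.Assignments.Add(new Assignment { Score = 50 });
        bob.Assignments.Add(new Assignment { Score = 60 });
        bob.Assignments.Add(new Assignment { Score = 70 });

        return new List<Student> { alice, bob };
    }

    // Average of the per-student averages: (100 + 60) / 2 = 80.
    public static double AverageOfAverages(List<Student> students)
    {
        return students.Average(s => s.Assignments.Average(a => a.Score));
    }

    // Average across the flattened assignments: (100 + 50 + 60 + 70) / 4 = 70.
    public static double OverallAverage(List<Student> students)
    {
        return students.SelectMany(s => s.Assignments).Average(a => a.Score);
    }
}
```

The single high score counts for half of the first result but only a quarter of the second, which is why the two numbers disagree.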

Behind the Curtain

To understand all that can be done with Linq, it’s important to understand a few implementation details; how Linq does what it does. Consider the following expression:

var x = students.Where(s => s.Name.Length < 5).Skip(1).Average(s => s.Assignments.Average(a => a.Score));

That’s ugly. It will look better if formatted as follows:

var x = students
            .Where(s => s.Name.Length < 5)
            .Skip(1)
            .Average(s => s.Assignments.Average(a => a.Score));

The C# rule is that expressions are evaluated from left to right, so the operations will happen in the following order:

  1. Where() operates on all the students, creating a list of Student instances that meet the condition.
  2. Skip() takes the list, removes the first item, and passes it on.
  3. Average() takes the list, traverses all the items, and figures out the average.

In fact, that’s almost backward from how it works, because the Linq expressions deal with IEnumerable<T>, not List<T>. The actual sequence is as follows:

  1. Where() creates an enumerator,[9] which stores the lambda expression and the enumerator from students, implements IEnumerable<Student>, and returns this instance.
  2. Skip() creates an enumerator, which stores the enumerator returned by the Where() method, remembers how many items to skip and implements IEnumerable<Student>, and returns this instance.
  3. Average() stores the enumerator from Skip() and the lambda expression it gets and initializes count and sum variables. It then starts to calculate the average.
  4. The Average() enumerator asks the Skip() enumerator for an item.
  5. The Skip() enumerator asks the Where() enumerator for an item.
  6. The Where() enumerator asks the students list for an item. It repeats this until it finds an item that passes the condition. When it gets one, it passes it on.
  7. The Skip() enumerator checks the skip count it saved. If that count is greater than zero, it decrements the count, discards the item, and goes back to step 5. If not, the item is passed through.
  8. The Average() method takes the item, calls the lambda to get the value from it, and increments the count and sum variables.
  9. If there are no more items, Average() computes the average and returns it. If not, the process continues with step 4.
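
The pull-based sequence above can be sketched with a few hand-written iterator methods. This is only an illustration of the idea, not the actual Linq implementation; the MyWhere(), MySkip(), and MyAverage() names and the Log list are assumptions made for this example, and integers stand in for students to keep it short.

```csharp
using System;
using System.Collections.Generic;

static class PullChainSketch
{
    // Records the order in which each stage touches an item.
    public static readonly List<string> Log = new List<string>();

    public static IEnumerable<T> MyWhere<T>(this IEnumerable<T> source, Func<T, bool> predicate)
    {
        foreach (T item in source)
        {
            Log.Add("Where examines " + item);
            if (predicate(item))
            {
                yield return item;
            }
        }
    }

    public static IEnumerable<T> MySkip<T>(this IEnumerable<T> source, int count)
    {
        foreach (T item in source)
        {
            if (count > 0)
            {
                count--;
                Log.Add("Skip discards " + item);
                continue;
            }

            Log.Add("Skip passes " + item);
            yield return item;
        }
    }

    public static double MyAverage(this IEnumerable<int> source)
    {
        double sum = 0;
        int count = 0;
        foreach (int item in source)   // this loop is what drives the whole chain
        {
            Log.Add("Average consumes " + item);
            sum += item;
            count++;
        }

        return sum / count;
    }

    public static double Run()
    {
        Log.Clear();
        int[] values = { 1, 20, 3, 40, 5 };
        return values.MyWhere(n => n < 10).MySkip(1).MyAverage();   // average of 3 and 5
    }
}
```

After Run() returns, the log starts with "Where examines 1" followed by "Skip discards 1", and the first "Average consumes" entry appears before Where has looked at 40 or 5. Each item travels through the whole chain before the next one is requested.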

This chaining together of enumerators has a number of benefits:

  • It avoids intermediate lists and is therefore more efficient.[10]
  • It defers execution of an operation until it is enumerated.
  • It allows some interesting operations to be performed.

In the previous example, it was the Average() method that started the enumeration, and the reason it did so is that it needs all the values to calculate its return value. If the call to Average() were omitted:

var x = students
            .Where(s => s.Name.Length < 5)
            .Skip(1);

only steps 1 and 2 are executed, and therefore no operations have been performed. The value of x after execution of this statement is an IEnumerable<Student>, which will not do any work until somebody gets around to enumerating over it. This behavior is very important in Linq to SQL.
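
One way to see the deferral is to count how often the Where() lambda runs. The following is a minimal sketch under assumed sample data (a list of names rather than Student instances); the DeferredExecutionDemo class is hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

static class DeferredExecutionDemo
{
    // Returns { predicate calls before enumeration, calls after, result count }.
    public static int[] Run()
    {
        int predicateCalls = 0;
        var names = new List<string> { "Ann", "Bartholomew", "Cy", "Dee" };

        // Building the query runs no lambdas at all.
        IEnumerable<string> query = names
            .Where(n => { predicateCalls++; return n.Length < 5; })
            .Skip(1);

        int callsBefore = predicateCalls;         // still 0

        // ToList() enumerates the query, which finally does the work.
        List<string> results = query.ToList();    // "Cy" and "Dee"
        int callsAfter = predicateCalls;          // one call per name: 4

        return new[] { callsBefore, callsAfter, results.Count };
    }
}
```

A side effect of this design is that enumerating the same query a second time runs the lambda all over again, so it is worth calling ToList() or ToArray() once if the results will be used repeatedly.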

The communication between methods may be something other than IEnumerable<T>. Consider the following:

students.OrderBy(s => s.Assignments.Count());

That seems straightforward enough—order the students based on the number of assignments each student has. What does the following do?

students
    .OrderBy(s => s.Assignments.Count())
    .ThenBy(s => s.Assignments.Average(a => a.Score));

It sorts first by the count of assignments, then uses the average score as a subsort for any items that are equal to the first. This is done through a clever private conversation between the OrderBy() and ThenBy() methods using the IOrderedEnumerable<T> interface, which allows the two orderings to be batched and applied together, with the count as the primary sort key and the average as the secondary sort key.

The effect is as if both of the selectors were written together:

students
    .OrderByMany(
       s => s.Assignments.Count(),
       s => s.Assignments.Average(a => a.Score));

and the single function does all the sorting,[11] which is quite cool.
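
Here is a small sketch of why ThenBy() matters, comparing it with simply calling OrderBy() twice. The StudentSummary class and its data are hypothetical stand-ins for the Student class, flattened so that the sort keys are easy to read.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical flattened view of a student, used only for this illustration.
class StudentSummary
{
    public string Name { get; set; }
    public int AssignmentCount { get; set; }
    public double AverageScore { get; set; }
}

static class OrderingDemo
{
    static List<StudentSummary> Sample()
    {
        return new List<StudentSummary>
        {
            new StudentSummary { Name = "Ann", AssignmentCount = 3, AverageScore = 70 },
            new StudentSummary { Name = "Bob", AssignmentCount = 2, AverageScore = 90 },
            new StudentSummary { Name = "Cy",  AssignmentCount = 3, AverageScore = 60 },
            new StudentSummary { Name = "Dee", AssignmentCount = 2, AverageScore = 80 },
        };
    }

    // Count is the primary key; average only breaks ties within the same count.
    public static string OrderByThenBy()
    {
        var names = Sample()
            .OrderBy(s => s.AssignmentCount)
            .ThenBy(s => s.AverageScore)
            .Select(s => s.Name);
        return string.Join(", ", names);   // Dee, Bob, Cy, Ann
    }

    // The second OrderBy() starts over and re-sorts everything by average.
    public static string OrderByTwice()
    {
        var names = Sample()
            .OrderBy(s => s.AssignmentCount)
            .OrderBy(s => s.AverageScore)
            .Select(s => s.Name);
        return string.Join(", ", names);   // Cy, Ann, Dee, Bob
    }
}
```

ThenBy() is only available on the IOrderedEnumerable<T> that OrderBy() returns, which is how the compiler keeps you from calling it on a sequence that has no primary ordering.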

Query Expressions

As part of the goal to make Linq to SQL more SQL-like, the C# compiler supports an alternate syntax for Linq expressions. The following:

var x = students
            .Where(s => s.Name.Length < 5)
            .Select(s => new
            {
                Name = s.Name,
                Average = s.Assignments.Average(a => a.Score)
            });

can alternatively be expressed as follows:

var y = from s in students
       where s.Name.Length < 5
       select new
       {
            Name = s.Name,
            Average = s.Assignments.Average(a => a.Score)
       };

This is similar to SQL select syntax, but differs in the order in which the select, where, and from sections are specified.[12]

Note  There is potential for confusion if both syntaxes are used in the same codebase. The standard C# approach makes more sense to me, and I find it easier to separate the Linq syntax from any SQL queries involved. However, this may not apply in Linq to SQL code, so the best approach may differ for different teams.

A Sequence Method of Your Own

It’s simple to write additional sequence methods. The following class is the core of a subset method that returns every other item in a sequence (the first, third, fifth, and so on), skipping the even-numbered items:

private class EveryOtherEnumerator<T> : IEnumerable<T>
{
    IEnumerable<T> m_baseEnumerable;

    public EveryOtherEnumerator(IEnumerable<T> baseEnumerable)
    {
       m_baseEnumerable = baseEnumerable;
    }

    public IEnumerator<T> GetEnumerator()
    {
       int count = 0;
       foreach (T value in m_baseEnumerable)
       {
            if (count % 2 == 0)
            {
                yield return value;
            }
            count++;
       }
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
       return GetEnumerator();
    }
}
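
The listing shows the enumerable class but not how a caller would reach it. One plausible way to expose the same behavior is an extension method written as an iterator, in which case the compiler generates the class for us. The EveryOther() name and the EveryOtherExtensions class are assumptions made for this sketch.

```csharp
using System.Collections.Generic;

static class EveryOtherExtensions
{
    // Yields the items at positions 0, 2, 4, ... and skips the ones in between.
    public static IEnumerable<T> EveryOther<T>(this IEnumerable<T> source)
    {
        int count = 0;
        foreach (T value in source)
        {
            if (count % 2 == 0)
            {
                yield return value;
            }
            count++;
        }
    }
}
```

It can then be chained like any other sequence method; for example, new[] { "a", "b", "c", "d", "e" }.EveryOther() produces a, c, and e, and nothing is enumerated until the result is.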

Sequence Method Reference

Linq provides a large set of methods to make your life easier. The following section groups them by function.

Aggregate Methods

An aggregate method takes a sequence and produces a single value, causing the sequence to be enumerated. Aggregates can operate on the following data types:

  • Decimal
  • Double
  • Float
  • Int
  • Long

In addition, nullable values of any of these types are permitted. The aggregate methods are shown in Table 28-1:

Table 28-1. Linq Aggregate Methods

Method       Description
Average()    Returns the average of the values.
Count()      Returns the count of the values.
Min()        Returns the minimum of the values.
Max()        Returns the maximum of the values.
Sum()        Returns the sum of the values.
Aggregate()  Computes a general aggregate.

The Aggregate() method can be used to construct custom aggregators, but in practice it is often simpler to write a custom sequence method instead.
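
For reference, here is a short sketch of two common Aggregate() overloads, one with a seed value and one without. The AggregateDemo class and its inputs are made up for the example.

```csharp
using System.Linq;

static class AggregateDemo
{
    // Seeded overload: start at 1 and multiply each value into the accumulator.
    public static int Product(int[] values)
    {
        return values.Aggregate(1, (accumulator, n) => accumulator * n);
    }

    // Unseeded overload: the first element is the initial accumulator value.
    public static string Longest(string[] names)
    {
        return names.Aggregate((best, next) => next.Length > best.Length ? next : best);
    }
}
```

With these, AggregateDemo.Product(new[] { 1, 2, 3, 4 }) is 24 and AggregateDemo.Longest(new[] { "Bob", "Sally", "John" }) is "Sally". The unseeded overload throws an InvalidOperationException on an empty sequence, since there is nothing to start from.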

Transformational Methods

The transformational methods take a sequence of one type and transform it into a sequence of another type.

Select()

The Select() method applies the supplied expression to every item in the sequence, generating a new sequence that contains all the return values. For example:

students.Select(s => s.Name);

SelectMany()

The SelectMany() method joins together a collection that exists on each item in the sequence into a single sequence. For example:

students.SelectMany(s => s.Assignments);

Cast()

The Cast() method casts all of the items in a sequence to a specific type. For example:

List<object> objects = new List<object>();
objects.Add("A");
objects.Add("B");
objects.Add("C");
var sequenceOfString = objects.Cast<string>();

Each of the items in the objects list will be cast to the string type.

ToArray()

The ToArray() method returns an array that contains all of the items in the sequence. ToArray() will force enumeration of the sequence.

ToDictionary()

The ToDictionary() method takes a sequence of instances and adds those instances to a dictionary, using the specified value as the key for the item in the dictionary.

List<Student> students = new List<Student>();
Dictionary<string, Student> studentDictionary = students.ToDictionary(s => s.Name);

The Dictionary type requires that the key values are all unique. If the keys are not unique, the ToLookup() method may be a better choice. ToDictionary() will force enumeration of the sequence.

ToList()

The ToList() method returns a List that contains all of the items in the sequence. ToList() will force enumeration of the sequence.

ToLookup()

The ToLookup() method takes a sequence of instances and adds those instances to a lookup, using the specified value as the key for the item in the lookup. The Lookup class is similar to the Dictionary class, but instead of storing a single value, it stores an IEnumerable of that value.

The Lookup type stores a collection of values that are stored for each key, so it is a good choice if the key values are not unique. ToLookup() will force enumeration of the sequence.
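
A brief sketch of the difference, assuming a trimmed-down Student class with a hypothetical Id property and sample data in which two students share a name:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Student
{
    public string Name { get; set; }
    public int Id { get; set; }   // hypothetical property, added for this example
}

static class LookupDemo
{
    public static List<Student> Sample()
    {
        return new List<Student>
        {
            new Student { Name = "John", Id = 1 },
            new Student { Name = "Sally", Id = 2 },
            new Student { Name = "John", Id = 3 },
        };
    }

    // A lookup keeps every student under the name, so duplicates are fine.
    public static int CountByName(string name)
    {
        ILookup<string, Student> byName = Sample().ToLookup(s => s.Name);
        return byName[name].Count();   // a missing key gives an empty sequence
    }

    // A dictionary allows one value per key, so the second "John" is rejected.
    public static bool ToDictionaryRejectsDuplicates()
    {
        try
        {
            Sample().ToDictionary(s => s.Name);
            return false;
        }
        catch (ArgumentException)
        {
            return true;
        }
    }
}
```

Note that indexing a lookup with a key that is not present returns an empty sequence rather than throwing, which is another difference from Dictionary.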

Extraction Methods

The extraction methods are used to take a sequence and extract a single element from the sequence. For example, the First() method is used to take the first element of a sequence:

Student firstStudent = students.First();

If the sequence is empty, an exception will be thrown.

Some of the extraction methods have an “OrDefault” variant; this variant will return the default value for the type (i.e., default(T)) instead of throwing an exception.

Student firstStudent = students.FirstOrDefault();

It is sometimes desirable to filter a list before performing the extraction. For example, the following will return the first student with a name more than five characters in length:

students.Where(s => s.Name.Length > 5).First();

There is an additional overload to the First() method that allows this to be written more concisely[13]:

students.First(s => s.Name.Length > 5);

The extraction methods are shown in Table 28-2.

Table 28-2. Linq Extraction Methods

Method                Description
First()               Returns the first element that matches a condition (or with no condition) or throws an exception if there is no element.
FirstOrDefault()      Returns the first element that matches a condition (or with no condition) or a default value if there is no element.
Last()                Returns the last element that matches a condition (or with no condition) or throws an exception if there is no element.
LastOrDefault()       Returns the last element that matches a condition (or with no condition) or a default value if there is no element.
Single()              Returns the only element that matches a condition (or with no condition); throws an exception if there is no such element or more than one.
SingleOrDefault()     Returns the only element that matches a condition (or with no condition), or a default value if there is no such element; throws an exception if there is more than one.
ElementAt()           Returns the element at the specified index in the sequence or throws an exception if the element does not exist.
ElementAtOrDefault()  Returns the element at the specified index in the sequence or a default value if the element does not exist.
DefaultIfEmpty()      Returns the sequence if there are elements in the sequence; otherwise returns a sequence containing a single element with the default value.
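
The differences between these methods are easiest to see side by side. The following sketch uses small integer arrays rather than students; the ExtractionDemo class and its values are invented for the illustration.

```csharp
using System;
using System.Linq;

static class ExtractionDemo
{
    public static readonly int[] Empty = new int[0];
    public static readonly int[] Values = { 4, 8, 15 };

    // First element greater than 5.
    public static int FirstMatch() => Values.First(n => n > 5);

    // default(int) is 0, so an empty sequence gives 0 instead of an exception.
    public static int FirstOrDefaultOnEmpty() => Empty.FirstOrDefault();

    // Index 10 does not exist, so the default value comes back.
    public static int ElementAtOrDefaultPastEnd() => Values.ElementAtOrDefault(10);

    // An empty sequence becomes a one-element sequence holding default(int).
    public static int[] DefaultIfEmptyOnEmpty() => Empty.DefaultIfEmpty().ToArray();

    // Helper: reports whether an extraction throws InvalidOperationException.
    public static bool Throws(Func<int> extraction)
    {
        try
        {
            extraction();
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }
}
```

With this helper, Throws(() => ExtractionDemo.Empty.First()) and Throws(() => ExtractionDemo.Values.Single()) are both true. Throws(() => ExtractionDemo.Values.SingleOrDefault()) is also true, because the "OrDefault" variant only covers the empty case, not the more-than-one case.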

Subset Methods

The subset methods are used to produce a sequence that is a subset of the original sequence. The most common of these is the Where() method. For example, the following generates all students who have names that are fewer than five characters in length:

var shortNameStudents = students.Where(s => s.Name.Length < 5);

The subset methods are shown in Table 28-3.

Table 28-3. Linq Subset Methods

Method       Description
Where()      Returns the elements of the sequence that match the specified condition.
Distinct()   Returns the sequence with duplicate elements removed, either using the default equality comparer or a specified equality comparer.
OfType()     Returns the elements of the sequence that are of the specified type.
Skip()       Skips n elements at the beginning of the sequence, and returns the remainder of the sequence.
SkipWhile()  Skips all elements at the beginning of the sequence that match the condition, and returns the remainder of the sequence.
Take()       Returns n elements at the beginning of the sequence, then skips the remainder of the sequence.
TakeWhile()  Returns all elements at the beginning of the sequence that match the condition, then skips the remainder of the sequence.

Ordering Methods

Ordering methods are used to reorder the elements in a sequence. The ordering methods are listed in Table 28-4.

Table 28-4. Linq Ordering Methods

Method               Description
OrderBy()            Orders the elements in a sequence in ascending order according to a key or according to a comparer.
OrderByDescending()  Orders the elements in a sequence in descending order according to a key or according to a comparer.
Reverse()            Returns a sequence of elements in the reverse order.
ThenBy()             Orders any elements that were equal by a previous ordering method in ascending order according to a key or according to a comparer.
ThenByDescending()   Orders any elements that were equal by a previous ordering method in descending order according to a key or according to a comparer.

Whole Sequence Methods

Whole sequence methods perform operations on a whole sequence. For example, the following produces the set union of two lists of integers:

int[] list1 = { 1, 3, 5, 7, 9 };
int[] list2 = { 1, 2, 3, 5, 8, 13 };
var unionOfLists = list1.Union(list2);

The whole sequence methods are shown in Table 28-5.

Table 28-5. Linq Whole Sequence Methods

Method       Description
Intersect()  Returns the set intersection of two sequences.
Union()      Returns the set union of two sequences.
Except()     Returns the set difference between two sequences.
Concat()     Returns the concatenated elements of two sequences.
Zip()        Traverses two sequences in parallel, returning a new sequence that depends on the corresponding elements in each sequence.

Comparisons can be performed either using the default equality comparer or by specifying an equality comparer.

Avoid relying on the order of the elements that the set methods (Intersect(), Union(), and Except()) produce. Like most sequence methods, these methods are deferred; nothing is enumerated until the resulting sequence is.

Conditional Methods

The conditional methods evaluate a sequence and return a boolean value. For example, the Any() method can be used to determine if any students have a name longer than 30 characters in length:

bool veryLongNameStudents = students.Any(s => s.Name.Length > 30);

The conditional methods are shown in Table 28-6.

Table 28-6. Linq Conditional Methods

Method           Description
All()            Returns true if the condition is true for all of the elements in the sequence.
Any()            Returns true if the condition is true for any of the elements in the sequence.
Contains()       Returns true if the sequence contains the specified element.
SequenceEqual()  Returns true if the two sequences are equal.

Comparisons can be performed either using the default equality comparer or by specifying an equality comparer.

Generator Methods

Generator methods are used to generate sequences. The generator methods are shown in Table 28-7.

Table 28-7. Linq Generator Methods

Method    Description
Range()   Generates a sequence of integers.
Repeat()  Returns a sequence that contains one element repeated a specific number of times.
Empty()   Returns the empty sequence.

Join()

The Join() method takes two sequences and joins them together using a specific key; this is the “inner join” method that is done by databases. Consider the following:

class StudentEmail
{
    public string Name { get; set; }
    public string EmailAddress { get; set; }
}

List<Student> students = ...
List<StudentEmail> studentEmails = ...

var joined = students.Join(studentEmails,
    s => s.Name,
    e => e.Name,
    (s, e) => new
    {
        Name = s.Name,
        EmailAddress = e.EmailAddress,
        Average = s.GetAverageScore()
    });

The Student and StudentEmail classes both store the student’s name, and the name can therefore be used to hook them together.[14] The Join() call specifies the second sequence, the two selectors that are used to determine what values to compare, and a function to create the resulting element.

Sample data for this join are listed in Tables 28-8 and 28-9.

Table 28-8. Student Sample Data

Name   Average
John   20
Bob    15
Sally  18

Table 28-9. StudentEmail Sample Data

Name  Email Address
John  [email protected]
Bob   [email protected]
Tony  [email protected]

The result of the join is shown in Table 28-10.

Table 28-10. Result of Join

Name  Average  Email Address
John  20       [email protected]
Bob   15       [email protected]

The result set contains only those items where the key values match in both sequences.

Internally, Join() creates a Lookup of the keys in the second sequence and then uses that to find the values that match the key from the first sequence, so it’s reasonably efficient. Note that, like the SQL operation, the Join() method is combinatorial; if there are two “John Smith” entries in the student sequence and three in the email sequence, the final sequence will contain six “John Smith” entries.

Note  If you are experienced with SQL, you are probably asking yourself “where are the other types of joins?” Only the inner join is supported directly through a sequence operator, although the same operations can be performed in other ways. If you are not experienced with SQL, don’t worry about it; this topic will be covered in much more detail in Chapter 30.
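
As one illustration of "other ways," a left outer join can be pieced together from GroupJoin(), SelectMany(), and DefaultIfEmpty(). This is a sketch of a common pattern rather than the approach the later chapter takes; the e-mail addresses are placeholders at example.com, and the classes are trimmed to the properties the join needs.

```csharp
using System.Collections.Generic;
using System.Linq;

class Student
{
    public string Name { get; set; }
}

class StudentEmail
{
    public string Name { get; set; }
    public string EmailAddress { get; set; }
}

static class LeftJoinDemo
{
    public static List<string> Run()
    {
        var students = new List<Student>
        {
            new Student { Name = "John" },
            new Student { Name = "Bob" },
            new Student { Name = "Sally" },
        };

        // Placeholder addresses, invented for this sketch.
        var studentEmails = new List<StudentEmail>
        {
            new StudentEmail { Name = "John", EmailAddress = "john@example.com" },
            new StudentEmail { Name = "Bob", EmailAddress = "bob@example.com" },
            new StudentEmail { Name = "Tony", EmailAddress = "tony@example.com" },
        };

        return students
            // Pair each student with all matching e-mail entries (possibly none).
            .GroupJoin(studentEmails,
                s => s.Name,
                e => e.Name,
                (s, matches) => new { Student = s, Matches = matches })
            // DefaultIfEmpty() supplies a null entry when a student has no match,
            // so that student still appears in the output.
            .SelectMany(
                x => x.Matches.DefaultIfEmpty(),
                (x, e) => x.Student.Name + ": " + (e == null ? "(no email)" : e.EmailAddress))
            .ToList();
    }
}
```

Unlike the inner join, Sally appears in the result (as "Sally: (no email)"), while Tony, who has no matching student, is still left out.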

GroupBy()

The GroupBy() method is used to take an entire sequence and group it into different buckets, based on a specific key value. Consider the following data:

List<Demographics> data = new List<Demographics>();

data.Add(new Demographics("Fred",   55, 98008, 55000));
data.Add(new Demographics("Barney", 58, 98052, 125000));
data.Add(new Demographics("Wilma", 38, 98008, 250000));
data.Add(new Demographics("Dino",   12, 98001, 12000));
data.Add(new Demographics("George", 55, 98001, 80000));
data.Add(new Demographics("Elroy",   8, 98008,   8000));
data.Add(new Demographics("Judy",   16, 98008, 18000));
data.Add(new Demographics("Jane",   48, 98008, 251000));

The GroupBy() can be used to bucket the data by zip code:

var x = data.GroupBy(d => d.ZipCode);

The result looks a lot like a Lookup structure; there is a different group for each unique value of the key, and all the elements that were grouped are stored under one of the keys.

The following can be used to output the result:

foreach (var group in x)
{
    Console.WriteLine(group.Key);
    foreach (var item in group)
    {
       Console.Write("    ");
       Console.WriteLine(item);
    }
}

Output:

98008
    Fred, 55, 98008, 55000
    Wilma, 38, 98008, 250000
    Elroy, 8, 98008, 8000
    Judy, 16, 98008, 18000
    Jane, 48, 98008, 251000
98052
    Barney, 58, 98052, 125000
98001
    Dino, 12, 98001, 12000
    George, 55, 98001, 80000

Once the data are grouped, they can be summarized with one of the aggregate methods:

var x = data
          .GroupBy(d => d.ZipCode)
          .Select(d => new
             {
                  ZipCode = d.Key,
                  AverageSalary = d.Average(d2 => d2.Salary)
             });

This results in a sequence of elements, each containing a zip code and the average salary of all the people in that zip code.
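
The Demographics class is not shown in the listings above, so here is one plausible definition along with the grouped averages worked out for the sample data. The Name and Age property names, the constructor parameter names, and the ToString() format are assumptions inferred from the constructor calls and the printed output; ZipCode and Salary come from the queries above. The sketch collects the averages into a dictionary instead of an anonymous type so that the values are easy to inspect.

```csharp
using System.Collections.Generic;
using System.Linq;

// Assumed shape of the Demographics class, inferred from how it is used above.
class Demographics
{
    public Demographics(string name, int age, int zipCode, int salary)
    {
        Name = name;
        Age = age;
        ZipCode = zipCode;
        Salary = salary;
    }

    public string Name { get; }
    public int Age { get; }
    public int ZipCode { get; }
    public int Salary { get; }

    // Matches the "Fred, 55, 98008, 55000" format shown in the output.
    public override string ToString()
    {
        return Name + ", " + Age + ", " + ZipCode + ", " + Salary;
    }
}

static class GroupByDemo
{
    public static List<Demographics> Sample()
    {
        return new List<Demographics>
        {
            new Demographics("Fred", 55, 98008, 55000),
            new Demographics("Barney", 58, 98052, 125000),
            new Demographics("Wilma", 38, 98008, 250000),
            new Demographics("Dino", 12, 98001, 12000),
            new Demographics("George", 55, 98001, 80000),
            new Demographics("Elroy", 8, 98008, 8000),
            new Demographics("Judy", 16, 98008, 18000),
            new Demographics("Jane", 48, 98008, 251000),
        };
    }

    // Group by zip code, then average the salaries within each group.
    public static Dictionary<int, double> AverageSalaryByZipCode()
    {
        return Sample()
            .GroupBy(d => d.ZipCode)
            .ToDictionary(g => g.Key, g => g.Average(d => d.Salary));
    }
}
```

For this data the averages work out to 116400 for 98008 (582000 divided by 5), 125000 for 98052, and 46000 for 98001.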

[1] More traditional for those of us who grew up with procedural languages.

[2] The paradigm that Linq uses can easily be extended and applied to other areas.

[3] You may have noticed that hard-coded int as the second type parameter. What would be very nice would be to write an average method that worked across all “numeric” types, but, unlike some other languages, C# and .NET don’t have a type system that makes that possible, so we’re stuck with separate overloads if we want to use different numeric types. We can kind of get around this by defining our own data types that implement an interface that defines the arithmetic operations, but it’s easier and cleaner to define the methods multiple times, one for each numeric type.

[4] From what I can tell, there is no generally accepted term. From the language perspective, they’re just methods, and therefore they don’t have separate names.

[5] This overloading of method names to do two different things is confusing. A name like CountWhere() makes more sense to me.

[6] It would be clearer if the method was named Transform() rather than Select(), but the Select term came from the SQL world, so we’re stuck with it, unless you can go back to the 1970s and change history.

[7] Technically, this is false. You can run ILDASM on the assembly and figure out what the name is, and then use that in your code, although it’s a really bad idea to do so, so bad that you should forget I mentioned it. In this case the name is <>f__AnonymousType0'@<'<Name>j__TPar','<Average>j__TPar'>.

[8] There was considerable discussion during the design of Linq about ways to name the anonymous type (so it could be used as a function parameter), but we were unable to come up with a satisfactory solution.

[9] More specifically, it creates an instance of a class that implements IEnumerable<Student> and returns that instance.

[10] Some methods, such as Reverse(), require full enumeration to generate the list of items.

[11] If you are curious, the method is defined as follows:

public static IEnumerable<T> OrderByMany<T>(this IEnumerable<T> values, params Func<T, double>[] selector)

Implementation is left as an exercise for the reader.

[12] The order is rearranged so that IntelliSense can function in the select clause; in a normal SQL syntax order (select, from, where), IntelliSense cannot be provided in the select clause because the from clause has not yet been written.

[13] The naming of this method is unfortunate; it would be clearer if this method were named FirstWhere(), so that it would be obviously distinct from First().

[14] In the real world, it would be advisable to use something more unique, such as a student identification number.
