Chapter 21. Querying in-memory data by using query expressions

After completing this chapter, you will be able to:

Describe what Language-Integrated Query (LINQ) is and the problem it solves.
Use the LINQ extension methods and query operators to examine the contents of enumerable collections.
Explain how LINQ defers evaluation of a query, and how you can force immediate execution and cache the results of a LINQ query.

You have now met most of the features of the C# language. However, so far I have glossed over one important aspect of the language that is likely to be used by many applications: the support that C# provides for querying data. You have seen that you can define structures and classes for modeling data and that you can use collections and arrays for temporarily storing data in memory. However, how do you perform common tasks such as searching for items in a collection that match a specific set of criteria? For example, if you have a collection of Customer objects, how do you find all customers that are located in London, or how can you find out which town has the most customers that have procured your services? You can write your own code to iterate through a collection and examine the fields in each object, but these types of tasks occur so often that the designers of C# decided to include features in the language to minimize the amount of code you need to write. In this chapter, you will learn how to use these advanced C# language features to query and manipulate data.

What is Language-Integrated Query?

All but the most trivial of applications need to process data. Historically, most applications provided their own logic for performing these operations. However, this strategy can lead to the code in an application becoming very tightly coupled with the structure of the data that it processes. If the data structures change, you might need to make a significant number of changes to the code that handles the data. The designers of the Microsoft .NET Framework thought long and hard about these issues and decided to make the life of an application developer easier by providing features that abstract the mechanism that an application uses to query data from application code itself. These features are called Language-Integrated Query, or LINQ.

The creators of LINQ took an unabashed look at the way in which relational database management systems such as Microsoft SQL Server separate the language used to query a database from the internal format of the data in the database. Developers accessing a SQL Server database issue Structured Query Language (SQL) statements to the database management system. SQL provides a high-level description of the data that the developer wants to retrieve but does not indicate exactly how the database management system should retrieve this data. These details are controlled by the database management system itself. Consequently, an application that invokes SQL statements does not care how the database management system physically stores or retrieves data. The format used by the database management system can change (for example, if a new version is released) without the application developer needing to modify the SQL statements used by the application.

LINQ provides syntax and semantics very reminiscent of SQL, and with many of the same advantages. You can change the underlying structure of the data being queried without needing to change the code that actually performs the queries. You should be aware that although LINQ looks similar to SQL, it is far more flexible and can handle a wider variety of logical data structures. For example, LINQ can handle data organized hierarchically, such as that found in an XML document. However, this chapter concentrates on using LINQ in a relational manner.

Using LINQ in a C# application

Perhaps the easiest way to explain how to use the C# features that support LINQ is to work through some simple examples based on the following sets of customer and address information:

Table 21-1. Customer Information

CustomerID	FirstName	LastName	CompanyName
1	Kim	Abercrombie	Alpine Ski House
2	Jeff	Hay	Coho Winery
3	Charlie	Herb	Alpine Ski House
4	Chris	Preston	Trey Research
5	Dave	Barnett	Wingtip Toys
6	Ann	Beebe	Coho Winery
7	John	Kane	Wingtip Toys
8	David	Simpson	Trey Research
9	Greg	Chapman	Wingtip Toys
10	Tim	Litton	Wide World Importers

Table 21-2. Address Information

CompanyName	City	Country
Alpine Ski House	Berne	Switzerland
Coho Winery	San Francisco	United States
Trey Research	New York	United States
Wingtip Toys	London	United Kingdom
Wide World Importers	Tetbury	United Kingdom

LINQ requires the data to be stored in a data structure that implements the IEnumerable or IEnumerable<T> interface, as described in Chapter 19. It does not matter what structure you use (an array, a HashSet<T>, a Queue<T>, or any of the other collection types, or even one that you define yourself) as long as it is enumerable. However, to keep things straightforward, the examples in this chapter assume that the customer and address information is held in the customers and addresses arrays shown in the following code example.

Note

In a real-world application, you would populate these arrays by reading the data from a file or a database.

var customers = new[] {
    new { CustomerID = 1, FirstName = "Kim", LastName = "Abercrombie",
          CompanyName = "Alpine Ski House" },
    new { CustomerID = 2, FirstName = "Jeff", LastName = "Hay",
          CompanyName = "Coho Winery" },
    new { CustomerID = 3, FirstName = "Charlie", LastName = "Herb",
          CompanyName = "Alpine Ski House" },
    new { CustomerID = 4, FirstName = "Chris", LastName = "Preston",
          CompanyName = "Trey Research" },
    new { CustomerID = 5, FirstName = "Dave", LastName = "Barnett",
          CompanyName = "Wingtip Toys" },
    new { CustomerID = 6, FirstName = "Ann", LastName = "Beebe",
          CompanyName = "Coho Winery" },
    new { CustomerID = 7, FirstName = "John", LastName = "Kane",
          CompanyName = "Wingtip Toys" },
    new { CustomerID = 8, FirstName = "David", LastName = "Simpson",
          CompanyName = "Trey Research" },
    new { CustomerID = 9, FirstName = "Greg", LastName = "Chapman",
          CompanyName = "Wingtip Toys" },
    new { CustomerID = 10, FirstName = "Tim", LastName = "Litton",
          CompanyName = "Wide World Importers" }
};
var addresses = new[] {
    new { CompanyName = "Alpine Ski House", City = "Berne",
          Country = "Switzerland"},
    new { CompanyName = "Coho Winery", City = "San Francisco",
          Country = "United States"},
    new { CompanyName = "Trey Research", City = "New York",
          Country = "United States"},
    new { CompanyName = "Wingtip Toys", City = "London",
          Country = "United Kingdom"},
    new { CompanyName = "Wide World Importers", City = "Tetbury",
          Country = "United Kingdom"}
};
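To illustrate the point that the choice of enumerable structure does not matter, here is a minimal sketch that applies the same LINQ call to an array, a List<T>, and a Queue<T>. It uses the Count method, which is covered later in this chapter. The variable names are invented for this illustration, and the code is written as top-level statements (C# 9 or later); with an older compiler, place the statements inside a method.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// The same three company names held in three different enumerable structures.
string[] array = { "Coho Winery", "Trey Research", "Wingtip Toys" };
var list = new List<string>(array);
var queue = new Queue<string>(array);

// Count() is a LINQ extension method on IEnumerable<T>, so the call looks
// the same whichever collection type holds the data.
int fromArray = array.Count();
int fromList = list.Count();
int fromQueue = queue.Count();

Console.WriteLine("{0} {1} {2}", fromArray, fromList, fromQueue);   // 3 3 3
```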

Note

The sections "Selecting data," "Filtering data," "Ordering, grouping, and aggregating data," and "Joining data" that follow show you the basic capabilities and syntax for querying data by using LINQ methods. The syntax can become a little complex at times, and you will see when you reach the section "Using query operators" that it is not actually necessary to remember how all the syntax works. However, it is useful for you to at least take a look at these sections so that you can fully appreciate how the query operators provided with C# perform their tasks.

Selecting data

Suppose that you want to display a list consisting of the first name of each customer in the customers array. You can achieve this task with the following code:

IEnumerable<string> customerFirstNames =
    customers.Select(cust => cust.FirstName);
foreach (string name in customerFirstNames)
{
    Console.WriteLine(name);
}

Although this block of code is quite short, it does a lot, and it requires a degree of explanation, starting with the use of the Select method of the customers array.

Using the Select method, you can retrieve specific data from the array—in this case, just the value in the FirstName field of each item in the array. How does it work? The parameter to the Select method is actually another method that takes a row from the customers array and returns the selected data from that row. You can define your own custom method to perform this task, but the simplest mechanism is to use a lambda expression to define an anonymous method, as shown in the preceding example. There are three important things that you need to understand at this point:

The variable cust is the parameter passed in to the method. You can think of cust as an alias for each row in the customers array. The compiler deduces this from the fact that you are calling the Select method on the customers array. You can use any legal C# identifier in place of cust.
The Select method does not actually retrieve the data at this time; it simply returns an enumerable object that will fetch the data identified by the Select method when you iterate over it later. We will return to this aspect of LINQ in the section "LINQ and deferred evaluation" later in this chapter.
The Select method is not actually a method of the Array type. It is an extension method of the Enumerable class. The Enumerable class is located in the System.Linq namespace and provides a substantial set of static methods for querying objects that implement the generic IEnumerable<T> interface.

The preceding example uses the Select method of the customers array to generate an IEnumerable<string> object named customerFirstNames. (It is of type IEnumerable<string> because the Select method returns an enumerable collection of customer first names, which are strings.) The foreach statement iterates through this collection of strings, printing out the first name of each customer in the following sequence:

Kim
Jeff
Charlie
Chris
Dave
Ann
John
David
Greg
Tim
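Two of the points in the list above can be checked with a short experiment: that Select is really a static method of the Enumerable class, and that it does not fetch any data until you iterate over the result. The following sketch uses a cut-down copy of the customers array; the replacement name "Kimberley" is made up for the illustration. It is written as top-level statements (C# 9 or later), so with an older compiler you would put the statements inside a method.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var customers = new[] {
    new { CustomerID = 1, FirstName = "Kim", LastName = "Abercrombie" },
    new { CustomerID = 2, FirstName = "Jeff", LastName = "Hay" }
};

// Extension-method syntax, as used in the text.
IEnumerable<string> viaExtension = customers.Select(cust => cust.FirstName);

// The same call written as an ordinary static method call on Enumerable.
IEnumerable<string> viaStatic = Enumerable.Select(customers, cust => cust.FirstName);

// Neither query has run yet. Replace the first element before enumerating.
customers[0] = new { CustomerID = 1, FirstName = "Kimberley", LastName = "Abercrombie" };

// Both queries see the change, because the data is fetched only when
// the result is iterated.
string first = viaExtension.First();
Console.WriteLine(first);                                   // Kimberley
Console.WriteLine(viaExtension.SequenceEqual(viaStatic));   // True
```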

You can now display the first name of each customer. How do you fetch the first and last name of each customer? This task is slightly trickier. If you examine the definition of the Enumerable.Select method in the System.Linq namespace in the documentation supplied with Microsoft Visual Studio 2013, you will see that it looks like this:

public static IEnumerable<TResult> Select<TSource, TResult> (
         this IEnumerable<TSource> source,
         Func<TSource, TResult> selector
)

This definition says that Select is a generic method that takes two type parameters, TSource and TResult, and two ordinary parameters, source and selector. TSource is the type of the items in the collection that you are querying (customer objects in this example), and TResult is the type of the items in the enumerable set of results (string objects in this example). Remember that Select is an extension method, so the source parameter is actually a reference to the object being extended (in the example, the array of customer objects, which implements the IEnumerable<T> interface). The selector parameter specifies a method that identifies the fields to be retrieved. (Remember that Func is the name of a generic delegate type in the .NET Framework that you can use for encapsulating a method that returns a result.) The method referred to by the selector parameter takes a TSource (in this case, customer) parameter and yields a TResult (in this case, string) object. The value returned by the Select method is an enumerable collection of TResult (again string) objects.
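If the roles of TSource and TResult seem abstract, it can help to write them out explicitly once. The following sketch is an illustration only: it uses a plain array of strings rather than the customers array so that both types have names you can write, and the variable names are invented. It is written as top-level statements (C# 9 or later).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

string[] firstNames = { "Kim", "Jeff", "Charlie" };

// A selector held in a Func<TSource, TResult> delegate:
// TSource is string (the element type) and TResult is int.
Func<string, int> nameLength = name => name.Length;

// The type arguments written out in full...
IEnumerable<int> explicitTypes = firstNames.Select<string, int>(nameLength);

// ...and inferred by the compiler, which is the usual style.
IEnumerable<int> inferredTypes = firstNames.Select(name => name.Length);

foreach (int length in explicitTypes)
{
    Console.WriteLine(length);   // 3, then 4, then 7
}
```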

Note

Chapter 12 explains how extension methods work and the role of the first parameter to an extension method.

The important point to understand from the preceding paragraph is that the Select method returns an enumerable collection based on a single type. If you want the enumerator to return multiple items of data, such as the first and last name of each customer, you have at least two options:

You can concatenate the first and last names together into a single string in the Select method, like this:

IEnumerable<string> customerNames =
    customers.Select(cust => String.Format("{0} {1}", cust.FirstName, cust.LastName));

You can define a new type that wraps the first and last names, and use the Select method to construct instances of this type, like this:

class FullName
{
    public string FirstName{ get; set; }
    public string LastName{ get; set; }
}
...
IEnumerable<FullName> customerNames =
    customers.Select(cust => new FullName
    {
        FirstName = cust.FirstName,
        LastName = cust.LastName
    } );

The second option is arguably preferable, but if this is the only use that your application makes of the FullName type, you might prefer to use an anonymous type instead of defining a new type specifically for a single operation, like this:

var customerNames =
    customers.Select(cust => new { FirstName = cust.FirstName, LastName = cust.LastName } );

Notice the use of the var keyword here to declare the enumerable collection. The objects in the collection have an anonymous type, so there is no type name that you can write in the declaration; the compiler infers the type for you.
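Although the type is anonymous, the objects remain strongly typed, and you can read their properties in the usual way. The following sketch shows one way to consume such a collection. It uses a two-row copy of the customers array and the shorthand form of the anonymous type initializer, in which the property names are taken from the expressions. It is written as top-level statements (C# 9 or later).

```csharp
using System;
using System.Linq;

var customers = new[] {
    new { CustomerID = 1, FirstName = "Kim", LastName = "Abercrombie" },
    new { CustomerID = 2, FirstName = "Jeff", LastName = "Hay" }
};

// The property names FirstName and LastName are inferred from the expressions,
// so this is equivalent to new { FirstName = cust.FirstName, LastName = cust.LastName }.
var customerNames = customers.Select(cust => new { cust.FirstName, cust.LastName });

// The element type has no name you can write, so the loop variable uses var as well.
// The compiler still checks that FirstName and LastName exist and are strings.
foreach (var name in customerNames)
{
    Console.WriteLine("{0} {1}", name.FirstName, name.LastName);
}
// Kim Abercrombie
// Jeff Hay
```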

Filtering data

With the Select method, you can specify, or project, the fields that you want to include in the enumerable collection. However, you might also want to restrict the rows that the enumerable collection contains. For example, suppose you want to list the names of all companies in the addresses array that are located in the United States only. To do this, you can use the Where method, as follows:

IEnumerable<string> usCompanies =
    addresses.Where(addr => String.Equals(addr.Country, "United States"))
             .Select(usComp => usComp.CompanyName);
foreach (string name in usCompanies)
{
    Console.WriteLine(name);
}

Syntactically, the Where method is similar to Select. It expects a parameter that defines a method that filters the data according to whatever criteria you specify. This example makes use of another lambda expression. The variable addr is an alias for a row in the addresses array, and the lambda expression returns all rows where the Country field matches the string “United States”. The Where method returns an enumerable collection of rows containing every field from the original collection. The Select method is then applied to these rows to project only the CompanyName field from this enumerable collection to return another enumerable collection of string objects. (The variable usComp is an alias for the type of each row in the enumerable collection returned by the Where method.) The type of the result of this complete expression is therefore IEnumerable<string>. It is important to understand this sequence of operations—the Where method is applied first to filter the rows, followed by the Select method to specify the fields. The foreach statement that iterates through this collection displays the following companies:

Coho Winery
Trey Research

Ordering, grouping, and aggregating data

If you are familiar with SQL, you are aware that it makes it possible for you to perform a wide variety of relational operations besides simple projection and filtering. For example, you can specify that you want data to be returned in a specific order, you can group the rows returned according to one or more key fields, and you can calculate summary values based on the rows in each group. LINQ provides the same functionality.

To retrieve data in a particular order, you can use the OrderBy method. Like the Select and Where methods, OrderBy expects a method as its argument. This method identifies the expressions that you want to use to sort the data. For example, you can display the name of each company in the addresses array in ascending order, like this:

IEnumerable<string> companyNames =
    addresses.OrderBy(addr => addr.CompanyName).Select(comp => comp.CompanyName);
foreach (string name in companyNames)
{
    Console.WriteLine(name);
}

This block of code displays the companies in the addresses array in alphabetical order:

Alpine Ski House
Coho Winery
Trey Research
Wide World Importers
Wingtip Toys

If you want to enumerate the data in descending order, you can use the OrderByDescending method, instead. If you want to order by more than one key value, you can use the ThenBy or ThenByDescending method after OrderBy or OrderByDescending.
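As a sketch of how these methods combine, the following code sorts the addresses data by country in descending order and then by company name in ascending order within each country. The formatting of each output line is a choice made for this illustration. The code is written as top-level statements (C# 9 or later).

```csharp
using System;
using System.Linq;

var addresses = new[] {
    new { CompanyName = "Alpine Ski House", City = "Berne", Country = "Switzerland" },
    new { CompanyName = "Coho Winery", City = "San Francisco", Country = "United States" },
    new { CompanyName = "Trey Research", City = "New York", Country = "United States" },
    new { CompanyName = "Wingtip Toys", City = "London", Country = "United Kingdom" },
    new { CompanyName = "Wide World Importers", City = "Tetbury", Country = "United Kingdom" }
};

// Primary key: Country, descending. Secondary key: CompanyName, ascending.
var sorted = addresses
    .OrderByDescending(addr => addr.Country)
    .ThenBy(addr => addr.CompanyName)
    .Select(addr => String.Format("{0}: {1}", addr.Country, addr.CompanyName));

foreach (string line in sorted)
{
    Console.WriteLine(line);
}
// United States: Coho Winery
// United States: Trey Research
// United Kingdom: Wide World Importers
// United Kingdom: Wingtip Toys
// Switzerland: Alpine Ski House
```

Note that ThenBy must follow an OrderBy or OrderByDescending call; calling OrderBy a second time would replace the first ordering rather than refine it.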

To group data according to common values in one or more fields, you can use the GroupBy method. The following example shows how to group the companies in the addresses array by country:

var companiesGroupedByCountry =
    addresses.GroupBy(addrs => addrs.Country);
foreach (var companiesPerCountry in companiesGroupedByCountry)
{
    Console.WriteLine("Country: {0}\t{1} companies",
            companiesPerCountry.Key, companiesPerCountry.Count());
    foreach (var companies in companiesPerCountry)
    {
        Console.WriteLine("\t{0}", companies.CompanyName);
    }
}

By now, you should recognize the pattern. The GroupBy method expects a method that specifies the fields by which to group the data. There are some subtle differences between the GroupBy method and the other methods that you have seen so far, though.

The main point of interest is that you don’t need to use the Select method to project the fields to the result. The enumerable set returned by GroupBy contains all the fields in the original source collection, but the rows are ordered into a set of enumerable collections based on the field identified by the method specified by GroupBy. In other words, the result of the GroupBy method is an enumerable set of groups, each of which is an enumerable set of rows. In the example just shown, the enumerable set companiesGroupedByCountry is a set of countries. The items in this set are themselves enumerable collections containing the companies for each country in turn. The code that displays the companies in each country uses a foreach loop to iterate through the companiesGroupedByCountry set to yield and display each country in turn, and then it uses a nested foreach loop to iterate through the set of companies in each country. Notice in the outer foreach loop that you can access the value you are grouping by using the Key field of each item, and you can also calculate summary data for each group by using methods such as Count, Max, Min, and many others. The output generated by the example code looks like this:

Country: Switzerland    1 companies
        Alpine Ski House
Country: United States  2 companies
        Coho Winery
        Trey Research
Country: United Kingdom 2 companies
        Wingtip Toys
        Wide World Importers
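Because each group is itself an enumerable collection with a Key, you can also turn the groups into a summary with one row per group. The following sketch groups a subset of the customers array by company and then finds the company with the most customers. The names summary, busiest, and HighestID are invented for this illustration, and the code is written as top-level statements (C# 9 or later).

```csharp
using System;
using System.Linq;

var customers = new[] {
    new { CustomerID = 1, LastName = "Abercrombie", CompanyName = "Alpine Ski House" },
    new { CustomerID = 2, LastName = "Hay", CompanyName = "Coho Winery" },
    new { CustomerID = 3, LastName = "Herb", CompanyName = "Alpine Ski House" },
    new { CustomerID = 5, LastName = "Barnett", CompanyName = "Wingtip Toys" },
    new { CustomerID = 7, LastName = "Kane", CompanyName = "Wingtip Toys" },
    new { CustomerID = 9, LastName = "Chapman", CompanyName = "Wingtip Toys" }
};

// One summary row per group: the grouping value (Key) plus two aggregates
// calculated over the rows in that group.
var summary = customers
    .GroupBy(cust => cust.CompanyName)
    .Select(g => new
    {
        Company = g.Key,
        Customers = g.Count(),
        HighestID = g.Max(cust => cust.CustomerID)
    });

foreach (var row in summary)
{
    Console.WriteLine("{0}: {1} customers, highest ID {2}",
        row.Company, row.Customers, row.HighestID);
}
// Alpine Ski House: 2 customers, highest ID 3
// Coho Winery: 1 customers, highest ID 2
// Wingtip Toys: 3 customers, highest ID 9

// The company with the most customers.
var busiest = summary.OrderByDescending(row => row.Customers).First();
Console.WriteLine(busiest.Company);   // Wingtip Toys
```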

You can use many of the summary methods such as Count, Max, and Min directly over the results of the Select method. If you want to know how many companies there are in the addresses array, you can use a block of code such as this:

int numberOfCompanies = addresses.Select(addr => addr.CompanyName).Count();
Console.WriteLine("Number of companies: {0}", numberOfCompanies);

Notice that the result of these methods is a single scalar value rather than an enumerable collection. The output from the preceding block of code looks like this:

Number of companies: 5

I should utter a word of caution at this point. These summary methods do not distinguish between rows in the underlying set that contain duplicate values in the fields you are projecting. This means that, strictly speaking, the preceding example shows you only how many rows in the addresses array contain a value in the CompanyName field. If you wanted to find out how many different countries are mentioned in this table, you might be tempted to try this:

int numberOfCountries = addresses.Select(addr => addr.Country).Count();
Console.WriteLine("Number of countries: {0}", numberOfCountries);

The output looks like this:

Number of countries: 5

In fact, there are only three different countries in the addresses array—it just so happens that United States and United Kingdom both occur twice. You can eliminate duplicates from the calculation by using the Distinct method, like this:

int numberOfCountries =
    addresses.Select(addr => addr.Country).Distinct().Count();
Console.WriteLine("Number of countries: {0}", numberOfCountries);

The Console.WriteLine statement now outputs the expected result:

Number of countries: 3

Joining data

Just like SQL, LINQ gives you the ability to join multiple sets of data together over one or more common key fields. The following example shows how to display the first and last names of each customer, together with the name of the country where the customer is located:

var companiesAndCustomers = customers
  .Select(c => new { c.FirstName, c.LastName, c.CompanyName })
  .Join(addresses, custs => custs.CompanyName, addrs => addrs.CompanyName,
  (custs, addrs) => new {custs.FirstName, custs.LastName, addrs.Country });
foreach (var row in companiesAndCustomers)
{
    Console.WriteLine(row);
}

The customers’ first and last names are available in the customers array, but the country for each company that customers work for is stored in the addresses array. The common key between the customers array and the addresses array is the company name. The Select method specifies the fields of interest in the customers array (FirstName and LastName), together with the field containing the common key (CompanyName). You use the Join method to join the data identified by the Select method with another enumerable collection. The parameters to the Join method are as follows:

The enumerable collection with which to join
A method that identifies the common key fields from the data identified by the Select method
A method that identifies the common key fields on which to join the selected data
A method that specifies the columns you require in the enumerable result set returned by the Join method

In this example, the Join method joins the enumerable collection containing the FirstName, LastName, and CompanyName fields from the customers array with the rows in the addresses array. The two sets of data are joined where the value in the CompanyName field in the customers array matches the value in the CompanyName field in the addresses array. The result set comprises rows containing the FirstName and LastName fields from the customers array with the Country field from the addresses array. The code that outputs the data from the companiesAndCustomers collection displays the following information:

{ FirstName = Kim, LastName = Abercrombie, Country = Switzerland }
{ FirstName = Jeff, LastName = Hay, Country = United States }
{ FirstName = Charlie, LastName = Herb, Country = Switzerland }
{ FirstName = Chris, LastName = Preston, Country = United States }
{ FirstName = Dave, LastName = Barnett, Country = United Kingdom }
{ FirstName = Ann, LastName = Beebe, Country = United States }
{ FirstName = John, LastName = Kane, Country = United Kingdom }
{ FirstName = David, LastName = Simpson, Country = United States }
{ FirstName = Greg, LastName = Chapman, Country = United Kingdom }
{ FirstName = Tim, LastName = Litton, Country = United Kingdom }

Note

Remember that collections in memory are not the same as tables in a relational database, and the data they contain is not subject to the same data integrity constraints. In a relational database, it could be acceptable to assume that every customer has a corresponding company and that each company has its own unique address. Collections do not enforce the same level of data integrity, meaning that you can quite easily have a customer referencing a company that does not exist in the addresses array, and you might even have the same company occurring more than once in the addresses array. In these situations, the results that you obtain might be accurate but unexpected. Join operations work best when you fully understand the relationships between the data you are joining.
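The following sketch makes the warning in the preceding note concrete. The data is deliberately inconsistent: the customer "Pat" at "Contoso" and the second "Coho Winery" address are hypothetical rows added for this illustration and are not part of the chapter's sample data. The code is written as top-level statements (C# 9 or later).

```csharp
using System;
using System.Linq;

var customers = new[] {
    new { FirstName = "Kim", CompanyName = "Alpine Ski House" },
    new { FirstName = "Jeff", CompanyName = "Coho Winery" },
    new { FirstName = "Pat", CompanyName = "Contoso" }          // no matching address
};
var addresses = new[] {
    new { CompanyName = "Alpine Ski House", Country = "Switzerland" },
    new { CompanyName = "Coho Winery", Country = "United States" },
    new { CompanyName = "Coho Winery", Country = "Canada" }     // duplicate company
};

var joined = customers.Join(addresses,
    cust => cust.CompanyName,
    addr => addr.CompanyName,
    (cust, addr) => new { cust.FirstName, addr.Country });

foreach (var row in joined)
{
    Console.WriteLine(row);
}
// { FirstName = Kim, Country = Switzerland }
// { FirstName = Jeff, Country = United States }
// { FirstName = Jeff, Country = Canada }
```

Pat is silently dropped because no address matches, and Jeff appears twice because two addresses match. Neither outcome raises an error, so checks of this kind are up to you.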

Using query operators

The preceding sections have shown you many of the features available for querying in-memory data by using the extension methods defined by the Enumerable class in the System.Linq namespace. The syntax makes use of several advanced C# language features, and the resultant code can sometimes be quite hard to understand and maintain. To relieve you of some of this burden, the designers of C# added query operators to the language with which you can employ LINQ features by using a syntax more akin to SQL.

As you saw in the examples shown earlier in this chapter, you can retrieve the first name for each customer, like this:

IEnumerable<string> customerFirstNames =
    customers.Select(cust => cust.FirstName);

You can rephrase this statement by using the from and select query operators, like this:

var customerFirstNames = from cust in customers
                         select cust.FirstName;

At compile time, the C# compiler resolves this expression into the corresponding Select method. The from operator defines an alias for the source collection, and the select operator specifies the fields to retrieve by using this alias. The result is an enumerable collection of customer first names. If you are familiar with SQL, notice that the from operator occurs before the select operator.
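You can convince yourself that the two forms are equivalent by running both and comparing the results. This is a minimal sketch that uses a three-row copy of the customers array; the variable names viaQuery and viaMethod are invented here. It is written as top-level statements (C# 9 or later).

```csharp
using System;
using System.Linq;

var customers = new[] {
    new { CustomerID = 1, FirstName = "Kim", LastName = "Abercrombie" },
    new { CustomerID = 2, FirstName = "Jeff", LastName = "Hay" },
    new { CustomerID = 3, FirstName = "Charlie", LastName = "Herb" }
};

// Query-operator form.
var viaQuery = from cust in customers
               select cust.FirstName;

// Extension-method form that the compiler generates from the query above.
var viaMethod = customers.Select(cust => cust.FirstName);

// Both produce the same sequence of strings in the same order.
Console.WriteLine(viaQuery.SequenceEqual(viaMethod));   // True
Console.WriteLine(String.Join(", ", viaQuery));         // Kim, Jeff, Charlie
```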

Continuing in the same vein, to retrieve the first and last names for each customer, you can use the following statement. (You might want to refer to the earlier example of the same statement based on the Select extension method.)

var customerNames = from cust in customers
                    select new { cust.FirstName, cust.LastName };

You use the where operator to filter data. The following example shows how to return the names of the companies based in the United States from the addresses array:

var usCompanies = from a in addresses
                  where String.Equals(a.Country, "United States")
                  select a.CompanyName;

To order data, use the orderby operator, like this:

var companyNames = from a in addresses
                   orderby a.CompanyName
                   select a.CompanyName;
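The where and orderby operators can appear together in one query. The sketch below combines them and shows a chain of extension methods that yields the same result; the filter condition and variable names are chosen for this illustration. It is written as top-level statements (C# 9 or later).

```csharp
using System;
using System.Linq;

var addresses = new[] {
    new { CompanyName = "Alpine Ski House", City = "Berne", Country = "Switzerland" },
    new { CompanyName = "Coho Winery", City = "San Francisco", Country = "United States" },
    new { CompanyName = "Trey Research", City = "New York", Country = "United States" },
    new { CompanyName = "Wingtip Toys", City = "London", Country = "United Kingdom" },
    new { CompanyName = "Wide World Importers", City = "Tetbury", Country = "United Kingdom" }
};

// Filter, sort, and project in a single query expression.
var viaQuery = from a in addresses
               where a.Country != "Switzerland"
               orderby a.CompanyName
               select a.CompanyName;

// An equivalent chain of extension methods.
var viaMethods = addresses
    .Where(a => a.Country != "Switzerland")
    .OrderBy(a => a.CompanyName)
    .Select(a => a.CompanyName);

Console.WriteLine(String.Join(", ", viaQuery));
// Coho Winery, Trey Research, Wide World Importers, Wingtip Toys
Console.WriteLine(viaQuery.SequenceEqual(viaMethods));   // True
```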

You can group data by using the group operator in the following manner:

var companiesGroupedByCountry = from a in addresses
                                group a by a.Country;

Notice that, as with the earlier example showing how to group data, you do not provide the select operator, and you can iterate through the results by using exactly the same code as the earlier example, like this:

foreach (var companiesPerCountry in companiesGroupedByCountry)
{
    Console.WriteLine("Country: {0}\t{1} companies",
            companiesPerCountry.Key, companiesPerCountry.Count());
    foreach (var companies in companiesPerCountry)
    {
        Console.WriteLine("\t{0}", companies.CompanyName);
    }
}

You can invoke the summary functions such as Count over the enumerable collection returned by a query expression, like this:

int numberOfCompanies = (from a in addresses
                         select a.CompanyName).Count();

Notice that you wrap the expression in parentheses. If you want to ignore duplicate values, use the Distinct method:

int numberOfCountries = (from a in addresses
                         select a.Country).Distinct().Count();

Tip

In many cases, you probably want to count just the number of rows in a collection rather than the number of values in a field across all the rows in the collection. In this case, you can invoke the Count method directly over the original collection, like this:

int numberOfCompanies = addresses.Count();

You can use the join operator to combine two collections across a common key. The following example shows the earlier query that joins customers and addresses over the CompanyName field in each collection, this time rephrased by using the join operator. You use the on clause with the equals operator to specify how the two collections are related.

Note

LINQ currently supports equi-joins (joins based on equality) only. If you are a database developer who is used to SQL, you might be familiar with joins based on other operators such as > and <, but LINQ does not provide these features.

var countriesAndCustomers = from a in addresses
                            join c in customers
                            on a.CompanyName equals c.CompanyName
                            select new { c.FirstName, c.LastName, a.Country };

Note

In contrast with SQL, the order of the expressions in the on clause of a LINQ expression is important. You must place the item you are joining from (referencing the data in the collection in the from clause) to the left of the equals operator and the item you are joining with (referencing the data in the collection in the join clause) to the right.
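The following sketch shows the ordering rule in practice, using cut-down copies of the two arrays. The swapped form is left in a comment because the compiler rejects it. The code is written as top-level statements (C# 9 or later).

```csharp
using System;
using System.Linq;

var customers = new[] {
    new { FirstName = "Kim", CompanyName = "Alpine Ski House" },
    new { FirstName = "Jeff", CompanyName = "Coho Winery" },
    new { FirstName = "Charlie", CompanyName = "Alpine Ski House" }
};
var addresses = new[] {
    new { CompanyName = "Alpine Ski House", Country = "Switzerland" },
    new { CompanyName = "Coho Winery", Country = "United States" }
};

// a comes from the from clause, so it goes on the left of equals;
// c comes from the join clause, so it goes on the right.
var countriesAndCustomers = from a in addresses
                            join c in customers
                            on a.CompanyName equals c.CompanyName
                            select new { c.FirstName, a.Country };

// Swapping the operands does not compile, because c is not in scope on the
// left of equals and a is not in scope on the right:
//     on c.CompanyName equals a.CompanyName

foreach (var row in countriesAndCustomers)
{
    Console.WriteLine(row);
}
// { FirstName = Kim, Country = Switzerland }
// { FirstName = Charlie, Country = Switzerland }
// { FirstName = Jeff, Country = United States }
```

The rows follow the order of the addresses array because that is the collection named in the from clause; the earlier method-based example started from customers, so its rows followed customer order instead.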

LINQ provides a large number of other methods for summarizing information, joining, grouping, and searching through data. This section has covered just the most common features. For example, LINQ provides the Intersect and Union methods, which you can use to perform set-wide operations. It also provides methods such as Any and All that you can use to determine whether at least one item in a collection or every item in a collection matches a specified predicate. You can partition the values in an enumerable collection by using the Take and Skip methods. For more information, see the material in the LINQ section of the documentation provided with Visual Studio 2013.
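As a brief taste of the methods just mentioned, here is a sketch that exercises each of them on small arrays. The company "Tailspin Toys" and all the variable names are hypothetical and exist only for this illustration. The code is written as top-level statements (C# 9 or later).

```csharp
using System;
using System.Linq;

string[] ukCompanies = { "Wingtip Toys", "Wide World Importers" };
string[] toyCompanies = { "Wingtip Toys", "Tailspin Toys" };   // Tailspin Toys is hypothetical

// Set operations: Union removes duplicates, Intersect keeps common items.
var either = ukCompanies.Union(toyCompanies);
var both = ukCompanies.Intersect(toyCompanies);
Console.WriteLine(String.Join(", ", either));   // Wingtip Toys, Wide World Importers, Tailspin Toys
Console.WriteLine(String.Join(", ", both));     // Wingtip Toys

// Any: does at least one item match? All: does every item match?
bool anyToys = ukCompanies.Any(name => name.EndsWith("Toys", StringComparison.Ordinal));
bool allToys = ukCompanies.All(name => name.EndsWith("Toys", StringComparison.Ordinal));
Console.WriteLine("{0} {1}", anyToys, allToys); // True False

// Partitioning: skip the first three IDs and take the next three.
int[] customerIds = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
var secondPage = customerIds.Skip(3).Take(3);
Console.WriteLine(String.Join(", ", secondPage));   // 4, 5, 6
```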

Querying data in Tree<TItem> objects

The examples you’ve seen so far in this chapter have shown how to query the data in an array. You can use exactly the same techniques for any collection class that implements the generic IEnumerable<T> interface. In the following exercise, you will define a new class for modeling employees for a company. You will create a BinaryTree object containing a collection of Employee objects, and then you will use LINQ to query this information. You will initially call the LINQ extension methods directly, but then you will modify your code to use query operators.

Retrieve data from a BinaryTree by using the extension methods

Start Visual Studio 2013 if it is not already running.
Open the QueryBinaryTree solution, which is located in the \Microsoft Press\Visual CSharp Step By Step\Chapter 21\Windows X\QueryBinaryTree folder in your Documents folder. The project contains the Program.cs file, which defines the Program class with the Main and doWork methods that you saw in previous exercises.
In Solution Explorer, right-click the QueryBinaryTree project, point to Add, and then click Class. In the Add New Item – QueryBinaryTree dialog box, in the Name box, type Employee.cs and then click Add.

Add the following automatic properties to the Employee class:

class Employee
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Department { get; set; }
    public int Id { get; set; }
}

Add the ToString method shown in the code that follows to the Employee class. Types in the .NET Framework use this method when converting the object to a string representation, such as when displaying it by using the Console.WriteLine statement.
```
class Employee
{
    ...
    public override string ToString()
    {
        return String.Format("Id: {0}, Name: {1} {2}, Dept: {3}",
            this.Id, this.FirstName, this.LastName, this.Department);
    }
}
```
Modify the definition of the Employee class to implement the IComparable<Employee> interface, as shown here:
```
class Employee : IComparable<Employee>
{
}
```
This step is necessary because the BinaryTree class specifies that its elements must be “comparable.”
Right-click the IComparable<Employee> interface in the class definition, point to Implement Interface, and then click Implement Interface Explicitly.
This action generates a default implementation of the CompareTo method. Remember that the BinaryTree class calls this method when it needs to compare elements when inserting them into the tree.
Replace the body of the CompareTo method with the code shown in bold below. This implementation of the CompareTo method compares Employee objects based on the value of the Id property.
```
int IComparable<Employee>.CompareTo(Employee other)
{
    if (other == null)
    {
        return 1;
    }
    if (this.Id > other.Id)
    {
        return 1;
    }
    if (this.Id < other.Id)
    {
        return -1;
    }
    return 0;
}
```
Note
For a description of the IComparable<T> interface, refer to Chapter 19.
In Solution Explorer, right-click the QueryBinaryTree solution, point to Add, and then click Existing Project. In the Add Existing Project dialog box, move to the folder \Microsoft Press\Visual CSharp Step By Step\Chapter 21\Windows X\BinaryTree in your Documents folder, click the BinaryTree project, and then click Open.
The BinaryTree project contains a copy of the enumerable BinaryTree class that you implemented in Chapter 19.
In Solution Explorer, right-click the QueryBinaryTree project, and then, on the shortcut menu that opens, click Add Reference. In the Reference Manager – QueryBinaryTree dialog box, in the left pane, click Solution. In the middle pane, select the BinaryTree project, and then click OK.
Display the Program.cs file for the QueryBinaryTree project in the Code and Text Editor window, and verify that the list of using directives at the top of the file includes the following line of code:
```
using System.Linq;
```
Add the following using directive that brings the BinaryTree namespace into scope to the list at the top of the Program.cs file:
```
using BinaryTree;
```

In the doWork method in the Program class, remove the // TODO: comment and add the following statements shown in bold type to construct and populate an instance of the BinaryTree class:

```
static void doWork()
{
    Tree<Employee> empTree = new Tree<Employee>(
        new Employee { Id = 1, FirstName = "Kim", LastName = "Abercrombie",
                       Department = "IT" });
    empTree.Insert(new Employee { Id = 2, FirstName = "Jeff", LastName = "Hay",
                                  Department = "Marketing" });
    empTree.Insert(new Employee { Id = 4, FirstName = "Charlie", LastName = "Herb",
                                  Department = "IT" });
    empTree.Insert(new Employee { Id = 6, FirstName = "Chris", LastName = "Preston",
                                  Department = "Sales" });
    empTree.Insert(new Employee { Id = 3, FirstName = "Dave", LastName = "Barnett",
                                  Department = "Sales" });
    empTree.Insert(new Employee { Id = 5, FirstName = "Tim", LastName = "Litton",
                                  Department = "Marketing" });
}
```

Add the following statements shown in bold to the end of the doWork method. This code invokes the Select method to list the departments found in the binary tree.

```
static void doWork()
{
    ...
    Console.WriteLine("List of departments");
    var depts = empTree.Select(d => d.Department);
    foreach (var dept in depts)
    {
        Console.WriteLine("Department: {0}", dept);
    }
}
```

On the Debug menu, click Start Without Debugging.
The application should output the following list of departments:
```
List of departments
Department: IT
Department: Marketing
Department: Sales
Department: IT
Department: Marketing
Department: Sales
```
Each department occurs twice because there are two employees in each department. The order of the departments is determined by the CompareTo method of the Employee class, which uses the Id property of each employee to sort the data. The first department is for the employee with the Id value 1, the second department is for the employee with the Id value 2, and so on.
Press Enter to return to Visual Studio 2013.
In the doWork method in the Program class, modify the statement that creates the enumerable collection of departments as shown in bold in the following example:
```
var depts = empTree.Select(d => d.Department).Distinct();
```
The Distinct method removes duplicate rows from the enumerable collection.
On the Debug menu, click Start Without Debugging.
Verify that the application now displays each department only once, like this:
```
List of departments
Department: IT
Department: Marketing
Department: Sales
```
Press Enter to return to Visual Studio 2013.

Add the following statements shown in bold to the end of the doWork method. This block of code uses the Where method to filter the employees and return only those in the IT department. The Select method returns the entire row rather than projecting specific columns.

```
static void doWork()
{
    ...
    Console.WriteLine("\nEmployees in the IT department");
    var ITEmployees =
        empTree.Where(e => String.Equals(e.Department, "IT"))
        .Select(emp => emp);
    foreach (var emp in ITEmployees)
    {
        Console.WriteLine(emp);
    }
}
```

Add the code shown below in bold to the end of the doWork method, after the code from the preceding step. This code uses the GroupBy method to group the employees found in the binary tree by department. The outer foreach statement iterates through each group, displaying the name of the department. The inner foreach statement displays the names of the employees in each department.

```
static void doWork()
{
    ...
    Console.WriteLine("\nAll employees grouped by department");
    var employeesByDept = empTree.GroupBy(e => e.Department);
    foreach (var dept in employeesByDept)
    {
        Console.WriteLine("Department: {0}", dept.Key);
        foreach (var emp in dept)
        {
            Console.WriteLine("\t{0} {1}", emp.FirstName, emp.LastName);
        }
    }
}
```

On the Debug menu, click Start Without Debugging. Verify that the output of the application looks like this:

```
List of departments
Department: IT
Department: Marketing
Department: Sales

Employees in the IT department
Id: 1, Name: Kim Abercrombie, Dept: IT
Id: 4, Name: Charlie Herb, Dept: IT

All employees grouped by department
Department: IT
        Kim Abercrombie
        Charlie Herb
Department: Marketing
        Jeff Hay
        Tim Litton
Department: Sales
        Dave Barnett
        Chris Preston
```

Press Enter to return to Visual Studio 2013.
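
If you want to try these extension methods without the Tree<TItem> class from Chapter 19, the following self-contained sketch might help. It assumes a List<Employee> in place of the binary tree (so the items are simply listed in Id order rather than sorted by CompareTo), and the ExtensionMethodDemo class with its Staff, Departments, and InDepartment methods are hypothetical names introduced only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Employee
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Department { get; set; }
    public int Id { get; set; }
}

static class ExtensionMethodDemo
{
    // Assumption: a List<Employee> stands in for the Tree<Employee> used in the exercise.
    public static List<Employee> Staff()
    {
        return new List<Employee>
        {
            new Employee { Id = 1, FirstName = "Kim", LastName = "Abercrombie", Department = "IT" },
            new Employee { Id = 2, FirstName = "Jeff", LastName = "Hay", Department = "Marketing" },
            new Employee { Id = 3, FirstName = "Dave", LastName = "Barnett", Department = "Sales" },
            new Employee { Id = 4, FirstName = "Charlie", LastName = "Herb", Department = "IT" },
            new Employee { Id = 5, FirstName = "Tim", LastName = "Litton", Department = "Marketing" },
            new Employee { Id = 6, FirstName = "Chris", LastName = "Preston", Department = "Sales" }
        };
    }

    // Select projects one property; Distinct drops the repeated values.
    public static IEnumerable<string> Departments(IEnumerable<Employee> staff)
    {
        return staff.Select(e => e.Department).Distinct();
    }

    // Where keeps only the employees whose department matches.
    public static IEnumerable<Employee> InDepartment(IEnumerable<Employee> staff, string dept)
    {
        return staff.Where(e => String.Equals(e.Department, dept));
    }

    static void Main()
    {
        List<Employee> staff = Staff();

        foreach (string dept in Departments(staff))
        {
            Console.WriteLine("Department: {0}", dept);
        }

        foreach (Employee emp in InDepartment(staff, "IT"))
        {
            Console.WriteLine("{0} {1}", emp.FirstName, emp.LastName);
        }

        // GroupBy yields one group per department; Key holds the department name.
        foreach (var group in staff.GroupBy(e => e.Department))
        {
            Console.WriteLine("{0}: {1} employees", group.Key, group.Count());
        }
    }
}
```

Because every method here accepts IEnumerable<Employee>, the same calls should work unchanged against any enumerable collection, which is the point the exercise makes with the binary tree.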

Retrieve data from a BinaryTree by using query operators

In the doWork method, comment out the statement that generates the enumerable collection of departments, and replace it with the equivalent statement shown in bold, using the from and select query operators:
```
// var depts = empTree.Select(d => d.Department).Distinct();
var depts = (from d in empTree
             select d.Department).Distinct();
```

Comment out the statement that generates the enumerable collection of employees in the IT department, and replace it with the following code shown in bold:

```
// var ITEmployees =
//    empTree.Where(e => String.Equals(e.Department, "IT"))
//    .Select(emp => emp);
var ITEmployees = from e in empTree
                  where String.Equals(e.Department, "IT")
                  select e;
```

Comment out the statement that generates the enumerable collection grouping employees by department, and replace it with the statement shown in bold in the following code:
```
// var employeesByDept = empTree.GroupBy(e => e.Department);
var employeesByDept = from e in empTree
                      group e by e.Department;
```

On the Debug menu, click Start Without Debugging. Verify that the program displays the same results as before.

```
List of departments
Department: IT
Department: Marketing
Department: Sales

Employees in the IT department
Id: 1, Name: Kim Abercrombie, Dept: IT
Id: 4, Name: Charlie Herb, Dept: IT

All employees grouped by department
Department: IT
        Kim Abercrombie
        Charlie Herb
Department: Marketing
        Jeff Hay
        Tim Litton
Department: Sales
        Dave Barnett
        Chris Preston
```

Press Enter to return to Visual Studio 2013.
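
The results match because the compiler translates a query expression into calls to the same extension methods. The following sketch checks this for a filter and for a grouping by comparing the two forms with the SequenceEqual method. The departments array and the QuerySyntaxDemo class are hypothetical and exist only for this illustration.

```csharp
using System;
using System.Linq;

static class QuerySyntaxDemo
{
    // Hypothetical sample data: one department name per employee.
    static readonly string[] departments =
        { "IT", "Marketing", "IT", "Sales", "Sales", "Marketing" };

    // Compares the Where/Select method form with the from/where/select form.
    public static bool FilterFormsAgree()
    {
        var methodForm = departments.Where(d => String.Equals(d, "IT")).Select(d => d);
        var queryForm = from d in departments
                        where String.Equals(d, "IT")
                        select d;
        return methodForm.SequenceEqual(queryForm);
    }

    // Compares the GroupBy method form with the group ... by form, using the group keys.
    public static bool GroupFormsAgree()
    {
        var methodKeys = departments.GroupBy(d => d).Select(g => g.Key);
        var queryKeys = from d in departments
                        group d by d into g
                        select g.Key;
        return methodKeys.SequenceEqual(queryKeys);
    }

    static void Main()
    {
        Console.WriteLine(FilterFormsAgree());   // True
        Console.WriteLine(GroupFormsAgree());    // True
    }
}
```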

LINQ and deferred evaluation

When you use LINQ to define an enumerable collection, either by using the LINQ extension methods or by using query operators, you should remember that the application does not actually build the collection at the time that the LINQ extension method is executed; the collection is enumerated only when you iterate over it. This means that the data in the original collection can change between executing a LINQ query and retrieving the data that the query identifies; you will always fetch the most up-to-date data. For example, the following query (which you saw earlier) defines an enumerable collection of companies in the United States:

```
var usCompanies = from a in addresses
                  where String.Equals(a.Country, "United States")
                  select a.CompanyName;
```

The data in the addresses array is not retrieved, and any conditions specified in the Where filter are not evaluated until you iterate through the usCompanies collection:

```
foreach (string name in usCompanies)
{
    Console.WriteLine(name);
}
```

If you modify the data in the addresses array between defining the usCompanies collection and iterating through the collection (for example, if you add a new company based in the United States), you will see this new data. This strategy is referred to as deferred evaluation.
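
A minimal sketch of deferred evaluation follows. It assumes a List<string> of country names rather than the addresses array, so that an item can be added after the query is defined. The DeferredDemo class and its CountsBeforeAndAfter method are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class DeferredDemo
{
    // Returns how many items the query yields before and after the source changes.
    public static int[] CountsBeforeAndAfter()
    {
        // Hypothetical data; a List is used so that an item can be added later.
        var countries = new List<string> { "United States", "Germany", "United States" };

        // Defining the query does not run the filter.
        var usOnly = from c in countries
                     where String.Equals(c, "United States")
                     select c;

        int before = usOnly.Count();      // first enumeration of the query
        countries.Add("United States");   // change the source after the query was defined
        int after = usOnly.Count();       // second enumeration sees the new item

        return new[] { before, after };
    }

    static void Main()
    {
        int[] counts = CountsBeforeAndAfter();
        Console.WriteLine("{0} then {1}", counts[0], counts[1]);   // 2 then 3
    }
}
```

The usOnly variable holds a description of the query rather than its results, which is why the second count differs from the first.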

You can force evaluation of a LINQ query when it is defined and generate a static, cached collection. This collection is a copy of the original data and will not change if the data in the original collection changes. LINQ provides the ToList method to build a static List object containing a cached copy of the data. You use it like this:

```
var usCompanies = from a in addresses.ToList()
                  where String.Equals(a.Country, "United States")
                  select a.CompanyName;
```

This time, the list of companies is fixed when you create the query. If you add more United States companies to the addresses array, you will not see them when you iterate through the usCompanies collection. LINQ also provides the ToArray method that stores the cached collection as an array.
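
The following sketch contrasts a deferred query with two ways of caching. The first applies ToList to the source, as in the preceding example, so the query reads from a copy. The second, which is an assumption added for comparison rather than something shown in the text, applies ToList to the query itself so that the filtered results are stored. The CachedDemo class and the sample data are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class CachedDemo
{
    // Returns the number of matches seen by each approach after the source changes.
    public static int[] Counts()
    {
        var countries = new List<string> { "United States", "Germany", "United States" };

        // Deferred: reads the countries list again on every enumeration.
        var deferred = from c in countries
                       where String.Equals(c, "United States")
                       select c;

        // As in the text: ToList copies the source now, and the query reads the copy.
        var overSnapshot = from c in countries.ToList()
                           where String.Equals(c, "United States")
                           select c;

        // Alternative (an assumption, not from the text): cache the query results.
        List<string> cachedResults = deferred.ToList();

        countries.Add("United States");

        return new[] { deferred.Count(), overSnapshot.Count(), cachedResults.Count };
    }

    static void Main()
    {
        int[] counts = Counts();
        Console.WriteLine("{0}, {1}, {2}", counts[0], counts[1], counts[2]);   // 3, 2, 2
    }
}
```

Only the deferred query notices the item added afterward. Both cached forms keep reporting the data as it was when ToList ran.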

In the final exercise in this chapter, you will compare the effects of using deferred evaluation of a LINQ query to generating a cached collection.

Examine the effects of deferred and cached evaluation of a LINQ query

Return to Visual Studio 2013, display the QueryBinaryTree project, and then edit the Program.cs file.

Comment out the contents of the doWork method apart from the statements that construct the empTree binary tree, as shown here:

```
static void doWork()
{
    Tree<Employee> empTree = new Tree<Employee>(
        new Employee { Id = 1, FirstName = "Kim", LastName = "Abercrombie",
                       Department = "IT" });
    empTree.Insert(new Employee { Id = 2, FirstName = "Jeff", LastName = "Hay",
                                  Department = "Marketing" });
    empTree.Insert(new Employee { Id = 4, FirstName = "Charlie", LastName = "Herb",
                                  Department = "IT" });
    empTree.Insert(new Employee { Id = 6, FirstName = "Chris", LastName = "Preston",
                                  Department = "Sales" });
    empTree.Insert(new Employee { Id = 3, FirstName = "Dave", LastName = "Barnett",
                                  Department = "Sales" });
    empTree.Insert(new Employee { Id = 5, FirstName = "Tim", LastName = "Litton",
                                  Department = "Marketing" });
    // comment out the rest of the method
    ...
}
```

Tip

You can comment out a block of code by selecting the entire block in the Code and Text Editor window and then clicking the Comment Out The Selected Lines button on the toolbar or by pressing Ctrl+E and then pressing C.

Add the following statements shown in bold to the doWork method, after the code that creates and populates the empTree binary tree:
```
static void doWork()
{
    ...
    Console.WriteLine("All employees");
    var allEmployees = from e in empTree
                       select e;
    foreach (var emp in allEmployees)
    {
        Console.WriteLine(emp);
    }
    ...
}
```
This code generates an enumerable collection of employees named allEmployees and then iterates through this collection, displaying the details of each employee.

Add the following code immediately after the statements you typed in the preceding step:

```
static void doWork()
{
    ...
    empTree.Insert(new Employee
    {
        Id = 7,
        FirstName = "David",
        LastName = "Simpson",
        Department = "IT"
    });
    Console.WriteLine("\nEmployee added");
    Console.WriteLine("All employees");
    foreach (var emp in allEmployees)
    {
        Console.WriteLine(emp);
    }
    ...
}
```

These statements add a new employee to the empTree tree and then iterate through the allEmployees collection again.

On the Debug menu, click Start Without Debugging. Verify that the output of the application looks like this:

```
All employees
Id: 1, Name: Kim Abercrombie, Dept: IT
Id: 2, Name: Jeff Hay, Dept: Marketing
Id: 3, Name: Dave Barnett, Dept: Sales
Id: 4, Name: Charlie Herb, Dept: IT
Id: 5, Name: Tim Litton, Dept: Marketing
Id: 6, Name: Chris Preston, Dept: Sales

Employee added
All employees
Id: 1, Name: Kim Abercrombie, Dept: IT
Id: 2, Name: Jeff Hay, Dept: Marketing
Id: 3, Name: Dave Barnett, Dept: Sales
Id: 4, Name: Charlie Herb, Dept: IT
Id: 5, Name: Tim Litton, Dept: Marketing
Id: 6, Name: Chris Preston, Dept: Sales
Id: 7, Name: David Simpson, Dept: IT
```

Notice that the second time the application iterates through the allEmployees collection, the list displayed includes David Simpson, even though this employee was added only after the allEmployees collection was defined.

Press Enter to return to Visual Studio 2013.
In the doWork method, change the statement that generates the allEmployees collection to identify and cache the data immediately, as shown here in bold:
```
var allEmployees = from e in empTree.ToList<Employee>()
                   select e;
```
The ToList and ToArray methods are generic. You can specify the type argument explicitly, as in ToList<Employee>(), or omit it and let the compiler infer it from the type of the source collection; in both cases the result is type safe. The data held in the empTree binary tree consists of Employee objects, so the code shown in this step copies the employees into a generic List<Employee> collection, and the allEmployees query then reads from this copy rather than from the tree.

On the Debug menu, click Start Without Debugging. Verify that the output of the application looks like this:

```
All employees
Id: 1, Name: Kim Abercrombie, Dept: IT
Id: 2, Name: Jeff Hay, Dept: Marketing
Id: 3, Name: Dave Barnett, Dept: Sales
Id: 4, Name: Charlie Herb, Dept: IT
Id: 5, Name: Tim Litton, Dept: Marketing
Id: 6, Name: Chris Preston, Dept: Sales

Employee added
All employees
Id: 1, Name: Kim Abercrombie, Dept: IT
Id: 2, Name: Jeff Hay, Dept: Marketing
Id: 3, Name: Dave Barnett, Dept: Sales
Id: 4, Name: Charlie Herb, Dept: IT
Id: 5, Name: Tim Litton, Dept: Marketing
Id: 6, Name: Chris Preston, Dept: Sales
```

Notice that the second time the application iterates through the allEmployees collection, the list displayed does not include David Simpson. This is because the query is evaluated and the results are cached before David Simpson is added to the empTree binary tree.

Press Enter to return to Visual Studio 2013.

Summary

In this chapter, you learned how LINQ uses the IEnumerable<T> interface and extension methods to provide a mechanism for querying data. You also saw how these features support the query expression syntax in C#.

If you want to continue to the next chapter, keep Visual Studio 2013 running, and turn to Chapter 22.
If you want to exit Visual Studio 2013 now, on the File menu, click Exit. If you see a Save dialog box, click Yes and save the project.

Quick reference

To	Do this
Project specified fields from an enumerable collection	Use the Select method and specify a lambda expression that identifies the fields to project. For example: var customerFirstNames = customers.Select(cust => cust.FirstName); Or use the from and select query operators. For example: var customerFirstNames = from cust in customers select cust.FirstName;
Filter rows from an enumerable collection	Use the Where method, and specify a lambda expression containing the criteria that rows should match. For example: var usCompanies = addresses.Where(addr => String.Equals(addr.Country, "United States")) .Select(usComp => usComp.CompanyName); Or use the where query operator. For example: var usCompanies = from a in addresses where String.Equals(a.Country, "United States") select a.CompanyName;
Enumerate data in a specific order	Use the OrderBy method and specify a lambda expression identifying the field to use to order rows. For example: var companyNames = addresses.OrderBy(addr => addr.CompanyName) .Select(comp => comp.CompanyName); Or, use the orderby query operator. For example: var companyNames = from a in addresses orderby a.CompanyName select a.CompanyName;
Group data by the values in a field	Use the GroupBy method and specify a lambda expression identifying the field to use to group rows. For example: var companiesGroupedByCountry = addresses.GroupBy(addrs => addrs.Country); Or, use the group by query operator. For example: var companiesGroupedByCountry = from a in addresses group a by a.Country;
Join data held in two different collections	Use the Join method, specifying the collection with which to join, the join criteria, and the fields for the result. For example: var countriesAndCustomers = customers .Select(c => new { c.FirstName, c.LastName, c.CompanyName }). Join(addresses, custs => custs.CompanyName, addrs => addrs.CompanyName, (custs, addrs) => new {custs.FirstName, custs.LastName, addrs.Country }); Or, use the join query operator. For example: var countriesAndCustomers = from a in addresses join c in customers on a.CompanyName equals c.CompanyName select new { c.FirstName, c.LastName, a.Country };
Force immediate generation of the results for a LINQ query	Use the ToList or ToArray method to generate a list or an array containing the results. For example: var allEmployees = from e in empTree.ToList<Employee>() select e;
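
The join entry in the table can be tried in isolation with a sketch along the following lines. The customers and addresses arrays here are small hypothetical samples built from anonymous types, and the JoinDemo class and its CustomersWithCountries method are names introduced only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class JoinDemo
{
    // Joins two hypothetical collections on CompanyName and returns one line per match.
    public static List<string> CustomersWithCountries()
    {
        var customers = new[]
        {
            new { FirstName = "Kim", LastName = "Abercrombie", CompanyName = "Alpine Ski House" },
            new { FirstName = "Jeff", LastName = "Hay", CompanyName = "Coho Winery" }
        };
        var addresses = new[]
        {
            new { CompanyName = "Alpine Ski House", Country = "Switzerland" },
            new { CompanyName = "Coho Winery", Country = "United States" }
        };

        // The join operator matches each address with the customers at the same company.
        var joined = from a in addresses
                     join c in customers on a.CompanyName equals c.CompanyName
                     select c.FirstName + " " + c.LastName + ": " + a.Country;

        // ToList forces immediate evaluation so that a plain list can be returned.
        return joined.ToList();
    }

    static void Main()
    {
        foreach (string line in CustomersWithCountries())
        {
            Console.WriteLine(line);
        }
        // Kim Abercrombie: Switzerland
        // Jeff Hay: United States
    }
}
```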
