Chapter 11
Joining Query Results

In This Chapter

Image Using Multiple From Clauses

Image Defining Inner Joins

Image Using Custom, or Nonequijoins

Image Implementing Group Join and Left Outer Join

Image Implementing a Cross Join

Image Defining Joins Based on Composite Keys

I’m still vertical.”

—Jackson Wayfare

If you have experience with SQL joins, the basics of LINQ joining are pretty straightforward. If you have no SQL experience, this chapter covers the basics and more advanced join queries. This chapter describes when the literal keyword join can be used to define a join and when where predicates are used instead. Inner joins, left joins, cross joins, group joins, custom joins, and joins based on composite keys are all covered.

Using Multiple From Clauses

The presence of multiple from clauses in a LINQ query does not necessarily represent a join. A correlation is required. A correlation is where the query indicates how one sequence is related to another. This can be done in the join clause with the equals keyword, an inequality expression, or multiple expressions. Listing 11.1 contains an example that has multiple from clauses but no correlation between the sequence in the first from and second from clause.

Listing 11.1 Multiple from Clauses But No Correlated Join

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace MultipleFroms
{
 class Program
 {
   static void Main(string[] args)
   {
     string[] willRogers =
     {
       “Don’t gamble; take all your savings “,
       “and buy some good stock and hold it “,
       “till it goes up, then sell it. If it “,
       “don’t go up, don’t buy it. Will Rogers”
     };

     // part, word, words, and one are all called range variables
     var horseSense = from part in willRogers
                      let words = part.Split(‘;’, ‘.’, ‘ ‘, ‘,’)
                      from word in words
                      let one = word.ToUpper()
                      where one.Contains(‘I’)
                      select word;
     Array.ForEach(horseSense.ToArray(), w => Console.WriteLine(w));
     Console.ReadLine();
   }
 }
}

The quotation attributed to Will Rogers (willRogers) is an array of strings. The query contains two from clauses, one from the array of strings and the second from the let variable words. The query selects each of the strings from the original quote. Each string is split and assigned to words. The second from clause selects each word in the words, converts them to uppercase, and selects those that contain I.

In the clause “from part in willRogers”, part is called the range. The range variable is like the iteration variable in a for loop. The keyword let is also used to define a range. Thus, in the query in Listing 11.1, there are four range variables: part, words, word, and one. LINQ infers the type of range variables from the context.

Range variables are a limiting factor that defines when you can use join and when you can’t. This concept is covered in more detail in the section titled “Using Custom, or Nonequijoins.”

Defining Inner Joins

Given two sources of data (this chapter refers to one as the left source and the other as the right source), an inner join is a join such that only objects that have matching criteria in both the left and right side are matched. In LINQ, this literally translates to all the items in the first from clause are returned only if there are items in the second from clause. The number of items in the first and second sequences does not have to be equal, but all of the items that have matching criteria in both sequences are returned to form a new sequence.

Inner joins are generally based on what is commonly referred to as master-detail or parent-child relationships. Generally, there is one item in the master source for every one or more items in the detail source. Certain things can happen to the data, like children without parents—called orphans—but this doesn’t change the way you request an inner join sequence. Listing 11.2 demonstrates an inner join based on a contrived data set.

Listing 11.2 A Typical Inner Join Based on a Single “Key,” the Customer Object’s ID.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace JoinDemo
{
 class Program
 {
   static void Main(string[] args)
   {
     List<Customer> customers = new List<Customer>{
       new Customer{ID=1, CompanyName=”Tom’s Toffees”},
       new Customer{ID=2, CompanyName=”Karl’s Coffees”},
       new Customer{ID=3, CompanyName=”Mary’s Marshmallow Cremes”}};

     List<Order> orders = new List<Order>{
       new Order{ID=1, CustomerID=1, ItemDescription=”Granulated Sugar”},
       new Order{ID=2, CustomerID=1, ItemDescription=”Molasses”},
       new Order{ID=3, CustomerID=2, ItemDescription=”French Roast
Beans”},
       new Order{ID=4, CustomerID=2, ItemDescription=”Ceramic Cups”}};

     // join
     var join1 = from customer in customers
                  join order in orders on customer.ID equals
                  order.CustomerID
                  select new {Name=customer.CompanyName,
                    Item=order.ItemDescription};
     Array.ForEach(join1.ToArray(), o=>Console.WriteLine
       (“Name: {0}, Item: {1}”, o.Name, o.Item));
     Console.ReadLine();
   }
 }

 public class Customer
 {
   public int ID{get; set;}
   public string CompanyName{get; set; }
 }

 public class Order
 {
   public int ID{get; set; }
   public int CustomerID{get; set; }
   public string ItemDescription{get; set;}
 }
}

In Listing 11.2, the customers and orders are related based on the customer ID in both the Customer and Order classes. Tom’s Toffees is used twice because it appears once in the customers sequence and is associated with two orders. Mary’s Marshmallow Cremes does not show up in the resultset because Mary’s has no orders in the data.

The query in Listing 11.2 is called an equijoin. This query is testing for equality, so it uses the join keyword. Notice that the LINQ query’s on predicate uses the keyword equals instead of the operator ==. You can write custom joins based on inequality, called nonequijoins, but you can’t use the keyword join for inequality tests.

Using Custom, or Nonequijoins

A literal join clause performs an equijoin using the join, on, and equals keywords. You can perform nonequijoins for cross joins and customs joins without the join keyword when the join is predicated on inequality, multiple expressions of equality or inequality, or when a temporary range variable using let is introduced for the right-side sequence. To define a custom join, define the correlation between sequences in a where clause.

Defining a Nonequal Custom Join

If there are two sequences containing keys, equijoins based on those keys are straightforward. However, if the second sequence is part of search criteria or input from a user, the second sequence would not be composed of objects and, hence, there would be no keys to join on. In this scenario, you could use a custom join and a where predicate.

Listing 11.3 defines two extension methods for IDataReader. The extension methods read and initialize a collection of products and suppliers. The Product and Supplier code is boilerplate code representing the Northwind Products and Suppliers tables. The classes Product and Supplier are entity classes.

In Listing 11.3, productIDsWanted represents the selection criteria and the LINQ query uses a range variable ID to determine which Product objects to return. The SupplierID and the ProductName are displayed to the console using the Array.ForEach method and the Action<T> delegate.

Listing 11.3 Custom Join Based on Selection Criteria Sequence

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
using System.Data;
using System.Data.SqlClient;

namespace CustomJoin
{

 class Program
 {

   static void Main()
   {

     var productIDsWanted = new int[] { 1, 2, 3, 4, 5, 6 };

     List<Product> products = GetProducts();

     var results =
       from p in products
       from id in productIDsWanted
       where p.SupplierID == id
       orderby p.SupplierID
       select p;

     Array.ForEach(results.ToArray(), target => Console.WriteLine(
       “Supplier ID={0,4}, Product={1}”,
       target.SupplierID, target.ProductName
       ));
     Console.ReadLine();

   }

   private static readonly string connectionString =
       “Data Source=.\SQLEXPRESS;AttachDbFilename=”” +
       “C:\Books\Sams\LINQ\Northwind\northwnd.mdf”;” +
       “Integrated Security=True;Connect Timeout=30;User Instance=True”;

   private static List<Product> GetProducts()
   {

     const string SQL = “SELECT * FROM PRODUCTS”;
     using(SqlConnection connection = new SqlConnection(connectionString))
     {
       connection.Open();
       SqlCommand command = new SqlCommand(SQL, connection);
       IDataReader reader = command.ExecuteReader();
       return reader.ReadProducts();
     }
   }

   private static List<Supplier> GetSuppliers()
   {

     const string SQL = “SELECT * FROM Suppliers”;
     List<Supplier> suppliers = new List<Supplier>();

     using (SqlConnection connection = new SqlConnection(connectionString))
     {
       connection.Open();
       SqlCommand command = new SqlCommand(SQL, connection);
       IDataReader reader = command.ExecuteReader();
       return reader.ReadSuppliers();
     }
   }

 }
 public class Product
 {
   public int? ProductID{ get; set; }
   public string ProductName{ get; set; }
   public int? SupplierID{ get; set; }
   public int? CategoryID{ get; set; }
   public string QuantityPerUnit{ get; set; }
   public decimal? UnitPrice{ get; set; }
   public int? UnitsInStock{ get; set; }
   public int? UnitsOnOrder{ get; set; }
   public int? ReorderLevel{ get; set; }
   public bool? Discontinued{ get; set; }
..}

 public class Supplier
 {
   public int? SupplierID{ get; set; }
   public string CompanyName{ get; set; }
   public string ContactName{ get; set; }
   public string ContactTitle{ get; set; }
   public string Address{ get; set; }
   public string City{ get; set; }
   public string Region{ get; set; }
   public string PostalCode{ get; set; }
   public string Country{ get; set; }
   public string Phone{ get; set; }
   public string Fax{ get; set; }
   public string HomePage{ get; set; }
 }

 public static class ExtendsReader
 {
   public static List<Product> ReadProducts(this IDataReader reader)
   {

     List<Product> list = new List<Product>();
     while (reader.Read())
     {
       Product o = new Product();
       o.ProductID = reader.GetInt32(0);
       o.ProductName = reader.GetString(1);
       o.SupplierID = reader.GetInt32(2);
       o.CategoryID = reader.GetInt32(3);
       o.QuantityPerUnit = reader.GetString(4);
       o.UnitPrice = reader.GetDecimal(5);
       o.UnitsInStock = reader.GetInt16(6);
       o.UnitsOnOrder = reader.GetInt16(7);
       o.ReorderLevel = reader.GetInt16(8);
       o.Discontinued = reader.GetBoolean(9);
       list.Add(o);
     }

     return list;
   }

   public static List<Supplier> ReadSuppliers(this IDataReader reader)
   {
     List<Supplier> list = new List<Supplier>();
     while(reader.Read())
     {
       Supplier o = new Supplier();
       o.SupplierID = reader.GetInt32(0);
       o.CompanyName = reader.GetString(1);
       o.ContactName = reader.GetString(2);
       o.ContactTitle = reader.GetString(3);
       o.Address = reader.GetString(4);
       o.City = reader.GetString(5);
       o.Region = reader[“Region”] ==
         System.DBNull.Value ? ““ : reader.GetString(6);
       o.PostalCode = reader.GetString(7);
       o.Country = reader.GetString(8);
       o.Phone = reader.GetString(9);
       o.Fax = reader[“Fax”] ==
         System.DBNull.Value ? ““ : reader.GetString(10);
       o.HomePage = reader[“HomePage”] == System.DBNull.Value ?
         ““ : reader.GetString(11);

       list.Add(o);
     }

     return list;
   }
 }
}

Notice the use of the formatting sequence on the Console.WriteLine statement {0,4}. You can use additional formatting input to manage the layout or formatting. Console.WriteLine has an overloaded version that invokes string.Format and {0,4} pads the output for format argument 0 to a width of 4, left-justifying the output.

Defining a Custom Join with Multiple Predicates

If there are multiple selection criteria, you can add additional predicates to the where clause of a custom join. Listing 11.4 shows a revision to the LINQ query from Listing 11.3. The new query further refines the selection criteria to those elements between 2 and 5.

Listing 11.4 Refined Custom Join Based on the Selection Criteria Sequence and Additional Predicates in the where Clause

var results =
       from p in products
       from id in productIDsWanted
       where p.SupplierID == id
       && id > 2 && id < 5
       orderby p.SupplierID
       select p;

Suppose you want the actual supplier objects and the product objects to get the name of the supplier as well as the name of the product. You could add an equijoin on supplier and product (you would need to obtain a list of suppliers with the GetSuppliers method) and a custom join on the sequence of integers. The Main function in Listing 11.5 can be used to replace Main in Listing 11.3. The revised Main function in Listing 11.5 displays the company name of the supplier instead of the ID and the product name.

Listing 11.5 A Revision to Main in Listing 11.3 That Displays the Supplier’s Name Instead of the ID and the Product Name

static void Main()
{
 var productIDsWanted = new int[] { 1, 2, 3, 4, 5, 6 };

 List<Product> products = GetProducts();
 List<Supplier> suppliers = GetSuppliers();

 var results =
   from p in products
   join s in suppliers on p.SupplierID equals s.SupplierID
   from id in productIDsWanted
   where p.SupplierID == id
   && id > 2 && id < 5
   orderby p.SupplierID
   select new { SupplierName = s.CompanyName, ProductName = p.ProductName };

   Array.ForEach(results.ToArray(), target => Console.WriteLine(
     “Supplier ID={0}, Product={1}”,
     target.SupplierName, target.ProductName
   ));
   Console.ReadLine();
}

Defining Custom Joins with a Temporary Range Variable

As mentioned earlier in the chapter, the variable introduced in the from clause before the in keyword is referred to as a range variable. It plays the same role as the iterator variable in a for loop. You can introduce additional range variables using the let keyword. If you introduce a range variable for the right side of an inner join, you can’t use the join on equals clause; instead, you have to use a custom join with a where clause and a predicate.

Listing 11.6 is a variation of Listing 11.3. (That’s why the entire listing is here.) Listing 11.6 shows how you can combine multiple queries in a single SQL statement used to initialize the SqlCommand object. IDataReader.Read iterates through all of the elements in a single resultset and IDataReader.NextResult iterates over the resultsets. Because the IDataReader is forward only, you want to get all of the elements from each resultset before moving to the next resultset, as demonstrated in Listing 11.6.

Listing 11.6 Custom Join with Temporary Range Variable Using let

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
using System.Data;
using System.Data.SqlClient;

namespace NonEquijoin
{

 class Program
 {

   static void Main()
   {
     const string connectionString =
       //”Data Source=BUTLER;Initial Catalog=Northwind;Integrated Security=True”;

       “Data Source=.\SQLEXPRESS;AttachDbFilename=”” +
       “C:\Books\Sams\LINQ\Northwind\northwnd.mdf”;” +
       “Integrated Security=True;Connect Timeout=30;User Instance=True”;
     const string SQL = “SELECT * FROM PRODUCTS; SELECT * FROM SUPPLIERS”;
     List<Product> products = new List<Product>();
     List<Supplier> suppliers = new List<Supplier>();

     using(SqlConnection connection = new SqlConnection(connectionString))
     {
       connection.Open();
       SqlCommand command = new SqlCommand(SQL, connection);
       IDataReader reader = command.ExecuteReader();
       products = reader.ReadProducts();

       if(reader.NextResult())
         suppliers = reader.ReadSuppliers();

     }

     var nonEquiJoin =
       from p in products
         let master = from s in suppliers select s.SupplierID
         where master.Contains(p.SupplierID)
         orderby p.SupplierID
         select new {SupplierID=p.SupplierID, ProductName = p.ProductName};

     Array.ForEach(nonEquiJoin.ToArray(), target=>Console.WriteLine(
       “Supplier ID={0,4}, Product={1}”,
       target.SupplierID, target.ProductName
       ));
     Console.ReadLine();
   }
 }

 public class Product
 {
   public int? ProductID{ get; set; }
   public string ProductName{ get; set; }
   public int? SupplierID{ get; set; }
   public int? CategoryID{ get; set; }
   public string QuantityPerUnit{ get; set; }
   public decimal? UnitPrice{ get; set; }
   public int? UnitsInStock{ get; set; }
   public int? UnitsOnOrder{ get; set; }
   public int? ReorderLevel{ get; set; }
   public bool? Discontinued{ get; set; }
 }
 public class Supplier
 {
   public int? SupplierID{ get; set; }
   public string CompanyName{ get; set; }
   public string ContactName{ get; set; }
   public string ContactTitle{ get; set; }
   public string Address{ get; set; }
   public string City{ get; set; }
   public string Region{ get; set; }
   public string PostalCode{ get; set; }
   public string Country{ get; set; }
   public string Phone{ get; set; }
   public string Fax{ get; set; }
   public string HomePage{ get; set; }
 }

 public static class ExtendsReader
 {
   public static List<Product> ReadProducts(this IDataReader reader)
   {

     List<Product> list = new List<Product>();
     while (reader.Read())
     {
       Product o = new Product();
       o.ProductID = reader.GetInt32(0);
       o.ProductName = reader.GetString(1);
       o.SupplierID = reader.GetInt32(2);
       o.CategoryID = reader.GetInt32(3);
       o.QuantityPerUnit = reader.GetString(4);
       o.UnitPrice = reader.GetDecimal(5);
       o.UnitsInStock = reader.GetInt16(6);
       o.UnitsOnOrder = reader.GetInt16(7);
       o.ReorderLevel = reader.GetInt16(8);
       o.Discontinued = reader.GetBoolean(9);
       list.Add(o);
     }

     return list;
   }

   public static List<Supplier> ReadSuppliers(this IDataReader reader)
   {
     List<Supplier> list = new List<Supplier>();
     while(reader.Read())
     {
       Supplier o = new Supplier();
       o.SupplierID = reader.GetInt32(0);
       o.CompanyName = reader.GetString(1);
       o.ContactName = reader.GetString(2);
       o.ContactTitle = reader.GetString(3);
       o.Address = reader.GetString(4);
       o.City = reader.GetString(5);
       o.Region = reader[“Region”] ==
         System.DBNull.Value ? ““ : reader.GetString(6);
       o.PostalCode = reader.GetString(7);
       o.Country = reader.GetString(8);
       o.Phone = reader.GetString(9);
       o.Fax = reader[“Fax”] ==
         System.DBNull.Value ? ““ : reader.GetString(10);
       o.HomePage = reader[“HomePage”] == System.DBNull.Value ?
         ““ : reader.GetString(11);

       list.Add(o);
     }

     return list;
   }
 }
}

The LINQ query in Listing 11.6 uses the range variable master defined from an additional LINQ query that returns a sequence of just SupplierIDs. Because the listing uses let, a where clause implements the implied custom join, returning only those products that contain the predicated SupplierID. The results are ordered by the SupplierID.

Implementing Group Join and Left Outer Join

The group join and the left outer join use a group join but produce different resultsets. A group join is emitted to Microsoft Intermediate Language (MSIL, managed code’s assembly language output) as an extension method GroupJoin when the into clause is used in a LINQ query. The group is represented as a master-detail relationship with a single instance of elements from the master sequence and subordinate, related sequences representing the details (or children). A left outer join uses the GroupJoin method (and into clause) but the data is flattened back out, denormalizing elements from both sequences. Unlike the inner join, however, you still get master sequence elements that have no child, detail sequences.

For example, if you perform group join on customers and orders from Listing 11.2, you get one result for Tom’s, one for Karl’s, and one for Mary’s. Tom’s contains a child sequence containing Granulated Sugar and Molasses, Karl’s contains a sequence containing French Roast Beans and Ceramic Cups, and Mary’s has no child sequence. If you convert the group to a left join, the customer data reappears each time for every child element and Mary’s appears once with an empty child element.

The next subsection demonstrates the group join and the following subsection demonstrates the revision that produces the flattened left join.

Defining a Group Join

SQL experts know what groups do. As a reminder for the rest of us, a group is a way of returning items in a first sequence and correlating each of the items with the related elements from the second sequence. For example, in Listing 11.2, Tom’s Toffees sells Granulated Sugar and Molasses. With an inner join, Tom’s Toffees is returned and repeated for each of the items Tom sells. If you use a group join, Tom’s Toffees is returned one time and is correlated with the items he sells. A group join, in effect, is a way to represent the normalized relationship existing in relational databases, and an inner join denormalizes (or repeats) data.

With the group join, the data is handled differently. An outer sequence represents the groups and an inner sequence in each group element represents the elements of the group. The result is a logical, master-detail relationship represented by the return type. In technical reality, you still have an IEnumerable<T> return result from group join queries, but each group is associated with a subordinate sequence related by a key value.

Listing 11.7 shows a straightforward use of the group join. It is the presence of the into keyword and predicate that instructs the compiler to emit the call to the GroupJoin extension method (see Figure 11.1).

Listing 11.7 Defining a Query That Uses the GroupJoin Behavior, Capturing the Master-Detail Relationship Between Customers and Orders

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace GroupJoin
{
 class Program
 {
   static void Main(string[] args)
   {
     List<Customer> customers = new List<Customer>{
       new Customer{ID=1, CompanyName=”Tom’s Toffees”},
       new Customer{ID=2, CompanyName=”Karl’s Coffees”},
       new Customer{ID=3, CompanyName=”Mary’s Marshmallow Cremes”}};

     List<Order> orders = new List<Order>{
       new Order{ID=1, CustomerID=1, ItemDescription=”Granulated Sugar”},
       new Order{ID=2, CustomerID=1, ItemDescription=”Molasses”},
       new Order{ID=3, CustomerID=2, ItemDescription=”French Roast Beans”},
       new Order{ID=4, CustomerID=2, ItemDescription=”Ceramic Cups”}};

     // group join
     var group = from customer in customers
                 join order in orders on customer.ID equals
                 order.CustomerID into children
                 select new { Customer = customer.CompanyName,
                   Orders = children };

     string separator = new string(‘-’, 40);
     foreach (var customer in group)
     {
       Console.WriteLine(“Customer: {0}”, customer.Customer);
       Console.WriteLine(separator);
       foreach (var order in customer.Orders)
       {
         Console.WriteLine(“ ID={0}, Item={1}”, order.ID, order.ItemDescription);
       }
       Console.WriteLine(Environment.NewLine);
     }

     Console.ReadLine();
   }
 }

 public class Customer
 {
   public int ID{get; set;}
   public string CompanyName{get; set; }
 }
 public class Order
 {
   public int ID{get; set; }
   public int CustomerID{get; set; }
   public string ItemDescription{get; set;}
 }
}

Figure 11.1 Lutz Roeder’s tool Reflector showing the disassembled GroupJoin sample with the emitted GroupJoin extension method emitted for the LINQ query.

Image

The way the query is designed in this example provides a master-detail relationship, which includes Mary’s Marshmallow Cremes even though there are no orders for Mary’s. Note the use of the nested fo r loop to get at the child sequence data. The output from Listing 11.7 is shown in Figure 11.2.

Implementing a Left Outer Join

A left outer join is an inner join plus all of the elements in the first range that don’t have related elements in the second range, so a left join is always greater than or equal to the size of an inner join.

A left joins starts out as a group join. You can use the join… on…equals… into phrase to start the left outer join, but you’ll want to flatten the resultset. To flatten the resultset, add an additional from clause based on the group. If you stop here, though, you are

Figure 11.2 The output from the group join demonstrates child sequences for every parent element even if the sequence is empty.

Image

simply taking the long route to an inner join. To get master elements that don’t have detail elements, you need to incorporate the DefaultIfEmpty extension method and express how to handle absent detail elements.

Listing 11.8 shows a revision to the Main function for Listing 11.7. This version adds an additional from clause and range child. The range child represents all of the elements in the subordinate sequences represented by children. The predicate

in children.DefaultIfEmpty(new Order())

means that if there is no Order for that child, you need to create an empty instance of the Order class. (This is a subtle implementation of the Null Object pattern; the basic concept is also described in Martin Fowler’s book, Refactoring, as the “Introduce Null Object” refactoring.) The result is that we get Tom’s, Karl’s, and Mary’s in the resultset.

Listing 11.8 A Revision to Listing 11.7’s Main Function That Converts the Basic Group Join into a Left Outer Join, Flattening the Master-Detail Relationship, Which Is the Nature of Group Relationships, into a Standard Join

static void Main(string[] args)
    {
     List<Customer> customers = new List<Customer>{
       new Customer{ID=1, CompanyName=”Tom’s Toffees”},
       new Customer{ID=2, CompanyName=”Karl’s Coffees”},
       new Customer{ID=3, CompanyName=”Mary’s Marshmallow Cremes”}};

     List<Order> orders = new List<Order>{
       new Order{ID=1, CustomerID=1, ItemDescription=”Granulated Sugar”},
       new Order{ID=2, CustomerID=1, ItemDescription=”Molasses”},
       new Order{ID=3, CustomerID=2, ItemDescription=”French Roast Beans”},
       new Order{ID=4, CustomerID=2, ItemDescription=”Ceramic Cups”}};

     // group join
     var group = from customer in customers
                 join order in orders on customer.ID equals
                 order.CustomerID into children
                 from child in children.DefaultIfEmpty(new
                   Order())
                 select new { Customer = customer.CompanyName,
                  Item=child.ItemDescription };

     foreach (var customer in group)
     {
       Console.WriteLine(“Customer={0}, Item={1}”, customer.Customer,
         customer.Item);
     }
     Console.ReadLine();
   }
 }

Notice that in the foreach loop, the internal loop is no longer needed. That’s because the query in Listing 11.8 is really no longer represented as a master-detail hierarchical result-set; it is simply a left outer join. As you can see from the output in Figure 11.3, the master elements—the customers—have been denormalized and are now repeated for each child element, and the Customer objects are present even if no child exists, as in Mary’s case.

Figure 11.3 The group join is converted to a left outer join by including an additional range variable on the group and using DefaultIfEmpty to handle null child sequences.

Image

Implementing a Cross Join

A cross join or Cartesian join is the basis for every other join. A cross join is a join between two sources absent any correlative predicate. The word Cartesian (after French mathematician René Descartes) refers to the product of the join. For example, if table 1 has 10 rows and table 2 has 25 rows, a cross (or Cartesian) join will produce a result with 250, or every combination of data from table 1 and table 2. Assuming you have a sequence of products and a sequence of customers, the following query (a cross join) would produce all of the combinations of customers and products (the Cartesian product):

var test = from customer in customers
          from product in products
          select new
          {
            customer.CompanyName,
            product.ProductName
           };

As you can imagine, given a left set with m elements and a right set with n elements, the cross join results in a sequence with m * n elements. For example, a customer table with 1,000 customers and a products table with a 1,000 products cross joined would return 1,000,000 rows of data in the resultset. You can imagine that cross join results can quickly become unmanageable.

In the routine case, a cross join is either intentionally being used to produce test data or is the result of a poorly formed correlation between the data sets. Some clever people have devised a useful purpose for cross joins. Suppose, for example, you have to produce a report for all customers and all products, even showing quantities for products a customer hasn’t purchased (where quantity equals 0). (The same approach would work for showing only products a customer isn’t buying.) A cross join of customers and products would include every combination of customer and product. Then, you could perform a left join to return every customer and product combination with actual sales figures, filling in quantities and sales values and zeroing out customer-product combinations that have no sales. Listing 11.9 contains such a SQL Server query for the Northwind database that produces a cross join of customers and products and then a left join to fill in the sales figures for customer-product combinations where orders exist. (Table 11.1 contains the first 10 rows of output from the query, and Figure 11.4 shows part of the execution plan, visually describing how SQL Server will execute the query.)

Table 11.1 The First 10 Rows Returned from the Cross Join Query in Listing 11.9

Image

Figure 11.4 Part of the execution plan from Microsoft SQL Server Management Studio.

Image

Listing 11.9 A SQL Query That Uses the Cross Join to Fill in Sales Figures for Customer and Products, Including Zeros for Customer-Product Combinations with No Sales Orders

SELECT c.CustomerID, c.CompanyName, p.ProductName, ISNULL(S.Quantity, 0) AS Quantity,
  ISNULL(S.UnitPrice, 0) AS UnitPrice, ISNULL(S.Discount, 0) AS Discount,
  ISNULL(ROUND((S.Quantity * S.UnitPrice) * (1 - S.Discount), 2), 0) AS TotalSales
FROM  Customers AS c CROSS JOIN
  Products AS p LEFT OUTER JOIN
   (SELECT o.CustomerID, od.ProductID, SUM(od.UnitPrice) AS UnitPrice,
     SUM(od.Quantity) AS Quantity, od.Discount
   FROM  Orders AS o INNER JOIN
     [Order Details] AS od ON o.OrderID = od.OrderID
   GROUP BY o.CustomerID, od.ProductID, od.Discount) AS S
ON S.CustomerID = c.CustomerID AND p.ProductID = S.ProductID
ORDER BY c.CustomerID, p.ProductName, UnitPrice DESC

The SQL version (and the LINQ version) require some effort to get right, but in many instances you will be trading lines of code for time because these kinds of monolithic queries are difficult to write, difficult to debug, and difficult to test. Listing 11.10 is an approximation of the query in Listing 11.9. (The plumbing and some sample code was added for testing purposes, but in practice this code would be necessary to write with or without LINQ.)

Listing 11.10 An Approximation of the SQL in Listing 11.9 as a LINQ Query

using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Reflection;

namespace CrossJoinReportMagic
{
 class Program
 {
   static void Main(string[] args)
   {

     var orderInfo =   /* left outer join left hand side */
                       from cross in
                       (
                         /* cross join customer and products */
                         from customer in customers
                         from product in products
                         select new

                         {
                           customer.CustomerID,
                           customer.CompanyName,
                           product.ProductID,
                           product.ProductName
                         })
                       /* left outer join right hand side */
                       join groupJoin in
                       (

                       /* group join on orders and details */
                       (from order in orders
                        join detail in orderDetails on
                        order.OrderID equals detail.OrderID
                        group detail by new
                        {
                          order.CustomerID,
                          detail.ProductID,
                          detail.UnitPrice,
                          detail.Discount
                        } into groups
                        from item in groups
                        let Quantity = groups.Sum(d => d.Quantity)
                        select new
                        {
                          groups.Key.CustomerID,
                          item.ProductID,
                          item.UnitPrice,
                          Quantity,
                          item.Discount
                        }).Distinct()).DefaultIfEmpty()
                     on new { cross.CustomerID, cross.ProductID } equals
                        new { groupJoin.CustomerID, groupJoin.ProductID }
                     into grouped
                     from result in grouped.DefaultIfEmpty()
                     orderby  cross.CompanyName, cross.ProductID,
                       (result != null ? result.Quantity : 0) descending
                     select new
                     {
                       CustomerID = cross.CustomerID,
                       CompanyName = cross.CompanyName,
                       ProductID = cross.ProductID,
                       ProductName = cross.ProductName,
                       UnitPrice = result == null ? 0 : result.UnitPrice,
                       Quantity = result == null ? 0 : result.Quantity,
                       Discount = result == null ? 0 : result.Discount,
                        Total = result == null ? 0 :
                          result.Quantity * result.UnitPrice *
                         (1 - result.Discount)
                     };

     Dump(orderInfo);
     Console.ReadLine();
        }

   public static void Dump(object data)
   {
     if (data is IEnumerable)
     {
       IEnumerable list = data as IEnumerable;
       IEnumerator enumerator = list.GetEnumerator();
       int i = 0;
       bool once = false;
       while(enumerator.MoveNext())
       {
         i++;
         object current = enumerator.Current;

         PropertyInfo[] props = current.GetType().GetProperties();
         if(!once /* write headers */)
         {
           foreach(PropertyInfo info in props)
           {
              Console.Write(“{0,-10}”, info.Name.Length > 9 ? info.Name.Remove(9):
               info.Name.PadRight(10));
             once = true;
           }
         }

         foreach (PropertyInfo info in props)
         {
           object value = null;
           try{ value = info.GetValue(current, null); }
           catch { value = “<empty>”; }

           string val = (value.ToString().Length > 9) ?
             value.ToString().Remove(9) : value.ToString().PadRight(10);

           if (info.PropertyType == typeof(decimal))
             Console.Write(“{0,-10:C02}”, Convert.ToDecimal(val));
           else
           if (info.PropertyType == typeof(int))
             Console.Write(“{0,-10}”, Convert.ToInt32(val));
           else
           if (info.PropertyType == typeof(float))
             Console.Write(“{0,-10:F02}”, Convert.ToSingle(val));
           else
             Console.Write(“{0,-10}”, val);
         }

         Console.WriteLine(Environment.NewLine);
       }

       Console.WriteLine(“Item count: {0}”, i);
     }
   }
   #region collection initializations
   private static Customer[] customers = new Customer[]{
       new Customer{CustomerID=”ALFKI”, CompanyName=”Alfreds Futterkiste”},
       new Customer{CustomerID=”ANATR”,
         CompanyName=”Ana Trujillo Emparedados y helados”},
       new Customer{CustomerID=”ANTON”, CompanyName=”Antonio Moreno Taquería”},
        new Customer{CustomerID=”AROUT”, CompanyName=”Around the Horn”},
       new Customer{CustomerID=”BERGS”, CompanyName=”Berglunds snabbköp”},
       new Customer{CustomerID=”SPECD”, CompanyName=”Spécialités du monde”}
     };

   private static Product[] products = new Product[]{
       new Product{ProductID=1, ProductName=”Chai”},
       new Product{ProductID=2, ProductName=”Chang”},
       new Product{ProductID=3, ProductName=”Aniseed Syrup”},
       new Product{ProductID=4, ProductName=”Chef Anton’s Cajun Seasoning”},
       new Product{ProductID=5, ProductName=”Chef Anton’s Gumbo Mix”},
       new Product{ProductID=6, ProductName=”Grandma’s Boysenberry Spread”},
       new Product{ProductID=7, ProductName=”Uncle Bob’s Organic Dried Pears”},
       new Product{ProductID=8, ProductName=”Northwoods Cranberry Sauce”},
       new Product{ProductID=9, ProductName=”Mishi Kobe Niku”},
       new Product{ProductID=10, ProductName=”Ikura”}
     };

   private static Order[] orders = new Order[]{
       new Order{ OrderID= 10278, CustomerID=”BERGS”},
       new Order{ OrderID= 10280, CustomerID=”BERGS”},
       new Order{ OrderID= 10308, CustomerID=”BERGS”},
       new Order{ OrderID= 10355, CustomerID=”AROUT”},
       new Order{ OrderID= 10365, CustomerID=”ANTON”},
       new Order{ OrderID= 10383, CustomerID=”AROUT”},
       new Order{ OrderID= 10384, CustomerID=”BERGS”},
       new Order{ OrderID= 10444, CustomerID=”BERGS”},
       new Order{ OrderID= 10445, CustomerID=”BERGS”},
       new Order{ OrderID= 10453, CustomerID=”AROUT”},
       new Order{ OrderID= 10507, CustomerID=”ANTON”},
       new Order{ OrderID= 10524, CustomerID=”BERGS”},
       new Order{ OrderID= 10535, CustomerID=”ANTON”},
       new Order{ OrderID= 10558, CustomerID=”AROUT”},
       new Order{ OrderID= 10572, CustomerID=”BERGS”},
       new Order{ OrderID= 10573, CustomerID=”ANTON”},
       new Order{ OrderID= 10625, CustomerID=”ANATR”},
       new Order{ OrderID= 10626, CustomerID=”BERGS”},
       new Order{ OrderID= 10643, CustomerID=”ALFKI”}
     };

   private static OrderDetail[] orderDetails = new OrderDetail[]{
       new OrderDetail{OrderID=10278, ProductID=1,
         UnitPrice=15.5000M, Quantity=16, Discount=0},
       new OrderDetail{OrderID=10278, ProductID=2,
         UnitPrice=44.0000M, Quantity=15, Discount=0},
       new OrderDetail{OrderID=10278, ProductID=3,
         UnitPrice=35.1000M, Quantity=8, Discount=0},
       new OrderDetail{OrderID=10278, ProductID=4,
         UnitPrice=12.0000M, Quantity=25, Discount=0},
       new OrderDetail{OrderID=10280, ProductID=5,
         UnitPrice=3.6000M, Quantity=12, Discount=0},
       new OrderDetail{OrderID=10280, ProductID=6,
         UnitPrice=19.2000M, Quantity=20, Discount=0},
       new OrderDetail{OrderID=10280, ProductID=7,
         UnitPrice=6.2000M, Quantity=30, Discount=0},
       new OrderDetail{OrderID=10308, ProductID=1,
         UnitPrice=15.5000M, Quantity=5, Discount=0},
       new OrderDetail{OrderID=10308, ProductID=9,
         UnitPrice=12.0000M, Quantity=5, Discount=0},
       new OrderDetail{OrderID=10355, ProductID=3,
         UnitPrice=3.6000M, Quantity=25, Discount=0},
       new OrderDetail{OrderID=10355, ProductID=5,
         UnitPrice=15.6000M, Quantity=25, Discount=0},
       new OrderDetail{OrderID=10365, ProductID=7,
         UnitPrice=16.8000M, Quantity=24, Discount=0},
       new OrderDetail{OrderID=10383, ProductID=9,
         UnitPrice=4.8000M, Quantity=20, Discount=0},
       new OrderDetail{OrderID=10383, ProductID=2,
         UnitPrice=13.0000M, Quantity=15, Discount=0},
       new OrderDetail{OrderID=10383, ProductID=4,
         UnitPrice=30.4000M, Quantity=20, Discount=0}
       };
   #endregion
 }

   public class Customer
   {
     public string CustomerID { get; set; }
     public string CompanyName { get; set; }
   }

   public class Product
   {
     public int ProductID { get; set; }
     public string ProductName { get; set; }
   }

   public class Order
   {
     public int OrderID { get; set; }
     public string CustomerID { get; set; }
   }

   public class OrderDetail
   {
     public int OrderID { get; set; }
     public int ProductID { get; set; }
     public decimal UnitPrice { get; set; }
     public int Quantity { get; set; }
     public decimal Discount { get; set; }
   }
}

The LINQ query starts on the line that begins with var orderInfo and extends for 56 lines. This probably can’t be stressed enough: Although a language, parser, and compiler (and the “Rainman”) might be able to comprehend code like the query in Listing 11.10, normal humans generally can’t with the amount of effort that makes it unproductive to produce. As a general practice, it is better to write lucid code and let compilers optimize the code; this is true 99.9% of the time.

Defining Joins Based on Composite Keys

A key is a generally a field that represents some value that is correlated to a like value somewhere else. For example, the preceding section used CustomerID from Customer and Order objects; CustomerID is an example of a key. A composite key is a key that is composed of multiple fields. In the parlance of SQL, composite keys are thought of as an index based on multiple columns (or fields). The same basic premise holds true for LINQ.

In LINQ where you would generally expect a single value to express a correlated relationship, composite keys use the new keyword and multiple fields. Listing 11.11 shows a query that defines two composite keys: The first is based on the cross join range variable cross.CustomerID and cross.ProductID and the second is based on the range variable groupJoin.CustomerID and groupJoin.ProductID. (Listing 11.10 also shows a composite key for grouping in the phrase that begins with “group detail by new”.)

Listing 11.11 Defining a Composite Key to Correlate Objects That Have Relationships Expressed By More Than a Single Field

  on new { cross.CustomerID, cross.ProductID } equals
    new { groupJoin.CustomerID, groupJoin.ProductID }

Summary

There is a lot to LINQ joins just as there is a lot to SQL joins. One of the biggest challenges is: When does good taste suggest you write a compound series of statements versus a monolithic statement? There were some studies done that suggest the human, short-term mind can juggle 7 to 10 items at a time. So, a good rule of thumb is if you feel yourself dropping some juggled items, your successors will find the queries difficult to juggle too.

Master-detail relationships composed of two sequences to create a new, third sequence are pretty straightforward and, luckily, they are the most common type. To address everyday kinds of joins, this chapter showed how LINQ can support inner joins, group joins, left joins, cross joins, custom joins, and joins based on composite keys. The inner join is probably one of the most routine, followed by the left join. If you find yourself needing right joins (which this chapter didn’t cover), cross joins, or custom joins, like everything else, it will take a lot of practice to become proficient. Hopefully, this chapter helps get you started.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.144.38.92