In This Chapter
Using Multiple From
Clauses
Defining Inner Joins
Using Custom, or Nonequijoins
Implementing Group Join and Left Outer Join
Implementing a Cross Join
Defining Joins Based on Composite Keys
“I’m still vertical.”
—Jackson Wayfare
If you have experience with SQL joins, the basics of LINQ joining are pretty straightforward. If you have no SQL experience, this chapter covers the basics and more advanced join queries. This chapter describes when the literal keyword join
can be used to define a join and when where predicates are used instead. Inner joins, left joins, cross joins, group joins, custom joins, and joins based on composite keys are all covered.
From
ClausesThe presence of multiple from
clauses in a LINQ query does not necessarily represent a join. A correlation is required. A correlation is where the query indicates how one sequence is related to another. This can be done in the join
clause with the equals
keyword, an inequality expression, or multiple expressions. Listing 11.1 contains an example that has multiple from
clauses but no correlation between the sequence in the first from
and second from
clause.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace MultipleFroms
{
class Program
{
static void Main(string[] args)
{
string[] willRogers =
{
“Don’t gamble; take all your savings “,
“and buy some good stock and hold it “,
“till it goes up, then sell it. If it “,
“don’t go up, don’t buy it. Will Rogers”
};
// part, word, words, and one are all called range variables
var horseSense = from part in willRogers
let words = part.Split(‘;’, ‘.’, ‘ ‘, ‘,’)
from word in words
let one = word.ToUpper()
where one.Contains(‘I’)
select word;
Array.ForEach(horseSense.ToArray(), w => Console.WriteLine(w));
Console.ReadLine();
}
}
}
The quotation attributed to Will Rogers (willRogers
) is an array of strings. The query contains two from
clauses, one from the array of strings and the second from the let
variable words
. The query selects each of the strings from the original quote. Each string is split and assigned to words
. The second from
clause selects each word in the words
, converts them to uppercase, and selects those that contain I.
In the clause “from part in willRogers
”, part
is called the range. The range variable is like the iteration variable in a for
loop. The keyword let
is also used to define a range. Thus, in the query in Listing 11.1, there are four range variables: part
, words
, word
, and one
. LINQ infers the type of range variables from the context.
Range variables are a limiting factor that defines when you can use join
and when you can’t. This concept is covered in more detail in the section titled “Using Custom, or Nonequijoins.”
Given two sources of data (this chapter refers to one as the left source and the other as the right source), an inner join is a join such that only objects that have matching criteria in both the left and right side are matched. In LINQ, this literally translates to all the items in the first from
clause are returned only if there are items in the second from
clause. The number of items in the first and second sequences does not have to be equal, but all of the items that have matching criteria in both sequences are returned to form a new sequence.
Inner joins are generally based on what is commonly referred to as master-detail or parent-child relationships. Generally, there is one item in the master source for every one or more items in the detail source. Certain things can happen to the data, like children without parents—called orphans—but this doesn’t change the way you request an inner join sequence. Listing 11.2 demonstrates an inner join based on a contrived data set.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace JoinDemo
{
class Program
{
static void Main(string[] args)
{
List<Customer> customers = new List<Customer>{
new Customer{ID=1, CompanyName=”Tom’s Toffees”},
new Customer{ID=2, CompanyName=”Karl’s Coffees”},
new Customer{ID=3, CompanyName=”Mary’s Marshmallow Cremes”}};
List<Order> orders = new List<Order>{
new Order{ID=1, CustomerID=1, ItemDescription=”Granulated Sugar”},
new Order{ID=2, CustomerID=1, ItemDescription=”Molasses”},
new Order{ID=3, CustomerID=2, ItemDescription=”French Roast
Beans”},
new Order{ID=4, CustomerID=2, ItemDescription=”Ceramic Cups”}};
// join
var join1 = from customer in customers
join order in orders on customer.ID equals
order.CustomerID
select new {Name=customer.CompanyName,
Item=order.ItemDescription};
Array.ForEach(join1.ToArray(), o=>Console.WriteLine
(“Name: {0}, Item: {1}”, o.Name, o.Item));
Console.ReadLine();
}
}
public class Customer
{
public int ID{get; set;}
public string CompanyName{get; set; }
}
public class Order
{
public int ID{get; set; }
public int CustomerID{get; set; }
public string ItemDescription{get; set;}
}
}
In Listing 11.2, the customers and orders are related based on the customer ID in both the Customer
and Order
classes. Tom’s Toffees is used twice because it appears once in the customers
sequence and is associated with two orders. Mary’s Marshmallow Cremes does not show up in the resultset because Mary’s has no orders in the data.
The query in Listing 11.2 is called an equijoin. This query is testing for equality, so it uses the join
keyword. Notice that the LINQ query’s on
predicate uses the keyword equals
instead of the operator ==
. You can write custom joins based on inequality, called nonequijoins, but you can’t use the keyword join
for inequality tests.
A literal join
clause performs an equijoin using the join
, on
, and equals
keywords. You can perform nonequijoins for cross joins and customs joins without the join
keyword when the join is predicated on inequality, multiple expressions of equality or inequality, or when a temporary range variable using let
is introduced for the right-side sequence. To define a custom join, define the correlation between sequences in a where
clause.
If there are two sequences containing keys, equijoins based on those keys are straightforward. However, if the second sequence is part of search criteria or input from a user, the second sequence would not be composed of objects and, hence, there would be no keys to join on. In this scenario, you could use a custom join and a where
predicate.
Listing 11.3 defines two extension methods for IDataReader
. The extension methods read and initialize a collection of products and suppliers. The Product
and Supplier
code is boilerplate code representing the Northwind Products and Suppliers tables. The classes Product
and Supplier
are entity classes.
In Listing 11.3, productIDsWanted
represents the selection criteria and the LINQ query uses a range variable ID to determine which Product
objects to return. The SupplierID
and the ProductName
are displayed to the console using the Array.ForEach
method and the Action<T>
delegate.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
using System.Data;
using System.Data.SqlClient;
namespace CustomJoin
{
class Program
{
static void Main()
{
var productIDsWanted = new int[] { 1, 2, 3, 4, 5, 6 };
List<Product> products = GetProducts();
var results =
from p in products
from id in productIDsWanted
where p.SupplierID == id
orderby p.SupplierID
select p;
Array.ForEach(results.ToArray(), target => Console.WriteLine(
“Supplier ID={0,4}, Product={1}”,
target.SupplierID, target.ProductName
));
Console.ReadLine();
}
private static readonly string connectionString =
“Data Source=.\SQLEXPRESS;AttachDbFilename=”” +
“C:\Books\Sams\LINQ\Northwind\northwnd.mdf”;” +
“Integrated Security=True;Connect Timeout=30;User Instance=True”;
private static List<Product> GetProducts()
{
const string SQL = “SELECT * FROM PRODUCTS”;
using(SqlConnection connection = new SqlConnection(connectionString))
{
connection.Open();
SqlCommand command = new SqlCommand(SQL, connection);
IDataReader reader = command.ExecuteReader();
return reader.ReadProducts();
}
}
private static List<Supplier> GetSuppliers()
{
const string SQL = “SELECT * FROM Suppliers”;
List<Supplier> suppliers = new List<Supplier>();
using (SqlConnection connection = new SqlConnection(connectionString))
{
connection.Open();
SqlCommand command = new SqlCommand(SQL, connection);
IDataReader reader = command.ExecuteReader();
return reader.ReadSuppliers();
}
}
}
public class Product
{
public int? ProductID{ get; set; }
public string ProductName{ get; set; }
public int? SupplierID{ get; set; }
public int? CategoryID{ get; set; }
public string QuantityPerUnit{ get; set; }
public decimal? UnitPrice{ get; set; }
public int? UnitsInStock{ get; set; }
public int? UnitsOnOrder{ get; set; }
public int? ReorderLevel{ get; set; }
public bool? Discontinued{ get; set; }
..}
public class Supplier
{
public int? SupplierID{ get; set; }
public string CompanyName{ get; set; }
public string ContactName{ get; set; }
public string ContactTitle{ get; set; }
public string Address{ get; set; }
public string City{ get; set; }
public string Region{ get; set; }
public string PostalCode{ get; set; }
public string Country{ get; set; }
public string Phone{ get; set; }
public string Fax{ get; set; }
public string HomePage{ get; set; }
}
public static class ExtendsReader
{
public static List<Product> ReadProducts(this IDataReader reader)
{
List<Product> list = new List<Product>();
while (reader.Read())
{
Product o = new Product();
o.ProductID = reader.GetInt32(0);
o.ProductName = reader.GetString(1);
o.SupplierID = reader.GetInt32(2);
o.CategoryID = reader.GetInt32(3);
o.QuantityPerUnit = reader.GetString(4);
o.UnitPrice = reader.GetDecimal(5);
o.UnitsInStock = reader.GetInt16(6);
o.UnitsOnOrder = reader.GetInt16(7);
o.ReorderLevel = reader.GetInt16(8);
o.Discontinued = reader.GetBoolean(9);
list.Add(o);
}
return list;
}
public static List<Supplier> ReadSuppliers(this IDataReader reader)
{
List<Supplier> list = new List<Supplier>();
while(reader.Read())
{
Supplier o = new Supplier();
o.SupplierID = reader.GetInt32(0);
o.CompanyName = reader.GetString(1);
o.ContactName = reader.GetString(2);
o.ContactTitle = reader.GetString(3);
o.Address = reader.GetString(4);
o.City = reader.GetString(5);
o.Region = reader[“Region”] ==
System.DBNull.Value ? ““ : reader.GetString(6);
o.PostalCode = reader.GetString(7);
o.Country = reader.GetString(8);
o.Phone = reader.GetString(9);
o.Fax = reader[“Fax”] ==
System.DBNull.Value ? ““ : reader.GetString(10);
o.HomePage = reader[“HomePage”] == System.DBNull.Value ?
““ : reader.GetString(11);
list.Add(o);
}
return list;
}
}
}
Notice the use of the formatting sequence on the Console.WriteLine
statement {0,4}. You can use additional formatting input to manage the layout or formatting. Console.WriteLine
has an overloaded version that invokes string.Format
and {0,4} pads the output for format argument 0 to a width of 4, left-justifying the output.
If there are multiple selection criteria, you can add additional predicates to the where
clause of a custom join. Listing 11.4 shows a revision to the LINQ query from Listing 11.3. The new query further refines the selection criteria to those elements between 2 and 5.
var results =
from p in products
from id in productIDsWanted
where p.SupplierID == id
&& id > 2 && id < 5
orderby p.SupplierID
select p;
Suppose you want the actual supplier objects and the product objects to get the name of the supplier as well as the name of the product. You could add an equijoin on supplier and product (you would need to obtain a list of suppliers with the GetSuppliers
method) and a custom join
on the sequence of integers. The Main
function in Listing 11.5 can be used to replace Main
in Listing 11.3. The revised Main
function in Listing 11.5 displays the company name of the supplier instead of the ID and the product name.
static void Main()
{
var productIDsWanted = new int[] { 1, 2, 3, 4, 5, 6 };
List<Product> products = GetProducts();
List<Supplier> suppliers = GetSuppliers();
var results =
from p in products
join s in suppliers on p.SupplierID equals s.SupplierID
from id in productIDsWanted
where p.SupplierID == id
&& id > 2 && id < 5
orderby p.SupplierID
select new { SupplierName = s.CompanyName, ProductName = p.ProductName };
Array.ForEach(results.ToArray(), target => Console.WriteLine(
“Supplier ID={0}, Product={1}”,
target.SupplierName, target.ProductName
));
Console.ReadLine();
}
As mentioned earlier in the chapter, the variable introduced in the from
clause before the in
keyword is referred to as a range variable. It plays the same role as the iterator variable in a for
loop. You can introduce additional range variables using the let
keyword. If you introduce a range variable for the right side of an inner join, you can’t use the join on equals
clause; instead, you have to use a custom join with a where
clause and a predicate.
Listing 11.6 is a variation of Listing 11.3. (That’s why the entire listing is here.) Listing 11.6 shows how you can combine multiple queries in a single SQL statement used to initialize the SqlCommand
object. IDataReader.Read
iterates through all of the elements in a single resultset and IDataReader.NextResult
iterates over the resultsets. Because the IDataReader
is forward only, you want to get all of the elements from each resultset before moving to the next resultset, as demonstrated in Listing 11.6.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
using System.Data;
using System.Data.SqlClient;
namespace NonEquijoin
{
class Program
{
static void Main()
{
const string connectionString =
//”Data Source=BUTLER;Initial Catalog=Northwind;Integrated Security=True”;
“Data Source=.\SQLEXPRESS;AttachDbFilename=”” +
“C:\Books\Sams\LINQ\Northwind\northwnd.mdf”;” +
“Integrated Security=True;Connect Timeout=30;User Instance=True”;
const string SQL = “SELECT * FROM PRODUCTS; SELECT * FROM SUPPLIERS”;
List<Product> products = new List<Product>();
List<Supplier> suppliers = new List<Supplier>();
using(SqlConnection connection = new SqlConnection(connectionString))
{
connection.Open();
SqlCommand command = new SqlCommand(SQL, connection);
IDataReader reader = command.ExecuteReader();
products = reader.ReadProducts();
if(reader.NextResult())
suppliers = reader.ReadSuppliers();
}
var nonEquiJoin =
from p in products
let master = from s in suppliers select s.SupplierID
where master.Contains(p.SupplierID)
orderby p.SupplierID
select new {SupplierID=p.SupplierID, ProductName = p.ProductName};
Array.ForEach(nonEquiJoin.ToArray(), target=>Console.WriteLine(
“Supplier ID={0,4}, Product={1}”,
target.SupplierID, target.ProductName
));
Console.ReadLine();
}
}
public class Product
{
public int? ProductID{ get; set; }
public string ProductName{ get; set; }
public int? SupplierID{ get; set; }
public int? CategoryID{ get; set; }
public string QuantityPerUnit{ get; set; }
public decimal? UnitPrice{ get; set; }
public int? UnitsInStock{ get; set; }
public int? UnitsOnOrder{ get; set; }
public int? ReorderLevel{ get; set; }
public bool? Discontinued{ get; set; }
}
public class Supplier
{
public int? SupplierID{ get; set; }
public string CompanyName{ get; set; }
public string ContactName{ get; set; }
public string ContactTitle{ get; set; }
public string Address{ get; set; }
public string City{ get; set; }
public string Region{ get; set; }
public string PostalCode{ get; set; }
public string Country{ get; set; }
public string Phone{ get; set; }
public string Fax{ get; set; }
public string HomePage{ get; set; }
}
public static class ExtendsReader
{
public static List<Product> ReadProducts(this IDataReader reader)
{
List<Product> list = new List<Product>();
while (reader.Read())
{
Product o = new Product();
o.ProductID = reader.GetInt32(0);
o.ProductName = reader.GetString(1);
o.SupplierID = reader.GetInt32(2);
o.CategoryID = reader.GetInt32(3);
o.QuantityPerUnit = reader.GetString(4);
o.UnitPrice = reader.GetDecimal(5);
o.UnitsInStock = reader.GetInt16(6);
o.UnitsOnOrder = reader.GetInt16(7);
o.ReorderLevel = reader.GetInt16(8);
o.Discontinued = reader.GetBoolean(9);
list.Add(o);
}
return list;
}
public static List<Supplier> ReadSuppliers(this IDataReader reader)
{
List<Supplier> list = new List<Supplier>();
while(reader.Read())
{
Supplier o = new Supplier();
o.SupplierID = reader.GetInt32(0);
o.CompanyName = reader.GetString(1);
o.ContactName = reader.GetString(2);
o.ContactTitle = reader.GetString(3);
o.Address = reader.GetString(4);
o.City = reader.GetString(5);
o.Region = reader[“Region”] ==
System.DBNull.Value ? ““ : reader.GetString(6);
o.PostalCode = reader.GetString(7);
o.Country = reader.GetString(8);
o.Phone = reader.GetString(9);
o.Fax = reader[“Fax”] ==
System.DBNull.Value ? ““ : reader.GetString(10);
o.HomePage = reader[“HomePage”] == System.DBNull.Value ?
““ : reader.GetString(11);
list.Add(o);
}
return list;
}
}
}
The LINQ query in Listing 11.6 uses the range variable master defined from an additional LINQ query that returns a sequence of just SupplierID
s. Because the listing uses let
, a where
clause implements the implied custom join, returning only those products that contain the predicated SupplierID
. The results are ordered by the SupplierID
.
The group join and the left outer join use a group join but produce different resultsets. A group join is emitted to Microsoft Intermediate Language (MSIL, managed code’s assembly language output) as an extension method GroupJoin
when the into
clause is used in a LINQ query. The group is represented as a master-detail relationship with a single instance of elements from the master sequence and subordinate, related sequences representing the details (or children). A left outer join uses the GroupJoin
method (and into
clause) but the data is flattened back out, denormalizing elements from both sequences. Unlike the inner join, however, you still get master sequence elements that have no child, detail sequences.
For example, if you perform group join on customers and orders from Listing 11.2, you get one result for Tom’s, one for Karl’s, and one for Mary’s. Tom’s contains a child sequence containing Granulated Sugar and Molasses, Karl’s contains a sequence containing French Roast Beans and Ceramic Cups, and Mary’s has no child sequence. If you convert the group to a left join, the customer data reappears each time for every child element and Mary’s appears once with an empty child element.
The next subsection demonstrates the group join and the following subsection demonstrates the revision that produces the flattened left join.
SQL experts know what groups do. As a reminder for the rest of us, a group is a way of returning items in a first sequence and correlating each of the items with the related elements from the second sequence. For example, in Listing 11.2, Tom’s Toffees sells Granulated Sugar and Molasses. With an inner join, Tom’s Toffees is returned and repeated for each of the items Tom sells. If you use a group join, Tom’s Toffees is returned one time and is correlated with the items he sells. A group join, in effect, is a way to represent the normalized relationship existing in relational databases, and an inner join denormalizes (or repeats) data.
With the group join, the data is handled differently. An outer sequence represents the groups and an inner sequence in each group element represents the elements of the group. The result is a logical, master-detail relationship represented by the return type. In technical reality, you still have an IEnumerable<T>
return result from group join queries, but each group is associated with a subordinate sequence related by a key value.
Listing 11.7 shows a straightforward use of the group join. It is the presence of the into
keyword and predicate that instructs the compiler to emit the call to the GroupJoin
extension method (see Figure 11.1).
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace GroupJoin
{
class Program
{
static void Main(string[] args)
{
List<Customer> customers = new List<Customer>{
new Customer{ID=1, CompanyName=”Tom’s Toffees”},
new Customer{ID=2, CompanyName=”Karl’s Coffees”},
new Customer{ID=3, CompanyName=”Mary’s Marshmallow Cremes”}};
List<Order> orders = new List<Order>{
new Order{ID=1, CustomerID=1, ItemDescription=”Granulated Sugar”},
new Order{ID=2, CustomerID=1, ItemDescription=”Molasses”},
new Order{ID=3, CustomerID=2, ItemDescription=”French Roast Beans”},
new Order{ID=4, CustomerID=2, ItemDescription=”Ceramic Cups”}};
// group join
var group = from customer in customers
join order in orders on customer.ID equals
order.CustomerID into children
select new { Customer = customer.CompanyName,
Orders = children };
string separator = new string(‘-’, 40);
foreach (var customer in group)
{
Console.WriteLine(“Customer: {0}”, customer.Customer);
Console.WriteLine(separator);
foreach (var order in customer.Orders)
{
Console.WriteLine(“ ID={0}, Item={1}”, order.ID, order.ItemDescription);
}
Console.WriteLine(Environment.NewLine);
}
Console.ReadLine();
}
}
public class Customer
{
public int ID{get; set;}
public string CompanyName{get; set; }
}
public class Order
{
public int ID{get; set; }
public int CustomerID{get; set; }
public string ItemDescription{get; set;}
}
}
The way the query is designed in this example provides a master-detail relationship, which includes Mary’s Marshmallow Cremes even though there are no orders for Mary’s. Note the use of the nested fo r
loop to get at the child sequence data. The output from Listing 11.7 is shown in Figure 11.2.
A left outer join is an inner join plus all of the elements in the first range that don’t have related elements in the second range, so a left join is always greater than or equal to the size of an inner join.
A left joins starts out as a group join. You can use the join… on…equals… into
phrase to start the left outer join, but you’ll want to flatten the resultset. To flatten the resultset, add an additional from
clause based on the group. If you stop here, though, you are
simply taking the long route to an inner join. To get master elements that don’t have detail elements, you need to incorporate the DefaultIfEmpty
extension method and express how to handle absent detail elements.
Listing 11.8 shows a revision to the Main
function for Listing 11.7. This version adds an additional from
clause and range child. The range child represents all of the elements in the subordinate sequences represented by children. The predicate
in children.DefaultIfEmpty(new Order())
means that if there is no Order
for that child, you need to create an empty instance of the Order
class. (This is a subtle implementation of the Null Object pattern; the basic concept is also described in Martin Fowler’s book, Refactoring, as the “Introduce Null Object” refactoring.) The result is that we get Tom’s, Karl’s, and Mary’s in the resultset.
static void Main(string[] args)
{
List<Customer> customers = new List<Customer>{
new Customer{ID=1, CompanyName=”Tom’s Toffees”},
new Customer{ID=2, CompanyName=”Karl’s Coffees”},
new Customer{ID=3, CompanyName=”Mary’s Marshmallow Cremes”}};
List<Order> orders = new List<Order>{
new Order{ID=1, CustomerID=1, ItemDescription=”Granulated Sugar”},
new Order{ID=2, CustomerID=1, ItemDescription=”Molasses”},
new Order{ID=3, CustomerID=2, ItemDescription=”French Roast Beans”},
new Order{ID=4, CustomerID=2, ItemDescription=”Ceramic Cups”}};
// group join
var group = from customer in customers
join order in orders on customer.ID equals
order.CustomerID into children
from child in children.DefaultIfEmpty(new
Order())
select new { Customer = customer.CompanyName,
Item=child.ItemDescription };
foreach (var customer in group)
{
Console.WriteLine(“Customer={0}, Item={1}”, customer.Customer,
customer.Item);
}
Console.ReadLine();
}
}
Notice that in the foreach
loop, the internal loop is no longer needed. That’s because the query in Listing 11.8 is really no longer represented as a master-detail hierarchical result-set; it is simply a left outer join. As you can see from the output in Figure 11.3, the master elements—the customers
—have been denormalized and are now repeated for each child element, and the Customer
objects are present even if no child exists, as in Mary’s case.
A cross join or Cartesian join is the basis for every other join. A cross join is a join between two sources absent any correlative predicate. The word Cartesian (after French mathematician René Descartes) refers to the product of the join. For example, if table 1 has 10 rows and table 2 has 25 rows, a cross (or Cartesian) join will produce a result with 250, or every combination of data from table 1 and table 2. Assuming you have a sequence of products and a sequence of customers, the following query (a cross join) would produce all of the combinations of customers and products (the Cartesian product):
var test = from customer in customers
from product in products
select new
{
customer.CompanyName,
product.ProductName
};
As you can imagine, given a left set with m elements and a right set with n elements, the cross join results in a sequence with m * n elements. For example, a customer table with 1,000 customers and a products table with a 1,000 products cross joined would return 1,000,000 rows of data in the resultset. You can imagine that cross join results can quickly become unmanageable.
In the routine case, a cross join is either intentionally being used to produce test data or is the result of a poorly formed correlation between the data sets. Some clever people have devised a useful purpose for cross joins. Suppose, for example, you have to produce a report for all customers and all products, even showing quantities for products a customer hasn’t purchased (where quantity equals 0). (The same approach would work for showing only products a customer isn’t buying.) A cross join of customers and products would include every combination of customer and product. Then, you could perform a left join to return every customer and product combination with actual sales figures, filling in quantities and sales values and zeroing out customer-product combinations that have no sales. Listing 11.9 contains such a SQL Server query for the Northwind database that produces a cross join of customers and products and then a left join to fill in the sales figures for customer-product combinations where orders exist. (Table 11.1 contains the first 10 rows of output from the query, and Figure 11.4 shows part of the execution plan, visually describing how SQL Server will execute the query.)
SELECT c.CustomerID, c.CompanyName, p.ProductName, ISNULL(S.Quantity, 0) AS Quantity,
ISNULL(S.UnitPrice, 0) AS UnitPrice, ISNULL(S.Discount, 0) AS Discount,
ISNULL(ROUND((S.Quantity * S.UnitPrice) * (1 - S.Discount), 2), 0) AS TotalSales
FROM Customers AS c CROSS JOIN
Products AS p LEFT OUTER JOIN
(SELECT o.CustomerID, od.ProductID, SUM(od.UnitPrice) AS UnitPrice,
SUM(od.Quantity) AS Quantity, od.Discount
FROM Orders AS o INNER JOIN
[Order Details] AS od ON o.OrderID = od.OrderID
GROUP BY o.CustomerID, od.ProductID, od.Discount) AS S
ON S.CustomerID = c.CustomerID AND p.ProductID = S.ProductID
ORDER BY c.CustomerID, p.ProductName, UnitPrice DESC
The SQL version (and the LINQ version) require some effort to get right, but in many instances you will be trading lines of code for time because these kinds of monolithic queries are difficult to write, difficult to debug, and difficult to test. Listing 11.10 is an approximation of the query in Listing 11.9. (The plumbing and some sample code was added for testing purposes, but in practice this code would be necessary to write with or without LINQ.)
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Reflection;
namespace CrossJoinReportMagic
{
class Program
{
static void Main(string[] args)
{
var orderInfo = /* left outer join left hand side */
from cross in
(
/* cross join customer and products */
from customer in customers
from product in products
select new
{
customer.CustomerID,
customer.CompanyName,
product.ProductID,
product.ProductName
})
/* left outer join right hand side */
join groupJoin in
(
/* group join on orders and details */
(from order in orders
join detail in orderDetails on
order.OrderID equals detail.OrderID
group detail by new
{
order.CustomerID,
detail.ProductID,
detail.UnitPrice,
detail.Discount
} into groups
from item in groups
let Quantity = groups.Sum(d => d.Quantity)
select new
{
groups.Key.CustomerID,
item.ProductID,
item.UnitPrice,
Quantity,
item.Discount
}).Distinct()).DefaultIfEmpty()
on new { cross.CustomerID, cross.ProductID } equals
new { groupJoin.CustomerID, groupJoin.ProductID }
into grouped
from result in grouped.DefaultIfEmpty()
orderby cross.CompanyName, cross.ProductID,
(result != null ? result.Quantity : 0) descending
select new
{
CustomerID = cross.CustomerID,
CompanyName = cross.CompanyName,
ProductID = cross.ProductID,
ProductName = cross.ProductName,
UnitPrice = result == null ? 0 : result.UnitPrice,
Quantity = result == null ? 0 : result.Quantity,
Discount = result == null ? 0 : result.Discount,
Total = result == null ? 0 :
result.Quantity * result.UnitPrice *
(1 - result.Discount)
};
Dump(orderInfo);
Console.ReadLine();
}
public static void Dump(object data)
{
if (data is IEnumerable)
{
IEnumerable list = data as IEnumerable;
IEnumerator enumerator = list.GetEnumerator();
int i = 0;
bool once = false;
while(enumerator.MoveNext())
{
i++;
object current = enumerator.Current;
PropertyInfo[] props = current.GetType().GetProperties();
if(!once /* write headers */)
{
foreach(PropertyInfo info in props)
{
Console.Write(“{0,-10}”, info.Name.Length > 9 ? info.Name.Remove(9):
info.Name.PadRight(10));
once = true;
}
}
foreach (PropertyInfo info in props)
{
object value = null;
try{ value = info.GetValue(current, null); }
catch { value = “<empty>”; }
string val = (value.ToString().Length > 9) ?
value.ToString().Remove(9) : value.ToString().PadRight(10);
if (info.PropertyType == typeof(decimal))
Console.Write(“{0,-10:C02}”, Convert.ToDecimal(val));
else
if (info.PropertyType == typeof(int))
Console.Write(“{0,-10}”, Convert.ToInt32(val));
else
if (info.PropertyType == typeof(float))
Console.Write(“{0,-10:F02}”, Convert.ToSingle(val));
else
Console.Write(“{0,-10}”, val);
}
Console.WriteLine(Environment.NewLine);
}
Console.WriteLine(“Item count: {0}”, i);
}
}
#region collection initializations
private static Customer[] customers = new Customer[]{
new Customer{CustomerID=”ALFKI”, CompanyName=”Alfreds Futterkiste”},
new Customer{CustomerID=”ANATR”,
CompanyName=”Ana Trujillo Emparedados y helados”},
new Customer{CustomerID=”ANTON”, CompanyName=”Antonio Moreno Taquería”},
new Customer{CustomerID=”AROUT”, CompanyName=”Around the Horn”},
new Customer{CustomerID=”BERGS”, CompanyName=”Berglunds snabbköp”},
new Customer{CustomerID=”SPECD”, CompanyName=”Spécialités du monde”}
};
private static Product[] products = new Product[]{
new Product{ProductID=1, ProductName=”Chai”},
new Product{ProductID=2, ProductName=”Chang”},
new Product{ProductID=3, ProductName=”Aniseed Syrup”},
new Product{ProductID=4, ProductName=”Chef Anton’s Cajun Seasoning”},
new Product{ProductID=5, ProductName=”Chef Anton’s Gumbo Mix”},
new Product{ProductID=6, ProductName=”Grandma’s Boysenberry Spread”},
new Product{ProductID=7, ProductName=”Uncle Bob’s Organic Dried Pears”},
new Product{ProductID=8, ProductName=”Northwoods Cranberry Sauce”},
new Product{ProductID=9, ProductName=”Mishi Kobe Niku”},
new Product{ProductID=10, ProductName=”Ikura”}
};
private static Order[] orders = new Order[]{
new Order{ OrderID= 10278, CustomerID=”BERGS”},
new Order{ OrderID= 10280, CustomerID=”BERGS”},
new Order{ OrderID= 10308, CustomerID=”BERGS”},
new Order{ OrderID= 10355, CustomerID=”AROUT”},
new Order{ OrderID= 10365, CustomerID=”ANTON”},
new Order{ OrderID= 10383, CustomerID=”AROUT”},
new Order{ OrderID= 10384, CustomerID=”BERGS”},
new Order{ OrderID= 10444, CustomerID=”BERGS”},
new Order{ OrderID= 10445, CustomerID=”BERGS”},
new Order{ OrderID= 10453, CustomerID=”AROUT”},
new Order{ OrderID= 10507, CustomerID=”ANTON”},
new Order{ OrderID= 10524, CustomerID=”BERGS”},
new Order{ OrderID= 10535, CustomerID=”ANTON”},
new Order{ OrderID= 10558, CustomerID=”AROUT”},
new Order{ OrderID= 10572, CustomerID=”BERGS”},
new Order{ OrderID= 10573, CustomerID=”ANTON”},
new Order{ OrderID= 10625, CustomerID=”ANATR”},
new Order{ OrderID= 10626, CustomerID=”BERGS”},
new Order{ OrderID= 10643, CustomerID=”ALFKI”}
};
private static OrderDetail[] orderDetails = new OrderDetail[]{
new OrderDetail{OrderID=10278, ProductID=1,
UnitPrice=15.5000M, Quantity=16, Discount=0},
new OrderDetail{OrderID=10278, ProductID=2,
UnitPrice=44.0000M, Quantity=15, Discount=0},
new OrderDetail{OrderID=10278, ProductID=3,
UnitPrice=35.1000M, Quantity=8, Discount=0},
new OrderDetail{OrderID=10278, ProductID=4,
UnitPrice=12.0000M, Quantity=25, Discount=0},
new OrderDetail{OrderID=10280, ProductID=5,
UnitPrice=3.6000M, Quantity=12, Discount=0},
new OrderDetail{OrderID=10280, ProductID=6,
UnitPrice=19.2000M, Quantity=20, Discount=0},
new OrderDetail{OrderID=10280, ProductID=7,
UnitPrice=6.2000M, Quantity=30, Discount=0},
new OrderDetail{OrderID=10308, ProductID=1,
UnitPrice=15.5000M, Quantity=5, Discount=0},
new OrderDetail{OrderID=10308, ProductID=9,
UnitPrice=12.0000M, Quantity=5, Discount=0},
new OrderDetail{OrderID=10355, ProductID=3,
UnitPrice=3.6000M, Quantity=25, Discount=0},
new OrderDetail{OrderID=10355, ProductID=5,
UnitPrice=15.6000M, Quantity=25, Discount=0},
new OrderDetail{OrderID=10365, ProductID=7,
UnitPrice=16.8000M, Quantity=24, Discount=0},
new OrderDetail{OrderID=10383, ProductID=9,
UnitPrice=4.8000M, Quantity=20, Discount=0},
new OrderDetail{OrderID=10383, ProductID=2,
UnitPrice=13.0000M, Quantity=15, Discount=0},
new OrderDetail{OrderID=10383, ProductID=4,
UnitPrice=30.4000M, Quantity=20, Discount=0}
};
#endregion
}
public class Customer
{
public string CustomerID { get; set; }
public string CompanyName { get; set; }
}
public class Product
{
public int ProductID { get; set; }
public string ProductName { get; set; }
}
public class Order
{
public int OrderID { get; set; }
public string CustomerID { get; set; }
}
public class OrderDetail
{
public int OrderID { get; set; }
public int ProductID { get; set; }
public decimal UnitPrice { get; set; }
public int Quantity { get; set; }
public decimal Discount { get; set; }
}
}
The LINQ query starts on the line that begins with var orderInfo
and extends for 56 lines. This probably can’t be stressed enough: Although a language, parser, and compiler (and the “Rainman”) might be able to comprehend code like the query in Listing 11.10, normal humans generally can’t with the amount of effort that makes it unproductive to produce. As a general practice, it is better to write lucid code and let compilers optimize the code; this is true 99.9% of the time.
A key is a generally a field that represents some value that is correlated to a like value somewhere else. For example, the preceding section used CustomerID
from Customer
and Order
objects; CustomerID
is an example of a key. A composite key is a key that is composed of multiple fields. In the parlance of SQL, composite keys are thought of as an index based on multiple columns (or fields). The same basic premise holds true for LINQ.
In LINQ where you would generally expect a single value to express a correlated relationship, composite keys use the new
keyword and multiple fields. Listing 11.11 shows a query that defines two composite keys: The first is based on the cross join range variable cross.CustomerID
and cross.ProductID
and the second is based on the range variable groupJoin.CustomerID
and groupJoin.ProductID
. (Listing 11.10 also shows a composite key for grouping in the phrase that begins with “group detail by new”
.)
on new { cross.CustomerID, cross.ProductID } equals
new { groupJoin.CustomerID, groupJoin.ProductID }
There is a lot to LINQ joins just as there is a lot to SQL joins. One of the biggest challenges is: When does good taste suggest you write a compound series of statements versus a monolithic statement? There were some studies done that suggest the human, short-term mind can juggle 7 to 10 items at a time. So, a good rule of thumb is if you feel yourself dropping some juggled items, your successors will find the queries difficult to juggle too.
Master-detail relationships composed of two sequences to create a new, third sequence are pretty straightforward and, luckily, they are the most common type. To address everyday kinds of joins, this chapter showed how LINQ can support inner joins, group joins, left joins, cross joins, custom joins, and joins based on composite keys. The inner join is probably one of the most routine, followed by the left join. If you find yourself needing right joins (which this chapter didn’t cover), cross joins, or custom joins, like everything else, it will take a lot of practice to become proficient. Hopefully, this chapter helps get you started.
3.144.38.92