In This Chapter
Finding Distinct Elements
Defining Exclusive Sets with Intersect
and Except
Creating Composite Resultsets with Union
“Ah, well, then I suppose I shall have to die beyond my means.”
—Oscar Wilde
Set operations are like knights in the game chess. Set operations—like knights—help you move in imaginative ways. The set operations are implemented as extension methods. Distinct
selects a distinct element in a sequence. Union
performs set addition, Intersect
performs set arithmetic, and Except
helps exclude elements from a set.
Because this chapter is composed of a relatively few number of features, a lot of code samples are included. The code samples are intended to help you think of ways to incorporate set logic into your applications. As you will see, some of the samples manipulate data from a database. Obviously, that data could be manipulated in Structured Query Language (SQL) beforehand, so the samples present an alternative to writing the SQL to do the same thing.
Distinct
is the only set operation that has a Language INtegrated Query (LINQ) keyword, but that is only in Visual Basic (VB). Distinct
is designed to examine two sets and eliminate duplicates. Distinct
can be invoked with a sequence and by invoking an overloaded form using an object that implements IEqualityComparer
.
Listing 9.1 shows a list of numbers representing grades from a group of students (see Figure 9.1). The example invokes the Distinct
operation to remove duplicates and then determines the median or middle grade. (A favorite of students, a college professor might use this to “grade on a curve.”) In this listing, the sequence (or set) has duplicates at 72, 75, 81, and 92; these are removed from the sequence. The median grade is then 85. (Chapter 8, “Using Aggregate Operations,” includes an example of implementing a custom aggregator, a Median
extension method.)
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace DistinctDemo
{
class Program
{
static void Main(string[] args)
{
var grades = new int[]
{ 65, 72, 72, 75, 75, 77, 79, 80, 81, 81, 81, 85,
88, 91, 92, 92, 92, 95, 99, 100 };
// select distinct grades
var distinct = grades.Distinct();
Console.WriteLine(“Median grade: {0}”,
distinct.ToArray<int>()[distinct.Count() / 2]);
Console.ReadLine();
}
}
}
The example in this section combines SQL and custom objects. Assume you have a sales force and an objective to determine all of the unique cities in which you sell product. The example uses the Distinct
method to determine the distinct list of cities (which might then be used to divvy up sales regions by sales representative).
Each part of the solution is provided in its own section with an elaboration on that part of the solution. The complete listing is provided at the end of this section.
You can define an entity class very quickly in C# using two methods. One is to use DevExpress’s CodeRush product, which is a cool metaprogramming tool, and the next is to use automatic properties. In Listing 9.2, a custom Order
class is defined using automatic properties. In conjunction with the automatic properties, nullable types are employed. Using nullable types permits assignment of null
to fields that are actually null in the database rather than contriving arbitrary null representatives, such as the classic – 1 for integers.
class Order
{
public int? OrderID { get; set; }
public string CustomerID { get; set; }
public int? EmployeeID { get; set; }
public DateTime? OrderDate { get; set; }
public DateTime? RequiredDate { get; set; }
public DateTime? ShippedDate { get; set; }
public int? ShipVia { get; set; }
public decimal? Freight { get; set; }
public string ShipName { get; set; }
public string ShipAddress { get; set; }
public string ShipCity { get; set; }
public string ShipRegion { get; set; }
public string ShipPostalCode { get; set; }
public string ShipCountry { get; set; }
}
Remember the basic rule is that automatic properties just mean that the basic setter and getter are defined for you and you can’t refer to fields directly. (Refer to Chapter 7, “Sorting and Grouping Queries,” for the introduction to automatic properties.) The absence of a constructor also means you can construct instances of Order
with any combination of compound initialization your solution requires.
Entities are generally classes that represent tables in a database. These can be modeled either with an Entity Relational Diagram (a database modeling tool) or the Unified Modeling Language (and a UML tool). Generally, both kinds of models are not needed but one or the other is helpful. The biggest things to avoid are having entities in code that map to tables defined more than once in code and having entities in code for the sake of having entities. Some entities, such as linking tables, should be used to simply help construct composite objects.
Listing 9.3 uses the canonical Northwind database, a generic List
(or Order
objects), and a while
loop to populate the list. The connection is created and the collection is accessible via an IDataReader
. (In the example, the provider-agnostic types were used just to remind you of their availability.) It is also worth noting that the Order
objects were constructed using compound type initialization introduced in Chapter 2, “Using Compound Type Initialization.”
public static List<Order> GetOrders()
{
string connectionString =
“Data Source=.\SQLEXPRESS;AttachDbFilename=”C:\Books\Sams” +
“\LINQ\Northwind\northwnd.mdf”;Integrated Security=True;Connect ” +
“Timeout=30;User Instance=True”;
List<Order> orders = new List<Order>();
using(IDbConnection connection = new SqlConnection(connectionString))
{
connection.Open();
IDbCommand command = new SqlCommand(“SELECT * FROM Orders”);
command.Connection = connection;
command.CommandType = CommandType.Text;
IDataReader reader = command.ExecuteReader();
while(reader.Read())
{
orders.Add(new Order
{
OrderID = reader.IsDBNull(0) ? null : (int?)reader.GetInt32(0),
CustomerID = reader.IsDBNull(1) ? null : reader.GetString(1),
EmployeeID = reader.IsDBNull(2) ? null : (int?)reader.GetInt32(2),
OrderDate = reader.IsDBNull(3) ? null :
(DateTime?)reader.GetDateTime(3),
RequiredDate = reader.IsDBNull(4) ? null:
(DateTime?)reader.GetDateTime(4),
ShippedDate = reader.IsDBNull(5) ? null :
(DateTime?)reader.GetDateTime(5),
ShipVia = reader.IsDBNull(6) ? null : (int?)reader.GetInt32(6),
Freight = reader.IsDBNull(7) ? null : (decimal?)reader.GetDecimal(7),
ShipName = reader.IsDBNull(8) ? null : reader.GetString(8),
ShipAddress = reader.IsDBNull(9) ? null : reader.GetString(9),
ShipCity = reader.IsDBNull(10) ? null : reader.GetString(10),
ShipRegion = reader.IsDBNull(11) ? null : reader.GetString(11),
ShipPostalCode = reader.IsDBNull(12) ? null : reader.GetString(12),
ShipCountry = reader.IsDBNull(13) ? null : reader.GetString(13),
}
);
}
}
return orders;
}
}
The only change you will need to make is to adjust the connection string to match the location of your copy of the Northwind Trading Company database. As a reminder, you can store the connection string in a .config
file, and if you are interested, you can learn about writing a custom installer for dynamic database configuration in my article titled, “Implementing a Custom ConnectionString Installer for Setup” on developer.com at www.developer.com/security/article.php/1158037043914.
IEqualityComparer
To augment this example, you can perform custom comparisons on Order
objects. Previously, it was mentioned that distinct cities are desired. By default, objects perform default referential comparisons. In this case, you want comparison by city. To accomplish this, you can implement an IEqualityComparer
for Order
objects.
Listing 9.4 demonstrates an example of IEqualityComparer
. The basic rule is that two objects are considered equal if the Equals
method returns true
and the hash code of each object is the same. For that, you need to implement an Equals
function that compares the ShipCity
of Order
objects and returns the hash code from the ShipCity
too. The result is shown in Listing 9.4.
class CityComparer : IEqualityComparer<Order>
{
#region IEqualityComparer<Order> Members
public bool Equals(Order x, Order y)
{
return x.ShipCity == null || y.ShipCity == null ?
false : x.ShipCity.Equals(y.ShipCity);
}
public int GetHashCode(Order obj)
{
return obj.ShipCity == null ? -1 : obj.ShipCity.GetHashCode();
}
#endregion
}
It is always helpful to be able to dump the state of an object. It is always too time consuming to implement such behavior manually for every kind of object. To make the state of an object available for debugging and testing, you can implement an object Dumper
using reflection.
In Listing 9.5, the dumper uses a StringBuilder
and Reflection
to get the public properties—representing the state of an object—and iterate through the properties to display the name and value of each property. This is easier in .NET 3.5 because you can use the Array.ForEach
method and the generic delegate Action
. The Action
is implicit as the second argument of the ForEach
method.
public static string Dump<T>(T obj)
{
Type t = typeof(T);
StringBuilder builder = new StringBuilder();
PropertyInfo[] infos = t.GetProperties();
// feel the enmity of your peers if you write code like this
Array.ForEach( infos.ToArray(), p => builder.AppendFormat(
“{0}={1} ”, p.Name, p.GetValue(obj, null) == null ? “” :
p.GetValue(obj, null)));
builder.AppendLine();
return builder.ToString();
}
The comment in Listing 9.5, “feel the enmity of your peers if you write code like this,” is a bit lighthearted. A general rule of thumb is that it is okay to write esoteric code, such as the use of the ternary operator in the second argument of ForEach
, as long as the following is true:
All the code isn’t esoteric.
Esoteric code is only used occasionally.
It comprises the bulk of a singular—as opposed to monolithic—function.
The code is in a disposable versus domain-significant method.
If the code is buggy, it can be easily replaced with a more verbose form.
As a general rule, code like that in the Dumper
is a little showy. However, this is a book, and a book is like a commercial for the new 620 Corvette. It’s okay to talk about going 220 miles per hour for marketing purposes because it’s fun, but on real city streets and highways (and in production code), it might be a little dangerous.
Listing 9.6 provides the complete listing as well as the code that uses the Distinct
method. The Main
function requests the list of Order
objects with GetOrders
. A quick dump of these objects reveals that we have everything. In Version 1—as noted by the comment—we can request the distinct list of cities by directly calling the extension method and passing in an instance of the CityComparer
. Version 1 also uses OrderBy
as an extension method to sort the distinct list of cities. Version 2 combines the sort and the call to Distinct
in a single LINQ query. Notice that Distinct
is used as an extension method in the second example tool; the only real difference between version 1 and 2 is how the sorting behavior is invoked.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Data;
using System.Data.SqlClient;
using System.Reflection;
using System.Collections;
namespace DistinctDemo2
{
class Program
{
/// <summary>
/// Demonstrates distinct ship city
/// </summary>
/// <param name=“args”></param>
static void Main(string[] args)
{
List<Order> orders = GetOrders();
Array.ForEach(orders.ToArray(), o => Console.WriteLine(Dump(o)));
Console.WriteLine(orders.Count);
// version 1 with extension method OrderBy
var cities = orders.Distinct(new CityComparer());
Array.ForEach(cities.OrderBy(o => o.ShipCity).ToArray(),
orderedByCity => Console.WriteLine(orderedByCity.ShipCity));
// version 2 with query
var cities2 = from order in orders.Distinct(new CityComparer())
orderby order.ShipCity
select order;
Array.ForEach(cities2.ToArray(),
orderedByCity => Console.WriteLine(orderedByCity.ShipCity));
Console.WriteLine(cities.Count());
Console.ReadLine();
}
public static string Dump<T>(T obj)
{
Type t = typeof(T);
StringBuilder builder = new StringBuilder();
PropertyInfo[] infos = t.GetProperties();
// feel the enmity of your peers if you write code like this
Array.ForEach( infos.ToArray(), p => builder.AppendFormat(
“{0}={1} ”, p.Name, p.GetValue(obj, null) == null ? “” :
p.GetValue(obj, null)));
builder.AppendLine();
return builder.ToString();
}
public static List<Order> GetOrders()
{
string connectionString =
“Data Source=.\SQLEXPRESS;AttachDbFilename=“C:\Books\Sams” +
“\LINQ\Northwind\northwnd.mdf”;Integrated Security=True;Connect ” +
“Timeout=30;User Instance=True”;
List<Order> orders = new List<Order>();
using(IDbConnection connection = new SqlConnection(connectionString))
{
connection.Open();
IDbCommand command = new SqlCommand(“SELECT * FROM Orders”);
command.Connection = connection;
command.CommandType = CommandType.Text;
IDataReader reader = command.ExecuteReader();
while(reader.Read())
{
orders.Add(new Order
{
OrderID = reader.IsDBNull(0) ? null : (int?)reader.GetInt32(0),
CustomerID = reader.IsDBNull(1) ? null : reader.GetString(1),
EmployeeID = reader.IsDBNull(2) ? null : (int?)reader.GetInt32(2),
OrderDate = reader.IsDBNull(3) ? null :
(DateTime?)reader.GetDateTime(3),
RequiredDate = reader.IsDBNull(4) ? null :
(DateTime?)reader.GetDateTime(4),
ShippedDate = reader.IsDBNull(5) ? null :
(DateTime?)reader.GetDateTime(5),
ShipVia = reader.IsDBNull(6) ? null : (int?)reader.GetInt32(6),
Freight = reader.IsDBNull(7) ? null : (decimal?)reader.GetDecimal(7),
ShipName = reader.IsDBNull(8) ? null : reader.GetString(8),
ShipAddress = reader.IsDBNull(9) ? null : reader.GetString(9),
ShipCity = reader.IsDBNull(10) ? null : reader.GetString(10),
ShipRegion = reader.IsDBNull(11) ? null : reader.GetString(11),
ShipPostalCode = reader.IsDBNull(12) ? null : reader.GetString(12),
ShipCountry = reader.IsDBNull(13) ? null : reader.GetString(13),
}
);
}
}
return orders;
}
}
class CityComparer : IEqualityComparer<Order>
{
#region IEqualityComparer<Order> Members
public bool Equals(Order x, Order y)
{
return x.ShipCity == null || y.ShipCity == null ?
false : x.ShipCity.Equals(y.ShipCity);
}
public int GetHashCode(Order obj)
{
return obj.ShipCity == null ? -1 : obj.ShipCity.GetHashCode();
}
#endregion
}
class Order
{
public int? OrderID { get; set; }
public string CustomerID { get; set; }
public int? EmployeeID { get; set; }
public DateTime? OrderDate { get; set; }
public DateTime? RequiredDate { get; set; }
public DateTime? ShippedDate { get; set; }
public int? ShipVia { get; set; }
public decimal? Freight { get; set; }
public string ShipName { get; set; }
public string ShipAddress { get; set; }
public string ShipCity { get; set; }
public string ShipRegion { get; set; }
public string ShipPostalCode { get; set; }
public string ShipCountry { get; set; }
}
}
Intersect
and Except
The Intersect
extension method compares an argument sequence with the source sequence returning only those elements in the source sequence that are also in the target sequence. The Except
extension method returns those elements in the source object—the one invoking the Except
method—that are not in the argument object.
Listing 9.7 contains an array of event numbers and a short sequence of Fibonacci numbers. The evens excluding the Fibonacci numbers are all of the evens excluding 2, 8, and 34, which are in the Fibonacci sequence. (Each of the sequences is shown in Figure 9.2.)
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace ExceptDemo
{
class Program
{
static void Main(string[] args)
{
var evens = new int[] { 2, 4, 6, 8, 10, 12, 14, 16, 18,
20, 22, 24, 26, 28, 30, 32, 34};
var fibos = new int[] { 1, 1, 2, 3, 5, 8, 13, 21, 34 };
var setDifference = evens.Except(fibos);
Array.ForEach<int>(setDifference.ToArray(), e => Console.WriteLine(e));
Console.ReadLine();
}
}
}
Listing 9.8 is a throwback to Listing 9.6. In this example, the Main
function is replaced by an exclusion list that returns all of the cities (for the top salesman) except for Münster and Albuquerque. In Listing 9.8, the Distinct
method and OrderBy
capability are used in their extension method form.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Data;
using System.Data.SqlClient;
using System.Reflection;
using System.Collections;
namespace ExceptDemo2
{
class Program
{
/// <summary>
/// Demonstrates except ship city
/// </summary>
/// <param name=“args”></param>
static void Main(string[] args)
{
List<Order> orders = GetOrders();
Console.WriteLine(orders.Count);
List<Order> exclusions = new List<Order>();
exclusions.Add(new Order { ShipCity = “Münster” });
exclusions.Add(new Order { ShipCity = “Albuquerque” });
var cities = orders.Distinct(new CityComparer()).Except(
exclusions, new CityComparer());
Array.ForEach(cities.OrderBy(o => o.ShipCity).ToArray(),
orderedByCity => Console.WriteLine(orderedByCity.ShipCity));
Console.WriteLine(cities.Count());
Console.ReadLine();
}
}
Intersect
is an extension method that determines the elements found in the source that also exist in the argument sequence returning only elements found in both. Listing 9.9 returns the elements in both evens and the short Fibonacci sequence. (The visualized results are shown in Figure 9.3.)
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace IntersectDemo
{
class Program
{
static void Main(string[] args)
{
var evens = new int[] { 2, 4, 6, 8, 10, 12, 14, 16, 18,
20, 22, 24, 26, 28, 30, 32, 34};
var fibos = new int[] { 1, 1, 2, 3, 5, 8, 13, 21, 34 };
var intersection = evens.Intersect(fibos);
Array.ForEach<int>(intersection.ToArray(), e => Console.WriteLine(e));
Console.ReadLine();
}
}
}
This chapter suggests that the extension methods described apply to collections. In actuality, these extension methods and LINQ apply to things that implement IEnumerable
, which can be referred in a general sense as collections; however, the extension methods and LINQ apply to things that implement IQueryable
, too. IQueryable
inherits from IEnumerable
. (Remember that interfaces can also use inheritance.)
Listing 9.10 uses Intersect
to find the files that exist in both.NET Framework 2.0 and 3.5. The code uses Directory.GetFiles
from System.IO
to get files in both framework directories and returns those in common in both directories. (There are surprisingly few similarities given the way each framework folder is configured.) Such a technique could be used to find and remove duplicate files, for example.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
using System.Security.Permissions;
namespace IntersectDemo2
{
class Program
{
static void Main(string[] args)
{
var v2 = Directory.GetFiles(
@“C:WindowsMicrosoft.NETFrameworkv2.0.50727”, “*.*”,
SearchOption.AllDirectories);
var v35 = Directory.GetFiles(
@“C:WindowsMicrosoft.NETFrameworkv3.5”, “*.*”,
SearchOption.AllDirectories);
var frameworkMatches = v35.Intersect(v2);
Console.WriteLine(“Framework matches between version 2 and 3.5”);
Array.ForEach(frameworkMatches.ToArray(), file => Console.WriteLine(file));
var frameworkDifferences = v35.Except(v2);
Console.WriteLine(“Framework differences between version 2 and 3.5”);
Array.ForEach(frameworkDifferences.ToArray(), file => Console.WriteLine(new FileInfo(file).Name));
Console.ReadLine();
}
}
}
The second part of the example uses Except
to find differences. There are many differences between the two folders. Both Intersect
and Except
in the example use the Array.ForEach
method, a generic Action
delegate, and a Lambda Expression to send the results to the console.
Union
In college, computer science and math majors study discrete mathematics
. It is the study of the propositional and predicate calculus. Union
is basically arithmetic in the predicate calculus.
The Union
extension method adds two sequences together and returns the unique members found in both sequences. If you wrote code to add all of the elements of both sequences and then performed the Distinct
operation on the result, you would get the same sequence as produced by Union
. For example, given a sequence of 1, 2, 3, 4 and 2, 4, 6, the Union
of these sequences will result in a new sequence with the elements 1, 2, 3, 4, 6. Union
is set addition. The Union
extension method returns all of the unique members found in both sets, excluding duplicates. Listing 9.11 contains a short Union
example using the evens and fibos that we used earlier in the chapter.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace UnionDemo
{
class Program
{
static void Main(string[] args)
{
var evens = new int[] { 2, 4, 6, 8, 10, 12, 14, 16, 18,
20, 22, 24, 26, 28, 30, 32, 34};
var fibos = new int[] { 1, 1, 2, 3, 5, 8, 13, 21, 34 };
var union = evens.Union(fibos);
Array.ForEach<int>(union.ToArray(), e => Console.WriteLine(e));
Console.ReadLine();
}
}
}
Listing 9.12 uses the code from Listing 9.6 to get a list of Order
objects from the Northwind database. The first LINQ query returns a subsequence that contains orders shipped to Mexico, and the second query returns orders shipped to New Mexico and Texas—ShipRegion
equals “NM” or “TX”. The results are added together to produce sales orders by territory—perhaps the southwestern United States and Mexico.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Data;
using System.Data.SqlClient;
using System.Reflection;
using System.Collections;
namespace UnionDemo2
{
class Program
{
/// <summary>
/// Demonstrates except ship city
/// </summary>
/// <param name=“args”></param>
static void Main(string[] args)
{
List<Order> orders = GetOrders();
Console.WriteLine(orders.Count);
var citiesInMexico = from order in orders
where order.ShipCountry == “Mexico”
select order;
var citiesinNewMexico = from order in orders
where order.ShipRegion == “NM”
|| order.ShipRegion == “TX”
select order;
var territory = citiesInMexico.Union(citiesinNewMexico);
Array.ForEach(territory.ToArray(),
input => Console.WriteLine(Dump(input)));
Console.ReadLine();
}
}
Set operations are ubiquitous. Because so many programmers perform these tasks manually, it is easy to overlook how often we want some subset of what we already have. This chapter looked at Union
, Intersect
, Except
, and Distinct
. With these extension methods and LINQ, you might never again write nested loops scanning lists for elements in common or distinct. Sure, it’s possible to perform some of these operations with SQL, but not all objects originate in a database, and it is here that set operations will really shine.
3.147.83.176