Chapter 9
Performing Set Operations

In This Chapter

Image Finding Distinct Elements

Image Defining Exclusive Sets with Intersect and Except

Image Creating Composite Resultsets with Union

“Ah, well, then I suppose I shall have to die beyond my means.”

—Oscar Wilde

Set operations are like knights in the game chess. Set operations—like knights—help you move in imaginative ways. The set operations are implemented as extension methods. Distinct selects a distinct element in a sequence. Union performs set addition, Intersect performs set arithmetic, and Except helps exclude elements from a set.

Because this chapter is composed of a relatively few number of features, a lot of code samples are included. The code samples are intended to help you think of ways to incorporate set logic into your applications. As you will see, some of the samples manipulate data from a database. Obviously, that data could be manipulated in Structured Query Language (SQL) beforehand, so the samples present an alternative to writing the SQL to do the same thing.

Finding Distinct Elements

Distinct is the only set operation that has a Language INtegrated Query (LINQ) keyword, but that is only in Visual Basic (VB). Distinct is designed to examine two sets and eliminate duplicates. Distinct can be invoked with a sequence and by invoking an overloaded form using an object that implements IEqualityComparer.

Listing 9.1 shows a list of numbers representing grades from a group of students (see Figure 9.1). The example invokes the Distinct operation to remove duplicates and then determines the median or middle grade. (A favorite of students, a college professor might use this to “grade on a curve.”) In this listing, the sequence (or set) has duplicates at 72, 75, 81, and 92; these are removed from the sequence. The median grade is then 85. (Chapter 8, “Using Aggregate Operations,” includes an example of implementing a custom aggregator, a Median extension method.)

Figure 9.1 The original list of grades, after Distinct is applied, and the median value.

Image

Listing 9.1 Determining the Median Grade from a List of Numbers Representing Grades

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace DistinctDemo
{
  class Program
  {
   static void Main(string[] args)
   {
     var grades = new int[]
       { 65, 72, 72, 75, 75, 77, 79, 80, 81, 81, 81, 85,
         88, 91, 92, 92, 92, 95, 99, 100 };

     // select distinct grades
     var distinct = grades.Distinct();
     Console.WriteLine(“Median grade: {0}”,
       distinct.ToArray<int>()[distinct.Count() / 2]);
     Console.ReadLine();
   }
  }
 }

Finding Distinct Objects Using Object Fields

The example in this section combines SQL and custom objects. Assume you have a sales force and an objective to determine all of the unique cities in which you sell product. The example uses the Distinct method to determine the distinct list of cities (which might then be used to divvy up sales regions by sales representative).

Each part of the solution is provided in its own section with an elaboration on that part of the solution. The complete listing is provided at the end of this section.

Defining a Custom Order Object with Automatic Properties

You can define an entity class very quickly in C# using two methods. One is to use DevExpress’s CodeRush product, which is a cool metaprogramming tool, and the next is to use automatic properties. In Listing 9.2, a custom Order class is defined using automatic properties. In conjunction with the automatic properties, nullable types are employed. Using nullable types permits assignment of null to fields that are actually null in the database rather than contriving arbitrary null representatives, such as the classic – 1 for integers.

Listing 9.2 Creating a Custom Entity Class with Automatic Properties and Nullable Types

class Order
{
  public int? OrderID { get; set; }
  public string CustomerID { get; set; }
  public int? EmployeeID { get; set; }
  public DateTime? OrderDate { get; set; }
  public DateTime? RequiredDate { get; set; }
  public DateTime? ShippedDate { get; set; }
  public int? ShipVia { get; set; }
  public decimal? Freight { get; set; }
  public string ShipName { get; set; }
  public string ShipAddress { get; set; }
  public string ShipCity { get; set; }
  public string ShipRegion { get; set; }
  public string ShipPostalCode { get; set; }
  public string ShipCountry { get; set; }
}

Remember the basic rule is that automatic properties just mean that the basic setter and getter are defined for you and you can’t refer to fields directly. (Refer to Chapter 7, “Sorting and Grouping Queries,” for the introduction to automatic properties.) The absence of a constructor also means you can construct instances of Order with any combination of compound initialization your solution requires.

Instantiating Custom Objects from SQL

Entities are generally classes that represent tables in a database. These can be modeled either with an Entity Relational Diagram (a database modeling tool) or the Unified Modeling Language (and a UML tool). Generally, both kinds of models are not needed but one or the other is helpful. The biggest things to avoid are having entities in code that map to tables defined more than once in code and having entities in code for the sake of having entities. Some entities, such as linking tables, should be used to simply help construct composite objects.

Listing 9.3 uses the canonical Northwind database, a generic List (or Order objects), and a while loop to populate the list. The connection is created and the collection is accessible via an IDataReader. (In the example, the provider-agnostic types were used just to remind you of their availability.) It is also worth noting that the Order objects were constructed using compound type initialization introduced in Chapter 2, “Using Compound Type Initialization.”

Listing 9.3 Constructing Entity Objects from a SQL Database Using a Generic List and Compound Type Initialization

public static List<Order> GetOrders()
{
  string connectionString =
    “Data Source=.\SQLEXPRESS;AttachDbFilename=”C:\Books\Sams” +
    “\LINQ\Northwind\northwnd.mdf”;Integrated Security=True;Connect ” +
    “Timeout=30;User Instance=True”;

  List<Order> orders = new List<Order>();

  using(IDbConnection connection = new SqlConnection(connectionString))
  {
    connection.Open();
    IDbCommand command = new SqlCommand(“SELECT * FROM Orders”);
    command.Connection = connection;
    command.CommandType = CommandType.Text;
    IDataReader reader = command.ExecuteReader();
    while(reader.Read())
    {
      orders.Add(new Order
      {
        OrderID = reader.IsDBNull(0) ? null : (int?)reader.GetInt32(0),
        CustomerID = reader.IsDBNull(1) ? null : reader.GetString(1),
        EmployeeID = reader.IsDBNull(2) ? null : (int?)reader.GetInt32(2),
        OrderDate = reader.IsDBNull(3) ? null :
          (DateTime?)reader.GetDateTime(3),
        RequiredDate = reader.IsDBNull(4) ? null:
          (DateTime?)reader.GetDateTime(4),
        ShippedDate = reader.IsDBNull(5) ? null :
          (DateTime?)reader.GetDateTime(5),
        ShipVia = reader.IsDBNull(6) ? null : (int?)reader.GetInt32(6),
        Freight = reader.IsDBNull(7) ? null : (decimal?)reader.GetDecimal(7),
        ShipName = reader.IsDBNull(8) ? null : reader.GetString(8),
        ShipAddress = reader.IsDBNull(9) ? null : reader.GetString(9),
        ShipCity = reader.IsDBNull(10) ? null : reader.GetString(10),
        ShipRegion = reader.IsDBNull(11) ? null : reader.GetString(11),
        ShipPostalCode = reader.IsDBNull(12) ? null : reader.GetString(12),
        ShipCountry = reader.IsDBNull(13) ? null : reader.GetString(13),
       }
       );
     }
    }

    return orders;
  }
}

The only change you will need to make is to adjust the connection string to match the location of your copy of the Northwind Trading Company database. As a reminder, you can store the connection string in a .config file, and if you are interested, you can learn about writing a custom installer for dynamic database configuration in my article titled, “Implementing a Custom ConnectionString Installer for Setup” on developer.com at www.developer.com/security/article.php/1158037043914.

Implementing an IEqualityComparer

To augment this example, you can perform custom comparisons on Order objects. Previously, it was mentioned that distinct cities are desired. By default, objects perform default referential comparisons. In this case, you want comparison by city. To accomplish this, you can implement an IEqualityComparer for Order objects.

Listing 9.4 demonstrates an example of IEqualityComparer. The basic rule is that two objects are considered equal if the Equals method returns true and the hash code of each object is the same. For that, you need to implement an Equals function that compares the ShipCity of Order objects and returns the hash code from the ShipCity too. The result is shown in Listing 9.4.

Listing 9.4 Implementing an IEqualityComparer By Implementing Equals and GetHashCode

class CityComparer : IEqualityComparer<Order>
{
  #region IEqualityComparer<Order> Members

  public bool Equals(Order x, Order y)
  {
    return x.ShipCity == null || y.ShipCity == null ?
      false : x.ShipCity.Equals(y.ShipCity);
  }

  public int GetHashCode(Order obj)
  {
    return obj.ShipCity == null ? -1 : obj.ShipCity.GetHashCode();
  }

  #endregion
}

Implementing a Simple Object Dumper

It is always helpful to be able to dump the state of an object. It is always too time consuming to implement such behavior manually for every kind of object. To make the state of an object available for debugging and testing, you can implement an object Dumper using reflection.

In Listing 9.5, the dumper uses a StringBuilder and Reflection to get the public properties—representing the state of an object—and iterate through the properties to display the name and value of each property. This is easier in .NET 3.5 because you can use the Array.ForEach method and the generic delegate Action. The Action is implicit as the second argument of the ForEach method.

Listing 9.5 Creating a General Object Dumper Utility Based on Reflection and Properties

public static string Dump<T>(T obj)
{
  Type t = typeof(T);
  StringBuilder builder = new StringBuilder();
  PropertyInfo[] infos = t.GetProperties();

  // feel the enmity of your peers if you write code like this
  Array.ForEach( infos.ToArray(), p => builder.AppendFormat(
    “{0}={1} ”, p.Name, p.GetValue(obj, null) == null ? “” :
    p.GetValue(obj, null)));
  builder.AppendLine();

  return builder.ToString();
}

The comment in Listing 9.5, “feel the enmity of your peers if you write code like this,” is a bit lighthearted. A general rule of thumb is that it is okay to write esoteric code, such as the use of the ternary operator in the second argument of ForEach, as long as the following is true:

Image All the code isn’t esoteric.

Image Esoteric code is only used occasionally.

Image It comprises the bulk of a singular—as opposed to monolithic—function.

Image The code is in a disposable versus domain-significant method.

Image If the code is buggy, it can be easily replaced with a more verbose form.

As a general rule, code like that in the Dumper is a little showy. However, this is a book, and a book is like a commercial for the new 620 Corvette. It’s okay to talk about going 220 miles per hour for marketing purposes because it’s fun, but on real city streets and highways (and in production code), it might be a little dangerous.

Listing 9.6 provides the complete listing as well as the code that uses the Distinct method. The Main function requests the list of Order objects with GetOrders. A quick dump of these objects reveals that we have everything. In Version 1—as noted by the comment—we can request the distinct list of cities by directly calling the extension method and passing in an instance of the CityComparer. Version 1 also uses OrderBy as an extension method to sort the distinct list of cities. Version 2 combines the sort and the call to Distinct in a single LINQ query. Notice that Distinct is used as an extension method in the second example tool; the only real difference between version 1 and 2 is how the sorting behavior is invoked.

Listing 9.6 Sorting and Returning a Distinct List of Cities from the Northwind Order Table

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Data;
using System.Data.SqlClient;
using System.Reflection;
using System.Collections;

namespace DistinctDemo2
{
  class Program
  {
    /// <summary>
    /// Demonstrates distinct ship city
    /// </summary>
    /// <param name=“args”></param>
    static void Main(string[] args)
    {
      List<Order> orders = GetOrders();
      Array.ForEach(orders.ToArray(), o => Console.WriteLine(Dump(o)));
      Console.WriteLine(orders.Count);

      // version 1 with extension method OrderBy
      var cities = orders.Distinct(new CityComparer());
      Array.ForEach(cities.OrderBy(o => o.ShipCity).ToArray(),
        orderedByCity => Console.WriteLine(orderedByCity.ShipCity));

      // version 2 with query
      var cities2 = from order in orders.Distinct(new CityComparer())
                   orderby order.ShipCity
                   select order;
      Array.ForEach(cities2.ToArray(),
        orderedByCity => Console.WriteLine(orderedByCity.ShipCity));
      Console.WriteLine(cities.Count());
      Console.ReadLine();
    }

    public static string Dump<T>(T obj)
    {
      Type t = typeof(T);
      StringBuilder builder = new StringBuilder();

      PropertyInfo[] infos = t.GetProperties();
      // feel the enmity of your peers if you write code like this
      Array.ForEach( infos.ToArray(), p => builder.AppendFormat(
        “{0}={1} ”, p.Name, p.GetValue(obj, null) == null ? “” :
        p.GetValue(obj, null)));
      builder.AppendLine();
      return builder.ToString();
    }

    public static List<Order> GetOrders()
    {
      string connectionString =
        “Data Source=.\SQLEXPRESS;AttachDbFilename=“C:\Books\Sams” +
        “\LINQ\Northwind\northwnd.mdf”;Integrated Security=True;Connect ” +
        “Timeout=30;User Instance=True”;

      List<Order> orders = new List<Order>();

      using(IDbConnection connection = new SqlConnection(connectionString))
      {
        connection.Open();
        IDbCommand command = new SqlCommand(“SELECT * FROM Orders”);
        command.Connection = connection;
        command.CommandType = CommandType.Text;
        IDataReader reader = command.ExecuteReader();
        while(reader.Read())
        {
          orders.Add(new Order
          {
            OrderID = reader.IsDBNull(0) ? null : (int?)reader.GetInt32(0),
            CustomerID = reader.IsDBNull(1) ? null : reader.GetString(1),
            EmployeeID = reader.IsDBNull(2) ? null : (int?)reader.GetInt32(2),
            OrderDate = reader.IsDBNull(3) ? null :
              (DateTime?)reader.GetDateTime(3),
            RequiredDate = reader.IsDBNull(4) ? null :
              (DateTime?)reader.GetDateTime(4),
            ShippedDate = reader.IsDBNull(5) ? null :
              (DateTime?)reader.GetDateTime(5),
            ShipVia = reader.IsDBNull(6) ? null : (int?)reader.GetInt32(6),
            Freight = reader.IsDBNull(7) ? null : (decimal?)reader.GetDecimal(7),
            ShipName = reader.IsDBNull(8) ? null : reader.GetString(8),
            ShipAddress = reader.IsDBNull(9) ? null : reader.GetString(9),
            ShipCity = reader.IsDBNull(10) ? null : reader.GetString(10),
            ShipRegion = reader.IsDBNull(11) ? null : reader.GetString(11),
            ShipPostalCode = reader.IsDBNull(12) ? null : reader.GetString(12),
            ShipCountry = reader.IsDBNull(13) ? null : reader.GetString(13),
          }
          );
        }
       }
       return orders;
     }
    }

    class CityComparer : IEqualityComparer<Order>
    {
      #region IEqualityComparer<Order> Members
      public bool Equals(Order x, Order y)
      {
        return x.ShipCity == null || y.ShipCity == null ?
          false : x.ShipCity.Equals(y.ShipCity);
      }

      public int GetHashCode(Order obj)
      {
        return obj.ShipCity == null ? -1 : obj.ShipCity.GetHashCode();
    }
    #endregion
  }

  class Order
  {
    public int? OrderID { get; set; }
    public string CustomerID { get; set; }
    public int? EmployeeID { get; set; }
    public DateTime? OrderDate { get; set; }
    public DateTime? RequiredDate { get; set; }
    public DateTime? ShippedDate { get; set; }
    public int? ShipVia { get; set; }
    public decimal? Freight { get; set; }
    public string ShipName { get; set; }
    public string ShipAddress { get; set; }
    public string ShipCity { get; set; }
    public string ShipRegion { get; set; }
    public string ShipPostalCode { get; set; }
    public string ShipCountry { get; set; }
  }
}

Defining Exclusive Sets with Intersect and Except

The Intersect extension method compares an argument sequence with the source sequence returning only those elements in the source sequence that are also in the target sequence. The Except extension method returns those elements in the source object—the one invoking the Except method—that are not in the argument object.

Listing 9.7 contains an array of event numbers and a short sequence of Fibonacci numbers. The evens excluding the Fibonacci numbers are all of the evens excluding 2, 8, and 34, which are in the Fibonacci sequence. (Each of the sequences is shown in Figure 9.2.)

Figure 9.2 An array of even integers, a short Fibonacci sequence and the evens excluding those that are also in the Fibonacci sequence.

Image

Listing 9.7 Calculating a Set Difference and Displaying the Results

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace ExceptDemo
{
  class Program
  {
    static void Main(string[] args)
    {
      var evens = new int[] { 2, 4, 6, 8, 10, 12, 14, 16, 18,
        20, 22, 24, 26, 28, 30, 32, 34};
      var fibos = new int[] { 1, 1, 2, 3, 5, 8, 13, 21, 34 };
      var setDifference = evens.Except(fibos);
      Array.ForEach<int>(setDifference.ToArray(), e => Console.WriteLine(e));
      Console.ReadLine();
    }
  }
}

Listing 9.8 is a throwback to Listing 9.6. In this example, the Main function is replaced by an exclusion list that returns all of the cities (for the top salesman) except for Münster and Albuquerque. In Listing 9.8, the Distinct method and OrderBy capability are used in their extension method form.

Listing 9.8 Creating an Ordered and Distinct List of Customer Cities, Except for Those Specified in an Exclusion List

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Data;
using System.Data.SqlClient;
using System.Reflection;
using System.Collections;

namespace ExceptDemo2
{
  class Program
  {
    /// <summary>
    /// Demonstrates except ship city
    /// </summary>
    /// <param name=“args”></param>
    static void Main(string[] args)
    {
      List<Order> orders = GetOrders();
      Console.WriteLine(orders.Count);

      List<Order> exclusions = new List<Order>();
      exclusions.Add(new Order { ShipCity = “Münster” });
      exclusions.Add(new Order { ShipCity = “Albuquerque” });
      var cities = orders.Distinct(new CityComparer()).Except(
        exclusions, new CityComparer());

      Array.ForEach(cities.OrderBy(o => o.ShipCity).ToArray(),
        orderedByCity => Console.WriteLine(orderedByCity.ShipCity));

      Console.WriteLine(cities.Count());
      Console.ReadLine();
    }
}

Intersect is an extension method that determines the elements found in the source that also exist in the argument sequence returning only elements found in both. Listing 9.9 returns the elements in both evens and the short Fibonacci sequence. (The visualized results are shown in Figure 9.3.)

Figure 9.3 Even numbers, a short Fibonacci sequence, and the intersection of evens that are in the Fibonacci sequence.

Image

Listing 9.9 Set Intersection Returns a New Collection Containing the Elements Found in Both Collections

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace IntersectDemo
{
  class Program
  {
     static void Main(string[] args)
    {
      var evens = new int[] { 2, 4, 6, 8, 10, 12, 14, 16, 18,
        20, 22, 24, 26, 28, 30, 32, 34};
      var fibos = new int[] { 1, 1, 2, 3, 5, 8, 13, 21, 34 };
      var intersection = evens.Intersect(fibos);
      Array.ForEach<int>(intersection.ToArray(), e => Console.WriteLine(e));
      Console.ReadLine();
    }
  }
}

This chapter suggests that the extension methods described apply to collections. In actuality, these extension methods and LINQ apply to things that implement IEnumerable, which can be referred in a general sense as collections; however, the extension methods and LINQ apply to things that implement IQueryable, too. IQueryable inherits from IEnumerable. (Remember that interfaces can also use inheritance.)

Listing 9.10 uses Intersect to find the files that exist in both.NET Framework 2.0 and 3.5. The code uses Directory.GetFiles from System.IO to get files in both framework directories and returns those in common in both directories. (There are surprisingly few similarities given the way each framework folder is configured.) Such a technique could be used to find and remove duplicate files, for example.

Listing 9.10 Comparing Files Found in Two Separate Directories

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
using System.Security.Permissions;

namespace IntersectDemo2
{
  class Program
  {
    static void Main(string[] args)
    {
      var v2 = Directory.GetFiles(
        @“C:WindowsMicrosoft.NETFrameworkv2.0.50727”, “*.*”,
          SearchOption.AllDirectories);
      var v35 = Directory.GetFiles(
        @“C:WindowsMicrosoft.NETFrameworkv3.5”, “*.*”,
        SearchOption.AllDirectories);

      var frameworkMatches = v35.Intersect(v2);
      Console.WriteLine(“Framework matches between version 2 and 3.5”);
      Array.ForEach(frameworkMatches.ToArray(), file => Console.WriteLine(file));
      var frameworkDifferences = v35.Except(v2);
      Console.WriteLine(“Framework differences between version 2 and 3.5”);
      Array.ForEach(frameworkDifferences.ToArray(), file => Console.WriteLine(new FileInfo(file).Name));
      Console.ReadLine();
    }
  }
}

The second part of the example uses Except to find differences. There are many differences between the two folders. Both Intersect and Except in the example use the Array.ForEach method, a generic Action delegate, and a Lambda Expression to send the results to the console.

Creating Composite Resultsets with Union

In college, computer science and math majors study discrete mathematics. It is the study of the propositional and predicate calculus. Union is basically arithmetic in the predicate calculus.

The Union extension method adds two sequences together and returns the unique members found in both sequences. If you wrote code to add all of the elements of both sequences and then performed the Distinct operation on the result, you would get the same sequence as produced by Union. For example, given a sequence of 1, 2, 3, 4 and 2, 4, 6, the Union of these sequences will result in a new sequence with the elements 1, 2, 3, 4, 6. Union is set addition. The Union extension method returns all of the unique members found in both sets, excluding duplicates. Listing 9.11 contains a short Union example using the evens and fibos that we used earlier in the chapter.

Listing 9.11 An Example Demonstrating Set Addition—Union Extension Method

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace UnionDemo
{
  class Program
  {
    static void Main(string[] args)
    {
      var evens = new int[] { 2, 4, 6, 8, 10, 12, 14, 16, 18,
        20, 22, 24, 26, 28, 30, 32, 34};
      var fibos = new int[] { 1, 1, 2, 3, 5, 8, 13, 21, 34 };
      var union = evens.Union(fibos);
      Array.ForEach<int>(union.ToArray(), e => Console.WriteLine(e));
      Console.ReadLine();
    }
  }
}

Listing 9.12 uses the code from Listing 9.6 to get a list of Order objects from the Northwind database. The first LINQ query returns a subsequence that contains orders shipped to Mexico, and the second query returns orders shipped to New Mexico and Texas—ShipRegion equals “NM” or “TX”. The results are added together to produce sales orders by territory—perhaps the southwestern United States and Mexico.

Listing 9.12 Orders Where ShipCountry Equals Mexico and ShipRegion Equals Texas and New Mexico

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Data;
using System.Data.SqlClient;
using System.Reflection;
using System.Collections;

namespace UnionDemo2
{
  class Program
  {
    /// <summary>
    /// Demonstrates except ship city
    /// </summary>
    /// <param name=“args”></param>
    static void Main(string[] args)
    {
      List<Order> orders = GetOrders();
      Console.WriteLine(orders.Count);

      var citiesInMexico = from order in orders
                           where order.ShipCountry == “Mexico”
                           select order;
      var citiesinNewMexico = from order in orders
                              where order.ShipRegion == “NM”
                              || order.ShipRegion == “TX”
                              select order;

      var territory = citiesInMexico.Union(citiesinNewMexico);

      Array.ForEach(territory.ToArray(),
        input => Console.WriteLine(Dump(input)));
      Console.ReadLine();
    }
}

Summary

Set operations are ubiquitous. Because so many programmers perform these tasks manually, it is easy to overlook how often we want some subset of what we already have. This chapter looked at Union, Intersect, Except, and Distinct. With these extension methods and LINQ, you might never again write nested loops scanning lists for elements in common or distinct. Sure, it’s possible to perform some of these operations with SQL, but not all objects originate in a database, and it is here that set operations will really shine.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.147.83.176