Goals of this chapter:
• Define the two options for working with set data using LINQ.
• Introduce the HashSet type and how this relates to LINQ.
• Introduce the LINQ standard query operators that relate to working with set data.
There are two ways of applying set-based functions over data sequences using LINQ. This chapter explores the merits of both options and explains when and why to use one method over another.
Set operations allow various functions to compare elements in collections (and in some cases, the same collection) against each other in order to determine overlapping and unique elements within a collection.
Framework libraries for set operations were missing in the .NET Framework 1, 2, and 3. The HashSet<T> type was introduced in .NET Framework 3.5, and this collection type solves most set problems often faced by developers. LINQ extended set function capability with specific operators, some of which overlap with HashSet<T> functionality. It is important to understand the benefits of both strategies and when to choose one over another. This section looks in detail at the two main choices:
• LINQ standard query operators
• The HashSet<T> class from the System.Collections.Generic namespace
The decision of how to approach a set-based task depends on problem specifics, but in general the strengths of each can be described in the following ways:
Use HashSet and its operators when
• Duplicate items are not allowed in the collections.
• Modifying the original collection is desired. These operators make changes to the original collection.
Use LINQ Operators when
• Duplicate items are allowed in the collections.
• Returning a new IEnumerable<T> is desired rather than modifying the original collection.
LINQ to Objects has standard query operators for working on sets of elements within collections. These operators allow two different collections (containing the same types of elements) to be merged into a single collection using various methods.
The set operators all use deferred execution: no element is evaluated when the operator is called, only when the result is iterated, and elements are then produced one at a time as the consumer asks for them. Each operator is detailed in this section, including the method signatures for each operator.
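Deferred execution is easiest to see by changing a source collection after a query has been defined but before it has been iterated. The following is a minimal sketch with made-up sample values; the class and method names are hypothetical and are not taken from the book's listings.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionSketch
{
    // Builds a Concat query, changes the source afterwards, and only then iterates.
    public static List<int> Run()
    {
        var first = new List<int> { 1, 2 };
        var second = new[] { 3 };

        // No element is read here; Concat only records what to do.
        IEnumerable<int> query = first.Concat(second);

        // This change happens after the query was defined...
        first.Add(99);

        // ...but before it is iterated, so the new element is included.
        return query.ToList();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Run())); // 1, 2, 99, 3
    }
}
```

If a snapshot of the data is needed instead, calling ToList or ToArray at the point of definition forces immediate evaluation.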
Concat combines the contents of two collections. It operates by looping over the first collection and yielding each element, then looping over the second collection and yielding each element. If returning duplicate elements is not the desired behavior, consider using the Union operator instead. An ArgumentNullException is thrown if either collection is null when this operator is called.
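As a rough illustration of the difference between Concat and Union, the sketch below applies both to the same pair of integer arrays. The sample values and names are assumptions made for this example only.

```csharp
using System;
using System.Linq;

public static class ConcatVersusUnionSketch
{
    private static readonly int[] First = { 1, 2, 3 };
    private static readonly int[] Second = { 3, 4, 5 };

    // Concat keeps every element from both sequences, duplicates included.
    public static int[] Concatenated()
    {
        return First.Concat(Second).ToArray();
    }

    // Union returns each equal element only once.
    public static int[] Unioned()
    {
        return First.Union(Second).ToArray();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Concatenated())); // 1, 2, 3, 3, 4, 5
        Console.WriteLine(string.Join(", ", Unioned()));      // 1, 2, 3, 4, 5
    }
}
```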
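Concat has a single overload. It is an extension method on IEnumerable<TSource> that takes the second sequence as its only argument and returns an IEnumerable<TSource>.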
Listing 6-1 demonstrates the simplest use of the Concat operator and the subtle difference between Concat and Union. The Console output from this example is shown in Output 6-1.
Listing 6-1. Simple example showing the difference between Concat and Union (see Output 6-1)
Output 6-1
A useful application of the Concat operator when binding a sequence to a control is its ability to add an additional entry at the start or end as a placeholder. For example, to make the first entry in a bound sequence the text “—none chosen—”, the code in Listing 6-2 can be used, with the result shown in Figure 6-1.
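One way to sketch the placeholder idea is to wrap the extra entry in a single-element array and concatenate the real data onto it. The helper name and sample names below are hypothetical, and the binding to a control is left out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PlaceholderSketch
{
    // Returns the names with a placeholder entry in front of them.
    public static List<string> WithPlaceholder(IEnumerable<string> names)
    {
        var placeholder = new[] { "—none chosen—" };
        return placeholder.Concat(names).ToList();
    }

    public static void Main()
    {
        var names = new[] { "Ann", "Bob" }; // assumed sample data
        foreach (var entry in WithPlaceholder(names))
        {
            Console.WriteLine(entry);
        }
    }
}
```

Swapping the operands (names.Concat(placeholder)) would put the placeholder at the end instead.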
Listing 6-2. Using the Concat operator to add values to a sequence (see Figure 6-1)
The Distinct operator removes duplicate elements from a sequence using either the default EqualityComparer or a supplied EqualityComparer. It operates by iterating the source sequence and returning each element of equal value once, effectively skipping duplicates. An ArgumentNullException is thrown if the source collection is null when this operator is called.
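The effect of the comparer can be sketched with a small array of names that differ only by case. The sample values and names are assumptions for illustration.

```csharp
using System;
using System.Linq;

public static class DistinctSketch
{
    private static readonly string[] Names = { "Peter", "peter", "Paul", "PAUL", "Mary" };

    // Default comparer: strings that differ by case are different elements.
    public static string[] CaseSensitive()
    {
        return Names.Distinct().ToArray();
    }

    // Supplied comparer: strings that differ only by case count as duplicates.
    public static string[] CaseInsensitive()
    {
        return Names.Distinct(StringComparer.OrdinalIgnoreCase).ToArray();
    }

    public static void Main()
    {
        Console.WriteLine(CaseSensitive().Length);   // 5
        Console.WriteLine(CaseInsensitive().Length); // 3
    }
}
```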
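The Distinct operator has two overloads, both extension methods on IEnumerable<TSource>: one takes no further arguments and uses the default equality comparer, and the other takes an IEqualityComparer<TSource>.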
Listing 6-3 demonstrates how to use the Distinct operator to remove duplicate entries from a collection. This example also demonstrates how to use the built-in string comparison types to perform culture-aware, case-sensitive, and case-insensitive comparisons. The Console output from this example is shown in Output 6-2.
Listing 6-3. Example showing how to use the Distinct operator; this example also shows the various built-in string comparer statics (see Output 6-2)
Output 6-2
The Except operator produces the set difference between two sequences. It returns only the elements in the first sequence that don’t appear in the second sequence, using either the default EqualityComparer or a supplied EqualityComparer. It operates by first obtaining a distinct list of elements in the second sequence and then iterating the first sequence, returning only elements that do not appear in the second sequence’s distinct list. An ArgumentNullException is thrown if either collection is null when this operator is called.
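A minimal sketch of Except with assumed integer data follows. Note that the result is itself a set, so a value repeated in the first sequence appears only once.

```csharp
using System;
using System.Linq;

public static class ExceptSketch
{
    // Elements of the first sequence that do not appear in the second.
    public static int[] Difference()
    {
        var first = new[] { 1, 1, 2, 3, 4 };
        var second = new[] { 3, 4, 5 };
        return first.Except(second).ToArray();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Difference())); // 1, 2
    }
}
```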
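The Except operator has two overloads, both extension methods on IEnumerable<TSource> that take the second sequence as an argument: one uses the default equality comparer, and the other additionally takes an IEqualityComparer<TSource>.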
Listing 6-4 shows the most basic example of using the Except operator. The Console output from this example is shown in Output 6-3.
Listing 6-4. The Except operator returns all elements in the first sequence that are not in the second sequence (see Output 6-3)
Output 6-3
The Intersect operator produces a sequence of elements that appear in both collections. It operates by skipping any element in the first collection that cannot be found in the second collection, using either the default EqualityComparer or a supplied EqualityComparer. An ArgumentNullException is thrown if either collection is null when this operator is called.
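The following sketch applies Intersect to two assumed integer arrays that both contain repeated values; each common value appears once in the result.

```csharp
using System;
using System.Linq;

public static class IntersectSketch
{
    // Elements that appear in both sequences, each returned once.
    public static int[] Common()
    {
        var first = new[] { 1, 2, 2, 3, 4 };
        var second = new[] { 2, 4, 4, 6 };
        return first.Intersect(second).ToArray();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Common())); // 2, 4
    }
}
```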
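The Intersect operator has two overloads, both extension methods on IEnumerable<TSource> that take the second sequence as an argument: one uses the default equality comparer, and the other additionally takes an IEqualityComparer<TSource>.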
Listing 6-5 shows the most basic use of the Intersect operator. The Console output from this example is shown in Output 6-4.
Listing 6-5. Intersect operator example (see Output 6-4)
Output 6-4
The Union operator returns the distinct elements from both collections. The result is similar to the Concat operator, except the Union operator will only return an equal element once, rather than the number of times that element appears in both collections. Duplicate elements are determined using either the default EqualityComparer or a supplied EqualityComparer. An ArgumentNullException is thrown if either collection is null when this operator is called.
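The Union operator has two overloads, both extension methods on IEnumerable<TSource> that take the second sequence as an argument: one uses the default equality comparer, and the other additionally takes an IEqualityComparer<TSource>.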
Listing 6-1 demonstrated the subtle difference between the Union and Concat operators. Use the Union operator when you want each unique element returned only once (duplicates removed), and Concat when you want every element from both sequences.
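Union can also be given a comparer to decide what counts as a duplicate. The sketch below, using assumed sample names, treats strings that differ only by case as the same element.

```csharp
using System;
using System.Linq;

public static class UnionWithComparerSketch
{
    // Case-insensitive union: "Ann" and "ANN" are treated as one element.
    public static string[] Merge()
    {
        var first = new[] { "Ann", "bob" };
        var second = new[] { "ANN", "Cy" };
        return first.Union(second, StringComparer.OrdinalIgnoreCase).ToArray();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Merge())); // Ann, bob, Cy
    }
}
```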
Listing 6-6 demonstrates a useful technique for combining data from multiple source types by unioning (or concatenating, excepting, intersecting, or distincting, for that matter) data from either a collection of Contact elements or CallLog elements based on a user’s partial input. This feature is similar to the incremental lookup offered by many smartphones, in which the user enters either a name or a phone number, and a drop-down displays recent numbers and searches the stored contacts for likely candidates. This technique works because of how .NET manages equality for projected anonymous types. The key is to ensure that the projected fields in each anonymous type are identical in name, case, and order, and that corresponding fields have the same types. If these conditions are satisfied, the anonymous types can be operated on by any of the set-based operators.
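The technique can be sketched without the book's Contact and CallLog types. In the example below, the in-line sample data, the Display field, and the Suggest method are all hypothetical stand-ins; the point is only that both projections produce the same anonymous type, so Union can compare them by value.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AnonymousTypeUnionSketch
{
    public static List<string> Suggest(string namePart, string numberPart)
    {
        // Hypothetical sample data standing in for Contact and CallLog records.
        var contacts = new[]
        {
            new { Name = "Kate", Phone = "555-0107" },
            new { Name = "Karl", Phone = "555-0123" },
            new { Name = "Mary", Phone = "555-0199" }
        };
        var calls = new[]
        {
            new { Number = "555-0107", Minutes = 2 },
            new { Number = "555-0107", Minutes = 9 },
            new { Number = "555-0142", Minutes = 5 }
        };

        // Both projections use the same field name, type, and order,
        // so the compiler reuses one anonymous type for both.
        var fromContacts = contacts
            .Where(c => c.Name.StartsWith(namePart, StringComparison.Ordinal))
            .Select(c => new { Display = c.Name });
        var fromCalls = calls
            .Where(l => l.Number.Contains(numberPart))
            .Select(l => new { Display = l.Number });

        // Anonymous types compare by value, so the repeated number is returned once.
        return fromContacts.Union(fromCalls).Select(x => x.Display).ToList();
    }

    public static void Main()
    {
        foreach (var entry in Suggest("Ka", "7"))
        {
            Console.WriteLine(entry); // Kate, Karl, 555-0107 (one per line)
        }
    }
}
```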
Listing 6-6 uses the sample data of Contact and CallLog types introduced earlier in this book in Table 2-1 and Table 2-2, with sample partial user-entered data of Ka and 7. The Console output from this example is shown in Output 6-5.
Listing 6-6. Anonymous types with the same members can be unioned and concatenated—see Output 6-5
Output 6-5
LINQ’s set operators rely on instances of EqualityComparer<T> to determine if two elements are equal. When no equality comparer is specified, the default equality comparer for the element type is used, obtained from the static property Default on the generic EqualityComparer<T> type. For example, for a sequence of strings, calling Distinct with no arguments is identical to calling Distinct with EqualityComparer<string>.Default as its argument, and the same holds for all of the set operators.
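The equivalence can be checked directly. This is a small sketch with assumed sample data; the class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DefaultComparerSketch
{
    // Returns true when both forms of Distinct produce the same sequence.
    public static bool SameResult()
    {
        var words = new[] { "a", "b", "a", "A" };

        // Implicit default comparer.
        var implicitDefault = words.Distinct();

        // The same comparer, passed explicitly.
        var explicitDefault = words.Distinct(EqualityComparer<string>.Default);

        return implicitDefault.SequenceEqual(explicitDefault);
    }

    public static void Main()
    {
        Console.WriteLine(SameResult()); // True
    }
}
```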
For programming situations where more control is needed for assessing equality, a custom comparer can be written, or one of the built-in string comparisons can be used.
Listing 6-3 introduced an example that showed case-insensitive matching of strings using the Distinct operator. It simply passed a static instance of a built-in comparer type to the operator.
In addition to the string comparer used in this example, there are a number of others that can be used for a particular circumstance. Table 6-1 lists the available built-in static properties that can be called on the StringComparer type to get an instance of that comparer.
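As a quick way to compare the built-in comparers, the sketch below runs Distinct over the two strings "a" and "A" with each of the six static StringComparer properties and records how many elements survive. The sample strings and names are assumptions; the culture-sensitive comparers depend on the current culture for other inputs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StringComparerSketch
{
    // Maps each built-in comparer name to the number of distinct elements it leaves.
    public static Dictionary<string, int> CountDistinct()
    {
        var words = new[] { "a", "A" };
        var comparers = new Dictionary<string, StringComparer>
        {
            { "CurrentCulture", StringComparer.CurrentCulture },
            { "CurrentCultureIgnoreCase", StringComparer.CurrentCultureIgnoreCase },
            { "InvariantCulture", StringComparer.InvariantCulture },
            { "InvariantCultureIgnoreCase", StringComparer.InvariantCultureIgnoreCase },
            { "Ordinal", StringComparer.Ordinal },
            { "OrdinalIgnoreCase", StringComparer.OrdinalIgnoreCase }
        };

        var counts = new Dictionary<string, int>();
        foreach (var pair in comparers)
        {
            counts[pair.Key] = words.Distinct(pair.Value).Count();
        }
        return counts;
    }

    public static void Main()
    {
        // Case-sensitive comparers report 2; the IgnoreCase variants report 1.
        foreach (var pair in CountDistinct())
        {
            Console.WriteLine(pair.Key + " = " + pair.Value);
        }
    }
}
```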
In Chapter 4, in the “Specifying Your Own Key Comparison Function” section, you first saw the ability to customize how LINQ evaluates equality between objects by writing a custom equality comparer type. As an example, we wrote a custom comparison type that resolved equality based on the age-old phonetic comparison algorithm, Soundex. The code for the SoundexEqualityComparer is shown in Listing 4-5, and in addition to being useful for the grouping extension methods, the same equality comparer can be used with the LINQ set operators. For example, Listing 6-7 shows how to use the Soundex algorithm to determine how many distinct phonetic names are present in a list of names. The code writes the following text to the Console window: Number of unique phonetic names = 4.
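The shape of a custom comparer can be sketched with something much simpler than Soundex. The FirstLetterComparer below is a hypothetical stand-in, not the SoundexEqualityComparer from Listing 4-5: it treats two names as equal when they start with the same letter, ignoring case. The sample names are also assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a phonetic comparer: equality by first letter only.
public sealed class FirstLetterComparer : IEqualityComparer<string>
{
    private static char Key(string s)
    {
        return string.IsNullOrEmpty(s) ? '\0' : char.ToUpperInvariant(s[0]);
    }

    public bool Equals(string x, string y)
    {
        return Key(x) == Key(y);
    }

    // Equal elements must produce equal hash codes, so hash the same key.
    public int GetHashCode(string s)
    {
        return Key(s).GetHashCode();
    }
}

public static class CustomComparerSketch
{
    public static int CountDistinct()
    {
        var names = new[] { "Janet", "jon", "Kim", "kate", "Lee" };
        return names.Distinct(new FirstLetterComparer()).Count();
    }

    public static void Main()
    {
        Console.WriteLine("Number of unique first letters = " + CountDistinct()); // 3
    }
}
```

Whatever the comparison rule, Equals and GetHashCode have to agree, because the set operators use the hash code to find candidate matches before calling Equals.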
Listing 6-7. Using a custom equality comparer with the Distinct operator
HashSet<T> was introduced in .NET Framework 3.5 as part of the System.Collections.Generic namespace. HashSet<T> is an unordered collection of unique elements that provides standard set operations such as intersection and union (plus many more). It has the standard collection operations Add (which returns a Boolean: true if the element was added, false if it was already in the collection), Remove, and Contains. Because it uses a hash-based implementation for object identity, these operations locate an element directly rather than looping over the entire collection, as Contains and Remove do on a List<T>, for example (typically O(1) rather than O(n)).
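The return value of Add is worth seeing in action. This is a minimal sketch; the class name and the value 42 are arbitrary choices for the example.

```csharp
using System;
using System.Collections.Generic;

public static class HashSetBasicsSketch
{
    // Returns the results of adding a value twice and then looking it up.
    public static bool[] Run()
    {
        var set = new HashSet<int>();

        bool firstAdd = set.Add(42);    // true: the element was added
        bool secondAdd = set.Add(42);   // false: the element was already present
        bool found = set.Contains(42);  // true: located by hash, not by scanning

        return new[] { firstAdd, secondAdd, found };
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Run())); // True, False, True
    }
}
```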
Although the operators on HashSet<T> would appear to overlap with the LINQ set operators Intersect and Union, the HashSet<T> implementations modify the set they are called on rather than returning a new IEnumerable<T> collection, as the LINQ operators do. For this reason, the HashSet<T> operators are named IntersectWith and UnionWith, and the LINQ operators remain available on a HashSet<T> collection as Intersect and Union. This naming distinction avoids naming clashes and also allows the desired behavior to be chosen in a specific case.
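The difference in behavior can be sketched by calling both forms on the same set and watching its Count. The sample values and names are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class UnionVersusUnionWithSketch
{
    // Returns { size of LINQ result, set size after Union, set size after UnionWith }.
    public static int[] Run()
    {
        var set = new HashSet<int> { 1, 2, 3 };
        var other = new[] { 3, 4 };

        // LINQ Union: produces a new sequence and leaves the set untouched.
        List<int> linqResult = set.Union(other).ToList();
        int sizeAfterLinq = set.Count;

        // HashSet UnionWith: returns nothing and changes the set itself.
        set.UnionWith(other);
        int sizeAfterUnionWith = set.Count;

        return new[] { linqResult.Count, sizeAfterLinq, sizeAfterUnionWith };
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Run())); // 4, 3, 4
    }
}
```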
HashSet<T> implements the IEnumerable<T> interface and therefore can be used in a LINQ query. Listing 6-8 demonstrates a LINQ query to find even numbers in a HashSet<T> made by unioning two collections. The Console output from this example is shown in Output 6-6.
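A query of this kind might look like the following sketch. The input values are assumed, and an orderby clause is added because a HashSet<T> does not guarantee any particular order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class HashSetQuerySketch
{
    public static List<int> EvenNumbers()
    {
        // Build the set by unioning two collections in place.
        var set = new HashSet<int>(new[] { 1, 2, 3, 4 });
        set.UnionWith(new[] { 3, 4, 5, 6 });

        // HashSet<T> is an IEnumerable<T>, so it can be a LINQ query source.
        var evens = from n in set
                    where n % 2 == 0
                    orderby n
                    select n;

        return evens.ToList();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", EvenNumbers())); // 2, 4, 6
    }
}
```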
Listing 6-8. LINQ query over a HashSet collection—see Output 6-6
Output 6-6
The differences between HashSet<T> and LINQ operator support are listed here (as documented in the Visual Studio documentation), although LINQ-equivalent approximations are easy to construct, as documented in Table 6-2 and implemented in Listing 6-9.
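To give a feel for what such approximations can look like, the sketch below expresses a few HashSet<T> operations (IsSubsetOf, IsSupersetOf, Overlaps, SetEquals, and SymmetricExceptWith) in terms of Except, Intersect, Concat, and Any. These are one possible set of equivalents with hypothetical method names, and are not necessarily the ones given in Table 6-2 or Listing 6-9.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SetApproximationsSketch
{
    // a is a subset of b when nothing in a is missing from b.
    public static bool IsSubsetOfLinq<T>(IEnumerable<T> a, IEnumerable<T> b)
    {
        return !a.Except(b).Any();
    }

    // a is a superset of b when nothing in b is missing from a.
    public static bool IsSupersetOfLinq<T>(IEnumerable<T> a, IEnumerable<T> b)
    {
        return !b.Except(a).Any();
    }

    // The sets overlap when they share at least one element.
    public static bool OverlapsLinq<T>(IEnumerable<T> a, IEnumerable<T> b)
    {
        return a.Intersect(b).Any();
    }

    // The sets are equal when each is a subset of the other.
    public static bool SetEqualsLinq<T>(IEnumerable<T> a, IEnumerable<T> b)
    {
        return !a.Except(b).Any() && !b.Except(a).Any();
    }

    // Elements in exactly one of the two sets; returns a new sequence.
    public static IEnumerable<T> SymmetricExceptLinq<T>(IEnumerable<T> a, IEnumerable<T> b)
    {
        return a.Except(b).Concat(b.Except(a));
    }

    public static void Main()
    {
        var a = new[] { 1, 2 };
        var b = new[] { 1, 2, 3 };
        Console.WriteLine(IsSubsetOfLinq(a, b));                           // True
        Console.WriteLine(SetEqualsLinq(a, b));                            // False
        Console.WriteLine(string.Join(", ", SymmetricExceptLinq(a, b)));   // 3
    }
}
```

Unlike the HashSet<T> methods, these versions re-enumerate their inputs and never modify them, so they trade some efficiency for the non-mutating behavior of the LINQ operators.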
Listing 6-9. Approximate LINQ implementations of the operators in the HashSet type
Working with set data using LINQ is no harder than choosing the correct standard query operator. The existing HashSet<T> collection type can also be used, and the decision on which set approach suits your problem boils down to the following factors:
Use HashSet<T> and its operators when
• Duplicate items are not allowed in the collections.
• Modifying the original collection is desired. These operators make changes to the original collection.
Use LINQ operators when
• Duplicate items are allowed in the collections.
• Returning a new IEnumerable<T> is desired instead of modifying the original collection.
Having explained all of the built-in standard query operators in this and previous chapters, the next chapter looks at how to extend LINQ by building custom operators that extend the LINQ story and integrate into the language just like the Microsoft-supplied operators.