Chapter 9. Arrays, Collection Types, and Iterators

Collection types have been around in various forms since the dawn of programming. I'm sure you remember the linked list exercises when you were learning to write programs. In this chapter, I'll give a brief overview of arrays but won't go into much detail, as arrays have not changed much between the various .NET releases.

However, I'll spend more time explaining the major generic collection interfaces and iterators along with what sorts of cool things you can do with them. Traditionally, creating enumerators for collection types has been tedious and annoying. Iterators make this task a breeze, while making your code a lot more readable in the process.

Introduction to Arrays

C# arrays, as well as arrays in the CLR, are highly evolved from C/C++ arrays. In C/C++, you typically access an array by offsetting a pointer that points to the beginning of a contiguous range of items in a memory block somewhere. C/C++ arrays have no built-in range checking, which is the root of more bugs than you can shake a stick at. C# and the CLR solve this problem elegantly by making the array type a built-in, implicit type to the runtime.

When you declare a type—whether it's a class or struct—the runtime reserves the right to silently generate an array type based upon that new type. The array type that it generates is a reference type—thus, all array instances are of class type. The reference type that it generates is derived from System.Array, and ultimately from System.Object. Therefore, you can treat all C# arrays polymorphically through a reference to System.Array. Of course, that means that each array, no matter what concrete type of array it is, implements all of the methods and properties of System.Array.

The way that you declare an array within C# is similar to C/C++, except the designers of the language took the liberty to make the syntax a tad more intuitive in their minds, in that the square brackets in the declaration follow the type and not the array variable name. The following example shows three ways to create an array:

using System;

public class EntryPoint
{
    static void Main() {
        int[] array1 = new int[ 10 ];
        for( int i = 0; i < array1.Length; ++i ) {
            array1[i] = i*2;
}

        int[] array2 = new int[] { 2, 4, 6, 8 };

        int[] array3 = { 1, 3, 5, 7 };
    }
}

The longhand way to create an array instance and fill it with initial values is shown where array1 is initialized. Items are indexed using an indexer that is typically greater than or equal to 0. You may know that arrays in the CLR can have a user-defined lower bound. However, in C#, the lower bound is always 0 in order to meet the CLS restriction that arrays have a 0 lower bound. The initialization techniques used for array2 and array3 show a shorter notation for doing the same thing. Notice that in most cases, you must first allocate the array instances on the heap using the new operator. The same thing happens with the array3 instance, but the compiler does it for you in order to facilitate the notational shorthand. It's interesting to note that an array of type object—thus, System.Object[]—is itself of type System.Object.

One of the conveniences of .NET arrays is that they are range-checked. Therefore, if you step off the end of one of them, thus going out of bounds, the runtime will throw an IndexOutOfRangeException instead of changing random memory, as in native C/C++. So you can say goodbye to those lurking, hard-to-find range bugs, because the CLR won't allow them to lurk too long, and they definitely won't go unnoticed for long periods of time anymore.

Lastly, notice that you can conveniently iterate through the items in the array using the C# foreach statement. This works because System.Array implements IEnumerable. I have more to say about IEnumerable and its cousin IEnumerator later on, in the section titled "IEnumerable<T>, IEnumerator<T>, IEnumerable, and IEnumerator."

Note

The compiler actually uses an optimization while compiling the foreach loop which I mention in Chapter 13. Instead of actually converting the array to an IEnumerable instance and then calling IEnumerable.GetEnumerator, it instead simply calls any public GetEnumerator method that matches the required signature. This optimization is a form of what is known as duck typing, which I speak more about in Chapter 17. Even though foreach uses this optimization, it is always a good idea to implement IEnumerable and IEnumerable<T> because a consumer may wish to iterate over your collection without using foreach such as with LINQ.

Implicitly Typed Arrays

C# 3.0 introduced an abbreviated way of initializing arrays when the type of the array can be inferred at runtime. Let's have a look at the new syntax in the following code snippet:

using System;

public class EntryPoint
{
    static void Main() {
        // A conventional array
        int[] conventionalArray = new int[] { 1, 2, 3 };
// An implicitly typed array
        var implicitlyTypedArray = new [] { 4, 5, 6 };
        Console.WriteLine( implicitlyTypedArray.GetType() );

        // An array of doubles
        var someNumbers = new [] { 3.1415, 1, 6 };
        Console.WriteLine( someNumbers.GetType() );

        // Won't compile!
        // var someStrings = new [] { "int",
        //                            someNumbers.GetType() };
    }
}

For comparison purposes, the first array variable, named conventionalArray, uses one of the conventional syntaxes for declaring and instantiating an array. However, the next variable, implicitlyTypedArray, uses the new and shorter syntax, and so it is devoid of any type information. Instead of providing the compiler with type information, I instead rely on the compiler to deduce that each item in the array is of type int. And to save even more keystrokes and make my life a bit easier, implicitlyTypedArray is declared as an implicitly typed local variable. If you execute this code, you will see that the WriteLine method call right below it shows that the implicitly typed variable is of type System.Int32[]. In fact, you could have expressed the same line of code as follows:

int[] implicitlyTypedArray = new [] { 4, 5, 6 };

However, because you've already got the compiler figuring out the type of the array elements, you may as well go a little further and let it deduce the entire array type, especially if the variable remains local to the scope of this method. But what happens if you declare the array using multiple types in the initialization list?

When the compiler is presented with multiple types within the initialization list of an implicitly typed array, it determines a type that all the given types are implicitly convertible to. And, of course, for all practical purposes, any type instance is convertible to its own type. Therefore, in the declaration of the someNumbers instance in the example, the compiler determines that all of the types within the braces are convertible to System.Double. Not surprisingly, the following WriteLine method call confirms this. But what if the types are not all implicitly convertible to a single type?

When the compiler cannot find a common type that all of the array variables are convertible to, it will present the compiler warning CS0826, stating

No best type found for implicitly typed array

If you uncomment the line where I have declared the someStrings variable, you will see this behavior, because System.Type instances are not implicitly convertible to System.String.

Note

All objects in .NET are implicitly convertible to System.Object; therefore, you may be wondering why the compiler won't settle on using System.Object as the common type in the previous assignment to someStrings. That is because the compiler may only use types that are present in the implicitly typed array expression. Since System.Object is not one of the types listed in the assignment of someStrings, it is not considered.

But what happens if you declare an array with two items and both types are convertible to each other? Getting into a situation like this is very rare and usually has to be induced by having two custom types that both contain implicit conversion operators to each other's type.[27] Just to see what happened, I did just that. When I attempted to declare an implicitly typed array with an item of each type, I was greeted with the CS0826 compiler error again, which I expected.

You may already be thinking about many useful applications for implicitly typed arrays. But in most cases, they merely save you some typing.[28] This is handy if your array contains closed generic types that require a lot of typing to specify their type. So, in that respect, implicitly typed arrays can make more readable code. But in other cases, they can actually make code harder to follow for a maintenance engineer if knowing the type of the array at the point of declaration is essential to easily reading and understanding what the code is doing.

Note

Implicitly typed variables are actually implicitly typed local variables, so unless the method you are reading is overly complex and huge, you should have no trouble deducing the type yourself. If you have trouble deducing the type from looking at the code, it may mean that the function is too complex and in need of some refactoring. Additionally, if you're using implicitly typed local variables in your code, be sure the name the variables logically so that a maintenance engineer can deduce the type of the variable easily.

All of that said, implicitly typed arrays are great for instantiating n-tuples of items. For example, the following code snippet shows a succinct way to declare a matrix of integers:

using System;

public class EntryPoint {
    static void Main() {
        var threeByThree = new [] {
            new [] { 1, 2, 3 },
            new [] { 4, 5, 6 },
            new [] { 7, 8, 9 }
        };

        foreach( var i in threeByThree ) {
foreach( var j in i ) {
                Console.Write( "{0}, ", j );
            }
            Console.Write( "
" );
        }
    }
}

Type Convertibility and Covariance

When you declare an array to contain instances of a certain reference type, the instances that you may place in that array can actually be instances of a more derived type, or any other reference type implicitly convertible to the contained type. For example, if you create an array that contains instances of type Animal, then you can feasibly insert an instance of Dog or Cat if both of them derive from Animal.

Note

In C/C++, storing instances of type Dog or Cat into arrays as type Animal is strongly discouraged because the objects, if held by value, would get sheared off, the Cat-ness and Dog-ness would get chopped off, and all you'd end up with is Animal-ness. Not so in C#, because the array merely references the objects on the heap. If you want to make an analogy to C/C++ arrays, C# arrays are similar to C/C++ arrays holding pointers to Cat and Dog through pointers to type Animal.

You can coerce array types in another, even more interesting way:

using System;

public class Animal { }
public class Dog : Animal { }
public class Cat : Animal { }

public class EntryPoint
{
    static void Main() {
        Dog[] dogs = new Dog[ 3 ];
        Cat[] cats = new Cat[ 2 ];

        Animal[] animals = dogs;
        Animal[] moreAnimals = cats;
    }
}

The assignment from dogs to animals and from cats to animals is something that you definitely can't do in native C/C++. Arrays in C# are assignable as long as their rank matches and the contained type is a reference type that is implicitly convertible from one to the other. This capability of array assignment in the CLR is provided by the fact that arrays are covariant as opposed to invariant. Both arrays in the previous example have a rank of 1, and Dog and Cat are type-convertible to Animal, thus the assignment works. The C# creators included covariant array support in the CLR primarily to support the Java language. Incidentally, this feature of the language is inherently broken and allows you to insert an instance of Table into an array of Dogs. For more details, refer to the section titled "Co- and Contravariance" in Chapter 11.

Note

The full type information of an array comprises its rank (how many dimensions it has) and the type that it contains.

Sortability and Searchability

If you take a look at the entire System.Array interface as documented in the MSDN documentation, you'll notice that several methods have to do with sorting the items within the array. These methods are usable when the contained type implements IComparable, the standard interface through which items of a particular type are compared.[29] Naturally, you cannot sort a multidimensional array, and if you try, you'll need to be ready to catch an exception of type RankException. So, always be cognizant of what types of things could go wrong when you're calling methods on arrays to do what could appear to be fail-proof operations.

Using the static methods Index and LastIndexOf, you can locate a specific value within an array. If the method fails to find the requested value, it returns −1. No particular search algorithm is involved with these methods other than the fact that the former starts searching from the beginning of the array and the latter starts at the end. If you'd like to perform a faster search, you can use the BinarySearch static method. However, before you can do so, you must sort your array, and of course, that requires that the items within the array implement IComparable/IComparable<T> or you must provide a type that implements IComparer/IComparer<T> which is passed to the Sort method.

Note

The difference between whether your type implements IComparable/IComparable<T> or whether you provide another type which implements IComparer/IComparer<T> to perform the comparison of two instances is a subtle but important one. As Jon Skeet points out, separating the comparison logic from the type itself facilitates greater flexibility. Sort methods should provide an overload that accepts a type that implements IComparer/IComparer<T> which the Sort method then delegates to in order to compare two instances. This design whereby one can provide the algorithm at the time of the sort is a form of the Strategy pattern.

Synchronization

Many times, you'll find it necessary to synchronize access to an array or a collection type that implements ICollection.[30] The System.Array type implements ICollection as well as IList. One of the properties of ICollection is IsSynchronized, which always returns false for regular arrays. That's because regular arrays aren't synchronized by default, because enforcing such a rule would cause those who don't need synchronization to pay a penalty. Therefore, you must manage synchronization yourself.

Tip

If you are going to need synchronization within collections used in multithreaded systems, I highly suggest that you use the collection types in System.Collections.Concurrent. These types were added to .NET 4.0 by the Parallel Computing Platform team at Microsoft and their locking techniques are finely tuned for efficiency in concurrent multithreaded environments.

The easiest way to manage synchronization is via the System.Monitor class, which you normally use via the C# lock keyword. The class allows you to acquire the built-in synchronization lock on an object.[31] However, instead of acquiring a lock on the array object itself, you should acquire a lock on the ICollection.SyncRoot object instead.

Note

You can acquire a lock on any object referenced in the CLR. Each object has a lazily created sync block, which contains the lock variable that the CLR manages internally when System.Monitor attempts to acquire the lock.

Many array and collection implementations are free to return a reference to the actual container via the ICollection.SyncRoot property, but they might not for various reasons. ICollection.SyncRoot provides a common way for synchronizing access to both arrays and collections. I have more to say about synchronization when I cover the ICollection interface in the "Collection Synchronization" section.

Vectors vs. Arrays

It's interesting to note that the CLR supports two special types to deal with arrays in C# code. If your array happens to be single-dimensional, and it happens to have a lower bound of 0, which is usually true for C# arrays,[32] then the CLR uses a special built-in type called a vector, which is actually a subtype of System.Array. The CLR supports special IL instructions defined to work directly with vectors. If your array is multidimensional, then a CLR vector type is not used and an array object is used instead. To demonstrate this, let's take a quick look at some IL code generated by the following short example:

public class EntryPoint
{
    static void Main() {
        int val = 123;
        int newVal;

        int[] vector = new int[1];
        int[,] array = new int[1,1];
        vector[0] = val;
        array[0,0] = val;

        newVal = vector[0];
        newVal = array[0,0];
    }
}

Take a look at the generated IL for the Main method:

.method private hidebysig static void Main() cil managed
{
  .entrypoint
  // Code size       46 (0x2e)
  .maxstack  4
  .locals init ([0] int32 val,
           [1] int32 newVal,
           [2] int32[] 'vector',
           [3] int32[0...,0...] 'array')
  IL_0000:  nop
  IL_0001:  ldc.i4.s   123
  IL_0003:  stloc.0
  IL_0004:  ldc.i4.1
  IL_0005:  newarr     [mscorlib]System.Int32
  IL_000a:  stloc.2
  IL_000b:  ldc.i4.1
  IL_000c:  ldc.i4.1
  IL_000d:  newobj     instance void int32[0...,0...]::.ctor(int32,
                                                             int32)
  IL_0012:  stloc.3
  IL_0013:  ldloc.2
  IL_0014:  ldc.i4.0
  IL_0015:  ldloc.0
IL_0016:  stelem.i4
  IL_0017:  ldloc.3
  IL_0018:  ldc.i4.0
  IL_0019:  ldc.i4.0
  IL_001a:  ldloc.0
  IL_001b:  call       instance void int32[0...,0...]::Set(int32,
                                                           int32,
                                                           int32)
  IL_0020:  ldloc.2
  IL_0021:  ldc.i4.0
  IL_0022:  ldelem.i4
  IL_0023:  stloc.1
  IL_0024:  ldloc.3
  IL_0025:  ldc.i4.0
  IL_0026:  ldc.i4.0
  IL_0027:  call       instance int32 int32[0...,0...]::Get(int32,
                                                            int32)
  IL_002c:  stloc.1
  IL_002d:  ret
} // end of method EntryPoint::Main

Notice the difference between usages of the two C# arrays. On line IL_0005, the newarr IL instruction creates the instance represented by the vector variable. The multidimensional array held in the variable array is created on line IL_000d. In the first case, a native IL instruction handles the operation, whereas a regular constructor call handles the operation in the second case. Similarly, when accessing the elements, the IL instructions stelem and ldelem, on lines IL_0016 and IL_0022 respectively, are used for the vector, whereas regular method calls handle the access to the elements of the multidimensional array.

Because vector support is handled by specific IL instructions tailored specifically for vectors, it's safe to assume that vector use tends to be more efficient than multidimensional array use, even though instances of both derive from System.Array.

Multidimensional Rectangular Arrays

C# and the CLR contain direct support for multidimensional arrays, also known as rectangular arrays. You can easily declare an array with multiple rank within C#. Simply introduce a comma into the square brackets to separate the rank, as shown in the following example:

using System;

public class EntryPoint
{
    static void Main() {
        int[,] twoDim1 = new int[5,3];

        int[,] twoDim2 = { {1, 2, 3},
                           {4, 5, 6},
                           {7, 8, 9} };

        foreach( int i in twoDim2 ) {
            Console.WriteLine( i );
}
    }
}

There are several things to note when using rectangular arrays. All usage of these arrays boils down to method calls on a CLR-generated reference type, and the built-in vector types don't come into play here. Notice the two declarations. In each case, you don't need the size of each dimension when declaring the type. Again, that's because arrays are typed based on their containing type and rank. However, once you create an instance of the array type, you must provide the size of the dimensions. I did this in two different ways in this example. In creating twoDim1, I explicitly said what the dimension sizes are, and in the creation of twoDim2, the compiler figured it out based upon the initialization expression.

In the example, I listed all of the items in the array using the foreach loop as shown. foreach iterates over all items in the array in a row-major fashion. I could have achieved the same goal using two nested for loops, and I definitely would have needed to do that if I needed to iterate over the array elements in any other order. When doing so, keep in mind that the Array.Length property returns the total amount of items in the array. In order to get the count of each dimension, you must call the Array.GetLength method supplying the dimension that you're interested in. For example, I could have iterated over the items in the array using the following syntax, and the results would have been the same:

using System;

public class EntryPoint
{
    static void Main() {
        int[,] twoDim = { {1, 2, 3},
                          {4, 5, 6},
                          {7, 8, 9} };

        for( int i = 0; i != twoDim.GetLength(0); ++i ) {
            for( int j = 0; j != twoDim.GetLength(1); ++j ) {
                Console.WriteLine( twoDim[i,j] );
            }
        }

        for( int i = twoDim.GetLowerBound(0);
             i <= twoDim.GetUpperBound(0);
             ++i ) {
            for( int j = twoDim.GetLowerBound(1);
                 j <= twoDim.GetUpperBound(1);
                 ++j ) {
                Console.WriteLine( twoDim[i,j] );
            }
        }
    }
}

For good measure, I've shown how to iterate over the dimensions of the array using two methods. The first method assumes that the lower bound of each dimension is 0, and the second does not. In all of the calls to GetLength, GetUpperBound, and GetLowerBound, you must supply a zero-based dimension of the Array that you're interested in.

Note

All arrays created within C# using the standard C# array declaration syntax will have a lower bound of 0. However, if you're dealing with arrays used for mathematical purposes, as well as arrays that come from assemblies written in other languages, you may need to consider that the lower bound may not be 0.

When you access the items of a multidimensional array, the compiler generates calls to Get and Set methods, which are similar to GetValue and SetValue. These methods are overloaded to accept a variable list of integers to specify the ordinal of each dimension within the array.

When mapping multidimensional arrays to mathematical concepts, the rectangular array is the most natural and preferred way to go. However, creating methods where an argument may be an array of varying rank is tricky, because you must accept the argument as type System.Array and dynamically deal with the rank of the array. You can access the rank of an array using the Array.Rank property. Thus, creating rank-general code is tricky due to the syntactical burden of accessing all array items through method calls to System.Array, but it is entirely possible. Moreover, the most general array-manipulation code should also handle the case of nonzero lower bounds in the individual ranks.

Multidimensional Jagged Arrays

If you come from a C/C++ or Java background, you're probably already familiar with jagged arrays, because those languages don't support rectangular multidimensional arrays like C# does. The only way to implement multidimensional arrays is to create arrays of arrays, which is precisely what a jagged array is. However, because each element of the top-level array is an individual array instance, each array instance in the top-level array can be any size. Therefore, the array isn't necessarily rectangular—hence, the name jagged arrays.

The syntactical pattern for declaring a jagged array in C# is similar to its cousins C++ and Java. The following example shows how to allocate and use a jagged array:

using System;
using System.Text;

public class EntryPoint
{
    static void Main() {
        int[][] jagged = new int[3][];

        jagged[0] = new int[] {1, 2};
        jagged[1] = new int[] {1, 2, 3, 4, 5};
        jagged[2] = new int[] {6, 5, 4};

        foreach( int[] ar in jagged ) {
            StringBuilder sb = new StringBuilder();
            foreach( int n in ar ) {
                sb.AppendFormat( "{0} ", n );
            }
            Console.WriteLine( sb.ToString() );
        }
        Console.WriteLine();
for( int i = 0; i < jagged.Length; ++i ) {
            StringBuilder sb = new StringBuilder();
            for( int j = 0; j < jagged[i].Length; ++j ) {
                sb.AppendFormat( "{0} ", jagged[i][j] );
            }
            Console.WriteLine( sb.ToString() );
        }
    }
}

As you can see, allocating and creating a jagged array is a bit more complex than rectangular arrays because you must handle all of the subarray allocations individually, whereas a rectangular array gets allocated all at once. Notice how the output provides a jagged-looking output, because each subarray has a different size:

1 2
1 2 3 4 5
6 5 4

In the example, I show two ways to iterate through the array just to show the syntax for accessing the individual items within a jagged array and how that syntax differs from accessing items within a rectangular array. The syntax is similar to that of C++ and Java. The foreach method of iterating through the array is more elegant, and as I'll cover later on, using foreach allows you to use the same code to iterate through collections that may not be arrays.

Note

It's preferable to use foreach to iterate through arrays and collections. That way, you can change the type of the container later and the foreach block won't have to change. If you use a for loop instead, you may have to change the method used to access each individual element. Additionally, foreach handles cases where the array has a nonzero lower bound.

It often makes sense to use jagged arrays rather than rectangular arrays. For example, you may be reading in information from a database, and each entry in the top-level array may represent a collection where each subcollection may have a widely varying amount of items in it. If most of the subcollections contain just a handful of items and then one of them contains 100 items, a rectangular array would waste a lot of space because it would allocate 100 entries for each subcollection no matter what. Jagged arrays are generally more space efficient, but the trade-off is that accessing items within a jagged array requires more care, because you cannot assume that each subarray has the same number of items in it.

Note

Jagged arrays can potentially be more computationally efficient, because jagged arrays are typically arrays of single-dimension, zero-lower-bound arrays, which the CLR represents with vectors, as described previously in this chapter.

Collection Types

Ever since its inception, the .NET Framework has offered a host of collection types for managing everything from an expandable array via ArrayList, a Queue, a Stack, or even a dictionary via the HashTable class. Over the years, newer versions of the .NET Framework expanded these types. Generally, a collection is any type that holds on to a set of objects and implements IEnumerable or IEnumerable<T>. The objects in the set are typically related to each other in some way defined by the problem domain.

I'm assuming that you're already familiar with the nongeneric collection types and collection interfaces available in .NET 1.1—specifically, those defined in the System.Collections and System.Collections.Specialized namespaces. You can find plenty of documentation on these in the MSDN. Throughout this discussion, I'll call the old collection types the nongeneric collection types in order to distinguish them from the new collection types and interfaces defined within the System.Collections.Generic and System.Collections.ObjectModel namespaces.

Comparing ICollection<T> with ICollection

The most obvious additions to the collection types starting within the .NET 2.0 Framework are the types defined within the System.Collections.Generic namespace. You can read much more about generics in Chapter 11. These types are strongly typed, thus giving the compiler a bigger type-safety hammer to wield when ferreting out type-mismatch bugs at compile time. In addition, when used to contain value types, they are much more efficient, because there is no gratuitous boxing. Arguably, the root type of all the generic collection types is ICollection<T>. I have included the declaration for it here:

public interface ICollection<T> : IEnumerable<T>, IEnumerable
{
   int Count { get; }
   bool IsReadOnly { get; }
   void Add( T item );
   void Clear();
   bool Contains( T item );
   void CopyTo( T[] array, int arrayIndex );
   bool Remove( T item );
}

For the sake of comparison, I've included the nongeneric ICollection interface definition as well:

public interface ICollection : IEnumerable
{
   int Count { get; }
   bool IsSynchronized { get; }
   object SyncRoot { get; }
   void CopyTo( Array array, int index );
}

Now, let's take a look at the differences and what that means for your code. One thing that has been missing with the nongeneric collections is a uniform interface for managing the contents of the collection. For example, the nongeneric Stack and Queue types both have a Clear method to erase their contents. As expected, they both implement ICollection. However, because ICollection doesn't contain any modifying methods, you generally can't treat instances of these two types polymorphically within code. Thus, you would always have to cast an instance variable to type Stack in order to call Stack.Clear, and cast to type Queue in order to call Queue.Clear.

ICollection<T> helps this problem by declaring some methods for modifying the collection. As with most general-use solutions, it does not necessarily apply to all situations. For example, ICollection<T> also declares an IsReadOnly property, because sometimes you need to introduce an immutable collection in your design. For those instances, you would expect calls to Add, Clear, and Remove to throw an InvalidOperationException.

Note

For better performance, it's recommended that calling code determines if such operations are forbidden by first checking the IsReadOnly property, thus avoiding the exception altogether. Of course, if the end result of IsReadOnly returning true is that you throw an exception, then there is no gain.

Since a main purpose of ICollection<T> is to provide stronger type safety, it only makes sense that ICollection<T> provides its own version of CopyTo that is strongly typed. Whereas ICollection.CopyTo knows that the first parameter is an array and accepts a System.Array reference as its first parameter, ICollection<T>.CopyTo is given the concrete array type in its first parameter. Clearly, you can only pass a single dimension array to ICollection<T>.CopyTo. The fact is that the nongeneric ICollection.CopyTo only accepts an array of single dimension as well, but because the compiler cannot determine the rank of a System.Array type at compile time, you get a runtime exception of the type ArgumentException if you pass an array with more than one dimension to a proper implementation of ICollection.CopyTo. Notice that I said "a proper implementation." Not only is the caller of ICollection.CopyTo supposed to know this rule, but so is the type implementing ICollection. The added type information in ICollection<T>.CopyTo not only protects both the caller and the implementer from making this mistake, it also provides greater efficiency.

You'll notice that all of the generic collection types implement both ICollection<T> and ICollection. Both interfaces provide useful utility to the container type. Any methods in ICollection that overlap with ICollection<T> should be implemented explicitly.

Note

When defining your own collection types, you should derive from Collection<T> in the System.Collections.ObjectModel namespace unless there is a good reason not to do so. For instance, Collection<T> might have some functionality that you don't want, or you must be explicit about how the items are stored in the collection and has protected virtual methods that you can override to control its behavior. When you don't derive from Collection<T>, your job is much more laborious, because you must reimplement most of what Collection<T> already implements. If you are creating your own custom dictionary type, derive from KeyedCollection<TKey, TItem> instead. Incidentally, List<T> is not designed to be used as a base class because it does not have any ways for one to override its behavior.

Collection Synchronization

One capability present in ICollection that is missing from its generic counterpart is the provision for handling multithreaded synchronization generally across all collections. By default, most collection types are not synchronized. You can access the IsSynchronized property to determine whether the collection is synchronized. Most of the time, including with System.Array, the answer will be false. However, sometimes you'll require synchronization while accessing these collections from multiple threads.

There are a couple of ways to control synchronization to collections that return false from ICollection.IsSynchronized. The most basic way is to use the ICollection.SyncRoot property, which returns an object that you can subsequently use with the System.Monitor—usually via the C# lock statement—to guard access to the collection. Handling it this way gives you much greater flexibility when accessing the collection, because you control the granularity of exactly when the lock is acquired and released. However, the burden is on you to make sure that locking is handled appropriately, because the collection doesn't attempt to acquire the lock internally.

Note

Choosing how to implement synchronization is a classic engineering trade-off decision to make when designing new collections that implement ICollection. You can implement synchronization internally to the collection, but clients that don't need it pay a performance penalty. You also can externalize the synchronization by implementing ICollection.SyncRoot, but then you rely on the clients to manage the synchronization correctly. You should consider your application domain thoroughly when choosing between the two.

In some cases, collection types simply return this for ICollection.SyncRoot. It is best practice that you never synchronize access to a collection by passing its reference directly to the System.Monitor. Instead, always use the object obtained through the SyncRoot property, even though it may actually return this.

As an alternative to managing the SyncLock manually, most of the nongeneric collection types in the standard library implement a Synchronized method, which returns an object that wraps the collection and manages the synchronization lock for you. You may want to consider applying this same pattern when creating collection types of your own. By using the wrapper returned by the Synchronized method, client code that uses the collection doesn't have to change in order to work in a multithreaded environment. When implementing your own collections, always allow clients to choose whether synchronization is used and never force it upon them.

Tip

If you are going to need synchronization within collections used in multithreaded systems, I highly suggest that you use the collection types in System.Collections.Concurrent. These types were added to .NET 4.0 by the Parallel Computing Platform team at Microsoft and their locking techniques are finely tuned for efficiency in concurrent multithreaded environments.

Lists

One thing that is missing from ICollection<T>, and for good reason, is an index operator that allows you to access the items within the collection using the familiar array-access syntax. The fact is that not all concrete types that implement ICollection<T> need to have an index operator, and in some of those cases, it makes no sense for them to have an index operator. For example, an index operator for a list of integers would probably accept a parameter of type int, whereas a dictionary type would accept a parameter type that is the same as the key type in the dictionary.

If you're defining a collection where it makes sense to index the items, then you want that collection to implement IList<T>. Concrete generic list collection types typically implement the IList<T> and IList interfaces. IList<T> implements ICollection<T>, and IList implements ICollection, so any type that is a list is also a collection. The IList<T> interface looks like the following:

public interface IList<T> : ICollection<T>, IEnumerable<T>, IEnumerable
{
   T this[ int index ] { get; set; }
   int IndexOf( T item );
   void Insert( int index, T item );
   void RemoveAt( int index );
}

The IList interface is a bit larger:

public interface IList : ICollection, IEnumerable
{
   bool IsFixedSize { get; }
   bool IsReadOnly { get; }
   object this[ int index ] { get; }
   int Add( object value );
   void Clear();
   bool Contains( object value );
   int IndexOf( object value );
   void Insert( int index, object value );
   void Remove( object value );
   void RemoveAt( int index );
}

As you can see, there is some overlap between IList<T> and IList, but there are plenty of useful properties and methods in IList that a generic container such as List<T>, or any other generic list that you create, would want. As with ICollection<T> and ICollection, the typical pattern is to implement both interfaces. You should explicitly implement the methods of IList that overlap in functionality with those of IList<T>, so that the only way to get to them is to convert the instance reference to the IList type first.

Note

Generally, when implementing your own list types, you should derive your implementation from Collection<T> in the System.Collections.ObjectModel namespace.

Dictionaries

The .NET 2.0 Framework introduced the IDictionary<TKey, TValue> type as a generic and thus strongly typed counterpart to IDictionary. As usual, concrete types that implement IDictionary<TKey, TValue> should implement IDictionary as well. There is a lot of overlap, and the generic interface declares more type-safe versions of some properties and methods declared in IDictionary. However, there is also a new method available on IDictionary<TKey, TValue> called TryGetValue, which you can use to attempt to get a value based on the given key. The method returns the value through an out parameter, and the actual return value from the method indicates whether the item was in the dictionary. Although you can do this same thing using the index operator and catching the KeyNotFoundException when the item is not in there, it is always more efficient to avoid exceptions if you know the item is probably not there. Using exceptions for the purpose of control flow is a practice to avoid for two reasons. First, using exceptions for control flow is inefficient, because exceptions are expensive. Second, it trivializes the fact that an exception is a truly exceptional event. When using exceptions for control flow, you're using exceptions to handle an expected event. You'll find more cases of this Try... method call pattern throughout the .NET Framework, because the .NET team made a concerted effort to avoid efficiency bottlenecks such as these.

Note

When implementing generic dictionaries, you have a couple of choices from which to derive implementations. First, you can use SortedDictionary<TKey, TValue>, which provides O(log n) retrieval and implements IDictionary<TKey, TValue> as well as the collection interfaces. However, you can also choose to use KeyedCollection<TKey, TValue> in the System.Collections.ObjectModel namespace. Although it doesn't actually implement the dictionary interfaces, it does provide O(1) retrieval most of the time. For more details, see the MSDN documentation.

Sets

The .NET 3.5 Framework introduced yet another useful collection class, known as HashSet, which is defined in the System.Collections.Generic namespace. HashSet implements the typical set operations that you would expect. For example, you can call the IntersectWith method to modify the current set so that it will contain an intersection of the current items and the items contained in the IEnumerable<T> type given. Conversely, UnionWith modifies the current set to contain the union of two sets. Other useful methods include IsSubsetOf, IsSupersetOf, ExceptWith, SymmetricExceptWith, Contains, etc. These are just a few of the useful methods available for sets.

Note

Notice that the various set operation methods implemented by HashSet accept parameters of type IEnumerable<T>. This is very handy because it allows you to use any collection type as the parameter to these methods rather than only HashSet instances.

As is typical with set operations, you can only add unique values to instances of HashSet. For example, if you have already added the values 1, 2, and 3 to a HashSet<int> instance, then you cannot add another integer corresponding to one of those values. This is the reason the Add method returns a Boolean indicating whether the operation succeeded or not. It would be inefficient to throw an exception in such cases, so the result is indicated via the return value from Add.

System.Collections.ObjectModel

For those of you who need to define your own collection types, you'll find the types defined in the System.Collection.ObjectModel namespace most useful. In fact, you should derive your implementations from the objects in this namespace, if at all possible. This namespace contains only five types, and the fact that this namespace exists has been the source of some controversy. There were two main reasons these types were broken out into their own namespace. First, the Visual Basic environment already contains a Collection type that is implemented by a namespace it imports by default, and the Visual Basic team was concerned that VB users could become confused by seeing two types with similar names and drastically different behaviors popping up in IntelliSense. Second, the Base Class Libraries (BCL) team thought that users would rarely need the types in this namespace. Whether that is true will be shown over time. My opinion is that these types are extremely useful for writing libraries or for code consumed by others. One of Microsoft's guidelines even suggests that you should consider creating a subclass of these types when exposing collections, even if only to provide a richer type name describing the collection and an easily accessible extensibility point.

These types are extremely useful if you're defining collection types of your own. You can derive your type from Collection<T> easily in order to get default collection behavior, including implementation of ICollection<T>, IList<T>, and IEnumerable<T>. Collection<T> also implements the nongeneric interfaces ICollection, IList, and IEnumerable. However, you may have to cast the type to one of these interfaces explicitly to access the properties and methods of them, because many of them are implemented explicitly. Moreover, the Collection<T> type uses the NVI pattern[33] to provide the derived type with a set of protected virtual methods that you can override. I won't list the entire public interface to Collection<T> here, because you can find the details in the MSDN documentation. However, the protected virtual methods that you may override are shown in the following code:

public class Collection<T> : ICollection<T>, IList<T>, IEnumerable<T>,
                             ICollection, IList, IEnumerable
{
   ...
   protected virtual void ClearItems();
   protected virtual void InsertItem( int index, T item );
   protected virtual void RemoveItem( int index );
protected virtual void SetItem( int index, T item );
   ...
}

You cannot modify the storage location of the collection by overriding these methods. Collection<T> manages the storage of the items, and the items are held internally through a private field of type IList<T>. However, you can override these methods to manage extra information triggered by these operations. Just be sure to call through to the base class versions in your overrides.

Finally, the Collection<T> type offers two constructors: one creates an empty instance, and the other accepts an IList<T>. The constructor copies the passed-in contents of the IList<T> instance into the new collection in the order that they are provided by the enumerator returned from IList<T>.GetEnumerator. This ordering is important to note, as you'll see a way to control it in the following section on enumerators and iterator blocks. The implementation of the source list's enumerator can do such things as reverse the order of the items as they're put into the collection, simply by providing a proper enumerator implementation. Personally, I believe there should be more constructors on Collection<T> that accept an interface of type IEnumerator<T> and IEnumerable<T> in order to provide more flexible ways to fill a collection. You can solve this problem by introducing the extra constructors into a type that derives from Collection<T>, as I've shown here:

using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;

public class MyCollection<T> : Collection<T>
{
   public MyCollection() : base() {
    }

    public MyCollection( IList<T> list )
        : base(list) { }

    public MyCollection( IEnumerable<T> enumerable )
        : base() {
        foreach( T item in enumerable ) {
            this.Add( item );
        }
    }

    public MyCollection( IEnumerator<T> enumerator )
        : base() {
        while( enumerator.MoveNext() ) {
            this.Add( enumerator.Current );
        }
    }
}

public class EntryPoint
{
    static void Main() {
        MyCollection<int> coll =
            new MyCollection<int>( GenerateNumbers() );

        foreach( int n in coll ) {
Console.WriteLine( n );
        }
    }

    static IEnumerable<int> GenerateNumbers() {
        for( int i = 4; i >= 0; —i ) {
            yield return i;
        }
    }
}

In Main, you can see the instance of MyCollection<int> created by passing in an IEnumerable<int> type returned from the GenerateNumbers method. If the yield keyword in the GenerateNumbers method looks foreign to you, it may be because it's a feature added in C# 2.0. I'll explain this keyword a little later on in this chapter. Essentially, it defines what's called an iterator block, which creates a compiler-generated enumerator from the code. After creating a MyCollection<T> constructed type, you can still hold on to it and use it solely through a Collection<T> reference. After all, MyCollection<T> is-a Collection<T>. Incidentally, I didn't bother creating constructors that accept the nongeneric IEnumerable and IEnumerator, simply because I want to favor stronger type safety.

You may have noticed the existence of List<T> in the System.Collections.Generic namespace. It would be tempting to use List<T> in your applications whenever you need to provide a generic list type to consumers. However, instead of using List<T>, consider Collection<T>. List<T> doesn't implement the protected virtual methods that Collection<T> implements. Therefore, if you derive your list type from List<T>, your derived type has no way to respond when modifications are made to the list. On the other hand, List<T> serves as a great tool to use when you need to embed a raw list-like storage implementation within a type, because it is devoid of virtual method calls such as Collection<T> and is more efficient as a result.

Another useful type within the System.Collections.ObjectModel namespace is the type ReadOnlyCollection<T>, which is a wrapper you can use to implement read-only collections. Since the C# language lacks any notion of using the const keyword for const-correctness like in C++, it is essential to create immutable types when necessary and pass those to methods in lieu of const parameters. The constructor for ReadOnlyCollection<T> accepts an IList<T> parameter type. Thus, you can use a ReadOnlyCollection<T> to wrap any type that implements IList<T>, including Collection<T>. Naturally, if users access the ICollection<T>.IsReadOnly property, the answer will be true. Any time users call a modifying method such as ICollection<T>.Clear, an exception of type NotSupportedException will be thrown. Moreover, in order to call modifying methods, the ReadOnlyCollection<T> reference must be cast to the interface containing the method, because ReadOnlyCollection<T> implements all modifying methods explicitly. The biggest benefit of implementing these methods explicitly is to help you avoid their use at compile time.

Efficiency

When given a choice, you should always prefer the generic collection types over the nongeneric versions because of added type safety and higher efficiency. Let's consider the efficiency standpoint a little more closely. When containing value types, the generic types avoid any unnecessary boxing and unboxing. Boxing is definitely a much more expensive operation than unboxing, because boxing requires a heap allocation but an unboxing operation doesn't. Rico Mariani pinpoints many other efficiency bottlenecks in his blog, Rico Mariani's Performance Tidbits.[34] He indicates that the development teams spent a lot of time focusing specifically on performance issues and simplifying things to make them better. One excellent example that he provides illustrates how List<T> is remarkably faster than ArrayList when used in many foreach iterations. However, the speed is not because of the obvious boxing/unboxing reasons, but rather because ArrayList uses a gratuitous amount of virtual methods, especially during enumeration. ArrayList.GetEnumerator is virtual, and the nested enumerator type ArrayListEnumeratorSimple also implements the MoveNext method and the Current property virtually. That adds up to many costly virtual methods to call during enumeration. Unless you're enumerating an ArrayList like a crazed demon, you won't notice this performance penalty, but it just goes to show how much attention the BCL development team has been putting on efficiency lately.

This is a great example of why you want to analyze your class designs clearly to ensure that you're making your classes inheritable for a good reason. Don't make a method virtual unless you're positive someone will need to override it, and if you do, make sure you use the NVI pattern covered in Chapter 13. It is my firm belief that you should tend toward creating sealed classes, unless you're absolutely sure that there is a good reason why people would want to inherit from your class. If you can't think of a reason why they would want to, don't leave it unsealed just because you think someone may come up with a good reason in the future. If you don't come up with a good reason, then it's unlikely that you created your class with inheritance in mind, and it may not work as expected for whatever derives from your class. Inheritability should be a conscious decision and not a subconscious one.

Note

Even if your class derives from a class that uses virtual methods, it will be more efficient if you declare it sealed, because the compiler can then call those virtual methods nonvirtually when calling through a reference to the derived type.

There is one caveat to everything mentioned so far: Gratuitous use of generics, or any feature for that matter, without knowing the ramifications is never good. Whenever a fully constructed type is created, the runtime must generate that code within memory. Also, fully constructed types created from generic types with static fields will each get their own copy of the static fields. Moreover, they'll all get their own version of the static constructor. So, if the generic contains a field like this:

public class MyGeneric<T>
{
   public static int staticField;
}

then MyGeneric<int>.staticField and MyGeneric<long>.staticField will both reference different storage locations.

The moral of the story is that you must consider the engineering trade-off. Although generics help avoid boxing and generally create more efficient code, they can also increase the size of your application's working set. If in doubt, measure the results using performance-analysis tools to determine the proper route to take.

IEnumerable<T>, IEnumerator<T>, IEnumerable, and IEnumerator

You've seen how you can use the C# foreach statement to conveniently iterate over a collection of objects, including a System.Array, ArrayList, List<T>, and so on. How does this work? The answer is that each collection that expects to work with foreach must implement the IEnumerable<T> or IEnumerable interface that foreach uses to obtain an object that knows how to enumerate, or iterate over, the items in the collection. The iterator object obtained from IEnumerable<T> must implement the IEnumerator<T> or IEnumerator interface. Generic collection types typically implement IEnumerable<T>, and the enumerator object implements IEnumerator<T>. IEnumerable<T> derives from IEnumerable, and IEnumerator<T> derives from IEnumerator. This allows you to use generic collections in places where nongeneric collections are used. Strictly speaking, your collection types are not required to implement enumerators, and users can iterate through the collection using a for loop if you provide an index operator by implementing IList<T>, for example. However, you won't make many friends that way, and once I show you how easy it is to create enumerators using iterator blocks, you'll see that it's a piece of cake to implement IEnumerable<T> and IEnumerator<T>.

Many of you may already be familiar with the nongeneric enumerator interfaces and how to implement enumerators on your collection types. In the rest of this section, I'll quickly go over the salient points of creating enumerators from scratch, and I'll quickly transition to how to create enumerators the new and improved way using iterator blocks. If you'd like, you may skip to the next section on iterators. Or if you want a refresher on implementing enumerators, go ahead and read the rest of this section.

The IEnumerable<T> interface exists so that clients have a well-defined way to obtain an enumerator on the collection. The following code defines the IEnumerable<T> and IEnumerable interfaces:

public interface IEnumerable<T> : IEnumerable
{
   IEnumerator<T> GetEnumerator();
}

public interface IEnumerable
{
   IEnumerator GetEnumerator();
}

Since both interfaces implement GetEnumerator with the same overload signature (remember, the return value doesn't take part in overload resolution), any collection that implements IEnumerable<T> needs to implement one of the GetEnumerator methods explicitly. It makes the most sense to implement the non-generic IEnumerable.GetEnumerator method explicitly to make the compiler happy.

The IEnumerator<T> and IEnumerator interfaces are shown here:

public interface IEnumerator<T> : IEnumerator, IDisposable
{
   T Current { get; }
}

public interface IEnumerator
{
   object Current { get; }
   bool MoveNext();
   void Reset();
}

Again, the two interfaces implement a member that has the same signature, which, in this case, is the Current property. When implementing IEnumerator<T>, you should implement IEnumerator.Current explicitly. Also, notice that IEnumerator<T> implements the IDisposable interface. Later, I'll explain why this is a good thing.

Now I'm going to show you how to implement IEnumerable<T> and IEnumerator<T> for a home-grown collection type. Good teachers always show you how to do something the "hard way" before introducing you to the "easy way." I think this technique is useful because it forces you to understand what is happening under the covers. When you know what's happening underneath, you're more adept at dealing with the technicalities that may come from using the "easy way." Let's look at an example of implementing IEnumerable<T> and IEnumerator<T> the hard way by introducing a home-grown collection of integers. I'll show how to implement the generic versions, because that implies that you must also implement the nongeneric versions as well. In this example, I haven't implemented ICollection<T> so as not to clutter the example, because I'm focusing on the enumeration interfaces:

using System;
using System.Threading;
using System.Collections;
using System.Collections.Generic;

public class MyColl<T> : IEnumerable<T>
{
    public MyColl( T[] items ) {
        this.items = items;
    }

    public IEnumerator<T> GetEnumerator() {
        return new NestedEnumerator( this );
    }

    IEnumerator IEnumerable.GetEnumerator() {
        return GetEnumerator();
    }

    // The enumerator definition.
    class NestedEnumerator : IEnumerator<T>
    {
        public NestedEnumerator( MyColl<T> coll ) {
            Monitor.Enter( coll.items.SyncRoot );
            this.index = −1;
            this.coll = coll;
        }

        public T Current {
            get { return current; }
        }

        object IEnumerator.Current {
            get { return Current; }
        }

        public bool MoveNext() {
if( ++index >= coll.items.Length ) {
                return false;
            } else {
                current = coll.items[index];
                return true;
            }
        }

        public void Reset() {
            current = default(T);
            index = 0;
        }

        public void Dispose() {
            try {
                current = default(T);
                index = coll.items.Length;
            }
            finally {
                Monitor.Exit( coll.items.SyncRoot );
            }
        }

        private MyColl<T> coll;
        private T current;
        private int index;
    }

    private T[] items;
}

public class EntryPoint
{
    static void Main() {
        MyColl<int> integers =
            new MyColl<int>( new int[] {1, 2, 3, 4} );

        foreach( int n in integers ) {
            Console.WriteLine( n );
        }
    }
}

Note

In most real-world cases, you would derive your custom collection class from Collection<T> and get the IEnumerable<T> implementation for free.

This example initializes the internal array within MyColl<T> with a canned set of integers, so that the enumerator will have some data to play with. Of course, a real container should implement ICollection<T> to allow you to populate the items in the collection dynamically. The foreach statements expand into code that obtains an enumerator by calling the GetEnumerator method on the IEnumerable<T> interface. The compiler is smart enough to use IEnumerator<T>.GetEnumerator rather than IEnumerator.GetEnumerator in this case. Once it gets the enumerator, it starts a loop, where it first calls MoveNext and then initializes the variable n with the value returned from Current. If the loop contains no other exit paths, the loop will continue until MoveNext returns false. At that point, the enumerator finishes enumerating the collection, and you must call Reset on the enumerator in order to use it again.

Even though you could create and use an enumerator explicitly, I recommend that you use the foreach construct instead. You have less code to write, which means fewer opportunities to introduce inadvertent bugs. Of course, you might have good reasons to manipulate the enumerators directly. For example, your enumerator could implement special methods specific to your concrete enumerator type that you need to call while enumerating collections. If you must manipulate an enumerator directly, be sure to always do it inside a using block, because IEnumerator<T> implements IDisposable.

Notice that there is no synchronization built into enumerators by default. Therefore, one thread could enumerate over a collection, while another thread modifies it. If the collection is modified while an enumerator is referencing it, the enumerator is semantically invalid, and subsequent use could produce undefined behavior. If you must preserve integrity within such situations, then you may want your enumerator to lock the collection via the object provided by the SyncRoot property. The obvious place to obtain the lock would be in the constructor for the enumerator. However, you must also release the lock at some point. You already know that in order to provide such deterministic cleanup, you must implement the IDisposable interface. That's exactly one reason why IEnumerator<T> implements the IDisposable interface. Moreover, the code generated by a foreach statement creates a try/finally block under the covers that calls Dispose on the enumerator within the finally block. You can see the technique in action in my previous example.

Types That Produce Collections

I've already touched upon the fact that a collection's contents can change while an enumerator is enumerating the collection. If the collection changes, it could invalidate the enumerator. In the following sections on iterators, I show how you can create an enumerator that locks access to the container while it is enumerating. Although that's possible, it may not be the best thing to do from an efficiency standpoint. For example, what if it takes a long time to iterate over all of the items in the collection? The foreach loop could do some lengthy processing on each item, during which time anyone else could be blocked from modifying the collection.

In cases like these, it may make sense for the foreach loop to iterate over a copy of the collection rather than the original collection itself. If you decide to do this, you need to make sure you understand what a copy of the collection means. If the collection contains value types, then the copy is a deep copy, as long as the value types within don't hold on to reference types internally. If the collection contains reference types, you need to decide if the copy of the collection must clone each of the contained items. Either way, it would be nice to have a design guideline to follow in order to know when to return a copy.

The current rule of thumb when returning collection types from within your types is to always return a copy of the collection from methods, and return a reference to the actual collection if accessed through a property on your type. Although this rule is not set in stone, and you're in no way obligated to follow it, it does make some semantic sense. Methods tend to indicate that you're performing some sort of operation on the type and you may expect results from that operation. On the other hand, property access tends to indicate that you need direct access to the state of the object itself. Therefore, this rule of thumb makes good semantic sense. In general, it makes sense to apply this same semantic separation to all properties and methods within your types.

Iterators

In the previous section, I showed you a cursory and lightweight example of creating an enumerator for a collection type. After you do this a few times, the task becomes mundane. And any time a task becomes mundane, we as humans are more likely to introduce silly mistakes. C# introduces a new construct called an iterator block to make this task much easier. Before I go into the gory details of iterators, let's quickly look at how to accomplish the same task as the example in the previous section. This is the "easy way" that I was talking about:

using System;
using System.Collections;
using System.Collections.Generic;

public class MyColl<T> : IEnumerable<T>
{
    public MyColl( T[] items ) {
        this.items = items;
    }

    public IEnumerator<T> GetEnumerator() {
        foreach( T item in items ) {
            yield return item;
        }
    }

    IEnumerator IEnumerable.GetEnumerator() {
        return GetEnumerator();
    }

    private T[] items;
}

public class EntryPoint
{
    static void Main() {
        MyColl<int> integers =
            new MyColl<int>( new int[] {1, 2, 3, 4} );

        foreach( int n in integers ) {
            Console.WriteLine( n );
        }
    }
}

It doesn't get much easier than that. Notice that the enumerator implementation from the example in the previous section has boiled down to three lines within the GetEnumerator method. The key to the whole thing is the yield keyword. The presence of the yield keyword defines this block of code as a yield block. When you see it for the first time, it can be a little confusing to figure out exactly what's going on. When GetEnumerator is called, the code in the method that contains the yield statement is not actually executed at that point in time. Instead, the compiler generates an enumerator class, and that class contains the yield block code. It is an instance of that class that is returned. Thus, when the foreach statement in Main calls through to the IEnumerator<T> methods, the code in the yield block is utilized.

One thing missing that was in the example from the previous section is synchronization. Let's explore how to add synchronization to the enumerator returned by the yield block. The following is a replacement for the previous GetEnumerator method:

public IEnumerator<T> GetEnumerator() {
    lock( items.SyncRoot ) {
        for( int i = 0; i < items.Length; ++i ) {
            yield return items[i];
        }
    }
}

How amazingly simple is that? For the sake of variety, I've changed the way I iterate over the collection using a for loop rather than foreach. Now, let me explain what magic the compiler is doing here. As before, the yield block code isn't executed immediately. Rather, an enumerator object is returned. Internally, the enumerator can be in one of several states. The first time MoveNext is called on the enumerator, the block of code is executed up until the first yield statement is reached. Each subsequent call to MoveNext continues execution of the loop until either a yield break statement is reached or the loop falls through to the end of the method. Once that happens, the enumerator goes into its final state, and you cannot use it to enumerate the collection anymore. In fact, the Reset method isn't available for use on enumerators generated from yield blocks, and if you call it, a NotSupportedException is thrown. At the end of enumeration, any finally blocks within the yield block are executed as expected. In this case, that means releasing the lock, because the C# lock statement boils down to a try/finally construct under the covers. Also, if the enumerator is disposed of before it reaches the end of the loop, the compiler is smart enough to put the code within the finally block into the implementation of Dispose on the enumerator so that the lock always gets released.

As you can see, the compiler is doing a lot of work for you under the covers when you use iterators. As a final example, I've shown yet another way to iterate through the items in this collection:

public IEnumerator<T> GetEnumerator( bool synchronized ) {
    if( synchronized ) {
        Monitor.Enter( items.SyncRoot );
    }
    try {
        int index = 0;
        while( true ) {
            if( index < items.Length ) {
                yield return items[index++];
            } else {
                yield break;
            }
        }
    }
    finally {
        if( synchronized ) {
            Monitor.Exit( items.SyncRoot );
        }
    }
}

public IEnumerator<T> GetEnumerator() {
    return GetEnumerator( false );
}

It is not a pretty way to iterate over the items, but I wanted to show you an example of using the yield break statement. Also, notice that I created a new GetEnumerator method that accepts a bool denoting whether the caller wants a synchronized or nonsynchronized enumerator. The important thing to note here is that the enumerator object created by the compiler now has a public field named synchronized. Any parameters passed to the method containing the yield block are added as public fields to the generated enumerator class.

Note

The enumerator generated from the yield block captures local variables and parameters; therefore, it is invalid to attempt to declare ref or out parameters on methods that implement a yield block.

You could argue that the added fields should be private rather than public, because you can really mess up the enumerator if you access the fields and modify those public fields during enumeration. In this case, if you modify the synchronized field during enumeration and change it to false, other entities will have a hard time gaining access to the collection because the lock will never be released. Even though you have to use reflection to access the public fields of an enumerator generated from a yield block, it's easy and dangerous to do so if used improperly. That's not to say that this technique cannot be useful, as I show in an example in the section "Forward, Reverse, and Bidirectional Iterators," when I demonstrate how to create a bidirectional iterator.

You can mitigate this whole can of worms by introducing the proverbial extra level of indirection. Instead of returning the enumerator resulting from the yield block, put it inside of a wrapper enumerator and return the wrapper instead. This technique is shown in the following example:

using System;
using System.Threading;
using System.Reflection;
using System.Collections;
using System.Collections.Generic;

public class EnumWrapper<T> : IEnumerator<T>
{
    public EnumWrapper( IEnumerator<T> inner ) {
        this.inner = inner;
    }

    public void Dispose() { inner.Dispose(); }

    public bool MoveNext() { return inner.MoveNext(); }

    public void Reset() { inner.Reset(); }

    public T Current {
        get { return inner.Current; }
    }

    object IEnumerator.Current {
        get { return inner.Current; }
    }
private IEnumerator<T> inner;
}

public class MyColl<T> : IEnumerable<T>
{
    public MyColl( T[] items ) {
        this.items = items;
    }

    public IEnumerator<T> GetEnumerator( bool synchronized ) {
        return( new EnumWrapper<T>(GetPrivateEnumerator(synchronized)) );
    }

    private IEnumerator<T> GetPrivateEnumerator( bool synchronized ) {
        if( synchronized ) {
            Monitor.Enter( items.SyncRoot );
        }
        try {
            foreach( T item in items ) {
                yield return item;
            }
        }
        finally {
            if( synchronized ) {
                Monitor.Exit( items.SyncRoot );
            }
        }
    }

    public IEnumerator<T> GetEnumerator() {
        return GetEnumerator( false );
    }

    IEnumerator IEnumerable.GetEnumerator() {
        return GetEnumerator();
    }

    private T[] items;
}

public class EntryPoint
{
    static void Main() {
        MyColl<int> integers =
            new MyColl<int>( new int[] {1, 2, 3, 4} );

        IEnumerator<int> enumerator =
            integers.GetEnumerator( true );

        // Try to get a reference to synchronized field.
        //
// Throws an exception!!!
        object field = enumerator.GetType().
            GetField("synchronized").GetValue( enumerator );
    }
}

In Main, you can see that I'm playing the part of the nefarious developer who wants to change the enumerator's state during enumeration. You can see that I attempt to change the value of the property on enumerator using reflection. This throws an exception, because the property does not exist on the wrapper.

Note

Those of you familiar with the intricacies of reflection will recognize that it is technically possible for the code to modify the private field within the EnumWrapper<T> instance. That, however, requires that the code pass the ReflectionPermission code access security (CAS) demand. This demand fails unless the person running this code has granted it explicitly, and that's unlikely. CAS is beyond the scope of this book, but for all the nitty-gritty details on CAS, including how to extend it to meet your needs, I recommend reading .NET Framework Security by Brian A. LaMacchia, et al. (Upper Saddle River, NJ: Pearson Education, 2002).

So far, you've seen how iterator blocks are handy for creating enumerators. However, you can also use them to generate the enumerable type as well. For example, suppose you want to iterate through the first few powers of 2. You could do the following:

using System;
using System.Collections.Generic;

public class EntryPoint
{
    static public IEnumerable<int> Powers( int from,
                                           int to ) {
        for( int i = from; i <= to; ++i ) {
            yield return (int) Math.Pow( 2, i );
        }
    }

    static void Main() {
        IEnumerable<int> powers = Powers( 0, 16 );
        foreach( int result in powers ) {
            Console.WriteLine( result );
        }
    }
}

In this example, the compiler generates a single type that implements the four interfaces IEnumerable<int>, IEnumerable, IEnumerator<int>, and IEnumerator. Therefore, this type serves as both the enumerable and the enumerator. The bottom line is that any method that contains a yield block must return a type of IEnumerable<T>, IEnumerable, IEnumerator<T>, or IEnumerator, where T is the yield type of the method. The compiler will handle the rest. I recommend that you strive to use the generic versions, because they will avoid unnecessary boxing for value types and give the type-safety engine more muscle. In the previous example, the from and to values are stored as public fields in the enumerable type, as shown earlier in this section. So, you may want to consider wrapping them up inside an immutable type if you want to prevent users from modifying them during enumeration.

Tip

Framework Design Guidelines: Conventions, Idioms, and Patterns for Reusable .NET Libraries by Krzysztof Cwalina and Brad Abrams (Boston, MA: Addison-Wesley Professional, 2005) suggests that a type should never implement both IEnumerable<T> and IEnumerator<T>, because a single type should semantically be either a collection or an enumerator but not both. However, the objects generated by yield blocks violate this rule. For hand-coded collections, you should try to adhere to the rule, even though it's clear that C# does this to make yield blocks more useful.

Forward, Reverse, and Bidirectional Iterators

Many libraries that support iterators on their container types support three main flavors of iterators in the form of forward, reverse, and bidirectional iterators. Forward iterators are analogous to regular enumerators implementing IEnumerator<T>, which the GetEnumerator methods of the container types in the .NET library typically expose. However, what if you need a reverse iterator or a bidirectional iterator? C# iterators make creating such things nice and easy.

To get a reverse iterator for your container, all you need to do is create a yield block that loops through the items in the collection in reverse order. Even more convenient, you can typically declare your yield block external to your collection, as shown in the following example:

using System;
using System.Collections.Generic;

public class EntryPoint
{
    static void Main() {
        List<int> intList = new List<int>();
        intList.Add( 1 );
        intList.Add( 2 );
        intList.Add( 3 );
        intList.Add( 4 );

        foreach( int n in CreateReverseIterator(intList) ) {
            Console.WriteLine( n );
        }
    }

    static IEnumerable<T> CreateReverseIterator<T>( IList<T> list ) {
        int count = list.Count;
        for( int i = count-1; i >= 0; —i ) {
            yield return list[i];
        }
}
}

The meat of the example is in the CreateReverseIterator<T> method. This method only works on collections of type IList<T>, but you could easily write another form of CreateReverseIterator<T> that takes some other collection type. When you create utility methods of this sort, it's always best to be as generic as possible in the types that you accept. For example, would it be possible to make CreateReverseIterator<T> more general-purpose by accepting a type of ICollection<T>? No, because ICollection<T> doesn't declare an index operator. IList<T> does declare an index operator, though.

Now let's turn our attention to a bidirectional iterator. In order to make a bidirectional iterator out of an enumerator, you need to be able to toggle its direction. As I showed previously, enumerators created from methods that accept parameters and contain a yield block have public fields that you can modify. Although you must use reflection to access these fields, you can still do it nevertheless. First, let's look at a possible usage scenario for a bidirectional iterator:

static void Main() {
    LinkedList<int> intList = new LinkedList<int>();
    for( int i = 1; i < 6; ++i ) {
        intList.AddLast( i );
    }

    BidirectionalIterator<int> iter =
        new BidirectionalIterator<int>(intList,
                                       intList.First,
                                       TIteratorDirection.Forward);

    foreach( int n in iter ) {
        Console.WriteLine( n );

        if( n == 5 ) {
            iter.Direction = TIteratorDirection.Backward;
        }
    }
}

You need a way to create an iterator object that supports IEnumerable<T> and then use it within a foreach statement to start the enumeration. At any time within the foreach block, you want the ability to reverse the direction of iteration. The following example shows a BidirectionalIterator class that facilitates the previous usage model:

public enum TIteratorDirection {
    Forward,
    Backward
};

public class BidirectionalIterator<T> : IEnumerable<T>
{
    public BidirectionalIterator( LinkedList<T> list,
                                  LinkedListNode<T> start,
                                  TIteratorDirection dir ) {
        enumerator = CreateEnumerator( list,
                                       start,
                                       dir ).GetEnumerator();
enumType = enumerator.GetType();
    }

    public TIteratorDirection Direction {
        get {
            return (TIteratorDirection) enumType.GetField("dir")
                            .GetValue( enumerator );
        }
        set {
            enumType.GetField("dir").SetValue( enumerator,
                                               value );
        }
    }

    private IEnumerator<T> enumerator;
    private Type enumType;

    private IEnumerable<T> CreateEnumerator( LinkedList<T> list,
                                             LinkedListNode<T> start,
                                             TIteratorDirection dir ) {
        LinkedListNode<T> current = null;
        do {
            if( current == null ) {
                current = start;
            } else {
                if( dir == TIteratorDirection.Forward ) {
                    current = current.Next;
                } else {
                    current = current.Previous;
                }
            }

            if( current != null ) {
                yield return current.Value;
            }
        } while( current != null );
    }

    public IEnumerator<T> GetEnumerator() {
        return enumerator;
    }

    IEnumerator IEnumerable.GetEnumerator() {
        return GetEnumerator();
    }
}

Technically speaking, I didn't have to wrap the enumerator inside the BidirectionalIterator class. I could have accessed the direction variable via reflection from within the foreach block directly. However, in order to do that, the code within the foreach block would have needed the name of the parameter passed into the BidirectionalIterator.CreateEnumerator method with the yield block. In order to avoid such disjoint coupling, I tidied it up within the BidirectionalIterator wrapper class and provided a Direction property to access the public field on the enumerator.

Finally, the following example shows how you can use the same technique to implement a circular iterator. You could use this for things such as game loops, where you must iterate indefinitely through a collection of entities, updating their state with each pass until requested to quit:

using System;
using System.Collections;
using System.Collections.Generic;

public class EntryPoint
{
    static void Main() {
        LinkedList<int> intList = new LinkedList<int>();
        for( int i = 1; i < 6; ++i ) {
            intList.AddLast( i );
        }

        CircularIterator<int> iter =
            new CircularIterator<int>(intList,
                                      intList.First);

        int counter = 0;
        foreach( int n in iter ) {
            Console.WriteLine( n );

            if( counter++ == 100 ) {
                iter.Stop();
            }
        }
    }
}

public class CircularIterator<T> : IEnumerable<T>
{
    public CircularIterator( LinkedList<T> list,
                             LinkedListNode<T> start ) {
        enumerator = CreateEnumerator( list,
                                       start,
                                       false ).GetEnumerator();
        enumType = enumerator.GetType();
    }

    public void Stop() {
        enumType.GetField("stop").SetValue( enumerator, true );
    }

    private IEnumerator<T> enumerator;
    private Type enumType;

    private IEnumerable<T> CreateEnumerator( LinkedList<T> list,
                                             LinkedListNode<T> start,
                                             bool stop ) {
        LinkedListNode<T> current = null;
        do {
if( current == null ) {
                current = start;
            } else {
                current = current.Next;
                if( current == null ) {
                    current = start;
                }
            }

            yield return current.Value;
        } while( !stop );
    }

    public IEnumerator<T> GetEnumerator() {
        return enumerator;
    }

    IEnumerator IEnumerable.GetEnumerator() {
        return GetEnumerator();
    }
}

I've included a Stop method on CircularIterator<T> so that you can easily tell it to stop iterating. Of course, as with the bidirectional iterator example, the Stop method uses reflection to set the public field stop on the compiler-generated enumerator. I'm sure that you'll agree that there are many more creative uses for yield blocks for creating complex iteration paths.

Collection Initializers

C# 3.0 introduced a new abbreviated syntax for initializing collections, similar to the object initializer syntax shown in the section titled "Object Initializers." If the collection type instance you are initializing implements IEnumerable or IEnumerable<T> and contains a public Add method that accepts one parameter of the contained type, you can utilize this new syntax. Alternatively, your type could just implement ICollection<T> from the System.Collections.Generic namespace because it also implements IEnumerable<T>. The collection initializer syntax is shown in the following:

using System;
using System.Collections.Generic;

public class Employee
{
    public string Name { get; set; }
}

public class CollInitializerExample
{
    static void Main() {
        var developmentTeam = new List<Employee> {
            new Employee { Name = "Michael Bolton" },
            new Employee { Name = "Samir Nagheenanajar" },
            new Employee { Name = "Peter Gibbons" }
};

        Console.WriteLine( "Development Team:" );
        foreach( var employee in developmentTeam ) {
            Console.WriteLine( "	" + employee.Name );
        }
    }
}

Under the covers the compiler generates a fair amount of code to help you out here. For each item in the collection initialization list, the compiler generates a call to the collection's Add method. Notice that I have also used the new object initializer syntax to initialize each of the instances in the initializer list.

As I've mentioned, the collection type must implement ICollection<T> or implement IEnumerable<T> and a public Add method. If it does not, you will receive compile-time errors. Additionally, the collection must implement only one specialization of ICollection<T>; that is, it can only implement ICollection<T> for one type T. And finally, each item in the collection initialization list must be implicitly convertible to the type T.

Summary

In this chapter, I gave a brief overview of how arrays work in the CLR and in C#, in preparation for the discussion of generic collection types. After reviewing the generic collection types defined in System.Collections.Generic, I covered efficiency and usage concerns and introduced you to the useful types defined in System.Collections.ObjectModel. I then turned the spotlight on enumerators and showed you how to create effective enumerators efficiently by employing iterator yield blocks added in the C# 2.0 language. Finally, I showed you the syntax added in C# 3.0 that streamlines initializing a collection at instantiation time.

Although this chapter didn't delve into the minute details of each of the collection types, after reading the chapter, you should be effectively armed with the information you need to make informed choices about which generic collection types to use and when. I encourage you to reference the MSDN documentation often for all of the finer details regarding the APIs for the collection types.

In the next chapter, I cover delegates, anonymous methods, and events. Anonymous methods are useful for creating callable code inline at the point where you register the callback with the caller.



[27] I cover how you can define your own custom implicit and explicit conversion operators in Chapter 6.

[28] Implicitly typed arrays are also very useful when used with LINQ. In fact, that's true of most of the features introduced in C# 3.0. Taken individually, they provide minimal added convenience, but taken as a whole and used with LINQ, they provide the ability to create incredibly expressive expressions. I cover LINQ in Chapter 16.

[29] IComparable<T>, the generic form of IComparable, is also available.

[30] Chapter 12 covers synchronization and concurrency in detail, along with the subject of threading in the .NET Framework.

[31] You can find out more about System.Monitor and other synchronization techniques in Chapter 12.

[32] Arrays declared with the C# array syntax always have a zero lower bound. If you need an array with a nonzero lower bound, you must create the instance via the System.Array.CreateInstance() method.

[33] I describe the NVI pattern in Chapter 13.

[34] You can find Rico's blog at http://blogs.msdn.com/ricom/.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.221.197.95