Collection

Another common task under programming is to store and manage ordered sets of elements. Some common examples are the list of open files in an application, the set of rows returned from a database query, the group of users allowed to access an application, and so on.

Typically, dealing with elements of a specific type is encapsulated in a new type. Such a type is called a collection type. A collection type essentially lets you treat a set of elements as a single unit.

The BCL defines many useful collection types under the System.Collections and System.Collections.Specialized namespaces. The following are some common examples:

  • ArrayList: An array of elements. The size of the array can grow dynamically.

  • BitArray: A variable-size compact array of bit values.

  • HashTable: A variable-size table of key–value pairs.

  • Queue: A variable-size first in, first out (FIFO) queue of elements.

  • SortedList: A variable-size sorted list of key–value pairs.

  • Stack: A variable-size last in, first out (LIFO) stack of elements.

  • StringCollection: A variable-size collection of String values.

The .NET Framework offers a formal definition of a collection as a type that implements a standard interface ICollection (namespace System.Collections). Here is the definition of the interface:

interface ICollection : IEnumerable {
     Int32 Count { get; }
     void CopyTo(Array array, Int32 index);
     Boolean IsSynchronized{ get; }
     Object SyncRoot { get; }
}

Property Count returns the number of elements in the collection. For some types, calculating the number of elements can be quite time consuming. Such a type may choose to throw a NotSupportedException instead of returning the count.

Method CopyTo can be used to copy a collection into an array.

Property IsSynchronized returns true if the collection has been designed to be thread-safe. For updating such a collection, no explicit thread-safety measures are needed.

Property SyncRoot returns an object that can be used to explicitly synchronize access to the collection in a multithreaded environment. Under C#, the returned value is typically used with a lock statement. Most implementations of SyncRoot simply return “this” object. Thread safety and the semantics of lock are covered in Chapter 8 when we discuss synchronization.

Note that almost all collection classes defined in the BCL are not thread-safe by default. It is up to the user of the collection to explicitly synchronize access to the collection, if necessary. The good news is that most of these collection classes implement a method called Synchronized that returns a synchronized wrapper around the underlying collection. Look in the SDK documentation for more information.

Let's now look at how to use some frequently used (BCL) collection classes.

Lists

Although a collection is formally defined using ICollection, the interface provides rather limited functionality. A collection type is more meaningful if it provides manageability functions such as adding and removing elements. Such collection types can expose their functionality by means of a standard interface, IList. Here is its definition:

public interface IList : ICollection {
     // Access individual element by its index
     Object this[Int32 index} {get; set;}

     // Adding elements
     Int32 Add(Object item);
     void Insert(Int32 index, Object item);

     // Removing elements
     void Remove(Object item); // Remove the specified item
     void RemoveAt(Int32 index); // Remove item at the index
     void Clear(); // Remove all items

     // Search
     Boolean Contains(Object item); // Return true if item found
     Int32 IndexOf(Object item); // Return the index of the item

     // Misc
     Boolean IsFixedSize { get; } // Return true if the set
                           // cannot be resized
     Boolean IsReadOnly { get; } // Return true if the set
                           // cannot be modified
}

As can be seen from the IList interface definition, any collection type that implements IList must also implement ICollection and IEnumerable interfaces. Types that implement IList expose a decent set of functionality for adding, removing, and searching elements in the collection.

An example of a BCL class that implements IList is ArrayList. This class stores generic objects. The following code excerpt shows how to use this class:

// Project ArrayList

class Foo {
     public Foo(int val) {m_Value = val;}
     public int Value { get {return m_Value;}}
     private int m_Value;
}

class MyApp {
     public static void Main() {
       ArrayList myList = new ArrayList();

       // Add some values
       myList.Add(new Foo(5));
       myList.Add(new Foo(10));
       myList.Add(new Foo(15));

       // enumerate values
       for(int i=0;i<myList.Count;i++) {
         Foo f = (Foo) myList[i];
         Console.WriteLine(f.Value);
       }

       // Remove the second value
       myList.RemoveAt(1);

       // enumerate values using foreach
       foreach(Foo f in myList) {
         Console.WriteLine(f.Value);
       }
     }
}

An important aspect of an IList-based collection type is that it provides indexed access to its elements; that is, you can access an element in the collection by its index, as illustrated here:

Foo f = (Foo) myList[i];

Note that indexes in this collection type are zero-based.

Internally, ArrayList uses an array to store the elements. As elements are added and removed, it automatically adjusts the size of the internal array by allocating and freeing memory as required. To optimize performance, the size of the internal array is adjusted in chunks. The class exposes a property, Capacity, that defines the number of elements the internal array can hold without requiring a reallocation. The initial value for the Capacity by default is 16, which implies that the first 16 elements added to the ArrayList do not result in any reallocation. When the 17th element is being added, the Capacity gets doubled; that is, the size of the internal array is reallocated to 32. When the 33rd element is added, the Capacity gets doubled once again, and so on.

In some situations, you may wish to define your own initial Capacity. ArrayList defines an overloaded constructor that can be used to specify the Capacity. The following code excerpt sets Capacity to 3.

// Project ArrayList

ArrayList myNewList = new ArrayList(3);
Console.WriteLine("Initial capacity: {0}", myNewList.Capacity);

It is also possible to change the Capacity property anytime during the execution. Just make sure that the specified value is greater than the number of elements the ArrayList object is holding. Otherwise, the runtime will throw an ArgumentOutOfRangeException.

Initialize Collections with Item Count

When using ArrayList or any other collection type under .NET, try to specify the initial capacity. This obviates unnecessary reallocations.


Note that when using ArrayList, each element is stored as a generic object (System.Object). There are two problems with this. First, when a value-type element is stored, it gets boxed. Second, when retrieved, the element typically requires a cast back to its original type.

So, can we create strongly typed collections; that is, collections where storing or retrieving elements does not require casting?

Custom Collections

The BCL provides a class, CollectionBase, which you can inherit to create a custom strongly typed collection. This class provides a property of type IList called List that you can use to store and retrieve elements. The following code excerpt illustrates its usage:

// Project CustomCollector

class FooList : CollectionBase {
     public int Add(Foo f) {
       return List.Add(f);
     }

     public Foo this[int index] {
       get { return (Foo) List[index]; }
       set { List[index] = value; }
     }

     public new void RemoveAt(int index) {
       List.RemoveAt(index);
     }

     public new Enumerator GetEnumerator() {
       return new Enumerator(this);
     }
     ...
}

The enumerator class can be created as shown in the earlier section. To take a snapshot, this class can store as member field the enumerator from the CollectionBase class. This is illustrated in the following code:

// Project CustomCollector

class FooList : CollectionBase {
     ...
     public class Enumerator : IEnumerator {
       private IEnumerator m_BaseEnumerator;

       public Enumerator(IEnumerable baseEnumerable) {
          m_BaseEnumerator = baseEnumerable.GetEnumerator();
       }
       ...
     }
}

Note that implementing a strongly typed collection using CollectionBase doesn't really save on performance. The casts (and possible boxing for value types) still take place when the values are passed to and from the inner list. The only advantage is that the user code does not have to perform any explicit cast operations.

In the next release of the .NET Framework, Microsoft plans to offer an extension to C# called generics. Among other things, generics is intended to work with value types nicely without the need for boxing.

Why inherit from CollectionBase instead of ArrayList? Well, CollectionBase provides hooks to monitor addition and removal of elements from the collection. For example, one can override CollectionBase's virtual methods OnInsert and OnInsertComplete to provide extra logic before and after inserting an element into the collection.

As a collection of strings is used so often in programming, it is worth noting that the BCL provides a strongly typed collection for strings called StringCollection.

Arrays

Although one can implement a strongly typed collection by hand, the .NET Framework provides a simpler mechanism to do so by means of arrays. An array defines a way to store a set of strongly typed elements. The only limitation on an array is that once it is allocated, its size cannot be changed.

In C#, creating and using arrays is quite similar to that in C++ or Java. The following code excerpt illustrates its usage:

// Project SimpleArray

int[] arr = new int[] {20, 10, 40, 30};
int [] newarr = new int[4];

// Method 1 of accessing elements
for(int i=0;i<arr.Length;i++) {
     newarr[i] = arr[i];
     Console.WriteLine(newarr[i]);
}

When dealing with arrays, there are two important differences that C++ programmers should be aware of. The first difference is that you cannot define a fixed-sized array declaratively. For example, the following lines of code are not valid under C#:

int arr[4]; // illegal
int[4] newarr; // illegal

However, you can define the size of the array using keyword new, as we did in our previous example. You can either define the size or initialize the array that indirectly defines the size:

int [] newarr = new int[4];
int[] arr = new int[] {20, 10, 40, 30}; // initialize with values

The second difference is that in C# (and in Java), creating an array of a reference type, as shown in the following code excerpt, does not create each individual element of the array. To illustrate this, consider the following code excerpt:

// Project SimpleArray

class Foo {
     ...
}

Foo[] foos = new Foo[2];

Creating an array only creates the placeholder for storing the elements, not the elements (unlike C++, which creates and initializes the elements as well). The individual elements can be created in two ways, as illustrated here:

// Project SimpleArray

// Method 1
Foo[] foos = new Foo[2] { new Foo(), new Foo() };

// Method 2. Use a loop.
for(int i=0;i<2;i++) {
     Foo[i] = new Foo();
}

Note that this extra initialization logic is needed only for reference type elements, not for value types.

Behind the scenes, an array gets derived from System.Array class. This class implements interface IList but hides some of the interface's methods such as Add and Remove that may result in changing the size of the array.

System.Array also provides methods for searching and sorting arrays. For example, the following line of code sorts our array in an ascending order:

Array.Sort(arr);

Array Covariance

Under C# specifications, for two reference types A and B, if there exists a conversion from A to B, then it is also possible to convert an array of A to an array of B. The following code excerpt illustrates this:

// Project SimpleArray

class Fruit {
     ...
};

class Orange : Fruit {
     ...
}

class TestFruits {
  public static void Test() {
    Orange[] oranges = new Orange[2];
    Fruit[] fruits = oranges;
     ...
  }
}

Essentially, this lets you convert a bag of oranges to a bag of fruits, which is not possible in C++. Java programmers, however, are used to this idea of array covariance.

It should be noted, that array covariance applies only to reference types. It does not apply to value types. For example, you cannot convert int[] to object[] or double[] to float[].


Dictionaries

All the collection types that we have dealt with so far store a set of single elements. Sometimes, it is desirable to store elements as key–value pairs. For example, a professor may wish to create a list of student grades where the key is the full name of the student and the value is the grade. Such a collection is referred to as a dictionary.

The .NET Framework offers a formal definition of a dictionary as a type that implements a standard interface IDictionary. Here is its definition (along with a short description of each method):

public interface IDictionary : ICollection {
     // access elements of the collection
     Object this[Object key] { get; set; } // indexed access
     ICollection Keys { get; } // return a collection of keys
     ICollection Values { get; }// return a collection of values
     new IDictionaryEnumerator GetEnumerator(); //The enumerator
     // Add, Remove, etc.
     void Add(Object key, Object value); // add a pair
     void Remove(Object key); // remove a pair by its key
     void Clear(); // clear the whole table

     // search
     Boolean Contains(Object key); // search for a given key

     // Misc
     Boolean IsFixedSize { get; } // Is the size fixed?
     Boolean IsReadOnly  { get; } // Is it just read-only?
}

As can be seen from the IDictionary interface definition, any class that implements IDictionary must also implement ICollection and IEnumerable.

Perhaps the most commonly used BCL implementation of IDictionary is the Hashtable class. The following code excerpt demonstrates how to create and use this class:

// Project HashTable

public static void Main() {
     Hashtable grades = new Hashtable();

     // Add some values
     grades.Add("Tom", 70);
     grades.Add("Dick", 60);
     grades.Add("Harry", 80);

     // display grade for a specific student
     int grade = (int) grades["Harry"];
     Console.WriteLine("Harry's grade={0}", grade.ToString());

     // enumerate values
     foreach(DictionaryEntry entry in grades) {
       Console.WriteLine("{0}={1}", entry.Key, entry.Value);
     }

     // Remove Dick
     grades.Remove("Dick");

     // enumerate values
     foreach(DictionaryEntry entry in grades) {
       Console.WriteLine("{0}={1}", entry.Key, entry.Value);
     }

}

Internally, a Hashtable's storage is divided into a number of buckets. Each bucket can store at most one entry. When an entry is being added to the Hashtable, the table uses the hash code of the key to compute a suitable bucket location for storing the entry. The search algorithm is based on an algorithm called double hashing [Nist]. If the computed bucket has already been filled, then the algorithm tries to guess another bucket location using a different hash function, thus minimizing clustering. The guessing process continues until an empty bucket is found or a certain number of attempts have been exceeded.

For looking up a key, the Hashtable applies the same double hashing technique.

From the algorithm, there are two things that must be obvious. First, providing a good hash function on a class can significantly affect the performance of adding those objects to a hash table. In a hash table with a good implementation of a hash function, searching for an element takes constant time. In a hash table with a poor implementation of a hash function, the time for a search increases with the number of items in the hash table. Hash functions should also be inexpensive to compute.

Second, for faster inserts and lookups, the number of buckets should be more than the number of entries. The ratio of entries to buckets is called the load factor of the hash table. The default load factor for the Hashtable is 1.0, implying that there is one entry per bucket on an average. A load factor of less than 1.0 implies that when an entry is being added to the Hashtable, there is a good chance that an empty bucket will be found on the first attempt. Likewise, there is a good chance of looking up a key on the first attempt.

It is worth noting that the increased performance is achieved at the expense of higher memory consumption—a smaller load factor value implies that more buckets must be created.

A Good Load Factor for Hashtables

A load factor of 0.7 to 0.8 has been found to provide a good balance between performance and memory consumption.


The Hashtable provides many overloaded constructors that can be used to specify the load factor. The following line of code uses one such constructor.

// Project HashTable

Hashtable newgrades = new Hashtable(10, 0.7f);

The Hashtable constructor used in this code takes two parameters. The first parameter is used to specify the initial capacity of the Hashtable object. The second parameter is the load factor that the Hashtable object should use.

There is an important programming consideration when defining a type that may be used as a key in the Hashtable. When an entry is being added to the Hashtable or a key is being looked up, the Hashtable obtains the hash code by calling Object.GetHashCode on the key. The actual key is matched by calling Object.Equals on the key. Therefore, it is important for the key type to provide a suitable implementation of GetHashCode and Equals methods. Furthermore, these methods must produce the same results when called with the same parameters while the key exists in the Hashtable.

Note that the Hashtable stores keys and values as generic objects. It is possible to define a custom dictionary that is strongly typed. The framework provides a class, DictionaryBase, to make this job easier for the implementers. Defining a custom dictionary based on DictionaryBase is similar to defining a collection using CollectionBase, as we did earlier, and is left as an exercise for the readers.

The BCL also provides another IDictionary-based class called ListDictionary. This class is a simple implementation of IDictionary using a singly linked list. If the number of elements being stored is 10 or less, this class provides better performance than a Hashtable and uses less memory.

The BCL provides one more class, HybridDictionary, which uses ListDictionary while the collection is small and then switches to Hashtable when the collection gets large. This class is recommended for cases in which the number of elements in the dictionary is unknown. However, be aware of the overhead of switching between the ListDictionary and the Hashtable.

Using a Different Hash Code Algorithm

A HashTable depends on the hash code of the object being inserted or retrieved. Recall that providing a good hash function on a class can significantly improve the performance of adding those objects to the hash table.

If the type being added to the hash table is owned by you, it is easy to override Object.GetHashCode and provide a suitable implementation. But what can you do about types that you don't own?

For cases where you cannot override GetHashCode, the BCL provides a different mechanism to provide your hash function—by way of interface IHashCodeProvider. Here is its definition:

public interface IHashCodeProvider {
     int GetHashCode(object o);
}

Some overloads of the Hashtable constructor take IHashCodeProvider as a parameter. This gives you a chance to pass in an object that implements IHashCodeProvider and returns the hash code based on your logic.

A common example of a type that you don't own but might require a different hash function is String. The default implementation of String.GetHashCode computes the hash code that is based on the case-sensitivity of the string. However, you can now define your own hash code provider that returns a case-insensitive hash code. As a matter of fact, this case is so common that the BCL provides such a provider—class CaseInsensitiveHashCodeProvider. The BCL also provides a static instance of this class that can be accessed via CaseInsensitiveHashCodeProvider.Default.

Sorting a Collection

Consider the following code excerpt:

// Project CollectionSort

class Foo {
     public Foo(int val) {m_Value = val;}
     public int Value { get {return m_Value;}}
     private int m_Value;
}

class MyApp {
     public static void Main() {
       ArrayList myList = new ArrayList();

       // Add some values
       myList.Add(new Foo(10));
       myList.Add(new Foo(5));
       myList.Add(new Foo(15));

       // enumerate values
       foreach(Foo f in myList) {
          Console.WriteLine(f.Value);
       }
     }
}

The main program stores instances of class Foo in an ArrayList object and dumps the value of each instance back to the console. When the program is run, you see the following output:

10
5
15

Sometimes it is desirable to have a sorting order between items of a collection. In fact this is such a common programming request that BCL defines a method, Sort, on ArrayList (as well as Array). Our Foo items can be sorted, for example, using the following line of code:

myList.Sort();

However, if you execute this code, you will notice that the program throws an InvalidOperationException. The problem is that the system doesn't know how to compare one instance of Foo to another.

To impose a sorting order there must be some mechanism that allows two items to be compared. Under .NET, this mechanism is provided by means of a standard interface IComparer, defined as follows:

public interface IComparer {
  int Compare (Object x, Object y);
}

This simple interface defines just one method, Compare. The purpose of this method is to compare two objects and return an integer indicating which object should be placed before the other. The implication of the return value is shown in Table 5.1.

Table 5.1. Comparer Return Value
Return ValueImplication
A negative numberx is less than y. Therefore, x should be placed before y.
0x and y are equal. Either can come first.
A positive numberx is greater than y. Therefore, x should be placed after y.

ArrayList supplies many overloaded versions of the Sort method. One such method takes a parameter of type IComparer that the method uses internally for comparing items in the collection. You can define a new type that implements IComparer and pass it to this method, as illustrated here:

// Project CollectionSort

class FooComparer : IComparer {
     public int Compare(Object x, Object y) {
       Foo f1 = x as Foo;
       Foo f2 = y as Foo;
       if ((null == f1) || (null == f2)) {
         throw new ArgumentException();
       }
       return (f1.Value - f2.Value);
     }
}

class MyApp {
     public static void Main() {
       ...
       myList.Sort(new FooComparer());
     }
}

The BCL provides some useful utility classes that are based on IComparer. For example, you can use the class CaseInsensitiveComparer to perform case-insensitive comparisons on strings. The class provides a static instance that can be obtained using the property CaseInsensitiveComparer.Default. Likewise, if you are interested in case-sensitive comparisons, you can use another class Comparer or its static instance Comparer.Default.

Case-Insensitive Hash Table

If you wish to define a hash table that can store and retrieve keys in a case-insensitive fashion, you need to supply both a case-insensitive hash-code provider and a case-insensitive comparer to the hash table. This can be done using one of the overloaded constructors of Hashtable, as shown here.

Hashtable grades = new Hashtable(
     CaseInsensitiveHashCodeProvider.Default,
     CaseInsensitiveComparer.Default);


The BCL defines yet another mechanism to compare objects by means of an interface, IComparable (namespace System). Here is its definition:

public interface IComparable {
     int CompareTo(Object object);
}

Any object that implements IComparable is declaring that it knows how to compare itself with other objects. The interface method CompareTo serves comparing “this” object with the specified object. The following code excerpt illustrates this. It defines a new class FooNew that is a replacement for our old class Foo. The changes have been highlighted.

// Project CollectionSort

class FooNew : IComparable {
     public FooNew(int val) {m_Value = val;}
     public int Value { get {return m_Value;}}

     public int CompareTo(Object o) {
							  FooNew f = o as FooNew;
							  if (null == f) {
							throw new ArgumentException();
							  }
							  return (this.Value - f.Value);
							}

     private int m_Value;
}

When such a type is used with Array or ArrayList, there is no need to provide a separate comparer object for sorting. However, a separate comparer object is still useful if your collection contains objects of different types.

It is worth mentioning that almost all base datatypes defined in the BCL, such as Int32, String, Double, and so on, implement the IComparable interface.

How about keeping the keys of a dictionary in a sorted order? Well, a Hashtable doesn't provide such a sorting functionality. However, the BCL provides another IDictionary-based class called SortedList that can be used to sort the keys, either via the IComparable interface or the IComparer interface. Check the SDK documentation on how to use this class. Keep in mind, though, that operations on a SortedList tend to be slower than operations on a Hashtable because of the sorting.

You may be wondering why IComparable is defined in the System namespace and not in the System.Collection namespace. This is because this interface provides a general mechanism that is not just specific to collections.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.145.152.98