Chapter 12. Storing Data in Collections

One of the most common operations in programming is storing large sets of data. Traditionally, programmers used arrays to store related data. Because arrays can store custom data types, they seem to be the answer to many data-storage and data-manipulation issues. Arrays, however, don't expose all the functionality you might need in your application. To address the issues of data storage outside databases, the Framework provides, in addition to arrays, certain classes known as collections.

There are databases, of course, that can store any type of data and preserve their structure as well, but not all applications use databases. Although databases store data permanently, collections live in memory and must be persisted to a disk file between sessions. Collections can also be used to store data you read from a database in the target computer's memory. If your application needs to store custom objects, such as the ones you designed in Chapter 8, "Working with Objects," or a few names and contact information, you shouldn't have to set up a database. A simple collection like the ones described in this chapter will suffice.

In this chapter, you'll learn how to do the following:

  • Make the most of arrays

  • Store data in specialized collections such as List and Dictionary collections

  • Sort and search collections with custom comparers

Advanced Array Topics

Arrays are indexed sets of data, and this is how we've used them so far in this book. In this chapter, you will learn about additional members that make arrays extremely flexible. The System.Array class provides methods for sorting arrays, searching for an element, and more. In the past, programmers spent endless hours writing code to perform the same operations on arrays, but the Framework frees them from similar counterproductive tasks.

This chapter starts with a brief discussion of the advanced features of the Array class. With so many specialized collections supported by the Framework, arrays are no longer the primary mechanism for storing sets of data. However, because many developers are still using arrays, I've decided to include a brief presentation of the advanced techniques that will simplify the manipulation of arrays.

Sorting Arrays

To sort an array, call its Sort method. This method is heavily overloaded, and as you will see, it is possible to sort an array based on the values of another array or even to supply your own custom sorting routines. If the array is sorted, you can call the BinarySearch method to locate an element very efficiently. If not, you can still locate an element in the array by using the IndexOf and LastIndexOf methods. The Sort method is a reference method: It requires that you supply the name of the array to be sorted as an argument and sorts the array in place (in other words, it doesn't return another array with the same elements in a different order). The simplest form of the Sort method accepts a single argument, which is the name of the array to be sorted:

Array.Sort(arrayName)

This method sorts the elements of the array according to the type of its elements, as long as the array is strictly typed and was declared as a simple data type (String, Decimal, Date, and so on). If the array contains data that are not of the same type or that are objects, the Sort method will fail. The Array class just doesn't know how to compare integers to strings or dates, so don't attempt to sort arrays whose elements are not of the same type. If you can't be sure that all elements are of the same type, use a Try...Catch statement.

You can also sort a section of the array by using the following form of the Sort method, where startIndex and endIndex are the indices that delimit the section of the array to be sorted:

System.Array.Sort(arrayName, startIndex, endIndex)

An interesting variation of the Sort method sorts the elements of an array according to the values of the elements in another array. Let's say you have one array of names and another of matching Social Security numbers. It is possible to sort the array of names according to their Social Security numbers. This form of the Sort method has the following syntax:

System.Array.Sort(array1, array2)

array1 is the array with the keys (the Social Security numbers), and array2 is the array with the actual elements to be sorted. This is a very handy form of the Sort method. Let's say you have a list of words stored in one array and their frequencies in another. Using the first form of the Sort method, you can sort the words alphabetically. With the third form of the Sort method, you can sort them according to their frequencies (starting with the most common words and ending with the least common ones). The two arrays must be one-dimensional and have the same number of elements. If you want to sort a section of the array, just supply the startIndex and endIndex arguments to the Sort method, after the names of the two arrays.

Another form of the Sort method relies on a user-supplied function to sort arrays of custom objects. As you recall, arrays can store all types of objects. But the Framework doesn't know how to sort your custom objects, or even its built-in objects. To sort an array of objects, you must provide your own class that implements the IComparer interface (basically, a function that can compare two instances of a custom class). This form of the Sort method is described in detail in the section titled "The IEnumerator and IComparer Interfaces" later in this chapter.

Searching Arrays

Arrays can be searched in two ways: with the BinarySearch method, which works on sorted arrays and is extremely fast, and with the IndexOf and LastIndexOf methods, which work regardless of the order of the elements. All three methods search for an instance of an item and return its index, and they're all reference methods. The IndexOf and LastIndexOf methods are similar to the methods by the same name of the String class. They return the index of the first (or last) instance of an object in the array, or they return the value −1 if the object isn't found in the array. Both methods are overloaded, and the simplest form of the IndexOf method is the following, where arrayName is the name of the array to be searched and object is the item you're searching for:

itemIndex = System.Array.IndexOf(arrayName, object)

Another form of the IndexOf and LastIndexOf methods allows you to begin the search at a specific index:

itemIndex = System.Array.IndexOf(arrayName, object, startIndex)

This form of the method starts searching in the segment of the array from startIndex to the end of the array. Finally, you can specify a range of indices in which the search will take place by using the following form of the method:

itemIndex = System.Array.IndexOf(arrayName, object, startIndex, endIndex)

You can search large arrays more efficiently with the BinarySearch method if the array is sorted. The simplest form of the BinarySearch method is the following:

System.Array.BinarySearch(arrayName, object)

The BinarySearch method returns an integer value, which is the index of the object you've been searching for in the array. If the object argument is not found, the method returns a negative value, which is the negative of the index of the next larger item minus one. This transformation, the negative of a number minus one, is called the one's complement, and other languages provide an operator for it: the tilde (~). The one's complement of 10 is −11, and the one's complement of −3 is 2.

Why all this complexity? Zero is a valid index, so only a negative value could indicate a failure in the search operation. A value of −1 would indicate that the operation failed, but the BinarySearch method does something better. If it can't locate the item, it returns the index of the item immediately after the desired item (the first item in the array that exceeds the item you're searching for). This is a near match, and the BinarySearch method returns a negative value to indicate near matches. A near match is usually the same string with different character casing or a slightly different spelling. It may also be a string that's totally irrelevant to the one you're searching for. Notice that there will always be a near match unless you're searching for a value larger than the last value in the array. In this case, the BinarySearch method will return the one's complement of the array's upper bound (−100 for an array of 100 elements, if you consider that the index of the last element is 99).

VB 2010 at Work: The ArraySearch Application

The ArraySearch application, shown in Figure 12.1, demonstrates how to handle exact and near matches reported by the BinarySearch method. The Populate Array button populates an array with 10,000 random strings. The same strings are also displayed in a sorted ListBox control, so you can view them. The elements have the same order in both the array and the ListBox, so you can use the index reported by the BinarySearch method to locate and select instantly the same item in the ListBox.

Searching an array and highlighting the same element in the ListBox control

Figure 12.1. Searching an array and highlighting the same element in the ListBox control

Each of the 10,000 random strings has a random length of 3 to 15 characters. When you run the application, message boxes will pop up, displaying the time it took for each operation: how long it took to populate the array, how long it took to sort it, and how long it took to populate the ListBox. You might want to experiment with large arrays (100,000 elements or more) to get an idea of how efficiently VB handles arrays.

The Search Array button prompts the user for a string via the InputBox() function and then locates the string in the array by calling the BinarySearch method in the array. The result is either an exact or a near match, and it's displayed in a message box. At the same time, the item reported by the BinarySearch method is also selected in the ListBox control.

Run the application, populate the ListBox control, and then click the Search Array button. Enter an existing string (you can use lowercase or uppercase characters; it doesn't make a difference), and verify that the application reports an exact match and locates the item in the ListBox. The program appears to perform case-insensitive searches because all the strings stored in the array are in uppercase, and the search argument is also converted to uppercase before the BinarySearch method is called.

Now, enter a string that doesn't exist in the list (or the beginning of an existing string) and see how the BinarySearch handles near matches.

The code behind the Search Array button calls the BinarySearch method and stores the integer returned by the method to the wordIndex variable. Then it examines the value of this variable. If wordIndex is positive, there was an exact match, and it's reported. If wordIndex is negative, the program calculates the one's complement of this value, which is the index of the nearest match. The element at this index is reported as a near match. Finally, regardless of the type of the match, the code selects the same item in the ListBox and scrolls it into view. Listing 12.1 is the code behind the Search Array button.

Example 12.1. Locating exact and near matches with BinarySearch

Private Sub bttnSearch_Click(...) Handles bttnSearch.Click
   Dim srchWord As String   ' the word to search for
   Dim wordIndex As Integer ' the index of the word
   srchWord = InputBox(
              "Enter word to search for").ToUpper
   wordIndex = System.Array.BinarySearch(words, srchWord)
   If wordIndex >= 0 Then ' exact match!
       ListBox1.TopIndex = wordIndex
       ListBox1.SelectedIndex = wordIndex
       MsgBox("An exact match was found for " &
          " the word [" & words(wordIndex) &
          "] at index " & wordIndex.ToString,,
          "EXACT MATCH")
   Else                    ' Near match
       ListBox1.TopIndex = -wordIndex - 1
       ListBox1.SelectedIndex = -wordIndex - 1
       MsgBox("The nearest match is the word [" &
           words(-wordIndex - 1) & "] at index " &
           (-wordIndex - 1).ToString, , "NEAR MATCH")
   End If
End Sub

Notice that all methods for sorting and searching arrays work with the base data types only. If the array contains custom data types, you must supply your own functions for comparing elements of this type, a process described in detail in the section "The IEnumerator and IComparer Interfaces" later in this chapter.

The Binary Search Algorithm

The BinarySearch method uses a powerful search algorithm, the binary search algorithm, but it requires that the array be sorted. You need not care about the technical details of the implementation of a method, but in the case of the binary search algorithm, a basic understanding of how it works will help you understand how it performs near matches.

To locate an item in a sorted array, the BinarySearch method compares the search string to the array's middle element. If the search string is smaller, you know that the element is in the first half of the array, and you can safely ignore the second half. The same process is repeated with the remaining half of the elements. The search string is compared with the middle element of the reduced array, and after the comparison, you can ignore one-half of the reduced array. At each step, the binary search algorithm rejects one-half of the items left until it reduces the list to a single item. This is the item you're searching for. If not, the item is not in the list. To search a list of 1,024 items, the binary search algorithm makes 10 comparisons. At the first step, it's left with an array of 512 elements, then 256 elements, then 128 elements, and so on, until it reaches a single element. For an array of 1,024 × 1,024 (that's a little more than a million) items, the algorithm makes 20 comparisons to locate the desired item.

If you apply the BinarySearch method to an array that hasn't been sorted, the method will carry out all the steps and report that the item wasn't found, even if the item belongs to the array. The algorithm doesn't check the order of the elements; it just assumes that they're sorted. The binary search algorithm always halves the number of elements in which it attempts to locate the search argument. That's why you should never apply the BinarySearch method to an array that hasn't been sorted yet.

To see what happens when you apply the BinarySearch method to an array that hasn't been sorted, remove the statement that calls the Sort method in the ArraySearch sample application. The application will keep reporting near matches, even if the string you're searching for is present in the array. Of course, the near match reported by the BinarySearch method in an unsorted array isn't close to the element you're searching for—it's just an element that happens to be there when the algorithm finishes.

Performing Other Array Operations

The Array class exposes additional methods, which are described briefly in this section. The Reverse method reverses the order of the elements in an array. The Reverse method accepts the array to be reversed as an argument and returns another array:

reversedArray = System.Array.Reverse(arrayName)

The Copy and CopyTo methods copy the elements of an array (or segment of an array) to another array. The syntax of the Copy method is as follows:

System.Array.Copy(sourceArray, destinationArray, count)

sourceArray and destinationArray are the names of the two arrays, and count is the number of elements to be copied. The copying process starts with the first element of the source array and ends after the first count elements have been copied. If count exceeds the length of either array, an exception will be thrown.

Another form of the Copy method allows you to specify the range of elements in the source array to be copied and a range in the destination array in which these elements will be copied. The syntax of this form of the method is as follows:

System.Array.Copy(sourceArray, sourceStart,
         destinationArray, destinationStart, count)

This method copies count elements from the source array, starting at location sourceStart, and places them in the destination array, starting at location destinationStart. All indices must be valid, and there should be count elements after the sourceStart index in the source array, as well as count elements after the destinationStart in the destination array. If not, an exception will be thrown.

The CopyTo method is similar, but it doesn't require the name of the source array. It copies the elements of the array to which it's applied into the destination array, where sourceArray is a properly dimensioned and initialized array:

sourceArray.CopyTo(destinationArray, sourceStart)

Finally, you can filter array elements by using the Filter() function, which is not a method of the Array class; it's a VB function that acts on arrays. The Filter() function performs an element-by-element comparison and rejects the elements that don't meet the user-specified criteria. The filtered elements are returned as a new array, while the original array remains intact. The syntax of the Filter() function is as follows:

filteredArray = Filter(source, match, include, compare)

source is the array to be searched, and it must be a one-dimensional array of strings or objects. The match argument is the string to search for. Every element that includes the specified string is considered a match. The remaining arguments are optional: include is a True/False value indicating whether the method will return the elements that include (if True) or exclude (if False) the matching elements. The compare argument is a member of the CompareMethod enumeration: It can be Binary (for binary or case-sensitive comparisons) or Text (for textual or case-insensitive comparisons). If no match is found, the method will return an empty array.

The following code segment filters out the strings that don't contain the word visual from the words array:

Dim words() As String = {"Visual Basic", "Java", "Visual Studio"}
Dim selectedWords() As String
selectedWords = Filter(words, "visual", True, CompareMethod.Text)
Dim selword As String
Dim msg As String = ""
For Each selword In selectedWords
    msg &= selword & vbCrLf
Next
MsgBox(msg)

If you execute the preceding statements, the message box will display the following:

Visual Basic
Visual Studio

There are a few more interesting array methods, such as the FindAll method, which finds the elements that meet arbitrary conditions, and the TrueForAll method, which returns True if all elements in the array meet the criteria you supply. With the introduction of LINQ, a topic that's discussed in detail in Chapter 14, "Introduction to LINQ," there's very little reason to use these methods. LINQ provides a straightforward syntax for selecting elements from an array. If you haven't seen the LINQ syntax before, here the code segment extracts the strings that contain the word visual, similar to the preceding sample:

Dim words() As String = {"Visual Basic", "Java", "Visual Studio"}
Dim selectedWords = From word In words
                    Where word.ToUpper.Contains("VISUAL")
                    Select word

I will not discuss this syntax any further in this chapter, but it's easy to see that the code is more intuitive and that the filtering expression may contain any of the Framework's functions and methods, allowing for much more flexible pattern-matching techniques. As you may recall from Chapter 10, "Applied Object-Oriented Programming," the functionality of arrays, as well as all other collections, has been enhanced with extension methods. One of Chapter 10's sample projects is the Extension Methods project, which demonstrates some of the array's extension methods.

Array Limitations

As implemented in version 4.0 of the Framework, arrays are more flexible than ever. They're very efficient, and the most demanding tasks programmers previously had to perform with arrays are now implemented as methods of the Array class. However, arrays aren't perfect for all types of data storage. The most important shortcoming of arrays is that they're not dynamic. Inserting or removing elements entails moving all the following elements up or down. Arrays are the most efficient collection in the Framework, but when you need a dynamic structure for adding and removing elements in the course of an application, you should use a collection, such as the List or ArrayList collection, described in detail in the next section.

Collection Types

Collections are structures for storing sets of data, similar to arrays, but they are more flexible, and there are several types of collections to choose from. Let's start with a rough classification of collections. There are collections that store just data, like the List and ArrayList collections, and there are collections that store pairs of keys and data values, like the Dictionary and HashTable collections. If each data value has a unique key, you can use this key to quickly retrieve the matching data value. An ArrayList collection can store temperatures, just like an array. It can also store objects with properties such as city names and their temperatures, but it doesn't allow you to retrieve the temperature in a specific city. If you're interested in temperatures only, you can use an ArrayList or even an array. If you need to access the temperatures in specific cities, however, you must use a Dictionary collection that stores temperatures indexed by the corresponding city names.

Another way to classify the collections is as typed and untyped. Untyped collections, such as the ArrayList collection, allow you to store objects of any type. Typed collections, on the other hand, such as the List collection, allow you to store objects of a specific type only. The advantage of using typed collections is that their elements expose the properties of the specific type. Consider an ArrayList that contains Rectangle objects. To access the width of the third element in the collection, you must use an expression like the following:

Dim AL As New ArrayList
CType(AL(2), Rectangle).Width

This expression may cause a runtime error if the third element in the AL collection is not of the Rectangle type (AL is a properly initialized ArrayList).

If you use a List collection to store the same objects, then you can access the width of an element of the collection with a much simpler expression:

Dim LST As New List(Of Rectangle)
' statements to populate the LST collection
LST(2).Width

(LST is a properly initialized List collection, populated with Rectangle objects.) The members of the Rectangle class will appear in the IntelliSense list as you type. Because the LST List collection will not accept any objects other than Rectangles, all errors will be caught at design time, and they will not generate runtime errors.

Now I'll cover the specialized collections of the Framework starting with the simpler ones, the ArrayList and List collections. The two collections are functionally equivalent, and they expose identical members. The ArrayList and List collections allow you to maintain multiple elements, similar to an array; however, Lists and ArrayLists allow the insertion of elements anywhere in the collection, as well as the removal of any element. In other words, they're dynamic structures that can also grow automatically as you add or remove elements. Like an array, the collection's elements can be sorted and searched. You can also remove elements by value, not only by index. If you have a collection populated with names, you remove the item Charles by passing the string itself as an argument. Notice that Charles is not an index value; it's the element you want to remove.

Creating Collections

To use an ArrayList in your code, you must first create an instance of the ArrayList class by using the New keyword, as in the following statement:

Dim aList As New ArrayList

The aList variable represents an ArrayList that can hold only 16 elements (the default size). You can set the initial capacity of the ArrayList by setting its Capacity property, which is the number of elements the ArrayList can hold. Notice that you don't have to prepare the collection to accept a specific number of items. Every time you exceed the collection's capacity, it's doubled automatically. However, it's not decreased automatically when you remove items.

The exact number of items currently in the ArrayList is given by the Count property, which is always less than (or, at most, equal to) the Capacity property. (Both properties are expressed in terms of items.) If you decide that you will no longer add more items to the collection, you can call the TrimToSize method, which will set the collection's capacity to the number of items in the list.

Where the ArrayList collection can store objects of any type, a variation of the ArrayList stores objects of the same type. This collection is called List, and it's a typed ArrayList. The only difference between ArrayLists and Lists is in the way they're declared: When you declare a List collection, you must supply the type of the objects you intend to store in it, using the Of keyword in parentheses, as shown in the following statements, which declare two List collections for storing Rectangle objects (the Rects collection) and custom objects of the Student type (the Students collection):

Dim Rects As New List(Of Rectangle)
Dim Students As New List(Of Student)

Other than that, the List collection is identical to the ArrayList collection. Of course, because the List collection stores objects of a specific type, it integrates nicely with IntelliSense. If you type the name of an element in the collection (such as Rects(3)) and the following period, IntelliSense will display the members of the Rectangle class in a drop-down list. In the following section, you'll learn about the methods they expose and how to use both types of collections in your code. I will be using mostly the List class in the sample code and suggest that you do the same in your code whenever possible, but the same techniques apply to the ArrayList class as well.

Adding and Removing List Items

To add a new item to a List, use the Add method, whose syntax is as follows:

index = lst.Add(obj)

lst is a properly declared List (or ArrayList) variable, and obj is the item you want to add to the collection. The type of the obj object should be the same as the one you used in the declaration of the List collection. The Add method appends the specified item to the collection and returns the index of the new item. If you're using a List named Capitals to store the names of the state capitals, you can add an item by using the following statement:

Capitals.Add("Sacramento")

Let's say you created a structure called Person by using the following declaration:

Structure Person
   Dim LastName As String
   Dim FirstName As String
   Dim Phone As String
   Dim EMail As String
End Structure

To store a collection of Person items in a List, create a variable of the Person type, set its fields, and then add it to the List, as shown in Listing 12.2.

Example 12.2. Adding a structure to an ArrayList

Dim Persons As New List(Of Person)
Dim p As New Person
p.LastName = "Last Name"
p.FirstName = "First Name"
p.Phone = "Phone"
p.EMail = "[email protected]"
Persons.Add(p)
p = New Person
p.LastName = "another name"
{ statements to set the other fields}
Persons.Add(p)

If you execute these statements, the List collection will hold two items, both of the Person type. Notice that you can add multiple instances of the same object to the List collection. To find out whether an item belongs to the collection already, use the Contains method, which accepts as an argument an object and returns a True or False value, depending on whether the object belongs to the list:

If Persons.Contains(p) Then
   MsgBox("Duplicate element rejected")
Else
   Persons.Add(p)
   MsgBox("Element appended successfully")
End If

By default, items are appended to the collection. To insert an item at a specific location, use the Insert method, which accepts as an argument the location at which the new item will be inserted and, of course, an object to insert in the ArrayList, as shown next:

aList.Insert(index, object)

Unlike the Add method, the Insert method doesn't return a value—the location of the new item is already known.

You can also add multiple items via a single call to the AddRange method. This method appends to the collection of a set of items, which could come from an array or from another list. The following statement appends the elements of an array to the Lst collection:

Dim colors() As Color = {Color.Red, Color.Blue, Color.Green}
Lst.AddRange(colors)

To insert a range of items anywhere in the collection, use the InsertRange method. Its syntax is the following, where index is the index of the collections where the new elements will be inserted, and objects is a collection of the elements to be inserted:

Lst.InsertRange(index, objects)

Finally, you can overwrite a range of elements in the collection with a new range by using the SetRange method. To overwrite the items in locations 5 through 9 in an collection, use a few statements like the following:

Dim words() As String =
               {"Just", "a", "few", "more", "words"}
Lst.SetRange(5, words)

This code segment assumes that the Lst collection contains at least 10 items, and it replaces half of them.

To remove an item, use the Remove method, whose syntax is the following:

aList.Remove(object)

The object argument is the value to be removed, not an index value. If the collection contains multiple instances of the same item, only the first instance of the object will be removed.

Notice that the Remove method compares values, not references. If the collection contains a Rectangle object, you can search for this item by creating a new variable of this type and setting its properties to the properties of the Rectangle object you want to remove:

Dim R1 As New Rectangle(10, 10, 100, 100)
Dim R2 As Rectangle = R1
Lst.Add(R1)
Lst.Add(R2)
Dim R3 As Rectangle
R3 = New Rectangle(10, 10, 100, 100)
Lst.Remove(R3)

If you execute these statements, they will add two identical rectangles to the Lst collection. The last statement will remove the first of the two rectangles.

If you attempt to remove an item that doesn't exist, no exception is thrown—simply, no item is removed from the list. You can also remove items by specifying their index in the list via the RemoveAt method. This method accepts as an argument the index of the item to remove, which must be less than the number of items currently in the list.

To remove more than one consecutive item, use the RemoveRange method, whose syntax is the following:

Lst.RemoveRange(startIndex, count)

The startIndex argument is the index of the first item to be removed, and count is the number of items to be removed.

The following statements are examples of the methods that remove items from a collection. The first two statements remove an item by value. The first statement removes an object, and the second removes a string item. The following statement removes the third item, and the last one removes the third through the fifth items.

aList.Remove(Color.Red)
aList.Remove("Richard")
aList.RemoveAt(2)
aList.RemoveRange(2, 3)

Collection Initializers

You can also declare and populate a collection in a single statement, with a collection initializer. The collection initializer is a sequence of elements enclosed in a pair of curly brackets. The following statement declares and initializes a new ArrayList collection with strings:

Dim letters As New ArrayList({"quick", "dog", "fox", "lazy", "brown"})

As you recall from Chapter 8, "Working with Objects," you can initialize an object by supplying a set of values in parentheses as an argument to its constructor. The curly brackets indicate that you're supplying a collection of values. As a reminder, here's the simplest form of collection initializer, the array initializer:

Dim numbers() As Decimal = {2.0, 12.99, 0.001, 1000.0, 10.01}

If this were a List collection, the initialization would be a little more complicated. Let's initialize a List collection of strings. You first must declare the collection with a statement like the following:

Dim words = New List(Of String)

To initialize the collection, you must supply a list of strings in a pair of curly brackets. The following statement creates an array of strings:

{"the", "quick", "brown", "fox", "jumped", "over", "the", "lazy", "dog"}

To initialize the List collection, all you have to do is append the array with the values to the collection's initializer:

Dim words = New List(Of String)({"the", "quick", "brown",
                                  "fox", "jumped", "over", "the", "lazy", "dog"})

Collection initializers allow you to declare and initialize a collection similar to all scalar objects. You can also supply the constructors of more complex objects to the collection initializer. The following statement will create a List of Rectangle objects:

Dim Rects = New List(Of Rectangle)({New Rectangle(0, 0, 4, 14),
                                    New Rectangle(0, 0, 101, 101),
                                    New Rectangle(100, 100, 10, 20)})

This looks fairly complicated, almost C#-like, but here's how you read the statement. The constructor of the Rects collection accepts its initial values in a pair of parentheses. In the parentheses themselves, you can embed an array of object initializers. Note that all object initializers must be embedded in a pair of curly brackets.

Finally, you can initialize a collection with anonymous types, a topic discussed in Chapter 10. An anonymous type is a type that's created while it's being declared. Here's an anonymous type that may appear anywhere in your code:

Dim P = New With {.First = "Evangelos", .Last = "Petroutsos"}

Note that the variable P is of an anonymous type, because there's no type name. However, it has two fields that you can access as if the P variable were based on a custom class:

P.First = "Richard"

The next statement creates a new collection of objects, each one representing a city:

Dim cities = New ArrayList(
                  {New With {.city = "a city", .population = 201213},
                   New With {.city = "a small city", .population = 101320},
                   New With {.city = "a town", .population = 39210}})

You may find the number of curly brackets overwhelming, but the trick to using initializers effectively is to build them step-by-step. The following expression creates a new variable of the anonymous type with two fields:

New With {.city = "a city", .population = 201213}

Then you embed several of these expressions with different field values in a pair of curly brackets to create a collection of objects of this anonymous type, and finally you embed the collection in a pair of parentheses to pass them as arguments to the collection's initializer. To standardize the collection initializers, consider the following rule: All values you add to the collection must be embedded in a pair of curly brackets:

{object1, object2, object3, ...}

Each object in the collection must be initialized with its own constructor. Replace object1 with its initializer, object2 with the same initializer and different values, and so on, to create a collection of objects. Then embed the entire expression, along with the outer curly brackets, in a pair of parentheses following the collection's declaration. If the objects are of an anonymous type, insert the appropriate anonymous initializer for each object. As for the curly brackets, think of them as delimiters for sets of values for arrays (or sets of properties for objects). In Visual Basic, curly brackets are used only with collection initializers and anonymous constructors.

Of course, you can't create a List collection of an anonymous type, because the List is a strongly typed collection.

Extracting Items from a Collection

To access the items in the collection, use an index value, similar to the array. The first item's index is 0, and the last item's index is Lst.Count-1, where Lst is a properly initialized List or ArrayList collection. You can also extract a range of items from the list by using the GetRange method. This method extracts a number of consecutive elements from the Lst collection and stores them to a new collection, the newLst collection, where index is the index of the first item to copy, and count is the number of items to be copied:

newLst = Lst.GetRange(index, count)

The GetRange method returns another collection with the proper number of items. The following statement copies three items from the Lst collection and inserts them at the beginning of the bLst collection. The three elements copied are the fourth through sixth elements in the original collection:

bLst.InsertRange(0, aLst.GetRange(3, 3))

The Repeat method, which fills a list with multiple instances of the same item, has the following syntax:

newList = ArrayList.Repeat(item, count)

This method returns a new list with count elements, all of them being identical to the item argument. The Reverse method, finally, reverses the order of the elements in a collection, or a portion of it.

Sorting Lists

To sort an ArrayList or List collection, use the Sort method, which has three overloaded forms:

aLst.Sort()
aLst.Sort(comparer)
aLst.Sort(startIndex, endIndex, comparer)

The Sort method doesn't require you to pass the name of the list to be sorted as an argument; unlike the Sort method of the Array class, this is an instance method and it sorts the collection to which it's applied. aLst is a properly declared and initialized List or ArrayList variable. The first form of the Sort method sorts the list alphabetically or numerically, depending on the data type of the objects stored in it. If the items are not all of the same type, an exception will be thrown. You'll see how you can handle this exception in just a moment.

If the items stored in the collection are of a data type other than the base data types, you must supply your own mechanism to compare the objects. The other two forms of the Sort method use a custom function for comparing items; you will see how they're used in the section "The IEnumerator and IComparer Interfaces" later in this chapter.

If the list contains items of widely different types, the Sort method will fail. To prevent a runtime exception (the InvalidOperationException), you must make sure that all items are of the same type. If you can't ensure that all the items are of the same type, catch the possible errors and handle them from within a structured exception handler, as demonstrated in Listing 12.3. (The listing doesn't show the statements for populating the ArrayList, just the use of a structured exception handler for the Sort method.)

Example 12.3. Foolproof sorting

Dim sorted As Boolean = True
Try
  aLst.Sort()
Catch SortException As InvalidOperationException
     MsgBox("You can't sort an ArrayList whose items " &
            "aren't of the same type")
     sorted = False
Catch GeneralException As Exception
     MsgBox("The following exception occurred: " &
            vbCrLf & GeneralException.Message)   sorted = False
End Try
If sorted Then
  { process sorted list}
Else
  { process unsorted list}
End If

The sorted Boolean variable is initially set to True because the Sort method will most likely succeed. If not, an exception will be thrown, in which case the code sets the sorted variable to False and uses it later to distinguish between sorted and unsorted collections. Notice the two clauses of the Catch statement that distinguish between the invalid operation exception and any other type of exception.

The Sort method can't even sort a collection of various numeric data types. If some of its elements are doubles and some are integers or decimals, the Sort method will fail. You must either make sure that all the items in the ArrayList are of the same type or provide your own function for comparing the ArrayList's items. The best practice is to make sure that your collection contains items of the same type by using a List collection. If a collection contains items of different types, however, how likely is it that you'll have to sort such a collection?

Searching Lists

Like arrays, ArrayList and List collections expose the IndexOf and LastIndexOf methods to search in an unsorted list and the BinarySearch method for sorted lists. The IndexOf and LastIndexOf methods accept as an argument the object to be located and return an index:

Dim index As Integer = Lst.IndexOf(object)

Here, object is the item you're searching. The LastIndexOf method has the same syntax, but it starts scanning the array from its end and moves backward toward the beginning. The IndexOf and LastIndexOf methods are overloaded as follows:

Lst.IndexOf(object, startIndex)
Lst.IndexOf(object, startIndex, length)

The two additional arguments determine where the search starts and ends. Both methods return the index of the item if it belongs to the collection. If not, they return the value −1. The IndexOf and LastIndexOf methods of the List and ArrayList classes perform case-sensitive searches, and they report exact matches only.

If the list is already sorted, use the BinarySearch method, which accepts as an argument the object to be located and returns its index in the collection, where object is the item you're looking for:

Dim index As Integer = Lst.BinarySearch(object)

There are two more overloads of this method. To search for an item in a list with custom objects, use the following form of the BinarySearch method:

Dim index As Integer = Lst.BinarySearch(object, comparer)

The first argument is the object you're searching for, and the second is the name of an IComparer object. Another overload of the BinarySearch method allows you to search for an item in a section of the collection; its syntax is as follows:

Dim index As Integer = _
         Lst.BinarySearch(startIndex, length, object, comparer)

The first argument is the index at which the search will begin, and the second argument is the length of the subrange. object and comparer are the same as with the second form of the method.

Iterating Through a List

To iterate through the elements of a collection, you can set up a For...Next loop:

For i = 0 To Lst.Count - 1
   { process item Lst(i)}
Next

or a For Each loop:

For Each itm In aLst
   { process item itm }
Next

This is a trivial operation, but the processing itself can get as complicated as required by the type of objects stored in the collection. The current item at each iteration is the Lst(i). It's recommended that you cast the object to the appropriate type and then process it.

You can also use the For Each...Next loop with an Object variable, as shown next:

Dim itm As Object
For Each itm In aLst
   { process item itm}
Next

If you're iterating through an ArrayList collection, you must cast the control variable itm to the appropriate type. If you're iterating through a List collection, then the itm variable is of the same type as the one you used in the declaration of the List collection, as long as the Infer option is on. Alternatively, you can declare the control variable in the For Each statement:

Dim Lst AS New List(Of String)
' statements to populate list
For Each itm As String In Lst
' process item lst
Next

An even better method is to create an enumerator for the collection and use it to iterate through its items. This technique applies to all collections and is discussed in the "The IEnumerator and IComparer Interfaces" section later in this chapter.

The List and ArrayList classes address most of the problems associated with the Array class, but one last problem remains—that of accessing the items in the collection through a meaningful key. This is the problem addressed by the Dictionary and HashTable collections.

The Dictionary Collection

As you saw, the List and ArrayList classes address most of the problems of the Array class, while they support all the convenient array features. Yet, the two lists, like the Array, have a major drawback: You must access their items by an index value. Another collection, the Dictionary collection, is similar to the List and ArrayList collections, but it allows you to access the items by a meaningful key.

Each item in a Dictionary has a value and a key. The value is the same value you'd store in an array, but the key is a meaningful entity for accessing the items in the collection, and each element's key must be unique. Both the values stored in a Dictionary and their keys can be objects. Typically, the keys are short strings (such as Social Security numbers, ISBN values, product IDs, and so on) or integers.

The Dictionary collection exposes most of the properties and methods of the List, with a few notable exceptions. The Count property returns the number of items in the collection as usual, but the Dictionary collection doesn't expose a Capacity property. The Dictionary collection uses fairly complicated logic to maintain the list of items, and it adjusts its capacity automatically. Fortunately, you need not know how the items are stored in the collection.

To create a Dictionary in your code, declare it with the New keyword, and supply the type of the keys and values you plan to store in the collection with a statement like the following:

Dim Students As New Dictionary(Of Integer, Student)

The first argument is the type of the keys, and the second argument is the type of the values you plan to store in the collection. Dictionaries are typed collections, unless you declare them with the Object type. To add an item to the Dictionary, use the Add method with the following syntax:

Students.Add(key, value)

value is the item you want to add (it can be an object of the type specified in the collection's declaration), and key is an identifier you supply, which represents the item. After populating the collection you will use the key to access specific elements of the collection. For example, you will use city names to locate their temperature as shown in the following statements:

Dim Temperatures As New Dictionary(Of String, Decimal)
Temperatures.Add("Houston", 81.3)
Temperatures.Add("Los Angeles", 78.5)

To find out the temperature in Houston, use the following statement:

MsgBox(Temperatures("Houston").ToString)

Notice that you can have duplicate values, but the keys must be unique. If you attempt to reuse an existing key as you populate the collection, an InvalidArgumentException exception will be thrown. To find out whether a specific key or value is already in the collection, use the ContainsKey and ContainsValue methods. The syntax of the two methods is quite similar, and they return True if the specified key or value is already in use in the collection:

Dictionary.ContainsKey(object)
Dictionary.ContainsValue(object)

The Dictionary collection exposes the Contains method, too, which returns True if the collection contains a pair of a key and a value. If the collection contains the same value with a different key, the Contains method will return False.

To find out whether a specific key is in use already, use the ContainsKey method, as shown in the following statements, which add a new item to the Dictionary only if its key doesn't exist already:

Dim value As New Rectangle(100, 100, 50, 50)
Dim key As String = "Rect1"
If Not Rects.ContainsKey(key) Then
   Rects.Add(key, value)
End If

If the key exists already, you might want to change its value. In this case, use the following notation, which adds new elements to the list if the key isn't present or changes the value of the element that corresponds to the specified key:

Rects(key) = value

The Values and Keys properties allow you to retrieve all the values and the keys in the Dictionary, respectively. Both properties are collections and expose the usual members of a collection. To iterate through the values stored in a Dictionary, use the following loop:

Dim itm As Object
For Each itm In Dict.Values
   Debug.WriteLine(itm)
Next

Listing 12.4 demonstrates how to scan the keys of a Dictionary through the Keys property and then use these keys to access the matching items through the Item property.

Example 12.4. Iterating a Dictionary collection

Private Function ShowDictionaryContents(
                    ByVal table As Dictionary
                    (Of String, Decimal)) As String
       Dim msg As String
       Dim element, key As Object
       msg = "The HashTable contains " &
               table.Count.ToString & " elements: " & vbCrLf
       For Each key In table.Keys
            element = table.Item(key)
            msg = msg & vbCrLf & "    Element Key= " &
                  key.ToString
            msg = msg & "    Element Value= " &
                  element.ToString & vbCrLf
       Next
       Return (msg)
End Function

To print the contents of a Dictionary collection in the Output window, call the ShowHashTableContents() function, passing the name of the Dictionary as an argument, and then print the string returned by the function:

Dim Dict As New Dictionary
{ statements to populate Dictionary}
MsgBox(ShowDictionaryContents(Dict))

The specific implementation of the ShowDictionaryContents() function applies to Dictionary collections that use strings as keys and decimals as values. For Dictionary collections that store different types, you must edit the function to accommodate the appropriate data types.

The HashTable Collection

Another collection, very similar to the Dictionary collection, is the HashTable collection. The HashTable is an untyped Dictionary. To populate a HashTable with temperature values indexed by city names, use a few statements like the following:

Dim tmps As New Hashtable
tmps("Houston") = 81.3
tmps("Los Angeles") = 78.5

If an element with the same key you're trying to add exists already, then no new item will be added to the list. Instead, the existing item's value will be changed. After executing the following statements, for example, the Temperatures HashTable will contain a single element with the key Houston and the value 79.9:

Dim temperatures As New Hashtable
tmps("Houston") = 81.3
tmps("Houston") = 79.9

You can always use the Add method to add elements to a HashTable collection, and the syntax of this method is identical to the Add method of the Dictionary collection. Like the Dictionary collection, it exposes the Add method that accepts a key and a value as arguments, a Remove method that accepts the key of the element to be removed, and the Keys and Values properties that return all the keys and values in the collection, respectively. The major difference between the Dictionary and HashTable collections is that the Dictionary class is a strongly typed one, while the HashTable can accept arbitrary objects as values and is not as fast as the Dictionary class. Another very important difference is that the Dictionary collection class is not serializable, a topic discussed in detail in the following chapter. Practically, this means that a Dictionary collection can't be persisted to a disk file with a single statement, which is true for the HashTable collection.

VB 2010 at Work: The WordFrequencies Project

In this section, you'll develop an application that counts word frequencies in a text. The WordFrequencies application (available for download from www.sybex.com/go/masteringvb2010) scans text files and counts the occurrences of each word in the text. As you will see, the HashTable collection is the natural choice for storing this information because you want to access a word's frequency by using the actual word as the key. To retrieve (or update) the frequency of the word elaborate, for example, you will use this expression:

Words("ELABORATE").Value

where Words is a properly initialized HashTable object.

When the code runs into another instance of the word elaborate, it simply increases the matching item of the Words HashTable by one:

Words("ELABORATE").Value += 1

Arrays and Lists (or ArrayLists) are out of the question because they can't be accessed by a key. You could also use the SortedList collection (described later in this chapter), but this collection maintains its items sorted at all times. If you need this functionality as well, you can modify the application accordingly. The items in a SortedList are also accessed by keys, so you won't have to introduce substantial changes in the code.

Let me start with a few remarks. First, all words we locate in the various text files will be converted to uppercase. Because the keys of the HashTable are case sensitive, converting them to uppercase eliminates the usual problem of case sensitivity (hello being a different word than Hello and HELLO) by eliminating multiple possible variations in capitalization for the same word.

The frequencies of the words can't be calculated instantly because we need to know the total number of words in the text. Instead, each value in the HashTable is the number of occurrences of a specific word. To calculate the actual frequency of the same word, we must divide this value by the number of occurrences of all words, but this can happen only after we have scanned the entire text file and counted the occurrences of each word.

Figure 12.2 shows the application's interface. To scan a text file and process its words, click the Read Text File button. The Open dialog box will prompt you to select the text file to be processed, and the application will display in a message box the number of unique words read from the file. Then you can click the Show Word Count button to count the number of occurrences of each word in the text. The last button on the form calculates the frequency of each word and sorts the words according to their frequencies.

The WordFrequencies project demonstrates how to use the HashTable collection.

Figure 12.2. The WordFrequencies project demonstrates how to use the HashTable collection.

The application maintains a single HashTable collection, the Words collection, and it updates this collection rather than counting word occurrences from scratch for each file you open. The Frequency Table menu contains the commands to save the words and their counts to a disk file and read the same data from the file. The commands in this menu can store the data either to a text file (Save XML/Load XML commands) or to a binary file (Save Binary/Load Binary commands). Use these commands to store the data generated in a single session, load the data in a later session, and process more files.

The WordFrequencies application uses techniques and classes I haven't discussed yet. Serialization is discussed in detail in the next chapter, whereas reading from (or writing to) files is discussed in the tutorial "Accessing Folders and Files." (You'll find that tutorial online at www.sybex.com/go/masteringvb2010.) You don't really have to understand the code that opens a text file and reads its lines; just focus on the segments that manipulate the items of the HashTable.

To test the project, I used some large text files I downloaded from the Project Gutenberg website (http://promo.net/pg/). This site contains entire books in electronic format (plain-text files), and you can borrow some files to test any program that manipulates text. (Choose some books you will enjoy reading.)

The code reads the text into a string variable, and then it calls the Split method of the String class to split the text into individual words. The Split method uses the space, comma, period, quotation mark, exclamation mark, colon, semicolon, and new-line characters as delimiters. The individual words are stored in the Words array; after this array has been populated, the program goes through each word in the array and determines whether it's a valid word by calling the IsValidWord() function. This function returns False if one of the characters in the word is not a letter; strings such as B2B or U2 are not considered proper words. IsValidWord() is a custom function, and you can edit it as you wish to handle the specific content (my assumption that a word may not contain digits is quite reasonable to text files but totally wrong if you're handling code listings, for example).

Any valid word becomes a key to the wordFrequencies HashTable. The corresponding value is the number of occurrences of the specific word in the text. If a key (a new word) is added to the table, its value is set to 1. If the key exists already, its value is increased by 1 via the following If statement:

If Not wordFrequencies.ContainsKey(word) Then
   wordFrequencies.Add(word, 1)
Else
   wordFrequencies(word) = CType(wordFrequencies(word), Integer) + 1
End If

Listing 12.5 is the code that reads the text file and splits it into individual words. The code reads the entire text into a string variable, the txtLine variable, and the individual words are isolated with the Split method of the String class. The Delimiters array stores the characters that the Split method will use as delimiters, and you can add more delimiters depending on the type of text you're processing. If you're counting keywords in program listings, for example, you'll have to add the math symbols and parentheses as delimiters.

Example 12.5. Splitting a text file into words

Private Sub bttnRead_Click(...) Handles bttnRead.Click
   OpenFileDialog1.DefaultExt = "TXT"
   OpenFileDialog1.Filter = "Text|*.TXT|All Files|*.*"
   If OpenFileDialog1.ShowDialog() <>
            Windows.Forms.DialogResult.OK Then Exit Sub
   Dim str As StreamReader = File.OpenText(OpenFileDialog1.FileName)
   Dim txtLine As String
   Dim words() As String
   Dim Delimiters() As Char =
         {CType(" ", Char), CType(".", Char), CType(",", Char),
          CType("?", Char), CType("!", Char), CType(";", Char),
          CType(":", Char), Chr(10), Chr(13), vbTab}
   txtLine = str.ReadToEnd
   words = txtLine.Split(Delimiters)
   Dim uniqueWords As Integer
   Dim iword As Integer, word As String
   For iword = 0 To Words.GetUpperBound(0)
       word = words(iword).ToUpper
       If IsValidWord(word) Then
If Not wordFrequencies.ContainsKey(word) Then
               wordFrequencies.Add(word, 1)
               uniqueWords += 1
           Else
               wordFrequencies(word) =
                      CType(wordFrequencies(word), Integer) + 1
           End If
       End If
   Next
   MsgBox("Read " & words.Length & " words and found " &
           uniqueWords & " unique words")
   RichTextBox1.Clear()
End Sub

This event handler keeps track of the number of unique words and displays them in a RichTextBox control. In a document with 90,000 words, it took less than a second to split the text and perform all the calculations. The process of displaying the list of unique words in the RichTextBox control was very fast, too, thanks to the StringBuilder class. The code behind the Show Word Count button (see Listing 12.6) displays the list of words along with the number of occurrences of each word in the text.

Example 12.6. Displaying the count of each word in the text

Private Sub bttnCount_Click(...) Handles bttnCount.Click
   Dim wEnum As IDictionaryEnumerator
   Dim allWords As New System.Text.StringBuilder
   wEnum = WordFrequencies.GetEnumerator
   While wEnum.MoveNext
       allWords.Append(wEnum.Key.ToString &
               vbTab & "-->" & vbTab & _
               wEnum.Value.ToString & vbCrLf)
   End While
   RichTextBox1.Text = allWords.ToString
End Sub

The last button on the form calculates the frequency of each word in the HashTable, sorts the words according to their frequencies, and displays the list. Its code is detailed in Listing 12.7.

Example 12.7. Sorting the words according to frequency

Private Sub bttnShow_Click(...) Handles bttnSort.Click
   Dim wEnum As IDictionaryEnumerator
   Dim words(wordFrequencies.Count) As String
   Dim frequencies(wordFrequencies.Count) As Double
   Dim allWords As New System.Text.StringBuilder
Dim i, totCount As Integer
   wEnum = wordFrequencies.GetEnumerator
   While wEnum.MoveNext
       words(i) = CType(wEnum.Key, String)
       frequencies(i) = CType(wEnum.Value, Integer)
       totCount = totCount + Convert.ToInt32(Frequencies(i))
       i = i + 1
   End While
   For i = 0 To words.GetUpperBound(0)
       frequencies(i) = frequencies(i) / totCount
   Next
   Array.Sort(frequencies, Words)
   RichTextBox1.Clear()
   For i = words.GetUpperBound(0) To 0 Step −1
       allWords.Append(words(i) & vbTab & "-->" &
                           vbTab & Format(100 * frequencies(i),
                           "#.000") & vbCrLf)
   Next
   RichTextBox1.Text = allWords.ToString
End Sub

The SortedList Collection

The SortedList collection is a combination of the Array and HashTable classes. It maintains a list of items that can be accessed either with an index or with a key. When you access items by their indices, the SortedList behaves just like an ArrayList; when you access items by their keys, the SortedList behaves like a HashTable. What's unique about the SortedList is that this collection is always sorted according to the keys. The items of a SortedList are always ordered according to the values of their keys, and there's no method for sorting the collection according to the values stored in it.

To create a new SortedList collection, use a statement such as the following:

Dim sList As New SortedList

As you might have guessed, this collection can store keys that are of the base data types. If you want to use custom objects as keys, you must specify an argument of the IComparer type, which tells VB how to compare the custom items. This information is crucial; without it, the SortedList won't be able to maintain its items sorted. You can still store items in the SortedList, but they will appear in the order in which they were added. This form of the SortedList constructor has the following syntax, where comparer is the name of a custom class that implements the IComparer interface (which is discussed in detail later in this chapter):

Dim sList As New SortedList(New comparer)

There are also two more forms of the constructor, which allow you to specify the initial capacity of the SortedList collection, as well as a Dictionary object, whose data (keys and values) will be automatically added to the SortedList.

To add an item to a SortedList collection, use the Add method, whose syntax is the following, where key is the key of the new item and item is the item to be added:

sList.Add(key, item)

Both arguments are objects. But remember, if the keys are objects, the collection won't be automatically sorted; you must provide your own comparer, as discussed later in this chapter. The Add method is the only way to add items to a SortedList collection, and all keys must be unique; attempting to add a duplicate key will throw an exception.

The SortedList class also exposes the ContainsKey and ContainsValue methods, which allow you to find out whether a key or item already exists in the list. To add a new item, use the following statement to make sure that the key isn't in use:

If Not sList.ContainsKey(myKey) Then
   sList.Add(myKey, myItem)
End If

To replace an existing item, use the SetByIndex method, which replaces the value at a specific index. The syntax of the method is the following, where the first argument is the index at which the value will be inserted, and item is the new item to be inserted in the collection:

sList.SetByIndex(index, item)

This object will replace the value that corresponds to the specified index. The key, however, remains the same. There's no equivalent method for replacing a key; you must first remove the item and then insert it again with its new key.

To remove items from the collection, use the Remove and RemoveAt methods. The Remove method accepts a key as an argument and removes the item that corresponds to that key. The RemoveAt method accepts an index as an argument and removes the item at the specified index. To remove all the items from a SortedList collection, call its Clear method.

Other Collections

The System.Collections class exposes a few more collections, including the Queue and Stack collections. The main characteristic of these two collections is how you add and remove items to them. When you add items to a Queue collection, the items are appended to the collection. When you remove items, they're removed from the top of the collection. Queues are known as last in, first out (LIFO) structures because you can extract only the oldest item in the queue. You'd use this collection to simulate the customer line in a bank or a production line.

The Stack collection inserts new items at the top, and you can remove only the top item. The Stack collection is a first in, first out (FIFO) structure. You'd use this collection to emulate the stack maintained by the CPU, one of the most crucial structures for the operating system and applications alike. The Stack and Queue collections are used heavily in computer science but hardly ever in business applications, so I won't discuss them further in this book.

The IEnumerator and IComparer Interfaces

IEnumerator and IComparer are two classes that unlock some of the most powerful features of collections. The proper term for IEnumerator and IComparer is interface, a term I will describe shortly. Every class that implements the IEnumerator interface is capable of retrieving a list of pointers for all the items in a collection, and you can use this list to iterate through the items in a collection. Every collection has a built-in enumerator, and you can retrieve it by calling its GetEnumerator method. And every class that implements the IComparer interface exposes the Compare method, which tells the compiler how to compare two objects of the same type. After the compiler knows how to compare the objects, it can sort a collection of objects with the same type.

The IComparer interface consists of a function that compares two items and returns a value indicating their order (which one is the smaller item or whether they're equal). The Framework can't compare objects of all types; it knows only how to compare the base types. It doesn't even know how to compare built-in objects such as two rectangles or two color objects. If you have a collection of colors, you might want to sort them according to their luminance, saturation, brightness, and so on. Rectangles can be sorted according to their area or perimeter. The Framework can't make any assumptions as to how you might want to sort your collection, and of course, it doesn't expose members to sort a collection in all possible ways. Instead, it gives you the option to specify a function that compares two colors (or two objects of any other type, for that matter) and uses this function to sort the collection. The same function is used by the BinarySearch method to locate an item in a sorted collection. In effect, the IComparer interface consists of a single function that knows how to compare two specific custom objects.

So, what is an interface? An interface is another term in object-oriented programming that describes a very simple technique. When you write the code for a class, you might not know how to implement a few operations, but you do know that they'll have to be implemented later. You insert a placeholder for these operations (one or more function declarations) and expect that the application that uses the class will provide the actual implementation of these functions. All collections expose a Sort method, which sorts the items in the collection by comparing them to one another. To do so, the Sort method calls a function that compares two items and returns a value indicating their relative order. Custom objects must provide their own comparison function—or more than a single function, if you want to sort them in multiple ways. Because you can't edit the collection's Sort method code, you must supply your comparison function through a mechanism that the class can understand. This is what the IComparer interface is all about.

Enumerating Collections

All collections expose the GetEnumerator method. This method returns an object of the IEnumerator type, which allows you to iterate through the collection without having to know anything about its items, not even the count of the items. To retrieve the enumerator for a collection, call its GetEnumerator method by using a statement like the following:

Dim ALEnum As IEnumerator
ALEnum = aList.GetEnumerator

The IEnumerator class exposes two methods: the MoveNext and Reset methods. The MoveNext method moves to the next item in the collection and makes it the current item (property Current). When you initialize the IEnumerator object, it's positioned in front of the very first item, so you must call the MoveNext method to move to the first item. The Reset method does exactly the same thing: It repositions the IEnumerator in front of the first element.

The MoveNext method doesn't return an item, as you might expect. It returns a True/False value that indicates whether it has successfully moved to the next item. After you have reached the end of the collection, the MoveNext method will return False. Here's how you can enumerate through a List collection by using an enumerator:

Dim listItems As IEnumerator
listItems = list.GetEnumerator
While listItems.MoveNext
   { process item listItems.Current}
End While

At each iteration, the current item is given by the Current property of the enumerator. After you reach the last item, the MoveNext method will return False, and the loop will terminate. To rescan the items, you must reset the enumerator by calling its Reset method.

The event handler in Listing 12.8 populates an ArrayList with Rectangle objects and then iterates through the collection and prints the area of each Rectangle using the collection's enumerator.

Example 12.8. Iterating an ArrayList with an enumerator

Dim aList As New ArrayList()
Dim R1 As New Rectangle(1, 1, 10, 10)
aList.Add(R1)
R1 = New Rectangle(2, 2, 20, 20)
aList.Add(R1)
aList.add(New Rectangle(3, 3, 2, 2))
Dim REnum As IEnumerator
REnum = aList.GetEnumerator
Dim R As Rectangle
While REnum.MoveNext
   R = CType(REnum.Current, Rectangle)
   Debug.WriteLine((R.Width * R.Height).ToString)
End While

The REnum variable is set up and used to iterate through the items of the collection. At each iteration, the code stores the current Rectangle to the R variable, and it uses this variable to access the properties of the Rectangle object (its width and height).

Of course, you can iterate a collection without the enumerator, but with a For Each...Next or For Each loop. To iterate through a HashTable, you can use either the Keys or Values collection. The code shown in Listing 12.9 populates a HashTable with Rectangle objects. Then it scans the items and prints their keys, which are strings, and the area of each rectangle. Note how it uses each item's Key property to access the corresponding item in the collection.

Example 12.9. Iterating a HashTable with its keys

Dim hTable As New HashTable()
Dim r1 As New Rectangle(1, 1, 10, 10)
hTable.Add("R1", r1)
r1 = New Rectangle(2, 2, 20, 20)
hTable.Add("R2", r1)
hTable.add("R3", New Rectangle(3, 3, 2, 2))
Dim key As Object
Dim R As Rectangle
For Each key In hTable.keys
   R = CType(hTable(key), Rectangle)
   Debug.WriteLine(String.Format( _
           "The area of Rectangle {0} is {1}" _
            key.ToString, R.Width * R.Height))
Next

The code adds three Rectangle objects to the HashTable and then iterates through the collection using the Keys properties. Each item's key is a string (R1, R2, and R3). The Keys property is itself a collection and can be scanned with a For Each...Next loop. At each iteration, you access a different item through its key with the expression hTable(key). The output produced by this code is shown here:

The area of Rectangle R1 is 100
The area of Rectangle R3 is 4
The area of Rectangle R2 is 400

Alternatively, you can iterate a HashTable with an enumerator, but be aware that the GetEnumerator method of the HashTable collection returns an object of the IDictionaryEnumerator type, not an IEnumerator object. The IDictionaryEnumerator class is quite similar to the IEnumerator class, but it exposes additional properties. They are the Key and Value properties, and they return the current item's key and value. The IDictionaryEnumerator class also exposes the Entry property, which contains both the key and the value.

Assuming that you have populated the hTable collection with the same three Rectangle objects, you can use the statements in Listing 12.10 to iterate through the collection's items.

Example 12.10. Iterating a HashTable with an enumerator

Dim hEnum As IDictionaryEnumerator
hEnum = hTable.GetEnumerator
While hEnum.MoveNext
   Debug.WriteLine(
            String.Format("The area of rectangle " &
            "{0} is {1}", hEnum.Key, _
            CType(hEnum.Value, Rectangle).Width *
            CType(hEnum.Value, Rectangle).Height))
End While

If you execute these statements after populating the HashTable collection with three Rectangles, they will produce the same output as Listing 12.8.

Custom Sorting

The Sort method of the various collections allows you to sort collections, as long as the items are of the same base data type. If the items are objects, however, the collection doesn't know how to sort them. If you want to sort objects, you must help the collection a little by telling it how to compare the objects. A sorting operation is nothing more than a series of comparisons. Sorting algorithms compare items and swap them if necessary.

All the information needed by a sorting algorithm to operate on an item of any type is a function that compares two objects of the same custom type. Let's say you have a list of persons, and each person is a structure that contains names, addresses, e-mail addresses, and so on. The System.Collections class can't make any assumptions as to how you want your list sorted, not to mention that a collection can be sorted by any field (name, e-address, postal code, and so on).

The comparer is implemented as a separate class, outside all other classes in the project, and is specific to a custom data type. Let's say you have created a custom structure for storing contact information. The Person object is declared as a structure with the following fields:

Structure Person
   Dim Name As String
   Dim BirthDate As Date
   Dim EMail As String
End Structure

You'll probably build a class to represent persons, but I'm using a structure to simplify the code. To add an instance of the Person object to a collection, create a variable of Person type, initialize its fields, and then add it to the appropriate collection (a List, or Dictionary collection, for example). This collection can't be sorted with the simple forms of the Sort method because the compiler doesn't know how to compare two Person objects. You must provide your own function for comparing two variables of the Person type. After this function is written, the compiler can call it to compare items and therefore sort the collection. This custom function, however, can't be passed to the Sort and BinarySearch methods by name. You must create a new class that implements the IComparer interface and pass an instance of this class to the two methods.

Implementing the IComparer Interface

Here's the outline of a class that implements the IComparer interface:

Class customComparer : Implements IComparer
   Public Function Compare(
              ByVal o1 As Object, ByVal o2 As Object)
              As Integer Implements IComparer.Compare
      { function's code }
   End Function
End Class

The name of the class can be anything; I'm using the name customComparer to indicate that the class compares custom types. It should be a name that indicates the type of comparison it performs or the type of objects it compares. After the class declaration, you must specify the interface implemented by the class. As soon as you type the first line of the preceding code segment, the editor will insert automatically the stub of the Compare() function. That's because every class that implements the IComparer interface must provide a Compare method with the specific signature. The interface declares a placeholder for a function, whose code must be provided by the developer.

To use the custom function to compare items, you must create an object of the customComparer type (or whatever you have named the class) and then pass it to the Sort and BinarySearch methods as an argument:

Dim CompareThem As New customComparer
aList.Sort(CompareThem)

You can combine the two statements in one by initializing the customComparer variable in the line that calls the Sort method:

aList.Sort(New customComparer)

You can also use the equivalent syntax of the BinarySearch method to locate a custom object that implements its own IComparer interface:

aList.BinarySearch(object, New customComparer)

This is how you can use a custom function to compare two objects. Everything is the same, except for the name of the comparer, which is different every time.

The last step is to implement the function that compares the two objects and returns an integer value, indicating the order of the elements. This value should be −1 if the first object is smaller than the second object, 0 if the two objects are equal, and 1 if the first object is larger than the second object. Smaller here means that the element appears before the larger one when sorted in ascending order. Listing 12.11 is the function that sorts Person objects according to the BirthDate field. The sample code for this and the following section comes from the CustomComparer project (available for download from www.sybex.com/go/masteringvb2010). The main form contains a single button, which populates the collection and then prints the original collection, the collection sorted by name, and the collection sorted by birth date.

Example 12.11. A custom comparer

Class PersonAgeComparer : Implements IComparer
  Public Function Compare(
      ByVal o1 As Object, ByVal o2 As Object)
      As Integer Implements IComparer.Compare
     Dim person1, person2 As Person
     Try
        person1 = CType(o1, Person)
        person2 = CType(o2, Person)
     Catch compareException As system.Exception
        Throw (compareException)
        Exit Function
     End Try
     If person1.BirthDate < person2.BirthDate Then
        Return −1
     Else
        If person1.BirthDate > person2.BirthDate Then
           Return 1
        Else
           Return 0
        End If
     End If
  End Function
End Class

The code could have been considerably simpler, but I'll explain momentarily why the Try statement is necessary. The comparison takes place in the If statement. If the first person's birth date is chronologically earlier than the second person's, the function returns the value −1. If the first person's birth date is chronologically later than the second person's birth date, the function returns 1. Finally, if the two values are equal, the function returns 0.

The code is straightforward, so why the error-trapping code? Before you perform the necessary comparisons, you convert the two objects into Person objects. It's not unthinkable that the collection with the objects you want to sort contains objects of different types. If that's the case, the CType() function won't be able to convert the corresponding argument to the Person type, and the comparison will fail. The same exception that would be thrown in the function's code is raised again from within the error handler, and it's passed back to the calling code.

Implementing Multiple Comparers

Person objects can be sorted in many ways. You might want to sort them by ID, name, age, and so on. To accommodate multiple sorts, you must implement several classes, each one with a different implementation of the Compare() function. Listing 12.12 shows two classes that implement two different Compare() functions for the Person class. The PersonNameComparer class compares the names, whereas the PersonAgeComparer class compares the ages. Both classes, however, implement the IComparer interface.

Example 12.12. A class with two custom comparers

Class PersonNameComparer : Implements IComparer
  Public Function Compare(
            ByVal o1 As Object, ByVal o2 As Object)
            As Integer Implements IComparer.Compare
     Dim person1, person2 As Person
     Try
        person1 = CType(o1, Person)
        person2 = CType(o2, Person)
     Catch compareException As system.Exception
        Throw (compareException)
        Exit Function
     End Try
     If person1.Name < person2.Name Then
        Return −1
     Else
        If person1.Name > person2.Name Then
           Return 1
        Else
           Return 0
        End If
     End If
  End Function
End Class

Class PersonAgeComparer : Implements IComparer
  Public Function Compare(
            ByVal o1 As Object, ByVal o2 As Object)
            As Integer Implements IComparer.Compare
     Dim person1, person2 As Person
     Try
        person1 = CType(o1, Person)
        person2 = CType(o2, Person)
     Catch compareException As system.Exception
        Throw (compareException)
Exit Function
     End Try
     If person1.BDate > person2.BDate Then
        Return −1
     Else
        If person1.BDate < person2.BDate Then
           Return 1
        Else
           Return 0
        End If
     End If
  End Function
End Class

To test the custom comparers, create a new application, and enter the code of Listing 12.12 (it contains two classes) in a separate Class module. Don't forget to include the declaration of the Person structure. Then place a button on the form and enter the code of Listing 12.13 in its Click event handler. This code adds three persons with different names and birth dates to a List collection.

Example 12.13. Testing the custom comparers

Private Sub Button1_Click(...) Handles Button1.Click
  Dim Lst As New List()
  Dim p As Person
  ' Populate collection
  p.Name = "C Person"
  p.EMail = "[email protected]"
  p.BDate = #1/1/1961#
  If Not Lst.Contains(p) Then Lst.Add(p)
  p.Name = "A Person"
  p.EMail = "[email protected]"
  p.BDate = #3/3/1961#
  If Not Lst.Contains(p) Then Lst.Add(p)
  p.Name = "B Person"
  p.EMail = "[email protected]"
  p.BDate = #2/2/1961#
  If Not Lst.Contains(p) Then Lst.Add(p)
  ' Print collection as is
  Dim PEnum As IEnumerator
  PEnum = Lst.GetEnumerator
  ListBox1.Items.Add("Original Collection")
  While PEnum.MoveNext
     ListBox1.Items.Add(
          CType(PEnum.Current, Person).Name &
          vbTab & CType(PEnum.Current, Person).BDate)
  End While
  ' Sort by name, then print collection
ListBox1.Items.Add(" ")
  ListBox1.Items.Add("Collection Sorted by Name")
  Lst.Sort(New PersonNameComparer())
  PEnum = Lst.GetEnumerator
  While PEnum.MoveNext
     ListBox1.Items.Add(
          CType(PEnum.Current, Person).Name &
          vbTab & CType(PEnum.Current, Person).BDate)
  End While
  ' Sort by age, then print collection
  ListBox1.Items.Add(" ")
  ListBox1.Items.Add("Collection Sorted by Age")
  Lst.Sort(New PersonAgeComparer())
  PEnum = Lst.GetEnumerator
  While PEnum.MoveNext
     ListBox1.Items.Add(
          CType(PEnum.Current, Person).Name &
          vbTab & CType(PEnum.Current, Person).BDate)
  End While
End Sub

The four sections of the code are delimited by comments in the listing. The first section populates the collection with three variables of the Person type. The second section prints the items in the order in which they were added to the collection:

C Person   1/1/1961
A Person   3/3/1961
B Person   2/2/1961

The third section of the code calls the Sort method, passing the PersonNameComparer custom comparer as an argument, and it again prints the contents of the List. The names are listed now in alphabetical order:

A Person   3/3/1961
B Person   2/2/1961
C Person   1/1/1961

In the last section, it calls the Sort method again—this time to sort the items by age—and prints them:

C Person   1/1/1961
B Person   2/2/1961
A Person   3/3/1961

It is straightforward to write your own custom comparers and sort your custom object in any way that suits your application. Custom comparisons might include more complicated calculations, not just comparisons. For example, you can sort Rectangles by their area, color values by their hue or saturation, and customers by the frequency of their orders.

The Bottom Line

Make the most of arrays.

The simplest method of storing sets of data is to use arrays. They're very efficient, and they provide methods to perform advanced operations such as sorting and searching their elements. Use the Sort method of the Array class to sort an array's elements. To search for an element in an array, use the IndexOf and LastIndexOf methods, or use the BinarySearch method if the array is sorted. The BinarySearch method always returns an element's index, which is a positive value for exact matches and a negative value for near matches.

Master It

Explain how you can search an array and find exact and near matches.

Store data in collections such as List and Dictionary collections.

In addition to arrays, the Framework provides collections, which are dynamic data structures. The most commonly used collections are the List and the Dictionary. Collections are similar to arrays, but they're dynamic structures. List and ArrayList collections store lists of items, whereas Dictionary and HashTable collections store key-value pairs and allow you to access their elements via a key. You can add elements by using the Add method and remove existing elements by using the Remove and RemoveAt methods.

Dictionary collections provide the ContainsKey and ContainsValue methods to find out whether the collection already contains a specific key or value as well as the GetKeys and GetValues methods to retrieve all the keys and values from the collection, respectively. As a reminder, the List and Dictionary collections are strongly typed. Their untyped counterparts are the ArrayList and HashTable collections.

Master It

How will you populate a Dictionary with a few pairs of keys/values and then iterate though the collection's items?

Sort and search collections with custom comparers.

Collections provide the Sort method for sorting their items and several methods to locate items: IndexOf, LastIndexOf, and BinarySearch. Both sort and search operations are based on comparisons, and the Framework knows how to compare values' types only (Integers, Strings, and the other primitive data types). If a collection contains objects, you must provide a custom function that knows how to compare two objects of the same type.

Master It

How do you specify a custom comparer function for a collection that contains Rectangle objects?

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.142.212.160