1.16. The Hashtable Container

The System.Collections namespace provides a Hashtable container. A Hashtable represents a key/value pair in which the key is used for fast lookup. In other languages, we might call our table map or Dictionary. We'll use the Hashtable object to hold the occurrence count of the individual words.

We'll use the word as the key. The occurrence count is the value. If the word is not yet entered in the table, we add the pair, setting the occurrence count to 1. Otherwise, we use the key to retrieve and increment the occurrence count. Here is what this looks like:

Hashtable words = new Hashtable();
int dim1_length = sentences.GetLength( 0 );

for( int ix = 0; ix < dim1_length; ++ix )
{
     foreach ( string st in sentences[ ix ] )
     {
         // normalize each word to lowercase
         string key = st.ToLower();

         // is the word currently in Hashtable?
         // if not, then we add it ...

            if ( ! words.Contains( key ))
                   words.Add( key, 1 );

         // otherwise, we increment the count
            else
                 words[ key ] = (int) words[ key ] + 1;
     }
}

To discover if a key is present in a Hashtable, we invoke its predicate Contains() method, which returns true if the entry is found. We add the key/value pair to a Hashtable either by an explicit assignment:

words[ key ] = 1;

or through the Add() method:

words.Add( key, 1 );

Typically, the key type is a string or one of the built-in numeric types. The value type, which holds the actual data to be retrieved, is generally more application specific—often a class implemented to support the problem domain. The Hashtable can support different key and value types because it declares both its key and value members to be of type object.

However, because Hashtable stores its value as an object type, we must explicitly cast it back to its original type. For a value type, recall, this requires unboxing. Any modification to the unboxed value is not automatically reflected in the stored instance, so we must reset it:

words[ key ] = (int) words[ key ] + 1;

Keeping track of trivial words, such as a, an, if, but, and so on, may or may not be useful. One strategy for eliminating these words from our occurrence count is to create a second Hashtable of common words—for example,

Hashtable common_words = new Hashtable();

common_words.Add( "the", 0 );
common_words.Add( "but", 0 );
// ...
common_words.Add( "and", 0  );

and to check each word against this table before entering it into our main table:

foreach ( string st in sentences[ ix ])
{
    string key = st.ToLower();
    if ( common_words.Contains( key ))
         continue;
}

All that's left for the completion of our program is to print out the occurrence count of the words to the output file in dictionary order. How do we do that?

Our first thought is to iterate across the Hashtable using a foreach loop. To do that, we access each key/value pair in turn through a DictionaryEntry object, which provides a Key and a Value pair of properties:

foreach ( DictionaryEntry de in word )
     fwriter.WriteLine( "{0} : {1}", de.Key, de.Value );

As with many programming solutions, this both works and doesn't work. The good news is that it prints out the word and occurrence count of each element. The bad news is that it doesn't do so in dictionary order. For example, here are the first few entries:

lime-tinted : 1
waist : 1
bold : 1

The problem is that the key is inserted within the Hashtable based on the key's hash value, and we cannot directly override that. One solution is to access the key values, place them in a sortable container, and then sort and iterate over that container. For each now sorted key, we retrieve the associated value:

ArrayList aKeys = new ArrayList( words.Keys );
aKeys.Sort();

foreach ( string key in aKeys )
    fwriter.WriteLine( "{0} :: {1}", key, words[ key ] );

This solves the final sticking point in our solution:

apron :: 1
around :: 1
blossoms :: 3

The IDictionary interface provides an abstract model for the storage and retrieval of key/value pairs. (We look at interfaces in detail in Chapter 4.) The Hashtable implements this interface, in which both the key and the value are defined to be of type object. Several strongly typed, specialized classes defined under System.Collections.Specialized are also available. These include

  • StringDictionary: a hash table with the key strongly typed to be a string rather than an object. The key is treated as case insensitive.

  • ListDictionary: an IDictionary implementation using a singly-linked list. If the number of elements is ten or less, it is smaller and faster than a Hashtable.

  • NameValueCollection: a sorted collection of string keys and string values. These can be accessed through either a key hash or an index. NameValueCollection stores multiple string values under a single key.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.218.151.44