Chapter 5. Unordered Associative Containers

The more laws and order are made prominent,
The more thieves and robbers there will be.

The Way of Lao-tzu
LAO-TZU

Suppose that you’ve been asked to write the symbol table for a compiler. Whenever the code being compiled defines a new symbol, its name and its associated type information have to be stored in the symbol table. Whenever the code uses a name, the compiler has to find that name in the symbol table and dig out its properties. This is a typical job for an associative container: You have the value of a key, and you need to find the data associated with that key. With the current standard C++ library, you can do this with a map:


typedef std::string name;
struct type_data { type_id type; unsigned flags; };
std::map<name, type_data> sym_tab;

But compiler writers are speed freaks, and doing an O(logn) lookup, as map does, in a symbol table drives them nuts. So, after you show this to them and the screaming dies down, you rewrite your symbol table like this:


typedef std::string name;
struct type_data { type_id type; unsigned flags; };
std::tr1::unordered_map<name, type_data> sym_tab;

Now you’re providing constant-time lookups, and the compiler guys are happy.[1]

5.1. Standardizing Hash Tables

One of the most commonly requested additions to the standard C++ library is hash tables. Properly used, they can provide significant speed improvements in searches. A 1995 proposal to add STL-style hash tables to the standard library was rejected because the target date for completion of the standard did not leave enough time for proper evaluation and adaptation of the proposed changes. Now that there’s time, the TR1 library provides hash tables, under the names unordered_map, unordered_multimap, unordered_set, and unordered_multiset.[2] These template classes are defined in the headers <unordered_map> and <unordered_set>. In addition to these four containers, the TR1 library provides a template class named hash that provides the default hash functions for these containers.

These containers don’t quite fit into the existing set of container requirements, so TR1 provides a new set of requirements for what it formally refers to as “unordered associative containers.” Don’t let the name confuse you: Their requirements are slightly different from the requirements for associative containers, so the two kinds aren’t quite interchangeable. We’ll look at their differences in more detail a little later.

One of the biggest problems in designing a generic interface to a hash table is choosing the right set of parameters for tuning the operation of the resulting objects. With too few parameters, the template is too inflexible for broad use. With too many, it’s too flexible, and it becomes difficult to know what it’s really doing. TR1’s unordered containers can be tuned at compile time by selecting suitable type arguments for hashing data values and for comparing data values for equality. They can be tuned at runtime by setting the maximum allowable load factor, which determines how many objects can be inserted in the container without forcing it to reshuffle them into more buckets, and by calling the member function rehash, which forces that reshuffling.

5.2. Hash Tables

A hash table stores elements in buckets. Each bucket can hold zero or more elements. The hash table chooses a bucket for an element, based on the hash value for that element. The hash value for an element is determined by passing that element to a hash function. If this is done well, a hash table can provide constant-time searches.

Example 5.1. Hash Table (contunord/hashtable.cpp)


#include <algorithm>
#include <array>
#include <iomanip>
#include <iostream>

#include <iterator>
#include <limits>
#include <list>
using std::tr1::array;
using std::copy; using std::find;
using std::ostream_iterator;
using std::list;
using std::cout; using std::setw;
using std::numeric_limits;

typedef list <int> bucket;
typedef array <bucket, 5> table;

size_t hash (int i)
  { // return hash value for i
  return i;
  }

void show (const table& tbl)
  { // show contents of buckets in table
  for (int i = 0; i <tbl.size (); ++i)
    { // show contents of bucket i
    cout << "bucket " << setw (2) << i <<":";
    copy (tbl [i]. begin (), tbl [i]. end (),         
      ostream_iterator <int >(cout,""));         
    cout  <<  ' ';                             
    }                                                 
  }                                                   
                                                      
void insert (table& tbl, int val)                 
  { // insert val into table     
  size_t hash_val = hash (val) % tbl.size ();         
  tbl [ hash_val ]. push_back (val);                  
  }                                                   
                                                      
bool contains (const table& tbl, int val)         
  { // return true if tbl contains val
  int hash_val = hash (val) % tbl.size ();            
  bucket::const_iterator  first = tbl [ hash_val ]. begin ();
  bucket::const_iterator  last = tbl [ hash_val ]. end ();
  return find (first, last, val) != last;             
  }                                                   
                                                      
void report (const table& tbl, int val)           
  { // report whether tbl contains val

  cout << "table"          
    << (contains (tbl, val)? "contains"         
       :"does not contain")                            
    << val << ' ';                       
  }                                                   
                                                      
int main ()                                           
  { // demonstrate simple hash table
  table tbl;                                          
  insert (tbl, 3);                                    
  insert (tbl, 195);                                  
  insert (tbl, 5);                                    
  insert (tbl, 6);                                    
  insert (tbl, 55);                                   
  insert (tbl, 1);                                    
  insert (tbl, 33);                                   
  insert (tbl, 40);                                   
  show (tbl);                                         
  report (tbl, 3);                                    
  report (tbl, 4);                                    
  return 0;                                           
  }                                                   


This code stores integer values in a hash table with five buckets, each of which is an object of type std::list<int>. The hash function, hash, takes an argument of type int and simply converts it to a value of type size_t.[3] The function insert calls hash with the value to be inserted, reduces the result modulo the size of the container, and uses the result as an index to determine which linked list to append the value to. As you can see, the time it takes to insert a new element into this hash table does not depend on how may elements are already in the list. Thus, insertion into this table is O(1).

Searching this table for a value is more complicated. The function contains calls hash with the target value, reduces the result modulo the size of the container, and uses the result as an index to determine which linked list to search. The function then calls the standard algorithm find to look for the target value in that linked list. In looking for the target value, find walks through the list’s controlled sequence until it finds a matching value or reaches the end of the sequence.[4] Thus, the time needed to find an element is proportional to the number of elements in the linked list. That’s not a problem if only a few elements are in each linked list, but with a fixed number of linked lists, as in this particular version of a hash table, the linked lists become proportionately longer as we add elements to the table. On average, each linked list will have n/M elements, where n is the number of elements in the table and M is the number of buckets. When n/M is large, the time needed to find an element is proportional to the number of elements in the table, so is much longer than the constant-time lookup that hash tables are capable of.

The key to keeping hash table searches fast is to keep the number of elements in each bucket as low as is reasonable. Since the average number of elements in each bucket is n/M, this means that we need to keep that ratio low. To do this, as n increases, we also need to increase M. That is, we need to add more buckets to the table and redistribute the elements held in the table throughout the new set of buckets. This process, known as rehashing, is so important to maintaining the speed of hash tables that TR1 hash tables are automatically rehashed whenever the average number of elements in each bucket exceeds the table’s load factor.

5.3. Associative Containers and Unordered Containers

The C++ standard defines four standard categories of container types: containers, reversible containers, sequence containers, and associative containers. If you’re careful, these abstract categories allow you to design an application that can be implemented with one of several different container types, leaving you the flexibility to select the container to use later on, when you know more about the characteristics of the real-world data that the application will handle. If your application uses only operations defined for a particular container category, you can use any container type that satisfies the requirements for that category. For example, we saw in Chapter 4 that the TR1 template class array is a reversible container but not a sequence container. It can be used in place of a list, a vector, or a deque in any application that relies only on the operations defined for a reversible container.

TR1 adds a fifth container category. Unordered containers satisfy all the requirements for containers but do not support any of the six comparison operators.[5] Unordered containers are not reversible, so they also do not have the member functions rbegin and rend, which return reverse iterators. Other than that, unordered containers provide the same set of operations as associative containers. In addition, unordered containers have a set of operations for examining the contents of individual buckets and tuning the operation of the container.

5.4. Requirements for Unordered Containers

5.4.1. Container Requirements

According to the definition in the C++ standard, a container must support the requirements set out in this section.

A container type X must provide the following nested type names:

X::value_type: the type of the objects held in the container.

X::reference: the type of an lvalue of X::value_type.

X::const_reference: the type of a const lvalue of X::value_type.

X::const_iterator: a constant iterator type whose value type is X::value_type. It can belong to any iterator category except output iterator.

X::iterator: an iterator type whose value type is X::value_type. It can belong to any iterator category except output iterator, and objects of type X::iterator must be convertible to type X::const_iterator.

X::difference_type: the same type as iterator_traits<X::iterator>::difference_type.

X::size_type: an unsigned integer type that can represent any non-negative value of X::difference_type.

The following expressions must be valid and must have the given meaning. Here, a and b are objects of type X, and r has type X&.

X();X u;: constructs an object with an empty controlled sequence.

X(a);X u(a); X u = a;: constructs an object that is a copy of a.

(&a)-> X();: the destructor destroys each object in the controlled sequence and frees any memory allocated by a.

a.begin(); a.end();:for a const object a, returns an iterator object of type X::const_iterator; otherwise, returns an iterator object of type X::iterator. The range [a.begin(), a.end()) consists of iterators that point at each of the objects in a.

a.swap(b);: swaps the contents of a and b.

r=a;: the function returns X&, replacing r’s controlled sequence with a copy of b’s controlled sequence.

a.size();: returns a value of type X::size_type that holds the number of objects in a’s controlled sequence.

a.max_size();: returns a value of type X::size_type that holds the largest possible number of elements in a container of type X.

a.empty();: returns a value of type bool that holds the value of a.size() == 0.

The following expressions must be valid and must have the given meaning:

a==b: returns a value that is convertible to bool, with the value true only if a.size() == b.size() and std::equal(a.begin(), a.end(), b.begin()).

a != b: returns a value that is convertible to bool and holds the value !(a == b).

a < b: returns a value convertible to bool and holds the value returned by the call std::lexicographical_compare(a.begin(), a.end(), b.begin(), b.end()). If this expression is used, then for two objects c and d of type X::value_type, the expression c<d must be defined and must be a total ordering relationship.

a>b: returns a value that is convertible to bool and holds the value b < a.

a<=b: returns a value that is convertible to bool and holds the value !(a>b).

a>=b: returns a value that is convertible to bool and holds the value !(a<b).

5.4.2. Unordered Container Requirements

According to the definition in TR1, an unordered associative container must meet all the requirements for a container, as described in Section 5.4.1, except for the equality and inequality operations at the end. In addition, unordered associative containers must meet the requirements in this section.

Unordered associative containers must provide the following nested type names:

key_type: the type of the container’s keys.

key_equal: the type of the object used to compare keys for equality.

hasher: the type used to produce hash values.

local_iterator: an iterator type used to iterate within a bucket. The members of iterator_traits<local_iterator> must name the same types as the members of iterator.

const_ local_iterator: an iterator type used to iterate within a bucket. The members of iterator_traits<const_local_iterator> must name the same types as the members of const_iterator.

In the following lists, X is an unordered associative container type, a is an object of type X, b is an object of type X or const X, first and last are input iterators that point at objects of type X::value_type and [first, last) is a valid range, hf is an object of type X::hasher or const X::hasher, eq is an object of type X::key_equal or const X::key_equal, and n is a value of type X::size_type.

Unordered associative containers provide constructors that support the following code constructs.

X(n, hf, eq); X a(n, hf, eq); constructs an object with at least n buckets, holding no objects. The container will use a copy of hf to compute hash values and a copy of eq to compare keys. Its time complexity is O(n).

X(n, hf); X a(n, hf); constructs an object with at least n buckets, holding no objects. The container will use a copy of hf to compute hash values and key_equal() to compare keys. Its time complexity is O(n).

X(n); X a(n); constructs an object with at least n buckets, holding no objects. The container will use hasher() to compute hash values and key_equal() to compare keys. Its time complexity is O(n).

X(); X a; constructs an object with an unspecified number of buckets, holding no objects. The container will use hasher() to compute hash values and key_equal() to compare keys. Its time complexity is O(1).

X(first, last, n, hf, eq); X a(first, last, n, hf, eq); constructs an object with at least n buckets, holding no objects, then inserts the objects in the range [first, last) into the container. The container will use a copy of hf to compute hash values and a copy of eq to compare keys. Its average time complexity is O(N), where N is distance(first, last). Its worst-case time complexity is O(N2).

X(first, last, n, hf); X a(first, last, n, hf); constructs an object with at least n buckets, holding no objects, then inserts the objects in the range [first, last) into the container. The container will use a copy of hf to compute hash values and key_equal() to compare keys. Its average time complexity is O(N), where N is distance(first, last). Its worst-case time complexity is O(N2).

X(first, last, n); X a(first, last, n); constructs an object with at least n buckets, holding no objects, then inserts the objects in the range [first, last) into the container. The container will use hasher() to compute hash values and key_equal() to compare keys. Its average time complexity is O(N), where N is distance(first, last). Its worst-case time complexity is O(N2).

X(first, last); X a(first, last); constructs an object with an unspecified number of buckets, holding no objects, then inserts the objects in the range [first, last) into the container. The container will use hasher() to compute hash values and key_equal() to compare keys. Its average time complexity is O(N), where N is distance(first, last). Its worst-case time complexity is O(N2).

X(b); X a(b); constructs an object that is a copy of b, including copies of b’s hash object, equality predicate, and maximum load factor. Its average time complexity is O(b.size()), and its worst-case time complexity is O(b.size()2).

a = b; replaces the contents of a with the contents of b, including copies of b’s hash object, equality predicate, and maximum load factor. Its average time complexity is O(b.size()), and its worst-case time complexity is O(b.size()2).

The following functions insert single objects into an unordered associative container. If the container does not allow duplicate keys, the functions return an object of type pair<X::iterator, bool>. The iterator object points at the element that compares equal to the function’s argument. The bool value indicates whether the object was newly inserted—which happens only when there is no object equal to the argument in the container—or whether the object was already present. The object q is a valid, dereferenceable iterator that points at an element of a, and r is a valid, dereferenceable const iterator that points at an element of a. The average time complexity of these functions is O(1), and their worst-case time complexity is O(a.size()).

a.insert(t); inserts an object equal to t into a.

a.insert(q, t); inserts an object equal to t into a, The iterator q points at an element that might be close to the position where t belongs.

a.insert(r, t); inserts an object equal to t into a, The iterator r points at an element that might be close to the position where t belongs.

This function inserts multiple objects into an unordered associative container.

a.insert(i, j); the iterators i and j must not point at objects in a. The function returns void. It calls a.insert(t) for each of the objects pointed at by the iterators in the range [i, j).

The following functions remove objects from an unordered associative container. The objects q1 and q2 are objects of type X::iterator that point at elements in the container, and the objects r1 and r2 are objects of type X::const_iterator that point at elements of the container.

a.erase(k); the function returns a value of type X::size_type. It removes all elements whose key compares equal to the Key value k and returns the number of elements that were removed. Its average time complexity is O(a count(k)), and its worst-case time complexity is O(a.size()).

a.erase(q); a.erase(r); the first function returns an object of type X::iterator and the second function returns an object of type X:: const_iterator. The functions remove the element pointed at by q and r, respectively, and return an iterator object that points at the element that followed that element. Their average time complexity is O(1), and their worst-case time complexity is O(a.size()).

a.erase(q1, q2); a.erase(r1, r2); the first function returns an object of type X::iterator. The second function returns an object of type X::const_iterator. The functions remove the elements pointed at by the iterators in the range [q1, q2) and [r1, r2), respectively, and return an iterator object that points at the element that followed the erased elements. Their average time complexity is O(distance(q1,q2)) and O(distance(r1,r2)), respectively, and their worst-case time complexity is O(a.size()).

a.clear(); the function returns void. It removes all the elements from the container. Its time complexity is O(a.size()).

The following functions search an unordered associative container for objects that match a given key k:

b.find(k); for a const object b, returns an iterator object of type X::const_iterator;otherwise, returns an iterator object of type X ::iterator. The returned object points at an object in b whose key compares equal to the X::Key value k. If no such object exists, the returned object is equal to b.end(). Its average time complexity is O(1), and its worst-case time complexity is O(b.size()).

b.count(k); returns a value of type X::size_type, equal to the number of objects in b whose key compares equal to the X::Key value k. Its average time complexity is O(1), and its worst-case time complexity is O(b.size()).

b.equal_range(k); for a const object b, returns an object of type pair<X::const_iterator, X::const_iterator>; otherwise, returns an object of type pair<X::iterator, X::iterator>. The pair defines a range that contains all the objects in b whose key compares equal to the X::Key value k. If no such object exists, the function returns make_pair(b.end(), b.end()). Its average time complexity is O(b.count(k)), and its worst-case time complexity is O(b.size()).

The preceding member functions are all required for associative containers.[6] Unordered associative constainers provide additional member functions that support the following code constructs.

b.hash_function() returns an object of type X::hasher that is a copy of b’s hash object.

b.key_eq() returns an object of type X::key_equal that is a copy of b’s equality predicate.

b.bucket_count() returns a value of type X::size_type that is the number of buckets in b.

b.max_bucket_count() returns a value of type X::size_type that is an upper bound on the number of buckets that b can contain.

b.bucket(key) returns a value of type X::size_type that is the index of the bucket where objects whose key is equal to key would be found.

b.bucket_size(n) returns a value of type X::size_type that is the number of objects in the bucket at index n. Its time complexity is O(M), where M is the number of objects in the bucket.

b.begin(n); b.end(n); for a const object b, returns an object of type X::const_local_iterator;otherwise, returns an object of type X::local_iterator. The range [b.begin(n), b.end(n)) contains all the objects in the nth bucket.

b.load_factor() returns a value of type float whose value is the average number of objects in each bucket.

b.max_load_factor() returns a value of type float whose value is the target load factor. See Section 5.10.

a.max_load_factor(z) sets the target load factor to the positive floating-point value z. See Section 5.10.

a.rehash(n) resizes the container so that it has at least n buckets and its load factor is less than or equal to its target load factor. See Section 5.10.

5.4.3. Exception Safety

The first rule of exception safety is: Don’t write hash objects or equality predicates that throw exceptions. I can’t imagine a reason for wanting to do that, but the TR1 specification doesn’t prohibit it. It just doesn’t promise much of anything if either of those functions throws an exception.

Here are the rules that unordered associative containers must follow with regard to exceptions.

• The clear member function does not throw exceptions.

• The erase member functions do not throw exceptions other than those thrown by the hash object or the equality predicate.

• The insert member functions that insert one element have no effect if an exception is thrown during their execution other than by the hash object.

• The swap function does not throw exceptions other than any exceptions thrown by the copy constructor or the copy assignment operator of the hash object or the predicate object.

• The rehash function has no effect if an exception is thrown during its execution other than by the hash object or the equality predicate.

5.5. The Headers <unordered_map> and <unordered_-set>

The header <unordered_map> has the definitions for the two templates unordered_map and unorded_multimap and their corresponding swap functions.

           
namespace std {         // C++ Standard Library
 namespace tr1 {        // TR1 additions
                                                      
    // TEMPLATE CLASS unordered_map
template <class Key, class Ty,                    
    class Hash = hash<Key>,                    
    class Pred = equal_to<Key>,                
    class Alloc = allocator <pair < const Key, Ty > > >
      class unordered_map;       
                                                      
    // TEMPLATE CLASS unordered_multimap
template < class Key, class Ty,                    
    class Hash = hash <Key >,                   
    class Pred = equal_to <Key>,                
    class Alloc = allocator <pair < const Key, Ty> > >
      class unordered_multimap;  
                                                      
    //TEMPLATE FUNCTIONS swap    
template <class Key, class Ty,                     
  class Hash, class Pred, class Alloc>             
void swap (                      
  unordered_map (Key, Ty, Hash, Pred, Alloc>& left, 
  unordered_map (Key, Ty, Hash, Pred, Alloc>& right);
template <class Key, class Ty,                     
  class Hash, class Pred, class Alloc>             
void swap (                      
  unordered_multimap (Key, Ty, Hash, Pred, Alloc>& left,
  unordered_multimap (Key, Ty, Hash, Pred, Alloc>& right);
                                                      
} }                                                   

The header <unordered_set> has the definitions for the two templates unordered_set and unorded_multiset and their corresponding swap functions.

           
namespace std {         // C++ Standard Library
 namespace tr1 {        // TR1 additions
                                                      
     // TEMPLATE CLASS unordered_set
template <class  Key,                             
    class Hash = hash<Key>,                    
    class Pred = equal_to<Key>,                
    class Alloc = allocator <Key> >          
      class unordered_set;       
                                                      
    // TEMPLATE CLASS unordered_multiset
template <class Key,                              
    class Hash = hash<Key>,                    
    class Pred = equal_to<Key>,                
    class Alloc = allocator<Key> >          
      class unordered_multiset;  
                                                      
    // TEMPLATE FUNCTIONS swap    
template <class Key, class Hash, class Pred, class Alloc>
void swap (unordered_set (Key, Hash, Pred, Alloc >& left,
    unordered_set (Key, Hash, Pred, Alloc >& right);
template <class Key, class Hash, class Pred, class Alloc>
void swap (unordered_multiset (Key, Hash, Pred, Alloc >& left,
    unordered_multiset (Key, Hash, Pred, Alloc>& right);
                                                      
} }                                                   

The full synopsis for each of the four unordered containers is rather lengthy and mostly repeats the things we looked at in Section 5.4, so instead of going through all their details,[7] we’ll look at a summary of the functions that they have in common with their ordered counterparts in Section 5.9. We’ll look at what the template parameters mean in Section 5.7, at the various constructors in Section 5.8, at rehashing in Section 5.10, and at how to tune a container in Section 5.11.

5.6. The Class Template hash

The header <functional> defines the template class hash:

           
namespace std {                                       
 namespace tr1 {                                      

    // CLASS TEMPLATE hash
template<class Ty> struct hash
  : unary_function<Ty, size_t>                 
  { // function object to compute hash values
  size_t operator()(Ty val) const;                   
  };                                                  
} }                                                   

The template class hash<Ty> describes a callable type whose function call operator takes one argument of type Ty and returns a value of type std::size_t. Its function call operator shall not throw any exceptions and shall return the same value when called with equal arguments.

The template class shall be defined in such a way that it can be instantiated with the following types:

bool

• Integer types: char, signed char, unsigned char, short, unsigned short, int, unsigned int, long, unsigned long, _-Longlong, _ULonglong

• Floating-point types: float, double, long double

• Pointer types

• Strings: std::string, std::wstring

Objects of type hash<Ty> provide the default hash functions for the unordered containers. If the type of your container’s key is not one of the types required for hash, you can write your own specialization.

Example 5.2. Specializing hash (contunord/hash.cpp)


           
#include <functional>                           
#include <iostream>                             
#include <string>                               
#include <iterator>                             
#include <vector>                               
using std::tr1::hash;                                 
using std::cout; using std::string;                   
using std::vector; using std::iterator_traits;        
                                                      
template <class InIt>                           
void show_hashes (InIt first, InIt last)              
  { // demonstrate use of hash<Ty>
  typedef typename                                    

    iterator_traits <InIt>:: value_type type;
  hash <type> hasher;                           
  while (first != last)                               
    cout << hasher (*first++) << '';      
  cout << ' ';                                 
  }                                                   
                                                      
struct coord                                          
  { // two-dimensional integer coordinates
  int x, y;                                           
  };                                                  
                                                      
namespace std {                                       
namespace tr1 { // put specialization in std::tr1
template  <>                                    
struct hash <coord>                             
  { // template specialization for struct coord
  std::size_t operator ()(const coord & val) const
    { // return hash value for val
    hash <int> make_hash;                       
    return make_hash (val .x) + make_hash (val .y);   
    }                                                 
  };                                                  
}}                                                    
                                                      
#define SIZE (arr) (sizeof (arr) / sizeof (* arr))    
                                                      
int main ()                                           
  { // demonstrate use and specialization of hash<Ty>
  int data [] = { 1, 2, 3, 4, 5 };                    
  show_hashes (data, data + SIZE (data));             
                                                      
  char * text [] = { "1", "2", "3", "4", "5" };       
  vector <string> strings (text,  text + SIZE (text));
  show_hashes (strings.begin (),   strings.end ());   
                                                      
  coord points [] = { { 0, 0 }, { 0, 1 }, { 1, 0 },   
    { 1, 1 }, { 2, 2 } };                             
  show_hashes (points,   points + SIZE (points));     
  return  0;                                          
  }                                                   


5.7. Instantiating the Unordered Containers

As we’ve seen, the template classes unordered_set and unordered_multiset take a single template parameter, referred to as the Key type, that gives the type of the objects to be held in the container. They both take additional template parameters that have default values, so you often won’t have to write them out.

Similarly, the template classes unordered_map and unordered_multimap take two template parameters: a Key type that gives the type of the objects to be used as keys and an associated type Ty that gives the type of the objects that will be associated with those keys. They also take the same list of additional template parameters with default values that unordered_set and unordered_multiset take.

The Key type is used to store and find objects in the unordered associative containers. In order to do this, it must be possible to compute hash values for objects of type Key. By default, the containers all use the callable type std::tr1::hash<Key>. If that type is valid for the Key type, either because it’s supported by the implementation or because your code supplies a specialization for your key type, the default will work. Otherwise, you need to provide your own type for computing hash values. You do that by passing the name of your type as the next argument in the template’s parameter list:

           
struct MyKey                                          
{                                                     
// whatever…                   
};                                                    
                                                      
struct MyHash                                         
{                                                     
size_t operator ()(myKey key) const;                  
};                                                    
                                                      
  // uses hash<MyKey>:     
std::tr1::unordered_set <MyKey> set1;           
  // uses MyHash:                
std::tr1::unordered_set <MyKey, MyHash> set2;   

The containers also use std::equal_to<Key> to decide whether two key values are equal. That template, in turn, returns the result of applying operator== to the key objects. So if you have an operator== for your key types, or if you have a specialization of std::equal_to for your key types, the default equality predicate will work. Otherwise, you’ll need to provide your own predicate type:

           
namespace std { // put specialization in std
template <>                                     
struct equal_to <MyKey>:                        
  public binary_function <MyKey, MyKey, bool>   
{                                                     
bool operator ()(const MyKey&, const MyKey&) const;
};                                                    
}                                                     
                                                      
struct MyEq                                           
{                                                     
bool operator ()(const MyKey&, const MyKey&) const;
};                                                    
                                                      
   // uses equal_to<MyKey>:
std::tr1::unordered_set <MyKey, MyHash> set3;   
   // uses MyEq:                 
std::tr1::unordered_set <MyKey, MyHash, MyEq> set4;

Finally, as with all the other containers, you can pass a final template parameter that gives the type of the allocator for the container to use when it needs to allocate memory.

In all these cases, the container will use the default constructor to create the objects that it uses for hashing, comparing, and allocating. If your type needs something more than default construction, in addition to naming the type as a template parameter, you must pass a suitably constructed object to the container’s constructor. The container will make copies of those objects and use them for hashing and comparing.

5.8. Constructors

You can specify the initial contents of an unordered associative container in three ways. You can provide no initial contents, you can provide another container to copy, and you can provide a range of iterators that point at initial values. We’ve seen these three forms of constructor in Section 5.4.1 and Section 5.4.2. They are used like this:

           
unordered_set<int> u0;                       //no initialization
unordered_set<int> u1(u0);                  //copy initialization
unordered_set<int> u2(u1.begin(), u1.end());
                                              //range initialization

Of course, using another container to provide a range is simply a handy example. You can use any range defined by a pair of iterators whose associated value_type is the same as the type held by the container.

The first and the third forms of constructor can also be used with additional arguments. You can pass a value of type X::hasher if your hash object needs initialization. If you pass an X::hasher object, you can also pass a value of type X::key_equal if your equality predicate needs initialization. Finally, if you’ve passed both of these, you can also pass an object of type X::allocator if your allocator needs initialization.

5.9. Container Operations

Just as with ordered containers, you can insert a value t by calling the member function container.insert(t). You can insert with a hint by using insert(q, t), where q is an iterator into the container. You can insert a sequence of values with insert(i, j), where i and j are iterators that designate a sequence of values.

To remove elements, unordered containers provide the usual clear and erase member functions. You can call erase with a value to remove all elements whose key compares equal to that value. You can also call it with an iterator that designates the element to be removed. Finally, you can call it with a pair of iterators that designates a range of elements within the container to be removed.

To search for elements, unordered containers provide the member functions find, count, and equal_range. Each of them takes a key value to search for. The member function find returns an iterator that points to an element whose key compares equal to the key value, or an iterator equal to end() if no such element exists. The member function count returns the number of elements whose keys compare equal to the key value. The member function equal_range returns a pair of iterators that designates a range of elements within the container, all of whose keys compare equal to the key value.[8]

If you stick to these operations, you can write code that works with both the associative containers and the unordered associative containers. For example, the following program[9] works when it’s implemented with either a std::map or a std::tr1::unordered_map.

Example 5.3. Basic Operations (contunord/basics.cpp)


           
#include <unordered_map>                        
#include <iostream>                             
#include <ostream>                              

#include <iomanip>       
#include <string>                               
#include <utility>                              
#include <algorithm>                            
#include <iterator>                             
#include <functional>                           
using std::tr1::unordered_multimap;                   
using std::string; using std::make_pair; using std::pair;
using std::setw; using std::setfill;                  
using std::copy; using std::cout;                     
using std::basic_ostream;   using std::ostream_iterator;
using std::ios_base; using std::ios;                  
using std::unary_function;                            
                                                      
typedef unordered_multimap <string, string> dictionary;
typedef dictionary::value_type element;               
                                                      
static const char * pairs [] =                        
  { // English/German word pairs 
  "day",      "Tag",                                  
  "strange",  "fremd",                                
  "car",      "Auto",                                 
  "smart",    "elegant",                              
  "trait",    "Merkmal",                              
  "strange",  "seltsam",                              
  "smart",    "raffiniert",                           
  "smart",    "klug",                                 
  "clever",   "raffiniert",                           
  0,            0
  };

namespace std  { // add inserter to namespace std
template  <class Elem, class Traits>
basic_ostream <Elem, Traits>&  operator <<(
  basic_ostream <Elem, Traits>& str, const element& elt)
  { // insert element into stream and restore flags
  ios_base::fmtflags flags = str.flags ();
  str.setf (ios::left, ios::adjustfield);
  str << '' << setw (10) << elt.first << elt.second;
  str.flags (flags);
  return str;
  }
}

template <class InIt, class OutIt, class Pred>

OutIt copy_if (InIt  first, InIt last, OutIt dest, Pred pr)
  { // copy elements for which pr(*first) is true
  for (; first != last; ++ first, ++ dest)
    if (pr (* first))
      * dest = * first;
  return  dest;
  }

struct  equals
  : unary_function <element, bool>
  { // callable type that matches second object in pair to string
  equals (const string& s) : str (s) {}
  bool operator ()(const element& elt) const
    { // return true for match
    return elt.second ==  str;
    }
private :
  string str;
  };

int main ()
  { // demonstrate use of unordered_multimap
  dictionary  dict;

    // initialize:
  const char ** cur = pairs;
  while (cur [0])
    { // add initial entries
    dict.insert (make_pair (cur [0],  cur [1]));
    cur +=  2;
    }

    // print out all elements
  ostream_iterator <element> output (cout, " ");
  cout << make_pair (" English ", "German") << ' ';
  cout << setfill (' - ') << setw (20) << ""
    << setfill ('') <<   ' ';
  copy (dict.begin (), dict.end (),  output);

    // print out all values for key "smart"
  string key (" smart ");
  cout << ' ' << key << ": ";
  copy (dict.lower_bound (key), dict.upper_bound (key),
    output);

    // print out all keys for value "raffiniert"
  string value (" raffiniert ");
  cout <<' ' << value << ": ";
  copy_if (dict.begin (), dict.end (),
    output,  equals (value));
  return 0;
  }


5.10. Load Factors and Rehashing

We saw earlier that it’s important to keep the average number of objects per bucket down if we want fast searches. In fact, there are some formalisms for talking about that. A hash table’s load factor is the average number of objects per bucket. When a hash table’s load factor is less than its target load factor, inserting another object won’t trigger a rehash. You can get the hash table’s current load factor by calling the member function load_factor(). You can get the table’s target load factor by calling the member function max_load_factor(), and you can set the table’s target load factor by calling max_load_factor(z), where z is the new value for the load factor.

The member function rehash(n) sets the number of buckets to at least n but large enough so that the load factor is less than the target load factor and then redistributes the hash table’s objects into the new set of buckets. This usually reduces the number of objects in each bucket, which, in turn, improves search performance.

You won’t ordinarily have to call rehash, though, because the container does it for you when the load factor gets too high. Technically, the requirement goes the other way around: as long as the load factor is less than the maximum load factor, insertions don’t invalidate iterators. This means that objects can’t be moved to other buckets, which in turn means that the table can’t be rehashed. There is no requirement to rehash when the load factor gets too large, just a caution that rehashing might happen. That’s a careful way of allowing incremental rehashing, which gradually increases the number of buckets in the table and moves some elements into new buckets on each insertion, rather than doing everything at once. That distributes the rehashing work over several insertions, which can improve responsiveness in real-time systems. If you’ve done a bunch of insertions and you’re not going to do any more, it’s a good idea to call rehash to make sure that rehashing has been completed.

Example 5.4. Load Factors and Rehashing (contunord/rehash.cpp)



#include <unordered_set>
#include <iostream>
using std::tr1::unordered_set;
using std::cout;

typedef unordered_set <int> iset;

static void show_details (const iset& set)
  { // show container properties
  cout << "load factor: " << set.load_factor ()
    << " target load factor: " << set.max_load_factor()
    << " buckets: " << set.bucket_count() << ' ';
  }

int main ()
  { // show growth pattern
  iset  set;
  show_details (set);
  int i;
  for (i = 0; i < 20; ++ i)
    set.insert (i);
  show_details (set);
  for (; i < 40; ++ i)
    set.insert (i);
  show_details (set);
  for (; i < 60; ++ i)
    set.insert (i);
  show_details (set);
  set.max_load_factor (2.0);
  show_details (set);
  set.rehash (10);
  show_details (set);
  set.rehash (30);
  show_details (set);
  return  0;
  }


5.11. Tuning

Tweaking the hash table’s load factor can create a better distribution of objects among the buckets, but it won’t help if your hash function isn’t doing a good job. This isn’t the place to talk about the details of good hash functions—that’s a large topic in itself. When you’re storing data in a hash table, what you need to know is how well the hash function distributes the data that you have. You can look at the distribution with the member functions bucket_count(), which tells you how many buckets the hash table has; bucket_size(n), which tells you how many objects are in the nth bucket; and with the member functions begin(n) and end(n), which return iterators that you can use to look at the objects in the nth bucket. If your hash table isn’t getting the performance you expect, this information can point you to the culprit. You may need to replace the hash function.

Example 5.5. Tuning a Hash Table (contunord/tuning.cpp)



#include <unordered_set>
#include <iostream>
#include <iomanip>
#include <algorithm>
#include <iterator>
using std::cout; using std::setw;
using std::copy; using std::ostream_iterator;

typedef std::tr1::unordered_set <int> iset;
typedef iset::value_type elt;

static  void show_buckets (const iset& set)
  { // show details of buckets in set
  cout << setw (3) << set.size () << "elements,"
    << setw (3) << set.bucket_count () << "buckets,"
    << "load factor" << set.load_factor () << ". ";
  for (int i = 0; i < set.bucket_count (); ++ i)
    cout << i << ':' << set.bucket_size (i) << "";
  cout << ' ';
  ostream_iterator <elt> output (cout, "");
  for (int i = 0; i <set.bucket_count (); ++i)
    { // show contents of bucket i
    cout << setw (3) << i << ":";
    copy (set.begin (i), set .end (i), output);
    cout << ' ';
    }
  }

int main ()

  { // demonstrate use of bucket functions
  iset  set;
  for (int i = 0; i <100; ++i)
    set.insert (i);
  show_buckets (set);
  return  0;
  }


Further Reading

The Art of Computer Programming [Knu98b, 513–559] has the usual encyclopedic discussion of hashing.

Introduction to Algorithms [CLR90, 219–243] has a more approachable discussion.

Exercises

Exercise 1

For each of the following errors, write a simple test case containing the error, and try to compile it. In the error messages, look for the key words that relate to the error in the code.

1. Attempting to create an unordered container that holds an object type with no specialization of std::tr1::hash.

2. Attempting to create an unordered container that holds an object type with no specialization of std::equal_to and no operator==.

Exercise 2

Define a struct named elt that holds a value of type unsigned int, and write a specialization of std::tr1::hash whose function call operator computes hash values for elt by returning twice the object’s stored value. Create an object of type std::tr1::unordered_set<elt>, put a dozen or so objects of type elt into it, and examine the distribution of values in the buckets.

Exercise 3

Change the specialization of std::tr1::hash in the previous exercise by adding a constructor that takes a value of type int, with a default value of 2, and stores it in the object. Change the function call operator to return that stored value times the value stored in the elt object.

• Create an object of type std::tr1::unordered_set<elt>, and pass an object of type std::tr1::hash<elt> created with the default constructor to its constructor. Put a dozen or so objects of type elt into it, and examine the distribution of the values in the buckets.

• Create an object of type hash<elt>, initialized with the value 3, and an object of type std::tr1::unordered_set<elt>, initialized with a size and the newly created hash<elt> object. Put a dozen or so objects of type elt into it, and examine the distribution of the values in the buckets.

• Try the same thing again, with the hash<elt> object initialized with the value 4.

• Try the same thing again, with the hash<elt> object initialized with the value 0.

Exercise 4

Change the specialization of std::tr1::hash in the previous exercise so that it returns the remainder when the value stored in the elt object is divided by 4. Put a dozen or so objects of type elt into it, and examine the distribution of the values in the buckets.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.145.111.41