43

Data Structures in C++

Mark Allen Weiss

Florida International University

43.1Introduction

43.2Basic Containers

Sequence ContainersAssociative ContainersContainer Adapters

43.3Iterators

BasicsReverse Iterators

43.4Additional Components of the STL

Sorting, Searching, and Selection

Acknowledgments

References

43.1Introduction

In C++, several classic data structures are implemented as a part of the standard library, commonly known as the Standard Template Library. The Standard Template Library (or simply, the STL) consists of a set of container classes, such as lists, sets, and maps; iterator classes that are used to traverse the container classes; and generic algorithms, such as sorting and searching. As its name implies, the library consists of (both function and class) templates. The STL is very powerful and makes some use of advanced C++ features. Our discussion is focused on the most basic uses of the STL, but does include some features found only in C++11 or later.

We partition our discussion of the Standard Template Library into the following sections:

1.Containers

2.Iterators

3.Generic Algorithms

See References 17 for additional material on the Standard Template Library.

43.2Basic Containers

The STL defines several container templates that store collections of objects. Some collections are unordered, while others are ordered. Some collections allow duplicates, while others do not. All containers support the following operations:

int size( ) const: returns the number of elements in the container.

void clear( ): removes all elements from the container.

bool empty( ) const: returns true if the container contains no elements and false otherwise.

There is no universal add or insert member function; different containers use different names.

The container classes can be split into three broad categories:

1.Sequence containers maintain items with a notion of a position in the collection. Examples include vector, list, and deque.

2.Associative containers maintain items in a manner that allows efficient searching, insertion, and deletion (using balanced search trees, and starting in C++11, hash tables). Examples include set, multiset, map, and multimap.

3.Container adapters built on top of other containers to yield classic data structures. Examples include stack, queue, and priority_queue.

Associated with all containers is an iterator. An iterator represents the notion of a position in the container and is used to step through the container. All containers support the following operations:

iterator begin( ): returns an appropriate iterator representing the first item in the container.

iterator end( ): returns an appropriate iterator representing the end marker in the container (i.e., the position after the last item in the container).

We defer the discussion of iterators to Section 43.3.

43.2.1Sequence Containers

The three basic sequence containers in the STL are the vector, list, and deque.

vector is a growable array. The vector wraps an internal array, maintaining a notion of its size, and internally its current capacity. If a sequence of additions would cause the size to exceed capacity, the capacity is automatically doubled. This makes additions at the end of the vector take constant amortized time. list is a doubly linked list, in which pointers to both ends are maintained. deque is, in effect, a growable array that grows at both ends. There are several well-known ways to implement deque efficiently, but the standard is silent about which implementation must be used.

For vector, an insertion or deletion takes amortized time that is proportional to the distance from the back, while for a deque, these operations take amortized time that is proportional to the smaller of the distances from the front and back. For a list, these operations take worst-case time that is proportional to the smaller of the distances from the front and back if an index is specified, but constant worst-case time if the position is specified by an existing iterator. vector and deque support indexing in constant worst-case time; list does not.

The basic operations that are supported by both containers are:

void push_back( const Object & x ): adds x to the end of the container.

Object & back( ): returns the object at the end of the container (an accessor that returns a constant reference is also provided).

void pop_back( ): removes the object at the end of the container.

Object & front( ): returns the object at the front of the container (an accessor that returns a constant reference is also provided).

iterator insert( iterator pos, const Object & x ): adds x into the container, prior to the position given by the iterator pos. This is a constant-time operation for list, but not for vector or deque. The return value is an iterator representing the position of the inserted item.

iterator erase( iterator pos ): removes the object at the position given by the iterator. This is a constant-time operation for list, but not vector or deque. The return value is the position of the element that followed pos prior to the call. This operation invalidates pos, which is now stale.

iterator erase( iterator start, iterator end ): removes all items beginning at position start, up to but not including end. Observe that an entire container can be erased by the call: c.erase( c.begin( ), c.end( ) ).

For deque and list, two additional operations are available with expected semantics:

void push_front( const Object & x ): adds x to the front of the container.

void pop_front( ): removes the object at the front of the container.

The list also provides a splice operation that allows the transfer of a sublist to somewhere else.

void splice( iterator pos, list & l, iterator start, iterator end ): moves all items beginning at position start, up to but not including end to after pos. The moved items are assumed to have come from list l. The running time of this operation is proportional to the number of items moved, unless the items are moved within the same list (i.e., &l==this), in which case the running time is constant time.

For vector and deque, additional operations include:

Object & operator[] ( int idx ): returns the object at index idx in the container, with no bounds-checking (an accessor that returns a constant reference is also provided).

Object & at( int idx ): returns the object at index idx in the container, with bounds-checking (an accessor that returns a constant reference is also provided).

int capacity( ) const: returns the internal capacity.

void reserve( int newCapacity ): for vector only, sets the new capacity. If a good estimate is available, it can be used to avoid array expansion.

C++11 adds an array container that represents a fixed size array with no space overhead, and which is instantiated with a type and size (e.g., array<int,6>). array is useful for embedded systems as a safer replacement for built-in arrays, but does not support insertion or deletion. Generally speaking, the non-modifying operations discussed earlier, such as at, begin and end, and size work for array also.

C++11 also adds a forward_list, which is a singly linked list. Because in a standard list, insert and erase add and remove prior to the position given in the iterator, those operations would be linear time in a singly linked list. Accordingly, in a forward_list you’ll find additional member functions insert_after and erase_after that provide constant-time performance.

43.2.2Associative Containers

The STL provides two basic types of associative containers. Sets are used to store items. The multiset allows duplicates, while the set does not. Maps are used to store key/value pairs. The multimap allows duplicate keys, while the map does not. In classic C++, for sets, the items are logically maintained in sorted order, so an iterator to the “beginning item” represents the smallest item in the collection. For maps, the keys are logically maintained in sorted order. As a result, the most natural implementation of sets and maps makes use of balanced binary search trees; typically a top-down red black tree (see Chapter 11) is used. The significant liability is that basic operations take logarithmic worst-case time.

Historically, STL implementations (such as SGI’s version) provided hash_set and hash_map, which make use of hash tables (see Chapter 9) and provided constant average time per basic operation, but were non-standard. Finally, C++11 added four additional classes that maintain the sets and maps without order, using separate chaining hash tables. These classes are unordered_set, unordered_multiset, unordered_map, and unordered_multimap; the class names are chosen to avoid conflicting with various preexisting nonstandard classes.

C++ ordered sets and maps use either a default ordering (operator<) or one provided by a function object. In C++, a function object is implemented by providing a class that contains an overloaded operator() and then instantiating a template with the class name as a template parameter; similarly, unordered sets and maps use either a default hash function and equality operator (for instance, with strings), or ones provided by function objects (an example of this is shown later in Figures 43.1 and 43.2).

fig43_1.jpg

Figure 43.1Using the set class template.

fig43_2.jpg

Figure 43.2Using the set class template.

Both sets and maps make use of the pair class template. The pair class template stores two data members first and second, which can be directly accessed, without invoking member functions. Internally, maps store items of type pair. Additionally, several set member functions need to return two things; this is done by returning an appropriately instantiated pair.

43.2.2.1Sets and Multisets

The set and unordered_set provide several member functions in addition to the usual suspects, including:

pair<iterator,bool> insert( const Object & x ): adds x to the set. If x is not already present, the returned pair will contain the iterator representing the already contained x, and false. Otherwise, it will contain the iterator representing the newly inserted x, and true.

pair<iterator,bool> insert( iterator hint, const Object & x ): same behavior as the one-parameter insert, but allows specification of a hint, representing the position where x should go. If the underlying implementation is a finger search tree (see Chapter 12), or equivalent, the performance could be better than using the one-parameter insert.

iterator find( const Object & x ) const: returns an iterator representing the position of x in the set. If x is not found, the endmarker is returned.

int erase( const Object & x ): removes x (if present) and returns the number of items removed, which is either 0 or 1 in a set, and perhaps larger in a multiset.

iterator erase( iterator pos ): same behavior as the sequence container version.

iterator erase( iterator start, iterator end ): same behavior as the sequence container version.

iterator lower_bound( const Object & x ): returns an iterator to the first element in the set with a key that is greater than or equal to x (only for set).

iterator upper_bound( const Object & x ): returns an iterator to the first element in the set with a key that is greater than x (only for set).

pair<iterator,iterator> equal_range( const Object & x ): returns a pair of iterators representing lower_bound and upper_bound.

A multiset is like a set except that duplicates are allowed. The return type of insert is modified to indicate that the insert always succeeds:

iterator insert( const Object & x ): adds x to the multiset; returns an iterator representing the newly inserted x.

iterator insert( iterator hint, const Object & x ): adds x to the multiset; returns an iterator representing the newly inserted x. Performance might be enhanced if x is inserted close to the position given by hint.

For the multiset, the erase member function that takes an Object x removes all occurrences of x. To simply remove one occurrence, first call find to obtain an iterator, and then use the erase member function that takes an iterator.

In a multiset, to find all occurrences of x, we cannot simply call find; that returns an iterator referencing one occurrence (if there is one), but which specific occurrence is returned is not guaranteed. Instead, the range returned by a call to equal_range is used.

Figure 43.1 illustrates two sets: s1 which stores strings using the normal case-sensitive ordering and s2 which stores strings using case-insensitive comparisons. s2 is instantiated by providing the class of a function object as a template parameter. As a result, s1 contains three strings, but s2 considers "hello" and "HELLO" to be identical, and thus only stores one of them. Figure 43.2 shows a main that implements identical logic for unordered_sets.

43.2.2.2Maps and Multimaps

A map behaves like a set instantiated with a pair representing a key and value, with a comparison function that refers only to the key. Thus it supports all of the set operations. Similarly, an unordered_map behaves like a map, with hash function and equality test refers only to the key.

The find operation for maps requires only a key, but the iterator that it returns references a pair. Similarly, erase requires only a key and otherwise behaves like the set’s erase. insert is supported, but to use insert, we must insert a properly instantiated pair, which is cumbersome, and thus rarely done. Instead, the map overloads the array indexing operator[]:

ValueType & operator[] ( const KeyType & key ): if the key is present in the map, a reference to the value is returned. If the key is not present in the map, it is inserted with a default value into the map, and then a reference to the inserted default value is returned. The default value is obtained by applying a zero-parameter constructor or is zero for the primitive types.

The semantics of operator[] do not allow an accessor version of operator[], and so operator[] cannot be used on an immutable map. For instance, if a map is passed by constant reference, inside the routine, operator[] is unusable. In such a case, we can use find to obtain an iterator, test if the iterator represents the endmarker (in which case the find has failed), and if the iterator is valid, we can use it to access the pair’s second component, which is the value.

A multimap is a map in which duplicate keys are allowed. multimaps behave like maps but do not support operator[].

Figure 43.3 illustrates a combination of map and vector in which the keys are email aliases, and the values are the email addresses corresponding to each alias. Since there can be several addresses for each alias, the values are themselves vectors of strings. The trickiest part (other than the various parameter-passing modes) concerns the one-line implementation of addAlias. There, we see that aliasMap[name] refers to the vector value in the map for the name key. If name is not in the map, then the call to operator[] causes it to be placed into the map, with a default value being a zero-sized vector. Thus whether or not name was in the map prior to calling addAlias, after the access of the map, the call to push_back updates the vector value correctly.

fig43_3.jpg

Figure 43.3Using the map class template.

43.2.3Container Adapters

The STL provides adapter class templates to implement stacks, queues, and priority queues. These class templates are instantiated with the type of object to be stored and (optionally, if the default is unacceptable) the container class (such as vector, list, or deque) that is used to store the objects. Thus we can specify a linked list or array implementation for a stack, though in reality, this feature does little more than add nicer names than push_back to the (existing list or vector, respectively) interface.

43.2.3.1stack and queue

For stack, the member functions are push, pop, and top. For queue, we get push, front, and pop. By default, a vector is used for stack and a deque for queue. Figure 43.4 shows how to use the queue adapter, implemented with a doubly linked list.

fig43_4.jpg

Figure 43.4Using the queue adapter.

43.2.3.2priority_queue

The priority_queue template contains member functions named push, top, and pop, that mirror insert, findmax, and deletemax in a classic max-heap.

The priority queue template is instantiated with an item type, the container type (as in stack and queue), and the comparator, with defaults allowed for the last two parameters. Sometimes priority queues are set up to remove and access the smallest item instead of the largest item. In such a case, the priority queue can be instantiated with an appropriate greater function object to reverse the comparison order and allow access to the smallest item.

Figure 43.5 illustrates both a max-heap and a min-heap.

fig43_5.jpg

Figure 43.5Using the priority_queue adapter.

43.3Iterators

The iterator in C++ abstracts the notion of a position in the container. As we have already seen, there are several ways to obtain such a position. All containers provide begin and end member functions that return iterators representing the first item and endmarker, respectively. Many containers provide a searching member function (e.g., set::find) that returns an iterator viewing the found item or the endmarker if the item is not found.

43.3.1Basics

Throughout our discussion, we have used iterator as a type. In reality, in C++, each container defines several iterator types, nested in the scope of the container, and these specific iterator types are used by the programmer instead of simply using the word iterator. For instance, if we have a vector<int>, the basic iterator type is vector<int>::iterator. The basic iterator can be used to traverse and change the contents of the container.

Another iterator type, vector<int>::const_iterator, does not allow changes to the container on which the iterator is operating. All iterators are guaranteed to have at least the following set of operations:

++itr and itr++: advance the iterator itr to the next location. Both the prefix and postfix forms are available. This does not cause any change to the container.

*itr: returns a reference to the container object that itr is currently representing. The reference that is returned is modifiable for basic iterators, but is not modifiable (i.e., a constant reference) for const_iterators.

itr1==itr2: returns true if iterators itr1 and itr2 refer to the same position in the same container and false otherwise.

itr1!=itr2: returns true if iterators itr1 and itr2 refer to different positions or different containers and false otherwise.

Some iterators efficiently support - -itr and itr- -. Those iterators are called bidirectional iterators. All of the iterators for the common containers vector, list, deque, set, and map are bidirectional.

Some iterators efficiently support both itr+=k and itr+k. Those iterators are called random-access iterators. itr+=k advances the iterator k positions. itr+k returns a new iterator that is k positions ahead of itr. Also supported by random access iterators are itr-=k and itr-k, with obvious semantics, and itr1-itr2 which yields a separation distance as an integer. The iterators for vector and deque support random access.

All C++ containers have two member functions: begin and end that return iterators. Each collection defines four member functions:

iterator begin( )

const_iterator begin( ) const

iterator end( )

const_iterator end( ) const

begin returns an iterator that is positioned at the first item in the container. end returns an iterator that is positioned at the endmarker, which represents a position one past the last element in the container. On an empty container, begin and end return the same position. For random access iterators, the result of subtracting the return values of end and begin is always the size of the container.

A common use of an iterator is to sequentially scan through a container. In classic C++, this is done by initializing a local iterator to be a copy of the begin iterator, and have it step through the container, stopping as soon as it hits the endmarker.

As an example, Figure 43.6 shows a print function that prints the elements of any container, provided that the elements in the container have implemented an operator<<. Note that we must use a const_iterator to traverse the container, because the container is itself immutable in the scope of print. The test program illustrates five different containers that invoke the print function, along with the expected output (in comments). Observe that both set and multiset output in sorted order, with multiset allowing the second insertion of beta. Additionally, for map to be compatible with the print routine, we must provide an operator<< that works for the elements of the map, namely the appropriate pairs.

fig43_6.jpg

Figure 43.6Generic printing routine that works with five different containers.

In C++11, the range for loop can be used to simplify this idiom, for any container that provides an appropriate iteration mechanism. The code to implement print in Figure 43.7 is mechanically converted by the compiler into the equivalent code, shown in Figure 43.6. Note that since C++11 allows the compiler to deduce the needed type when auto is appropriately used, in many instances the details of whether iterator or const_iterator is needed are no longer something the programmer needs to worry about.

fig43_7.jpg

Figure 43.7Range for loop version of code in Figure 43.6.

43.3.2Reverse Iterators

Sometimes it is important to be able to traverse a container in the reverse direction. Because of the asymmetric nature of begin and end (representing the first element and the endmarker, rather than the last element, respectively) this is cumbersome to do, even for bidirectional iterators. As a result, containers that support bidirectional iterators typically also provide a reverse iterator. The reverse iterator comes in two flavors: reverse_iterator and const_reverse_iterator. For reverse iterators, ++ retreats one position toward the beginning, while - - advances one position toward the end. Container member functions rbegin and rend are used to obtain iterators representing the last element and the beginmarker, respectively. The code in Figure 43.8 prints any container in reverse order. Once again, the use of auto in C++11 removes from the programmer the burden of deciding whether reverse_iterator or const_reverse_iterator is needed.

fig43_8.jpg

Figure 43.8Printing a container in reverse.

43.4Additional Components of the STL

43.4.1Sorting, Searching, and Selection

The Standard Library includes a rich set of functions that can be applied to the standard containers. Some of the algorithms include routines for sorting, searching, copying (possibly with substitutions), shuffling, reversing, rotating, merging, and so on. In all there are over 60 generic algorithms. In this section, we highlight those related to efficient sorting, selection, and binary search.

43.4.1.1Sorting

Sorting in C++ is accomplished by use of function template sort. The parameters to sort represent the start and endmarker of a (half-open range in a) container and an optional comparator:

void sort( iterator begin, iterator end )

void sort( iterator begin, iterator end, Comparator cmp )

The iterators must support random access. As an example, in

 sort( v.begin( ), v.end( ) );
 sort( v.begin( ), v.end( ), greater<int>( ) );
 sort( v.begin( ), v.begin( ) + min( v.size ( ), 5 ) );

the first call sorts the entire container, v, in nondecreasing order. The second call sorts the entire container in nonincreasing order. The third call sorts the first five elements of the container in nondecreasing order. C++11 requires that sort provides O(N log N) worst-case performance; recent implementations have used a quicksort/heapsort hybird algorithm (introsort) [8], but no specific implementation is mandated.

The C++ library inherits the sorting routine qsort from C. qsort uses pointers to functions to perform its comparison making it significantly slower on average than the sort routine. Furthermore, many implementations use a version of quicksort that has been shown to provide quadratic behavior on some commonly occurring inputs [9]. Avoid using qsort.

43.4.1.2Selection

The function template nth_element is used for selection, and as expected has O(N) average-case running time. The parameters include a pair of iterators, and k:

void nth_element( iterator begin, int k, iterator end )

void nth_element( iterator begin, int k, iterator end, Comparator c)

As a result of calling nth_element, the item in the kth position is the kth smallest element, where counting as usual in C++ begins at 0.

Thus, in

 nth_element( v.begin( ), 0, v.end( ) );
 nth_element( v.begin( ), 0, v.end( ), greater<int>( ) );
 nth_element( v.begin( ), v.begin( ) + ( v.end( ) - vbegin( ) / 2 ), v.end( ) );

the first call places the smallest element in the position given by v.begin( ), the second call places the largest element in that position, and the third call places the median in the middle position.

43.4.1.3Searching

Several generic searching algorithms are available for containers. The most basic is:

iterator find( iterator begin, iterator end, const Object & x ): returns an iterator representing the first occurrence of x in the half-open range specified by begin and end, or end if x is not found.

find is simply a sequential search. If the range is sorted, binary_search can be used to find an object in logarithmic time. A comparator can be provided or the default ordering can be used. The signatures for binary_search are:

iterator binary_search( iterator begin, iterator end, const Object & x )

iterator binary_search( iterator begin, iterator end, const Object & x, Comparator cmp )

equal_range, lower_bound, and upper_bound search sorted ranges and behave with the same semantics as the identically named member functions in set.

Acknowledgments

Parts of this chapter are based upon the author’s work in References 6 and 7.

References

1.J. Lajoie, S. Lippman, and B.E. Moo, C++ Primer, Fifth Edition, Addison-Wesley, Reading, MA, 2013.

2.S. Meyers, Effective STL, Addison-Wesley, Reading, MA, 2001.

3.D.R. Musser and A. Saini, C++ Programming with the Standard Template Library, Addison-Wesley, Reading, MA, 1996.

4.B. Stroustrop, The C++ Programming Language, Fourth Edition, Addison-Wesley, Boston, MA, 2013.

5.B. Stroustrop, The Design and Evolution of C++, Addison-Wesley, Reading, MA, 1994.

6.M.A. Weiss, C++ for Java Programmers, Prentice-Hall, Englewood Cliffs, NJ, 2004.

7.M.A. Weiss, Data Structures and Algorithm Analysis in C++, Fourth Edition, Pearson, Boston, MA, 2014.

8.D.R. Musser, Introspective Sorting and Selection Algorithms, Software: Practice and Experience, 27, 1997, 983–993.

9.J.L. Bentley and M.D. McIlroy, Engineering: A Sort Function, Software: Practice and Experience, 23, 1993, 1249–1265.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.117.8.216