Mark Allen Weiss
Florida International University
Sequence Containers•Associative Containers•Container Adapters
43.4Additional Components of the STL
Sorting, Searching, and Selection
In C++, several classic data structures are implemented as a part of the standard library, commonly known as the Standard Template Library. The Standard Template Library (or simply, the STL) consists of a set of container classes, such as lists, sets, and maps; iterator classes that are used to traverse the container classes; and generic algorithms, such as sorting and searching. As its name implies, the library consists of (both function and class) templates. The STL is very powerful and makes some use of advanced C++ features. Our discussion is focused on the most basic uses of the STL, but does include some features found only in C++11 or later.
We partition our discussion of the Standard Template Library into the following sections:
1.Containers
2.Iterators
3.Generic Algorithms
See References 1–7 for additional material on the Standard Template Library.
The STL defines several container templates that store collections of objects. Some collections are unordered, while others are ordered. Some collections allow duplicates, while others do not. All containers support the following operations:
•int size( ) const
: returns the number of elements in the container.
•void clear( )
: removes all elements from the container.
•bool empty( ) const
: returns true
if the container contains no elements and false
otherwise.
There is no universal add
or insert
member function; different containers use different names.
The container classes can be split into three broad categories:
1.Sequence containers maintain items with a notion of a position in the collection. Examples include vector
, list
, and deque
.
2.Associative containers maintain items in a manner that allows efficient searching, insertion, and deletion (using balanced search trees, and starting in C++11, hash tables). Examples include set
, multiset
, map
, and multimap
.
3.Container adapters built on top of other containers to yield classic data structures. Examples include stack
, queue
, and priority_queue
.
Associated with all containers is an iterator. An iterator represents the notion of a position in the container and is used to step through the container. All containers support the following operations:
•iterator begin( )
: returns an appropriate iterator representing the first item in the container.
•iterator end( )
: returns an appropriate iterator representing the end marker in the container (i.e., the position after the last item in the container).
We defer the discussion of iterators to Section 43.3.
The three basic sequence containers in the STL are the vector
, list
, and deque
.
vector
is a growable array. The vector
wraps an internal array, maintaining a notion of its size, and internally its current capacity. If a sequence of additions would cause the size to exceed capacity, the capacity is automatically doubled. This makes additions at the end of the vector
take constant amortized time. list
is a doubly linked list, in which pointers to both ends are maintained. deque
is, in effect, a growable array that grows at both ends. There are several well-known ways to implement deque
efficiently, but the standard is silent about which implementation must be used.
For vector
, an insertion or deletion takes amortized time that is proportional to the distance from the back, while for a deque
, these operations take amortized time that is proportional to the smaller of the distances from the front and back. For a list
, these operations take worst-case time that is proportional to the smaller of the distances from the front and back if an index is specified, but constant worst-case time if the position is specified by an existing iterator. vector
and deque
support indexing in constant worst-case time; list
does not.
The basic operations that are supported by both containers are:
•void push_back( const Object & x )
: adds x
to the end of the container.
•Object & back( )
: returns the object at the end of the container (an accessor that returns a constant reference is also provided).
•void pop_back( )
: removes the object at the end of the container.
•Object & front( )
: returns the object at the front of the container (an accessor that returns a constant reference is also provided).
•iterator insert( iterator pos, const Object & x )
: adds x
into the container, prior to the position given by the iterator pos
. This is a constant-time operation for list
, but not for vector
or deque
. The return value is an iterator representing the position of the inserted item.
•iterator erase( iterator pos )
: removes the object at the position given by the iterator. This is a constant-time operation for list
, but not vector
or deque
. The return value is the position of the element that followed pos
prior to the call. This operation invalidates pos
, which is now stale.
•iterator erase( iterator start, iterator end )
: removes all items beginning at position start
, up to but not including end
. Observe that an entire container can be erased by the call: c.erase( c.begin( ), c.end( ) )
.
For deque
and list
, two additional operations are available with expected semantics:
•void push_front( const Object & x )
: adds x
to the front of the container.
•void pop_front( )
: removes the object at the front of the container.
The list
also provides a splice
operation that allows the transfer of a sublist to somewhere else.
•void splice( iterator pos, list & l, iterator start, iterator end )
: moves all items beginning at position start
, up to but not including end
to after pos
. The moved items are assumed to have come from list l
. The running time of this operation is proportional to the number of items moved, unless the items are moved within the same list (i.e., &l==this
), in which case the running time is constant time.
For vector
and deque
, additional operations include:
•Object & operator[] ( int idx )
: returns the object at index idx
in the container, with no bounds-checking (an accessor that returns a constant reference is also provided).
•Object & at( int idx )
: returns the object at index idx
in the container, with bounds-checking (an accessor that returns a constant reference is also provided).
•int capacity( ) const
: returns the internal capacity.
•void reserve( int newCapacity )
: for vector
only, sets the new capacity. If a good estimate is available, it can be used to avoid array expansion.
C++11 adds an array
container that represents a fixed size array with no space overhead, and which is instantiated with a type and size (e.g., array<int,6>
). array
is useful for embedded systems as a safer replacement for built-in arrays, but does not support insertion or deletion. Generally speaking, the non-modifying operations discussed earlier, such as at
, begin
and end
, and size
work for array
also.
C++11 also adds a forward_list
, which is a singly linked list. Because in a standard list
, insert
and erase
add and remove prior to the position given in the iterator, those operations would be linear time in a singly linked list. Accordingly, in a forward_list
you’ll find additional member functions insert_after
and erase_after
that provide constant-time performance.
The STL provides two basic types of associative containers. Sets are used to store items. The multiset
allows duplicates, while the set
does not. Maps are used to store key/value pairs. The multimap
allows duplicate keys, while the map
does not. In classic C++, for sets, the items are logically maintained in sorted order, so an iterator to the “beginning item” represents the smallest item in the collection. For maps, the keys are logically maintained in sorted order. As a result, the most natural implementation of sets and maps makes use of balanced binary search trees; typically a top-down red black tree (see Chapter 11) is used. The significant liability is that basic operations take logarithmic worst-case time.
Historically, STL implementations (such as SGI’s version) provided hash_set
and hash_map
, which make use of hash tables (see Chapter 9) and provided constant average time per basic operation, but were non-standard. Finally, C++11 added four additional classes that maintain the sets and maps without order, using separate chaining hash tables. These classes are unordered_set
, unordered_multiset
, unordered_map
, and unordered_multimap
; the class names are chosen to avoid conflicting with various preexisting nonstandard classes.
C++ ordered sets and maps use either a default ordering (operator<
) or one provided by a function object. In C++, a function object is implemented by providing a class that contains an overloaded operator()
and then instantiating a template with the class name as a template parameter; similarly, unordered sets and maps use either a default hash function and equality operator (for instance, with strings), or ones provided by function objects (an example of this is shown later in Figures 43.1 and 43.2).
Figure 43.1Using the set
class template.
Figure 43.2Using the set
class template.
Both sets and maps make use of the pair
class template. The pair
class template stores two data members first
and second
, which can be directly accessed, without invoking member functions. Internally, maps store items of type pair
. Additionally, several set
member functions need to return two things; this is done by returning an appropriately instantiated pair
.
43.2.2.1Sets and Multisets
The set
and unordered_set
provide several member functions in addition to the usual suspects, including:
•pair<iterator,bool> insert( const Object & x )
: adds x
to the set. If x
is not already present, the returned pair
will contain the iterator representing the already contained x
, and false
. Otherwise, it will contain the iterator representing the newly inserted x
, and true
.
•pair<iterator,bool> insert( iterator hint, const Object & x )
: same behavior as the one-parameter insert
, but allows specification of a hint, representing the position where x
should go. If the underlying implementation is a finger search tree (see Chapter 12), or equivalent, the performance could be better than using the one-parameter insert
.
•iterator find( const Object & x ) const
: returns an iterator representing the position of x
in the set. If x
is not found, the endmarker is returned.
•int erase( const Object & x )
: removes x
(if present) and returns the number of items removed, which is either 0 or 1 in a set
, and perhaps larger in a multiset.
•iterator erase( iterator pos )
: same behavior as the sequence container version.
•iterator erase( iterator start, iterator end )
: same behavior as the sequence container version.
•iterator lower_bound( const Object & x )
: returns an iterator to the first element in the set with a key that is greater than or equal to x
(only for set
).
•iterator upper_bound( const Object & x )
: returns an iterator to the first element in the set with a key that is greater than x
(only for set
).
•pair<iterator,iterator> equal_range( const Object & x )
: returns a pair of iterators representing lower_bound
and upper_bound
.
A multiset
is like a set except that duplicates are allowed. The return type of insert
is modified to indicate that the insert always succeeds:
•iterator insert( const Object & x )
: adds x
to the multiset; returns an iterator representing the newly inserted x
.
•iterator insert( iterator hint, const Object & x )
: adds x
to the multiset; returns an iterator representing the newly inserted x
. Performance might be enhanced if x
is inserted close to the position given by hint
.
For the multiset
, the erase
member function that takes an Object x
removes all occurrences of x
. To simply remove one occurrence, first call find
to obtain an iterator, and then use the erase
member function that takes an iterator.
In a multiset
, to find all occurrences of x
, we cannot simply call find
; that returns an iterator referencing one occurrence (if there is one), but which specific occurrence is returned is not guaranteed. Instead, the range returned by a call to equal_range
is used.
Figure 43.1 illustrates two set
s: s1
which stores string
s using the normal case-sensitive ordering and s2
which stores string
s using case-insensitive comparisons.
s2
is instantiated by providing the class of a function object as a template parameter. As a result, s1
contains three strings, but s2
considers "hello"
and "HELLO"
to be identical, and thus only stores one of them. Figure 43.2 shows a main
that implements identical logic for unordered_set
s.
43.2.2.2Maps and Multimaps
A map
behaves like a set
instantiated with a pair
representing a key and value, with a comparison function that refers only to the key. Thus it supports all of the set
operations. Similarly, an unordered_map
behaves like a map
, with hash function and equality test refers only to the key.
The find
operation for maps requires only a key, but the iterator that it returns references a pair
. Similarly, erase
requires only a key and otherwise behaves like the set
’s erase
. insert
is supported, but to use insert
, we must insert a properly instantiated pair
, which is cumbersome, and thus rarely done. Instead, the map
overloads the array indexing operator[]
:
•ValueType & operator[] ( const KeyType & key )
: if the key is present in the map, a reference to the value is returned. If the key is not present in the map, it is inserted with a default value into the map, and then a reference to the inserted default value is returned. The default value is obtained by applying a zero-parameter constructor or is zero for the primitive types.
The semantics of operator[]
do not allow an accessor version of operator[]
, and so operator[]
cannot be used on an immutable map
. For instance, if a map
is passed by constant reference, inside the routine, operator[]
is unusable. In such a case, we can use find
to obtain an iterator, test if the iterator represents the endmarker (in which case the find
has failed), and if the iterator is valid, we can use it to access the pair
’s second
component, which is the value.
A multimap
is a map
in which duplicate keys are allowed. multimap
s behave like map
s but do not support operator[]
.
Figure 43.3 illustrates a combination of map
and vector
in which the keys are email aliases, and the values are the email addresses corresponding to each alias. Since there can be several addresses for each alias, the values are themselves vector
s of string
s. The trickiest part (other than the various parameter-passing modes) concerns the one-line implementation of addAlias
. There, we see that aliasMap[name]
refers to the vector
value in the map
for the name
key. If name
is not in the map
, then the call to operator[]
causes it to be placed into the map
, with a default value being a zero-sized vector
. Thus whether or not name
was in the map
prior to calling addAlias
, after the access of the map
, the call to push_back
updates the vector
value correctly.
Figure 43.3Using the map
class template.
The STL provides adapter class templates to implement stacks, queues, and priority queues. These class templates are instantiated with the type of object to be stored and (optionally, if the default is unacceptable) the container class (such as vector
, list
, or deque
) that is used to store the objects. Thus we can specify a linked list or array implementation for a stack, though in reality, this feature does little more than add nicer names than push_back
to the (existing list
or vector
, respectively) interface.
43.2.3.1stack
and queue
For stack
, the member functions are push
, pop
, and top
. For queue
, we get push
,
front
, and pop
. By default, a vector
is used for stack
and a deque
for queue
. Figure 43.4 shows how to use the queue
adapter, implemented with a doubly linked list.
Figure 43.4Using the queue
adapter.
43.2.3.2priority_queue
The priority_queue
template contains member functions named push
, top
, and pop
, that mirror insert
, findmax
, and deletemax
in a classic max-heap.
The priority queue template is instantiated with an item type, the container type (as in stack
and queue
), and the comparator, with defaults allowed for the last two parameters. Sometimes priority queues are set up to remove and access the smallest item instead of the largest item. In such a case, the priority queue can be instantiated with an appropriate greater
function object to reverse the comparison order and allow access to the smallest item.
Figure 43.5 illustrates both a max-heap and a min-heap.
Figure 43.5Using the priority_queue
adapter.
The iterator in C++ abstracts the notion of a position in the container. As we have already seen, there are several ways to obtain such a position. All containers provide begin
and end
member functions that return iterators representing the first item and endmarker, respectively. Many containers provide a searching member function (e.g., set::find
) that returns an iterator viewing the found item or the endmarker if the item is not found.
Throughout our discussion, we have used iterator
as a type. In reality, in C++, each container defines several iterator types, nested in the scope of the container, and these specific iterator types are used by the programmer instead of simply using the word iterator
. For instance, if we have a vector<int>
, the basic iterator type is vector<int>::iterator
. The basic iterator can be used to traverse and change the contents of the container.
Another iterator type, vector<int>::const_iterator
, does not allow changes to the container on which the iterator is operating. All iterators are guaranteed to have at least the following set of operations:
•++itr
and itr++
: advance the iterator itr
to the next location. Both the prefix and postfix forms are available. This does not cause any change to the container.
•*itr
: returns a reference to the container object that itr
is currently representing. The reference that is returned is modifiable for basic iterators, but is not modifiable (i.e., a constant reference) for const_iterator
s.
•itr1==itr2
: returns true
if iterators itr1
and itr2
refer to the same position in the same container and false
otherwise.
•itr1!=itr2
: returns true
if iterators itr1
and itr2
refer to different positions or different containers and false
otherwise.
Some iterators efficiently support - -itr
and itr- -
. Those iterators are called bidirectional iterators. All of the iterators for the common containers vector
, list
, deque
, set
, and map
are bidirectional.
Some iterators efficiently support both itr+=k
and itr+k
. Those iterators are called random-access iterators. itr+=k
advances the iterator k
positions. itr+k
returns a new iterator that is k
positions ahead of itr
. Also supported by random access iterators are
itr-=k
and itr-k
, with obvious semantics, and itr1-itr2
which yields a separation distance as an integer. The iterators for vector
and deque
support random access.
All C++ containers have two member functions: begin
and end
that return iterators. Each collection defines four member functions:
•iterator begin( )
•const_iterator begin( ) const
•iterator end( )
•const_iterator end( ) const
begin
returns an iterator that is positioned at the first item in the container. end
returns an iterator that is positioned at the endmarker, which represents a position one past the last element in the container. On an empty container, begin
and end
return the same position. For random access iterators, the result of subtracting the return values of end
and begin
is always the size of the container.
A common use of an iterator is to sequentially scan through a container. In classic C++, this is done by initializing a local iterator to be a copy of the begin
iterator, and have it step through the container, stopping as soon as it hits the endmarker.
As an example, Figure 43.6 shows a print
function that prints the elements of any container, provided that the elements in the container have implemented an operator<<
. Note that we must use a const_iterator
to traverse the container, because the container is itself immutable in the scope of print
. The test program illustrates five different containers that invoke the print
function, along with the expected output (in comments). Observe that both set
and multiset
output in sorted order, with multiset
allowing the second insertion of beta
. Additionally, for map
to be compatible with the print
routine, we must provide an operator<<
that works for the elements of the map
, namely the appropriate pair
s.
Figure 43.6Generic printing routine that works with five different containers.
In C++11, the range for loop can be used to simplify this idiom, for any container that provides an appropriate iteration mechanism. The code to implement print
in Figure 43.7 is mechanically converted by the compiler into the equivalent code, shown in Figure 43.6. Note that since C++11 allows the compiler to deduce the needed type when auto
is appropriately used, in many instances the details of whether iterator
or
const_iterator
is needed are no longer something the programmer needs to worry about.
Figure 43.7Range for loop version of code in Figure 43.6.
Sometimes it is important to be able to traverse a container in the reverse direction. Because of the asymmetric nature of begin
and end
(representing the first element and the endmarker, rather than the last element, respectively) this is cumbersome to do, even for bidirectional iterators. As a result, containers that support bidirectional iterators typically also provide a reverse iterator. The reverse iterator comes in two flavors: reverse_iterator
and const_reverse_iterator
. For reverse iterators, ++
retreats one position toward the beginning, while - -
advances one position toward the end. Container member functions rbegin
and rend
are used to obtain iterators representing the last element and the beginmarker, respectively. The code in Figure 43.8 prints any container in reverse order. Once again, the use of auto
in C++11 removes from the programmer the burden of deciding whether reverse_iterator
or const_reverse_iterator
is needed.
Figure 43.8Printing a container in reverse.
43.4Additional Components of the STL
43.4.1Sorting, Searching, and Selection
The Standard Library includes a rich set of functions that can be applied to the standard containers. Some of the algorithms include routines for sorting, searching, copying (possibly with substitutions), shuffling, reversing, rotating, merging, and so on. In all there are over 60 generic algorithms. In this section, we highlight those related to efficient sorting, selection, and binary search.
43.4.1.1Sorting
Sorting in C++ is accomplished by use of function template sort
. The parameters to sort
represent the start and endmarker of a (half-open range in a) container and an optional comparator:
•void sort( iterator begin, iterator end )
•void sort( iterator begin, iterator end, Comparator cmp )
The iterators must support random access. As an example, in
sort( v.begin( ), v.end( ) ); sort( v.begin( ), v.end( ), greater<int>( ) ); sort( v.begin( ), v.begin( ) + min( v.size ( ), 5 ) );
the first call sorts the entire container, v
, in nondecreasing order. The second call sorts the entire container in nonincreasing order. The third call sorts the first five elements of the container in nondecreasing order. C++11 requires that sort
provides O(N log N) worst-case performance; recent implementations have used a quicksort/heapsort hybird algorithm (introsort) [8], but no specific implementation is mandated.
The C++ library inherits the sorting routine qsort
from C. qsort
uses pointers to functions to perform its comparison making it significantly slower on average than the sort
routine. Furthermore, many implementations use a version of quicksort that has been shown to provide quadratic behavior on some commonly occurring inputs [9]. Avoid using qsort
.
43.4.1.2Selection
The function template nth_element
is used for selection, and as expected has O(N) average-case running time. The parameters include a pair of iterators, and k:
•void nth_element( iterator begin, int k, iterator end )
•void nth_element( iterator begin, int k, iterator end, Comparator c)
As a result of calling nth_element
, the item in the kth position is the kth smallest element, where counting as usual in C++ begins at 0.
Thus, in
nth_element( v.begin( ), 0, v.end( ) ); nth_element( v.begin( ), 0, v.end( ), greater<int>( ) ); nth_element( v.begin( ), v.begin( ) + ( v.end( ) - vbegin( ) / 2 ), v.end( ) );
the first call places the smallest element in the position given by v.begin( )
, the second call places the largest element in that position, and the third call places the median in the middle position.
43.4.1.3Searching
Several generic searching algorithms are available for containers. The most basic is:
•iterator find( iterator begin, iterator end, const Object & x )
: returns an iterator representing the first occurrence of x
in the half-open range specified by begin
and end
, or end
if x
is not found.
find
is simply a sequential search. If the range is sorted, binary_search
can be used to find an object in logarithmic time. A comparator can be provided or the default ordering can be used. The signatures for binary_search
are:
•iterator binary_search( iterator begin, iterator end, const Object & x )
•iterator binary_search( iterator begin, iterator end, const Object & x, Comparator cmp )
equal_range
, lower_bound
, and upper_bound
search sorted ranges and behave with the same semantics as the identically named member functions in set
.
Parts of this chapter are based upon the author’s work in References 6 and 7.
1.J. Lajoie, S. Lippman, and B.E. Moo, C++ Primer, Fifth Edition, Addison-Wesley, Reading, MA, 2013.
2.S. Meyers, Effective STL, Addison-Wesley, Reading, MA, 2001.
3.D.R. Musser and A. Saini, C++ Programming with the Standard Template Library, Addison-Wesley, Reading, MA, 1996.
4.B. Stroustrop, The C++ Programming Language, Fourth Edition, Addison-Wesley, Boston, MA, 2013.
5.B. Stroustrop, The Design and Evolution of C++, Addison-Wesley, Reading, MA, 1994.
6.M.A. Weiss, C++ for Java Programmers, Prentice-Hall, Englewood Cliffs, NJ, 2004.
7.M.A. Weiss, Data Structures and Algorithm Analysis in C++, Fourth Edition, Pearson, Boston, MA, 2014.
8.D.R. Musser, Introspective Sorting and Selection Algorithms, Software: Practice and Experience, 27, 1997, 983–993.
9.J.L. Bentley and M.D. McIlroy, Engineering: A Sort Function, Software: Practice and Experience, 23, 1993, 1249–1265.
13.58.44.229