13

Algorithms

Do not multiply entities beyond necessity.

– William Occam

13.1 Introduction

A data structure, such as a list or a vector, is not very useful on its own. To use one, we need operations for basic access such as adding and removing elements (as is provided for list and vector). Furthermore, we rarely just store objects in a container. We sort them, print them, extract subsets, remove elements, search for objects, etc. Consequently, the standard library provides the most common algorithms for containers in addition to providing the most common container types. For example, we can simply and efficiently sort a vector of Entrys and place a copy of each unique vector element on a list:

void f(vector<Entry>& vec, list<Entry>& lst)
{
         sort(vec.begin(),vec.end());                                          // use < for order
         unique_copy(vec.begin(),vec.end(),lst.begin());         // don't copy adjacent equal elements
}

For this to work, less than (<) and equal (==) must be defined for Entrys. For example:

bool operator<(const Entry& x, const Entry& y)       // less than
{
         return x.name<y.name;             // order Entries by their names
}

A standard algorithm is expressed in terms of (half-open) sequences of elements. A sequence is represented by a pair of iterators specifying the first element and the one-beyond-the-last element: elements:

Images

In the example, sort() sorts the sequence defined by the pair of iterators vec.begin() and vec.end(), which just happens to be all the elements of a vector. For writing (output), we need only to specify the first element to be written. If more than one element is written, the elements following that initial element will be overwritten. Thus, to avoid errors, lst must have at least as many elements as there are unique values in vec.

Unfortunately, the standard library doesn’t offer an abstraction to support range-checked writing into a container. However, we can define one:

template<typename C>
class Checked_iter {
public:
         using value_type = typename C::value_type;
         using difference_type = int;

         Checked_iter() { throw Missing_container{}; } // concept forward_iterator requires a default constructor
         Checked_iter(C& cc) : pc{ &cc } {}
         Checked_iter(C& cc, typename C::iterator pp) : pc{ &cc }, p{ pp } {}

         Checked_iter& operator++() { check_end(); ++p; return *this; }
         Checked_iter operator++(int) { check_end(); auto t{ *this }; ++p; return t; }
         value_type& operator*() const { check_end(); return *p; }

         bool operator==(const Checked_iter& a) const { return p==a.p; }
         bool operator!=(const Checked_iter& a) const { return p!=a.p; }
private:
         void check_end() const { if (p == pc->end()) throw Overflow{}; }
         C* pc {}; // default initialize to nullptr
         typename C::iterator p = pc->begin();
};

Obviously, this is not standard-library quality, but it shows the idea:

vector<int> v1 {1, 2, 3};            // three elements
vector<int> v2(2);                              // two elements

copy(v1,v2.begin());                          // will overflow
copy(v1,Checked_iter{v2});             // will throw

If, in the read-and-sort example, we had wanted to place the unique elements in a new list, we could have written:

list<Entry> f(vector<Entry>& vec)
{
         list<Entry> res;
         sort(vec.begin(),vec.end());
         unique_copy(vec.begin(),vec.end(),back_inserter(res));           // append to res
         return res;
}

The call back_inserter(res) constructs an iterator for res that adds elements at the end of a container, extending the container to make room for them. This saves us from first having to allocate a fixed amount of space and then filling it. Thus, the standard containers plus back_inserter()s eliminate the need to use error-prone, explicit C-style memory management using realloc(). The standard-library list has a move constructor (§6.2.2) that makes returning res by value efficient (even for lists of thousands of elements).

When we find the pair-of-iterators style of code, such as sort(vec.begin(),vec.end()), tedious, we can use range versions of the algorithms and write sort(vec)13.5). The two versions are equivalent. Similarly, a range-for loop is roughly equivalent to a C-style loop using iterators directly:

for (auto& x : v) cout<<x;                                                 // write out all elements of v
for (auto p = v.begin(); p!=v.end(); ++p) cout<<*p;        // write out all elements of v

In addition to being simpler and less error-prone, the range-for version is often also more efficient.

13.2 Use of Iterators

For a container, a few iterators referring to useful elements can be obtained; begin() and end() are the best examples of this. In addition, many algorithms return iterators. For example, the standard algorithm find looks for a value in a sequence and returns an iterator to the element found:

bool has_c(const string& s, char c)           // does s contain the character c?
{
        auto p = find(s.begin(),s.end(),c);
        if (p!=s.end())
                return true;
        else
               return false;
}

Like many standard-library search algorithms, find returns end() to indicate “not found.” An equivalent, shorter, definition of has_c() is:

bool has_c(const string& s, char c)           // does s contain the character c?
{
        return find(s,c)!=s.end();
}

A more interesting exercise would be to find the location of all occurrences of a character in a string. We can return the set of occurrences as a vector<char*>s. Returning a vector is efficient because vector provides move semantics (§6.2.1). Assuming that we would like to modify the locations found, we pass a non-const string:

vector<string::iterator> find_all(string& s, char c)          // find all occurrences of c in s
{
          vector<char*> res;
          for (auto p = s.begin(); p!=s.end(); ++p)
                   if (*p==c)
                           res.push_back(&*p);
          return res;
}

We iterate through the string using a conventional loop, moving the iterator p forward one element at a time using ++ and looking at the elements using the dereference operator *. We could test find_all() like this:

void test()
{
         string m {"Mary had a little lamb"};
         for (auto p : find_all(m, a ))
                  if (*p!= a )
                          cerr << "a bug!
";
}

That call of find_all() could be graphically represented like this:

Images

Iterators and standard algorithms work equivalently on every standard container for which their use makes sense. Consequently, we could generalize find_all():

template<typename C, typename V>
vector<typename C::iterator> find_all(C& c, V v)           // find all occurrences of v in c
{
         vector<typename C::iterator> res;
         for (auto p = c.begin(); p!=c.end(); ++p)
                 if (*p==v)
                         res.push_back(p);
         return res;
}

The typename is needed to inform the compiler that C’s iterator is supposed to be a type and not a value of some type, say, the integer 7.

Alternatively, we could have returned a vector of ordinary pointers to the elements:

template<typename C, typename V>
auto find_all(C& c, V v)           // find all occurrences of v in c
{
        vector<range_value_t<C>*> res;
        for (auto& x : c)
                  if (x==v)
                           res.push_back(&x);
        return res;
}

While I was at it, I also simplified the code by using a range-for loop and the standard-library range_value_t16.4.4) to name the type of the elements. A simplified version of range_value_t can be defined like this:

template<typename T>
using range_value_type_t = T::value_type;

Using either version of find_all(), we can write:

void test()
{
         string m {"Mary had a little lamb"};

         for (auto p : find_all(m, a ))                              // p is a string::iterator
                  if (*p!= a )
                          cerr << "string bug!
";


         list<int> ld {1, 2, 3, 1, -11, 2};
         for (auto p : find_all(ld,1))                               // p is a list<int>::iterator
                  if (*p!=1)
                          cerr << "list bug!
";

         vector<string> vs {"red", "blue", "green", "green", "orange", "green"};
         for (auto p : find_all(vs,"red"))                  // p is a vector<string>::iterator
                  if (*p!="red")
                          cerr << "vector bug!
";

         for (auto p : find_all(vs,"green"))
                  *p = "vert";
}

Iterators are used to separate algorithms and containers. An algorithm operates on its data through iterators and knows nothing about the container in which the elements are stored. Conversely, a container knows nothing about the algorithms operating on its elements; all it does is to supply iterators upon request (e.g., begin() and end()). This model of separation between data storage and algorithm delivers very general and flexible software.

Images

13.3 Iterator Types

What are iterators really? Any particular iterator is an object of some type. There are, however, many different iterator types – an iterator needs to hold the information necessary for doing its job for a particular container type. These iterator types can be as different as the containers and the specialized needs they serve. For example, a vector’s iterator could be an ordinary pointer, because a pointer is quite a reasonable way of referring to an element of a vector:

Images

Alternatively, a vector iterator could be implemented as a pointer to the vector plus an index:

Images

Using such an iterator allows range checking.

A list iterator must be something more complicated than a simple pointer to an element because an element of a list in general does not know where the next element of that list is. Thus, a list iterator might be a pointer to a link:

Images

What is common for all iterators is their semantics and the naming of their operations. For example, applying ++ to any iterator yields an iterator that refers to the next element. Similarly, * yields the element to which the iterator refers. In fact, any object that obeys a few simple rules like these is an iterator. Iterator is a general idea, a concept (§8.2), and different kinds of iterators are made available as standard-library concepts, such as forward_iterator and random_access_iterator14.5). Furthermore, users rarely need to know the type of a specific iterator; each container “knows” its iterator types and makes them available under the conventional names iterator and const_iterator. For example, list<Entry>::iterator is the general iterator type for list<Entry>. We rarely have to worry about the details of how that type is defined.

In some cases, an iterator is not a member type, so the standard library offers iterator_t<X> that works wherever X’s iterator is defined.

13.3.1 Stream Iterators

Iterators are general and useful concepts for dealing with sequences of elements in containers. However, containers are not the only place where we find sequences of elements. For example, an input stream produces a sequence of values, and we write a sequence of values to an output stream. Consequently, the notion of iterators can be usefully applied to input and output.

To make an ostream_iterator, we need to specify which stream will be used and the type of objects written to it. For example:

ostream_iterator<string> oo {cout};            // write strings to cout

The effect of assigning to *oo is to write the assigned value to cout. For example:

int main()
{
        *oo = "Hello, ";              // meaning cout<<"Hello, "
        ++oo;
        *oo = "world!
";          // meaning cout<<"world!
"
}

This is yet another way of writing the canonical message to standard output. The ++oo is done to mimic writing into an array through a pointer. That way, we can use algorithms on streams. For example:

vector<string> v{ "Hello", ", ", "World!
" };
copy(v, oo);

Similarly, an istream_iterator is something that allows us to treat an input stream as a read-only container. Again, we must specify the stream to be used and the type of values expected:

istream_iterator<string> ii {cin};

Input iterators are used in pairs representing a sequence, so we must provide an istream_iterator to indicate the end of input. This is the default istream_iterator:

istream_iterator<string> eos {};

Typically, istream_iterators and ostream_iterators are not used directly. Instead, they are provided as arguments to algorithms. For example, we can write a simple program to read a file, sort the words read, eliminate duplicates, and write the result to another file:

int main()
{
        string from, to;
        cin >> from >> to;                                       // get source and target file names

        ifstream is {from};                                      // input stream for file "from"
        istream_iterator<string> ii {is};                // input iterator for stream
        istream_iterator<string> eos {};               // input sentinel

        ofstream os {to};                                        // output stream for file "to"
        ostream_iterator<string> oo {os,"
"};    // output iterator for stream plus a separator

        vector<string> b {ii,eos};                          // b is a vector initialized from input
        sort(b);                                                        // sort the buffer

        unique_copy(b,oo);                                   // copy the buffer to output, discard replicated values

        return !is.eof() || !os;                                 // return error state (§1.2.1, §11.4)
}

I used the range versions of sort() and unique_copy(). I could have used iterators directly, e.g., sort(b.begin(),b.end()), as is common in older code.

Please remember that to use both a traditional iterator version of a standard-library algorithm and its ranges counterpart, we need to either explicitly qualify the call of the range version or use a using-declaration (§9.3.2):

copy(v, oo);                              // potentially ambiguous
ranges::copy(v, oo);                // OK
using ranges::copy(v, oo);     // copy(v, oo) OK from here on
copy(v, oo);                              // OK

An ifstream is an istream that can be attached to a file (§11.7.2), and an ofstream is an ostream that can be attached to a file. The ostream_iterator’s second argument is used to delimit output values.

Actually, this program is longer than it needs to be. We read the strings into a vector, then we sort() them, and then we write them out, eliminating duplicates. A more elegant solution is not to store duplicates at all. This can be done by keeping the strings in a set, which does not keep duplicates and keeps its elements in order (§12.5). That way, we could replace the two lines using a vector with one using a set and replace unique_copy() with the simpler copy():

set<string> b {ii,eos};             // collect strings from input
copy(b,oo);                              // copy buffer to output

We used the names ii, eos, and oo only once, so we could further reduce the size of the program:

int main()
{
         string from, to;
         cin >> from >> to;               // get source and target file names

         ifstream is {from};              // input stream for file "from"
         ofstream os {to};                // output stream for file "to"

         set<string> b {istream_iterator<string>{is},istream_iterator<string>{}};       // read input
         copy(b,ostream_iterator<string>{os,"
"});                                                      // copy to output

         return !is.eof() || !os;                        // return error state (§1.2.1, §11.4)
}

It is a matter of taste and experience whether or not this last simplification improves readability.

13.4 Use of Predicates

In the examples so far, the algorithms have simply “built in” the action to be done for each element of a sequence. However, we often want to make that action a parameter to the algorithm. For example, the find algorithm (§13.2, §13.5) provides a convenient way of looking for a specific value. A more general variant looks for an element that fulfills a specified requirement, a predicate. For example, we might want to search a map for the first value larger than 42. A map allows us to access its elements as a sequence of (key,value) pairs, so we can search a map<string,int>’s sequence for a pair<const string,int> where the int is greater than 42:

void f(map<string,int>& m)
{
        auto p = find_if(m,Greater_than{42});
        // ...
}

Here, Greater_than is a function object (§7.3.2) holding the value (42) to be compared against a map entry of type pair<string,int>:

struct Greater_than {
         int val;
         Greater_than(int v) : val{v} { }
         bool operator()(const pair<string,int>& r) const { return r.second>val; }
};

Alternatively and equivalently, we could use a lambda expression (§7.3.2):

auto p = find_if(m, [](const auto& r) { return r.second>42; });

A predicate should not modify the elements to which it is applied.

13.5 Algorithm Overview

A general definition of an algorithm is “a finite set of rules which gives a sequence of operations for solving a specific set of problems [and] has five important features: Finiteness ... Definiteness ... Input ... Output ... Effectiveness” [Knuth,1968,§1.1]. In the context of the C++ standard library, an algorithm is a function template operating on sequences of elements.

The standard library provides many dozens of algorithms. The algorithms are defined in namespace std and presented in the <algorithm> and <numeric> headers. These standard-library algorithms all take sequences as inputs. A half-open sequence from b to e is referred to as [b:e). Here are a few examples:

Selected Standard Algorithms <algorithm>

f=for_each(b,e,f)

For each element x in [b:e) do f(x)

p=find(b,e,x)

p is the first p in [b:e) so that *p==x

p=find_if(b,e,f)

p is the first p in [b:e) so that f(*p)

n=count(b,e,x)

n is the number of elements *q in [b:e) so that *q==x

n=count_if(b,e,f)

n is the number of elements *q in [b:e) so that f(*q)

replace(b,e,v,v2)

Replace elements *q in [b:e) so that *q==v with v2

replace_if(b,e,f,v2)

Replace elements *q in [b:e) so that f(*q) with v2

p=copy(b,e,out)

Copy [b:e) to [out:p)

p=copy_if(b,e,out,f)

Copy elements *q from [b:e) so that f(*q) to [out:p)

p=move(b,e,out)

Move [b:e) to [out:p)

p=unique_copy(b,e,out)

Copy [b:e) to [out:p); don’t copy adjacent duplicates

sort(b,e)

Sort elements of [b:e) using < as the sorting criterion

sort(b,e,f)

Sort elements of [b:e) using f as the sorting criterion

(p1,p2)=equal_range(b,e,v)

[p1:p2) is the subsequence of the sorted sequence [b:e) with the value v; basically a binary search for v

p=merge(b,e,b2,e2,out)

Merge two sorted sequences [b:e) and [b2:e2) into [out:p)

p=merge(b,e,b2,e2,out,f)

Merge two sorted sequences [b:e) and [b2:e2) into [out:p) using f as the comparison

For each algorithm taking a [b:e) range, <ranges> offers a version that takes a range. Please remember (§9.3.2) that to use both a traditional iterator version of a standard-library algorithm and its ranges counterpart, you need to either explicitly qualify the call or use a using-declaration .

These algorithms, and many more (e.g., §17.3), can be applied to elements of containers, strings, and built-in arrays.

Some algorithms, such as replace() and sort(), modify element values, but no algorithm adds or subtracts elements of a container. The reason is that a sequence does not identify the container that holds the elements of the sequence. To add or delete elements, you need something that knows about the container (e.g., a back_inserter; §13.1) or directly refers to the container itself (e.g., push_back() or erase(); §12.2).

Lambdas are very common as operations passed as arguments. For example:

vector<int> v = {0,1,2,3,4,5};
for_each(v,[](int& x){ x=x*x; });                                               // v=={0,1,4,9,16,25}
for_each(v.begin(),v.begin()+3,[](int& x){ x=sqrt(x); });        // v=={0,1,2,9,16,25}

The standard-library algorithms tend to be more carefully designed, specified, and implemented than the average hand-crafted loop. Know them and use them in preference to code written in the bare language.

13.6 Parallel Algorithms

When the same task is to be done to many data items, we can execute it in parallel on each data item provided the computations on different data items are independent:

  • parallel execution: tasks are done on multiple threads (often running on several processor cores)

  • vectorized execution: tasks are done on a single thread using vectorization, also known as SIMD (“Single Instruction, Multiple Data”).

The standard library offers support for both and we can be specific about wanting sequential execution; in <execution> in namespace execution, we find:

  • seq: sequential execution

  • par: parallel execution (if feasible)

  • unseq: unsequenced (vectorized) execution (if feasible)

  • par_unseq: parallel and/or unsequenced (vectorized) execution (if feasible). Consider std::sort():

sort(v.begin(),v.end());                       // sequential
sort(seq,v.begin(),v.end());                // sequential (same as the default)
sort(par,v.begin(),v.end());                 // parallel
sort(par_unseq,v.begin(),v.end());    // parallel and/or vectorized

Whether it is worthwhile to parallelize and/or vectorize depends on the algorithm, the number of elements in the sequence, the hardware, and the utilization of that hardware by programs running on it. Consequently, the execution policy indicators are just hints. A compiler and/or run-time scheduler will decide how much concurrency to use. This is all nontrivial and the rule against making statements about efficiency without measurement is very important here.

Unfortunately, the range versions of the parallel algorithms are not yet in the standard, but if we need them, they are easy to define:

void sort(auto pol, random_access_range auto& r)
{
         sort(pol,r.begin(),r.end());
}

Most standard-library algorithms, including all in the table in §13.5 except equal_range, can be requested to be parallelized and vectorized using par and par_unseq as for sort(). Why not equal_range()? Because so far nobody has come up with a worthwhile parallel algorithm for that.

Many parallel algorithms are used primarily for numeric data; see §17.3.1.

When requesting parallel execution, be sure to avoid data races (§18.2) and deadlock (§18.3).

13.7 Advice

[1] An STL algorithm operates on one or more sequences; §13.1.

[2] An input sequence is half-open and defined by a pair of iterators; §13.1.

[3] You can define your own iterators to serve special needs; §13.1.

[4] Many algorithms can be applied to I/O streams; §13.3.1.

[5] When searching, an algorithm usually returns the end of the input sequence to indicate “not found”; §13.2.

[6] Algorithms do not directly add or subtract elements from their argument sequences; §13.2, §13.5.

[7] When writing a loop, consider whether it could be expressed as a general algorithm; §13.2.

[8] Use using-type-aliases to clean up messy notation; §13.2.

[9] Use predicates and other function objects to give standard algorithms a wider range of meanings; §13.4, §13.5.

[10] A predicate must not modify its argument; §13.4.

[11] Know your standard-library algorithms and prefer them to hand-crafted loops; §13.5.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.191.55.102