Why Iterators?

Understanding iterators is perhaps the key to understanding the STL. Just as templates make algorithms independent of the type of data stored, iterators make the algorithms independent of the type of container used. Thus, they are an essential component of the STL’s generic approach.

To see why iterators are needed, let’s look at how you might implement a find function for two different data representations and then see how you could generalize the approach. First, let’s consider a function that searches an ordinary array of double for a particular value. You could write the function like this:

double * find_ar(double * ar, int n, const double & val)
{
    for (int i = 0; i < n; i++)
        if (ar[i] == val)
            return &ar[i];
    return 0;  // or, in C++11, return nullptr;
}

If the function finds the value in the array, it returns the address in the array where the value is found; otherwise, it returns the null pointer. It uses subscript notation to move through the array. You could use a template to generalize to arrays of any type having an == operator. Nonetheless, this algorithm is still tied to one particular data structure—the array.
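As a minimal sketch of the template generalization just mentioned, the function below replaces double with a type parameter. The name find_ar_t and the use of std::size_t for the element count are assumptions made for this illustration; they are not part of the original listing.

```cpp
#include <cstddef>

// Hypothetical generalization of find_ar(): works for arrays of any
// type T that supports operator==.
template <typename T>
T * find_ar_t(T * ar, std::size_t n, const T & val)
{
    for (std::size_t i = 0; i < n; i++)
        if (ar[i] == val)
            return &ar[i];
    return nullptr;   // C++11; use 0 with older compilers
}
```

The stored type is now a parameter, but the subscript notation and the element count still assume an array.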

So let’s look at searching another kind of data structure, the linked list. (Chapter 12 uses a linked list to implement a Queue class.) The list consists of linked Node structures:

struct Node
{
    double item;
    Node * p_next;
};

Suppose you have a pointer that points to the first node in the list. The p_next pointer in each node points to the next node, and the p_next pointer for the last node in the list is set to 0. You could write a find_ll() function this way:

Node* find_ll(Node * head, const double & val)
{
    Node * start;
    for (start = head; start != 0; start = start->p_next)
        if (start->item == val)
            return start;
    return 0;
}

Again, you could use a template to generalize this to lists of any data type supporting the == operator. Nonetheless, this algorithm is still tied to one particular data structure—the linked list.
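One way the templated list search might look is sketched below. TNode and find_ll_t are hypothetical names chosen for this example so that they do not collide with the Node structure and find_ll() function above.

```cpp
// Hypothetical templated node: the stored type becomes a parameter.
template <typename T>
struct TNode
{
    T item;
    TNode * p_next;
};

// Same search as find_ll(), for any T that supports operator==.
template <typename T>
TNode<T> * find_ll_t(TNode<T> * head, const T & val)
{
    for (TNode<T> * p = head; p != nullptr; p = p->p_next)
        if (p->item == val)
            return p;
    return nullptr;   // val not found
}
```

The traversal step p = p->p_next is still specific to this one node layout, which is the limitation the text goes on to address.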

If you consider details of implementation, the two find functions use different algorithms: One uses array indexing to move through a list of items, and the other resets start to start->p_next. But broadly, the two algorithms are the same: Compare the value with each value in the container in sequence until you find a match.

The goal of generic programming in this case would be to have a single find function that would work with arrays or linked lists or any other container type. That is, not only should the function be independent of the data type stored in the container, it should be independent of the data structure of the container itself. Templates provide a generic representation for the data type stored in a container. What’s needed is a generic representation of the process of moving through the values in a container. The iterator is that generalized representation.

What properties should an iterator have in order to implement a find function? Here’s a short list:

• You should be able to dereference an iterator in order to access the value to which it refers. That is, if p is an iterator, *p should be defined.

• You should be able to assign one iterator to another. That is, if p and q are iterators, the expression p = q should be defined.

• You should be able to compare one iterator to another for equality. That is, if p and q are iterators, the expressions p == q and p != q should be defined.

• You should be able to move an iterator through all the elements of a container. This can be satisfied by defining ++p and p++ for an iterator p.

There are more things an iterator could do, but nothing more it need do, at least not for the purposes of a find function. Actually, the STL defines several levels of iterators of increasing capabilities, and we’ll return to that matter later. Note, by the way, that an ordinary pointer meets the requirements of an iterator. Hence, you can rewrite the find_ar() function like this:

typedef double * iterator;
iterator find_ar(iterator ar, int n, const double & val)
{
    for (int i = 0; i < n; i++, ar++)
        if (*ar == val)
            return ar;
    return 0;
}

Then you can alter the function parameter list so that it takes a pointer to the beginning of the array and a pointer to one past the end of the array as arguments to indicate a range. (Listing 7.8 in Chapter 7, “Functions: C++’s Programming Modules,” does something similar.) And the function can return the end pointer as a sign that the value was not found. The following version of find_ar() makes these changes:

typedef double * iterator;
iterator find_ar(iterator begin, iterator end, const double & val)
{
    iterator ar;
    for (ar = begin; ar != end;  ar++)
        if (*ar == val)
            return ar;
    return end;   // indicates val not found
}
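
To show how a caller might use the range version, here is a short sketch. It repeats find_ar() with the iterator typedef written out as double * so that the block stands alone. The index_of() helper is hypothetical and exists only to illustrate the check against end.

```cpp
#include <cstddef>

// Range version of find_ar() from the text, with the iterator
// typedef written out as double *.
double * find_ar(double * begin, double * end, const double & val)
{
    for (double * ar = begin; ar != end; ar++)
        if (*ar == val)
            return ar;
    return end;   // indicates val not found
}

// Hypothetical helper (not in the text): returns the index of val,
// or -1 if find_ar() signals "not found" by returning end.
std::ptrdiff_t index_of(double * ar, std::size_t n, double val)
{
    double * end = ar + n;               // one past the last element
    double * hit = find_ar(ar, end, val);
    return hit == end ? -1 : hit - ar;
}
```

The caller compares the result with end rather than with the null pointer, and never dereferences end itself.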

For the find_ll() function, you can define an iterator class that defines the * and ++ operators:

struct Node
{
    double item;
    Node * p_next;
};

class iterator
{
    Node * pt;
public:
    iterator() : pt(0) {}
    iterator (Node * pn) : pt(pn) {}
    double operator*() { return pt->item;}
    iterator& operator++()     // for ++it
    {
        pt = pt->p_next;
        return *this;
    }
    iterator operator++(int)  // for it++
    {
        iterator tmp = *this;
        pt = pt->p_next;
        return tmp;
    }
// ... operator==(), operator!=(), etc.
};

(To distinguish between the prefix and postfix versions of the ++ operator, C++ adopted the convention of letting operator++() be the prefix version and operator++(int) be the postfix version; the argument is never used and hence needn’t be given a name.)
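
The listing above leaves out the comparison operators. One plausible way to fill them in is sketched below. The class is named ListIter here (a hypothetical name) to keep it apart from the iterator class in the text, and operator*() returns a reference rather than a copy, which is a variation on the original and not a correction of it.

```cpp
struct Node
{
    double item;
    Node * p_next;
};

class ListIter   // hypothetical name; the text calls this class iterator
{
    Node * pt;
public:
    ListIter() : pt(nullptr) {}
    ListIter(Node * pn) : pt(pn) {}
    // Variation on the text: returning a reference permits *it = value.
    double & operator*() const { return pt->item; }
    ListIter & operator++()      // for ++it
    {
        pt = pt->p_next;
        return *this;
    }
    ListIter operator++(int)     // for it++
    {
        ListIter tmp = *this;
        pt = pt->p_next;
        return tmp;
    }
    // Two iterators are equal when they refer to the same node.
    bool operator==(const ListIter & other) const { return pt == other.pt; }
    bool operator!=(const ListIter & other) const { return pt != other.pt; }
};
```

With these definitions, a default-constructed ListIter compares equal to an iterator that has been advanced past the last node of a null-terminated list.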

The main point here is not how, in detail, to define the iterator class, but that with such a class, the second find function can be written like this:

iterator find_ll(iterator head, const double & val)
{
    iterator start;
    for (start = head; start != 0; ++start)
        if (*start == val)
            return start;
    return 0;
}

This is very nearly the same as find_ar(). The point of difference is in how the two functions determine whether they’ve reached the end of the values being searched. The find_ar() function uses an iterator to one-past-the-end, whereas find_ll() uses a null value stored in the final node. Remove that difference, and you can make the two functions identical. For example, you could require that the linked list have one additional element after the last official element. That is, you could have both the array and the linked list have a past-the-end element, and you could end the search when the iterator reaches the past-the-end position. Then find_ar() and find_ll() would have the same way of detecting the end of data and become identical algorithms. Note that requiring a past-the-end element moves from making requirements on iterators to making requirements on the container class.
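
The unification described above can be sketched as a single function template. Everything named here is an assumption for illustration: find_in_range is a hypothetical name (chosen to avoid confusion with the STL's own find()), NodeIter is a stripped-down list iterator offering only the operations the search needs, and the extra past-the-end element is modeled as a node that the caller appends to the list.

```cpp
struct Node
{
    double item;
    Node * p_next;
};

// Minimal list iterator: only *, prefix ++, ==, and != are provided.
class NodeIter
{
    Node * pt;
public:
    NodeIter(Node * pn = nullptr) : pt(pn) {}
    double operator*() const { return pt->item; }
    NodeIter & operator++() { pt = pt->p_next; return *this; }
    bool operator==(const NodeIter & other) const { return pt == other.pt; }
    bool operator!=(const NodeIter & other) const { return pt != other.pt; }
};

// One search for both representations: Iter may be double * or NodeIter.
template <typename Iter, typename T>
Iter find_in_range(Iter begin, Iter end, const T & val)
{
    for (Iter it = begin; it != end; ++it)
        if (*it == val)
            return it;
    return end;   // indicates val not found
}
```

For an array ar of n elements, the call would be find_in_range(ar, ar + n, val). For the list, the caller passes an iterator to the first node and an iterator to the appended past-the-end node. That extra node is never dereferenced, so the value stored in it does not matter.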

The STL follows the approach just outlined. First, each container class (vector, list, deque, and so on) defines an iterator type appropriate to the class. For one class, the iterator might be a pointer; for another, it might be an object. Whatever the implementation, the iterator will provide the needed operations, such as * and ++. (Some classes may need more operations than others.) Next, each container class will have a past-the-end marker, which is the value assigned to an iterator when it has been incremented one past the last value in the container. Each container class will have begin() and end() methods that return iterators to the first element in a container and to the past-the-end position. And each container class will have the ++ operation take an iterator from the first element to past-the-end, visiting every container element en route.

To use a container class, you don’t need to know how its iterators are implemented nor how past-the-end is implemented. It’s enough to know that it does have iterators, that begin() returns an iterator to the first element, and that end() returns an iterator to past-the-end. For example, suppose you want to print the values in a vector<double> object. In that case, you can use this:

vector<double>::iterator pr;
for (pr = scores.begin(); pr != scores.end(); pr++)
    cout << *pr << endl;

Here the following line identifies pr as the iterator type defined for the vector<double> class:

vector<double>::iterator pr;

If you used the list<double> class template instead to store scores, you could use this code:

list<double>::iterator pr;
for (pr = scores.begin(); pr != scores.end(); pr++)
    cout << *pr << endl;

The only change is in the type declared for pr. Thus, by having each class define appropriate iterators and designing the classes in a uniform fashion, the STL lets you write the same code for containers that have quite dissimilar internal representations.

With C++11 automatic type deduction, you can simplify further and use the following code with either the vector or the list:

for (auto pr = scores.begin(); pr != scores.end(); pr++)
    cout << *pr << endl;

Actually, as a matter of style, it’s better to avoid using the iterators directly; instead, if possible, you should use an STL function, such as for_each(), that takes care of the details for you. Alternatively, use the C++11 range-based for loop:

for (auto x : scores) cout << x << endl;

So to summarize the STL approach, you start with an algorithm for processing a container. You express it in as general terms as possible, making it independent of data type and container type. To make the general algorithm work with specific cases, you define iterators that meet the needs of the algorithm and place requirements on the container design. That is, basic iterator properties and container properties stem from requirements placed on the algorithm.
