Case Study: A Resizable Array

It can be a useful exercise to reinvent the wheel, especially if you don't insist on using the results. std::vector and std::valarray (see Appendix B, “A Short Library Reference”) will do well for most applications and have been carefully designed. Also, by now, C++ programmers know them and may be irritated if they have to learn new classes that you have created to do the same thing as std::vector and std::valarray.

But the issues of copying and value semantics are best illustrated by something nontrivial. Furthermore, the resizable array discussed in this section shows how you can have objects that obey value semantics that are not expensive to use.

As mentioned previously in this chapter, you should always pass complex objects by reference because of the copying overhead involved. (Something small and nimble, such as the Point class, can probably be moved as fast as a reference.) But then in the examples in this book, we often pass a std::string argument by value; isn't that inconsistent? If you really need string manipulation to be fast, you would be better off with passing const string& arguments, but the cost of passing by value is often not as great as you think.

You want to pretend as much as possible that string is part of the language's original furniture, like int or double, and for that value semantics must not be too expensive. One solution is a sharing scheme, in which a number of string objects actually share the same data. So passing by value is just passing an alias, as in the case of the array class defined earlier in this chapter, in the section “Copy Constructors and Assignment.”

But as soon as someone tries to write to this string, the string creates its own unique copy of the data. Thereafter it can relax. This is called copy-on-write and can save a lot of pointless copying. (It resembles modern manufacturing practice, where a computer manufacturer will assemble the item when it's ordered, to minimize inventory.)

How do the strings know that they're sharing data? Because the data is reference counted. That is, the data has an associated count, starting at 1. Anybody who wants to borrow the data must increment the reference count. Anybody who no longer needs the data must ask the data nicely to dispose of itself, in order to decrement the reference count. When the reference count becomes zero, the data knows it's no longer needed and commits hara-kiri. Here is a base class that implements this strategy:


// C++ By Example, Chapter 9
// ref_count.h
class RefCount {
   int m_ref_count;
public:
   void init_ref_count() {  m_ref_count = 1; }
   void incr_ref()       {  ++m_ref_count; }
   void decr_ref()       {  —m_ref_count; }
   int  ref_count()      {  return m_ref_count; }

   RefCount()            {  init_ref_count(); }
   virtual ~RefCount()   {  }

   virtual void dispose()
   {
     —m_ref_count;
       if (m_ref_count == 0) delete this;
   }
};
;> RefCount *p = new RefCount();
;> p->incr_ref();    // i.e. now shared by two objects
;> p->dispose();     // only by one!
;> p->dispose();     // finally gets deleted
					

The basic idea of reference-counted objects like RefCount is that the reference count is the number of owners of that object. Any would-be owners must cooperate in two ways: They must call incr_ref() when taking possession of a reference, and they must call dispose() when they don't need that reference anymore. It is more accurate to say that such objects are never owned, but only borrowed.

Please note the virtual destructor for RefCount. Like any method, a class's destructor can be overridden, and in fact it is recommended practice in any hierarchy involving polymorphic methods. You need to make the destructor virtual here because the dispose() method must call the correct destructor when it issues the suicidal command delete this.

The new Array class simply has a pointer to the actual representation object. Copying such arrays just copies these pointers, and it also increments the representation's reference count. So the first task is to work on the representation, which in the preceding example is a vector-like class that has the familiar std::vector interface.

A Resizable, Reference-Counted Vector

The implementation of the Vector class uses two pointers to int, called m_begin and m_end, which correspond directly to the begin() and end() methods (that is, m_end points just past the end of the allocated block pointed to by m_begin). It is not necessary to keep the number of elements as a member variable, but you can anyway for efficient access. The class uses typedef to make T into an alias for int so that you can easily use the same code for other types, simply by changing the typedef.

A simple pointer works fine as an iterator (after all, iterators are generalized pointers)—*p accesses the value, and ++p accesses the next value. This example shows how iterators and pointers behave in very similar ways. Because the size of the type int is 4 bytes, a pointer increment actually adds 4 bytes to the address each time, as you can see from the value of ++pi:


;> int *pi = new int[10];
;> for(int i = 0; i < 10; i++) pi[i] = i;
;> pi;
(int*) pi = 8F1E80
;> *pi;
(int) 0
;> ++pi;
(int*) 8F1E84
;> *pi;
(int) 1
;> —pi;
(int*) 8F1E80
;> int *p_end = pi + 10;
;> for(int *p = pi; p != p_end; ++p)
;1}  cout << *p << ' ';
0 1 2 3 4 5 6 7 8 9

The first part of the interface to Vector looks like this. iterator is defined as a typedef; this type will be accessed as Vector::iterator, precisely like with the standard containers. A nested class RangeError is defined as a type to be thrown when there's an 'index out of bounds' error, which will simularly be refered to as Vector::RangeError:

class Vector: public RefCount {
 public:
  typedef int   T;
 private:
   T *m_begin;
   T *m_end;
   int m_size;
   int m_cap;
 public:
   typedef T *       iterator;

   struct RangeError {
     int idx;
     RangeError(int i) {  idx = i; }
     int where() {  return idx; }
   };
   iterator begin()  {  return m_begin; }
   iterator end()    {  return m_end;   }
   int size()        {  return m_size; }
   int capacity()    {  return m_cap;  }

   T& operator[] (int i)
   {  return m_begin[i]; }

   T& at(int i)  {
    if (i < 0 || i >= m_size) throw RangeError(i);
    return m_begin[i];
   }

You know that the iterator is really just an int*, but you would like to keep this implementation detail to yourself. (It does not have to be a pointer and will certainly not be a plain pointer in the case of a linked list.) As with vector, the at() method provides checked access, and operator[] provides unchecked, fast access.

As well as the current size m_size, you also keep the actual capacity of the array m_cap because, for efficient operation, an array must overallocate. That is, the actual size of the allocated block is larger than the number of elements. In the following allocation code, the basic operation is _realloc(), with which _alloc_copy() and _grow() are defined:

void _copy(T *start, T *finish, T *out)
{  while (start != finish) *out++ = *start++; }

void _realloc(int new_cap, int new_sz, const Vector& v)
{
   T *tmp = new T[new_cap];
   if (m_begin) {
     _copy(v.m_begin,v.m_end,tmp);
      delete[]m_begin;
   }
   m_begin = tmp;
   m_end = m_begin + new_sz;
   m_size = new_sz;
   m_cap  = new_cap;
 }

void _alloc_copy(const Vector& v)
{  realloc(v.m_cap,v.m_size,v); }

void _grow()
{  realloc(m_size+NGROW,m_size,*this); }

By using _alloc_copy(), you can define the copy constructor and copy assignment operators, as in the following code:

Vector(const Vector& v)
{
  m_size = 0;
  m_begin = NULL;
  _alloc_copy(v);
}

Vector& operator= (const Vector& v)
{
  _alloc_copy(v);
  return *this;
}

The resizing operations then become simple:

void resize(int sz)
{  _realloc(sz+NGROW,sz,*this);  }

void reserve(int sz)
{  _realloc(sz,m_size,*this); }

The rest of the vector-like interface is as follows:

void clear()
 {
    delete m_begin;
    m_size = 0;
    m_cap = 0;
 }

 ~Vector() {  clear(); }

 void push_back(T t)
 {
    if (m_size+1 > m_cap) _grow();
    m_size++;
    *m_end++ = t;
 }

 void pop_back()   {  m_end—;  m_size—; }
 T back()          {  return *(m_end-1); }
 T front()         {  return *m_begin; }

};

Overallocation makes operations such as push_back() much more efficient: If there isn't enough capacity for an extra element, then more space is allocated using grow(). As you can see in the following example, the new Vector class works almost exactly like vector<int>. Note that a Vector of 10 elements has room for 20 elements with this implementation:


;> Vector v(10);
;> for(int i = 0; i < 10; i++) v[i] = i+1;
;> v.front();  v.back();
(int) 1
(int) 10
;> v.push_back(11);
;> v.size();
(int) 11
;> v.capacity();
(int) 20
;> Vector::iterator vi;
;> for(vi = v.begin(); vi != v.end(); ++vi)
;1}   cout << *vi << ' ';
1 2 3 4 5 6 7 8 9 10 11;>

The Array Class

The implementation of the Array class is a pointer to a Vector object. So the first part of Array is pretty boring—in it, most of Array's methods are delegated to the representation object:


class Array {
private:
   Vector *m_rep;

public:
 typedef unsigned int uint;
 typedef Vector::iterator iterator;
 typedef Vector::T T;

iterator begin()      {  return m_rep->begin(); }
iterator end()        {  return m_rep->end();   }
void clear()          {  m_rep->clear();        }
uint size()           {  return m_rep->size();  }
uint capacity()       {  return m_rep->capacity();  }
void resize (uint sz) {  m_rep->resize(sz);     }
void reserve(uint sz) {  m_rep->reserve(sz);    }
int  ref_count()      {  return m_rep->ref_count(); }

void pop_back()       {  m_rep->pop_back();    }
T back()              {  return m_rep->back(); }
void push_back(T t)   {  m_rep->push_back(t);  }

This may look very similar to the situation in the last chapter, where Employee had a Person member variable, and many of Employee's methods simply used the Person methods. There inheritance proved a better solution and saved much typing. However, note that the representation object is not fixed. The whole point of the scheme is that Array contains an indirect reference to Vector.

Here is a more interesting part of Array, where we share those references with other Array objects:

// 'copying'!
void attach(const Array& a)
{
  m_rep = a.m_rep;
  m_rep->incr_ref();
}

Array(const Array& a)
{  attach(a);  }

Array& operator= (const Array& a)
{  attach(a);  return *this; }

~Array()
{  m_rep->dispose(); }

Array(int sz = 0)
{   m_rep = new Vector(sz);  }

void unique() {
 if (m_rep->ref_count() > 1) {
    m_rep->dispose();
    m_rep = new Vector(*m_rep);
 }
}

T get(uint i) const
 {  return m_rep->at(i); }

void put(int val, uint i)
 {  unique(); m_rep->at(i) = val; }

Note that Array objects are never copied at first. After a1=a2, a1 and a2 share the same representation object, but the reference count of that object is then 2. Consider this common situation:

;> int array_test(Array a, int idx) {  return a.get(idx); }
;> Array arr(20);
;> fill(arr.begin(),arr.end(),0);  // initialize all to zero..
;> array_test(arr,2);
(int) 0
;> Array a2 = arr;
;> a2.put(1,0);
						

As usual, passing an object by value causes the copy constructor to be called. That initializes the local variable a in the method array_test(), and a is then destroyed when it goes out of scope. But the initialization does not involve copying, and in this case, the destruction does not involve deallocation. It just decrements the reference count of the shared vector object. This is why you must call m_rep->dispose(), rather than delete m_rep; m_rep will know when it needs to destroy itself.

Similarly, Array a2 = arr does not generate a copy, but the put() method calls unique(), which sees that the representation is shared because the reference count is greater than 1. If the representation object is shared, unique() lets go of the shared object and creates a new one by using the Vector copy constructor. Figure 9.2 shows the situation: Originally a and arr point to the same Vector object, which therefore has a reference count of 2. A new Vector object is created, and the reference count of the first Vector object is decremented to 1. The two objects a and arr are now completely independent.

Figure 9.2. Effect of arr.unique(); the objects a and arr no longer share the same Vector.


Most people would be dissatisfied with an array class that did not overload operator[]. Using get() to read values and put() to write values to the object seems inconvenient. The problem is that C++ can't really distinguish between being on the left- and the right-hand side of an assignment. The technique used to produce a text property in the section on user-defined conversions is useful here. operator[] returns a “smart reference” (IRef), which has a different meaning on the left-hand side. Here is the helper class IRef and how it is used by operator[]:

struct IRef {
  Array& m_arr;
  uint   m_idx;
  IRef(Array& arr, uint idx)
  : m_arr(arr),m_idx(idx) {  }

  void operator= (int val) {  m_arr.put(val,m_idx);  }
  operator int   ()        {  return m_arr.get(m_idx); }
};

IRef Array::operator[](uint i)
{  return IRef(*this,i); }

If an IRef object is found within an integer expression, the compiler will try to convert it to an int. The user-defined conversion operator will then actually get the value by using the Array::get() method.

If the IRef is on the left-hand side of an integer assignment, the assignment operator (operator=) is chosen, which causes the Array::put() method to be called. This in turn guarantees that the representation becomes unique.

It seems as though a lot is happening for each simple array access—first the indirect access and then the temporary IRef. However, most of the methods involved are easy to inline, and so the runtime cost of the scheme is not excessive.

There are still some things to be done in this example. For instance, support for the operators += and *= would need to be put into IRef if you wanted to use something like a[i] += 2, and begin() does not properly check whether the representation is shared. The latter would be useful because it would only need to call unique() once in begin(), and thereafter, it would produce a very fast pointer access.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
52.15.214.27