Copying

Programmers make certain assumptions about ordinary numeric variables. One of these is that x = y means that the value of y is copied to x. Thereafter, x and y are completely independent of one another, and changing y does not change x. References and pointers do not follow this rule; it is easier to share a reference to an object than to make a copy of that object, but you then can't guarantee that the data will not be changed by someone else. The behavior of ordinary variables is called value semantics, where semantics refers to the meaning of a statement, as opposed to its syntax, which is how you write it.

There are fewer surprises with value semantics than with pointer semantics, but value semantics involves copying and keeping redundant information. For example, you might have a vector<int> v1 with 1,000,000 elements; the simple assignment v2 = v1 would involve a lot of copying and use up an extra 4MB of memory (assuming 4-byte integers). Therefore, understanding how C++ manages copying is crucial to writing good, fast, and reliable code.
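The difference between the two kinds of semantics can be sketched with a pair of small functions. This is only an illustration: the function names are made up for this example, and the byte count depends on the size of int on your platform.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Number of bytes taken up by the elements of v
// (ignores the vector's own bookkeeping and any spare capacity).
std::size_t payload_bytes(const std::vector<int>& v)
{
    return v.size() * sizeof(int);
}

// Value semantics: v2 is an independent copy of v1.
// Returns true if changing v2 left v1 untouched.
bool copy_is_independent()
{
    std::vector<int> v1(1000000, 7);   // one million elements, all 7
    std::vector<int> v2 = v1;          // copies every element
    v2[0] = 42;
    return v1[0] == 7 && v2[0] == 42;
}

// Reference semantics: r is just another name for v1.
// Returns true if a change made through r shows up in v1.
bool reference_is_shared()
{
    std::vector<int> v1(10, 7);
    std::vector<int>& r = v1;          // nothing is copied
    r[0] = 42;
    return v1[0] == 42;
}
```

Calling payload_bytes() on a vector of 1,000,000 elements gives 1,000,000 * sizeof(int), which is the extra memory the copy costs.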

Initialization versus Assignment

Two distinct kinds of copying happen in C++, and they are often confused with one another because they both usually use =. An object can be initialized when it is declared, and thereafter it can be assigned a new value. But initialization and assignment are not necessarily the same. This is most obvious in the case of references, such as the following:

;> int i = 1, j = 2;
;> int& ri = i;  // initialization
;> ri = j;       // assignment

The initialization ri = i means “make ri refer to i,” and the assignment ri = j means “copy j into whatever ri is referring to,” which means the assignment is actually modifying the number stored in i. References are an exceptional case, and for ordinary types, there is no effective difference between initialization and assignment—the results of int i; i = 1; and int i = 1 are the same. But even though the results must be consistent, class initialization and assignment are different operations. Here is a silly example that uses the old C output routine puts() to show when the constructors and operator= are called:

struct A {
  A()                      {  puts("A() default"); }
  A(const A& a)            {  puts("A() init");    }
  A& operator=(const A& a) {  puts("A() assign"); return *this; }
};
;> A a1;
A() default
;> A a2 = a1;
A() init
;> a2 = a1;
A() assign

The declaration A a2 = a1 must involve a constructor, and it matches the second constructor of A, which takes a const reference argument; the assignment a2 = a1 matches operator=. Basically, initialization involves construction and copying, whereas assignment involves just copying. If you leave out the second constructor A(const A& a), C++ will generate one; it will not use the assignment operator. Likewise, if operator= is not present, the compiler will do sensible default copying (more about this in the next section).

Initialization does not need a =. C++ is equally happy with object-style initializations, in which the argument is in parentheses. Ordinary types can be initialized with values in parentheses as well, and in fact this is the syntax for constructor initialization lists. So you can rewrite the preceding example as follows:

;> A a2(a1);
A() init
;> a2 = a1;
A() assign
;> int k(0);
;> k = 0;

The second constructor of A, which takes a const reference argument, is called a copy constructor, and it is used whenever an object is initialized from another object of the same type. Such initialization happens in other places as well. Passing an object to a function by value, as in the following example, is a common case:

;> void f(A a) {   }
;> f(a1);
A() init

Effectively, the formal argument a of f() is a local variable, which must be initialized. The call f(a1) causes the declaration A a = a1 to happen, so f() receives its own local copy of a1. You can also return objects from functions, as in the following example:

;> A g() {  return a1; }
;> a2 = g();
A() init
A() assign

You should try to avoid passing and returning large objects by value because of the copying overhead. Passing a const reference or returning a value via a reference does not cause copying to occur.
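One way to see the difference is to count copy constructor calls. The following is a minimal sketch; Counted and the helper functions are hypothetical names introduced only for this illustration.

```cpp
#include <cassert>

// Hypothetical helper type that counts how often it is copy-constructed.
struct Counted {
    static int copies;
    Counted() { }
    Counted(const Counted&) { ++copies; }
};
int Counted::copies = 0;

void by_value(Counted) { }             // the parameter is initialized by copying
void by_const_ref(const Counted&) { }  // the parameter is bound to the caller's object

// Returns how many copies a single call to by_value() makes.
int copies_by_value()
{
    Counted obj;
    int before = Counted::copies;
    by_value(obj);
    return Counted::copies - before;
}

// Returns how many copies a single call to by_const_ref() makes.
int copies_by_const_ref()
{
    Counted obj;
    int before = Counted::copies;
    by_const_ref(obj);
    return Counted::copies - before;
}
```

Passing by value costs one copy per call, and passing by const reference costs none. The cost of returning by value is deliberately left out of this sketch, because compilers are allowed to optimize some of those copies away.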

Memberwise Copying

What is the default, sensible, way that C++ copies objects? If you consider the following Person structure, what must happen to properly copy such objects?


struct Person {
   string m_name;
   string m_address;
   long   m_id;

   Person()
    : m_name("Joe"),m_address("No Fixed Abode"),m_id(0)
   { }
};
;> Person p;
;> p.m_name = "Jack";
(string&) 'Jack'
;> Person q;
;> q = p;
;> q.m_name;
(string&) 'Jack'

People imagine at first that objects are copied byte-by-byte; C copies structures this way and provides the library function memcpy() for copying raw memory. However, you should avoid using memcpy() on C++ objects; instead, you should let the compiler do the work for you because the compiler knows the exact layout of the objects in memory. For instance, C++ strings contain pointers to dynamically allocated memory, and you should not copy these pointers directly. (You'll learn why this is the case in the next section.) C++ automatically generates the following assignment operator for Person:

Person& Person::operator= (const Person& p)
{
  m_name = p.m_name;
  m_address = p.m_address;
  m_id = p.m_id;
  return *this;
}

This type of copying is called memberwise copying: All the member fields of the source are copied to the corresponding fields of the target. Memberwise copying is not the same as merely copying the bytes because some members have their own = operators, which must be used. Of course, if a simple structure contained no objects, such as strings or containers, memory would in effect simply be moved, and you can trust C++ to handle this case very efficiently. In the same way as operator=, C++ generates the following copy constructor, unless you supply your own:

Person::Person(const Person& p)
 : m_name(p.m_name), m_address(p.m_address),m_id(p.m_id)
{  }
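As a quick check that the generated operations give true value semantics for Person, the following sketch copies a Person both ways and then changes the original. The function name copies_are_independent() is made up for this example; Person is the same as above, with std:: spelled out.

```cpp
#include <cassert>
#include <string>

struct Person {
    std::string m_name;
    std::string m_address;
    long        m_id;

    Person()
     : m_name("Joe"), m_address("No Fixed Abode"), m_id(0)
    { }
    // No copy constructor or operator= here:
    // the compiler generates the memberwise versions.
};

// Returns true if copies of a Person are unaffected by later changes to the original.
bool copies_are_independent()
{
    Person p;
    p.m_name = "Jack";

    Person q = p;        // generated copy constructor
    Person r;
    r = p;               // generated assignment operator

    p.m_name = "Jill";   // change the original afterwards
    return q.m_name == "Jack" && r.m_name == "Jack" && p.m_name == "Jill";
}
```

Each string member is copied by its own copy operations, so q and r get their own character data rather than sharing p's.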

Copy Constructors and Assignment Operators

Why would you need control over copying? C++ memberwise copying generates default copy constructors and assignment operators that work well for most cases. But in some cases memberwise copying leads to problems.

The following is a simple Array class, which is an improvement over the one discussed in Chapter 7, “Classes,” since its size can be specified when the array is created:

class Array {
 int *m_ptr;
 int m_size;
public:
 Array(int sz)
 {
   m_size = sz;
   m_ptr = new int[sz];
 }
 ~Array()
 {
   delete[] m_ptr;
 }
 int& operator[] (int j)
 {
   return m_ptr[j];
 }
};

This is the simplest possible dynamic array; space for sz integers is dynamically allocated by new int[sz]. This array form of new makes room for a number of objects of the given type, which are properly constructed if they are not simple types. (To ensure that all these objects are destroyed properly, you need to use delete[]. Many compilers let you get away with plain delete for int objects, but mixing new[] with delete is officially undefined behavior, so always use delete[] with the array new.) The m_ptr pointer can then be safely indexed from 0 to sz-1; you can easily put a range check in operator[]. To prevent memory leaks, you should give the memory back when the array is destroyed; hence, you use delete[] m_ptr in the destructor. Here is the Array class in action:
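The range check mentioned above might look something like the following sketch. CheckedArray and index_is_rejected() are hypothetical names, throwing std::out_of_range is just one possible policy, and copying is switched off (using C++11 syntax) purely to keep the example short.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical variant of Array whose operator[] checks the index.
class CheckedArray {
    int *m_ptr;
    int m_size;
public:
    explicit CheckedArray(int sz)
     : m_ptr(new int[sz]()), m_size(sz)   // () sets every element to zero
    { }
    ~CheckedArray() { delete[] m_ptr; }

    // Copying is disabled in this sketch (C++11 syntax).
    CheckedArray(const CheckedArray&) = delete;
    CheckedArray& operator=(const CheckedArray&) = delete;

    int size() const { return m_size; }

    int& operator[] (int j)
    {
        if (j < 0 || j >= m_size)
            throw std::out_of_range("CheckedArray: index out of range");
        return m_ptr[j];
    }
};

// Returns true if indexing arr at j throws std::out_of_range.
bool index_is_rejected(CheckedArray& arr, int j)
{
    try {
        arr[j];
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}
```

With an array of 10 elements, indexes 0 to 9 are accepted, and both 10 and -1 are rejected.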

;> Array ar(10);
;> for(int i = 0; i < 10; i++) ar[i] = i;
;> Array br(1);
;> br = ar;
(Array&) Array {}
;> br[2];
(int&) 2
;> ar[2];
(int&) 2
;> br[2] = 3;
(int&) 3
;> ar[2];
(int&) 3

Everything goes fine until you realize that you are not getting a true copy of ar by using br = ar; the second array is effectively just an alias for the first. This is a consequence of memberwise copying; m_size and m_ptr are copied directly, so br shares ar's block of allocated memory. This is not how value semantics works, and it can be confusing and cause errors because you are working with the same data, using two different names. In precisely the same way, initialization will be incorrect. That is, Array b = a will cause b to share the same allocated block of memory as a. Figure 9.1 shows the situation; ar.m_ptr and br.m_ptr are the same!

Figure 9.1. br and ar refer to the same memory block.


You will probably find out sooner rather than later; the following simple test function crashes badly when the arrays go out of scope and are destroyed:

void test()
{
  Array a(10),b(10);
  b = a;
}

After the assignment, both a and b have a pointer to the same block of memory. The block is given back to the system when b is destroyed (which calls delete[] m_ptr). After a pointer is deallocated, you should have nothing to do with it, and you should especially not try to dispose of it again, which is what happens when a is destroyed.

So in these cases it is necessary to explicitly define copy constructors and assignment operators. They essentially do the same thing: They both call copy(), which allocates a new block of memory and copies the values from another Array. Once copy(), the copy constructor, and operator= have been declared as members of Array, they can be defined like this:


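void Array::copy(const Array& ar)
{
  // allocate a fresh block and copy the elements across
  m_size = ar.m_size;
  m_ptr = new int[m_size];

  int *ptr = ar.m_ptr;
  for(int j = 0; j < m_size; j++) m_ptr[j] = ptr[j];
}
Array::Array(const Array& ar)
{
  copy(ar);
}
Array& Array::operator= (const Array& ar)
{
  if (this != &ar) {   // guard against self-assignment (a = a)
    delete[] m_ptr;    // memory from new[] must be released with delete[]
    copy(ar);
  }
  return *this;
}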

Factoring out the copy code into copy() means that you can make sure that initialization and assignment are in step with one another. C++ has no way of knowing whether you have defined the initialization and assignment operations consistently. You should always define both operations or neither of them.
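Putting the pieces together, a complete class might look like the following sketch. It is called DeepArray here only to keep it apart from the Array listing above; size() and the check against self-assignment are additions made for this illustration, and assignment_makes_true_copy() simply repeats the earlier ar and br session.

```cpp
#include <cassert>

class DeepArray {
    int *m_ptr;
    int m_size;

    // Allocates a fresh block and copies the other array's elements into it.
    void copy(const DeepArray& ar)
    {
        m_size = ar.m_size;
        m_ptr = new int[m_size];
        for (int j = 0; j < m_size; j++) m_ptr[j] = ar.m_ptr[j];
    }
public:
    explicit DeepArray(int sz)
     : m_ptr(new int[sz]()), m_size(sz)   // () sets every element to zero
    { }

    DeepArray(const DeepArray& ar) { copy(ar); }

    DeepArray& operator= (const DeepArray& ar)
    {
        if (this != &ar) {   // assigning an object to itself must not free its data
            delete[] m_ptr;
            copy(ar);
        }
        return *this;
    }

    ~DeepArray() { delete[] m_ptr; }

    int  size() const        { return m_size; }
    int& operator[] (int j)  { return m_ptr[j]; }
};

// Returns true if br = ar produces an independent copy of ar.
bool assignment_makes_true_copy()
{
    DeepArray ar(10);
    for (int i = 0; i < 10; i++) ar[i] = i;
    DeepArray br(1);
    br = ar;
    br[2] = 3;               // must not affect ar
    return ar[2] == 2 && br[2] == 3 && br.size() == 10;
}
```

Because each object now owns its own block, both destructors can safely call delete[] when the arrays go out of scope.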

Some experts say you should always supply the initialization and assignment operations explicitly like this, so that it is clear how the class handles copying. Stanley Lippman (in a Dr. Dobb's Journal article) pointed out that for simple classes that have to be very efficient, it's best to let the compiler generate the copying code. In such cases, you should be sure to include a comment stating that you have deliberately left out the initialization and assignment operations.

Copying with Inheritance

You can understand inheritance best if you consider the base class to be a member object. That is, if you derive B from A, then B contains an A object, as in the case of Employee containing Person in Chapter 8, “Inheritance and Virtual Methods.” B usually inherits all of A's members, except for constructors, destructors, and the assignment operator. This is as it should be; B is usually a different beast from A, with extra data, and using the old assignment leaves the extra fields of B uninitialized. But if you don't supply copy constructors or assignment operators, then the compiler does memberwise copying on B's fields, and it uses A's own copy constructor and assignment operator to copy the base class part.

struct obj {
  obj()             {  puts("obj() default"); }
  obj(const obj& o) {  puts("obj() init");    }
};
struct A {
  obj mem_a;
  A()               {  puts("A() default"); }
  A(const A& a)     {  puts("A() init");    }
};
struct B: A {
  obj mem_b;
  B()               {  puts("B() default"); }
};
;> B b1;
A() default
B() default
;> B b2 = b1;
obj() default
A() init
obj() init

As expected, the default constructor for B first calls the default constructor for A. Remember the basic rule in operation here: C++ guarantees that everything in an object will be properly initialized, including the base class. There is no explicit copy constructor for B, so the compiler generates a copy constructor that does memberwise initialization. You can think of the base class A as the first member; it is initialized first, which causes A's copy constructor to be called. The member object mem_b is properly copied, but only the default constructor is called for mem_a, because A's handwritten copy constructor does not copy mem_a in an initialization list.
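The order of these calls can be checked with a standalone version of the example. The sketch below assumes one change: the messages go into a global vector (g_log, a name made up here) instead of being printed with puts(), so that the sequence can be inspected. Compiled as an ordinary program, the default construction of a B also records the construction of the two obj members.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical global log, used instead of puts() so the call order can be examined.
std::vector<std::string> g_log;

struct obj {
    obj()           { g_log.push_back("obj() default"); }
    obj(const obj&) { g_log.push_back("obj() init"); }
};
struct A {
    obj mem_a;
    A()         { g_log.push_back("A() default"); }
    A(const A&) { g_log.push_back("A() init"); }   // note: does not copy mem_a
};
struct B : A {
    obj mem_b;
    B() { g_log.push_back("B() default"); }
    // No copy constructor: the compiler generates a memberwise one.
};

// Messages recorded while executing: B b1;
std::vector<std::string> log_of_default_construction()
{
    g_log.clear();
    B b1;
    return g_log;
}

// Messages recorded while executing: B b2 = b1;
std::vector<std::string> log_of_copy_construction()
{
    B b1;
    g_log.clear();
    B b2 = b1;
    return g_log;
}
```

The copy records "obj() default", "A() init", and "obj() init", in that order: mem_a is default-constructed inside A's copy constructor, and mem_b is copied by the generated copy constructor of B.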

If you supply a copy constructor, you should not assume that the inherited copy operation will take place. In the following example, B has a copy constructor, but only A's default constructor is called:

struct B: A {
  obj mem_b;
  B()            {  }   // needed so that b1 can be declared
  B(const B& b)  {  puts("B() init"); }
};
;> B b1;
;> B b2 = b1;
A() default
B() init

The default base constructor is called when a class is constructed and there is no explicit call in the class initialization list. You might be tempted to do the base initialization explicitly, but it is a better idea to rely on the official copy constructor, which is called if you ask nicely, as in the following example:

struct B: A {
  obj mem_b;
  B()            {  }   // needed so that b1 can be declared
  B(const B& b)
   : A(b)
  {  puts("B() init"); }
};
;> B b2 = b1;
A() init
B() init

Similarly, if you supply operator=, you should be prepared to call the inherited operator explicitly, as in the following example:


struct A {
  A& operator=(const A& a) {
    puts("A() =");
    return *this;
  };
};

struct B: A {
  B& operator=(const B& a)
  {
    A::operator=(a);   // call A's copy assignment operator directly!
    puts("B() op=");
    return *this;
  }
};
;> B b1,b2;
;> b1 = b2;
A() =
B() op=

In this example, A::operator=(a) is merely calling operator= directly as a function. You can do this with all operators; for instance, if operator+ is defined as an ordinary (non-member) function, operator+(s1,s2) is the same as s1+s2. You have to call the operator like a function to specify that the inherited operator is needed. You cannot use *this = a, even though it compiles, because it finds B::operator=, and the program will crash quickly due to uncontrolled recursion. Alternatively, if you have defined A::operator= by using some copy() method (as recommended earlier in the chapter), you can call that directly.
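Here is a small sketch of both points. The data members m_a and m_b and the two checking functions are assumptions added for this illustration, so that there is something to copy and compare.

```cpp
#include <cassert>
#include <string>

struct A {
    int m_a;
    A() : m_a(0) { }
    A& operator=(const A& a) { m_a = a.m_a; return *this; }
};

struct B : A {
    int m_b;
    B() : m_b(0) { }
    B& operator=(const B& b)
    {
        A::operator=(b);   // copy the base part by calling A's operator as a function
        m_b = b.m_b;       // then copy B's own field
        return *this;
    }
};

// Returns true if assigning one B to another copies both the A part and the B part.
bool assignment_copies_base_and_derived()
{
    B b1, b2;
    b2.m_a = 1;
    b2.m_b = 2;
    b1 = b2;
    return b1.m_a == 1 && b1.m_b == 2;
}

// Returns true if the function-style call gives the same result as the operator syntax.
bool function_call_matches_operator()
{
    std::string s1 = "copy", s2 = "ing";
    return operator+(s1, s2) == s1 + s2;
}
```

If the line A::operator=(b) were left out of B::operator=, the assignment would still compile, but m_a would silently keep its old value.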
