Chapter 2. Introduction to C++ and the Standard Library

History

The standardization of C++ was started in 1989 and finished at the end of 1997, although some formal motions delayed the final publication until September 1998. The result was a reference manual with approximately 750 pages, published by the International Standards Organization (ISO). The standard has the title "Information Technology — Programming Languages — C++." Its document number is ISO/IEC 14882-1998, and it is distributed by the national bodies of the ISO, such as the ANSI in the United States.[1]

The standard was an important milestone for C++. Because it defines the exact contents and behavior of C++, it makes it easier to teach C++, to use C++ in applications, and to port C++ programs to different platforms. It also gives users greater freedom of choice regarding different C++ implementations. Its stability and portability help library providers and tool providers as well as implementers. Thus, the standard helps C++ application developers build better applications faster, and maintain them with less cost and effort.

Part of the standard is a standard library. This library provides core components for I/O, strings, containers (data structures), algorithms (such as sort, search, and merge), support for numeric computation, and (as could be expected from an international standard) support for internationalization (such as different character sets).

You may wonder why the standardization process took almost 10 years, and if you know some details about the standard you might wonder why after all this time it is still not perfect. Ten years, in fact, was not enough time! Although, according to the history and the context of the standardization process, a lot was accomplished. The result is usable in practice, but it is not perfect (nothing ever is).

The standard is not the result of a company with a big budget and a lot of time. Standards organizations pay nothing or almost nothing to the people who work on developing standards. So, if a participant doesn't work for a company that has a special interest in the standard, the work is done for fun. Thank goodness there were a lot of dedicated people who had the time and the money to do just that.

The C++ standard was not developed from scratch. It was based on the language as described by Bjarne Stroustrup, the creator of C++. The standard library, however, was not based on a book or on an existing library. Instead, different, existing classes were integrated.[2] Thus, the result is not very homogeneous. You will find different design principles for different components. A good example is the difference between the string class and the STL, which is a framework for data structures and algorithms:

  • String classes are designed as a safe and convenient component. Thus, they provide an almost self-explanatory interface and check for many errors in the interface.

  • The STL was designed to combine different data structures with different algorithms while achieving the best performance. Thus, the STL is not very convenient and it is not required to check for many logical errors. To benefit from the powerful framework and great performance of the STL, you must know the concepts and apply them carefully.

Both of these components are part of the same library. They were harmonized a bit, but they still follow their individual, fundamental design philosophies.

One component of the library existed as a de facto standard before standardization began: the IOStream library. Developed in 1984, it was reimplemented and partially redesigned in 1989. Because many programs were using it already, the general concept of the IOStream library was not changed, thus keeping it backward compatible.

In general, the whole standard (language and library) is the result of a lot of discussions and influence from hundreds of people all over the world. For example, the Japanese came up with important support for internationalization. Of course, mistakes were made, minds were changed, and people had different opinions. Then, in 1994, when people thought the standard was close to being finished, the STL was incorporated, which changed the whole library radically. However, to get finished, the thinking about major extensions was eventually stopped, regardless of how useful the extension would be. Thus, hash tables are not part of the standard, although they should be a part of the STL as a common data structure.

The current standard is not the end of the road. There will be fixes of bugs and inconsistencies, and there likely will be a next version of the standard in five years or so. However for the next few years, C++ programmers have a standard and the chance to write powerful code that is portable to very different platforms.

New Language Features

The core language and the library of C++ were standardized in parallel. In this way, the library could benefit from improvements in the language and the language could benefit from experiences of library implementation. In fact, during the standardization process the library often used special language features that were not yet available.

C++ is not the same language it was five years ago. If you didn't follow its evolution, you may be surprised with the new language features used by the library. This section gives you a brief overview of those new features. For details, refer to books on the language in question.

While I was writing this book (in 1998), not all compilers were able to provide all of the new language features. I hope (and expect) that this will change very soon (most compiler vendors were part of the standardization process). Thus, you may be restricted in your use of the library. Portable implementations of the library typically consider whether features are present in the environment they use (they usually have some test programs to check which language features are present, and then set preprocessor directives according to the result of the check). I'll mention any restrictions that are typical and important throughout the book by using footnotes.

The following subsections describe the most important new language features that are relevant for the C++ standard library.

Templates

Almost all parts of the library are written as templates. Without template support, you can't use the standard library. Moreover, the library needed new special template features, which I introduce after a short overview of templates.

Templates are functions or classes that are written for one or more types not yet specified. When you use a template, you pass the types as arguments, explicitly or implicitly. The following is a typical example a function that returns the maximum of two values:

   template <class T>
   inline const T& max (const T& a, const T& b)
   {
       // if a <b then use b else use a
       return  a < b ? b : a;
   }

Here, the first line defines T as an arbitrary data type that is specified by the caller when the caller calls the function. You can use any identifier as a parameter name, but using T is very common, if not a de facto convention. The type is classified by class, although it does not have to be a class. You can use any data type as long as it provides the operations that the template uses.[3]

Following the same principle, you can "parameterize" classes on arbitrary types. This is useful for container classes. You can implement the container operations for an arbitrary element type. The C++ standard library provides many template container classes (for example, see Chapter 6 or Chapter 10). It also uses template classes for many other reasons. For example, the string classes are parameterized on the type of the characters and the properties of the character set (see Chapter 11).

A template is not compiled once to generate code usable for any type; instead, it is compiled for each type or combination of types for which it is used. This leads to an important problem in the handling of templates in practice: You must have the implementation of a template function available when you call it, so that you can compile the function for your specific type. Therefore, the only portable way of using templates at the moment is to implement them in header files by using inline functions.[4]

The full functionality of the C++ standard library requires not only the support of templates in general, but also many new standardized template features, including those discussed in the following paragraphs.

Nontype Template Parameters

In addition to type parameters, it is also possible to use nontype parameters. A nontype parameter is then considered as part of the type. For example, for the standard class bitset<> (class bitset<> is introduced in Section 10.4,) you can pass the number of bits as the template argument. The following statements define two bitfields, one with 32 bits and one with 50 bits:

   bitset<32> flags32;     // bitset with 32 bits 
   bitset<50> flags50;     // bitset with 50 bits

These bitsets have different types because they use different template arguments. Thus, you can't assign or compare them (except if a corresponding type conversion is provided).

Default Template Parameters

Template classes may have default arguments. For example, the following declaration allows one to declare objects of class MyClass with one or two template arguments[5]:

   template <class T, class container = vector<T> >
   class MyClass;

If you pass only one argument, the default parameter is used as second argument:

   MyClass<int> x1;          // equivalent to: MyClass<int,vector<int> >

Note that default template arguments may be defined in terms of previous arguments.

Keyword typename

The keyword typename was introduced to specify that the identifier that follows is a type. Consider the following example:

   template <class T> 
   Class MyClass {
      typename T::SubType * ptr;
      ...
   };

Here, typename is used to clarify that SubType is a type defined within class T. Thus, ptr is a pointer to the type T::SubType. Without typename, SubType would be considered a static member. Thus

   T::SubType * ptr

would be a multiplication of value SubType of type T with ptr.

According to the qualification of SubType being a type, any type that is used in place of T must provide an inner type SubType. For example, the use of type Q as a template argument

   MyClass<Q> x;

is possible only if type Q has an inner type definition such as the following:

   class Q {
       typedef int SubType;
       ...
   };

In this case, the ptr member of MyClass<Q> would be a pointer to type int. However, the subtype could also be an abstract data type (such as a class):

   class Q {
       class SubType;
       ...
   };

Note that typename is always necessary to qualify an identifier of a template as being a type, even if an interpretation that is not a type would make no sense. Thus, the general rule in C++ is that any identifier of a template is considered to be a value, except if it is qualified by typename.

Apart from this, typename can also be used instead of class in a template declaration:

   template <typename T> class MyClass;

Member Templates

Member functions of classes may be templates. However, member templates may not be virtual, nor may they have default parameters. For example:

   class MyClass {
       ...
       template <class T>
       void f(T);
   };

Here, MyClass::f declares a set of member functions for parameters of any type. You can pass any argument as long as its type provides all operations used by f().

This feature is often used to support automatic type conversions for members in template classes. For example, in the following definition the argument x of assign() must have exactly the same type as the object it is called for:

   template <class T>
   class MyClass {
     private:
       T value;
     public:
       void assign (const MyClass<T>& x) { // x must have same type as *this
           value = x.value;
       }
       ...
   };

It would be an error to use different template types for the objects of the assign() operation even if an automatic type conversion from one type to the other is provided:

   void f()
   {
       MyClass<double> d;
       MyClass<int> i;


       d.assign(d);   //OK
       d.assign(i);   //ERROR: i is MyClass<int>
                      //        but MyClass<double> is required
   }

By providing a different template type for the member function, you relax the rule of exact match. The member template function argument may have any template type, then as long as the types are assignable:

   template <class T>
   class MyClass<T> {
     private:
       T value;
     public:
       template <class X>                      // member template
       void assign (const MyClass<X>& x) {      // allows different template types
           value = x.getValue();
       }
       T getValue() const {
           return value;
       }
       ...
   };


   void f()
   {
       MyClass<double> d;
       MyClass<int> i;


       d.assign(d);   // OK
       d.assign(i);   // OK (int is assignable to double)
   }

Note that the argument x of assign() now differs from the type of *this. Thus, you can't access private and protected members of MyClass<> directly. Instead, you have to use something like getValue() in this example.

A special form of a member template is a template constructor. Template constructors are usually provided to enable implicit type conversions when objects are copied. Note that a template constructor does not hide the implicit copy constructor. If the type matches exactly, the implicit copy constructor is generated and called. For example:

   template <class T>
   class MyClass {
     public:
       //copy constructor with implicit type conversion
//- does not hide implicit copy constructor
       template <class U>
       MyClass (const MyClass<U>& x);
       ...
   };


   void f()
   {
       MyClass<double> xd;
       ...
       MyClass<double> xd2(xd);     // calls built-in copy constructor
       MyClass<int> xi (xd);        // calls template constructor
       ...
   }

Here, the type of xd2 is the same as the type of xd, so it is initialized via the built-in copy constructor. The type of xi differs from the type of xd, so it is initialized by using the template constructor. Thus, if you write a template constructor, don't forget to provide a copy constructor, if the default copy constructor does not fit your needs. See Section 4.1, for another example of member templates.

Nested Template Classes

Nested classes may also be templates:

   template <class T>
   class MyClass {
       ...
       template <class T2>
       class NestedClass;
       ...
   };

Explicit Initialization for Fundamental Types

If you use the syntax of an explicit constructor call without arguments, fundamental types are initialized with zero:

   int i1;             // undefined value
   int i2 = int();     // initialized with zero

This feature is provided to enable you to write template code that ensures that values of any type have a certain default value. For example, in the following function the initialization guarantees that x is initialized with zero for fundamental types:

   template <class T>
   void f()
   {
       T x = T();
       ...
   }

Exception Handling

The C++ standard library uses exception handling. Using this feature, you can handle exceptions without "polluting" your function interfaces: arguments and return values. If you encounter an unexpected situation, you can stop the usual data processing by "throwing an exception":

   class Error;


   void f()
   {
        ...
        if (exception-condition) {
            throw Error();                 // create object of class Error and throw it as exception
        }
        ...
   }

The throw statement starts a process called stack unwinding; that is, any block or function is left as if there was a return statement. However, the program does not jump anywhere. For all local objects that are declared in the blocks that the program leaves due to the exception their destructors are called. Stack unwinding continues until main() is left, which ends the program, or until a catch clause "catches" and handles the exception:

   int main()
   {
       try {
           ...
           f();
           ...
       }
       catch (const Error&) {
           ... //handle exception
       }
       ...
   }

Here, any exception of type Error in the try block is handled in the catch clause.[6]

Exception objects are ordinary objects that are described in ordinary classes or ordinary fundamental types. Thus, you can use ints, strings, or template classes that are part of a class hierarchy. Usually you design (a hierarchy of) special error classes. You can use their state to pass any information you want from the point of error detection to the point of error handling.

Note that the concept is called exception handling not error handling. The two are not necessarily the same. For example, in many circumstances bad user input is not an exception; it typically happens. So it is often a good idea to handle wrong user input locally using the usual error-handling techniques.

You can specify which set of exceptions a function might throw by writing an exception specification:

   void f() throw(bad_alloc);  //f() may only throw bad_alloc exceptions

You can specify that a function does not throw an exception by declaring an empty set of exceptions:

   void f() throw();           //f() does not throw

A violation of an exception specification causes special behavior to occur. See the description of the exception class bad_exception on page 26 for details.

The C++ standard library provides some general features for exception handling, such as the standard exception classes and class auto_ptr (see Section 3.3, and Section 4.2, for details).

Namespaces

As more and more software is written as libraries, modules, or components, the combination of these different parts might result in a name clash. Namespaces solve this problem.

A namespace groups different identifiers in a named scope. By defining all identifiers in a namespace, the name of the namespace is the only global identifier that might conflict with other global symbols. Similar to the handling of classes, you have to qualify a symbol in a namespace by preceding the identifier with the name of the namespace, separated by the operator :: as follows:

//defining identifiers in namespace josuttis
   namespace josuttis {
       class File;
       void myGlobalFunc();
       ...
   }
   ...


   //using a namespace identifier
   josuttis::File obj;
   ...
   josuttis::myGlobalFunc();

Unlike classes, namespaces are open for definitions and extensions in different modules. Thus you can use namespaces to define modules, libraries, or components even by using multiple files. A namespace defines logical modules instead of physical modules (in UML and other modeling notations, a module is also called a package).

You don't have to qualify the namespace for functions if one or more argument types are defined in the namespace of the function. This rule is called Koenig lookup. For example:

//defining identifiers in namespace josuttis
   namespace josuttis {
       class File;
       void myGlobalFunc(const File&);
       ...
   }
   ...


   josuttis::File obj;
   ...
   myGlobalFunc(obj);   //OK, lookup finds josuttis::myGlobalFunc()

By using a using declaration, you can avoid the (remaining) tedious, repeated qualification of the namespace scope. For example, the declaration

   using josuttis::File;

makes File a local synonym in the current scope that stands for josuttis::File.

A using directive makes all names of a namespace available, because they would have been declared outside their namespace. However, the usual name conflicts may arise. For example, the directive

   using namespace josuttis;

makes File and myGlobalFunc() global in the current scope. The compiler will report an ambiguity if there also exists an identifier File or myGlobalFunc() in the global scope and the user uses the name without qualification.

Note that you should never use a using directive when the context is not clear (such as in header files, modules, or libraries). The directive might change the scope of identifiers of a namespace, so you might get different behavior than the one expected because you included or used your code in another module. In fact, using directives in header files is really bad design.

The C++ standard library defines all identifiers in namespace std. See Section 3.1, for details.

Type bool

To provide better support for Boolean values, type bool was introduced. Using bool increases readability and allows you to overload behavior for Boolean values. The literals true and false were introduced as Boolean values. Automatic type conversions to and from integral values are provided. The value 0 is equivalent to false. Any other value is equivalent to true.

Keyword explicit

By using the keyword explicit, you can prohibit a single argument constructor from defining an automatic type conversion. A typical example of the need for this feature is in a collection class in which you can pass the initial size as constructor argument. For example, you could declare a constructor that has an argument for the initial size of a stack:

   class Stack {
       explicit Stack(int size);      // create stack with initial size
...
   };

Here, the use of explicit is rather important. Without explicit this constructor would define an automatic type conversion from int to Stack. If this happens, you could assign an int to a Stack:

   Stack s;
   ...
   s = 40;      // Oops, creates a new Stack for 40 elements and assigns it to s

The automatic type conversion would convert the 40 to a stack with 40 elements and then assign it to s. This is probably not what was intended. By declaring the int constructor as explicit, such an assignment results in an error at compile time.

Note that explicit also rules out the initialization with type conversion by using the assignment syntax:

   Stack s1(40);        // OK
   Stack s2 = 40;       // ERROR

This is because there is a minor difference between

   X x;
   Y y(x);       // explicit conversion

and

   X x;
   Y y = x;      // implicit conversion

The former creates a new object of type Y by using an explicit conversion from type X, whereas the latter creates a new object of type Y by using an implicit conversion.

New Operators for Type Conversion

To enable you to clarify the meaning of an explicit type conversion for one argument, the following four new operators were introduced:

  1. static_cast

    This operator converts a value logically. It can be considered a creation of a temporary object that is initialized by the value that gets converted. The conversion is allowed only if a type conversion is defined (either as a built-in conversion rule or via a defined conversion operation). For example:

       float x;
       ...
       cout << static_cast<int>(x);        // print x as int
       ...
       f(static_cast<string>("hello"));    // call f() for string instead of char*
  2. dynamic_cast

    This operator enables you to downcast a polymorphic type to its real static type. This is the only cast that is checked at runtime. Thus, you could also use it to check the type of a polymorphic value. For example:

       class Car;         // abstract base class (has at least one virtual function)
    
    
       class Cabriolet : public Car {
           ...
       };
    
    
       class Limousine : public Car {
           ...
       };
    
    
       void f(Car* cp)
       {
    
           Cabriolet* p = dynamic_cast<Cabriolet*>(cp);
           if (p == NULL) {
               //p did not refer to an object of type Cabriolet
               ...
           }
       }

    In this example, f() contains a special behavior for objects that have the real static type Cabriolet. When the argument is a reference and the type conversion fails, dynamic_cast throws a bad_cast exception (bad_cast is described on page 26). Note that from a design point of view, it is always better to avoid such type-dependent statements when you program with polymorphic types.

  3. const_cast

    This operator adds or removes the constness of a type. In addition, you can remove a volatile qualification. Any other change of the type is not allowed.

  4. reinterpret_cast

    The behavior of this operator is implementation defined. It may be but is not required to reinterpret bits. Using this cast is usually not portable.

These operators replace the old cast techniques that use parentheses. They have the advantage of clarifying the intention of the conversion. The old casts with parentheses could be used for any of these type conversions except for dynamic_cast, so when they were used you could not formulate the exact reason for the conversion. The new operators enable the compiler to receive more information regarding the reason for the conversion and to report an error if the conversion does more than it should.

  • Note that these operators are provided for only one argument. Consider the following example:

   static_cast<Fraction>(15,100)     // Oops, creates Fraction(100)

This example does not do what you might expect. Instead of initializing a temporary fraction with numerator 15 and denominator 100, it initializes a temporary fraction only with the single value 100. The comma is not an argument separator here. Instead, it is the comma operator that combines two expressions into one expression and yields the second. The correct way to "convert" values 15 and 100 into a fraction is still

   Fraction(15,100)                  // fine, creates Fraction(15,100)

Initialization of Constant Static Members

It is now possible to initialize integral constant static members inside the class structure. This is useful when the constant is used in the class structure after the initialization. For example:

   class MyClass {
       static const int num = 100;
       int elems[num];
       ...
   };

Note that you still have to define space for a constant static member that is initialized within a class definition:

   const int MyClass::num;     // no initialization here

Definition of main()

I'd also like to clarify an important, often misunderstood, aspect of the core language — namely, the only correct and portable versions of main(). According to the C++ standard, only two definitions of main() are portable:

   int main()
   {
       ...
   }

and

   int main (int argc, char* argv[])
   {
       ...
   }

where argv (the array of command-line arguments) might also be defined as char**. Note that the return type int is required because the implicit int is deprecated.

You may, but are not required to, end main() with a return statement. Unlike C, C++ defines an implicit

   return 0;

at the end of main(). This means that every program that leaves main() without a return statement is successful (any value other than 0 represents a kind of failure). Because of this, my examples in this book have no return statement at the end of main(). Note that some compilers might print a warning message regarding this or even handle it as error. Well, that's life before the standard.

Complexity and the Big-O Notation

For certain parts of the C++ standard library (especially for the STL), the performance of algorithms and member functions was considered carefully. Thus, the standard requires a certain "complexity" of them. Computer scientists use a specialized notation to compare the relative complexity of an algorithm. Using this measure, one can categorize quickly the relative runtime of an algorithm as well as perform qualitative comparisons between algorithms. This measure is called Big-O notation.

The Big-O notation expresses the runtime of an algorithm as a function of a given input of size n. For example, if the runtime grows linearly with the number of elements (doubling the input doubles the runtime) the complexity is O(n). If the runtime is independent of the input, the complexity is O(1). Table 2.1 lists typical values of complexity and their Big-O notation.

It is important to observe that the Big-O notation hides factors with smaller exponents (such as constant factors). In particular, it doesn't matter how long an algorithm takes. Any two linear algorithms are considered equally acceptable by this measure. There even may be some situations in which the constant is so huge in a linear algorithm that even an exponential algorithm with a small constant would be preferable in practice. This is a valid criticism of the Big-O notation. Just be aware that it is only a rule of thumb; the algorithm with optimal complexity is not necessarily the best one.

Table 2.1. Typical Values of Complexity

Type Notation Meaning
Constant O(1) The runtime is independent of the number of elements.
Logarithmic O(log(n)) The runtime grows logarithmically with respect to the number of elements.
Linear O(n) The runtime grows linearly (with the same factor) as the number of elements grows.
n-log-n O(n * log(n)) The runtime grows as a product of linear and logarithmic complexity.
Quadratic O(n2) The runtime grows quadratically with respect to the number of elements.

Table 2.2 lists all the categories of complexity with a certain number of elements to give you a feel of how fast the runtime grows with respect to the number of elements. As you can see, with a small number of elements the runtimes don't differ much. Here, constant factors that are hidden by the Big-O notation may have a big influence. However, the more elements you have, the bigger the differences in the runtimes, so constant factors become meaningless. Remember to "think big" when you consider complexity.

Table 2.2. Runtime with Respect to the Complexity and the Number of Elements

Complexity No.of Elements
Type Notation 1 2 5 10 50 100 1000
Constant O(1) 1 1 1 1 1 1 1
Logarithmic O(log(n)) 1 2 3 4 6 7 10
Linear O(n) 1 2 5 10 50 100 1,000
n-log-n O(n * log(n)) 1 4 15 40 300 700 10,000
Quadratic O(n2) 1 4 25 100 2,500 10,000 1,000,000

Some complexity definitions in the C++ reference manual are specified as amortized. This means that the operations in the long term behave as described. However, a single operation may take longer than specified. For example, if you append elements to a dynamic array, the runtime depends on whether the array has enough memory for one more element. If there is enough memory, the complexity is constant because inserting a new last element always takes the same time. However, if there is not enough memory, the complexity is linear. This is because, depending on the actual number of elements, you have to allocate new memory and copy all elements. Reallocations are rather rare, so any sufficiently long sequence of that operation behaves as if each operation has constant complexity. Thus, the complexity of the insertion is "amortized" constant time.



[1] At the time this book was written, you could get the C++ standard at the ANSI Electronics Standard Store for $ 18.00 (US; see http://www.ansi.org/).

[2] You may wonder why the standardization process did not design a new library from scratch. The major purpose of standardization is not to invent or to develop something; it is to harmonize an existing practice.

[3] class was used here to avoid the introduction of a new keyword when templates were introduced. However, now there is a new keyword, typename, that you can also use here (see page 11).

[4] To avoid the problem of templates having to be present in header files, the standard introduced a template compilation model with the keyword export. However, I have not seen it implemented yet.

[5] Note that you have to put a space between the two ">" characters. ">>" would be parsed as shift operator, which would result in a syntax error.

[6] Exceptions end a call of the function, where you find the exception, with the ability to pass an object as argument back to the caller. However, this is not a function call back in the opposite direction (from the bottom where the problem was found to the top where the problem is solved or handled). You can't process the exception and continue from where you found the exception. In this regard, exception handling is completely different from signal handling.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.142.197.95