11. Overloading

The Devil is in the details.

– traditional

Fine-grain overload resolution — ambiguity control — the null pointer — type-safe linkage — name mangling — controlling copying, allocation, derivation, etc. — smart pointers — smart references — increment and decrement — an exponentiation operator — user-defined operators — composite operators — enumerations — a Boolean type.

11.1 Introduction

Operators are used to provide notational convenience. Consider a simple formula F=M*A. No basic physics textbook states that as assign(F, multiply(M, A)). When variables can be of different types, we must decide whether to allow mixed-mode arithmetic or to require explicit conversion of operands to a common type. For example, if M is an int and A is a double we can either accept M*A and deduce that M must be promoted to a double before the multiplication, or we can require the programmer to write something like double(M)*A.

By choosing the former – as C, Fortran, and every other language used extensively for computation have – C++ entered a difficult area without perfect solutions. On the one hand, people want “natural” conversions without any fuss from the compiler, but on the other, they don’t want surprises. What is considered natural differs radically among people, and so do the kinds of surprises people are willing to tolerate. This, together with the constraint of compatibility with C’s rather chaotic built-in types and conversions, results in a fundamentally difficult problem.

The desire for flexibility and freedom of expression clashes with wishes for safety, predictability, and simplicity. This chapter looks at the refinements to the overloading mechanisms that resulted from this clash.

11.2 Overload Resolution

Overloading of function names and operators, as originally introduced into C++ (§3.6) [Stroustrup,1984b], proved popular, but problems with the overload mechanism had surfaced. The improvements provided by Release 2.0 were summarized in [Stroustrup,1989b]:

“The C++ overloading mechanism was revised to allow resolution of types that used to be “too similar” and to gain independence of declaration order. The resulting scheme is more expressive and catches more ambiguity errors.”

The work on fine-grain resolution gave us the ability to overload based on the int/char, float/double, const/non-const, and base/derived distinctions. Order independence eliminated a source of nasty bugs. I will examine these two aspects of overloading in turn. Finally, I’ll explain why the overload keyword was made obsolete.

11.2.1 Fine-Grain Resolution

As first defined, the C++ overloading rules accepted the limitations of C’s built-in types [Kernighan,1978]. That is, there were no values of type float (technically, no rvalues) because in computation a float is immediately widened to a double. Similarly there were no values of type char because in every use a char is widened to an int. This led to complaints that single-precision floating point libraries couldn’t be provided naturally and that character manipulation functions were unnecessarily error-prone.

Consider an output function. If we can’t overload based on the char/int distinction, we have to use two names. In fact, the original stream library (§8.3.1) used:

ostream& operator<<(int); // output ints (incl. promoted
                         // chars) as sequence of digits.
ostream& put(char c);    // output chars as characters.

However, many people wrote

cout << 'X';

and were (naturally) surprised to find 88 (the numeric value of ASCII 'X') in their output instead of the character X.

To overcome this, the type rules of C++ were changed to allow types such as char and float to be considered in their unpromoted form by the overload resolution mechanism. In addition, the type of a literal character, such as 'X', was defined to be char. At the same time, the then recently invented ANSI C notation for expressing unsigned and float literals was adopted so that we could write:

float abs(float);
double abs(double);
int abs(int);
unsigned abs(unsigned);
char abs(char);

void f()
{
    abs(1);      // abs(int)
    abs(1U);     // abs(unsigned)
    abs(1.0);    // abs(double)
    abs(1.0F);   // abs(float)
    abs('a');    // abs(char)
}

In C, the type of a character literal such as 'a' is int. Surprisingly, giving 'a' type char in C++ doesn’t cause compatibility problems. Except for the pathological example sizeof('a'), every construct that can be expressed in both C and C++ gives the same result.
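
The difference is easy to demonstrate (a minimal program; in C it typically prints the size of an int, in C++ it prints 1):

#include <stdio.h>

int main()
{
    /* In C, 'a' has type int, so this prints sizeof(int);
       in C++, 'a' has type char, so it prints 1. */
    printf("%u\n", (unsigned) sizeof('a'));
    return 0;
}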

In defining the type of a character literal as char, I relied partly on reports from Mike Tiemann on experience with a compiler option providing that interpretation in the GNU C++ compiler.

Similarly, it had been discovered that the difference between const and non-const could be used to good effect. An important use of overloading based on const was to provide a pair of functions

char* strchr(char*, int);
const char* strchr(const char*, int);

as alternatives to the ANSI C standard function

char* strchr(const char*, int);

The C strchr() returns a pointer into the const string passed as its first argument. Making that result non-const couldn’t be allowed for a C++ standard library because an implicit violation of the type system is not acceptable. On the other hand, incompatibilities with C had to be minimized, and providing the two strchr() functions allows most reasonable uses of strchr().

Allowing overloading based on const was part of a general tightening up of the rules for const and a trend towards enforcing those rules (§13.3).
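
The same technique carries over directly to one’s own functions. A minimal sketch (find_char is a hypothetical name; the standard strchr pair follows the same pattern):

const char* find_char(const char* s, char c)
{
    for (; *s; s++)
        if (*s == c) return s;
    return 0;
}

char* find_char(char* s, char c)
{
    // the cast back is safe: the argument was non-const to begin with
    return (char*) find_char((const char*) s, c);
}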

Experience showed that hierarchies established by public class derivations should be taken into account in function matching so that the conversion to the “most derived” class is chosen if there is a choice. A void* argument is chosen only if no other pointer argument matches. For example:

class B {/*...*/};
class BB : public B { /* ... */ };
class BBB : public BB {/*...*/ };

void f(B*);
void f(BB*);
void f(void*);

void g(BBB* pbbb, BB* pbb, B* pb, int* pi)
{
    f(pbbb);   // f(BB*)
    f(pbb);    // f(BB*)
    f(pb);     // f(B*)
    f(pi);     // f(void*)
}

This ambiguity resolution rule matches the rule for virtual function calls where the member from the most derived class is chosen. Its introduction eliminated a source of errors. This change was so obvious that people greeted it with a yawn (“you mean it wasn’t that way before?”). The bugs disappeared and that was all.

The rule has one interesting property, though. It establishes void* as the root of the tree of class conversions. This fits with the view that construction makes an object out of raw memory and a destructor reverses that process by making raw memory out of an object (§2.11.1, §10.2). A conversion such as B* to void* allows an object to be seen as raw memory where no other property is of interest.

11.2.2 Ambiguity Control

The original C++ overloading mechanism resolved ambiguities by relying on the order of declaration. Declarations were tried in order and the first match “won.” To make this tolerable, only non-narrowing conversions were accepted in a match. For example:

overload void print(int);  // original (pre-2.0) rules:
void print(double);

void g()
{
    print(2.0);   // print(double): print(2.0)
                  // double->int conversion not accepted.
    print(2.0F);  // print(double): print(double(2.0F))
                  // float->int conversion not accepted,
                  // float->double conversion accepted.
    print(2);     // print(int): print(2).
}

This rule was simple to express, simple for users to understand, efficient at compile time, trivial for implementers to get right, and was a constant source of errors and confusion. Reversing the declaration order could completely change the meaning of a piece of code:

overload void print(double);  // original rules:
void print(int);

void g()
{
    print(2.0);   // print(double): print(2.0).
    print(2.0F);  // print(double): print(double(2.0F))
                  // float->double conversion accepted.
    print(2);     // print(double): print(double(2))
                  // int->double conversion accepted.
}

Basically, order dependence was too error-prone. It also became a serious obstacle to the effort to evolve C++ programming towards a greater use of libraries. My aim was to move to a view of programming as the composition of programs out of independent program fragments (see also §11.3) and order dependence was one of many obstacles.

The snag is that order-independent overloading rules complicate C++’s definition and implementation because a significant degree of compatibility with C and with the original C++ must be maintained. In particular, the simple rule “if an expression has two possible legal interpretations, it is ambiguous and thus illegal,” wasn’t a real option. For example, under that rule all of the calls of print() in the example above would be ambiguous and illegal.

I concluded we needed some notion of a “better match” rule so that we would prefer an exact type match to a match involving a conversion and prefer a safe conversion such as float to double over an unsafe (narrowing, value destroying, etc.) conversion such as float to int. The resulting series of discussions, refinements, and reconsiderations lasted for years. Some details are still being discussed in the standards committee. The main participants were Doug McIlroy, Andy Koenig, Jonathan Shopiro, and me. Early on, Doug pointed out that we were perilously close to trying to design a “natural” system for implicit conversions. He considered PL/I’s rules, which he had helped design, proof that such a “natural” system cannot be designed for a rich set of common data types – and C++ provides a rich set of built-in types with anarchic conversions plus the ability to define conversions between arbitrary user-defined types. My stated reason for entering this swamp was that we didn’t have any option but to try.

C compatibility, people’s expectations, and the aim to allow users to define types that can be used exactly as built-in types prevented us from banning implicit conversions. In retrospect, I agree with the decision to proceed with implicit conversions. I also agree with Doug’s observation that the task of minimizing surprises caused by implicit conversions is inherently difficult and that (at least given the requirement of C compatibility) surprises cannot be completely eliminated. Different programmers simply have differing expectations so whatever rule you choose, someone is going to be surprised sometime.

A fundamental problem is that the graph of built-in implicit conversions contains cycles. For example, implicit conversions exist not only from char to int, but also from int to char. This has the potential for endless subtle errors and prevented us from adopting a scheme for implicit conversions based on a lattice of conversions. Instead we devised a system of “matches” between types found in function declarations and the types of actual arguments. Matches involving conversions we considered less error-prone and less surprising were preferred over others. This allowed us to accommodate C’s standard promotion and standard conversion rules. I described the 2.0 version of this scheme like this [Stroustrup, 1989b]:

“Here is a slightly simplified explanation of the new rules. Note that with the exception of a few cases where the older rules allowed order dependence the new rules are compatible and old programs produce identical results under the new rules. For the last two years or so C++ implementations have issued warnings for the now “outlawed” order-dependent resolutions.

C++ distinguishes 5 kinds of “matches”:

[1] Match using no or only unavoidable conversions (for example, array name to pointer, function name to pointer to function, and T to const T).

[2] Match using integral promotions (as defined in the proposed ANSI C standard; that is, char to int, short to int and their unsigned counterparts) and float to double.

[3] Match using standard conversions (for example, int to double, derived* to base*, unsigned int to int).

[4] Match using user-defined conversions (both constructors and conversion operators).

[5] Match using the ellipsis ... in a function declaration.

Consider first functions of a single argument. The idea is always to choose the “best” match, that is the one highest on the list above. If there are two best matches, the call is ambiguous and thus a compile-time error.”

The examples above illustrate this rule. A more precise version of the rules can be found in the ARM.
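
To make the five kinds concrete, here is one call of each kind (an invented illustration; neither T nor the m functions come from the quoted text):

struct T { T(double) {} };  // user-defined conversion: double to T

void m1(const char*);  // [1]
void m2(int);          // [2]
void m3(double);       // [3]
void m4(T);            // [4]
void m5(...);          // [5]

void test(char c, int i, void* p)
{
    m1("abc");  // [1] only the unavoidable array-to-pointer conversion
    m2(c);      // [2] integral promotion: char to int
    m3(i);      // [3] standard conversion: int to double
    m4(3.14);   // [4] user-defined conversion: T(3.14)
    m5(p);      // [5] matched only by the ellipsis
}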

A further rule is needed to cope with functions of more than one argument [ARM]:

“For calls involving more than one argument, a function is chosen provided it has a better match than every other function for at least one argument and at least as good a match as every other function for every argument. For example:

class complex {
    // . . .
public:
    complex(double);
};


void f(int,double);
void f(double,int);
void f(complex,int);
void f(int ...);
void f(complex ...);

void g(complex z)
{
   f(1,2.0);    // f(int,double)
   f(1.0,2);    // f(double,int)
   f(z,1.2);    // f(complex,int)
   f(z,1,3);    // f(complex ...)
   f(2.0,z);    // f(int ...)
   f(1,1);      // error: ambiguous,
                // f(int,double) or f(double,int) ?
}

The unfortunate narrowing from double to int in the third and the second-to-last calls causes warnings. Such narrowings are allowed to preserve compatibility with C. In this particular case, the narrowing is harmless, but in many cases double-to-int conversions are value destroying and they should never be used thoughtlessly.”

Elaboration and formalization of this rule for multiple arguments led to the “intersect rule” found in [ARM,pp312-314]. The intersect rule was first formulated by Andrew Koenig during discussions with Doug McIlroy, Jonathan Shopiro, and me. I believe Jonathan was the one who found the truly bizarre examples that proved it necessary [ARM,pg313].

Please note how seriously the compatibility concerns were taken. My view is that anything less would have been taken quite badly by the vast majority of existing and future C++ users. A simpler, stricter, and more easily understood language would have attracted more adventurous programmers as well as programmers who are permanently discontented with existing languages. Had design decisions systematically favored simplicity and elegance over compatibility, C++ would today have been much smaller and cleaner. It would also have been an unimportant cult language.

11.2.3 The Null Pointer

Nothing seems to create more heat than a discussion of the proper way to express a pointer that doesn’t point to an object, the null pointer. C++ inherited its definition of the null pointer from Classic C [Kernighan,1978]:

“A constant expression that evaluates to zero is converted to a pointer, commonly called the null pointer. It is guaranteed that this value will produce a pointer distinguishable from a pointer to any object or function.”

The ARM further warns:

"Note that the null pointer need not be represented by the same bit pattern as the integer 0.”

The warning reflects the common misapprehension that if p=0 assigns the null pointer to the pointer p, then the representation of the null pointer must be the same as the integer zero, that is, a bit pattern of all-zeros. This is not so. C++ is sufficiently strongly typed that a concept such as the null pointer can be represented in whichever way the implementer chooses, independently of how that concept is represented in the source text. The one exception is when people use the ellipsis to suppress function argument checking:

int printf(const char* ...); // C style unchecked calls

printf(fmt, 0, (char)0, (void*)0, (int*)0, (int(*)())0);

Here, the casts are needed to specify exactly which kind of 0 is wanted. In this example, five different values could conceivably be passed.

In K&R C, function arguments were never checked and even in ANSI C you still can’t rely on argument checking because it is optional. For this reason, because 0 is not easy to spot in a C or C++ program and because people are used to a symbolic constant representing the null pointer in other languages, C programmers tend to use a macro called NULL to represent the null pointer. Unfortunately, there is no portable correct definition of NULL in K&R C. In ANSI C, (void*) 0 is a reasonable and increasingly popular definition for NULL.

However, (void*)0 is not a good choice for the null pointer in C++:

char* p = (void*)0; /* legal C, illegal C++ */

A void* cannot be assigned to anything without a cast. Allowing implicit conversions of void* to other pointer types would open a serious hole in the type system. One might make a special case for (void*)0, but special cases should only be admitted in dire need. Also, C++ usage was determined long before there was an ANSI C standard, and I do not want to have any critical part of C++ rely on a macro (§18). Consequently, I used plain 0, and that has worked very well over the years. People who insist on a symbolic constant usually define one of

const int NULL = 0;  // or
#define NULL 0

As far as the compiler is concerned, NULL and 0 are then synonymous. Unfortunately, so many people have added definitions of NULL, NIL, Null, null, etc., to their code that providing yet another definition can be hazardous.

There is one kind of mistake that is not caught when 0 (however spelled) is used for the null pointer. Consider:

void f(char*);

void g() { f(0); } // calls f(char*)

Now add another f() and the meaning of g() silently changes:

void f(char*);
void f(int);

void g() { f(0); } // calls f(int)

This is an unfortunate side effect of 0 being an int that can be promoted to the null pointer, rather than a direct specification of the null pointer. I think a good compiler should warn, but I didn’t think of that in time for Cfront. Making the call f(0) ambiguous rather than resolving it in favor of f(int) would be feasible, but would probably not satisfy the people who want NULL or nil to be magical.

After one of the regular flame wars on comp.lang.c++ and comp.lang.c, one of my friends observed, “If 0 is their worst problem, then they are truly lucky.” In my experience, using 0 for the null pointer is not a problem in practice. I am still amazed, though, by the rule that accepts the result of any constant expression evaluating to 0 as the null pointer. This rule makes 2-2 and ~-1 null pointers. Assigning 2+2 or -1 to a pointer is a type error, of course. That is not a rule that I like as an implementer either.
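
That is:

void h()
{
    int* p1 = 2-2;  // ok: constant expression evaluating to 0
    int* p2 = ~-1;  // ok: ~(-1) is 0
    int* p3 = 2+2;  // error: 4 is not a null pointer constant
    int* p4 = -1;   // error: -1 is not a null pointer constant
}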

11.2.4 The overload Keyword

Originally, C++ allowed a name to be used for more than one function (that is, “to be overloaded”) only after an explicit overload declaration. For example:

overload max;              // "overload" – obsolete in 2.0
int max(int,int);
double max(double,double);

I considered it too dangerous to use the same name for two functions without explicitly declaring an intent to overload. For example:

int abs(int);           // no "overload abs"
double abs(double);     // used to be an error

This fear of overloading had two sources:

[1] Concern that undetected ambiguities could occur.

[2] Concern that a program could not be properly linked unless the programmer explicitly declared which functions were supposed to be overloaded.

The former fear proved largely groundless. The few problems found in actual use are dealt with by the order-independent overloading resolution rules. The latter fear proved to have a basis in a general problem with C separate compilation rules that had nothing to do with overloading (see §11.3).

On the other hand, the overload declarations themselves became a serious problem. One couldn’t merge pieces of software using the same function name for different functions unless both pieces had declared that name overloaded. This wasn’t usually the case. Typically, the name one wants to overload is the name of a C library function declared in a C header. For example:

/* Header for C standard math library, math.h: */
    double sqrt(double);
    /* ... */
// header for C++ complex arithmetic library, complex.h:
    overload sqrt;
    complex sqrt(complex);
    // ...

Now we could write

#include <complex.h>
#include <math.h>

but not

#include <math.h>
#include <complex.h>

because it was an error to use overload for sqrt() on its second declaration only. There were ways of alleviating this: rearranging declarations, putting constraints on the use of header files, and sprinkling overload declarations everywhere “just in case.” However, we found such tricks unmanageable in all but the simplest cases. Abolishing overload declarations and getting rid of the overload keyword worked much better.

11.3 Type-Safe Linkage

C linkage is very simple and completely unsafe. You declare a function

extern void f(char);

and the linker will merrily link that f to any f in its universe. The f linked to may be a function taking completely different arguments or even a non-function. This usually causes a run-time error of some sort (core dump, segment violation, etc.). Linkage problems are especially nasty because they increase disproportionately with the size of programs and with the amount of library use. C programmers have learned to live with this problem. However, the needs of the overloading mechanism caused a sense of urgency. Any solution for this linkage problem for C++ had to leave it possible to call C functions without added complication or overhead.

11.3.1 Overloading and Linkage

The solution to the C/C++ linkage problem in pre-2.0 implementations was to let the name generated for a C++ function be the same as would be generated for a C function of the same name whenever possible. Thus open() gets the name open on systems where C doesn’t modify its names on output, the name _open on systems where C adds a prefix underscore, etc.

This simple scheme clearly isn’t sufficient to cope with overloaded functions. The keyword overload was introduced partly to distinguish the hard case from the easy ones (see also §3.6).

The initial solution, like the subsequent ones, was based on the idea of encoding type information into names given to the linker (§3.3.3). To allow linkage to C functions, only the second and subsequent versions of an overloaded function had their names encoded. Thus the programmer would write:

overload sqrt;
double sqrt(double);    // a linker sees: sqrt
complex sqrt(complex);  // a linker sees: sqrt__F7complex

The C++ compiler generated code referring to sqrt and sqrt__F7complex. Fortunately, I documented this trick only in the BUGS section of the C++ manual page.

The overloading scheme used for C++ before 2.0 interacted with the traditional C linkage scheme in ways that brought out the worst aspects of both. We had to solve three problems:

[1] Lack of type checking in the linker.

[2] Use of the overload keyword.

[3] Linking C++ and C program fragments.

A solution to [1] is to augment the name of every function with an encoding of its type. A solution to [2] is to abolish the overload keyword. A solution to [3] is for a C++ programmer to state explicitly when a function is supposed to have C-style linkage. Consequently, [Stroustrup, 1988a]:

"The question is whether a solution based on these three premises can be implemented without noticeable overhead and with only minimal inconvenience to C++ programmers. The ideal solution would

– Require no C++ language changes.

– Provide type-safe linkage.

– Allow for simple and convenient linkage to C.

– Break no existing C++ code.

– Allow use of (ANSI-style) C headers.

– Provide good error detection and error reporting.

– Be a good tool for library building.

– Impose no run-time overhead.

– Impose no compile-time overhead.

– Impose no link-time overhead.

We have not been able to devise a scheme that fulfills all of these criteria strictly, but the adopted scheme is a good approximation.”

Clearly, the solution was to type check all linkage. The problem then became how to do that without having to write a new linker for every system.

11.3.2 An Implementation of C++ Linkage

First of all, every C++ function name is encoded by appending its argument types. This ensures that a program will link only if every function called has a definition and that the argument types specified in declarations are the same as the types specified in the function definition. For example, given:

void f(int i) { /* ... */ }           // defines f__Fi
void f(int i, char* j) { /* ... */ }  // defines f__FiPc

These examples can be correctly handled:

extern void f(int);            // refers to f__Fi
extern void f(int,char*);      // refers to f__FiPc
extern void f(double,double);  // refers to f__Fdd

void g()
{
    f(1);          // links to f__Fi
    f(1,"asdf");   // links to f__FiPc
    f(1,1);        // tries to link to f__Fdd
                   // link-time error: no f__Fdd defined
}

This leaves the problem of how to call a C function or a C++ function “masquerading” as a C function. To do this, a programmer must state that a function has C linkage. Otherwise, a function is assumed to be a C++ function and its name is encoded. To express this, an extension of the linkage-specification was introduced into C++:

extern "C" {
    double sqrt(double); // sqrt(double) has C linkage
}

The linkage specification does not affect the semantics of the program using sqrt() but simply tells the compiler to use the C naming conventions for the name used for sqrt() in the object code. This means that the linkage name of this sqrt() is sqrt or _sqrt or whatever is required by the C linkage conventions in a given system. One could also imagine a system in which the C linkage rules were the type-safe C++ linkage rules as described above so that the linkage name of the C function sqrt() was sqrt__Fd.
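
Headers meant for both languages handle this with a now-standard idiom: every C++ compiler predefines the macro __cplusplus, so the linkage specification can be hidden from C compilers. A sketch:

/* math.h, shared between C and C++: */
#ifdef __cplusplus
extern "C" {
#endif
    double sqrt(double);
    /* ... more C declarations ... */
#ifdef __cplusplus
}
#endif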

Naturally, suffixing with an encoding of the type is only an example of an implementation technique. It is, however, the technique we successfully used for Cfront, and it has been widely copied. It has the important properties of being simple and working with existing linkers. This implementation of the idea of type-safe linkage is not 100% safe, but then, again, in general, very few useful systems are 100% safe. A more complete description of the encoding (“name mangling”) scheme used by Cfront can be found in [ARM,§7.2c].
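
For reference, the encoded names in the examples above can be read as follows (Cfront’s scheme; [ARM,§7.2c] has the full rules):

– f__Fi: F marks the start of the argument type encoding; i encodes int.

– f__FiPc: Pc encodes pointer to char.

– f__Fdd: d encodes double, one letter per argument.

– sqrt__F7complex: a class name is spelled out, prefixed by its length.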

11.3.3 Retrospective

I think the combination of requiring type-safe linkage, providing a reasonable implementation, and providing an explicit escape for linking to other languages was the right one. As expected, the new linkage system eliminated problems without imposing burdens that users found hard to live with. In addition, a surprising number of linkage errors were found in old C and C++ code converted to the new style. My observation at the time was “switching to type-safe linkage feels like running lint on a C program for the first time – somewhat embarrassing.” Lint is a popular tool for checking separately compiled units of C programs for consistent use of types [Kernighan,1984]. During the initial introduction period, I tried to keep track of experiences. Type-safe linkage detected a hitherto undiscovered error in every significant C or C++ program we compiled and linked.

One surprise was that several programmers had acquired the nasty habit of supplying wrong function declarations simply to shut up the compiler. For example, a call f(1,a) causes an error if f() isn’t declared. When that happens, I had naively expected the programmer to either add the right declaration for the function or to add a header file containing that declaration. It turned out that there was a third alternative – just supply some declaration that fits the call:

void g()
{
    void f(int ...); // added to suppress error message
    // ...
    f(1,a);
}

Type-safe linkage detects such sloppiness and reports an error whenever a declaration doesn’t match a definition.

We also discovered a portability problem. People declared library functions directly rather than including the proper header file. I suppose the aim was to minimize compilation time, but the effect was that when the code was ported to another system, the declaration became wrong. Type-safe linkage helped us catch quite a few porting problems (mainly between UNIX System V and BSD UNIX) of this kind.

We considered several alternatives to the type-safe linkage schemes before deciding on the one actually added to the language [Stroustrup,1988]:

– Provide no escape and rely on tools for C linkage

– Provide type-safe linkage and overloading for functions explicitly marked overload only

– Provide type-safe linkage only for functions that couldn’t be C functions because they had types that couldn’t be expressed in C

Experience with the way the adopted scheme has been used convinced me that the problems I conjectured for the alternatives were genuine. In particular, extending the checking to all functions by default has been a boon and mixed C/C++ has been so popular that any complication of C/C++ linkage would have been most painful.

Two details prompted complaints from users and are causes for concern still. In one case, I think we made the right choice. In the other, I’m not so sure.

A function declared to have C linkage still has C++ calling semantics. That is, the formal arguments must be declared, and the actual arguments must match under the C++ matching and ambiguity control rules. Some users wanted functions with C linkage to obey the C calling rules. Allowing that would have allowed more direct use of C header files. It would also have allowed sloppy programmers to revert to C’s weaker type checking. Another argument against introducing special rules for C, however, is that programmers also asked for Pascal, Fortran, and PL/I linkage complete with support for the function-calling rules from those languages, such as implicit conversion of C-style strings to Pascal-type strings for functions with Pascal linkage, call by reference and added array-type information for functions with Fortran linkage, etc. Had we provided special services for C, we would have been obliged to add knowledge of an unbounded set of language calling conventions to C++ compilers. Resisting that pressure was right, though a significant added service could have been rendered to individual users of mixed language programming systems. Given the C++ semantics (only), people have found references (§3.7) useful to provide interfaces to languages such as Fortran and Pascal that support pass-by-reference arguments.

On the other hand, focusing only on linkage led to a problem. The solution doesn’t directly address the problems of an environment that supports mixed-language programming and pointers to functions with different calling conventions. Using the C++ linkage rules, we can directly express that a function obeys C++ or C calling conventions. Specifying that the function itself obeys C++ conventions but its argument obeys C conventions cannot be expressed directly. One solution is to express this indirectly [ARM,pg118]. For example:

typedef void (*PV)(void*,void*);

void* sort1(void*, unsigned, PV);
extern "C" void* sort2(void*, unsigned, PV);

Here, sort1() has C++ linkage and takes a pointer to a function with C++ linkage; sort2() has C linkage and takes a pointer to a function with C++ linkage. These are the clear-cut cases. On the other hand, consider:

extern "C" typedef void (*CPV)(void*,void*);

void* sort3(void*, unsigned, CPV);
extern "C" void* sort4(void*, unsigned, CPV);

Here, sort3() has C++ linkage and takes a pointer to a function with C linkage; sort4() has C linkage and takes a pointer to a function with C linkage. That pushes the limits of what the language specifies and is ugly. The alternatives don’t seem to be attractive either: You could introduce calling conventions into the type system or use calling stubs extensively to handle such mixtures of calling conventions.

Linkage, inter-language calls, and inter-language object passing are inherently difficult problems and have many implementation-dependent aspects. It is also an area where the ground rules change as new languages, hardware architectures, and implementation techniques are developed. I expect that we haven’t heard the last of this matter.

11.4 Object Creation and Copying

Over the years, I have been asked for language features to disallow various operations regularly (say, twice a week for ten years). The reasons vary. Some want to optimize the implementation of a class in ways that can only be done if operations such as copying, derivation, or stack allocation are never performed on objects of that class. In other cases, such as objects representing real-world objects, the required semantics simply don’t include all of the operations C++ supplies by default.

The answer to most such requests was discovered during the work on 2.0: If you want to prohibit something, make the operation that does it a private member function (§2.10).

11.4.1 Control of Copying

To prohibit copying of objects of class X, simply make the copy constructor and the assignment operator private:

class X {
    X& operator=(const X&);  // assignment
    X(const X&);             // copy constructor
    // ...
public:
    X(int);
    // ...
};


void f()
{
    X a(1);     // fine: can create Xs
    X b = a;    // error: X::X(const X&) private
    b = a;      // error: X::operator=(const X&) private
}

Naturally, the implementer of class X can still copy X objects, but in real cases that is typically acceptable or even required. Unfortunately, I don’t remember who thought of this first; I doubt it was me (see [Stroustrup,1986,pg172]).

I personally consider it unfortunate that copy operations are defined by default and I prohibit copying of objects of many of my classes. However, C++ inherited its default assignment and copy constructors from C, and they are frequently used.
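
The idiom can be packaged once and for all as a base class. A sketch (the name Noncopyable is mine; later libraries popularized the same trick):

class Noncopyable {
    Noncopyable(const Noncopyable&);            // not defined
    Noncopyable& operator=(const Noncopyable&); // not defined
protected:
    Noncopyable() {}
};

class Y : Noncopyable {  // Y objects cannot be copied or assigned
    // ...
};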

11.4.2 Control of Allocation

Other useful effects can be achieved by declaring operations private. For example, declaring a destructor private prevents stack and global allocation. It also prevents random use of delete:

class On_free_store {
    ~On_free_store();  // private destructor
    // ...
public:
    static void free(On_free_store* p) { delete p; }
    // ...
};

On_free_store glob1;  // error: private destructor

void f()
{
   On_free_store loc;   // error: private destructor
   On_free_store* p = new On_free_store; // fine
   // ...
   delete p;  // error: private destructor
   On_free_store::free(p); // fine
}

Naturally, such a class will typically be used with a highly optimized free store allocator or other semantics taking advantage of objects being on the free store.

The opposite effect – allowing global and local variables, but disallowing free store allocation – is obtained by declaring only an unusual operator new():

class No_free_store {
    class Dummy { };
    void* operator new(size_t,Dummy);
    // ...
};

No_free_store glob2;  // fine

void g()
{
    No_free_store loc; // fine
    No_free_store* p = new No_free_store; // error:
             // no No_free_store::operator new(size_t)
}

11.4.3 Control of Derivation

A private destructor also prevents derivation. For example:

class D : public On_free_store {
    // . . .
};

D d; // error: cannot call private base class destructor

This makes a class with a private destructor the logical complement to an abstract class. It is impossible to derive from On_free_store, so calls of On_free_store virtual functions need not use the virtual function mechanism. However, I don’t think any current compilers optimize based on that.

Later, Andrew Koenig discovered that it was even possible to prevent derivation without imposing restrictions on the kind of allocation that could be done:

class Usable;

class Usable_lock {
    friend class Usable;
private:
    Usable_lock() {}
};

class Usable : public virtual Usable_lock {
    // ...
public:
    Usable();
    Usable(char*);
    // . . .
};

Usable a;

class DD : public Usable { };

DD dd;  // error: DD::DD() cannot access
            // Usable_lock::Usable_lock(): private member

This relies on the rule that a derived class must call the constructor of a virtual base class (implicitly or explicitly).

Such examples are usually more of an intellectual delight than techniques of real importance. Maybe that’s why discussing them is so popular.

11.4.4 Memberwise Copy

Originally, assignment and initialization were by default defined as bitwise copy. This caused problems when an object of a class with assignment was used as a member of a class that did not have assignment defined:

class X { /* ... */ public: X& operator=(const X&); };

struct Y { X a; };

void f(Y y1, Y y2)
{
    y1 = y2;
}

Here, y2.a was copied into y1.a with a bitwise copy. This is clearly wrong and simply the result of an oversight when assignment and copy constructors were introduced. After some discussion and at the urging of Andrew Koenig, the obvious solution was adopted: Copying of objects is defined as the memberwise copy of non-static members and base class objects.

This definition states that the meaning of x=y is x.operator=(y). This has an interesting (though not always desirable) implication. Consider:

class X {/*...*/};
class Y : public X {/*...*/ };

void g(X x, Y y)
{
    x = y; // x.operator=(y): fine
    y = x; // y.operator=(x): error: x is not a Y
}

By default, assignment to X is X& X::operator=(const X&) so x=y is legal because Y is publicly derived from X. This is usually called slicing because a “slice” of y is assigned to x. Copy constructors are handled in a similar manner.

I’m leery about slicing from a practical point of view, but I don’t see any way of preventing it except by adding a very special rule. Also, at the time, I had an independent request for exactly these “slicing semantics” from Ravi Sethi, who wanted them from a theoretical and pedagogical point of view: had assignment of a derived class object to an object of its public base class been disallowed, that would have been the only point in C++ where a derived class object couldn’t be used in place of a base class object.

This leaves one problem with default copy operations: pointer members are copied, but what they point to isn’t. This is almost always wrong, but can’t be disallowed because of C compatibility. However, a compiler can easily provide a warning whenever a class with a pointer member is copied using a default copy constructor or assignment. For example:

class String {
    char* p;
    int sz;
public:
    // no copy defined here (sloppy)
};

void f(const String& s)
{
    String s2 = s; // warning: pointer copied
    s2 = s;        // warning: pointer copied
}

By default, assignment and copy construction in C++ define what is sometimes called shallow copy; that is, they copy the members of a class, but not objects pointed to by those members. The alternative that recursively copies objects pointed to (often called deep copy) must be explicitly defined. Given the possibility of self-referential objects, things could hardly be otherwise. In general, it is unwise to try to define assignment to do deep copy; defining a (virtual) copy function is usually a much better idea (see [2nd,pp217-220] and §13.7).
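
For example, giving the sloppy String above proper deep copy operations looks like this (a sketch based on the representation shown above):

#include <string.h>

class String {
    char* p;
    int sz;
public:
    String(const char* s) : p(new char[strlen(s)+1]), sz(strlen(s))
        { strcpy(p, s); }
    String(const String& s) : p(new char[s.sz+1]), sz(s.sz)  // deep copy
        { strcpy(p, s.p); }
    String& operator=(const String& s)         // deep assignment
    {
        if (this != &s) {                      // watch out for s = s
            char* q = new char[s.sz+1];        // allocate before deleting
            strcpy(q, s.p);
            delete[] p;
            p = q;
            sz = s.sz;
        }
        return *this;
    }
    ~String() { delete[] p; }
};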

11.5 Notational Convenience

My aim was to allow the user to specify the meaning of every operator as long as it made sense and as long as it didn’t interfere seriously with predefined semantics. It would have been easier if I could have allowed overloading of all operators without exception, or disallowed overloading of every operator that had a predefined meaning for class objects. The resulting compromise doesn’t please everybody.

Almost all discussion and most problems encountered relate to operators that don’t fit the usual pattern of binary or prefix arithmetic operators.

11.5.1 Smart Pointers

Before 2.0, the pointer dereference operator -> couldn’t be defined by users. This made it hard to create classes of objects intended to behave like “smart pointers.” The reason was simply that when I defined operator overloading, I saw -> as a binary operator with very special rules for the right-hand operand (the member name). I remember a meeting at Mentor Graphics in Oregon where Jim Howard jumped up, marched round a rather large conference table to the blackboard, and disabused me of this misconception. Operator ->, he pointed out, could be seen as a unary postfix operator where the result was reapplied to the member name. When I reworked the overloading mechanism, I used that idea.

It follows that the return type of an operator->() function must be either a pointer to a class or an object of a class for which operator->() is defined. For example:

struct Y { int m; };

class Ptr {
    Y* p;
    // ...
public:
    Ptr(Symbolic_ref);
    ~Ptr();

    Y* operator->()
    {
        // check p
        return p;
    }
};

Here, Ptr is defined so that Ptrs act as pointers to objects of class Y, except that some suitable computation is performed on each access.

void f(Ptr x, Ptr& xr, Ptr* xp)
{
    x->m;  // x.operator->()->m; that is, x.p->m
    xr->m; // xr.operator->()->m; that is, xr.p->m
    xp->m; // error: Ptr does not have a member m
}

Such classes are especially useful when defined as templates (§15.9.1) [2nd]:

template<class Y> class Ptr { /* ... */ };

void f(Ptr<complex> pc, Ptr<Shape> ps) { /* ... */ }

This was understood when overloading of -> was first implemented in 1986. Unfortunately, it was years before templates became available so that such code could actually be written.
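
For example, a checked pointer template in that style might look like this (a sketch; CheckedPtr and its use of assert() are inventions for illustration):

#include <assert.h>

template<class T> class CheckedPtr {
    T* p;
public:
    CheckedPtr(T* pp) : p(pp) {}
    T* operator->() { assert(p != 0); return p; }  // check on every access
    T& operator*() { assert(p != 0); return *p; }
};

struct Shape { void draw() { /* ... */ } };

void f(CheckedPtr<Shape> ps)
{
    ps->draw();  // ps.operator->()->draw() after the check
}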

For ordinary pointers, use of -> is synonymous with some uses of unary * and []. For example, for a Y* p it holds that:

p->m == (*p).m == p[0].m

As usual, no such guarantee is provided for user-defined operators. The equivalence can be provided where desired:

class Ptr {
    Y* p;
public:
    Y* operator->() { return p; }
    Y& operator*() { return *p; }
    Y& operator[](int i) { return p[i]; }
    // ...
};

The overloading of -> is important to a class of interesting programs and not just a minor curiosity. The reason is that indirection is a key concept and that overloading -> provides a clean, direct, and efficient way of representing it in a program. Another way of looking at operator -> is to consider it a way of providing C++ with a limited, but very useful, form of delegation (§12.7).

11.5.2 Smart References

When I decided to allow overloading of operator ->, I naturally considered whether operator . could be similarly overloaded.

At the time, I considered the following arguments conclusive: If obj is a class object then obj.m has a meaning for every member m of that object’s class. We try not to make the language mutable by redefining built-in operations (though that rule is violated for = out of dire need, and for unary &).

If we allowed overloading of . for a class X, we would be unable to access members of X by normal means; we would have to use a pointer and ->, but -> and & might also have been re-defined. I wanted an extensible language, not a mutable one.

These arguments are weighty, but not conclusive. In particular, in 1990 Jim Adcock proposed to allow overloading of operator . exactly the way operator -> is.

Why do people want to overload operator.()? To provide a class that acts as a “handle” or a “proxy” for another class in which the real work is done. As an example, here is a multi-precision integer class used in the early discussions of overloading of operator.():

class Num {
    // ...
public:
    Num& operator=(const Num&);
    int operator[](int);        // extract digit
    Num operator+(const Num&);
    void truncateNdigits(int); // truncate
    // ...
};

I’d like to define a class RefNum that behaves like a Num& except for performing some added actions. For example, if I can write:

void f(Num a, Num b, Num c, int i)
{
    // ...
    c = a+b;
    int digit = c[i];
    c.truncateNdigits(i);
    // ...
}

then I also want to be able to write:

void g(RefNum a, RefNum b, RefNum c, int i)
{
    // ...
    c = a+b;
    int digits = c[i];
    c.truncateNdigits(i);
    // ...
}

Assume that operator.() is defined in exact parallel to operator->(). We first try the obvious definition of RefNum:

class RefNum {
    Num* p;
public:
    RefNum(Num& a) { p = &a; }
    Num& operator.() { do_something(p); return *p; }
    void bind(Num& q) { p = &q; }
};

Unfortunately, this doesn’t have the right effect because . isn’t explicitly mentioned in all cases:

c = a+b;               // no dot
int digits = c[i];     // no dot
c.truncateNdigits(i);  // call operator.()

We would have to write forwarding functions to ensure the right action is performed when operators are applied to a RefNum:

class RefNum {
    Num* p;
public:
    RefNum(Num& a) { p = &a; }
    Num& operator.() { do_something(p); return *p; }
    void bind(Num& q) { p = &q; }
    
    // forwarding functions:

    RefNum& operator=(const RefNum& a)
        { do_something(p); *p=*a.p; return *this; }
    int operator[](int i)
        { do_something(p); return (*p)[i]; }
    RefNum operator+(const RefNum& a)
        { do_something(p); return RefNum(*p+*a.p); }
};

This is clearly tedious. Consequently, many people, including Andrew Koenig and me, considered the effect of applying operator.() to every operation on a RefNum. That way, the original definition of RefNum would make the original example work as desired (and initially expected).

However, applying operator.() this way implies that to access a member of RefNum itself you must use a pointer:

void h(RefNum r, Num& x)
{
    r.bind(x);      // error: no Num::bind
    (&r)->bind(x); // ok: call RefNum::bind
}

The C++ community seems split over the issue of which interpretation of operator.() is best. I lean towards the view that if operator.() should be allowed then it should be invoked for implicit uses as well as explicit ones. After all, the reason for defining operator.() is to avoid writing forwarding functions. Unless implicit uses of . are interpreted by operator.(), we’ll still have to write a lot of forwarding functions, or we would have to eschew operator overloading.

If we can define operator.(), the equivalence of a.m and (&a)->m would no longer hold by definition. It could be made to hold by defining both operator&() and operator->() to match operator.(), though, so I personally don’t see that as a significant problem. However, if we did that there would be no way of accessing members of the smart reference class. For example, RefNum::bind() would become completely inaccessible.

Is that important? Some people have answered, “No, like ordinary references, smart references shouldn’t ever be re-bound to a new object.” However, my experience is that smart references often need a re-bind operation or some other operation to make them genuinely useful. Most people seem to agree.

We are thus left in a quandary: We can either maintain the a.m and (&a)->m equivalence or have access to members of the smart reference, but not both.

One way out of the dilemma would be to forward using operator.() for a.m only if the reference class doesn’t itself have a member called m. This happens to be my favorite resolution.

However, there is no consensus on the importance of overloading operator.() either. Consequently, operator.() isn’t part of C++ and the debates rage on.

11.5.3 Overloading Increment and Decrement

The increment operator ++ and the decrement operator -- were among the operators that users could define. However, Release 1.0 did not provide a mechanism for distinguishing prefix from postfix application. Given

class Ptr {
    // ...
public:
    void operator++();
};

the single Ptr::operator++() will be used for both:

void f(Ptr& p)
{
    p++; // p.operator++()
    ++p; // p.operator++()
}

Several people, notably Brian Kernighan, pointed out that this restriction was unnatural from a C perspective and prevented users from defining a class that could be used as a replacement for an ordinary pointer.

I had of course considered separate overloading of prefix and postfix increment when I designed the C++ operator overloading mechanism, but I had decided that adding syntax to express it wouldn’t be worthwhile. The number of suggestions I received over the years convinced me that I was wrong, provided I could find some minimal change to express the prefix/postfix distinction.

I considered the obvious solution, adding the keywords prefix and postfix to C++:

class Ptr_to_X {
    // ...
    X& operator prefix++();   // prefix ++
    X operator postfix++();   // postfix ++
};

or

class Ptr_to_X {
    // ...
    X& prefix operator++();   // prefix ++
    X postfix operator++();   // postfix ++
};

However, I received the usual howl of outrage from people who dislike new keywords. Several alternatives that did not involve new keywords were suggested. For example:

class Ptr_to_X {
    // ...
    X& ++operator();   // prefix ++
    X operator++();    // postfix ++
};

or

class Ptr_to_X {
    // ...
    X& operator++();   // prefix because it
                       // returns a reference
    X operator++();    // postfix because it
                       // doesn't return a reference
};

I considered the former too cute and the latter too subtle. Finally, I settled on:

class Ptr_to_X {
    // ...
    X& operator++();    // prefix: no argument
    X operator++(int);  // postfix: because of
                        // the argument
};

This may be both too cute and too subtle, but it works, requires no new syntax, and has a logic to the madness. Other unary operators are prefix and take no arguments when defined as member functions. The “odd” and unused dummy int argument is used to indicate the odd postfix operators. In other words, in the postfix case, ++ comes between the first (real) operand and the second (dummy) argument and is thus postfix.

These explanations are needed because the mechanism is unique and therefore a bit of a wart. Given a choice, I would probably have introduced the prefix and postfix keywords, but that didn’t appear feasible at the time. However, the only really important point is that the mechanism works and can be understood and used by the few programmers who really need it.
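
In use, the convention looks like this for a pointer-like class (a sketch; the template and its names are mine):

template<class X> class Ptr_to {
    X* p;
public:
    Ptr_to(X* pp) : p(pp) {}
    X& operator*() { return *p; }
    Ptr_to& operator++() { ++p; return *this; }  // prefix: ++q
    Ptr_to operator++(int)                       // postfix: q++
    {
        Ptr_to old = *this;  // the dummy int argument is never used
        ++p;
        return old;
    }
};

void g(int* v)
{
    Ptr_to<int> q(v);
    ++q;  // q.operator++()
    q++;  // q.operator++(0): the compiler supplies the dummy 0
}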

11.5.4 Overloading ->*

Operator ->* was made overloadable primarily because there wasn’t any reason not to (because of orthogonality, if you must). It turns out to be useful for expressing binding operations that somehow have semantics that parallel those of the built-in meaning for ->* (§13.11). No special rules are needed; ->* behaves just like any other binary operator.

Operator .* wasn’t included among the operators a programmer could overload for the same reason operator . wasn’t (§11.5.2).

11.5.5 Overloading the Comma Operator

At the urging of Margaret Ellis, I allowed overloading of the comma operator. Basically, I couldn’t find any reason not to at the time. Actually, there is a reason: a, b is already defined for any a and b, so allowing overloading enables the programmer to change the meaning of a built-in operator. Fortunately, that is only possible if either a or b is a class object. There appear to be few practical uses of operator,(). Accepting it was primarily a generalization.

11.6 Adding Operators to C++

There never are enough operators to suit everyone’s taste. In fact, it seems that with the exception of people who are against essentially all operators on principle, everyone wants a few extra operators.

11.6.1 An Exponentiation Operator

Why doesn’t C++ have an exponentiation operator? The original reason was that C doesn’t have one. The semantics of C operators are supposed to be simple to the point where they each correspond to a machine instruction on a typical computer. An exponentiation operator doesn’t meet this criterion.

Why didn’t I immediately add an exponentiation operator when I first designed C++? My aim was to provide abstraction mechanisms, not new primitive operations. An exponentiation operator would have to be given a meaning for built-in arithmetic types. This was the area of C that I was determined to avoid changing. Further, C and therefore C++ are commonly criticized for having too many operators with a confusing variety of precedences. Despite these significant deterrents, I still considered adding an exponentiation operator and might have done so had there been no technical problems. I wasn’t fully convinced that an exponentiation operator was really needed in a language with overloading and inline functions, but it was tempting to add the operator simply to silence the repeated assertions that it was needed.

The exponentiation operator people wanted was **. This would cause a problem because a**b can be a legal C expression involving a dereference of a pointer b:

double f(double a, double* b)
{
    return a**b; // meaning a*(*b)
}

In addition, there seemed to be some disagreement among proponents of an exponentiation operator about which precedence that operator ought to have:

a = b**c**d; // (b**c)**d or b**(c**d) ?
a = -b**c;   // (-b)**c or -(b**c) ?

Finally, I had little wish to specify the mathematical properties of exponentiation.

At the time, these reasons convinced me that I could serve users better by focusing on other issues. In retrospect, all of these problems can be overcome. The real question is “Would it be worthwhile to do so?” The issue was brought to a head when Matt Austern presented a complete proposal to the C++ standards committee (§6) in 1992. On its way to the committee this proposal had received a lot of comments and been the subject of much debate on the net.

Why do people want an exponentiation operator?

– They are used to it from Fortran.

– They believe that an exponentiation operator is much more likely to be optimized than an exponentiation function.

– A function call is uglier in the kind of expressions actually written by physicists and other primary users of exponentiation.

Are these reasons sufficient to counterbalance the technical problems and objections? Also, how can the technical problems be overcome? The extensions working group discussed these issues and decided not to add an exponentiation operator. Dag Brück summarized the reasons:

– An operator provides notational convenience, but does not provide any new functionality. Members of the working group, representing heavy users of scientific/engineering computation, indicated that the operator syntax provides minor syntactic convenience.

– Every user of C++ must learn this new feature.

– Users have stressed the importance of substituting their own specialized exponentiation functions for the system default, which would not be possible with an intrinsic operator.

– The proposal is not sufficiently well motivated. In particular, by looking at one 30,000 line Fortran program one cannot conclude that the operator would be widely used in C++.

– The proposal requires adding a new operator and adding another precedence level, thus increasing the complexity of the language.

This brief statement somewhat understates the depth of the discussion. For example, several committee members reviewed significant bodies of corporate code for use of exponentiation and didn’t find the usage as critical as is sometimes asserted. Another key observation was that the majority of occurrences of ** in the Fortran code examined were of the form a**n where n was a small integer literal; writing a*a and a*a*a seemed viable alternatives in most cases.
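
An inline function covers that common case without any new operator (a sketch; the name ipow is mine):

inline double ipow(double a, int n)  // assumes n >= 0
{
    double r = 1;
    while (n-- > 0)
        r *= a;
    return r;
}

Here, ipow(a,3) – or simply a*a*a – replaces the Fortran a**3.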

Whether it would have been less work in the long run to accept the proposal remains to be seen. However, let me present some of the technical issues. Which operator would be best as a C++ exponentiation operator? C uses all the graphical characters in the ASCII character set with the exception of @ and $, and these were for several reasons not suitable. The operators !, ~, *~, ^^, and even plain ^ when either operand was non-integral were considered. However, @, $, ~, and ! are national characters that don’t appear on all keyboards (see §6.5.3.1); @ and $ are further perceived by many as ugly for this purpose. The tokens ^ and ^^ read “exclusive or” to C programmers. An added constraint is that it should be possible to combine the exponentiation operator with the assignment operator in the same way other arithmetic operators are; for example, + and = gives +=. This eliminates ! because != already has a meaning. Matt Austern therefore settled on *~ and that is probably the best such choice.

All other technical issues were settled by following their resolution in Fortran. This is the only sane solution and saves a lot of work. Fortran is the standard in this area, and it requires very significant reasons to part ways with a de facto standard.

This point led me to revisit ** as an exponentiation operator for C++. I had, of course, demonstrated that this was impossible using traditional techniques, but when looking at the question again I realized that the C compatibility issues could be overcome by some compiler trickery. Assume we introduced the operator **. We could handle the incompatibility by defining it to mean “dereference and multiply” when its second operand is a pointer:

void f(double a, double b, int* p)
{
   a**b; // meaning pow(a,b)
   a**p; // meaning a*(*p)
   **a;  // error: a is not a pointer
   **p;  // error: means *(*p) and *p is not a pointer
}

To fit into the language, ** would of course have to be a token. This implies that when ** appears in a declarator it must be interpreted as double indirection:

char** p; // means char * * p;

The main problem with this is that the precedence of ** must be higher than that of * for a/b**c to mean what mathematicians would expect, that is, a/(b**c). On the other hand, a/b**p in C means (a/b)*(*p) and would quietly change its meaning to a/(b*(*p)). I suspect such code is rare in C and C++. Breaking it would be worthwhile if we decided to provide an exponentiation operator – especially because it would be trivial for a compiler to issue a warning where the meaning might change.

However, we decided not to add an exponentiation operator, so the issue is now purely academic. I was amused to see the horror that my semi-serious suggestion to use ** caused. I am also continuously amused and puzzled over the amount of heat generated by minor syntactic issues such as whether exponentiation should be spelled pow(a,b), a**b, or a*~b.

11.6.2 User-defined Operators

Could I have avoided the whole discussion about an exponentiation operator by designing a mechanism that allowed users to define their own operators? This would have solved the problem of missing operators in general.

When you need operators you invariably find that the set provided by C and C++ is insufficient to express every desired operation. The solution is to define functions. However, once you can say

a*b

for some class, functional forms like

pow(a,b)
abs(a)

start to look unsatisfactory. Consequently, people ask for the ability to define a meaning for

a pow b
abs a

This can be done. Algol68 showed one way. Further, people ask for the ability to define a meaning for

a ** b
a // b
!a

etc. This too can be done. The real question is whether allowing user-defined operators is worthwhile. I observed [ARM]:

“This extension, however, would imply a significant extension of complexity of syntax analysis and an uncertain gain in readability. It would be necessary either to allow the user to specify both the binding strength and the associativity of new operators or to fix those attributes for all user-defined operators. In either case, the binding of expressions such as

a = b**c**d;      // (b**c)**d or b**(c**d) ?

would be surprising or annoying to many users. It would also be necessary to resolve clashes with the syntax of the usual operators. Consider this, assuming ** and // to be defined as binary operators:

a = a**p;   // a**p or a*(*p) ?
a = a//p;
*p = 7;     // a = a*p = 7; maybe?”

(If // starts a comment, the second statement loses its semicolon to that comment, and the two lines would be read as the single statement a = a*p = 7;.)

Consequently, user-defined operators would either have to be restricted to ordinary identifiers or require a distinguishing prefix such as . (dot):

a pow b;    // alternative 1
a .pow b;   // alternative 2
a .** b;    // alternative 3

User-defined operators must be given a precedence. The easiest way to do that is to specify the precedence of a user-defined operator to be the same as some built-in operator. However, that would not suffice to define the exponentiation operator “correctly.” For that we need something more elaborate. For example:

operator pow: binary, precedence between * and unary

Also, I am seriously worried about the readability of programs using user-defined operators with user-defined precedences. More than one precedence for exponentiation has been used in programming languages, so different people would define different precedences for pow. For example,

a = -b pow c * d;

would be parsed differently in different programs.

A simpler alternative is to give all user-defined operators the same precedence. This seemed very attractive until I discovered that even my two closest collaborators at the time, Andrew Koenig and Jonathan Shopiro, and I were unable to agree on a precedence. The obvious candidates are “very high” (for example, just above multiply) and “very low” (for example, just above assignment). Unfortunately, the number of cases where one seems ideal and the other absurd appeared endless. For example, it seems hard to get even the simplest examples “right” with only a single precedence level. Consider:

a = b * c pow d;
a = b product c pow d;
a put b + c;

Thus, C++ doesn’t support user-defined operators.
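C++’s existing overloading machinery does, however, allow a crude emulation of a named infix operator. The following minimal sketch uses a well-known idiom rather than anything proposed for the language; the names pow_tag, pow_lhs, and POW are invented here for illustration:

#include <cmath>

struct pow_tag {};               // marker type for the fake operator
struct pow_lhs { double lhs; };  // carries the left operand

inline pow_lhs operator*(double a, pow_tag) { return pow_lhs{a}; }
inline double operator*(pow_lhs a, double b) { return std::pow(a.lhs, b); }

const pow_tag POW{};             // enables: a *POW* b

double g(double a, double b)
{
    return a *POW* b;            // calls pow(a,b)
}

Note that the emulated operator necessarily inherits the precedence and associativity of *, so a *POW* b *POW* c groups to the left; that is precisely the precedence problem discussed above, here frozen into the idiom.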

11.6.3 Composite Operators

C++ supports overloading of unary and binary operators. I suspect it would be useful to support overloading of composite operators. In the ARM, I explained the idea like this:

"For example, the two multiplications in

Matrix a, b, c, d;
// ...
a = b * c * d;

might be implemented by a specially defined “double multiplication” operator defined like this:

Matrix operator * * (Matrix&, Matrix&, Matrix&);

that would cause the statement above to be interpreted like this:

a = operator * * (b,c,d);

In other words, having seen the declaration

Matrix operator * * (Matrix&, Matrix&, Matrix&);

the compiler looks for patterns of repeated Matrix multiplications and calls the function to interpret them. Patterns that are different or too complicated are handled using the usual (unary and binary) operators.

This extension has been independently invented several times as an efficient way of coping with common patterns of use in scientific computing using user-defined types. For example,

Matrix operator = * + (
    Matrix&,
    const Matrix&,
    double,
    const Matrix&
);

for handling statements like this:

a = b*1.7 + d;”

Naturally, the placement of whitespace would be very significant in such declarations. Alternatively, some other token could be used to signify the position of the operands:

Matrix operator.=.*.+.(
    Matrix&,
    const Matrix&,
    double,
    const Matrix&
) ;

I have never seen this idea explained in print prior to the ARM, but it is a common technique in code generators. I consider the idea promising for supporting optimized vector and matrix operations, but I have never had time to develop it sufficiently to be sure. It would be notational support for the old technique of defining functions performing composite operations given several arguments.
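The underlying technique needs no language support, of course. Here is a minimal sketch of the conventional named-function version, using a deliberately trivial Matrix and a function name, scale_add, invented for illustration:

#include <cstddef>
#include <vector>

struct Matrix {                      // minimal stand-in for a real matrix class
    std::vector<double> elem;
    explicit Matrix(std::size_t n) : elem(n) {}
};

// Compute a = b*s + d in one pass, with no temporary Matrix objects:
void scale_add(Matrix& a, const Matrix& b, double s, const Matrix& d)
{
    for (std::size_t i = 0; i != a.elem.size(); ++i)
        a.elem[i] = b.elem[i]*s + d.elem[i];
}

A caller writes scale_add(a,b,1.7,d) instead of a = b*1.7 + d; the composite-operator extension would, in effect, let the compiler perform that rewriting itself.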

11.7 Enumerations

C enumerations constitute a curiously half-baked concept. Enumerations were not part of the original conception of C and were apparently reluctantly introduced into the language as a concession to people who insisted on getting a form of symbolic constants more substantial than Cpp’s parameterless macros. Consequently, the value of a C enumerator is of type int, and so is the value of a variable declared to be of “enumerator type.” An int can be freely assigned to any enumeration variable. For example:

enum Color { red, green, blue };

void f() /* C function */
{
    enum Color c = 2; /* ok */
    int i = c;        /* ok */
}

I had no need for enumerations in the styles of programming I wished to support and no particular wish to meddle in the affairs of enumerations, so C++ adopted C’s rule unchanged.

Unfortunately (or fortunately, if you like enumerations), the ANSI C committee left me with a problem. They changed or clarified the definition of enumerations such that pointers to different enumerations appeared to be different types:

enum Vehicle { car, horse_buggy, rocket };

void g(pc,pv) enum Color* pc; enum Vehicle* pv;
{
    pc = pv; /* probably illegal in ANSI C */
}

I had a longish discussion of this point involving C experts such as David Hanson, Brian Kernighan, Andrew Koenig, Doug McIlroy, David Prosser, and Dennis Ritchie. The discussion wasn’t completely conclusive – that in itself was an ominous sign – but there was an agreement that the intent of the standard was to outlaw the example, except maybe leaving a loophole accepting the example if (as is common) Color and Vehicle are represented by the same amount of storage.

This uncertainty was unacceptable to me because of function overloading. For example:

void f(Color*);
void f(Vehicle*);

must either declare one function twice or two overloaded functions. I had no wish to accept any weaselwording or implementation dependency. Similarly,

void f(Color);
void f(Vehicle);

must either declare one function or two overloaded functions. In C and pre-ARM C++, those declarations declared a single function twice. However, the cleanest way out was to deem each enumeration a separate type. For example:

void h() // C++
{
    Color c = 2;  // error
    c = Color(2); // ok: 2 explicitly converted to Color
    int i = c;    // ok: c implicitly converted to int
}

This resolution had been vocally demanded by someone every time I had discussed enumerations with C++ programmers. I suspect I acted rashly – despite months of delay and endless consulting with C and C++ experts – but nevertheless reached the best resolution for the future.

11.7.1 Overloading based on Enumerations

Having declared each enumeration a separate type, I forgot something obvious: an enumeration is a separate type defined by the user. It is therefore a user-defined type just as a class is, and consequently it is possible to overload operators based on an enumeration. Martin O’Riordan pointed this out at an ANSI/ISO meeting. Together with Dag Brück, he worked out the details, and overloading based on enumerations was accepted into C++. For example:

enum Season { winter, spring, summer, fall };

Season operator++(Season s)
{
    switch (s) {
    case winter: return spring;
    case spring: return summer;
    case summer: return fall;
    case fall: return winter;
    }
}

I used the switch to avoid integer arithmetic and casts.
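As written, the example yields the next season without changing its operand, so ++s would leave s untouched. A variant taking its operand by reference makes the conventional prefix ++s advance the variable in place; the following sketch is mine, not from the original discussion:

Season& operator++(Season& s)    // prefix ++: advance s in place
{
    switch (s) {
    case winter: s = spring; break;
    case spring: s = summer; break;
    case summer: s = fall;   break;
    case fall:   s = winter; break;
    }
    return s;
}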

11.7.2 A Boolean Type

One of the most common enumerations is

enum bool { false, true };

Every major program has that one or one of its cousins:

#define bool char
#define Bool int
typedef unsigned int BOOL;
typedef enum { F, T } Boolean;
const true = 1;
#define TRUE 1
#define False (!True)

The variations are apparently endless. Worse, most variations imply slight variations in semantics, and most clash with other variations when used together.

Naturally, this problem has been well known for years. Dag Brück and Andrew Koenig decided to do something about it:

“The idea of a Boolean data type in C++ is a religious issue. Some people, particularly those coming from Pascal or Algol, consider it absurd that C should lack such a type, let alone C++. Others, particularly those coming from C, consider it absurd that anyone would bother to add such a type to C++.”

Naturally, the first idea was to define an enum. However, an examination of hundreds of thousands of lines of C++ by Dag Brück and Sean Corfield revealed that most Boolean types were used in ways that required free conversion to and from int. This implied that defining a Boolean enumeration would break too much existing code. So why bother with a Boolean type?

[1] The Boolean data type is a fact of life whether it is a part of a C++ standard or not.

[2] The many clashing definitions make it hard to use any Boolean type conveniently and safely.

[3] Many people want to overload based on a Boolean type.
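Point [3] is worth a small illustration (mine): with a genuine bool type, the following declares two distinct overloads, whereas under a typical #define bool int workaround it would declare one function twice:

void print(int i);    // print a number
void print(bool b);   // print a truth value

void h()
{
    print(42);      // calls print(int)
    print(true);    // calls print(bool)
}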

Somewhat to my surprise, the ANSI/ISO committee accepted this argument, so bool is now a distinct integral type in C++ with literals true and false. Non-zero values can be implicitly converted to true, and true can be implicitly converted to 1. Zero can be implicitly converted to false, and false can be implicitly converted to 0. This ensures a high degree of compatibility.
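A small example of the resulting conversions (illustration mine):

void f(int x)
{
    bool b1 = 7;    // non-zero converts to true
    bool b2 = 0;    // zero converts to false
    int i = b1;     // true converts to 1
    int j = b2;     // false converts to 0
    if (x) {        // plain ints still work in conditions
        // ...
    }
}

Old code that mixes ints and Booleans thus continues to compile unchanged.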

Some may even prefer F=MA, but the explanation of how to make that work (“overloading missing whitespace”) is beyond the scope of this book.
