6. Standardization

Don’t you try to outweird me, I get stranger things than you free with my breakfast cereal.

Zaphod Beeblebrox

What is a standard? — aims of the C++ standards effort — how does the committee operate? — who is on the committee? — language clarifications — name lookup rules — lifetime of temporaries — criteria for language extension — list of proposed extensions — keyword arguments — an exponentiation operator — restricted pointers — character sets.

6.1 What is a Standard?

There is much confusion in the minds of programmers about what a standard is and what it ought to be. One ideal for a standard is to completely specify exactly which programs are legal and exactly what the meaning of every such program is. For C and C++ at least, that is not the whole story. In fact, it can’t and shouldn’t be the ideal for languages designed to exploit the diverse world of hardware architectures and gadgets. For such languages, it is essential to have some behavior implementation-dependent. Thus, a standard is often described as “a contract between the programmer and the implementer.” It describes not only what is “legal” source text, but also what a programmer can rely on in general and what behavior is implementation-dependent. For example, in C and C++ one can declare variables of type int, but the standard doesn’t specify how large an int is, only that it has at least 16 bits.

It is possible to have long and somewhat learned debates about what the standard really is and what terminology can best be employed to express it. However, the key points are to sharply distinguish what is and what is not a valid program, and further to specify what behavior should be the same in all implementations and what is implementation-dependent. Exactly how those distinctions are drawn is important, but not very interesting to practical programmers. Most committee members focus on the more language-technical aspects of standardization so the main burden of tackling the thorny issues of what the standard standardizes falls on the committee’s project editor. Fortunately, our original project editor Jonathan Shopiro has an interest in such matters. Jonathan has now retired as editor in favor of Andrew Koenig, but Jonathan is still a member of the committee.

Another interesting (that is, very difficult) question is to which extent an implementation with features not specified in the standard is acceptable. It seems unreasonable to ban all such extensions. After all, some extensions are necessary to important sub-sections of the C++ community. For example, some machines have hardware that supports specific concurrency mechanisms, special addressing constraints, or special vector hardware. We can’t burden every C++ user with features to support all these incompatible special-purpose extensions. They will be incompatible and will often impose a cost even on non-users. However, it would be unfortunate to discourage implementers serving such communities from trying to be perfectly conforming except for their essential extensions. On the other hand, I was once presented with an “extension” that allowed access to private members of a class from every function in the program; that is, the implementer had not bothered to implement access control. I didn’t consider that a reasonable extension. Wordsmithing the standard to allow the former and not the latter is a nontrivial task.

An important point is to ensure that nonstandard extensions are detectable; otherwise, a programmer might wake up some morning and find significant code dependent on a supplier’s unique extensions and thus without the option to change suppliers with reasonable ease. As a naive student, I remember being surprised and pleased to find that the Fortran on our university mainframe was an “extended Fortran” with some neat features. My surprise turned to dismay when I realized that this implied that my programs would be useless except on CDC6000 series machines.

Thus, 100% portability of standards-conforming programs is not in general an achievable or desirable ideal for C++. A program that conforms to a standard is not necessarily 100% portable because it may display implementation-dependent behavior. Actually, most do. For example, a perfectly legal C or C++ program may change its meaning if it happens to depend on the results of the built-in remainder operator % applied to a negative number.

Further, real programs tend to have dependencies on libraries providing services not offered on every system. For example, a Microsoft Windows program is unlikely to run unchanged under X, and a program using the Borland foundation classes will not trivially be ported to run under MacApp. Portability of real programs comes from design that encapsulates implementation and environment dependencies, not just from adherence to a few simple rules in a standards document.

Knowing what a standard doesn’t guarantee is at least as important as knowing what it does promise.

6.1.1 Implementation Details

Every week, there seems to be a new request for standardizing things like the virtual table layout, the type-safe linkage name encoding scheme, or the debugger. However, these are quality-of-implementation issues or implementation details that are beyond the scope of the standard. Users would like libraries compiled with one compiler to work with code compiled with another, would like binaries to be transferable from one machine architecture to another, and would like debuggers to be independent of the implementation used to compile the code being examined.

However, standardization of instruction sets, operating-system interfaces, debugger formats, calling sequences, and object layouts is far beyond the ability of the standards group for a programming language that is merely one little cog in a much bigger system. Such universal standardization probably isn’t even desirable because it would stifle progress in machine architectures and operating systems. If a user needs total independence from hardware, the system/environment must be built as an interpreter with its own standard environment for applications. That approach has its own problems; in particular, specialized hardware becomes hard to exploit and local style guides cannot be followed. If those problems are overcome by interfacing to code written in another language that allows nonportable code, such as C++, the problem recurs.

For a language suitable for serious systems work, we must live with the fact that every now and again a naive user posts a message to the net: “I moved my object code from my Mac to my SPARC and now it won’t work.” Like portability, interoperability is a matter of design and understanding of the constraints imposed by the environments. I often meet C programmers who are unaware that code compiled with two different C compilers for the same system is not guaranteed to link and in fact is unlikely to do so – yet express horror that C++ doesn’t guarantee such interoperability. As usual, we have a major task in educating users.

6.1.2 Reality Check

In addition to the many formal constraints on a standards committee, there is an informal and practical one: Many standards are simply ignored by their intended users. For example, the Pascal and Pascal2 standards are almost completely forgotten. For most Pascal programmers, “Pascal” means Borland’s greatly extended Pascal dialect. The language defined by the Pascal standard didn’t provide features users considered essential and the Pascal2 standard didn’t appear until a different informal “industry standard” had established itself. Another cautionary observation is that on UNIX most work is still done in K&R C; ANSI C is struggling in that community. The reason seems to be that some users don’t see the technical benefits of ANSI/ISO C compared to K&R C outweighing the short-term costs of a transition. Even an unchallenged standard can be slow finding its way into use. To become accepted, a standard must be timely and relevant to users’ needs. In my opinion, delivering a good standard for a good language in a timely manner is essential. Trying to change C++ into a “perfect” language or to produce a standard that cannot be misread by anyone – however devious or ill-educated – is far beyond the abilities of the committee (§3.13). In fact, it is beyond anyone working under the time constraint provided by a large user community (§7.1).

6.2 How does the Committee Operate?

There are actually several committees formed to standardize C++. The first and largest is the American National Standards Institute’s ANSI-X3J16 committee. That committee is the responsibility of the Computer and Business Equipment Manufacturers Association, CBEMA, and operates under its rules. In particular, this means one-company-one-vote voting, and a person who doesn’t work for a company counts as a company. A member can start voting at the second meeting attended. Officially, the most important committee is the International Standards Organization’s ISO-WG-21. That committee operates under international rules and is the one that will finally make the result an international standard. In particular, this means one-country-one-vote voting. Other countries, including Britain, Denmark, France, Germany, Japan, Russia, and Sweden, now have their own national committees for standardizing C++. These national committees send requests, recommendations, and representatives to the joint ANSI/ISO meetings.

Basically, we have decided not to accept anything that doesn’t pass under both ANSI and ISO voting rules. This implies that the committee operates rather like a bicameral parliament with a “lower house” (ANSI) doing most of the arguing and an “upper house” (ISO) ratifying the decisions of the lower house provided they make sense and duly respect the interests of the international community.

On one occasion, this procedure led to the rejection of a proposal that would otherwise have passed by a small majority. Thus, I think the national representatives saved us from a mistake that could have caused dissension. I couldn’t interpret that majority as reflecting a consensus and I therefore think that – independently of the technical merit of the proposal – the national representatives gave the committee an important reminder of their responsibilities under their charter. The issue in question was that of whether C++ should have a specific form of defined minimum translation limits. A significantly improved proposal was accepted at a later meeting.

The ANSI and ISO committees meet jointly three times a year. To avoid confusion I will refer to them using the singular committee. A meeting lasts a week out of which many hours are taken up with legally mandated procedural stuff. Yet more hours are taken up by the kind of confusion you might expect when 70 people try to understand what the issues really are. Some daytime hours and several evenings are taken up by technical sessions where major C++ issues, such as international character handling and run-time type identification, and issues relevant to standards work, such as formal methods and organizations of international standardization bodies, are presented and discussed. The rest of the time is mostly taken up by working group meetings and discussions based on the reports from those working groups.

The current working groups are:

– C compatibility

– Core language

– Editorial

– Environment

– Extensions

– International issues

– Libraries

– Syntax

Clearly, there is too much work for the committee to handle in only three weeks of meetings a year, so much of the actual work goes on between meetings. To aid communication, we use email a lot. Every meeting involves something like three inches of double-sided paper memos. These memos are sent in two packages: one arrives a couple of weeks before a meeting to help members prepare, and one a couple of weeks after to reflect work done between the first mailing and the end of the meeting.

6.2.1 Who is on the C++ Standards Committee?

The C++ committee consists of individuals of diverse interests, concerns, and backgrounds. Some represent themselves, some represent giant corporations. Some use PCs, some use UNIX boxes, some use mainframes, etc. Some use C++, some don’t. Some want C++ to be more of an object-oriented language (according to a variety of definitions of “object-oriented”), others would have been more comfortable had ANSI C been the end-point of C’s evolution. Many have a background in C, some don’t. Some have a background in standards work, many don’t. Some have a computer science background, some don’t. Some are programmers, some are not. Some are language lawyers, some are not. Some serve end-users, some are tools suppliers. Some are interested in large projects, some are not. Some are interested in C compatibility, some are not.

Except that all are officially unpaid volunteers (though most represent companies), it is hard to find a generalization that covers all. This is good; only a very diverse group could ensure that the diverse interests of the C++ community are represented. It does make constructive discussion difficult and slow at times. In particular, this very open process is vulnerable to disruption by individuals whose technical or personal level of maturity doesn’t allow them to understand or respect the views of others. I also worry that the voice of C++ users (that is, programmers and designers of C++ applications) can be drowned by the voices of language lawyers, would-be language designers, standards bureaucrats, implementers, etc.

Usually about 70 people attend a meeting, and of those, about half attend almost all meetings. The number of voting, alternate, and observing members is more than 250. I’m an alternate member, meaning that I represent my company, but someone else from my company votes. Let me give you an idea about who is represented here by simply glancing over a list of members and copying out some of the better-known names chosen from the membership list in 1990: Amdahl, Apple, AT&T, Bellcore, Borland, British Aerospace, CDC, Data General, DEC, Fujitsu, Hewlett-Packard, IBM, Los Alamos National Labs, Lucid, Mentor Graphics, Microsoft, MIPS, NEC, NIH, Object Design, Ontologies, Prime Computer, SAS Institute, Siemens Nixdorf, Silicon Graphics, Sun, Tandem Computers, Tektronix, Texas Instruments, Unisys, US WEST, Wang, and Zortech. This list is of course biased towards companies I know of and towards large companies, but I hope you get the idea that the industry is well represented. Naturally, the individuals involved are as important as the companies they represent, but I will refrain from turning this into an advertisement for my friends by naming them.

6.3 Clarifications

Much of the best standards work is invisible to the average programmer and appears quite esoteric and often boring when presented. The reason is that a lot of effort is expended in finding ways of expressing clearly and completely “what everyone already knows, but just happens not to be spelled out in the manual” and in resolving obscure issues that – at least in theory – don’t affect most programmers. Naturally, these issues are essential to implementers trying to ensure that a given language use is correctly handled. In turn, these issues become essential to programmers because even the most carefully written large program will deliberately or accidentally depend on some feature that would appear obscure or esoteric to some. Unless implementers agree, the programmer has little choice between implementations and becomes the hostage of a single compiler purveyor – and that would be contrary to my view of what C++ is supposed to be (see §2.1).

I will present two issues, name lookup and lifetime of temporaries, to illustrate the difficult and detailed work done. The majority of the committee’s efforts are expended on such issues.

6.3.1 Lookup Issues

The most stubborn problems in the definition of C++ relate to name lookup: exactly which uses of a name refer to which declarations? Here, I’ll describe just one kind of lookup problem: the ones that relate to order dependencies between class member declarations. Consider:

int x;

class X {
    int f() { return x; }
    int x;
};

Which x does X::f() refer to? Also:

typedef char* T;

class Y {
    T f() { T a = 0; return a; }
    typedef int T;
};

Which T does Y::f() use?

The ARM gives the answers: The x referred to in X::f() is X::x, and the definition of class Y is an error because the meaning of the type T changes after its use in Y::f().

Andrew Koenig, Scott Turner, Tom Pennello, Bill Gibbons, and several others devoted hours to finding precise, complete, useful, logical, and compatible (with the C standard and existing C++ code) answers to this kind of question at several consecutive meetings and weeks of work in between meetings. My involvement in these discussions was limited by my need to focus on extension-related issues.

Difficulties arise because of conflicts between goals:

[1] We want to be able to do syntax analysis reading the source text once only.

[2] Reordering the members of a class should not change the meaning of the class.

[3] A member function body explicitly written inline should mean the same thing when written out of line.

[4] Names from an outer scope should be usable from an inner scope (in the same way as they are in C).

[5] The rules for name lookup should be independent of what a name refers to.

If all of these rules hold, the language will be reasonably fast to parse, and users won’t have to worry about these rules because the compiler will catch the ambiguous and near-ambiguous cases. The current rules come very close to this ideal.

6.3.1.1 The ARM Name Lookup Rules

In the ARM, I addressed the problems with moderate success. Names from outer scopes can be used directly, and I tried to minimize the resulting order dependencies by two rules:

[1] The type redefinition rule: A type name may not be redefined in a class after it has been used there.

[2] The rewrite rule: Member functions defined inline are analyzed as if they were defined immediately after the end of their class declarations.

The redefinition rule makes class Y an error:

typedef char* T;

class Y {
    T f() { T a = 0; return a; }
    typedef int T; // error: T redefined after use
};

The rewrite rule says that class X should be understood as

int x;

class X {
    int f();
    int x;
};

inline int X::f() { return x; } // returns X::x

Unfortunately, not all examples are this simple. Consider:

const int i = 99;

class Z {
    int a[i];
    int f() { return i; }
    enum { i = 7 };
};

According to the ARM rules and (clearly?) contrary to their intent, this example is legal, and the two uses of i refer to different definitions and yield different values. The rewrite rule ensures that the i used in Z::f() is Z::i with the value 7. However, there is no rewrite rule for the i used as an index, so it refers to the global i with the value 99. Even though i is used to determine a type, it is not itself a type name, so it is not covered by the type redefinition rule. The ANSI/ISO rules ensure that the example is illegal because i is redefined after it has been used.

Also:

class T {
    A f();
    void g() { A a; /* ... */ }
    typedef int A;
};

Assume that no type A was defined outside T. Is the declaration of T::f() legal? Is the definition of T::g() legal? The ARM deems the declaration of T::f() illegal because A is undefined at that point; the ANSI/ISO rules agree. On the other hand, the ARM deems the definition of g() legal if you interpret the rewrite rule to say that “rewriting” takes place before syntax analysis and illegal if you interpret it to allow syntax analysis first and rewrite afterward. The issue is whether A is a type name when the syntax analysis is done. I think that the ARM supports the first view (that is, the definition of T::g() is legal), but I wouldn’t claim that to be indisputably obvious. The ANSI/ISO rules agree with my interpretation of the ARM rules.

6.3.1.2 Why Allow Forward References?

In principle, these problems could be avoided by insisting on strict one-pass analysis: You can use a name if and only if it has been declared “above/before” and what happens “below/after” can’t affect a declaration. This is, after all, the rule in C and elsewhere in C++. For example:

int x;

void f()
{
    int y = x;    // global x
    int x = 7;
    int z = x;       // local x
}

However, when I first designed classes and inline functions, Doug McIlroy argued convincingly that serious confusion would result from applying that rule to class declarations. For example:

int x;

class X {
    void f() { int y = x; } // ::x or X::x?
    void g();
    int x;
    void h() { int y = x; } // X::x
};

void X::g() { int y = x; } // X::x

When the declaration of X is large, the fact that different xs are present will often be unnoticed. Worse, unless the member x was used consistently, a silent change of meaning would result from a reordering of members. Taking a function body out of the class declaration into a separate member function declaration could also quietly change its meaning. The rewrite and redefinition rules provided protection against subtle errors and some freedom to reorganize classes.

These arguments apply to nonclass examples also, but only for classes is the compiler overhead of this protection affordable – and only for classes could C compatibility problems be avoided. In addition, class declarations are exactly where reorderings are most frequent and most likely to have undesirable side effects.

6.3.1.3 The ANSI/ISO Name Lookup Rules

Over the years, we found many examples that weren’t covered by the explicit ARM rules, were order-dependent in obscure and potentially dangerous ways, or for which the interpretation of the rules was uncertain. Some were pathological. One favorite was found by Scott Turner:

typedef int P();
typedef int Q();

class X {
    static P(Q); // define Q to be a P;
                 // equivalent to "static int Q()"
                 // the parentheses around Q are redundant

                 // Q is no longer a type in this scope

    static Q(P); // define Q to be a function
                 // taking an argument of type P
                 // and returning an int;
                 // equivalent to "static int Q(int())"
};

Declaring two functions with the same name in the same scope is fine as long as their argument types differ sufficiently. Reverse the order of member declarations, and we define two functions called P instead. Remove the typedef for either P or Q from the context, and we get yet other meanings.

This example ought to convince anybody that standards work is dangerous to your mental health. The rules we finally adopted make this example undefined.

Note that this example – like many others – is based on the unfortunate “implicit int” rule inherited from C. I tried to get rid of that rule more than ten years ago (§2.8.1). Unfortunately, not all sick examples rely on the implicit int rule. For example:

int b;

class Z {
    static int a[sizeof(b)];
    static int b[sizeof(a)];
};

This example is an error because b changes meaning after it has been used. Fortunately, this kind of error is easy for a compiler to catch – unlike the P(Q) example.

At the Portland meeting in March 1993 the committee adopted these rules:

[1] The scope of a name declared in a class consists not only of the text following the name’s declarator but also of all function bodies, default arguments, and constructor initializers in that class (including such things in nested classes). It excludes the name’s own declarator.

[2] A name used in a class S must refer to the same declaration when reevaluated in its context and in the completed scope of S. The completed scope of S consists of the class S, S’s base classes, and all classes enclosing S. This is often called “the reconsideration rule.”

[3] If reordering member declarations in a class yields an alternate valid program under [1] and [2], the program’s meaning is undefined. This is often called “the reordering rule.”

Note that very few programs are affected by this change of rules. The new rules are primarily a clearer statement of the original intent. At first glance, these rules seem to require a multi-pass algorithm in a C++ implementation. However, they can be implemented by a single pass followed by one or more passes over information gathered during the first pass, and are not a performance bottleneck.

6.3.2 Lifetime of Temporaries

Many operations in C++ require the use of temporary values. For example:

void f(X a1, X a2)
{
    extern void g(const X&);
    X z;
    //...
    z = a1+a2; g(a1+a2);
    //...
}

In general, an object (probably of type X) is needed to hold the result of a1+a2 before assigning it to z. Similarly, an object is needed to hold the result of a1+a2 passed to g(). Assume that X is a class with a destructor. Where, then, is the destructor for this temporary invoked? My original answer to that question was “at the end of the block just like every other local variable.” There proved to be two problems with this answer:

[1] Sometimes, that doesn’t leave a temporary around for long enough. For example, g() might push a pointer to its argument (the temporary resulting from a1+a2) onto a stack, and someone might pop that pointer and try to use it after f() has returned, that is, after the temporary has been destroyed.

[2] Sometimes, that leaves a temporary around for too long. For example, X might be a 1,000 by 1,000 matrix type and dozens of temporary matrixes might be created before the end of block is reached. This will exhaust even large real memories and can send a virtual memory mechanism into spasms of paging.

In my experience, the former problem is rare in real programs, and its general solution is the use of automatic garbage collection (§10.7). The latter problem, however, is common and serious. In practice, it forced some people to enclose each statement suspected of generating temporaries in its own block:

void f(X a1, X a2)
{
    extern void g(const X&);
    X z;
    //...
    { z = a1+a2; }
    { g(a1+a2); }
    //...
}

With the point of destruction at the end of the block – as implemented by Cfront – users could at least explicitly work around the problem. However, a better resolution was loudly demanded by some users. Consequently, in the ARM, I relaxed the rule to allow destruction at any point between the first use of the temporary value and the end of the block. This was a misguided act of intended kindness. It caused confusion and helped nobody because as different implementers chose different lifetimes of temporaries, nobody could write code that was guaranteed to be portable except by assuming immediate destruction – and that was quickly shown to be unacceptable because it broke code using common and well-liked C++ idioms. For example:


class String {
    //...
public :
    friend String operator+(const String&,const String&);
    //...
    operator const char*(); // C-style string
};

void f(String s1, String s2)
{
    printf("%s", (const char*) (s1+s2));
    //...
}

The idea is that String’s conversion operator is invoked to produce a C-style string for printf to print. In the typical (naive and efficient) implementation, the conversion operator simply returns a pointer to part of the String object.

Given this simple implementation of the conversion operator, this example wouldn’t work under an “immediate destruction of temporaries” implementation: A temporary is created for s1 + s2, the conversion to a C-style string obtains a pointer to the internals of this temporary, the temporary is destroyed, and then the pointer to the internals of the now-destroyed temporary is passed to printf(). The destructor for the String temporary holding s1 + s2 would have freed the memory holding the C-style string.

Such code is common and even implementations that generally follow an immediate destruction strategy, such as GNU’s G++, tended to delay destruction in such cases. This kind of thinking led to the idea of destroying temporaries at the end of the statement in which they were constructed. This would make the example above not only legal, but guaranteed portable across implementations. However, other “almost equivalent” examples would break. For example:

void g(String s1, String s2)
{
    const char* p = s1+s2;
    printf("%s",p);
    //...
}

Given the “destroy temporaries at the end of statement” strategy the C-string pointed to by p would reside in the temporary representing s1 + s2 and be freed at the end of the statement initializing p.

Discussions of the lifetime of temporaries festered in the standards committee for about two years until Dag Brück successfully brought them to a close. Before that, the committee spent much time discussing the relative merits of solutions that were all good enough. Everyone also agreed that no solution was perfect. My opinion – somewhat loudly expressed – was that users were hurting for lack of a resolution and that the time had come to just pick one. I think the best alternative was chosen.

Dag’s summary of the issues in July 1993 was primarily based on work by Andrew Koenig, Scott Turner, and Tom Pennello. It identified seven main alternative points of destruction of a temporary:

[1] Just after the first use.

[2] At the end of statement.

[3] At the next branching point.

[4] At the end of block (original C++ rule, like Cfront).

[5] At the end of function.

[6] After the last use (implies garbage collection).

[7] Leave undefined between first use and end of block (ARM rule).

I leave it as an exercise to the reader to construct valid arguments in favor of each alternative. It can be done. However, serious, valid objections can also be made for each. Consequently, the real problem is picking an alternative with a good balance of benefits and problems.

In addition, we considered the possibility of having a temporary destroyed after its last use in a block, but that requires flow analysis, and we didn’t feel we could require every compiler to do a flow analysis well enough to ensure that “after the last use in a block” was a well-defined point in the computation in every implementation. Please note that local flow analysis would not be sufficient to provide reliable warning against “too early destruction;” conversion functions returning a pointer to the internals of an object are often defined in a compilation unit different from the ones in which they are used. Trying to ban such functions would be pointless because a ban would break much existing code and couldn’t be enforced anyway.

From about 1991, the committee focused on “end of statement,” and naturally that alternative was colloquially known as EOS. The problem was to decide precisely what EOS should mean. For example:

void h(String s1, String s2)
{
    const char* p;

    if (p = s1+s2) {
        //...
    }
}

Should the value of p be useful within the statement block? That is, does the destruction of the object holding s1+s2 take place at the end of the condition or at the end of the whole if statement? The answer is that the object holding s1+s2 will be destroyed at the end of the condition. It would be absurd to guarantee this:

if (p = s1+s2) printf("%s",p);

while making this

p = s1+s2;
printf("%s",p);

implementation-dependent.

How should branching within an expression be handled? For example, should this be guaranteed to work?:

if ((p = s1+s2) && p[0]) {
    //...
}

The answer is yes. It is much easier to explain this answer than to explain special rules for &&, ||, and ?:. There was some opposition to this, though, because this rule cannot be implemented in general without introducing flags to ensure that temporary objects are destroyed only if they appeared on a branch actually taken. However, the compiler writers on the committee rose to the challenge and demonstrated that the overhead imposed was vanishingly small and basically irrelevant.

Thus, EOS came to mean “end of full expression,” where a full expression is an expression that is not a sub-expression of another expression.

Note that the resolution to destroy temporaries at the end of full expression will break some Cfront code, but it will not break any code guaranteed to work by the ARM. The resolution addresses the desire for a well-defined and easy-to-explain point of destruction. It also satisfies the desire not to have temporaries hanging around for too long. Objects that need to stay around for longer must be named. Alternatively, one can use techniques that don’t require long-lived objects. For example:

void f(String s1, String s2)
{
    printf("%s",s1+s2); // ok

    const char* p = s1+s2;
    printf("%s",p); // won't work, temporary destroyed

    String s3 = s1+s2;
    printf("%s",(const char*)s3); // ok

    cout << s3;    // ok

    cout << s1+s2 ; // ok
}

6.4 Extensions

A critical issue was – and is – how to handle the constant stream of proposals for language changes and extensions. The focus of that effort is the extensions working group of which I’m chairman. It is much easier to accept a proposal than to reject it. You win friends that way, and people praise the language for having so many “neat features.” However, a language made as a shopping list of features without coherence will die, so there is no way we could accept even most of the features that would be of genuine help to some section of the C++ community.

At the Lund (Sweden) meeting this cautionary tale became popular:

“We often remind ourselves of the good ship Vasa. It was to be the pride of the Swedish navy and was built to be the biggest and most beautiful battleship ever. Unfortunately, to accommodate enough statues and guns, it underwent major redesigns and extension during construction. The result was that it only made it halfway across Stockholm harbor before a gust of wind blew it over, and it sank killing about 50 people. It has been raised and you can now see it in a museum in Stockholm. It is a beauty to behold – far more beautiful at the time than its unextended first design and far more beautiful today than if it had suffered the usual fate of a 17th century battleship – but that is no consolation to its designer, builders, and intended users [Stroustrup, 1992b].”

But why consider extensions at all? After all, X3J16 is a standards group, not a language design group chartered to design “C++++.” Worse, a group of more than 250 people with its members changing over time isn’t a promising forum for language design.

First of all, the group was mandated to deal with templates and exception handling. Even before the committee had time to work on those, suggestions for extensions and even for incompatible changes were being sent to committee members. The user community, even most users who didn’t personally submit proposals, clearly expected the committee to consider these suggestions. If the committee takes such suggestions seriously, as it does, it provides a focus for discussion of C++’s future. If it does not, the activity will simply go elsewhere and incompatible extensions will appear.

Also, despite paying lip service to minimalism and stability, many people like new features. Language design is intrinsically interesting, the debates about new features are stimulating, and they provide a good excuse for new articles and releases. Some features might even help programmers, but for many that seems to be a secondary motivation. If ignored, these factors can disrupt progress. I prefer them to have a constructive outlet.

Thus, the committee has a choice between discussing extensions, discussing dialects after they have come into use, and ignoring reality. Every one of these alternatives has been chosen by various standards committees over the years. Most – including the Ada, C, Cobol, Fortran, Modula-2, and Pascal-2 committees – have chosen to consider extensions.

My personal opinion is that extension activity of various sorts is inevitable, and it is better to have it out in the open and conducted in a semi-civilized manner in a public forum under somewhat formal rules. The alternative is a scramble to get ideas accepted through the mechanism of attracting users in the marketplace. That mechanism isn’t conducive to calm deliberation, open discussion, and attempts to serve all users. The result would be the language fracturing into dialects.

I consider the obvious dangers inherent in dealing with extensions preferable to the certain chaos that would result from not dealing with them. A slowly eroding majority of the committee has agreed, and we are approaching the point where extensions work as conducted until now must cease because standards documents will start appearing, and all activity must be directed towards responding to comments on those.

Only time will tell where the energy thus left without an outlet will go to. Some will go to other languages, some will go into experimental work, some will go into library building (the traditional C++ alternative to language changes). It is interesting to note that standards groups, like all other organizations, find it hard to disband themselves. Often, a standards group reconstitutes itself as a forum for revisions or as the bureaucratic mechanism for the creation of a next-level standard, that is, as a design committee for a new language or dialect. The Algol, Fortran, and Pascal committees, and even the ANSI C committee, provide examples of this phenomenon. Usually, the redirection of effort from standardizing an established language to the design of a would-be successor is accompanied by a major change in personnel and also of ideals.

In the meantime, I try to guard against the dangers of design by committee by spending significant time on every proposed extension. This strategy isn’t foolproof, but it does provide a degree of protection against the acceptance of mutually inconsistent features and against the loss of a coherent view of the language.

The danger of design by committee is the danger of losing a coherent view of what the language is and ought to evolve into in favor of political deals over individual features and resolutions.

A committee can easily fall into the trap of approving a feature just because someone insists that it is essential. It is always easier to argue for a feature than to argue that the advantage of the feature – which will be very plausible in all interesting cases – is outweighed by nebulous concerns of coherence, simplicity, stability, difficulties of transition, etc. Also, the way language committees work does not seem to lend itself well to arguments based on experimentation and experience-based reasoning. I’m not quite sure why this is, but maybe the committee format and resolution by voting favor arguments that are more easily digested by exhausted members. It also appears that logical arguments (and sometimes even illogical arguments) are more persuasive than reports on other people’s experience and experiments.

Thus, “standardization” can become a force for instability. The results of such instability can be a change for the better, but there is always the danger that it might become random change or change for the worse. To avoid this, standardization has to be done at the right stage of a language’s evolution: after its path of evolution has been clearly outlined and before divergent dialects supported by powerful commercial interests have emerged. I hope this is the case for C++, and that the committee will continue to show the necessary restraint in innovation.

It is worth remembering that people will manage even without extensions. Proponents of language features tend to forget that it is quite feasible to build good software without fancy language support. No individual language feature is necessary for good software design, not even the ones we would hate to be without. Good software can be and often is written in C or in a small subset of C++. The benefits of language features are the convenience of expressing ideas, the time needed to get a program right, the clarity of the resulting code, and the maintainability of the resulting code. It is not an absolute either/or. More good code has been written in languages denounced as “bad” than in languages proclaimed “wonderful” – much more.

6.4.1 Criteria

To help people understand what was involved in proposing an extension or a change to C++, the extensions working group formulated a set of questions that is likely to be asked about every proposed feature [Stroustrup, 1992b]:

"The list presents criteria that have been used to evaluate features for C++.

[1] Is it precise? (Can we understand what you are suggesting?) Make a clear, precise statement of the change as it affects the current draft of the language reference standard.

[a] What changes to the grammar are needed?

[b] What changes to the description of the language semantics are needed?

[c] Does it fit with the rest of the language?

[2] What is the rationale for the extension? (Why do you want it, and why would we also want it?)

[a] Why is the extension needed?

[b] Who is the audience for the change?

[c] Is this a general-purpose change?

[d] Does it affect one group of C++ language users more than others?

[e] Is it implementable on all reasonable hardware and systems?

[f] Is it useful on all reasonable hardware and systems?

[g] What kinds of programming and design styles does it support?

[h] What kinds of programming and design styles does it prevent?

[i] What other languages (if any) provide such features?

[j] Does it ease the design, implementation, or use of libraries?

[3] Has it been implemented? (If so, has it been implemented in the exact form that you are suggesting; and if not, why can you assume that experience from “similar” implementations or other languages will carry over to the feature as proposed?)

[a] What effect does it have on a C++ implementation?

[x] on compiler organization?

[y] on run-time support?

[b] Was the implementation complete?

[c] Was the implementation used by anyone other than the implementer(s)?

[4] What difference does the feature have on code?

[a] What does the code look like without the change?

[b] What is the effect of not doing the change?

[c] Does use of the new feature lead to demands for new support tools?

[5] What impact does the change have on efficiency and compatibility with C and existing C++?

[a] How does the change affect run-time efficiency?

[x] of code that uses the new feature?

[y] of code that does not use the new feature?

[b] How does the change affect compile and link times?

[c] Does the change affect existing programs?

[x] Must C++ code that does not use the feature be recompiled?

[y] Does the change affect linkage to languages such as C and Fortran?

[d] Does the change affect the degree of static or dynamic checking possible for C++ programs?

[6] How easy is the change to document and teach?

[a] to novices?

[b] to experts?

[7] What reasons could there be for not making the extension? There will be counter-arguments and part of our job is to find and evaluate them, so you can just as well save time by presenting a discussion.

[a] Does it affect old code that does not use the construct?

[b] Is it hard to learn?

[c] Does it lead to demands for further extensions?

[d] Does it lead to larger compilers?

[e] Does it require extensive run-time support?

[8] Are there

[a] Alternative ways of providing a feature to serve the need?

[b] Alternative ways of using the syntax suggested?

[c] Attractive generalizations of the suggested scheme?

Naturally, this list is not exhaustive. Please expand it to cover points relevant to your specific proposal and leave out points that are irrelevant.”

These questions are of course a collection of the kinds of questions practical language designers have always asked.

6.4.2 Status

So how is the committee doing? We won’t really know until the standard appears because there is no way of knowing how new proposals will fare. This summary is based on the state of affairs after the November 1994 meeting in Valley Forge. Proposing extensions for C++ seems to be popular. For example:

– Extended (international) character sets (§6.5.3.2)

– Various template extensions (§15.3.1, §15.4, §15.8.2)

– Garbage collection (§10.7)

– NCEG proposals (for example, §6.5.2)

– Discriminated unions

– User-defined operators (§11.6.2)

– Evolvable/indirect classes

– Enumerations with predefined ++, <<, etc., operators

– Overloading based on return type

– Composite Operators (§11.6.3)

– Keyword for the null pointer (NULL, nil, etc.) (§11.2.3)

– Pre- and post-conditions

– Improvements to the Cpp macros

– Rebinding of references

– Continuations

– Currying.

There is some hope of restraint and that accepted features will be properly integrated into the language. Only a few new features have been accepted so far:

– Exception handling (“mandated”) (§16)

– Templates (“mandated”) (§15)

– European character set representation of C++ (§6.5.3.1)

– Relaxing rule for return types for overriding functions (§13.7)

– Run-time type identification (§14.2)

– Declarations in conditions (§3.11.5.2)

– Overloading based on enumerations (§11.7.1)

– User-defined allocation and deallocation operators for arrays (§10.3)

– Forward declaration of nested classes (§13.5)

– Namespaces (§17)

– Mutable (§13.3.3)

– Boolean type (§11.7.2)

– A new syntax for type conversion (§14.3)

– An explicit template instantiation operator (§15.10.4)

– Explicit template arguments in template function calls (§15.6.2)

– Member templates (§15.9.3)

– Class templates as template arguments (§15.3.1)

– A const static member of integral type can be initialized by a constant-expression within a class declaration

– Explicit constructors (§3.6.1)

– Static checking of exception specifications (§16.9).

Exceptions and templates stand out among the extensions as being mandated by the original proposal and described in the ARM, and also by being a couple of orders of magnitude more difficult to define and to implement than any of the other proposals.

In contrast, the committee has rejected many proposals. For example:

– Several proposals for direct support for concurrency

– Renaming of inherited names (§12.8)

– Keyword arguments (§6.5.1)

– Several proposals for slight modifications of the data hiding rules

– Restricted pointers (“son of noalias”) (§6.5.2)

– Exponentiation operator (§11.5.2)

– Automatically generated composite operators

– User-defined operator .() (§11.5.2)

– Nested functions

– Binary literals

– General initialization of members within a class declaration.

Please note that a rejection doesn’t imply that the proposal was deemed bad or even useless. In fact, most proposals that reach the committee are technically sound and would help at least some subset of the C++ user community. The reason is that most weak ideas never survive the initial scrutiny and the effort needed to turn them into a written proposal.

6.4.3 Problems with Good Extensions

Even good extensions cause problems. Assume for a moment that we have an extension everybody likes so that no time is wasted discussing its validity. It will still divert implementer efforts from tasks that some people will consider more important. For example, an implementer may have a choice of implementing the new feature or implementing an optimization in the code generator. Often, the feature will win out because it is more visible to users.

An extension can be perfect when viewed in isolation, yet flawed from a wider perspective. Most work on an extension focuses on its integration into the language and its interactions with other language features. The difficulty of this kind of work and the time needed to do it well is invariably underestimated.

Any new feature makes existing implementations outdated. They don’t handle the new feature. Thus, users will have to upgrade, live without the feature for a while, or manage two versions of a system (one for the latest implementations and one for the old one). This last option is typically the one library and tool builders must choose.

Teaching material will have to be updated to reflect the new feature – and maybe simultaneously reflect how the language used to be for the benefit of users that haven’t yet upgraded.

These are the negative effects of a “perfect” extension. If a proposed extension is controversial, it will in addition soak up effort from the committee members and from the community at large. If the extension has incompatible aspects, these may have to be addressed when upgrading from an older implementation to a new one – sometimes even when the new feature isn’t used. The classical example is the introduction of a new keyword. For example, this innocent looking function

void using(Table* namespace) { /* ... */ }

ceased to be legal when namespaces were introduced because using and namespace are new keywords. In my experience, though, the introduction of new keywords creates few technical problems, and those are easily fixed. Proposing a new keyword, on the other hand, never fails to cause a howl of outrage. The practical problems with new keywords can be minimized by choosing names that aren’t too likely to clash with existing identifiers. For this reason, using was preferred to use, and namespace was chosen over scope. When, as an experiment, we introduced using and namespace into a local implementation without any announcement, nobody actually noticed their presence for two months.

In addition to the very real problems of getting a new feature accepted and into use, the mere discussion of extensions can have negative effects by creating an impression of instability in the minds of some users. Many users and would-be users do not understand that changes are carefully screened to minimize effects on existing code. Idealistic proponents of new features often find the constraints of stability and compatibility with both C and existing C++ hard to accept and rarely do much to allay fears of instability. Also, enthusiastic proponents of “improvements” tend to overstate the weaknesses of the language to make their extensions look more attractive.

6.4.4 Coherence

I see the main challenge of extension proposals as maintaining the coherence of C++ and communicating a view of this coherence to the user community. Features accepted into C++ must work in combination, must support each other, must compensate for serious real problems in C++ as it stood without them, must fit syntactically and semantically into the language, and must support a manageable style of programming. A programming language cannot be just a set of neat features, and the primary effort involved in evaluating and developing extensions is to refine them so that they become an integral part of the language. For an extension that I consider seriously, I estimate that about 95% of my personal effort goes into finding a form of the original idea/proposal that can be smoothly integrated into C++. Typically, much of this effort involves working out a clear transition path for implementers and users. Even the best new feature must be rejected if there is no way users can adopt it without throwing away most of their old code and old tools. See Chapter 4 for a more extensive discussion of acceptance criteria.

6.5 Examples of Proposed Extensions

Generally in this book, I discuss a proposed language feature in the context of related features. A few, however, don’t seem to fit anywhere, so I use them as examples here. Not surprisingly, the features that don’t naturally fit anywhere have a tendency to get rejected. A feature, however reasonable when considered in isolation, should be considered with great suspicion unless it can be seen as part of a general effort to evolve the language in some definite direction.

6.5.1 Keyword Arguments

Roland Hartinger’s proposal for keyword arguments, that is, for a mechanism for specifying function arguments by name in a call, was close to technically perfect. The reason the proposal was withdrawn rather than accepted is therefore particularly interesting. It was withdrawn because the extensions group reached a consensus that the proposal was close to redundant, would cause compatibility problems with existing C++ code, and would encourage programming styles that ought not to be encouraged. The discussion here reflects the discussions in the extensions working group. As usual, hundreds of relevant remarks must remain unmentioned for lack of space.

Consider an ugly, but unfortunately not unrealistic, example borrowed from an analysis paper written by Bruce Eckel:

class window {
    // ...
public:
    window(
        wintype=standard,
        int ul_corner_x=0,
        int ul_corner_y=0,
        int xsize=100,
        int ysize=100,
        color Color=black,
        border Border=single,
        color Border_color=blue,
        WSTATE window_state=open);
    // ...
};

If you want to define a default window, all is well. If you want to define a window that is “almost default,” the specification can get tedious and error-prone. The proposal was simply to introduce a new operator, :=, to be used in calls to specify a value for a named argument. For example:

new window(Color:=green,ysize:=150);

would be equivalent to

new window(standard,0,0,100,150,green);

which, thanks to the default arguments, is equivalent to

new window(standard,0,0,100,150,green,single,blue,open);

This seems to be a useful bit of syntactic sugar that might make programs more readable and more robust. The proposal was implemented to be sure that all conceptual and integration problems were ironed out; no significant or difficult problems were found. In addition, the proposed mechanism was based on experience from other languages, such as Ada.

On the other hand, there is no doubt that we can live without keyword arguments; they do not provide any new fundamental facility, don’t support a significant new programming paradigm, and don’t close a hole in the type system. This leaves questions with answers that depend more on taste and impression of the state of the C++ user community:

[1] Will keyword arguments lead to better code?

[2] Will keyword arguments lead to confusion or teaching problems?

[3] Will keyword arguments cause compatibility problems?

[4] Should keyword arguments be one of the few extensions we can accept?

The first serious problem discovered with the proposal was that keyword arguments would introduce a new form of binding between a calling interface and an implementation:

[1] An argument must have the same name in a function declaration as in the function definition.

[2] Once a keyword argument is used, the name of that argument cannot be changed in the function definition without breaking user code.

Because of the cost of recompilation, many people are worried about any increase in the degree of binding between interfaces and implementations. Worse, this turned out to be a compatibility problem of significant magnitude. Some organizations recommend a style with “long, informative” argument names in header files, and “short, convenient” names in the definitions. For example:

void reverse(int* elements, int length_of_element_array);

// ...

void reverse(int* v, int n)
{
    // ...
}

Naturally, some people find that style abhorrent, whereas others (including me) find it quite reasonable. Apparently, significant amounts of such code exist. Further, an implication of keyword arguments would be that no name in a commonly distributed header file could be changed without risking breaking code. Different suppliers of header files for common services (for example, Posix or X) would also have to agree on argument names. This could easily become a bureaucratic nightmare.

Alternatively, the language could simply not require a declaration and a definition to use the same name for the same argument. That seemed viable to me. However, people didn’t seem to like that variant either.

There could be a noticeable impact on link times if the rule that argument names must match across compilation units is checked. If it isn’t checked, the facility would not be type safe and could become a source of subtle errors.

Both the potential linking cost and the very real binding problem could be easily avoided by omitting argument names in header files. A cautious user might therefore avoid specifying argument names in header files. Thus, to quote Bill Gibbons, “The net impact on readability of C++ might actually be negative.”

My main worry about keyword arguments was actually that keyword arguments might slow the gradual transition from traditional programming techniques to data abstraction and object-oriented programming in C++. In code that I find best written and easiest to maintain, long argument lists are very rare. In fact, it is a common observation that a transition to a more object-oriented style leads to a significant decrease in the length of argument lists; what used to be arguments or global values become local state. Based on experience, I expect the average number of arguments to drop to less than two and that functions with more than two arguments will become rare. This implies that keyword arguments would be most useful in code we deemed poorly written. Would it be sensible to introduce a new feature that primarily supported programming styles that we would prefer to see decline? The consensus, based on this argument, the compatibility issues, and a few minor details, was no.

6.5.1.1 Alternatives to Keyword Arguments

Given that we don’t have keyword arguments, how would I reduce the length of the argument list in the window example to something convenient? First of all, the apparent complexity is already reduced by the default arguments. Adding extra types to represent common variants is another common technique:

class colored_window : public window {
public:
    colored_window(color c=black)
        :window(standard,0,0,100,100,c) { }
};


class bordered_window : public window {
public:
    bordered_window(border b=single, color bc=blue)
         :window(standard,0,0,100,100,black,b,bc) { }
};

This technique has the advantage of channeling usage into a few common forms and can therefore be used to make code and behavior more regular. Another technique is to provide explicit operations for changing settings from the defaults:

class w_args {
    wintype wt;
    int ulcx, ulcy, xz, yz;
    color wc, bc;
    border b;
    WSTATE ws;
public:
    w_args()   // set defaults
        : wt(standard), ulcx(0), ulcy(0), xz(100), yz(100),
          wc(black), b(single), bc(blue), ws(open) { }
    // override defaults:
    w_args& ysize(int s) { yz=s; return *this; }
    w_args& Color(color c) { wc=c; return *this; }
    w_args& Border(border bb) { b=bb; return *this; }
    w_args& Border_color(color c) { bc=c; return *this; }
    // ...
};

class window {
    // ...
public:
    window(w_args wa); // set options from wa
    // ...
};

From this, we get a notational convenience that is roughly equivalent to what keyword arguments provide:

window w1; // default window
window w2(w_args().Color(green).ysize(150));

This technique has the significant advantage that it becomes easy to pass objects representing arguments around in a program.

Naturally, these techniques can be used in combination. The net effect of such techniques is to shorten argument lists and thereby decrease the need for keyword arguments.

A further reduction in the number of arguments could be obtained by using a Point type rather than expressing interfaces directly in terms of coordinates.

6.5.2 Restricted Pointers

A Fortran compiler is allowed to assume that if a function is given two arrays as arguments, then those arrays don’t overlap. A C++ function is not allowed to assume that. The result is an advantage in speed for the Fortran routine of between 15% and 30 times, depending on the quality of the compiler and the machine architecture. The spectacular savings come from vectorizing operations for machines with special vector hardware such as Crays.

Given C’s emphasis on efficiency, this was considered an affront and the ANSI C committee proposed to solve the problem by a mechanism called noalias to specify that a C pointer should be considered alias-free. Unfortunately, the proposal was late and so half-baked that it provoked Dennis Ritchie to his only intervention in the C standards process. He wrote a public letter stating, “noalias must go; this is non-negotiable.”

After that, the C and C++ community was understandably reluctant to tackle aliasing problems, but the issue is of key importance to C users on Crays so Mike Holly from Cray grasped the nettle and presented an improved anti-aliasing proposal to the Numerical C Extensions Group (NCEG) and to the C++ committee. The idea was to allow a programmer to state that a pointer should be considered alias-free by declaring it restricted. For example:

void* memcopy(void* restrict s1, const void* s2, size_t n);

Since s1 is specified to have no alias, there is no need to declare s2 restricted, also. The keyword restrict would syntactically apply to * in the same way that const and volatile do. This proposal would solve the C/Fortran efficiency discrepancy by selectively adopting the Fortran rule.

The C++ committee was naturally sympathetic to any proposal that improves efficiency and discussed the proposal at some length, but finally decided to reject it with hardly a dissenting voice. The key reasons for the rejection were:

[1] The extension is not safe. Declaring a pointer restricted allows the compiler to assume that the pointer has no aliases. However, a user wouldn’t necessarily be aware of this, and the compiler can’t ensure it. Because of the extensive use of pointers and references in C++, more errors are likely to arise from this source than Fortran experience might suggest.

[2] Alternatives to the extension have not been sufficiently explored. In many cases, alternatives such as an initial check for overlap combined with special code for non-overlapping arrays is an option. In other cases, direct calls to specialized math libraries, such as BLAS, can be used to tune vector operations for efficiency. Promising alternatives for optimization have yet to be explored. For example, global optimization of relatively small and stylized vector and matrix operations appears feasible and worthwhile for C++ compilers for high-performance machines.

[3] The extension is architecture-specific. High-performance numerical computation is a specialized field using specialized techniques and often specialized hardware. Because of this, it may be more appropriate to introduce a nonstandard, architecture-specific extension or pragma. Should this kind of optimization prove useful beyond a narrow community using specialized machine architectures, the extension can be reevaluated.

One way of looking at this decision is as a reconfirmation of the idea that C++ supports abstraction through general mechanisms rather than specialized application areas through special-purpose mechanisms. I would certainly like to help the numerical computation community. The question is how? Following closely in Fortran’s footsteps for the classical vector and matrix algorithms may not be the best approach. It would be nice if every kind of numeric software could be written in C++ without loss of efficiency, but unless something can be found that achieves this without compromising the C++ type system it may be preferable to rely on Fortran, assembler, or architecture-specific extensions.

6.5.3 Character Sets

C relies on the American variant of the international 7-bit character set ISO 646-1983 called ASCII (ANSI X3.4-1968). This causes two problems:

[1] ASCII contains punctuation characters and operator symbols, such as ] and {, that are not available in many national character sets.

[2] ASCII doesn’t contain characters, such as Å and æ, used in languages other than English.

6.5.3.1 Restricted Character Sets

The ASCII (ANSI X3.4-1968) special characters [, ], {, }, |, and \ occupy character set positions designated as alphabetic by ISO. In most European national ISO 646 character sets, these positions are occupied by letters not found in the English alphabet. For example, the Danish national character set uses these values for the vowels Æ, Ø, Å, æ, ø, and å. No significant amount of text can be written in Danish without them. This leaves Danish programmers with the unpleasant choice of acquiring computer systems that handle full 8-bit character sets, such as ISO 8859-1, not using three vowels of their native language, or not using C++. Speakers of French, German, Spanish, Italian, etc., face the same alternatives. This has been a notable barrier to the use of C in Europe, especially in commercial settings (such as banking) where the use of 7-bit national character sets is pervasive in many countries.

For example, consider this innocent-looking ANSI C and C++ program:

int main(int argc, char* argv[])
{
    if (argc<2 || *argv[1]=='\0') return 0;
    printf("Hello, %s\n",argv[1]);
}

On a standard Danish terminal or printer this program will appear like this:

int main(int argc, char* argvÆÅ)
æ
    if (argc<2 øø *argvÆ1Å=='Ø0') return 0;
    printf("Hello, %sØn",argvÆ1Å);
å

It is amazing to realize that some people read and write this with ease. I don’t think that is a skill anyone should have to acquire.

The ANSI C committee adopted a partial solution to this problem by defining a set of trigraphs that allows national characters to be expressed:

 #     [     {     \     ]     }     ^     |     ~
??=   ??(   ??<   ??/   ??)   ??>   ??'   ??!   ??-

This can be useful for interchange of programs, but doesn’t make programs readable:

int main(int argc, char* argv??(??))
??<
    if (argc<2 ??!??! *argv??(1??)=='??/0') return 0;
    printf("Hello, %s??/n",argv??(1??));
??>

Naturally, the real solution to this problem is for C and C++ programmers to buy equipment that supports both their native language and the characters needed by C and C++. Unfortunately, this appears to be infeasible for some, and the introduction of new equipment can be a very slow process. To help programmers stuck with such equipment and thereby help C++, the C++ standards committee decided to provide a more readable alternative.

The following keywords and digraphs are provided as equivalents to operators containing national characters:

keywords           digraphs

and       &&       <%      {
and_eq    &=       %>      }
bitand    &        <:      [
bitor     |        :>      ]
compl     ~        %:      #
not       !        %:%:    ##
or        ||
or_eq     |=
xor       ^
xor_eq    ^=
not_eq    !=

I would have preferred %% for # and <> for !=, but %: and not_eq were the best that the C and C++ committees could compromise on.

We can now write the example like this:

int main(int argc, char* argv<::>)
<%
    if (argc<2 or *argv<:1:>=='??/0') return 0;
    printf("Hello, %s??/n",argv<:1:>);
%>

Note that trigraphs are still necessary for putting “missing” characters such as \ into strings and character constants.
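The keywords and digraphs above became part of standard C++ itself (C instead provides the spelled-out keyword forms through the header <iso646.h>), so any conforming compiler accepts them today. A small function, written by me as an illustration, showing that the alternative spellings denote exactly the same operators:

```cpp
#include <cassert>

// The alternative tokens are keywords in C++ (no header needed), so each
// spelled-out form must agree with the punctuation form it replaces.
bool tokens_agree(int a, int b)
{
    bool ok = true;
    ok = ok and ((a bitand b) == (a & b));   // and == &&, bitand == &
    ok = ok and ((a bitor b) == (a | b));    // bitor == |
    ok = ok and ((a xor b) == (a ^ b));      // xor == ^
    ok = ok and (compl a == ~a);             // compl == ~
    ok = ok and (not (a not_eq a));          // not == !, not_eq == !=
    int v<:2:> = <% a, b %>;                 // digraphs for [ ] { }
    return ok and v<:0:> == a and v<:1:> == b;
}
```

Unlike the trigraphs, which are replaced even inside string literals, the keywords and digraphs are ordinary tokens and so leave strings untouched.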

The introduction of the digraphs and the new keywords was most controversial. A large number of people – mostly people with English as their native language and with a strong C background – saw no reason to complicate and corrupt C++ for the benefit of people who were “unwilling to buy decent equipment.” I sympathize with that position because the digraphs and trigraphs are not pretty, and new keywords are always a source of incompatibilities. On the other hand, I have had to work on equipment that didn’t support my native language, and I have seen people drop C as a possible programming language in favor of “a language that doesn’t use funny characters.” In support of this observation, the IBM representative reported that the absence of ! in the EBCDIC character set used on IBM mainframes causes frequent and repeated complaints. I found it interesting to note that even where extended character sets are available, systems administration issues sometimes force their disuse.

My guess is that for a transition period of maybe a decade, the combination of keywords, digraphs, and trigraphs is the least bad solution. My hope is that it will help C++ become accepted in areas that C failed to penetrate, and thus support programmers who have not been represented in the C and C++ culture.

6.5.3.2 Extended Character Sets

Support for a restricted character set representation for C++ is essentially backward-looking. A more interesting and difficult problem is how to support extended character sets; that is, how to take advantage of character sets with more characters than ASCII. There are two distinct problems:

[1] How to support manipulation of extended character sets?

[2] How to allow extended character sets in the source text of a C++ program?

The C standards committee approached the former problem by defining a type wchar_t to represent multi-byte characters. In addition, a multi-byte string type wchar_t [] and printf-family I/O for wchar_t were provided. C++ continues in this direction by making wchar_t a proper type (rather than merely a synonym for another type defined using typedef as it is in C), by providing a standard string of wchar_t class called wstring, and by supporting these types in stream I/O.
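The difference matters in practice: because wchar_t is a distinct built-in type in C++ rather than a typedef, it participates in overload resolution. A short illustration of mine (the standard string of wchar_t mentioned here became the std::wstring specialization of std::basic_string):

```cpp
#include <string>
#include <cstddef>

// In C++ wchar_t is its own type; in C it is a typedef for some integer
// type. Overloading therefore distinguishes it from char:
const char* kind(char)    { return "char"; }
const char* kind(wchar_t) { return "wchar_t"; }

// std::wstring holds wide characters; non-ASCII characters are written
// here as universal-character-names to keep the source file 7-bit clean.
std::size_t wide_length()
{
    std::wstring s = L"\u00e6ble";   // Danish "æble" (apple), four characters
    return s.size();
}
```

In C, `kind(L'a')` could not be distinguished from a call with a plain integer; in C++ the wide-character argument selects the wchar_t overload.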

This supports only a single “wide character” type. If a programmer needs more types, say a Japanese character, a string of Japanese characters, a Hebrew character, or a string of Hebrew characters, there are at least two alternative approaches. One can map these characters into a common character set large enough to hold them all, say, Unicode, and write code that handles that using wchar_t. Alternatively one can define classes for each kind of character and string, say, Jchar, Jstring, Hchar, and Hstring, and have these classes supply the correct behavior for each. Such classes ought to be generated from a common template. My experience is that either approach can work, but that any decision that touches internationalization and multiple character sets becomes controversial and emotional faster than any other kind of problem.
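The second approach, one string class per character kind generated from a common template, can be sketched as follows. The names Jchar, Hchar, and basic_text are illustrative only (Jchar and Jstring are the hypothetical names used in the text; the container and its members are my invention):

```cpp
#include <vector>
#include <cstddef>

struct Jchar { unsigned short code; };   // hypothetical Japanese character
struct Hchar { unsigned short code; };   // hypothetical Hebrew character

// One template supplies uniform string behavior for every character type;
// per-language behavior (collation, etc.) would hang off the character type.
template<class Ch>
class basic_text {
    std::vector<Ch> chars;
public:
    void append(Ch c) { chars.push_back(c); }
    std::size_t size() const { return chars.size(); }
    Ch operator[](std::size_t i) const { return chars[i]; }
};

typedef basic_text<Jchar> Jstring;       // one string class per character kind
typedef basic_text<Hchar> Hstring;
```

The standard library eventually took exactly this shape: std::basic_string<charT>, with std::string and std::wstring as its char and wchar_t specializations.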

The question of if and how to allow extended character sets to be used in C++ program text is no less tricky. Naturally, I would like to use the Danish words for apple, tree, boat, and island in programs dealing with such concepts. Allowing æble, træ, båd, and ø in comments is not difficult, and comments in languages other than English are indeed not uncommon. Allowing extended character sets in identifiers is more problematic. In principle, I’d like to allow identifiers written in Danish, Japanese, and Korean in a C or C++ program. There are no serious technical problems in doing that. In fact, a local C compiler written by Ken Thompson allows all Unicode characters with no special meaning in C in identifiers.

I worry about portability and comprehension, though. The technical portability problem can be handled. However, English has an important role as a common language for programmers, and I suspect that it would be unwise to abandon that without serious consideration. To most programmers, a systematic use of Hebrew, Chinese, Korean, etc., would be a significant barrier to comprehension. Even my native Danish could cause some headaches for the average English-speaking programmer.

The C++ committee hasn’t made any decisions on this issue so far, but I suspect it will have to and that every possible resolution will be controversial.
