Chapter 10. The C++ Preprocessor

The speech of man is like embroidered tapestries, since like them this has to be extended in order to display its patterns, but when it is rolled up it conceals and distorts them.

Themistocles

The first C compilers had no constants or inline functions. While C was still being developed, it soon became apparent that it needed a facility for handling named constants, macros, and include files. The solution was to create a preprocessor that is run on the programs before they are passed to the C compiler. The preprocessor is nothing more than a specialized text editor. Its syntax is completely different from C’s, and it has no understanding of C constructs.

The preprocessor was soon merged into the main C compiler. The C++ compiler kept this preprocessor. On some systems, such as Unix, it is still a separate program, automatically executed by the compiler wrapper cc. Some of the newer compilers, such as Borland-C++ Builder, have the preprocessor built in.

#define Statement

The #define statement can be used to define a constant. For example, the following two lines perform similar functions:

#define SIZE 20       // The array size is 20
const int SIZE = 20;  // The array size is 20

Actually the line #define SIZE 20 acts as a command to the preprocessor to globally change SIZE to 20. This takes the drudgery and guesswork out of making changes.

All preprocessor commands begin with a hash mark (#) as the first character of the line. (You can put whitespace before the #, but this is rarely done.) C++ is free-format. Language elements can be placed anywhere on a line, and the end-of-line is treated just like a space. The preprocessor is not free-format. It depends on the hash mark (#) being the first character on the line. As you will see, the preprocessor knows nothing about C++ and can be (and is) used to edit things other than C++ programs.

Warning

The preprocessor is not part of the core C++ compiler. It uses an entirely different syntax and requires an entirely different mindset to use it well. Most problems you will see occur when the preprocessor is treated like C++.

Preprocessor directives terminate at the end of the line. In C++ a semicolon (;) ends a statement. The preprocessor directives do not end in a semicolon, and putting one in can lead to unexpected results. A preprocessor directive can be continued by putting a backslash (\) at the end of the line. The simplest use of the preprocessor is to define a replacement macro. For example, the command:

#define FOO bar

causes the preprocessor to replace the word “FOO” with the word “bar” everywhere “FOO” occurs. It is common programming practice to use all uppercase letters for macro names. This makes it very easy to tell the difference between a variable (all lowercase) and a macro (all uppercase).

The general form of a simple #define statement is:

#define Name Substitute-Text 

Name can be any valid C++ identifier. Substitute-Text can be anything as long as it fits on a single line. The Substitute-Text can include spaces, operators, and other characters.

Consider the following definition:

#define FOR_ALL for (i = 0; i < ARRAY_SIZE; ++i)

It is possible to use it like this:

/* 
 * Clear the array 
 */ 
FOR_ALL { 
    data[i] = 0; 
}

It is considered bad programming practice to define macros in this manner. Doing so tends to obscure the basic control flow of the program. In this example, if the programmer wants to know what the loop does, he must search the beginning of the program for the definition of FOR_ALL.

It is even worse to define macros that do large-scale replacement of basic C++ programming constructs. For example, you can define the following:

#define BEGIN { 
#define END } 

. . . 

    if (index == 0)  
    BEGIN 
        std::cout << "Starting\n"; 
    END

The problem is that you are no longer programming in C++, but in a half-C++, half-Pascal mongrel.

The preprocessor can cause unexpected problems because it does not check for correct C++ syntax. For example, Example 10-1 generates an error on line 11.

Example 10-1. big/big.cpp
 1:#define BIG_NUMBER 10 ** 10
 2:
 3:int main(  )
 4:{
 5:    // index for our calculations 
 6:   int   index;                
 7:
 8:    index = 0;
 9:
10:    // syntax error on next line 
11:   while (index < BIG_NUMBER) {
12:        index = index * 8;
13:    }
14:   return (0);
15:}

The problem is in the #define statement on line 1, but the error message points to line 11. The definition in line 1 causes the preprocessor to expand line 11 to look like:

while (index < 10 ** 10)

Because ** is an illegal operator, this generates a syntax error.

Question 10-1: The following program generates the answer 47 instead of the expected answer 144. Why? (Hint below.)

Example 10-2. first/first.cpp
#include <iostream>

#define FIRST_PART      7 
#define LAST_PART       5 
#define ALL_PARTS       FIRST_PART + LAST_PART 

int main(  ) { 
    std::cout << "The square of all the parts is " << 
        ALL_PARTS * ALL_PARTS << '\n';
    return (0);
}

Hint:

CC -E prog.cc

sends the output of the preprocessor to the standard output.

In MS-DOS/Windows, the command:

cpp prog.cpp

creates a file called prog.i containing the output of the preprocessor.

Running the program for Example 10-2 through the preprocessor gives you the code shown in Example 10-3.

Example 10-3. first/first-ed.out
# 1 "first.cpp"
# 1 "/usr/local/lib/g++-include/iostream" 1 3
 
// About 900 lines of #include stuff omitted

inline ios& oct(ios& i)
{ i.setf(ios::oct, ios::dec|ios::hex|ios::oct); return i; }

# 1 "first.cpp" 2

int main(  ) { 
    std::cout << "The square of all the parts is " << 
                   7 + 5 * 7 + 5 << '\n';
    return (0);
}

Tip

The output of the C++ preprocessor contains a lot of information, most of which can easily be ignored. In this case, you need to scan the output until you reach the std::cout line. Examining this line will give you an idea of what caused the error.

Question 10-2: Example 10-4 generates a warning that counter is used before it is set. This is a surprise because the for loop should set it. You also get a very strange warning, “null effect,” for line 11. What’s going on?

Example 10-4. max/max.cpp
// warning, spacing is VERY important 

#include <iostream>

#define MAX =10

int main(  )
{
    int  counter;

    for (counter =MAX; counter > 0; --counter)
        std::cout << "Hi there\n";

    return (0);
}

Hint: Take a look at the preprocessor output.

Tip

Some preprocessors, such as the one that comes with the g++ compiler, add spaces around the tokens, which makes this program fail with a syntax error instead of compiling and generating strange code.

Question 10-3: Example 10-5 computes the wrong value for size. Why?

Example 10-5. size/size.cpp
#include <iostream>

#define SIZE    10;
#define FUDGE   SIZE -2;

int main(  )
{
    int size;// size to really use 
    
    size = FUDGE;
    std::cout << "Size is " << size << '\n';
    return (0);
}

Question 10-4: Example 10-6 is supposed to print the message Fatal Error: Abort and exit when it receives bad data. But when it gets good data, it exits. Why?

Example 10-6. die/die.cpp
#include <iostream>
#include <cstdlib>

#define DIE \
    std::cerr << "Fatal Error:Abort\n";exit(8); 

int main(  ) {     
    // a random value for testing 
    int value;  
    
    value = 1; 
    if (value < 0) 
        DIE; 

    std::cerr << "We did not die\n";
    return (0);
}

#define Versus const

The const keyword is relatively new. Before const, #define was the only way to define constants, so most older code uses #define directives. However, the use of const is preferred over #define for several reasons. First, C++ checks the syntax of const statements immediately. The #define directive is not checked until the macro is used. Also, const uses C++ syntax, while #define has a syntax all its own. Finally, const follows normal C++ scope rules, whereas constants defined by a #define directive continue on forever.

In most cases a const statement is preferred over #define. Here are two ways of defining the same constant:

#define MAX 10 // Define a value using the pre-processor
               // (This can easily cause problems)

const int MAX = 10; // Define a C++ constant integer
                    // (Safer)

The #define directive is limited to defining simple constants. The const statement can define almost any type of C++ constant, including things such as structures and classes. For example:

struct box {
    int width, height;   // Dimensions of the box in pixels
};

// Size of a pink box to be used for input
const box pink_box = {1, 4};

The #define directive is, however, essential for things such as conditional compilation and other specialized uses.

Conditional Compilation

One problem programmers have is writing code that can work on many different machines. In theory, C++ code is portable; in practice, many machines have little quirks that must be accounted for. For example, this book covers Unix, MS-DOS, and Windows compilers. Although they are almost the same, there are some differences.

Through the use of conditional compilation, the preprocessor allows you great flexibility in changing the way code is generated. Suppose you want to put debugging code in the program while you are working on it and then remove the debugging code in the production version. You could do this by including the code in an #ifdef - #endif section, like this:

#ifdef DEBUG 
    std::cout << "In compute_hash, value " << value << " hash " << hash << "\n"; 
#endif /* DEBUG */

Tip

You do not have to put the /* DEBUG */ after the #endif, but it is very useful as a comment.

If the beginning of the program contains the following directive, the std::cout is included:

#define DEBUG       /* Turn debugging on */

If the program contains the following directive, the std::cout is omitted:

#undef DEBUG        /* Turn debugging off */

Strictly speaking, the #undef DEBUG is unnecessary. If there is no #define DEBUG statement, DEBUG is undefined. The #undef DEBUG statement is used to indicate explicitly to anyone reading the code that DEBUG is used for conditional compilation and is now turned off.

The directive #ifndef causes the code to be compiled if the symbol is not defined:

#ifndef STACK_SIZE /* Is stack size defined? */
#define STACK_SIZE 100 /* It's not defined, so define it here */
#endif /* STACK_SIZE */

#else reverses the sense of the conditional. For example:

#ifdef DEBUG 
    std::cout << "Test version. Debugging is on\n"; 
#else /* DEBUG */
    std::cout << "Production version\n"; 
#endif /* DEBUG  */

A programmer may wish to temporarily remove a section of code. A common method of doing this is to comment out the code by enclosing it in /* */. This can cause problems, as shown by the following example:

/***** Comment out this section 
    section_report(  ); 
    /* Handle the end-of-section stuff */ 
    dump_table(  ); 
**** End of commented out section */

This generates a syntax error for the fifth line. Why? Because the */ on the third line ends the comment that started on the first line, and the fifth line:

**** End of commented out section */

is not a legal C++ statement.

A better method is to use the #ifdef construct to remove the code.

#ifdef UNDEF 
    section_report(  ); 
    /* Handle the end-of-section stuff */ 
    dump_table(  ); 
#endif /* UNDEF */

(Of course the code will be included if anyone defines the symbol UNDEF; however, anyone who does so should be shot.)

The compiler switch -Dsymbol allows symbols to be defined on the command line. For example, the command:

CC -DDEBUG -g -o prog prog.cc

compiles the program prog.cc and includes all the code in #ifdef DEBUG/#endif /* DEBUG */ pairs, even though there is no #define DEBUG in the program. The Borland-C++ equivalent is:

bcc32 -DDEBUG -g -N -eprog.exe prog.c

The general form of the option is -Dsymbol or -Dsymbol=value. For example, the following sets MAX to 10:

CC -DMAX=10 -o prog prog.c

Most C++ compilers automatically define some system-dependent symbols. For example, Borland-C++ defines the symbol __BORLANDC__, and Windows-based compilers define __WIN32. The ANSI standard C compiler defines the symbol __STDC__. C++ compilers define the symbol __cplusplus. Most Unix compilers define a name for the system (e.g., Sun, VAX, Linux, etc.); however, they are rarely documented. The symbol unix is always defined for all Unix machines.

Note

Command-line options specify the initial value of a symbol only. Any #define and #undef directives in the program can change the symbol’s value. For example, the directive #undef DEBUG results in DEBUG being undefined whether or not you use -DDEBUG .

#include Files

The #include directive allows the program to use source code from another file.

For example, you have been using the following directive in your programs:

#include <iostream>

This tells the preprocessor to take the file iostream and insert it in the current program. Files that are included in other programs are called header files. (Most #include directives come at the head of a program.) The angle brackets indicate that the file is a standard header file. In Unix, these files are usually located in /usr/include. In MS-DOS/Windows, they are located in an installation-dependent directory.

Standard include files are used for defining data structures and macros used by library routines. For example, std::cout is a standard object that (as you know by now) prints data on the standard output. The std::ostream class definition used by std::cout and its related routines is defined in iostream.[1]

Sometimes you may want to write your own set of include files. Local include files are particularly useful for storing constants and data structures when a program spans several files, which can be helpful for information sharing when a team of programmers is working on a single project. (See Chapter 23.)

Local include files may be specified by using double quotation marks (") around the filename.

#include "defs.h"

The filename ("defs.h") can be any valid filename. By convention, local C++ headers end in .h. The file specified by the #include can be a simple file, "defs.h"; a relative path, "../../data.h"; or an absolute path, "/root/include/const.h". (In MS-DOS/Windows, you should use backslash (\) instead of slash (/) as a directory separator. For some reason though, you can still use slash (/) and things work.)

Include files may be nested, but this can cause problems. Suppose you define several useful constants in the file const.h. If the files data.h and io.h both include const.h, and you put the following in your program:

#include "data.h" 
#include "io.h"

you generate errors because the preprocessor sets the definitions in const.h twice. Defining a constant twice is not a fatal error; however, defining a data structure or union twice is an error and must be avoided.

One way around this problem is to have const.h check to see whether it has already been included and not define any symbols that have already been defined.

Look at the following code:

#ifndef _CONST_H_INCLUDED_ 

/* Define constants */ 

#define _CONST_H_INCLUDED_ 
#endif  /* _CONST_H_INCLUDED_ */

When const.h is included, it defines the symbol _CONST_H_INCLUDED_. If that symbol is already defined (because the file was included earlier), the #ifndef conditional hides all the other defines so they don’t cause trouble.

Tip

It is possible to put code in a header file, but this is considered poor programming practice. By convention, code goes in .cpp files and definitions, declarations, macros, and inline functions go in the .h files. You could include a .cpp file in another .cpp file, but this is considered bad practice.

Parameterized Macros

So far we have discussed only simple #defines or macros. Macros can take parameters. The following macro computes the square of a number:

#define SQR(x)  ((x) * (x))     /* Square a number */

Tip

There can be no space between the macro name (SQR in this example) and the open parenthesis.

When used, the macro replaces x with the text of its argument. SQR(5) expands to ((5) * (5)). It is a good rule always to put parentheses around the parameters of a macro. Example 10-7 illustrates the problems that can occur if this rule is not followed:

Example 10-7. sqr/sqr.cpp
#include <iostream>

#define SQR(x) (x * x)

int main(  )
{
    int counter;    // counter for loop

    for (counter = 0; counter < 5; ++counter) {
        std::cout << "x " << (counter+1) << 
                " x squared " << SQR(counter+1) << '\n';
    }
    return (0);
}

Question 10-5: What does the above program output? (Try running it on your machine.) Why did it output what it did? (Try checking the output of the preprocessor.)

The keep-it-simple system of programming prevents us from using the increment (++) and decrement (--) operators except on a line by themselves. When used in an expression, they cause side effects, and this can lead to unexpected results, as illustrated in Example 10-8.

Example 10-8. sqr-i/sqr-i.cpp
#include <iostream>

#define SQR(x) ((x) * (x))

int main(  )
{
    int counter;    /* counter for loop */

    counter = 0;
    while (counter < 5)
        std::cout << "x " << (counter+1) << 
                " x squared " << SQR(++counter) << '\n';
    return (0);
}

Why does this not produce the expected output? How much does the counter go up each time?

In the program shown in Example 10-8, SQR(++counter) expands to ((++counter) * (++counter)). The result is that counter goes up by 2 each time through the loop. Because counter is modified twice within a single expression, C++ does not define the value of that expression, so the actual result is system-dependent.

Question 10-6: Example 10-9 tells us we have an undefined variable, but our only variable name is counter. Why?

Example 10-9. rec/rec.cpp
#include <iostream>

#define RECIPROCAL (number) (1.0 / (number))

int main(  )
{
    float   counter;

    for (counter = 0.0; counter < 10.0; 
         counter += 1.0) {

        std::cout << "1/" << counter << " = " << 
                  RECIPROCAL(counter) << "\n"; 
    }
    return (0);
}

The # Operator

The # operator is used inside a parameterized macro to turn an argument into a string. For example:

#define STR(data) #data
STR(hello)

This code generates:

"hello"

For a more extensive example of how to use this operator, see Chapter 27.

Parameterized Macros Versus Inline Functions

In most cases, to avoid most of the traps caused by parameterized macros, it is better to use inline functions. But there are cases where a parameterized macro may be better than an inline function. For example, the SQR macro works for both float and int data types. We’d have to write two inline functions to perform the same functions, or we could use a template function. (See Chapter 24.)

#define SQR(x) ((x) * (x))  // A parameterized macro
// Works, but is dangerous

// Inline function to do the same thing
inline int sqr(const int x) {
    return (x * x);
}

Advanced Features

This book does not cover the complete list of C++ preprocessor directives. Among the more advanced features are an advanced form of the #if directive for conditional compilations and the #pragma directive for inserting compiler-dependent commands into a file. See your C++ reference manual for more information on these features.

Summary

The C++ preprocessor is a very useful part of the C++ language. However, it has a completely different look and feel from C++ and must be treated apart from the main C++ compiler.

Problems in macro definitions often do not show up where the macro is defined, but result in errors much further down in the program. By following a few simple rules, you can decrease the chances of having problems:

  • Put parentheses around everything. In particular they should enclose #define constants and macro parameters.

  • When defining a macro with more than one statement, enclose the code in { }.

  • The preprocessor is not C++. Don’t use = or ;.

    #define X = 5 // Illegal
    #define X 5;  // Illegal
    #define X = 5; // Very illegal
    #define X 5   // Correct

Finally, if you got this far, be glad that the worst is over.

Programming Exercises

Note that the solutions to all the exercises below can be obtained using standard C++ syntax such as inline and enum. In general, using C++ construction is preferred over using macro definitions. However since this is the chapter on the preprocessor, macros should be used for these exercises.

Exercise 10-1: Create a set of macros to define a type called RETURN_STATUS and the following values: RETURN_SUCCESS, RETURN_WARNING, and RETURN_ERROR. Define a macro, CHECK_RETURN_FATAL, that takes a RETURN_STATUS as its argument and returns true if you have a fatal error.

Exercise 10-2: Write a macro that returns true if its parameter is divisible by 10 and false otherwise.

Exercise 10-3: Write a macro is_digit that returns true if its argument is a decimal digit. Write a second macro is_hex that returns true if its argument is a hex digit (0-9, A-F, a-f). The second macro should reference the first.

Exercise 10-4: Write a preprocessor macro that swaps two integers. (If you’re a real hacker, write one that does not use a temporary variable declared outside the macro.)

Answers to Chapter Questions

Answer 10-1: After the program has been run through the preprocessor, the std::cout statement is expanded to look like:

std::cout << "The square of all the parts is " <<  7 + 5 * 7 + 5  << '\n';

The equation 7 + 5 * 7 + 5 evaluates to 47. It is a good rule to put parentheses ( ) around all expressions in macros. If you change the definition of ALL_PARTS to:

#define ALL_PARTS (FIRST_PART + LAST_PART)

the program executes correctly.

Answer 10-2: The preprocessor is a very simple-minded program. When it defines a macro, everything past the identifier is part of the macro. In this case, the definition of MAX is literally =10. When the for statement is expanded, the result is:

for (counter==10; counter > 0; --counter)

C++ allows you to compute a result and throw it away. For this statement, the program checks to see whether counter is 10 and discards the answer. Removing the = from the macro definition will correct the problem.

Answer 10-3: As with the previous problem, the preprocessor does not respect C++ syntax conventions. In this case, the programmer used a semicolon to end each #define, but the preprocessor included it as part of the definitions of SIZE and FUDGE. The assignment statement for size, expanded, is:

    size = 10; -2;;

The two semicolons at the end do not hurt anything, but the one in the middle is a killer. This line tells C++ to do two things: assign 10 to size and compute the value -2 and throw it away (this results in the null effect warning). Removing the semicolons will fix the problem.

Answer 10-4: The output of the preprocessor looks like:

int main(  ) {      
    int value;    
     
    value = 1;  
    if (value < 0)  
        std::cerr << "Fatal Error:Abort\n"; exit(8);  

    std::cerr << "We did not die\n"; 
    return (0); 
}

The problem is that two statements follow the if line. Normally they would be put on two lines. If we properly indent this program we get:

Example 10-10. die3/die.cpp
#include <iostream>
#include <cstdlib>

int main(  ) {     
    int value;  // a random value for testing 
    
    value = 1; 
    if (value < 0) 
        std::cerr << "Fatal Error:Abort\n";

    exit(8); 

    std::cerr << "We did not die\n";
    return (0);
}

From this it is obvious why we always exit. The fact that there were two statements after the if was hidden by using a single preprocessor macro. The cure for this problem is to put curly braces around all multistatement macros.

#define DIE \
    {std::cerr << "Fatal Error: Abort\n"; exit(8);}

Answer 10-5: The problem is that the preprocessor does not understand C++ syntax. The macro call:

SQR(counter+1)

expands to:

(counter+1 * counter+1)

The result is not the same as ((counter+1) * (counter+1)). To avoid this problem, use inline functions instead of parameterized macros:

inline int SQR(int x) { return (x*x);}

If you must use parameterized macros, enclose each instance of the parameter in parentheses:

#define SQR(x) ((x) * (x))

Answer 10-6: The only difference between a parameterized macro and one without parameters is the parentheses immediately following the macro name. In this case, a space follows the definition of RECIPROCAL, so it is not a parameterized macro. Instead it is a simple text replacement macro that replaces RECIPROCAL with:

(number) (1.0 / (number))

Removing the space between RECIPROCAL and (number) corrects the problem.



[1] Actually, the ostream class is defined in the header ostream. However, this file is included by the iostream header. The result is that by including this single header, you get the definitions of standard objects such as std::cin and standard classes such as std::ostream.
