Chapter 10. The C++ Preprocessor

In This Chapter

  • Including source files

  • Defining constants and macros

  • Enumerating alternatives to constants

  • Inserting compile time checks

  • Simplifying declarations via typedef

You only thought that all you had to learn was C++. It turns out that C++ includes a preprocessor that works on your source files before the "real C++ compiler" ever gets to see them. Unfortunately, the syntax of the preprocessor is completely different from that of C++ itself.

Before you despair, however, let me hasten to add that the preprocessor is very basic and that the C++ '09 standard (eventually published in 2011 and now known as C++11) has added a number of features that make the preprocessor almost unnecessary. Nevertheless, if the conversation turns to C++ at your next Coffee Club meeting, you'll be expected to understand the preprocessor.

What Is a Preprocessor?

Up until now, you may have thought of the C++ compiler as munching on your source code and spitting out an executable program in one step, but that isn't quite true.

First, the preprocessor makes a pass through your program looking for preprocessor instructions. The output of this preprocessor step is an intermediate file that has all the preprocessor commands expanded. This intermediate file gets passed to the C++ compiler for processing. The output from the C++ compiler is an object file that contains the machine instructions equivalent to your C++ source code. During the final step, a separate program known as the linker combines a set of standard libraries with your object file (or files, as we'll see in Chapter 21) to create an executable program. (More on the standard library in the next section of this chapter.)

Tip

Object files normally carry the extension .o. Executable programs always carry the extension .exe in Windows and have no extension under Unix and Linux. Code::Blocks stores the object and executable files in their own folders. For example, if you've already built the IntAverage program from Chapter 2, you will have on your hard disk a folder C:\CPP_Programs\IntAverage\obj\Debug containing main.o and a folder C:\CPP_Programs\IntAverage\bin\Debug that contains the executable program.

All preprocessor commands start with a # symbol in column 1 and end with the newline.

Warning

Like almost all rules in C++, this rule has an exception. You can spread a preprocessor command across multiple lines by ending the line with an escape character: the backslash (\). We won't have any preprocessor commands that are that complicated, however.

In this book, we'll be working with three preprocessor commands:

  • #include includes the contents of the specified file in place of the #include statement.

  • #define defines a constant or macro.

  • #if includes a section of code in the intermediary file if the following condition is true.

Each of these preprocessor commands is covered in the following sections.

Including Files

The C++ standard library consists of functions that are basic enough that almost everyone needs them. It would be silly to force every programmer to have to write them for herself. For example, the I/O functions, which we have been using to read input from the keyboard and write out to the console, are contained in the standard library.

However, C++ requires a prototype declaration for any function you call, whether it's in a library or not (see Chapter 6 if that doesn't make sense to you). Rather than force the programmer to type all these declarations by hand, the library authors created include files that contain little more than prototype declarations. All you have to do is #include the source file that contains the prototypes for the library routines you intend to use.

Take the following simple example. Suppose I had created a library that contains the trigonometric functions sin(), cosin(), tan(), and a whole lot more. I would likely create an include file mytrig with the following contents to go along with my standard library:

// include prototype declarations for my library
double sin(double x);
double cosin(double x);
double tan(double x);
// ...more prototype declarations...

Any program that wanted to make use of one of these math functions would #include that file, enclosing the name of the include file either in angle brackets or in quotes, as in

#include <mytrig>

or

#include "mytrig"

Note

The difference between the two forms of #include is a matter of where the preprocessor goes to look for the mytrig file. When the file is enclosed in quotes, the preprocessor assumes that the include file is locally grown, so it starts looking for the file in the same directory that it found the source file. If it doesn't find the file there, it starts looking in its own include file directories. The preprocessor assumes that include files in angle brackets are from the C++ library, so it skips looking in the source file directory and goes straight to the standard include file folders. Use quotes for any include file that you create and angle brackets for C++ library include files.

Thus, you might write a source file like the following:

// MyProgram - is very intelligent
#include "mytrig"

int main(int nArgc, char* pArguments[])
{
    cout << "The sin of .5 is " << sin(0.5) << endl;
    return 0;
}

The C++ compiler sees the following intermediary file after the preprocessor gets finished expanding the #include:

// MyProgram - is very intelligent
// include prototype declarations for my library
double sin(double x);
double cosin(double x);
double tan(double x);
// ...more prototype declarations...

int main(int nArgc, char* pArguments[])
{
    cout << "The sin of .5 is " << sin(0.5) << endl;
    return 0;
}

Warning

Historically, the convention was to end include files with .h. C still uses that standard. However, C++ dropped the extension when it revamped the include file structure. Now, C++ standard include files have no extension.

#Defining Things

The preprocessor also allows the programmer to #define expressions that get expanded during the preprocessor step. For example, you can #define a constant to be used throughout the program.

Tip

In usage, you pronounce the # sign as "pound," so you say "pound-define a constant" to distinguish from defining a constant in some other way.

#define TWO_PI 6.2831853

This makes the following statement much easier to understand:

double circumference = TWO_PI * radius;

than the equivalent expression, which is actually what the C++ compiler sees after the preprocessor has replaced TWO_PI with its definition:

double circumference = 6.2831853 * radius;

Another advantage is the ability to #define a constant in one place and use it everywhere. For example, I might include the following #define in an include file:

#define MAX_NAME_LENGTH 512

Throughout the program, I can truncate the names that I read from the keyboard to a common and consistent MAX_NAME_LENGTH. Not only is this easier to read but it also provides a single place in the program to change should I want to increase or decrease the maximum name length that I choose to process.
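As a rough sketch of what "define it once, use it everywhere" looks like, the following fragment uses MAX_NAME_LENGTH both to size a buffer and to limit a copy. The function copyName() is a helper I made up for this illustration; it is not part of any library.

```cpp
// one definition that the rest of the program relies upon
#define MAX_NAME_LENGTH 512

// buffer for a name read from the keyboard
// (the extra character holds the terminating null)
char szName[MAX_NAME_LENGTH + 1];

// copyName - hypothetical helper that copies a name but
//            keeps no more than MAX_NAME_LENGTH characters;
//            pTarget must have room for MAX_NAME_LENGTH + 1
void copyName(char* pTarget, const char* pSource)
{
    int i = 0;
    while (i < MAX_NAME_LENGTH && pSource[i] != '\0')
    {
        pTarget[i] = pSource[i];
        i++;
    }
    pTarget[i] = '\0';   // always terminate the result
}
```

Changing the 512 in the #define resizes the buffer and adjusts the truncation in one stroke.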

The preprocessor also allows the program to #define function-like macros with arguments that are expanded when the definition is used:

#define SQUARE(X) X * X

In use, such macro definitions look a lot like functions:

// calculate the area of a circle (assuming PI has been #defined)
double dArea = PI * SQUARE(dRadius);

Remember that the C++ compiler actually sees the file generated from the expansion of all macros. This can lead to some unexpected results. Consider the following code snippets (these are all taken from the program MacroConfusion, which is included on the CD-ROM):

int nSQ = SQUARE(2);
cout << "SQUARE(2) = " << nSQ << endl;

Reassuringly, this generates the expected output:

SQUARE(2) = 4

However, the following line:

int nSQ = SQUARE(1 + 2);
cout << "SQUARE(1 + 2) = " << nSQ << endl;

generates the surprising result:

SQUARE(1 + 2) = 5

The preprocessor simply replaced X in the macro definition with 1 + 2. What the C++ compiler actually sees is

int nSQ = 1 + 2 * 1 + 2;

Since multiplication has higher precedence than addition, this is turned into 1 + 2 + 2 which, of course, is 5. This confusion could be solved by liberal use of parentheses in the macro definition:

#define SQUARE(X) ((X) * (X))

This version generates the expected:

SQUARE(1 + 2) → ((1 + 2) * (1 + 2)) → 9
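The two versions of the macro can be compared side by side in a small sketch. The names SQUARE_NAIVE and SQUARE_SAFE are my own, chosen only so that both definitions can live in the same file. The division lines are an extra case showing why the outer pair of parentheses is needed as well.

```cpp
// the original definition, without parentheses
#define SQUARE_NAIVE(X) X * X

// the definition with liberal use of parentheses
#define SQUARE_SAFE(X) ((X) * (X))

// expands to 1 + 2 * 1 + 2
const int nNaive = SQUARE_NAIVE(1 + 2);

// expands to ((1 + 2) * (1 + 2))
const int nSafe = SQUARE_SAFE(1 + 2);

// expands to 18 / 3 * 3, which is evaluated left to right
const int nNaiveDiv = 18 / SQUARE_NAIVE(3);

// expands to 18 / ((3) * (3))
const int nSafeDiv = 18 / SQUARE_SAFE(3);
```

The inner parentheses protect the argument from its neighbors inside the macro; the outer parentheses protect the whole expansion from whatever surrounds the macro where it is used.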

However, some unexpected results cannot be fixed no matter how hard you try. Consider the following snippet:

int i = 3;
cout << "i = " << i << endl;
int nSQ = SQUARE(i++);
cout << "SQUARE(i++) = " << nSQ << endl;
cout << "now i = " << i << endl;

This generates the following:

i = 3
SQUARE(i++) = 9
now i = 5

The value generated by SQUARE is correct but the variable i has been incremented twice. The reason is obvious when you consider the expanded macro:

int i = 3;
nSQ = i++ * i++;

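Both i++ operations return the current value of i, which is 3. These two values are then multiplied together to return the expected value of 9. However, i is then incremented twice to generate a resulting value of 5. Worse yet, this is merely what this particular compiler happened to do: the C++ standard does not define the result of modifying the same variable twice within a single expression like this one, so a different compiler could legitimately generate a different answer.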

Okay, how about not #defining things?

The sometimes unexpected results from the preprocessor have created heartburn for the fathers (and mothers) of C++ almost from the beginning. C++ has included features over the years to make most uses of #define unnecessary.

For example, C++ defines the inline function to replace the macro. This looks just like any other function declaration with the addition of the keyword inline tacked to the front:

inline int SQUARE(int x) { return x * x; }

This inline function definition looks very much like the previous macro definition for SQUARE() (I have written this definition on one line to highlight the similarities). However, an inline function is processed by the C++ compiler rather than by the preprocessor. This definition of SQUARE() does not suffer from any of the strange effects noted previously.
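Here is a small sketch of why the function version is safe with an argument like i++. It assumes a compiler that supports the C++14 rules or later. I have written the function name in lowercase and added the constexpr keyword only so that the compiler itself can check the result; valueOfIAfterSquare() is a helper invented for this illustration.

```cpp
// inline version of SQUARE(); the constexpr keyword is an addition
// that allows the results to be checked at compile time
inline constexpr int square(int x) { return x * x; }

// valueOfIAfterSquare - hypothetical helper that squares i++
//                       and returns the final value of i
//                       (or -1 if the square came out wrong)
constexpr int valueOfIAfterSquare()
{
    int i = 3;

    // the argument i++ is evaluated exactly once, and its
    // value (3) is then handed to the function
    int nSQ = square(i++);

    return (nSQ == 9) ? i : -1;
}
```

Because the argument is evaluated before the function body runs, i ends up at 4 rather than at 5.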

Warning

The inline keyword is supposed to suggest to the compiler that it "expand the function inline" rather than generate a call to some code somewhere to perform the operation. This was to satisfy the speed freaks, who wanted to avoid the overhead of performing a function call compared to a macro definition that generates no such call. The best that can be said is that inline functions may be expanded in place, but then again, they may not. There's no way to be sure without performing detailed timing analysis or examining the machine code output by the compiler.

Note

C++ allows programmers to use a variable declared const to take the place of a #define constant so long as the value of the constant is spelled out at compile time. This has been part of standard C++ since the original 1998 standard, which makes the following legal:

const int MAX_NAME_LENGTH = 512;
char szName[MAX_NAME_LENGTH];

Note

The '09 standard goes so far as to introduce a new declaration type known as a const expression, which is flagged with the keyword constexpr:

constexpr int square(int n1, int n2)
    { return n1 * n1 + 2 * n1 * n2 + n2 * n2;}

A const expression is valid if every subexpression can be calculated at compile time. This means that a const expression may contain nothing but references to constants and other const expressions.
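As a minimal sketch, assuming a compiler that does implement const expressions, the result of such a function can be used anywhere the compiler demands a constant, such as the size of an array. The names squareOfSum, TABLE_SIZE, and nTable are made up for this example.

```cpp
// a const expression: the compiler can work out the result
// as long as the arguments are themselves constants
constexpr int squareOfSum(int n1, int n2)
{
    return n1 * n1 + 2 * n1 * n2 + n2 * n2;
}

// (2 + 3) squared is calculated during compilation...
constexpr int TABLE_SIZE = squareOfSum(2, 3);

// ...so it can be used to size an array
int nTable[TABLE_SIZE];
```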

Warning

The compiler included on the enclosed CD-ROM does not implement const expressions.

Enumerating other options

C++ provides a mechanism for defining constants of a separate, user-defined type. Suppose, for example, that I were writing a program that manipulated States of the Union. I could refer to the states by their name, such as "Texas" or "North Dakota." In practice, this is not convenient since repetitive string comparisons are computationally intensive and subject to error.

I could define a unique value for each state as follows:

#define DC_OR_TERRITORY 0
#define ALABAMA 1
#define ALASKA 2
#define ARKANSAS 3
//...and so on...

Not only does this avoid the clumsiness of comparing strings; it allows me to use the name of the state as an index into an array of properties such as population:

// increment the population of ALASKA (they need it)
population[ALASKA]++;

A statement such as this is much easier to understand than the semantically identical population[2]++. This is such a common thing to do that C++ allows the programmer to define what's known as an enumeration:

enum STATE {DC_OR_TERRITORY,  // gets 0
            ALABAMA,          // gets 1
            ALASKA,           // gets 2
            ARKANSAS,         // gets 3
            // ...and so on...
           };

Each element of this enumeration is assigned a value starting at 0, so DC_OR_TERRITORY is defined as 0, ALABAMA is defined as 1, and so on. You can override this incremental sequencing by using an assignment as follows:

enum STATE {DC,
            TERRITORY = 0,
            ALABAMA,
            ALASKA,
            // ...and so on...
           };

This version of STATE defines an element DC, which is given the value 0. It then defines a new element TERRITORY, which is also assigned the value 0. ALABAMA picks up with 1 just as before.
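The numbering can be confirmed with a short sketch. The list of states is cut off after ALASKA to keep the example small, and the population array is sized accordingly; both are simplifications of my own.

```cpp
// shortened version of the enumeration (the real list continues)
enum STATE {DC,               // gets 0
            TERRITORY = 0,    // explicitly set back to 0
            ALABAMA,          // picks up at 1
            ALASKA};          // gets 2

// the enumerators convert to int automatically, so they can
// size an array or act as an index into it
int population[ALASKA + 1];
```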

Note

The '09 standard extended enumerations by allowing the programmer to create a user-defined enumerated type as follows (note the addition of the keyword class in the snippet):

enum class STATE {DC,
                   TERRITORIES = 0,
                   ALABAMA,
                   ALASKA,
                   // ...and so on...
                  };

This declaration creates a new type STATE and assigns it 52 members (ALABAMA through WYOMING plus DC and TERRITORIES). The programmer can now use STATE as she would any other variable type. A variable can be declared to be of type STATE:

STATE s = STATE::ALASKA;

Function calls can be differentiated by this new type:

int getPop(STATE s);            // return population
int setPop(STATE s, int pop);   // set the population

The type STATE is not just another word for int: arithmetic is not defined for members of type STATE. The following attempt to use STATE as an index into an array is not legal:

int getPop(STATE s)
{
    return population[s];  // not legal
}

However, the members of STATE can be converted to their integer equivalent (0 for DC and TERRITORIES, 1 for ALABAMA, 2 for ALASKA, and so on) through the application of a cast:

int getPop(STATE s)
{
    return population[(int)s];  // is legal
}
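The pieces fit together as in the following sketch, which assumes a compiler that supports enum class. The list of states is cut short, the population figures are obvious placeholders rather than real data, and I have used static_cast, the C++ way of spelling the same cast. The constexpr keyword is an addition so that the result can be checked at compile time.

```cpp
// shortened version of the enumerated type
enum class STATE {DC,
                  TERRITORIES = 0,
                  ALABAMA,
                  ALASKA};

// placeholder figures only, one entry per integer value
constexpr int population[] = {0, 100, 200};

// getPop - convert the STATE into its integer equivalent
//          before using it as an index
constexpr int getPop(STATE s)
{
    return population[static_cast<int>(s)];
}
```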

Including Things #if I Say So

The third major class of preprocessor statement is the #if, which is a preprocessor version of the C++ if statement:

#if constexpression
// included if constexpression evaluates to other than 0
#else
// included if constexpression evaluates to 0
#endif

This is known as conditional compilation because the set of statements between the #if and the #else or #endif are included in the compilation only if a condition is true. The constexpression phrase is limited to simple arithmetic and comparison operators. That's okay because anything more than an equality comparison and the occasional addition is rare.

For example, the following is a common use for #if. I can include the following definition within an include file with a name such as LogMessage:

#if DEBUG == 1
inline void logMessage(const char *pMessage)
        { cout << pMessage << endl; }
#else
#define logMessage(X) (0)
#endif

I can now sprinkle error messages throughout my program wherever I need them:

#define DEBUG 1
#include "LogMessage"
void testFunction(char *pArg)
{
    logMessage(pArg);
    // ...function continues...

With DEBUG set to 1, the logMessage() is converted into a call to an inline function that outputs the argument to the display. Once the program is working properly, I can remove the definition of DEBUG. Now the references to logMessage() invoke a macro that does nothing.

A second version of the conditional compilation is the #ifdef (which is pronounced "if def"):

#ifdef DEBUG
// included if DEBUG has been #defined
#else
// included if DEBUG has not been #defined
#endif

There is also an #ifndef (pronounced "if not def"), which is the logical reverse of #ifdef.
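One very common use of #ifndef, sketched here under my own assumptions, is to keep the contents of an include file from being compiled twice when the file ends up being included more than once. The symbol MYTRIG_INCLUDED and the structure Angle are names invented for this example. The contents of the include file are written out twice to mimic what the compiler sees after two #include commands.

```cpp
// --- first #include: MYTRIG_INCLUDED is not yet defined,
//     so everything up to the #endif is compiled
#ifndef MYTRIG_INCLUDED
#define MYTRIG_INCLUDED
struct Angle { double dRadians; };
#endif

// --- second #include: MYTRIG_INCLUDED is now defined, so
//     the preprocessor skips straight to the #endif; without
//     the #ifndef, defining Angle again would be an error
#ifndef MYTRIG_INCLUDED
#define MYTRIG_INCLUDED
struct Angle { double dRadians; };
#endif
```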

Intrinsically Defined Objects

C++ defines a set of intrinsic constants, which are shown in Table 10.1. These are constants that C++ thinks are just too cool to be without, and that you would have trouble defining for yourself anyway.

Table 10.1. Predefined Preprocessor Constants

Constant         Type                 Meaning
__FILE__         const char* const    The name of the source file
__LINE__         const int            The current line number
__func__         const char* const    The name of the current function (C++ '09 only)
__DATE__         const char* const    The date on which the file was compiled
__TIME__         const char* const    The time at which the file was compiled
__TIMESTAMP__    const char* const    The date and time at which the source file was last modified (a common compiler extension rather than part of the standard)
__STDC__         int                  Set to 1 by a compiler that conforms to the C standard; a C++ compiler is not required to define it
__cplusplus      long                 Defined by every C++ compiler (as opposed to a C compiler) and set to a number that identifies the version of the standard, such as 199711L. This allows include files to be shared across environments.
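As a minimal sketch of how the constants in Table 10.1 get used, the following macro tags a message with the file and line at which it appears. The names LOG_HERE and checkRadius are made up for this illustration. LOG_HERE has to be a macro rather than a function so that __FILE__ and __LINE__ are expanded at the spot where the message is logged.

```cpp
#include <iostream>
using namespace std;

// LOG_HERE - hypothetical macro that outputs a message preceded
//            by the name of the file and the line number
#define LOG_HERE(MSG) \
    (cout << __FILE__ << "(" << __LINE__ << "): " << (MSG) << endl)

// checkRadius - complain about a radius that makes no sense
void checkRadius(double dRadius)
{
    if (dRadius < 0)
    {
        LOG_HERE("negative radius");
    }
}
```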

These internal macros are particularly useful when generating error messages. You would think that C++ generates plenty of error messages on its own and doesn't need any more help, but sometimes you want to create your own compiler errors. For you, C++ offers not one, not two, but three options: #error, assert(), and static_assert(). Each of these three mechanisms works slightly differently.

The #error command is a preprocessor directive (as you can tell by the fact that it starts with the # sign). It causes the preprocessor to stop and output a message. Suppose that your program just won't work with anything but standard C++. You could add the following to the beginning of your program:

#if !__cplusplus || !__STDC__
#error This is a standard C++ program.
#endif

Now if someone tries to compile your program with other than a C++ compiler that strictly adheres to the standards, she will get a single neat error message rather than a raft of potentially meaningless error messages from a confused C compiler.
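The value of __cplusplus can also be put to work. A conforming compiler sets it to 199711L for the original standard and to 201103L for the standard that this book calls C++ '09. The following sketch uses that to find out what is available; the name bHasConstExpr is my own, and be aware that some compilers report an older value unless told otherwise.

```cpp
// refuse to go on if this is not a C++ compiler at all
#ifndef __cplusplus
#error This file must be compiled as C++.
#endif

// record whether the newer standard is available
#if __cplusplus >= 201103L
const bool bHasConstExpr = true;
#else
const bool bHasConstExpr = false;
#endif
```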

Tip

A more meaningful test would be for a particular compiler. Each compiler defines its own preprocessor constants. If your program required the GNU C++ implementation of the C++ '09 standards, you might add the following, taken straight out of one of the GNU include files:

#ifndef __GXX_EXPERIMENTAL_CXX0X__
#error This file requires compiler and library support for the upcoming \
ISO C++ standard, C++0x. This support is currently experimental, and must be \
enabled with the -std=c++0x or -std=gnu++0x compiler options.
#endif

Note

The backslash at the end of the line causes the preprocessor to ignore the newline character, effectively turning all three lines of error message into one long preprocessor command.

So, if __GXX_EXPERIMENTAL_CXX0X__ is not defined when the preprocessor gets to this point, the preprocessor stops and spits out the three lines telling you to go back and compile with some silly switch set.

Unlike #error, assert() performs its test when the resulting program is executed. For example, suppose that I had written a factorial program that calculates N * (N - 1) * (N - 2) and so on down to 1 for whatever N I pass it. Factorial is only defined for positive integers; passing a negative number to a factorial is always a mistake. To be careful, I should add a test for a nonpositive value at the beginning of the function:

int factorial(int N)
{
    assert(N > 0);
    // ...program continues...

The program now checks the argument to factorial() each time it is called. At the first sign of negativity, assert() halts the program with a message to the operator that the assertion failed, along with the file and line number.

Liberal use of assert() throughout your program is a good way to detect problems early during development, but constantly testing for errors that have already been found and removed during testing slows the program needlessly. To avoid this, C++ allows the programmer to "remove" the tests when creating the version of the program to be shipped to users: #define the constant NDEBUG (for "not debug mode"). This causes the preprocessor to convert all the calls to assert() in your module to "do nothing's" (universally known as NO-OPs).
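The effect of NDEBUG can be seen in the following sketch, which assumes a compiler that supports the C++14 rules or later. The #define has to appear before the #include of cassert, the include file that defines assert(). The body of factorial() is my own guess at how the function might continue, and the constexpr keyword is there only so that the results can be checked at compile time.

```cpp
// NDEBUG must be defined before the include file is read
#define NDEBUG
#include <cassert>

// factorial - calculate N * (N - 1) * (N - 2) and so on down to 1
constexpr int factorial(int N)
{
    // with NDEBUG defined, this test is compiled away
    assert(N > 0);

    int nResult = 1;
    for (int i = 2; i <= N; i++)
    {
        nResult *= i;
    }
    return nResult;
}
```

With the test removed, a call such as factorial(-1) no longer halts the program; it quietly returns 1, which is exactly why you want the assert() in place during development.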

The preprocessor cannot perform certain compile-time tests. For example, suppose that your program works properly only if the default integer size is 32 bits. The preprocessor is of no help since it knows nothing about integers or floating points. To address this situation, C++ '09 introduced the keyword static_assert(), which is interpreted by the compiler (rather than the preprocessor). It accepts two arguments: a const expression and a string, as in the following example:

static_assert(sizeof(int) == 4, "int is not 32-bits.");

If the const expression evaluates to 0 or false during compilation, the compiler outputs the string and stops. The static_assert() does not generate any runtime code. Remember, however, that the expression is evaluated at compile time so it cannot contain function calls (other than calls to const expressions) or references to things that are known only when the program executes.
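A slightly more flexible version of the same test, sketched under my own assumptions, asks only that int be at least 32 bits wide. The helper bitsIn() is invented for this example; CHAR_BIT is the number of bits in a byte, as defined in the standard include file climits.

```cpp
#include <climits>

// bitsIn - hypothetical const expression that converts
//          a size in bytes into a size in bits
constexpr int bitsIn(int nBytes)
{
    return nBytes * CHAR_BIT;
}

// checked by the compiler; generates no runtime code
static_assert(bitsIn(sizeof(int)) >= 32,
              "int is narrower than 32 bits.");
```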

Typedef

The typedef keyword allows the programmer to create a shorthand name for a declaration. The careful application of typedef can make the resulting program easier to read. (Note that typedef is not actually a preprocessor command, but it's largely associated with include files and the preprocessor.)

typedef int* IntPtr;
typedef const IntPtr IntConstPtr;

int i;
int *const ptr1 = &i;
IntConstPtr ptr2 = ptr1; // ptr1 and ptr2 are the same type

The first two declarations in this snippet give a new name to existing types. Thus, the second declaration declares IntConstPtr to be another name for int* const (a constant pointer to an int, not a pointer to a constant int). When this new type is used in the declaration of ptr2, it has the same effect as the more complicated declaration of ptr1.

Although typedef does not introduce any new capability, it can make some complicated declarations a lot easier to read.
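If you are ever unsure what a typedef really stands for, you can ask the compiler. The following sketch assumes a compiler with the type_traits include file from the '09 standard. It confirms that the const in IntConstPtr applies to the pointer rather than to the int. TrigFn is a made-up name showing the kind of declaration that typedef tames best.

```cpp
#include <type_traits>

typedef int* IntPtr;
typedef const IntPtr IntConstPtr;

// the const applies to the pointer itself...
static_assert(std::is_same<IntConstPtr, int* const>::value,
              "IntConstPtr should be int* const");

// ...and not to the int that it points at
static_assert(!std::is_same<IntConstPtr, const int*>::value,
              "IntConstPtr should not be const int*");

// a pointer to a function that takes a double and returns
// a double is much easier to read once it has a name
typedef double (*TrigFn)(double);
```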
