In This Chapter
Performing arithmetic operations on character pointers
Examining the relationship between pointers and arrays
Increasing program performance
Extending pointer operations to different pointer types
Explaining the arguments to main() in our C++ program template
C++ allows the programmer to operate on pointer variables much as she would on simple types of variables. (The concept of pointer variables is introduced in Chapter 8.) How and why this is done along with its implications are the subjects of this chapter.
Some of the same arithmetic operators I cover in Chapter 3 can be applied to pointer types. This section examines the implications of applying these operators both to pointers and to the array types (I discuss arrays in Chapter 7). Table 9-1 lists the three fundamental operations that are defined on pointers. In Table 9-1, pointer, pointer1, and pointer2 are all of some pointer type, say char*, and offset is an integer, for example, long. C++ also supports the other operators related to addition and subtraction, such as ++ and +=, although they are not listed in Table 9-1.
Table 9-1. The Three Basic Operations Defined on Pointer Types
Operation | Result | Meaning |
---|---|---|
pointer + offset | pointer | Calculate the address of the object offset entries from pointer |
pointer - offset | pointer | The opposite of addition |
pointer2 - pointer1 | offset | Calculate the number of entries between pointer2 and pointer1 |
The neighborhood memory model is useful to explain how pointer arithmetic works. Consider a city block in which all houses are numbered sequentially. The house at 123 Main Street has 122 Main Street on one side and 124 Main Street on the other.
Now it's pretty clear that the house four houses down from 123 Main Street must be 127 Main Street; thus, you can say 123 Main + 4 = 127 Main. Similarly, if I were to ask how many houses there are from 123 Main to 127 Main, the answer would be four: 127 Main - 123 Main = 4. (Just as an aside, a house is zero houses from itself: 123 Main - 123 Main = 0.)
But it makes no sense to ask how far away from 123 Main Street is 4 or what the sum of 123 Main and 127 Main is. In similar fashion, you can't add two addresses. Nor can you multiply an address, divide an address, square an address, or take the square root — you get the idea. You can perform any operation that can be converted to addition or subtraction. For example, if you increment a pointer to 123 Main Street, it now points to the house next door (at 124 Main, of course!).
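The three operations from Table 9-1 can be sketched as three tiny functions. This is only an illustration: the function names are invented for this example and are not part of C++ or of the programs in this chapter.

```cpp
#include <cstddef>

// pointerPlusOffset - (hypothetical helper) the address of the
//                     object offset entries beyond pointer
const char* pointerPlusOffset(const char* pointer, long offset)
{
    return pointer + offset;
}

// pointerMinusOffset - (hypothetical helper) the opposite of addition
const char* pointerMinusOffset(const char* pointer, long offset)
{
    return pointer - offset;
}

// entriesBetween - (hypothetical helper) the number of entries from
//                  pointer1 to pointer2; both pointers must point
//                  into the same array
std::ptrdiff_t entriesBetween(const char* pointer2, const char* pointer1)
{
    return pointer2 - pointer1;
}
```

Given a pointer to element 3 of a character array, pointerPlusOffset(p, 4) returns the address of element 7, just as 123 Main + 4 = 127 Main. Notice that there is no function that adds two pointers; C++ rejects such an expression at compile time.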
Now return to the wonderful array for just a moment. Consider the case of an array of 32 1-byte characters called charArray. If the first byte of this array is stored at address 0x100, the array will extend over the range 0x100 through 0x11f. charArray[0] is located at address 0x100, charArray[1] is at 0x101, charArray[2] at 0x102, and so on.
After executing the expression
ptr = &charArray[0];
the pointer ptr contains the address 0x100. The addition of an integer offset to a pointer is defined such that the relationships shown in Table 9-2 are true. Table 9-2 also demonstrates why adding an offset n to ptr calculates the address of the nth element in charArray.
Table 9-2. Adding Offsets
Offset | Result | Is the Address of |
---|---|---|
+ 0 | 0x100 | charArray[0] |
+ 1 | 0x101 | charArray[1] |
+ 2 | 0x102 | charArray[2] |
... | ... | ... |
+ n | 0x100 + n | charArray[n] |
The addition of an offset to a pointer is identical to applying an index to an array.
Thus, if
char* ptr = &charArray[0];
then
*(ptr + n) ← corresponds with → charArray[n]
Because * has higher precedence than addition, *ptr + n adds n to the character that ptr points to. The parentheses are needed to force the addition to occur before the indirection. The expression *(ptr + n) retrieves the character pointed at by the pointer ptr plus the offset n.
In fact, the correspondence between the two forms of expression is so strong that C++ considers array[n] nothing more than a simplified version of *(ptr + n), where ptr points to the first element in array.
array[n] -- C++ interprets as → *(&array[0] + n)
To complete the association, C++ takes a second shortcut. If given
char charArray[20];
charArray is defined as &charArray[0]. That is, the name of an array without a subscript present is the address of the first element of the array. Thus, you can further simplify the association to
array[n] -- C++ interprets as → *(array + n)
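The correspondence can be checked directly. The following is a minimal sketch; sameElement() is a name made up for this illustration.

```cpp
// sameElement - (hypothetical helper) returns true if indexing and
//               pointer arithmetic reach the same element n
bool sameElement(const char array[], int n)
{
    const char* ptr = &array[0];

    return &array[n] == ptr + n       // same address via a pointer
        && &array[n] == array + n     // array name used as the address
        && array[n] == *(ptr + n);    // same value either way
}
```

For any valid index n, sameElement(charArray, n) returns true, because both forms of expression name the same byte in memory.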
The correspondence between indexing an array and pointer arithmetic is useful. For example, a displayArray() function used to display the contents of an array of integers can be written as follows:

    // displayArray - display the members of an
    //                array of length nSize
    void displayArray(int intArray[], int nSize)
    {
        cout << "The value of the array is: ";
        // start the index at 0, the first element
        for(int n = 0; n < nSize; n++)
        {
            cout << n << ": " << intArray[n] << " ";
        }
        cout << endl;
    }

This version uses the array operations with which you are familiar. A pointer version of the same appears as follows:

    // displayArray - display the members of an
    //                array of length nSize
    void displayArray(int intArray[], int nSize)
    {
        cout << "The value of the array is: ";
        int* pArray = intArray;
        for(int n = 0; n < nSize; n++, pArray++)
        {
            cout << n << ": " << *pArray << " ";
        }
        cout << endl;
    }

The new displayArray() begins by creating a pointer to an integer, pArray, that points at the first element of intArray.
The p in the variable name indicates that the variable is a pointer, but this is just a convention, not a part of the C++ language.
The function then loops through each element of the array. On each loop, displayArray() outputs the current integer (that is, the integer pointed at by pArray) before incrementing the pointer to the next entry in intArray. displayArray() can be tested using the following version of main():

    int main(int nNumberofArgs, char* pszArgs[])
    {
        int array[] = {4, 3, 2, 1};
        displayArray(array, 4);

        // wait until user is ready before terminating program
        // to allow the user to see the program results
        system("PAUSE");
        return 0;
    }

The output from this program is

    The value of the array is: 0: 4 1: 3 2: 2 3: 1
    Press any key to continue...

You may think this pointer conversion is silly; however, the pointer version of displayArray() is actually more common than the array version among C++ programmers in the know. For some reason, C++ programmers don't seem to like arrays but they love pointer manipulation.
The use of pointers to access arrays is nowhere more common than in the accessing of character arrays.
A null-terminated string is simply a character array whose last character is a null. C++ uses the null character at the end to serve as a terminator. This null-terminated array serves as a quasi-variable type of its own. (See Chapter 7 for an explanation of null-terminated string arrays.) Often C++ programmers use character pointers to manipulate such strings. The following code examples compare this technique to the earlier technique of indexing in the array.
Character pointers enjoy the same relationship with a character array that any other pointer and array share. However, the fact that strings end in a terminating null makes them especially amenable to pointer-based manipulation, as shown in the following DisplayString program:

    // DisplayString - display an array of characters using
    //                 both a pointer and an array index
    #include <cstdio>
    #include <cstdlib>
    #include <iostream>
    using namespace std;

    int main(int nNumberofArgs, char* pszArgs[])
    {
        // declare a string
        const char* szString = "Randy";
        cout << "The array is '" << szString << "'" << endl;

        // display szString as an array
        cout << "Display the string as an array: ";
        for(int i = 0; i < 5; i++)
        {
            cout << szString[i];
        }
        cout << endl;

        // now using typical pointer arithmetic
        cout << "Display string using a pointer: ";
        const char* pszString = szString;
        while(*pszString)
        {
            cout << *pszString;
            pszString++;
        }
        cout << endl;

        // wait until user is ready before terminating program
        // to allow the user to see the program results
        system("PAUSE");
        return 0;
    }

The program first makes its way through the array szString by indexing into the array of characters. The for loop chosen stops when the index reaches 5, the length of the string.
The second loop displays the same string using a pointer. The program sets the variable pszString equal to the address of the first character in the array. It then enters a loop that will continue until the char pointed at by pszString is equal to false; in other words, until the character is a null.
The integer value 0 is interpreted as false; all other values are true.
The program outputs the character pointed at by pszString and then increments the pointer so that it points to the next character in the string before being returned to the top of the loop.
The dereference and increment can be (and usually are) combined into a single expression as follows:
cout << *pszString++;
The output of the program appears as follows:

    The array is 'Randy'
    Display the string as an array: Randy
    Display string using a pointer: Randy
    Press any key to continue...

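The combined dereference-and-increment expression is worth a closer look. In *pszString++ the ++ applies to the pointer, not to the character, and the expression evaluates to the character that the pointer addressed before it was incremented. The following sketch uses that expression to count the characters in a string; countChars() is a name invented for this illustration and is not part of the DisplayString program.

```cpp
// countChars - (hypothetical helper) count the characters in a
//              null-terminated string by walking a pointer up
//              to the terminating null
int countChars(const char* pszString)
{
    int nCount = 0;

    // test the current character, then advance the pointer;
    // the loop ends when the character tested is the null
    while (*pszString++)
    {
        nCount++;
    }
    return nCount;
}
```

Calling countChars("Randy") returns 5, the same limit that was hard-coded into the for loop of DisplayString.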
The sometimes-cryptic nature of pointer-based manipulation of character strings might lead the reader to wonder, "Why?" That is, what advantage does the char* pointer version have over the easier-to-read index version?
The answer is partially (pre-)historic and partially human nature. When C, the progenitor to C++, was invented, compilers were pretty simplistic. These compilers could not perform the complicated optimizations that modern compilers can. As complicated as it might appear to the human reader, a statement such as *pszString++ could be converted into an amazingly small number of machine-level instructions even by a stupid compiler.
Older computer processors were not very fast by today's standards. In the early days of C, saving a few computer instructions was a big deal. This gave C a big advantage over other languages of the day, notably Fortran, which did not offer pointer arithmetic.
In addition to the efficiency factor, programmers like to generate clever program statements. After C++ programmers learn how to write compact and cryptic but efficient statements, there is no getting them back to accessing arrays with indices.
Do not generate complex C++ expressions to create a more efficient program. There is no obvious relationship between the number of C++ statements and the number of machine instructions generated.
It is not too hard to convince yourself that szTarget + n points to szTarget[n] when szTarget is an array of chars. After all, a char occupies a single byte. If szTarget is stored at 0x100, szTarget[5] is located at 0x105.
It is not so obvious that pointer addition works in exactly the same way for an int array because an int takes 4 bytes for each char's 1 byte (at least it does on a 32-bit Intel processor). If the first element in intArray were located at 0x100, then intArray[5] would be located at 0x114 (0x100 + (5 * 4) = 0x114) and not 0x105.
Fortunately for us, array + n points at array[n] no matter how large a single element of array might be. C++ takes care of the element size for us. It's clever that way.
Once again, the dusty old house analogy works here as well. (I mean dusty analogy, not dusty house.) The third house down from 123 Main is 126 Main, no matter how large the building might be, even if it's a hotel.
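You can see the scaling by measuring the same distance in bytes rather than in elements. This is a minimal sketch; bytesBetween() is a hypothetical helper written for this illustration, and it makes no assumption about how large an int is on your machine.

```cpp
#include <cstddef>

// bytesBetween - (hypothetical helper) the distance in bytes from
//                the first element of an int array to element n
std::ptrdiff_t bytesBetween(const int* array, int n)
{
    // this addition counts in ints, not in bytes
    const int* pElement = array + n;

    // viewing both addresses as char pointers makes the
    // subtraction count in single bytes
    return reinterpret_cast<const char*>(pElement)
         - reinterpret_cast<const char*>(array);
}
```

On a machine with 4-byte ints, bytesBetween(intArray, 5) returns 20 (that is, 0x14), even though the code added only 5 to the pointer.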
There are some differences between an array and a pointer. For one, the array allocates space for the data, whereas the pointer does not, as shown here:

    void arrayVsPointer()
    {
        // allocate storage for 128 characters
        char charArray[128];

        // allocate space for a pointer but not for
        // the thing pointed at
        char* pArray;
    }

Here charArray allocates room for 128 characters. pArray allocates only the amount of storage required by a pointer: 4 bytes on a 32-bit system and typically 8 bytes on a 64-bit system.
Consider the following example:

    char charArray[128];
    charArray[10] = '0';   // this works fine

    char* pArray;
    pArray[10] = '0';      // this writes into random location

The expression pArray[10] is syntactically equivalent to charArray[10], but pArray has not been initialized so pArray[10] references some random (garbage) location in memory.
The mistake of referencing memory with an uninitialized pointer variable is often caught by the CPU when the program executes, resulting in the dreaded segmentation violation error that from time to time issues from your favorite applications under your favorite, or not-so-favorite, operating system. This problem is not generally the fault of the processor or the operating system but of the application.
A second difference between a pointer and the address of an array is that charArray is a constant, whereas pArray is not. Thus, the following for loop used to initialize the array charArray does not work:

    void arrayVsPointer()
    {
        char charArray[10];
        for (int i = 0; i < 10; i++)
        {
            *charArray = ' ';   // this makes sense...
            charArray++;        // ...this does not
        }
    }

The expression charArray++ makes no more sense than 10++. The following version is correct:

    void arrayVsPointer()
    {
        char charArray[10];
        char* pArray = charArray;
        for (int i = 0; i < 10; i++)
        {
            *pArray = ' ';      // this works great
            pArray++;
        }
    }

C++ is completely quiet about what is and isn't a legal address, with one exception. C++ predefines the constant nullptr with the following properties:
It is a constant value.
It can be assigned to any pointer type.
It evaluates to false.
It is never a legal address.
The constant nullptr is used to indicate when a pointer has not been initialized. It is also often used to indicate the last element in an array of pointers in much the same way that a null character is used to terminate a character string.
Actually the keyword nullptr was introduced in the 2011 standard (C++11). Before that, the constant 0 was used to indicate a null pointer.
It is a safe practice to initialize pointers to nullptr (or 0 if your compiler doesn't support nullptr yet). You should also clear out the contents of a pointer to heap memory after you invoke delete to avoid deleting the same memory block twice:

    delete pHeap;      // return memory to the heap
    pHeap = nullptr;   // now clear out the pointer

Passing the same address to delete twice has undefined results and will usually cause your program to crash or corrupt the heap. Passing nullptr (or 0) to delete has no effect.
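The delete-then-clear habit can be wrapped in a small function so that the two steps are never separated. This is only a sketch: release() is a name invented for this illustration, and it assumes a compiler that supports nullptr.

```cpp
// release - (hypothetical helper) return an int to the heap and
//           clear the caller's pointer; the pointer is passed by
//           reference so that the caller's copy is the one cleared
void release(int*& pHeap)
{
    delete pHeap;      // has no effect if pHeap is already nullptr
    pHeap = nullptr;   // now clear out the pointer
}
```

Because the pointer is cleared on the first call, calling release() a second time on the same variable passes nullptr to delete, which does nothing.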
If pointers can point to arrays, it seems only fitting that the reverse should be true. Arrays of pointers are a type of array of particular interest.
Just as arrays may contain other data types, an array may contain pointers. The following declares an array of pointers to ints:

    int* pInts[10];

Given the preceding declaration, pInts[0] is a pointer to an int value. Thus, the following is true:

    void fn()
    {
        int n1;
        int* pInts[3];
        pInts[0] = &n1;
        *pInts[0] = 1;
    }

or

    void fn()
    {
        int n1, n2, n3;
        int* pInts[3] = {&n1, &n2, &n3};
        for (int i = 0; i < 3; i++)
        {
            *pInts[i] = 0;
        }
    }

or even

    void fn()
    {
        int* pInts[3] = {(new int), (new int), (new int)};
        for (int i = 0; i < 3; i++)
        {
            *pInts[i] = 0;
        }
    }

The last version allocates three int objects off the heap. This type of declaration isn't used very often except in the case of an array of pointers to character strings. The following two examples show why arrays of character strings are useful.
Suppose I need a function that returns the name of the month corresponding to an integer argument passed to it. For example, if the function is passed a 1, it returns a pointer to the string "January"; if 2, it returns "February", and so on. The month 0 and any numbers greater than 12 are assumed to be invalid. I could write the function as follows:

    // int2month() - return the name of the month
    const char* int2month(int nMonth)
    {
        const char* pszReturnValue;
        switch(nMonth)
        {
            case 1: pszReturnValue = "January";
                    break;
            case 2: pszReturnValue = "February";
                    break;
            case 3: pszReturnValue = "March";
                    break;
            // ...and so forth...
            default: pszReturnValue = "invalid";
        }
        return pszReturnValue;
    }

The switch() control command is like a sequence of if statements.
A more elegant solution uses the integer value for the month as an index into an array of pointers to the names of the months. In use, this appears as follows:

    // define an array containing the names of the months
    const char *const pszMonths[] = {"invalid",
                                     "January",
                                     "February",
                                     "March",
                                     "April",
                                     "May",
                                     "June",
                                     "July",
                                     "August",
                                     "September",
                                     "October",
                                     "November",
                                     "December"};

    // int2month() - return the name of the month
    const char* int2month(int nMonth)
    {
        // first check for a value out of range
        if (nMonth < 1 || nMonth > 12)
        {
            return "invalid";
        }

        // nMonth is valid - return the name of the month
        return pszMonths[nMonth];
    }

Here int2month() first checks to make sure that nMonth is a number between 1 and 12, inclusive (the default clause of the switch statement handled that in the previous example). If nMonth is valid, the function uses it as an offset into an array containing the names of the months.
This technique of referring to character strings by index is especially useful when writing your program to work in different languages. For example, a program may declare an array ptrMonths of pointers to Julian months in different languages. The program would initialize ptrMonths to the proper names, be they in English, French, or German (for example) at execution time. In that way, ptrMonths[1] points to the correct name of the first Julian month, irrespective of the language.
A program that demonstrates int2month() is included on the CD-ROM as DisplayMonths.
Now the truth can be told: what are all those funny argument declarations to main() in our program template? The second argument to main() is an array of pointers to null-terminated character strings. These strings contain the arguments to the program. The arguments to a program are the strings that appear with the program name when you launch it. These arguments are also known as parameters. The first argument to main() is the number of parameters passed to the program. For example, suppose that I entered the following command at the command prompt:
MyProgram file.txt /w
The operating system executes the program contained in the file MyProgram.exe, passing it the arguments file.txt and /w.
Consider the following simple program:

    // PrintArgs - write the arguments to the program
    //             to the standard output
    #include <cstdio>
    #include <cstdlib>
    #include <iostream>
    using namespace std;

    int main(int nNumberofArgs, char* pszArgs[])
    {
        // print a warning banner
        cout << "The arguments to "
             << pszArgs[0] << " are: ";

        // now write out the remaining arguments
        for (int i = 1; i < nNumberofArgs; i++)
        {
            cout << i << ":" << pszArgs[i] << " ";
        }

        // that's it
        cout << "That's it" << endl;

        // wait until user is ready before terminating program
        // to allow the user to see the program results
        system("PAUSE");
        return 0;
    }

As always, the function main() accepts two arguments. The first argument is an int that I have been calling (quite descriptively, as it turns out) nNumberofArgs. This variable is the number of arguments passed to the program. The second argument is an array of pointers of type char* that I have been calling pszArgs.
If I were to execute the PrintArgs program from the command prompt window as
PrintArgs arg1 arg2 arg3 /w
nNumberofArgs would be 5 (one for each argument, including the program name). The first argument is the name of the program itself. This could be anything from the simple "PrintArgs" to the slightly more complicated "PrintArgs.exe" to the full path; the C++ standard doesn't specify. The environment can even supply an empty string "" if it doesn't have access to the name of the program.
The remaining elements in pszArgs point to the program arguments. For example, the element pszArgs[1] points to "arg1" and pszArgs[2] to "arg2". Because Windows does not place any significance on "/w", this string is also passed as an argument to be processed by the program.
Actually C++ includes one final value. The last value in the array, the one after the pointer to the last argument to the program, contains nullptr.
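The terminating nullptr means that a function can walk the argument array without being told its length, in the same way that a loop can walk a character string up to its null. The following is a minimal sketch; countArgs() is a name invented for this illustration and is not part of PrintArgs.

```cpp
// countArgs - (hypothetical helper) count the entries in an array
//             of string pointers by walking a pointer up to the
//             terminating nullptr
int countArgs(char* pszArgs[])
{
    int nCount = 0;
    for (char** ppArg = pszArgs; *ppArg != nullptr; ppArg++)
    {
        nCount++;
    }
    return nCount;
}
```

Called from main() as countArgs(pszArgs), the function returns the same value that main() receives in nNumberofArgs.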
To demonstrate how argument passing works, you need to build the program from within Code::Blocks and then execute the program directly from a command prompt. First ensure that Code::Blocks has built an executable by opening the PrintArgs projects and choosing Build
Next open a command prompt window. If you are running Unix or Linux, you're already there. If you are running Windows, choose Programs
Now you need to use the CD command to navigate to the directory where Code::Blocks placed the PrintArgs program. If you used the default settings when installing Code::Blocks, that directory will be C:\CPP_Programs\Chap09\PrintArgs\bin\Debug.
You can now execute the program by typing its name followed by your arguments. The following shows what happened when I did it in Windows Vista:

    Microsoft Windows [Version 6.0.6001]
    Copyright (c) 2006 Microsoft Corporation. All rights reserved.

    C:\Users\Randy>cd \cpp_programs\chap09\printargs\bin\debug

    C:\CPP_Programs\Chap09\PrintArgs\bin\Debug>PrintArgs arg1 arg2 arg3 /n
    The arguments to PrintArgs are: 1:arg1 2:arg2 3:arg3 4:/n That's it
    Press any key to continue . . .

Wild cards such as *.* may or may not be expanded before being passed to the program; the standard is silent on this point. The Code::Blocks/gcc compiler included with this book does perform such expansion on Windows Vista, as the following example shows:

    C:\CPP_Programs\Chap09\PrintArgs>bin\debug\PrintArgs *.*
    The arguments to bin\debug\PrintArgs are: 1:bin 2:main.cpp 3:obj 4:PrintArgs.cbp That's it
    Press any key to continue . . .

Here you see the names of the files in the current directory in place of the *.* that I entered.
Wild-card expansion is performed under all forms of Unix and Linux as well. Wild-card expansion was specifically not performed under older versions of gcc and it isn't performed under Visual C++ Express.
You can add arguments to your program when you execute it from Code::Blocks as well. Choose Project
Windows passes arguments as a means of communicating with your program as well. Try the following experiment. Build your program as you would normally. Find the executable file using Windows Explorer. As noted earlier, the default location for the PrintArgs program is C:\CPP_Programs\Chap09\PrintArgs\bin\Debug. Now grab a file and drop it onto the filename. (It doesn't matter what file you choose because the program won't hurt it anyway.) Bam! The PrintArgs program starts right up, and the name of the file that you dropped on the program appears.
Now try again, but drop several files at once. Select multiple filenames while pressing the Ctrl key or by using the Shift key to select a group. Now drag the lot of them onto PrintArgs.exe and let go. The name of each file appears as output.
I dropped a few of the files that appear in my Program Files\WinZip folder onto PrintArgs as an example:

    The arguments to C:\CPP_Programs\Chap09\PrintArgs\bin\Debug\PrintArgs.exe are: 1:C:\Program Files\WinZip\VENDOR.TXT 2:C:\Program Files\WinZip\WHATSNEW.TXT 3:C:\Program Files\WinZip\WINZIP.CHM 4:C:\Program Files\WinZip\WINZIP.TXT 5:C:\Program Files\WinZip\WINZIP32.EXE 6:C:\Program Files\WinZip\WZ.COM 7:C:\Program Files\WinZip\WZ.PIF 8:C:\Program Files\WinZip\WZ32.DLL 9:C:\Program Files\WinZip\WZCAB.DLL 10:C:\Program Files\WinZip\WZCAB3.DLL 11:C:\Program Files\WinZip\FILE_ID.DIZ 12:C:\Program Files\WinZip\LICENSE.TXT 13:C:\Program Files\WinZip\ORDER.TXT 14:C:\Program Files\WinZip\README.TXT That's it
    Press any key to continue . . .

Notice that the name of each file appears as a single argument, even though the filename may include spaces. Also note that Windows passes the full path name of the file.