One of the most powerful tools available to a C++ programmer is the pointer. Pointers provide the capability to manipulate computer memory directly. That power comes at a price: Pointers are one of the most difficult aspects of C++ for many beginners to learn.
A variable is an object that can hold a value. An integer variable holds a number. A character variable holds a letter. A pointer is a variable that holds a memory address.
Okay, so what is a memory address? To fully understand this, you must know a little about computer memory.
Computer memory is where variable values are stored. By convention, computer memory is divided into sequentially numbered memory locations. Each of these locations is a memory address.
Every variable of every type is located at a unique address in memory. Figure 10.1 shows a schematic representation of the storage of an unsigned long integer variable, theAge.
Different computers number the memory using different complex schemes. Usually, programmers don’t need to know the particular address of any given variable because the compiler handles the details. If you want this information, you can use the address-of operator (&), which is illustrated in the Addresser program in Listing 10.1.
1: #include <iostream>
2:
3: int main()
4: {
5:     unsigned short shortVar = 5;
6:     unsigned long longVar = 65535;
7:     long sVar = -65535;
8:
9:     std::cout << "shortVar: " << shortVar;
10:     std::cout << "\tAddress of shortVar: " << &shortVar << "\n";
11:     std::cout << "longVar: " << longVar;
12:     std::cout << "\tAddress of longVar: " << &longVar << "\n";
13:     std::cout << "sVar: " << sVar;
14:     std::cout << "\tAddress of sVar: " << &sVar << "\n";
15:
16:     return 0;
17: }
Addresser produces output such as the following:
shortVar: 5 Address of shortVar: 0x8fc9:fff4
longVar: 65535 Address of longVar: 0x8fc9:fff2
sVar: -65535 Address of sVar: 0x8fc9:ffee
The actual address of each variable will differ because each computer will store variables at different addresses, depending on what else is in memory and how much memory is available.
The special character \t in Listing 10.1 causes a tab character to be inserted in the output. This is a simple way of creating columns. There are other useful characters like this in addition to the newline character \n.
The \\ character is used to display a backslash.
The \" character is used to display a double quote.
The \' character is used to display a single quote.
Three variables are declared and initialized: an unsigned short in line 5, an unsigned long in line 6, and a long in line 7. Their values and addresses are displayed in lines 9–14 by using the address-of operator (&).
The value of shortVar is 5 (as expected), and its address is 0x8fc9:fff4. This complicated address is computer specific and can change slightly each time the program is run. Your results will differ. In the sample output, the first two addresses are 2 bytes apart, which is what you see if your computer uses 2-byte short integers. The second and third are 4 bytes apart, which is what you see if your computer uses 4-byte long integers. Figure 10.2 illustrates how the variables in this program would be stored in memory. (Note that on some computers the difference will be 4 bytes on both, depending on how your compiler is configured.)
There is no reason why you would need to know the actual numeric value of the address of each variable. What you care about is that each one has an address and that the right amount of memory is set aside.
How does the compiler know how much memory each variable needs? You tell the compiler how much memory to allow for your variables by declaring the variable’s type.
Therefore, if you declare your variable to be of type unsigned long, the compiler knows how much memory to set aside: 4 bytes on a system where an unsigned long takes 4 bytes. The compiler takes care of assigning the actual address.
When a pointer is allocated, the compiler assigns enough memory to hold an address in your hardware and operating system environment. The size of a pointer might or might not be the same size as an integer, so be sure you make no assumptions.
Every variable has an address. Even without knowing the specific address of a given variable, you can store that address in a pointer.
For example, suppose that the variable howOld is an integer. To declare a pointer called pAge to hold its address, you write the following statement:
int *pAge = NULL;
This declares pAge to be a pointer to int. That is, pAge is declared to hold the address of an int.
Pointers can have any name that is legal for other variables. This book follows the convention of naming all pointers with an initial p and a second letter capitalized, as in pAge.
Note that pAge is a variable like any other variable. When you declare an integer variable, it is set up to hold an integer. When you declare a pointer variable like pAge, it is set up to hold an address. A pointer is just a special type of variable that holds the address of an object in memory; in this case, pAge is holding the address of an integer variable.
You declare the type of variable you want the pointer to point to. This tells the compiler how to treat the memory at the location the pointer points to. The pointer itself contains an address.
In this example, pAge is initialized to NULL. A pointer whose value is NULL is called a null pointer. All pointers, when they are created, should be initialized to something. If you don’t know what you want to assign to the pointer, assign NULL. A pointer that is not initialized is called a wild pointer. Wild pointers are dangerous because the address they hold is unpredictable.
You also might see pointers initialized to 0 like this:
int *pAge = 0;
The result should be the same as if you initialized it to NULL, but technically 0 is an integer constant, and NULL is an address constant of 0.
Did you Know?
The next version of C++, C++0x (since published as C++11), has a new nullptr constant that represents a null pointer. When your C++ compiler supports this new version, use nullptr instead of 0 or NULL.
If you initialize the pointer to 0 or NULL, you must specifically assign the address of howOld to pAge. Here’s code that shows how to do that:
unsigned short int howOld = 50; // make a variable
unsigned short int *pAge = 0; // make a pointer
pAge = &howOld; // put howOld's address in pAge
The first line creates a variable, howOld, whose type is unsigned short int, and initializes it with the value 50. The second line declares pAge to be a pointer to type unsigned short int and initializes the address to 0. You know that pAge is a pointer because of the asterisk (*) after the variable type and before the variable name.
The third and final line assigns the address of howOld to the pointer pAge. You can tell that the address of howOld is being assigned to the pointer because of the address-of operator (&). If the address-of operator were not used, the value of howOld would be assigned instead of its address. That value might be a valid address somewhere in memory, but that would be entirely a coincidence.
By the Way
Assigning a nonpointer to a pointer variable is a common error. Fortunately, the compiler will detect this and fail with an “invalid conversion” error.
At this point, pAge has as its value the address of howOld. howOld, in turn, has the value 50. You could have accomplished this with fewer steps:
unsigned short int howOld = 50; // make a variable
unsigned short int *pAge = &howOld; // make pointer to howOld
pAge is a pointer that now contains the address of the howOld variable. Using pAge, you actually can determine the value of howOld, which in this case is 50. Accessing howOld by using the pointer pAge is called indirection because you are indirectly accessing howOld by means of pAge. Later this hour you see how to use indirection to access a variable’s value.
Indirection accesses the value at the address held by a pointer. The pointer provides an indirect way to get the value held at that address.
The indirection operator (*) also is called the dereference operator. When a pointer is dereferenced, the value at the address stored by the pointer is retrieved. Consider the following statements to assign one variable’s value to another:
unsigned short int howOld = 50;
unsigned short int yourAge;
yourAge = howOld;
A pointer provides indirect access to the value of the variable whose address it stores. To assign the value in howOld to the new variable yourAge by way of the pointer pAge, you write the following:
unsigned short int howOld = 50; // create the variable howOld
unsigned short int *pAge = &howOld; // pAge points to the address of howOld
unsigned short int yourAge; // create another variable
yourAge = *pAge; // assign value at pAge (50) to yourAge
The indirection operator (*) in front of the variable pAge means “the value stored at.” This assignment says, “Take the value stored at the address in pAge and assign it to yourAge.” Another way of thinking about it is “don’t affect the pointer, affect the item stored at the address in the pointer.”
By the Way
The indirection operator (*) is used in two distinct ways with pointers: declaration and dereference. When a pointer is declared, the star indicates that it is a pointer, not a normal variable. For example:
unsigned short *pAge = NULL; // make a pointer to an unsigned short
When the pointer is dereferenced, the indirection operator indicates that the value at the memory location stored in the pointer is to be accessed, rather than the address itself:
*pAge = 5; // assign 5 to the value at pAge
Also note that this same character (*) is used as the multiplication operator. The compiler knows which operator to call based on context.
We deal with indirection in our daily lives all the time. If you want to call the local pizza shop to order dinner but do not know their phone number, you go to the phone book to look it up. That information source is not the pizza shop, but it contains the “address” (phone number) of the pizza shop. When you do that, you perform indirection!
It is important to distinguish between a pointer, the address that the pointer holds, and the value at the address held by the pointer. This is the source of much of the confusion about pointers.
Consider the following code fragment:
int theVariable = 5;
int *pPointer = &theVariable;
theVariable is declared to be an integer variable initialized with the value 5. pPointer is declared to be a pointer to an integer; it is initialized with the address of theVariable. The address that pPointer holds is the address of theVariable. The value at the address that pPointer holds is 5. Figure 10.3 shows a schematic representation of theVariable and pPointer.
After a pointer is assigned the address of a variable, you can use that pointer to access the data in that variable. The Pointer program in Listing 10.2 demonstrates how the address of a local variable is assigned to a pointer and how the pointer manipulates the values in that variable.
1: #include <iostream>
2:
3: int main()
4: {
5:     int myAge; // a variable
6:     int *pAge = NULL; // a pointer
7:
8:     myAge = 5;
9:     pAge = &myAge; // assign address of myAge to pAge
10:     std::cout << "myAge: " << myAge << "\n";
11:     std::cout << "*pAge: " << *pAge << "\n";
12:
13:     std::cout << "*pAge = 7\n";
14:     *pAge = 7; // sets myAge to 7
15:     std::cout << "*pAge: " << *pAge << "\n";
16:     std::cout << "myAge: " << myAge << "\n";
17:
18:     std::cout << "myAge = 9\n";
19:     myAge = 9;
20:     std::cout << "myAge: " << myAge << "\n";
21:     std::cout << "*pAge: " << *pAge << "\n";
22:
23:     return 0;
24: }
Here’s this program’s output:
myAge: 5
*pAge: 5
*pAge = 7
*pAge: 7
myAge: 7
myAge = 9
myAge: 9
*pAge: 9
This program declares two variables: an int, myAge, and a pointer, pAge, which is a pointer to int and which holds the address of myAge. myAge is assigned the value 5 in line 8; this is verified by the display in line 10.
In line 9, pAge is assigned the address of myAge. In line 11, pAge is dereferenced and displayed, showing that the value at the address that pAge stores is the 5 stored in myAge. In line 14, the value 7 is assigned to the variable at the address stored in pAge. This sets myAge to 7, and the displays in lines 15 and 16 confirm this.
In line 19, the value 9 is assigned to the variable myAge. This value is obtained directly in line 20 and indirectly, by dereferencing pAge, in line 21.
Pointers enable you to manipulate addresses without ever knowing their real value. After this hour, you’ll take it on faith that when you assign the address of a variable to a pointer, the pointer really has the address of that variable as its value. But just this once, why not check to make sure? The PointerCheck program in Listing 10.3 puts a pointer to the test.
1: #include <iostream>
2:
3: int main()
4: {
5:     unsigned short int myAge = 5, yourAge = 10;
6:     unsigned short int *pAge = &myAge; // a pointer
7:
8:     std::cout << "myAge: " << myAge;
9:     std::cout << " yourAge: " << yourAge << "\n";
10:     std::cout << "&myAge: " << &myAge;
11:     std::cout << " &yourAge: " << &yourAge << "\n";
12:
13:     std::cout << "pAge: " << pAge << "\n";
14:     std::cout << "*pAge: " << *pAge << "\n";
15:
16:     pAge = &yourAge; // reassign the pointer
17:
18:     std::cout << "myAge: " << myAge;
19:     std::cout << " yourAge: " << yourAge << "\n";
20:     std::cout << "&myAge: " << &myAge;
21:     std::cout << " &yourAge: " << &yourAge << "\n";
22:
23:     std::cout << "pAge: " << pAge << "\n";
24:     std::cout << "*pAge: " << *pAge << "\n";
25:
26:     std::cout << "&pAge: " << &pAge << "\n";
27:     return 0;
28: }
This program produces the following output:
myAge: 5 yourAge: 10
&myAge: 1245066 &yourAge: 1245064
pAge: 1245066
*pAge: 5
myAge: 5 yourAge: 10
&myAge: 1245066 &yourAge: 1245064
pAge: 1245064
*pAge: 10
&pAge: 1245060
Your output will differ because each computer stores variables at different addresses, depending on what else is in memory and how much memory is available.
In line 5, myAge and yourAge are declared to be variables of type unsigned short integer. In line 6, pAge is declared to be a pointer to an unsigned short integer, and it is initialized with the address of the variable myAge.
Lines 8–11 print the values and the addresses of myAge and yourAge. Line 13 displays the contents of pAge, which is the address of myAge. Line 14 displays the result of dereferencing pAge, which displays the value at pAge: the value in myAge, or 5.
This is the essence of pointers. Line 13 shows that pAge stores the address of myAge, and line 14 shows how to get the value stored in myAge by dereferencing the pointer pAge. Make sure that you understand this fully before you go on. Study the code and look at the output.
In line 16, pAge is reassigned to point to the address of yourAge. The values and addresses are displayed again. The output shows that pAge now has the address of the variable yourAge, and that dereferencing obtains the value in yourAge.
Line 26 displays the address of pAge itself. Like any variable, it too has an address, and that address can be stored in a pointer. (Assigning the address of a pointer to another pointer will be discussed shortly.)
So far, you’ve seen step-by-step details of assigning a variable’s address to a pointer. In practice, though, you would never do this. After all, why bother with a pointer when you already have a variable with access to that value? The only reason for this kind of pointer manipulation of a variable is to demonstrate how pointers work.
Now that you are comfortable with the syntax of pointers, you can put them to better use. Pointers are employed most often for three tasks:
• Managing data on the heap
• Accessing class member data and functions
• Passing variables by reference to functions
The rest of this hour focuses on managing data on the heap and accessing class member data and functions. In Hour 12, “Creating References,” you learn about passing variables by reference.
Programmers generally deal with five areas of memory:
• Global name space
• The heap
• Registers
• Code space
• The stack
Local variables are on the stack, along with function parameters. Code is in code space, of course, and global variables are in global name space. The registers are used for internal housekeeping functions, such as keeping track of the top of the stack and the instruction pointer. Just about all remaining memory is given over to the heap, which is sometimes referred to as the free store.
The problem with local variables is that they don’t persist. When the function returns, the local variables are thrown away. Global variables solve that problem at the cost of being accessible without restriction throughout the program, which leads to the creation of bug-prone code that is more difficult to understand and maintain. Putting data in the heap solves both of these problems.
You can think of the heap as a massive section of memory in which thousands of sequentially numbered cubbyholes lie waiting for your data. You can’t label these cubbyholes, though, as you can with the stack. You must ask for the address of the cubbyhole that you reserve and then stash that address away in a pointer.
One way to think about this is with an analogy: A friend gives you the 800 number for Acme Mail Order. You go home and program your telephone with that number, and then you throw away the piece of paper with the number on it.
When you push the button, a telephone rings somewhere, and Acme Mail Order answers. You don’t remember the number, and you don’t know where the other telephone is located, but the button gives you access to Acme Mail Order.
Acme Mail Order is your data on the heap. You don’t know where it is, but you know how to get to it. You access it by using its address—in this comparison, the telephone number. You don’t have to know that number; you just have to put it into a pointer—the speed-dial button. The pointer gives you access to your data without bothering you with the details.
The stack is cleaned automatically when a function returns. All the local variables go out of scope, and they are removed from the stack. The heap is not cleaned until your program ends, and it is your responsibility to free any memory that you’ve reserved when you are done with it. Leaving items hanging around in the heap when you no longer need them is known as a memory leak, a topic covered later in this hour.
The advantage to the heap is that the memory you reserve remains available until you explicitly free it. If you reserve memory on the heap while in a function, the memory is still available when the function returns.
The advantage of accessing memory in this way, rather than using global variables, is that only functions with access to the pointer have access to the data. This provides a tightly controlled interface to that data, and it eliminates the problem of one function changing that data in unexpected and unanticipated ways.
For this to work, you must be able to create a pointer to an area on the heap. The following sections describe how to do this.
The new Keyword
You allocate memory on the heap in C++ by using the new keyword. new is followed by the type of the object that you want to allocate so that the compiler knows how much memory is required. Therefore, on a system with 2-byte short integers and 4-byte long integers, new unsigned short int allocates 2 bytes in the heap, and new long allocates 4.
The return value from new is a memory address. It must be assigned to a pointer. To create an unsigned short on the heap, you might write the following:
unsigned short int *pPointer;
pPointer = new unsigned short int;
You can, of course, initialize the pointer at its creation:
unsigned short int *pPointer = new unsigned short int;
In either case, pPointer now points to an unsigned short int on the heap. You can use this like any other pointer to a variable and assign a value into that area of memory:
*pPointer = 72;
This means “put 72 at the value in pPointer” or “assign the value 72 to the area on the heap to which pPointer points.”
If new cannot create memory on the heap, which can happen because memory is a limited resource, it throws an exception of type std::bad_alloc. Exceptions are error-handling objects covered in detail in Hour 24, “Dealing with Exceptions and Error Handling.”
By the Way
Some older compilers return the null pointer. If you have an older compiler, check your pointer for null each time you request new memory. All modern compilers can be counted on to throw an exception.
The delete Keyword
When you have finished with your area of memory, you must call delete on the pointer, which returns the memory to the heap. Remember that the pointer itself, as opposed to the memory it points to, is a local variable. When the function in which it is declared returns, that pointer goes out of scope and is lost. The memory allocated with the new operator is not freed automatically, however. That memory becomes unavailable, a situation called a memory leak. It’s called a memory leak because that memory can’t be recovered until the program ends. It is as though the memory has leaked out of your computer.
To restore the memory to the heap, you use the keyword delete. For example:
delete pPointer;
When you delete the pointer, what you are really doing is freeing up the memory whose address is stored in the pointer. You are saying, “Return to the heap the memory that this pointer points to.” The pointer is still a pointer, and it can be reassigned.
When you call delete on a pointer, the memory it points to is freed. Calling delete on that pointer again is undefined behavior and will probably crash your program! When you delete a pointer, set it to NULL. Calling delete on a null pointer is guaranteed to be safe. For example:
Animal *pDog = new Animal;
delete pDog; // frees the memory
pDog = NULL; // sets pointer to null
// ...
delete pDog; // harmless
Don’t worry if the preceding code looks a little confusing. We’ll look at allocating objects on the heap in the next hour. This also works with built-in data types like int, as shown here:
int *pNumber = new int;
delete pNumber; // frees the memory
pNumber = 0; // sets pointer to null
// ...
delete pNumber; // harmless
The Heap program in Listing 10.4 demonstrates allocating a variable on the heap, using that variable, and deleting it.
1: #include <iostream>
2:
3: int main()
4: {
5:     int localVariable = 5;
6:     int *pLocal = &localVariable;
7:     int *pHeap = new int;
8:     if (pHeap == NULL) // only older compilers return null here
9:     {
10:         std::cout << "Error! No memory for pHeap!!";
11:         return 1;
12:     }
13:     *pHeap = 7;
14:     std::cout << "localVariable: " << localVariable << "\n";
15:     std::cout << "*pLocal: " << *pLocal << "\n";
16:     std::cout << "*pHeap: " << *pHeap << "\n";
17:     delete pHeap;
18:     pHeap = new int;
19:     if (pHeap == NULL)
20:     {
21:         std::cout << "Error! No memory for pHeap!!";
22:         return 1;
23:     }
24:     *pHeap = 9;
25:     std::cout << "*pHeap: " << *pHeap << "\n";
26:     delete pHeap;
27:     return 0;
28: }
The program has the following output:
localVariable: 5
*pLocal: 5
*pHeap: 7
*pHeap: 9
Line 5 declares and initializes a local variable. Line 6 declares and initializes a pointer with the address of the local variable. Line 7 declares another pointer but initializes it with the result obtained from calling new int. This allocates space on the heap for an int. Line 13 assigns the value 7 to the newly allocated memory. Line 14 displays the value of the local variable, and line 15 prints the value pointed to by pLocal. As expected, these are the same. Line 16 prints the value pointed to by pHeap. It shows that the value assigned in line 13 is, in fact, accessible.
In line 17, the memory allocated in line 7 is returned to the heap by a call to delete. This frees the memory. pHeap still holds the old address, but that address must no longer be used, and pHeap is now free to point to other memory. It is reassigned in lines 18–24, and line 25 displays the result. Line 26 again restores that memory to the heap.
By the Way
Although line 26 is redundant (the end of the program would have returned that memory), it is a good idea to free this memory explicitly. If the program changes or is extended, it will be beneficial that this step was already taken care of.
Another way you might inadvertently create a memory leak is by reassigning your pointer before deleting the memory to which it points. Consider this code fragment:
1: unsigned short int *pPointer = new unsigned short int;
2: *pPointer = 72;
3: pPointer = new unsigned short int;
4: *pPointer = 84;
Line 1 in this fragment creates pPointer and assigns it the address of an area on the heap. Line 2 stores the value 72 in that area of memory. Line 3 reassigns pPointer to another area of memory. Line 4 places the value 84 in that area. The original area, in which the value 72 is still held, is unavailable because the pointer to that area of memory has been reassigned. There is no way to access that original area of memory, nor is there any way to free it before the program ends.
The code should have been written like this:
1: unsigned short int *pPointer = new unsigned short int;
2: *pPointer = 72;
3: delete pPointer;
4: pPointer = new unsigned short int;
5: *pPointer = 84;
Now the memory originally pointed to by pPointer is deleted, and thus freed, in line 3 of the preceding fragment.
By the Way
For every time in your program that you call new, there should be a call to delete. It is important to keep track of which pointer owns an area of memory and to ensure that the memory is returned to the heap when you are done with it.
This hour was the first of two devoted to pointers, a subject that trips up more beginning C++ programmers than any other aspect of the language.
Variable values are stored in computer memory, which is organized into sequential memory locations. Each location is a memory address. Pointers are special variables that hold one of those addresses.
Pointers make it possible to manipulate computer memory directly in a program. When you know the memory address of data, you don’t have to use a variable to access that data. You can work with a pointer to that address instead.
There are tasks where it makes more sense to use pointers than variables. Pointers are one of the most powerful parts of the C++ language.
If they’re still a point of confusion, you will find they make more sense the further you progress through the book.
Q. Why are pointers so important?
A. As you saw during this hour, pointers are important because they are used to hold the address of objects on the heap and pass arguments by reference. In addition, in Hour 14, “Calling Advanced Functions,” you’ll see how pointers are used in class polymorphism.
Q. Why should I bother to declare anything on the heap?
A. Objects on the heap persist after the return of a function. In addition, the capability to store objects on the heap enables you to decide at runtime how many objects you need, instead of having to declare this in advance. This is explored in greater depth in Hour 11, “Developing Advanced Pointers.”
Q. If George Washington had accepted the offer to become the king of the United States, who would be the country’s king today?
A. Assuming that the United States followed the same rules of succession as England, the current king would be Paul Emery Washington, 85, a retired building supply company manager in San Antonio, Texas.
George Washington had no blood descendants, so when he died in 1799 the throne would pass to one of his brothers’ children. Some other Washingtons in the line also died childless or had no living male descendants, so the issue becomes complicated.
When the genealogical web site Ancestry.Com researched the American king succession in 2008, they found 8,000 of George Washington’s relatives could factor into the decision. There were four likely succession paths, and Paul Washington was at the end of two of them.
“I doubt if I’d be a very good king,” Paul told NBC’s Today Show. “We’ve done so well as a country without a king, so I think George made the best decision.”
Though he rejects his monarchial birthright, Paul has started calling his son Bill “Prince William.”
Now that you’ve learned about pointers, you can answer a few questions and complete a couple of exercises to firm up your knowledge.
1. What is the difference between 0 and NULL when initializing a pointer?
A. NULL creates a null pointer.
B. 0 creates a null pointer.
C. Both create null pointers.
2. What is it called when you don’t free heap space after you’re done with it?
A. A memory leak
B. A memory hole
C. A memory fault
3. How do I free up memory allocated with new?
A. return
B. delete
C. *
1. C. Both 0 and NULL initialize a pointer to address zero, making it a null pointer. Using NULL is clearer because it’s obviously a pointer, and 0 serves many other purposes in C++. When supported, the new constant nullptr should be used instead of 0 or NULL.
2. A. Memory leak. The program continues to allocate new space as it needs it, but less and less memory is available.
3. B. Use the delete keyword. It is good practice to delete as soon as you are done with the contents of a variable on the heap.
1. Modify the PointerCheck program to multiply yourAge and *pAge and store the result in a new variable. Display that variable. Think about how the compiler can tell the difference between the * operator for multiplication and * for dereferencing pAge.
2. Further modify PointerCheck to use dereferenced pointer *pAge to change the contents of myAge or yourAge.
To see solutions to these activities, visit this book’s website at http://cplusplus.cadenhead.org.