char Type: Characters and Small Integers
It’s time to turn to the final integer type: char. As you probably suspect from its name, the char type is designed to store characters, such as letters and numeric digits. Now, whereas storing numbers is no big deal for computers, storing letters is another matter. Programming languages take the easy way out by using number codes for letters. Thus, the char type is another integer type. It’s guaranteed to be large enough to represent the entire range of basic symbols (all the letters, digits, punctuation, and the like) for the target computer system. In practice, many systems support fewer than 128 kinds of characters, so a single byte can represent the whole range. Therefore, although char is most often used to handle characters, you can also use it as an integer type that is typically smaller than short.
The most common symbol set in the United States is the ASCII character set, described in Appendix C, “The ASCII Character Set.” A numeric code (the ASCII code) represents each character in the set. For example, 65 is the code for the character A, and 77 is the code for the character M. For convenience, this book assumes ASCII code in its examples. However, a C++ implementation uses whatever code is native to its host system; for example, EBCDIC (pronounced “eb-se-dik”) on an IBM mainframe. Neither ASCII nor EBCDIC serves international needs that well, and C++ supports a wide-character type that can hold a larger range of values, such as are used by the international Unicode character set. You’ll learn about this wchar_t type later in this chapter.
Try the char type in Listing 3.5.
// chartype.cpp -- the char type
#include <iostream>
int main( )
{
    using namespace std;
    char ch;        // declare a char variable
    cout << "Enter a character: " << endl;
    cin >> ch;
    cout << "Hola! ";
    cout << "Thank you for the " << ch << " character." << endl;
    return 0;
}
Here’s the output from the program in Listing 3.5:
Enter a character:
M
Hola! Thank you for the M character.
The interesting thing is that you type an M, not the corresponding character code, 77. Also, the program prints an M, not 77. Yet if you peer into memory, you find that 77 is the value stored in the ch variable. The magic, such as it is, lies not in the char type but in cin and cout. These worthy facilities make conversions on your behalf. On input, cin converts the keystroke input M to the value 77. On output, cout converts the value 77 to the displayed character M; cin and cout are guided by the type of variable. If you place the same value 77 into an int variable, cout displays it as 77. (That is, cout displays two 7 characters.) Listing 3.6 illustrates this point. It also shows how to write a character literal in C++: Enclose the character within two single quotation marks, as in 'M'. (Note that the example doesn’t use double quotation marks. C++ uses single quotation marks for a character and double quotation marks for a string. The cout object can handle either, but, as Chapter 4 discusses, the two are quite different from one another.) Finally, the program introduces a cout feature, the cout.put() function, which displays a single character.
// morechar.cpp -- the char type and int type contrasted
#include <iostream>
int main()
{
    using namespace std;
    char ch = 'M';      // assign ASCII code for M to ch
    int i = ch;         // store same code in an int
    cout << "The ASCII code for " << ch << " is " << i << endl;
    cout << "Add one to the character code:" << endl;
    ch = ch + 1;        // change character code in ch
    i = ch;             // save new character code in i
    cout << "The ASCII code for " << ch << " is " << i << endl;
    // using the cout.put() member function to display a char
    cout << "Displaying char ch using cout.put(ch): ";
    cout.put(ch);
    // using cout.put() to display a char constant
    cout.put('!');
    cout << endl << "Done" << endl;
    return 0;
}
Here is the output from the program in Listing 3.6:
The ASCII code for M is 77
Add one to the character code:
The ASCII code for N is 78
Displaying char ch using cout.put(ch): N!
Done
In the program in Listing 3.6, the notation 'M' represents the numeric code for the M character, so initializing the char variable ch to 'M' sets ch to the value 77. The program then assigns the identical value to the int variable i, so both ch and i have the value 77. Next, cout displays ch as M and i as 77. As previously stated, a value’s type guides cout as it chooses how to display that value, which is just another example of smart objects.
Because ch is really an integer, you can apply integer operations to it, such as adding 1. This changes the value of ch to 78. The program then resets i to the new value. (Equivalently, you can simply add 1 to i.) Again, cout displays the char version of that value as a character and the int version as a number.
The fact that C++ represents characters as integers is a genuine convenience that makes it easy to manipulate character values. You don’t have to use awkward conversion functions to convert characters to ASCII and back.
Even digits entered via the keyboard are read as characters. Consider the following sequence:
char ch;
cin >> ch;
If you type 5 and Enter, this code reads the 5 character and stores the character code for the 5 character (53 in ASCII) in ch. Now consider this code:
int n;
cin >> n;
The same input results in the program reading the 5 character and running a routine converting the character to the corresponding numeric value of 5, which gets stored in n.
Finally, the program uses the cout.put() function to display both ch and a character constant.
cout.put()
Just what is cout.put(), and why does it have a period in its name? The cout.put() function is your first example of an important C++ OOP concept, the member function. Remember that a class defines how to represent data and how to manipulate it. A member function belongs to a class and describes a method for manipulating class data. The ostream class, for example, has a put() member function that is designed to output characters. You can use a member function only with a particular object of that class, such as the cout object, in this case. To use a class member function with an object such as cout, you use a period to combine the object name (cout) with the function name (put()). The period is called the membership operator. The notation cout.put() means to use the class member function put() with the class object cout. You’ll learn about this in greater detail when you reach classes in Chapter 10, “Objects and Classes.” Now the only classes you have are the istream and ostream classes, and you can experiment with their member functions to get more comfortable with the concept.
The cout.put() member function provides an alternative to using the << operator to display a character. At this point you might wonder why there is any need for cout.put(). Much of the answer is historical. Before Release 2.0 of C++, cout would display character variables as characters but display character constants, such as 'M' and 'N', as numbers. The problem was that earlier versions of C++, like C, stored character constants as type int. That is, the code 77 for 'M' would be stored in a 16-bit or 32-bit unit. Meanwhile, char variables typically occupied 8 bits. A statement like the following copied 8 bits (the important 8 bits) from the constant 'M' to the variable ch:
char ch = 'M';
Unfortunately, this meant that, to cout, 'M' and ch looked quite different from one another, even though both held the same value. So a statement like the following would print the ASCII code for the $ character rather than simply display $:
cout << '$';
But the following would print the character, as desired:
cout.put('$');
Now, after Release 2.0, C++ stores single-character constants as type char, not type int. Therefore, cout now correctly handles character constants.
The cin object has a couple of different ways of reading characters from input. You can explore these by using a program that uses a loop to read several characters, so we’ll return to this topic when we cover loops in Chapter 5, “Loops and Relational Expressions.”
char Literals
You have several options for writing character literals in C++. The simplest choice for ordinary characters, such as letters, punctuation, and digits, is to enclose the character in single quotation marks. This notation stands for the numeric code for the character. For example, an ASCII system has the following correspondences:
• 'A' is 65, the ASCII code for A.
• 'a' is 97, the ASCII code for a.
• '5' is 53, the ASCII code for the digit 5.
• ' ' is 32, the ASCII code for the space character.
• '!' is 33, the ASCII code for the exclamation point.
Using this notation is better than using the numeric codes explicitly. It’s clearer, and it doesn’t assume a particular code. If a system uses EBCDIC, then 65 is not the code for A, but 'A' still represents the character.
There are some characters that you can’t enter into a program directly from the keyboard. For example, you can’t make the newline character part of a string by pressing the Enter key; instead, the program editor interprets that keystroke as a request for it to start a new line in your source code file. Other characters have difficulties because the C++ language imbues them with special significance. For example, the double quotation mark character delimits string literals, so you can’t just stick one in the middle of a string literal. C++ has special notations, called escape sequences, for several of these characters, as shown in Table 3.2. For example, \a represents the alert character, which beeps your terminal’s speaker or rings its bell. The escape sequence \n represents a newline. And \" represents the double quotation mark as an ordinary character instead of a string delimiter. You can use these notations in strings or in character constants, as in the following examples:
char alarm = '\a';
cout << alarm << "Don't do that again!\a\n";
cout << "Ben \"Buggsie\" Hacker\nwas here!\n";
The last line produces the following output:
Ben "Buggsie" Hacker
was here!
Note that you treat an escape sequence, such as \n, just as a regular character, such as Q. That is, you enclose it in single quotes to create a character constant and don’t use single quotes when including it as part of a string.
The escape sequence concept dates back to when people communicated with computers using the teletype, an electromechanical typewriter-printer, and modern systems don’t always honor the complete set of escape sequences. For example, some systems remain silent for the alarm character.
The newline character provides an alternative to endl for inserting new lines into output. You can use the newline character in character constant notation ('\n') or as a character in a string ("\n"). All three of the following move the screen cursor to the beginning of the next line:
cout << endl;   // using the endl manipulator
cout << '\n';   // using a character constant
cout << "\n";   // using a string
You can embed the newline character in a longer string; this is often more convenient than using endl. For example, the following two cout statements produce the same output:
cout << endl << endl << "What next?" << endl << "Enter a number:" << endl;
cout << "\n\nWhat next?\nEnter a number:\n";
When you’re displaying a number, endl is a bit easier to type than "\n" or '\n', but when you’re displaying a string, ending the string with a newline character requires less typing:
cout << x << endl;    // easier than cout << x << "\n";
cout << "Dr. X.\n";   // easier than cout << "Dr. X." << endl;
Finally, you can use escape sequences based on the octal or hexadecimal codes for a character. For example, Ctrl+Z has an ASCII code of 26, which is 032 in octal and 0x1a in hexadecimal. You can represent this character with either of the following escape sequences: