Chapter 5
In This Chapter
Defining character variables and constants
Encoding characters
Declaring a string
Outputting characters to the console
Chapter 4 introduces the concept of the integer variable. This chapter introduces the integer’s smaller sibling, the character, or char to us insiders (pronounced variously as care, chair, or as in the first syllable of charcoal). I use characters in programs that appear in earlier chapters; this chapter introduces them formally.
Character variables are declared just like integers except with the keyword char in place of int:
char inputCharacter;
Character constants are defined as a single character enclosed in single quotes, as in the following:
char letterA = 'A';
This may seem like a silly question, but what exactly is ‘A’? To answer that, I need to explain what it means to encode characters.
As mentioned in Chapter 1, everything in the computer is represented by a pattern of ones and zeros — variations in voltage that are interpreted as numbers. Thus the bit pattern 0000 0001 is the number 1 when interpreted as an integer. However, this same bit pattern means something completely different when interpreted as an instruction by the processor. So it should come as no surprise that the computer encodes the characters of the alphabet by assigning each a number.
Consider the character ‘A’. You could assign it any value you want as long as we all agree on the value. For example, you could assign a value of 1 to ‘A’, if you wanted to. Logically, you might then assign the value 2 to ‘B’, 3 to ‘C’, and so on. In this scheme, ‘Z’ would get the value 26. You might then start over by assigning the value 27 to ‘a’, 28 to ‘b’, right down to 52 for ‘z’. That still leaves the digits ‘0’ through ‘9’ plus all the special symbols like space, period, comma, slash, semicolon, and the funny characters you see when you press the number keys while holding Shift down. Add to that the unprintable characters such as tab and newline. When all is said and done, you could encode the entire English keyboard using numbers between 1 and 127.
I say you could assign a value for ‘A’, ‘B’, and the remaining characters; however, that wouldn’t be a very good idea because it’s already been done. Sometime around 1963, there was a general agreement on how characters should be encoded in English. The ASCII (American Standard Code for Information Interchange) character encoding shown in Table 5-1 was adopted pretty much universally except for one company. IBM published its own standard in 1963 as well. The two encoding standards duked it out for about ten years, but by the early 1970s (when C, the language that C++ grew out of, was being created) ASCII had just about won the battle. The char type was created with ASCII character encoding in mind.
The first thing that you’ll notice is that the first 32 characters are the “unprintable” characters. That doesn’t mean that these characters are so naughty that the censor won’t allow them to be printed — it means that they don’t appear as visible symbols when printed on the printer (or on the console, for that matter). Many of these characters are no longer used or used only in obscure ways. For example, character 25 “End of Medium” was probably printed as the last character before the end of a reel of magnetic tape. That was a big deal in 1963, but today … not so much, so use of the character is limited. My favorite is character 7, the Bell — used to ring the bell on the old teletype machines. (Code::Blocks C++ generates a beep when you display the bell character.)
The characters starting with 32 are all printable with the exception of the last one, 127, which is the Delete character.
The following simple program allows you to play with the ASCII character set:
// CharacterEncoding - allow the user to enter a
//                     numeric value then print that value
//                     out as a character
#include <cstdio>
#include <cstdlib>
#include <iostream>
using namespace std;
int main(int nNumberofArgs, char* pszArgs[])
{
    // Prompt the user for a value
    int nValue;
    cout << "Enter decimal value of char to print:";
    cin >> nValue;
    // Now print that value back out as a character
    char cValue = (char)nValue;
    cout << "The char you entered was [" << cValue
         << "]" << endl;
    // wait until user is ready before terminating program
    // to allow the user to see the program results
    cout << "Press Enter to continue..." << endl;
    cin.ignore(10, '\n');
    cin.get();
    return 0;
}
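This program begins by prompting the user to "Enter decimal value of char to print:". The program then reads the value entered by the user into the int variable nValue.
The program then assigns this value to a char variable named cValue. The (char) in front of nValue is called a cast: it converts the int value to a char. I could have written the assignment without the cast, as follows: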
cValue = nValue;
If I’d done that, however, the types of the variables wouldn’t match: The value on the right of the assignment is an int, while the value on the left is a char. C++ will perform the assignment anyway, but it will generally complain about such conversions by generating a warning during the build step. The cast converts the value in nValue to a char before performing the assignment:
cValue = (char)nValue; // cast nValue to a char before
// assigning the value to cValue
The final line outputs the character cValue within a set of square brackets.
The following shows a few sample runs of the program. In the first run, I entered the value 65, which Table 5-1 shows as the character ‘A’:
Enter decimal value of char to print:65
The char you entered was [A]
Press Enter to continue...
The second time I entered the value 97, which corresponds to the character ‘a’:
Enter decimal value of char to print:97
The char you entered was [a]
Press Enter to continue...
On subsequent runs, I tried special characters:
Enter decimal value of char to print:36
The char you entered was [$]
Press Enter to continue...
The value 7 didn’t print anything, but did cause my PC to issue a loud beep that scared the heck out of me.
The value 10 generated the following odd output:
Enter decimal value of char to print:10
The char you entered was [
]
Press Enter to continue...
Referring to Table 5-1, you can see that 10 is the newline character. This character doesn’t actually print anything, but it does cause subsequent output to start at the beginning of the next line, which is exactly what happened in this case: the closing square bracket appears by itself at the beginning of the next line because it follows a newline character.
Theoretically, you could print anything you want using individual characters. However, that could get really tedious — as the following code snippet demonstrates:
cout << 'E' << 'n' << 't' << 'e' << 'r' << ' '
<< 'd' << 'e' << 'c' << 'i' << 'm' << 'a'
<< 'l' << ' ' << 'v' << 'a' << 'l' << 'u'
<< 'e' << ' ' << 'o' << 'f' << ' ' << 'c'
<< 'h' << 'a' << 'r' << ' ' << 't' << 'o'
<< ' ' << 'p' << 'r' << 'i' << 'n' << 't'
<< ':';
C++ allows you to encode a sequence of characters by enclosing the string in double quotes:
cout << "Enter decimal value of char to print:";
I have a lot more to say about character strings in Chapter 16.
You can code a normal, printable character by placing it in single quotes:
char cSpace = ' ';
You can code any character you want, whether printable or not, by placing its octal value after a backslash:
char cSpace = '\040';
You can code characters in base 16, also called hexadecimal, by preceding the number with a backslash followed by a small x as in the following example:
char cSpace = '\x20';
C++ provides names for some of the unprintable characters that are particularly useful. Some of the more common ones are shown in Table 5-2.
The most common is the newline character, which is nicknamed '\n'. In addition, you must use the backslash if you want to print the single-quote character:
char cQuote = '\'';
In addition, the character ‘\\’ is a single backslash. This matters for Windows file paths, which use the backslash as a separator, as in the following:
C:\Base Directory\Subdirectory\File Name
This is encoded in C++ with each backslash replaced by a pair of backslashes, as follows:
"C:\\Base Directory\\Subdirectory\\File Name"