In Exploration 2, I introduced you to character literals in single quotes, such as '\n', to end a line of output, but I have not yet taken the time to explain these fundamental building blocks. Now is the time to explore characters in greater depth.
Character Type
The char type represents a single character. Internally, all computers represent characters as integers. The character set defines the mapping between characters and numeric values. Common character sets are ISO 8859-1 (also called Latin-1) and ISO 10646 (same as Unicode), but many, many other character sets are in wide use.
The C++ standard does not mandate any particular character set. The literal '4' represents the digit 4, but the actual value that the computer uses internally is up to the implementation. You should not assume any particular character set. For example, in ISO 8859-1 (Latin-1), '4' has the value 52, but in EBCDIC, it has the value 244.
Similarly, given a numeric value, you cannot assume anything about the character that value represents. If you know a char variable stores the value 169, the character may be 'z' (EBCDIC), '©' (Unicode), or 'Љ' (ISO 8859-5).
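The C++ standard does make one guarantee: the digit characters '0' through '9' have contiguous, increasing values, so '0' + 7 == '7' in every character set. In ASCII-based character sets such as Latin-1, the same sequence is true for letters in the alphabet, that is, 'A' + 25 == 'Z', and 'q' - 'm' == 4, but the standard does not promise it (EBCDIC, for example, has gaps between some letters), and C++ makes no guarantees concerning the relative values of, say, 'A' and 'a'.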
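As a small illustration of character values, the following sketch reports the number your implementation assigns to a character and converts a digit to its character by relying only on the contiguity of '0' through '9'. The helper names char_code, digit_to_char, and show_codes are invented for this example.

```cpp
#include <iostream>

// Hypothetical helper: the implementation-defined numeric value of a character.
// Converting through unsigned char avoids negative results where char is signed.
int char_code(char c)
{
  return static_cast<unsigned char>(c);
}

// Hypothetical helper: map 0..9 to '0'..'9'. This is portable because the
// digit characters have contiguous, increasing values.
char digit_to_char(int digit)
{
  return static_cast<char>('0' + digit);
}

// Print the value of '4' on this implementation
// (52 in Latin-1, 244 in EBCDIC) and the character for the digit 7.
void show_codes(std::ostream& out)
{
  out << "'4' has the value " << char_code('4') << '\n';
  out << "digit 7 is the character " << digit_to_char(7) << '\n';
}
```

Calling show_codes(std::cout) from main prints both lines; only the first number depends on the character set.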
Working and Playing with Characters
Briefly, this program reads numbers from the standard input and echoes the values to the standard output. If the program reads any invalid characters, it alerts the user (with \a, which I describe later in this Exploration), ignores the line of input, and discards the value. Leading and trailing blank and tab characters are allowed. The program prints the saved numeric value only after reaching the end of an input line. This means if a line contains more than one valid number, the program prints only the last value. I ignore the possibility of overflow, to keep the code simple.
The get function takes a character variable as an argument. It reads one character from the input stream, then stores the character in that variable. The get function does not skip over white space. When you use get as a loop condition, it returns true if it successfully reads a character and the program should keep reading. It returns false if no more input is available or some kind of input error occurred.
All the digit characters have contiguous values, so the inner loop tests to determine if a character is a digit character by comparing it to the values for '0' and '9'. If it is a digit, subtracting the value of '0' from it leaves you with an integer in the range 0 to 9.
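The digit test and the subtraction of '0' can be sketched as follows. To keep the example short, it works on a string instead of an input stream, and the name digits_to_int is made up for illustration. Like the program described above, it ignores overflow.

```cpp
#include <string>

// Hypothetical helper: convert the leading digit characters of a string
// to an integer, stopping at the first character that is not a digit.
int digits_to_int(std::string const& text)
{
  int value{0};
  for (char ch : text)
  {
    // Digits are contiguous, so a range comparison identifies them.
    if (ch < '0' or ch > '9')
      break;
    // ch - '0' is an integer in the range 0 to 9.
    value = value * 10 + (ch - '0');
  }
  return value;
}
```

For example, digits_to_int("42") yields 42, and digits_to_int("7x9") stops at the 'x' and yields 7.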
The final loop reads characters and does nothing with them. The loop terminates when it reads a newline character. In other words, the final loop reads and ignores the rest of the input line.
Programs that need to handle white space on their own (such as Listing 17-1) can use get, or you can tell the input stream not to skip over white space prior to reading a number or anything else. The next section discusses character I/O in more detail.
Character I/O
After turning off the skipws flag, the input stream does not skip over leading white space characters. For instance, if you were to try to read an integer, and the stream is positioned at white space, the read would fail. If you were to try to read a string, the string would be empty, and the stream position would not advance. So you have to consider carefully whether to turn off white space skipping. Typically, you would do so only when reading individual characters.
Remember that an input stream uses the >> operator (Exploration 5), even for manipulators. Using >> for manipulators seems to break the mnemonic of transferring data to the right, but it follows the convention of always using >> with an input stream. If you forget, the compiler will remind you.
Echoing Input to Output, One Character at a Time
You can also use the get member function, in which case you don’t need the noskipws manipulator.
Reading and Writing Points
The first for loop is the key. The loop condition reads an integer and a character and tests to determine if the character is a comma, before reading a second integer. The loop terminates if the input is invalid or ill-formed or if the loop reaches the end-of-file. A more sophisticated program would distinguish between these two cases, but that’s a side issue for the moment.
A for loop can have only one definition, not two. So I had to move the definition of sep out of the loop header. Keeping x and y inside the header avoids conflict with the variables in the second for loop, which have the same names but are distinct variables. In the second loop, the x and y variables are iterators, not integers. The loop iterates over two vectors at the same time. A range-based for loop doesn’t help in this case, so the loop must use explicit iterators.
Newlines and Portability
You’ve probably noticed that Listing 17-3, and every other program I’ve presented so far, prints '\n' at the end of each line of output. We have done so without considering what this really means. Different environments have different conventions for end-of-line characters. UNIX (including modern macOS) uses a line feed ('\x0a'); the classic Mac OS used a carriage return ('\x0d'); DOS and Microsoft Windows use a combination of a carriage return, followed by a line feed ("\x0d\x0a"); and some operating systems don’t use line terminators but, instead, have record-oriented files, in which each line is a separate record.
In all these cases, the C++ I/O streams automatically convert a native line ending to a single '\n' character. When you print '\n' to an output stream, the library automatically converts it to a native line ending (or terminates the record).
In other words, you can write programs that use '\n' as a line ending and not concern yourself with native OS conventions. Your source code will be portable to all C++ environments.
Character Escapes
Character Escape Sequences
Escape | Meaning |
---|---|
\a | Alert: ring a bell or otherwise signal the user |
\b | Backspace |
\f | Form feed |
\n | Newline |
\r | Carriage return |
\t | Horizontal tab |
\v | Vertical tab |
\\ | Literal \ |
\' | Literal ' |
\" | Literal " |
\OOO | Octal (base 8) character value |
\xXX . . . | Hexadecimal (base 16) character value |
The last two items are the most interesting. A backslash followed by one to three octal digits (0 to 7) specifies the value of the character. Which character the value represents is up to the implementation.
Understanding all the caveats from the first section of this Exploration, there are times when you must specify an actual character value. The most common is '\0', which is the character with value zero, also called a null character, which you may utilize to initialize char variables. It has some other uses as well, especially when interfacing with C functions and the C standard library.
The final escape sequence (\x) lets you specify a character value in hexadecimal. Typically, you would use two hexadecimal digits, because this is all that fits in the typical, 8-bit char. (The purpose of longer \x escapes is for wide characters, the subject of Exploration 59.)
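The following sketch shows a few escapes in character literals. The two static_assert declarations hold on any implementation, because they compare numeric values only and say nothing about which character a value represents. The function name escape_sampler is hypothetical.

```cpp
#include <string>

// Octal 101 and hexadecimal 41 are two spellings of the same value, 65.
// Which character has that value is up to the implementation
// ('A' in Latin-1).
static_assert('\101' == '\x41', "octal and hexadecimal spell the same value");
static_assert('\0' == 0, "the null character has the value zero");

// Hypothetical helper: build a string from six single-character escapes.
std::string escape_sampler()
{
  std::string s;
  s += '\a';   // alert
  s += '\t';   // horizontal tab
  s += '\\';   // one backslash
  s += '\'';   // single quote
  s += '\"';   // double quote
  s += '\0';   // null character: it counts as a character in a std::string
  return s;
}
```

Each escape sequence denotes exactly one character, so the resulting string holds six characters even though the source code spells them with two or more.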
The next Exploration continues your understanding of characters by examining how C++ classifies characters according to letter, digit, punctuation, and so on.