Chapter 5

Character Expressions

In This Chapter

Defining character variables and constants

Encoding characters

Declaring a string

Outputting characters to the console

Chapter 4 introduces the concept of the integer variable. This chapter introduces the integer’s smaller sibling, the character, or char to us insiders (pronounced variously as care, chair, or as in the first syllable of charcoal). I use characters in programs that appear in earlier chapters; this chapter introduces them formally.

Defining Character Variables

Character variables are declared just like integers except with the keyword char in place of int:

char inputCharacter;

Character constants are defined as a single character enclosed in single quotes, as in the following:

char letterA = 'A';

This may seem like a silly question, but what exactly is ‘A’? To answer that, I need to explain what it means to encode characters.

Encoding characters

As mentioned in Chapter 1, everything in the computer is represented by a pattern of ones and zeros — variations in voltage that are interpreted as numbers. Thus the bit pattern 0000 0001 is the number 1 when interpreted as an integer. However, this same bit pattern means something completely different when interpreted as an instruction by the processor. So it should come as no surprise that the computer encodes the characters of the alphabet by assigning each a number.

Consider the character ‘A’. You could assign it any value you want as long as we all agree on the value. For example, you could assign a value of 1 to ‘A’, if you wanted to. Logically, you might then assign the value 2 to ‘B’, 3 to ‘C’, and so on. In this scheme, ‘Z’ would get the value 26. You might then start over by assigning the value 27 to ‘a’, 28 to ‘b’, right down to 52 for ‘z’. That still leaves the digits ‘0’ through ‘9’ plus all the special symbols like space, period, comma, slash, semicolon, and the funny characters you see when you press the number keys while holding Shift down. Add to that the unprintable characters such as tab and newline. When all is said and done, you could encode the entire English keyboard using numbers between 1 and 127.

I say you could assign a value for ‘A’, ‘B’, and the remaining characters; however, that wouldn’t be a very good idea because it’s already been done. Sometime around 1963, there was a general agreement on how characters should be encoded in English. The ASCII (American Standard Code for Information Interchange) character encoding shown in Table 5-1 was adopted pretty much universally except for one company. IBM published its own standard in 1963 as well. The two encoding standards duked it out for about ten years, but by the early 1970s, when C (the forerunner of C++) was being created, ASCII had just about won the battle. The char type was created with ASCII character encoding in mind.

The first thing that you’ll notice is that the first 32 characters are the “unprintable” characters. That doesn’t mean that these characters are so naughty that the censor won’t allow them to be printed — it means that they don’t appear as visible symbols when printed on the printer (or on the console, for that matter). Many of these characters are no longer used or used only in obscure ways. For example, character 25 “End of Medium” was probably printed as the last character before the end of a reel of magnetic tape. That was a big deal in 1963, but today … not so much, so use of the character is limited. My favorite is character 7, the Bell — used to ring the bell on the old teletype machines. (Code::Blocks C++ generates a beep when you display the bell character.)

The characters starting with 32 are all printable with the exception of the last one, 127, which is the Delete character.

Example of character encoding

The following simple program allows you to play with the ASCII character set:

// CharacterEncoding - allow the user to enter a
//                     numeric value then print that value
//                     out as a character

#include <cstdio>
#include <cstdlib>
#include <iostream>

using namespace std;

int main(int nNumberofArgs, char* pszArgs[])
{
    // Prompt the user for a value
    int nValue;
    cout << "Enter decimal value of char to print:";
    cin >> nValue;

    // Now print that value back out as a character
    char cValue = (char)nValue;
    cout << "The char you entered was [" << cValue
         << "]" << endl;

    // wait until user is ready before terminating program
    // to allow the user to see the program results
    cout << "Press Enter to continue..." << endl;
    cin.ignore(10, '\n');
    cin.get();
    return 0;
}

This program begins by prompting the user to "Enter decimal value of char to print". The program then reads the value entered by the user into the int variable nValue.

The program then assigns this value to a char variable named cValue.

The (char) appearing in front of nValue is called a cast. In this case, it casts the value of nValue from an int to a char. I could have performed the assignment without the cast, as in

cValue = nValue;

If I’d done that, however, the types of the variables wouldn’t match: The value on the right of the assignment is an int, while the value on the left is a char. C++ will perform the assignment anyway, but it will generally complain about such conversions by generating a warning during the build step. The cast converts the value in nValue to a char before performing the assignment:

cValue = (char)nValue; // cast nValue to a char before
// assigning the value to cValue

The final line outputs the character cValue within a set of square brackets.

The following shows a few sample runs of the program. In the first run, I entered the value 65, which Table 5-1 shows as the character ‘A’:

Enter decimal value of char to print:65.
The char you entered was [A]
Press Enter to continue...

The second time I entered the value 97, which corresponds to the character ‘a’:

Enter decimal value of char to print:97.
The char you entered was [a]
Press Enter to continue...

On subsequent runs, I tried special characters:

Enter decimal value of char to print:36.
The char you entered was [$]
Press Enter to continue...

The value 7 didn’t print anything, but did cause my PC to issue a loud beep that scared the heck out of me.

The value 10 generated the following odd output:

Enter decimal value of char to print:10.
The char you entered was [
]
Press Enter to continue...

Referring to Table 5-1, you can see that 10 is the newline character. This character doesn’t actually print anything, but it does cause subsequent output to start at the beginning of the next line, which is exactly what happened in this case: The closing square bracket appears by itself at the beginning of the next line because it follows a newline character.

The endl that appears at the end of many of the output commands seen so far in this chapter generates a newline. It also does a few other things, which Chapter 31 describes.

Encoding Strings of Characters

Theoretically, you could print anything you want using individual characters. However, that could get really tedious — as the following code snippet demonstrates:

  cout << 'E' << 'n' << 't' << 'e' << 'r' << ' '
     << 'd' << 'e' << 'c' << 'i' << 'm' << 'a'
     << 'l' << ' ' << 'v' << 'a' << 'l' << 'u'
     << 'e' << ' ' << 'o' << 'f' << ' ' << 'c'
     << 'h' << 'a' << 'r' << ' ' << 't' << 'o'
     << ' ' << 'p' << 'r' << 'i' << 'n' << 't'
     << ':';

C++ allows you to encode a sequence of characters by enclosing the string in double quotes:

cout << "Enter decimal value of char to print:";

I have a lot more to say about character strings in Chapter 16.

Special Character Constants

You can code a normal, printable character by placing it in single quotes:

char cSpace = ' ';

You can code any character you want, whether printable or not, by placing its octal value after a backslash:

char cSpace = '\040';

A constant that appears with a leading zero is assumed to be octal (that is, base 8).

You can code characters in base 16, also called hexadecimal, by preceding the number with a backslash followed by a small x as in the following example:

char cSpace = '\x20';

The decimal value 32 is equal to 40 in base 8 and 20 in base 16. Don’t worry if you don’t feel comfortable with octal or hexadecimal just yet. C++ provides shortcuts for the most common characters.

C++ provides names for some of the unprintable characters that are particularly useful. Some of the more common ones are shown in Table 5-2.

The most common is the newline character, which is nicknamed '\n'. In addition, you must use the backslash if you want to print the single-quote character:

char cQuote = '\'';

Because C++ normally interprets a single quotation mark as enclosing a character, you have to precede a single quote mark with a backslash character to tell it, “Hey, this single quote isn’t enclosing a character, it is the character.”

In addition, the character '\\' is a single backslash.

This leads to one of the more unfortunate coincidences in C++. In Windows, the backslash is used in filename paths, as in the following:

C:\Base Directory\Subdirectory\File Name

This is encoded in C++ with each backslash replaced by a pair of backslashes, as follows:

"C:\\Base Directory\\Subdirectory\\File Name"

Wide load ahead

By the early 1970s (when C, the forerunner of C++, was invented), the 128-character ASCII character set had pretty much beaten out all rivals. So it was logical that the char type was defined to accommodate the ASCII character set. This character set was fine for English but became overly restrictive when programmers tried to write applications for other European languages.

Fortunately, C and C++ had provided enough room in the char for 256 different characters. Standards committees got busy and used the characters between 128 and 255 for characters that occur in European languages but not English, such as umlauts and accented characters. You can see the results of their handiwork using the example CharacterEncoding program from this chapter: Enter 142 and the program prints out an Ä (on a Windows console that uses code page 437 or 850; other systems may display something different).

Alternative character sets such as Cyrillic, Hebrew, and Arabic could be handled within this restrictive framework by changing character sets, known more commonly as fonts. Thus, while 97 might be a lowercase ‘a’ in the ASCII set, the same number would be some other character in the Cyrillic character set, and something different yet again in Hebrew. This is not a very satisfactory solution because it prevents these languages from appearing together in the same output. And in any case it doesn’t handle East Asian languages, in particular Mandarin Chinese, which use far more than the 256 symbols that a char can represent.

The first C++ response to this problem was to introduce the “wide character” of type wchar_t. This was intended to implement whichever wide character set was native to the host operating system. On Windows, that would be the variant of Unicode known as UCS-2 or UTF-16. (Here the 2 stands for two bytes, the size of each wide character, and the 16 stands for 16 bits.) However, Macintosh’s OS X uses a different variant of Unicode known as UTF-8. Unicode can display not only every alphabet on the planet but also the ideographic characters used in Chinese and Japanese. The 2011 update to the C++ standard added two further types, char16_t and char32_t, which implement specifically UTF-16 and UTF-32.

For almost every feature that I describe in this book for handling character variables, there is an equivalent feature for the wide character types; programming Unicode, however, is beyond the scope of a beginning text.
