3.2.3. Dealing with the Characters in a string

Image

Often we need to deal with the individual characters in a string. We might want to check to see whether a string contains any whitespace, or to change the characters to lowercase, or to see whether a given character is present, and so on.

One part of this kind of processing involves how we gain access to the characters themselves. Sometimes we need to process every character. Other times we need to process only a specific character, or we can stop processing once some condition is met. It turns out that the best way to deal with these cases involves different language and library facilities.

The other part of processing characters is knowing and/or changing the characteristics of a character. This part of the job is handled by a set of library functions, described in Table 3.3 (overleaf). These functions are defined in the cctype header.

Table 3.3. cctype Functions

Image
Image
Processing Every Character? Use Range-Based for

If we want to do something to every character in a string, by far the best approach is to use a statement introduced by the new standard: the range for statement. This statement iterates through the elements in a given sequence and performs some operation on each value in that sequence. The syntactic form is

for (declaration : expression)
    statement

where expression is an object of a type that represents a sequence, and declaration defines the variable that we’ll use to access the underlying elements in the sequence. On each iteration, the variable in declaration is initialized from the value of the next element in expression.

A string represents a sequence of characters, so we can use a string as the expression in a range for. As a simple example, we can use a range for to print each character from a string on its own line of output:

string str("some string");
// print the characters in str one character to a line
for (auto c : str)      // for every char in str
    cout << c << endl;  // print the current character followed by a newline

The for loop associates the variable c with str. We define the loop control variable the same way we do any other variable. In this case, we use auto2.5.2, p. 68) to let the compiler determine the type of c, which in this case will be char. On each iteration, the next character in str will be copied into c. Thus, we can read this loop as saying, “For every character c in the string str,” do something. The “something” in this case is to print the character followed by a newline.

As a somewhat more complicated example, we’ll use a range for and the ispunct function to count the number of punctuation characters in a string:

string s("Hello World!!!");
// punct_cnt has the same type that s.size returns; see § 2.5.3 (p. 70)
decltype(s.size()) punct_cnt = 0;
// count the number of punctuation characters in s
for (auto c : s)        // for every char in s
    if (ispunct(c))     // if the character is punctuation
        ++punct_cnt;    // increment the punctuation counter
cout << punct_cnt
     << " punctuation characters in " << s << endl;

The output of this program is

3 punctuation characters in Hello World!!!

Here we use decltype2.5.3, p. 70) to declare our counter, punct_cnt. Its type is the type returned by calling s.size, which is string::size_type. We use a range for to process each character in the string. This time we check whether each character is punctuation. If so, we use the increment operator (§ 1.4.1, p. 12) to add 1 to the counter. When the range for completes, we print the result.

Using a Range for to Change the Characters in a string

If we want to change the value of the characters in a string, we must define the loop variable as a reference type (§ 2.3.1, p. 50). Remember that a reference is just another name for a given object. When we use a reference as our control variable, that variable is bound to each element in the sequence in turn. Using the reference, we can change the character to which the reference is bound.

Suppose that instead of counting punctuation, we wanted to convert a string to all uppercase letters. To do so we can use the library toupper function, which takes a character and returns the uppercase version of that character. To convert the whole string we need to call toupper on each character and put the result back in that character:

string s("Hello World!!!");
// convert s to uppercase
for (auto &c : s)   // for every char in s (note: c is a reference)
    c = toupper(c); // c is a reference, so the assignment changes the char in s
cout << s << endl;

The output of this code is

HELLO WORLD!!!

On each iteration, c refers to the next character in s. When we assign to c, we are changing the underlying character in s. So, when we execute

c = toupper(c); // c is a reference, so the assignment changes the char in s

we’re changing the value of the character to which c is bound. When this loop completes, all the characters in str will be uppercase.

Processing Only Some Characters?

A range for works well when we need to process every character. However, sometimes we need to access only a single character or to access characters until some condition is reached. For example, we might want to capitalize only the first character or only the first word in a string.

There are two ways to access individual characters in a string: We can use a subscript or an iterator. We’ll have more to say about iterators in § 3.4 (p. 106) and in Chapter 9.

The subscript operator (the [ ] operator) takes a string::size_type3.2.2, p. 88) value that denotes the position of the character we want to access. The operator returns a reference to the character at the given position.

Subscripts for strings start at zero; if s is a string with at least two characters, then s[0] is the first character, s[1] is the second, and the last character is in s[s.size() - 1].


Image Note

The values we use to subscript a string must be >= 0 and < size().

The result of using an index outside this range is undefined.

By implication, subscripting an empty string is undefined.


The value in the subscript is referred to as “a subscript” or “an index.” The index we supply can be any expression that yields an integral value. However, if our index has a signed type, its value will be converted to the unsigned type that string::size_type represents (§ 2.1.2, p. 36).

The following example uses the subscript operator to print the first character in a string:

if (!s.empty())            // make sure there's a character to print
    cout << s[0] << endl;  // print the first character in s

Before accessing the character, we check that s is not empty. Any time we use a subscript, we must ensure that there is a value at the given location. If s is empty, then s[0] is undefined.

So long as the string is not const2.4, p. 59), we can assign a new value to the character that the subscript operator returns. For example, we can capitalize the first letter as follows:

string s("some string");
if (!s.empty())             // make sure there's a character in s[0]
    s[0] = toupper(s[0]);   // assign a new value to the first character in s

The output of this program is

Some string

Using a Subscript for Iteration

As a another example, we’ll change the first word in s to all uppercase:

// process characters in s until we run out of characters or we hit a whitespace
for (decltype(s.size()) index = 0;
     index != s.size() && !isspace(s[index]); ++index)
        s[index] = toupper(s[index]); // capitalize the current character

This program generates

SOME string

Our for loop (§ 1.4.2, p. 13) uses index to subscript s. We use decltype to give index the appropriate type. We initialize index to 0 so that the first iteration will start on the first character in s. On each iteration we increment index to look at the next character in s. In the body of the loop we capitalize the current letter.

The new part in this loop is the condition in the for. That condition uses the logical AND operator (the && operator). This operator yields true if both operands are true and false otherwise. The important part about this operator is that we are guaranteed that it evaluates its right-hand operand only if the left-hand operand is true. In this case, we are guaranteed that we will not subscript s unless we know that index is in range. That is, s[index] is executed only if index is not equal to s.size(). Because index is never incremented beyond the value of s.size(), we know that index will always be less than s.size().

Using a Subscript for Random Access

In the previous example we advanced our subscript one position at a time to capitalize each character in sequence. We can also calculate an subscript and directly fetch the indicated character. There is no need to access characters in sequence.

As an example, let’s assume we have a number between 0 and 15 and we want to generate the hexadecimal representation of that number. We can do so using a string that is initialized to hold the 16 hexadecimal “digits”:

const string hexdigits = "0123456789ABCDEF"; // possible hex digits
cout << "Enter a series of numbers between 0 and 15"
     << " separated by spaces. Hit ENTER when finished: "
     << endl;
string result;        // will hold the resulting hexify'd string
string::size_type n;  // hold numbers from the input
while (cin >> n)
    if (n < hexdigits.size())    // ignore invalid input
        result += hexdigits[n];  // fetch the indicated hex digit
cout << "Your hex number is: " << result << endl;

If we give this program the input

12 0 5 15 8 15

the output will be

Your hex number is: C05F8F

We start by initializing hexdigits to hold the hexadecimal digits 0 through F. We make that string const2.4, p. 59) because we do not want these values to change. Inside the loop we use the input value n to subscript hexdigits. The value of hexdigits[n] is the char that appears at position n in hexdigits. For example, if n is 15, then the result is F; if it’s 12, the result is C; and so on. We append that digit to result, which we print once we have read all the input.

Whenever we use a subscript, we should think about how we know that it is in range. In this program, our subscript, n, is a string::size_type, which as we know is an unsigned type. As a result, we know that n is guaranteed to be greater than or equal to 0. Before we use n to subscript hexdigits, we verify that it is less than the size of hexdigits.


Exercises Section 3.2.3

Exercise 3.6: Use a range for to change all the characters in a string to X.

Exercise 3.7: What would happen if you define the loop control variable in the previous exercise as type char? Predict the results and then change your program to use a char to see if you were right.

Exercise 3.8: Rewrite the program in the first exercise, first using a while and again using a traditional for loop. Which of the three approaches do you prefer and why?

Exercise 3.9: What does the following program do? Is it valid? If not, why not?

string s;
cout << s[0] << endl;

Exercise 3.10: Write a program that reads a string of characters including punctuation and writes what was read but with the punctuation removed.

Exercise 3.11: Is the following range for legal? If so, what is the type of c?

const string s = "Keep out!";
for (auto &c : s) { /* ...  */ }


..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.145.70.170