3.5.4. C-Style Character Strings


Warning

Although C++ supports C-style strings, they should not be used by C++ programs. C-style strings are a surprisingly rich source of bugs and are the root cause of many security problems. They’re also harder to use!


Character string literals are an instance of a more general construct that C++ inherits from C: C-style character strings. C-style strings are not a type. Instead, they are a convention for how to represent and use character strings. Strings that follow this convention are stored in character arrays and are null terminated. By null-terminated we mean that the last character in the string is followed by a null character ('\0'). Ordinarily we use pointers to manipulate these strings.

C Library String Functions

The Standard C library provides a set of functions, listed in Table 3.8, that operate on C-style strings. These functions are defined in the cstring header, which is the C++ version of the C header string.h.

Table 3.8. C-Style Character String Functions

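strlen(p)         Returns the length of p, not counting the null.
strcmp(p1, p2)    Compares p1 and p2 for equality. Returns 0 if p1 == p2, a positive value if p1 > p2, a negative value if p1 < p2.
strcat(p1, p2)    Appends p2 to p1. Returns p1.
strcpy(p1, p2)    Copies p2 into p1. Returns p1.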

Warning

The functions in Table 3.8 do not verify their string parameters.


The pointer(s) passed to these routines must point to null-terminated array(s):

char ca[] = {'C', '+', '+'};  // not null terminated
cout << strlen(ca) << endl;   // disaster: ca isn't null terminated

In this case, ca is an array of char but is not null terminated. The result is undefined. The most likely effect of this call is that strlen will keep looking through the memory that follows ca until it encounters a null character.

Comparing Strings

Comparing two C-style strings is done quite differently from how we compare library strings. When we compare two library strings, we use the normal relational or equality operators:

string s1 = "A string example";
string s2 = "A different string";
if (s1 < s2)  // false: s2 is less than s1

Using these operators on similarly defined C-style strings compares the pointer values, not the strings themselves:

const char ca1[] = "A string example";
const char ca2[] = "A different string";
if (ca1 < ca2)  // undefined: compares two unrelated addresses

Remember that when we use an array, we are really using a pointer to the first element in the array (§ 3.5.3, p. 117). Hence, this condition actually compares two const char* values. Those pointers do not address the same object, so the comparison is undefined.

To compare the strings, rather than the pointer values, we can call strcmp. That function returns 0 if the strings are equal, or a positive or negative value, depending on whether the first string is larger or smaller than the second:

if (strcmp(ca1, ca2) < 0) // same effect as string comparison s1 < s2

Caller Is Responsible for Size of a Destination String

Concatenating or copying C-style strings is also very different from the same operations on library strings. For example, if we want to concatenate the two strings s1 and s2 defined above, we can do so directly:

// initialize largeStr as a concatenation of s1, a space, and s2
string largeStr = s1 + " " + s2;

Doing the same with our two arrays, ca1 and ca2, would be an error. The expression ca1 + ca2 tries to add two pointers, which is illegal and meaningless.

Instead we can use strcat and strcpy. However, to use these functions, we must pass an array to hold the resulting string. The array we pass must be large enough to hold the generated string, including the null character at the end. The code we show here, although a common usage pattern, is fraught with potential for serious error:

// largeStr must be a char array here, not the library string above;
// disastrous if we miscalculated the size of largeStr
strcpy(largeStr, ca1);     // copies ca1 into largeStr
strcat(largeStr, " ");     // adds a space at the end of largeStr
strcat(largeStr, ca2);     // concatenates ca2 onto largeStr

The problem is that we can easily miscalculate the size needed for largeStr. Moreover, any time we change the values we want to store in largeStr, we have to remember to double-check that we calculated its size correctly. Unfortunately, programs similar to this code are widely distributed. Programs with such code are error-prone and often lead to serious security vulnerabilities.


Tip

For most applications, in addition to being safer, it is also more efficient to use library strings rather than C-style strings.



Exercises Section 3.5.4

Exercise 3.37: What does the following program do?

const char ca[] = {'h', 'e', 'l', 'l', 'o'};
const char *cp = ca;
while (*cp) {
    cout << *cp << endl;
    ++cp;
}

Exercise 3.38: In this section, we noted that it was not only illegal but meaningless to try to add two pointers. Why would adding two pointers be meaningless?

Exercise 3.39: Write a program to compare two strings. Now write a program to compare the values of two C-style character strings.

Exercise 3.40: Write a program to define two character arrays initialized from string literals. Now define a third character array to hold the concatenation of the two arrays. Use strcpy and strcat to copy the two arrays into the third.

