Comparing C-Style Strings

Suppose you want to see if a string in a character array is the word mate. If word is the array name, the following test might not do what you think it should do:

word == "mate"

Remember that the name of an array is a synonym for its address. Similarly, a quoted string constant is a synonym for its address. Thus, the preceding relational expression doesn’t test whether the strings are the same; it checks whether they are stored at the same address. The answer to that is no, even if the two strings have the same characters.

Because C++ handles C-style strings as addresses, you get little satisfaction if you try to use the relational operators to compare strings. Instead, you can go to the C-style string library and use the strcmp() function to compare strings. This function takes two string addresses as arguments. That means the arguments can be pointers, string constants, or character array names. If the two strings are identical, the function returns the value 0. If the first string precedes the second alphabetically, strcmp() returns a negative value, and if the first string follows the second alphabetically, strcmp() returns a positive value. Actually, “in the system collating sequence” is more accurate than “alphabetically.” This means that characters are compared according to the system code for characters. For example, in ASCII code, uppercase letters have smaller codes than the lowercase letters, so uppercase precedes lowercase in the collating sequence. Therefore, the string "Zoo" precedes the string "aviary". The fact that comparisons are based on code values also means that uppercase and lowercase letters differ, so the string "FOO" is different from the string "foo".

In some languages, such as BASIC and standard Pascal, strings stored in differently sized arrays are necessarily unequal to each other. But C-style strings are defined by the terminating null character, not by the size of the containing array. This means that two strings can be identical even if they are contained in differently sized arrays:

char big[80] = "Daffy";         // 5 letters plus \0
char little[6] = "Daffy";       // 5 letters plus \0

By the way, although you can’t use relational operators to compare strings, you can use them to compare characters because characters are actually integer types. Therefore, the following is valid code, at least for the ASCII and Unicode character sets, for displaying the characters of the alphabet:

for (ch = 'a'; ch <= 'z'; ch++)
      cout << ch;

The program in Listing 5.11 uses strcmp() in the test condition of a for loop. The program displays a word, changes its first letter, displays the word again, and keeps going until strcmp() determines that word is the same as the string "mate". Note that the listing includes the cstring file because it provides a function prototype for strcmp().

Listing 5.11. compstr1.cpp

// compstr1.cpp -- comparing strings using arrays
#include <iostream>
#include <cstring>     // prototype for strcmp()
int main()
{
    using namespace std;
    char word[5] = "?ate";

    // loop until word matches "mate"; strcmp() returns 0 on a match
    for (char ch = 'a'; strcmp(word, "mate"); ch++)
    {
        cout << word << endl;
        word[0] = ch;      // replace the first letter
    }
    cout << "After loop ends, word is " << word << endl;
    return 0;
}

Here is the output for the program in Listing 5.11:

?ate
aate
bate
cate
date
eate
fate
gate
hate
iate
jate
kate
late
After loop ends, word is mate

Program Notes

The program in Listing 5.11 has some interesting points. One, of course, is the test. You want the loop to continue as long as word is not mate. That is, you want the test to continue as long as strcmp() says the two strings are not the same. The most obvious test for that is this:

strcmp(word, "mate") != 0    // strings are not the same

This expression has the value true (1) if the strings are unequal and the value false (0) if they are equal. But what about strcmp(word, "mate") by itself? It has a nonzero value, which converts to true, if the strings are unequal and the value 0, which converts to false, if the strings are equal. In essence, the function returns true if the strings are different and false if they are the same. You can use just the function instead of the whole relational expression. This produces the same behavior and involves less typing. It is also the way C and C++ programmers have traditionally used strcmp().

Next, compstr1.cpp uses the increment operator to march the variable ch through the alphabet:

ch++

You can use the increment and decrement operators with character variables because type char really is an integer type, so the operation actually changes the integer code stored in the variable. Also note that using an array index makes it simple to change individual characters in a string:

word[0] = ch;

