Chapter 10. Character Strings

Now, you are ready to take a look at character strings in more detail. You were first introduced to character strings in Chapter 3, “Compiling and Running Your First Program,” when you wrote your first C program. In the statement

printf ("Programming in C is fun.\n");

the argument that is passed to the printf function is the character string

"Programming in C is fun.\n"

The double quotation marks are used to delimit the character string, which can contain any combination of letters, numbers, or special characters, other than a double quotation mark. But as you shall see shortly, it is even possible to include a double quotation mark inside a character string.

When introduced to the data type char, you learned that a variable that is declared to be of this type can contain only a single character. To assign a single character to such a variable, the character is enclosed within a pair of single quotation marks. Thus, the assignment

plusSign = '+';

has the effect of assigning the character '+' to the variable plusSign, assuming it has been appropriately declared. In addition, you learned that there is a distinction made between the single quotation and double quotation marks, and that if plusSign is declared to be of type char, then the statement

plusSign = "+";

is incorrect. Be certain you remember that single quotation and double quotation marks are used to create two different types of constants in C.

Arrays of Characters

If you want to be able to deal with variables that can hold more than a single character,[1] this is precisely where the array of characters comes into play.

In Program 7.6, you defined an array of characters called word as follows:

char  word [] = { 'H', 'e', 'l', 'l', 'o', '!' };

Remembering that in the absence of a particular array size, the C compiler automatically computes the number of elements in the array based upon the number of initializers, this statement reserves space in memory for exactly six characters, as shown in Figure 10.1.

Figure 10.1. The array word in memory.

To print out the contents of the array word, you ran through each element in the array and displayed it using the %c format characters.

With this technique, you can begin to build an assortment of useful functions for dealing with character strings. Some of the more commonly performed operations on character strings include combining two character strings together (concatenation), copying one character string to another, extracting a portion of a character string (substring), and determining if two character strings are equal (that is, if they contain the same characters). Take the first mentioned operation, concatenation, and develop a function to perform this task. You can define a call to your concat function as follows:

concat (result, str1, n1, str2, n2);

where str1 and str2 represent the two character arrays that are to be concatenated and n1 and n2 represent the number of characters in the respective arrays. This makes the function flexible enough so that you can concatenate two character arrays of arbitrary length. The argument result represents the character array that is to be the destination of the concatenated character arrays str1 followed by str2. See Program 10.1.

Example 10.1. Concatenating Character Arrays

// Function to concatenate two character arrays

#include <stdio.h>

void  concat (char  result[], const char  str1[], int  n1,
                    const char  str2[], int  n2)
{
    int  i, j;

    // copy str1 to result

    for ( i = 0;  i < n1;  ++i )
        result[i] = str1[i];

    // copy str2 to result

    for ( j = 0;  j < n2;  ++j )
        result[n1 + j] = str2[j];
}

int main (void)
{
    void   concat (char  result[], const char  str1[], int  n1,
                         const char  str2[], int  n2);
    const  char   s1[5] = { 'T', 'e', 's', 't', ' '};
    const  char   s2[6] = { 'w', 'o', 'r', 'k', 's', '.' };
    char   s3[11];
    int    i;

    concat (s3, s1, 5, s2, 6);

    for ( i = 0;  i < 11;  ++i )
        printf ("%c", s3[i]);

    printf ("\n");

    return 0;
}

Example 10.1. Output

Test works.

The first for loop inside the concat function copies the characters from the str1 array into the result array. This loop is executed n1 times, which is the number of characters contained inside the str1 array.

The second for loop copies str2 into the result array. Because str1 was n1 characters long, copying into result begins at result[n1]—the position immediately following the one occupied by the last character of str1. After this for loop is done, the result array contains the n1+n2 characters representing str2 concatenated to the end of str1.

Inside the main routine, two const character arrays, s1 and s2, are defined. The first array is initialized to the characters 'T', 'e', 's', 't', and ' '. This last character represents a blank space and is a perfectly valid character constant. The second array is initially set to the characters 'w', 'o', 'r', 'k', 's', and '.'. A third character array, s3, is defined with enough space to hold s1 concatenated to s2, or 11 characters. It is not declared as a const array because its contents will be changed.

The function call

concat (s3, s1, 5, s2, 6);

calls the concat function to concatenate the character arrays s1 and s2, with the destination array s3. The arguments 5 and 6 are passed to the function to indicate the number of characters in s1 and s2, respectively.

After the concat function has completed execution and returns to main, a for loop is set up to display the results of the function call. The 11 elements of s3 are displayed at the terminal, and as can be seen from the program’s output, the concat function seems to be working properly. In the preceding program example, it is assumed that the first argument to the concat function—the result array—contains enough space to hold the resulting concatenated character arrays. Failure to do so can produce unpredictable results when the program is run.

Variable-Length Character Strings

You can adopt a similar approach to that used by the concat function for defining other functions to deal with character arrays. That is, you can develop a set of routines, each of which has as its arguments one or more character arrays plus the number of characters contained in each such array. Unfortunately, after working with these functions for a while, you will find that it gets a bit tedious trying to keep track of the number of characters contained in each character array that you are using in your program—especially if you are using your arrays to store character strings of varying sizes. What you need is a method for dealing with character arrays without having to worry about precisely how many characters you have stored in them.

There is such a method, and it is based upon the idea of placing a special character at the end of every character string. In this manner, the function can then determine for itself when it has reached the end of a character string after it encounters this special character. By developing all of your functions to deal with character strings in this fashion, you can eliminate the need to specify the number of characters that are contained inside a character string.

In the C language, the special character that is used to signal the end of a string is known as the null character and is written as '\0'. So, the statement

const char  word [] = { 'H', 'e', 'l', 'l', 'o', '!', '\0' };

defines a character array called word that contains seven characters, the last of which is the null character. (Recall that the backslash character [\] is a special character in the C language and does not count as a separate character; therefore, '\0' represents a single character in C.) The array word is depicted in Figure 10.2.

Figure 10.2. The array word with a terminating null character.

To begin with an illustration of how these variable-length character strings are used, write a function that counts the number of characters in a character string, as shown in Program 10.2. Call the function stringLength and have it take as its argument a character array that is terminated by the null character. The function determines the number of characters in the array and returns this value back to the calling routine. Define the number of characters in the array as the number of characters up to, but not including, the terminating null character. So, the function call

stringLength (characterString)

should return the value 3 if characterString is defined as follows:

char  characterString[] = { 'c', 'a', 't', '\0' };

Example 10.2. Counting the Characters in a String

// Function to count the number of characters in a string

#include <stdio.h>
int  stringLength (const char  string[])
{
    int  count = 0;

    while ( string[count] != '\0' )
        ++count;

    return count;
}

int main (void)
{
    int   stringLength (const char  string[]);
    const char  word1[] = { 'a', 's', 't', 'e', 'r', '\0' };
    const char  word2[] = { 'a', 't', '\0' };
    const char  word3[] = { 'a', 'w', 'e', '\0' };

    printf ("%i   %i   %i\n", stringLength (word1),
             stringLength (word2), stringLength (word3));

    return 0;
}

Example 10.2. Output

5   2   3

The stringLength function declares its argument as a const array of characters because it is not making any changes to the array, merely counting its size.

Inside the stringLength function, the variable count is defined and its value set to 0. The program then enters a while loop to sequence through the string array until the null character is reached. When the function finally hits upon this character, signaling the end of the character string, the while loop is exited and the value of count is returned. This value represents the number of characters in the string, excluding the null character. You might want to trace through the operation of this loop on a small character array to verify that the value of count when the loop is exited is in fact equal to the number of characters in the array, excluding the null character.

In the main routine, three character arrays, word1, word2, and word3, are defined. The printf function call displays the results of calling the stringLength function for each of these three character arrays.

Initializing and Displaying Character Strings

Now, it is time to go back to the concat function developed in Program 10.1 and rewrite it to work with variable-length character strings. Obviously, the function must be changed somewhat because you no longer want to pass as arguments the number of characters in the two arrays. The function now takes only three arguments: the two character arrays to be concatenated and the character array in which to place the result.

Before delving into this program, you should first learn about two nice features that C provides for dealing with character strings.

The first feature involves the initialization of character arrays. C permits a character array to be initialized by simply specifying a constant character string rather than a list of individual characters. So, for example, the statement

char  word[] = { "Hello!" };

can be used to set up an array of characters called word with the initial characters 'H', 'e', 'l', 'l', 'o', '!', and '\0', respectively. You can also omit the braces when initializing character arrays in this manner. So, the statement

char word[] =  "Hello!";

is perfectly valid. Either statement is equivalent to the statement

char  word[] = { 'H', 'e', 'l', 'l', 'o', '!', '\0' };

If you’re explicitly specifying the size of the array, make certain you leave enough space for the terminating null character. So, in

char  word[7] = { "Hello!" };

the compiler has enough room in the array to place the terminating null character. However, in

char  word[6] = { "Hello!" };

the compiler can’t fit a terminating null character at the end of the array, and so it doesn’t put one there (and it doesn’t complain about it either).

In general, wherever they appear in your program, character-string constants in the C language are automatically terminated by the null character. This fact helps functions such as printf determine when the end of a character string has been reached. So, in the call

printf ("Programming in C is fun.\n");

the null character is automatically placed after the newline character in the character string, thereby enabling the printf function to determine when it has reached the end of the format string.

The other feature to be mentioned here involves the display of character strings. The special format characters %s inside a printf format string can be used to display an array of characters that is terminated by the null character. So, if word is a null-terminated array of characters, the printf call

printf ("%s\n", word);

can be used to display the entire contents of the word array at the terminal. The printf function assumes when it encounters the %s format characters that the corresponding argument is a character string that is terminated by a null character.

The two features just described were incorporated into the main routine of Program 10.3, which illustrates your revised concat function. Because you are no longer passing the number of characters in each string as arguments to the function, the function must determine when the end of each string is reached by testing for the null character. Also, when str1 is copied into the result array, you want to be certain not to also copy the null character because this ends the string in the result array right there. You do need, however, to place a null character into the result array after str2 has been copied so as to signal the end of the newly created string.

Example 10.3. Concatenating Character Strings

#include <stdio.h>

int main (void)
{
    void  concat (char  result[], const char  str1[], const char  str2[]);
    const char  s1[] = { "Test " };
    const char  s2[] = { "works." };
    char  s3[20];

    concat (s3, s1, s2);

    printf ("%s\n", s3);

    return 0;
}

// Function to concatenate two character strings

void concat (char  result[], const char  str1[], const char  str2[])
{
    int  i, j;

    // copy str1 to result

    for ( i = 0;  str1[i] != '\0';  ++i )
        result[i] = str1[i];

    // copy str2 to result

    for ( j = 0;  str2[j] != '\0';  ++j )
        result[i + j] = str2[j];

    // Terminate the concatenated string with a null character

    result [i + j] = '\0';
}

Example 10.3. Output

Test works.

In the first for loop of the concat function, the characters contained inside str1 are copied into the result array until the null character is reached. Because the for loop terminates as soon as the null character is matched, it does not get copied into the result array.

In the second loop, the characters from str2 are copied into the result array directly after the final character from str1. This loop makes use of the fact that when the previous for loop finished execution, the value of i was equal to the number of characters in str1, excluding the null character. Therefore, the assignment statement

result[i + j] = str2[j];

is used to copy the characters from str2 into the proper locations of result.

After the second loop is completed, the concat function puts a null character at the end of the string. Study the function to ensure that you understand the use of i and j. Many program errors when dealing with character strings involve the use of an index number that is off by 1 in either direction.

Remember, to reference the first character of an array, an index number of 0 is used. In addition, if a character array string contains n characters, excluding the null byte, then string[n - 1] references the last (nonnull) character in the string, whereas string[n] references the null character. Furthermore, string must be defined to contain at least n + 1 characters, bearing in mind that the null character occupies a location in the array.

Returning to the program, the main routine defines two char arrays, s1 and s2, and sets their values using the new initialization technique previously described. The array s3 is defined to contain 20 characters, thus ensuring that sufficient space is reserved for the concatenated character string and saving you from the trouble of having to precisely calculate its size.

The concat function is then called with the three strings s1, s2, and s3 as arguments. The result, as contained in s3 after the concat function returns, is displayed using the %s format characters. Although s3 is defined to contain 20 characters, the printf function only displays characters from the array up to the null character.

Testing Two Character Strings for Equality

You cannot directly test two strings to see if they are equal with a statement such as

if ( string1 == string2 )
   ...

because, when applied to two arrays, the equality operator does not compare the characters they contain; it only tests whether the two names refer to the same array in memory. (The operator cannot be applied to structures at all.)

To determine if two strings are equal, you must explicitly compare the two character strings character by character. If you reach the end of both character strings at the same time, and if all of the characters up to that point are identical, the two strings are equal; otherwise, they are not.

It might be a good idea to develop a function that can be used to compare two character strings, as shown in Program 10.4. You can call the function equalStrings and have it take as arguments the two character strings to be compared. Because you are only interested in determining whether the two character strings are equal, you can have the function return a bool value of true (or nonzero) if the two strings are identical, and false (or zero) if they are not. In this way, the function can be used directly inside test statements, such as in

if  ( equalStrings (string1, string2) )
   ...

Example 10.4. Testing Strings for Equality

// Function to determine if two strings are equal

#include <stdio.h>
#include <stdbool.h>

bool equalStrings (const char  s1[], const char  s2[])
{
    int  i = 0;
    bool areEqual;

    while ( s1[i] == s2 [i]  &&
                 s1[i] != '\0' &&  s2[i] != '\0' )
        ++i;

    if ( s1[i] == '\0'  &&  s2[i] == '\0' )
       areEqual = true;
    else
       areEqual = false;
    return areEqual;
}

int main (void)
{
     bool  equalStrings (const char  s1[], const char  s2[]);
     const char  stra[] = "string compare test";
     const char  strb[] = "string";

     printf ("%i\n", equalStrings (stra, strb));
     printf ("%i\n", equalStrings (stra, stra));
     printf ("%i\n", equalStrings (strb, "string"));

     return 0;
}

Example 10.4. Output

0
1
1

The equalStrings function uses a while loop to sequence through the character strings s1 and s2. The loop is executed so long as the two character strings are equal (s1[i] == s2[i]) and so long as the end of either string is not reached (s1[i] != '\0' && s2[i] != '\0'). The variable i, which is used as the index number for both arrays, is incremented each time through the while loop.

The if statement that executes after the while loop has terminated determines if you have simultaneously reached the end of both strings s1 and s2. You could have used the statement

if ( s1[i] == s2[i] )
       ...

instead to achieve the same results. If you are at the end of both strings, the strings must be identical, in which case areEqual is set to true and returned to the calling routine. Otherwise, the strings are not identical and areEqual is set to false and returned.

In main, two character arrays stra and strb are set up and assigned the indicated initial values. The first call to the equalStrings function passes these two character arrays as arguments. Because these two strings are not equal, the function correctly returns a value of false, or 0.

The second call to the equalStrings function passes the string stra twice. The function correctly returns a true value to indicate that the two strings are equal, as verified by the program’s output.

The third call to the equalStrings function is a bit more interesting. As you can see from this example, you can pass a constant character string to a function that is expecting an array of characters as an argument. In Chapter 11, “Pointers,” you see how this works. The equalStrings function compares the character string contained in strb to the character string "string" and returns true to indicate that the two strings are equal.

Inputting Character Strings

By now, you are used to the idea of displaying a character string using the %s format characters. But what about reading in a character string from your window (or your “terminal window”)? Well, on your system, there are several library functions that you can use to input character strings. The scanf function can be used with the %s format characters to read in a string of characters up to a blank space, tab character, or the end of the line, whichever occurs first. So, the statements

char  string[81];

scanf ("%s", string);

have the effect of reading in a character string typed into your terminal window and storing it inside the character array string. Note that unlike previous scanf calls, in the case of reading strings, the & is not placed before the array name (the reason for this is also explained in Chapter 11).

If the preceding scanf call is executed, and the following characters are entered:

Shawshank

the string "Shawshank" is read in by the scanf function and is stored inside the string array. If the following line of text is typed instead:

iTunes playlist

just the string "iTunes" is stored inside the string array because the blank space after the word iTunes terminates the string. If the scanf call is executed again, this time the string "playlist" is stored inside the string array because the scanf function always continues scanning from the most recent character that was read in.

The scanf function automatically terminates the string that is read in with a null character. So, execution of the preceding scanf call with the line of text

abcdefghijklmnopqrstuvwxyz

causes the entire lowercase alphabet to be stored in the first 26 locations of the string array, with string[26] automatically set to the null character.

If s1, s2, and s3 are defined to be character arrays of appropriate sizes, execution of the statement

scanf ("%s%s%s", s1, s2, s3);

with the line of text

micro computer system

results in the assignment of the string "micro" to s1, "computer" to s2, and "system" to s3. If the following line of text is typed instead:

system expansion

it results in the assignment of the string "system" to s1, and "expansion" to s2. Because no further characters appear on the line, the scanf function then waits for more input to be entered from your terminal window.

In Program 10.5, scanf is used to read three character strings.

Example 10.5. Reading Strings with scanf

//  Program to illustrate the %s scanf format characters

#include <stdio.h>

int main (void)
{
    char  s1[81], s2[81], s3[81];

    printf ("Enter text:\n");

    scanf ("%s%s%s", s1, s2, s3);

    printf ("\ns1 = %s\ns2 = %s\ns3 = %s\n", s1, s2, s3);
    return 0;
}

Example 10.5. Output

Enter text:
system expansion
bus

s1 = system
s2 = expansion
s3 = bus

In the preceding program, the scanf function is called to read in three character strings: s1, s2, and s3. Because the first line of text contains only two character strings—where the definition of a character string to scanf is a sequence of characters up to a space, tab, or the end of the line—the program waits for more text to be entered. After this is done, the printf call is used to verify that the strings "system", "expansion", and "bus" are correctly stored inside the string arrays s1, s2, and s3, respectively.

If you type in more than 80 consecutive characters to the preceding program without pressing the spacebar, the tab key, or the Enter (or Return) key, scanf overflows one of the character arrays. This might cause the program to terminate abnormally or cause unpredictable things to happen. Unfortunately, scanf has no way of knowing how large your character arrays are. When handed a %s format, it simply continues to read and store characters until one of the noted terminator characters is reached.

If you place a number after the % in the scanf format string, this tells scanf the maximum number of characters to read. So, if you used the following scanf call:

scanf ("%80s%80s%80s", s1, s2, s3);

instead of the one shown in Program 10.5, scanf knows that no more than 80 characters are to be read and stored into any one of s1, s2, or s3. (You still have to leave room for the terminating null character that scanf stores at the end of the array. That’s why %80s is used instead of %81s.)

Single-Character Input

The standard library provides several functions for the express purposes of reading and writing single characters and entire character strings. A function called getchar can be used to read in a single character from the terminal. Repeated calls to the getchar function return successive single characters from the input. When the end of the line is reached, the function returns the newline character '\n'. So, if the characters “abc” are typed at the terminal, followed immediately by the Enter (or Return) key, the first call to the getchar function returns the character 'a', the second call returns the character 'b', the third call returns 'c', and the fourth call returns the newline character '\n'. A fifth call to this function causes the program to wait for more input to be entered from the terminal.

You might be wondering why you need the getchar function when you already know how to read in a single character with the %c format characters of the scanf function. Using the scanf function for this purpose is a perfectly valid approach; however, the getchar function is a more direct approach because its sole purpose is for reading in single characters, and, therefore, it does not require any arguments. The function returns a single character that might be assigned to a variable or used as desired by the program.

In many text-processing applications, you need to read in an entire line of text. This line of text is frequently stored in a single place—generally called a “buffer”—where it is processed further. Using the scanf call with the %s format characters does not work in such a case because the string is terminated as soon as a space is encountered in the input.

Also available from the function library is a function called gets, whose sole purpose is to read in a single line of text. (Because gets has no way to limit how many characters it stores, it was removed from the C standard in C11; fgets is the usual replacement.) As an interesting program exercise, Program 10.6 shows how a function similar to gets, called readLine here, can be developed using the getchar function. The function takes a single argument: a character array in which the line of text is to be stored. Characters read from the terminal window up to, but not including, the newline character are stored in this array by the function.

Example 10.6. Reading Lines of Data

#include <stdio.h>

int main (void)
{
    int   i;
    char  line[81];
    void  readLine (char  buffer[]);

    for ( i = 0; i < 3; ++i )
    {
        readLine (line);
        printf ("%s\n\n", line);
    }

    return 0;
}

// Function to read a line of text from the terminal

void  readLine (char  buffer[])
{
    char  character;
    int   i = 0;

    do
    {
        character = getchar ();
        buffer[i] = character;
        ++i;
    }
    while ( character != '\n' );

    buffer[i - 1] = '\0';
}

Example 10.6. Output

This is a sample line of text.
This is a sample line of text.

abcdefghijklmnopqrstuvwxyz
abcdefghijklmnopqrstuvwxyz

runtime library routines
runtime library routines

The do loop in the readLine function is used to build up the input line inside the character array buffer. Each character that is returned by the getchar function is stored in the next location of the array. When the newline character is reached, signaling the end of the line, the loop is exited. The null character is then stored inside the array to terminate the character string, replacing the newline character that was stored there the last time that the loop was executed. The index number i - 1 indexes the correct position in the array because the index number was incremented one extra time inside the loop the last time it was executed.

The main routine defines a character array called line with enough space reserved to hold 81 characters. This ensures that an entire line (80 characters has historically been used as the line length of a “standard terminal”) plus the null character can be stored inside the array. However, even in windows that display 80 or fewer characters per line, you are still in danger of overflowing the array if you continue typing past the end of the line without pressing the Enter (or Return) key. It is a good idea to extend the readLine function to accept as a second argument the size of the buffer. In this way, the function can ensure that the capacity of the buffer is not exceeded.

The program then enters a for loop, which simply calls the readLine function three times. Each time that this function is called, a new line of text is read from the terminal. This line is simply echoed back at the terminal to verify proper operation of the function. After the third line of text has been displayed, execution of Program 10.6 is then complete.

For your next program example (see Program 10.7), consider a practical text-processing application: counting the number of words in a portion of text. Develop a function called countWords, which takes as its argument a character string and which returns the number of words contained in that string. For the sake of simplicity, assume here that a word is defined as a sequence of one or more alphabetic characters. The function can scan the character string for the occurrence of the first alphabetic character and considers all subsequent characters up to the first nonalphabetic character as part of the same word. Then, the function can continue scanning the string for the next alphabetic character, which identifies the start of a new word.

Example 10.7. Counting Words

//  Function to determine if a character is alphabetic

#include <stdio.h>
#include <stdbool.h>

bool alphabetic (const char  c)
{
    if  ( (c >= 'a'  &&  c <= 'z') || (c >= 'A'  &&  c <= 'Z') )
       return true;
    else
       return false;
}

/* Function to count the number of words in a string */

int  countWords (const char  string[])
{
    int   i, wordCount = 0;
    bool  lookingForWord = true, alphabetic (const char  c);

    for ( i = 0;  string[i] != '\0';  ++i )
        if ( alphabetic(string[i]) )
        {
            if ( lookingForWord )
            {
                ++wordCount;
                lookingForWord = false;
            }
        }
        else
            lookingForWord = true;
    return wordCount;
}

int main (void)
{
    const char  text1[] = "Well, here goes.";
    const char  text2[] = "And here we go... again.";
    int   countWords (const char  string[]);

    printf ("%s - words = %i\n", text1, countWords (text1));
    printf ("%s - words = %i\n", text2, countWords (text2));

    return 0;
}

Example 10.7. Output

Well, here goes. - words = 3
And here we go... again. - words = 5

The alphabetic function is straightforward enough—it simply tests the value of the character passed to it to determine if it is either a lowercase or uppercase letter. If it is either, the function returns true, indicating that the character is alphabetic; otherwise, the function returns false.

The countWords function is not as straightforward. The integer variable i is used as an index number to sequence through each character in the string. The Boolean variable lookingForWord is used as a flag to indicate whether you are currently in the process of looking for the start of a new word. At the beginning of the execution of the function, you obviously are looking for the start of a new word, so this flag is set to true. The local variable wordCount is used for the obvious purpose of counting the number of words in the character string.

For each character inside the character string, a call to the alphabetic function is made to determine whether the character is alphabetic. If the character is alphabetic, the lookingForWord flag is tested to determine if you are in the process of looking for a new word. If you are, the value of wordCount is incremented by 1, and the lookingForWord flag is set to false, indicating that you are no longer looking for the start of a new word.

If the character is alphabetic and the lookingForWord flag is false, this means that you are currently scanning inside a word. In such a case, the for loop is continued with the next character in the string.

If the character is not alphabetic—meaning either that you have reached the end of a word or that you have still not found the beginning of the next word—the flag lookingForWord is set to true (even though it might already be true).

When all of the characters inside the character string have been examined, the function returns the value of wordCount to indicate the number of words that were found in the character string.
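As a point of comparison, the same flag-based logic can be sketched with the standard library routine isalpha from <ctype.h> in place of the hand-written alphabetic function. This is only an illustrative variant; the name countWordsCtype is made up for this sketch and does not appear in the program above.

```c
#include <ctype.h>
#include <stdbool.h>

// Hypothetical variant of countWords that relies on isalpha

int  countWordsCtype (const char  string[])
{
    int   i, wordCount = 0;
    bool  lookingForWord = true;

    for ( i = 0;  string[i] != '\0';  ++i )
        if ( isalpha ((unsigned char) string[i]) )
        {
            if ( lookingForWord )
            {
                ++wordCount;              // first letter of a new word
                lookingForWord = false;
            }
        }
        else
            lookingForWord = true;        // any non-letter ends the word

    return wordCount;
}
```

Calling countWordsCtype ("Well, here goes.") returns 3, the same as countWords. Like the original, this sketch treats an apostrophe as a separator, so a word such as "isn't" is counted as two words.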

It is helpful to present a table of the values of the various variables in the countWords function to see how the algorithm works. Table 10.1 shows such a table, with the first call to the countWords function from the preceding program as an example. The first line of Table 10.1 shows the initial value of the variables wordCount and lookingForWord before the for loop is entered. Subsequent lines depict the values of the indicated variables each time through the for loop. So, the second line of the table shows that the value of wordCount has been set to 1 and the lookingForWord flag set to false (0) after the first time through the loop (after the 'W' has been processed). The last line of the table shows the final values of the variables when the end of the string is reached. You should spend some time studying this table, verifying the values of the indicated variables against the logic of the countWords function. After this has been accomplished, you should then feel comfortable with the algorithm that is used by the function to count the number of words in a string.

Table 10.1. Execution of the countWords Function

  i    string[i]    wordCount    lookingForWord

                        0        true
  0    'W'              1        false
  1    'e'              1        false
  2    'l'              1        false
  3    'l'              1        false
  4    ','              1        true
  5    ' '              1        true
  6    'h'              2        false
  7    'e'              2        false
  8    'r'              2        false
  9    'e'              2        false
 10    ' '              2        true
 11    'g'              3        false
 12    'o'              3        false
 13    'e'              3        false
 14    's'              3        false
 15    '.'              3        true
 16    '\0'             3        true

The Null String

Now consider a slightly more practical example of the use of the countWords function. This time, you make use of your readLine function to allow the user to type in multiple lines of text at the terminal window. The program then counts the total number of words in the text and displays the result.

To make the program more flexible, you do not limit or specify the number of lines of text that are entered. Therefore, you must have a way for the user to “tell” the program when he is done entering text. One way to do this is to have the user simply press the Enter (or Return) key an extra time after the last line of text has been entered. When the readLine function is called to read in such a line, the function immediately encounters the newline character and, as a result, stores the null character as the first (and only) character in the buffer. Your program can check for this special case and can know that the last line of text has been entered after a line containing no characters has been read.

A character string that contains no characters other than the null character has a special name in the C language; it is called the null string. When you think about it, the use of the null string is still perfectly consistent with all of the functions that you have defined so far in this chapter. The stringLength function correctly returns 0 as the size of the null string; your concat function also properly concatenates “nothing” onto the end of another string; even your equalStrings function works correctly if either or both strings are null (and in the latter case, the function correctly calls these strings equal).

Always remember that the null string does, in fact, have a character in it, albeit a null one.

Sometimes, it becomes desirable to set the value of a character string to the null string. In C, the null string is denoted by an adjacent pair of double quotation marks. So, the statement

char  buffer[100] = "";

defines a character array called buffer and sets its value to the null string. Note that the character string "" is not the same as the character string " " because the second string contains a single blank character. (If you are doubtful, send both strings to the equalStrings function and see what result comes back.)
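If you would rather not depend on equalStrings for this check, the test for a null string can be sketched directly: a string is null exactly when its first character is the null character. The function name isNullString is an assumption made for this illustration only.

```c
#include <stdbool.h>

// Hypothetical helper: true if the string contains nothing
// but the terminating null character

bool  isNullString (const char  string[])
{
    return string[0] == '\0';
}
```

With this helper, isNullString ("") is true and isNullString (" ") is false, because the second string holds a blank before its terminating null.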

Program 10.8 uses the readLine, alphabetic, and countWords functions from previous programs. They have not been shown in the program listing to conserve space.

Example 10.8. Counting Words in a Piece of Text

#include <stdio.h>
#include <stdbool.h>

/*****  Insert alphabetic function here   *****/

/*****  Insert readLine function here    *****/

/*****  Insert countWords function here  *****/

int main (void)
{
    char  text[81];
    int   totalWords = 0;
    int   countWords (const char  string[]);
    void  readLine (char  buffer[]);
    bool  endOfText = false;

    printf ("Type in your text.\n");
    printf ("When you are done, press 'RETURN'.\n\n");

    while ( ! endOfText )
    {
        readLine (text);

        if ( text[0] == '\0' )
            endOfText = true;
        else
            totalWords += countWords (text);
    }

    printf ("\nThere are %i words in the above text.\n",  totalWords);

    return 0;
}

Example 10.8. Output

Type in your text.
When you are done, press 'RETURN'.

Wendy glanced up at the ceiling where the mound of lasagna loomed
like a mottled mountain range. Within seconds, she was crowned with
ricotta ringlets and a tomato sauce tiara. Bits of beef formed meaty
moles on her forehead. After the second thud, her culinary coronation
was complete.
Enter
There are 48 words in the above text.

The line labeled Enter indicates the pressing of the Enter or Return key.

The endOfText variable is used as a flag to indicate when the end of the input text has been reached. The while loop is executed as long as this flag is false. Inside this loop, the program calls the readLine function to read a line of text. The if statement then tests the input line that is stored inside the text array to see if just the Enter (or Return) key was pressed. If so, then the buffer contains the null string, in which case the endOfText flag is set to true to signal that all of the text has been entered.

If the buffer does contain some text, the countWords function is called to count the number of words in the text array. The value that is returned by this function is added into the value of totalWords, which contains the cumulative number of words from all lines of text entered thus far.

After the while loop is exited, the program displays the value of totalWords, along with some informative text, at the terminal.

It might seem that the preceding program does not help to reduce your work efforts much because you still have to manually enter all of the text at the terminal. But as you will see in Chapter 16, “Input and Output Operations in C,” this same program can also be used to count the number of words contained in a file stored on a disk, for example. So, an author using a computer system for the preparation of a manuscript might find this program extremely valuable as it can be used to quickly determine the number of words contained in the manuscript (assuming the file is stored as a normal text file and not in some word processor format like Microsoft Word).

Escape Characters

As alluded to previously, the backslash character has a special significance that extends beyond its use in forming the newline and null characters. Just as the backslash and the letter n, when used in combination, cause subsequent printing to begin on a new line, so can other characters be combined with the backslash character to perform special functions. These various backslash characters, often referred to as escape characters, are summarized in Table 10.2.

Table 10.2. Escape Characters

Escape         Character Name

\a             Audible alert
\b             Backspace
\f             Form feed
\n             Newline
\r             Carriage return
\t             Horizontal tab
\v             Vertical tab
\\             Backslash
\"             Double quotation mark
\'             Single quotation mark
\?             Question mark
\nnn           Octal character value nnn
\unnnn         Universal character name
\Unnnnnnnn     Universal character name
\xnn           Hexadecimal character value nn

The first seven characters listed in Table 10.2 perform the indicated function on most output devices when they are displayed. The audible alert character, \a, sounds a “bell” in most terminal windows. So, the printf call

printf ("\aSYSTEM SHUT DOWN IN 5 MINUTES!!\n");

sounds an alert and displays the indicated message.

Including the backspace character '\b' inside a character string causes the terminal to backspace one character at the point at which the character appears in the string, provided that it is supported by the terminal window. Similarly, the function call

printf ("%i\t%i\t%i\n", a, b, c);

displays the value of a, spaces over to the next tab setting (typically set to every eight columns by default), displays the value of b, spaces over to the next tab setting, and then displays the value of c. The horizontal tab character is particularly useful for lining up data in columns.

To include the backslash character itself inside a character string, two backslash characters are necessary, so the printf call

printf ("\\t is the horizontal tab character.\n");

displays the following:

\t is the horizontal tab character.

Note that because the \\ is encountered first in the string, a tab is not displayed in this case.

To include a double quotation character inside a character string, it must be preceded by a backslash. So, the printf call

printf ("\"Hello,\" he said.\n");

results in the display of the message

"Hello," he said.

To assign a single quotation character to a character variable, the backslash character must be placed before the quotation mark. If c is declared to be a variable of type char, the statement

c = '\'';

assigns a single quotation character to c.

The backslash character, followed immediately by a ?, is used to represent a ? character. This is sometimes necessary when dealing with trigraphs in non-ASCII character sets. For more details, consult Appendix A, “C Language Summary.”

The final four entries in Table 10.2 enable any character to be included in a character string. In the escape character '\nnn', nnn is a one- to three-digit octal number. In the escape character '\xnn', nn is a hexadecimal number. These numbers represent the internal code of the character. This enables characters that might not be directly available from the keyboard to be coded into a character string. For example, to include an ASCII escape character, which has the value octal 33, you could include the sequence \033 or \x1b inside your string.
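To convince yourself that the octal and hexadecimal forms name the same character, you can compare them directly. The following is a minimal sketch; the function name sameEscape is hypothetical and serves only this illustration.

```c
#include <stdbool.h>

// Hypothetical check: the octal escape '\033' and the hexadecimal
// escape '\x1b' both denote the character whose code is 27

bool  sameEscape (void)
{
    char  fromOctal = '\033';     // octal 33
    char  fromHex   = '\x1b';     // hexadecimal 1b

    return fromOctal == fromHex  &&  fromOctal == 27;
}
```

Octal 33 is 3 x 8 + 3, and hexadecimal 1b is 1 x 16 + 11, so both work out to decimal 27 and the function returns true.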

The null character '\0' is a special case of the escape character sequence described in the preceding paragraph. It represents the character that has a value of 0. In fact, because the value of the null character is 0, this knowledge is frequently used by programmers in tests and loops dealing with variable-length character strings. For example, the loop to count the length of a character string in the function stringLength from Program 10.2 can also be equivalently coded as follows:

while ( string[count] )
    ++count;

The value of string[count] is nonzero until the null character is reached, at which point the while loop is exited.

It should once again be pointed out that these escape characters are only considered a single character inside a string. So, the character string "\033\"Hello\"\n" actually consists of nine characters (not counting the terminating null): the character '\033', the double quotation character '"', the five characters in the word Hello, the double quotation character once again, and the newline character. Try passing the preceding character string to the stringLength function to verify that nine is indeed the number of characters in the string (again, excluding the terminating null).
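As a quick check of that count, here is a minimal sketch. The stringLength function is written in the compact form shown earlier rather than copied from Program 10.2, and sampleLength is a made-up helper that exists only for this illustration.

```c
// Sketch of a string-length function whose loop test relies on
// the null character having the value 0

int  stringLength (const char  string[])
{
    int  count = 0;

    while ( string[count] )      // false only at the terminating null
        ++count;

    return count;
}

// Hypothetical helper: length of the sample string discussed above

int  sampleLength (void)
{
    return stringLength ("\033\"Hello\"\n");
}
```

Each escape sequence in the sample contributes exactly one character, so sampleLength returns 9.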

A universal character name is formed by the characters \u followed by four hexadecimal digits or the characters \U followed by eight hexadecimal digits. It is used for specifying characters from extended character sets; that is, character sets that require more than the standard eight bits for internal representation. The universal character name escape sequence can be used to form identifier names from extended character sets, as well as to specify 16-bit and 32-bit characters inside wide character string and character string constants. For more information, refer to Appendix A.

More on Constant Strings

If you place a backslash character at the very end of the line and follow it immediately by a carriage return, it tells the C compiler to ignore the end of the line. This line continuation technique is used primarily for continuing long constant character strings onto the next line and, as you see in Chapter 13, “The Preprocessor,” for continuing a macro definition onto the next line.

Without the line continuation character, your C compiler generates an error message if you attempt to initialize a character string across multiple lines; for example:

      char  letters[] =
           { "abcdefghijklmnopqrstuvwxyz
ABCDEFGHIJKLMNOPQRSTUVWXYZ" };

By placing a backslash character at the end of each line to be continued, a character string constant can be written over multiple lines:

      char  letters[] =
           { "abcdefghijklmnopqrstuvwxyz\
ABCDEFGHIJKLMNOPQRSTUVWXYZ" };

It is necessary to begin the continuation of the character string constant at the beginning of the next line because, otherwise, the leading blank spaces on the line get stored in the character string. The preceding statement, therefore, has the net result of defining the character array letters and of initializing its elements to the character string

"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

Another way to break up long character strings is to divide them into two or more adjacent strings. Adjacent strings are constant strings separated by zero or more spaces, tabs, or newlines. The compiler automatically concatenates adjacent strings together. Therefore, writing the strings

"one"  "two"  "three"

is syntactically equivalent to writing the single string

"onetwothree"

So, the letters array can also be set to the letters of the alphabet by writing

char  letters[] =
     { "abcdefghijklmnopqrstuvwxyz"
       "ABCDEFGHIJKLMNOPQRSTUVWXYZ" };

Finally, the three printf calls

printf ("Programming in C is fun\n");
printf ("Programming"  " in C is fun\n");
printf ("Programming"  " in C"  " is fun\n");

all pass a single argument to printf because the compiler concatenates the strings together in the second and third calls.
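One way to see the concatenation at work is to compare an array initialized from adjacent strings against the single string it should equal. The sketch below uses the standard library routines strcmp and strlen from <string.h>, which this chapter has not introduced; the function name adjacentStringsMatch is hypothetical.

```c
#include <string.h>
#include <stdbool.h>

// Hypothetical check that adjacent strings are joined by the compiler

bool  adjacentStringsMatch (void)
{
    const char  joined[] = "one"  "two"  "three";
    const char  letters[] =
         { "abcdefghijklmnopqrstuvwxyz"
           "ABCDEFGHIJKLMNOPQRSTUVWXYZ" };

    // strcmp returns 0 when its two arguments are identical
    return strcmp (joined, "onetwothree") == 0  &&  strlen (letters) == 52;
}
```

The function returns true: joined holds "onetwothree", and letters holds all 52 letters with no extra characters between the two halves.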

Character Strings, Structures, and Arrays

You can combine the basic elements of the C programming language to form very powerful programming constructs in many ways. In Chapter 9, “Working with Structures,” for example, you saw how you could easily define an array of structures. Program 10.9 further illustrates the notion of arrays of structures, combined with the variable-length character string.

Suppose you want to write a computer program that acts like a dictionary. If you had such a program, you could use it whenever you came across a word whose meaning was not clear. You could type the word into the program, and the program could then automatically “look up” the word inside the dictionary and tell you its definition.

If you contemplate developing such a program, one of the first thoughts that comes to mind is the representation of the word and its definition inside the computer. Obviously, because the word and its definition are logically related, the notion of a structure comes immediately to mind. You can define a structure called entry, for example, to hold the word and its definition:

struct  entry
{
    char  word[15];
    char  definition[50];
};

In the preceding structure definition, you have defined enough space for a 14-letter word (remember, you are dealing with variable-length character strings, so you need to leave room for the null character) plus a 49-character definition. The following is an example of a variable defined to be of type struct entry that is initialized to contain the word “blob” and its definition.

struct entry  word1 = { "blob", "an amorphous mass" };

Because you want to provide for many words inside your dictionary, it seems logical to define an array of entry structures, such as in

struct entry  dictionary[100];

which allows for a dictionary of 100 words. Obviously, this is far from sufficient if you are interested in setting up an English language dictionary, which requires at least 100,000 entries to be of any value. In that case, you would probably adopt a more sophisticated approach, one that would typically involve storing the dictionary on the computer’s disk, as opposed to storing its entire contents in memory.

Having defined the structure of your dictionary, you should now think a bit about its organization. Most dictionaries are organized alphabetically. It makes sense to organize yours the same way. For now, assume that this is because it makes the dictionary easier to read. Later, you see the real motivation for such an organization.

Now, it’s time to think about the development of the program. It is convenient to define a function to look up a word inside the dictionary. If the word is found, the function could return the entry number of the word inside the dictionary; otherwise, the function could return –1 to indicate that the word was not found in the dictionary. So, a typical call to this function, which you can call lookup, might appear as follows:

entry = lookup (dictionary, word, entries);

In this case, the lookup function searches dictionary for the word as contained in the character string word. The third argument, entries, represents the number of entries in the dictionary. The function searches the dictionary for the specified word and returns the entry number in the dictionary if the word is found, or returns –1 if the word is not found.

In Program 10.9, the lookup function uses the equalStrings function defined in Program 10.4 to determine if the specified word matches an entry in the dictionary.

Example 10.9. Using the Dictionary Lookup Program

// Program to use the dictionary lookup program

#include <stdio.h>
#include <stdbool.h>

struct  entry
{
    char   word[15];

    char   definition[50];
};

/***** Insert equalStrings function here *****/

// function to look up a word inside a dictionary

int  lookup (const struct entry  dictionary[], const char  search[],
             const int  entries)
{
    int  i;
    bool equalStrings (const char s1[], const char s2[]);

    for ( i = 0;  i < entries;  ++i )
        if ( equalStrings (search, dictionary[i].word) )
            return i;

    return -1;
}

int main (void)
{
    const struct entry  dictionary[100] =
      { { "aardvark", "a burrowing African mammal"        },
        { "abyss",    "a bottomless pit"                  },
        { "acumen",   "mentally sharp; keen"              },
        { "addle",    "to become confused"                },
        { "aerie",    "a high nest"                       },
        { "affix",    "to append; attach"                 },
        { "agar",     "a jelly made from seaweed"         },
        { "ahoy",     "a nautical call of greeting"       },
        { "aigrette", "an ornamental cluster of feathers" },
        { "ajar",     "partially opened"                  } };

    char  word[15];
    int   entries = 10;
    int   entry;
    int   lookup (const struct entry  dictionary[], const char  search[],
                  const int  entries);

    printf ("Enter word: ");
    scanf ("%14s", word);
    entry = lookup (dictionary, word, entries);

    if ( entry != -1 )
        printf ("%s\n", dictionary[entry].definition);
    else
        printf ("Sorry, the word %s is not in my dictionary.\n", word);
    return 0;
}

Example 10.9. Output

Enter word: agar
a jelly made from seaweed

Example 10.9. Output (Rerun)

Enter word: accede
Sorry, the word accede is not in my dictionary.

The lookup function sequences through each entry in the dictionary. For each such entry, the function calls the equalStrings function to determine if the character string search matches the word member of the particular dictionary entry. If it does match, the function returns the value of the variable i, which is the entry number of the word that was found in the dictionary. The function is exited immediately upon execution of the return statement, despite the fact that the function is in the middle of executing a for loop.

If the lookup function exhausts all the entries in the dictionary without finding a match, the return statement after the for loop is executed to return the “not found” indication (–1) back to the caller.

A Better Search Method

The method used by the lookup function to search for a particular word in the dictionary is straightforward enough; the function simply performs a sequential search through all the entries in the dictionary until either a match is made or the end of the dictionary is reached. For a small-sized dictionary like the one in your program, this approach is perfectly fine. However, if you start dealing with large dictionaries containing hundreds or perhaps even thousands of entries, this approach might no longer be sufficient because of the time it takes to sequentially search through all of the entries. The time required can be considerable—even though considerable in this case could mean only a fraction of a second. One of the prime considerations that must be given to any sort of information retrieval program is that of speed. Because the searching process is one that is so frequently used in computer applications, much attention has been given by computer scientists to developing efficient algorithms for searching (about as much attention as has been given to the process of sorting).

You can make use of the fact that your dictionary is in alphabetical order to develop a more efficient lookup function. The first obvious optimization that comes to mind is in the case that the word you are looking for does not exist in the dictionary. You can make your lookup function “intelligent” enough to recognize when it has gone too far in its search. For example, if you look up the word “active” in the dictionary defined in Program 10.9, as soon as you reach the word “acumen,” you can conclude that “active” is not there because, if it was, it would have appeared in the dictionary before the word “acumen.”
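This "gone too far" test might be sketched as follows. The sketch assumes the standard library routine strcmp from <string.h>, which returns a negative value, zero, or a positive value according to how its first argument compares with its second; the function name lookupSorted is made up for this illustration.

```c
#include <string.h>

struct  entry
{
    char  word[15];
    char  definition[50];
};

// Hypothetical sequential search that gives up as soon as it passes
// the spot where the word would have to appear in a sorted dictionary

int  lookupSorted (const struct entry  dictionary[], const char  search[],
                   const int  entries)
{
    int  i, result;

    for ( i = 0;  i < entries;  ++i )
    {
        result = strcmp (dictionary[i].word, search);

        if ( result == 0 )
            return i;          // found it
        else if ( result > 0 )
            break;             // gone too far; the word cannot be present
    }

    return -1;                 // not found
}
```

Looking up "active" with this function stops at "acumen," the third entry, instead of examining every entry in the dictionary.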

As was mentioned, the preceding optimization strategy does help to reduce your search time somewhat, but only when a particular word is not present in the dictionary. What you are really looking for is an algorithm that reduces the search time in most cases, not just in one particular case. Such an algorithm exists under the name of the binary search.

The strategy behind the binary search is relatively simple to understand. To illustrate how this algorithm works, take an analogous situation of a simple guessing game. Suppose I pick a number from 1 to 99 and then tell you to try to guess the number in the fewest number of guesses. For each guess that you make, I can tell you if you are too low, too high, or if your guess is correct. After a few tries at the game, you will probably realize that a good way to narrow in on the answer is by using a halving process. For example, if you take 50 as your first guess, an answer of either “too high” or “too low” narrows the possibilities down from 99 to 49. If the answer was “too high,” the number must be from 1 to 49, inclusive; if the answer was “too low,” the number must be from 51 to 99, inclusive.

You can now repeat the halving process with the remaining 49 numbers. So if the first answer was “too low,” the next guess should be halfway between 51 and 99, which is 75. This process can be continued until you finally narrow in on the answer. On average, this procedure arrives at the answer in far fewer guesses than working through the numbers one at a time.
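To get a feel for how quickly the halving process converges, you can let the computer play the game. The following is a minimal sketch; the function name guessesNeeded and its parameters are assumptions made for this illustration.

```c
// Hypothetical simulation of the guessing game: counts how many guesses
// the halving strategy needs to find secret, which is assumed to lie
// between low and high, inclusive

int  guessesNeeded (int  low, int  high, int  secret)
{
    int  guess, count = 0;

    while ( low <= high )
    {
        guess = (low + high) / 2;
        ++count;

        if ( guess == secret )
            return count;
        else if ( guess > secret )     // "too high"
            high = guess - 1;
        else                           // "too low"
            low = guess + 1;
    }

    return -1;     // secret was outside the stated range
}
```

For the range 1 to 99, guessesNeeded (1, 99, 75) returns 2, matching the sequence of guesses described above, and no secret in that range requires more than 7 guesses.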

The preceding discussion describes precisely how the binary search algorithm works. The following provides a formal description of the algorithm. In this algorithm, you are looking for an element x inside an array M, which contains n elements. The algorithm assumes that the array M is sorted in ascending order.

Binary Search Algorithm

  1. Set low to 0, high to n – 1.

  2. If low > high, x does not exist in M and the algorithm terminates.

  3. Set mid to (low + high) / 2.

  4. If M[mid] < x, set low to mid + 1 and go to step 2.

  5. If M[mid] > x, set high to mid – 1 and go to step 2.

  6. M[mid] equals x and the algorithm terminates.

The division performed in step 3 is an integer division, so if low is 0 and high is 49, the value of mid is 24.
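Before applying the algorithm to character strings, it can help to see the six steps written out for a plain array of integers. The following is a sketch under that assumption; the function name binarySearch and the convention of returning the index of the match, or -1 when there is none, are choices made for this illustration and are not part of the algorithm's formal description.

```c
// Sketch of the binary search steps for an ascending array of integers.
// Returns the index of x inside M, or -1 if x is not present.

int  binarySearch (const int  M[], int  n, int  x)
{
    int  low = 0, high = n - 1, mid;      // step 1

    while ( low <= high )                 // step 2
    {
        mid = (low + high) / 2;           // step 3 (integer division)

        if ( M[mid] < x )                 // step 4
            low = mid + 1;
        else if ( M[mid] > x )            // step 5
            high = mid - 1;
        else
            return mid;                   // step 6
    }

    return -1;                            // low > high: x is not in M
}
```

Given an array holding 2, 5, 8, 13, 21, and 34, a search for 13 examines the elements at indices 2, 4, and 3 before returning 3.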

Now that you have the algorithm for performing a binary search, you can rewrite your lookup function to use this new search strategy. Because the binary search must be able to determine if one value is less than, greater than, or equal to another value, you might want to replace your equalStrings function with another function that makes this type of determination for two character strings. Call the function compareStrings and have it return the value –1 if the first string is lexicographically less than the second string, 0 if the two strings are equal, and 1 if the first string is lexicographically greater than the second string. So, the function call

compareStrings ("alpha", "altered")

returns the value –1 because the first string is lexicographically less than the second string (think of this to mean that the first string occurs before the second string in a dictionary). And, the function call

compareStrings ("zioty", "yucca");

returns the value 1 because “zioty” is lexicographically greater than “yucca.”

In Program 10.10, the new compareStrings function is presented. The lookup function now uses the binary search method to scan through the dictionary. The main routine remains unchanged from the previous program.

Example 10.10. Modifying the Dictionary Lookup Using Binary Search

// Dictionary lookup program

#include <stdio.h>

struct  entry
{
    char  word[15];
    char  definition[50];
};

// Function to compare two character strings

int  compareStrings (const char  s1[], const char  s2[])
{
    int  i = 0, answer;

    while ( s1[i] == s2[i] && s1[i] != '\0' && s2[i] != '\0' )
        ++i;

    if ( s1[i] < s2[i] )
        answer = -1;               /* s1 < s2  */
    else if ( s1[i] == s2[i] )
        answer = 0;                 /* s1 == s2 */
    else
        answer = 1;                 /* s1 > s2  */

    return answer;
}

// Function to look up a word inside a dictionary

int  lookup (const struct entry  dictionary[], const char  search[],
             const int  entries)
{
    int  low = 0;
    int  high = entries - 1;
    int  mid, result;
    int  compareStrings (const char  s1[], const char  s2[]);

    while  ( low <= high )
    {
        mid = (low + high) / 2;
        result = compareStrings (dictionary[mid].word, search);

        if ( result == -1 )
            low = mid + 1;
        else if ( result == 1 )
            high = mid - 1;
        else
            return mid;    /* found it */
     }

     return -1;            /* not found */
}

int main (void)
{
    const struct entry  dictionary[100] =
       { { "aardvark", "a burrowing African mammal"        },
         { "abyss",    "a bottomless pit"                  },
         { "acumen",   "mentally sharp; keen"              },
         { "addle",    "to become confused"                },
         { "aerie",    "a high nest"                       },
         { "affix",    "to append; attach"                 },
         { "agar",     "a jelly made from seaweed"         },
         { "ahoy",     "a nautical call of greeting"       },
         { "aigrette", "an ornamental cluster of feathers" },
         { "ajar",     "partially opened"                  } };
    int   entries = 10;
    char  word[15];
    int   entry;
    int   lookup (const struct entry  dictionary[], const char  search[],
                  const int  entries);

    printf ("Enter word: ");
    scanf ("%14s", word);

    entry = lookup (dictionary, word, entries);

    if ( entry != -1 )
        printf ("%s\n", dictionary[entry].definition);
    else
        printf ("Sorry, the word %s is not in my dictionary.\n", word);

    return 0;
}

Example 10.10. Output

Enter word: aigrette
an ornamental cluster of feathers

Example 10.10. Output (Rerun)

Enter word: acerb
Sorry, the word acerb is not in my dictionary.

The compareStrings function is identical to the equalStrings function up through the end of the while loop. When the while loop is exited, the function analyzes the two characters that resulted in the termination of the while loop. If s1[i] is less than s2[i], s1 must be lexicographically less than s2. In such a case, –1 is returned. If s1[i] is equal to s2[i], the two strings are equal so 0 is returned. If neither is true, s1 must be lexicographically greater than s2, in which case 1 is returned.

The lookup function defines int variables low and high and assigns them initial values defined by the binary search algorithm. The while loop executes as long as low does not exceed high. Inside the loop, the value mid is calculated by adding low and high and dividing the result by 2. The compareStrings function is then called with the word contained in dictionary[mid] and the word you are searching for as arguments. The returned value is assigned to the variable result.

If compareStrings returns a value of –1, indicating that dictionary[mid].word is less than search, lookup sets the value of low to mid + 1. If compareStrings returns 1, indicating that dictionary[mid].word is greater than search, lookup sets the value of high to mid – 1. If neither –1 nor 1 is returned, the two strings must be equal, and, in that case, lookup returns the value of mid, which is the entry number of the word in the dictionary.

If low eventually exceeds high, the word is not in the dictionary. In that case, lookup returns –1 to indicate this “not found” condition.
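The standard library offers a comparison routine, strcmp in <string.h>, that could stand in for compareStrings. One difference matters: strcmp is only guaranteed to return a negative value, zero, or a positive value, not exactly –1 or 1, so the tests must be written with < 0 and > 0. The following sketch shows the lookup function adapted accordingly; the name lookupStrcmp is hypothetical.

```c
#include <string.h>

struct  entry
{
    char  word[15];
    char  definition[50];
};

// Hypothetical variant of lookup that uses strcmp for the comparison

int  lookupStrcmp (const struct entry  dictionary[], const char  search[],
                   const int  entries)
{
    int  low = 0;
    int  high = entries - 1;
    int  mid, result;

    while ( low <= high )
    {
        mid = (low + high) / 2;
        result = strcmp (dictionary[mid].word, search);

        if ( result < 0 )          // entry sorts before search
            low = mid + 1;
        else if ( result > 0 )     // entry sorts after search
            high = mid - 1;
        else
            return mid;            // found it
    }

    return -1;                     // not found
}
```

Testing for result == -1 and result == 1 here would be a mistake, because a particular implementation of strcmp might return other negative or positive values.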

Character Operations

Character variables and constants are frequently used in relational and arithmetic expressions. To properly use characters in such situations, it is necessary for you to understand how they are handled by the C compiler.

Whenever a character constant or variable is used in an expression in C, it is automatically converted to, and subsequently treated as, an integer value.

In Chapter 6, “Making Decisions,” you saw how the expression

c >= 'a'  &&  c <= 'z'

could be used to determine if the character variable c contained a lowercase letter. As mentioned there, such an expression could be used on systems that used an ASCII character representation because the lowercase letters are represented sequentially in ASCII, with no other characters in-between. The first part of the preceding expression, which compares the value of c against the value of the character constant 'a', is actually comparing the value of c against the internal representation of the character 'a'. In ASCII, the character 'a' has the value 97, the character 'b' has the value 98, and so on. Therefore, the expression c >= 'a' is TRUE (nonzero) for any lowercase character contained in c because it has a value that is greater than or equal to 97. However, because there are characters other than the lowercase letters whose ASCII values are greater than 97 (such as the open and close braces), the test must be bounded on the other end to ensure that the result of the expression is TRUE for lowercase characters only. For this reason, c is compared against the character 'z', which, in ASCII, has the value 122.

Because comparing the value of c against the characters 'a' and 'z' in the preceding expression actually compares c to the numerical representations of 'a' and 'z', the expression

c >= 97  &&  c <= 122

could be equivalently used to determine if c is a lowercase letter. The first expression is preferred, however, because it does not require the knowledge of the specific numerical values of the characters 'a' and 'z', and because its intentions are less obscure.
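As a minimal sketch of the range test just described, the expression can be wrapped in a small function. The name isLowercase is hypothetical (it does not appear elsewhere in this chapter), and the function assumes a character set, such as ASCII, in which the lowercase letters have consecutive values.

```c
#include <stdbool.h>

// Hypothetical helper: returns true if c is a lowercase letter.
// Assumes a character set (such as ASCII) in which 'a' through 'z'
// are represented by consecutive values.
bool  isLowercase (char  c)
{
    return c >= 'a'  &&  c <= 'z';
}
```

On an ASCII system, isLowercase ('q') is true, whereas isLowercase ('Q') and isLowercase ('{') are both false. The second case shows why the upper bound is needed, because the open brace has a value greater than that of 'a'.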

The printf call

printf ("%i\n", c);

can be used to print out the value that is used to internally represent the character stored inside c. If your system uses ASCII, the statement

printf ("%i\n", 'a');

displays 97, for example.

Try to predict what the following two statements would produce:

c = 'a' + 1;
printf ("%c\n", c);

Because the value of 'a' is 97 in ASCII, the effect of the first statement is to assign the value 98 to the character variable c. Because this value represents the character 'b' in ASCII, this is the character that is displayed by the printf call.
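If you want to experiment with these ideas, the following sketch might help. Both function names, showCharValue and nextChar, are hypothetical and are introduced here only for illustration. The first displays a character next to the integer used to represent it, and the second performs the same addition as the statement c = 'a' + 1.

```c
#include <stdio.h>

// Hypothetical helper: displays a character together with the integer
// value used to represent it internally.
void  showCharValue (char  c)
{
    printf ("%c = %i\n", c, c);
}

// Hypothetical helper: returns the character whose internal value
// is one greater than that of c.
char  nextChar (char  c)
{
    return c + 1;
}
```

On a system that uses ASCII, the call showCharValue ('a') displays a = 97, and nextChar ('a') returns the character 'b'.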

Although adding one to a character constant hardly seems practical, the preceding example gives way to an important technique that is used to convert the characters '0' through '9' into their corresponding numerical values 0 through 9. Recall that the character '0' is not the same as the integer 0, the character '1' is not the same as the integer 1, and so on. In fact, the character '0' has the numerical value 48 in ASCII, which is what is displayed by the following printf call:

printf ("%i\n", '0');

Suppose the character variable c contains one of the characters '0' through '9' and that you want to convert this value into the corresponding integer 0 through 9. Because the digits of virtually all character sets are represented by sequential integer values, you can easily convert c into its integer equivalent by subtracting the character constant '0' from it. Therefore, if i is defined as an integer variable, the statement

i = c - '0';

has the effect of converting the character digit contained in c into its equivalent integer value. Suppose c contained the character '5', which, in ASCII, is the number 53. The ASCII value of '0' is 48, so execution of the preceding statement results in the integer subtraction of 48 from 53, which results in the integer value 5 being assigned to i. On a machine that uses a character set other than ASCII, the same result is obtained, even though the internal representations of '5' and '0' might differ, because the C standard requires the characters '0' through '9' to be represented by consecutive values.
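The conversion can be sketched as a small function. The name digitValue is hypothetical, and returning –1 for a character that is not a digit is an assumption made for this illustration; the text itself does not say what should happen in that case.

```c
// Hypothetical helper: converts a digit character '0' through '9'
// into the integer 0 through 9.
// Assumption for this sketch: -1 is returned if c is not a digit.
int  digitValue (char  c)
{
    if ( c >= '0'  &&  c <= '9' )
        return c - '0';

    return -1;
}
```

With this function, digitValue ('5') returns 5, and digitValue ('x') returns –1.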

The preceding technique can be extended to convert a character string consisting of digits into its equivalent numerical representation. This has been done in Program 10.11 in which a function called strToInt is presented to convert the character string passed as its argument into an integer value. The function ends its scan of the character string after a nondigit character is encountered and returns the result back to the calling routine. It is assumed that an int variable is large enough to hold the value of the converted number.

Program 10.11. Converting a String to its Integer Equivalent

// Function to convert a string to an integer

#include <stdio.h>

int  strToInt (const char  string[])
{
    int  i, intValue, result = 0;

    for  ( i = 0; string[i] >= '0' && string[i] <= '9'; ++i )
    {
        intValue = string[i] - '0';
        result = result * 10 + intValue;
    }

    return result;
}

int main (void)
{
    int  strToInt (const char  string[]);

    printf ("%i\n", strToInt("245"));
    printf ("%i\n", strToInt("100") + 25);
    printf ("%i\n", strToInt("13x5"));

    return 0;
}

Program 10.11. Output

245
125
13

The for loop is executed as long as the character contained in string[i] is a digit character. Each time through the loop, the character contained in string[i] is converted into its equivalent integer value and is then added into the value of result multiplied by 10. To see how this technique works, consider execution of this loop when the function is called with the character string "245" as an argument: The first time through the loop, intValue is assigned the value of string[0] - '0'. Because string[0] contains the character '2', this results in the value 2 being assigned to intValue. Because the value of result is 0 the first time through the loop, multiplying it by 10 produces 0, which is added to intValue and stored back in result. So, by the end of the first pass through the loop, result contains the value 2.

The second time through the loop, intValue is set equal to 4, as calculated by subtracting '0' from '4'. Multiplying result by 10 produces 20, which is added to the value of intValue, producing 24 as the value stored in result.

The third time through the loop, intValue is equal to '5' - '0', or 5, which is added into the value of result multiplied by 10 (240). Thus, the value 245 is the value of result after the loop has been executed for the third time.

Upon encountering the terminating null character, the for loop is exited and the value of result, 245, is returned to the calling routine.
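One way to check this walk-through for yourself is to display the two variables on each pass. The following sketch is a variant of strToInt with one printf call added inside the loop; the name strToIntTrace and the format of the trace line are assumptions made for this illustration.

```c
#include <stdio.h>

// Hypothetical variant of strToInt that displays the values of
// intValue and result at the end of each pass through the loop.
int  strToIntTrace (const char  string[])
{
    int  i, intValue, result = 0;

    for  ( i = 0; string[i] >= '0' && string[i] <= '9'; ++i )
    {
        intValue = string[i] - '0';
        result = result * 10 + intValue;
        printf ("pass %i: intValue = %i, result = %i\n",
                i + 1, intValue, result);
    }

    return result;
}
```

Calling strToIntTrace ("245") displays three lines, showing result taking the values 2, 24, and 245 in turn, and then returns 245.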

The strToInt function could be improved in two ways. First, it doesn’t handle negative numbers. Second, it doesn’t let you know whether the string contained any valid digit characters at all. For example, strToInt ("xxx") returns 0. These improvements are left as an exercise.

This discussion concludes this chapter on character strings. As you can see, C provides capabilities that enable character strings to be efficiently and easily manipulated. The standard C library contains a wide variety of functions for performing operations on strings. For example, it offers the function strlen to calculate the length of a character string, strcmp to compare two strings, strcat to concatenate two strings, strcpy to copy one string to another, atoi to convert a string to an integer, and isupper, islower, isalpha, and isdigit to test whether a character is uppercase, lowercase, alphabetic, or a digit. A good exercise is to rewrite the examples from this chapter to make use of these routines. Consult Appendix B, “The Standard C Library,” which lists many of the functions available from the library.
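As a brief sketch of what such a rewrite might look like, the following two functions rely on strlen, isdigit, and strcmp instead of hand-written loops and comparisons. The names countDigits and sameString are hypothetical and are not taken from the programs in this chapter.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

// Hypothetical helper: counts the digit characters in s,
// using strlen and isdigit from the standard library.
int  countDigits (const char  s[])
{
    int     count = 0;
    size_t  i, length = strlen (s);

    for  ( i = 0; i < length; ++i )
        // The cast keeps the argument in the range isdigit expects
        if ( isdigit ((unsigned char) s[i]) )
            ++count;

    return count;
}

// Hypothetical helper: returns true if s1 and s2 are equal,
// using strcmp, which returns 0 when the strings match.
bool  sameString (const char  s1[], const char  s2[])
{
    return strcmp (s1, s2) == 0;
}
```

For example, countDigits ("13x5") returns 3, and sameString ("abc", "abd") returns false. In the same spirit, the library call atoi ("245"), declared in <stdlib.h>, returns 245, just as strToInt does.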

Exercises

1.

Type in and run the 11 programs presented in this chapter. Compare the output produced by each program with the output presented after each program in the text.

2.

Why could you have replaced the while statement of the equalStrings function of Program 10.4 with the statement

while ( s1[i] == s2[i]  &&  s1[i] != '\0' )

to achieve the same results?

3.

The countWords function from Programs 10.7 and 10.8 incorrectly counts a word that contains an apostrophe as two separate words. Modify this function to correctly handle this situation. Also, extend the function to count a sequence of positive or negative numbers, including any embedded commas and periods, as a single word.

4.

Write a function called substring to extract a portion of a character string. The function should be called as follows:

substring (source, start, count, result);

where source is the character string from which you are extracting the substring, start is an index number into source indicating the first character of the substring, count is the number of characters to be extracted from the source string, and result is an array of characters that is to contain the extracted substring. For example, the call

substring ("character", 4, 3, result);

extracts the substring "act" (three characters starting with character number 4) from the string "character" and places the result in result.

Be certain the function inserts a null character at the end of the substring in the result array. Also, have the function check that the requested number of characters does, in fact, exist in the string. If this is not the case, have the function end the substring when it reaches the end of the source string. So, for example, a call such as

substring ("two words", 4, 20, result);

should just place the string “words” inside the result array, even though 20 characters were requested by the call.

5.

Write a function called findString to determine if one character string exists inside another string. The first argument to the function should be the character string that is to be searched and the second argument is the string you are interested in finding. If the function finds the specified string, have it return the location in the source string where the string was found. If the function does not find the string, have it return –1. So, for example, the call

index = findString ("a chatterbox", "hat");

searches the string "a chatterbox" for the string "hat". Because "hat" does exist inside the source string, the function returns 3 to indicate the starting position inside the source string where "hat" was found.

6.

Write a function called removeString to remove a specified number of characters from a character string. The function should take three arguments: the source string, the starting index number in the source string, and the number of characters to remove. So, if the character array text contains the string "the wrong son", the call

removeString (text, 4, 6);

has the effect of removing the characters "wrong " (the word “wrong” plus the space that follows) from the array text. The resulting string inside text is then "the son".

7.

Write a function called insertString to insert one character string into another string. The arguments to the function should consist of the source string, the string to be inserted, and the position in the source string where the string is to be inserted. So, the call

insertString (text, "per", 10);

with text as originally defined in the previous exercise, results in the character string "per" being inserted inside text, beginning at text[10]. Therefore, the character string "the wrong person" is stored inside the text array after the function returns.

8.

Using the findString, removeString, and insertString functions from preceding exercises, write a function called replaceString that takes three character string arguments as follows

replaceString (source, s1, s2);

and that replaces s1 inside source with the character string s2. The function should call the findString function to locate s1 inside source, then call the removeString function to remove s1 from source, and finally call the insertString function to insert s2 into source at the proper location.

So, the function call

replaceString (text, "1", "one");

replaces the first occurrence of the character string "1" inside the character string text, if it exists, with the string "one". Similarly, the function call

replaceString (text, "*", "");

has the effect of removing the first asterisk inside the text array because the replacement string is the null string.

9.

You can extend even further the usefulness of the replaceString function from the preceding exercise if you have it return a value that indicates whether the replacement succeeded, which means that the string to be replaced was found inside the source string. So, if the function returns true if the replacement succeeds and false if it does not, the loop

do
   stillFound = replaceString (text, " ", "");
while  ( stillFound );

could be used to remove all blank spaces from text, for example.

Incorporate this change into the replaceString function and try it with various character strings to ensure that it works properly.

10.

Write a function called dictionarySort that sorts a dictionary, as defined in Programs 10.9 and 10.10, into alphabetical order.

11.

Extend the strToInt function from Program 10.11 so that if the first character of the string is a minus sign, the value that follows is taken as a negative number.

12.

Write a function called strToFloat that converts a character string into a floating-point value. Have the function accept an optional leading minus sign. So, the call

strToFloat ("-867.6921");

should return the value –867.6921.

13.

If c is a lowercase character, the expression

c - 'a' + 'A'

produces the uppercase equivalent of c, assuming an ASCII character set.

Write a function called uppercase that converts all lowercase characters in a string into their uppercase equivalents.

14.

Write a function called intToStr that converts an integer value into a character string. Be certain the function handles negative integers properly.

 



[1] Recall that the type wchar_t can be used for representing so-called wide characters, but that’s for handling a single character from an international character set. The discussion here is about storing sequences of multiple characters.
