CHAPTER 6: Applications with Strings and Text


In the last chapter you were introduced to arrays and you saw how using arrays of numerical values could make many programming tasks much easier. In this chapter you'll extend your knowledge of arrays by exploring how you can use arrays of characters. You'll frequently have a need to work with a text string as a single entity. As you'll see, C doesn't provide you with a string data type as some other languages do. Instead, C uses an array of elements of type char to store a string.

In this chapter I'll show you how you can create and work with variables that store strings, and how the standard library functions can greatly simplify the processing of strings.

You'll learn the following:

How you can create string variables
How to join two or more strings together to form a single string
How you compare strings
How to use arrays of strings
How you work with wide character strings
What library functions are available to handle strings and how you can apply them
How to write a simple password-protection program

What Is a String?

You've already seen examples of string constants—quite frequently in fact. A string constant is a sequence of characters or symbols between a pair of double-quote characters. Anything between a pair of double quotes is interpreted by the compiler as a string, including any special characters and embedded spaces. Every time you've displayed a message using printf(), you've defined the message as a string constant. Examples of strings used in this way appear in the following statements:

printf("This is a string.");

printf("This is on\ntwo lines!");

printf("For \" you write \\\".");

These three example strings are shown in Figure 6-1. The decimal values of the character codes that will be stored in memory are shown below the characters.

Figure 6-1. Examples of strings in memory

The first string is a straightforward sequence of letters followed by a period. The printf() function will output this string as the following:

This is a string.

The second string has a newline character, \n, embedded in it, so the string will be displayed over two lines:

This is on
two lines!

The third string may seem a little confusing, but the output from printf() should make it clearer:

For " you write \".

You must write a double quote within a string as the escape sequence \" because the compiler will interpret an explicit " as the end of the string. You must also use the escape sequence \\ when you want to include a backslash in a string, because a backslash in a string always signals to the compiler the start of an escape sequence.

As Figure 6-1 shows, a special character with the code value 0 is added to the end of each string to mark where it ends. This character is known as the null character (not to be confused with NULL, which you'll see later), and you write it as \0.

Note Because a string in C is always terminated by a \0 character, the array that stores a string always needs one more element than the number of characters in the string.

There's nothing to prevent you from adding a \0 character to the end of a string yourself, but if you do, you'll simply end up with two of them. You can see how the null character works with a simple example. Have a look at the following program:

/* Program 6.1 Displaying a string */
#include <stdio.h>

int main(void)
{
  printf("The character \0 is used to terminate a string.");
  return 0;
}

If you compile and run this program, you'll get this output:

The character

It's probably not quite what you expected: only the first part of the string has been displayed. The output ends after the first two words because the printf() function stops outputting the string when it reaches the first null character, \0. Even though there's another \0 at the end of the string, it will never be reached. The first \0 that's found always marks the end of the string.
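You can confirm this behavior without reading the console output. The following is a minimal sketch, not one of the book's programs: it assumes a hypothetical helper named formatted_length() that formats the string from Program 6.1 with %s into a buffer using snprintf() (available since C99). snprintf() stops at the first \0 exactly as printf() does, and returns the number of characters it produced.

```c
#include <stdio.h>

/* The string constant from Program 6.1, with its embedded '\0' */
static const char message[] = "The character \0 is used to terminate a string.";

/* Hypothetical helper: formats message with %s, as printf() would,
   and returns the number of characters produced. Formatting stops at
   the first '\0', so only "The character " is written to buffer. */
int formatted_length(char buffer[], size_t size)
{
  return snprintf(buffer, size, "%s", message);
}
```

With a buffer of 64 elements, formatted_length() returns 14, even though sizeof message is 47: the array still holds every character of the string constant, including both null characters.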

String- and Text-Handling Methods

Unlike some other programming languages, C has no specific provision within its syntax for variables that store strings, and because there are no string variables, C has no special operators for processing strings. This is not a problem, though, because you're quite well-equipped to handle strings with the tools you have at your disposal already.

As I said at the beginning of this chapter, you use an array of type char to hold strings. This is the simplest form of string variable. You could declare a char array variable as follows:

char saying[20];

The variable saying that you've declared in this statement can accommodate a string that has up to 19 characters, because you must allow one element for the termination character. Of course, you can also use this array to store 20 characters that aren't a string.

Caution Remember that you must always declare the dimension of an array that you intend to use to store a string as at least one greater than the number of characters that you want to allow the string to have, because the compiler will automatically add \0 to the end of a string constant.

You could also initialize the preceding string variable in the following declaration:

char saying[] = "This is a string.";

Here you haven't explicitly defined the array dimension. The compiler will assign a value to the dimension sufficient to hold the initializing string constant. In this case it will be 18, which corresponds to 17 elements for the characters in the string, plus an extra one for the terminating \0. You could, of course, have specified the dimension yourself, but if you leave it to the compiler, you can be sure it will be correct.

You could also initialize just part of an array of elements of type char with a string, for example:

char str[40] = "To be";

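Here, the compiler will initialize the first five elements, str[0] to str[4], with the characters of the specified string in sequence, and str[5] will contain the null value '\0'. The remaining elements, str[6] to str[39], are also set to '\0', because any elements of an initialized array that have no explicit initializer are set to zero. Of course, space is allocated for all 40 elements of the array, and they're all available to use in any way you want.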
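Both initialization rules described above can be checked in a few lines. This is a minimal sketch assuming a hypothetical helper named check_initialization() that returns 1 when the compiler has given saying 18 elements ending in '\0' and has filled every element of str after "To be" with '\0'.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if both initialization rules hold, else 0 */
int check_initialization(void)
{
  char saying[] = "This is a string.";   /* Dimension deduced: 17 characters + '\0' */
  char str[40] = "To be";                /* 5 characters + '\0'; the rest zero-filled */

  if (sizeof saying != 18 || saying[17] != '\0')
    return 0;

  for (size_t i = 5; i < sizeof str; i++)  /* Checks str[5] to str[39] */
    if (str[i] != '\0')
      return 0;
  return 1;
}
```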

Initializing a char array and declaring it as constant is a good way of handling standard messages:

const char message[] = "The end of the world is nigh";

Because you've declared message as const, it's protected from being modified explicitly within the program. Any attempt to do so will result in an error message from the compiler. This technique for defining standard messages is particularly useful if they're used in various places within a program. It prevents accidental modification of such constants in other parts of your program. Of course, if you do need to be able to change the message, then you shouldn't specify the array as const.

When you want to refer to the string stored in an array, you just use the array name by itself. For instance, if you want to output the string stored in message using the printf() function, you could write this:

printf("\nThe message is: %s", message);

The %s specification is for outputting a null-terminated string. At the position where the %s appears in the first argument, the printf() function will output successive characters from the message array until it finds the '\0' character. Of course, an array with elements of type char behaves in exactly the same way as an array of elements of any other type, so you use it in exactly the same way. Only the special string-handling functions are sensitive to the '\0' character, so outside of that there really is nothing special about an array that holds a string.

The main disadvantage of using char arrays to hold a variety of different strings is the potentially wasted memory. Because arrays are, by definition, of a fixed length, you have to declare each array that you intend to use to store strings with its dimension set to accommodate the maximum string length you're likely to want to process. In most circumstances, your typical string length will be somewhat less than the maximum, so you end up wasting memory. Because you normally use your arrays here to store strings of different lengths, getting the length of a string is important, especially if you want to add to it. Let's look at how you do this using an example.

TRY IT OUT: FINDING OUT THE LENGTH OF A STRING

In this example, you're going to initialize two strings and then find out how many characters there are in each, excluding the null character:

/* Program 6.2 Lengths of strings */
#include <stdio.h>

int main(void)
{
  char str1[] = "To be or not to be";
  char str2[] = ",that is the question";
  int count = 0;                  /* Stores the string length                 */

  while (str1[count] != '\0')     /* Increment count till we reach the string */
    count++;                      /*  terminating character.                  */
  printf("\nThe length of the string \"%s\" is %d characters.", str1, count);

  count = 0;                      /* Reset to zero for next string            */
  while (str2[count] != '\0')     /* Count characters in second string        */
    count++;
  printf("\nThe length of the string \"%s\" is %d characters.\n", str2, count);
  return 0;
}

The output you will get from this program is the following:

The length of the string "To be or not to be" is 18 characters.

The length of the string ",that is the question" is 21 characters.

How It Works

First you have the inevitable declarations for the variables that you'll be using:

char str1[] = "To be or not to be";

char str2[] = ",that is the question";

int count = 0;                    /* Stores the string length                  */

You declare two arrays of type char that are each initialized with a string. Because you omit the dimension of each array, the compiler sets it to accommodate the initializing string, including its terminating null. You also declare and initialize a counter, count, to use in the loops in the program.

Next, you have a while loop that determines the length of the first string:

while (str1[count] != '\0')     /* Increment count till we reach the string */
  count++;                      /*  terminating character.                  */

Using a loop in the way you do here is very common in programming with strings. To find the length, you simply keep incrementing a counter in the while loop as long as you haven't reached the end-of-string character: the loop continues until the terminating '\0' is reached. At the end of the loop, the variable count will contain the number of characters in the string, excluding the terminating null.

I have shown the while loop comparing the value of the str1[count] element with '\0' so that the mechanism for finding the end of the string is clear to you. However, this loop would typically be written like this:

while(str1[count])

  count++;

The code value for the '\0' character is zero, which corresponds to the Boolean value false. All other character codes are nonzero and therefore correspond to the Boolean value true. Thus the loop will continue as long as str1[count] is not '\0', which is precisely what you want.
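Because the same loop appears wherever you need a string's length, you might wrap it in a function. This is a minimal sketch assuming a hypothetical function named string_length(); it does the same job as the strlen() library function you'll meet shortly and, like strlen(), returns a value of type size_t.

```c
#include <stddef.h>

/* Hypothetical helper: counts the characters before the terminating '\0'.
   The condition str[count] is false (zero) only at the '\0'. */
size_t string_length(const char str[])
{
  size_t count = 0;
  while (str[count])
    count++;
  return count;
}
```

For example, string_length("To be or not to be") returns 18 and string_length(",that is the question") returns 21, matching the output of Program 6.2.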

Now that you've determined the length, you display the string with the following statement:

printf("\nThe length of the string \"%s\" is %d characters.", str1, count);

This also displays the count of the number of characters that the string contains, excluding the terminating null. Notice that you use the %s format specifier that you saw earlier. This outputs characters from the string until it reaches the terminating null. If there were no terminating character, printf() would continue to output whatever happened to be in memory after the array until it found a zero byte somewhere; in some cases, that can mean a lot of output. You also use the escape sequence \" to include a double quote in the string. If you don't precede the double-quote character with a backslash, the compiler will think it marks the end of the string that is the first argument to the printf() function, and the statement will cause an error message to be produced.

You find the length of the second string and display the result in exactly the same way as the first string.

Operations with Strings

The code in the previous example is designed to show you the mechanism for finding the length of a string, but you never have to write such code in practice. As you'll see very soon, the strlen() function in the standard library will determine the length of a null-terminated string for you. So now that you know how to find the lengths of strings, how can you manipulate them?

Unfortunately you can't use the assignment operator to copy a string in the way you do with int or double variables. To achieve the equivalent of an arithmetic assignment with strings, one string has to be copied element by element to the other. In fact, performing any operation on string variables is very different from the arithmetic operations with numeric variables you've seen so far. Let's look at some common operations that you might want to perform with strings and how you would achieve them.

Appending a String

Joining one string to the end of another is a common requirement. For instance, you might want to assemble a single message from two or more strings. You might define the error messages in a program as a few basic text strings to which you append one of a variety of strings to make the message specific to a particular error. Let's see how this works in the context of an example.

TRY IT OUT: JOINING STRINGS

You could rework the last example to append the second string to the first:

/* Program 6.3 Joining strings */
#include <stdio.h>

int main(void)
{
  char str1[40] = "To be or not to be";
  char str2[] = ",that is the question";
  int count1 = 0;                /* Length of str1 */
  int count2 = 0;                /* Length of str2 */

  /* Find the length of the first string */
  while (str1[count1])           /* Increment count till we reach the string */
    count1++;                    /* terminating character.                   */

  /* Find the length of the second string */
  while (str2[count2])           /* Count characters in second string */
    count2++;

  /* Check that we have enough space for both strings */
  if(sizeof str1 < count1 + count2 + 1)
    printf("\nYou can't put a quart into a pint pot.");
  else
  {  /* Copy 2nd string to end of the first */
    count2 = 0;                  /* Reset index for str2 to 0   */
    while(str2[count2])          /* Copy up to null from str2   */
      str1[count1++] = str2[count2++];

    str1[count1] = '\0';         /* Make sure we add terminator */
    printf("\n%s\n", str1 );     /* Output combined string      */
  }
  return 0;
}

The output from this program will be the following:

To be or not to be,that is the question

How It Works

This program first finds the lengths of the two strings. It then checks that str1 has enough elements to hold both strings plus the terminating null character:

if(sizeof str1 < count1 + count2 + 1)
  printf("\nYou can't put a quart into a pint pot.");

Notice how you use the sizeof operator to get the total number of bytes in the array, simply by using the array name as its operand. The value that results from the expression sizeof str1 is the number of characters that the array will hold, because each character occupies 1 byte.

If you discover that the array is too small to hold the contents of both strings, then you display a message, and the program ends when it reaches the return statement. It's essential that you do not try to place more characters in the array than it can hold, as this will overwrite memory that may contain important data and is likely to crash your program. You should never append characters to a string without first checking that there is sufficient space in the array to accommodate them.

You reach the else block only if you're sure that both strings will fit in the first array. Here, you reset the variable count2 to 0 and copy the second string to the first array with the following statements:

else
{  /* Copy 2nd string to end of the first */
  count2 = 0;                  /* Reset index for str2 to 0   */
  while(str2[count2])          /* Copy up to null from str2   */
    str1[count1++] = str2[count2++];

  str1[count1] = '\0';         /* Make sure we add terminator */
  printf("\n%s\n", str1 );     /* Output combined string      */
}

The variable count1 starts from the value that was left by the loop that determined the length of the first string, str1. This is why you use two separate variables to count the number of characters in each of the two strings. Because the array is indexed from 0, the value that's stored in count1 is the index of the element containing '\0' at the end of the first string. So when you use count1 to index the array str1, you know that you're starting at the end of the message proper and that you'll overwrite the null character with the first character of the second string.

You then copy characters from str2 to str1 until you find the '\0' in str2. You still have to add a terminating '\0' to str1 because it isn't copied from str2. The end result of the operation is that you've added the contents of str2 to the end of str1, overwriting the terminating null character for str1 and adding a terminating null to the end of the combined string.

You could replace the three lines of code that did the copying with a more concise alternative:

while ((str1[count1++] = str2[count2++]));

This would replace the loop you have in the program as well as the statement to put a '\0' at the end of str1. This statement would copy the '\0' from str2 to str1, because the copying occurs in the loop continuation condition. Let's consider what happens at each stage.

Assign the value of str2[count2] to str1[count1]. An assignment expression has a value that is the value that was stored in the left operand of the assignment operator. In this case it is the character that was copied into str1[count1].
Increment each of the counters by 1, using the postfix form of the ++ operator.
Check whether the value of the assignment expression, which is the last character stored in str1, is true or false. The loop ends after the '\0' has been copied to str1, because the value of that assignment is then false.
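Putting the length check and the concise copy loop together, here is a minimal sketch of Program 6.3's logic as a reusable function. The name append_string() and its dest_size parameter are hypothetical; dest_size must be the total number of elements in the destination array, because inside a function, sizeof applied to an array parameter does not give you the size of the caller's array.

```c
#include <stddef.h>

/* Hypothetical helper: appends src to the string in dest using the
   concise copy loop. dest_size is the total number of elements in dest.
   Returns 1 on success, or 0 if the result would not fit (dest unchanged). */
int append_string(char dest[], size_t dest_size, const char src[])
{
  size_t count1 = 0;
  size_t count2 = 0;

  while (dest[count1])              /* Find the end of dest    */
    count1++;
  while (src[count2])               /* Find the length of src  */
    count2++;

  if (dest_size < count1 + count2 + 1)
    return 0;                       /* No room for both plus '\0' */

  count2 = 0;
  while ((dest[count1++] = src[count2++]))
    ;                               /* Copies the '\0' too, then stops */
  return 1;
}
```

With char str1[40] = "To be or not to be";, calling append_string(str1, sizeof str1, ",that is the question") returns 1, whereas appending ", that is the question" (22 characters) returns 0, because 18 + 22 + 1 exceeds 40.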

Arrays of Strings

It may have occurred to you by now that you could use a two-dimensional array of elements of type char to store strings, where each row is used to hold a separate string. In this way you could arrange to store a whole bunch of strings and refer to any of them through a single variable name, as in this example:

char sayings[3][32] = {

                        "Manners maketh man.",

                        "Many hands make light work.",

                        "Too many cooks spoil the broth."

                      };

This creates an array of three rows of 32 characters. The strings between the braces will be assigned in sequence to the three rows of the array, sayings[0], sayings[1], and sayings[2]. Note that you don't need braces around each string. The compiler can deduce that each string is intended to initialize one row of the array. The last dimension is specified to be 32, which is just sufficient to accommodate the longest string, including its terminating \0 character. The first dimension specifies the number of strings.

When you're referring to an element of the array—sayings[i][j], for instance—the first index, i, identifies a row in the array, and the second index, j, identifies a character within a row. When you want to refer to a complete row containing one of the strings, you just use a single index value between square brackets. For instance, sayings[1] refers to the second string in the array, "Many hands make light work.".

Although you must specify the last dimension in an array of strings, you can leave it to the compiler to figure out how many strings there are:

char sayings[][32] = {

                        "Manners maketh man.",

                        "Many hands make light work.",

                        "Too many cooks spoil the broth."

                      };

I've omitted the value for the size of the first dimension in the array here so the compiler will deduce this from the initializers between braces. Because you have three initializing strings, the compiler will make the first array dimension 3. Of course, you must still make sure that the last dimension is large enough to accommodate the longest string, including its terminating null character.

You could output the three sayings with the following code:

for(int i = 0 ; i<3 ; i++)
  printf("\n%s", sayings[i]);

You reference a row of the array using a single index in the expression sayings[i]. This effectively accesses the one-dimensional array that is at index position i in the sayings array.
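Rather than writing the number of strings, 3, into the loop, you can compute it with sizeof: the size of the whole array divided by the size of one row. The sketch below assumes a hypothetical macro SAYING_COUNT and a hypothetical function longest_saying() that walks each row, so you can check that 32 elements per row really is enough.

```c
#include <stddef.h>

/* Array of strings from the text; the compiler deduces the first dimension */
static const char sayings[][32] = {
  "Manners maketh man.",
  "Many hands make light work.",
  "Too many cooks spoil the broth."
};

/* Number of rows: bytes in the whole array divided by bytes in one row */
#define SAYING_COUNT (sizeof sayings / sizeof sayings[0])

/* Hypothetical helper: returns the length of the longest saying.
   Each row needs at least this many elements plus 1 for the '\0'. */
size_t longest_saying(void)
{
  size_t longest = 0;
  for (size_t i = 0; i < SAYING_COUNT; i++)
  {
    size_t count = 0;
    while (sayings[i][count])       /* Row i is a one-dimensional char array */
      count++;
    if (count > longest)
      longest = count;
  }
  return longest;
}
```

Here SAYING_COUNT evaluates to 3 and longest_saying() returns 31, so a row length of 32 leaves exactly one element for the terminating null.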

You could change the last example to use a two-dimensional array.

TRY IT OUT: ARRAYS OF STRINGS

Let's change the previous example so that it stores the two initial strings in a single array and incorporates the more concise coding for finding string lengths and copying strings:

/* Program 6.4 Arrays of strings */
#include <stdio.h>

int main(void)
{
  char str[][40] = {
                     "To be or not to be",
                     ",that is the question"
                   };
  int count[] = {0, 0};                /* Lengths of strings */

  /* Find the lengths of the strings */
  for(int i = 0 ; i<2 ; i++)
    while (str[i][count[i]])
      count[i]++;

  /* Check that we have enough space for both strings */
  if(sizeof str[0] < count[0] + count[1] + 1)
    printf("\nYou can't put a quart into a pint pot.");
  else
  {  /* Copy 2nd string to first */
    count[1] = 0;
    while((str[0][count[0]++] = str[1][count[1]++]));

    printf("\n%s\n", str[0]);          /* Output combined string */
  }
  return 0;
}

The output from this program is the following:

To be or not to be,that is the question

How It Works

You declare a single two-dimensional char array instead of the two one-dimensional arrays you had before:

char str[][40] = {

                      "To be or not to be",

                      ",that is the question"

                    };

The first initializing string is stored with the first index value as 0, and the second initializing string is stored with the first index value as 1. Of course, you could add as many initializing strings as you want between the braces, and the compiler would adjust the first array dimension to accommodate them.

The string lengths are now stored as elements in the count array. With count as an array we are able to find the lengths of both strings in the same loop:

for(int i = 0 ; i<2 ; i++)

    while (str[i][count[i]])

      count[i]++;

The outer for loop iterates over the two strings, and the inner while loop iterates over the characters in the current string selected by i. This approach obviously applies to any number of strings in the str array; naturally, the number of elements in the count array must be the same as the number of strings. A disadvantage of this approach is that if your strings are significantly less than 40 characters long, you waste quite a bit of memory in the array. In the next chapter you'll learn how you can avoid this and store each string in the most efficient manner.

String Library Functions

Now that you've struggled through the previous examples, laboriously copying strings from one variable to another, it's time to reveal that there's a standard library for string functions that can take care of all these little chores. Still, at least you know what's going on when you use the library functions.

The string functions are declared in the <string.h> header file, so you'll need to put

#include <string.h>

at the beginning of your program if you want to use them. The library actually contains quite a lot of functions, and your compiler may provide an even more extensive range of string library capabilities than is required by the C standard. I'll discuss just a few of the essential functions to demonstrate the basic idea and leave you to explore the rest on your own.

Copying Strings Using a Library Function

First, let's return to the process of copying the string stored in one array to another, which is the string equivalent of an assignment operation. The while loop mechanism you carefully created to do this must still be fresh in your mind. Well, you can do the same thing with this statement:

strcpy(string1, string2);

The arguments to the strcpy() function are char array names. What the function actually does is copy the string specified by the second argument to the string specified by the first argument, so in the preceding example string2 will be copied to string1, replacing what was previously stored in string1. The copy operation will include the terminating '\0'. It's your responsibility to ensure that the array string1 has sufficient space to accommodate string2. The function strcpy() has no way of checking the sizes of the arrays, so if it goes wrong, it's all your fault. This is where the sizeof operator is useful, because you can use it to check that the destination is large enough before you copy:

if(sizeof(string2) <= sizeof (string1))

  strcpy(string1, string2);

You execute the strcpy() operation only if the length of the string2 array is less than or equal to the length of the string1 array.
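Comparing array sizes is a conservative check, because string2 could hold a short string in a large array. A minimal sketch of a more precise test follows; it assumes a hypothetical helper named safe_copy() and uses the strlen() function described in the next section to copy only if the string plus its terminating '\0' fits.

```c
#include <string.h>

/* Hypothetical helper: copies src into dest only if the whole string,
   including its '\0', fits. dest_size is the number of elements in dest.
   Returns 1 if the copy was made, 0 otherwise (dest is left unchanged). */
int safe_copy(char dest[], size_t dest_size, const char src[])
{
  if (strlen(src) + 1 > dest_size)   /* strlen() excludes the '\0' */
    return 0;
  strcpy(dest, src);
  return 1;
}
```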

You have another function available, strncpy(), that will copy the first n characters of one string to another. The first argument is the destination string, the second argument is the source string, and the third argument is an integer of type size_t that specifies the number of characters to be copied. Here's an example of how this works:

char destination[] = "This string will be replaced";

char source[] = "This string will be copied in part";

size_t n = 26;                    /* Number of characters to be copied */

strncpy(destination, source, n);

After executing these statements, the first 26 characters of destination will be "This string will be copied", because those are the first 26 characters from source. However, strncpy() writes a '\0' only if source has fewer than n characters, in which case it adds '\0' characters to make up the count to n. Here source is longer than 26 characters, so no '\0' is copied and the rest of the original contents of destination survives: destination now holds "This string will be copieded", the final "ed" coming from "replaced". To get the result you want, you must add the terminator yourself with destination[n] = '\0';.

Note that when the length of the source string is greater than or equal to the number of characters to be copied, the strncpy() function adds no '\0' character to the destination string. This means that the destination string may not have a terminating null character at all, which can cause major problems with further operations on it.
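The sketch below reproduces the strncpy() example and shows the usual remedy. The helper copy_prefix() is hypothetical; it calls strncpy() and then stores the terminator explicitly, which requires the destination array to have at least n + 1 elements.

```c
#include <string.h>

/* Hypothetical helper: copies at most n characters of src into dest and
   always terminates the result. dest must have at least n + 1 elements. */
void copy_prefix(char dest[], const char src[], size_t n)
{
  strncpy(dest, src, n);   /* Adds no '\0' if src has n or more characters */
  dest[n] = '\0';          /* So store the terminator explicitly           */
}
```

Applied to the arrays from the example with n set to 26, copy_prefix() leaves destination holding "This string will be copied", whereas the bare strncpy() call leaves "This string will be copieded".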

Determining String Length Using a Library Function

To find out the length of a string you have the function strlen(), which returns the length of a string as an integer of type size_t. To find the length of a string in Program 6.3 you wrote this:

while (str2[count2])

  count2++;

Instead of this rigmarole, you could simply write this:

count2 = strlen(str2);

Now the counting and searching that's necessary to find the end of the string is performed by the function, so you no longer have to worry about it. Note that it returns the length of the string excluding the '\0', which is generally the most convenient result. It also returns the value as type size_t, which is an unsigned integer type, so you may want to declare the variable to hold the result as size_t as well. If you don't, you may get warning messages from your compiler.

Just to remind you, size_t is a type that is defined in the standard library header file <stddef.h>, and also in other headers such as <string.h>, <stdio.h>, and <stdlib.h>. It is also the type of the value produced by the sizeof operator. The type size_t will be defined to be one of the unsigned integer types you have seen, typically unsigned int or unsigned long. The reason for implementing things this way is code portability. The type returned by sizeof and the strlen() function, among others, can vary from one C implementation to another. It's up to the compiler writer to decide what it should be. Defining the type to be size_t and defining size_t in a header file enables you to accommodate such implementation dependencies in your code very easily. As long as you define count2 in the preceding example as type size_t, you have code that will work in every standard C implementation, even though the definition of size_t may vary from one implementation to another.

So for the most portable code, you should write the following:

size_t count2 = 0;

count2 = strlen(str2);

As long as you have an #include directive for <string.h>, which also defines size_t, this code will compile with any compiler that conforms to the ISO/IEC C standard; including <stddef.h> as well does no harm.
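When you display a size_t value with printf(), use the %zu conversion specifier, introduced in C99, rather than %d. This minimal sketch assumes a hypothetical helper named describe_length() and writes into a buffer with snprintf(), which takes the same format string as printf(), so the result can be inspected.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes "Length: N" into buffer, formatting the
   size_t result of strlen() with %zu. Returns what snprintf() returns. */
int describe_length(char buffer[], size_t size, const char str[])
{
  size_t count2 = strlen(str);     /* size_t matches strlen()'s return type */
  return snprintf(buffer, size, "Length: %zu", count2);
}
```

Called with ",that is the question", describe_length() stores "Length: 21" in the buffer.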

Joining Strings Using a Library Function

In Program 6.3, you copied the second string onto the end of the first using the following rather complicated looking code:

count2 = 0;
while(str2[count2])
  str1[count1++] = str2[count2++];
str1[count1] = '\0';

Well, the string library gives a slight simplification here, too. You could use a function that joins one string to the end of another. You could achieve the same result as the preceding fragment with the following exceedingly simple statement:

strcat(str1, str2); /* Copy str2 to the end of str1 */

This function copies str2 to the end of str1. The strcat() function is so called because it performs string catenation; in other words it joins one string onto the end of another. As well as appending str2 to str1, the strcat() function also returns str1.

If you only want to append part of the source string to the destination string, you can use the strncat() function. This requires a third argument of type size_t that indicates the number of characters to be copied, for instance

strncat(str1, str2, 5); /* Copy 1st 5 characters of str2 to the end of str1 */

As with all the operations that involve copying one string to another, it's up to you to ensure that the destination array is sufficiently large to accommodate what's being copied to it. This function and others will happily overwrite whatever lies beyond the end of your destination array if you get it wrong.
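One way to keep strncat() inside the destination array is to compute its third argument from the space that remains. This is a minimal sketch assuming a hypothetical helper named append_bounded(); it relies on the fact that, unlike strncpy(), strncat() always writes a terminating '\0' after the characters it copies, so one element is reserved for it.

```c
#include <string.h>

/* Hypothetical helper: appends as much of src to dest as fits.
   dest_size is the total number of elements in dest, which must
   already hold a null-terminated string. */
void append_bounded(char dest[], size_t dest_size, const char src[])
{
  size_t used = strlen(dest);                  /* Characters already in dest */
  if (used + 1 >= dest_size)
    return;                                    /* No room for even one more  */
  strncat(dest, src, dest_size - used - 1);    /* Room left, excluding '\0'  */
}
```

With char s[10] = "To be";, appending " or not to be" copies only the first four characters, leaving "To be or " (9 characters plus the '\0').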

The strcpy(), strncpy(), strcat(), and strncat() functions all return the destination string. This allows you to use the value returned in another string operation, for example

size_t length = 0;

length = strlen(strncat(str1, str2, 5));

Here the strncat() function copies five characters from str2 to the end of str1. The function returns the array str1, so this is passed as an argument to the strlen() function. This will then return the length of the new version of str1 with the five characters from str2 appended.

TRY IT OUT: USING THE STRING LIBRARY

You now have enough tools to do a good job of rewriting Program 6.3:

/* Program 6.5 Joining strings - revitalized */
#include <stdio.h>
#include <string.h>
#define STR_LENGTH 40

int main(void)
{
  char str1[STR_LENGTH] = "To be or not to be";
  char str2[STR_LENGTH] = ",that is the question";

  if(STR_LENGTH > strlen(str1) + strlen(str2))  /* Enough space ?                */
    printf("\n%s\n", strcat(str1, str2));       /* yes, so display joined string */
  else
    printf("\nYou can't put a quart into a pint pot.");
  return 0;
}

This program will produce exactly the same output as before.

How It Works

Well, what a difference a library makes. It actually makes the problem trivial, doesn't it? You've defined a symbol for the size of the arrays using a #define directive. If you want to change the array sizes in the program later, you can just modify the definition for STR_LENGTH. You simply check that you have enough space in your array by means of the if statement:

if(STR_LENGTH > strlen(str1) + strlen(str2))  /* Enough space ?                */
  printf("\n%s\n", strcat(str1, str2));       /* yes, so display joined string */
else
  printf("\nYou can't put a quart into a pint pot.");

If you do have enough space, you join the strings by calling strcat() within the argument to printf(). Because strcat() returns str1, printf() displays the joined strings. If str1 is too short, you just display a message. Note that the comparison uses the > operator. The array length must be at least one greater than the sum of the two string lengths to allow for the terminating '\0' character.

Comparing Strings

The string library also provides functions for comparing strings and deciding whether one string is greater than or less than another. It may sound a bit odd applying such terms as "greater than" and "less than" to strings, but the result is produced quite simply. Successive corresponding characters of the two strings are compared based on the numerical value of their character codes. This mechanism is illustrated graphically in Figure 6-2, in which the character codes are shown as hexadecimal values.

Figure 6-2. Comparing two strings

If two strings are identical, then of course they're equal. The first pair of corresponding characters that are different in two strings determines whether the first string is less than or greater than the second. So, for example, if the character code for the character in the first string is less than the character code for the character in the second string, the first string is less than the second. This mechanism for comparison generally corresponds to what you expect when you're arranging strings in alphabetical order.

The function strcmp(str1, str2) compares two strings. It returns a value of type int that is less than, equal to, or greater than 0, corresponding to whether str1 is less than, equal to, or greater than str2. You can express the comparison illustrated in Figure 6-2 in the following code fragment:

char str1[] = "The quick brown fox";

char str2[] = "The quick black fox";

if(strcmp(str1, str2) > 0)
  printf("str1 is greater than str2");

The printf() statement will execute only if the strcmp() function returns a positive integer. This happens when strcmp() finds a pair of corresponding characters in the two strings that do not match, and the character code in str1 is greater than the character code in str2. Here the first mismatch is 'r' (0x72) in "brown" against 'l' (0x6C) in "black", so str1 is greater and the message is displayed.

The strncmp() function compares up to n characters of the two strings. The first two arguments are the same as for the strcmp() function and the number of characters to be compared is specified by a third argument that's an integer of type size_t. This function would be useful if you were processing strings with a prefix of ten characters, say, that represented a part number or a sequence number. You could use the strncmp() function to compare just the first ten characters of two strings to determine which should come first:

if(strncmp(str1, str2, 10) <= 0)
  printf("\n%s\n%s", str1, str2);
else
  printf("\n%s\n%s", str2, str1);

These statements output strings str1 and str2 arranged in ascending sequence according to the first ten characters in the strings.

Let's try comparing strings in a working example.

TRY IT OUT: COMPARING STRINGS

You can demonstrate string comparison with an example that compares two words that you enter from the keyboard:

/* Program 6.6 Comparing strings */
#include <stdio.h>
#include <string.h>

int main(void)
{
  char word1[20];                /* Stores the first word  */
  char word2[20];                /* Stores the second word */

  printf("\nType in the first word (less than 20 characters):\n 1: ");
  scanf("%19s", word1);          /* Read the first word    */

  printf("Type in the second word (less than 20 characters):\n 2: ");
  scanf("%19s", word2);          /* Read the second word   */

  /* Compare the two words */
  if(strcmp(word1,word2) == 0)
    printf("You have entered identical words");
  else
    printf("%s precedes %s",
                    (strcmp(word1, word2) < 0) ? word1 : word2,
                    (strcmp(word1, word2) < 0) ? word2 : word1);
  return 0;
}

The program will read in two words and then tell you which word comes before the other alphabetically. The output looks something like this:

Type in the first word (less than 20 characters):

 1: apple

Type in the second word (less than 20 characters):

 2: banana

apple precedes banana

How It Works

You start the program with the #include directives for the header files for the standard input and output library, and the string handling library:

#include <stdio.h>

#include <string.h>

In the body of main(), you first declare two character arrays to store the words that you'll read in from the keyboard:

char word1[20];                  /* Stores the first word  */

char word2[20];                   /* Stores the second word */

You set the size of the arrays to 20. This should be enough for an example, but there's a risk that this may not be sufficient. As with the strcpy() function, it's your responsibility to allocate enough space for what the user may key in. The function scanf() will limit the number of characters read if you specify a width with the format specification. While this ensures the array limit will not be exceeded, any characters in excess of the width you specify will be left in the input stream and will be read by the next input operation for the stream.

The next task is to get two words from the user; so after a prompt you use scanf() twice to read a couple of words from the keyboard:

printf("\nType in the first word (less than 20 characters):\n 1: ");
scanf("%19s", word1);            /* Read the first word    */

printf("Type in the second word (less than 20 characters):\n 2: ");
scanf("%19s", word2);            /* Read the second word   */

The width specification of 19 characters ensures that the array size of 20 elements will not be exceeded. Notice how in this example you haven't used an & operator before the variables in the arguments to the scanf() function. This is because the name of an array by itself is an address. It corresponds to the address of the first element in the array. You could write this explicitly using the & operator like this:

scanf("%19s", &word1[0]);

Therefore, &word1[0] is equal to word1! I'll go into more detail on this in the next chapter.

Finally, you use the strcmp() function to compare the two words that were entered:

if(strcmp(word1,word2) == 0)

  printf("You have entered identical words");

else

  printf("%s precedes %s",

                    (strcmp(word1, word2) < 0) ? word1 : word2,

                    (strcmp(word1, word2) < 0) ? word2 : word1);

If the value returned by the strcmp() function is 0, the two strings are equal and you display a message to this effect. If not, you print out a message specifying which word precedes the other. You do this using the conditional operator to specify which word you want to print first and which you want to print second.

Searching a String

The <string.h> header file declares several string-searching functions, but before I get into these, we'll take a peek at the subject of the next chapter, namely pointers. You'll need an appreciation of the basics of this in order to understand how to use the string-searching functions.

The Idea of a Pointer

As you'll learn in detail in the next chapter, C provides a remarkably useful type of variable called a pointer. A pointer is a variable that contains an address—that is, it contains a reference to another location in memory that can contain a value. You already used an address when you used the function scanf(). A pointer with the name pNumber is defined by the second of the following two statements:

int Number = 25;

int *pNumber = &Number;

Figure 6-3 illustrates what happens when these two statements are executed.

Figure 6-3. An example of a pointer

You declare a variable, Number, with the value 25, and a pointer, pNumber, which contains the address of Number. You can now use the variable pNumber in the expression *pNumber to obtain the value contained in Number. The * is the dereference operator and its effect is to access the data stored at the address specified by a pointer.

The main reason for introducing this idea here is that the functions I'll discuss in the following sections return pointers, so you could be a bit confused by them if there was no explanation here at all. If you end up confused anyway, don't worry—all will be illuminated in the next chapter.

Searching a String for a Character

The strchr() function searches a given string for a specified character. The first argument to the function is the string to be searched (which will be the address of a char array), and the second argument is the character that you're looking for. The function will search the string starting at the beginning and return a pointer to the first position in the string where the character is found. This is the address of this position in memory and is of type char* described as "pointer to char." So to store the value that's returned you must create a variable that can store an address of a character. If the character isn't found, the function will return a special value NULL, which is the equivalent of 0 for a pointer and represents a pointer that doesn't point to anything.

You can use the strchr() function like this:

char str[] = "The quick brown fox";  /* The string to be searched        */

char c = 'q';                        /* The character we are looking for */

char *pGot_char = NULL;              /* Pointer initialized to zero      */

pGot_char = strchr(str, c);          /* Stores address where c is found  */

You define the character that you're looking for by the variable c of type char. Because the strchr() function expects the second argument to be of type int, the compiler will convert the value of c to this type before passing it to the function.

You could just as well define c as type int like this:

int c = 'q'; /* Initialize with character code for q */

Functions are often implemented so that a character is passed as an argument of type int because it's simpler to work with type int than type char.

Figure 6-4 illustrates the result of this search using the strchr() function.

Figure 6-4. Searching for a character

The address of the first character in the string is given by the array name str. Because 'q' appears as the fifth character in the string, its address will be str + 4, an offset of 4 bytes from the first character. Thus, the variable pGot_char will contain the address str + 4.

Using the variable name pGot_char in an expression will access the address. If you want to access the character that's stored at that address too, then you must dereference the pointer. To do this, you precede the pointer variable name with the dereference operator *, for example:

printf("Character found was %c.", *pGot_char);

I'll go into more detail on using the dereferencing operator further in the next chapter.

Of course, in general it's always possible that the character you're searching for might not be found in the string, so you should take care that you don't attempt to dereference a NULL pointer.

If you do try to dereference a NULL pointer, the behavior is undefined, and in practice your program will usually crash. This is very easy to avoid with an if statement, like this:

if(pGot_char != NULL)

  printf("Character found was %c.", *pGot_char);

Now you only execute the printf() statement when the variable pGot_char isn't NULL.

The strrchr() function is very similar in operation to the strchr() function, except that it searches for the character starting from the end of the string. Thus, it will return the address of the last occurrence of the character in the string, or NULL if the character isn't found.

Searching a String for a Substring

The strstr() function is probably the most useful of all the searching functions declared in string.h. It searches one string for the first occurrence of a substring and returns a pointer to the position in the first string where the substring is found. If it doesn't find a match, it returns NULL. So if the value returned here isn't NULL, you can be sure that the searching function that you're using has found an occurrence of what it was searching for. The first argument to the function is the string that is to be searched, and the second argument is the substring you're looking for.

Here is an example of how you might use the strstr() function:

char text[] = "Every dog has his day";

char word[] = "dog";

char *pFound = NULL;

pFound = strstr(text, word);

This searches text for the first occurrence of the string stored in word. Because the string "dog" appears starting at the seventh character in text, pFound will be set to the address text + 6. The search is case sensitive, so if you search the text string for "Dog", it won't be found.

TRY IT OUT: SEARCHING A STRING

Here's some of what I've been talking about in action:

/* Program 6.7 A demonstration of seeking and finding  */
#include <stdio.h>
#include <string.h>

int main(void)
{
  char str1[] = "This string contains the holy grail.";
  char str2[] = "the holy grail";
  char str3[] = "the holy grill";

  /* Search str1 for the occurrence of str2 */
  if(strstr(str1, str2) == NULL)
    printf("\n\"%s\" was not found.", str2);
  else
    printf("\n\"%s\" was found in \"%s\"",str2, str1);

  /* Search str1 for the occurrence of str3 */
  if(strstr(str1, str3) == NULL)
    printf("\n\"%s\" was not found.", str3);
  else
    printf("\nWe shouldn't get to here!");
  return 0;
}

This program produces the following output:

"the holy grail" was found in "This string contains the holy grail."

"the holy grill" was not found.

How It Works

Note the #include directive for <string.h>. This is necessary when you want to use any of the string processing functions.

You have three strings defined: str1, str2, and str3:

char str1[] = "This string contains the holy grail.";

char str2[] = "the holy grail";

char str3[] = "the holy grill";

In the first if statement, you use the library function strstr() to search for the occurrence of the second string in the first string:

if(strstr(str1, str2) == NULL)
  printf("\n\"%s\" was not found.", str2);
else
  printf("\n\"%s\" was found in \"%s\"",str2, str1);

You display a message corresponding to the result by testing the returned value of strstr() against NULL. If the value returned is equal to NULL, this indicates the second string wasn't found in the first, so a message is displayed to that effect. If the second string is found, the else is executed. In this case, a message is displayed indicating that the string was found.

You then repeat the process in the second if statement and check for the occurrence of the third string in the first:

if(strstr(str1, str3) == NULL)
  printf("\n\"%s\" was not found.", str3);
else
  printf("\nWe shouldn't get to here!");

If you get output from the first or the last printf() in the program, something is seriously wrong.

Analyzing and Transforming Strings

If you need to examine the internal contents of a string, you can use the set of standard library functions declared in the <ctype.h> header file that I introduced in Chapter 3. These provide you with a very flexible range of analytical functions that enable you to test what kind of character you have. They also have the advantage that they're independent of the character code on the computer you're using. Just to remind you, Table 6-1 shows the functions that will test for various categories of characters.

Table 6-1. Character Classification Functions

Function	Tests For
`islower()`	Lowercase letter
`isupper()`	Uppercase letter
`isalpha()`	Uppercase or lowercase letter
`isalnum()`	Uppercase or lowercase letter or a digit
`iscntrl()`	Control character
`isprint()`	Any printing character including space
`isgraph()`	Any printing character except space
`isdigit()`	Decimal digit (`'0'` to `'9'`)
`isxdigit()`	Hexadecimal digit (`'0'` to `'9'`, `'A'` to `'F'`, `'a'` to `'f'`)
`isblank()`	Standard blank characters (space, `'\t'`)
`isspace()`	Whitespace character (space, `'\n'`, `'\t'`, `'\v'`, `'\r'`, `'\f'`)
`ispunct()`	Printing character for which `isspace()` and `isalnum()` return `false`

The argument to each function is the character to be tested. All these functions return a nonzero value of type int if the character is within the set that's being tested for; otherwise, they return 0. Of course, these return values convert to true and false respectively, so you can use them as Boolean values. Let's see how you can use these functions for testing the characters in a string.

TRY IT OUT: USING THE CHARACTER CLASSIFICATION FUNCTIONS

The following example determines how many digits and letters there are in a string that's entered from the keyboard:

/* Program 6.8 Testing characters in a string */
#include <stdio.h>
#include <ctype.h>

int main(void)
{
  char buffer[80];               /* Input buffer                */
  int i = 0;                     /* Buffer index                */
  int num_letters = 0;           /* Number of letters in input  */
  int num_digits = 0;            /* Number of digits in input   */

  printf("\nEnter an interesting string of less than 80 characters:\n");
  gets(buffer);                  /* Read a string into buffer   */

  while(buffer[i] != '\0')
  {
    if(isalpha(buffer[i]))
      num_letters++;             /* Increment letter count      */
    if(isdigit(buffer[i++]))
      num_digits++;              /* Increment digit count       */
  }
  printf("\nYour string contained %d letters and %d digits.\n",
                                              num_letters, num_digits);
  return 0;
}

The following is typical output from this program:

Enter an interesting string of less than 80 characters:

I was born on the 3rd of October 1895



Your string contained 24 letters and 5 digits.

How It Works

This example is quite straightforward. You read the string into the array, buffer, with the following statement:

gets(buffer);

The string that you enter is read into the array buffer using a new standard library function, gets(). So far, you've used only scanf() to accept input from the keyboard, but it's not very useful for reading strings because it interprets a space as the end of an input value. The gets() function has the advantage that it will read all the characters entered from the keyboard, including blanks, up to when you press the Enter key. This is then stored as a string into the area specified by its argument, which in this case is the buffer array. A '\0' will be appended to the string automatically.

As with any input or output operation, things can go wrong. If an error of some kind prevents the gets() function from reading the input successfully, it will return NULL. Normally it returns the address passed as the argument, which is buffer in this case. You could therefore check that the read operation was successful using the following code fragment:

if(gets(buffer) == NULL)

{

  printf("Error reading input.");

  return 1;                        /* End the program */

}

This will output a message and end the program if the read operation fails for any reason. Errors on keyboard input are relatively rare, so you won't include this testing when you're reading from the keyboard in your examples; but if you are reading from a file, verifying that the read was successful is essential.

A disadvantage of the gets() function is that it will read a string of any length and attempt to store it in buffer. There is no check that buffer has sufficient space to store the string, so there's another opportunity to crash the program. This flaw is serious enough that gets() was removed from the language in the C11 standard, and newer compilers may refuse to accept it. To avoid the problem you could use the fgets() function, which allows you to specify the maximum length of the input string. This function can read any kind of input stream, whereas gets() reads only from the standard input stream stdin, so you also have to specify a third argument to fgets() indicating the stream that is to be read. Here's how you could use fgets() to read a string from the keyboard:

if(fgets(buffer, sizeof(buffer), stdin) == NULL)

{

  printf("Error reading input.");

  return 1;                        /* End the program */

}

The fgets() function reads a maximum of one less than the number of characters specified by the second argument. It then appends a '\0' character to the end of the string in memory, so the second argument in this case is sizeof(buffer). Note that there is another important difference between fgets() and gets(). For both functions, reading a newline character ends the input process, but fgets() stores a '\n' character when a newline is entered, whereas gets() does not. This means that if you are reading strings from the keyboard, strings read by fgets() will be one character longer than strings read by gets(). It also means that just pressing the Enter key as the input will result in an empty string "" with gets(), but will result in the string "\n" with fgets(). You'll use fgets() in the next example in this chapter, Program 6.9, where you have to take account of the newline character that is stored as part of the string. You'll also see more about the fgets() function in Chapter 12.

The statements that analyze the string are as follows:

while(buffer[i] != '\0')
{
  if(isalpha(buffer[i]))
    num_letters++;                /* Increment letter count      */
  if(isdigit(buffer[i++]))
    num_digits++;                 /* Increment digit count       */
}

The input string is tested character by character in the while loop. Checks are made for alphabetic characters and digits in the two if statements. When either is found, the appropriate counter is incremented. Note that you increment the index to the buffer array in the second if. Remember, because you're using the postfix form of the increment operator, the check is made using the current value of i, and then i is incremented.

You could implement this without using if statements:

while(buffer[i] != '\0')
{
  num_letters += isalpha(buffer[i]) != 0;
  num_digits += isdigit(buffer[i++]) != 0;
}

The test functions return a nonzero value (not necessarily 1, though) if the argument belongs to the group of characters being tested for. Comparing that value with 0 using != produces exactly 1 if the character belongs to the category you're testing for and 0 otherwise. Each count is therefore increased by 1 or left unchanged.

The way you've coded the example isn't a particularly efficient way of doing things, because you test for a digit even if you've already discovered the current character is alphabetic. You could try to improve on this if the TV is really bad one night.

Converting Characters

You've already seen that the standard library also includes two conversion functions that you get access to through <ctype.h>. The toupper() function converts from lowercase to uppercase, and the tolower() function does the reverse. Both functions return either the converted character or the same character for characters that are already in the correct case. You can therefore convert a string to uppercase using this statement:

for(int i = 0 ; (buffer[i] = toupper(buffer[i])) != '\0' ; i++);

This loop will convert the entire string to uppercase by stepping through the string one character at a time, converting lowercase to uppercase and leaving uppercase characters unchanged. The loop stops when it reaches the string termination character '\0'. This sort of pattern in which everything is done inside the loop control expressions is quite common in C.

Let's try a working example that applies these functions to a string.

TRY IT OUT: CONVERTING CHARACTERS

You can use the function toupper() in combination with the strstr() function to find out whether one string occurs in another, ignoring case. Look at the following example:

/* Program 6.9 Finding occurrences of one string in another  */
#include <stdio.h>
#include <string.h>
#include <ctype.h>

int main(void)
{
  char text[100];                /* Input buffer for string to be searched */
  char substring[40];            /* Input buffer for string sought         */

  printf("\nEnter the string to be searched (less than 100 characters):\n");
  fgets(text, sizeof(text), stdin);

  printf("\nEnter the string sought (less than 40 characters):\n");
  fgets(substring, sizeof(substring), stdin);

  /* overwrite the newline character in each string */
  text[strlen(text)-1] = '\0';
  substring[strlen(substring)-1] = '\0';

  printf("\nFirst string entered:\n%s\n", text);
  printf("\nSecond string entered:\n%s\n", substring);

  /* Convert both strings to uppercase. */
  for(int i = 0 ; (text[i] = toupper(text[i])) ; i++);
  for(int i = 0 ; (substring[i] = toupper(substring[i])) ; i++);

  printf("\nThe second string %s found in the first.",
            ((strstr(text, substring) == NULL) ? "was not" : "was"));
  return 0;
}

Typical operation of this example will produce the following:

Enter the string to be searched (less than 100 characters):

Cry havoc, and let slip the dogs of war.



Enter the string sought (less than 40 characters):

The Dogs of War



First string entered:

Cry havoc, and let slip the dogs of war.



Second string entered:

The Dogs of War



The second string was found in the first.

How It Works

This program has three distinct phases: getting the input strings, converting both strings to uppercase, and searching the first string for an occurrence of the second.

First of all, you use printf() to prompt the user for the input, and you use the fgets() function introduced in the discussion of the previous example to read the input into text and substring:

printf("\nEnter the string to be searched (less than 100 characters):\n");
fgets(text, sizeof(text), stdin);

printf("\nEnter the string sought (less than 40 characters):\n");
fgets(substring, sizeof(substring), stdin);

You use the fgets() function here because it will read in any string from the keyboard, including spaces, with the input terminated when the Enter key is pressed. The input process will store at most 99 characters for the first string, text, and 39 characters for the second string, substring, so the arrays can't overflow and the operation of the program is safe. Be aware, though, that excess characters aren't discarded. They stay in the input stream, so anything beyond the limit for text would be read by the second call of fgets() as input for substring.

You'll recall that fgets() stores the newline character that ends the input process. This doesn't matter particularly for the first string, but it matters a lot for the second string you are searching for. For example, if the string you want to find is "dogs", the fgets() function will actually store "dogs\n", which is not the same at all. You therefore remove the newline from each string by overwriting it with a '\0' character:

text[strlen(text)-1] = '\0';
substring[strlen(substring)-1] = '\0';

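The newline is the last character before the terminating '\0', so its index is the string length less 1. This assumes that a newline is actually present. If the input was cut off because it exceeded the limit, there is no newline, and the statement overwrites the last character you entered instead.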

Of course, if you exceed the limits for input, the strings will be truncated and the results are unlikely to be correct. This will be evident from the listing of the two strings that is produced by the following:

printf("\nFirst string entered:\n%s\n", text);
printf("\nSecond string entered:\n%s\n", substring);

The conversion of both strings to uppercase is accomplished using the following statements:

for(int i = 0 ; (text[i] = toupper(text[i])) ; i++);

  for(int i = 0 ; (substring[i] = toupper(substring[i])) ; i++);

You use for loops to do the conversion, and the work is done entirely within the control expressions for the loops. The first for loop initializes i to 0, converts the ith character of text to uppercase in the loop condition, and stores that result back in the same position in text. The loop continues as long as the character code stored in text[i] by the second loop control expression is nonzero, which is the case for every character except the null character '\0'. The index i is incremented in the third loop control expression, so there's no confusion about when the incrementing of i takes place. The second loop works in exactly the same way to convert substring to uppercase.

With both strings in uppercase, you can test for the occurrence of substring in text, regardless of the case of the original strings. The test is done inside the output statement that reports the result:

printf("\nThe second string %s found in the first.",
          ((strstr(text, substring) == NULL) ? "was not" : "was"));

The conditional operator chooses either "was not" or "was" to be part of the output string, depending on whether the strstr() function returns NULL. You saw earlier that the strstr() function returns NULL when the string specified by the second argument isn't found in the first. Otherwise, it returns the address where the string was found.

Converting Strings to Numerical Values

The <stdlib.h> header file declares functions that you can use to convert a string to a numerical value. Each of the functions in Table 6-2 requires an argument that's a pointer to a string or an array of type char that contains a string that's a representation of a numerical value.

Table 6-2. Functions That Convert Strings to Numerical Values

Function	Returns
`atof()`	A value of type `double` that is produced from the string argument
`atoi()`	A value of type `int` that is produced from the string argument
`atol()`	A value of type `long` that is produced from the string argument
`atoll()`	A value of type `long long` that is produced from the string argument

These functions are very easy to use, for example

char value_str[] = "98.4";

double value = 0;

value = atof(value_str);          /* Convert string to floating-point */

The value_str array contains a string representation of a value of type double. You pass the array name as the argument to the atof() function to convert it to type double. You use the other three functions in a similar way.

These functions are particularly useful when you need to read numerical input in the format of a string. This can happen when the sequence of the data input is uncertain, so you need to analyze the string in order to determine what it contains. Once you've figured out what kind of numerical value the string represents, you can use the appropriate library function to convert it.

Working with Wide Character Strings

Working with wide character strings is just as easy as working with the strings you have been using up to now. You store a wide character string in an array of elements of type wchar_t and a wide character string constant just needs the L modifier in front of it. Thus you can declare and initialize a wide character string like this:

wchar_t proverb[] = L"A nod is as good as a wink to a blind horse.";

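As you saw back in Chapter 2, a wchar_t character occupies 2 bytes with some compilers. The size is implementation-defined, though, and many compilers for Linux and macOS use 4 bytes. The proverb string contains 44 characters plus the terminating null, so it will occupy 90 bytes with 2-byte characters and 180 bytes with 4-byte characters.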

If you want to write the proverb string to the screen using printf(), you must use the %ls format specifier rather than the %s that you use for ordinary strings. Many compilers also accept %S as an equivalent, but %ls is the standard form. If you use %s, the printf() function will assume the string consists of single-byte characters, so the output will not be correct. Thus the following statement will output the wide character string correctly:

printf("The proverb is:\n%ls", proverb);

Operations on Wide Character Strings

The <wchar.h> header file declares a range of functions for operating on wide character strings that parallel the functions you have been working with that apply to ordinary strings. Table 6-3 shows the functions declared in <wchar.h> that are the wide character equivalents to the string functions I have already discussed in this chapter.

Table 6-3. Functions That Operate on Wide Character Strings

Function	Description
`wcslen(const wchar_t* ws)`	Returns a value of type `size_t` that is the length of the wide character string `ws` that you pass as the argument. The length excludes the terminating `L'\0'` character.
`wcscpy(wchar_t* destination, const wchar_t* source)`	Copies the wide character string `source` to the wide character string `destination`. The function returns `destination`.
`wcsncpy(wchar_t* destination, const wchar_t* source, size_t n)`	Copies up to `n` characters from the wide character string `source` to the wide character string `destination`. If `source` contains fewer than `n` characters, `destination` is padded with `L'\0'` characters; if it contains `n` or more, no terminating `L'\0'` is added. The function returns `destination`.
`wcscat(wchar_t* ws1, const wchar_t* ws2)`	Appends a copy of `ws2` to `ws1`. The first character of `ws2` overwrites the terminating null at the end of `ws1`. The function returns `ws1`.
`wcscmp(const wchar_t* ws1, const wchar_t* ws2)`	Compares the wide character string pointed to by `ws1` with the wide character string pointed to by `ws2` and returns a value of type `int` that is less than, equal to, or greater than 0 if the string `ws1` is less than, equal to, or greater than the string `ws2`.
`wcsncmp(const wchar_t* ws1, const wchar_t* ws2, size_t n)`	Compares up to `n` characters from the wide character string pointed to by `ws1` with the wide character string pointed to by `ws2`. The function returns a value of type `int` that is less than, equal to, or greater than 0 if the string of up to `n` characters from `ws1` is less than, equal to, or greater than the string of up to `n` characters from `ws2`.
`wcschr(const wchar_t* ws, wchar_t wc)`	Returns a pointer to the first occurrence of the wide character, `wc`, in the wide character string pointed to by `ws`. If `wc` is not found in `ws`, the `NULL` pointer value is returned.
`wcsstr(const wchar_t* ws1, const wchar_t* ws2)`	Returns a pointer to the first occurrence of the wide character string `ws2` in the wide character string `ws1`. If `ws2` is not found in `ws1`, the `NULL` pointer value is returned.

As you can see from the descriptions, all these functions work in essentially the same way as the string functions you have already seen. Where the const keyword appears in the type of a pointer parameter, it indicates that the function will not modify the data the argument points to, and the compiler checks that the function does not attempt to change it. You'll see more on this in Chapter 7 when you explore how you create your own functions in more detail.

The <wchar.h> header also declares the fgetws() function that reads a wide character string from a stream such as stdin, which by default corresponds to the keyboard. You must supply three arguments to the fgetws() function, just like the fgets() function you use for reading single-byte strings:

The first argument is a pointer to an array of wchar_t elements that is to store the string.
The second argument is a value n of type int that is the maximum number of characters that can be stored in the array, including the terminating null.
The third argument is the stream from which the data is to be read, which will be stdin when you are reading a string from the keyboard.

The function reads at most n-1 characters from the stream and stores them in the array with an L'\0' appended. Reading stops earlier if a newline is read (the newline is stored in the array) or if the end of the stream is reached. The function returns a pointer to the array containing the string, or NULL if end-of-file is reached before any characters are read or if a read error occurs.

Testing and Converting Wide Characters

The <wctype.h> header declares functions to test for specific subsets of wide characters, analogous to the functions you have seen for characters of type char. These are shown in Table 6-4.

Table 6-4. Wide Character Classification Functions

Function	Tests For
`iswlower()`	Lowercase letter
`iswupper()`	Uppercase letter
`iswalpha()`	Uppercase or lowercase letter
`iswalnum()`	Uppercase or lowercase letter, or decimal digit
`iswcntrl()`	Control character
`iswprint()`	Any printing character including space
`iswgraph()`	Any printing character except space
`iswdigit()`	Decimal digit (`L'0'` to `L'9'`)
`iswxdigit()`	Hexadecimal digit (`L'0'` to `L'9'`, `L'A'` to `L'F'`, `L'a'` to `L'f'`)
`iswblank()`	Standard blank characters (space, `L'\t'`)
`iswspace()`	Whitespace character (space, `L'\n'`, `L'\t'`, `L'\v'`, `L'\r'`, `L'\f'`)
`iswpunct()`	Printing character for which `iswspace()` and `iswalnum()` return `false`

The <wctype.h> header also declares the case-conversion functions towlower() and towupper(), which return the lowercase or uppercase equivalent of their wide character argument.

You can see some of the wide character functions in action with a wide character version of Program 6.9.

TRY IT OUT: CONVERTING WIDE CHARACTERS

This example uses the wide character equivalents of fgets(), toupper(), and strstr(). The code that has changed from Program 6.9 is shown in bold type.

/* Program 6.9A Finding occurrences of one wide character string in another  */
#include <stdio.h>
#include <wchar.h>
#include <wctype.h>                     /* For towupper() */

int main(void)
{
  wchar_t text[100];              /* Input buffer for string to be searched */
  wchar_t substring[40];          /* Input buffer for string sought          */

  printf("\nEnter the string to be searched(less than 100 characters):\n");
  fgetws(text, 100, stdin);

  printf("\nEnter the string sought (less than 40 characters ):\n");
  fgetws(substring, 40, stdin);

  /* overwrite the newline character in each string */
  text[wcslen(text)-1] = L'\0';
  substring[wcslen(substring)-1] = L'\0';

  printf("\nFirst string entered:\n%S\n", text);
  printf("\nSecond string entered:\n%S\n", substring);

  /* Convert both strings to uppercase. */
  for(int i = 0 ; (text[i] = towupper(text[i])) ; i++);
  for(int i = 0 ; (substring[i] = towupper(substring[i])) ; i++);

  printf("\nThe second string %s found in the first.",
              ((wcsstr(text, substring) == NULL) ? "was not" : "was"));
  return 0;
}

The output will be the same as for the previous example.

How It Works

This works in the same way as the previous example except that it stores the input as wide character strings and uses wide character functions. The example is so similar that there is not much to say about it. The arrays now have elements of type wchar_t, and the names of the functions are slightly different; towupper() is declared in <wctype.h>, so that header must be included along with <wchar.h>. Reading from the keyboard into the wide character arrays is accomplished by the fgetws() function, where you supply the limit on the number of characters that can be stored and the name of the stream as the second and third arguments. We replace the newline character in each string with the wide character version of the null terminator, L'\0'. Prefixing a character literal with L makes it a literal of type wchar_t. Of course, the statements that output the strings use %S because we are outputting wide character strings.

Designing a Program

You've almost come to the end of this chapter. All that remains is to go through a larger example to use some of what you've learned so far.

The Problem

You are going to develop a program that will read a paragraph of text of an arbitrary length that is entered from the keyboard, and determine the frequency with which each word in the text occurs, ignoring case. The paragraph length won't be completely arbitrary, as you'll have to specify some limit for the array size within the program, but you can make the array that holds the text as large as you want.

The Analysis

To read the paragraph from the keyboard, you need to be able to read input lines of arbitrary length and assemble them into a single string that will ultimately contain the entire paragraph. You don't want lines truncated either, so fgets() looks like a good candidate for the input operation. If you define a symbol at the beginning of the code that specifies the array size to store the paragraph, you will be able to change the capacity of the program by changing the definition of the symbol.

The text will contain punctuation, so you will have to deal with that somehow if you are to separate one word from another. It would be easy to extract the words from the text if each word were separated from the next by one or more spaces. You can arrange for this by replacing every character that cannot appear in a word with a space, which removes all the punctuation and any other odd characters that are lying around in the text. You don't need to retain the original text, but if you did, you could just make a copy before eliminating the punctuation.

Separating out the words will be simple. All you need to do is extract each successive sequence of characters that are not spaces as a word. You can store the words in another array. Since you want to count word occurrences ignoring case, you can store each word in lowercase. As you find a new word, you'll have to compare it with all the words you have already found to see whether it has occurred before. You'll only store it in the array if it is not already there. To record the number of occurrences of each word, you'll need another array to store the word counts. This array will need to accommodate as many counts as the number of words you have provided for in the program.

The Solution

This section outlines the steps you'll take to solve the problem. The program boils down to a simple sequence of steps that are more or less independent of one another. At the moment, the approach to implementing the program will be constrained by what you have learned up to now, and by the time you get to Chapter 9 you'll be able to implement this much more efficiently.

Step 1

The first step is to read the paragraph from the keyboard. Because the number of input lines is arbitrary, you need an indefinite loop. Let's first define the variables that we'll be using to code up the input mechanism:

/* Program 6.10 Analyzing text */
#include <stdio.h>
#include <stdbool.h>
#include <string.h>

#define TEXTLEN  10000      /* Maximum length of text            */
#define BUFFERSIZE 100      /* Input buffer size                  */

int main(void)
{
  char text[TEXTLEN+1] = "";      /* Whole paragraph; starts as an empty string */
  char buffer[BUFFERSIZE];
  char endstr[] = "*\n";          /* Signals end of input        */

  printf("Enter text on an arbitrary number of lines.");
  printf("\nEnter a line containing just an asterisk to end input:\n\n");

  /* Read an arbitrary number of lines of text */
  while(true)
  {
    /* A string containing an asterisk followed by newline */
    /* signals end of input                                */
    if(!strcmp(fgets(buffer, BUFFERSIZE, stdin), endstr))
      break;

    /* Check if we have space for latest input */
    if(strlen(text)+strlen(buffer)+1 > TEXTLEN)
    {
      printf("Maximum capacity for text exceeded. Terminating program.");
      return 1;
    }
    strcat(text, buffer);
  }

  /* Plus the rest of the program code ... */

  return 0;
}

You can compile and run this code as it stands if you like. The symbols TEXTLEN and BUFFERSIZE specify the capacity of the text array and the buffer array, respectively. The text array will store the entire paragraph, and the buffer array stores a line of input. We need some way for the user to tell the program when they have finished entering text. As the initial prompt for input indicates, entering a single asterisk on a line will do this. The fgets() function reads the single asterisk input as the string "*\n", because it stores the newline character that arises when the Enter key is pressed. The endstr array stores this string, which marks the end of the input, so you can compare each input line with it.

The entire input process takes place within the indefinite while loop that follows the prompt for input. A line of input is read in the if statement:

if(!strcmp(fgets(buffer, BUFFERSIZE, stdin), endstr))

      break;

The fgets() function reads a maximum of BUFFERSIZE-1 characters from stdin. If the user enters a line longer than this, it won't really matter: the characters in excess of BUFFERSIZE-1 will be left in the input stream and will be read on the next loop iteration. You can check that this works by setting BUFFERSIZE to 10, say, and entering lines longer than ten characters.

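Because the fgets() function returns a pointer to the string that you pass as the first argument, you can use fgets() as the argument to the strcmp() function to compare the string that was read with endstr. Thus, the if statement not only reads a line of input, it also checks whether the user has signaled the end of the input. Note that fgets() returns NULL if end-of-file is reached before any characters are read or if a read error occurs, and passing NULL to strcmp() is undefined behavior, so this compact form relies on the user always ending the input with the asterisk line.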

Before you append the new line of input to what's already stored in text, you check that there is still sufficient free space in text to accommodate the additional line. To append the new line, just use the strcat() library function to concatenate the string stored in buffer with the existing string in text.

Here's an example of output that results from executing this input operation:

Enter text on an arbitrary number of lines.
Enter a line containing just an asterisk to end input:

Mary had a little lamb,
Its feet were black as soot,
And into Mary's bread and jam,
His sooty foot he put.
*

Step 2

Now that you have read all the input text, you can replace the punctuation, and any newline characters stored by the fgets() function, with spaces. The following code goes immediately before the return statement at the end of the previous version of main():

  /* Replace everything except alphanumeric and single quote characters by spaces */
  for(int i = 0 ; i < strlen(text) ; i++)
  {
    if(text[i] == quote || isalnum(text[i]))
      continue;
    text[i] = space;
  }

The loop iterates over the characters in the string stored in the text array. We are assuming that words can only contain letters, digits, and single-quote characters, so anything that is not in this set is replaced by a space character. The isalnum() function, which returns true (a nonzero value) for a character that is a letter or a digit, is declared in the <ctype.h> header file, so you must add an #include directive for it to the program. You also need to add declarations for the variables quote and space, following the declaration for endstr:

const char space = ' ';
const char quote = '\'';

You could, of course, use character literals directly in the code, but defining variables like this helps to make the code a little more readable.

Step 3

The next step is to extract the words from the text array and store them in another array. You can first add a couple more definitions for symbols that relate to the array you will use to store the words. These go immediately after the definition for BUFFERSIZE:

#define MAXWORDS    500      /* Maximum number of different words */

#define WORDLEN      15      /* Maximum word length                */

You can now add the declarations for the additional arrays and working storage that you'll need for extracting the words from the text, and you can put these after the existing declarations at the beginning of main():

  char words[MAXWORDS][WORDLEN+1];
  int nword[MAXWORDS];            /* Number of word occurrences */
  char word[WORDLEN+1];           /* Stores a single word        */
  int wordlen = 0;                /* Length of a word            */
  int wordcount = 0;              /* Number of words stored      */

The words array stores up to MAXWORDS word strings of up to WORDLEN characters each, excluding the terminating null. The nword array holds counts of the number of occurrences of the corresponding words in the words array. Each time you find a new word, you'll store it in the next available position in the words array and set the element in the nword array that is at the same index position to 1. When you find a word that you have found and stored previously in words, you just need to increment the corresponding element in the nword array.

You'll extract words from the text array in another indefinite while loop because you don't know in advance how many words there are. There is quite a lot of code in this loop so we'll put it together incrementally. Here's the initial loop contents:

  /* Find unique words and store in words array */
  int index = 0;
  while(true)
  {
    /* Ignore any leading spaces before a word */
    while(text[index] == space)
      ++index;

    /* If we are at the end of text, we are done */
    if(text[index] == '\0')
      break;

    /* Extract a word */
    wordlen = 0;          /* Reset word length */
    while(text[index] == quote || isalnum(text[index]))
    {
      /* Check if word is too long */
      if(wordlen == WORDLEN)
      {
        printf("Maximum word length exceeded. Terminating program.");
        return 1;
      }
      word[wordlen++] = tolower(text[index++]);  /* Copy as lowercase      */
    }
    word[wordlen] = '\0';                        /* Add string terminator */
  }

This code follows the existing code in main(), immediately before the return statement at the end.

The index variable records the current character position in the text array. The first operation within the outer loop is to move past any spaces that are there so that index refers to the first character of a word. You do this in the inner while loop that just increments index as long as the current character is a space.

It's possible that the end of the string in text has been reached, so you check for this next. If the current character at position index is '\0', you exit the loop because all the words have been extracted.

Extracting a word just involves copying any character that is alphanumeric or a single quote. The first character that is not one of these marks the end of a word. You copy the characters that make up the word into the word array in another while loop, after converting each character to lowercase using the tolower() function from the standard library. Before storing a character in word, you check that the size of the array will not be exceeded. After the copying process, you just have to append a terminating null to the characters in the word array.

The next operation to be carried out in the loop is to see whether the word you have just extracted already exists in the words array. The following code does this and goes immediately before the closing brace for the while loop in the previous code fragment:

    /* Check for word already stored */
    bool isnew = true;
    for(int i = 0 ; i < wordcount ; i++)
      if(strcmp(word, words[i]) == 0)
      {
        ++nword[i];
        isnew = false;
        break;
      }

The isnew variable records whether the word is new and is initialized to true, on the assumption that the word you have just extracted has not been seen before. Within the for loop, you compare word with successive strings in the words array using the strcmp() library function, which returns 0 if the two strings are identical. As soon as this occurs, you increment the corresponding element in the nword array, set isnew to false, and exit the for loop.

The last operation within the indefinite loop that extracts words from text is to store the latest word in the words array, but only if it is new, of course. The following code does this:

    if(isnew)
    {
      /* Check if we have space for another word */
      if(wordcount >= MAXWORDS)
      {
        printf("\nMaximum word count exceeded. Terminating program.");
        return 1;
      }

      strcpy(words[wordcount], word);    /* Store the new word  */
      nword[wordcount++] = 1;            /* Set its count to 1  */
    }

This code also goes after the previous code fragment, but before the closing brace of the indefinite while loop. If isnew is true, you have a new word to store, but first you verify that there is still space in the words array. The strcpy() function copies the string in word to the element of the words array selected by wordcount. You then set the corresponding element of the nword array, which holds the number of times the word has been found in the text, to 1, and increment wordcount ready for the next new word.

Step 4

The last code fragment that you need will output the words and their frequencies of occurrence. Following is a complete listing of the program with the additional code from steps 3 and 4 highlighted in bold font:

/* Program 6.10 Analyzing text */
#include <stdio.h>
#include <stdbool.h>
#include <string.h>
#include <ctype.h>

#define TEXTLEN  10000      /* Maximum length of text            */
#define BUFFERSIZE 100      /* Input buffer size                  */
#define MAXWORDS    500     /* Maximum number of different words */
#define WORDLEN      15     /* Maximum word length                */

int main(void)
{
  char text[TEXTLEN+1] = "";      /* Whole paragraph; starts as an empty string */
  char buffer[BUFFERSIZE];
  char endstr[] = "*\n";          /* Signals end of input        */

  const char space = ' ';
  const char quote = '\'';

  char words[MAXWORDS][WORDLEN+1];
  int nword[MAXWORDS];            /* Number of word occurrences */
  char word[WORDLEN+1];           /* Stores a single word        */
  int wordlen = 0;                /* Length of a word            */
  int wordcount = 0;              /* Number of words stored      */

  printf("Enter text on an arbitrary number of lines.");
  printf("\nEnter a line containing just an asterisk to end input:\n\n");

  /* Read an arbitrary number of lines of text */
  while(true)
  {
    /* A string containing an asterisk followed by newline */
    /* signals end of input                                */
    if(!strcmp(fgets(buffer, BUFFERSIZE, stdin), endstr))
      break;

    /* Check if we have space for latest input */
    if(strlen(text)+strlen(buffer)+1 > TEXTLEN)
    {
      printf("Maximum capacity for text exceeded. Terminating program.");
      return 1;
    }
    strcat(text, buffer);
  }

  /* Replace everything except alphanumeric and single quote characters by spaces */
  for(int i = 0 ; i < strlen(text) ; i++)
  {
    if(text[i] == quote || isalnum(text[i]))
      continue;
    text[i] = space;
  }

  /* Find unique words and store in words array */
  int index = 0;
  while(true)
  {
    /* Ignore any leading spaces before a word */
    while(text[index] == space)
      ++index;

    /* If we are at the end of text, we are done */
    if(text[index] == '\0')
      break;

    /* Extract a word */
    wordlen = 0;          /* Reset word length */
    while(text[index] == quote || isalnum(text[index]))
    {
      /* Check if word is too long */
      if(wordlen == WORDLEN)
      {
        printf("Maximum word length exceeded. Terminating program.");
        return 1;
      }
      word[wordlen++] = tolower(text[index++]);  /* Copy as lowercase      */
    }
    word[wordlen] = '\0';                        /* Add string terminator */

    /* Check for word already stored */
    bool isnew = true;
    for(int i = 0 ; i < wordcount ; i++)
      if(strcmp(word, words[i]) == 0)
      {
        ++nword[i];
        isnew = false;
        break;
      }

    if(isnew)
    {
      /* Check if we have space for another word */
      if(wordcount >= MAXWORDS)
      {
        printf("\nMaximum word count exceeded. Terminating program.");
        return 1;
      }

      strcpy(words[wordcount], word);    /* Store the new word  */
      nword[wordcount++] = 1;            /* Set its count to 1  */
    }
  }

  /* Output the words and frequencies */
  for(int i = 0 ; i < wordcount ; i++)
  {
    if( !(i%3) )                         /* Three words to a line */
      printf("\n");
    printf("  %-15s%5d", words[i], nword[i]);
  }

  return 0;
}

The seven lines highlighted in bold output the words and corresponding frequencies. This is very easily done in a for loop that iterates over the number of words. The loop code arranges for three words plus frequencies to be output per line by writing a newline character to stdout if the current value of i is a multiple of 3. The expression i%3 will be zero when i is a multiple of 3, and this value maps to the bool value false, so the expression !(i%3) will be true.

The program ends up as a main() function of more than 100 statements. When you learn the complete C language you would organize this program very differently with the code segmented into several much shorter functions. By Chapter 9 you'll be in a position to do this, and I would encourage you to revisit this example when you reach the end of Chapter 9. Here's a sample of output from the complete program:

Enter text on an arbitrary number of lines.
Enter a line containing just an asterisk to end input:

When I makes tea I makes tea, as old mother Grogan said.
And when I makes water I makes water.
Begob, ma'am, says Mrs Cahill, God send you don't make them in the same pot.
*

  when               2  i                  4  makes              4
  tea                2  as                 1  old                1
  mother             1  grogan             1  said               1
  and                1  water              2  begob              1
  ma'am              1  says               1  mrs                1
  cahill             1  god                1  send               1
  you                1  don't              1  make               1
  them               1  in                 1  the                1
  same               1  pot                1

Summary

In this chapter, you applied the techniques you acquired in earlier chapters to the general problem of dealing with character strings. Strings present a different, and perhaps more difficult, problem than numeric data types.

Most of the chapter dealt with handling strings using arrays, but I also mentioned pointers. These will provide you with even more flexibility in dealing with strings, and many other things besides, as you'll discover as soon as you move on to the next chapter.

Exercises

The following exercises enable you to try out what you've learned in this chapter. If you get stuck, look back over the chapter for help. If you're still stuck, you can download the solutions from the Source Code/Downloads section of the Apress web site (http://www.apress.com), but that really should be a last resort.

Exercise 6-1. Write a program that will prompt for and read a positive integer less than 1000 from the keyboard, and then create and output a string that is the value of the integer in words. For example, if 941 is entered, the program will create the string "Nine hundred and forty one".

Exercise 6-2. Write a program that will allow a list of words to be entered separated by commas, and then extract the words and output them one to a line, removing any leading or trailing spaces. For example, if the input is

John , Jack , Jill

then the output will be

John

Jack

Jill

Exercise 6-3. Write a program that will output a randomly chosen thought for the day from a set of at least five thoughts of your own choosing.

Exercise 6-4. A palindrome is a phrase that reads the same backward as forward, ignoring whitespace and punctuation. For example, "Madam, I'm Adam" and "Are we not drawn onward, we few? Drawn onward to new era?" are palindromes. Write a program that will determine whether a string entered from the keyboard is a palindrome.
