Now you are ready to take a look at character strings in more detail. You were first introduced to character strings in Chapter 3, “Compiling and Running Your First Program,” when you wrote your first C program. In the statement
printf ("Programming in C is fun.\n");
the argument that is passed to the printf function is the character string
"Programming in C is fun.\n"
The double quotation marks are used to delimit the character string, which can contain any combination of letters, numbers, or special characters, other than a double quotation mark. But as you shall see shortly, it is even possible to include a double quotation mark inside a character string.
When introduced to the data type char, you learned that a variable that is declared to be of this type can contain only a single character. To assign a single character to such a variable, the character is enclosed within a pair of single quotation marks. Thus, the assignment
plusSign = '+';
has the effect of assigning the character '+' to the variable plusSign, assuming it has been appropriately declared. In addition, you learned that there is a distinction made between the single quotation and double quotation marks, and that if plusSign is declared to be of type char, then the statement
plusSign = "+";
is incorrect. Be certain you remember that single quotation and double quotation marks are used to create two different types of constants in C.
If you want to be able to deal with variables that can hold more than a single character,[1] this is precisely where the array of characters comes into play.
In Program 7.6, you defined an array of characters called word as follows:
char word [] = { 'H', 'e', 'l', 'l', 'o', '!' };
Remembering that in the absence of a particular array size, the C compiler automatically computes the number of elements in the array based upon the number of initializers, this statement reserves space in memory for exactly six characters, as shown in Figure 10.1.
To print out the contents of the array word, you ran through each element in the array and displayed it using the %c format characters.
With this technique, you can begin to build an assortment of useful functions for dealing with character strings. Some of the more commonly performed operations on character strings include combining two character strings together (concatenation), copying one character string to another, extracting a portion of a character string (substring), and determining if two character strings are equal (that is, if they contain the same characters). Take the first mentioned operation, concatenation, and develop a function to perform this task. You can define a call to your concat function as follows:
concat (result, str1, n1, str2, n2);
where str1 and str2 represent the two character arrays that are to be concatenated and n1 and n2 represent the number of characters in the respective arrays. This makes the function flexible enough so that you can concatenate two character arrays of arbitrary length. The argument result represents the character array that is to be the destination of the concatenated character arrays str1 followed by str2. See Program 10.1.
Example 10.1. Concatenating Character Arrays
// Function to concatenate two character arrays

#include <stdio.h>

void concat (char result[], const char str1[], int n1,
             const char str2[], int n2)
{
    int i, j;

    // copy str1 to result

    for ( i = 0; i < n1; ++i )
        result[i] = str1[i];

    // copy str2 to result

    for ( j = 0; j < n2; ++j )
        result[n1 + j] = str2[j];
}

int main (void)
{
    void concat (char result[], const char str1[], int n1,
                 const char str2[], int n2);
    const char s1[5] = { 'T', 'e', 's', 't', ' ' };
    const char s2[6] = { 'w', 'o', 'r', 'k', 's', '.' };
    char s3[11];
    int i;

    concat (s3, s1, 5, s2, 6);

    for ( i = 0; i < 11; ++i )
        printf ("%c", s3[i]);

    printf ("\n");

    return 0;
}
The first for loop inside the concat function copies the characters from the str1 array into the result array. This loop is executed n1 times, which is the number of characters contained inside the str1 array.
The second for loop copies str2 into the result array. Because str1 was n1 characters long, copying into result begins at result[n1], the position immediately following the one occupied by the last character of str1. After this for loop is done, the result array contains the n1+n2 characters representing str2 concatenated to the end of str1.
Inside the main routine, two const character arrays, s1 and s2, are defined. The first array is initialized to the characters 'T', 'e', 's', 't', and ' '. This last character represents a blank space and is a perfectly valid character constant. The second array is initially set to the characters 'w', 'o', 'r', 'k', 's', and '.'. A third character array, s3, is defined with enough space to hold s1 concatenated to s2, or 11 characters. It is not declared as a const array because its contents will be changed.
The function call
concat (s3, s1, 5, s2, 6);
calls the concat function to concatenate the character arrays s1 and s2, with the destination array s3. The arguments 5 and 6 are passed to the function to indicate the number of characters in s1 and s2, respectively.
After the concat function has completed execution and returns to main, a for loop is set up to display the results of the function call. The 11 elements of s3 are displayed at the terminal, and as can be seen from the program’s output, the concat function seems to be working properly. In the preceding program example, it is assumed that the first argument to the concat function (the result array) contains enough space to hold the resulting concatenated character arrays. If it does not, the function stores characters past the end of the array, which can produce unpredictable results when the program is run.
You can adopt a similar approach to that used by the concat
function for defining other functions to deal with character arrays. That is, you can develop a set of routines, each of which has as its arguments one or more character arrays plus the number of characters contained in each such array. Unfortunately, after working with these functions for a while, you will find that it gets a bit tedious trying to keep track of the number of characters contained in each character array that you are using in your program—especially if you are using your arrays to store character strings of varying sizes. What you need is a method for dealing with character arrays without having to worry about precisely how many characters you have stored in them.
There is such a method, and it is based upon the idea of placing a special character at the end of every character string. In this manner, the function can then determine for itself when it has reached the end of a character string after it encounters this special character. By developing all of your functions to deal with character strings in this fashion, you can eliminate the need to specify the number of characters that are contained inside a character string.
In the C language, the special character that is used to signal the end of a string is known as the null character and is written as '\0'. So, the statement
const char word [] = { 'H', 'e', 'l', 'l', 'o', '!', '\0' };
defines a character array called word that contains seven characters, the last of which is the null character. (Recall that the backslash character [\] is a special character in the C language and does not count as a separate character; therefore, '\0' represents a single character in C.) The array word is depicted in Figure 10.2.
To begin with an illustration of how these variable-length character strings are used, write a function that counts the number of characters in a character string, as shown in Program 10.2. Call the function stringLength and have it take as its argument a character array that is terminated by the null character. The function determines the number of characters in the array and returns this value back to the calling routine. Define the number of characters in the array as the number of characters up to, but not including, the terminating null character. So, the function call
stringLength (characterString)
should return the value 3 if characterString is defined as follows:
char characterString[] = { 'c', 'a', 't', '\0' };
Example 10.2. Counting the Characters in a String
// Function to count the number of characters in a string

#include <stdio.h>

int stringLength (const char string[])
{
    int count = 0;

    while ( string[count] != '\0' )
        ++count;

    return count;
}

int main (void)
{
    int stringLength (const char string[]);
    const char word1[] = { 'a', 's', 't', 'e', 'r', '\0' };
    const char word2[] = { 'a', 't', '\0' };
    const char word3[] = { 'a', 'w', 'e', '\0' };

    printf ("%i %i %i\n", stringLength (word1),
            stringLength (word2), stringLength (word3));

    return 0;
}
The stringLength function declares its argument as a const array of characters because it is not making any changes to the array, merely counting its size.
Inside the stringLength function, the variable count is defined and its value set to 0. The program then enters a while loop to sequence through the string array until the null character is reached. When the function finally hits upon this character, signaling the end of the character string, the while loop is exited and the value of count is returned. This value represents the number of characters in the string, excluding the null character. You might want to trace through the operation of this loop on a small character array to verify that the value of count when the loop is exited is in fact equal to the number of characters in the array, excluding the null character.
In the main routine, three character arrays, word1, word2, and word3, are defined. The printf function call displays the results of calling the stringLength function for each of these three character arrays.
Now, it is time to go back to the concat function developed in Program 10.1 and rewrite it to work with variable-length character strings. Obviously, the function must be changed somewhat because you no longer want to pass as arguments the number of characters in the two arrays. The function now takes only three arguments: the two character arrays to be concatenated and the character array in which to place the result.
Before delving into this program, you should first learn about two nice features that C provides for dealing with character strings.
The first feature involves the initialization of character arrays. C permits a character array to be initialized by simply specifying a constant character string rather than a list of individual characters. So, for example, the statement
char word[] = { "Hello!" };
can be used to set up an array of characters called word with the initial characters 'H', 'e', 'l', 'l', 'o', '!', and '\0', respectively. You can also omit the braces when initializing character arrays in this manner. So, the statement
char word[] = "Hello!";
is perfectly valid. Either statement is equivalent to the statement
char word[] = { 'H', 'e', 'l', 'l', 'o', '!', '\0' };
If you’re explicitly specifying the size of the array, make certain you leave enough space for the terminating null character. So, in
char word[7] = { "Hello!" };
the compiler has enough room in the array to place the terminating null character. However, in
char word[6] = { "Hello!" };
the compiler can’t fit a terminating null character at the end of the array, and so it doesn’t put one there (and it doesn’t complain about it either).
In general, wherever they appear in your program, character-string constants in the C language are automatically terminated by the null character. This fact helps functions such as printf determine when the end of a character string has been reached. So, in the call
printf ("Programming in C is fun.\n");
the null character is automatically placed after the newline character in the character string, thereby enabling the printf function to determine when it has reached the end of the format string.
The other feature to be mentioned here involves the display of character strings. The special format characters %s inside a printf format string can be used to display an array of characters that is terminated by the null character. So, if word is a null-terminated array of characters, the printf call
printf ("%s\n", word);
can be used to display the entire contents of the word array at the terminal. The printf function assumes when it encounters the %s format characters that the corresponding argument is a character string that is terminated by a null character.
The two features just described were incorporated into the main routine of Program 10.3, which illustrates your revised concat function. Because you are no longer passing the number of characters in each string as arguments to the function, the function must determine when the end of each string is reached by testing for the null character. Also, when str1 is copied into the result array, you want to be certain not to also copy the null character because this ends the string in the result array right there. You do need, however, to place a null character into the result array after str2 has been copied so as to signal the end of the newly created string.
Example 10.3. Concatenating Character Strings
#include <stdio.h>

int main (void)
{
    void concat (char result[], const char str1[], const char str2[]);
    const char s1[] = { "Test " };
    const char s2[] = { "works." };
    char s3[20];

    concat (s3, s1, s2);

    printf ("%s\n", s3);

    return 0;
}

// Function to concatenate two character strings

void concat (char result[], const char str1[], const char str2[])
{
    int i, j;

    // copy str1 to result

    for ( i = 0; str1[i] != '\0'; ++i )
        result[i] = str1[i];

    // copy str2 to result

    for ( j = 0; str2[j] != '\0'; ++j )
        result[i + j] = str2[j];

    // Terminate the concatenated string with a null character

    result [i + j] = '\0';
}
In the first for loop of the concat function, the characters contained inside str1 are copied into the result array until the null character is reached. Because the for loop terminates as soon as the null character is matched, it does not get copied into the result array.
In the second loop, the characters from str2 are copied into the result array directly after the final character from str1. This loop makes use of the fact that when the previous for loop finished execution, the value of i was equal to the number of characters in str1, excluding the null character. Therefore, the assignment statement
result[i + j] = str2[j];
is used to copy the characters from str2 into the proper locations of result.
After the second loop is completed, the concat function puts a null character at the end of the string. Study the function to ensure that you understand the use of i and j. Many program errors when dealing with character strings involve the use of an index number that is off by 1 in either direction.
Remember, to reference the first character of an array, an index number of 0 is used. In addition, if a character array string contains n characters, excluding the null byte, then string[n - 1] references the last (nonnull) character in the string, whereas string[n] references the null character. Furthermore, string must be defined to contain at least n + 1 characters, bearing in mind that the null character occupies a location in the array.
Returning to the program, the main routine defines two char arrays, s1 and s2, and sets their values using the new initialization technique previously described. The array s3 is defined to contain 20 characters, thus ensuring that sufficient space is reserved for the concatenated character string and saving you from the trouble of having to precisely calculate its size.
The concat function is then called with the three strings s1, s2, and s3 as arguments. The result, as contained in s3 after the concat function returns, is displayed using the %s format characters. Although s3 is defined to contain 20 characters, the printf function only displays characters from the array up to the null character.
You cannot directly test two strings to see if they are equal with a statement such as
if ( string1 == string2 ) ...
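because the equality operator can only be applied to simple variable types, such as floats, ints, or chars, and not to more sophisticated types, such as structures or arrays. (When it is applied to two arrays, the == operator tests whether they are the same array in memory, not whether they contain the same characters.)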
To determine if two strings are equal, you must explicitly compare the two character strings character by character. If you reach the end of both character strings at the same time, and if all of the characters up to that point are identical, the two strings are equal; otherwise, they are not.
It might be a good idea to develop a function that can be used to compare two character strings, as shown in Program 10.4. You can call the function equalStrings and have it take as arguments the two character strings to be compared. Because you are only interested in determining whether the two character strings are equal, you can have the function return a bool value of true (or nonzero) if the two strings are identical, and false (or zero) if they are not. In this way, the function can be used directly inside test statements, such as in
if ( equalStrings (string1, string2) ) ...
Example 10.4. Testing Strings for Equality
// Function to determine if two strings are equal

#include <stdio.h>
#include <stdbool.h>

bool equalStrings (const char s1[], const char s2[])
{
    int i = 0;
    bool areEqual;

    while ( s1[i] == s2[i] &&
            s1[i] != '\0' && s2[i] != '\0' )
        ++i;

    if ( s1[i] == '\0' && s2[i] == '\0' )
        areEqual = true;
    else
        areEqual = false;

    return areEqual;
}

int main (void)
{
    bool equalStrings (const char s1[], const char s2[]);
    const char stra[] = "string compare test";
    const char strb[] = "string";

    printf ("%i\n", equalStrings (stra, strb));
    printf ("%i\n", equalStrings (stra, stra));
    printf ("%i\n", equalStrings (strb, "string"));

    return 0;
}
The equalStrings function uses a while loop to sequence through the character strings s1 and s2. The loop is executed so long as the two character strings are equal (s1[i] == s2[i]) and so long as the end of either string is not reached (s1[i] != '\0' && s2[i] != '\0'). The variable i, which is used as the index number for both arrays, is incremented each time through the while loop.
The if statement that executes after the while loop has terminated determines if you have simultaneously reached the end of both strings s1 and s2. You could have used the statement
if ( s1[i] == s2[i] ) ...
instead to achieve the same results. If you are at the end of both strings, the strings must be identical, in which case areEqual is set to true and returned to the calling routine. Otherwise, the strings are not identical and areEqual is set to false and returned.
In main, two character arrays stra and strb are set up and assigned the indicated initial values. The first call to the equalStrings function passes these two character arrays as arguments. Because these two strings are not equal, the function correctly returns a value of false, or 0.
The second call to the equalStrings function passes the string stra twice. The function correctly returns a true value to indicate that the two strings are equal, as verified by the program’s output.
The third call to the equalStrings function is a bit more interesting. As you can see from this example, you can pass a constant character string to a function that is expecting an array of characters as an argument. In Chapter 11, “Pointers,” you see how this works. The equalStrings function compares the character string contained in strb to the character string "string" and returns true to indicate that the two strings are equal.
By now, you are used to the idea of displaying a character string using the %s format characters. But what about reading in a character string from your terminal window? On your system, there are several library functions that you can use to input character strings. The scanf function can be used with the %s format characters to read in a string of characters up to a blank space, tab character, or the end of the line, whichever occurs first. So, the statements
char string[81];
scanf ("%s", string);
have the effect of reading in a character string typed into your terminal window and storing it inside the character array string. Note that unlike previous scanf calls, in the case of reading strings, the & is not placed before the array name (the reason for this is also explained in Chapter 11).
If the preceding scanf call is executed, and the following characters are entered:
Shawshank
the string "Shawshank" is read in by the scanf function and is stored inside the string array. If the following line of text is typed instead:
iTunes playlist
just the string "iTunes" is stored inside the string array because the blank space after the word iTunes terminates the string. If the scanf call is executed again, this time the string "playlist" is stored inside the string array because the scanf function always continues scanning from the most recent character that was read in.
The scanf function automatically terminates the string that is read in with a null character. So, execution of the preceding scanf call with the line of text
abcdefghijklmnopqrstuvwxyz
causes the entire lowercase alphabet to be stored in the first 26 locations of the string array, with string[26] automatically set to the null character.
If s1, s2, and s3 are defined to be character arrays of appropriate sizes, execution of the statement
scanf ("%s%s%s", s1, s2, s3);
with the line of text
micro computer system
results in the assignment of the string "micro" to s1, "computer" to s2, and "system" to s3. If the following line of text is typed instead:
system expansion
it results in the assignment of the string "system" to s1, and "expansion" to s2. Because no further characters appear on the line, the scanf function then waits for more input to be entered from your terminal window.
In Program 10.5, scanf is used to read three character strings.
In the preceding program, the scanf function is called to read in three character strings: s1, s2, and s3. Because the first line of text contains only two character strings (where the definition of a character string to scanf is a sequence of characters up to a space, tab, or the end of the line), the program waits for more text to be entered. After this is done, the printf call is used to verify that the strings "system", "expansion", and "bus" are correctly stored inside the string arrays s1, s2, and s3, respectively.
If you type in more than 80 consecutive characters to the preceding program without pressing the spacebar, the tab key, or the Enter (or Return) key, scanf overflows one of the character arrays. This might cause the program to terminate abnormally or cause unpredictable things to happen. Unfortunately, scanf has no way of knowing how large your character arrays are. When handed a %s format, it simply continues to read and store characters until one of the noted terminator characters is reached.
If you place a number after the % in the scanf format string, this tells scanf the maximum number of characters to read. So, if you used the following scanf call:
scanf ("%80s%80s%80s", s1, s2, s3);
instead of the one shown in Program 10.5, scanf knows that no more than 80 characters are to be read and stored into any one of s1, s2, or s3. (You still have to leave room for the terminating null character that scanf stores at the end of the array. That’s why %80s is used instead of %81s.)
The standard library provides several functions for the express purposes of reading and writing single characters and entire character strings. A function called getchar can be used to read in a single character from the terminal. Repeated calls to the getchar function return successive single characters from the input. When the end of the line is reached, the function returns the newline character '\n'. So, if the characters “abc” are typed at the terminal, followed immediately by the Enter (or Return) key, the first call to the getchar function returns the character 'a', the second call returns the character 'b', the third call returns 'c', and the fourth call returns the newline character '\n'. A fifth call to this function causes the program to wait for more input to be entered from the terminal.
You might be wondering why you need the getchar function when you already know how to read in a single character with the %c format characters of the scanf function. Using the scanf function for this purpose is a perfectly valid approach; however, the getchar function is a more direct approach because its sole purpose is for reading in single characters, and, therefore, it does not require any arguments. The function returns a single character that might be assigned to a variable or used as desired by the program.
In many text-processing applications, you need to read in an entire line of text. This line of text is frequently stored in a single place, generally called a “buffer,” where it is processed further. Using the scanf call with the %s format characters does not work in such a case because the string is terminated as soon as a space is encountered in the input.
Also available from the function library is a function called gets. The sole purpose of this function is to read in a single line of text. (Because gets has no way to limit how many characters it stores, it was removed from the language in the C11 standard; fgets is the usual replacement.) As an interesting program exercise, Program 10.6 shows how a function similar to the gets function, called readLine here, can be developed using the getchar function. The function takes a single argument: a character array in which the line of text is to be stored. Characters read from the terminal window up to, but not including, the newline character are stored in this array by the function.
Example 10.6. Reading Lines of Data
#include <stdio.h>

int main (void)
{
    int i;
    char line[81];
    void readLine (char buffer[]);

    for ( i = 0; i < 3; ++i )
    {
        readLine (line);
        printf ("%s\n\n", line);
    }

    return 0;
}

// Function to read a line of text from the terminal

void readLine (char buffer[])
{
    char character;
    int i = 0;

    do
    {
        character = getchar ();
        buffer[i] = character;
        ++i;
    }
    while ( character != '\n' );

    buffer[i - 1] = '\0';
}
The do loop in the readLine function is used to build up the input line inside the character array buffer. Each character that is returned by the getchar function is stored in the next location of the array. When the newline character is reached, signaling the end of the line, the loop is exited. The null character is then stored inside the array to terminate the character string, replacing the newline character that was stored there the last time that the loop was executed. The index number i - 1 indexes the correct position in the array because the index number was incremented one extra time inside the loop the last time it was executed.
The main routine defines a character array called line with enough space reserved to hold 81 characters. This ensures that an entire line (80 characters has historically been used as the line length of a “standard terminal”) plus the null character can be stored inside the array. However, even in windows that display 80 or fewer characters per line, you are still in danger of overflowing the array if you continue typing past the end of the line without pressing the Enter (or Return) key. It is a good idea to extend the readLine function to accept as a second argument the size of the buffer. In this way, the function can ensure that the capacity of the buffer is not exceeded.
The program then enters a for loop, which simply calls the readLine function three times. Each time that this function is called, a new line of text is read from the terminal. This line is simply echoed back at the terminal to verify proper operation of the function. After the third line of text has been displayed, execution of Program 10.6 is then complete.
For your next program example (see Program 10.7), consider a practical text-processing application: counting the number of words in a portion of text. Develop a function called countWords, which takes as its argument a character string and which returns the number of words contained in that string. For the sake of simplicity, assume here that a word is defined as a sequence of one or more alphabetic characters. The function can scan the character string for the occurrence of the first alphabetic character and considers all subsequent characters up to the first nonalphabetic character as part of the same word. Then, the function can continue scanning the string for the next alphabetic character, which identifies the start of a new word.
Example 10.7. Counting Words
// Function to determine if a character is alphabetic

#include <stdio.h>
#include <stdbool.h>

bool alphabetic (const char c)
{
    if ( (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') )
        return true;
    else
        return false;
}

/* Function to count the number of words in a string */

int countWords (const char string[])
{
    int i, wordCount = 0;
    bool lookingForWord = true, alphabetic (const char c);

    for ( i = 0; string[i] != '\0'; ++i )
        if ( alphabetic(string[i]) )
        {
            if ( lookingForWord )
            {
                ++wordCount;
                lookingForWord = false;
            }
        }
        else
            lookingForWord = true;

    return wordCount;
}

int main (void)
{
    const char text1[] = "Well, here goes.";
    const char text2[] = "And here we go... again.";
    int countWords (const char string[]);

    printf ("%s - words = %i\n", text1, countWords (text1));
    printf ("%s - words = %i\n", text2, countWords (text2));

    return 0;
}
The alphabetic function is straightforward enough: it simply tests the value of the character passed to it to determine if it is either a lowercase or uppercase letter. If it is either, the function returns true, indicating that the character is alphabetic; otherwise, the function returns false.
The countWords function is not as straightforward. The integer variable i is used as an index number to sequence through each character in the string. The Boolean variable lookingForWord is used as a flag to indicate whether you are currently in the process of looking for the start of a new word. At the beginning of the execution of the function, you obviously are looking for the start of a new word, so this flag is set to true. The local variable wordCount is used for the obvious purpose of counting the number of words in the character string.
For each character inside the character string, a call to the alphabetic function is made to determine whether the character is alphabetic. If the character is alphabetic, the lookingForWord flag is tested to determine if you are in the process of looking for a new word. If you are, the value of wordCount is incremented by 1, and the lookingForWord flag is set to false, indicating that you are no longer looking for the start of a new word.
If the character is alphabetic and the lookingForWord flag is false, this means that you are currently scanning inside a word. In such a case, the for loop is continued with the next character in the string.
If the character is not alphabetic (meaning either that you have reached the end of a word or that you have still not found the beginning of the next word), the flag lookingForWord is set to true (even though it might already be true).
When all of the characters inside the character string have been examined, the function returns the value of wordCount to indicate the number of words that were found in the character string.
It is helpful to present a table of the values of the various variables in the countWords function to see how the algorithm works. Table 10.1 shows such a table, with the first call to the countWords function from the preceding program as an example. The first line of Table 10.1 shows the initial value of the variables wordCount and lookingForWord before the for loop is entered. Subsequent lines depict the values of the indicated variables each time through the for loop. So, the second line of the table shows that the value of wordCount has been set to 1 and the lookingForWord flag set to false (0) after the first time through the loop (after the 'W' has been processed). The last line of the table shows the final values of the variables when the end of the string is reached. You should spend some time studying this table, verifying the values of the indicated variables against the logic of the countWords function. After this has been accomplished, you should then feel comfortable with the algorithm that is used by the function to count the number of words in a string.
Now consider a slightly more practical example of the use of the countWords
function. This time, you make use of your readLine
function to allow the user to type in multiple lines of text at the terminal window. The program then counts the total number of words in the text and displays the result.
To make the program more flexible, you do not limit or specify the number of lines of text that are entered. Therefore, you must have a way for the user to “tell” the program when they are done entering text. One way to do this is to have the user simply press the Enter (or Return) key an extra time after the last line of text has been entered. When the readLine
function is called to read in such a line, the function immediately encounters the newline character and, as a result, stores the null character as the first (and only) character in the buffer. Your program can check for this special case and can know that the last line of text has been entered after a line containing no characters has been read.
A character string that contains no characters other than the null character has a special name in the C language; it is called the null string. When you think about it, the use of the null string is still perfectly consistent with all of the functions that you have defined so far in this chapter. The stringLength
function correctly returns 0 as the size of the null string; your concat
function also properly concatenates “nothing” onto the end of another string; even your equalStrings
function works correctly if either or both strings are null (and in the latter case, the function correctly calls these strings equal).
Always remember that the null string does, in fact, have a character in it, albeit a null one.
Sometimes, it becomes desirable to set the value of a character string to the null string. In C, the null string is denoted by an adjacent pair of double quotation marks. So, the statement
char buffer[100] = "";
defines a character array called buffer
and sets its value to the null string. Note that the character string ""
is not the same as the character string " "
because the second string contains a single blank character. (If you are doubtful, send both strings to the equalStrings
function and see what result comes back.)
Program 10.8 uses the readLine
, alphabetic
, and countWords
functions from previous programs. They have not been shown in the program listing to conserve space.
Example 10.8. Counting Words in a Piece of Text
#include <stdio.h>
#include <stdbool.h>

/***** Insert alphabetic function here *****/
/***** Insert readLine function here *****/
/***** Insert countWords function here *****/

int main (void)
{
    char  text[81];
    int   totalWords = 0;
    int   countWords (const char string[]);
    void  readLine (char buffer[]);
    bool  endOfText = false;

    printf ("Type in your text.\n");
    printf ("When you are done, press 'RETURN'.\n\n");

    while ( ! endOfText )
    {
        readLine (text);

        /* An empty line (null string) marks the end of the text */
        if ( text[0] == '\0' )
            endOfText = true;
        else
            totalWords += countWords (text);
    }

    printf ("\nThere are %i words in the above text.\n", totalWords);

    return 0;
}
Example 10.8. Output
Type in your text.
When you are done, press 'RETURN'.

Wendy glanced up at the ceiling where the mound of lasagna loomed
like a mottled mountain range. Within seconds, she was crowned with
ricotta ringlets and a tomato sauce tiara. Bits of beef formed meaty
moles on her forehead. After the second thud, her culinary coronation
was complete.
Enter

There are 48 words in the above text.
The line labeled Enter indicates the pressing of the Enter or Return key.
The endOfText
variable is used as a flag to indicate when the end of the input text has been reached. The while
loop is executed as long as this flag is false
. Inside this loop, the program calls the readLine
function to read a line of text. The if
statement then tests the input line that is stored inside the text
array to see if just the Enter (or Return) key was pressed. If so, then the buffer contains the null string, in which case the endOfText
flag is set to true
to signal that all of the text has been entered.
If the buffer does contain some text, the countWords
function is called to count the number of words in the text
array. The value that is returned by this function is added into the value of totalWords
, which contains the cumulative number of words from all lines of text entered thus far.
After the while
loop is exited, the program displays the value of totalWords
, along with some informative text, at the terminal.
It might seem that the preceding program does not help to reduce your work efforts much because you still have to manually enter all of the text at the terminal. But as you will see in Chapter 16, “Input and Output Operations in C,” this same program can also be used to count the number of words contained in a file stored on a disk, for example. So, an author using a computer system for the preparation of a manuscript might find this program extremely valuable as it can be used to quickly determine the number of words contained in the manuscript (assuming the file is stored as a normal text file and not in some word processor format like Microsoft Word).
As alluded to previously, the backslash character has a special significance that extends beyond its use in forming the newline and null characters. Just as the backslash and the letter n, when used in combination, cause subsequent printing to begin on a new line, so can other characters be combined with the backslash character to perform special functions. These various backslash characters, often referred to as escape characters, are summarized in Table 10.2.
Table 10.2. Escape Characters
Escape | Character Name |
---|---|
\a | Audible alert |
\b | Backspace |
\f | Form feed |
\n | Newline |
\r | Carriage return |
\t | Horizontal tab |
\v | Vertical tab |
\\ | Backslash |
\" | Double quotation mark |
\' | Single quotation mark |
\? | Question mark |
\nnn | Octal character value nnn |
\unnnn | Universal character name |
\Unnnnnnnn | Universal character name |
\xnn | Hexadecimal character value nn |
The first seven characters listed in Table 10.2 perform the indicated function on most output devices when they are displayed. The audible alert character, \a
, sounds a “bell” in most terminal windows. So, the printf
call
printf ("\aSYSTEM SHUT DOWN IN 5 MINUTES!!\n");
sounds an alert and displays the indicated message.
Including the backspace character '\b'
inside a character string causes the terminal to backspace one character at the point at which the character appears in the string, provided that it is supported by the terminal window. Similarly, the function call
printf ("%i\t%i\t%i\n", a, b, c);
displays the value of a
, spaces over to the next tab setting (typically set to every eight columns by default), displays the value of b
, spaces over to the next tab setting, and then displays the value of c
. The horizontal tab character is particularly useful for lining up data in columns.
To include the backslash character itself inside a character string, two backslash characters are necessary, so the printf
call
printf ("\\t is the horizontal tab character.\n");
displays the following:
\t is the horizontal tab character.
Note that because the \\
is encountered first in the string, a tab is not displayed in this case.
To include a double quotation character inside a character string, it must be preceded by a backslash. So, the printf
call
printf ("\"Hello,\" he said.\n");
results in the display of the message
"Hello," he said.
To assign a single quotation character to a character variable, the backslash character must be placed before the quotation mark. If c
is declared to be a variable of type char
, the statement
c = '\'';
assigns a single quotation character to c
.
The backslash character, followed immediately by a ?
, is used to represent a ?
character. This is sometimes necessary when dealing with trigraphs in non-ASCII character sets. For more details, consult Appendix A, “C Language Summary.”
The final four entries in Table 10.2 enable any character to be included in a character string. In the escape character '\
nnn'
, nnn
is a one- to three-digit octal number. In the escape character '\x
nn'
, nn
is a hexadecimal number. These numbers represent the internal code of the character. This enables characters that might not be directly available from the keyboard to be coded into a character string. For example, to include an ASCII escape character, which has the value octal 33, you could include the sequence