Tokenizing a String with strtok

Function strtok breaks a string into a series of tokens. A token is a sequence of characters separated by delimiting characters (usually spaces or punctuation marks). For example, in a line of text, each word can be considered a token, and the spaces separating the words can be considered delimiters. Multiple calls to strtok are required to break a string into tokens (assuming that the string contains more than one token). The first call to strtok receives two arguments: a string to be tokenized and a string containing the characters that separate the tokens (i.e., the delimiters). Line 15 in Fig. 20.25 assigns to tokenPtr a pointer to the first token in sentence. The second argument, " ", indicates that tokens in sentence are separated by spaces. Function strtok searches for the first character in sentence that’s not a delimiting character (space). This begins the first token. The function then finds the next delimiting character in the string and replaces it with a null ('\0') character. This terminates the current token. Function strtok saves (in a static variable) a pointer to the next character following the token in sentence and returns a pointer to the current token.


 1   // Fig. 20.25: fig20_25.cpp
 2   // Using strtok to tokenize a string.
 3   #include <iostream>
 4   #include <cstring> // prototype for strtok
 5   using namespace std;
 6
 7   int main()
 8   {
 9      char sentence[] = "This is a sentence with 7 tokens";
10
11      cout << "The string to be tokenized is:\n" << sentence
12         << "\n\nThe tokens are:\n\n";
13
14      // begin tokenization of sentence
15      char *tokenPtr = strtok( sentence, " " );
16
17      // continue tokenizing sentence until tokenPtr becomes NULL
18      while ( tokenPtr != NULL )
19      {
20         cout << tokenPtr << '\n';
21         tokenPtr = strtok( NULL, " " ); // get next token
22      } // end while
23
24      cout << "\nAfter strtok, sentence = " << sentence << endl;
25   } // end main


The string to be tokenized is:
This is a sentence with 7 tokens

The tokens are:

This
is
a
sentence
with
7
tokens

After strtok, sentence = This


Fig. 20.25. Using strtok to tokenize a string.

Subsequent calls to strtok to continue tokenizing sentence pass NULL as the first argument (line 21). The NULL argument indicates that the call to strtok should continue tokenizing from the location in sentence saved by the last call to strtok. Function strtok maintains this saved information in a manner that’s not visible to you. If no tokens remain when strtok is called, strtok returns NULL. The program of Fig. 20.25 uses strtok to tokenize the string "This is a sentence with 7 tokens" and prints each token on a separate line. Line 24 outputs sentence after tokenization. Note that strtok modifies the input string; therefore, a copy of the string should be made if the program requires the original after the calls to strtok. When sentence is output after tokenization, only the word “This” prints, because strtok replaced each blank in sentence with a null character ('\0') during the tokenization process.


Common Programming Error 20.10

Not realizing that strtok modifies the string being tokenized and then attempting to use that string as if it were the original, unmodified string is a logic error.

