Chapter 11. Strings

This chapter presents the string types of the C++ standard library. It describes the basic template class basic_string<> and its standard specializations string and wstring.

Strings can be a source of confusion. This is because it is not clear what is meant by the term string. Does it mean an ordinary character array of type char* (with or without the const qualifier), or an instance of class string, or is it a general name for objects that are kind of strings? In this chapter I use the term string for objects of one of the string types in the C++ standard library (whether it is string or wstring). For "ordinary strings" of type char* or const char*, I use the term C-string.

Note that the type of string literals (such as "hello") was changed into const char*. However, to provide backward compatibility there is an implicit but deprecated conversion to char* for them.

Motivation

The string classes of the C++ standard library enable you to use strings as normal types that cause no problems for the user. Thus, you can copy, assign, and compare strings as fundamental types without worrying or bothering about whether there is enough memory or for how long the internal memory is valid. You simply use operators, such as assignment by using =, comparison by using ==, and concatenation by using +. In short, the string types of the C++ standard library are designed in such a way that they behave as if they were a kind of fundamental data type that does not cause any trouble (at least in principle). Modern data processing is mostly string processing, so this is an important step for programmers coming from C, Fortran, or similar languages in which strings are a source of trouble.

The following sections offer two examples that demonstrate the abilities and uses of the string classes. They aren't very useful because they are written only for demonstration purposes.

A First Example: Extracting a Temporary File Name

The first example program uses command-line arguments to generate temporary file names. For example, if you start the program as

   string1 prog.dat mydir hello. oops.tmp end.dat

the output is

   prog.dat => prog.tmp
   mydir => mydir.tmp
   hello. => hello.tmp
   oops.tmp => oops.xxx
   end.dat => end.tmp

Usually, the generated file name has the extension .tmp. However, if the original name already has the extension .tmp, the generated name gets the extension .xxx instead.

The program is written in the following way:

   //string/string1.cpp

   #include <iostream>
   #include <string>
   using namespace std;

   int main (int argc, char* argv[])
   {

       string filename, basename, extname, tmpname;
       const string suffix("tmp");

       /*for each command-line argument
        *(which is an ordinary C-string)
        */
       for (int i=1; i<argc; ++i) {
           //process argument as file name
           filename = argv[i];

           //search period in file name
           string::size_type idx = filename.find('.');
           if (idx == string::npos) {
               //file name does not contain any period
               tmpname = filename + '.' + suffix;
           }
           else {
                /* split file name into base name and extension
                 * - base name contains all characters before the period
                 * - extension contains all characters after the period
                 */
                basename = filename.substr(0, idx);
                extname = filename.substr(idx+1);
                if (extname.empty()) {
                    //contains period but no extension: append tmp
                    tmpname = filename;
                    tmpname += suffix;
                }
                else if (extname == suffix) {
                    //replace extension tmp with xxx
                    tmpname = filename;
                    tmpname.replace (idx+1, extname.size(), "xxx");
                }
                else {
                    //replace any extension with tmp
                    tmpname = filename;
                    tmpname.replace (idx+1, string::npos, suffix);
                }
          }

          //print file name and temporary name
          cout << filename << " => " << tmpname << endl;
       }
   }

At first,

   #include <string>

includes the header file for the C++ standard string classes. As usual, these classes are declared in namespace std.

The declaration

   string filename, basename, extname, tmpname;

creates four string variables. No argument is passed, so for their initialization the default constructor for string is called. The default constructor initializes them as empty strings.

The declaration

   const string suffix("tmp");

creates a constant string suffix that is used in the program as the normal suffix for temporary file names. The string is initialized by an ordinary C-string, so it has the value tmp. Note that C-strings can be combined with objects of class string in almost any situation in which two strings can be combined. In particular, in the entire program every occurrence of suffix could be replaced with "tmp" so that a C-string is used directly.

In each iteration of the for loop, the statement

    filename = argv[i];

assigns a new value to the string variable filename. In this case, the new value is an ordinary C-string. However, it could also be another object of class string or a single character that has type char.

The statement

    string::size_type idx = filename.find('.');

searches for the first occurrence of a period inside the string filename. The find() function is one of several functions that search for something inside strings. You could also search backward, for substrings, only in a part of a string, or for more than one character simultaneously. All these find functions return the index of the first matching position. Yes, the return value is an integer and not an iterator. The usual interface for strings is not based on the concept of the STL. However, some iterator support for strings is provided (see Section 11.2.13). The return type of all find functions is string::size_type, an unsigned integral type that is defined inside the string class.[1] As usual, the index of the first character is the value 0. The index of the last character is the value "numberOfCharacters-1." Note that "numberOfCharacters" is not a valid index. Unlike C-strings, objects of class string have no special character '\0' at the end of the string.

If the search fails, a special value is needed to return the failure. That value is npos, which is also defined by the string class. Thus, the line

    if (idx == string::npos)

checks whether the search for the period failed.

The type and value of npos are a big pitfall for the use of strings. Always use string::size_type, and not int or unsigned, for the variable that holds the return value of a find function when you want to compare it with string::npos. Otherwise, the comparison with string::npos might not work. See Section 11.2.12 for details.
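The following sketch illustrates the pitfall. The function names contains_char() and contains_char_narrow() are made up for this illustration. The second function deliberately stores the index in unsigned short, a type I chose only because it is narrower than string::size_type on all common platforms, so that the effect shows up regardless of whether int and size_type happen to have the same width.

```cpp
#include <cassert>   // only needed if you want to try the checks yourself
#include <string>

// Returns true if c occurs in s. The result of find() is kept in
// std::string::size_type, so the comparison with npos is reliable.
bool contains_char(const std::string& s, char c)
{
    std::string::size_type idx = s.find(c);
    return idx != std::string::npos;
}

// Deliberately wrong variant (hypothetical, for illustration only):
// the index is squeezed into a narrower unsigned type, so the stored
// value can no longer be equal to npos.
bool contains_char_narrow(const std::string& s, char c)
{
    unsigned short idx = static_cast<unsigned short>(s.find(c));
    return idx != std::string::npos;   // true even if the search failed
}
```

With the correct version, a search for a period in "mydir" reports "not found." The narrow version reports "found" for the same call, assuming that string::size_type is wider than unsigned short.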

If the search for the period fails in this example, the file name has no extension. In this case, the temporary file name is the concatenation of the original file name, the period character, and the previously defined extension for temporary files:

    tmpname = filename + '.' + suffix;

Thus, you can simply use operator + to concatenate two strings. It is also possible to concatenate strings with ordinary C-strings and single characters.

If the period is found, the else part is used. Here, the index of the period is used to split the file name into a base part and the extension. This is done by the substr() member function:

   basename = filename.substr(0, idx);
   extname = filename.substr(idx+1);

The first parameter of the substr() function is the starting index. The optional second argument is the number of characters (not the end index). If the second argument is not used, all remaining characters of the string are returned as a substring.

At all places where an index and a length are used as arguments, strings behave according to the following two rules:

  1. An argument specifying the index must have a valid value. That value must be less than the number of characters of the string (as usual, the index of the first character is 0). In addition, the index of the position after the last character could be used to specify the end.

    In most cases, any use of an index greater than the actual number of characters throws out_of_range. However, all functions that search for a character or a position (all find functions) allow any index. If the index exceeds the number of characters these functions simply return string::npos ("not found").

  2. An argument specifying the number of characters could have any value. If the size is greater than the remaining number of characters, all remaining characters are used. In particular, string::npos always works as a synonym for "all remaining characters."

Thus, the following expression throws an exception if the period is not found:

   filename.substr(filename.find('.'))

But, the following expression does not throw an exception:

    filename.substr(0, filename.find('.'))

If the period is not found, it results in the whole file name.
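The two rules can be checked with a small sketch. The helper names base_part() and substr_throws() are my own and are not part of the library. The first relies on rule 2 (a length of npos means "all remaining characters"), the second makes rule 1 visible by reporting whether substr() throws for a given index.

```cpp
#include <cassert>   // only needed if you want to try the checks yourself
#include <stdexcept>
#include <string>

// Returns the part of name before the first period, or the whole name
// if there is no period (find() then yields npos, which substr()
// accepts as a length meaning "all remaining characters").
std::string base_part(const std::string& name)
{
    return name.substr(0, name.find('.'));
}

// Returns true if s.substr(idx) throws out_of_range.
bool substr_throws(const std::string& s, std::string::size_type idx)
{
    try {
        (void)s.substr(idx);    // result is not needed here
    }
    catch (const std::out_of_range&) {
        return true;
    }
    return false;
}
```

Note that the index of the position after the last character is accepted by substr() and yields an empty string. Only a larger index throws.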

Even if the period is found, the extension that is returned by substr() might be empty because there are no more characters after the period. This is checked by

    if (extname.empty())

If this condition yields true, the generated temporary file name becomes the ordinary file name that has the normal extension appended:

    tmpname = filename;
    tmpname += suffix;

Here, operator += is used to append the extension.

The file name might already have the extension for temporary files. To check this, operator == is used to compare two strings:

    if (extname == suffix)

If this comparison yields true, the normal extension for temporary files is replaced by the extension xxx:

    tmpname = filename;
    tmpname.replace (idx+1, extname.size(), "xxx");

Here,

    extname.size()

returns the number of characters of the string extname. Instead of size() you could use length(), which does exactly the same thing. So, both size() and length() return the number of characters. In particular, size() has nothing to do with the memory that the string uses.[2]

Next, after all special conditions are considered, normal processing takes place. The program replaces the whole extension by the ordinary extension for temporary file names:

    tmpname = filename;
    tmpname.replace (idx+1, string::npos, suffix);

Here, string::npos is used as a synonym for "all remaining characters." Thus, all remaining characters after the period are replaced with suffix. This replacement would also work if the file name contained a period but no extension. It would just replace "nothing" with suffix.

The statement that writes the original file name and the generated temporary file name shows that you can print the strings by using the usual output operators of streams (surprise, surprise):

   cout << filename << " => " << tmpname << endl;
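To try the logic without command-line arguments, it can be packaged as a function. The following is a condensed sketch, not the program from above: the name make_tmp_name() is my own, and the separate branch for an empty extension is folded into the general replace() call, which (as noted before) simply replaces "nothing" with the suffix in that case.

```cpp
#include <cassert>   // only needed if you want to try the checks yourself
#include <string>

// Generates the temporary file name for filename, following the rules
// of the first example (hypothetical helper, for illustration only).
std::string make_tmp_name(const std::string& filename)
{
    const std::string suffix("tmp");

    // search period in file name
    std::string::size_type idx = filename.find('.');
    if (idx == std::string::npos) {
        // no period: append period and suffix
        return filename + '.' + suffix;
    }

    std::string tmpname = filename;
    if (filename.substr(idx+1) == suffix) {
        // extension is already tmp: replace it with xxx
        tmpname.replace(idx+1, std::string::npos, "xxx");
    }
    else {
        // replace any (possibly empty) extension with tmp
        tmpname.replace(idx+1, std::string::npos, suffix);
    }
    return tmpname;
}
```

Called with the five arguments of the sample run, the function yields the same five names that the program prints.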

A Second Example: Extracting Words and Printing Them Backward

The second example extracts single words from standard input and prints the characters of each word in reverse order. The words are separated by the usual whitespace characters (newline, space, and tab), and by commas, periods, or semicolons.

    //string/string2.cpp

    #include <iostream>
    #include <string>
    using namespace std;

    int main (int argc, char** argv)
    {

       const string delims(" \t,.;");
       string line;
       //for every line read successfully
       while (getline(cin,line)) {
           string::size_type begIdx, endIdx;

           //search beginning of the first word
           begIdx = line.find_first_not_of(delims);

           //while beginning of a word found
           while (begIdx != string::npos) {
               //search end of the actual word
               endIdx = line.find_first_of (delims, begIdx);
               if (endIdx == string::npos) {
                   //end of word is end of line
                   endIdx = line.length();
               }

               //print characters in reverse order
               for (int i=endIdx-1; i>=static_cast<int>(begIdx); --i) {
                   cout << line [i];
               }
               cout << ' ';

               //search beginning of the next word
               begIdx = line.find_first_not_of (delims, endIdx);
           }
           cout << endl;
       }
    }

In this program, all characters used as word separators are defined in a special string constant:

    const string delims(" \t,.;");

The newline is also used as a delimiter. However, no special processing is necessary for it because the program reads line-by-line.

The outer loop runs as long as a line can be read into the string line:

    string line;
    while (getline(cin,line)) {
        ...
    }

The function getline() is a special function to read input from streams into a string. It reads every character up to the next end-of-line, which by default is the newline character. The line delimiter itself is extracted but not appended. By passing your special line delimiter as an optional third character argument you can use getline() to read token-by-token, where the tokens are separated by that special delimiter.
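As a sketch of the token-by-token use, the following function reads from a string stream instead of cin so that it can be tried in isolation. The name split_fields() is an assumption of mine; only getline() with its third argument comes from the library.

```cpp
#include <cassert>   // only needed if you want to try the checks yourself
#include <sstream>
#include <string>
#include <vector>

// Splits text into the tokens that are separated by delim
// (hypothetical helper, for illustration only).
std::vector<std::string> split_fields(const std::string& text, char delim)
{
    std::vector<std::string> fields;
    std::istringstream in(text);
    std::string field;

    // each call extracts the characters up to the next delim;
    // the delimiter itself is extracted but not stored in field
    while (std::getline(in, field, delim)) {
        fields.push_back(field);
    }
    return fields;
}
```

Two adjacent delimiters produce an empty token. A delimiter at the very end of the input does not produce an additional empty token, because the next call of getline() finds nothing to extract and fails.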

Inside the outer loop, the individual words are searched and printed. The first statement

    begIdx = line.find_first_not_of(delims);

searches for the beginning of the first word. The find_first_not_of() function returns the first index of a character that is not part of the passed string argument. Thus, this function returns the position of the first character that is not one of the separators in delims. As usual for find functions, if no matching index is found, string::npos is returned.

The inner loop iterates as long as the beginning of a word can be found:

    while (begIdx != string::npos) {
        ...
    }

The first statement of the inner loop searches for the end of the current word:

    endIdx = line.find_first_of (delims, begIdx);

The find_first_of() function searches for the first occurrence of one of the characters passed as the first argument. In this case, an optional second argument is used that specifies where to start the search in the string. Thus, the first delimiter after the beginning of the word is searched.

If no such character is found, the end-of-line is used:

    if (endIdx == string::npos) {
        endIdx = line.length();
    }

Here, length() is used, which does the same thing as size(): It returns the number of characters.

In the next statement, all characters of the word are printed in reverse order:

    for (int i=endIdx-1; i>=static_cast<int>(begIdx); --i) {
        cout << line[i];
    }

Accessing a single character of the string is done with operator []. Note that this operator does not check whether the index of the string is valid. Thus, you have to ensure that the index is valid (as was done here). A safer way to access a character is to use the at() member function. However, such a check costs runtime, so the check is not provided for the usual accessing of characters of a string.

Another nasty problem results from using the index of the string. That is, if you omit the cast of begIdx to int, this program might run in an endless loop or might crash. Similar to the first example program, the problem is that string::size_type is an unsigned integral type. Without the cast, the signed value i is converted automatically into an unsigned value because it is compared with an unsigned type. In this case, the expression

    i>=begIdx

always yields true if the current word starts at the beginning of the line. This is because begIdx is then zero and any unsigned value is greater than or equal to zero. So, an endless loop results that might get stopped by a crash due to an illegal memory access.

For this reason, I really don't like the concept of string::size_type and string::npos. See Section 11.2.12, for a workaround that is safer (but not perfect).
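One way to avoid the cast altogether is to keep the loop variable unsigned and let it run one position ahead of the character that is accessed. This is only a sketch of that idea and is not the workaround of Section 11.2.12. The function name reversed_range() is made up for the illustration.

```cpp
#include <cassert>
#include <string>

// Returns the characters of line in the range [begIdx,endIdx)
// in reverse order (hypothetical helper, for illustration only).
std::string reversed_range(const std::string& line,
                           std::string::size_type begIdx,
                           std::string::size_type endIdx)
{
    assert(begIdx <= endIdx && endIdx <= line.size());

    std::string result;
    // i is always one greater than the index that is accessed, so the
    // condition i > begIdx becomes false even if begIdx is zero
    for (std::string::size_type i = endIdx; i > begIdx; --i) {
        result += line[i-1];
    }
    return result;
}
```

Because i never has to become "less than zero" to end the loop, the comparison of two unsigned values is harmless here.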

The last statement of the inner loop reinitializes begIdx to the beginning of the next word, if any:

    begIdx = line.find_first_not_of (delims, endIdx);

Unlike with the first call of find_first_not_of() in the example, here the end of the previous word is passed as the starting index for the search. If the previous word was the rest of the line, endIdx is the index of the end of the line. This simply means that the search starts from the end of the string, which returns string::npos.
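The interplay of find_first_not_of() and find_first_of() can also be used to collect the words instead of printing them. The following sketch uses the same loop structure as the example program. The function name split_words() and the use of a vector for the result are my own choices.

```cpp
#include <cassert>   // only needed if you want to try the checks yourself
#include <string>
#include <vector>

// Returns all words of line, where words are separated by any of the
// characters in delims (hypothetical helper, for illustration only).
std::vector<std::string> split_words(const std::string& line,
                                     const std::string& delims)
{
    std::vector<std::string> words;

    // search beginning of the first word
    std::string::size_type begIdx = line.find_first_not_of(delims);

    while (begIdx != std::string::npos) {
        // search end of the current word
        std::string::size_type endIdx = line.find_first_of(delims, begIdx);
        if (endIdx == std::string::npos) {
            endIdx = line.length();     // end of word is end of line
        }

        // second argument of substr() is a length, not an end index
        words.push_back(line.substr(begIdx, endIdx - begIdx));

        // search beginning of the next word
        begIdx = line.find_first_not_of(delims, endIdx);
    }
    return words;
}
```

A line that consists only of delimiters yields an empty vector, because the very first search already returns string::npos.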

Let's try this "useful and important" program. Here is some possible input:

    pots & pans
    I saw a reed

The output for this input is as follows:

    stop & snap
    I was a deer

I'd appreciate other examples of input for the next edition of this book.

Description of the String Classes

String Types

Header File

All types and functions for strings are defined in the header file <string>:

    #include <string>

As usual, it defines all identifiers in namespace std.

Template Class basic_string<>

Inside <string>, the type basic_string<> is defined as a basic template class for all string types:

    namespace std {
        template<class charT,
                 class traits = char_traits<charT>,
                 class Allocator = allocator<charT> >
        class basic_string;
    }

It is parameterized by the character type, the traits of the character type, and the memory model:

  • The first parameter is the data type of a single character.

  • The optional second parameter is a traits class, which provides all core operations for the characters of the string class. Such a traits class specifies how to copy or to compare characters (see Section 14.1.2, for details). If it is not specified, the default traits class according to the current character type is used. See Section 11.2.14, for a user-defined traits class that lets strings behave in a case-insensitive manner.

  • The third optional argument defines the memory model that is used by the string class. As usual, the default value is the default memory model allocator (see Section 3.4, and Chapter 15 for details).[3]

Types string and wstring

Two specializations of class basic_string<> are provided by the C++ standard library:

  1. string is the predefined specialization of that template for characters of type char:

            namespace std {
                typedef basic_string<char> string;
            }

  2. wstring is the predefined specialization of that template for characters of type wchar_t:

            namespace std {
                typedef basic_string<wchar_t> wstring;
            }

    Thus, you can use strings that use wider character sets, such as Unicode or some Asian character sets (see Chapter 14 for details about internationalization).

In the following sections no distinction is made between these different kinds of strings. The usage and the problems are the same because all string classes have the same interface. So, "string" means any string type, such as string and wstring. The examples in this book usually use type string because the European and Anglo-American environment is the common environment for software development.
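That string and wstring are nothing but names for two specializations of basic_string<> can be verified at compile time. The following sketch assumes a compiler that supports static_assert and <type_traits>, which are newer than the library described in this chapter. The function wide_length() is a made-up helper that shows the common interface for a wstring.

```cpp
#include <cassert>   // only needed if you want to try the checks yourself
#include <string>
#include <type_traits>

// string and wstring are the same types as the corresponding
// specializations of basic_string<>
static_assert(std::is_same<std::string, std::basic_string<char> >::value,
              "string is basic_string<char>");
static_assert(std::is_same<std::wstring, std::basic_string<wchar_t> >::value,
              "wstring is basic_string<wchar_t>");

// Returns the number of characters of a wide string
// (hypothetical helper, for illustration only).
std::wstring::size_type wide_length(const std::wstring& ws)
{
    return ws.length();     // same member function as for string
}
```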

Operation Overview

Table 11.1 lists all operations that are provided for strings.

Table 11.1. String Operations

Operation                          Effect
constructors                       Create or copy a string
destructor                         Destroys a string
=, assign()                        Assign a new value
swap()                             Swaps values between two strings
+=, append(), push_back()          Append characters
insert()                           Inserts characters
erase()                            Deletes characters
clear()                            Removes all characters (makes it empty)
resize()                           Changes the number of characters (deletes or appends characters at the end)
replace()                          Replaces characters
+                                  Concatenates strings
==, !=, <, <=, >, >=, compare()    Compare strings
size(), length()                   Return the number of characters
max_size()                         Returns the maximum possible number of characters
empty()                            Returns whether the string is empty
capacity()                         Returns the number of characters that can be held without reallocation
reserve()                          Reserves memory for a certain number of characters
[], at()                           Access a character
>>, getline()                      Read the value from a stream
<<                                 Writes the value to a stream
copy()                             Copies or writes the contents to a C-string
c_str()                            Returns the value as C-string
data()                             Returns the value as character array
substr()                           Returns a certain substring
find functions                     Search for a certain substring or character
begin(), end()                     Provide normal iterator support
rbegin(), rend()                   Provide reverse iterator support
get_allocator()                    Returns the allocator

String Operation Arguments

Many operations are provided to manipulate strings. In particular, the operations that manipulate the value of a string have several overloaded versions that specify the new value with one, two, or three arguments. All these operations use the argument scheme of Table 11.2.

Table 11.2. Scheme of String Operation Arguments

Arguments                                           Interpretation
const string & str                                  The whole string str
const string & str, size_type idx, size_type num    At most, the first num characters of str starting with index idx
const char* cstr                                    The whole C-string cstr
const char* chars, size_type len                    len characters of the character array chars
char c                                              The character c
size_type num, char c                               num occurrences of the character c
iterator beg, iterator end                          All characters in the range [beg,end)

Note that only the single-argument version const char* handles the character '\0' as a special character that terminates the string. In all other cases '\0' is not a special character:

    std::string s1("nico");        //initializes s1 with: 'n' 'i' 'c' 'o'
    std::string s2("nico",5);      //initializes s2 with: 'n' 'i' 'c' 'o' '\0'
    std::string s3(5,'\0');        //initializes s3 with: '\0' '\0' '\0' '\0' '\0'

    s1.length()                     //yields 4
    s2.length()                     //yields 5
    s3.length()                     //yields 5

Thus, in general a string might contain any character. In particular, a string might contain the contents of a binary file.
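The following sketch makes the difference between the two char* versions tangible. The helper names from_buffer() and count_nul() are assumptions of mine. The first passes an explicit length so that embedded '\0' characters survive, and the second counts them.

```cpp
#include <cassert>   // only needed if you want to try the checks yourself
#include <cstddef>
#include <string>

// Creates a string from len characters of chars. Because the length is
// passed explicitly, '\0' is treated like any other character.
// (hypothetical helper, for illustration only)
std::string from_buffer(const char* chars, std::string::size_type len)
{
    return std::string(chars, len);
}

// Counts the '\0' characters that are part of the value of s.
std::size_t count_nul(const std::string& s)
{
    std::size_t n = 0;
    for (std::string::size_type i = 0; i < s.length(); ++i) {
        if (s[i] == '\0') {
            ++n;
        }
    }
    return n;
}
```

For the character array "a\0b" and a length of 3, the resulting string has three characters, one of which is '\0'. Initializing a string with the same literal but without a length stops at the first '\0' and yields a string with only one character.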

See Table 11.3 for an overview of which operation uses which kind of arguments. All operators can only handle objects as single values. Therefore, to assign, compare, or append a part of a string or C-string, you must use the function that has the corresponding name.
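As an illustration of using the named functions for parts of strings, the following sketch assigns and compares substrings. The helper names middle_of() and starts_with() are hypothetical. The member functions assign() and compare() are used with the argument schemes of Table 11.2.

```cpp
#include <cassert>   // only needed if you want to try the checks yourself
#include <string>

// Returns at most num characters of str starting with index idx.
// operator = could only assign the whole string, so assign() is used.
// (hypothetical helper, for illustration only)
std::string middle_of(const std::string& str,
                      std::string::size_type idx,
                      std::string::size_type num)
{
    std::string s;
    s.assign(str, idx, num);
    return s;
}

// Returns true if s begins with prefix. operator == could only compare
// whole strings, so compare() is used for the leading part of s.
bool starts_with(const std::string& s, const std::string& prefix)
{
    return s.compare(0, prefix.length(), prefix) == 0;
}
```

In both functions the number of characters may exceed the remaining characters. According to the rules for length arguments, all remaining characters are used then.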

Operations that Are Not Provided

The string classes of the C++ standard library do not solve every possible string problem. In fact, they do not provide direct solutions for

  • Regular expressions

  • Text processing (capitalization, case-insensitive comparisons)

Text processing, however, is not a big problem. See Section 11.2.13, for some examples.

Table 11.3. Available Operations that Have String Parameters

Operation                    Full String  Part of String  C-string (char*)  char Array  Single char  num chars  Iterator Range
constructors                 Yes          Yes             Yes               Yes         -            Yes        Yes
=                            Yes          -               Yes               -           Yes          -          -
assign()                     Yes          Yes             Yes               Yes         -            Yes        Yes
+=                           Yes          -               Yes               -           Yes          -          -
append()                     Yes          Yes             Yes               Yes         -            Yes        Yes
push_back()                  -            -               -                 -           Yes          -          -
insert(), index version      Yes          Yes             Yes               Yes         -            Yes        -
insert(), iterator version   -            -               -                 -           Yes          Yes        Yes
replace(), index version     Yes          Yes             Yes               Yes         Yes          Yes        -
replace(), iterator version  Yes          -               Yes               Yes         -            Yes        Yes
find functions               Yes          -               Yes               Yes         Yes          -          -
+                            Yes          -               Yes               -           Yes          -          -
==, !=, <, <=, >, >=         Yes          -               Yes               -           -            -          -
compare()                    Yes          Yes             Yes               Yes         -            -          -

Constructors and Destructors

Table 11.4 lists all constructors and destructors for strings. These are described in this section. The initialization by a range that is specified by iterators is described in Section 11.2.13.

Table 11.4. Constructors and Destructor of Strings

Expression                       Effect
string s                         Creates the empty string s
string s(str)                    Creates a string as a copy of the existing string str
string s(str, stridx)            Creates a string s that is initialized by the characters of string str starting with index stridx
string s(str, stridx, strlen)    Creates a string s that is initialized by, at most, strlen characters of string str starting with index stridx
string s(cstr)                   Creates a string s that is initialized by the C-string cstr
string s(chars, chars_len)       Creates a string s that is initialized by chars_len characters of the character array chars
string s(num, c)                 Creates a string that has num occurrences of character c
string s(beg, end)               Creates a string that is initialized by all characters of the range [beg,end)
s.~string()                      Destroys all characters and frees the memory

You can't initialize a string with a single character. Instead, you must use its address or an additional number of occurrences:

    std::string s('x');       //ERROR
    std::string s(1,'x');     //OK, creates a string that has one character 'x'

This means that there is an automatic type conversion from type const char* but not from type char to type string.

Strings and C-Strings

In standard C++ the type of string literals was changed from char* to const char*. However, to provide backward compatibility there is an implicit but deprecated conversion to char* for them. Because string literals don't have type string, there is a strong relationship between "new" string class objects and ordinary C-strings: You can use ordinary C-strings in almost every situation where strings are combined with other string-like objects (comparing, appending, inserting, etc.). In particular, there is an automatic type conversion from const char* into strings. However, there is no automatic type conversion from a string object to a C-string. This is for safety reasons to prevent unintended type conversions that result in strange behavior (type char* often has strange behavior) and ambiguities (for example, in an expression that combines a string and a C-string it would be possible to convert string into char* and vice versa). Instead, there are several ways to create or write/copy in a C-string. In particular, c_str() is provided to generate the value of a string as a C-string (as a character array that has '\0' as its last character). By using copy(), you can copy or write the value to an existing C-string or character array.

Note that strings do not provide a special meaning for the character '\0', which is used as special character in an ordinary C-string to mark the end of the string. The character '\0' may be part of a string just like every other character.

Note also that you must not use a null pointer (NULL) instead of a char* parameter. Doing so results in strange behavior. This is because NULL has an integral type and is interpreted as the number zero or the character with value 0 if the operation is overloaded for a single integral type.

There are three possible ways to convert the contents of the string into a raw array of characters or C-string:

  1. data()

    Returns the contents of the string as an array of characters. Note that the returned array is not a valid C-string because no '\0' character gets appended.

  2. c_str()

    Returns the contents of the string as a C-string. Thus, the '\0' character is appended.

  3. copy()

    Copies the contents of the string into a character array provided by the caller. A '\0' character is not appended.

Note that data() and c_str() return an array that is owned by the string. Thus, the caller must not modify or free the memory. For example:

    std::string s("12345");


    atoi(s.c_str())               //convert string into integer
    f(s.data(), s.length())       //call function for a character array
                                  //and the number of characters


    char buffer [100];
    s.copy (buffer, 100);         //copy at most 100 characters of s into buffer
    s.copy (buffer, 100, 2);      //copy at most 100 characters of s into buffer
                                  //starting with the third character of s

You usually should use strings in the whole program and convert them into C-strings or character arrays only immediately before you need the contents as type char*. Note that the return value of c_str() and data() is valid only until the next call of a nonconstant member function for the same string:

    std::string s;

    ...
    foo (s.c_str());       //s.c_str() is valid during the whole statement


    const char* p;
    p = s.c_str();         //p refers to the contents of s as a C-string
    foo (p);               //OK (p is still valid)
    s += " ext";           //invalidates p
    foo (p);               //ERROR: argument p is not valid
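If a C-string is needed that stays valid independently of the string, the contents can be copied into a buffer owned by the caller. Because copy() does not append '\0', the terminator has to be written explicitly. The following is a minimal sketch. The function name to_cstring() and the decision to truncate silently are my own assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Copies the contents of s as a C-string into buffer, which has room
// for bufsize characters. The value is truncated if it does not fit.
// Returns the number of characters copied (not counting '\0').
// (hypothetical helper, for illustration only)
std::size_t to_cstring(const std::string& s, char* buffer, std::size_t bufsize)
{
    assert(buffer != 0 && bufsize > 0);

    // leave room for the terminator
    std::string::size_type n = s.copy(buffer, bufsize - 1);

    // copy() does not append '\0', so do it here
    buffer[n] = '\0';
    return n;
}
```

The buffer remains usable after the string is modified or destroyed, which is not the case for the return values of c_str() and data().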

Size and Capacity

To use strings effectively and correctly you need to understand how the size and capacity of strings cooperate. For strings, three "sizes" exist:

  1. size() and length()

    Return the current number of characters of the string. Both functions are equivalent.[4]

    The empty() member function is a shortcut for checking whether the number of characters is zero. Thus, it checks whether the string is empty. You should use it instead of length() or size() because it might be faster.

  2. max_size()

    Returns the maximum number of characters that a string may contain. A string typically contains all characters in a single block of memory, so there might be relevant restrictions on PCs. Otherwise, this value usually is the maximum value of the type of the index less one. It is "less one" for two reasons: (a) The maximum value itself is npos and (b) an implementation might append '\0' internally at the end of the internal buffer so that it simply returns that buffer when the string is used as a C-string (for example, by c_str()). Whenever an operation results in a string that has a length greater than max_size(), the class throws length_error.

  3. capacity()

    Returns the number of characters that a string could contain without having to reallocate its internal memory.

Having sufficient capacity is important for two reasons:

  1. Reallocation invalidates all references, pointers, and iterators that refer to characters of the string.

  2. Reallocation takes time.

Thus, the capacity must be taken into account if a program uses pointers, references, or iterators that refer to a string or to characters of a string, or if speed is a goal.

The member function reserve() is provided to avoid reallocations. reserve() lets you reserve a certain capacity before you really need it to ensure that references are valid as long as the capacity is not exceeded:

    std::string s;      //create empty string
    s.reserve(80);      //reserve memory for 80 characters

The concept of capacity for strings is, in principle, the same as for vector containers (see Section 6.2.1); however, there is one big difference: Unlike vectors, you can call reserve() for strings to shrink the capacity. Calling reserve() with an argument that is less than the current capacity is, in effect, a nonbinding shrink request. If the argument is less than the current number of characters, it is a nonbinding shrink-to-fit request. Thus, although you might want to shrink the capacity, it is not guaranteed to happen. The default value of reserve() for string is 0. So, a call of reserve() without any argument is always a nonbinding shrink-to-fit request:

    s.reserve();        //"would like to shrink capacity to fit the current size"

The call to shrink capacity is nonbinding because how to reach an optimal performance is implementation-defined. Implementations of the string class might have different design approaches with respect to speed and memory usage. Therefore, implementations might increase capacity in larger steps and might never shrink the capacity.

The standard, however, specifies that capacity may shrink only because of a call of reserve(). Thus, it is guaranteed that references, pointers, and iterators remain valid even when characters are deleted or changed, provided they refer to characters that have a position that is before the manipulated characters.
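A typical use of reserve() is to request the memory once before a string is built up character by character. The following sketch shows this pattern. The function name repeat_char() is made up, and the same result could of course be achieved directly with the constructor that takes a number and a character.

```cpp
#include <cassert>   // only needed if you want to try the checks yourself
#include <string>

// Returns a string that consists of num occurrences of c and that is
// built up step by step (hypothetical helper, for illustration only).
std::string repeat_char(char c, std::string::size_type num)
{
    std::string s;
    s.reserve(num);     // request memory for num characters in advance

    for (std::string::size_type i = 0; i < num; ++i) {
        s.push_back(c); // should not need to reallocate
    }
    return s;
}
```

After the call of reserve(), capacity() is at least the requested value. It might be larger, because implementations are free to allocate memory in larger steps.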

Element Access

A string allows you to have read or write access to the characters it contains. You can access a single character via either of two methods: the subscript operator [] and the at() member function. Both return the character at the position of the passed index. As usual, the first character has index 0 and the last character has index length()-1. However, note the following differences:

  • Operator [] does not check whether the index passed as an argument is valid; at() does. If at() is called with an invalid index, it throws an out_of_range exception. If operator [] is called with an invalid index, the behavior is undefined. The effect might be an illegal memory access that might then cause some nasty side effects or a crash (you're lucky if the result is a crash, because then you know that you did something wrong).

  • For the constant version of operator [], the position after the last character is valid. In this case, the current number of characters is a valid index. The operator returns the value that is generated by the default constructor of the character type. Thus, for objects of type string it returns the char '\0'.

    In all other cases (for the nonconstant version of operator [] and for the at() member function), the current number of characters is an invalid index. Using it might cause an exception or result in undefined behavior.

For example:

    const std::string cs("nico");      //cs contains: 'n' 'i' 'c' 'o'
    std::string s("abcde");            //s contains: 'a' 'b' 'c' 'd' 'e'


    s[2]                               //yields 'c'
    s.at(2)                            //yields 'c'


    s[100]                             //ERROR: undefined behavior
    s.at(100)                          //throws out_of_range


    s[s.length()]                      //ERROR: undefined behavior
    cs[cs.length()]                    //yields '\0'
    s.at(s.length())                   //throws out_of_range
    cs.at(cs.length())                 //throws out_of_range
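
Because at() reports an invalid index by throwing an exception, you can recover from the error. The following sketch wraps that check in a small function. The name char_or() and its fallback parameter are assumptions made for this illustration:

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: returns the character at index idx, or fallback
// if idx is not a valid index. at() performs the range check and throws
// std::out_of_range on failure.
char char_or(const std::string& s, std::size_t idx, char fallback)
{
    try {
        return s.at(idx);
    }
    catch (const std::out_of_range&) {
        return fallback;
    }
}
```

For example, char_or("abcde",2,'?') yields 'c', whereas char_or("abcde",100,'?') yields '?'. The same approach is not possible with operator [] because an invalid index results in undefined behavior rather than an exception.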

To enable you to modify a character of a string, the nonconstant versions of [] and at() return a character reference. Note that this reference becomes invalid on reallocation:

    std::string s("abcde");        //s contains: 'a' 'b' 'c' 'd' 'e'


    char& r = s[2];                //reference to third character
    char* p = &s[3];                //pointer to fourth character


    r = 'X';                       //OK, s contains: 'a' 'b' 'X' 'd' 'e'
    *p = 'Y';                      //OK, s contains: 'a' 'b' 'X' 'Y' 'e'


    s = "new long value";          //reallocation invalidates r and p


    r = 'X';                       //ERROR: undefined behavior
    *p = 'Y';                      //ERROR: undefined behavior

Here, to avoid runtime errors, you would have had to reserve() enough capacity before r and p were initialized.

References and pointers that refer to characters of a string may be invalidated by the following operations:

  • If the value is swapped with swap()

  • If a new value is read by operator>>() or getline()

  • If the contents are exported by data() or c_str()

  • If any nonconstant member function is called, except operator [], at(), begin(), rbegin(), end(), or rend()

  • If any of these functions is followed by operator [], at(), begin(), rbegin(), end(), or rend()

The same applies to iterators (see Section 11.2.13).

Comparisons

The usual comparison operators are provided for strings. The operands may be strings or C-strings:

    std::string s1, s2;
    ...


    s1 == s2       //returns true if s1 and s2 contain the same characters
    s1 < "hello"   //returns whether s1 is less than the C-string "hello"

If strings are compared by <, <=, >, or >=, their characters are compared lexicographically according to the current character traits. For example, all of the following comparisons yield true:

    std::string("aaaa") < std::string("bbbb")
    std::string("aaaa") < std::string("abba")
    std::string("aaaa") < std::string("aaaaaa")

By using the compare() member functions you can compare substrings. The compare() member functions can process more than one argument for each string so that you can specify a substring by its index and by its length. Note that compare() returns an integral value rather than a Boolean value. This return value has the following meaning: 0 means equal, a value less than zero means less than, and a value greater than zero means greater than. For example:

    std::string s("abcd");


    s.compare("abcd")          //returns 0
    s.compare("dcba")          //returns a value < 0 (s is less)
    s.compare("ab")            //returns a value > 0 (s is greater)


    s.compare(s)               //returns 0 (s is equal to s)
    s.compare(0,2,s,2,2)       //returns a value < 0 ("ab" is less than "cd")
    s.compare(1,2,"bcx",2)     //returns 0 ("bc" is equal to "bc")

To use a different comparison criterion you can define your own comparison criterion and use STL comparison algorithms (see Section 11.2.13, for an example), or you can use special character traits that make comparisons on a case-insensitive basis. However, because a string type that has a special traits class is a different data type, you cannot combine or process these strings with objects of type string. See Section 11.2.14, for an example.

In programs for the international market it might be necessary to compare strings according to a specific locale. Class locale provides the parenthesis operator as a convenient way to do this (see page 703). It uses the string collation facet, which is provided to compare strings for sorting according to some locale conventions. See Section 14.4.5, for details.
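
Because a locale object provides the parenthesis operator for two strings, it can be passed directly as a sorting criterion. The following is a minimal sketch; the function name sort_by_locale() is hypothetical:

```cpp
#include <algorithm>
#include <locale>
#include <string>
#include <vector>

// Hypothetical helper: sorts the strings according to the collation
// rules of loc. std::locale::operator() compares two strings by using
// the collate facet of the locale, so the locale object itself serves
// as the sorting criterion.
void sort_by_locale(std::vector<std::string>& coll, const std::locale& loc)
{
    std::sort(coll.begin(), coll.end(), loc);
}
```

With the classic "C" locale (std::locale::classic()), the collation is expected to follow the character codes, so uppercase letters come before lowercase letters. Other locales may sort differently; which named locales are available depends on the platform.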

Modifiers

You can modify strings by using different member functions and operators.

Assignments

To modify a string you can use operator = to assign a new value. The new value may be a string, a C-string, or a single character. In addition, you can use the assign() member functions to assign strings when more than one argument is needed to describe the new value. For example:

    const std::string aString("othello");
    std::string s;


    s = aString;                //assign "othello"
    s = "two\nlines";           //assign a C-string
    s = ' ';                    //assign a single character


    s.assign(aString);          //assign "othello" (equivalent to operator =)
    s.assign(aString,1,3);      //assign "the"
    s.assign(aString,2,std::string::npos);  //assign "hello"


    s.assign("two\nlines");     //assign a C-string (equivalent to operator =)
    s.assign("nico",5);         //assign the character array: 'n' 'i' 'c' 'o' '\0'
    s.assign(5,'x');            //assign five characters: 'x' 'x' 'x' 'x' 'x'

You also can assign a range of characters that is defined by two iterators. See Section 11.2.13, for details.

Swapping Values

As with many nontrivial types, the string type provides a specialization of the swap() function, which swaps the contents of two strings (the global swap() function was introduced in Section 4.4.2). The specialization of swap() for strings guarantees constant complexity. So you should use it to swap the value of strings and to assign strings if you don't need the assigned string after the assignment.
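
As a small sketch of this idiom, the following function hands over the value of one string to another without copying the characters. The name take_value() is an assumption made for this illustration:

```cpp
#include <string>

// Hypothetical helper: gives dest the value of src by swapping.
// Afterward, src holds the former value of dest, so this is only
// appropriate if the caller no longer needs the value of src.
void take_value(std::string& dest, std::string& src)
{
    dest.swap(src);     // std::swap(dest,src) has the same effect
}
```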

Making Strings Empty

To remove all characters in a string, you have several possibilities. For example:

    std::string s;


    s = "";          // assign the empty string
    s.clear();       // clear contents
    s.erase();       // erase all characters

Inserting and Removing Characters

There are a lot of member functions to insert, remove, replace, and erase characters of a string. To append characters, you can use operator +=, append(), and push_back(). For example:

    const std::string aString("othello");
    std::string s;


    s += aString;            //append "othello"
    s += "two\nlines";       //append C-string
    s += '\n';               //append single character


    s.append(aString);       //append "othello" (equivalent to operator +=)
    s.append(aString,1,3);   //append "the"
    s.append(aString,2,std::string::npos);    //append "hello"


    s.append("two\nlines");  //append C-string (equivalent to operator +=)
    s.append("nico",5);      //append character array: 'n' 'i' 'c' 'o' '\0'
    s.append(5,'x');         //append five characters: 'x' 'x' 'x' 'x' 'x'


    s.push_back('\n');       //append single character (equivalent to operator +=)

Operator += appends single-argument values. append() lets you specify the appended value by using multiple arguments. One additional version of append() lets you append a range of characters specified by two iterators (see Section 11.2.13). The push_back() member function is provided for back inserters so that STL algorithms are able to append characters to a string (see Section 7.4.2, for details about back inserters and Section 11.2.13, for an example of their use with strings).

Similar to append(), several insert() member functions enable you to insert characters. They require the index of the character in front of which the new characters are inserted (passing the current number of characters as the index appends them):

    const std::string aString("age");
    std::string s("p");


    s.insert(1,aString);        //s: page
    s.insert(1, "ersifl");      //s: persiflage

Note that no insert() member function is provided to pass the index and a single character. Thus you must pass a string or an additional number:

    s.insert(0,' ');     //ERROR
    s.insert(0," ");     //OK

You might also try

    s.insert(0,1,' ');    //ERROR: ambiguous

However, this results in a nasty ambiguity because insert() is overloaded for the following signatures:

    insert (size_type idx, size_type num, charT c); //position is index
    insert (iterator  pos, size_type num, charT c); //position is iterator

For type string, size_type is usually defined as unsigned and iterator is often defined as char*. In this case, the first argument 0 has two equivalent conversions. So, to get the correct behavior you have to write:

    s.insert((std::string::size_type)0,1,' ');  //OK

The second interpretation of the ambiguity described here is an example of the use of iterators to insert characters. If you wish to specify the insert position as an iterator, you can do it in three ways: insert a single character, insert a certain number of the same character, and insert a range of characters specified by two iterators (see Section 11.2.13).
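
The three iterator-based forms of insert() can be sketched as follows. The function name build_balloon() and the values used are chosen only for this illustration:

```cpp
#include <string>

// Hypothetical example: builds the string "balloon" by using only the
// versions of insert() that take an iterator as the insert position.
std::string build_balloon()
{
    std::string s("bn");
    const std::string oo("oo");

    s.insert(s.begin() + 1, 'a');                   // single character:   "ban"
    s.insert(s.begin() + 2, 2, 'l');                // two times 'l':      "balln"
    s.insert(s.begin() + 4, oo.begin(), oo.end());  // range of characters: "balloon"
    return s;
}
```

Note that the position is recomputed from begin() for each call because an insertion may invalidate iterators that were obtained earlier.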

Similar to append() and insert(), several erase() functions remove characters, and several replace() functions replace characters. For example:

    std::string s = "i18n";                     //s: i18n
    s.replace(1,2, "nternationalizatio");       //s: internationalization
    s.erase(13);                                //s: international
    s.erase(7,5);                               //s: internal
    s.replace(0,2, "ex");                       //s: external

resize() lets you change the number of characters. If the new size that is passed as an argument is less than the current number of characters, characters are removed from the end. If the new size is greater than the current number of characters, characters are appended at the end. You can pass the character that is appended if the size of the string grows. If you don't, the default constructor for the character type is used (which is the '\0' character for type char).
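
A minimal sketch of both directions of resize() follows. The helper name fit_to_width() is hypothetical:

```cpp
#include <string>

// Hypothetical helper: returns a copy of s that has exactly n characters.
// If s is longer, characters are removed from the end; if it is shorter,
// the character fill is appended as often as necessary.
std::string fit_to_width(std::string s, std::string::size_type n, char fill)
{
    s.resize(n, fill);
    return s;
}
```

For example, fit_to_width("abcde",3,'.') yields "abc" and fit_to_width("abc",5,'.') yields "abc..". If resize() is called without the second argument, the appended characters are '\0' characters, which count as ordinary characters of the string.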

Substrings and String Concatenation

You can extract a substring from any string by using the substr() member function. For example:

    std::string s("interchangeability");


    s.substr()                      //returns a copy of s
    s.substr(11)                    //returns string("ability")
    s.substr(5,6)                   //returns string ("change")
    s.substr(s.find('c'))           //returns string ("changeability")

You can concatenate two strings or C-strings, or one of those with single characters by using operator +. For example, the statements

    std::string s1("enter");
    std::string s2("nation");
    std::string i18n;


    i18n = 'i' + s1.substr(1) + s2 + "aliz" + s2.substr(1);
    std::cout << "i18n means: " + i18n << std::endl;

have the following output:

    i18n means: internationalization

Input/Output Operators

The usual I/O operators are defined for strings:

  • Operator >> reads a string from an input stream.

  • Operator << writes a string to an output stream.

These operators behave as they do for ordinary C-strings. In particular, operator >> operates as follows:

  1. It skips leading whitespaces if the skipws flag (see Section 13.7.7, page 625) is set.

  2. It reads all characters until any of the following happens:

    • The next character is a whitespace

    • The stream is no longer in a good state (for example due to end-of-file)

    • The current width() of the stream (see Section 13.7.3) is greater than zero and width() characters are read

    • max_size() characters are read

  3. It sets width() of the stream to 0.

Thus, in general, the input operator reads the next word while skipping leading whitespaces. A whitespace is any character for which isspace(c,strm.getloc()) is true (isspace() is explained in Section 14.4.4).

The output operator also takes the width() of the stream into consideration. That is, if width() is greater than 0, operator << writes at least width() characters.
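
The effect of width() on both operators can be sketched with string streams. The function names read_word() and padded() are assumptions made for this illustration:

```cpp
#include <iomanip>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: reads the next word from strm, but at most
// maxlen characters of it (maxlen 0 means "no limit").
std::string read_word(std::istream& strm, int maxlen)
{
    std::string word;
    strm >> std::setw(maxlen) >> word;   // operator >> resets width() to 0
    return word;
}

// Hypothetical helper: writes s into a field of at least width
// characters and returns the result as a string.
std::string padded(const std::string& s, int width)
{
    std::ostringstream out;
    out << std::setw(width) << s;        // pads with the fill character
    return out.str();
}
```

If the input is "  hello world", read_word() with a limit of 0 yields "hello", a following call with a limit of 3 yields "wor", and the next call with a limit of 0 yields the remaining "ld". With the default settings of a stream, padded("abc",8) yields "abc" preceded by five spaces, whereas a string that is longer than the field width is written in full.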

The string classes also provide a special function in namespace std for reading line-by-line: std::getline(). This reads all characters (including leading whitespaces) until the line delimiter or end-of-file is reached. The line delimiter is extracted but not appended. By default, the line delimiter is the newline character, but you can pass your own "line" delimiter as an optional argument[5]. This way, you can read token-by-token separated by any arbitrary character:

    std::string s;


    while (getline(std::cin,s)) {       //for each line read from cin
        ...

    }


    while (getline(std::cin,s,':')) {   //for each token separated by ':'
        ...

    }

Note that if you read token-by-token, the newline character is not a special character. In this case, the tokens might contain a newline character.
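
The following sketch collects such tokens in a vector, so the effect of the delimiter can be inspected. The function name read_tokens() is hypothetical:

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads strm until end-of-file and returns all
// tokens that are separated by the character delim.
std::vector<std::string> read_tokens(std::istream& strm, char delim)
{
    std::vector<std::string> tokens;
    std::string token;
    while (std::getline(strm, token, delim)) {   // delimiter is extracted but not stored
        tokens.push_back(token);
    }
    return tokens;
}
```

For the input "usr:local\nbin::etc" and the delimiter ':', this yields four tokens: "usr", "local\nbin" (including the newline character), an empty token for the two adjacent delimiters, and "etc".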

Searching and Finding

Strings provide a lot of functions to search and find characters or substrings.[6] You can search

  • A single character, a character sequence (substring), or one of a certain set of characters

  • Forward and backward

  • Starting from any position at the beginning or inside the string

In addition, all search algorithms of the STL can be called when iterators are used.

All search functions have the word find inside their name. They try to find a character position given a value that is passed as an argument. How the search proceeds depends on the exact name of the find function. Table 11.5 lists all of the search functions for strings.

Table 11.5. Search Functions for Strings

String Function       Effect
find()                Finds the first occurrence of value
rfind()               Finds the last occurrence of value (reverse find)
find_first_of()       Finds the first character that is part of value
find_last_of()        Finds the last character that is part of value
find_first_not_of()   Finds the first character that is not part of value
find_last_not_of()    Finds the last character that is not part of value

All search functions return the index of the first character of the character sequence that matches the search. If the search fails, they return npos. The search functions use the following argument scheme:

  • The first argument is always the value that is searched.

  • The second optional value indicates an index at which to start the search in the string.

  • The optional third argument is the number of characters of the value to search.

Unfortunately, this argument scheme differs from that of the other string functions. With the other string functions, the starting index is the first argument, and the value and its length are adjacent arguments. In particular, each search function is overloaded with the following set of arguments:

  • const string& value

    The function searches against the characters of the string value.

  • const string& value, size_type idx

    The function searches against the characters of value, starting with index idx in *this.

  • const char* value

    The function searches against the characters of the C-string value.

  • const char* value, size_type idx

    The function searches against the characters of the C-string value, starting with index idx in *this.

  • const char* value, size_type idx, size_type value_len

    The function searches against the value_len characters of the character array value, starting with index idx in *this. Thus, the null character ('\0') has no special meaning here inside value.

  • const char value

    The function searches against the character value.

  • const char value, size_type idx

    The function searches against the character value, starting with index idx in *this.

For example:

    std::string s("Hi Bill, I'm ill, so please pay the bill");


    s.find("il")                    //returns 4 (first substring "il")
    s.find("il",10)                 //returns 13 (first substring "il" starting from s[10])
    s.rfind("il")                   //returns 37 (last substring "il")
    s.find_first_of("il")           //returns 1 (first char 'i' or 'l')
    s.find_last_of("il")            //returns 39 (last char 'i' or 'l')
    s.find_first_not_of("il")       //returns 0 (first char neither 'i' nor 'l')
    s.find_last_not_of("il")        //returns 36 (last char neither 'i' nor 'l')
    s.find("hi")                    //returns npos
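
The optional start index makes it straightforward to visit all matches instead of only the first one. The following is a minimal sketch; the function name find_all() is an assumption made for this illustration:

```cpp
#include <string>
#include <vector>

// Hypothetical helper: returns the indexes of all (possibly overlapping)
// occurrences of sub in s. An empty sub yields no matches here, which is
// a simplification chosen for this sketch.
std::vector<std::string::size_type>
find_all(const std::string& s, const std::string& sub)
{
    std::vector<std::string::size_type> result;
    if (sub.empty()) {
        return result;
    }
    std::string::size_type idx = s.find(sub);
    while (idx != std::string::npos) {
        result.push_back(idx);
        idx = s.find(sub, idx + 1);     // continue one position after the last match
    }
    return result;
}
```

For the string "Hi Bill, I'm ill, so please pay the bill" and the value "il", the function returns the indexes 4, 13, and 37.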

You could also use STL algorithms to find characters or substrings in strings. They allow you to use your own comparison criterion (see Section 11.2.13, for an example). However, note that the naming scheme of the STL search algorithms differs from the naming scheme for string search functions (see Section 9.2.2, for details).

The Value npos

If a search function fails, it returns string::npos. Consider the following example:

    std::string s;
    std::string::size_type idx;         //be careful: don't use any other type!
    ...


    idx = s.find("substring");
    if (idx == std::string::npos) {
       ...
    }

The condition of the if statement yields true if and only if "substring" is not part of string s.

Be very careful when using the string value npos and its type. When you want to check the return value, always use string::size_type and not int or unsigned for the type of the return value; otherwise, the comparison of the return value with string::npos might not work.

This behavior is the result of the design decision that npos is defined as -1:

    namespace std {
        template<class charT,
                 class traits = char_traits<charT>,
                 class Allocator = allocator<charT> >
        class basic_string {
          public:
                typedef typename Allocator::size_type size_type;
                ...
                static const size_type npos = -1;
                ...
        };
    }

Unfortunately, size_type (which is defined by the allocator of the string) must be an unsigned integral type. The default allocator, allocator, uses type size_t as size_type (see Section 15.3). Because -1 is converted into an unsigned integral type, npos is the maximum unsigned value of its type. However, the exact value depends on the exact definition of type size_type. Unfortunately, these maximum values differ. In fact, (unsigned long)-1 differs from (unsigned short)-1 (provided the size of the types differ). Thus, the comparison

    idx == std::string::npos

might yield false, if idx has the value -1 and idx and string::npos have different types:

    std::string s;

    ...
    int idx = s.find("not found");     //assume it returns npos
    if (idx == std::string::npos) {    //ERROR: comparison might not work
        ...
    }

One way to avoid this error is to check whether the search fails directly:

    if (s.find("hi") == std::string::npos) {
        ...
    }

However, often you need the index of the matching character position. Thus, another simple solution is to define your own signed value for npos:

    const int NPOS = -1;

Now the comparison looks a bit different (and even more convenient):

    if (idx == NPOS) {     //works almost always
        ...
    }

Unfortunately, this solution is not perfect because the comparison fails if either idx has type unsigned short or the index is greater than the maximum value of int (because of these problems the standard did not define it that way). However, because both might happen very rarely, the solution works in most situations. To write portable code, however, you should always use string::size_type for any index of your string type. For a perfect solution you'd need some overloaded functions that consider the exact type of string::size_type. I hope the standard will provide a better solution in the future.
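
The following sketch makes the type problem observable. Both function names are hypothetical, and the outcome of the first function depends on the platform, which is exactly the portability problem described here:

```cpp
#include <string>

// Hypothetical check: stores the result of a failed search in a variable
// of type unsigned and reports whether it still compares equal to npos.
// If unsigned is narrower than size_type, the value is truncated on
// assignment and the comparison yields false.
bool npos_survives_unsigned()
{
    const std::string s("hello");
    unsigned idx = s.find("xyz");        // npos may be truncated here
    return idx == std::string::npos;     // idx is widened without the lost bits
}

// The portable variant keeps the exact type of the return value.
bool npos_survives_size_type()
{
    const std::string s("hello");
    std::string::size_type idx = s.find("xyz");
    return idx == std::string::npos;
}
```

On platforms where unsigned has 32 bits and size_type has 64 bits, npos_survives_unsigned() yields false although the search failed. npos_survives_size_type() yields true on every platform.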

Iterator Support for Strings

A string is an ordered collection of characters. As a consequence, the C++ standard library provides an interface for strings that lets you use strings as STL containers.[7]

In particular, you can call the usual member functions to get iterators that iterate over the characters of a string. If you are not familiar with iterators, consider them as something that can refer to a single character inside a string, just as ordinary pointers do for C-strings. By using these objects, you can iterate over all characters of a string by calling several algorithms that either are provided by the C++ standard library or that are user defined. For example, you can sort the characters of a string, reverse the order, or find the character that has the maximum value.

String iterators are random access iterators. This means that they provide random access and that you can use all algorithms (see Section 5.3.2, and Section 7.2, for a discussion about iterator categories). As usual, the types of string iterators (iterator, const_iterator, and so on) are defined by the string class itself. The exact type is implementation defined, but usually string iterators are defined simply as ordinary pointers. See Section 7.2.6, for a discussion of a nasty difference between iterators that are implemented as pointers and iterators that are implemented as classes.

Iterators are invalidated when reallocation occurs or when certain changes are made to the values to which they refer. See Section 11.2.6, for details.

Iterator Functions for Strings

Table 11.6 shows all of the member functions that strings provide for iterators. As usual, the range specified by beg and end is a half-open range that includes beg but excludes end (often written as [beg,end), see Section 5.3).

Table 11.6. Iterator Operations of Strings

Expression                        Effect
s.begin()                         Returns a random access iterator for the first character
s.end()                           Returns a random access iterator for the position after the last character
s.rbegin()                        Returns a reverse iterator for the first character of a reverse iteration (thus, for the last character)
s.rend()                          Returns a reverse iterator for the position after the last character of a reverse iteration (thus, the position before the first character)
string s(beg,end)                 Creates a string that is initialized by all characters of the range [beg,end)
s.append(beg,end)                 Appends all characters of the range [beg,end)
s.assign(beg,end)                 Assigns all characters of the range [beg,end)
s.insert(pos,c)                   Inserts the character c at iterator position pos and returns the iterator position of the new character
s.insert(pos,num,c)               Inserts num occurrences of the character c at iterator position pos and returns the iterator position of the first new character
s.insert(pos,beg,end)             Inserts all characters of the range [beg,end) at iterator position pos
s.erase(pos)                      Deletes the character to which iterator pos refers and returns the position of the next character
s.erase(beg,end)                  Deletes all characters of the range [beg,end) and returns the position of the next character
s.replace(beg,end,str)            Replaces all characters of the range [beg,end) with the characters of string str
s.replace(beg,end,cstr)           Replaces all characters of the range [beg,end) with the characters of the C-string cstr
s.replace(beg,end,cstr,len)       Replaces all characters of the range [beg,end) with len characters of the character array cstr
s.replace(beg,end,num,c)          Replaces all characters of the range [beg,end) with num occurrences of the character c
s.replace(beg,end,newBeg,newEnd)  Replaces all characters of the range [beg,end) with all characters of the range [newBeg,newEnd)

To support the use of back inserters for string, the push_back() function is defined. See Section 7.4.2, for details about back inserters and page 502 for an example of their use with strings.

Example of Using String Iterators

A very useful thing that you can do with string iterators is to make all characters of a string lowercase or uppercase via a single statement. For example:

    //string/iter1.cpp

    #include <string>
    #include <iostream>
    #include <algorithm>
    #include <cctype>
    using namespace std;

    /*wrappers for the C functions tolower() and toupper()
     *-passing tolower or toupper directly to transform() can be ambiguous
     * because <locale> declares function templates with the same names
     */
    char lowerChar (char c)
    {
        return static_cast<char>(tolower(static_cast<unsigned char>(c)));
    }
    char upperChar (char c)
    {
        return static_cast<char>(toupper(static_cast<unsigned char>(c)));
    }

    int main()
    {
        //create a string
        string s("The zip code of Hondelage in Germany is 38108");
        cout << "original: " << s << endl;


        //lowercase all characters
        transform (s.begin(), s.end(),    //source
                   s.begin(),             //destination
                   lowerChar);            //operation
        cout << "lowered:  " << s << endl;


        //uppercase all characters
        transform (s.begin(), s.end(),    //source
                   s.begin(),             //destination
                   upperChar);            //operation
        cout << "uppered:  " << s << endl;
    }

The output of the program is as follows:

    original: The zip code of Hondelage in Germany is 38108
    lowered:  the zip code of hondelage in germany is 38108
    uppered:  THE ZIP CODE OF HONDELAGE IN GERMANY IS 38108

Note that tolower() and toupper() are old C functions that use the global locale. If you have a different locale or more than one locale in your program, you should use the new form of tolower() and toupper(). See Section 14.4.4, for details.
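
As a minimal sketch of the locale-aware form, the following function object stores a locale and calls the two-argument version of tolower() that is declared in <locale>. The names ToLower and lowered() are assumptions made for this illustration:

```cpp
#include <algorithm>
#include <locale>
#include <string>

// Hypothetical function object: converts a character to lowercase
// according to the locale that was passed to the constructor.
class ToLower {
  private:
    std::locale loc;
  public:
    explicit ToLower(const std::locale& l) : loc(l) {
    }
    char operator()(char c) const {
        return std::tolower(c, loc);    // two-argument form from <locale>
    }
};

// Hypothetical helper: returns a lowercase copy of s according to loc.
std::string lowered(std::string s, const std::locale& loc)
{
    std::transform(s.begin(), s.end(),   // source
                   s.begin(),            // destination
                   ToLower(loc));        // operation
    return s;
}
```

With the classic "C" locale, lowered("Hello World 38108",std::locale::classic()) yields "hello world 38108". For other locales the result depends on the character classification that the locale defines.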

The following example demonstrates how the STL enables you to use your own search and sort criteria. It compares and searches strings in a case-insensitive way:

    //string/iter2.cpp

    #include <string>
    #include <iostream>
    #include <algorithm>
    #include <cctype>
    using namespace std;


    bool nocase_compare (char c1, char c2)
    {
        return toupper(c1) == toupper(c2);
    }
    int main()
    {
        string s1("This is a string");
        string s2("STRING");


        //compare case insensitive
        if (s1.size() == s2.size() &&        //ensure same sizes
            equal (s1.begin(),s1.end(),      //first source string
                   s2.begin(),               //second source string
                   nocase_compare)) {        //comparison criterion
            cout << "the strings are equal" << endl;
        }
        else {
            cout << "the strings are not equal" << endl;
        }


        //search case insensitive
        string::iterator pos;
        pos = search (s1.begin(), s1.end(),  //source string in which to search
                      s2.begin(), s2.end(),  //substring to search
                      nocase_compare);       //comparison criterion
        if (pos == s1.end()) {
            cout << "s2 is not a substring of s1" << endl;
        }
        else {
            cout << "\"" << s2 << "\" is a substring of \""
                 << s1 << "\" (at index " << pos - s1.begin() << ")"
                 << endl;
        }
    }

Note that the caller of equal() has to ensure that the second range has at least as many elements/characters as the first range. Thus, comparing the string size is necessary; otherwise, the behavior will be undefined.

In the last output statement you can process the difference of two string iterators to get the index of the character position:

    pos - s1.begin()

This is because string iterators are random access iterators. Similarly, to convert an index into an iterator position, you can simply add the value of the index to the iterator that begin() returns.
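
Both conversions can be sketched as a pair of small functions. The names to_index() and to_iterator() are hypothetical:

```cpp
#include <string>

// Hypothetical helper: converts an iterator position in s into the
// corresponding index (pos must refer to s).
std::string::size_type to_index(const std::string& s,
                                std::string::const_iterator pos)
{
    return static_cast<std::string::size_type>(pos - s.begin());
}

// Hypothetical helper: converts an index in s into the corresponding
// iterator position (idx must not be greater than s.size()).
std::string::const_iterator to_iterator(const std::string& s,
                                        std::string::size_type idx)
{
    return s.begin() + static_cast<std::string::difference_type>(idx);
}
```

For the string "This is a string", to_iterator() with the index 10 refers to the character 's' of "string", and passing that iterator to to_index() yields 10 again.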

In this example the user-defined auxiliary function nocase_compare() is provided to compare two strings in a case-insensitive way. Instead, you can also use a combination of some function adapters and replace the expression nocase_compare with the following expression:

    compose_f_gx_hy(equal_to<int>(),
                     ptr_fun(toupper),
                     ptr_fun(toupper))

See page 309 and page 318 for further details.

If you use strings in sets or maps, you might need a special sorting criterion to let the collections sort the string in a case-insensitive way. See page 213 for an example that demonstrates how to do this.
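
One possible form of such a sorting criterion is sketched here. It is not necessarily the one shown on the referenced page; the names NoCaseLess and NoCaseSet are assumptions made for this illustration:

```cpp
#include <algorithm>
#include <cctype>
#include <set>
#include <string>

// Hypothetical sorting criterion: compares two strings lexicographically
// while ignoring the case of the characters.
struct NoCaseLess {
    // compares two characters case-insensitively
    static bool char_less(char c1, char c2) {
        return std::toupper(static_cast<unsigned char>(c1)) <
               std::toupper(static_cast<unsigned char>(c2));
    }
    bool operator()(const std::string& s1, const std::string& s2) const {
        return std::lexicographical_compare(s1.begin(), s1.end(),
                                            s2.begin(), s2.end(),
                                            char_less);
    }
};

// set of strings in which "Hello" and "HELLO" count as the same element
typedef std::set<std::string, NoCaseLess> NoCaseSet;
```

If you insert "Hello", "HELLO", and "apple" into a NoCaseSet, the set contains two elements because the second insertion is treated as a duplicate, and "apple" is the first element.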

The following program demonstrates other examples of strings using iterator functions:

    //string/iter3.cpp

    #include <string>
    #include <iostream>
    #include <algorithm>
    using namespace std;


    int main()
    {
        //create constant string
        const string hello("Hello, how are you?");


        //initialize string s with all characters of string hello
        string s(hello.begin(),hello.end());


        //iterate through all of the characters
        string::iterator pos;
        for (pos = s.begin(); pos != s.end(); ++pos) {
            cout << *pos;
        }
        cout << endl;


        //reverse the order of all characters inside the string
        reverse (s.begin(), s.end());
        cout << "reverse:       " << s << endl;


        //sort all characters inside the string
        sort (s.begin(), s.end());
        cout << "ordered:       " << s << endl;


        /*remove adjacent duplicates
         *-unique() reorders and returns new end
         *-erase() shrinks accordingly
         */
        s.erase (unique(s.begin(),
                        s.end()),
                 s.end());
        cout << "no duplicates: " << s << endl;
    }

The program has the following output:

    Hello, how are you?
    reverse:       ?uoy era woh, olleH
    ordered:          ,?Haeehlloooruwy
    no duplicates:  ,?Haehloruwy

The following example uses back inserters to read the standard input into a string:

    //string/unique.cpp

    #include <iostream>
    #include <string>
    #include <algorithm>
    #include <iterator>
    #include <locale>
    using namespace std;


    class bothWhiteSpaces {
      private:
        const locale& loc; //locale
      public:
        /*constructor
         *-save the locale object
         */
        bothWhiteSpaces (const locale& l) : loc(l) {
        }
        /*function call
         *-returns whether both characters are whitespaces
         */
        bool operator() (char elem1, char elem2) {
            return isspace(elem1,loc) && isspace(elem2,loc);
        }
    };


    int main()
    {
        string contents;


        //don't skip leading whitespaces
        cin.unsetf (ios::skipws);


        //read all characters while compressing whitespaces
        unique_copy(istream_iterator<char>(cin),       //beginning of source
                    istream_iterator<char>(),          //end of source
                    back_inserter(contents),           //destination
                    bothWhiteSpaces(cin.getloc()));    //criterion for removing

        //process contents
        //-here: write it to the standard output
        cout << contents;
    }

By using the unique_copy() algorithm (see Section 9.7.2), all characters are read from the input stream cin and inserted into the string contents. The bothWhiteSpaces function object is used to check whether two consecutive characters are both whitespaces. To do this, it is initialized by the locale of cin and calls isspace(), which checks whether a character is a whitespace character (see Section 14.4.4, for a discussion of isspace()). unique_copy() uses the criterion bothWhiteSpaces to remove adjacent duplicate whitespaces. You can find a similar example in the reference section about unique_copy() on page 385.

Internationalization

As mentioned in the introduction of the string class (see Section 11.2.1), the template string class basic_string<> is parameterized by the character type, the traits of the character type, and the memory model. Type string is the specialization for characters of type char, and type wstring is the specialization for characters of type wchar_t.

The character traits are provided to specify the details of how to deal with aspects depending on the representation of a character type. An additional class is necessary because you can't change the interface of built-in types (such as char and wchar_t), and the same character type may have different traits. The details about the traits classes are described in Section 14.1.2.

The following code defines a special traits class for strings so that they operate in a case-insensitive way:

    //string/icstring.hpp
    #ifndef ICSTRING_HPP
    #define ICSTRING_HPP
 
    #include <string>
    #include <iostream>
    #include <cctype>


    /* replace functions of the standard char_traits<char>
     * so that strings behave in a case-insensitive way
     */
    struct ignorecase_traits : public std::char_traits<char> {
        //return whether c1 and c2 are equal
        static bool eq(const char& c1, const char& c2) {
            return std::toupper(c1)==std::toupper(c2);
        }
        //return whether c1 is less than c2
        static bool lt(const char& c1, const char& c2){
            return std::toupper(c1)<std::toupper(c2);
        }
        //compare up to n characters of s1 and s2
        static int compare(const char* s1, const char* s2, std::size_t n) {
            for (std::size_t i=0; i<n; ++i) {
                if (!eq(s1[i],s2[i])) {
                    return lt(s1[i],s2[i])?-1:1;
                }
            }
            return 0;
        }
        //search c in s
        static const char* find(const char* s, std::size_t n,
                                const char& c) {
            for (std::size_t i=0; i<n; ++i) {
                 if (eq(s[i],c)) {
                     return &(s[i]);
                 }
            }
            return 0;
        }
    };
    //define a special type for such strings
    typedef std::basic_string<char,ignorecase_traits> icstring;


    /*define an output operator
     *because the traits type is different than that for std::ostream
     */
    inline 
    std::ostream& operator << (std::ostream& strm, const icstring& s)
    {
        //simply convert the icstring into a normal string
        return strm << std::string(s.data(), s.length());
    }

    #endif    // ICSTRING_HPP

The definition of the output operator is necessary because the standard only defines I/O operators for streams that use the same character and traits type. But here, the traits type differs, so we have to define our own output operator. For input operators the same problem occurs.
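An input operator could be sketched in the same spirit: read into an ordinary string and convert the result. The block below restates a compact traits class so that it compiles on its own. The helper up() and the unsigned char cast are assumptions added here, and C++11 is assumed for nullptr.

```cpp
#include <cctype>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// compact restatement of the case-insensitive traits class
struct ignorecase_traits : public std::char_traits<char> {
    // hypothetical helper: toupper() is only defined for values
    // representable as unsigned char, hence the cast
    static char up(char c) {
        return static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    }
    static bool eq(char c1, char c2) { return up(c1) == up(c2); }
    static bool lt(char c1, char c2) { return up(c1) < up(c2); }
    static int compare(const char* s1, const char* s2, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) {
            if (!eq(s1[i], s2[i])) {
                return lt(s1[i], s2[i]) ? -1 : 1;
            }
        }
        return 0;
    }
    static const char* find(const char* s, std::size_t n, char c) {
        for (std::size_t i = 0; i < n; ++i) {
            if (eq(s[i], c)) {
                return s + i;
            }
        }
        return nullptr;
    }
};
typedef std::basic_string<char, ignorecase_traits> icstring;

// input operator: read a word into a normal string, then convert it
inline std::istream& operator >> (std::istream& strm, icstring& s)
{
    std::string tmp;
    if (strm >> tmp) {
        s.assign(tmp.data(), tmp.length());
    }
    return strm;
}
```

If reading fails, s keeps its old value and the stream carries the failure state, which mirrors the behavior of the operator for ordinary strings.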

The following program demonstrates how to use these special kinds of strings:

    //string/icstring1.cpp

    #include "icstring.hpp"


    int main()
    {
        using std::cout;
        using std::endl;


        icstring s1("hallo");
        icstring s2("otto");
        icstring s3("hALLo");


        cout << std::boolalpha;
        cout << s1 << " == " << s2 << " : " << (s1==s2) << endl;
        cout << s1 << " == " << s3 << " : " << (s1==s3) << endl;


        icstring::size_type idx = s1.find("All");
        if (idx != icstring::npos) {
            cout << "index of \"All\" in \"" << s1 << "\": "
                 << idx << endl;
        }
        else {
            cout << "\"All\" not found in \"" << s1 << endl;
        }
    }

The program has the following output:

    hallo == otto : false
    hallo == hALLo : true
    index of "All" in "hallo": 1

See Chapter 14 for more details about internationalization.

Performance

The standard does not specify how the string class is to be implemented. It only specifies the interface. There may be important differences in speed and memory usage depending on the concept and priorities of the implementation.

If you prefer better speed, make sure that your string class uses a concept such as reference counting. Reference counting makes copies and assignments faster because the implementation only copies and assigns references instead of the contents of a string (see Section 6.8, for a smart pointer class that enables reference counting for any type). By using reference counting you might not even need to pass strings by constant reference; however, to maintain flexibility and portability, you always should.

Strings and Vectors

Strings and vectors behave similarly. This is not a surprise because both are containers that are typically implemented as dynamic arrays. Thus, you could consider a string as a special kind of a vector that has characters as elements. In fact, you can use a string as an STL container. This is covered by Section 11.2.13. However, considering a string as a special kind of vector is dangerous because there are many fundamental differences between the two. Chief of these are their two primary goals:

  • The primary goal of vectors is to handle and to manipulate the elements of the container, not the container as a whole. Thus, vector implementations are optimized to operate on elements inside the container.

  • The primary goal of strings is to handle and to manipulate the container (the string) as a whole. Thus, strings are optimized to reduce the costs of assigning and passing the whole container.

These different goals typically result in completely different implementations. For example, strings are often implemented by using reference counting; vectors never are. Nevertheless, you can also use vectors as ordinary C-strings. See Section 6.2.3, for details.
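As a rough illustration of the last point, a vector<char> can serve as a writable buffer for functions that expect a C-string. The helper names toCharBuffer() and cLength() are hypothetical, and the sketch relies on the elements of a vector being stored contiguously.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: copies the characters of s into a vector
// and appends the terminating null character.
std::vector<char> toCharBuffer(const std::string& s)
{
    std::vector<char> buf(s.begin(), s.end());
    buf.push_back('\0');
    return buf;
}

// Hypothetical helper: passes the buffer to a C function.
// Precondition: buf is empty or contains a null character.
std::size_t cLength(const std::vector<char>& buf)
{
    return buf.empty() ? 0 : std::strlen(&buf[0]);
}
```

Unlike the return value of c_str(), the contents of such a buffer may be modified by the caller.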

String Class in Detail

In this section string means the actual string class. It might be string, wstring, or any other specialization of class basic_string<>. Type char means the actual character type, which is char for string and wchar_t for wstring. Other types and values that are in italic type have definitions that depend on individual definitions of the character type or traits class. The details about traits classes are provided in Section 14.1.2.

Type Definitions and Static Values

string::traits_type

  • The type of the character traits.

  • The second template parameter of class basic_string.

  • For type string, it is equivalent to char_traits<char>.

string::value_type

  • The type of the characters.

  • It is equivalent to traits_type::char_type.

  • For type string, it is equivalent to char.

string::size_type

  • The unsigned integral type for size values and indices.

  • It is equivalent to allocator_type::size_type.

  • For type string, it is equivalent to size_t.

string::difference_type

  • The signed integral type for difference values.

  • It is equivalent to allocator_type::difference_type.

  • For type string, it is equivalent to ptrdiff_t.

string::reference

  • The type of character references.

  • It is equivalent to allocator_type::reference.

  • For type string, it is equivalent to char&.

string::const_reference

  • The type of constant character references.

  • It is equivalent to allocator_type::const_reference.

  • For type string, it is equivalent to const char&.

string::pointer

  • The type of character pointers.

  • It is equivalent to allocator_type::pointer.

  • For type string, it is equivalent to char*.

string::const_pointer

  • The type of constant character pointers.

  • It is equivalent to allocator_type::const_pointer.

  • For type string, it is equivalent to const char*.

string::iterator

  • The type of iterators.

  • The exact type is implementation defined.

  • For type string, it is typically char*.

string::const_iterator

  • The type of constant iterators.

  • The exact type is implementation defined.

  • For type string, it is typically const char*.

string::reverse_iterator

  • The type of reverse iterators.

  • It is equivalent to reverse_iterator<iterator>.

string::const_reverse_iterator

  • The type of constant reverse iterators.

  • It is equivalent to reverse_iterator<const_iterator>.

static const size_type string::npos

  • A special value that indicates one of the following:

    • "not found"

    • "all remaining characters"

  • It is an unsigned integral value that is initialized by -1.

  • Be careful when you use npos. See Section 11.2.12, for details.
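Both meanings of npos can be sketched as follows. The helper names contains() and tailFrom() are hypothetical. The important detail is that the result of find() is kept in a variable of type size_type, because a narrower or signed type might not compare equal to npos.

```cpp
#include <string>

// Hypothetical helper: npos as "not found"
bool contains(const std::string& s, const std::string& sub)
{
    std::string::size_type idx = s.find(sub);   // not int or unsigned
    return idx != std::string::npos;
}

// Hypothetical helper: npos as "all remaining characters"
// (throws out_of_range if idx > s.size())
std::string tailFrom(const std::string& s, std::string::size_type idx)
{
    return s.substr(idx, std::string::npos);
}
```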

Create, Copy, and Destroy Operations

string::string ()

  • The default constructor.

  • Creates an empty string.

string::string (const string& str)

  • The copy constructor.

  • Creates a new string as a copy of str.

string::string (const string& str, size_type str_idx)

string::string (const string& str, size_type str_idx, size_type str_num)

  • Create a new string that is initialized by, at most, the first str_num characters of str starting with index str_idx.

  • If str_num is missing, all characters from str_idx to the end of str are used.

  • Throws out_of_range if str_idx > str.size().

string::string (const char* cstr)

  • Creates a string that is initialized by the C-string cstr.

  • The string is initialized by all characters of cstr up to but not including '\0'.

  • Note that cstr must not be a null pointer (NULL).

  • Throws length_error if the resulting size exceeds the maximum number of characters.

string::string (const char* chars, size_type chars_len)

  • Creates a string that is initialized by chars_len characters of the character array chars.

  • Note that chars must have at least chars_len characters. The characters may have arbitrary values. Thus, '\0' has no special meaning.

  • Throws length_error if chars_len is equal to string::npos.

  • Throws length_error if the resulting size exceeds the maximum number of characters.

string::string (size_type num, char c)

  • Creates a string that is initialized by num occurrences of character c.

  • Throws length_error if num is equal to string::npos.

  • Throws length_error if the resulting size exceeds the maximum number of characters.

string::string (InputIterator beg, InputIterator end)

  • Creates a string that is initialized by all characters of the range [beg,end).

  • Throws length_error if the resulting size exceeds the maximum number of characters.

string::~string ()

  • The destructor.

  • Destroys all characters and frees the memory.

Most constructors allow you to pass an allocator as an additional argument (see Section 11.3.12).
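The following sketch collects one call of each constructor form described above. The function name constructorSamples() is hypothetical, and the comments state the value each call is expected to produce.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: returns one string per constructor form.
std::vector<std::string> constructorSamples()
{
    const std::string base("hello world");
    const char raw[] = { 'a', '\0', 'b' };

    std::vector<std::string> r;
    r.push_back(std::string());             // empty string
    r.push_back(std::string(base));         // copy: "hello world"
    r.push_back(std::string(base, 6));      // from index 6: "world"
    r.push_back(std::string(base, 0, 5));   // 5 characters from index 0: "hello"
    r.push_back(std::string("hi\0there"));  // C-string: stops at '\0', "hi"
    r.push_back(std::string(raw, 3));       // character array: '\0' is kept
    r.push_back(std::string(3, 'x'));       // "xxx"
    r.push_back(std::string(base.begin(), base.begin() + 5));   // range: "hello"
    return r;
}
```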

Operations for Size and Capacity

Size Operations

size_type string::size () const

size_type string::length () const

  • Both functions return the current number of characters.

  • They are equivalent.

  • To check whether the string is empty, you should use empty() because it might be faster.

bool string::empty () const

  • Returns whether the string is empty (contains no characters).

  • It is equivalent to string::size()==0, but it might be faster.

size_type string::max_size () const

  • Returns the maximum number of characters a string could contain.

  • Whenever an operation results in a string that has a length greater than max_size(), the class throws length_error.

Capacity Operations

size_type string::capacity () const

  • Returns the number of characters the string could contain without reallocation.

void string::reserve ()

void string::reserve (size_type num)

  • The second form reserves internal memory for at least num characters.

  • If num is less than the current capacity, the call is taken as a nonbinding request to shrink the capacity.

  • If num is less than the current number of characters, the call is taken as a nonbinding request to shrink the capacity to fit the current number of characters.

  • If no argument is passed, the call is always a nonbinding shrink-to-fit request.

  • The capacity is never reduced below the current number of characters.

  • Each reallocation invalidates all references, pointers, and iterators and takes some time, so a preemptive call to reserve() is useful to increase speed and to keep references, pointers, and iterators valid (see Section 11.2.5, for details).
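The effect of a preemptive reserve() can be checked with a small experiment. This is only a sketch: the function name appendWithoutReallocation() is hypothetical, and it observes the capacity and the address of the character data to detect a reallocation.

```cpp
#include <string>

// Hypothetical helper: reserves memory for n characters, appends
// n characters one by one, and returns whether the string never
// had to reallocate while doing so.
bool appendWithoutReallocation(std::string::size_type n, char c)
{
    std::string s;
    s.reserve(n);                               // capacity is now >= n
    const std::string::size_type cap = s.capacity();
    const char* before = s.data();

    for (std::string::size_type i = 0; i < n; ++i) {
        s.push_back(c);                         // stays within the capacity
    }
    return s.size() == n && s.capacity() == cap && s.data() == before;
}
```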

Comparisons

bool comparison (const string& str1, const string& str2)

bool comparison (const string& str, const char* cstr)

bool comparison (const char* cstr, const string& str)

  • The first form returns the result of the comparison of two strings.

  • The second and third form return the result of the comparison of a string with a C-string.

  • comparison might be any of the following:

        operator ==
        operator !=
        operator <
        operator >
        operator <=
        operator >=

  • The values are compared lexicographically (see page 488).

int string::compare (const string& str) const

  • Compares the string *this with the string str.

  • Returns

    • 0 if both strings are equal

    • A value < 0 if *this is lexicographically less than str

    • A value > 0 if *this is lexicographically greater than str

  • For the comparison, traits::compare() is used (see Section 14.1.2).

  • See Section 11.2.7, for details.

int string::compare (size_type idx, size_type len, const string& str) const

  • Compares, at most, len characters of string *this, starting with index idx with the string str.

  • Throws out_of_range if idx > size().

  • The comparison is performed as just described for compare (str).

int string::compare (size_type idx, size_type len, const string& str, size_type str_idx, size_type str_len) const

  • Compares, at most, len characters of string *this, starting with index idx with, at most, str_len characters of string str starting with index str_idx.

  • Throws out_of_range if idx > size().

  • Throws out_of_range if str_idx > str.size().

  • The comparison is performed as just described for compare (str).

int string::compare (const char* cstr) const

  • Compares the characters of string *this with the characters of the C-string cstr.

  • The comparison is performed as just described for compare (str).

int string::compare (size_type idx, size_type len, const char* cstr) const

  • Compares, at most, len characters of string *this, starting with index idx with all characters of the C-string cstr.[8]

  • The comparison is performed as just described for compare(str).

  • Note that cstr must not be a null pointer (NULL).

int string::compare (size_type idx, size_type len, const char* chars, size_type chars_len) const

  • Compares, at most, len characters of string *this, starting with index idx with chars_len characters of the character array chars.

  • The comparison is performed as just described for compare(str).

  • Note that chars must have at least chars_len characters. The characters may have arbitrary values. Thus, '\0' has no special meaning.

  • Throws length_error if chars_len is equal to string::npos.
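Because compare() only promises a value that is less than, equal to, or greater than zero, code should test the sign and not a specific number. A minimal sketch with the hypothetical helpers sign(), compareWhole(), and comparePart():

```cpp
#include <string>

// Hypothetical helper: maps any int to -1, 0, or 1.
int sign(int v)
{
    return (v > 0) - (v < 0);
}

// compares all characters of a with all characters of b
int compareWhole(const std::string& a, const std::string& b)
{
    return sign(a.compare(b));
}

// compares at most len characters of a, starting with index idx,
// with all characters of b
int comparePart(const std::string& a,
                std::string::size_type idx, std::string::size_type len,
                const std::string& b)
{
    return sign(a.compare(idx, len, b));
}
```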

Character Access

char& string::operator [ ] (size_type idx)

char string::operator [ ] (size_type idx) const

  • Both forms return the character with the index idx (the first character has index 0).

  • For constant strings, length() is a valid index and the operator returns the value generated by the default constructor of the character type (for string: '\0').

  • For nonconstant strings, using length() as index value is invalid.

  • Passing an invalid index results in undefined behavior.

  • The reference returned for the nonconstant string may become invalidated due to string modifications or reallocations (see Section 11.2.6, for details).

  • If the caller can't ensure that the index is valid, at() should be used.

char& string::at (size_type idx)

const char& string::at (size_type idx) const

  • Both forms return the character that has the index idx (the first character has index 0).

  • For all strings, an index with length() as value is invalid.

  • Passing an invalid index (less than 0 or greater than or equal to size()) throws an out_of_range exception.

  • The reference returned for the nonconstant string may become invalidated due to string modifications or reallocations (see Section 11.2.6, for details).

  • If the caller ensures that the index is valid, she can use operator [], which is faster.
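When an index comes from an untrusted source, the range check of at() can be turned into a fallback value. The helper name charAtOr() is hypothetical:

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: returns the character with index idx,
// or fallback if idx is not a valid index for at().
char charAtOr(const std::string& s, std::string::size_type idx, char fallback)
{
    try {
        return s.at(idx);           // checks the index
    }
    catch (const std::out_of_range&) {
        return fallback;            // idx >= s.size()
    }
}
```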

Generating C-Strings and Character Arrays

const char* string::c_str () const

  • Returns the contents of the string as a C-string (an array of characters that has the null character '\0' appended).

  • The return value is owned by the string. Thus, the caller must neither modify nor free or delete the return value.

  • The return value is valid only as long as the string exists, and as long as only constant functions are called for it.

const char* string::data () const

  • Returns the contents of the string as a character array.

  • The return value contains all characters of the string without any modification or extension. In particular, no null character is appended. Thus, the return value is, in general, not a valid C-string.

  • The return value is owned by the string. Thus, the caller must neither modify nor free or delete the return value.

  • The return value is valid only as long as the string exists, and as long as only constant functions are called for it.

size_type string::copy (char* buf, size_type buf_size) const

size_type string::copy (char* buf, size_type buf_size, size_type idx) const

  • Both forms copy, at most, buf_size characters of the string (beginning with index idx) into the character array buf.

  • They return the number of characters copied.

  • No null character is appended. Thus, the contents of buf might not be a valid C-string after the call.

  • The caller must ensure that buf has enough memory; otherwise, the call results in undefined behavior.

  • Throws out_of_range if idx > size().
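Because copy() appends no null character, a caller who needs a C-string has to terminate the buffer itself. One possible wrapper, with the hypothetical name copyAsCString(), reserves the last element of the buffer for that purpose:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: copies at most bufSize-1 characters of s,
// beginning with index idx, into buf and appends '\0'.
// Returns the number of characters copied (without the '\0').
// Throws out_of_range if idx > s.size().
std::size_t copyAsCString(const std::string& s, char* buf, std::size_t bufSize,
                          std::string::size_type idx = 0)
{
    if (bufSize == 0) {
        return 0;                   // no room, not even for '\0'
    }
    std::size_t n = s.copy(buf, bufSize - 1, idx);
    buf[n] = '\0';                  // copy() does not do this
    return n;
}
```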

Modifying Operations

Assignments

string& string::operator = (const string& str)

string& string::assign (const string& str)

  • Both operations assign the value of string str.

  • They return *this.

string& string::assign (const string& str, size_type str_idx, size_type str_num)

  • Assigns at most str_num characters of str starting with index str_idx.

  • Returns *this.

  • Throws out_of_range if str_idx > str.size().

string& string::operator = (const char* cstr)

string& string::assign (const char* cstr)

  • Both operations assign the characters of the C-string cstr.

  • They assign all characters of cstr up to but not including '\0'.

  • Both operations return *this.

  • Note that cstr must not be a null pointer (NULL).

  • Both operations throw length_error if the resulting size exceeds the maximum number of characters.

string& string::assign (const char* chars, size_type chars_len)

  • Assigns chars_len characters of the character array chars.

  • Returns *this.

  • Note that chars must have at least chars_len characters. The characters may have arbitrary values. Thus, '\0' has no special meaning.

  • Throws length_error if the resulting size exceeds the maximum number of characters.

string& string::operator = (char c)

  • Assigns character c as the new value.

  • Returns *this.

  • After this call, *this contains only this single character.

string& string::assign (size_type num, char c)

  • Assigns num occurrences of character c.

  • Returns *this.

  • Throws length_error if num is equal to string::npos.

  • Throws length_error if the resulting size exceeds the maximum number of characters.

void string::swap (string& str)

void swap (string& str1, string& str2)

  • Both forms swap the value of two strings:

    • The member function swaps the contents of *this and str.

    • The global function swaps the contents of str1 and str2.

  • You should prefer these functions over assignment if possible because they are faster. In fact, they are guaranteed to have constant complexity. See Section 11.2.8, for details.
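A few of these operations in use, as a sketch with the hypothetical helper names assignPart(), assignRepeated(), and exchange():

```cpp
#include <string>

// assigns at most num characters of src, starting with index idx
// (throws out_of_range if idx > src.size())
std::string assignPart(const std::string& src,
                       std::string::size_type idx, std::string::size_type num)
{
    std::string s("old value");
    s.assign(src, idx, num);        // the old value is replaced completely
    return s;
}

// assigns num occurrences of character c
std::string assignRepeated(std::string::size_type num, char c)
{
    std::string s("old value");
    s.assign(num, c);
    return s;
}

// exchanges the values of a and b without copying characters
void exchange(std::string& a, std::string& b)
{
    a.swap(b);
}
```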

Appending Characters

string& string::operator += (const string& str)

string& string::append (const string& str)

  • Both operations append the characters of str.

  • They return *this.

  • Both operations throw length_error if the resulting size exceeds the maximum number of characters.

string& string::append (const string& str, size_type str_idx, size_type str_num)

  • Appends, at most, str_num characters of str, starting with index str_idx.

  • Returns *this.

  • Throws out_of_range if str_idx > str.size().

  • Throws length_error if the resulting size exceeds the maximum number of characters.

string& string::operator += (const char* cstr)

string& string::append (const char* cstr)

  • Both operations append the characters of the C-string cstr.

  • They return *this.

  • Note that cstr must not be a null pointer (NULL).

  • Both operations throw length_error if the resulting size exceeds the maximum number of characters.

string& string::append (const char* chars, size_type chars_len)

  • Appends chars_len characters of the character array chars.

  • Returns *this.

  • Note that chars must have at least chars_len characters. The characters may have arbitrary values. Thus, '\0' has no special meaning.

  • Throws length_error if the resulting size exceeds the maximum number of characters.

string& string::append (size_type num, char c)

  • Appends num occurrences of character c.

  • Returns *this.

  • Throws length_error if the resulting size exceeds the maximum number of characters.

string& string::operator += (char c)

void string::push_back (char c)

  • Both operations append character c.

  • Operator += returns *this.

  • Both operations throw length_error if the resulting size exceeds the maximum number of characters.

string& string::append (InputIterator beg, InputIterator end)

  • Appends all characters of the range [beg,end).

  • Returns *this.

  • Throws length_error if the resulting size exceeds the maximum number of characters.
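The append operations can be combined freely. The following sketch uses two hypothetical helpers, joinPath() and underline(). The use of '/' as the separator is an assumption of the example.

```cpp
#include <string>

// Hypothetical helper: appends file to dir, inserting a '/' if needed.
std::string joinPath(const std::string& dir, const std::string& file)
{
    std::string result(dir);
    if (!result.empty() && result[result.size() - 1] != '/') {
        result += '/';              // append a single character
    }
    result.append(file);            // append a string
    return result;
}

// Hypothetical helper: appends a newline and a line of '-' characters
// that is as long as the title.
std::string underline(const std::string& title)
{
    std::string result(title);
    result += "\n";                     // append a C-string
    result.append(title.size(), '-');   // append num occurrences of a character
    return result;
}
```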

Inserting Characters

string& string::insert (size_type idx, const string& str)

  • Inserts the characters of str so that the new characters start with index idx.

  • Returns *this.

  • Throws out_of_range if idx > size().

  • Throws length_error if the resulting size exceeds the maximum number of characters.

string& string::insert (size_type idx, const string& str, size_type str_idx, size_type str_num)

  • Inserts, at most, str_num characters of str, starting with index str_idx, so that the new characters start with index idx.

  • Returns *this.

  • Throws out_of_range if idx > size().

  • Throws out_of_range if str_idx > str.size().

  • Throws length_error if the resulting size exceeds the maximum number of characters.

string& string::insert (size_type idx, const char* cstr)

  • Inserts the characters of the C-string cstr so that the new characters start with index idx.

  • Returns *this.

  • Note that cstr must not be a null pointer (NULL).

  • Throws out_of_range if idx > size().

  • Throws length_error if the resulting size exceeds the maximum number of characters.

string& string::insert (size_type idx, const char* chars, size_type chars_len)

  • Inserts chars_len characters of the character array chars so that the new characters start with index idx.

  • Returns *this.

  • Note that chars must have at least chars_len characters. The characters may have arbitrary values. Thus, '\0' has no special meaning.

  • Throws out_of_range if idx > size().

  • Throws length_error if the resulting size exceeds the maximum number of characters.

string& string::insert (size_type idx, size_type num, char c)

void string::insert (iterator pos, size_type num, char c)

  • Both forms insert num occurrences of character c at the position specified by idx or pos respectively.

  • The first form inserts the new characters so that they start with index idx.

  • The second form inserts the new characters before the character to which iterator pos refers.

  • Note that the overloading of these two functions results in a possible ambiguity. If you pass 0 as first argument, it can be interpreted as an index (which is typically a conversion to unsigned) or as an iterator (which is often a conversion to char*). So in this case you should pass an index as the exact type. For example:

        std::string s;
        ...
        s.insert(0,1,' ');                           //ERROR: ambiguous
        s.insert((std::string::size_type)0,1,' ');   //OK

  • The first form returns *this; the second form returns nothing.

  • The first form throws out_of_range if idx > size().

  • Both forms throw length_error if the resulting size exceeds the maximum number of characters.

iterator string::insert (iterator pos, char c)

  • Inserts a copy of character c before the character to which iterator pos refers.

  • Returns the position of the character inserted.

  • Throws length_error if the resulting size exceeds the maximum number of characters.

void string::insert (iterator pos, InputIterator beg, InputIterator end)

  • Inserts all characters of the range [beg,end) before the character to which iterator pos refers.

  • Throws length_error if the resulting size exceeds the maximum number of characters.
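The index-based forms of insert() might be used as follows. The helper names indent() and insertWord() are hypothetical. The index 0 is passed with its exact type, which avoids the ambiguity between index and iterator on any implementation.

```cpp
#include <string>

// Hypothetical helper: inserts width spaces at the beginning of line.
std::string indent(const std::string& line, std::string::size_type width)
{
    std::string result(line);
    result.insert(static_cast<std::string::size_type>(0), width, ' ');
    return result;
}

// Hypothetical helper: inserts word so that it starts with index idx
// (throws out_of_range if idx > text.size()).
std::string insertWord(const std::string& text, std::string::size_type idx,
                       const std::string& word)
{
    std::string result(text);
    result.insert(idx, word);
    return result;
}
```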

Erasing Characters

void string::clear ()

string& string::erase ()

  • Both functions delete all characters of the string. Thus, the string is empty after the call.

  • erase() returns *this.

string& string::erase (size_type idx)

string& string::erase (size_type idx, size_type len)

  • Both forms erase, at most, len characters of *this, starting at index idx.

  • They return *this.

  • If len is missing, all remaining characters are removed.

  • Both forms throw out_of_range if idx > size().

iterator string::erase (iterator pos)

iterator string::erase (iterator beg, iterator end)

  • Both forms erase the single character at iterator position pos or all characters of the range [beg,end) respectively.

  • They return the position of the first character after the last character removed (thus, the second form returns end).[9]
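The iterator form of erase() combines naturally with the remove() algorithm, and the index form is handy for truncation. A sketch with the hypothetical helper names removeChar() and truncateAt():

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: removes all occurrences of character c.
std::string removeChar(const std::string& s, char c)
{
    std::string result(s);
    // remove() moves the characters to keep to the front and returns
    // the new logical end; erase() then drops the leftover range
    result.erase(std::remove(result.begin(), result.end(), c),
                 result.end());
    return result;
}

// Hypothetical helper: removes all characters starting with index idx.
// An index at or behind the end leaves the string unchanged.
std::string truncateAt(const std::string& s, std::string::size_type idx)
{
    std::string result(s);
    if (idx < result.size()) {
        result.erase(idx);          // len is missing: erase all the rest
    }
    return result;
}
```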

Changing the Size

void string::resize (size_type num)

void string::resize (size_type num, char c)

  • Both forms change the number of characters of *this to num. Thus, if num is not equal to size(), they append or remove characters at the end according to the new size.

  • If the number of characters increases, the new characters are initialized by c. If c is missing, the characters are initialized by the default constructor of the character type (for string: '\0').

  • Both forms throw length_error if num is equal to string::npos.

  • Both forms throw length_error if the resulting size exceeds the maximum number of characters.
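A typical use of resize() is to force a string to a fixed width. The helper name padOrTruncate() is hypothetical:

```cpp
#include <string>

// Hypothetical helper: returns s with exactly width characters.
// Shorter strings are padded with fill at the end;
// longer strings lose their trailing characters.
std::string padOrTruncate(const std::string& s, std::string::size_type width,
                          char fill = ' ')
{
    std::string result(s);
    result.resize(width, fill);
    return result;
}
```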

Replacing Characters

string& string::replace (size_type idx, size_type len, const string& str)

string& string::replace (iterator beg, iterator end, const string& str)

  • The first form replaces, at most, len characters of *this, starting with index idx, with all characters of str.

  • The second form replaces all characters of the range [beg,end) with all characters of str.

  • Both forms return *this.

  • Both forms throw out_of_range if idx > size().

  • Both forms throw length_error if the resulting size exceeds the maximum number of characters.

string& string::replace (size_type idx, size_type len, const string& str, size_type str_idx, size_type str_num)

  • Replaces, at most, len characters of *this, starting with index idx, with at most str_num characters of str starting with index str_idx.

  • Returns *this.

  • Throws out_of_range if idx > size().

  • Throws out_of_range if str_idx > str.size().

  • Throws length_error if the resulting size exceeds the maximum number of characters.

string& string::replace (size_type idx, size_type len, const char* cstr)

string& string::replace (iterator beg, iterator end, const char* cstr)

  • Both forms replace, at most, len characters of *this, starting with index idx, or all characters of the range [beg,end), respectively, with all characters of the C-string cstr.

  • Both forms return *this.

  • Note that cstr must not be a null pointer (NULL).

  • Both forms throw out_of_range if idx > size().

  • Both forms throw length_error if the resulting size exceeds the maximum number of characters.

string& string::replace (size_type idx, size_type len, const char* chars, size_type chars_len)

string& string::replace (iterator beg, iterator end, const char* chars, size_type chars_len)

  • Both forms replace, at most, len characters of *this, starting with index idx, or all characters of the range [beg,end), respectively, with chars_len characters of the character array chars.

  • They return *this.

  • Note that chars must have at least chars_len characters. The characters may have arbitrary values. Thus, '\0' has no special meaning.

  • Both forms throw out_of_range if idx > size().

  • Both forms throw length_error if the resulting size exceeds the maximum number of characters.

string& string::replace (size_type idx, size_type len, size_type num, char c)

string& string::replace (iterator beg, iterator end, size_type num, char c)

  • Both forms replace, at most, len characters of *this, starting with index idx, or all characters of the range [beg,end), respectively, with num occurrences of character c.

  • They return *this.

  • Both forms throw out_of_range if idx > size().

  • Both forms throw length_error if the resulting size exceeds the maximum number of characters.

string& string::replace (iterator beg, iterator end, InputIterator newBeg, InputIterator newEnd)

  • Replaces all characters of the range [beg,end) with all characters of the range [newBeg,newEnd).

  • Returns *this.

  • Throws length_error if the resulting size exceeds the maximum number of characters.
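The string class offers no member function that replaces every occurrence of a substring, but one can be sketched with find() and the index form of replace(). The function name replaceAll() is hypothetical:

```cpp
#include <string>

// Hypothetical helper: replaces every occurrence of from with to.
std::string replaceAll(const std::string& text,
                       const std::string& from, const std::string& to)
{
    std::string result(text);
    if (from.empty()) {
        return result;              // an empty pattern would match everywhere
    }
    std::string::size_type idx = 0;
    while ((idx = result.find(from, idx)) != std::string::npos) {
        result.replace(idx, from.length(), to);
        idx += to.length();         // continue behind the inserted text
    }
    return result;
}
```

Continuing the search behind the inserted text keeps the loop from looping forever when to contains from.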

Searching and Finding

Find a Character

size_type string::find (char c) const

size_type string::find (char c, size_type idx) const

size_type string::rfind (char c) const

size_type string::rfind (char c, size_type idx) const

  • These functions search for the first/last character c (starting at index idx).

  • The find() functions search forward and return the first occurrence.

  • The rfind() functions search backward and return the last occurrence.

  • These functions return the index of the character when successful or string::npos if they fail.

Find a Substring

size_type string::find (const string& str) const

size_type string::find (const string& str, size_type idx) const

size_type string::rfind (const string& str) const

size_type string::rfind (const string& str, size_type idx) const

  • These functions search for the first/last substring str (starting at index idx).

  • The find() functions search forward and return the first substring.

  • The rfind() functions search backward and return the last substring.

  • These functions return the index of the first character of the substring when successful or string::npos if they fail.

size_type string::find (const char* cstr) const

size_type string::find (const char* cstr, size_type idx) const

size_type string::rfind (const char* cstr) const

size_type string::rfind (const char* cstr, size_type idx) const

  • These functions search for the first/last substring that has the characters of the C-string cstr (starting at index idx).

  • The find() functions search forward and return the first substring.

  • The rfind() functions search backward and return the last substring.

  • These functions return the index of the first character of the substring when successful or string::npos if they fail.

  • Note that cstr must not be a null pointer (NULL).

size_type string::find (const char* chars, size_type idx, size_type chars_len) const

size_type string::rfind (const char* chars, size_type idx, size_type chars_len) const

  • These functions search for the first/last substring that has chars_len characters of the character array chars (starting at index idx).

  • find() searches forward and returns the first substring.

  • rfind() searches backward and returns the last substring.

  • These functions return the index of the first character of the substring when successful or string::npos if they fail.

  • Note that chars must have at least chars_len characters. The characters may have arbitrary values. Thus, '\0' has no special meaning.
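Searching backward with rfind() is useful for taking file names apart. The following sketch uses the hypothetical helpers baseName() and extension() and assumes '/' as the directory separator:

```cpp
#include <string>

// Hypothetical helper: returns the part behind the last '/'.
std::string baseName(const std::string& path)
{
    std::string::size_type idx = path.rfind('/');
    if (idx == std::string::npos) {
        return path;                // no directory part
    }
    return path.substr(idx + 1);
}

// Hypothetical helper: returns the part behind the last '.',
// or an empty string if the name contains no period.
std::string extension(const std::string& filename)
{
    std::string::size_type idx = filename.rfind('.');
    if (idx == std::string::npos) {
        return std::string();
    }
    return filename.substr(idx + 1);
}
```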

Find First of Different Characters

size_type string::find_first_of (const string& str) const

size_type string::find_first_of (const string& str, size_type idx) const

size_type string::find_first_not_of (const string& str) const

size_type string::find_first_not_of (const string& str, size_type idx) const

  • These functions search for the first character that is or is not also an element of the string str (starting at index idx).

  • These functions return the index of that character when successful or string::npos if they fail.

size_type string::find_first_of (const char* cstr) const

size_type string::find_first_of (const char* cstr, size_type idx) const

size_type string::find_first_not_of (const char* cstr) const

size_type string::find_first_not_of (const char* cstr, size_type idx) const

  • These functions search for the first character that is or is not also an element of the C-string cstr (starting at index idx).

  • These functions return the index of that character when successful or string::npos if they fail.

  • Note that cstr must not be a null pointer (NULL).

size_type string::find_first_of (const char* chars, size_type idx, size_type chars_len) const

size_type string::find_first_not_of (const char* chars, size_type idx, size_type chars_len) const

  • These functions search for the first character that is or is not also an element of the chars_len characters of the character array chars (starting at index idx).

  • These functions return the index of that character when successful or string::npos if they fail.

  • Note that chars must have at least chars_len characters. The characters may have arbitrary values. Thus, '\0' has no special meaning.

size_type string::find_first_of (char c) const

size_type string::find_first_of (char c, size_type idx) const

size_type string::find_first_not_of (char c) const

size_type string::find_first_not_of (char c, size_type idx) const

  • These functions search for the first character that has or does not have the value c (starting at index idx).

  • These functions return the index of that character when successful or string::npos if they fail.
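As a sketch of typical uses, find_first_not_of() can skip leading whitespace and find_first_of() can locate the first character out of a set. The function names trimLeft() and firstVowel() are assumptions made for this example.

```cpp
#include <string>

// returns s without leading blanks and tabs
std::string trimLeft (const std::string& s)
{
    std::string::size_type idx = s.find_first_not_of(" \t");
    if (idx == std::string::npos) {
        return std::string();    // s is empty or contains only whitespace
    }
    return s.substr(idx);
}

// returns the index of the first vowel in s, or npos if there is none
std::string::size_type firstVowel (const std::string& s)
{
    return s.find_first_of("aeiouAEIOU");
}
```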

Find Last of Different Characters

size_type string::find_last_of (const string& str) const

size_type string::find_last_of (const string& str, size_type idx) const

size_type string::find_last_not_of (const string& str) const

size_type string::find_last_not_of (const string& str, size_type idx) const

  • These functions search for the last character that is or is not also an element of the string str (starting at index idx).

  • These functions return the index of that character when successful or string::npos if they fail.

size_type string::find_last_of (const char* cstr) const

size_type string::find_last_of (const char* cstr, size_type idx) const

size_type string::find_last_not_of (const char* cstr) const

size_type string::find_last_not_of (const char* cstr, size_type idx) const

  • These functions search for the last character that is or is not also an element of the C-string cstr (starting at index idx).

  • These functions return the index of that character when successful or string::npos if they fail.

  • Note that cstr must not be a null pointer (NULL).

size_type string::find_last_of (const char* chars, size_type idx, size_type chars_len) const

size_type string::find_last_not_of (const char* chars, size_type idx, size_type chars_len) const

  • These functions search for the last character that is or is not also an element of the chars_len characters of the character array chars (starting at index idx).

  • These functions return the index of that character when successful or string::npos if they fail.

  • Note that chars must have at least chars_len characters. The characters may have arbitrary values. Thus, '\0' has no special meaning.

size_type string::find_last_of (char c) const

size_type string::find_last_of (char c, size_type idx) const

size_type string::find_last_not_of (char c) const

size_type string::find_last_not_of (char c, size_type idx) const

  • These functions search for the last character that has or does not have the value c (starting at index idx).

  • These functions return the index of that character when successful or string::npos if they fail.
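The backward-searching counterparts might be used as in the following sketch, which removes trailing whitespace and extracts the last component of a path. The names trimRight() and fileNameOf() are hypothetical, and treating both '/' and '\' as path separators is an assumption of this example.

```cpp
#include <string>

// returns s without trailing blanks, tabs, and newlines
std::string trimRight (const std::string& s)
{
    std::string::size_type idx = s.find_last_not_of(" \t\n");
    if (idx == std::string::npos) {
        return std::string();    // s is empty or contains only whitespace
    }
    return s.substr(0, idx + 1);
}

// returns the part of path after the last '/' or '\' character
std::string fileNameOf (const std::string& path)
{
    std::string::size_type idx = path.find_last_of("/\\");
    if (idx == std::string::npos) {
        return path;             // no separator: the whole path is the name
    }
    return path.substr(idx + 1);
}
```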

Substrings and String Concatenation

string string::substr () const

string string::substr (size_type idx) const

string string::substr (size_type idx, size_type len) const

  • All forms return a substring of, at most, len characters of the string *this starting with index idx.

  • If len is missing, all remaining characters are used.

  • If idx and len are missing, a copy of the string is returned.

  • All forms throw out_of_range if idx > size().
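The behavior of substr() at the boundaries can be sketched as follows. The helpers tail(), middle(), and substrThrows() are illustrative names, not part of the library.

```cpp
#include <stdexcept>
#include <string>

// all characters of s starting with index idx
std::string tail (const std::string& s, std::string::size_type idx)
{
    return s.substr(idx);
}

// at most len characters of s starting with index idx
std::string middle (const std::string& s,
                    std::string::size_type idx,
                    std::string::size_type len)
{
    return s.substr(idx, len);
}

// returns true if s.substr(idx) throws out_of_range
bool substrThrows (const std::string& s, std::string::size_type idx)
{
    try {
        std::string part = s.substr(idx);
        (void)part;              // only the exception is of interest here
    }
    catch (const std::out_of_range&) {
        return true;
    }
    return false;
}
```

Note that idx equal to size() is valid and yields an empty string. Only an index greater than size() throws.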

string operator + (const string& str1, const string& str2)

string operator + (const string& str, const char* cstr)

string operator + (const char* cstr, const string& str)

string operator + (const string& str, char c)

string operator + (char c, const string& str)

  • All forms concatenate all characters of both operands and return the sum string.

  • The operands may be any of the following:

    • A string

    • A C-string

    • A single character

  • All forms throw length_error if the resulting size exceeds the maximum number of characters.
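A short sketch of the operand combinations follows. The function names greeting(), joinPath(), and tagged() are assumptions for the example. At least one operand of each + must be a string object, so an expression such as "a" + "b" with two C-strings does not use these operators.

```cpp
#include <string>

// C-string + string + character
std::string greeting (const std::string& name)
{
    return "Hello, " + name + '!';
}

// string + character + string
std::string joinPath (const std::string& dir, const std::string& file)
{
    return dir + '/' + file;
}

// character + string + character
std::string tagged (const std::string& name)
{
    return '<' + name + '>';
}
```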

Input/Output Functions

ostream& operator<< (ostream& strm, const string& str)

  • Writes the characters of str to the stream strm.

  • If strm.width() is greater than 0, at least width() characters are written and width() is set to 0.

  • ostream is the ostream type basic_ostream<char> according to the character type (see Section 13.2.1).
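The effect of width() on string output can be sketched with a string stream. The function writeTwice() is a hypothetical helper. It writes the same string twice to show that the field width applies only to the first output.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// writes s twice: first with the field width width, then without
std::string writeTwice (const std::string& s, int width)
{
    std::ostringstream strm;
    strm << std::setw(width) << s    // padded to at least width characters
         << s;                       // width() was reset to 0: no padding
    return strm.str();
}
```

With the default settings the fill character is a space and the padding is inserted before the string. A string that is longer than the width is written completely.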

istream& operator >> (istream& strm, string& str)

  • Reads the characters of the next word from strm into the string str.

  • If the skipws flag is set for strm, leading whitespaces are ignored.

  • Characters are extracted until any of the following happens:

    • strm.width() is greater than 0 and width() characters are stored

    • strm.good() is false (which might cause an appropriate exception)

    • isspace(c,strm.getloc()) is true for the next character c

    • str.max_size() characters are stored

  • The internal memory is reallocated accordingly.

  • istream is the istream type basic_istream<char> according to the character type (see Section 13.2.1).
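The extraction rules above might be demonstrated as in this sketch, which reads from a string stream instead of a file. The names splitWords() and firstChars() are illustrative assumptions.

```cpp
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// reads text word by word; leading whitespace is skipped for each word
std::vector<std::string> splitWords (const std::string& text)
{
    std::istringstream strm(text);
    std::vector<std::string> words;
    std::string word;
    while (strm >> word) {
        words.push_back(word);
    }
    return words;
}

// reads at most n characters of the first word by setting width()
std::string firstChars (const std::string& text, int n)
{
    std::istringstream strm(text);
    std::string word;
    strm >> std::setw(n) >> word;
    return word;
}
```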

istream& getline (istream& strm, string& str)

istream& getline (istream& strm, string& str, char delim)

  • Reads the characters of the next line from strm into the string str.

  • All characters (including leading whitespaces) are extracted until any of the following happens:

    • strm.good() is false (which might cause an appropriate exception)

    • delim or strm.widen('\n') is extracted

    • str.max_size() characters are stored

  • The line delimiter is extracted but not appended.

  • The internal memory is reallocated accordingly.

  • istream is the istream type basic_istream<char> according to the character type (see Section 13.2.1).
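Because getline() accepts an arbitrary delimiter, it can also split a string into fields, as the following sketch suggests. The function name splitAt() is hypothetical.

```cpp
#include <sstream>
#include <string>
#include <vector>

// splits text at each occurrence of delim
// - the delimiter is extracted but not stored in the parts
// - leading whitespace of each part is kept
std::vector<std::string> splitAt (const std::string& text, char delim)
{
    std::istringstream strm(text);
    std::vector<std::string> parts;
    std::string part;
    while (std::getline(strm, part, delim)) {
        parts.push_back(part);
    }
    return parts;
}
```

Calling splitAt() with '\n' as the delimiter yields the lines of the text. Two adjacent delimiters yield an empty part.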

Generating Iterators

iterator string::begin ()

const_iterator string::begin() const

  • Both forms return a random access iterator for the beginning of the string (the position of the first character).

  • If the string is empty, the call is equivalent to end().

iterator string::end ()

const_iterator string::end() const

  • Both forms return a random access iterator for the end of the string (the position after the last character).

  • Note that the character at the end is not defined. Thus, *s.end() results in undefined behavior.

  • If the string is empty, the call is equivalent to begin().

reverse_iterator string::rbegin ()

const_reverse_iterator string::rbegin () const

  • Both forms return a random access iterator for the beginning of a reverse iteration over the string (the position of the last character).

  • If the string is empty, the call is equivalent to rend().

  • For details about reverse iterators see Section 7.4.1.

reverse_iterator string::rend ()

const_reverse_iterator string::rend () const

  • Both forms return a random access iterator for the end of the reverse iteration over the string (the position before the first character).

  • Note that the character at the reverse end is not defined. Thus, *s.rend() results in undefined behavior.

  • If the string is empty, the call is equivalent to rbegin().

  • For details about reverse iterators see Section 7.4.1.
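The iterator functions might be used as in the following sketch. The helpers reversed() and toUpper() are assumptions for illustration. The cast to unsigned char before calling toupper() is a precaution for character values outside the basic character set.

```cpp
#include <cctype>
#include <string>

// returns a copy of s with the characters in reverse order,
// built from the range of the reverse iterators
std::string reversed (const std::string& s)
{
    return std::string(s.rbegin(), s.rend());
}

// returns a copy of s in which each character is converted to uppercase
std::string toUpper (std::string s)
{
    for (std::string::iterator pos = s.begin(); pos != s.end(); ++pos) {
        *pos = static_cast<char>(
                   std::toupper(static_cast<unsigned char>(*pos)));
    }
    return s;
}
```

For an empty string both loops and ranges are empty because begin() equals end() and rbegin() equals rend().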

Allocator Support

Strings provide the usual members of classes with allocator support.

string::allocator_type

  • The type of the allocator.

  • Third template parameter of class basic_string<>.

  • For type string, it is equivalent to allocator<char>.

allocator_type string::get_allocator () const

  • Returns the memory model (the allocator object) of the string.

Strings also provide all constructors with optional allocator arguments. The following are all of the string constructors, including their optional allocator arguments, according to the standard:

    namespace std {
        template<class charT,
                 class traits = char_traits<charT>,
                 class Allocator = allocator<charT> >
        class basic_string {
          public:
            //default constructor
            explicit basic_string(const Allocator& a = Allocator());


            //copy constructor and substrings
            basic_string(const basic_string& str,
                         size_type str_idx = 0,
                         size_type str_num = npos);
            basic_string(const basic_string& str,
                         size_type str_idx, size_type str_num,
                         const Allocator&);


            //constructor for C-strings
            basic_string(const charT* cstr,
                         const Allocator& a = Allocator());


            //constructor for character arrays
            basic_string(const charT* chars, size_type chars_len,
                         const Allocator& a = Allocator());


            //constructor for num occurrences of a character
            basic_string(size_type num, charT c,
                         const Allocator& a = Allocator());


            // constructor for a range of characters
            template<class InputIterator>
            basic_string(InputIterator beg, InputIterator end,
                         const Allocator& a = Allocator());
            ...

        };

    }

These constructors behave as described in Section 11.3.2, with the additional ability that you can pass your own memory model object. If the string is initialized by another string, the allocator also gets copied.[10] See Chapter 15 for more details about allocators.
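As a minimal sketch, an allocator can be passed explicitly to the constructors listed above. The helper names copyWithAllocator() and repeat() are hypothetical, and the example uses only the default allocator because a user-defined allocator is beyond the scope of this section.

```cpp
#include <memory>
#include <string>

// builds a new string from the characters of src and passes
// the allocator of src to the constructor for character arrays
std::string copyWithAllocator (const std::string& src)
{
    std::string::allocator_type alloc = src.get_allocator();
    return std::string(src.data(), src.size(), alloc);
}

// num occurrences of c with an explicitly passed default allocator
std::string repeat (std::string::size_type num, char c)
{
    return std::string(num, c, std::allocator<char>());
}
```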



[1] In particular, the size_type of a string depends on the memory model of the string class. See Section 11.3.12, for details.

[2] In this case, two member functions do the same with respect to the two different design approaches that are merged here. length() returns the length of the string as strlen() does for ordinary C-strings, whereas size() is the common member function for the number of elements according to the concept of the STL.

[3] In systems that do not support default template parameters, the third argument is usually missing.

[4] In this case, two member functions do the same thing because length() returns the length of the string, as strlen() does for ordinary C-strings, whereas size() is the common member function for the number of elements according to the concept of the STL.

[5] You don't have to qualify getline() with std:: because "Koenig lookup" will always consider the namespace where the class of an argument was defined when calling a function (see page 17).

[6] Don't be confused because I write about searching "and" finding. They are (almost) synonymous. The search functions use "find" in their name. However, unfortunately they don't guarantee to find anything. In fact, they "search" for something or "try to find" something. So I use the term search for the behavior of these functions and find with respect to their name.

[7] The STL is introduced in Chapter 5.

[8] The standard specifies the behavior of this form of compare() differently: It states that cstr is not considered a C-string but a character array, and passes npos as its length (in fact, it calls the following form of compare() by using npos as an additional parameter). This is a bug in the standard (it would always throw a length_error exception).

[9] The standard specifies that the second form of this function returns the position after end. This is a bug in the standard.

[10] The original standard states that the default allocator is used when a string gets copied. However, this does not make much sense, so this is the proposed resolution to fix this behavior.
