17.3.4. Using regex_replace

Regular expressions are often used when we need not only to find a given sequence but also to replace that sequence with another one. For example, we might want to translate U.S. phone numbers into the form “ddd.ddd.dddd,” where the area code and next three digits are separated by a dot.

When we want to find text that matches a regular expression and replace it, we call regex_replace. Like the search functions, regex_replace, which is described in Table 17.12, takes an input character sequence and a regex object. We must also pass a string that describes the output we want.
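regex_replace also has an overload that writes its result through an output iterator instead of returning a string. In that form, the input sequence is given as a pair of iterators. The sketch below uses that overload with an ostringstream. The helper name reformat_via_iterator is hypothetical, and the pattern is assumed to be the phone pattern used in the program later in this section, written with doubled backslashes.

```cpp
#include <iterator>
#include <regex>
#include <sstream>
#include <string>

// Hypothetical helper: reformat every phone number in `line`, writing the
// output through an ostream_iterator instead of returning a new string.
std::string reformat_via_iterator(const std::string &line)
{
    // assumed: the phone pattern from later in this section
    static const std::regex r(
        "(\\()?(\\d{3})(\\))?([-. ])?(\\d{3})([-. ])?(\\d{4})");
    std::ostringstream out;
    // iterator form: destination iterator, then the input as [begin, end)
    std::regex_replace(std::ostream_iterator<char>(out),
                       line.begin(), line.end(), r, "$2.$5.$7");
    return out.str();
}
```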

Table 17.12. Regular Expression Replace Operations


We compose a replacement string by including the characters we want, intermixed with subexpressions from the matched substring. In this case, we want to use the second, fifth, and seventh subexpressions in our replacement string. We’ll ignore the first, third, fourth, and sixth, because these were used in the original formatting of the number but are not part of our replacement format. We refer to a particular subexpression by using a $ symbol followed by the index number for a subexpression:

string fmt = "$2.$5.$7"; // reformat numbers to ddd.ddd.dddd
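To see why the indices are 2, 5, and 7, we can list the subexpressions of a single match. This sketch assumes the phone pattern from the program later in this section (with doubled backslashes); phone_subexpressions is a hypothetical helper. Element 0 is the whole match, and optional subexpressions that did not take part in the match come back as empty strings.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: return all subexpressions of the phone pattern for a
// string that consists of exactly one phone number; empty if no match.
std::vector<std::string> phone_subexpressions(const std::string &number)
{
    static const std::regex r(
        "(\\()?(\\d{3})(\\))?([-. ])?(\\d{3})([-. ])?(\\d{4})");
    std::smatch m;
    std::vector<std::string> parts;
    if (std::regex_match(number, m, r))
        for (const auto &sub : m)          // each sub is a std::ssub_match
            parts.push_back(sub.str());    // unmatched groups give ""
    return parts;
}
```

For "(908) 555-0132", subexpressions 1, 3, 4, and 6 hold "(", ")", " ", and "-", while 2, 5, and 7 hold the digit groups "908", "555", and "0132".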

We can use our regular-expression pattern and the replacement string as follows:

regex r(phone);  // a regex to find our pattern
string number = "(908) 555-0132";
cout << regex_replace(number, r, fmt) << endl;

The output from this program is

908.555.0132
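Besides $n, the default (ECMAScript) format syntax also recognizes $& for the entire match, $` and $' for the text before and after the match, and $$ for a literal dollar sign. A minimal sketch, using a hypothetical apply_format helper and assuming the phone pattern from this section:

```cpp
#include <regex>
#include <string>

// Hypothetical helper: rewrite each phone number in `s` according to `fmt`.
std::string apply_format(const std::string &s, const std::string &fmt)
{
    static const std::regex r(
        "(\\()?(\\d{3})(\\))?([-. ])?(\\d{3})([-. ])?(\\d{4})");
    return std::regex_replace(s, r, fmt);
}
```

With the input "(908) 555-0132", the format "[$&]" yields "[(908) 555-0132]", and "$$$2" yields "$908": $$ produces the dollar sign and $2 the area code.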

Replacing Only Part of the Input Sequence

A more interesting use of our regular-expression processing would be to replace phone numbers that are embedded in a larger file. For example, we might have a file of names and phone numbers that had data like this:

morgan (201) 555-0168 862-555-0123
drew (973)555.0130
lee (609) 555-0132 2015550175 800.555-0100

that we want to transform to data like this:

morgan 201.555.0168 862.555.0123
drew 973.555.0130
lee 609.555.0132 201.555.0175 800.555.0100

We can generate this transformation with the following program:

#include <iostream>
#include <regex>
#include <string>
using namespace std;

int main()
{
    // backslashes are doubled so the regex library sees \( \d \)
    string phone =
       "(\\()?(\\d{3})(\\))?([-. ])?(\\d{3})([-. ])?(\\d{4})";
    regex r(phone);  // a regex to find our pattern
    string s;
    string fmt = "$2.$5.$7"; // reformat numbers to ddd.ddd.dddd
    // read each record from the input file
    while (getline(cin, s))
        cout << regex_replace(s, r, fmt) << endl;
    return 0;
}

We read each record into s and hand that record to regex_replace. This function finds and transforms all the matches in its input sequence.
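One way to picture how regex_replace handles multiple matches is to write an approximately equivalent loop by hand with sregex_iterator: copy the unmatched prefix of each match, append the match rewritten by its format member, and finish with the suffix of the last match. This is an illustrative sketch, not the library's implementation; replace_all is a hypothetical name, and the sketch ignores the special handling regex_iterator applies to empty matches, which the phone pattern cannot produce.

```cpp
#include <regex>
#include <string>

// Hypothetical hand-written counterpart of regex_replace(s, r, fmt).
std::string replace_all(const std::string &s, const std::regex &r,
                        const std::string &fmt)
{
    std::string result;
    std::smatch last;      // most recent match, used for the trailing suffix
    bool found = false;
    for (std::sregex_iterator it(s.begin(), s.end(), r), end; it != end; ++it) {
        result += it->prefix().str();  // unmatched text before this match
        result += it->format(fmt);     // this match, rewritten by fmt
        last = *it;
        found = true;
    }
    // copy whatever follows the final match (or all of s if nothing matched)
    result += found ? last.suffix().str() : s;
    return result;
}
```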

Flags to Control Matches and Formatting

Just as the library defines flags to direct how to process a regular expression, the library also defines flags that we can use to control the match process or the formatting done during a replacement. These values are listed in Table 17.13. These flags can be passed to the regex_search, regex_match, or regex_replace functions or to the format members of class smatch.

Table 17.13. Match Flags


The match and format flags have type match_flag_type. These values are defined in a namespace named regex_constants. Like placeholders, which we used with bind (§ 10.3.4, p. 399), regex_constants is a namespace defined inside the std namespace. To use a name from regex_constants, we must qualify that name with the names of both namespaces:

using std::regex_constants::format_no_copy;

This declaration says that when our code uses format_no_copy, we want the object of that name from the namespace std::regex_constants. We can instead provide the alternative form of using that we will cover in § 18.2.2 (p. 792):

using namespace std::regex_constants;
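As an example of using a name from regex_constants, format_sed asks for POSIX sed format rules instead of the default ECMAScript rules, so \n names subexpression n and & names the whole match. In this sketch, reformat_sed is a hypothetical helper, and the pattern is assumed to be the phone pattern from this section. It should produce the same ddd.ddd.dddd output as fmt.

```cpp
#include <regex>
#include <string>

using std::regex_constants::format_sed;   // using declaration, as above

// Hypothetical helper: same reformatting, written with sed-style references.
std::string reformat_sed(const std::string &s)
{
    static const std::regex r(
        "(\\()?(\\d{3})(\\))?([-. ])?(\\d{3})([-. ])?(\\d{4})");
    // the format string the library sees is \2.\5.\7
    return std::regex_replace(s, r, "\\2.\\5.\\7", format_sed);
}
```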

Using Format Flags

By default, regex_replace outputs its entire input sequence. The parts that don’t match the regular expression are output without change; the parts that do match are formatted as indicated by the given format string. We can change this default behavior by specifying format_no_copy in the call to regex_replace:

// generate just the phone numbers: use a new format string
string fmt2 = "$2.$5.$7 "; // put space after the last number as a separator
// tell regex_replace to copy only the text that it replaces
cout << regex_replace(s, r, fmt2, format_no_copy) << endl;

Given the same input, this version of the program generates

201.555.0168 862.555.0123
973.555.0130
609.555.0132 201.555.0175 800.555.0100
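Here is a self-contained sketch of this format_no_copy version. So that the result can be checked, it reads the records from a string rather than from cin. The helper extract_numbers is hypothetical; fmt2 and the pattern come from this section.

```cpp
#include <regex>
#include <sstream>
#include <string>

using std::regex_constants::format_no_copy;

// Hypothetical helper: for each line of `text`, output only the reformatted
// phone numbers, then a newline.
std::string extract_numbers(const std::string &text)
{
    static const std::regex r(
        "(\\()?(\\d{3})(\\))?([-. ])?(\\d{3})([-. ])?(\\d{4})");
    const std::string fmt2 = "$2.$5.$7 ";  // space after each number
    std::istringstream in(text);
    std::string s, out;
    while (std::getline(in, s))
        out += std::regex_replace(s, r, fmt2, format_no_copy) + "\n";
    return out;
}
```

Because fmt2 ends with a space, each output line also ends with one trailing space after its last number.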


Exercises Section 17.3.4

Exercise 17.24: Write your own version of the program to reformat phone numbers.

Exercise 17.25: Rewrite your phone program so that it writes only the first phone number for each person.

Exercise 17.26: Rewrite your phone program so that it writes only the second and subsequent phone numbers for people with more than one phone number.

Exercise 17.27: Write a program that reformats a nine-digit zip code as ddddd-dddd.

