Shun those studies in which the work that results dies with the worker.
The Notebooks
LEONARDO DA VINCI
Suppose that you want to scan a text file encoded in HTML and extract all the code snippets. Each snippet begins with “<CODE>” and ends with “</CODE>”. The two markers are not case sensitive. In Chapter 17, we looked at a regular expression to recognize these snippets:
const char *expr = "<CODE>";
Now we need to enhance that expression so that it requires both markers and captures the text between them. To do that, we add the second marker and a capture group to hold the text between the markers:
const char *expr = "<CODE>(.*)</CODE>";
After a successful match, the capture group will hold the text that was found between the two markers. To look at that text, we need to pass a match_results object to regex_search. If it finds a match, regex_search fills in the match_results object with details of the capture groups. The template match_results has a member operator[](size_type n) that returns a reference to a sub_match object, which, in turn, holds the information about the nth capture group. In this case, we’re interested in the first capture group, so after the search, we need to look at match[1].
Example 18.1. Searching for Code Snippets (regexres/snippets.cpp)
#include <regex>
#include <iostream>
#include <fstream>
#include <string>
#include <stdlib.h>
using std::tr1::regex; using std::tr1::regex_search;
using std::tr1::smatch;
using std::string; using std::ifstream; using std::cout;
static void show_matches(const char *fname)
{ // scan file named by fname, line by line
  ifstream input(fname);
  string str;
  smatch match;
  const char *expr = "<CODE>(.*)</CODE>";
  regex rgx(expr, regex::icase);
  while (getline(input, str))
    { // check line for match; match[1] holds the captured text
    if (regex_search(str, match, rgx))
      cout << match[1] << '\n';
    }
}
int main(int argc, char *argv[])
{ // search for code snippets in text file
  if (argc != 2)
    { // wrong number of arguments
    cout << "Usage: snippets <filename>\n";
    return EXIT_FAILURE;
    }
  try
    { // search the file
    show_matches(argv[1]);
    }
  catch(...)
    { // something went wrong
    cout << "Error\n";
    return EXIT_FAILURE;
    }
  return 0;
}
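One limitation of Example 18.1 is easier to see in code. The repetition “.*” is greedy. When one line holds two snippets, capture group 1 therefore runs from the first “<CODE>” to the last “</CODE>”, and regex_search reports only that one match per line. The following sketch assumes a C++11 compiler, where these components live in namespace std rather than std::tr1. It contrasts the greedy expression with the non-greedy “.*?” that the ECMAScript grammar accepts. It also uses std::sregex_iterator to collect every snippet on a line. The helper names first_capture and all_snippets are hypothetical.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns capture group 1 of the first match of expr
// in line (case-insensitive), or an empty string if nothing matches.
std::string first_capture(const std::string& line, const std::string& expr)
{
    std::regex rgx(expr, std::regex::ECMAScript | std::regex::icase);
    std::smatch match;
    return std::regex_search(line, match, rgx) ? match[1].str()
                                               : std::string();
}

// Hypothetical helper: collects capture group 1 of every non-overlapping
// snippet on the line, using a non-greedy repetition.
std::vector<std::string> all_snippets(const std::string& line)
{
    std::regex rgx("<CODE>(.*?)</CODE>",
                   std::regex::ECMAScript | std::regex::icase);
    std::vector<std::string> result;
    for (std::sregex_iterator it(line.begin(), line.end(), rgx), end;
         it != end; ++it)
        result.push_back((*it)[1].str());
    return result;
}
```

For the line “<CODE>a</CODE> and <code>b</code>”, the greedy expression captures “a</CODE> and <code>b”. The non-greedy version captures “a”, and all_snippets returns both “a” and “b”. Snippets that span several lines would still need another approach, because the example reads one line at a time.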
This code works because the two families of search functions discussed in Chapter 17, regex_match and regex_search, have additional overloads that provide more detailed information. They report the range of characters in the target sequence that matched the regular expression and the ranges of characters that matched each capture group. To get this additional information, you pass a match_results object to any of the versions of regex_match or regex_search, immediately before the regular expression object.
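The following sketch shows the argument order, again assuming C++11's std::regex in place of std::tr1. In each call, the match_results object sits immediately before the regular expression, whether the target is an iterator pair, a null-terminated string, or a std::string. The call without a match_results object only answers yes or no. The function name demo_overloads is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: runs three regex_search overloads that take a
// match_results object and joins the text of capture group 1 from each.
std::string demo_overloads()
{
    std::regex rgx("<CODE>(.*)</CODE>",
                   std::regex::ECMAScript | std::regex::icase);
    const char *cstr = "x<code>int i;</code>y";
    std::string str(cstr);

    // Without a match_results argument, only a yes/no answer comes back.
    if (!std::regex_search(str, rgx))
        return "";

    std::cmatch m1;   // target given as an iterator pair [first, last)
    std::regex_search(cstr, cstr + str.size(), m1, rgx);

    std::cmatch m2;   // target given as a null-terminated string
    std::regex_search(cstr, m2, rgx);

    std::smatch m3;   // target held in a std::string that outlives m3
    std::regex_search(str, m3, rgx);

    return m1[1].str() + "|" + m2[1].str() + "|" + m3[1].str();
}
```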
<regex> Partial Synopsis
In this chapter we look at the details of the class template sub_match, which identifies a matching subsequence, and the class template match_results, which holds a set of sub_match objects that, together, identify all matching subsequences from a search. Then we look again at the function templates regex_match and regex_search to see how to use match_results objects with them. In particular, we look at the following new components of the header <regex>:
namespace std { namespace tr1 {
  // CLASS TEMPLATE sub_match
template<class BidIt>
  class sub_match;
typedef sub_match<const char*> csub_match;
typedef sub_match<const wchar_t*> wcsub_match;
typedef sub_match<string::const_iterator> ssub_match;
typedef sub_match<wstring::const_iterator> wssub_match;
  // COMPARISON OPERATORS FOR sub_match
template<class BidIt>
  bool operator==(
    const sub_match<BidIt>&, const sub_match<BidIt>&);
  // also operator!=, operator<, operator<=, operator>, operator>=
template<class BidIt /* maybe more */>
  bool operator==(various types, const sub_match<BidIt>&);
  // also operator!=, operator<, operator<=, operator>, operator>=
template<class BidIt /* maybe more */>
  bool operator==(const sub_match<BidIt>&, various types);
  // also operator!=, operator<, operator<=, operator>, operator>=
  // CLASS TEMPLATE match_results
template <class BidIt,
  class Alloc = allocator<sub_match<BidIt> > >
  class match_results;
typedef match_results<const char*> cmatch;
typedef match_results<const wchar_t*> wcmatch;
typedef match_results<string::const_iterator> smatch;
typedef match_results<wstring::const_iterator> wsmatch;
  // FUNCTION TEMPLATE swap FOR match_results
template <class BidIt, class Alloc>
  void swap(match_results<BidIt, Alloc>& left,
    match_results<BidIt, Alloc>& right) throw();
  // COMPARISON OPERATORS FOR match_results
template<class BidIt, class Alloc>
  bool operator==(const match_results<BidIt, Alloc>&,
    const match_results<BidIt, Alloc>&);
template<class BidIt, class Alloc>
  bool operator!=(const match_results<BidIt, Alloc>&,
    const match_results<BidIt, Alloc>&);
} }
sub_match Class Template
A sub_match object holds a Boolean value named matched that is true if the sub_match object points to a character sequence that was part of a successful match. In that case, its two iterator members, first and second, point to the beginning of the sequence and one past the end of the sequence, respectively. That is, given a sub_match object sub, if sub.matched is true, the half-open sequence [sub.first, sub.second) delimits the matching character sequence. Your code can create sub_match objects, but ordinarily, you’ll use the ones contained in a match_results object.
template<class BidIt>
  class sub_match : public std::pair<BidIt, BidIt> {
public:
  bool matched;
  difference_type length() const;
  basic_string<value_type> str() const;
  operator basic_string<value_type>() const;
  int compare(const sub_match& right) const;
  int compare(const basic_string<value_type>& right) const;
  int compare(const value_type *right) const;
  typedef BidIt iterator;
  typedef typename iterator_traits<BidIt>::value_type
    value_type;
  typedef typename iterator_traits<BidIt>::difference_type
    difference_type;
  };
The template argument BidIt must be a type that meets the requirements for a bidirectional iterator. Ordinarily, this argument comes from the template match_results that holds the sub_match objects, so as long as you provide a bidirectional iterator type to match_results, this requirement will be satisfied.
The class template sub_match<BidIt> is derived from std::pair<BidIt, BidIt>. This base class provides the two members, first and second, that hold the two iterator values. The class template also has a Boolean member, matched, that holds true if the iterators point to a character sequence that was part of a successful match. That sequence can be empty (that is, first and second are equal) for a zero-length match. The sequence will also be empty if the corresponding capture group was not part of a successful match. In this case, the member matched will hold the value false, and the members first and second will point to the end of the target sequence.
A zero-length match can occur when a capture group consists solely of an assertion or of a repetition that allows zero repeats. For example:
• “^” matches the target sequence “”. The sub_match object that designates the full match holds two iterators that both point to the first position in the target sequence, and its member matched holds true.
• “a(b*)a” matches the target sequence “aa”. The sub_match object that designates the capture group holds iterators that both point to the second character in the target sequence, and its member matched holds true.
• “(a)|b” matches the target sequence “b”. The capture group is not part of the match. The sub_match object that designates the capture group holds iterators that point to the end of the target sequence (and thus compare equal), and its member matched holds false.
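The three cases in the list above can be checked directly. This sketch assumes C++11's std::regex. The function names are hypothetical, and each one returns true when the sub_match behaves as the corresponding bullet describes. For the group that did not participate, it tests matched and length() instead of comparing raw iterators.

```cpp
#include <regex>

// Case 1: "^" matches the empty target; the full match is empty but matched.
bool case_caret()
{
    std::regex rgx("^");
    std::cmatch m;
    return std::regex_match("", m, rgx)
        && m[0].matched && m[0].length() == 0;
}

// Case 2: "a(b*)a" matches "aa"; group 1 is an empty match at offset 1.
bool case_empty_group()
{
    std::regex rgx("a(b*)a");
    std::cmatch m;
    return std::regex_match("aa", m, rgx)
        && m[1].matched && m[1].length() == 0 && m.position(1) == 1;
}

// Case 3: "(a)|b" matches "b"; group 1 did not participate at all.
bool case_unmatched_group()
{
    std::regex rgx("(a)|b");
    std::cmatch m;
    return std::regex_match("b", m, rgx)
        && !m[1].matched && m[1].length() == 0;
}
```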
Several of the member functions of sub_match<BidIt> take arguments or return objects of type basic_string<value_type>. As we’ll see, value_type is a typedef for the character type that the iterators point to, so basic_string<value_type> is a basic_string object that holds characters of that type. When the text you’re searching consists of ordinary char objects, basic_string<value_type> is basic_string<char>, or, more simply, string.
typedef BidIt iterator;
typedef typename iterator_traits<BidIt>::value_type
value_type;
typedef typename iterator_traits<BidIt>::difference_type
difference_type;
The first type, iterator, is a synonym for the template argument BidIt. The second and third types are synonyms for the iterator type’s associated value_type and difference_type, respectively.
These type names can be convenient when you need to peer into the contents of the matching text. The type name iterator names the type of the iterators that the sub_match type holds; value_type is the character type that the iterators point to; and difference_type can hold the difference between two iterator values. For example:
typedef std::tr1::sub_match<const char*> csub;
csub::iterator iter; // iter has type const char*
csub::value_type ch; // ch has type char
csub::difference_type d; // d has type std::ptrdiff_t
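The following sketch assumes C++11, where the same names are available as std::csub_match and related typedefs. The static_assert lines confirm the types described in the comments above. The hypothetical function template count_char walks [first, second) using only the nested names, so it works for any sub_match type.

```cpp
#include <cstddef>
#include <regex>
#include <type_traits>

typedef std::csub_match csub;   // same as std::sub_match<const char*>

static_assert(std::is_same<csub::iterator, const char*>::value,
              "iterator is const char*");
static_assert(std::is_same<csub::value_type, char>::value,
              "value_type is char");
static_assert(std::is_same<csub::difference_type, std::ptrdiff_t>::value,
              "difference_type is std::ptrdiff_t");

// Hypothetical helper: counts occurrences of ch in the matched text,
// relying only on the nested type names of the sub_match type.
template <class SubMatch>
typename SubMatch::difference_type
count_char(const SubMatch& sm, typename SubMatch::value_type ch)
{
    typename SubMatch::difference_type n = 0;
    for (typename SubMatch::iterator it = sm.first; it != sm.second; ++it)
        if (*it == ch)
            ++n;
    return n;
}
```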
bool matched;
BidIt first; // inherited from pair
BidIt second; // inherited from pair
If the capture group corresponding to the sub_match object was part of a successful match, the member matched holds true, and the members first and second designate the character range in the target sequence that matched the capture group. If the capture group was not part of a successful match, the member matched holds false, and the members first and second point to the end of the target sequence.
A newly constructed sub_match object has not been part of a successful match, so its matched member will hold false. As we’ll see later, a call to a search algorithm that doesn’t find a match leaves the sub_match objects in a match_results object in an unspecified state, so you cannot count on any particular pattern of values when a search fails. If a search succeeds, the member matched in each sub_match object that was part of the match holds true, and the member matched in each sub_match object that was not part of the match holds false.
Example 18.2. Objects of type sub_match (regexres/subobjects.cpp)
#include <regex>
#include <algorithm>
#include <iomanip>
#include <iostream>
#include <iterator>
#include <string>
using std::tr1::regex; using std::tr1::regex_match;
using std::tr1::match_results; using std::tr1::sub_match;
using std::copy;
using std::ostream_iterator; using std::string;
using std::cout; using std::setw;
template <class BidIt>
void show(const char *title, const sub_match<BidIt>& sm)
{ // show the text that sm designates, or report no match
  typedef typename sub_match<BidIt>::value_type MyTy;
  cout << setw(20) << title << ": ";
  if (sm.matched)
    copy(sm.first, sm.second,
      ostream_iterator<MyTy>(cout));
  else
    cout << "[no match]";
  cout << '\n';
}
int main()
{
  regex rgx("(a+)|(b+)");
  string tgt("bbb");
  match_results<string::iterator> match;
  show("no search", match[0]);
  if (!regex_match(tgt.begin(), tgt.end(), match, rgx))
    cout << "search failed\n";
  else
    { // search succeeded, capture group 1 not part of match
    show("full match", match[0]);
    show("capture group 1", match[1]);
    show("capture group 2", match[2]);
    }
  return 0;
}
In this example, the expression match[0] returns a reference to the sub_match object that represents the full match, and match[1] and match[2] return references to the sub_match objects that represent the subsequences that matched the first and second capture groups, respectively.
difference_type length() const;
The member function returns 0 if the member matched holds false; otherwise, it returns distance(first, second).
This function returns the number of characters in the matching sequence delimited by [first, second), and it returns 0 if the corresponding capture group was not part of the match. It also returns 0 for a zero-length match, so don’t use its return value to distinguish between those two cases; use the member matched instead.
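Because length() returns 0 in both situations, code that reports on a capture group has to check matched first. A minimal sketch, assuming C++11's std::regex; describe is a hypothetical name.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: distinguishes "not part of the match" from
// "matched, but zero characters long", which length() alone cannot do.
std::string describe(const std::csub_match& sm)
{
    if (!sm.matched)
        return "not matched";
    if (sm.length() == 0)
        return "empty match";
    return "matched " + std::to_string(sm.length()) + " chars";
}
```

Applied to the result of matching “a(b*)a|(c)” against “aa”, describe reports “empty match” for group 1, “not matched” for group 2, and “matched 2 chars” for the full match.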
basic_string<value_type> str() const;
operator basic_string<value_type>() const;
The first member function returns an empty string object if matched holds false; otherwise, it returns basic_string<value_type>(first, second). The second member function returns str().
These member functions convert the matching sequence into a basic_string object. This will often be more convenient than using the raw iterators first and second. Here’s the previous example, with the function show rewritten to use str().
Example 18.3. String Conversions (regexres/strings.cpp)
#include <regex>
#include <iomanip>
#include <iostream>
#include <string>
using std::tr1::regex; using std::tr1::regex_match;
using std::tr1::match_results; using std::tr1::sub_match;
using std::string;
using std::cout; using std::setw;
template <class BidIt>
void show(const char *title, const sub_match<BidIt>& sm)
{ // show the text that sm designates, converted with str()
  cout << setw(20) << title << ": ";
  if (sm.matched)
    cout << sm.str() << '\n';
  else
    cout << "[no match]\n";
}
int main()
{
  regex rgx("(a+)|(b+)");
  string tgt("bbb");
  match_results<string::iterator> m;
  show("no search", m[0]);
  if (!regex_match(tgt.begin(), tgt.end(), m, rgx))
    cout << "search failed\n";
  else
    { // search succeeded, capture group 1 not part of match
    show("full match", m[0]);
    show("capture group 1", m[1]);
    show("capture group 2", m[2]);
    }
  return 0;
}
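The conversion operator lets a sub_match appear anywhere a string is expected, and a group that took no part in the match converts to an empty string. This sketch assumes C++11's std::regex and reuses the expression from Example 18.3; join_groups is a hypothetical name.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: joins capture groups 1 and 2 with '/'.
std::string join_groups(const std::string& target)
{
    std::regex rgx("(a+)|(b+)");
    std::smatch m;
    if (!std::regex_match(target, m, rgx))
        return "no match";
    std::string first = m[1];         // implicit conversion via the operator
    std::string second = m[2].str();  // explicit conversion via str()
    return first + "/" + second;      // an unmatched group contributes ""
}
```

With these definitions, join_groups("bbb") returns “/bbb”, and join_groups("aa") returns “aa/”.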
int compare(const sub_match& right) const;
int compare(const basic_string<value_type>& right) const;
int compare(const value_type *right) const;
The first member function returns str().compare(right.str()). The second and third member functions return str().compare(right).
That is, these functions do a lexicographical comparison of the matched sequence and their argument,[1] returning a negative value if the matched sequence comes before the argument, zero if they are equal, and a positive value if the matched sequence comes after the argument.
Example 18.4. The compare Member Functions (regexres/compare.cpp)
#include <regex>
#include <iostream>
using std::tr1::regex; using std::tr1::regex_match;
using std::tr1::csub_match; using std::tr1::cmatch;
using std::cout;
static const char *blocked_sites[] =
{ // block list; any resemblance between the names here
  // and real URLs is probably accidental
  "www.idontwantmykidshere.com",
  "www.lotsofxxxstuff.com",
  "www.nra.org"
};
const int nsites = sizeof(blocked_sites)
  / sizeof(*blocked_sites);
bool allow(const csub_match& match)
{ // return false if match is on the blocked list
  for (int i = 0; i < nsites; ++i)
    if (match.compare(blocked_sites[i]) == 0)
      return false;
  return true;
}
bool check_url(const char *url)
{ // return false if URL is not a valid HTTP URL or
  // if the hostname is on the blocked list
  regex rgx("http://([^/: ]+)(:(\\d+))?(/.*)?");
  cmatch match;
  return regex_match(url, match, rgx) && allow(match[1]);
}
void connect(const char *url)
{ // connect to valid, unblocked URL
  if (check_url(url))
    {
    cout << "Okay to connect: " << url << '\n';
    // remainder of connection code left as exercise for the reader
    }
  else
    cout << "Invalid or blocked URL: " << url << '\n';
}
int main()
{ // connect to a couple of sites
  connect("http://www.xxx.com/risque/index.html");
  connect("http://www.petebecker.com/tr1book");
  connect("http:/invalid, for many reasons");
  return 0;
}
In this example, I simplified the code by using some of the built-in typedefs instead of the full names of the template instantiations. We’ll look at these typedefs later. For now, cmatch is a synonym for match_results<const char*>, which is the appropriate type to hold the results of a search through an array of char. An object of type cmatch, in turn, holds objects of type sub_match<const char*>; the synonym for that one is csub_match.
The function allow does a linear search of the block list to see whether the hostname passed to it is on the list. The function check_url checks whether its argument is a valid HTTP URL and, if so, extracts the hostname and calls allow.[2]
template<class BidIt>
bool operator==(const sub_match<BidIt>& left,
const sub_match<BidIt>& right);
// also operator!=, operator<, operator<=, operator>, operator>=
template<class BidIt /* maybe more */>
bool operator==(
various types left, const sub_match<BidIt>& right);
// also operator!=, operator<, operator<=, operator>, operator>=
template<class BidIt /* maybe more */>
bool operator==(
const sub_match<BidIt>& left, various types right);
// also operator!=, operator<, operator<=, operator>, operator>=
Each function template operator== returns true only if the argument left designates the same characters, in the same order, as the argument right.
Each function template operator!=(left, right) returns !(left == right).
Each function template operator< returns true only if the argument left designates a sequence of characters that lexicographically precedes the sequence of characters designated by the argument right.
Each function template operator<=(left, right) returns !(right < left).
Each function template operator>(left, right) returns right < left.
Each function template operator>=(left, right) returns !(left < right).
In addition to the overloaded member functions named compare, there’s a long list of operators for comparing sub_match objects to various representations of character sequences. Rather than list all six comparison operators for each pair of types,[3] the preceding synopsis gives the declaration for operator==. The remaining five operators are all declared in the obvious way.
The argument types referred to as various types can be any of the following, where Ty is iterator_traits<BidIt>::value_type:
• An object of type basic_string<Ty, Traits, Alloc>
• A pointer of type const Ty* that points to a null-terminated character sequence
• An object of type Ty, that is, a single character
That is, you can compare a sub_match<BidIt> object to another sub_match<BidIt> object, to a basic_string object that holds the same character type, to a null-terminated character string, and to a single character. Of course, the sub_match<BidIt> object can be on either side of the comparison.
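The following sketch applies each kind of comparison to the two capture groups of a small match, with the sub_match on either side. It assumes C++11's std::regex. The function comparison_mask and its bit assignments are hypothetical; they are chosen so that a result of 31 means every comparison held.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: sets one bit for each comparison that holds on the
// capture groups of "(\w+)-(\w)" applied to "abc-x".
int comparison_mask()
{
    std::regex rgx("(\\w+)-(\\w)");
    std::cmatch m;
    if (!std::regex_match("abc-x", m, rgx))
        return -1;
    int mask = 0;
    if (m[1] == "abc")              mask |= 1;   // sub_match vs C string
    if (std::string("abc") == m[1]) mask |= 2;   // basic_string on the left
    if (m[2] == 'x')                mask |= 4;   // sub_match vs one character
    if (m[1] < m[2])                mask |= 8;   // "abc" precedes "x"
    if ("abd" > m[1])               mask |= 16;  // C string on the left
    return mask;
}
```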
Example 18.5. Comparison Operators (regexres/operators.cpp)
#include <regex>
#include <iostream>
using std::tr1::regex; using std::tr1::regex_match;
using std::tr1::csub_match; using std::tr1::cmatch;
using std::cout;
static const char *blocked_sites[] =
{ // block list; any resemblance between the names here
  // and real URLs is probably accidental
  "www.idontwantmykidshere.com",
  "www.lotsofxxxstuff.com",
  "www.nra.org"
};
const int nsites = sizeof(blocked_sites)
  / sizeof(*blocked_sites);
bool allow(const csub_match& match)
{ // return false if match is on the blocked list
  for (int i = 0; i < nsites; ++i)
    if (match == blocked_sites[i])
      return false;
    else if (match < blocked_sites[i])
      return true;   // list is sorted; no later entry can match
  return true;
}
bool check_url(const char *url)
{ // return false if URL is not a valid HTTP URL or
  // if the hostname is on the blocked list
  regex rgx("http://([^/:]+)(:(\\d+))?(/.*)?");
  cmatch match;
  return regex_match(url, match, rgx) && allow(match[1]);
}
void connect(const char *url)
{ // connect to valid, unblocked URL
  if (check_url(url))
    {
    cout << "Okay to connect: " << url << '\n';
    // remainder of connection code left as exercise for the reader
    }
  else
    cout << "Invalid or blocked URL: " << url << '\n';
}
int main()
{ // connect to a couple of sites
  connect("http://www.xxx.com/risque/index.html");
  connect("http://www.petebecker.com/tr1book");
  connect("http:/invalid, for many reasons");
  return 0;
}
This example is a lot like the previous one, with two differences, both in the function allow. First, this example uses operator== to check whether the hostname is in the blocked list. Second, it uses operator< to take advantage of the list’s being in alphabetical order, cutting the linear search short when it reaches a name that comes after the target hostname.
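Since the list is sorted, the early exit in allow can be taken one step further with a binary search. This is an illustrative alternative, not part of the example. It assumes C++11 and copies the example's made-up block list. It passes std::lower_bound a comparison built on the member function compare, so no temporary strings are created. The name allow_sorted is hypothetical.

```cpp
#include <algorithm>
#include <regex>

// Illustrative block list; it must stay sorted for lower_bound to work.
static const char *const blocked_sites[] = {
    "www.idontwantmykidshere.com",
    "www.lotsofxxxstuff.com",
    "www.nra.org"
};

// Hypothetical alternative to allow: binary search instead of linear scan.
bool allow_sorted(const std::csub_match& host)
{
    const char *const *first = blocked_sites;
    const char *const *last = blocked_sites
        + sizeof(blocked_sites) / sizeof(*blocked_sites);
    // The comparison answers "does this list entry precede host?"
    const char *const *pos = std::lower_bound(first, last, host,
        [](const char *site, const std::csub_match& h)
        { return h.compare(site) > 0; });
    return pos == last || host.compare(*pos) != 0;
}
```

For three names the difference is negligible. A binary search pays off only for a long list, and it depends on the list staying sorted.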
sub_match Types
typedef sub_match<const char*> csub_match;
typedef sub_match<const wchar_t*> wcsub_match;
typedef sub_match<std::string::const_iterator> ssub_match;
typedef sub_match<std::wstring::const_iterator> wssub_match;
The four names are synonyms for the most commonly used sub_match types. Keep in mind that the template argument to sub_match must be the iterator type associated with the target text that was passed to regex_match or regex_search. When the target text was passed as a char* or wchar_t* (const or otherwise), the associated iterator types are const char* and const wchar_t*, respectively. When the target text is held in a string or wstring object, the associated iterator type is the string type’s nested name const_iterator.
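The pairing between the match_results typedefs and the sub_match typedefs can be checked at compile time. This sketch assumes C++11. In C++14 and later, the overloads of regex_match and regex_search that take a temporary std::string together with a match_results object are deleted, because the stored iterators would dangle. The hypothetical helper word_after therefore searches a string that outlives the match_results object.

```cpp
#include <regex>
#include <string>
#include <type_traits>

// Each match_results typedef holds elements of the matching sub_match typedef.
static_assert(std::is_same<std::cmatch::value_type, std::csub_match>::value, "");
static_assert(std::is_same<std::wcmatch::value_type, std::wcsub_match>::value, "");
static_assert(std::is_same<std::smatch::value_type, std::ssub_match>::value, "");
static_assert(std::is_same<std::wsmatch::value_type, std::wssub_match>::value, "");

// A std::string target is traversed with string::const_iterator.
static_assert(std::is_same<std::ssub_match::iterator,
                           std::string::const_iterator>::value, "");

// Hypothetical helper: text refers to a string that outlives m, so the
// iterators stored in m stay valid while group 1 is copied out.
std::string word_after(const std::string& text)
{
    std::regex rgx("key=(\\w+)");
    std::smatch m;
    return std::regex_search(text, m, rgx) ? m[1].str() : std::string();
}
```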
Example 18.6. Predefined sub_match Types (regexres/predefined.cpp)
#include <regex>
#include <iostream>
#include <string>
using std::tr1::regex; using std::tr1::wregex;
using std::tr1::regex_match;
using std::tr1::cmatch; using std::tr1::smatch;
using std::tr1::wcmatch; using std::tr1::wsmatch;
using std::tr1::csub_match; using std::tr1::ssub_match;
using std::tr1::wcsub_match; using std::tr1::wssub_match;
using std::string; using std::wstring;
using std::cout;
static void show(...)
{ // called with unknown type
  cout << "Called with unknown argument type\n";
}
static void show(csub_match match)
{ // called with csub_match argument
  cout << "Called show(csub_match)\n";
}
static void show(wcsub_match match)
{ // called with wcsub_match argument
  cout << "Called show(wcsub_match)\n";
}
static void show(ssub_match match)
{ // called with ssub_match argument
  cout << "Called show(ssub_match)\n";
}
static void show(wssub_match match)
{ // called with wssub_match argument
  cout << "Called show(wssub_match)\n";
}
int main()
{ // show sub_match types for various match_results types
  regex rgx("abc");
  cmatch match0;
  if (regex_match("abc", match0, rgx))
    show(match0[0]);
  string str("abc");  // named object keeps match1's iterators valid
  smatch match1;
  if (regex_match(str, match1, rgx))
    show(match1[0]);
  wregex wrgx(L"abc");
  wcmatch match2;
  if (regex_match(L"abc", match2, wrgx))
    show(match2[0]);
  wstring wstr(L"abc");  // named object keeps match3's iterators valid
  wsmatch match3;
  if (regex_match(wstr, match3, wrgx))
    show(match3[0]);
  return 0;
}
match_results Class Template
The class template match_results is a nonmodifiable container.[4] It holds the results of a successful match found by a call to regex_match or regex_search. Typically, your code will create a match_results<BidIt> object, with the type BidIt being an iterator of the same type as the iterator for the target text. For example, when the target text is passed as a const char*, use match_results<const char*>. When the target text is passed as a standard string object, use match_results<string::const_iterator>.
template <class BidIt,
  class Alloc = allocator<sub_match<BidIt> > >
  class match_results {
public:
explicit match_results(const Alloc& alloc = Alloc());
match_results (const match_results & right);
match_results& operator=(const match_results& right);
void swap(match_results& other) throw();
const_reference operator[](size_type sub) const;
difference_type position(size_type sub = 0) const;
difference_type length(size_type sub = 0) const;
string_type str(size_type sub = 0) const;
const_reference prefix() const;
const_reference suffix() const;
const_iterator begin() const;
const_iterator end() const;
template<class OutIt>
OutIt format(OutIt out,
const string_type& fmt,
match_flag_type flags = format_default) const;
string_type format(const string_type& fmt,
match_flag_type flags = format_default) const;
size_type size() const;
size_type max_size() const;
bool empty() const;
allocator_type get_allocator() const;
typedef sub_match<BidIt> value_type;
typedef const typename Alloc::const_reference
const_reference;
typedef const_reference reference;
typedef T0 const_iterator;
typedef const_iterator iterator;
typedef typename iterator_traits<BidIt>::difference_type
difference_type;
typedef typename Alloc::size_type size_type;
typedef Alloc allocator_type;
typedef typename iterator_traits<BidIt>::value_type
char_type;
typedef basic_string<char_type> string_type;
};
The template takes two type arguments. The first, listed here as BidIt, must be a bidirectional iterator of the same type as you’re going to use to point to the target text. The second is an allocator type. An object of this type is stored in the match_results object and is used to manage the memory needed to hold the various sub_match objects that hold the details of a successful match. The default allocator type is an instance of the allocator template from the standard library.
Objects of type match_results<BidIt> can be created, copied, assigned, and swapped. These operations are discussed in Section 18.4.1. After a successful search, you can examine capture groups individually with the member functions position, length, str, and operator[], and you can look at the part of the target text that preceded or followed the matching text with the member functions prefix and suffix. These are discussed in Section 18.4.2. Because a match_results object is a container, you can call the member functions begin and end to get a pair of iterators that designate a half-open sequence of sub_match objects, as discussed in Section 18.4.3. You can also ask about the number of elements in the container with the member functions size, max_size, and empty, and you can get a copy of the container’s allocator with get_allocator. These functions are discussed in Section 18.4.4. Like all containers, the template defines several nested type names, described in Section 18.4.5. The library provides two operators to compare match_results<BidIt> objects for equality (Section 18.4.6) and four typedef names that provide synonyms for commonly used match_results instances (Section 18.4.7). Finally, two member functions can be used to produce formatted text by replacing various parts of the target text. These are discussed in Chapter 20, which covers formatting and text replacement.
match_results Objects
explicit match_results::match_results(
  const Alloc& alloc = Alloc());
The constructor constructs a match_results object that holds a copy of the argument alloc and no elements.
Thus, after constructing an object with this constructor, the member function size returns 0, and the member function str returns an empty string.
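The following sketch checks those postconditions against C++11, where match_results also gained a ready() member. That member reports whether a search has filled in the object. The C++11 standard makes ready() == true a precondition of accessors such as str() and operator[], so the sketch sticks to size(), empty(), and ready(). The helper default_state_mask and its bit assignments are hypothetical.

```cpp
#include <regex>

// Hypothetical helper: returns a bit mask describing a match_results object
// before and after a successful search (7 means all checks held).
int default_state_mask()
{
    std::cmatch m;                    // default constructor: no elements
    int mask = 0;
    if (m.size() == 0 && m.empty())
        mask |= 1;
    if (!m.ready())                   // no search has filled it in yet
        mask |= 2;
    std::regex rgx("b(c*)d");
    if (std::regex_search("abcccde", m, rgx) && m.ready() && m.size() == 2)
        mask |= 4;                    // full match plus one capture group
    return mask;
}
```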
match_results::match_results(const match_results& right);
match_results& match_results::operator=(
const match_results& right);
The copy constructor constructs an object that is a copy of its argument. The assignment operator replaces the object’s controlled sequence with a copy of its argument’s controlled sequence.
void match_results::swap(
  match_results& other) throw();
template<class BidIt, class Alloc>
  void swap(match_results<BidIt, Alloc>& left,
    match_results<BidIt, Alloc>& right) throw()
  { // swap left and right
  left.swap(right);
  }
The member function swaps the object’s controlled sequence with its argument’s controlled sequence and does not throw exceptions. The non-member function calls left.swap(right).
Example 18.7. Constructors and Modifiers for match_results (regexres/modify.cpp)
#include <regex>
#include <iostream>
#include <stdlib.h>
using std::tr1::regex; using std::tr1::match_results;
using std::tr1::regex_search;
using std::cout;
typedef match_results<const char *> mtch;
static void show(const char *title, const mtch& match)
{ // summarize match_results object
  cout << title << ":\n";
  cout << "size:" << match.size() << '\n';
  cout << "contents: `" << match.str() << "`\n";
}
int main()
{ // demonstrate various constructors and modifiers
  mtch match;
  show("after default constructor", match);
  regex rgx("b(c*)d");
  const char *tgt = "abcccde";
  mtch match1;
  if (!regex_search(tgt, match1, rgx))
    return EXIT_FAILURE;
  show("after successful search", match1);
  mtch match2(match1);
  show("after copy construction", match2);
  match.swap(match1);
  show("after swap", match);
  swap(match, match1);
  show("after another swap", match);
  match = match2;
  show("after assignment", match);
  return 0;
}
const_reference
match_results::operator[](size_type n) const;
The operator returns a reference to the nth element in the controlled sequence, or a reference to an empty sub_match object if size() <= n or if the nth capture group was not part of the match.
The 0th element of the controlled sequence is a sub_match object that delineates the entire text that matched the regular expression. Succeeding elements delineate the text that matched the corresponding capture group. If a capture group was not part of the match or if n is larger than the number of capture groups, the sub_match object is empty; these sub_match objects are not required to be distinct.
difference_type position(size_type n = 0) const;
The member function returns distance(prefix().first, (*this)[n].first).
That is, it returns the offset of the beginning of the text that matches capture group n from the beginning of the target text.
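The following sketch shows how position, length, and str relate to the target text. It assumes C++11's std::regex, and summarize is a hypothetical helper. For the expression “b(c*)d” and the target “abcccde”, group 0 starts at offset 1 with length 5. Group 1 starts at offset 2 with length 3.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper: formats "position,length,text" for capture group n.
std::string summarize(const std::cmatch& m, std::size_t n)
{
    return std::to_string(m.position(n)) + ","
        + std::to_string(m.length(n)) + ","
        + m.str(n);
}
```

Adding position(n) to a pointer to the start of the target yields the same pointer that m[n].first holds.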
difference_type length(size_type n = 0) const;
The member function returns (*this)[n].length().
That is, it returns the number of characters in the nth capture group.
string_type str(size_type n = 0) const;
The member function returns string_type((*this)[n]).
That is, it returns an object of type string_type that holds a copy of the text of the nth capture group.
const_reference match_results::prefix() const;
const_reference match_results::suffix() const;
The first member function returns a reference to an object of type sub_match<BidIt> that points to the character sequence that begins at the start of the target sequence and ends at (*this)[0].first. The second member function returns a reference to an object of type sub_match<BidIt> that points to the character sequence that begins at (*this)[0].second and ends at the end of the target sequence.
That is, the two member functions return sub_match objects that point to the text that precedes and follows, respectively, the text that matched the regular expression.
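After a successful search, the prefix, the full match, and the suffix placed end to end reproduce the target text. The sketch below shows this, assuming C++11's std::regex. It uses a hypothetical helper, reassemble, that brackets the full match so the three pieces stay visible.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: rebuilds the target from prefix, full match, and
// suffix, marking the full match with brackets; returns "" if no match.
std::string reassemble(const std::string& target, const std::string& expr)
{
    std::regex rgx(expr);
    std::smatch m;
    if (!std::regex_search(target, m, rgx))
        return std::string();
    return m.prefix().str() + "[" + m[0].str() + "]" + m.suffix().str();
}
```

For the target “abcccde” and the expression “b(c*)d”, the result is “a[bcccd]e”.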
Example 18.8. Examining Contained Objects (regexres/examine.cpp)
#include <regex>
#include <iostream>
#include <stdlib.h>
using std::tr1::regex; using std::tr1::regex_search;
using std::tr1::match_results; using std::tr1::sub_match;
using std::cout;
typedef match_results<const char *> mtch;
static void show(int idx, const mtch& match)
{ // show contents of match[idx]
  cout << "match[" << idx << "]: "
    << (match[idx].matched ? "" : "not ")
    << "matched, `" << match.str(idx)
    << "` at offset " << match.position(idx)
    << ", with length " << match.length(idx) << '\n';
}
int main()
{ // demonstrate operator[]
  regex rgx("b(c*|(x))d");
  const char *tgt = "abcccde";
  mtch match;
  if (!regex_search(tgt, match, rgx))
    return EXIT_FAILURE;
  cout << "After search, size is "
    << match.size() << '\n';
  cout << "text preceding match is `"
    << match.prefix() << "`\n";
  for (int i = 0; i < (int)match.size() + 2; ++i)
    show(i, match);
  cout << "text following match is `"
    << match.suffix() << "`\n";
  return 0;
}
The output from this program shows that match holds three sub_match objects. The object returned by prefix() holds the text “a”, which is the text that preceded the matching text. The object returned by suffix() holds the text “e”, which is the text that followed the matching text. The object returned by match[0] holds the text “bcccd”, which is all the target text that matched the regular expression. The object returned by match[1] holds the text “ccc”, which is the part of the target text that matched the first capture group, “(c*|(x))”. The object returned by match[2] is empty because capture group 2, “(x)”, wasn’t part of the match. The objects returned by match[3] and match[4] are also empty because they refer to capture groups that don’t exist in the regular expression.
const_iterator match_results::begin() const;
const_iterator match_results::end() const;
The first member function returns a random access iterator that points to the first element of the controlled sequence or just beyond the end of an empty sequence. The second member function returns a random access iterator that points just beyond the end of the controlled sequence.
Note that the controlled sequence is the sequence of sub_match objects returned by calling operator[] with successive values from 0 to size() - 1. It does not include the sub_match objects returned by prefix or suffix unless those happen to be equal to one of the other sub_match objects, which occurs only with empty sub_match objects.
Example 18.9. Iterating Through an Object (regexres/iterate.cpp)
#include <regex>
#include <iostream>
#include <algorithm>
#include <iterator>
#include <stdlib.h>
using std::tr1::regex; using std::tr1::regex_search;
using std::tr1::match_results; using std::tr1::sub_match;
using std::cout; using std::ostream_iterator;
using std::copy;
typedef const char *iter;
typedef sub_match<iter> sub;
typedef match_results<iter> mtch;
namespace std { // add inserter to namespace std
template <class Elem, class Tr>
basic_ostream<Elem, Tr>& operator<<(
basic_ostream<Elem, Tr>& out, const sub & val)
{ // insert sub_match<iter> into stream, quoted with backquotes
return out << '`' << val.str() << '`';
}
}
int main()
{ // show each element of the controlled sequence
regex rgx("b(c*|(x))d");
const char *tgt = "abcccde";
mtch match;
if (!regex_search(tgt, match, rgx))
return EXIT_FAILURE;
copy(match.begin(), match.end(),
ostream_iterator <sub>(cout, "\n"));
return 0;
}
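Example 18.9 adds an inserter to namespace std; strictly speaking, the standard makes adding declarations to namespace std undefined behavior for user code. With a C++11 compiler the same traversal of the controlled sequence can be written as a range-based for loop over begin() and end(), with no custom inserter. This is a minimal sketch; sub_match_texts is a hypothetical name, and std::csub_match is the C++11 typedef for sub_match<const char *>.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collects the text of every element of the
// controlled sequence, match[0] through match[size() - 1].
// prefix() and suffix() are not part of that sequence, so they are skipped.
std::vector<std::string> sub_match_texts(const std::cmatch& match)
{
    std::vector<std::string> texts;
    for (const std::csub_match& sub : match)   // uses match.begin()/end()
        texts.push_back(sub.str());
    return texts;
}
```

For the search in Example 18.9, the collected texts are “bcccd”, “ccc”, and an empty string for the unmatched group 2; the prefix “a” and suffix “e” do not appear.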
size_type match_results::size() const;
size_type match_results::max_size() const;
bool match_results::empty() const { return size() == 0;}
The first member function returns the length of the controlled sequence. The second member function returns the length of the longest sequence that the object can control. The third member function returns true only if the length of the controlled sequence is 0.
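A short sketch of what size() and empty() report in practice, written against C++11 std::regex under the assumption that it behaves as the TR1 components described here. After a successful search, size() is one more than the number of capture groups (match[0] plus one element per group); after a failed search, and for a default-constructed object, size() is 0 and empty() is true. The helper names are hypothetical.

```cpp
#include <cstddef>
#include <regex>

// Hypothetical helper: searches text with rgx and reports match.size().
// Success: rgx.mark_count() + 1 elements. Failure: 0 elements.
std::size_t result_size(const char* text, const std::regex& rgx)
{
    std::cmatch match;
    std::regex_search(text, match, rgx);
    return match.size();
}

// Hypothetical helper: a match_results object that has not been
// passed to any search holds an empty controlled sequence.
bool fresh_object_is_empty()
{
    std::cmatch match;
    return match.empty() && match.size() == 0;
}
```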
allocator_type match_results::get_allocator() const;
The member function returns a copy of the stored allocator object.
typedef sub_match <BidIt> value_type;
typedef const typename Alloc::const_reference
const_reference;
typedef const_reference reference;
typedef T0 const_iterator;
typedef const_iterator iterator;
typedef typename iterator_traits<BidIt>::difference_type
difference_type;
typedef typename Alloc::size_type size_type;
typedef Alloc allocator_type;
typedef typename iterator_traits<BidIt>::value_type
char_type;
typedef basic_string<char_type> string_type;
The type names nested in match_results<BidIt, Alloc> are defined as follows:
• value_type: a synonym for sub_match<BidIt>
• const_reference: a description of an object that can serve as a reference to an unmodifiable element of the controlled sequence
• reference: a description of an object that can serve as a reference to an unmodifiable element of the controlled sequence
• const_iterator: a description of an object that can serve as a random-access iterator that points at unmodifiable elements of the controlled sequence
• iterator: a description of an object that can serve as a random-access iterator that points at unmodifiable elements of the controlled sequence
• difference_type: a synonym for iterator_traits<BidIt>::difference_type; it describes an object that can represent the difference between any two iterators that point at elements of the controlled sequence
• size_type: a synonym for Alloc::size_type
• allocator_type: a synonym for the template argument Alloc
• char_type: a synonym for iterator_traits<BidIt>::value_type, which is the element type of the character sequence that was searched
• string_type: a synonym for basic_string<char_type>
A match_results object satisfies the requirements for a sequence container[5] except that operations that modify the sequence are not supported. All but the last two nested types are required for a sequence container. The last two make it easier to talk about the contents of the character sequences that the container holds.
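The nested names can be checked at compile time. This sketch assumes a C++11 compiler and uses the std::cmatch and std::smatch typedefs, the C++11 counterparts of the TR1 names; it confirms a few of the definitions listed above, including that iterator is the same type as const_iterator and that the iterators are random access.

```cpp
#include <iterator>
#include <regex>
#include <string>
#include <type_traits>

// value_type is sub_match<BidIt>.
static_assert(std::is_same<std::cmatch::value_type,
                           std::sub_match<const char*>>::value,
              "value_type is sub_match<BidIt>");

// char_type is the element type of the searched text.
static_assert(std::is_same<std::cmatch::char_type, char>::value,
              "char_type is iterator_traits<BidIt>::value_type");

// string_type is basic_string<char_type>, here std::string.
static_assert(std::is_same<std::smatch::string_type, std::string>::value,
              "string_type is basic_string<char_type>");

// difference_type comes from iterator_traits<BidIt>.
static_assert(std::is_same<std::smatch::difference_type,
                  std::iterator_traits<
                      std::string::const_iterator>::difference_type>::value,
              "difference_type is iterator_traits<BidIt>::difference_type");

// The iterators into the controlled sequence are random access.
static_assert(std::is_base_of<std::random_access_iterator_tag,
                  std::iterator_traits<
                      std::cmatch::const_iterator>::iterator_category>::value,
              "const_iterator is a random-access iterator");
```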
match_results Objects
template <class BidIt, class Alloc>
bool operator==(
const match_results<BidIt , Alloc>& left,
const match_results<BidIt , Alloc>& right);
template <class BidIt , class Alloc>
bool operator!=(
const match_results<BidIt , Alloc>& left,
const match_results<BidIt , Alloc>& right)
{ return !(left == right);}
The first operator returns true only if left.size() == right.size() and equal(left.begin(), left.end(), right.begin()). The second operator returns true only if !(left == right).
These operators apply the usual definition of equality for container types: Two containers are equal if they hold the same number of elements and corresponding elements are equal.
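A sketch of element-by-element equality, assuming a C++11 compiler. As far as we know, the C++11 wording for std::match_results also compares prefix() and suffix() and treats objects that have not yet been used in a search specially, so the example sticks to cases where the TR1 rule and the C++11 rule give the same answer. The helper name search is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: searches text with rgx and returns the results.
// The returned object holds iterators into text, so text must outlive it.
std::smatch search(const std::string& text, const std::regex& rgx)
{
    std::smatch match;
    std::regex_search(text, match, rgx);
    return match;
}
```

Two searches over distinct string objects with the same contents compare equal, because the comparison looks at the matched text rather than at the iterators; a search of “abxde” does not, because its match[0] is “bxd” rather than “bcccd”.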
match_results Types
typedef match_results<const char *> cmatch;
typedef match_results<const wchar_t *> wcmatch;
typedef match_results<string::const_iterator> smatch;
typedef match_results<wstring::const_iterator> wsmatch;
The four names are synonyms for the most commonly used match_results types. Keep in mind that the template argument to match_results must be the iterator type associated with the target text that was passed to regex_match or regex_search. When the target text is a pointer to char or wchar_t (const or otherwise), the associated iterator type is a pointer to const char or to const wchar_t. When the target text is a string or wstring object, the associated iterator type is the string type’s nested name, const_iterator.
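To show each typedef paired with the kind of target text it belongs to, here is a sketch with four overloads of a hypothetical helper, group1, one per predefined type, each returning the text of capture group 1 (or an empty string if the search fails). It assumes a C++11 compiler, where the TR1 names are available in namespace std.

```cpp
#include <regex>
#include <string>

// Target is a pointer to char: iterator type const char *, so cmatch.
std::string group1(const char* text, const std::regex& rgx)
{
    std::cmatch match;
    return std::regex_search(text, match, rgx) ? match.str(1) : std::string();
}

// Target is a pointer to wchar_t: iterator type const wchar_t *, so wcmatch.
std::wstring group1(const wchar_t* text, const std::wregex& rgx)
{
    std::wcmatch match;
    return std::regex_search(text, match, rgx) ? match.str(1) : std::wstring();
}

// Target is a string object: iterator type string::const_iterator, so smatch.
std::string group1(const std::string& text, const std::regex& rgx)
{
    std::smatch match;
    return std::regex_search(text, match, rgx) ? match.str(1) : std::string();
}

// Target is a wstring object: iterator type wstring::const_iterator, so wsmatch.
std::wstring group1(const std::wstring& text, const std::wregex& rgx)
{
    std::wsmatch match;
    return std::regex_search(text, match, rgx) ? match.str(1) : std::wstring();
}
```

Each function returns a copy of the captured text, so the result remains valid after the match_results object and, for temporaries, the target text have gone away.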
template<class OutIt>
OutIt match_results::format(OutIt out,
const string_type& fmt,
match_flag_type flags = format_default) const;
string_type match_results::format(
const string_type& fmt,
match_flag_type flags = format_default) const;
These member functions are discussed in Chapter 20, which covers formatting and text replacement.
For each of the following errors, write a simple test case containing the error, and try to compile it. In the error messages, look for the key words that relate to the error in the code.
1. Attempting to modify the contents of a match_results object
2. Attempting to specialize match_results with an iterator type that is not a bidirectional iterator or a random access iterator
3. Attempting to call regex_search with a match_results specialization and a basic_regex object whose element types are not the same
Write a utility function that takes a reference to a match_results object and shows whether that object was part of a successful match and, if so, shows useful information about the match: its prefix, the contents of each capture group, and its suffix. For each capture group that was part of the match, indent its text by the number of characters that preceded the capture group in the original text. Use this utility function to review any of the Chapter 15 examples that were unclear.
Write a utility function that takes a reference to a match_results object and an index value and shows whether the sub_match object at that index value was part of a successful match and, if so, shows all the available information about the capture group that it matched: its position in the target text, its length, and its contents. Search for text matching the regular expression “(a(.*)b)|(c(.*)d)” in the target text “ab”, and compare the information about capture groups 2 and 4. Make sure that you understand the difference between them.
One of the differences between the ECMAScript grammar and the UNIX-based grammars is the UNIX requirement to find the longest sub-matches while finding the longest overall match. Write a program that searches for text matching the regular expression “(wee|week).*” in the target text “weeknights”, using both the ECMAScript and the ere grammars, and shows the contents of capture group 1. Also try it with ere and the flag match_any.