Chapter 18. Search Results


Shun those studies in which the work that results dies with the worker.

The Notebooks
LEONARDO DA VINCI

Suppose that you want to scan a text file encoded in HTML and extract all the code snippets. Each snippet begins with “<CODE>” and ends with “</CODE>”. The two markers are not case sensitive. In Chapter 17, we looked at a regular expression to recognize these snippets:

const char *expr = "<CODE>";

Now we need to enhance that expression, to require both markers, and to capture the text between the two markers. To do that, we add the second marker and a capture group to hold the text between the markers:

const char *expr = "<CODE>(.*)</CODE>";

After a successful match, the capture group will hold the text that was found between the two markers. To look at that text, we need to pass a match_results object to regex_search. If it finds a match, regex_search fills in the match_results object with details of the capture groups. The template match_results has a member operator[](size_type n) that returns a reference to a sub_match object, which, in turn, holds the information about the nth capture group. In this case, we’re interested in the first capture group, so after the search, we need to look at match[1].

Example 18.1. Searching for Code Snippets (regexres/snippets.cpp)

#include <regex>
#include <iostream>
#include <fstream>
#include <string>
#include <stdlib.h>
using std::tr1::regex; using std::tr1::regex_search;
using std::tr1::smatch;
using std::string; using std::ifstream; using std::cout;

static void show_matches(const char *fname)
  { // scan file named by fname, line by line
  ifstream input(fname);
  string str;
  smatch match;
  const char *expr = "<CODE>(.*)</CODE>";
  regex rgx(expr, regex::icase);
  while (getline(input, str))
    { // check line for match
    if (regex_search(str, match, rgx))
      cout << match[1] << '\n';
    }
  }

int main(int argc, char *argv[])
  { // search for code snippets in text file
  if (argc != 2)
    { // wrong number of arguments
    cout << "Usage: snippets <filename>\n";
    return EXIT_FAILURE;
    }
  try
    { // search the file
    show_matches(argv[1]);
    }
  catch(...)
    { // something went wrong
    cout << "Error\n";
    return EXIT_FAILURE;
    }
  return 0;
  }

This code works because the two sets of search functions discussed in Chapter 17 have additional overloads that provide more detailed information about the range of characters in the target sequence that matched the regular expression and the ranges of characters that matched capture groups in the regular expression. To get this additional information, you pass a match_results object to any of the versions of regex_match or regex_search immediately before the regular expression object.

18.1. Header `<regex>` Partial Synopsis

In this chapter we look at the details of the class template sub_match, which identifies a matching subsequence, and the class template match_results, which holds a set of sub_match objects that, together, identify all matching subsequences from a search. Then we look again at the function templates regex_match and regex_search to see how to use match_results objects with them. In particular, we look at the following new components of the header <regex>:

namespace std { namespace tr1 {

    // CLASS TEMPLATE sub_match
template<class BidIt>
    class sub_match;
typedef sub_match<const char*> csub_match;
typedef sub_match<const wchar_t*> wcsub_match;
typedef sub_match<string::const_iterator> ssub_match;
typedef sub_match<wstring::const_iterator> wssub_match;

    // COMPARISON OPERATORS FOR sub_match
template<class BidIt>
    bool operator==(
      const sub_match<BidIt>&, const sub_match<BidIt>&);
    // also operator!=, operator<, operator<=, operator>, operator>=

template<class BidIt, class IOtraits, class Alloc>
    bool operator==(various types, const sub_match<BidIt>&);
    // also operator!=, operator<, operator<=, operator>, operator>=

template<class BidIt, class IOtraits, class Alloc>
    bool operator==(const sub_match<BidIt>&, various types);
    // also operator!=, operator<, operator<=, operator>, operator>=

    // CLASS TEMPLATE match_results
template <class BidIt,
    class Alloc = allocator<sub_match<BidIt> > >
    class match_results;
typedef match_results<const char*> cmatch;
typedef match_results<const wchar_t*> wcmatch;
typedef match_results<string::const_iterator> smatch;
typedef match_results<wstring::const_iterator> wsmatch;

    // FUNCTION TEMPLATE swap FOR match_results
template <class BidIt, class Alloc>
    void swap(match_results<BidIt, Alloc>& left,
        match_results<BidIt, Alloc>& right) throw();

    // COMPARISON OPERATORS FOR match_results
template<class BidIt, class Alloc>
    bool operator==(const match_results<BidIt, Alloc>&,
        const match_results<BidIt, Alloc>&);
template<class BidIt, class Alloc>
    bool operator!=(const match_results<BidIt, Alloc>&,
        const match_results<BidIt, Alloc>&);

} }

18.2. The `sub_match` Class Template

A sub_match object holds a Boolean value named matched that is true if the sub_match object points to a character sequence that was part of a successful match. In that case, its two iterator members, first and second, point to the beginning of the sequence and one past the end of the sequence, respectively. That is, given a sub_match object sub, if sub.matched is true, the half-open sequence [sub.first, sub.second) delimits the matching character sequence. Your code can create sub_match objects, but ordinarily, you’ll use the ones contained in a match_results object.

template<class BidIt>
  class sub_match : public std::pair<BidIt, BidIt> {
public:
  bool matched;

  difference_type length() const;
  basic_string<value_type> str() const;
  operator basic_string<value_type>() const;

  int compare(const sub_match& right) const;
  int compare(const basic_string<value_type>& right) const;
  int compare(const value_type *right) const;

  typedef BidIt iterator;
  typedef typename iterator_traits<BidIt>::value_type
    value_type;
  typedef typename iterator_traits<BidIt>::difference_type
    difference_type;
  };

The template argument BidIt must be a type that meets the requirements for a bidirectional iterator. Ordinarily, this argument comes from the template match_results that holds the sub_match objects, so as long as you provide a bidirectional iterator type to match_results, this requirement will be satisfied.

The class template sub_match<BidIt> is derived from std::pair<BidIt, BidIt>. This base class provides the two members, first and second, that hold the two iterator values. The class template also has a Boolean member, matched, that holds true if the iterators point to a character sequence that was part of a successful match. That sequence can be empty—that is, first and second are equal—for a zero-length match. The sequence will also be empty if the corresponding capture group was not part of a successful match. In this case, the member matched will hold the value false, and the members first and second will point to the end of the target sequence.

A zero-length match can occur when a capture group consists solely of an assertion or of a repetition that allows zero repeats. For example:

• “^” matches the target sequence “”. The sub_match object that designates the full match holds two iterators that both point to the first position in the target sequence, and its member matched holds true.

• “a(b*)a” matches the target sequence “aa”. The sub_match object that designates the capture group holds iterators that both point to the second character in the target sequence, and its member matched holds true.

• “(a)|b” matches the target sequence “b”. The capture group is not part of the match. The sub_match object that designates the capture group holds iterators that point to the end of the target sequence—and thus compare equal—and its member matched holds false.

Several of the member functions of sub_match<BidIt> take arguments or return objects of type basic_string<value_type>. As we’ll see, value_type is a typedef for the character type that the iterators point to. So basic_string<value_type> is a basic_string object that holds characters. When the text you’re searching consists of ordinary char objects, basic_string<value_type> is basic_string<char>, or, more simply, string.

18.2.1. Nested Types

typedef BidIt iterator;
typedef typename iterator_traits<BidIt>::value_type
value_type;

typedef typename iterator_traits<BidIt>::difference_type
difference_type;

The first type is a synonym for the first template type argument. The second and third types are synonyms for the iterator type’s associated value_type and difference_type, respectively.

These type names can be convenient when you need to peer into the contents of the matching text. The type name iterator names the type of the iterators that the sub_match type holds; value_type is the character type that the iterators point to; and difference_type can hold the difference between two iterator values. For example:

typedef std::tr1::sub_match<const char*> csub_match;
csub_match::iterator iter;       // iter has type const char*
csub_match::value_type ch;       // ch has type char
csub_match::difference_type d;   // d has type std::ptrdiff_t

18.2.2. Access

bool matched;
BidIt first; // inherited from pair
BidIt second; // inherited from pair

If the capture group corresponding to the sub_match object was part of a successful match, the member matched holds true, and the members first and second designate the character range in the target sequence that matched the capture group. If the capture group was not part of a successful match, the member matched holds false, and the members first and second point to the end of the target sequence.

A newly constructed sub_match object has not been part of a successful match, so its matched member will hold false. As we’ll see later, a call to a search algorithm that doesn’t find a match leaves the sub_match objects in a match_results object in an unspecified state, so you cannot count on any particular pattern of values when a search fails. If a search succeeds, the member matched in each sub_match object that was part of the match holds true, and the member matched in each sub_match object that was not part of the match holds false.

Example 18.2. Objects of type sub_match (regexres/subobjects.cpp)

#include <regex>
#include <algorithm>
#include <iomanip>
#include <iostream>
#include <iterator>
#include <string>
using std::tr1::regex; using std::tr1::regex_match;
using std::tr1::match_results; using std::tr1::sub_match;
using std::copy;
using std::ostream_iterator; using std::string;
using std::cout;using std::setw;

template <class BidIt>
void show(const char *title, const sub_match<BidIt>& sm)
  {
  typedef typename sub_match<BidIt>::value_type MyTy;
  cout << setw(20) << title << ": ";
  if (sm.matched)
    copy(sm.first, sm.second,
      ostream_iterator<MyTy>(cout));
  else
    cout << "[no match]";
  cout << '\n';
  }

int main()
  {
  regex rgx("(a+)|(b+)");
  string tgt("bbb");
  match_results<string::iterator> match;
  show("no search", match[0]);
  if (!regex_match(tgt.begin(), tgt.end(), match, rgx))
    cout << "search failed\n";
  else
    { // search succeeded, capture group 1 not part of match
    show("full match", match[0]);
    show("capture group 1", match[1]);
    show("capture group 2", match[2]);
    }
  return 0;
  }

In this example, the expression match[0] returns a reference to the sub_match object that represents the full match, and match[1] and match[2] return references to the sub_match objects that represent the subsequences that matched the first and second capture groups, respectively.

difference_type length() const;

The member function returns 0 if the member matched holds false; otherwise, distance(first, second).

This function returns the number of characters in the matching sequence delimited by [first, second) and returns 0 if the corresponding capture group was not part of the match. The function also returns 0 for a zero-length match, so don’t use this return value to distinguish between those two cases. Use the member matched.

basic_string<value_type> str() const;
operator basic_string<value_type>() const;

The first member function returns an empty string object if matched holds false; otherwise, it returns basic_string<value_type>(first, second). The second member function returns str().

These member functions convert the matching sequence into a basic_string object. This will often be more convenient than using the raw iterators first and second. Here’s the previous example, with the function show rewritten to use str().

Example 18.3. String Conversions (regexres/strings.cpp)

#include <regex>
#include <iomanip>
#include <iostream>
#include <string>
using std::tr1::regex; using std::tr1::regex_match;
using std::tr1::match_results; using std::tr1::sub_match;
using std::string;
using std::cout; using std::setw;

template <class BidIt>
void show(const char *title, const sub_match<BidIt>& sm)
  {
  cout << setw(20) << title << ": ";
  if (sm.matched)
    cout << sm.str() << '\n';
  else
    cout << "[no match]\n";
  }

int main()
  {
  regex rgx("(a+)|(b+)");
  string tgt("bbb");
  match_results<string::iterator> m;
  show("no search", m[0]);
  if (!regex_match(tgt.begin(), tgt.end(), m, rgx))
    cout << "search failed\n";
  else
    { // search succeeded, capture group 1 not part of match
    show("full match", m[0]);
    show("capture group 1", m[1]);
    show("capture group 2", m[2]);
    }
  return 0;
  }

18.2.3. Comparison

Member Functions

int compare(const sub_match& right) const;
int compare(const basic_string<value_type>& right) const;
int compare(const value_type *right) const;

The first member function returns str().compare(right.str()). The second and third member functions return str().compare(right).

That is, these functions do a lexicographical comparison of the matched sequence and their argument,^[1] returning a negative value if the matched sequence comes before the argument, zero if they are equal, and a positive value if the matched sequence comes after the argument.

Example 18.4. The compare Member Functions (regexres/compare.cpp)

#include <regex>
#include <iostream>
using std::tr1::regex; using std::tr1::regex_match;
using std::tr1::csub_match; using std::tr1::cmatch;
using std::cout;

static const char *blocked_sites[] =
{ // block list; any resemblance between the names here
  // and real URLs is probably accidental
"www.idontwantmykidshere.com",
"www.lotsofxxxstuff.com",
"www.nra.org"
};
const int nsites = sizeof(blocked_sites)
  / sizeof(*blocked_sites);

bool allow(const csub_match& match)
  { // return false if match is on the blocked list
  for (int i = 0; i < nsites; ++i)
    if (match.compare(blocked_sites[i]) == 0)
      return false;
  return true;
  }

bool check_url(const char *url)
  { // return false if URL is not a valid HTTP URL or
    // if the hostname is on the blocked list
  regex rgx("http://([^/: ]+)(:(\\d+))?(/.*)?");
  cmatch match;

  return regex_match(url , match , rgx) && allow(match[1]);
  }

void connect(const char *url)
  { // connect to valid, unblocked URL
  if (check_url(url))
    {
    cout << "Okay to connect: " << url << '\n';
    // remainder of connection code left as exercise for the reader
    }
  else
    cout << "Invalid or blocked URL: " << url << '\n';
  }

int main()
  { // connect to a couple of sites
  connect("http://www.xxx.com/risque/index.html");
  connect("http://www.petebecker.com/tr1book");
  connect("http:/invalid, for many reasons");
  return 0;
  }

In this example, I simplified the code by using some of the built-in typedefs instead of using the full names of the template instantiations. We’ll look at these typedefs later. For now, cmatch is a synonym for match_results<const char*>, which is the appropriate type to hold the results of a search through an array of char. An object of type cmatch, in turn, holds objects of type sub_match<const char*>; the synonym for that one is csub_match.

The function allow does a linear search of the list of blocked URLs, to see whether the hostname passed to it is on the list. The function check_url checks whether its argument is a valid HTTP URL, and, if so, extracts the hostname and calls allow.^[2]

Nonmember Operators

template<class BidIt>
    bool operator==(const sub_match<BidIt>& left,
      const sub_match<BidIt>& right);
    // also operator!=, operator<, operator<=, operator>, operator>=

template<class BidIt /* maybe more */>
    bool operator==(
      various types left, const sub_match<BidIt>& right);
    // also operator!=, operator<, operator<=, operator>, operator>=

template<class BidIt /* maybe more */>
    bool operator==(
      const sub_match<BidIt>& left, various types right);
    // also operator!=, operator<, operator<=, operator>, operator>=

Each function template operator== returns true only if the argument left designates the same characters, in the same order, as the argument right.

Each function template operator!=(left, right) returns !(left == right).

Each function template operator< returns true only if the argument left designates a sequence of characters that lexicographically precedes the sequence of characters designated by the argument right.

Each function template operator<=(left, right) returns !(right < left).

Each function template operator>(left, right) returns right < left.

Each function template operator>=(left, right) returns !(left < right).

In addition to the overloaded member functions named compare, there’s a long list of operators for comparing sub_match objects to various representations of character sequences. Rather than list all six comparison operators for each pair of types,^[3] the preceding synopsis gives the declaration for operator==. The remaining five operators are all declared in the obvious way.

The argument types referred to as various types can be any of the following, where Ty is iterator_traits<BidIt>::value_type:

• An object of type basic_string<Ty, Traits, Alloc>

• A pointer of type Ty*

• A reference to type Ty

That is, you can compare a sub_match<BidIt> object to another sub_match<BidIt> object, to a basic_string object that holds the same character type, to a null-terminated character string, and to a single character. Of course, the sub_match<BidIt> object can be on either side of the comparison.

Example 18.5. Comparison Operators (regexres/operators.cpp)

#include <regex>
#include <iostream>
using std::tr1::regex; using std::tr1::regex_match;
using std::tr1::csub_match; using std::tr1::cmatch;
using std::cout;

static const char *blocked_sites[] =
{ // block list; any resemblance between the names here
  // and real URLs is probably accidental
"www.idontwantmykidshere.com",
"www.lotsofxxxstuff.com",
"www.nra.org"
};
const int nsites = sizeof(blocked_sites)
  / sizeof(*blocked_sites);

bool allow(const csub_match& match)
  { // return false if match is on the blocked list
  for (int i = 0; i < nsites; ++i)
    if (match == blocked_sites[i])
      return false;
    else if (match < blocked_sites[i])
      return true;
  return true;
  }

bool check_url(const char *url)
  { // return false if URL is not a valid HTTP URL or
    // if the hostname is on the blocked list
  regex rgx("http://([^/:]+)(:(\\d+))?(/.*)?");
  cmatch match;
  return regex_match(url , match , rgx) && allow(match[1]);
  }

void connect(const char *url)
  { // connect to valid, unblocked URL
  if (check_url(url))
    {
    cout << "Okay to connect: " << url << '\n';
    // remainder of connection code left as exercise for the reader
    }
  else
    cout << "Invalid or blocked URL: " << url << '\n';
  }

int main()
  { // connect to a couple of sites
  connect("http://www.xxx.com/risque/index.html");
  connect("http://www.petebecker.com/tr1book");
  connect("http:/invalid, for many reasons");
  return 0;
  }

This example is a lot like the previous one but with two differences, both in the function allow. First, this example uses operator== to check whether the hostname is in the blocked list. Second, this example uses operator< to take advantage of the list’s being in alphabetical order to cut the linear search short when it reaches a name that comes after the target hostname.

18.3. Predefined `sub_match` Types

typedef sub_match<const char*> csub_match;
typedef sub_match<const wchar_t*> wcsub_match;
typedef sub_match<std::string::const_iterator> ssub_match;
typedef sub_match<std::wstring::const_iterator> wssub_match;

The four names are synonyms for the most commonly used sub_match types. Keep in mind that the template argument to sub_match must be the iterator type associated with the target text that was passed to regex_match or regex_search. When the target text was passed as a char* or wchar_t* (const or otherwise), the associated iterator types are const char* and const wchar_t*, respectively. When the target text is held in a string or wstring object, the associated iterator type is the string type’s nested name const_iterator.

Example 18.6. Predefined sub_match Types (regexres/predefined.cpp)

#include <regex>
#include <iostream>

#include <string>
using std::tr1::regex; using std::tr1::wregex;
using std::tr1::regex_match;
using std::tr1::cmatch; using std::tr1::smatch;
using std::tr1::wcmatch;using std::tr1::wsmatch;
using std::tr1::csub_match; using std::tr1::ssub_match;
using std::tr1::wcsub_match; using std::tr1::wssub_match;
using std::string; using std::wstring;
using std::cout;

static void show(...)
  { // called with unknown type
  cout << "Called with unknown argument type\n";
  }

static void show(csub_match match)
  { // called with csub_match argument
  cout << "Called show(csub_match)\n";
  }

static void show(wcsub_match match)
  { // called with wcsub_match argument
  cout << "Called show(wcsub_match)\n";
  }

static void show(ssub_match match)
  { // called with ssub_match argument
  cout << "Called show(ssub_match)\n";
  }

static void show(wssub_match match)
  { // called with wssub_match argument
  cout << "Called show(wssub_match)\n";
  }

int main()
  { // show sub_match types for various match_results types
  regex rgx("abc");
  cmatch match0;
  if (regex_match("abc", match0, rgx))
    show(match0[0]);
  string str("abc");    // named object, so match1's iterators stay valid
  smatch match1;
  if (regex_match(str, match1, rgx))
    show(match1[0]);
  wregex wrgx(L"abc");
  wcmatch match2;
  if (regex_match(L"abc", match2, wrgx))
    show(match2[0]);
  wstring wstr(L"abc"); // named object, as above
  wsmatch match3;
  if (regex_match(wstr, match3, wrgx))
    show(match3[0]);
  return 0;
  }

18.4. The `match_results` Class Template

The class template match_results is a nonmodifiable container.^[4] It holds the results of a successful match found by a call to regex_match or regex_search. Typically, your code will create a match_results<BidIt> object, with the type BidIt being an iterator of the same type as the iterator for the target text. For example, when the target text is passed as a const char*, use match_results<const char *>. When the target text is passed as a standard string object, use match_results<string::const_iterator>.

template <class BidIt,
  class Alloc = allocator<
    typename iterator_traits<BidIt>:: value_type> >
   class match_results {
public:
  explicit match_results(const Alloc& alloc = Alloc());
  match_results (const match_results & right);

  match_results& operator=(const match_results& right);
  void swap(match_results& other) throw();

  const_reference operator[](size_type sub) const;
  difference_type position(size_type sub = 0) const;
  difference_type length(size_type sub = 0) const;
  string_type str(size_type sub = 0) const;

  const_reference prefix() const;

  const_reference suffix() const;

  const_iterator begin() const;
  const_iterator end() const;
  template<class OutIt>
    OutIt format(OutIt out,
      const string_type& fmt,
      match_flag_type flags = format_default) const;
  string_type format(const string_type& fmt,
    match_flag_type flags = format_default) const;

  size_type size() const;
  size_type max_size() const;
  bool empty() const;
  allocator_type get_allocator() const;

  typedef sub_match<BidIt> value_type;
  typedef const typename Alloc::const_reference
    const_reference;
  typedef const_reference reference;
  typedef T0 const_iterator;
  typedef const_iterator iterator;
  typedef typename iterator_traits<BidIt>::difference_type
    difference_type;
  typedef typename Alloc::size_type size_type;
  typedef Alloc allocator_type;
  typedef typename iterator_traits<BidIt>::value_type
    char_type;
  typedef basic_string<char_type> string_type;

  };

The template takes two type arguments. The first, listed here as BidIt, must be a bidirectional iterator, the same type as you’re going to use to point to the target text. The second is an allocator type. An object of this type is stored in the match_results object and will be used to manage the memory needed to hold the various sub_match objects that hold the details of a successful match. The default allocator type is an instance of the allocator from the standard library.

Objects of type match_results<BidIt> can be created, copied, assigned, and swapped. These operations are discussed in Section 18.4.1. After a successful search, you can examine capture groups individually with the member functions position, length, str, and operator[], and you can look at the part of the target text that preceded or followed the matching text with the member functions prefix and suffix. These are discussed in Section 18.4.2. Because a match_results object is a container, you can call the member functions begin and end to get a pair of iterators that designate a half-open sequence of sub_match objects, as discussed in Section 18.4.3. You can also ask about the number of elements in the container, with the member functions size, max_size, and empty, and you can get a copy of the container’s allocator with get_allocator. These functions are discussed in Section 18.4.4. Like all containers, the template defines several nested type names, described in Section 18.4.5. The library provides two operators to compare match_results<BidIt> objects for equality (Section 18.4.6) and four typedef names that provide synonyms for commonly used match_results instances (Section 18.4.7). Finally, two member functions can be used to produce formatted text by replacing various parts of the target text. These are discussed in Chapter 20, which covers formatting and text replacement.

18.4.1. Creating and Modifying `match_results` Objects

explicit match_results::match_results(
const Alloc& alloc = Alloc());

The constructor constructs a match_results object that holds a copy of the argument alloc and no elements.

Thus, after constructing an object with this constructor, the member function size returns 0, and the member function str returns an empty string.

match_results::match_results(const match_results& right);
match_results& match_results::operator=(
const match_results& right);

The copy constructor constructs an object that is a copy of its argument. The assignment operator replaces the object’s controlled sequence with a copy of its argument.

void match_results::swap(
  match_results& other) throw();
template<class BidIt, class Alloc>
  void swap(match_results<BidIt, Alloc>& left,
      match_results<BidIt, Alloc>& right) throw()
      { // swap left and right
      left.swap(right);
      }

The member function swaps the object’s controlled sequence with its argument’s controlled sequence and does not throw exceptions. The non-member function calls left.swap(right).

Example 18.7. Constructors and Modifiers for match_results (regexres/modify.cpp)

#include <regex>
#include <iostream>
#include <stdlib.h>
using std::tr1::regex; using std::tr1::match_results;
using std::tr1::regex_search;
using std::cout;

typedef match_results<const char *> mtch;

static void show(const char *title, const mtch& match)
  { // summarize match_results object
  cout << title << ": ";
  cout << "size:" << match.size() << '\n';
  cout << "contents: `" << match.str() << "`\n";
  }

int main()
  { // demonstrate various constructors and modifiers
  mtch match;
  show("after default constructor" , match);
  regex rgx("b(c*)d");
  const char *tgt = "abcccde";
  mtch match1;
  if (!regex_search(tgt, match1, rgx))
    return EXIT_FAILURE;
  show("after successful search" , match1);
  mtch match2(match1);
  show("after copy construction" , match2);
  match.swap(match1);
  show("after swap" , match);
  swap(match , match1);
  show("after another swap" , match);
  match = match2;

  show("after assignment" , match);
  return 0;
  }

18.4.2. Examining Individual Matches

const_reference
match_results::operator[](size_type n) const;

The operator returns a reference to the nth element in the controlled sequence or a reference to an empty sub_match object if size() <= n or if the nth capture group was not part of the match.

The 0th element of the controlled sequence is a sub_match object that delineates the entire text that matched the regular expression. Succeeding elements delineate the text that matched the corresponding capture group. If a capture group was not part of the match or if n is larger than the number of capture groups, the sub_match object is empty; these sub_match objects are not required to be distinct.

difference_type position(size_type n = 0) const;

The member function returns distance(prefix().first, (*this)[n].first).

That is, it returns the offset of the beginning of the text that matches capture group n from the beginning of the target text.

difference_type length(size_type n = 0) const;

The member function returns (*this)[n].length().

That is, it returns the number of characters in the nth capture group.

string_type str(size_type n = 0) const;

The member function returns string_type((*this)[n]).

That is, it returns an object of type string_type that holds a copy of the text of the nth capture group.

const_reference match_results::prefix() const;
const_reference match_results::suffix() const;

The first member function returns a reference to an object of type sub_match<BidIt> that points to the character sequence that begins at the start of the target sequence and ends at (*this)[0].first. The second member function returns a reference to an object of type sub_match<BidIt> that points to the character sequence that begins at (*this)[0].second and ends at the end of the target sequence.

That is, the two member functions return sub_match objects that point to the text that precedes and follows, respectively, the text that matched the regular expression.

Example 18.8. Examining Contained Objects (regexres/examine.cpp)

#include <regex>
#include <iostream>
#include <stdlib.h>
using std::tr1::regex; using std::tr1::regex_search;
using std::tr1::match_results; using std::tr1::sub_match;
using std::cout;

typedef match_results<const char *> mtch;

static void show(int idx, const mtch& match)
  { // show contents of match[idx]
  cout << "match[" << idx << "]: "
    << (match[idx].matched ? "" : "not ")
    << "matched, `" << match.str(idx)
    << "` at offset " << match.position(idx)
    << ", with length " << match.length(idx) << '\n';
  }

int main()
  { // demonstrate operator[]
  regex rgx("b(c*|(x))d");
  const char *tgt = "abcccde";
  mtch match;
  if (!regex_search(tgt, match, rgx))
    return EXIT_FAILURE;

  cout << "After search, size is "
    << match.size() << '\n';
  cout << "text preceding match is `"
    << match.prefix() << "`\n";
  // also show two indexes past the last capture group
  for (int i = 0; i < match.size() + 2; ++i)
    show(i, match);
  cout << "text following match is `"
    << match.suffix() << "`\n";
  return 0;
  }

The output from this program shows that match holds three sub_match objects. The object returned by prefix() holds the text “a”, which is the text that preceded the matching text. The object returned by suffix() holds the text “e”, which is the text that followed the matching text. The object returned by match[0] holds the text “bcccd”, which is all the target text that matched the regular expression. The object returned by match[1] holds the text “ccc”, which is the part of the target text that matched the first capture group, “(c*|(x))”. The object returned by match[2] is empty because capture group 2, “(x)”, wasn’t part of the match. The objects returned by match[3] and match[4] are also empty because they refer to capture groups that don’t exist in the regular expression.

18.4.3. Iterating Through All Matches

const_iterator match_results::begin() const;
const_iterator match_results::end() const;

The first member function returns a random access iterator that points to the first element of the controlled sequence or just beyond the end of an empty sequence. The second member function returns a random access iterator that points just beyond the end of the controlled sequence.

Note that the controlled sequence is the sequence of sub_match objects returned by calling operator[] with successive values from 0 to size() - 1. It does not include the sub_match objects returned by prefix or suffix unless those happen to be equal to one of the other sub_match objects, which occurs only with empty sub_match objects.

Example 18.9. Iterating Through an Object (regexres/iterate.cpp)

#include <regex>
#include <iostream>
#include <algorithm>
#include <iterator>
#include <stdlib.h>
using std::tr1::regex; using std::tr1::regex_search;
using std::tr1::match_results; using std::tr1::sub_match;
using std::cout; using std::ostream_iterator;
using std::copy;

typedef const char *iter;
typedef sub_match<iter> sub;
typedef match_results<iter> mtch;

namespace std { // add inserter to namespace std
template <class Elem, class Traits>
basic_ostream<Elem, Traits>& operator<<(
  basic_ostream<Elem, Traits>& out, const sub& val)
  { // insert sub_match<iter> into stream
  return out << '`' << val.str() << '`';
  }
}

int main()
  {
  regex rgx("b(c*|(x))d");
  const char *tgt = "abcccde";
  mtch match;
  if (!regex_search(tgt, match, rgx))
    return EXIT_FAILURE;
  copy(match.begin(), match.end(),
    ostream_iterator<sub>(cout, "\n"));
  return 0;
  }
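The iterators returned by begin and end work with other standard algorithms as well. As a minimal sketch, the following counts the elements of the controlled sequence that took part in a match; the names took_part and count_matched are hypothetical, and the code assumes a C++11 compiler, where these names live in namespace std rather than std::tr1.

```cpp
#include <algorithm>
#include <cstddef>
#include <regex>

// Predicate: true for a sub_match object that took part in the match.
static bool took_part(const std::csub_match& sub)
{
    return sub.matched;
}

// Hypothetical helper: counts the elements of the controlled sequence
// (match[0] included) whose matched flag is set.
std::size_t count_matched(const std::cmatch& m)
{
    return static_cast<std::size_t>(
        std::count_if(m.begin(), m.end(), took_part));
}
```

For the search in Example 18.9, the controlled sequence has three elements, but only match[0] and match[1] took part in the match, so count_matched should return 2.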

18.4.4. General Queries

size_type match_results::size() const;
size_type match_results::max_size() const;
bool match_results::empty() const { return size() == 0;}

The first member function returns the length of the controlled sequence. The second member function returns the length of the longest sequence that the object can control. The third member function returns true only if the length of the controlled sequence is 0.
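After a successful search, the controlled sequence holds one element for the overall match plus one for each capture group; after a failed search, it is empty. The following sketch relies on that; the helper name capture_group_count is hypothetical, and the code assumes a C++11 compiler, where these names live in namespace std rather than std::tr1.

```cpp
#include <cstddef>
#include <regex>

// Hypothetical helper: returns the number of capture groups recorded in m,
// or 0 if the search that filled m failed.
std::size_t capture_group_count(const std::cmatch& m)
{
    return m.empty() ? 0 : m.size() - 1;   // match[0] is the overall match
}
```

Testing empty() first avoids subtracting 1 from an unsigned size of 0.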

allocator_type match_results::get_allocator() const;

The member function returns a copy of the stored allocator object.

18.4.5. Nested Types

typedef sub_match<BidIt> value_type;
typedef const typename Alloc::const_reference
  const_reference;
typedef const_reference reference;
typedef T0 const_iterator;
typedef const_iterator iterator;
typedef typename iterator_traits<BidIt>::difference_type
  difference_type;
typedef typename Alloc::size_type size_type;
typedef Alloc allocator_type;
typedef typename iterator_traits<BidIt>::value_type
  char_type;
typedef basic_string<char_type> string_type;

The type names nested in match_results<BidIt, Alloc> are defined as follows:

• value_type: a synonym for sub_match<BidIt>

• const_reference: a description of an object that can serve as a reference to an unmodifiable element of the controlled sequence

• reference: a synonym for const_reference; because the elements of the controlled sequence cannot be modified, it too describes a reference to an unmodifiable element

• const_iterator: a description of an object that can serve as a random-access iterator that points at unmodifiable elements of the controlled sequence

• iterator: a synonym for const_iterator; it too describes a random-access iterator that points at unmodifiable elements of the controlled sequence

• difference_type: a synonym for iterator_traits<BidIt>::difference_type; it describes an object that can represent the difference between any two iterators that point at elements of the controlled sequence

• size_type: a synonym for Alloc::size_type

• allocator_type: a synonym for the template argument Alloc

• char_type: a synonym for iterator_traits<BidIt>::value_type, which is the element type of the character sequence that was searched

• string_type: a synonym for basic_string<char_type>

A match_results object satisfies the requirements for a sequence container^[5] except that operations that modify the sequence are not supported. All but the last two nested types are required for a sequence container. The last two make it easier to talk about the contents of the character sequences that the container holds.
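These nested names make it possible to write code that works with any match_results specialization. The following is a sketch under that assumption; the function template texts is hypothetical, and the code assumes a C++11 compiler, where these names live in namespace std rather than std::tr1.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: copies the text of every element of the controlled
// sequence into a vector, using only the nested type names of Match.
template <class Match>
std::vector<typename Match::string_type> texts(const Match& m)
{
    std::vector<typename Match::string_type> result;
    for (typename Match::const_iterator it = m.begin(); it != m.end(); ++it)
        result.push_back(it->str());   // *it has type Match::value_type
    return result;
}
```

The same template should work unchanged for cmatch, smatch, and their wide-character counterparts, because string_type follows the character type of the target text.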

18.4.6. Comparing match_results Objects

template <class BidIt, class Alloc>
  bool operator==(
    const match_results<BidIt, Alloc>& left,
    const match_results<BidIt, Alloc>& right);
template <class BidIt, class Alloc>
  bool operator!=(
    const match_results<BidIt, Alloc>& left,
    const match_results<BidIt, Alloc>& right)
      { return !(left == right); }

The first operator returns true only if left.size() == right.size() and equal(left.begin(), left.end(), right.begin()). The second operator returns true only if !(left == right).

These operators apply the usual definition of equality for container types: Two containers are equal if they hold the same number of elements and corresponding elements are equal.
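As a minimal sketch of these operators in use, the following helper repeats a search and compares the outcome with a previously stored result. The name same_results is hypothetical, and the code assumes a C++11 compiler, where these names live in namespace std rather than std::tr1.

```cpp
#include <regex>

// Hypothetical helper: searches target for rgx and reports whether the
// results are equal to those already stored in expected.
bool same_results(const char* target, const std::regex& rgx,
                  const std::cmatch& expected)
{
    std::cmatch m;
    std::regex_search(target, m, rgx);
    return m == expected;
}
```

Repeating the original search should compare equal; a regular expression with a different number of capture groups, or a target that produces different matched text, should not.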

18.4.7. Predefined match_results Types

typedef match_results<const char *> cmatch;
typedef match_results<const wchar_t *> wcmatch;
typedef match_results<string::const_iterator> smatch;
typedef match_results<wstring::const_iterator> wsmatch;

The four names are synonyms for the most commonly used match_results types. Keep in mind that the template argument to match_results must be the iterator type associated with the target text that was passed to regex_match or regex_search. When the target text is a pointer to char or wchar_t (const or otherwise), the associated iterator type is a pointer to const char or to const wchar_t. When the target text is a string or wstring object, the associated iterator type is the string type’s nested name, const_iterator.
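The pairing of target type and match_results type can be sketched with two overloads that differ only in those types. The function name first_group is hypothetical, and the code assumes a C++11 compiler, where these names live in namespace std rather than std::tr1.

```cpp
#include <regex>
#include <string>

// Target is a pointer to char: the associated iterator type is
// const char *, so the results go into a cmatch object.
std::string first_group(const char* target, const std::regex& rgx)
{
    std::cmatch m;
    return std::regex_search(target, m, rgx) && m.size() > 1
        ? m.str(1) : std::string();
}

// Target is a string object: the associated iterator type is
// std::string::const_iterator, so the results go into an smatch object.
std::string first_group(const std::string& target, const std::regex& rgx)
{
    std::smatch m;
    return std::regex_search(target, m, rgx) && m.size() > 1
        ? m.str(1) : std::string();
}
```

Both overloads return a copy of the text of capture group 1, or an empty string if the search fails or the regular expression has no capture groups. Passing an smatch object with a pointer target, or a cmatch object with a string target, should fail to compile.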

18.4.8. Formatting Text

template<class OutIt>
OutIt match_results::format(OutIt out,
  const string_type& fmt,
  match_flag_type flags = format_default) const;
string_type match_results::format(
  const string_type& fmt,
  match_flag_type flags = format_default) const;

These member functions are discussed in Chapter 20, which covers formatting and text replacement.

Exercises

Exercise 1

For each of the following errors, write a simple test case containing the error, and try to compile it. In the error messages, look for the key words that relate to the error in the code.

1. Attempting to modify the contents of a match_results object

2. Attempting to specialize match_results with an iterator type that is not a bidirectional iterator or a random access iterator

3. Attempting to call regex_search with a match_results specialization and a basic_regex object whose element types are not the same

Exercise 2

Write a utility function that takes a reference to a match_results object and shows whether that object was part of a successful match and, if so, shows useful information about the match: its prefix, the contents of each capture group, and its suffix. For each capture group that was part of the match, indent its text by the number of characters that preceded the capture group in the original text. Use this utility function to review any of the Chapter 15 examples that were unclear.

Exercise 3

Write a utility function that takes a reference to a match_results object and an index value and shows whether the sub_match object at that index value was part of a successful match and, if so, shows all the available information about the capture group that it matched: its position in the target text, its length, and its contents. Search for text matching the regular expression “(a(.*)b)|(c(.*)d)” in the target text “ab”, and compare the information about capture groups 2 and 4. Make sure that you understand the difference between them.

Exercise 4

One of the differences between the ECMAScript grammar and the UNIX-based grammars is the UNIX requirement to find the longest sub-matches while finding the longest overall match. Write a program that searches for text matching the regular expression “(wee|week).*” in the target text “weeknights”, using both the ECMAScript and the ere grammars, and shows the contents of capture group 1. Also try it with ere and the flag match_any.
