Chapter 16. Regular Expression Objects


I du believe with all my soul
In the gret Press’s freedom,
To pint the people to the goal
An’ in the traces lead ’em.

— “The Bigelow Papers”
JAMES RUSSELL LOWELL

All the operations that involve regular expressions begin with an object of a type that is an instantiation of the class template basic_regex. This object holds the regular expression to be matched. It’s passed as an argument to the search functions (Chapter 17) and to the constructors of the regular expression iterators (Chapter 19).

The template takes two type arguments. The first, referred to as Elem in this chapter, is the type of the characters that the regular expression object will traffic in. The most common element types are, of course, char and wchar_t. Two typedefs, named regex and wregex, respectively, can be used to create regular expression objects for sequences of characters of these two types. Thus, you can create regular expression objects for ordinary characters with the type basic_regex<char> or with the type regex; they mean the same thing.

The second type argument is a traits class, which we look at in detail in Chapter 21. This traits class has a default type of regex_traits<Elem>, so basic_regex<char> is shorthand for basic_regex<char, regex_traits<char> >. The class template regex_traits has explicit instantiations in the TR1 library for elements of type char and wchar_t. Other element types have no requirements. So if you want to use basic_regex for any element type other than char or wchar_t, you’ll have to provide your own traits class.^[1]

16.1. Definitions

A bitmask type is defined by the library implementation. Values of a bitmask type can be combined with the | operator to create new values that represent the union of the values specified by the operands and with the & operator to create new values that represent the intersection of the values specified by the operands.

An enumeration type is defined by the library implementation. It provides a set of named constants.

An empty regular expression does not match any character sequence.

The constructors, the assignment operators, and the assign member functions for objects of type basic_regex<Elem> all take an operand sequence that designates the regular expression that the resulting object will hold. The constructors and the assign member functions also take an additional argument that designates the regular expression grammar to use to interpret the operand sequence, as well as some optional flags to permit optimizations and to modify the meaning of some elements of the regular expression grammar. These functions all throw an object of type regex_error (see Section 16.6) if the operand sequence is not a valid regular expression.

In the descriptions of these functions, the names of the arguments are used to describe the form of the operand sequence:

• ptr: a null-terminated sequence of characters of type Elem—such as a C string, when Elem is type char—beginning at ptr, which must not be a null pointer, where the terminating element is the value Elem() and is not part of the operand sequence

• ptr, count: a sequence of count characters of type Elem beginning at ptr, which must not be a null pointer

• str: the sequence specified by the basic_string<Elem> object str

• first, last: a sequence of characters of type Elem delimited by the iterators first and last, in the range [first, last)

• right: the basic_regex<Elem> object right

For example, the constructor

explicit basic_regex( const Elem *ptr,
flag_type flags = ECMAScript)

constructs a basic_regex<Elem> object from a null-terminated character sequence:

basic_regex<char> rgx("a*b"); // rgx holds regular expression "a*b"

The constructor

basic_regex(const Elem *ptr, size_type count,
flag_type flags = ECMAScript)

constructs a basic_regex<Elem> object from a C-style array and a character count:

basic_regex<char> rgx("a*b", 2);
// rgx holds regular expression "a*"

The constructor

template<class STtraits, class STalloc>
explicit basic_regex(
const basic_string<Elem, STraits, STalloc>& str,
flag_type flags = ECMAScript)

constructs a basic_regex<Elem> object from a C++ basic_string<Elem> object:

string str("a*b");
basic_regex<char> rgx(str); // rgx holds regular expression "a*b"

The constructor

template<class InIt>
basic_regex(InIt first, InIt last,
flag_type flags = ECMAScript)

constructs a basic_regex<Elem> object from a pair of iterators:

vector<char> vec;
vec.push_back('a');
vec.push_back('*');
vec.push_back('b');
basic_regex<char> rgx(vec.begin(), vec.end());
// rgx holds regular expression "a*b"

And the constructor

basic_regex(const basic_regex<Elem>& right)

constructs a basic_regex<Elem> object from another basic_regex<Elem> object:

basic_regex<char> rgx("a*b"); // rgx holds regular expression "a*b"
basic_regex<char> rgy(rgx);   // rgy holds regular expression "a*b"

16.2. Header `<regex>` Partial Synopsis

In this chapter, we look at the following components of the header <regex>:

namespace std {   // C++ standard library
namespace tr1 {  // TR1 additions

    // bitmask_type syntax_option_type
namespace regex_constants {
  typedef bitmask_type syntax_option_type;
  static const syntax_option_type
    awk, basic, collate, ECMAScript, egrep,
    extended, grep, icase, nosubs, optimize;
  }

    // CLASS TEMPLATE basic_regex
template<class Elem,
    class RXtraits = regex_traits<Elem> >
    class basic_regex;
typedef basic_regex<char> regex;
typedef basic_regex<wchar_t> wregex;

    // FUNCTION TEMPLATE swap
template<class Elem, class RXtraits>
    void swap(basic_regex<Elem, RXtraits>& left,
        basic_regex<Elem, RXtraits>& right) throw();

    // enumeration_type error_type
namespace regex_constants {
  typedef enumeration_type error_type;
  static const error_type error_backref, error_badbrace,
    error_badrepeat, error_brace, error_brack,
    error_collate, error_complexity, error_ctype,
    error_escape, error_paren, error_range,
    error_space, error_stack;
  }

    // CLASS regex_error
class regex_error;

} }

16.3. Syntax Options

namespace regex_constants { // regular expression constants
  typedef bitmask_type syntax_option_type;
  static const syntax_option_type
    awk, basic, collate, ECMAScript, egrep,
    extended, grep, icase, nosubs, optimize;
}

The type is a bitmask type that designates various combinations of a regular expression grammar, optimization flags, and syntax modifiers.

Values of type syntax_option_type are passed to members of the class template basic_regex to designate the regular expression grammar to use to interpret the member’s operand sequence, to modify the meaning of some elements of the regular expression grammar, and to permit optimizations.

The following constants designate regular expression grammars:

• ECMAScript: compile as ECMAScript

• basic: compile as BRE

• extended: compile as ERE

• awk: compile as awk

• grep: compile as grep

• egrep: compile as egrep

The following constants modify the meaning of some elements of the regular expression grammar:

• icase: make matches case insensitive

• collate: make collating locale sensitive

The following constants permit optimizations:

• nosubs: the implementation need not report the contents of capture groups

• optimize: the implementation should emphasize speed of matching rather than speed of regular expression compilation

When using these flags, exactly one of the regular expression grammar selectors must be used. Any of the other four flags can be ORed with the grammar selector, using the | operator, to make a valid flags argument.

These names are defined in the namespace regex_constants, which is embedded in the namespace std::tr1. As a result, the fully qualified names for these constants are rather longwinded. For convenience, these constants are also available in the class template basic_regex, so you can get them from either place. For example, in all the following expressions, the identifier icase means the same thing:

std::tr1::regex_constants::icase;

std::tr1::basic_regex<char>::icase;

using std::tr1::basic_regex;
basic_regex<char>::icase;

using namespace std::tr1;
regex_constants::icase;

using namespace std::tr1::regex_constants;
icase;

16.3.1. Case-Insensitive Comparisons

Ordinarily, a character in a regular expression matches a character in the target text if the two characters have the same numeric value. For example, in the ASCII encoding, the character ‘a’ has the value 0x61. A regular expression object regex rgx("a") will match the target text “a” and the target text “\x61” because they both have the same numeric value as the ‘a’ in the regular expression. It will not match the target text “A” or the target text “\x41”, which is the ASCII code for ‘A’.

If you pass the flag icase along with the text of the regular expression, characters will be compared for equality by converting each of them to lowercase and comparing the results. A regular expression object regex rgx("a", ECMAScript | icase) will match the target text “a” and the target text “\x61”, as well as the target text “A” and the target text “\x41”.^[2]

The icase flag applies only to individual character comparisons. It does not change the meaning of the characters used to define a character range. Thus, the character range “[a-c]” represents the characters ‘a’, ‘b’, and ‘c’, with or without the icase flag. Of course, when you look for a match with the icase flag, that set of three characters will match uppercase as well as lowercase characters.^[3]

Example 16.1. The icase Flag (regexbasic/icase.cpp)

#include <regex>
#include <iostream>
using std::tr1::regex;
using std::tr1::regex_match;
using namespace std::tr1::regex_constants;
using std::cout;

static void match(const char *title,
  const char *expr, const char *tgt,
  syntax_option_type flags)
  { // check for match
  regex rgx(expr, flags);
  cout << '`' << expr << "' (" << title << "): ";
  if (regex_match(tgt, rgx))
    cout << "matched";
  else
    cout << "didn't match";
  cout << " `" << tgt << "'\n";
  }

static void match4(const char *title, const char *expr,
  syntax_option_type flags)
  { // check four matches
  match(title, expr, "a", flags);
  match(title, expr, "\x61", flags);
  match(title, expr, "A", flags);
  match(title, expr, "\x41", flags);
  }

int main()
  { // demonstrate icase flag
  match4("case sensitive", "a", ECMAScript);
  match4("case insensitive", "a", ECMAScript | icase);
  match4("case sensitive", "[a-c]", ECMAScript);
  match4("case insensitive", "[a-c]",
    ECMAScript | icase);
  return 0;
  }

16.3.2. Character Ranges and the `collate` Flag

The collate flag does, however, change the rules for defining and testing character ranges. As we saw in Chapter 15, a character range is defined by writing the first and last characters, separated by a dash, inside a bracket expression. For example, the regular expression “[0-2]”, in the C locale, matches any of the characters ‘0’, ‘1’, or ‘2’.^[4] The rule is that for a character range whose end points are ch1 and ch2, a character ch is in the range if ch1 <= ch && ch <= ch2. The relative order of characters is determined by the relative order of their internal representation.

For many writing systems, that rule doesn’t work. We saw an example of this in Chapter 15, when we talked about character ranges. In the EBCDIC encoding, there are nonalphabetic characters represented by values between the values that represent ‘i’ and ‘j’, so a regular expression like “[h-k]” will end up including unexpected characters in the range.

Another example occurs with character encodings such as ISO-8859-1, which supplements the ASCII encoding with characters whose representations have values that are greater than 127.^[5] The character ‘a’ is represented by the value 0x61; the character ‘c’ is represented by the value 0x63. So if we ask whether the regular expression “[a-c]” matches the target sequence “â”, the answer would be no, because the representation of ‘â’ is 0xE2, which is not in the range [0x61, 0x63].

To fix both of these problems, make the definition of the range locale sensitive with collate. The test for inclusion in a range then involves an extra level of indirection: Each character ch is translated into a collating element by calling

use_facet<collate<Elem> >(
  getloc()).transform(&ch, &ch + 1)

which returns an object of type std::basic_string<Elem>. These returned strings are then compared, and the result determines whether a character is in the range. If we refer to that rather unwieldy expression as TRANS(ch), the test for inclusion in a range whose end points are ch1 and ch2 is TRANS(ch1) <= TRANS(ch) && TRANS(ch) <= TRANS(ch2). In the ISO-8859-1 locale, this more complicated test will correctly put ‘â’ in the range “[a-c]”.

16.3.3. The `nosubs` Flag

In Chapter 18, we look at the class template match_results. You can pass a match_results object to the regular expression search functions; on a successful search, the function will fill in details about the text that each capture group matched. For example, when matching the regular expression “a(.*)d” to the text “abcd”, the match_results object will tell you that the first capture group matched the text “bc”. That information is often important, but sometimes doesn’t matter. In that case, you can use the flag nosubs to generate a basic_regex object that will not report the details of capture groups. That can make matching significantly faster.^[6]

16.3.4. Optimizing Searches

When you generate a regular expression object, the library code scans the sequence of characters that defines the regular expression and converts it into an internal representation.

The internal representation usually takes one of two general forms. The one that’s more difficult to create can produce faster searches. Passing the optimize flag when you generate a regular expression object asks for fast searching, even if that means taking longer to build the internal representation. Just as with the nosubs flag, there is no enforceable requirement here. But if you need fast searches, it doesn’t hurt to ask.^[7]

16.4. The `basic_regex` Class Template

template<class Elem,
  class RXtraits = regex_traits<Elem> >
  class basic_regex {
public:

  basic_regex();
  explicit basic_regex(const Elem *ptr,
    flag_type flags = ECMAScript);
  basic_regex(const Elem *ptr, size_type count,
    flag_type flags = ECMAScript);
  basic_regex(const basic_regex& right);
  template<class STtraits, class STalloc>
    explicit basic_regex(
      const basic_string<Elem, STtraits, STalloc>& str,
      flag_type flags = ECMAScript);
  template<class InIt>
    explicit basic_regex(InIt first, InIt last,
      flag_type flags = ECMAScript);

  basic_regex& operator=(const Elem *ptr);
  template<class STtraits, class STalloc>
    basic_regex& operator=(
      const basic_string<Elem, STtraits, STalloc>& str);
  basic_regex& operator=(const basic_regex& right);
  basic_regex& assign(const Elem *ptr,
    flag_type flags = ECMAScript);
  basic_regex& assign(const Elem *ptr, size_type count,
    flag_type flags = ECMAScript);
  template<class STtraits, class STalloc>
    basic_regex& assign(
      const basic_string<Elem, STtraits, STalloc>& str,
      flag_type flags = ECMAScript);
  template<class InIt>
    basic_regex& assign(InIt first, InIt last,
      flag_type flags = ECMAScript);
  basic_regex& assign(const basic_regex& right);

  void swap(basic_regex& other) throw();

  locale_type imbue(locale_type loc);
  locale_type getloc() const;

  unsigned mark_count() const;

  flag_type flags() const;

  typedef Elem value_type;
  typedef regex_constants::syntax_option_type flag_type;
  typedef typename RXtraits::locale_type locale_type;

  static const flag_type ECMAScript =
    regex_constants::ECMAScript;
  static const flag_type basic =
    regex_constants::basic;
  static const flag_type extended =
    regex_constants::extended;
  static const flag_type grep =
    regex_constants::grep;
  static const flag_type egrep =
    regex_constants::egrep;
  static const flag_type awk =
    regex_constants::awk;

  static const flag_type nosubs =
    regex_constants::nosubs;
  static const flag_type optimize =
    regex_constants::optimize;
  static const flag_type icase =
    regex_constants::icase;
  static const flag_type collate =
    regex_constants::collate;
  };

16.4.1. `basic_regex` Summary

An object of type basic_regex<Elem> can be created by the template’s default constructor or from an operand sequence describing the regular expression that the object will hold. The constructors are discussed in Section 16.4.2 and the meanings of the various operand sequences are discussed in Section 16.1.

The destructor for basic_regex releases all resources used by the object.

You can change a basic_regex object so that it is empty or so that it holds a different regular expression. This is done with operator= or with the assign member functions, discussed in Section 16.4.3.

If a constructor, operator=, or assign fails, either because the operand sequence does not designate a valid regular expression or because there aren’t enough resources available, it throws an object of type regex_error. This is discussed in Section 16.6.

You can exchange the contents of two regular expression objects with the member function basic_regex::swap(basic_regex&) and with the non-member function swap(basic_regex&, basic_regex&). These functions are discussed in Section 16.4.4.

A basic_regex object holds a locale object that determines some of the properties of regular expression matching. You can change this locale object with the member function basic_regex::imbue, and you can get a copy of this locale object with the member function basic_regex::getloc. These functions are discussed in Section 16.4.5.

You can get the number of capture groups in a regular expression by calling the member function basic_regex::mark_count. You can get a copy of the flags used for the regular expression by calling the member function basic_regex::flags. These functions are discussed in Section 16.4.6.

The template basic_regex defines two nested type names, based on its template type arguments, and repeats several type names and constants that are also defined in the namespace std::tr1::regex_constants. These definitions are discussed in Section 16.4.7.

16.4.2. Creating `basic_regex` Objects

basic_regex::basic_regex();

The constructor constructs a basic_regex object that holds an empty regular expression.

explicit basic_regex::basic_regex(const Elem *ptr,
  flag_type flags = ECMAScript);
basic_regex::basic_regex(const Elem *ptr, size_type count,
  flag_type flags = ECMAScript);
basic_regex::basic_regex(const basic_regex& right);
template<class STtraits, class STalloc>
  explicit basic_regex::basic_regex(
    const basic_string<Elem, STtraits, STalloc>& str,
    flag_type flags = ECMAScript);
template<class InIt>
  explicit basic_regex::basic_regex(InIt first, InIt last,
    flag_type flags = ECMAScript);

Each of the constructors constructs a basic_regex object that holds a regular expression defined by the constructor’s operand sequence interpreted in accordance with the flags argument.

All the flags arguments have a default value of ECMAScript, so the default grammar is ECMAScript. To use a different grammar, pass the constant that represents that grammar to the constructor.

Example 16.2. basic_regex Constructors (regexbasic/construct.cpp)

#include <regex>
#include <string>
using std::tr1::regex;
using std::string;

int main()
  { // demonstrate basic_regex constructors
  regex rgx0;             // default constructor; matches nothing
  char expr1[] = "abc[d-f]";
  regex rgx1(expr1);      // holds "abc[d-f]", ECMAScript grammar
  regex rgx2(expr1, 3);   // holds "abc", ECMAScript grammar
  regex rgx3(rgx2);       // holds "abc", ECMAScript grammar
  string str("[def]");
  regex rgx4(str, regex::basic);
                          // holds "[def]", BRE grammar
  regex rgx5(str.begin(), str.end(),
    regex::basic | regex::icase);
                          // holds "[def]", BRE grammar,
                          // case insensitive
  return 0;
  }

16.4.3. Assigning `basic_regex` Objects

basic_regex& basic_regex::operator=(const Elem *ptr);
template<class STtraits, class STalloc>
  basic_regex& basic_regex::operator=(
    const basic_string<Elem, STtraits, STalloc>& str);
basic_regex& basic_regex::operator=(
  const basic_regex& right);

The operators each replace the regular expression held by *this with the regular expression defined by the operand sequence, then return *this.

The first two operators interpret the operand sequence in accordance with the ECMAScript grammar and no additional flags.

You cannot control the grammar or the other flags with an assignment.

Example 16.3. basic_regex Assignment Operators (regexbasic/assign.cpp)

#include <regex>
#include <string>
using std::tr1::regex;
using std::string;

int main()
  { // demonstrate basic_regex assignment operators
  regex rgx;               // empty regular expression object
  rgx = "abc";             // holds "abc", ECMAScript grammar
  string str("[def]");
  rgx = str;               // holds "[def]", ECMAScript grammar
  regex rgx1("abc[def]", regex::basic);
  rgx = rgx1;              // holds "abc[def]", BRE grammar
  return 0;
  }

basic_regex& basic_regex::assign(const Elem *ptr,
  flag_type flags = ECMAScript);
basic_regex& basic_regex::assign(
  const Elem *ptr, size_type count,
  flag_type flags = ECMAScript);
template<class STtraits, class STalloc>
  basic_regex& basic_regex::assign(
    const basic_string<Elem, STtraits, STalloc>& str,
    flag_type flags = ECMAScript);
template<class InIt>
  basic_regex& basic_regex::assign(InIt first, InIt last,
    flag_type flags = ECMAScript);
basic_regex& basic_regex::assign(const basic_regex& right);

Each of the member functions replaces the regular expression held by *this with the regular expression defined by the operand sequence interpreted in accordance with the flags argument, if present.

16.4.4. Swapping `basic_regex` Objects

void basic_regex::swap(basic_regex& other) throw();
template<class Elem, class RXtraits>
  void swap(basic_regex<Elem, RXtraits>& left,
    basic_regex<Elem, RXtraits>& right) throw();

The member function swaps the regular expressions between *this and other.

The non-member function calls left.swap(right).

Example 16.4. Member Function swap (regexbasic/swap.cpp)

#include <regex>
using std::tr1::regex;

int main()
  { // demonstrate use of swap
  regex rgx0;             // empty regular expression object
  regex rgx1("abc");      // holds "abc"
  rgx0.swap(rgx1);        // rgx0 holds "abc" and rgx1 is empty
  swap(rgx0, rgx1);       // rgx0 is empty and rgx1 holds "abc"
  return 0;
  }

16.4.5. Locales

locale_type basic_regex::imbue(locale_type loc);
locale_type basic_regex::getloc() const;

The first member function empties *this and calls imbue(loc) on the RXtraits object held by *this. The second member function returns a copy of the locale object held by the RXtraits object held by *this.

The interpretation of a regular expression depends on the locale that it was defined with. A basic_regex object does not keep a copy of the character sequence that defined its regular expression. The object can’t reinterpret the character sequence in accordance with the new locale, so when you call imbue, it discards the previous regular expression. If you don’t want to have an empty basic_regex object, you should provide a new regular expression by assigning from an operand sequence or calling assign or swap.

16.4.6. Access

unsigned basic_regex::mark_count() const;

The member function returns the number of capture groups in the regular expression.

When a basic_regex object was created with the flag nosubs, the regular expression engine is not required to keep track of the contents of capture groups. This does not affect the number of capture groups.

Example 16.5. basic_regex::mark_count (regexbasic/markcount.cpp)

#include <regex>
#include <iostream>
using std::tr1::regex;
using std::cout;

void show_count(const char *title, const regex& rgx)
  { // show number of capture groups
  cout << '"' << title << "\" has " << rgx.mark_count()
    << " capture group"
    << (rgx.mark_count() == 1 ? "" : "s")
    << ".\n";
  }

void show(const char *expr)
  { // construct regex object and show its count
  regex rgx(expr);
  show_count(expr, rgx);
  }

int main()
  { // demonstrate use of mark_count
  show("");
  show("abc");
  show("(abc)");
  show("(a)b(c)");
  show("(a(b)c)");
  return 0;
  }

flag_type basic_regex::flags() const;

The member function returns a copy of the flags argument that was passed when the regular expression was defined. If *this is empty, it returns 0.

Example 16.6. basic_regex::flags (regexbasic/flags.cpp)

#include <regex>
#include <iostream>
using std::tr1::regex;
using std::cout;

void show_flags(const regex& rgx)
  { // extract and show flag values
  regex::flag_type flags = rgx.flags();
  if ((flags & regex::ECMAScript) == regex::ECMAScript)
    cout << "ECMAScript";
  else if ((flags & regex::basic) == regex::basic)
    cout << "basic";
  else if ((flags & regex::extended) == regex::extended)
    cout << "extended";
  else if ((flags & regex::grep) == regex::grep)
    cout << "grep";
  else if ((flags & regex::egrep) == regex::egrep)
    cout << "egrep";
  else if ((flags & regex::awk) == regex::awk)
    cout << "awk";
  else
    cout << "unknown grammar";
  if ((flags & regex::icase) == regex::icase)
    cout << " | icase";
  if ((flags & regex::collate) == regex::collate)
    cout << " | collate";
  if ((flags & regex::nosubs) == regex::nosubs)
    cout << " | nosubs";
  if ((flags & regex::optimize) == regex::optimize)
    cout << " | optimize";
  cout << '\n';
  }

int main()
  { // demonstrate member function basic_regex::flags
  regex rgx;
  show_flags(rgx);
  rgx.assign("", regex::grep | regex::nosubs);
  show_flags(rgx);
  rgx = "a";
  show_flags(rgx);
  return 0;
  }

16.4.7. Nested Types and Flags

typedef Elem basic_regex::value_type;
typedef regex_constants::syntax_option_type
  basic_regex::flag_type;
typedef typename RXtraits::locale_type
  basic_regex::locale_type;

The first typedef is a synonym for the template argument Elem. The second typedef is a synonym for the type regex_constants::syntax_option_type. The third typedef is a synonym for the type locale_type, defined in the template argument RXtraits.

static const flag_type basic_regex::ECMAScript =
  regex_constants::ECMAScript;
static const flag_type basic_regex::basic =
  regex_constants::basic;
static const flag_type basic_regex::extended =
  regex_constants::extended;
static const flag_type basic_regex::grep =
  regex_constants::grep;
static const flag_type basic_regex::egrep =
  regex_constants::egrep;
static const flag_type basic_regex::awk =
  regex_constants::awk;

static const flag_type basic_regex::nosubs =
  regex_constants::nosubs;
static const flag_type basic_regex::optimize =
  regex_constants::optimize;

static const flag_type basic_regex::icase =
  regex_constants::icase;
static const flag_type basic_regex::collate =
  regex_constants::collate;

These constants are self-explanatory. They duplicate the values of some of the constants defined in the namespace std::tr1::regex_constants. Writing regex::basic is shorter than writing std::tr1::regex_constants::basic.

16.5. Predefined `basic_regex` Types

typedef basic_regex<char> regex;
typedef basic_regex<wchar_t> wregex;

The typedefs are synonyms for basic_regex<char> and basic_regex<wchar_t>, respectively.

These are the types you’ll use most often.

16.6. Error Handling

Some members of basic_regex and some global function templates throw an object of type regex_error when a runtime error occurs. The object holds a value of type regex_constants::error_type that indicates what the error was.

namespace regex_constants {
  typedef enumeration_type error_type;
  static const error_type error_backref, error_badbrace,
    error_badrepeat, error_brace, error_brack,
    error_collate, error_complexity, error_ctype,
    error_escape, error_paren, error_range,
    error_space, error_stack;
  }

The typedef error_type designates an implementation-defined integral type that can be used to designate an error. The values that this type can take on are

• error_backref: the regular expression contained an invalid back reference.

• error_badbrace: the regular expression contained an invalid value in a repetition count.

• error_badrepeat: a repetition symbol—one of *, ?, +, and { in ECMAScript—was not preceded by an expression.

• error_brace: the opening symbol or the closing symbol of a repetition count was not properly matched.

• error_brack: the opening symbol or the closing symbol of a bracket expression was not properly matched.

• error_collate: the regular expression contained an invalid collating element name.

• error_complexity: an attempted match failed because it was too complex.

• error_ctype: the regular expression contained an invalid character class name.

• error_escape: the regular expression contained an invalid character escape sequence.

• error_paren: the opening symbol or the closing symbol of a capture group was not properly matched.

• error_range: the regular expression contained an invalid character range specifier.

• error_space: parsing the regular expression failed because not enough resources were available.

• error_stack: an attempted match failed because not enough memory was available.

class regex_error : public std::runtime_error {
public:
  explicit regex_error(regex_constants::error_type error);
  regex_constants::error_type code() const;
  };

When an error occurs in parsing a regular expression or when matching a regular expression to a target sequence, the library code throws an object of type regex_error.

The member function regex_error::code returns a value of type regex_constants::error_type that indicates the nature of the error.

Example 16.7. Catching regex_error (regexbasic/error.cpp)

#include <regex>
#include <iostream>
using std::tr1::regex; using std::tr1::regex_error;
using std::cout;

const char *get_error(
  std::tr1::regex_constants::error_type code)
  { // translate error code to text
  switch (code)
    { // select text
    case std::tr1::regex_constants::error_backref:
      return "invalid back reference";
    case std::tr1::regex_constants::error_badbrace:
      return "invalid repetition count";
    case std::tr1::regex_constants::error_badrepeat:
      return "repeat not preceded by expression";
    case std::tr1::regex_constants::error_brace:
      return "unmatched curly brace";
    case std::tr1::regex_constants::error_brack:
      return "unmatched square bracket";
    case std::tr1::regex_constants::error_collate:
      return "invalid collating element name";
    case std::tr1::regex_constants::error_complexity:
      return "match too complex";
    case std::tr1::regex_constants::error_ctype:
      return "invalid character class name";
    case std::tr1::regex_constants::error_escape:
      return "invalid character escape sequence";
    case std::tr1::regex_constants::error_paren:
      return "unmatched parenthesis";
    case std::tr1::regex_constants::error_range:
      return "invalid range specifier";
    case std::tr1::regex_constants::error_space:
      return "insufficient resources";
    case std::tr1::regex_constants::error_stack:
      return "out of memory";
    default:
      return "unknown";
    }
  }

void test(const char *expr)
  { // construct regex object, catch exception
  cout << '`' << expr << "', ";
  try
    { // try to construct regex object with invalid regular expression
    regex rgx(expr);
    cout << "okay\n";
    }
  catch (const regex_error& error)
    { // catch regex_error object
    cout << get_error(error.code()) << '\n';
    }
  }

int main()
  { // demonstrate use of error_type
  test("a{3,1}");
  test("[b-a]");
  return 0;
  }

Exercises

Exercise 1

For each of the following errors, write a simple test case containing the error, and try to compile it. In the error messages, look for the key words that relate to the error in the code.

1. Attempting to construct a basic_regex<Elem> object from a C-style array holding a character type that is different from Elem

2. Attempting to construct a basic_regex<Elem> object from a basic_string<Other> object, where Other is different from Elem

3. Attempting to construct a basic_regex<Elem> object from two iterators of different types

4. Attempting to swap two basic_regex objects with different character types

5. Constructing a basic_regex object using a grammar flag without the appropriate namespace qualifiers

6. Constructing a basic_regex object using a grammar flag and an option flag without the appropriate namespace qualifiers

Exercise 2

Expand Example 16.7 by adding regular expressions to generate all the error codes except error_complexity, error_space, and error_stack.

Exercise 3

Write a program that constructs a regex object from the text on its command line.

Exercise 4

Write a program that uses std::getline to read a line of text from the standard input stream into a std::string object, and construct a regex object from that text.

Exercise 5

Write a program that uses std::getline to read a line of text from a file named “input.txt” into a std::string object, and construct a regex object from that text.

Exercise 6

Write a program that uses two objects of type std::istream_iterator<char> to read text from a file named “input.txt”, and construct a regex object directly from that text; that is, don’t use a string object or any other kind of intermediate storage.
