Regular Expressions

The BCL includes support for regular expression matching and replacement. The expressions are based on Perl 5 regular expressions, including lazy quantifiers (??, *?, +?, {n,m}?), positive and negative lookahead, and conditional evaluation.
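As a brief illustration of two of these features, the following sketch contrasts a greedy quantifier with its lazy counterpart and then uses a positive lookahead. The LazyDemo class, its First helper, and the input strings are invented for this example.

```csharp
using System;
using System.Text.RegularExpressions;

class LazyDemo
{
    // Hypothetical helper: returns the first match of pattern in input,
    // or an empty string if there is no match.
    public static string First(string input, string pattern)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Value : "";
    }

    static void Main()
    {
        string html = "<b>bold</b> and <i>italic</i>";

        // Greedy: .+ consumes as much as possible, so the match
        // runs from the first '<' to the last '>'
        Console.WriteLine(First(html, "<.+>"));   // <b>bold</b> and <i>italic</i>

        // Lazy: .+? consumes as little as possible, so the match
        // stops at the first '>'
        Console.WriteLine(First(html, "<.+?>"));  // <b>

        // Positive lookahead: match digits only when they are followed
        // by "px", without including "px" in the match
        Console.WriteLine(First("width 120px, height 80", @"\d+(?=px)"));  // 120
    }
}
```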

The types mentioned in this section all exist in the System.Text.RegularExpressions namespace.

Regex Class

The Regex class is the heart of the BCL regular expression support. Used both as an object instance and a static type, the Regex class represents an immutable, compiled instance of a regular expression that can be applied to a string via a matching process.

Internally, the regular expression is stored either as a sequence of internal regular expression bytecodes that are interpreted at match time or as compiled MSIL opcodes that are JIT-compiled by the CLR at runtime. This lets you trade slower startup and higher memory use for faster matching at runtime.
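A minimal sketch of the two storage forms, and of instance versus static use, might look like the following. It assumes the RegexOptions.Compiled flag is what selects MSIL compilation. The CompileDemo class, the FirstYear helper, and the sample text are invented for this example.

```csharp
using System;
using System.Text.RegularExpressions;

class CompileDemo
{
    // Default options: the pattern is interpreted at match time.
    static readonly Regex Interpreted = new Regex(@"\b\d{4}\b");

    // RegexOptions.Compiled: the pattern is compiled to MSIL. Construction
    // costs more, but repeated matching is typically faster.
    static readonly Regex Compiled =
        new Regex(@"\b\d{4}\b", RegexOptions.Compiled);

    // Hypothetical helper: returns the first four-digit number in input,
    // or an empty string if there is none.
    public static string FirstYear(Regex r, string input)
    {
        Match m = r.Match(input);
        return m.Success ? m.Value : "";
    }

    static void Main()
    {
        string s = "Released in 2002, revised in 2005.";

        // Both instances give the same result; only the cost profile differs.
        Console.WriteLine(FirstYear(Interpreted, s));  // 2002
        Console.WriteLine(FirstYear(Compiled, s));     // 2002

        // Static use: no Regex instance is created by the caller.
        Console.WriteLine(Regex.IsMatch(s, @"\b\d{4}\b"));  // True
    }
}
```

The compiled form is usually worth its setup cost only when the same expression is applied many times.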

For more information on the regular-expression options, supported character escapes, substitution patterns, character sets, positioning assertions, quantifiers, grouping constructs, backreferences, and alternation, see Appendix B.

Match and MatchCollection Classes

The Match class represents the result of applying a regular expression to a string, looking for the first successful match. The MatchCollection class contains a collection of Match instances that represent the result of applying a regular expression to a string repeatedly, each time resuming where the previous match ended, until no further match is found.
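The difference between the two can be sketched as follows: Match returns only the first match, while Matches returns every successive match as a MatchCollection. The MatchesDemo class, the CountWords helper, and the sample text are invented for this example.

```csharp
using System;
using System.Text.RegularExpressions;

class MatchesDemo
{
    // Hypothetical helper: counts the runs of word characters in input.
    public static int CountWords(string input)
    {
        return Regex.Matches(input, @"\w+").Count;
    }

    static void Main()
    {
        string text = "one two three";
        Regex r = new Regex(@"\w+");

        // Match: the first successful match only
        Match first = r.Match(text);
        Console.WriteLine(first.Value + " at " + first.Index);  // one at 0

        // Matches: every successive match, in order
        MatchCollection all = r.Matches(text);
        foreach (Match m in all)
        {
            // prints "one at 0", "two at 4", "three at 8"
            Console.WriteLine(m.Value + " at " + m.Index);
        }

        Console.WriteLine(CountWords(text));  // 3
    }
}
```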

Group Class

The Group class represents the results from a single grouping expression. From this class, it is possible to drill down to the individual subexpression matches with the Captures property.

Capture and CaptureCollection Classes

The CaptureCollection class contains a collection of Capture instances, each representing the results of a single subexpression match.

Using Regular Expressions

Combining these classes, you can create the following example:

/*
 * Sample showing multiple groups,
 * and groups with multiple captures
 * Build the sample as:
 * csc test.cs
 */
using System;
using System.Text.RegularExpressions;
class Test
  {
  static void Main( )
    {
    string text = "abracadabra1abracadabra2abracadabra3";
    string pat = @"
	(		# start the first group
	  abra		# match the literal 'abra'
	  (		# start the second (inner) group
	  cad		# match the literal 'cad'
	  )?		# end the second (optional) group
	)		# end the first group
	+		# match one or more occurrences
	";
    Console.WriteLine("Original text = ["+text+"]");
    Regex r = new Regex(pat, RegexOptions.IgnorePatternWhitespace);	// ignore whitespace and # comments in the pattern
    int[] gnums = r.GetGroupNumbers( );	// get the list of group numbers
    Match m = r.Match(text);		// get first match
    while (m.Success)
      {
      for (int i = 1; i < gnums.Length; i++)	// start at group 1
	{
	Group g = m.Groups[gnums[i]];		// get the group for this match
	Console.WriteLine("Group"+gnums[i]+"=["+g.ToString( )+"]");
	CaptureCollection cc = g.Captures;	// get caps for this group
	for (int j = 0; j < cc.Count; j++)
	  {
	  Capture c = cc[j];
	  Console.WriteLine(	"\tCapture" + j + "=["+c.ToString( ) +
				 "] Index=" + c.Index + " Length=" + c.Length);
	  }
	}
      m = m.NextMatch( );		// get next match
      }
    }
  }

The preceding example produces the following output:

Original text = [abracadabra1abracadabra2abracadabra3]
Group1=[abra]
	Capture0=[abracad] Index=0 Length=7
	Capture1=[abra] Index=7 Length=4
Group2=[cad]
	Capture0=[cad] Index=4 Length=3
Group1=[abra]
	Capture0=[abracad] Index=12 Length=7
	Capture1=[abra] Index=19 Length=4
Group2=[cad]
	Capture0=[cad] Index=16 Length=3
Group1=[abra]
	Capture0=[abracad] Index=24 Length=7
	Capture1=[abra] Index=31 Length=4
Group2=[cad]
	Capture0=[cad] Index=28 Length=3