The BCL includes support for regular expression matching and replacement. The expressions are based on Perl 5 regular expressions, including lazy quantifiers (??, *?, +?, {n,m}?), positive and negative lookahead, and conditional evaluation.
The types mentioned in this section all exist in the System.Text.RegularExpressions namespace.
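To make the difference between greedy and lazy quantifiers, and the effect of lookahead, more concrete, here is a minimal sketch. The class name and the sample strings are illustrative assumptions and do not come from the BCL documentation.

```csharp
using System;
using System.Text.RegularExpressions;

class QuantifierDemo   // hypothetical class name, for illustration only
{
    static void Main()
    {
        string html = "<b>one</b><b>two</b>";

        // Greedy: .* takes as much as it can, so the match runs to the last </b>.
        Console.WriteLine(Regex.Match(html, "<b>.*</b>").Value);    // <b>one</b><b>two</b>

        // Lazy: .*? takes as little as it can, so the match stops at the first </b>.
        Console.WriteLine(Regex.Match(html, "<b>.*?</b>").Value);   // <b>one</b>

        string css = "10em 20px";

        // Positive lookahead: digits that are followed by "px"; "px" is not consumed.
        Console.WriteLine(Regex.Match(css, @"\d+(?=px)").Value);    // 20

        // Negative lookahead: digits that are not followed by "px".
        Console.WriteLine(Regex.Match(css, @"\d+(?!px)").Value);    // 10
    }
}
```

A lookahead only tests what follows the current position, so the text it inspects is never part of the returned match.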
The Regex class is the heart of the BCL regular expression support. Used both as an object instance and a static type, the Regex class represents an immutable, compiled instance of a regular expression that can be applied to a string via a matching process.
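The two ways of using Regex can be sketched as follows. The pattern and input strings are made up for illustration; the instance form is usually preferable when the same pattern is applied many times, since the expression is parsed once.

```csharp
using System;
using System.Text.RegularExpressions;

class StaticVersusInstance   // hypothetical class name, for illustration only
{
    static void Main()
    {
        string input = "a   b \t c";

        // Instance use: construct the Regex once, then reuse it.
        Regex whitespace = new Regex(@"\s+");
        Console.WriteLine(whitespace.Replace(input, " "));      // a b c

        // Static use: pass the pattern with each call.
        Console.WriteLine(Regex.Replace(input, @"\s+", " "));   // a b c
        Console.WriteLine(Regex.IsMatch("abc123", @"^\w+$"));   // True
    }
}
```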
Internally, the regular expression is stored either as a sequence of internal regular expression bytecodes that are interpreted at match time, or as compiled MSIL opcodes that the CLR JIT-compiles at runtime. This lets you trade slower regular expression startup and higher memory use for faster raw matching at runtime.
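In the released .NET API, this choice is made with the RegexOptions.Compiled flag; without it, the expression is interpreted. The following sketch shows that both forms give the same result and differ only in how they execute. The pattern is a deliberately simplified, illustrative email matcher, not a recommended one.

```csharp
using System;
using System.Text.RegularExpressions;

class CompiledDemo   // hypothetical class name, for illustration only
{
    static void Main()
    {
        string pattern = @"\b\w+@\w+\.\w+\b";   // simplified pattern, for illustration
        string input = "contact: alice@example.com";

        // Default: the pattern is turned into internal bytecodes and interpreted.
        Regex interpreted = new Regex(pattern);

        // Compiled: the pattern is emitted as MSIL. Construction costs more,
        // which may pay off when the same Regex is used for many matches.
        Regex compiled = new Regex(pattern, RegexOptions.Compiled);

        Console.WriteLine(interpreted.Match(input).Value);   // alice@example.com
        Console.WriteLine(compiled.Match(input).Value);      // alice@example.com
    }
}
```

Whether compilation is worthwhile depends on how often the expression is reused, so it is worth measuring rather than assuming.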
For more information on the regular-expression options, supported character escapes, substitution patterns, character sets, positioning assertions, quantifiers, grouping constructs, backreferences, and alternation, see Appendix B.
The Match class represents the result of applying a regular expression to a string, looking for the first successful match. The MatchCollection class contains a collection of Match instances that represent the result of applying a regular expression to a string repeatedly, each search starting where the previous match ended, until the first unsuccessful match occurs.
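As a short sketch of the difference, Regex.Match returns only the first match, while Regex.Matches returns a MatchCollection holding every successive match. The input string here is an assumption chosen for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

class MatchDemo   // hypothetical class name, for illustration only
{
    static void Main()
    {
        string text = "10 apples, 20 pears, 30 plums";
        Regex digits = new Regex(@"\d+");

        // Match: the first successful match only.
        Match first = digits.Match(text);
        Console.WriteLine(first.Value + " at " + first.Index);   // 10 at 0

        // Matches: every successive match, gathered into a MatchCollection.
        MatchCollection all = digits.Matches(text);
        Console.WriteLine(all.Count);                            // 3
        foreach (Match m in all)
        {
            Console.WriteLine(m.Value + " at " + m.Index);       // 10 at 0, 20 at 11, 30 at 21
        }
    }
}
```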
The Group class represents the results from a single grouping expression. From this class, it is possible to drill down to the individual subexpression matches with the Captures property.
The CaptureCollection class contains a collection of Capture instances, each representing the results of a single subexpression match.
Combining these classes, you can create the following example:
/*
 * Sample showing multiple groups,
 * and groups with multiple captures
 * Build the sample as:
 *   csc test.cs
 */
using System;
using System.Text.RegularExpressions;

class Test
{
    static void Main( )
    {
        string text = "abracadabra1abracadabra2abracadabra3";
        string pat = @"
            (       # start the first group
              abra  # match the literal 'abra'
              (     # start the second (inner) group
                cad # match the literal 'cad'
              )?    # end the second (optional) group
            )       # end the first group
            +       # match one or more occurrences
            ";
        Console.WriteLine("Original text = [" + text + "]");

        // IgnorePatternWhitespace (the 'x' modifier) ignores whitespace
        // and # comments in the pattern
        Regex r = new Regex(pat, RegexOptions.IgnorePatternWhitespace);
        int[] gnums = r.GetGroupNumbers( );       // get the list of group numbers
        Match m = r.Match(text);                  // get first match
        while (m.Success)
        {
            for (int i = 1; i < gnums.Length; i++)   // start at group 1
            {
                Group g = m.Groups[gnums[i]];     // get the group for this match
                Console.WriteLine("Group" + gnums[i] + "=[" + g.ToString( ) + "]");
                CaptureCollection cc = g.Captures;   // get caps for this group
                for (int j = 0; j < cc.Count; j++)
                {
                    Capture c = cc[j];
                    Console.WriteLine("Capture" + j + "=[" + c.ToString( ) +
                        "] Index=" + c.Index + " Length=" + c.Length);
                }
            }
            m = m.NextMatch( );                   // get next match
        }
    }
}
The preceding example produces the following output:
Original text = [abracadabra1abracadabra2abracadabra3]
Group1=[abra]
Capture0=[abracad] Index=0 Length=7
Capture1=[abra] Index=7 Length=4
Group2=[cad]
Capture0=[cad] Index=4 Length=3
Group1=[abra]
Capture0=[abracad] Index=12 Length=7
Capture1=[abra] Index=19 Length=4
Group2=[cad]
Capture0=[cad] Index=16 Length=3
Group1=[abra]
Capture0=[abracad] Index=24 Length=7
Capture1=[abra] Index=31 Length=4
Group2=[cad]
Capture0=[cad] Index=28 Length=3