Appendix T
Regular Expressions

This appendix summarizes methods of creating and using regular expressions.

Creating Regular Expressions

This section describes the characters you can use to define regular expressions.

Character Escapes

Character escapes are character sequences that match special characters. The following table summarizes useful character escapes.

Escape    Meaning
\t        Matches the tab character
\r        Matches the return character
\n        Matches the new-line character
\nnn      Matches a character with ASCII code given by the two or three octal digits nnn
\xnn      Matches a character with ASCII code given by the two hexadecimal digits nn
\unnnn    Matches a character with Unicode representation given by the four hexadecimal digits nnnn
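
As a rough illustration of the escapes in this table, the following sketch matches the same characters in several ways. The sample strings are made up for the example, and the code assumes a recent C# compiler that allows top-level statements.

```csharp
using System;
using System.Text.RegularExpressions;

// Verbatim strings (@"...") pass the backslashes through to the regex engine.
bool tabMatches = Regex.IsMatch("a\tb", @"a\tb");        // \t matches the tab
bool octalMatches = Regex.IsMatch("a b", @"a\040b");     // octal 040 is the space character
bool hexMatches = Regex.IsMatch("A", @"^\x41$");         // hexadecimal 41 is 'A'
bool unicodeMatches = Regex.IsMatch("A", @"^\u0041$");   // U+0041 is also 'A'

Console.WriteLine($"{tabMatches} {octalMatches} {hexMatches} {unicodeMatches}");
```

All four tests should report True.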

Character Classes

A character class matches one of a set of characters. The following table summarizes useful character class constructs.

Construct       Meaning
[chars]         Matches one of the characters inside the brackets. For example, [aeiou] matches a single lowercase vowel.
[^chars]        Matches a character that is not inside the brackets. For example, [^aeiouAEIOU] matches a single nonvowel character such as Q, ?, or 3.
[first-last]    Matches a character between the character first and the character last. For example, [a-z] matches any lowercase letter between a and z. You can combine multiple ranges as in [a-zA-Z], which matches uppercase or lowercase letters.
.               This is a wildcard that matches any single character except \n. (To match a period, use the \. escape sequence.)
\w              Matches a single “word” character. Normally, this is equivalent to [a-zA-Z_0-9], so it matches letters, the underscore character, and digits.
\W              Matches a single nonword character. Normally, this is equivalent to [^a-zA-Z_0-9].
\s              Matches a single whitespace character. Normally, this includes space, form feed, new line, return, tab, and vertical tab.
\S              Matches a single nonwhitespace character. Normally, this matches everything except space, form feed, new line, return, tab, and vertical tab.
\d              Matches a single decimal digit. Normally, this is equivalent to [0-9].
\D              Matches a single character that is not a decimal digit. Normally, this is equivalent to [^0-9].
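
The following sketch tries several of these constructs on small, made-up inputs. It is only an illustration; the variable names are hypothetical, and the code assumes top-level statements are available.

```csharp
using System;
using System.Text.RegularExpressions;

// Bracketed classes, tested against single characters.
bool vowel = Regex.IsMatch("e", "^[aeiou]$");            // e is in the set
bool nonVowel = Regex.IsMatch("Q", "^[^aeiouAEIOU]$");   // Q is not a vowel
bool letter = Regex.IsMatch("G", "^[a-zA-Z]$");          // G falls in the range A-Z

// Shorthand classes, applied to longer strings.
string maskedDigits = Regex.Replace("Room 101, Floor 3", @"\d", "#");
string joined = Regex.Replace("a b\tc", @"\s", "_");
int wordChars = Regex.Matches("x_1 + y", @"\w").Count;   // x, _, 1, and y

// The wildcard does not match a new line; an escaped period matches only a period.
bool dotMatchesNewline = Regex.IsMatch("\n", ".");
bool literalDot = Regex.IsMatch("3.14", @"3\.14") && !Regex.IsMatch("3x14", @"3\.14");

Console.WriteLine(maskedDigits);   // Room ###, Floor #
Console.WriteLine(joined);         // a_b_c
```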

Anchors

An anchor matches a part of the input without reading any characters from it. The following table summarizes useful anchors.

Anchor    Meaning
^         Matches the beginning of the line or string
$         Matches the end of the string or before the \n at the end of the line or string
\A        Matches the beginning of the string
\Z        Matches the end of the string or before the \n at the end of the string
\z        Matches the end of the string
\G        Matches where the previous match ended
\B        Matches a nonword boundary
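
The differences between the end-of-string anchors are easy to miss, so here is a small sketch that compares them on a two-line string ending in a new line. The input text is invented for the example.

```csharp
using System;
using System.Text.RegularExpressions;

string text = "first\nsecond\n";

bool startA = Regex.IsMatch(text, @"\Afirst");     // true: "first" begins the string
bool endBigZ = Regex.IsMatch(text, @"second\Z");   // true: \Z tolerates one final \n
bool endSmallZ = Regex.IsMatch(text, @"second\z"); // false: \z must be the very end
bool endDollar = Regex.IsMatch(text, @"second$");  // true: $ also tolerates the final \n

// Without the Multiline option, ^ matches only at the start of the whole string.
bool caretDefault = Regex.IsMatch(text, @"^second");
bool caretMultiline = Regex.IsMatch(text, @"^second", RegexOptions.Multiline);

// \B matches between two word characters, such as the c and a in "cat".
bool nonBoundary = Regex.IsMatch("cat", @"\Bat");

Console.WriteLine($"{endBigZ} {endSmallZ} {caretDefault} {caretMultiline}");
```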

Regular Expression Options

You can specify regular expression options in three ways:

  • Pass a RegexOptions parameter to a Regex object’s constructor or to a pattern-matching method such as IsMatch.
  • Use the syntax (?options) to include inline options within a regular expression. Options can include i, m, n, s, and x. If the list begins with a - character, the following options are turned off.
  • Use the syntax (?options:subexpression) within a regular expression. In this case, options is as before and subexpression is the part of a regular expression during which the options should apply.

The following table lists the available options.

Option    Meaning
i         Ignore case.
m         Multiline. Here ^ and $ match the beginning and ending of lines instead of the beginning and ending of the whole input string.
s         Single-line. Here . matches all characters including \n.
n         Explicit capture. This makes the method not capture unnamed groups. See the following section “Grouping Constructs” for more information on groups.
x         Ignore unescaped whitespace in the pattern and enable comments after the # character.
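
The three ways of specifying options can be sketched as follows. The inputs are made up, and the example covers only the i, s, and x options.

```csharp
using System;
using System.Text.RegularExpressions;

// 1. A RegexOptions parameter.
bool viaParameter = Regex.IsMatch("HELLO", "hello", RegexOptions.IgnoreCase);

// 2. Inline options; a leading - turns options off again.
bool viaInline = Regex.IsMatch("HELLO", "(?i)hello");
bool turnedOff = Regex.IsMatch("HELLO", "(?i)he(?-i)llo");   // "llo" is case-sensitive again

// 3. Options scoped to a subexpression.
bool scoped = Regex.IsMatch("HELLO world", "(?i:hello) world");
bool outsideScope = Regex.IsMatch("HELLO WORLD", "(?i:hello) world");

// Single-line mode lets the wildcard match a new line.
bool dotDefault = Regex.IsMatch("a\nb", "a.b");
bool dotSingleLine = Regex.IsMatch("a\nb", "(?s)a.b");

// The x option ignores pattern whitespace and allows a trailing comment.
bool spacedPattern = Regex.IsMatch("123-4567", @"(?x) \d{3} - \d{4}  # 7-digit number");

Console.WriteLine($"{viaParameter} {viaInline} {scoped} {dotSingleLine} {spacedPattern}");
```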

Grouping Constructs

Grouping constructs let you define capture groups within matching pieces of a string. Parentheses create groups. There are several kinds of groups, some of which are fairly specialized and confusing. The two most common are numbered and named groups.

To create a numbered group, simply enclose a subexpression in parentheses as in (\w)\1. The \w in this expression matches a single word character. The parentheses mean this character is in the first numbered group. The \1 that follows matches whatever is in group 1, in this case a single word character. That means this expression matches a single word character that appears twice in a row.

To create a named group, use the syntax (?<name>subexpression) where name is the name you want to assign to the group and subexpression is a subexpression. Use the syntax \k<name> to refer to a named group.

For example, the expression (?<twice>\w)\k<twice> is equivalent to the previous expression (\w)\1 except the group is named twice.
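
A short sketch shows both kinds of group finding doubled letters. The word bookkeeper is just a convenient test input.

```csharp
using System;
using System.Text.RegularExpressions;

string word = "bookkeeper";

// Numbered group: \1 refers back to whatever group 1 captured.
Match numbered = Regex.Match(word, @"(\w)\1");
string firstPair = numbered.Value;                  // the first doubled letter
string numberedGroup = numbered.Groups[1].Value;    // the single captured character

// Named group: \k<twice> refers back to the group named twice.
Match named = Regex.Match(word, @"(?<twice>\w)\k<twice>");
string namedGroup = named.Groups["twice"].Value;

// Count every doubled letter in the word: oo, kk, and ee.
int pairs = Regex.Matches(word, @"(\w)\1").Count;

Console.WriteLine($"{firstPair} {numberedGroup} {namedGroup} {pairs}");   // oo o o 3
```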

Quantifiers

A quantifier makes the regular expression engine match the previous element a certain number of times. The following table describes regular expression quantifiers.

Quantifier    Meaning
*             Matches the previous element 0 or more times
+             Matches the previous element 1 or more times
?             Matches the previous element 0 or 1 times
{n}           Matches the previous element exactly n times
{n,}          Matches the previous element n or more times
{n,m}         Matches the previous element between n and m times (inclusive)

If you follow one of these with ?, the pattern matches the preceding expression as few times as possible. For example, the pattern BO+ matches B followed by 1 or more Os, so it would match the BOO in BOOK. The pattern BO+? also matches B followed by 1 or more Os, but it matches as few Os as possible, so it would match only the BO in BOOK.
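
The BOOK example can be checked directly. The sketch below also exercises a few other quantifiers on made-up inputs.

```csharp
using System;
using System.Text.RegularExpressions;

// Greedy versus lazy matching on the same input.
string greedy = Regex.Match("BOOK", "BO+").Value;   // as many Os as possible
string lazy = Regex.Match("BOOK", "BO+?").Value;    // as few Os as possible

// Other quantifiers.
bool zeroOrMore = Regex.IsMatch("BK", "^BO*K$");        // * accepts zero Os
bool oneOrMore = Regex.IsMatch("BK", "^BO+K$");         // + needs at least one O
bool optional = Regex.IsMatch("color", "^colou?r$");    // ? makes the u optional
bool inRange = Regex.IsMatch("aaa", "^a{2,3}$");        // 3 is within 2 to 3
bool tooMany = Regex.IsMatch("aaaa", "^a{2,3}$");       // 4 is outside 2 to 3

Console.WriteLine($"{greedy} {lazy}");   // BOO BO
```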

Alternation Constructs

An alternation construct uses the | character to allow a pattern to match either of two subexpressions. For example, the expression ^(true|yes)$ matches either true or yes.
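
The parentheses in that expression matter. As a rough sketch with invented inputs, compare the grouped pattern with an ungrouped one, in which each anchor applies to only one alternative.

```csharp
using System;
using System.Text.RegularExpressions;

// Grouped: the whole string must be exactly true or exactly yes.
bool exactYes = Regex.IsMatch("yes", "^(true|yes)$");
bool extraText = Regex.IsMatch("yes please", "^(true|yes)$");

// Ungrouped: this means "starts with true" OR "ends with yes".
bool ungrouped = Regex.IsMatch("truest", "^true|yes$");

Console.WriteLine($"{exactYes} {extraText} {ungrouped}");   // True False True
```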

Sample Regular Expressions

The following list shows several useful regular expressions.

  • ^\d{3}-\d{4}$: Matches a simple 7-digit phone number.
  • ^[2-9][0-9]{2}-\d{4}$: Matches a 7-digit phone number more precisely.
  • ^[2-9][0-8]\d-[2-9][0-9]{2}-\d{4}$: Matches a 10-digit U.S. phone number with the format NPA-NXX-XXXX where N is a digit 2-9, P is a digit 0-8, A is any digit 0-9, and X is any digit 0-9.
  • ^([2-9][0-8]\d-)?[2-9][0-9]{2}-\d{4}$: Matches a U.S. phone number with an optional area code such as 202-234-5678 or 234-5678.
  • ^\d{5}(-\d{4})?$: Matches a U.S. ZIP code with optional +4 as in 12345 or 12345-6789.
  • ^[A-Z]\d[A-Z] \d[A-Z]\d$: Matches a Canadian postal code with the format A1A 1A1 where A is any capital letter and 1 is any digit.
  • ^[a-zA-Z0-9._-]{3,16}$: Matches a username with 3 to 16 characters that can be dashes, letters, digits, periods, or underscores.
  • ^[a-zA-Z][a-zA-Z0-9._-]{2,15}$: Matches a username that includes a letter followed by 2 to 15 dashes, letters, digits, periods, or underscores.
  • ^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9._%+-]+\.[a-zA-Z]{2,4}$: Matches an e-mail address. (This pattern isn’t perfect but it matches most valid e-mail addresses.)
  • ^[+-]?[a-fA-F0-9]{3}$: Matches a 3-digit hexadecimal value with an optional + or - sign, as in +A1F.
  • ^(https?://)?([\w-]+\.)+[\w-]+$: Matches a top-level HTTP web address such as http://www.csharphelper.com. (This pattern isn’t perfect. In particular it doesn’t validate the final part of the domain, so it would match www.something.whatever.)
  • ^(https?://)?([\w-]+\.)+[\w-]+(/(([\w-]+)(\.[\w-]+)*)*)*$: Matches an HTTP web URI such as http://www.csharphelper.com/howto_index.html. (Again this pattern isn’t perfect and doesn’t handle some more advanced URLs such as those that include =, ?, and # characters, but it does handle many typical URLs.)
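
To show how patterns like these might be used for validation, the following sketch tests three of them against a few inputs. The variable names and most test values are made up; verbatim strings keep the backslashes intact.

```csharp
using System;
using System.Text.RegularExpressions;

string zipPattern = @"^\d{5}(-\d{4})?$";
string phonePattern = @"^([2-9][0-8]\d-)?[2-9][0-9]{2}-\d{4}$";
string postalPattern = @"^[A-Z]\d[A-Z] \d[A-Z]\d$";

bool zipShort = Regex.IsMatch("12345", zipPattern);             // plain ZIP code
bool zipPlus4 = Regex.IsMatch("12345-6789", zipPattern);        // ZIP+4
bool zipBad = Regex.IsMatch("1234", zipPattern);                // too few digits

bool phoneFull = Regex.IsMatch("202-234-5678", phonePattern);   // with area code
bool phoneLocal = Regex.IsMatch("234-5678", phonePattern);      // without area code
bool phoneBad = Regex.IsMatch("123-4567", phonePattern);        // cannot start with 1

bool postalGood = Regex.IsMatch("K1A 0B1", postalPattern);
bool postalLower = Regex.IsMatch("k1a 0b1", postalPattern);     // lowercase is rejected

Console.WriteLine($"{zipShort} {zipPlus4} {zipBad}");           // True True False
```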

Using Regular Expressions

The Regex class provides objects that you can use to work with regular expressions. The following table summarizes the Regex class’s most useful methods.

Method     Purpose
IsMatch    Returns true if a string satisfies a regular expression.
Match      Searches a string for the first part of it that satisfies a regular expression.
Matches    Returns a collection giving information about all parts of a string that satisfy a regular expression.
Replace    Replaces some or all the parts of the string that match a regular expression with a new value. (This is much more powerful than the string class’s Replace method.)
Split      Splits a string into an array of substrings delimited by pieces of the string that match a regular expression.

The Regex class also provides static versions of these methods that take both a string to examine and a regular expression as parameters.

The following sections summarize how to use the Regex class to perform common regular expression tasks.
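
Before turning to those tasks, here is a brief sketch of two points from the summary above: the instance and static forms of a method give the same answer, and Split can break a string at delimiters described by a pattern. The inputs and the delimiter pattern are invented for the example.

```csharp
using System;
using System.Text.RegularExpressions;

// Instance form: the pattern is stored in the Regex object.
Regex digits = new Regex(@"\d+");
bool instanceResult = digits.IsMatch("abc123");

// Static form: the pattern is passed along with the input.
bool staticResult = Regex.IsMatch("abc123", @"\d+");

// Split at a comma or semicolon, along with any surrounding whitespace.
string[] fruits = Regex.Split("apples, pears;plums", @"\s*[,;]\s*");

Console.WriteLine(string.Join("|", fruits));   // apples|pears|plums
```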

Matching Patterns

The Regex class’s static IsMatch method gives you an easy way to determine whether a string satisfies a regular expression. The following code tests whether the text in variable text matches the pattern in variable pattern.

if (Regex.IsMatch(text, pattern))
    result = "Match";
else
    result = "No match";

Finding Matches

The Regex class’s Matches method can give you information about places where a string matches a regular expression. The following code locates pieces of the string in variable text that match the pattern in variable pattern.

// Make the regex object.
Regex regex = new Regex(pattern);

// Find the matches.
foreach (Match match in regex.Matches(text))
{
    // Display the match.
    Console.WriteLine(match.Value);
}

The following table lists the Match class’s most useful properties.

Property    Purpose
Groups      Returns a collection of objects representing any groups captured by the regular expression. The Group class has Index, Length, and Value properties that describe the group.
Index       The index of the match’s first character.
Length      The length of the text represented by this match.
Value       The text represented by this match.
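
The following sketch reads these properties for each match in a made-up string. The pattern and the report list are assumptions chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

string text = "Call 555-1234 or 555-9876";
Regex regex = new Regex(@"(\d{3})-(\d{4})");

// Describe each match and its second captured group.
var report = new List<string>();
foreach (Match match in regex.Matches(text))
{
    Group lastFour = match.Groups[2];
    report.Add($"{match.Value} at {match.Index}, length {match.Length}; " +
               $"group 2 is {lastFour.Value} at {lastFour.Index}");
}

foreach (string line in report) Console.WriteLine(line);
```

The first line of output should be 555-1234 at 5, length 8; group 2 is 1234 at 9.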

Making Replacements

The Regex class’s static Replace method enables you to replace the parts of a string that match a pattern with a new string. The following code examines the string in variable text, locates pieces that match the pattern in variable pattern, and replaces them with the text in variable replaceWith.

string result = Regex.Replace(
    text,
    pattern,
    replaceWith);

For example, the following code replaces vowels in a string with question marks.

string result =
    Regex.Replace("The quick brown fox jumps over the lazy dog", "[aeiou]", "?");

The following text shows the result.

Th? q??ck br?wn f?x j?mps ?v?r th? l?zy d?g
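
Replace can do more than substitute fixed text. As a sketch that goes slightly beyond the example above, the replacement string can refer to captured groups with $1, $2, and so on, and another overload takes a delegate that computes each replacement. The inputs here are invented.

```csharp
using System;
using System.Text.RegularExpressions;

// Swap the two captured words by referring to groups in the replacement string.
string reordered = Regex.Replace("Stephens, Rod", @"^(\w+),\s*(\w+)$", "$2 $1");

// Compute each replacement with a delegate: uppercase every vowel.
string raised = Regex.Replace("the lazy dog", "[aeiou]", m => m.Value.ToUpper());

Console.WriteLine(reordered);   // Rod Stephens
Console.WriteLine(raised);      // thE lAzy dOg
```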