Appendix B. Regular Expression Reference

Regular expressions play an important role in most text parsing and text matching tasks. They underpin the -split and -match operators, the switch statement, the Select-String cmdlet, and more. Tables B-1 through B-9 list commonly used regular expressions.
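As a quick orientation before the tables, the following sketch shows the four features named above applied to small sample strings. The sample strings and variable names are made up for illustration.

```powershell
# -match tests a string against a pattern and fills $Matches on success.
$isMatch = "Hello World" -match '\w+$'
$lastWord = $Matches[0]                  # World

# -split treats the pattern as a delimiter.
$parts = "a1b22c" -split '\d+'           # a, b, c

# switch -Regex runs every branch whose pattern matches the input.
$verdict = switch -Regex ("error: disk full") {
    '^error'   { "Problem" }
    '^warning' { "Caution" }
}

# Select-String keeps only the input strings that match the pattern.
$hits = "one", "two", "three" | Select-String 't'
```

Note that -match, -split, switch -Regex, and Select-String are all case-insensitive by default.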

Table B-1. Character classes: Patterns that represent sets of characters

Character class

Matches

.

Any character except for a newline. If the regular expression uses the SingleLine option, it matches any character.

PS > "T" -match '.'
True

[characters]

Any character in the brackets. For example: [aeiou].

PS > "Test" -match '[Tes]'
True

[^characters]

Any character not in the brackets. For example: [^aeiou].

PS > "Test" -match '[^Tes]'
False

[start-end]

Any character between the characters start and end, inclusive. You may include multiple character ranges between the brackets. For example, [a-eh-j].

PS > "Test" -match '[e-t]'
True

[^start-end]

Any character not between any of the character ranges start through end, inclusive. You may include multiple character ranges between the brackets. For example, [^a-eh-j].

PS > "Test" -match '[^e-t]'
False

\p{character class}

Any character in the Unicode group or block range specified by {character class}.

PS > "+" -match '\p{Sm}'
True

\P{character class}

Any character not in the Unicode group or block range specified by {character class}.

PS > "+" -match '\P{Sm}'
False

\w

Any word character. Note that this is the Unicode definition of a word character, which includes letters and decimal digits from any script, as well as connector punctuation such as the underscore.

PS > "a" -match '\w'
True

\W

Any nonword character.

PS > "!" -match '\W'
True

\s

Any whitespace character.

PS > "`t" -match '\s'
True

\S

Any nonwhitespace character.

PS > " `t" -match '\S'
False

\d

Any decimal digit.

PS > "5" -match '\d'
True

\D

Any character that isn’t a decimal digit.

PS > "!" -match '\D'
True

Table B-2. Quantifiers: Expressions that enforce quantity on the preceding expression

Quantifier

Meaning

<none>

One match.

PS > "T" -match 'T'
True

*

Zero or more matches, matching as much as possible.

PS > "A" -match 'T*'
True
PS > "TTTTT" -match '^T*$'
True

PS > 'ATTT' -match 'AT*'; $Matches[0]
True
ATTT

+

One or more matches, matching as much as possible.

PS > "A" -match 'T+'
False
PS > "TTTTT" -match '^T+$'
True

PS > 'ATTT' -match 'AT+'; $Matches[0]
True
ATTT

?

Zero or one matches, matching as much as possible.

PS > "TTTTT" -match '^T?$'
False

PS > 'ATTT' -match 'AT?'; $Matches[0]
True
AT

{n}

Exactly n matches.

PS > "TTTTT" -match '^T{5}$'
True

{n,}

n or more matches, matching as much as possible.

PS > "TTTTT" -match '^T{4,}$'
True

{n,m}

Between n and m matches (inclusive), matching as much as possible.

PS > "TTTTT" -match '^T{4,6}$'
True

*?

Zero or more matches, matching as little as possible.

PS > "A" -match '^AT*?$'
True

PS > 'ATTT' -match 'AT*?'; $Matches[0]
True
A

+?

One or more matches, matching as little as possible.

PS > "A" -match '^AT+?$'
False

PS > 'ATTT' -match 'AT+?'; $Matches[0]
True
AT

??

Zero or one matches, matching as little as possible.

PS > "A" -match '^AT??$'
True

PS > 'ATTT' -match 'AT??'; $Matches[0]
True
A

{n}?

Exactly n matches.

PS > "TTTTT" -match '^T{5}?$'
True

{n,}?

n or more matches, matching as little as possible.

PS > "TTTTT" -match '^T{4,}?$'
True

{n,m}?

Between n and m matches (inclusive), matching as little as possible.

PS > "TTTTT" -match '^T{4,6}?$'
True

Table B-3. Grouping constructs: Expressions that let you group characters, patterns, and other expressions

Grouping construct

Description

(text)

Captures the text matched inside the parentheses. These captures are named by number (starting at one) based on the order of the opening parenthesis.

PS > "Hello" -match '^(.*)llo$'; $matches[1]
True
He

(?<name>)

Captures the text matched inside the parentheses. These captures are named by the name given in name.

PS > "Hello" -match '^(?<One>.*)llo$'; $matches.One
True
He

(?<name1-name2>)

A balancing group definition. This is an advanced regular expression construct, but lets you match evenly balanced pairs of terms.

(?:)

Noncapturing group.

PS > "A1" -match '((A|B)\d)'; $matches
True

Name                              Value
----                              -----
2                                 A
1                                 A1
0                                 A1

PS > "A1" -match '((?:A|B)\d)'; $matches
True

Name                              Value
----                              -----
1                                 A1
0                                 A1

(?imnsx-imnsx:)

Applies or disables the given option for this group. Supported options are:

i   case-insensitive
m   multiline
n   explicit capture
s   singleline
x   ignore whitespace

PS > "Te`nst" -match '(T e.st)'
False
PS > "Te`nst" -match '(?sx:T e.st)'
True

(?=)

Zero-width positive lookahead assertion. Ensures that the given pattern matches to the right, without actually performing the match.

PS > "555-1212" -match '(?=...-)(.*)'; $matches[1]
True
555-1212

(?!)

Zero-width negative lookahead assertion. Ensures that the given pattern does not match to the right, without actually performing the match.

PS > "friendly" -match '(?!friendly)friend'
False

(?<=)

Zero-width positive lookbehind assertion. Ensures that the given pattern matches to the left, without actually performing the match.

PS > "public int X" -match '^.*(?<=public )int .*$'
True

(?<!)

Zero-width negative lookbehind assertion. Ensures that the given pattern does not match to the left, without actually performing the match.

PS > "private int X" -match '^.*(?<!private )int .*$'
False

(?>)

Nonbacktracking subexpression. Matches only if this subexpression can be matched completely.

PS > "Hello World" -match '(Hello.*)orld'
True
PS > "Hello World" -match '(?>Hello.*)orld'
False

The nonbacktracking version of the subexpression fails to match, as its complete match would be “Hello World”.
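The balancing group definition is the one construct in Table B-3 without an example. The following is a minimal sketch that checks whether parentheses are evenly balanced. The group names open and inner are hypothetical, chosen only for this illustration.

```powershell
# (?<open>\()         pushes a capture onto the "open" stack for each "(".
# (?<inner-open>\))   pops one "open" capture for each ")" and fails if the
#                     stack is empty, so a stray ")" cannot match.
# (?(open)(?!))       fails the match if any "(" is still unclosed at the end.
$balanced = '^(?:[^()]|(?<open>\()|(?<inner-open>\)))*(?(open)(?!))$'

$evenPairs   = "(a(b)c)" -match $balanced    # True
$missingEnd  = "(a(b)c"  -match $balanced    # False
$wrongOrder  = "a)b("    -match $balanced    # False
```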

Table B-4. Atomic zero-width assertions: Patterns that restrict where a match may occur

Assertion

Restriction

^

The match must occur at the beginning of the string (or line, if the Multiline option is in effect).

PS > "Test" -match '^est'
False

$

The match must occur at the end of the string (or line, if the Multiline option is in effect).

PS > "Test" -match 'Tes$'
False

\A

The match must occur at the beginning of the string.

PS > "The`nTest" -match '(?m:^Test)'
True
PS > "The`nTest" -match '(?m:\ATest)'
False

\Z

The match must occur at the end of the string, or before \n at the end of the string.

PS > "The`nTest`n" -match '(?m:The$)'
True
PS > "The`nTest`n" -match '(?m:The\Z)'
False
PS > "The`nTest`n" -match 'Test\Z'
True
True

\z

The match must occur at the end of the string.

PS > "The`nTest`n" -match 'Test\z'
False

\G

The match must occur where the previous match ended. Used with System.Text.RegularExpressions.Match.NextMatch().

\b

The match must occur on a word boundary: the first or last characters in words separated by nonalphanumeric characters.

PS > "Testing" -match 'ing\b'
True

\B

The match must not occur on a word boundary.

PS > "Testing" -match 'ing\B'
False
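The \G assertion has no example in Table B-4 because it only matters across successive matches. The following sketch, using a made-up input string, walks the matches with NextMatch() and compares the result with the same pattern without \G.

```powershell
# With \G, each match must begin exactly where the previous one ended,
# so matching stops at the first nondigit.
$anchored = [regex] '\G\d'
$found = @()
$match = $anchored.Match('123a45')
while ($match.Success) {
    $found += $match.Value
    $match = $match.NextMatch()
}
$withG = $found -join ','                # 1,2,3

# Without \G, the engine skips over the "a" and keeps finding digits.
$withoutG = ([regex]::Matches('123a45', '\d') |
    ForEach-Object { $_.Value }) -join ','   # 1,2,3,4,5
```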

Table B-5. Substitution patterns: Patterns used in a regular expression replace operation

Pattern

Substitution

$number

The text matched by group number number.

PS > "Test" -replace "(.*)st",'$1ar'
Tear

${name}

The text matched by group named name.

PS > "Test" -replace "(?<pre>.*)st",'${pre}ar'
Tear

$$

A literal $.

PS > "Test" -replace ".",'$$'
$$$$

$&

A copy of the entire match.

PS > "Test" -replace "^.*$",'Found: $&'
Found: Test

$`

The text of the input string that precedes the match.

PS > "Test" -replace "est$",'Te$`'
TTeT

$'

The text of the input string that follows the match.

PS > "Test" -replace "^Tes",'Res$'''
Restt

$+

The last group captured.

PS > "Testing" -replace "(.*)ing",'$+ed'
Tested

$_

The entire input string.

PS > "Testing" -replace "(.*)ing",'String: $_'
String: Testing

Table B-6. Alternation constructs: Expressions that let you perform either/or logic

Alternation construct

Description

|

Matches any of the terms separated by the vertical bar character.

PS > "Test" -match '(B|T)est'
True

(?(expression)yes|no)

Matches the yes term if expression matches at this point. Otherwise, matches the no term. The no term is optional.

PS > "3.14" -match '(?(\d)3.14|Pi)'
True
PS > "Pi" -match '(?(\d)3.14|Pi)'
True
PS > "2.71" -match '(?(\d)3.14|Pi)'
False

(?(name)yes|no)

Matches the yes term if the capture group named name has a capture at this point. Otherwise, matches the no term. The no term is optional.

PS > "123" -match '(?<one>1)?(?(one)23|234)'
True
PS > "23" -match '(?<one>1)?(?(one)23|234)'
False
PS > "234" -match '(?<one>1)?(?(one)23|234)'
True

Table B-7. Backreference constructs: Expressions that refer to a capture group within the expression

Backreference construct

Refers to

\number

Group number number in the expression.

PS > "|Text|" -match '(.)Text\1'
True
PS > "|Text+" -match '(.)Text\1'
False

\k<name>

The group named name in the expression.

PS > "|Text|" -match '(?<Symbol>.)Text\k<Symbol>'
True
PS > "|Text+" -match '(?<Symbol>.)Text\k<Symbol>'
False

Table B-8. Other constructs: Other expressions that modify a regular expression

Construct

Description

(?imnsx-imnsx)

Applies or disables the given option for the rest of this expression. Supported options are:

i   case-insensitive
m   multiline
n   explicit capture
s   singleline
x   ignore whitespace

PS > "Te`nst" -match '(?sx)T e.st'
True

(?# )

Inline comment. This terminates at the first closing parenthesis.

PS > "Test" -match '(?# Match ''Test'')Test'
True

# [to end of line]

Comment form allowed when the regular expression has the IgnoreWhitespace option enabled.

PS > "Test" -match '(?x)Test # Matches Test'
True

Table B-9. Character escapes: Character sequences that represent another character

Escaped character

Match

<ordinary characters>

Characters other than . $ ^ { [ ( | ) * + ? \ match themselves.

\a

A bell (alarm) \u0007.

\b

A backspace \u0008 if in a [] character class. In a regular expression, \b denotes a word boundary (between \w and \W characters) except within a [] character class, where \b refers to the backspace character. In a replacement pattern, \b always denotes a backspace.

\t

A tab \u0009.

\r

A carriage return \u000D.

\v

A vertical tab \u000B.

\f

A form feed \u000C.

\n

A new line \u000A.

\e

An escape \u001B.

\ddd

An ASCII character as octal (up to three digits). Numbers with no leading zero are treated as backreferences if they have only one digit, or if they correspond to a capturing group number.

\xdd

An ASCII character using hexadecimal representation (exactly two digits).

\cC

An ASCII control character; for example, \cC is control-C.

\udddd

A Unicode character using hexadecimal representation (exactly four digits).

\

When followed by a character that is not recognized as an escaped character, matches that character. For example, \* is the literal character *.
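Table B-9 has no examples, so the following sketch exercises a few of its escapes against made-up sample strings. It also uses the .NET [regex]::Escape() method, which is not covered in this appendix, as one way to escape literal text without doing it by hand.

```powershell
# \t in the pattern matches the tab produced by PowerShell's `t escape.
$tab = "a`tb" -match 'a\tb'              # True

# Hexadecimal and Unicode escapes for the letter A (code point 0x41).
$hex     = "A" -match '\x41'             # True
$unicode = "A" -match '\u0041'           # True

# A backslash turns the + quantifier into a literal plus sign.
$escapedPlus   = "1+1" -match '^1\+1$'   # True
$unescapedPlus = "1+1" -match '^1+1$'    # False: this means one or more 1s, then a 1

# [regex]::Escape() adds the backslashes for you.
$literal = [regex]::Escape('1+1')        # 1\+1
```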
