Regular Expression Syntax

The simplest kind of regular expression is a literal string. More complicated patterns involve the use of metacharacters to describe all the different choices and variations that you want to build into a pattern. Metacharacters don’t match themselves, but describe something else. The metacharacters are:

Metacharacter   Meaning
\               Escapes the character(s) immediately following it
.               Matches any single character except a newline (unless /s is used)
^               Matches at the beginning of the string (or line, if /m is used)
$               Matches at the end of the string (or line, if /m is used)
*               Matches the preceding element 0 or more times
+               Matches the preceding element 1 or more times
?               Matches the preceding element 0 or 1 times
{...}           Specifies a range of occurrences for the element preceding it
[...]           Matches any one of the class of characters contained within the brackets
(...)           Groups regular expressions
|               Matches either the expression preceding or following it

The . (single dot) is a wildcard character. In a regular expression, it matches any single character except the newline character (\n), unless you use the /s modifier on the pattern match operator. This modifier treats the string to be matched against as a single “long” string with embedded newlines.
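As a minimal sketch of the difference the /s modifier makes, the following matches the same pattern with and without it. The sample string and variable names are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample: two words separated by a newline.
my $text = "first\nsecond";

# Without /s, the dot cannot match the newline, so the match fails.
my $plain = ($text =~ /first.second/) ? 1 : 0;

# With /s, the dot matches the newline as well.
my $single = ($text =~ /first.second/s) ? 1 : 0;

print "without /s: $plain\n";   # prints "without /s: 0"
print "with /s: $single\n";     # prints "with /s: 1"
```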

The ^ and $ metacharacters are used as anchors in a regular expression. The ^ matches at the beginning of the string. When the /m (multiline) modifier is used, it also matches after every newline, except a newline that is the last character of the string. Outside a bracketed character class, ^ always acts as an anchor, so you must escape it (\^) to match a literal caret. Inside a character class, ^ negates the class if it is the first character; anywhere else in the class it matches itself.

Similarly, $ matches at the end of the string, or just before a newline that ends the string. When /m is used, it matches just before every newline and at the end of the string. You need to escape $ to match a literal dollar sign in all cases, because if $ isn’t at the end of a pattern (or placed right before a ) or ]), Perl will attempt variable interpolation. The same holds true for the @ sign, which Perl will interpret as the start of an array variable unless it is backslashed.
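The anchor behavior can be sketched as follows, assuming a made-up two-line string. The variable names are illustrative only; the last two matches show escaped \$ and \@ matching literal characters.

```perl
use strict;
use warnings;

# Hypothetical sample: two lines, each ending in a newline.
my $text = "alpha\nbeta\n";

# Without /m, ^ matches only at the start of the string.
my @starts_default = $text =~ /^(\w+)/g;    # ("alpha")

# With /m, ^ also matches after the embedded newline.
my @starts_multi = $text =~ /^(\w+)/mg;     # ("alpha", "beta")

# Without /m, $ matches only at the end, just before the final newline.
my @ends_default = $text =~ /(\w+)$/g;      # ("beta")

# With /m, $ matches just before every newline.
my @ends_multi = $text =~ /(\w+)$/mg;       # ("alpha", "beta")

# Literal dollar and at signs are escaped inside the pattern.
my $has_dollar = ('cost: $5' =~ /\$\d/) ? 1 : 0;
my $has_at     = ('user@example.com' =~ /\w\@\w/) ? 1 : 0;

print "starts, default: @starts_default\n";
print "starts, with /m: @starts_multi\n";
print "ends, default: @ends_default\n";
print "ends, with /m: @ends_multi\n";
```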

The *, +, and ? metacharacters are called quantifiers. They specify the number of times to match something. They act on the element immediately preceding them, which could be a single character (including the .), a grouped expression in parentheses, or a character class. The {...} construct is a generalized quantifier. You can put two numbers separated by a comma within the braces to specify the minimum and maximum number of times that the preceding element can match.
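To make the quantifiers concrete, here is a small sketch that filters made-up word lists with each one. The anchors ^ and $ are added so that each pattern must account for the whole word; the lists and variable names are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical words that differ only in the number of "a" characters.
my @words = qw(ct cat caat caaat);

my @star     = grep { /^ca*t$/ }     @words;  # 0 or more: ct cat caat caaat
my @plus     = grep { /^ca+t$/ }     @words;  # 1 or more: cat caat caaat
my @optional = grep { /^ca?t$/ }     @words;  # 0 or 1:    ct cat
my @range    = grep { /^ca{2,3}t$/ } @words;  # 2 to 3:    caat caaat

# A quantifier can also follow a parenthesized group or a character class.
my @grouped = grep { /^(ab)+$/ }  qw(ab abab aba);  # ab abab
my @digits  = grep { /^[0-9]+$/ } qw(42 4x2 7);     # 42 7

print "star: @star\n";
print "plus: @plus\n";
print "optional: @optional\n";
print "range: @range\n";
print "grouped: @grouped\n";
print "digits: @digits\n";
```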

Parentheses are used to group characters or expressions. They also have the side effect of remembering what they matched, so you can recall and reuse the matched text through a special group of variables ($1, $2, and so on).
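A brief sketch of how captured text can be recalled after a successful match. The date string and variable names are made up for illustration; $1, $2, and $3 hold the text matched by the first, second, and third pair of parentheses.

```perl
use strict;
use warnings;

# Hypothetical input string.
my $date = "2024-01-15";

my ($year, $month, $day);
if ($date =~ /(\d+)-(\d+)-(\d+)/) {
    # Copy the captures right away; a later successful match overwrites them.
    ($year, $month, $day) = ($1, $2, $3);
}

print "$day/$month/$year\n";   # prints "15/01/2024"
```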

The | is the alternation operator in regular expressions. It matches either what’s on its left side or what’s on its right side. It applies to whole expressions, not just to single characters. For example:

/you|me|him|her/

looks for any of the four words. You should use parentheses to provide boundaries for alternation:

/And(y|rew)/

This will match either “Andy” or “Andrew”.
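The effect of the parentheses can be sketched by running a grouped and an ungrouped pattern over the same made-up list of names. The anchors are added here for illustration so that each name must match in full.

```perl
use strict;
use warnings;

# Hypothetical candidate names.
my @names = qw(Andy Andrew Andrea brew);

# Grouped: "And" followed by either "y" or "rew".
my @grouped = grep { /^And(y|rew)$/ } @names;   # Andy Andrew

# Ungrouped: the | splits the whole pattern into /^Andy/ or /rew$/.
my @ungrouped = grep { /^Andy|rew$/ } @names;   # Andy Andrew brew

print "grouped: @grouped\n";
print "ungrouped: @ungrouped\n";
```

Without the parentheses, "brew" slips through because it satisfies the right-hand alternative on its own.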
