The simplest kind of regular expression is a literal string. More complicated patterns involve the use of metacharacters to describe all the different choices and variations that you want to build into a pattern. Metacharacters don’t match themselves, but describe something else. The metacharacters are:
Metacharacter | Meaning |
---|---|
\ | Escapes the character(s) immediately following it |
. | Matches any single character except a newline (unless /s is used) |
^ | Matches at the beginning of the string (or line, if /m is used) |
$ | Matches at the end of the string (or line, if /m is used) |
* | Matches the preceding element 0 or more times |
+ | Matches the preceding element 1 or more times |
? | Matches the preceding element 0 or 1 times |
{...} | Specifies a range of occurrences for the element preceding it |
[...] | Matches any one of the class of characters contained within the brackets |
(...) | Groups regular expressions |
\| | Matches either the expression preceding or following it |
The . (single dot) is a wildcard character. When used in a regular expression, it matches any single character except the newline character (\n). With the /s modifier on the pattern match operator, the dot matches newlines as well. This modifier treats the string to be matched against as a single “long” string with embedded newlines.
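A minimal sketch of the difference the /s modifier makes. The sample string and variable names are illustrative assumptions, not part of the original text.

```perl
use strict;
use warnings;

my $text = "cat\ndog";   # sample string with an embedded newline

# Without /s, the dot refuses to match the newline between "cat" and "dog".
my $plain  = ($text =~ /cat.dog/)  ? 1 : 0;

# With /s, the dot matches any character, including the newline.
my $single = ($text =~ /cat.dog/s) ? 1 : 0;

print "without /s: $plain\n";   # prints "without /s: 0"
print "with /s: $single\n";     # prints "with /s: 1"
```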
The ^ and $ metacharacters are used as anchors in a regular expression. By default, ^ matches only at the beginning of the string, so it normally appears at the beginning of an expression. When the /m (multiline) modifier is used, it matches at the beginning of the string and after every newline (except a newline that is the last character of the string). Inside a bracketed character class, ^ matches itself unless it is the first character, in which case it negates the class. Outside a character class it always acts as an anchor, so escape it (\^) to match a literal caret.
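The behavior of ^ can be sketched as follows. The two-line sample string and the variable names are assumptions chosen for illustration.

```perl
use strict;
use warnings;

my $text = "first line\nsecond line\n";

# Without /m, ^ anchors only at the very start of the string.
my @default = $text =~ /^(\w+)/g;     # ("first")

# With /m, ^ also anchors after each embedded newline,
# but not after the final newline that ends the string.
my @multi = $text =~ /^(\w+)/mg;      # ("first", "second")

# As the first character of a character class, ^ negates the class.
my ($non_digits) = "abc123" =~ /^([^0-9]+)/;   # "abc"

print "@default\n";      # prints "first"
print "@multi\n";        # prints "first second"
print "$non_digits\n";   # prints "abc"
```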
Similarly, $ matches at the end of the string (or just before a newline that ends the string), and only when it appears at the end of a pattern. With /m, it matches just before every newline and at the end of the string. You need to escape $ to match a literal dollar sign in all cases, because if $ isn’t at the end of a pattern (or placed right before a ) or ]), Perl will attempt variable interpolation. The same holds true for the @ sign, which Perl will interpret as the start of an array variable unless it is backslashed.
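A short sketch of $ as an anchor and of escaping $ and @ to match them literally. The sample strings and variable names are hypothetical.

```perl
use strict;
use warnings;

my $text = "total\nprice\n";

# Without /m, $ matches only at the end of the string
# (or just before a newline that ends it).
my @default = $text =~ /(\w+)$/g;     # ("price")

# With /m, $ matches before every newline and at the end of the string.
my @multi = $text =~ /(\w+)$/mg;      # ("total", "price")

# A literal dollar sign or at sign is backslashed so that Perl
# does not try to interpolate a variable.
my ($amount) = 'cost: $42' =~ /\$(\d+)/;             # "42"
my ($domain) = 'user@example.com' =~ /\@([\w.]+)/;   # "example.com"

print "@default\n";
print "@multi\n";
print "$amount $domain\n";
```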
The *, +, and ? metacharacters are called quantifiers. They specify the number of times to match something. They act on the element immediately preceding them, which could be a single character (including the .), a grouped expression in parentheses, or a character class. The {...} construct is a generalized quantifier. You can put two numbers separated by a comma within the braces to specify the minimum and maximum number of times that the preceding element can match.
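The quantifiers can be compared side by side with a sketch like the one below. The test strings are made up for illustration, and the exact-count form {3} is an addition not mentioned in the text above.

```perl
use strict;
use warnings;

# * : zero or more of the preceding element
my $star  = ("ac"   =~ /^ab*c$/)     ? 1 : 0;   # 1: zero "b"s is acceptable
# + : one or more
my $plus  = ("ac"   =~ /^ab+c$/)     ? 1 : 0;   # 0: at least one "b" is required
# ? : zero or one
my $quest = ("abbc" =~ /^ab?c$/)     ? 1 : 0;   # 0: two "b"s is one too many
# {min,max} : between min and max repetitions
my $range = ("abbc" =~ /^ab{2,3}c$/) ? 1 : 0;   # 1: two "b"s is within 2..3

# A quantifier applies to a whole group or character class as well.
my $group = ("ababab" =~ /^(ab){3}$/)  ? 1 : 0;   # 1: "ab" repeated three times
my $class = ("2024"   =~ /^[0-9]{4}$/) ? 1 : 0;   # 1: exactly four digits

print "$star $plus $quest $range $group $class\n";   # prints "1 0 0 1 1 1"
```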
Parentheses are used to group characters or expressions. They also have the side effect of remembering what they matched, so you can recall and reuse the matched text through a special group of variables ($1, $2, and so on).
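As a rough illustration of how grouped matches are remembered, the sketch below pulls three fields out of a date-like string. The sample value and variable names are hypothetical.

```perl
use strict;
use warnings;

my $date = "2024-03-15";   # hypothetical sample input

my ($year, $month, $day) = ('', '', '');

# Each pair of parentheses remembers what it matched;
# the results are available afterward in $1, $2, and $3.
if ($date =~ /^(\d+)-(\d+)-(\d+)$/) {
    ($year, $month, $day) = ($1, $2, $3);
}

print "year=$year month=$month day=$day\n";   # prints "year=2024 month=03 day=15"
```

Copy the numbered variables into named ones right after a successful match, because the next successful match overwrites them.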
The | is the alternation operator in regular expressions. It matches either what’s on its left side or what’s on its right side, and it is not limited to single characters. For example:
/you|me|him|her/
looks for any of the four words. You should use parentheses to provide boundaries for alternation:
/And(y|rew)/
This will match either “Andy” or “Andrew”.
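The two patterns above can be tried out with a sketch like this one, which also shows why the parentheses matter. The input strings and variable names are assumptions for illustration.

```perl
use strict;
use warnings;

# Alternation across whole words.
my $pronoun = qr/you|me|him|her/;
my @hits = grep { /$pronoun/ } ("ask her", "ask them", "tell me");
# @hits now holds "ask her" and "tell me"

# Without parentheses, the alternatives are "Andy" and "rew",
# so a string that merely contains "rew" matches.
my $loose = ("screw" =~ /Andy|rew/)   ? 1 : 0;   # 1

# With parentheses, the choice is limited to what follows "And".
my $tight = ("screw" =~ /And(y|rew)/) ? 1 : 0;   # 0

my @names = grep { /And(y|rew)/ } ("Andy", "Andrew", "Anders");
# @names now holds "Andy" and "Andrew"

print join(", ", @hits), "\n";
print "$loose $tight\n";
print join(", ", @names), "\n";
```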