Standard regular expression constructs and meta characters

Let's get familiar with the core constructs of regular expressions and the reserved meta characters that have a special meaning in them. We shall cover these constructs in detail in the coming chapters:

Symbol: . (dot or period)
Meaning: Matches any character other than newline.
Example: Matches #, @, A, f, 5, or .

Symbol: * (asterisk)
Meaning: Matches zero or more occurrences of the preceding character or group.
Example: m* matches zero or more occurrences of the letter m.

Symbol: + (plus)
Meaning: Matches one or more occurrences of the preceding element.
Example: m+ matches one or more occurrences of the letter m.

Symbol: ? (question mark)
Meaning: Means optional match. It is used to match zero or one occurrence of the preceding element. It is also used for lazy matching (which will be covered in the coming chapters).
Example: nm? matches n or nm, as m is an optional match here.

Symbol: | (pipe)
Meaning: Means alternation. It is used to match one of the elements separated by |.
Example: m|n|p matches either the letter m, the letter n, or the letter p.

Symbol: ^ (caret)
Meaning: An anchor that matches the start of the line.
Example: ^m matches m only when it is the first character of the string that we are testing against the regular expression. Also, note that as an anchor, ^ normally appears at the start of a regular expression (or of an alternative), not in the middle.

Symbol: $ (dollar)
Meaning: An anchor that matches the end of the line.
Example: m$ matches m only at the end of the line.

Symbol: \b (backslash followed by the letter b)
Meaning: Letters, digits, and the underscore are considered word characters. \b asserts a word boundary, which is the position just before or just after a word.
Example: \bjava\b matches the word java. It does not match javascript, because \b fails to assert after java in that word.

Symbol: \B (backslash followed by uppercase B)
Meaning: Asserts wherever \b doesn't, for example, between two word characters.
Example: For the input text abc, \B asserts at two places:
  1. Between a and b.
  2. Between b and c.

Symbol: (...) (a sub-pattern inside round parentheses)
Meaning: Groups a part of the pattern, either to capture a certain substring or to set precedence.
Example: m(ab)*t matches m, followed by zero or more occurrences of the substring ab, followed by t.

Symbol: {min,max}
Meaning: A quantifier range that matches the preceding element between the minimum and the maximum number of times.
Example: mp{2,4} matches m followed by 2 to 4 occurrences of the letter p.

Symbol: [...]
Meaning: This is called a character class. It matches any one of the characters listed inside the brackets.
Example: [A-Z] matches any uppercase English letter.

Symbol: \d (backslash followed by the letter d)
Meaning: Matches any digit.
Example: \d matches any digit in the 0-9 range.

Symbol: \D (backslash followed by uppercase D)
Meaning: Matches any character that is not a digit.
Example: \D matches a, $, or _.

Symbol: \s (backslash followed by the letter s)
Meaning: Matches any whitespace, including tab, space, or newline.
Example: \s matches [ \t\n].

Symbol: \S (backslash followed by uppercase S)
Meaning: Matches any non-whitespace character.
Example: \S matches the opposite of \s.

Symbol: \w (backslash followed by the letter w)
Meaning: Matches any word character, that is, any alphanumeric character or the underscore.
Example: \w matches [a-zA-Z0-9_], so it matches every character in these strings: "abc", "a123", or "pq_12_ABC".

Symbol: \W (backslash followed by uppercase W)
Meaning: Matches any non-word character, including whitespace. In regex, any character that is not matched by \w can be matched using \W.
Example: \W matches every character in these strings: "+/=", "$", or " !~".
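
To see several of the constructs listed above in action, here is a minimal sketch. It assumes Python's built-in re module, which this section does not itself name, and the helper functions whole, anywhere, and positions are hypothetical names introduced only for this illustration. Other regex flavors may differ in small details; for instance, Python's \w and \d also accept non-ASCII letters and digits by default.

```python
import re


def whole(pattern, text):
    """Return True when the entire text matches the pattern."""
    return re.fullmatch(pattern, text) is not None


def anywhere(pattern, text):
    """Return True when the pattern matches somewhere in the text."""
    return re.search(pattern, text) is not None


def positions(pattern, text):
    """Return the start index of every match (handy for zero-width assertions)."""
    return [m.start() for m in re.finditer(pattern, text)]


# Dot: any character other than newline
print(whole(r".", "#"), whole(r".", "\n"))                    # → True False

# Quantifiers: *, +, ? and {min,max}
print(whole(r"m*", ""), whole(r"m+", ""))                     # → True False
print(whole(r"nm?", "n"), whole(r"nm?", "nm"))                # → True True
print(whole(r"mp{2,4}", "mppp"), whole(r"mp{2,4}", "mp"))     # → True False

# Grouping and alternation
print(whole(r"m(ab)*t", "mt"), whole(r"m(ab)*t", "mababt"))   # → True True
print(whole(r"m|n|p", "n"), whole(r"m|n|p", "q"))             # → True False

# Anchors: ^ at the start, $ at the end
print(anywhere(r"^m", "map"), anywhere(r"^m", "am"))          # → True False
print(anywhere(r"m$", "am"), anywhere(r"m$", "ma"))           # → True False

# Word boundaries: \b and its opposite \B
print(anywhere(r"\bjava\b", "I like java"))                   # → True
print(anywhere(r"\bjava\b", "javascript"))                    # → False
print(positions(r"\B", "abc"))                                # → [1, 2]

# Shorthand character classes
print(re.findall(r"\d", "a1b22"))                             # → ['1', '2', '2']
print(re.findall(r"\w+", "pq_12_ABC +/= a123"))               # → ['pq_12_ABC', 'a123']
```

The patterns are written as raw strings (the r prefix) so that Python passes backslashes such as \b and \d to the regex engine unchanged.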