Be careful with capturing groups inside lookahead, lookbehind, and atomic groups

You learned to use capturing groups inside lookahead and lookbehind patterns in the preceding examples. However, remember that lookaround expressions are zero-width atomic groups. The regex engine exits from these groups as soon as the assertion evaluates to true or false, and it never goes back into them to try a different match. In other words, there is no backtracking inside these groups.
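To illustrate what "zero-width" means for a capturing group, here is a minimal sketch. It assumes Java's java.util.regex engine, and the class name LookaheadCaptureDemo and the sample pattern (?=(\d+))\d are illustrative only. The lookahead captures all the digits into group 1 but consumes none of them, so the overall match is just the single digit matched after it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: shows that a lookahead captures text without consuming it.
class LookaheadCaptureDemo {

    // Returns {overall match, group 1} for the first match, or null if there is none.
    static String[] firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return null;
        }
        return new String[] {m.group(), m.group(1)};
    }

    public static void main(String[] args) {
        // (?=(\d+)) captures "123" into group 1 and then resets the position,
        // so the following \d matches only the first digit.
        String[] result = firstMatch("(?=(\\d+))\\d", "123");
        System.out.println("match = " + result[0] + ", group 1 = " + result[1]);
        // prints: match = 1, group 1 = 123
    }
}
```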

Consider the following three regular expressions. The first one uses a plain non-capturing group, with no lookahead or atomic group. The second uses a lookahead expression, and the third uses an atomic group. In each pattern, a capturing group inside the outer group matches and captures zero or more word characters:

#(?:(\w*))\w*_\1
#(?=(\w*))\w*_\1
#(?>(\w*))\w*_\1

Suppose that we apply the preceding three regex patterns to the following input:

#abc_abc 

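The first regex, #(?:(\w*))\w*_\1, finds a successful match with group 1 as "abc". After (\w*) captures "abc", the engine matches \w* as an empty string, matches _, and then uses the back-reference \1 to match the final "abc" and complete the match. Because \w also matches the underscore, the capturing group (\w*) initially matches the rest of the input, "abc_abc". The regex engine therefore has to backtrack several times, giving back characters from the group, before it reaches this successful match.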
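The backtracking described above can be checked with a short sketch. It assumes Java's java.util.regex engine, where \w is [a-zA-Z_0-9] and so includes the underscore. The class and method names are hypothetical. Backslashes are doubled because the pattern is written as a Java string literal.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class for the first regex: a plain non-capturing group.
class PlainGroupDemo {

    // #(?:(\w*))\w*_\1 : the engine may backtrack into (\w*) and shorten it.
    static final Pattern PLAIN = Pattern.compile("#(?:(\\w*))\\w*_\\1");

    // Returns group 1 of the first match, or null if the regex does not match.
    static String group1(String input) {
        Matcher m = PLAIN.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        Matcher m = PLAIN.matcher("#abc_abc");
        if (m.find()) {
            // (\w*) first grabs "abc_abc", then gives characters back until
            // \w*_\1 can match; the final capture is "abc".
            System.out.println("match = " + m.group() + ", group 1 = " + m.group(1));
        } else {
            System.out.println("no match");
        }
        // prints: match = #abc_abc, group 1 = abc
    }
}
```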

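The second regex, #(?=(\w*))\w*_\1, fails to match. The (\w*) inside the lookahead matches and captures "abc_abc". Because a lookahead is zero-width, the engine then continues from the position right after #. The \w*_\1 that follows now needs an underscore followed by "abc_abc", and the input contains no such text. The engine can backtrack the \w* outside the lookahead, but it won't re-enter the lookahead to shorten group 1 as it does with the capturing group in the first regex, so the match fails.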
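A sketch of the second case, again assuming java.util.regex and using hypothetical names, runs the full pattern. It also runs the # plus the lookahead on their own, so you can see what group 1 is locked to.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class for the second regex: a capturing group inside a lookahead.
class LookaheadGroupDemo {

    // #(?=(\w*))\w*_\1 : group 1 is fixed once the lookahead succeeds.
    static final Pattern WITH_LOOKAHEAD = Pattern.compile("#(?=(\\w*))\\w*_\\1");

    // Only the "#" and the lookahead, to inspect what the lookahead captures.
    static final Pattern LOOKAHEAD_ONLY = Pattern.compile("#(?=(\\w*))");

    static boolean matches(String input) {
        return WITH_LOOKAHEAD.matcher(input).find();
    }

    // Returns what (\w*) captures inside the lookahead, or null if "#" is absent.
    static String lookaheadCapture(String input) {
        Matcher m = LOOKAHEAD_ONLY.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "#abc_abc";
        System.out.println("lookahead captures: " + lookaheadCapture(input));
        System.out.println("full regex matches: " + matches(input));
        // prints:
        // lookahead captures: abc_abc
        // full regex matches: false
    }
}
```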

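The third regex, #(?>(\w*))\w*_\1, also fails to match, for a similar reason. The atomic group matches and captures "abc_abc". Unlike a lookahead, it also consumes that text, so nothing is left in the input for _ to match. The regex engine won't backtrack into the atomic group to give back characters, so the match fails.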
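The atomic-group case can be checked the same way. This is a minimal sketch assuming java.util.regex, with hypothetical class and method names. Running the atomic group by itself shows both the capture and the fact that it consumes the input up to index 8, the end of "#abc_abc".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class for the third regex: a capturing group inside an atomic group.
class AtomicGroupDemo {

    // #(?>(\w*))\w*_\1 : the atomic group keeps everything it matched.
    static final Pattern WITH_ATOMIC = Pattern.compile("#(?>(\\w*))\\w*_\\1");

    // Only the "#" and the atomic group, to inspect what it captures and consumes.
    static final Pattern ATOMIC_ONLY = Pattern.compile("#(?>(\\w*))");

    static boolean matches(String input) {
        return WITH_ATOMIC.matcher(input).find();
    }

    // Returns what (\w*) captures inside the atomic group, or null if "#" is absent.
    static String atomicCapture(String input) {
        Matcher m = ATOMIC_ONLY.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "#abc_abc";
        Matcher m = ATOMIC_ONLY.matcher(input);
        if (m.find()) {
            // The atomic group consumes its match, so m.end() is the end of
            // the input and nothing is left for "_" to match.
            System.out.println("atomic group captures: " + m.group(1) + ", ends at index " + m.end());
        }
        System.out.println("full regex matches: " + matches(input));
        // prints:
        // atomic group captures: abc_abc, ends at index 8
        // full regex matches: false
    }
}
```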
