Possessive quantifiers

Possessive quantifiers are quantifiers that are greedy when matching text like greedy quantifiers do. Both greedy and possessive quantifiers try to match as many characters as possible. The important difference, however, is that the possessive quantifiers do not backtrack (go back) unlike greedy quantifiers; therefore, it is possible that the regex match fails if the possessive quantifiers go too far.

This table shows all the three types of quantifiers, side by side:

Greedy Quantifier Lazy Quantifier Possessive Quantifier
m* m*? m*+
m+ m+? m++
m? m?? m?+
m{X} m{X}? m{X}+
m{X,} m{X,}? m{X,}+
m{X,Y} m{X,Y}? m{X,Y}+

 

Let's take an example input string, a1b5, and see the behavior of the greedy, lazy, and possessive quantifiers.

If we apply a regex using the greedy quantifier, w+d, then it will match a1b (the longest match before backtracking starts) using w+, and 5 will be matched using d; thus, the full match will be a1b5.

Now, if we apply a regex using the non-greedy quantifier, w+?d, then it will match a (the shortest match before expanding starts) using w+?, and then the adjacent digit 1 will be matched using d. Thus, the first full match will be a1. If we let the regex execute again, then it will find another match, b5.

Finally, if we apply a regex using the possessive quantifier, w++d, then it will match all the characters a1b5 (the longest possible match without giving back) using w++ . Due to this, d remains unmatched, and hence the regex fails to find any match.

Let's take another example. The requirement is to match a string that starts with lowercase English alphabets or hyphen. The string can have any character after the alphabets/hyphens, except a colon. There can be any number of any characters of any length after the colon until the end.

An example of a valid input is as-df999 and that of an invalid input is asdf-:123.

Now, let's try solving this regex problem using a greedy quantifier regex:

    ^[a-z-]+[^:].*$

Unfortunately, this is not the right regex pattern because this regex will match both the aforementioned valid and invalid inputs. This is because of the backtracking behavior of the regex engine in greedy quantifiers. The [a-z-]+ pattern will find the longest possible match in the form of asdf-, but due to the negated character class pattern [^:] , the regex engine will backtrack one position to asdf and will match the next hyphen for [^:]. All the remaining text, that is, :123, will be matched using .*.

Let's try to solve this regex problem using the following possessive quantifier regex:

    ^[a-z-]++[^:].*$

This regex pattern will still match our valid input, but it will fail to match an invalid input because there is no backtracking in possessive quantifiers; hence, the regex engine will not go back any position after matching asdf- in the second example string. Since the next character is a colon and our regex sub-pattern is [^:], the regex engine will stop matching and correctly declare our invalid input a failed match.

Possessive quantifiers are good for the performance of the underlying regex engine because the engine does not have to keep any backtracking information in memory. The performance increase is even more when a regex fails to match because possessive quantifiers fail faster. So, remember that the benefit of possessive quantifiers is to improve the regex performance, especially when using nested quantifiers.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.12.76.164