Exploring the quantifiers

Each of these different quantifiers is greedy. A greedy quantifier will grab as much as it possibly can before allowing the regex engine to move on to the next character in the expression.

In the following example, the expression is instructed to match everything it can, ending with a \ character. As a result, it takes everything up to and including the last \, because the expression is greedy:

PS> 'C:\long\path\to\some\files' -match '.*\\'; $matches[0]
True
C:\long\path\to\some\

The repetition operators can be made lazy by adding the ? character after the quantifier. A lazy quantifier, by contrast, matches as little as it can while still allowing the rest of the expression to match:

PS> 'C:\long\path\to\some\files' -match '.*?\\'; $matches[0]
True
C:\
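The same contrast applies to the other quantifiers, not only to *. The following is a minimal sketch that runs the greedy and lazy forms of + and {2,3} against one input; the variable names ($text, $patterns, $results) are illustrative only:

```powershell
# Compare greedy and lazy forms of two quantifiers against the same input.
$text = 'aaaa'
$patterns = 'a+', 'a+?', 'a{2,3}', 'a{2,3}?'

$results = foreach ($pattern in $patterns) {
    [PSCustomObject]@{
        Pattern = $pattern
        # [regex]::Match returns the first match; Value is the matched text.
        Match   = [regex]::Match($text, $pattern).Value
    }
}

$results
# a+      -> aaaa  (greedy: as many as possible)
# a+?     -> a     (lazy: the minimum of one)
# a{2,3}  -> aaa   (greedy: the upper bound)
# a{2,3}? -> aa    (lazy: the lower bound)
```

In each pair, the greedy form takes the largest repetition count that still lets the expression match, and the lazy form takes the smallest.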

A possible use of a lazy quantifier is parsing HTML. The following line describes a very simple HTML table. The goal is to get the first table data (td) element:

<table><tr><td>Value1</td><td>Value2</td></tr></table>

Using a greedy quantifier will potentially take too much:

PS> $html = '<table><tr><td>Value1</td><td>Value2</td></tr></table>'
PS> $html -match '<td>.+</td>'; $matches[0]
True
<td>Value1</td><td>Value2</td>

Using a negated character class is one possible way to solve this problem. The character class [^>] matches any character except >, so the match cannot extend past the > that closes the first </td> tag:

PS> $html = '<table><tr><td>Value1</td><td>Value2</td></tr></table>'
PS> $html -match '<td>[^>]+</td>'
True
PS> $matches[0]
<td>Value1</td>

Another way to solve this problem is to use a lazy quantifier:

PS> $html = '<table><tr><td>Value1</td><td>Value2</td></tr></table>'
PS> $html -match '<td>.+?</td>'
True
PS> $matches[0]
<td>Value1</td>
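If only the cell text is wanted, and not the surrounding tags, the lazy quantifier can be placed inside a capture group. The sketch below assumes the same $html string; $firstValue and $allValues are names chosen here for illustration. It also uses the .NET [regex]::Matches method, which returns every match instead of only the first:

```powershell
$html = '<table><tr><td>Value1</td><td>Value2</td></tr></table>'

# Lazy quantifier inside a capture group: group 1 holds only the cell text.
if ($html -match '<td>(.+?)</td>') {
    $firstValue = $matches[1]
}

# [regex]::Matches returns every non-overlapping match in the string.
$allValues = [regex]::Matches($html, '<td>(.+?)</td>') |
    ForEach-Object { $_.Groups[1].Value }

$firstValue   # Value1
$allValues    # Value1, Value2
```

This works for a simple, single-line fragment like the one above. For real-world HTML, a dedicated parser is usually the safer choice.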
