Forward references

A back reference refers to a capturing group after that group has captured its text. A forward reference, by contrast, refers to a capturing group before that group has captured anything. In other words, the forward reference refers to a capturing group that appears later in the regular expression pattern. Forward references are only useful inside a repeated group, because the regex engine can populate the captured group on one iteration and resolve the forward reference on a later iteration of the quantifier.

Like many modern regex flavors, Java allows forward references to capturing groups. Forward references are a fairly complex concept in regex.

Let's consider an example regex to understand better:

    ^(\d+)-(?:\2[\dA-Fa-f]{4}|(\d*\.?\d+:))+$

In this pattern, the group reference \2 is used before capturing group 2 is defined. That group comes later in the regex pattern as (\d*\.?\d+:). For this reason, \2 is a forward reference rather than a back reference.

With the preceding regex, the following string matches:

    666-6.6:6.6:abcd3.3: 

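The string matches because, in the first repetition of the non-capturing group, the second capturing group captures 6.6:. The forward reference \2 then matches that text in the second repetition, followed by abcd, and the third repetition matches 3.3: through the second alternative.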

The following string also matches because the second captured group is overwritten each time it is matched:

    666-3.3:5.5:5.5:abcd 

The first value that the second capturing group captures is 3.3:, but it is later overwritten by 5.5:. The third repetition then matches this new value through the first alternative of the non-capturing group, followed by abcd.

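The following string, however, does not match, even though the second group captures the string 3.3: in the first repetition:

    666-3.3:abcd3.3:

In the first repetition, group 2 has not captured anything yet, so the forward reference \2 is undefined. In Java, an undefined group reference fails outright and does not match anything, not even the empty string. The first alternative therefore fails, and the second alternative captures 3.3:. In the second repetition, \2 requires the text 3.3:, but the input continues with abcd, so the first alternative fails again. The second alternative also fails, because abcd does not begin with a digit. With no alternative left to try, the overall match fails. The 3.3: at the end of the string arrives too late to help, since a forward reference can only use what the group has already captured.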
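
The three strings above can be checked with a short Java program. This is a minimal sketch, assuming the pattern is written with the backslashes escaped for a Java string literal. The class name ForwardReferenceDemo and the helper lastGroup2 are illustrative names, not part of any library.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ForwardReferenceDemo {
    // \2 appears before capturing group 2 is defined, so it is a forward reference.
    static final Pattern PATTERN =
        Pattern.compile("^(\\d+)-(?:\\2[\\dA-Fa-f]{4}|(\\d*\\.?\\d+:))+$");

    // Hypothetical helper: returns the last text captured by group 2,
    // or null when the whole input does not match.
    static String lastGroup2(String input) {
        Matcher m = PATTERN.matcher(input);
        return m.matches() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        String[] inputs = {
            "666-6.6:6.6:abcd3.3:",
            "666-3.3:5.5:5.5:abcd",
            "666-3.3:abcd3.3:"
        };
        for (String input : inputs) {
            // Prints the input and the final value of group 2 (null if no match).
            System.out.println(input + " -> " + lastGroup2(input));
        }
    }
}
```

For the first two inputs, the final value of group 2 is 3.3: and 5.5: respectively, which shows the group being overwritten on each repetition. The third input yields null because the match fails.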

To summarize this behavior, a forward reference refers to a group that is defined later in the regular expression, but it can match only after that group has already captured text at least once.

For this reason, a forward reference is useful only inside a repeated construct, typically as one branch of an alternation.
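
To see why the repetition matters, the following minimal sketch compares a forward reference used without a loop against one used inside a repeated alternation. Both patterns and the class name ForwardReferenceScope are made up for illustration and do not come from the examples in this section.

```java
import java.util.regex.Pattern;

public class ForwardReferenceScope {
    // No repetition: \1 is consulted before group 1 has any chance to capture,
    // so in Java this pattern cannot match any input.
    static final Pattern NO_LOOP = Pattern.compile("^\\1(a)$");

    // Inside a repeated alternation: group 1 can capture on an earlier
    // iteration, and \1 can use that value on a later one.
    static final Pattern IN_LOOP = Pattern.compile("^(?:\\1b|(a))+$");

    static boolean matchesWithoutLoop(String input) {
        return NO_LOOP.matcher(input).matches();
    }

    static boolean matchesInLoop(String input) {
        return IN_LOOP.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println("no loop, \"a\":  " + matchesWithoutLoop("a"));
        System.out.println("no loop, \"aa\": " + matchesWithoutLoop("aa"));
        System.out.println("in loop, \"aab\": " + matchesInLoop("aab"));
        System.out.println("in loop, \"ab\":  " + matchesInLoop("ab"));
    }
}
```

The looped pattern accepts aab, where group 1 captures a on the first pass and \1b consumes ab on the second. It rejects ab, because the b is not preceded by a second copy of the captured text.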

Similarly, in the following example, the regex pattern uses a forward reference:

    ^(?:\1#color|(red|blue|white))+$

This anchored regex uses the forward reference \1, followed by the literal text #color, on the left-hand side of the alternation. On the right-hand side of the alternation, it matches one of the given colors and captures it in the first capturing group. The entire alternation is followed by the + quantifier, which allows repeated matches.

In the first iteration, \1 fails to match because group 1 has not captured anything yet. This forces the regex engine to try the right-hand side of the alternation and match a color, which populates \1. If any input remains, the engine, in the second iteration, tries the now-populated reference \1 followed by #color as the first alternative, or an allowed color as the second alternative. This process repeats until the regex engine either matches the complete string or fails the match.

This will match any of the following strings:

    blue
    redblue
    blueblueblue
    redbluewhite
    redred#color
    whitewhite#color
    blueredred#color

However, it won't match the following strings:

    #color
    red#color
    bluered#color
    redbluefruit
    redbluered#color
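
Both lists can be verified in Java with a sketch along these lines. The class name ColorForwardReference and its matches helper are hypothetical names chosen for this example.

```java
import java.util.regex.Pattern;

public class ColorForwardReference {
    // \1 refers to the color group that is defined later in the alternation.
    static final Pattern COLOR =
        Pattern.compile("^(?:\\1#color|(red|blue|white))+$");

    // Hypothetical helper: true when the whole input matches the pattern.
    static boolean matches(String input) {
        return COLOR.matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] shouldMatch = {
            "blue", "redblue", "blueblueblue", "redbluewhite",
            "redred#color", "whitewhite#color", "blueredred#color"
        };
        String[] shouldNotMatch = {
            "#color", "red#color", "bluered#color",
            "redbluefruit", "redbluered#color"
        };
        for (String s : shouldMatch) {
            System.out.println(s + " -> " + matches(s));
        }
        for (String s : shouldNotMatch) {
            System.out.println(s + " -> " + matches(s));
        }
    }
}
```

An input such as redbluered#color fails because group 1 holds red when #color is reached, so \1 would need another red immediately before #color.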