Back references

Back references provide a convenient way of matching a repeated character or repeated tokens in the input text. By using back references, the regular expression engine can match the exact same text as previously matched by a capturing group.

The syntax of a back reference is a backslash followed by a capturing group number, as shown in the following example:

    \3

The preceding example is a back reference of the third capturing group.

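In Java regular expressions, \1 through \9 are always interpreted as back references. A larger number is accepted as a back reference only if at least that many capturing groups exist at that point in the regular expression.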

For example, if we need to match a two-digit number with the restriction that both digits must be the same, then we need to capture the first digit and then use a back reference for the first captured group, as follows:

    ^(\d)\1$

Now, this regex will match any of these strings: 11, 22, 55, and 88.

We can also use quantifiers after back references, just as we use them after capturing or non-capturing groups.

For example, if we have to match a five-digit number with all the same digits, then we can use the following regex:

    ^(\d)\1{4}$
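
As a minimal sketch of how the two same-digit patterns behave in Java, the following class compiles both and checks a few inputs. The class and method names (SameDigitDemo, isTwoSame, isFiveSame) are made up for this illustration; note that each backslash has to be doubled inside a Java string literal.

```java
import java.util.regex.Pattern;

class SameDigitDemo {
    // Two identical digits: capture one digit, then require the same digit again.
    static final Pattern TWO_SAME = Pattern.compile("^(\\d)\\1$");

    // Five identical digits: the quantifier {4} applies to the back reference.
    static final Pattern FIVE_SAME = Pattern.compile("^(\\d)\\1{4}$");

    static boolean isTwoSame(String input) {
        return TWO_SAME.matcher(input).matches();
    }

    static boolean isFiveSame(String input) {
        return FIVE_SAME.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isTwoSame("55"));     // true
        System.out.println(isTwoSame("56"));     // false
        System.out.println(isFiveSame("77777")); // true
        System.out.println(isFiveSame("77778")); // false
    }
}
```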

As another example, suppose we want a regex that matches four pairs of digits separated by the colon (:), dash (-), or slash (/) characters. We also require the first and third pairs to be the same and, likewise, the second and fourth pairs to be the same. For example, 12-45:12-45 and 56/00:56-00 are matching strings, but 57-13-58:13 is not, because the first pair, 57, is not the same as the third pair, 58. We can write our regex as follows:

    ^(\d{2})[:/-](\d{2})[:/-]\1[:/-]\2$
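
The digit-pair pattern can be tried out with a short sketch like the one below. The names PairDemo and matchesPairs are hypothetical, chosen only for this example; the test strings are the ones used in the text above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PairDemo {
    // Group 1 captures the first pair and group 2 the second pair;
    // \1 and \2 then require the third and fourth pairs to repeat them.
    static final Pattern PAIRS =
        Pattern.compile("^(\\d{2})[:/-](\\d{2})[:/-]\\1[:/-]\\2$");

    static boolean matchesPairs(String input) {
        return PAIRS.matcher(input).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"12-45:12-45", "56/00:56-00", "57-13-58:13"}) {
            Matcher m = PAIRS.matcher(s);
            if (m.matches()) {
                // The captured pairs are available through group(1) and group(2).
                System.out.println(s + " matches: " + m.group(1) + ", " + m.group(2));
            } else {
                System.out.println(s + " does not match");
            }
        }
    }
}
```

The separators are not captured, so they are free to differ from one position to the next; only the digit pairs are tied together by the back references.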

Let's look at some more useful regular expressions using capturing groups and back references. Suppose we need to write a regular expression for two repeated words, separated by one or more non-word characters in between. We can write our regular expression as follows:

    ^(\w+)\W+\1$

The back reference, \1, will make sure we match the exact same word as matched by the first capturing group in this regular expression.
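
A small sketch of the repeated-word check, assuming the whole input consists of just the two words and their separator. RepeatedWordDemo and isRepeatedWord are illustrative names, not part of any library.

```java
import java.util.regex.Pattern;

class RepeatedWordDemo {
    // A word, one or more non-word characters, then the same word again.
    static final Pattern REPEATED = Pattern.compile("^(\\w+)\\W+\\1$");

    static boolean isRepeatedWord(String input) {
        return REPEATED.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isRepeatedWord("hello hello"));   // true
        System.out.println(isRepeatedWord("hello, hello"));  // true
        System.out.println(isRepeatedWord("hello world"));   // false
        // Without any flags the comparison is case-sensitive.
        System.out.println(isRepeatedWord("hello Hello"));   // false
    }
}
```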

Next, we need to write a regular expression for quoted words. Words can be wrapped using single quotes or double quotes, but the mixing of quotes is not allowed. We can then write our regex as follows:

    ^(['"])\w+\1$

Here, this regex pattern matches and captures the opening single or double quote using a character class. On the other side of the word, we use the \1 back reference to ensure that the same quote is matched at the end.
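
The quoted-word pattern could be exercised as in the sketch below. QuotedWordDemo and isQuotedWord are hypothetical names; the double quote inside the character class needs a backslash only because it sits in a Java string literal.

```java
import java.util.regex.Pattern;

class QuotedWordDemo {
    // Capture the opening quote, match a word, then require the same quote.
    static final Pattern QUOTED = Pattern.compile("^(['\"])\\w+\\1$");

    static boolean isQuotedWord(String input) {
        return QUOTED.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isQuotedWord("'hello'"));    // true
        System.out.println(isQuotedWord("\"hello\""));  // true
        System.out.println(isQuotedWord("'hello\""));   // false: mixed quotes
        System.out.println(isQuotedWord("hello"));      // false: no quotes
    }
}
```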

Back references cannot be used inside a character class, as we learned earlier. There is no grouping inside a character class, and most of the special meta characters become just literals inside a character class.
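
To see this restriction in practice, the sketch below tries to compile a pattern that places \1 inside a character class. It assumes that java.util.regex rejects such a pattern with a PatternSyntaxException, which is the behavior we would expect from the standard JDK; the helper name compiles is made up for this example.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class CharClassBackRefDemo {
    // Returns true if the regex compiles, false if the engine rejects it.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // A back reference outside a character class is accepted.
        System.out.println(compiles("(a)\\1"));
        // The same escape inside a character class is expected to be rejected.
        System.out.println(compiles("(a)[\\1]"));
    }
}
```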
