\G boundary assertion

\G is a zero-width assertion. It is a boundary matcher that asserts the position at the end of the previous match or, for the very first match, at the start of the input, just as the \A assertion does. The Java regex engine tracks the position of \G within the context of a Matcher instance. If a new Matcher is created or the existing one is reset, the position of \G is reinitialized to the start of the input.
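
The effect of a reset on \G can be seen in a minimal sketch like the following. The class name GResetDemo, the helper method, and the short input ",,,abc" are hypothetical and chosen only for illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GResetDemo
{
  // Returns the start index of the second match, and the start index of
  // the first match found after calling reset() on the same Matcher.
  static int[] startsBeforeAndAfterReset(String input)
  {
    Matcher m = Pattern.compile("\\G,").matcher(input);
    // This sketch assumes the input begins with at least two commas
    if (!m.find() || !m.find())
      throw new IllegalStateException("expected two leading commas");
    int beforeReset = m.start(); // second comma
    m.reset();                   // \G goes back to the start of the input
    if (!m.find())
      throw new IllegalStateException("expected a match after reset");
    int afterReset = m.start();  // first comma again
    return new int[] { beforeReset, afterReset };
  }

  public static void main (String[] args)
  {
    int[] r = startsBeforeAndAfterReset(",,,abc");
    System.out.println(r[0] + " " + r[1]); // prints "1 0"
  }
}
```

Without the call to reset(), the next find() would have continued from index 2, the end of the second match.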

For example, consider the following input:

,,,,,123,45,67 

Suppose we need to replace each comma in the leading run of commas with a hyphen, so that the output has as many leading hyphens as the input has leading commas. Our final output should be the following:

-----123,45,67 

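We cannot simply call replaceAll with a pattern that matches every comma, since that would also replace the commas after 123 and 45. Nor can we match the whole leading run at once (for example, with ^,+), because that would replace all five commas with a single hyphen, and we want as many hyphens as there are leading commas in the input string.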

For cases like this, we can use the \G assertion, as in this Java code snippet:

input = input.replaceAll("\\G,", "-");

Since \G matches the start of the input the first time, it asserts the position just before the first comma. Subsequently, it matches the position after each comma, since \G asserts the position at the end of the previous match. Matching stops when the engine reaches the digit 1, because the character after the \G position is no longer a comma. Each of these matches is replaced by a single hyphen, thus giving us the same number of hyphens in the replaced string as the number of leading commas in the original input.
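
To make this concrete, here is a small sketch that contrasts a plain comma replacement with the \G version and records where each \G match starts. The class name LeadingCommas and its helper methods are hypothetical names used only for this illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LeadingCommas
{
  // Replaces only the commas that form the leading run of the input
  static String replaceLeading(String input)
  {
    return input.replaceAll("\\G,", "-");
  }

  // Collects the start index of every successive \G match
  static List<Integer> matchStarts(String input)
  {
    List<Integer> starts = new ArrayList<>();
    Matcher m = Pattern.compile("\\G,").matcher(input);
    while (m.find())
      starts.add(m.start());
    return starts;
  }

  public static void main (String[] args)
  {
    String input = ",,,,,123,45,67";
    System.out.println(input.replaceAll(",", "-")); // -----123-45-67
    System.out.println(replaceLeading(input));      // -----123,45,67
    System.out.println(matchStarts(input));         // [0, 1, 2, 3, 4]
  }
}
```

The match positions form an unbroken chain from index 0 to index 4. Once the chain is broken at the digit 1, \G cannot match anywhere later in the input.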

Let's look at another full example to understand the use of \G better.

Here is a sample input:

{%var1%, %var2%, %var3%} {%var4%, %var5%, %var6%} 

Our task is to replace the % characters with the # (hash) characters in the first {...} section only. We can assume that { and } are perfectly balanced. The expected output is as follows:

{#var1#, #var2#, #var3#} {%var4%, %var5%, %var6%} 

Note how the output has % replaced with # in only the first pair of {...}.

Here is a code listing to solve this problem:

package example.regex;

class GBoundaryMatcher
{
  public static void main (String[] args)
  {
    String input = "{%var1%, %var2%, %var3%} " +
                   "{%var4%, %var5%, %var6%}";

    // backslashes are doubled because the regex is a Java string literal
    final String re = "(^[^{]*\\{|\\G(?!^),\\h*)%([^%]+)%";

    // now use above regex in replaceAll method
    String repl = input.replaceAll(re, "$1#$2#");

    System.out.println(repl);
  }
}

Here is how the regex does its job in this code.

We use this regex to match our pattern:

"(^[^{]*\\{|\\G(?!^),\\h*)%([^%]+)%"

The preceding regex has two groups, as follows:

(^[^{]*\{|\G(?!^),\h*)

This is the captured group number one. We are using alternation to select two possible patterns here:

  • ^[^{]*\{: This matches all text from the start of the input up to and including the first {.
  • \G(?!^),\h*: This matches, from the end of the previous match, a comma followed by zero or more horizontal whitespace characters. (?!^) is a negative lookahead that prevents \G from matching at the start position. This is needed because \G also successfully asserts the start of the input string on its first use.
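
The need for (?!^) can be checked with a short sketch. The input ",%x% {y}" below is a hypothetical string, not part of the original example, in which a comma-prefixed %x% appears at the very start of the input and outside any braces. The class and method names are likewise made up for this illustration:

```java
class LookaheadGuard
{
  // The regex from the listing, with the (?!^) guard after \G
  static String withGuard(String input)
  {
    return input.replaceAll("(^[^{]*\\{|\\G(?!^),\\h*)%([^%]+)%", "$1#$2#");
  }

  // The same regex with the guard removed, for comparison only
  static String withoutGuard(String input)
  {
    return input.replaceAll("(^[^{]*\\{|\\G,\\h*)%([^%]+)%", "$1#$2#");
  }

  public static void main (String[] args)
  {
    String input = ",%x% {y}"; // hypothetical input
    System.out.println(withGuard(input));    // ,%x% {y}
    System.out.println(withoutGuard(input)); // ,#x# {y}
  }
}
```

Without the guard, the \G alternative succeeds at index 0 and rewrites %x%, even though no { has been matched yet. With the guard in place, the \G alternative can only continue a match that the first alternative started.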

The %([^%]+)% part matches a substring surrounded by the % characters. The text between them, ([^%]+), is our captured group number two.

In the replacement, we use the following:

$1#$2# 

This puts back the first group's captured text and wraps the second group's captured text in # instead of % to get the required output.
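
To see what each group captures on every match, we can walk through the matches with a Matcher instead of calling replaceAll. This is only a sketch: the class name GroupTrace and the bracketed row format are hypothetical choices made for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupTrace
{
  static final Pattern RE =
    Pattern.compile("(^[^{]*\\{|\\G(?!^),\\h*)%([^%]+)%");

  // Returns one "[group 1][group 2]" row for every match in the input
  static List<String> trace(String input)
  {
    List<String> rows = new ArrayList<>();
    Matcher m = RE.matcher(input);
    while (m.find())
      rows.add("[" + m.group(1) + "][" + m.group(2) + "]");
    return rows;
  }

  public static void main (String[] args)
  {
    String input = "{%var1%, %var2%, %var3%} " +
                   "{%var4%, %var5%, %var6%}";
    // prints [{][var1], then [, ][var2], then [, ][var3]
    for (String row : trace(input))
      System.out.println(row);
  }
}
```

Only the first match uses the first alternative. The next two matches are anchored by \G, and nothing is matched in the second {...} section, because the } breaks the chain of consecutive matches.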

Once we compile and run the preceding code, it shows the following output:

{#var1#, #var2#, #var3#} {%var4%, %var5%, %var6%}