The intersection of character classes

The intersection of character classes produces a composite class that matches only those characters that belong to every one of its operand (inner) classes. The intersection operator is as follows:

&& 

For example, consider the following composite character class using the && operator:

 [A-Z&&[PQR]] 

This matches any character that is in the range A to Z and is also one of P, Q, or R. Since P, Q, and R already fall within the range A to Z, the preceding regular expression can be written more simply as follows:

[PQR] 
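One way to convince ourselves that the two forms are equivalent is to compare them against every ASCII character. The following is only a minimal sketch; the class name IntersectionEquivalenceCheck and the helper sameMatches are hypothetical names chosen for this illustration:

```java
import java.util.regex.Pattern;

public class IntersectionEquivalenceCheck
{
  // Hypothetical helper: returns true if both patterns agree
  // on every single-character ASCII string
  static boolean sameMatches(Pattern a, Pattern b)
  {
    for (char c = 0; c < 128; c++)
    {
      String s = String.valueOf(c);
      if (a.matcher(s).matches() != b.matcher(s).matches())
      {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args)
  {
    Pattern intersection = Pattern.compile("[A-Z&&[PQR]]");
    Pattern simple = Pattern.compile("[PQR]");
    System.out.println(sameMatches(intersection, simple)); // prints true
  }
}
```

The check covers only the ASCII range, which is enough here because both classes contain nothing but ASCII letters.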

The following composite character class using intersection matches only the digits 5 and 6, since these are the only two digits that belong to all three character classes:

[1-7&&[3-6]&&[5-8]] 

To see this regex in action, let's use this complete code:

package example.regex; 
 
import java.util.regex.*; 
   
public class IntersectionExample 
{ 
  public static void main(String[] args) 
  { 
    final Pattern p = Pattern.compile("[1-7&&[3-6]&&[5-8]]"); 
    for (int i=0; i<10; i++) 
    { 
      String s = String.valueOf(i); 
      Matcher m = p.matcher(s); 
       System.out.printf("[%s] %s%n", s, 
          (m.matches() ? "matches" : "does not match")); 
    } 
  } 
} 

When we compile and run the preceding code, we will see the following output:

[0] does not match 
[1] does not match 
[2] does not match 
[3] does not match 
[4] does not match 
[5] matches 
[6] matches 
[7] does not match 
[8] does not match 
[9] does not match 
  

As you can see, it shows "matches" only for the digits, 5 and 6.
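The same character class can also be used with find() to pick matching characters out of a longer string. This is a small sketch under our own assumptions; the class name IntersectionFindExample and the helper keepMatching are hypothetical and used only for illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IntersectionFindExample
{
  // Hypothetical helper: collects every character of the input
  // that is accepted by the intersection character class
  static String keepMatching(String input)
  {
    Matcher m = Pattern.compile("[1-7&&[3-6]&&[5-8]]").matcher(input);
    StringBuilder sb = new StringBuilder();
    while (m.find())
    {
      sb.append(m.group());
    }
    return sb.toString();
  }

  public static void main(String[] args)
  {
    System.out.println(keepMatching("0123456789")); // prints 56
  }
}
```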

Let's take another example, which involves matching a non-whitespace character that is not a Unicode letter. We know we can use the following regex using a positive lookahead:

(?=\S)\P{L} 

We can also use an intersection operation to write this example, as follows:

[\S&&[\P{L}]] 

Due to the use of the && operator here, it matches a character that satisfies both properties: \S (non-whitespace) and \P{L} (non-letter).
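To compare the lookahead form with the intersection form, we can run both against a few sample characters. The following is a minimal sketch; the class name NonLetterIntersectionExample and the constant names are hypothetical, and the sample strings are our own choice. Note that each backslash of the regex is doubled inside a Java string literal:

```java
import java.util.regex.Pattern;

public class NonLetterIntersectionExample
{
  // Lookahead form: assert a non-whitespace character, then consume a non-letter
  static final Pattern LOOKAHEAD = Pattern.compile("(?=\\S)\\P{L}");
  // Intersection form: a single class that requires both properties
  static final Pattern INTERSECTION = Pattern.compile("[\\S&&[\\P{L}]]");

  public static void main(String[] args)
  {
    final String[] arr = new String[] { "1", "#", "_", "a", "Z", " " };
    for (String s: arr)
    {
      System.out.printf("[%s] lookahead=%b intersection=%b%n", s,
         LOOKAHEAD.matcher(s).matches(),
         INTERSECTION.matcher(s).matches());
    }
  }
}
```

Both patterns should report true for the digit and the two symbols, and false for the letters and the space.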

Note that the inner square brackets are optional in an intersection operation as long as the inner class is not a negated character class. Hence, the preceding regex can also be written as follows:

[\S&&\P{L}] 
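For contrast, here is a sketch of an intersection with a negated inner class, where the inner square brackets are kept. The regex [a-z&&[^aeiou]] and the class name ConsonantIntersectionExample are our own illustration and do not come from the preceding examples:

```java
import java.util.regex.Pattern;

public class ConsonantIntersectionExample
{
  // The negated class [^aeiou] keeps its own square brackets:
  // a lowercase ASCII letter that is not a vowel
  static final Pattern CONSONANT = Pattern.compile("[a-z&&[^aeiou]]");

  public static void main(String[] args)
  {
    for (String s: new String[] { "b", "e", "z", "B", "5" })
    {
      System.out.printf("[%s] %s%n", s,
         (CONSONANT.matcher(s).matches() ? "matches" : "does not match"));
    }
  }
}
```

Here, only b and z are reported as matches, since e is a vowel and B and 5 are outside the range a to z.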

Similarly, to match an uppercase Greek letter, we can use the intersection of the following two classes:

  • \p{InGreek}: This matches a character in the Greek block
  • \p{Lu}: This matches an uppercase Unicode letter

By combining these two character classes with intersection, we can make a single composite character class, as follows:

[\p{InGreek}&&[\p{Lu}]] 

To test the preceding regex, let's pick some Greek letters and write a simple Java program, such as the following, that runs our regex against the selected letters:

package example.regex; 
 
import java.util.regex.*; 
   
public class UppercaseGreekIntersectionExample 
{ 
  public static void main(String[] args) 
  { 
    final Pattern p = Pattern.compile("[\\p{InGreek}&&[\\p{Lu}]]"); 
    final String[] arr = new String[] { 
      "Γ", "Δ", "Θ", "Ξ", "Π", "Σ", "Φ", "α", "β", "γ", "δ", "ε", "A", "P", "e", "r" 
    }; 
 
    for (String s: arr)  
    { 
      Matcher m = p.matcher(s); 
      System.out.printf("[%s] %s%n", s, 
         (m.matches() ? "matches" : "does not match")); 
    } 
  } 
} 

When we run the preceding class, it prints the following output:

[Γ] matches 
[Δ] matches 
[Θ] matches 
[Ξ] matches 
[Π] matches 
[Σ] matches 
[Φ] matches 
[α] does not match 
[β] does not match 
[γ] does not match 
[δ] does not match 
[ε] does not match 
[A] does not match 
[P] does not match 
[e] does not match 
[r] does not match 

As you can see, "matches" is only printed for the uppercase Greek letters. For all the other letters, it prints "does not match".
