The subtraction of character classes

Suppose we need to match characters that belong to one class but not to another. There is no separate operator for subtraction in a composite character class. Instead, we subtract by combining the intersection operator, &&, with a negated inner character class.

A regular expression is usually more readable if we write the larger set first and place the set we want to subtract from it after the && operator.

For example, consider the following composite character class:

[0-9&&[^3-6]] 

It matches the digits 0 to 9, except the digits 3 to 6. The same set can also be written as a union of two character classes:

[[0-2][7-9]] 

We can also just use a simple character class, as follows:

[0-27-9]  
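
The equivalence of these three spellings can be checked with a short program. The following is only a minimal sketch: the class name DigitSubtractionSketch and the helper method accepted are made up for this illustration, and it assumes java.util.regex behaves as described above.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, written only to illustrate the text above.
class DigitSubtractionSketch {
    // The three spellings of "digits 0 to 9 except 3 to 6" shown in the text.
    static final String[] REGEXES = {
        "[0-9&&[^3-6]]", "[[0-2][7-9]]", "[0-27-9]"
    };

    // Returns, in order, the characters of candidates that the
    // given single-character class matches.
    static String accepted(String regex, String candidates) {
        Pattern p = Pattern.compile(regex);
        StringBuilder sb = new StringBuilder();
        for (char c : candidates.toCharArray()) {
            if (p.matcher(String.valueOf(c)).matches()) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Each regex is tried against the same candidate characters.
        for (String regex : REGEXES) {
            System.out.printf("%-14s accepts %s%n",
                regex, accepted(regex, "0123456789ab"));
        }
    }
}
```

All three character classes should accept the same characters, 012789, from the candidate string.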
  

To match all uppercase English consonants, we can subtract the five vowels from the uppercase letters, as in the following regex:

[A-Z&&[^AEIOU]] 

We can also reverse the order of the two sets used in the preceding regex and use the following regex:

[[^AEIOU]&&A-Z]  
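
To see that the order of the two sets does not change the result, both forms can be compared character by character. This is a sketch under stated assumptions: the class name ConsonantSubtractionSketch, its constants, and the method isUpperConsonant are invented here for illustration only.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, written only to illustrate the text above.
class ConsonantSubtractionSketch {
    // Larger set first, subtracted set after the && operator.
    static final Pattern LARGER_SET_FIRST = Pattern.compile("[A-Z&&[^AEIOU]]");
    // The same two sets in reverse order.
    static final Pattern NEGATION_FIRST = Pattern.compile("[[^AEIOU]&&A-Z]");

    // True if the single character c is matched by the given pattern.
    static boolean isUpperConsonant(Pattern p, char c) {
        return p.matcher(String.valueOf(c)).matches();
    }

    public static void main(String[] args) {
        // Sample input: consonants, vowels, a lowercase letter, and a digit.
        for (char c : "BEZaI9".toCharArray()) {
            System.out.printf("[%s] %b %b%n", c,
                isUpperConsonant(LARGER_SET_FIRST, c),
                isUpperConsonant(NEGATION_FIRST, c));
        }
    }
}
```

Both columns should agree for every input character: only the uppercase consonants (B and Z in this sample) are matched.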

Suppose we want to match all punctuation characters except the four basic math operators: +, -, *, and /. We can use the following composite character class, which applies the subtraction operation:

[\p{Punct}&&[^+*/-]] 

Here is a class that tests the preceding subtraction character class:

package example.regex; 
 
import java.util.regex.*; 
   
public class SubtractionExample 
{ 
  public static void main(String[] args) 
  { 
    // The backslash is doubled because it appears inside a Java string literal
    final Pattern p = Pattern.compile("[\\p{Punct}&&[^+*/-]]"); 
    final String[] arr = new String[] { 
      "!", "@", "#", "$", "%", "+", "-", "*", "/", "1", "M", "d" 
    }; 
 
    for (String s: arr)  
    { 
      Matcher m = p.matcher(s); 
      System.out.printf("[%s] %s%n", s, 
         (m.matches() ? "matches" : "does not match")); 
    } 
  } 
} 

Compiling and running this program produces the following output:

[!] matches 
[@] matches 
[#] matches 
[$] matches 
[%] matches 
[+] does not match 
[-] does not match 
[*] does not match 
[/] does not match 
[1] does not match 
[M] does not match 
[d] does not match 

As this output shows, the pattern matches the punctuation characters other than the four listed math operators. It also rejects the digit and the letters, because they are not punctuation characters at all.
