The union of character classes

The union of character classes matches any character that is matched by at least one of the composing character classes; this is simply the general definition of the union operation. In regular expressions, we can create the union of character classes by writing one character class inside another.
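To make the definition concrete, the following sketch checks it directly with java.util.regex.Pattern: for every ASCII character, membership in a nested class must equal membership in the left class OR the right class. The class name UnionDefinition and the sample classes [a-c] and [x-z] are hypothetical choices for illustration only.

```java
import java.util.regex.Pattern;

// Hypothetical check of the union definition: a character matches the nested
// class [a-c[x-z]] exactly when it matches [a-c] or [x-z].
public class UnionDefinition
{
  static final Pattern UNION = Pattern.compile("[a-c[x-z]]");
  static final Pattern LEFT = Pattern.compile("[a-c]");
  static final Pattern RIGHT = Pattern.compile("[x-z]");

  // True if the single-character string made of c matches pattern p.
  static boolean matches(Pattern p, char c)
  {
    return p.matcher(String.valueOf(c)).matches();
  }

  // Returns true if, for every ASCII character, membership in the union
  // equals membership in the left class OR the right class.
  static boolean unionHolds()
  {
    for (char c = 0; c < 128; c++)
    {
      boolean expected = matches(LEFT, c) || matches(RIGHT, c);
      if (matches(UNION, c) != expected)
      {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args)
  {
    System.out.println(unionHolds()); // true
  }
}
```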

You may remember that character classes open with the [ character and close with the ] character, and we can list characters and character ranges between the opening and closing brackets.

In addition to those, we can place other character classes inside the brackets, and the resulting set is the union of all these character classes. There is no explicit union operator for composing character classes; we simply nest them inside one another.

For example, consider the following composite character class:

[A-D[PQR]] 

This matches any character in the range A to D, or any one of the characters P, Q, or R. This regular expression can also be written as follows:

[A-DPQR] 
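The equivalence of the nested and the flattened forms can be checked with a short program. The sketch below, whose class and method names are hypothetical, collects the uppercase letters A to Z accepted by each pattern and compares the results.

```java
import java.util.regex.Pattern;

// Hypothetical helper class: compares the nested union [A-D[PQR]]
// with its flattened form [A-DPQR] over the letters A to Z.
public class UnionEquivalence
{
  // Returns every uppercase letter from A to Z that the regex matches,
  // concatenated in alphabetical order.
  static String matchingLetters(String regex)
  {
    Pattern p = Pattern.compile(regex);
    StringBuilder sb = new StringBuilder();
    for (char c = 'A'; c <= 'Z'; c++)
    {
      if (p.matcher(String.valueOf(c)).matches())
      {
        sb.append(c);
      }
    }
    return sb.toString();
  }

  public static void main(String[] args)
  {
    String nested = matchingLetters("[A-D[PQR]]");
    String flat = matchingLetters("[A-DPQR]");
    System.out.println(nested);              // ABCDPQR
    System.out.println(nested.equals(flat)); // true
  }
}
```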

We can also create a union of more than two character classes, as in the following regular expression:

[A-D[P-S][X-Z]] 

This matches any character in the range of A to D, any character in the range of P to S, or any character in the range from X to Z. This regular expression can also be written as follows:

[A-DP-SX-Z] 
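As a quick sanity check, the following sketch (class and method names are hypothetical) runs both forms over the whole ASCII range, so that lowercase letters, digits, and punctuation are tested too, and confirms that the three-way union and the flat class accept exactly the same characters.

```java
import java.util.regex.Pattern;

// Hypothetical check: the three-way union [A-D[P-S][X-Z]] and the flat
// class [A-DP-SX-Z] should accept exactly the same ASCII characters.
public class ThreeWayUnion
{
  // Collects every ASCII character (0 to 127) that the regex matches.
  static String matchingAscii(String regex)
  {
    Pattern p = Pattern.compile(regex);
    StringBuilder sb = new StringBuilder();
    for (char c = 0; c < 128; c++)
    {
      if (p.matcher(String.valueOf(c)).matches())
      {
        sb.append(c);
      }
    }
    return sb.toString();
  }

  public static void main(String[] args)
  {
    String union = matchingAscii("[A-D[P-S][X-Z]]");
    String flat = matchingAscii("[A-DP-SX-Z]");
    System.out.println(union);              // ABCDPQRSXYZ
    System.out.println(union.equals(flat)); // true
  }
}
```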

A union of character classes can also include a negated inner character class, and this is where character class unions really start to shine. The unions shown so far can be flattened into a single class, so they only save a little typing; the union operation gives us real extra value when one of the composing classes is negated.

Consider, for example, the following code listing, which uses a union with a negated character class:

package example.regex;

import java.util.regex.*;

public class UnionExample
{
  public static void main(String[] args)
  {
    // Union of [#@.] with the negated class [^\p{Punct}\s];
    // backslashes must be doubled inside a Java string literal
    final String re = "[#@.[^\\p{Punct}\\s]]";
    final String[] arr = new String[] {
      "A", "#", "@", "1", "5", " ", "\n", ":", ".", "a", "%", "-", "3"
    };
    for (String s: arr)
    {
      System.out.printf("[%s] %s%n", s,
        (s.matches(re) ? "matches" : "does not match"));
    }
  }
}

This regex has the following negated character class:

[^\p{Punct}\s] 

The preceding negated character class matches any character that is neither a punctuation character nor a whitespace character. Now, suppose that we also want to allow a few selected punctuation characters, @, #, and ., or in other words, the [@#.] character class. This is where the union comes in handy. We build a composite character class that is the union of both sets, as follows:

[#@.[^\p{Punct}\s]] 

Now, this composite character class matches any of the [@#.] characters, or any character that is neither a punctuation character nor a whitespace character.

Once we compile and run the preceding code, we'll get the following output:

[A] matches 
[#] matches 
[@] matches 
[1] matches 
[5] matches 
[ ] does not match 
[ 
] does not match 
[:] does not match 
[.] matches 
[a] matches 
[%] does not match 
[-] does not match 
[3] matches 
 

The output shows "matches" for every character that is either accepted by the negated character class (that is, neither punctuation nor whitespace) or allowed by the [#@.] character class, and "does not match" for all other cases.
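The listing above tests one character at a time with String.matches, which compiles the regex again on every call. As a sketch, assuming we want to validate whole tokens rather than single characters, we can compile the composite class once and add the + quantifier. The class name UnionTokens, the method isToken, and the sample inputs are hypothetical.

```java
import java.util.regex.Pattern;

// Sketch: apply the composite class to whole tokens by adding the +
// quantifier. The class name and sample inputs are illustrative only.
public class UnionTokens
{
  // Compiled once and reused; note the doubled backslashes in Java source.
  static final Pattern TOKEN = Pattern.compile("[#@.[^\\p{Punct}\\s]]+");

  // True if every character of s is in [#@.] or is neither punctuation
  // nor whitespace (and s is not empty).
  static boolean isToken(String s)
  {
    return TOKEN.matcher(s).matches();
  }

  public static void main(String[] args)
  {
    String[] inputs = { "user@example.com", "#tag", "a-b", "two words" };
    for (String s : inputs)
    {
      System.out.printf("%-18s %b%n", s, isToken(s));
    }
  }
}
```

Here "a-b" is rejected because the hyphen is punctuation outside [#@.], and "two words" is rejected because of the space.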
