Embedded regular expression mode modifiers

Like most other regular expression flavors, Java allows some standard modes to be embedded in the regular expression itself. These mode modifiers change how the regex engine interprets the pattern. The following table lists these modes and their meanings:

Mode    Name                          Meaning
(?i)    Ignore case mode              Enables case-insensitive matching for US-ASCII text
(?s)    DOTALL mode                   Makes the dot match all characters, including line breaks
(?m)    Multiline mode                Makes the caret and dollar match the start and end of each line in a multiline input
(?u)    Unicode-aware case folding    Makes case-insensitive matching Unicode-aware when used together with (?i)
(?U)    Unicode matching              Enables the Unicode version of predefined character classes and POSIX character classes
(?d)    Unix lines mode               Makes the dot, caret, and dollar recognize only \n as a line terminator
(?x)    Comment mode                  Allows whitespace and comments in the regex pattern
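
As a minimal sketch of how (?i), (?u), and (?U) differ, the following snippet compares matches on an accented letter. The class name CaseModeDemo and the helper method matches are illustrative names only; Pattern.matches is the standard java.util.regex call.

```java
import java.util.regex.Pattern;

final class CaseModeDemo {
    // Hypothetical helper: true if the regex matches the entire input.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        String lower = "\u00e9"; // e with acute accent, U+00E9
        String upper = "\u00c9"; // E with acute accent, U+00C9

        // (?i) alone folds case for US-ASCII letters only.
        System.out.println(matches("(?i)java", "JAVA"));      // true
        System.out.println(matches("(?i)" + lower, upper));   // false

        // Adding u makes the case folding Unicode-aware.
        System.out.println(matches("(?iu)" + lower, upper));  // true

        // By default \w covers only [a-zA-Z_0-9]; (?U) switches to the Unicode definition.
        System.out.println(matches("\\w", lower));            // false
        System.out.println(matches("(?U)\\w", lower));        // true
    }
}
```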

 

Let's check some examples to understand these modes better.

How do we match an input that starts with the word Java and ends with the word Mode when we don't know what lies between these two words? The input may also contain line breaks.

Consider the following example input text, which is in two lines:

Java regex
Embedded Mode

Let's use the following regex:

    \AJava.*Mode\z

With this regex the match fails because, by default, the dot matches every character except line breaks, so .* cannot cross the line break between the two lines. Hence, we need to enable the DOTALL mode here using the following:

    (?s)\AJava.*Mode\z

Our regex will match the input because (?s) will enable the DOTALL mode and then .* will match the text between Java and Mode.
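
The effect of (?s) can be checked with a short program. This is a minimal sketch: the class name DotAllDemo and the helper matchesWhole are illustrative names, and the input is the two-line text from the example, with \n assumed as the line break.

```java
import java.util.regex.Pattern;

final class DotAllDemo {
    // The two-line example input; the line break is assumed to be \n.
    static final String INPUT = "Java regex\nEmbedded Mode";

    // Hypothetical helper: true if the regex matches the entire input.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        // Without DOTALL, the dot stops at the line break, so the match fails.
        System.out.println(matchesWhole("\\AJava.*Mode\\z", INPUT));     // false

        // With (?s), the dot also matches the line break.
        System.out.println(matchesWhole("(?s)\\AJava.*Mode\\z", INPUT)); // true
    }
}
```

Note that each backslash of the regex is doubled inside a Java string literal.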

It is considered good practice to insert comments and line breaks in a complex and lengthy regular expression. In order to allow that, we will need to enable the comment mode using (?x).

Here is an example of a regex with comments and extra whitespaces using multiple modifiers, including (?x):

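    // In comment mode, a # comment runs to the end of the line,
    // so every comment that is followed by more pattern text ends with \n.
    String regex = "(?ixs)\\A # assert start of the string\n"
        + "java "
        + "\\s "
        + "regex "
        + ".* # match 0 or more of any character including line breaks\n"
        + "Mode "
        + "\\z # assert end of the string";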

It is interesting to note that this regular expression will still match the input text that we used in the previous example. You can clearly see how the use of (?x) allows us to use arbitrary whitespace and inline comments in our regex.
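
One consequence of comment mode is that a space typed in the pattern no longer matches a space in the input, which is why the example above uses \s between java and regex. The following sketch illustrates this; the class name CommentModeDemo and the helper matchesWhole are illustrative names only.

```java
import java.util.regex.Pattern;

final class CommentModeDemo {
    // Hypothetical helper: true if the regex matches the entire input.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        // Under (?x) the space in the pattern is ignored, so it behaves like "Javaregex".
        System.out.println(matchesWhole("(?x)Java regex", "Java regex")); // false
        System.out.println(matchesWhole("(?x)Java regex", "Javaregex"));  // true

        // \s still matches whitespace, and the trailing comment is ignored.
        System.out.println(
            matchesWhole("(?x)Java \\s regex # two words", "Java regex")); // true
    }
}
```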

Let's examine the use of the MULTILINE mode. In the same two-line input text (Java regex on the first line and Embedded Mode on the second), which regular expression matches the first line only, that is, the text Java regex?

Let's use anchors (caret and dollar) and write the regex as follows:

    ^Java regex$

This regex will fail to match our input. The input contains two lines, and without the MULTILINE mode $ asserts the position only at the end of the whole input (or before a final line break), not at the end of every line.

Change your regex to the following:

    (?m)^Java regex$

And bingo! Our regex works now because we enabled the MULTILINE mode using (?m) at the start of the regex.
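
Here is a minimal sketch of this check in Java. The class name MultilineDemo and the helper firstMatch are illustrative names, and \n is assumed as the line break. The sketch uses Matcher.find(), which searches for a match anywhere in the input, because Matcher.matches() would require the regex to match the whole two-line input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MultilineDemo {
    // The two-line example input; the line break is assumed to be \n.
    static final String INPUT = "Java regex\nEmbedded Mode";

    // Hypothetical helper: returns the first match found in the input, or null if none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Without MULTILINE, $ cannot match before the line break in the middle of the input.
        System.out.println(firstMatch("^Java regex$", INPUT));     // null

        // With (?m), ^ and $ match at the start and end of each line.
        System.out.println(firstMatch("(?m)^Java regex$", INPUT)); // Java regex
    }
}
```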
