So far, all the examples in the first two chapters have used English text only. However, a regular expression engine needs to support text in any language that can be represented with Unicode characters. Java has a Unicode-based regex engine with extensive support for Unicode scripts, blocks, and categories.
A specific Unicode character can be matched in two different ways in Java:
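- Unicode escape sequence or the \u notation: This can be written as "\u1234" or "\\u1234". In the first form, the Java compiler replaces the escape with the character itself before the regex engine sees it; in the second form, the regex engine receives \u1234 and interprets the escape.
- Hex notation: This can be written as "\\x{1234}" in a Java string literal (supported in Java 7 and later). Unlike the \u notation, it also accepts code points above FFFF, such as "\\x{1F600}".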
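As a minimal sketch of the two notations, the following program matches the single character U+00E9 (Latin small letter e with acute) three ways, and then matches a supplementary character using the hex notation. The class name `UnicodeEscapeDemo` and the helper `matches` are hypothetical names chosen for illustration; only `java.util.regex.Pattern` comes from the JDK.

```java
import java.util.regex.Pattern;

public class UnicodeEscapeDemo {

    // Hypothetical helper: true if the entire input matches the regex.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Build the input from its code point so the source file stays ASCII.
        String eAcute = new String(Character.toChars(0x00E9));

        // Escape resolved by the Java compiler: the pattern is the literal character.
        System.out.println(matches("\u00E9", eAcute));

        // Escape resolved by the regex engine: the pattern is backslash, u, 00E9.
        System.out.println(matches("\\u00E9", eAcute));

        // Hex notation, resolved by the regex engine (Java 7 and later).
        System.out.println(matches("\\x{00E9}", eAcute));

        // Hex notation also handles code points beyond the Basic Multilingual Plane.
        String grinningFace = new String(Character.toChars(0x1F600));
        System.out.println(matches("\\x{1F600}", grinningFace));
    }
}
```

All four calls should report a match. The double-backslash forms are usually the safer choice inside regex strings, because the compiler-level form is substituted before the pattern is parsed, so an escape that happens to produce a regex metacharacter would be treated as that metacharacter rather than as a literal.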