Here is a list of Unicode character properties commonly used in regular expressions that need to match Unicode text:
Unicode character class | Meaning |
\p{L} | Match any letter from any language |
\p{Lu} | Match any uppercase letter from any language |
\p{Ll} | Match any lowercase letter from any language |
\p{N} | Match any numeric character from any language (use \p{Nd} for decimal digits only) |
\p{P} | Match any punctuation character from any language |
\p{Z} | Match any kind of whitespace or invisible separator |
\p{C} | Match any invisible control character or unused code point |
\p{Sc} | Match any currency symbol |
\R | Any Unicode linebreak sequence; equivalent to \u000D\u000A|[\u000A\u000B\u000C\u000D\u0085\u2028\u2029]. Using \R is recommended for matching any newline, even when dealing with ASCII text. |
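As a minimal sketch of how a few of these classes behave, the following Java snippet counts matches and splits text on linebreaks. It assumes Java 8 or later (where \R is available); the class name UnicodeRegexDemo and its helper methods are hypothetical names chosen for illustration. Note that inside a Java string literal each backslash has to be doubled, so \p{L} is written as "\\p{L}".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names below are illustrative only.
class UnicodeRegexDemo {

    // Counts non-overlapping matches of the given regex in the text.
    public static int countMatches(String regex, String text) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    // Splits text on any Unicode linebreak sequence using \R (Java 8+).
    public static String[] splitLines(String text) {
        return text.split("\\R");
    }

    public static void main(String[] args) {
        // Non-ASCII characters are written as escapes to avoid source-encoding issues:
        // "Groesse" with o-umlaut and sharp s, the euro sign, and a Cyrillic word.
        String text = "Gr\u00f6\u00dfe 42 \u20ac, \u0446\u0435\u043d\u0430 10 $";

        // \p{L}+ matches runs of letters from any script.
        System.out.println("words: " + countMatches("\\p{L}+", text));

        // \p{Sc} matches currency symbols such as the euro and dollar signs.
        System.out.println("currency symbols: " + countMatches("\\p{Sc}", text));

        // \p{Lu} matches uppercase letters, including accented ones.
        System.out.println("uppercase: " + countMatches("\\p{Lu}", "\u00c9mile Zola"));

        // \R treats \r\n as a single linebreak and also matches U+2028.
        System.out.println("lines: " + splitLines("a\r\nb\nc\u2028d").length);
    }
}
```

Because \R tries the two-character sequence \r\n before the single-character alternatives, Windows-style line endings do not produce empty strings between the carriage return and the line feed when splitting.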