APPENDIX

Regular Expressions for Patterns

The pattern facet uses the familiar regular expression syntax to restrict the lexical space of data types. This appendix gives an overview of the pattern syntax.

Complex regular expressions can be constructed from simpler ones with the help of operators. Let S and T be arbitrary regular expressions, c and d be normal characters, and C be a character class (listed below).

A normal character c is any character that is not a metacharacter. Metacharacters are ., \, ?, *, +, {, }, (, ), [, and ].

Expression Meaning
c The string consisting of character c.
C The string consisting of a character belonging to character class C.
c-d The string consisting of any single character whose code value is between c and d (inclusive).
ST Concatenation. All strings st with s matching S and t matching T.
S|T Choice. All strings s that match S or T.
^S Negation. All strings s that do not match S.
S-T Difference. All strings s that match S but not T.
S? Option. All strings s that match S, or the empty string.
S* Repetition (Kleene star). All strings s matching k repetitions of S (k >= 0, so the empty string is included).
S+ All strings s matching k repetitions of S (k >= 1).
S{n,m} All strings s matching k repetitions of S (n <= k <= m).
S{n} All strings s matching exactly n repetitions of S.
S{n,} All strings s matching at least n repetitions of S.
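
The concatenation, choice, and quantifier rows above can be tried out in any engine that shares this part of the syntax. The following is a minimal sketch in Python. Python's `re` module is not an XML Schema regex engine, but it agrees with the table on these operators. `re.fullmatch` is used because a pattern facet must match the whole lexical value. The helper name `matches` is an assumption made for illustration, and the difference operator is not covered because `re` has no equivalent.

```python
import re


def matches(pattern: str, value: str) -> bool:
    """Return True only if the entire value matches the pattern."""
    return re.fullmatch(pattern, value) is not None


# (pattern, candidate value) pairs, one per operator in the table.
cases = [
    ("ab", "ab"),        # ST: concatenation
    ("a|b", "b"),        # S|T: choice
    ("a?", ""),          # S?: zero or one occurrence
    ("a*", ""),          # S*: k >= 0 repetitions
    ("a+", ""),          # S+: needs at least one repetition, so this fails
    ("a{2,3}", "aaa"),   # S{n,m}: between n and m repetitions
    ("a{2}", "aaa"),     # S{n}: exactly n repetitions, so this fails
    ("a{2,}", "aaaaa"),  # S{n,}: at least n repetitions
]

for pattern, value in cases:
    print(f"{pattern!r} vs {value!r}: {matches(pattern, value)}")
```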

Escape sequences can be used to represent characters that would otherwise be regarded as metacharacters.

Escape Sequence Represented Character
\n The newline character (#xA)
\r The return character (#xD)
\t The tab character (#x9)
\\ \
\| |
\. .
\- -
\^ ^
\? ?
\* *
\+ +
\{ {
\} }
\( (
\) )
\[ [
\] ]
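
When a literal string has to be embedded in a pattern, each character listed in the table needs its backslash. The following is a minimal sketch of such an escaping helper in Python. The function name `escape_literal` and the two lookup tables are hypothetical, written only for this illustration, and the final check uses Python's `re` engine as a stand-in, since it reads these particular escapes the same way.

```python
import re

# Characters that the escape table lists as needing a backslash.
_META = set("\\|.-^?*+{}()[]")
# Control characters that have a dedicated escape sequence.
_CONTROL = {"\n": "\\n", "\r": "\\r", "\t": "\\t"}


def escape_literal(text: str) -> str:
    """Escape text so that it is read as literal characters in a pattern."""
    out = []
    for ch in text:
        if ch in _CONTROL:
            out.append(_CONTROL[ch])
        elif ch in _META:
            out.append("\\" + ch)
        else:
            out.append(ch)
    return "".join(out)


literal = "(a|b)*"
pattern = escape_literal(literal)
print(pattern)
# The escaped pattern matches the literal text itself and nothing else.
print(re.fullmatch(pattern, literal) is not None)
```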

Character classes are represented by the following escape sequence:

\p{X} A character belonging to the category denoted by X (see the following tables).

Letter categories:

Category Represented Characters
L All letters.
Lu Only uppercase letters.
Ll Only lowercase letters.
Lt Titlecase letters, used where only the first character of a word is uppercase, depending on language (see Unicode Technical Report #21).
Lm Modifier. Various characters such as accents modifying the pronunciation of a character.
Lo Other.

Mark categories:

Category Represented Characters
M All marks.
Mn Nonspacing marks.
Mc Spacing combining marks.
Me Enclosing marks.

Number categories:

Category Represented Characters
N All numbers. This includes numbers that do not rely on decimal digits, such as Roman numerals, encircled numbers, bracketed numbers, etc.
Nd Only decimal digits.
Nl Letter numbers such as Roman numerals.
No All other numeric symbols.

Punctuation categories:

Category Represented Characters
P All punctuation symbols.
Pc Connector. All connecting symbols, for example, the underscore.
Pd Dash. Various connecting dash symbols.
Ps Open. All opening symbols such as opening parentheses or brackets.
Pe Close. All closing symbols such as closing parentheses or brackets.
Pi Initial quote (may behave like Ps or Pe depending on usage).
Pf Final quote (may behave like Ps or Pe depending on usage).
Po Other punctuation symbols.

Separator categories:

Category Represented Characters
Z All separators.
Zs Separating space character.
Zl Line separators.
Zp Paragraph separators.

Symbol categories:

Category Represented Characters
S All symbols.
Sm Mathematical symbols.
Sc Currency symbols ($, €).
Sk Modifier. Various characters such as accents modifying the pronunciation of a character, similar to Lm.
So Other symbols.

Other categories:

Category Represented Characters
C All other characters.
Cc Control. Nonprintable control characters.
Cf Formatting characters.
Co Private use for user-defined characters.
Cn Not assigned (no specific meaning within Unicode).
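
The two-letter codes in the tables above are the Unicode general categories, so they can be inspected directly. The following is a minimal sketch in Python using the standard `unicodedata` module. Python's built-in `re` module does not support the category escape itself, so the hypothetical helper `in_category` emulates it with a prefix test: a one-letter code such as `L` covers all of its subcategories.

```python
import unicodedata


def in_category(ch: str, category: str) -> bool:
    """Emulate a category test: 'L' covers Lu, Ll, Lt, Lm, and Lo."""
    return unicodedata.category(ch).startswith(category)


# A few sample characters, one for several of the categories listed above.
samples = ["A", "a", "5", "\u2163", "\u0301", "_", "(", " ", "+", "$", "\n"]

for ch in samples:
    print(f"U+{ord(ch):04X} -> {unicodedata.category(ch)}")

# U+2163 (ROMAN NUMERAL FOUR) is a number, but not a decimal digit.
print(in_category("\u2163", "N"), in_category("\u2163", "Nd"))
```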

Abbreviations:

Escape Sequence Equivalent To Legend
. [^\n\r] Anything except newline or carriage return.
\s [#x20\t\n\r] XML whitespace characters.
\S [^\s] XML non-whitespace characters.
\i The set of initial XML name characters (Letter | '_' | ':').
\I [^\i] Anything except initial XML name characters.
\c The set of XML name characters (NameChar).
\C [^\c] Anything except XML name characters.
\d \p{Nd} Decimal digits.
\D [^\d] Anything except decimal digits.
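
Several of these abbreviations exist in other regex dialects with slightly different meanings, which matters when a pattern is tested outside a schema processor. The following is a minimal sketch in Python that spells out the "Equivalent To" column explicitly and compares it with Python's own shorthand. The constant names `XSD_DOT` and `XSD_SPACE` and the helper `matches` are assumptions made for this illustration.

```python
import re

# Explicit character classes taken from the "Equivalent To" column.
XSD_DOT = r"[^\n\r]"       # the dot: anything except newline or return
XSD_SPACE = r"[ \t\n\r]"   # whitespace: space, tab, newline, return


def matches(pattern: str, value: str) -> bool:
    """Return True only if the entire value matches the pattern."""
    return re.fullmatch(pattern, value) is not None


# Python's dot excludes only the newline by default, so it accepts "\r".
print(matches(".", "\r"), matches(XSD_DOT, "\r"))

# Python's whitespace shorthand also accepts the form feed character.
print(matches(r"\s", "\f"), matches(XSD_SPACE, "\f"))

# For str patterns, Python's digit shorthand covers the Nd category,
# so it accepts U+0663 (ARABIC-INDIC DIGIT THREE) but not U+2163.
print(matches(r"\d", "\u0663"), matches(r"\d", "\u2163"))
```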