Regular Expressions for Patterns

APPENDIX

Regular Expressions for Patterns

The pattern facet uses the familiar regular expression syntax to restrict the lexical space of data types. This appendix gives an overview of the pattern syntax.

Complex regular expressions can be constructed from simpler ones with the help of operators. Let S and T be arbitrary regular expressions, c and d be normal characters, and C be a character class (listed below).

A normal character c is any character that is not a metacharacter. Metacharacters are ., \, |, ?, *, +, {, }, (, ), [, and ].

Expression	Meaning
c	The string consisting of character c.
C	The string consisting of a character belonging to character class C.
c-d	Range. The string consisting of any single character whose code value is between c and d (inclusive). Written inside a character class expression, as in [c-d].
ST	Concatenation. All strings st with s matching S and t matching T.
S|T	Choice. All strings s that match S or T.
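^S	Negation. All strings s that do not match S. Written inside a character class expression, as in [^S].
S-T	Difference. All strings s that match S but not T. Written inside a character class expression, as in [S-[T]].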
S?	Option. All strings s that match S, or the empty string.
S*	Zero or more (Kleene star). All strings s matching k repetitions of S (k >= 0, so the empty string is included).
S+	One or more. All strings s matching k repetitions of S (k >= 1).
S{n,m}	All strings s matching k repetitions of S (n <= k <= m).
S{n}	All strings s matching exactly n repetitions of S.
S{n,}	All strings s matching at least n repetitions of S.
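
The operators in the table can be tried out in any regular expression engine that shares this core syntax. The following is a minimal sketch using Python's re module, which is an assumption of this example and not part of XML Schema. Because a pattern facet has to match the entire lexical value, the sketch uses re.fullmatch instead of a search. The helper name matches and the sample patterns are hypothetical.

```python
import re

def matches(pattern: str, value: str) -> bool:
    """Return True only if the whole value matches the pattern.

    XML Schema patterns are implicitly anchored at both ends,
    which re.fullmatch approximates.
    """
    return re.fullmatch(pattern, value) is not None

# Concatenation and choice: "ab" followed by "c" or "d".
print(matches(r"ab(c|d)", "abd"))       # True
print(matches(r"ab(c|d)", "abcd"))      # False

# Option: the "u" may be absent.
print(matches(r"colou?r", "color"))     # True

# Zero or more versus one or more repetitions.
print(matches(r"a*", ""))               # True
print(matches(r"a+", ""))               # False

# Bounded repetition: between 2 and 4 digits.
print(matches(r"[0-9]{2,4}", "123"))    # True
print(matches(r"[0-9]{2,4}", "12345"))  # False

# Exactly 3 digits, and at least 2 digits.
print(matches(r"[0-9]{3}", "12"))       # False
print(matches(r"[0-9]{2,}", "123456"))  # True
```

Python's dialect differs from the pattern facet in other places (for example, it has no character class subtraction), so this only illustrates the operators both share.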

Escape sequences can be used to represent characters that would otherwise be regarded as metacharacters.

Escape Sequence	Represented Character
\n	The newline character (#xA)
\r	The return character (#xD)
\t	The tab character (#x9)
\\	\
\|	|
\.	.
\-	-
\^	^
\?	?
\*	*
\+	+
\{	{
\}	}
\(	(
\)	)
\[	[
\]	]
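
To illustrate the effect of escaping, here is a small sketch in Python. Python's re module is an assumption of this example; it accepts the same backslash escapes for these characters. The helper escape_literal is hypothetical: it prefixes each character from the table with a backslash so that arbitrary text can be embedded in a pattern as a literal.

```python
import re

# Unescaped, "." is a metacharacter that stands for (almost) any single character.
print(re.fullmatch(r"1.5", "1x5") is not None)    # True

# Escaped, "\." stands for a literal full stop only.
print(re.fullmatch(r"1\.5", "1x5") is not None)   # False
print(re.fullmatch(r"1\.5", "1.5") is not None)   # True

# Characters that have an escape sequence in the table above.
ESCAPABLE = set("\\|.-^?*+{}()[]")

def escape_literal(text: str) -> str:
    """Return text with every escapable character prefixed by a backslash."""
    return "".join("\\" + ch if ch in ESCAPABLE else ch for ch in text)

literal = "(1+1)*2"
pattern = escape_literal(literal)
print(pattern)                                       # \(1\+1\)\*2
print(re.fullmatch(pattern, literal) is not None)    # True
```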

Character classes are represented by the following escape sequence:

\p{X}	A character belonging to the category denoted by X (see the following tables).

Letter categories:

Category	Represented Characters
L	All letters.
Lu	Only uppercase letters.
Ll	Only lowercase letters.
Lt	Titlecase letters. Letter forms used when only the first character of a word is capitalized; usage depends on language (see Unicode Technical Report #21).
Lm	Modifier. Various characters such as accents modifying the pronunciation of a character.
Lo	Other.

Mark categories:

Category	Represented Characters
M	All marks.
Mn	Nonspacing marks.
Mc	Spacing combining marks.
Me	Enclosing marks.

Number categories:

Category	Represented Characters
N	All numbers. This includes numbers that do not rely on decimal digits, such as Roman numerals, circled numbers, bracketed numbers, etc.
Nd	Only decimal digits.
Nl	Letter numbers such as Roman numerals.
No	All other numeric symbols.

Punctuation categories:

Category	Represented Characters
P	All punctuation symbols.
Pc	Connector. All connecting symbols, for example, the underscore.
Pd	Dash. Various connecting dash symbols.
Ps	Open. All opening symbols such as opening parentheses or brackets.
Pe	Close. All closing symbols such as closing parentheses or brackets.
Pi	Initial quote (may behave like Ps or Pe depending on usage).
Pf	Final quote (may behave like Ps or Pe depending on usage).
Po	Other punctuation symbols.

Separator categories:

Category	Represented Characters
Z	All separators.
Zs	Separating space character.
Zl	Line separators.
Zp	Paragraph separators.

Symbol categories:

Category	Represented Characters
S	All symbols.
Sm	Mathematical symbols.
Sc	Currency symbols ($, €).
Sk	Modifier. Various characters such as accents modifying the pronunciation of a character, similar to Lm.
So	Other symbols.

Other categories:

Category	Represented Characters
C	All other characters.
Cc	Control. Nonprintable control characters.
Cf	Formatting characters.
Co	Private use for user-defined characters.
Cn	Not assigned (no specific meaning within Unicode).
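
The category codes in these tables are the Unicode general categories, so they can be explored outside a schema processor. The sketch below uses Python's standard unicodedata module (an assumption of this example) to look up a character's two-letter category. The helper in_category is hypothetical and only approximates what \p{X} tests for a single character; the Unicode version bundled with Python may differ from the one a given schema processor uses.

```python
import unicodedata

def in_category(ch: str, category: str) -> bool:
    r"""Approximate \p{category} for a single character.

    A one-letter category such as "L" covers all of its
    subcategories (Lu, Ll, Lt, Lm, Lo).
    """
    return unicodedata.category(ch).startswith(category)

print(unicodedata.category("A"))      # Lu
print(in_category("a", "L"))          # True: Ll is a subcategory of L
print(in_category("5", "Nd"))         # True: decimal digit
print(in_category("\u2163", "Nl"))    # True: ROMAN NUMERAL FOUR
print(in_category("\u2163", "Nd"))    # False
print(in_category("_", "Pc"))         # True: connector punctuation
print(in_category("\u20ac", "Sc"))    # True: EURO SIGN
print(in_category("\u0301", "Mn"))    # True: COMBINING ACUTE ACCENT
```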

Abbreviations:

Escape Sequence	Equivalent To	Legend
.	[^\n\r]	Anything except newline or carriage return.
\s	[#x20\t\n\r]	XML whitespace characters.
\S	[^\s]	XML non-whitespace characters.
\i		The set of initial XML name characters (Letter | '_' | ':').
\I	[^\i]	Anything except initial XML name characters.
\c		The set of XML name characters (NameChar).
\C	[^\c]	Anything except XML name characters.
\d	\p{Nd}	Decimal digits.
\D	[^\d]	Anything except decimal digits.
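
The abbreviations do not always mean the same thing in other regular expression dialects. As a hedged illustration, the sketch below compares the whitespace and digit abbreviations with Python's re module (an assumption of this example). Python's \s accepts more characters than the four XML whitespace characters, so a faithful translation spells the set out. Python's \d on text strings covers Unicode decimal digits, which corresponds to the Nd category. The names XML_WHITESPACE, is_xml_whitespace, and is_decimal_digits are hypothetical.

```python
import re

# The four XML whitespace characters: space, tab, newline, carriage return.
XML_WHITESPACE = r"[ \t\n\r]"

def is_xml_whitespace(text: str) -> bool:
    """True if text is non-empty and consists only of XML whitespace."""
    return re.fullmatch(XML_WHITESPACE + "+", text) is not None

def is_decimal_digits(text: str) -> bool:
    """True if text is non-empty and consists only of decimal digits."""
    return re.fullmatch(r"\d+", text) is not None

print(is_xml_whitespace(" \t\r\n"))               # True
# Form feed is whitespace for Python's \s but is not XML whitespace.
print(re.fullmatch(r"\s+", "\f") is not None)     # True
print(is_xml_whitespace("\f"))                    # False

print(is_decimal_digits("2024"))                  # True
# Arabic-Indic digits three and four belong to category Nd.
print(is_decimal_digits("\u0663\u0664"))          # True
# ROMAN NUMERAL FOUR is a letter number (Nl), not a decimal digit.
print(is_decimal_digits("\u2163"))                # False
```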
