Table 11-1 summarizes the syntax of regular expressions available in all versions of Tcl:
. | Matches any character. |
* | Matches zero or more instances of the previous pattern item. |
+ | Matches one or more instances of the previous pattern item. |
? | Matches zero or one instances of the previous pattern item. |
( ) | Groups a subpattern. The repetition and alternation operators apply to the preceding subpattern. |
| | Alternation. |
[ ] | Delimit a set of characters. Ranges are specified as [x-y]. If the first character in the set is ^, then there is a match if the remaining characters in the set are not present. |
^ | Anchor the pattern to the beginning of the string. Only when first. |
$ | Anchor the pattern to the end of the string. Only when last. |
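The basic operators in Table 11-1 can be tried out with the regexp command. The following is a minimal sketch; the sample strings and variable names are invented for illustration and are not taken from the text.

```tcl
# Grouping and alternation: "cat" or "dog", optionally followed by "s".
set found [regexp {(cat|dog)s?} "three dogs" match animal]
puts "found=$found match=$match animal=$animal"

# Anchors plus bracketed ranges: the whole string must be hex digits.
set isHex  [regexp {^[0-9a-fA-F]+$} "1A2b"]
set notHex [regexp {^[0-9a-fA-F]+$} "1A2g"]
puts "isHex=$isHex notHex=$notHex"

# A negated set: the first character that is not a digit.
regexp {[^0-9]} "42px" unit
puts "unit=$unit"
```

The patterns are written in braces so that the Tcl parser leaves the square brackets and dollar signs for the regular expression engine.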
Advanced regular expressions, which were introduced in Tcl 8.1, add more syntax that is summarized in Table 11-2:
{m} | Matches m instances of the previous pattern item. |
{m}? | Matches m instances of the previous pattern item. Nongreedy. |
{m,} | Matches m or more instances of the previous pattern item. |
{m,}? | Matches m or more instances of the previous pattern item. Nongreedy. |
{m,n} | Matches m through n instances of the previous pattern item. |
{m,n}? | Matches m through n instances of the previous pattern item. Nongreedy. |
*? | Matches zero or more instances of the previous pattern item. Nongreedy. |
+? | Matches one or more instances of the previous pattern item. Nongreedy. |
?? | Matches zero or one instances of the previous pattern item. Nongreedy. |
(?:re) | Groups a subpattern, re, but does not capture the result. |
(?=re) | Positive look-ahead. Matches the point where re begins. |
(?!re) | Negative look-ahead. Matches the point where re does not begin. |
(?abc) | Embedded options, where abc is any number of option letters listed in Table 11-5. |
\c | One of many backslash escapes listed in Table 11-4. |
[: :] | Delimits a character class within a bracketed expression. See Table 11-3. |
[. .] | Delimits a collating element within a bracketed expression. |
[= =] | Delimits an equivalence class within a bracketed expression. |
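Several of the advanced constructs in Table 11-2 are easier to see side by side. The following is a rough sketch that assumes Tcl 8.1 or later; the sample strings and variable names are made up for illustration, and the embedded option letter i (case-insensitive matching) is the only option used.

```tcl
# Greedy versus nongreedy repetition.
set html "<b>bold</b>"
regexp {<.*>}  $html greedy   ;# longest match: the whole string
regexp {<.*?>} $html lazy     ;# shortest match: the first tag only
puts "greedy=$greedy lazy=$lazy"

# Bounds: exactly three digits, a dash, exactly four digits.
set isPhone [regexp {^[0-9]{3}-[0-9]{4}$} "555-1234"]

# A non-capturing group: (?:ab) groups but does not fill a match variable,
# so the first subpattern variable receives the text matched by (c).
regexp {(?:ab)+(c)} "ababc" whole sub
puts "whole=$whole sub=$sub"

# Look-ahead constrains the match without consuming characters.
regexp {foo(?=bar)} "foobar" ahead
set negative [regexp {foo(?!bar)} "foobar"]
puts "ahead=$ahead negative=$negative"

# Embedded option: (?i) at the start of the pattern ignores case.
set anyCase [regexp {(?i)tcl} "TCL rocks"]
puts "isPhone=$isPhone anyCase=$anyCase"
```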
Table 11-3 lists the named character classes defined in advanced regular expressions and their associated backslash sequences, if any. Character class names are valid inside bracketed character sets with the [:class:] syntax.
alnum | Upper and lower case letters and digits. |
alpha | Upper and lower case letters. |
blank | Space and tab. |
cntrl | Control characters: \u0001 through \u001F. |
digit | The digits zero through nine. Also \d. |
graph | Printing characters that are not in cntrl or space. |
lower | Lowercase letters. |
print | The same as alnum. |
punct | Punctuation characters. |
space | Space, newline, carriage return, tab, vertical tab, form feed. Also \s. |
upper | Uppercase letters. |
xdigit | Hexadecimal digits: zero through nine, a-f, A-F. |
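Because a class name is only valid inside a bracketed set, the brackets appear doubled in practice: an outer pair for the set and an inner [: :] pair for the class. The following is a small sketch with invented sample strings and variable names.

```tcl
# An identifier: a letter or underscore, then letters, digits, or underscores.
set isIdent [regexp {^[[:alpha:]_][[:alnum:]_]*$} "var_1"]

# Every character must be a digit, so the trailing "a" makes this fail.
set allDigits [regexp {^[[:digit:]]+$} "123a"]

# Collapse each run of white space into a single space.
# The input is double-quoted so that Tcl turns \t and \n into real characters.
set runs [regsub -all {[[:space:]]+} "a \t b\n c" " " squeezed]

# Capture the hexadecimal digits that follow a "#".
regexp {#([[:xdigit:]]+)} "color #ff8800;" all hex

puts "isIdent=$isIdent allDigits=$allDigits runs=$runs squeezed=$squeezed hex=$hex"
```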
Table 11-4 lists backslash sequences supported in Tcl 8.1.
\a | Alert, or “bell”, character. |
\A | Matches only at the beginning of the string. |
\b | Backspace character, \u0008. |
\B | Synonym for backslash. |
\cX | Control-X. |
\d | Digits. Same as [[:digit:]] |
\D | Not a digit. Same as [^[:digit:]] |
\e | Escape character, \u001B. |
\f | Form feed, \u000C. |
\m | Matches the beginning of a word. |
\M | Matches the end of a word. |
\n | Newline, \u000A. |
\r | Carriage return, \u000D. |
\s | Space. Same as [[:space:]] |
\S | Not a space. Same as [^[:space:]] |
\t | Horizontal tab, \u0009. |
\uXXXX | A 16-bit Unicode character code. |
\v | Vertical tab, \u000B. |
\w | Letters, digits, and underscore. Same as [[:alnum:]_] |
\W | Not a letter, digit, or underscore. Same as [^[:alnum:]_] |
\xhh | An 8-bit hexadecimal character code. Consumes all hex digits after \x. |
\y | Matches the beginning or end of a word. |
\Y | Matches a point that is not the beginning or end of a word. |
\Z | Matches the end of the string. |
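A few of these escapes are shown below in a short sketch. The sample strings and variable names are invented for illustration. The patterns are enclosed in braces so that the Tcl parser passes the backslashes through to the regular expression engine untouched.

```tcl
# Word boundaries: count "the" only where it stands alone as a word.
# The "the" inside "other" is skipped because \m fails after the "o".
set count [regexp -all {\mthe\M} "the other the"]

# \A and \Z anchor to the ends of the string; \d matches a digit.
set isNumber [regexp {\A\d+\Z} "2024"]

# \s matches white space: replace each run with an underscore.
# The input is double-quoted so that \t becomes a real tab character.
regsub -all {\s+} "a  b\tc" "_" joined

# \w matches letters, digits, and underscore: pick out the first word.
regexp {\w+} "hello, world" word

puts "count=$count isNumber=$isNumber joined=$joined word=$word"
```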