Character Classes

The [...] construct lists a set of characters (a character class), any one of which can match at that position. Brackets are often used when capitalization is uncertain in a match:

/[tT]here/
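As a rough illustration, the sketch below applies the pattern to a few sample strings. The sub name mentions_there and the inputs are hypothetical. It also shows the limit of the approach: the class covers only the first letter, so an all-caps "THERE" still fails.

```perl
use strict;
use warnings;

# [tT] is a character class: it matches exactly one character,
# either a lowercase 't' or an uppercase 'T'. The rest must match literally.
sub mentions_there {
    my ($text) = @_;
    return $text =~ /[tT]here/ ? 1 : 0;
}

for my $line ("There it is", "over there", "THERE") {
    printf "%-12s %s\n", $line, mentions_there($line) ? "match" : "no match";
}
```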

A dash (-) may be used to indicate a range of characters in a character class:

/[a-zA-Z]/;  # Match any single letter
/[0-9]/;     # Match any single digit
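A minimal sketch of the two range classes at work, assuming plain ASCII input. The sub count_classes and the sample string are hypothetical; the sub tests each character separately against [a-zA-Z] and [0-9].

```perl
use strict;
use warnings;

# Count the characters that fall in the ranges a-z/A-Z and 0-9.
sub count_classes {
    my ($text) = @_;
    my ($letters, $digits) = (0, 0);
    for my $ch (split //, $text) {      # split // yields single characters
        $letters++ if $ch =~ /[a-zA-Z]/;
        $digits++  if $ch =~ /[0-9]/;
    }
    return ($letters, $digits);
}

my ($l, $d) = count_classes("Route 66, Exit 9b");
print "letters=$l digits=$d\n";
```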

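To put a literal dash in the list, escape it with a backslash (\-). An unescaped dash is also taken literally when it is the first or last character in the class, because it cannot form a range there.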
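In Perl, a backslash-escaped dash is always literal, and so is an unescaped dash placed first or last in the class. The sketch below compares those three spellings. The array @hyphen_classes, the sub count_hyphen_matches, and the test strings are hypothetical.

```perl
use strict;
use warnings;

# Three ways to add a literal '-' to a class of lowercase letters.
my @hyphen_classes = (
    qr/^[a-z\-]+$/,   # escaped with a backslash
    qr/^[-a-z]+$/,    # first in the class, so it cannot start a range
    qr/^[a-z-]+$/,    # last in the class, so it cannot end a range
);

# Returns how many of the three patterns accept the whole string.
sub count_hyphen_matches {
    my ($s) = @_;
    return scalar grep { $s =~ $_ } @hyphen_classes;
}

print count_hyphen_matches("well-known"), "\n";   # all three accept it
print count_hyphen_matches("x+y"), "\n";          # '+' is in none of them
```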

By placing a ^ as the first element in the brackets, you create a negated character class, i.e., it matches any character not in the list. For example:

/[^A-Z]/;    # Matches any character other than an uppercase letter
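One way to see a negated class in action is a substitution that deletes everything the class matches. This is a minimal sketch under that assumption. The sub keep_uppercase is hypothetical. Because [^A-Z] matches every character except A through Z, deleting those matches leaves only the uppercase letters.

```perl
use strict;
use warnings;

# Delete every character that is NOT an uppercase ASCII letter.
sub keep_uppercase {
    my ($text) = @_;
    (my $copy = $text) =~ s/[^A-Z]//g;   # work on a copy, leave $text intact
    return $copy;
}

print keep_uppercase("Perl In A Nutshell"), "\n";   # PIAN
```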

Some common character classes have their own predefined escape sequences for convenience:

Code     Matches
\d       A digit, same as [0-9]
\D       A nondigit, same as [^0-9]
\w       A word character (alphanumeric or underscore), same as [a-zA-Z_0-9]
\W       A non-word character, same as [^a-zA-Z_0-9]
\s       A whitespace character, same as [ \t\n\r\f]
\S       A non-whitespace character, same as [^ \t\n\r\f]
\C       Match a single byte (C char)
\pP      Match a character with the named Unicode property P (e.g., \pL, or \p{Lu} for longer names)
\PP      Match a character without property P
\X       Match an extended Unicode sequence
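The sketch below applies several of these shorthand classes to one sample line, which is hypothetical. It uses \d+ and \w+ to pull out runs of characters, \s+ as a split separator, and \p{Lu} (the standard Unicode "uppercase letter" property) to collect capitals. It assumes ASCII input, so the ASCII equivalents in the table hold.

```perl
use strict;
use warnings;

my $line = "user_42 logged in at 09:15 from Host-B";

my @digit_runs = $line =~ /(\d+)/g;      # \d+   : runs of digits
my @word_runs  = $line =~ /(\w+)/g;      # \w+   : runs of word characters
my @fields     = split /\s+/, $line;     # \s+   : whitespace as the separator
my @upper      = $line =~ /(\p{Lu})/g;   # \p{Lu}: uppercase letters

print "digits: @digit_runs\n";
print "words:  @word_runs\n";
print "fields: @fields\n";
print "upper:  @upper\n";
```

Note that \w+ splits "09:15" and "Host-B" in two, because neither ':' nor '-' is a word character, while splitting on \s+ keeps them whole.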

Perl implements lc() and uc() for changing the case of words or characters, but you can get the same effect inside double-quoted strings and patterns with escape sequences:

Code     Effect
\l       Lowercase the next character
\u       Uppercase the next character
\L       Lowercase until \E
\U       Uppercase until \E
\Q       Disable pattern metacharacters until \E
\E       End case modification (or \Q quoting)
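These escapes are applied during interpolation. That makes them usable both in double-quoted strings and in patterns. The sketch below is a minimal illustration with hypothetical sample values. "\u\L$name" is the common idiom for capitalizing a name, equivalent to ucfirst(lc($name)). \Q makes an interpolated '+' match literally instead of acting as a quantifier. The built-in quotemeta() does the same quoting as a function.

```perl
use strict;
use warnings;

my $name  = "mcDONALD";
my $title = "\u\L$name\E";             # \L lowercases up to \E, \u uppercases the first char

my $shout = "\Uwarning\E: disk full";  # \U uppercases up to \E only

my $needle   = "1+1";
my $quoted   = ("1+1=2" =~ /^\Q$needle\E=/) ? 1 : 0;  # \Q: '+' is literal
my $unquoted = ("1+1=2" =~ /^$needle=/)     ? 1 : 0;  # '+' acts as a quantifier

print "$title\n$shout\nquoted=$quoted unquoted=$unquoted\n";
```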

Each of the shorthand classes matches a single character in (or not in) its class. A \w matches only one character of a word; adding a quantifier, as in \w+, lets you match a whole word. The shorthand classes may also be used within brackets as elements of other character classes.
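The following minimal sketch contrasts a bare \w, the quantified \w+, and a bracketed class that combines \w with an apostrophe. The sample sentence is hypothetical. Plain \w+ breaks contractions at the apostrophe, and [\w']+ keeps them whole.

```perl
use strict;
use warnings;

my $text = "Don't panic: it's 42";

my @single    = $text =~ /(\w)/g;      # one character per match
my @words     = $text =~ /(\w+)/g;     # + extends each match to a whole word
my @with_apos = $text =~ /([\w']+)/g;  # \w inside brackets, plus an apostrophe

print scalar(@single), " single characters\n";
print "words:     @words\n";
print "with_apos: @with_apos\n";
```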
