Matching Text

1.3. Matching Text

A number of Unix text-processing utilities let you search for, and in some cases change, text patterns rather than fixed strings. These utilities include the editing programs ed, ex, vi, and sed, the awk programming language, and the commands grep and egrep. Text patterns (formally called regular expressions) contain normal characters mixed with special characters (called metacharacters).

This section presents the following topics:

Filenames versus patterns
List of metacharacters available to each program
Description of metacharacters
Examples

1.3.1. Filenames Versus Patterns

Metacharacters used in pattern matching are different from metacharacters used for filename expansion. When you issue a command on the command line, special characters are seen first by the shell, then by the program; therefore, unquoted metacharacters are interpreted by the shell for filename expansion. The command:

$ grep [A-Z]* chap[12]

could, for example, be transformed by the shell into:

$ grep Array.c Bug.c Comp.c chap1 chap2

and would then try to find the pattern Array.c in files Bug.c, Comp.c, chap1, and chap2. To bypass the shell and pass the special characters to grep, use quotes:

$ grep "[A-Z]*" chap[12]

Double quotes suffice in most cases, but single quotes are the safest bet.
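The difference can be seen by letting `echo` print the arguments that grep would receive. This is only a sketch: the scratch directory and the file contents are assumptions made for illustration, and `LC_ALL=C` is set so that `[A-Z]` covers uppercase ASCII letters only.

```shell
# Use the C locale so [A-Z] means uppercase ASCII letters only.
LC_ALL=C; export LC_ALL

# Work in a throwaway directory so the glob has known files to match.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
touch Array.c Bug.c Comp.c
printf 'Chapter ONE\n' > chap1
printf 'chapter two\n' > chap2

# Unquoted: the shell replaces [A-Z]* with matching filenames.
unquoted=$(echo grep [A-Z]* chap[12])

# Quoted: the pattern is passed through as is; only chap[12] is expanded.
quoted=$(echo grep '[A-Z]*' chap[12])

echo "$unquoted"
echo "$quoted"

# Clean up the scratch directory.
cd / && rm -rf "$tmpdir"
```

The first line printed shows the filenames substituted for the pattern; the second shows the pattern reaching grep intact.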

Note also that in pattern matching, ? matches zero or one instance of a regular expression; in filename expansion, ? matches a single character.

1.3.2. Metacharacters

1.3.2.1. Search patterns

The characters in the following table have special meaning only in search patterns:

Character	Pattern
`.`	Match any single character except newline. Can match newline in awk.
`*`	Match any number (or none) of the single character that immediately precedes it. The preceding character can also be a regular expression. E.g., since `.` (dot) means any character, `.*` means "match any number of any character."
`^`	Match the following regular expression at the beginning of the line or string.
`$`	Match the preceding regular expression at the end of the line or string.
`[ ]`	Match any one of the enclosed characters. A hyphen (`-`) indicates a range of consecutive characters. A circumflex (`^`) as the first character in the brackets reverses the sense: it matches any one character not in the list. A hyphen or close bracket (`]`) as the first character is treated as a member of the list. All other metacharacters are treated as members of the list (i.e., literally).
`{n,m}`	Match a range of occurrences of the single character that immediately precedes it. The preceding character can also be a metacharacter. `{n}` matches exactly n occurrences, `{n,}` matches at least n occurrences, and `{n,m}` matches any number of occurrences between n and m. n and m must be between 0 and 255, inclusive.
`\{n,m\}`	Just like `{n,m}`, earlier, but with backslashes in front of the braces.
`\`	Turn off the special meaning of the following character.
`\( \)`	Save the pattern enclosed between `\(` and `\)` into a special holding space. Up to nine patterns can be saved on a single line. The text matched by the subpatterns can be "replayed" in substitutions by the escape sequences `\1` to `\9`.
`\n`	Replay the nth sub-pattern enclosed in `\(` and `\)` into the pattern at this point. n is a number from 1 to 9, with 1 starting on the left. See the following examples.
`\< \>`	Match characters at beginning (`\<`) or end (`\>`) of a word.
`+`	Match one or more instances of preceding regular expression.
`?`	Match zero or one instances of preceding regular expression.
`|`	Match the regular expression specified before or after.
`( )`	Apply a match to the enclosed group of regular expressions.

Many Unix systems allow the use of POSIX "character classes" within the square brackets that enclose a group of characters. These are typed enclosed in [: and :]. For example, [[:alnum:]] matches a single alphanumeric character.

Class	Characters Matched
`alnum`	Alphanumeric characters
`alpha`	Alphabetic characters
`blank`	Space or tab
`cntrl`	Control characters
`digit`	Decimal digits
`graph`	Non-space characters
`lower`	Lowercase characters
`print`	Printable characters
`space`	White-space characters
`upper`	Uppercase characters
`xdigit`	Hexadecimal digits
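As a brief illustration, the classes can be used with sed and grep on systems that support them. The sample strings below are made up for this sketch.

```shell
# Use the C locale so the classes cover ASCII characters only.
LC_ALL=C; export LC_ALL

# [^[:digit:]] matches any one character that is not a decimal digit,
# so deleting every such character leaves only the digits.
digits_only=$(printf 'user42 logged in at 09:15\n' | sed 's/[^[:digit:]]//g')

# Count the lines that start with an uppercase letter.
upper_lines=$(printf 'abc\nDEF\nGhi\n' | grep -c '^[[:upper:]]')

echo "$digits_only"
echo "$upper_lines"
```

Note the doubled brackets: the outer pair is the ordinary bracket list, and the inner `[:` and `:]` name the class.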

1.3.2.2. Replacement patterns

The characters in the following table have special meaning only in replacement patterns.

Character	Pattern
`\`	Turn off the special meaning of the following character.
`\n`	Restore the text matched by the nth pattern previously saved by `\(` and `\)`. n is a number from 1 to 9, with 1 starting on the left.
`&`	Reuse the text matched by the search pattern as part of the replacement pattern.
`~`	Reuse the previous replacement pattern in the current replacement pattern. Must be the only character in the replacement pattern. (ex and vi).
`%`	Reuse the previous replacement pattern in the current replacement pattern. Must be the only character in the replacement pattern. (ed).
`\u`	Convert first character of replacement pattern to uppercase.
`\U`	Convert entire replacement pattern to uppercase.
`\l`	Convert first character of replacement pattern to lowercase.
`\L`	Convert entire replacement pattern to lowercase.
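A short sed sketch of the replacement metacharacters that sed shares with the editors: `&`, an escaped `\&`, and the numbered `\1` and `\2`. The input strings are invented for the example. The case-conversion sequences are left out here because they are not available in every sed.

```shell
# & stands for the whole text matched by the search pattern.
wrapped=$(printf 'hello world\n' | sed 's/.*/( & )/')

# A backslash turns off the special meaning of &, giving a literal ampersand.
literal=$(printf 'fish chips\n' | sed 's/ / \& /')

# \1 and \2 restore the text saved by the first and second \( \) pairs.
swapped=$(printf 'key=value\n' | sed 's/\(.*\)=\(.*\)/\2=\1/')

echo "$wrapped"
echo "$literal"
echo "$swapped"
```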

1.3.3. Metacharacters, Listed by Unix Program

Some metacharacters are valid for one program but not for another. Those that are available to a Unix program are marked by a bullet (•) in the following table. (This table is correct for SVR4 and Solaris and most commercial Unix systems, but it's always a good idea to verify your system's behavior.) Items marked with a "P" are specified by POSIX; double-check your system's version. Full descriptions were provided in the previous section.

Symbol	awkegrep	Action
.		Match any character.
*		Match zero or more preceding.
^		Match beginning of line/string.
$		Match end of line/string.
\		Escape following character.
[ ]		Match one from a set.
\( \)		Store pattern for later replay.^[1]
`\n`		Replay sub-pattern in match.
{ }	P	Match a range of instances.
\{ \}		Match a range of instances.
\< \>		Match word's beginning or end.
+		Match one or more preceding.
?		Match zero or one preceding.
|		Separate choices to match.
( )		Group expressions to match.

^[1] Stored sub-patterns can be "replayed" during matching. See the examples, below.

Note that in ed, ex, vi, and sed, you specify both a search pattern (on the left) and a replacement pattern (on the right). The metacharacters above are meaningful only in a search pattern.

In ed, ex, vi, and sed, the following metacharacters are valid only in a replacement pattern:

Symbol	ex	vi	sed	ed	Action
`\`					Escape following character.
`\n`					Text matching pattern stored in `\( \)`.
`&`					Text matching search pattern.
`~`					Reuse previous replacement pattern.
`%`					Reuse previous replacement pattern.
`\u \U`					Change character(s) to uppercase.
`\l \L`					Change character(s) to lowercase.
`\E`					Turn off previous `\U` or `\L`.
`\e`					Turn off previous `\u` or `\l`.

1.3.4. Examples of Searching

When used with grep or egrep, regular expressions should be surrounded by quotes. (If the pattern contains a $, you must use single quotes; e.g., 'pattern'.) When used with ed, ex, sed, and awk, regular expressions are usually surrounded by /, although (except in awk) any delimiter works. Here are some example patterns.

Pattern	What Does It Match?
`bag`	The string bag.
`^bag`	bag at the beginning of the line.
`bag$`	bag at the end of the line.
`^bag$`	bag as the only word on the line.
`[Bb]ag`	Bag or bag.
`b[aeiou]g`	Second letter is a vowel.
`b[^aeiou]g`	Second letter is a consonant (or uppercase or symbol).
`b.g`	Second letter is any character.
`^...$`	Any line containing exactly three characters.
`^\.`	Any line that begins with a dot.
`^\.[a-z][a-z]`	Same, followed by two lowercase letters (e.g., troff requests).
`^\.[a-z]\{2\}`	Same as previous, ed, grep and sed only.
`^[^.]`	Any line that doesn't begin with a dot.
`bugs*`	bug, bugs, bugss, etc.
`"word"`	A word in quotes.
`"*word"*`	A word, with or without quotes.
`[A-Z][A-Z]*`	One or more uppercase letters.
`[A-Z]+`	Same as previous, egrep or awk only.
`[[:upper:]]+`	Same as previous, POSIX egrep or awk.
`[A-Z].*`	An uppercase letter, followed by zero or more characters.
`[A-Z]*`	Zero or more uppercase letters.
`[a-zA-Z]`	Any letter, either lower- or uppercase.
`[^0-9A-Za-z]`	Any symbol or space (not a letter or a number).
`[^[:alnum:]]`	Same, using POSIX character class.

egrep or awk pattern	What Does It Match?
`[567]`	One of the numbers 5, 6, or 7.
`five|six|seven`	One of the words five, six, or seven.
`80[2-4]?86`	8086, 80286, 80386, or 80486.
`80[2-4]?86|(Pentium(-II)?)`	8086, 80286, 80386, 80486, Pentium, or Pentium-II.
`compan(y|ies)`	company or companies.
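These extended patterns can be tried with `grep -E`, the POSIX spelling of egrep. In this sketch the word lists are invented, and the processor pattern is regrouped and anchored with `^` and `$` (an addition, not part of the table) so that only whole-line matches are printed.

```shell
# ? makes the digit optional, | separates alternatives, ( ) groups them.
chips=$(printf '%s\n' 8086 80286 80586 80486 Pentium-II Celeron |
    grep -E '^(80[2-4]?86|Pentium(-II)?)$')

# The group limits the alternation to the word ending.
plurals=$(printf '%s\n' company companies compans |
    grep -E 'compan(y|ies)')

echo "$chips"
echo "$plurals"
```

Here 80586 is rejected because 5 is outside `[2-4]`, and compans is rejected because neither ending follows `compan`.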

ex or vi pattern	What Does It Match?
`\<the`	Words like theater, there or the.
`the\>`	Words like breathe, seethe or the.
`\<the\>`	The word the.

ed, sed, or grep pattern	What Does It Match?
`0\{5,\}`	Five or more zeros in a row.
`[0-9]\{3\}-[0-9]\{2\}-[0-9]\{4\}`	U.S. Social Security number (nnn-nn-nnnn).
`\(why\).*\1`	A line with two occurrences of why.
`\([[:alpha:]_][[:alnum:]_.]*\) = \1;`	C/C++ statements that assign a variable to itself (e.g., x = x;).
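A small grep sketch of the backslashed interval and the replayed sub-pattern. The sample lines are made up, and the first pattern is anchored with `^` and `$` (an addition for this example) so that the whole line must have the nnn-nn-nnnn shape.

```shell
# \{n\} repeats the preceding bracket list exactly n times.
ssn=$(printf '%s\n' '123-45-6789' '12-345-6789' |
    grep '^[0-9]\{3\}-[0-9]\{2\}-[0-9]\{4\}$')

# \1 must match the same text that \(why\) matched earlier on the line.
twice=$(printf '%s\n' 'why oh why' 'why not' |
    grep '\(why\).*\1')

echo "$ssn"
echo "$twice"
```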

1.3.4.1. Examples of searching and replacing

The following examples show the metacharacters available to sed or ex. Note that ex commands begin with a colon. A space is marked by a ␣; a tab is marked by a ⇥.

Command	Result
`s/.*/( & )/`	Redo the entire line, but add parentheses.
`s/.*/mv & &.old/`	Change a wordlist (one word per line) into mv commands.
`/^$/d`	Delete blank lines.
`:g/^$/d`	Same as previous, in ex editor.
`/^[␣⇥]*$/d`	Delete blank lines, plus lines containing only spaces or tabs.
`:g/^[␣⇥]*$/d`	Same as previous, in ex editor.
`s/␣␣*/␣/g`	Turn one or more spaces into one space.
`:%s/␣␣*/␣/g`	Same as previous, in ex editor.
`:s/[0-9]/Item &:/`	Turn a number into an item label (on the current line).
`:s`	Repeat the substitution on the first occurrence.
`:&`	Same as previous.
`:sg`	Same as previous, but for all occurrences on the line.
`:&g`	Same as previous.
`:%&g`	Repeat the substitution globally (i.e., on all lines).
`:.,$s/Fortran/\U&/g`	On current line to last line, change word to uppercase.
`:%s/.*/\L&/`	Lowercase entire file.
`:s/\<./\u&/g`	Uppercase first letter of each word on current line. (Useful for titles.)
`:%s/yes/No/g`	Globally change a word to No.
`:%s/Yes/~/g`	Globally change a different word to No (previous replacement).
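Two of the sed commands from the table, deleting whitespace-only lines and squeezing runs of spaces, can be combined in one invocation. This is a sketch on invented input; the `tab` variable is a helper introduced here so that a literal tab character can be placed inside the bracket list.

```shell
# Hold a literal tab character for use inside the bracket list.
tab=$(printf '\t')

# First delete lines that are empty or hold only spaces and tabs,
# then turn each run of one or more spaces into a single space.
cleaned=$(printf 'a   b\n\n \t \nc  d\n' |
    sed -e "/^[ $tab]*\$/d" -e 's/  */ /g')

echo "$cleaned"
```

The first script is in double quotes so that `$tab` is expanded by the shell; the `$` that anchors the end of the line is escaped to keep the shell from treating it as the start of a variable.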

Finally, some sed examples for transposing words. A simple transposition of two words might look like this:

s/die or do/do or die/ 	Transpose words

The real trick is to use hold buffers to transpose variable patterns. For example:

s/\([Dd]ie\) or \([Dd]o\)/\2 or \1/ 	Transpose, using hold buffers
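Both transpositions can be tried from the shell. A minimal sketch with made-up input lines:

```shell
# Fixed strings: only this exact wording is transposed.
fixed=$(printf 'die or do\n' | sed 's/die or do/do or die/')

# Saved sub-patterns: whatever matched each \( \) pair is swapped,
# so the capitalization of each word travels with it.
flexible=$(printf 'Die or do\n' | sed 's/\([Dd]ie\) or \([Dd]o\)/\2 or \1/')

echo "$fixed"
echo "$flexible"
```

The second command handles any mix of capital and lowercase initial letters, which the fixed-string version cannot do.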
