3.2. Extended grep (grep -E or egrep)

The main advantage of extended grep is that it adds regular expression metacharacters (see Table 3.5) to the basic set. With the -E option, GNU grep recognizes these additional metacharacters.

Table 3.5. egrep's Regular Expression Metacharacters
Metacharacter    Function    Example    What It Matches
^    Beginning-of-line anchor    ^love    Matches all lines beginning with love.
$    End-of-line anchor    love$    Matches all lines ending with love.
.    Matches one character    l..e    Matches lines containing an l, followed by two characters, followed by an e.
*    Matches zero or more of the preceding character    ' *love'    Matches lines with zero or more spaces, followed by the pattern love.
[ ]    Matches one character in the set    [Ll]ove    Matches lines containing love or Love.
[^ ]    Matches one character not in the set    [^A-KM-Z]ove    Matches lines containing a character that is not A through K or M through Z, followed by ove.
New with grep -E or egrep
+    Matches one or more of the preceding character    [a-z]+ove    Matches one or more lowercase letters, followed by ove. Would find move, approve, love, behoove, etc.
?    Matches zero or one of the preceding character    lo?ve    Matches an l followed by zero or one o, then ve. Would find love or lve.
a|b    Matches either a or b    love|hate    Matches either expression, love or hate.
( )    Groups characters    love(able|ly), (ov)+    Matches loveable or lovely; matches one or more occurrences of ov.
x{m}, x{m,}, x{m,n} [a]    Repetition of character x: m times, at least m times, or between m and n times    o{5}, o{5,}, o{5,10}    Matches if the line has 5 o's, at least 5 o's, or between 5 and 10 o's.
\w    Alphanumeric word character; [a-zA-Z0-9_]    l\w*e    Matches an l followed by zero or more word characters, and an e.
\W    Nonalphanumeric (nonword) character; [^a-zA-Z0-9_]
\b    Word boundary    \blove\b    Matches only the word love.

[a] The { } metacharacters are not supported on all versions of UNIX or all pattern-matching utilities; they usually work with vi and grep. They don't work with UNIX egrep at all.
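To see how the three interval forms from the table behave, the following sketch feeds a few made-up lines to grep -E. It assumes a grep whose -E mode supports interval expressions, as GNU grep and other POSIX-conforming versions do; the sample strings and variable names are illustrative only.

```shell
# Sample input (hypothetical): lines of 4, 5, and 12 o's.
sample=$(printf '%s\n' oooo ooooo oooooooooooo)

# Unanchored: any line containing five consecutive o's matches,
# so a line with twelve o's also counts.
exact_unanchored=$(printf '%s\n' "$sample" | grep -E -c 'o{5}')

# Anchored: the whole line must consist of exactly five o's.
exact_anchored=$(printf '%s\n' "$sample" | grep -E -c '^o{5}$')

# Whole line is at least five o's, or between five and ten o's.
at_least=$(printf '%s\n' "$sample" | grep -E -c '^o{5,}$')
bounded=$(printf '%s\n' "$sample" | grep -E -c '^o{5,10}$')

echo "o{5}: $exact_unanchored"
echo "^o{5}\$: $exact_anchored"
echo "^o{5,}\$: $at_least"
echo "^o{5,10}\$: $bounded"
```

The anchors matter here: without ^ and $, o{5} only requires that five o's appear somewhere in the line, which is why it selects longer runs as well.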

3.2.1. Extended grep Examples (egrep and grep -E)

The following examples illustrate the way the extended set of regular expression metacharacters is used with grep -E and egrep. The grep examples presented earlier illustrate the use of the standard metacharacters, which egrep also recognizes. With basic GNU grep (grep -G), it is possible to use any of the additional metacharacters, provided that each of them is preceded by a backslash.

In the following examples, all three variants of grep are shown to accomplish the same task.

% cat datafile
						northwest   NW   Charles Main       3.0   .98   3   34
						western     WE   Sharon Gray        5.3   .97   5   23
						southwest   SW   Lewis Dalsass      2.7   .8    2   18
						southern    SO   Suan Chin          5.1   .95   4   15
						southeast   SE   Patricia Hemenway  4.0   .7    4   17
						eastern     EA   TB Savage          4.4   .84   5   20
						northeast   NE   AM Main Jr.        5.1   .94   3   13
						north       NO   Margot Weber       4.5   .89   5    9
						central     CT   Ann Stephens       5.7   .94   5   13
					

Example 3.24.
1 % egrep 'NW|EA' datafile
						northwest       NW         Charles Main       3.0   .98   3   34
						eastern         EA         TB Savage          4.4   .84   5   20

2 % grep -E 'NW|EA' datafile
						northwest       NW         Charles Main       3.0   .98   3   34
						eastern         EA         TB Savage          4.4   .84   5   20

3 % grep 'NW|EA' datafile

4 % grep 'NW\|EA' datafile
						northwest       NW       Charles Main         3.0   .98   3   34
						eastern         EA       TB Savage            4.4   .84   5   20
					

Explanation

  1. Prints the line if it contains either the expression NW or the expression EA. In this example, egrep is used. If you do not have the GNU version of grep, use egrep.

  2. In this example, GNU grep is used with the -E option to enable the extended metacharacters. Same as egrep.

  3. Regular grep does not normally support extended regular expressions; the vertical bar is an extended regular expression metacharacter used for alternation. Regular grep doesn't recognize it and searches for the literal pattern 'NW|EA'. Nothing matches; nothing prints.

  4. With GNU regular grep (grep -G), if the metacharacter is preceded by a backslash (\|), it is interpreted as an extended regular expression metacharacter, just as with egrep and grep -E.
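The backslashed bar in command 4 is a GNU feature. On a system without it, one alternative is to give basic grep one -e option per pattern, which POSIX grep supports. The sketch below compares that form with grep -E; the three input lines are an abbreviated stand-in for datafile, and the variable names are illustrative only.

```shell
# Three lines standing in for the datafile used in this section.
data=$(printf '%s\n' \
    'northwest NW Charles Main' \
    'western WE Sharon Gray' \
    'eastern EA TB Savage')

# Extended syntax: the vertical bar means alternation.
with_E=$(printf '%s\n' "$data" | grep -E 'NW|EA')

# Basic syntax, portable form: one -e option per alternative.
with_e=$(printf '%s\n' "$data" | grep -e 'NW' -e 'EA')

if [ "$with_E" = "$with_e" ]; then
    echo "both forms select the same lines:"
    printf '%s\n' "$with_e"
fi
```

The -e form only covers alternation between whole patterns; it does not replace alternation nested inside a larger expression.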

% cat datafile
						northwest     NW     Charles Main        3.0    .98   3   34
						western       WE     Sharon Gray         5.3    .97   5   23
						southwest     SW     Lewis Dalsass       2.7    .8    2   18
						southern      SO     Suan Chin           5.1    .95   4   15
						southeast     SE     Patricia Hemenway   4.0    .7    4   17
						eastern       EA     TB Savage           4.4    .84   5   20
						northeast     NE     AM Main Jr.         5.1    .94   3   13
						north         NO     Margot Weber        4.5    .89   5   9
						central       CT     Ann Stephens        5.7    .94   5   13
					

Example 3.25.
% egrep '3+' datafile
% grep -E '3+' datafile
% grep '3\+' datafile
						northwest       NW     Charles Main       3.0   .98   3   34
						western         WE     Sharon Gray        5.3   .97   5   23
						northeast       NE     AM Main Jr.        5.1   .94   3   13
						central         CT     Ann Stephens       5.7   .94   5   13
					

Explanation

Prints all lines containing one or more 3s.

Example 3.26.
% egrep '2\.?[0-9]' datafile
% grep -E '2\.?[0-9]' datafile
% grep '2\.\?[0-9]' datafile
						western                WE        Sharon Gray        5.3   .97   5   23
						southwest              SW        Lewis Dalsass      2.7   .8    2   18
						eastern                EA        TB Savage          4.4   .84   5   20
					

Explanation

Prints all lines containing a 2, followed by zero or one period, followed by a number in the range between 0 and 9.
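The backslashed \+ and \? used with plain grep in the last two examples are GNU extensions. Where they are unavailable, the same quantifiers can be spelled in POSIX basic syntax: x+ becomes xx*, and x? becomes x\{0,1\}. The sketch below checks that the basic and extended spellings select the same lines; the input lines are shortened, illustrative versions of datafile entries.

```shell
# Abbreviated sample lines (illustrative only).
data=$(printf '%s\n' \
    'western 5.3 .97 5 23' \
    'southwest 2.7 .8 2 18' \
    'north 4.5 .89 5 9')

# One or more 3s: extended '3+' versus basic '33*'.
plus_ere=$(printf '%s\n' "$data" | grep -E '3+')
plus_bre=$(printf '%s\n' "$data" | grep '33*')

# A 2, an optional literal period, then a digit:
# extended '?' versus the basic interval \{0,1\}.
opt_ere=$(printf '%s\n' "$data" | grep -E '2\.?[0-9]')
opt_bre=$(printf '%s\n' "$data" | grep '2\.\{0,1\}[0-9]')

if [ "$plus_ere" = "$plus_bre" ] && [ "$opt_ere" = "$opt_bre" ]; then
    echo "basic and extended forms agree"
fi
```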

Example 3.27.
% egrep '(no)+' datafile
% grep -E '(no)+' datafile
% grep '\(no\)\+' datafile
						northwest        NW        Charles Main       3.0   .98   3   34
						northeast        NE        AM Main Jr.        5.1   .94   3   13
						north            NO        Margot Weber       4.5   .89   5    9
					

Explanation

Prints lines containing one or more occurrences of the pattern group no.

Example 3.28.
% grep -E '\w+\W+[ABC]' datafile
						northwest      NW        Charles Main       3.0   .98   3   34
						southern       SO        Suan Chin          5.1   .95   4   15
						northeast      NE        AM Main Jr.        5.1   .94   3   13
						central        CT        Ann Stephens       5.7   .94   5   13
					

Explanation

Prints all lines containing one or more alphanumeric word characters (\w+), followed by one or more nonalphanumeric characters (\W+), followed by one letter in the set A, B, C.

Example 3.29.
% egrep 'S(h|u)' datafile
% grep -E 'S(h|u)' datafile
% grep 'S\(h\|u\)' datafile
						western          WE       Sharon Gray        5.3   .97   5   23
						southern         SO       Suan Chin          5.1   .95   4   15
					

Explanation

Prints all lines containing S, followed by either h or u; i.e., Sh or Su.

Example 3.30.
% egrep 'Sh|u' datafile
% grep -E 'Sh|u' datafile
% grep 'Sh\|u' datafile
						western          WE          Sharon Gray         5.3   .97   5   23
						southwest        SW          Lewis Dalsass       2.7   .8    2   18
						southern         SO          Suan Chin           5.1   .95   4   15
						southeast        SE          Patricia Hemenway   4.0   .7    4   17
					

Explanation

Prints all lines containing the expression Sh or u.
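The difference between the last two examples comes down to how far the vertical bar reaches: inside parentheses it separates only h and u, while without them it separates the whole of Sh from u. The following sketch counts the matching lines for each pattern with grep -c. The print_datafile helper is a hypothetical convenience that reproduces this section's datafile so the script runs on its own.

```shell
# Hypothetical helper: prints the sample datafile used in this section.
print_datafile() {
    cat <<'EOF'
northwest   NW   Charles Main       3.0   .98   3   34
western     WE   Sharon Gray        5.3   .97   5   23
southwest   SW   Lewis Dalsass      2.7   .8    2   18
southern    SO   Suan Chin          5.1   .95   4   15
southeast   SE   Patricia Hemenway  4.0   .7    4   17
eastern     EA   TB Savage          4.4   .84   5   20
northeast   NE   AM Main Jr.        5.1   .94   3   13
north       NO   Margot Weber       4.5   .89   5    9
central     CT   Ann Stephens       5.7   .94   5   13
EOF
}

# Grouped: S followed by h or u (Sh or Su).
grouped=$(print_datafile | grep -E -c 'S(h|u)')

# Ungrouped: either the string Sh, or any u anywhere in the line.
ungrouped=$(print_datafile | grep -E -c 'Sh|u')

echo "S(h|u) matches $grouped lines; Sh|u matches $ungrouped lines"
```

The ungrouped pattern picks up the extra lines because every region name containing "south" has a lowercase u.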

3.2.2. Anomalies with Regular and Extended Variants of grep

The variants of GNU grep supported by Linux are almost, but not quite, the same as their UNIX namesakes. For example, the version of egrep found in Solaris or BSD UNIX does not support three metacharacter sets: { } for repetition, ( ) for tagging characters, and \< \>, the word anchors. Under Linux, these metacharacters are available with grep, grep -E, and egrep. The following examples illustrate these differences, in case you are running bash or tcsh under a UNIX system other than Linux and want to use grep and its family in your shell scripts.

% cat datafile
						northwest    NW        Charles Main        3.0   .98   3   34
						western      WE        Sharon Gray         5.3   .97   5   23
						southwest    SW        Lewis Dalsass       2.7   .8    2   18
						southern     SO        Suan Chin           5.1   .95   4   15
						southeast    SE        Patricia Hemenway   4.0   .7    4   17
						eastern      EA        TB Savage           4.4   .84   5   20
						northeast    NE        AM Main Jr.         5.1   .94   3   13
						north        NO        Margot Weber        4.5   .89   5    9
						central      CT        Ann Stephens        5.7   .94   5   13
					

Example 3.31.
   (Linux GNU grep)
1 % grep '<north>' datafile
						Must use backslashes

2 % grep '\<north\>' datafile
						north            NO        Margot Weber       4.5   .89   5   9

3 % grep -E '\<north\>' datafile
						north            NO        Margot Weber       4.5   .89   5   9

4 % egrep '\<north\>' datafile
						north            NO        Margot Weber       4.5   .89   5   9

(Solaris egrep)
5 % egrep '\<north\>' datafile
						<no output; not recognized>
					

Explanation

  1. No matter what variant of grep is being used, the word anchor metacharacters, \< \>, must be preceded with a backslash. Without the backslashes, < and > are ordinary characters, and nothing matches.

  2. This time, grep searches for a word that begins and ends with north. \< represents the beginning-of-word anchor and \> represents the end-of-word anchor.

  3. grep with the -E option also recognizes the word anchors.

  4. The GNU form of egrep recognizes the word anchors.

  5. When using Solaris (SVR4), egrep does not recognize word anchors as regular expression metacharacters.
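For scripts that must run where the word anchors are not recognized, one possible workaround is to spell out the word boundary with POSIX bracket expressions: the word must be preceded by the start of the line or a nonword character, and followed by a nonword character or the end of the line. This is a sketch, not an exact equivalent, since the pattern also consumes the neighboring character. It assumes a grep -E that supports POSIX character classes; the input lines are an abbreviated stand-in for datafile. GNU and BSD grep also offer a -w option for whole-word matching, but it is not part of POSIX grep.

```shell
# Abbreviated sample lines (illustrative only).
data=$(printf '%s\n' \
    'northwest NW Charles Main' \
    'northeast NE AM Main Jr.' \
    'north     NO Margot Weber')

# Emulate \<north\> without word anchors:
#   (^|[^[:alnum:]_])   start of line, or a nonword character
#   ([^[:alnum:]_]|$)   a nonword character, or end of line
word_count=$(printf '%s\n' "$data" |
    grep -E -c '(^|[^[:alnum:]_])north([^[:alnum:]_]|$)')

word_line=$(printf '%s\n' "$data" |
    grep -E '(^|[^[:alnum:]_])north([^[:alnum:]_]|$)')

echo "whole-word matches: $word_count"
printf '%s\n' "$word_line"
```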

Example 3.32.
  (Linux GNU grep)
1 % grep 'w(es)t.*\1' datafile
						grep: Invalid back reference

2 % grep 'w\(es\)t.*\1' datafile
						northwest NW Charles Main       3.0 .98 3 34

3 % grep -E 'w(es)t.*\1' datafile
						northwest NW Charles Main       3.0 .98 3 34

4 % egrep 'w(es)t.*\1' datafile
						northwest NW Charles Main       3.0 .98 3 34

  (Solaris egrep)
5 % egrep 'w(es)t.*\1' datafile
						<no output; not recognized>
					

Explanation

  1. With regular grep, the grouping metacharacters must be written with backslashes, \( \). Without them, the parentheses are literal characters, \1 refers to no group, and an error occurs.

  2. If the regular expression, w\(es\)t, is matched, the pattern, es, is saved and stored in memory register 1. The expression reads: if west is found, tag and save es, search for any number of characters (.*) after it, followed by es (\1) again, and print the line. The es in Charles is matched by the backreference.

  3. This is the same as the previous example, except that grep with the -E switch does not precede the ( ) with backslashes.

  4. The GNU egrep also uses the extended metacharacters, ( ), without backslashes.

  5. With Solaris, egrep doesn't recognize any form of tagging and backreferencing.
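Backreferences are defined by POSIX only for basic regular expressions; their use with grep -E and egrep is an extension. For a script that has to run on more than one system, the basic form from command 2 is therefore the safer choice. The sketch below runs that form against a few abbreviated, illustrative lines.

```shell
# Abbreviated sample lines (illustrative only).
data=$(printf '%s\n' \
    'northwest NW Charles Main' \
    'western WE Sharon Gray' \
    'southwest SW Lewis Dalsass')

# Basic syntax: \( \) tags "es", and \1 requires the same text again
# later in the line. Only the first line repeats "es" (in Charles).
tagged=$(printf '%s\n' "$data" | grep 'w\(es\)t.*\1')

printf '%s\n' "$tagged"
```

The other two lines contain west but no second es after it, so the backreference has nothing to match.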

Example 3.33.
  (Linux GNU grep)
1 % grep '\.[0-9]\{2\}[^0-9]' datafile
						northwest       NW      Charles Main       3.0   .98   3   34
						western         WE      Sharon Gray        5.3   .97   5   23
						southern        SO      Suan Chin          5.1   .95   4   15
						eastern         EA      TB Savage          4.4   .84   5   20
						northeast       NE      AM Main Jr.        5.1   .94   3   13
						north           NO      Margot Weber       4.5   .89   5    9
						central         CT      Ann Stephens       5.7   .94   5   13

2 % grep -E '\.[0-9]{2}[^0-9]' datafile
						northwest       NW           Charles Main         3.0    .98   3    34
						western         WE           Sharon Gray          5.3    .97   5    23
						southern        SO           Suan Chin            5.1    .95   4    15
						eastern         EA           TB Savage            4.4    .84   5    20
						northeast       NE           AM Main Jr.          5.1    .94   3    13
						north           NO           Margot Weber         4.5    .89   5     9
						central         CT           Ann Stephens         5.7    .94   5    13

3 % egrep '\.[0-9]{2}[^0-9]' datafile
						northwest      NW      Charles Main       3.0    .98    3    34
						western        WE      Sharon Gray        5.3    .97    5    23
						southern       SO      Suan Chin          5.1    .95    4    15
						eastern        EA      TB Savage          4.4    .84    5    20
						northeast      NE      AM Main Jr.        5.1    .94    3    13
						north          NO      Margot Weber       4.5    .89    5     9
						central        CT      Ann Stephens       5.7    .94    5    13

  (Solaris egrep)
4 % egrep '\.[0-9]{2}[^0-9]' datafile
						<no output; not recognized with or without backslashes>
					

Explanation

  1. The extended metacharacters, { }, are used for repetition. The GNU and UNIX versions of regular grep do not evaluate this metacharacter set unless the curly braces are preceded by backslashes, \{ \}. The whole expression reads: search for a literal period, \., followed by a digit between 0 and 9, [0-9], repeated exactly two times, \{2\}, followed by a nondigit, [^0-9].

  2. With extended grep, grep -E, the repetition metacharacters, {2}, do not need to be preceded with backslashes as in the previous example.

  3. Because GNU egrep and grep -E are functionally the same, this command produces the same output as the previous example.

  4. This is the standard UNIX version of egrep. It does not recognize the curly braces as an extended metacharacter set either with or without backslashes.
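Because support for the curly braces varies between systems, a script can probe for it at run time before relying on it. The sketch below is one way to do that: the supports_intervals function and the interval_support variable are hypothetical names, and the probe simply checks whether the command under test matches a two-character line with an interval pattern.

```shell
# Hypothetical helper: succeeds if the given grep command treats {2}
# as an interval. "$@" is the command to probe, e.g. grep -E or egrep.
supports_intervals() {
    # The line "aa" matches ^a{2}$ only when {2} means "exactly two".
    printf 'aa\n' | "$@" '^a{2}$' >/dev/null 2>&1
}

if supports_intervals grep -E; then
    interval_support=yes
else
    interval_support=no
fi

echo "grep -E interval support: $interval_support"
```

The same function can be called as supports_intervals egrep on systems that ship a separate egrep; if the command is missing or rejects the pattern, the probe reports no rather than failing the script.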
