3.1. The grep Command

3.1.1. The Meaning of grep

The name grep can be traced back to the ex editor. If you invoked that editor and wanted to search for a string, you would type at the ex prompt:

:/pattern/p

The first line containing the string pattern would be printed by the p (print) command. If you wanted all the lines that contained pattern to be printed, you would type:

:g/pattern/p

When g precedes pattern, it means "all lines in the file"; that is, the command that follows is applied globally.

Because the search pattern is called a regular expression, we can substitute RE for pattern and the command reads:

:g/RE/p

And there you have it: the meaning of grep and the origin of its name. It means "globally search for the regular expression (RE) and print out the line." The nice part of using grep is that you do not have to invoke an editor to perform a search, and you do not need to enclose the regular expression in forward slashes. It is much faster than using ex or vi.

3.1.2. How grep Works

The grep command searches for a pattern of characters in a file or multiple files. If the pattern contains white space, it must be quoted. The pattern is either a quoted string or a single word[2], and all other words following it are treated as filenames. Grep sends its output to the screen and does not change or affect the input file in any way.

[2] A word is also called a token.

Format

grep word filename filename

Example 3.1.
% grep Tom /etc/passwd
					

Explanation

Grep will search for the pattern Tom in a file called /etc/passwd. If successful, the line from the file will appear on the screen; if the pattern is not found, there will be no output at all; and if the file is not a legitimate file, an error will be sent to the screen. If the pattern is found, grep returns an exit status of 0, indicating success; if the pattern is not found, the exit status returned is 1; and if the file is not found, the exit status is 2.
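
The rules about quoting and filenames can be tried out safely on scratch files. The following is only a sketch: the temporary directory, the filenames people1 and people2, and their contents are invented for illustration.

```shell
# Scratch directory and files; names and contents are invented for this sketch.
dir=$(mktemp -d)
printf 'Tom Savage\nBetty Boop\n' > "$dir/people1"
printf 'Tom Jones\nTom Savage Jr.\n' > "$dir/people2"

# The quotes keep "Tom Savage" together as one pattern; every word
# after the pattern is treated as a filename. The subshell keeps the
# cd from affecting the rest of the session.
matches=$(cd "$dir" && grep 'Tom Savage' people1 people2)
echo "$matches"

# Clean up the scratch files.
rm -r "$dir"
```

When more than one file is named, grep prefixes each matching line with the name of the file it came from, which is why the two lines printed here start with people1: and people2:.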

The grep program can get its input from standard input or a pipe, as well as from files. If you forget to name a file, grep will assume it is getting input from standard input, the keyboard, and will wait until you type something. If input is coming from a pipe, the output of a command is piped as input to grep, and if the desired pattern is matched, grep prints the matching lines to the screen.

Example 3.2.
% ps aux | grep root
					

Explanation

The output of the ps command (ps aux displays processes running on this system) is sent to grep and all lines containing root are printed.
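
Because the output of ps differs on every system, a repeatable way to watch grep read from a pipe is to feed it fixed text. In this sketch the three "process" lines are invented, and printf merely stands in for a command such as ps.

```shell
# Three made-up lines standing in for the output of a command such as ps.
listing='root       1  init
ellie    204  vi
root     310  cron'

# grep reads the piped lines from standard input and prints the matches.
piped=$(printf '%s\n' "$listing" | grep root)
echo "$piped"

# -c prints a count of the matching lines instead of the lines themselves.
count=$(printf '%s\n' "$listing" | grep -c root)
echo "$count"
```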

3.1.3. Basic and Extended Regular Expressions

The grep command supports a number of regular expression metacharacters (see Table 3.2) to help further define the search pattern. It also provides a number of options (see Table 3.3) to modify the way it does its search or displays lines. For example, you can provide options to turn off case-sensitivity, display line numbers, display filenames, and so on.

There are two versions of regular expression metacharacters: basic and extended. The regular version of grep uses the basic set (Table 3.2), and egrep (or grep -E) uses the extended set (Table 3.3). With Gnu grep, both sets are available. The basic set consists of:

						^, $, ., *, [ ], [^ ], \<, and \>
					

In addition, Gnu grep recognizes \b, \w, and \W, as well as a new class of POSIX metacharacters. (See Table 3.4.)

With the -E option to Gnu grep, the extended set is available, but even without the -E option, regular grep, the default, can use the extended set of metacharacters, provided that the metacharacters are preceded with a backslash.[3] The extended set of metacharacters is:

[3] In any version of grep, a metacharacter can be quoted with a backslash to turn off its special meaning.

						?, +, { }, |, ()
					

The extended set of metacharacters has no special meaning to regular grep, unless the metacharacters are backslashed as follows:

						\?, \+, \{, \|, \(, \)
					

The format for using the Gnu grep is shown in Table 3.1.

Table 3.1. Gnu grep
Format                           What It Understands
grep 'pattern' filename(s)       Basic RE metacharacters (the default)
grep -G 'pattern' filename(s)    Same as above; the default
grep -E 'pattern' filename(s)    Extended RE metacharacters
grep -F 'pattern' filename(s)    No RE metacharacters
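
The difference between the forms in Table 3.1 shows up as soon as a pattern contains an extended metacharacter such as |. The sketch below uses four invented input lines and assumes Gnu grep; treating a backslashed \| as alternation in basic mode is a Gnu extension and may not work with other versions of grep.

```shell
# Four sample lines, invented for this sketch.
sample='love
hate
love|hate
lve'

# Basic grep: | is an ordinary character, so only the literal text matches.
basic=$(printf '%s\n' "$sample" | grep 'love|hate')

# Gnu basic grep: the backslashed \| acts as alternation (Gnu extension).
basic_bs=$(printf '%s\n' "$sample" | grep -c 'love\|hate')

# Extended grep: | is alternation without a backslash.
extended=$(printf '%s\n' "$sample" | grep -E -c 'love|hate')

# Fixed-string grep: no metacharacters at all, so again only the literal text.
fixed=$(printf '%s\n' "$sample" | grep -F 'love|hate')

echo "basic: $basic"
echo "basic with backslash: $basic_bs lines"
echo "extended: $extended lines"
echo "fixed: $fixed"
```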

Table 3.2. grep's Regular Expression Metacharacters (The Basic Set)
Metacharacter               Function                                              Example                   What It Matches
^                           Beginning of line anchor                              ^love                     Matches all lines beginning with love.
$                           End of line anchor                                    love$                     Matches all lines ending with love.
.                           Matches one character                                 l..e                      Matches lines containing an l, followed by two characters, followed by an e.
*                           Matches zero or more of the preceding character        *love                    Matches lines with zero or more spaces followed by the pattern love.
[ ]                         Matches one character in the set                      [Ll]ove                   Matches lines containing love or Love.
[^ ]                        Matches one character not in the set                  [^A-K]ove                 Matches lines containing a character not in the range A through K, followed by ove.
\<[a]                       Beginning of word anchor                              \<love                    Matches lines containing a word that begins with love.
\>                          End of word anchor                                    love\>                    Matches lines containing a word that ends with love.
\(..\)[b]                   Tags matched characters                               \(love\)able              Tags marked portion in a register to be remembered later as number 1. To reference later, use \1 to repeat the pattern. May use up to nine tags, starting with the first tag at the leftmost part of the pattern. For example, the pattern love is saved in register 1 to be referenced later as \1.
x\{m\} x\{m,\} x\{m,n\}[c]  Repeats character x: m, at least m, or m to n times   o\{5\} o\{5,\} o\{5,10\}  Matches if line has 5 o's, at least 5 o's, or between 5 and 10 o's.
\w                          Alphanumeric word character; [a-zA-Z0-9_]             l\w*e                     Matches an l followed by zero or more word characters, and an e.
\W                          Nonalphanumeric word character; [^a-zA-Z0-9_]         love\W\+                  Matches love followed by one or more non-word characters, such as a period, question mark, etc.
\b                          Word boundary                                         \blove\b                  Matches only the word love.

[a] These metacharacters do not work unless backslashed, even with grep -E and Gnu egrep; they don't work with UNIX egrep at all.

[b] These metacharacters are really part of the extended set, but are placed here because they work with UNIX grep and Gnu regular grep, if backslashed. They do not work with UNIX egrep at all.

[c] The { } metacharacters are not supported on all versions of UNIX or all pattern-matching utilities; they usually work with vi and grep. They don't work with UNIX egrep at all.
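
Tagging and repetition are the two entries in Table 3.2 that are hardest to picture, so here is a small sketch of each in the basic (backslashed) form. The sample lines are invented; both patterns use only the basic syntax described in the table.

```shell
# Sample lines, invented for this sketch.
sample='loveable love
loveable hate
goooood
good'

# \(love\) tags "love" as register 1; \1 demands the same text again later.
tagged=$(printf '%s\n' "$sample" | grep '\(love\)able \1')
echo "$tagged"

# o\{3,5\} matches three to five consecutive o's.
repeated=$(printf '%s\n' "$sample" | grep 'o\{3,5\}')
echo "$repeated"
```

Only the first line satisfies the back reference, because the text after "loveable " must repeat exactly what the tag captured. Only goooood has a run of at least three o's.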

Table 3.3. The Additional Extended Set (Used with egrep and grep -E)
Metacharacter         Function                                              Example               What It Matches
+                     Matches one or more of the preceding character        [a-z]+ove             Matches one or more lowercase letters, followed by ove. Would find move, approve, love, behoove, etc.
?                     Matches zero or one of the preceding character        lo?ve                 Matches an l followed by either one o or no o at all. Would find love or lve.
a|b|c                 Matches either a or b or c                            love|hate             Matches either expression, love or hate.
()                    Groups characters                                     love(able|rs) (ov)+   Matches loveable or lovers. Matches one or more occurrences of ov.
(..) (...) \1 \2[a]   Tags matched characters                               (love)ing             Tags marked portion in a register to be remembered later as number 1. To reference later, use \1 to repeat the pattern. May use up to nine tags, starting with the first tag at the leftmost part of the pattern. For example, the pattern love is saved in register 1 to be referenced later as \1.
x{m} x{m,} x{m,n}[b]  Repeats character x: m, at least m, or m to n times   o{5} o{5,} o{5,10}    Matches if line has 5 o's, at least 5 o's, or between 5 and 10 o's.

[a] Tags and back references do not work with UNIX egrep.

[b] The { } metacharacters are not supported on all versions of UNIX or all pattern-matching utilities; they usually work with vi and grep. They do not work with UNIX egrep at all.
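
The extended metacharacters in Table 3.3 can be exercised with grep -E on a short word list. The list below is invented for this sketch; the patterns are the ones from the table, with anchors added in one case so that only whole lines match.

```shell
# A short word list, invented for this sketch.
words='love
lve
loveable
lovers
move
ove'

# + : at least one lowercase letter must precede ove, so ove alone fails.
plus=$(printf '%s\n' "$words" | grep -E -c '[a-z]+ove')
echo "$plus"

# ? : the o is optional; the anchors restrict the match to whole lines.
optional=$(printf '%s\n' "$words" | grep -E '^lo?ve$')
echo "$optional"

# | inside ( ) : love followed by either able or rs.
grouped=$(printf '%s\n' "$words" | grep -E 'love(able|rs)')
echo "$grouped"
```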

The POSIX Character Class

POSIX (the Portable Operating System Interface) is an industry standard to ensure that programs are portable across operating systems. In order to be portable, POSIX recognizes that different countries or locales may differ in the way they encode characters, represent currency, and represent times and dates. To handle different types of characters, POSIX added the bracketed character classes shown in Table 3.4 to the basic and extended regular expressions.

For example, the class [:alnum:] is another way of saying A-Za-z0-9. To be recognized as a regular expression, the class must be enclosed in another set of brackets. For example, A-Za-z0-9, by itself, is not a regular expression, but [A-Za-z0-9] is. Likewise, [:alnum:] should be written [[:alnum:]]. The difference between the first form, [A-Za-z0-9], and the bracketed form, [[:alnum:]], is that the first form is dependent on ASCII character encoding, whereas the second form allows characters from other languages to be represented in the class, such as Swedish rings and German umlauts.

Table 3.4. The Bracketed Character Class
Bracket Class Meaning
[:alnum:] alphanumeric characters
[:alpha:] alphabetic characters
[:cntrl:] control characters
[:digit:] numeric characters
[:graph:] nonblank characters (not spaces, control characters, etc.)
[:lower:] lowercase letters
[:print:] like [:graph:], but includes the space character
[:punct:] punctuation characters
[:space:] all white-space characters (newlines, spaces, tabs)
[:upper:] uppercase letters
[:xdigit:] allows digits in a hexadecimal number (0-9a-fA-F)

Example 3.3.
1  % grep  '[[:space:]]\.[[:digit:]][[:space:]]'  datafile
							southwest     SW     Lewis Dalsass        2.7    .8    2     18
							southeast     SE     Patricia Hemenway    4.0    .7    4     17

2  % grep -E  '[[:space:]]\.[[:digit:]][[:space:]]'  datafile
							southwest     SW     Lewis Dalsass        2.7    .8    2     18
							southeast     SE     Patricia Hemenway    4.0    .7    4     17

3  % egrep  '[[:space:]]\.[[:digit:]][[:space:]]'  datafile
							southwest     SW     Lewis Dalsass        2.7    .8    2     18
							southeast     SE     Patricia Hemenway    4.0    .7    4     17
						

Explanation

1,2,3 For all Linux variants of grep (other than fgrep), the POSIX bracketed class set is supported. In each of these examples, grep will search for a space character, a literal period, a digit [0-9] and another space character.
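
The same bracketed classes can be tried on a few lines of fixed text. The three lines below are invented and separated by single spaces; the comparison with an explicit [0-9] range assumes ordinary ASCII input.

```shell
# Three sample lines, invented for this sketch (fields separated by one space).
sample='southwest 2.7 .8 2
western 5.3 .97 5
Eastern 4.4 .84 5'

# White space, a literal period, one digit, then white space again.
class_match=$(printf '%s\n' "$sample" | grep '[[:space:]]\.[[:digit:]][[:space:]]')
echo "$class_match"

# The equivalent search written with a literal space and an ASCII range.
range_match=$(printf '%s\n' "$sample" | grep ' \.[0-9] ')
echo "$range_match"

# [[:upper:]] anchored to the start of the line.
upper=$(printf '%s\n' "$sample" | grep '^[[:upper:]]')
echo "$upper"
```

Only the first line has a period followed by exactly one digit and then white space; .97 and .84 have a second digit where the pattern requires white space.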

3.1.4. grep and Exit Status

The grep command is very useful in shell scripts, because it always returns an exit status to indicate whether it was able to locate the pattern or the file you were looking for. If the pattern is found, grep returns an exit status of 0, indicating success; if grep cannot find the pattern, it returns 1 as its exit status; and if the file cannot be found, grep returns an exit status of 2. (Other Linux/UNIX utilities that search for patterns, such as sed and awk, do not use the exit status to indicate the success or failure of locating a pattern; they report failure only if there is a syntax error in a command.)

In the following example, john is not found in the /etc/passwd file.

Example 3.4.
1  % grep 'john' /etc/passwd
2  % echo $status        (csh/tcsh)
   1
   or
   $ echo $?             (bash/sh/ksh/tcsh)
   1
					

Explanation

  1. Grep searches for john in the /etc/passwd file, and if successful, grep exits with a status of 0. If john is not found in the file, grep exits with 1. If the file is not found, an exit status of 2 is returned.

  2. The TC/C shell variable, status, and the Bourne/Korn shell variable, ?, are assigned the exit status of the last command that was executed. The Bourne Again (bash) shell uses the ? variable, as the Bourne and Korn shells do; it does not set a status variable.
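
In a Bourne-style script, the three exit statuses can be captured with $? or tested directly by if. This is a minimal sketch: the scratch file stands in for /etc/passwd, its two entries are invented, and the status of 2 for a missing file is what Gnu grep reports (other versions may use a different value greater than 1).

```shell
# Scratch file standing in for /etc/passwd; its contents are invented.
file=$(mktemp)
printf 'root:x:0:0:root:/root:/bin/sh\nellie:x:501:501::/home/ellie:/bin/sh\n' > "$file"

# Pattern found: exit status 0.
grep 'ellie' "$file" > /dev/null
found=$?

# Pattern not found: exit status 1.
grep 'john' "$file" > /dev/null
missing=$?

# File not found: exit status 2 with Gnu grep (the error message is discarded).
grep 'john' "$file.missing" > /dev/null 2>&1
nofile=$?

echo "$found $missing $nofile"

# In a script, the status is usually tested directly by if.
if grep 'ellie' "$file" > /dev/null
then
    verdict='ellie has an account'
else
    verdict='ellie has no account'
fi
echo "$verdict"

rm -f "$file"
```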

3.1.5. Regular grep Examples (grep, grep -G)

% cat datafile
						northwest    NW    Charles Main       3.0   .98   3    34
						western      WE    Sharon Gray        5.3   .97   5    23
						southwest    SW    Lewis Dalsass      2.7   .8    2    18
						southern     SO    Suan Chin          5.1   .95   4    15
						southeast    SE    Patricia Hemenway  4.0   .7    4    17
						eastern      EA    TB Savage          4.4   .84   5    20
						northeast    NE    AM Main Jr.        5.1   .94   3    13
						north        NO    Margot Weber       4.5   .89   5     9
						central      CT    Ann Stephens       5.7   .94   5    13
					

Example 3.5.
						grep NW datafile
						or
						grep -G NW datafile
						northwest       NW        Charles Main       3.0      .98     3    34
					

Explanation

Prints all lines containing the regular expression NW in a file called datafile.

Example 3.6.
						grep NW d*
						datafile:  northwest  NW    Charles Main     3.0   .98   3    34
						db:northwest          NW     Joel Craig      30    40    5    123
					

Explanation

Prints all lines containing the regular expression NW in all files starting with a d. The shell expands d* to all files that begin with a d; in this case the filenames are db and datafile.
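
It is the shell, not grep, that turns d* into a list of filenames. The sketch below makes this visible by echoing the expansion before running grep; the scratch directory, the three files, and their one-line contents are invented for illustration.

```shell
# Scratch directory with three invented files; only two begin with d.
dir=$(mktemp -d)
printf 'northwest NW Charles Main\n' > "$dir/datafile"
printf 'northwest NW Joel Craig\n' > "$dir/db"
printf 'NW is mentioned here too\n' > "$dir/other"

# echo shows what the shell hands to a command in place of d*.
expanded=$(cd "$dir" && echo d*)
echo "$expanded"

# grep therefore receives two filenames and never looks at "other".
matches=$(cd "$dir" && grep NW d*)
echo "$matches"

rm -r "$dir"
```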

Example 3.7.
						grep '^n' datafile
						northwest        NW     Charles Main     3.0   .98   3   34
						northeast        NE     AM Main Jr.      5.1   .94   3   13
						north            NO     Margot Weber     4.5   .89   5    9
					

Explanation

Prints all lines beginning with an n. The caret (^) is the beginning of line anchor.

Example 3.8.
						grep '4$' datafile
						northwest         NW      Charles Main      3.0    .98    3   34
					

Explanation

Prints all lines ending with a 4. The dollar sign ($) is the end of line anchor.
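
The two anchors can also be combined to match a whole line. In this sketch the four input lines are invented; note that the last one contains north and a 4, but in the wrong positions for the anchored patterns.

```shell
# Four sample lines, invented for this sketch.
lines='northwest 34
north 9
western 23
4 north'

# ^ anchors the pattern to the beginning of the line.
begins=$(printf '%s\n' "$lines" | grep '^n')
echo "$begins"

# $ anchors the pattern to the end of the line.
ends=$(printf '%s\n' "$lines" | grep '4$')
echo "$ends"

# Both anchors together: the pattern must account for the entire line.
whole=$(printf '%s\n' "$lines" | grep '^north 9$')
echo "$whole"
```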

Example 3.9.
						grep TB Savage datafile
						grep: Savage: No such file or directory
						datafile:eastern    EA     TB Savage       4.4    .84    5    20
					

Explanation

Because the first argument is the pattern and all of the remaining arguments are filenames, grep will search for TB in a file called Savage and a file called datafile. To search for TB Savage, see the next example.

Example 3.10.
						grep 'TB Savage' datafile
						eastern          EA      TB Savage         4.4     .84     5     20
					

Explanation

Prints all lines containing the pattern TB Savage. Without quotes (in this example, either single or double quotes will do), the white space between TB and Savage would cause grep to search for TB in a file called Savage and a file called datafile, as in the previous example.

%  cat datafile
						northwest     NW    Charles Main         3.0   .98   3   34
						western       WE    Sharon Gray          5.3   .97   5   23
						southwest     SW    Lewis Dalsass        2.7   .8    2   18
						southern      SO    Suan Chin            5.1   .95   4   15
						southeast     SE    Patricia Hemenway    4.0   .7    4   17
						eastern       EA    TB Savage            4.4   .84   5   20
						northeast     NE    AM Main Jr.          5.1   .94   3   13
						north         NO    Margot Weber         4.5   .89   5    9
						central       CT    Ann Stephens         5.7   .94   5   13
					

Example 3.11.
						grep '5\..' datafile
						western         WE       Sharon Gray       5.3   .97   5   23
						southern        SO       Suan Chin         5.1   .95   4   15
						northeast       NE       AM Main Jr.       5.1   .94   3   13
						central         CT       Ann Stephens      5.7   .94   5   13
					

Explanation

Prints a line containing the number 5, followed by a literal period and any single character. The "dot" metacharacter represents a single character, unless it is escaped with a backslash. When escaped, the character is no longer a special metacharacter, but represents itself, a literal period.

Example 3.12.
						grep '\.5' datafile
						north            NO          Margot Weber       4.5     .89    5    9
					

Explanation

Prints any line containing the expression .5.

Example 3.13.
						grep '^[we]' datafile
						western          WE     Sharon Gray     5.3   .97   5   23
						eastern          EA     TB Savage       4.4   .84   5   20
					

Explanation

Prints lines beginning with either a w or an e. The caret (^) is the beginning of line anchor, and either one of the characters in the brackets will be matched.

Example 3.14.
						grep '[^0-9]' datafile
						northwest       NW      Charles Main       3.0    .98    3   34
						western         WE      Sharon Gray        5.3    .97    5   23
						southwest       SW      Lewis Dalsass      2.7    .8     2   18
						southern        SO      Suan Chin          5.1    .95    4   15
						southeast       SE      Patricia Hemenway  4.0    .7     4   17
						eastern         EA      TB Savage          4.4    .84    5   20
						northeast       NE      AM Main Jr.        5.1    .94    3   13
						north           NO      Margot Weber       4.5    .89    5    9
						central         CT      Ann Stephens       5.7    .94    5   13
					

Explanation

Prints all lines containing at least one nondigit. The caret inside brackets means match one character not in the range. Because all lines have at least one nondigit, all lines are printed. (See the -v option.)
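
A negated set and the -v option are easy to confuse, so the following sketch puts them side by side. The three input lines are invented: one is all digits, one is mixed, and one has no digits.

```shell
# Three sample lines, invented for this sketch.
sample='12345
abc123
hello'

# [^0-9] selects lines that contain at least one nondigit character.
has_nondigit=$(printf '%s\n' "$sample" | grep '[^0-9]')
echo "$has_nondigit"

# -v inverts the match: it selects lines that contain no digit at all.
no_digit=$(printf '%s\n' "$sample" | grep -v '[0-9]')
echo "$no_digit"
```

The mixed line abc123 is printed by the first command but not by the second, which is the practical difference between "contains a nondigit" and "contains no digit."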

Example 3.15.
						grep '[A-Z][A-Z] [A-Z]' datafile
						eastern           EA        TB Savage         4.4    .84   5   20
						northeast         NE        AM Main Jr.       5.1    .94   3   13
					

Explanation

Prints all lines containing two capital letters followed by a space and a capital letter, e.g., TB Savage and AM Main.

Example 3.16.
						grep 'ss* ' datafile
						northwest        NW       Charles Main      3.0   .98   3   34
						southwest        SW       Lewis Dalsass     2.7   .8    2   18
					

Explanation

Prints all lines containing an s followed by zero or more consecutive s's and a space. Finds Charles and Dalsass.

						cat datafile
						northwest   NW     Charles Main       3.0   .98   3   34
						western     WE     Sharon Gray        5.3   .97   5   23
						southwest   SW     Lewis Dalsass      2.7   .8    2   18
						southern    SO     Suan Chin          5.1   .95   4   15
						southeast   SE     Patricia Hemenway  4.0   .7    4   17
						eastern     EA     TB Savage          4.4   .84   5   20
						northeast   NE     AM Main Jr.        5.1   .94   3   13
						north       NO     Margot Weber       4.5   .89   5    9
						central     CT     Ann Stephens       5.7   .94   5   13
					

Example 3.17.
						grep '[a-z]\{9\}' datafile
						northwest         NW       Charles Main        3.0   .98   3 34
						southwest         SW       Lewis Dalsass       2.7   .8    2 18
						southeast         SE       Patricia Hemenway   4.0   .7    4 17
						northeast         NE       AM Main Jr.         5.1   .94   3 13
					

Explanation

Prints all lines where there are at least nine consecutive lowercase letters, for example, northwest, southwest, southeast, and northeast.

Example 3.18.
						grep '\(3\)\.[0-9].*\1 *\1' datafile
						northwest       NW     Charles Main      3.0   .98   3   34
					

Explanation

Prints the line if it contains a 3 followed by a period and another number, followed by any number of characters (.*), another 3 (originally tagged), any number of tabs, and another 3. Because the 3 was enclosed in parentheses, \(3\), it can be later referenced with \1. \1 means that this was the first expression to be tagged with the \( \) pair.

Example 3.19.
						grep '\<north' datafile
						northwest        NW        Charles Main       3.0   .98   3   34
						northeast        NE        AM Main Jr.        5.1   .94   3   13
						north            NO        Margot Weber       4.5   .89   5    9
					

Explanation

Prints all lines containing a word starting with north. The \< is the beginning of word anchor.

Example 3.20.
						grep '\<north\>' datafile
						north              NO        Margot Weber          4.5    .89   5   9
					

Explanation

Prints the line if it contains the word north. The \< is the beginning of word anchor, and the \> is the end of word anchor.

Example 3.21.
						grep '\bnorth\b' datafile
						north            NO        Margot Weber      4.5    .89    5    9
					

Explanation

Prints the line if it contains the word north. The \b is a word boundary. It can be used instead of the word anchors (see previous example) on all Gnu variants of grep.

Example 3.22.
						grep '^n\w*\W' datafile
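
The word anchors and the word boundary can be compared directly on a few lines of text. The sample lines below are invented, and the sketch assumes Gnu grep, because the backslashed word anchors and the word-boundary escape are not part of the POSIX basic set.

```shell
# Four sample lines, invented for this sketch.
sample='north 4.5
northwest 3.0
the far north
inorth 1'

# Beginning-of-word anchor: north must start a word (inorth does not qualify).
starts=$(printf '%s\n' "$sample" | grep -c '\<north')
echo "$starts"

# Both word anchors: north must be a complete word.
whole=$(printf '%s\n' "$sample" | grep '\<north\>')
echo "$whole"

# Word boundary on each side: the same result with Gnu grep.
boundary=$(printf '%s\n' "$sample" | grep '\bnorth\b')
echo "$boundary"
```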
						northwest        NW     Charles Main     3.0   .98   3   34
						northeast        NE     AM Main Jr.      5.1   .94   3   13
						north            NO     Margot Weber     4.5   .89   5    9
					

Explanation

Prints any line starting with an "n," followed by zero or more alphanumeric word characters, followed by a nonalphanumeric word character. \w and \W are standard word metacharacters for Gnu variants of grep.

Example 3.23.
						grep '\<[a-z].*n\>' datafile
						northwest           NW        Charles Main      3.0   .98   3   34
						western             WE        Sharon Gray       5.3   .97   5   23
						southern            SO        Suan Chin         5.1   .95   4   15
						eastern             EA        TB Savage         4.4   .84   5   20
						northeast           NE        AM Main Jr.       5.1   .94   3   13
						central             CT        Ann Stephens      5.7   .94   5   13
					

Explanation

Prints all lines containing a word starting with a lowercase letter, followed by any number of characters, and a word ending in n. Watch the .* symbol. It means any character, including white space.
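
To see how much text the .* symbol takes in, it helps to print only the matched portion of a line. The sketch below assumes Gnu grep for the -o option (print only the matching text) and for the word anchors; the input line is a single-spaced copy of the first record in datafile.

```shell
# One line of input, modeled on the first record of datafile.
line='northwest NW Charles Main 3.0'

# -o prints only the text that matched, not the whole line.
matched=$(printf '%s\n' "$line" | grep -o '\<[a-z].*n\>')
echo "$matched"
```

The match begins at the n of northwest and runs through spaces and other words until the last place a word ends in n, which here is Main.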
