The name grep can be traced back to the ex editor. If you invoked that editor and wanted to search for a string, you would type at the ex prompt:
:/pattern/p
The first line containing the string pattern would be printed by p, the print command. If you wanted all the lines that contained pattern to be printed, you would type:
:g/pattern/p
When g precedes pattern, it means "all lines in the file," or "perform a global substitution."
Because the search pattern is called a regular expression, we can substitute RE for pattern and the command reads:
:g/RE/p
And there you have it: the meaning of grep and the origin of its name. It means "globally search for the regular expression (RE) and print out the line." The nice part of using grep is that you do not have to invoke an editor to perform a search, and you do not need to enclose the regular expression in forward slashes. It is much faster than using ex or vi.
The grep command searches for a pattern of characters in a file or multiple files. If the pattern contains white space, it must be quoted. The pattern is either a quoted string or a single word[2], and all other words following it are treated as filenames. Grep sends its output to the screen and does not change or affect the input file in any way.
[2] A word is also called a token.
Format
grep word filename filename
% grep Tom /etc/passwd
Explanation
Grep will search for the pattern Tom in a file called /etc/passwd. If successful, the line from the file will appear on the screen; if the pattern is not found, there will be no output at all; and if the file is not a legitimate file, an error will be sent to the screen. If the pattern is found, grep returns an exit status of 0, indicating success; if the pattern is not found, the exit status returned is 1; and if the file is not found, the exit status is 2.
The grep program can get its input from standard input or a pipe, as well as from files. If you forget to name a file, grep will assume it is getting input from standard input (the keyboard) and will wait until you type something. If the input comes from a pipe, the output of the preceding command becomes the input to grep, and every line that matches the pattern is printed to the screen.
% ps aux | grep root
Explanation
The output of the ps command (ps aux displays processes running on this system) is sent to grep and all lines containing root are printed.
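As a small, self-contained sketch of grep reading from a pipe, the following feeds a made-up process listing to grep with printf. The listing is an assumption standing in for real ps output, which differs from system to system.

```shell
# Made-up process listing; real `ps aux` output differs from system to system.
listing='root       1  0.0 /sbin/init
alice    412  0.1 -bash
root     977  0.0 /usr/sbin/sshd'

# No filename is given, so grep reads the piped text from standard input.
matches=$(printf '%s\n' "$listing" | grep root)
printf '%s\n' "$matches"
```

Only the two lines containing root are printed; the input itself is not changed.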
The grep command supports a number of regular expression metacharacters (see Table 3.2) to help further define the search pattern. It also provides a number of options (see Table 3.3) to modify the way it does its search or displays lines. For example, you can provide options to turn off case-sensitivity, display line numbers, display filenames, and so on.
There are two versions of regular expression metacharacters: basic and extended. The regular version of grep uses the basic set (Table 3.2), and egrep (or grep -E) uses the extended set (Table 3.3). With Gnu grep, both sets are available. The basic set consists of:
^, $, ., *, [ ], [^ ], \<, and \>
In addition, Gnu grep recognizes \b, \w, and \W, as well as a new class of POSIX metacharacters. (See Table 3.4.)
With the -E option to Gnu grep, the extended set is available, but even without the -E option, regular grep, the default, can use the extended set of metacharacters provided that the metacharacters are preceded with a backslash.[3] The extended set of metacharacters is:
[3] In any version of grep, a metacharacter can be quoted with a backslash to turn off its special meaning.
?, +, { }, |, ()
The extended set of metacharacters has no special meaning to regular grep unless the characters are backslashed, as follows:
\?, \+, \{, \|, \(, \)
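The difference between the plain, backslashed, and -E forms can be sketched with three made-up words. This assumes a Gnu-style grep, because treating \? as a metacharacter in a basic regular expression is a Gnu extension; the variable names are illustrative only.

```shell
# Sample input: three made-up words, one per line.
words='love
lve
loove'

# Basic RE: an unescaped ? is an ordinary character, so nothing matches here.
plain=$(printf '%s\n' "$words" | grep 'lo?ve' || true)

# Gnu extension assumed: \? in a basic RE means "zero or one of the preceding".
escaped=$(printf '%s\n' "$words" | grep 'lo\?ve')

# Extended RE: ? is a metacharacter without the backslash.
extended=$(printf '%s\n' "$words" | grep -E 'lo?ve')

printf 'plain=[%s]\nescaped=[%s]\nextended=[%s]\n' "$plain" "$escaped" "$extended"
```

The backslashed basic form and the -E form select the same two lines, love and lve, while the unescaped basic form selects nothing.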
The format for using the Gnu grep is shown in Table 3.1.
Format | What It Understands |
---|---|
grep 'pattern' filename(s) | Basic RE metacharacters (the default) |
grep -G 'pattern' filename(s) | Same as above; the default |
grep -E 'pattern' filename(s) | Extended RE metacharacters |
grep -F 'pattern' filename | No RE metacharacters |
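To make the last row of the table concrete, here is a minimal sketch comparing the default behavior with -F on two made-up lines. The sample text and variable names are assumptions for illustration.

```shell
# Made-up input; the dot and the star are literal characters in the data.
prices='a.b costs 5*
axb costs 50'

# Default (basic RE): . matches any single character, so both lines match.
as_regex=$(printf '%s\n' "$prices" | grep 'a.b')

# -F: the pattern is a fixed string, so only the literal text a.b matches.
as_fixed=$(printf '%s\n' "$prices" | grep -F 'a.b')

printf 'regex:\n%s\nfixed:\n%s\n' "$as_regex" "$as_fixed"
```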
Metacharacter | Function | Example | What It Matches |
---|---|---|---|
^ | Beginning of line anchor | ^love | Matches all lines beginning with love. |
$ | End of line anchor | love$ | Matches all lines ending with love. |
. | Matches one character | l..e | Matches lines containing an l, followed by two characters, followed by an e. |
* | Matches zero or more of the preceding character | ' *love' | Matches lines with zero or more spaces followed by the pattern love. |
[ ] | Matches one character in the set | [Ll]ove | Matches lines containing love or Love. |
[^ ] | Matches one character not in the set | [^A-K]ove | Matches lines containing a character not in the range A through K, followed by ove. |
\<[a] | Beginning of word anchor | \<love | Matches lines containing a word that begins with love. |
\> | End of word anchor | love\> | Matches lines containing a word that ends with love. |
\(..\)[b] | Tags matched characters | \(love\)able | Tags marked portion in a register to be remembered later as number 1. To reference later, use \1 to repeat the pattern. May use up to nine tags, starting with the first tag at the leftmost part of the pattern. For example, the pattern love is saved in register 1 to be referenced later as \1. |
x\{m\} x\{m,\} x\{m,n\}[c] | Repetition of character x: m times, at least m times, or between m and n times | o\{5\} o\{5,\} o\{5,10\} | Matches if line has 5 o's, at least 5 o's, or between 5 and 10 o's. |
\w | Alphanumeric word character; [a-zA-Z0-9_] | l\w*e | Matches an l followed by zero or more word characters, and an e. |
\W | Nonalphanumeric word character; [^a-zA-Z0-9_] | love\W+ | Matches love followed by one or more non-word characters, such as a period, question mark, etc. |
\b | Word boundary | \blove\b | Matches only the word love. |
[a] These metacharacters do not work unless backslashed, even with grep -E and Gnu egrep; they don't work with UNIX egrep at all.
[b] These metacharacters are really part of the extended set, but are placed here because they work with UNIX grep and Gnu regular grep, if backslashed. They do not work with UNIX egrep at all.
[c] The \{ \} metacharacters are not supported on all versions of UNIX or all pattern-matching utilities; they usually work with vi and grep. They don't work with UNIX egrep at all.
Metacharacter | Function | Example | What It Matches |
---|---|---|---|
+ | Matches one or more of the preceding characters | [a-z]+ove | Matches one or more lowercase letters, followed by ove. Would find move, approve, love, behoove, etc. |
? | Matches zero or one of the preceding characters | lo?ve | Matches an l followed by either one o or no o at all, then ve. Would find love or lve. |
a|b|c | Matches either a or b or c | love|hate | Matches for either expression, love or hate. |
() | Groups characters | love(able|rs) (ov)+ | Matches for loveable or lovers. Matches for one or more occurrences of ov. |
(..) (...) \1 \2[a] | Tags matched characters | (love)ing | Tags marked portion in a register to be remembered later as number 1. To reference later, use \1 to repeat the pattern. May use up to nine tags, starting with the first tag at the leftmost part of the pattern. For example, the pattern love is saved in register 1 to be referenced later as \1. |
x{m} x{m,} x{m,n}[b] | Repetition of character x: m times, at least m times, or between m and n times | o{5} o{5,} o{5,10} | Matches if line has 5 o's, at least 5 o's, or between 5 and 10 o's. |
[a] Tags and back references do not work with UNIX egrep.
[b] The { } metacharacters are not supported on all versions of UNIX or all pattern-matching utilities; they usually work with vi and grep. They do not work with UNIX egrep at all.
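A few of the extended metacharacters from the table can be tried with grep -E on made-up words, as in the sketch below. The word lists and variable names are assumptions chosen for illustration.

```shell
words='loveable
lovers
lovely
hate'

# Grouping with alternation: love followed by either able or rs.
grouped=$(printf '%s\n' "$words" | grep -E 'love(able|rs)')

# Top-level alternation: lines containing lovely or hate.
either=$(printf '%s\n' "$words" | grep -E 'lovely|hate')

# One or more of the preceding character: lo+ve matches love and loove, not lve.
plus=$(printf '%s\n' 'love' 'loove' 'lve' | grep -E 'lo+ve')

printf '%s\n' "$grouped" "$either" "$plus"
```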
POSIX (the Portable Operating System Interface) is an industry standard to ensure that programs are portable across operating systems. In order to be portable, POSIX recognizes that different countries or locales may differ in the way they encode characters, represent currency, and represent times and dates. To handle different types of characters, POSIX added the bracketed character classes shown in Table 3.4 to the basic and extended regular expressions.
The class [:alnum:], for example, is another way of saying A-Za-z0-9. To use this class, it must be enclosed in another set of brackets for it to be recognized as a regular expression. For example, A-Za-z0-9, by itself, is not a regular expression, but [A-Za-z0-9] is. Likewise, [:alnum:] should be written [[:alnum:]]. The difference between the first form, [A-Za-z0-9], and the bracketed form, [[:alnum:]], is that the first form depends on ASCII character encoding, whereas the second form allows characters from other languages to be represented in the class, such as Swedish rings and German umlauts.
Bracket Class | Meaning |
---|---|
[:alnum:] | alphanumeric characters |
[:alpha:] | alphabetic characters |
[:cntrl:] | control characters |
[:digit:] | numeric characters |
[:graph:] | nonblank characters (not spaces, control characters, etc.) |
[:lower:] | lowercase letters |
[:print:] | like [:graph:], but includes the space character |
[:punct:] | punctuation characters |
[:space:] | all white-space characters (newlines, spaces, tabs) |
[:upper:] | uppercase letters |
[:xdigit:] | allows digits in a hexadecimal number (0-9a-fA-F) |
1 % grep '[[:space:]]\.[[:digit:]][[:space:]]' datafile
southwest SW Lewis Dalsass 2.7 .8 2 18
southeast SE Patricia Hemenway 4.0 .7 4 15
2 % grep -E '[[:space:]]\.[[:digit:]][[:space:]]' datafile
southwest SW Lewis Dalsass 2.7 .8 2 18
southeast SE Patricia Hemenway 4.0 .7 4 15
3 % egrep '[[:space:]]\.[[:digit:]][[:space:]]' datafile
southwest SW Lewis Dalsass 2.7 .8 2 18
southeast SE Patricia Hemenway 4.0 .7 4 15
Explanation
1,2,3 For all Linux variants of grep (other than fgrep), the POSIX bracketed class set is supported. In each of these examples, grep will search for a space character, a literal period, a digit [0-9] and another space character.
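The double-bracket rule can be checked with a short sketch on made-up input. LC_ALL=C is set here as an assumption so that the explicit range and the class select the same characters; in other locales the class may match more.

```shell
sample='abc123
---
X9'

# LC_ALL=C pins the locale so the explicit range and the class agree.
by_range=$(printf '%s\n' "$sample" | LC_ALL=C grep '[A-Za-z0-9]')
by_class=$(printf '%s\n' "$sample" | LC_ALL=C grep '[[:alnum:]]')

# A class can share the outer brackets with other members, here a hyphen.
dash_or_digit=$(printf '%s\n' "$sample" | LC_ALL=C grep '^[[:digit:]-]')

printf '%s\n' "$by_class" "$dash_or_digit"
```

The inner [:digit:] names the class and the outer brackets form the bracket expression, which is why the class never appears with a single pair of brackets.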
The grep command is very useful in shell scripts, because it always returns an exit status to indicate whether it was able to locate the pattern or the file you were looking for. If the pattern is found, grep returns an exit status of 0, indicating success; if grep cannot find the pattern, it returns 1 as its exit status; and if the file cannot be found, grep returns an exit status of 2. (Other Linux/UNIX utilities that search for patterns, such as sed and awk, do not use the exit status to indicate the success or failure of locating a pattern; they report failure only if there is a syntax error in a command.)
In the following example, john is not found in the /etc/passwd file.
1 % grep 'john' /etc/passwd
2 % echo $status     (tcsh/csh)
1
or
$ echo $?     (bash/sh/ksh/tcsh)
1
Explanation
Grep searches for john in the /etc/passwd file, and if successful, grep exits with a status of 0. If john is not found in the file, grep exits with 1. If the file is not found, an exit status of 2 is returned.
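In a script, the three exit statuses are usually tested rather than printed. The following is a minimal sketch of that idea; the temporary file stands in for /etc/passwd, and report_user is a hypothetical helper name, not a standard command.

```shell
# Hypothetical stand-in for /etc/passwd so the sketch does not depend on real accounts.
pwfile=$(mktemp)
printf '%s\n' 'root:x:0:0:root:/root:/bin/sh' 'tom:x:1000:1000:Tom:/home/tom:/bin/sh' > "$pwfile"

# report_user PATTERN FILE: translate grep's exit status into a word.
report_user() {
    status=0
    grep "$1" "$2" > /dev/null 2>&1 || status=$?
    case $status in
        0) echo "found" ;;      # pattern matched at least one line
        1) echo "not found" ;;  # file was read, but no line matched
        *) echo "error" ;;      # 2 or more: for example, the file is missing
    esac
}

found=$(report_user tom "$pwfile")
missing=$(report_user john "$pwfile")
broken=$(report_user tom "$pwfile.does-not-exist")
rm -f "$pwfile"

printf '%s\n' "$found" "$missing" "$broken"
```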
% cat datafile
northwest NW Charles Main 3.0 .98 3 34
western WE Sharon Gray 5.3 .97 5 23
southwest SW Lewis Dalsass 2.7 .8 2 18
southern SO Suan Chin 5.1 .95 4 15
southeast SE Patricia Hemenway 4.0 .7 4 17
eastern EA TB Savage 4.4 .84 5 20
northeast NE AM Main Jr. 5.1 .94 3 13
north NO Margot Weber 4.5 .89 5 9
central CT Ann Stephens 5.7 .94 5 13
grep NW datafile
or
grep -G NW datafile
northwest NW Charles Main 3.0 .98 3 34
Explanation
Prints all lines containing the regular expression NW in a file called datafile.
grep NW d*
datafile:northwest NW Charles Main 3.0 .98 3 34
db:northwest NW Joel Craig 30 40 5 123
Explanation
Prints all lines containing the regular expression NW in all files starting with a d. The shell expands d* to all files that begin with a d; in this case the filenames are db and datafile.
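The filename prefix in the output can be reproduced in a scratch directory, as sketched below. The directory, the shortened file contents, and the tab-separated layout are assumptions made for the illustration.

```shell
workdir=$(mktemp -d)
printf 'northwest\tNW\tCharles Main\t3.0\t.98\t3\t34\n' > "$workdir/datafile"
printf 'western\tWE\tSharon Gray\t5.3\t.97\t5\t23\n' >> "$workdir/datafile"
printf 'northwest\tNW\tJoel Craig\t30\t40\t5\t123\n' > "$workdir/db"

# The subshell keeps the cd local; the shell expands d* before grep runs.
multi=$(cd "$workdir" && grep NW d*)

# With a single file argument, no filename prefix is printed.
single=$(cd "$workdir" && grep NW datafile)
rm -rf "$workdir"

printf '%s\n' "$multi" "$single"
```

When more than one file is searched, each matching line is prefixed with the name of the file it came from.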
grep '^n' datafile
northwest NW Charles Main 3.0 .98 3 34
northeast NE AM Main Jr. 5.1 .94 3 13
north NO Margot Weber 4.5 .89 5 9
Explanation
Prints all lines beginning with an n. The caret (^) is the beginning of line anchor.
grep '4$' datafile
northwest NW Charles Main 3.0 .98 3 34
Explanation
Prints all lines ending with a 4. The dollar sign ($) is the end of line anchor.
grep TB Savage datafile
grep: Savage: No such file or directory
datafile:eastern EA TB Savage 4.4 .84 5 20
Explanation
Because the first argument is the pattern and all of the remaining arguments are filenames, grep will search for TB in a file called Savage and a file called datafile. To search for TB Savage, see the next example.
grep 'TB Savage' datafile
eastern EA TB Savage 4.4 .84 5 20
Explanation
Prints all lines containing the pattern TB Savage. Without quotes (in this example, either single or double quotes will do), the white space between TB and Savage would cause grep to search for TB in a file called Savage and a file called datafile, as in the previous example.
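The effect of the quotes can also be observed through the exit status. The sketch below assumes a scratch directory that holds a one-line datafile and no file named Savage; the names and the tab-separated layout are illustrative.

```shell
workdir=$(mktemp -d)
printf 'eastern\tEA\tTB Savage\t4.4\t.84\t5\t20\n' > "$workdir/datafile"

# Quoted: grep receives the single pattern "TB Savage".
quoted=$(cd "$workdir" && grep 'TB Savage' datafile)

# Unquoted: grep receives the pattern TB plus two filenames, Savage and datafile.
# Savage does not exist here, so grep reports an error (status 2) even though
# the line in datafile still matches and is printed with a filename prefix.
unquoted_status=0
unquoted=$(cd "$workdir" && grep TB Savage datafile 2>/dev/null) || unquoted_status=$?
rm -rf "$workdir"

printf '%s\nstatus=%s\n' "$unquoted" "$unquoted_status"
```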
% cat datafile
northwest NW Charles Main 3.0 .98 3 34
western WE Sharon Gray 5.3 .97 5 23
southwest SW Lewis Dalsass 2.7 .8 2 18
southern SO Suan Chin 5.1 .95 4 15
southeast SE Patricia Hemenway 4.0 .7 4 17
eastern EA TB Savage 4.4 .84 5 20
northeast NE AM Main Jr. 5.1 .94 3 13
north NO Margot Weber 4.5 .89 5 9
central CT Ann Stephens 5.7 .94 5 13
grep '5\..' datafile
western WE Sharon Gray 5.3 .97 5 23
southern SO Suan Chin 5.1 .95 4 15
northeast NE AM Main Jr. 5.1 .94 3 13
central CT Ann Stephens 5.7 .94 5 13
Explanation
Prints a line containing the number 5, followed by a literal period and any single character. The "dot" metacharacter represents a single character, unless it is escaped with a backslash. When escaped, the character is no longer a special metacharacter, but represents itself, a literal period.
grep '\.5' datafile
north NO Margot Weber 4.5 .89 5 9
Explanation
Prints any line containing the expression .5.
grep '^[we]' datafile
western WE Sharon Gray 5.3 .97 5 23
eastern EA TB Savage 4.4 .84 5 20
Explanation
Prints lines beginning with either a w or an e. The caret (^) is the beginning of line anchor, and either one of the characters in the brackets will be matched.
grep '[^0-9]' datafile
northwest NW Charles Main 3.0 .98 3 34
western WE Sharon Gray 5.3 .97 5 23
southwest SW Lewis Dalsass 2.7 .8 2 18
southern SO Suan Chin 5.1 .95 4 15
southeast SE Patricia Hemenway 4.0 .7 4 17
eastern EA TB Savage 4.4 .84 5 20
northeast NE AM Main Jr. 5.1 .94 3 13
north NO Margot Weber 4.5 .89 5 9
central CT Ann Stephens 5.7 .94 5 13
Explanation
Prints all lines containing one nondigit. The caret inside brackets means match one character not in the range. Because all lines have at least one nondigit, all lines are printed. (See the -v option.)
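The difference between a negated bracket set and the -v option mentioned above can be sketched on three made-up lines. The input and variable names are assumptions for illustration.

```shell
lines='abc
123
a1b2'

# [^0-9] asks for at least one nondigit somewhere on the line.
has_nondigit=$(printf '%s\n' "$lines" | grep '[^0-9]')

# -v inverts the match: print the lines that contain no digit at all.
no_digit=$(printf '%s\n' "$lines" | grep -v '[0-9]')

printf 'has a nondigit:\n%s\nhas no digit:\n%s\n' "$has_nondigit" "$no_digit"
```

The line a1b2 shows the distinction: it contains a nondigit, so the bracket set selects it, but it also contains digits, so -v rejects it.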
grep '[A-Z][A-Z] [A-Z]' datafile
eastern EA TB Savage 4.4 .84 5 20
northeast NE AM Main Jr. 5.1 .94 3 13
Explanation
Prints all lines containing two capital letters followed by a space and a capital letter, e.g., TB Savage and AM Main.
grep 'ss* ' datafile
northwest NW Charles Main 3.0 .98 3 34
southwest SW Lewis Dalsass 2.7 .8 2 18
Explanation
Prints all lines containing an s followed by zero or more consecutive s's and a space. Finds Charles and Dalsass.
% cat datafile
northwest NW Charles Main 3.0 .98 3 34
western WE Sharon Gray 5.3 .97 5 23
southwest SW Lewis Dalsass 2.7 .8 2 18
southern SO Suan Chin 5.1 .95 4 15
southeast SE Patricia Hemenway 4.0 .7 4 17
eastern EA TB Savage 4.4 .84 5 20
northeast NE AM Main Jr. 5.1 .94 3 13
north NO Margot Weber 4.5 .89 5 9
central CT Ann Stephens 5.7 .94 5 13
grep '[a-z]\{9\}' datafile
northwest NW Charles Main 3.0 .98 3 34
southwest SW Lewis Dalsass 2.7 .8 2 18
southeast SE Patricia Hemenway 4.0 .7 4 17
northeast NE AM Main Jr. 5.1 .94 3 13
Explanation
Prints all lines containing at least nine consecutive lowercase letters, for example, northwest, southwest, southeast, and northeast.
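The interval form can be sketched on three of the region names. The list is a shortened, made-up input, and the -E line shows the same interval written with the extended syntax.

```shell
regions='northwest
western
north'

# Basic RE: the braces are backslashed.
basic=$(printf '%s\n' "$regions" | grep '[a-z]\{9\}')

# Extended RE: the same interval without backslashes.
extended=$(printf '%s\n' "$regions" | grep -E '[a-z]{9}')

# Anchoring both ends turns "at least five in a row" into "exactly five".
exactly_five=$(printf '%s\n' "$regions" | grep '^[a-z]\{5\}$')

printf '%s\n' "$basic" "$extended" "$exactly_five"
```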
grep '\(3\)\.[0-9].*\1 *\1' datafile
northwest NW Charles Main 3.0 .98 3 34
Explanation
Prints the line if it contains a 3 followed by a period and another number, followed by any number of characters (.*), another 3 (originally tagged), any number of tabs, and another 3. Because the 3 was enclosed in parentheses, \(3\), it can be later referenced with \1. \1 means that this was the first expression to be tagged with the \( \) pair.
grep '\<north' datafile
northwest NW Charles Main 3.0 .98 3 34
northeast NE AM Main Jr. 5.1 .94 3 13
north NO Margot Weber 4.5 .89 5 9
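Tagging and back references can be tried on a few made-up rows. The sketch below uses spaces where the datafile may use tabs, which is an assumption, and adds a second pattern that finds a doubled lowercase letter.

```shell
rows='3.0 .98 3 34
5.3 .97 5 23
3.1 .50 4 13'

# \(3\) tags the 3; each \1 must then match another 3.
tagged=$(printf '%s\n' "$rows" | grep '\(3\)\.[0-9].*\1 *\1')

# Tag any lowercase letter and require the same letter immediately after it.
doubled=$(printf '%s\n' 'Dalsass' 'Lewis' | grep '\([a-z]\)\1')

printf '%s\n' "$tagged" "$doubled"
```

Only the first row has 3.0 followed later by a 3, optional spaces, and another 3, and only Dalsass contains the same lowercase letter twice in a row.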
Explanation
Prints all lines containing a word starting with north. The \< is the beginning of word anchor.
grep '\<north\>' datafile
north NO Margot Weber 4.5 .89 5 9
Explanation
Prints the line if it contains the word north. The \< is the beginning of word anchor, and the \> is the end of word anchor.
grep '\bnorth\b' datafile
north NO Margot Weber 4.5 .89 5 9
Explanation
Prints the line if it contains the word north. The \b is a word boundary. It can be used instead of the word anchors (see previous example) on all Gnu variants of grep.
grep '^n\w*\W' datafile
northwest NW Charles Main 3.0 .98 3 34
northeast NE AM Main Jr. 5.1 .94 3 13
north NO Margot Weber 4.5 .89 5 9
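The word anchors and the word boundary can be compared on three made-up words. This sketch assumes a Gnu-style grep, because \<, \>, and \b are not part of the basic POSIX set.

```shell
words='north
northwest
unorthodox'

# \< anchors at the start of a word: unorthodox is excluded.
starts=$(printf '%s\n' "$words" | grep '\<north')

# \< and \> together require north to stand alone as a word.
whole=$(printf '%s\n' "$words" | grep '\<north\>')

# \b on both sides selects the same lines as the pair of anchors.
boundary=$(printf '%s\n' "$words" | grep '\bnorth\b')

printf '%s\n' "$starts" "$whole" "$boundary"
```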
Explanation
Prints any line starting with an "n," followed by zero or more alphanumeric word characters, followed by a nonalphanumeric word character. \w and \W are standard word metacharacters for Gnu variants of grep.
grep '\<[a-z].*n\>' datafile
northwest NW Charles Main 3.0 .98 3 34
western WE Sharon Gray 5.3 .97 5 23
southern SO Suan Chin 5.1 .95 4 15
eastern EA TB Savage 4.4 .84 5 20
northeast NE AM Main Jr. 5.1 .94 3 13
central CT Ann Stephens 5.7 .94 5 13
Explanation
Prints all lines containing a word starting with a lowercase letter, followed by any number of characters, and a word ending in n. Watch the .* symbol. It means any character, including white space.
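How far .* reaches can be made visible by printing only the matched text. This sketch assumes the -o option and the word anchors of a Gnu-style grep, and uses one line of the datafile with spaces in place of any tabs.

```shell
line='western WE Sharon Gray 5.3'

# -o (assumed available, as in Gnu grep) prints only the matched text.
# .* is greedy and crosses white space, so the match runs through Sharon.
matched=$(printf '%s\n' "$line" | grep -o '\<[a-z].*n\>')

# Excluding white space from the middle keeps the match inside one word.
one_word=$(printf '%s\n' "$line" | grep -o '\<[a-z][^[:space:]]*n\>')

printf '[%s]\n[%s]\n' "$matched" "$one_word"
```

The first pattern matches western WE Sharon as a single stretch of text; the second matches only western.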