Filtering with grep

When you’re dealing with program output, you’ll often want to filter the results. The grep command lets you search text for characters or phrases. You can use grep to search through program output or a file. Let’s explore grep by working with some files.

Create a file named words.txt that contains several words, each on its own line:

 $ cat << 'EOF' > words.txt
 > blue
 > apple
 > candy
 > hand
 > fork
 > EOF

Now use grep to search the file for the string and:

 $ grep 'and' words.txt
 candy
 hand

This displays the two lines of the file that contain the string you specified. You get both results because each has the string and somewhere on the line. This is the simplest form of searching. Surrounding the search term in quotes isn’t always necessary, but it’s a good habit to get into: without quotes, the shell may interpret spaces and special characters in the pattern before grep ever sees them.
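To see why quoting matters, here is a minimal sketch that uses a pattern containing a space. It works in a scratch directory, and the sample file is a shortened, made-up copy of the kind of word list used in this section.

```shell
# Work in a throwaway directory so nothing in your real files is touched.
cd "$(mktemp -d)"
printf 'blue car\nhand in hand\nfork in the road\n' > words2.txt

# Quoted: the shell passes 'hand in' to grep as a single search pattern.
quoted=$(grep 'hand in' words2.txt)
echo "$quoted"

# Unquoted: grep receives "hand" as the pattern and treats "in" and
# "words2.txt" as two files to search. The file "in" doesn't exist, so
# grep reports an error on stderr (discarded here) and exits non-zero.
unquoted=$(grep hand in words2.txt 2>/dev/null)
status=$?
echo "$unquoted"
echo "exit status: $status"
```

The unquoted version still finds the line, but only by accident: it searched for a different pattern and failed on a file you never meant to name.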

You can also tell grep to remove lines containing that text. The -v option instructs grep to only show lines that don’t contain the search pattern you specified.

 $ grep 'and' -v words.txt
 blue
 apple
 fork

grep reads the file in and processes its contents, but you’re not limited to using grep on just files. You can use it to process output from other programs, which means you can use it to filter the streams of text other programs display.

Try it out by using grep to show you all the ls commands in your history:

 $ history | grep 'ls'
 ...
  471 ls
  479 ls
  484 ls
  500 history | grep 'ls'

When you ran the command on your machine, you probably saw a lot of results, and the last result was the history | grep command. You can filter that last command out by piping the output to grep again:

 $ history | grep 'ls' | grep -v 'grep'
 ...
  471 ls
  479 ls
  484 ls
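The history command only has entries in an interactive shell, so if you want to try the same chain in a script, you need a stand-in. The following sketch uses a hypothetical fake_history function with made-up entries; everything after the first pipe is the same as in the command above.

```shell
# Stand-in for `history`. The numbers and commands are invented.
fake_history() {
  cat << 'EOF'
  470 cd projects
  471 ls
  472 tools --help
  479 ls -alh
  500 history | grep 'ls'
EOF
}

# The first grep keeps lines containing "ls".
# The second grep drops any of those lines that contain "grep".
filtered=$(fake_history | grep 'ls' | grep -v 'grep')
echo "$filtered"
```

Notice that the tools --help entry survives the filter. grep matches the string anywhere on the line, and "tools" contains "ls".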

If there are too many commands for you to see, you can always pipe the output to less:

 $ history | grep 'ls' | grep -v 'grep' | less

grep supports searching multiple files as well. Create another file with some more words:

 $ cat << 'EOF' > words2.txt
 > blue car
 > apple pie
 > candy bar
 > hand in hand
 > fork in the road
 > EOF

Then, use grep to search both files for the word blue:

 $ grep 'blue' words.txt words2.txt
 words.txt:blue
 words2.txt:blue car

This time, grep shows each matching line, prefixed with the name of the file that contains it.

The grep command only shows the exact line containing the match, but you can tell it to give you a little more context. Using the -B and -A switches, you can specify the number of lines to show before and after the match:

 $ grep 'candy' -A 2 -B 2 words*
 words2.txt-blue car
 words2.txt-apple pie
 words2.txt:candy bar
 words2.txt-hand in hand
 words2.txt-fork in the road
 --
 words.txt-blue
 words.txt-apple
 words.txt:candy
 words.txt-hand
 words.txt-fork

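The output separates the matches clearly. A colon follows the filename on each matching line, a hyphen follows it on each context line, and a line containing -- divides one group of results from the next.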

In this example, you selected the same number of lines before and after the matched line. In cases like this, you can shorten the command by using the -C switch instead of specifying both -A and -B:

 $ grep 'candy' -C 2 words*

The resulting output is the same as before. The -C switch shows the “context” around the results.
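If you want to confirm that the two forms are equivalent, you can capture both outputs and compare them. This is a small sketch that recreates words.txt in a scratch directory; it also shows that the before and after counts don't have to be equal.

```shell
cd "$(mktemp -d)"
printf 'blue\napple\ncandy\nhand\nfork\n' > words.txt

# -B counts lines before the match; -A counts lines after it.
both=$(grep -A 2 -B 2 'candy' words.txt)
context=$(grep -C 2 'candy' words.txt)

# The two forms should produce identical output.
[ "$both" = "$context" ] && echo "same output"

# Asymmetric context: one line before the match, none after.
before_only=$(grep -B 1 'candy' words.txt)
echo "$before_only"
```

Because only one file is searched here, grep leaves the filename prefix off each line.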

Adding the -n flag will show you the line number where the match was found:

 $ grep 'candy' -C 2 -n words*
 words2.txt-1-blue car
 words2.txt-2-apple pie
 words2.txt:3:candy bar
 words2.txt-4-hand in hand
 words2.txt-5-fork in the road
 --
 words.txt-1-blue
 words.txt-2-apple
 words.txt:3:candy
 words.txt-4-hand
 words.txt-5-fork

This is helpful when working with source code. You can use grep to look at your entire codebase and find phrases or keywords quickly, as grep can read directories recursively.

To demonstrate this, use grep to scan the contents of the /var/log folder for instances of your username:

 $ sudo grep 'brian' -r /var/log
 ...
 /var/log/auth.log:Mar 3 15:40:29 puzzles sudo: brian : TTY=pts/8 ;
 PWD=/home/brian ; USER=root ; COMMAND=/bin/grep brian -r /var/log/
 /var/log/auth.log:Mar 3 15:40:29 puzzles sudo: pam_unix(sudo:session):
 session opened for user root by brian(uid=0)
 Binary file /var/log/btmp matches
 Binary file /var/log/wtmp matches
 Binary file /var/log/auth.log.1 matches

You’ll see a stream of data returned, displaying events from your system logs.
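The same recursive search works on a source tree, and it doesn't need sudo when you own the files. Here is a minimal sketch that builds a tiny, made-up project in a scratch directory and looks for TODO comments; the directory layout and file contents are invented for illustration.

```shell
cd "$(mktemp -d)"
mkdir -p project/src/lib
printf 'int main() {\n  // TODO: parse arguments\n  return 0;\n}\n' > project/src/main.c
printf '// helper functions\n// TODO: add tests\n' > project/src/lib/util.c

# -r descends into every subdirectory of project/; -n adds line numbers.
# sort makes the order predictable, since traversal order can vary.
results=$(grep -rn 'TODO' project | sort)
echo "$results"
```

Each result shows the file path, the line number, and the matching line, which is usually enough to jump straight to the right spot in an editor.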

All of the searches you've performed so far are simple text searches, but you can use regular expressions, or regexes, as well. A regex is a sequence of characters that defines a pattern for finding text.

This book doesn’t go into a ton of detail on regular expressions. However, you’ll use regular expressions a few more times throughout this book, so I’ll explain what’s going on with each one.

If you’d like more information on regular expressions, lots of online resources will help get you started, including Regex101,[10] an interactive online tool for building and debugging regular expressions.

For now, let’s try out a regular expression with grep. If you search both files for the letter b, you get all of the lines containing that letter:

 $ grep 'b' words*
 words.txt:blue
 words2.txt:blue car
 words2.txt:candy bar

But if you use the regular expression ^b, which means “look for the lowercase letter b at the beginning of the line,” you see only two results: blue and blue car:

 $ grep '^b' words*
 words.txt:blue
 words2.txt:blue car

Similarly, if you use the expression e$, which means “look for any line ending with the letter e,” you see these three results:

 $ grep 'e$' words*
 words.txt:blue
 words.txt:apple
 words2.txt:apple pie
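You can combine the two anchors to match a whole line rather than part of one. The following sketch recreates both word files in a scratch directory and compares an unanchored search with an anchored one.

```shell
cd "$(mktemp -d)"
printf 'blue\napple\ncandy\nhand\nfork\n' > words.txt
printf 'blue car\napple pie\ncandy bar\nhand in hand\nfork in the road\n' > words2.txt

# Without anchors, "hand" can appear anywhere on the line.
anywhere=$(grep 'hand' words.txt words2.txt)
echo "$anywhere"

# With both anchors, the entire line must be exactly "hand".
exact=$(grep '^hand$' words.txt words2.txt)
echo "$exact"
```

The anchored search skips "hand in hand" because that line has more text between its start and its end than the pattern allows.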

Likewise, use the regular expression blue|apple to search for lines that contain “blue” or “apple”. To use this regular expression with grep, use the -E switch:

 $ grep -E 'blue|apple' words*
 words.txt:blue
 words.txt:apple
 words2.txt:blue car
 words2.txt:apple pie

The -E switch lets you use extended regular expressions, which means that the characters |, ?, +, {, (, and ) are treated as special characters in the expression. These characters let you create more advanced search patterns. For example, the expression a(n|r) will look for any lines containing either “an” or “ar”:

 $ grep -E 'a(n|r)' words*
 words2.txt:blue car
 words2.txt:candy bar
 words2.txt:hand in hand
 words.txt:candy
 words.txt:hand
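The other extended characters control repetition. As a rough sketch, the following uses a made-up file named variants.txt to show the difference between ?, +, and a {n} count; the file and its contents are invented for this illustration.

```shell
cd "$(mktemp -d)"
printf 'color\ncolour\ncolouur\n' > variants.txt

# u? : the u is optional (zero or one).
optional=$(grep -E 'colou?r' variants.txt)
echo "$optional"

# u+ : at least one u is required.
one_or_more=$(grep -E 'colou+r' variants.txt)
echo "$one_or_more"

# u{2} : exactly two u characters in a row.
exactly_two=$(grep -E 'colou{2}r' variants.txt)
echo "$exactly_two"
```

Each pattern selects a different subset of the three lines, which makes it easy to check that a repetition operator does what you expect before you use it on real data.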

grep is a general-purpose text search tool. Other options, like ack[11] or ripgrep,[12] have additional features aimed at working with source code, but you should be comfortable using grep since it’s universally available.

Next, you’ll look at how to remove characters from output.
