Filtering an output using grep

grep is one of the most powerful and widely used shell commands. It searches its input and matches the lines in which the given pattern is found. By default, all matching lines are printed to stdout, which is usually the terminal. We can also redirect the matched output to other streams, such as a file. Instead of reading from a file, grep can also take its input from the output of the command on the left-hand side of a pipe ('|').
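As a minimal sketch of both ideas (input from a pipe, output redirected to a file), the snippet below uses made-up sample text and a temporary file created with mktemp; neither comes from the chapter's examples.

```shell
#!/bin/bash
# Sketch: grep reading from a pipe and writing its matches to a file.
# The sample text and the temporary file are hypothetical.

out_file=$(mktemp)

# grep reads the output of printf from standard input (left side of '|').
# The matching lines go into $out_file instead of the terminal.
printf 'apple pie\nbanana split\napple tart\n' | grep 'apple' > "$out_file"

matches=$(cat "$out_file")
echo "$matches"

rm -f "$out_file"
```

Only the two lines containing apple end up in the file.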

Syntax

The syntax of using the grep command is as follows:

grep [OPTIONS] PATTERN [FILE...]

Here, more than one FILE can be given for a search. If no file is given, grep searches the standard input.

PATTERN can be any valid regular expression. Put PATTERN within single quotes (') or double quotes (") as needed: use single quotes to prevent any bash expansion and double quotes to allow expansion.
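The difference between the two quoting styles can be sketched as follows. The variable name word and the sample lines are hypothetical and only serve the illustration.

```shell
#!/bin/bash
# Sketch: how quoting changes the pattern that grep receives.
# The variable "word" and the sample lines are made up.

word="shell"
sample=$(printf 'bash is a shell\nthe literal text $word\n')

# Double quotes: bash expands $word, so grep searches for "shell".
expanded=$(printf '%s\n' "$sample" | grep "$word")

# Single quotes: no expansion happens. The backslash makes grep treat $
# as a literal dollar sign rather than as the end-of-line anchor.
literal=$(printf '%s\n' "$sample" | grep '\$word')

echo "Double quotes matched: $expanded"
echo "Single quotes matched: $literal"
```

With double quotes the first sample line matches; with single quotes the second one does.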

A lot of OPTIONS are available in grep. Some of the important and widely used options are discussed in the following table:

Option        Usage
-i            Ignores case in both the pattern and the input file(s)
-v            Displays the lines that do not match
-o            Displays only the matched part of each matching line
-f FILE       Obtains patterns from FILE, one per line
-e PATTERN    Specifies a search pattern; can be repeated to give multiple patterns
-E            Treats the pattern as an extended regular expression (egrep)
-r            Reads all the files in a directory recursively, following symbolic links only if they are given on the command line
-R            Reads all the files in a directory recursively, following all symbolic links
-a            Processes a binary file as a text file
-n            Prefixes each matching line with its line number
-q            Prints nothing on stdout
-s            Suppresses error messages about nonexistent or unreadable files
-c            Prints the count of matching lines for each input file
-A NUM        Prints NUM lines after each matching line (no effect with the -o option)
-B NUM        Prints NUM lines before each matching line (no effect with the -o option)
-C NUM        Prints NUM lines before and after each matching line (no effect with the -o option)
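To make a few of these options concrete, here is a short sketch that applies them to made-up sample text held in a variable. The log-like lines and variable names are assumptions for illustration only.

```shell
#!/bin/bash
# Sketch: some options from the table applied to hypothetical sample text.

sample=$(printf 'Error: disk full\nwarning: low memory\nerror: disk missing\nall good\n')

# -i ignores case, -c counts the matching lines
count_errors=$(printf '%s\n' "$sample" | grep -ic 'error')

# -v keeps the lines that do NOT match
no_disk=$(printf '%s\n' "$sample" | grep -v 'disk')

# -n prefixes each matching line with its line number
numbered=$(printf '%s\n' "$sample" | grep -n 'disk')

# -e can be repeated; a line matches if any one pattern matches
either=$(printf '%s\n' "$sample" | grep -e 'warning' -e 'good')

# -o prints only the matched part, -E enables extended regex syntax
only_match=$(printf '%s\n' "$sample" | grep -oE 'disk (full|missing)')

echo "Lines mentioning error (any case): $count_errors"
echo "$numbered"
echo "$only_match"
```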

Looking for a pattern in a file

A lot of times we have to search for a given string or a pattern in a file. The grep command provides us the capability to do it in a single line. Let's see the following example:

The input file for our example will be input1.txt:

$ cat input1.txt  # Input file for our example
This file is a text file to show demonstration
of grep command. grep is a very important and
powerful command in shell.
This file has been used in chapter 2

We will try to get the following information from the input1.txt file using the grep command:

  • Number of lines
  • Line starting with a capital letter
  • Line ending with a period (.)
  • Number of sentences
  • Searching sub-string sent
  • Lines that don't have a period
  • Number of times the string file is used

The following shell script demonstrates how to do the above mentioned tasks:

#!/bin/bash
#Filename: pattern_search.sh
#Description: Searching for a pattern using input1.txt file

echo "Number of lines = $(grep -c '.*' input1.txt)"
echo "Line starting with capital letter:"
# Quote the pattern so that the shell cannot expand it as a glob
grep -c '^[A-Z].*' input1.txt
echo
echo "Line ending with full stop (.):"
# \. matches a literal period; an unescaped . matches any character
grep '.*\.$' input1.txt
echo
echo -n "Number of sentence = "
# Counts the lines that contain at least one period
grep -c '\.' input1.txt
echo "Strings matching sub-string sent:"
grep -o "sent" input1.txt
echo
echo "Lines not having full stop are:"
grep -v '\.' input1.txt
echo
echo -n "Number of times string file used: = "
grep -o "file" input1.txt | wc -w

The output after running the pattern_search.sh shell script will be as follows:

Number of lines = 4
Line starting with capital letter:
2

Line ending with full stop (.):
powerful command in shell.

Number of sentence = 2
Strings matching sub-string sent:

Lines not having full stop are:
This file is a text file to show demonstration
This file has been used in chapter 2

Number of times string file used: = 3
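Note that -c counts matching lines, not individual matches, which is why the script pipes grep -o into wc for the last task. A small sketch of the difference, using hypothetical sample text:

```shell
#!/bin/bash
# Sketch: counting matching lines versus counting individual matches.
# The sample text is made up.

sample=$(printf 'This file is a text file\nno match here\nThis file too\n')

# -c reports how many lines contain the pattern at least once
line_count=$(printf '%s\n' "$sample" | grep -c 'file')

# -o prints every match on its own line, so wc -l counts the matches
match_count=$(printf '%s\n' "$sample" | grep -o 'file' | wc -l | tr -d ' ')

echo "Lines containing file: $line_count"
echo "Occurrences of file:   $match_count"
```

Here two lines contain file, but the word occurs three times in total.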

Looking for a pattern in multiple files

The grep command also allows us to search for a pattern in multiple files as an input. To explain this in detail, we will head directly to the following example:

The input files, in our case, will be input1.txt and input2.txt.

We will reuse the content of the input1.txt file from the previous example.

The content of input2.txt is as follows:

$ cat input2.txt
Another file for demonstrating grep CommaNd usage.
It allows us to do CASE Insensitive string test
as well.
We can also do recursive SEARCH in a directory
using -R and -r Options.
grep allows to give a regular expression to
search for a PATTERN.
Some special characters like . * ( ) { } $ ^ ?
are used to form regexp.
Range of digit can be given to regexp e.g. [3-6],
[7-9], [0-9]

We will try to get the following information from the input1.txt and input2.txt files using the grep command:

  • Search for the string command
  • Case-insensitive search of the string command
  • Print the line number where the string grep matches
  • Search for punctuation marks
  • Print each line that matches the string important, together with the one line that follows it

The following shell script demonstrates how to follow the preceding steps:

#!/bin/bash
# Filename: multiple_file_search.sh
# Description: Demonstrating search in multiple input files

echo "This program searches in files input1.txt and input2.txt"
# Escape the inner double quotes so that they are printed literally
echo "Search result for string \"command\":"
grep "command" input1.txt input2.txt
echo
echo "Case insensitive search of string \"command\":"
# input{1,2}.txt will be expanded by bash to input1.txt input2.txt
grep -i "command" input{1,2}.txt
echo
echo "Search for string \"grep\" and print matching line number too:"
grep -n "grep" input{1,2}.txt
echo
echo "Punctuation marks in files:"
# Quote the bracket expression so that the shell cannot expand it as a glob
grep -n '[[:punct:]]' input{1,2}.txt
echo
echo "Next line content whose previous line has string \"important\":"
grep -A 1 'important' input1.txt input2.txt

The following screenshot shows the output after running the multiple_file_search.sh shell script. The matched pattern string has been highlighted:

[Screenshot: Looking for a pattern in multiple files]
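When more than one file is searched, grep prefixes each output line with the name of the file it came from. The sketch below illustrates this with two small hypothetical files, a.txt and b.txt, created in a temporary directory; the -h option, which is not used in the script above, suppresses the prefix.

```shell
#!/bin/bash
# Sketch: output format when grep searches several files.
# a.txt and b.txt are made-up files in a temporary directory.

tmp_dir=$(mktemp -d)
printf 'grep is a command\nsecond line\n' > "$tmp_dir/a.txt"
printf 'another command here\n' > "$tmp_dir/b.txt"

# Matching lines are prefixed with "filename:"
with_names=$(cd "$tmp_dir" && grep 'command' a.txt b.txt)

# -h drops the filename prefix
without_names=$(cd "$tmp_dir" && grep -h 'command' a.txt b.txt)

# With -A 1, the context line is prefixed with "filename-" instead
after_context=$(cd "$tmp_dir" && grep -A 1 'grep' a.txt b.txt)

echo "$with_names"
echo "$without_names"
echo "$after_context"

rm -rf "$tmp_dir"
```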

A few more grep usages

The following subsections will cover a few more usages of the grep command.

Searching in a binary file

So far, we have seen all the grep examples running on text files. We can also search for a pattern in binary files using grep. For this, we have to tell the grep command to treat a binary file as a text file too. The option -a or --text tells grep to consider a binary file as a text file.

We know that the grep command itself is a binary file that executes and gives a search result.

One of the options in grep is --text, so the string --text should appear somewhere in the grep binary file. Let's search for it as follows:

$ grep --text '\-\-text' /usr/bin/grep
 -a, --text                equivalent to --binary-files=text

We saw that the string --text is found in the file /usr/bin/grep. The backslash character ('\') escapes the hyphens so that grep does not treat the pattern as an option.

Now, let's search for the -w string in the wc binary. We know that the wc command has an option -w that counts the number of words in an input text.

$ grep -a '\-w' /usr/bin/wc
  -w, --words            print the word counts
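Besides escaping with a backslash, a pattern that begins with a hyphen can be passed with -e, or placed after a -- argument that marks the end of the options. The following sketch uses made-up help text held in a variable rather than a real binary.

```shell
#!/bin/bash
# Sketch: two ways to search for a pattern that starts with a hyphen.
# The help-like sample text is hypothetical.

sample=$(printf '  -w, --words   print the word counts\n  -l, --lines   print the newline counts\n')

# -e marks the next argument as a pattern, even if it begins with '-'
with_e=$(printf '%s\n' "$sample" | grep -e '-w')

# '--' ends option parsing, so '--lines' is taken as the pattern
with_dashes=$(printf '%s\n' "$sample" | grep -- '--lines')

echo "$with_e"
echo "$with_dashes"
```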

Searching in a directory

We can also tell grep to search all the files and subdirectories in a directory recursively using the -R option. This avoids the hassle of specifying each file as an input to grep.

For example, we are interested in knowing how many times #include <stdio.h> is used in the standard include directory:

$ grep -R '#include <stdio.h>' /usr/include/ | wc -l
77

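This means that the #include <stdio.h> string is found on 77 lines under the /usr/include directory.

In another example, we want to know how many lines in the Python files (the .py extension) in /usr/lib64/python2.7/ contain import os. Because the shell expands *.py to the files directly inside that directory, subdirectories are not searched here. We can check that as follows: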

$ grep -R "import os" /usr/lib64/python2.7/*.py | wc -l
93
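To restrict a truly recursive search to certain file names, GNU grep offers the --include=GLOB option, which the example above does not use. The following is a sketch under that assumption, run against a small made-up directory tree rather than a real Python installation.

```shell
#!/bin/bash
# Sketch: recursive search limited to *.py files with --include.
# The pkg directory tree is hypothetical.

tmp_dir=$(mktemp -d)
mkdir -p "$tmp_dir/pkg/sub"
printf 'import os\nimport sys\n' > "$tmp_dir/pkg/a.py"
printf 'import os\n' > "$tmp_dir/pkg/sub/b.py"
printf 'import os\n' > "$tmp_dir/pkg/notes.txt"

# All files under pkg, including notes.txt
all_files=$(grep -r 'import os' "$tmp_dir/pkg" | wc -l | tr -d ' ')

# Only files whose name matches *.py, at any depth
py_only=$(grep -r --include='*.py' 'import os' "$tmp_dir/pkg" | wc -l | tr -d ' ')

echo "Matching lines in all files: $all_files"
echo "Matching lines in .py files: $py_only"

rm -rf "$tmp_dir"
```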

Excluding files/directories from a search

We can also tell grep to exclude a particular directory or file from a search. This is useful when we don't want grep to look into a file or directory that holds confidential information. It is also useful when we are sure that searching a certain directory will be of no use, so excluding it reduces the search time.

Suppose there is a source code directory called s0 that uses git for version control. We are interested in searching for a text or pattern in the source files, so searching in the .git subdirectory is of no use. We can exclude .git from the search as follows:

$  grep -R  --exclude-dir=.git "search_string" s0

Here, we are searching for the search_string string in the s0 directory and telling grep not to search in the .git directory.

To exclude files instead of a directory, use the --exclude=GLOB option. The related --exclude-from=FILE option reads the globs of the files to skip from FILE.
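The exclusion options can be tried out safely on a throwaway directory. The sketch below builds a hypothetical s0 tree in a temporary location (the file names config, main.c, and build.log are made up) and assumes GNU grep.

```shell
#!/bin/bash
# Sketch: excluding a directory and a file glob from a recursive search.
# The s0 tree and its files are hypothetical.

tmp_dir=$(mktemp -d)
mkdir -p "$tmp_dir/s0/.git" "$tmp_dir/s0/src"
printf 'search_string in git metadata\n' > "$tmp_dir/s0/.git/config"
printf 'search_string in source\n' > "$tmp_dir/s0/src/main.c"
printf 'search_string in log\n' > "$tmp_dir/s0/build.log"

# Skip the .git directory; sort because traversal order is not guaranteed
no_git=$(cd "$tmp_dir" && grep -R --exclude-dir=.git 'search_string' s0 | LC_ALL=C sort)

# Additionally skip every file whose name matches *.log
no_git_no_log=$(cd "$tmp_dir" && grep -R --exclude-dir=.git --exclude='*.log' 'search_string' s0)

echo "$no_git"
echo "$no_git_no_log"

rm -rf "$tmp_dir"
```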

Display a filename with a matching pattern

In some use cases, we don't care where the search matched or how many times it matched in a file. Instead, we are interested in knowing only the names of the files in which at least one match was found.

For example, we may want to save the names of the files that contain a particular search pattern, or pass them to some other command for further processing. We can achieve this using the -l option:

$ grep -Rl "import os" /usr/lib64/python2.7/*.py > search_result.txt
$ wc -l search_result.txt
79

This example collects the names of the files that contain import os and saves the result in the search_result.txt file.
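A file is listed only once by -l, no matter how many of its lines match. The sketch below shows this on two made-up Python files, and also uses the complementary -L option, which lists the files that contain no match at all.

```shell
#!/bin/bash
# Sketch: listing file names with -l (match) and -L (no match).
# uses_os.py and no_os.py are hypothetical files.

tmp_dir=$(mktemp -d)
printf 'import os\nimport os.path\n' > "$tmp_dir/uses_os.py"
printf 'import sys\n' > "$tmp_dir/no_os.py"

# uses_os.py has two matching lines but is printed once
with_match=$(cd "$tmp_dir" && grep -l 'import os' *.py)

# -L prints the files in which nothing matched
without_match=$(cd "$tmp_dir" && grep -L 'import os' *.py)

echo "Files with a match:    $with_match"
echo "Files without a match: $without_match"

rm -rf "$tmp_dir"
```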

Matching an exact word

An exact word can also be matched by placing the word boundary \b on both sides of the search pattern.

Here, we will reuse the input1.txt file and its content:

$ grep -i --color "\ba\b" input1.txt

The --color option allows colored printing of the matched search result.

The "\ba\b" pattern tells grep to look only for the character a standing alone. In the search results, it won't match the character a when it is present as a sub-string in a longer word.

The following screenshot shows the output:

[Screenshot: Matching an exact word]
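For simple cases, the -w option gives the same result as wrapping the pattern in word boundaries. The following sketch compares the two on made-up sample text; it assumes GNU grep, since \b is an extension to the basic regular expression syntax.

```shell
#!/bin/bash
# Sketch: whole-word matching with \b versus the -w option.
# The sample text is hypothetical.

sample=$(printf 'This file is a text file\nalpha beta gamma\n')

# Word boundaries on both sides: only a standalone "a" matches
boundary=$(printf '%s\n' "$sample" | grep -c '\ba\b')

# -w requires the whole match to be a word
word_opt=$(printf '%s\n' "$sample" | grep -cw 'a')

# Without either, "a" also matches inside alpha, beta and gamma
substring=$(printf '%s\n' "$sample" | grep -c 'a')

echo "Lines with standalone a (\\b): $boundary"
echo "Lines with standalone a (-w): $word_opt"
echo "Lines with a anywhere:        $substring"
```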