Chapter 4. Working with Text Files

The following topics are covered in this chapter:

  • Using Common Text File–Related Tools

  • A Primer to Using Regular Expressions

  • Using grep to Analyze Text

  • Working with Other Useful Text Processing Utilities

The following RHCSA exam objectives are covered in this chapter:

  • Use grep and regular expressions to analyze text

  • Create and edit text files

Since the early days of UNIX, working with text files has been an important administrator skill. Even on modern Linux versions such as Red Hat Enterprise Linux 8, working with text files is still an important skill. By applying the correct tools, you’ll easily find the information you need. This chapter is about these tools. Make sure that you master them well, because good knowledge of these tools really will make your work as a Linux administrator a lot easier. At the same time, it will increase your chances of passing the RHCSA exam.

“Do I Know This Already?” Quiz

The “Do I Know This Already?” quiz allows you to assess whether you should read this entire chapter thoroughly or jump to the “Exam Preparation Tasks” section. If you are in doubt about your answers to these questions or your own assessment of your knowledge of the topics, read the entire chapter. Table 4-1 lists the major headings in this chapter and their corresponding “Do I Know This Already?” quiz questions. You can find the answers in Appendix A, “Answers to the ‘Do I Know This Already?’ Quizzes and ‘Review Questions.’”

Table 4-1 “Do I Know This Already?” Section-to-Question Mapping

Foundation Topics Section                              Questions

Using Common Text File–Related Tools                   1–5

A Primer to Using Regular Expressions                  6–8

Using grep to Analyze Text                             10

Working with Other Useful Text Processing Utilities    9

1. Which command was developed to show only the first ten lines in a text file?

a. head

b. top

c. first

d. cat

2. Which command enables you to count the number of words in a text file?

a. count

b. list

c. ls -l

d. wc

3. Which key on your keyboard do you use in less to go to the last line of the current text file?

a. End

b. Page Down

c. q

d. G

4. Which option is missing (...) from the following command, assuming that you want to filter the first field out of the /etc/passwd file and assuming that the character that is used as the field delimiter is a :?

  cut ... : -f 1 /etc/passwd

a. -d

b. -c

c. -t

d. -x

5. Which option is missing (...) if you want to sort the third column of the output of the command ps aux?

  ps aux | sort ...

a. -k3

b. -s3

c. -k f 3

d. -f 3

6. Which of the following commands would only show lines in the file /etc/passwd that start with the text anna?

a. grep anna /etc/passwd

b. grep -v anna /etc/passwd

c. grep $anna /etc/passwd

d. grep ^anna /etc/passwd

7. Which regular expression do you use to make the previous character optional?

a. ?

b. .

c. *

d. &

8. Which regular expression is used as a wildcard to refer to any single character?

a. ?

b. .

c. *

d. &

9. Which command prints the fourth field of a line in the /etc/passwd file if the text user occurs in that line?

a. awk '/user/ { print $4 }' /etc/passwd

b. awk -d : '/user/ { print $4 }' /etc/passwd

c. awk -F : '/user/ $4' /etc/passwd

d. awk -F : '/user/ { print $4 }' /etc/passwd

10. Which option would you use with grep to show only lines that do not contain the regular expression that was used?

a. -x

b. -v

c. -u

d. -q

Foundation Topics

Using Common Text File–Related Tools

Before we start talking about the best possible way to find text files containing specific text, let’s take a look at how you can display text files in an efficient way. Table 4-2 provides an overview of some common commands often used for this purpose.

Key topic

Table 4-2 Essential Tools for Managing Text File Contents

Command    Explanation

less       Opens the text file in a pager, which allows for easy reading

cat        Dumps the contents of the text file on the screen

head       Shows the first ten lines of the text file

tail       Shows the last ten lines of the text file

cut        Used to filter specific columns or characters from a text file

sort       Sorts the contents of a text file

wc         Counts the number of lines, words, and characters in a file

Apart from using these commands on a text file, they may also prove very useful when used in pipes. You can use the command less /etc/passwd, for example, to open the contents of the /etc/passwd file in the less pager, but you can also use the command ps aux | less, which sends the output of the command ps aux to the less pager to allow for easy reading.

Doing More with less

In many cases, as a Linux administrator you’ll need to read the contents of text files. The less utility offers a convenient way to do so. To open the contents of a text file in less, just type less followed by the name of the file you want to see, as in less /etc/passwd.

From less, you can use the Page Up and Page Down keys on your keyboard to browse through the file contents. Seen enough? Then you can press q to quit less. Also very useful is that you can easily search for specific contents in less using /sometext for a forward search and ?sometext for a backward search. Repeat the last search by using n.

If you think this sounds familiar, it should. You have seen similar behavior in vim and man. That is because all of these commands are based on the same code.

Note

Once upon a time, less was developed because it offered more features than the classic UNIX tool more, which was developed to page through file contents. So, the idea was to do more with less. The developers of more did not like that, so they enhanced their command as well. The result is that more and less now offer many similar features, and which tool you use doesn’t really matter much anymore. There is one significant difference, though: the more utility exits when the end of the file is reached. To prevent this behavior, you can start more with the -p option.

In Exercise 4-1, you apply some basic less skills to work with file contents and command output.

Exercise 4-1 Applying Basic less Skills

  1. From a terminal, type less /etc/passwd. This opens the /etc/passwd file in the less pager.

  2. Press G to go to the last line in the file.

  3. Type /root to look for the text root. You’ll see that all occurrences of the text root are highlighted.

  4. Press q to quit less.

  5. Type ps aux | less. This sends the output of the ps aux command (which shows a listing of all processes) to less. Browse through the list.

  6. Press q to quit less.

Showing File Contents with cat

The less utility is useful to read long text files. If a text file is not that long, you are probably better off using cat. This tool just dumps the contents of the text file on the terminal it was started from. This is convenient if the text file is short. If the text file is long, however, you’ll see all contents scrolling by on the screen, and only the lines that fit on the terminal screen are displayed. Using cat is simple. Just type cat followed by the name of the file you want to see. For instance, use cat /etc/passwd to show the contents of this file on your computer screen.

Tip

The cat utility dumps the contents of a file to the screen from beginning to end, which means that for a long file only the last lines remain visible. If you are interested in the first lines, you can use the tac utility, which gives the reversed result of cat: it prints the last line first, so the first lines of the file are the last to scroll by.

Displaying the First or Last Lines of a File with head and tail

If a text file contains much information, it can be useful to filter the output a bit. You can use the head and tail utilities to do that. Using head on a text file will show by default the first ten lines of that file. Using tail on a text file shows the last ten lines by default. You can adjust the number of lines that are shown by adding -n followed by the number you want to see. So, tail -n 5 /etc/passwd shows the last five lines of the /etc/passwd file.

Tip

With older versions of head and tail, you had to use the -n option to specify the number of lines you wanted to see. With current versions of both utilities, you may also omit the -n option. So, using either tail -5 /etc/passwd or tail -n 5 /etc/passwd gives you the exact same results.

Another useful option that you can use with tail is -f. This option starts by showing you the last ten lines of the file you’ve specified, but it refreshes the display as new lines are added to the file. This is convenient for monitoring log files. The command tail -f /var/log/messages (which has to be run as the root user) is a common command to show in real time messages that are written to the main log file /var/log/messages. To close this screen, press Ctrl-C.

By combining tail and head, you can do smart things as well. Suppose, for instance, that you want to see line number 11 of the /etc/passwd file. To do that, use head -n 11 /etc/passwd | tail -n 1. The command before the pipe shows the first 11 lines from the file. The result is sent to the pipe, and on that result tail -n 1 is used, which leads to only line number 11 being displayed. In Exercise 4-2, you apply some basic head and tail operations to get the exact results that you want.

Exercise 4-2 Using Basic head and tail Operations

  1. Type tail -f /var/log/messages. You’ll see the last lines of /var/log/messages being displayed. The file doesn’t close automatically.

  2. Press Ctrl-C to quit the previous command.

  3. Type head -n 5 /etc/passwd to show the first five lines in /etc/passwd.

  4. Type tail -n 2 /etc/passwd to show the last two lines of /etc/passwd.

  5. Type head -n 5 /etc/passwd | tail -n 1 to show only line number 5 of the /etc/passwd file.
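The head | tail combination from step 5 works on any file. A minimal sketch on a throwaway file (the /tmp/sample.txt name and contents are made up for this demonstration):

```shell
# Create a small sample file with five numbered lines
printf 'line1\nline2\nline3\nline4\nline5\n' > /tmp/sample.txt

# Show only line 3: head keeps the first three lines, tail keeps the last of those
head -n 3 /tmp/sample.txt | tail -n 1
```

The general recipe for line N is head -n N file | tail -n 1.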

Filtering Specific Columns with cut

When working with text files, it can be useful to filter out specific fields. Imagine that you need to see a list of all users in the /etc/passwd file. In this file, several fields are defined, of which the first contains the name of the users who are defined. To filter out a specific field, the cut command is useful. To do this, use the -d option to specify the field delimiter followed by -f with the number of the specific field you want to filter out. So, the complete command is cut -d : -f 1 /etc/passwd if you want to filter out the first field of the /etc/passwd file. You can see the result in Example 4-1.

Example 4-1 Filtering Specific Fields with cut

[root@localhost ~]# cut -f 1 -d : /etc/passwd
root
bin
daemon
adm
lp
sync
shutdown
halt
...

Sorting File Contents and Output with sort

Another very useful command to use on text files is sort. As you can probably guess, this command sorts text. If you type sort /etc/passwd, for instance, the content of the /etc/passwd file is sorted in byte order. You can use the sort command on the output of a command also, as in cut -f 1 -d : /etc/passwd | sort, which sorts the contents of the first column in the /etc/passwd file.

By default, the sort command sorts in byte order. Notice that this looks like alphabetical order, but it is not, as all capital letters are sorted before lowercase letters. So Zoo would be listed before apple. In some cases, that is not convenient, because the content that needs sorting may be numeric or in another format. The sort command offers different options to help sort these specific types of data. Type, for instance, cut -f 2 -d : /etc/passwd | sort -n to sort the second field of the /etc/passwd file in numeric order. It can also be useful to sort in reverse order: the command du -h | sort -rh gives a list of files with the biggest file listed first (the -h option tells sort to interpret human-readable sizes such as 1K and 2M).

You can also use the sort command and specify which column you want to sort. To do this, use sort -k3 -t : /etc/passwd, for instance, which uses the field separator : to sort the third column of the /etc/passwd file.
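The -t and -k options are easy to try out on a small delimited file of your own (the /tmp/ages.txt name and its contents are invented for this sketch):

```shell
# Sample colon-delimited file: name:age
printf 'carol:30\nalice:25\nbob:28\n' > /tmp/ages.txt

# Sort numerically (-n) on the second field (-k2), using : as the separator (-t :)
sort -t : -k2 -n /tmp/ages.txt
```

The output lists alice:25 first and carol:30 last, because sort compares the second field as a number rather than the whole line as text.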

You might also like the option to sort on a specific column of a file or the output of a command. An example is the command ps aux, which gives an overview of the busiest processes on a Linux server. (Example 4-2 shows partial output of this command.)

Example 4-2 Using ps aux to Find the Busiest Processes on a Linux Server

[root@localhost ~]# ps aux | tail -n 10
postfix 1350    0.0  0.7  91872  3848 ?     S   Jan24 0:00 qmgr –l
      -t unix -u
root    2162    0.0  0.3  115348 1928 tty1  Ss+ Jan24 0:00 -bash
postfix 5131    0.0  0.7  91804  3832 ?     S   12:10 0:00 pickup
      -l -t unix -u
root    5132    0.0  0.0  0      0    ?     S   12:10 0:00
      [kworker/0:1]
root    5146    0.0  0.9  133596 4868 ?     Ss  12:12 0:00 sshd:
      root@pts/0
root    5150    0.0  0.3  115352 1940 pts/0 Ss  12:12 0:00 -bash
root    5204    0.0  0.0  0      0    ?     S   12:20 0:00
      [kworker/0:2]
root    5211    0.0  0.0  0      0    ?     S   12:26 0:00
      [kworker/0:0]
root    5212    0.0  0.2  123356 1320 pts/0 R+  12:26 0:00 ps aux
root    5213    0.0  0.1  107928  672 pts/0 R+  12:26 0:00 tail -n 10

To sort the output of this command directly on the third column, use the command ps aux | sort -k3.

Counting Lines, Words, and Characters with wc

When working with text files, you sometimes get a large amount of output. Before deciding which approach to handling the large amount of output works best in a specific case, you might want to have an idea about the amount of text you are dealing with. In that case, the wc command is useful. In its output, this command gives three different results: the number of lines, the number of words, and the number of characters.

Consider, for example, the ps aux command. When executed as root, this command gives a list of all processes running on a server. One solution to count how many processes there are exactly is to pipe the output of ps aux through wc, as in ps aux | wc. You can see the result of the command in Example 4-3, which shows that the total number of lines is 90 and that there are 1,045 words and 7,583 characters in the command output.

Example 4-3 Counting the Number of Lines, Words, and Characters with wc

[root@localhost ~]# ps aux | wc
    90       1045   7583
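If you need only one of the three counters, wc has dedicated options for each. A quick sketch on a throwaway file (the file name and contents are made up):

```shell
# Two lines, three words in total
printf 'one two\nthree\n' > /tmp/words.txt

wc -l < /tmp/words.txt   # count lines only
wc -w < /tmp/words.txt   # count words only
wc -c < /tmp/words.txt   # count characters (bytes) only
```

Reading from standard input with < keeps the file name out of the output, which is convenient in scripts.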

A Primer to Using Regular Expressions

Working with text files is an important skill for a Linux administrator. You must know not only how to create and modify existing text files, but also how to find the text file that contains specific text.

It will be clear sometimes which specific text you are looking for. Other times, it might not. For example, are you looking for color or colour? Both spellings might give a match. This is just one example of why using flexible patterns while looking for text can prove useful. These flexible patterns are known as regular expressions in Linux.

To understand regular expressions a bit better, let’s take a look at a text file example, shown in Example 4-4. This file contains the last six lines from the /etc/passwd file. (This file is used for storing Linux accounts; see Chapter 6, “User and Group Management,” for more details.)

Example 4-4 Example Lines from /etc/passwd

[root@localhost ~]# tail -n 6 /etc/passwd
anna:x:1000:1000::/home/anna:/bin/bash
rihanna:x:1001:1001::/home/rihanna:/bin/bash
annabel:x:1002:1002::/home/annabel:/bin/bash
anand:x:1003:1003::/home/anand:/bin/bash
joanna:x:1004:1004::/home/joanna:/bin/bash
joana:x:1005:1005::/home/joana:/bin/bash

Now suppose that you are looking for the user anna. In that case, you could use the grep utility to look for that specific string in the file /etc/passwd by using the command grep anna /etc/passwd. Example 4-5 shows the results of that command, and as you can see, way too many results are shown.

Example 4-5 Example of Why You Need to Know About Regular Expressions

[root@localhost ~]# grep anna /etc/passwd
anna:x:1000:1000::/home/anna:/bin/bash
rihanna:x:1001:1001::/home/rihanna:/bin/bash
annabel:x:1002:1002::/home/annabel:/bin/bash
joanna:x:1004:1004::/home/joanna:/bin/bash

Key topic

A regular expression is a search pattern that allows you to look for specific text in an advanced and flexible way.

Using Line Anchors

In Example 4-5, suppose that you wanted to specify that you are looking for lines that start with the text anna. The type of regular expression that specifies where in a line of output the result is expected is known as a line anchor.

To show only lines that start with the text you are looking for, you can use the regular expression ^ (in this case, to indicate that you are looking only for lines where anna is at the beginning of the line; see Example 4-6).

Example 4-6 Looking for Lines Starting with a Specific Pattern

[root@localhost ~]# grep ^anna /etc/passwd
anna:x:1000:1000::/home/anna:/bin/bash
annabel:x:1002:1002::/home/annabel:/bin/bash

Another regular expression that relates to the position of specific text in a specific line is $, which states that the line ends with some text. For instance, the command grep ash$ /etc/passwd shows all lines in the /etc/passwd file that end with the text ash. This command shows all accounts that have a shell and are able to log in (see Chapter 6 for more details).
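The effect of the $ anchor is easy to see on a small sample file (the file name and contents are invented for this sketch):

```shell
# "cashier" contains "ash" but not at the end of the line, so $ excludes it
printf 'bash\ncash\ncashier\n' > /tmp/shells.txt
grep 'ash$' /tmp/shells.txt
```

Only bash and cash are printed; without the $ anchor, all three lines would match.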

Using Escaping in Regular Expressions

Although not mandatory, when using regular expressions, it is a good idea to use escaping to prevent regular expressions from being interpreted by the shell. When a command line is entered, the Bash shell parses the command line, looking for any special characters like *, $, and ?. It will next interpret these characters. The point is that regular expressions use some of these characters as well, and to make sure the Bash shell doesn’t interpret them, you should use escaping.

In many cases, escaping is not strictly necessary, but in some cases the regular expression fails without it. To prevent that from ever happening, it is a good idea to put the regular expression between single quotes. So, instead of typing grep ^anna /etc/passwd, it is better to use grep '^anna' /etc/passwd, even though in this case both commands work.
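A short demonstration of why quoting matters, on a throwaway file (the file name and contents are made up):

```shell
printf 'ready\nred\nrd\n' > /tmp/demo.txt

# Unquoted, the shell could expand the * as a filename glob before grep ever
# sees it; single quotes guarantee grep receives the pattern exactly as typed
grep 're*d' /tmp/demo.txt
```

Here re*d means "r, zero or more e's, then d", so red and rd match but ready does not.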

Using Wildcards and Multipliers

In some cases, you might know which text you are looking for, but you might not know how the specific text is written. Or you might just want to use one regular expression to match different patterns. In those cases, wildcards and multipliers come in handy.

To start with, there is the dot (.) regular expression. This is used as a wildcard that matches any single character. So, the regular expression r.t would match the strings rat, rot, and rut.

In some cases, you might want to be more specific about the characters you are looking for. If that is the case, you can specify a range of characters that you are looking for. For instance, the regular expression r[aou]t matches the strings rat, rot, and rut.

Another useful regular expression is the multiplier *. This matches zero or more of the previous character. That does not seem to be very useful, but indeed it is, as you will see in the examples at the end of this section.

If you know exactly how many of the previous character you are looking for, you can specify a number also, as in re{2}d, which would match reed, but not red. The last regular expression that is useful to know about is ?, which matches zero or one of the previous character. Table 4-3 provides an overview of the most important regular expressions.

Key topic

Table 4-3 Most Significant Regular Expressions

Regular Expression    Use

^text                 Matches line that starts with specified text.

text$                 Matches line that ends with specified text.

.                     Wildcard. (Matches any single character.)

[abc]                 Matches a, b, or c.

*                     Matches zero to an infinite number of the previous character.

{2}                   Matches exactly two of the previous character.

{1,3}                 Matches a minimum of one and a maximum of three of the previous character.

colou?r               Matches zero or one of the previous character. This makes the previous character optional, which in this example would match both color and colour.
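The multipliers in Table 4-3 can be tried out with grep -E, which enables extended regular expressions so that { } and ? need no backslashes. A sketch on a throwaway file (the file name and word list are made up):

```shell
printf 'red\nreed\nreeed\ncolor\ncolour\n' > /tmp/regex-demo.txt

# {2} requires exactly two of the previous character: only "reed" matches
grep -E 're{2}d' /tmp/regex-demo.txt

# ? makes the previous character optional: both "color" and "colour" match
grep -E 'colou?r' /tmp/regex-demo.txt
```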

Let’s take a look at an example of a regular expression that comes from the man page semanage-fcontext and relates to managing SELinux (see Chapter 22, “Managing SELinux”). The example line contains the following regular expression:

"/web(/.*)?"

In this regular expression, the text /web is matched literally. It can be followed by the regular expression (/.*)?, which means zero or one occurrence of (/.*). The (/.*) part refers to a slash followed by an unlimited number of characters. Stated differently, the regular expression matches the text /web, whether or not it is followed by a slash and any other characters.
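You can verify this behavior by feeding a few candidate paths through grep -E (the sample paths are invented; the ^ and $ anchors are added here to make the test exact):

```shell
# /web and /web/index.html match; /website does not, because "site"
# is not preceded by a slash as (/.*)? requires
printf '/web\n/web/index.html\n/website\n' | grep -E '^/web(/.*)?$'
```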

Using grep to Analyze Text

The ultimate utility to work with regular expressions is grep, which stands for “global regular expression print.” Quite a few examples that you have seen already were based on the grep command. The grep command has a couple of useful options to make it even more efficient. Table 4-4 describes some of the most useful options.

Key topic

Table 4-4 Most Useful grep Options

Option        Use

-i            Not case sensitive. Matches upper- and lowercase letters.

-v            Only shows lines that do not contain the regular expression.

-r            Searches files in the current directory and all subdirectories.

-e            Searches for lines matching more than one regular expression.

-A <number>   Shows <number> of lines after the matching regular expression.

-B <number>   Shows <number> of lines before the matching regular expression.

In Exercise 4-3, you work through some examples using these grep options.

Exercise 4-3 Using Common grep Options

  1. Type grep '^#' /etc/sysconfig/sshd. This shows that the file /etc/sysconfig/sshd contains a number of lines that start with the comment sign, #.

  2. To view the configuration lines that really matter, type grep -v '^#' /etc/sysconfig/sshd. This shows only lines that do not start with a #.

  3. Type grep -v '^#' /etc/sysconfig/sshd -B 5. This shows lines that do not start with a # sign but also the five lines that are directly before each of those lines, which is useful because in the preceding lines you’ll typically find comments on how to use the specific parameters. However, you’ll also see that many blank lines are displayed.

  4. Type grep -v -e '^#' -e '^$' /etc/sysconfig/sshd. This excludes all blank lines and lines that start with #.
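The filter from step 4 is easy to experiment with on a throwaway file (the /tmp/sshd-demo name and contents are stand-ins for the real configuration file):

```shell
# One comment line, one blank line, one real setting
printf '# comment line\n\nPATH=/usr/bin\n' > /tmp/sshd-demo

# -v inverts the match; each -e adds a pattern, so both comments and
# blank lines are excluded in a single pass
grep -v -e '^#' -e '^$' /tmp/sshd-demo
```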

Working with Other Useful Text Processing Utilities

The grep utility is a powerful utility that allows you to work with regular expressions. It is not the only utility, though. Some even more powerful utilities exist, like awk and sed, both of which are extremely rich and merit a book by themselves. These utilities were developed at a time when computers did not commonly have screens attached, which is why they do such a good job of processing text files in a scripted, noninteractive way.

As a Linux administrator in the twenty-first century, you do not have to be a specialist in using these utilities anymore. It does make sense, however, to know how to perform some common tasks using these utilities. The most useful use cases are summarized in the following examples.

This command shows the fourth field from each line in /etc/passwd:

awk -F : '{ print $4 }' /etc/passwd

This is something that can be done by using the cut utility as well, but the awk utility is better at distinguishing the fields used in command output and files. The bottom line is that if cut does not work, you should try the awk utility.
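One reason awk copes better with command output is that, by default, it splits fields on any run of whitespace, which cut -d ' ' cannot do reliably. A sketch with ps-like sample data (the columns shown are invented):

```shell
# Columns separated by varying amounts of whitespace, as in ps aux output;
# awk still finds the second field on every line
printf 'root    1  0.0\nuser   42  1.5\n' | awk '{ print $2 }'
```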

You can also use the awk utility to do tasks that you might be used to using grep for. Consider the following example:

awk -F : '/user/ { print $4 }' /etc/passwd

This command searches the /etc/passwd file for the text user and will print the fourth field of any matching line.

In this example, the “stream editor” sed is used to print the fifth line from the /etc/passwd file:

sed -n 5p /etc/passwd
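A quick way to verify this behavior on a sample file (the file name and contents are made up):

```shell
printf 'l1\nl2\nl3\nl4\nl5\nl6\n' > /tmp/sample.txt

# -n suppresses sed's default output; 5p prints only line 5
sed -n 5p /tmp/sample.txt
```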

The sed utility is a very powerful utility for filtering text from text files (like grep), but it has the benefit that it also allows you to apply modifications to text files, as shown in the following example:

sed -i s/old-text/new-text/g ~/myfile

In this example, the sed utility is used to search for the text old-text in ~/myfile and replace all occurrences with the text new-text. Notice that the default sed behavior is to write the output to STDOUT, but the option -i will write the result directly to the file. Make sure that you know what you are doing before using this command, because it might be difficult to revert file modifications that are applied in this way.
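A safer workflow is to preview the substitution on STDOUT first and add -i only once the result looks right. A sketch on a throwaway file (the file name and text are made up):

```shell
printf 'old-text here\nkeep this line\n' > /tmp/myfile

# Preview: without -i the file itself is not touched
sed 's/old-text/new-text/g' /tmp/myfile

# Apply: -i rewrites the file in place
sed -i 's/old-text/new-text/g' /tmp/myfile
cat /tmp/myfile
```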

You’ll like the following example if you’ve ever had a file containing a specific line that was erroneous:

sed -i -e '2d' ~/myfile

With this command, you can delete a line based on a specific line number. You can also make more complicated references to line numbers. Use, for instance, sed -i -e '2d;20,25d' ~/myfile to delete lines 2 and 20 through 25 in the file ~/myfile.
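Line deletion is also easy to try out safely on a throwaway file (the file name and contents are made up):

```shell
printf 'a\nb\nc\nd\ne\n' > /tmp/myfile

# Delete line 2 in place; the remaining lines shift up
sed -i -e '2d' /tmp/myfile
cat /tmp/myfile
```

After the command, the file contains the lines a, c, d, and e.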

Tip

Do not focus on awk and sed too much. These are amazing utilities, but many of the things that can be accomplished using them can be done using other tools as well. The awk and sed tools are very rich, and you can easily get lost in them if you are trying to dig too deep.

Summary

In this chapter, you learned how to work with text files. You acquired some important skills like searching text files with grep and displaying text files or part of them with different utilities. You have also learned how regular expressions can be used to make the search results more specific. Finally, you learned about the very sophisticated utilities awk and sed, which allow you to perform more advanced operations on text files.

Exam Preparation Tasks

As mentioned in the section “How to Use This Book” in the Introduction, you have several choices for exam preparation: the end-of-chapter labs; the memory tables in Appendix B; Chapter 26, “Final Preparation”; and the practice exams.

Review All Key Topics

Review the most important topics in the chapter, noted with the Key Topic icon in the outer margin of the page. Table 4-5 lists a reference of these key topics and the page number on which each is found.

Key topic

Table 4-5 Key Topics for Chapter 4

Key Topic Element   Description                                        Page

Table 4-2           Essential tools for managing text file contents    84

Paragraph           Definition of regular expressions                  90

Table 4-3           Most useful regular expressions                    91

Table 4-4           Most useful grep options                           92

Complete Tables and Lists from Memory

Print a copy of Appendix B, “Memory Tables” (found on the companion website), or at least the section for this chapter, and complete the tables and lists from memory. Appendix C, “Memory Tables Answer Key,” includes completed tables and lists to check your work.

Define Key Terms

Define the following key terms from this chapter and check your answers in the glossary:

regular expression

pager

escaping

wildcard

multiplier

line anchor

Review Questions

The questions that follow are meant to help you test your knowledge of concepts and terminology and the breadth of your knowledge. You can find the answers to these questions in Appendix A.

1. Which command enables you to see the results of the ps aux command in a way that you can easily browse up and down in the results?

2. Which command enables you to show the last five lines from ~/samplefile?

3. Which command do you use if you want to know how many words are in ~/samplefile?

4. After opening command output using tail -f ~/mylogfile, how do you stop showing output?

5. Which grep option do you use to exclude all lines that start with either a # or a ;?

6. Which regular expression do you use to match one or more of the preceding character?

7. Which grep command enables you to see text as well as TEXT in a file?

8. Which grep command enables you to show all lines starting with PATH, as well as the five lines just before that line?

9. Which sed command do you use to show line 9 from ~/samplefile?

10. Which command enables you to replace the word user with the word users in ~/samplefile?

End-of-Chapter Lab

In this end-of-chapter lab, you work with some of the most significant text processing utilities.

Lab 4.1

1. Describe two ways to show line 5 from the /etc/passwd file.

2. How would you locate all text files on your server that contain the current IP address? Do you need a regular expression to do this?

3. You have just used the sed command that replaces all occurrences of the text Administrator with root. Your Windows administrators do not like that very much. How do you revert?

4. Assuming that in the output of the ps aux command the fifth column contains information about memory utilization, how would you process the output of that command to show the process that has the heaviest memory utilization first in the results list?

5. Which command enables you to filter the sixth column of ps aux output?

6. How do you delete the sixth line from the file ~/myfile?
