Chapter 3

Inspecting Files

Having seen how to create and manipulate files, now it’s time to learn how to examine their contents. This is especially important for files too long to fit on a single screen. In particular, we saw starting in Section 2.1 how to use the cat command to dump the file contents to the screen, but this doesn’t work very well for longer files.

3.1 Downloading a File

To give us a place to start, rather than creating a long file by hand (which is cumbersome), we’ll download a file from the Internet using the powerful curl utility. Sometimes written as “cURL”, the curl program allows us to interact with a URL1 at the command line. Although it’s not part of the core Unix command set, the curl command is widely available for installation on Unix systems. To make sure it’s available on your system, we can use the which command, which looks to see if the given program is available at the command line.2 The way to use it is to type which followed by the name of the program (in this case, curl):

1. URL is short for Uniform Resource Locator, and in practice usually just means “web address”.

2. Technically, which locates a file on the user’s path, which is a list of directories where executable programs are located.

$ which curl
/usr/bin/curl

I’ve shown the output on my system (/usr/bin/curl, usually read as “user bin curl”), but the result on your system may differ. In particular, if the result is just a blank line, you will have to install curl, which you can do by Googling for “install curl” followed by the name of your operating system (e.g., “install curl macos”). (This sort of “Google for it” installation step is classic technical sophistication (Box 1.4).)
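If you’d rather check for curl from a script than by eye, here is a minimal sketch. It assumes a POSIX shell and swaps in the shell’s built-in command -v, which does much the same job as which but works even on systems where which itself isn’t installed; the helper name check_program is made up for illustration. The first line prints the directories on your path (footnote 2), which is where both which and command -v look.

```shell
# Show the search path one directory per line; `which` (and `command -v`)
# look through these directories in order.
echo "$PATH" | tr ':' '\n'

# Hypothetical helper (not a standard command): report whether a program
# can be found on the path, using the POSIX `command -v` builtin.
check_program() {
  if command -v "$1" > /dev/null 2>&1; then
    echo "$1: found at $(command -v "$1")"
  else
    echo "$1: not found; search for 'install $1' plus your OS name"
  fi
}

check_program curl
```

A blank result from which curl corresponds to the “not found” branch here.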

Once curl is installed, we can use the command in Listing 3.1 to download a file called sonnets.txt, which contains a large corpus of text.3

3. If for any reason using curl fails, you can always visit the URL in a browser and then use the File > Save As feature to save it to your local disk.

Listing 3.1: Using curl to download a longer file.

$ curl -OL https://cdn.learnenough.com/sonnets.txt
$ ls -rtl

Be sure to copy the command exactly; in particular, note that the option -OL contains a capital letter “O” (O) and not a zero (0). (Figuring out what these options do is left as an exercise (Section 3.5.1).) Also, on some systems (for mysterious reasons) you might have to run the command twice to get it to work; by inspecting the results of ls -rtl, you should be able to tell if the initial call to curl created the file sonnets.txt as expected. (If you do have to repeat the curl command, you could press up arrow twice to retrieve it, but see Box 3.1 for alternatives.)
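If you do run into the “run it twice” problem, one option is to let the shell do the checking for you. This is a minimal sketch, not part of the tutorial’s own workflow: the function name fetch_with_retry is hypothetical, and it assumes the saved file is named after the last part of the URL (which is what -O does). It runs curl at most twice and uses the test [ -s file ] (“file exists and is not empty”) in place of inspecting ls -rtl by eye.

```shell
# Hypothetical helper: download a URL with `curl -OL`, trying at most twice,
# and stop as soon as a non-empty local copy exists.
fetch_with_retry() {
  url=$1
  file=$(basename "$url")   # e.g. sonnets.txt; -O saves under this name
  for attempt in 1 2; do
    if [ -s "$file" ]; then
      return 0              # file already present and non-empty: done
    fi
    curl -OL "$url"
  done
  [ -s "$file" ]            # exit status says whether we ended up with the file
}
```

You would then run fetch_with_retry https://cdn.learnenough.com/sonnets.txt. Note that it skips the download entirely if a non-empty sonnets.txt already exists, so delete any partial or stale copy first.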

The result of running Listing 3.1 is sonnets.txt, a file containing all 154 of Shakespeare’s sonnets. This file contains 2620 lines, far too many to fit on one screen. Learning how to inspect its contents is the goal of the rest of this chapter. (Among other things, we’ll learn how to determine that it has 2620 lines without counting them all by hand.)

3.1.1 Exercises

  1. Use the command curl -I https://www.learnenough.com/ to fetch the HTTP header for the Learn Enough website. What is the HTTP status code for the address? How does this differ from the status code for learnenough.com (without the https://)?

  2. Using ls, confirm that sonnets.txt exists on your system. How big is it in bytes? Hint: Recall from Section 2.2 that the “long form” of ls displays a byte count.

  3. The byte count in the previous exercise is high enough that it’s more naturally thought of in kilobytes (often treated as 1000 bytes, but actually equal to 2¹⁰ = 1024 bytes). By adding the -h (“human-readable”) option to ls, list the long form of the sonnets file with a human-readable byte count.

  4. Suppose you wanted to list all files and directories (including hidden ones) in reverse time-sorted long form, with human-readable byte counts. What command would you use? Why might this command be a personal favorite of the author of this tutorial?4

    4. Having known about ls -a and ls -rtl for a while (which together yield the suggestive command ls -artl), one day I decided to add an “h” (for obvious reasons [https://www.michaelhartl.com/]). This is actually how I accidentally discovered the useful -h option some years ago.

3.2 Making Heads and Tails of It

Two complementary commands for inspecting files are head and tail, which respectively allow us to view the beginning (head) and end (tail) of the file. The head command shows the first 10 lines of the file (Listing 3.2).

Listing 3.2: Looking at the head of the sample text file.

$ head sonnets.txt
Shake-speare's Sonnets

I

From fairest creatures we desire increase,
That thereby beauty's Rose might never die,
But as the riper should by time decease,
His tender heir might bear his memory:
But thou contracted to thine own bright eyes,
Feed'st thy light's flame with self-substantial fuel,

Similarly, tail shows the last 10 lines of the file (Listing 3.3).

Listing 3.3: Looking at the tail of the sample text file.

$ tail sonnets.txt
The fairest votary took up that fire
Which many legions of true hearts had warm'd;
And so the general of hot desire
Was, sleeping, by a virgin hand disarm'd.
This brand she quenched in a cool well by,
Which from Love's fire took heat perpetual,
Growing a bath and healthful remedy,
For men diseas'd; but I, my mistress' thrall,
 Came there for cure and this by that I prove,
 Love's fire heats water, water cools not love.

These two commands are useful when (as is often the case) you know for sure you only need to inspect the beginning or end of a file.
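To see the 10-line default for yourself without counting lines of verse, it can help to try head and tail on a file whose lines are just their own line numbers. A minimal sketch, assuming the seq command (available on Linux and macOS) and a throwaway directory from mktemp -d; the file name numbers.txt is made up for illustration:

```shell
# Work in a throwaway directory so nothing else gets cluttered.
cd "$(mktemp -d)"

# A hypothetical 100-line file whose lines are "1" through "100".
seq 1 100 > numbers.txt

head numbers.txt   # prints 1 through 10
tail numbers.txt   # prints 91 through 100
```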

3.2.1 Wordcount and Pipes

By the way, I didn’t recall offhand how many lines head and tail show by default. Since there are only 10 lines in the output, I could have counted them by hand, but in fact I was able to figure it out using the wc command (short for “wordcount”; recall Figure 2.4).

The most common use of wc is on full files. For example, we can run sonnets.txt through wc:

$ wc sonnets.txt
  2620  17670  95635 sonnets.txt

Here the three numbers indicate how many lines, words, and bytes there are in the file, so there are 2620 lines (thereby fulfilling the promise made at the end of Section 3.1), 17670 words, and 95635 bytes.
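One way to convince yourself of what the three numbers mean is to run wc on a file small enough to count by hand. In this sketch the file tiny.txt is made up for illustration; the -l, -w, and -c options (standard wc options) ask for just the line, word, or byte count, respectively. The exact spacing of wc’s output differs between systems.

```shell
cd "$(mktemp -d)"

# Hypothetical file: "one two" and "three", each followed by a newline.
printf 'one two\nthree\n' > tiny.txt

wc tiny.txt      # 2 lines, 3 words, 14 bytes

# Ask for just one of the three counts:
wc -l tiny.txt   # lines: 2
wc -w tiny.txt   # words: 3
wc -c tiny.txt   # bytes: 7 + 1 newline + 5 + 1 newline = 14
```

Note that the newline at the end of each line counts as a byte, which is why the total is 14 rather than 12.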

You are now in a position to be able to guess one method for determining how many lines are in head sonnets.txt. In particular, we can combine head with the redirect operator (Section 2.1) to make a file with the relevant contents, and then run wc on it, as shown in Listing 3.4.

Listing 3.4: Redirecting head and running wc on the result.

$ head sonnets.txt > sonnets_head.txt
$ wc sonnets_head.txt
   10   46   294 sonnets_head.txt

We see from Listing 3.4 that there are 10 lines in head sonnets.txt (along with 46 words and 294 bytes). The same method, of course, would work for tail.

On the other hand, you might get the feeling that it’s a little unclean to make an intermediate file just to run wc on it, and indeed there’s a way to avoid it using a technique called pipes. Listing 3.5 shows how to do it.

Listing 3.5: Piping the result of head through wc.

$ head sonnets.txt | wc
   10   46   294

The command in Listing 3.5 runs head sonnets.txt and then pipes the result through wc using the pipe symbol | (Shift-backslash on most QWERTY keyboards). The reason this works is that the wc command, in addition to taking a filename as an argument, can (like many Unix programs) take input from “standard in” (compare to “standard out” mentioned in Section 1.3), which in this case is the output of head sonnets.txt shown in Listing 3.2. The wc program takes this input and counts it the same way it counts a file, yielding the same line, word, and byte counts as Listing 3.4.
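Here is a minimal sketch comparing the two approaches side by side on a made-up 25-line file (lines.txt, created with seq). It uses wc -l to get only the line count, and the < operator to feed a file to wc’s standard in so that wc prints a bare number with no filename; the $(( )) wrapper just strips the padding that some versions of wc add.

```shell
cd "$(mktemp -d)"
seq 1 25 > lines.txt    # hypothetical file with lines "1" through "25"

# Two steps, as in Listing 3.4: save head's output to a file, then count it.
head lines.txt > lines_head.txt
from_file=$(( $(wc -l < lines_head.txt) ))

# One step, as in Listing 3.5: pipe head's output straight into wc.
from_pipe=$(( $(head lines.txt | wc -l) ))

echo "intermediate file: $from_file lines; pipe: $from_pipe lines"
# prints: intermediate file: 10 lines; pipe: 10 lines
```

Both give the same count, but the pipe version leaves no lines_head.txt lying around.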

3.2.2 Exercises

  1. By piping the results of tail sonnets.txt through wc, confirm that (like head) the tail command outputs 10 lines by default.

  2. By running man head, learn how to look at the first n lines of the file. By experimenting with different values of n, find a head command to print out just enough lines to display the first sonnet in its entirety (Figure 1.11).

  3. Pipe the results of the previous exercise through tail (with the appropriate options) to print out only the 14 lines composing Sonnet 1. Hint: The command will look something like head -n <i> sonnets.txt | tail -n <j>, where <i> and <j> represent the numerical arguments to the -n option.

  4. One of the most useful applications of tail is running tail -f to view a file that’s actively changing. This is especially common when monitoring files used to log the activity of, e.g., web servers, a practice known as “tailing the log file”. To simulate the creation of a log file, run ping learnenough.com > learnenough.log in one terminal tab. (The ping command “pings” a server to see if it’s working.) In a second tab, type the command to tail the log file. (At this point, both tabs will be stuck, so once you’ve gotten the gist of tail -f you should use the technique from Box 1.3 to get out of trouble.)

3.3 Less Is More

Unix provides two utilities for the common task of wanting to look at more than just the head or tail of a file. The older of these programs is called more, but (I’d guess initially as a tongue-in-cheek joke) there’s a more powerful variant called less.5 The less program is interactive, so it’s hard to capture in print, but here’s roughly what it looks like:

5. On some systems, apparently they’re exactly the same program, so less really is more (or, more accurately, more is less).

$ less sonnets.txt
Shake-speare's Sonnets

I

From fairest creatures we desire increase,
That thereby beauty's Rose might never die,
But as the riper should by time decease,
His tender heir might bear his memory:
But thou contracted to thine own bright eyes,
Feed'st thy light's flame with self-substantial fuel,
Making a famine where abundance lies,
Thy self thy foe, to thy sweet self too cruel:
Thou that art now the world's fresh ornament,
And only herald to the gaudy spring,
Within thine own bud buriest thy content,
And tender churl mak'st waste in niggarding:
 Pity the world, or else this glutton be,
 To eat the world's due, by the grave and thee.

II

When forty winters shall besiege thy brow,
And dig deep trenches in thy beauty's field,
sonnets.txt

The point of less is that it lets you navigate through the file in several useful ways, such as moving one line up or down with the arrow keys, pressing the spacebar to move a page down, and pressing ^F to move forward a page (i.e., the same as spacebar) or ^B to move back a page. To quit less, type q (for “quit”).

Perhaps the most powerful aspect of less is the forward slash key /, which lets you search through the file from beginning to end. For example, suppose we wanted to search through sonnets.txt for “rose” (Figure 3.1),6 one of the most frequently used images in the Sonnets.7 The way to do this in less is to type /rose (read “slash rose”), as shown in Listing 3.6.

6. Image courtesy of Shuang Li/Shutterstock.

7. Although Shakespeare’s sonnets are undated, most of them were probably composed during the reign of Queen Elizabeth, whose royal house adopted a rose (Figure 3.1) as its heraldic emblem. Given this context, Shakespeare’s choice of floral imagery isn’t surprising, but in fact only a few commentators on the Sonnets have noticed the seemingly obvious reference.

Figure 3.1: A famous rose from the time of Shakespeare.

Listing 3.6: Searching for the string “rose” using less.

Shake-speare's Sonnets

I

From fairest creatures we desire increase,
That thereby beauty's Rose might never die,
But as the riper should by time decease,
His tender heir might bear his memory:
But thou contracted to thine own bright eyes,
Feed'st thy light's flame with self-substantial fuel,
Making a famine where abundance lies,
Thy self thy foe, to thy sweet self too cruel:
Thou that art now the world's fresh ornament,
And only herald to the gaudy spring,
Within thine own bud buriest thy content,
And tender churl mak'st waste in niggarding:
 Pity the world, or else this glutton be,
 To eat the world's due, by the grave and thee.

II

When forty winters shall besiege thy brow,
And dig deep trenches in thy beauty's field,
/rose

The result of pressing return after typing /rose in Listing 3.6 is to highlight the first occurrence of “rose” in the file. You can then press n to navigate to the next match, or N to navigate to the previous match.

The last two essential less commands are G to move to the end of the file and 1G (that’s 1 followed by G) to move back to the beginning. Table 3.1 summarizes what are in my view the most important key combinations (i.e., the ones I think you need to be dangerous), but if you’re curious you can find a longer list of commands at the Wikipedia page on less.

Table 3.1: The most important less commands.

Command               Description                      Example
up & down arrow keys  Move up or down one line
spacebar              Move forward one page
^F                    Move forward one page
^B                    Move back one page
G                     Move to end of file
1G                    Move to beginning of file
/<string>             Search file for string           /rose
n                     Move to next search result
N                     Move to previous search result
q                     Quit less

I encourage you to get in the habit of using less as your go-to utility for looking at the contents of a file. The skills you develop have other applications as well; for example, the man pages (Section 1.4) use the same interface as less, so by learning about less you’ll get better at navigating the man pages as well.

3.3.1 Exercises

  1. Run less on sonnets.txt. Go down three pages and then back up three pages. Go to the end of the file, then to the beginning, then quit.

  2. Search for the string “All” (case-sensitive). Go forward a few occurrences, then back a few occurrences. Then go to the beginning of the file and count the occurrences by searching forward until you hit the end. Compare your count to the result of running grep All sonnets.txt | wc. (We’ll learn about grep in Section 3.4.)

  3. Using less and / (“slash”), find the sonnet that begins with the line “Let me not”. Are there any other occurrences of this string in the Sonnets? Hint: Press n to find the next occurrence (if any). Extra credit: Listen to the sonnet (https://www.youtube.com/watch?v=bt7OynPUIY8) in both modern and original pronunciation. Which version’s rhyme scheme is better?

  4. Because man uses less, we are now in a position to search man pages interactively. By searching for the string “sort” in the man page for ls, discover the option to sort files by size. What is the command to display the long form of files sorted so the largest files appear at the bottom? Hint: Use ls -rtl as a model.

3.4 Grepping

One of the most powerful tools for inspecting file contents is grep, which probably stands for something, but it’s not important what. (We’ll actually mention it in a moment.) Indeed, grep is frequently used as a verb, as in “You should totally grep that file.”

The most common use of grep is just to search for a substring in a file. For example, we saw in Section 3.3 how to use less to search for the string “rose” in Shakespeare’s sonnets. Using grep, we can find the references directly, as shown in Listing 3.7.

Listing 3.7: Finding the occurrences of “rose” in Shakespeare’s sonnets.

$ grep rose sonnets.txt
The rose looks fair, but fairer we it deem
As the perfumed tincture of the roses.
Die to themselves. Sweet roses do not so;
Roses of shadow, since his rose is true?
Which, like a canker in the fragrant rose,
Nor praise the deep vermilion in the rose;
The roses fearfully on thorns did stand,
 Save thou, my rose, in it thou art my all.
I have seen roses damask'd, red and white,
But no such roses see I in her cheeks;

With the command in Listing 3.7, it appears that we are in a position to count the number of lines containing references to the word “rose” by piping to wc (as in Section 3.2.1), as shown in Listing 3.8.

Listing 3.8: Piping the results of grep to wc.

$ grep rose sonnets.txt | wc
   10   82   419

Listing 3.8 tells us that 10 lines contain “rose” (or “roses”, since “rose” is a substring of “roses”). But you may recall from Figure 1.11 that Shakespeare’s first sonnet contains “Rose” with a capital “R”. Referring to Listing 3.7, we see that this line has in fact been missed. This is because grep is case-sensitive by default, and “rose” doesn’t match “Rose”.

As you might suspect, grep has an option to perform case-insensitive matching as well. One way to figure it out is to search through the man page for grep:

  • Type man grep.

  • Type /case and then return.

  • Read off the result (Figure 3.2).

    Figure 3.2: The result of searching man grep for “case”.

(As noted briefly in Section 1.4, the man pages use the same interface as the less command we met in Section 3.3, so we can search through them using /.)

Applying the result of the above procedure yields Listing 3.9. Comparing the results of Listing 3.9 with Listing 3.8, we see that we now have 12 matching lines instead of only 10, so there must be a total of 12 – 10 = 2 lines containing “Rose” (but not “rose”) in the Sonnets.8

8. Actually, “ROSE”, “RoSE”, “rOSE”, etc., all match as well, but “Rose” is the likeliest candidate. Confirming this hunch is left as an exercise (Section 3.4.1).

Listing 3.9: Doing a case-insensitive grep.

$ grep -i rose sonnets.txt | wc
   12   96   508
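To see the case-sensitivity effect on something small enough to check by eye, here is a minimal sketch using a four-line sample file. The first three lines are copied from the listings above; the fourth is made up, and the file name sample.txt is hypothetical. (The counts may print with leading spaces on some systems.)

```shell
cd "$(mktemp -d)"
cat > sample.txt <<'EOF'
That thereby beauty's Rose might never die,
The rose looks fair, but fairer we it deem
Roses of shadow, since his rose is true?
No flowers in this line.
EOF

grep rose sample.txt | wc -l     # 2: only lines 2 and 3 contain lowercase "rose"
grep -i rose sample.txt | wc -l  # 3: line 1's "Rose" now matches as well
```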

The grep utility gets its name from a pattern-matching system called regular expressions (also called regexes for short): grep stands for “globally search a regular expression and print.” A full treatment of regular expressions is well beyond the scope of this tutorial, but before moving on we’ll sample just a small taste.

As one simple example, let’s match every line in sonnets.txt that has a word beginning with the letters “ro”, followed by any number of (lowercase) letters, and ending in “s”. The way to represent “any lowercase letter” with a regular expression is [a-z], and following a pattern with an asterisk * matches “zero or more” of that thing. Thus, ro[a-z]*s matches “ro” and “s” with zero or more lowercase letters in between. We can add spaces to the beginning and end to ensure that the match consists of entire words (at least for words with a space on each side), like this:

$ grep ' ro[a-z]*s ' sonnets.txt
  To that sweet thief which sourly robs from me.
Die to themselves. Sweet roses do not so;
When rocks impregnable are not so stout,
He robs thee of, and pays it thee again.
The roses fearfully on thorns did stand,
I have seen roses damask'd, red and white,
But no such roses see I in her cheeks;

We can see that the regular expression matches strings like “robs” and “rocks” in addition to “roses”.
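Here is a minimal sketch for experimenting with the same pattern on a small sample. The first three lines come from the output above; the fourth is made up to show a limitation: because the pattern requires a space on each side, a word followed by punctuation (or sitting at the very start or end of a line) isn’t matched. The file name sample.txt is hypothetical.

```shell
cd "$(mktemp -d)"
cat > sample.txt <<'EOF'
Die to themselves. Sweet roses do not so;
When rocks impregnable are not so stout,
He robs thee of, and pays it thee again.
These roses, followed by a comma, are missed.
EOF

# Prints the first three lines (" roses ", " rocks ", " robs "); the last
# line is skipped because "roses" is followed by a comma, not a space.
grep ' ro[a-z]*s ' sample.txt
```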

In general, one of the best tools for learning how to use regexes is an online regex builder, such as regex101, which lets you build up regexes interactively (Figure 3.3). Unfortunately, grep often doesn’t support the precise format used by regex builders (including hard-to-guess requirements for “escaping out” special characters), and precision in regular expressions is everything. As a result, despite its name origins, the truth is I rarely use the regular expression capabilities of grep. By the time the situation calls for regexes, I’m far likelier to reach for a text editor (Part II) or a full-strength programming language (Learn Enough JavaScript to Be Dangerous (https://www.learnenough.com/javascript), Learn Enough Ruby to Be Dangerous (https://www.learnenough.com/ruby)).

Figure 3.3: An online regex builder (https://regex101.com/).

Nevertheless, the aspects of grep discussed in this section are nearly enough to be dangerous, covering a huge number of common cases (including the important application of grepping processes (Box 3.2)). We’ll see one final grep variant in Chapter 4 as part of our discussion of Unix directories.

3.4.1 Exercises

  1. By searching man grep for “line number”, construct a command to find the line numbers in sonnets.txt where the string “rose” appears.

  2. You should find that the last occurrence of “rose” is (via “roses”) on line 2203. Figure out how to go directly to this line when running less sonnets.txt. Hint: Recall from Table 3.1 that 1G goes to the top of the file, i.e., line 1. Similarly, 17G goes to line 17. Etc.

  3. By piping the output of grep to head, print out the first (and only the first) line in sonnets.txt containing “rose”. Hint: Use the result of the second exercise in Section 3.2.2.

  4. In Listing 3.9, we saw two additional lines that case-insensitively matched “rose”. Execute a command confirming that both of the lines contain the string “Rose” (and not, e.g., “rOSe”). Hint: Use a case-sensitive grep for “Rose”.

  5. You should find in the previous exercise that there are three lines matching “Rose” instead of the two you might have expected from Listing 3.9. This is because there is one line that contains both “Rose” and “rose”, and thus shows up in both grep rose and grep -i rose. Write a command confirming that the number of lines matching “Rose” but not matching “rose” is equal to the expected 2. Hint: Pipe the result of grep to grep -v, and then pipe that result to wc. (What does -v do? Read the man page for grep (Box 1.4).)

3.5 Summary

Important commands from this chapter are summarized in Table 3.2.

Table 3.2: Important commands from Chapter 3.

Command                   Description                        Example
curl                      Interact with URLs                 $ curl -O https://ex.co
which                     Locate a program on the path       $ which curl
head <file>               Display first part of file         $ head foo
tail <file>               Display last part of file          $ tail bar
wc <file>                 Count lines, words, bytes          $ wc foo
cmd1 | cmd2               Pipe cmd1 to cmd2                  $ head foo | wc
ping <url>                Ping a server URL                  $ ping google.com
less <file>               View file contents interactively   $ less foo
grep <string> <file>      Find string in file                $ grep foo bar.txt
grep -i <string> <file>   Find case-insensitively            $ grep -i foo bar.txt
ps                        Show processes                     $ ps aux
top                       Show processes (sorted)            $ top
kill -<level> <pid>       Kill a process                     $ kill -15 24601
pkill -<level> -f <name>  Kill matching processes            $ pkill -15 -f spring

3.5.1 Exercises

  1. The history command prints the history of commands in a particular terminal shell (subject to some limit, which is typically large). Pipe history to less to examine your command history. What was your 17th command?

  2. By piping the output of history to wc, count how many commands you’ve executed so far.

  3. One use of history is to grep your commands to find useful ones you’ve used before, with each command preceded by the corresponding number in the command history. By piping the output of history to grep, determine the number for the last occurrence of curl.

  4. In Box 3.1, we learned about !! (“bang bang”) to execute the previous command. Similarly, !n executes command number n, so that, e.g., !17 executes the 17th command in the command history. Use the result from the previous exercise to rerun the last occurrence of curl.

  5. What do the O and L options in Listing 3.1 mean? Hint: Pipe the output of curl -h to less and search first for the string -O and then for the string -L.
