Appendix 2
Useful Shell Commands

Although this is first and foremost a book about text processing with Ruby, it’s also about an approach to text processing that fits in with existing tools. If you’re not particularly experienced with the Unix command line and the utilities it offers, this list might help you find the right tool for the job.

There are only 16 commands here, but together they form a considerable arsenal and—along with Ruby—will provide you with the tools you need for virtually all text processing tasks.

Most of these commands are part of GNU’s coreutils project (grep and column are packaged separately), and all are routinely packaged with Linux distributions. Mac OS X ships with virtually all of them, and those that it doesn’t can be installed using Homebrew:^[17]

	$ brew install coreutils

Windows users should install Cygwin^[18] to get them.

The rest of this appendix gives a summary of each of these 16 commands.

cat

Outputs the content of the filenames passed to it. Its name comes from the word concatenate, since it concatenates the files one after another. If no files are given, it outputs standard input.

	$ cat foo.txt
	foo
	$ cat bar.txt
	bar
	$ cat foo.txt bar.txt
	foo
	bar

tac

Exactly like cat, but outputs the lines of the files in reverse—that is, starting from the last line of the first file and working backward, then the last line of the second, and so on.

	$ cat foo.txt
	foo
	bar
	$ tac foo.txt
	bar
	foo
	$ cat baz.txt
	baz
	$ tac foo.txt baz.txt
	bar
	foo
	baz

shuf

Randomizes the order of lines in standard input or in the specified file. If the -n option is specified, will output at most that many lines; otherwise, all of the lines in the input will be shuffled.

Display five random dictionary words:

	$ shuf -n 5 /usr/share/dict/words
	gasification
	merrymake
	thingum
	chiliastic
	zygose

head

Outputs the first n lines of a file, where n is by default 10 but can be altered by passing a number to head.

Output the first ten lines of foo.txt:

	$ head foo.txt

Output the first two lines of foo.txt:

	$ head -2 foo.txt

Output the first five lines of foo.txt:

	$ head -5 foo.txt

tail

Similar to head, but with the last n lines rather than the first.

Output the last ten lines of foo.txt:

	$ tail foo.txt

Output the last two lines of foo.txt:

	$ tail -2 foo.txt

Output the last five lines of foo.txt:

	$ tail -5 foo.txt
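
The numeric shorthand used above (-2, -5) is widely accepted, but the explicit -n form is the one that POSIX specifies. The following is a minimal sketch rather than anything definitive: data.txt and its contents are invented for illustration. It shows the -n form, tail's +N variant for skipping a header line, and head and tail chained to pick out a range of lines.

```shell
# Build a small sample file (hypothetical data) in a temporary directory.
dir=$(mktemp -d)
printf 'NAME\nalice\nbob\ncarol\n' > "$dir/data.txt"

# -n N is the explicit spelling of the -N shorthand.
first_two=$(head -n 2 "$dir/data.txt")

# tail -n +2 starts output at line 2, so it skips a one-line header.
body=$(tail -n +2 "$dir/data.txt")

# Chain the two to extract a range: here, lines 2 to 3.
middle=$(head -n 3 "$dir/data.txt" | tail -n 2)

printf '%s\n' "$body"
```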

split

Takes a file and splits it into new files every n lines, where n is by default 1,000.

Split a file every 1,000 lines:

	$ split big.txt

Split a file every 500 lines:

	$ split -l 500 big.txt

Split a file every 100 bytes:

	$ split -b 100 big.txt
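
It may help to see where the pieces end up. By default split names its output files xaa, xab, and so on in the current directory; an optional second argument replaces the x prefix. The sketch below assumes a made-up five-line big.txt and a hypothetical part- prefix, and shows that concatenating the pieces with cat restores the original.

```shell
# Work in a scratch directory so the pieces don't clutter anything else.
dir=$(mktemp -d)
cd "$dir" || exit 1

# A hypothetical five-line input file.
printf '1\n2\n3\n4\n5\n' > big.txt

# Split every 2 lines; "part-" is the output prefix
# (without it, the pieces would be named xaa, xab, ...).
split -l 2 big.txt part-

# Lists the three pieces: part-aa, part-ab and part-ac.
ls part-*

# The names sort in creation order, so a glob restores the original.
cat part-* > rejoined.txt
cmp big.txt rejoined.txt && echo "identical"
```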

grep

Outputs only those lines that match a given pattern. Accepts regular expressions, allowing complex patterns to be matched, and supports recursing through the filesystem itself—allowing you to search across whole directories of files.

Show lines in standard input that match a given pattern:

	$ printf 'foo\nbar\n' | grep foo
	foo

Show lines that don’t match a given pattern:

	$ printf 'foo\nbar\n' | grep -v foo
	bar

Match against a regular expression:

	$ printf 'foo\nbar\n' | grep '^f'
	foo

Case-insensitive matching:

	$ printf 'foo\nbar\n' | grep -i FOO
	foo

Search through all files in the current directory and below:

	$ grep -r foo .
	./foo.txt:foo
	./bar.txt:foo
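
A few more grep options come up often enough to be worth a quick sketch. The file sample.txt and its contents are invented for illustration; the options themselves (-c, -n, and -E) are standard.

```shell
dir=$(mktemp -d)
printf 'foo\nbar\nfoobar\nbaz\n' > "$dir/sample.txt"

# -c prints the number of matching lines instead of the lines themselves.
count=$(grep -c foo "$dir/sample.txt")

# -n prefixes each matching line with its line number.
numbered=$(grep -n foo "$dir/sample.txt")

# -E enables extended regular expressions, such as alternation.
either=$(grep -E '^(bar|baz)$' "$dir/sample.txt")

printf '%s\n' "$count" "$numbered" "$either"
```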

cut

Splits lines into fields, allowing you to process delimited data and only output particular columns. cut splits on tab by default but can be configured to separate fields by any character using the -d option.

Split fields on the space character and output the fourth field:

	$ date
	Tue 30 Jun 2015 11:37:52 BST

	$ date | cut -d ' ' -f 4
	2015

Output multiple fields, illustrating both ranges and comma-separated lists of fields:

	$ date | cut -d ' ' -f 1-3,5
	Tue 30 Jun 11:37:52
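
The same options work with any single-character delimiter. As a rough sketch, here is cut applied to a colon-delimited record in the style of /etc/passwd (the record itself is made up), along with the -c option, which selects character positions rather than fields.

```shell
# A hypothetical colon-delimited record, in the style of /etc/passwd.
record='alice:x:1001:1001:Alice Jones:/home/alice:/bin/bash'

# Fields 1 and 5: the login name and the full name.
# Selected fields are written out joined by the same delimiter.
name_fields=$(printf '%s\n' "$record" | cut -d ':' -f 1,5)

# -c selects character positions instead of delimited fields.
first_five=$(printf '%s\n' "$record" | cut -c 1-5)

printf '%s\n' "$name_fields" "$first_five"
```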

tr

Performs a substitution on the input, allowing you to replace certain characters with others. tr is flexible about how characters are specified, allowing you to define ranges and use character classes.

Convert uppercase input to lowercase:

	$ echo 'HELLO WORLD' | tr A-Z a-z
	hello world

The same, but accounting for non-ASCII characters:

	$ echo 'HËLLØ WÔRLD' |
	tr '[:upper:]' '[:lower:]'
	hëllø wôrld

Delete numbers from the input:

	$ echo 'HELLO 123 WORLD' | tr -d 0-9
	HELLO WORLD

Delete anything that isn’t a letter from the input (the -c stands for complement):

	$ echo 'HELLO %^!@$()' | tr -cd a-zA-Z
	HELLO

Compress multiple-space characters into one:

	$ echo 'Hello    world' | tr -s ' '
	Hello world

wc

Counts the number of bytes (with the -c option), words (with the -w option), or lines (with the -l option) in a given file or in standard input. The -m option counts characters instead of bytes, which gives a different result for multibyte text. If no options are specified, will output lines, words, and bytes.

Show statistics for a file. The first column shows lines, the second words, and the third bytes:

	$ wc foo.txt
	103 392 3944 foo.txt

Display the number of lines in a file:

	$ wc -l foo.txt
	103 foo.txt

Display the number of bytes in standard input:

	$ echo "Hello world" | wc -c
	12
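
When a script needs just the number, the filename that wc appends can get in the way. One common approach, sketched here with an invented two-line foo.txt, is to feed the file in on standard input so that there is no name to print.

```shell
dir=$(mktemp -d)
printf 'one two\nthree\n' > "$dir/foo.txt"

# Given a filename, wc prints the name after the count.
with_name=$(wc -l "$dir/foo.txt")

# Reading from standard input leaves only the number,
# which is easier to use in a script.
lines=$(wc -l < "$dir/foo.txt")
words=$(wc -w < "$dir/foo.txt")

echo "lines: $lines, words: $words"
```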

sort

Sorts input or the content of files. Takes options to treat sort data as numeric, to sort insensitively to case, and to ignore leading whitespace, among other things.

Sort a file alphabetically:

	$ cat foo.txt
	foo
	bar

	$ sort foo.txt
	bar
	foo

Sort input numerically:

	$ printf '12\n111\n1\n' | sort
	1
	111
	12

	$ printf '12\n111\n1\n' | sort -n
	1
	12
	111

Sort input in reverse order:

	$ printf 'ant\nmole\nzebra\n' | sort -r
	zebra
	mole
	ant
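
sort can also order delimited data by a particular field. The sketch below assumes a hypothetical comma-separated file, ages.csv; -t sets the field separator and -k chooses the field to use as the sort key.

```shell
dir=$(mktemp -d)
printf 'Samantha,57\nAlice,31\nTerence,90\nAlex,20\n' > "$dir/ages.csv"

# -t sets the field separator; -k 2,2 restricts the key to field 2;
# the n suffix compares that key numerically.
by_age=$(sort -t ',' -k 2,2n "$dir/ages.csv")

# Adding r reverses the order, so the oldest comes first.
oldest=$(sort -t ',' -k 2,2nr "$dir/ages.csv" | head -n 1)

printf '%s\n' "$by_age"
echo "oldest: $oldest"
```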

column

Converts data into columnar format. Very useful for performing alignment that would otherwise take painstaking manual adjustment.

Display a delimited file as an aligned table, with a header row:

	$ cat people.txt
	Samantha 57 Pianist
	Alice 31 Biochemist
	Terence 90 Retired
	Alex 20 Student

	$ ( echo "NAME AGE JOB"; cat people.txt ) | column -t
	NAME      AGE  JOB
	Samantha  57   Pianist
	Alice     31   Biochemist
	Terence   90   Retired
	Alex      20   Student

uniq

Outputs its input, but for all consecutively identical lines, outputs those lines only once. So if a line containing foo was followed by three identical lines all containing foo, these four lines would be compressed to one in uniq’s output.

Compress consecutively identical lines:

	$ cat foo.txt
	foo
	foo
	bar
	foo

	$ uniq foo.txt
	foo
	bar
	foo

Display each distinct line only once across the whole file, by using sort to ensure identical lines always appear together:

	$ sort foo.txt | uniq
	bar
	foo

Display each distinct line in the file, along with a count of how many times it occurred:

	$ sort foo.txt | uniq -c
	1 bar
	3 foo
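
Two further options separate repeated lines from lines that occur only once. This is a small sketch that recreates the four-line foo.txt from the examples above in a temporary directory.

```shell
dir=$(mktemp -d)
printf 'foo\nfoo\nbar\nfoo\n' > "$dir/foo.txt"

# -d prints one copy of each line that is repeated.
dupes=$(sort "$dir/foo.txt" | uniq -d)

# -u prints only the lines that are never repeated.
singles=$(sort "$dir/foo.txt" | uniq -u)

echo "repeated: $dupes"
echo "single: $singles"
```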

paste

Nothing to do with the clipboard—as its name might suggest to modern ears, at least. paste joins together two files so that line one from file two is placed on the same line as line one from file one, joined by a tab. This effectively creates tabular data.

Join two files horizontally:

	$ cat first-names
	Avdi
	Katrina
	David

	$ cat last-names
	Grimm
	Owen
	Brady

	$ paste first-names last-names
	Avdi Grimm
	Katrina Owen
	David Brady

Join two files vertically:

	$ paste -s first-names last-names
	Avdi Katrina David
	Grimm Owen Brady

Separate fields with spaces, rather than tabs:

	$ paste -d' ' first-names last-names
	Avdi Grimm
	Katrina Owen
	David Brady
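
Combining -s and -d gives a handy way of flattening a list into a single delimited line. A minimal sketch, using the first names from the examples above; the lone - tells paste to read standard input.

```shell
# -s joins all the lines of one input into a single line;
# -d sets the separator; "-" means read from standard input.
csv=$(printf 'Avdi\nKatrina\nDavid\n' | paste -s -d ',' -)

echo "$csv"
```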

join

Joins two files together. Unlike paste, which does this based on the position of the lines in each file, join functions much more like a join in a relational database: it looks for fields with the same values and joins based on that equality.

Join two files based on the equality of values in the first column:

	$ cat users
	[email protected] Bob Smith
	[email protected] Alice Jones

	$ cat orders
	[email protected] Chips $1.95
	[email protected] Beer $3.50

	$ join users orders
	[email protected] Bob Smith Chips $1.95
	[email protected] Alice Jones Beer $3.50

Output only certain fields:

	$ join -o 1.2,2.2 users orders
	Bob Chips
	Alice Beer
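
join expects both of its inputs to be sorted on the join field, and it will pair a line with every matching line in the other file. The sketch below uses invented data keyed on a hypothetical user ID (u1, u2) and sorts each file before joining.

```shell
dir=$(mktemp -d)

# Hypothetical data, deliberately not sorted by the key in column one.
printf 'u2 Bob\nu1 Alice\n' > "$dir/users"
printf 'u1 Beer\nu2 Chips\nu1 Tea\n' > "$dir/orders"

# join expects both inputs sorted on the join field, so sort first.
sort -k 1,1 "$dir/users" > "$dir/users.sorted"
sort -k 1,1 "$dir/orders" > "$dir/orders.sorted"

# Each user line is paired with every order that shares its key.
joined=$(join "$dir/users.sorted" "$dir/orders.sorted")

printf '%s\n' "$joined"
```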

comm

Given two sorted files, displays three columns: the lines that occur only in file one, the lines that occur only in file two, and the lines that occur in both. The -1, -2, and -3 options each suppress the corresponding column, so any combination of columns can be shown (for example, just the lines that are in both, or just the lines that are unique to one file).

Display lines that occur only in file one, that occur only in file two, and that occur in both:

	$ cat 1.txt
	bar
	foo

	$ cat 2.txt
	bar
	baz

	$ comm 1.txt 2.txt
			bar
		baz
	foo

Display only the lines that are in both files:

	$ comm -1 -2 1.txt 2.txt
	bar

Display only the lines that occur only in file one:

	$ comm -2 -3 1.txt 2.txt
	foo
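
Because comm relies on its inputs being sorted, unsorted data needs to pass through sort first. The following sketch rebuilds 1.txt and 2.txt from unsorted input and extracts each of the three columns in turn; the variable names are chosen purely for illustration.

```shell
dir=$(mktemp -d)

# Sort the raw lists before handing them to comm.
printf 'foo\nbar\n' | sort > "$dir/1.txt"
printf 'baz\nbar\n' | sort > "$dir/2.txt"

# Suppress two columns at a time to isolate the third.
both=$(comm -12 "$dir/1.txt" "$dir/2.txt")         # in both files
only_first=$(comm -23 "$dir/1.txt" "$dir/2.txt")   # only in 1.txt
only_second=$(comm -13 "$dir/1.txt" "$dir/2.txt")  # only in 2.txt

echo "both: $both, first: $only_first, second: $only_second"
```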

By composing together these various commands in different ways, you’ll be able to perform many text processing tasks without having to reach for anything else. When you find them limiting, you can reach for Ruby to fill in the gaps.
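
As one illustration of that kind of composition, here is a sketch of a word-frequency count built from tr, sort, uniq, and head. The sample sentence is made up, and this is only one of several ways the pipeline could be arranged.

```shell
dir=$(mktemp -d)
text='the cat sat on the mat and the cat slept'

printf '%s\n' "$text" |
  tr -s ' ' '\n' |  # put each word on its own line
  sort |            # bring identical words together
  uniq -c |         # collapse each run, prefixing a count
  sort -rn |        # order by count, largest first
  head -n 2 > "$dir/top.txt"  # keep the two most frequent words

cat "$dir/top.txt"
```

Each stage does one small job, and the final result lists "the" (three occurrences) followed by "cat" (two).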

Footnotes

[17]	http://brew.sh/
[18]	https://www.cygwin.com/
