Chapter 11. Text File Basics

A text file is a file containing human-readable text. Each line ends with a line feed character, a carriage return, or both, depending on the operating system. By Linux convention, each line ends with a line feed, which is the \n (newline) character in printf.

The examples in this chapter use a text file that lists several pieces of furniture by name, price, quantity, and supplier number, as shown in Listing 11.1.

Listing 11.1. orders.txt

Birchwood China Hutch,475.99,1,756
Bookcase Oak Veneer,205.99,1,756
Small Bookcase Oak Veneer,205.99,1,756
Reclining Chair,1599.99,1,757
Bunk Bed,705.99,1,757
Queen Bed,925.99,1,757
Two-drawer Nightstand,125.99,1,756
Cedar Toy Chest,65.99,1,757
Six-drawer Dresser,525.99,1,757
Pine Round Table,375.99,1,757
Bar Stool,45.99,1,756
Lawn Chair,55.99,1,756
Rocking Chair,287.99,1,757
Cedar Armoire,825.99,1,757
Mahogany Writing Desk,463.99,1,756
Garden Bench,149.99,1,757
Walnut TV Stand,388.99,1,756
Victorian-style Sofa,1225.99,1,757
Chair - Rocking,287.99,1,757
Grandfather Clock,2045.99,1,756

Linux contains many utilities for working with text files. Some can act as filters, processing the text so that it can be passed on to yet another command using a pipeline. When a text file is passed through a pipeline, it is called a text stream, that is, a stream of text characters.

Working with Pathnames

Linux has three commands for pathnames.

The basename command examines a path and displays the filename. It doesn't check to see whether the file exists.

$ basename /home/kburtch/test/orders.txt
orders.txt

If a suffix is included as a second parameter, basename deletes the suffix if it matches the file's suffix.

$ basename /home/kburtch/test/orders.txt .txt
orders

The corresponding program for extracting the path to the file is dirname.

$ dirname /home/kburtch/test/orders.txt
/home/kburtch/test

There is no trailing slash after the final directory in the path.

To verify that a pathname is a correct Linux pathname, use the pathchk command. This command verifies that the directories in the path (if they already exist) are accessible and that the names of the directories and file are not too long. If there is a problem with the path, pathchk reports the problem and returns an error code of 1.

$ pathchk "~/x" && echo "Acceptable path"
Acceptable path
$ mkdir a
$ chmod 400 a
$ pathchk "a/test.txt"
pathchk: directory 'a' is not searchable
$ pathchk "~/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" && echo "Acceptable path"
pathchk: name 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxx' has length 388; exceeds limit of 255

With the --portability (-p) switch, pathchk enforces stricter portability checks for all POSIX-compliant Unix systems. This identifies characters not allowed in a pathname, such as spaces.

$ pathchk "new file.txt"
$ pathchk -p "new file.txt"
pathchk: path 'new file.txt' contains nonportable character ' '

pathchk is useful for checking pathnames supplied from an outside source, such as pathnames from another script or those typed in by a user.
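
For example, a script can validate a pathname supplied by the user before using it. The following fragment is a minimal sketch (the script and variable names are illustrative, not from the chapter's listings).

#!/bin/bash
#
# check_path.sh: validate a pathname supplied as the first argument

shopt -s -o nounset

declare -rx SCRIPT=${0##*/}
declare -rx TARGET="${1:-}"

if [ -z "$TARGET" ] ; then
   printf "%s\n" "$SCRIPT: no pathname supplied" >&2
   exit 192
fi
if ! pathchk -p "$TARGET" ; then
   printf "%s\n" "$SCRIPT: $TARGET is not an acceptable pathname" >&2
   exit 192
fi
printf "%s\n" "$SCRIPT: $TARGET is acceptable"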

File Truncation

A particular feature of Unix-based operating systems, including the Linux ext3 file system, is the way space on a disk is reserved for a file. Under Linux, space is never released for a file. For example, if you overwrite a 1MB file with a single byte, Linux still reserves one megabyte of disk space for the file.

If you are working with files that vary greatly in size, you should remove the file and re-create it in order to free up the disk space rather than simply overwriting it.

This behavior affects all files, including directories. If a program removes all 5,000 files from a large directory, and puts a single file in that directory, the directory will still have space reserved for 5,000 file entries. The only way to release this space is to remove and re-create the directory.

Identifying Files

The built-in type command, as discussed in Chapter 3, “Files, Users, and Shell Customization,” identifies whether a command is built-in or not, and where the command is located if it is a Linux command.

To test files other than commands, the Linux file command performs a series of tests to determine the type of a file. First, file determines whether the file is a regular file or is empty. If the file is regular, file consults the /usr/share/magic file, checking the first few bytes of the file in an attempt to determine what the file contains. If the file is an ASCII text file, it performs a check of common words to try to determine the language of the text.

$ file empty_file.txt
empty_file.txt: empty
$ file orders.txt
orders.txt: ASCII text

file also works with programs. If check-orders.sh is a Bash script, file identifies it as a shell script.

$ file check-orders.sh
check-orders.sh: Bourne-Again shell script text
$ file /usr/bin/test
/usr/bin/test: ELF 32-bit LSB executable, Intel 80386, version 1,
dynamically linked (uses shared libs), stripped

For script programming, file's -b (brief) switch hides the name of the file and returns only the assessment of the file.

$ file -b orders.txt
ASCII text

Other useful switches include -f (file) to read filenames from a specific file. The -i switch returns the description as a MIME type suitable for Web programming. With the -z (compressed) switch, file attempts to determine the type of files stored inside a compressed file. The -L switch follows symbolic links.

$ file -b -i orders.txt
text/plain, ASCII
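
In a script, the brief output can drive a simple decision. The following fragment is a sketch; DATAFILE is an assumed variable holding the pathname to examine.

declare TYPE
TYPE=`file -b "$DATAFILE"`
case "$TYPE" in
   empty )
      printf "%s\n" "$DATAFILE is empty" >&2
      ;;
   "ASCII text"* )
      printf "%s\n" "$DATAFILE is a text file"
      ;;
   * )
      printf "%s\n" "$DATAFILE is not a plain text file" >&2
      ;;
esac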

Creating and Deleting Files

As discussed in Chapter 3, “Files, Users, and Shell Customization,” files are deleted with the rm (remove) command. The -f (force) switch removes a file even when the file permissions indicate the script cannot write to the file, but rm never removes a file from a directory that the script does not own. (The sticky bit is an exception and is discussed in Chapter 15, “Shell Security.”)

As with any file operation, always check that the file exists before you attempt to remove it. See Listing 11.2.

Listing 11.2. rm_demo.sh

#!/bin/bash
#
# rm_demo.sh: deleting a file with rm
shopt -s -o nounset

declare -rx SCRIPT=${0##*/}
declare -rx FILE2REMOVE="orders.bak"
declare -x  STATUS

if [ ! -f "$FILE2REMOVE" ] ; then
   printf "%s
" "$SCRIPT: $FILE2REMOVE does not exist" >&2
   exit 192
else
   rm "$FILE2REMOVE" >&2
   STATUS=$?
   if [ $STATUS -ne 0 ] ; then
      printf "%s
" "$SCRIPT: Failed to remove file $FILE2REMOVE" >&2
      exit $STATUS
   fi
fi

exit 0

When removing multiple files, avoid using the -r (recursive) switch or filename globbing. Instead, get a list of the files to delete (using a command such as find, discussed next) and test each individual file before attempting to remove any of them. This is slower than the alternatives but if a problem occurs no files are removed and you can safely check for the cause of the problem.

New, empty files are created with the touch command. The command is called touch because, when it's used on an existing file, it changes the modification time even though it makes no changes to the file.

touch is often combined with rm to create new, empty files for a script. Appending output with >> works whether or not the file already exists, eliminating the need to check for the file before each write.

For example, if a script is to produce a summary file called run_results.txt, a fresh file can be created with Listing 11.3.

Listing 11.3. touch_demo.sh

#!/bin/bash
#
# touch_demo.sh: using touch to create a new, empty file

shopt -s -o nounset

declare -rx RUN_RESULTS="./run_results.txt"

if [ -f "$RUN_RESULTS" ] ; then
   rm -f "$RUN_RESULTS"
   if [ $? -ne 0 ] ; then
      printf "%s
" "Error: unable to replace $RUN_RESULTS" >&2
   fi
   touch "$RUN_RESULTS"
fi

printf "Run stated %s
" "'date'" >> "$RUN_RESULTS"

The rm -f switch forces the removal of the old file so that a fresh, empty one is created every time.

Moving and Copying Files

Files are renamed or moved to new directories using the mv (move) command. If -f (force) is used, mv overwrites an existing file instead of reporting an error. Use -f only when it is safe to overwrite the file.

You can combine touch with mv to back up an old file under a different name before starting a new file. The Linux convention for backup files is to rename them with a trailing tilde (~). See Listing 11.4.

Listing 11.4. backup_demo.sh

#!/bin/bash
#
# backup_demo.sh

shopt -s -o nounset

declare -rx RUN_RESULTS="./run_results.txt"

if [ -f "$RUN_RESULTS" ] ; then
   mv -f "$RUN_RESULTS" "$RUN_RESULTS""~"
   if [ $? -ne 0 ] ; then
      printf "%s
" "Error: unable to backup $RUN_RESULTS" >&2
   fi
   touch "$RUN_RESULTS"
fi

printf "Run stated %s
" "'date'" >> "$RUN_RESULTS"

Because it is always safe to overwrite the backup, the move is forced with the -f switch. Archiving files is usually better than outright deleting because there is no way to “undelete” a file in Linux.

Similar to mv is the cp (copy) command. cp makes copies of a file and does not delete the original file. cp can also be used to make links instead of copies using the --link switch.

More Information About Files

There are two Linux commands that display information about a file that cannot be easily discovered with the test command.

The Linux stat command shows general information about the file, including the owner, the size, and the time of the last access.

$ stat ken.txt
  File: "ken.txt"
  Size: 84         Blocks: 8         Regular File
Access: (0664/-rw-rw-r--)         Uid: (  503/ kburtch)  Gid: (  503/ kburtch)
Device: 303        Inode: 131093     Links: 1
Access: Tue Feb 20 16:34:11 2001
Modify: Tue Feb 20 16:34:08 2001
Change: Tue Feb 20 16:34:08 2001

To make the information easier to process in a script, use the -t (terse) switch. Each stat item is separated by a space.

$ stat -t orders.txt
orders.txt 21704 48 81fd 503 503 303 114674 1 6f 89 989439402
981490652 989436657

The Linux statftime command has similar capabilities to stat, but has a wider range of formatting options. statftime is similar to the date command: It has a string argument describing how the status information should be displayed. The argument is specified with the -f (format) switch.

The most common statftime format codes are as follows:

  • %c—. Standard format

  • %d—. Day (zero filled)

  • %D—. mm/dd/yy

  • %H—. Hour (24-hr clock)

  • %I—. Hour (12-hr clock)

  • %j—. Day (1..366)

  • %m—. Month

  • %M—. Minute

  • %S—. Second

  • %U—. Week number (Sunday)

  • %w—. Weekday (Sunday)

  • %Y—. Year

  • %%—. Percent character

  • %_A—. Uses file last access time

  • %_a—. Filename (no suffix)

  • %_C—. Uses file inode change time

  • %_d—. Device ID

  • %_e—. Seconds elapsed since epoch

  • %_f—. File system type

  • %_i—. Inode number

  • %_L—. Uses current (local) time

  • %_l—. Number of hard links

  • %_M—. Uses file last modified time

  • %_m—. Type/attribute/access bits

  • %_n—. Filename

  • %_r—. Rdev ID (char/block devices)

  • %_s—. File size (bytes)

  • %_U—. Uses current (UTC) time

  • %_u—. User ID (uid)

  • %_z—. Sequence number (1,2,...)

A complete list appears in the reference section at the end of this chapter.

By default, any of the formatting codes referring to time are based on the file's modified time.

$ statftime -f "%c" orders.txt
Tue Feb  6 15:17:32 2001

Other types of time can be selected by using a time code. The format argument is read left to right, which means different time codes can be combined in one format string. Using %_C, for example, changes the format codes to the inode change time (usually the time the file was created). Using %_L (local time) or %_U (UTC time) makes statftime behave like the date command.

$ statftime -f "modified time = %c current time = %_L%c" orders.txt
modified time = Tue Feb  6 15:17:32 2001 current time = Wed May
  9 15:49:01 2001
$ date
Wed May  9 15:49:01 2001

statftime can create meaningful archive filenames. Often files are sent with a name such as orders.txt and the script wants to save the orders with the date as part of the name.

$ statftime -f "%_a_%_L%m%d.txt" orders.txt
orders_0509.txt

Besides generating new filenames, statftime can be used to save information about a file to a variable.

$ BYTES=`statftime -f "%_s" orders.txt`
$ printf "The file size is %d bytes\n" "$BYTES"
The file size is 21704 bytes

When a list of files is supplied on standard input, the command processes each file in turn. The %_z code provides the position of the filename in the list, starting at 1.

Transferring Files Between Accounts (wget)

Linux has a convenient tool for downloading files from other logins on the current computer or across a network. wget (web get) retrieves files using FTP or HTTP. wget is designed specifically to retrieve files, making it easy to use in shell scripts. If a connection is broken, wget tries to reconnect and continue to download the file.

The wget program uses the same form of address as a Web browser, supporting ftp:// and http:// URLs. Login information is added to a URL by placing user: and password@ prior to the hostname. FTP URLs can end with an optional ;type=a or ;type=i for ASCII or IMAGE FTP downloads. For example, to download the info.txt file from the kburtch login with the password jabber12 on the current computer, you use:

$ wget 'ftp://kburtch:jabber12@localhost/info.txt;type=i'

By default, wget uses --verbose message reporting. To report only errors, use the --quiet switch. To log what happened, append the results to a log file using --append-output and a log name, and log the server responses with the --server-response switch.

$ wget --server-response --append-output wget.log 'ftp://kburtch:jabber12@localhost/info.txt;type=i'

Whole accounts can be copied using the --mirror switch.

$ wget --mirror 'ftp://kburtch:jabber12@localhost;type=i'

To make it easier to copy a set of files, the --glob switch can enable file pattern matching. --glob=on causes wget to pattern match any special characters in the filename. For example, to retrieve all text files:

$ wget --glob=on 'ftp://kburtch:jabber12@localhost/*.txt'

There are many special-purpose switches not covered here. A complete list of switches is in the reference section. Documentation is available on the wget home page at http://www.gnu.org/software/wget/wget.html.

Transferring Files with FTP

Besides wget, the most common way of transferring files between accounts is using the ftp command. FTP is a client/server system: An FTP server must be set up on your computer if there isn't one already. Most Linux distributions install an FTP server by default.

With an FTP client, you have to supply the necessary download commands through standard input; this is not necessary with wget.

To use ftp from a script, you use three switches. The -i (not interactive) switch disables the normal FTP prompts to the user. -n (no auto-login) suppresses the login prompt, requiring you to explicitly log in with the open and user commands. -v (verbose) displays more details about the transfer. The ftp commands can be embedded in a script using a here file.

ftp -i -n -v <<!
open ftp.nightlight.com
user incoming_orders password
cd linux_lightbulbs
binary
put $1
!
if [ $? -ne 0 ] ; then
   printf "%s
" "$SCRIPT: FTP transfer failed" >&2
   exit 192
fi

This script fragment opens an FTP connection to a computer called ftp.nightlight.com. It deposits a file in the linux_lightbulbs directory of the incoming_orders account. If an error occurs, an error message is printed and the script stops.

Processing files sent by FTP is difficult because there is no way of knowing whether a file is still being transferred. Instead of saving the data to a temporary file and then moving it to its final location, an FTP server creates an empty file and slowly writes the data to it. The mere presence of the file is not enough to signify that the transfer is complete. The usual method of handling this situation is to wait until the file has not been modified for a reasonable amount of time (perhaps an hour). If the file hasn't been modified recently, the transfer is probably complete and the file can be safely renamed and moved to a permanent directory.
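
One way to apply this method is with the find command's modification-time test. The following fragment is only a sketch: it assumes incoming files arrive in a directory named incoming and that anything untouched for more than 60 minutes is complete.

declare -r INCOMING="incoming"        # directory where FTP deposits files
declare -r COMPLETED="completed"      # directory for files ready to process
declare FILE

# Select regular files not modified in the last 60 minutes
find "$INCOMING" -type f -mmin +60 | {
   while read FILE ; do
      mv "$FILE" "$COMPLETED"
   done
}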

Some distributions have an ftpcopy (or an ftpcp) command, which copies whole directories at one time. Care must be taken with ftpcopy because it is primarily intended as a mirroring tool and it will delete any local files not located at the remote account.

Transferring Files with Secure FTP (sftp)

Part of the OpenSSH (Open Source Secure Shell) project, Secure FTP (sftp) is another file-transfer program that works in a similar way to FTP but encrypts the transfer so that it cannot be intercepted or read by intermediary computers. The encryption process increases the amount of data and slows the transfer but provides protection for confidential information.

You must specify the computer and user account on the sftp command line. SFTP prompts you for the password.

$ sftp root@our_web_site.com:/etc/httpd/httpd.conf
Connecting to our_web_site.com...
root@our_web_site.com's password:
Fetching /etc/httpd/httpd.conf to httpd.conf

For security purposes, SFTP normally asks the user for the Linux login password. It doesn't request the password from standard input but from the controlling terminal. This means you can't include the password in the batch file. The solution to this problem is to use SSH's public key authentication using the ssh-keygen command. If you have not already done so, generate a new key pair as follows.

$ ssh-keygen -t rsa

A pair of authentication keys are stored under .ssh in your home directory. You must copy the public key (a file ending in .pub) to the remote machine and add it to a text file called ~/.ssh/authorized_keys. Each local login accessing the remote login needs a public key in authorized_keys. If a key pair exists, SFTP automatically uses the keys instead of the Linux login password.
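
For example, to install a public key for the kburtch login on a remote machine, the .pub file can be copied and appended to authorized_keys. This is a sketch only, assuming the RSA key generated above and an existing ~/.ssh directory on the remote machine.

$ scp ~/.ssh/id_rsa.pub kburtch@our_web_site.com:
$ ssh kburtch@our_web_site.com "cat id_rsa.pub >> ~/.ssh/authorized_keys"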

Like FTP, SFTP needs a list of commands to carry out. SFTP includes a -b (batch) switch to specify a separate batch file containing the commands to execute. To use a convenient here file in your script, use a batch file called /dev/stdin.

The commands that SFTP understands are similar to FTP. For purposes of shell scripting, the basic transfer commands are the same. Transfers are always “binary.” There is a -v (verbose) switch, but it produces a lot of information. When the -b switch is used, SFTP shows the commands that are executed so the -v switch is not necessary for logging what happened during the transfer.

sftp -C -b /dev/stdin root@our_web_site.com <<!
cd /etc/httpd
get httpd.conf
!
STATUS=$?
if [ $STATUS -ne 0 ] ; then
   printf "%s
" "Error: SFTP transfer failed" >&2
   exit $STATUS
fi

The -C (compress) option attempts to compress the data for faster transfers.

For more information about ssh, sftp, and related programs, visit http://www.openssh.org/.

Verifying Files

Files sent by FTP or wget can be further checked by computing a checksum. The Linux cksum command counts the number of bytes in a file and prints a cyclic redundancy check (CRC) checksum, which can be used to verify that the file arrived complete and intact. The command uses a POSIX-compliant algorithm.

$ cksum orders.txt
491404265 21799 orders.txt
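
In a script, the checksum can be captured and compared with a value supplied by the sender. This fragment is a sketch; EXPECTED_CHECKSUM is an assumed variable holding the sender's value.

declare CHECKSUM

CHECKSUM=`cksum orders.txt | cut -d' ' -f1`
if [ "$CHECKSUM" != "$EXPECTED_CHECKSUM" ] ; then
   printf "%s\n" "Error: orders.txt failed the checksum test" >&2
   exit 192
fi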

There is also a Linux sum command that provides compatibility with older Unix systems, but be aware that cksum is incompatible with sum.

For greater checksum security, some distributions include an md5sum command to compute an MD5 checksum. The --status switch quietly tests the file, reporting the result only through the exit status. The --binary (or -b) switch treats the file as binary data as opposed to text. The --warn switch prints warnings about bad MD5 formatting. --check (or -c) checks the sums recorded in a file.

$ md5sum orders.txt
945eecc13707d4a23e27730a44774004  orders.txt
$ md5sum orders.txt > orderssum.txt
$ md5sum --check orderssum.txt
orders.txt: OK

Differences between two files can be pinpointed with the Linux cmp command.

$ cmp orders.txt orders2.txt
orders.txt orders2.txt differ: char 179, line 6

If two files don't differ, cmp prints nothing.

Splitting Large Files

Extremely large files can be split into smaller files using the Linux split command. Files can be split by bytes or by lines. The --bytes=s (or -b s) switch creates files of no more than s bytes. The --lines=s (or -l s) switch creates files of no more than s lines. The --line-bytes=s (or -C s) switch puts as many complete lines as possible into each file without exceeding s bytes. The size is a number with an optional b (512-byte blocks), k (kilobytes), or m (megabytes) suffix. The final parameter is the prefix to use for the new filenames.

$ split --bytes=10k huge_file.txt small_file
$ ls -l small_file*
-rw-rw-r--    1 kburtch  kburtch     10240 Aug 28 16:19 small_fileaa
-rw-rw-r--    1 kburtch  kburtch     10240 Aug 28 16:19 small_fileab
-rw-rw-r--    1 kburtch  kburtch      1319 Aug 28 16:19 small_fileac

You reassemble a split file with the Linux cat command. This command combines files and writes them to standard output. Be careful to combine the split files in the correct order.

$ cksum huge_file.txt
491404265 21799 huge_file.txt
$ cat small_fileaa small_fileab small_fileac > new_file
$ cksum new_file
491404265 21799 new_file

If the locale where the split occurred is the same as the locale where the file is being reassembled, it is safe to use wildcard globbing for the cat filenames.

The Linux csplit (context split) command splits a file at the points where a specific pattern appears.

The basic csplit pattern is a regular expression in slashes followed by an optional offset. The regular expression represents lines that will become the first line in the next new file. The offset is the number of lines to move forward or back from the matching line, which is by default zero. The pattern "/dogs/+1" will separate a file into two smaller files, the first ending with the first occurrence of the pattern dogs.

Quoting the pattern prevents it from being interpreted by Bash instead of the csplit command.

The --prefix=P (or -f P) switch sets the prefix for the new filenames. The --suffix-format=S (or -b S) switch writes the file numbers using the specified C printf format codes. The --digits=D (or -n D) switch specifies the number of digits used for file numbering. The default is two digits.

$ csplit --prefix "chairs" orders.txt "/Chair/"
107
485
$ ls -l chairs*
-rw-rw-r--    1 kburtch  kburtch       107 Oct  1 15:33 chairs00
-rw-rw-r--    1 kburtch  kburtch       485 Oct  1 15:33 chairs01
$ head -1 chairs01
Reclining Chair,1599.99,1,757

The first occurrence of the pattern Chair was in the line Reclining Chair.

Multiple patterns can be listed. A pattern delineated with percent signs (%) instead of with slashes indicates a portion of the file that should be ignored up to the indicated pattern. It can also have an offset. A number by itself indicates that particular line is to start the next new file. A number in curly braces repeats the last pattern that specific number of times; an asterisk in curly braces repeats it for all remaining occurrences of the pattern.

To split the orders.txt file into separate files, each beginning with the word Chair, use the all occurrences pattern.

$ csplit --prefix "chairs" orders.txt "/Chair/" "{*}"
107
222
23
179
61
$ ls -l chairs*
-rw-rw-r--    1 kburtch  kburtch       107 Oct  1 15:37 chairs00
-rw-rw-r--    1 kburtch  kburtch       222 Oct  1 15:37 chairs01
-rw-rw-r--    1 kburtch  kburtch        23 Oct  1 15:37 chairs02
-rw-rw-r--    1 kburtch  kburtch       179 Oct  1 15:37 chairs03
-rw-rw-r--    1 kburtch  kburtch        61 Oct  1 15:37 chairs04

The --elide-empty-files (or -z) switch doesn't save files that contain nothing. --keep-files (or -k) doesn't delete the generated files when an error occurs. The --quiet (or --silent or -q or -s) switch hides progress information.

csplit is useful in splitting large files containing repeated information, such as extracting individual orders sent from a customer as a single text file.

Tabs and Spaces

The Linux expand command converts Tab characters into spaces. The default is eight spaces, although you can change this with --tabs=n (or -t n) to n spaces. The --tabs switch can also use a comma-separated list of Tab stops.

$ printf "\tA\tTEST\n" > test.txt
$ wc test.txt
      1       2       8 test.txt
$ expand test.txt | wc
      1       2      21

The --initial (or -i) switch converts only leading Tabs on a line.

$ expand --initial test.txt | wc
      1       2      15

The corresponding unexpand command converts multiple spaces back into Tab characters. The default is eight spaces to a Tab, but you can use the --tabs=n switch to change this. By default, only leading spaces are converted. Use the --all (or -a) switch to consider all spaces on a line.

Use expand to remove tabs from a file before processing it.

Temporary Files

Temporary files, files that exist only for the duration of a script's execution, are traditionally named using the $$ special variable. This variable expands to the process ID number of the current script. Including this number in the name of a temporary file makes the name unique for each run of the script.

$ TMP="/tmp/reports.$$"
$ printf "%s
" "$TMP"
/tmp/reports.20629
$ touch "$TMP"

The drawback to this traditional approach is that the name of a temporary file is predictable. A hostile program can watch the process IDs of your scripts as they run and use that information to identify which temporary files they are using. A temporary file could be deleted or its data replaced in order to alter the behavior of your script.

For better security, or to create multiple files with unique names, Linux has the mktemp command. This command creates a temporary file and prints the name to standard output so it can be stored in a variable. Each time mktemp creates a new file, the file is given a unique name. The name is created from a filename template given to the command, which ends in the letter X repeated six times. mktemp replaces the six letters with a unique, random code to create a new filename.

$ TMP=`mktemp /tmp/reports.XXXXXX`
$ printf "%s\n" "$TMP"
/tmp/reports.3LnWVw
$ ls -l "$TMP"
-rw-------    1 kburtch  kburtch         0 Aug  1 14:34 reports.3LnWVw

In this case, the letters XXXXXX are replaced with the code 3LnWVw.

mktemp creates temporary directories with the -d (directories) switch. You can suppress error messages with the -q (quiet) switch.
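
A common companion pattern (not shown in the chapter's listings) is to remove the temporary file automatically with the trap built-in, so the file disappears even if the script exits early.

declare TMP
TMP=`mktemp /tmp/reports.XXXXXX` || exit 192

# Remove the temporary file automatically when the script exits
trap 'rm -f "$TMP"' EXIT

sort orders.txt > "$TMP"     # work with the temporary file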

Lock Files

When many scripts share the same files, there needs to be a way for one script to indicate to another that it has finished its work. This typically happens when scripts overseen by two different development teams need to share files, or when a shared file can be used by only one script at a time.

A simple method for synchronizing scripts is the use of lock files. A lock file is like a flag variable: The existence of the file indicates a certain condition, in this case, that the file is being used by another program and should not be altered.

Most Linux distributions include a directory called /var/lock, a standard location to place lock files.

Suppose the invoicing files can be accessed by only one script at a time. A lock file called invoices_lock can be created to ensure only one script has access.

declare -r INVOICES_LOCKFILE="/var/lock/invoices_lock"
while test -f "$INVOICES_LOCKFILE" ; do
  printf "Waiting for invoices to be printed...\n"
  sleep 10
done
touch "$INVOICES_LOCKFILE"

This script fragment checks every 10 seconds for the presence of invoices_lock. When the file disappears, the loop completes and the script creates a new lock file and proceeds to do its work. When the work is complete, the script should remove the lock file to allow other scripts to proceed.

If a lock file is not removed when one script is finished, it causes the next script to loop indefinitely. The while loop can be modified to use a timeout so that the script stops with an error if the invoice files are not accessible after a certain period of time.

declare -r INVOICES_LOCKFILE="/var/lock/invoices_lock"
declare -ir INVOICES_TIMEOUT=1800    # 30 minutes
declare -i TIME=0
TIME_STARTED=`date +%s`
while test -f "$INVOICES_LOCKFILE" ; do
  printf "Waiting for the invoices to be printed...\n"
  sleep 10
  TIME=`date +%s`
  TIME=TIME-TIME_STARTED
  if [ $TIME -gt $INVOICES_TIMEOUT ] ; then
     printf "Timed out waiting for the invoices to print\n"
     exit 1
  fi
done

The date command's %s format code returns the current time as seconds since the epoch. Subtracting one result from another gives the number of seconds elapsed between the two date commands. In this case, the timeout period is 1800 seconds, or 30 minutes.
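
Once the lock file disappears and the loop completes, the script should create its own lock, do its work, and release the lock when finished. A minimal sketch of that sequence follows; print_invoices is a hypothetical function standing in for the real work.

touch "$INVOICES_LOCKFILE"
# Release the lock even if the script is interrupted
trap 'rm -f "$INVOICES_LOCKFILE"' EXIT

print_invoices                       # hypothetical function doing the work

rm -f "$INVOICES_LOCKFILE"
trap - EXIT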

Named Pipes

Lock files are convenient when a small number of scripts share the same file. When too many scripts are waiting on a lock file, busy waiting becomes a problem: the computer spends a lot of time simply checking for the presence of the lock file instead of doing useful work. Fortunately, there are other ways to share information.

Two scripts can share data using a special kind of file called a named pipe. These pipes (also called FIFOs or queues) are files that can be read by one script while being written to by another. The effect is similar to the pipe operator (|), which forwards the results of one command as the input to another. Unlike a shell pipeline, the scripts using a named pipe run independently of one another, sharing only the pipe file between them. No lock files are required.

The mkfifo command creates a new named pipe.

$ mkfifo website_orders.fifo
$ ls -l website_orders.fifo
prw-rw-r--    1 kburtch  kburtch         0 May 22 14:14 website_orders.fifo

The file type p to the left of the ls output indicates this is a named pipe. If the ls filename typing option (-F) is used, the filename is followed by a vertical bar (|) to indicate a pipe.

The named pipe can be read like a regular file. Suppose, for example, you want to create a script to log incoming orders from the company Web site, as shown in Listing 11.5.

Listing 11.5. do_web_orders.sh

#!/bin/bash
#
# do_web_orders.sh: read a list of orders and show date read

shopt -s -o nounset

declare -rx SCRIPT=${0##*/}
declare -rx QUEUE="website_orders.fifo"
declare DATE
declare ORDER

if test ! -r "$QUEUE" ; then
   printf "%s
" "$SCRIPT:$LINENO: the named pipe is missing or 
not readable" >&2
   exit 192
fi

{
  while read ORDER; do
     DATE=`date`
     printf "%s: %s\n" "$DATE" "$ORDER"
  done
} < $QUEUE

printf "Program complete"
exit 0

In this example, the contents of the pipe are read one line at a time just as if it was a regular file.

When a script reads from a pipe and there's no data, it sleeps (or blocks) until more data becomes available. If the program writing to the pipe completes, the script reading the pipe sees this as the end of the file. The while loop will complete and the script will continue after the loop.

To send orders through the pipe, they must be printed or otherwise redirected to the pipe. To simulate a series of orders, write the orders file to the named pipe using the cat command. Even though the cat command is running in the background, it continues writing orders to the named pipe until all the lines have been read by the script.

$ cat orders.txt > website_orders.fifo &
$ sh do_web_orders.sh
Tue May 22 14:23:00 EDT 2001: Birchwood China Hutch,475.99,1,756
Tue May 22 14:23:00 EDT 2001: Bookcase Oak Veneer,205.99,1,756
Tue May 22 14:23:00 EDT 2001: Small Bookcase Oak Veneer,205.99,1,756
Tue May 22 14:23:00 EDT 2001: Reclining Chair,1599.99,1,757
Tue May 22 14:23:00 EDT 2001: Bunk Bed,705.99,1,757
Tue May 22 14:23:00 EDT 2001: Queen Bed,925.99,1,757
Tue May 22 14:23:00 EDT 2001: Two-drawer Nightstand,125.99,1,756
Tue May 22 14:23:00 EDT 2001: Cedar Toy Chest,65.99,1,757
Tue May 22 14:23:00 EDT 2001: Six-drawer Dresser,525.99,1,757
Tue May 22 14:23:00 EDT 2001: Pine Round Table,375.99,1,757
Tue May 22 14:23:00 EDT 2001: Bar Stool,45.99,1,756
Tue May 22 14:23:00 EDT 2001: Lawn Chair,55.99,1,756
Tue May 22 14:23:00 EDT 2001: Rocking Chair,287.99,1,757
Tue May 22 14:23:00 EDT 2001: Cedar Armoire,825.99,1,757
Tue May 22 14:23:00 EDT 2001: Mahogany Writing Desk,463.99,1,756
Tue May 22 14:23:00 EDT 2001: Garden Bench,149.99,1,757
Tue May 22 14:23:00 EDT 2001: Walnut TV Stand,388.99,1,756
Tue May 22 14:23:00 EDT 2001: Victorian-style Sofa,1225.99,1,757
Tue May 22 14:23:00 EDT 2001: Chair - Rocking,287.99,1,757
Tue May 22 14:23:00 EDT 2001: Grandfather Clock,2045.99,1,756

Using tee, a program can write to two or more named pipes simultaneously.
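
For example, the same orders could be fed to two reader scripts at once. The pipe names below are illustrative, and the writer blocks until both pipes are being read.

$ mkfifo log_orders.fifo bill_orders.fifo
$ cat orders.txt | tee log_orders.fifo > bill_orders.fifo &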

Because a named pipe is not a regular file, commands such as grep, head, or tail can behave unexpectedly or block indefinitely waiting for information on the pipe to appear or complete. If in doubt, verify that the file is not a pipe before using these commands.

Process Substitution

Sometimes the vertical bar pipe operators cannot be used to link a series of commands together. When a command in the pipeline does not use standard input, or when it uses two sources of input, a pipeline cannot be formed. To create pipes when normal pipelines do not work, Bash uses a special feature called process substitution.

When a command is enclosed in <(...), Bash runs the command separately in a subshell, redirecting the results to a temporary named pipe instead of standard input. In place of the command, Bash substitutes the name of a named pipe file containing the results of the command.

Process substitution can be used anywhere a filename is normally used. For example, the Linux grep command, a file-searching command, can search a file for a list of strings. A temporary file can be used to search a log file for references to the files in the current directory.

$ ls -1 > temp.txt
$ grep -f temp.txt /var/log/nightrun_log.txt
Wed Aug 29 14:18:38 EDT 2001 invoice_error.txt deleted
$ rm temp.txt

A pipeline cannot be used to combine these commands because the list of files is being read from temp.txt, not standard input. However, these two commands can be rewritten as a single command using process substitution in place of the temporary filename.

$ grep -f <(ls -1) /var/log/nightrun_log.txt
Wed Aug 29 14:18:38 EDT 2001 invoice_error.txt deleted

In this case, the results of ls -1 are written to a temporary pipe. grep reads the list of files from the pipe and matches them against the contents of the nightrun_log.txt file. The fact that Bash replaces the ls command with the name of a temporary pipe can be checked with a printf statement.

$ printf "%s
" <(ls -1)
/dev/fd/63

Bash replaces -f <(ls -1) with -f /dev/fd/63. In this case, the pipe is opened as file descriptor 63. The left angle bracket (<) indicates that the temporary file is read by the command using it. Likewise, a right angle bracket (>) indicates that the temporary pipe is written to instead of read.
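
The output form can be sketched with tee, which duplicates a directory listing into a command through >(...); the line_count.txt name is only illustrative.

$ ls -1 | tee >(wc -l > line_count.txt)

Here tee writes the listing to standard output and also into the temporary pipe, where wc -l records the number of entries in line_count.txt.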

Opening Files

Files can be read by piping their contents to a command, or by redirecting the file as standard input to a command or group of commands. This is the easiest way to see what a text file contains, but it has two drawbacks. First, only one file can be examined at a time. Second, it prevents the script from interacting with the user because the read command reads from the redirected file instead of the keyboard.

Instead of piping or redirection, files can be opened for reading by redirecting the file to a descriptor number with the exec command, as shown in Listing 11.6.

Listing 11.6. open_file.sh

#!/bin/bash
#
# open_file.sh: print the contents of orders.txt

shopt -s -o nounset

declare LINE

exec 3< orders.txt
while read LINE <&3 ; do
   printf "%s
" "$LINE"
done
exit 0

In this case, the file orders.txt is redirected to file descriptor 3. Descriptor 3 is the lowest number that programs can normally use. File descriptor 0 is standard input, file descriptor 1 is standard output, and file descriptor 2 is standard error.

The read command receives its input from descriptor 3 (orders.txt), which is being redirected by <. read can also read from a particular file descriptor using the Korn shell -u switch.

If the file opened with exec does not exist, Bash reports a "bad file number" error. The file descriptor must also be a literal number, not a variable.

If exec is not used, the file descriptor can still be opened but it cannot be reassigned.

3< orders.txt
3< orders2.txt

In this example, file descriptor 3 is orders.txt. The second line has no effect because descriptor 3 is already opened. If exec is used, the second line re-opens descriptor 3 as orders2.txt.

To save file descriptors, exec can copy a descriptor to a second descriptor. To make input file descriptor 4 the same file as file descriptor 3, do this

exec 4<&3

Now descriptors 3 and 4 refer to the same file and can be used interchangeably. Descriptor 3 can be used to open another file and can be restored to its original value by copying it back from descriptor 4. If descriptor 4 is omitted, Bash assumes that you want to change standard input (descriptor 0).

You can move a file descriptor by appending a minus sign to it. This closes the original file after the descriptor was copied.

exec 4<&3-

You can likewise duplicate output file descriptors with >& and move them by appending a minus sign. The default output is standard output (descriptor 1).
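
The following fragment is a sketch of the save-and-restore idiom for output descriptors; the run.log filename is illustrative.

exec 4>&1                      # save standard output in descriptor 4
exec 1>run.log                 # redirect standard output to a log file
printf "%s\n" "this line goes to run.log"
exec 1>&4                      # restore standard output
exec 4>&-                      # close the saved descriptor
printf "%s\n" "this line goes to the screen"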

To open a file for writing, use the output redirection symbol (>).

exec 3<orders.txt
exec 4>log.out
while read LINE <&3 ; do
  printf "%s
" "$LINE" >&4
done

The <> symbol opens a file for both input and output.

exec 3<>orders.txt

The reading or writing proceeds sequentially from the beginning of the file. Writing to the file overwrites its contents: As long as the characters being overwritten are the same length as the original characters, the new characters replace the old. If the next line in a file is dog, for example, writing the line cat over dog replaces the word dog. However, if the next line in the file is horse, writing cat creates two lines—the line cat and the line se. The linefeed character following cat overwrites the letter r. The script will now read the line se.

<> has limited usefulness with regular files because there is no way to “back up” and rewrite something that was just read. You can only overwrite something that you are about to read next.

The script in Listing 11.7 reads through a file and appends a "Processed on" message to the end of the file.

Listing 11.7. open_files2.sh

#!/bin/bash
#
# open_files2.sh

shopt -o -s nounset

declare LINE

exec 3<>orders.txt
while read LINE <&3 ; do
  printf "%s
" "$LINE"
done
printf "%s
" "Processed on "'date' >&3
exit 0

<> is especially useful for socket programming, which is discussed in Chapter 16, “Network Programming.”

As files can be opened, so they can also be closed. An input file descriptor can be closed with <&-. Be careful to include a file descriptor because, without one, this closes standard input. An output file descriptor can be closed with >&-. Without a descriptor, this closes standard output.

As a special Bash convention, file descriptors can be referred to by a pathname. A path in the form of /dev/fd/n refers to file descriptor n. For example, standard output is /dev/fd/1. Using this syntax, it is possible to refer to open file descriptors when running Linux commands.

$ exec 4>results.out
$ printf "%s
" "Send to fd 4 and standard out" | tee /dev/fd/4
Send to fd 4 and standard out
$ exec 4>&-
$ cat results.out
Send to fd 4 and standard out

Using head and tail

The Linux head command returns the first lines contained in a file. By default, head prints the first 10 lines. You can specify a specific number of lines with the --lines=n (or -n n) switch.

$ head --lines=5 orders.txt
Birchwood China Hutch,475.99,1,756
Bookcase Oak Veneer,205.99,1,756
Small Bookcase Oak Veneer,205.99,1,756
Reclining Chair,1599.99,1,757
Bunk Bed,705.99,1,757

You can abbreviate the --lines switch to a minus sign and the number of lines.

$ head -3 orders.txt
Birchwood China Hutch,475.99,1,756
Bookcase Oak Veneer,205.99,1,756
Small Bookcase Oak Veneer,205.99,1,756

The number of lines can be followed by a c for characters, an l for lines, a k for kilobytes, or an m for megabytes. The --bytes (or -c) switch prints the number of bytes you specify.

$ head -9c orders.txt
Birchwood
$ head --bytes=9 orders.txt
Birchwood

The Linux tail command displays the final lines contained in a file. Like head, the number of lines or bytes can be followed by a c for characters, an l for lines, a k for kilobytes, or an m for megabytes.

The switches are similar to the head command. The --bytes=n (or -c) switch prints the number of bytes you specify. The --lines=n (or -n) switch prints the number of lines you specify.

$ tail -3 orders.txt
Walnut TV Stand,388.99,1,756
Victorian-style Sofa,1225.99,1,757
Grandfather Clock,2045.99,1,756

Combining tail and head in a pipeline, you can display any line or range of lines.

$ head -5 orders.txt | tail -1
Bunk Bed,705.99,1,757
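
This combination is easily wrapped in a small function. The following sketch is not from the chapter's listings; the show_line name is illustrative.

function show_line {
   # show_line LINE FILE -- print line number LINE of FILE
   head -"$1" "$2" | tail -1
}

Running show_line 5 orders.txt prints the same Bunk Bed line shown above.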

If the number of lines is preceded by a plus sign instead of a minus sign, tail counts that number of lines from the start of the file and prints the remainder. This is a feature of tail, not the head command.

$ tail +17 orders.txt
Walnut TV Stand,388.99,1,756
Victorian-style Sofa,1225.99,1,757
Grandfather Clock,2045.99,1,756

When using head or tail on arbitrary files in a script, always check to make sure that the file is a regular file to avoid unpleasant surprises.

File Statistics

The Linux wc (word count) command provides statistics about a file. By default, wc shows the size of the file in lines, words, and characters. To make wc useful in scripts, switches must be used to return a single statistic.

The --bytes (or --chars or -c) switch returns the file size, the same value as the file size returned by statftime.

$ wc --bytes invoices.txt
  20411 invoices.txt

To use wc in a script, direct the file through standard input so that the filename is suppressed.

$ wc --bytes < status_log.txt
  57496

The --lines (or -l) switch returns the number of lines in the file. That is, it counts the number of line feed characters.

$ wc --lines < status_log.txt
   1569

The --max-line-length (or -L) switch returns the length of the longest line. The --words (or -w) switch counts the number of words in the file.

wc can be used with variables when their values are printed into a pipeline.

$ declare -r TITLE="Annual Grain Yield Report"
$ printf "%s
" "$TITLE" | wc —words
      4

Cutting

The Linux cut command removes substrings from all lines contained in a file.

The --fields (or -f) switch prints a section of a line marked by a specific character. The --delimiter (or -d) switch chooses the character. To use a space as a delimiter, it must be escaped with a backslash or enclosed in quotes.

$ declare -r TITLE="Annual Grain Yield Report"
$ printf "%s
" "$TITLE" | cut -d' ' -f2
Grain

In this example, the delimiter is a space and the second field marked by a space is Grain. When piping printf output into cut, always make sure a line feed character is printed; otherwise, cut returns an empty string.

Multiple fields are indicated with commas and ranges as two numbers separated by a minus sign (-).

$ printf "%s
" "$TITLE" | cut -d' ' -f 2,4
Grain Report

In the output, the selected fields are separated by the delimiter character. To use a different delimiter character when displaying the results, use the --output-delimiter switch.
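
For example, the fields selected above can be rejoined with a colon instead of a space.

$ printf "%s\n" "$TITLE" | cut -d' ' -f 2,4 --output-delimiter=":"
Grain:Report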

The --characters (or -c) switch prints the characters at the specified positions. This is similar to the dollar sign substring expressions, but any character position or range of positions can be specified. The --bytes (or -b) switch works identically but is provided for future support of multi-byte international characters.

$ printf "%s
" "$TITLE" | cut —characters 1,3,6-8
Anl G

The --only-delimited (or -s) switch ignores lines in which the delimiter character doesn't appear. This is an easy way to skip a title or other notes at the beginning of a data file.

When used on a file with multiple lines, cut cuts each line.

$ cut -d, -f1 < orders.txt | head -3
Birchwood China Hutch
Bookcase Oak Veneer
Small Bookcase Oak Veneer

The script in Listing 11.8 adds the quantity fields in orders.txt.

Listing 11.8. cut_demo.sh

#!/bin/bash
#
# cut_demo.sh: compute the total quantity from orders.txt

shopt -o -s nounset

declare -i QTY
declare -ix TOTAL_QTY=0

cut -d, -f3 orders.txt | {
  while read QTY ; do
    TOTAL_QTY=TOTAL_QTY+QTY
  done
  printf "The total quantity is %d
" "$TOTAL_QTY"
}
exit 0

Pasting

The Linux paste command combines lines from two or more files into a single line. With two files, paste writes to standard output the first line of the first file, a Tab character, and the first line from the second file, and then continues with the second line until all the lines have been written out. If one file is shorter than the other, blank lines are used for the missing lines.

The --delimiters (or -d) switch is a list of one or more delimiters to use in place of a Tab. The paste command cycles through the list if it needs more delimiters than are provided in the list, as shown in Listing 11.9.

Listing 11.9. two_columns.sh

#!/bin/bash
#
# two_columns.sh

shopt -s -o nounset

declare -r ORDERS="orders.txt"
declare -r COLUMN1="column1.txt"
declare -r COLUMN2="column2.txt"
declare -i LINES

LINES=`wc -l < "$ORDERS"`
LINES=LINES/2

head -$LINES < "$ORDERS" > "$COLUMN1" 

LINES=LINES+1
tail +$LINES < "$ORDERS" > "$COLUMN2"

paste --delimiters="|" "$COLUMN1" "$COLUMN2"

rm "$COLUMN1"
rm "$COLUMN2"
exit 0

Running this script, the contents of orders.txt are separated into two columns, delineated by a vertical bar.

$ sh two_columns.sh
Birchwood China Hutch,475.99,1,756|Bar Stool,45.99,1,756
Bookcase Oak Veneer,205.99,1,756|Lawn Chair,55.99,1,756
Small Bookcase Oak Veneer,205.99,1,756|Rocking Chair,287.99,1,757
Reclining Chair,1599.99,1,757|Cedar Armoire,825.99,1,757
Bunk Bed,705.99,1,757|Mahogany Writing Desk,463.99,1,756
Queen Bed,925.99,1,757|Garden Bench,149.99,1,757
Two-drawer Nightstand,125.99,1,756|Walnut TV Stand,388.99,1,756
Cedar Toy Chest,65.99,1,757|Victorian-style Sofa,1225.99,1,757
Six-drawer Dresser,525.99,1,757|Chair - Rocking,287.99,1,757
Pine Round Table,375.99,1,757|Grandfather Clock,2045.99,1,756

Suppose you had a file called order1.txt containing one item from orders.txt with each of its fields on a separate line.

Birchwood China Hutch
475.99
1
756

The paste --serial (-s) switch pastes all the lines of each file into a single line, as opposed to combining a single line from each file one line at a time. This switch recombines the separate fields into a single line.

$ paste --serial --delimiters="," order1.txt
Birchwood China Hutch,475.99,1,756

To merge the lines of two or more files so that the lines follow one another, use the sort command with the -m switch.
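
For example, assuming two files that are already sorted (the names part1.txt and part2.txt are illustrative):

$ sort -m part1.txt part2.txt > merged.txt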

Columns

Columns created with the paste command aren't suitable for all applications. For pretty displays, the Linux column command creates fixed-width columns. The columns are fitted to the size of the screen as determined by the COLUMNS environment variable, or to a specific row width using the -c switch.

$ column < orders.txt
Birchwood China Hutch,475.99,1,756      Bar Stool,45.99,1,756
Bookcase Oak Veneer,205.99,1,756        Lawn Chair,55.99,1,756
Small Bookcase Oak Veneer,205.99,1,756  Rocking Chair,287.99,1,757
Reclining Chair,1599.99,1,757           Cedar Armoire,825.99,1,757
Bunk Bed,705.99,1,757                   Mahogany Writing Desk,463.99,1,756
Queen Bed,925.99,1,757                  Garden Bench,149.99,1,757
Two-drawer Nightstand,125.99,1,756      Walnut TV Stand,388.99,1,756
Cedar Toy Chest,65.99,1,757             Victorian-style Sofa,1225.99,1,757
Six-drawer Dresser,525.99,1,757         Chair - Rocking,287.99,1,757
Pine Round Table,375.99,1,757           Grandfather Clock,2045.99,1,756

The -t switch creates a table from items delimited by a character specified by the -s switch.

$ column -s ',' -t < orders.txt | head -5
Birchwood China Hutch      475.99   1  756
Bookcase Oak Veneer        205.99   1  756
Small Bookcase Oak Veneer  205.99   1  756
Reclining Chair            1599.99  1  757
Bunk Bed                   705.99   1  757

The table fill-order can be swapped with the -x switch.

Folding

The Linux fold command ensures that a line is no longer than a certain number of characters. If a line is too long, a line break is inserted. fold wraps at 80 characters by default, but the --width=n (or -w) switch folds at n characters instead. The --spaces (or -s) switch folds at the nearest space to preserve words. The --bytes (or -b) switch counts a Tab character as one character instead of expanding it.

$ head -3 orders.txt | cut -d, -f 1
Birchwood China Hutch
Bookcase Oak Veneer
Small Bookcase Oak Veneer
$ head -3 orders.txt | cut -d, -f 1 | fold --width=10
Birchwood
China Hutc
h
Bookcase O
ak Veneer
Small Book
case Oak V
eneer
$ head -3 orders.txt | cut -d, -f 1 | fold --width=10 --spaces
Birchwood
China
Hutch
Bookcase
Oak Veneer
Small
Bookcase
Oak Veneer

Joining

The Linux join command combines two files together. join examines one line at a time from each file. If a certain segment of the lines match, they are combined into one line. Only one instance of the same segment is printed. The files are assumed to be sorted in the same order.

The line segment (or field) is chosen using three switches. The -1 switch selects the field number from the first file. The -2 switch selects the field number from the second. The -t switch specifies the character that separates one field from another. If these switches aren't used, join separates fields by spaces and examines the first field on each line.

Suppose the data in the orders.txt file was separated into two files, one with the pricing information (orders1.txt) and one with the quantity and account information (orders2.txt).

$ cat orders1.txt
Birchwood China Hutch,475.99
Bookcase Oak Veneer,205.99
Small Bookcase Oak Veneer,205.99
Reclining Chair,1599.99
Bunk Bed,705.99
$ cat orders2.txt
Birchwood China Hutch,1,756
Bookcase Oak Veneer,1,756
Small Bookcase Oak Veneer,1,756
Reclining Chair,1,757
Bunk Bed,1,757

To join these two files together, use a comma as a field separator and compare field 1 of the first file with field 1 of the second.

$ join -1 1 -2 1 -t, orders1.txt orders2.txt
Birchwood China Hutch,475.99,1,756
Bookcase Oak Veneer,205.99,1,756
Small Bookcase Oak Veneer,205.99,1,756
Reclining Chair,1599.99,1,757
Bunk Bed,705.99,1,757

If either file contains a line whose join field does not appear in the other file, that line is discarded. Lines are joined only if matching fields are found in both files. To print unpaired lines, use -a 1 to print the unique lines in the first file or -a 2 to print the unique lines in the second file. The lines are printed as they appear in the files.

The sense of matching can be reversed with the -v switch. -v 1 prints the unique lines in the first file and -v 2 prints the unique lines in the second file.

The tests are case-insensitive when the --ignore-case (or -i) switch is used.

The fields can be rearranged using the -o (output) switch. Use a comma-separated field list to order the fields. A field is specified using the file number (1 or 2), a period, and the field number from that file. A zero refers to the join field itself.

$ join -1 1 -2 1 -t, -o "1.2,2.3,2.2,0" orders1.txt orders2.txt
475.99,756,1,Birchwood China Hutch
205.99,756,1,Bookcase Oak Veneer
205.99,756,1,Small Bookcase Oak Veneer
1599.99,757,1,Reclining Chair
705.99,757,1,Bunk Bed

Merging

The merge command performs a three-way file merge. This is typically used to merge changes to one file from two separate sources. The merge is performed on a line-by-line basis. If there is a conflicting modification, merge displays a warning.

For easier reading, the -L (label) switch can be used to specify a title for the file, instead of reporting conflicts using the filename. This switch can be repeated three times for each of the three files.

For example, suppose there are three sets of orders for ice cream. The original set of orders (file1.txt) is as follows:

1 quart vanilla
2 quart chocolate

These orders have been modified by two people. The Barrie store's copy (file2.txt) now contains the following:

1 quart vanilla
1 quart strawberry
2 quart chocolate

And the Orillia store's copy (file3.txt) is as follows:

1 quart vanilla
2 quart chocolate
4 quart butter almond

The merge command reassembles the three files into one file.

$ merge -L "Barrie Store" -L "Original Orders" -L "Orillia Store" file2.txt
file1.txt file3.txt

This changes file2.txt so that it contains:

1 quart vanilla
1 quart strawberry
2 quart chocolate
4 quart butter almond

However, if the butter almond and strawberry orders were both added as the third line, merge reports a conflict:

$ merge -L "Barrie Store" -L "Original Orders" -L "Orillia Store" file2.txt
file1.txt file3.txt
merge: warning: conflicts during merge

file2.txt will contain the details of the conflict:

<<<<<<< Barrie Store
1 quart strawberry

=======
4 quart butter almond
>>>>>>> Orillia Store

If there are no problems merging, merge returns a zero exit status.

The -q (quiet) switch suppresses conflict warnings. -p (print) writes the output to standard output instead of overwriting the original file. The -A switch reports conflicts in the diff3 -A format.

Reference Section

type Command Switches

  • -a—. Shows all locations of the command

  • -p—. Shows the pathname of the command

  • -t—. Indicates the type of command

file Command Switches

  • -b—. Brief mode

  • -c—. Displays magic file output

  • -f file—. Reads a list of files to process from file

  • -i—. Shows the MIME type

  • -L—. Follows symbolic links

  • -m list—. Colon-separated list of magic files

  • -n—. Flushes the output after each file

  • -s—. Allows block or character special files

  • -v—. Version

  • -z—. Examines compressed files

stat Command Switches

  • -l—. Shows information about a link

  • -f—. Shows information about the file system on which the file resides

statftime Command Format Codes

  • %%—. Percent character

  • %_A—. Uses file last access time

  • %_a—. Filename (no suffix)

  • %_C—. Uses file inode change time

  • %_d—. Device ID

  • %_e—. Seconds elapsed since epoch

  • %_f—. File system type

  • %_g—. Group ID (gid) number

  • %_h—. Three-digit hash code of path

  • %_i—. Inode number

  • %_L—. Uses current (local) time

  • %_l—. Number of hard links

  • %_M—. Uses file last modified time

  • %_m—. Type/attribute/access bits

  • %_n—. Filename

  • %_r—. Rdev ID (char/block devices)

  • %_s—. File size (bytes)

  • %_U—. Uses current (UTC) time

  • %_u—. User ID (uid)

  • %_z—. Sequence number (1,2,...)

  • %A—. Full weekday name

  • %a—. Abbreviated weekday name

  • %B—. Full month name

  • %b—. Abbreviated month name

  • %C—. Century number

  • %c—. Standard format

  • %D—. mm/dd/yy

  • %d—. Day (zero filled)

  • %e—. Day (space filled)

  • %H—. Hour (24-hr clock)

  • %I—. Hour (12-hr clock)

  • %j—. Day (1..366)

  • %M—. Minute

  • %m—. Month

  • %n—. Line feed (newline) character

  • %P—. am/pm

  • %p—. AM/PM

  • %r—. hh:mm:ss AM/PM

  • %S—. Second

  • %T—. hh:mm:ss (24-hr)

  • %t—. Tab character

  • %U—. Week number (Sunday)

  • %V—. Week number (Monday)

  • %W—. Week number (Monday)

  • %w—. Weekday (Sunday)

  • %X—. Current time

  • %x—. Current date

  • %Y—. Year

  • %y—. Year (two digits)

  • %z—. Time zone

wget Command Switches

  • —accept L (or -A list)—. Comma-separated lists of suffixes and patterns to accept

  • —append-output log (or -a log)—. Like —output-file, but appends instead of overwriting

  • —background (or -b)—. Runs in the background as if it was started with &

  • —continue (or -c)—. Resumes a terminated download

  • —cache=O (or -C O)—. Doesn't return cached Web pages when “off”

  • —convert-links (or -k)—. Converts document links to reflect local directory

  • —cut-dirs=N—. Ignores the first N directories in a URL pathname

  • —delete-after—. Deletes downloaded files to “preload” caching servers

  • —directory-prefix=P (or -P P)—. Saves files under P instead of current directory

  • —domains list (or -D list)—. Accepts only given host domains

  • —dot-style=S—. Progress information can be displayed as default, binary, computer, mega, or micro

  • —exclude-directories=list (or -X list)—. Directories to reject when downloading

  • —exclude-domains list—. Rejects given host domains

  • —execute cmd (or -e cmd)—. Runs a resource file command

  • —follow-ftp—. Downloads FTP links in HTML documents

  • —force-directories (or -x)—. Always creates directories for the hostname when saving files

  • —force-html (or -F)—. Treats —input-file as an HTML document even if it doesn't look like one

  • —glob=O (or -g O)—. Allows file globbing in FTP URL filenames when “on”

  • —header=H—. Specifies an HTTP header to send to the Web server

  • —http-passwd=P—. Specifies a password (instead of in the URL)

  • —http-user=U—. Specifies a username (instead of in the URL)

  • —ignore-length—. Ignores bad document lengths returned by Web servers

  • —include-directories=list (or -I list)—. Directories to accept when downloading

  • —input-file=F (or -i F)—. Reads the URLs to get from the given file; it can be an HTML document

  • —level=D (or -l D)—. Maximum recursion level (default is 5)

  • —mirror (or -m)—. Enables recursion, infinite levels, time stamping, and keeping a .listing file

  • —no-clobber (or -nc)—. Doesn't replace existing files

  • —no-directories (or -nd)—. Saves all files in the current directory

  • —no-host-directories (or -nH)—. Never creates directories for the hostname

  • —no-host-lookup (or -nh)—. Disables DNS lookup of most hosts

  • —no-parent (or -np)—. Only retrieves files below the parent directory

  • —non-verbose (or -nv)—. Shows some progress information, but not all

  • —output-document=F (or -O F)—. Creates one file F containing all files; if -, all files are written to standard output

  • —output-file log (or -o log)—. Records all error messages to the given file

  • —passive-ftp—. Uses “passive” retrieval, useful when wget is behind a firewall

  • —proxy=O (or -Y O)—. Turns proxy support “on” or “off”

  • —proxy-passwd=P—. Specifies a password for a proxy server

  • —proxy-user=U—. Specifies a username for a proxy server

  • —quiet (or -q)—. Suppresses progress information

  • —quota=Q (or -Q Q)—. Stops downloading once the retrieved files exceed Q bytes; can also specify k for kilobytes or m for megabytes; inf disables the quota

  • —recursive (or -r)—. Retrieves files recursively

  • —reject L (or -R list)—. Comma-separated lists of suffixes and patterns to reject

  • —relative (or -L)—. Follows only relative links, ignoring all absolute links

  • —retr-symlinks—. Treats remote symbolic links as new files

  • —save-headers (or -s)—. Saves the Web server headers in the document file

  • —server-response (or -S)—. Shows server responses

  • —span-hosts (or -H)—. Spans across hosts when recursively retrieving

  • —spider—. Checks for the presence of a file, but doesn't download it

  • —timeout=S (or -T S)—. Network socket timeout in seconds; 0 for none

  • —timestamping (or -N)—. Only gets new files

  • —tries=N (or -t N)—. Tries at most N times; inf tries forever

  • —user-agent=U (or -U U)—. Specifies a different user agent than wget to access servers that don't allow wget

  • —verbose (or -v)—. Shows all progress information (the default)

  • —wait=S (or -w S)—. Pauses S seconds between retrievals; can also specify m minutes, h hours, and d days
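
For example, the following commands might be used to quietly download a single file, resuming any partial copy, and then mirror part of a site two levels deep into a local directory; www.example.com and the paths are placeholders:

$ wget -q -c http://www.example.com/pub/orders.txt             # hypothetical URL
$ wget -r -l 2 -np -P mirror http://www.example.com/catalog/   # hypothetical URL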

ftp Command Switches

  • -A—. Active mode ftp (does not try passive mode)

  • -a—. Uses an anonymous login

  • -d—. Enables debugging

  • -e—. Disables command-line editing

  • -f—. Forces a cache reload for transfers that go through proxies

  • -g—. Disables filename globbing

  • -I—. Turns off interactive prompting during multiple file transfers

  • -n—. No auto-login upon initial connection

  • -o file—. When auto-fetching files, saves the contents in file

  • -p—. Uses passive mode (the default)

  • -P port—. Connects to the specified port instead of the default port

  • -r sec—. Retries connecting every sec seconds

  • -R—. Restarts all non-proxied auto-fetches

  • -t—. Enables packet tracing

  • -T dir,max[,inc]—. Sets the maximum transfer rate for direction dir to max bytes per second, with an optional increment of inc

  • -v—. Enables verbose messages (default for terminals)

  • -V—. Disables verbose messages and the progress bar
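
For example, a script might auto-fetch a file anonymously with a command such as the following; ftp.example.com and the path are placeholders, and the exact switches available depend on the ftp client installed:

$ ftp -a -o prices.txt ftp://ftp.example.com/pub/prices.txt   # hypothetical URL and output file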

csplit Command Switches

  • —suffix-format=FMT (or -b FMT)—. Uses printf formatting FMT instead of %02d

  • —prefix=PFX (or -f PFX)—. Uses prefix PFX instead of xx

  • —keep-files (or -k)—. Does not remove output files on errors

  • —digits=D (or -n D)—. Uses specified number of digits instead of two

  • —quiet (or —silent or -s)—. Does not print progress information

  • —elide-empty-files (or -z)—. Removes empty output files
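
For example, the following command should split the chapter's orders.txt at every line containing "Cedar", keeping the pieces even if an error occurs; the part_ prefix is an arbitrary choice:

$ csplit -k -f part_ orders.txt '/Cedar/' '{*}'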

expand Command Switches

  • —initial (or -i)—. Does not convert Tab characters after non-whitespace characters

  • —tabs=N (or -t N)—. Changes tabs to N characters apart, not eight

  • —tabs=L (or -t L)—. Uses comma-separated list of explicit Tab positions

unexpand Command Switches

  • —all (or -a)—. Converts all whitespace, instead of initial whitespace

  • —tabs=N (or -t N)—. Changes Tabs to N characters apart, not eight

  • —tabs=L (or -t L)—. Uses comma-separated list of explicit Tab positions
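
For example, the Tab characters in a hypothetical shell script can be rewritten as four-column tab stops with expand and later converted back to Tabs with unexpand:

$ expand -t 4 report.sh > report_spaces.sh          # report.sh is a hypothetical file
$ unexpand -t 4 report_spaces.sh > report_tabs.sh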

mktemp Command Switches

  • -d—. Makes a directory instead of a file

  • -q—. Fails silently if an error occurs

  • -u—. Operates in “unsafe” mode; creates the file and then deletes it to allow the script to create it later
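
For example, a script can create a scratch file and a scratch directory and clean them up when it is finished; the template shown here is only one possible choice:

$ TMPFILE=$(mktemp /tmp/orders.XXXXXX)       # hypothetical template
$ WORKDIR=$(mktemp -d /tmp/orders.XXXXXX)
$ rm -f "$TMPFILE" ; rmdir "$WORKDIR"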

head Command Switches

  • —bytes=B (or -c B)—. Prints the first B bytes

  • —lines=L (or -n L)—. Prints the first L lines instead of the first 10

  • —quiet (or —silent or -q)—. Never prints headers with filenames
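
For example, the following commands print the first three orders and the first 20 bytes of the chapter's orders.txt, respectively:

$ head -n 3 orders.txt
$ head -c 20 orders.txt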

tail Command Switches

  • —retry—. Keeps trying to open a file

  • —bytes=N (or -c N)—. Outputs the last N bytes

  • —follow[=name|descriptor] (or -f)—. Outputs appended data as the file grows, following it by name or by file descriptor; plain -f follows by descriptor

  • —lines=N (or -n N)—. Outputs the last N lines, instead of the last 10

  • —max-unchanged-stats=N—. With —follow=name, reopens a file whose size has not changed after N checks (default is 5) to see whether it has been deleted or renamed

  • —max-consecutive-size-changes=N—. After N consecutive size changes (default is 200), checks that the filename still refers to the same inode

  • —pid=PID—. Terminates after process ID PID dies

  • —quiet (or —silent or -q)—. Never outputs headers with filenames

  • —sleep-interval=S (or -s S)—. Sleeps S seconds between iterations
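
For example, the following commands print the last two orders in orders.txt and then watch a hypothetical log file as another process appends to it (interrupt with Ctrl-C):

$ tail -n 2 orders.txt
$ tail -f /tmp/upload.log      # hypothetical log file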

wc Command Switches

  • —bytes (—chars or -c)—. Prints the byte counts

  • —lines (or -l)—. Prints the line feed (newline) counts

  • —max-line-length (or -L)—. Prints the length of the longest line

  • —words (or -w)—. Prints the word counts
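
For example, to count the number of orders (lines) in orders.txt and to find its longest line:

$ wc -l orders.txt
$ wc -L orders.txt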

cut Command Switches

  • —bytes=L (or -b L)—. Shows only these listed bytes

  • —characters=L (or -c L)—. Shows only these listed characters

  • —delimiter=D (or -d D)—. Uses delimiter D instead of a Tab character for the field delimiter

  • —fields=L (or -f L)—. Shows only these listed fields

  • —only-delimited (or -s)—. Does not show lines without delimiters

  • —output-delimiter=D—. Uses delimiter D as the output delimiter
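
For example, because orders.txt is comma-delimited, the item names and prices can be extracted, or the name and supplier number printed with a space between them:

$ cut -d, -f1,2 orders.txt
$ cut -d, -f1,4 --output-delimiter=' ' orders.txt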

paste Command Switches

  • —delimiters=L (or -d L)—. Uses character list L instead of Tab characters

  • —serial (or -s)—. Pastes one file at a time instead of in parallel
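
For example, two hypothetical one-column files, names.txt and prices.txt, could be recombined into comma-delimited lines:

$ paste -d, names.txt prices.txt    # names.txt and prices.txt are hypothetical files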

join Command Switches

  • -1 F—. Joins on field F of file 1

  • -2 F—. Joins on field F of file 2

  • -a N—. Also prints unpairable lines from file N, where N is 1 or 2

  • -e s—. Replaces missing input fields with string s

  • —ignore-case (or -i)—. Ignores differences in case when comparing fields

  • -o F—. Obeys format F while constructing output line

  • -t C—. Uses character C as input and output field separator

  • -v N—. Prints only the unpairable lines from file N, suppressing the joined output
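
For example, assuming a hypothetical comma-delimited suppliers.txt whose first field is the supplier number, the orders could be joined to it on the fourth field of orders.txt. Both files must first be sorted on their join fields:

$ sort -t, -k4 orders.txt > orders.srt           # hypothetical intermediate file
$ sort -t, -k1 suppliers.txt > suppliers.srt     # suppliers.txt is a hypothetical file
$ join -t, -1 4 -2 1 orders.srt suppliers.srt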

merge Command Switches

  • -A—. Outputs conflicts in the diff3 -A style, merging all changes leading from file2 to file3 into file1 (the most verbose output)

  • -e—. Outputs conflicts without the bracketing marker lines (a terser style than -E)

  • -E—. Marks merge conflicts with <<<<<<< and >>>>>>> lines (the default)

  • -L label—. May be given up to three times to supply labels to use in place of the filenames

  • -p—. Writes the result to standard output instead of overwriting the first file

  • -q—. Does not warn about conflicts
