Chapter 5. Scripting with Files

Shells were originally designed to work with files, because a huge portion of what you need to do on a Unix system relates to files. Unix stores most configuration settings in files, usually text files. Windows, by contrast, stores many configuration settings in the Registry.

This means that many simple commands can manipulate the configuration of many applications, especially server applications. All of these commands work with files. In addition to the normal operations of reading from files, writing to files, and so on, shell scripts interact with files in a number of ways, including:

  • Combining files into archives.

  • Testing files with the test command.

  • Dealing with files on Mac OS X, especially for mobile users.

  • Outputting files stored within the scripts. These are called here files. You can use here files to drive interactive programs.

Shells provide a number of built-in commands to work with files, but your scripts will call non–built-in commands for most of their file handling. That's because Unix and Unix-like systems offer a rich set of commands for working with files.

You can also use input redirection to read the contents of files. Output redirection allows you to send the output of a command to a file. And command pipelines allow you to feed the results of one command into another. These topics are covered in Chapter 8.

The following sections show all sorts of additional ways to work with files.

Combining Files into Archives

Unix and Unix-like systems include a lot of files as part of the standard operating system. Most anything you do on such a system involves files. This leads to the problem of how to transfer files from one system to another and from one directory to another.

To help with this, you can use the tar command. Tar, short for tape archive, comes from ancient times as the name suggests. But it remains in heavy use today. With tar, you can combine a number of files into one large file, usually called a tar file. You can then copy the tar file to another system or directory and then extract the files. Tar preserves the original directory structure.

To create a tar archive, use the cf command-line options to tar. For example:

$ tar cf myarchive.tar *.doc

This command combines all files with names ending in .doc into a tar archive called myarchive.tar. (By convention, use a .tar file-name extension for tar archives.)

Because of its age, tar does not use a dash with its command-line options. (With some versions of tar, you can use a dash before the options.)

The c option specifies to create an archive. The f option names the archive file.

To extract the files in this archive, use the following format:

$ tar xf myarchive.tar

The x option tells the tar command to extract the files.
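The create and extract steps can be sketched as one short script. This is only an illustration: the directory and file names are made up, the t option (list the contents of an archive) is not covered above, and the -C option (change directory before acting) is assumed to be available, as it is in the GNU and BSD versions of tar.

```shell
# Sketch: archive a directory tree, list the archive, and extract it elsewhere.
# workdir, src, dest, and the .doc files are hypothetical names.
workdir=$(mktemp -d) || exit 1

mkdir -p "$workdir/src/docs" "$workdir/dest"
echo "first"  > "$workdir/src/docs/report.doc"
echo "second" > "$workdir/src/docs/notes.doc"

# c = create, f = name of the archive file.
# -C switches to src first, so the archive stores relative paths (docs/...).
tar cf "$workdir/myarchive.tar" -C "$workdir/src" docs

# t = list the contents of the archive without extracting anything.
listing=$(tar tf "$workdir/myarchive.tar")
echo "$listing"

# x = extract; -C places the files under dest, keeping the docs directory.
tar xf "$workdir/myarchive.tar" -C "$workdir/dest"
extracted=$(cat "$workdir/dest/docs/report.doc")
echo "$extracted"

rm -rf "$workdir"
```

Note that the extracted files keep their place inside the docs directory, which is what preserving the original directory structure means in practice.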

Following the Unix philosophy of each tool performing one task, tar does not compress files. Instead, you can use the compress command. Uncompress compressed files with the uncompress command. A more modern compression program is called gzip. Gzip compresses to smaller files than compress and so is used for almost all compressed files on Unix and Unix-like systems. A newer program called bzip2 also compresses files.

The gzip file format is not compatible with the Windows PKZIP format, usually called ZIP files.

To use gzip, provide gzip the name of the file to compress. For example:

$ gzip myarchive.tar

By default, the gzip program compresses the given file. It then changes the file name to end with a .gz extension, indicating that the file is compressed with gzip. The original file is removed. You can verify this with the ls command:

$ ls myarch*
myarchive.tar.gz

Uncompress a gzipped file with gunzip. For example:

$ gunzip myarchive.tar.gz
$ ls myarch*
myarchive.tar

The gunzip program removes the .gz file and replaces it with an uncompressed version of the original file.
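To see the round trip in a script, you could try something like the following sketch. The file here is not a real tar archive, just a repetitive text file with a made-up name that compresses well, and the variable names are assumptions chosen for the example.

```shell
# Sketch: compress a file with gzip, then restore it with gunzip.
workdir=$(mktemp -d) || exit 1
target="$workdir/myarchive.tar"    # hypothetical stand-in for a tar file

# Build a repetitive file so the compression is easy to see.
i=0
while [ "$i" -lt 1000 ]; do
    echo "the same line again and again"
    i=$((i + 1))
done > "$target"

# The $(( )) wrapper strips any padding that wc adds on some systems.
size_before=$(( $(wc -c < "$target") ))

gzip "$target"                     # replaces the file with myarchive.tar.gz
size_after=$(( $(wc -c < "$target.gz") ))

gunzip "$target.gz"                # removes the .gz file, restores the original
size_restored=$(( $(wc -c < "$target") ))

echo "before: $size_before  compressed: $size_after  restored: $size_restored"
rm -rf "$workdir"
```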

The zip program combines the features of tar and a compression program. Zip creates ZIP file archives that are compatible with the Windows PKZIP and WinZip programs. If you need to go back and forth to Windows systems, zip is a good choice. Java applications use the ZIP file format for Java archive, or jar, files. Macintosh systems often use a compression program called StuffIt. StuffIt files typically show a .sit file-name extension.

Working with File Modes

Unix systems use file modes to control which users can access files as well as whether the file is marked executable (covered in Chapter 4). The file modes are divided into three areas that correspond to three classes of people who can potentially access the file: the user or owner of the file, the group associated with the file, and everyone else.

Each file is associated with a group. Files you create will be associated, by default, with your default group. Groups were once very popular, but more recent operating systems tend to make a group for each user, eliminating most of the benefits of group access. (This change was made for security reasons.)

The following table shows the numeric values of the file modes. Note that all the numbers are in octal (base 8) rather than decimal (base 10, like normal numbers).

Value   Meaning
-----   -------
400     Owner has read permission.
200     Owner has write permission.
100     Owner has execute permission.
040     Group has read permission.
020     Group has write permission.
010     Group has execute permission.
004     All other users have read permission.
002     All other users have write permission.
001     All other users have execute permission.

The table shows three-digit values, which control the access for the owner of the file, the group associated with the file, and everyone else, in that order. All these numbers are in octal, which makes it a lot easier to add modes together. For example, 600 means the owner has read and write permission (and no one else has any permissions) because 600 = 200 + 400.
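If you want to check the addition, the shell can do the octal math for you. The following is a small sketch, assuming a shell with POSIX arithmetic, in which a leading 0 marks an octal constant. The variable names are made up for the example.

```shell
# A leading 0 marks an octal constant in shell arithmetic.
# printf '%o' prints the result in octal again.

# Owner read (400) plus owner write (200).
rw_owner=$(printf '%o' $((0400 + 0200)))
echo "$rw_owner"     # prints 600

# Owner and group read and write, others read only.
shared=$(printf '%o' $((0400 + 0200 + 0040 + 0020 + 0004)))
echo "$shared"       # prints 664
```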

If you ever want evidence that Unix is possessed, note that the mode where all users have read and write permissions is 666.

To change permissions on a file, pass these octal modes to the chmod command. For example:

$ ls -l myarchive.tar
-rw-rw-r--  1 ericfj ericfj 266240 Oct 24 21:22 myarchive.tar
$ chmod 600 myarchive.tar
$ ls -l myarchive.tar
-rw-------  1 ericfj ericfj 266240 Oct 24 21:22 myarchive.tar

These commands show the file permissions before the change (664) and after the change (600). Note that in the long file listings, r stands for read permission, w for write permission, and x for execute permission. The three sets of letters apply to the owner, the group, and all others.

You can use these letters with the chmod command. This is especially useful if octal math is not your cup of tea, although the alternative is not that easy to understand either. The basic format is a letter for the class of user (u for the user who owns the file, g for the group, and o for all other users), followed by + or -, followed by the permission letters. Use a plus, +, to add permissions, and a minus, -, to remove them.

For example, to set a file to have read and write permissions for the owner and no access for anyone else, use the following as a guide:

$ chmod 666 myarchive.tar
$ ls -l myarchive.tar
-rw-rw-rw-  1 ericfj ericfj 266240 Oct 24 21:22 myarchive.tar
$ chmod u+rw,g-rwx,o-rwx myarchive.tar
$ ls -l myarchive.tar
-rw-------  1 ericfj ericfj 266240 Oct 24 21:22 myarchive.tar

This example first sets read and write permissions for everyone on the file myarchive.tar. The ls command verifies the new permissions. Then the next chmod command sets the permissions to 600 by adding and removing permissions. Note that you can remove permissions that are not assigned, such as the execute permissions in this example.

You can also perform the permission change using a number of commands. For example:

$ chmod u+rw myarchive.tar

This command sets up the user permissions for the owner of the file.

$ chmod g-rwx myarchive.tar

This command removes any permissions from other users in the group associated with the file.

$ chmod o-rwx myarchive.tar

This command removes any permissions for all other users. Note that you do not have to set all permissions at once. You could, for example, just remove the group and other user permissions.
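The following sketch runs a few of these changes on a scratch file and records the result after each one. The show_mode helper is hypothetical; it just trims the ls -l output down to the ten permission characters. The go-rw form, which names two classes of user at once, is standard chmod syntax but goes slightly beyond the examples above.

```shell
workdir=$(mktemp -d) || exit 1
file="$workdir/myarchive.tar"      # hypothetical scratch file
: > "$file"

# Hypothetical helper: print only the permission characters from ls -l.
show_mode() {
    ls -l "$1" | cut -c1-10
}

chmod 666 "$file"
before=$(show_mode "$file")        # -rw-rw-rw-

chmod go-rw "$file"                # remove read and write for group and others
after=$(show_mode "$file")         # -rw-------

chmod u+x,g+r "$file"              # add execute for owner, read for group
final=$(show_mode "$file")         # -rwxr-----

echo "$before $after $final"
rm -rf "$workdir"
```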

Testing Files with the test Command

The test command, used with conditionals, includes a number of options to test for file permissions as well as to test the type and size of files.

The following table lists the file options for the test command.

Test            Usage
----            -----
-d file name    Returns true if the file name exists and is a directory
-e file name    Returns true if the file name exists
-f file name    Returns true if the file name exists and is a regular file
-r file name    Returns true if the file name exists and you have read permissions
-s file name    Returns true if the file name exists and is not empty (has a size greater than zero)
-w file name    Returns true if the file name exists and you have write permissions
-x file name    Returns true if the file name exists and you have execute permissions
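As a quick illustration of these options, the following sketch defines a hypothetical function named describe that reports what test finds out about a name. The function name and the words it prints are made up for this example.

```shell
# describe: hypothetical helper; $1 is the name to examine.
describe() {
    if test ! -e "$1"; then
        echo "missing"
    elif test -d "$1"; then
        echo "directory"
    elif test -f "$1" && test -s "$1"; then
        echo "non-empty file"
    elif test -f "$1"; then
        echo "empty file"
    else
        echo "other"               # device files, sockets, and so on
    fi
}

describe /                         # prints: directory
describe /no/such/name             # prints: missing (assuming no such name)
```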

You can combine the test command with the chmod command, as in the following Try It Out, to lock down permissions on files so that other users cannot access your files. This is very useful in this era of security vulnerabilities.
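One way to combine the two commands is sketched below. Treat it as an assumption-laden illustration rather than a finished tool: the function name lockdown_sketch is made up, it expects a directory as its only argument, and it assumes you own the files. It uses test to find out which permissions you currently have on each file, then uses chmod to keep those for the owner while removing all access for the group and other users.

```shell
# lockdown_sketch: hypothetical function; $1 is the directory to lock down.
lockdown_sketch() {
    for filename in "$1"/*
    do
        # Skip anything that is not a regular file (directories, for example).
        test -f "$filename" || continue

        # Rebuild the owner digit from the permissions test reports.
        owner=0
        if test -r "$filename"; then owner=$((owner + 4)); fi
        if test -w "$filename"; then owner=$((owner + 2)); fi
        if test -x "$filename"; then owner=$((owner + 1)); fi

        # Two trailing zeros: no access for the group or for other users.
        chmod "${owner}00" "$filename"
    done
}
```

You could then call lockdown_sketch with the name of a directory. Note that test reports on the user running the script, so the root user, who can read and write nearly anything, would see read and write permission on every file.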

Dealing with Mac OS X Files

Mac OS X, while Unix-based, is not laid out in what is considered a "standard" Unix layout. This section addresses some differences from Unix-based systems that you should be aware of in dealing with files on a Mac OS X system.

The Legacies of NeXT

Mac OS X's Unix side has its roots in two places, NeXTSTEP and BSD. Most of the differences from Unix-based systems come from NeXT. The biggest distinction is the file system layout. On a standard Mac OS X box, you have a single partition. No swap, no nothing else, just the root partition, /. Swapfiles are discrete files, living in /var/vm. This method of swap has its good and bad points, but it's the generic way of handling swap for Mac OS X, so you don't have a swap partition unless it's been implemented manually.

Another missing item is the /opt directory. It's not a part of the standard OS X setup, and if you assume it's going to be there, you'll have a lot of error messages in your scripts. The /etc, /var, and /tmp directories are all there but actually live inside a directory named /private and are linked to the root level of the boot drive.

User home directories are not in the /Home directory but in /Users, except when they are on a server, and then are in the /Network hierarchy. If your script is going to be running in an elementary/high school environment, then options such as network homes and NetBoot are going to be quite common, and that can change things on you if you aren't prepared for it.

Most nonboot local volumes live in the /Volumes directory, even if they are partitions on the boot drive. The normal exceptions to this are NFS volumes, which still work like they do with most Unix systems, although they can live in /Volumes as well. Volumes in /Volumes typically show up on the user's desktop, and unless you set them otherwise, mount on user login only. So if you need a partition to be visible prior to login, you must manually set up that mount.

Network drives can live in both /Network and /Volumes, depending on the usage and the situation. The important thing to remember is that Mac OS X is not truly Linux or any other kind of Unix. While most things will work the way you expect, if you are going to be dealing with Mac OS X on a regular basis, you should take the time to become familiar with its file system.

Mobile File Systems and Mac OS X

Thanks to the prevalence of FireWire and the ease that the Mac has always had of booting from different partitions, drives, and the like, you will likely work with file systems that are quite dynamic. For the Unix world, Mac OS X users are extremely mobile. All those iPod music players can become spare data drives, or even boot drives, without much work. (Install an OS, and select the iPod in System Preferences

Note

Another fun aspect of FireWire drives and their relatives is that they allow you to ignore permissions. Mac OS X enables the local user to ignore permissions on a locally attached FireWire or USB drive with a click of a button in the Finder's Get Info window. So the Unix permissions on a FireWire or USB drive are only a suggestion, not a rule. Don't rely solely on Unix file permissions for security. It takes a Mac OS X power user only about 30 minutes to bypass those, even without using the Terminal.

The message here is don't hardcode paths in your scripts unless you have to, and if you do, try to warn users with read-me files.
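A script can test for a location instead of assuming it. The following sketch uses a hypothetical helper, first_existing_dir, that prints the first of its arguments that is a directory. The candidate paths are only examples and do not cover every system.

```shell
# first_existing_dir: hypothetical helper; prints the first argument
# that names an existing directory, or fails if none of them does.
first_existing_dir() {
    for candidate in "$@"
    do
        if test -d "$candidate"; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}

# Example candidates only: /Users on Mac OS X, /home on many other systems.
if home_root=$(first_existing_dir /Users /home /export/home); then
    echo "Home directories appear to live under $home_root"
else
    echo "Could not find a home directory root; please set it by hand." >&2
fi
```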

Target Disk Mode

Thanks to an Apple feature called Target Disk Mode (TDM), mobile file systems are not limited to just portable hard drives but can also include the computer itself. That's correct—the entire Mac.

As a rule of thumb, any Mac with a G4 can run Target Disk Mode, although some G3 Macs can too. If you aren't sure whether your Mac supports Target Disk Mode, check the list at http://docs.info.apple.com/article.html?artnum=58583.

TDM is ridiculously simple to use. You boot, or restart a Mac, and hold down the T key until you see a giant FireWire symbol moving about the screen. (You can't miss it.) You then run a FireWire cable from the machine in Target Disk Mode to another Mac. Wait a few seconds, and voilà! You'll see the TDM machine showing up as a big hard drive on the other machine. Not to belabor the point, but again, avoid assumptions about the file system layout wherever possible, and if you can't, give plenty of warning.

Mobile File Systems and Disk Images

Disk images are everywhere. They've become the preferred method of software distribution on the Mac. If you see .dmg or .smi files, you're dealing with disk images. From a Unix point of view, these are physical disks. For example, here's the result of a mount command with a CD, a disk image, and a FireWire drive mounted:

$ mount
automount -nsl [291] on /Network (automounted)
automount -fstab [318] on /automount/Servers (automounted)
automount -static [318] on /automount/static (automounted)
/dev/disk1s0 on /Volumes/NWN_XP2 (local, nodev, nosuid, read-only)
/dev/disk2s3 on /Volumes/Untitled 1 (local, nodev, nosuid, journaled)
/dev/disk3 on /Volumes/Casper Suite 2.2 (local, nodev, nosuid, read-only, mounted by foo)

So which line is the disk image? Well, NWN_XP2 is the PC CD of the fantastic game Neverwinter Nights. Untitled 1 is a FireWire drive. Casper Suite 2.2 is a disk image. Note that there's really no way to tell which is which. CD-ROMs are always read-only, but disk images can be read/write, too. The biggest advantages of disk images are that they compress well, they're easy to use, and because they support AES encryption, they can be quite useful as a way to securely send data. (That last feature is a nice way to get around the "ignore permissions" issue on mobile drives, by the way.)

While you may feel you've been beaten about the head regarding the mobility of file systems in a Mac OS X world, there's a reason: You cannot assume that Mac OS X is like every other Unix. It's mostly like them at the shell level, but there are some real differences that you are not going to see too many other places, and if you forget that and hardcode everything to reflect a Linux/Solaris/HP-UX/AIX worldview, Mac OS X will make your scripts very sad and unhappy. That's not to say it's all bad. The vast majority of Mac OS X users don't tend to reconfigure their file layouts. They leave things alone. If the OS wants swapfiles in /var/vm instead of a separate swap partition, they're fine with that. If the OS wants to mount things that aren't boot partitions in /Volumes, great! As long as they don't have to dink with it, they're happy. You tend to not see the kind of hardcore customizations at the Unix level that you see on a different platform, so you can avoid most problems by testing to make sure that what you think is the way the world looks is actually the way the world is.

Naming Issues

Before I get into the differences between HFS+ and "normal" Unix physical file systems, there's one more thing to look at, and it's important. It can be summed up as: "Beware the colon and the space" or "Mac users are crazy."

It is fairly uncommon to see volume names with spaces in the rest of the Unix world. The root partition is /, and the others tend to have normal names such as swap, opt, and so on. This is not true in the Mac OS X world. Mac users name their drives, especially portable drives, things that just don't tend to happen in the traditional Unix world. Consider, for example, the person who names all his drives after characters in Herman Melville novels; he mounts drives named Moby Dick, The Pequod, and Quee-Queg.

If you assume that spaces are not used, it can bite you back, and it can bite you back hard. It bit Apple on one particularly memorable occasion, wherein a shell script misassumption in the iTunes 2.0 installer caused people with multiple volumes, or spaces in the names of their boot volumes, to suddenly find their drives erased. That would be what those in the IT field call bad.
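The usual defense is to put double quotes around every variable that holds a file or volume name. The following sketch, which uses made-up volume names inside a scratch directory, shows the difference that the quotes make.

```shell
workdir=$(mktemp -d) || exit 1
mkdir "$workdir/Moby Dick" "$workdir/The Pequod"

count=0
for volume in "$workdir"/*
do
    # Quoted, "$volume" stays one word even though it contains a space.
    if test -d "$volume"; then
        count=$((count + 1))
        echo "Found volume: $volume"
    fi
done

# Unquoted, the shell splits the same name into two separate words.
name="Moby Dick"
set -- $name
unquoted_words=$#                  # 2: "Moby" and "Dick"
set -- "$name"
quoted_words=$#                    # 1: "Moby Dick"

echo "unquoted: $unquoted_words words, quoted: $quoted_words word"
rm -rf "$workdir"
```

An unquoted name passed to a command such as rm would be treated as two separate names, which is the kind of surprise that erases the wrong thing.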

As well, while most traditional shell scripters know that the / character is a directory delimiter, on Mac OS X boxes, under the default file system, the colon can do that job as well. So if you use colons in file names, Mac users are going to potentially see some very odd things when they look at those files in the Finder, the file system UI for Mac OS X.

HFS+ Versus UFS: The Mac OS X Holy War

One thing that traditional shell scripters will run into when they start dealing with Mac OS X is the file system. It's not UFS, or EXTFS*. It's normally (at least 95 percent of the time) the Hierarchical File System Plus (also called Mac OS Extended), or more simply HFS+. This was the file system used on the Classic Mac OS starting with Mac OS 8.1, and it's the default file system on every shipping Mac. Luckily, if you sidestep philosophical issues, there are not too many differences between HFS+ and other Unix file systems.

The biggest difference is in the way HFS+ handles case. HFS+ is case insensitive but case preserving. So while it's not going to change the case of any file or directory names, it also doesn't care about it. To HFS+, filename and FILENAME are the same thing. If you're doing a lot of work with Apache on Mac OS X, you'll want to ensure that the hfs_apple_module is loaded and in use, and the case issue shouldn't cause you problems there either.

HFS+'s case insensitivity is going to bite you less than you think because to most Mac users, this is how it's always been, so they don't think that using filename, Filename, and FILENAME is a great way to keep different versions of the same file around. As long as you don't assume case-sensitivity in your scripts (and you shouldn't rely on this as a filing system anyway; it's really a poor way to version data), you should never hit a difference.
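If a script really does need to know, it can find out by experiment rather than by guessing from the operating system name. The following sketch creates a scratch file and then looks for it under a differently cased name. The file and variable names are made up, and the answer applies only to the file system that holds the scratch directory.

```shell
workdir=$(mktemp -d) || exit 1

# Create a file with mixed case, then look for it in lowercase.
: > "$workdir/CaseTest"
if test -e "$workdir/casetest"; then
    fs_case="insensitive"
else
    fs_case="sensitive"
fi

echo "The file system holding $workdir is case $fs_case"
rm -rf "$workdir"
```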

Another difference you'll see, but shouldn't be affected by, is the creation date. In most Unix systems, the only date you use is the modification date. On Mac OS X, HFS+ supports the creation date separately from the modification date as a part of a file's metadata. It's mostly used by users when finding files, and you'd have to explicitly code for it to use it, so it's not something you're going to stumble over.

Because HFS+ doesn't use inodes, it emulates those for the shell environment. It's unlikely you'll ever run into this, because HFS+ emulates hard links and inodes well enough that it should never cause you problems. But if you ask a Mac OS X user about inodes, you're probably going to get a blank look.

While HFS+ emulates hard links and supports symbolic links, it also supports a third lightweight file reference, namely the alias. An alias is like a symbolic link, but with an added feature: with a symbolic link, if you move what the link points to, the link dies. With an alias, if you move what the alias points to, as long as you don't move that file or folder off of the volume, the alias doesn't die; it just gets updated to the new location of the target. An alias appears as a file on disk, but the Finder treats an alias like a link. This dualism can cause problems in your scripts. You're unlikely to create aliases in your scripts. But if your script comes across what appears to be a file, and your script can't use it, it could be an alias.

The Terror of the Resource Fork

The final major difference between HFS+ and UFS file systems involves something that isn't really as much of a concern anymore: the resource fork. In the original versions of the Mac OS file systems, MFS and HFS, each file had two forks: a data fork and a resource fork. The data fork held data and maybe some metadata. The resource fork held things such as icons, code, comments, and so on. The problem was that if you just copied a dual-forked file to something like UFS, the resource fork tended to wither and die, and all you got was the data fork. For documents, this was anything from a nonissue to an annoyance. For applications, it was a death sentence.

However, HFS+ deals with this differently. It certainly supports the resource fork, but it's no longer a hard part of the file. It's just a metadata fork, à la an NTFS file stream that is mostly used by older applications. There are some modern Mac OS X applications that still use resource forks; the biggest example is Microsoft Office. But the vast majority don't do that anymore. They use a single-forked bundle format, which is just a directory that acts like a single file.

The resource fork will not show up directly in listings by the ls command. And the mv and cp commands won't properly copy the resource forks. Thus, you want to use the Finder to move or copy files on a Mac OS X system.

If you want to avoid resource forks, here are some rules of thumb that will keep you out of trouble:

  • Stay out of /Applications (Mac OS 9) and /System Folder. While old-style apps can live outside of those locations, almost everything in those directories uses a resource fork, and shell scripts have no business in them.

  • If the application looks like a directory and ends in .app, then it's a single-forked app. Otherwise, it's probably a dual-forked app.

    For example, here's a directory listing for Microsoft Word (which uses a resource fork) and Adobe Photoshop CS (which doesn't use a resource fork):

    -rwxrwxr-x   1 jwelch  admin  17069566 Sep 13 12:00 Microsoft Word
    drwxrwxrwx     3 jwelch  jwelch     102 Nov 24  2003 Adobe Photoshop CS.app

    Note that Word is a single file without the .app extension, whereas Photoshop is a directory with the .app extension. If you are using non–resource fork–aware shell utilities (almost all of them), and you see the Photoshop structure, you're probably okay. If you see the Word structure on an app, then you should proceed with caution. Unfortunately, it's hard to tell if a file is just a file or a resource-forked application. It's not impossible, but you're not going to do it with standard Unix tools on a Linux system. However, Microsoft Office is one of the last major Mac OS X apps to still use resource forks, so there's one final rule of thumb.

  • Stay out of any directory whose name contains Microsoft Office in the path or name. There is nothing in there you wish to mess with.

Note that these tips apply to applications. Files are trickier, but they also don't fall apart without a resource fork as often. Nine times out of ten, you just lose an icon. The only notable problem children here are clipping files (which end in .clipping, .textClipping, or .webloc) and old StuffIt Archives (which end in .sit). If you run into a file and aren't sure, make a copy and work on that. If you guess wrong, at least you've just mangled a copy.

At the moment, there aren't a lot of tools that handle resource forks well (although Mac OS X 10.4, aka Tiger, is supposed to fix a lot of this). Apple provides two with its developer tools, namely mvMac and cpMac, which are resource fork–aware versions of mv and cp, respectively. There are some third-party tools, such as hfstar, hfspax, and RsyncX, but they don't ship with the OS, so you can't rely on them. Unfortunately, resource forks are still a way of life on the Mac, so your best bet is to just avoid the places they are likely to be.

Working with Here Files and Interactive Programs

You've seen the mantra again and again in this book: Shell scripts were designed to work with files. You can take advantage of a special feature called here files to embed files within your scripts.

A here file or here document is a section of text in your shell script that the shell pulls out and treats as a separate file. You can use a here file to create a block of text, for output to the user, or for input to a command such as ftp, telnet, and other interactive applications. The shell runs the command as if you typed in the here file text as input for the command.

You can think of here files as "here is the file." Otherwise, the term sounds very odd.

The basic syntax for a here file is:

command <<FileContinuesUntilHere
...text of here file...
FileContinuesUntilHere

Use the << operator to signify that the text of the file is embedded within the script. The text FileContinuesUntilHere acts as a marker. The shell treats FileContinuesUntilHere as the end-of-file marker for the here file. (You can choose your own text as the end-of-file marker. You do not have to use FileContinuesUntilHere. Do not use an end-of-file marker that may appear in the text, however. Make your end-of-file markers unique.)

Here files are very similar to input redirection, covered in Chapter 8.

Displaying Messages with Here Files

One of the simplest ways to use here files is to output a message to the user. While the echo command also outputs messages to the user, it becomes inconvenient once you need to output more than about three lines. In this case, a here file is much easier for you, the scripter, to work with.

In most cases, you can use the cat command to output a here file to the user, as shown in the following Try It Out example.
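A minimal sketch of the idea follows. The function name show_notice, the marker EndOfText, and the message text are all made up for this example. Note that the end-of-file marker must start at the beginning of its line.

```shell
# show_notice: hypothetical function that prints a multiline message.
show_notice() {
cat <<EndOfText
This message spans several lines.
The cat command copies the here file to
standard output, so no echo commands are
needed.
EndOfText
}

show_notice
```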

Customizing Here Files

Another advantage of here files over input redirection is that you can customize the text in the here file, using shell variables. The shell expands any variables referenced within a here file, just as the shell expands variables within the rest of your scripts. The following Try It Out demonstrates this.
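The following sketch shows the idea with a hypothetical function, report_message, that takes a due day and a sender name. It uses cat rather than a broadcast command so that trying it bothers no one, and it fills in defaults with the ${1:-today} form of variable expansion, which is just one possible way to handle missing arguments.

```shell
# report_message: hypothetical function; $1 is the due day, $2 the sender.
report_message() {
    due=${1:-today}                # default used when $1 is missing
    sender=${2:-Dick}              # default used when $2 is missing
cat <<EndOfText
Please complete all TPS reports and have them
on my desk by EOB $due.

Action, urgency, excellence!
-$sender
EndOfText
}

report_message Monday
```

The shell replaces $due and $sender with their values before cat ever sees the text.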

If you omit the second argument, you see output like the following:

$ sh tps_report2 Monday

Broadcast message from ericfj (Sat Oct 28 12:45:35 2006):

Please complete all TPS reports and have them
on my desk by EOB Monday.

Your cooperation in this matter helps the smooth
flow of our departmental structure.

Action, urgency, excellence!
-Dick
Message sent

And if the user omits all the arguments, you see output like the following:

$ sh tps_report2

Broadcast message from ericfj (Sat Oct 28 12:45:35 2006):

Please complete all TPS reports and have them
on my desk by EOB today.

Your cooperation in this matter helps the smooth
flow of our departmental structure.

Action, urgency, excellence!
-Dick
Message sent

Driving Interactive Programs with Here Files

In addition to placing variables within a here file, another use of here files comes when you need to drive interactive programs from a script.
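To keep the following sketch self-contained, it does not call a real program such as ftp. Instead, a hypothetical function named ask_questions stands in for an interactive program: it prompts for two answers and reads them from standard input. The here file then supplies the answers, one per line, as if you had typed them.

```shell
# ask_questions: hypothetical stand-in for an interactive program.
ask_questions() {
    printf 'User name: '
    read -r user
    printf 'File to fetch: '
    read -r file
    printf '\nFetching %s for %s\n' "$file" "$user"
}

# Each line of the here file answers one prompt, in order.
ask_questions <<EndOfSession
anonymous
readme.txt
EndOfSession
```

The same pattern applies to a real interactive command, provided that the command reads its input from standard input and that you know the order of its prompts in advance.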

Turning Off Variable Substitution

By default, the shell substitutes variable values into the text of any here files. You can turn off this feature by enclosing the end-of-file marker in quotes where it appears after the << operator. The closing marker at the end of the here file stays unquoted. For example:

cat <<'QuotedEndMarker'
...text...
QuotedEndMarker

This allows you to include special formatting in your here file, for example, if you write a script that outputs a script or program source code.
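Here is a small sketch of a script that writes another script. The generated file name, greet, and the variable USER_NAME are made up for the example. Because the marker is quoted, $USER_NAME and ${1:-world} are written into the new script as is, rather than being expanded while the outer script runs.

```shell
workdir=$(mktemp -d) || exit 1

# The quoted marker stops the shell from expanding anything in the text.
cat > "$workdir/greet" <<'EndOfScript'
#!/bin/sh
USER_NAME=${1:-world}
echo "Hello, $USER_NAME"
EndOfScript

# Run the generated script to confirm that the variables survived.
generated_output=$(sh "$workdir/greet" reader)
echo "$generated_output"           # prints: Hello, reader

rm -rf "$workdir"
```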

Summary

Files. Files. Files. Most shell scripts work with files of some sort. This chapter extends the basic operations on files to include:

  • Combining files into archives with the tar command.

  • Compressing files with the gzip command.

  • Working with file permissions such as 600 for user read and write but no access to any others.

  • Testing files with the test command. This includes testing file permissions.

  • Handling issues on Mac OS X, especially regarding mobile computers and the file system issues that arise as users move from site to site.

  • Using the oddly named here files to output text, even including programs.

The next chapter covers sed, a program you can use to make script-driven modifications to files. You'll find sed to be a powerful command that warrants its own syntax. You'll also find that sed is used in a huge number of shell scripts.

Exercises

  1. Modify the lockdown script to remove only the group and other user permissions. Hint: The script will then no longer need to check and preserve the existing user permissions.

  2. Make the lockdown script an executable script. Show all the necessary steps.

  3. Rewrite the tps_report2 script. Preserve the concept that if the user forgets to pass command-line arguments, the script will fill in useful defaults. But create a more compact way to test the arguments using nested if statements.

  4. Rewrite the tps_report2 script using a case statement instead of if statements.

  5. Write a script with a here file that when run outputs a script that, in turn, when run outputs a script. That is, write a script that outputs a script that outputs a script. Hint: Start with a script that outputs another script. Then add the additional script.
