Input and Output with File Handles

Way back on Day 2, “Working with Strings and Numbers,” you learned just a bit about file handles, as part of the information on standard input and output. At that time, I explained that STDIN and STDOUT are a special kind of file handle that refer to nonfile-based input and output streams—the keyboard and the screen, for example. And, conveniently, much of what you've learned already is going to apply just as well to file handles that refer to actual files.

In this section, you'll learn how to tame the wily file handles: creating them with the open function, reading from them, writing or appending to them, and closing them when you're done. Along the way, we'll review what you've learned so far about input and output.

Creating File Handles with open

To read input from a source, or to write output to a destination, you need to use a file handle. A file handle is commonly associated with a specific file on a disk that you're reading from or writing to. It can also refer to a network connection (a socket), to a pipe (a sort of connection between the standard output of one program and the standard input of another, which we'll look at on Day 18, “Perl and the Operating System”), or even to a specific hardware device. The file handle makes all those things consistent, so you can do the same things regardless of where the data is coming from or going to.

Perl provides three default file handles, two of which you've already seen: STDIN, STDOUT, and STDERR. The first two are for standard input and output (the keyboard and the screen, typically). STDERR is the standard error, and is used for error messages and other asides that aren't part of the actual script output. You'll commonly see STDERR messages printed to the screen just like STDOUT; the difference shows up only when standard output is redirected or piped somewhere else (to a file, or to a program on the other side of a Unix pipe, for example). In that case the STDOUT output goes to the file or program, while STDERR messages still appear on the screen.

You don't have to do anything to open or initialize these file handles; you can just go ahead and use them (as we have been throughout last week's lessons).
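As a small sketch of that, the script below writes to STDOUT and STDERR without ever calling open. The message text is made up for the example.

```perl
use strict;
use warnings;

# STDOUT and STDERR are already open when the script starts;
# no call to open is needed for either of them.
print STDOUT "This is regular script output.\n";
print STDERR "This is a diagnostic message.\n";

# print with no file handle at all goes to STDOUT by default.
print "This also goes to standard output.\n";
```

If you run it as perl script.pl > out.txt, the two STDOUT lines end up in out.txt while the STDERR message still appears on the screen.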

To read from or write to a file, you must first create a file handle for that operation with the open function. The open function opens a file for reading (input) or for writing (output), and associates a file handle name of your choosing with that file. Note that reading from and writing to a file are separate operations, and you'll need a separate file handle for each one.

The open function takes two arguments: the name of a file handle, and the file to open (which includes a special code indicating opening the file for reading or writing). Here are a few examples:

open(FILE, 'myfile');

open(CONFIG, '.scriptconfig');

open(LOG, '>/home/www/logfile');

The name of the file handle is anything you want it to be. File handles are, by convention, all uppercase, and contain letters, numbers, or underscores. Unlike variables, they must start with a letter.

The second argument is the name of the file on the disk that will be attached to your file handle. A plain filename with no path information will be read from the current directory (either the one your script is being run from, or from some other directory if you've changed it). You learn more about navigating directories on Day 17, “Managing Files and Directories.”

If you're going to use path names rather than plain filenames, be careful: the path notation varies from platform to platform. On Unix, directory names are separated with forward slashes, as in the last of the preceding examples.

For Windows systems, standard DOS notation, with backslashes between directory names, works fine as long as you use single quotes to surround the path. Remember that inside double-quoted strings, backslashes indicate special characters in Perl, so you might end up creating a bizarre path with no relation to reality. If you do use a double-quoted string, you must backslash each backslash to escape it properly.

open(FILE, 'c:\tempfiles\numbers');    # correct

# eeek! contains a tab and a newline (\t, \n)
open(FILE, "c:\tempfiles\numbers");

open(FILE, "c:\\tempfiles\\numbers");  # correct

Because most modern Windows systems can handle directory pathnames with forward slashes, you might want to use those instead, for better portability to Unix systems (if you care) and to improve the readability of your scripts.

On the Mac, the directory separator is a colon, and absolute path names start with the disk or volume name (hard disk, CD-ROM, mounted disk, and so on). If you're concerned about portability to other systems, you'll need to make a note of it and convert your path names later. A couple of examples of Mac syntax are:

open(FILE, "My Hard Disk:Perl Stuff:config");

open(BOOKMARKS, "HD:System Folder:Preferences:Netscape:Bookmarks.html");

In each of these cases, we've been opening a file handle for reading input into the script. This is the default. If you want to write output back to a file, you still need a file handle and you still use open to get it, but you use it with a special character ahead of the filename:

open(OUT, ">output");

The > character indicates that this file handle is to be opened for writing. The given file is opened and any current contents, if they exist, are deleted (you can test to see if a file exists before you open it for writing to avoid this behavior; more about file tests later on in “Managing Files.”)
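Here is a minimal sketch of that precaution, using the -e file test to check for the file before opening it with >. The file name output.txt is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical output file name, used only for this example.
my $outfile = "output.txt";

if (-e $outfile) {
    # The file is already there; opening it with ">" would erase it.
    warn "$outfile already exists; not overwriting it.\n";
}
else {
    open(OUT, ">$outfile") or die "Can't open $outfile: $!\n";
    print OUT "first line of a brand-new file\n";
    close(OUT);
}
```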

What if you want to read input from a file, do something with it, and then write the same file back? You'll have to open two file handles: one to read in the input, and then another later on to reopen the file for writing. Reading and writing are different processes and require different file handles.
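The two-handle approach might look like the following sketch. The file name names.txt and its contents are hypothetical; the script creates the file first so it can run on its own, and uses ucfirst just to have some change to make.

```perl
use strict;
use warnings;

# Hypothetical data file; created here so the sketch is self-contained.
my $file = "names.txt";
open(OUT, ">$file") or die "Can't create $file: $!\n";
print OUT "alice\nbob\ncarol\n";
close(OUT);

# Step 1: open the file for reading and pull every line into memory.
open(IN, $file) or die "Can't open $file for reading: $!\n";
my @lines = <IN>;
close(IN);

# Step 2: change the data while it is in memory.
foreach my $line (@lines) {
    $line = ucfirst($line);   # $line is an alias, so @lines changes too
}

# Step 3: reopen the same file for writing (this empties it) and
# write the modified lines back out.
open(OUT, ">$file") or die "Can't open $file for writing: $!\n";
print OUT @lines;
close(OUT);
```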

Note

Actually, there is a code for both reading and writing the same file: "+<filename". (There is also "+>filename", but it empties the file first, just as ">" does.) You might use this if you wanted to treat a file as a database, where instead of reading in the whole thing, you keep it on disk and read from and write to that file as you look up or change data. This book will stick with reading and writing simple text files, in which case it's less confusing and easier to manage the data if you use two separate file handles: one to read the data into memory, and one to write the data back out again.


You can also open a file for appending—where the current contents of the file are retained, and when you print to the output file handle, your output is appended to the end of that file. To do this, use the >> special characters in your call to open:

open(FILE, ">>logfile");
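A short sketch of appending: the loop below stands in for two separate runs of a script, each adding one line to a hypothetical file called logfile, and then reads the file back to show that earlier lines survive.

```perl
use strict;
use warnings;

# Hypothetical log file name for this example.
my $log = "logfile";

# Simulate two separate runs of a script; each one appends a line.
for my $run (1 .. 2) {
    open(LOG, ">>$log") or die "Can't open $log for appending: $!\n";
    print LOG "run $run: script started\n";
    close(LOG);
}

# Read the log back: earlier lines are still there, new ones at the end.
open(LOG, $log) or die "Can't open $log: $!\n";
my @entries = <LOG>;
close(LOG);
print @entries;
```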

The die Function

The open function is nearly always called in conjunction with a logical or and a call to die on the other side, like this:

open(FILE, "thefile") or die "Can't find file";

The call to die isn't required, but it occurs so frequently in Perl code that the combination is almost boilerplate. If you don't do something like this, chances are good that if anyone else sees your code, they're going to ask you why you didn't.

“Open this file or die!” is the implied threat of this statement, and that's usually precisely what you want. The open command could potentially fail: for example, if you're opening the file for reading and it doesn't exist, if the disk is behaving weirdly and can't open the file, or for whatever other strange reason. Usually, you don't want your script to plow ahead if something has gone horribly wrong and it can't find anything to read. Fortunately, the open function returns undef if it could not open the file (and a true value if it could), so you can check the result and decide what to do.

Sometimes “what to do” can vary depending on the script. Commonly, however, you just want to exit the script with an error message. That's what the die function does: it immediately exits the entire Perl script, and prints its argument (a string message) to the STDERR file handle (the screen, typically).

If you put a newline character at the end of the message, Perl will print that message as it exits. If you leave off the newline, Perl will print an additional bit of information: at script.pl line nn. The script.pl will be the name of your script, and line nn will be the actual line number in which the die occurred. This can be useful for debugging your script later.
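To see the difference side by side, this sketch traps both calls to die with eval (used here only so the script survives long enough to show both messages) and prints what each one produced.

```perl
use strict;
use warnings;

# eval {} traps die, putting the message in $@ instead of exiting.
eval { die "Can't open file\n" };
my $with_newline = $@;

eval { die "Can't open file" };
my $without_newline = $@;

print "With newline:    $with_newline";
print "Without newline: $without_newline";
```

The second message ends with something like “at script.pl line 9.”, naming your script and the line where die was called.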

One other clever use of die: The Perl special variable $! contains the most recent operating system error (if your OS will generate one). By putting the $! variable inside the string to die, your error message can sometimes be more helpful than just “Can't open file.” For example, this version of die:

die "Can't open file: $!\n";

might result in the message “Can't open file: Permission denied” if the reason the file can't be opened is because the user doesn't have the right access to that file. Use of $! is generally a good idea if you're calling die in response to some sort of system error.
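The following sketch puts the return value of open and $! together. The file name no-such-file.txt is hypothetical and is assumed not to exist; the exact wording of the error comes from your operating system.

```perl
use strict;
use warnings;

# Hypothetical file name, assumed not to exist.
my $file = "no-such-file.txt";

# open returns a true value on success and undef on failure,
# so its result can be tested directly.
my $ok = open(FILE, $file);
my $reason = "$!";      # copy $! right away, before anything can change it

if ($ok) {
    print "Opened $file\n";
    close(FILE);
}
else {
    print "Can't open $file: $reason\n";
}
```

On a typical Unix system this prints “Can't open no-such-file.txt: No such file or directory”.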

Although die is commonly used on the other side of a call to open, don't think it's only useful there. You can use die (and its nonfatal equivalent, warn) anywhere in your script where you want to stop executing the script (or print a warning message). See the perlfunc man page for more information on how to use die and warn.
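As a sketch of warn in action, the loop below tries two hypothetical files: present.txt, which the script creates, and missing.txt, which it assumes does not exist. The missing file produces a warning on STDERR, but the loop carries on.

```perl
use strict;
use warnings;

# Hypothetical file names: the first is created here, the second is not.
my @files = ("present.txt", "missing.txt");
open(OUT, ">$files[0]") or die "Can't create $files[0]: $!\n";
print OUT "one\ntwo\n";
close(OUT);

my $skipped = 0;
foreach my $file (@files) {
    unless (open(FILE, $file)) {
        # warn prints to STDERR like die, but does not exit.
        warn "Skipping $file: $!\n";
        $skipped++;
        next;
    }
    my @lines = <FILE>;
    close(FILE);
    print "$file has ", scalar(@lines), " lines\n";
}
print "Finished; $skipped file(s) skipped.\n";
```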

Reading Input from a File Handle

So you've got a file handle. It's attached to a file you've opened for reading. To read input from a file handle, use the <> (input) operator with the name of that file handle, like this:

$line = <FILE>;

Looks familiar, right? You've been doing the same thing with STDIN to get a line of input from the keyboard. That's what's so cool about file handles—you use exactly the same procedures to read from a file as you do to read from the keyboard or to read from a network connection to a server. Perl doesn't care. It's exactly the same procedure, and everything you've already learned applies.

In scalar context, the input operator reads a single line of input, up to and including the newline:

$line = <STDIN>;
if (<FILE>) { print "more input..." } ;

One special case of using the input operator in a scalar context is to use it inside the test of a while loop. This has the effect of looping through the input one line at a time, assigning each line to the $_ variable, and stopping only when the end of the input is reached:

while (<FILE>) {
   # ... process each line of the file in $_
}
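Here is a slightly fuller sketch of that loop. It creates a hypothetical four-line file, then reads it back one line at a time, using chomp to remove each newline and the special variable $. to report the current line number.

```perl
use strict;
use warnings;

# Create a small hypothetical input file so the example is self-contained.
my $file = "lines.txt";
open(OUT, ">$file") or die "Can't create $file: $!\n";
print OUT "apple\nbanana\n\ncherry\n";
close(OUT);

open(FILE, $file) or die "Can't open $file: $!\n";
my $nonblank = 0;
while (<FILE>) {          # each line lands in $_, newline included
    chomp;                # strip the trailing newline from $_
    next if $_ eq "";     # skip empty lines
    $nonblank++;
    print "$.: $_\n";     # $. is the current input line number
}
close(FILE);
print "$nonblank nonblank lines\n";
```

The blank third line is skipped, so the fruit is printed with line numbers 1, 2, and 4.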

You've seen this same notation a lot with the empty input operator. The empty input operator <> is, itself, a special case in Perl. As you've learned, you use it to get input from the files named on the script's command line. Perl opens those files for you, one after another, and hands you their contents in order through the special ARGV file handle (if no files were named, <> reads from STDIN instead). You don't have to do anything special to handle them. Of course, you could open and read each file yourself, but the use of <> in a while loop is an extremely handy shortcut.
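A small sketch of <> reading several files in turn: the two input files a.txt and b.txt are hypothetical and created by the script, and @ARGV is filled in by hand only so the example runs without command-line arguments. The special variable $ARGV holds the name of the file currently being read.

```perl
use strict;
use warnings;

# Create two hypothetical input files so the sketch runs on its own.
for my $name ("a.txt", "b.txt") {
    open(OUT, ">$name") or die "Can't create $name: $!\n";
    print OUT "first line of $name\nsecond line of $name\n";
    close(OUT);
}

# Normally @ARGV comes from the command line (perl script.pl a.txt b.txt);
# it is set by hand here only for the example.
@ARGV = ("a.txt", "b.txt");

my $total = 0;
while (<>) {
    # $ARGV names the file the current line came from.
    print "$ARGV: $_";
    $total++;
}
print "$total lines read\n";
```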

In list context, the input operators behave as if the entire input was being read in at once, with each line in the input assigned to an element in the list. Watch out for the input operator in a list context, as it might not always do what you expect it to. Here are some examples:

@input = <FILE>;    # read the entire file into @input

$input = <FILE>;    # read the first line of the file into $input

($input) = <FILE>;  # read the first line of the file into $input,
                    # and throw the rest of FILE away (yikes!)
print <FILE>;       # print the entire contents of FILE to STDOUT
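This sketch mixes both contexts on one file handle: a scalar assignment takes just the next line, and the list assignment that follows takes everything left. The file three.txt is hypothetical and created by the script.

```perl
use strict;
use warnings;

# Hypothetical three-line input file for the example.
my $file = "three.txt";
open(OUT, ">$file") or die "Can't create $file: $!\n";
print OUT "one\ntwo\nthree\n";
close(OUT);

open(FILE, $file) or die "Can't open $file: $!\n";
my $first = <FILE>;      # scalar context: just the next line, "one\n"
my @rest  = <FILE>;      # list context: every remaining line
close(FILE);

chomp($first);
chomp(@rest);            # chomp works on a whole list, too
print "first line: $first\n";
print "remaining lines: @rest\n";   # @rest interpolates as "two three"
```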

Writing Output to a File Handle

To write output to a file handle, you'll commonly use the print or printf functions. By default, the print (and printf) functions print to the file handle STDOUT. To print to a different file handle, for example, to write a line to a file, first open the file handle for writing, as follows:

open(FILE, ">$myfile") or die "Can't open $myfile: $!\n";

And then use print with a file handle argument to put data into that file:

print FILE "$line\n";

The printf function works similarly; include the file handle to print to before the formatting string and the values, as shown here (sprintf, on the other hand, doesn't take a file handle; it returns the formatted string instead):

printf(FILE "%d responses have been tabulated\n", $total / $count);

One very important part of print and printf that you need to be aware of: there is no comma between the file handle and the list of things to print. This comes under the heading of most common Perl mistakes; with a bareword file handle such as FILE, Perl refuses to run the script and reports “No comma allowed after filehandle.” The file handle argument is entirely separate from the second argument, which is a list of elements separated by commas.
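Putting print and printf together, this sketch writes a hypothetical report.txt file and reads it back to check the result. The values of $total and $count are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical output file and data for this example.
my $myfile = "report.txt";
my ($total, $count) = (120, 4);

open(FILE, ">$myfile") or die "Can't open $myfile: $!\n";
print FILE "Survey report\n";                        # no comma after FILE
printf(FILE "%d responses per group\n", $total / $count);
close(FILE);

# Read the file back to check what was written.
open(FILE, $myfile) or die "Can't open $myfile: $!\n";
my @written = <FILE>;
close(FILE);
print @written;
```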

Reading and Writing Binary Files

Throughout this book, we've been reading and writing text data. But not all the files you work with in Perl are in text format; many of them may be binary files. If you're using Perl on Unix or the Mac, the difference won't matter; Unix and MacPerl can handle both text and binary files just fine. If you're using Windows, however, you'll get garbled results if you try to process a binary file in a normal Perl script.

Fortunately, there's an easy work-around: The binmode function takes a single file handle as an argument, and will process that file handle (read from it or write to it) in binary mode:

open(FILE, "myfile.exe") or die "Can't open myfile.exe: $!\n";
binmode FILE;
while (<FILE>) { # read in binary mode...
    # ... process the data in $_
}

Tip

For the sake of portability, it's a good idea to just go ahead and use binmode whenever you're dealing with binary files. It never hurts anything, and if your program ever winds up being used on a system that requires the use of binmode, it will just work without modification.
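As a sketch of a binary round trip, the script below writes a few raw bytes to a hypothetical data.bin file and reads them back, calling binmode on both handles. It uses the built-in read function (read FILEHANDLE, SCALAR, LENGTH) to pull in bytes rather than lines. The CR/LF pair in the data is exactly what text mode on Windows would translate without binmode.

```perl
use strict;
use warnings;

# Hypothetical binary data; the CR/LF pair would be translated by
# text mode on Windows if binmode were left out.
my $file  = "data.bin";
my $bytes = "\x00\x01\r\n\xFF";

open(OUT, ">$file") or die "Can't create $file: $!\n";
binmode OUT;            # write the bytes exactly as given
print OUT $bytes;
close(OUT);

open(FILE, $file) or die "Can't open $file: $!\n";
binmode FILE;           # read the bytes back untranslated
my $buffer;
my $got = read(FILE, $buffer, 1024);   # read up to 1024 bytes
close(FILE);

printf "read %d bytes\n", $got;
print "round trip ok\n" if $buffer eq $bytes;
```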


Closing a File Handle

When you're done reading from or writing to a file handle, you need to close it. Actually, you often don't have to close it yourself; when your script finishes executing, Perl closes all your file handles for you. And if you call open multiple times on the same file handle (for example, to open a file handle for writing after you're done reading from it), Perl will close that file handle automatically before opening it again. However, it's considered good programming practice to close your file handles after you're done with them; that way any buffered output is written out right away, and the file handles don't tie up system resources while the rest of your script runs.

To close a file handle, use the close function, like this:

close FILE;
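Like open, close returns true on success and false on failure, so for files you've written to it can be worth checking. In this sketch (the file name closing.txt is hypothetical), the -s file test reports the file's size in bytes once close has flushed the output.

```perl
use strict;
use warnings;

# Hypothetical output file for this example.
my $file = "closing.txt";

open(FILE, ">$file") or die "Can't open $file: $!\n";
print FILE "some output\n";

# close flushes any buffered output; it returns false if that fails
# (a full disk, for example), so check it for files you've written.
close(FILE) or die "Can't close $file: $!\n";

# After close, the buffered data has been written, so -s sees it.
print "size on disk: ", -s $file, " bytes\n";
```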
