Chapter 10. Getting a Handle on Files


Now it is time to see how Perl interacts with files, pipes, and command-line arguments. By the time you have finished this chapter, you should be able to explain the following script.


use feature 'say';
die "Insufficient arguments" if scalar @ARGV < 1;

while(<>){
say "$ARGV $. $_ ";
say "x" x 30 && close ARGV if eof;
}

10.1 The User-Defined Filehandle

If you are processing text, you will regularly be opening, closing, reading from, and writing to files. In Perl, we use filehandles to get access to system files.

A filehandle is a name for a file, device, pipe, or socket. In Chapter 4, “Getting a Handle on Printing,” we discussed the three default filehandles, STDIN, STDOUT, and STDERR. Perl also allows you to create your own filehandles for input and output operations on files, devices, pipes, or sockets. A filehandle allows you to associate the filehandle name with a system file and to use that filehandle to access the file.

10.1.1 Opening Files—The open Function

The open function lets you name a filehandle and the file you want to attach to that handle. If the filehandle is an undefined scalar variable, open creates a new anonymous filehandle and stores a reference to it in that variable (this is called autovivification). If the filehandle is an expression, its value is used as the name of the filehandle, that is, as a symbolic reference. The file can be opened for reading, writing, or appending (or both reading and writing), and the file can be opened to pipe data to or from a process. The open function returns a nonzero result if successful and the undefined value if it fails. Like scalars, arrays, and labels, filehandles have their own namespace. So that they will not be confused with reserved words, it is recommended that you use lexical scalar variables to hold your filehandles (as we will do in most of the examples in this chapter). If you use a bareword name for your filehandle, it is recommended that it be in capital letters (see the open function in Appendix A, “Perl Builtins, Pragmas, Modules, and the Debugger”).

When opening text files on Win32 platforms, the \r\n sequence (octal 15 and octal 12, the characters representing return and newline) is translated into \n when text files are read from disk, and the ^Z character is read as an end-of-file (EOF) marker. The following functions for opening files should work fine with text files but will cause a problem with binary files (see Section 10.2.8, “Win32 Binary Files”).

10.1.2 Opening for Reading

The following examples illustrate how to open files for reading with both the older style and modern style. Even though the examples represent UNIX files, they will work the same way on Windows, Mac OS, and other systems.

Explanation

1. This indirect style is recommended. It is a three-part format for opening a file for reading. The open function will create a reference to a filehandle, lexical variable $fh (any scalar variable name can be used), then the < symbol indicates that the file will be opened for reading, and the third argument is the name of the system file, myfile. Since a full pathname is not specified for myfile, it must be in the current working directory, and you must have read permission to open it for reading. When leaving the block where $fh is defined, the variable will go out of scope and the file will be implicitly closed. This helps reduce the chance of a name clash if you use the same variable later in the file to open another file.

2. This is an alternate way to open a file with two arguments. The open function will create the user-defined filehandle FH and attach it to the system file /etc/passwd. The < symbol is not necessary, but may help clarify that this is a read operation. The full pathname is specified for passwd.
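
The two styles described above can be sketched as follows. This is a minimal illustration rather than the original listing: it creates its own scratch file with the core File::Temp module (an assumption made so the sketch runs anywhere) instead of relying on myfile or /etc/passwd, and the file contents are made up.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small scratch file so the sketch has something to open.
# (The path and contents are illustrative, not from the chapter.)
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} "first line\nsecond line\n";
close($tmp) or die "Can't close $path: $!\n";

# 1. Three-argument form with a lexical filehandle (recommended).
open(my $fh, "<", $path) or die "Can't open $path: $!\n";
my $first = <$fh>;
close($fh);

# 2. Older two-argument form with a bareword filehandle.
open(FH, "<$path") or die "Can't open $path: $!\n";
my $also_first = <FH>;
close(FH);

print "Both styles read: $first";
```

The three-argument form keeps the mode separate from the filename, so a filename that happens to begin with a character such as > or | cannot change how the file is opened.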

Closing the Filehandle

The close function closes the file, pipe, socket, or device attached to FILEHANDLE. Once FILEHANDLE is opened, it stays open until it goes out of scope, the script ends, or you call the open function again. (The next call to open closes FILEHANDLE before reopening it.) If you reopen the filehandle this way without explicitly closing it first, the line counter variable, $., will not be reset. Closing a pipe causes the script to wait until the process on the other end of the pipe has finished and reports that process's exit status in the $? variable; if close itself fails, the reason is in $! (see the following section, “The die Function,” for more about the $! variable). It’s a good idea to explicitly close files and handles after you are finished using them, but if you use a lexically scoped scalar as the filehandle, it will be closed as soon as it goes out of scope.

The die Function

In the following examples, the die function is used if a call to the open function fails. If Perl cannot open the file, the die function is used to exit the Perl script and print a message to STDERR, usually the screen.

If you were to go to your shell or MS-DOS prompt and type

cat junk (UNIX)

type junk (DOS)

and if junk is a nonexistent file, the following system error would appear on your screen:


cat: junk: No such file or directory (UNIX "cat" command)
The system cannot find the file specified. (Windows "type" command)

When using the die function, Perl provides a special variable $! to hold the value of the system error that occurs when you are unable to successfully open a file or execute a system utility. This is very useful for detecting a problem with the filehandle before continuing with the execution of the script.

Explanation

1. When trying to open the file /etc/password, the open fails (it should be /etc/passwd). The or operator causes its right operand to execute if the left operand fails, so the die function is executed. The string Can't open: is printed, followed by the system error No such file or directory. The \n at the end of the string suppresses any further output from the die function. All of die’s output is sent to STDERR, and then the program exits.

2. This is exactly like the first example, except that the \n has been removed from the end of the string Can't open:. Omitting the \n causes the die function to append a string to the output, indicating the line number in the script where the error occurred.
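
The difference between the two forms can be seen in a short sketch. It is not the original listing: the path is hypothetical and assumed not to exist, and each die is wrapped in eval only so that the script survives long enough to show both messages.

```perl
use strict;
use warnings;

my $missing = "/no/such/dir/password";   # hypothetical path, assumed not to exist

# 1. Message ends in \n: die reports the message exactly as written.
eval { open(my $fh, "<", $missing) or die "Can't open: $!\n"; };
my $with_newline = $@;

# 2. No trailing \n: die appends " at SCRIPT line N." to the message.
eval { open(my $fh, "<", $missing) or die "Can't open: $!"; };
my $without_newline = $@;

print STDERR $with_newline;
print STDERR $without_newline;
```

Without the eval blocks, the first die would end the script and the second open would never run.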

10.1.3 Reading from a File and Scalar Assignment

The Filehandle and $_

In Example 10.4, a file called datebook is opened for reading. Each line read is assigned, in turn, to $_, the default scalar that holds what was just read until the end of file is reached.

EXAMPLE 10.4


(The Text File: datebook)
   Steve Blenheim
   Betty Boop
   Lori Gortz
   Sir Lancelot
   Norma Cord
   Jon DeLoach
   Karen Evich

----------------------------------------------------------------

(The Script)
   use warnings;
   # Open a file with a filehandle
1  open(my $fh, "<", "datebook") || die "Can't open datebook: $!\n";
2  while(<$fh>) { # The $_, hidden variable, gets a line at a time
3     print  if /Sir Lancelot/;
4  }
5  close($fh);

(Output)
3  Sir Lancelot

Explanation

1. $fh is a lexically scoped, user-defined scalar used as a filehandle. The open function will attach the system file datebook to it and open the file for reading. If open fails because the file datebook does not exist, the die operator will print to the screen, Can't open datebook: No such file or directory.

2. The expression in the while loop is the filehandle $fh, enclosed in angle brackets. The angle bracket operator is used for reading input and not part of the filehandle name. When the loop starts, the first line read will be stored in the $_ scalar variable. (Remember, the $_ variable holds each line of input from the file.) If it has not reached end of file, the loop will continue to take a line of input from the file, execute statements 3 and 4, and continue until end of file is reached.

3. The default input variable $_ is implicitly used to hold the current line of input read from the filehandle. If the line contains the regular expression Sir Lancelot, that line (stored in $_) is printed to STDOUT. For each loop iteration, the next line read is stored in $_ and tested.

4. At the end of the loop, control will go back to the top of the loop (line 2) and the next line of input will be read from the file; this process will continue until all the lines have been read.

5. After looping through the file, the filehandle is closed.

The Filehandle and a User-Defined Scalar Variable

In addition to the default $_ variable, Perl allows you to create your own user-defined scalar variables to hold input from a file.

EXAMPLE 10.5


(The Text File: datebook)
   Steve Blenheim
   Betty Boop
   Lori Gortz
   Sir Lancelot
   Norma Cord
   Jon DeLoach
   Karen Evich

----------------------------------------------------------------
(The Script)
   use warnings;
   # Open a file with a filehandle
1  open(my $fh, "<", "datebook") || die "Can't open datebook: $!\n";
2  while(my $line = <$fh>) { # Each line, in turn, is assigned to $line
3     print "$line" if  $line =~ /^Lori/;
4  }
5  close($fh);

(Output)
3  Lori Gortz

“Slurping” a File into an Array

When assigning input from a file to an array, Perl takes each line (ending in \n) as an element of the array, “slurping” up each line of the file and adding it to the array until end of file is reached.
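
A minimal sketch of slurping follows. So that it runs without a datebook file on disk, it reads from an in-memory filehandle opened on a string; that substitution, and the three sample names, are assumptions made for illustration.

```perl
use strict;
use warnings;

# Stand-in for the datebook file: an in-memory filehandle opened on a string.
my $text = "Steve Blenheim\nBetty Boop\nLori Gortz\n";
open(my $fh, "<", \$text) or die "Can't open in-memory file: $!\n";

my @lines = <$fh>;      # List context: every line becomes one array element
close($fh);

chomp(@lines);          # Strip the trailing newline from each element
my $count = scalar @lines;
print "Read $count lines; the last is $lines[-1]\n";
```

Because the whole file is held in memory at once, this approach suits small files better than large ones.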

Using map to Create Fields from a File

You can use the map function in conjunction with the split function to break up input into several elements of an array.

EXAMPLE 10.7


(The Script)
   use warnings;
   my(@lines, @fields, $field, $fh);
   # Map using a block
1  open($fh, "<", "datebook.master") or die $!;
2  @lines=<$fh>;
3  @fields = map { split(":") } @lines;
4  foreach $field (@fields){
5     print $field,"\n";
  }

(Output)
5  Sir Lancelot
   837-835-8257
   474 Camelot Boulevard, Bath, WY 28356
   5/13/69
   24500

   Tommy Savage
   408-724-0140
   1222 Oxbow Court, Sunnyvale, CA 94087
   5/19/66
   34200

   Yukio Takeshida
   387-827-1095
   13 Uno Lane, Asheville, NC 23556
   7/1/29
   57000

   Vinh Tranh
   438-910-7449
   8235 Maple Street, Wilmington, VT 29085
   9/23/63
   68900

Explanation

1. The datebook.master file is opened for reading from the $fh filehandle. Each line consists of colon-separated fields terminated by a newline.

2. The contents of the file are read and assigned to @lines. Each line of the file is an element of the array.

3. The map function uses the block format. (With the block format, don’t put a comma after the close of the block.) The split function splits up the input, @lines, at colons, resulting in a list where each field becomes an element of the array.

4. The foreach loop iterates through the array, assigning each element, in turn, to $field.

5. The display demonstrates the results of the mapping. Before mapping, the line was: Sir Lancelot:837-835-8257:474 Camelot Boulevard, Bath, WY 28356:5/13/69:24500

Slurping a File into a String with the read Function

The read function allows you to read in a specified number of characters, and put them in a variable. It returns the number of characters that were read. If you know the size of a file, you can read the entire file into a string, as shown in the next example.

EXAMPLE 10.9


(The Script)
   use warnings;
1  open(my $fh, "<", "datebook") or die;
2  my $size = -s $fh;
3  read($fh, my $buffer, $size);
4  my @fields = split(/\n|:/,$buffer);
5  foreach my $f (@fields){
6     print "$f\n";
7     print "-" x 35, "\n" if $f =~ /^\d+$/;
   }
   close $fh;
(Output)
Steve Blenheim
238-923-7366
95 Latham Lane, Easton, PA 83755
11/12/56
20300
-----------------------------------
Betty Boop
245-836-8357
635 Cutesy Lane, Hollywood, CA 91464
6/23/23
14500
-----------------------------------
Igor Chevsky
385-375-8395
3567 Populus Place, Caldwell, NJ 23875
6/18/68
23400
-----------------------------------

Explanation

1. The datebook file is opened for reading.

2. The -s test option returns the size of the datebook file referenced by the filehandle.

3. The read function reads $size bytes (characters) from the filehandle and stores them in the scalar, $buffer. The read function treats all characters as characters, including newline, spaces, tabs, and so forth.

4. The first argument to the split function is a regular expression matching either a newline or a colon, the delimiter for splitting up the string in $buffer, the second argument. (Remember, split splits a scalar and returns an array or list.) When either a newline or a colon is matched in $buffer, split returns an element to be stored in the array @fields. The first element of the array will be the name (for example, Steve Blenheim), the second element will be Steve’s phone number, then the address, the birthday, and salary. When the newline is matched, the next element will be the next name in $buffer. When the splitting has completed, a large array will be created.

5. The foreach loop is used to iterate through each element of the array.

6. Each element of the array is printed with a newline.

7. When the last field is reached, meaning the field containing the salary, a row of 35 dashes is printed. The salary is represented in the regular expression as “consisting entirely of one or more digits.” Because split removed the newline that followed the salary, the script prints a newline after the field and another after the row of dashes, so the output displays a line separating the records.

10.1.4 Loading a Hash from a File

Loading a hash from a file requires selecting what will be the key and what will be the value for the hash. Since keys must be unique, this method can be used for removing duplicate entries based on a key.

EXAMPLE 10.10


(The datafile)
Ann Willy:530-444-5678
Joe Shmoe:415-333-4567
Jack Sprat:213-453-1098
Ann Willy:530-444-5678
Jack Sprat:213-453-1098
Jack Sprat:213-453-1098

(The Script)
   open($fh, "<", "datafile") or die "$!";
   while(<$fh>){
1     ($name, $phone)=split(":");
2     $duphash{$name}=$_;
   }
3  foreach $key (sort keys %duphash){
4     print $duphash{$key};
   }
   close $fh;

(Output)
4  Ann Willy:530-444-5678
   Jack Sprat:213-453-1098
   Joe Shmoe:415-333-4567

10.2 Reading from STDIN

The three filehandles STDIN, STDOUT, and STDERR, as you may recall, are names given to three predefined streams, stdin, stdout, and stderr. By default, these filehandles are associated with your terminal. When printing output to the terminal screen, STDOUT is used. When printing errors, STDERR is used. When assigning user input to a variable, STDIN is used.

The Perl <> input operator encloses the STDIN filehandle so that the next line of standard input can be read from the terminal keyboard and assigned to a variable. Unlike the shell and C operations for reading input, Perl retains the newline on the end of the string when reading a line from standard input. If you don’t want the newline, then you have to explicitly remove it, or “chomp” it off (see the following Section 10.2.2, “The chop and chomp Functions”).

10.2.1 Assigning Input to a Scalar Variable

When reading input from the filehandle STDIN, if the context is scalar, one line of input is read, including the newline, and assigned to a scalar variable as a single string.

Explanation

1. The string What is your name? is sent to STDOUT, which is the screen by default.

2. The input operator <> (called the diamond operator) surrounding STDIN reads one line of input and assigns that line and its trailing newline to the scalar variable $name. When input is assigned to a scalar, characters are read until the user presses the Enter key.

3. The string is printed to STDOUT.

4. If the input operator is empty, the next line of input is read from STDIN, and the behavior is identical to line 2, except input is assigned to $paname.
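
The scalar-context read described above can be sketched as follows. To let the sketch run unattended, the lexical handle $in is opened on a string and stands in for the keyboard; that handle and the sample input are assumptions. In an interactive script you would read from STDIN instead.

```perl
use strict;
use warnings;

# Stand-in for the keyboard: an in-memory filehandle holding made-up input.
my $typed = "Ellie\nTom\n";
open(my $in, "<", \$typed) or die "Can't open in-memory input: $!\n";

print "What is your name? ";
my $name = <$in>;            # Scalar context: one line, newline included
print "\nHello, $name";      # The newline stored in $name ends this line

my $next = <$in>;            # The following read picks up the next line
print "The next line of input was $next";
```

Replacing <$in> with <STDIN> gives the interactive behavior: the read returns when the user presses the Enter key.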

10.2.2 The chop and chomp Functions

The chop function removes the last character in a scalar variable and the last character of each element in an array. Its return value is the character it chopped. The chop function is used primarily to remove the newline from the line of input coming into your program, whether it is STDIN, a file, or the result of command substitution. When you first start learning Perl, the trailing newline can be a real pain!

The chomp function was introduced in Perl 5 to remove the last character in a scalar variable and the last character of each element in an array only if that character is the newline (or, to be more precise, the character that represents the input line separator, initially defined as a newline and stored in the $/ variable). It returns the number of characters it chomped. Using chomp instead of chop protects you from inadvertently removing some character other than the newline.

EXAMPLE 10.12


(The Script)
   # Getting rid of the trailing newline. Use chomp instead of chop.
1  print "Hello there, and what is your name? ";
2  $name = <STDIN>;
3  print "$name is a very high class name.\n";
4  chop($name);   # Removes the last character no matter what it is.
5  print "$name is a very high class name.\n";
6  chop($name);
7  print "$name has been chopped a little too much.\n";
8  print "What is your age?  ";
9  chomp($age=<STDIN>); # Removes the last character if
                        # it is the newline.
10 chomp($age);         # The last character is not removed
                        # unless a newline.
11 print "For $age, you look so young!\n";

(Output)
1  Hello there, and what is your name? Joe Smith
3  Joe Smith
   is a very high class name.
5  Joe Smith is a very high class name.

7  Joe Smit has been chopped a little too much.

8  What is your age? 25
11 For 25, you look so young!

Explanation

1. The quoted string is printed to the screen, STDOUT, by default.

2. The scalar variable is assigned a single line of text typed in by the user. The <> operator is used for read operations. In this case, it reads from STDIN, which is your keyboard, until the Enter key is pressed. The newline is included in the text that is assigned to the variable $name.

3. The value of $name is printed. Note that the newline breaks the line after Joe Smith, the user’s input.

4. The chop function removes the last character of the string assigned to $name. The character that was chopped is returned.

5. The string is printed again after the chop operation. The last character was removed (in this case, the newline).

6. This time chop will remove the last character in Joe Smith’s name, which is the h in Smith.

7. The quoted string is printed to STDOUT, indicating that the last character was removed.

9. The user input is first assigned to the variable $age. The trailing newline is chomped. The character whose value is stored in the special variable, $/, is removed. This value is, by default, the newline character. The number of characters chomped is returned. Because of the low precedence of the equal (=) operator, parentheses ensure that the assignment occurs before the chomp function chomps.

10. The second chomp will have no effect. The newline has already been removed, and chomp removes only the newline. It’s safer than using chop.

11. The chomped variable string is printed.

10.2.3 The read Function

The read function¹ allows you to read a number of characters into a variable from a specified filehandle. (The first character is character 0.) If reading from standard input, the filehandle is STDIN. The read function returns the number of bytes that were read. You will normally use this function with files or reading input from a server using CGI. To read the entire file you will need to know the size in bytes of that file.

1. The read function is similar to the fread function in the C language.

EXAMPLE 10.13


(The Script)
   use warnings;
   # Reading input in a requested number of bytes
1  print "Describe your favorite food in 10 bytes or less.\n";
   print "If you type less than 10 characters, press CTRL+D on a line by itself.\n";
2  my $number=read(STDIN, $favorite, 10);
3  print "You just typed: $favorite\n";
4  print "The number of bytes read was $number.\n";

(Output)
1  Describe your favorite food in 10 bytes or less.
   If you type less than 10 characters, press CTRL+D on a line by itself.
   apple pie and ice cream         <-user input
3  You just typed: apple pie
4  The number of bytes read was 10.

Explanation

1. The user is asked for input. A user who types fewer than 10 characters should press <CTRL>+D to signal the end of input.

2. The read function takes three arguments: the first argument is STDIN, the place from where the input is coming; the second argument is the scalar $favorite, where the input will be stored; and the third argument is the number of characters (bytes) that will be read.

3. The 10 characters read in are printed. The rest of the characters were left in the buffer and ignored.

4. The number of characters (bytes) actually read was stored in $number and is finally printed.

10.2.4 The getc Function

The getc function gets a single character from the keyboard or from a file. At EOF, getc returns the undefined value.

EXAMPLE 10.14


(The Script)
   use warnings;
   # Getting only one character of input
   print "Answer y or n   ";
1  my $answer=getc;     # Gets one character from stdin
2  $restofit=<>;        # What remains in the input buffer is
                        # assigned to $restofit
3  print "$answer\n";
4  print "The characters left in the input buffer were: $restofit\n";

(Output)
1  Answer y or n yessirreebob <ENTER>
3  y
4  The characters left in the input buffer were: essirreebob

10.2.5 Assigning Input to an Array

When reading input from the filehandle STDIN, if the context is list context (for example, assignment to an array), then each line is read with its newline and is treated as a single list item, and the read is continued until you press <CTRL>+D (in UNIX) or <CTRL>+Z (in Windows) for end of file (EOF). Normally, you will not assign input to an array, because it could eat up a large amount of memory, or because the user of your program may not realize that pressing <CTRL>+D or <CTRL>+Z is needed to stop reading input.

EXAMPLE 10.15


(The Script)
   use warnings;
   # Assigning input to an array
1  print "Tell me everything about yourself.\n";
2  my @all = <STDIN>;
3  print "@all";
4  print "The number of elements in the array are: ",
         $#all + 1, ".\n";
5  print "The first element of the array is: $all[0]";

(Output)
1  Tell me everything about yourself.
2  OK. Let's see I was born before computers.
   I grew up in the 50s.
   I was in the hippie generation.
   I'm starting to get bored with talking about myself.
   <CTRL>+D

3  OK. Let's see I was born before computers.
   I grew up in the 50s.
   I was in the hippie generation.
   I'm starting to get bored with talking about myself.
4  The number of elements in the array are: 4.
5  The first element of the array is:
   OK. Let's see I was born before computers.

Explanation

1. The string Tell me everything about yourself. is printed to STDOUT.

2. The input operator <> surrounding STDIN reads input lines until <CTRL>+D, EOF, is reached. (For Windows, use <CTRL>+Z instead of <CTRL>+D.) Each line and its trailing newline are stored as a list element of the array @all.

3. The user input is printed to the screen after the user presses <CTRL>+D or <CTRL>+Z.

4. The $# construct lets you get the last subscript or index value in the array. By adding 1 to $#all, the size of the array is obtained; that is, the number of lines that were read.

5. $all[0] is the first element of the array that evaluates to the first line of input from the user. Each line read is an element of the array.

10.2.6 Assigning Input to a Hash

Reading input from STDIN and assigning it to a hash is like reading from a file. The line read can be assigned as a value corresponding to a hash key or as the key itself.

Explanation

1. The hash, %course, is declared here. Keys and values will be assigned later.

2. The scalar variable $course_number is assigned the value 101.

3. The string What is the name of course 101? is printed to STDOUT.

4. The name of the hash is %course. We are assigning a value to one of the hash elements. The key is $course_number enclosed in curly braces. The chomp function will remove the newline from the value assigned by the user.

5. The new hash is printed. It has one key and one value.
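
The steps above can be sketched as follows. This is an illustration under stated assumptions, not the original listing: the handle $in is an in-memory stand-in for STDIN so the sketch runs unattended, and the course name is made up.

```perl
use strict;
use warnings;

# Stand-in for the keyboard: an in-memory filehandle holding made-up input.
my $typed = "Linux Administration\n";
open(my $in, "<", \$typed) or die "Can't open in-memory input: $!\n";

my %course;                                # Declared empty; filled in below
my $course_number = 101;
print "What is the name of course $course_number? ";
chomp($course{$course_number} = <$in>);    # Assign the line, then strip its newline
print "\n$course_number: $course{$course_number}\n";
```

In an interactive script, <$in> would be <STDIN>, and the value stored under the key 101 would be whatever the user typed.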

10.2.7 Opening for Writing

When opening a file for writing, the file will be created if it does not exist, and if it already exists, it must have write permission. If the file exists, its contents will be overwritten. The filehandle is used to access the system file.

Explanation

1. The scalar variable $file is set to the full pathname of a UNIX file called newfile. The scalar will be used to represent the name of the UNIX file to which output will be directed via the filehandle. This example will work the same way with Windows, but if you use the backslash as a directory separator, either enclose the path in single quotes or use two backslashes inside double quotes; for example, 'C:\home\ellie\testing' or "C:\\home\\ellie\\testing".

2. The user-defined filehandle $fh will change the default place to where output normally goes, STDOUT, to the file that it represents, newfile. The > symbol indicates that newfile will be created if it does not exist and opened for writing. If it does exist, it will be opened and any text in it will be overwritten, so be careful!

3. The print function will send its output to the file, instead of to the screen. The string hello world. will be written into newfile via the $fh filehandle. The file newfile will remain open unless it is explicitly closed or the Perl script ends (see “Closing the Filehandle” earlier in this chapter).

4. The print function will send its output to the filehandle $fh instead of to the screen. The string hello world, again. will be written into newfile via the $fh filehandle. The operating system keeps track of where the last write occurred and will send its next line of output to the location immediately following the last byte written to the file.

5. The script is executed. The output is sent to newfile.

6. The contents of the file newfile are printed.
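
A minimal sketch of opening for writing follows. Instead of a fixed pathname, it places newfile in a temporary directory created with the core File::Temp module; that choice is an assumption made so the sketch cannot overwrite a real file.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Put newfile in a throwaway directory that is removed when the script exits.
my $dir  = tempdir(CLEANUP => 1);
my $file = File::Spec->catfile($dir, "newfile");

# ">" creates the file if needed and discards any existing contents.
open(my $fh, ">", $file) or die "Can't open $file: $!\n";
print {$fh} "hello world.\n";
print {$fh} "hello world, again.\n";    # Written right after the first line
close($fh) or die "Can't close $file: $!\n";

# Read the file back to show what was written.
open($fh, "<", $file) or die "Can't reopen $file: $!\n";
my @written = <$fh>;
close($fh);
print @written;
```

Checking the return value of close matters when writing, because buffered output may not reach the disk until the file is closed.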

10.2.8 Win32 Binary Files

Win32 distinguishes between text and binary files. If a binary file is read in text mode and a ^Z is found, it is treated as end of file, so the program may stop reading prematurely, and the newline translation can corrupt the data. When reading and writing Win32 binary files, use the binmode function to prevent these problems. The binmode function arranges for a specified filehandle to be read or written in either binary (raw) or text mode. If the discipline argument (called a layer in current Perl documentation) is not specified, the mode is set to “raw.” Commonly used layers include :raw, :crlf, :utf8, and :encoding(...).

EXAMPLE 10.19


# This script copies one binary file to another.
# Note its use of binmode to set the mode of the filehandle.
   use warnings;
1  $infile="statsbar.gif";
2  open( my $in, "<", $infile ) or die "Can't open $infile: $!\n";
3  open( my $out, ">", "outfile.gif" ) or die "Can't open outfile.gif: $!\n";

4  binmode( $in );     # Crucial for binary files!

5  binmode( $out );
   # binmode should be called after open() but before any I/O
   # is done on the filehandle.

6  while ( read( $in, $buffer, 1024 ) ) {
7     print $out $buffer;
   }

8  close( $in );
   close( $out );

Explanation

1. The scalar $infile is assigned a .gif filename.

2. The file statsbar.gif is opened for reading and attached to the $in filehandle.

3. The file outfile.gif is opened for writing and assigned to the $out filehandle.

4. The binmode function arranges for the input file to be read in binary mode.

5. The binmode function arranges for the output file to be written in binary mode.

6. The read function reads up to 1,024 bytes at a time, storing the input read in the scalar $buffer.

7. After each chunk of up to 1,024 bytes is read in, it is sent out to the output file.

8. Both filehandles are closed. The result was that one binary file was copied to another binary file.

10.2.9 Opening for Appending

When opening a file for appending, the file will be created if it does not exist, and if it already exists, it must have write permission. If the file exists, its contents will be left intact, and the output will be appended to the end of the file. Again, the filehandle is used to access the file rather than accessing it by its real name.

Explanation

1. The user-defined filehandle $fh will be used to send and append output to the file called newfile. As with the shell, the >> redirection symbol directs the output away from the default filehandle, STDOUT, and appends it to the file newfile. If the file cannot be opened because, for example, the write permissions are turned off, the die operator will print the error message, Can't open newfile: Permission denied., and the script will exit.

2. The string, Just appended “hello world” to the end of newfile, will be written to end of newfile via the $fh filehandle.
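
Appending can be sketched as follows. As an assumption for illustration, the sketch first creates newfile in a temporary directory with one line in it, so there is existing content for the append to preserve.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

my $dir  = tempdir(CLEANUP => 1);
my $file = File::Spec->catfile($dir, "newfile");

# Start with one existing line so there is something to preserve.
open(my $fh, ">", $file) or die "Can't open $file: $!\n";
print {$fh} "hello world.\n";
close($fh) or die "Can't close $file: $!\n";

# ">>" leaves the existing contents intact and writes at the end of the file.
open($fh, ">>", $file) or die "Can't open $file: $!\n";
print {$fh} "Just appended \"hello world\" to the end of newfile.\n";
close($fh) or die "Can't close $file: $!\n";

open($fh, "<", $file) or die "Can't reopen $file: $!\n";
my @contents = <$fh>;
close($fh);
print @contents;
```

Had the second open used > instead of >>, the first line would have been lost.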

10.2.10 The select Function

The select function sets the default output to the specified filehandle and returns the previously selected filehandle. All printing will go to the selected handle. Once you use select, you must remember to reset your default output to STDOUT or all output from your script will continue to be sent to the “selected” filehandle.

EXAMPLE 10.22


(The Script)
   use warnings;
1  open (my $fh,">", "newfile") || die "Can't open newfile: $!\n";
2  select $fh;      # Select the new filehandle for output
3  open (my $db, "<", "datebook") || die "Can't open datebook: $!\n";

   while(<$db>) {
4     print ;            # Output goes to $fh, i.e., newfile
   }
5  select(STDOUT);       # Send output back to the screen
   print "Good-bye.\n";  # Output goes to the screen

Explanation

1. newfile is opened for writing and assigned to the filehandle $fh.

2. The select function makes $fh the current default filehandle for output. The return value from the select function is the name of the filehandle that was previously selected (STDOUT). That filehandle is not closed, and it can be selected again later.

3. The $db filehandle is opened for reading.

4. As each line is read into the $_ variable from the file referenced by $db, it is then printed to the currently selected filehandle, $fh. Notice that you don’t have to name the filehandle.

5. By selecting STDOUT, the rest of the program’s output will go to the screen.

10.2.11 File Locking with flock

To prevent two programs from writing to a file at the same time, you can lock the file so you have exclusive access to it, and then unlock it when you’re finished using it. The flock function takes two arguments: a filehandle and a file-locking operation. The operations are listed in Table 10.1.

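Table 10.1 File-Locking Operations

Name      Operation   What It Does
LOCK_SH   1           Creates a shared lock
LOCK_EX   2           Creates an exclusive lock
LOCK_NB   4           Creates a nonblocking lock
LOCK_UN   8           Removes an existing lock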

Read permission is required on a file to obtain a shared lock, and write permission is required to obtain an exclusive lock. With operations 1 and 2, normally the caller requesting the file will block (wait) until the file is unlocked. If a nonblocking lock is used on a filehandle, an error is produced immediately if a request is made to get the locked file. (See Fcntl.pm for a better implementation of locks.)

EXAMPLE 10.23


   use warnings;
   # Program that uses file locking -- UNIX
1  $LOCK_EX = 2;
2  $LOCK_UN = 8;

3  print "Adding an entry to the datafile.\n";
   print "Enter the name: ";
   chomp($name=<STDIN>);
   print "Enter the address: ";
   chomp($address=<STDIN>);

4  open(my $fh, ">>", "datafile") || die "Can't open: $!\n";

5  flock($fh, $LOCK_EX) || die ;        # Lock the file

6  print $fh "$name:$address\n";

7  flock($fh, $LOCK_UN) || die;         # Unlock the file

   close $fh;

Explanation

1. The scalar $LOCK_EX is assigned the value of the operation that flock will use to lock the file. This operation blocks (waits) until an exclusive lock can be obtained. Instead of hard-coding the number, you can import the constants from the Fcntl module with use Fcntl qw(:flock); and write LOCK_EX.

2. The scalar $LOCK_UN is assigned the value of the operation that tells flock to unlock the file so that others can write to it.

3. The user is asked for the information to update the file. This information will be appended to the file.

4. The file is opened for appending.

5. The flock function puts an exclusive lock on the file.

6. The data is appended to the file.

7. Once the data has been appended, the file is unlocked so others can access it.
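The same steps can be written with the named constants from the Fcntl module rather than the numbers 2 and 8. This is only a sketch under stated assumptions: the helper name append_record and the temporary file are invented for illustration, and LOCK_NB is added so the script gives up at once rather than waiting for the lock.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);           # Imports LOCK_SH, LOCK_EX, LOCK_NB, LOCK_UN
use File::Temp qw(tempfile);

# Hypothetical helper: append one record while holding an exclusive lock.
sub append_record {
    my ($path, $record) = @_;
    open(my $fh, ">>", $path) or die "Can't open $path: $!\n";
    # LOCK_NB makes flock return false at once instead of waiting.
    flock($fh, LOCK_EX | LOCK_NB)
        or die "$path is locked by another process: $!\n";
    print $fh "$record\n";
    # Closing the handle flushes the data and releases the lock.
    close $fh or die "Can't close $path: $!\n";
    return 1;
}

my (undef, $datafile) = tempfile(UNLINK => 1);   # Stands in for datafile
append_record($datafile, "Betty Boop:123 Main St.");
```

Letting close release the lock, instead of calling flock with LOCK_UN first, is a design choice here: the data is flushed and the lock is dropped in one step.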

10.2.12 The seek and tell Functions

The seek Function

The seek function gives you random access to a file. It works like the fseek standard I/O function in C: rather than closing the file and then reopening it, you move directly to some byte (not line) position within the file. The seek function returns 1 if successful, and a false value otherwise.

The seek function takes three arguments: a filehandle, a byte offset, and a position from which the offset is measured. The first byte of a file is byte 0. Positions are as follows:

• 0 = Beginning of the file

• 1 = Current position in the file

• 2 = End of the file

The offset is the number of bytes from the file position. A positive offset moves the position forward in the file; a negative offset moves the position backward in the file for position 1 or 2.
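The three positions also have names. As a hedged sketch, the Fcntl module can export SEEK_SET, SEEK_CUR, and SEEK_END for positions 0, 1, and 2; the helper read_at and the two-line sample file below are made up for illustration.

```perl
use strict;
use warnings;
use Fcntl qw(:seek);            # Imports SEEK_SET (0), SEEK_CUR (1), SEEK_END (2)
use File::Temp qw(tempfile);

# Hypothetical helper: read $length bytes after seeking to $offset from $whence.
sub read_at {
    my ($path, $offset, $whence, $length) = @_;
    open(my $fh, "<", $path) or die "Can't open $path: $!\n";
    binmode $fh;                                  # Count raw bytes on every platform
    seek($fh, $offset, $whence) or die "Can't seek: $!\n";
    read($fh, my $buffer, $length);
    close $fh;
    return $buffer;
}

# Sample data with UNIX newlines (an assumption; the db file shown in
# this section uses Windows line endings).
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out;
print $out "Steve Blenheim\nBetty Boop\n";
close $out;

print read_at($path, 6, SEEK_SET, 8), "\n";       # 6 bytes forward from the start
print read_at($path, -11, SEEK_END, 5), "\n";     # 11 bytes back from the end
```

With this sample file the two calls print Blenheim and Betty.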

The od command lets you look at how the characters in a file are stored. This file was created on a Win32 platform, so each line ends with two characters, \r\n; on UNIX systems, the linefeed/newline is one character, \n.

$ od -c db
0000000000   S   t   e   v   e       B   l   e   n   h   e   i   m  \r  \n
0000000020   B   e   t   t   y       B   o   o   p  \r  \n   L   o   r   i
0000000040       G   o   r   t   z  \r  \n   S   i   r       L   a   n   c
0000000060   e   l   o   t  \r  \n   N   o   r   m   a       C   o   r   d
0000000100  \r  \n   J   o   n       D   e   L   o   a   c   h  \r  \n   K
0000000120   a   r   e   n       E   v   i   c   h  \r  \n
0000000134

EXAMPLE 10.24

(The Text File: db)
Steve Blenheim
Betty Boop
Lori Gortz
Sir Lancelot
Norma Cord
Jon DeLoach
Karen Evich

----------------------------------------------------------------

(The Script)
   use warnings;
   # Example using the seek function
1  open(my $fh, "<", "db") or die "Can't open: $!\n";
2  while($line=<$fh>){       # Loop through the whole file
3     if ($line =~ /^Lori/) { chomp($line); print "--$line--\n";}
   }
4  seek($fh,0,0);            # Start at the beginning of the file
5  while(<$fh>) {
6     print if /Steve/;
   }
   close $fh;

(Output)
3  --Lori Gortz--
6  Steve Blenheim

Explanation

1. The db file is assigned to the $fh filehandle and opened for reading.

2. Each line of the file is assigned, in turn, to the scalar $line while looping through the file.

3. If $line begins with Lori, the print statement is executed.

4. The seek function positions the file pointer at the top of the file (offset 0 from position 0), so the next read starts at byte 0, the first character. Without seek, the only way to get back to the top of the file is to close the filehandle and open the file again.

5. Starting at the top of the file, the loop is entered. The first line is read from the filehandle and assigned to $_, the default line holder.

6. If the pattern Steve is found in $_, the line will be printed.

EXAMPLE 10.25

(The Text File: db)
Steve Blenheim
Betty Boop
Lori Gortz
Sir Lancelot
Norma Cord
Jon DeLoach
Karen Evich

----------------------------------------------------------------

(The Script)
   use warnings;
1  open(my $fh, "<", "db") or die "Can't open db: $!\n";
2  while(<$fh>){
3     last if /Norma/;   # This is the last line that
                         # will be processed
   }
4  seek($fh,0,1) or die "Can't seek: $!\n";  # Seeking from the current position
5  $line=<$fh>;            # This is where the read starts again
6  print "$line";
7  close $fh;

(Output)
6  Jon DeLoach

Explanation

1. The db file is opened for reading via the $fh filehandle.

2. The while loop is entered. A line from the file is read and assigned to $_.

3. When the line containing the pattern Norma is reached, the last statement causes the loop to be exited.

4. The seek function is called with an offset of 0 from the current position (position 1), so the file pointer stays where the next read would have taken place; in other words, at the start of the line right after Norma. The offset could also be a negative or positive number of bytes.

5. A line is read from the db file and assigned to the scalar $line. The line read is the one that would have been read next if the last statement had not caused the loop to exit.

6. The value of $line is printed.

Explanation

1. The db file is opened for reading via the $fh filehandle.

2. The seek function starts at the end of the file (position 2) and backs up 13 bytes. The newline (\r\n), although not visible, takes up the last two bytes of each line (Windows).

3. The while loop is entered, and each line, in turn, is read from the filehandle $fh.

4. Each line is printed. By backing up 13 characters from the end of the file, Karen Evich is printed. Note the output of the od -c command and count back 13 characters from the end of the file.

0000000000   S   t   e   v   e       B   l   e   n   h   e   i   m  \r  \n
0000000020   B   e   t   t   y       B   o   o   p  \r  \n   L   o   r   i
0000000040       G   o   r   t   z  \r  \n   S   i   r       L   a   n   c
0000000060   e   l   o   t  \r  \n   N   o   r   m   a       C   o   r   d
0000000100  \r  \n   J   o   n       D   e   L   o   a   c   h  \r  \n   K
0000000120   a   r   e   n       E   v   i   c   h  \r  \n
0000000134
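The steps just described (open the file, seek 13 bytes back from position 2, then read to the end) can be sketched as follows. This is not the author's listing; it is a reconstruction under assumptions: the helper tail_bytes is hypothetical, and a temporary file with Windows-style \r\n line endings stands in for db so that the byte counts agree with the od -c listing.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Build a stand-in for db with \r\n line endings (an assumption made so
# that the file is 92 bytes long, octal 134, as in the od -c listing).
my ($out, $db) = tempfile(UNLINK => 1);
binmode $out;
print $out "$_\r\n" for ("Steve Blenheim", "Betty Boop", "Lori Gortz",
                         "Sir Lancelot", "Norma Cord", "Jon DeLoach",
                         "Karen Evich");
close $out;

# Hypothetical helper: return the lines found in the last $count bytes of $path.
sub tail_bytes {
    my ($path, $count) = @_;
    open(my $fh, "<", $path) or die "Can't open $path: $!\n";
    binmode $fh;
    seek($fh, -$count, 2) or die "Can't seek: $!\n";  # Position 2 = end of file
    my @lines = <$fh>;
    close $fh;
    return @lines;
}

print tail_bytes($db, 13);     # "Karen Evich" plus \r\n is 13 bytes
```

On a file with UNIX line endings the last line would be only 12 bytes long, so the offset has to match the platform that created the file.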

The tell Function

The tell function returns the current byte position in the file. It is often paired with the seek function: save the value that tell returns, and later pass it to seek to move back to that position. If FILEHANDLE is omitted, tell returns the position of the file last read.

EXAMPLE 10.27

(The Text File: db)
Steve Blenheim
Betty Boop
Lori Gortz
Sir Lancelot
Norma Cord
Jon DeLoach
Karen Evich

----------------------------------------------------------------

(The Script)
   use warnings;
   # Example using the tell function
1  open(my $fh,"<","db") || die "Can't open: $!\n";
2  while ($line=<$fh>) {      # Loop through the whole file
      chomp($line);
3     if ($line =~ /^Lori/) {
4        $currentpos=tell;
5        print "The current byte position is $currentpos.\n";
6        print "$line\n";
      }
   }
7  seek($fh,$currentpos,0);   # Start after the line starting with Lori
8  @lines=(<$fh>);
9  print @lines;
10 close $fh;

(Output)
5  The current byte position is 40.
6  Lori Gortz

9  Sir Lancelot
   Norma Cord
   Jon DeLoach
   Karen Evich

Explanation

1. The db file is assigned to the $fh filehandle and opened for reading.

2. Each line of the file is assigned, in turn, to the scalar $line while looping through the file.

3. If the scalar $line matches the regular expression /^Lori/, the if block is entered.

4. The tell function is called and returns the current byte position (counting from byte 0) in the file. Because the line beginning with Lori has just been read, this is the position of the first character of the next line.

5. The position is stored in $currentpos and printed. Byte position 40 is where the line Sir Lancelot starts.

6. The line that matched the regular expression is printed.

7. The seek function positions the file pointer for $fh at the byte offset $currentpos, 40 bytes from the beginning of the file. Without seek, there is no way to go directly to byte 40; the only option would be to start reading again from the beginning of the file.

8. The lines starting at offset 40 are read in and stored in the array @lines.

9. The array is printed, starting at offset 40.
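One common use of the tell and seek pair is to record where every line starts, so that any line can be reached later without rereading the file. The sketch below illustrates that idea only; the helper line_offsets and the sample data with UNIX newlines are assumptions, not part of the example above.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: return the byte offset at which each line of $fh starts.
sub line_offsets {
    my ($fh) = @_;
    seek($fh, 0, 0) or die "Can't seek: $!\n";
    my @offsets = (0);
    while (<$fh>) {
        push @offsets, tell($fh);   # After a read, tell gives the next line's start
    }
    pop @offsets;                   # The final value is the end of file, not a line
    return @offsets;
}

# tempfile returns a handle opened for both reading and writing.
my ($fh, $path) = tempfile(UNLINK => 1);
binmode $fh;
print $fh "$_\n" for ("Steve Blenheim", "Betty Boop", "Lori Gortz", "Sir Lancelot");
my @offsets = line_offsets($fh);

seek($fh, $offsets[2], 0) or die "Can't seek: $!\n";   # Jump straight to line 3
my $third = <$fh>;
print "Line 3 starts at byte $offsets[2]: $third";
close $fh;
```

The offsets are only valid as long as the file is not modified; any change before a recorded position shifts every later line.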

10.2.13 Opening for Reading and Writing

Two files are used in the next example: a text file called visitor.txt, which has an initial value of 1 as its only text, and countem.pl, the script that tracks the number of users who have run it.

The countem.pl script adds one to the number in visitor.txt every time the script is executed.

Table 10.2 Reading and Writing Operations

EXAMPLE 10.28

(The text file, visitor.txt)
1

(The Script)
   use warnings;
   # Scriptname: countem.pl
   # Open visitor.txt for reading first, and then writing
1  open(my $fh, "+<", "visitor.txt") ||
         die "Can't open visitor.txt: $!\n";
2  chomp($count=<$fh>);    # Read a number from the file
3  print "You are visitor number $count.\n";
4  $count++;
5  seek($fh, 0,0) || die "Can't seek: $!\n";  # Seek back to the top of the file
6  print $fh "$count\n";   # Write the new number to the file
7  close $fh;

(Output)
(First run of countem.pl)
You are visitor number 1.

(Second run of countem.pl)
You are visitor number 2.

Explanation

1. The file visitor.txt is opened for reading first, and then writing. If the file does not exist or cannot be opened for both reading and writing, die will cause the program to exit with an error message.

2. A line is read from the visitor.txt file. The first time the script is executed, the number 1 is read in from the visitor.txt file and stored in the scalar $count.

3. The value of $count is printed.

4. The $count scalar is incremented by 1.

5. The seek function moves the file pointer to the beginning of the file.

6. The new value of $count is written back to the visitor.txt file. The number that was there is overwritten by the new value of $count each time the script is executed.

7. The file is closed.

Explanation

1. The file joker is opened for writing first, and then reading, via the $fh filehandle. This means that joker will be created or, if it already exists, truncated. Be careful not to mix up +< and +>.

2. The output is sent to joker via the $fh filehandle.

3. The seek function moves the file pointer to the beginning of the file.

4. The while loop is entered. A line is read from the file joker via the $fh filehandle and stored in $_.

5. Each line ($_) is printed after it is read until the end of the file is reached.
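The write-first mode described in these steps can be sketched as shown below. This is an illustration rather than the original listing: the helper write_then_read is hypothetical, the lines written are placeholders, and a temporary file stands in for joker.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write @lines to $path, then read them back on the same handle.
sub write_then_read {
    my ($path, @lines) = @_;
    # "+>" creates the file or truncates it, then allows reading as well.
    open(my $fh, "+>", $path) or die "Can't open $path: $!\n";
    print $fh "$_\n" for @lines;
    seek($fh, 0, 0) or die "Can't seek: $!\n";   # Rewind before reading
    my @read_back = <$fh>;
    close $fh;
    return @read_back;
}

my (undef, $joker) = tempfile(UNLINK => 1);      # Stands in for the file joker
print write_then_read($joker, "first line", "second line");
```

Opening an existing file with +> when +< was intended destroys its contents before a single line is read, which is why the two modes must not be confused.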

10.2.14 Opening for Anonymous Pipes

When using a pipe (also called a filter), a connection is made from one program to another. The program on the left-hand side of the pipe symbol is the writer; it sends its output into a temporary buffer. The program on the right-hand side is the reader; it gets its input from that buffer. Here is an example of a typical UNIX pipe (see Figure 10.1):

who | wc -l

and an MS-DOS pipe:

dir /b | more

Figure 10.1 UNIX pipe example.

The output of the who command is sent to the wc command. The who command writes its output to the pipe, and the wc command reads its input from the pipe. (If the wc command were not a reader, it would ignore what is in the pipe.) The output of the wc command is finally sent to STDOUT, the terminal screen, and the number of people logged on is printed.

When a Perl pipe is opened, the operating system command is on either the left-hand side or the right-hand side of the pipe symbol. For example, if you see | sort, the OS command is on the right side of the pipe symbol. There is nothing on the left side, which implies that Perl is there and Perl is the writer. Perl sends its output to the pipe and the sort command reads from it. On the other hand, if you see ls | or dir |, the OS command is on the left-hand side of the pipe, implying that Perl is on the right-hand side, making Perl the reader.

(It is important to keep in mind that the process connecting to Perl is an operating system command. If you are running Perl on a UNIX or Linux system, the commands will be different from those on a Windows system, thereby making Perl scripts implementing pipes unportable between systems.)

The Output Filter

When creating a filehandle with the open function, you can open a filter so that Perl's output is piped to a system command. The command is preceded by a pipe symbol (|) and replaces the filename argument used in the previous examples. Perl's output is piped to the command, and the command's own output is sent to STDOUT (see Figure 10.2). You can use the two-argument format or the three-argument format, as shown in the following example. With the three-argument format, your shell (bash, korn, and so on) may be bypassed; when it is, shell wildcard expansion, redirection, and multistage pipelines are not handled.

Figure 10.2 Perl output filter.

Explanation

1. The user-defined pipe handle MYPIPE will be used to pipe output from the Perl script to the OS command sort, which sorts lines, hence the newline between each animal.

2. The print function sends the output to MYPIPE instead of to the screen. The sort command on the right-hand side of the pipe sorts by lines.

3. After you have finished using the pipe handle, close it. This guarantees that the command will complete before the script exits. If you don’t close the handle, the output may not be flushed properly.

4. This is the three-argument style for creating a pipe. The pipe handle is a lexical scalar, $mypipe. The mode |-, a pipe symbol with a dash on its right-hand side, indicates that Perl will send output to the pipe; the dash stands for the OS sort command. Perl writes; the sort command reads.
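As a hedged illustration of the three-argument style, the sketch below pipes a few words to sort and collects the result. Several things here are assumptions: the helper name sort_to_file is invented, a UNIX sort that accepts the -o option is assumed, and the command is passed as a list so that no shell is involved.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: pipe @lines to the OS sort command, which writes
# its result to $outfile (assumes a UNIX sort that supports -o).
sub sort_to_file {
    my ($outfile, @lines) = @_;
    # "|-" means Perl writes and the command reads. The command is given
    # as a list, so its arguments are passed to sort without a shell.
    open(my $mypipe, "|-", "sort", "-o", $outfile)
        or die "Can't start sort: $!\n";
    print $mypipe "$_\n" for @lines;
    close $mypipe or die "sort failed, status $?\n";  # Waits for sort to finish
    open(my $in, "<", $outfile) or die "Can't open $outfile: $!\n";
    chomp(my @sorted = <$in>);
    close $in;
    return @sorted;
}

my (undef, $outfile) = tempfile(UNLINK => 1);
print join(" ", sort_to_file($outfile, qw(dogs cats birds))), "\n";
```

Checking the return value of close matters for pipes: it is the point at which Perl learns whether the command succeeded.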

EXAMPLE 10.31

(The Text File)
$ cat emp.names
Steve Blenheim
Betty Boop
Igor Chevsky
Norma Cord
Jon DeLoach
Karen Evich

(The Script)
   use warnings;
   # Reverse first and last names and pipe to the sort command
1  open(FOO,"|sort") or die "Can't start sort: $!\n";   # Open output filter
2  open(my $db, "<","emp.names") or die "Can't open emp.names: $!\n";
3  while(<$db>) {
4     ($first, $last)= split(" ", $_);
5     print FOO "$last, $first\n";
    }
6  close FOO;
   close $db;

(Output)
       Blenheim, Steve
       Boop, Betty
       Chevsky, Igor
       Cord, Norma
       DeLoach, Jon
       Evich, Karen

Explanation

1. The user-defined pipe handle, a bareword FOO, will be used to pipe output to the OS sort command.

2. The open function creates the filehandle $db and attaches it to the UNIX file called emp.names.

3. The expression in the while loop contains the filehandle $db, enclosed in angle brackets, indicating a read operation. Each time through the loop, the next line of the emp.names file is read and stored in the $_ scalar variable.

4. The split function splits each line by whitespace and returns a list consisting of $first, the first name, and $last, the last name.

5. The first and last names are printed to the FOO pipe and then sent to the OS sort command, and the output from the sort command will be sent to the screen.

6. The output filter FOO is closed.

Sending the Output of a Filter to a File

In the previous example, what if you had wanted to send the output of the filter to a file instead of to STDOUT? You can’t send output to a pipe and a filehandle at the same time, but you can redirect STDOUT to a filehandle. Since, later in the program, you may want STDOUT to be redirected back to the screen, you can first save it or simply reopen STDOUT to the terminal device by typing

open(STDOUT, ">/dev/tty");

The following example can be accomplished more safely with Capture::Tiny from CPAN. Capture::Tiny avoids several pitfalls, including accidentally clobbering someone else’s global filehandles.

EXAMPLE 10.32

   use warnings;
   # Program to redirect STDOUT from filter to a UNIX file
1  $| = 1;           # Flush buffers
2  my $tmpfile = "temp";
3  open(my $db, "<","data") || die qq/Can't open "data": $!\n/;
                                           # Open file for reading
4  open(SAVED, ">&STDOUT") || die "$!\n";  # Save stdout
5  open(STDOUT, ">$tmpfile" ) || die "Can't open: $!\n";
6  open(SORT, "| sort") || die "Can't start sort: $!\n";  # Open output filter
7  while(<$db>){
8     print SORT;   # Output is first sorted and then sent to temp.
9  }
10 close SORT;
   close $db;
11 open(STDOUT, ">&SAVED") || die "Can't restore STDOUT: $!\n";
12 print "Here we are printing to the screen again.\n";
                     # This output will go to the screen
13 rename("temp","data");

Explanation

1. The $| variable guarantees an automatic flush of the output buffer after each print statement is executed. (See the autoflush module in Appendix A, “Perl Built-ins, Pragmas, Modules, and the Debugger.”)

2. The scalar $tmpfile is assigned temp to be used later as an output file.

3. The data file is opened for reading and attached to the $db filehandle.

4. STDOUT is being copied and saved in another filehandle called SAVED. Behind the scenes, the file descriptors are being manipulated.

5. The temp file is being opened for writing and is assigned to the file descriptor normally reserved for STDOUT, the screen. The file descriptor for STDOUT has been closed and reopened for temp.

6. The output filter will be assigned to SORT. Perl’s output will be sent to the sort utility, which works here for both Windows and UNIX to sort alphabetically from the beginning of the line.

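7. The while loop reads one line at a time from the $db filehandle into $_.

8. Each line is printed to the SORT filter. The output of sort goes to the temp file, because STDOUT now refers to temp.

9. The closing brace ends the while loop.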

10. Close the pipe.

11. Open the standard output filehandle so that output is redirected back to the screen.

12. This line prints to the screen because STDOUT has been reassigned there.

13. The temp file is renamed data, overwriting what was in data with the contents of temp.
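The save, redirect, and restore sequence can also be written with a lexical filehandle in place of the bareword SAVED. The following is a minimal sketch under assumptions: the helper with_stdout_to is invented for illustration, and a temporary file stands in for temp.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: run $code with STDOUT redirected to $path, then restore it.
sub with_stdout_to {
    my ($path, $code) = @_;
    open(my $saved, ">&", \*STDOUT) or die "Can't save STDOUT: $!\n";
    open(STDOUT, ">", $path)        or die "Can't redirect STDOUT: $!\n";
    $code->();                      # Anything printed here lands in $path
    close STDOUT                    or die "Can't close $path: $!\n";
    open(STDOUT, ">&", $saved)      or die "Can't restore STDOUT: $!\n";
    close $saved;
    return;
}

my (undef, $tmpfile) = tempfile(UNLINK => 1);
with_stdout_to($tmpfile, sub { print "This line lands in the file.\n" });
print "Here we are printing to the screen again.\n";
```

The sketch does not restore STDOUT if the code reference dies; a more careful version would wrap the call in eval and restore the handle before rethrowing the error.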

Input Filter

When creating a filehandle with the open function, you can also open a filter so that input is piped into Perl. The OS shell normally handles any special characters that need interpretation during the processing.

If you don’t have any need for the shell to process the command in the pipe (meaning you aren’t using redirection, wildcard expansion, or multiple pipes), you can use the three-argument format as previously shown in Example 10.30. See Figure 10.3.

Explanation

1. The user-defined pipe handle INPIPE will be used to pipe the output from the command as input to Perl. The output of a UNIX date command will be used as input by your Perl script via the INPIPE pipe handle. Windows users: use date /T.

2. The scalar $today will receive its input from the INPIPE pipe handle; in other words, Perl reads from INPIPE.

3. The value of the UNIX date command was assigned to $today and is displayed.

4. After you have finished using the pipe handle, use the close function to close it. Closing an input pipe waits for the command to finish and places its exit status in $?.

5. This format became available in version 5.6. It allows you to create lexically scoped pipe handles which will be closed when the handle goes out of scope.

6. Now Perl reads from the pipe, storing the output of the UNIX date command in $today.
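A hedged sketch of the three-argument input filter follows. The helper read_from_command is hypothetical, and, so that the sketch does not depend on date or any other OS command, it runs the Perl interpreter itself (whose path is in $^X) as the command being read from.

```perl
use strict;
use warnings;

# Hypothetical helper: run a command without the shell and return its output lines.
sub read_from_command {
    my (@command) = @_;
    # "-|" means the command writes and Perl reads. Passing the command
    # as a list hands the arguments to it directly.
    open(my $inpipe, "-|", @command) or die "Can't run $command[0]: $!\n";
    chomp(my @lines = <$inpipe>);
    close $inpipe or warn "$command[0] exited with status $?\n";
    return @lines;
}

# $^X is the path of the running perl; the one-liner prints the local time.
my ($today) = read_from_command($^X, "-e", 'print scalar(localtime), "\n"');
print "Today is $today\n";
```

Because the handle is lexical, it would also be closed automatically at the end of the subroutine, but closing it explicitly makes the exit status available for checking.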

Figure 10.3 Perl input filter.

EXAMPLE 10.35

(The Script)
   use warnings;
   # Opening an input filter on a Win32 platform
1  open(LISTDIR, 'dir "C:\perl" |') || die $!;
2  @filelist = <LISTDIR>;
3  foreach  $file ( @filelist ){
      print $file;
   }
   close LISTDIR;

(Output)
Volume in drive C is 010599
Volume Serial Number is 2237-130A

Directory of C:\perl

03/31/1999  10:34p      <DIR>          .
03/31/1999  10:34p      <DIR>          ..
03/31/1999  10:37p              30,366 DeIsL1.isu
03/31/1999  10:34p      <DIR>          bin
03/31/1999  10:34p      <DIR>          lib
03/31/1999  10:35p      <DIR>          html
03/31/1999  10:35p      <DIR>          eg
03/31/1999  10:35p      <DIR>          site
               1 File(s)         30,366 bytes
               7 Dir(s)     488,873,984 bytes free

Explanation

1. The output of the Windows dir command will be piped to LISTDIR. When LISTDIR is enclosed in angle brackets, input is read from the pipe rather than from a file. If the open fails, the die function will print an error and exit the script.

2. The output from the Windows dir command has been piped into pipe LISTDIR. The input is read from the pipe and assigned to the array @filelist. Each element of the array represents one line of input.

3. The foreach loop iterates through the array, printing one line at a time until the end of the array.

10.3 Passing Arguments

How does Perl pass command-line arguments to a Perl script? If you are coming from a C, C++, awk, or C shell background, at first glance you might think, “Oh, I already know this!” Beware! There are some subtle differences. So, read on.

10.3.1 The @ARGV Array

Perl stores command-line arguments in a special array called @ARGV. The subscript starts at zero and, unlike in C and awk, $ARGV[0] does not represent the name of the program; it holds the first word after the script name. As in the shell languages, the $0 special variable holds the name of the Perl script. Unlike in the C shell, the $#ARGV expression contains the number of the last subscript in the array, not the number of elements in the array, so the number of arguments is $#ARGV + 1. When there are no arguments, $#ARGV has a value of -1. To get the size of the @ARGV array, it is easier to just say scalar @ARGV.

When ARGV, the filehandle, is enclosed in angle brackets, <ARGV>, each command-line argument is treated as a filename. The first filename is shifted off the left end of the @ARGV array, shortening the array by one, and the file is opened on the ARGV filehandle.

The value that is shifted off the @ARGV array is assigned to $ARGV, so $ARGV contains the name of the file currently being read. See Figure 10.4.

Figure 10.4 The many faces of ARGV.

EXAMPLE 10.36

(The Script)
   #!/usr/bin/env perl
   use warnings;
1  die "$0 requires an argument.\n" unless @ARGV;
                         # Must have at least one argument
2  print "@ARGV\n";      # Print all arguments
3  print "$ARGV[0]\n";   # Print first argument
4  print "$ARGV[1]\n";   # Print second argument
5  print "The number of arguments is ", scalar @ARGV,".\n";
6  print "There are ", $#ARGV + 1," arguments.\n";
                              # $#ARGV is the last index value
7  print "$ARGV[$#ARGV] is the last one.\n";  # Print last arg

(Output)
   $ perl.arg
1  perl.arg requires an argument.

   $ perl.arg f1 f2 f3 f4 f5
2  f1 f2 f3 f4 f5
3  f1
4  f2
5  The number of arguments is 5.
6  There are 5 arguments.
7  f5 is the last one.

Explanation

1. If there are no command-line arguments, the die function is executed and the script is terminated. The $0 special variable holds the name of the Perl script, perl.arg.

2. The contents of the @ARGV array are printed.

3. The first argument, not the script name, is printed.

4. The second argument is printed.

5. The scalar function evaluates its argument in scalar context. An array in scalar context yields its size, so scalar @ARGV is the number of arguments.

6. The $#ARGV variable contains the index of the last element. Since the index starts at zero, $#ARGV + 1 is the total number of arguments, not counting the script name; it is also the size of the array.

7. Since $#ARGV contains the value of the last index value, $ARGV[$#ARGV] is the value of the last element of the @ARGV array.
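The relationship between the size of an array and its last index holds for any array, not only @ARGV. The short sketch below packages the same checks in a subroutine so they can be tried on any list; the helper name describe_args and the sample values are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: describe an argument list the way the script above does.
sub describe_args {
    my @args = @_;
    return "no arguments" unless @args;
    my $count = scalar @args;          # An array in scalar context yields its size
    my $last  = $#args;                # Index of the last element
    return "$count argument(s); last index $last; last value $args[$last]";
}

# Fall back to sample values when the script is run without arguments.
@ARGV = qw(f1 f2 f3) unless @ARGV;
print describe_args(@ARGV), "\n";
```

Called with f1 f2 f3, the helper reports three arguments, a last index of 2, and a last value of f3.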

10.3.2 ARGV and the Null Filehandle

When the input angle brackets are used in a loop expression, either empty (<>) or with the ARGV filehandle (<ARGV>), each element of the @ARGV array is treated as the name of a file to be opened. A set of empty angle brackets is called the null filehandle. Perl shifts through the array one element at a time, stores the current filename in the variable $ARGV, and opens that file on the ARGV filehandle, allowing you to process each file in turn. Because the arguments are shifted off @ARGV as the files are opened, they must be saved in another array if they are to be used later.

EXAMPLE 10.38

(The Text File: emp.names)
Steve Blenheim
Betty Boop
Igor Chevsky
Norma Cord
Jon DeLoach
Karen Evich

(The Script)
   #!/usr/bin/env perl
   use warnings;
   # Scriptname: grab.pl
   # Program will behave like grep -- will search for a pattern
   # in any number of files.
1  if ( scalar @ARGV < 2) {die "Usage: $0 pattern filename(s)\n";}
2  my $pattern = shift;  # Implicitly shifts @ARGV
3  while(my $line=<ARGV>){
4     print "$ARGV:$.: $line" if $line =~ /$pattern/i;
5     close(ARGV) if eof;
   }

(Output)
   $ grab.pl
1  Usage: grab.pl pattern filename(s)
   $ grab.pl 'norma' db
2  db:5: Norma Cord
   $ grab.pl 'Sir Lancelot' db
3  db:4: Sir Lancelot
   $ grab.pl '^.... ' db
4  db:3: Lori Gortz
   $ grab.pl Steve d*
5  datebook.master:12: Johann Erickson:Stevensville, Montana
   datafile:8: Steven Daniels:496-456-5676:83755:11/12/56:20300
   db:1: Steve Blenheim

Explanation

1. If fewer than two command-line arguments are given, the die function is executed. $0 is the current Perl script name.

2. The first argument is shifted from the @ARGV array. This should be the pattern that will be searched for.

3. Since the first argument was shifted off the @ARGV array and assigned to the scalar $pattern, the remaining arguments passed in from the command line are opened in turn to the ARGV filehandle. When the while loop is entered, a line is read and assigned to $line.

4. The $ARGV scalar holds the name of the file that is currently being processed. The $. variable holds the current line number. If the value in $pattern is matched, the filename where it was found, the number of the line where the pattern was found, and the line itself are printed. The i after the last delimiter in the pattern turns off case sensitivity.

5. When the file being processed reaches the end of file (EOF), the ARGV filehandle is closed. This causes the $. variable to be reset. If ARGV is not closed explicitly here, the $. variable will continue to increment and not be set back to 1 when the next file is read.
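The effect of closing ARGV at the end of each file can be checked with a small self-contained sketch. Everything outside the loop is an assumption made for illustration: the helper grab takes its file list as arguments and localizes @ARGV, and the two sample files contain made-up data.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: return "file:line: text" for each line matching $pattern.
sub grab {
    my ($pattern, @files) = @_;
    local @ARGV = @files;              # <> reads the files named in @ARGV
    my @hits;
    while (my $line = <>) {
        push @hits, "$ARGV:$.: $line" if $line =~ /$pattern/i;
        close(ARGV) if eof;            # Reset $. before the next file starts
    }
    return @hits;
}

# Two small sample files (made-up data for the sketch).
my @files;
for my $text ("Steve Blenheim\nBetty Boop\n", "Lori Gortz\nSteven Daniels\n") {
    my ($out, $name) = tempfile(UNLINK => 1);
    print $out $text;
    close $out;
    push @files, $name;
}
print grab("steve", @files);
```

The match in the second file is reported on line 2, not line 4, because $. was reset when the first file was closed.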

EXAMPLE 10.39

(The Script)
   #!/usr/bin/env perl
   use warnings;
1  if ( scalar @ARGV < 1 ){ die "Usage: $0 <argument>"; }
2  open(my $pw,"<", "/etc/passwd") || die "Can't open /etc/passwd: $!";
3  my $username=shift;   # Same as shift @ARGV
   my ($found, $logged_on) = (0, 0);
4  while( my $pwline = <$pw>){
5     if ( $pwline =~ /^\Q$username\E:/ ){ $found = 1; last; }
   }
6  close $pw;
   die "$username is not a user here.\n" unless $found;
7  open(LOGGEDON, "who |" ) || die "Can't execute who: $!\n";
8  while(my $logged = <LOGGEDON> ){
      if ( $logged =~ /^\Q$username\E\b/){ $logged_on = 1; last;}
   }
9  close LOGGEDON;
   die "$username is not logged on.\n" if ! $logged_on;
   print "$username is logged on and running these processes.\n";
10 open(PROC, "ps -aux|" ) || die "Can't execute ps: $!\n";
   while(my $line=<PROC>){
      print "$line" if  $line =~ /^\Q$username\E\b/;
   }
11 close PROC;
   print '*' x 80, "\n";
   print "So long.\n";

(Output)
   $ checkon
1  Usage: checkon <argument>:  at checkon line 6.
   $ checkon joe
5  joe is not a user here.
   $ checkon ellie
8  ellie is logged on and running these processes:
ellie   3825  6.4  4.5  212  464 p5 R   12:18   0:00 ps -aux
ellie   1383  0.8  8.4  360  876 p4 S   Dec 26 11:34 /usr/local/OW3/bin/xview
ellie   173   0.8 13.4 1932 1392 co S   Dec 20389:19 /usr/local/OW3/bin/xnews
ellie   164   0.0  0.0  100    0 co IW  Dec 20  0:00 -c
< some of the output was cut to save space >
ellie   3822  0.0  0.0    0    0 p5 Z   Dec 20  0:00 <defunct>
ellie   3823  0.0  1.1   28  112 p5 S   12:18   0:00 sh -c ps -aux | grep '^'
ellie   3821  0.0  5.6  144  580 p5 S   12:18   0:00 /bin/perl checkon ellie
ellie   3824  0.0  1.8   32  192 p5 S   12:18   0:00 grep ^ellie
ellie   3815  0.0  1.9   24  196 p4 S   12:18   0:00 script checkon.tsc
******************************************************************************

Explanation

1. This script requires at least one argument. If @ARGV is empty (meaning no arguments are passed at the command line), the die function is executed and the script exits with an error message. (Remember: $#ARGV is the number of the last subscript in the @ARGV array, and $ARGV[0] is the first argument, not counting the name of the script, which is $0.) Only the first argument is used; any additional arguments are ignored.

2. The /etc/passwd file is opened for reading via the $pw filehandle.

3. The first argument is shifted from @ARGV and assigned to $username.

4. Each time the while loop is entered, a line of the /etc/passwd file is read via the $pw filehandle.

5. The =~ operator tests whether the current line of /etc/passwd contains the username followed by a colon, which is how login names appear in that file.

6. The filehandle is closed.

7. LOGGEDON is opened as a pipe to accept input. Output from the UNIX who command will be piped to the LOGGEDON pipe.

8. Each line of the input from the pipe is tested. If the user is logged on, the scalar $logged_on is set to 1, and the loop is exited.

9. The pipe is closed.

10. PROC is opened as a pipe to accept input. Output from the UNIX ps command will be piped to PROC. Each line from the pipe is read in turn and placed in the scalar $line. If $line contains a match for the user, that line will be printed to STDOUT, the screen.

11. The pipe is closed.

10.3.3 The eof Function

The eof function can be used to test whether end of file has been reached. It returns 1 if the next read on FILEHANDLE would hit the end of the file, or if the file is not open. Without an argument, the eof function returns the end-of-file status of the last file read. With empty parentheses, eof() treats the files named on the command line as one stream and is true only at the end of the last file. Without parentheses, eof can be used inside a while(<>) loop to detect the end of each file in turn.

Explanation

1. The first argument stored in the ARGV array is file1. The null filehandle is used in the while expression. The file file1 is opened for reading.

2. The $. variable is a special variable containing the line number of the currently opened filehandle. It is printed, followed by a tab and then the line itself.

3. If end of file is reached, print a row of 30 dashes.

4. The filehandle is closed in order to reset the $. value back to 1 for the next file that is opened. When file1 reaches end of file, the next argument, file2, is processed, starting at line 1.
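The difference between eof and eof() can be seen by using both in the same loop. The sketch below is an illustration under assumptions: the helper number_files and the two sample files are made up, and ARGV is deliberately left open so that $. keeps counting across files.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: number the lines of the files and mark the boundaries.
sub number_files {
    my @files = @_;
    local @ARGV = @files;
    my @report;
    while (my $line = <>) {
        push @report, "$.\t$line";
        push @report, "-" x 30 . "\n" if eof;        # End of the current file
        push @report, "All files read.\n" if eof();  # End of the last file only
    }
    return @report;
}

# Two small sample files (made-up data for the sketch).
my @files;
for my $text ("a\nb\n", "c\n") {
    my ($out, $name) = tempfile(UNLINK => 1);
    print $out $text;
    close $out;
    push @files, $name;
}
print number_files(@files);
```

The row of dashes appears after each file, while the closing message appears once, after the final line of the last file.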

10.3.4 The -i Switch—Editing Files in Place

The -i option is used to edit files in place. The files are named at the command line and stored in the @ARGV array. Perl automatically gives the output file the same name as the input file, and the output file becomes the default filehandle for printing. To keep a backup of the original file, add an extension to the -i flag, such as -i.bak; the original file will be saved as filename.bak. The files must be read through the ARGV filehandle (<ARGV> or <>). Multiple files can be passed in from the command line and each, in turn, will be edited in place.

EXAMPLE 10.42

(The Text File)
1  $ more names.txt
   igor chevsky
   norma corder
   jennifer cowan
   john deloach
   fred fardbarkle
   lori gortz
   paco gutierrez
   ephram hardy
   james ikeda

(The Script)
   # Scriptname: inplace.plx
2   while(<ARGV>){   # Open ARGV for reading
3     tr/a-z/A-Z/;
4     print;    # Output goes to file currently being read in-place
5     close ARGV if eof;
   }

(Output)
6  $ perl -i.bak inplace.plx names.txt
   $ more names.txt
   IGOR CHEVSKY
   NORMA CORDER
   JENNIFER COWAN
   JOHN DELOACH
   FRED FARDBARKLE
   LORI GORTZ
   PACO GUTIERREZ
   EPHRAM HARDY
   JAMES IKEDA
7  $ more names.txt.bak
   igor chevsky
   norma corder
   jennifer cowan
   john deloach
   fred fardbarkle
   lori gortz
   paco gutierrez
   ephram hardy
   james ikeda

Explanation

1. The contents of the original text file, called names.txt, are displayed.

2. The while loop is entered. The ARGV filehandle will be opened for reading. The ARGV filehandle represents the file coming in from the command line, names.txt.

3. All lowercase letters are translated to uppercase letters in the file being processed (tr function).

4. The print function sends its output to the file being processed in place.

5. The ARGV filehandle will be closed when the end of file is reached. This makes it possible to reset line numbering for each file when processing multiple files or to mark the end of files when appending.

6. The -i in-place switch is used with an extension, .bak. The names.txt file will be edited in place and the original file will be saved in names.txt.bak. The names.txt file has been changed, illustrating that the file was modified in place.

7. The names.txt.bak file was created as a backup file for the original file. The original file, names.txt, was changed in place.
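The same effect can be had from inside a script by setting the special variable $^I, which holds the value of the -i switch. The following is a minimal sketch, assuming a hypothetical helper named upcase_in_place and a throwaway copy of names.txt in a temporary directory.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: uppercase every line of the named files in place,
# keeping a backup of each original with the given extension.
sub upcase_in_place {
    my ($backup_ext, @files) = @_;
    local $^I   = $backup_ext;   # same effect as -i with an extension
    local @ARGV = @files;        # files for <> to process
    while (<>) {
        tr/a-z/A-Z/;
        print;                   # goes to the file being edited, not the screen
    }
    return;
}

# Demonstration on a temporary file so no real data is touched.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/names.txt";
open my $out, '>', $file or die "Cannot write $file: $!";
print {$out} "igor chevsky\nnorma corder\n";
close $out;

upcase_in_place('.bak', $file);

open my $in, '<', $file or die "Cannot read $file: $!";
print while <$in>;               # show the edited file
close $in;
```

Localizing $^I and @ARGV keeps the in-place behavior confined to the helper, so later reads with <> elsewhere in the program are not affected.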

10.4 File Testing

Like the shells, Perl provides a number of file test operators (see Table 10.3) to check for the various attributes of a file, such as existence, access permissions, directories, and so on. Most of the operators return 1 for true and "" (the empty string) for false.

Table 10.3 File Test Operators

A single underscore can be used to represent the name of the file if the same file is tested more than once. The stat structure of the previous file test is used.

EXAMPLE 10.43

(At the Command Line)
1  $ ls -l perl.test
   -rwxr-xr-x  1 ellie         417 Apr 23 13:40 perl.test
2  $ ls -l afile
   -rws--x--x  1 ellie           0 Apr 23 14:07 afile

(In Script)
   use warnings;
   my $file="perl.test";

3  print "File is readable\n" if -r  $file;
   print "File is writeable\n" if -w  $file;
   print "File is executable\n" if -x  $file;
   print "File is a regular file\n" if -f  $file;
   print "File is a directory\n" if -d $file;
   print "File is text file\n" if -T $file;
   print "File was modified in the last 12 hours.\n" if -M $file < .5;
   print "File has been accessed in the last 12 hours.\n" if -A $file < .5;

4  print "File has read, write, and execute set.\n"
         if -r $file && -w _ && -x _;
5  stat("afile");  # stat another file
   print "File is a set user id program.\n" if  -u _;
                   # underscore evaluates to last file stat'ed
   print "File is zero size.\n" if -z _;

(Output)
3  File is readable
   File is writeable
   File is executable
   File is a regular file
   *** No print out here because the file is not a directory ***
   File is text file
   File was modified in the last 12 hours.
   File has been accessed in the last 12 hours.
   File has read, write, and execute set.
   File is a set user id program.
   File is zero size.

Explanation

1. The permissions, ownership, file size, and so on, on perl.test are shown.

2. The permissions, ownership, file size, and so on, on afile are shown.

3. The print statement is executed if the file is readable, writeable, executable, and so on.

4. Since the same file is checked for more than one attribute, the underscore filehandle (_) follows the file test operator in place of the filename. The underscore refers to the stat structure saved by the previous file test, which holds the information about the file.

5. The stat function returns a 13-element list containing the statistics about a file. Because the file tests that follow use the underscore rather than a filename, they are applied to the statistics for afile.
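The underscore technique and the day-based value returned by -M can be combined in a small routine. This is only a sketch: the helper name describe_file and the wording of the descriptions are assumptions, and the set of tests is a selection rather than the full contents of Table 10.3.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: return a list of descriptions for one path,
# looking the file up once and reusing the result through _.
sub describe_file {
    my ($path) = @_;
    return ('does not exist') unless -e $path;   # -e fills the stat buffer
    my @facts;
    push @facts, 'regular file' if -f _;
    push @facts, 'directory'    if -d _;
    push @facts, 'readable'     if -r _;
    push @facts, 'writeable'    if -w _;
    push @facts, 'zero size'    if -z _;
    my $age_days = -M _;         # age in days, measured from script start time
    push @facts, 'modified in the last 12 hours' if $age_days < 0.5;
    return @facts;
}

# Demonstration: an empty temporary file and the current directory.
my ($fh, $empty) = tempfile(UNLINK => 1);
close $fh;

for my $path ($empty, '.') {
    print "$path: ", join(', ', describe_file($path)), "\n";
}
```

Since -M counts in days, 12 hours is written as 0.5. A file created after the script started has a negative age, which also passes the test.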

10.5 What You Should Know

1. What is a filehandle?

2. What does it mean to open a file for reading?

3. When opened for writing, if the file exists, what happens to it?

4. How does > differ from >> when opening a file?

5. What is the purpose of the select() function?

6. What is binmode?

7. What does the die() function accomplish when working with files?

8. How do Windows and UNIX differ in how they terminate a line?

9. What is an exclusive lock?

10. What does the tell() function return?

11. What is the difference between the +< and +> symbols?

12. What does the stat() function do?

13. How do you reposition the file pointer in a file?

14. How does the -M switch work when testing a file?

10.6 What’s Next?

Until this point, all the functions you have used were provided by Perl. The print(), printf(), push(), pop(), and chomp() functions are all examples of built-in Perl functions. All you had to know was what they were supposed to do and how to use them. You did not have to know what the Perl authors did to make the functions work; you just assumed they knew what they were doing. In the next chapter, you will write your own functions, also called subroutines, and learn how to send messages to them and return some result.

Exercise 10: Getting a Handle on Things

Part 1

1. Create a filehandle for reading from the datebook file (on the CD); print to another filehandle the names of all those who have a salary greater than $50,000.

2. Ask the user to input data for a new entry in the datebook file. (The name, phone, address, and so on, will be stored in separate scalars.) Append the new line to the datebook file by using a user-defined filehandle.

Part 2

This problem appeared on a Web site called daniweb.com. Can you solve it?

1. We need a Perl program that will check whether or not an IP address entered by a user is valid. The user is to enter the IP address as a command-line parameter. For example, the user could type at the prompt

check_ip.pl 192.168.9.23

and the script will attempt to validate the IP address 192.168.9.23.

2. The script must first check whether the user has input any data and if not, display an appropriate error message. A valid IP address must have:

a. Four octets, each separated by a dot.

b. Only numbers are allowed in each of the four octets (meaning, no alphabetic or punctuation characters are allowed within each octet).

c. The first octet value is between 1 and 255. The second, third, and fourth octet values are between 0 and 255. Only one IP address is to be input and validated (meaning, there is no looping through several IP addresses).

Part 3

1. Use a pipe to list all the files in your current directory, and print only those files that are readable text files. Use the die function to quit if the open fails. For UNIX users, the command is ls. For Windows use dir /b. (Hint: Don’t forget to chomp!)

2. Rewrite the program to test whether any of the files listed have been modified in the last 12 hours. Print the names of those files.

Part 4

1. Sort the datebook file by names, using a pipe.

Part 5

1. Create a number of duplicate entries in the datebook file. Fred Fardbarkle, for example, might appear five times, and Igor Chevsky three times. In most editors, this will be a simple copy/paste operation.

a. Write a program that will assign the name of the datebook file to a scalar and check to see if the file exists. If it does exist, the program will check to see if the file is readable and writeable. Use the die function to send any errors to the screen. Also tell the user when the datebook was last modified.

b. The program will read each line of the datebook file giving each person a 10% raise in salary. If, however, the person appears more than once in the file (assume having the same first and last name means it is a duplicate), he will be given a raise the first time, but if he appears again, he will be skipped. Send each line of output to a file called raise. The raise file should not contain any person’s name more than once. It will also reflect the 10% increase in pay. Display on the screen the average salary for all the people in the datebook file. For duplicate entries, print the names of those who appeared in the file more than once, and how many times each appeared.

2. Write a script called checking that will take any number of filenames as command-line arguments and will print the names of those files that are readable and writeable text files. The program will print an error message if there are no arguments, and exit.
