One of the best parts of being a programmer is launching someone else’s code so that you don’t have to write it yourself. It’s time to learn how to manage your children—child processes, that is—by launching other programs directly from Perl.
And like everything else in Perl, There’s More Than One Way To Do It, with lots of overlap, variations, and special features. So, if you don’t like the first way, just read on for another page or two for a solution more to your liking.
Perl is very portable; most of the rest of this book doesn’t need many notes saying that it works this way in Unix systems and that way in Windows and some other way on VMS. But when you’re starting other programs on your machine, different programs are available on a Macintosh than what you’ll likely find on an old Cray (which used to be a “super” computer). The examples in this chapter are primarily Unix based; if you have a non-Unix system, you can expect to see some differences.
The simplest way to launch a child process in Perl to run a program is
with the system
function. For example, to
invoke the Unix date command from within Perl, you tell system
that’s the program you want to
run:
system 'date';
These commands depend on your system, what it provides, and how it implements those commands. They aren’t Perl, but something Perl asks the system to do for your program. The same Unix command might have different calling conventions and options on different versions of that operating system.
If you are using Windows, that code will show you the date but also
prompt you to enter a new date. Your program will wait for you to enter a
new date. You probably want the /T
switch to suppress that:
system 'date /T';
You run that from the parent process. When it
runs, the system
command creates an
identical copy of your Perl program, called the child
process. The child process immediately changes itself into the command
that you want to run, such as date,
sharing Perl’s standard input, standard output, and standard error. This
means that the normal short date-and-time string generated by date ends up wherever Perl’s STDOUT
was already going.
The parameter to the system
function is generally whatever you normally type at the shell. So, if it
were a more complicated command, like ls -l $HOME to list
the contents of your home directory, you could put all that into the
parameter:
system 'ls -l $HOME';
That $HOME
is a shell variable that knows the path to your home directory. It’s not a Perl
variable and you don’t want to interpolate it. If you had it in double
quotes, you’d have to escape the $
to
prevent the interpolation:
system "ls -l \$HOME";
On Windows, the same task uses the dir command. The
%
signs belong to the command, not a
Perl variable. But hashes don’t interpolate in double-quoted strings, so
you don’t need to escape them:
system "cmd /c dir %userprofile%";
If you have Cygwin or MinGW installed, some of the Windows command
shell commands may run something different than what you expect. Using
cmd /c
ensures that you get the Windows
version.
Now, the normal Unix date command
is output-only, but let’s say it’s a chatty command, asking first “for
which time zone do you want the time?”, or that the Windows version
prompts you for the new date. That message ends up on standard output and
the program listens on standard input (inherited from Perl’s STDIN
) for the response. You see the question,
and type in the answer (like “Zimbabwe time”), and then date will finish its duty.
While the child process is running, Perl patiently waits for it to finish. So if the date command takes 37 seconds, Perl pauses for those 37 seconds. You can use the shell’s facility to launch a background process, however:
system "long_running_command with parameters &";
Here, the shell gets launched, which then notices the ampersand at
the end of the command line, causing the shell to put long_running_command
into the background. And
then the shell exits rather quickly; Perl notices this and moves on. In
this case, the long_running_command
is
really a grandchild of the Perl process, to which
Perl really has no direct access or knowledge.
Windows doesn’t have a backgrounding mechanism, but start can run the command without your program waiting for it:
system 'start /B long_running_command with parameters';
When a command is “simple enough,” no shell gets involved. So for
the date and ls commands earlier, Perl directly launched your
requested command, which searches the inherited PATH
to find the command, if necessary. But if
there’s anything weird in the string (such as shell metacharacters like
the dollar sign, semicolon, or vertical bar), Perl invokes the standard
Bourne Shell (/bin/sh) in Unix or the
shell set in the PERL5SHELL
environment
variable in Windows (by default, that’s cmd /x/d/c).
The PATH
is the list of
directories your system searches to find programs. You can change
PATH
by adjusting $ENV{'PATH'}
at any time.
For example, you can write an entire little shell script in the argument. This one prints the contents of all the (nonhidden) files in the current directory:
system 'for i in *; do echo == $i ==; cat $i; done';
Here, again, you’re using single quotes because the dollar signs are
for the shell and not for Perl. Double quotes would allow Perl to
interpolate $i
to its current Perl
value and not let the shell expand it to its own value.
On Windows you don’t have those interpolation issues. The /R
works recursively, so you might end up with a
long list of files:
system 'for /R %i in (*) DO echo %i & type %i';
Note that your ability to do this doesn’t mean it’s wise for you to do this. You know it’s possible, but often there’s a pure Perl solution that will do the same thing. On the other hand, Perl is a glue language meant to work in the ugly spaces between programs that need to coordinate with each other.
The system operator may also be invoked with more than one argument, in which case a shell doesn’t get involved, no matter how complicated the text:
my $tarfile = 'something*wicked.tar';
my @dirs = qw(fred|flintstone <barney&rubble> betty );
system 'tar', 'cvf', $tarfile, @dirs;
system can also take an indirect object, as in system { 'fred' } 'barney', which runs the program barney but lies to it so that it thinks it's called 'fred'. See the perlsec documentation or Mastering Perl's security chapter for more details.
In this case, the first parameter ('tar'
here) gives the name of a command found
in the normal PATH
-searching way,
while Perl passes the remaining arguments one by one, directly to that
command. Even if the arguments have shell-significant characters, such
as the name in $tarfile
or the
directory names in @dirs
, the shell
never gets a chance to mangle the string. That tar command will get precisely five parameters. Compare that with this
security problem:
system "tar cvf $tarfile @dirs"; # Oops!
Here, you’ve now piped a bunch of stuff into a flintstone command, put it into the
background, and opened betty for
output. That’s a relatively tame effect, but what if @dirs
was something more interesting, such
as:
my @dirs = qw( ; rm -rf / );
It doesn’t matter that @dirs
is
a list, because Perl simply interpolates it into the single string to
pass to system
.
And that’s a bit scary, especially if those variables are from
user input—such as from a web form or something. So if you
can arrange things so that you can use the multiple-argument version of system
, you
probably should use that way to launch your subprocess. You’ll have to
give up the ability to have the shell do the work for you to set up I/O
redirection, background processes, and the like, though. There’s no such
thing as a free lunch.
A single-argument invocation of system is nearly equivalent to this multiple-argument version of system:
system $command_line;
system '/bin/sh', '-c', $command_line;
But nobody writes the latter since that’s what Perl does already. If you want things processed by a different shell, like the C-shell, you can specify that:
system '/bin/csh', '-fc', $command_line;
This is also handy for handling whitespace in filenames since the shell doesn’t step in to break up the arguments. This command sees exactly one filename:
system 'touch', 'name with spaces.txt';
See Mastering
Perl for a longer discussion of the security
features of the list form of system
. The perlsec
documentation comes in handy too.
On Windows, you can set the $ENV{PERL5SHELL}
value to be the shell that you want. You’ll see environment variables in
the next section, so keep reading.
The return value of the system operator is based on the exit status of the child command:
unless (system 'date') {
  # Return was zero, meaning success
  print "We gave you a date, OK!\n";
}
An exit value of 0
usually
means that everything is OK, and a nonzero exit value usually indicates
that something went wrong. This is part of the “0 but true” notion that
the value of zero is a good thing. This is backward from the normal
“true is good—false is bad” strategy for most of the operators, so to
write a typical “do this or die” style, we’ll need to flip false and
true. The easiest way is to simply prefix the system
operator with a bang (the logical-not
operator):
!system 'rm -rf files_to_delete' or die 'something went wrong';
In this case, including $!
in
the error message is not appropriate, because the failure is most likely
somewhere within the experience of the external rm command, and it’s not a system
-related error within Perl that $!
can reveal.
Don’t rely on this behavior, though. It’s up to each command to decide what it wants to return. Some values may be nonzero but also indicate success. If that’s the case, you need to inspect the return value more closely.
The system
return value is two
octets. The “high” octet has the exit value of the program. If you want
that, you need to shift the bits down eight bits (remember the bit
operators from Chapter 12):
my $return_value = system( ... );
my $child_exit_code = $return_value >> 8;
The “low” octet combines several things. The highest bit notes if a core dump happened. The hexadecimal and binary representations (recall them from Chapter 2) can help mask out the parts you don’t want:
my $low_octet     = $return_value & 0xFF;       # mask out high octet
my $dumped_core   = $low_octet & 0b1000_0000;   # 128
my $signal_number = $low_octet & 0b0111_1111;   # 0x7f, or 127
Since Windows doesn’t have signals, the bits in these positions may have other meanings.
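To watch those masks at work, here's a small, runnable sketch. It uses perl itself ($^X) as the child so the exit code is predictable; the exit value 3 is just an arbitrary choice for the demonstration:

```perl
use strict;
use warnings;

# Run a child that exits with a known code (3 is arbitrary).
my $return_value = system $^X, '-e', 'exit 3';

my $child_exit_code = $return_value >> 8;        # the child's exit code
my $low_octet       = $return_value & 0xFF;      # signal and core-dump info
my $dumped_core     = $low_octet & 0b1000_0000;
my $signal_number   = $low_octet & 0b0111_1111;

print "exit=$child_exit_code signal=$signal_number\n";
```

Since the child exited normally, the signal number and core-dump bit are both zero.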
Your system may have a more specific error message in the
variable $^E
or ${^CHILD_ERROR_NATIVE}
. See perlvar and the POSIX module (especially the W* macros to decode the child's status).
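For example, here's a sketch using the POSIX W* macros on ${^CHILD_ERROR_NATIVE}; this assumes a Unix-like system, and the exit value 42 is arbitrary:

```perl
use strict;
use warnings;
use POSIX qw(WIFEXITED WEXITSTATUS WIFSIGNALED WTERMSIG);

# perl itself ($^X) stands in as a predictable child process.
system $^X, '-e', 'exit 42';
my $native = ${^CHILD_ERROR_NATIVE};

if (WIFEXITED($native)) {
    printf "child exited normally with status %d\n", WEXITSTATUS($native);
}
elsif (WIFSIGNALED($native)) {
    printf "child died from signal %d\n", WTERMSIG($native);
}
```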
When you’re starting another process (with any of the methods we show here), you may need to set up its environment in one way or another. As we mentioned earlier, you could start the process with a certain working directory, which it inherits from your process. Another common configuration detail is the environment variables.
One of the best-known environment variables is PATH
. (If you’ve never
heard of it, you probably haven’t used a system that has environment
variables.) On Unix and similar systems, PATH
is a colon-separated list of directories
that may hold programs. When you type a command like rm fred, the system will look for the rm command in
that list of directories, in order. Perl (or your system) will use
PATH
whenever it needs to find the
program to run. If the program in turn runs other programs, those may also
be found along the PATH
. (Of course, if
you give a complete name for a command, such as /bin/echo, there’s no need to search PATH
. But that’s generally much less
convenient.)
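As a quick illustration, you can split a Unix-style PATH on colons to see its directories; the path here is made up, and Windows would use semicolons instead:

```perl
use strict;
use warnings;

# A hypothetical PATH value, split into its directories (Unix-style
# colon separator; see the Config module for the portable separator).
my $sample_path = '/home/rootbeer/bin:/usr/bin:/bin';
my @path_dirs = split /:/, $sample_path;
print "$_\n" for @path_dirs;
```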
In Perl, the environment variables are available via the special
%ENV
hash; each key in this hash
represents one environment variable. At the start of your program’s
execution, %ENV
holds values it has
inherited from its parent process (generally the shell). Modifying this
hash changes the environment variables, which will then be inherited by
new processes and possibly used by Perl as well. For example, suppose you
wished to run the system’s make utility
(which typically runs other programs), and you want to use a
private directory as the first place to look for commands (including
make itself). And let’s say that you
don’t want the IFS
environment variable
to be set when you run the command, because that might cause make or some subcommand to do the wrong thing.
Here we go:
$ENV{'PATH'} = "/home/rootbeer/bin:$ENV{'PATH'}";
delete $ENV{'IFS'};
my $make_result = system 'make';
Different systems construct their paths differently. Unix uses
colons, but Windows uses semicolons, for instance. This is your constant
headache of working with external programs. You have to know a lot that
isn’t Perl. But Perl knows about the system it runs on, and you can find
out what it knows with the Config
module through its %Config
variable. Instead of assuming the PATH
separator as you did in the previous example, you could use join
with a glue
string you get from %Config
:
use Config;
$ENV{'PATH'} = join $Config{'path_sep'},
  '/home/rootbeer/bin', $ENV{'PATH'};
Newly created processes will generally inherit from their parent the environment variables, the current working directory, the standard input, output, and error streams, and a few more esoteric items. See the documentation about programming on your system for more details. (But on most systems, your program can’t change the environment for the shell or other parent process that started it.)
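Here's a small sketch of that inheritance, using perl itself ($^X) as the child and a made-up BEDROCK environment variable:

```perl
use strict;
use warnings;

# Set a (hypothetical) environment variable in the parent; the child
# process inherits it and reports back through its exit status.
$ENV{'BEDROCK'} = 'granite';
my $status = system $^X, '-e',
    'exit( ($ENV{BEDROCK} // q()) eq "granite" ? 0 : 1 )';
print $status == 0 ? "the child saw BEDROCK\n" : "it did not\n";
```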
Everything we’ve just said about system
syntax and semantics is also true about the exec
function, except for one (very important)
thing. The system
function creates a
child process, which then scurries off to perform the requested action
while Perl naps. The exec
function
causes the Perl process itself to perform the
requested action. Think of it as more like a “goto” than a subroutine
call.
For example, suppose you wanted to run the bedrock command in the /tmp directory, passing it arguments of -o args1 followed by whatever arguments your own program was invoked with. That’d look like this:
chdir '/tmp' or die "Cannot chdir /tmp: $!"; exec 'bedrock', '-o', 'args1', @ARGV;
When you reach the exec
operation, Perl locates bedrock and
“jumps into it.” At that point, there is no Perl process anymore, even
though it’s the same process, having performed the Unix exec
system call (or equivalent). The process
ID remains the same, but it’s now just the process running the bedrock command. When bedrock finishes, there’s no Perl to come back
to.
Why is this useful? Sometimes you want to use Perl to set up the environment for a program. You can affect environment variables, change the current working directory, and change the default filehandles:
$ENV{PATH} = '/bin:/usr/bin';
$ENV{DEBUG} = 1;
$ENV{ROCK} = 'granite';
chdir '/Users/fred';
open STDOUT, '>', '/tmp/granite.out';
exec 'bedrock';
If you use system instead of exec, you have a Perl program just standing around tapping its toes, waiting for the other program to complete so that Perl can finally exit as well, and that wastes a resource.
Having said that, it’s actually quite rare to use exec
, except in combination with fork
(which you’ll
see later). If you are puzzling over system
versus exec
, just pick system
, and nearly all of the time you’ll be
just fine.
Because Perl is no longer in control once the requested command has
started, it doesn’t make any sense to have any Perl code following the
exec
, except for handling the error
when the requested command cannot be started:
exec 'date';
die "date couldn't run: $!";
With both system
and exec
, the output of the launched command ends up wherever Perl’s standard
output is going. Sometimes it’s interesting to capture that output as a
string value to perform further processing. And that’s done simply by
creating a string using backquotes instead of single or double
quotes:
my $now = `date`;             # grab the output of date
print "The time is now $now"; # newline already present
Normally, this date command
spits out a string approximately 30 characters long to its
standard output, giving the current date and time followed by a newline.
When you’ve placed date between
backquotes, Perl executes the date
command, arranging to capture its standard output as a string value, and
in this case assigning it to the $now
variable.
This is very similar to the Unix shell’s meaning for backquotes.
However, the shell also performs the additional job of ripping off the
final end-of-line to make it easier to use the value as part of other
things. Perl is honest; it gives the real output. To get the same result
in Perl, you can simply add an additional chomp
operation on the result:
chomp(my $no_newline_now = `date`);
print "A moment ago, it was $no_newline_now, I think.\n";
The value between backquotes is just like the single-argument form
of system
and is interpreted as a double-quoted string, meaning that
backslash-escapes and variables are expanded appropriately. For example,
to fetch the Perl documentation on a list of Perl functions, we might
invoke the perldoc
command repeatedly, each time with a different argument:
my @functions = qw{ int rand sleep length hex eof not exit sqrt umask };
my %about;
foreach (@functions) {
  $about{$_} = `perldoc -t -f $_`;
}
Note that $_
has a different value
for each invocation, letting you grab the output of a different command
that varies in only one of its parameters. Also note that if you haven’t seen
some of these functions yet, it might be useful to look them up in the
documentation to see what they do!
Instead of the backquotes, you can use the generalized quoting
operator, qx()
, that does
the same thing:
foreach (@functions) {
  $about{$_} = qx(perldoc -t -f $_);
}
As with the other generalized quotes, you mainly use this when the
stuff inside the quotes also contains the default delimiter. If you want to
have a literal backquote in your command, you can use the qx()
mechanism to avoid the hassle of escaping
the offending character. There’s another benefit to generalized
quoting: if you use the single quote as the delimiter, the quoting
does not interpolate anything. If you want to use the shell’s process ID
variable, $$
, instead of
Perl’s, you use qx''
to avoid the
interpolation:
my $output = qx'echo $$';
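A runnable sketch of that difference (a Unix shell is assumed):

```perl
use strict;
use warnings;

# With qx'' the shell, not Perl, expands $$ -- so we get the child
# shell's process ID, which differs from Perl's own $$.
my $shell_pid = qx'echo $$';
chomp $shell_pid;
print "Perl is process $$; the shell was process $shell_pid\n";
```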
At the risk of actually introducing the behavior by demonstrating how not to do it, we’d also like to suggest that you avoid using backquotes in a place where the value isn’t being captured. For example:
print "Starting the frobnitzigator:\n";
`frobnitz -enable`;  # no need to do this if you ignore the string
print "Done!\n";
The problem is that Perl has to work a bit harder to capture the
output of this command, even if you don’t use it. This is known as
void context and you should generally avoid asking
Perl to do work when you won’t use the result. You also lose the option to
use multiple arguments to system
to
precisely control the argument list. So from both a security standpoint
and an efficiency viewpoint, just use system
instead, please.
Standard error of a backquoted command goes to the same place as
Perl’s current standard error output. If the command spits out error
messages to the default standard error, you’ll probably see them on the
terminal, which could be confusing to the user who hasn’t personally
invoked the frobnitz command but still
sees its errors. If you want to capture error messages with standard
output, you can use the shell’s normal “merge standard error to the
current standard output,” which is spelled 2>&1
in the normal Unix and Windows
shells:
my $output_with_errors = `frobnitz -enable 2>&1`;
Note that this will intermingle the standard error output with the
standard output, much as it appears on the terminal (although possibly in
a slightly different sequence because of buffering). If you need the
output and the error output separated, there are many more flexible
solutions, such as IPC::Open3
in the
standard Perl library, or writing your own forking code, as you will see
later. Similarly, standard input is inherited from Perl’s current standard
input. Most commands you typically use with backquotes do not read
standard input, so that’s rarely a problem. However, let’s say the
date command asked which time zone (as
we imagined earlier). That’ll be a problem because the prompt for “which
time zone” will be sent to standard output, which is being captured as
part of the value, and then the date
command will start trying to read from standard input. But since the user
has never seen the prompt, they don’t know they should be typing anything!
Pretty soon, the user calls you up and tells you your program is
stuck.
So, stay away from commands that read standard input. If you’re not sure whether something reads from standard input, add a redirection from /dev/null for input, like this for Unix:
my $result = `some_questionable_command arg arg argh </dev/null`;
and like this for Windows:
my $result = `some_questionable_command arg arg argh < NUL`;
Then the child shell will redirect input from the “null device,” and the questionable grandchild command will, at worst, try to read and immediately get an end-of-file.
The Capture::Tiny
and IPC::System::Simple
modules can capture the output while handling the system-specific
details for you. Install them from CPAN.
The scalar context use of backquotes returns the captured output as a single long string, even if it looks to you like there are multiple “lines” because it has newlines. Computers don’t care about lines, really. That’s something we care about and tell computers to interpret for us. Those newlines are just another character as far as a computer is concerned. However, using the same backquoted string in a list context yields a list containing one line of output per element.
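Here's a self-contained way to see the difference, using perl itself as the child so the output is predictable (the three names are made up):

```perl
use strict;
use warnings;

# The same backquoted command in scalar versus list context.
my $cmd = qq($^X -le 'print for qw(fred barney betty)');

my $all_at_once  = `$cmd`;   # one string with embedded newlines
my @line_by_line = `$cmd`;   # three elements, each ending in a newline

printf "scalar gave %d characters; list gave %d lines\n",
    length $all_at_once, scalar @line_by_line;
```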
For example, the Unix who command normally spits out a line of text for each current login on the system as follows:
merlyn   tty/42   Dec  7 19:41
rootbeer console  Dec  2 14:15
rootbeer tty/12   Dec  6 23:00
The left column is the username, the middle column is the TTY name (that is, the name of the user’s connection to the machine), and the rest of the line is the date and time of login (and possibly remote login information, but not in this example). In a scalar context, we get all that at once, which we would then need to split up on our own:
my $who_text = `who`;
my @who_lines = split /\n/, $who_text;
But in a list context, we automatically get the data broken up by lines:
my @who_lines = `who`;
You’ll have a number of separate elements in @who_lines
, each one terminated by a newline.
Of course, adding a chomp
around the
outside of that will rip off those newlines, but you can go in a different
direction. If you put that as part of the value for a foreach
, you’ll iterate over the lines
automatically, placing each one in $_
:
my %ttys;
foreach (`who`) {
  my($user, $tty, $date) = /(\S+)\s+(\S+)\s+(.*)/;
  $ttys{$user} .= "$tty at $date\n";
}
This loop will iterate three times for the sample who
output. (Your
system will probably have more than three active logins at any given
time.) Notice that you have a regular expression match, and in the
absence of the binding operator (=~
),
it matches against $_
—which is good,
because that’s where the data is.
Also notice the regular expression looks for a nonblank word, some
whitespace, a nonblank word, some whitespace, and then the rest of the
line up to, but not including, the newline (since dot doesn’t match
newline by default). That’s also good, because that’s what the data
looks like each time in $_
. That’ll
make $1
be merlyn
, $2
be tty/42
, and $3
be Dec 7
19:41
, as a successful match on the first time through the
loop.
Now you can see why dot (or \N
) doesn’t match newline by default. It
makes it easy to write patterns like this one, in which we don’t have
to worry about a newline at the end of the string.
However, this regular expression match is in a list context, so
you get the list of memories instead of the true/false “did it match”
value, as you saw in Chapter 8. So, $user
ends up being merlyn
, and so on.
The second statement inside the loop simply stores away the TTY
and date information, appending to a (possibly undef
) value in the hash, because a user might
be logged in more than once, as user rootbeer
was in that example.
Running or capturing output from external commands is tricky business, especially
since Perl aims to work on so many diverse platforms, each with its own
way of doing things. Paul Fenwick’s IPC::System::Simple
module fixes that by
providing a simpler interface that hides the complexity of the operating
system-specific stuff. It doesn't come with Perl (yet), so you have to get
it from CPAN.
There’s really not that much to say about this module, because it is
truly simple. You can use it to replace the built-in system
with its own, more robust version:
use IPC::System::Simple qw(system);
my $tarfile = 'something*wicked.tar';
my @dirs = qw(fred|flintstone <barney&rubble> betty );
system 'tar', 'cvf', $tarfile, @dirs;
It also provides a systemx
that
never uses the shell, so you should never have the problem of unintended
shell actions:
systemx 'tar', 'cvf', $tarfile, @dirs;
If you want to capture the output, you change the system
or systemx
to capture
or capturex
, both of which work like backquotes
(but better):
my @output = capturex 'tar', 'cvf', $tarfile, @dirs;
Paul put in a lot of work to ensure that these subroutines do the right thing under Windows. There’s a lot more that this module can do to make your life easier, although we’ll refer you to the module documentation for that since some of the fancier features require references, which we don’t show you until you read Intermediate Perl. If you can use this module, we recommend it over the built-in Perl operators for the same thing.
So far, you’ve seen ways to deal with synchronous processes, where Perl stays in charge, launches a command, (usually) waits for it to finish, then possibly grabs its output. But Perl can also launch a child process that stays alive, communicating to Perl on an ongoing basis until the task is complete.
The syntax for launching a concurrent (parallel) child process is to
put the command as the “filename” for an open
call, and either precede or follow the
command with a vertical bar, which is the “pipe” character. For that
reason, this is often called a piped open. In the
two-argument form, the pipe goes before or after the command that you want
to run:
open DATE, 'date|' or die "cannot pipe from date: $!";
open MAIL, '|mail merlyn' or die "cannot pipe to mail: $!";
In the first example, with the vertical bar on the right, Perl
launches the command with its standard output connected to the DATE
filehandle opened for reading, similar to
the way that the command date |
your_program would work from the shell. In the second example,
with the vertical bar on the left, Perl connects the command’s standard
input to the MAIL
filehandle opened for
writing, similar to what happens with the command your_program | mail merlyn. In either case, the
command continues independently of the Perl process. The open
fails if Perl can’t start the child
process. If the command itself does not exist or exits erroneously, Perl
will not see this as an error when opening the filehandle, but as an error
when closing it. We’ll get to that in a moment.
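Here's a sketch of that behavior, with a deliberately bogus command name (a Unix shell is assumed):

```perl
use strict;
use warnings;

# 'some_nonexistent_command' is intentionally bogus: the open itself
# succeeds because the shell starts fine, and the failure only shows
# up in $? when we close the filehandle.
open my $fh, '-|', 'some_nonexistent_command 2>/dev/null'
    or die "cannot start child: $!";
my @output = <$fh>;
close $fh;
my $child_failed = $? != 0;
print $child_failed ? "child failed (status $?)\n" : "child succeeded\n";
```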
If the Perl process exits before the command is complete, a command that’s been reading will see end-of-file, while a command that’s been writing will get a “broken pipe” error signal on the next write, by default.
The three-argument form is a bit tricky, because for the read
filehandle, the pipe character comes after the command. There are special
modes for that, though. For the filehandle mode, if you want a read
filehandle, you use -|
, and if you want
a write filehandle, you use |-
to show
which side of the pipe you want to place the command:
open my $date_fh, '-|', 'date' or die "cannot pipe from date: $!";
open my $mail_fh, '|-', 'mail merlyn' or die "cannot pipe to mail: $!";
The pipe open
can also take more
than three arguments. The fourth and subsequent arguments become the
arguments to the command, so you can break up that command string to
separate the command name from its arguments:
open my $mail_fh, '|-', 'mail', 'merlyn' or die "cannot pipe to mail: $!";
Sadly, the list form of the piped open
doesn’t work
in Windows. You’ll have to settle for a module to do that for you.
Either way, for all intents and purposes, the rest of the program doesn’t know, doesn’t care, and would have to work pretty hard to figure out that this is a filehandle opened on a process rather than on a file. So, to get data from a filehandle opened for reading, you read the filehandle normally:
my $now = <$date_fh>;
And to send data to the mail process (waiting for the body of a
message to deliver to merlyn
on
standard input), a simple print-with-a-filehandle will do:
print $mail_fh "The time is now $now"; # presume $now ends in newline
In short, you can pretend that these filehandles are hooked up to magical files, one that contains the output of the date command, and one that will automatically be mailed by the mail command.
If a process is connected to a filehandle that is open for reading, and then exits, the filehandle returns end-of-file, just like reading up to the end of a normal file. When you close a filehandle open for writing to a process, the process will see end-of-file. So, to finish sending the email, close the handle:
close $mail_fh;
die "mail: nonzero exit of $?" if $?;
When closing a filehandle attached to a process, Perl waits for the process to
complete so that it can get the process’s exit status. The exit status
is then available in the $?
variable
(reminiscent of the same variable in the Bourne Shell) and is the same
kind of number as the value returned by the system
function: zero for success, nonzero for
failure. Each new exited process overwrites the previous value, though, so
save it quickly if you want it. (The $?
variable also
holds the exit status of the most recent system
or backquoted command, if you’re
curious.)
The processes are synchronized just like a pipelined command. If you try to read and no input is available, the process is suspended (without consuming additional CPU time) until the sending program has started speaking again. Similarly, if a writing process gets ahead of the reading process, the writing process is slowed down until the reader starts to catch up. There’s a buffer (usually 8 KB or so) in between, so they don’t have to stay precisely in lockstep.
Why use processes as filehandles? Well, it’s the only easy way to write to a process based on the results of a computation. But if you’re just reading, backquotes are often much easier to manage, unless you want to have the results as they come in.
For example, the Unix find command locates files based on their attributes, and it can take quite a while if used on a fairly large number of files (such as starting from the root directory). You can put a find command inside backquotes, but it’s often nicer to see the results as they are found:
open my $find_fh, '-|', 'find', qw( / -atime +90 -size +1000 -print )
  or die "cannot pipe from find: $!";
while (<$find_fh>) {
  chomp;
  printf "%s size %dK last accessed %.2f days ago\n",
    $_, (1023 + -s $_)/1024, -A $_;
}
That find command looks for all the files that have not been accessed within the past 90 days and that are larger than 1,000 blocks (these are good candidates to move to longer-term storage). While find is searching and searching, Perl can wait. As it finds each file, Perl responds to the incoming name and displays some information about that file for further research. Had this been written with backquotes, you would not see any output until the find command had completely finished, and it’s comforting to see that it’s actually doing the job even before it’s done.
In addition to the high-level interfaces already described, Perl provides nearly direct access to the low-level process management system calls of Unix and some other systems. If you’ve never done this before, you will probably want to skip this section. While it’s a bit much to cover all that stuff in a chapter like this, let’s at least look at a quick reimplementation of this:
system 'date';
You can do that using the low-level system calls:
defined(my $pid = fork) or die "Cannot fork: $!";
unless ($pid) {
  # Child process is here
  exec 'date';
  die "cannot exec date: $!";
}
# Parent process is here
waitpid($pid, 0);
Windows does not support a native fork
, but Perl tries to fake it. If you want to do this sort of
thing, you can use Win32::Process
or
a similar module for native process management.
Here, you check the return value from fork
, which is undef
if it failed. Usually it succeeds, causing
two separate processes to continue to the next line, but only the parent
process has a nonzero value in $pid
, so
only the child process executes the exec
function. The parent process skips over
that and executes the waitpid
function,
waiting for that particular child to finish (if others
finish in the meantime, they are ignored). If that all sounds like
gobbledygook, just remember that you can continue to use the system
function without being laughed at by your
friends.
When you go to this extra trouble, you also have full control over creating arbitrary pipes, rearranging filehandles, and noticing your process ID and your parent’s process ID (if knowable). But again, that’s all a bit complicated for this chapter, so see the details in the perlipc documentation (and in any good book on application programming for your system) for further information.
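As a small taste of that control, here is a sketch (a variation on the fork example above) in which each process reports its own process ID from $$ and the child looks up its parent’s with getppid:

```perl
# A sketch: after fork, each process sees its own PID in $$,
# and the child can find its parent's PID with getppid.
defined(my $pid = fork) or die "Cannot fork: $!";
unless ($pid) {                      # child process
  print "child $$, parent is ", getppid, "\n";
  exit;                              # don't fall through to parent code
}
waitpid($pid, 0);                    # parent reaps the child
print "parent $$ forked child $pid\n";
```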
A Unix signal is a tiny message sent to a process. It can’t say much; it’s like
a car horn honking. Does that honk you hear mean “look out—the bridge
collapsed” or “the light has changed—get going” or “stop driving—you’ve
got a baby on the roof” or “hello, world”? Well, fortunately, Unix signals
are a little easier to interpret than that because there’s a different one
for each of these situations. Well, not exactly these
situations, but analogous Unix-like ones. For these, the signals
are SIGHUP
, SIGCONT
, SIGINT
, and the fake SIGZERO
(signal number zero).
Windows implements a subset of POSIX signals, so much of this might not be true on that system.
Different signals are identified by a name (such as SIGINT
, meaning “interrupt signal”) and a
corresponding small integer (in the range from 1 to 16, 1 to 32, or 1 to
63, depending on your Unix flavor). Programs or the operating system
typically send signals to another program when a significant event
happens, such as pressing the interrupt character (typically Ctrl-C) on
the terminal, which sends a SIGINT
to
all the processes attached to that terminal. Some signals are sent
automatically by the system, but they can also come from another
process.
You can send signals from your Perl process to another process, but
you have to know the target’s process ID number. How you figure that out
is a bit complicated, but let’s say you know that you want to send a
SIGINT
to process 4201. That’s easy
enough if you know that SIGINT
corresponds to the number 2
:
kill 2, 4201 or die "Cannot signal 4201 with SIGINT: $!";
It’s named “kill” because one of the primary purposes of signals is
to stop a process that’s gone on long enough. You can also use the string
'INT'
in place of the 2
, so you don’t have to know the number:
kill 'INT', 4201 or die "Cannot signal 4201 with SIGINT: $!";
You can even use the =>
operator to automatically quote the signal name:
kill INT => 4201 or die "Cannot signal 4201 with SIGINT: $!";
On a Unix system, the kill command (not the Perl built-in) can translate between the signal number and the name:
$ kill -l 2
INT
Or, given a name, it can give you the number:
$ kill -l INT
2
With no argument to -l
, it prints
all the numbers and names:
$ kill -l
1) SIGHUP 2) SIGINT 3) SIGQUIT 4) SIGILL
5) SIGTRAP 6) SIGABRT 7) SIGEMT 8) SIGFPE
9) SIGKILL 10) SIGBUS 11) SIGSEGV 12) SIGSYS
13) SIGPIPE 14) SIGALRM 15) SIGTERM 16) SIGURG
17) SIGSTOP 18) SIGTSTP 19) SIGCONT 20) SIGCHLD
21) SIGTTIN 22) SIGTTOU 23) SIGIO 24) SIGXCPU
25) SIGXFSZ 26) SIGVTALRM 27) SIGPROF 28) SIGWINCH
29) SIGINFO 30) SIGUSR1 31) SIGUSR2
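You can do the same lookup from Perl with the standard Config module, which records the signal names and numbers your perl was built with. A short sketch (assuming your system uses the usual Unix numbering, where SIGINT is 2):

```perl
# $Config{sig_name} is a space-separated list of signal names,
# starting with the fake ZERO, in signal-number order.
use Config;
my @names  = split ' ', $Config{sig_name};   # e.g. ZERO HUP INT QUIT ...
my %number = map { $names[$_] => $_ } 0 .. $#names;
print "INT is signal $number{'INT'}\n";      # prints 2 on most Unix systems
```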
If you try to interrupt a process that no longer exists or isn’t yours, you’ll get a false return value.
You can also use this technique to see whether a process is still
alive. A special signal number of 0
says, “Just check to see whether I could send a
signal if I wanted to, but I don’t want to, so don’t actually send
anything.” So a process probe might look like:
unless (kill 0, $pid) { warn "$pid has gone away!"; }
Perhaps a little more interesting than sending signals is catching signals. Why might you want to do this? Well, suppose you have a program that creates files in /tmp, and you normally delete those files at the end of the program. If someone presses Ctrl-C during the execution, that leaves trash in /tmp, a very impolite thing to do. To fix this, you can create a signal handler that takes care of the cleanup:
my $temp_directory = "/tmp/myprog.$$"; # create files below here
mkdir $temp_directory, 0700 or die "Cannot create $temp_directory: $!";

sub clean_up {
  unlink glob "$temp_directory/*";
  rmdir $temp_directory;
}

sub my_int_handler {
  &clean_up();
  die "interrupted, exiting...\n";
}

$SIG{'INT'} = 'my_int_handler';
...; # some unspecified code here
# Time passes, the program runs, creates some temporary
# files in the temp directory, maybe someone presses Ctrl-C
...; # Now it's the end of normal execution
&clean_up();
The File::Temp
module, which
comes with Perl, can automatically clean up temporary files
and directories.
The assignment into the special %SIG
hash
activates the handler (until revoked). The key is the name of the signal
(without the constant SIG
prefix), and
the value is a string naming the subroutine, without the ampersand. From
then on, if a SIGINT
comes along, Perl
stops whatever it’s doing and jumps immediately to the subroutine. Your
subroutine cleans up the temp files and then exits. (And if nobody presses
Ctrl-C, we’ll still call &clean_up()
at the end of normal
execution.)
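A subroutine name isn’t the only value you can store in %SIG. This quick sketch shows the other common possibilities:

```perl
# A code reference works as well as a subroutine name; the
# special strings 'DEFAULT' and 'IGNORE' restore the normal
# behavior or discard the signal entirely.
sub my_int_handler { die "interrupted\n" }
$SIG{'INT'} = 'my_int_handler';   # by name, as above
$SIG{'INT'} = \&my_int_handler;   # or as a code reference
$SIG{'INT'} = 'DEFAULT';          # back to the default action
$SIG{'INT'} = 'IGNORE';           # or drop SIGINT entirely
```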
If the subroutine returns rather than exiting, execution resumes right where the signal interrupted it. This can be useful when the signal needs to interrupt something in progress rather than stop it entirely. For example, suppose processing each line of a file takes a few seconds, which is pretty slow, and you want to abort the overall processing when an interrupt arrives, but not in the middle of processing a line. Just set a flag in the signal handler and check it at the end of each line’s processing:
my $int_flag = 0;
$SIG{'INT'} = 'my_int_handler';
sub my_int_handler { $int_flag = 1 }
while ( ... doing stuff ... ) {
  last if $int_flag;
  ...;
}
exit();
For the most part, Perl will only handle a signal once it reaches a
safe point to do so. For instance, Perl will not deliver most signals in
the middle of allocating memory or rearranging its internal data
structures. Perl delivers some signals, such as SIGILL
, SIGBUS
, and SIGSEGV
, right away, so those are still unsafe.
See the perlipc
documentation.
See “Answers to Chapter 15 Exercises” for answers to these exercises:
[6] Write a program that changes to some particular (hardcoded) directory, like the system’s root directory, then executes the ls -l command to get a long-format directory listing in that directory. (If you use a non-Unix system, use your own system’s command to get a detailed directory listing.)
[10] Modify the previous program to send the output of the command to a file called ls.out in the current directory. The error output should go to a file called ls.err. (You don’t need to do anything special about the fact that either of these files may end up being empty.)
[8] Write a program to parse the output of the date command to determine the current day of
the week. If the day of the week is a weekday, print get to work
; otherwise, print go play
. The output of the
date command begins with Mon
on a Monday. If you don’t have a
date command on your non-Unix
system, make a fake little program that simply prints a string like
date might print. We’ll even give
you this two-line program if you promise not to ask us how it
works:
#!/usr/bin/perl
print localtime() . "\n";
[15] (Unix only) Write an infinite loop program that catches
signals and reports which signal it caught and how many times it has
seen that signal before. Exit if you catch the INT
signal. If you can use the command-line
kill, you can send signals like
so:
$ kill -USR1 12345
If you can’t use the command-line kill, write another program to send signals to it. You might be able to get away with a Perl one-liner:
$ perl -e 'kill HUP => 12345'