Unix Features in Perl

Perl's Unix heritage shows in many of its built-in features, which are borrowed directly from Unix tools such as shells, or relate specifically to the management of various Unix files. In this part of the lesson, then, we'll look at the features of Perl that are useful on Unix systems, including:

  • Working with environment variables

  • Using the system function to run other programs

  • Running other programs and capturing their output with backquotes

  • Creating and managing new processes with fork, wait, and exec

  • Some functions for managing Unix user and group information

Note that with the exception of processes, many of these features might also be available in versions of Perl for other systems, with different or more limited behavior. So, even if you're working on Windows, you might want to at least scan this section before skipping down to the part that relates to your own platform.

Note

If you're a Mac OS X user, you should pay attention to the Unix information in this lesson. Mac OS X is based on BSD Unix, and so the version of Perl that it uses is the Unix version.


Environment Variables

Perl scripts, like shell scripts, inherit their environment (the current execution path, username, shell, and so on) from the shell in which they were started (or from the user ID that runs them). And, if you run other programs or spawn processes from inside your Perl script, they will get their environment from your script in turn. When you run Perl scripts from the command line, these variables might not have much interest for you. But Perl scripts that run in other environments might have additional variables relating to that environment, or might have different values for those variables than what you expect. CGI scripts, for example, have a number of environment variables relating to various CGI-related features, as you learned on Day 16.

Perl stores all its environment variables in a special hash called %ENV, where the keys are the names of the variables and the values are their values. Environment variable names are conventionally uppercase. So, for example, to print the execution path for your script, you'd use a line like this:

print "Path: $ENV{PATH}\n";

You can print out all the environment variables and values using a regular foreach loop:

foreach $key (keys %ENV) {
   print "$key -> $ENV{$key}\n";
}
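Because %ENV is what child programs inherit, changing it from inside your script changes the environment of every program you start afterward. The following is a minimal sketch of that behavior; the variable name GREETING is made up for the illustration, and the child is a plain /bin/sh that simply echoes the variable back.

```perl
#!/usr/bin/perl -w
use strict;

# GREETING is a made-up variable name used only for this illustration.
# Setting it in %ENV puts it in the environment of every program started later.
$ENV{GREETING} = 'hello from Perl';
system('sh', '-c', 'echo "child sees: $GREETING"');

# local changes the value only until the end of the enclosing block.
{
    local $ENV{GREETING} = 'a temporary value';
    system('sh', '-c', 'echo "child sees: $GREETING"');
}

# delete removes the variable from the environment of later children.
delete $ENV{GREETING};
system('sh', '-c', 'echo "child sees: ${GREETING:-nothing}"');
```

The single quotes around the shell commands keep Perl from interpolating $GREETING itself, so it's the child shell that expands the variable from the environment it inherited. Run on a Unix system, the three children should report "hello from Perl", "a temporary value", and "nothing", in that order.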
						

Running Unix Programs with system

Want to run some other Unix command from inside a Perl script? No problem. Just use the system function to do it, like this:

system('ls');

In this case, system will simply run the ls command, listing the contents of the current directory to the standard output. To include options to the command you want to run, just include them inside the string argument. Anything you can type at a shell prompt (and that is available through the current execution path) you can include as an argument to system:

system("find t -name '*.t' -print | xargs chmod +x &");
system('ls -l *.pl');

If you use a double-quoted string as the argument to system, Perl will interpolate variables before it passes the string on to the shell:

system("grep $thing $file | sort | uniq >newfile.txt");

Note

Be very careful when passing data you have not personally verified to the shell (for example, data some user entered from the keyboard). Malicious users could give you data that, when passed through to the shell unchecked, could damage or allow unauthorized access to your system. At the very least, verify incoming data before passing it to the shell. Alternatively, a mechanism in Perl called taint mode allows you to control and manage potentially insecure (tainted) data. See the perlsec man page for more information.


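The return value of the system function reflects the status of the command itself, as reported by the shell: 0 for success and a nonzero value for failure. (Strictly speaking, system returns the wait status, which is the command's exit value multiplied by 256; a return of -1 means the command couldn't be started at all.) Note that this is the reverse of the standard values Perl uses for true and false, so if you want to check for errors that might result from calling system, you'll want to use an and logical instead of an or: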

system('who') and die "Cannot execute who\n";

When system runs, Perl passes the string argument to a shell (usually /bin/sh) to expand any shell metacharacters (for example, variables or filename globs), and that shell then executes the command. If you don't have any shell metacharacters, you can make the process more efficient by passing system a list of arguments, instead of a single string. The first element of the list should be the name of the command to run, and any other elements should be the various arguments to that command:

system("grep $thing $file");     # starts a shell
system("grep", $thing, $file);   # bypasses the shell, slightly more efficient

Perl will also make this optimization for you if your string argument is simple enough, that is, if it doesn't contain any special characters that the shell must process before actually executing the program (for example, shell variables or filename globs).
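The list form matters for safety as well as speed, which ties back to the earlier warning about passing unchecked data to the shell. Here is a small sketch, assuming the value in $thing came from an untrusted user; the string and the harmless injected echo are invented for the demonstration.

```perl
#!/usr/bin/perl -w
use strict;

# Pretend this string came from a user; the ';' is a shell metacharacter.
my $thing = 'hello; echo INJECTED';

# Single string: Perl hands it to /bin/sh, which treats ';' as a command
# separator, so this runs "echo hello" and then "echo INJECTED".
system("echo $thing");

# List: no shell is involved, so $thing reaches echo as one literal argument
# and the output is the single line "hello; echo INJECTED".
system('echo', $thing);
```

With the list form, nothing in $thing is ever seen by a shell, so semicolons, pipes, and backquotes in user data are passed along as ordinary characters.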

In either case—a single string argument or a list—the system function will end up spawning new subprocesses to handle each of the commands in its argument. Each new process inherits its current environment variables from the values in %ENV, and shares its standard input, output, and error with the Perl script. Perl will wait for the command to complete before continuing on with the script (unless the command has a & at the end of it, which will run that command in the background, just as it would in the shell).
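Going back to the return value of system: because it's a wait status rather than a plain exit value, you need a little arithmetic to find out what actually happened. The sketch below decodes it the way the perlfunc documentation for system describes. It uses the standard Unix command false only because that command is guaranteed to fail; system also leaves the same value in the special variable $?.

```perl
#!/usr/bin/perl -w
use strict;

# 'false' is a standard Unix command that does nothing and exits with status 1.
my $status = system('false');

if ($status == -1) {
    # The command could not be started at all; $! explains why.
    print "Could not run the command: $!\n";
} elsif ($status & 127) {
    # The low 7 bits hold the number of the signal that killed the command.
    printf "Command killed by signal %d\n", $status & 127;
} else {
    # The high byte holds the command's own exit value.
    printf "Command exited with value %d\n", $status >> 8;
}
```

For false, system returns 256 (an exit value of 1 shifted up 8 bits), so this prints "Command exited with value 1".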

Note

Don't be too quick to use the system function. Because system spawns a separate process for each of the commands it runs (and sometimes a process for the shell that runs those commands as well), all those extra processes can mean a lot of overhead for your Perl script. Usually it's better to do a task with a bit of code inside your Perl script than to spawn a Unix shell to do the same thing, and it's more portable, too.


Input with Backquotes

You've already seen how to get input into a script through the use of standard input and via file handles. The third way is through the use of backquotes (``), a common paradigm used in Unix shells.

Backquotes work similarly to system in that they run a Unix command inside a Perl script. The difference is in the output. Commands run with system simply print their output to the standard output. When you use backquotes to run a Unix command, the output of that command is captured either as a string or as a list of strings, depending on the context in which you use the backquotes.

For example, take the ls command, which prints out a listing of the directory:

$ls = `ls`;

Here, the backquotes execute the ls command in a Unix shell, and the output of that command (the standard output) is assigned to the scalar variable $ls. In scalar context (as with this example), the resulting output is stored as a single string; in list context each line of output becomes a list element.

As with system, any command you can give to a Unix shell you can include as a backquoted command, and that command runs in its own process and inherits its environment from %ENV. It shares standard input and standard error with your script; its standard output, of course, is what the backquotes capture. The contents of the backquoted string are also variable-interpolated by Perl, just as double-quoted strings are. The status of the command is stored in the special variable $?. As with system, that status is 0 if the command succeeded and nonzero if it failed (the command's own exit value is $? >> 8).
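Here's a short sketch of both contexts and of checking $? afterward. It lists the root directory only because that directory always exists, and it assumes that /no/such/dir does not exist, so the second ls is expected to fail.

```perl
#!/usr/bin/perl -w
use strict;

# Scalar context: the whole output arrives as one string.
my $listing = `ls /`;

# List context: one element per line of output, each still ending in "\n".
my @entries = `ls /`;
chomp(@entries);
printf "%d entries in /, the first is %s\n", scalar(@entries), $entries[0];

# Afterward, $? holds the command's status just as it does after system.
# (/no/such/dir is assumed not to exist; 2>/dev/null hides ls's complaint.)
my $missing = `ls /no/such/dir 2>/dev/null`;
print "ls failed, exit value ", $? >> 8, "\n" if $? != 0;
```

The exact nonzero exit value ls uses for a missing directory varies between Unix versions, which is why the sketch only tests whether $? is nonzero.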

Using Processes: fork, wait, and exec

When you run a Perl script, it runs as its own Unix process. For many simple scripts, one process might be all you need, particularly if your script runs mostly in a linear start-to-finish way. If you create more complex scripts, where different parts of the script need to do different things all at the same time, then you'll want to create another process and run that part of the script independently. That's what the fork function is used for. When you have a new process, you can keep track of its process ID (PID), wait for it to complete, or run another program in that process. You'll learn about all these things in this section.

Note

Creating new processes, and managing how they behave, is the one feature of Unix Perl that's nearly impossible to duplicate on other systems. So while creating new processes can give you a good amount of power over your Perl scripts, if you want your scripts to be portable you'll want to avoid these features or think about how to work around them on other platforms.

Threads, a new experimental feature in Perl 5.005, promise to help with the problems of porting process-based scripts across platforms. Threads offer quite a lot of the multiprocessing-like behavior of Unix processes, while also being more lightweight and portable across all platforms. As of this writing, however, threads are extremely new and very experimental.


How Processes Work

Multiple processes are used to run different parts of your script concurrently. When you start your script, a process is created. When you create a new process from inside a script, that new process will run on its own, in its own memory space, until it's done or until you stop it from running. From your script you can spawn as many processes as you need, up to the limits of your system.

Why would you need multiple processes? When you want different bits of your program to run at once, or for multiple copies of your program to run at the same time. One common use for processes is for creating network-based servers, which wait for a connection from a client, and then process that connection in some way. With a server that uses a single process, when the connection comes in the server “wakes up” and processes that connection (parsing the input, looking up values in databases, returning files—whatever). But if your server is busy processing one connection and another connection arrives in the meantime, that second connection will just have to wait. If you've got a busy server you can end up with a whole queue of connections waiting for the server to finish and move on to the next connection.

If you create a server that uses processes, however, you can have a main body of the script that does nothing but wait for connections, and a second part that does nothing but process those connections. Then, if the main server gets a connection, it can spawn a new process, hand off the connection to that new process, and then the parent is free to go back to listening for new connections. The second process, in turn, handles the input from that connection, and then exits (dies) when it's done. It can repeat this procedure for every new connection, allowing each one to be dealt with in parallel rather than serially.

Network servers make a good example for explaining why processes are useful, but you don't need a network to use them. Any time you want to run different parts of your script in parallel, or separate some processing-intensive part of your script from the main body, processes can help you.

If you're familiar with threads in a language like Java, you might think you understand processes already. But beware. Unlike threads, any running process is completely independent of any other process. The parent and child processes run independently of each other. There is no shared memory, no shared variables, and no simple way to communicate information from one process to another. To communicate between processes you'll need to set up a mechanism called inter-process communication (IPC). The space isn't available to talk about IPC in this book, but I'll give you some pointers in “Going Deeper” at the end of the lesson.

Using fork and exit

To create a new process in your Perl script, you use the fork function. fork, which takes no arguments, creates a new second process in addition to the process for the original script. Each new process is a clone of the first, with all the same values of the same variables (although it doesn't share those with the parent; they're different memory locations altogether). The child continues running the same script, in parallel, to the end, using the same environment and the same standard input and output as the parent. From the point of the fork onward, it's as if you had started two copies of the same script.

Running the same identical script, however, is not usually why you create a new process. Usually you want the new process (known as the child) to execute something different from the first process (the parent). The most common way to use fork, then, is with an if conditional, which tests for the return value of the fork function. fork returns a different value depending on whether the current process is the parent or the child. In the parent, the return result is the PID (process ID) of the new process. In the child, the return result is 0 (if the fork didn't happen, for whatever reason, the return value is undef). By testing for this return value, you can run different bits of code in the child than you do in the parent.

The core boilerplate for creating processes often looks something like this:

if (defined($pid = fork)) {  # fork worked
   if ($pid) {               # pid is some number, this is the parent
      &parent();
   }  else {                    # pid is 0.  this is the child.
      &child();
   }
} else {                       # fork didn't work, try again or fail
   die "Fork didn't work...\n";
}

In this example, the first line calls fork and stores the result in the variable $pid (the variable name $pid is almost universally used for process IDs, but you can call it anything you want, of course). That result can be one of three things: a process ID, 0, or undef. The call to defined in that first line checks for a successful result; otherwise we drop down to the outer else and exit with an error.

Note

If the fork doesn't occur because of some error, the current error message (or error number, depending on how you use it) will be stored in the global system variable $!. Because many fork errors tend to be transient (an overloaded system might not have new processes available at the moment), some Perl programmers test whether $! contains the string "No more processes", wait a while, and then try forking again. The exact wording of that message varies between systems; many newer systems report "Resource temporarily unavailable" for the same error.
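A sketch of that retry idea follows. Rather than matching the message text, which differs from system to system, it compares $! against the EAGAIN constant exported by the standard Errno module. The helper name fork_with_retry, the five attempts, and the two-second delay are all arbitrary choices made for this example; the waitpid call is explained later in this lesson.

```perl
#!/usr/bin/perl -w
use strict;
use Errno qw(EAGAIN);

# Hypothetical helper: fork, retrying a few times if the system is
# temporarily out of processes. Returns the PID in the parent, 0 in the child.
sub fork_with_retry {
    for my $try (1 .. 5) {
        my $pid = fork;
        return $pid if defined $pid;                   # the fork worked
        die "Cannot fork: $!\n" unless $! == EAGAIN;   # permanent error: give up
        sleep 2;                                       # transient error: wait, retry
    }
    die "Cannot fork: still out of processes\n";
}

my $pid = fork_with_retry();
if ($pid) {                  # parent
    print "Parent: started child $pid\n";
    waitpid($pid, 0);        # wait for this particular child to finish
} else {                     # child
    print "Child: running\n";
    exit 0;
}
```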


The successful result can either be 0 or some number representing the process ID (PID) of the new process; each result tells the script which process it is. Here two mythical subroutines, &parent() and &child(), are called to execute different parts of the script depending on whether the script is executing as the parent or as the child.

Here's a simple example (in Listing 18.1) of a script that forks three child processes, printing messages from the parent and from each child. The end of the script prints the message "End":

Listing 18.1. processes.pl
1:  #!/usr/bin/perl -w
2:  use strict;
3:
4:  my $pid = undef;
5:
6:  foreach my $i (1..3) {
7:      if (defined($pid = fork)) {
8:          if ($pid) { #parent
9:              print "Parent: forked child $i ($pid)\n";
10:         }  else {    #child
11:             print "Child $i: running\n";
12:             last;
13:         }
14:     }
15: }
16:
17: print "End...\n";

The output of this script will look something like this (you might get different results on your own system):

# processes.pl
Parent: forked child 1 (8577)
Parent: forked child 2 (8578)
Parent: forked child 3 (8579)
End...
#
Child 1: running
End...
Child 2: running
Child 3: running
End...
End...

That's some weird output. All the output from each process is intermingled, and what's that extra prompt doing in the middle? Why are there four “End…” statements?

The answer to all these questions lies in how each process executes and what it prints at what time. Let's start by looking solely at what happens in the parent:

  • Fork a new process in line 7. In the parent, $pid gets the process ID of that new process.

  • Test for a nonzero value of $pid, and print a message for each process that gets forked (lines 8 and 9).

  • Repeat these steps two more times, once for each remaining turn of the foreach loop.

  • Print “End…”.

  • Exit (after which the shell prints its prompt).

All that occurs fairly rapidly, so the output from the parent happens fairly quickly. Now let's look at any of the three children, whose execution starts just after the fork:

  • Test for the value of $pid in line 8. $pid for each of the children is 0, so the test in line 8 is false and we drop to line 10. Print the message in line 11.

  • Exit the foreach immediately with last. Without a last here the child would go ahead and repeat the loop as many times as remain (remember, the child starts from the exact point where the fork left off; it's a clone of the parent).

  • Print “End…”.

  • Exit. No system prompt appears this time, because the shell started only the parent and waits only for the parent to finish.

The output from all the processes is intermingled as each one prints to the standard output. Each child process, however, does take some time to start up before it runs, which is why the output of the parent is printed and the parent exits before some of the children even start. Note also that the line that prints the “End…” is printed regardless of whether a parent or a child is running; because the child has all the same code as the parent when it runs, it will happily continue past the block that it's supposed to run and continue on.

Depending on your situation, you might not want any of this behavior. You might want the parent to exit only after the child is done, or the child to stop running when it's done with its specific block of code. Or you might want the parent to wait for one child to finish before you start up another child. All this involves process management, which we'll explore in the next section.

Process Management with exit and wait (and Sometimes kill)

Starting up a child process with fork, and then letting it run is kind of like letting an actual child of, say, age 4, run wild in a public place. You'll get results, but they might not be exactly what you (or the people around you) want. That's what process control is for. Two functions, exit and wait, help you keep control of your processes.

Let's look at exit first. The exit function, most simply, stops running the current script at the point where it is called. It's sort of like the die function in that respect, except that die exits with a failed status (on Unix) and prints an error message. exit simply ends the program, with an optional status argument (0 for success, nonzero for failure; if you leave it out, the status is 0).

exit is most commonly used to stop a child from executing more of the parent's code than it's supposed to. Put an exit at the end of the block of the child's code, and the child will run only that far and then stop. So, for example, in the little processes.pl script we looked at in the last section, let's replace the call to last with a call to exit, like this:

...
}  else {    #child
    print "Child $i: running\n";
    exit;
}

With this modification, the child will print its message, and then exit altogether. It won't restart the foreach loop, and it also won't ever print the “End…”. The parent, which is executing the other branch of the if, executes the “End…” after the loop is complete. The output of this version of the script will look like this:

# procexit.pl
Parent: forked child 1 (11828)
Parent: forked child 2 (11829)
Parent: forked child 3 (11830)
End...
#
Child 1: running
Child 2: running
Child 3: running

As with the previous example, the output from the parent and the child is intermingled, and the parent completes before the children do.

For further control over when each child runs and when the parent exits, use the wait or waitpid functions. Both wait and waitpid do the same basic thing: they cause the current process (usually the parent) to stop executing until a child is finished. This prevents mingling of the output, keeps the parent from exiting too soon and, in more complicated scripts than this one, prevents your script from leaving “zombie” processes (child processes that have finished executing but still occupy a slot in the system's process table because no parent has collected their exit status).

The difference between wait and waitpid is that wait takes no arguments and waits for any child to finish. If you spawn five processes and then call wait, the wait will return as soon as any one of the five child processes exits. The waitpid function, on the other hand, takes a process ID argument (plus a flags argument, usually 0) and waits for that specific child process to finish (remember, the parent gets the PID of the child as the return value of the fork function).

Both wait and waitpid return the PID of the child that exited, or -1 if there are no child processes currently running.

Let's return once again to our process example, where we spawn three children in a foreach loop. For exit we changed the behavior of the children. Now let's change the behavior of the parent by adding a call to wait, and another message inside the parent part of the conditional:

if ($pid) { #parent
  print "Parent : forked child $i ($pid)\n";
  wait;
  print "Parent: child $i ($pid) complete\n";
} else { ....

In the parent code for the previous example, the parent simply printed the first message, and then the foreach loop would repeat, spawning three children in quick succession. In this version, the child is created, the parent prints the first message, and then waits for that child to complete. Then it prints the second message. Each turn of the loop occurs only after the current child is done and has exited. The output of this version of the script looks like this:

# procwait.pl
Parent : forked child 1 (11876)
Child 1: running
Parent: child 1 (11876) complete
Parent : forked child 2 (11877)
Child 2: running
Parent: child 2 (11877) complete
Parent : forked child 3 (11878)
Child 3: running
Parent: child 3 (11878) complete
End...
#

Note here that execution is very regular: each child is forked, runs, and exits before the next process starts. And the parent stays in execution until the third child is done, exiting only at the end.

The fact that this example runs each child serially, one after the other, makes it sort of silly to have processes at all (particularly given that each process takes time to start up and uses extra system resources). Because the wait function is so flexible, however, you don't have to wait for the most recently spawned child to finish before spawning another one. You could spawn five processes, and then later on in your script call wait five times to clean up all the processes. We'll look at this later in this lesson when we explore a larger example.
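As a preview of that idea, here's a minimal sketch that starts all three children first and only then collects them, calling wait until it returns -1. The sleep calls exist only to make the children finish in a different order from the one in which they started, and the hash name %number_of is made up for this example.

```perl
#!/usr/bin/perl -w
use strict;

my %number_of;    # maps each child's PID to its number (1, 2, or 3)

# Start all three children without waiting for any of them.
foreach my $i (1 .. 3) {
    my $pid = fork;
    die "Cannot fork: $!\n" unless defined $pid;
    if ($pid == 0) {        # child
        sleep(3 - $i);      # child 1 sleeps longest, so it usually finishes last
        exit $i;            # use the child's number as its exit value
    }
    $number_of{$pid} = $i;  # parent: remember which child this PID belongs to
}

# Collect the children in whatever order they finish;
# wait returns -1 once there are no children left.
my @finished;
while ((my $pid = wait) != -1) {
    my $value = $? >> 8;
    print "Parent: child $number_of{$pid} ($pid) exited with value $value\n";
    push @finished, $number_of{$pid};
}
print "End...\n";
```

Because wait returns the PID of whichever child finished, the parent needs its own record (here, the hash) to know which child that was.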

There's one last function worth mentioning in reference to controlling processes: the kill function, which sends a signal to a process (despite its name, not necessarily one that terminates the process). To use kill, you'll need to know something about signals. For the sake of space I'm not going to talk about signals in this chapter, but see “Going Deeper” and the perlfunc man pages for a few pointers.

Running Something Else in a Process with exec

When you create a new process with fork, that process is a clone of the current script and continues processing from there. Sometimes, however, when you create a new process, you want that process to stop what it's doing altogether and to run some other program instead. That's where exec comes in.

The exec function causes the current process to stop running the current script, and to run something else. The “something else” is usually some other program or script, given as the argument to exec, something like this:

exec("grep $who /etc/passwd");

The arguments to exec follow the same rules as with system: if you use a single string argument, Perl passes that argument to the shell first; with a list of arguments you can bypass the shell. In fact, the similarities between exec and system are not coincidental: the system function is, in effect, a fork, an exec, and a wait put together.

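When Perl encounters an exec that succeeds, that's the end of that script. The exec shifts control to the new program being exec'ed, and no other lines in the script will be executed. Only if the program can't be started at all does exec return (with a false value), which is why you'll often see exec followed by an or die.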
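Putting fork, exec, and waitpid together gives you roughly what system('ls', '-l', '/') does for you. The following is a simplified sketch: the command is just an example, and the real system function also does some signal handling (ignoring SIGINT and SIGQUIT while it waits) that this version leaves out.

```perl
#!/usr/bin/perl -w
use strict;

# The command to run; any program on the execution path would do.
my @command = ('ls', '-l', '/');

my $pid = fork;
die "Cannot fork: $!\n" unless defined $pid;

if ($pid == 0) {
    # Child: replace this Perl program with the command.
    # exec returns only if the command couldn't be started.
    exec(@command) or die "Cannot exec $command[0]: $!\n";
}

# Parent: wait for that particular child, then inspect its status.
waitpid($pid, 0);
printf "%s exited with value %d\n", $command[0], $? >> 8;
```

Because exec is given a list, no shell is involved; passing a single string instead would hand it to /bin/sh, just as system does.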

Other Unix-Related Functions

In addition to the functions mentioned throughout this section, Perl's set of built-in functions includes a number of other process-related functions and smaller utility functions for getting information about various parts of the system. Because these functions apply specifically to Unix system files or features, most of these functions are not available on other systems (although the developers porting Perl to those systems might attempt to create rudimentary equivalents for the behavior of these functions).

Table 18.1 shows a summary of many of these functions. For more information on any of these, see Appendix A, “Perl Functions,” or the perlfunc man page.

Table 18.1. Unix-Related Functions

Function                        What It Does
alarm                           Arrange for a SIGALRM signal to be sent to the current process after a given number of seconds
chroot                          Change the root directory for the current process
getgrent, setgrent, endgrent    Step through the entries in /etc/group (setgrent rewinds, endgrent closes)
getgrgid                        Look up a group file entry by group ID (GID) in /etc/group
getgrnam                        Look up a group file entry by name in /etc/group
getpgrp                         Get the process group ID for a process
getppid                         Get the process ID of the parent process (useful in a child created with fork)
getpriority                     Return the current priority for a process, process group, or user
getpwent, setpwent, endpwent    Step through the entries in /etc/passwd (setpwent rewinds, endpwent closes)
getpwnam                        Look up a user by name in /etc/passwd
getpwuid                        Look up a user by user ID (UID) in /etc/passwd
setpgrp                         Set the process group for a process
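To give a feel for how a few of these functions are called, here's a small sketch that looks up the current user, that user's primary group, and the parent process, and then counts the entries in the password database. The field positions follow the perlfunc description of the getpw* functions; what it prints depends entirely on your system and account.

```perl
#!/usr/bin/perl -w
use strict;

# getpwuid in list context returns the fields of a passwd entry:
# (name, passwd, uid, gid, quota, comment, gcos, dir, shell, ...).
# $< is the real user ID of the current process.
my @pw = getpwuid($<);
if (@pw) {
    my ($name, $uid, $gid, $dir, $shell) = @pw[0, 2, 3, 7, 8];
    print "User $name (UID $uid) has home $dir and shell $shell\n";

    # In scalar context, getgrgid returns just the group's name.
    my $group = getgrgid($gid);
    print "Primary group: ", (defined $group ? $group : "GID $gid"), "\n";
} else {
    print "No passwd entry for UID $<\n";
}

# getppid returns the PID of this process's parent (often the shell).
print "Parent process ID: ", getppid(), "\n";

# Step through every passwd entry, then close the database with endpwent.
my $accounts = 0;
setpwent();
while (defined(my $user = getpwent())) {
    $accounts++;
}
endpwent();
print "$accounts accounts found\n";
```

On systems that get user information from a network directory service rather than only from /etc/passwd, these functions usually consult that service too, so the account count may not match the number of lines in the file.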
