Forked processes are the traditional way to structure parallel tasks, and they are a fundamental part of the Unix tool set. Forking is a straightforward way to start an independent program, whether or not it differs from the calling program. It is based on the notion of copying programs: when a program calls the fork routine, the operating system makes a new copy of that program in memory and starts running that copy in parallel with the original. Some systems don’t really copy the original program (it’s an expensive operation), but the new copy works as if it were a literal copy.
After a fork operation, the original copy of the program is called the parent process, and the copy created by os.fork is called the child process. In general, parents can make any number of children, and children can create child processes of their own; all forked processes run independently and in parallel under the operating system’s control. It is probably simpler in practice than in theory, though. The Python script in Example 5-1 forks new child processes until you type the letter q at the console.
Example 5-1. PP3E\System\Processes\fork1.py
# forks child processes until you type 'q'

import os

def child():
    print 'Hello from child', os.getpid()
    os._exit(0)                    # else goes back to parent loop

def parent():
    while 1:
        newpid = os.fork()
        if newpid == 0:
            child()
        else:
            print 'Hello from parent', os.getpid(), newpid
        if raw_input() == 'q': break

parent()
Python’s process forking tools, available in the os module, are simply thin wrappers over standard forking calls in the C library. To start a new, parallel process, call the os.fork built-in function. Because this function generates a copy of the calling program, it returns a different value in each copy: zero in the child process, and the process ID of the new child in the parent. Programs generally test this result to begin different processing in the child only; this script, for instance, runs the child function in child processes only.[*]
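The two return values can be checked directly. The following is a minimal sketch, not one of the book's examples: it assumes Python 3 syntax and a Unix-like platform, and the function name fork_once is hypothetical. The child sends its own process ID back through a pipe so that the parent can compare it with what os.fork returned.

```python
import os

def fork_once():
    """Fork one child; return (value os.fork gave the parent, pid the child saw)."""
    read_end, write_end = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: os.fork returned zero in this copy.
        try:
            os.close(read_end)
            os.write(write_end, str(os.getpid()).encode())
        finally:
            os._exit(0)              # never fall back into the caller's code
    # Parent: os.fork returned the new child's process ID.
    os.close(write_end)
    reported = b""
    while True:
        chunk = os.read(read_end, 64)
        if not chunk:                # EOF: the child has closed its end
            break
        reported += chunk
    os.close(read_end)
    os.waitpid(pid, 0)               # collect the child's exit status
    return pid, int(reported)

if __name__ == "__main__":
    returned, reported = fork_once()
    print("fork returned", returned, "in the parent; child reported", reported)
```

The pipe is used here only so that the check does not depend on reading interleaved console output.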
Unfortunately, this won’t work on Windows in standard Python today; fork is too much at odds with the Windows model, and a port of this call is still in the works (see also this chapter’s sidebar about Cygwin Python: you can fork with Python on Windows under Cygwin, but it’s not exactly the same). Because forking is ingrained in the Unix programming model, though, this script works well on Unix, Linux, and modern Macs:
[mark@toy]$ python fork1.py
Hello from parent 671 672
Hello from child 672
Hello from parent 671 673
Hello from child 673
Hello from parent 671 674
Hello from child 674
q
These messages represent three forked child processes; the unique identifiers of all the processes involved are fetched and displayed with the os.getpid call. A subtle point: the child process function is also careful to exit explicitly with an os._exit call. We’ll discuss this call in more detail later in this chapter, but if it were not made, the child process would live on after the child function returns (remember, it’s just a copy of the original process). The net effect is that the child would go back to the loop in parent and start forking children of its own (i.e., the parent would have grandchildren). If you delete the exit call and rerun, you’ll likely have to type more than one q to stop, because multiple processes are running in the parent function.
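The effect of leaving out the exit call can be measured without typing anything. The sketch below is an illustration under stated assumptions rather than the book's code: it uses Python 3 syntax, runs only on Unix-like systems, and the name count_descendants is hypothetical. Every newly created process writes one byte to a shared pipe, and the original process counts the bytes.

```python
import os

def count_descendants(rounds, exit_in_child):
    """Run a fork loop `rounds` times; return how many processes were created."""
    read_end, write_end = os.pipe()
    root = os.getpid()
    my_children = []
    try:
        for _ in range(rounds):
            pid = os.fork()
            if pid == 0:
                os.write(write_end, b"x")    # one byte per new process
                if exit_in_child:
                    os._exit(0)              # child stops here, as in Example 5-1
                # no exit: the child falls through and keeps looping
            else:
                my_children.append(pid)
    finally:
        if os.getpid() != root:
            os._exit(0)                      # descendants must not run the code below
    os.close(write_end)
    data = b""
    while True:
        chunk = os.read(read_end, 1024)
        if not chunk:                        # EOF once every descendant has exited
            break
        data += chunk
    os.close(read_end)
    for pid in my_children:
        os.waitpid(pid, 0)                   # reap the direct children
    return len(data)

if __name__ == "__main__":
    print("with os._exit:   ", count_descendants(3, True))
    print("without os._exit:", count_descendants(3, False))
```

With three passes through the loop, the version that exits creates three processes, while the version that does not exit creates seven, because each surviving child goes on to fork children of its own.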
In Example 5-1, each process exits very soon after it starts, so there’s little overlap in time. Let’s do something slightly more sophisticated to better illustrate multiple forked processes running in parallel. Example 5-2 starts up 10 copies of itself, each copy counting up to 10 with a one-second delay between iterations. The time.sleep built-in call simply pauses the calling process for a number of seconds (you can pass a floating-point value to pause for fractions of seconds).
Example 5-2. PP3E\System\Processes\fork-count.py
##########################################################################
# fork basics: start 10 copies of this program running in parallel with
# the original; each copy counts up to 10 on the same stdout stream--forks
# copy process memory, including file descriptors; fork doesn't currently
# work on Windows (without Cygwin): use os.spawnv to start programs on
# Windows instead; spawnv is roughly like a fork+exec combination;
##########################################################################

import os, time

def counter(count):
    for i in range(count):
        time.sleep(1)
        print '[%s] => %s' % (os.getpid(), i)

for i in range(10):
    pid = os.fork()
    if pid != 0:
        print 'Process %d spawned' % pid
    else:
        counter(10)
        os._exit(0)

print 'Main process exiting.'
When run, this script starts 10 processes immediately and exits. All 10 forked processes check in with their first count display one second later and every second thereafter. Child processes continue to run, even if the parent process that created them terminates:
[mark@toy]$ python fork-count.py
Process 846 spawned
Process 847 spawned
Process 848 spawned
Process 849 spawned
Process 850 spawned
Process 851 spawned
Process 852 spawned
Process 853 spawned
Process 854 spawned
Process 855 spawned
Main process exiting.
[mark@toy]$
[846] => 0
[847] => 0
[848] => 0
[849] => 0
[850] => 0
[851] => 0
[852] => 0
[853] => 0
[854] => 0
[855] => 0
[847] => 1
[846] => 1
...more output deleted...
The output of all of these processes shows up on the same screen, because all of them share the standard output stream. Technically, a forked process gets a copy of the original process’s global memory, including open file descriptors. Because of that, global objects like files start out with the same values in a child process, so all the processes here are tied to the same single stream. But it’s important to remember that global memory is copied, not shared; if a child process changes a global object, it changes only its own copy. (As we’ll see, this works differently in threads, the topic of the next section.)
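To see the copy semantics in isolation, here is a minimal sketch (Python 3 syntax and a Unix-like platform are assumed; the names settings and mutate_in_child are hypothetical). The child changes a global list and reports the length it sees through its exit status; the parent then looks at its own copy.

```python
import os

settings = ["parent value"]          # a module-level (global) object

def mutate_in_child():
    """Return (length seen by the parent, length seen by the child)."""
    pid = os.fork()
    if pid == 0:
        code = 255                   # fallback status if something goes wrong
        try:
            settings.append("added by child")   # changes the child's copy only
            code = len(settings)
        finally:
            os._exit(code)           # report the child's view as its exit status
    _, status = os.waitpid(pid, 0)
    return len(settings), os.WEXITSTATUS(status)

if __name__ == "__main__":
    parent_len, child_len = mutate_in_child()
    print("parent sees", parent_len, "item(s); child saw", child_len)
```

The parent still sees one item after the child has appended a second one, which is the behavior the paragraph above describes.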
In Examples 5-1 and 5-2, child processes simply ran a function within the Python program and then exited. On Unix-like platforms, forks are often the basis of starting independently running programs that are completely different from the program that performed the fork call. For instance, Example 5-3 forks new processes until we type q again, but child processes run a brand-new program instead of calling a function in the same file.
Example 5-3. PP3E\System\Processes\fork-exec.py
# starts programs until you type 'q'

import os

parm = 0
while 1:
    parm = parm+1
    pid = os.fork()
    if pid == 0:                                              # copy process
        os.execlp('python', 'python', 'child.py', str(parm))  # overlay program
        assert False, 'error starting program'                # shouldn't return
    else:
        print 'Child is', pid
        if raw_input() == 'q': break
If you’ve done much Unix development, the fork/exec combination will probably look familiar. The main thing to notice is the os.execlp call in this code. In a nutshell, this call overlays (i.e., replaces) the program that is running in the current process with another program. Because of that, the combination of os.fork and os.execlp means start a new process and run a new program in that process; in other words, launch a new program in parallel with the original program.
The arguments to os.execlp specify the program to be run by giving the command-line arguments used to start the program (i.e., what Python scripts know as sys.argv). If successful, the new program begins running and the call to os.execlp itself never returns (since the original program has been replaced, there’s really nothing to return to). If the call does return, an error has occurred, so we code an assert after it that will always raise an exception if reached. (In practice, Python reports a failed exec by raising an OSError exception from the call itself, so the assert serves only as a safeguard.)
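The success and failure paths can both be exercised in a small helper. This is a sketch under assumptions, not the book's listing: it uses Python 3 syntax, runs only on Unix-like systems, and the name launch and the exit code 127 (a convention borrowed from shells) are choices made for this illustration.

```python
import os
import sys

EXEC_FAILED = 127                    # hypothetical "could not start" exit code

def launch(program, *args):
    """Fork, exec `program` in the child, and return the child's exit code."""
    pid = os.fork()
    if pid == 0:
        try:
            os.execlp(program, program, *args)   # on success, never returns
        except OSError as exc:
            # A failed exec raises OSError in the child instead of returning.
            os.write(2, ("exec failed: %s\n" % exc).encode())
        finally:
            os._exit(EXEC_FAILED)                # reached only on failure
    _, status = os.waitpid(pid, 0)               # parent waits for the child
    return os.WEXITSTATUS(status)

if __name__ == "__main__":
    print(launch(sys.executable, "-c", "import sys; sys.exit(3)"))
    print(launch("no-such-program-for-this-sketch"))
```

Exiting with os._exit in the failure branch keeps the child from continuing to run the parent's code, which is the same concern the assert addresses in Example 5-3.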
There are a handful of os.exec variants in the Python standard library; some allow us to configure environment variables for the new program, pass command-line arguments in different forms, and so on. All are available on both Unix and Windows, and they replace the calling program (i.e., the Python interpreter). exec comes in eight flavors, which can be a bit confusing unless you generalize:
os.execv(program, commandlinesequence)
The basic “v” exec form is passed an executable program’s name, along with a list or tuple of command-line argument strings used to run the executable (that is, the words you would normally type in a shell to start a program).
os.execl(program, cmdarg1, cmdarg2,... cmdargN)
The basic “l” exec form is passed an executable’s name, followed by one or more command-line arguments passed as individual function arguments. This is the same as os.execv(program, (cmdarg1, cmdarg2,...)).
os.execlp, os.execvp
Adding the letter p to the execv and execl names means that Python will locate the executable’s directory using your system search-path setting (i.e., PATH).
os.execle, os.execve
Adding a letter e to the execv and execl names means an extra, last argument is a dictionary containing shell environment variables to send to the program.
os.execvpe, os.execlpe
Adding the letters p and e to the basic exec names means to use the search path and to accept a shell environment settings dictionary.
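The eight call forms listed above can be compared side by side. The following sketch makes several assumptions: Python 3 syntax, a Unix-like platform, and a tiny child program passed with -c. The names run_in_child, compare_variants, and the SKETCH_BONUS environment variable are hypothetical and exist only for this illustration.

```python
import os
import sys

# Child program for the sketch: exits with argv[1] plus the value of the
# hypothetical SKETCH_BONUS environment variable (0 when it is not set).
CHILD_CODE = (
    "import os, sys; "
    "sys.exit(int(sys.argv[1]) + int(os.environ.get('SKETCH_BONUS', '0')))"
)

def run_in_child(exec_call):
    """Fork, run exec_call() in the child, and return the child's exit code."""
    pid = os.fork()
    if pid == 0:
        try:
            exec_call()              # replaces the child on success
        finally:
            os._exit(127)            # reached only if the exec failed
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)

def compare_variants():
    python = sys.executable                  # full path: no search needed
    bare = os.path.basename(python)          # bare name: requires a PATH search
    argv = [python, "-c", CHILD_CODE, "5"]   # the new program's command line
    env = dict(os.environ, SKETCH_BONUS="10",
               PATH=os.path.dirname(python)) # environment for the "e" forms
    return {
        "execv":   run_in_child(lambda: os.execv(python, argv)),
        "execl":   run_in_child(lambda: os.execl(python, *argv)),
        "execvp":  run_in_child(lambda: os.execvp(python, argv)),
        "execlp":  run_in_child(lambda: os.execlp(python, *argv)),
        "execve":  run_in_child(lambda: os.execve(python, argv, env)),
        "execle":  run_in_child(lambda: os.execle(python, *argv, env)),
        "execvpe": run_in_child(lambda: os.execvpe(bare, argv, env)),
        "execlpe": run_in_child(lambda: os.execlpe(bare, *argv, env)),
    }

if __name__ == "__main__":
    for name, code in compare_variants().items():
        print(name, "->", code)
```

The forms without an e inherit the current environment, so the child exits with 5; the e forms receive the extra variable and exit with 15. The two pe calls are given only a bare program name and find the interpreter through the PATH entry in the dictionary they are passed.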
So, when the script in Example 5-3 calls os.execlp, individually passed parameters specify a command line for the program to be run, and the word python maps to an executable file according to the underlying system search-path setting environment variable (PATH). It’s as if we were running a command of the form python child.py 1 in a shell, but with a different command-line argument on the end each time.
Just as when typed at a shell, the string of arguments passed to os.execlp by the fork-exec script in Example 5-3 starts another Python program file, as shown in Example 5-4.
Example 5-4. PP3E\System\Processes\child.py
import os, sys
print 'Hello from child', os.getpid(), sys.argv[1]
Here is this code in action on Linux. It doesn’t look much different from the original fork1.py, but it’s really running a new program in each forked process. The more observant readers may notice that the child process ID displayed is the same in the parent program and the launched child.py program; os.execlp simply overlays a program in the same process.
[mark@toy]$ python fork-exec.py
Child is 1094
Hello from child 1094 1
Child is 1095
Hello from child 1095 2
Child is 1096
Hello from child 1096 3
q
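The matching process IDs in this output can also be confirmed programmatically. The sketch below is an illustration under assumptions (Python 3 syntax, Unix-like platform, hypothetical function name pid_after_exec): the child redirects its standard output into a pipe, then overlays itself with a new interpreter that prints its own process ID.

```python
import os
import sys

def pid_after_exec():
    """Return (pid from os.fork in the parent, pid printed by the exec'd program)."""
    read_end, write_end = os.pipe()
    pid = os.fork()
    if pid == 0:
        try:
            os.dup2(write_end, 1)    # the pipe becomes stdout, which survives exec
            os.execv(sys.executable,
                     [sys.executable, "-c", "import os; print(os.getpid())"])
        finally:
            os._exit(127)            # reached only if the exec failed
    os.close(write_end)
    output = b""
    while True:
        chunk = os.read(read_end, 64)
        if not chunk:                # EOF: the new program has exited
            break
        output += chunk
    os.close(read_end)
    os.waitpid(pid, 0)
    return pid, int(output)

if __name__ == "__main__":
    forked, printed = pid_after_exec()
    print("fork returned", forked, "and the new program reported", printed)
```

Both numbers agree because the exec call changes the program, not the process.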
There are other ways to start up programs in Python, including the os.system and os.popen calls we first met in Chapter 3 (to start shell command lines), and the os.spawnv call we’ll meet later in this chapter (to start independent programs on Windows and Unix). We’ll explore such process-related topics further later in this chapter and in later chapters of this book. For instance, forks are revisited in Chapter 13 to deal with servers and their zombies (i.e., dead processes lurking in system tables after their demise).
[*] At least in the current Python implementation, calling os.fork in a Python script actually copies the Python interpreter process (if you look at your process list, you’ll see two Python entries after a fork). But since the Python interpreter records everything about your running script, it’s OK to think of fork as copying your program directly. It really will if Python scripts are ever compiled to binary machine code.