Forked processes are the traditional way to structure parallel tasks, and are a fundamental part of the Unix tool set. Forking is based on the notion of copying programs: when a program calls the fork routine, the operating system makes a new copy of that program in memory and starts running that copy in parallel with the original. Some systems don’t really copy the original program (it’s an expensive operation), but the new copy works as if it were a literal copy.
After a fork operation, the original copy of the program is called the parent process, and the copy created by os.fork is called the child process. In general, parents can make any number of children, and children can create child processes of their own; all forked processes run independently and in parallel under the operating system’s control. It is probably simpler in practice than in theory, though; the Python script in Example 3-1 forks new child processes until you type a “q” at the console.
Example 3-1. PP2E\System\Processes\fork1.py
```python
# forks child processes until you type 'q'
import os

def child():
    print('Hello from child', os.getpid(), flush=True)   # flush: os._exit skips it
    os._exit(0)                                           # else goes back to parent loop

def parent():
    while True:
        newpid = os.fork()
        if newpid == 0:
            child()
        else:
            # flush before the next fork so buffered output isn't copied into the child
            print('Hello from parent', os.getpid(), newpid, flush=True)
        if input() == 'q': break

parent()
```
Python’s process forking tools, available in the os module, are simply thin wrappers over standard forking calls in the C library. To start a new, parallel process, call the os.fork function. Because this function generates a copy of the calling program, it returns a different value in each copy: zero in the child process, and the process ID of the new child in the parent. Programs generally test this result to begin different processing in the child only; this script, for instance, runs the child function in child processes only.[23]
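To see the two return values without an interactive loop, here is a minimal sketch (Unix only, Python 3). The helper name fork_and_report and the exit status 7 are arbitrary choices for illustration, and os.waitpid, which this section doesn’t cover, is used only so the parent can collect the child’s exit status.

```python
import os

def fork_and_report():
    """Hypothetical helper: fork once and report which branch each copy took."""
    pid = os.fork()
    if pid == 0:
        # Child copy: os.fork() returned 0 here, so exit at once with a marker status.
        os._exit(7)
    # Parent copy: os.fork() returned the new child's process ID.
    _, status = os.waitpid(pid, 0)        # wait for the child and fetch its status
    return pid, os.WEXITSTATUS(status)    # decode the child's exit code

child_pid, code = fork_and_report()
print('parent', os.getpid(), 'saw child', child_pid, 'exit with', code)
```

Only the parent ever reaches the print call; the child leaves through os._exit before it gets there.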
Unfortunately, this won’t work on Windows today; fork is at odds with the Windows model, and a port of this call is still in the works. But because forking is ingrained into the Unix programming model, this script works well on Unix and Linux:
```
[mark@toy]$ python fork1.py
Hello from parent 671 672
Hello from child 672

Hello from parent 671 673
Hello from child 673

Hello from parent 671 674
Hello from child 674
q
```
These messages represent three forked child processes; the unique identifiers of all the processes involved are fetched and displayed with the os.getpid call. A subtle point: the child process function is also careful to exit explicitly with an os._exit call. We’ll discuss this call in more detail later in this chapter, but if it weren’t made, the child process would live on after the child function returns (remember, it’s just a copy of the original process). The net effect is that the child would go back to the loop in parent and start forking children of its own (i.e., the parent would have grandchildren). If you delete the exit call and rerun, you’ll likely have to type more than one “q” to stop, because multiple processes are running in the parent function.
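If you’d rather not type “q” repeatedly to watch the grandchildren pile up, the following sketch swaps the console loop for a fixed number of iterations and counts how many processes make it to the end of the loop. The function name count_loop_finishers is hypothetical, and it uses os.pipe (not covered in this section) as a simple counter: every process inherits the pipe and writes one byte when it finishes the loop.

```python
import os

def count_loop_finishers(iterations, child_exits):
    """Hypothetical helper: count how many processes reach the end of a fork loop."""
    original = os.getpid()
    r, w = os.pipe()                # every process inherits a copy of both ends
    children = []
    for _ in range(iterations):
        pid = os.fork()
        if pid == 0:
            if child_exits:
                os._exit(0)         # like fork1.py: the child stops here
            # without the exit, the child falls through and keeps looping
        else:
            children.append(pid)
    os.write(w, b'x')               # each process that finishes the loop reports in
    if os.getpid() != original:
        os._exit(0)                 # copies stop here instead of returning to the caller
    os.close(w)                     # EOF arrives only once every copy of w is closed
    finished = 0
    while True:
        chunk = os.read(r, 1024)
        if not chunk:
            break
        finished += len(chunk)
    os.close(r)
    for pid in children:            # reap the original process's direct children
        os.waitpid(pid, 0)
    return finished

print(count_loop_finishers(3, child_exits=True))    # prints 1
print(count_loop_finishers(3, child_exits=False))   # prints 8 (2 ** 3)
```

With the exit call in place only the original process finishes the loop; without it, the number of processes doubles on every iteration, which is why more than one “q” is needed in the interactive version.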
In Example 3-1, each process exits very soon after it starts, so there’s little overlap in time. Let’s do something slightly more sophisticated to better illustrate multiple forked processes running in parallel. Example 3-2 starts up 10 copies of itself, each copy counting up to 10 (printing 0 through 9) with a one-second delay between iterations. The time.sleep call simply pauses the calling process for a number of seconds (pass a floating-point value to pause for fractions of a second).
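A compact, non-interactive variation on this idea might look like the sketch below: several children each sleep for a fractional delay, and the parent measures how long they take in total. The helper name run_in_parallel is hypothetical, and unlike the example that follows, the parent waits for its children with os.waitpid so it can time them; each child reports its index as its exit status.

```python
import os, time

def run_in_parallel(nchildren, delay):
    """Hypothetical helper: fork children that each sleep, then wait for all of them."""
    start = time.monotonic()
    pids = []
    for i in range(nchildren):
        pid = os.fork()
        if pid == 0:
            time.sleep(delay)       # each child pauses independently of the others
            os._exit(i)             # report the child's index as its exit status
        pids.append(pid)
    statuses = sorted(os.WEXITSTATUS(os.waitpid(pid, 0)[1]) for pid in pids)
    return statuses, time.monotonic() - start

statuses, elapsed = run_in_parallel(5, 0.3)
print(statuses, round(elapsed, 2))   # [0, 1, 2, 3, 4] and roughly 0.3 seconds, not 1.5
```

Because the sleeps overlap, the total time stays close to a single delay rather than the sum of all five.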
Example 3-2. PP2E\System\Processes\fork-count.py
```python
############################################################
# fork basics: start 10 copies of this program running in
# parallel with the original; each copy counts up to 10
# on the same stdout stream; forks copy process memory,
# including file descriptors; fork doesn't currently work
# on Windows: use os.spawnv to start programs on Windows
# instead; spawnv is roughly like a fork+exec combination;
############################################################

import os, time

def counter(count):
    for i in range(count):
        time.sleep(1)
        print('[%s] => %s' % (os.getpid(), i), flush=True)   # flush: os._exit skips it

for i in range(10):
    pid = os.fork()
    if pid != 0:
        print('Process %d spawned' % pid, flush=True)   # flush before the next fork
    else:
        counter(10)
        os._exit(0)

print('Main process exiting.')
```
When run, this script starts 10 processes immediately and exits. All 10 forked processes check in with their first count display one second later, and every second thereafter. Child processes continue to run, even if the parent process that created them terminates:
```
[mark@toy]$ python fork-count.py
Process 846 spawned
Process 847 spawned
Process 848 spawned
Process 849 spawned
Process 850 spawned
Process 851 spawned
Process 852 spawned
Process 853 spawned
Process 854 spawned
Process 855 spawned
Main process exiting.
[mark@toy]$
[846] => 0
[847] => 0
[848] => 0
[849] => 0
[850] => 0
[851] => 0
[852] => 0
[853] => 0
[854] => 0
[855] => 0
[847] => 1
[846] => 1
...more output deleted...
```
The output of all these processes shows up on the same screen, because they all share the standard output stream. Technically, a forked process gets a copy of the original process’s global memory, including open file descriptors. Because of that, global objects like files start out with the same values in a child process. But it’s important to remember that global memory is copied, not shared -- if a child process changes a global object, it changes its own copy only. (As we’ll see, this works differently in threads, the topic of the next section.)
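The copy-not-shared rule is easy to check directly. In this minimal sketch (the helper name bump_in_child is hypothetical), a child changes a global counter and sends its view back to the parent through a pipe; the pipe itself works only because the child inherits copies of the parent’s open file descriptors.

```python
import os

counter = 0                          # a global object both processes start out with

def bump_in_child():
    """Hypothetical helper: change a global in a child and compare both views."""
    global counter
    r, w = os.pipe()                 # descriptors are copied into the child, too
    pid = os.fork()
    if pid == 0:                     # child: change its own copy of the global
        os.close(r)
        counter += 100
        os.write(w, str(counter).encode())
        os._exit(0)
    os.close(w)                      # parent: read the child's view of counter
    data = b''
    while True:
        chunk = os.read(r, 64)
        if not chunk:
            break
        data += chunk
    os.close(r)
    os.waitpid(pid, 0)
    return int(data.decode()), counter    # (child's value, parent's value)

child_view, parent_view = bump_in_child()
print('child saw', child_view, 'but parent still has', parent_view)   # 100 and 0
```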
The child processes in Example 3-1 and Example 3-2 simply ran a function within the Python program and exited. On Unix-like platforms, forks are often the basis of starting independently running programs that are completely different from the program that performed the fork call. For instance, Example 3-3 forks new processes until we type “q” again, but child processes run a brand new program instead of calling a function in the same file.
Example 3-3. PP2E\System\Processes\fork-exec.py
```python
# starts programs until you type 'q'
import os

parm = 0
while True:
    parm = parm + 1
    pid = os.fork()
    if pid == 0:                                                # copy process
        os.execlp('python', 'python', 'child.py', str(parm))   # overlay program
        assert False, 'error starting program'                  # shouldn't return
    else:
        print('Child is', pid)
        if input() == 'q': break
```
If you’ve done much Unix development, the fork/exec combination will probably look familiar. The main thing to notice is the os.execlp call in this code. In a nutshell, this call overlays (i.e., replaces) the program running in the current process with another program. Because of that, the combination of os.fork and os.execlp means: start a new process, and run a new program in that process. In other words, launch a new program in parallel with the original program.
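Here is a minimal fork/exec sketch that can run unattended. It assumes a Unix system and uses os.execv with sys.executable (the absolute path of the running interpreter) instead of os.execlp('python', ...), so it doesn’t depend on a program named python being on your PATH. The helper name launch is hypothetical, and the parent waits with os.waitpid only so it can return the new program’s exit code.

```python
import os, sys

def launch(code):
    """Hypothetical helper: fork, then overlay the child with a new Python interpreter."""
    pid = os.fork()
    if pid == 0:
        try:
            # Replace this child's program; on success this call never returns.
            os.execv(sys.executable, [sys.executable, '-c', code])
        except OSError:
            os._exit(127)           # exec failed: don't fall back into the parent's code
    _, status = os.waitpid(pid, 0)  # parent: wait for the new program to finish
    return os.WEXITSTATUS(status)

print(launch('import sys; sys.exit(3)'))   # prints 3
```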
The arguments to os.execlp specify the program to be run and the command-line arguments used to start it (i.e., what Python scripts know as sys.argv). If successful, the new program begins running and the call to os.execlp itself never returns (since the original program has been replaced, there’s really nothing to return to). In Python, a failed exec call raises an OSError exception rather than returning; the assert coded after the call is a safeguard that will always raise an exception if it is ever reached.
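Because Python reports a failed exec as an exception, you can observe the failure safely in the current process: if the program doesn’t exist, nothing is replaced and the exception propagates. The helper name try_exec and the path /no/such/program are hypothetical choices for this sketch.

```python
import os

def try_exec(path):
    """Hypothetical helper: attempt an exec and report what happened if it fails."""
    try:
        os.execv(path, [path])       # on success, this process would be replaced
    except OSError as exc:           # failure is reported as an exception...
        return type(exc).__name__
    return 'returned'                # ...so this line should never run

print(try_exec('/no/such/program'))  # FileNotFoundError on typical Unix systems
```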
There are a handful of os.exec variants in the Python standard library; some allow us to configure environment variables for the new program, pass command-line arguments in different forms, and so on. All are available on both Unix and Windows, and replace the calling program (i.e., the Python interpreter). exec comes in eight flavors, which can be a bit confusing unless you generalize:
os.execv(program, commandlinesequence)
The basic “v” exec form is passed an executable program’s name, along with a list or tuple of command-line argument strings used to run the executable (that is, the words you would normally type in a shell to start a program).

os.execl(program, cmdarg1, cmdarg2,... cmdargN)
The basic “l” exec form is passed an executable’s name, followed by one or more command-line arguments passed as individual function arguments. This is the same as os.execv(program, (cmdarg1, cmdarg2,...)).

os.execlp, os.execvp
Adding a “p” to the execv and execl names means that Python will locate the executable’s directory using your system search-path setting (i.e., PATH).

os.execle, os.execve
Adding an “e” to the execv and execl names means an extra, last argument is a dictionary containing shell environment variables to send to the program.

os.execvpe, os.execlpe
Adding both “p” and “e” to the basic exec names means to use the search path and to accept a dictionary of shell environment settings.
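The naming scheme above can be exercised side by side. This sketch assumes a Unix system and runs a short Python program through four of the variants, with each child reporting back through its exit status. The helper name run_variant, the DEMO_CODE environment variable, and the restricted PATH used for the “p” lookup are all hypothetical choices for illustration.

```python
import os, sys

def run_variant(variant, code, env=None):
    """Hypothetical helper: run `code` in a new Python via one of the os.exec* forms."""
    exe = sys.executable
    name = os.path.basename(exe)      # bare program name for the "p" lookup
    pid = os.fork()
    if pid == 0:
        try:
            if variant == 'v':        # argument sequence, explicit path
                os.execv(exe, [exe, '-c', code])
            elif variant == 'l':      # individual arguments, explicit path
                os.execl(exe, exe, '-c', code)
            elif variant == 've':     # argument sequence plus environment dictionary
                os.execve(exe, [exe, '-c', code], env)
            elif variant == 'vpe':    # bare name searched on env's PATH, plus environment
                os.execvpe(name, [name, '-c', code], env)
        except OSError:
            os._exit(127)             # exec failed
        os._exit(126)                 # unknown variant name
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)

# The child exits with the value of DEMO_CODE from its environment (1 if unset).
exit_with_demo = 'import os, sys; sys.exit(int(os.environ.get("DEMO_CODE", "1")))'
# PATH is narrowed to the interpreter's own directory so the "p" search finds it there.
env = dict(os.environ, DEMO_CODE='5', PATH=os.path.dirname(sys.executable))

print(run_variant('v', 'import sys; sys.exit(2)'),
      run_variant('l', 'import sys; sys.exit(2)'),
      run_variant('ve', exit_with_demo, env),
      run_variant('vpe', exit_with_demo, env))    # prints 2 2 5 5
```

The “v” and “l” forms differ only in how the arguments are passed; the “e” forms hand the new program the environment dictionary instead of the caller’s own environment.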
So, when the script in Example 3-3 calls os.execlp, the individually passed parameters specify a command line for the program to run, and the word “python” maps to an executable file according to the underlying system search-path setting ($PATH). It’s as if we were running a command of the form python child.py 1 in a shell, but with a different command-line argument on the end each time.
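The search-path lookup can be approximated in a few lines. This is a simplified sketch, not how os.execlp is actually implemented: the real “p” variants try to exec each candidate in turn and handle errors along the way, whereas this hypothetical find_on_path helper just returns the first executable file it sees. os.get_exec_path returns the PATH directories (or os.defpath if PATH is unset).

```python
import os, sys

def find_on_path(program, directories=None):
    """Hypothetical helper: roughly how a bare name like 'python' maps to a file."""
    if directories is None:
        directories = os.get_exec_path()   # PATH entries, or os.defpath if PATH is unset
    for directory in directories:
        candidate = os.path.join(directory, program)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate               # first executable match wins
    return None                            # nothing found on the search path

here = os.path.dirname(sys.executable)
print(find_on_path(os.path.basename(sys.executable), [here]))
```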
Just as when typed at a shell, the string of arguments passed to os.execlp by the fork-exec script in Example 3-3 starts another Python program file, shown in Example 3-4.
Example 3-4. PP2E\System\Processes\child.py
```python
import os, sys
print('Hello from child', os.getpid(), sys.argv[1])
```
Here is this code in action on Linux. It doesn’t look much different from the original fork1.py, but it’s really running a new program in each forked process. Observant readers may notice that the child process ID displayed is the same in the parent program and the launched child.py program; os.execlp simply overlays a program in the same process:
```
[mark@toy]$ python fork-exec.py
Child is 1094
Hello from child 1094 1

Child is 1095
Hello from child 1095 2

Child is 1096
Hello from child 1096 3
q
```
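The same-PID observation can be checked automatically. In this sketch (Unix only; the helper name pid_survives_exec is hypothetical), the exec’d program writes its own os.getpid() into a pipe and the parent compares it with the value os.fork returned. The pipe’s write end must be marked inheritable with os.set_inheritable, because since Python 3.4 new descriptors are closed on exec by default; its descriptor number is passed to the new program as a command-line argument.

```python
import os, sys

def pid_survives_exec():
    """Hypothetical helper: compare fork()'s PID with the PID the exec'd program reports."""
    r, w = os.pipe()
    os.set_inheritable(w, True)        # keep the write end open across exec
    pid = os.fork()
    if pid == 0:
        os.close(r)
        # The new program writes its own PID to the descriptor number given in argv.
        code = 'import os, sys; os.write(int(sys.argv[1]), str(os.getpid()).encode())'
        try:
            os.execv(sys.executable, [sys.executable, '-c', code, str(w)])
        except OSError:
            os._exit(127)
    os.close(w)                        # parent keeps only the read end
    data = b''
    while True:
        chunk = os.read(r, 64)
        if not chunk:
            break
        data += chunk
    os.close(r)
    os.waitpid(pid, 0)
    return pid, int(data.decode())

forked, reported = pid_survives_exec()
print(forked == reported)              # prints True
```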
There are other ways to start up programs in Python, including the os.system and os.popen calls we met in Chapter 2 (to start shell command lines), and the os.spawnv call we’ll meet later in this chapter (to start independent programs on Windows). We explore such process-related topics in more detail later in this chapter and in later chapters of this book. For instance, forks are revisited in Chapter 10, to deal with “zombies”: dead processes lurking in system tables after their demise.
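For comparison with the fork/exec pair, os.spawnv bundles starting a new program into a single call and is available on both Unix and Windows. In this minimal sketch (the helper name spawn_and_wait is hypothetical), the os.P_WAIT mode makes spawnv block until the new program finishes and return its exit code.

```python
import os, sys

def spawn_and_wait(code):
    """Hypothetical helper: start a new Python with os.spawnv and wait for it."""
    # P_WAIT: block until the program exits, then return its exit code.
    return os.spawnv(os.P_WAIT, sys.executable, [sys.executable, '-c', code])

print(spawn_and_wait('import sys; sys.exit(4)'))   # prints 4
```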
[23] At least in the current Python implementation, calling os.fork in a Python script actually copies the Python interpreter process (if you look at your process list, you’ll see two Python entries after a fork). But since the Python interpreter records everything about your running script, it’s okay to think of fork as copying your program directly. It really will, if Python scripts are ever compiled to binary machine code.