Pipes, another cross-program communication device, are made
available in Python with the built-in os.pipe
call. Pipes are unidirectional
channels that work something like a shared memory buffer, but with an
interface resembling a simple file on each of two ends. In typical
use, one program writes data on one end of the pipe, and another reads
that data on the other end. Each program sees only its end of the
pipes and processes it using normal Python file calls.
Within the operating system, though, pipes are much more than simple files. For instance, calls to read a pipe will normally block the caller until data becomes available (i.e., is sent by the program on the other end) instead of returning an end-of-file indicator. Because of such properties, pipes are also a way to synchronize the execution of independent programs.
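The blocking behavior described above can be sketched in a few lines. This is only an illustration written in Python 3 syntax (pipe data is bytes there), and it uses a thread as a stand-in for the second program; the names delayed_write and timed_read are hypothetical and do not appear in the book's examples.

```python
import os
import threading
import time

def delayed_write(fd, delay, data):
    """Stand-in for the program on the other end: wait, then send."""
    time.sleep(delay)
    os.write(fd, data)

def timed_read(delay=0.2):
    """Return (data, seconds spent waiting) for one blocking pipe read."""
    read_fd, write_fd = os.pipe()
    writer = threading.Thread(target=delayed_write,
                              args=(write_fd, delay, b'spam'))
    start = time.monotonic()
    writer.start()
    data = os.read(read_fd, 32)        # blocks here until the writer sends
    elapsed = time.monotonic() - start
    writer.join()
    os.close(read_fd)
    os.close(write_fd)
    return data, elapsed

if __name__ == '__main__':
    data, elapsed = timed_read()
    print('got %r after about %.2f seconds' % (data, elapsed))
```

The reader does no polling of its own: the read call simply does not return until the writer has sent something, which is what lets a pipe double as a synchronization device.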
Pipes come in two flavors—anonymous and named. Named pipes (sometimes called fifos) are represented by a file on your computer. Anonymous pipes exist only within processes, though, and are typically used in conjunction with process forks as a way to link parent and spawned child processes within an application; parent and child converse over shared pipe file descriptors. Because named pipes are really external files, the communicating processes need not be related at all (in fact, they can be independently started programs).
Since they are more traditional, let’s start with a look at
anonymous pipes. To illustrate, the script in Example 5-16 uses the os.fork
call to make a copy of the calling
process as usual (we met forks earlier in this chapter). After
forking, the original parent process and its child copy speak
through the two ends of a pipe created with os.pipe
prior to the fork. The os.pipe
call returns a tuple of two
file descriptors—the low-level file identifiers
we met earlier—representing the input and output sides of the pipe.
Because forked child processes get copies of
their parents’ file descriptors, writing to the pipe’s output
descriptor in the child sends data back to the parent on the pipe
created before the child was spawned.
Example 5-16. PP3E\System\Processes\pipe1.py
import os, time

def child(pipeout):
    zzz = 0
    while 1:
        time.sleep(zzz)                          # make parent wait
        os.write(pipeout, 'Spam %03d' % zzz)     # send to parent
        zzz = (zzz+1) % 5                        # goto 0 after 4

def parent():
    pipein, pipeout = os.pipe()                  # make 2-ended pipe
    if os.fork() == 0:                           # copy this process
        child(pipeout)                           # in copy, run child
    else:                                        # in parent, listen to pipe
        while 1:
            line = os.read(pipein, 32)           # blocks until data sent
            print 'Parent %d got "%s" at %s' % (os.getpid(), line, time.time())

parent()
If you run this program on Linux (pipe is now available on Windows, but fork is not), the parent process waits for the child to send data on the pipe each time it calls os.read. It’s almost as if the child and parent act as client and server here: the parent starts the child and waits for it to initiate communication.[*] Just to tease, the child keeps the parent waiting one second longer between messages with time.sleep calls, until the delay has reached four seconds. When the zzz delay counter hits 005, it rolls back down to 000 and starts again:
[mark@toy]$ python pipe1.py
Parent 1292 got "Spam 000" at 968370008.322
Parent 1292 got "Spam 001" at 968370009.319
Parent 1292 got "Spam 002" at 968370011.319
Parent 1292 got "Spam 003" at 968370014.319
Parent 1292 got "Spam 004Spam 000" at 968370018.319
Parent 1292 got "Spam 001" at 968370019.319
Parent 1292 got "Spam 002" at 968370021.319
Parent 1292 got "Spam 003" at 968370024.319
Parent 1292 got "Spam 004Spam 000" at 968370028.319
Parent 1292 got "Spam 001" at 968370029.319
Parent 1292 got "Spam 002" at 968370031.319
Parent 1292 got "Spam 003" at 968370034.319
If you look closely, you’ll see that when the child’s delay
counter hits 004, the parent ends up reading two messages from the
pipe at once; the child wrote two distinct
messages, but they were close enough in time to be fetched as a
single unit by the parent. Really, the parent blindly asks to read,
at most, 32 bytes each time, but it gets back whatever text is
available in the pipe (when it becomes available). To distinguish
messages better, we can mandate a separator character in the pipe.
An end-of-line makes this easy, because we can wrap the pipe
descriptor in a file object with os.fdopen
and rely on the file object’s
readline
method to scan up
through the next
separator in
the pipe. Example 5-17
implements this scheme.
Example 5-17. PP3E\System\Processes\pipe2.py
# same as pipe1.py, but wrap pipe input in stdio file object
# to read by line, and close unused pipe fds in both processes

import os, time

def child(pipeout):
    zzz = 0
    while 1:
        time.sleep(zzz)                          # make parent wait
        os.write(pipeout, 'Spam %03d\n' % zzz)   # send to parent
        zzz = (zzz+1) % 5                        # roll to 0 at 5

def parent():
    pipein, pipeout = os.pipe()                  # make 2-ended pipe
    if os.fork() == 0:                           # in child, write to pipe
        os.close(pipein)                         # close input side here
        child(pipeout)
    else:                                        # in parent, listen to pipe
        os.close(pipeout)                        # close output side here
        pipein = os.fdopen(pipein)               # make stdio input object
        while 1:
            line = pipein.readline()[:-1]        # blocks until data sent
            print 'Parent %d got "%s" at %s' % (os.getpid(), line, time.time())

parent()
This version has also been augmented to close the unused end of the pipe in each process (e.g., after the fork, the parent process closes its copy of the output side of the pipe written by the child); programs should close unused pipe ends in general. Running with this new version returns a single child message to the parent each time it reads from the pipe, because they are separated with markers when written:
[mark@toy]$ python pipe2.py
Parent 1296 got "Spam 000" at 968370066.162
Parent 1296 got "Spam 001" at 968370067.159
Parent 1296 got "Spam 002" at 968370069.159
Parent 1296 got "Spam 003" at 968370072.159
Parent 1296 got "Spam 004" at 968370076.159
Parent 1296 got "Spam 000" at 968370076.161
Parent 1296 got "Spam 001" at 968370077.159
Parent 1296 got "Spam 002" at 968370079.159
Parent 1296 got "Spam 003" at 968370082.159
Parent 1296 got "Spam 004" at 968370086.159
Parent 1296 got "Spam 000" at 968370086.161
Parent 1296 got "Spam 001" at 968370087.159
Parent 1296 got "Spam 002" at 968370089.159
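For readers on newer Pythons, here is a hedged sketch of the same line-delimited scheme in Python 3 syntax, where os.write takes bytes rather than str. It is not one of the book's examples: the function name collect_child_lines is hypothetical, and the child sends a fixed list of messages and exits instead of looping forever, so that the parent can read up to end-of-file.

```python
import os

def collect_child_lines(messages):
    """Fork a child that writes one line per message; return what the parent reads."""
    pipein, pipeout = os.pipe()
    pid = os.fork()
    if pid == 0:
        # child: close the unused input side, send each message, then exit
        try:
            os.close(pipein)
            for text in messages:
                os.write(pipeout, (text + '\n').encode())
            os.close(pipeout)
        finally:
            os._exit(0)                  # leave without running parent code
    # parent: close the unused output side, or end-of-file never arrives
    os.close(pipeout)
    with os.fdopen(pipein) as reader:    # text-mode file object over the fd
        lines = [line.rstrip('\n') for line in reader]
    os.waitpid(pid, 0)                   # reap the finished child
    return lines

if __name__ == '__main__':
    for line in collect_child_lines(['Spam 000', 'Spam 001', 'Spam 002']):
        print('Parent %d got "%s"' % (os.getpid(), line))
```

Closing the unused ends matters here for a concrete reason: the parent sees end-of-file only once every copy of the pipe's output descriptor is closed, including its own.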
Pipes normally let data flow in only one direction—one side is input, one is output. What if you need your programs to talk back and forth, though? For example, one program might send another a request for information and then wait for that information to be sent back. A single pipe can’t generally handle such bidirectional conversations, but two pipes can. One pipe can be used to pass requests to a program and another can be used to ship replies back to the requestor.[*]
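As a rough illustration of the two-pipe idea, the following Python 3 sketch forks a child that echoes each request back as a reply. It leaves the standard streams alone and wraps the pipe descriptors in file objects directly; the function name ask_child and the reply format are assumptions made for this sketch only.

```python
import os

def ask_child(requests):
    """Send each request to a forked child over one pipe; read replies from another."""
    req_in, req_out = os.pipe()          # requests: parent -> child
    rep_in, rep_out = os.pipe()          # replies:  child -> parent
    pid = os.fork()
    if pid == 0:
        try:
            os.close(req_out)            # child closes the parent's ends
            os.close(rep_in)
            with os.fdopen(req_in) as source, os.fdopen(rep_out, 'w') as sink:
                while True:
                    line = source.readline()
                    if not line:         # parent closed its end: no more requests
                        break
                    sink.write('Child got: [%s]\n' % line.rstrip('\n'))
                    sink.flush()         # send now; the parent is waiting for it
        finally:
            os._exit(0)
    os.close(req_in)                     # parent closes the child's ends
    os.close(rep_out)
    replies = []
    with os.fdopen(req_out, 'w') as to_child, os.fdopen(rep_in) as from_child:
        for text in requests:
            to_child.write(text + '\n')
            to_child.flush()             # without this, both sides would wait forever
            replies.append(from_child.readline().rstrip('\n'))
    os.waitpid(pid, 0)
    return replies

if __name__ == '__main__':
    print(ask_child(['Hello 1 from parent', 'Hello 2 from parent']))
```

Each direction gets its own pipe, and each side flushes after every message so that the text actually leaves its output buffer before the sender starts waiting for the answer.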
The module in Example
5-18 demonstrates one way to apply this idea to link the
input and output streams of two programs. Its spawn
function forks a new child program
and connects the input and output streams of the parent to the
output and input streams of the child. That is:
When the parent reads from its standard input, it is reading text sent to the child’s standard output.
When the parent writes to its standard output, it is sending data to the child’s standard input.
The net effect is that the two independent programs communicate by speaking over their standard streams.
Example 5-18. PP3E\System\Processes\pipes.py
#############################################################################
# spawn a child process/program, connect my stdin/stdout to child process's
# stdout/stdin--my reads and writes map to output and input streams of the
# spawned program; much like os.popen2 plus parent stream redirection;
#############################################################################

import os, sys

def spawn(prog, *args):                       # pass progname, cmdline args
    stdinFd  = sys.stdin.fileno()             # get descriptors for streams
    stdoutFd = sys.stdout.fileno()            # normally stdin=0, stdout=1

    parentStdin, childStdout  = os.pipe()     # make two IPC pipe channels
    childStdin,  parentStdout = os.pipe()     # pipe returns (inputfd, outputfd)
    pid = os.fork()                           # make a copy of this process
    if pid:
        os.close(childStdout)                 # in parent process after fork:
        os.close(childStdin)                  # close child ends in parent
        os.dup2(parentStdin,  stdinFd)        # my sys.stdin copy  = pipe1[0]
        os.dup2(parentStdout, stdoutFd)       # my sys.stdout copy = pipe2[1]
    else:
        os.close(parentStdin)                 # in child process after fork:
        os.close(parentStdout)                # close parent ends in child
        os.dup2(childStdin,  stdinFd)         # my sys.stdin copy  = pipe2[0]
        os.dup2(childStdout, stdoutFd)        # my sys.stdout copy = pipe1[1]
        args = (prog,) + args
        os.execvp(prog, args)                 # new program in this process
        assert False, 'execvp failed!'        # os.exec call never returns here

if __name__ == '__main__':
    mypid = os.getpid()
    spawn('python', 'pipes-testchild.py', 'spam')     # fork child program

    print 'Hello 1 from parent', mypid                # to child's stdin
    sys.stdout.flush()                                # subvert stdio buffering
    reply = raw_input()                               # from child's stdout
    sys.stderr.write('Parent got: "%s"\n' % reply)    # stderr not tied to pipe!

    print 'Hello 2 from parent', mypid
    sys.stdout.flush()
    reply = sys.stdin.readline()
    sys.stderr.write('Parent got: "%s"\n' % reply[:-1])
The spawn function in this module does not work on Windows (remember that fork isn’t available there today). In fact, most of the calls in this module map straight to Unix system calls (and may be arbitrarily terrifying at first glance to non-Unix developers). We’ve already met some of these (e.g., os.fork), but much of this code depends on Unix concepts we don’t have time to address well in this text. In simple terms, though, here is a brief summary of the system calls demonstrated in this code:
os.fork
Copies the calling process as usual and returns the child’s process ID in the parent process only (it returns 0 in the child).
os.execvp
Overlays a new program in the calling process; it’s just like the os.execlp used earlier but takes a tuple or list of command-line argument strings (collected with the *args form in the function header).
os.pipe
Returns a tuple of file descriptors representing the input and output ends of a pipe, as in earlier examples.
os.close(fd)
Closes the descriptor-based file fd.
os.dup2(fd1,fd2)
Copies all system information associated with the file named by the file descriptor fd1 to the file named by fd2.
In terms of connecting standard streams, os.dup2 is the real nitty-gritty here. For example, the call os.dup2(parentStdin,stdinFd) essentially assigns the parent process’s stdin file to the input end of one of the two pipes created; all stdin reads will henceforth come from the pipe. By connecting the other end of this pipe to the child process’s copy of the stdout stream file with os.dup2(childStdout,stdoutFd), text written by the child to its stdout winds up being routed through the pipe to the parent’s stdin stream.
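To see what os.dup2 does without touching the real standard streams, the following Python 3 sketch redirects one pipe's output descriptor onto another pipe. It is an illustration only: the function name redirect_with_dup2 is hypothetical, and the two ordinary pipe descriptors stand in for the stdin and stdout descriptors used in the module above.

```python
import os

def redirect_with_dup2():
    """Point one descriptor at another pipe with os.dup2; return what each pipe yields."""
    r1, w1 = os.pipe()                   # first pipe
    r2, w2 = os.pipe()                   # second pipe
    os.dup2(w1, w2)                      # w2 is closed, then reopened as a copy of w1
    os.write(w2, b'written to w2')       # so this lands in the first pipe
    first = os.read(r1, 32)
    second = os.read(r2, 32)             # second pipe has no writers left: end-of-file
    for fd in (r1, w1, r2, w2):
        os.close(fd)
    return first, second

if __name__ == '__main__':
    print(redirect_with_dup2())
```

The descriptor number w2 stays the same across the call; only the file it names changes. That is why code that keeps writing to descriptor 1 after a dup2 call, such as print, ends up writing to the pipe instead.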
To test this utility, the self-test code at the end of the file spawns the program shown in Example 5-19 in a child process and reads and writes standard streams to converse with it over two pipes.
Example 5-19. PP3E\System\Processes\pipes-testchild.py
import os, time, sys
mypid     = os.getpid()
parentpid = os.getppid()
sys.stderr.write('Child %d of %d got arg: %s\n' %
                                (mypid, parentpid, sys.argv[1]))
for i in range(2):
    time.sleep(3)              # make parent process wait by sleeping here
    input = raw_input()        # stdin tied to pipe: comes from parent's stdout
    time.sleep(3)
    reply = 'Child %d got: [%s]' % (mypid, input)
    print reply                # stdout tied to pipe: goes to parent's stdin
    sys.stdout.flush()         # make sure it's sent now or else process blocks
Here is our test in action on Linux; its output is not incredibly impressive to read, but it represents two programs running independently and shipping data back and forth through a pipe device managed by the operating system. This is even more like a client/server model (if you imagine the child as the server). The text in square brackets in this output went from the parent process to the child and back to the parent again, all through pipes connected to standard streams:
[mark@toy]$ python pipes.py
Child 797 of 796 got arg: spam
Parent got: "Child 797 got: [Hello 1 from parent 796]"
Parent got: "Child 797 got: [Hello 2 from parent 796]"
The two processes of the prior section’s example engage in a
simple dialog, but it’s already enough to illustrate some of the
dangers lurking in cross-program communications. First of all,
notice that both programs need to write to stderr
to display a message; their
stdout
streams are tied to the
other program’s input stream. Because processes share file
descriptors, stderr
is the same
in both parent and child, so status messages show up in the same
place.
More subtly, note that both parent and child call sys.stdout.flush
after they print text
to the stdout
stream. Input
requests on pipes normally block the caller if no data is
available, but it seems that this shouldn’t be a problem in our
example because there are as many writes as there are reads on the
other side of the pipe. By default, though, sys.stdout
is
buffered, so the printed text may not
actually be transmitted until some time in the future (when the
stdio
output buffers fill up).
In fact, if the flush calls are not made, both processes will get
stuck waiting for input from the other—input that is sitting in a
buffer and is never flushed out over the pipe. They wind up in a
deadlock state, both blocked on raw_input
calls waiting for events that
never occur.
Keep in mind that output buffering is really a function of the system libraries used to access pipes, not of the pipes themselves (pipes do queue up output data, but they never hide it from readers!). In fact, it occurs in this example only because we copy the pipe’s information over to sys.stdout, a built-in file object that uses stdio buffering by default. However, such anomalies can also occur when using other cross-process tools, such as the popen2 and popen3 calls introduced in Chapter 3.
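The difference between data sitting in a file object's buffer and data actually in the pipe can be observed directly. The sketch below is written in Python 3 syntax and is not part of the book's examples; the helper names readable and buffered_write_demo are hypothetical, and select.select with a zero timeout is used only to poll the pipe without blocking.

```python
import os
import select

def readable(fd):
    """Report whether a read on fd would return data right now."""
    ready, _, _ = select.select([fd], [], [], 0)    # zero timeout: just poll
    return bool(ready)

def buffered_write_demo():
    """Return (readable before flush, readable after flush, data read)."""
    pipein, pipeout = os.pipe()
    out = os.fdopen(pipeout, 'w')        # buffered file object over the pipe
    out.write('Hello from parent\n')     # goes into the file object's buffer
    before = readable(pipein)            # nothing has reached the pipe yet
    out.flush()                          # push the buffered text into the pipe
    after = readable(pipein)
    data = os.read(pipein, 64)
    out.close()
    os.close(pipein)
    return before, after, data

if __name__ == '__main__':
    print(buffered_write_demo())
```

A reader blocked on the other end is in the "before" state until the flush happens, which is exactly the window in which the two programs in this section can deadlock.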
In general terms, if your programs engage in a two-way dialog like this, there are at least three ways to avoid buffer-related deadlock problems:
As demonstrated in this example, manually flushing output pipe streams by calling the file flush method is an easy way to force buffers to be cleared.
It’s possible to use pipes in unbuffered mode. Either use low-level os module calls to read and write pipe descriptors directly, or (on most systems) pass a buffer size argument of zero to os.fdopen to disable stdio buffering in the file object used to wrap the descriptor. For fifos, described in the next section, do the same for open.
Simply use the -u Python command-line flag to turn off buffering for the sys.stdout stream (or equivalently, set your PYTHONUNBUFFERED environment variable to a nonempty value).
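The second technique in this list, wrapping a pipe descriptor in an unbuffered file object, might look as follows in Python 3. This is a sketch under stated assumptions: the function name unbuffered_write_demo is hypothetical, and note that Python 3 accepts a buffer size of zero only for binary-mode files, so the data written here is bytes.

```python
import os
import select

def unbuffered_write_demo():
    """Write through an unbuffered file object; return what is immediately readable."""
    pipein, pipeout = os.pipe()
    out = os.fdopen(pipeout, 'wb', 0)    # buffer size 0: no buffering (binary mode only)
    out.write(b'Hello from parent\n')    # no flush call needed
    ready, _, _ = select.select([pipein], [], [], 0)   # poll without blocking
    data = os.read(pipein, 64) if ready else b''
    out.close()
    os.close(pipein)
    return data

if __name__ == '__main__':
    print(unbuffered_write_demo())
```

With no buffer in between, each write call goes straight to the pipe, at the cost of one system call per write.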
The last technique merits a few more words. Try this: delete all the sys.stdout.flush calls in Example 5-18 and Example 5-19 (the files pipes.py and pipes-testchild.py) and change the parent’s spawn call in pipes.py to this (i.e., add a -u command-line argument):
spawn('python', '-u', 'pipes-testchild.py', 'spam')
Then start the program with a command line like this: python -u pipes.py. It will work as it did with the manual stdout flush calls, because stdout will be operating in unbuffered mode.
We’ll revisit the effects of unbuffered output streams in
Chapter 11, when we code a
GUI that displays the output of a non-GUI program by reading it
over a pipe in a thread. Deadlock in general, though, is a bigger
problem than we have space to address here; on the other hand, if
you know enough that you want to do IPC in Python, you’re probably
already a veteran of the deadlock wars. See also the sidebar below
on the pty
module and Pexpect
package for related tools.
On some platforms, it is also possible to create a pipe that exists as a file. Such files are called named pipes (or, sometimes, fifos) because they behave just like the pipes created within the previous section’s programs but are associated with a real file somewhere on your computer, external to any particular program.
Once a named pipe file is created, processes read and write it using normal file operations. Fifos are unidirectional streams. In typical operation, a server program reads data from the fifo, and one or more client programs write data to it. But a set of two fifos can be used to implement bidirectional communication just as we did for anonymous pipes in the prior section.
Because fifos reside in the filesystem, they are longer-lived than in-process anonymous pipes and can be accessed by programs started independently. The unnamed, in-process pipe examples thus far depend on the fact that file descriptors (including pipes) are copied to child processes’ memory. That makes it difficult to use anonymous pipes to connect programs started independently. With fifos, pipes are accessed instead by a filename visible to all programs running on the computer, regardless of any parent/child process relationships.
Because of that, fifos are better suited as general IPC mechanisms for independent client and server programs. For instance, a perpetually running server program may create and listen for requests on a fifo that can be accessed later by arbitrary clients not forked by the server. In a sense, fifos are an alternative to the socket interface we’ll meet in the next part of this book, but fifos do not directly support remote network connections, are not available on as many platforms, and are accessed using the standard file interface instead of the more unique socket port numbers and calls we’ll study later.
In Python, named pipe files are created with the os.mkfifo
call, available today on
Unix-like platforms but not on all flavors of Windows (though this
call is also available in Cygwin Python on Windows—see the earlier
sidebar). This creates only the external file, though; to send and
receive data through a fifo, it must be opened and processed as if
it were a standard file. Example
5-20 is a derivation of the pipe2.py
script listed earlier. It is written to use fifos rather than
anonymous pipes.
Example 5-20. PP3E\System\Processes\pipefifo.py
###############################################################
# named pipes; os.mkfifo not available on Windows 95/98/XP
# (without Cygwin); no reason to fork here, since fifo file
# pipes are external to processes--shared fds are irrelevant;
###############################################################

import os, time, sys
fifoname = '/tmp/pipefifo'                       # must open same name

def child():
    pipeout = os.open(fifoname, os.O_WRONLY)     # open fifo pipe file as fd
    zzz = 0
    while 1:
        time.sleep(zzz)
        os.write(pipeout, 'Spam %03d\n' % zzz)
        zzz = (zzz+1) % 5

def parent():
    pipein = open(fifoname, 'r')                 # open fifo as stdio object
    while 1:
        line = pipein.readline()[:-1]            # blocks until data sent
        print 'Parent %d got "%s" at %s' % (os.getpid(), line, time.time())

if __name__ == '__main__':
    if not os.path.exists(fifoname):
        os.mkfifo(fifoname)                      # create a named pipe file
    if len(sys.argv) == 1:
        parent()                                 # run as parent if no args
    else:                                        # else run as child process
        child()
Because the fifo exists independently of both parent and child, there’s no reason to fork here. The child may be started independently of the parent as long as it opens a fifo file by the same name. Here, for instance, on Linux the parent is started in one xterm window and then the child is started in another. Messages start appearing in the parent window only after the child is started and begins writing messages onto the fifo file:
[mark@toy]$ python pipefifo.py
Parent 657 got "Spam 000" at 968390065.865
Parent 657 got "Spam 001" at 968390066.865
Parent 657 got "Spam 002" at 968390068.865
Parent 657 got "Spam 003" at 968390071.865
Parent 657 got "Spam 004" at 968390075.865
Parent 657 got "Spam 000" at 968390075.867
Parent 657 got "Spam 001" at 968390076.865
Parent 657 got "Spam 002" at 968390078.865

[mark@toy]$ file /tmp/pipefifo
/tmp/pipefifo: fifo (named pipe)
[mark@toy]$ python pipefifo.py -child
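For comparison, here is a hedged Python 3 sketch of the same fifo idea that runs to completion on its own. It departs from the example above in several ways that are assumptions of this sketch: the fifo is created in a temporary directory rather than at /tmp/pipefifo, a forked child merely stands in for the independently started writer program, the writer sends a fixed list of messages and exits, and the name fifo_roundtrip is hypothetical. Like os.mkfifo itself, it requires a Unix-like platform.

```python
import os
import tempfile

def fifo_roundtrip(messages):
    """Send messages through a named pipe file; return the lines the reader gets."""
    tmpdir = tempfile.mkdtemp()
    fifoname = os.path.join(tmpdir, 'pipefifo')
    os.mkfifo(fifoname)                          # create the named pipe file
    pid = os.fork()
    if pid == 0:
        # writer: knows only the filename, shares no pipe descriptors
        try:
            with open(fifoname, 'w') as pipeout:     # waits for a reader to open
                for text in messages:
                    pipeout.write(text + '\n')
        finally:
            os._exit(0)
    try:
        with open(fifoname) as pipein:               # waits for a writer to open
            lines = [line.rstrip('\n') for line in pipein]
        os.waitpid(pid, 0)
    finally:
        os.remove(fifoname)                      # the fifo outlives both processes
        os.rmdir(tmpdir)                         # unless it is removed explicitly
    return lines

if __name__ == '__main__':
    print(fifo_roundtrip(['Spam 000', 'Spam 001', 'Spam 002']))
```

Nothing is passed from reader to writer except the filename, which is the property that lets unrelated programs share a fifo.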
[*] We will clarify the notions of “client” and “server” in the Internet programming part of this book. There, we’ll communicate with sockets (which are very roughly like bidirectional pipes for networks), but the overall conversation model is similar. Named pipes (fifos), described later, are a better match to the client/server model because they can be accessed by arbitrary, unrelated processes (no forks are required). But as we’ll see, the socket port model is generally used by most Internet scripting protocols.
[*] This really does have real-world applications. For instance, I once added a GUI interface to a command-line debugger for a C-like programming language by connecting two processes with pipes. The GUI ran as a separate process that constructed and sent commands to the existing debugger’s input stream pipe and parsed the results that showed up in the debugger’s output stream pipe. In effect, the GUI acted like a programmer typing commands at a keyboard. By spawning command-line programs with streams attached by pipes, systems can add new interfaces to legacy programs. We’ll see a simple example of this sort of structure in Chapter 11.