Now that I’ve shown you the somewhat longer story, let’s fill in the rest. The next few sections go into more detail on compilation and linking, code structure, data conversions, error handling, and reference counts. These are core ideas in Python C extensions, some of which, as we’ll see later, you can often largely ignore.
You must always compile C extension files such as the hello.c example and somehow link them with the Python interpreter to make them accessible to Python scripts, but there is wide variability in how you might go about doing so. For example, a rule of the following form could be used to compile this C file on Linux as well:
hello.so: hello.c
        gcc hello.c -c -g -fpic -I$(PYINC) -o hello.o
        gcc -shared hello.o -o hello.so
        rm -f hello.o
To compile the C file into a shareable object file on Solaris, you might instead say something like this:
hello.so: hello.c
        cc hello.c -c -KPIC -o hello.o
        ld -G hello.o -o hello.so
        rm hello.o
On other platforms, the details differ still more. Because compiler options vary widely, you’ll want to consult your C or C++ compiler’s documentation or Python’s extension manuals for platform- and compiler-specific details. The point is to determine how to compile a C source file into your platform’s notion of a shareable or dynamically loaded object file. Once you have, the rest is easy; Python supports dynamic loading of C extensions on all major platforms today.
Because build details vary so widely from machine to machine (and even compiler to compiler), the build scripts in this book will take some liberties with platform details. In general, most are shown under the Cygwin Unix-like environment on Windows, partly because it is a simpler alternative to a full Linux install and partly because this writer’s background is primarily in Unix development. Be sure to translate for your own context. If you use standard Windows build tools, see also the directories PC and PCbuild in Python’s current source distribution for pointers.
Technically, what I’ve been showing you so far is called dynamic binding, and it represents one of two ways to link compiled C extensions with the Python interpreter. Since the alternative, static binding, is more complex, dynamic binding is almost always the way to go. To bind dynamically, simply follow these steps:
Compile hello.c into a shareable object file for your system (e.g., .dll, .so).
Put the object file in a directory on Python’s module search path.
That is, once you’ve compiled the source code file into a
shareable object file, simply copy or move the object file to a
directory listed in sys.path
(which includes PYTHONPATH
and
.pth
path file settings). It
will be automatically loaded and linked by the Python interpreter
at runtime when the module is first imported anywhere in the
Python process—including imports from the interactive prompt, a
standalone or embedded Python program, or a C API call.
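The search-path rule can be seen in action without building a real binary. The following sketch creates a throwaway pure-Python module in a temporary directory and imports it after extending sys.path; a compiled hello.so or hello.dll placed in the same directory would be found the same way (the directory and module names here are hypothetical stand-ins, since building a .so is platform specific):

```python
import os
import sys
import tempfile

# Any directory on sys.path is searched at import time; the same rule
# applies to compiled C extension files as to .py files.
moddir = tempfile.mkdtemp()
with open(os.path.join(moddir, "pathdemo.py"), "w") as f:
    f.write("MESSAGE = 'found on sys.path'\n")

sys.path.append(moddir)        # same effect as PYTHONPATH or a .pth file
import pathdemo                # loaded from the directory just added

print(pathdemo.MESSAGE)        # prints: found on sys.path
```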
Notice that the only non-static
name in the
hello.c example C file is the initialization
function. Python calls this function by name after loading the
object file, so its name must be a C global and should generally
be of the form initX, where X is both the name of the module in Python
import statements and the name passed to Py_InitModule.
All other names in C extension files are arbitrary because they
are accessed by C pointer, not by name (more on this later). The
name of the C source file is arbitrary too—at import time, Python
cares only about the compiled object file.
Although dynamic binding is preferred in most applications, static binding allows extensions to be added to the Python interpreter in a more permanent fashion. This is more complex, though, because you must rebuild Python itself, and hence you need access to the Python source distribution (an interpreter executable won’t do). Moreover, static linking of extensions is prone to change over time, so you should consult the README file at the top of Python’s source distribution tree for current details.[*]
In short, though, one way to statically link the extension of Example 22-1 is to add a line such as the following:
hello ~/PP3E/Integrate/Extend/Hello/hello.c
to the Modules/Setup
configuration file in the Python source code tree (change the
~
if this isn’t in your home
directory). Alternatively, you can copy your C file to the
Modules directory (or add a link to it there
with an ln
command) and add a
line to Setup
, such as hello hello.c
.
Then, rebuild Python itself by running a make
command at the top level of the
Python source tree. Python reconstructs its own makefiles to
include the module you added to Setup
, such that your code becomes part
of the interpreter and its libraries. In fact, there’s really no
distinction between C extensions written by Python users and
services that are a standard part of the language; Python is built
with this same interface. The full format of module declaration
lines looks like this:
<module> ... [<sourceOrObjectFile> ...] [<cpparg> ...] [<library> ...]
Under this scheme, the name of the module’s initialization
function must match the name used in the Setup
file, or you’ll get linking errors
when you rebuild Python. The name of the source or object file
doesn’t have to match the module name; the leftmost name is the
resulting Python module’s name. This process and syntax are prone
to change over time, so again, be sure to consult the README file
at the top of Python’s source tree.
Static binding works on any platform and requires no extra makefile to compile extensions. It can be useful if you don’t want to ship extensions as separate files, or if you’re on a platform without dynamic linking support. Its downsides are that you must update Python configuration files and rebuild the Python interpreter itself, so you need the full source distribution of Python to use static linking at all. Moreover, all statically linked extensions are always added to your interpreter, regardless of whether a particular program uses them; this can needlessly increase the memory needed to run all Python programs.
With dynamic binding, you still need Python include files, but you can add C extensions even if all you have is a binary Python interpreter executable. Because extensions are separate object files, there is no need to rebuild Python itself or to access the full source distribution. And because object files are only loaded on demand in this mode, it generally makes for smaller executables too—Python loads into memory only the extensions actually imported by each program run. In other words, if you can use dynamic linking on your platform, you probably should.
As an alternative to makefiles, it’s possible to specify
compilation of C extensions by writing Python scripts that use tools
in the Distutils
package—a
standard part of Python that is used to build, install, and
distribute Python extensions coded in Python or C. Its larger goal
is automated building of distributed packages on target
machines.
We won’t go into Distutils
exhaustively in this text; see Python’s standard distribution and
installation manuals for more details. Among other things, Distutils
is the de facto way to
distribute larger Python packages these days. Its tools know how to
install a system in the right place on target machines (usually, in
Python’s standard site-packages
)
and handle many platform-specific details that are tedious and error
prone to accommodate manually.
For our purposes here, though, because Distutils
also has built-in support for
running common compilers on a variety of platforms (including
Cygwin), it provides an alternative to makefiles for situations
where the complexity of makefiles is either prohibitive or
unwarranted. For example, to compile the C code in Example 22-1, we can code the
makefile of Example 22-2,
or we can code and run the Python script in Example 22-4.
Example 22-4. PP3E\Integrate\Extend\Hello\disthello.py
# to build: python disthello.py build
# resulting dll shows up in build subdir

from distutils.core import setup, Extension
setup(ext_modules=[Extension('hello', ['hello.c'])])
Example 22-4 is a
Python script run by Python; it is not a makefile. Moreover, there
is nothing in it about a particular compiler or compiler options.
Instead, the Distutils
tools it
employs automatically detect and run an appropriate compiler for the
platform, using compiler options that are appropriate for building
dynamically linked Python extensions on that platform. For the
Cygwin test machine, gcc
is used
to generate a .dll dynamic library ready to be
imported into a Python script—exactly like the result of the
makefile in Example 22-2,
but considerably simpler:
.../PP3E/Integrate/Extend/Hello$ python disthello.py build
running build
running build_ext
building 'hello' extension
creating build
creating build/temp.cygwin-1.5.19-i686-2.4
gcc -fno-strict-aliasing -DNDEBUG -g -O3 -Wall -Wstrict-prototypes
-I/usr/include/python2.4 -c hello.c -o build/temp.cygwin-1.5.19-i686-2.4/hello.o
hello.c:31: warning: function declaration isn't a prototype
creating build/lib.cygwin-1.5.19-i686-2.4
gcc -shared -Wl,--enable-auto-image-base build/temp.cygwin-1.5.19-i686-2.4/hello
.o -L/usr/lib/python2.4/config -lpython2.4
-o build/lib.cygwin-1.5.19-i686-2.4/hello.dll
The resulting binary library file shows up in the generated build subdirectory, but it’s used in Python code just as before:
.../PP3E/Integrate/Extend/Hello$ cd build/lib.cygwin-1.5.19-i686-2.4/
.../PP3E/Integrate/Extend/Hello/build/lib.cygwin-1.5.19-i686-2.4$ ls
hello.dll
.../PP3E/Integrate/Extend/Hello/build/lib.cygwin-1.5.19-i686-2.4$ python
>>> import hello
>>> hello.__file__
'hello.dll'
>>> hello.message('distutils')
'Hello, distutils'
Distutils
scripts can
become much more complex in order to specify build options; for
example, here is a slightly more verbose version of ours:
from distutils.core import setup, Extension
setup(name='hello',
      version='1.0',
      ext_modules=[Extension('hello', ['hello.c'])])
Unfortunately, further details about both Distutils
and makefiles are beyond the
scope of this chapter and book. Especially if you’re not used to
makefiles, see the Python manuals for more details on Distutils
. Makefiles are a traditional way
to build code on some platforms and we will employ them in this
book, but Distutils
can sometimes
be simpler in cases where they apply.
Though simple, the hello.c code of Example 22-1 illustrates the structure common to all C modules. Most of it is glue code, whose only purpose is to wrap the C string processing logic for use in Python scripts. In fact, although this structure can vary somewhat, this file consists of fairly typical boilerplate code:
The C file first includes the standard Python.h header file (from the installed Python Include directory). This file defines almost every name exported by the Python API to C, and it serves as a starting point for exploring the API itself.
The file then defines a function to be called from the
Python interpreter in response to calls in Python programs. C
functions receive two Python objects as input, and send either
a Python object back to the interpreter as the result or a
NULL
to trigger an
exception in the script (more on this later). In C, a PyObject*
represents a generic
Python object pointer; you can use more specific type names,
but you don’t always have to. C module functions can be
declared C static (local to the file) because Python calls
them by pointer, not by name.
Near the end, the file provides an initialized table
(array) that maps function names to
function pointers (addresses). Names in
this table become module attribute names that Python code uses
to call the C functions. Pointers in this table are used by
the interpreter to dispatch C function calls. In effect, the
table “registers” attributes of the module. A NULL
entry terminates the
table.
Finally, the C file provides an initialization function, which Python calls the first time
this module is imported into a Python program. This function
calls the API function Py_InitModule
to build up the new
module’s attribute dictionary from the entries in the
registration table and create an entry for the C module on the
sys.modules
table
(described in Chapter 3).
Once so initialized, calls from Python are routed directly to
the C function through the registration table’s function
pointers.
C module functions are responsible for converting Python
objects to and from C datatypes. In Example 22-1, message
gets two Python input objects
passed from the Python interpreter: args
is a Python tuple holding the
arguments passed from the Python caller (the values listed in
parentheses in a Python program), and self
is ignored. It is useful only for
extension types (discussed later in this chapter).
After finishing its business, the C function can return any of
the following to the Python interpreter: a Python object (known in C
as PyObject*
), for an actual
result; a Python None
(known in C
as Py_None
), if the function
returns no real result; or a C NULL
pointer, to flag an error and raise a
Python exception.
There are distinct API tools for handling input conversions (Python to C) and output conversions (C to Python). It’s up to C functions to implement their call signatures (argument lists and types) by using these tools properly.
When the C function is run, the arguments passed
from a Python script are available in the args
Python tuple object. The API
function PyArg_Parse
—and its
cousin, PyArg_ParseTuple
, which
assumes it is converting a tuple object—is probably the easiest
way to extract and convert passed arguments to C form.
PyArg_Parse
takes a
Python object, a format string, and a variable-length list of C
target addresses. It converts the items in the tuple to C datatype
values according to the format string, and it stores the results
in the C variables whose addresses are passed in. The effect is
much like C’s scanf
string
function. For example, the hello
module converts a passed-in Python
string argument to a C char*
using the s
convert
code:
PyArg_Parse(args, "(s)", &fromPython) # or PyArg_ParseTuple(args, "s",...
To handle multiple arguments, simply string format codes together and include corresponding C targets for each code in the string. For instance, to convert an argument list holding a string, an integer, and another string to C, say this:
PyArg_Parse(args, "(sis)", &s1, &i, &s2) # or PyArg_ParseTuple(args, "sis",...
To verify that no arguments were passed, use an empty format string like this:
PyArg_Parse(args, "()")
This API call checks that the number and types of the arguments passed from Python match the format string in the call. If there is a mismatch, it sets an exception and returns zero to C (more on errors shortly).
As we’ll see in Chapter 23, API functions may also
return Python objects to C as results when Python is being run as
an embedded language. Converting Python return values in this mode
is almost the same as converting Python arguments passed to C
extension functions, except that Python return values are not
always tuples. To convert returned Python objects to C form,
simply use PyArg_Parse
. Unlike
PyArg_ParseTuple
, this call
takes the same kinds of arguments but doesn’t expect the Python
object to be a tuple.
There are two ways to convert C data to Python
objects: by using type-specific API functions or via the general
object-builder function, Py_BuildValue
. The latter is more
general and is essentially the inverse of PyArg_Parse
, in that Py_BuildValue
converts C data to Python
objects according to a format string. For instance, to make a
Python string object from a C char*
, the hello
module uses an s
convert code:
return Py_BuildValue("s", result)          # "result" is a C char []
More specific object constructors can be used instead:
return PyString_FromString(result) # same effect
Both calls make a Python string object from a C character
array pointer. See the now-standard Python extension and runtime
API manuals for an exhaustive list of such calls available.
Besides being easier to remember, though, Py_BuildValue
has syntax that allows you
to build lists in a single step, described next.
With a few exceptions, PyArg_Parse(Tuple)
and Py_BuildValue
use the same conversion
codes in format strings. A list of all supported conversion codes
appears in Python’s extension manuals. The most commonly used are
shown in Table 22-1;
the tuple, list, and dictionary formats can be nested.
Table 22-1. Common Python/C data conversion codes
Format-string code | C datatype | Python object type
---|---|---
s | char* | String
s# | char*, int | String, length
i | int | Integer
l | long int | Integer
c | char | String
f | float | Floating-point
d | double | Floating-point
O | PyObject* | Raw (unconverted) object
O& | &converter, void* | Converted object (calls converter)
(items) | Targets or values | Nested tuple
[items] | Series of arguments/values | List
{items} | Series of key,value arguments | Dictionary
These codes are mostly what you’d expect (e.g., i
maps between a C int
and a Python integer object), but
here are a few usage notes on this table’s entries:
Pass in the address of a char*
for s
codes when converting
to C, not the address of a char
array: Python copies out the
address of an existing C string (and you must copy it to save
it indefinitely on the C side: use strdup
or similar).
The O
code is useful
to pass raw Python objects between languages; once you have a
raw object pointer, you can use lower-level API tools to
access object attributes by name, index and slice sequences,
and so on.
The O&
code lets
you pass in C converter functions for custom conversions. This
comes in handy for special processing to map an object to a C
datatype not directly supported by conversion codes (for
instance, when mapping to or from an entire C struct or C++
class instance). See the extensions manual for more
details.
The last two entries, [...]
and {...}
, are currently supported only
by Py_BuildValue
: you can
construct lists and dictionaries with format strings, but you
can’t unpack them. Instead, the API includes type-specific
routines for accessing sequence and mapping components given a
raw object pointer.
PyArg_Parse
supports some
extra codes, which must not be nested in tuple formats ((...)
):
|
The remaining arguments are optional (varargs
, much like the Python
language’s *
arguments).
The C targets are unchanged if arguments are missing in the
Python tuple. For instance, si|sd
requires two arguments but
allows up to four.
:
The function name follows, for use in error messages set by the call (argument mismatches). Normally Python sets the error message to a generic string.
;
A full error message follows, running to the end of the format string.
This format code list isn’t exhaustive, and the set of convert codes may expand over time; refer to Python’s extension manual for further details.
When you write C extensions, you need to be aware that errors can occur on either side of the language fence. The following sections address both possibilities.
C extension module functions return a C NULL
value for the result object to flag
an error. When control returns to Python, the NULL
result triggers a normal Python
exception in the Python code that called the C function. To name
an exception, C code can also set the type and extra data of the
exceptions it triggers. For instance, the PyErr_SetString
API function sets the
exception object to a Python object and sets the exception’s extra
data to a character string:
PyErr_SetString(ErrorObject, message)
We will use this in the next example to be more specific
about exceptions raised when C detects an error. C modules may
also set a built-in Python exception; for instance, returning
NULL
after saying this:
PyErr_SetString(PyExc_IndexError, "index out-of-bounds")
raises a standard Python IndexError
exception with the message
string data. When an error is raised inside a Python API function,
both the exception object and its associated “extra data” are
automatically set by Python; there is no need to set it again in
the calling C function. For instance, when an argument-passing
error is detected in the PyArg_Parse
function, the hello
stack module just returns NULL
to propagate the exception to the
enclosing Python layer, instead of setting its own message.
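This propagation is visible from the Python side with any C-coded callable: when C-level argument parsing rejects a call, the C function returns NULL and the caller sees an ordinary exception. Built-ins such as ord, which are implemented in C, behave exactly this way:

```python
# A C-implemented function that detects an argument mismatch returns
# NULL to the interpreter, which raises a normal Python exception here.
try:
    ord("ab")                  # ord's C code requires a single character
except TypeError as exc:
    print(type(exc).__name__)  # prints: TypeError
```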
Python API functions may be called from C extension
functions or from an enclosing C layer when Python is embedded. In
either case, C callers simply check the return value to detect
errors raised in Python API functions. For pointer result
functions, Python returns NULL
pointers on errors. For integer result functions, Python generally
returns a status code of -1
to
flag an error and a 0
or
positive value on success. (PyArg_Parse
is an exception to this
rule: it returns 0
when it
detects an error.) To make your programs robust, you should check
return codes for error indicators after most Python API calls;
some calls can fail for reasons you may not have expected (e.g.,
memory overflow).
The Python interpreter uses a reference-count scheme to implement garbage collection. Each Python object carries a count of the number of places it is referenced; when that count reaches zero, Python reclaims the object’s memory space automatically. Normally, Python manages the reference counts for objects behind the scenes; Python programs simply make and use objects without concern for managing storage space.
When extending or embedding Python, though, integrated C code
is responsible for managing the reference counts of the Python
objects it uses. How important this becomes depends on how many raw
Python objects a C module processes and which Python API functions
it calls. In simple programs, reference counts are of minor, if any,
concern; the hello
module, for
instance, makes no reference-count management calls at all.
When the API is used extensively, however, this task can become significant. In later examples, we’ll see calls of these forms show up:
Py_INCREF(obj)
Increments an object’s reference count.
Py_DECREF(obj)
Decrements an object’s reference count (reclaims if zero).
Py_XINCREF(obj)
Behaves similarly to Py_INCREF(obj)
, but ignores a
NULL
object pointer.
Py_XDECREF(obj)
Behaves similarly to Py_DECREF(obj)
, but ignores a
NULL
object pointer.
C module functions are expected to return either an object
with an incremented reference count or NULL
to signal an error. As a general
rule, API functions that create new objects increment their
reference counts before returning them to C; unless a new object is
to be passed back to Python, the C program that creates it should
eventually decrement the object’s counts. In the extending scenario,
things are relatively simple; argument object reference counts need
not be decremented, and new result objects are passed back to Python
with their reference counts intact.
The upside of reference counts is that Python will never reclaim a Python object held by C as long as C increments the object’s reference count (or doesn’t decrement the count on an object it owns). Although it requires counter management calls, Python’s garbage collector scheme is fairly well suited to C integration.
Some C extensions may be required to perform additional tasks beyond data conversion, error handling, and reference counting. For instance, long-running C extension functions in threaded applications must release and later reacquire the global interpreter lock, so as to allow Python language threads to run in parallel. See the introduction to this topic in Chapter 5 for background details. Calls to long-running tasks implemented in C extensions, for example, are normally wrapped up in two C macros:
Py_BEGIN_ALLOW_THREADS
...Perform a potentially blocking operation...
Py_END_ALLOW_THREADS
The first of these saves the thread state data structure in a local variable and releases the global lock; the second reacquires the lock and restores the thread state from the local variable. The net effect is to allow Python threads to run during the execution of the code in the enclosed block, instead of making them wait. The C code in the calling thread can run free of, and in parallel with, other Python threads, as long as it doesn’t reenter the Python C API until it reacquires the lock.
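The effect of releasing the lock can be observed from pure Python, because C-level blocking calls such as time.sleep already use this macro pair internally: two threads sleeping “in parallel” finish in roughly the time of one sleep, not two. (The 0.9-second threshold below is an assumed margin for a lightly loaded machine.)

```python
import threading
import time

# time.sleep's C implementation releases the global interpreter lock
# around its blocking wait, so these two half-second sleeps overlap.
start = time.perf_counter()
threads = [threading.Thread(target=time.sleep, args=(0.5,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start
print(elapsed < 0.9)   # roughly 0.5s total, not 1.0s
```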
The API has additional thread calls, and depending on the application, there may be other C coding requirements in general. In deference to space, though, and because we’re about to meet a tool that automates much of our integration work, we’ll defer to Python’s integration manuals for additional details.
[*] In fact, starting with Python 2.1, the
setup.py script at the top of the source
distribution attempts to detect which modules can be built,
and it automatically compiles them using the distutils
system described in the
next section. The setup.py script is run
by Python’s make system after building a minimal interpreter.
This process doesn’t always work, though, and you can still
customize the configuration by editing the
Modules/Setup file. As a more recent
alternative, see also the example lines in Python’s
setup.py for xxmodule.c
.