Chapter 17. Extension Programming with C

Don't let anybody mislead you: well-written code in C will always execute faster than code written in Python. Having said that, don't be misled: Developing code in Python will always be faster than developing code in C.

This may seem like a dilemma at first: you want fast code, and you want to produce it quickly. Fortunately, the balance is easier to strike than it sounds. Develop your code in Python first. After all, a developer's time is much more expensive than the computer's time. Besides, humans have a miserable track record of predicting where the bottlenecks in a system will actually occur, so spending a lot of time up front on optimization, such as writing a new program in C from the start, is usually time wasted. This is what led the esteemed computer scientist C. A. R. Hoare to say, "Premature optimization is the root of all evil." Of course, he was only talking about computer programs, but the point stands.

If you've written your code, optimized your algorithms, and still find the performance unacceptable, you should profile your application to find out where it's spending its time, determine where the bottlenecks are, and reimplement those small parts in C as a Python extension module. That's part of what this chapter is about.

Or if you already have an existing body of code written in C and you want to leverage that from within Python, you can create a small Python extension module exposing that C code to your Python code so it can be called as though it were written in Python. This is probably the more common reason for implementing an extension module (a module written in a language other than Python).

In this chapter you learn:

  • How to create an extension module in C for the standard Python interpreter (but you have to promise that you'll do so only if you have absolutely no other option). This chapter assumes you are already familiar with C. If you're not, you'll need to rope someone who is familiar with C into helping you out.

  • Basic and real-world, practical examples in which you define, in C, a class that can encode raw audio data into MP3-encoded data. Your class will be usable from Python and will make method calls on pure Python objects, demonstrating how you can communicate both ways.

  • How to work with the Python API from C.

This chapter is just an introduction to using the Python API from C and is in no way a substitute for the API documentation found at http://docs.python.org/. You should look up the definitions of the functions used throughout the examples as you encounter them.

Extension Module Outline

First of all, a Python extension module is nothing more than a normal C library. On UNIX machines, these libraries usually end in .so (for shared object). On Windows machines, you typically see .dll (for dynamically linked library).

Before you get started, you're going to need the Python header files. On UNIX machines, this usually requires installing a developer-specific package. Windows users get these headers as part of the package when they use the binary Python installer.

For your first look at a Python extension module, you'll be grouping your code into three parts: the C functions you want to expose as the interface from your module; a table mapping the names of your functions as Python developers will see them to C functions inside the extension module; and an initialization function.

Most extension modules can be contained in a single C source file, sometimes called the glue. Start the file by including Python.h, which gives you access to the Python API used to hook your module into the interpreter. Be sure to include Python.h before any other headers you might need. You'll follow the includes with the functions you want to call from Python.

Interestingly, the signatures of the C implementations of your functions will always take one of the following three forms:

PyObject *MyFunction(PyObject *self, PyObject *args);

PyObject *MyFunctionWithKeywords(PyObject *self,
                                 PyObject *args,
                                 PyObject *kw);

PyObject *MyFunctionWithNoArgs(PyObject *self);

Typically, your C functions will look like the first of the preceding three declarations. The arguments passed into your functions are packed into a tuple that you'll have to break apart in order to use, which explains how you can implement a function in C that takes only two arguments but can accept any number of arguments as called from Python.

Notice how each one of the preceding declarations returns a Python object. There's no such thing as a "void" function in Python as there is in C. If you don't want your functions to return a value, return the C equivalent of Python's None value instead. The Python headers define a macro, Py_RETURN_NONE, that does this for you.
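
If you're curious, that macro is essentially a shorthand; a rough sketch of what it amounts to (not the exact CPython definition) is the following:

Py_INCREF(Py_None);   /* None is a real object, so its reference count is managed too */
return Py_None;

Reference counting is covered in more detail toward the end of this chapter.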

Seeing these declarations should make it obvious how object-oriented Python is. Everything is an object. In C, you'll be using the Python API to work with these objects, but the concepts you know from Python still hold.

The names of your C functions can be whatever you like because they'll never be seen outside of the extension module. In fact, the functions are usually declared with the static keyword (which in C means they're not visible outside of the current source file). In the example code, functions usually are named by combining the Python module and function names together, as shown here:

static PyObject *foo_bar(PyObject *self, PyObject *args) {
    /* Do something interesting here. */
    Py_RETURN_NONE;
}

This would be a Python function called bar inside of the module foo. You'll be putting pointers to your C functions into the method table for the module that usually comes next in your source code.

This method table is a simple array of PyMethodDef structures. That structure looks something like this:

struct PyMethodDef {
    char        *ml_name;
    PyCFunction  ml_meth;
    int          ml_flags;
    char        *ml_doc;
};

The first member, ml_name, is the name of the function as the Python interpreter will present it when it's used in Python programs. The PyCFunction member, ml_meth, must be the address of a function that has any one of the signatures described previously. ml_flags tells the interpreter which of the three signatures ml_meth is using; it will usually have a value of METH_VARARGS. That value can be bitwise OR'ed with METH_KEYWORDS if you want to allow keyword arguments into your function. It can also have a value of METH_NOARGS, which indicates that the function doesn't accept any arguments. Finally, the last member in the PyMethodDef structure, ml_doc, is the docstring for the function, which can be NULL if you don't feel like writing one (shame on you).

This table needs to be terminated with a sentinel that consists of NULL and 0 values for the appropriate members.

This is what a table containing an entry for your foo_bar function would look like:

static PyMethodDef foo_methods[] = {
    { "bar", (PyCFunction)foo_bar, METH_NOARGS, "My first function." },
    { NULL, NULL, 0, NULL }
};

Casting the address of foo_bar to a PyCFunction is necessary to keep the compiler from warning you about incompatible pointer types. The cast is safe because of the METH_NOARGS flag in the ml_flags member, which tells the Python interpreter to call your C function with only one PyObject * argument (and not two as would be the case if you used METH_VARARGS, or three if you used METH_VARARGS|METH_KEYWORDS).

The last part of your extension module is the initialization function. This function is called by the Python interpreter when the module is loaded. It's required that the function be named initfoo, where foo is the name of the module.

The initialization function needs to be exported from the library you'll be building. The Python headers define PyMODINIT_FUNC to include the appropriate incantations for that to happen for the particular environment in which you're compiling. All you have to do is use it when defining the function.

Putting this all together looks like the following:

#include <Python.h>

static PyObject *foo_bar(PyObject *self, PyObject *args) {
    /* Do something interesting here. */
    Py_RETURN_NONE;
}

static PyMethodDef foo_methods[] = {
    { "bar", (PyCFunction)foo_bar, METH_NOARGS, NULL },
    { NULL, NULL, 0, NULL }
};

PyMODINIT_FUNC initfoo() {
    Py_InitModule3("foo", foo_methods, "My first extension module.");
}

The Py_InitModule3 function is typically what you use to define a module because it lets you define a docstring for a module, which is always a nice thing to do.

Building and Installing Extension Modules

You can build the extension module in a couple of different ways. The obvious way is to build it the way you build all of the libraries on your platform. Save the previous example as foo.c. Then, to compile your module on Linux, you could do something like this:

gcc -shared -I/usr/include/python2.6 foo.c -o foo.so

Building the extension module on Windows would look something like this:

cl /LD /IC:\Python26\include foo.c C:\Python26\libs\python26.lib

For either of these commands to work, you'll need to have a C compiler installed and have it available in your path (if you're reading this chapter, you probably do). The Python headers need to be installed and accessible to the compiler. In both of these examples, the directory containing the Python headers is specified on the command line (as is the path to the Python library for the Windows compiler). If your headers and libraries are located in a different location, the commands will have to be modified accordingly.

The name of the actual shared object (or DLL on Windows) needs to be the same as the string passed in to Py_InitModule3 (minus the .so or .dll extension). Optionally, you can suffix the base name of the library with module. So your foo extension module could be called foo.so or foomodule.so.

This works, but it's not the only way to do it. The new and improved way of building extension modules is to use distutils, which is included in all recent versions of Python.

The distutils package makes it possible to distribute Python modules, both pure Python and extension modules, in a standard way. Modules are distributed in source form and built and installed via a setup script (usually called setup.py). As long as your users have the required compiler packages and Python headers installed, this usually works.

The setup script is surprisingly succinct:

from distutils.core import setup, Extension

setup(name='foo', version='1.0',
      ext_modules=[Extension('foo', ['foo.c'])])

Running this script through the Python interpreter demonstrates that you're getting quite a bit more than initially expected with just two lines of code:

$ python setup.py
usage: setup.py [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]
   or: setup.py --help [cmd1 cmd2 ...]
   or: setup.py --help-commands
   or: setup.py cmd --help

error: no commands supplied

Trying again with the --help-commands argument displays all of the commands your setup script can respond to:

$ python setup.py --help-commands
Standard commands:
  build            build everything needed to install
  build_py         "build" pure Python modules (copy to build directory)
  build_ext        build C/C++ extensions (compile/link to build directory)
  build_clib       build C/C++ libraries used by Python extensions
  build_scripts    "build" scripts (copy and fixup #! line)
  clean            clean up output of 'build' command
  install          install everything from build directory
  install_lib      install all Python modules (extensions and pure Python)
  install_headers  install C/C++ header files
  install_scripts  install scripts (Python or otherwise)
  install_data     install data files
  sdist            create a source distribution (tarball, zip file, etc.)
  register         register the distribution with the Python package index
  bdist            create a built (binary) distribution
  bdist_dumb       create a "dumb" built distribution
  bdist_rpm        create an RPM distribution
  bdist_wininst    create an executable installer for MS Windows

usage: setup.py [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]
   or: setup.py --help [cmd1 cmd2 ...]
   or: setup.py --help-commands
   or: setup.py cmd --help

There's a lot going on here, but all you need for now is the build command. Executing that will compile foo.c into foo.so (on Linux) or foo.dll (on Windows). This file will end up in a subdirectory of the build directory in your current directory unless you change that with more command-line options.

For the module to be importable by the Python interpreter, it needs to be in the current directory, in a directory listed in the PYTHONPATH environment variable, or in a directory listed in the sys.path list, which you can modify at runtime (although I wouldn't recommend it).

The easiest way to get this to happen is to use another one of the setup script commands:

$ python setup.py install

If you hadn't already built the module, this would have done that for you because building is a prerequisite for installing (much like a make file). The install command also copies the module to the site-packages directory for your Python installation. This site-packages directory is listed in sys.path, so after this is done, you can start using the module.

On UNIX-based systems, you'll most likely need to run this command as root in order to have permissions to write to the site-packages directory. This usually isn't a problem on Windows. It's also possible to install modules in alternative locations using the --home or --prefix command-line options, but doing this leaves you responsible for ensuring they're put in a directory the Python interpreter knows about when it's run.

Passing Parameters from Python to C

After you have everything built and installed, importing your new extension module and invoking its one function is easy:

>>> import foo
>>> dir(foo)
['__doc__', '__file__', '__name__', 'bar']
>>> foo.bar()

If you tried to pass in any arguments to your function, the interpreter will rightfully complain:

>>> foo.bar(1)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: bar() takes no arguments (1 given)

Because you'll most likely want to define functions that do accept arguments, you can use one of the other signatures for your C functions. For example, a "normal" function (one that accepts some number of parameters) would be defined like this:

static PyObject *foo_baz(PyObject *self, PyObject *args) {
    /* Parse args and do something interesting here. */
    Py_RETURN_NONE;
}

The method table containing an entry for the new function would look like this:

static PyMethodDef foo_methods[] = {
    { "bar",  (PyCFunction)foo_bar, METH_NOARGS, NULL },
    { "baz",  foo_baz, METH_VARARGS, NULL },
    { NULL, NULL, 0, NULL }
};

After making those changes to foo.c and saving them, you're going to want to close any open Python interpreters that imported the old version of the extension module so that you can recompile the source, start a new interpreter, and import the new version of the extension module. It's easy to forget to do this if you're compiling in one window and invoking Python in another.

Compiling the new version of your module and importing it will enable you to invoke the new function with any number of arguments of any type:

>>> foo.baz()
>>> foo.baz(1)
>>> foo.baz(1, 2.0)
>>> foo.baz(1, 2.0, "three")

The reason why anything goes is that you haven't written the C code to enforce a certain number and type of arguments.

The Python API gives you the PyArg_ParseTuple function to extract the arguments from the one PyObject pointer passed into your C function. This is a variadic function much like the standard sscanf function with which you might be familiar.

The first argument to PyArg_ParseTuple is the args argument. This is the object you'll be "parsing." The second argument is a format string describing the arguments as you expect them to appear. Each argument is represented by one or more characters in the format string. An i indicates that you expect the argument to be an integer-like object, which PyArg_ParseTuple will convert into a C int. Specifying a d in the format string will give you a double, and s will give you a string (char *). For example, if you expected the baz function to be passed one integer, one double, and one string, your format string would be "ids". You can find the full list of indicators that you can include in a format string at http://docs.python.org/api/arg-parsing.html.

The remaining arguments to PyArg_ParseTuple are pointers to storage space of the appropriate type for your arguments, just like sscanf. Knowing this, you might rewrite baz to look like the following:

static PyObject *foo_baz(PyObject *self, PyObject *args) {
    int     i;
    double  d;
    char   *s;
    if (!PyArg_ParseTuple(args, "ids", &i, &d, &s)) {
        return NULL;
    }
    /* Do something interesting here. */
    Py_RETURN_NONE;
}

PyArg_ParseTuple will return 0 if it fails to extract exactly what was specified in the format string. It's important that you return NULL from your function when this happens so that the interpreter can generate an exception for your caller.

What about optional arguments? If you include a | (the vertical bar) character in your format string, the indicators to the left of the | will be required, but the indicators to the right will be optional. You're going to want to give your local storage for the optional arguments a default value because PyArg_ParseTuple won't write anything to those variables if the caller didn't specify the necessary arguments.

For example, if baz required one int, one double, and one string but also allowed an optional int, double, and then a string, you might rewrite it to look like this:

static PyObject *foo_baz(PyObject *self, PyObject *args) {
    int     i;
    double  d;
    char   *s;
    int     i2 = 4;
    double  d2 = 5.0;
    char   *s2 = "six";
    if (!PyArg_ParseTuple(args, "ids|ids", &i, &d, &s, &i2, &d2,
 &s2)) {
        return NULL;
    }
    /* Do something interesting here. */
    Py_RETURN_NONE;
}

Lastly, the third and final form your C functions might take is only necessary when your functions accept keyword arguments. In this case, you'll use the signature that accepts three PyObject * arguments and set the ml_flags member in your method table entry to METH_VARARGS|METH_KEYWORDS. Instead of using the PyArg_ParseTuple function to extract your arguments, you'll use PyArg_ParseTupleAndKeywords.

This is what the function might look like:

static PyObject *foo_quux(PyObject *self, PyObject *args, PyObject *kw) {
    char *kwlist[] = { "i", "d", "s", NULL };
    int     i;
    double  d = 2.0;
    char   *s = "three";
    if (!PyArg_ParseTupleAndKeywords(args, kw, "i|ds", kwlist,
                                     &i, &d, &s)) {
        return NULL;
    }
    /* Do something interesting here. */
    Py_RETURN_NONE;
}

This would be its entry in the method table right after the entry for the baz function but before the sentinel entry:

{ "quux", (PyCFunction)foo_quux, METH_VARARGS|METH_KEYWORDS, NULL },

PyArg_ParseTupleAndKeywords works just like PyArg_ParseTuple with the exception of two extra arguments. First, you need to pass in the pointer to the Python object containing the keyword arguments. Second, you need to indicate what keywords you're interested in. You do that with a NULL-terminated list of strings. In the preceding example, you're saying that your keywords are "i", "d", and "s".

Each keyword needs to correspond with one indicator in the format string even if you don't ever intend to have your callers use a keyword for certain arguments. Notice how the preceding example includes three indicators in the format string. The first, "i", is required whereas the other two, "d" and "s", are optional. You could call this function (from Python) in any of the following ways:

>>> foo.quux(1)
>>> foo.quux(i=1)
>>> foo.quux(1, 2.0)
>>> foo.quux(1, 2.0, "three")
>>> foo.quux(1, 2.0, s="three")
>>> foo.quux(1, d=2.0)
>>> foo.quux(1, s="three")
>>> foo.quux(1, d=2.0, s="three")
>>> foo.quux(1, s="three", d=2.0)
>>> foo.quux(i=1, d=2.0, s="three")
>>> foo.quux(s="three", d=2.0, i=1)

You can probably come up with even more variations.

Returning Values from C to Python

PyArg_ParseTuple and PyArg_ParseTupleAndKeywords convert Python objects into C values, but what about going the other way? How would you return a value from a function implemented in C back to Python?

All of the function signatures you saw previously return a PyObject *, so you need to use whatever the opposite of PyArg_ParseTuple is in order to turn a C value into a Python object. That function is called Py_BuildValue.

Py_BuildValue takes in a format string much like PyArg_ParseTuple does. Instead of passing in the addresses of the values you're building, you pass in the actual values. Here's an example showing how to implement an add function:

static PyObject *foo_add(PyObject *self, PyObject *args) {
    int a;
    int b;
    if (!PyArg_ParseTuple(args, "ii", &a, &b)) {
         return NULL;
    }
    return Py_BuildValue("i", a + b);
}

The Python equivalent of this function would look like this:

def add(a, b):
    return a + b

What if you want to return more than one value from your function? In Python, you do that by returning a tuple. In C, you do that by building a tuple with Py_BuildValue. If your format string has more than one indicator, you'll get a tuple. You can also be explicit and surround your indicators with parentheses:

static PyObject *foo_add_and_subtract(PyObject *self, PyObject *args) {
    int a;
    int b;
    if (!PyArg_ParseTuple(args, "ii", &a, &b)) {
        return NULL;
    }
    return Py_BuildValue("(ii)", a + b, a - b);
}

To help visualize what this function is doing, this is what it would look like if implemented in Python:

def add_and_subtract(a, b):
    return (a + b, a - b)
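
To pull these pieces together, here's one more function for the hypothetical foo module (it isn't part of the earlier listings) that parses two integers, raises an exception when the divisor is zero, and otherwise returns a two-element tuple. The PyErr_SetString call it uses to raise the exception is covered in more detail later in the chapter:

static PyObject *foo_divmod(PyObject *self, PyObject *args) {
    int a;
    int b;
    if (!PyArg_ParseTuple(args, "ii", &a, &b)) {
        return NULL;
    }
    if (b == 0) {
        /* Set an exception and return NULL so the interpreter raises it. */
        PyErr_SetString(PyExc_ZeroDivisionError, "division by zero");
        return NULL;
    }
    /* Build and return a two-element tuple: the quotient and the remainder. */
    return Py_BuildValue("(ii)", a / b, a % b);
}

Registered with METH_VARARGS, this could be called from Python as foo.divmod(7, 3), which would return (2, 1); calling it with a zero divisor raises ZeroDivisionError instead.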

Now, armed with just this knowledge, it's possible for you to create a wide variety of extension modules. Let's put this to good use and work on a real example.

The LAME Project

LAME is (or was) an acronym that originally stood for "LAME Ain't an MP3 Encoder." Whether or not it's officially considered an MP3 encoder isn't important to you, because it functions as a (most excellent) free and open-source library that is capable of encoding MP3s.

Dozens of software projects use LAME but not many are implemented in Python, which is why you'll be using it as an example to demonstrate just how easy it is to create extension modules for Python that leverage an existing C code base, even when the C code wasn't written to be interfaced with Python.

This example is also a very practical one. Consider how many years went into developing the LAME code base, which in case you don't know is many, many, many years. Would you really want to duplicate that work by reimplementing it in Python? Now consider what your answer would be if you were told how unbelievably slow it would run if you had a Python-only encoder! This isn't anything against Python, by the way; the same would hold true of any language that is higher-level than C. Languages such as Java, Perl, and so on would have the same limitation. This is a perfect example of code that you would not want to develop in Python (there are very few examples where this is true).

Before creating an extension module that wraps the LAME library, you need to learn how to use the API exposed by that library. The core of the LAME API is small enough to create a quick demonstration with only a page or so of C code.

You need the LAME headers and libraries installed on your machine before you can write any code that uses its API, of course. The LAME Project's website is located on SourceForge at http://lame.sourceforge.net/. You can download the source code from there. Though you can download, compile, and install any part of the LAME package from there, you won't find any pre-built binaries on the site (presumably to avoid the potential legal issues of distributing an MP3 encoder). However, if you'd rather not build from source, the LAME Project's website links to sites that do provide downloadable binaries.

You can find packages on the Web for most Linux distributions; they may be listed under names such as lame, liblame, or liblame-dev. If you can't find a package or would rather build from source, ./configure, make, and make install will produce a complete working installation of LAME, just as they do with almost every other project you build from source on Linux.

Windows users can use any of the pre-built binaries, but those don't usually come with the headers, so you'll have to download those from the main site. If you're doing that, you might as well build the libraries yourself. The LAME source code includes a Visual Studio workspace that can build everything you need to get through the rest of this chapter. There will be errors (there were for the author), but the build gets far enough to produce just what you need, so you can ignore those errors and be OK.

The general overview of creating an MP3 file with LAME is described here:

  1. Initialize the library.

  2. Set up the encoding parameters.

  3. Feed the library one buffer of audio data at a time (returning another buffer of MP3-encoded bytes of that data).

  4. Flush the encoder (possibly returning more MP3 data).

  5. Close the library.

That's it!

Here's an example written in C that uses the LAME API. It can encode any raw audio file into an MP3-encoded audio file. If you want to compile it to make sure it works, save it in a file called clame.c:

#include <stdio.h>
#include <stdlib.h>

#include <lame.h>

#define INBUFSIZE 4096
#define MP3BUFSIZE ((int)(1.25 * INBUFSIZE) + 7200)

int encode(char *inpath, char *outpath) {
    int status = 0;
    lame_global_flags *gfp;
    int ret_code;
    FILE *infp;
    FILE *outfp;
    short *input_buffer;
    int input_samples;
    char *mp3_buffer;
    int mp3_bytes;

    /* Initialize the library. */
    gfp = lame_init();
    if (gfp == NULL) {
        printf("lame_init returned NULL
");
        status = −1;
        goto exit;
    }

    /* Set the encoding parameters. */
    ret_code = lame_init_params(gfp);
    if (ret_code < 0) {
        printf("lame_init_params returned %d
", ret_code);
        status = −1;
        goto close_lame;
    }

    /* Open our input and output files. */
    infp = fopen(inpath, "rb");
    outfp = fopen(outpath, "wb");

    /* Allocate some buffers. */
    input_buffer = (short*)malloc(INBUFSIZE*2);
    mp3_buffer = (char*)malloc(MP3BUFSIZE);

    /* Read from the input file, encode, and write to the output file. */
    do {
        input_samples = fread(input_buffer, 2, INBUFSIZE, infp);
        if (input_samples > 0) {
            mp3_bytes = lame_encode_buffer_interleaved(
                gfp,
                input_buffer,
                input_samples / 2,
                mp3_buffer,
                MP3BUFSIZE
            );
            if (mp3_bytes < 0) {
                printf("lame_encode_buffer_interleaved returned
 %d
", mp3_bytes);
                status = −1;
                goto free_buffers;
            } else if (mp3_bytes > 0) {
                fwrite(mp3_buffer, 1, mp3_bytes, outfp);
            }
        }
    } while (input_samples == INBUFSIZE);

    /* Flush the encoder of any remaining bytes. */
    mp3_bytes = lame_encode_flush(gfp, mp3_buffer, MP3BUFSIZE);
    if (mp3_bytes > 0) {
        printf("writing %d mp3 bytes
", mp3_bytes);
        fwrite(mp3_buffer, 1, mp3_bytes, outfp);
    }

    /* Clean up. */

free_buffers:
    free(mp3_buffer);
    free(input_buffer);

    fclose(outfp);
    fclose(infp);

close_lame:
    lame_close(gfp);

exit:
    return status;
}

int main(int argc, char *argv[]) {
    if (argc < 3) {
        printf("usage: clame rawinfile mp3outfile
");
        exit(1);
    }
    encode(argv[1], argv[2]);
    return 0;
}

To compile the file on Linux, this command should work (assuming you installed a package like liblame-dev or that the lame development components have installed the appropriate header files in /usr/include/lame):

gcc -I/usr/include/lame clame.c -lmp3lame -o clame

On Windows, you'll probably have to use a command like this (assuming you built from source):

cl /IC:\lame-3.98.2\include clame.c ^
  C:\lame-3.98.2\libmp3lame\Release\libmp3lame.lib ^
  C:\lame-3.98.2\mpglib\Release\mpglib.lib

Those command-line parameters are telling the compiler where to find the LAME headers and necessary libraries. You'll probably have to adjust them to point to the correct directories.

That wasn't too bad, was it? Of course, this code doesn't know how to extract data out of a WAV or any other sort of audio file. It is assumed here that the input file contains nothing but raw, 16-bit, signed, interleaved stereo samples at 44.1 kHz. Turning a WAV file into one of these raw files is a simple command on most UNIX-based machines (assuming you have the sox program, which should also be available as a package):

sox test.wav -t raw test.raw

The LAME Extension Module

To create an extension module that enables you to encode a raw audio file into an MP3 could be as simple as creating a simple function that invokes the encode function you defined in the preceding example:

#include <Python.h>

#include <lame.h>

/* defined in clame.c */
int encode(char *, char *);

static PyObject *pylame1_encode(PyObject *self, PyObject *args) {
    int status;
    char *inpath;
    char *outpath;
    if (!PyArg_ParseTuple(args, "ss", &inpath, &outpath)) {
        return NULL;
    }
    status = encode(inpath, outpath);
    return Py_BuildValue("i", status);
}
static PyMethodDef pylame1_methods[] = {
    { "encode", pylame1_encode, METH_VARARGS, NULL },
    { NULL, NULL, 0, NULL }
};

PyMODINIT_FUNC initpylame1() {
    Py_InitModule3("pylame1", pylame1_methods, "My first LAME
module.");
}

Here the encode function accepts two string arguments — the input path and the output path.

Try saving the preceding code in a file called pylame1.c and compiling it with this command:

gcc -shared -I/usr/include/python2.6 -I/usr/include/lame \
  pylame1.c clame.c \
  -lmp3lame -o pylame1.so

On Windows, you'll need something like this:

cl /LD /IC:\Python26\include /IC:\lame-3.98.2\include ^
  pylame1.c clame.c ^
  C:\Python26\libs\python26.lib ^
  C:\lame-3.98.2\libmp3lame\Release\libmp3lame.lib ^
  C:\lame-3.98.2\mpglib\Release\mpglib.lib

Note that you're compiling the same clame.c example you used in the previous section into this DLL by including it on the command line.

This works, but it's not ideal; you have no way of influencing how the encode function works other than by passing in two strings. What if you wanted to encode something other than a raw audio file? How about a WAV file or perhaps some audio data you're streaming off the network? There's no reason why you couldn't implement that functionality in Python, where it would be much easier to do.

You have two options: You can have the Python code pass the audio data into the encode function, one chunk at a time, just like you do in the C function. Or, you can pass some object with a read method in to encode, which would then read its data from that object.

Although the second option might sound more object oriented, the first is the better choice because it provides more flexibility. You could always define some sort of object that reads from some source and passes it on to the encoder, but it would be a lot harder to go the other way around.

Using this design is going to require that you make some changes in the extension module. Right now, there's just one function, and that's fine because that function is doing all of the work for you. With the new approach, however, you'll be making multiple calls to the function that you'll be using to encode the audio data as MP3 data. You can't have the function re-open the file every time it's called, so you're going to need to maintain some state information about where you are in the file somewhere. You can have the caller maintain that state, or you can encapsulate it inside some object defined by your module, which is the approach you'll be taking here.

The new version of your extension module needs to expose a class so that your clients can create instances of this class and invoke methods on them. You'll be hiding a small amount of state in those instances so they can remember which file they're writing to between method calls.

As you learn what you need to do for this new module, you'll see the snippets of code relevant to what is being explained. The entire source for pylame2.c is shown later so you can see the snippets together in all of their glory.

The C language syntax doesn't directly support defining a new class, but it does have structures; and in C structures can contain function pointers, which is good enough for what you're trying to do right now. When the Python interpreter creates a new instance of your class, it will actually be allocating enough space to store a new instance of your structure. It's that structure that will contain all of your state for each object.

The Python interpreter needs to store some information in your objects as well. Every object has a reference count and a type, so the first part of your structure has to contain these in order for the Python interpreter to find them:

typedef struct {
    PyObject_HEAD
    /* State goes here. */
} pylame2_EncoderObject;

The PyObject_HEAD macro will add the appropriate members to the structure — you just have to make sure that it's the first thing you add.

You need to provide a function to create the new instances of this structure:

static PyObject *Encoder_new(PyTypeObject *type, PyObject *args, PyObject *kw) {
    pylame2_EncoderObject *self =
        (pylame2_EncoderObject *)type->tp_alloc(type, 0);
    /* Initialize object here. */
    return (PyObject *)self;
}

Think of this as equivalent to Python's __new__ method. This function will be called by the interpreter when it needs to create a new instance of your type. Notice how you're not calling malloc directly but are instead invoking some other function as indicated by the tp_alloc member of the PyTypeObject that was passed in to your function. You'll see what function that is in a bit.

You also need a function to free your instances:

static  void Encoder_dealloc(PyObject *self) {
    self->ob_type->tp_free(self);
}

Think of this function as equivalent to Python's __del__ method and as the counterpart to Encoder_new. Because you're calling tp_free on your object's type object here, you've probably guessed that the tp_free function is the counterpart to the tp_alloc function, and you're right.

What about the other methods your object is supposed to support? Do you add function pointers to your structure to represent those? If you did, each instance would be eating up memory with the exact same set of pointers, which would be a waste. Instead, you're going to store the function pointers for your methods in a separate structure and your objects will refer to that structure.

Remember that each object knows its type — there's a pointer to a type object hiding inside the PyObject_HEAD macro. Therefore, you need another structure to represent that:

static PyTypeObject pylame2_EncoderType = {
    PyObject_HEAD_INIT(NULL)
    0,                             /* ob_size */
    "pylame2.Encoder",             /* tp_name */
    sizeof(pylame2_EncoderObject), /* tp_basicsize */
    0,                             /* tp_itemsize */
    Encoder_dealloc,               /* tp_dealloc */
    0,                             /* tp_print */
    0,                             /* tp_getattr */
    0,                             /* tp_setattr */
    0,                             /* tp_compare */
    0,                             /* tp_repr */
    0,                             /* tp_as_number */
    0,                             /* tp_as_sequence */
    0,                             /* tp_as_mapping */
    0,                             /* tp_hash */
    0,                             /* tp_call */
    0,                             /* tp_str */
    0,                             /* tp_getattro */
    0,                             /* tp_setattro */
    0,                             /* tp_as_buffer */
    Py_TPFLAGS_DEFAULT,            /* tp_flags */
    "My first encoder object.",    /* tp_doc */
    0,                             /* tp_traverse */
    0,                             /* tp_clear */
    0,                             /* tp_richcompare */
    0,                             /* tp_weaklistoffset */
    0,                             /* tp_iter */
    0,                             /* tp_iternext */
    0,                             /* tp_methods */
    0,                             /* tp_members */
    0,                             /* tp_getset */
    0,                             /* tp_base */
    0,                             /* tp_dict */
    0,                             /* tp_descr_get */
    0,                             /* tp_descr_set */
    0,                             /* tp_dictoffset */
    0,                             /* tp_init */
    0,                             /* tp_alloc */
    Encoder_new,                   /* tp_new */
    0,                             /* tp_free */
};

This is going to be the structure for what you're going to get a pointer to when your Encoder_new function is called. There's a lot to that structure (and even more that you can't see yet), but you're letting most of the members default to NULL for now. You'll go over the important bits before moving on.

The PyObject_HEAD_INIT macro adds the members that are common to all types. It must be the first member in the structure. It's like PyObject_HEAD except that it initializes the type pointer to whatever you pass in as an argument.

Remember: In Python, types are objects, too, so they also have types. You could call a type's type a "type type." The Python API calls it PyType_Type. It's the type of type objects. You really want to be able to pass &PyType_Type into this macro but some compilers won't let you statically initialize a structure member with a symbol defined in some other module, so you'll have to fill that in later.

The next member, ob_size, might look important but it's a remnant from an older version of the Python API and should be ignored. The member after the name of your type, tp_basicsize, represents the size of all your object instances. When the interpreter needs to allocate storage space for a new instance, it will request tp_basicsize bytes.

Most of the rest of the members are currently NULL, but you'll be filling them in later. They'll hold function pointers for some of the more common operations that many objects support.

The tp_flags member specifies some default flags for the type object, which all type objects need; and the tp_doc member holds a pointer to the docstring for the type, which you always want to provide because you're a good Python citizen.

Notice the tp_alloc and tp_free members, which are set to NULL. Aren't those the members you're calling from Encoder_new and Encoder_dealloc? Yes, they are, but you're going to use a Python API function to fill them in with the appropriate addresses later on because some platforms don't like it when you statically initialize structure members with addresses of functions in other libraries.

At this point, you've defined two structures. To actually make them available via your extension module, you need to add some code to your module's initialization function:

PyMODINIT_FUNC initpylame2() {
    PyObject *m;
    if (PyType_Ready(&pylame2_EncoderType) < 0) {
        return;
    }
    m = Py_InitModule3("pylame2", pylame2_methods, "My second LAME
module.");
    Py_INCREF(&pylame2_EncoderType);
    PyModule_AddObject(m, "Encoder", (PyObject *)
&pylame2_EncoderType);
}

PyType_Ready gets a type object "ready" for use by the interpreter. It sets the type of the object to PyType_Type, fills in a number of the function pointer members that you had previously left NULL, and performs the other bookkeeping necessary to hook everything up properly, including setting your tp_alloc and tp_free members to suitable functions.

After you get your type object ready, you create your module as usual, but this time you're saving the return value (a pointer to a module object) so you can add your new type object to the module. Previously, you had been ignoring the return value and letting the method table define all of the members of the module. Because there's no way to fit a PyObject pointer into a method table, you need to use the PyModule_AddObject function to add your type object to the module. This function takes in the pointer to the module returned from Py_InitModule3, the name of your new object as it should be known in the module, and the pointer to the new object itself.

If you were to compile what you had so far, you'd be able to create new Encoder instances:

>>> import pylame2
>>> e = pylame2.Encoder()

That object doesn't do you much good, however, because it doesn't have any useful behavior yet.

To make these objects useful, you have to allow for some information to be passed into their initialization functions. That information could simply be the path to the file to which you want to write. Your initialization function could use that path to open a file handle that would enable you to write to it, but there'll be no writing until somebody invokes the encode method on your object. Therefore, your object needs to retain the handle for the file it opened.

You're also going to be invoking functions defined in the LAME library, so your objects will also need to remember the pointer to the lame_global_flags structure returned by lame_init.

Here's your structure with state and a modified Encoder_new function to initialize it:

typedef struct {
    PyObject_HEAD
    FILE *outfp;
    lame_global_flags *gfp;
} pylame2_EncoderObject;

static PyObject *Encoder_new(PyTypeObject *type, PyObject *args, PyObject *kw) {
    pylame2_EncoderObject *self =
        (pylame2_EncoderObject *)type->tp_alloc(type, 0);
    self->outfp = NULL;
    self->gfp = NULL;
    return (PyObject *)self;
}

You're not checking args and kw here, because this is the equivalent of Python's __new__ method, not __init__. It's in your C implementation of __init__ that you'll be opening the file and initializing the LAME library:

static int Encoder_init(pylame2_EncoderObject *self,
                         PyObject *args, PyObject *kw) {
    char *outpath;
    if (!PyArg_ParseTuple(args, "s", &outpath)) {
        return −1;
    }
    if (self->outfp || self->gfp) {
        PyErr_SetString(PyExc_Exception, "__init__ already called");
        return −1;
    }
    self->outfp = fopen(outpath, "wb");
    self->gfp = lame_init();
    lame_init_params(self->gfp);
    return 0;
}

Your __init__ implementation is checking two things. The first you've already seen. You're using PyArg_ParseTuple to ensure that you were passed in one string parameter. The second check is ensuring that the outfp and gfp members of your instance are NULL. If they're not, this function must already have been called for this object, so return the appropriate error code for this function after using the PyErr_SetString function to "set" an exception. After you return into the Python interpreter, an exception will be raised and your caller is going to have to catch it or suffer the consequences. You need to do this because it's always possible to call __init__ twice on an object. With this code in place, calling __init__ twice on your objects might look like this:

>>> import pylame2
>>> e = pylame2.Encoder("foo.mp3")
>>> e.__init__("bar.mp3")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
Exception: __init__ already called

Of course, you could be nice and reinitialize the object, but that's not necessary for what you want to get done today. You should also be checking for errors, of course.

To indicate that you want this initialization function to be called for each new instance of your class, you need to add the address that this function needs to your type object:

(initproc)Encoder_init,        /* tp_init */

You're casting it here because you cheated and declared that Encoder_init accepted a pylame2_EncoderObject * as its first argument instead of the more generic PyObject *. You can get away with this type of stuff in C, but you have to be absolutely certain that you know what you're doing.

Because your instances now contain state that reference resources, you need to ensure that those resources are properly disposed of when the object is released. To do this, update your Encoder_dealloc function:

static void Encoder_dealloc(pylame2_EncoderObject *self) {
    if (self->gfp) {
        lame_close(self->gfp);
    }
    if (self->outfp) {
        fclose(self->outfp);
    }
    self->ob_type->tp_free(self);
}

If you were to build your module with the code you have so far, import it, create an encoder object, and then delete it (using the del keyword or rebinding the variable referencing your object to None or some other object), you would end up with an empty file in the current directory because all you did was open and then close it without writing anything to it. You're getting closer!

You now need to add support for the encode and close methods to your type. Previously, you had created what was called a method table, but that was really defining module-level functions. Defining methods for classes is just as easy but different. You define the methods just like the module-level functions and then create a table listing them:

static PyObject *Encoder_encode(PyObject *self, PyObject *args) {
    Py_RETURN_NONE;
}

static PyObject *Encoder_close(PyObject *self) {
    Py_RETURN_NONE;
}

static PyMethodDef Encoder_methods[] = {
    { "encode", Encoder_encode, METH_VARARGS,
           "Encodes and writes data to the output file." },
    { "close", (PyCFunction)Encoder_close, METH_NOARGS,
          "Closes the output file." },
    { NULL, NULL, 0, NULL }
};

Then the address of the table is used to initialize the tp_methods member of your type object:

Encoder_methods,               /* tp_methods */

With those stubs in place, you could build the module and see the methods and even call them on your objects:

>>> import pylame2
>>> e = pylame2.Encoder('foo.mp3')
>>> dir(e)
['__class__', '__delattr__', '__doc__', '__getattribute__',
'__hash__','__init__', '__new__', '__reduce__', '__reduce_ex__', '__repr__',
'__setattr__', '__str__', 'close', 'encode']
>>> e.encode()
>>> e.close()

All you have to do now is implement the functions. Here's Encoder_encode (sans complete error-checking):

static PyObject *Encoder_encode(pylame2_EncoderObject *self,
PyObject *args) {
    char *in_buffer;
    int in_length;
    int mp3_length;
    char *mp3_buffer;
    int mp3_bytes;
    if (!(self->outfp && self->gfp)) {
        PyErr_SetString(PyExc_Exception, "encoder not open");
        return NULL;
    }
    if (!PyArg_ParseTuple(args, "s#", &in_buffer, &in_length)) {
        return NULL;
    }
    in_length /= 2;
    mp3_length = (int)(1.25 * in_length) + 7200;
    mp3_buffer = (char *)malloc(mp3_length);
    if (in_length > 0) {
        mp3_bytes = lame_encode_buffer_interleaved(
            self->gfp,
            (short *)in_buffer,
            in_length / 2,
            mp3_buffer,
            mp3_length
        );
        if (mp3_bytes > 0) {
            fwrite(mp3_buffer, 1, mp3_bytes, self->outfp);
        }
    }
    free(mp3_buffer);
    Py_RETURN_NONE;
}

You expect this argument to be passed a string. Unlike strings in C, which are simple arrays of characters terminated by the NUL character (written '\0' in C), the data passed in here may well contain embedded NUL bytes, because it's raw audio rather than text. Therefore, instead of using the "s" indicator when parsing the arguments, you use "s#", which allows for embedded NUL characters: PyArg_ParseTuple hands you both a pointer to the buffer and the length of the buffer, rather than relying on a terminating NUL. Other than that, this function is pretty straightforward.

Here's Encoder_close:

static PyObject *Encoder_close(pylame2_EncoderObject *self) {
    int mp3_length;
    char *mp3_buffer;
    int mp3_bytes;
    if (!(self->outfp && self->gfp)) {
        PyErr_SetString(PyExc_Exception, "encoder not open");
        return NULL;
    }
    mp3_length = 7200;
    mp3_buffer = (char *)malloc(mp3_length);
    mp3_bytes = lame_encode_flush(self->gfp, mp3_buffer, mp3_length);
    if (mp3_bytes > 0) {
        fwrite(mp3_buffer, 1, mp3_bytes, self->outfp);
    }
    free(mp3_buffer);
    lame_close(self->gfp);
    self->gfp = NULL;
    fclose(self->outfp);
    self->outfp = NULL;
    Py_RETURN_NONE;
}

You need to make sure you set outfp and gfp to NULL here to prevent Encoder_dealloc from trying to close them again.

For both Encoder_encode and Encoder_close, you're checking to make sure your object is in a valid state for encoding and closing. Somebody could always call close and then follow that up with another call to close or even a call to encode. It's better to raise an exception than to bring down the process hosting your extension module.

You've gone over a lot to get to this point, so it would probably help if you could see the entire extension module in one large example:

#include <Python.h>

#include <lame.h>

typedef struct {
    PyObject_HEAD
    FILE *outfp;
    lame_global_flags *gfp;
} pylame2_EncoderObject;

static PyObject *Encoder_new(PyTypeObject *type, PyObject *args, PyObject *kw) {
    pylame2_EncoderObject *self =
        (pylame2_EncoderObject *)type->tp_alloc(type, 0);
    self->outfp = NULL;
    self->gfp = NULL;
    return (PyObject *)self;
}

static void Encoder_dealloc(pylame2_EncoderObject *self) {
    if (self->gfp) {
        lame_close(self->gfp);
    }
    if (self->outfp) {
        fclose(self->outfp);
    }
    self->ob_type->tp_free(self);
}

static int Encoder_init(pylame2_EncoderObject *self, PyObject *args,
PyObject *kw) {
    char *outpath;
    if (!PyArg_ParseTuple(args, "s", &outpath)) {
        return −1;
    }
    if (self->outfp || self->gfp) {
        PyErr_SetString(PyExc_Exception, "__init__ already called");
        return −1;
    }
    self->outfp = fopen(outpath, "wb");
    self->gfp = lame_init();
    lame_init_params(self->gfp);
    return 0;
}
static PyObject *Encoder_encode(pylame2_EncoderObject *self,
PyObject *args) {
    char *in_buffer;
    int in_length;
    int mp3_length;
    char *mp3_buffer;
    int mp3_bytes;
    if (!(self->outfp && self->gfp)) {
        PyErr_SetString(PyExc_Exception, "encoder not open");
        return NULL;
    }
    if (!PyArg_ParseTuple(args, "s#", &in_buffer, &in_length)) {
        return NULL;
    }
    in_length /= 2;
    mp3_length = (int)(1.25 * in_length) + 7200;
    mp3_buffer = (char *)malloc(mp3_length);
    if (in_length > 0) {
        mp3_bytes = lame_encode_buffer_interleaved(
            self->gfp,
            (short *)in_buffer,
            in_length / 2,
            mp3_buffer,
            mp3_length
        );
        if (mp3_bytes > 0) {
            fwrite(mp3_buffer, 1, mp3_bytes, self->outfp);
        }
    }
    free(mp3_buffer);
    Py_RETURN_NONE;
}

static PyObject *Encoder_close(pylame2_EncoderObject *self) {
    int mp3_length;
    char *mp3_buffer;
    int mp3_bytes;
    if (!(self->outfp && self->gfp)) {
        PyErr_SetString(PyExc_Exception, "encoder not open");
        return NULL;
    }
    mp3_length = 7200;
    mp3_buffer = (char *)malloc(mp3_length);
    mp3_bytes = lame_encode_flush(self->gfp, mp3_buffer, mp3_length);
    if (mp3_bytes > 0) {
        fwrite(mp3_buffer, 1, mp3_bytes, self->outfp);
    }
    free(mp3_buffer);
    lame_close(self->gfp);
    self->gfp = NULL;
    fclose(self->outfp);
    self->outfp = NULL;
    Py_RETURN_NONE;
}

static PyMethodDef Encoder_methods[] = {
    { "encode", (PyCFunction)Encoder_encode, METH_VARARGS,
          "Encodes and writes data to the output file." },
    { "close", (PyCFunction)Encoder_close, METH_NOARGS,
          "Closes the output file." },
    { NULL, NULL, 0, NULL }
};

static PyTypeObject pylame2_EncoderType = {
    PyObject_HEAD_INIT(NULL)
    0,                             /* ob_size */
    "pylame2.Encoder",             /* tp_name */
    sizeof(pylame2_EncoderObject), /* tp_basicsize */
    0,                             /* tp_itemsize */
    (destructor)Encoder_dealloc,   /* tp_dealloc */
    0,                             /* tp_print */
    0,                             /* tp_getattr */
    0,                             /* tp_setattr */
    0,                             /* tp_compare */
    0,                             /* tp_repr */
    0,                             /* tp_as_number */
    0,                             /* tp_as_sequence */
    0,                             /* tp_as_mapping */
    0,                             /* tp_hash */
    0,                             /* tp_call */
    0,                             /* tp_str */
    0,                             /* tp_getattro */
    0,                             /* tp_setattro */
    0,                             /* tp_as_buffer */
    Py_TPFLAGS_DEFAULT,            /* tp_flags */
    "My first encoder object.",    /* tp_doc */
    0,                             /* tp_traverse */
    0,                             /* tp_clear */
    0,                             /* tp_richcompare */
    0,                             /* tp_weaklistoffset */
    0,                             /* tp_iter */
    0,                             /* tp_iternext */
    Encoder_methods,               /* tp_methods */
    0,                             /* tp_members */
    0,                             /* tp_getset */
    0,                             /* tp_base */
    0,                             /* tp_dict */
    0,                             /* tp_descr_get */
    0,                             /* tp_descr_set */
    0,                             /* tp_dictoffset */
    (initproc)Encoder_init,        /* tp_init */
    0,                             /* tp_alloc */
    Encoder_new,                   /* tp_new */
    0,                             /* tp_free */
};
static PyMethodDef pylame2_methods[] = {
    { NULL, NULL, 0, NULL }
};

PyMODINIT_FUNC initpylame2() {
    PyObject *m;
    if (PyType_Ready(&pylame2_EncoderType) < 0) {
        return;
    }
    m = Py_InitModule3("pylame2", pylame2_methods, "My second LAME
module.");
    Py_INCREF(&pylame2_EncoderType);
    PyModule_AddObject(m, "Encoder", (PyObject *)
&pylame2_EncoderType);
}

You can now save this file as pylame2.c and compile it.

On Linux:

gcc -shared -I/usr/include/python2.6 -I/usr/include/lame pylame2.c \
  -lmp3lame -o pylame2.so

On Windows:

cl /LD /IC:\Python26\include /IC:\lame-3.98.2\include pylame2.c ^
  C:\Python26\libs\python26.lib ^
  C:\lame-3.98.2\libmp3lame\Release\libmp3lame.lib ^
  C:\lame-3.98.2\mpglib\Release\mpglib.lib

Once that's done, you can exercise your new extension module with a simple driver script written entirely in Python:

import pylame2

INBUFSIZE = 4096

encoder = pylame2.Encoder('test.mp3')
input = open('test.raw', 'rb')
data = input.read(INBUFSIZE)

while data != '':
    encoder.encode(data)
    data = input.read(INBUFSIZE)

input.close()
encoder.close()

That completes version 2 of your extension module. You're able to read data from anywhere. Your sample driver is still reading from the raw input file you created earlier, but there's nothing stopping it from extracting that information out of a WAV file or reading it from a socket.

The only deficiency with this version of the module is that you can't customize how the encoded data is written. You're going to fix that in the next revision of the module by "writing" to an object and not directly to the file system. Intrigued? Read on.

Using Python Objects from C Code

Python's a dynamically typed language, so it doesn't have a formal concept of interfaces even though we use them all the time. The most common interface is the "file" interface. Terms like "file-like object" describe this interface. It's really nothing more than an object that "looks like" a file object. Usually, it can get by with only either a read or write method and nothing more.

For the next version of your extension module, you're going to allow your users to pass in any file-like object (supporting a write method) when constructing new encoder objects. Your encoder object will simply call the write method with the MP3-encoded bytes. You don't have to be concerned about whether it's a real file object or a socket or anything else your users can dream up. This is polymorphism at its finest.
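
To make that concrete, here's a sketch of a hypothetical Tee class that satisfies the interface without being a file at all; it simply forwards whatever the encoder writes to any number of real file objects:

class Tee(object):
    """Forward every write to several underlying file-like objects."""

    def __init__(self, *targets):
        self.targets = targets

    def write(self, data):
        for target in self.targets:
            target.write(data)

Pass an instance of Tee (or anything else with a write method) to the new encoder and it will never know the difference.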

In the last version of the module, your object held a FILE *. You need to change this by adding a reference to a PyObject and removing the FILE *:

typedef struct {
    PyObject_HEAD
    PyObject *outfp;
    lame_global_flags *gfp;
} pylame3_EncoderObject;

Encoder_new can stay the same because all it does is set outfp to NULL. Encoder_dealloc, however, needs to be modified:

static void Encoder_dealloc(pylame3_EncoderObject *self) {
    if (self->gfp) {
        lame_close(self->gfp);
    }
    Py_XDECREF(self->outfp);
    self->ob_type->tp_free(self);
}

Instead of calling fclose, you use the Py_XDECREF macro to decrement the reference count by one. You can't delete the object, because there might be other references to it. In fact, other references to this object are likely because the object came from outside of this module. You didn't create it, but somebody else did and passed it in to you. They probably still have a variable bound to that object.

If you're decrementing the reference count here in Encoder_dealloc, you must be incrementing it someplace else. You're doing that in Encoder_init:

static int Encoder_init(pylame3_EncoderObject *self,
                        PyObject *args, PyObject *kw) {
    PyObject *outfp;
    if (!PyArg_ParseTuple(args, "O", &outfp)) {
        return -1;
    }
    if (self->outfp || self->gfp) {
        PyErr_SetString(PyExc_Exception, "__init__ already called");
        return -1;
    }
    self->outfp = outfp;
    Py_INCREF(self->outfp);
    self->gfp = lame_init();
    lame_init_params(self->gfp);
    return 0;
}

You've modified the format string for PyArg_ParseTuple to contain "O" (that's a capital letter O, not a zero) instead of "s". "O" indicates that you want an object pointer. You don't care what type of object it is; you just don't want PyArg_ParseTuple to do any kind of conversion from the object to some primitive C data type.

After you're sure you were passed the correct number of arguments and __init__ hasn't been called before, you can store the object argument for later use. Here you're using the Py_INCREF macro to increment the reference count. This will keep the object alive until you decrement the count.

Why did the previous macro, Py_XDECREF, have an X in it, whereas this one did not? There are actually two forms of these macros. The "X" versions check that the pointer isn't NULL before adjusting the reference count; the non-"X" versions skip that check. They're slightly faster, but you have to be sure the pointer can never be NULL in order to use them correctly. The documentation for PyArg_ParseTuple tells us that if it succeeds, the output pointer will be valid, so Py_INCREF is safe here. In Encoder_dealloc, on the other hand, outfp can still be NULL (Encoder_new sets it to NULL, and __init__ might never have been called), so the "X" version is the safe choice there.

Making sure that you perfectly balance your increments with your decrements is the trickiest part of implementing extension modules, so be careful. If you don't, you could leak memory, or you might access an object that's already been deleted, which is never a good thing.

It's also very important to pay attention to the documentation for the different API functions you use in terms of references. Some functions will increase the reference count before returning it. Others won't. The documentation for PyArg_ParseTuple states that the reference count is not increased, which is why you have to increment it if you expect it to stick around for as long as you need it.
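
Once the new module is built, you can actually watch this bookkeeping from the Python side with sys.getrefcount. The following is just a sketch, and the exact numbers printed will vary, but the count should go up by one when the encoder stores its reference and come back down when the encoder is destroyed:

import sys
import pylame3

out = open('refcount_demo.mp3', 'wb')
print('before:       %d' % sys.getrefcount(out))

encoder = pylame3.Encoder(out)   # Encoder_init calls Py_INCREF on out
print('while held:   %d' % sys.getrefcount(out))

del encoder                      # Encoder_dealloc calls Py_XDECREF on out
print('after delete: %d' % sys.getrefcount(out))
out.close()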

Now that you have an object (that hopefully has a write method), you need to use it. Instead of calling fwrite in Encoder_encode and Encoder_close, you want to call the write method on your object. The Python API has a function called PyObject_CallMethod that will do exactly what you need it to do. Here's the snippet of code you would use in both Encoder_encode and Encoder_close to call the write method on your object:

PyObject *write_result = PyObject_CallMethod(self->outfp, "write", "(s#)",
                                             mp3_buffer, mp3_bytes);
if (!write_result) {
    free(mp3_buffer);
    return NULL;
}
Py_DECREF(write_result);

PyObject_CallMethod takes at least three parameters. The first is the object on which you're invoking the method; it becomes the first argument into the method, usually called self. The second is the name of the method. The third is a format string describing the remaining arguments, followed by those argument values themselves. The format string can be NULL if there are no arguments. When it's not NULL, it uses the same format codes as Py_BuildValue (which look very similar to the PyArg_ParseTuple codes you've already seen) and is conventionally surrounded with parentheses so that it builds a tuple. PyObject_CallMethod is basically calling Py_BuildValue for you with these parameters, and the resulting tuple is passed in to your method.

PyObject_CallMethod returns a new reference as a PyObject *; a NULL return means the call raised an exception. Most write implementations simply return None, but whatever comes back, you're still responsible for decrementing its reference count.
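
If it helps to picture the call from the Python side, the C snippet above amounts to roughly the following sketch, where outfp, mp3_buffer, and mp3_bytes are hypothetical stand-ins for the C variables:

method = getattr(outfp, 'write')    # the method-name argument
args = (mp3_buffer[:mp3_bytes],)    # the tuple the "(s#)" format string builds
result = method(*args)              # the call itself; result is usually None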

Because most of pylame3.c hasn't changed from pylame2.c, I won't include the entire file here. It shouldn't be too difficult to insert the changes described in this section.

Once the new version of the module is compiled, you can use any file-like object you want as a parameter to the Encoder object. Here's an example that demonstrates this:

import pylame3

INBUFSIZE = 4096

class MyFile(file):

    def __init__(self, path, mode):
        file.__init__(self, path, mode)
        self.n = 0

    def write(self, s):
        file.write(self, s)
        self.n += 1

output = MyFile('test3.mp3', 'wb')
encoder = pylame3.Encoder(output)
input = file('test.raw', 'rb')

data = input.read(INBUFSIZE)
while data != '':
    encoder.encode(data)
    data = input.read(INBUFSIZE)

input.close()
encoder.close()
output.close()

print('output.write was called %d times' % output.n)

This example includes a class derived from the built-in file object to show off some of the stuff you can do. OK, it's not that impressive, but it at least shows how flexible your new extension module can be. As long as you pass in an object that has a write method, your extension module is happy.
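
As one last illustration, nothing requires the destination to be on disk at all. Here's a sketch that captures the MP3 data in memory by handing the encoder a standard io.BytesIO object:

import io
import pylame3

INBUFSIZE = 4096

buf = io.BytesIO()
encoder = pylame3.Encoder(buf)

input = file('test.raw', 'rb')
data = input.read(INBUFSIZE)
while data != '':
    encoder.encode(data)
    data = input.read(INBUFSIZE)

input.close()
encoder.close()

print('%d MP3-encoded bytes held in memory' % len(buf.getvalue()))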

Summary

In this chapter, you learned how to expose simple functions implemented in C to Python developers by creating an extension module and defining a method table. Converting Python objects to C values is done using PyArg_ParseTuple. Going the opposite way, turning a C value into a Python object is done using Py_BuildValue.

You also looked at how to define new types in an extension module by defining the object and type structures. You set up the type object so that it could create new instances of your type and later destroy them. Making sure that you correctly increment and decrement the reference counts of objects that you use requires careful consideration.

There's a lot more to writing extension modules, of course, but not enough room in one chapter to cover it all. Be sure to consult the documentation at http://docs.python.org/ext/ext.html and http://docs.python.org/api/api.html.

The key points to take away from this chapter are:

  • While code written in C may run faster than code written in Python, writing code in Python is much faster than writing it in C.

  • Python extension modules are normal C libraries. On UNIX machines, these libraries usually end in .so (for shared object). On Windows machines, you typically see .dll (for dynamically linked library).

  • The distutils package makes it possible to distribute Python modules in a standard way.

  • The --help-commands argument displays all of the commands that a setup script is capable of responding to.

  • You can use PyArg_ParseTuple and PyArg_ParseTupleAndKeywords to convert from Python objects into C values. To do the reverse, use Py_BuildValue.

Exercises

  1. Add a new module-level function to the foo module you created earlier in the chapter. Call the function reverse_tuple and implement it so that it accepts one tuple as an argument and returns a similarly sized tuple with the elements in reverse order. Completing this exercise is going to require research on your part because you need to know how to "unpack" a tuple. You already know one way to create a tuple (using Py_BuildValue), but that's not going to work for this exercise, because you want your function to work with tuples of arbitrary size. The Python/C API documentation for tuples (at http://docs.python.org/api/tupleObjects.html) lists all of the functions you need to accomplish this. Be careful with your reference counting!

  2. List and dictionary objects are an extremely important part of nearly all Python applications so it would be useful to learn how to manipulate those objects from C. Add another function to the foo module called dict2list that accepts a dictionary as a parameter and returns a list. The members of the list should alternate between the keys and the values in the dictionary. The order isn't important as long as each key is followed by its value. You'll have to look up how to iterate over the items in the dictionary (hint: look up PyDict_Next) and how to create a list and append items to it (hint: look up PyList_New and PyList_Append).
