13.4 Calling C Code with cgo

A Go program might need to use a hardware driver implemented in C, query an embedded database implemented in C++, or use some linear algebra routines implemented in Fortran. C has long been the lingua franca of programming, so many packages intended for widespread use export a C-compatible API, regardless of the language of their implementation.

In this section, we’ll build a simple data compression program that uses cgo, a tool that creates Go bindings for C functions. Such tools are called foreign-function interfaces (FFIs), and cgo is not the only one for Go programs. SWIG (swig.org) is another; it provides more complex features for integrating with C++ classes, but we won’t show it here.

The compress/... subtree of the standard library provides compressors and decompressors for popular compression algorithms, including LZW (used by the Unix compress command) and DEFLATE (used by the GNU gzip command). The APIs of these packages vary slightly in details, but they all provide a wrapper for an io.Writer that compresses the data written to it, and a wrapper for an io.Reader that decompresses the data read from it. For example:

package gzip // compress/gzip

func NewWriter(w io.Writer) io.WriteCloser
func NewReader(r io.Reader) (io.ReadCloser, error)

The bzip2 algorithm, which is based on the elegant Burrows-Wheeler transform, runs slower than gzip but yields significantly better compression. The compress/bzip2 package provides a decompressor for bzip2, but at the moment the package provides no compressor. Building one from scratch is a substantial undertaking, but there is a well-documented and high-performance open-source C implementation, the libbzip2 package from bzip.org.

If the C library were small, we would just port it to pure Go, and if its performance were not critical for our purposes, we would be better off invoking a C program as a helper subprocess using the os/exec package. It’s when you need to use a complex, performance-critical library with a narrow C API that it may make sense to wrap it using cgo. For the rest of this chapter, we’ll work through an example.

From the libbzip2 C package, we need the bz_stream struct type, which holds the input and output buffers, and three C functions: BZ2_bzCompressInit, which allocates the stream’s buffers; BZ2_bzCompress, which compresses data from the input buffer to the output buffer; and BZ2_bzCompressEnd, which releases the buffers. (Don’t worry about the mechanics of the libbzip2 package; the purpose of this example is to show how the parts fit together.)

We’ll call the BZ2_bzCompressInit and BZ2_bzCompressEnd C functions directly from Go, but for BZ2_bzCompress, we’ll define a wrapper function in C, to show how it’s done. The C source file below lives alongside the Go code in our package:

gopl.io/ch13/bzip
/* This file is gopl.io/ch13/bzip/bzip2.c,         */
/* a simple wrapper for libbzip2 suitable for cgo. */
#include <bzlib.h>

int bz2compress(bz_stream *s, int action,
                char *in, unsigned *inlen, char *out, unsigned *outlen) {
  s->next_in = in;
  s->avail_in = *inlen;
  s->next_out = out;
  s->avail_out = *outlen;
  int r = BZ2_bzCompress(s, action);
  *inlen -= s->avail_in;
  *outlen -= s->avail_out;
  return r;
}

Now let’s turn to the Go code, the first part of which is shown below. The import "C" declaration is special. There is no package C, but this import causes go build to preprocess the file using the cgo tool before the Go compiler sees it.

// Package bzip provides a writer that uses bzip2 compression (bzip.org).
package bzip

/*
#cgo CFLAGS: -I/usr/include
#cgo LDFLAGS: -L/usr/lib -lbz2
#include <bzlib.h>
int bz2compress(bz_stream *s, int action,
                char *in, unsigned *inlen, char *out, unsigned *outlen);
*/
import "C"

import (
    "io"
    "unsafe"
)

type writer struct {
    w      io.Writer // underlying output stream
    stream *C.bz_stream
    outbuf [64 * 1024]byte
}

// NewWriter returns a writer for bzip2-compressed streams.
func NewWriter(out io.Writer) io.WriteCloser {
    const (
        blockSize  = 9
        verbosity  = 0
        workFactor = 30
    )
    w := &writer{w: out, stream: new(C.bz_stream)}
    C.BZ2_bzCompressInit(w.stream, blockSize, verbosity, workFactor)
    return w
}

During preprocessing, cgo generates a temporary package that contains Go declarations corresponding to all the C functions and types used by the file, such as C.bz_stream and C.BZ2_bzCompressInit. The cgo tool discovers these types by invoking the C compiler in a special way on the contents of the comment that precedes the import declaration.

The comment may also contain #cgo directives that specify extra options to the C toolchain. The CFLAGS and LDFLAGS values contribute extra arguments to the compiler and linker commands so that they can locate the bzlib.h header file and the libbz2.a archive library. The example assumes that these are installed beneath /usr on your system. You may need to alter or delete these flags for your installation.

NewWriter makes a call to the C function BZ2_bzCompressInit to initialize the buffers for the stream. The writer type includes another buffer that will be used to drain the decompressor’s output buffer.

The Write method, shown below, feeds the uncompressed data to the compressor, calling the function bz2compress in a loop until all the data has been consumed. Observe that the Go program may access C types like bz_stream, char, and uint, C functions like bz2compress, and even object-like C preprocessor macros such as BZ_RUN, all through the C.x notation. The C.uint type is distinct from Go’s uint type, even if both have the same width.

func (w *writer) Write(data []byte) (int, error) {
    if w.stream == nil {
        panic("closed")
    }
    var total int // uncompressed bytes written

    for len(data) > 0 {
        inlen, outlen := C.uint(len(data)), C.uint(cap(w.outbuf))
        C.bz2compress(w.stream, C.BZ_RUN,
            (*C.char)(unsafe.Pointer(&data[0])), &inlen,
            (*C.char)(unsafe.Pointer(&w.outbuf)), &outlen)
        total += int(inlen)
        data = data[inlen:]
        if _, err := w.w.Write(w.outbuf[:outlen]); err != nil {
            return total, err
        }
    }
    return total, nil
}

Each iteration of the loop passes bz2compress the address and length of the remaining portion of data, and the address and capacity of w.outbuf. The two length variables are passed by their addresses, not their values, so that the C function can update them to indicate how much uncompressed data was consumed and how much compressed data was produced. Each chunk of compressed data is then written to the underlying io.Writer.

The Close method has a similar structure to Write, using a loop to flush out any remaining compressed data from the stream’s output buffer.

// Close flushes the compressed data and closes the stream.
// It does not close the underlying io.Writer.
func (w *writer) Close() error {
    if w.stream == nil {
        panic("closed")
    }
    defer func() {
        C.BZ2_bzCompressEnd(w.stream)
        w.stream = nil
    }()
    for {
        inlen, outlen := C.uint(0), C.uint(cap(w.outbuf))
        r := C.bz2compress(w.stream, C.BZ_FINISH, nil, &inlen,
            (*C.char)(unsafe.Pointer(&w.outbuf)), &outlen)
        if _, err := w.w.Write(w.outbuf[:outlen]); err != nil {
            return err
        }
        if r == C.BZ_STREAM_END {
            return nil
        }
    }
}

Upon completion, Close calls C.BZ2_bzCompressEnd to release the stream buffers, using defer to ensure that this happens on all return paths. At this point the w.stream pointer is no longer safe to dereference. To be defensive, we set it to nil, and add explicit nil checks to each method, so that the program panics if the user mistakenly calls a method after Close. Not only is writer not concurrency-safe, but concurrent calls to Close and Write could cause the program to crash in C code. Fixing this is Exercise 13.3.

The program below, bzipper, is a bzip2 compressor command that uses our new package. It behaves like the bzip2 command present on many Unix systems.

gopl.io/ch13/bzipper
// Bzipper reads input, bzip2-compresses it, and writes it out.
package main

import (
    "io"
    "log"
    "os"

    "gopl.io/ch13/bzip"
)

func main() {
    w := bzip.NewWriter(os.Stdout)
    if _, err := io.Copy(w, os.Stdin); err != nil {
        log.Fatalf("bzipper: %v
", err)
    }
    if err := w.Close(); err != nil {
        log.Fatalf("bzipper: close: %v
", err)
    }
}

In the session below, we use bzipper to compress /usr/share/dict/words, the system dictionary, from 938,848 bytes to 335,405 bytes—about a third of its original size—then uncompress it with the system bunzip2 command. The SHA256 hash is the same before and after, giving us confidence that the compressor is working correctly. (If you don’t have sha256sum on your system, use your solution to Exercise 4.2.)

$ go build gopl.io/ch13/bzipper
$ wc -c < /usr/share/dict/words
938848
$ sha256sum < /usr/share/dict/words
126a4ef38493313edc50b86f90dfdaf7c59ec6c948451eac228f2f3a8ab1a6ed -
$ ./bzipper < /usr/share/dict/words | wc -c
335405
$ ./bzipper < /usr/share/dict/words | bunzip2 | sha256sum
126a4ef38493313edc50b86f90dfdaf7c59ec6c948451eac228f2f3a8ab1a6ed -

We’ve demonstrated linking a C library into a Go program. Going in the other direction, it’s also possible to compile a Go program as a static archive that can be linked into a C program or as a shared library that can be dynamically loaded by a C program. We’ve only scratched the surface of cgo here, and there is much more to say about memory management, pointers, callbacks, signal handling, strings, errno, finalizers, and the relationship between goroutines and operating system threads, much of it very subtle. In particular, the rules for correctly passing pointers from Go to C or vice versa are complex, for reasons similar to those we discussed in Section 13.2, and not yet authoritatively specified. For further reading, start with https://golang.org/cmd/cgo.

Exercise 13.3: Use sync.Mutex to make bzip2.writer safe for concurrent use by multiple goroutines.

Exercise 13.4: Depending on C libraries has its drawbacks. Provide an alternative pure-Go implementation of bzip.NewWriter that uses the os/exec package to run /bin/bzip2 as a subprocess.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.17.162.247