cgo
A Go program might need to use a hardware driver implemented in C, query an embedded database implemented in C++, or use some linear algebra routines implemented in Fortran. C has long been the lingua franca of programming, so many packages intended for widespread use export a C-compatible API, regardless of the language of their implementation.
In this section, we’ll build a simple data compression program that uses cgo, a tool that creates Go bindings for C functions. Such tools are called foreign-function interfaces (FFIs), and cgo is not the only one for Go programs. SWIG (swig.org) is another; it provides more complex features for integrating with C++ classes, but we won’t show it here.
The compress/... subtree of the standard library provides compressors and decompressors for popular compression algorithms, including LZW (used by the Unix compress command) and DEFLATE (used by the GNU gzip command). The APIs of these packages vary slightly in details, but they all provide a wrapper for an io.Writer that compresses the data written to it, and a wrapper for an io.Reader that decompresses the data read from it. For example:

    package gzip // compress/gzip

    func NewWriter(w io.Writer) *Writer          // *Writer implements io.WriteCloser
    func NewReader(r io.Reader) (*Reader, error) // *Reader implements io.ReadCloser

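To make the wrapper idea concrete, here is a minimal sketch of a round trip through compress/gzip using in-memory buffers. The helper names gzipBytes and gunzipBytes are our own, chosen for illustration; they are not part of the standard library.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
	"log"
)

// gzipBytes compresses data by writing it through a gzip.Writer
// that wraps an in-memory buffer.
func gzipBytes(data []byte) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(data); err != nil {
		return nil, err
	}
	// Close flushes any pending compressed data and writes the gzip footer.
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// gunzipBytes decompresses data by reading it through a gzip.Reader.
func gunzipBytes(data []byte) ([]byte, error) {
	zr, err := gzip.NewReader(bytes.NewReader(data))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	return io.ReadAll(zr)
}

func main() {
	original := bytes.Repeat([]byte("hello, gzip "), 100)
	compressed, err := gzipBytes(original)
	if err != nil {
		log.Fatal(err)
	}
	restored, err := gunzipBytes(compressed)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%d bytes -> %d bytes, round trip ok: %t\n",
		len(original), len(compressed), bytes.Equal(original, restored))
}
```

The writer must be closed before the compressed bytes are used; until then, some of the output may still be buffered inside the compressor.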
The bzip2 algorithm, which is based on the elegant Burrows-Wheeler transform, runs slower than gzip but yields significantly better compression. The compress/bzip2 package provides a decompressor for bzip2, but at the moment the package provides no compressor. Building one from scratch is a substantial undertaking, but there is a well-documented and high-performance open-source C implementation, the libbzip2 package from bzip.org.
If the C library were small, we would just port it to pure Go, and if its performance were not critical for our purposes, we would be better off invoking a C program as a helper subprocess using the os/exec package. It’s when you need to use a complex, performance-critical library with a narrow C API that it may make sense to wrap it using cgo. For the rest of this section, we’ll work through an example.
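For comparison, the helper-subprocess approach mentioned above might look roughly like the sketch below. The function runFilter is a hypothetical helper of our own, and the example assumes a bzip2 executable is on the PATH and accepts -c to write to standard output.

```go
package main

import (
	"bytes"
	"fmt"
	"log"
	"os/exec"
)

// runFilter runs the named command, feeds it input on its standard
// input, and returns whatever it writes to its standard output.
func runFilter(name string, args []string, input []byte) ([]byte, error) {
	cmd := exec.Command(name, args...)
	cmd.Stdin = bytes.NewReader(input)
	var out bytes.Buffer
	cmd.Stdout = &out
	if err := cmd.Run(); err != nil {
		return nil, fmt.Errorf("%s: %v", name, err)
	}
	return out.Bytes(), nil
}

func main() {
	// Assumption: a bzip2 executable is on PATH; "-c" asks it to
	// write the compressed stream to standard output.
	data := bytes.Repeat([]byte("hello, subprocess "), 100)
	compressed, err := runFilter("bzip2", []string{"-c"}, data)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%d bytes -> %d bytes\n", len(data), len(compressed))
}
```

This version buffers the whole input and output in memory and pays the cost of starting a process for each call, which is why it suits only the case where performance is not critical.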
From the libbzip2 C package, we need the bz_stream struct type, which holds the input and output buffers, and three C functions: BZ2_bzCompressInit, which allocates the stream’s buffers; BZ2_bzCompress, which compresses data from the input buffer to the output buffer; and BZ2_bzCompressEnd, which releases the buffers. (Don’t worry about the mechanics of the libbzip2 package; the purpose of this example is to show how the parts fit together.)
We’ll call the BZ2_bzCompressInit and BZ2_bzCompressEnd C functions directly from Go, but for BZ2_bzCompress, we’ll define a wrapper function in C, to show how it’s done. The C source file below lives alongside the Go code in our package:

    /* This file is gopl.io/ch13/bzip/bzip2.c, */
    /* a simple wrapper for libbzip2 suitable for cgo. */
    #include <bzlib.h>
    #include <stddef.h>

    int bz2compress(bz_stream *s, int action,
                    char *in, unsigned *inlen, char *out, unsigned *outlen) {
        s->next_in = in;
        s->avail_in = *inlen;
        s->next_out = out;
        s->avail_out = *outlen;
        int r = BZ2_bzCompress(s, action);
        *inlen -= s->avail_in;
        *outlen -= s->avail_out;
        /* Do not leave pointers to Go-allocated buffers in the stream */
        /* after returning; cgo's run-time checks reject them on the next call. */
        s->next_in = s->next_out = NULL;
        return r;
    }

Now let’s turn to the Go code, the first part of which is shown below. The import "C" declaration is special. There is no package C, but this import causes go build to preprocess the file using the cgo tool before the Go compiler sees it.

    // Package bzip provides a writer that uses bzip2 compression (bzip.org).
    package bzip

    /*
    #cgo CFLAGS: -I/usr/include
    #cgo LDFLAGS: -L/usr/lib -lbz2
    #include <bzlib.h>
    int bz2compress(bz_stream *s, int action,
                    char *in, unsigned *inlen, char *out, unsigned *outlen);
    */
    import "C"

    import (
        "io"
        "unsafe"
    )

    type writer struct {
        w      io.Writer // underlying output stream
        stream *C.bz_stream
        outbuf [64 * 1024]byte
    }

    // NewWriter returns a writer for bzip2-compressed streams.
    func NewWriter(out io.Writer) io.WriteCloser {
        const (
            blockSize  = 9
            verbosity  = 0
            workFactor = 30
        )
        w := &writer{w: out, stream: new(C.bz_stream)}
        C.BZ2_bzCompressInit(w.stream, blockSize, verbosity, workFactor)
        return w
    }

During preprocessing, cgo generates a temporary package that contains Go declarations corresponding to all the C functions and types used by the file, such as C.bz_stream and C.BZ2_bzCompressInit. The cgo tool discovers these types by invoking the C compiler in a special way on the contents of the comment that precedes the import declaration.
The comment may also contain #cgo directives that specify extra options to the C toolchain. The CFLAGS and LDFLAGS values contribute extra arguments to the compiler and linker commands so that they can locate the bzlib.h header file and the libbz2.a archive library. The example assumes that these are installed beneath /usr on your system. You may need to alter or delete these flags for your installation.
NewWriter makes a call to the C function BZ2_bzCompressInit to initialize the buffers for the stream. The writer type includes another buffer that will be used to drain the compressor’s output buffer.
The Write method, shown below, feeds the uncompressed data to the compressor, calling the function bz2compress in a loop until all the data has been consumed. Observe that the Go program may access C types like bz_stream, char, and uint, C functions like bz2compress, and even object-like C preprocessor macros such as BZ_RUN, all through the C.x notation. The C.uint type is distinct from Go’s uint type, even if both have the same width.

    func (w *writer) Write(data []byte) (int, error) {
        if w.stream == nil {
            panic("closed")
        }
        var total int // uncompressed bytes written

        for len(data) > 0 {
            inlen, outlen := C.uint(len(data)), C.uint(cap(w.outbuf))
            C.bz2compress(w.stream, C.BZ_RUN,
                (*C.char)(unsafe.Pointer(&data[0])), &inlen,
                (*C.char)(unsafe.Pointer(&w.outbuf)), &outlen)
            total += int(inlen)
            data = data[inlen:]
            if _, err := w.w.Write(w.outbuf[:outlen]); err != nil {
                return total, err
            }
        }
        return total, nil
    }

Each iteration of the loop passes bz2compress the address and length of the remaining portion of data, and the address and capacity of w.outbuf. The two length variables are passed by their addresses, not their values, so that the C function can update them to indicate how much uncompressed data was consumed and how much compressed data was produced. Each chunk of compressed data is then written to the underlying io.Writer.
The Close method has a similar structure to Write, using a loop to flush out any remaining compressed data from the stream’s output buffer.

    // Close flushes the compressed data and closes the stream.
    // It does not close the underlying io.Writer.
    func (w *writer) Close() error {
        if w.stream == nil {
            panic("closed")
        }
        defer func() {
            C.BZ2_bzCompressEnd(w.stream)
            w.stream = nil
        }()
        for {
            inlen, outlen := C.uint(0), C.uint(cap(w.outbuf))
            r := C.bz2compress(w.stream, C.BZ_FINISH, nil, &inlen,
                (*C.char)(unsafe.Pointer(&w.outbuf)), &outlen)
            if _, err := w.w.Write(w.outbuf[:outlen]); err != nil {
                return err
            }
            if r == C.BZ_STREAM_END {
                return nil
            }
        }
    }

Upon completion, Close calls C.BZ2_bzCompressEnd to release the stream buffers, using defer to ensure that this happens on all return paths. At this point the w.stream pointer is no longer safe to dereference. To be defensive, we set it to nil, and add explicit nil checks to each method, so that the program panics if the user mistakenly calls a method after Close.
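The defensive pattern does not depend on cgo, so it can be sketched in pure Go. In the example below, the handle and resource types and the panics helper are hypothetical names that stand in for the C stream and its wrapper.

```go
package main

import "fmt"

// handle is a hypothetical stand-in for a resource owned by C code.
type handle struct{ released bool }

type resource struct {
	h *handle // nil once Close has been called
}

// Use panics if the resource has already been closed.
func (r *resource) Use() {
	if r.h == nil {
		panic("closed")
	}
}

// Close releases the handle on every return path, then clears the
// pointer so that later calls fail fast instead of touching freed state.
func (r *resource) Close() error {
	if r.h == nil {
		panic("closed")
	}
	defer func() {
		r.h.released = true
		r.h = nil
	}()
	return nil // the deferred cleanup runs after this return
}

// panics reports whether calling f panics.
func panics(f func()) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	f()
	return false
}

func main() {
	r := &resource{h: new(handle)}
	r.Use()
	r.Close()
	fmt.Println(panics(r.Use)) // prints "true": Use after Close panics
}
```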
Not only is writer not concurrency-safe, but concurrent calls to Close and Write could cause the program to crash in C code. Fixing this is Exercise 13.3.
The program below, bzipper, is a bzip2 compressor command that uses our new package. It behaves like the bzip2 command present on many Unix systems.

    // Bzipper reads input, bzip2-compresses it, and writes it out.
    package main

    import (
        "io"
        "log"
        "os"

        "gopl.io/ch13/bzip"
    )

    func main() {
        w := bzip.NewWriter(os.Stdout)
        if _, err := io.Copy(w, os.Stdin); err != nil {
            log.Fatalf("bzipper: %v\n", err)
        }
        if err := w.Close(); err != nil {
            log.Fatalf("bzipper: close: %v\n", err)
        }
    }

In the session below, we use bzipper to compress /usr/share/dict/words, the system dictionary, from 938,848 bytes to 335,405 bytes (about a third of its original size), then uncompress it with the system bunzip2 command. The SHA256 hash is the same before and after, giving us confidence that the compressor is working correctly. (If you don’t have sha256sum on your system, use your solution to Exercise 4.2.)

    $ go build gopl.io/ch13/bzipper
    $ wc -c < /usr/share/dict/words
    938848
    $ sha256sum < /usr/share/dict/words
    126a4ef38493313edc50b86f90dfdaf7c59ec6c948451eac228f2f3a8ab1a6ed -
    $ ./bzipper < /usr/share/dict/words | wc -c
    335405
    $ ./bzipper < /usr/share/dict/words | bunzip2 | sha256sum
    126a4ef38493313edc50b86f90dfdaf7c59ec6c948451eac228f2f3a8ab1a6ed -

We’ve demonstrated linking a C library into a Go program. Going in the other direction, it’s also possible to compile a Go program as a static archive that can be linked into a C program or as a shared library that can be dynamically loaded by a C program.
We’ve only scratched the surface of cgo here, and there is much more to say about memory management, pointers, callbacks, signal handling, strings, errno, finalizers, and the relationship between goroutines and operating system threads, much of it very subtle. In particular, the rules for correctly passing pointers from Go to C or vice versa are complex, for reasons similar to those we discussed in Section 13.2; they are set out under “Passing pointers” in the cgo documentation. For further reading, start with https://golang.org/cmd/cgo.
Exercise 13.3: Use sync.Mutex to make bzip.writer safe for concurrent use by multiple goroutines.
Exercise 13.4: Depending on C libraries has its drawbacks. Provide an alternative pure-Go implementation of bzip.NewWriter that uses the os/exec package to run /bin/bzip2 as a subprocess.