Chapter 9. Connecting D with C

There are many reasons a programming language may fail to gain traction, but a surefire way to discourage adoption is to make it incompatible with C, whose ABI is the lingua franca of programming languages. Even if the creators of a new software project are in a position to choose any language they'd like to work with, it's unlikely that they would be willing to take the time to port or recreate the popular battle-tested C libraries they are sure to need, such as image loaders, graphics interfaces, or database drivers. The easier it is to interface with C, the better.

Binary compatibility with C was a priority with D from the beginning. Code written in D can directly call functions written in C (or in any language that exposes a C ABI-compatible interface), and vice versa, with no intermediate layer needed to bridge the two languages. As a result, it's easy, and often quick, to get a combined D and C program up and running. In this chapter, we're going to take a fairly comprehensive look at how to make D and C work together. It's impossible to cover every corner case that may be encountered in the pantheon of C arcana, but the chapter is still going to be heavy on details and can serve as a reference for connecting the two languages in the majority of cases. The layout looks like this:

  • Preliminaries: terminology, object file formats, and linkage attributes
  • Binding D to C: function prototypes and type translation
  • Calling C from D: handling arrays, memory, and exceptions
  • Calling D from C: how to manage DRuntime

Preliminaries

The bulk of this chapter is about C code and D code, and what you need to do in order for the two to communicate. A few prerequisites need to be established before we roll up our sleeves. First, there needs to be a clear understanding of the key terminology used in this chapter, both in order to understand the content and to discuss it with others. Second, an understanding of the different types of binary output from the different compiler toolchains is key to getting D and C binaries to link. Third, it's beneficial to know at least a little of what's going on under the hood when D and C are combined in the same program. We'll cover all of that in this section.

Terminology

In order to avoid the potential for misunderstanding in this chapter and in conversations with other D programmers, we're going to clearly define certain terms that pop up in any discussion of connecting D with C. There are those for whom the meanings of some of these words blur together, but we're focusing strictly on how they are generally used in the D community at large.

Bindings, wrappers, and ports

The primary focus of this chapter is how to create a binding between D and C. A binding is a bit of code that allows one language to communicate with another. There are two other terms that sometimes get mixed up in the same context: wrapper and port. If you ever create a binding to a C library and intend to make it publicly available for others, you want to make sure you're using the correct terminology.

Different languages have different approaches to creating language bindings. Java programmers bind to foreign-language libraries through the Java Native Interface (JNI). An intermediate layer, which sits between the Java code and the library, is created as a C or C++ library. The JNI is used to translate Java types and function calls to something understood by the foreign library. In this scenario, the Java code does not bind to the foreign library itself, but rather to the intermediate layer. This means there need not be a one-to-one correspondence between the types and function signatures in the library and those on the Java side. In fact, the intermediate layer can hide the library interface completely and expose an entirely new interface to the Java code, one that wraps the library. In that situation, the result can be called a wrapper.

C++ has the advantage that C++ compilers can understand C header files. No bindings need to be created to use a C library in C++. However, some C++ programmers prefer to create an object-oriented interface over a C library, using C++ features that aren't present in C. Again, this can be called a wrapper. When going the other way, exposing a C++ library to C, the terminology isn't so obvious. Much of the C++ API has to be hidden behind C functions, and the interface has to be different simply because C does not support many C++ features. Is this a binding or a wrapper? Both terms are sometimes used in this case.

Like C++, D can communicate directly with C without the need for a JNI-like layer in the middle. However, D compilers do not know how to parse C header files. That means the function prototypes and type declarations in a C header need to be translated to D. The result is a module full of aliases, type declarations, and function prototypes that can be called a binding. The new D module can be imported in any D project and, once compiled and linked with the C library, will allow D code to call into the C API directly.

Several great examples of this can be found in the DRuntime source that ships with DMD. The core.stdc package is a collection of modules that, together, form a binding to the C library. In it you'll find function prototypes such as this one from core.stdc.stdlib:

extern(C) void exit(int status);

That comes with a companion set of manifest constants translated from the same C header. The first two are the conventional values for status; MB_CUR_MAX is unrelated to exit and is shown only because it is declared alongside them:

enum EXIT_SUCCESS = 0;
enum EXIT_FAILURE = 1;
enum MB_CUR_MAX   = 1;

When you compile any D program, the standard C library is automatically linked, so you can call exit and any other standard C functions any time you like. All you need is to import the correct module.
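As a minimal sketch of what that looks like in practice, the following program calls two standard C functions through core.stdc. The helper name cLength is my own invention for illustration; strlen and printf are the real C functions.

```d
import core.stdc.stdio : printf;
import core.stdc.string : strlen;

// Thin helper (hypothetical name) so the C call is easy to exercise on its own.
size_t cLength(const(char)* s)
{
    return strlen(s);
}

void main()
{
    // D string literals are zero-terminated and convert implicitly to
    // const(char)*, so they can be handed straight to C functions.
    immutable n = cLength("binding");
    printf("%d characters\n", cast(int) n);
}
```

No extra linker flags should be needed here, since the C library is linked automatically.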

In D, there is a clear distinction between bindings and wrappers. Bindings provide no implementation of the functions in an API, only the signatures; they link directly with the equivalent symbols in a C library. A wrapper can be implemented on top of a binding to give it a more D-like interface, but the functions in a wrapper must have an implementation. As an example, you can find a binding to the 2D graphics library, SDL, at https://github.com/DerelictOrg/DerelictSDL2. Then at https://github.com/d-gamedev-team/gfm/tree/master/sdl2/gfm/sdl2 is an API that uses the binding internally, but wraps the SDL interface in something more D-friendly.
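To make the distinction concrete, here is a small sketch that puts a one-function binding and a one-function wrapper side by side. The prototype for atoi is written by hand rather than imported from core.stdc, and parseInt is a hypothetical wrapper name chosen for this example.

```d
import std.string : toStringz;

// Binding: a declaration only. The implementation lives in the C library.
extern(C) int atoi(const(char)* s);

// Wrapper: has a body of its own and presents a more D-like interface.
int parseInt(string s)
{
    // D strings are not guaranteed to be zero-terminated, so toStringz
    // is used to obtain a terminated pointer for the C side.
    return atoi(s.toStringz);
}

void main()
{
    import std.stdio : writeln;
    writeln(parseInt("42") + 1);
}
```

The binding line adds nothing to the compiled program beyond a symbol reference, while the wrapper is ordinary D code that happens to call through it.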

That brings us to the word port. Typically, this term is used to indicate that a program has been translated from one language, platform, or CPU architecture to another. Looking at core.stdc again, some might say that the headers of the C standard library have been ported to D, but that's misleading; we cannot say that the entire library has been ported. Translating the headers is what is necessary to create a binding; translating the source is creating a port. As an example, the colorize package that we used in the previous chapter is a port of a Ruby library to D. Bindings have a number of constraints which ports don't have, perhaps the biggest being a dependency on the original C library.

Note

Once, I was scrolling through the DUB registry looking at new packages that had been added and found one that was described as a port of a C library. Clicking through and looking at the source showed it to be a binding, not a port. It may seem like a small matter to confuse terminology like that, but inaccurate terminology can lead to a surprising number of support requests and issue reports from people who don't bother to click through to the source first.

Dynamic and static – context matters

From personal experience in maintaining the Derelict bindings over several years, more confusion arises from the terms dynamic binding and static binding than from any other related terms. There are four other terms that include dynamic or static and which are often used in any discussion about compiling, linking, and using bindings: dynamic linking, static linking, dynamic libraries, and static libraries. Even if you are familiar with these terms and what they describe, you are encouraged to read through this subsection to fully understand the difference between static and dynamic bindings.

Static libraries and static linking

Static libraries are a link-time dependency. They are archives of object files that are handed off to the linker along with any other object files intended to be combined to form an executable or shared library. On Windows, they tend to have the .lib (Library) extension, while on other platforms (and Windows versions of GCC) they usually have the .a (Archive) extension. When a static library is created, no linking takes place. The compiled objects are gathered into a single library file, and there they stay until the library is ultimately handed off to a linker during the build process of an executable or shared library. This means that if the library has any other link-time or runtime dependencies, the final binary will have those dependencies as well.

The job of the linker, in addition to creating the final binary, is to make sure that any reference to a symbol in any of the object files it is given, be it a function call or a variable access, is matched up with the symbol's memory offset. With a static library, everything needed is right there for the linker to make use of. It's just the same as if every object file in the library were given to the linker individually on the command line. Linking with a static library is known as static linking.

Dynamic libraries and dynamic linking

Dynamic libraries (I will use shared library and dynamic library interchangeably in this book) are often a link-time dependency, but are always a runtime dependency. On Windows, they have the .dll (Dynamic Link Library) extension, while on Unix-based systems they have the .so (Shared Object) extension (Mac OS X additionally supports .dylib, or Dynamic Library files). Dynamic libraries are created by a linker, not by a library tool. This means that any link-time dependencies a shared library has are part of the library itself; any executable using the shared library need not worry about them. Runtime dependencies, by definition, still need to be available when the program is executed.

Any program that uses a dynamic library needs to know the address of any symbols it needs from the library. There are two ways to make this happen. The first, and most common, is dynamic linking. With this approach, the linker does the work of matching up the offsets of the library symbols with any points of access in the executable. This is similar to what it does with static linking, but in this case the library is not combined with the executable. Instead, when the executable is loaded into memory at runtime, the dynamic library is loaded by the system's dynamic linker (or runtime linker), which I'll refer to as the system loader. The preliminary work done by the linker allows the loader to match function calls and variable accesses with the correct memory addresses.

On Unix-based systems, dynamic linking takes place by giving the shared object file directly to the linker, along with any object files and static libraries intended to form the final executable. The linker knows how to read the library and find the memory offset of each symbol that is used by the other files it is given. On Windows, when a DLL is created, a separate library, called an import library, is also created. This file, somewhat confusingly, has the same .lib extension as a static library. The import library contains all of the offsets for every symbol in the DLL, so it is passed to the linker in place of the DLL itself. Some C and C++ linkers on Windows know how to fetch the memory offsets directly from a DLL, so they can be given either the import library or the DLL.

The second way to make use of a dynamic library is for the program to load it manually at runtime (often called dynamic loading, but we've got enough dynamics to deal with here already, so I'll use the term manual loading). Essentially, the programmer must do in code what the system loader would have done at application start up: match any use of a dynamic library's symbols in a program with the addresses of the symbols after the library is loaded into memory. This requires every library function and variable used by the program to be declared as a pointer (more on that shortly). Using an API exposed by the operating system, the dynamic library is loaded into memory with a function call, then another function is used to extract the address of every required symbol, which is then assigned to the appropriate pointer.

In order for the system to load a dynamic library at runtime, it must know where the library can be found. Every operating system defines a search path for dynamic libraries. Though there are normally several locations on the search path, there are typically only one or two directories on any given system where most shared libraries live. On Windows, it's normal for any non-system libraries required by a program to ship in the same directory as the executable, with the result that multiple copies of the same library may be installed with multiple programs. On Unix-based systems, it's preferred for dependencies to be installed on the system search path through a package manager so that every program can share the same copy of the library. This is supported on OS X, but it additionally supports packing dependencies with the executable in an application bundle.

Dynamic and static bindings

Now we get to the underlying theme of this chapter. When setting out to create a D binding to a C library (or vice versa), a decision must be made on what type of binding it is going to be: a static binding or a dynamic binding. Unfortunately, static and dynamic used in this context can sometimes lead to the erroneous conclusion that the former type of binding requires static linking and the latter requires dynamic linking. Let's nip that misconception in the bud right now.

A static binding is one that always has a link-time dependency on the bound library, but that dependency can be in the form of a static library or a dynamic (or import) library. In this scenario, functions and global variables are declared on the D side, much as they would be in any C header file. The core.stdc package in DRuntime is a static binding to the standard C library. Let's look again at the declaration of the exit function:

extern(C) void exit(int status);

In order for a program that calls this function to link, one of two things must happen: either the static version or the dynamic version of the C library must be passed to the linker. Either way, there is a link-time dependency. Failure to pass a library to the linker would cause it to complain about a missing symbol. If the dynamic library is linked, then there is an additional runtime dependency as well. DMD will automatically link the C standard library, though whether it does so statically or dynamically depends on the platform.

With a dynamic binding, we completely eliminate the link-time dependency, but take a guaranteed runtime dependency as a trade-off. With this approach, normal function declarations are out the window. Instead, we have to declare function pointers. In a dynamic binding to the C standard library, the declaration of the exit function on the D side would look like this:

extern(C) alias pexit = void function(int);
pexit exit;

Then, somewhere in the program, the standard C library needs to be loaded into memory, and the address of the exit symbol must be fetched and assigned to the function pointer.
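The loading step might be sketched as follows. This is only an illustration under several assumptions: it targets POSIX systems, it binds strlen rather than exit so the result can be observed, and it passes null to dlopen to get a handle to the running program, which only finds the symbol when the C library is already dynamically linked. A real dynamic binding would pass the file name of the library to dlopen instead. The names pstrlen, c_strlen, and loadBinding are hypothetical.

```d
import core.sys.posix.dlfcn : dlopen, dlsym, RTLD_LAZY;

// Function pointer type and variable, in the same style as pexit above.
extern(C) alias pstrlen = size_t function(const(char)*);
pstrlen c_strlen;

// Fetches the address of strlen and stores it in the pointer.
// Returns false if either step fails.
bool loadBinding()
{
    // Assumption: a null file name yields a handle that can see the
    // symbols of the C library already loaded with the program.
    void* handle = dlopen(null, RTLD_LAZY);
    if (handle is null)
        return false;

    c_strlen = cast(pstrlen) dlsym(handle, "strlen");
    return c_strlen !is null;
}

void main()
{
    import std.stdio : writeln;
    if (loadBinding())
        writeln(c_strlen("manual"));
    else
        writeln("could not load strlen");
}
```

On Windows, the equivalent steps would go through the operating system's own loading API rather than dlopen and dlsym.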

Just to drive the point home, because it is so often the source of misunderstanding: static bindings can link with either static or dynamic libraries, but they must always link with something; dynamic bindings have no link-time dependencies at all, but the dynamic library must always be loaded manually at runtime.

Object file formats

One potential sore spot when working with static bindings is object file formats. Any given linker knows how to deal with a specific format, which means any object files and libraries it is given must be in that format. On Linux, Mac, and other Unix-based platforms, this isn't such a big deal. On each of these platforms, the compiler backends all output the same object file format: ELF on Linux and the BSDs, and Mach-O on OS X. On Windows, the picture isn't so rosy.

Among the three major D compilers on Windows, there are three linkers to contend with: the DMC linker that DMD uses by default, the MinGW linker used by GDC and one flavor of LDC, and the Microsoft linker used by DMD in 64-bit mode (and 32-bit with the -m32mscoff switch) and another flavor of LDC. Among these three linkers are two primary object formats: OMF and COFF. The DMC linker works with object files in the ancient OMF format, whereas everything else works with COFF. This is an issue that affects both static and import libraries.

Another potential thorn arises when dealing with static libraries generated by MinGW. Sometimes, it's possible for them to work with the Microsoft toolchain, as they use the COFF format and link with the Microsoft C Runtime. Unfortunately, there are a number of incompatibilities that can crop up in the form of linker errors. Even static libraries compiled directly with Microsoft Visual Studio can sometimes result in linker errors when given to DMD, depending on the options that were used to compile the library.

The bottom line is that, with a static binding, all static libraries, import libraries, and object files given to the linker must be in the file format the linker understands. Preferably, the libraries and object files will all have been compiled by the same toolchain. Generally, you want to follow these guidelines when compiling any C library intended to be used with a static binding in a program compiled by DMD:

  • On Windows, when compiling the program with the -m32 switch (the default), all C libraries should be compiled with DMC
  • On Windows, when compiling the program with -m64 or -m32mscoff, all C libraries should be compiled with the Microsoft compiler
  • On other platforms, all C libraries can be compiled with either GCC or clang

If you're coming to D from a language such as Java and have never compiled a C library before, most popular C library projects for which D bindings exist provide binary distributions for different platforms and compiler toolchains. You may never need to compile any C at all. However, it's still useful to learn about some of the different build tools many C projects use. There may be times when no binary distribution is available and you have no choice but to compile it yourself.

Tip

Conversion tools

When compiling with DMD on Windows using the default architecture (-m32), COFF files can be converted to OMF using a conversion tool such as Agner Fog's free object file converter (http://agner.org/optimize/#objconv) or the coff2omf utility that is part of the commercial Digital Mars Extended Utility Package (EUP) (http://www.digitalmars.com/eup.html). The EUP also contains a tool, coffimplib, which will create an import library in OMF format from a DLL compiled as COFF. For all three tools, the results may not be perfect.

Linkage attributes

The fundamental mechanism that affects how D and C symbols interact with one another is the Application Binary Interface (ABI). This defines such things as how types are laid out in memory, what their sizes are, and so on. Most of that we don't have to worry about when creating a binding, as the compiler takes care of it for us. However, there are two aspects of the ABI to which active attention should be paid in order to ensure the binding matches up with the C library. One is name mangling; the other is calling conventions. Get either wrong and any binding you create becomes nothing more than a pile of linker errors or access violations waiting to happen.

Name mangling

With a language that supports function overloading, a linker needs to be able to distinguish between different overloads of a function. It also needs to be able to distinguish between any symbols of the same name in different namespaces. This is where name mangling comes into play. The compiler takes symbols declared in source code and mangles, or decorates, them with a set of characters that have predefined meanings. We can see this in D by reading the mangleof property of any symbol. Save the following as $LEARNINGD/Chapter09/mangle.d:

module mangle;
import std.stdio;
int x;
void printInt(int i) { writeln(i); }
void main() {
  writeln(x.mangleof);
  writeln(printInt.mangleof);
}

Running this results in the following output:

_D6mangle1xi
_D6mangle8printIntFiZv

A linker need not know or care what the mangled names indicate, but a tool that understands D mangling can make use of it. In both lines, _D indicates that this is the D name-mangling format. The 6 immediately after it says that the symbol following the number, mangle, has six characters. Being that mangle is the first symbol in the name, we know it's the name of the module. It acts as a namespace. In the first line, mangle is followed by 1xi. The 1 indicates a one-character symbol name, x is the name, and the i tells us it's an int.

Similarly, the second line tells us that the symbol name has 8 characters and the name is printInt. F lets us know that it's a function, i that it takes an int parameter, and Z marks the end of the parameter list, which means that what follows is the return type of the function. Since that happens to be v, we know the return type is void. You can read more about D's name mangling at http://dlang.org/abi.html.
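DRuntime ships with a tool that understands this format, the demangle function in core.demangle. The sketch below feeds it the two names printed by mangle.d, plus two hand-written manglings for a pair of hypothetical overloads, show(int) and show(double), to illustrate how overloads end up as distinct symbols.

```d
import core.demangle : demangle;
import std.stdio : writeln;

void main()
{
    // The two names printed by mangle.d.
    writeln(demangle("_D6mangle1xi"));
    writeln(demangle("_D6mangle8printIntFiZv"));

    // Hand-written manglings for two hypothetical overloads. Only the
    // parameter type character differs: i for int, d for double.
    writeln(demangle("_D6mangle4showFiZv"));
    writeln(demangle("_D6mangle4showFdZv"));
}
```

Each line should come back as a readable declaration, such as void mangle.printInt(int) for the second name.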

Not all languages define a name-mangling format as D does. C++, for example, does not; each compiler has its own approach to name mangling, which is one of several aspects of the C++ ABI that makes it extremely difficult to bind to C++ libraries (though, as we'll see in Chapter 11, Taking D to the Next Level, there is ongoing work to make it possible in D). C, on the other hand, is the lingua franca of programming languages for a reason: it has a well-defined ABI that does not include function overloading or namespaces.

That's not to say that C compilers don't use any sort of decorations. It's still necessary to distinguish between variables that are declared locally to a compilation unit rather than globally, but this has no impact on bindings. Some compilers may decorate a C symbol in a static library with an underscore, but this is usually not an issue in practice. The short of it is that when a C header is translated into D, any symbols that need to link up on both sides cannot be declared with the default D name mangling. The C side knows nothing about D's name-mangling scheme, so nothing would ever match up unless it's disabled. We'll see how to do this soon, but first we need to talk about calling conventions.

Calling conventions

When a function is called, there are a number of steps that must be taken, both at the beginning of the call and at the end. During compilation, the compiler generates the appropriate instructions to carry out those steps. This includes instructions to preserve the contents of the CPU registers if needed, pushing function parameters on the stack or copying them into registers before the call, looking in the correct location for a return value once a function call has ended, and other low-level details that we programmers never have to manage ourselves unless we are programming in assembly. In order for the correct instructions to be generated, the requirements for carrying out any given function call must be specified somewhere, including whether the function expects any parameters in registers, in what order stack parameters should be pushed, and so on. That's the role played by a calling convention.

A calling convention defines how every aspect of a function call should be handled. When a function is compiled, the compiler determines which calling convention is associated with the function and generates the appropriate instructions to fetch parameters and return a value. When any code calling that function is compiled, the compiler must be made aware of the calling convention originally used to compile the function so that it can generate the appropriate instructions for the call.

By default, all functions in D are assumed to have the D calling convention. As I write, the D calling convention on non-Windows systems is documented to be the same as the C calling convention supported by the system C compiler. In practice, there are undocumented discrepancies, but this isn't an issue for general use. The convention for Windows is described at http://dlang.org/abi.html.

Putting it together

When binding to any C library from D, it's important to know exactly which calling convention the library uses. On non-Windows systems, this is almost always the C calling convention. On Windows, it is usually either the C calling convention or the system calling convention, stdcall (standard call). Often, the calling convention used is not described anywhere in a library's documentation and it's necessary to look at the headers. If you find __stdcall used in any of the headers, as in something like this:

#define DLL_CALLCONV __stdcall

then you know any functions annotated with DLL_CALLCONV have the standard call calling convention. The C calling convention might also be specified explicitly with __cdecl. If no convention is declared, you can assume the C calling convention, which is the default for C compilers unless it has been changed with a compiler option.

Tip

Changing the default

Some C compilers allow changing the default calling convention through a command line switch. For people using Visual C++, this is easily done in a project's properties window. This is a potential issue when using precompiled libraries with bindings.

D allows you to specify both the name-mangling scheme of any symbol and the calling convention of any function by using a linkage attribute. Here's an example:

module linkage;
extern(C) int cint;
extern(D) int dint;
extern(C) int aFuncWithCLinkage() { return 1; }
extern(D) int aFuncWithDLinkage() { return 2; }
void main() {
  import std.stdio;
  writeln(cint.mangleof);
  writeln(dint.mangleof);
  writeln(aFuncWithCLinkage.mangleof);
  writeln(aFuncWithDLinkage.mangleof);
}

extern(C) is a linkage attribute that turns off name mangling (to match C) and specifies that a function has the C calling convention. extern(D) specifies the D name mangling and calling convention. The output looks like this:

cint
_D7linkage4dinti
aFuncWithCLinkage
_D7linkage17aFuncWithDLinkageFZi

That's a big difference. It's easy to forget that linkage attributes affect more than calling conventions. Of course, since D is the default name-mangling scheme and calling convention, the need to specify it in code is rare (in the previous example, main has D linkage). However, linkage attributes can be declared with a colon (:) and with braces ({}), so it may sometimes be needed. For example:

extern(C):
  // A bunch of stuff with C linkage
extern(D):
  // Enable D linkage again

In addition to C and D, there are also the Windows, System, and Pascal linkage attributes. extern(Windows) is used for functions that have the standard call calling convention. extern(System) defaults to extern(Windows) on Windows and extern(C) elsewhere. There are some cross-platform libraries out there that use the default C calling convention on most platforms, but use standard call on Windows. extern(System) eliminates the need to declare two sets of function declarations to match the different calling conventions. The need for extern(Pascal) is extremely rare, if not nonexistent. It was the system calling convention for Windows back in the 16-bit days.
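One way to check which linkage a function ended up with is the functionLinkage template in std.traits. The following is a small sketch with made-up function names; it asserts only the C and D cases at compile time and simply prints the result for extern(System), since that depends on the platform the program is compiled for.

```d
import std.stdio : writeln;
import std.traits : functionLinkage;

extern(C) int cFunc() { return 1; }
extern(System) int sysFunc() { return 2; }
int dFunc() { return 3; } // default D linkage

// Checked at compile time; a mismatch stops the build.
static assert(functionLinkage!cFunc == "C");
static assert(functionLinkage!dFunc == "D");

void main()
{
    // Platform dependent, so it is printed rather than asserted.
    writeln(functionLinkage!sysFunc);
}
```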

Linkage attributes have no effect on type declarations. We can see that in this example:

module types;
extern(C) struct CStruct {
  int x, y;
}
struct DStruct {
  int x, y;
}
void main() {
  import std.stdio;
  writeln(CStruct.mangleof);
  writeln(DStruct.mangleof);
}

The output:

S5types7CStruct
S5types7DStruct

Some new D programmers think that types must always be declared as extern(C) when binding to C, but that's not the case. The types in D don't even need to have the same name as the C types, as the type names will never be emitted to the binary. All that matters is that the D types are binary compatible with the C types. More on this in the next section.
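As a brief sketch of what binary compatible means here, suppose a C header declares a struct with two int fields and a function that takes a pointer to it. Both C declarations are hypothetical and appear only in the comments; the D side uses a different type name and checks the layout at compile time.

```d
// Hypothetical C declaration being translated:
//   typedef struct point { int x; int y; } point_t;
struct Vec2i
{
    int x;
    int y;
}

// Layout checks: same size and field offsets as the C struct would have.
static assert(Vec2i.sizeof == 2 * int.sizeof);
static assert(Vec2i.x.offsetof == 0);
static assert(Vec2i.y.offsetof == int.sizeof);

// Hypothetical C function: void point_move(point_t* p, int dx, int dy);
// Only the function needs a linkage attribute; the struct does not.
extern(C) void point_move(Vec2i* p, int dx, int dy);

// point_move is never called, so nothing has to be linked for this to build.
void main() {}
```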

Another point of confusion comes from function implementations with C linkage. Functions in a C library binding have no implementation, only declarations. However, when using a binding, it is sometimes necessary to implement an extern(C) function to use as a callback. Sometimes, new D users have the impression that D features cannot be used in such a function. This is not the case. Remember, the linkage attribute only affects the mangling of the function's name and its calling convention. There are no restrictions on the features that can be used in the function body. On the other hand, there can be negative consequences when it comes to throwing exceptions and allocating GC memory, but that has nothing to do with the linkage attribute. We'll cover those issues later in the chapter when we talk about calling C from D.
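Here is a minimal sketch of such a callback, using the standard C qsort function. The names comparisons, compareInts, and sortInts are hypothetical. The callback deliberately avoids GC allocation and exceptions, given the caveats just mentioned, but is otherwise ordinary D.

```d
import core.stdc.stdlib : qsort;

int comparisons; // ordinary D module-level variable, updated by the callback

// C linkage so that qsort can call it; the body is plain D.
extern(C) int compareInts(const void* a, const void* b) nothrow @nogc
{
    ++comparisons;
    immutable x = *cast(const(int)*) a;
    immutable y = *cast(const(int)*) b;
    return x < y ? -1 : (x > y ? 1 : 0);
}

// D-side helper that hands a slice to the C function.
void sortInts(int[] values)
{
    qsort(values.ptr, values.length, int.sizeof, &compareInts);
}

void main()
{
    import std.stdio : writeln;
    auto data = [3, 1, 2];
    sortInts(data);
    writeln(data);
}
```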
