Binding D to C

The first step in implementing a binding to a C library is to decide whether it will be a static or dynamic binding. The former requires nothing more than the translation of all of the C types, global variables, and function signatures. The latter additionally requires a supporting API to load a shared library into memory. We'll cover both approaches in this section.

Once the type of binding has been settled, then begins the work of translating the C headers. This requires enough familiarity with C to understand not just the types and function signatures, but also the preprocessor definitions. While we'll cover some common preprocessor usage to look out for and how to translate it to D, there's not enough room here to provide a full tutorial on C preprocessor directives. Those readers unfamiliar with C are strongly advised to take the time to learn some of the basics before taking on the task of translating any C headers.

There are two tools in the D ecosystem that provide some amount of automated translation from C headers to D modules. htod is available at http://dlang.org/htod.html. It's built from the frontend of DMC, the Digital Mars C and C++ compiler. Another option is DStep, which uses libclang. It can be found at https://github.com/jacob-carlborg/dstep. Neither tool is perfect; it is often necessary to manually modify the output of both, which makes the information in this chapter relevant. Furthermore, both tools only generate static bindings.

Function prototypes

As we saw earlier, the declaration of function prototypes differs for static and dynamic bindings. The one trait they have in common is that they both need to be declared with a linkage attribute. Given that the functions are implemented in C, it's also normally safe to declare them with the @nogc and nothrow attributes. With that, in any module where the C function signatures are declared, you can first set aside a block like so:

extern(C) @nogc nothrow {
  // Function signatures go here
}

Replace extern(C) with extern(System) or extern(Windows) as required.

Now, consider the following hypothetical C function declaration in a C header:

extern int some_c_function(int a);

Let's put a declaration in our extern(C) block for a static binding:

extern(C) @nogc nothrow {
  int some_c_function(int a);
}

A key thing to remember in a static binding is that the function names on the D side must exactly match the function names on the C side. The linker will be doing the work of matching up the function symbols (either via static or dynamic linking) and it isn't smart enough to guess that someCFunction declared in D is the same as some_c_function declared in C. It doesn't know or care that they came from different languages. All it knows about is object files and symbols, so the symbols must be the same.

Another thing to consider is that the parameter names are optional. These only have meaning in the function implementation. In the function prototypes, only the types of the parameters matter. You can omit the parameter names completely, or change them to something else. Both of the following will happily link to the C side:

int some_c_function(int bugaloo);
int some_c_function(int);

If you intend to add Ddoc comments to your binding, it's a good idea to keep the parameter names to make the documentation clearer. It's also best practice to use the same parameter names as the original C library, though I doubt anyone would fault you for changing the names for clarity where appropriate. Additionally, keeping the parameter names makes it possible to use them with compile-time reflection.
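As a minimal sketch of these points, the following block declares static bindings to two functions from the C standard library. The choice of strlen and atoi is an assumption made purely for illustration; they are used because the C runtime is already linked into a typical D program, so no extra library is needed to try it.

```d
// Static bindings to two C standard library functions, chosen only
// for illustration. The D names must match the C symbols exactly.
extern(C) @nogc nothrow {
  size_t strlen(const(char)* s);  // parameter name kept
  int atoi(const(char)*);         // parameter name omitted
}
```

D string literals are zero-terminated and convert implicitly to const(char)*, so strlen("hello") can be called directly and yields 5.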

If the original C library is well documented, it's reasonable to point users of your binding to the original documentation rather than providing your own. After all, implementing Ddoc comments for the C API means you have to make sure that you don't contradict the original documentation. Then you have to make sure it all stays in sync. That seems like a waste of effort when a perfectly good set of documentation already exists. Ideally, users of your binding would never need to look at its implementation to understand how to use it. They can get everything they need from the C library documentation and examples.

In a dynamic binding, the declaration might look like this:

extern(C) @nogc nothrow {
  int function(int) someCFunction;
}

Here, we've declared a function pointer instead of a function with no body. Again, the parameter name is not required and, in this case, we've left it out. Moreover, we don't need to use the original function name. In a dynamic binding, the linker plays no role in matching symbols. The programmer will load the shared library, fetch the function address, and assign it to the function pointer manually. That said, you really don't want to use the real function name in this case.

The problem is that the variable that holds the function pointer is also declared as extern(C), which means the symbols for the function pointer and the C function will be identical. It may work most of the time on most platforms, but there is potential for things to blow up. I can tell you from experience that with many libraries it will cause errors during application startup on Linux. If you want to keep the same name, use an alias:

extern(C) @nogc nothrow {
  alias p_some_c_function = int function(int);
}
__gshared p_some_c_function some_c_function;

We'll see why __gshared is used here shortly.

If we were making a static binding to this single-function C library, we would be finished. All we would need to do is import the module containing the function declaration and link with the C library, either statically or dynamically, when we build the program. For a dynamic binding, however, there's still more work to do.

Manually loading shared libraries

A function pointer is useless until it has been given the address of a function. In order to get the address, we have to use the system API to load the library into memory. On Windows, that means using the functions LoadLibrary, GetProcAddress, and FreeLibrary. On the other systems DMD supports, we need dlopen, dlsym, and dlclose. A barebones loader might look something like this, which you can save as $LEARNINGD/Chapter09/clib/loader.d (we'll make use of it shortly):

module loader;
import std.string;
version(Posix) {
  import core.sys.posix.dlfcn;
  alias SharedLibrary = void*;
  SharedLibrary loadSharedLibrary(string libName) {
    return dlopen(libName.toStringz(), RTLD_NOW);
  }
  void unload(SharedLibrary lib) {
    dlclose(lib);
  }
  void* getSymbol(SharedLibrary lib, string symbolName) {
    return dlsym(lib, symbolName.toStringz());
  }
}
else version(Windows) {
  import core.sys.windows.windows;
  alias SharedLibrary = HMODULE;
  SharedLibrary loadSharedLibrary(string libName) {
    import std.utf : toUTF16z;
    return LoadLibraryW(libName.toUTF16z());
  }
  void unload(SharedLibrary lib) {
    FreeLibrary(lib);
  }
  void* getSymbol(SharedLibrary lib, string symbolName) {
    return GetProcAddress(lib, symbolName.toStringz());
  }
}
else static assert(0, "SharedLibrary unsupported on this platform.");

Tip

Error handling

This implementation does no error handling, but a more robust implementation would throw exceptions with a system-generated error message as the exception message. See the functions std.windows.syserror.sysErrorString and core.sys.posix.dlfcn.dlerror.

With that in hand, we can do something like this to load a library:

auto lib = loadSharedLibrary(libName);
if(!lib) throw new Exception("Failed to load library " ~ libName);
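The check above reports only that loading failed. As a hedged sketch of the more robust approach mentioned in the earlier tip, the following hypothetical helper, loadOrThrow (a name invented here, not part of loader.d), appends the system-generated message to the exception text. It assumes the same platform APIs the loader already uses.

```d
import std.string : toStringz, fromStringz;

version(Posix) {
  import core.sys.posix.dlfcn;

  // Hypothetical helper: loads a library or throws with the dlerror text.
  void* loadOrThrow(string libName) {
    auto lib = dlopen(libName.toStringz(), RTLD_NOW);
    if(lib is null) {
      // dlerror returns a C string describing the most recent failure,
      // or null if no error is pending.
      auto msg = dlerror();
      throw new Exception("Failed to load " ~ libName ~ ": " ~
        (msg ? fromStringz(msg).idup : "unknown error"));
    }
    return lib;
  }
}
else version(Windows) {
  import core.sys.windows.windows;
  import std.windows.syserror : sysErrorString;
  import std.utf : toUTF16z;

  // Hypothetical helper: loads a DLL or throws with the system message.
  HMODULE loadOrThrow(string libName) {
    auto lib = LoadLibraryW(libName.toUTF16z());
    if(lib is null) {
      throw new Exception("Failed to load " ~ libName ~ ": " ~
        sysErrorString(GetLastError()));
    }
    return lib;
  }
}
```

Calling loadOrThrow with the name of a library that does not exist should throw an Exception whose message names the library and includes whatever explanation the operating system provides.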

To load a function pointer that is not aliased, such as someCFunction which we saw previously, we can't just call getSymbol and assign the return value directly to the function pointer. The return type is void*, but the compiler requires a cast to the function pointer type. However, the type used with the cast operator must include the linkage and function attributes used in the declaration of the function pointer. It's not possible to include all of that directly in the cast. There are different ways to handle it.

The simplest thing to do in our case is to call typeof on someCFunction and use the result as the type in the cast:

someCFunction = cast(typeof(someCFunction))lib.getSymbol("some_c_function");
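The same idea can be wrapped in a small template so the cast is written only once. This is a sketch under stated assumptions: bindSymbol is a hypothetical helper, and fake_c_function is a stand-in defined in D so the example does not need a real shared library; in a real binding the address would come from getSymbol.

```d
import std.traits : isFunctionPointer;

extern(C) @nogc nothrow {
  // The function pointer from the text (not aliased).
  int function(int) someCFunction;

  // Hypothetical stand-in for a function that would normally live in
  // a shared library; defined here so the sketch is self-contained.
  int fake_c_function(int a) { return a + 20; }
}

// Hypothetical helper: T is inferred from the pointer variable, so the
// cast carries the linkage and attributes without spelling them out.
void bindSymbol(T)(ref T fptr, void* address) if(isFunctionPointer!T) {
  fptr = cast(T)address;
}
```

With this in place, bindSymbol(someCFunction, cast(void*)&fake_c_function) assigns the pointer, after which someCFunction(10) returns 30.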

Things are different when the aliased form is used:

extern(C) @nogc nothrow {
  alias p_some_c_function = int function(int);
}
__gshared p_some_c_function some_c_function;

With this, we can then load the function like so:

some_c_function = cast(p_some_c_function)lib.getSymbol("some_c_function");

One issue with this approach is that some_c_function, being a variable, has thread-local storage like any other variable in D. This means that in a multi-threaded app, every thread will have a copy of the pointer, but it will be null in all of them except for the one in which the library is loaded. There are two ways to solve this. The hard way is to make sure that getSymbol is called once in every thread. The easy way is to add __gshared to the declaration as we have done.

We'll dig into this a little more in Chapter 11, Taking D to the Next Level, but there are two ways to make a variable available across threads in D: __gshared and shared. The former has no guarantees. All it does is put the variable in global storage, making it just like any global variable in C or C++. The latter actually affects the type of the variable; the compiler is able to help prevent the variable from being used in ways it shouldn't be.
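To make the difference concrete, here is a small sketch that compares a thread-local variable with a __gshared one. The variable and function names are hypothetical and plain integers stand in for function pointers, but the storage behavior is the same.

```d
import core.thread : Thread;

int tlsValue;                  // thread-local by default
__gshared int globalValue;     // one instance for the whole process

// Hypothetical helper: sets both variables in the calling thread, then
// reports what a freshly started thread sees for each of them.
int[2] seenByOtherThread() {
  tlsValue = 42;
  globalValue = 42;
  int[2] seen;
  auto t = new Thread({
    seen[0] = tlsValue;     // the new thread's own copy, still 0
    seen[1] = globalValue;  // the single shared copy, 42
  });
  t.start();
  t.join();
  return seen;
}
```

The function returns [0, 42]: the new thread gets its own default-initialized copy of tlsValue, which is exactly what happens to a thread-local function pointer that was loaded in a different thread.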

Trying it out

In the $LEARNINGD/Chapter09/clib directory of the book's sample source distribution, you'll find a C source file, clib.c, which looks like this:

#include <stdio.h>
#ifdef _MSC_VER
__declspec(dllexport)
#endif
int some_c_function(int a) {
  printf("Hello, D! From C! %d\n", a);
  return a + 20;
}

This is accompanied by four Windows-specific binaries: clib32.lib, clib32.dll, clib64.lib, and clib64.dll. The library files are import libraries, not static libraries, intended to be linked with the static binding. Because they are import libraries, each has a runtime dependency on its corresponding DLL.

If you are working on a platform other than Windows, you can use GCC (or clang, if it is your system compiler) to compile the corresponding version of the shared library for your system. The following command line should get the job done:

gcc -shared -o libclib.so -fPIC clib.c

You'll also find loader.d, the implementation of which we saw previously, and a D module named dclib.d. The top part of this module provides both a static and dynamic binding to some_c_function. The rest shows how to use the two versions of the binding. The implementation is:

extern(C) @nogc nothrow {
  version(ClibDynamic)
    int function(int) some_c_function;
  else
    int some_c_function(int);
}
void main() {
  version(ClibDynamic)
  {
    import loader;
    version(Win64) enum libName = "clib64.dll";
    else version(Win32) enum libName = "clib32.dll";
    else enum libName = "libclib.so";

    auto lib = loadSharedLibrary(libName);
    if(!lib) throw new Exception("Failed to load library " ~ libName);

    some_c_function = cast(typeof(some_c_function))lib.getSymbol("some_c_function");
    if(!some_c_function) throw new Exception("Failed to load some_c_function");
  }
  import std.stdio : writeln;
  writeln(some_c_function(10));
}

The command line used to compile all of this depends on the platform and linker you are using. Compiling to use the static binding in the default 32-bit mode on Windows:

dmd dclib.d clib32.lib -oftestStatic

This uses the static binding, links with clib32.lib, and creates an executable named testStatic.exe. To see the sort of error a user would get when the DLL is missing, temporarily rename clib32.dll and execute the program. To test the dynamic binding, use this command line:

dmd -version=ClibDynamic dclib.d loader.d -oftestDynamic

This time, we don't link to anything, but need to compile loader.d along with the main module. We specify the version ClibDynamic to trigger the correct code path and we output the binary as testDynamic.exe to avoid mixing it up with testStatic. Once again, temporarily rename clib32.dll and see what happens. When manually loading a shared library like this, the loader also needs to manually handle the case where loading fails. One benefit of this approach is that it provides the opportunity to display a message box with a user-friendly message instructing the user on how to solve the problem, or provide a link to a web page that does.

Compiling in 64-bit mode with the MS linker is similar:

dmd -m64 dclib.d clib64.lib -oftestStatic64
dmd -m64 -version=ClibDynamic dclib.d loader.d -oftestDynamic64

Again, we're using distinct file names to avoid overwriting the 32-bit binaries.

When compiling the static binding with GCC on other platforms, we need to tell the linker to look for the library in the current directory, as it is not on the library search path by default. -L-L. will make that happen. Then we can use -L-lclib to link the library:

dmd -L-L. -L-lclib dclib.d -oftestStatic

Compiling the dynamic binding is almost the same as on Windows, but on Linux (not Mac or the BSDs) we have to link with libdl to have access to dlopen and friends:

dmd -version=ClibDynamic -L-ldl dclib.d loader.d -oftestDynamic

When executing either version at this point, you will most likely see an error telling you that libclib.so can't be found. Unlike Windows, Unix-like systems are generally not configured to search for shared libraries in the executable's directory. In order for the library to be found, you can copy it to one of the system paths (such as /usr/lib) or, preferred for this simple test case, temporarily add the executable directory to the LD_LIBRARY_PATH environment variable. Assuming you are working in ~/LearningD/Chapter09/clib, then the following command will do it:

export LD_LIBRARY_PATH=~/LearningD/Chapter09/clib:$LD_LIBRARY_PATH

With that, you should be able to execute ./testStatic and ./testDynamic just fine.

No matter the platform or linker, a successful run should print these lines to the console:

Hello, D! From C! 10
30

C types to D types

Getting from C to D in terms of types is rather simple. Most of the basic types are directly equivalent, as you can see from the following table:

C types                D types
---------------------  ------------------------------
void                   void
signed char            byte
unsigned char          ubyte
short                  short
unsigned short         ushort
int                    int
unsigned int           uint
long                   core.stdc.config.c_long
unsigned long          core.stdc.config.c_ulong
long long              long
unsigned long long     ulong
float                  float
double                 double
long double            core.stdc.config.c_long_double

There are a few entries in this table that warrant explanation. First, the translation of signed char and unsigned char to byte and ubyte applies when the C types are used to represent numbers rather than strings. However, it's rare to see C code with char types explicitly declared as signed. The signed form appears in this table because the C specification does not say whether the default char type is signed or unsigned, but leaves it implementation defined. In practice, most C compilers implement the default char as a signed type, which matches the default for other types (short, int, and long), but it's still possible to encounter libraries that have been compiled with an unsigned default char; GCC supports the -funsigned-char command line switch that does just that. So while it's generally safe to treat the default C char as signed, be on the lookout for corner cases.

The size of the C long and unsigned long types can differ among C compilers. Some implement them as 32-bit types and others as 64-bit. To account for that difference, it's best to import the DRuntime module core.stdc.config and use the c_long and c_ulong types, which match the size of the long and unsigned long types implemented by the backend. When it comes to long double, you may come across some documentation or an old forum post that recommends translating it to real in D. Once upon a time, that was the correct thing to do, but that has not been true since DMD gained support for the Microsoft toolchain. There, the size of long double is 64 bits, rather than 80. To translate this type, import core.stdc.config and use c_long_double. This list of special cases could grow as support is added for more compilers and platforms.
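As a brief sketch, here is how c_long and c_ulong might appear in a static binding. The two C standard library functions, labs and strtoul, are chosen here only as convenient examples of signatures that use long and unsigned long.

```d
import core.stdc.config : c_long, c_ulong;

extern(C) @nogc nothrow {
  // C: long labs(long n);
  c_long labs(c_long n);
  // C: unsigned long strtoul(const char *str, char **endptr, int base);
  c_ulong strtoul(const(char)* str, char** endptr, int base);
}

// c_long is 4 or 8 bytes, depending on the C compiler being targeted.
static assert(c_long.sizeof == 4 || c_long.sizeof == 8);
static assert(c_ulong.sizeof == c_long.sizeof);
```

Had these been declared with D's long, the binding would be wrong wherever the C long is 32 bits wide, such as on Windows.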

Tip

Don't define your own

If, for whatever reason, you're tempted to avoid importing core.stdc.config and declare your own alias for c_long_double to treat it as a double when using the MS backend, please don't. The compiler specially recognizes c_long_double when it's used with the MS backend and generates special name mangling for instances of that type. Using anything else could break ABI compatibility.

Strings and characters

There are two character types in C, char and wchar_t. Strings in C are represented as arrays of either type, most often referred to as char* and wchar_t* strings. The former can be translated to D directly as char*, although some prefer to translate it as ubyte* to reflect the fact that the D char type is always encoded as UTF-8, while there is no such guarantee on the C side. In practice, this is more an issue of how instances of the type are used than of how they are translated.

The wchar_t type can't be directly translated. The issue is that the size and encoding of wchar_t is not consistent across platforms. On Windows, it is a 2-byte value encoded as UTF-16, while on other platforms it is a 4-byte value encoded as UTF-32. There is no wrapper type in core.stdc.config, but in this case it's easy to resolve:

version(Windows) alias wchar_t = wchar;
else alias wchar_t = dchar;
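A short sketch of the alias in use follows. It binds the C standard library function wcslen, picked here only for illustration, and wideHelloLength is a hypothetical helper that exists just to exercise it.

```d
version(Windows) alias wchar_t = wchar;
else alias wchar_t = dchar;

extern(C) @nogc nothrow {
  // C: size_t wcslen(const wchar_t *s);
  size_t wcslen(const(wchar_t)* s);
}

// Hypothetical helper. An unsuffixed D string literal converts to
// UTF-16 or UTF-32 as needed, and literals are zero-terminated.
size_t wideHelloLength() {
  immutable(wchar_t)[] text = "hello";
  return wcslen(text.ptr);
}
```

The same source compiles on both kinds of platform because the literal adapts to whichever character type the alias selects.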

Special types

The types size_t and ptrdiff_t are often used in C. These are types that are not part of the language, but are defined in the standard library. D also provides aliases that correspond exactly to the type and size of each as defined in the relevant C compiler, so a direct translation is appropriate. They are always globally available, so no special import is required.

Complex types have been a part of C since C99. There are three types, float _Complex, double _Complex, and long double _Complex. In D, these translate to, respectively, cfloat, cdouble, and creal. The functions found in the C header file complex.h are translated to D in the DRuntime module core.stdc.complex. It's expected that these three complex types will be deprecated at some point in the future, to be replaced by the type std.complex.Complex, which is usable now.

C99 also specifies a Boolean type, _Bool, which is typedefed to bool in stdbool.h. The specification requires only that the type be large enough to hold the values 0 and 1. The C compilers we need to worry about for DMD implement _Bool as a 1-byte type. As such, the C _Bool or bool can be translated directly to D bool. Older versions of GCC implemented it as a 4-byte type, so be on the lookout if you're ever forced to compile a D program against C libraries compiled with GCC 3.x.

C99 also introduced stdint.h to the C standard library. This provides a number of typedefs for integer types of a guaranteed size, such as int8_t, uint8_t, int16_t, uint16_t, and so on. When you encounter these in a C header, you have two options for translation. One option is just to translate them to the D type of the same size. For example, int8_t and uint8_t would translate to byte and ubyte. The other option is to import core.stdc.stdint and use the C types directly.
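The two options can be sketched side by side. The struct below is hypothetical, standing in for a C declaration such as struct sample { int8_t a; uint8_t b; uint16_t c; int64_t d; };.

```d
import core.stdc.stdint : int8_t, uint8_t, uint16_t, int64_t;

// Option 1: use the D types of the same size.
struct sample_native {
  byte a;
  ubyte b;
  ushort c;
  long d;
}

// Option 2: use the C names from core.stdc.stdint.
struct sample_stdint {
  int8_t a;
  uint8_t b;
  uint16_t c;
  int64_t d;
}

// Either way, the layout is the same.
static assert(sample_native.sizeof == sample_stdint.sizeof);
static assert(int8_t.sizeof == 1 && uint16_t.sizeof == 2 && int64_t.sizeof == 8);
```

The second form keeps the D declarations visually closer to the C header, which can make a binding easier to check against the original.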

Enumerations

The C enum and the D enum are equivalent, so that a direct translation is possible. An example:

// In C
enum {
  BB_ONE,
  BB_TWO,
  BB_TEN = 10
};
// In D
enum {
  BB_ONE,
  BB_TWO,
  BB_TEN = 10
}

Some thought needs to be given toward how to translate named enums. Consider the following:

typedef enum colors_t {
  COL_RED,
  COL_GREEN,
  COL_BLUE
} colors_t;

It may seem that a direct translation to D would look like this:

enum colors_t {
  COL_RED,
  COL_GREEN,
  COL_BLUE
}

However, that isn't an accurate translation. There is no notion of enum namespaces in C, but to access the members of this enum in D would require using the colors_t namespace, for example, colors_t.COL_RED. The following would be more appropriately called a direct translation:

alias colors_t = int;
enum {
  COL_RED,
  COL_GREEN,
  COL_BLUE
}

Now this can be used exactly as the type is used on the C side, which is important when you want to maintain compatibility with existing C code. The following approach allows for both C and D styles:

enum Colors {
  red,
  green,
  blue,
}
alias colors_t = Colors;
enum {
  COL_RED = Colors.red,
  COL_GREEN = Colors.green,
  COL_BLUE = Colors.blue,
}
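A small usage sketch shows the two styles working together. The function next_color is hypothetical; it is written in D here only to stand in for something that would normally come from the C API.

```d
enum Colors {
  red,
  green,
  blue,
}
alias colors_t = Colors;
enum {
  COL_RED = Colors.red,
  COL_GREEN = Colors.green,
  COL_BLUE = Colors.blue,
}

// Hypothetical function standing in for one translated from the C API.
colors_t next_color(colors_t c) {
  return c == COL_BLUE ? COL_RED : cast(colors_t)(c + 1);
}
```

Both next_color(COL_RED) and next_color(Colors.red) compile and return Colors.green, since the C-style constants have the type Colors.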

Structures

The D struct is binary compatible with the C struct, so here too, most translations are direct. An example is:

// In C
struct point {
  int x, y;
};
// In D
struct point {
  int x, y;
}

The only difference between these two types shows up in usage. In C, anywhere the point type is to be used, the struct keyword must be included, for example, in the declaration struct point p. Many C programmers prefer to use a typedef for their struct types, which eliminates the need for the struct keyword in variable declarations and function parameters by creating an alias for the type. This typically takes one of two forms:

// Option 1
typedef struct point point_t;
struct point {
  int x, y;
};
// Option 2
typedef struct {
  int x, y;
} point_t;

Option 2 is shorthand for Option 1, with the caveat that point_t is not usable inside the braces. A good example of where this comes into play is a linked list node:

// First version: external typedef
typedef struct node_s node_t;
struct node_s {
    void *item;
    node_t *next;
};
// Second version: typedef combined with the struct declaration
typedef struct node_s {
    void *item;
    struct node_s *next;
} node_t;

Note that in the first version of node_s, the typedefed name is used. This is perfectly legal when using an external typedef, but it isn't possible in the second version. There, since node_t is not visible inside the braces, the struct keyword cannot be omitted in the declaration of the member next. In D, the two types look like this, regardless of which approach was used in their C declarations:

struct point_t {
  int x, y;
}
struct node_t {
  void* item;
  node_t* next;
}

When any C struct has a typedefed alias, the alias should always be preferred in the D translation.

It's possible to have multiple type aliases on a single C struct. This is most often used to declare both a value type and a pointer type:

typedef struct {
  int x, y;
} point_t, *pointptr_t;

In D, we would translate this as:

struct point_t {
  int x, y;
}
alias pointptr_t = point_t*;

Sometimes, a C struct is aliased only to a pointer and without a struct name. In that case, only pointers to that type can be declared:

typedef struct {
  int i;
} *handle_t;

Most often, when this form is used, the members of the struct are only intended to be used internally. If that's the case, the struct can be declared like this on the D side:

struct _handle;
alias handle_t = _handle*;

We could declare _handle as an empty struct, but by omitting the braces we prevent anyone from declaring any variables of type _handle. Moreover, no TypeInfo is generated in this case, but it would be with an empty struct. These days, most C programmers would likely not implement a handle type like this. A more likely implementation today would look like this:

typedef struct handle_s handle_t;

In the public-facing API, there is no implementation of handle_s. It is hidden away inside one of the source modules. Given only the header file, the compiler assumes that struct handle_s is implemented somewhere and will let the linker sort it out. However, without the implementation handy, the compiler has no way of determining the size of a handle_t. As such, it will only allow the declaration of pointers. The C API will then contain a number of functions that look like this:

handle_t* create_handle(int some_arg);
void manipulate_handle(handle_t *handle, int some_arg);
void destroy_handle(handle_t *handle);

In D, we can declare handle_t the same way we declared _handle previously:

struct handle_t;
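The effect of the opaque declaration can be checked at compile time. In this sketch, the two enum names are hypothetical, and the three prototypes repeat the C functions shown earlier as they might appear in a static binding; they are declared but never called, so nothing needs to be linked.

```d
struct handle_t;  // opaque: no body, so its size is unknown

extern(C) @nogc nothrow {
  handle_t* create_handle(int some_arg);
  void manipulate_handle(handle_t* handle, int some_arg);
  void destroy_handle(handle_t* handle);
}

// Pointers to the opaque type can be declared and passed around...
enum canDeclarePointer = __traits(compiles, { handle_t* h = null; });
// ...but a value of the type cannot be declared.
enum canDeclareValue = __traits(compiles, { handle_t h; });

static assert(canDeclarePointer && !canDeclareValue);
```

This mirrors the C side, where only pointers to the incomplete type can be used outside the library.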

Another C idiom that isn't so common these days, but may still be encountered now and again, is the inclusion of an array declaration in the declaration of the struct type itself. For example:

struct point {
  int x, y;
} points[3] = {
  {10, 20},
  {30, 40},
  {50, 60}
};

D does not support this syntax. The array must be declared separately:

struct point {
  int x, y;
}
point[3] points = [
  point(10, 20),
  point(30, 40),
  point(50, 60)
];

Pointers

While C pointers are directly translatable to D, it pays to keep in mind the difference in declaration syntax. Consider the declarations of these two variables in C:

int* px, x;

This is not a declaration of two int pointers, but rather one int pointer, px, and one int. In a perfect world, all C programmers would conform to a style that brings a little clarity:

int *px, x;

Or, better still, declare the variables on separate lines. As it stands, there are a variety of styles that must be interpreted when reading C code. At any rate, the previous declarations in D must be separated:

int* px;
int x;

Always remember that a pointer in a variable declaration in D is associated with the type, not the variable.

Type aliases

It's not uncommon to see type aliases in C libraries. One common use is to define fixed-size integers. Since the C99 standard was released, such types have been available in stdint.h, but not all C compilers support C99. A great many C libraries are still written against the C89 standard for the widest possible platform support, so you will frequently encounter typedefed and #defined aliases for integer types to hide any differences in type sizes across platforms. Here are a couple of examples:

typedef signed char Sint8;
typedef unsigned char Uint8;

Despite the name, the C typedef does not create a new type, only an alias. Whenever the compiler sees Sint8, it effectively replaces it with signed char. The following defines have the same effect, but are handled by the preprocessor rather than the compiler:

#define Sint8 signed char
#define Uint8 unsigned char

The preprocessor parses a source module before the compiler does, substituting every instance of Sint8 and Uint8 with signed char and unsigned char. The typedef approach is generally preferred and is much more common. Both approaches can be translated to D using alias declarations:

alias Sint8 = byte;
alias Uint8 = ubyte;

It is not strictly necessary to translate type aliases, as the actual types, byte and ubyte in this case, can be used directly. But again, maintaining conformance with the original C library should always be a priority. It also minimizes the risk of introducing bugs when translating function parameters, struct members, or global variables.

Function pointers

In C libraries, function pointers are often declared for use as callbacks and to simulate struct member functions. They might be aliased with a typedef, but sometimes they aren't. For example:

typedef struct {
  void* (*alloc)(size_t);
  void (*dealloc)(void*);    
} allocator_t;
void set_allocator(allocator_t *allocator);

And using type aliases:

typedef void* (*AllocFunc)(size_t);
typedef void (*DeallocFunc)(void*);
typedef struct {
  AllocFunc alloc;
  DeallocFunc dealloc;
} allocator_t;
void set_allocator_funcs(AllocFunc alloc, DeallocFunc dealloc);

Sometimes, they are declared as function parameters:

void set_alloc_func(void* (*alloc)(size_t));

There are a couple of things to remember when translating these to D. First, callbacks should always follow the same calling convention they have in the C header, meaning they must be given the appropriate linkage attribute. Second, they should probably always be marked with the nothrow attribute for reasons that will be explained later in the chapter, but it isn't always quite so clear whether or not to use @nogc.

Function pointers intended for use as callbacks aren't intended to be called in D code. The pointers will be handed off to the C side and called from there. From that perspective, it doesn't matter whether they are marked @nogc or not, as the C side can call the function through the pointer either way. However, it makes a big difference for the user of the binding. @nogc means they won't be able to do something as common as calling writeln to log information from the callback. In our specific example, it's not a bad thing for the user to want the AllocFunc implementation to allocate GC memory (as long as he or she keeps track of it). Consider carefully before adding @nogc to a callback, but, as a general rule, it's best to lean toward omitting it.

Careful consideration should also be given to function pointers that aren't callbacks, but are intended for use as struct members. These may actually be called on the D side and may need to be called from @nogc functions. In this case, it might make sense to mark them as @nogc. Doing so prevents any GC allocations from taking place in the implementations, but not doing so prevents them from being called by other @nogc functions. Consider how the type is intended to be used, and what tasks the function pointers are intended to perform, and use that to help guide your decision. Of course, if the function pointers are set to point at functions on the C side, then go ahead and add @nogc and nothrow to your heart's content.

With that, we can translate each of the previous declarations. The first looks like this:

struct allocator_t {
extern(C):
  void* function(size_t) alloc;
  void function(void*) dealloc;
}

The function set_allocator can be translated directly. From the second snippet, allocator_t and set_allocator_funcs can be translated directly. AllocFunc and DeallocFunc become aliases:

extern(C) nothrow {
  alias AllocFunc = void* function(size_t);
  alias DeallocFunc = void function(void*);
}

Finally, the function set_alloc_func could be translated like this (using the form for a static binding):

extern(C) @nogc nothrow {
  void set_alloc_func(void* function(size_t));
}

In this situation, a function pointer declared as a function parameter picks up the extern(C) linkage, but does not pick up the two function attributes. If you want the callback implementation to be nothrow, you'll have to declare it like this:

extern(C) @nogc nothrow {
  void set_alloc_func(void* function(size_t) nothrow);
}

It may be preferable to go ahead and alias the callback anyway:

extern(C):
alias AllocFunc = void* function(size_t) nothrow;
void set_alloc_func(AllocFunc) @nogc nothrow;
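To illustrate why leaving @nogc off the callback type matters, here is a sketch under stated assumptions. The function call_alloc is a hypothetical stand-in for the C side, written in D so the example is self-contained, and gcAlloc and NoGcAllocFunc are names invented for the illustration.

```d
extern(C) {
  alias AllocFunc = void* function(size_t) nothrow;
  alias NoGcAllocFunc = void* function(size_t) @nogc nothrow;

  // Hypothetical stand-in for the C library calling back into user code.
  void* call_alloc(AllocFunc alloc, size_t size) nothrow {
    return alloc(size);
  }

  // A user callback: nothrow, but deliberately not @nogc.
  void* gcAlloc(size_t size) nothrow {
    return (new ubyte[](size)).ptr;  // GC allocation
  }
}

// The callback fits the type without @nogc, but not the one with it.
static assert(__traits(compiles, { AllocFunc f = &gcAlloc; }));
static assert(!__traits(compiles, { NoGcAllocFunc f = &gcAlloc; }));
```

Had the binding declared the callback type with @nogc, users could not pass gcAlloc at all, which is the restriction discussed above.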

Defined constants

Despite the presence of an enum type in C, it's not uncommon for C programmers to use the preprocessor to define constant values. A widely used library that does this is OpenGL. Just take a peek at any implementation of the OpenGL headers and you'll be greeted with a massive list of #define directives associating integer literals in hexadecimal with names such as GL_TEXTURE_2D. Such constants need not be in hexadecimal format, nor do they need to be integer literals. For example:

#define MAX_BUFFER 2048
#define INVALID_HANDLE 0xFFFFFFFF
#define UPDATE_INTERVAL (1.0/30.0)
#define ERROR_STRING "You did something stupid, didn't you?"

All of these can be translated to D as manifest constants:

enum MAX_BUFFER = 2048;
enum INVALID_HANDLE = 0xFFFFFFFF;
enum UPDATE_INTERVAL = 1.0/30.0;
enum ERROR_STRING = "You did something stupid, didn't you?";
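It can be worth checking which types D infers for such constants. The sketch below repeats the declarations and adds compile-time checks; the checks themselves are an addition for illustration and would not normally appear in a binding.

```d
enum MAX_BUFFER = 2048;
enum INVALID_HANDLE = 0xFFFFFFFF;
enum UPDATE_INTERVAL = 1.0/30.0;
enum ERROR_STRING = "You did something stupid, didn't you?";

// The type of each manifest constant is inferred from its initializer.
static assert(is(typeof(MAX_BUFFER) == int));
static assert(is(typeof(INVALID_HANDLE) == uint));  // too large for int
static assert(is(typeof(UPDATE_INTERVAL) == double));
static assert(is(typeof(ERROR_STRING) == string));
```

If the C API expects a constant to have a particular type, the type can be stated explicitly instead, for example enum uint INVALID_HANDLE = 0xFFFFFFFF;.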

Function parameters and return types

When translating function parameters and return types to D, everything that has been said about types so far in this chapter applies. An int is an int, a float is a float, and so on. As mentioned earlier, parameter names can be included or omitted as desired. The important part is that the D types match the C types. However, there is one type of parameter that needs special attention: the static array.

Consider the following C function prototype:

void add_three_elements(float array[3]);

In C, this signature does not cause three floats to be copied when this function is called. Any array passed to this function will still decay to a pointer. Moreover, it may contain fewer than or more than three elements. In short, it isn't different from this declaration:

void add_three_elements(float *array);

A little-known variation is to use the static keyword inside the array brackets:

void add_three_elements(float array[static 3]);

This tells the compiler that the array should contain at least three elements.

To translate the first function to D, we could get away with treating it as taking a float pointer parameter, but that would be misleading to anyone who looks at the source of the binding. The C code is telling us that the function expects an array of three elements, even though it isn't enforced. For the form that uses the static keyword, the float* approach is an even worse idea, as that would allow the caller to pass an array containing fewer elements than the function expects. In both cases, it's best to use a static array.

We can't just declare an extern(C) function in D that takes a static array and be done with it, though. Recall from Chapter 2, Building a Foundation with D Fundamentals, that a static array in D is passed by value, meaning all of its elements are copied. Try passing one to a C function that expects a C array, which decays to a pointer, and you'll corrupt the stack. The solution is to declare the static array parameter to have the ref storage class:

extern(C) @nogc nothrow void add_three_elements(ref float[3]);
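
A real C function with an array parameter makes this easy to try out. The sketch below assumes a POSIX system, where the C library declares int pipe(int pipefd[2]). The helper openAndClosePipe is a hypothetical name used only for this example.

```d
// Assumes a POSIX C library. In C: int pipe(int pipefd[2]); int close(int fd);
extern(C) @nogc nothrow {
    int pipe(ref int[2]);   // the C array parameter becomes ref int[2]
    int close(int);
}

// Hypothetical helper: creates a pipe, then closes both ends.
bool openAndClosePipe() @nogc nothrow {
    int[2] fds = [-1, -1];
    if (pipe(fds) != 0) return false;   // fds is passed by reference
    immutable ok = fds[0] >= 0 && fds[1] >= 0;
    close(fds[0]);
    close(fds[1]);
    return ok;
}

void main() {
    assert(openAndClosePipe());
}
```

Because the parameter is ref, the C function receives the address of fds, which is exactly what it expects from a decayed array, and the D side still documents and enforces the element count.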

Be careful with static arrays that are aliased. Take, for example, the following C declarations:

typedef float vec3[3];
void vec3_add(vec3 lhs, vec3 rhs, vec3 result);

When we translate the vec3 to D, it's going to look like this:

alias vec3 = float[3];

Once that is done and work begins on translating the function signatures, it's easy to forget that vec3 is actually a static array, especially if it's used in numerous functions. The parameters in vec3_add need to be declared as ref.

One more thing to consider is when const is applied to pointers used as function parameters and return types. For the parameters, the C side doesn't know or care anything about D const, so from that perspective it doesn't matter if the parameter is translated as const on the D side or not. But remember that const parameters serve as a bridge between unqualified, const, and immutable variables, allowing all three to be passed in the same parameter slot. If you don't add the const on the D side, you'll needlessly force callers to cast away const or immutable in some situations. This is particularly annoying when dealing with strings. The short of it is, always translate const function parameters as const.
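
As a small illustration, here is a hand-written declaration of the C standard library's strlen, which takes a const char* in C. With the const kept in the translation, both immutable and mutable character data can be passed without a cast.

```d
// In C: size_t strlen(const char *s);
extern(C) @nogc nothrow size_t strlen(const(char)* s);

void main() {
    // D string literals are immutable and zero-terminated.
    immutable(char)* lit = "hello".ptr;
    // A mutable buffer with an explicit terminator.
    char[6] buf = "world\0";

    assert(strlen(lit) == 5);       // immutable(char)* converts to const(char)*
    assert(strlen(buf.ptr) == 5);   // so does char*
}
```

Had the parameter been declared as char*, the first call would not compile without casting away immutable.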

It's also important to keep the const around when it is applied to return types. The C function is expecting that the contents of the pointer will not be modified. If const is not present on the D side, that contract is easily broken. The various C forms of const pointer declarations map to D as follows:

// In C
int const *;    // mutable pointer to const int
const int *;    // ditto
int * const;    // const pointer to mutable int
int const * const; // const pointer to const int
// In D
const(int)*     // The first two declarations above
const(int*)     // The last two -- const pointer to
                // mutable int isn't possible in D.

Symbols

Function parameters, struct members, and type aliases need to be named according to the rules set out in Chapter 2, Building a Foundation with D Fundamentals. It's not uncommon to see names in C that are keywords in D. For example, the previous vec3_add function could easily look like this:

void vec3_add(vec3 lhs, vec3 rhs, vec3 out);

Since out is a D keyword, it can't be used in the translation. The options are to drop the name entirely, or to use a different name. For struct members, dropping it is not an option, so the only choice is to change the name. For example, _out, out_, or anything that can distinguish it from the keyword. If you're trying to maintain conformance with the original C code, you'll want to make it as close to the original as possible.

That solution works for member variables and function parameters, but sometimes C functions might have a D keyword as a name. In this case, prepending an underscore isn't going to work if you're implementing a static binding. D defines a pragma, mangle, which solves the problem. Simply name the function anything you'd like and give the desired name to the pragma. Consider a C function named body. Translated to D:

pragma(mangle, "body") extern(C) void cbody();

We use cbody to avoid conflict with the D keyword body, but the pragma instructs the compiler to use body instead of cbody for its generated output.
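
The pragma can be tried against any C function. The following sketch binds the C standard library's abs under the D name c_abs. Note that abs is not a D keyword; it is used here only as an assumption-free way to get something that links and runs.

```d
// The D name is c_abs, but the symbol handed to the linker is "abs".
pragma(mangle, "abs") extern(C) int c_abs(int) @nogc nothrow;

void main() {
    assert(c_abs(-5) == 5);
}
```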

Global variables

Just as a function in C is usually separated into a prototype in a header and an implementation in a source module, so is a global variable. This is because anything that is implemented in a header file will be copied directly into every source module that includes that header. In order to have only one global instance, the prototype and implementation must be separate. Here's what a global variable might look like:

// foo.h
extern int g_foo;
// foo.c
int g_foo = 0;

For a static binding, there are three things that need to be accounted for in the translation of g_foo. One is the linkage attribute, since it affects the mangling of the symbol. If the variable is declared to have D linkage, the linker will never be able to find it. Another is the extern keyword. Note that extern(C) indicates a linkage attribute, but extern by itself, with no parentheses, tells the compiler that the symbol is not implemented in the current compilation unit, so it can leave it to the linker to sort things out.

The last thing at issue is something we touched on earlier in this chapter. Recall that variables in D have thread-local storage by default. This is not the case in C. Any global variable declared in C is accessible to all threads. This can't be forgotten when translating. In this case, shared is not an option, since it actually affects the type of the variable. The type must be the same as it is in C. So, once again, we turn to __gshared.

With all of that in mind, the translation of g_foo from C to D should look like this:

__gshared extern extern(C) int g_foo;

Substitute System or Windows for C as needed. If there are multiple global variables to declare, a pair of braces or a colon could be used:

__gshared extern extern(C) {
  int g_foo;
}
__gshared extern extern(C):
  int g_foo;
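
The same pattern can be tested against a global that a C library really exports. The sketch below assumes a POSIX C library, whose unistd.h declares extern int optind for use with getopt. The function readOptind is a hypothetical helper for the example.

```d
// Assumes a POSIX C library. In C: extern int optind;
__gshared extern extern(C) int optind;

// Hypothetical helper that reads the C global.
int readOptind() @nogc nothrow { return optind; }

void main() {
    // POSIX specifies that optind is initialized to 1.
    assert(readOptind() == 1);
}
```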

For dynamic bindings, the variable must be declared as a pointer. In this case, the linkage attribute is not necessary. Since the symbol is going to be loaded manually, having D linkage isn't going to hurt. Also, extern does not apply here. Since the variable is a pointer, it really is implemented on the D side. We can use the same getSymbol implementation we used for loading function pointers to load the address of the actual g_foo into the pointer.

The __gshared attribute isn't a strict requirement in this case, but it ought to be used to make things easier and faster. Remember, space will be reserved for a thread-local pointer in each thread, but it will not be set automatically to point at anything. If you don't want the complexity of calling getSymbol every time a thread is launched, use __gshared. Bear in mind that if it is not used and the pointer is thread-local, that does not affect what the pointer actually points to. Implementing a bunch of thread-local pointers to a global C variable may very well be begging for trouble.

There's one last thing to consider with global variables in dynamic bindings. Because the variable is declared as a pointer, the user will inevitably have to take this into account when assigning a value to it. After all, to set the value of a pointer, the dereference operator has to be used: *g_foo = 10. Not only does this break compatibility with any existing C code, it's very easy to forget. One solution is to use two wrapper functions that can be used as properties. Another is to use a single function that returns a reference.

So, our global variable in a dynamic binding could look like this:

private __gshared int* _foo;
int g_foo() { return *_foo; }
void g_foo(int foo) { *_foo = foo; }

Users can then do:

g_foo = 20;
writeln(g_foo);

This also makes for consistency between the static and dynamic version of a binding, if both are implemented.
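
The single-function alternative mentioned above, returning a reference, might look like the following sketch. There is no shared library to load here, so the pointer is aimed at a hypothetical stand-in variable, fakeCGlobal; in a real binding the address would come from the loader.

```d
// Hypothetical stand-in for the C global; a real binding would obtain
// this address from the shared library instead.
private __gshared int fakeCGlobal = 0;
private __gshared int* _foo;

// One accessor that returns a reference serves both reads and writes.
ref int g_foo() @nogc nothrow { return *_foo; }

void main() {
    _foo = &fakeCGlobal;    // stands in for loading the symbol

    g_foo = 20;             // writes through the pointer
    assert(g_foo == 20);
    g_foo() += 1;           // read-modify-write also works
    assert(fakeCGlobal == 21);
}
```

The trade-off is that a reference exposes the variable's address to callers, whereas the two-function form keeps the pointer fully hidden.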

Macros

Macros are a common sight in C headers. Here's an example:

#define FOURCC(a,b,c,d) (((d)<<24) | ((c)<<16) | ((b)<<8) | (a))

Note

Technically, anything implemented with #define is referred to as a macro, but in this text I'm using the term to refer to any #defined bit of code that doesn't establish type aliases or constant values, simply to differentiate between the various types of usage.

There are two options for translating a macro like this: make it a function, or make it a function template. Which approach is taken often boils down to personal preference. The only issue to be wary of is whether the template can be instantiated without the instantiation operator. If not, then existing C code can't be copied verbatim into D. For most macros, like the previous one, that shouldn't be an issue.

Sometimes macros include a cast to a specific type so that it's obvious what the translated function should return. Other times, it must be deduced. It may be possible to derive hints by looking at the C source or examples, or by using existing tools (such as gcc -E), though frequently we are left to figure things out on our own. In this case, given that the macro makes use of the full range of a 32-bit integer, we should choose uint. Then the translated function becomes:

uint FOURCC(uint a, uint b, uint c, uint d) {
  return (d << 24) | (c << 16) | (b << 8) | a;
}

Note that a static binding that only includes type and function declarations does not need to be linked at compile time; its modules only need be present on the import path. Adding function bodies means the binding now becomes a link-time dependency. This would not happen if FOURCC were implemented as a template.
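
For comparison, here is a sketch of the template form. The empty template parameter list is all it takes; the function is then instantiated in whichever module calls it, and it can still be called exactly like the C macro, including at compile time.

```d
// Empty template parameter list: instantiated at the point of use.
uint FOURCC()(uint a, uint b, uint c, uint d) {
    return (d << 24) | (c << 16) | (b << 8) | a;
}

// Character literals convert implicitly to uint, as they would in C.
enum RIFF = FOURCC('R', 'I', 'F', 'F');
static assert(RIFF == 0x46464952);

void main() {
    assert(FOURCC(1, 2, 3, 4) == 0x04030201);
}
```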

Not all macros are this straightforward. Sometimes you have to follow a chain of nested macros to figure out what's going on. That might mean implementing one function for each macro, or perhaps combining them all into one. It largely depends on how they are used on the C side. Sometimes, a macro is not intended to be used by users of the library, but is instead used only in other macros. Ultimately, this sort of thing is a judgment call.

Some macros can't be translated to functions easily. Consider the following:

#define STRINGIFY(s) #s
#define CASESTRING(c) case c: return STRINGIFY(c)

A hash (#) in front of a macro argument expands to the string form of whatever was given to the macro. Some C programmers would prefer to use return #c in the CASESTRING macro, but others would prefer to make it as clear as possible that a symbol is being converted into a string by using a helper such as STRINGIFY.

CASESTRING is a fairly common macro, the purpose of which is to take an enum member, use it in a case statement inside a switch, and return its string representation. Something like this:

switch(enumValue) {
  CASESTRING(BB_ONE);
  CASESTRING(BB_TWO);
  CASESTRING(BB_THREE);
  default: return "Undefined";
}

Macros such as CASESTRING and STRINGIFY are surely intended primarily for internal use in the C library. When they are in the public-facing headers, users of the library can make use of them, but they shouldn't be considered part of the library's API. Given that, and that they have no use in D, there's normally no need to try to translate them when creating a binding.
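
One reason such macros have no use in D is that the standard library can already produce the name of an enum member. A minimal sketch, using a hypothetical enum BB whose members mirror the names in the switch above:

```d
import std.conv : to;

// Hypothetical enum, mirroring the names used in the C switch.
enum BB { BB_ONE, BB_TWO, BB_THREE }

void main() {
    // to!string yields the member's name for a named enum member.
    assert(BB.BB_TWO.to!string == "BB_TWO");
}
```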

Sometimes macros are used to give a semblance of inheritance to C struct types:

#define OBJECTBASE \
  int type; \
  const char *name; \
  size_t size;
typedef struct {
  OBJECTBASE
} object_base_t;
typedef struct {
  OBJECTBASE
  int x, y, z;
} extended_object_t;

The backslash (\) at the end of the first three lines tells the compiler that the macro continues on the next line. We could choose not to implement an equivalent of OBJECTBASE on the D side and just manually add each field to every struct declaration that needs them, but that's error prone. It's better to go ahead and declare a template or string mixin and use that instead:

mixin template OBJECTBASE() {
  int type;
  const(char)* name;
  size_t size;
}
struct object_base_t {
  mixin OBJECTBASE;
}
struct extended_object_t {
  mixin OBJECTBASE;
  int x, y, z;
}

Sometimes, arguments to a macro are pasted together to form something new. This is akin to D's string mixins, though nowhere near as flexible. Most often, such macros are used for convenience, but sometimes they are used for a specific purpose, such as hiding implementation details. For example, the Win32 API makes use of many different types of object handles. Normally, these handles are aliased to void* with a #define, but when compiled with the preprocessor definition STRICT, they are aliased to something else completely:

#define DECLARE_HANDLE(n) typedef struct n##__{int i;}*n

When this macro is called with something like this:

DECLARE_HANDLE(HMODULE);

It expands to this:

typedef struct HMODULE__ {
    int i;
} *HMODULE;

The __ is pasted on to the macro argument with ## to form the struct name. Translating to D:

struct HMODULE__ {
  int i;
}
alias HMODULE = HMODULE__*;

Here, the struct name need not be HMODULE__. It can effectively be anything. The important bit is the alias. At any rate, whenever pasting with ## is encountered in a macro, careful attention needs to be given to what the macro is doing in order to decide if and how it needs to be translated.
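
If a header declares many handles this way, a string mixin can stand in for the macro. The following is only a sketch; declareHandle is a hypothetical helper, not something found in the Win32 headers or in any existing binding.

```d
// Hypothetical helper that builds the declarations as a string,
// pasting "__" onto the name much as ## does in the C macro.
string declareHandle(string name) {
    return "struct " ~ name ~ "__ { int i; } "
         ~ "alias " ~ name ~ " = " ~ name ~ "__*;";
}

mixin(declareHandle("HMODULE"));
mixin(declareHandle("HWND"));

// Each handle is a distinct pointer type, so they can't be mixed up.
static assert(is(HMODULE == HMODULE__*));
static assert(!is(HMODULE == HWND));

void main() {
    HMODULE h = null;
    assert(h is null);
}
```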

There are so many creative ways to use (or abuse) the C preprocessor that even someone who has been programming in C for more than 20 years can still learn new tricks. Thankfully, it's rare to encounter arcane preprocessor magic, so most of the macros you encounter will be fairly easy to translate. For those cases where you can't figure out quite what's going on, try looking up a tutorial on the C preprocessor or asking for help in the D forums.

Conditional compilation

In D we have version blocks and static if, but C programmers use the preprocessor for conditional compilation. This takes the form of #if, #ifdef, and #if defined. The #if directive is used to test the value of a defined constant:

#define DEBUG_MODE 1
#if DEBUG_MODE
// Debug code
#else
// Non-debug code
#endif

In this specific case, we could likely get away with debug {} in the D translation, while in others we'd want to use a version block with the same name as the C code. The #if directive can also be used with the > and < operators:

#if DEBUG_MODE > 2
// Debug mode code
#endif

This translates nicely to D as debug(3) {}.

The #ifdef directive tests whether something has been defined. It's frequently used to test for platform, CPU architecture, and even debug mode:

#ifdef _WIN32
// Windows code
#else
// Other platforms
#endif

_WIN32 is predefined by most C compilers when compiling on Windows. It's easily translatable as version(Windows). Keep in mind that _WIN32 is defined by C compilers even when compiling in 64-bit mode on Windows, while version(Win32) in D means compilation is targeting 32-bit Windows specifically.

#if defined allows multiple checks to be combined into one:

#if defined(linux) || defined(__FreeBSD__)
// Code specific to Linux and FreeBSD
#endif

Recall from Chapter 4, Running Code at Compile Time, that D does not allow Boolean version blocks. There, we saw a way to use static if to achieve the same result, but using version blocks, the previous code would look like this:

// Add this to the top of every module that needs it
version(linux) version = LinuxOrFreeBSD;
else version(FreeBSD) version = LinuxOrFreeBSD;
// Then elsewhere in the module...
version(LinuxOrFreeBSD) { }

Alternatively, the code could be duplicated for each platform.
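
Here is the version-based approach as a complete module, so it can be compiled on different platforms to see which branch is taken. The constant hasSharedCode is a hypothetical name introduced only to make the result observable.

```d
// Version assignments must appear before the version is tested.
version(linux) version = LinuxOrFreeBSD;
else version(FreeBSD) version = LinuxOrFreeBSD;

// Hypothetical constant recording which branch was compiled in.
version(LinuxOrFreeBSD) enum hasSharedCode = true;
else enum hasSharedCode = false;

void main() {
    version(linux) assert(hasSharedCode);
    version(FreeBSD) assert(hasSharedCode);
    version(Windows) assert(!hasSharedCode);
}
```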

It's often obvious how to translate predefined preprocessor macros like these to D, but it still helps to familiarize yourself with the predefined macros found in DMC, GCC, and the Microsoft compiler. Tests for custom defines, such as ENABLE_LOGGING, or ALLOW_PNG, are always translated to use version blocks.

One potential source of trouble to be aware of is something like this:

typedef struct {
    float x, y;
#ifdef ENABLE_3D
    float z;
#endif
} vertex_t;

The D translation is straightforward:

struct vertex_t {
    float x, y;
    version(ENABLE_3D) float z;
}

With this type, anything compiled with ENABLE_3D is going to be binary incompatible with anything that isn't. For a C library you control, this is a non-issue. On Windows, it's easy to compile the C library exactly how you want it and, if linking dynamically or using a dynamic binding, ship the DLL with your app. With a widely distributed library, particularly on a system such as Linux where a number of libraries are preinstalled and users often compile their own versions, the potential for breakage is high, especially when using a dynamic binding. The best thing to do in that scenario is to determine what the most common compile configuration is for the C library and use that as the default for your binding.
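
The incompatibility is easy to make visible by checking the size of the struct under each configuration. In this sketch, the version identifier would be supplied on the command line (for example, -version=ENABLE_3D with DMD).

```d
struct vertex_t {
    float x, y;
    version(ENABLE_3D) float z;
}

// The layout differs depending on how the module was compiled.
version(ENABLE_3D) static assert(vertex_t.sizeof == 12);
else static assert(vertex_t.sizeof == 8);

void main() {
    vertex_t v;
    v.x = 1.0f;
    v.y = 2.0f;
    assert(v.x + v.y == 3.0f);
}
```

If the C library was built with one setting and the binding with the other, every array of vertex_t passed across the boundary will be read with the wrong stride.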
