The first step that must be taken when implementing a binding to a C library is to decide whether it will be a static or dynamic binding. The former requires nothing more than the translation of all of the C types, global variables, and function signatures. The latter additionally requires a supporting API to load a shared library into memory. We'll cover both approaches in this section.
Once the type of binding has been settled, then begins the work of translating the C headers. This requires enough familiarity with C to understand not just the types and function signatures, but also the preprocessor definitions. While we'll cover some common preprocessor usage to look out for and how to translate it to D, there's not enough room here to provide a full tutorial on C preprocessor directives. Those readers unfamiliar with C are strongly advised to take the time to learn some of the basics before taking on the task of translating any C headers.
There are two tools in the D ecosystem that provide some amount of automated translation from C headers to D modules. htod is available at http://dlang.org/htod.html. It's built from the frontend of DMC, the Digital Mars C and C++ compiler. Another option is DStep, which uses libclang. It can be found at https://github.com/jacob-carlborg/dstep. Neither tool is perfect; it is often necessary to manually modify the output of either one, which makes the information in this chapter relevant. Furthermore, both tools only generate static bindings.
As we saw earlier, the declaration of function prototypes differs for static and dynamic bindings. The one trait they have in common is that they both need to be declared with a linkage attribute. Given that the functions are implemented in C, it's also normally safe to declare them with the @nogc and nothrow attributes. With that, in any module where the C function signatures are declared, you can first set aside a block like so:
extern(C) @nogc nothrow {
    // Function signatures go here
}
Replace extern(C) with extern(System) or extern(Windows) as required.
Now, consider the following hypothetical C function declaration in a C header:
extern int some_c_function(int a);
Let's put a declaration in our extern(C) block for a static binding:
extern(C) @nogc nothrow {
    int some_c_function(int a);
}
A key thing to remember in a static binding is that the function names on the D side must exactly match the function names on the C side. The linker will be doing the work of matching up the function symbols (either via static or dynamic linking) and it isn't smart enough to guess that someCFunction declared in D is the same as some_c_function declared in C. It doesn't know or care that they came from different languages. All it knows about is object files and symbols, so the symbols must be the same.
Another thing to consider is that the parameter names are optional. These only have meaning in the function implementation. In the function prototypes, only the types of the parameters matter. You can omit the parameter names completely, or change them to something else. Both of the following will happily link to the C side:
int some_c_function(int bugaloo);
int some_c_function(int);
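To try a static binding without building a C library first, here is a minimal sketch that binds two functions from the C standard library. The choice of abs and toupper is ours for illustration, and the sketch assumes the C runtime is linked in, which is the default for D programs. One prototype keeps its parameter name and the other omits it.

```d
// Hand-written static binding to two C standard library functions.
// Assumption: the C runtime is linked by default, so no extra linker flags are needed.
extern(C) @nogc nothrow {
    int abs(int value);   // parameter name kept
    int toupper(int);     // parameter name omitted
}

void main() {
    // The linker resolves both symbols by name, so the D names match the C names.
    assert(abs(-7) == 7);
    assert(toupper('d') == 'D');
}
```

Renaming either declaration to something like cAbs would compile, but the link step would fail because no such symbol exists on the C side.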
If you intend to add Ddoc comments to your binding, it's a good idea to keep the parameter names to make the documentation more clear. It's also best practice to use the same parameter names as the original C library, though I doubt anyone would fault you for changing the names for clarity where appropriate. Additionally, keeping the parameter names makes it possible to use them with compile-time reflection.
If the original C library is well documented, it's reasonable to point users of your binding to the original documentation rather than providing your own. After all, implementing Ddoc comments for the C API means you have to make sure that you don't contradict the original documentation. Then you have to make sure it all stays in sync. That seems like a waste of effort when a perfectly good set of documentation already exists. Ideally, users of your binding would never need to look at its implementation to understand how to use it. They can get everything they need from the C library documentation and examples.
In a dynamic binding, the declaration might look like this:
extern(C) @nogc nothrow {
    int function(int) someCFunction;
}
Here, we've declared a function pointer instead of a function with no body. Again, the parameter name is not required and, in this case, we've left it out. Moreover, we don't need to use the original function name. In a dynamic binding, the linker plays no role in matching symbols. The programmer will load the shared library, fetch the function address, and assign it to the function pointer manually. That said, you really don't want to use the real function name in this case.
The problem is that the variable that holds the function pointer is also declared as extern(C), which means the symbols for the function pointer and the C function will be identical. It may work most of the time on most platforms, but there is potential for things to blow up. I can tell you from experience that with many libraries it will cause errors during application startup on Linux. If you want to keep the same name, use an alias:
extern(C) @nogc nothrow {
    alias p_some_c_function = int function(int);
}
__gshared p_some_c_function some_c_function;
We'll see why __gshared is used here shortly.
If we were making a static binding to this single-function C library, we would be finished. All we would need to do is import the module containing the function declaration and link with the C library, either statically or dynamically, when we build the program. For a dynamic binding, however, there's still more work to do.
A function pointer is useless until it has been given the address of a function. In order to get the address, we have to use the system API to load the library into memory. On Windows, that means using the functions LoadLibrary, GetProcAddress, and FreeLibrary. On the other systems DMD supports, we need dlopen, dlsym, and dlclose. A barebones loader might look something like this, which you can save as $LEARNINGD/Chapter09/clib/loader.d (we'll make use of it shortly):
module loader;
import std.string;
version(Posix) {
    import core.sys.posix.dlfcn;
    alias SharedLibrary = void*;
    // Returns null if the library cannot be loaded.
    SharedLibrary loadSharedLibrary(string libName) {
        return dlopen(libName.toStringz(), RTLD_NOW);
    }
    void unload(SharedLibrary lib) {
        dlclose(lib);
    }
    // Returns null if the symbol cannot be found.
    void* getSymbol(SharedLibrary lib, string symbolName) {
        return dlsym(lib, symbolName.toStringz());
    }
}
else version(Windows) {
    import core.sys.windows.windows;
    alias SharedLibrary = HMODULE;
    SharedLibrary loadSharedLibrary(string libName) {
        import std.utf : toUTF16z;
        return LoadLibraryW(libName.toUTF16z());
    }
    void unload(SharedLibrary lib) {
        FreeLibrary(lib);
    }
    void* getSymbol(SharedLibrary lib, string symbolName) {
        return GetProcAddress(lib, symbolName.toStringz());
    }
}
else static assert(0, "SharedLibrary unsupported on this platform.");
With that in hand, we can do something like this to load a library:
auto lib = loadSharedLibrary(libName);
if(!lib) throw new Exception("Failed to load library " ~ libName);
To load a function pointer that is not aliased, such as someCFunction which we saw previously, we can't just call getSymbol and assign the return value directly to the function pointer. The return type is void*, so the compiler requires a cast to the function pointer type. However, the type used with the cast operator must include the linkage and function attributes used in the declaration of the function pointer. It's not possible to include all of that directly in the cast. There are different ways to handle it.
The simplest thing to do in our case is to call typeof on someCFunction and use the result as the type in the cast:
someCFunction = cast(typeof(someCFunction))lib.getSymbol("some_c_function");
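The typeof cast can be tried without a custom library. The following is a Posix-only sketch under one assumption: that passing null to dlopen yields a handle through which the already-loaded C runtime can be searched. The names cStrlen and loadStrlen are hypothetical and exist only for this illustration.

```d
// Posix-only sketch of loading one function for a dynamic binding.
import core.sys.posix.dlfcn : dlopen, dlsym, RTLD_NOW;

extern(C) @nogc nothrow {
    // Hypothetical pointer name, deliberately different from the C symbol.
    __gshared size_t function(const(char)*) cStrlen;
}

// Fetches the address of the C function strlen and stores it in cStrlen.
bool loadStrlen() {
    // Assumption: a null file name gives a handle for the running program.
    void* lib = dlopen(null, RTLD_NOW);
    if (lib is null) return false;
    // typeof supplies the linkage and attributes that the cast needs.
    cStrlen = cast(typeof(cStrlen)) dlsym(lib, "strlen");
    return cStrlen !is null;
}

void main() {
    if (loadStrlen())
        assert(cStrlen("binding") == 7);
}
```

Checking the pointer for null before the first call matters here, because calling through a pointer that was never loaded would crash the program.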
Things are different when the aliased form is used:
extern(C) @nogc nothrow {
    alias p_some_c_function = int function(int);
}
__gshared p_some_c_function some_c_function;
With this, we can then load the function like so:
some_c_function = cast(p_some_c_function)lib.getSymbol("some_c_function");
One issue with this approach is that some_c_function, being a variable, has thread-local storage like any other variable in D. This means that in a multi-threaded app, every thread will have a copy of the pointer, but it will be null in all of them except for the one in which the library is loaded. There are two ways to solve this. The hard way is to make sure that getSymbol is called once in every thread. The easy way is to add __gshared to the declaration as we have done.
We'll dig into this a little more in Chapter 11, Taking D to the Next Level, but there are two ways to make a variable available across threads in D: __gshared and shared. The former has no guarantees. All it does is put the variable in global storage, making it just like any global variable in C or C++. The latter actually affects the type of the variable; the compiler is able to help prevent the variable from being used in ways it shouldn't be.
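The difference between thread-local and __gshared storage is easy to observe with plain integers. This is a small sketch; the names tlsValue, globalValue, and readFromNewThread are made up for the illustration.

```d
import core.thread : Thread;

int tlsValue;               // thread-local: each thread gets its own copy
__gshared int globalValue;  // one instance shared by every thread

// Starts a thread and reports the values that thread observes.
int[2] readFromNewThread() {
    int[2] seen;
    auto worker = new Thread({
        seen[0] = tlsValue;     // the new thread's own, default-initialized copy
        seen[1] = globalValue;  // the single process-wide instance
    });
    worker.start();
    worker.join();
    return seen;
}

void main() {
    tlsValue = 1;
    globalValue = 2;
    auto seen = readFromNewThread();
    assert(seen[0] == 0);  // the write to tlsValue in main is not visible
    assert(seen[1] == 2);  // the write to globalValue is
}
```

A function pointer left in thread-local storage behaves like tlsValue here: it stays null in every thread that did not load it.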
In the $LEARNINGD/Chapter09/clib directory of the book's sample source distribution, you'll find a C source file, clib.c, which looks like this:
#include <stdio.h>
#ifdef _MSC_VER
__declspec(dllexport)
#endif
int some_c_function(int a) {
    printf("Hello, D! From C! %d\n", a);
    return a + 20;
}
This is accompanied by four Windows-specific binaries: clib32.lib, clib32.dll, clib64.lib, and clib64.dll. The library files are import libraries, not static libraries, intended to be linked with the static binding. Because they are import libraries, each has a runtime dependency on its corresponding DLL.
If you are working on a platform other than Windows, you can use GCC (or clang, if it is your system compiler) to compile the corresponding version of the shared library for your system. The following command line should get the job done:
gcc -shared -o libclib.so -fPIC clib.c
You'll also find loader.d, the implementation of which we saw previously, and a D module named dclib.d. The top part of this module provides both a static and dynamic binding to some_c_function. The rest shows how to use the two versions of the binding. The implementation is:
extern(C) @nogc nothrow {
    version(ClibDynamic)
        int function(int) some_c_function;
    else
        int some_c_function(int);
}
void main() {
    version(ClibDynamic) {
        import loader;
        version(Win64) enum libName = "clib64.dll";
        else version(Win32) enum libName = "clib32.dll";
        else enum libName = "libclib.so";
        auto lib = loadSharedLibrary(libName);
        if(!lib) throw new Exception("Failed to load library " ~ libName);
        some_c_function = cast(typeof(some_c_function))lib.getSymbol("some_c_function");
        if(!some_c_function) throw new Exception("Failed to load some_c_function");
    }
    import std.stdio : writeln;
    writeln(some_c_function(10));
}
The command line used to compile all of this depends on the platform and linker you are using. Compiling to use the static binding in the default 32-bit mode on Windows:
dmd dclib.d clib32.lib -oftestStatic
This uses the static binding, links with clib32.lib, and creates an executable named testStatic.exe. To see the sort of error a user would get when the DLL is missing, temporarily rename clib32.dll and execute the program. To test the dynamic binding, use this command line:
dmd -version=ClibDynamic dclib.d loader.d -oftestDynamic
This time, we don't link to anything, but need to compile loader.d along with the main module. We specify the version ClibDynamic to trigger the correct code path and we output the binary as testDynamic.exe to avoid mixing it up with testStatic. Once again, temporarily rename clib32.dll and see what happens. When manually loading a shared library like this, the loader also needs to manually handle the case where loading fails. One benefit of this approach is that it provides the opportunity to display a message box with a user-friendly message instructing the user on how to solve the problem, or provide a link to a web page that does.
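As a rough illustration of handling a failed load gracefully, the following Posix-only sketch builds a readable message instead of crashing. The helper name describeLoadFailure, the wording of the message, and the library name are all assumptions made for this example; a real application might show the text in a message box.

```d
// Posix-only sketch. describeLoadFailure is a hypothetical helper name.
import core.sys.posix.dlfcn : dlopen, dlclose, dlerror, RTLD_NOW;
import std.string : fromStringz, toStringz;

// Returns null when the library loads, otherwise a message suitable for display.
string describeLoadFailure(string libName) {
    void* lib = dlopen(libName.toStringz(), RTLD_NOW);
    if (lib !is null) {
        dlclose(lib);
        return null;
    }
    // dlerror reports the reason for the most recent failure, if one is known.
    const(char)* reason = dlerror();
    string message = "Could not load " ~ libName
        ~ ". Make sure it is installed and on the library search path.";
    if (reason !is null)
        message ~= " (" ~ fromStringz(reason).idup ~ ")";
    return message;
}

void main() {
    import std.stdio : writeln;
    // A deliberately bogus library name, used to trigger the failure path.
    auto problem = describeLoadFailure("libsurely_not_installed_anywhere.so");
    if (problem !is null)
        writeln(problem);
}
```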
Compiling in 64-bit mode with the MS linker is similar:
dmd -m64 dclib.d clib64.lib -oftestStatic64
dmd -m64 -version=ClibDynamic dclib.d loader.d -oftestDynamic64
Again, we're using distinct file names to avoid overwriting the 32-bit binaries.
When compiling the static binding with GCC on other platforms, we need to tell the linker to look for the library in the current directory, as it is not on the library search path by default. Passing -L-L. will make that happen. Then we can use -L-lclib to link the library:
dmd -L-L. -L-lclib dclib.d -oftestStatic
Compiling the dynamic binding is almost the same as on Windows, but on Linux (not Mac or the BSDs) we have to link with libdl to have access to dlopen and friends:
dmd -version=ClibDynamic -L-ldl dclib.d loader.d -oftestDynamic
When executing either version at this point, you will most likely see an error telling you that libclib.so can't be found. Unlike Windows, Unix-like systems are generally not configured to search for shared libraries in the executable's directory. In order for the library to be found, you can copy it to one of the system paths (such as /usr/lib) or, preferred for this simple test case, temporarily add the executable directory to the LD_LIBRARY_PATH environment variable. Assuming you are working in ~/LearningD/Chapter09/clib, then the following command will do it:
export LD_LIBRARY_PATH=~/LearningD/Chapter09/clib:$LD_LIBRARY_PATH
With that, you should be able to execute ./testStatic and ./testDynamic just fine.
No matter the platform or linker, a successful run should print these lines to the console:
Hello, D! From C! 10
30
Getting from C to D in terms of types is rather simple. Most of the basic types are directly equivalent, as you can see from the following table:
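C types | D types |
---|---|
void | void |
signed char | byte |
unsigned char | ubyte |
short | short |
unsigned short | ushort |
int | int |
unsigned int | uint |
long | c_long |
unsigned long | c_ulong |
long long | long |
unsigned long long | ulong |
float | float |
double | double |
long double | c_long_double |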
There are a few entries in this table that warrant explanation. First, the translation of signed char and unsigned char to byte and ubyte applies when the C types are used to represent numbers rather than strings. However, it's rare to see C code with char types explicitly declared as signed. The reason it appears in this table is that the C specification does not specify whether the default char type is signed or unsigned, but rather leaves it implementation defined. In practice, most C compilers implement the default char as a signed type, which matches the default for other types (short, int, and long), but it's still possible to encounter libraries that have been compiled with the default char type set to unsigned; GCC supports the -funsigned-char command line switch that does just that. So while it's generally safe to treat the default C char as signed, be on the lookout for corner cases.
The size of the C long and unsigned long types can differ among C compilers. Some implement them as 32-bit types and others as 64-bit. To account for that difference, it's best to import the DRuntime module core.stdc.config and use the c_long and c_ulong types, which match the size of the long and unsigned long types implemented by the backend. When it comes to long double, you may come across some documentation or an old forum post that recommends translating it to real in D. Once upon a time, that was the correct thing to do, but that has not been true since DMD gained support for the Microsoft toolchain. There, the size of long double is 64 bits, rather than 80. To translate this type, import core.stdc.config and use c_long_double. This list of special cases could grow as support is added for more compilers and platforms.
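As a small sketch of c_long and c_ulong in use, the following binds two C standard library functions whose prototypes involve long and unsigned long. The choice of labs and strtoul is ours for illustration.

```d
import core.stdc.config : c_long, c_ulong;

extern(C) @nogc nothrow {
    c_long labs(c_long value);                                 // C: long labs(long);
    c_ulong strtoul(const(char)* text, char** end, int base);  // C: unsigned long strtoul(const char*, char**, int);
}

void main() {
    assert(labs(-42) == 42);
    assert(strtoul("255", null, 10) == 255);
    // Whatever size the C compiler picked, the two aliases agree with each other.
    static assert(c_long.sizeof == c_ulong.sizeof);
}
```

Had the prototypes used D's long and ulong directly, they would only be correct on platforms where the C long happens to be 64 bits wide.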
Don't define your own
If, for whatever reason, you're tempted to avoid importing core.stdc.config and declare your own alias for c_long_double to treat it as a double when using the MS backend, please don't. The compiler specially recognizes c_long_double when it's used with the MS backend and generates special name mangling for instances of that type. Using anything else could break ABI compatibility.
There are two character types in C, char and wchar_t. Strings in C are represented as arrays of either type, most often referred to as char* and wchar_t* strings. The former can be translated to D directly as char*, although some prefer to translate it as ubyte* to reflect the fact that the D char type is always encoded as UTF-8, while there is no such guarantee on the C side. In practice, this is more an issue of how instances of the type are used than of how they are translated.
The wchar_t type can't be directly translated. The issue is that the size and encoding of wchar_t is not consistent across platforms. On Windows, it is a 2-byte value encoded as UTF-16, while on other platforms it is a 4-byte value encoded as UTF-32. There is no wrapper type in core.stdc.config, but in this case it's easy to resolve:
version(Windows) alias wchar_t = wchar;
else alias wchar_t = dchar;
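To see the alias at work, here is a minimal sketch that binds the C standard library function wcslen. Picking wcslen is our choice for illustration; the explicit terminator in the literal is there so the example does not depend on how the string is stored.

```d
version(Windows) alias wchar_t = wchar;
else alias wchar_t = dchar;

// C: size_t wcslen(const wchar_t *s);
extern(C) @nogc nothrow size_t wcslen(const(wchar_t)* text);

void main() {
    // The literal takes on the element type of wchar_t; the explicit \0 terminates it for C.
    immutable(wchar_t)[] text = "hello\0";
    assert(wcslen(text.ptr) == 5);
}
```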
The types size_t and ptrdiff_t are often used in C. These are types that are not part of the language, but are defined in the standard library. D also provides aliases that correspond exactly to the type and size of each as defined in the relevant C compiler, so a direct translation is appropriate. They are always globally available, so no special import is required.
Complex types have been a part of C since C99. There are three types, float _Complex, double _Complex, and long double _Complex. In D, these translate to, respectively, cfloat, cdouble, and creal. The functions found in the C header file complex.h are translated to D in the DRuntime module core.stdc.complex. It's expected that these three complex types will be deprecated at some point in the future, to be replaced by the type std.complex.Complex, which is usable now.
C99 also specifies a Boolean type, _Bool, which is typedef'ed to bool in stdbool.h. The specification requires only that the type be large enough to hold the values 0 and 1. The C compilers we need to worry about for DMD implement _Bool as a 1-byte type. As such, the C _Bool or bool can be translated directly to D bool. Older versions of GCC implemented it as a 4-byte type, so be on the lookout if you're ever forced to compile a D program against C libraries compiled with GCC 3.x.
C99 also introduced stdint.h to the C standard library. This provides a number of typedefs for integer types of a guaranteed size, for example, int8_t, uint8_t, int16_t, uint16_t, and so on. When you encounter these in a C header, you have two options for translation. One option is just to translate them to the D type of the same size. For example, int8_t and uint8_t would translate to byte and ubyte. The other option is to import core.stdc.stdint and use the C types directly.
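Both options produce the same types, which a short sketch can confirm at compile time. The pixel struct below is a hypothetical C type invented for this example.

```d
import core.stdc.stdint : int8_t, uint8_t, int16_t, uint16_t;

// Hypothetical C struct: struct pixel { uint8_t r, g, b, a; uint16_t x, y; };
struct pixel {
    uint8_t r, g, b, a;
    uint16_t x, y;
}

// The stdint names are plain aliases for the D types of the same size.
static assert(is(int8_t == byte) && is(uint8_t == ubyte));
static assert(is(int16_t == short) && is(uint16_t == ushort));
static assert(pixel.sizeof == 8);

void main() {}
```

Keeping the stdint names in the binding makes it easier to compare the D declarations against the C header line by line.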
The C enum and the D enum are equivalent, so a direct translation is possible. An example:
// In C
enum { BB_ONE, BB_TWO, BB_TEN = 10 };
// In D
enum { BB_ONE, BB_TWO, BB_TEN = 10 }
Some thought needs to be given toward how to translate named enums. Consider the following:
typedef enum colors_t { COL_RED, COL_GREEN, COL_BLUE } colors_t;
It may seem that a direct translation to D would look like this:
enum colors_t { COL_RED, COL_GREEN, COL_BLUE }
However, that isn't an accurate translation. There is no notion of enum namespaces in C, but to access the members of this enum in D would require using the colors_t namespace, for example, colors_t.COL_RED. The following would be more appropriately called a direct translation:
alias colors_t = int;
enum { COL_RED, COL_GREEN, COL_BLUE }
Now this can be used exactly as the type is used on the C side, which is important when you want to maintain compatibility with existing C code. The following approach allows for both C and D styles:
enum Colors { red, green, blue, }
alias colors_t = Colors;
enum {
    COL_RED = Colors.red,
    COL_GREEN = Colors.green,
    COL_BLUE = Colors.blue,
}
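A short usage sketch shows that both naming styles refer to the same values and the same type. The helper colorName is hypothetical and exists only to have something to call.

```d
enum Colors { red, green, blue }
alias colors_t = Colors;
enum {
    COL_RED = Colors.red,
    COL_GREEN = Colors.green,
    COL_BLUE = Colors.blue,
}

// Hypothetical D-side helper that accepts the type under its C name.
string colorName(colors_t color) {
    final switch (color) {
        case Colors.red:   return "red";
        case Colors.green: return "green";
        case Colors.blue:  return "blue";
    }
}

void main() {
    assert(colorName(COL_GREEN) == "green");   // C-style name
    assert(colorName(Colors.blue) == "blue");  // D-style name
    // The C-style constants carry the named enum type, not plain int.
    static assert(is(typeof(COL_RED) == Colors));
}
```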
The D struct is binary compatible with the C struct, so here too, most translations are direct. An example is:
// In C
struct point { int x, y; };
// In D
struct point { int x, y; }
The only difference between these two types shows up in usage. In C, anywhere the point type is to be used, the struct keyword must be included, for example, in the declaration struct point p. Many C programmers prefer to use a typedef for their struct types, which eliminates the need for the struct keyword in variable declarations and function parameters by creating an alias for the type. This typically takes one of two forms:
// Option 1
typedef struct point point_t;
struct point { int x, y; };
// Option 2
typedef struct { int x, y; } point_t;
Option 2 is shorthand for Option 1, with the caveat that point_t is not usable inside the braces. A good example of where this comes into play is a linked list node:
// First version: external typedef
typedef struct node_s node_t;
struct node_s {
    void *item;
    node_t *next;
};
// Second version: typedef combined with the struct declaration
typedef struct node_s {
    void *item;
    struct node_s *next;
} node_t;
Note that in the first version of node_s, the typedef'ed name is used. This is perfectly legal when using an external typedef, but it isn't possible in the second version. There, since node_t is not visible inside the braces, the struct keyword cannot be omitted in the declaration of the member next. In D, the two types look like this, regardless of which approach was used in their C declarations:
struct point_t { int x, y; }
struct node_t {
    void* item;
    node_t* next;
}
When any C struct has a typedef'ed alias, the alias should always be preferred in the D translation.
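A real instance of a typedef'ed struct is div_t from the C standard library, returned by value from div. The sketch below assumes the common layout in which quot comes before rem; the C standard leaves the member order to the implementation, so check the header of the C runtime you target.

```d
// C: typedef struct { int quot; int rem; } div_t;
// Assumption: quot is declared before rem, as in the common C runtimes.
struct div_t {
    int quot;
    int rem;
}

// C: div_t div(int numer, int denom);
extern(C) @nogc nothrow div_t div(int numer, int denom);

void main() {
    auto result = div(17, 5);
    assert(result.quot == 3 && result.rem == 2);
}
```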
It's possible to have multiple type aliases on a single C struct. This is most often used to declare both a value type and a pointer type:
typedef struct { int x, y; } point_t, *pointptr_t;
In D, we would translate this as:
struct point_t { int x, y; }
alias pointptr_t = point_t*;
Sometimes, a C struct is aliased only to a pointer and without a struct name. In that case, only pointers to that type can be declared:
typedef struct { int i; } *handle_t;
Most often, when this form is used, the members of the struct are only intended to be used internally. If that's the case, the struct can be declared like this on the D side:
struct _handle;
alias handle_t = _handle*;
We could declare _handle as an empty struct, but by omitting the braces we prevent anyone from declaring any variables of type _handle. Moreover, no TypeInfo is generated in this case, but it would be with an empty struct. These days, most C programmers would likely not implement a handle type like this. A more likely implementation today would look like this:
typedef struct handle_s handle_t;
In the public-facing API, there is no implementation of handle_s. It is hidden away inside one of the source modules. Given only the header file, the compiler assumes that struct handle_s is implemented somewhere and will let the linker sort it out. However, without the implementation handy, the compiler has no way of determining the size of a handle_t. As such, it will only allow the declaration of pointers. The C API will then contain a number of functions that look like this:
handle_t* create_handle(int some_arg);
void manipulate_handle(handle_t *handle, int some_arg);
void destroy_handle(handle_t *handle);
In D, we can declare handle_t the same way we declared _handle previously:
struct handle_t;
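The effect of the opaque declaration can be checked at compile time. In this sketch the three prototypes mirror the hypothetical C functions from the header excerpt; they are never called, so no C library is needed to build it.

```d
// Opaque type: declared without a body, so only pointers to it can exist.
struct handle_t;

extern(C) @nogc nothrow {
    // Prototypes for the hypothetical C API; declared but never called here.
    handle_t* create_handle(int some_arg);
    void manipulate_handle(handle_t* handle, int some_arg);
    void destroy_handle(handle_t* handle);
}

// Pointers to the opaque type are fine...
static assert(__traits(compiles, { handle_t* h; }));
// ...but instances are rejected because the size is unknown.
static assert(!__traits(compiles, { handle_t h; }));

void main() {}
```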
Another C idiom that isn't so common these days, but may still be encountered now and again, is the inclusion of an array declaration in the declaration of the struct type itself. For example:
struct point { int x, y; } points[3] = {
    {10, 20}, {30, 40}, {50, 60}
};
D does not support this syntax. The array must be declared separately:
struct point { int x, y; }
point[3] points = [
    point(10, 20), point(30, 40), point(50, 60)
];
While C pointers are directly translatable to D, it pays to keep in mind the difference in declaration syntax. Consider the declarations of these two variables in C:
int* px, x;
This is not a declaration of two int pointers, but rather one int pointer, px, and one int. In a perfect world, all C programmers would conform to a style that brings a little clarity:
int *px, x;
Or, better still, declare the variables on separate lines. As it stands, there are a variety of styles that must be interpreted when reading C code. At any rate, the previous declarations in D must be separated:
int* px;
int x;
Always remember that a pointer in a variable declaration in D is associated with the type, not the variable.
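A quick compile-time check makes the rule concrete. In this sketch, a single D declaration with two names produces two pointers, unlike the C declaration it resembles.

```d
void main() {
    int* px, py;   // in D, both px and py are pointers to int
    static assert(is(typeof(px) == int*));
    static assert(is(typeof(py) == int*));

    int x = 5;
    px = &x;
    py = px;
    assert(*py == 5);
}
```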
It's not uncommon to see type aliases in C libraries. One common use is to define fixed-size integers. Since the C99 standard was released, such types have been available in stdint.h, but not all C compilers support C99. A great many C libraries are still written against the C89 standard for the widest possible platform support, so you will frequently encounter typedef'ed and #define'd aliases for integer types to hide any differences in type sizes across platforms. Here are a couple of examples:
typedef signed char Sint8;
typedef unsigned char Uint8;
Despite the name, the C typedef does not create a new type, only an alias. Whenever the compiler sees Sint8, it effectively replaces it with signed char. The following defines have the same effect, but are handled by the preprocessor rather than the compiler:
#define Sint8 signed char
#define Uint8 unsigned char
The preprocessor parses a source module before the compiler does, substituting every instance of Sint8 and Uint8 with signed char and unsigned char. The typedef approach is generally preferred and is much more common. Both approaches can be translated to D using alias declarations:
alias Sint8 = byte;
alias Uint8 = ubyte;
It is not strictly necessary to translate type aliases, as the actual types, byte and ubyte in this case, can be used directly. But again, maintaining conformance with the original C library should always be a priority. It also minimizes the risk of introducing bugs when translating function parameters, struct members, or global variables.
In C libraries, function pointers are often declared for use as callbacks and to simulate struct member functions. They might be aliased with a typedef, but sometimes they aren't. For example:
typedef struct {
    void* (*alloc)(size_t);
    void (*dealloc)(void*);
} allocator_t;
void set_allocator(allocator_t *allocator);
And using type aliases:
typedef void* (*AllocFunc)(size_t);
typedef void (*DeallocFunc)(void*);
typedef struct {
    AllocFunc alloc;
    DeallocFunc dealloc;
} allocator_t;
void set_allocator_funcs(AllocFunc alloc, DeallocFunc dealloc);
Sometimes, they are declared as function parameters:
void set_alloc_func(void* (*alloc)(size_t));
There are a couple of things to remember when translating these to D. First, callbacks should always follow the same calling convention they have in the C header, meaning they must be given the appropriate linkage attribute. Second, they should probably always be marked with the nothrow attribute for reasons that will be explained later in the chapter, but it isn't always quite so clear whether or not to use @nogc.
Function pointers intended for use as callbacks aren't intended to be called in D code. The pointers will be handed off to the C side and called from there. From that perspective, it doesn't matter whether they are marked @nogc or not, as the C side can call the function through the pointer either way. However, it makes a big difference for the user of the binding. @nogc means they won't be able to do something as common as calling writeln to log information from the callback. In our specific example, it's not a bad thing for the user to want the AllocFunc implementation to allocate GC memory (as long as they keep track of it). Consider carefully before adding @nogc to a callback, but, as a general rule, it's best to lean toward omitting it.
Careful consideration should also be given to function pointers that aren't callbacks, but are intended for use as struct members. These may actually be called on the D side and may need to be called from @nogc functions. In this case, it might make sense to mark them as @nogc. Doing so prevents any GC allocations from taking place in the implementations, but not doing so prevents them from being called by other @nogc functions. Consider how the type is intended to be used, and what tasks the function pointers are intended to perform, and use that to help guide your decision. Of course, if the function pointers are set to point at functions on the C side, then go ahead and add @nogc and nothrow to your heart's content.
With that, we can translate each of the previous declarations. The first looks like this:
struct allocator_t {
extern(C):
    void* function(size_t) alloc;
    void function(void*) dealloc;
}
The function set_allocator can be translated directly. From the second snippet, allocator_t and set_allocator_funcs can be translated directly. AllocFunc and DeallocFunc become aliases:
extern(C) nothrow {
    alias AllocFunc = void* function(size_t);
    alias DeallocFunc = void function(void*);
}
Finally, the function set_alloc_func could be translated like this (using the form for a static binding):
extern(C) @nogc nothrow {
    void set_alloc_func(void* function(size_t));
}
In this situation, a function pointer declared as a function parameter picks up the extern(C) linkage, but does not pick up the two function attributes. If you want the callback implementation to be nothrow, you'll have to declare it like this:
extern(C) @nogc nothrow {
    void set_alloc_func(void* function(size_t) nothrow);
}
It may be preferable to go ahead and alias the callback anyway:
extern(C):
alias AllocFunc = void* function(size_t) nothrow;
void set_alloc_func(AllocFunc) @nogc nothrow;
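The same pattern can be exercised against a real C function that takes a callback. This sketch binds qsort from the C standard library, which is our choice for illustration; CompareFunc, compareInts, and sortedCopy are hypothetical names. Following the advice above, the callback type is nothrow but not @nogc.

```d
extern(C) {
    alias CompareFunc = int function(const(void)*, const(void)*) nothrow;
    // C: void qsort(void *base, size_t nmemb, size_t size,
    //               int (*compar)(const void *, const void *));
    void qsort(void* base, size_t nmemb, size_t size, CompareFunc compar) nothrow;
}

// Implemented in D but called from C, so it carries C linkage and nothrow.
extern(C) int compareInts(const(void)* a, const(void)* b) nothrow {
    immutable x = *cast(const(int)*) a;
    immutable y = *cast(const(int)*) b;
    return x < y ? -1 : (x > y ? 1 : 0);
}

// Sorts a copy of the input by handing the D callback to the C function.
int[] sortedCopy(const(int)[] values) {
    auto copy = values.dup;
    qsort(copy.ptr, copy.length, int.sizeof, &compareInts);
    return copy;
}

void main() {
    assert(sortedCopy([3, 1, 2]) == [1, 2, 3]);
}
```

If compareInts were declared without extern(C), taking its address would yield a pointer with D linkage and the call to qsort would not compile.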
Despite the presence of an enum type in C, it's not uncommon for C programmers to use the preprocessor to define constant values. A widely used library that does this is OpenGL. Just take a peek at any implementation of the OpenGL headers and you'll be greeted with a massive list of #define directives associating integer literals in hexadecimal with names such as GL_TEXTURE_2D. Such constants need not be in hexadecimal format, nor do they need to be integer literals. For example:
#define MAX_BUFFER 2048
#define INVALID_HANDLE 0xFFFFFFFF
#define UPDATE_INTERVAL (1.0/30.0)
#define ERROR_STRING "You did something stupid, didn't you?"
All of these can be translated to D as manifest constants:
enum MAX_BUFFER = 2048;
enum INVALID_HANDLE = 0xFFFFFFFF;
enum UPDATE_INTERVAL = 1.0/30.0;
enum ERROR_STRING = "You did something stupid, didn't you?";
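It can be worth checking which types the compiler infers for translated constants, since a C macro carries no type of its own. The sketch below repeats the four constants and verifies the inferred types at compile time.

```d
enum MAX_BUFFER = 2048;
enum INVALID_HANDLE = 0xFFFFFFFF;
enum UPDATE_INTERVAL = 1.0/30.0;
enum ERROR_STRING = "You did something stupid, didn't you?";

static assert(is(typeof(MAX_BUFFER) == int));
static assert(is(typeof(INVALID_HANDLE) == uint));   // too large for int, so it is typed uint
static assert(is(typeof(UPDATE_INTERVAL) == double));
static assert(is(typeof(ERROR_STRING) == string));

void main() {
    ubyte[MAX_BUFFER] buffer;  // manifest constants are usable at compile time
    assert(buffer.length == 2048);
}
```

When the C API expects a particular type for a constant, giving the enum an explicit type (for example, enum uint INVALID_HANDLE) keeps the translation from depending on inference.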
When translating function parameters and return types to D, everything that has been said about types so far in this chapter applies. An int is an int, a float is a float, and so on. As mentioned earlier, parameter names can be included or omitted as desired. The important part is that the D types match the C types. However, there is one type of parameter that needs special attention: the static array.
Consider the following C function prototype:
void add_three_elements(float array[3]);
In C, this signature does not cause three floats to be copied when this function is called. Any array passed to this function will still decay to a pointer. Moreover, it may contain fewer than or more than three elements. In short, it isn't different from this declaration:
void add_three_elements(float *array);
A little-known variation is to use the static keyword inside the array brackets:
void add_three_elements(float array[static 3]);
This tells the compiler that the array should contain at least three elements.
To translate the first function to D, we could get away with treating it as taking a float pointer parameter, but that would be misleading to anyone who looks at the source of the binding. The C code is telling us that the function expects an array of three elements, even though it isn't enforced. For the form that uses the static keyword, the float* approach is an even worse idea, as that would allow the caller to pass an array containing fewer elements than the function expects. In both cases, it's best to use a static array.
We can't just declare an extern(C)
function in D that takes a static array and be done with it, though. Recall from Chapter 2, Building a Foundation with D Fundamentals, that a static array in D is passed by value, meaning all of its elements are copied. Try passing one to a C function that expects a C array, which decays to a pointer, and you'll corrupt the stack. The solution is to declare the static array parameter to have the ref
storage class:
extern(C) @nogc nothrow void add_three_elements(ref float[3]);
Be careful with static arrays that are aliased. Take, for example, the following C declarations:
typedef float vec3[3];
void vec3_add(vec3 lhs, vec3 rhs, vec3 result);
When we translate the vec3
to D, it's going to look like this:
alias vec3 = float[3];
Once that is done and work begins on translating the function signatures, it's easy to forget that vec3
is actually a static array, especially if it's used in numerous functions. The parameters in vec3_add
need to be declared as ref
.
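The translation of vec3_add can be sketched as follows. So that the example links on its own, the function is given a body in D as a stand-in for the C implementation; in a real binding only the signature would appear. The helper addExample is a hypothetical name used to show a call site.

```d
alias vec3 = float[3];

// Stand-in for the C function, written in D so the example is self-contained.
// A real binding would declare the signature only, without a body.
extern(C) @nogc nothrow
void vec3_add(ref vec3 lhs, ref vec3 rhs, ref vec3 result)
{
    foreach (i; 0 .. 3)
        result[i] = lhs[i] + rhs[i];
}

// Hypothetical helper showing a call site: no casts or address-of needed.
vec3 addExample()
{
    vec3 a = [1.0f, 2.0f, 3.0f];
    vec3 b = [10.0f, 20.0f, 30.0f];
    vec3 sum;
    vec3_add(a, b, sum); // each argument is passed by reference, not copied
    return sum;
}
```

Callers write the same code they would for a by-value static array, while the callee receives the address of each array instead of a copy.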
One more thing to consider is when const
is applied to pointers used as function parameters and return types. For the parameters, the C side doesn't know or care anything about D const
, so from that perspective it doesn't matter if the parameter is translated as const
on the D side or not. But remember that const
parameters serve as a bridge between unqualified, const,
and immutable
variables, allowing all three to be passed in the same parameter slot. If you don't add the const
on the D side, you'll needlessly force callers to cast away const
or immutable
in some situations. This is particularly annoying when dealing with strings. The short of it is, always translate const
function parameters as const
.
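The bridging role of const can be seen in a short sketch. The function sum_ints is hypothetical; it stands in for a C function declared as int sum_ints(const int *values, size_t count) and is implemented in D here only so that the example is self-contained.

```d
// Stand-in for a C function taking a pointer to const int.
// A real binding would declare the signature only.
extern(C) @nogc nothrow
int sum_ints(const(int)* values, size_t count)
{
    int total = 0;
    foreach (i; 0 .. count)
        total += values[i];
    return total;
}

int sumAllQualifiers()
{
    int[3] mutableData = [1, 2, 3];
    const int[3] constData = [4, 5, 6];
    immutable int[3] immutableData = [7, 8, 9];

    // All three pointer types convert implicitly to const(int)*.
    return sum_ints(mutableData.ptr, 3)
         + sum_ints(constData.ptr, 3)
         + sum_ints(immutableData.ptr, 3);
}
```

Had the parameter been translated as int*, the second and third calls would be rejected by the compiler unless the caller added a cast.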
It's also important to keep the const
around when it is applied to return types. The C function is expecting that the contents of the pointer will not be modified. If const
is not present on the D side, that contract is easily broken:
// In C
int const *;        // mutable pointer to const int
const int *;        // ditto
int * const;        // const pointer to mutable int
int const * const;  // const pointer to const int

// In D
const(int)*  // The first two declarations above
const(int*)  // The second two -- const pointer to
             // mutable int isn't possible in D.
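What each of the two D forms permits can be checked at compile time. The following is a small sketch; constPointerDemo is a hypothetical name, and the static assert lines confirm which operations the compiler rejects.

```d
void constPointerDemo()
{
    int a = 1, b = 2;

    // const(int)*: the pointer may be reseated, but the int may not be written.
    const(int)* p = &a;
    p = &b;
    static assert(!__traits(compiles, *p = 3));

    // const(int*): const is transitive, so neither operation is allowed.
    const(int*) q = &a;
    static assert(!__traits(compiles, q = &b));
    static assert(!__traits(compiles, *q = 3));
    static assert(is(typeof(*q) == const(int)));
}
```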
Function parameters, struct
members, and type aliases need to be named according to the rules set out in Chapter 2, Building a Foundation with D Fundamentals. It's not uncommon to see names in C that are keywords in D. For example, the previous vec3_add
function could easily look like this:
void vec3_add(vec3 lhs, vec3 rhs, vec3 out);
Since out
is a D keyword, it can't be used in the translation. The options are to drop the name entirely, or to use a different name. For struct
members, dropping it is not an option, so the only choice is to change the name. For example, _out
, out_
, or anything that can distinguish it from the keyword. If you're trying to maintain conformance with the original C code, you'll want to make it as close to the original as possible.
That solution works for member variables and function parameters, but sometimes C functions might have a D keyword as a name. In this case, prepending an underscore isn't going to work if you're implementing a static binding. D defines a pragma, mangle, which solves the problem. Simply name the function anything you'd like and give the desired name to the pragma. Consider a C function named body
. Translated to D:
pragma(mangle, "body") extern(C) void cbody();
We use cbody
to avoid conflict with the D keyword body
, but the pragma instructs the compiler to use body
instead of cbody
for its generated output.
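To see the pragma at work in something that can be linked and run, the sketch below binds the C standard library's abs function under a different D name. abs is not a D keyword; it is used here only because it is always available to link against, and c_abs is a hypothetical name chosen for this example.

```d
// Bind the C standard library's abs() under a different D name.
// The string given to the pragma is the symbol the linker will look for.
pragma(mangle, "abs")
extern(C) @nogc nothrow int c_abs(int value);

int magnitudeOfMinusFive()
{
    return c_abs(-5); // resolves to the C symbol abs at link time
}
```

D code only ever sees the name c_abs, while the object file refers to the C symbol.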
Just as a function in C is usually separated into a prototype in a header and an implementation in a source module, so is a global variable. This is because anything that is implemented in a header file will be copied directly into every source module that includes that header. In order to have only one global instance, the prototype and implementation must be separate. Here's what a global variable might look like:
// foo.h
extern int g_foo;

// foo.c
int g_foo = 0;
For a static binding, there are three things that need to be accounted for in the translation of g_foo
. One is the linkage attribute, since it affects the mangling of the symbol. If the variable is declared to have D linkage, the linker will never be able to find it. Another is the extern
keyword. Note that extern(C)
indicates a linkage attribute, but extern
by itself, with no parentheses, tells the compiler that the symbol is not implemented in the current compilation unit, so it can leave it to the linker to sort things out.
The last thing at issue is something we touched on earlier in this chapter. Recall that variables in D have thread-local storage by default. This is not the case in C. Any global variable declared in C is accessible to all threads. This can't be forgotten when translating. In this case, shared
is not an option, since it actually affects the type of the variable. The type must be the same as it is in C. So, once again, we turn to __gshared
.
With all of that in mind, the translation of g_foo
from C to D should look like this:
__gshared extern extern(C) int g_foo;
Substitute System
or Windows
for C
as needed. If there are multiple global variables to declare, a colon or a pair of braces could be used:
__gshared extern extern(C) {
    int g_foo;
}

__gshared extern extern(C):
int g_foo;
For dynamic bindings, the variable must be declared as a pointer. In this case, the linkage attribute is not necessary. Since the symbol is going to be loaded manually, having D linkage isn't going to hurt. Also, extern
does not apply here. Since the variable is a pointer, it really is implemented on the D side. We can use the same getSymbol
implementation we used for loading function pointers to load the address of the actual g_foo
into the pointer.
The __gshared
attribute isn't a strict requirement in this case, but it ought to be used to make things easier and faster. Remember, space will be reserved for a thread-local pointer in each thread, but it will not be set automatically to point at anything. If you don't want the complexity of calling getSymbol
every time a thread is launched, use __gshared
. Bear in mind that if it is not used and the pointer is thread-local, that does not affect what the pointer actually points to. Implementing a bunch of thread-local pointers to a global C variable may very well be begging for trouble.
There's one last thing to consider with global variables in dynamic bindings. Because the variable is declared as a pointer, the user will inevitably have to take this into account when assigning a value to it. After all, to set the value a pointer points to, the dereference operator has to be used: *g_foo = 10
. Not only does this break compatibility with any existing C code, it's very easy to forget. One solution is to use two wrapper functions that can be used as properties. Another is to use a single function that returns a reference.
So, our global variable in a dynamic binding could look like this:
private __gshared int* _foo;
int g_foo() { return *_foo; }
void g_foo(int foo) { *_foo = foo; }
Users can then do:
g_foo = 20;
writeln(g_foo);
This also makes for consistency between the static and dynamic version of a binding, if both are implemented.
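The other option mentioned, a single function that returns a reference, might be sketched like this. The names fakeLibraryStorage and loadFooForExample are hypothetical stand-ins: in a real dynamic binding the storage lives in the shared library, and the pointer is filled in by the binding's symbol-loading code.

```d
// Stand-in for the C library's variable, so the example runs on its own.
private __gshared int fakeLibraryStorage;

private __gshared int* _foo;

// Hypothetical loader step; a real binding would assign the address
// returned by its symbol-loading routine instead.
void loadFooForExample()
{
    _foo = &fakeLibraryStorage;
}

// One accessor serves reads, writes, and compound assignment.
@property ref int g_foo() @nogc nothrow
{
    return *_foo;
}
```

With the reference form, expressions such as g_foo += 1 also work, which a getter and setter pair does not support as directly.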
Macros are a common sight in C headers. Here's an example:
#define FOURCC(a,b,c,d) (((d)<<24) | ((c)<<16) | ((b)<<8) | (a))
There are two options for translating a macro like this: make it a function, or make it a function template. Which approach is taken often boils down to personal preference. The only issue to be wary of is whether the template can be instantiated without the instantiation operator. If not, then existing C code can't be copied verbatim into D. For most macros, like the previous one, that shouldn't be an issue.
Sometimes macros include a cast to a specific type so that it's obvious what the translated function should return. Other times, it must be deduced. It may be possible to derive hints by looking at the C source or examples, or by using existing tools (such as gcc -E
), though frequently we are left to figure things out on our own. In this case, given that the macro makes use of the full range of a 32-bit integer, we should choose uint
. Then the translated function becomes:
uint FOURCC(uint a, uint b, uint c, uint d)
{
    return (d << 24) | (c << 16) | (b << 8) | a;
}
Note that a static binding that only includes type and function declarations does not itself need to be compiled and linked into the program; its modules only need to be present on the import path. Adding function bodies means the binding now becomes a link-time dependency. This would not happen if FOURCC were implemented as a template.
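A template version of FOURCC can be sketched with an empty template parameter list. The constant DXT1 is only an illustrative value added for this example.

```d
// The empty template parameter list makes FOURCC a function template.
// It is compiled into whichever module calls it, so the binding itself
// contributes no object code for it.
uint FOURCC()(uint a, uint b, uint c, uint d)
{
    return (d << 24) | (c << 16) | (b << 8) | a;
}

// Callable exactly like the C macro, and usable at compile time as well.
enum DXT1 = FOURCC('D', 'X', 'T', '1');
static assert(DXT1 == 0x31545844);
```

Because the arguments are deduced at the call site, no instantiation operator is needed, so C code that calls the macro can be copied over unchanged.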
Not all macros are this straightforward. Sometimes you have to follow a chain of nested macros to figure out what's going on. That might mean implementing one function for each macro, or perhaps combining them all into one. It largely depends on how they are used on the C side. Sometimes, a macro is not intended to be used by users of the library, but is instead used only in other macros. Ultimately, this sort of thing is a judgment call.
Some macros can't be translated to functions easily. Consider the following:
#define STRINGIFY(s) #s
#define CASESTRING(c) case c: return STRINGIFY(c)
A hash (#
) in front of a macro argument expands to the string form of whatever was given to the macro. Some C programmers would prefer to use return #c
in the CASESTRING
macro, but others would prefer to make it as clear as possible that a symbol is being converted into a string by using a helper such as STRINGIFY
.
CASESTRING
is a fairly common macro, the purpose of which is to take an enum
member, use it in a case
statement inside a switch
, and return its string representation. Something like this:
switch(enumValue) {
    CASESTRING(BB_ONE);
    CASESTRING(BB_TWO);
    CASESTRING(BB_THREE);
    default: return "Undefined";
}
Macros such as CASESTRING
and STRINGIFY
are surely intended primarily for internal use in the C library. When they are in the public-facing headers, users of the library can make use of them, but they shouldn't be considered part of the library's API. Given that, and that they have no use in D, there's normally no need to try to translate them when creating a binding.
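One reason such macros have no use in D is that the standard library can already produce the name of an enum member. The sketch below assumes the C enum has been translated as a named D enum; the names BB and bbName are hypothetical.

```d
import std.conv : to;

// Hypothetical D translation of the enum handled by the C switch.
enum BB { BB_ONE, BB_TWO, BB_THREE }

string bbName(BB value)
{
    // std.conv.to yields the member's name for a named enum value.
    return value.to!string;
}
```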
Sometimes macros are used to give a semblance of inheritance to C struct
types:
#define OBJBASE \
    int type; \
    const char *name; \
    size_t size;

typedef struct {
    OBJBASE
} object_base_t;

typedef struct {
    OBJBASE
    int x, y, z;
} extended_object_t;
The backslash (\) at the end of the first three lines tells the compiler that the macro continues on the next line. We could choose not to implement an equivalent of
OBJBASE
on the D side and just manually add each field to every struct
declaration that needs them, but that's error prone. It's better to go ahead and declare a template or string mixin and use that instead:
mixin template OBJBASE()
{
    int type;
    const(char)* name;
    size_t size;
}

struct object_base_t
{
    mixin OBJBASE;
}

struct extended_object_t
{
    mixin OBJBASE;
    int x, y, z;
}
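The string mixin alternative mentioned above could look like the following sketch. The name OBJBASE_FIELDS is hypothetical, and the static assert lines simply confirm that the shared fields land at the same offsets in both structs.

```d
// The shared fields as a compile-time string (hypothetical name).
enum OBJBASE_FIELDS = q{
    int type;
    const(char)* name;
    size_t size;
};

struct object_base_t
{
    mixin(OBJBASE_FIELDS);
}

struct extended_object_t
{
    mixin(OBJBASE_FIELDS);
    int x, y, z;
}

// The common fields sit at the same offsets in both structs, which is
// what lets C code treat one as a prefix of the other.
static assert(object_base_t.name.offsetof == extended_object_t.name.offsetof);
static assert(object_base_t.size.offsetof == extended_object_t.size.offsetof);
```

Either form keeps the field list in one place, so a change to the C macro only needs to be mirrored once.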
Sometimes, arguments to a macro are pasted together to form something new. This is akin to D's string mixins, though nowhere near as flexible. Most often, such macros are used for convenience, but sometimes they are used for a specific purpose, such as hiding implementation details. For example, the Win32 API makes use of many different types of object handles. Normally, these handles are aliased to void*
with a #define
, but when compiled with the preprocessor definition STRICT
, they are aliased to something else completely:
#define DECLARE_HANDLE(n) typedef struct n##__{int i;}*n
When this macro is called with something like this:
DECLARE_HANDLE(HMODULE);
It expands to this:
typedef struct HMODULE__ { int i; } *HMODULE;
The __
is pasted on to the macro argument with ##
to form the struct
name. Translating to D:
struct HMODULE__ { int i; }
alias HMODULE = HMODULE__*;
Here, the struct
name need not be HMODULE__
. It can effectively be anything. The important bit is the alias
. At any rate, whenever pasting with ##
is encountered in a macro, careful attention needs to be given to what the macro is doing in order to decide if and how it needs to be translated.
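If a header declares many handles this way, the pasting can be mimicked with a mixin template that builds the declarations from strings. This is only a sketch of one possible approach, not something the Win32 headers or any existing binding prescribes; the D name DECLARE_HANDLE is reused here purely for illustration.

```d
// Hypothetical D counterpart of DECLARE_HANDLE: string concatenation
// plays the role of ## token pasting.
mixin template DECLARE_HANDLE(string name)
{
    mixin("struct " ~ name ~ "__ { int i; }");
    mixin("alias " ~ name ~ " = " ~ name ~ "__*;");
}

mixin DECLARE_HANDLE!"HMODULE";
mixin DECLARE_HANDLE!"HWND";

// Each handle is a pointer to its own struct type, so handles of
// different kinds cannot be mixed up by accident.
static assert(is(HMODULE == HMODULE__*));
static assert(!is(HMODULE == HWND));
```

For a handful of handles, writing each struct and alias by hand is just as reasonable and arguably easier to read.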
There are so many creative ways to use (or abuse) the C preprocessor that even someone who has been programming in C for more than 20 years can still learn new tricks. Thankfully, it's rare to encounter arcane preprocessor magic, so most of the macros you encounter will be fairly easy to translate. For those cases where you can't figure out quite what's going on, try looking up a tutorial on the C preprocessor or asking for help in the D forums.
In D we have
version
blocks and static if
, but C programmers use the preprocessor for conditional compilation. This takes the form of #if
, #ifdef
, and #if defined
. The #if
directive is used to test the value of a defined constant:
#define DEBUG_MODE 1
#if DEBUG_MODE
// Debug code
#else
// Non-debug code
#endif
In this specific case, we could likely get away with debug {}
in the D translation, while in others we'd want to use a version
block with the same name as the C code. The #if
directive can also be used with the >
and <
operators:
#if DEBUG_MODE > 2
// Debug mode code
#endif
This translates nicely to D as debug(3) {}
.
The #ifdef
directive tests whether something has been defined. It's frequently used to test for platform, CPU architecture, and even debug mode:
#ifdef _WIN32
// Windows code
#else
// Other platforms
#endif
_WIN32
is predefined by most C compilers when compiling on Windows. It's easily translatable as version(Windows)
. Keep in mind that _WIN32
is defined by C compilers even when compiling in 64-bit mode on Windows, while version(Win32)
in D means compilation is targeting 32-bit Windows specifically.
#if defined
allows multiple checks to be combined into one:
#if defined(linux) || defined(__FreeBSD__)
// Code specific to Linux and FreeBSD
#endif
Recall from Chapter 4, Running Code at Compile Time, that D does not allow Boolean version
blocks. There, we saw a way to use static if
to achieve the same result, but using version
blocks, the previous code would look like this:
// Add this to the top of every module that needs it
version(linux) version = LinuxOrFreeBSD;
else version(FreeBSD) version = LinuxOrFreeBSD;

// Then elsewhere in the module...
version(LinuxOrFreeBSD) { }
Alternatively, the code could be duplicated for each platform.
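For comparison, one common way to get Boolean logic with static if is to turn each version identifier into a manifest constant first. This is a sketch under that assumption and may differ in detail from the approach shown in Chapter 4; the names isLinux, isFreeBSD, and platformNote are hypothetical.

```d
// Turn each version identifier into a Boolean manifest constant once...
version(linux)   enum isLinux = true;
else             enum isLinux = false;

version(FreeBSD) enum isFreeBSD = true;
else             enum isFreeBSD = false;

// ...then combine them with ordinary Boolean operators.
static if (isLinux || isFreeBSD)
    enum platformNote = "Linux or FreeBSD code path";
else
    enum platformNote = "other platform code path";
```

The constants can live in a single module and be imported wherever needed, whereas version assignments have to be repeated in every module that uses them.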
It's often obvious how to translate predefined preprocessor macros like these to D, but it still helps to familiarize yourself with the predefined macros found in DMC, GCC, and the Microsoft compiler. Tests for custom defines, such as ENABLE_LOGGING
, or ALLOW_PNG
, are always translated to use version blocks.
One potential source of trouble to be aware of is something like this:
typedef struct {
    float x, y;
#ifdef ENABLE_3D
    float z;
#endif
} vertex_t;
The D translation is straightforward:
struct vertex_t
{
    float x, y;
    version(ENABLE_3D) float z;
}
With this type, anything compiled with ENABLE_3D
is going to be binary incompatible with anything that isn't. For a C library you control, this is a non-issue. On Windows, it's easy to compile the C library exactly how you want it and, if linking dynamically or using a dynamic binding, ship the DLL with your app. With a widely distributed library, particularly on a system such as Linux where a number of libraries are preinstalled and users often compile their own versions, the potential for breakage is high, especially when using a dynamic binding. The best thing to do in that scenario is to determine what the most common compile configuration is for the C library and use that as the default for your binding.
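The size difference between the two configurations can at least be documented in the binding itself. The sketch below records the expected layout for each case. It cannot detect how the C library was actually compiled; it only makes the binding's own assumption explicit.

```d
struct vertex_t
{
    float x, y;
    version(ENABLE_3D) float z;
}

// Record the layout each configuration is expected to produce, so the
// assumption made by the binding is visible at compile time.
version(ENABLE_3D)
    static assert(vertex_t.sizeof == 3 * float.sizeof);
else
    static assert(vertex_t.sizeof == 2 * float.sizeof);
```

Compiling the binding with -version=ENABLE_3D selects the three-field layout; without it, the two-field layout is used.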