Calling C from D

Once a binding is complete, there are potential crash-inducing bugs to be on the lookout for. Incorrect linkage attributes, the wrong number of function parameters, or any given function parameter declared with the wrong size can all bring the house down when the problem function is called. These are issues on the implementation side. There are other potential problems on the user's side that can also cause crashes or unexpected behavior. That's the focus of this section.

D arrays and C arrays

The inherent difference between C arrays and D arrays is a potential source of both compile-time and runtime errors. Here, we'll see the major issues to be aware of.

Basic arrays

When a C function expects to take an array as an argument, the corresponding parameter is declared as a pointer. Take the following example:

#include <stdio.h>
void printThreeInts(int *ints) {
  int i;
  for(i=0; i<3; ++i)
    printf("%i\n", ints[i]);
}

It's common practice in C to require an array length to be passed along with the array pointer in functions that have array parameters. In this case, a length parameter is omitted since the expected length of the array is in the name of the function, a practice that, while dangerous and error-prone, isn't exactly uncommon.

C programmers really have to be on their toes, but that means D programmers also have to pay close attention when interacting with C. It's so easy to get used to the safety of D's arrays and forget that calling C functions often requires extra vigilance. Any array you pass to this function must have at least three elements. What happens when you pass a shorter array depends entirely on what exists in the memory locations beyond the end of the array at the time the function is called.

There are two ways to call printThreeInts that require no copying or conversions. One is to use a cast, something that is not recommended. The other, recommended, way is to use the .ptr property common to all arrays:

extern(C) @nogc nothrow void printThreeInts(int*);
void main() {
  int[] ints = [1, 2, 3];
  printThreeInts(cast(int*)ints);
  printThreeInts(ints.ptr);
}

Knowing that a D array is a data structure that contains a length and a pointer, it may appear baffling that we can legally cast it to a pointer. Consider this:

struct IntArray {
  size_t length;
  int* ptr;
}
auto ia = IntArray(ints.length, ints.ptr);
auto pi = cast(int*)ia;

If you were to insert this at the end of main and try to compile, the compiler would rightly complain that you can't cast ia of type IntArray to int*. But that's because your IntArray type doesn't have the privileges given to built-in arrays.

When we cast our ints array to int*, the compiler is smart enough to pick up on the fact that all we really want to do is substitute ints.ptr in place of the cast. That's essentially what it does. Often, new D programmers tend to use the cast, probably because that's the sort of thing they're used to in other languages, and there is some existing D code out there that does that. Veteran D programmers tend to prefer using the .ptr property directly. Not only is it fewer characters to type and clearer about exactly what's going on, but it also leaves no room for getting the type wrong. Using the cast, you could inadvertently do something like cast away immutable or const. Always prefer passing the .ptr.
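To make the danger of the cast concrete, here is a minimal sketch that assumes an array of immutable elements. The variable names are only for illustration. The cast compiles without complaint and strips the qualifier, while .ptr keeps it, so the same mistake is rejected at compile time:

```d
void main() {
  immutable(int)[] ints = [1, 2, 3];

  // The cast compiles and silently strips immutable from the elements.
  auto p = cast(int*)ints;
  static assert(is(typeof(p) == int*));

  // .ptr preserves the qualifier...
  static assert(is(typeof(ints.ptr) == immutable(int)*));

  // ...so assigning it to a mutable pointer does not compile.
  static assert(!__traits(compiles, { int* q = ints.ptr; }));
}
```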

If you save the printThreeInts C example as $LEARNINGD/Chapter09/threeints/arrc.c and the D program that calls it as arr.d in the same directory, you can compile with the following commands. Using DMD with the default Digital Mars linker:

dmc -c arrc.c
dmd arr.d arrc.obj

Using DMD with the Microsoft linker (be sure to use the x64 Native Tools Command Prompt shortcut in the Visual Studio installation directories):

cl /c arrc.c
dmd -m64 arr.d arrc.obj

Using DMD on platforms other than Windows:

gcc -c arrc.c
dmd arr.d arrc.o

Tip

Wrapping array functions

One way to avoid making mistakes in calling C functions such as printThreeInts is to use a wrapper function. The wrapper accepts a D array as a parameter and verifies it's of the correct length:

void printThreeInts(int[] arr) {
  assert(arr.length >= 3);
  printThreeInts(arr.ptr);
}
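Keep in mind that assert is compiled out when -release is passed to the compiler. As a hedged variation, the following sketch uses std.exception.enforce so the check survives release builds, and adds a second wrapper that takes a fixed-size array so the length is verified by the type system. The names printThreeIntsChecked and printThreeIntsFixed are hypothetical, and the C function is replaced by a D stand-in so the sketch builds without arrc.c:

```d
import std.exception : enforce;

// Stand-in for the C function so this sketch links on its own.
// A real binding would only declare: extern(C) void printThreeInts(int*);
extern(C) @nogc nothrow void printThreeInts(int* ints) {
  import core.stdc.stdio : printf;
  foreach(i; 0 .. 3)
    printf("%i\n", ints[i]);
}

// Hypothetical wrapper: the length check is kept in -release builds.
void printThreeIntsChecked(int[] arr) {
  enforce(arr.length >= 3, "printThreeInts needs at least three elements");
  printThreeInts(arr.ptr);
}

// Hypothetical wrapper: the static array type guarantees the length.
void printThreeIntsFixed(ref int[3] arr) {
  printThreeInts(arr.ptr);
}

void main() {
  int[3] fixed = [1, 2, 3];
  printThreeIntsFixed(fixed);        // length verified at compile time
  printThreeIntsChecked([4, 5, 6]);  // length verified at run time
}
```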

When a C function takes a pointer parameter to which it intends to write data, it's important that the destination array has enough space to hold all of the elements it will be assigned. For example, this function writes three integers to an int array:

void storeThreeInts(int *ints) {
  int i;
  for(i=0; i<3; ++i)
    ints[i] = i+100;
}

With the appropriate binding, we can call it like so:

auto threeInts = new int[3];
storeThreeInts(threeInts.ptr);
writeln(threeInts);

Here, we've used a dynamic array allocated to hold three elements, but it could be a static array as well. As long as the array has enough space allocated to hold at least three elements, we're good.
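The static array case might look like the following sketch. The helper fetchThreeInts is a hypothetical name, and storeThreeInts is implemented here as a D stand-in for the C function so the example is self-contained:

```d
// Stand-in for the C function; a real binding would only declare it.
extern(C) @nogc nothrow void storeThreeInts(int* ints) {
  foreach(i; 0 .. 3)
    ints[i] = i + 100;
}

// Hypothetical helper: the destination lives on the stack, so no
// GC allocation takes place.
int[3] fetchThreeInts() {
  int[3] buf;
  storeThreeInts(buf.ptr);
  return buf;  // static arrays are returned by value
}

void main() {
  import std.stdio : writeln;
  writeln(fetchThreeInts());  // prints [100, 101, 102]
}
```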

Arrays of arrays

Sometimes, a C function takes a pointer-to-a-pointer, such as int**, to represent either a single array or an array of arrays. How to handle this depends on the context. Let's first consider the case of a pointer to an array. The following C function takes an array of int arrays and prints the members of each:

void printIntArrays(size_t count, int **arrays, size_t *sizes) {
  size_t i, j;
  for(i=0; i<count; ++i) {
    printf("Array #%d:\n", (int)(i + 1));
    for(j=0; j<sizes[i]; ++j)
      printf("\t%i\n", arrays[i][j]);
  }
}

The first parameter, count, is necessary for the function to know how many arrays it has to work with. The second parameter is the array of arrays, and the third is an array containing the size of each int array. The prototype for a static D binding is:

extern(C) @nogc nothrow
void printIntArrays(size_t, int**, size_t*);

Though it may seem odd, it's possible to pass a single array to this function. Let's look at that case first, since less work needs to be done than when we pass an array of arrays:

auto fourInts = [10, 20, 30, 40];
auto fourIntsSize = fourInts.length;
auto pfi = fourInts.ptr;
printIntArrays(1, &pfi, &fourIntsSize);

The second and third lines are the important ones here. The .length and .ptr properties yield values rather than variables, so it's not possible to take the address of either. The only way to get the pointers we need is to first assign them to temporary variables and take the addresses of those. Still, this is easy compared to what we have to do for an array of arrays:

auto intArrays = [[10, 20, 30], [1, 3, 5, 7, 9], [100, 101]];
auto ptrs = new int*[](intArrays.length);
auto sizes = new size_t[](intArrays.length);
foreach(i, ia; intArrays) {
  ptrs[i] = ia.ptr;
  sizes[i] = ia.length;
}
printIntArrays(intArrays.length, ptrs.ptr, sizes.ptr);

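The second and third lines show that we have to allocate two arrays for this case: one to hold the pointers and one to hold the lengths. It's not possible to cast a D array of arrays to a C pointer-to-a-pointer, because each element of an int[][] is itself a length and pointer pair rather than a bare pointer.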

In a different context, a function might always expect a pointer-to-a-pointer to actually be a pointer to a single array, rather than to multiple arrays, perhaps to assign the array variable a new address. For example, the following function takes an int** and reassigns the pointer to a local static array. It then returns the size of the local array so that the calling code knows how many elements it's pointing to:

size_t getIntList(int** parray) {
  static int localArray[3] = {10, 20, 30};
  *parray = localArray;
  return 3;
}

Given what we've seen so far, your first instinct might be to try this:

int*[] ipa = new int*[](1);
auto size = getIntList(ipa.ptr);

Since an array of pointers is just like any other array, it makes sense that we should be able to allocate an array of them large enough to hold the number of elements the function wants to write to it. This is no different than what we did for the storeThreeInts example. We know we're getting one array element, so we allocate space to hold one element.

If you think about it, the allocation is wasteful. There's no reason to allocate space on the GC heap to hold any C array pointers. If we're worried about the contents of the array changing out from under us on the C side, or perhaps the original array address becoming invalid, allocating space to store the pointer buys us nothing. We would need to allocate space for the elements, then copy them all over to guarantee we can hang on to them for as long as we want. So we can do away with the allocation and just do this:

int* pi;
auto size = getIntList(&pi);

Now we can take this pointer and slice it to get a D array:

auto intList = pi[0 .. size];

At this point, we still haven't allocated any GC memory. If we want, we can call .dup on the array to allocate space for a new array and copy all the elements over, or we can just work with the slice directly. If you don't want to modify the original elements, or to manually manage their lifetime, or if you're concerned about something happening to them on the C side, just go for the .dup.
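The difference between working with the slice and working with a copy can be sketched as follows. getIntList is implemented in D here purely as a stand-in for the C function, and the variable names view and copy are illustrative:

```d
// Stand-in for the C function; a real binding would only declare it.
extern(C) @nogc nothrow size_t getIntList(int** parray) {
  static int[3] localArray = [10, 20, 30];
  *parray = localArray.ptr;
  return 3;
}

void main() {
  int* pi;
  auto size = getIntList(&pi);

  auto view = pi[0 .. size];  // slice: aliases the C-side memory
  auto copy = view.dup;       // GC-allocated, independent copy

  view[0] = 99;               // writes through to the original array
  assert(pi[0] == 99);
  assert(copy[0] == 10);      // the copy is unaffected

  view[0] = 10;               // restore the original value
}
```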

Strings

We know that D strings are also D arrays, so it's reasonable to expect that they behave the same when interacting with C. To a large extent, they do. However, the compiler does give string literals some special treatment that normal arrays just don't get. Try to compile and run the following program and see what happens:

void main() {
  import core.stdc.stdio : puts;
  puts("Giving a D string literal to a C function.");
}

Here, we are calling the standard C library function puts with a D string literal as an argument. puts is not a D wrapper, but a direct binding to the C function. It takes a C string, const char*, as an argument. So, what gives? How does this compile and run?

The compiler treats string literals specially. A string literal is implicitly convertible to a const or immutable pointer to char. This is true of function parameters and variable declarations. The same does not hold for regular array literals, nor is it true of string variables:

void main() {
  import core.stdc.stdio : puts;
  auto str = "Giving a D string to a C function.";
  puts(str);
}

This will fail to compile. From what we've seen so far about passing D arrays to C, we know we can work around this:

puts(str.ptr);

In this particular case, this works and causes no harm, but it isn't a general-purpose solution. This is another point where D strings differ from other array types.

There's no getting around the fact that C strings are expected to be null-terminated, meaning the last character in the string should be '\0' (or 0). The D compiler will let you pass D string literals directly to C functions because all string literals in D are null-terminated. This feature exists specifically to make them directly compatible with C. However, D does not require all strings to be null-terminated. Strings received from external sources, such as files or network packets, are not guaranteed to have a trailing '\0'; they will only have one if they've been initialized with a literal. For that reason, the compiler does not allow string variables to be implicitly converted to C-style strings, treating them just like any other array.

In this specific case, we know that str was assigned a literal, so we know that it is null-terminated. In the general case, however, when we cannot guarantee the original source of a string variable was a literal, we need to turn to the Phobos function std.string.toStringz.

We've already seen this function and its cousin, std.utf.toUTF16z, in the loader module presented earlier in the chapter. Given a string str, toStringz ensures that it is null-terminated. This usually means that a new string is allocated with space for the null-terminator. The UTF variations perform the same task, while also converting the input to the appropriately UTF-encoded string type.
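A slice of a larger string is an easy way to see why .ptr is not a general-purpose solution. In this minimal sketch (the variable names are illustrative), the slice covers only the first five characters, but the memory that follows it holds the rest of the original literal rather than a null-terminator:

```d
import core.stdc.string : strlen;
import std.string : toStringz;

void main() {
  string full = "Hello, world";
  string part = full[0 .. 5];  // "Hello", with no '\0' after it

  // Passing part.ptr lets C read past the end of the slice,
  // all the way to the terminator of the original literal.
  assert(strlen(part.ptr) == 12);

  // toStringz yields a properly terminated C string.
  const(char)* cstr = part.toStringz;
  assert(strlen(cstr) == 5);
}
```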

Tip

Memory and toStringz

As I write, str.ptr is sometimes returned by toStringz, meaning no allocation is made if the null-terminator is already present. Unfortunately, bugs can arise in specific situations that create a false positive, so this is almost certain to change at some point unless D gains the ability to detect string literals through compile-time introspection.

When implementing a function that accepts a D string and hands it off to a C function, you should always use toStringz or, if the C API requires it, one of the UTF versions, before passing the string on to C. If you don't, then you've not only allowed the potential for a crash, you've also opened a pretty big security hole. Even if you think you're 100% sure that the function will only take literals, perhaps because it's private to the module and you completely control the types of strings it gets, you should still use toStringz.

Another thing to watch out for is what the C function does with any strings you give it. I have actually seen someone recommend that a C function declared to take a char* be translated to D to accept a const(char)* instead, solely to make it easier to pass a D string to it. When using a binding that you didn't create, always familiarize yourself with the original C API before you get started. You certainly don't want a C function to attempt to modify your immutable(char)[] strings in D. Nothing good can come of that. If you do need to pass a string to a C function that will be modified, just dup it to a variable of type char[] and then pass that on to the C function. Note that there's no guarantee that duping a string will preserve the null-terminator in the copy.

auto s = "Dup me!";
char[] cstr = s.dup ~ '\0';
modifyString(cstr.ptr);

If the C function is treating the string as a buffer and doesn't need to read it first, just do as you would with any array; allocate an array of characters large enough to hold the output, then pass it on:

auto cstr = new char[1024];
writeToBuffer(cstr.length, cstr.ptr);

Sometimes, such a function might be documented to require that the array contain the null-terminator, even though it's empty. In that case, keep in mind that the .init value of a char is 0xFF, not 0, so you'll have to set the value yourself.
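A minimal sketch of preparing such a buffer follows. The buffer size is arbitrary, and strlen is used only to confirm that C now sees an empty string:

```d
import core.stdc.string : strlen;

void main() {
  static assert(char.init == 0xFF);

  auto cstr = new char[1024];
  assert(cstr[0] == 0xFF);  // default-initialized, not zeroed

  cstr[] = '\0';            // fill the whole buffer with null-terminators
  assert(strlen(cstr.ptr) == 0);
}
```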

Memory

When interacting with C, never forget that D has a garbage collector. The potential for nasty bugs is high here. Any memory allocated by the GC that is passed off to C could cause problems down the road. Consider this function:

void dFunction() {
  auto ints = [1, 2, 3];
  cFunction(ints.ptr);
}

ints is allocated on the GC heap from an array literal and then passed to a C function. Once dFunction returns, what happens depends on what cFunction does with the pointer. If it is stored in a stack variable, all is well as the GC knows how to scan the stack. If, on the other hand, it's stored somewhere in the C heap, such as in a struct instance allocated via malloc, all bets are off; the GC has no way of knowing that an active reference to the allocated memory still exists. At any point, the GC could collect the memory it allocated for ints. If a reference to it still exists in the C heap, then the next attempt to access it on the C side will be accessing an invalid memory location.

There are different options for ensuring that things don't blow up in this situation. The first that might come to mind is to keep a reference to the GC-allocated memory somewhere on the D side. Another option is to just use C's malloc to handle the allocation and forget about the GC issues completely. A third option is to inform the GC to always consider the memory block as live and never bother to collect it. This is done by importing core.memory and calling GC.addRoot, something we'll look at in Chapter 11, Taking D to the Next Level.
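As a rough preview of the third option, here is a sketch that assumes a C-side structure allocated with malloc holds the only remaining pointer to a GC-allocated array. The CHolder struct and the rootedBlockSurvives function are hypothetical names invented for this illustration. GC.addRoot and GC.removeRoot are the real core.memory functions:

```d
import core.memory : GC;
import core.stdc.stdlib : free, malloc;

// Hypothetical stand-in for a C-side struct that stores the pointer it is given.
struct CHolder {
  int* data;
}

bool rootedBlockSurvives() {
  auto ints = [1, 2, 3];

  // Memory obtained from malloc is not scanned by the GC.
  auto holder = cast(CHolder*)malloc(CHolder.sizeof);
  if(holder is null) return false;
  holder.data = ints.ptr;

  GC.addRoot(holder.data);  // treat the block as live from now on
  ints = null;              // drop the D-side reference
  GC.collect();             // the rooted block is not collected

  bool intact = holder.data[0 .. 3] == [1, 2, 3];

  GC.removeRoot(holder.data);  // let the GC manage the block again
  free(holder);
  return intact;
}

void main() {
  assert(rootedBlockSurvives());
}
```

Every call to GC.addRoot should eventually be paired with GC.removeRoot once the C side is done with the memory, otherwise the block is never reclaimed.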

Unfortunately, it isn't always obvious when GC memory is being allocated. Sometimes, the allocation might be hidden. Earlier, we saw the function std.string.toStringz and learned that it sometimes allocates memory, but not always. You'll see a great deal of D code calling C functions like this:

some_c_function(myString.toStringz());

There's no way of knowing just by looking at this function call whether or not toStringz is allocating. That's perfectly fine as long as it's certain that some_c_function isn't going to keep a pointer to the string hanging around for later use, and that it won't call realloc or free on your pointer, which is just bad news for GC-allocated memory. If there's no way of knowing for sure what the C function is going to do, then it's best to be safe and either store a reference to the return value of toStringz or call GC.addRoot with it.

Thankfully, the majority of C functions that handle strings do not need to keep them locally. If they do, well-written functions will copy the string to a locally allocated buffer so that the calling code need not worry about it. It's the corner cases you have to watch out for. The same holds true for any pointer to GC-allocated memory that you pass to a C function. Always read the documentation for the C library first. If it isn't clear, check the source if it's available. Never pass GC-allocated memory into C blindly.

C callbacks and exceptions

Earlier in the chapter, I recommended that you annotate every C callback in a binding with nothrow. Now we're going to see why. To do so, save the following C file as $LEARNINGD/Chapter09/exceptions/call.c:

void callCallback(void (*callback)(void)) {
  callback();
}

Then, save the following as except.d in the same directory:

extern(C) @nogc nothrow void callCallback(void function());
extern(C) void callbackImpl() {
  throw new Exception("Testing!");
}
void main() {
  try {
    callCallback(&callbackImpl);
  }
  catch(Exception e) {
    import std.stdio : writeln;
    writeln("Caught it!");
  }
}

To compile, use the same command lines we used a few pages back with the array examples. For example, compiling with the DMC toolchain:

dmc -c call.c
dmd except.d call.obj

I compiled and executed the program on both Windows and Linux with the DMC, 64-bit Microsoft, and GCC linkers. I even tried it out with the 32-bit MinGW-backed and 64-bit Microsoft-backed versions of ldc2 on Windows. In every test run except for one, the exception was printed to the screen, meaning that it was never caught; it completely bypassed the exception handler. The odd man out was the 64-bit ldc2 version, which only managed to crash.

In effect, any exception thrown in a D implementation of a function that is called from C is either an unrecoverable exception or, potentially, the cause of a crash. If you are creating a binding to a C library that makes use of callbacks, marking the callbacks as nothrow will prevent users of the binding from unwittingly allowing recoverable exceptions to become unrecoverable.

As a user, whether the binding annotates the callbacks as nothrow or not, you should never throw, or allow to be thrown, an exception from a C callback. There's just no guarantee that it won't corrupt the program state. Neither is it a good idea to litter your callbacks with try...catch blocks which just swallow the exception and move on. Doing that could eventually cause your program to become unstable.

One way to work around this issue is to collect any exceptions thrown inside a callback and tuck them away in a variable, such as an array, that can be tested at a convenient time. This means every callback that does anything that could potentially throw will need a try...catch block that, in the catch, adds the caught exception to that array. Elsewhere in the program, at a point after the callbacks have run, the program can test for specific recoverable exceptions and continue if any of those are encountered. Otherwise, the first unrecoverable exception is thrown.

In this example, saved as $LEARNINGD/Chapter09/exceptions/except2.d, the exception is generated manually rather than caught. This is what you would need to do anyway if you wanted to throw from a callback. The exception is stored in an array, which is checked after the callback is called, where it is compared against a fictitious RecoverableLibraryException. Since it isn't an instance of that type, it is immediately thrown after a message is printed:

extern(C) nothrow alias CallbackFunc = void function();
extern(C) @nogc nothrow void callCallback(CallbackFunc);
extern(C) nothrow void callbackImpl() {
  _callbackExceptions ~= new Exception("Testing!");
}
class RecoverableLibraryException : Exception {
  this(string msg) {
    super(msg);
  }
}
Exception[] _callbackExceptions;
void main() {
  import std.stdio : writeln;
  callCallback(&callbackImpl);
  foreach(ex; _callbackExceptions) {
    if(auto arle = cast(RecoverableLibraryException)ex)
      writeln(arle.toString);
    else {
      writeln("Not recoverable!");
      throw ex;
    }
  }
}

A more sophisticated implementation would chain all caught exceptions together and forgo the array. For a real-world example, see an article I wrote on the topic at http://www.gamedev.net/page/resources/_/technical/general-programming/d-exceptions-and-c-callbacks-r3323. Note that the example there does not use nothrow, but really should.
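For completeness, the try...catch form of the array-based approach, in which a callback does work that may throw, might look like the following sketch. The callback name parsingCallback is hypothetical, the failing conversion is just a convenient way to trigger an exception, and callCallback is implemented in D as a stand-in for the function in call.c:

```d
import std.conv : to;

extern(C) nothrow alias CallbackFunc = void function();

// Stand-in for the C function from call.c so the sketch links on its own.
extern(C) void callCallback(CallbackFunc callback) {
  callback();
}

Exception[] _callbackExceptions;

// Hypothetical callback: nothing escapes into the C caller.
extern(C) nothrow void parsingCallback() {
  try {
    auto n = "not a number".to!int;  // throws a ConvException
  }
  catch(Exception e) {
    _callbackExceptions ~= e;        // defer handling until C has returned
  }
}

void main() {
  import std.stdio : writeln;
  callCallback(&parsingCallback);
  foreach(ex; _callbackExceptions)
    writeln("Deferred: ", ex.msg);
  _callbackExceptions.length = 0;    // handled, so clear the list
}
```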
