Creating an array replacement

Many operations on D's built-in array slices use the garbage collector. If you want to use alternate resource management strategies, you'll want a replacement type, which implements operators that can be used with the other strategy and disables those that cannot be used. It is possible to avoid all allocations using a static stack buffer, which can give a considerable performance improvement. We'll create an array replacement, which uses a static buffer for a predicted size and grows into a dynamically allocated buffer, if required.

How to do it…

To create an array replacement, perform the following steps:

  1. Define a struct that takes an element type and expected size.
  2. Define variables with the key components: a static buffer with the expected size, a slice to the current buffer, and the current length.
  3. Write a destructor to free the memory, if you are managing it manually.
  4. Overload the convenience operators you want on the array: length, capacity, reserve, append, and slice.
  5. Disable the operators you do not want to use. For example, concatenation or copying may be disabled because those operations are inconvenient and error prone without the garbage collector's help.
  6. Use alias this if you want implicit slicing. Consider the following code:
    struct MyArray(T, size_t expectedSize = 0) {
      private T[expectedSize] staticBuffer;
      private T[] activeBuffer;
      private size_t currentLength;
    
      /+
      // if you want to use malloc instead of gc
    
      // don't copy this; we own our own memory
      // note that if you slice this, that is a borrowedreference;
      // don't store it.
      @disable this(this); // disable copying
    
      ~this() {
        // frees the memory if it was malloced
        if(activeBuffer.ptr !is staticBuffer.ptr)
        free(activeBuffer.ptr);
      }
      +/
      public void reserve(size_t s) {
        if(s > activeBuffer.length)
        activeBuffer.reserve(s); // GC operation.alternatively, you could use realloc
      }
    
      public @property size_t length() {
        return currentLength;
      }
    
      public @property void length(size_t newLength) {
        // make sure we're using the static buffer first
        if(activeBuffer is null)
        activeBuffer = staticBuffer[];
    
        // then if we don't have enough room, we'll reallocatewith
        // the GC (its own capacity should optimize this)
        if(newLength > activeBuffer.length) {
          activeBuffer.length = newLength; // GC operation
          // alternatively, you could use realloc
        }
    
        if(newLength < currentLength)
        activeBuffer[newLength .. currentLength] = T.init;
    
        currentLength = newLength;
      }
    
      public void clear() { length = 0; }
    
      // disable concatenation
      @disable void opBinary(string op : "~")(T t){}
      @disable void opBinary(string op : "~")(T[] t){}
    
      // but customize appending
      public T opOpAssign(string op : "~")(T t) {
        length = currentLength + 1;
        return activeBuffer[currentLength - 1] = t;
      }
    
      public T[] opOpAssign(string op: "~")(T[] t) {
        auto start = currentLength;
        length = start + t.length;
        return activeBuffer[start .. start + t.length] = t[];
      }
    
      public T[] opSlice() {
        return activeBuffer[0 .. currentLength];
      }
      alias opSlice this;
    }
    // test usage
    void main() {
      // declare your array of ints with an optimized size often
      MyArray!(int, 10) intArray;
      myArray ~= 1; // populate it
      myArray.length = 3; // change the length
      import std.stdio;
      writeln(myArray[]); // print the current contents
    }

How it works…

This array replacement serves two purposes; it optimizes the number of allocations and gives us control over the allocation method. To understand why this is valuable, we need to compare it against similar operations with built-in dynamic (garbage collection allocated) arrays.

Garbage collection arrays do contain appending and preallocation optimizations, including a reserve method that, like our reserve method, preallocates space up to at least the given capacity so that the subsequent append operations perform faster.

However, slice semantics are very conservative. To avoid the possibility of stomping over another slice's memory, reducing the length of a slice always zeroes out its capacity:

int[] a = [1, 2, 3]; // a.capacity == 3; appending will not realloc
int[] b = a[0 .. 2]; // b.capacity == 0

If you were to write b ~= 4, you can expect b == [1,2,4], but you can also still expect a == [1,2,3]. To ensure this is the case, appending to a partial slice will always reallocate—it will copy the array to a new location and append it there, leaving the original memory alone.

While this gives predictable and generally useful slice semantics, it isn't always best for performance. Consider the following code snippet:

a ~= 4; // append a 4
a.length = 0; // reset the array…. but this also clears the capacity
a ~= 1; // this reallocates a brand new array

Note

Capacity is not stored with the slice. It is stored with the memory block returned by the garbage collector. Querying capacity thus is more than a simple member lookup—it must walk a garbage collector structure. The garbage collector implementation contains a cache of most recently used capacities to accelerate this process, allowing multiple appends on one array to perform well.

If you are adding and removing elements often, this becomes a serious bottleneck. (Refer to the recipe in Chapter 3, Ranges, where we used these same techniques in the stack collection.)

By keeping a separate length variable, MyArray is a bit more heavyweight than a basic slice, but the benefit is that our array gives precision control. It owns its own memory, so it doesn't have to worry about stomping on another function's data. As a result, changing the length, appending or clearing, only reallocates when strictly necessary.

A further optimization—again at the expense of additional variables in the structure—is the static array. Using a compile-time parameter, we let the user tweak this to their specific use case. If you know you are working with a small data set, using stack memory (a static array) to hold it avoids all allocations. This gives the best possible speed.

The rest of the implementation of MyArray is aimed towards making it convenient to use while minimizing the likelihood of writing buggy code with it. We implement the ~= operator:

public T opOpAssign(string op : "~")(T t);
public T[] opOpAssign(string op: "~")(T[] t);

Operator overloading in D is done by implementing functions with a particular name: opBinary for a op b operators, opAssign for a = b, opOpAssign for a op= b, and others. Visit http://dlang.org/operatoroverloading.html for a complete list.

The compile-time parameter is the operator that is passed as a string. Here, we used template specialization syntax to only implement op == "~". In a compile-time parameter list, any parameter may be followed by a colon, then a specific value for that argument. If the template is instantiated with that argument, this function is used instead of any other overloads. This lets us write special implementations for particular arguments or to only implement the function for those particular arguments.

We implement the operator twice, once for a single element so that array ~= 1 works, and then again for an array so that we can efficiently implement array ~= [1,2,3] as well.

We also used operator overloading syntax to disable the binary ~ operator:

  @disable void opBinary(string op : "~")(T t){}

The @disable annotation can be attached to any function to statically disallow its use. Any attempt to write our_array ~ value will cause a compilation failure. The reason we disabled this is that the concatenation operator makes it very easy to hide allocations and lose references. Consider a = b ~ c; here, we would allocate a new array, copy b into it, append c into it, and assign it to a. It may not append into the existing memory block of b due to the conservative nature of the concatenation operator. It does not modify its arguments; instead it returns a new array. Since this custom array is meant to control memory allocations, this is undesirable.

Another worry with binary concatenation is b ~ c ~ d. Here, b~c is evaluated first, which returns a new array (we'll call it tmp). Then, tmp ~ d is evaluated, returning another new array, which is the value of the expression. The problem is that the tmp array reference is completely lost! When using garbage collector arrays, this is OK (convenient, though somewhat inefficient); the garbage collector will clean it up. However, with manual memory management, a lost array reference results in a memory leak. The safest thing to do here is to disable the operator entirely and force the user to explicitly manage their arrays.

Note

Why do we have to explicitly disable the operator instead of simply leaving it unimplemented? Since we wrote alias opSlice this, any operator that isn't found on the object itself will be attempted on the return value of opSlice, which returns a built-in slice. Since ~ is implemented on built-in slices, in terms of the garbage collector, this would succeed; however, this isn't what we wanted.

Lastly, we implemented opSlice without arguments—the [] operator—and used that with alias this. The alias this declaration allows the transparent forwarding of unimplemented operations as well as implicit conversion to the child type. This is known as subtyping. This lets our array conveniently interact with other D functions that aren't familiar with our implementation. However, it does have a downside; the other D functions may escape references to our data, which would become invalid when we free the memory.

If you pass a slice to a function, whether explicitly with the [] operator or via implicit conversion with alias this, be sure the function manages it as a lent resource and does not keep the reference.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.118.137.67