Chapter 11. Taking D to the Next Level

The previous ten chapters cover enough of D, its standard library, and the ecosystem for any programmer to use as a guide in implementing a variety of applications and libraries in D. The language and library features that were covered were either fundamental, such as those discussed in Chapter 2, Building a Foundation with D Fundamentals, and Chapter 3, Programming Objects the D Way, or used so frequently that they are encountered on a regular basis in D libraries, tutorials, and example code. A number of features were not covered, either because they do not fit into the categories of fundamental and frequently used, or because they aren't quite ready for prime time.

This chapter introduces several of the language and library features that were not covered elsewhere in the book. None of the features here are given in-depth coverage, only enough to provide a general overview of each. Consider this chapter a platform from which to launch further exploration of the D programming language to improve your knowledge and experience. Here are the topics we'll be looking at in this chapter, in no particular order:

  • Concurrency: D's support for different multithreaded programming models in the language, the runtime, and the standard library
  • SafeD: an introduction to the language features that help guarantee memory safety
  • Functional purity: a brief introduction to D's support for pure functions
  • The garbage collector: a look at the garbage collector API, which can be used to gain more control over when and how the GC does its work
  • Connecting with C++: a quick look at binding D with C++ libraries
  • The future of D: a few optimistic words about D's future

Concurrency

Once upon a time, multithreaded programming was the exclusive realm of people with pointy hats who uttered strange incantations. Mere mortals fell victim to the evils of race conditions and deadlocks too easily. Yet, in this age of multi-core processors, the arcane is on the verge of becoming the mundane. D's multifaceted support of concurrency is oriented toward giving programmers the tools to make it so.

The traditional model of multithreaded programming, lock-based synchronization and data sharing, began to fall out of favor even before multicore processors came along. Such code is difficult to properly implement, test, and maintain. Other models, such as thread-per-system, thread-per-task, and message passing, improved the situation, making it easier to design frameworks that hide the nasty details behind an interface that appears single-threaded. More recently, built-in language support, libraries, and compiler extensions have made it easier to implement loops that operate on data in parallel. As software gains access to more and more cores, both on the CPU and the GPU, this latter model becomes more important. D comes with support for each of these models, spread across the language, the runtime, and the standard library. This section presents a brief introduction to all of the support for concurrent programming in D, with suggestions on where to go to learn more.

Threads and fibers

The heart of any concurrent programming model in D is the Thread class found in the DRuntime package core.thread. D's threads are heavyweight, meaning they map to kernel threads managed by the operating system. They carry all the baggage that comes from each thread having its own context that needs to be activated when a thread is given its time slice. A more lightweight option is the Fiber class. Not only do fibers carry around less baggage, their execution is managed by the program rather than the operating system.

Threads

You may use the Thread class to spawn new threads. Even in single-threaded programs, its static methods, such as sleep or yield, can be called to affect the execution of the current thread. New threads can be created either by subclassing Thread or by instantiating Thread directly. In both cases, a function that returns void and takes no arguments can be passed to the Thread constructor in the form of a delegate or function pointer.

import core.thread;
import std.stdio;

// Subclassing Thread: pass the entry point to the superclass constructor.
class MyThread : Thread {
  this() {
    super(&run);
  }
  private void run() {
    writeln("MyThread is running.");
  }
}

// A free function can also serve as a thread's entry point.
void myThreadFunc() {
  writeln("myThreadFunc is running.");
}

void main() {
  auto myThread1 = new MyThread;
  auto myThread2 = new Thread(&myThreadFunc);

  // Neither thread executes until start is called.
  myThread1.start();
  myThread2.start();
}

There are C libraries and frameworks that provide a platform-agnostic way to create threads. Sometimes, such as when the C library requires the use of a custom thread handle for certain functions, it is necessary to use the foreign API to create new threads. This usually requires a pointer to a function that the new thread will call when it is executed. Any such threads should usually be registered with DRuntime inside the thread function by calling thread_attachThis.

// An entry point intended to be passed to a C API that creates the thread.
extern(C) void threadFunc(void* data) {
  import core.thread : thread_attachThis, thread_detachThis;

  // Register this foreign thread with DRuntime...
  thread_attachThis();
  // ...and unregister it when the thread function exits.
  scope(exit) thread_detachThis();
}

Registering foreign threads with DRuntime is necessary to ensure that all required thread-local initialization is done. It's also important if the thread touches GC-managed memory. Before the GC scans any particular block of memory, it pauses the execution of all active threads. If a thread has not been registered with DRuntime, then the GC can't pause it. For this reason, you should always prefer to use the Thread class to create new threads in D, even when using C libraries. Threads should be created through third-party APIs only in the rare cases when it is unavoidable. Foreign threads should always be registered with DRuntime if they touch anything on the D side.

Fibers

A Fiber, D's implementation of a coroutine, can be created in the same manner as a Thread: by subclassing, or by direct instantiation with a delegate or function pointer.

The delegate or function pointer associated with a fiber is executed when the call member function is invoked. Execution happens in the calling thread, which does not regain control until the fiber calls yield or its function returns, as shown in the following example:

import core.thread;
import std.stdio;

void myFiberFunc() {
  writeln("Execution begun.");
  // Pause the fiber and return control to the caller.
  Fiber.yield();
  writeln("Execution resumed.");
}

void main() {
  auto fiber = new Fiber(&myFiberFunc);
  fiber.call();   // runs myFiberFunc up to the yield
  writeln("Execution paused.");
  fiber.call();   // resumes execution after the yield
}

Always keep in mind the difference between a fiber and a thread. A Thread instance represents a system resource. Each system thread has its own copies of thread-local data, so any mutations of such data through the run member function of a Thread instance happen on local copies and will not be visible in other threads. Non-thread-local mutable data should be protected through synchronization primitives. A Fiber instance does not represent a system resource, meaning it does not have its own copies of thread-local data. If there is any possibility that multiple threads can run the call function on a Fiber instance, then care must be taken to synchronize access to all data that can be accessed through that function. As long as the same thread executes the function every time, synchronization is only an issue with data that is not thread-local. We'll see a bit about synchronization in D shortly.

Data sharing

As we know from Chapter 2, Building a Foundation with D Fundamentals, all variables declared in D are thread-local by default, meaning each thread has its own copy of each variable. We've also seen brief mentions of the shared and __gshared attributes. Fundamentally, they both achieve the same end in that they flag a variable as being outside of thread-local storage, meaning it is shared by all threads. Other than that, they are quite different, each coming with its own guarantees and consequences.
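
A minimal sketch makes the default visible; the spawned thread gets its own, default-initialized copy of the variable (the names here are ours):

import core.thread : Thread;
import std.stdio : writeln;

int tlsCounter;   // thread-local by default; each thread gets its own copy

void main() {
  tlsCounter = 42;
  auto t = new Thread({
    writeln(tlsCounter);  // prints 0, this thread's own copy
  });
  t.start();
  t.join();
}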

__gshared

Applying __gshared to a module-scope variable in D is essentially the same as declaring a global variable in C. It is entirely up to the programmer to ensure that access to the variable by multiple threads is properly guarded. The same holds true for member variables of aggregate types, with the added side effect that such variables are also static. For example, the declarations of shared1 and shared2 in the following snippet are equivalent:

class SharedMembers {
  __gshared static int shared1;
  __gshared int shared2;   // implicitly static, equivalent to shared1
}

__gshared is a necessity when declaring variables in C library bindings, but it should otherwise be a rarity in normal D code.
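
For example, a binding to a hypothetical C global variable would be declared like this:

// In the C header:  extern int some_c_global;
extern(C) extern __gshared int some_c_global;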

shared

There are a few things to be aware of when applying the shared attribute to a variable. First, it must be understood that shared modifies the type.

int tlsVar;             // type == int
shared int sharedVar;   // type == shared(int)

This has consequences in how shared variables are used as function arguments and assigned to other variables. While value types convert just fine, this does not hold for reference types or pointers; for example, a shared(int)* does not implicitly convert to int*.
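
A short illustration; the final line fails to compile:

shared int sharedVal;
int plainCopy = sharedVal;            // OK: copying a value
shared(int)* sharedPtr = &sharedVal;  // OK: the types match
int* plainPtr = sharedPtr;            // Error: cannot implicitly convert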

Second, shared is transitive. Applying shared to an instance of an aggregate type means all of its members are also shared.

struct ShareMe {
  int* intPtr;
}
shared ShareMe sm;
int x;
sm.intPtr = &x; // Error!

Here, sm.intPtr = &x fails because &x yields int*, not shared(int*), which is the type of sm.intPtr thanks to the declaration of sm as shared.

Third, the compiler prohibits any unprotected, non-atomic modification of a shared variable. In the following snippet, the second line is illegal:

shared int sharedInt;
++sharedInt;

In this case, the error can be avoided using a template function from the core.atomic module in DRuntime.

import core.atomic : atomicOp;
atomicOp!"+="(sharedInt, 1);

Note

As I write, ++sharedInt does not result in a compiler error. Instead, the compiler outputs the following message: Deprecation: read-modify-write operations are not allowed for shared variables. Use core.atomic.atomicOp!"+="(sharedInt, 1) instead. The code will still compile and the program will execute, but there's a good chance for a race condition to appear. At some point, this will become a compiler error. For now, it's necessary to pay attention to the compiler output to ensure that this sort of thing doesn't slip into any code using shared variables.

Synchronization and atomics

Synchronization goes hand-in-hand with data sharing. Without the means to protect a variable from simultaneous access by multiple threads, strange things can happen (note that there is no need to protect data from multiple fibers; Fiber instances are no different from any other class instance in that regard). Another option, as seen in the previous section, is to perform modifications of variables atomically, that is, in a single step, where possible. D supports synchronization in both the language and the runtime, and atomic operations in the runtime.

Automatic synchronization

The synchronized statement creates a block that is protected by a mutex; only one thread at a time can execute it. When the block is entered, the mutex is acquired (locked). When the block is exited, the mutex is released.

private int _someInt;
void setSomeInt(int newVal) {
  synchronized {
    _someInt = newVal;
  }
}

The compiler will allocate a new mutex object specifically for each synchronized block. This behavior can be overridden by providing any expression that yields a class or interface instance for the synchronized statement to use. Every class instance has its own monitor, which the compiler will use instead of allocating a new mutex. That said, it's considered good practice to use an instance of core.sync.mutex.Mutex.

import core.sync.mutex;
auto mutex = new Mutex;
synchronized(mutex) {
    // ...
}

synchronized can be applied to class (but never struct) declarations. Doing so makes every member function of that class synchronized and causes the mutex associated with each instance of the class to be used as the monitor, meaning that it's equivalent to adding a synchronized(this) statement inside every function in the class. With this, only shared instances of the class can be instantiated and all member function calls will be serialized.
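
Here's a minimal sketch with a hypothetical Counter class; each call to a member function acquires the instance's mutex for the duration of the call:

import std.stdio;

// Every member function call on an instance is serialized.
synchronized class Counter {
  private int count;
  void set(int value) { count = value; }
  int get() { return count; }
}

void main() {
  // Instances of a synchronized class must be shared.
  auto counter = new shared Counter;
  counter.set(10);
  writeln(counter.get());
}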

As I write, there are two issues to watch out for regarding synchronized classes. One is public member variables. Right now, it's possible to declare them in a synchronized class, but this can be problematic if they are mutable as it allows for non-synchronized mutation. It is expected that this will be deprecated at some point.

The second is that the documentation at http://dlang.org/class.html#synchronized-classes says the following:

"Member functions of non-synchronized classes cannot be individually marked as synchronized. The synchronized attribute must be applied to the class declaration itself."

In practice, the compiler actually does allow synchronized to be applied to individual member functions. Again, instances of the class must be declared as shared. It is unlikely that this will change, as it is certain to break code in active projects. One such project is DWT, a port to D of the SWT library for Java (see https://github.com/d-widget-toolkit/dwt).

Manual synchronization

The DRuntime package core.sync contains several modules that expose primitives that can be used to manually implement synchronization for different behaviors. The package includes two types of mutexes, a generic recursive mutex in the mutex module, and a mutex that allows for shared read access and exclusive write access in the rwmutex module. Additionally, the modules condition, semaphore, and barrier provide eponymous primitives. If you're looking to implement lock-based data sharing yourself, this is a good place to start.
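
As a brief sketch of what that can look like with the mutex module (the variable names are ours), the lock is acquired at the top of the critical section and released when the scope exits:

import core.sync.mutex : Mutex;
import core.thread : Thread;
import std.stdio : writeln;

__gshared int counter;
__gshared Mutex counterLock;

void increment() {
  counterLock.lock();
  scope(exit) counterLock.unlock();  // released even if an exception is thrown
  ++counter;
}

void main() {
  counterLock = new Mutex;
  auto t1 = new Thread(&increment);
  auto t2 = new Thread(&increment);
  t1.start(); t2.start();
  t1.join(); t2.join();
  writeln(counter);   // always 2
}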

Atomics

An atomic operation is one that appears to happen instantaneously. Such operations are safe in multithreaded programming because there is an inherent guarantee that only one thread can perform the operation at a time, meaning that no locks are required. The core.atomic module in DRuntime provides a handful of functions that allow for lock-free concurrency. Earlier, we observed how to use the template function atomicOp to convert the non-atomic operation of adding 1 to a shared(int) into an atomic one. Other functions in the module allow for atomic loads and stores, atomic compare and swap (cas), and atomic memory barriers (memory fences).
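
A quick sketch of loads, stores, and cas (the variable name is ours):

import core.atomic;
import std.stdio;

shared int flag;

void main() {
  atomicStore(flag, 1);            // atomic write
  auto value = atomicLoad(flag);   // atomic read
  // Compare-and-swap: set flag to 2 only if it is still 1.
  if(cas(&flag, 1, 2)) {
    writeln("flag is now ", atomicLoad(flag));
  }
}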

When using atomic operations, it's important to have a good grasp of memory ordering. Members of the enumeration core.atomic.MemoryOrder can be used with the atomicLoad and atomicStore functions to specify the type of memory barrier instruction the CPU should use in carrying out the operation. Although it's a talk related to C++, a good place to start is Herb Sutter's two-part talk from C++ and Beyond 2012, titled, atomic<> Weapons: The C++ Memory Model and Modern Hardware at https://isocpp.org/blog/2013/02/atomic-weapons-the-c-memory-model-and-modern-hardware-herb-sutter.

Message passing

Phobos provides foundational support for the message passing model of concurrent programming in the std.concurrency module. This is the preferred way of handling concurrency in D; you should only turn to other models if std.concurrency doesn't meet your needs. This module hides most of the raw details of concurrent programming behind a simplified API; rather than manipulating the Thread class directly, programs call std.concurrency.spawn and get a Tid (thread ID) in return that is then used as a marker to identify messages sent and received between threads.

import std.concurrency;
import std.stdio;

void myThreadFunc(Tid owner) {
  // Block until a string message arrives.
  receive(
    (string s) { writefln("Message to thread %s: %s", owner, s); }
  );
}

void main() {
  // Pass the parent's Tid to each child so they know who spawned them.
  auto child1 = spawn(&myThreadFunc, thisTid);
  auto child2 = spawn(&myThreadFunc, thisTid);
  send(child1, "Message for child1.");
  send(child2, "Message for child2.");
}

Here, two new threads are created by passing a pointer to myThreadFunc and the result of thisTid, which returns the Tid of the current thread, to spawn. Then the parent thread sends a message to each child. The send function takes a Tid followed by any number of parameters of any type. The receive function is a template that takes any number of delegates as parameters, each of which can itself have different parameters and return types. The delegates serve as message handlers for the calling thread; when a message arrives, the registered delegates are searched to see if any have a parameter list that matches the arguments passed via the send function. In this example, one handler that accepts a string is registered in each child thread.
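
When multiple handlers are registered, receive tries them in order, and a std.variant.Variant parameter acts as a catch-all for any message the other handlers don't match. A minimal sketch, with a hypothetical worker function:

import std.concurrency;
import std.stdio;
import std.variant : Variant;

void worker() {
  receive(
    (int n)     { writeln("Got an int: ", n); },
    (string s)  { writeln("Got a string: ", s); },
    (Variant v) { writeln("Got something else."); }  // catch-all
  );
}

void main() {
  auto tid = spawn(&worker);
  send(tid, 42);  // matches the int handler
}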

It's notable that std.concurrency deals in logical threads. In other words, a Tid may represent an actual Thread, or it may represent a Fiber. By default, spawn creates new kernel threads, but it's possible to implement a Scheduler, such as the example std.concurrency.FiberScheduler, that causes spawn to create new fibers instead.
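
Installing a scheduler amounts to assigning an instance to the module's scheduler variable and running the program's logic through its start function. A sketch, following the pattern in the module documentation (the worker function is ours):

import std.concurrency;
import std.stdio;

void worker() {
  writeln("Running as a fiber.");
}

void main() {
  // With a FiberScheduler installed, spawn creates fibers, not threads.
  scheduler = new FiberScheduler;
  scheduler.start({
    spawn(&worker);
  });
}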

std.concurrency contains variations of spawn, send, and receive, as well as utility functions and types, which can be used as a foundation for a higher-level message passing API.
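
One such variation is receiveOnly, which waits for a single message of an exact type. Combined with ownerTid, the Tid of the spawning thread, it makes simple request and reply patterns concise. A brief sketch:

import std.concurrency;
import std.stdio;

void echoWorker() {
  // Wait for exactly one string message.
  auto msg = receiveOnly!string();
  // ownerTid is the Tid of the thread that spawned this one.
  send(ownerTid, "echo: " ~ msg);
}

void main() {
  auto tid = spawn(&echoWorker);
  send(tid, "hello");
  writeln(receiveOnly!string());  // prints "echo: hello"
}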

Parallelism

When processing large amounts of data, one way to utilize the power of multi-core processors is to break the data into chunks and process each chunk in parallel. D has support for this in the form of the Phobos module std.parallelism.

The module is built around the Task and TaskPool types, with a few helper functions to make things more convenient to use. A Task represents a unit of work. A TaskPool maintains a queue of tasks and a number of worker threads. Member functions of TaskPool can be called to process and apply algorithms to the data in the task queue. For example, the member functions map and reduce perform the same operations as their std.algorithm counterparts, but do so across multiple threads in parallel. Another interesting member function of TaskPool is parallel, which allows the execution of a parallel foreach loop. There is a convenience function, also called parallel, which uses the default TaskPool instance. The following example scales 100 million two-dimensional vectors. When compiled with -version=SingleThread, it all happens on one thread.

struct Vec2 {
  float x = 1.0f, y = 2.0f;
}

void main() {
  import std.stdio : writeln;
  import std.datetime : MonoTime;

  auto vecs = new Vec2[](100_000_000);
  auto before = MonoTime.currTime;

  version(SingleThread) {
    foreach(ref vec; vecs) {
      vec.x *= 2.0f;
      vec.y *= 2.0f;
    }
  }
  else {
    import std.parallelism : parallel;
    // parallel distributes chunks of vecs across the default task pool.
    foreach(ref vec; parallel(vecs)) {
      vec.x *= 2.0f;
      vec.y *= 2.0f;
    }
  }

  // Print the elapsed time.
  writeln(MonoTime.currTime - before);
}

Given that there are only two multiplications and assignments per vector, there isn't enough work for a parallel foreach loop to be beneficial with lower numbers of vector instances. Change the 100_000_000 to 100_000, for example, and you may find that the parallel version is slower. It was for me. But with 100 million instances, the parallel version won out in multiple runs on my machine. If you need to process large datasets, particularly by performing complex operations, std.parallelism makes it quite simple to take advantage of multiple cores and process the data in parallel.
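
The map and reduce member functions mentioned earlier are just as easy to reach for. Here's a minimal sketch of a parallel sum using the default task pool (the range and names are ours):

import std.parallelism : taskPool;
import std.range : iota;
import std.stdio : writeln;

void main() {
  // Sum the range in parallel across the default task pool's workers.
  auto sum = taskPool.reduce!"a + b"(iota(1L, 10_000_001L));
  writeln(sum);   // 50000005000000
}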

More information

The Phobos documentation at http://dlang.org/phobos/index.html is a source of more detailed information for most of the topics we've covered in this section. Additionally, the concurrency chapter of Andrei Alexandrescu's book The D Programming Language is available online at http://www.informit.com/articles/printerfriendly/1609144. The article Getting More Fiber in Your Diet at http://octarineparrot.com/article/view/getting-more-fiber-in-your-diet contains a more complex fiber example that is compared against an implementation using threads.
