Chapter 2.6. T.120: Use template metaprogramming only when you really need to

I have served on the program committee of several annual C++ conferences. This means that I participate in choosing what talks are presented to conferences from the many that are submitted. Choosing takes the form of voting: in the usual case there are about a dozen members on the committee. One topic that is guaranteed to get a high score is template metaprogramming, which I shall henceforward refer to as TMP. This is an exciting field of development in C++ because it promises so much, and yet it often does more harm than good: this is yet another manifestation of the shiny hammer syndrome.

There are several things that make TMP rather complex, as well as attractive to conference organizers and developers alike, but also an unattractive proposition to engineering managers.

First, TMP happens at compile time, and the nature of compile-time programming means that there is no mutable state available. This means that it is rather like functional programming, which can be a tricky paradigm to master.

Second, there is no flow control except through recursion. Again, recursion takes some effort to understand; indeed, there is an expression that “to iterate is human, to recurse is divine.”

Third, debugging is not available through conventional means. If your code fails, you must simply look at the problem until the answer occurs to you. There are no other options. If you are lucky, you will get some compiler error output, which rapidly explodes into messages that are hundreds of characters long.

Fourth, I would have to be feeling extremely charitable to describe a piece of TMP as self-documenting. The constructs required to deliver TMP are opaque and require a broad understanding of some dusty corners of the language and the standard library.

Fifth, the impact on compilation time can be dramatic. Template instantiation takes time, and if it happens inside a header, then that time cost will be exacted many times over. Of course, function and class templates require inline definitions to operate, so this is the usual case.

Unfortunately, the sheer exhilaration of getting a piece of TMP to work is hard to beat. Also, TMP can offer superior representations of the problem domain at hand, delivering generic and reusable solutions. It is very tempting to reach for TMP when it is not necessary to do so.

But what IS metaprogramming, and how does TMP model it?

In a nutshell, metaprogramming blurs the line between state and execution by treating code as data. The program can see and understand its own code and reason about it, making decisions based on its own structure and content. This leads to things like self-modifying code. For example, and this is going back many years, knowledge of Z80 assembly has allowed me to change code while it’s running. I was able to change jump destinations while code was running to change the behavior of algorithms at runtime.

In the case of C++, the compiler knows about the language and treats all the declarations as data. It works on this data to produce the final program. This means there is scope for metaprogramming, which is realized through templates. Templates are customization points for the language, yielding new types depending on how they are instantiated. These types can be thought of as the result of calculations based on the programmer’s input. For example, given the function template declaration:

template <class T> void fn(T);

the declaration

fn<int>(17);

can be thought of as the result of applying int to the function template. This makes C++ its own metalanguage, a facility known as reflection. As we shall see, the reflection facilities offered by C++, though Turing complete, are rather obscure and obfuscate rather than clarify code.

Here is a trivial example, for generating the sum of all integers from 1 to N:

// Recursive calculation delivers iteration
template <int N> struct sum_integers {
  static constexpr int result = N + sum_integers<N-1>::result;
};
// Explicit specialization forms the recursion base case // Note that if constexpr may also suffice template <> struct sum_integers<1> { static constexpr int result = 1; };
int main () { return sum_integers<10>::result; }

This will simply return 55: that value will be calculated at compile time. You can try this out and see how big N can get before compilation fails, and look at why it fails.

The state is contained in the result member and is a const value. Repeated explicit instantiation of sum_integers with a successively reducing parameter achieves the recursion required. If you comment out the base case, the compiler should emit an error related to the complexity of the instantiation context. Here it is using clang 12.0.0, edited slightly for presentation:

<source>:3:37: fatal error:
recursive template instantiation exceeded maximum depth of 1024
  static constexpr int result = N + sum_integers<N-1>::result;
                                    ^
<source>:3:37: note:
  in instantiation of template class 'sum_integers<-1014>' requested here
<source>:3:37: note:
  in instantiation of template class 'sum_integers<-1013>' requested here
<source>:3:37: note:
  in instantiation of template class 'sum_integers<-1012>' requested here
<source>:3:37: note:
  in instantiation of template class 'sum_integers<-1011>' requested here
<source>:3:37: note:
  in instantiation of template class 'sum_integers<-1010>' requested here
<source>:3:37: note:
  (skipping 1015 contexts in backtrace;
  use -ftemplate-backtrace-limit=0 to see all)
<source>:3:37: note:
  in instantiation of template class 'sum_integers<6>' requested here
<source>:3:37: note:
  in instantiation of template class 'sum_integers<7>' requested here
<source>:3:37: note:
  in instantiation of template class 'sum_integers<8>' requested here
<source>:3:37: note:
  in instantiation of template class 'sum_integers<9>' requested here
<source>:12:10: note:
  in instantiation of template class 'sum_integers<10>' requested here
  return sum_integers<10>::result;
         ^
<source>:3:37: note:
  use -ftemplate-depth=N to increase recursive template instantiation depth
  static constexpr int result = N + sum_integers<N-1>::result;
                                    ^
1 error generated.
Compiler returned: 1

Here with GCC 11.1:

<source>:
In instantiation of 'constexpr const int sum_integers<-889>::result':
<source>:3:56:
  recursively required from 'constexpr const int sum_integers<9>::result'
<source>:3:56:
  required from 'constexpr const int sum_integers<10>::result'
<source>:12:28:
  required from here
<source>:3:56: fatal error:
template instantiation depth exceeds maximum of 900
(use '-ftemplate-depth=' to increase the maximum)
    3 |   static constexpr int result = N + sum_integers<N-1>::result;
      |                                                        ^~~~~~
compilation terminated.
Compiler returned: 1

The MSVC compiler timed out on Compiler Explorer, although to its credit it produced the best assembly when compiling the correct source code:

main  PROC
      mov  eax, 55                ; 00000037H
      ret  0
main  ENDP

If you are familiar with recursion and base cases, or with proof by induction, this should be clear to you, but the documentation helps here.

There are plenty of resources out there that demonstrate how you can use partial specialization for conditional compilation and use types for returning values. These two features along with iteration via recursion yield TMP as Turing complete, which is a frankly terrifying idea.

This is a trivial example, so let’s look at a genuine use case: expression templates.1 These are sometimes used in linear algebra implementations. Consider this vector class, and by vector we mean the mathematical object rather than the standard container. It is likely to be used in three-dimensional geometry. We shall call it vector_f because it is a vector of floats:

template <size_t N>
class vector_f {
public:
  vector_f();
  vector_f(std::initializer_list<float> init);
  float operator[](size_t i) const; // read-only accessor
  float& operator[](size_t i); // read-write accessor
  size_t const size(); // extract the size parameter
private: std::array<float, N> data; };

1. I am delighted to credit my committee colleague Daveed Vandevoorde, as well as Todd Veldhuizen, as detailed at https://en.wikipedia.org/wiki/Expression_templates.

We want to be able to add vector_f objects together. This means we will need an addition operator function template:

template <size_t N>
vector_f<N> operator+(vector_f<N> const& u, vector_f<N> const& v) {
  vector_f<N> sum;
  for (size_t i = 0; i < N; i++) {
    sum[i] = u[i] + v[i];
  }
  return sum;
}

No rocket science here. It’s a raw loop, but we want to highlight something: the code creates the return value, populates it, and returns it. The compiler can do some loop unrolling for small values of N. We cannot initialize the return value using the initializer list constructor, but it seems entirely sufficient.

A common activity when dealing with vectors is to add several together in a single operation:

vector_f<3> v = a + b + c;

This will result in two loop iterations and a discarded temporary object, the result of a + b. As N gets bigger this cost will become increasingly apparent. The solution is to defer the generation of the addition operator as far as possible. This is achieved by having the addition operator return a special type that evaluates the addition on demand rather than immediately. It’s a kind of lazy evaluation. Strap in…

First, we need an expression class:

template <struct E> class vector_expression {
  public:
    float operator[](size_t i) const {
      return static_cast<E const&>(*this)[i]; }
    size_t size() const {
      return static_cast<E const&>(*this).size; }
};

This uses the curiously recurring template pattern, possibly the first piece of TMP to be published. It delegates the bracket operator and the size call to the class which is the parameter of this class template.

We now need to derive vector_f from this class:

template <size_t N> class vector_f
    : public vector_expression<vector_f<N>> {
public:
  vector_f();
  vector_f(std::initializer_list<float> init);
  template <class E>
  vector_f (vector_expression <E> const& e);
  float operator[](size_t i) const; // read-only accessor
  float& operator[](size_t i); // read-write accessor
  size_t const size(); // extract the size parameter
private: std::array<N, float> data; };

We have added a new constructor which takes the parent as a parameter. This is where we construct an instance from an expression type rather than a list of values. It looks like this:

template <size_t N>
template <class E>
vector_f<N>::vector_f(vector_expression<E> const& e)
  : data(e.size()) {
  for (size_t i = 0; i != e.size(); ++i) {
    data[i] = e[i]; //(1)
  }
}

This is where the evaluation takes place. The final part is the actual addition expression class. This looks like:

template <class E1, class E2> class vector_sum
  : public vector_expression<vector_sum<E1, E2>> {
  E1 const& u;
  E2 const& v;
public:
  vector_sum(E1 const& u, E2 const& v);
  float operator[](size_t i) const { return u[i] + v[i]; } //(2)
  size_t size()  const { return v.size(); }
};
template <typename E1, typename E2> vector_sum<E1, E2> operator+(vector_expression<E1> const& u, vector_expression<E2> const& v) { return vector_sum<E1, E2>(*static_cast<E1 const*>(&u), *static_cast<E2 const*>(&v)); }

That is everything required for addition. The expression

a + b + c

is now no longer of the type vector_f<3>, but of the type

vector_sum<vector_sum<vector_f<3>, vector_f<3>>>

When this expression is assigned to a vector_f<3> object, the constructor that takes a vector_expression is invoked, which assigns the expression elements to the data elements (1). The bracket operator for vector_sum returns the sum of the two objects (2), which recurses to the sum of another two vector_sum objects, which

finally expands to the sum of three elements.

As you can see, this requires no temporaries and only one loop, which is exactly what we were looking for. This sort of lazy evaluation is also used in the ranges library: expression types are built during compilation and evaluated at the moment of assignment.

Now, that example took a lot of exposition. It is clear that the amount of documentation required to explain what is going on is considerable, and the biggest problem I have with code reviews is the lack of documentation, which means that the chances of sufficient documentation for tricks like this are low. However, this is a popular pattern that can yield benefits, so, if you are going to deploy it (because you really need to), make sure you do so with thorough documentation. Also, make sure that you do really need to deploy this pattern. Compiler writers are clever. Check that there is indeed a shortfall in performance. Look at the generated assembly before and after. Measure performance. Measure build time. What yielded advantage three years ago may no longer have the edge, so look before you leap, or rather, check before you commit.

std::enable_if => requires

Let’s look at another popular technique.

When C++98 emerged, one of the first things I did was look at the interactions of the containers with the algorithms. At the time, I was working for a game company in London and trying to persuade some of the engineers that C++ was the way to go (I was possibly a little premature). I demonstrated tiny functions that would search and sort their way through our collections of stuff. The big problem, though, was std::vector. Whenever its capacity was exceeded and a resize was triggered, it would copy the contents, one by one, to the new location using a for loop, and then it would destroy the prior versions. Of course it did: the standard required the contents be copy constructed to their new location and then their destructors invoked. In the majority of situations, though, there was an obvious optimization, and that was to use memcpy rather than per-item copying.

memcpy is a C function that you should never use directly, since it simply duplicates the contents of memory somewhere else, without any regard for the copy constructor. This means that although the data is correct, any additional initialization, for example registering objects with a notification service, or destruction, for example deregistering from the notification service, is skipped. memcpy is an implementation detail that should not be part of your daily use.

My colleagues turned and walked away. I was crestfallen and decided to write a vector that would use memcpy rather than copy constructing and destructing as it should have done. I called it mem_vector. This was only for use with types that had no constructor or destructor, and it was great. Everyone was excited about it and put it to good use in combination with the algorithms. I was on cloud nine for about three days until someone broke everything by adding a constructor to a type. The mem_vector was no longer fit for purpose in that case, and soon it became clear that it was not fit for purpose unless the code was finished, by which point The Old Ways had been followed anyway.

What I REALLY needed was a way of choosing between using memcpy and individual per-element copy-and-destroy. This was a bit of a head scratcher, since it meant providing a member function overload that would be chosen for families of types rather than specific types. I wanted to be able to declare:

template <class T>
void mem_vector<T>::resize(size_type count);

(this was before C++11) along with

template <class Trivial>
void mem_vector<Trivial>::resize(size_type count);

and have the compiler call the appropriate mem_vector depending on the nature of the template parameter rather than the specific type of the template parameter. One day I heard about SFINAE (Substitution Failure Is Not An Error) and suddenly it dawned on me: I could embed an empty struct called trivial in all the appropriate types. Then I could redeclare the second resize function:

template <class T>
void mem_vector<T>::resize(size_type count, T::trivial* = 0);

If the trivial member struct did not exist, then the function would not be considered. But all I had done was move the problem around. There was still no way of enforcing the trivial member struct would be removed if the class were no longer trivial. This was especially obvious when structs were inherited from other trivial structs and the parent struct became nontrivial. I despaired.

It wasn’t all bad: we were able to fudge something together using compiler-specific intrinsics, and then C++11 arrived. Suddenly everything was fixed by the awesome power of std::enable_if and the type_traits header. Oh, frabjous day!

There was one teensy little problem, though. The code was nearly illegible. Types like this:

std::enable_if<std::is_trivially_constructible<T>::value>::type

(yes, that’s a type) became scattered throughout the codebase, introducing further cognitive load.

Again, the problem with this TMP technique is that it obfuscates the code and puts the brakes on rapid apprehension of what is going on. Frankly, it looks like transmission noise to me. Yes, with practice, we became able to read it with some ease, but there was a big learning phase for new programmers.

This complexity did not prevent the widespread adoption of this technique. A quick tour through older GitHub repositories will reveal swathes of code with carefully crafted function overloads differentiated by std::enable_if clauses of varying verbosity. The committee was not blind to this problem, though, and in C++17 a new facility was added to the language: the constexpr if statement.

This beauty solved a few classes of problems that would historically have seen you reaching for enable_if. It allows you to evaluate an expression at compile time and choose execution based on the result. You can use it like this:

if constexpr(sizeof(int) == 4)
{
  // ints are 32 bits wide.
}

but particularly you can use it where you might have used enable_if, for example:

if constexpr(std::is_move_constructible_v<T>)
{
  // the function template is being specialized for a move-constructible type
}

Rather than having function template overloads for different types, you were now able to express the difference in a single function. For example, the following pair of overloads:

template <class T, typename =
    std::enable_if<std::is_move_constructible_v<T> >::type>
void do_stuff()
{
  …
}
template <class T, typename =
    std::enable_if<!std::is_move_constructible_v<T> >::type>
void do_stuff()
{
  …
}

can be replaced with:

template <class T>
void do_stuff()
{
  if constexpr(std::is_move_constructible_v<T>)
  {
    …
  }
  else
  {
    …
  }
}

The committee didn’t stop there, though. After a long gestation period concepts were added to the language, which brought with them constraints and requires

clauses. This makes it much clearer to specify the restrictions on types for specialization. A requires clause looks like this:

template <class T>
requires std::is_move_constructible_v<T>
void do_stuff()
{
  …
}

This function will only be available to types that are move constructible. This is conceptually identical to std::enable_if but it is a lot easier to apprehend. This gives us a way of avoiding TMP and being explicit with our intentions.

There is still more to come. One of the committee study groups, SG7, is devoted to reflection. This is what is being attempted with much of TMP. The intention is to add to the language many of the facilities that are being handcrafted using TMP, and eventually to eliminate the need for TMP. Reflection is a big part of metaprogramming, and the study group is bringing together many ideas to deliver a coherent metaprogramming strategy and solution. Even more features are being worked on that rely on reflection, one of which is metaclasses. This allows the programmer to define the shape of a class rather than just the substitutable types, allowing clients of the metaclass to instantiate types of classes without worrying about boilerplate code.

All of these features will deliver better clarity for everyone: rather than adding burdensome additional things to learn, it will enable the simplification of a lot of existing code and the elimination of unnecessary angle brackets.

We hope that this guideline has impressed upon you the complexity of TMP and why it should be deployed as a last resort. Of course, there are places where metaprogramming can only be delivered through the careful use of templates. The Hacker’s Delight of solving problems in such a way is strong but, in time, you should be able to deliver clearer code.

There is an aphorism that every engineer should hold close to their heart.

Clever code is easy. Easy code is clever.

The best code is that which is presented to the reader who, upon examining it, says, “What’s so special about this? It’s obvious.” I get the Hacker’s Delight when someone says this to me. It is very unusual for anyone to say that about template metaprogramming.

Summary

In summary:

• Metaprogramming in C++ is modeled via function and class templates

• This self-awareness offers reflection

• Metaprogramming techniques, while useful, have been adopted into the language more explicitly

• This trajectory is continuing, obviating the need for metaprogramming via templates

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.135.200.211