Chapter 5.1. P.4: Ideally, a program should be statically type safe

Type safety is a security feature of C++

I rank type safety close to deterministic destruction among my favorite features of C++. Type safety tells different parts of your program what a bit pattern means. For example, here is the bit pattern for 1,729 as a 32-bit integer:

std::int32_t Ramanujan_i = 0b0000'0000'0000'0000'0000'0110'1100'0001;

Here is the bit pattern for 1,729 as a 32-bit unsigned integer:

std::uint32_t Ramanujan_u = 0b0000'0000'0000'0000'0000'0110'1100'0001;

As you can see, it is identical. However, consider the bit pattern for 1,729 as a float:

float Ramanujan_f = 0b0100'0100'1101'1000'0010'0000'0000'0000; // illustrative only: as an initializer, this converts the integer 1'155'014'656 to float rather than storing the bit pattern

If this bit pattern were interpreted as a 32-bit integer, it would yield the value 1,155,014,656. Conversely, if Ramanujan_i were interpreted as a float, it would yield the value 2.423 × 10⁻⁴².

Other languages use what is known as duck typing. This starts with the premise that the programmer knows exactly what they are doing at all times. Given that premise, if they want to multiply a string by a color, they should be able to do so, and the interpreter will try its very best to satisfy that request, emitting an error at run time if it fails. Compile-time errors are preferable to run-time errors since they do not require code coverage testing to expose them. Enforcing type safety eliminates an entire class of errors.

The C++ standard goes to extraordinary lengths to help you write type-safe and yet readable code. If you call a function that takes a long argument and pass a short argument instead, the compiler knows how to generate code to accommodate that without generating errors. If you pass a float argument, the compiler will silently insert a run-time function call that will truncate the number toward zero and convert the bit pattern to the appropriate integer representation. However, this is a time-consuming function. There was a point in my career where I would grep my compiler output for the vendor implementation of this function, _ftol, to ensure that it was never implicitly called.

Even better, function overloading enables you to create versions of functions for different types, and function templates allow you to specify algorithms independent of types. As we saw in Chapter 1.2, overload resolution is particularly nuanced, going so far as allowing certain conversions between types to ensure correct arguments are passed to function parameters. The _ftol function is an example of this behavior.

The auto keyword eliminates type concerns by inferring types from the code you are writing. If you write

auto Ramanujan = 1729;

the compiler will treat Ramanujan as an integer, populating the memory consumed by the definition with the correct bit pattern. Similarly, if you write

auto Ramanujan = 1729u;

or

auto Ramanujan = 1729.f;

the compiler will infer the type from the assignment (unsigned int and float, respectively) and correctly populate the memory.

All this machinery exists so that you do not need to worry about representation. The less time you spend programming with types and the more time you spend programming with interfaces, the safer your code will be.

Union

There are several ways of subverting type safety, most of which are legacies from C. Consider the union keyword.

union converter {
  float f;
  std::uint32_t ui;
};

This declares a type whose contents may be interpreted as either a float or an unsigned integer. As demonstrated above, this is a road to regret and a pathway to perdition. The programmer is required to know, at all times, which type is being represented. They are entirely entitled to write code such as:

void f1(int);
void f2(float);
void bad_things_happen()
{
  converter c;
  c.ui = 1729;
  f1(c.ui);
  f2(c.f);
}

The call to f2 will pass the value 2.423 × 10⁻⁴², which MAY be what was intended but seems highly unlikely.

The correct way to contain one of many types in a single object is to use the C++17 Standard Library type std::variant. This is a discriminated union, which means that the variant object knows which type it is holding. This follows the advice of Core Guideline C.181: “Avoid ‘naked’ unions.” Use it like this:

#include <variant>
#include <variant>

std::variant<int, float> v;
v = 12;
float f = std::get<float>(v); // Will throw std::bad_variant_access

You might consider this a peculiar example, but I used such a union many years ago to quickly calculate an inverse square root. It looked like this:

float reciprocal_square_root(float val) {
  union converter {
    float f;
    std::uint32_t i;
  };
  converter c = {.f = val};
  c.i = 0x5f3759df - (c.i >> 1);
  c.f *= 1.5f - (0.5f * val * c.f * c.f);
  return c.f;
}

This abomination uses type punning, which is undefined behavior in C++, to exploit the characteristics of floating-point representation and the superior speed of multiplication and bit shifting. The SSE instruction set thankfully rendered it redundant with the addition of the rsqrtss instruction, but since C++20 it has been possible to do this correctly using std::bit_cast, thus:

#include <bit>     // std::bit_cast (C++20)
#include <cstdint> // std::uint32_t

float Q_rsqrt(float val)
{
  auto half_val = val * 0.5f;
  auto i = std::bit_cast<std::uint32_t>(val);
  i = 0x5f3759df - (i >> 1);
  val = std::bit_cast<float>(i);
  val *= 1.5f - (half_val * val * val);
  return val;
}

This use of casting still fights type-safe programming, which allows us to segue nicely into the next section.

Casting

Casting is the act of changing the type of an object. It is a general computer science term and has a rather more precise set of meanings in C++. Changing the type of an object can hurt static type safety. Sometimes it is perfectly safe, as in this example:

short f2();
long result = f2();

Here, the result of the call to f2 has been converted from a short to a long using an implicit conversion. This is a widening integral conversion: the representation is simply widened, and every value that can be represented by an object of type short can also be represented by an object of type long. The same is true when promoting from float to double.

Not all conversions are as safe as that. Consider this:

long f2();
short result = (short)f2();

This is an explicit conversion where the destination type is named. If you know that the return value of f2 lies within the representation range of a short object, then this is conditionally safe. However, it is not entirely safe: the function may change specification and start returning larger numbers. There is no requirement for the compiler to emit a diagnostic telling you that you are performing a risky conversion, although most will warn you.

This style of conversion is often known as a C-style cast. The syntax comes straight from the C programming language, where conversion carried a lot of weight. Consider this function signature:

long max(long, long);

If we call this function with an object of type long and an object of type short and we don’t have conversion available to us, then we will be presented with an error and we will need to write a different function:

long max_ls(long, short);

(Since C doesn’t have overloading, when you write a version of the function that takes different parameter types, you need to use a different name.) Fortunately, we were saved from this excessive verbiage. In today’s C++, though, we have even less reason to worry. With function overloading and function templates we can afford to be more particular about casting.

Casting really can be quite risky. It’s so dangerous in fact that cumbersome new keywords were introduced to highlight that dangerous things are happening. They are:

static_cast<T>(expr)      // Well-defined conversions between related types
dynamic_cast<T>(expr)     // Casts within a polymorphic class hierarchy
const_cast<T>(expr)       // Adds or removes const (or volatile) qualification
reinterpret_cast<T>(expr) // Reinterprets the bit pattern

Core Guideline ES.48: “Avoid casts” warns you against it. The sight of these should strike fear into your heart. Some compilers offer a command-line switch that warns you of an explicit C-style cast so that you can quickly spot when something nasty is going on. Often, this will happen when you are calling a function from another library that is not so well written as yours. Replacing all your C-style casts with static_cast invocations will highlight where everything is getting messy. Your compiler will tell you when a static_cast makes no sense, for example casting from an int to a std::pair<std::string, char>. Just as with the C-style cast, you are saying, “I know what the range of values is, this will be fine.”

Things get more dangerous as you proceed down the list. dynamic_cast will allow you to cast through a hierarchy to a subclass. Just the sound of that description should worry you. When you type dynamic_cast you are that person who says during an argument, “I just know, I don’t need evidence.” A dynamic_cast can of course fail, particularly if the author’s certainty was misplaced. In such a case either a null pointer is returned or, in the case of applying dynamic_cast to a reference, an exception is thrown and you will have to sheepishly catch that exception and clean up the mess. It’s a design smell: you are asking, “Are you an object of type X?” but this is the purview of the type system. If you are explicitly inquiring, you are implicitly subverting it.

With const_cast, we are moving into “downright evil” territory. const_cast allows you to cast away constness or, more rarely, cast to const volatile. We covered this in Chapter 3.4, so you already know this is a bad idea. If you are eliminating const qualification, having been passed, for example, a const-qualified reference to something, you are pulling the rug out from under the caller’s feet. You have advertised in your API that “you can safely pass objects to this function by reference rather than by value, foregoing the overhead of copying, and it will remain unchanged.” This is a lie: if you need to cast away constness, then that suggests that you are going to change the object in some way, contradicting your advertised promise. You will win no friends proceeding like this. Again, the only time you should use this is when you are calling into a library that is poorly written, in this case one that is not const-correct. const_cast is also used to eliminate volatile qualification; there are several deprecations lining up regarding the volatile keyword, so this may become an acceptable use case.

Finally, with reinterpret_cast, we are well into footgun territory. A footgun is for shooting yourself in the foot, and this is a common outcome that accompanies the use of the keyword reinterpret_cast. It converts between types by simply declaring that the bit pattern should now be interpreted as something else. It costs no CPU cycles (unlike static_cast or dynamic_cast, which can insert a run-time function to perform the conversion) and simply says, “This is my show now. I don’t need type safety for this part.” Some things are unavailable for change: you cannot cast away const or volatile using reinterpret_cast. For that you need to perform two consecutive casts, which would look like:

auto c = reinterpret_cast<std::uintptr_t>(&const_cast<MyType&>(f3())); // assuming f3 returns a const MyType&

The sight of all this punctuation should give you pause for reflection. You might call something like this to create a handle to arbitrary types, but honestly, there are safer ways of doing that.

We have already encountered std::bit_cast earlier in the chapter. It is not a language-level operation; rather, it is a library facility. One thing that reinterpret_cast cannot legitimately do is reinterpret the value representation of an object as an unrelated type: reading through the result of such a pointer cast violates the aliasing rules. std::bit_cast does exactly that job in a well-defined way, copying the bit pattern into a fresh object. It is the final level of depravity, as far away from statically type-safe as you can get.

Type safety is a security feature. Casting can break type safety. Highlight casting where it happens and avoid it where possible.

Unsigned

The unsigned keyword is a strange fish. It modifies integral types to signal that they will have unsigned representation: there is no sign bit, no value below zero can be represented, and arithmetic wraps around rather than overflowing. This variation in representation should be making you slightly queasy now after reading about casting.

In my experience, the most common incorrect application of unsigned derives from a misunderstanding of class invariants. An engineer might need to represent a value that can never be less than zero, and so will choose an unsigned type to do so. However, how do they assert that invariant? This will never fail:

void f1(unsigned int positive)
{
  …
  assert(positive >= 0);
}

There is no representation available for any number less than zero, so it will always be greater than or equal to zero. This also informs one of my (least) favorite code review bugs:

for (unsigned int index = my_collection.size(); index >= 0; --index)
{
  … // your program will never leave this loop
}

While it may seem like a good idea to represent kelvin temperature, mass, or screen coordinates with an unsigned type, the problem comes when you want to do some arithmetic. The output of this program is counterintuitive:

#include <iostream>
int main() {
  unsigned int five = 5;
  int negative_five = -5;
  if (negative_five < five) // signed/unsigned mismatch
    std::cout << "true";
  else
    std::cout << "false";
  return 0;
}

This will print false. You have fallen victim to the silent cast. During the comparison operation, negative_five is implicitly converted to an unsigned int. This is not a promotion but a same-width conversion: the two’s complement bit pattern of -5 is reinterpreted as a huge unsigned number, which is of course considerably greater than five. Core Guideline ES.100: “Don’t mix signed and unsigned arithmetic” is very clear about this.

You will notice that we used explicit types rather than auto. If we had instead used auto, the type of five would have been int. To make it an unsigned int we would have had to type:

auto five = 5u;

The default is signed. This is a default that C++ got right.

If you tried compiling this yourself, you will almost certainly have encountered a warning at the comparison, noting that there is a signed/unsigned mismatch. You are perfectly entitled to write this code (it is not an error), but there may be trouble. This is why you should pay attention to all warnings and eliminate each one.

The problem is that if you try mixing signed and unsigned arithmetic, unwanted and entirely predictable things will happen. Your signed values will be converted to unsigned values, and any comparison may yield the incorrect result. In larger code-bases, one of your libraries may export unsigned results, and this will infect libraries that deal purely with signed values. Disaster looms.

Even worse, some code shops obscure the use of unsigned by creating shorter aliases, for example:

using u32 = unsigned int;

Sight of the keyword unsigned should be the equivalent of neon flashing lights telling you to apply the brakes and navigate the hazardous roads ahead with extreme care.

There are some situations where unsigned is the correct choice. Core Guideline ES.101: “Use unsigned types for bit manipulation” highlights one of them. They are very limited, though:

• If you are modeling hardware registers that hold unsigned values

• If you are dealing with sizes rather than quantities, for example the values returned by sizeof

• If you are doing bit manipulation with masks, since you will be doing no arithmetic with these values

Here is the rule: if you are doing any arithmetic, including comparison, use a signed type. If you are using an unsigned type to get an extra bit of representation, you are using the wrong type and you should go wider or recognize that you are performing a very risky optimization. It would not surprise me if a bitfield type makes its way into the language sooner or later, making even that case redundant.

Unfortunately, there is a rather large error in the standard library. All the size member functions on the containers return size_type, an unsigned type that is almost always std::size_t. This is a misunderstanding of the difference between quantity and amount. The size of a container is the quantity of elements it contains. The size of an object is the amount of memory it occupies.

Fortunately, since C++20, we have been blessed with the arrival of the function std::ssize, short for signed size. It returns a signed value. Disavow all use of the size member function, and instead use this nonmember function thus:

auto collection = std::vector<int>{1, 3, 5, 7, 9};
auto element_count = std::ssize(collection);

Buffers and sizes

Staying with sizes, consider buffers. There are two important things to keep in mind when dealing with buffers: the address of the buffer and the length of the buffer. If you do not overflow the buffer, everything is fine. Buffer overflow is a class of run-time error that can be fiendishly hard to discover.

For example, look at this code fragment:

#include <cstring> // strncpy

void fill_buffer(char* text, int length)
{
  char buf[256];
  strncpy(buf, text, length); // overflows buf if length > 256
  …
}

The obvious error is that length may be greater than 256. If you do not assert that the char array is big enough, you are open to risk.

Note that buf is of type char[256] which is a different type to, for example, char[128]. The size is important but can be easily lost by passing the address of the beginning of the array to a function that simply takes a pointer. Consider this pair of functions:

void fill_n_buffer(char*);
void src(std::ifstream& file)
{
  char source[256] = {0};
  … // populate source from file
  fill_n_buffer(source);
}

fill_n_buffer is expecting a char* yet it is being passed a char[256]. This is called array decay, and it is well named because the type is decaying into something less useful. The information about the size of the array is lost: the type char[256] has decayed to a char*. You must hope that fill_n_buffer is able to deal with this reduced information. The presence of the n in the name may suggest that it is expecting a null-terminated string, in the style of the C Standard Library, but we hope you can see that this is a risky proposition that could fail easily.

The code is working at a dangerous level of abstraction. The chance of overwriting memory is high, so the code is unsafe. The correct approach is to use a buffer abstraction rather than directly write to or read from memory. There are several available: std::string is a somewhat heavyweight approach for handling mutating strings of characters, but this is what we have in the standard, and this is not the place to dwell on its nature. However, if you are simply reading a buffer, there is a lighter abstraction called std::string_view. This marvel is a lightweight version of std::string consisting almost entirely of const member functions, the exceptions being the special functions (the default constructor, the move and copy constructors, the move and copy assignment operators, and the destructor) and a pair of shrinking operations, remove_prefix and remove_suffix. It is usually implemented as a pointer and a size. You can construct it with a pointer and a size, or just a pointer, or a pair of iterators. This makes it very flexible and the first choice for working with read-only strings.

If your buffer contains something other than a character type, there are still options. In C++20, std::span was introduced to the library. This is a lightweight, non-owning view over a contiguous sequence of objects, playing the same role for arbitrary element types that std::string_view plays for characters. These two types mean that you should never be authoring functions with parameters that are pointer/size pairs. Wrap them into a buffer abstraction, using std::span or std::string_view.

Summary

It is easier than ever to write secure C++. This should not be a surprise: one of the goals of C++ is safety, and through that, security. You can prioritize type safety and make use of abstractions like span that militate against buffer overflows. You can avoid representation clashes by avoiding unsigned types when using arithmetic, casting only when interfacing with old or poorly written libraries, and preferring discriminated unions to C-style unions. All of these things are easy replacements that lead to safer, easier, and more intelligible code by ensuring that your program makes full use of the type safety C++ provides.
