Basic types

Most of D's basic data types will be familiar to C-family programmers. In this section, we're first going to look at what the basic data types are. Then we'll discuss a couple of features that are related not only to the basic types, but to all types.

The types

First up, D includes the special type void to mean no type. There is no such thing as a variable of type void. As in C, void is used to indicate that a function does not return a value. void pointers can be declared to represent pointers to any type.

Instances of the bool type are guaranteed to be eight bits in size and can hold one of two possible values: true and false. In any expression that expects a Boolean value, any zero value is converted to false and non-zero is converted to true. Conversely, in any expression that expects a numeric type, false and true are converted to 0 and 1. Variables of type bool are initialized to false by default.
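A minimal sketch of these conversions, assuming a standard D compiler with Phobos available; the variable names are only illustrative:

```d
import std.stdio : writeln;

void main()
{
    bool flag;                 // no initializer, so flag is false
    writeln(flag);             // false
    writeln(bool.sizeof);      // 1 (one byte, that is, eight bits)

    int count = 3;
    if (count)                 // non-zero is treated as true in a condition
        writeln("count is non-zero");

    int sum = 10 + true;       // true converts to 1 in a numeric expression
    writeln(sum);              // 11
}
```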

D supports signed and unsigned versions of integral types in 8-, 16-, 32-, and 64-bit flavors. The default initialization value of each is 0. The following table lists the size in bits along with the minimum and maximum values for each integral type. Note that the unsigned types are named by prefixing a u to the name of their signed counterparts.

Name    Size (bits)  Minimum Value               Maximum Value
byte    8            -128                        127
ubyte   8            0                           255
short   16           -32,768                     32,767
ushort  16           0                           65,535
int     32           -2,147,483,648              2,147,483,647
uint    32           0                           4,294,967,295
long    64           -9,223,372,036,854,775,808  9,223,372,036,854,775,807
ulong   64           0                           18,446,744,073,709,551,615

D supports three floating-point types. In addition to the traditional 32-bit float and 64-bit double, there is a third type called real, which is guaranteed to be the largest floating-point size the hardware supports. On x86, that is either 80 bits or the size of a double, whichever is larger. In practice, floating-point operations in D may be performed at the largest hardware size even when the operands are declared as float or double. Floating-point computations and representations in D follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754). The pages http://dlang.org/float.html and http://dlang.org/d-floating-point.html are recommended reading.

Floating-point types are default-initialized to a type-specific value representing NaN (Not a Number). When a floating-point variable is assigned a value too high or too low for it to represent, it is set to a value representing infinity or negative infinity respectively.
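The following sketch illustrates the default NaN initialization and the overflow to infinity. It assumes Phobos is available for std.math.isNaN; the variable names are hypothetical.

```d
import std.math : isNaN;
import std.stdio : writeln;

void main()
{
    float f;            // no initializer, so f starts out as NaN
    double d;           // likewise
    writeln(isNaN(f));  // true
    writeln(isNaN(d));  // true

    // NaN never compares equal to anything, including itself.
    writeln(f == f);    // false

    float big = float.max;
    big *= 2;           // the result is too large for a float
    writeln(big);       // prints the overflowed value
}
```

Because an uninitialized floating-point variable is NaN rather than 0, forgetting to initialize one tends to show up quickly in the results of any computation that uses it.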

There are three types in D which are intended to represent UTF code units. The default initialization value for each type is an invalid Unicode value. The following table lists each character type, its size in bits, its Unicode encoding, and its initialization value in hexadecimal notation.

Name   Size (bits)  Encoding  Init Value
char   8            UTF-8     0xFF
wchar  16           UTF-16    0xFFFF
dchar  32           UTF-32    0x0000FFFF
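As a quick check of the table, the sketch below prints the size and default value of each character type. The casts to uint are there only so that the values are formatted as numbers rather than as characters.

```d
import std.stdio : writefln;

void main()
{
    char c;    // UTF-8 code unit
    wchar w;   // UTF-16 code unit
    dchar d;   // UTF-32 code unit

    // %X prints an integer in uppercase hexadecimal.
    writefln("char:  %s bits, init 0x%X", char.sizeof * 8, cast(uint) c);
    writefln("wchar: %s bits, init 0x%X", wchar.sizeof * 8, cast(uint) w);
    writefln("dchar: %s bits, init 0x%X", dchar.sizeof * 8, cast(uint) d);
}
```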

Literals

D supports several different formats for basic type literals. We'll look at each in turn.

Integer literals

Integer literals can take three forms: decimal, hexadecimal, and binary. Hexadecimal numbers are denoted by the 0x or 0X prefixes and binary numbers by 0b or 0B. Any leading 0s after the prefix can be omitted. For example, 0x00FF and 0xFF are identical, as are 0b0011 and 0b11. Any other integer literal, as long as it does not begin with 0, is interpreted as decimal. Each format also allows for any number of underscores in any position except the first.

int d1 = 1000000;
int d2 = 1_000_000;
int h1 = 0x000000FF;    // Hexadecimal for 255
int h2 = 0X00_00_00_FF; // Ditto
int h3 = 0xFF;          // Ditto
int b1 = 0b01010101;    // Binary for 85
int b2 = 0B0101_0101;   // Ditto
int b3 = 0b101_0101;    // Ditto

Octal literals are supported in Phobos via the std.conv.octal function.

import std.conv : octal;
int oct = octal!377;

The ! in octal!377 is the syntax for template instantiation, which we'll examine in Chapter 5, Generic Programming Made Easy.
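A short sketch of the octal template in use, assuming a recent Phobos. Besides an integer argument, std.conv.octal also accepts the digits as a string.

```d
import std.conv : octal;
import std.stdio : writeln;

void main()
{
    writeln(octal!377);       // 255
    writeln(octal!"377");     // 255, digits given as a string
    writeln(octal!10 == 8);   // true
}
```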

By default, integer literals are inferred as int unless the value is too large for an int, in which case they are inferred as long. (Hexadecimal literals are a partial exception: one that is too large for int but fits in uint, such as 0xFFFF_FFFF, is inferred as uint.) Decimal literals too big for long refuse to compile without help. That comes in the form of the uL and UL suffixes, which both force a literal to be interpreted as ulong. There are also the suffixes u and U to force uint (or ulong when the value is too large for uint), and L to force long. All of these suffixes work with decimal, hexadecimal, and binary literals, as well as the octal template (octal!377uL). In the following example, the typeid expression is used to obtain a textual version of each literal's type, which writeln then prints to stdout. Can you guess what the output of each is going to be? One of these lines will cause a signed integer overflow error.

writeln(typeid(2_147_483_647));
writeln(typeid(2_147_483_648));
writeln(typeid(2_147_483_648U));
writeln(typeid(9_223_372_036_854_775_807));
writeln(typeid(9_223_372_036_854_775_808));
writeln(typeid(9_223_372_036_854_775_808UL));
writeln(typeid(10));
writeln(typeid(10U));
writeln(typeid(10L));
writeln(typeid(10UL));
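The suffixes can also be checked at compile time instead of printed. The sketch below applies them to hexadecimal and binary literals; static assert and the is(typeof(...) == T) test are standard D, and the program compiles only if every claim holds.

```d
// Each static assert is evaluated by the compiler, not at run time.
static assert(is(typeof(0xFF) == int));      // no suffix, fits in int
static assert(is(typeof(0xFFu) == uint));    // u forces an unsigned type
static assert(is(typeof(0b1010L) == long));  // L forces long
static assert(is(typeof(0xFFuL) == ulong));  // uL forces ulong
static assert(is(typeof(0b1010UL) == ulong));

void main() {}
```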

Floating-point literals

Floating-point literals can be represented in both decimal and hexadecimal forms. They are interpreted as double by default. Appending f or F will force a float and appending L will force a real. Note that 3.0, 3.0f, and 3f are all floating point literals, but 3 and 3L are integrals.

writeln(typeid(3.0));
writeln(typeid(3.0f));
writeln(typeid(3.0F));
writeln(typeid(3.0L));
writeln(typeid(3f));

Exponential notation is also supported as is the rarely-used hexadecimal format for floating point. The latter takes some getting used to if you aren't familiar with it. A description of both can be found on my D blog at http://dblog.aldacron.net/floating-point-literals-in-d/.
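As a brief illustration of both notations: in exponential form, the number after e is a power of ten, while in hexadecimal form the number after p is a power of two and is required.

```d
import std.stdio : writeln;

void main()
{
    double a = 1.5e3;    // 1.5 * 10^3
    double b = 2.5e-1;   // 2.5 * 10^-1
    double c = 0x1p4;    // hexadecimal: 1 * 2^4
    double d = 0x1.8p1;  // hexadecimal: 1.5 * 2^1 (0x.8 is 8/16)

    writeln(a); // 1500
    writeln(b); // 0.25
    writeln(c); // 16
    writeln(d); // 3
}
```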

Character literals

The type of a character literal is the smallest character type that can hold the character in a single code unit. The size of each code unit depends on the encoding represented by the type. The difference can be seen here:

char c1 = 'a';  // OK: one code unit
char c2 = 'é';  // Error: two code units
wchar wc = 'é'; // OK: one code unit

In UTF-8, which is what the char type represents, one code unit is eight bits in size. The literal 'a' fits nicely in eight bits, so we can store it in a variable of type char. The literal 'é' requires two UTF-8 code units, so it cannot be represented by a single char. Since it's only one code unit in UTF-16, the type of the literal is wchar.

Conversions

D has some rules that make it easy to know when one type can be converted to another through implicit conversion, and when a cast is needed to force explicit conversion. The first rule we'll see concerns integral types: narrowing conversions are never implicit. Exhibit A:

int a = 100;
ubyte b = a; // Error: narrowing conversion

Because int is a 32-bit value and ubyte is an 8-bit value, it doesn't matter that 100 will fit into a ubyte; it's a narrowing conversion and D just doesn't allow those implicitly. It can be coerced with a cast:

ubyte b = cast(ubyte)a;

In this case, the value 100 will be assigned to b successfully. However, if it were a value that does not fit in eight bits, such as 257, the cast would cause all but the eight least significant bits to be dropped, resulting in b having a completely different value. Note that going from an unsigned type to the signed type of the same size is not considered a narrowing conversion. The compiler will always implicitly convert in this case, and vice versa. Just be aware of the consequences. For example:

ubyte u = 255;
byte b = u; // b is -1
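The truncation described above for a value such as 257 can be seen in a small sketch. The bitwise AND is included only to show that the cast keeps the same low eight bits.

```d
import std.stdio : writeln;

void main()
{
    int a = 257;                // binary 1_0000_0001
    ubyte b = cast(ubyte) a;    // only the eight least significant bits survive
    writeln(b);                 // 1
    writeln(a & 0xFF);          // 1, the same low eight bits
}
```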

Next we can say that floating point types are never implicitly converted to integral types, but integral types are always implicitly converted to floating point.

float f1 = 3.0f;
int x1 = f1;   // Error
int x2 = 2;
float f2 = x2; // OK: f2 is 2.0

You can cast f1 to int and the assignment to x1 will compile, but in doing so you'll lose the fractional part of the float.
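A sketch of what the cast does to the fractional part; the values are chosen only for illustration:

```d
import std.stdio : writeln;

void main()
{
    float f1 = 3.7f;
    int x1 = cast(int) f1;   // the fractional part is discarded
    writeln(x1);             // 3

    float f2 = -3.7f;
    writeln(cast(int) f2);   // -3 (truncates toward zero, does not round)
}
```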

When a literal is assigned to a variable, the compiler uses a technique called value range propagation to determine whether or not to allow compilation without a cast. Essentially, if the literal can be represented by the type it's being assigned to, then the assignment (or initialization) will compile. Otherwise, the compiler produces an implicit conversion error, which can be eliminated by a cast. Some examples of this are as follows:

ubyte ub = 256;     // Error
byte b1 = 128;      // Error
byte b2 = 127;      // OK
float f = 33;       // OK
int i = 3.0f;       // Error
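Value range propagation is not limited to literals. The sketch below goes slightly beyond the examples above and shows it applied to an expression whose possible range the compiler can work out; the variable names are hypothetical.

```d
void main()
{
    int a = 1000;

    // The mask limits the result to 0 .. 255, so no cast is needed.
    ubyte low = a & 0xFF;

    // The compiler cannot prove that this fits, so it is rejected:
    // ubyte next = a + 1;   // Error

    assert(low == 232);      // 1000 is 0x3E8, and 0xE8 is 232
}
```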

The last scenario to consider is when multiple types are used in binary expressions, which are expressions that have two operands. Take the addition expression as an example. What type is a?

byte b = 10;
short s = 1024;
auto a = b + s;

Answering this question requires knowing the rules for arithmetic conversions. If either operand is real, double, or float, then the other operand is converted to that type. If the operands are both integral types, integer promotion is applied to each of them. Types smaller than int (bool, byte, ubyte, short, ushort, char, wchar) are converted to int; dchar is converted to uint; however, int, uint, long, and ulong are left untouched. Once integer promotion is complete, the following steps are taken in order:

  • If both operands are the same type, no more conversions are necessary
  • If both are signed or both are unsigned, the smaller type is converted to the larger
  • If the unsigned type is smaller than the signed type, it's converted to the signed type
  • The signed type is converted to the unsigned type

Applying these rules to the snippet above, neither b nor s is a floating-point type, so integer promotion takes place. Both types are smaller than int, so both are promoted to int. Next, we find that, since both operands are now int, no further conversions are necessary and the operation can take place. So a is of type int. Declare all three variables as ubyte and the same rules apply: the result of the addition is still int, which cannot be implicitly narrowed to ubyte. This is a common source of confusion for new D users who don't understand why they get a compiler error in that case.
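The rules can be confirmed at compile time. The following sketch uses static assert with typeof to check the result type of several mixed expressions; the variables other than b, s, and a are hypothetical additions.

```d
import std.stdio : writeln;

void main()
{
    byte b = 10;
    short s = 1024;
    auto a = b + s;
    static assert(is(typeof(a) == int));      // both operands promoted to int

    ubyte x = 200, y = 100;
    static assert(is(typeof(x + y) == int));  // still int after promotion
    // ubyte z = x + y;                       // Error: int does not narrow implicitly
    ubyte z = cast(ubyte)(x + y);             // explicit cast; 300 wraps to 44
    writeln(a, " ", z);                       // 1034 44

    int i = -1;
    uint u = 1;
    long l = 1;
    static assert(is(typeof(i + u) == uint));     // same size: signed becomes unsigned
    static assert(is(typeof(l + u) == long));     // smaller unsigned becomes larger signed
    static assert(is(typeof(i + 1.5f) == float)); // a floating-point operand wins
}
```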

Alias declarations

An alias declaration allows an existing type (and other symbols, as we'll see later) to be referred to by a different name. This does not create a new type. Consider the following:

alias MyInt = int;
MyInt mi = 2.0;

The second line will fail to compile, producing an error message telling you that double cannot be implicitly converted to int. There's no mention of MyInt at all, because to the compiler, it isn't a type. It's simply a synonym for int.

Two aliases that are declared by default are size_t and ptrdiff_t. The former is defined to be an unsigned integral type large enough to represent an offset into all addressable memory. The latter is defined to be a signed basic type the same size as size_t. In practice, that means the respective types are uint/int in 32-bit and ulong/long in 64-bit.
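Both points can be checked with a few static asserts. This is only a sketch; the comparison with the size of a pointer is an assumption about how "all addressable memory" translates into practice on common targets.

```d
import std.stdio : writeln;

alias MyInt = int;

// An alias names the same type; it does not create a new one.
static assert(is(MyInt == int));

// size_t is pointer-sized and unsigned; ptrdiff_t is its signed counterpart.
static assert(size_t.sizeof == (void*).sizeof);
static assert(ptrdiff_t.sizeof == size_t.sizeof);
static assert(size_t.min == 0);
static assert(ptrdiff_t.min < 0);

void main()
{
    // Prints the names of the underlying types for the current target.
    writeln(size_t.stringof, " / ", ptrdiff_t.stringof);
}
```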

Properties

You can think of properties as values that can be queried to divine information about types or type instances. Some properties are common to all types, others are type-specific. Some are context-dependent, meaning they can return different values depending on whether the query is made on a type, a variable, or a literal. Others are context-neutral, meaning they always return the same value for any given type and instances of that type. The following snippet demonstrates accessing a property:

writeln(int.sizeof);
writeln(3.sizeof);
int a;
writeln(a.sizeof);

Properties are accessed using dot notation on a type, a literal, or a variable, with the name of the property following the dot. The .sizeof property is one of those properties common to all types. It's also one that is context-neutral. Run the snippet and you'll find that the same value is printed for .sizeof on the type int, the integer literal 3, and the variable a.

There are five common properties that are available on every type. The two we most often care about are .init and .sizeof. The former tells you the default initialization value of a given type; the latter tells you the size, in bytes, of a given type as size_t. You can read about all the basic type properties, including those not shown anywhere in this section, at http://dlang.org/property.html.

Most built-in types have a few type-specific properties. The integral types all have properties called .min and .max that return the minimum and maximum values representable by variables of that type. Floating-point types have a number of properties, most of which are only of interest to people doing fairly involved floating-point work. Of general interest may be .nan and .infinity, which return the values of NaN and infinity. For floating-point types, .max returns the largest finite value representable, and its negation is the minimum.

writeln(float.max);     // Maximum float value
writeln(-float.max);    // Minimum float value
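A few more of the properties mentioned above, gathered into one sketch. It assumes Phobos for std.math.isNaN, since NaN cannot be tested with ==.

```d
import std.math : isNaN;
import std.stdio : writeln;

void main()
{
    writeln(int.init);                    // 0
    writeln(bool.init);                   // false
    writeln(double.init);                 // nan
    writeln(long.sizeof);                 // 8
    writeln(short.min, " ", short.max);   // -32768 32767
    writeln(float.infinity);              // inf
    writeln(isNaN(float.nan));            // true
    writeln(float.infinity > float.max);  // true
}
```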

We're not going to go into all floating-point properties here. We will, however, take a look at an example of a program that reproduces the integral types table from earlier in this section.

auto formatStr = "%10-s %10-s %20-s %20-s";
writefln(formatStr, "Name", ".sizeof", ".min", ".max");
writefln(formatStr, "byte", byte.sizeof, byte.min, byte.max);
writefln(formatStr, "ubyte", ubyte.sizeof, ubyte.min, ubyte.max);
writefln(formatStr, "short", short.sizeof, short.min, short.max);
writefln(formatStr, "ushort", ushort.sizeof, ushort.min, ushort.max);
writefln(formatStr, "int", int.sizeof, int.min, int.max);
writefln(formatStr, "uint", uint.sizeof, uint.min, uint.max);
writefln(formatStr, "long", long.sizeof, long.min, long.max);
writefln(formatStr, "ulong", ulong.sizeof, ulong.min, ulong.max);

writefln was introduced in the previous chapter. It uses the same format specifiers that C uses, most of which have the same meaning. You'll find that %s is quite different, though. In C, it indicates that an argument is a string. In D, it means the default formatting for the given type should be used. For example:

writefln("float.max is %s and int.max is %s", float.max, int.max);

Here, writefln will substitute the value of float.max for the first %s and use the default float formatting. Similarly, int.max replaces the second %s with the default formatting for int. If you make a mistake and have more specifiers than arguments, you'll have no trouble compiling but will get a FormatException at runtime. If you have more arguments than specifiers, the extra arguments will be ignored.
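The runtime failure can be observed without crashing the program by catching the exception. The sketch below uses std.format.format, which builds a string from the same kind of format string that writefln accepts; the function name missingArgumentThrows is hypothetical.

```d
import std.format : format, FormatException;
import std.stdio : writeln;

// Returns true if formatting with too few arguments throws.
bool missingArgumentThrows()
{
    try
    {
        // Two specifiers, but only one argument to fill them.
        auto s = format("%s and %s", 1);
        return false;
    }
    catch (FormatException e)
    {
        return true;
    }
}

void main()
{
    writeln(missingArgumentThrows()); // true
}
```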

We aren't using plain old %s in our program. We've added 10 and - between % and s. Format specifiers begin with % and end with a character. Several things can go in between. The 10 indicates that we want an argument to be printed in a field at least ten characters wide. The - means we want to left-justify the text within the field. %-10s and %10-s are the same. For example, the string Name has four characters. Left justified in a field of ten characters, it will be followed by six spaces. The actual output looks like this:

[Figure: the program's output, captioned "Properties"]

You can read more about format strings and format specifiers at http://dlang.org/phobos/std_format.html.
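To see the effect of the width and the - flag in isolation, here is a small sketch using std.format.format. The | characters are added only to make the edges of each field visible.

```d
import std.format : format;
import std.stdio : writeln;

void main()
{
    // Each field is at least ten characters wide.
    writeln(format("|%-10s|", "Name")); // |Name      |
    writeln(format("|%10s|", "Name"));  // |      Name|
    writeln(format("|%-10s|", 42));     // |42        |
}
```

Without the -, the text is right-justified and the padding goes in front of it instead.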
