
6. Using Primitive Types

In this chapter, you will learn:
  • How to write numeric literals in hexadecimal, octal, or binary notation

  • How to use the underscore character to make numeric literals easier to read

  • How to use the exponential notation to write huge or tiny numbers in a compact form

  • What the twelve primitive integer numeric types and the two primitive floating-point numeric types are, what their ranges are, and when it is best to use each of them

  • How to specify numeric literals of concrete types or of unconstrained types

  • How to convert a numeric value to another numeric type

  • The other primitive types: Booleans, characters, and empty tuples

  • How type inference works

  • How to express the types of arrays and vectors

  • How to assign a name to a compile-time constant

  • How to use the compiler to discover the type of an expression

Nondecimal Numeric Bases

The way we write numbers every day uses the so-called decimal notation, or base-ten notation. However, it is sometimes handy to write numbers in a base other than ten:
let hexadecimal = 0x10;
let octal = 0o10;
let binary = 0b10;
let mut n = 10;
print!("{} ", n);
n = hexadecimal;
print!("{} ", n);
n = octal;
print!("{} ", n);
n = binary;
print!("{}", n);

This will print: 10 16 8 2.

This happens because, if an integer literal starts with a zero digit followed by an “x” (the third letter of “hexadecimal”), the number is expressed in hexadecimal notation; if it starts instead with a zero followed by an “o” (the initial of “octal”), it is expressed in octal notation; and if it starts with a zero followed by a “b” (the initial of “binary”), it is expressed in binary notation. In every other case, the number is expressed in decimal notation.

The numbers of this example are expressed in different notations, but they are all of the same type: integer numbers.

The n variable could receive the assignments from the other variables, because they are all of the same type.

Instead, floating-point numbers can be expressed only in decimal notation.

Notice that such representations exist only in source code, as the machine code generated by the Rust compiler always uses a binary notation, both for integer numbers and for floating-point numbers.

For example, the program:
print!("{} {}", 0xA, 0b100000000);
and the program
print!("{} {}", 10, 256);

generate exactly the same executable program.

One last point: the letters used as hexadecimal digits may be uppercase or lowercase, indifferently. For example, the number 0xAEf5b is equal to the number 0xaeF5B.

However, the letters used to indicate the numeric base must be lowercase. Therefore, the expressions 0X4, 0O4 and 0B4 are illegal.

The Underscore in Numeric Literals

We can write the integer number “one billion” as 1000000000. But are you sure that it contains exactly nine zeros?

The integer number “one billion” is more readable if you write it as 1_000_000_000. Underscore characters (“_”) can be inserted anywhere in a numeric literal, even a floating-point one, and they are ignored by the compiler.

Even the number 3___4_.56_ is valid, and it is equal to 34.56. Usually, though, underscores are used only to group decimal or octal digits by three, and hexadecimal or binary digits by four, as in:
let hexadecimal = 0x_00FF_F7A3;
let decimal = 1_234_567;
let octal = 0o_777_205_162;
let binary = 0b_0110_1001_1111_0001;
print!("{} {} {} {}", hexadecimal, decimal, octal, binary);

This will print: 16775075 1234567 134023794 27121.

The Exponential Notation

Floating-point numbers can reach huge values, both positive and negative, like a billion billion billion billions, and also extremely small values, that is, values very close to zero, like one billionth of a billionth of a billionth of a billionth. If we wrote the literals of such numbers using the notation seen so far, we would have to write many zeros, and the resulting numbers would be hard to read, even with underscores.

But you can also write floating-point literal numbers in another way:
let one_thousand = 1e3;
let one_million = 1e6;
let thirteen_billions_and_half = 13.5e9;
let twelve_millionths = 12e-6;
print!("{}, {}, {}, {}", one_thousand, one_million,
    thirteen_billions_and_half, twelve_millionths);

It will print: 1000, 1000000, 13500000000, 0.000012.

The first line uses a literal meaning “one times ten raised to the third power.” To get the usual decimal notation, write the number that precedes the “e”, and then shift its decimal point to the right by as many places as indicated after the “e”, adding zeros if there are not enough digits. In our case, we write “1”, then we shift the point by three places, adding three zeros, and we get the number “1000”.

The number before the “e” is named the mantissa, while the number following it is named the exponent. Both are signed decimal numbers. The mantissa may also contain a decimal point followed by a fractional part.

This notation is named exponential. The literals written in exponential notation are still floating-point numbers, even if they are written with no decimal point.

The second literal in the example, 1e6, means one times ten raised to the sixth power. It is equal to 1,000,000.

The third literal, 13.5e9, means thirteen point five times ten raised to the ninth power. There is already one decimal digit beyond which we have to shift the point, and after it we must add eight zeros, to get the value 13500000000. This number also could be written as 1.35e10, meaning one point thirty-five times ten raised to the tenth power, or as 135e8 or 13500e6, and in other ways, all generating the same machine code.
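
As a quick check, all those spellings denote exactly the same value, so they compare as equal; a minimal sketch (these particular values are exactly representable, which makes comparing them with == reliable):
print!("{}", 13.5e9 == 1.35e10 && 13.5e9 == 135e8 && 13.5e9 == 13500e6);

This will print: true.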

Finally, the fourth literal in the code example, 12e-6, means twelve times ten raised to the negative sixth power or, equivalently, twelve divided by ten raised to the sixth power. Using a negative power of ten is equivalent to writing the mantissa and then shifting its point to the left by as many digits as indicated after the minus sign of the exponent, adding the needed zeros. So, the number 12e-6 is equal to the number 0.000012.

The Various Kinds of Signed Integer Numbers

So far, we said that there are two types of numbers: integer numbers and floating-point numbers.

There are programming languages that have only floating-point numbers, with no specific type for integer numbers; compared with such languages, having two distinct numeric types is already more complex. Rust goes much further: it actually has twelve different integer numeric types and two floating-point numeric types. Having more numeric types can bring performance advantages, which is why many programming languages provide several of them.

So far we used integer numbers and floating-point numbers without further specification, but it is possible to be more precise when defining the internal format of such numbers.

Here an important aspect of Rust appears: efficiency. Actually, Rust has been designed to be extremely efficient.

A simple programming language could use only 32-bit integer numbers. But if we want to store a long sequence of small numbers, say between 0 and 200, and if every value were stored in a 32-bit object, some memory would be wasted, because it is possible to represent any number between 0 and 200 using only 8 bits, which is a quarter of 32 bits.

This matters not only for saving RAM and storage space, but also for speed. The larger our objects are, the more cache space they use, and cache space is strictly limited: if an object cannot be kept in cache, accessing it slows down the program. To have a fast program, you should keep as much of the processed data as possible in cache, and to this purpose objects should be as small as possible. In our example, we shouldn't use more than 8 bits to store our numbers.

On the other hand, 32-bit numbers may not be large enough to represent all the values required by the application. For example, a program could need to store with accuracy a number larger than ten billion. In such a case, a type having more than 32 bits is needed.

Therefore, Rust provides the opportunity to use 8-bit integer numbers, 16-bit integer numbers, 32-bit integer numbers, 64-bit integer numbers, and also 128-bit integer numbers. And in case many of them are needed, like in an array or in a vector, it is advisable to use the smallest data type able to represent all the values required by application logic.

Let’s see how we can use these numbers:
let a: i8 = 5;
let b: i16 = 5;
let c: i32 = 5;
let d: i64 = 5;
let e: i128 = 5;
print!("{} {} {} {} {}", a, b, c, d, e);

The “: i8” clause in the first statement, and the similar clauses in the four following statements, define the data type of the variable being declared, and also of the object represented by that variable.

The words i8, i16, i32, i64, and i128 are Rust keywords that identify, respectively, the type 8-bit signed integer number, the type 16-bit signed integer number , the type 32-bit signed integer number, the type 64-bit signed integer number, and the type 128-bit signed integer number. The i letter is the initial of “integer.”

Such types identify with precision how many bits will be used by the object. For example, the “a” variable will use eight bits, which will be able to represent 256 distinct values; being a signed number, such an object will be able to contain values between -128 and +127, extremes included.

The variables “a”, “b”, “c”, “d”, and “e” are of five different types, so if we append an assignment statement from one of these variables to another, like b = d;, we will get a compilation error.

We already saw that it is not possible to compute an addition between an integer number and a floating-point number, because they have different types. Similarly, it is not possible to sum two integer numbers having a different number of bits:
let a: i8 = 5;
let b: i16 = 5;
print!("{}", a + b);

This code generates two compilation error messages for the same error: mismatched types and cannot add `i16` to `i8`.

Maybe someone will wonder why the number of bits of an integer number must be exactly 8, 16, 32, 64, or 128, and not, for example, 19.

This is due to two reasons, both regarding efficiency:
  • Every modern processor has instructions for arithmetic and data transfer that apply efficiently only to numbers having a number of bits that is a power of 2, starting from a minimum of 8 bits, which is the minimum addressable chunk of bits. A 19-bit number would be handled anyway by the same machine language instructions that handle 32-bit numbers, and therefore there is no advantage in distinguishing a 19-bit type from a 32-bit type.

  • Memory management is more efficient when manipulating objects whose size is a power of two. Objects of other sizes cause less efficient code, or require allocating additional space to reach a power of two (an operation named padding).

Unsigned Integer Number Types

If we had to define an object containing an integer number between 0 and 200, which type would be best for it? To minimize the space used in cache, in RAM, and in storage, and also the time needed to access such memory, it is better to use the smallest type among those that can represent all such values. The i8 type is the smallest, but it can represent only values between -128 and +127, and therefore it's no good. So, with the types encountered so far, we must use i16.

This is not optimal, as only eight bits can represent all the values between 0 and 255, provided such bits are reinterpreted to mean those numbers; and this reinterpretation is already included in the machine language of all modern processors, so it would be a pity (read “inefficient”) not to use it.

Therefore, Rust allows the use of five other numeric types:
let a: u8 = 5;
let b: u16 = 5;
let c: u32 = 5;
let d: u64 = 5;
let e: u128 = 5;
print!("{} {} {} {} {}", a, b, c, d, e);

Here we introduced five more types of integer numbers. The “u” letter, shorthand for unsigned, indicates an unsigned integer number. The number after the “u” indicates how many bits such an object uses; for example, the “a” variable uses eight bits, which can represent 256 distinct values; being an unsigned number, such values are the integer numbers from 0 to 255, extremes included.

But there is at least one other reason to prefer unsigned numbers to signed numbers. If we want to check whether a signed integer number x is between zero (included) and a positive value n (excluded), we must write the Boolean expression 0 <= x && x < n. But if x is an unsigned number, the same check is simply x < n.
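
Here is a minimal sketch of the two checks; the variable names are merely illustrative:
let x: i32 = 7; // signed: both bounds must be checked
let n: i32 = 10;
print!("{} ", 0 <= x && x < n);
let y: u32 = 7; // unsigned: a single comparison suffices
let m: u32 = 10;
print!("{}", y < m);

This will print: true true.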

Notice that the variables “a”, “b”, “c”, “d”, and “e” have five different types, distinct from one another, and also from the corresponding signed types.

Target-Dependent Integer-Number Types

So far we have seen ten different types to represent integer numbers, but Rust still has other integer numeric types.

When you access an item of an array or of a vector, which type should the index have?

You could think that, for a small array, an i8 or u8 value would do, while for a somewhat larger array an i16 or u16 value would be required.

It isn’t so. It turns out that the most efficient type to use as index of an array or of a vector is as follows:
  • On 16-bit computers, it is an unsigned 16-bit integer.

  • On 32-bit computers, it is an unsigned 32-bit integer.

  • On 64-bit computers, it is an unsigned 64-bit integer.

In other words, the index of an array or vector should be unsigned, and it should have the same size as a memory address.

At present, Rust supports one 16-bit system, several 32-bit systems, and many 64-bit systems. So, which type should we use to write some source code that should be optimal on all such systems?

Notice that what matters is not the system on which the compiler runs, but the system on which the program generated by the compiler will run. Actually, through so-called cross-compilation, a compiler can generate machine code for a system having a different architecture from the one where the compiler runs. The system for which machine code is generated is named the target. So, there is a need for an integer numeric type whose size depends on the target: a 16-bit integer if the target is a 16-bit system, a 32-bit integer if the target is a 32-bit system, and a 64-bit integer if the target is a 64-bit system.

For this purpose, Rust provides the isize type and the usize type:
let arr = [11, 22, 33];
let i: usize = 2;
print!("{}", arr[i]);

This will print: 33.

In the word usize, the “u” letter indicates an unsigned integer, and the word size indicates a type meant to measure the length of some (possibly very large) object.

The usize type is implemented by the compiler as the u16 type if it is generating machine code for a 16-bit system, as the u32 type when generating machine code for a 32-bit system, and as the u64 type when generating machine code for a 64-bit system.

In general, the usize type is useful whenever there is a need for an unsigned integer having the same size as a memory address (aka pointer).
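
If you want to see how wide usize is on your target, a one-line sketch using the standard std::mem::size_of function will do:
print!("{} bytes", std::mem::size_of::<usize>());

On a 64-bit target, this will print: 8 bytes.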

In particular, if you have to index an array:
let arr = [11, 22, 33];
let i: usize = 2;
print!("{}", arr[i]);
let i: isize = 2;
print!("{}", arr[i]);
let i: u32 = 2;
print!("{}", arr[i]);
let i: u64 = 2;
print!("{}", arr[i]);

This code generates three compilation errors, one for each print call except the first. The error messages are:
the type `[{integer}]` cannot be indexed by `isize`
the type `[{integer}]` cannot be indexed by `u32`
the type `[{integer}]` cannot be indexed by `u64`

Actually, only the usize type is allowed as an index of an array.

Similar error messages are printed if you use a vector instead of an array.

In this way, Rust forces us to access arrays and vectors only in the most efficient way.

Notice that it is not allowed even to use an index of u16 type on a 16-bit system, nor an index of u32 type on a 32-bit system, nor an index of u64 type on a 64-bit system, even though on those systems such types have the same size as usize. This guarantees source code portability.

For symmetry, there is also the isize type, which is a signed integer having the same size as a memory address on the target system.
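
If you have an index stored in another integer type, you can convert it to usize before using it. Here is a minimal sketch, using the “as” conversion operator presented later in this chapter:
let arr = [11, 22, 33];
let i: u32 = 2;
print!("{}", arr[i as usize]);

This will print: 33.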

Type Inference

In the previous chapters we were declaring variables without specifying their type, and we were talking about the types “integer number,” “floating-point number,” and so on.

In this chapter, we started adding data type annotations to variable declarations.

But if no type is specified, do variables still have a specific type, or are they of a generic type?
let a = [0];
let i = 0;
print!("{}", a[i]);

This program is valid. How come? Didn’t we say that to index an array, only usize expressions are valid?

In fact, each variable and each expression always has a well-defined type, but it is not always necessary to specify such a type explicitly. In many cases, the compiler is able to deduce it or, as it is usually said, to infer it, from the way the variable or expression is used.

For instance, in the preceding example, after the integer value 0 has been assigned to “i”, the compiler infers that the type of “i” must be an integer type, but it has not yet determined exactly which one, among the twelve integer types available in Rust. We say that the type of such a variable is that of a generic or, better, unconstrained integer number.

However, when the compiler realizes that such variable is used to index an array, an operation allowed only to the usize type, the compiler assigns the usize type to the “i” variable, as it is the only allowed type in that statement.

In this program,
let i = 0;
let _j: u16 = i;

the compiler first determines that “i” is of type “unconstrained integer number.” Then it determines that “_j” is of type u16, as such type is explicitly annotated. Then, as “i” is used to initialize “_j”, an operation allowed only to expressions of type u16, it determines that “i” is of such type.

But the compilation of this program
let i = 0;
let _j: u16 = i;
let _k: i16 = i;

generates an error at the third line, with the message expected `i16`, found `u16`.

Indeed, the compiler, following the preceding reasoning, has determined at the second line that “i” must be of type u16, but at the third line “i” is used to initialize a variable of type i16.

Conversely, this program is valid:
let i = 0;
let _j: u16 = i;
let _k = i;

In this case, it is the variable “_k” that is of type u16.

Notice that such reasoning is always performed at compile time. In the final stage of every successful compilation, every variable has one concrete, constrained type.

If the compiler cannot infer the type of a variable at all, it generates the compilation error: type annotations needed.

Instead, if the compiler succeeds in inferring only that the type is an integer one, but it cannot constrain it to a specific integer type, then it takes the default integer type, which is i32.

For example, if you try to compile the following statement:
let _n = 8_000_000_000;

You get the error message: literal out of range for `i32`.

This means that first the compiler has inferred i32 to be the type of the expression used to initialize the _n variable, and then it has noted that such literal has a value larger than the maximum value for the i32 type.

The Type Inference Algorithm

We saw that the compiler always tries to determine a concrete type for each variable and for each expression. From what we have seen so far, the algorithm used is the following one:
  • If a type is explicitly specified, the type of the variable must be the specified one.

  • If the type of a variable or of an expression hasn't yet been determined, and such a variable or expression is used in an expression or in a declaration that can be valid only with a specific type, then that type is determined in this way for such a variable or expression. Such a determination may be constrained or unconstrained. A constrained type is a specific type, like i8 or u64, while an unconstrained type is a category of types, like {integer}. The word constrained means that the compiler finds that the Rust syntax allows only one specific type in that context; the word unconstrained means that the compiler finds that a whole family of types, like all the integer types or all the floating-point types, is allowed in that context.

  • If, at the end of the parsing, the compiler has determined only that a variable is of an unconstrained integer numeric type, that type is defined to be i32. Instead, if the type is completely undetermined, a compilation error is generated.
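
To see these rules at work together, here is a minimal sketch; the variable names are merely illustrative:
let a = 12; // unconstrained integer: defaults to i32
let b: u64 = 12; // explicitly annotated: u64
let c = 12; // unconstrained at first...
let _d: u16 = c; // ...then constrained to u16 by this usage
print!("{} {} {}", a, b, c);

This will print: 12 12 12.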

Floating-Point Numeric Types

Regarding the floating-point numeric types, the situation is similar to that of integer numbers, but much simpler: at present in Rust there are only two floating-point types.
let a: f64 = 4.6;
let b: f32 = 3.91;
print!("{} {}", a, b);

This program will print: 4.6 3.91.

The f64 type is that of 64-bit floating-point numbers, while the f32 type is that of 32-bit floating-point numbers. The “f” letter is shorthand for floating-point. Such types correspond exactly and respectively to the double and float types of the C language.

So far, Rust has no other floating-point types, but if a 128-bit floating-point type were added, its name would probably be f128.

What we said about integer types holds also for these types. For example:
let a = 4.6;
let mut _b: f32 = 3.91e5;
_b = a;

This program is valid. Parsing the first line, the compiler determines that the variable “a” has an unconstrained floating-point numeric type. Then it determines that the type of the variable “_b” is f32, because that is specified by a type annotation. Then, parsing the third line, it determines that the variable “a” is of type f32, because that is the only type whose values may be assigned to a variable of f32 type.

The default floating-point type is the 64-bit one. Therefore, if the last line of this program were removed, the “a” variable would be of f64 type.
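
A minimal sketch can verify this default, using the standard std::mem::size_of_val function on a variable whose type stays unconstrained until the end:
let a = 4.6;
print!("{} bytes", std::mem::size_of_val(&a));

This will print: 8 bytes, showing that “a” got the f64 type.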

For floating-point numbers, the criteria for choosing between 32-bit and 64-bit numbers are similar to those for integer numbers, but somewhat fuzzier. It is still true that 32-bit numbers occupy exactly half as much memory and cache as 64-bit numbers. And it is still true that the maximum value representable by a 64-bit number is larger than the maximum value representable by a 32-bit number. However, the latter is already so large that it is rarely exceeded.

A more important fact is that 64-bit numbers have many more digits in their mantissa, which makes them much more precise. Indeed, 32-bit numbers have a 24-bit mantissa, while 64-bit numbers have a 53-bit mantissa.

To give you an idea, 32-bit numbers can represent exactly all the integer numbers only up to around 16 million, while 64-bit numbers can represent exactly all the integer numbers up to around 9 million billion. Put another way, each value of f32 type, expressed in decimal notation, has about 7 significant digits, while every f64 value has about 16 significant digits.
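
To make the difference in precision visible, here is a minimal sketch using the value 16,777,217, that is, 2 raised to the 24th power, plus one, which is exactly representable as an f64 but not as an f32:
let big = 16_777_217; // 2 to the 24th power, plus 1
print!("{} {}", big as f32, big as f64);

This will print: 16777216 16777217. The conversion to f32 lost the final unit.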

Explicit Conversions

We have said several times that Rust performs very strict type checking: every time the compiler expects an expression of a certain type, it generates an error if it finds an expression of another type, even a similar one; and in every arithmetic expression, the compiler expects its operands to be of the same type.

These rules would seem to forbid any computation involving objects of different types; but it isn't so:
let a: i16 = 12;
let b: u32 = 4;
let c: f32 = 3.7;
print!("{}", a as i8 + b as i8 + c as i8);

This will print: 19.

The variables “a”, “b”, and “c” are of three different types. The last one is not even an integer number. However, by using the “as” operator, followed by the name of a type, you can do many conversions, including the three just shown.

All three objects of the example are converted into objects of type i8, so the resulting objects can be added together.

Notice that if the destination type is less expressive than the original type, you may lose information. For example, when you convert the fractional value 3.7 into the integer type i8, the fractional part is discarded, and 3 is obtained.
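
Here is a minimal sketch of that truncation, showing that it works toward zero also for negative values:
print!("{} {}", 3.7 as i8, -3.7 as i8);

This will print: 3 -3.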

The following code generates three compilation errors:
let a = 500 as i8;
let b = 100_000 as u16;
let c = 10_000_000_000 as u32;
print!("{} {} {}", a, b, c);

The first one has the message: literal out of range for `i8`. It happens because the value 500 is too large for the type i8.

Similar messages are emitted for the second and third lines.

But if you insert at the beginning of the program the following line:
#[allow(overflowing_literals)]

you can compile the program, which will print some possibly surprising numbers: -12 34464 1410065408.

The inserted attribute instructs the compiler to allow the use of literals that surely will cause arithmetic overflow at runtime.

The resulting values are understandable only when thinking about the binary code that is used to represent the integer numbers.

If we take the binary representation of the value 500, extract its least significant 8 bits, and then interpret that 8-bit sequence as an “i8” object, printing that object in decimal notation gives -12.

Similarly, the least significant 16 bits of the binary representation of one hundred thousand, interpreted as an unsigned integer, are printed in decimal notation as 34464; and the least significant 32 bits of the binary representation of ten billion, interpreted as an unsigned number, are printed in decimal notation as 1410065408.

Therefore the “as” operator, when applied to an integer object, extracts from that object as many least significant bits as are needed to represent the specified type, and it yields them as the value of the expression.
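
We can check this description with a bitwise AND operation, which keeps exactly the least significant 8 bits of a number (a minimal sketch):
let n: i32 = 500; // in binary: 1_1111_0100
print!("{} {}", n & 0xFF, n as i8);

This will print: 244 -12. The bit pattern 11110100 means 244 when read as an unsigned byte, and -12 when read as a signed one.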

Type Suffixes of Numeric Literals

So far, we used two kinds of numeric literals: the integer ones, like -150; and the floating-point ones, like 6.022e23. The former is of type unconstrained integer number, and the latter of type unconstrained floating-point number.

If you want to constrain a number, there are several ways:
let _a: i16 = -150;
let _b = -150 as i16;
let _c = -150 + _b - _b;
let _d = -150i16;

All these four variables are of type i16, and they have the same value.

The first one has a type annotation in the variable declaration. In the second line, an unconstrained integer numeric expression is converted into a specific type. In the third line, the expression contains the subexpression “_b”, which has a specific type, so the whole expression gets that type. Finally, in the fourth line a new notation is used.

If a type name is appended to an integer literal, the literal gets that type, as if an “as” keyword were interposed. Notice that no blank is allowed between the literal and the type. If you like, you can add some underscores, as in -150_i16 or 5__u32.

Similarly, you can add a type-specification suffix also to floating-point literals: 6.022e23f64 is a 64-bit floating-point number, while -4f32 and 0_f32 are 32-bit floating-point numbers. Notice that the decimal point is not needed if there is no fractional digit.

All the Numeric Types

In summary, here is an example that uses all the Rust numeric types:
let _: i8 = 127;
let _: i16 = 32_767;
let _: i32 = 2_147_483_647;
let _: i64 = 9_223_372_036_854_775_807;
let _: i128 = 170_141_183_460_469_231_731_687_303_715_884_105_727;
let _: isize = 100;
let _: u8 = 255;
let _: u16 = 65_535;
let _: u32 = 4_294_967_295;
let _: u64 = 18_446_744_073_709_551_615;
let _: u128 = 340_282_366_920_938_463_463_374_607_431_768_211_455;
let _: usize = 100;
let _: f32 = 1e38;
let _: f64 = 1e308;

All of them are initialized to the maximum value of their type, except for isize and usize, whose maximum value depends on the processor architecture, and except for f32 and f64, whose maximum values cannot be conveniently written in exact decimal notation.

And here is the list of all Rust built-in integer numeric types:

Type  | Occupied Bytes | Minimum Value | Maximum Value
i8    | 1           | –128      | +127
i16   | 2           | –32,768   | +32,767
i32   | 4           | –2^31     | +2^31 – 1
i64   | 8           | –2^63     | +2^63 – 1
i128  | 16          | –2^127    | +2^127 – 1
isize | 2 or 4 or 8 | –2^15 on a 16-bit target; –2^31 on a 32-bit target; –2^63 on a 64-bit target | +2^15 – 1 on a 16-bit target; +2^31 – 1 on a 32-bit target; +2^63 – 1 on a 64-bit target
u8    | 1           | 0         | +255
u16   | 2           | 0         | +65,535
u32   | 4           | 0         | +2^32 – 1
u64   | 8           | 0         | +2^64 – 1
u128  | 16          | 0         | +2^128 – 1
usize | 2 or 4 or 8 | 0         | +2^16 – 1 on a 16-bit target; +2^32 – 1 on a 32-bit target; +2^64 – 1 on a 64-bit target

Instead, there are just two floating-point numeric types:
  • f32, having 32 bits, is equivalent to the float type of the C language.

  • f64, having 64 bits, is equivalent to the double type of the C language.

Booleans and Characters

In addition to numeric types, Rust defines some other primitive built-in types:
let a: bool = true; print!("[{}]", a);
let b: char = 'a'; print!("[{}]", b);

This will print: [true][a].

The bool type, which we saw already, is equivalent to the C++ type having the same name. It admits only two values: false and true. It is used mainly as the condition of if statements and while statements.

The char type, which we actually haven't seen yet, looks like the C type having the same name, but in fact it differs a lot from it. To start with, a C char typically occupies only one byte, while an isolated Rust char occupies four bytes. This is because Rust chars are Unicode characters, and the Unicode standard defines more than one million possible values.
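
A minimal sketch using the standard std::mem::size_of function confirms those sizes:
print!("{} {}", std::mem::size_of::<char>(), std::mem::size_of::<u8>());

This will print: 4 1.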

Literal characters are enclosed in single quotes, and they can be also non-ASCII characters. For example, this code
let e_grave = 'è';
let japanese_character = 'さ';
println!("{} {}", e_grave, japanese_character);

in a terminal supporting Unicode, will print: è さ.

Notice that, differing from the C language, neither bool nor char is in any way considered a number. So, both of the following statements are illegal:
let _a = 'a' + 'b';
let _b = false + true;

For the first statement, the generated error is: cannot add `char` to `char`. For the other, the generated error is: cannot add `bool` to `bool`.

However, both types may be converted to numbers:
print!("{} {} {} {} {}", true as u8, false as u8,
    'A' as u32, 'à' as u32, '€' as u32);

This will print: 1 0 65 224 8364.

In this way, we just discovered that true is represented by the number 1, false by the number 0, the “A” character by the number 65, the “à” character by the number 224, and the euro symbol by the number 8364.

If you want to convert numbers into Booleans or into characters, you can use these features:
let truthy = 1;
let falsy = 0;
print!("{} {} {} {}", truthy != 0, falsy != 0,
    65 as char, 224 as char);

This will print: true false A à.

But you cannot use an “as bool” clause on a number, because not every numeric value corresponds to a Boolean; in fact, only zero and one have this property, so in general such a conversion would not be well defined.

So, if the zero value of a number is meant to represent falsity, to convert that number into a Boolean, which is to see whether it corresponds to truth, it is enough to check whether it is different from zero.

The situation for characters is similar. Each character is represented by a 32-bit number, so it may be converted into one. But not every 32-bit number represents a character, so some (actually, most) 32-bit numbers are not convertible into characters. Therefore, the expression “8364 as char” is illegal.

To convert any number into a character, you need to use the char::from_u32 standard library function, not described here.
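
Just to give an idea of it, here is a minimal sketch; that function returns an Option value, because not every u32 number is a valid character code:
match char::from_u32(8364) {
    Some(c) => print!("{}", c),
    None => print!("not a valid character"),
}

This will print: €.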

However, for each number between 0 and 255, there is a Unicode character corresponding to it, so it is allowed to convert any number of u8 type into a character. Actually, that has been done, in the preceding example, for the numbers 65 and 224.

It may be interesting, for those who don't yet know Unicode, to see all the printable characters corresponding to the first 256 numbers. The characters having codes from 0 to 31 and from 127 to 159 are control characters, which cannot be properly printed. Here is the code that prints the other ones:
for n in 32..127 {
    println!("{}: [{}]", n, n as u8 as char);
}
for n in 160..256 {
    println!("{}: [{}]", n, n as u8 as char);
}

This will print a line for each of the Unicode characters having codes from 32 to 126 and from 160 to 255, as the ranges exclude their ending number.

Notice that n must be converted to u8 type before being able to convert it into a char.

The Empty Tuple

There is another primitive, weird type, whose name in Rust is “()”, which is just a pair of parentheses. Such a type has only one value, which is written in the same way as its type, that is, “()”. This type somewhat corresponds to the “void” type of the C language, or to the “undefined” type of JavaScript, as it represents the absence of a meaningful value. It is named the empty tuple.

This type appears in several cases, like the following ones:
let a: () = ();
let b = { 12; 87; 283 };
let c = { 12; 87; 283; };
let d = {};
let e = if false { };
let f = while false { };
print!("{:?} {} {:?} {:?} {:?} {:?}",
    a, b, c, d, e, f);

This code will print: () 283 () () () ().

The first line declares a variable of empty tuple type, and initializes it using the only possible value. In the last line, the print macro cannot match such type with the placeholder “{}”, so the debug placeholder “{:?}” must be used, as explained in the “Debug Print” section of Chapter 5.

The second line, let b = { 12; 87; 283 };, declares a variable and initializes it with the value of a block. From here, some new concepts appear.

The first concept is that a simple number like 12 or 87 can be used in place of any statement, by putting a semicolon character after it, because any expression can be used as a statement. Of course, such a statement does nothing, so it generates no machine code.

The second concept is that the value of a block is defined to be the value of its last expression, if there is such an expression; so, in the case of the second line, the value of the block is the integer number 283, and such value is used to initialize the “b” variable, which therefore will be of i32 type.

The third line, let c = { 12; 87; 283; };, shows the case where the content of a block ends with the statement terminator, the semicolon character. In such a case, the value of the block is the empty tuple, and that value is used to initialize the “c” variable, which therefore will be of the empty tuple type.

The fourth line, let d = {};, declares the “d” variable, and it initializes it with the value of an empty block. Also, empty blocks have empty tuples as their values.

In the fifth line, let e = if false { };, there is a conditional expression without the “else” branch. When the “else” branch is missing, the “else { }” clause is implied. Therefore, the statement is meant to be “let e = if false { } else { }”. Such a conditional expression is valid, as both branches have the same type.

The sixth line, let f = while false { };, shows that the while statement also has the empty tuple as its value. Both the block of a while statement and the while statement itself must always have the empty tuple as their value, so it makes little sense to use the while construct as an expression. The same holds for for statements. A loop statement, instead, can produce a non-empty value, by using a break followed by an expression, as shown below.
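
Here is a minimal sketch of a loop statement producing a non-empty value through break:
let g = loop { break 42; };
print!("{}", g);

This will print: 42.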

Array and Vector Types

When presenting arrays and vectors, we said that changing the type of the contained items implicitly changes the type of the container, for both arrays and vectors; and that changing the number of the contained items implicitly changes the type of an array, but not the type of a vector.

If you want to make explicit the type of arrays or vectors, you should write:
let _array1: [char; 3] = ['x', 'y', 'z'];
let _array2: [f32; 200] = [0f32; 200];
let _vector1: Vec<char> = vec!['x', 'y', 'z'];
let _vector2: Vec<i32> = vec![0; 5000];

As shown, the expression that represents the type of an array contains both the type of the items and their number, separated by a semicolon and enclosed in square brackets.

The type of a vector is written as the word Vec (with an uppercase initial), followed by the type of the contained items, enclosed in angle brackets.

Constants

The following program is illegal:
let n = 20;
let _ = [0; n];

It generates the error attempt to use a non-constant value in a constant, because the length of an array must be known at compile time. Even though “n” is immutable, and so, in a sense, constant, its initial value could be determined at runtime; therefore, we are not allowed to use it to specify the size of an array.

But the following program is valid:
const N: usize = 20;
let _ = [0; N];

The “const” keyword allows us to declare an identifier having a value defined at compile time, and of course not changeable at runtime. In its declaration, it is required to specify its type.

Rust constants correspond to C language macros with no arguments. The C code that corresponds to the previous Rust code is this:
#define N 20
int _[N];

A Rust constant can be considered a name that at compile time is associated to a value, not to an object. The compiler replaces such value in every place in the program where the constant’s name is used.

Discovering the Type of an Expression

You will often come across an expression and wonder what the type of that expression is.

An interpreter, an integrated development environment, or the written documentation could answer this question, but there is also a trick for answering such questions using only the compiler.

Say we want to know the type of the expression 4u32 / 3u32, which in some languages is a floating-point number.

We just add a statement that tries to use that expression to initialize an empty tuple variable. If the program compiles with no errors, that means that the value of our expression is an empty tuple. But in our case we have:
let _: () = 4u32 / 3u32;

The compilation of this program generates the error mismatched types, and the detail of the error message explains expected `()`, found `u32`. From this explanation, we learn that our expression is of u32 type.

Sometimes, the error message is vaguer:
let _: () = 4 / 3;

For this program, the error explanation is expected `()`, found integer. The word integer does not indicate a concrete type; it indicates a still-unconstrained type, which the compiler has determined to be an integer type, but not yet which of the several existing integer types.
