© The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature 2022
C. Milanesi, Beginning Rust, https://doi.org/10.1007/978-1-4842-7208-4_5

5. Using Data Sequences

Carlo Milanesi, Bergamo, Italy

In this chapter, you will learn:
  • How to define sequences of objects of the same type, having fixed length (arrays) or variable length (vectors)

  • How to specify the initial contents of arrays or vectors, by listing the items or by specifying one item and its repeat count

  • How to read or write the value of single items of arrays or vectors

  • How to add items to a vector or to remove items from a vector

  • How to create arrays with several dimensions

  • How to create empty arrays or vectors

  • How to print or copy whole arrays or vectors

  • How to specify whether the compiler should deny, allow with a warning, or silently allow possible programming errors

  • What is a panic, and why it may be generated when accessing an array or a vector

Arrays

So far, we have seen how to store in a variable a string, a number, or a Boolean. But if you want to store several strings into a single variable, you can write this:
let x = ["English", "This", "sentence", "a", "in", "is"];
print!("{} {} {} {} {} {}",
    x[1], x[5], x[3], x[2], x[4], x[0]);

That will print: This is a sentence in English.

The first statement declares the x variable as an immutable object made of an array of six objects, specified in the statement itself. Each of these objects is a string. Such a compound object is named an array.

The second statement contains six expressions, each of them making a read access to a different element of x. Such accesses specify the item to be accessed by means of a positional index, or subscript, put between brackets. Notice that indexes always start from zero, and so, as the array has six elements, the index of the last item is 5. This behavior is similar to that of C language arrays.

To determine how many elements there are in an array, you can do this:
let a = [true, false];
let b = [1, 2, 3, 4, 5];
print!("{}, {}.", a.len(), b.len());

It will print: 2, 5.

The a variable is an array of two Booleans, while the b variable is an array of five integer numbers.

The third statement invokes on the objects a and b the len function of the standard library, to get the number of the objects contained in the array. Both the syntax and semantics of these invocations are similar to those of the expression "abc".len(), used to get the length in bytes of a string.

Notice that in the examples, every array contains elements of the same type; only strings, only Booleans, or only integer numbers.

If you try to write
let x = ["This", 4];
or
let x = [4, 5.];

you get a mismatched types compilation error, indicating the array cannot contain objects of different types.

It is possible to create arrays of any types of items, as long as in every array all the items are of the same type.

This is so because there is no single array type. The concept of array is that of a generic type, which is parameterized by the type of its items, and also by the number of its items. So, in the first example of the chapter, the variable x is of type “array of 6 strings”; instead, in the second example, a is of type “array of 2 Booleans,” and b is of type “array of 5 integer numbers.”
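As a minimal sketch of this idea, the following program writes out the types of those three arrays explicitly. The `[type; length]` annotation syntax and the helper function `array_lengths` do not appear in the text above; they are assumptions introduced here only for illustration.

```rust
// Hypothetical helper, introduced only for this illustration.
// It returns the lengths of the three arrays used in the chapter so far.
fn array_lengths() -> (usize, usize, usize) {
    // The annotation after the colon spells out the item type and the item count.
    let x: [&str; 6] = ["English", "This", "sentence", "a", "in", "is"];
    let a: [bool; 2] = [true, false];
    let b: [i32; 5] = [1, 2, 3, 4, 5];
    (x.len(), a.len(), b.len())
}

fn main() {
    let (x_len, a_len, b_len) = array_lengths();
    print!("{}, {}, {}.", x_len, a_len, b_len);
}
```

This should print: 6, 2, 5. If an annotation no longer matches its initializer, for example if a is annotated as `[bool; 3]`, the compiler is expected to report a mismatched types error.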

In the following program, every line but the first one will generate a compilation error:
let mut x = ["a"]; // array of strings
x[0] = 3;
x[-1] = "b";
x[0.] = "b";
x[false] = "b";
x["0"] = "b";

The second statement is wrong because it tries to assign an integer number to an item of an array of strings. So the compiler emits a mismatched types error message.

The other statements are wrong because the compiler expects any index used to access an array to be a non-negative integer number, and the expressions used in such statements are not.

The following one is a different case:
let x = ["a"]; // array of strings
let _y = x[1];

It has a valid syntax, but nevertheless it is wrong, because the second statement tries to access the second element of an array having only one element.

The compiler error message is: this operation will panic at runtime. It means that this statement would generate a runtime error when executed. The compiler message also contains the explanation: index out of bounds: the length is 1 but the index is 1.

Rust Attributes

The Rust compiler is somewhat flexible about the strictness of its analysis of the code. For some possible error conditions, like not using a variable after its last assignment, the compiler allows the developer to choose among three possible behaviors of compilation:
  • Deny: The compiler emits an error message, and it does not generate the executable code.

  • Warn: The compiler emits a warning message, but it does generate the executable code.

  • Allow: The compiler does not emit any message, and it does generate the executable code.

To specify such behaviors, the developer must write into the code specific statements, named attributes.

Here is an example:
#[deny(unused_variables)]
let x = 1;
#[warn(unused_variables)]
let y = 2;
#[allow(unused_variables)]
let z = 3;
When compiling this code, the following error is emitted:
error: unused variable: `x`
Then the following warning is emitted:
warning: unused variable: `y`

And then no executable code is generated, because there is at least one error.

The last statement does not generate any output.

This code contains three attributes. Any attribute usually is on a line by itself. It begins with a “#” character followed by an expression in brackets. It applies to the statement immediately following it.

In the cases shown, the expressions are the word deny or warn or allow, followed by an identifier in parentheses. The identifier must be a known compiler option. In this case it is unused_variables; that means that this attribute specifies how the compiler must handle the case that the value of the variable declared in the following statement is not used after the last assignment.

If a compiler option is not specified by an attribute, it has a default value anyway. The default value of the unused_variables option is warn. It is for this reason that if you don’t use the value of a variable after its last assignment, the generated warning is followed by the following line:
note: `#[warn(unused_variables)]` on by default
Instead of specifying an attribute just before any declaration, it is possible to declare it before a function definition, in the following way:
#[allow(unused_variables)]
fn main() {
    let x = 1;
    let y = 2;
}

This whole program does not generate any warnings, because the attribute in the first line applies to the whole main function, and therefore to both the x and y variables.

Another compiler option regards runtime errors.

Panicking

Accessing an array with a too large index has notoriously undefined behavior in C language. Instead, the Rust language has the goal of avoiding any undefined behaviors.

To avoid such undefined behavior, the compiler inserts into the generated executable code some statements that check whether every access to an array uses a proper index, that is, an index less than the length of the array. If an out-of-bounds index is actually used, such checking statements generate an abort operation for the current thread that, by default, causes the immediate termination of the thread itself. If the process has just one thread, such thread termination causes the termination of the whole process.

In Rust jargon, such an abort operation is named panic, and the action of aborting the current thread is named panicking.

In general, the compiler cannot detect out-of-bound indexing errors, because array indices may have variable values; though, consider the following code, already cited before:
let x = ["a"];
let _y = x[1];
print!("End");

Here the compiler knows both the size of the array and the value of the index, because they are constants, and so it can foresee that the generated code will panic at runtime.

Therefore, when you compile this code, the compiler emits an error message containing the following sentences.

First:
error: this operation will panic at runtime
Then:
index out of bounds: the length is 1 but the index is 1
And then:
note: `#[deny(unconditional_panic)]` on by default

The first sentence states what the compiler has foreseen, as a compilation error.

The second sentence explains why this operation will panic.

The third sentence explains that, by default, such a condition generates a compile-time error. The identifier unconditional_panic means a condition in which, if this statement is executed, it will surely generate a panic at runtime; therefore it is probably wrong, or in any case it should be replaced by a more explicit error message.

You can obtain an executable program if you write, instead:
let x = ["a"];
#[warn(unconditional_panic)]
let _y = x[1];
print!("End");
It will generate the compilation message:
warning: this operation will panic at runtime
If you write the following code instead:
let x = ["a"];
#[allow(unconditional_panic)]
let _y = x[1];
print!("End");

you will get no compilation error or warning.

In both cases, when you run the program you will see on the terminal a message starting with this:
thread 'main' panicked at
'index out of bounds: the len is 1 but the index is 1'

After that, the process will terminate without printing the End word.

Notice that such abnormal termination is not caused by the operating system, like a segmentation violation, but it is caused by instructions inserted by the Rust compiler itself into the executable program.
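When the index is not a constant, the compiler cannot foresee the panic, but the program can check the index itself before using it. The following is only a sketch of that idea: the helper function `item_or` and its fallback parameter are hypothetical names, not something taken from the text above.

```rust
// Hypothetical helper: returns the item at `index`, or `fallback`
// when the index is out of bounds. The explicit check means that
// the indexing expression is only reached with a valid index.
fn item_or(items: [i32; 3], index: usize, fallback: i32) -> i32 {
    if index < items.len() {
        items[index]
    } else {
        fallback
    }
}

fn main() {
    let x = [10, 20, 30];
    // Index 1 is valid; index 3 is out of bounds, so the fallback is used.
    print!("{} {}", item_or(x, 1, -1), item_or(x, 3, -1));
}
```

This should print: 20 -1. The process terminates normally, because the out-of-bounds access is never attempted.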

Mutable Arrays

Now let’s get back to what we can do with arrays.

The modification of the items of an array is possible only on mutable arrays:
let mut x = ["This", "is", "a", "sentence"];
x[2] = "a nice";
print!("{} {} {} {}.", x[0], x[1], x[2], x[3]);

This will print: This is a nice sentence.

The first statement contains the mut keyword, and the second statement assigns the new value to the third item of the array. This operation is allowed by the compiler, because the following three conditions hold:
  • The x variable is mutable.

  • The type of the new value assigned is the same as the other items of x. Actually, they are all strings.

  • The index is a non-negative integer number.

In addition, at runtime, the operation is performed without panicking, because the index is less than the array size, that is, the condition 2 < 4 holds.

Instead, it is not allowed to add items to an array or to remove items from an array. Therefore, its length is a compile-time-defined constant.

A mutable variable of an array type, as any mutable variable, can be the target of an assignment from another array:
let mut x = ["a", "b", "c"];
print!("{}{}{}. ", x[0], x[1], x[2]);
x = ["X", "Y", "Z"];
print!("{}{}{}. ", x[0], x[1], x[2]);
let y = ["1", "2", "3"];
x = y;
print!("{}{}{}.", x[0], x[1], x[2]);

This will print: abc. XYZ. 123.

In the first line, an array is created and assigned to the x variable. In the third line, another array is created and assigned to the same x variable, therefore replacing all the three existing strings. In the fifth line, another array is created and assigned to the new y variable. In the sixth line, such array is assigned to the x variable, therefore replacing the three existing values.

If x weren’t mutable, two compilation errors would be generated: one at the third line and one at the sixth.

The following code generates “mismatched types” compilation errors both at the second and at the third lines:
let mut x = ["a", "b", "c"];
x = ["X", "Y"];
x = [15, 16, 17];

Actually, because of the first line, x is of type “array of three elements of type string.” The statement in the second line tries to assign to x a value of type “array of two elements of type string”; in this case, the number of elements is different, even if the type of each element is the same.

The statement in the third line tries to assign to x a value of type “array of three elements of type integer number”; in this case, the number of the elements is the same, but the type of each element is different.

In both cases, the type of the whole value that is being assigned is different from the type of the target variable.

Arrays of Explicitly Specified Size

We already saw how to create an array by listing the items initially contained.

If you want to handle many items, instead of writing many expressions, you can write
let mut x = [4.; 5000];
x[2000] = 3.14;
print!("{}, {}", x[1000], x[2000]);

That will print: 4, 3.14.

The first statement declares the x variable as a mutable array of 5000 floating-point numbers, all initially equal to 4. Notice the usage of a semicolon instead of a comma inside the square brackets.

The second statement assigns the value 3.14 to the item at position 2000 of such array.

Finally the value, which never changed, at position 1000, and the value, which just changed, at position 2000 are printed. Notice that the valid indexes for this array go from 0 to 4999.

To scan the items of an array, the for-statement is very useful:
let mut fib = [1; 12];
for i in 2..fib.len() {
    fib[i] = fib[i - 2] + fib[i - 1];
}
for i in 0..fib.len() {
    print!("{}, ", fib[i]);
}

This will print: 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144,.

This program computes the first 12 numbers of the famous Fibonacci sequence and then it prints them. The sequence is defined in this way: the first two numbers are both 1, and every other number is the sum of the two preceding numbers.

Examine the program.

The first statement creates a variable named fib and initializes it using an array of 12 numbers with value 1.

The statement in the following three lines is a for loop, in which the i variable is initialized to 2, and it is incremented by one at each iteration, as long as it is less than the length of the array. The body of the loop assigns to each item of the array, starting from the third one, the sum of the two preceding items. Given that, when each item is written, the preceding items have already received their correct value, such assignment always uses correct values.

Finally, there is another for loop, to print all the numbers; this time its index starts from 0.
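The same computation can also be packaged into a function, so that the array is built in one place and printed in another. This is only a sketch: the function name `fibonacci`, the explicit `[u32; 12]` return type, and the use of `iter()` to visit the items without an index are assumptions not taken from the text above.

```rust
// Hypothetical helper: builds the first 12 Fibonacci numbers.
fn fibonacci() -> [u32; 12] {
    // Start with twelve items equal to 1; the first two are already correct.
    let mut fib = [1; 12];
    for i in 2..fib.len() {
        fib[i] = fib[i - 2] + fib[i - 1];
    }
    fib
}

fn main() {
    let fib = fibonacci();
    // Visit every item in order, without using a positional index.
    for item in fib.iter() {
        print!("{}, ", item);
    }
}
```

The printed numbers should be the same as those of the previous program.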

Multidimensional Arrays

You can easily write arrays having several dimensions:
let mut x = [[[23; 4]; 8]; 15];
x[14][7][3] = 56;
print!("{}, {}", x[0][0][0], x[14][7][3]);

This will print: 23, 56.

The first statement declares an array of 15 items, each of them being an array of 8 items, each of them being an array of 4 items, each of them initialized with the integer number 23.

The second statement accesses the 15th and last item of the array, so getting an array; then it accesses the 8th and last item of that array, so getting another array; then it accesses the 4th and last item of such array, so getting an item of type integer number; finally, it assigns the integer number 56 to that item.

The third statement prints the content of the very first item and the content of the very last item of the array.

Because multidimensional arrays are no more than arrays of arrays, it is easy to get the sizes of a given array:
let x = [[[0; 4]; 8]; 15];
print!("{}, {}, {}.",
    x.len(), x[0].len(), x[0][0].len());

This will print: 15, 8, 4.
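Because each dimension reports its own length, nested for loops can scan a multidimensional array without hard-coding its sizes. The following sketch assumes a small grid of 3 rows and 4 columns; the function name `grid_total` and the values stored in the cells are hypothetical.

```rust
// Hypothetical helper: fills a grid of 3 rows and 4 columns so that
// each cell holds row * 4 + column, then returns the sum of all cells.
fn grid_total() -> usize {
    let mut grid = [[0; 4]; 3];
    for row in 0..grid.len() {
        for column in 0..grid[row].len() {
            grid[row][column] = row * 4 + column;
        }
    }
    // Scan the grid again, accumulating the values 0, 1, ..., 11.
    let mut total = 0;
    for row in 0..grid.len() {
        for column in 0..grid[row].len() {
            total += grid[row][column];
        }
    }
    total
}

fn main() {
    print!("{}", grid_total());
}
```

This should print: 66, which is the sum of the integer numbers from 0 to 11.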

A big limitation of arrays is the fact that their size must be defined at compilation time:
let length = 6;
let arr = [0; length];

The compilation of this code generates the error attempt to use a non-constant value in a constant. Actually, the expression length is a variable, and therefore conceptually it is not a compile-time constant, even if it is immutable and even if it has just been initialized by a constant. The size of an array cannot be an expression containing variables.

This is because, in general, immutable variables can have a value that is computed only at runtime, even if in this case it can be easily computed at compilation time. Instead, the size of an array must always be known by the compiler at compilation time.
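If the length is really fixed, one way to give it a name is to declare it as a constant instead of a variable. The `const` declaration is not covered in the text above, so take the following as a sketch under that assumption; the names `LENGTH` and `make_array` are hypothetical.

```rust
// A constant, unlike a variable, is always known at compilation time,
// and so it can be used as the length of an array.
const LENGTH: usize = 6;

// Hypothetical helper: the constant appears both in the type
// and in the expression that creates the array.
fn make_array() -> [i32; LENGTH] {
    [0; LENGTH]
}

fn main() {
    let arr = make_array();
    print!("{}", arr.len());
}
```

This should print: 6.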

Vectors

To create sequences of objects whose size is defined at runtime, the Rust standard library provides the Vec type, shorthand for vector:
let x = vec!["This", "is"];
print!("{} {}. Length: {}.", x[0], x[1], x.len());

This will print: This is. Length: 2.

The first statement looks like the creation of an immutable array, with the only difference being the appearance of the vec! clause.

Such a clause is an invocation of the vec macro of the standard library. vec is an abbreviation of vector.

The effect of such a macro is indeed to create a vector, initially containing the two strings specified between square brackets. Actually, when len() is invoked on such an object, 2 is returned.

Vectors allow us to do everything that is allowed for arrays, but they also allow us to change their size after they have been initialized:
let mut x = vec!["This", "is"]; print!("{}", x.len());
x.push("a"); print!(" {}", x.len());
x.push("sentence"); print!(" {}", x.len());
x[0] = "That";
for i in 0..x.len() { print!(" {}", x[i]); }

This will print: 2 3 4 That is a sentence.

The first line creates a mutable vector, initially containing two strings, and prints its length, meaning the number of strings contained in the vector.

The second line invokes the push function on the just created vector, and prints its new length. Such a function adds its argument at the end of the vector. To be legal, it must have exactly one argument, and such argument must be of the same type as the items of the vector. The word push is the word commonly used for the operation of adding an item to a stack data structure.

The third line adds another string at the end of the vector, so making the vector contain four items, and it prints the new length of the vector.

The fourth line replaces the value of the first item of the vector. This operation and the two preceding ones are allowed only because the x variable is mutable.

The fifth line scans the four items of the vector, and prints them all on the terminal.

Let’s see another example:
let length = 5000;
let mut y = vec![4.; length];
y[6] = 3.14;
y.push(4.89);
print!("{}, {}, {}", y[6], y[4999], y[5000]);

This will print: 3.14, 4, 4.89.

The second line declares a variable having as value a vector containing a sequence of 5000 items, each one of them having value 4. The length of the sequence is specified by the variable length. This wouldn’t be allowed with an array.

The third line changes the value of the seventh item of the vector.

The fourth line adds a new item at the end of the vector. Because the vector had 5000 items, the new item will get position 5000, that is, the last position of any vector having 5001 items.

Finally, three items are printed: the changed item, the item that was the last before the addition, and the just added item.
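In the example above, the length is still written as a literal number in the program. To make it clearer that the length can really be decided at runtime, here is a sketch in which it arrives as a function argument. The function name `filled` and its parameter are hypothetical.

```rust
// Hypothetical helper: builds a vector whose initial length is known
// only when the function is called, and then adds one more item.
fn filled(length: usize) -> Vec<f64> {
    let mut y = vec![4.; length];
    y.push(4.89);
    y
}

fn main() {
    let y = filled(3);
    // After the push, the vector has one item more than requested.
    print!("{}, {}", y.len(), y[3]);
}
```

This should print: 4, 4.89.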

So we saw that, differing from arrays, you can create vectors with a length defined at runtime, and you can change their length at runtime, after initialization. Also vectors, like arrays, have generic types, but while the type of every array is defined by two parameters, a type and a length, the type of every vector is defined by a single parameter, the type of its elements. The length of every vector is variable at runtime, so it does not belong to the type, as in Rust all types are defined only at compile time.

Therefore, this program:
let mut _x = vec!["a", "b", "c"];
_x = vec!["X", "Y"];
is valid, as a vector of strings is assigned to a vector of strings, even if they have different lengths. However, this one:
let mut _x = vec!["a", "b", "c"];
_x = vec![15, 16, 17];

is illegal, as a vector of numbers cannot be assigned to a vector of strings.

As vectors can do all that arrays can do, what’s the need to use arrays? The answer is that arrays are usually more efficient, because their length is fixed at compile time and they need no dynamic memory allocation. So, if at compile time you know how many items to put in your collection, you can get a faster program by using an array instead of a vector.

Those who know the C++ language may have guessed that Rust arrays are similar to C++ std::array objects, while Rust vectors are similar to C++ std::vector objects.

Other Operations on Vectors

The standard library provides many operations regarding vectors. Here are some of them:
let mut x = vec!["This", "is", "a", "sentence"];
for i in 0..x.len() { print!("{} ", x[i]); } println!();
x.insert(1, "line");
for i in 0..x.len() { print!("{} ", x[i]); } println!();
x.insert(2, "contains");
for i in 0..x.len() { print!("{} ", x[i]); } println!();
x.remove(3);
for i in 0..x.len() { print!("{} ", x[i]); } println!();
x.push("about Rust");
for i in 0..x.len() { print!("{} ", x[i]); } println!();
x.pop();
for i in 0..x.len() { print!("{} ", x[i]); } println!();
This program will print the following lines:
This is a sentence
This line is a sentence
This line contains is a sentence
This line contains a sentence
This line contains a sentence about Rust
This line contains a sentence

Let’s analyze it. The lines at position 2, 4, 6, 8, 10, and 12 print the current contents of the x vector.

The third line inserts the string “line” at position 1, that is, at the second place, just after the string This.

The fifth line inserts the string “contains” in the next position.

The seventh line removes the item that, after the last two insertions, is in position 3, that is, the string “is”, which initially was in position 1.

At this point, we already have the vector we need, but just to show some other features of vectors, we add a string at the end, and next we remove it.

As shown, the vector.push(item); statement has the same effect as vector.insert(vector.len(), item);, while, for a vector that is not empty, the statement vector.pop(); has the same effect as vector.remove(vector.len() - 1);.
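This correspondence can be checked with a small program that performs the same steps in both styles and compares the results. It is only a sketch: the two function names are hypothetical, and a vector of integer numbers is used just to keep the code short.

```rust
// Hypothetical helper: adds and removes items at the end with push and pop.
fn with_push_pop() -> Vec<i32> {
    let mut v = vec![1, 2, 3];
    v.push(4);
    v.pop();
    v.push(5);
    v
}

// Hypothetical helper: the same steps, written with insert and remove.
fn with_insert_remove() -> Vec<i32> {
    let mut v = vec![1, 2, 3];
    v.insert(v.len(), 4);
    v.remove(v.len() - 1);
    v.insert(v.len(), 5);
    v
}

fn main() {
    // Two vectors are equal when they have the same items in the same order.
    print!("{}", with_push_pop() == with_insert_remove());
}
```

This should print: true. One difference shows up only with an empty vector: pop reports that there was nothing to remove, without panicking, while the remove call above would panic.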

Given that the push and pop functions operate only at the last position, while the insert and remove functions can operate at any position, someone could think that the former two functions are much less used than the latter ones, or even that they are useless. Well, anyone who thinks so would be wrong, because, using vectors, adding or removing items at the last position is quite typical; it is at least as common as adding or removing items at any other position. This is because inserting or removing items at the end of a vector is faster than inserting or removing them at any other position. So, if a specific position is not required, the last position is the preferred one.

Notice that the insert library function operates on three arguments. One is the vector in which the item is to be inserted, and it is written before the name of the function. Another one is the position inside the vector, where the item is to be inserted, and it is passed as the first argument in the parentheses. The third one is the value to insert, passed as the second argument in the parentheses.

If it were written
let mut x = vec!["This", "is", "a", "sentence"];
x.insert("line", 1);

it would generate the compilation errors mismatched types at both arguments of the call to insert. This is because it would mean insert into the vector x the number 1 at position "line". This kind of logic error is reported by the compiler, because the insert function, like any Rust function, requires that its argument list has exactly the types specified by the function definition. The insert function applied to a vector of strings requires two arguments: the first one must be an integer number, and the second argument must be a string. Passing some more arguments, or fewer arguments, or some arguments of different types causes compilation errors.

Ambiguity is possible only with a vector of integer numbers:
let mut _x = vec![12, 13, 14, 15];
_x.insert(3, 1);

This code is valid, but at a glance it is not obvious if it is inserting the number 1 at position 3 (and it is indeed so), or if it is inserting the number 3 at position 1 (and it is not so). Therefore, only with vectors of integer numbers, the logic error of exchanging the two arguments wouldn’t be detected by the compiler. In any other case, the compiler helps to avoid this kind of error.

Empty Arrays and Vectors

We saw that arrays and vectors have generic types, parameterized by the type of their items, and that such type is inferred by the type of the expressions used to initialize such arrays or vectors.

Now, let’s assume we want to invoke a function f, in this way:
f(["help", "debug"], vec![0, 4, 15]);

It accepts two arguments: an array of options, where each option is a string; and a vector of options, where each option is an integer number.

However, if we want to tell such function that we don’t want to pass any option, we could try to write:
f([], vec![]);

This is not allowed, though, because the compiler is not able to determine the type of the array nor the type of the vector. It needs some items to make such inference, and here we have no items.

Then how can we declare an empty array or an empty vector?

If we compile
let _a = [];

we get a compilation error with the message “type annotations needed for `[_; 0]`” and then “cannot infer type”.

Instead, if we write
let _a = [""; 0];

the compilation is successful, and creates an empty array whose type is that of an array of strings. The empty string specified is never used at runtime; it is used only by the compiler to understand that the expression is an array of strings.

Similarly, the code
let _a = vec![true; 0];
let _b = vec![false; 0];

declares two variables having the same type, and initially also the same value, as the true and false expressions are used only to specify the type as Boolean.

Therefore, our preceding function can be invoked in this way:
f([""; 0], vec![0; 0]);
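Another way to give the compiler the missing information is to write the type of the variable explicitly. Type annotations and the `Vec<i32>` type name are not covered in the text above, so the following is a sketch under that assumption; the function name `empty_lengths` is hypothetical.

```rust
// Hypothetical helper: declares an empty array and an empty vector
// whose item types come from annotations instead of from sample items.
fn empty_lengths() -> (usize, usize) {
    let a: [&str; 0] = []; // empty array of strings
    let v: Vec<i32> = vec![]; // empty vector of integer numbers
    (a.len(), v.len())
}

fn main() {
    let (a_len, v_len) = empty_lengths();
    print!("{} {}", a_len, v_len);
}
```

This should print: 0 0.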

Debug Print

As we saw, the print and println macros accept only a string as their first argument, but the possible further arguments can be of various types, including integer numbers, floating-point numbers, and Boolean values. However, if you want to print the contents of an array or of a vector, the following code is not allowed:
print!("{} {}", [1, 2, 3], vec![4, 5]);

This happens because both the array passed as second argument and the vector passed as third argument have no standard display format, and so two error messages are emitted.

When debugging a program, though, it is useful to display the contents of such structures without having to resort to for loops. For this purpose, you can write
print!("{:?} {:?}", [1, 2, 3], vec![4, 5]);

That will print: [1, 2, 3] [4, 5].

By inserting the characters :? enclosed in the braces of a placeholder, you are telling the print macro (and the println macro) to generate a debug format for the corresponding data. So, whenever you want to print the contents of any variable, if the simple placeholder “{}” does not work, you may hope that the debug placeholder “{:?}” works.
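The debug placeholder is not limited to the print macros. As a sketch, the program below uses it with the `format` macro of the standard library, which builds a string instead of printing it; that macro is not covered in the text above, and the two function names are hypothetical.

```rust
// Hypothetical helper: builds the debug representation of an array
// and of a vector, without printing it.
fn debug_numbers() -> String {
    format!("{:?} {:?}", [1, 2, 3], vec![4, 5])
}

// Hypothetical helper: in debug format, strings are shown between quotes.
fn debug_strings() -> String {
    format!("{:?}", vec!["a", "b"])
}

fn main() {
    println!("{}", debug_numbers());
    println!("{}", debug_strings());
}
```

This should print [1, 2, 3] [4, 5] on the first line and ["a", "b"] on the second one.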

Copying Arrays and Vectors

If you want to copy an entire array or an entire vector, you are not required to write the code that scans its items:
let mut a1 = [4, 56, -2];
let a2 = [7, 81, 12500];
println!("{:?} {:?}", a1, a2);
a1 = a2;
println!("{:?} {:?}", a1, a2);
a1[1] = 10;
println!("{:?} {:?}", a1, a2);
This will print:
[4, 56, -2] [7, 81, 12500]
[7, 81, 12500] [7, 81, 12500]
[7, 10, 12500] [7, 81, 12500]

The first printed line shows the contents of the two arrays just after they are initialized.

The second printed line shows the contents of them just after a2 is copied onto a1. Indeed, the a1 array is exactly overwritten by the contents of the a2 array.

The third printed line shows the contents of the arrays just after the second item of a1 is changed. It appears that nothing else is changed.

Using vectors is not so simple, though.

This code will work correctly:
let mut a1 = vec![4, 56, -2];
let a2 = vec![7, 81, 12500];
println!("{:?} {:?}", a1, a2);
a1 = a2;
println!("{:?}", a1);
a1[1] = 10;
println!("{:?}", a1);
It will print the following lines:
[4, 56, -2] [7, 81, 12500]
[7, 81, 12500]
[7, 10, 12500]

Here, the previous program has been changed by replacing arrays with vectors, and by avoiding printing the a2 variable in the second and third printing statements.

This is because trying to also print a2 would generate a compilation error. For example, if the last line is replaced by this one:
println!("{:?} {:?}", a1, a2);

the error message borrow of moved value: `a2` is printed. It means that the a2 variable hasn’t been simply copied. Actually it has been moved, which means copied and destroyed. So, differing from arrays, when a vector is assigned to another vector, the original vector exists no more.
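If both vectors must remain usable, an explicit copy can be requested. The following sketch assumes the `clone` function of the standard library, which is not covered in the text above; the function name `copy_then_modify` is hypothetical.

```rust
// Hypothetical helper: duplicates a vector, changes the duplicate,
// and returns both, showing that the original is still available.
fn copy_then_modify() -> (Vec<i32>, Vec<i32>) {
    let a2 = vec![7, 81, 12500];
    // clone creates a separate vector with the same contents,
    // so a2 is not moved and can still be used afterward.
    let mut a1 = a2.clone();
    a1[1] = 10;
    (a1, a2)
}

fn main() {
    let (a1, a2) = copy_then_modify();
    println!("{:?} {:?}", a1, a2);
}
```

This should print: [7, 10, 12500] [7, 81, 12500]. The change made to a1 does not affect a2, as happened with arrays.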
