© The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature 2022
C. Milanesi, Beginning Rust, https://doi.org/10.1007/978-1-4842-7208-4_16

16. Using Iterators

Carlo Milanesi, Bergamo, Italy

In this chapter, you will learn:
  • How characters are stored in Rust strings, and why they do not allow direct access

  • How to read string characters or string bytes using iterators

  • How to extract items from vectors and arrays using iterators

  • How to get references to items from vectors, arrays, and slices using nonmutating iterators

  • How to modify items from vectors, arrays, and slices using mutating iterators

  • A shorthand notation for using iterators in for loops

  • How to use some iterator adapters: filter, map, and enumerate

  • How to use some iterator consumers: any, all, count, sum, min, max, and collect

  • The concepts of iterator chains and lazy processing

String Characters

We already saw that Rust has both static strings and dynamic strings, and that both types share the same character encoding, which is UTF-8. This encoding uses sequences of one to four bytes to represent each Unicode character, so a string is not simply an array of characters, but an array of bytes that represents a sequence of characters.

But given that s is a string, what’s the meaning of the expression s[0]? Is it the first character of s or the first byte of s?

Any of these choices could be surprising for someone, so in Rust such an expression is not allowed for strings. To get a byte from a string, it is necessary to first convert the string to a slice of bytes, like in this code:
let s = "abc012è€";
for i in 0..s.len() {
    println!("{}: {}", i, s.as_bytes()[i]);
}
It will print:
0: 97
1: 98
2: 99
3: 48
4: 49
5: 50
6: 195
7: 168
8: 226
9: 130
10: 172

The function as_bytes converts the string to which it is applied into a slice of immutable u8 numbers. Such conversion has zero runtime cost, because the representation of a string buffer is already that sequence of bytes.

The UTF-8 representation of any ASCII character is just the ASCII code of that character. And so, for the characters a, b, c, 0, 1, and 2, their ASCII value is printed.

The è character is represented by a pair of bytes, having values 195 and 168. And the € character is represented by a sequence of three bytes, having values 226, 130, and 172. Therefore, to get to the character at a given position in a string, it is necessary to scan all the previous characters.

This situation is similar to that of text files compared with fixed-record-length files. Using a fixed-record-length file, it is possible to read the record at any position n by seeking to that position, without first reading all the preceding records. But reading the nth line of a file with variable-length lines requires you to read all the preceding lines.

Scanning a String

Therefore, to process the characters of a string, it is necessary to scan them.

Let’s assume that, given the string “€èe,” we want to print the third character. First we must scan three bytes to get the first character, because the € character is represented by a sequence of three bytes; then we must scan two further bytes to get the second character, because the è character is represented by a sequence of two bytes; then we must scan one further byte to get the third character, because the e character is represented by a sequence of just one byte.

So, we need a way to get the next character of a string, given the current position, and to advance the current position to the end of the character just read.

In computer science, the objects that perform such behavior of extracting an item at the current position in a sequence, and then advance that position, are named iterators (or sometimes cursors). Therefore, we need a string iterator.

Here is a program that uses a string iterator:
fn print_nth_char(s: &str, mut n: u32) {
    let mut iter: std::str::Chars = s.chars();
    loop {
        let item: Option<char> = iter.next();
        match item {
            Some(c) => if n == 0 { print!("{}", c); break; },
            None => { break; },
        }
        n -= 1;
    }
}
print_nth_char("€èe", 0); // It prints: €
print_nth_char("€èe", 2); // It prints: e

This program first defines a function whose purpose is to receive a string s and a number n, and then to print the character of s at position n (counting from 0), if there is a character at such position, or else to do nothing. The last two lines of the program invoke such a function to print the first and the third character of €èe, and so the program prints €e.

The Rust standard library provides a string iterator type named Chars. Given a string s, you get an iterator over s by evaluating s.chars(), as is done in the second line of the preceding program.

Any iterator has the next function. Such a function returns the next item of the underlying sequence at the current position, and advances the current position.

Some sequences like the range 1.. do not have an end, but most sequences do have an end, like vectors, arrays, and slices. An iterator cannot return the next value when the end of the sequence has been reached. So when an iterator has reached such an end, it must be capable of communicating that there are no more items to return.

To consider the possibility of having finished the sequence, the next function of Rust iterators returns a value of Option<T> type. That value is None if the sequence has no more items.

Using the match statement, the Some case causes the processing of the next character of the string, and the None case causes the exit from the otherwise infinite loop.

If the function argument n was 0, the first character of the string must be printed, and so, at the first iteration of the loop, the value of the c variable would be printed, and the loop would be quit. For any other value of n, nothing is done with that character.

After the match statement, the n counter, which was mutable, is decremented so that, when it reaches 0, the required character to print is also reached.
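The same lookup can also be expressed with the nth method that the standard library provides for iterators: it calls next repeatedly and returns the item at the requested position as an Option. The following is only a minimal sketch, not the book's code; the function name nth_char is hypothetical and chosen here for illustration.

```rust
// Hypothetical helper: returns the character at position `n`
// (counting from 0), or None if the string has too few characters.
fn nth_char(s: &str, n: usize) -> Option<char> {
    // `nth` skips the first `n` items and returns the following one.
    s.chars().nth(n)
}

fn main() {
    match nth_char("€èe", 0) {
        Some(c) => println!("{}", c),
        None => println!("no such character"),
    }
    match nth_char("€èe", 5) {
        Some(c) => println!("{}", c),
        None => println!("no such character"),
    }
}
```

It should print € on the first line and no such character on the second line. Returning an Option, instead of printing inside the function, leaves to the caller the decision about what to do when the position does not exist.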

Given a string, it is easy to print the numeric codes of its characters, as shown by this code:
fn print_codes(s: &str) {
    let mut iter = s.chars();
    loop {
        match iter.next() {
            Some(c) => { println!("{}: {}", c, c as u32); },
            None => { break; },
        }
    }
}
print_codes("€èe");
It will print:
€: 8364
è: 232
e: 101

For every character, the character itself is printed, along with its numeric code.

Using String Iterators in for Loops

The previous example was somewhat cumbersome, but it can undergo the following drastic syntactic simplification:
fn print_codes(s: &str) {
    for c in s.chars() {
        println!("{}: {}", c, c as u32);
    }
}
print_codes("€èe");

This program generates the same machine code as the previous one, but it is much clearer for a human reader.

It appears that the expression after the in keyword in a for loop can be an iterator.

But what exactly is an iterator? It is not a type, but, in a way, it is a type specification. An iterator is considered to be any expression that has a next method, with no arguments, returning an Option<T> value.

So far, in for loops, we used ranges. Well, all ranges with a starting limit are actually iterators, as they have the proper next function. Here is some code using ranges as iterators.
let _v1 = (0u32..10).next();
let _v2 = (5u32..).next();
// let _v3 = (..8u32).next(); // ILLEGAL: Missing start value
// let _v4 = (..).next(); // ILLEGAL: Missing start value

The first line is valid, as std::ops::Range<u32> is an iterator.

The second line is also valid, as std::ops::RangeFrom<u32> is an iterator.

The third and fourth lines wouldn’t be legal, as std::ops::RangeTo<u32> and std::ops::RangeFull are not iterators.
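To make the idea of "a type with a proper next function" more concrete, here is a minimal sketch of a user-defined iterator. It assumes the standard Iterator trait and uses the impl syntax, which has not been explained so far; the Countdown type and its remaining field are hypothetical names invented for this illustration.

```rust
// Hypothetical type, used only for illustration: it counts down
// from a starting number to 1.
struct Countdown {
    remaining: u32,
}

impl Iterator for Countdown {
    type Item = u32;

    // Returns the current number and moves one step toward zero,
    // or returns None when the countdown is over.
    fn next(&mut self) -> Option<u32> {
        if self.remaining == 0 {
            None
        } else {
            let current = self.remaining;
            self.remaining -= 1;
            Some(current)
        }
    }
}

fn main() {
    let countdown = Countdown { remaining: 3 };
    // The for loop calls `next` until it receives None.
    for n in countdown {
        print!("{} ", n);
    }
}
```

It should print: 3 2 1 . The for loop accepts the countdown variable because its type provides the next function through the Iterator trait, just like the ranges shown above.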

It is also possible to iterate over the bytes of a string:
for byte in "€èe".bytes() {
    print!("{} ", byte);
}

It will print: 226 130 172 195 168 101 . The first three numbers represent the € character; the next two numbers represent the è character; and the last byte, 101, is the ASCII code of the e character.

This program can be broken down into the following code:
let string: &str = "€èe";
let string_it: std::str::Bytes = string.bytes();
for byte in string_it {
    print!("{} ", byte);
}

While the chars function, seen previously, returns a value whose type is std::str::Chars, the bytes function, used here, returns a value whose type is std::str::Bytes.

Both Chars and Bytes are string iterator types, but while the next function of Chars returns the next character of the string, the next function of Bytes returns the next byte of the string.

These string functions are both different from the as_bytes function, which returns a slice reference on the bytes of the string.
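Returning to the earlier remark that each character occupies a variable number of bytes, the two views of a string can be combined. The following sketch assumes the standard char_indices function of strings and the len_utf8 function of characters; the helper name char_layout is hypothetical.

```rust
// Hypothetical helper: for each character of `s`, returns the byte
// offset at which it starts and the number of bytes it occupies.
fn char_layout(s: &str) -> Vec<(usize, usize)> {
    let mut layout = Vec::new();
    // `char_indices` produces pairs (byte offset, character).
    for (offset, c) in s.char_indices() {
        layout.push((offset, c.len_utf8()));
    }
    layout
}

fn main() {
    for (offset, size) in char_layout("€èe") {
        println!("offset {}, {} byte(s)", offset, size);
    }
}
```

It should print the offsets 0, 3, and 5, with sizes of 3, 2, and 1 bytes respectively. This is consistent with the byte sequence printed above, in which the first three bytes belong to €, the next two to è, and the last one to e.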

The Rust Editions

The Rust language evolves in major versions named “editions”. At the time of writing, three editions of Rust have been released: the 2015 Edition, the 2018 Edition, and the 2021 Edition. The last one was released on October 21st, 2021.

By default, the Rust compiler accepts code written according to the 2015 Edition. So far, all the code we wrote was accepted by all three editions of Rust. However, starting from the next section, and in the ensuing chapters, some features supported only by the 2021 Edition will be used. Therefore, the compiler must be instructed to support that version of the language.

So, from now on, the command to compile the example programs must be this:
rustc --edition 2021 main.rs

Iterators over Vectors, Arrays, and Slices

In addition to iterating over strings, it is also quite typical to iterate over vectors, arrays, or slices. We have seen that strings have the chars function and the bytes function, which return two different string iterator types. Similarly, vectors, arrays, and slices have the into_iter function, which returns an iterator over the sequence it is applied to. Here is some code that uses it:
let vec_iterator: std::vec::IntoIter<i32>
    = vec![10, 20, 30].into_iter();
for item in vec_iterator {
    let j: i32 = item;
    print!("{} ", j + 1);
}

It will print: 11 21 31 .

The into_iter function, when applied to a vector of i32 items, returns an iterator of type IntoIter<i32>. Such an iterator generates values taken from that vector.

That code can be simplified as this:
for item in vec![10, 20, 30].into_iter() {
    print!("{} ", item + 1);
}

Previously, we said that any type that implements the function next is said to be an iterator. Vectors do not implement the function next, so they are not iterators, although they implement the function into_iter, which returns an iterator. Any type that implements the function into_iter is said to be iterable.

In addition to vectors, arrays are also iterable, as shown in the following code:
let array_iterator: std::array::IntoIter<i32, 3_usize>
    = [10, 20, 30].into_iter();
for item in array_iterator {
    let j: i32 = item;
    print!("{} ", j + 1);
}

It will print: 11 21 31 .

The into_iter function, when applied to an array, generates an iterator over that array.

Similarly, slices are also iterable, as shown in the following code:
let slice_iterator: std::slice::Iter<i32>
    = [40, 50, 60][0..2].into_iter();
for item in slice_iterator {
    let j: &i32 = item;
    print!("{} ", *j + 1);
}

It will print: 41 51 . Only the two items contained in the slice are passed to the loop.

Notice that, when it is applied to a vector or to an array, the into_iter function returns an IntoIter iterator, which generates values taken from such sequences, as expected. Instead, when the into_iter function is applied to a slice, it returns an Iter iterator, which returns references to the items contained in the slice. The reason for this is that, in general, it may not be allowed to extract items from a slice.

Using the into_iter function on a vector or on an array, you get an iterator that extracts the items from the iterable object. By default, you cannot change such an item, because it is immutable even if the iterated object is mutable. Therefore, this code is illegal:
let mut v = vec![10, 20, 30];
for item in v.into_iter() {
    item += 1; // ILLEGAL: item is not mutable
    print!("{} ", item);
}

It generates the compilation error cannot assign twice to immutable variable `item`.

If we want to change such items, we can do that by decorating the declaration of the loop variable with the mut clause, in this way:
let v = vec![10, 20, 30];
for mut item in v.into_iter() {
    item += 1;
    print!("{} ", item);
}

It will print: 11 21 31 .

This code does not change the original vector, because the increment in the third line acts on a value that has been extracted from the vector.

Iterators Generating References

The examples in the previous sections used iterators that extracted values from the iterated objects, with those being strings, string slices, vectors, or arrays.

Instead, we may prefer to operate on the items of an iterable object while keeping such items inside the object that contains them. So we want our loop variable to be a reference to an item inside the iterable object. With vectors, we can do that using the iter function, which is available for many collection types, as shown in this code:
let v = vec![10, 20, 30];
let vec_ref_iterator: std::slice::Iter<i32> = v.iter();
for item_ref in vec_ref_iterator {
    print!("{} ", *item_ref + 1);
}

It will print: 11 21 31 .

The iter function, applied to an object of type Vec<i32>, returns an iterator that generates items whose type is &i32. More in general, the iter function returns an iterator that generates items whose type is a reference to the items contained in the iterable to which it is applied.

Arrays and slices also have a similar iter function, as shown here:
let array_ref_iterator: std::slice::Iter<i32>
    = [10, 20, 30].iter();
for item in array_ref_iterator {
    print!("{} ", *item + 1);
}
print!("; ");
let slice_ref_iterator: std::slice::Iter<i32>
    = [10, 20, 30][0..2].iter();
for item in slice_ref_iterator {
    print!("{} ", *item + 1);
}

It will print: 11 21 31 ; 11 21 .

Actually, for slices, the iter function is equivalent to the into_iter function, as both return an iterator that generates references.

Iterations without Mutation

So far we used iterators over sequences only to read the items contained in such sequences, and this is quite typical.

When iterating over the characters of a string, it is unreasonable to try to change such characters, as the new characters may be represented by a different number of bytes than the existing characters. For example, if an è character is replaced by an e character, two bytes would be replaced by just one byte. Therefore, the Rust standard library has no way to change a string character by character using a character string iterator.

In addition, when iterating over the bytes of a string, it is unsafe to try to change such bytes, as the new bytes may result in a sequence that is not a valid UTF-8 string. Therefore, the Rust standard library has no way to change a string byte by byte using a byte string iterator.

As we have seen, with the iterator obtained using the into_iter or the iter functions on a vector, an array or a slice, you are not allowed to change the items inside such sequences, even if such sequences are mutable.

However, in a mutable vector, array, or slice, you can change a single item by accessing it using its index. An iterator is just another tool to access the items of a sequence. So it is reasonable to desire changing the items of a sequence using an iterator. Such kinds of iterators are shown in the next section.

Iterations with Mutation

So, we need a way to change the values of a sequence through an iterator. For that purpose, though, a mutable iterator is of no help. In fact, a mutable iterator is an object that can be made to iterate over another sequence, not an object that can be used to mutate the sequence that it iterates over.

A possible use of a mutable iterator is this:
let slice1 = &[3, 4, 5];
let slice2 = &[7, 8];
let mut iterator = slice1.iter();
for item_ref in iterator {
    print!("{} ", *item_ref);
}
print!("; ");
iterator = slice2.iter();
for item_ref in iterator {
    print!("{} ", *item_ref);
}

It will print: 3 4 5 ; 7 8 .

The mutable variable iterator first refers to the sequence slice1 and then to the sequence slice2. If you remove the mut clause in the third line, you will get the compilation error cannot assign twice to immutable variable `iterator`.

An iterator of type Iter is similar to a reference, in that a mutable reference is not the same concept as a reference to a mutable object.

So if you want to change the values in a sequence through an iterator over such a sequence, you cannot use a normal (mutable or immutable) iterator.

For such a purpose, you need another type of iterator, a mutating iterator, which, of course must be initialized over a mutable sequence, like this code shows:
let mut v = vec![3, 4, 5];
let iterator: std::slice::IterMut<i32> = v.iter_mut();
for mut_item_ref in iterator {
    *mut_item_ref += 1;
}
print!("{:?}", v);

It will print: [4, 5, 6].

The iter_mut function returns an object of type IterMut<i32>.

Think of the purpose of the iter function as “get an iterator that generates references to items to read,” and the purpose of the iter_mut function as “get an iterator that generates references to items to read or to write.”

Notice that the v variable has been declared as mutable. As the purpose of the iter_mut function is to allow changes to the iterated object, such object must be mutable. If you remove the mut clause in the first line, you get the compilation error cannot borrow `v` as mutable, as it is not declared as mutable.

We have seen the use of the iter_mut function applied to a vector. Similar functions exist for arrays and slices.
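As a minimal sketch of that last remark, the following code applies iter_mut first to a whole array and then, inside a function, to a slice of the same array. The helper name increment_all is hypothetical.

```rust
// Hypothetical helper: adds 1 to every item of a mutable slice.
fn increment_all(items: &mut [i32]) {
    for item_ref in items.iter_mut() {
        *item_ref += 1;
    }
}

fn main() {
    let mut arr = [3, 4, 5];
    // Mutating iterator over a whole array.
    for item_ref in arr.iter_mut() {
        *item_ref *= 10;
    }
    // Mutating iterator over a slice covering the first two items.
    increment_all(&mut arr[0..2]);
    print!("{:?}", arr);
}
```

It should print: [31, 41, 50]. All three items are multiplied by ten through the array iterator, but only the two items covered by the slice are then incremented.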

Shorthand for Using Iterators in Loops

When using for loops, there is a more compact syntax for using iterators.

Instead of writing:
for item in vec![10, 20, 30].into_iter() {
    print!("{} ", item + 1);
}
you can write, equivalently:
for item in vec![10, 20, 30] {
    print!("{} ", item + 1);
}

In this code, the call to into_iter has been removed. If a value that implements the into_iter function is passed to a for statement, such function is implicitly invoked.

Similarly, instead of writing:
for item in vec![10, 20, 30].iter() {
    print!("{} ", *item + 1);
}
you can write equivalently:
for item in &vec![10, 20, 30] {
    print!("{} ", *item + 1);
}

In this code, the call to iter has been removed, and an “&” sign has been added. If a reference to a value that implements the iter function is passed to a for statement, such function is implicitly invoked.

Similarly, instead of writing:
for item in vec![10, 20, 30].iter_mut() {
    *item += 1; print!("{} ", item);
}
you can write equivalently:
for item in &mut vec![10, 20, 30] {
    *item += 1; print!("{} ", item);
}

In this code, the call to iter_mut has been removed, and an “&mut” clause has been added. If a reference to a mutable value that implements the iter_mut function is passed to a for statement, such function is implicitly invoked.
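The examples above iterate over temporary vectors. The following sketch applies the same shorthand forms to a named vector, to show that the vector remains usable after each loop. The function name bump_and_total is hypothetical.

```rust
// Hypothetical helper: increments every item, then returns the
// modified vector together with the sum of its items.
fn bump_and_total(mut v: Vec<i32>) -> (Vec<i32>, i32) {
    // Same as: for item in v.iter_mut()
    for item in &mut v {
        *item += 1;
    }
    let mut total = 0;
    // Same as: for item in v.iter()
    for item in &v {
        total += *item;
    }
    // `v` is still usable here, because both loops only used
    // references to it.
    (v, total)
}

fn main() {
    let (v, total) = bump_and_total(vec![10, 20, 30]);
    print!("{:?} {}", v, total);
}
```

It should print: [11, 21, 31] 63. If the first loop had been written as for item in v, without the “&mut” clause, the vector would have been consumed by that loop, and the second loop could not have used it.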

Iterator Generators

So far we have encountered five functions that get a sequence and return an iterator: chars, bytes, into_iter, iter, and iter_mut. Functions that can be applied to a value that is not an iterator, but that return an iterator are named iterator generators, because they transform a noniterator into an iterator.

An Iterator Adapter: filter

Let’s see some other uses of iterators.

Here is a problem that can be solved using iterators: Given an array of numbers, how can I print all the negative numbers of such an array?

A possible solution is this:
let arr = [66, -8, 43, 19, 0, -31];
for n in arr.into_iter() {
    if n < 0 { print!("{} ", n); }
}

It will print: -8 -31 .

But another possible solution is this:
let arr = [66, -8, 43, 19, 0, -31];
for n in arr.into_iter().filter(|x_ref| *x_ref < 0) {
    print!("{} ", n);
}

The filter function is in the standard library. It is to be applied to an iterator, and it takes a closure as argument. As its name suggests, the purpose of this function is filtering the iterated sequence, that is, to discard the items that do not satisfy the criterion implemented by the closure, and let pass only the items that satisfy such criterion.

The filter function gets an item at a time from the iterator, and invokes the closure once for every item, passing to the closure a reference to the current item. In our example, the reference to the current item, which is an integer number, is assigned to the x_ref closure argument.

The closure must return a Boolean that indicates whether the item is accepted (true) or rejected (false) by the filtering. The rejected items are destroyed, while the accepted ones are passed to the surrounding expression.

In fact, the filter function returns an iterator that (when its next function is invoked) produces just the items for which the closure returned true.

As we were interested in accepting only the negative numbers, the condition inside the closure is *x_ref < 0, because we want to compare with zero the item, not its reference.

We said that the filter function returns an iterator. Therefore we can use it inside a for loop, where we used to use iterators.

Because the filter function gets an iterator and returns an iterator, it can be seen that it transforms an iterator into another iterator. Such iterator transformers are usually named iterator adapters. The term adapter recalls that of electrical connectors: if a plug does not fit a socket, you use an adapter.
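A detail worth noticing arises when filter is applied to an iterator that already generates references, like the one returned by iter. In that case the closure receives a reference to a reference, so two dereferences are needed. The following is a minimal sketch; the helper name negatives is hypothetical.

```rust
// Hypothetical helper: copies the negative numbers of a slice
// into a new vector.
fn negatives(values: &[i32]) -> Vec<i32> {
    let mut result = Vec::new();
    // `iter` produces &i32 items, so the closure receives &&i32.
    for n_ref in values.iter().filter(|x_ref_ref| **x_ref_ref < 0) {
        // The items that pass the filter are still &i32.
        result.push(*n_ref);
    }
    result
}

fn main() {
    let arr = [66, -8, 43, 19, 0, -31];
    print!("{:?}", negatives(&arr));
}
```

It should print: [-8, -31]. The filter adapter does not change the type of the items; it only decides which of them are passed on.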

The map Iterator Adapter

Here is another problem that can be solved using iterators: Given an array of numbers, how can you print the double of each number of that array?

You can do it in this way:
let arr = [66, -8, 43, 19, 0, -31];
for n in arr.into_iter() {
    print!("{} ", n * 2);
}

It will print: 132 -16 86 38 0 -62 .

But you can also do it in this way:
let arr = [66, -8, 43, 19, 0, -31];
for n in arr.into_iter().map(|x| x * 2) {
    print!("{} ", n);
}

The map function is another iterator adapter in the standard library. Its purpose is to transform the values produced by an iterator into other values. Differing from the filter function, the value returned by the closure can be of any type. Such value represents the transformed value.

Actually, the map function returns a newly created iterator that produces all the items returned by the closure received as an argument.

While the filter adapter removes some items of the iterated sequence, and it keeps the others unchanged, the map adapter does not remove any items, but it transforms them.

Another difference between them is that while filter passes a reference as the argument of its closure, map passes a value.
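To illustrate that the closure passed to map can return a type different from the type of the items it receives, here is a minimal sketch that transforms integer numbers into floating-point numbers. The helper name halves is hypothetical.

```rust
// Hypothetical helper: returns half of each number as an f64.
fn halves(values: &[i32]) -> Vec<f64> {
    let mut result = Vec::new();
    // The closure receives a &i32 and returns an f64.
    for half in values.iter().map(|x| *x as f64 / 2.0) {
        result.push(half);
    }
    result
}

fn main() {
    print!("{:?}", halves(&[66, -8, 43]));
}
```

It should print: [33.0, -4.0, 21.5]. The iterator returned by iter generates &i32 items, while the iterator returned by map generates f64 items.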

The enumerate Iterator Adapter

In most programming languages, to iterate over a sequence you increment an integer counter and then access the items of the sequence using that counter, in this way:
let arr = ['a', 'b', 'c'];
for index in 0..arr.len() {
    print!("{} {}, ", index, arr[index]);
}

It will print: 0 a, 1 b, 2 c, .

Using an iterator over the sequence, it is possible to avoid using the integer counter, with this code:
let arr = ['a', 'b', 'c'];
for ch in arr.into_iter() {
    print!("{}, ", ch);
}

It will print: a, b, c, .

But if you also need a counter, you could go back to the old technique, or you could add another variable and increment it explicitly, in this way:
let arr = ['a', 'b', 'c'];
let mut index = 0;
for ch in arr.into_iter() {
    print!("{} {}, ", index, ch);
    index += 1;
}
But there is this other possibility:
let arr = ['a', 'b', 'c'];
for (index, ch) in arr.into_iter().enumerate() {
    print!("{} {}, ", index, ch);
}

In the second line, the loop variable is actually a tuple of two variables: the index variable, having type usize; and the ch variable, having type char. At the first iteration, the index variable gets the value 0, while the ch value gets as value the first character of the array. At every iteration, both index and ch receive new values.

This works because the enumerate function takes an iterator and returns another iterator. At each iteration, this returned iterator returns a value of type (usize, char). This tuple has a counter as its first field, and as its second field the same item received from the first iterator.
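The enumerate adapter is not limited to arrays. As a sketch, the following code applies it to a string iterator, to find the positions of a character counted in characters, not in bytes. The helper name positions_of is hypothetical.

```rust
// Hypothetical helper: returns the character positions (not the
// byte positions) at which `target` occurs in `s`.
fn positions_of(s: &str, target: char) -> Vec<usize> {
    let mut positions = Vec::new();
    for (index, ch) in s.chars().enumerate() {
        if ch == target {
            positions.push(index);
        }
    }
    positions
}

fn main() {
    print!("{:?}", positions_of("€a€", '€'));
}
```

It should print: [0, 2]. The second € character begins at byte 4 of the string, but it is the character at position 2, because enumerate counts the items produced by the chars iterator.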

An Iterator Consumer: any

Given a string, how can you determine if it contains a given character?

You could do it in this way:
let s = "Hello, world!";
let ch = 'R';
let mut contains = false;
for c in s.chars() {
    if c == ch {
        contains = true;
    }
}
print!("\"{}\" {} '{}'.",
    s,
    if contains {
        "contains"
    } else {
        "does not contain"
    },
    ch);

It will print: "Hello, world!" does not contain 'R'.

It does so because character comparison is case sensitive. But if you replace the uppercase R in the second line with a lowercase r, it will print: "Hello, world!" contains 'r'.

You could do it equivalently in this way:
let s = "Hello, world!";
let ch = 'R';
print!("\"{}\" {} '{}'.",
    s,
    if s.chars().any(|c| c == ch) {
        "contains"
    } else {
        "does not contain"
    },
    ch);

Here, the contains variable and the loop that possibly sets it to true have been removed; and the only other use of such a variable has been replaced by the expression s.chars().any(|c| c == ch).

As the only purpose of the contains variable was to indicate if the s string contained the ch character, the expression that replaces it must also have the same value.

We know that the s.chars() expression is evaluated to an iterator over the characters of the s string. Then the any function, which is in the standard library, is applied to such iterator. Its purpose is to determine if a Boolean function (a.k.a. predicate) is true for any item produced by the iterator.

The any function receives a closure as an argument. It applies that closure to every item received from the iterator, and it returns true as soon as the closure returns true on an item, or returns false if the closure returns false for all the items.

Therefore, such a function tells us if any item satisfies the condition specified by the closure.
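The fact that any stops as soon as the closure returns true can be observed by counting the invocations of the closure. The following is only an illustrative sketch; the helper name any_negative_counting is hypothetical.

```rust
// Hypothetical helper: reports whether any item is negative, and
// how many times the closure was invoked to find that out.
fn any_negative_counting(values: &[i32]) -> (bool, usize) {
    let mut calls = 0;
    let found = values.iter().any(|n| {
        calls += 1; // Count every invocation of the closure.
        *n < 0
    });
    (found, calls)
}

fn main() {
    print!("{:?} ", any_negative_counting(&[45, -8, 2, 6]));
    print!("{:?} ", any_negative_counting(&[45, 8, 2, 6]));
}
```

It should print: (true, 2) (false, 4) . In the first call, the closure is invoked only twice, because the second item is negative and the remaining items are not examined. In the second call, all four items must be examined before returning false.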

You can also use the any function to determine if an array contains any negative number, as this code shows:
print!("{} ",
    [45, 8, 2, 6].into_iter().any(|n| n < 0));
print!("{} ",
    [45, 8, -2, 6].into_iter().any(|n| n < 0));

It will print: false true .

To clarify, you can annotate the closures with types:
print!("{} ", [45, 8, 2, 6].into_iter()
    .any(|n: i32| -> bool { n < 0 }));
print!("{} ", [45, 8, -2, 6].into_iter()
    .any(|n: i32| -> bool { n < 0 }));

Notice that while the iterator adapters seen previously returned iterators, the any function is applied to an iterator, but it returns a Boolean, not an iterator.

Every function that is applied to an iterator but does not return an iterator is called an iterator consumer, because it gets data from an iterator but does not put it into another iterator, so it consumes data instead of adapting it.

The all Iterator Consumer

With the any function, you can determine if at least one iterated item satisfies a condition. And how can you determine if all iterated items satisfy a condition?

You can use the all iterator consumer. For example, to determine if all the numbers in an array are positive, you can write:
print!("{} ", [45, 8, 2, 6].into_iter()
    .all(|n: i32| -> bool { n > 0 }));
print!("{} ", [45, 8, -2, 6].into_iter()
    .all(|n: i32| -> bool { n > 0 }));

It will print: true false .

Notice that while the any function means a repeated application of the OR logical operator, the all function means a repeated application of the AND logical operator.

Notice also that, following the rules of logic, if the any function is applied to an iterator that does not produce any item, it returns false, whatever its closure is. Similarly, the all function returns true when applied to an empty iterator, whatever its closure is.
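The behavior on empty sequences can be checked with a short sketch. The helper names any_positive and all_positive are hypothetical.

```rust
// Hypothetical helper: true if at least one item is positive.
fn any_positive(values: &[i32]) -> bool {
    values.iter().any(|n| *n > 0)
}

// Hypothetical helper: true if no item fails to be positive.
fn all_positive(values: &[i32]) -> bool {
    values.iter().all(|n| *n > 0)
}

fn main() {
    let empty: [i32; 0] = [];
    print!("{} {}", any_positive(&empty), all_positive(&empty));
}
```

It should print: false true. For an empty sequence the closures are never invoked, so the results do not depend on the condition they express.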

The count Iterator Consumer

Given an iterator, how do you know how many items it will produce?

Well, if you have a vector, an array, or a slice, you would best use the len function of such objects, as it is the simplest and fastest way to get their lengths. But if you want to know how many characters there are in a string, you must scan it all, because the number of characters comprising a string is not stored anywhere, unless you stored it yourself.

So you need to use the simplest iterator consumer: count.
let s = "€èe";
print!("{} {}", s.chars().count(), s.len());

It will print: 3 6, meaning that this string contains three characters represented by six bytes.

The count iterator consumer does not get any arguments, and it always returns a usize value.
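Because count can be applied to any iterator, it can also follow an adapter. As a sketch, the following code counts only the characters that are not ASCII, by applying filter before count. It assumes the standard is_ascii function of characters; the helper name count_non_ascii is hypothetical.

```rust
// Hypothetical helper: counts the characters of `s` that are
// outside the ASCII range.
fn count_non_ascii(s: &str) -> usize {
    // `filter` passes a &char to its closure.
    s.chars().filter(|c| !c.is_ascii()).count()
}

fn main() {
    print!("{}", count_non_ascii("€èe"));
}
```

It should print: 2, because € and è are not ASCII characters, while e is.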

The sum Iterator Consumer

If, instead of counting the iterated items, you want to add them up, it is almost as simple:
print!("{}", [45, 8, -2, 6].into_iter().sum::<i32>());
It will print: 57.

Like count, the sum iterator consumer takes no arguments. However, it requires a type parameter in angle brackets, which is the type of the returned number. Here, it is required, because otherwise the compiler could not infer such a type. But in other cases, like the following one, it is optional:
let s: i32 = [45, 8, -2, 6].into_iter().sum();
print!("{}", s);

Here, the value returned by sum is assigned to a variable having type i32, so it must return a value of such type.

It is also possible to add the items of an empty sequence:
let s: u32 = [0; 0].into_iter().sum();
print!("{}", s);

It will print: 0.

Notice that while the count function was applicable to any iterator, the sum function is applicable only to iterators that produce addable items. The statement [3.4].into_iter().sum::<f64>(); is valid, while the statement [true].into_iter().sum::<bool>(); is illegal, because it is not allowed to sum Booleans.
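The sum consumer also accepts iterators that generate references to numbers, like those returned by iter, and it can follow an adapter like map. Here is a minimal sketch; the helper names total and sum_of_squares are hypothetical.

```rust
// Hypothetical helper: adds up the items of a slice.
fn total(values: &[i32]) -> i32 {
    // `iter` produces &i32 items; `sum` accepts them as well.
    // The result type is inferred from the function return type.
    values.iter().sum()
}

// Hypothetical helper: adds up the squares of the items.
fn sum_of_squares(values: &[i32]) -> i32 {
    values.iter().map(|n| n * n).sum()
}

fn main() {
    print!("{} {}", total(&[45, 8, -2, 6]), sum_of_squares(&[1, 2, 3]));
}
```

It should print: 57 14. In both functions the type parameter of sum is omitted, because the compiler can infer it from the declared return type of the function.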

The min and max Iterator Consumers

If an iterator produces values that can be compared with one another, it is possible to get the minimum or the maximum of those values. But there is a problem: the empty sequences. If our iterator produces no items, we can count them, and this count is zero; we can add them up, and their sum is zero; but we cannot compute the maximum or the minimum of them. Therefore, the min and max iterator consumers produce an Option value, which is Some number if they are applied to a nonempty sequence of numbers, but None if they are applied to an empty sequence:
let arr = [45, 8, -2, 6];
match arr.into_iter().min() {
    Some(n) => print!("{} ", n),
    _ => (),
}
match arr.into_iter().max() {
    Some(n) => print!("{} ", n),
    _ => (),
}
match [0; 0].into_iter().min() {
    Some(n) => print!("{} ", n),
    _ => print!("---"),
}

It will print: -2 45 ---.

The min and max consumers can also be applied to iterators that produce nonnumeric objects, provided they are comparable. For example, strings are comparable, so we can write this:
let arr = ["hello", "brave", "new", "world"];
match arr.into_iter().min() {
    Some(n) => print!("{} ", n),
    _ => (),
}
match arr.into_iter().max() {
    Some(n) => print!("{} ", n),
    _ => (),
}

It will print: brave world . Those two words are, respectively, the first and last of the array in alphabetical order.
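When a fallback value makes sense for the empty case, the Option returned by min or max can be resolved without a match statement. The following sketch assumes the standard unwrap_or function of Option and the copied iterator adapter; the helper name smallest_or is hypothetical.

```rust
// Hypothetical helper: returns the smallest item of a slice, or
// `fallback` if the slice is empty.
fn smallest_or(values: &[i32], fallback: i32) -> i32 {
    // `copied` turns the &i32 items produced by `iter` into i32
    // values; `unwrap_or` replaces None with the fallback.
    values.iter().copied().min().unwrap_or(fallback)
}

fn main() {
    print!("{} {}", smallest_or(&[45, 8, -2, 6], 0), smallest_or(&[], 0));
}
```

It should print: -2 0. Whether a fallback is appropriate depends on the application: here zero is an arbitrary choice, and in many cases it is better to handle the None case explicitly.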

The collect Consumer

The any, all, count, sum, min, and max iterator consumers return simple information regarding a possibly long sequence of items.

But we could wish to put all the consumed items into a vector. Here is a way to do it:
let arr = [36, 1, 15, 9, 4];
let v = arr.into_iter().collect::<Vec<i32>>();
print!("{:?}", v);

It will print: [36, 1, 15, 9, 4].

The collect function has created a new Vec<i32> object, and it has pushed into it all the items received from the iterator.

The collect function can be used to put items into various kinds of variable-size collections, like vectors, linked lists, and hash tables (but not arrays). Therefore, it is a generic function, parameterized by the type of the destination collection. The expression it.collect::<Vec<i32>>() means that the numbers generated by the it iterator will be collected into a vector.

If the context does not allow inferring either the type of collection (Vec) or the type of the contained items (i32), we must specify both of them, like in the expression Vec<i32>. In the previous example, though, the type of the items can be inferred to be i32, so we can write the following code:
let arr = [36, 1, 15, 9, 4];
let v = arr.into_iter().collect::<Vec<_>>();
print!("{:?}", v);

In this code, the type i32 has been replaced by the don’t-care symbol _.

But if even the type of the resulting collection can be inferred, then it can be omitted from the parameterization of collect. So, this program is equivalent to the previous one:
let arr = [36, 1, 15, 9, 4];
let v: Vec<_> = arr.into_iter().collect();
print!("{:?}", v);
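
The same collect function can target other kinds of collections. Here is a minimal sketch, assuming the hypothetical helper names to_list, to_set, and to_map; the destination collection is chosen by the return type of each function, so collect needs no explicit parameter.

```rust
use std::collections::{BTreeSet, HashMap, LinkedList};

// The three function names below are hypothetical, chosen only for this sketch.

// Collect the items into a linked list, keeping their order.
fn to_list(values: &[i32]) -> LinkedList<i32> {
    values.iter().copied().collect()
}

// Collect the items into an ordered set, which drops duplicates.
fn to_set(values: &[i32]) -> BTreeSet<i32> {
    values.iter().copied().collect()
}

// Collect (position, item) pairs into a hash table.
fn to_map(values: &[i32]) -> HashMap<usize, i32> {
    values.iter().copied().enumerate().collect()
}

fn main() {
    let arr = [36, 1, 15, 9, 1];
    println!("{:?}", to_list(&arr));
    println!("{:?}", to_set(&arr));
    println!("{}", to_map(&arr).len());
}
```
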
In addition, even string characters can be collected into a string or into a vector:
let s = "Hello";
println!("{:?}", s.chars().collect::<String>());
println!("{:?}", s.chars().collect::<Vec<char>>());
It will print:
"Hello"
['H', 'e', 'l', 'l', 'o']

The second and third statements apply the chars function to a string, obtaining an iterator producing characters. But the second statement collects those characters into a String object, while the third statement collects them into a vector of characters.
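
Adapters can be inserted between chars and collect. As a small sketch, the hypothetical function letters_upper keeps only the alphabetic characters of a string, converts the ASCII ones to uppercase, and collects the result into a new String.

```rust
// `letters_upper` is a hypothetical name used only for this sketch.
fn letters_upper(s: &str) -> String {
    s.chars()
        // Discard digits, spaces, and punctuation.
        .filter(|c| c.is_alphabetic())
        // Convert ASCII letters to uppercase; other characters are unchanged.
        .map(|c| c.to_ascii_uppercase())
        // The destination type is inferred from the function's return type.
        .collect()
}

fn main() {
    println!("{}", letters_upper("Hello, World 42"));
}
```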

You can also handle the bytes of strings, in two ways:
let s = "Hello";
println!("{:?}", s.bytes().collect::<Vec<u8>>());
println!("{:?}", s.as_bytes().iter().collect::<Vec<&u8>>());
It will print:
[72, 101, 108, 108, 111]
[72, 101, 108, 108, 111]

The second statement uses the bytes function to obtain an iterator producing bytes. Then those bytes, which are the representation of the characters, are collected into a vector.

The third statement uses the as_bytes function to see the string as a slice of bytes. Next, the iter function is used to obtain an iterator over such slice, producing references to bytes. Then, such references to bytes are collected into a vector. When a vector of references is printed for debugging, the referenced objects are actually printed.

Notice that the collect function cannot be used to put the iterated items into a static string, an array, or a slice, because collect needs to allocate space at runtime, and such sequences cannot allocate heap memory.
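
When a fixed-size destination is really needed, one possible approach is to create the array first and then overwrite its items through a mutating iterator. The following sketch assumes a hypothetical function first_four; it uses zip, a standard-library adapter that is not among those presented in this chapter, to pair every slot of the array with an item of the source.

```rust
// `first_four` is a hypothetical name used only for this sketch.
// It copies up to four items of `source` into a fixed-size array;
// the slots that receive no item keep their initial value, zero.
fn first_four(source: &[i32]) -> [i32; 4] {
    let mut buffer = [0; 4];
    // `zip` stops as soon as either of the two sequences is finished.
    for (slot, value) in buffer.iter_mut().zip(source.iter()) {
        *slot = *value;
    }
    buffer
}

fn main() {
    println!("{:?}", first_four(&[7, 8]));
    println!("{:?}", first_four(&[1, 2, 3, 4, 5, 6]));
}
```

No heap memory is allocated here, because the array is a local variable of known size.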

Iterator Chains

Assume you have an array of numbers, and you want to create a vector containing only the positive numbers of such an array, multiplied by two.

You could write it without using iterators, in this way:
let arr = [66, -8, 43, 19, 0, -31];
let mut v = vec![];
for i in 0..arr.len() {
    if arr[i] > 0 { v.push(arr[i] * 2); }
}
print!("{:?}", v);

It will print: [132, 86, 38].

Or, equivalently, you could use an iterator generator, without using iterator adapters or iterator consumers:
let arr = [66, -8, 43, 19, 0, -31];
let mut v = vec![];
for n in arr.into_iter() {
    if n > 0 { v.push(n * 2); }
}
print!("{:?}", v);
Or, equivalently, you could use an iterator generator and two iterator adapters, without using iterator consumers:
let arr = [66, -8, 43, 19, 0, -31];
let mut v = vec![];
for n in arr
    .into_iter()
    .filter(|x| *x > 0)
    .map(|x| x * 2)
{
    v.push(n);
}
print!("{:?}", v);
Or, equivalently, you could use an iterator generator, two iterator adapters, and an iterator consumer:
let arr = [66, -8, 43, 19, 0, -31];
let v = arr
    .into_iter()
    .filter(|x| *x > 0)
    .map(|x| x * 2)
    .collect::<Vec<_>>();
print!("{:?}", v);

This last version shows a programming pattern that is typical of functional languages: the iterator chain.

Such chains begin with an iterator or an iterator generator, which is typically applied to a sequence; they proceed with zero or more iterator adapters; and they end with an iterator consumer.

We saw several iterator generators: chars, bytes, into_iter, iter, and iter_mut; and we saw ranges, which are iterators with no need to be created by a generator.

We saw several iterator adapters: filter, map, and enumerate.

And we saw several iterator consumers: any, all, count, sum, min, max, and collect.

The standard library contains many more such functions.
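
As a further sketch of an iterator chain that combines the functions listed above, the following hypothetical function sums the squares of the items found at even positions. It uses iter as the generator; enumerate, filter, and map as adapters; and sum as the consumer.

```rust
// `sum_of_squares_at_even_positions` is a hypothetical name used only for this sketch.
fn sum_of_squares_at_even_positions(values: &[i32]) -> i32 {
    values
        .iter()
        // Pair every item with its position: (0, &3), (1, &10), ...
        .enumerate()
        // Keep only the pairs whose position is even.
        .filter(|(i, _)| i % 2 == 0)
        // Drop the position and square the item.
        .map(|(_, x)| x * x)
        // Add up the squares; the item type comes from the return type.
        .sum()
}

fn main() {
    println!("{}", sum_of_squares_at_even_positions(&[3, 10, 4, 20, 5]));
}
```

For the array in this example, the items at positions 0, 2, and 4 are 3, 4, and 5, so the result is 9 + 16 + 25, that is, 50.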

Iterators Are “Lazy”

Now, let’s see an example that helps us to understand the mechanism behind the behavior of iterators. It is the previous code example, with some debug print statements added:
let v = [66, -8, 43, 19, 0, -31]
    .into_iter()
    .filter(|x| { print!("F{} ", x); *x > 0 })
    .map(|x| { print!("M{} ", x); x * 2 })
    .collect::<Vec<_>>();
print!("{:?}", v);

It will print: F66 M66 F-8 F43 M43 F19 M19 F0 F-31 [132, 86, 38].

The runtime operations are the following ones.

The invocation of into_iter prepares a temporary iterator, but it does not access the array. Let’s name “I” the iterator returned by into_iter.

The invocation of filter on “I” prepares another temporary iterator, but it does not manage data. Let’s name “F” the iterator returned by filter.

The invocation of map on “F” prepares another temporary iterator, but it does not manage data. Let’s name “M” the iterator returned by map.

The invocation of collect on “M” asks “M” for an item; “M” asks “F” for an item; “F” asks “I” for an item. Then “I” takes the number 66 from the array and passes it to “F”, which prints it, checks whether it is positive, and passes it to “M”, which prints it, doubles it, and passes it to collect, which then pushes it into the vector.

Then collect, because it has just received Some item and not None, asks “M” for another item, and the trip is repeated until the number -8 arrives at “F”, which rejects it as nonpositive. Indeed, -8 is not printed by “M”. At this point, “F”, because it has just received Some item and has rejected it, asks “I” for another item.

The algorithm proceeds in this way until the array is finished. When “I” cannot find other items in the array, it sends a None to “F” to indicate there are no more items. When “F” receives a None, it sends it to “M”, which sends it to collect, which stops asking for items, and the whole statement is finished.
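
To make this request mechanism explicit, here is a sketch that plays the role of collect by hand. It is not how the standard library implements collect; it is only an assumption-laden illustration, using the hypothetical function name collect_by_hand, of a consumer that keeps calling next on the last adapter until it receives None.

```rust
// `collect_by_hand` is a hypothetical name used only for this sketch.
fn collect_by_hand(values: &[i32]) -> Vec<i32> {
    // Building the chain accesses no data: nothing has asked for an item yet.
    let mut chain = values
        .iter()
        .filter(|x| **x > 0)
        .map(|x| x * 2);
    let mut result = Vec::new();
    loop {
        // Every call to `next` asks the map adapter for one item; that
        // adapter asks the filter adapter, which asks the slice iterator.
        match chain.next() {
            Some(item) => result.push(item),
            // `None` means that the underlying sequence is finished.
            None => break,
        }
    }
    result
}

fn main() {
    println!("{:?}", collect_by_hand(&[66, -8, 43, 19, 0, -31]));
}
```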

If this whole expression except for the collect invocation is in the header of a for loop, we have this code:
let mut v = vec![];
for item in [66, -8, 43, 19, 0, -31]
    .into_iter()
    .filter(|x| { print!("F{} ", x); *x > 0 })
    .map(|x| { print!("M{} ", x); x * 2 }) {
    v.push(item);
}
print!("{:?}", v);

This code is equivalent: the same mechanism is activated, and it prints the same output.

But let’s omit both the for loop and any iterator consumer.
[66, -8, 43, 19, 0, -31]
    .into_iter()
    .filter(|x| { print!("F{} ", x); *x > 0 })
    .map(|x| { print!("M{} ", x); x * 2 });

This program prints nothing, because it does nothing. The compiler even reports the warning: unused `Map` that must be used, followed by the note: iterators are lazy and do nothing unless consumed.

In computer science, to be lazy means trying to do some processing as late as possible. Iterator adapters are lazy, as they process data only when another function asks them for an item: it can be another iterator adapter, or an iterator consumer, or a for loop, which acts as a consumer. If there is no data sink, there is no data access.
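
A consequence of laziness is that a consumer that stops early leaves the rest of the sequence untouched. The following sketch, which assumes a hypothetical function find_positive, counts how many items the filter closure inspects when the any consumer is used to search for a value.

```rust
// `find_positive` is a hypothetical name used only for this sketch.
// It searches the positive items of `values` for `target`, and it also
// reports how many items the `filter` closure had to inspect.
fn find_positive(values: &[i32], target: i32) -> (bool, usize) {
    let mut inspected = 0;
    let found = values
        .iter()
        .filter(|x| {
            inspected += 1;
            **x > 0
        })
        // `any` stops asking for items as soon as one of them matches.
        .any(|x| *x == target);
    (found, inspected)
}

fn main() {
    let arr = [66, -8, 43, 19, 0, -31];
    println!("{:?}", find_positive(&arr, 43));
    println!("{:?}", find_positive(&arr, 100));
}
```

It will print (true, 3) and then (false, 6): in the first search, only the first three items of the array are inspected, while in the second search, which fails, all six items are.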
