Chapter 15. Iterators

It was the end of a very long day.

Phil

An iterator is a value that produces a sequence of values, typically for a loop to operate on. Rust’s standard library provides iterators that traverse vectors, strings, hash tables, and other collections, but also iterators to produce lines of text from an input stream, connections arriving at a network server, values received from other threads over a communications channel, and so on. And of course, you can implement iterators for your own purposes. Rust’s for loop provides a natural syntax for using iterators, but iterators themselves also provide a rich set of methods for mapping, filtering, joining, collecting, and so on.

Rust’s iterators are flexible, expressive, and efficient. Consider the following function, which returns the sum of the first n positive integers (often called the nth triangle number):

fn triangle(n: i32) -> i32 {
    let mut sum = 0;
    for i in 1..n+1 {
        sum += i;
    }
    sum
}

The expression 1..n+1 is a Range<i32> value. A Range<i32> is an iterator that produces the integers from its start value (inclusive) to its end value (exclusive), so you can use it as the operand of the for loop to sum the values from 1 to n.

But iterators also have a fold method, which you can use in the equivalent definition:

fn triangle(n: i32) -> i32 {
    (1..n+1).fold(0, |sum, item| sum + item)
}

Starting with 0 as the running total, fold takes each value that 1..n+1 produces and applies the closure |sum, item| sum + item to the running total and the value. The closure’s return value is taken as the new running total. The last value it returns is what fold itself returns—in this case, the total of the entire sequence. This may look strange if you’re used to for and while loops, but once you’ve gotten used to it, fold is a legible and concise alternative.

This is pretty standard fare for functional programming languages, which put a premium on expressiveness. But Rust’s iterators were carefully designed to ensure that the compiler can translate them into excellent machine code as well. In a release build of the second definition shown before, Rust knows the definition of fold, and inlines it into triangle. Next, the closure |sum, item| sum + item is inlined into that. Finally, Rust examines the combined code and recognizes that there’s a simpler way to sum the numbers from one to n: the sum is always equal to n * (n+1) / 2. Rust translates the entire body of triangle, loop, closure, and all, into a single multiplication instruction and a few other bits of arithmetic.

This example happens to involve simple arithmetic, but iterators also perform well when put to heavier use. They’re another example of Rust providing flexible abstractions that impose little or no overhead in typical use.

The rest of this chapter falls into five parts:

  • First we’ll explain the Iterator and IntoIterator traits, which are the foundation of Rust’s iterators.

  • Then we’ll go over the three stages of a typical iterator pipeline: creating an iterator from some sort of value source; adapting one sort of iterator into another by selecting or processing values as they go by; and then consuming the values the iterator produces.

  • Finally, we’ll show how to implement iterators for your own types.

There are a lot of methods, so it’s fine to skim a section once you’ve got the general idea. But iterators are very common in idiomatic Rust, and being familiar with the tools that come with them is essential to mastering the language.

The Iterator and IntoIterator Traits

An iterator is any value that implements the std::iter::Iterator trait:

trait Iterator {
    type Item;
    fn next(&mut self) -> Option<Self::Item>;
    ... // many default methods
}

Item is the type of value the iterator produces. The next method either returns Some(v), where v is the iterator’s next value, or returns None to indicate the end of the sequence. Here we’ve omitted Iterator’s many default methods; we’ll cover them individually throughout the rest of this chapter.

If there’s a natural way to iterate over some type, it can implement std::iter::IntoIterator, whose into_iter method takes a value and returns an iterator over it:

trait IntoIterator where Self::IntoIter::Item == Self::Item {
    type Item;
    type IntoIter: Iterator;
    fn into_iter(self) -> Self::IntoIter;
}

IntoIter is the type of the iterator value itself, and Item is the type of value it produces. We call any type that implements IntoIterator an iterable, because it’s something you could iterate over if you asked.

Rust’s for loop brings all these parts together nicely. To iterate over a vector’s elements, you can write:

println!("There's:");
let v = vec!["antimony", "arsenic", "aluminum", "selenium"];

for element in &v {
    println!("{}", element);
}

Under the hood, every for loop is just shorthand for calls to IntoIterator and Iterator methods:

let mut iterator = (&v).into_iter();
while let Some(element) = iterator.next() {
    println!("{}", element);
}

The for loop uses IntoIterator::into_iter to convert its operand &v into an iterator, and then calls Iterator::next repeatedly. Each time that returns Some(element), the for loop executes its body; and if it returns None, the loop finishes.

Although a for loop always calls into_iter on its operand, you can also pass iterators to for loops directly; this occurs when you loop over a Range, for example. All iterators automatically implement IntoIterator, with an into_iter method that simply returns the iterator.

If you call an iterator’s next method again after it has returned None, the Iterator trait doesn’t specify what it should do. Most iterators will just return None again, but not all. (If this causes problems, the fuse adaptor covered in “fuse” can help.)

Here’s some terminology for iterators:

  • As we’ve said, an iterator is any type that implements Iterator.

  • An iterable is any type that implements IntoIterator: you can get an iterator over it by calling its into_iter method. The vector reference &v is the iterable in this case.

  • An iterator produces values.

  • The values an iterator produces are items. Here, the items are "antimony", "arsenic", and so on.

  • The code that receives the items an iterator produces is the consumer. In this example, the for loop consumes the iterator’s items.

Creating Iterators

The Rust standard library documentation explains in detail what sort of iterators each type provides, but the library follows some general conventions to help you get oriented and find what you need.

iter and iter_mut Methods

Most collection types provide iter and iter_mut methods that return the natural iterators over the type, producing a shared or mutable reference to each item. Slices like &[T] and &str have iter and iter_mut methods too. These methods are the most common way to get an iterator, if you’re not going to let a for loop take care of it for you:

let v = vec![4, 20, 12, 8, 6];
let mut iterator = v.iter();
assert_eq!(iterator.next(), Some(&4));
assert_eq!(iterator.next(), Some(&20));
assert_eq!(iterator.next(), Some(&12));
assert_eq!(iterator.next(), Some(&8));
assert_eq!(iterator.next(), Some(&6));
assert_eq!(iterator.next(), None);

This iterator’s item type is &i32: each call to next produces a reference to the next element, until we reach the end of the vector.

Each type is free to implement iter and iter_mut in whatever way makes the most sense for its purpose. The iter method on std::path::Path returns an iterator that produces one path component at a time:

use std::ffi::OsStr;
use std::path::Path;

let path = Path::new("C:/Users/JimB/Downloads/Fedora.iso");
let mut iterator = path.iter();
assert_eq!(iterator.next(), Some(OsStr::new("C:")));
assert_eq!(iterator.next(), Some(OsStr::new("Users")));
assert_eq!(iterator.next(), Some(OsStr::new("JimB")));
...

This iterator’s item type is &std::ffi::OsStr, a borrowed slice of a string of the sort accepted by operating system calls.

IntoIterator Implementations

When a type implements IntoIterator, you can call its into_iter method yourself, just as a for loop would:

// You should usually use HashSet, but its iteration order is
// nondeterministic, so BTreeSet works better in examples.
use std::collections::BTreeSet;
let mut favorites = BTreeSet::new();
favorites.insert("Lucy in the Sky With Diamonds".to_string());
favorites.insert("Liebesträume No. 3".to_string());

let mut it = favorites.into_iter();
assert_eq!(it.next(), Some("Liebesträume No. 3".to_string()));
assert_eq!(it.next(), Some("Lucy in the Sky With Diamonds".to_string()));
assert_eq!(it.next(), None);

Most collections actually provide several implementations of IntoIterator, for shared references, mutable references, and moves:

  • Given a shared reference to the collection, into_iter returns an iterator that produces shared references to its items. For example, in the preceding code, (&favorites).into_iter() would return an iterator whose Item type is &String.

  • Given a mutable reference to the collection, into_iter returns an iterator that produces mutable references to the items. For example, if vector is some Vec<String>, the call (&mut vector).into_iter() returns an iterator whose Item type is &mut String.

  • When passed the collection by value, into_iter returns an iterator that takes ownership of the collection and returns items by value; the items’ ownership moves from the collection to the consumer, and the original collection is consumed in the process. For example, the call favorites.into_iter() in the preceding code returns an iterator that produces each string by value; the consumer receives ownership of each string. When the iterator is dropped, any elements remaining in the BTreeSet are dropped too, and the set’s now-empty husk is disposed of.

Since a for loop applies IntoIterator::into_iter to its operand, these three implementations are what create the following idioms for iterating over shared or mutable references to a collection, or consuming the collection and taking ownership of its elements:

for element in &collection { ... }
for element in &mut collection { ... }
for element in collection { ... }

Each of these simply results in a call to one of the IntoIterator implementations listed here.

Not every type provides all three implementations. For example, HashSet, BTreeSet and BinaryHeap don’t implement IntoIterator on mutable references, since modifying their elements would probably violate the type’s invariants: the modified value might have a different hash value, or be ordered differently with respect to its neighbors, so modifying it would leave it incorrectly placed. Other types do support mutation, but only partially. For example, HashMap and BTreeMap produce mutable reference to their entries’ values, but only shared references to their keys, for similar reasons to those given earlier.

The general principle is that iteration should be efficient and predictable, so rather than providing implementations that are expensive or could exhibit surprising behavior (for example, rehashing modified HashSet entries, and potentially revisiting them later in the iteration), Rust omits them entirely.

Slices implement two of the three IntoIterator variants; since they don’t own their elements, there is no “by value” case. Instead, into_iter for &[T] and &mut [T] returns an iterator that produces shared and mutable references to the elements. If you imagine the underlying slice type [T] as a collection of some sort, this fits neatly into the overall pattern.

You may have noticed that the first two IntoIterator variants, for shared and mutable references, are equivalent to calling iter or iter_mut on the referent. Why does Rust provide both?

IntoIterator is what makes for loops work, so that’s obviously necessary. But when you’re not using a for loop, favorites.iter() is clearer than (&favorites).into_iter(). Iteration by shared reference is something you’ll need frequently, so iter and iter_mut are still valuable for their ergonomics.

IntoIterator can also be useful in generic code: you can use a bound like T: IntoIterator to restrict the type variable T to types that can be iterated over. Or, you can write T: IntoIterator<Item=U> to further require the iteration to produce a particular type U. For example, this function dumps values from any iterable whose items are printable with the "{:?}" format:

use std::fmt::Debug;

fn dump<T, U>(t: T)
    where T: IntoIterator<Item=U>,
          U: Debug
{
    for u in t {
        println!("{:?}", u);
    }
}

You can’t write this generic function using iter and iter_mut, since they’re not methods of any trait: most iterable types just happen to have methods by those names.

drain Methods

Many collection types provide a drain method that takes a mutable reference to the collection and returns an iterator that passes ownership of each element to the consumer. However, unlike the into_iter() method, which takes the collection by value and consumes it, drain merely borrows a mutable reference to the collection, and when the iterator is dropped, it removes any remaining elements from the collection, and leaves it empty.

On types that can be indexed by a range, like Strings, vectors, and VecDeques, the drain method takes a range of elements to remove, rather than draining the entire sequence:

use std::iter::FromIterator;

let mut outer = "Earth".to_string();
let inner = String::from_iter(outer.drain(1..4));

assert_eq!(outer, "Eh");
assert_eq!(inner, "art");

If you do need to drain the entire sequence, use the full range, .., as the argument.

Other Iterator Sources

The previous sections are mostly concerned with collection types like vectors and HashMap, but there are many other types in the standard library that support iteration. Table 15-1 summarizes the more interesting ones, but there are many more. We cover some of these methods in more detail in the chapters dedicated to the specific types (namely, Chapters 16, 17, and 18).

Table 15-1. Other iterators in the standard library
Type or trait Expression Notes
std::ops::Range 1..10 Endpoints must be an integer type to be iterable. Range includes start value, and excludes end value.
std::ops::RangeFrom 1.. Unbounded iteration. Start must be an integer. May panic or overflow if the value reaches the limit of the type.
Option<T> Some(10).iter() Behaves like a vector whose length is either 0 (None) or 1 (Some(v)).
Result<T, E> Ok("blah").iter() Similar to Option, producing Ok values.
Vec<T>, &[T] v.windows(16) Produces every contiguous slice of the given length, from left to right. The windows overlap.
v.chunks(16) Produces nonoverlapping, contiguous slices of the given length, from left to right.
v.chunks_mut(1024) Like chunks, but slices are mutable.
v.split(|byte| byte & 1 != 0) Produces slices separated by elements that match the given predicate.
v.split_mut(...) As above, but produces mutable slices.
v.rsplit(...) Like split, but produces slices from right to left.
v.splitn(n, ...) Like split, but produces at most n slices.
String, &str s.bytes() Produces the bytes of the UTF-8 form.
s.chars() Produces the chars the UTF-8 represents.
s.split_whitespace() Splits string by whitespace, and produces slices of nonspace characters.
s.lines() Produces slices of the lines of the string.
s.split('/') Splits string on a given pattern, producing the slices between matches. Patterns can be many things: characters, strings, closures.
s.matches(char::is_numeric) Produces slices matching the given pattern.
std::collections::HashMap,
std::collections::BTreeMap
map.keys(),
map.values()
Produces shared references to keys or values of the map.
map.values_mut() Produces mutable references to entries’ values.
std::collections::HashSet,
std::collections::BTreeSet
set1.union(set2) Produces shared references to elements of union of set1 and set2.
set1.intersection(set2) Produces shared references to elements of intersection of set1 and set2.
std::sync::mpsc::Receiver recv.iter() Produces values sent from another thread on the corresponding Sender.
std::io::Read stream.bytes() Produces bytes from an I/O stream.
stream.chars() Parses stream as UTF-8 and produces chars.
std::io::BufRead bufstream.lines() Parses stream as UTF-8, produces lines as Strings.
bufstream.split(0) Splits stream on given byte, produces inter-byte Vec<u8> buffers.
std::fs::ReadDir std::fs::read_dir(path) Produces directory entries.
std::net::TcpListener listener.incoming() Produces incoming network connections.
Free functions std::iter::empty() Returns None immediately.
std::iter::once(5) Produces the given value, and then ends.
std::iter::repeat("#9") Produces the given value forever.

Iterator Adapters

Once you have an iterator in hand, the Iterator trait provides a broad selection of adapter methods, or simply adapters, that consume one iterator and build a new one with useful behaviors. To see how adapters work, we’ll show how to use two of the most popular ones.

map and filter

The Iterator trait’s map adapter lets you transform an iterator by applying a closure to its items. The filter adapter lets you filter out items from an iterator, using a closure to decide which to keep and which to drop.

For example, suppose you’re iterating over lines of text, and want to omit leading and trailing whitespace from each line. The standard library’s str::trim method drops leading and trailing whitespace from a single &str, returning a new, trimmed &str that borrows from the original. You can use the map adapter to apply str::trim to each line from the iterator:

let text = "  ponies  
   giraffes
iguanas  
squid".to_string();
let v: Vec<&str> = text.lines()
    .map(str::trim)
    .collect();
assert_eq!(v, ["ponies", "giraffes", "iguanas", "squid"]);

The text.lines() call returns an iterator that produces the string’s lines. Calling map on that iterator returns a second iterator that applies str::trim to each line, and produces the results as its items. Finally, collect gathers those items into a vector.

The iterator that map returns is, of course, itself a candidate for further adaptation. If you want to exclude iguanas from the result, you can write the following:

let text = "  ponies  
   giraffes
iguanas  
squid".to_string();
let v: Vec<&str> = text.lines()
    .map(str::trim)
    .filter(|s| *s != "iguanas")
    .collect();
assert_eq!(v, ["ponies", "giraffes", "squid"]);

Here, filter returns a third iterator that produces only those items from the map iterator for which the closure |s| *s != "iguanas" returns true. A chain of iterator adapters is like a pipeline in the Unix shell: each adapter has a single purpose, and it’s clear how the sequence is being transformed as one reads from left to right.

These adapters’ signatures are as follows:

fn map<B, F>(self, f: F) -> some Iterator<Item=B>
    where Self: Sized, F: FnMut(Self::Item) -> B;

fn filter<P>(self, predicate: P) -> some Iterator<Item=Self::Item>
    where Self: Sized, P: FnMut(&Self::Item) -> bool;

The some Iterator<...> notation we’re using for the return types is not valid Rust.1 The real return types are opaque struct types, which aren’t informative; what matters in practice is that these methods return iterators with the given Item type.

Since most adapters take self by value, they require Self to be Sized (which all common iterators are).

A map iterator passes each item to its closure by value, and in turn, passes along ownership of the closure’s result to its consumer. A filter iterator passes each item to its closure by shared reference, retaining ownership in case the item is selected to be passed on to its consumer. This is why the example must dereference s to compare it with "iguanas": the filter iterator’s item type is &str, so the type of the closure’s argument s is &&str.

There are two important points to notice about iterator adapters.

First, simply calling an adapter on an iterator doesn’t consume any items; it just returns a new iterator, ready to produce its own items by drawing from the first iterator as needed. In a chain of adapters, the only way to make any work actually get done is to call next on the final iterator.

So in our earlier example, the method call text.lines() itself doesn’t actually parse any lines from the string; it just returns an iterator that would parse lines if asked. Similarly, map and filter just return new iterators that would map or filter if asked. No work takes place until collect starts calling next on the filter iterator.

This point is especially important if you use adapters that have side effects. For example, this code prints nothing at all:

["earth", "water", "air", "fire"]
    .iter().map(|elt| println!("{}", elt));

The iter call returns an iterator over the array’s elements, and the map call returns a second iterator that applies the closure to each value the first produces. But there is nothing here that ever actually demands a value from the whole chain, so no next method ever runs. In fact, Rust will warn you about this:

warning: unused result which must be used:
iterator adaptors are lazy and do nothing unless consumed
    |
387 | /         ["earth", "water", "air", "fire"]
388 | |             .iter().map(|elt| println!("{}", elt));
    | |___________________________________________________^
    |
= note: #[warn(unused_must_use)] on by default

The term “lazy” in the error message is not a disparaging term; it’s just jargon for any mechanism that puts off a computation until its value is needed. It is Rust’s convention that iterators should do the minimum work necessary to satisfy each call to next; in the example, there are no such calls at all, so no work takes place.

The second important point is that iterator adapters are a zero-overhead abstraction. Since map, filter, and their companions are generic, applying them to an iterator specializes their code for the specific iterator type involved. This means that Rust has enough information to inline each iterator’s next method into its consumer, and then translate the entire arrangement into machine code as a unit. So the lines/map/filter chain of iterators we showed before is as efficient as the code you would probably write by hand:

for line in text.lines() {
    let line = line.trim();
    if line != "iguanas" {
        v.push(line);
    }
}

The rest of this section covers the various adapters available on the Iterator trait.

filter_map and flat_map

The map adapter is fine in situations where each incoming item produces one outgoing item. But what if you want to delete certain items from the iteration instead of processing them, or replace single items with zero or more items? The filter_map and flat_map adapters grant you this flexibility.

The filter_map adapter is similar to map except that it lets its closure either transform the item into a new item (as map does) or drop the item from the iteration. Thus, it’s a bit like a combination of filter and map. Its signature is as follows:

fn filter_map<B, F>(self, f: F) -> some Iterator<Item=B>
    where Self: Sized, F: FnMut(Self::Item) -> Option<B>;

This is the same as map’s signature, except that here the closure returns Option<B>, not simply B. When the closure returns None, the item is dropped from the iteration; when it returns Some(b), then b is the next item the filter_map iterator produces.

For example, suppose you want to scan a string for whitespace-separated words that can be parsed as numbers, and process the numbers, dropping the other words. You can write:

use std::str::FromStr;

let text = "1
frond .25  289
3.1415 estuary
";
for number in text.split_whitespace()
                  .filter_map(|w| f64::from_str(w).ok()) {
    println!("{:4.2}", number.sqrt());
}

This prints the following:

1.00
0.50
17.00
1.77

The closure given to filter_map tries to parse each whitespace-separated slice using f64::from_str. That returns a Result<f64, ParseFloatError>, which .ok() turns into an Option<f64>: a parse error becomes None, whereas a successful parse result becomes Some(v). The filter_map iterator drops all the None values, and produces the value v for each Some(v).

But what’s the point in fusing map and filter into a single operation like this, instead of just using those adapters directly? The filter_map adapter shows its value in situations like the one just shown, when the best way to decide whether to include the item in the iteration is to actually try to process it. You can do the same thing with only filter and map, but it’s a bit ungainly:

text.split_whitespace()
    .map(|w| f64::from_str(w))
    .filter(|r| r.is_ok())
    .map(|r| r.unwrap())

You can think of the flat_map adapter as continuing in the same vein as map and filter_map, except that now the closure can return not just one item (as with map) or zero or one items (as with filter_map), but a sequence of any number of items. The flat_map iterator produces the concatenation of the sequences the closure returns.

The signature of flat_map is shown here:

fn flat_map<U, F>(self, f: F) -> some Iterator<Item=U::Item>
    where F: FnMut(Self::Item) -> U, U: IntoIterator;

The closure passed to flat_map must return an iterable, but any sort of iterable will do.2

For example, suppose we have a table mapping countries to their major cities. Given a list of countries, how can we iterate over their major cities?

use std::collections::HashMap;

let mut major_cities = HashMap::new();
major_cities.insert("Japan", vec!["Tokyo", "Kyoto"]);
major_cities.insert("The United States", vec!["Portland", "Nashville"]);
major_cities.insert("Brazil", vec!["São Paulo", "Brasília"]);
major_cities.insert("Kenya", vec!["Nairobi", "Mombasa"]);
major_cities.insert("The Netherlands", vec!["Amsterdam", "Utrecht"]);

let countries = ["Japan", "Brazil", "Kenya"];

for &city in countries.iter().flat_map(|country| &major_cities[country]) {
    println!("{}", city);
}

This prints the following:

Tokyo
Kyoto
São Paulo
Brasília
Nairobi
Mombasa

One way to look at this would be to say that, for each country, we retrieve the vector of its cities, concatenate all the vectors together into a single sequence, and print that.

But remember that iterators are lazy: it’s only the for loop’s calls to the flat_map iterator’s next method that cause work to be done. The full concatenated sequence is never constructed in memory. Instead, what we have here is a little state machine that draws from the city iterator, one item at a time, until it’s exhausted, and only then produces a new city iterator for the next country. The effect is that of a nested loop, but packaged up for use as an iterator.

scan

The scan adapter resembles map, except that the closure is given a mutable value it can consult, and has the option of terminating the iteration early. It takes an initial state value, and then a closure that accepts a mutable reference to the state, and the next item from the underlying iterator. The closure must return an Option, which the scan iterator takes as its next item.

For example, here’s an iterator chain that squares another iterator’s items, and terminates the iteration once their sum exceeds 10:

let iter = (0..10)
    .scan(0, |sum, item| {
        *sum += item;
        if *sum > 10 {
            None
        } else {
            Some(item * item)
        }
    });

assert_eq!(iter.collect::<Vec<i32>>(), vec![0, 1, 4, 9, 16]);

The closure’s sum argument is a mutable reference to a value private to the iterator and initialized to scan’s first argument—in this case, 0. The closure updates *sum, checks whether it has exceeded the limit, and returns the iterator’s next result.

take and take_while

The Iterator trait’s take and take_while adapters let you end an iteration after a certain number of items, or when a closure decides to cut things off. Their signatures are as follows:

fn take(self, n: usize) -> some Iterator<Item=Self::Item>
    where Self: Sized;

fn take_while<P>(self, predicate: P) -> some Iterator<Item=Self::Item>
    where Self: Sized, P: FnMut(&Self::Item) -> bool;

Both take ownership of an iterator and return a new iterator that passes along items from the first one, possibly ending the sequence earlier. The take iterator returns None after producing at most n items. The take_while iterator applies predicate to each item, and returns None in place of the first item for which predicate returns false, and on every subsequent call to next.

For example, given an email message with a blank line separating the headers from the message body, you can use take_while to iterate over only the headers:

let message = "To: jimb

               From: superego <[email protected]>

               

               Did you get any writing done today?

               When will you stop wasting time plotting fractals?
";
for header in message.lines().take_while(|l| !l.is_empty()) {
    println!("{}" , header);
}

Recall from “String Literals” that when a line in a string ends with a backslash, Rust doesn’t include the indentation of the next line in the string, so none of the lines in the string have any leading whitespace. This means that the third line of message is blank. The take_while adapter terminates the iteration as soon as it sees that blank line, so this code prints only the two lines:

To: jimb
From: superego <editor@oreilly.com>

skip and skip_while

The Iterator trait’s skip and skip_while methods are the complement of take and take_while: they drop a certain number of items from the beginning of an iteration, or drop items until a closure finds one acceptable, and then pass the remaining items through unchanged. Their signatures are as follows:

fn skip(self, n: usize) -> some Iterator<Item=Self::Item>
    where Self: Sized;

fn skip_while<P>(self, predicate: P) -> some Iterator<Item=Self::Item>
    where Self: Sized, P: FnMut(&Self::Item) -> bool;

One common use for the skip adapter is to skip the command name when iterating over a program’s command-line arguments. In Chapter 2, our greatest common denominator calculator used the following code to loop over its command-line arguments:

for arg in std::env::args().skip(1) {
    ...
}

The std::env::args function returns an iterator that produces the program’s arguments as Strings, the first item being the name of the program itself. That’s not a string we want to process in this loop. Calling skip(1) on that iterator returns a new iterator that drops the program name the first time it’s called, and then produces all the subsequent arguments.

The skip_while adapter uses a closure to decide how many items to drop from the beginning of the sequence. You can iterate over the body lines of the message from the previous section like this:

for body in message.lines()
    .skip_while(|l| !l.is_empty())
    .skip(1) {
    println!("{}" , body);
}

This uses skip_while to skip nonblank lines, but that iterator does produce the blank line itself—after all, the closure returned false for that line. So we use the skip method as well to drop that, giving us an iterator whose first item will be the message body’s first line. Taken together with the declaration of message from the previous section, this code prints:

Did you get any writing done today?
When will you stop wasting time plotting fractals?

peekable

A peekable iterator lets you peek at the next item that will be produced without actually consuming it. You can turn almost any iterator into a peekable iterator by calling the Iterator trait’s peekable method:

fn peekable(self) -> std::iter::Peekable<Self>
    where Self: Sized;

Here, Peekable<Self> is a struct that implements Iterator<Item=Self::Item>, and Self is the type of the underlying iterator.

A Peekable iterator has an additional method peek that returns an Option<&Item>: None if the underlying iterator is done, and otherwise Some(r), where r is a shared reference to the next item. (Note that, if the iterator’s item type is already a reference to something, this ends up being a reference to a reference.)

Calling peek tries to draw the next item from the underlying iterator, and if there is one, caches it until the next call to next. All the other Iterator methods on Peekable know about this cache: for example, iter.last() on a peekable iterator iter knows to check the cache after exhausting the underlying iterator.

Peekable iterators are essential when you can’t decide how many items to consume from an iterator until you’ve gone too far. For example, if you’re parsing numbers from a stream of characters, you can’t decide where the number ends until you’ve seen the first non-number character following it:

use std::iter::Peekable;

fn parse_number<I>(tokens: &mut Peekable<I>) -> u32
    where I: Iterator<Item=char>
{
    let mut n = 0;
    loop {
        match tokens.peek() {
            Some(r) if r.is_digit(10) => {
                n = n * 10 + r.to_digit(10).unwrap();
            }
            _ => return n
        }
        tokens.next();
    }
}

let mut chars = "226153980,1766319049".chars().peekable();
assert_eq!(parse_number(&mut chars), 226153980);
// Look, `parse_number` didn't consume the comma! So we will.
assert_eq!(chars.next(), Some(','));
assert_eq!(parse_number(&mut chars), 1766319049);
assert_eq!(chars.next(), None);

The parse_number function uses peek to check the next character, and consumes it only if it is a digit. If it isn’t a digit or the iterator is exhausted (that is, if peek returns None), we return the number we’ve parsed and leave the next character in the iterator, ready to be consumed.

fuse

Once an Iterator has returned None, the trait doesn’t specify how it ought to behave if you call its next method again. Most iterators just return None again, but not all. If your code counts on that behavior, you may be in for a surprise.

The fuse adapter takes any iterator and turns into one that will definitely continue to return None once it has done so the first time:

struct Flaky(bool);

impl Iterator for Flaky {
    type Item = &'static str;
    fn next(&mut self) -> Option<Self::Item> {
        if self.0 {
            self.0 = false;
            Some("totally the last item")
        } else {
            self.0 = true; // D'oh!
            None
        }
    }
}

let mut flaky = Flaky(true);
assert_eq!(flaky.next(), Some("totally the last item"));
assert_eq!(flaky.next(), None);
assert_eq!(flaky.next(), Some("totally the last item"));

let mut not_flaky = Flaky(true).fuse();
assert_eq!(not_flaky.next(), Some("totally the last item"));
assert_eq!(not_flaky.next(), None);
assert_eq!(not_flaky.next(), None);

The fuse adapter is probably most useful in generic code that needs to work with iterators of uncertain origin. Rather than hoping that every iterator you’ll have to deal with will be well-behaved, you can use fuse to make sure.

Reversible Iterators and rev

Some iterators are able to draw items from both ends of the sequence. You can reverse such iterators by using the rev adapter. For example, an iterator over a vector could just as easily draw items from the end of the vector as from the start. Such iterators can implement the std::iter::DoubleEndedIterator trait, which extends Iterator:

trait DoubleEndedIterator: Iterator {
    fn next_back(&mut self) -> Option<Self::Item>;
}

You can think of a double-ended iterator as having two fingers marking the current front and back of the sequence. Drawing items from either end advances that finger toward the other; when the two meet, the iteration is done:

use std::iter::DoubleEndedIterator;

let bee_parts = ["head", "thorax", "abdomen"];

let mut iter = bee_parts.iter();
assert_eq!(iter.next(),      Some(&"head"));
assert_eq!(iter.next_back(), Some(&"abdomen"));
assert_eq!(iter.next(),      Some(&"thorax"));

assert_eq!(iter.next_back(), None);
assert_eq!(iter.next(),      None);

The structure of an iterator over a slice makes this behavior easy to implement: it is literally a pair of pointers to the start and end of the range of elements we haven’t yet produced; next and next_back simply draw an item from the one or the other. Iterators for ordered collections like BTreeSet and BTreeMap are double-ended too: their next_back method draws the greatest elements or entries first. In general, the standard library provides double-ended iteration whenever it’s practical.

But not all iterators can do this so easily: an iterator producing values from other threads arriving at a channel’s Receiver has no way to anticipate what the last value received might be. In general, you’ll need to check the standard library’s documentation to see which iterators implement DoubleEndedIterator and which don’t.

If an iterator is double-ended, you can reverse it with the rev adapter:

fn rev(self) -> some Iterator<Item=Self>
    where Self: Sized + DoubleEndedIterator;

The returned iterator is also double-ended: its next and next_back methods are simply exchanged:

let meals = ["breakfast", "lunch", "dinner"];

let mut iter = meals.iter().rev();
assert_eq!(iter.next(), Some(&"dinner"));
assert_eq!(iter.next(), Some(&"lunch"));
assert_eq!(iter.next(), Some(&"breakfast"));
assert_eq!(iter.next(), None);

Most iterator adapters, if applied to a reversible iterator, return another reversible iterator. For example, map and filter preserve reversibility.

inspect

The inspect adapter is handy for debugging pipelines of iterator adapters, but it isn’t used much in production code. It simply applies a closure to a shared reference to each item, and then passes the item through. The closure can’t affect the items, but it can do things like print them or make assertions about them.

This example shows a case in which converting a string to uppercase changes its length:

let upper_case: String = "große".chars()
    .inspect(|c| println!("before: {:?}", c))
    .flat_map(|c| c.to_uppercase())
    .inspect(|c| println!(" after:     {:?}", c))
    .collect();
assert_eq!(upper_case, "GROSSE");

The uppercase equivalent of the lowercase German letter “ß” is “SS”, which is why char::to_uppercase returns an iterator over characters, not a single replacement character. The preceding code uses flat_map to concatenate all the sequences that to_uppercase returns into a single String, printing the following as it does so:

before: 'g'
 after:     'G'
before: 'r'
 after:     'R'
before: 'o'
 after:     'O'
before: 'ß'
 after:     'S'
 after:     'S'
before: 'e'
 after:     'E'

chain

The chain adapter appends one iterator to another. More precisely, i1.chain(i2) returns an iterator that draws items from i1 until it’s exhausted, and then draws items from i2.

The chain adapter’s signature is as follows:

fn chain<U>(self, other: U) -> some Iterator<Item=Self::Item>
    where Self: Sized, U: IntoIterator<Item=Self::Item>;

In other words, you can chain an iterator together with any iterable that produces the same item type.

For example:

let v: Vec<i32> = (1..4).chain(vec![20, 30, 40]).collect();
assert_eq!(v, [1, 2, 3, 20, 30, 40]);

A chain iterator is reversible, if both of its underlying iterators are:

let v: Vec<i32> = (1..4).chain(vec![20, 30, 40]).rev().collect();
assert_eq!(v, [40, 30, 20, 3, 2, 1]);

A chain iterator keeps track of whether each of the two underlying iterators has returned None, and directs next and next_back calls to one or the other as appropriate.

enumerate

The Iterator trait’s enumerate adapter attaches a running index to the sequence, taking an iterator that produces items A, B, C, ... and returning an iterator that produces pairs (0, A), (1, B), (2, C), .... It looks trivial at first glance, but it’s used surprisingly often.

Consumers can use that index to distinguish one item from another, and establish the context in which to process each one. For example, the Mandelbrot set plotter in Chapter 2 splits the image into eight horizontal bands and assigns each one to a different thread. That code uses enumerate to tell each thread which portion of the image its band corresponds to.

Starting with a rectangular buffer of pixels:

let mut pixels = vec![0; columns * rows];

It uses chunks_mut to split the image into horizontal bands, one per thread:

let threads = 8;
let band_rows = rows / threads + 1;
...
let bands: Vec<&mut [u8]> = pixels.chunks_mut(band_rows * columns).collect();

And then it iterates over the bands, starting a thread for each one:

for (i, band) in bands.into_iter().enumerate() {
    let top = band_rows * i;
    // start a thread to render rows `top..top + band_rows`
}

Each iteration gets a pair (i, band), where band is the &mut [u8] slice of the pixel buffer the thread should draw into, and i is the index of that band in the overall image, courtesy of the enumerate adapter. Given the boundaries of the plot and the size of the bands, this is enough information for the thread to determine which portion of the image it has been assigned, and thus what to draw into band.

zip

The zip adapter combines two iterators into a single iterator that produces pairs holding one value from each iterator, like a zipper joining its two sides into a single seam. The zipped iterator ends when either of the two underlying iterators ends.

For example, you can get the same effect as the enumerate adapter by zipping the half-open range 0.. with the other iterator:

let v: Vec<_> = (0..).zip("ABCD".chars()).collect();
assert_eq!(v, vec![(0, 'A'), (1, 'B'), (2, 'C'), (3, 'D')]);

In this sense, you can think of zip as a generalization of enumerate: whereas enumerate attaches indices to the sequence, zip attaches any arbitrary iterator’s items. We suggested before that enumerate can help provide context for processing items; zip is a more flexible way to do the same.

The argument to zip doesn’t need to be an iterator itself; it can be any iterable:

use std::iter::repeat;

let endings = vec!["once", "twice", "chicken soup with rice"];
let rhyme: Vec<_> = repeat("going")
    .zip(endings)
    .collect();
assert_eq!(rhyme, vec![("going", "once"),
                       ("going", "twice"),
                       ("going", "chicken soup with rice")]);

by_ref

Throughout this section, we’ve been attaching adapters to iterators. Once you’ve done so, can you ever take the adapter off again? Usually, no: adapters take ownership of the underlying iterator, and provide no method to give it back.

An iterator’s by_ref method borrows a mutable reference to the iterator, so that you can apply adaptors to the reference. When you’re done consuming items from these adaptors, you drop them, the borrow ends, and you regain access to your original iterator.

For example, earlier in the chapter we showed how to use take_while and skip_while to process the header lines and body of a mail message. But what if you want to do both, using the same underlying iterator? Using by_ref, we can use take_while to handle the headers, and when that’s done, get the underlying iterator back, which take_while has left exactly in position to handle the message body:

let message = "To: jimb

               From: id

               

               Oooooh, donuts!!
";

let mut lines = message.lines();

println!("Headers:");
for header in lines.by_ref().take_while(|l| !l.is_empty()) {
    println!("{}" , header);
}

println!("
Body:");
for body in lines {
    println!("{}" , body);
}

The call lines.by_ref() borrows a mutable reference to the iterator, and it is this reference that the take_while iterator takes ownership of. That iterator goes out of scope at the end of the first for loop, meaning that the borrow has ended, so you can use lines again in the second for loop. This prints the following:

Headers:
To: jimb
From: id

Body:
Oooooh, donuts!!

The by_ref adapter’s definition is trivial: it returns a mutable reference to the iterator. Then, the standard library includes this strange little implementation:

impl<'a, I: Iterator + ?Sized> Iterator for &'a mut I {
    type Item = I::Item;
    fn next(&mut self) -> Option<I::Item> {
        (**self).next()
    }
    fn size_hint(&self) -> (usize, Option<usize>) {
        (**self).size_hint()
    }
}

In other words, if I is some iterator type, then &mut I is an iterator too, whose next and size_hint methods defer to its referent. When you call an adapter on a mutable reference to an iterator, the adapter takes ownership of the reference, not the iterator itself. That’s just a borrow that ends when the adapter goes out of scope.

cloned

The cloned adapter takes an iterator that produces references, and returns an iterator that produces values cloned from those references. Naturally, the referent type must implement Clone. For example:

let a = ['1', '2', '3', '∞'];

assert_eq!(a.iter().next(),          Some(&'1'));
assert_eq!(a.iter().cloned().next(), Some('1'));

cycle

The cycle adapter returns an iterator that endlessly repeats the sequence produced by the underlying iterator. The underlying iterator must implement std::clone::Clone, so that cycle can save its initial state and reuse it each time the cycle starts again.

For example:

let dirs = ["North", "East", "South", "West"];
let mut spin = dirs.iter().cycle();
assert_eq!(spin.next(), Some(&"North"));
assert_eq!(spin.next(), Some(&"East"));
assert_eq!(spin.next(), Some(&"South"));
assert_eq!(spin.next(), Some(&"West"));
assert_eq!(spin.next(), Some(&"North"));
assert_eq!(spin.next(), Some(&"East"));

Or, for a really gratuitous use of iterators:

use std::iter::{once, repeat};

let fizzes = repeat("").take(2).chain(once("fizz")).cycle();
let buzzes = repeat("").take(4).chain(once("buzz")).cycle();
let fizzes_buzzes = fizzes.zip(buzzes);

let fizz_buzz = (1..100).zip(fizzes_buzzes)
    .map(|tuple|
         match tuple {
             (i, ("", "")) => i.to_string(),
             (_, (fizz, buzz)) => format!("{}{}", fizz, buzz)
         });

for line in fizz_buzz {
    println!("{}", line);
}

This plays a children’s word game, now sometimes used as a job interview question for coders, in which the players take turns counting, replacing any number divisible by three with the word “fizz”, and any number divisible by five with “buzz”. Numbers divisible by both become “fizzbuzz”.

Consuming Iterators

So far we’ve covered creating iterators, and adapting them into new iterators; here we finish off the process by showing ways to consume them.

Of course, you can consume an iterator with a for loop, or call next explicitly, but there are many common tasks that you shouldn’t have to write out again and again. The Iterator trait provides a broad selection of methods to cover many of these.

Simple Accumulation: count, sum, product

The count method draws items from an iterator until it returns None, and tells you how many it got. Here’s a short program that counts the number of lines on its standard input:

use std::io::prelude::*;

fn main() {
    let stdin = std::io::stdin();
    println!("{}", stdin.lock().lines().count());
}

The sum and product methods compute the sum or product of the iterator’s items, which must be integers or floating-point numbers:

fn triangle(n: u64) -> u64 {
    (1..n+1).sum()
}
assert_eq!(triangle(20), 210);

fn factorial(n: u64) -> u64 {
    (1..n+1).product()
}
assert_eq!(factorial(20), 2432902008176640000);

(You can extend sum and product to work with other types by implementing the std::iter::Sum and std::iter::Product traits, which we won’t describe in this book.)

max, min

The min and max methods on Iterator return the least or greatest item the iterator produces. The iterator’s item type must implement std::cmp::Ord, so that items can be compared with one another. For example:

assert_eq!([-2, 0, 1, 0, -2, -5].iter().max(), Some(&1));
assert_eq!([-2, 0, 1, 0, -2, -5].iter().min(), Some(&-5));

These methods return an Option<Self::Item>, so that they can return None if the iterator produces no items.

As explained in “Equality Tests”, Rust’s floating-point types f32 and f64 implement only std::cmp::PartialOrd, not std::cmp::Ord, so you can’t use the min and max methods to compute the least or greatest of a sequence of floating-point numbers. This is not a popular aspect of Rust’s design, but it is deliberate: it’s not clear what such functions should do with IEEE NaN values. Simply ignoring them would risk masking more serious problems in the code.

If you know how you would like to handle NaN values, you can use the max_by and min_by iterator methods instead, which let you supply your own comparison function.

max_by, min_by

The max_by and min_by methods return the maximum or minimum item the iterator produces, as determined by a comparison function you provide:

use std::cmp::{PartialOrd, Ordering};

// Compare two f64 values. Panic if given a NaN.
fn cmp(lhs: &&f64, rhs: &&f64) -> Ordering {
    lhs.partial_cmp(rhs).unwrap()
}

let numbers = [1.0, 4.0, 2.0];
assert_eq!(numbers.iter().max_by(cmp), Some(&4.0));
assert_eq!(numbers.iter().min_by(cmp), Some(&1.0));

let numbers = [1.0, 4.0, std::f64::NAN, 2.0];
assert_eq!(numbers.iter().max_by(cmp), Some(&4.0)); // panics

(The double references in cmp’s parameters arise because numbers.iter() produces references to the elements, and then max_by and min_by pass the closure references to the iterator’s items.)

max_by_key, min_by_key

The max_by_key and min_by_key methods on Iterator let you select the maximum or minimum item as determined by a closure applied to each item. The closure can select some field of the item, or perform a computation on the items. Since you’re often interested in data associated with some minimum or maximum, not just the extremum itself, these functions are often more useful than min and max. Their signatures are as follows:

fn min_by_key<B: Ord, F>(self, f: F) -> Option<Self::Item>
    where Self: Sized, F: FnMut(&Self::Item) -> B;

fn max_by_key<B: Ord, F>(self, f: F) -> Option<Self::Item>
    where Self: Sized, F: FnMut(&Self::Item) -> B;

That is, given a closure that takes an item and returns any ordered type B, return the item for which the closure returned the maximum or minimum B, or None if no items were produced.

For example, if you need to scan a hash table of cities to find the cities with the largest and smallest populations, you could write:

use std::collections::HashMap;

let mut populations = HashMap::new();
populations.insert("Portland",  583_776);
populations.insert("Fossil",        449);
populations.insert("Greenhorn",       2);
populations.insert("Boring",      7_762);
populations.insert("The Dalles", 15_340);

assert_eq!(populations.iter().max_by_key(|&(_name, pop)| pop),
           Some((&"Portland", &583_776)));
assert_eq!(populations.iter().min_by_key(|&(_name, pop)| pop),
           Some((&"Greenhorn", &2)));

The closure |&(_name, pop)| pop gets applied to each item the iterator produces, and returns the value to use for comparison—in this case, the city’s population. The value returned is the entire item, not just the value the closure returns. (Naturally, if you were making queries like this often, you’d probably want to arrange for a more efficient way to find the entries than making a linear search through the table.)

Comparing Item Sequences

You can use the < and == operators to compare strings, vectors, and slices, assuming their individual elements can be compared. Although iterators do not support Rust’s comparison operators, they do provide methods like eq and lt that do the same job, drawing pairs of items from the iterators and comparing them until a decision can be reached. For example:

let packed =  "Helen of Troy";
let spaced =  "Helen   of    Troy";
let obscure = "Helen of Sandusky"; // nice person, just not famous

assert!(packed != spaced);
assert!(packed.split_whitespace().eq(spaced.split_whitespace()));

// This is true because ' ' < 'o'.
assert!(spaced < obscure);

// This is true because 'Troy' > 'Sandusky'.
assert!(spaced.split_whitespace().gt(obscure.split_whitespace()));

The calls to split_whitespace return iterators over the whitespace-separated words of the string. Using the eq and gt methods on these iterators performs a word-by-word comparison, instead of a character-by-character comparison. These are all possible because &str implements PartialOrd and PartialEq.

Iterators provide the eq and ne methods for equality comparisons, and lt, le, gt, and ge methods for ordered comparisons. The cmp and partial_cmp methods behave like the corresponding methods of the Ord and PartialOrd traits.

any and all

The any and all methods apply a closure to each item the iterator produces, and return true if the closure returns true for any item, or for all the items:

let id = "Iterator";

assert!( id.chars().any(char::is_uppercase));
assert!(!id.chars().all(char::is_uppercase));

These methods consume only as many items as they need to determine the answer. For example, if the closure ever returns true for a given item, then any returns true immediately, without drawing any more items from the iterator.

position, rposition, and ExactSizeIterator

The position method applies a closure to each item from the iterator and returns the index of the first item for which the closure returns true. More precisely, it returns an Option of the index: if the closure returns true for no item, position returns None. It stops drawing items as soon as the closure returns true. For example:

let text = "Xerxes";
assert_eq!(text.chars().position(|c| c == 'e'), Some(1));
assert_eq!(text.chars().position(|c| c == 'z'), None);

The rposition method is the same, except that it searches from the right. For example:

let bytes = b"Xerxes";
assert_eq!(bytes.iter().rposition(|&c| c == b'e'), Some(4));
assert_eq!(bytes.iter().rposition(|&c| c == b'X'), Some(0));

The rposition method requires a reversible iterator, so that it can draw items from the right end of the sequence. It also requires an exact-size iterator, so that it can assign indices the same way position would, starting with 0 at the left. An exact-size iterator is one that implements the std::iter::ExactSizeIterator trait:

pub trait ExactSizeIterator: Iterator {
    fn len(&self) -> usize { ... }
    fn is_empty(&self) -> bool { ... }
}

The len method returns the number of items remaining, and the is_empty method returns true if iteration is complete.

Naturally, not every iterator knows how many items it will produce in advance; in the preceding examples, the chars iterator on &str does not (UTF-8 is a variable-width encoding), so you can’t use rposition on strings. But an iterator over an array of bytes certainly knows the array’s length, so it can implement ExactSizeIterator.

fold

The fold method is a very general tool for accumulating some sort of result over the entire sequence of items an iterator produces. Given an initial value, which we’ll call the accumulator, and a closure, fold repeatedly applies the closure to the current accumulator and the next item from the iterator. The value the closure returns is taken as the new accumulator, to be passed to the closure with the next item. The final accumulator value is what fold itself returns. If the sequence is empty, fold simply returns the initial accumulator.

Many of the other methods for consuming an iterator’s values can be written as uses of fold:

let a = [5, 6, 7, 8, 9, 10];

assert_eq!(a.iter().fold(0, |n, _| n+1), 6);        // count
assert_eq!(a.iter().fold(0, |n, i| n+i), 45);       // sum
assert_eq!(a.iter().fold(1, |n, i| n*i), 151200);   // product

// max
assert_eq!(a.iter().fold(i32::min_value(), |m, &i| std::cmp::max(m, i)),
           10);

The fold method’s signature is as follows:

fn fold<A, F>(self, init: A, f: F) -> A
    where Self: Sized, F: FnMut(A, Self::Item) -> A;

Here, A is the accumulator type. The init argument is an A, as is the closure’s first argument and return value, and the return value of fold itself.

Note that the accumulator values are moved into and out of the closure, so you can use fold with non-Copy accumulator types:

let a = ["Pack ", "my ", "box ", "with ",
         "five ", "dozen ", "liquor ", "jugs"];

let pangram = a.iter().fold(String::new(),
                            |mut s, &w| { s.push_str(w); s });
assert_eq!(pangram, "Pack my box with five dozen liquor jugs");

nth

The nth method takes an index n, skips that many items from the iterator, and returns the next item, or None if the sequence ends before that point. Calling .nth(0) is equivalent to .next().

It doesn’t take ownership of the iterator the way an adapter would, so you can call it many times.

let mut squares = (0..10).map(|i| i*i);

assert_eq!(squares.nth(4), Some(16));
assert_eq!(squares.nth(0), Some(25));
assert_eq!(squares.nth(6), None);

Its signature is shown here:

fn nth(&mut self, n: usize) -> Option<Self::Item>
    where Self: Sized;

last

The last method consumes items until the iterator returns None, and then returns the last item. If the iterator produces no items, then last returns None. Its signature is as follows:

fn last(self) -> Option<Self::Item>;

For example:

let squares = (0..10).map(|i| i*i);
assert_eq!(squares.last(), Some(81));

This consumes all the iterator’s items starting from the front, even if the iterator is reversible. If you have a reversible iterator and don’t need to consume all its items, you should instead just write iter.rev().next().

find

The find method draws items from an iterator, returning the first item for which the given closure returns true, or None if the sequence ends before a suitable item is found. Its signature is:

fn find<P>(&mut self, predicate: P) -> Option<Self::Item>
    where Self: Sized,
          P: FnMut(&Self::Item) -> bool;

For example, using the table of cities and populations from “max_by_key, min_by_key”, you could write:

assert_eq!(populations.iter().find(|&(_name, &pop)| pop > 1_000_000),
           None);
assert_eq!(populations.iter().find(|&(_name, &pop)| pop > 500_000),
           Some((&"Portland", &583_776)));

None of the cities in the table have a population above a million, but there is one city with half a million people.

Building Collections: collect and FromIterator

Throughout the book, we’ve been using the collect method to build vectors holding an iterator’s items. For example, in Chapter 2, we called std::env::args() to get an iterator over the program’s command-line arguments, and then called that iterator’s collect method to gather them into a vector:

let args: Vec<String> = std::env::args().collect();

But collect isn’t specific to vectors: in fact, it can build any kind of collection from Rust’s standard library, as long as the iterator produces a suitable item type:

use std::collections::{HashSet, BTreeSet, LinkedList, HashMap, BTreeMap};

let args: HashSet<String> = std::env::args().collect();
let args: BTreeSet<String> = std::env::args().collect();
let args: LinkedList<String> = std::env::args().collect();

// Collecting a map requires (key, value) pairs, so for this example,
// zip the sequence of strings with a sequence of integers.
let args: HashMap<String, usize> = std::env::args().zip(0..).collect();
let args: BTreeMap<String, usize> = std::env::args().zip(0..).collect();

// and so on

Naturally, collect itself doesn’t know how to construct all these types. Rather, when some collection type like Vec or HashMap knows how to construct itself from an iterator, it implements the std::iter::FromIterator trait, for which collect is just a convenient veneer:

trait FromIterator<A>: Sized {
    fn from_iter<T: IntoIterator<Item=A>>(iter: T) -> Self;
}

If a collection type implements FromIterator<A>, then its static method from_iter builds a value of that type from an iterable producing items of type A.

In the simplest case, the implementation could simply construct an empty collection, and then add the items from the iterator one by one. For example, std::collections::LinkedList’s implementation of FromIterator works this way.

However, some types can do better than that. For example, constructing a vector from some iterator iter could be as simple as:

let mut vec = Vec::new();
for item in iter {
    vec.push(item)
}
vec

But this isn’t ideal: as the vector grows, it may need to expand its buffer, requiring a call to the heap allocator and a copy of the extant elements. Vectors do take algorithmic measures to keep this overhead low, but if there were some way to simply allocate an initial buffer of the right size to begin with, there would be no need to resize at all.

This is where the Iterator trait’s size_hint method comes in:

trait Iterator {
    ...
    fn size_hint(&self) -> (usize, Option<usize>) {
        (0, None)
    }
}

This method returns a lower bound and optional upper bound on the number of items the iterator will produce. The default definition returns zero as the lower bound and declines to name an upper bound, saying, in effect, “I have no idea,” but many iterators can do better than this. An iterator over a Range, for example, knows exactly how many items it will produce, as does an iterator over a Vec or HashMap. Such iterators provide their own specialized definitions for size_hint.

These bounds are exactly the information that Vec’s implementation of FromIterator needs to size the new vector’s buffer correctly from the start. Insertions still check that the buffer is large enough, so even if the hint is incorrect, only performance is affected, not safety. Other types can take similar steps: for example, HashSet and HashMap also use Iterator::size_hint to choose an appropriate initial size for their hash table.

One note about type inference: at the top of this section, it’s a bit strange to see the same call, std::env::args().collect(), produce four different kinds of collections depending on its context. The return type of collect is its type parameter, so the first two calls are equivalent to the following:

let args = std::env::args().collect::<Vec<String>>();
let args = std::env::args().collect::<HashSet<String>>();

But as long as there’s only one type that could possibly work as collect’s argument, Rust’s type inference will supply it for you. When you spell out the type of args, you ensure this is the case.

The Extend Trait

If a type implements the std::iter::Extend trait, then its extend method adds an iterable’s items to the collection:

let mut v: Vec<i32> = (0..5).map(|i| 1 << i).collect();
v.extend(&[31, 57, 99, 163]);
assert_eq!(v, &[1, 2, 4, 8, 16, 31, 57, 99, 163]);

All of the standard collections implement Extend, so they all have this method; so does String. Arrays and slices, which have a fixed length, do not.

The trait’s definition is as follows:

trait Extend<A> {
    fn extend<T>(&mut self, iter: T)
        where T: IntoIterator<Item=A>;
}

Obviously, this is very similar to std::iter::FromIterator: that creates a new collection, whereas Extend extends an existing collection. In fact, several implementations of FromIterator in the standard library simply create a new empty collection, and then call extend to populate it. For example, the implementation of FromIterator for std::collections::LinkedList works this way:

impl<T> FromIterator<T> for LinkedList<T> {
    fn from_iter<I: IntoIterator<Item = T>>(iter: I) -> Self {
        let mut list = Self::new();
        list.extend(iter);
        list
    }
}

partition

The partition method divides an iterator’s items among two collections, using a closure to decide where each item belongs:

let things = ["doorknob", "mushroom", "noodle", "giraffe", "grapefruit"];

// Amazing fact: the name of a living thing always starts with an
// odd-numbered letter.
let (living, nonliving): (Vec<&str>, Vec<&str>)
    = things.iter().partition(|name| name.as_bytes()[0] & 1 != 0);

assert_eq!(living,    vec!["mushroom", "giraffe", "grapefruit"]);
assert_eq!(nonliving, vec!["doorknob", "noodle"]);

Like collect, partition can make any sort of collections you like, although both must be of the same type. And like collect, you’ll need to specify the return type: the preceding example writes out the type of living and nonliving, and lets type inference choose the right type parameters for the call to partition.

The signature of partition is as follows:

fn partition<B, F>(self, f: F) -> (B, B)
    where Self: Sized,
          B: Default + Extend<Self::Item>,
          F: FnMut(&Self::Item) -> bool;

Whereas collect requires its result type to implement FromIterator, partition instead requires std::default::Default, which all Rust collections implement by returning an empty collection, and std::default::Extend.

Why doesn’t partition just split the iterator into two iterators, instead of building two collections? One reason is that items drawn from the underlying iterator but not yet drawn from the appropriate partitioned iterator would need to be buffered somewhere; you would end up building a collection of some sort internally, anyway. But the more fundamental reason has to do with safety. Partitioning one iterator into two would require the two partitions to share the same underlying iterator. Iterators must be mutable to be used; so the underlying iterator would necessarily be shared, mutable state, which Rust’s safety depends on avoiding.

Implementing Your Own Iterators

You can implement the IntoIterator and Iterator traits for your own types, making all the adapters and consumers shown in this chapter available for use, along with lots of other library and crate code written to work with the standard iterator interface. In this section, we’ll show a simple iterator over a range type, and then a more complex iterator over a binary tree type.

Suppose we have the following range type (simplified from the standard library’s std::ops::Range<T> type):

struct I32Range {
    start: i32,
    end: i32
}

Iterating over an I32Range requires two pieces of state: the current value, and the limit at which the iteration should end. This happens to be a nice fit for the I32Range type itself, using start as the next value, and end as the limit. So you can implement Iterator like so:

impl Iterator for I32Range {
    type Item = i32;
    fn next(&mut self) -> Option<i32> {
        if self.start >= self.end {
            return None;
        }
        let result = Some(self.start);
        self.start += 1;
        result
    }
}

This iterator produces i32 items, so that’s the Item type. If the iteration is complete, next returns None; otherwise, it produces the next value, and updates its state to prepare for the next call.

Of course, a for loop uses IntoIterator::into_iter to convert its operand into an iterator. But the standard library provides a blanket implementation of IntoIterator for every type that implements Iterator, so I32Range is ready for use:

let mut pi = 0.0;
let mut numerator = 1.0;

for k in (I32Range { start: 0, end: 14 }) {
    pi += numerator / (2*k + 1) as f64;
    numerator /= -3.0;
}
pi *= f64::sqrt(12.0);

// IEEE 754 specifies this result exactly.
assert_eq!(pi as f32, std::f32::consts::PI);

But I32Range is a special case, in that the iterable and iterator are the same type. Many cases aren’t so simple. For example, here’s the binary tree type from Chapter 10:

enum BinaryTree<T> {
    Empty,
    NonEmpty(Box<TreeNode<T>>)
}

struct TreeNode<T> {
    element: T,
    left: BinaryTree<T>,
    right: BinaryTree<T>
}

The classic way to walk a binary tree is to recurse, using the stack of function calls to keep track of your place in the tree and the nodes yet to be visited. But when implementing Iterator for BinaryTree<T>, each call to next must produce exactly one value and return. To keep track of the tree nodes it has yet to produce, the iterator must maintain its own stack. Here’s one possible iterator type for BinaryTree:

use self::BinaryTree::*;

// The state of an in-order traversal of a `BinaryTree`.
struct TreeIter<'a, T: 'a> {
    // A stack of references to tree nodes. Since we use `Vec`'s
    // `push` and `pop` methods, the top of the stack is the end of the
    // vector.
    //
    // The node the iterator will visit next is at the top of the stack,
    // with those ancestors still unvisited below it. If the stack is empty,
    // the iteration is over.
    unvisited: Vec<&'a TreeNode<T>>
}

It turns out that pushing the nodes running down the left edge of a subtree is a common operation, so we’ll define a method on TreeIter to do that:

impl<'a, T: 'a> TreeIter<'a, T> {
    fn push_left_edge(&mut self, mut tree: &'a BinaryTree<T>) {
        while let NonEmpty(ref node) = *tree {
            self.unvisited.push(node);
            tree = &node.left;
        }
    }
}

With this helper method in place, we can give BinaryTree an iter method that returns an iterator over the tree:

impl<T> BinaryTree<T> {
    fn iter(&self) -> TreeIter<T> {
        let mut iter = TreeIter { unvisited: Vec::new() };
        iter.push_left_edge(self);
        iter
    }
}

The iter method constructs an empty TreeIter, and then calls push_left_edge to set up the initial stack. The leftmost node ends up on the top, as required by the unvisited stack’s rules.

Following the standard library’s practices, we can then implement IntoIterator on a shared reference to a tree with a call to BinaryTree::iter:

impl<'a, T: 'a> IntoIterator for &'a BinaryTree<T> {
    type Item = &'a T;
    type IntoIter = TreeIter<'a, T>;
    fn into_iter(self) -> Self::IntoIter {
        self.iter()
    }
}

The IntoIter definition establishes TreeIter as the iterator type for a &BinaryTree.

Finally, in the Iterator implementation, we get to actually walk the tree. Like BinaryTree’s iter method, the iterator’s next method is guided by the stack’s rules:

impl<'a, T> Iterator for TreeIter<'a, T> {
    type Item = &'a T;
    fn next(&mut self) -> Option<&'a T> {
        // Find the node this iteration must produce,
        // or finish the iteration.
        let node = match self.unvisited.pop() {
            None => return None,
            Some(n) => n
        };

        // The next node after this one is the leftmost child of
        // this node's right child, so push the path from here down.
        self.push_left_edge(&node.right);

        // Produce a reference to this node's value.
        Some(&node.element)
    }
}

If the stack is empty, the iteration is complete. Otherwise, node is a reference to the node to visit now; this call will return a reference to its element field. But first, we must advance the iterator’s state to the next node. If this node has a right subtree, the next node to visit is the subtree’s leftmost node, and we can use push_left_edge to push it, and its unvisited ancestors, onto the stack. But if this node has no right subtree, push_left_edge has no effect, which is just what we want: we can count on the new top of the stack to be node’s first unvisited ancestor, if any.

With IntoIterator and Iterator implementations in place, we can finally use a for loop to iterate over a BinaryTree by reference:

fn make_node<T>(left: BinaryTree<T>, element: T, right: BinaryTree<T>)
           -> BinaryTree<T>
{
    NonEmpty(Box::new(TreeNode { left, element, right }))
}

// Build a small tree.
let subtree_l = make_node(Empty, "mecha", Empty);
let subtree_rl = make_node(Empty, "droid", Empty);
let subtree_r = make_node(subtree_rl, "robot", Empty);
let tree = make_node(subtree_l, "Jaeger", subtree_r);

// Iterate over it.
let mut v = Vec::new();
for kind in &tree {
    v.push(*kind);
}
assert_eq!(v, ["mecha", "Jaeger", "droid", "robot"]);

Figure 15-1 shows how the unvisited stack behaves as we iterate through a sample tree. At every step, the next node to be visited is at the top of the stack, with all its unvisited ancestors below it.

A tree of four nodes. The root is "Jaeger", with left and right children "mecha" and "robot". "Robot" has a left child, "droid". Iterating over this tree takes the stack through five states: First stack, top to bottom: (mecha Jaeger). Second stack: (Jaeger). Third stack: (droid robot). Fourth stack: (robot). Fifth stack: empty.
Figure 15-1. Iterating over a binary tree

All the usual iterator adapters and consumers are ready for use on our trees:

assert_eq!(tree.iter()
           .map(|name| format!("mega-{}", name))
           .collect::<Vec<_>>(),
           vec!["mega-mecha", "mega-Jaeger",
                "mega-droid", "mega-robot"]);

1 Rust RFC 1522 will add syntax to the language very much like our some Iterator notation. As of Rust 1.17, it is not yet included in the language by default.

2 In fact, since Option is an iterable behaving like a sequence of zero or one items, iterator.filter_map​(closure) is equivalent to iterator.flat_map(closure), assuming closure returns an Option<T>.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.191.181.252