Chapter 18. Input and Output

Doolittle: What concrete evidence do you have that you exist?

Bomb #20: Hmmmm...well...I think, therefore I am.

Doolittle: That’s good. That’s very good. But how do you know that anything else exists?

Bomb #20: My sensory apparatus reveals it to me.

Dark Star

Rust’s standard library features for input and output are organized around three traits—Read, BufRead, and Write—and the various types that implement them:

  • Values that implement Read have methods for byte-oriented input. They’re called readers.

  • Values that implement BufRead are buffered readers. They support all the methods of Read, plus methods for reading lines of text and so forth.

  • Values that implement Write support both byte-oriented and UTF-8 text output. They’re called writers.

Figure 18-1 shows these three traits and some examples of reader and writer types.

In this chapter, we’ll show how to use these traits and their methods, the various types that implement them, and other ways to interact with files, the terminal, and the network.

A diagram that shows the three traits, Read, BufRead, and Write,           and a few examples of types that implement each one. Some types, like File,           implement both Read and Write.
Figure 18-1. Selected reader and writer types from the Rust standard library

Readers and Writers

Readers are values that your program can read bytes from. Examples include:

  • Files opened using std::fs::File::open(filename)

  • std::net::TcpStreams, for receiving data over the network

  • std::io::stdin(), for reading from the process’s standard input stream

  • std::io::Cursor<&[u8]> values, which are readers that “read” from a byte array that’s already in memory

Writers are values that your program can write bytes to. Examples include:

  • Files opened using std::fs::File::create(filename)

  • std::net::TcpStreams, for sending data over the network

  • std::io::stdout() and std::io:stderr(), for writing to the terminal

  • std::io::Cursor<&mut [u8]> values, which let you treat any mutable slice of bytes as a file for writing

  • Vec<u8>, a writer whose write methods append to the vector

Since there are standard traits for readers and writers (std::io::Read and std::io::Write), it’s quite common to write generic code that works across a variety of input or output channels. For example, here’s a function that copies all bytes from any reader to any writer:

use std::io::{self, Read, Write, ErrorKind};

const DEFAULT_BUF_SIZE: usize = 8 * 1024;

pub fn copy<R: ?Sized, W: ?Sized>(reader: &mut R, writer: &mut W)
    -> io::Result<u64>
    where R: Read, W: Write
{
    let mut buf = [0; DEFAULT_BUF_SIZE];
    let mut written = 0;
    loop {
        let len = match reader.read(&mut buf) {
            Ok(0) => return Ok(written),
            Ok(len) => len,
            Err(ref e) if e.kind() == ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        writer.write_all(&buf[..len])?;
        written += len as u64;
    }
}

This is the implementation of std::io::copy() from Rust’s standard library. Since it’s generic, you can use it to copy data from a File to a TcpStream, from Stdin to an in-memory Vec<u8>, etc.

If the error-handling code here is unclear, revisit Chapter 7. We’ll be using Results constantly in the pages ahead; it’s important to have a good grasp of how they work.

The four std::io traits Read, BufRead, Write, and Seek are so commonly used that there’s a prelude module containing only those traits:

use std::io::prelude::*;

You’ll see this once or twice in this chapter. We also make a habit of importing the std::io module itself:

use std::io::{self, Read, Write, ErrorKind};

The self keyword here declares io as an alias to the std::io module. That way, std::io::Result and std::io::Error can be written more concisely as io::Result and io::Error, and so on.

Readers

std::io::Read has several methods for reading data. All of them take the reader itself by mut reference.

  • reader.read(&mut buffer) reads some bytes from the data source and stores them in the given buffer. The type of the buffer argument is &mut [u8]. This reads up to buffer.len() bytes.

    The return type is io::Result<u64>, which is a type alias for Result<u64, io::Error>. On success, the u64 value is the number of bytes read—which may be equal to or less than buffer.len(), even if there’s more data to come, at the whim of the data source. Ok(0) means there is no more input to read.

    On error, .read() returns Err(err), where err is an io::Error value. An io::Error is printable, for the benefit of humans; for programs, it has a .kind() method that returns an error code of type io::ErrorKind. The members of this enum have names like PermissionDenied and ConnectionReset. Most indicate serious errors that can’t be ignored, but one kind of error should be handled specially. io::ErrorKind::Interrupted corresponds to the Unix error code EINTR, which means the read happened to be interrupted by a signal. Unless the program is designed to do something clever with signals, it should just retry the read. The code for copy(), in the preceding section, shows an example of this.

    As you can see, the .read() method is very low-level, even inheriting quirks of the underlying operating system. If you’re implementing the Read trait for a new type of data source, this gives you a lot of leeway. If you’re trying to read some data, it’s a pain. Therefore, Rust provides several higher-level convenience methods. All of them have default implementations in terms of .read(). They all handle ErrorKind::Interrupted, so you don’t have to.

  • reader.read_to_end(&mut byte_vec) reads all remaining input from this reader, appending it to byte_vec, which is a Vec<u8>. Returns an io::Result<usize>, the number of bytes read.

    There is no limit on the amount of data this method will pile into the vector, so don’t use it on an untrusted source. (You can impose a limit using the .take() method, described below.)

  • reader.read_to_string(&mut string) is the same, but append the data to the given String. If the stream isn’t valid UTF-8, this returns an ErrorKind::InvalidData error.

    In some languages, byte input and character input are handled by different types. These days, UTF-8 is so dominant that Rust acknowledges this de facto standard and supports UTF-8 everywhere. Other character sets are supported with the open source encoding crate.

  • reader.read_exact(&mut buf) reads exactly enough data to fill the given buffer. The argument type is &[u8]. If the reader runs out of data before reading buf.len() bytes, this returns an ErrorKind::UnexpectedEof error.

Those are the main methods of the Read trait. In addition, there are four adapter methods that take the reader by value, transforming it into an iterator or a different reader:

  • reader.bytes() returns an iterator over the bytes of the input stream. The item type is io::Result<u8>, so an error check is required for every byte. Furthermore, this calls reader.read() once per byte, which will be very inefficient if the reader is not buffered.

  • reader.chars() is the same, but iterates over characters, treating the input as UTF-8. Invalid UTF-8 causes an InvalidData error.

  • reader.chain(reader2) returns a new reader that produces all the input from reader, followed by all the input from reader2.

  • reader.take(n) returns a new reader that reads from the same source as reader, but is limited to n bytes of input.

There is no method for closing a reader. Readers and writers typically implement Drop so that they are closed automatically.

Buffered Readers

For efficiency, readers and writers can be buffered, which simply means they have a chunk of memory (a buffer) that holds some input or output data in memory. This saves on system calls, as shown in Figure 18-2. The application reads data from the BufReader, in this example by calling its .read_line() method. The BufReader in turn gets its input in larger chunks from the operating system.

The picture shows,           below, in a horizontal strip, bytes as they appear in a file on disk;           above, a BufReader holds a slice of those same bytes in memory.           To the left, we see a String, the result of the program having read some bytes           out of the BufReader.           To the right, the operating system holds some more of the file           in its own read-ahead cache.
Figure 18-2. A buffered file reader

This picture is not to scale. The actual default size of a BufReader’s buffer is several kilobytes, so a single system read can serve hundreds of .read_line() calls. This matters because system calls are slow.

(As the picture shows, the operating system has a buffer too, for the same reason: system calls are slow, but reading data from a disk is slower.)

Buffered readers implement both Read and a second trait, BufRead, which adds the following methods:

  • reader.read_line(&mut line) reads a line of text and appends it to line, which is a String. The newline character ' ' at the end of the line is included in line. If the input has Windows-style line endings, " ", both characters are included in line.

    The return value is an io::Result<usize>, the number of bytes read, including the line ending, if any.

    If the reader is at the end of the input, this leaves line unchanged and returns Ok(0).

  • reader.lines() returns an iterator over the lines of the input. The item type is io::Result<String>. Newline characters are not included in the strings. If the input has Windows-style line endings, " ", both characters are stripped.

    This method is almost always what you want for text input. The next two sections show some examples of its use.

  • reader.read_until(stop_byte, &mut byte_vec) and reader.split(stop​_byte) are just like .read_line() and .lines(), but byte-oriented, producing Vec<u8>s instead of Strings. You choose the delimiter stop_byte.

BufRead also provides a pair of low-level methods, .fill_buf() and .consume(n), for direct access to the reader’s internal buffer. For more about these methods, see the online documentation.

The next two sections cover buffered readers in more detail.

Reading Lines

Here is a function that implements the Unix grep utility. It searches many lines of text, typically piped in from another command, for a given string:

use std::io;
use std::io::prelude::*;

fn grep(target: &str) -> io::Result<()> {
    let stdin = io::stdin();
    for line_result in stdin.lock().lines() {
        let line = line_result?;
        if line.contains(target) {
            println!("{}", line);
        }
    }
    Ok(())
}

Since we want to call .lines(), we need a source of input that implements BufRead. In this case, we call io::stdin() to get the data that’s being piped to us. However, the Rust standard library protects stdin with a mutex. We call .lock() to lock stdin for the current thread’s exclusive use; it returns a StdinLock value that implements BufRead. At the end of the loop, the StdinLock is dropped, releasing the mutex. (Without a mutex, two threads trying to read from stdin at the same time would cause undefined behavior. C has the same issue and solves it the same way: all of the C standard input and output functions obtain a lock behind the scenes. The only difference is that in Rust, the lock is part of the API.)

The rest of the function is straightforward: it calls .lines() and loops over the resulting iterator. Because this iterator produces Result values, we use the ? operator to check for errors.

Suppose we want to take our grep program a step further and add support for searching files on disk. We can make this function generic:

fn grep<R>(target: &str, reader: R) -> io::Result<()>
    where R: BufRead
{
    for line_result in reader.lines() {
        let line = line_result?;
        if line.contains(target) {
            println!("{}", line);
        }
    }
    Ok(())
}

Now we can pass it either a StdinLock or a buffered File:

let stdin = io::stdin();
grep(&target, stdin.lock())?;  // ok

let f = File::open(file)?;
grep(&target, BufReader::new(f))?;  // also ok

Note that a File is not automatically buffered. File implements Read but not BufRead. However, it’s easy to create a buffered reader for a File, or any other unbuffered reader. BufReader::new(reader) does this. (To set the size of the buffer, use BufReader::with_capacity(size, reader).)

In most languages, files are buffered by default. If you want unbuffered input or output, you have to figure out how to turn buffering off. In Rust, File and BufReader are two separate library features, because sometimes you want files without buffering, and sometimes you want buffering without files (for example, you may want to buffer input from the network).

The full program, including error handling and some crude argument parsing, is shown here:

// grep - Search stdin or some files for lines matching a given string.

use std::error::Error;
use std::io::{self, BufReader};
use std::io::prelude::*;
use std::fs::File;
use std::path::PathBuf;

fn grep<R>(target: &str, reader: R) -> io::Result<()>
    where R: BufRead
{
    for line_result in reader.lines() {
        let line = line_result?;
        if line.contains(target) {
            println!("{}", line);
        }
    }
    Ok(())
}

fn grep_main() -> Result<(), Box<Error>> {
    // Get the command-line arguments. The first argument is the
    // string to search for; the rest are filenames.
    let mut args = std::env::args().skip(1);
    let target = match args.next() {
        Some(s) => s,
        None => Err("usage: grep PATTERN FILE...")?
    };
    let files: Vec<PathBuf> = args.map(PathBuf::from).collect();

    if files.is_empty() {
        let stdin = io::stdin();
        grep(&target, stdin.lock())?;
    } else {
        for file in files {
            let f = File::open(file)?;
            grep(&target, BufReader::new(f))?;
        }
    }

    Ok(())
}

fn main() {
    let result = grep_main();
    if let Err(err) = result {
        let _ = writeln!(io::stderr(), "{}", err);
    }
}

Collecting Lines

Several reader methods, including .lines(), return iterators that produce Result values. The first time you want to collect all the lines of a file into one big vector, you’ll run into a problem getting rid of the Results.

// ok, but not what you want
let results: Vec<io::Result<String>> = reader.lines().collect();

// error: can't convert collection of Results to Vec<String>
let lines: Vec<String> = reader.lines().collect();

The second try doesn’t compile: what would happen to the errors? The straightforward solution is to write a for loop and check each item for errors:

let mut lines = vec![];
for line_result in reader.lines() {
    lines.push(line_result?);
}

Not bad; but it would be nice to use .collect() here, and it turns out that we can. We just have to know which type to ask for:

let lines = reader.lines().collect::<io::Result<Vec<String>>>()?;

How does this work? The standard library contains an implementation of FromIterator for Result—easy to overlook in the online documentation—that makes this possible:

impl<T, E, C> FromIterator<Result<T, E>> for Result<C, E>
    where C: FromIterator<T>
{
    ...
}

This says: if you can collect items of type T into a collection of type C (“where C: FromIterator<T>”) then you can collect items of type Result<T, E> into a result of type Result<C, E> (“FromIterator<Result<T, E>> for Result<C, E>”).

In other words, io::Result<Vec<String>> is a collection type, so the .collect() method can create and populate values of that type.

Writers

As we’ve seen, input is mostly done using methods. Output is a bit different.

Throughout the book, we’ve used println!() to produce plain-text output.

println!("Hello, world!");

println!("The greatest common divisor of {:?} is {}",
         numbers, d);

There’s also a print!() macro, which does not add a newline character at the end. The formatting codes for print!() and println!() are the same as those for the format! macro, described in “Formatting Values”.

To send output to a writer, use the write!() and writeln!() macros. They are the same as print!() and println!(), except for two differences.

writeln!(io::stderr(), "error: world not helloable")?;

writeln!(&mut byte_vec, "The greatest common divisor of {:?} is {}",
         numbers, d)?;

One difference is that the write macros each take an extra first argument, a writer. The other is that they return a Result, so errors must be handled. That’s why we used the ? operator at the end of each line.

The print macros don’t return a Result; they simply panic if the write fails. Since they write to the terminal, this is rare.

The Write trait has these methods:

  • writer.write(&buf) writes some of the bytes in the slice buf to the underlying stream. It returns an io::Result<usize>. On success, this gives the number of bytes written, which may be less than buf.len(), at the whim of the stream.

    Like Reader::read(), this is a low-level method that you should avoid using directly.

  • writer.write_all(&buf) writes all the bytes in the slice buf. Returns Result<()>.

  • writer.flush() flushes any buffered data to the underlying stream. Returns Result<()>.

Like readers, writers are closed automatically when they are dropped.

Just as BufReader::new(reader) adds a buffer to any reader, BufWriter::new(writer) adds a buffer to any writer.

let file = File::create("tmp.txt")?;
let writer = BufWriter::new(file);

To set the size of the buffer, use BufWriter::with_capacity(size, writer).

When a BufWriter is dropped, all remaining buffered data is written to the underlying writer. However, if an error occurs during this write, the error is ignored. (Since this happens inside BufWriter’s .drop() method, there is no useful place to report the error.) To make sure your application notices all output errors, manually .flush() buffered writers before dropping them.

Files

We’ve already seen two ways to open a file:

  • File::open(filename) opens an existing file for reading. It returns an io::Result<File>, and it’s an error if the file doesn’t exist.

  • File::create(filename) creates a new file for writing. If a file exists with the given filename, it is truncated.

Note that the File type is in the filesystem module, std::fs, not std::io.

When neither of these fits the bill, you can use OpenOptions to specify the exact desired behavior:

use std::fs::OpenOptions;

let log = OpenOptions::new()
    .append(true)  // if file exists, add to the end
    .open("server.log")?;

let file = OpenOptions::new()
    .write(true)
    .create_new(true)  // fail if file exists
    .open("new_file.txt")?;

The methods .append(), .write(), .create_new(), and so on are designed to be chained like this: each one returns self. This method-chaining design pattern is common enough to have a name in Rust: it’s called a builder. std::process::Command is another example. For more details on OpenOptions, see the online documentation.

Once a File has been opened, it behaves like any other reader or writer. You can add a buffer if needed. The File will be closed automatically when you drop it.

Seeking

File also implements the Seek trait, which means you can hop around within a File rather than reading or writing in a single pass from the beginning to the end. Seek is defined like this:

pub trait Seek {
    fn seek(&mut self, pos: SeekFrom) -> io::Result<u64>;
}

pub enum SeekFrom {
    Start(u64),
    End(i64),
    Current(i64)
}

Thanks to the enum, the seek method is nicely expressive: use file.seek(SeekFrom::Start(0)) to rewind to the beginning, file.seek(SeekFrom::Current(-8)) to go back a few bytes, and so on.

Seeking within a file is slow. Whether you’re using a hard disk or a solid-state drive (SSD), a seek takes as long as reading several megabytes of data.

Other Reader and Writer Types

Earlier in this chapter, we gave a few examples of types other than File that implement Read and Write. Here, we’ll give a few more details about these types.

  • io::stdin() returns a reader for the standard input stream. Its type is io::Stdin. Since this is shared by all threads, each read acquires and releases a mutex.

    Stdin has a .lock() method that acquires the mutex and returns an io::StdinLock, a buffered reader that holds the mutex until it’s dropped. Individual operations on the StdinLock therefore avoid the mutex overhead. We showed example code using this method in “Reading Lines”.

    For technical reasons, io::stdin().lock() doesn’t work. The lock holds a reference to the Stdin value, and that means the Stdin value must be stored somewhere so that it lives long enough:

    let stdin = io::stdin();
    let lines = stdin.lock().lines();  // ok
  • io::stdout() and io::stderr() return writers for the standard output and standard error streams. These too have mutexes and .lock() methods.

  • Vec<u8> implements Write. Writing to a Vec<u8> extends the vector with the new data.

    (String, however, does not implement Write. To build a string using Write, first write to a Vec<u8>, then use String::from_utf8(vec) to convert the vector to a string.)

  • Cursor::new(buf) creates a Cursor, a buffered reader that reads from buf. This is how you create a reader that reads from a String. The argument buf can be any type that implements AsRef<[u8]>, so you can also pass a &[u8], &str, or Vec<u8>.

    Cursors are trivial internally. They have just two fields: buf itself; and an integer, the offset in buf where the next read will start. The position is initially 0.

    Cursors implement Read, BufRead, and Seek. If the type of buf is &mut [u8] or Vec<u8>, then the Cursor also implements Write. Writing to a cursor overwrites bytes in buf starting at the current position. If you try to write past the end of a &mut [u8], you’ll get a partial write or an io::Error. Using a cursor to write past the end of a Vec<u8> is fine, though: it grows the vector. Cursor<&mut [u8]> and Cursor<Vec<u8>> thus implement all four of the std::io::prelude traits.

  • std::net::TcpStream represents a TCP network connection. Since TCP enables two-way communication, it’s both a reader and a writer.

    The static method TcpStream::connect(("hostname", PORT)) tries to connect to a server and returns an io::Result<TcpStream>.

  • std::process::Command supports spawning a child process and piping data to its standard input, like so:

    use std::process::{Command, Stdio};
    
    let mut child =
        Command::new("grep")
        .arg("-e")
        .arg("a.*e.*i.*o.*u")
        .stdin(Stdio::piped())
        .spawn()?;
    
    let mut to_child = child.stdin.take().unwrap();
    for word in my_words {
        writeln!(to_child, "{}", word)?;
    }
    drop(to_child);  // close grep's stdin, so it will exit
    child.wait()?;

    The type of child.stdin is Option<std::process::ChildStdin>; here we’ve used .stdin(Stdio::piped()) when setting up the child process, so child.stdin is definitely populated when .spawn() succeeds. If we hadn’t, child.stdin would be None.

    Command also has similar methods .stdout() and .stderr(), which can be used to request readers in child.stdout and child.stderr.

The std::io module also offers a handful of functions that return trivial readers and writers.

  • io::sink() is the no-op writer. All the write methods return Ok, but the data is just discarded.

  • io::empty() is the no-op reader. Reading always succeeds, but returns end-of-input.

  • io::repeat(byte) returns a reader that repeats the given byte endlessly.

Binary Data, Compression, and Serialization

Many open source crates build on the std::io framework to offer extra features.

The byteorder crate offers ReadBytesExt and WriteBytesExt traits that add methods to all readers and writers for binary input and output:

use byteorder::{ReadBytesExt, WriteBytesExt, LittleEndian};

let n = reader.read_u32::<LittleEndian>()?;
writer.write_i64::<LittleEndian>(n as i64)?;

The flate2 crate provides adapter methods for reading and writing gzipped data:

use flate2::FlateReadExt;

let file = File::open("access.log.gz")?;
let mut gzip_reader = file.gz_decode()?;

The serde crate is for serialization and deserialization: it converts back and forth between Rust structs and bytes. We mentioned this once before, in “Traits and Other People’s Types”. Now we can take a closer look.

Suppose we have some data—the map for a text adventure game—stored in a HashMap:

type RoomId = String;                       // each room has a unique name
type RoomExits = Vec<(char, RoomId)>;       // ...and a list of exits
type RoomMap = HashMap<RoomId, RoomExits>;  // room names and exits, simple

// Create a simple map.
let mut map = RoomMap::new();
map.insert("Cobble Crawl".to_string(),
           vec![('W', "Debris Room".to_string())]);
map.insert("Debris Room".to_string(),
           vec![('E', "Cobble Crawl".to_string()),
                ('W', "Sloping Canyon".to_string())]);
...

Turning this data into JSON for output is just a few lines of code:

use std::io;
use serde::Serialize;
use serde_json::Serializer;

let mut serializer = Serializer::new(io::stdout());
map.serialize(&mut serializer)?;

This code uses the serialize method of the serde::Serialize trait. The library attaches this trait to all types that it knows how to serialize, and that includes all of the types that appear in our data: strings, characters, tuples, vectors, and HashMaps.

serde is flexible. In this program, the output is JSON data, because we chose the serde_json serializer. Other formats, like MessagePack, are also available. Likewise, you could send this output to a file, a Vec<u8>, or any other writer. The code above prints the data on stdout. Here it is:

{"Debris Room":[["E","Cobble Crawl"],["W","Sloping Canyon"]],"Cobble Crawl":
[["W","Debris Room"]]}

serde also includes support for deriving the two key serde traits:

#[derive(Serialize, Deserialize)]
struct Player {
    location: String,
    items: Vec<String>,
    health: u32
}

As of Rust 1.17, this #[derive] attribute requires a few extra steps when setting up your project. We won’t cover that here; see the serde documentation for details. In short, the build system autogenerates implementations of serde::Serialize and serde::Deserialize for Player, so that serializing a Player value is simple:

player.serialize(&mut serializer)?;

The output looks like this:

{"location":"Cobble Crawl","items":["a wand"],"health":3}

Files and Directories

The next few sections cover Rust’s features for working with files and directories, which live in the std::path and std::fs modules. All of these features involve working with filenames, so we’ll start with the filename types.

OsStr and Path

Inconveniently, your operating system does not force filenames to be valid Unicode. Here are two Linux shell commands that create text files. Only the first uses a valid UTF-8 filename.

$ echo "hello world" > ô.txt
$ echo "O brave new world, that has such filenames in't" > $'xf4'.txt

Both commands pass without comment, because the Linux kernel doesn’t know UTF-8 from Ogg Vorbis. To the kernel, any string of bytes (excluding null bytes and slashes) is an acceptable filename. It’s a similar story on Windows: almost any string of 16-bit “wide characters” is an acceptable filename, even strings that are not valid UTF-16. The same is true of other strings the operating system handles, like command-line arguments and environment variables.

Rust strings are always valid Unicode. Filenames are almost always Unicode in practice, but Rust has to cope somehow with the rare case where they aren’t. This is why Rust has std::ffi::OsStr and OsString.

OsStr is a string type that’s a superset of UTF-8. Its job is to be able to represent all filenames, command-line arguments, and environment variables on the current system, whether they’re valid Unicode or not. On Unix, an OsStr can hold any sequence of bytes. On Windows, an OsStr is stored using an extension of UTF-8 that can encode any sequence of 16-bit-values, including unmatched surrogates.

So we have two string types: str for actual Unicode strings; and OsStr for whatever nonsense your operating system can dish out. We’ll introduce one more: std::path::Path, for filenames. This one is purely a convenience. Path is exactly like OsStr, but it adds many handy filename-related methods, which we’ll cover in the next section. Use Path for both absolute and relative paths. For an individual component of a path, use OsStr.

Lastly, for each string type, there’s a corresponding owning type: a String owns a heap-allocated str, a std::ffi::OsString owns a heap-allocated OsStr, and a std::path::PathBuf owns a heap-allocated Path.

str OsStr Path
Unsized type, always passed by reference Yes Yes Yes
Can contain any Unicode text Yes Yes Yes
Looks just like UTF-8, normally Yes Yes Yes
Can contain non-Unicode data No Yes Yes
Text processing methods Yes No No
Filename-related methods No No Yes
Owned, growable, heap-allocated equivalent String OsString PathBuf
Convert to owned type .to_string() .to_os_string() .to_path_buf()

All three of these types implement a common trait, AsRef<Path>, so we can easily declare a generic function that accepts “any filename type” as an argument. This uses a technique we showed in “AsRef and AsMut”:

use std::path::Path;
use std::io;

fn swizzle_file<P>(path_arg: P) -> io::Result<()>
   where P: AsRef<Path>
{
    let path = path_arg.as_ref();
    ...
}

All the standard functions and methods that take path arguments use this technique, so you can freely pass string literals to any of them.

Path and PathBuf Methods

Path offers the following methods, among others:

  • Path::new(str) converts a &str or &OsStr to a &Path. This doesn’t copy the string: the new &Path points to the same bytes as the original &str or &OsStr.

    use std::path::Path;
    let home_dir = Path::new("/home/fwolfe");

    (The similar method OsStr::new(str) converts a &str to a &OsStr.)

  • path.parent() returns the path’s parent directory, if any. The return type is Option<&Path>.

    This doesn’t copy the path: the parent directory of path is always a substring of path.

    assert_eq!(Path::new("/home/fwolfe/program.txt").parent(),
               Some(Path::new("/home/fwolfe")));
  • path.file_name() returns the last component of path, if any. The return type is Option<&OsStr>.

    In the typical case, where path consists of a directory, then a slash, then a filename, this returns the filename.

    assert_eq!(Path::new("/home/fwolfe/program.txt").file_name(),
               Some(OsStr::new("program.txt")));
  • path.is_absolute() and path.is_relative() tell whether the file is absolute, like the Unix path /usr/bin/advent or the Windows path C:Program Files; or relative, like src/main.rs.

  • path1.join(path2) joins two paths, returning a new PathBuf.

    let path1 = Path::new("/usr/share/dict");
    assert_eq!(path1.join("words"),
               Path::new("/usr/share/dict/words"));

    If path2 is an absolute path, this just returns a copy of path2, so this method can be used to convert any path to an absolute path:

    let abs_path = std::env::current_dir()?.join(any_path);
  • path.components() returns an iterator over the components of the given path, from left to right. The item type of this iterator is std::path::Component, an enum that can represent all the different pieces that can appear in filenames:

    pub enum Component<'a> {
        Prefix(PrefixComponent<'a>),  // Windows-only: a drive letter or share
        RootDir,                      // the root directory, `/` or ``
        CurDir,                       // the `.` special directory
        ParentDir,                    // the `..` special directory
        Normal(&'a OsStr)             // plain file and directory names
    }

    For example, the Windows path \veniceMusicA Love Supreme4-Psalm.mp3 consists of a Prefix representing \veniceMusic, followed by a RootDir, and then two Normal components representing A Love Supreme and 04-Psalm.mp3.

    For details, see the online documentation.

These methods work on strings in memory. Paths also have some methods that query the filesystem: .exists(), .is_file(), .is_dir(), .read_dir(), ​.canonicalize(), and so on. See the online documentation to learn more.

There are three methods for converting Paths to strings. Each one allows for the possibility of invalid UTF-8 in the Path.

  • path.to_str() converts a Path to a string, as an Option<&str>. If path isn’t valid UTF-8, this returns None.

    if let Some(file_str) = path.to_str() {
        println!("{}", file_str);
    }  // ...otherwise skip this weirdly named file
  • path.to_string_lossy() is basically the same thing, but it manages to return some sort of string in all cases. If path isn’t valid UTF-8, these methods make a copy, replacing each invalid byte sequence with the Unicode replacement character, U+FFFD (‘�’).

    The return type is std::borrow::Cow<str>: an either-borrowed-or-owned string. To get a String from this value, use its .to_owned() method. (For more about Cow, see “Borrow and ToOwned at Work: The Humble Cow”.)

  • path.display() is for printing paths:

    println!("Download found. You put it in: {}", dir_path.display());

    The value this returns isn’t a string, but it implements std::fmt::Display, so it can be used with format!(), println!(), and friends. If the path isn’t valid UTF-8, the output may contain the � character.

Filesystem Access Functions

Table 18-1 shows some of the functions in std::fs and their approximate equivalents on Unix and Windows. All of these functions return io::Result values. They are Result<()> unless otherwise noted.

Table 18-1. Summary of filesystem access functions
  Rust function Unix Windows
Creating and deleting create_dir(path) mkdir() CreateDirectory()
create_dir_all(path) like mkdir -p like mkdir
remove_dir(path) rmdir() RemoveDirectory()
remove_dir_all(path) like rm -r like rmdir /s
remove_file(path) unlink() DeleteFile()
Copying, moving, and linking copy(src_path, dest_path) -> Result<u64> like cp -p CopyFileEx()
rename(src_path, dest_path) rename() MoveFileEx()
hard_link(src_path, dest_path) link() CreateHardLink()
Inspecting canonicalize(path) -> Result<PathBuf> realpath() GetFinalPathNameByHandle()
metadata(path) -> Result<Metadata> stat() GetFileInformationByHandle()
symlink_metadata(path) -> Result<Metadata> lstat() GetFileInformationByHandle()
read_dir(path) -> Result<ReadDir> opendir() FindFirstFile()
read_link(path) -> Result<PathBuf> readlink() FSCTL_GET_REPARSE_POINT
Permissions set_permissions(path, perm) chmod() SetFileAttributes()

(The number returned by copy() is the size of the copied file, in bytes. For creating symbolic links, see “Platform-Specific Features”.)

As you can see, Rust strives to provide portable functions that work predictably on Windows as well as macOS, Linux, and other Unix systems.

A full tutorial on filesystems is beyond the scope of this book, but if you’re curious about any of these functions, you can easily find more about them online. We’ll show some examples in the next section.

All of these functions are implemented by calling out to the operating system. For example, std::fs::canonicalize(path) does not merely use string processing to eliminate . and .. from the given path. It resolves relative paths using the current working directory, and it chases symbolic links. It’s an error if the path doesn’t exist.

The Metadata type produced by std::fs::metadata(path) and std::fs::symlink_metadata(path) contains such information as the file type and size, permissions, and timestamps. As always, consult the documentation for details.

As a convenience, the Path type has a few of these built in as methods: path.metadata(), for example, is the same thing as std::fs::metadata(path).

Reading Directories

To list the contents of a directory, use std::fs::read_dir, or equivalently, the .read_dir() method of a Path:

for entry_result in path.read_dir()? {
    let entry = entry_result?;
    println!("{}", entry.file_name().to_string_lossy());
}

Note the two uses of ? in this code. The first line checks for errors opening the directory. The second line checks for errors reading the next entry.

The type of entry is std::fs::DirEntry, and it’s a struct with just a few methods:

  • entry.file_name() is the name of the file or directory, as an OsString.

  • entry.path() is the same, but with the original path joined to it, producing a new PathBuf. If the directory we’re listing is "/home/jimb", and entry.file_name() is ".emacs", then entry.path() would return PathBuf::from("/home/jimb/.emacs").

  • entry.file_type() returns an io::Result<FileType>. FileType has .is_file(), .is_dir(), and .is_symlink() methods.

  • entry.metadata() gets the rest of the metadata about this entry.

The special directories . and .. are not listed when reading a directory.

Here’s a more substantial example. The following code recursively copies a directory tree from one place to another on disk:

use std::fs;
use std::io;
use std::path::Path;

/// Copy the existing directory `src` to the target path `dst`.
fn copy_dir_to(src: &Path, dst: &Path) -> io::Result<()> {
    if !dst.is_dir() {
        fs::create_dir(dst)?;
    }

    for entry_result in src.read_dir()? {
        let entry = entry_result?;
        let file_type = entry.file_type()?;
        copy_to(&entry.path(), &file_type, &dst.join(entry.file_name()))?;
    }

    Ok(())
}

A separate function, copy_to, copies individual directory entries:

/// Copy whatever is at `src` to the target path `dst`.
fn copy_to(src: &Path, src_type: &fs::FileType, dst: &Path) -> io::Result<()> {
    if src_type.is_file() {
        fs::copy(src, dst)?;
    } else if src_type.is_dir() {
        copy_dir_to(src, dst)?;
    } else {
        return Err(io::Error::new(io::ErrorKind::Other,
                                  format!("don't know how to copy: {}",
                                          src.display())));
    }
    Ok(())
}

Platform-Specific Features

So far, our copy_to function can copy files and directories. Suppose we also want to support symbolic links on Unix.

There is no portable way to create symbolic links that works on both Unix and Windows, but the standard library offers a Unix-specific symlink function,

use std::os::unix::fs::symlink;

and with this, our job is easy. We need only add a branch to the if-expression in copy_to:

...
} else if src_type.is_symlink() {
    let target = src.read_link()?;
    symlink(target, dst)?;
...

This will work as long as we compile our program only for Unix systems, such as Linux and macOS.

The std::os module contains various platform-specific features, like symlink. The actual body of std::os in the standard library looks like this (taking some poetic license):

//! OS-specific functionality.

#[cfg(unix)]                    pub mod unix;
#[cfg(windows)]                 pub mod windows;
#[cfg(target_os = "ios")]       pub mod ios;
#[cfg(target_os = "linux")]     pub mod linux;
#[cfg(target_os = "macos")]     pub mod macos;
...

The #[cfg] attribute indicates conditional compilation: each of these modules exists only on some platforms. This is why our modified program, using std::os::unix, will successfully compile only for Unix: on other platforms, std::os::unix doesn’t exist.

If we want our code to compile on all platforms, with support for symbolic links on Unix, we must use #[cfg] in our program as well. In this case, it’s easiest to import symlink on Unix, while defining our own symlink stub on other systems:

#[cfg(unix)]
use std::os::unix::fs::symlink;

/// Stub implementation of `symlink` for platforms that don't provide it.
#[cfg(not(unix))]
fn symlink<P: AsRef<Path>, Q: AsRef<Path>>(src: P, _dst: Q)
    -> std::io::Result<()>
{
    Err(io::Error::new(io::ErrorKind::Other,
                       format!("can't copy symbolic link: {}",
                               src.as_ref().display())))
}

As of this writing, the online documentation at https://doc.rust-lang.org/std is generated by running rustdoc on the standard library—on Linux. This means that system-specific functionality for macOS, Windows, and other platforms does not show up in the online documentation. The best way to find it is to use rustup doc to see the HTML documentation for your platform. Of course, another option is to consult the source code, which is available online.

It turns out that symlink is something of a special case. Most Unix-specific features are not standalone functions but rather extension traits that add new methods to standard library types. (We covered extension traits in “Traits and Other People’s Types”.) There’s a prelude module that can be used to enable all of these extensions at once:

use std::os::unix::prelude::*;

For example, on Unix, this adds a .mode() method to std::fs::Permissions, providing access to the underlying u32 value that represents permissions on Unix. Similarly, it extends std::fs::Metadata with accessors for the fields of the underlying struct stat value—such as .uid(), the user ID of the file’s owner.

All told, what’s in std::os is pretty basic. Much more platform-specific functionality is available via third-party crates, like winreg for accessing the Windows registry.

Networking

A tutorial on networking is well beyond the scope of this book. However, if you already know a bit about network programming, this section will help you get started with networking in Rust.

For low-level networking code, start with the std::net module, which provides cross-platform support for TCP and UDP networking. Use the native_tls crate for SSL/TLS support.

These modules provide the building blocks for straightforward, blocking input and output over the network. You can write a simple server in a few lines of code, using std::net and spawning a thread for each connection. For example, here’s an “echo” server:

use std::net::TcpListener;
use std::io;
use std::thread::spawn;

/// Accept connections forever, spawning a thread for each one.
fn echo_main(addr: &str) -> io::Result<()> {
    let listener = TcpListener::bind(addr)?;
    println!("listening on {}", addr);
    loop {
        // Wait for a client to connect.
        let (mut stream, addr) = listener.accept()?;
        println!("connection received from {}", addr);

        // Spawn a thread to handle this client.
        let mut write_stream = stream.try_clone()?;
        spawn(move || {
            // Echo everything we receive from `stream` back to it.
            io::copy(&mut stream, &mut write_stream)
                .expect("error in client thread: ");
            println!("connection closed");
        });
    }
}

fn main() {
    echo_main("127.0.0.1:17007").expect("error: ");
}

An echo server simply repeats back everything you send to it. This kind of code is not so different from what you’d write in Java or Python. (We’ll cover std::thread::spawn() in the next chapter.)

However, for high-performance servers, you’ll need to use asynchronous input and output. The mio crate provides the needed support. MIO is very low-level. It provides a simple event loop and asynchronous methods for reading, writing, connecting, and accepting connections—basically an asynchronous copy of the whole networking API. Whenever an asynchronous operation completes, MIO passes an event to an event handler method that you write.

There’s also the experimental tokio crate, which wraps the mio event loop in a futures-based API, reminiscent of JavaScript promises.

Higher-level protocols are supported by third-party crates. For example, the reqwest crate offers a beautiful API for HTTP clients. Here is a complete command-line program that fetches any document with an http: or https: URL and dumps it to your terminal. This code was written using reqwest = "0.5.1".

extern crate reqwest;

use std::error::Error;
use std::io::{self, Write};

fn http_get_main(url: &str) -> Result<(), Box<Error>> {
    // Send the HTTP request and get a response.
    let mut response = reqwest::get(url)?;
    if !response.status().is_success() {
        Err(format!("{}", response.status()))?;
    }

    // Read the response body and write it to stdout.
    let stdout = io::stdout();
    io::copy(&mut response, &mut stdout.lock())?;

    Ok(())
}

fn main() {
    let args: Vec<String> = std::env::args().collect();
    if args.len() != 2 {
        writeln!(io::stderr(), "usage: http-get URL").unwrap();
        return;
    }

    if let Err(err) = http_get_main(&args[1]) {
        writeln!(io::stderr(), "error: {}", err).unwrap();
    }
}

The iron framework for HTTP servers offers high-level touches such as the BeforeMiddleware and AfterMiddleware traits, which help you compose an app from pluggable parts. The websocket crate implements the WebSocket protocol. And so on. Rust is a young language with a busy open source ecosystem. Support for networking is rapidly expanding.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.133.122.68