Chapter 3. On The Catwalk

When you are alone, you are the cat, you are the phone. You are an animal.

They Might Be Giants

In this chapter, the challenge is to write a clone of the cat program, which is so named because it can concatenate many files into one file. That is, given files a, b, and c, you could execute cat a b c > all to stream all the lines from these three files and redirect them into a file called all. The program will accept an option to prefix each line with the line number.

In this chapter, you’ll learn:

  • How to organize your code into a library and a binary crate

  • How to use testing-first development

  • The difference between public and private variables and functions

  • How to test for the existence of a file

  • How to create a random string for a file that does not exist

  • How to read regular files or STDIN (pronounced standard in)

  • How to use eprintln! to print to STDERR and format! to format a string

  • How to write a test that provides input on STDIN

  • How and why to create a struct

  • How to define mutually exclusive arguments

  • How to use the enumerate method of an iterator

  • More about how and why to use a Box

How cat Works

I’ll start by showing how cat works so that you know what is expected of the challenge. The BSD version of cat does not respond to --help, so I must use man cat to read the manual page. For such a limited program, it has a surprising number of options:

CAT(1)                    BSD General Commands Manual                   CAT(1)

NAME
     cat -- concatenate and print files

SYNOPSIS
     cat [-benstuv] [file ...]

DESCRIPTION
     The cat utility reads files sequentially, writing them to the standard
     output.  The file operands are processed in command-line order.  If file
     is a single dash ('-') or absent, cat reads from the standard input.  If
     file is a UNIX domain socket, cat connects to it and then reads it until
     EOF.  This complements the UNIX domain binding capability available in
     inetd(8).

     The options are as follows:

     -b      Number the non-blank output lines, starting at 1.

     -e      Display non-printing characters (see the -v option), and display
             a dollar sign ('$') at the end of each line.

     -n      Number the output lines, starting at 1.

     -s      Squeeze multiple adjacent empty lines, causing the output to be
             single spaced.

     -t      Display non-printing characters (see the -v option), and display
             tab characters as '^I'.

     -u      Disable output buffering.

     -v      Display non-printing characters so they are visible.  Control
             characters print as '^X' for control-X; the delete character
             (octal 0177) prints as '^?'.  Non-ASCII characters (with the high
             bit set) are printed as 'M-' (for meta) followed by the character
             for the low 7 bits.

EXIT STATUS
     The cat utility exits 0 on success, and >0 if an error occurs.

The GNU version does respond to --help:

$ cat --help
Usage: cat [OPTION]... [FILE]...
Concatenate FILE(s), or standard input, to standard output.

  -A, --show-all           equivalent to -vET
  -b, --number-nonblank    number nonempty output lines, overrides -n
  -e                       equivalent to -vE
  -E, --show-ends          display $ at end of each line
  -n, --number             number all output lines
  -s, --squeeze-blank      suppress repeated empty output lines
  -t                       equivalent to -vT
  -T, --show-tabs          display TAB characters as ^I
  -u                       (ignored)
  -v, --show-nonprinting   use ^ and M- notation, except for LFD and TAB
      --help     display this help and exit
      --version  output version information and exit

With no FILE, or when FILE is -, read standard input.

Examples:
  cat f - g  Output f's contents, then standard input, then g's contents.
  cat        Copy standard input to standard output.

GNU coreutils online help: <http://www.gnu.org/software/coreutils/>
For complete documentation, run: info coreutils 'cat invocation'
Note

The BSD version predates the GNU version, so the latter implements all the same short flags to be compatible. As is typical of GNU programs, it also offers long flag aliases like --number for -n and --number-nonblank for -b. I will show you how to offer both short and long flags like the GNU version.

For the challenge program, I will only implement the options -b|--number-nonblank and -n|--number. I will also show how to read regular files and STDIN when given a filename argument of “-”. I’ve put four files for testing into the 03_catr/tests/inputs directory:

  1. empty.txt: an empty file

  2. fox.txt: a single line of text

  3. spiders.txt: three lines of text

  4. the-bustle.txt: a lovely poem by Emily Dickinson that has nine lines including one blank

Empty files are common, if useless. I include this to ensure my program can gracefully handle unexpected input. That is, I want my program to at least not fall over. The following command produces no output, so I expect my program to do the same:

$ cd 03_catr
$ cat tests/inputs/empty.txt

Next, I’ll run cat on a file with one line of text:

$ cat tests/inputs/fox.txt
The quick brown fox jumps over the lazy dog.

The -n|--number and -b|--number-nonblank flags will both number the lines, and the line number is right-justified in a field six characters wide followed by a tab character and then the line of text. To distinguish the tab character, I can use the -t option to display non-printing characters so that the tab shows as ^I. In the following command, I use the Unix pipe | to connect STDOUT from the first command to STDIN in the second command:

$ cat -n tests/inputs/fox.txt | cat -t
     1^IThe quick brown fox jumps over the lazy dog.

The spiders.txt file has three lines of text which should be numbered with the -b option:

$ cat -b tests/inputs/spiders.txt
     1	Don't worry, spiders,
     2	I keep house
     3	casually.

The difference between -n (on the left) and -b (on the right) is apparent only with the-bustle.txt as the latter will only number nonblank lines:

$ cat -n tests/inputs/the-bustle.txt    $ cat -b tests/inputs/the-bustle.txt
     1	The bustle in a house                1	The bustle in a house
     2	The morning after death              2	The morning after death
     3	Is solemnest of industries           3	Is solemnest of industries
     4	Enacted upon earth,—                 4	Enacted upon earth,—
     5
     6	The sweeping up the heart,           5	The sweeping up the heart,
     7	And putting love away                6	And putting love away
     8	We shall not want to use again       7	We shall not want to use again
     9	Until eternity.                      8	Until eternity.
Note

Oddly, you can use -b and -n together, and the -b option takes precedence. The challenge program will allow only one or the other.

When processing any file that does not exist or cannot be opened, cat will print a message to STDERR and move to the next file. In the following example, I’m using blargh as a nonexistent file. I create the file cant-touch-this using the touch command and use the chmod command to set permissions that make it unreadable. You’ll learn more about what the 000 means in Chapter 15 when you write a clone of ls:

$ touch cant-touch-this && chmod 000 cant-touch-this
$ cat tests/inputs/fox.txt blargh tests/inputs/spiders.txt cant-touch-this
The quick brown fox jumps over the lazy dog. 1
cat: blargh: No such file or directory 2
Don't worry, spiders, 3
I keep house
casually.
cat: cant-touch-this: Permission denied 4
1

This is the output from the first file.

2

This is an error for a nonexistent file.

3

This is the output from the third file.

4

This is the error for an unreadable file.

Finally, run cat with all the files and notice that it starts renumbering the lines for each file:

$ cd tests/inputs
$ cat -n empty.txt fox.txt spiders.txt the-bustle.txt
     1	The quick brown fox jumps over the lazy dog.
     1	Don't worry, spiders,
     2	I keep house
     3	casually.
     1	The bustle in a house
     2	The morning after death
     3	Is solemnest of industries
     4	Enacted upon earth,—
     5
     6	The sweeping up the heart,
     7	And putting love away
     8	We shall not want to use again
     9	Until eternity.

If you look at the mk-outs.sh script, you’ll see I execute cat with all these files, individually and together, as regular files and through STDIN, using no flags and with the -n and -b flags. I capture all the outputs to various files in the tests/expected directory to use in testing.

Getting Started with Test-Driven Development

In Chapter 2, I wrote the tests at the end of the chapter because I needed to show you some basics of the language. Starting with this exercise, I want to make you think about test-driven development (TDD) as described in a book by that title written by Kent Beck (Addison-Wesley, 2002). TDD advocates writing tests for code before writing the code as shown in Figure 3-1.

Figure 3-1. The test-driven development cycle
Note

Technically, TDD involves writing each test as you add a feature. Since I’ve written all the tests for you, you might consider this more like test-first development. Once you’ve written code that passes the tests, you can start to refactor your code to improve it, perhaps by shortening the lines of code or by finding a faster implementation.

The challenge program you write should be called catr (pronounced cat-er) for a Rust version of cat. I suggest you begin with cargo new catr to start a new application and then copy my 03_catr/tests directory into your source tree. Don’t copy anything but the tests as you will write the rest of the code yourself. You should have a structure like this:

$ tree -L 2 catr/
catr
├── Cargo.toml
├── src
│   └── main.rs
└── tests
    ├── cli.rs
    ├── expected
    └── inputs

4 directories, 3 files

I’m going to use all the same external crates as in Chapter 2 plus the rand crate for testing, so update your Cargo.toml to this:

[dependencies]
clap = "2.33"

[dev-dependencies]
assert_cmd = "1"
predicates = "1"
rand = "0.8"

Now run cargo test to download the crates, compile your program, and run the tests. All the tests should fail. Your mission, should you choose to accept it, is to write a program that will pass these tests.

Creating a Library Crate

The program in Chapter 2 was quite short and easily fit into src/main.rs. The typical programs you will write in your career will likely be much longer. Starting with this program, I will divide my code into a library in src/lib.rs and a binary in src/main.rs that will call library functions. I believe this organization makes it easier to test and grow applications over time.

First, I’ll move all the important bits from src/main.rs into a function called run in src/lib.rs. This function will return a kind of Result to indicate success or failure. This is similar to the TestResult type alias from Chapter 2. The TestResult always returns the unit type () in the Ok variant, but MyResult can return an Ok that contains any type which I can denote using the generic T:

use std::error::Error; 1

type MyResult<T> = Result<T, Box<dyn Error>>; 2

pub fn run() -> MyResult<()> { 3
    println!("Hello, world!"); 4
    Ok(()) 5
}
1

Import the Error trait for representing error values.

2

Create a MyResult to represent an Ok value for any type T or some Err value that implements the Error trait.

3

Define a public (pub) function that returns either Ok containing the unit type () or some error Err.

4

Print Hello, world!.

5

Return an indication that the function ran successfully.
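To make the generic T in MyResult more concrete, here is a minimal sketch in which the same alias carries a usize in one function and a String in another. The helper names parse_count and describe are hypothetical and exist only for this illustration; they are not part of the challenge program.

```rust
use std::error::Error;

type MyResult<T> = Result<T, Box<dyn Error>>;

// Hypothetical helper: the Ok variant holds a usize.
// The `?` converts the ParseIntError into a Box<dyn Error>.
fn parse_count(text: &str) -> MyResult<usize> {
    let count = text.trim().parse::<usize>()?;
    Ok(count)
}

// Hypothetical helper: the Ok variant holds a String.
fn describe(text: &str) -> MyResult<String> {
    let count = parse_count(text)?;
    if count == 0 {
        // A string literal can be converted into a Box<dyn Error>
        return Err("count must be greater than zero".into());
    }
    Ok(format!("{} line(s)", count))
}
```

Both functions share one error type, so a caller can use ? on either without caring which concrete error was produced.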

Tip

By default, all the variables and functions in a module are private. In the preceding code, I must use pub to make this library function accessible to the rest of the program.

To call this, change src/main.rs to this:

fn main() {
    if let Err(e) = catr::run() { 1
        eprintln!("{}", e); 2
        std::process::exit(1); 3
    }
}
1

Execute the catr::run function and check if the return value matches an Err(e) where e is some value that implements the Error trait, which means among other things that it can be printed.

2

Use the eprintln! (error print line) macro to print the error message e to STDERR.

3

Exit the program with a nonzero value to indicate an error.

Tip

The eprint! and eprintln! macros are just like print! and println! except that they print to STDERR.

If you execute cargo run, you should see Hello, world! as before.

Defining the Parameters

Next, I’ll add the program’s parameters, and I’d like to introduce a struct called Config to represent the arguments to the program. A struct is a data structure in which you define the names and types of the elements it will contain. It’s similar to a class definition in other languages. In this case, I want a struct that describes the values the program will need such as a list of the input filenames and the flags for numbering the lines of output.

Add the following struct to src/lib.rs. It’s common to place such definitions near the top after the use statements:

#[derive(Debug)] 1
pub struct Config { 2
    files: Vec<String>, 3
    number_lines: bool, 4
    number_nonblank_lines: bool, 5
}
1

The derive macro allows me to add the Debug trait so the struct can be printed.

2

Define a struct called Config. The pub (public) makes this accessible outside the library.

3

The files will be a vector of strings.

4

This is a Boolean value to indicate whether or not to print the line numbers.

5

This is a Boolean to control printing line numbers only for nonblank lines.
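As a small illustration of what the derived Debug trait provides, the following sketch builds a Config by hand and renders it as text. The default_config and show functions are hypothetical names used only for this example; the real program will fill the struct from the command-line arguments.

```rust
#[derive(Debug)]
pub struct Config {
    files: Vec<String>,
    number_lines: bool,
    number_nonblank_lines: bool,
}

// Hypothetical helper: the values I would expect when the program
// is run with no arguments.
fn default_config() -> Config {
    Config {
        files: vec!["-".to_string()],
        number_lines: false,
        number_nonblank_lines: false,
    }
}

// Render the struct using the Debug formatter {:?}
fn show(config: &Config) -> String {
    format!("{:?}", config)
}
```

Using {:#?} instead of {:?} produces the multiline, indented form that the dbg! macro prints.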

To use a struct, I can create an instance of it with specific values. In the following sketch of a get_args function, you can see it finishes by creating a new Config with the runtime values from the user. Add use clap::{App, Arg} and this function to your src/lib.rs. Try to complete the function on your own, stealing what you can from Chapter 2:

pub fn get_args() -> MyResult<Config> { 1
    let matches = App::new("catr")... 2

    Ok(Config { 3
        files: ...,
        number_lines: ...,
        number_nonblank_lines: ...,
    })
}
1

This is a public function that returns a MyResult that will contain either a Config on success or an error.

2

Here you should define the parameters and process the matches.

3

Create a Config using the supplied values.

This means the run function needs to be updated to accept a Config argument. For now, print it:

pub fn run(config: Config) -> MyResult<()> { 1
    dbg!(config); 2
    Ok(())
}
1

The function will accept a Config struct and will return Ok with the unit type if successful.

2

Use the dbg! (debug) macro to print the configuration.

Update your src/main.rs as follows:

fn main() {
    if let Err(e) = catr::get_args().and_then(catr::run) { 1
        eprintln!("{}", e); 2
        std::process::exit(1); 3
    }
}
1

Call the catr::get_args function, then use Result::and_then to pass the Ok(config) to catr::run.

2

If either get_args or run returns an Err, print it to STDERR.

3

Exit the program with a nonzero value.
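If Result::and_then is new to you, the following sketch shows the same chaining idea on a smaller scale. The functions parse, double, and parse_and_double are hypothetical stand-ins for get_args and run: the second step executes only when the first returns Ok, and an Err from either step is passed along unchanged.

```rust
// Hypothetical first step: turn text into a number or fail
fn parse(text: &str) -> Result<i32, String> {
    text.parse::<i32>().map_err(|e| format!("{}: {}", text, e))
}

// Hypothetical second step: runs only if the first step succeeded
fn double(n: i32) -> Result<i32, String> {
    n.checked_mul(2).ok_or_else(|| "overflow".to_string())
}

// Chain the two steps; the first Err short-circuits the chain
fn parse_and_double(text: &str) -> Result<i32, String> {
    parse(text).and_then(double)
}
```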

See if you can get your program to print a usage like this:

$ cargo run --quiet -- --help
catr 0.1.0
Ken Youens-Clark <[email protected]>
Rust cat

USAGE:
    catr [FLAGS] <FILE>...

FLAGS:
    -h, --help               Prints help information
    -n, --number             Number lines
    -b, --number-nonblank    Number non-blank lines
    -V, --version            Prints version information

ARGS:
    <FILE>...    Input file(s) [default: -]

With no arguments, your program should be able to print a configuration structure like this:

$ cargo run
[src/lib.rs:52] config = Config {
    files: [ 1
        "-",
    ],
    number_lines: false, 2
    number_nonblank_lines: false,
}
1

The default files should contain “-” for STDIN.

2

The Boolean values should default to false.

Run with arguments and be sure the config looks like this:

$ cargo run -- -n tests/inputs/fox.txt
[src/lib.rs:52] config = Config {
    files: [
        "tests/inputs/fox.txt", 1
    ],
    number_lines: true, 2
    number_nonblank_lines: false,
}
1

The positional file argument is parsed into the files.

2

The -n option causes number_lines to be true.

While the BSD version will allow both -n and -b, the challenge program should consider these to be mutually exclusive and generate an error when used together:

$ cargo run -- -b -n tests/inputs/fox.txt
error: The argument '--number-nonblank' cannot be used with '--number'

Give it a go. Seriously! I want you to try writing your version of this before you read ahead. I’ll wait here until you finish.

All set? Compare what you have to my get_args function:

pub fn get_args() -> MyResult<Config> {
    let matches = App::new("catr")
        .version("0.1.0")
        .author("Ken Youens-Clark <[email protected]>")
        .about("Rust cat")
        .arg(
            Arg::with_name("files") 1
                .value_name("FILE")
                .help("Input file(s)")
                .required(true)
                .default_value("-")
                .min_values(1),
        )
        .arg(
            Arg::with_name("number") 2
                .help("Number lines")
                .short("n")
                .long("number")
                .takes_value(false)
                .conflicts_with("number_nonblank"),
        )
        .arg(
            Arg::with_name("number_nonblank") 3
                .help("Number non-blank lines")
                .short("b")
                .long("number-nonblank")
                .takes_value(false),
        )
        .get_matches();

    Ok(Config {
        files: matches.values_of_lossy("files").unwrap(), 4
        number_lines: matches.is_present("number"), 5
        number_nonblank_lines: matches.is_present("number_nonblank"),
    })
}
1

This positional argument is for the files and is required to have at least one value that defaults to “-”.

2

This is an option that has a short name -n and a long name --number. It does not take a value because it is a flag. When present, it will tell the program to print line numbers. It cannot occur in conjunction with -b.

3

The -b|--number-nonblank flag controls whether to print line numbers for nonblank lines.

4

Because at least one value is required, it should be safe to call Option::unwrap.

5

The two Boolean options are either present or not.

Tip

Optional arguments have short and/or long names, but positional ones do not. You can define optional arguments before or after positional arguments. Defining a positional argument with min_values also implies that it accepts multiple values, but this is not true for optional parameters.

With this much code, you should be able to pass at least a couple of the tests when you execute cargo test. There will be a great deal of output showing you all the failing test output, but don’t despair. You will soon see a fully passing test suite.

Processing the Files

Now that you have validated all the arguments, you are ready to process the files and create the correct output. First modify the run function in src/lib.rs to print each filename:

pub fn run(config: Config) -> MyResult<()> {
    for filename in config.files { 1
        println!("{}", filename); 2
    }
    Ok(())
}
1

Iterate through each filename.

2

Print the filename.

Run the program with some input files. In the following example, the bash shell will expand the file glob *.txt into all filenames that end with the extension .txt:

$ cargo run -- tests/inputs/*.txt 1
tests/inputs/empty.txt
tests/inputs/fox.txt
tests/inputs/spiders.txt
tests/inputs/the-bustle.txt
1

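The shell expands tests/inputs/*.txt into the four matching filenames before passing them to the program.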

Windows PowerShell can expand file globs using Get-ChildItem:

> cargo run -q -- -n (Get-ChildItem .\tests\inputs\*.txt)
C:\Users\kyclark\work\rust-sysprog\03_catr\tests\inputs\empty.txt
C:\Users\kyclark\work\rust-sysprog\03_catr\tests\inputs\fox.txt
C:\Users\kyclark\work\rust-sysprog\03_catr\tests\inputs\spiders.txt
C:\Users\kyclark\work\rust-sysprog\03_catr\tests\inputs\the-bustle.txt

Opening a File or STDIN

The next step is to try to open each filename. When the filename is “-”, I should open STDIN; otherwise, I will attempt to open the given filename and handle errors. For the following code, you will need to expand your imports to the following:

use clap::{App, Arg};
use std::error::Error;
use std::fs::File;
use std::io::{self, BufRead, BufReader};

This next step is a bit tricky, so I’d like to provide an open function for you to use. In the following code, I’m using the match keyword, which is similar to a switch statement in C. Specifically, I’m matching on whether filename is equal to “-” or something else, which is specified using the wildcard _:

fn open(filename: &str) -> MyResult<Box<dyn BufRead>> { 1
    match filename {
        "-" => Ok(Box::new(BufReader::new(io::stdin()))), 2
        _ => Ok(Box::new(BufReader::new(File::open(filename)?))), 3
    }
}
1

The function will accept the filename and will return either an error or a boxed value that implements the BufRead trait.

2

When the filename is “-”, read from std::io::stdin.

3

Otherwise, use File::open to try to open the given file or propagate an error.

If File::open is successful, the result will be a filehandle, which is a mechanism for reading the contents of a file. Once wrapped in a BufReader, both a filehandle and std::io::stdin implement the BufRead trait, which means the values will, for instance, respond to the BufRead::lines function to produce lines of text. Note that BufRead::lines will remove any line endings such as \r\n on Windows and \n on Unix.
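To see the line-ending behavior without touching the filesystem, here is a minimal sketch that reads from an in-memory buffer. It uses std::io::Cursor, which also implements BufRead; the names read_lines and lines_of are hypothetical helpers for this example only.

```rust
use std::io::{BufRead, Cursor};

// Collect the lines from any value that implements BufRead.
// Collecting into io::Result stops at the first error, if any.
fn read_lines(reader: impl BufRead) -> std::io::Result<Vec<String>> {
    reader.lines().collect()
}

// Hypothetical helper: treat a string as if it were an open file
fn lines_of(text: &str) -> std::io::Result<Vec<String>> {
    read_lines(Cursor::new(text.as_bytes()))
}
```

Given the text "one\r\ntwo\nthree", lines_of returns the three strings one, two, and three with neither kind of line ending attached.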

Again you see I’m using a Box to create a pointer to heap-allocated memory to hold the filehandle. You may wonder if this is completely necessary. I could try to write the function without using Box:

// This will not compile
fn open(filename: &str) -> MyResult<dyn BufRead> {
    match filename {
        "-" => Ok(BufReader::new(io::stdin())),
        _ => Ok(BufReader::new(File::open(filename)?)),
    }
}

If I try to compile this code, I get the following error:

error[E0277]: the size for values of type `(dyn std::io::BufRead + 'static)`
cannot be known at compilation time
   --> src/lib.rs:88:28
    |
88  | fn open(filename: &str) -> MyResult<dyn BufRead> {
    |                            ^^^^^^^^^^^^^^^^^^^^^
    |                            doesn't have a size known at compile-time
    |
    = help: the trait `Sized` is not implemented for `(dyn std::io::BufRead
    + 'static)`

As the compiler says, the Sized trait is not implemented for dyn BufRead. If a value doesn’t have a fixed size known at compile time, then Rust can’t store it on the stack. The solution is to instead allocate memory on the heap by putting the return value into a Box, which is a pointer with a known size.
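The following sketch shows the same idea in isolation: one function returns two different concrete reader types behind a single Box<dyn BufRead>, and the caller treats them identically. The names reader_for and count_lines are hypothetical, and the in-memory Cursor and std::io::empty sources are stand-ins I chose so the example needs no files.

```rust
use std::io::{self, BufRead, BufReader, Cursor};

// Hypothetical helper: each match arm builds a different concrete type,
// but both are returned as the same boxed trait object.
fn reader_for(text: Option<&str>) -> Box<dyn BufRead> {
    match text {
        Some(t) => Box::new(Cursor::new(t.as_bytes().to_vec())),
        None => Box::new(BufReader::new(io::empty())),
    }
}

// The caller only needs to know that the value implements BufRead
fn count_lines(reader: Box<dyn BufRead>) -> usize {
    reader.lines().count()
}
```

Because the Box itself has a known size, the compiler no longer needs to know which of the two reader types is inside it.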

The preceding open function is really dense. I can appreciate it if you think that’s more than a little complicated; however, it handles basically any error you will encounter. To demonstrate this, change your run to the following:

pub fn run(config: Config) -> MyResult<()> {
    for filename in config.files { 1
        match open(&filename) { 2
            Err(err) => eprintln!("Failed to open {}: {}", filename, err), 3
            Ok(_) => println!("Opened {}", filename), 4
        }
    }
    Ok(())
}
1

Iterate through the filenames.

2

Try to open the filename. Note the use of & to borrow the variable.

3

Print an error message to STDERR when open fails.

4

Print a successful message when open works.

Try to run your program with the following:

  1. A valid input file

  2. A nonexistent file

  3. An unreadable file

For the last option, you can create a file that cannot be read like so:

$ touch cant-touch-this && chmod 000 cant-touch-this

Run your program and verify your code gracefully prints error messages for bad input files and continues to process the valid ones:

$ cargo run -- blargh cant-touch-this tests/inputs/fox.txt
Failed to open blargh: No such file or directory (os error 2)
Failed to open cant-touch-this: Permission denied (os error 13)
Opened tests/inputs/fox.txt

With this addition, you should be able to pass cargo test skips_bad_file. Now that you are able to open and read valid input files, I want you to finish the program on your own. Can you figure out how to read the opened file line-by-line? Start with tests/inputs/fox.txt, which has only one line. You should be able to see the following output:

$ cargo run -- tests/inputs/fox.txt
The quick brown fox jumps over the lazy dog.

Verify that you can read STDIN by default. In the following command, I will use the | to pipe STDOUT from the first command to the STDIN of the second command:

$ cat tests/inputs/fox.txt | cargo run
The quick brown fox jumps over the lazy dog.

The output should be the same when providing the filename “-”. In the following command, I will use the bash redirect operator < to take input from the given filename and provide it to STDIN:

$ cargo run -- - < tests/inputs/fox.txt
The quick brown fox jumps over the lazy dog.

Next, try an input file with more than one line and try to number the lines for -n:

$ cargo run -- -n tests/inputs/spiders.txt
     1	Don't worry, spiders,
     2	I keep house
     3	casually.

Then try to skip blank lines in the numbering for -b:

$ cargo run -- -b tests/inputs/the-bustle.txt
     1	The bustle in a house
     2	The morning after death
     3	Is solemnest of industries
     4	Enacted upon earth,—

     5	The sweeping up the heart,
     6	And putting love away
     7	We shall not want to use again
     8	Until eternity.

Run cargo test often to see which tests are failing. The tests in tests/cli.rs are similar to Chapter 2, but I’ve added a little more organization. For instance, I define several constant &str values at the top of that module which I use throughout the crate. I use a common convention of ALL_CAPS names to highlight the fact that they are scoped or visible throughout the crate:

const PRG: &str = "catr";
const EMPTY: &str = "tests/inputs/empty.txt";
const FOX: &str = "tests/inputs/fox.txt";
const SPIDERS: &str = "tests/inputs/spiders.txt";
const BUSTLE: &str = "tests/inputs/the-bustle.txt";

To test that the program handles a nonexistent file, I use the rand crate to generate a random filename that does not exist. For the following function, I will add use rand::{distributions::Alphanumeric, Rng} to import the parts of the crate I need:

fn gen_bad_file() -> String { 1
    loop { 2
        let filename: String = rand::thread_rng() 3
            .sample_iter(&Alphanumeric)
            .take(7)
            .map(char::from)
            .collect();

        if fs::metadata(&filename).is_err() { 4
            return filename;
        }
    }
}
1

The function will return a String, which is a dynamically generated string closely related to the str type I’ve been using.

2

Start an infinite loop.

3

Create a random string of seven alphanumeric characters.

4

fs::metadata returns an error when the given filename does not exist, so return the nonexistent filename.

Note

In the preceding function, I use filename two times after creating it. The first time, I borrow it using &filename, and the second time I don’t use the ampersand. The & shows I only want to borrow a reference to the value. Try removing the & and running the code. You should get an error message that ownership of the filename value is moved into fs::metadata. Effectively, the function consumes the value, leaving it unusable:

error[E0382]: use of moved value: `filename`
  --> tests/cli.rs:37:20
   |
30 |         let filename: String = rand::thread_rng()
   |             -------- move occurs because `filename` has type `String`,
   |                      which does not implement the `Copy` trait
...
36 |         if fs::metadata(filename).is_err() {
   |                         -------- value moved here
37 |             return filename;
   |                    ^^^^^^^^ value used here after move

Don’t worry if you don’t completely understand the gen_bad_file function yet. I’m only showing this so you understand how it is used in the skips_bad_file test:

#[test]
fn skips_bad_file() -> TestResult {
    let bad = gen_bad_file(); 1
    let expected = format!("{}: .* [(]os error 2[)]", bad); 2
    Command::cargo_bin(PRG)? 3
        .arg(&bad)
        .assert()
        .success() 4
        .stderr(predicate::str::is_match(expected)?);
    Ok(())
}
1

Generate the name of a nonexistent file.

2

The expected error message should include the filename and the string “os error 2” on both Windows and Unix platforms.

3

Run the program with the bad file and verify that STDERR matches the expected pattern.

4

The command should succeed as bad files should only generate warnings and not kill the process.

Tip

In the preceding function, I used the format! macro to generate a new String. This macro works like print! except that it returns the value rather than printing it.
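As a quick sketch of format! on its own, the following function builds the same pattern string used in the test. The name expected_pattern is hypothetical; the test simply calls format! inline.

```rust
// Hypothetical helper: build the regular expression the test expects
// to find on STDERR for the given filename.
fn expected_pattern(filename: &str) -> String {
    // format! returns a new String instead of printing it
    format!("{}: .* [(]os error 2[)]", filename)
}
```

Calling expected_pattern("blargh") returns the String blargh: .* [(]os error 2[)], which can then be handed to predicate::str::is_match.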

I created a run helper function to run the program with input arguments and verify that the output matches the text in the file generated by mk-outs.sh:

fn run(args: &[&str], expected_file: &str) -> TestResult { 1
    let expected = fs::read_to_string(expected_file)?; 2
    Command::cargo_bin(PRG)? 3
        .args(args)
        .assert()
        .success()
        .stdout(expected);
    Ok(())
}
1

The function accepts a slice of &str arguments and the filename with the expected output. The function returns a TestResult.

2

Try to read the expected output file.

3

Execute the program with the arguments and verify it runs successfully and produces the expected output.

I use this function like so:

#[test]
fn bustle() -> TestResult {
    run(&[BUSTLE], "tests/expected/the-bustle.txt.out") 1
}
1

Run the program with the BUSTLE input file and verify that the output matches the output produced by mk-outs.sh.

I also wrote a helper function to provide input via STDIN:

fn run_stdin(
    input_file: &str, 1
    args: &[&str],
    expected_file: &str,
) -> TestResult {
    let input = fs::read_to_string(input_file)?; 2
    let expected = fs::read_to_string(expected_file)?;
    Command::cargo_bin(PRG)? 3
        .args(args)
        .write_stdin(input)
        .assert()
        .success()
        .stdout(expected);
    Ok(())
}
1

The first argument is the filename containing the text that should be given to STDIN.

2

Try to read the input and expected files.

3

Try to run the program with the given arguments and STDIN and verify the output.

This function is used similarly:

#[test]
fn bustle_stdin() -> TestResult {
    run_stdin(BUSTLE, &["-"], "tests/expected/the-bustle.txt.stdin.out") 1
}
1

Run the program using the contents of the given filename as STDIN and an input filename of “-”. Verify the output matches the expected value.

That should be enough to get started. Off you go! Come back when you’re done.

Solution

I hope you found this an interesting and challenging program to write. It’s important to tackle complicated programs one step at a time. I’ll show you how I built my program in this way.

Reading the Lines in a File

I started with printing the lines of an open filehandle:

pub fn run(config: Config) -> MyResult<()> {
    for filename in config.files {
        match open(&filename) {
            Err(err) => eprintln!("{}: {}", filename, err), 1
            Ok(file) => {
                for line_result in file.lines() { 2
                    let line = line_result?; 3
                    println!("{}", line); 4
                }
            }
        }
    }
    Ok(())
}
1

Print the filename and error when there is a problem opening a file.

2

Iterate over each line_result value from BufRead::lines.

3

Either unpack an Ok value from line_result or propagate an error.

4

Print the line.

Note

When reading the lines from a file, you don’t get the lines directly from the filehandle but instead get a std::io::Result, which is a type that “is broadly used across std::io for any operation which may produce an error.” Reading and writing files falls into the category of IO (input/output), which depends on external resources like the operating and file systems. While it’s unlikely that reading a line from a filehandle will fail, the point is that it could fail.
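One way to watch a line read fail is to feed BufRead::lines bytes that are not valid UTF-8. The sketch below does this with an in-memory std::io::Cursor rather than a real file; first_line is a hypothetical helper written only for this demonstration.

```rust
use std::io::{self, BufRead, Cursor};

// Return the first line if there is one, propagating any error with `?`
fn first_line(reader: impl BufRead) -> io::Result<Option<String>> {
    match reader.lines().next() {
        Some(line_result) => Ok(Some(line_result?)),
        None => Ok(None),
    }
}

// Hypothetical helper: the bytes 0xff 0xfe are not valid UTF-8,
// so reading them as a line of text produces an Err.
fn first_line_of_bad_input() -> io::Result<Option<String>> {
    first_line(Cursor::new(vec![0xff, 0xfe, b'\n']))
}
```

With valid text, first_line returns Ok with the line; with the invalid bytes, it returns an Err whose kind is io::ErrorKind::InvalidData.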

If you run cargo test, you should pass about half of the tests, which is not bad for so few lines of code.

Printing Line Numbers

Next, I’d like to add the printing of line numbers for the -n|--number option. One solution that will likely be familiar to C programmers would be something like this:

pub fn run(config: Config) -> MyResult<()> {
    for filename in config.files {
        match open(&filename) {
            Err(err) => eprintln!("{}: {}", filename, err),
            Ok(file) => {
                let mut line_num = 0; 1
                for line_result in file.lines() {
                    let line = line_result?;
                    line_num += 1; 2

                    if config.number_lines { 3
                        println!("{:>6}\t{}", line_num, line); 4
                    } else {
                        println!("{}", line); 5
                    }
                }
            }
        }
    }
    Ok(())
}
1

Initialize a mutable counter variable to hold the line number.

2

Add 1 to the line number.

3

Check whether to print line numbers.

4

If so, print the current line number in a right-justified field 6 characters wide followed by a tab character, and then the line of text.

5

Otherwise, print the line.

Recall that all variables in Rust are immutable by default, so it’s necessary to add mut to line_num as I intend to change it. The += operator is a compound assignment that adds the righthand value 1 to line_num to increment it (see footnote 1 at the end of the chapter). Of note, too, is the formatting syntax {:>6} that indicates the width of the field as six characters with the text aligned to the right. (You can use < for left-justified and ^ for centered text.) This syntax is similar to printf in C, Perl, and Python’s string formatting.
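As a small illustration of the alignment flags, the following sketch formats the same number three ways in a field six characters wide. The function align_in_six is a hypothetical name used only for this example:

```rust
/// Hypothetical helper: return `n` right-, left-, and center-aligned
/// in a field six characters wide.
fn align_in_six(n: usize) -> (String, String, String) {
    (
        format!("{:>6}", n), // right-justified
        format!("{:<6}", n), // left-justified
        format!("{:^6}", n), // centered
    )
}

fn main() {
    // `mut` is required because the counter changes after it is created.
    let mut line_num = 0;
    // Compound assignment adds 1 to the counter.
    line_num += 1;
    println!("{:>6}\t{}", line_num, "first line");

    // The brackets make the padding visible.
    let (right, left, center) = align_in_six(42);
    println!("[{}] [{}] [{}]", right, left, center);
    // prints "[    42] [42    ] [  42  ]"
}
```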

If I run this version of the program, it looks pretty good:

$ cargo run -- tests/inputs/spiders.txt -n
     1	Don't worry, spiders,
     2	I keep house
     3	casually.

While this works adequately, I’d like to point out a more idiomatic solution using Iterator::enumerate. This method will return a tuple containing the index position and value for each element in an iterable, which is something that can produce values until exhausted:

pub fn run(config: Config) -> MyResult<()> {
    for filename in config.files {
        match open(&filename) {
            Err(err) => eprintln!("{}: {}", filename, err),
            Ok(file) => {
                for (line_num, line_result) in file.lines().enumerate() { 1
                    let line = line_result?;
                    if config.number_lines {
                        println!("{:>6}\t{}", line_num + 1, line); 2
                    } else {
                        println!("{}", line);
                    }
                }
            }
        }
    }
    Ok(())
}
1

The tuple values from Iterator::enumerate can be unpacked using pattern matching.

2

Numbering from enumerate starts at 0, so add 1 to mimic cat which starts at 1.
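To see enumerate in isolation, consider this minimal sketch that numbers the lines of a string held in memory. The function number_lines is a hypothetical name, and it uses str::lines (which yields plain &str values) rather than BufRead::lines, so there is no Result to unpack:

```rust
/// Hypothetical helper: number each line of `text` starting from 1,
/// in the style of `cat -n`.
fn number_lines(text: &str) -> Vec<String> {
    text.lines()
        .enumerate() // yields (0, first), (1, second), ...
        .map(|(index, line)| format!("{:>6}\t{}", index + 1, line))
        .collect()
}

fn main() {
    let text = "Don't worry, spiders,\nI keep house\ncasually.\n";
    for line in number_lines(text) {
        println!("{}", line);
    }
}
```

The closure unpacks each tuple with the same kind of pattern used in the for loop above.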

This will create the same output, but now the code avoids using a mutable value. I can execute cargo test fox to run all the tests with fox in their names, and I find that two out of three pass. The program fails on the -b flag, so next I need to handle printing the line numbers only for nonblank lines. Notice that in this version, I’m also going to remove line_result and shadow the line variable:

pub fn run(config: Config) -> MyResult<()> {
    for filename in config.files {
        match open(&filename) {
            Err(err) => eprintln!("{}: {}", filename, err),
            Ok(file) => {
                let mut last_num = 0; 1
                for (line_num, line) in file.lines().enumerate() {
                    let line = line?; 2
                    if config.number_lines { 3
                        println!("{:>6}\t{}", line_num + 1, line);
                    } else if config.number_nonblank_lines { 4
                        if !line.is_empty() {
                            last_num += 1;
                            println!("{:>6}\t{}", last_num, line); 5
                        } else {
                            println!(); 6
                        }
                    } else {
                        println!("{}", line); 7
                    }
                }
            }
        }
    }
    Ok(())
}
1

Initialize a mutable variable for the number of the last nonblank line.

2

Shadow the line with the result of unpacking the Result.

3

Handle printing line numbers.

4

Handle printing line numbers for nonblank lines.

5

If the line is not empty, increment last_num and print the output.

6

If the line is empty, print a blank line.

7

If there are no numbering options, print the line.
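The nonblank numbering logic can also be tried out apart from the rest of the program. The following is only a sketch under my own assumptions: the function number_nonblank is a hypothetical name, and it returns the formatted lines instead of printing them so the result is easy to inspect:

```rust
use std::io::{self, BufRead, Cursor};

/// Hypothetical helper: number only the nonblank lines,
/// in the style of `cat -b`.
fn number_nonblank(reader: impl BufRead) -> io::Result<Vec<String>> {
    let mut last_num = 0; // number of the last nonblank line seen
    let mut output = Vec::new();
    for line in reader.lines() {
        let line = line?; // shadow the Result with the unpacked String
        if line.is_empty() {
            // Blank lines are kept but do not advance the counter.
            output.push(String::new());
        } else {
            last_num += 1;
            output.push(format!("{:>6}\t{}", last_num, line));
        }
    }
    Ok(output)
}

fn main() -> io::Result<()> {
    // A Cursor stands in for an open filehandle.
    let input = Cursor::new("first\n\nsecond\n");
    for line in number_nonblank(input)? {
        println!("{}", line);
    }
    Ok(())
}
```

The counter must be mutable here because the line number no longer matches the index from enumerate once a blank line is skipped.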

Note

Shadowing a variable in Rust is when you reuse a variable’s name and set it to a new value. Arguably the line_result/line code may be more explicit and readable, but reusing line in this context is the more Rustic style that you’re likely to encounter.
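Here is a minimal sketch of shadowing outside of the cat program. The function parse_width is a hypothetical name. Note that each new binding may have a different type from the one it shadows, just as line changes from a Result to a String:

```rust
use std::num::ParseIntError;

/// Hypothetical helper: parse a field width from user input.
fn parse_width(input: &str) -> Result<usize, ParseIntError> {
    let width = input.trim(); // `width` is a &str
    let width = width.parse::<usize>()?; // now `width` is a usize
    Ok(width)
}

fn main() {
    println!("{:?}", parse_width(" 6 "));
    println!("{:?}", parse_width("six"));
}
```

Each let creates a new variable, so no mut is needed even though the name is reused.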

If you run cargo test, you should pass all the tests.

Going Further

You have a working program now, but you don’t have to stop there. If you’re up for an additional challenge, try implementing the other options shown in the manual pages for both the BSD and GNU versions. For each option, use cat to create the expected output file, then expand the tests to check that your program creates this same output. I’d also recommend you check out bat, which is another Rust clone of cat (“with wings”), for a more complete implementation.

The numbered output of cat -n is similar in some ways to that of nl, a “line numbering filter” program. cat is also a bit similar to programs that will show you a page or screenful of text at a time, so-called pagers like more and less. (more would show you a page of text with “More” at the bottom to let you know you could continue. Obviously someone decided to be clever and named their clone less, but it does the same thing.) Consider implementing both of those programs. Read the manual pages, create the test output, and copy the ideas from this project to write and test your versions.

Summary

You made big strides in this chapter, creating a much more complex program. Consider what you learned:

  • You separated your code into library (src/lib.rs) and binary (src/main.rs) crates, which can make it easier to organize and encapsulate ideas.

  • You created your first struct, which is a bit like a class declaration in other languages. This struct allowed you to create a complex data structure called Config to describe the inputs for your program.

  • By default, all variables are immutable and all functions are private. You learned to use mut to make a variable mutable and pub to make a value or function public.

  • You used a testing-first approach where all the tests exist before the program is even written. When the program passes all the tests, you can be confident your program meets all the specifications encoded in the tests.

  • You saw how to use the rand crate to generate a random string for a nonexistent file.

  • You figured out how to read lines of text from both STDIN and regular files.

  • You used the eprintln! macro to print to STDERR and format! to dynamically generate a new string.

  • You used a for loop to visit each element in an iterable.

  • You found that the Iterator::enumerate method will return both the index and element as a tuple, which was useful for numbering the lines of text.

  • You learned to use a Box that points to a filehandle to read either STDIN or a regular file.

In the next chapter, you’ll learn a good deal more about reading files by lines, bytes, or characters.

1 Note that Rust does not have a unary ++ operator, so you cannot use line_num++ to increment a variable by 1.
