When you are alone, you are the cat, you are the phone. You are an animal.
They Might Be Giants
In this chapter, the challenge is to write a clone of the cat
program, which is so named because it can concatenate many files into one file.
That is, given files a, b, and c, you could execute cat a b c > all
to stream all the lines from these three files and redirect them into a file called all.
The program will accept an option to prefix each line with the line number.
In this chapter, you’ll learn:
How to organize your code into a library and a binary crate
How to use testing-first development
The difference between public and private variables and functions
How to test for the existence of a file
How to create a random string for a file that does not exist
How to read regular files or STDIN
(pronounced standard in)
How to use eprintln!
to print to STDERR
and format!
to format a string
How to write a test that provides input on STDIN
How and why to create a struct
How to define mutually exclusive arguments
How to use the enumerate
method of an iterator
More about how and why to use a Box
I’ll start by showing how cat
works so that you know what is expected of the challenge.
The BSD version of cat
does not respond to --help
, so I must use man cat
to read the manual page.
For such a limited program, it has a surprising number of options:
CAT(1)                    BSD General Commands Manual                   CAT(1)

NAME
     cat -- concatenate and print files

SYNOPSIS
     cat [-benstuv] [file ...]

DESCRIPTION
     The cat utility reads files sequentially, writing them to the standard
     output.  The file operands are processed in command-line order.  If file
     is a single dash ('-') or absent, cat reads from the standard input.  If
     file is a UNIX domain socket, cat connects to it and then reads it until
     EOF.  This complements the UNIX domain binding capability available in
     inetd(8).

     The options are as follows:

     -b      Number the non-blank output lines, starting at 1.

     -e      Display non-printing characters (see the -v option), and display
             a dollar sign ('$') at the end of each line.

     -n      Number the output lines, starting at 1.

     -s      Squeeze multiple adjacent empty lines, causing the output to be
             single spaced.

     -t      Display non-printing characters (see the -v option), and display
             tab characters as '^I'.

     -u      Disable output buffering.

     -v      Display non-printing characters so they are visible.  Control
             characters print as '^X' for control-X; the delete character
             (octal 0177) prints as '^?'.  Non-ASCII characters (with the high
             bit set) are printed as 'M-' (for meta) followed by the character
             for the low 7 bits.

EXIT STATUS
     The cat utility exits 0 on success, and >0 if an error occurs.
The GNU version does respond to --help
:
$ cat --help
Usage: cat [OPTION]... [FILE]...
Concatenate FILE(s), or standard input, to standard output.

  -A, --show-all           equivalent to -vET
  -b, --number-nonblank    number nonempty output lines, overrides -n
  -e                       equivalent to -vE
  -E, --show-ends          display $ at end of each line
  -n, --number             number all output lines
  -s, --squeeze-blank      suppress repeated empty output lines
  -t                       equivalent to -vT
  -T, --show-tabs          display TAB characters as ^I
  -u                       (ignored)
  -v, --show-nonprinting   use ^ and M- notation, except for LFD and TAB
      --help     display this help and exit
      --version  output version information and exit

With no FILE, or when FILE is -, read standard input.

Examples:
  cat f - g  Output f's contents, then standard input, then g's contents.
  cat        Copy standard input to standard output.

GNU coreutils online help: <http://www.gnu.org/software/coreutils/>
For complete documentation, run: info coreutils 'cat invocation'
The BSD version predates the GNU version, so the latter implements all the same short flags to be compatible. As is typical of GNU programs, it also offers long flag aliases like --number
for -n
and --number-nonblank
for -b
. I will show you how to offer both options like the GNU version.
For the challenge program, I will only implement the options -b|--number-nonblank
and -n|--number
.
I will also show how to read regular files and STDIN
when given a filename argument of “-”.
I’ve put four files for testing into the 03_catr/tests/inputs directory:
empty.txt: an empty file
fox.txt: a single line of text
spiders.txt: three lines of text
the-bustle.txt: a lovely poem by Emily Dickinson that has nine lines including one blank
Empty files are common, if useless. I include this to ensure my program can gracefully handle unexpected input. That is, I want my program to at least not fall over. The following command produces no output, so I expect my program to do the same:
$ cd 03_catr
$ cat tests/inputs/empty.txt
Next, I’ll run cat
on a file with one line of text:
$ cat tests/inputs/fox.txt
The quick brown fox jumps over the lazy dog.
The -n|--number
and -b|--number-nonblank
flags will both number the lines, and the line number is right-justified in a field six characters wide followed by a tab character and then the line of text.
To distinguish the tab character, I can use the -t
option to display non-printing characters so that the tab shows as ^I
.
In the following command, I use the Unix pipe |
to connect STDOUT
from the first command to STDIN
in the second command:
$ cat -n tests/inputs/fox.txt | cat -t
     1^IThe quick brown fox jumps over the lazy dog.
The spiders.txt file has three lines of text which should be numbered with the -b
option:
$ cat -b tests/inputs/spiders.txt
     1  Don't worry, spiders,
     2  I keep house
     3  casually.
The difference between -n
(on the left) and -b
(on the right) is apparent only with the-bustle.txt as the latter will only number nonblank lines:
$ cat -n tests/inputs/the-bustle.txt         $ cat -b tests/inputs/the-bustle.txt
     1  The bustle in a house                     1  The bustle in a house
     2  The morning after death                   2  The morning after death
     3  Is solemnest of industries                3  Is solemnest of industries
     4  Enacted upon earth,—                      4  Enacted upon earth,—
     5
     6  The sweeping up the heart,                5  The sweeping up the heart,
     7  And putting love away                     6  And putting love away
     8  We shall not want to use again            7  We shall not want to use again
     9  Until eternity.                           8  Until eternity.
Oddly, you can use -b
and -n
together, and the -b
option takes precedence. The challenge program will allow only one or the other.
When processing any file that does not exist or cannot be opened, cat
will print a message to STDERR
and move to the next file.
In the following example, I’m using blargh as a nonexistent file.
I create the file cant-touch-this using the touch
command and use the chmod
command to set the file with the permissions that make it unreadable.
You’ll learn more about what the 000
means in Chapter 15 when you write a clone of ls
:
$ touch cant-touch-this && chmod 000 cant-touch-this
$ cat tests/inputs/fox.txt blargh tests/inputs/spiders.txt cant-touch-this
The quick brown fox jumps over the lazy dog.
cat: blargh: No such file or directory
Don't worry, spiders,
I keep house
casually.
cat: cant-touch-this: Permission denied
This is the output from the first file.
This is an error for a nonexistent file.
This is the output from the third file.
This is the error for an unreadable file.
Finally, run cat
with all the files and notice that it starts renumbering the lines for each file:
$ cd tests/inputs
$ cat -n empty.txt fox.txt spiders.txt the-bustle.txt
     1  The quick brown fox jumps over the lazy dog.
     1  Don't worry, spiders,
     2  I keep house
     3  casually.
     1  The bustle in a house
     2  The morning after death
     3  Is solemnest of industries
     4  Enacted upon earth,—
     5
     6  The sweeping up the heart,
     7  And putting love away
     8  We shall not want to use again
     9  Until eternity.
If you look at the mk-outs.sh script, you’ll see I execute cat
with all these files, individually and together, as regular files and through STDIN
, using no flags and with the -n
and -b
flags.
I capture all the outputs to various files in the tests/expected directory to use in testing.
In Chapter 2, I wrote the tests at the end of the chapter because I needed to show you some basics of the language. Starting with this exercise, I want to make you think about test-driven development (TDD) as described in a book by that title written by Kent Beck (Addison-Wesley, 2002). TDD advocates writing tests for code before writing the code as shown in Figure 3-1.
Technically, TDD involves writing each test as you add a feature. Since I’ve written all the tests for you, you might consider this more like test-first development. Once you’ve written code that passes the tests, you can start to refactor your code to improve it, perhaps by shortening the lines of code or by finding a faster implementation.
The challenge program you write should be called catr
(pronounced cat-er) for a Rust version of cat
.
I suggest you begin with cargo new catr
to start a new application and then copy my 03_catr/tests directory into your source tree.
Don’t copy anything but the tests as you will write the rest of the code yourself.
You should have a structure like this:
$ tree -L 2 catr/
catr
├── Cargo.toml
├── src
│   └── main.rs
└── tests
    ├── cli.rs
    ├── expected
    └── inputs

4 directories, 3 files
I’m going to use all the same external crates as in Chapter 2 plus the rand
crate for testing, so update your Cargo.toml to this:
[dependencies]
clap = "2.33"

[dev-dependencies]
assert_cmd = "1"
predicates = "1"
rand = "0.8"
Now run cargo test
to download the crates, compile your program, and run the tests.
All the tests should fail.
Your mission, should you choose to accept it, is to write a program that will pass these tests.
The program in Chapter 2 was quite short and easily fit into src/main.rs. The typical programs you will write in your career will likely be much longer. Starting with this program, I will divide my code into a library in src/lib.rs and a binary in src/main.rs that will call library functions. I believe this organization makes it easier to test and grow applications over time.
First, I’ll move all the important bits from src/main.rs into a function called run
in src/lib.rs.
This function will return a kind of Result
to indicate success or failure.
This is similar to the TestResult
type alias from Chapter 2.
The TestResult
always returns the unit type ()
in the Ok
variant, but MyResult
can return an Ok
that contains any type which I can denote using the generic T
:
use std::error::Error;

type MyResult<T> = Result<T, Box<dyn Error>>;

pub fn run() -> MyResult<()> {
    println!("Hello, world!");
    Ok(())
}
Import the Error
trait for representing error values.
Create a MyResult
to represent an Ok
value for any type T
or some Err
value that implements the Error
trait.
Define a public (pub
) function that returns either Ok
containing the unit type ()
or some error Err
.
Print Hello, world!.
Return an indication that the function ran successfully.
By default, all the variables and functions in a module are private. In the preceding code, I must use pub
to make this library function accessible to the rest of the program.
To call this, change src/main.rs to this:
fn main() {
    if let Err(e) = catr::run() {
        eprintln!("{}", e);
        std::process::exit(1);
    }
}
Execute the catr::run
function and check if the return value matches an Err(e)
where e
is some value that implements the Error
trait, which means among other things that it can be printed.
Use the eprintln!
(error print line) macro to print the error message e
to STDERR
.
Exit the program with a nonzero value to indicate an error.
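To make the public/private distinction concrete, here is a minimal sketch that uses an inline module as a stand-in for src/lib.rs. The module name library and the helper greeting are hypothetical names chosen only for illustration; they are not part of the challenge program.

```rust
// An inline module standing in for src/lib.rs (hypothetical names).
mod library {
    // Private by default: callable only from inside `library`.
    fn greeting() -> String {
        String::from("Hello, world!")
    }

    // `pub` makes this function callable from outside the module.
    pub fn run() -> String {
        greeting()
    }
}

fn main() {
    // The public function is reachable from here.
    println!("{}", library::run());

    // Uncommenting the next line should fail to compile with error E0603
    // because `greeting` is private to the module.
    // println!("{}", library::greeting());
}
```

The private helper is still usable indirectly through the public function, which is the kind of encapsulation the library and binary split is meant to encourage.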
The eprint!
and eprintln!
macros are just like print!
and println!
except that they print to STDERR
.
If you execute cargo run
, you should see Hello, world! as before.
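As a small illustration of how the generic T in MyResult works together with eprintln!, the following sketch defines a hypothetical parse_count function (not part of catr) that returns a usize on success. It assumes nothing beyond the standard library.

```rust
use std::error::Error;

// Same alias as in the chapter: any Ok type T, any boxed error.
type MyResult<T> = Result<T, Box<dyn Error>>;

// Hypothetical helper: parse a count from text.
fn parse_count(text: &str) -> MyResult<usize> {
    // `?` converts the ParseIntError into a Box<dyn Error>.
    let count = text.trim().parse::<usize>()?;
    Ok(count)
}

fn main() {
    match parse_count("42") {
        Ok(n) => println!("count = {}", n), // printed to STDOUT
        Err(e) => eprintln!("{}", e),       // printed to STDERR
    }

    if let Err(e) = parse_count("forty-two") {
        eprintln!("{}", e); // printed to STDERR
    }
}
```

If you redirect STDOUT to a file when running this, only the successful count should land in the file, while the error message still appears on the terminal.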
Next, I’ll add the program’s parameters, and I’d like to introduce a struct
called Config
to represent the arguments to the program.
A struct
is a data structure in which you define the names and types of the elements it will contain.
It’s similar to a class definition in other languages.
In this case, I want a struct
that describes the values the program will need such as a list of the input filenames and the flags for numbering the lines of output.
Add the following struct
to src/lib.rs.
It’s common to place such definitions near the top after the use
statements:
#[derive(Debug)]
pub struct Config {
    files: Vec<String>,
    number_lines: bool,
    number_nonblank_lines: bool,
}
The derive
macro allows me to add the Debug
trait so the struct can be printed.
Define a struct
called Config
. The pub
(public) makes this accessible outside the library.
The files
will be a vector of strings.
This is a Boolean value to indicate whether or not to print the line numbers.
This is a Boolean to control printing line numbers only for nonblank lines.
To use a struct
, I can create an instance of it with specific values.
In the following sketch of a get_args
function, you can see it finishes by creating a new Config
with the runtime values from the user.
Add use clap::{App, Arg}
and this function to your src/lib.rs.
Try to complete the function on your own, stealing what you can from Chapter 2:
pub fn get_args() -> MyResult<Config> {
    let matches = App::new("catr")...

    Ok(Config {
        files: ...,
        number_lines: ...,
        number_nonblank_lines: ...,
    })
}
This is a public function that returns a MyResult
that will contain either a Config
on success or an error.
Here you should define the parameters and process the matches.
Create a Config
using the supplied values.
This means the run
function needs to be updated to accept a Config
argument.
For now, print it:
pub fn run(config: Config) -> MyResult<()> {
    dbg!(config);
    Ok(())
}
The function will accept a Config
struct and will return Ok
with the unit type if successful.
Use the dbg!
(debug) macro to print the configuration.
Update your src/main.rs as follows:
fn main() {
    if let Err(e) = catr::get_args().and_then(catr::run) {
        eprintln!("{}", e);
        std::process::exit(1);
    }
}
Call the catr::get_args
function, then use Result::and_then
to pass the Ok(config)
to catr::run
.
If either get_args
or run
returns an Err
, print it to STDERR
.
Exit the program with a nonzero value.
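If Result::and_then is new to you, this sketch shows the same chaining pattern with two stand-in functions. The names get_files and count_files are hypothetical and only mimic the shape of get_args and run: the second function runs only when the first returns Ok.

```rust
use std::error::Error;

type MyResult<T> = Result<T, Box<dyn Error>>;

// Stand-in for get_args: fails when no filenames are given.
fn get_files(args: &[&str]) -> MyResult<Vec<String>> {
    if args.is_empty() {
        return Err("no files given".into());
    }
    Ok(args.iter().map(|s| s.to_string()).collect())
}

// Stand-in for run: receives the Ok value from the previous step.
fn count_files(files: Vec<String>) -> MyResult<usize> {
    Ok(files.len())
}

fn main() {
    // Both steps succeed, so the final value is the Ok from count_files.
    match get_files(&["a", "b"]).and_then(count_files) {
        Ok(n) => println!("{} file(s)", n),
        Err(e) => eprintln!("{}", e),
    }

    // The first step fails, so count_files is never called.
    if let Err(e) = get_files(&[]).and_then(count_files) {
        eprintln!("{}", e);
    }
}
```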
See if you can get your program to print a usage like this:
$ cargo run --quiet -- --help
catr 0.1.0
Ken Youens-Clark <[email protected]>
Rust cat

USAGE:
    catr [FLAGS] <FILE>...

FLAGS:
    -h, --help               Prints help information
    -n, --number             Number lines
    -b, --number-nonblank    Number non-blank lines
    -V, --version            Prints version information

ARGS:
    <FILE>...    Input file(s) [default: -]
With no arguments, your program should be able to print a configuration structure like this:
$ cargo run
[src/lib.rs:52] config = Config {
    files: [
        "-",
    ],
    number_lines: false,
    number_nonblank_lines: false,
}
Run with arguments and be sure the config
looks like this:
$ cargo run -- -n tests/inputs/fox.txt
[src/lib.rs:52] config = Config {
    files: [
        "tests/inputs/fox.txt",
    ],
    number_lines: true,
    number_nonblank_lines: false,
}
The positional file argument is parsed into the files
.
The -n
option causes number_lines
to be true
.
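The pretty-printed structure shown by dbg! comes from the derived Debug trait. The following sketch builds a Config by hand, without clap, to show how an instance is created and printed. The helper functions default_config and summary are hypothetical additions for illustration only.

```rust
#[derive(Debug)]
pub struct Config {
    files: Vec<String>,
    number_lines: bool,
    number_nonblank_lines: bool,
}

// Hypothetical helper: the configuration expected with no arguments.
fn default_config() -> Config {
    Config {
        files: vec!["-".to_string()],
        number_lines: false,
        number_nonblank_lines: false,
    }
}

// Hypothetical helper: read each field to build a short description.
fn summary(config: &Config) -> String {
    format!(
        "{} file(s), -n={}, -b={}",
        config.files.len(),
        config.number_lines,
        config.number_nonblank_lines
    )
}

fn main() {
    let config = default_config();
    println!("{:?}", config);  // compact Debug output on one line
    println!("{:#?}", config); // pretty Debug output, one field per line
    println!("{}", summary(&config));
}
```

The {:#?} form is the multi-line layout that dbg! uses, while {:?} keeps everything on one line.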
While the BSD version will allow both -n
and -b
, the challenge program should consider these to be mutually exclusive and generate an error when used together:
$ cargo run -- -b -n tests/inputs/fox.txt
error: The argument '--number-nonblank' cannot be used with '--number'
Give it a go. Seriously! I want you to try writing your version of this before you read ahead. I’ll wait here until you finish.
All set?
Compare what you have to my get_args
function:
pub fn get_args() -> MyResult<Config> {
    let matches = App::new("catr")
        .version("0.1.0")
        .author("Ken Youens-Clark <[email protected]>")
        .about("Rust cat")
        .arg(
            Arg::with_name("files")
                .value_name("FILE")
                .help("Input file(s)")
                .required(true)
                .default_value("-")
                .min_values(1),
        )
        .arg(
            Arg::with_name("number")
                .help("Number lines")
                .short("n")
                .long("number")
                .takes_value(false)
                .conflicts_with("number_nonblank"),
        )
        .arg(
            Arg::with_name("number_nonblank")
                .help("Number non-blank lines")
                .short("b")
                .long("number-nonblank")
                .takes_value(false),
        )
        .get_matches();

    Ok(Config {
        files: matches.values_of_lossy("files").unwrap(),
        number_lines: matches.is_present("number"),
        number_nonblank_lines: matches.is_present("number_nonblank"),
    })
}
This positional argument is for the files and is required to have at least one value that defaults to “-”.
This is an option that has a short
name -n
and a long
name --number
. It does not take a value because it is a flag. When present, it will tell the program to print line numbers. It cannot occur in conjunction with -b
.
The -b|--number-nonblank
flag controls whether to print line numbers for nonblank lines.
Because at least one value is required, it should be safe to call Option::unwrap
.
The two Boolean options are either present or not.
Optional arguments have short
and/or long
names, but positional ones do not. You can define optional arguments before or after positional arguments. Defining positional arguments with min_values
also implies multiple
values, but this is not the case for optional parameters.
With this much code, you should be able to pass at least a couple of the tests when you execute cargo test
.
There will be a great deal of output showing you all the failing test output, but don’t despair.
You will soon see a fully passing test suite.
Now that you have validated all the arguments, you are ready to process the files and create the correct output.
First modify the run
function in src/lib.rs to print each filename:
pub fn run(config: Config) -> MyResult<()> {
    for filename in config.files {
        println!("{}", filename);
    }
    Ok(())
}
Run the program with some input files.
In the following example, the bash
shell will expand the file glob *.txt into all filenames that end with the extension .txt:
$ cargo run -- tests/inputs/*.txt
tests/inputs/empty.txt
tests/inputs/fox.txt
tests/inputs/spiders.txt
tests/inputs/the-bustle.txt
Windows PowerShell can expand file globs using Get-ChildItem
:
> cargo run -q -- -n (Get-ChildItem .\tests\inputs\*.txt)
C:\Users\kyclark\work\rust-sysprog\03_catr\tests\inputs\empty.txt
C:\Users\kyclark\work\rust-sysprog\03_catr\tests\inputs\fox.txt
C:\Users\kyclark\work\rust-sysprog\03_catr\tests\inputs\spiders.txt
C:\Users\kyclark\work\rust-sysprog\03_catr\tests\inputs\the-bustle.txt
The next step is to try to open each filename.
When the filename is “-”, I should open STDIN
; otherwise, I will attempt to open the given filename and handle errors.
For the following code, you will need to expand your imports to the following:
use clap::{App, Arg};
use std::error::Error;
use std::fs::File;
use std::io::{self, BufRead, BufReader};
This next step is a bit tricky, so I’d like to provide an open
function for you to use.
In the following code, I’m using the match
keyword, which is similar to a switch
statement in C.
Specifically, I’m matching on whether filename
is equal to “-” or something else, which is specified using the wildcard _
:
fn open(filename: &str) -> MyResult<Box<dyn BufRead>> {
    match filename {
        "-" => Ok(Box::new(BufReader::new(io::stdin()))),
        _ => Ok(Box::new(BufReader::new(File::open(filename)?))),
    }
}
The function will accept the filename and will return either an error or a boxed value that implements the BufRead
trait.
When the filename is “-”, read from std::io::stdin
.
Otherwise, use File::open
to try to open the given file or propagate an error.
If File::open
is successful, the result will be a filehandle, which is a mechanism for reading the contents of a file.
Both a filehandle and std::io::stdin
implement the BufRead
trait, which means the values will, for instance, respond to the BufRead::lines
function to produce lines of text.
Note that BufRead::lines
will remove any line endings such as \r\n
on Windows and \n
on Unix.
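To see the line-ending behavior without touching the filesystem, this sketch reads from an in-memory std::io::Cursor, which also implements BufRead. The read_lines helper is a hypothetical name used only for this demonstration.

```rust
use std::io::{BufRead, Cursor};

// Hypothetical helper: collect the lines of some in-memory text.
fn read_lines(text: &str) -> Vec<String> {
    Cursor::new(text)
        .lines()
        .map(|line| line.expect("reading from memory should not fail"))
        .collect()
}

fn main() {
    // Mixed Windows (\r\n) and Unix (\n) endings, no trailing newline.
    let lines = read_lines("one\r\ntwo\nthree");
    println!("{:?}", lines);
}
```

Because the endings are stripped, a program that prints each line with println! will write Unix-style newlines regardless of what the input used.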
Again you see I’m using a Box
to create a pointer to heap-allocated memory to hold the filehandle.
You may wonder if this is completely necessary.
I could try to write the function without using Box
:
// This will not compile
fn open(filename: &str) -> MyResult<dyn BufRead> {
    match filename {
        "-" => Ok(BufReader::new(io::stdin())),
        _ => Ok(BufReader::new(File::open(filename)?)),
    }
}
If I try to compile this code, I get the following error:
error[E0277]: the size for values of type `(dyn std::io::BufRead + 'static)`
cannot be known at compilation time
  --> src/lib.rs:88:28
   |
88 | fn open(filename: &str) -> MyResult<dyn BufRead> {
   |                            ^^^^^^^^^^^^^^^^^^^^^
   |                            doesn't have a size known at compile-time
   |
   = help: the trait `Sized` is not implemented for `(dyn std::io::BufRead
     + 'static)`
As the compiler says, the trait object dyn BufRead
does not implement the Sized
trait, because the concrete readers behind it can have different sizes.
If a variable doesn’t have a fixed, known size, then Rust can’t store it on the stack.
The solution is to instead allocate memory on the heap by putting the return value into a Box
, which is a pointer with a known size.
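The following sketch illustrates why the box helps. The function pick_reader is a hypothetical example, not part of catr: it returns one of two different concrete reader types, yet the boxed return value always has the same size. The check against two machine words reflects how trait-object pointers are laid out (a data pointer plus a vtable pointer).

```rust
use std::io::{self, BufRead, BufReader, Cursor};
use std::mem::size_of;

// Hypothetical function: choose between two different concrete reader
// types at runtime and hide both behind the same boxed trait object.
fn pick_reader(text: Option<&'static str>) -> Box<dyn BufRead> {
    match text {
        Some(t) => Box::new(Cursor::new(t)),
        None => Box::new(BufReader::new(io::empty())),
    }
}

fn main() {
    // The box itself has a fixed size no matter which reader it points to.
    println!("Box<dyn BufRead>: {} bytes", size_of::<Box<dyn BufRead>>());

    for line in pick_reader(Some("hello\nworld")).lines() {
        println!("{}", line.unwrap());
    }

    // The empty reader produces no lines at all.
    println!("{}", pick_reader(None).lines().count());
}
```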
The preceding open
function is really dense.
I can appreciate it if you think that’s more than a little complicated; however, it handles basically any error you will encounter.
To demonstrate this, change your run
to the following:
pub fn run(config: Config) -> MyResult<()> {
    for filename in config.files {
        match open(&filename) {
            Err(err) => eprintln!("Failed to open {}: {}", filename, err),
            Ok(_) => println!("Opened {}", filename),
        }
    }
    Ok(())
}
Iterate through the filenames.
Try to open the filename. Note the use of &
to borrow the variable.
Print an error message to STDERR
when open
fails.
Print a successful message when open
works.
Try to run your program with the following:
A valid input file
A nonexistent file
An unreadable file
For the last option, you can create a file that cannot be read like so:
$ touch cant-touch-this && chmod 000 cant-touch-this
Run your program and verify your code gracefully prints error messages for bad input files and continues to process the valid ones:
$ cargo run -- blargh cant-touch-this tests/inputs/fox.txt
Failed to open blargh: No such file or directory (os error 2)
Failed to open cant-touch-this: Permission denied (os error 13)
Opened tests/inputs/fox.txt
With this addition, you should be able to pass cargo test skips_bad_file
.
Now that you are able to open and read valid input files, I want you to finish the program on your own.
Can you figure out how to read the opened file line-by-line?
Start with tests/inputs/fox.txt that has only one line.
You should be able to see the following output:
$ cargo run -- tests/inputs/fox.txt
The quick brown fox jumps over the lazy dog.
Verify that you can read STDIN
by default.
In the following command, I will use the |
to pipe STDOUT
from the first command to the STDIN
of the second command:
$ cat tests/inputs/fox.txt | cargo run
The quick brown fox jumps over the lazy dog.
The output should be the same when providing the filename “-”.
In the following command, I will use the bash
redirect operator <
to take input from the given filename and provide it to STDIN
:
$ cargo run -- - < tests/inputs/fox.txt
The quick brown fox jumps over the lazy dog.
Next, try an input file with more than one line and try to number the lines for -n
:
$ cargo run -- -n tests/inputs/spiders.txt
     1  Don't worry, spiders,
     2  I keep house
     3  casually.
Then try to skip blank lines in the numbering for -b
:
$ cargo run -- -b tests/inputs/the-bustle.txt
     1  The bustle in a house
     2  The morning after death
     3  Is solemnest of industries
     4  Enacted upon earth,—

     5  The sweeping up the heart,
     6  And putting love away
     7  We shall not want to use again
     8  Until eternity.
Run cargo test
often to see which tests are failing.
The tests in tests/cli.rs are similar to Chapter 2, but I’ve added a little more organization.
For instance, I define several constant &str
values at the top of that module which I use throughout the crate.
I use a common convention of ALL_CAPS
names to highlight the fact that they are scoped or visible throughout the crate:
const PRG: &str = "catr";
const EMPTY: &str = "tests/inputs/empty.txt";
const FOX: &str = "tests/inputs/fox.txt";
const SPIDERS: &str = "tests/inputs/spiders.txt";
const BUSTLE: &str = "tests/inputs/the-bustle.txt";
To test how the program handles a nonexistent file, I use the rand
crate to generate a random filename that does not exist.
For the following function, I will use rand::{distributions::Alphanumeric, Rng}
to import various parts of the crate I need in this function:
fn gen_bad_file() -> String {
    loop {
        let filename: String = rand::thread_rng()
            .sample_iter(&Alphanumeric)
            .take(7)
            .map(char::from)
            .collect();

        if fs::metadata(&filename).is_err() {
            return filename;
        }
    }
}
The function will return a String
, which is a dynamically generated string closely related to the str
type I’ve been using.
Start an infinite loop
.
Create a random string of seven alphanumeric characters.
fs::metadata
returns an error when the given filename does not exist, so return the nonexistent filename.
In the preceding function, I use filename
two times after creating it. The first time, I borrow it using &filename
, and the second time I don’t use the ampersand. Try removing the &
and running the code. You should get an error message saying that ownership of the filename
value was moved into fs::metadata
. Effectively, the function consumes the value, leaving it unusable. The &
shows I only want to borrow a reference to the value.
error[E0382]: use of moved value: `filename`
  --> tests/cli.rs:37:20
   |
30 |         let filename: String = rand::thread_rng()
   |             -------- move occurs because `filename` has type `String`,
   |                      which does not implement the `Copy` trait
...
36 |         if fs::metadata(filename).is_err() {
   |                         -------- value moved here
37 |             return filename;
   |                    ^^^^^^^^ value used here after move
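The difference between borrowing and moving can be reproduced in a few lines without the rand crate. In this sketch, inspect and consume are hypothetical functions: the first borrows its argument, and the second takes ownership of it.

```rust
// Borrows the string, so the caller keeps ownership.
fn inspect(name: &str) -> usize {
    name.len()
}

// Takes ownership, so the caller can no longer use the value afterward.
fn consume(name: String) -> usize {
    name.len()
}

fn main() {
    let filename = String::from("blargh");

    // Borrow: `filename` is still usable after the call.
    let borrowed_len = inspect(&filename);
    println!("{} has {} characters", filename, borrowed_len);

    // Move: ownership of `filename` passes into `consume`.
    let moved_len = consume(filename);
    println!("{}", moved_len);

    // Uncommenting the next line should fail with error E0382
    // (use of moved value), like the message shown above.
    // println!("{}", filename);
}
```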
Don’t worry if you don’t completely understand the gen_bad_file function yet.
I’m only showing this so you understand how it is used in the skips_bad_file
test:
#[test]
fn skips_bad_file() -> TestResult {
    let bad = gen_bad_file();
    let expected = format!("{}: .* [(]os error 2[)]", bad);
    Command::cargo_bin(PRG)?
        .arg(&bad)
        .assert()
        .success()
        .stderr(predicate::str::is_match(expected)?);
    Ok(())
}
Generate the name of a nonexistent file.
The expected error message should include the filename and the string “os error 2” on both Windows and Unix platforms.
Run the program with the bad file and verify that STDERR
matches the expected pattern.
The command should succeed as bad files should only generate warnings and not kill the process.
In the preceding function, I used the format!
macro to generate a new String
. This macro works like print!
except that it returns the value rather than printing it.
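Here is a tiny sketch of format! building the same kind of pattern string used in the test. The function name expected_pattern is hypothetical; the pattern text is copied from the test.

```rust
// Hypothetical helper: build the pattern the test expects on STDERR.
fn expected_pattern(filename: &str) -> String {
    // format! returns a new String instead of printing anything.
    format!("{}: .* [(]os error 2[)]", filename)
}

fn main() {
    let pattern = expected_pattern("blargh");
    // Nothing has been printed until this line runs.
    println!("{}", pattern);
}
```

The brackets around the parentheses keep them literal when the string is later compiled as a regular expression.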
I created a run
helper function to run the program with input arguments and verify that the output matches the text in the file generated by mk-outs.sh:
fn run(args: &[&str], expected_file: &str) -> TestResult {
    let expected = fs::read_to_string(expected_file)?;
    Command::cargo_bin(PRG)?
        .args(args)
        .assert()
        .success()
        .stdout(expected);
    Ok(())
}
The function accepts a slice of &str
arguments and the filename with the expected output. The function returns a TestResult
.
Try to read the expected output file.
Execute the program with the arguments and verify it runs successfully and produces the expected output.
I use this function like so:
#[test]
fn bustle() -> TestResult {
    run(&[BUSTLE], "tests/expected/the-bustle.txt.out")
}
Run the program with the BUSTLE
input file and verify that the output matches the output produced by mk-outs.sh.
I also wrote a helper function to provide input via STDIN
:
fn run_stdin(
    input_file: &str,
    args: &[&str],
    expected_file: &str,
) -> TestResult {
    let input = fs::read_to_string(input_file)?;
    let expected = fs::read_to_string(expected_file)?;
    Command::cargo_bin(PRG)?
        .args(args)
        .write_stdin(input)
        .assert()
        .success()
        .stdout(expected);
    Ok(())
}
The first argument is the filename containing the text that should be given to STDIN
.
Try to read the input and expected files.
Try to run the program with the given arguments and STDIN
and verify the output.
This function is used similarly:
#[test]
fn bustle_stdin() -> TestResult {
    run_stdin(BUSTLE, &["-"], "tests/expected/the-bustle.txt.stdin.out")
}
Run the program using the contents of the given filename as STDIN
and an input filename of “-”. Verify the output matches the expected value.
That should be enough to get started. Off you go! Come back when you’re done.
I hope you found this an interesting and challenging program to write. It’s important to tackle complicated programs one step at a time. I’ll show you how I built my program in this way.
I started with printing the lines of an open filehandle:
pub fn run(config: Config) -> MyResult<()> {
    for filename in config.files {
        match open(&filename) {
            Err(err) => eprintln!("{}: {}", filename, err),
            Ok(file) => {
                for line_result in file.lines() {
                    let line = line_result?;
                    println!("{}", line);
                }
            }
        }
    }
    Ok(())
}
Print the filename and error when there is a problem opening a file.
Iterate over each line_result
value from BufRead::lines
.
Either unpack an Ok
value from line_result
or propagate an error.
Print the line.
When reading the lines from a file, you don’t get the lines directly from the filehandle but instead get a std::io::Result
, a type that “is broadly used across std::io
for any operation which may produce an error.” Reading and writing files falls into the category of IO (input/output) which depends on external resources like the operating and file systems. While it’s unlikely that reading a line from a filehandle will fail, the point is that it could fail.
If you run cargo test
, you should pass about half of the tests, which is not bad for so few lines of code.
Next, I’d like to add the printing of line numbers for the -n|--number
option.
One solution that will likely be familiar to C programmers would be something like this:
pub fn run(config: Config) -> MyResult<()> {
    for filename in config.files {
        match open(&filename) {
            Err(err) => eprintln!("{}: {}", filename, err),
            Ok(file) => {
                let mut line_num = 0;
                for line_result in file.lines() {
                    let line = line_result?;
                    line_num += 1;
                    if config.number_lines {
                        println!("{:>6}\t{}", line_num, line);
                    } else {
                        println!("{}", line);
                    }
                }
            }
        }
    }
    Ok(())
}
Initialize a mutable counter variable to hold the line number.
Add 1 to the line number.
Check whether to print line numbers.
If so, print the current line number in a right-justified field 6 characters wide followed by a tab character, and then the line of text.
Otherwise, print the line.
Recall that all variables in Rust are immutable by default, so it’s necessary to add mut
to line_num
as I intend to change it.
The +=
operator is a compound assignment that adds the righthand value 1 to line_num
to increment it.[1]
Of note, too, is the formatting syntax {:>6}
that indicates the width of the field as six characters with the text aligned to the right.
(You can use <
for left-justified and ^
for centered text.)
This syntax is similar to printf
in C, Perl, and Python’s string formatting.
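To compare the three alignment markers side by side, here is a short sketch. The helper names aligned and numbered are hypothetical; the format strings are the ones described above, with \t standing for the tab that follows the line number.

```rust
// Hypothetical helper: render a number right-, left-, and center-aligned
// in a field six characters wide.
fn aligned(num: usize) -> (String, String, String) {
    (
        format!("{:>6}", num),
        format!("{:<6}", num),
        format!("{:^6}", num),
    )
}

// Hypothetical helper: build one numbered line in the style of `cat -n`.
fn numbered(num: usize, line: &str) -> String {
    format!("{:>6}\t{}", num, line)
}

fn main() {
    let (right, left, center) = aligned(12);
    // Brackets make the padding visible.
    println!("[{}]", right);
    println!("[{}]", left);
    println!("[{}]", center);
    println!("{}", numbered(1, "Don't worry, spiders,"));
}
```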
If I run this version of the program, it looks pretty good:
$ cargo run -- tests/inputs/spiders.txt -n
     1  Don't worry, spiders,
     2  I keep house
     3  casually.
While this works adequately, I’d like to point out a more idiomatic solution using Iterator::enumerate
.
This method will return a tuple containing the index position and value for each element in an iterable, which is something that can produce values until exhausted:
pub fn run(config: Config) -> MyResult<()> {
    for filename in config.files {
        match open(&filename) {
            Err(err) => eprintln!("{}: {}", filename, err),
            Ok(file) => {
                for (line_num, line_result) in file.lines().enumerate() {
                    let line = line_result?;
                    if config.number_lines {
                        println!("{:>6}\t{}", line_num + 1, line);
                    } else {
                        println!("{}", line);
                    }
                }
            }
        }
    }
    Ok(())
}
The tuple values from Iterator::enumerate
can be unpacked using pattern matching.
Numbering from enumerate
starts at 0, so add 1 to mimic cat
which starts at 1.
This will create the same output, but now the code avoids using a mutable value.
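If you want to experiment with Iterator::enumerate outside the program, the following sketch numbers lines read from an in-memory std::io::Cursor. The function number_lines is a hypothetical name, and collecting into io::Result is one possible way to propagate a read error; it is not the approach taken in the chapter's run function.

```rust
use std::io::{self, BufRead, Cursor};

// Hypothetical helper: number every line from any buffered reader.
fn number_lines<R: BufRead>(reader: R) -> io::Result<Vec<String>> {
    reader
        .lines()
        .enumerate()
        .map(|(index, line_result)| {
            // enumerate counts from 0, so add 1 to match cat.
            line_result.map(|line| format!("{:>6}\t{}", index + 1, line))
        })
        // Collecting into io::Result stops at the first read error.
        .collect()
}

fn main() -> io::Result<()> {
    let input = Cursor::new("Don't worry, spiders,\nI keep house\ncasually.\n");
    for line in number_lines(input)? {
        println!("{}", line);
    }
    Ok(())
}
```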
I can execute cargo test fox
to run all the tests starting with fox, and I find that two out of three pass.
The program fails on the -b
flag, so next I need to handle printing the line numbers only for nonblank lines.
Notice in this version, I’m also going to remove line_result
and shadow the line
variable:
pub fn run(config: Config) -> MyResult<()> {
    for filename in config.files {
        match open(&filename) {
            Err(err) => eprintln!("{}: {}", filename, err),
            Ok(file) => {
                let mut last_num = 0;
                for (line_num, line) in file.lines().enumerate() {
                    let line = line?;
                    if config.number_lines {
                        println!("{:>6}\t{}", line_num + 1, line);
                    } else if config.number_nonblank_lines {
                        if !line.is_empty() {
                            last_num += 1;
                            println!("{:>6}\t{}", last_num, line);
                        } else {
                            println!();
                        }
                    } else {
                        println!("{}", line);
                    }
                }
            }
        }
    }
    Ok(())
}
Initialize a mutable variable for the number of the last nonblank line.
Shadow the line
with the result of unpacking the Result
.
Handle printing line numbers.
Handle printing line numbers for nonblank lines.
If the line is not empty, increment last_num
and print the output.
If the line is empty, print a blank line.
If there are no numbering options, print the line.
Shadowing a variable in Rust is when you reuse a variable’s name and bind it to a new value. Arguably the line_result
/line
code may be more explicit and readable, but reusing line
in this context is the more Rustic style you’re likely to encounter.
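Shadowing can even change the type bound to a name, as this small sketch shows. The function trimmed_len is a hypothetical example unrelated to catr.

```rust
// Hypothetical example: each `let line` creates a new binding that
// hides the previous one, even when the type changes.
fn trimmed_len(line: &str) -> usize {
    let line = line.trim(); // shadows the parameter with a trimmed &str
    let line = line.len();  // shadows again, now holding a usize
    line
}

fn main() {
    println!("{}", trimmed_len("  fox \n"));
}
```

No mut is needed here because no value is ever modified; each let simply introduces a fresh binding with the same name.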
If you run cargo test
, you should pass all the tests.
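If you would like to check the nonblank numbering logic in isolation, one option is to pull it into a pure function that can be tested without running the binary. The function number_nonblank is a hypothetical refactoring for illustration, not code from my solution.

```rust
// Hypothetical refactoring: number only the nonblank lines of some text,
// returning the output lines instead of printing them.
fn number_nonblank(text: &str) -> Vec<String> {
    let mut last_num = 0;
    text.lines()
        .map(|line| {
            if line.is_empty() {
                // Blank lines pass through without a number.
                String::new()
            } else {
                last_num += 1;
                format!("{:>6}\t{}", last_num, line)
            }
        })
        .collect()
}

fn main() {
    for line in number_nonblank("Enacted upon earth,\n\nThe sweeping up the heart,") {
        println!("{}", line);
    }
}
```

A function like this can be exercised with ordinary assertions, which complements the integration tests in tests/cli.rs.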
You have a working program now, but you don’t have to stop there.
If you’re up for an additional challenge, try implementing the other options shown in the manual pages for both the BSD and GNU versions.
For each option, use cat
to create the expected output file, then expand the tests to check that your program creates this same output.
I’d also recommend you check out bat
, which is another Rust clone of cat
(“with wings”) for a more complete implementation.
The numbered-line output of cat -n
is similar in some ways to nl
, a “line numbering filter” program.
cat
is also a bit similar to programs that will show you a page or screenful of text at a time, so-called pagers like more
and less
.
(more
would show you a page of text with “More” at the bottom to let you know you could continue. Obviously someone decided to be clever and named their clone less
, but it does the same thing.)
Consider implementing both of those programs.
Read the manual pages, create the test output, and copy the ideas from this project to write and test your versions.
You made big strides in this chapter, creating a much more complex program. Consider what you learned:
You separated your code into library (src/lib.rs) and binary (src/main.rs) crates, which can make it easier to organize and encapsulate ideas.
You created your first struct
, which is a bit like a class declaration in other languages. This struct allowed you to create a complex data structure called Config
to describe the inputs for your program.
By default, variables are immutable, and the functions and other items in a module are private. You learned to use mut
to make a value mutable and pub
to make a value or function public.
You used a test-first approach where all the tests exist before the program is even written. When the program passes all the tests, you can be confident your program meets all the specifications encoded in the tests.
You saw how to use the rand
crate to generate a random string for a nonexistent file.
You figured out how to read lines of text from both STDIN
and regular files.
or regular files.
You used the eprintln!
macro to print to STDERR
and format!
to dynamically generate a new string.
You used a for
loop to visit each element in an iterable.
You found that the Iterator::enumerate
method will return both the index and element as a tuple, which was useful for numbering the lines of text.
You learned to use a Box
that points to a filehandle to read either STDIN
or a regular file.
In the next chapter, you’ll learn a good deal more about reading files by lines, bytes, or characters.
1 Note that Rust does not have a unary ++
operator, so you cannot use line_num++
to increment a variable by 1.