You could count on me with just one hand
They Might Be Giants
The venerable wc
(word count) program dates back to version 1 of AT&T UNIX.
This program will display the number of lines, words, and bytes found in some text from STDIN
or one or more files.
In this exercise, you will learn:
How to use the Iterator::all
function
How to create a module for tests
How to fake a filehandle for testing
How to conditionally format and print a value
How to break a line of text into words and characters
How to use Iterator::collect
to turn an iterator into a vector
Here is an excerpt from the BSD wc
manual page that describes how the tool works:
WC(1)                     BSD General Commands Manual                    WC(1)

NAME
     wc -- word, line, character, and byte count

SYNOPSIS
     wc [-clmw] [file ...]

DESCRIPTION
     The wc utility displays the number of lines, words, and bytes contained
     in each input file, or standard input (if no file is specified) to the
     standard output.  A line is defined as a string of characters delimited
     by a <newline> character.  Characters beyond the final <newline>
     character will not be included in the line count.

     A word is defined as a string of characters delimited by white space
     characters.  White space characters are the set of characters for which
     the iswspace(3) function returns true.  If more than one input file is
     specified, a line of cumulative counts for all the files is displayed on
     a separate line after the output for the last file.

     The following options are available:

     -c      The number of bytes in each input file is written to the
             standard output.  This will cancel out any prior usage of the
             -m option.

     -l      The number of lines in each input file is written to the
             standard output.

     -m      The number of characters in each input file is written to the
             standard output.  If the current locale does not support
             multibyte characters, this is equivalent to the -c option.
             This will cancel out any prior usage of the -c option.

     -w      The number of words in each input file is written to the
             standard output.

     When an option is specified, wc only reports the information requested
     by that option.  The order of output always takes the form of line,
     word, byte, and file name.  The default action is equivalent to
     specifying the -c, -l and -w options.

     If no files are specified, the standard input is used and no file name
     is displayed.  The prompt will accept input until receiving EOF, or [^D]
     in most environments.
A picture is worth a kilobyte of words, so I’ll show you some examples using the test files in the 05_wcr directory. First, consider an empty file that should report 0 lines, words, and bytes, each of which should be right-justified in a field 8 characters wide:
$ cd 05_wcr
$ wc tests/inputs/empty.txt
       0       0       0 tests/inputs/empty.txt
Next, try a file with one line of text.
Note that I’ve put varying amounts of spaces in between some words and a tab character for reasons that will be discussed later.
I will use the cat
with the flags -t
to display the tab character as ^I
and the -e
to display $
for the end of the line:
$ cat -te tests/inputs/fox.txt The quick brown fox^Ijumps over the lazy dog.$
This example is short enough that I can manually count all the lines, words, and bytes, as shown in Figure 5-1, where spaces are noted with raised dots, the tab character with
, and the end-of-line as $
.
I find that wc
is in agreement:
$ wc tests/inputs/fox.txt 1 9 48 tests/inputs/fox.txt
As mentioned in Chapter 3, bytes may equate to characters for ASCII, but Unicode characters can take multiple bytes. The file tests/inputs/atlamal.txt contains the first stanza from Atlamál hin groenlenzku or The Greenland Ballad of Atli, an Old Norse poem1:
$ cat tests/inputs/atlamal.txt Frétt hefir öld óvu, þá er endr of gerðu seggir samkundu, sú var nýt fæstum, æxtu einmæli, yggr var þeim síðan ok it sama sonum Gjúka, er váru sannráðnir.
According to wc
, this file contains 4 lines, 29 words, and 177 bytes:
$ wc tests/inputs/atlamal.txt 4 29 177 tests/inputs/atlamal.txt
If I only wanted the number of lines, I can use the -l
flag and only that column will be shown:
$ wc -l tests/inputs/atlamal.txt 4 tests/inputs/atlamal.txt
I can similarly request only the number of bytes with -c
or words with -w
, and only those two columns will be shown:
$ wc -w -c tests/inputs/atlamal.txt 29 177 tests/inputs/atlamal.txt
I can request the number of characters using the -m
flag:
$ wc -m tests/inputs/atlamal.txt 159 tests/inputs/atlamal.txt
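The gap between 177 bytes and 159 characters comes from UTF-8 encoding, where letters such as ö and þ each occupy two bytes. A minimal sketch of the distinction in Rust follows; the helper name byte_and_char_counts is hypothetical and is not part of the challenge program:

```rust
/// Return (byte count, character count) for a string slice.
/// `str::len` counts UTF-8 bytes, while `chars().count()` counts
/// Unicode scalar values.
/// NOTE: this helper name is an illustration only.
fn byte_and_char_counts(text: &str) -> (usize, usize) {
    (text.len(), text.chars().count())
}
```

For plain ASCII such as "fox" the two numbers agree, but for a word like "öld" (written with the precomposed ö) the byte count is one larger than the character count.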
The GNU version of wc
will show both character and byte counts if you provide both the flags -m
and -c
, but the BSD version will only show one or the other with the latter flag taking precedence:
$ wc -cm tests/inputs/atlamal.txt
     159 tests/inputs/atlamal.txt
$ wc -mc tests/inputs/atlamal.txt
     177 tests/inputs/atlamal.txt
Note that no matter the order of the flags like -wc
or -cw
, the output columns are always ordered by lines, words, and bytes/characters:
$ wc -cw tests/inputs/atlamal.txt 29 177 tests/inputs/atlamal.txt
If no positional arguments are provided, wc
will read from STDIN
and will not print a filename:
$ cat tests/inputs/atlamal.txt | wc -lc
       4     177
The GNU version of wc
will understand the filename -
to mean STDIN
, and it also provides long flag names as well as some other options:
$ wc --help Usage: wc [OPTION]... [FILE]... or: wc [OPTION]... --files0-from=F Print newline, word, and byte counts for each FILE, and a total line if more than one FILE is specified. With no FILE, or when FILE is -, read standard input. A word is a non-zero-length sequence of characters delimited by white space. The options below may be used to select which counts are printed, always in the following order: newline, word, character, byte, maximum line length. -c, --bytes print the byte counts -m, --chars print the character counts -l, --lines print the newline counts --files0-from=F read input from the files specified by NUL-terminated names in file F; If F is - then read names from standard input -L, --max-line-length print the length of the longest line -w, --words print the word counts --help display this help and exit --version output version information and exit
If processing more than one file, both versions will finish with a total line showing the number of lines, words, and bytes for all the inputs:
$ wc tests/inputs/*.txt
       4      29     177 tests/inputs/atlamal.txt
       0       0       0 tests/inputs/empty.txt
       1       9      48 tests/inputs/fox.txt
       5      38     225 total
Nonexistent files are noted with a warning to STDERR
as the files are being processed:
$ wc tests/inputs/fox.txt blargh tests/inputs/atlamal.txt
       1       9      48 tests/inputs/fox.txt
wc: blargh: open: No such file or directory
       4      29     177 tests/inputs/atlamal.txt
       5      38     225 total
I can also redirect filehandle 2
in bash
to verify that wc
prints the warning to STDERR
:
$ wc tests/inputs/fox.txt blargh tests/inputs/atlamal.txt 2>err
       1       9      48 tests/inputs/fox.txt
       4      29     177 tests/inputs/atlamal.txt
       5      38     225 total
$ cat err
wc: blargh: open: No such file or directory
The preceding behavior is as much as the challenge program will be expected to implement. There is an extensive test suite to help you along the way.
The challenge program should be called wcr
(pronounced wick-er) for our Rust version of wc
.
Use cargo new wcr
to start, then modify your Cargo.toml to add the following dependencies:
[dependencies] clap = "2.33" [dev-dependencies] assert_cmd = "1" predicates = "1" rand = "0.8"
Copy my 05_wcr/tests directory into your new project and run cargo test
to perform an initial build and run the tests, all of which should fail.
I can be a rather unimaginative sort sometimes, so I’m going to keep using the same structure for src/main.rs that I’ve used in the previous programs:
fn main() { if let Err(e) = wcr::get_args().and_then(wcr::run) { eprintln!("{}", e); std::process::exit(1); } }
Following is a skeleton for src/lib.rs you can copy.
First, here is how I would define the Config
to represent the command-line parameters:
use clap::{App, Arg};
use std::error::Error;

type MyResult<T> = Result<T, Box<dyn Error>>;

#[derive(Debug)]
pub struct Config {
    files: Vec<String>,
    lines: bool,
    words: bool,
    bytes: bool,
    chars: bool,
}
The files
will be a vector of strings.
The lines
is a Boolean for whether to print the line count.
The words
is a Boolean for whether to print the word count.
The bytes
is a Boolean for whether to print the byte count.
The chars
is a Boolean for whether to print the character count.
Here are the two functions you’ll need to get started.
I’ll let you fill in the get_args
from this skeleton:
pub fn get_args() -> MyResult<Config> {
    let matches = App::new("wcr")
        .version("0.1.0")
        .author("Ken Youens-Clark <[email protected]>")
        .about("Rust wc")
        // What goes here?
        .get_matches();

    Ok(Config {
        files: ...
        lines: ...
        words: ...
        bytes: ...
        chars: ...
    })
}
I suggest you start your run
by printing the configuration:
pub fn run(config: Config) -> MyResult<()> { println!("{:#?}", config); Ok(()) }
Try to get your program to generate --help
output like the following:
$ cargo run -- --help wcr 0.1.0 Ken Youens-Clark <[email protected]> Rust wc USAGE: wcr [FLAGS] [FILE]... FLAGS: -c, --bytes Show byte count -m, --chars Show character count -h, --help Prints help information -l, --lines Show line count -V, --version Prints version information -w, --words Show word count ARGS: <FILE>... Input file(s) [default: -]
I fretted a bit about whether to mimic the BSD or GNU version of wc
for combining the -m
(character) and -c
(bytes) flags.
I decided to use the BSD behavior, so your program should disallow both of these flags used together:
$ cargo run -- -cm tests/inputs/fox.txt error: The argument '--bytes' cannot be used with '--chars' USAGE: wcr --bytes --chars
The default behavior will be to print lines, words, and bytes, which means those values in the configuration should be true
when none have been explicitly requested by the user.
Ensure your program will print this:
$ cargo run -- tests/inputs/fox.txt Config { files: ["tests/inputs/fox.txt", ], lines: true, words: true, bytes: true, chars: false,
}
A positional argument should be found in the files
.
The chars
value should be false
unless the -m|--chars
flag is present.
If any single flag is present, then all the other flags not mentioned should be false
:
$ cargo run -- -l tests/inputs/*.txt
Config {
    files: [
        "tests/inputs/atlamal.txt",
        "tests/inputs/empty.txt",
        "tests/inputs/fox.txt",
    ],
    lines: true,
    words: false,
    bytes: false,
    chars: false,
}
Stop here and get this much working. My dog needs a bath, so I’ll be right back.
I guess you got that figured out, so following is the first part of my get_args
.
There’s nothing new to how I declare the parameters, so I’ll not comment on this:
pub fn get_args() -> MyResult<Config> { let matches = App::new("wcr") .version("0.1.0") .author("Ken Youens-Clark <[email protected]>") .about("Rust wc") .arg( Arg::with_name("files") .value_name("FILE") .help("Input file(s)") .default_value("-") .min_values(1), ) .arg( Arg::with_name("lines") .value_name("LINES") .help("Show line count") .takes_value(false) .short("l") .long("lines"), ) .arg( Arg::with_name("words") .value_name("WORDS") .help("Show word count") .takes_value(false) .short("w") .long("words"), ) .arg( Arg::with_name("bytes") .value_name("BYTES") .help("Show byte count") .takes_value(false) .short("c") .long("bytes"), ) .arg( Arg::with_name("chars") .value_name("CHARS") .help("Show character count") .takes_value(false) .short("m") .long("chars") .conflicts_with("bytes"), ) .get_matches();
After clap
parses the arguments, I unpack them and try to figure out the default values:
    let mut lines = matches.is_present("lines");
    let mut words = matches.is_present("words");
    let mut bytes = matches.is_present("bytes");
    let mut chars = matches.is_present("chars");

    if [lines, words, bytes, chars].iter().all(|v| v == &false) {
        lines = true;
        words = true;
        bytes = true;
        chars = false;
    }

    Ok(Config {
        files: matches.values_of_lossy("files").unwrap(),
        lines,
        words,
        bytes,
        chars,
    })
}
Unpack all the flags.
If all the flags are false
, then set lines
, words
, and bytes
to true
.
Use the struct field init shorthand to set the values.
I want to highlight that I create a slice with all the flags and call the slice::iter
method to create an iterator.
This is so I can use the Iterator::all
function to find if all the values are false
.
This method expects a closure, which is an anonymous function that can be passed as an argument to another function.
Here, the closure is a predicate or a test that figures out if an element is false
.
I want to know if all the flags are false
, and so a reference to each flag is passed as the argument to the closure.
The values are references, so I must compare each value to &false
which is a reference to a Boolean value.
If all the evaluations are true
, then Iterator::all
will return true
2.
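To make the defaulting rule easier to test in isolation, the same Iterator::all check can be pulled into a small pure function. This is only a sketch under my own assumptions: the name resolve_flags and the tuple return type are hypothetical and do not appear in the program above.

```rust
/// Apply the default rule: if no flag was requested, show lines, words,
/// and bytes (but not chars). Otherwise return the flags unchanged.
/// NOTE: `resolve_flags` is a hypothetical helper used for illustration.
fn resolve_flags(
    lines: bool,
    words: bool,
    bytes: bool,
    chars: bool,
) -> (bool, bool, bool, bool) {
    // `all` returns true only when the closure is true for every element.
    // The pattern `&v` dereferences each borrowed bool.
    if [lines, words, bytes, chars].iter().all(|&v| !v) {
        (true, true, true, false)
    } else {
        (lines, words, bytes, chars)
    }
}
```

Writing the closure as |&v| !v is an equivalent way to express the comparison to &false; which spelling reads better is a matter of taste.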
Now to work on the counting.
You might start by processing each file, counting the various bits, and printing the desired columns.
I suggest you once again use the open
function from Chapter 2 for opening the files:
fn open(filename: &str) -> MyResult<Box<dyn BufRead>> { match filename { "-" => Ok(Box::new(BufReader::new(io::stdin()))), _ => Ok(Box::new(BufReader::new(File::open(filename)?))), } }
Be sure to expand your imports to the following:
use clap::{App, Arg}; use std::error::Error; use std::fs::File; use std::io::{self, BufRead, BufReader};
Here is a run
function to get you going:
pub fn run(config: Config) -> MyResult<()> { for filename in &config.files { match open(filename) { Err(err) => eprintln!("{}: {}", filename, err), Ok(_file) => println!("Opened {}", filename), } } Ok(()) }
Using your open filehandle, start as simply as possible using the empty file and make sure your program prints zeros for the three columns of lines, words, and bytes:
$ cargo run -- tests/inputs/empty.txt 0 0 0 tests/inputs/empty.txt
Next use tests/inputs/fox.txt and make sure you get the following counts. I specifically added various kinds and numbers of whitespace to challenge you on how to split the text into words:
$ cargo run -- tests/inputs/fox.txt 1 9 48 tests/inputs/fox.txt
Be sure your program can handle the Unicode in tests/inputs/atlamal.txt correctly:
$ cargo run -- tests/inputs/atlamal.txt 4 29 177 tests/inputs/atlamal.txt
And that you correctly count the characters:
$ cargo run -- tests/inputs/atlamal.txt -wml 4 29 159 tests/inputs/atlamal.txt
When you can handle any one file, use more than one to check that you print the correct total line:
$ cargo run -- tests/inputs/*.txt
       4      29     177 tests/inputs/atlamal.txt
       0       0       0 tests/inputs/empty.txt
       1       9      48 tests/inputs/fox.txt
       5      38     225 total
When all that works correctly, try reading from STDIN
:
$ cat tests/inputs/atlamal.txt | cargo run 4 29 177
Run cargo test
often to see how you’re progressing.
Don’t read ahead until you pass all the tests.
I’d like to walk you through how I arrived at a solution, and I’d like to stress that mine is just one of many possible ways to write this program.
As long as your code passes the tests and produces the same output as the BSD version of wc
, then it works well and you should be proud of your accomplishments.
I suggested you start by iterating the filenames and printing the counts for the file.
To that end, I decided to create a function called count
that would take a filehandle and possibly return a struct called FileInfo
containing the number of lines, words, bytes, and characters each represented as a usize
.
I say that the function will possibly return this struct because the function will involve IO, which could go sideways.
I place the following definition just after the Config
struct.
For reasons I will explain shortly, this must derive the PartialEq
trait in addition to Debug
:
#[derive(Debug, PartialEq)] pub struct FileInfo { num_lines: usize, num_words: usize, num_bytes: usize, num_chars: usize, }
To represent that the function might succeed or fail, it will return a MyResult<FileInfo>
meaning that on success it will have an Ok wrapping a FileInfo
and on failure it will have an Err
.
To start this function, I will initialize some mutable variables to count all the elements and will return a FileInfo
struct:
pub fn count(mut file: impl BufRead) -> MyResult<FileInfo> {
    let mut num_lines = 0;
    let mut num_words = 0;
    let mut num_bytes = 0;
    let mut num_chars = 0;

    Ok(FileInfo {
        num_lines,
        num_words,
        num_bytes,
        num_chars,
    })
}
The count
function will accept a mutable file
value, and it might return a FileInfo
struct.
Initialize mutable variables to count the lines, words, bytes, and characters.
For now, return a FileInfo
with all zeros.
I’m introducing the impl
keyword to indicate that the file
value must implement the BufRead
trait. Recall that open
returns a value that meets this criterion. You’ll shortly see how this makes the function flexible.
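As a rough illustration of that flexibility, here is a sketch of a function that accepts anything implementing BufRead. The name first_line_bytes is hypothetical; it is not part of the challenge program.

```rust
use std::io::{self, BufRead};

/// Read one line from any buffered reader and report how many bytes it held,
/// including the line ending. Returns Ok(0) at end of input.
/// NOTE: `first_line_bytes` is a hypothetical helper for illustration.
fn first_line_bytes(mut reader: impl BufRead) -> io::Result<usize> {
    let mut line = String::new();
    // `read_line` appends to the buffer and returns the number of bytes read.
    reader.read_line(&mut line)
}
```

The same function can be called with a byte slice such as "one\ntwo\n".as_bytes(), with a std::io::Cursor, with a BufReader wrapped around an open File, or with the boxed value returned by the open function, because each of those implements BufRead.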
In Chapter 3, I showed how to write a unit test, placing it just after the function it was testing.
I’m going to create a unit test for the count
function, but this time I’m going to place it inside a module called tests
.
This is a tidy way to group unit tests, and I can use a configuration option that tells Rust to only compile the module during testing.
This is especially useful because I want to use std::io::Cursor
in my test to make a fake filehandle for the count
function.
The module will not be included when I build and run the program when not testing.
A Cursor
is “used with in-memory buffers, anything implementing AsRef<[u8]>
, to allow them to implement Read
and/or Write
, allowing these buffers to be used anywhere you might use a reader or writer that does actual I/O.”
Following is how I create the tests
module and then import and test the count
function:
#[cfg(test)]
mod tests {
    use super::{count, FileInfo};
    use std::io::Cursor;

    #[test]
    fn test_count() {
        let text = "I don't want the world. I just want your half.\r\n";
        let info = count(Cursor::new(text));
        assert!(info.is_ok());
        let expected = FileInfo {
            num_lines: 1,
            num_words: 10,
            num_chars: 48,
            num_bytes: 48,
        };
        assert_eq!(info.unwrap(), expected);
    }
}
The cfg
enables conditional compilation, so this module will only be compiled when testing.
Define a new module (mod
) called tests to contain test code.
Import the count
function and FileInfo
struct from the parent module, which is referred to as super
, meaning the next level up: the module that contains tests
.
Import std::io::Cursor
.
Run count
with the Cursor
, which implements BufRead
.
Ensure the result is Ok
.
Compare the result to the expected value. This comparison requires FileInfo
to derive PartialEq
.
Run this test using cargo test test_count
.
You will see lots of warnings from the Rust compiler about unused variables or variables that do not need to be mutable.
The most important result is that the test fails:
failures: ---- tests::test_count stdout ---- thread 'tests::test_count' panicked at 'assertion failed: `(left == right)` left: `FileInfo { num_lines: 0, num_words: 0, num_bytes: 0, num_chars: 0 }`, right: `FileInfo { num_lines: 1, num_words: 10, num_bytes: 48, num_chars: 48 }`', src/lib.rs:146:9
Take some time to write the rest of the count
function that will pass this test.
I will take my squeaky clean dog for a walk and maybe have some tea.
OK, we’re back.
Now I’ll show you how I wrote my count
function.
I know from Chapter 3 that BufRead::lines
will remove the line endings, and I don’t want that because newlines in Windows files are two bytes (\r\n
) but Unix newlines are just one byte (\n
).
I can copy some code from Chapter 3 that uses BufRead::read_line
instead to read each line into a buffer.
Conveniently, this function tells me how many bytes have been read from the file:
pub fn count(mut file: impl BufRead) -> MyResult<FileInfo> {
    let mut num_lines = 0;
    let mut num_words = 0;
    let mut num_bytes = 0;
    let mut num_chars = 0;
    let mut line = String::new();

    loop {
        let line_bytes = file.read_line(&mut line)?;
        if line_bytes == 0 {
            break;
        }
        num_bytes += line_bytes;
        num_lines += 1;
        num_words += line.split_whitespace().count();
        num_chars += line.chars().count();
        line.clear();
    }

    Ok(FileInfo {
        num_lines,
        num_words,
        num_bytes,
        num_chars,
    })
}
Create a mutable buffer to hold each line
of text.
Create an infinite loop
for reading the filehandle.
Try to read a line from the filehandle.
End of file (EOF) has been reached when zero bytes are read, so break
out of the loop.
Add the number of bytes from this line to the num_bytes
variable.
Each time through the loop is a line, so increment num_lines
.
Use the str::split_whitespace
method to break the string on whitespace and use Iterator::count
to find the number of words.
Use the str::chars
method to break the string into Unicode characters and use Iterator::count
to count the characters.
Clear the line
buffer for the next line of text.
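The reason for preferring BufRead::read_line over BufRead::lines can be seen by totaling bytes both ways on the same input. The following is a small sketch for comparison only; both function names are hypothetical.

```rust
use std::io::{BufRead, Cursor};

/// Sum the lengths of the strings yielded by `BufRead::lines`, which
/// strips a trailing "\n" or "\r\n" from every line.
fn bytes_via_lines(text: &str) -> usize {
    Cursor::new(text)
        .lines()
        // Reading from memory should not fail; count a failed line as empty.
        .map(|line| line.map(|l| l.len()).unwrap_or(0))
        .sum()
}

/// Sum the byte counts returned by `BufRead::read_line`, which keeps
/// the line ending in the buffer.
fn bytes_via_read_line(text: &str) -> usize {
    let mut reader = Cursor::new(text);
    let mut line = String::new();
    let mut total = 0;
    loop {
        // Treat an error the same as end of input in this sketch.
        let n = reader.read_line(&mut line).unwrap_or(0);
        if n == 0 {
            break;
        }
        total += n;
        line.clear();
    }
    total
}
```

For the five-byte input "a\r\nb\n", the first function sees only the two letters, while the second accounts for every byte, which is what wc reports.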
Earlier I stressed that the fox.txt input file has a varying number and type of whitespace separating words. I did this in case you chose to iterate over the characters to find word boundaries. That is, you might not have found str::split_whitespace
and instead used an iterator over str::chars
. If you increment the number of words every time you find whitespace assuming there is only one between words, you might end up overcounting words. Instead, you would need to find runs of whitespace as the delimiter between words. Similarly, you might have been tempted to use a regular expression which would also need to consider one or more whitespace characters.
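To make the overcounting concrete, here is a sketch that contrasts the naive approach with one that treats a run of whitespace as a single delimiter. Both function names are hypothetical, and the second is just one possible hand-rolled equivalent of counting the items from str::split_whitespace.

```rust
/// Naive approach: count every whitespace character as a word boundary.
/// This overcounts when words are separated by more than one character.
fn naive_word_count(text: &str) -> usize {
    text.chars().filter(|c| c.is_whitespace()).count()
}

/// Run-aware approach: count a word only at the transition from
/// whitespace (or start of text) to a non-whitespace character.
fn run_aware_word_count(text: &str) -> usize {
    let mut count = 0;
    let mut in_word = false;
    for c in text.chars() {
        if c.is_whitespace() {
            in_word = false;
        } else if !in_word {
            in_word = true;
            count += 1;
        }
    }
    count
}
```

On a line such as "The  quick\tbrown fox\n", which has two spaces after the first word, the naive count is five while the run-aware count and split_whitespace both report four.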
With these changes, test_count
passes.
I’d like to see how this looks, so I can change run
to print the results of successfully counting the elements of the file or print a warning to STDERR
when the file can’t be opened:
pub fn run(config: Config) -> MyResult<()> {
    for filename in &config.files {
        match open(filename) {
            Err(err) => eprintln!("{}: {}", filename, err),
            Ok(file) => {
                if let Ok(info) = count(file) {
                    println!("{:?}", info);
                }
            }
        }
    }

    Ok(())
}
When I run it on one of the test inputs, it appears to work for a valid file:
$ cargo run -- tests/inputs/fox.txt FileInfo { num_lines: 1, num_words: 9, num_bytes: 48, num_chars: 48 }
It even handles reading from STDIN
:
$ cat tests/inputs/fox.txt | cargo run FileInfo { num_lines: 1, num_words: 9, num_bytes: 48, num_chars: 48 }
Now to make the output meet the specifications.
To create the expected output, I can start by changing run
to always print the lines, words, and bytes followed by the filename:
pub fn run(config: Config) -> MyResult<()> {
    for filename in &config.files {
        match open(filename) {
            Err(err) => eprintln!("{}: {}", filename, err),
            Ok(file) => {
                if let Ok(info) = count(file) {
                    println!(
                        "{:>8}{:>8}{:>8} {}",
                        info.num_lines,
                        info.num_words,
                        info.num_bytes,
                        filename
                    );
                }
            }
        }
    }

    Ok(())
}
If I run it with one input file, it’s already looking pretty sweet:
$ cargo run -- tests/inputs/fox.txt 1 9 48 tests/inputs/fox.txt
If I run cargo test fox
, I pass one out of eight tests.
Huzzah!
running 8 tests test fox ... ok test fox_bytes ... FAILED test fox_chars ... FAILED test fox_bytes_lines ... FAILED test fox_words_bytes ... FAILED test fox_words ... FAILED test fox_words_lines ... FAILED test fox_lines ... FAILED
Inspect tests/cli.rs to see what the passing test looks like. Note that the tests reference constant values declared at the top of the module:
const PRG: &str = "wcr"; const EMPTY: &str = "tests/inputs/empty.txt"; const FOX: &str = "tests/inputs/fox.txt"; const ATLAMAL: &str = "tests/inputs/atlamal.txt";
Again I have a run
helper function to run my tests:
fn run(args: &[&str], expected_file: &str) -> TestResult {
    let expected = fs::read_to_string(expected_file)?;
    Command::cargo_bin(PRG)?
        .args(args)
        .assert()
        .success()
        .stdout(expected);
    Ok(())
}
Try to read the expected
output for this command.
Run the wcr
program with the given arguments. Assert that the program succeeds and that STDOUT
matches the expected
value.
The fox
test is running wcr
with the FOX
input file and no options, comparing it to the contents of the expected output file which was generated using 05_wcr/mk-outs.sh:
#[test] fn fox() -> TestResult { run(&[FOX], "tests/expected/fox.txt.out") }
Look at the next function in the file to see a failing test:
#[test]
fn fox_bytes() -> TestResult {
    run(&["--bytes", FOX], "tests/expected/fox.txt.c.out")
}
When run with --bytes
, my program should only print that column of output, but it always prints lines, words, and bytes.
I decided to write a function called format_field
in src/lib.rs that would conditionally return a formatted string or the empty string depending on a Boolean value:
fn format_field(value: usize, show: bool) -> String {
    if show {
        format!("{:>8}", value)
    } else {
        "".to_string()
    }
}
The function accepts a usize
value and a Boolean and returns a String
.
Check if the show
value is true
.
Return a new string by formatting the number into a string 8 characters wide.
Otherwise, return the empty string.
Why does this function return a String
and not a str
? They’re both strings, but a str
is an immutable, fixed-length string. The string that will be returned from the function is dynamically generated at runtime, so I must use String
, which is a growable, heap-allocated structure. Props to dynamic languages like Perl and Python that hide so many complexities of strings which turn out to be way more complicated than I ever considered.
I can expand my tests
module to add a unit test for this:
#[cfg(test)]
mod tests {
    use super::{count, format_field, FileInfo};
    use std::io::Cursor;

    #[test]
    fn test_count() {} // Same as before

    #[test]
    fn test_format_field() {
        assert_eq!(format_field(1, false), "");
        assert_eq!(format_field(3, true), "       3");
        assert_eq!(format_field(10, true), "      10");
    }
}
Add format_field
to the imports.
The function should return the empty string when show
is false
.
Check width for a single-digit number.
Check width for a double-digit number.
Here is how I use it in context where I also handle printing the empty string when reading from STDIN
:
pub fn run(config: Config) -> MyResult<()> {
    for filename in &config.files {
        match open(filename) {
            Err(err) => eprintln!("{}: {}", filename, err),
            Ok(file) => {
                if let Ok(info) = count(file) {
                    println!(
                        "{}{}{}{}{}",
                        format_field(info.num_lines, config.lines),
                        format_field(info.num_words, config.words),
                        format_field(info.num_bytes, config.bytes),
                        format_field(info.num_chars, config.chars),
                        if filename.as_str() == "-" {
                            "".to_string()
                        } else {
                            format!(" {}", filename)
                        }
                    );
                }
            }
        }
    }

    Ok(())
}
Format the output for each of the columns using the format_field
function.
When the filename is “-”, print the empty string; otherwise, print a space and the filename.
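One way to check this row formatting without running the whole program is to build the line as a String first. The following sketch assumes a hypothetical format_row helper that is not in my solution; it repeats format_field so the example stands alone.

```rust
/// Right-justify a value in 8 columns, or return the empty string when hidden.
fn format_field(value: usize, show: bool) -> String {
    if show {
        format!("{:>8}", value)
    } else {
        String::new()
    }
}

/// Build one output row from (value, show) pairs and a filename.
/// The filename column is omitted when the input is "-" (STDIN).
/// NOTE: `format_row` is a hypothetical helper for illustration.
fn format_row(fields: &[(usize, bool)], filename: &str) -> String {
    let mut row: String = fields
        .iter()
        .map(|&(value, show)| format_field(value, show))
        .collect();
    if filename != "-" {
        row.push(' ');
        row.push_str(filename);
    }
    row
}
```

Returning the row as a value makes it possible to compare it to an expected string in a unit test, in the same way test_format_field checks a single column.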
With these changes, all the tests for cargo test fox
pass.
If I run the entire test suite, I’m still failing the all tests:
failures: test_all test_all_bytes test_all_bytes_lines test_all_lines test_all_words test_all_words_bytes test_all_words_lines
Look at the all
function in tests/cli.rs to see that the test is using all the input files as arguments:
#[test] fn all() -> TestResult { run(&[EMPTY, FOX, ATLAMAL], "tests/expected/all.out") }
If I run my current program with all the input files, I can see that I’m missing the total line:
$ cargo run -- tests/inputs/*.txt
       4      29     177 tests/inputs/atlamal.txt
       0       0       0 tests/inputs/empty.txt
       1       9      48 tests/inputs/fox.txt
Here is my final run
function that keeps a running total and prints those values when there is more than one input:
pub fn run(config: Config) -> MyResult<()> {
    let mut total_lines = 0;
    let mut total_words = 0;
    let mut total_bytes = 0;
    let mut total_chars = 0;

    for filename in &config.files {
        match open(filename) {
            Err(err) => eprintln!("{}: {}", filename, err),
            Ok(file) => {
                if let Ok(info) = count(file) {
                    println!(
                        "{}{}{}{}{}",
                        format_field(info.num_lines, config.lines),
                        format_field(info.num_words, config.words),
                        format_field(info.num_bytes, config.bytes),
                        format_field(info.num_chars, config.chars),
                        if filename.as_str() == "-" {
                            "".to_string()
                        } else {
                            format!(" {}", filename)
                        }
                    );

                    total_lines += info.num_lines;
                    total_words += info.num_words;
                    total_bytes += info.num_bytes;
                    total_chars += info.num_chars;
                }
            }
        }
    }

    if config.files.len() > 1 {
        println!(
            "{}{}{}{} total",
            format_field(total_lines, config.lines),
            format_field(total_words, config.words),
            format_field(total_bytes, config.bytes),
            format_field(total_chars, config.chars)
        );
    }

    Ok(())
}
Create mutable variables to track the total number of lines, words, bytes, and characters.
Update the totals using the values from this file.
Print the totals if there is more than one input.
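Four separate total variables work fine, but the totals could also be kept in a single struct. The sketch below is an alternative design, not what my solution does; the Counts type and the total function are hypothetical names, and the approach assumes you are comfortable implementing the std::ops::AddAssign trait.

```rust
use std::ops::AddAssign;

/// Hypothetical stand-in for FileInfo that can be accumulated with `+=`.
#[derive(Debug, Default, Clone, Copy, PartialEq)]
struct Counts {
    lines: usize,
    words: usize,
    bytes: usize,
    chars: usize,
}

impl AddAssign for Counts {
    /// Add each field of `other` to the matching field of `self`.
    fn add_assign(&mut self, other: Self) {
        self.lines += other.lines;
        self.words += other.words;
        self.bytes += other.bytes;
        self.chars += other.chars;
    }
}

/// Sum the counts for several files, starting from all zeros.
fn total(all: &[Counts]) -> Counts {
    let mut sum = Counts::default();
    for counts in all {
        sum += *counts;
    }
    sum
}
```

Fed the counts for the three test inputs, this produces the same 5 lines, 38 words, and 225 bytes shown on the total line.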
This appears to work well:
$ cargo run -- tests/inputs/*.txt
       4      29     177 tests/inputs/atlamal.txt
       0       0       0 tests/inputs/empty.txt
       1       9      48 tests/inputs/fox.txt
       5      38     225 total
I can count characters instead of bytes:
$ cargo run -- -m tests/inputs/atlamal.txt 159 tests/inputs/atlamal.txt
I can show and hide any columns I want:
$ cargo run -- -wc tests/inputs/atlamal.txt 29 177 tests/inputs/atlamal.txt
Most importantly, cargo test
shows all passing tests.
Write a version that mimics the output from the GNU wc
instead of the BSD version.
If your system already has the GNU version, run the mk-outs.sh program to generate the expected outputs for the given input files.
Modify the program to create the correct output according to the tests.
Then expand the program to handle the additional options like --files0-from
for reading the input filenames from a file and --max-line-length
to print the length of the longest line.
Add tests for the new functionality.
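As a possible starting point for the longest-line option, here is a rough sketch. The function name max_line_length is hypothetical, and it simply counts Unicode characters per line with the line ending removed. I believe GNU wc measures display width instead, so tabs and wide characters may be reported differently; treat that as an assumption to verify against your system's wc.

```rust
use std::io::{self, BufRead};

/// Return the number of characters in the longest line, excluding the
/// line ending. An empty input yields 0.
/// NOTE: counts characters, not display columns, so results may differ
/// from GNU wc for tabs or wide characters.
fn max_line_length(reader: impl BufRead) -> io::Result<usize> {
    let mut longest = 0;
    for line in reader.lines() {
        // `lines` strips "\n" or "\r\n" before the characters are counted.
        longest = longest.max(line?.chars().count());
    }
    Ok(longest)
}
```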
Next, ponder the mysteries of the iswspace
function mentioned in the BSD manual page noted at the beginning of the chapter.
I had wanted to include a test file of the Issa haiku from Chapter 2 but in the original Japanese characters3:
隅の蜘案じな煤はとらぬぞよ
Issa
BSD wc
thinks there are 3 words:
$ wc spiders.txt 1 3 40 spiders.txt
The GNU version says there is only 1 word:
$ wc spiders.txt 1 1 40 spiders.txt
I didn’t want to open that can of worms, but if you were creating a version of this program to release to the public, what would you report for the number of words?
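For comparison, here is a sketch of what Rust's own definition of whitespace reports. The helper name rust_word_count is hypothetical. It relies on str::split_whitespace, which splits on characters with the Unicode White_Space property rather than on whatever iswspace accepts in the current locale.

```rust
/// Count words the way the solution in this chapter does, using the
/// Unicode-aware `str::split_whitespace`.
/// NOTE: `rust_word_count` is a hypothetical helper for illustration.
fn rust_word_count(text: &str) -> usize {
    text.split_whitespace().count()
}
```

The haiku as printed above contains no whitespace characters at all, so this function counts it as a single word, which agrees with the GNU result. If an ideographic space (U+3000) were placed between phrases, split_whitespace would treat it as a delimiter.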
Reflect upon your progress in this chapter:
You learned that the Iterator::all
function will return true
if all the elements evaluate to true
for the given predicate, which is a closure accepting an element. Many similar Iterator
methods accept a closure as an argument for testing, selecting, and transforming the elements.
You used the str::split_whitespace
and str::chars
methods to break text into words and characters.
You used the Iterator::count
method to count the number of items.
You wrote a function to conditionally format a value or the empty string to support the printing or omission of information according to the flag arguments.
You organized your unit tests into a tests
module and imported functions from the parent module called super
.
You saw how to use std::io::Cursor
to create a fake filehandle for testing a function that expects something that implements BufRead
.
In about 200 lines of Rust, you wrote a pretty passable replacement for one of the most widely used Unix programs.
1 There are many who know how of old did men, In counsel gather; little good did they get; In secret they plotted, it was sore for them later, And for Gjuki’s sons, whose trust they deceived.
2 When my youngest first started brushing his own teeth before bed, I would ask if he’d brushed and flossed. The problem was that he was prone to fibbing, so it was hard to trust him. In an actual exchange one night, I asked “Did you brush and floss your teeth?” Yes, he replied. “Did you brush your teeth?” Yes, he replied. “Did you floss your teeth?” No, he replied. So clearly he failed to properly combine Boolean values because a true
statement and a false
statement should result in a false
outcome.
3 A more literal translation might be “Corner spider, rest easy, my soot-broom is idle.”