Please explain the expression on your face
They Might Be Giants
The grep program¹ will find lines of input that match a given regular expression.
By default, the input comes from STDIN, but you can also provide the names of one or more files, or of directories if you also use a recursive option to find all the files in those directories.
The normal output will be the lines that match the given pattern, but you can invert the match to find the lines that don’t match.
You can also instruct grep
to print the number of matching lines instead of the lines.
Pattern matching is normally case-sensitive, but you can use an option to perform case-insensitive matching.
While the original program will do more, the challenge program will only go this far.
You will learn:
How to use a case-insensitive regular expression
About variations of regular expression syntax
Another syntax to indicate a trait bound
How to use Rust’s logical AND (&&) and OR (||) operators
The manual page for the BSD grep
shows just how many different options the command will accept:
GREP(1) BSD General Commands Manual GREP(1) NAME grep, egrep, fgrep, zgrep, zegrep, zfgrep -- file pattern searcher SYNOPSIS grep [-abcdDEFGHhIiJLlmnOopqRSsUVvwxZ] [-A num] [-B num] [-C[num]] [-e pattern] [-f file] [--binary-files=value] [--color[=when]] [--colour[=when]] [--context[=num]] [--label] [--line-buffered] [--null] [pattern] [file ...] DESCRIPTION The grep utility searches any given input files, selecting lines that match one or more patterns. By default, a pattern matches an input line if the regular expression (RE) in the pattern matches the input line without its trailing newline. An empty expression matches every line. Each input line that matches at least one of the patterns is written to the standard output. grep is used for simple patterns and basic regular expressions (BREs); egrep can handle extended regular expressions (EREs). See re_format(7) for more information on regular expressions. fgrep is quicker than both grep and egrep, but can only handle fixed patterns (i.e. it does not interpret regular expressions). Patterns may consist of one or more lines, allowing any of the pattern lines to match a portion of the input.
The GNU version is very similar:
GREP(1) General Commands Manual GREP(1) NAME grep, egrep, fgrep - print lines matching a pattern SYNOPSIS grep [OPTIONS] PATTERN [FILE...] grep [OPTIONS] [-e PATTERN | -f FILE] [FILE...] DESCRIPTION grep searches the named input FILEs (or standard input if no files are named, or if a single hyphen-minus (-) is given as file name) for lines containing a match to the given PATTERN. By default, grep prints the matching lines.
To demonstrate how grep
works, I’ll use the 09_grepr/tests/inputs directory:
$ cd 09_grepr/tests/inputs $ wc -l * 9 bustle.txt 0 empty.txt 1 fox.txt 9 nobody.txt 19 total
To start, verify for yourself that grep fox empty.txt prints nothing, because empty.txt contains no lines.
As shown by the usage, grep
accepts a regular expression as the first positional argument and possibly some input files for the rest.
Note that an empty regular expression will match all lines of input, and here I’ll use the input file fox.txt, which contains one line of text:
$ grep "" fox.txt The quick brown fox jumps over the lazy dog.
Take a peek at this lovely Emily Dickinson poem, and notice that “Nobody” is always capitalized:
$ cat nobody.txt I'm Nobody! Who are you? Are you—Nobody—too? Then there's a pair of us! Don't tell! they'd advertise—you know! How dreary—to be—Somebody! How public—like a Frog— To tell one's name—the livelong June— To an admiring Bog!
If I search for Nobody, the two lines containing the string are printed:
$ grep Nobody nobody.txt I'm Nobody! Who are you? Are you—Nobody—too?
If I search for lowercase “nobody” with grep nobody nobody.txt
, nothing is printed.
I can, however, use -i|--ignore-case
to find these lines:
$ grep -i nobody nobody.txt I'm Nobody! Who are you? Are you—Nobody—too?
I can use the -v|--invert-match option to find the lines that don’t match the pattern:
$ grep -v Nobody nobody.txt Then there's a pair of us! Don't tell! they'd advertise—you know! How dreary—to be—Somebody! How public—like a Frog— To tell one's name—the livelong June— To an admiring Bog!
The -c|--count option will replace the matching lines with a count of how many lines matched:
$ grep -c Nobody nobody.txt 2
I can combine -v
and -c
to count the lines not matching:
$ grep -vc Nobody nobody.txt 7
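The interaction of the case, invert, and count options shown above can be sketched in a few lines. This is only an approximation under a loud assumption: a plain substring test stands in for a regular expression, and the function name count_selected is hypothetical rather than part of the challenge program.

```rust
/// Count the lines of `text` selected by a grep-like search.
///
/// Assumption: a substring test replaces the regular expression, so this
/// only mirrors grep's behavior for literal patterns.
fn count_selected(text: &str, needle: &str, ignore_case: bool, invert: bool) -> usize {
    // For a case-insensitive search, compare lowercased copies
    let needle = if ignore_case {
        needle.to_lowercase()
    } else {
        needle.to_string()
    };

    text.lines()
        .filter(|line| {
            let hit = if ignore_case {
                line.to_lowercase().contains(&needle)
            } else {
                line.contains(&needle)
            };
            // Inverting the match keeps exactly the lines that did not hit
            if invert {
                !hit
            } else {
                hit
            }
        })
        .count()
}
```

Counting happens after selection, which is why combining -v and -c reports the number of lines that do not match.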
When searching multiple input files, each line of output is prefixed with the name of the file it came from:
$ grep The *.txt bustle.txt:The bustle in a house bustle.txt:The morning after death bustle.txt:The sweeping up the heart, fox.txt:The quick brown fox jumps over the lazy dog. nobody.txt:Then there's a pair of us!
The filename is also included for the counts:
$ grep -c The *.txt bustle.txt:3 empty.txt:0 fox.txt:1 nobody.txt:1
Normally, the positional arguments are files, and the inclusion of a directory such as my $HOME directory will cause grep
to print a warning:
$ grep The bustle.txt $HOME fox.txt bustle.txt:The bustle in a house bustle.txt:The morning after death bustle.txt:The sweeping up the heart, grep: /Users/kyclark: Is a directory fox.txt:The quick brown fox jumps over the lazy dog.
Directory names are only acceptable when using the -r|--recursive
option to find all the files in a directory that contain matching text:
$ grep -r The . ./nobody.txt:Then there's a pair of us! ./bustle.txt:The bustle in a house ./bustle.txt:The morning after death ./bustle.txt:The sweeping up the heart, ./fox.txt:The quick brown fox jumps over the lazy dog.
The -r
and -i
short flags can be combined to perform a recursive, case-insensitive search of one or more directories:
$ grep -ri the . ./nobody.txt:Then there's a pair of us! ./nobody.txt:Don't tell! they'd advertise—you know! ./nobody.txt:To tell one's name—the livelong June— ./bustle.txt:The bustle in a house ./bustle.txt:The morning after death ./bustle.txt:The sweeping up the heart, ./fox.txt:The quick brown fox jumps over the lazy dog.
Without any positional arguments for inputs, grep
will read STDIN
:
$ cat * | grep -i the The bustle in a house The morning after death The sweeping up the heart, The quick brown fox jumps over the lazy dog. Then there's a pair of us! Don't tell! they'd advertise—you know! To tell one's name—the livelong June—
This is as far as the challenge program is expected to go.
The name of the challenge program should be grepr for a Rust version of grep.
Start with cargo new grepr, then copy the 09_grepr/tests directory into your new project.
In addition to using clap
to parse the command-line arguments, my solution will use regex
for regular expressions and walkdir
to find the input files.
Here is how I start the Cargo.toml:
[dependencies]
clap = "2.33"
regex = "1"
walkdir = "2"
sys-info = "0.9"

[dev-dependencies]
assert_cmd = "1"
predicates = "1"
rand = "0.8"
You can run cargo test
to perform an initial build and run the tests, all of which should fail.
I will start with the following for src/main.rs:
fn main() {
    if let Err(e) = grepr::get_args().and_then(grepr::run) {
        eprintln!("{}", e);
        std::process::exit(1);
    }
}
Following is how I started my src/lib.rs.
Note that all the Boolean options default to false
:
use clap::{App, Arg};
use regex::{Regex, RegexBuilder};
use std::error::Error;

type MyResult<T> = Result<T, Box<dyn Error>>;

#[derive(Debug)]
pub struct Config {
    pattern: Regex,
    files: Vec<String>,
    recursive: bool,
    count: bool,
    invert_match: bool,
}
The pattern
is a compiled regular expression.
The files
is a vector of strings.
The recursive
option is a Boolean to recursively search directories.
The count
option is a Boolean to display a count of the matches.
The invert_match
is a Boolean to find lines that do not match the pattern.
Here is how I started my get_args
and run
functions.
You should fill in the missing parts:
pub fn get_args() -> MyResult<Config> { let matches = App::new("grepr") .version("0.1.0") .author("Ken Youens-Clark <[email protected]>") .about("Rust grep") // What goes here? .get_matches(); Ok(Config { pattern: ... files: ... recursive: ... count: ... invert_match: ... }) } pub fn run(config: Config) -> MyResult<()> { println!("{:#?}", config); Ok(()) }
Your program should be able to produce the following usage:
grepr 0.1.0
Ken Youens-Clark <[email protected]>
Rust grep

USAGE:
    grepr [FLAGS] <PATTERN> <FILE>...

FLAGS:
    -c, --count           Count occurrences
    -h, --help            Prints help information
    -i, --insensitive     Case-insensitive
    -v, --invert-match    Invert match
    -r, --recursive       Recursive search
    -V, --version         Prints version information

ARGS:
    <PATTERN>    Search pattern
    <FILE>...    Input file(s) [default: -]
The search pattern is a required argument.
The input files are optional and default to “-” for STDIN
.
Your program should be able to print a Config
like the following when provided a pattern and no input files:
$ cargo run -- dog Config { pattern: dog, files: [ "-", ], recursive: false, count: false, invert_match: false, }
It should be able to handle one or more input files and handle the flags:
$ cargo run -- dog -ricv tests/inputs/*.txt Config { pattern: dog, files: [ "tests/inputs/bustle.txt", "tests/inputs/empty.txt", "tests/inputs/fox.txt", "tests/inputs/nobody.txt", ], recursive: true, count: true, invert_match: true, }
It should reject an invalid regular expression.
For instance, *
signifies zero or more of the preceding pattern.
By itself, this is incomplete:
$ cargo run -- \*
Invalid pattern "*"
I assume you figured that out. Following is how I declared my arguments:
pub fn get_args() -> MyResult<Config> { let matches = App::new("grepr") .version("0.1.0") .author("Ken Youens-Clark <[email protected]>") .about("Rust grep") .arg( Arg::with_name("pattern").value_name("PATTERN") .help("Search pattern") .required(true), ) .arg( Arg::with_name("files")
.value_name("FILE") .help("Input file(s)") .required(true) .default_value("-") .min_values(1), ) .arg( Arg::with_name("insensitive")
.value_name("INSENSITIVE") .help("Case-insensitive") .short("i") .long("insensitive") .takes_value(false), ) .arg( Arg::with_name("recursive")
.value_name("RECURSIVE") .help("Recursive search") .short("r") .long("recursive") .takes_value(false), ) .arg( Arg::with_name("count")
.value_name("COUNT") .help("Count occurrences") .short("c") .long("count") .takes_value(false), ) .arg( Arg::with_name("invert")
.value_name("INVERT") .help("Invert match") .short("v") .long("invert-match") .takes_value(false), ) .get_matches();
The first positional argument is for the pattern
.
The rest of the positional arguments are for the inputs. The default is “-”.
The insensitive
flag will handle case-insensitive options.
The recursive
flag will handle searching for files in directories.
The count
flag will cause the program to print counts.
The invert
flag will search for lines not matching the pattern.
Here the order in which you declare the positional parameters is important as the first one defined will be for the first positional argument. You can still define the options before or after the positional parameters.
Next, I used the arguments to create a regular expression that will incorporate the insensitive
option:
    let pattern = matches.value_of("pattern").unwrap();
    let pattern = RegexBuilder::new(pattern)
        .case_insensitive(matches.is_present("insensitive"))
        .build()
        .map_err(|_| format!("Invalid pattern \"{}\"", pattern))?;

    Ok(Config {
        pattern,
        files: matches.values_of_lossy("files").unwrap(),
        recursive: matches.is_present("recursive"),
        count: matches.is_present("count"),
        invert_match: matches.is_present("invert"),
    })
}
The pattern
is required, so it should be safe to unwrap the value.
The RegexBuilder::new
method will create a new regular expression.
The RegexBuilder::case_insensitive method will cause the regex to disregard case in comparisons when the insensitive flag is present.
The RegexBuilder::build method will compile the regex.
If build
returns an error, use Result::map_err
to create an error message that the given pattern is invalid.
Return the Config
.
RegexBuilder::build
will reject any pattern that is not a valid regular expression, and this raises an interesting point.
There are many syntaxes for writing regular expressions.
If you look closely at the manual page for grep
, you’ll notice these options:
-E, --extended-regexp Interpret pattern as an extended regular expression (i.e. force grep to behave as egrep). -e pattern, --regexp=pattern Specify a pattern used during the search of the input: an input line is selected if it matches any of the specified patterns. This option is most useful when multiple -e options are used to specify multiple patterns, or when a pattern begins with a dash (`-').
The converse of these options is:
-G, --basic-regexp Interpret pattern as a basic regular expression (i.e. force grep to behave as traditional grep).
Regular expressions have been around since the 1950s when they were invented by the American mathematician Stephen Cole Kleene.²
Since that time, the syntax has been modified and expanded by various groups, perhaps most notably by the Perl community which created Perl Compatible Regular Expressions (PCRE).
By default, grep
will only parse basic regexes, but the preceding flags can allow it to use other varieties.
For instance, I can use the pattern ee
to search for any lines containing two adjacent es:
$ grep 'ee' tests/inputs/* tests/inputs/bustle.txt:The sweeping up the heart,
If I wanted to find any character that appears twice in a row, the pattern is (.)\1 where the dot (.) represents any character and the capturing parentheses allow me to use the backreference \1 to refer to the first capture group.
This is an example of an extended expression, and so requires the -E flag:
$ grep -E '(.)\1' tests/inputs/*
tests/inputs/bustle.txt:The sweeping up the heart,
tests/inputs/bustle.txt:And putting love away
tests/inputs/bustle.txt:We shall not want to use again
tests/inputs/nobody.txt:Are you—Nobody—too?
tests/inputs/nobody.txt:Don't tell! they'd advertise—you know!
tests/inputs/nobody.txt:To tell one's name—the livelong June—
The Rust regex
crate documentation notes that its “syntax is similar to Perl-style regular expressions, but lacks a few features like look around and backreferences.”
(Look-around assertions allow the expression to assert that a pattern must be followed or preceded by another pattern, and backreferences allow the pattern to refer to previously captured values.)
This means that the challenge program will work more like the egrep
in handling extended regular expressions by default.
Sadly, this also means that the program will not be able to handle the preceding pattern because it requires backreferences.
It will still be a wicked cool program to write, so let’s get at it.
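Although the regex crate cannot express (.)\1, this particular check does not strictly need a regular expression. The following is a minimal sketch of one possible workaround using only the standard library. The function name has_doubled_char is hypothetical and is not part of the challenge program.

```rust
/// Report whether any character in `line` is immediately followed by an
/// identical character, which is what the backreference pattern (.)\1 finds.
fn has_doubled_char(line: &str) -> bool {
    // Pair each character with its successor and look for an equal pair
    line.chars()
        .zip(line.chars().skip(1))
        .any(|(current, next)| current == next)
}
```

This trades the generality of a pattern language for a hand-written test, so it only covers this one case and is not a general replacement for backreferences.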
To use the compiled regex, I next need to find all the files to search.
Recall that the user might provide directory names with the --recursive
option to search for all the files contained in each directory; otherwise, directory names should result in a warning printed to STDERR
.
I decided to write a function called find_files
that will accept a vector of strings which may be file or directory names along with a Boolean for whether or not to recurse into directories.
It returns a vector of MyResult
values that will either hold a string which is the name of a valid file or an error message:
fn find_files(files: &[String], recursive: bool) -> Vec<MyResult<String>> { unimplemented!(); }
To test this, I can add a tests
module to src/lib.rs.
Note that this will use the rand
module which should be listed in the [dev-dependencies]
section of your Cargo.toml as noted earlier in the chapter:
#[cfg(test)]
mod tests {
    use super::find_files;
    use rand::{distributions::Alphanumeric, Rng};

    #[test]
    fn test_find_files() {
        // Verify that the function finds a file known to exist
        let files =
            find_files(&["./tests/inputs/fox.txt".to_string()], false);
        assert_eq!(files.len(), 1);
        assert_eq!(files[0].as_ref().unwrap(), "./tests/inputs/fox.txt");

        // The function should reject a directory without the recursive option
        let files = find_files(&["./tests/inputs".to_string()], false);
        assert_eq!(files.len(), 1);
        if let Err(e) = &files[0] {
            assert_eq!(
                e.to_string(),
                "./tests/inputs is a directory".to_string()
            );
        }

        // Verify the function recurses to find four files in the directory
        let res = find_files(&["./tests/inputs".to_string()], true);
        let mut files: Vec<String> = res
            .iter()
            // Normalize Windows path separators to forward slashes
            .map(|r| r.as_ref().unwrap().replace("\\", "/"))
            .collect();
        files.sort();
        assert_eq!(files.len(), 4);
        assert_eq!(
            files,
            vec![
                "./tests/inputs/bustle.txt",
                "./tests/inputs/empty.txt",
                "./tests/inputs/fox.txt",
                "./tests/inputs/nobody.txt",
            ]
        );

        // Generate a random string to represent a nonexistent file
        let bad: String = rand::thread_rng()
            .sample_iter(&Alphanumeric)
            .take(7)
            .map(char::from)
            .collect();

        // Verify that the function returns the bad file as an error
        let files = find_files(&[bad], false);
        assert_eq!(files.len(), 1);
        assert!(files[0].is_err());
    }
}
You should be able to run cargo test test_find_files
to verify that your function finds existing files, recurses properly, and will report nonexistent files as errors.
Here is how I can use it in my code:
pub fn run(config: Config) -> MyResult<()> {
    println!("pattern \"{}\"", config.pattern);

    for entry in find_files(&config.files, config.recursive) {
        match entry {
            Err(e) => eprintln!("{}", e),
            Ok(filename) => println!("file \"{}\"", filename),
        }
    }

    Ok(())
}
My solution uses WalkDir, which I introduced in Chapter 7 (findr).
See if you can get your program to reproduce the following output.
To start, the default input should be “-” to represent reading from STDIN
:
$ cargo run -- fox pattern "fox" file "-"
Printing a regular expression means calling the Regex::as_str method. The documentation for RegexBuilder::build notes that this “will produce the pattern given to new verbatim. Notably, it will not incorporate any of the flags set on this builder.”
Explicitly listing “-” as the input should produce the same output:
$ cargo run -- fox - pattern "fox" file "-"
The program should handle multiple input files:
$ cargo run -- fox tests/inputs/* pattern "fox" file "tests/inputs/bustle.txt" file "tests/inputs/empty.txt" file "tests/inputs/fox.txt" file "tests/inputs/nobody.txt"
A directory name without a --recursive
option should be rejected:
$ cargo run -- fox tests/inputs pattern "fox" tests/inputs is a directory
With the --recursive
flag, it should find the directory’s files:
$ cargo run -- -r fox tests/inputs pattern "fox" file "tests/inputs/empty.txt" file "tests/inputs/nobody.txt" file "tests/inputs/bustle.txt" file "tests/inputs/fox.txt"
Nonexistent arguments should be printed to STDERR
in the course of handling each entry:
$ cargo run -- -r fox blargh tests/inputs/fox.txt pattern "fox" blargh: No such file or directory (os error 2) file "tests/inputs/fox.txt"
Once you are properly handling the inputs, it’s time to open the files and search for matching lines.
I suggest you again use the open
function from earlier chapters that will open and read either an existing file or STDIN
for the filename “-”.
You will need to add use std::fs::File
and use std::io::{self, BufRead, BufReader}
for this:
fn open(filename: &str) -> MyResult<Box<dyn BufRead>> { match filename { "-" => Ok(Box::new(BufReader::new(io::stdin()))), _ => Ok(Box::new(BufReader::new(File::open(filename)?))), } }
When reading the lines, be sure to preserve the line endings as one of the input files contains Windows-style CRLF endings.
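To see why the choice of reading method matters here, the following sketch contrasts BufRead::read_line, which keeps each line's ending, with BufRead::lines, which strips it. The helper names lines_with_endings and lines_without_endings are hypothetical and exist only for this illustration.

```rust
use std::io::BufRead;

/// Collect lines with their original endings intact by using `read_line`.
fn lines_with_endings<T: BufRead>(mut input: T) -> std::io::Result<Vec<String>> {
    let mut lines = vec![];
    let mut buf = String::new();
    loop {
        // `read_line` appends to the buffer and returns the bytes read
        let bytes = input.read_line(&mut buf)?;
        if bytes == 0 {
            break; // End of input
        }
        lines.push(buf.clone());
        buf.clear();
    }
    Ok(lines)
}

/// For contrast, `BufRead::lines` removes a trailing "\n" or "\r\n".
fn lines_without_endings<T: BufRead>(input: T) -> std::io::Result<Vec<String>> {
    input.lines().collect()
}
```

Given the bytes "Lorem\nIpsum\r\nDOLOR", the first function preserves the "\n" and "\r\n" endings while the second returns the bare words, so only the first can reproduce the input file exactly.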
My solution uses a function called find_lines
which you can start with the following:
fn find_lines<T: BufRead>(
    mut file: T,
    pattern: &Regex,
    invert_match: bool,
) -> MyResult<Vec<String>> {
    unimplemented!();
}
The file
must implement the std::io::BufRead
trait.
The pattern
is a reference to a compiled regular expression.
The invert_match
is a Boolean for whether to reverse the match operation.
In Chapter 5, I used impl BufRead to indicate a value that must implement the BufRead trait. In the preceding code, I’m using <T: BufRead> to write the trait bound for the type T.
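As a side-by-side comparison, here is a small sketch of the same function written with each syntax, plus the equivalent where clause that Rust also accepts. The function names are hypothetical, and counting lines is used only to give the functions a body.

```rust
use std::io::BufRead;

/// Trait bound written with `impl Trait` in the argument position.
fn count_lines_impl(file: impl BufRead) -> usize {
    // Counts the items yielded by the `lines` iterator
    file.lines().count()
}

/// The same bound written on a named generic type parameter.
fn count_lines_generic<T: BufRead>(file: T) -> usize {
    file.lines().count()
}

/// The same bound again, moved into a `where` clause.
fn count_lines_where<T>(file: T) -> usize
where
    T: BufRead,
{
    file.lines().count()
}
```

All three accept any value that implements BufRead. The named forms are handy when the type T must be mentioned more than once in the signature.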
To test this function, I expanded my tests
module by adding the following test_find_lines
function which again uses std::io::Cursor
to create a fake filehandle that implements BufRead
for testing:
#[cfg(test)]
mod tests {
    use super::{find_files, find_lines};
    use rand::{distributions::Alphanumeric, Rng};
    use regex::{Regex, RegexBuilder};
    use std::io::Cursor;

    #[test]
    fn test_find_lines() {
        let text = b"Lorem\nIpsum\r\nDOLOR";

        // The pattern _or_ should match the one line, "Lorem"
        let re1 = Regex::new("or").unwrap();
        let matches = find_lines(Cursor::new(&text), &re1, false);
        assert!(matches.is_ok());
        assert_eq!(matches.unwrap().len(), 1);

        // When inverted, the function should match the other two lines
        let matches = find_lines(Cursor::new(&text), &re1, true);
        assert!(matches.is_ok());
        assert_eq!(matches.unwrap().len(), 2);

        // This regex will be case-insensitive
        let re2 = RegexBuilder::new("or")
            .case_insensitive(true)
            .build()
            .unwrap();

        // The two lines "Lorem" and "DOLOR" should match
        let matches = find_lines(Cursor::new(&text), &re2, false);
        assert!(matches.is_ok());
        assert_eq!(matches.unwrap().len(), 2);

        // When inverted, the one remaining line should match
        let matches = find_lines(Cursor::new(&text), &re2, true);
        assert!(matches.is_ok());
        assert_eq!(matches.unwrap().len(), 1);
    }

    #[test]
    fn test_find_files() {} // Same as before
}
Try writing this function and then running cargo test test_find_lines
until it passes.
Next, I suggest you incorporate these ideas into your run
:
pub fn run(config: Config) -> MyResult<()> {
    let entries = find_files(&config.files, config.recursive);
    for entry in entries {
        match entry {
            Err(e) => eprintln!("{}", e),
            Ok(filename) => match open(&filename) {
                Err(e) => eprintln!("{}: {}", filename, e),
                Ok(_file) => println!("Opened {}", filename),
            },
        }
    }
    Ok(())
}
Look for the input files.
Handle the errors from finding input files.
Try to open a valid filename.
Handle errors opening a file.
Here you have an open filehandle.
Start as simply as possible, perhaps by using an empty regular expression that should match all the lines from the input:
$ cargo run -- "" tests/inputs/fox.txt The quick brown fox jumps over the lazy dog.
Be sure you are reading STDIN
by default:
$ cargo run -- "" < tests/inputs/fox.txt The quick brown fox jumps over the lazy dog.
Run with several input files and a case-sensitive pattern:
$ cargo run -- The tests/inputs/* tests/inputs/bustle.txt:The bustle in a house tests/inputs/bustle.txt:The morning after death tests/inputs/bustle.txt:The sweeping up the heart, tests/inputs/fox.txt:The quick brown fox jumps over the lazy dog. tests/inputs/nobody.txt:Then there's a pair of us!
Then try to print the number of matches instead of the lines:
$ cargo run -- --count The tests/inputs/* tests/inputs/bustle.txt:3 tests/inputs/empty.txt:0 tests/inputs/fox.txt:1 tests/inputs/nobody.txt:1
Incorporate the --insensitive
option:
$ cargo run -- --count --insensitive The tests/inputs/* tests/inputs/bustle.txt:3 tests/inputs/empty.txt:0 tests/inputs/fox.txt:1 tests/inputs/nobody.txt:3
Next, try to invert the matching:
$ cargo run -- --count --invert-match The tests/inputs/* tests/inputs/bustle.txt:6 tests/inputs/empty.txt:0 tests/inputs/fox.txt:0 tests/inputs/nobody.txt:8
Be sure your --recursive
option works:
$ cargo run -- -icr the tests/inputs tests/inputs/empty.txt:0 tests/inputs/nobody.txt:3 tests/inputs/bustle.txt:3 tests/inputs/fox.txt:1
Handle errors like nonexistent files while processing the files in order:
$ cargo run -- fox blargh tests/inputs/fox.txt blargh: No such file or directory (os error 2) tests/inputs/fox.txt:The quick brown fox jumps over the lazy dog.
Another potential problem you should gracefully handle is a failure to open a file perhaps due to insufficient permissions:
$ touch hammer && chmod 000 hammer $ cargo run -- fox hammer tests/inputs/fox.txt hammer: Permission denied (os error 13) tests/inputs/fox.txt:The quick brown fox jumps over the lazy dog.
These challenges are getting harder, so it’s OK to feel a bit overwhelmed by the requirements.
Try to tackle each task in order, and keep running cargo test
to see how many you’re able to pass.
When you get stuck, run grep
with the arguments and closely examine the output.
Then run your program with the same arguments and try to find the differences.
To start, I’ll share my find_files
function:
// Requires `use std::fs;` and `use walkdir::WalkDir;` at the top of src/lib.rs
fn find_files(files: &[String], recursive: bool) -> Vec<MyResult<String>> {
    let mut results = vec![];

    for path in files {
        match path.as_str() {
            "-" => results.push(Ok(path.to_string())),
            _ => match fs::metadata(&path) {
                Ok(metadata) => {
                    if metadata.is_dir() {
                        if recursive {
                            for entry in WalkDir::new(path)
                                .into_iter()
                                .filter_map(|e| e.ok())
                                .filter(|e| e.file_type().is_file())
                            {
                                results.push(Ok(entry
                                    .path()
                                    .display()
                                    .to_string()));
                            }
                        } else {
                            results.push(Err(From::from(format!(
                                "{} is a directory",
                                path
                            ))));
                        }
                    } else if metadata.is_file() {
                        results.push(Ok(path.to_string()));
                    }
                }
                Err(e) => {
                    results.push(Err(From::from(format!("{}: {}", path, e))))
                }
            },
        }
    }

    results
}
Initialize an empty vector to hold the results
.
Iterate over each of the given filenames.
First, accept the filename “-” for STDIN
.
Try to get the file’s metadata.
Check if the entry is a directory.
Check if the user wants to recursively search directories.
Add all the files in the given directory to the results
.
Note an error that the given entry is a directory.
If the entry is a file, add it to the results
.
This arm will be triggered by nonexistent files.
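For readers curious about what the recursive step involves, the following is a rough sketch of a directory walk written with std::fs::read_dir instead of WalkDir. It is a swapped-in technique and not the method used in the solution above. The function name walk_files is hypothetical, and the sketch assumes there are no symbolic-link cycles, which it makes no attempt to detect.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Recursively collect the regular files under `dir` using only `std::fs`.
///
/// Assumption: the tree contains no symlink cycles. `Path::is_dir` follows
/// symbolic links, so a cycle would make this recurse without end.
fn walk_files(dir: &Path, found: &mut Vec<String>) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_dir() {
            // Descend into subdirectories
            walk_files(&path, found)?;
        } else if path.is_file() {
            found.push(path.display().to_string());
        }
    }
    Ok(())
}
```

Unlike the WalkDir version, which skips unreadable entries with filter_map, this sketch stops at the first I/O error.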
Next, I will share my find_lines
function.
This borrows heavily from previous functions that read files line-by-line, so I won’t comment on code I’ve used before:
fn find_lines<T: BufRead>(
    mut file: T,
    pattern: &Regex,
    invert_match: bool,
) -> MyResult<Vec<String>> {
    let mut matches = vec![];
    let mut line = String::new();

    loop {
        let bytes = file.read_line(&mut line)?;
        if bytes == 0 {
            break;
        }
        if (pattern.is_match(&line) && !invert_match)
            || (!pattern.is_match(&line) && invert_match)
        {
            matches.push(line.clone());
        }
        line.clear();
    }

    Ok(matches)
}
Initialize a mutable vector to hold the matching lines.
Verify that the line matches and I’m not supposed to invert the match.
Alternatively, check whether the line does not match and I am supposed to invert the match.
I must clone
the string to add it to the matches
.
In the preceding function, && is a short-circuiting logical AND that evaluates to true only if both operands are true. || is the short-circuiting logical OR and evaluates to true if either operand is true.
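The combined condition keeps a line exactly when the match result and the invert flag disagree. The sketch below spells out the condition as written in find_lines next to a more compact comparison and can be used to check that the two agree. Both function names are hypothetical.

```rust
/// The condition from `find_lines`, spelled out with && and ||.
fn keep_verbose(is_match: bool, invert_match: bool) -> bool {
    (is_match && !invert_match) || (!is_match && invert_match)
}

/// An equivalent form: keep the line when the two Booleans differ.
fn keep_compact(is_match: bool, invert_match: bool) -> bool {
    is_match != invert_match
}
```

The compact form also evaluates the match only once, whereas the spelled-out condition in find_lines may call is_match twice for the same line.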
At the beginning of my run
function, I decided to create a closure to handle the printing of the output with or without the filenames given the number of input files:
pub fn run(config: Config) -> MyResult<()> {
    let entries = find_files(&config.files, config.recursive);
    let num_files = &entries.len();
    let print = |fname: &str, val: &str| {
        if num_files > &1 {
            print!("{}:{}", fname, val);
        } else {
            print!("{}", val);
        }
    };
Find all the inputs.
Find the number of inputs.
Create a print
closure that uses the number of inputs to decide whether to print the filename in the output.
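To make the closure's behavior easier to inspect, here is a variant that returns the formatted strings instead of printing them. This is only a sketch. The names format_records and render are hypothetical, and returning a Vec is a choice made for this illustration rather than something the solution does.

```rust
/// Build one output record per filename the way the `print` closure does,
/// returning the strings rather than writing them to STDOUT.
fn format_records(filenames: &[&str], val: &str) -> Vec<String> {
    let num_files = filenames.len();

    // The closure captures `num_files` from the enclosing scope
    let render = |fname: &str, val: &str| -> String {
        if num_files > 1 {
            format!("{}:{}", fname, val)
        } else {
            val.to_string()
        }
    };

    filenames.iter().map(|&fname| render(fname, val)).collect()
}
```

With a single input the filename is omitted, and with two or more inputs each record is prefixed with its filename and a colon, matching the grep output shown earlier.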
Continuing from there, my code attempts to find the matching lines from the entries:
    for entry in entries {
        match entry {
            Err(e) => eprintln!("{}", e),
            Ok(filename) => match open(&filename) {
                Err(e) => eprintln!("{}: {}", filename, e),
                Ok(file) => {
                    match find_lines(
                        file,
                        &config.pattern,
                        config.invert_match,
                    ) {
                        Err(e) => eprintln!("{}", e),
                        Ok(matches) => {
                            if config.count {
                                print(
                                    &filename,
                                    &format!("{}\n", &matches.len()),
                                );
                            } else {
                                for line in &matches {
                                    print(&filename, line);
                                }
                            }
                        }
                    }
                }
            },
        }
    }

    Ok(())
}
ripgrep
is a very complete Rust implementation of grep
and is well worthy of your study.
You can install the program using the instructions provided and then execute rg
.
As shown in Figure 9-1, the matching text is highlighted in the output.
Try to add that feature to your program using Regex::find
to find the start and stop positions of the matching pattern and something like colorize
to highlight the match.
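As a starting point, the idea can be sketched without any extra crates. Regex::find returns a match whose start and end methods give the byte offsets of the matching text. The sketch below makes two loud assumptions: it substitutes str::find on a literal substring for the regex, and it uses raw ANSI escape codes for bold red rather than a coloring crate. The function name highlight_first is hypothetical.

```rust
/// Wrap the first occurrence of `needle` in ANSI escape codes for bold red.
///
/// Assumption: a literal substring stands in for the regex match. With the
/// regex crate, the `start` and `end` offsets would come from the match.
fn highlight_first(line: &str, needle: &str) -> String {
    match line.find(needle) {
        Some(start) if !needle.is_empty() => {
            let end = start + needle.len();
            format!(
                "{}\x1b[1;31m{}\x1b[0m{}",
                &line[..start],    // Text before the match
                &line[start..end], // The matching text, highlighted
                &line[end..]       // Text after the match
            )
        }
        // No match, or an empty needle: return the line unchanged
        _ => line.to_string(),
    }
}
```

A fuller version would highlight every match on the line and would likely skip the escape codes when the output is not a terminal.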
Figure 9-1. The rg tool will highlight the matching text
Mostly this chapter challenged you to extend skills you’ve learned from previous chapters.
For instance, in Chapter 7 (findr
), you learned how to recursively find files in directories, and several previous chapters have used regular expressions.
In this chapter, you combined those skills to find content in files matching (or not) a given regex.
In addition, you learned the following:
How to use RegexBuilder
to create more complicated regular expressions using, for instance, the case-insensitive option to match strings regardless of case.
There are multiple syntaxes for writing regular expressions that different tools recognize such as PCRE. Rust’s regex
engine does not implement some features of PCRE such as look-around assertions or backreferences.
You can indicate the trait bound BufRead
in function signatures either using impl BufRead
or by using <T: BufRead>
.
Rust’s logical AND operator (&&) evaluates to true if both the operands are true. The OR (||) operator evaluates to true if either is true.
1 The name grep comes from the ed command g/re/p, which means global regular expression print.
2 If you would like to learn more about regexes, I recommend Mastering Regular Expressions by Jeffrey Friedl (O’Reilly, 2006).