Chapter 9. Jack the Grepper

Please explain the expression on your face

They Might Be Giants

The grep program¹ will find lines of input that match a given regular expression. By default, the input comes from STDIN, but you can also provide the names of one or more files, or of directories if you also use a recursive option to find all the files in those directories. The normal output will be the lines that match the given pattern, but you can invert the match to find the lines that don’t match. You can also instruct grep to print the number of matching lines instead of the lines themselves. Pattern matching is normally case-sensitive, but you can use an option to perform case-insensitive matching. While the original program will do more, the challenge program will only go this far.

You will learn:

  • How to use a case-insensitive regular expression

  • About variations of regular expression syntax

  • Another syntax to indicate a trait bound

  • How to use Rust’s logical AND (&&) and OR (||) operators

How grep Works

The manual page for the BSD grep shows just how many different options the command will accept:

GREP(1)                   BSD General Commands Manual                  GREP(1)

NAME
     grep, egrep, fgrep, zgrep, zegrep, zfgrep -- file pattern searcher

SYNOPSIS
     grep [-abcdDEFGHhIiJLlmnOopqRSsUVvwxZ] [-A num] [-B num] [-C[num]]
          [-e pattern] [-f file] [--binary-files=value] [--color[=when]]
          [--colour[=when]] [--context[=num]] [--label] [--line-buffered]
          [--null] [pattern] [file ...]

DESCRIPTION
     The grep utility searches any given input files, selecting lines that
     match one or more patterns.  By default, a pattern matches an input line
     if the regular expression (RE) in the pattern matches the input line
     without its trailing newline.  An empty expression matches every line.
     Each input line that matches at least one of the patterns is written to
     the standard output.

     grep is used for simple patterns and basic regular expressions (BREs);
     egrep can handle extended regular expressions (EREs).  See re_format(7)
     for more information on regular expressions.  fgrep is quicker than both
     grep and egrep, but can only handle fixed patterns (i.e. it does not
     interpret regular expressions).  Patterns may consist of one or more
     lines, allowing any of the pattern lines to match a portion of the input.

The GNU version is very similar:

GREP(1)                     General Commands Manual                    GREP(1)

NAME
       grep, egrep, fgrep - print lines matching a pattern

SYNOPSIS
       grep [OPTIONS] PATTERN [FILE...]
       grep [OPTIONS] [-e PATTERN | -f FILE] [FILE...]

DESCRIPTION
       grep  searches the named input FILEs (or standard input if no files are
       named, or if a single hyphen-minus (-) is given as file name) for lines
       containing  a  match to the given PATTERN.  By default, grep prints the
       matching lines.

To demonstrate how grep works, I’ll use the 09_grepr/tests/inputs directory:

$ cd 09_grepr/tests/inputs
$ wc -l *
       9 bustle.txt
       0 empty.txt
       1 fox.txt
       9 nobody.txt
      19 total

To start, verify for yourself that grep fox empty.txt prints nothing because empty.txt contains no lines. As shown by the usage, grep accepts a regular expression as the first positional argument and possibly some input files for the rest. Note that an empty regular expression will match all lines of input, and here I’ll use the input file fox.txt, which contains one line of text:

$ grep "" fox.txt
The quick brown fox jumps over the lazy dog.

Take a peek at this lovely Emily Dickinson poem, and notice that “Nobody” is always capitalized:

$ cat nobody.txt
I'm Nobody! Who are you?
Are you—Nobody—too?
Then there's a pair of us!
Don't tell! they'd advertise—you know!

How dreary—to be—Somebody!
How public—like a Frog—
To tell one's name—the livelong June—
To an admiring Bog!

If I search for Nobody, the two lines containing the string are printed:

$ grep Nobody nobody.txt
I'm Nobody! Who are you?
Are you—Nobody—too?

If I search for lowercase “nobody” with grep nobody nobody.txt, nothing is printed. I can, however, use -i|--ignore-case to find these lines:

$ grep -i nobody nobody.txt
I'm Nobody! Who are you?
Are you—Nobody—too?

I can use the -v|--invert-match option to find the lines that don’t match the pattern:

$ grep -v Nobody nobody.txt
Then there's a pair of us!
Don't tell! they'd advertise—you know!

How dreary—to be—Somebody!
How public—like a Frog—
To tell one's name—the livelong June—
To an admiring Bog!

The -c|--count option will replace the matching lines with a count of the lines that match:

$ grep -c Nobody nobody.txt
2

I can combine -v and -c to count the lines not matching:

$ grep -vc Nobody nobody.txt
7

When searching multiple input files, each line of output is prefixed with the filename:

$ grep The *.txt
bustle.txt:The bustle in a house
bustle.txt:The morning after death
bustle.txt:The sweeping up the heart,
fox.txt:The quick brown fox jumps over the lazy dog.
nobody.txt:Then there's a pair of us!

The filename is also included for the counts:

$ grep -c The *.txt
bustle.txt:3
empty.txt:0
fox.txt:1
nobody.txt:1

Normally, the positional arguments are files, and the inclusion of a directory such as my $HOME directory will cause grep to print a warning:

$ grep The bustle.txt $HOME fox.txt
bustle.txt:The bustle in a house
bustle.txt:The morning after death
bustle.txt:The sweeping up the heart,
grep: /Users/kyclark: Is a directory
fox.txt:The quick brown fox jumps over the lazy dog.

Directory names are only acceptable when using the -r|--recursive option to find all the files in a directory that contain matching text:

$ grep -r The .
./nobody.txt:Then there's a pair of us!
./bustle.txt:The bustle in a house
./bustle.txt:The morning after death
./bustle.txt:The sweeping up the heart,
./fox.txt:The quick brown fox jumps over the lazy dog.

The -r and -i short flags can be combined to perform a recursive, case-insensitive search of one or more directories:

$ grep -ri the .
./nobody.txt:Then there's a pair of us!
./nobody.txt:Don't tell! they'd advertise—you know!
./nobody.txt:To tell one's name—the livelong June—
./bustle.txt:The bustle in a house
./bustle.txt:The morning after death
./bustle.txt:The sweeping up the heart,
./fox.txt:The quick brown fox jumps over the lazy dog.

Without any positional arguments for inputs, grep will read STDIN:

$ cat * | grep -i the
The bustle in a house
The morning after death
The sweeping up the heart,
The quick brown fox jumps over the lazy dog.
Then there's a pair of us!
Don't tell! they'd advertise—you know!
To tell one's name—the livelong June—

This is as far as the challenge program is expected to go.

Getting Started

The name of the challenge program should be grepr for a Rust version of grep. Start with cargo new grepr, then copy the 09_grepr/tests directory into your new project. In addition to using clap to parse the command-line arguments, my solution will use regex for regular expressions and walkdir to find the input files. Here is how I start the Cargo.toml:

[dependencies]
clap = "2.33"
regex = "1"
walkdir = "2"
sys-info = "0.9"

[dev-dependencies]
assert_cmd = "1"
predicates = "1"
rand = "0.8"

You can run cargo test to perform an initial build and run the tests, all of which should fail.

Defining the Arguments

I will start with the following for src/main.rs:

fn main() {
    if let Err(e) = grepr::get_args().and_then(grepr::run) {
        eprintln!("{}", e);
        std::process::exit(1);
    }
}

Following is how I started my src/lib.rs. Note that all the Boolean options default to false:

use clap::{App, Arg};
use regex::{Regex, RegexBuilder};
use std::error::Error;

type MyResult<T> = Result<T, Box<dyn Error>>;

#[derive(Debug)]
pub struct Config {
    pattern: Regex, 1
    files: Vec<String>, 2
    recursive: bool, 3
    count: bool, 4
    invert_match: bool, 5
}
1

The pattern is a compiled regular expression.

2

The files is a vector of strings.

3

The recursive option is a Boolean to recursively search directories.

4

The count option is a Boolean to display a count of the matches.

5

The invert_match is a Boolean to find lines that do not match the pattern.

Here is how I started my get_args and run functions. You should fill in the missing parts:

pub fn get_args() -> MyResult<Config> {
    let matches = App::new("grepr")
        .version("0.1.0")
        .author("Ken Youens-Clark <[email protected]>")
        .about("Rust grep")
        // What goes here?
        .get_matches();

    Ok(Config {
        pattern: ...
        files: ...
        recursive: ...
        count: ...
        invert_match: ...
    })
}

pub fn run(config: Config) -> MyResult<()> {
    println!("{:#?}", config);
    Ok(())
}

Your program should be able to produce the following usage:

grepr 0.1.0
Ken Youens-Clark <[email protected]>
Rust grep

USAGE:
    grepr [FLAGS] <PATTERN> <FILE>...

FLAGS:
    -c, --count           Count occurrences
    -h, --help            Prints help information
    -i, --insensitive     Case-insensitive
    -v, --invert-match    Invert match
    -r, --recursive       Recursive search
    -V, --version         Prints version information

ARGS:
    <PATTERN>    Search pattern 1
    <FILE>...    Input file(s) [default: -] 2
1

The search pattern is a required argument.

2

The input files are optional and default to “-” for STDIN.

Your program should be able to print a Config like the following when provided a pattern and no input files:

$ cargo run -- dog
Config {
    pattern: dog,
    files: [
        "-",
    ],
    recursive: false,
    count: false,
    invert_match: false,
}

It should be able to handle one or more input files and handle the flags:

$ cargo run -- dog -ricv tests/inputs/*.txt
Config {
    pattern: dog,
    files: [
        "tests/inputs/bustle.txt",
        "tests/inputs/empty.txt",
        "tests/inputs/fox.txt",
        "tests/inputs/nobody.txt",
    ],
    recursive: true,
    count: true,
    invert_match: true,
}

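It should reject an invalid regular expression. For instance, * signifies zero or more of the preceding pattern. By itself, this is incomplete. Note that the asterisk must be escaped or quoted so that the shell does not expand it into a list of filenames:

$ cargo run -- \*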
Invalid pattern "*"

I assume you figured that out. Following is how I declared my arguments:

pub fn get_args() -> MyResult<Config> {
    let matches = App::new("grepr")
        .version("0.1.0")
        .author("Ken Youens-Clark <[email protected]>")
        .about("Rust grep")
        .arg(
            Arg::with_name("pattern") 1
                .value_name("PATTERN")
                .help("Search pattern")
                .required(true),
        )
        .arg(
            Arg::with_name("files") 2
                .value_name("FILE")
                .help("Input file(s)")
                .required(true)
                .default_value("-")
                .min_values(1),
        )
        .arg(
            Arg::with_name("insensitive") 3
                .value_name("INSENSITIVE")
                .help("Case-insensitive")
                .short("i")
                .long("insensitive")
                .takes_value(false),
        )
        .arg(
            Arg::with_name("recursive") 4
                .value_name("RECURSIVE")
                .help("Recursive search")
                .short("r")
                .long("recursive")
                .takes_value(false),
        )
        .arg(
            Arg::with_name("count") 5
                .value_name("COUNT")
                .help("Count occurrences")
                .short("c")
                .long("count")
                .takes_value(false),
        )
        .arg(
            Arg::with_name("invert") 6
                .value_name("INVERT")
                .help("Invert match")
                .short("v")
                .long("invert-match")
                .takes_value(false),
        )
        .get_matches();
1

The first positional argument is for the pattern.

2

The rest of the positional arguments are for the inputs. The default is “-”.

3

The insensitive flag will handle case-insensitive options.

4

The recursive flag will handle searching for files in directories.

5

The count flag will cause the program to print counts.

6

The invert flag will search for lines not matching the pattern.

Note

Here the order in which you declare the positional parameters is important as the first one defined will be for the first positional argument. You can still define the options before or after the positional parameters.

Next, I used the arguments to create a regular expression that will incorporate the insensitive option:

    let pattern = matches.value_of("pattern").unwrap(); 1
    let pattern = RegexBuilder::new(pattern) 2
        .case_insensitive(matches.is_present("insensitive")) 3
        .build() 4
        .map_err(|_| format!("Invalid pattern \"{}\"", pattern))?; 5

    Ok(Config { 6
        pattern,
        files: matches.values_of_lossy("files").unwrap(),
        recursive: matches.is_present("recursive"),
        count: matches.is_present("count"),
        invert_match: matches.is_present("invert"),
    })
}
1

The pattern is required, so it should be safe to unwrap the value.

2

The RegexBuilder::new method will create a new regular expression.

3

The RegexBuilder::case_insensitive method will cause the regex to disregard case in comparisons when the insensitive flag is present.

4

The RegexBuilder::build method will compile the regex.

5

If build returns an error, use Result::map_err to create an error message that the given pattern is invalid.

6

Return the Config.

RegexBuilder::build will reject any pattern that is not a valid regular expression, and this raises an interesting point. There are many syntaxes for writing regular expressions. If you look closely at the manual page for grep, you’ll notice these options:

-E, --extended-regexp
        Interpret pattern as an extended regular expression (i.e. force
        grep to behave as egrep).

-e pattern, --regexp=pattern
        Specify a pattern used during the search of the input: an input
        line is selected if it matches any of the specified patterns.
        This option is most useful when multiple -e options are used to
        specify multiple patterns, or when a pattern begins with a dash
        (`-').

The converse of the -E option is:

-G, --basic-regexp
        Interpret pattern as a basic regular expression (i.e. force grep
        to behave as traditional grep).

Regular expressions have been around since the 1950s when they were invented by the American mathematician Stephen Cole Kleene². Since that time, the syntax has been modified and expanded by various groups, perhaps most notably by the Perl community which created Perl Compatible Regular Expressions (PCRE). By default, grep will only parse basic regexes, but the preceding flags can allow it to use other varieties. For instance, I can use the pattern ee to search for any lines containing two adjacent es:

$ grep 'ee' tests/inputs/*
tests/inputs/bustle.txt:The sweeping up the heart,

If I wanted to find any character that appears twice in a row, the pattern is (.)\1 where the dot (.) represents any character and the capturing parentheses allow me to use the backreference \1 to refer to the first capture group. This is an example of an extended expression, and so requires the -E flag:

$ grep -E '(.)\1' tests/inputs/*
tests/inputs/bustle.txt:The sweeping up the heart,
tests/inputs/bustle.txt:And putting love away
tests/inputs/bustle.txt:We shall not want to use again
tests/inputs/nobody.txt:Are you—Nobody—too?
tests/inputs/nobody.txt:Don't tell! they'd advertise—you know!
tests/inputs/nobody.txt:To tell one's name—the livelong June—

The Rust regex crate documentation notes that its “syntax is similar to Perl-style regular expressions, but lacks a few features like look around and backreferences.” (Look-around assertions allow the expression to assert that a pattern must be followed or preceded by another pattern, and backreferences allow the pattern to refer to previously captured values.) This means that the challenge program will work more like the egrep in handling extended regular expressions by default. Sadly, this also means that the program will not be able to handle the preceding pattern because it requires backreferences. It will still be a wicked cool program to write, so let’s get at it.

Finding the Files to Search

To use the compiled regex, I next need to find all the files to search. Recall that the user might provide directory names with the --recursive option to search for all the files contained in each directory; otherwise, directory names should result in a warning printed to STDERR. I decided to write a function called find_files that will accept a vector of strings which may be file or directory names along with a Boolean for whether or not to recurse into directories. It returns a vector of MyResult values that will either hold a string which is the name of a valid file or an error message:

fn find_files(files: &[String], recursive: bool) -> Vec<MyResult<String>> {
    unimplemented!();
}

To test this, I can add a tests module to src/lib.rs. Note that this will use the rand crate, which should be listed in the [dev-dependencies] section of your Cargo.toml as noted earlier in the chapter:

#[cfg(test)]
mod tests {
    use super::find_files;
    use rand::{distributions::Alphanumeric, Rng};

    #[test]
    fn test_find_files() {
        // Verify that the function finds a file known to exist
        let files =
            find_files(&["./tests/inputs/fox.txt".to_string()], false);
        assert_eq!(files.len(), 1);
        assert_eq!(files[0].as_ref().unwrap(), "./tests/inputs/fox.txt");

        // The function should reject a directory without the recursive option
        let files = find_files(&["./tests/inputs".to_string()], false);
        assert_eq!(files.len(), 1);
        if let Err(e) = &files[0] {
            assert_eq!(
                e.to_string(),
                "./tests/inputs is a directory".to_string()
            );
        }

        // Verify the function recurses to find four files in the directory
        let res = find_files(&["./tests/inputs".to_string()], true);
        let mut files: Vec<String> = res
            .iter()
            .map(|r| r.as_ref().unwrap().replace("\\", "/"))
            .collect();
        files.sort();
        assert_eq!(files.len(), 4);
        assert_eq!(
            files,
            vec![
                "./tests/inputs/bustle.txt",
                "./tests/inputs/empty.txt",
                "./tests/inputs/fox.txt",
                "./tests/inputs/nobody.txt",
            ]
        );

        // Generate a random string to represent a nonexistent file
        let bad: String = rand::thread_rng()
            .sample_iter(&Alphanumeric)
            .take(7)
            .map(char::from)
            .collect();

        // Verify that the function returns the bad file as an error
        let files = find_files(&[bad], false);
        assert_eq!(files.len(), 1);
        assert!(files[0].is_err());
    }
}

You should be able to run cargo test test_find_files to verify that your function finds existing files, recurses properly, and will report nonexistent files as errors. Here is how I can use it in my code:

pub fn run(config: Config) -> MyResult<()> {
    println!("pattern \"{}\"", config.pattern);

    for entry in find_files(&config.files, config.recursive) {
        match entry {
            Err(e) => eprintln!("{}", e),
            Ok(filename) => println!("file \"{}\"", filename),
        }
    }

    Ok(())
}

My solution uses WalkDir, which I introduced in Chapter 7 (findr). See if you can get your program to reproduce the following output. To start, the default input should be “-” to represent reading from STDIN:

$ cargo run -- fox
pattern "fox"
file "-"

Note

Printing a regular expression means calling the Regex::as_str method. The documentation for RegexBuilder::build notes that this “will produce the pattern given to new verbatim. Notably, it will not incorporate any of the flags set on this builder.”

Explicitly listing “-” as the input should produce the same output:

$ cargo run -- fox -
pattern "fox"
file "-"

The program should handle multiple input files:

$ cargo run -- fox tests/inputs/*
pattern "fox"
file "tests/inputs/bustle.txt"
file "tests/inputs/empty.txt"
file "tests/inputs/fox.txt"
file "tests/inputs/nobody.txt"

A directory name without a --recursive option should be rejected:

$ cargo run -- fox tests/inputs
pattern "fox"
tests/inputs is a directory

With the --recursive flag, it should find the directory’s files:

$ cargo run -- -r fox tests/inputs
pattern "fox"
file "tests/inputs/empty.txt"
file "tests/inputs/nobody.txt"
file "tests/inputs/bustle.txt"
file "tests/inputs/fox.txt"

Nonexistent arguments should be printed to STDERR in the course of handling each entry:

$ cargo run -- -r fox blargh tests/inputs/fox.txt
pattern "fox"
blargh: No such file or directory (os error 2)
file "tests/inputs/fox.txt"

Finding the Matching Lines of Input

Once you are properly handling the inputs, it’s time to open the files and search for matching lines. I suggest you again use the open function from earlier chapters that will open and read either an existing file or STDIN for the filename “-”. You will need to add use std::fs::File and use std::io::{self, BufRead, BufReader} for this:

fn open(filename: &str) -> MyResult<Box<dyn BufRead>> {
    match filename {
        "-" => Ok(Box::new(BufReader::new(io::stdin()))),
        _ => Ok(Box::new(BufReader::new(File::open(filename)?))),
    }
}

When reading the lines, be sure to preserve the line endings as one of the input files contains Windows-style CRLF endings. My solution uses a function called find_lines which you can start with the following:

fn find_lines<T: BufRead>(
    mut file: T, 1
    pattern: &Regex, 2
    invert_match: bool, 3
) -> MyResult<Vec<String>> {
    unimplemented!();
}
1

The file must implement the std::io::BufRead trait.

2

The pattern is a reference to a compiled regular expression.

3

The invert_match is a Boolean for whether to reverse the match operation.
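To make the point about line endings concrete, here is a minimal sketch that is separate from the challenge code. It compares BufRead::lines, which strips the line endings, with BufRead::read_line, which keeps them. The helper names lines_stripped and lines_preserved are hypothetical and exist only for this illustration:

```rust
use std::io::{BufRead, Cursor};

// Hypothetical helper: collect lines with `BufRead::lines`, which removes
// the trailing "\n" or "\r\n" from each line.
fn lines_stripped<T: BufRead>(file: T) -> Vec<String> {
    file.lines().map(|line| line.unwrap()).collect()
}

// Hypothetical helper: collect lines with `BufRead::read_line`, which keeps
// whatever line ending was present in the input.
fn lines_preserved<T: BufRead>(mut file: T) -> Vec<String> {
    let mut lines = vec![];
    let mut line = String::new();
    // `read_line` returns Ok(0) when the input is exhausted
    while file.read_line(&mut line).unwrap() > 0 {
        lines.push(line.clone());
        line.clear();
    }
    lines
}

fn main() {
    // One Unix-style ending, one Windows-style ending, and no final newline
    let text = "Lorem\nIpsum\r\nDOLOR";
    println!("{:?}", lines_stripped(Cursor::new(text)));
    println!("{:?}", lines_preserved(Cursor::new(text)));
}
```

Because the second approach returns each line exactly as it appeared in the input, the program can later print the matches with print! rather than println! and reproduce the original bytes.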

Note

In Chapter 5, I used impl BufRead to indicate a value that must implement the BufRead trait. In the preceding code, I’m using <T: BufRead> to write the trait bound for the type T.
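As a small side-by-side sketch of these syntaxes, the following functions all accept any value that implements BufRead. The function names are made up for this illustration, and the where clause is a third spelling of the same bound that this chapter does not otherwise use:

```rust
use std::io::{BufRead, Cursor};

// `impl Trait` in argument position
fn count_impl(file: impl BufRead) -> usize {
    file.lines().count()
}

// A named type parameter with an inline trait bound
fn count_generic<T: BufRead>(file: T) -> usize {
    file.lines().count()
}

// The same bound written in a `where` clause
fn count_where<T>(file: T) -> usize
where
    T: BufRead,
{
    file.lines().count()
}

fn main() {
    let text = "one\ntwo\nthree\n";
    println!("{}", count_impl(Cursor::new(text)));
    println!("{}", count_generic(Cursor::new(text)));
    println!("{}", count_where(Cursor::new(text)));

    // A named type parameter can also be specified explicitly by the caller
    println!("{}", count_generic::<Cursor<&str>>(Cursor::new(text)));
}
```

One practical difference is that a named parameter like T can be referred to elsewhere in the signature or supplied explicitly by the caller, while impl BufRead leaves the type anonymous.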

To test this function, I expanded my tests module by adding the following test_find_lines function which again uses std::io::Cursor to create a fake filehandle that implements BufRead for testing:

#[cfg(test)]
mod tests {
    use super::{find_files, find_lines};
    use rand::{distributions::Alphanumeric, Rng};
    use regex::{Regex, RegexBuilder};
    use std::io::Cursor;

    #[test]
    fn test_find_lines() {
        let text = b"Lorem\nIpsum\r\nDOLOR";

        // The pattern _or_ should match the one line, "Lorem"
        let re1 = Regex::new("or").unwrap();
        let matches = find_lines(Cursor::new(&text), &re1, false);
        assert!(matches.is_ok());
        assert_eq!(matches.unwrap().len(), 1);

        // When inverted, the function should match the other two lines
        let matches = find_lines(Cursor::new(&text), &re1, true);
        assert!(matches.is_ok());
        assert_eq!(matches.unwrap().len(), 2);

        // This regex will be case-insensitive
        let re2 = RegexBuilder::new("or")
            .case_insensitive(true)
            .build()
            .unwrap();

        // The two lines "Lorem" and "DOLOR" should match
        let matches = find_lines(Cursor::new(&text), &re2, false);
        assert!(matches.is_ok());
        assert_eq!(matches.unwrap().len(), 2);

        // When inverted, the one remaining line should match
        let matches = find_lines(Cursor::new(&text), &re2, true);
        assert!(matches.is_ok());
        assert_eq!(matches.unwrap().len(), 1);
    }

    #[test]
    fn test_find_files() {} // Same as before
}

Try writing this function and then running cargo test test_find_lines until it passes. Next, I suggest you incorporate these ideas into your run:

pub fn run(config: Config) -> MyResult<()> {
    let entries = find_files(&config.files, config.recursive); 1
    for entry in entries {
        match entry {
            Err(e) => eprintln!("{}", e), 2
            Ok(filename) => match open(&filename) { 3
                Err(e) => eprintln!("{}: {}", filename, e), 4
                Ok(_file) => println!("Opened {}", filename), 5
            },
        }
    }

    Ok(())
}
1

Look for the input files.

2

Handle the errors from finding input files.

3

Try to open a valid filename.

4

Handle errors opening a file.

5

Here you have an open filehandle.

Start as simply as possible, perhaps by using an empty regular expression that should match all the lines from the input:

$ cargo run -- "" tests/inputs/fox.txt
The quick brown fox jumps over the lazy dog.

Be sure you are reading STDIN by default:

$ cargo run -- "" < tests/inputs/fox.txt
The quick brown fox jumps over the lazy dog.

Run with several input files and a case-sensitive pattern:

$ cargo run -- The tests/inputs/*
tests/inputs/bustle.txt:The bustle in a house
tests/inputs/bustle.txt:The morning after death
tests/inputs/bustle.txt:The sweeping up the heart,
tests/inputs/fox.txt:The quick brown fox jumps over the lazy dog.
tests/inputs/nobody.txt:Then there's a pair of us!

Then try to print the number of matches instead of the lines:

$ cargo run -- --count The tests/inputs/*
tests/inputs/bustle.txt:3
tests/inputs/empty.txt:0
tests/inputs/fox.txt:1
tests/inputs/nobody.txt:1

Incorporate the --insensitive option:

$ cargo run -- --count --insensitive The tests/inputs/*
tests/inputs/bustle.txt:3
tests/inputs/empty.txt:0
tests/inputs/fox.txt:1
tests/inputs/nobody.txt:3

Next, try to invert the matching:

$ cargo run -- --count --invert-match The tests/inputs/*
tests/inputs/bustle.txt:6
tests/inputs/empty.txt:0
tests/inputs/fox.txt:0
tests/inputs/nobody.txt:8

Be sure your --recursive option works:

$ cargo run -- -icr the tests/inputs
tests/inputs/empty.txt:0
tests/inputs/nobody.txt:3
tests/inputs/bustle.txt:3
tests/inputs/fox.txt:1

Handle errors like nonexistent files while processing the files in order:

$ cargo run -- fox blargh tests/inputs/fox.txt
blargh: No such file or directory (os error 2)
tests/inputs/fox.txt:The quick brown fox jumps over the lazy dog.

Another potential problem you should gracefully handle is a failure to open a file perhaps due to insufficient permissions:

$ touch hammer && chmod 000 hammer
$ cargo run -- fox hammer tests/inputs/fox.txt
hammer: Permission denied (os error 13)
tests/inputs/fox.txt:The quick brown fox jumps over the lazy dog.

These challenges are getting harder, so it’s OK to feel a bit overwhelmed by the requirements. Try to tackle each task in order, and keep running cargo test to see how many you’re able to pass. When you get stuck, run grep with the arguments and closely examine the output. Then run your program with the same arguments and try to find the differences.

Solution

To start, I’ll share my find_files function, which requires adding use std::fs and use walkdir::WalkDir to the imports:

fn find_files(files: &[String], recursive: bool) -> Vec<MyResult<String>> {
    let mut results = vec![]; 1

    for path in files { 2
        match path.as_str() {
            "-" => results.push(Ok(path.to_string())), 3
            _ => match fs::metadata(&path) { 4
                Ok(metadata) => {
                    if metadata.is_dir() { 5
                        if recursive { 6
                            for entry in WalkDir::new(path) 7
                                .into_iter()
                                .filter_map(|e| e.ok())
                                .filter(|e| e.file_type().is_file())
                            {
                                results.push(Ok(entry
                                    .path()
                                    .display()
                                    .to_string()));
                            }
                        } else {
                            results.push(Err(From::from(format!( 8
                                "{} is a directory",
                                path
                            ))));
                        }
                    } else if metadata.is_file() { 9
                        results.push(Ok(path.to_string()));
                    }
                }
                Err(e) => { 10
                    results.push(Err(From::from(format!("{}: {}", path, e))))
                }
            },
        }
    }

    results
}
1

Initialize an empty vector to hold the results.

2

Iterate over each of the given filenames.

3

First, accept the filename “-” for STDIN.

4

Try to get the file’s metadata.

5

Check if the entry is a directory.

6

Check if the user wants to recursively search directories.

7

Add all the files in the given directory to the results.

8

Note an error that the given entry is a directory.

9

If the entry is a file, add it to the results.

10

This arm will be triggered by nonexistent files.
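To show what the WalkDir step is doing, here is a rough sketch of the same recursion written with only the standard library. This is not how my solution works. The walk function is a hypothetical stand-in, and it behaves differently in two ways: it stops at the first error instead of skipping unreadable entries, and it follows symbolic links to directories, so a cyclic link could make it loop forever:

```rust
use std::fs;
use std::io;
use std::path::Path;

// Hypothetical helper: recursively collect the regular files below `dir`.
// Assumption: stopping at the first I/O error is acceptable here.
fn walk(dir: &Path, found: &mut Vec<String>) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_dir() {
            // Descend into the subdirectory
            walk(&path, found)?;
        } else if path.is_file() {
            found.push(path.display().to_string());
        }
    }
    Ok(())
}

fn main() -> io::Result<()> {
    let mut found = vec![];
    walk(Path::new("."), &mut found)?;
    // `read_dir` makes no promise about ordering, so sort for stable output
    found.sort();
    for file in found {
        println!("{}", file);
    }
    Ok(())
}
```

Writing it out this way makes it easier to appreciate how much error handling and traversal policy the walkdir crate takes care of.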

Next, I will share my find_lines function. This borrows heavily from previous functions that read files line-by-line, so I won’t comment on code I’ve used before:

fn find_lines<T: BufRead>(
    mut file: T,
    pattern: &Regex,
    invert_match: bool,
) -> MyResult<Vec<String>> {
    let mut matches = vec![]; 1
    let mut line = String::new();

    loop {
        let bytes = file.read_line(&mut line)?;
        if bytes == 0 {
            break;
        }
        if (pattern.is_match(&line) && !invert_match) 2
            || (!pattern.is_match(&line) && invert_match) 3
        {
            matches.push(line.clone()); 4
        }
        line.clear();
    }

    Ok(matches)
}
1

Initialize a mutable vector to hold the matching lines.

2

Verify that the line matches and I’m not supposed to invert the match.

3

Alternatively, check whether the line does not match and I am supposed to invert the match.

4

I must clone the string to add it to the matches.

Note

In the preceding function, the && is a short-circuiting logical AND that will only evaluate to true if both operands are true. The || is the short-circuiting logical OR, and will evaluate to true if either of the operands is true.
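The two-part condition keeps a line when exactly one of "the line matches" and "invert the match" is true, which is the definition of exclusive OR. As a sketch, with the regex result replaced by a plain Boolean so the logic can be checked in isolation, the following compares the condition as written with the shorter ^ operator. The function names are hypothetical:

```rust
// The condition as written in find_lines
fn keep_verbose(is_match: bool, invert_match: bool) -> bool {
    (is_match && !invert_match) || (!is_match && invert_match)
}

// Equivalent form: keep the line when exactly one of the two is true
fn keep_xor(is_match: bool, invert_match: bool) -> bool {
    is_match ^ invert_match
}

fn main() {
    // Walk the whole truth table and confirm the two forms agree
    for &is_match in &[false, true] {
        for &invert_match in &[false, true] {
            assert_eq!(
                keep_verbose(is_match, invert_match),
                keep_xor(is_match, invert_match)
            );
            println!(
                "match={} invert={} keep={}",
                is_match,
                invert_match,
                keep_xor(is_match, invert_match)
            );
        }
    }
}
```

Unlike && and ||, the ^ operator always evaluates both operands, but in find_lines that would mean calling pattern.is_match once per line rather than possibly twice.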

At the beginning of my run function, I decided to create a closure to handle the printing of the output with or without the filenames given the number of input files:

pub fn run(config: Config) -> MyResult<()> {
    let entries = find_files(&config.files, config.recursive); 1
    let num_files = entries.len(); 2
    let print = |fname: &str, val: &str| { 3
        if num_files > 1 {
            print!("{}:{}", fname, val);
        } else {
            print!("{}", val);
        }
    };
1

Find all the inputs.

2

Find the number of inputs.

3

Create a print closure that uses the number of inputs to decide whether to print the filename in the output.
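If you would like to unit test this decision, one option is to build the output string in a function and print it separately. The following is only a sketch of that idea, and format_output is a hypothetical name that does not appear in my solution:

```rust
// Hypothetical helper: return the text to print instead of printing it.
// `val` is expected to carry its own line ending, as the matched lines do.
fn format_output(num_files: usize, fname: &str, val: &str) -> String {
    if num_files > 1 {
        // Multiple inputs: prefix each value with the filename
        format!("{}:{}", fname, val)
    } else {
        // A single input: print the value alone
        val.to_string()
    }
}

fn main() {
    print!("{}", format_output(1, "fox.txt", "The quick brown fox\n"));
    print!("{}", format_output(2, "fox.txt", "The quick brown fox\n"));
}
```

The closure is more convenient inside run because it captures num_files automatically, while a function like this one trades that convenience for testability.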

Continuing from there, my code attempts to find the matching lines from the entries:

    for entry in entries {
        match entry {
            Err(e) => eprintln!("{}", e), 1
            Ok(filename) => match open(&filename) { 2
                Err(e) => eprintln!("{}: {}", filename, e), 3
                Ok(file) => {
                    match find_lines( 4
                        file,
                        &config.pattern,
                        config.invert_match,
                    ) {
                        Err(e) => eprintln!("{}", e), 5
                        Ok(matches) => {
                            if config.count { 6
                                print(
                                    &filename,
                                    &format!("{}\n", matches.len()),
                                );
                            } else {
                                for line in &matches {
                                    print(&filename, line);
                                }
                            }
                        }
                    }
                }
            },
        }
    }
    Ok(())
}
1

Print errors like nonexistent files to STDERR.

2

Attempt to open an existing file that could fail because of permissions.

3

Print an error to STDERR.

4

Attempt to find the matching lines of text.

5

Print errors to STDERR.

6

Decide whether to print the number of matches or the matches themselves.

Going Further

ripgrep is a very complete Rust implementation of grep and is well worthy of your study. You can install the program using the instructions provided and then execute rg. As shown in Figure 9-1, the matching text is highlighted in the output. Try to add that feature to your program using Regex::find to find the start and stop positions of the matching pattern and something like colorize to highlight the match.

Figure 9-1. The rg tool will highlight the matching text
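To give a feel for the highlighting step, here is a minimal sketch that uses only the standard library. It substitutes str::find on a literal string for Regex::find and writes raw ANSI escape codes instead of using a coloring crate, so treat it as an illustration of the slicing logic rather than a finished feature. The highlight function and the color constants are my own hypothetical names:

```rust
// ANSI escape sequences for bold red text and for resetting attributes
const RED: &str = "\x1b[1;31m";
const RESET: &str = "\x1b[0m";

// Hypothetical helper: wrap the first occurrence of `needle` in color codes.
// `str::find` yields only the starting byte offset, so the end position is
// computed from the needle's length.
fn highlight(line: &str, needle: &str) -> String {
    match line.find(needle) {
        Some(start) if !needle.is_empty() => {
            let end = start + needle.len();
            format!(
                "{}{}{}{}{}",
                &line[..start],
                RED,
                &line[start..end],
                RESET,
                &line[end..]
            )
        }
        // No match (or nothing to highlight): return the line unchanged
        _ => line.to_string(),
    }
}

fn main() {
    println!("{}", highlight("The quick brown fox", "fox"));
}
```

With the regex crate, Regex::find returns a Match that provides both the start and end offsets, so the same slicing would apply to a real pattern.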

Summary

Mostly this chapter challenged you to extend skills you’ve learned from previous chapters. For instance, in Chapter 7 (findr), you learned how to recursively find files in directories, and several previous chapters have used regular expressions. In this chapter, you combined those skills to find content in files matching (or not) a given regex.

In addition, you learned the following:

  • How to use RegexBuilder to create more complicated regular expressions using, for instance, the case-insensitive option to match strings regardless of case.

  • There are multiple syntaxes for writing regular expressions that different tools recognize, such as PCRE. Rust’s regex engine does not implement some features of PCRE such as look-around assertions or backreferences.

  • You can indicate the trait bound BufRead in function signatures either using impl BufRead or by using <T: BufRead>.

  • Rust’s logical AND operator (&&) evaluates to true if both the operands are true. The OR (||) operator evaluates to true if either is true.

1 The name grep comes from the ed command g/re/p, which means global regular expression print.

2 If you would like to learn more about regexes, I recommend Mastering Regular Expressions by Jeffrey Friedl (O’Reilly, 2006).
