Chapter 7. Finders Keepers

Then is when I maybe should have wrote it down but when I looked around to find a pen and then I tried to think of what you said

They Might Be Giants

The find utility will, unsurprisingly, find things. With no options, it will recursively search the current working directory for all entries, including files, symbolic links, sockets, directories, and more. You can restrict it to search one or more paths and add tests to find entries matching myriad criteria such as names, file sizes, file types, modification times, permissions, and more. The challenge program you write will locate files, directories, or links in one or more directories having names that match one or more regular expressions.

You will learn:

  • How to use clap to constrain possible values for command-line arguments

  • How to anchor a regular expression to the end of a string

  • How to create an enumerated type (enum)

  • How to recursively search file paths using the walkdir crate

  • How to use Iterator::any

  • How to chain multiple filter and map operations

How find Works

The manual page for find is truly amazing. It goes on for about 500 lines detailing all the options you can use to find files and directories. Here is just the beginning that shows the general vibe of the BSD find:

FIND(1)                   BSD General Commands Manual                  FIND(1)

NAME
     find -- walk a file hierarchy

SYNOPSIS
     find [-H | -L | -P] [-EXdsx] [-f path] path ... [expression]
     find [-H | -L | -P] [-EXdsx] -f path [path ...] [expression]

DESCRIPTION
     The find utility recursively descends the directory tree for each path
     listed, evaluating an expression (composed of the ''primaries'' and
     ''operands'' listed below) in terms of each file in the tree.

The GNU find is similar:

$ find --help
Usage: find [-H] [-L] [-P] [-Olevel]
[-D help|tree|search|stat|rates|opt|exec] [path...] [expression]

default path is the current directory; default expression is -print
expression may consist of: operators, options, tests, and actions:

operators (decreasing precedence; -and is implicit where no others are given):
      ( EXPR )   ! EXPR   -not EXPR   EXPR1 -a EXPR2   EXPR1 -and EXPR2
      EXPR1 -o EXPR2   EXPR1 -or EXPR2   EXPR1 , EXPR2

positional options (always true): -daystart -follow -regextype

normal options (always true, specified before other expressions):
      -depth --help -maxdepth LEVELS -mindepth LEVELS -mount -noleaf
      --version -xautofs -xdev -ignore_readdir_race -noignore_readdir_race

tests (N can be +N or -N or N): -amin N -anewer FILE -atime N -cmin N
      -cnewer FILE -ctime N -empty -false -fstype TYPE -gid N -group NAME
      -ilname PATTERN -iname PATTERN -inum N -iwholename PATTERN
      -iregex PATTERN -links N -lname PATTERN -mmin N -mtime N
      -name PATTERN -newer FILE -nouser -nogroup -path PATTERN
      -perm [-/]MODE -regex PATTERN -readable -writable -executable
      -wholename PATTERN -size N[bcwkMG] -true -type [bcdpflsD] -uid N
      -used N -user NAME -xtype [bcdpfls] -context CONTEXT

actions: -delete -print0 -printf FORMAT -fprintf FILE FORMAT -print
      -fprint0 FILE -fprint FILE -ls -fls FILE -prune -quit
      -exec COMMAND ; -exec COMMAND {} + -ok COMMAND ;
      -execdir COMMAND ; -execdir COMMAND {} + -okdir COMMAND ;

As usual, the challenge program will only attempt to implement a subset of these options that I’ll demonstrate forthwith using the files in 07_findr/tests/inputs. The following output from tree shows the directory and file structure. Note that -> indicates that d/b.csv is a link to the file a/b/b.csv. A link is a pointer or a shortcut to another file or directory:

$ cd 07_findr/tests/inputs
$ tree
.
├── a
│   ├── a.txt
│   └── b
│       ├── b.csv
│       └── c
│           └── c.mp3
├── d
│   ├── b.csv -> ../a/b/b.csv
│   ├── d.tsv
│   ├── d.txt
│   └── e
│       └── e.mp3
├── f
│   └── f.txt
└── g.csv

6 directories, 9 files
Note

Windows does not have a symbolic link (AKA symlink) like Unix, so there are four tests that will fail because the path tests\inputs\d\b.csv exists as a regular file and not as a link. I recommend Windows users explore writing and testing this program in Windows Subsystem for Linux.

To start, find will accept one or more positional arguments that are the starting paths. For each path, find will recursively search for all files and directories found therein. If I am in the tests/inputs directory and indicate . for the current working directory, find will list all the contents. Note that I will be showing the output from find when run on macOS, which lists the entries in a different order than Linux does:

$ find .
.
./g.csv
./a
./a/a.txt
./a/b
./a/b/b.csv
./a/b/c
./a/b/c/c.mp3
./f
./f/f.txt
./d
./d/b.csv
./d/d.txt
./d/d.tsv
./d/e
./d/e/e.mp3

Using the -type option¹, I can specify “f” to find only files:

$ find . -type f
./g.csv
./a/a.txt
./a/b/b.csv
./a/b/c/c.mp3
./f/f.txt
./d/d.txt
./d/d.tsv
./d/e/e.mp3

I can use “l” to only find links:

$ find . -type l
./d/b.csv

I can also use “d” to only find directories:

$ find . -type d
.
./a
./a/b
./a/b/c
./f
./d
./d/e

While the challenge program will only try to find these types, find will accept several more -type values per the manual page:

-type t
     True if the file is of the specified type.  Possible file types
     are as follows:

     b       block special
     c       character special
     d       directory
     f       regular file
     l       symbolic link
     p       FIFO
     s       socket

If you give a -type value not found in this list, find will stop with an error:

$ find . -type x
find: -type: x: unknown type

The -name option can locate items matching a file glob pattern such as *.csv for any entry ending with .csv. The * on the command line must be escaped with a backslash so that it is passed as a literal character and not interpreted by the shell:

$ find . -name \*.csv
./g.csv
./a/b/b.csv
./d/b.csv

You can also put the pattern in quotes:

$ find . -name "*.csv"
./g.csv
./a/b/b.csv
./d/b.csv

I can search for multiple -name patterns by chaining them with -o, for “or”:

$ find . -name "*.txt" -o -name "*.csv"
./g.csv
./a/a.txt
./a/b/b.csv
./f/f.txt
./d/b.csv
./d/d.txt

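I can combine -type and -name options. For instance, the following command finds regular files matching *.csv plus any links. Because the implicit -and binds more tightly than -o, the -name test applies only to -type f; the output is correct here only because the one link, d/b.csv, also ends in .csv: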

$ find . -name "*.csv" -type f -o -type l
./g.csv
./a/b/b.csv
./d/b.csv

To apply the -name test to both types, I must group the -type tests with parentheses, escaped with backslashes so the shell passes them through to find:

$ find . \( -type f -o -type l \) -name "*.csv"
./g.csv
./a/b/b.csv
./d/b.csv

I can also list multiple search paths as positional arguments:

$ find a/b d -name "*.mp3"
a/b/c/c.mp3
d/e/e.mp3

If the given search path is an invalid directory, find will print an error:

$ find blargh
find: blargh: No such file or directory

I find it odd that find accepts files as a path argument, simply printing the filename:

$ find a/a.txt
a/a.txt

The challenge program, however, will only accept readable directory names as valid arguments. While find can do much more, this is as much as you will implement in this chapter.

Getting Started

The program you write will be called findr (pronounced find-er), and I recommend you run cargo new findr to start. Update Cargo.toml with the following:

[dependencies]
clap = "2.33"
walkdir = "2"
regex = "1"

[dev-dependencies]
assert_cmd = "1"
predicates = "1"
rand = "0.8"
sys-info = "0.9" 1
1

This module is needed to detect when the tests are running on Windows and make changes in the expected output.

Normally I would suggest that you copy the 07_findr/tests directory into your project, but this will not work because the symlink in the tests/inputs directory will not be preserved, causing your tests to fail. Instead, I’ve provided a bash script in the 07_findr directory that will copy the tests into a destination directory. Run it with no arguments to see the usage:

$ ./cp-tests.sh
Usage: cp-tests.sh DEST_DIR

Assuming I created my new project in $HOME/work/rust/findr, I can run the script like this:

$ ./cp-tests.sh ~/work/rust/findr
Copying "tests" to "/Users/kyclark/work/rust/findr"
Fixing symlink
Done.

Run cargo test to build the program and run the tests, all of which should fail.

Defining the Arguments

I will use the following for src/main.rs:

fn main() {
    if let Err(e) = findr::get_args().and_then(findr::run) {
        eprintln!("{}", e);
        std::process::exit(1);
    }
}

Before I show you how I started my src/lib.rs, I want to show the expected command-line interface:

$ cargo run -- --help
findr 0.1.0
Ken Youens-Clark <[email protected]>
Rust find

USAGE:
    findr [OPTIONS] [--] [DIR]... 1

FLAGS:
    -h, --help       Prints help information
    -V, --version    Prints version information

OPTIONS:
    -n, --name <NAME>...    Name 2
    -t, --type <TYPE>...    Entry type [possible values: f, d, l] 3

ARGS:
    <DIR>...    Search directory [default: .] 4
1

The -- separates multiple optional values from the multiple positional values. Alternatively, you can place the positional arguments before the options as the find program does.

2

The -n|--name option can specify one or more patterns.

3

The -t|--type option can specify one or more of f for files, d for directories, or l for links.

4

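The DIR positional arguments name one or more directories to search; when none are given, the search starts in the current directory (.).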

You can model this however you like, but here is how I decided to start:

use crate::EntryType::*; 1
use clap::{App, Arg};
use regex::Regex;
use std::error::Error;

type MyResult<T> = Result<T, Box<dyn Error>>;

#[derive(Debug, PartialEq)] 2
enum EntryType {
    Dir,
    File,
    Link,
}

#[derive(Debug)]
pub struct Config {
    dirs: Vec<String>, 3
    names: Option<Vec<Regex>>, 4
    entry_types: Option<Vec<EntryType>>, 5
}
1

This will allow me to use, for instance, Dir instead of EntryType::Dir.

2

The EntryType is an enumerated list of possible values.

3

The dirs will be a vector of strings.

4

The names will be an optional vector of compiled regular expressions.

5

The entry_types will be an optional vector of EntryType variants.

In the preceding code, I’m introducing enum, which is a “type that can be any one of several variants.” You’ve already been using enums such as Option, which has the variants Some(T) and None, and Result, which has the variants Ok(T) and Err(E). In a language without such a type, you’d probably have to use literal strings in your code like “dir,” “file,” and “link.” In Rust, I can create a new enum called EntryType with exactly three possibilities: Dir, File, or Link. I can use these values in pattern matching with much more precision than matching strings, which might be misspelled. Additionally, Rust will not allow me to match on EntryType values without considering all the variants, which adds yet another layer of safety in using them.

Tip

Per Rust naming conventions, types, structs, traits, and enum variants use UpperCamelCase.
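The following sketch makes the contrast with strings concrete. It reuses the chapter’s EntryType enum, while the helper functions type_letter and from_letter are hypothetical and not part of findr. The match on the enum needs no catch-all arm because the compiler knows every variant, whereas the match on a string must have one, since any string could arrive.

```rust
// The chapter's EntryType, with Clone and Copy added so values can be
// passed around freely in this sketch.
#[derive(Debug, PartialEq, Clone, Copy)]
enum EntryType {
    Dir,
    File,
    Link,
}

// Hypothetical helper: the letter that `find -type` uses for each variant.
// No `_` arm is needed; removing any arm is a compile-time error (E0004).
fn type_letter(entry_type: EntryType) -> char {
    match entry_type {
        EntryType::Dir => 'd',
        EntryType::File => 'f',
        EntryType::Link => 'l',
    }
}

// Hypothetical inverse. Strings are open-ended, so a catch-all arm is
// required, and a typo such as "dir" silently falls into it.
fn from_letter(letter: &str) -> Option<EntryType> {
    match letter {
        "d" => Some(EntryType::Dir),
        "f" => Some(EntryType::File),
        "l" => Some(EntryType::Link),
        _ => None,
    }
}
```

If a Socket variant were added later, type_letter would stop compiling until it handled the new case, while from_letter would keep compiling and simply return None for "s".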

Here is how you might start the get_args function:

pub fn get_args() -> MyResult<Config> {
    let matches = App::new("findr")
        .version("0.1.0")
        .author("Ken Youens-Clark <[email protected]>")
        .about("Rust find")
        // What goes here?
        .get_matches();

    Ok(Config {
        dirs: ...,
        names: ...,
        entry_types: ...,
    })
}

Perhaps start the run function by printing the config:

pub fn run(config: Config) -> MyResult<()> {
    println!("{:?}", config);
    Ok(())
}

When run with no arguments, the default Config values should look like this:

$ cargo run
Config { dirs: ["."], names: None, entry_types: None }

When given a --type argument of “f,” the entry_types should include the File variant:

$ cargo run -- --type f
Config { dirs: ["."], names: None, entry_types: Some([File]) }

or Dir when the value is “d”:

$ cargo run -- --type d
Config { dirs: ["."], names: None, entry_types: Some([Dir]) }

or Link when the value is “l”:

$ cargo run -- --type l
Config { dirs: ["."], names: None, entry_types: Some([Link]) }

Any other value should be rejected. You can get clap::Arg to handle this, so read the documentation closely:

$ cargo run -- --type x
error: 'x' isn't a valid value for '--type <TYPE>...'
	[possible values: d, f, l]

USAGE:
    findr --type <TYPE>

For more information try --help

I’ll be using the regex crate to match file and directory names, which means that the --name value must be a valid regular expression. Regex syntax differs slightly from file glob patterns, as shown in Figure 7-1. For instance, the asterisk (*) in the file glob *.txt means zero or more of any character and the dot has no special meaning², so this will match files that end in .txt. In regex syntax, however, the asterisk means zero or more of the previous character, so I need to write .* where the dot (.) is a metacharacter that means any one character.

Figure 7-1. The asterisk * and dot . have different meanings in file globs versus regular expressions

This means that the equivalent regex should use a backslash to escape the literal dot, as in .*\.txt, and that backslash must itself be escaped on the command line:

$ cargo run -- --name .*\\.txt
Config { dirs: ["."], names: Some([.*\.txt]), entry_types: None }

Alternatively, you can place the dot inside a character class like [.] where it is no longer a metacharacter:

$ cargo run -- --name .*[.]txt
Config { dirs: ["."], names: Some([.*[.]txt]), entry_types: None }

Technically, the regular expression can match anywhere in the string, not just at the end, because the pattern is not anchored and .* can match zero characters:

let re = Regex::new(".*[.]csv").unwrap();
assert!(re.is_match("foo.csv"));
assert!(re.is_match(".csv.foo"));

If I want to insist that the regex matches at the end of the string, I can add $ at the end of the pattern to indicate the end of the string:

let re = Regex::new(".*[.]csv$").unwrap();
assert!(re.is_match("foo.csv"));
assert!(!re.is_match(".csv.foo"));
Tip

The converse of using $ to anchor a pattern to the end of a string is to use ^ to indicate the beginning of the string.
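Because findr expects regular expressions rather than globs, it can help to see how one translates into the other. The glob_to_regex function below is a hypothetical sketch, not something findr uses: it handles only the * and ? wildcards, escapes regex metacharacters such as the dot, and adds ^ and $ because a glob must match the entire name. The resulting string could then be passed to Regex::new.

```rust
// Hypothetical helper, not used by findr: translate a simple shell glob
// into an anchored regex string.
fn glob_to_regex(glob: &str) -> String {
    // A glob must match the whole name, so anchor both ends.
    let mut re = String::from("^");
    for c in glob.chars() {
        match c {
            '*' => re.push_str(".*"), // any run of characters, possibly empty
            '?' => re.push('.'),      // exactly one character
            // Literal in a glob but special in a regex, so escape it.
            // Brackets are escaped too, so [abc] classes are not translated.
            '.' | '+' | '(' | ')' | '|' | '^' | '$' | '{' | '}' | '[' | ']' | '\\' => {
                re.push('\\');
                re.push(c);
            }
            _ => re.push(c),
        }
    }
    re.push('$');
    re
}
```

For instance, glob_to_regex("*.txt") produces ^.*\.txt$, an anchored version of the regex described earlier.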

If I try to use the same file glob pattern that find expects, it will be rejected:

$ cargo run -- --name *.txt
Invalid --name "*.txt"

All the Config fields should accept multiple values. For this output, I changed run to pretty-print the config:

$ cargo run -- -t f l -n txt mp3 -- tests/inputs/a tests/inputs/d
Config {
    dirs: [
        "tests/inputs/a",
        "tests/inputs/d",
    ],
    names: Some(
        [
            txt,
            mp3,
        ],
    ),
    entry_types: Some(
        [
            File,
            Link,
        ],
    ),
}
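Switching run to pretty-printed output only requires changing the format specifier from {:?} to {:#?}, as in println!("{:#?}", config). The sketch below shows both forms on a simplified Config whose names field holds strings instead of Regex values (an assumption made to avoid depending on the regex crate here), so its names would print in quotes, unlike the Regex patterns above.

```rust
// Simplified Config for illustration: names holds Strings rather than
// compiled Regex values, and entry_types is omitted.
#[derive(Debug)]
struct Config {
    dirs: Vec<String>,
    names: Option<Vec<String>>,
}

// {:?} prints everything on one line.
fn compact(config: &Config) -> String {
    format!("{:?}", config)
}

// {:#?} ("alternate" Debug) puts each field and element on its own line.
fn pretty(config: &Config) -> String {
    format!("{:#?}", config)
}
```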

It’s important to get this much working before attempting to solve the rest of the program. Don’t proceed until your program can replicate the preceding output and can pass at least cargo test dies:

running 2 tests
test dies_bad_type ... ok
test dies_bad_name ... ok

Validating the Arguments

Following is my get_args function so that we can regroup on the task at hand:

pub fn get_args() -> MyResult<Config> {
    let matches = App::new("findr")
        .version("0.1.0")
        .author("Ken Youens-Clark <[email protected]>")
        .about("Rust find")
        .arg(
            Arg::with_name("dirs") 1
                .value_name("DIR")
                .help("Search directory")
                .default_value(".")
                .min_values(1),
        )
        .arg(
            Arg::with_name("names") 2
                .value_name("NAME")
                .help("Name")
                .short("n")
                .long("name")
                .takes_value(true)
                .multiple(true),
        )
        .arg(
            Arg::with_name("types") 3
                .value_name("TYPE")
                .help("Entry type")
                .short("t")
                .long("type")
                .possible_values(&["f", "d", "l"])
                .takes_value(true)
                .multiple(true),
        )
        .get_matches();
1

The dirs argument requires at least one value and defaults to a dot (.).

2

The names option accepts zero or more values.

3

The types option accepts zero or more values of f, d, or l.

Next, I handle the possible filenames, transforming them into regular expressions or rejecting invalid patterns:

    let mut names = vec![]; 1
    if let Some(vals) = matches.values_of_lossy("names") { 2
        for name in vals { 3
            match Regex::new(&name) { 4
                Ok(re) => names.push(re), 5
                _ => {
                    return Err(From::from(format!( 6
                        "Invalid --name \"{}\"",
                        name
                    )))
                }
            }
        }
    }
1

Create a mutable vector to hold the regular expressions.

2

See if the user has provided Some(vals) for the option.

3

Iterate over the values.

4

Try to create a new Regex with the name.

5

Add a valid regex to the list of names.

6

Return an error that the pattern is not valid.
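An alternative to pushing into a mutable vector is to map every value to a Result and collect into Result<Vec<_>, _>, which stops at the first error. The sketch below is generic over the constructor so it can be tested without the regex crate; passing Regex::new as the compile argument is the assumed use in findr, and the name compile_all is hypothetical.

```rust
// Hypothetical alternative to the mutable-vector loop. `compile` stands in
// for a constructor such as regex::Regex::new, which is assumed, not
// called, here.
fn compile_all<T, E>(
    vals: &[String],
    compile: impl Fn(&str) -> Result<T, E>,
) -> Result<Vec<T>, String> {
    vals.iter()
        .map(|name| {
            compile(name.as_str())
                .map_err(|_| format!("Invalid --name \"{}\"", name))
        })
        // Collecting into Result<Vec<T>, String> returns the first Err it
        // meets, or Ok with every compiled value.
        .collect()
}
```

In get_args, this might be called as compile_all(&vals, Regex::new)?, since a Box<dyn Error> can be created from a String.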

Next, I interpret the entry types. Even though I used Arg::possible_values to ensure that the user could only supply “f,” “d,” or “l,” Rust still requires a match arm for any other possible string:

    let entry_types = matches.values_of_lossy("types").map(|vals| { 1
        vals.iter() 2
            .filter_map(|val| match val.as_str() { 3
                "d" => Some(Dir), 4
                "f" => Some(File),
                "l" => Some(Link),
                _ => None, 5
            })
            .collect() 6
    });
1

ArgMatches::values_of_lossy will return an Option<Vec<String>>. Use Option::map to handle Some(vals).

2

Iterate over each of the values.

3

Use Iterator::filter_map that “yields only the values for which the supplied closure returns Some(value).”

4

If the value is “d,” “f,” or “l,” return the appropriate EntryType.

5

This arm should never be selected, but return None anyway.

6

Use Iterator::collect to gather the values into a vector of EntryType values.

I end the function by returning the Config:

    Ok(Config {
        dirs: matches.values_of_lossy("dirs").unwrap(),
        names: if names.is_empty() { None } else { Some(names) }, 1
        entry_types,
    })
}
1

The names should be Some value when present or None when absent.

Find All the Things

Now that you have validated the arguments from the user, it’s time to look for the items that match the conditions. You might start by iterating over config.dirs and trying to find all the files contained in each. I will use the walkdir crate for this. Following is how I can use some of the example code from the documentation to print all the entries. Be sure to add use walkdir::WalkDir for the following:

pub fn run(config: Config) -> MyResult<()> {
    for dirname in config.dirs {
        for entry in WalkDir::new(dirname) {
            println!("{}", entry?.path().display()); 1
        }
    }
    Ok(())
}
1

Note the use of entry? to unpack the Result and propagate an error.
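It may help to see roughly what WalkDir is doing. The following is a simplified, std-only sketch, not the walkdir crate’s actual implementation: the hypothetical function walk records each path, then recurses into it if it is a directory. It uses fs::symlink_metadata so that links are listed but not followed, which mirrors WalkDir’s default of not following links. It does not try to reproduce every detail of WalkDir, such as how it treats a starting path that is itself a link.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

// Simplified, std-only stand-in for WalkDir (not the crate's real code):
// record the path, then recurse into it if it is a directory.
fn walk(path: &Path, out: &mut Vec<PathBuf>) -> io::Result<()> {
    out.push(path.to_path_buf());
    // symlink_metadata describes a link itself rather than its target,
    // so links are listed but never followed.
    if fs::symlink_metadata(path)?.is_dir() {
        // read_dir makes no promise about the order of entries.
        for entry in fs::read_dir(path)? {
            // `?` propagates any I/O error, much as `entry?` does in run.
            walk(&entry?.path(), out)?;
        }
    }
    Ok(())
}
```

Because the order comes from fs::read_dir, the listing can differ between platforms and filesystems, just like the outputs shown next.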

To see if this works, I’ll list the contents of the tests/inputs/a/b directory. Note that this is the order I see on macOS:

$ cargo run -- tests/inputs/a/b
tests/inputs/a/b
tests/inputs/a/b/b.csv
tests/inputs/a/b/c
tests/inputs/a/b/c/c.mp3

On Linux, I see the following output:

$ cargo run -- tests/inputs/a/b
tests/inputs/a/b
tests/inputs/a/b/c
tests/inputs/a/b/c/c.mp3
tests/inputs/a/b/b.csv

On Windows/PowerShell, I see this output:

> cargo run -- tests/inputs/a/b
tests/inputs/a/b
tests/inputs/a/b\b.csv
tests/inputs/a/b\c
tests/inputs/a/b\c\c.mp3
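Because each platform lists the same entries in a different order, comparing output line by line in sequence would fail on at least one of them. One way around this, sketched below with the hypothetical helper same_lines, is to sort the lines of both outputs before comparing. Sorting does nothing about the backslashes in the Windows paths, so those would still need their own expected output.

```rust
// Hypothetical test helper: compare two outputs as sorted lists of lines,
// so that the macOS and Linux orderings count as equal.
fn same_lines(expected: &str, actual: &str) -> bool {
    let mut a: Vec<&str> = expected.lines().collect();
    let mut b: Vec<&str> = actual.lines().collect();
    a.sort_unstable();
    b.sort_unstable();
    a == b
}
```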

I’ve written the test suite to check the lines of output irrespective of order, and I’ve also included specific output files for Windows to ensure the backslashes are correct. A quick check with cargo test shows that this simple version of the program already passes several tests. One problem is that this program fails rather ungracefully when given a nonexistent directory name, stopping as soon as it tries to read the bad directory:

$ cargo run -- blargh tests/inputs/a/b
IO error for operation on blargh: No such file or directory (os error 2)

I recommend you build from this. First, figure out if the given argument names a directory that can be read. If not, print an error to STDERR and move to the next argument. Then iterate over the contents of the directory and show files, directories, or links when config.entry_types contains the appropriate EntryType. Next, filter out entry names that fail to match any of the given regular expressions when they are present. I would encourage you to look at the mk-outs.sh program I used to generate the expected output files for various executions of the original find command, and then read tests/cli.rs to see how these commands are translated to work with findr.

You got this. I know you can do it.

Solution

As suggested, my first step is to weed out anything that isn’t a directory or that can’t be read, perhaps due to permission problems. With the following code (be sure to add use std::fs), the program passes cargo test skips_bad_dir:

pub fn run(config: Config) -> MyResult<()> {
    for dirname in config.dirs {
        match fs::read_dir(&dirname) { 1
            Err(e) => eprintln!("{}: {}", dirname, e), 2
            _ => {
                for entry in WalkDir::new(dirname) { 3
                    println!("{}", entry?.path().display());
                }
            }
        }
    }
    Ok(())
}
1

Use fs::read_dir to attempt reading a given directory.

2

When this fails, print an error message and move on.

3

Iterate over the directory entries and print their names.
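The check works because fs::read_dir returns an error not only for a path that does not exist but also for a path that is a regular file or an unreadable directory. That is how findr ends up rejecting file arguments that find would accept. The hypothetical helper check_dir below is a sketch of the same test, returning the message that run prints to STDERR.

```rust
use std::fs;

// Hypothetical helper mirroring the match in run: Ok(()) when the path can
// be opened as a directory, otherwise the "name: error" text run prints.
fn check_dir(dirname: &str) -> Result<(), String> {
    fs::read_dir(dirname)
        .map(|_| ()) // discard the iterator; only success matters here
        .map_err(|e| format!("{}: {}", dirname, e))
}
```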

Next, if the user has indicated only certain entry types, I should skip those entries that don’t match:

pub fn run(config: Config) -> MyResult<()> {
    for dirname in config.dirs {
        match fs::read_dir(&dirname) {
            Err(e) => eprintln!("{}: {}", dirname, e),
            _ => {
                for entry in WalkDir::new(dirname) {
                    let entry = entry?; 1
                    if let Some(types) = &config.entry_types { 2
                        if !types.iter().any(|type_| match type_ { 3
                            Link => entry.path_is_symlink(),
                            Dir => entry.file_type().is_dir(),
                            File => entry.file_type().is_file(),
                        }) {
                            continue; 4
                        }
                    }
                    println!("{}", entry.path().display());
                }
            }
        }
    }
    Ok(())
}
1

Unpack the Result.

2

See if there are Some(types) to filter the entries.

3

Use Iterator::any to see if any of the desired types matches the entry’s type.

4

Skip to the next entry when the condition is not met.

Recall that I used Iterator::all in Chapter 5 to return true if all of the elements in a vector passed some predicate. In the preceding code, I’m using Iterator::any to return true if at least one of the elements satisfies the predicate, which in this case is whether the entry’s type matches one of the desired types. When I check the output, it seems to be finding, for instance, all the directories:

$ cargo run -- tests/inputs/ -t d
tests/inputs/
tests/inputs/a
tests/inputs/a/b
tests/inputs/a/b/c
tests/inputs/f
tests/inputs/d
tests/inputs/d/e
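Here is a small, self-contained illustration of the difference between the two methods (any_vs_all is a hypothetical name). Note the empty-slice case: any returns false and all returns true, so a Some holding an empty list of types would reject every entry, whereas None lets every entry through.

```rust
// Hypothetical illustration: any needs one element to pass the predicate,
// all needs every element to. Both stop as soon as the answer is known.
fn any_vs_all(nums: &[i32]) -> (bool, bool) {
    let any_even = nums.iter().any(|n| n % 2 == 0);
    let all_even = nums.iter().all(|n| n % 2 == 0);
    (any_even, all_even)
}
```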

I can run cargo test type on Linux and macOS to verify that I’m now passing all of the tests that check for types alone. (Windows will fail because of the aforementioned lack of symbolic links.) The failures are for a combination of type and name, so next, I need to skip the filenames that don’t match one of the given regular expressions:

pub fn run(config: Config) -> MyResult<()> {
    for dirname in config.dirs {
        match fs::read_dir(&dirname) {
            Err(e) => eprintln!("{}: {}", dirname, e),
            _ => {
                for entry in WalkDir::new(dirname) {
                    let entry = entry?;

                    // Same as before
                    if let Some(types) = &config.entry_types {}

                    if let Some(names) = &config.names {
                        if !names.iter().any(|re| {
                            re.is_match(&entry.file_name().to_string_lossy())
                        }) {
                            continue;
                        }
                    }

                    println!("{}", entry.path().display());
                }
            }
        }
    }
    Ok(())
}

I can use this to find, for instance, any regular file matching mp3, and it seems to work:

$ cargo run -- tests/inputs/ -t f -n mp3
tests/inputs/a/b/c/c.mp3
tests/inputs/d/e/e.mp3
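Note that the regexes are tested against the entry’s file name only, not its full path, so an anchored pattern such as ^b matches b.csv even though every path here starts with tests. The std-only sketch below (file_name_of is a hypothetical name) uses Path::file_name, which returns the same final component that DirEntry::file_name gives for ordinary entries like these, and to_string_lossy to turn a possibly non-UTF-8 OsStr into text.

```rust
use std::path::Path;

// Hypothetical helper: the final path component as a String, which is the
// part the name filter matches against.
fn file_name_of(path: &str) -> String {
    Path::new(path)
        .file_name() // Option<&OsStr>; None for a path like "/"
        .map(|name| name.to_string_lossy().into_owned())
        .unwrap_or_default()
}
```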

If I run cargo test with this version of the program on a Unix-type platform, all tests pass. Huzzah! I could stop at this point, but I feel my code could be more elegant. I want to refactor this code, which means I want to restructure it without changing the way it works. Specifically, I don’t like how I’m checking the types and names and using continue to skip entries. These are filter operations, so I’d like to use Iterator::filter. Following is my final run that still passes all the tests. Be sure you add use walkdir::DirEntry to your code for this:

pub fn run(config: Config) -> MyResult<()> {
    let type_filter = |entry: &DirEntry| match &config.entry_types { 1
        Some(types) => types.iter().any(|t| match t {
            Link => entry.path_is_symlink(),
            Dir => entry.file_type().is_dir(),
            File => entry.file_type().is_file(),
        }),
        _ => true,
    };

    let name_filter = |entry: &DirEntry| match &config.names { 2
        Some(names) => names
            .iter()
            .any(|re| re.is_match(&entry.file_name().to_string_lossy())),
        _ => true,
    };

    for dirname in &config.dirs {
        match fs::read_dir(&dirname) {
            Err(e) => eprintln!("{}: {}", dirname, e),
            _ => {
                let entries = WalkDir::new(dirname)
                    .into_iter()
                    .filter_map(|e| e.ok()) 3
                    .filter(type_filter) 4
                    .filter(name_filter) 5
                    .map(|entry| entry.path().display().to_string()) 6
                    .collect::<Vec<String>>(); 7
                println!("{}", entries.join("\n"));
            }
        }
    }

    Ok(())
}
1

Create a closure to filter entries by any of the types.

2

Create a similar closure to filter entries on any of the regular expressions.

3

Turn WalkDir into an iterator and use Iterator::filter_map to select Ok values.

4

Filter out unwanted types.

5

Filter out unwanted names.

6

Turn each DirEntry into a string to display.

7

Use Iterator::collect to create a Vec<String>.

In the preceding code, I create two closures to use with filter operations. I chose to use closures because I wanted to capture values from the config as I first showed in Chapter 6. The first closure checks if any of the config.entry_types matches the DirEntry::file_type:

let type_filter = |entry: &DirEntry| match &config.entry_types {
    Some(types) => types.iter().any(|type_| match type_ { 1
        Link => entry.path_is_symlink(), 2
        Dir => entry.file_type().is_dir(), 3
        File => entry.file_type().is_file(), 4
    }),
    _ => true,
};
1

Iterate over the config.entry_types to compare to the given entry. Note that type is a reserved word in Rust, so I use type_.

2

When the type is Link, return whether the entry is a symlink.

3

When the type is Dir, return whether the entry is a directory.

4

When the type is File, return whether the entry is a file.

The preceding match takes advantage of the Rust compiler’s ability to ensure that all variants of EntryType have been covered. For instance, comment out one arm like so:

let type_filter = |entry: &DirEntry| match &config.entry_types {
    Some(types) => types.iter().any(|t| match t {
        Link => entry.path_is_symlink(),
        Dir => entry.file_type().is_dir(),
        //File => entry.file_type().is_file(),
    }),
    _ => true,
};

The program will not compile, and the compiler reports that I’ve missed a case. You will not get this kind of safety if you use strings to model this. The enum type makes your code far safer and easier to verify and modify:

error[E0004]: non-exhaustive patterns: `&File` not covered
  --> src/lib.rs:95:51
   |
10 | / enum EntryType {
11 | |     Dir,
12 | |     File,
   | |     ---- not covered
13 | |     Link,
14 | | }
   | |_- `EntryType` defined here
...
95 |           Some(types) => types.iter().any(|t| match t {
   |                                                     ^ pattern `&File`
   |                                                       not covered
   |
   = help: ensure that all possible cases are being handled, possibly by
     adding wildcards or more match arms
   = note: the matched value is of type `&EntryType`

The second closure is used to remove filenames that don’t match one of the given regular expressions:

let name_filter = |entry: &DirEntry| match &config.names {
    Some(names) => names 1
        .iter()
        .any(|re| re.is_match(&entry.file_name().to_string_lossy())),
    _ => true, 2
};
1

When there are Some(names), use Iterator::any to check if the DirEntry::file_name matches any one of the regexes.

2

When there are no regexes, return true.

The last piece I would like to highlight is the multiple operations I can chain together with iterators in the following code. As with reading lines from a file or entries in a directory, each value in the iterator is a Result that might yield a DirEntry value. I use Iterator::filter_map with a closure that calls Result::ok, which keeps only the Ok(DirEntry) values and discards any errors. The DirEntry values are then passed to the two filters for types and names before being shunted to the map operation to transform them into String values.

let entries = WalkDir::new(dirname)
    .into_iter()
    .filter_map(|e| e.ok())
    .filter(type_filter)
    .filter(name_filter)
    .map(|entry| entry.path().display().to_string())
    .collect::<Vec<String>>();

While that is fairly dense code, I find it lean and expressive. I appreciate how much these functions are doing for me and how well they fit together. You are free to write code however you like so long as it passes the tests, but I find this to be my preferred solution.
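To see the shape of that pipeline without the walkdir crate, here is a std-only imitation. The vector of Results stands in for what WalkDir yields, and the two closures are hypothetical stand-ins for type_filter and name_filter. Iterators are lazy, so each item flows through the stages only when collect asks for it, and no intermediate vectors are built.

```rust
// Std-only imitation of the WalkDir chain; select is a hypothetical name.
fn select(items: Vec<Result<&str, String>>) -> Vec<String> {
    // Stand-ins for type_filter and name_filter (hypothetical tests).
    let looks_like_file = |name: &&str| name.contains('.');
    let is_csv = |name: &&str| name.ends_with(".csv");
    items
        .into_iter()
        .filter_map(|r| r.ok()) // keep Ok values, silently drop errors
        .filter(looks_like_file)
        .filter(is_csv)
        .map(|name| name.to_string()) // like display().to_string()
        .collect()
}
```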

Going Further

As with all the previous programs, I challenge you to implement all of the other features in find. For instance, two very useful options of find are -maxdepth and -mindepth to control how deeply into the directory structure it should search. I notice there are WalkDir::min_depth and WalkDir::max_depth options you might use.

Next, perhaps try to find files by size. The find program has a particular syntax for indicating files less than, greater than, or exactly equal to sizes:

-size n[ckMGTP]
     True if the file's size, rounded up, in 512-byte blocks is n.  If
     n is followed by a c, then the primary is true if the file's size
     is n bytes (characters).  Similarly if n is followed by a scale
     indicator then the file's size is compared to n scaled as:

     k       kilobytes (1024 bytes)
     M       megabytes (1024 kilobytes)
     G       gigabytes (1024 megabytes)
     T       terabytes (1024 gigabytes)
     P       petabytes (1024 terabytes)
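The following is a hypothetical sketch of how findr might interpret such a size argument; parse_size, size_matches, and Cmp are invented names. It follows the conventions shown in this chapter: a leading + or - means greater or less than (the GNU help above notes that N can be +N, -N, or N), no suffix means 512-byte blocks rounded up, and the scale suffixes use powers of 1024. Rounding up to whole units for the scaled suffixes as well is an assumption of this sketch.

```rust
// Hypothetical sketch for a findr -size option; these names are invented.
#[derive(Debug, PartialEq)]
enum Cmp {
    Less,
    Equal,
    Greater,
}

// Parse "[+|-]N[ckMGTP]" into (comparison, N, bytes per unit).
fn parse_size(spec: &str) -> Option<(Cmp, u64, u64)> {
    let (cmp, rest) = match spec.chars().next()? {
        '+' => (Cmp::Greater, &spec[1..]),
        '-' => (Cmp::Less, &spec[1..]),
        _ => (Cmp::Equal, spec),
    };
    let unit = match rest.chars().last()? {
        'c' => Some(1_u64),
        'k' => Some(1_u64 << 10),
        'M' => Some(1_u64 << 20),
        'G' => Some(1_u64 << 30),
        'T' => Some(1_u64 << 40),
        'P' => Some(1_u64 << 50),
        _ => None,
    };
    let (digits, unit) = match unit {
        Some(u) => (&rest[..rest.len() - 1], u), // drop the ASCII suffix
        None => (rest, 512),                     // default: 512-byte blocks
    };
    if digits.is_empty() || !digits.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    let n = digits.parse::<u64>().ok()?; // fails only on overflow now
    Some((cmp, n, unit))
}

// Does a file of `file_size` bytes satisfy `spec`? None if spec is invalid.
fn size_matches(file_size: u64, spec: &str) -> Option<bool> {
    let (cmp, n, unit) = parse_size(spec)?;
    // Round up to whole units: a 1-byte file counts as one 512-byte block.
    let units = (file_size + unit - 1) / unit;
    Some(match cmp {
        Cmp::Less => units < n,
        Cmp::Equal => units == n,
        Cmp::Greater => units > n,
    })
}
```

Under these rules, a 100-byte file does not satisfy -1M because it rounds up to one whole megabyte, so in this sketch -size -1M matches only empty files.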

The find program can also take action on the results. For instance, there is a -delete option to remove an entry. This is useful for finding and removing empty files:

$ find . -size 0 -delete

I’ve often thought it would be nice to have a -count option to tell me how many items are found the way that uniq -c did in the last chapter. I can, of course, pipe this into wc -l (or, even better, wcr), but consider adding such an option to your program. Finally, I’d recommend you look at the source code for fd, another Rust replacement for find.

Summary

I hope you have an appreciation now for how complex real-world programs can become. The find program can combine multiple comparisons to help you find, say, the large files eating up your disk or files that haven’t been modified in a long time and can be removed. Consider the skills you learned in this chapter:

  • You can now use Arg::possible_values to constrain argument values to a limited set of strings, saving you time in validating user input.

  • You can use ^ at the beginning of a regular expression to anchor the pattern to the beginning of the string and $ at the end to anchor to the end of the string.

  • You can create an enum type to represent alternate possibilities for a type. This provides far more safety than using strings.

  • You can use WalkDir to recursively search through a directory structure and evaluate the DirEntry values to find files, directories, and links.

  • You learned how to chain multiple operations like any, filter, map, and filter_map with iterators.

1 This is one of those odd programs that has no short flags and the long flags start with a single dash.

2 Like Freud said, “Sometimes a dot is just a dot.”
