Then is when I maybe should have wrote it down but when I looked around to find a pen and then I tried to think of what you said
They Might Be Giants
The find utility will, unsurprisingly, find things.
With no options, it will recursively search the current working directory for all entries, including files, symbolic links, sockets, directories, and more.
You can restrict it to search one or more paths and add conditions to find entries matching myriad criteria, such as names, file sizes, file types, modification times, permissions, and more.
The challenge program you write will locate files, directories, or links in one or more directories having names that match one or more regular expressions.
You will learn:
How to use clap to constrain possible values for command-line arguments
How to anchor a regular expression to the end of a string
How to create an enumerated type (enum)
How to recursively search file paths using the walkdir crate
How to use Iterator::any
How to chain multiple filter and map operations
The manual page for find is truly amazing. It goes on for about 500 lines detailing all the options you can use to find files and directories. Here is just the beginning that shows the general vibe of the BSD find:
FIND(1)                 BSD General Commands Manual                FIND(1)

NAME
     find -- walk a file hierarchy

SYNOPSIS
     find [-H | -L | -P] [-EXdsx] [-f path] path ... [expression]
     find [-H | -L | -P] [-EXdsx] -f path [path ...] [expression]

DESCRIPTION
     The find utility recursively descends the directory tree for each path
     listed, evaluating an expression (composed of the ''primaries'' and
     ''operands'' listed below) in terms of each file in the tree.
The GNU find is similar:
$ find --help
Usage: find [-H] [-L] [-P] [-Olevel] [-D help|tree|search|stat|rates|opt|exec] [path...] [expression]

default path is the current directory; default expression is -print
expression may consist of: operators, options, tests, and actions:

operators (decreasing precedence; -and is implicit where no others are given):
      ( EXPR )   ! EXPR   -not EXPR   EXPR1 -a EXPR2   EXPR1 -and EXPR2
      EXPR1 -o EXPR2   EXPR1 -or EXPR2   EXPR1 , EXPR2

positional options (always true): -daystart -follow -regextype

normal options (always true, specified before other expressions):
      -depth --help -maxdepth LEVELS -mindepth LEVELS -mount -noleaf
      --version -xautofs -xdev -ignore_readdir_race -noignore_readdir_race

tests (N can be +N or -N or N): -amin N -anewer FILE -atime N -cmin N
      -cnewer FILE -ctime N -empty -false -fstype TYPE -gid N -group NAME
      -ilname PATTERN -iname PATTERN -inum N -iwholename PATTERN -iregex PATTERN
      -links N -lname PATTERN -mmin N -mtime N -name PATTERN -newer FILE
      -nouser -nogroup -path PATTERN -perm [-/]MODE -regex PATTERN
      -readable -writable -executable
      -wholename PATTERN -size N[bcwkMG] -true -type [bcdpflsD] -uid N
      -used N -user NAME -xtype [bcdpfls] -context CONTEXT

actions: -delete -print0 -printf FORMAT -fprintf FILE FORMAT -print
      -fprint0 FILE -fprint FILE -ls -fls FILE -prune -quit
      -exec COMMAND ; -exec COMMAND {} + -ok COMMAND ;
      -execdir COMMAND ; -execdir COMMAND {} + -okdir COMMAND ;
As usual, the challenge program will only attempt to implement a subset of these options that I’ll demonstrate forthwith using the files in 07_findr/tests/inputs.
The following output from tree shows the directory and file structure. Note that -> indicates that d/b.csv is a link to the file a/b/b.csv. A link is a pointer or a shortcut to another file or directory:
$ cd 07_findr/tests/inputs
$ tree
.
├── a
│   ├── a.txt
│   └── b
│       ├── b.csv
│       └── c
│           └── c.mp3
├── d
│   ├── b.csv -> ../a/b/b.csv
│   ├── d.tsv
│   ├── d.txt
│   └── e
│       └── e.mp3
├── f
│   └── f.txt
└── g.csv

6 directories, 9 files
Windows does not support symbolic links (AKA symlinks) the way Unix does, so four tests will fail because the path tests\inputs\d\b.csv exists as a regular file and not as a link. I recommend Windows users explore writing and testing this program in Windows Subsystem for Linux.
To start, find will accept one or more positional arguments, which are the starting paths. For each path, find will recursively search for all files and directories found therein. If I am in the tests/inputs directory and indicate . for the current working directory, find will list all the contents.
Note that I will be showing the output from find on macOS, which orders the entries differently than Linux does:
$ find .
.
./g.csv
./a
./a/a.txt
./a/b
./a/b/b.csv
./a/b/c
./a/b/c/c.mp3
./f
./f/f.txt
./d
./d/b.csv
./d/d.txt
./d/d.tsv
./d/e
./d/e/e.mp3
Using the -type option, I can specify “f” to only find files:
$ find . -type f
./g.csv
./a/a.txt
./a/b/b.csv
./a/b/c/c.mp3
./f/f.txt
./d/d.txt
./d/d.tsv
./d/e/e.mp3
I can use “l” to only find links:
$ find . -type l
./d/b.csv
I can also use “d” to only find directories:
$ find . -type d
.
./a
./a/b
./a/b/c
./f
./d
./d/e
While the challenge program will only try to find these types, find will accept several more -type values per the manual page:
-type t
        True if the file is of the specified type. Possible file types are as follows:

        b       block special
        c       character special
        d       directory
        f       regular file
        l       symbolic link
        p       FIFO
        s       socket
If you give a -type value not found in this list, find will stop with an error:
$ find . -type x
find: -type: x: unknown type
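Rust makes this kind of validation straightforward. The following is a minimal std-only sketch, not findr's actual code, that parses the seven -type letters from the man page into a hypothetical FindType enum through the standard FromStr trait; the enum name and the error wording (modeled on the BSD message above) are assumptions:

```rust
use std::str::FromStr;

// Hypothetical enum covering every -type letter in the BSD man page excerpt.
#[derive(Debug, PartialEq)]
enum FindType {
    BlockSpecial,
    CharSpecial,
    Dir,
    File,
    Link,
    Fifo,
    Socket,
}

impl FromStr for FindType {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "b" => Ok(FindType::BlockSpecial),
            "c" => Ok(FindType::CharSpecial),
            "d" => Ok(FindType::Dir),
            "f" => Ok(FindType::File),
            "l" => Ok(FindType::Link),
            "p" => Ok(FindType::Fifo),
            "s" => Ok(FindType::Socket),
            // Anything else is rejected; wording modeled on BSD find's error.
            _ => Err(format!("-type: {}: unknown type", s)),
        }
    }
}

fn main() {
    for arg in ["f", "l", "x"] {
        match arg.parse::<FindType>() {
            Ok(t) => println!("{} -> {:?}", arg, t),
            Err(e) => eprintln!("find: {}", e),
        }
    }
}
```

Any letter outside the list, such as x, produces an Err instead of silently matching nothing. findr itself only needs the d, f, and l variants.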
The -name option can locate items matching a file glob pattern such as *.csv for any entry ending with .csv. The * on the command line must be escaped with a backslash so that it is passed as a literal character and not interpreted by the shell:
$ find . -name \*.csv
./g.csv
./a/b/b.csv
./d/b.csv
You can also put the pattern in quotes:
$ find . -name "*.csv"
./g.csv
./a/b/b.csv
./d/b.csv
I can search for multiple -name patterns by chaining them with -o, for or:
$ find . -name "*.txt" -o -name "*.csv"
./g.csv
./a/a.txt
./a/b/b.csv
./f/f.txt
./d/b.csv
./d/d.txt
I can combine -type and -name options. For instance, I can search for files or links matching *.csv:
$ find . -name "*.csv" -type f -o -type l
./g.csv
./a/b/b.csv
./d/b.csv
I must use parentheses to group the -type arguments when the -name condition follows an or expression. Like the *, the parentheses must be escaped so that the shell passes them through to find:
$ find . \( -type f -o -type l \) -name "*.csv"
./g.csv
./a/b/b.csv
./d/b.csv
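Why do the two commands above print the same thing? The implicit -and binds more tightly than -o, so the first command means (-name "*.csv" AND -type f) OR -type l. The following sketch evaluates both expressions as plain Rust booleans; Entry is a hypothetical stand-in for a directory entry, and ends_with(".csv") only approximates the *.csv glob:

```rust
// Hypothetical stand-in for a directory entry.
struct Entry {
    name: &'static str,
    is_file: bool,
    is_link: bool,
}

// find . -name "*.csv" -type f -o -type l
// The implicit -and binds tighter than -o:
// (name matches AND is a file) OR is a link
fn ungrouped(e: &Entry) -> bool {
    (e.name.ends_with(".csv") && e.is_file) || e.is_link
}

// find . \( -type f -o -type l \) -name "*.csv"
// (is a file OR is a link) AND name matches
fn grouped(e: &Entry) -> bool {
    (e.is_file || e.is_link) && e.name.ends_with(".csv")
}

fn main() {
    let entries = [
        Entry { name: "g.csv", is_file: true, is_link: false },
        Entry { name: "b.csv", is_file: false, is_link: true },
        // Hypothetical link whose name does not end in .csv.
        Entry { name: "notes.txt", is_file: false, is_link: true },
    ];
    for e in &entries {
        println!("{:<10} ungrouped={} grouped={}", e.name, ungrouped(e), grouped(e));
    }
}
```

The outputs only coincide because the single link in tests/inputs, d/b.csv, happens to end in .csv. A link named notes.txt would satisfy the ungrouped expression but not the grouped one.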
I can also list multiple search paths as positional arguments:
$ find a/b d -name "*.mp3"
a/b/c/c.mp3
d/e/e.mp3
If the given search path is an invalid directory, find will print an error:
$ find blargh
find: blargh: No such file or directory
I find it odd that find accepts files as a path argument, simply printing the filename:
$ find a/a.txt
a/a.txt
The challenge program, however, will only accept readable directory names as valid arguments.
While find can do much more, this is as much as you will implement in this chapter.
The program you write will be called findr (pronounced find-er), and I recommend you run cargo new findr to start.
Update Cargo.toml with the following:
[dependencies]
clap = "2.33"
walkdir = "2"
regex = "1"

[dev-dependencies]
assert_cmd = "1"
predicates = "1"
rand = "0.8"
sys-info = "0.9"
The sys-info crate is needed to detect when the tests are running on Windows and adjust the expected output.
Normally I would suggest that you copy the 07_findr/tests directory into your project, but this will not work because the symlink in the tests/inputs directory will not be preserved, causing your tests to fail.
Instead, I’ve provided a bash script in the 07_findr directory that will copy the tests into a destination directory. Run it with no arguments to see the usage:
$ ./cp-tests.sh
Usage: cp-tests.sh DEST_DIR
Assuming I created my new project in $HOME/work/rust/findr, I can use the program like this:
$ ./cp-tests.sh ~/work/rust/findr
Copying "tests" to "/Users/kyclark/work/rust/findr"
Fixing symlink
Done.
Run cargo test to build the program and run the tests, all of which should fail.
I will use the following for src/main.rs:
fn main() {
    if let Err(e) = findr::get_args().and_then(findr::run) {
        eprintln!("{}", e);
        std::process::exit(1);
    }
}
Before I show you how I started my src/lib.rs, I want to show the expected command-line interface:
$ cargo run -- --help
findr 0.1.0
Ken Youens-Clark <[email protected]>
Rust find

USAGE:
    findr [OPTIONS] [--] [DIR]...

FLAGS:
    -h, --help       Prints help information
    -V, --version    Prints version information

OPTIONS:
    -n, --name <NAME>...    Name
    -t, --type <TYPE>...    Entry type [possible values: f, d, l]

ARGS:
    <DIR>...    Search directory [default: .]
The -- separates multiple optional values from the multiple positional values. Alternatively, you can place the positional arguments before the options as the find program does.
The -n|--name option can specify one or more patterns. The -t|--type option can specify one or more of f for files, d for directories, or l for links.
You can model this however you like, but here is how I decided to start:
use crate::EntryType::*;
use clap::{App, Arg};
use regex::Regex;
use std::error::Error;

type MyResult<T> = Result<T, Box<dyn Error>>;

#[derive(Debug, PartialEq)]
enum EntryType {
    Dir,
    File,
    Link,
}

#[derive(Debug)]
pub struct Config {
    dirs: Vec<String>,
    names: Option<Vec<Regex>>,
    entry_types: Option<Vec<EntryType>>,
}
The use crate::EntryType::* line allows me to use, for instance, Dir instead of EntryType::Dir.
The EntryType is an enumerated list of possible values.
The dirs field will be a vector of strings.
The names field will be an optional vector of compiled regular expressions.
The entry_types field will be an optional vector of EntryType variants.
In the preceding code, I’m introducing enum, which is a “type that can be any one of several variants.” You’ve already been using enums such as Option, which has the variants Some(T) and None, and Result, which has the variants Ok(T) and Err(E).
In a language without such a type, you’d probably have to use literal strings in your code like “dir,” “file,” and “link.”
In Rust, I can create a new enum called EntryType with exactly three possibilities: Dir, File, or Link.
I can use these values in pattern matching with much more precision than matching strings, which might be misspelled.
Additionally, Rust will not allow me to match on EntryType values without considering all the variants, which adds yet another layer of safety in using them.
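To make the contrast with strings concrete, here is a small sketch (the letter and letter_from_str functions are hypothetical) that puts an exhaustive match on the enum next to a string-based version in which a typo compiles without complaint:

```rust
// Same shape as findr's EntryType.
#[derive(Debug, Clone, Copy)]
enum EntryType {
    Dir,
    File,
    Link,
}

// Exhaustive match: deleting any arm is a compile error (E0004).
fn letter(t: EntryType) -> char {
    match t {
        EntryType::Dir => 'd',
        EntryType::File => 'f',
        EntryType::Link => 'l',
    }
}

// String-based version: the compiler cannot know which strings are valid,
// so a catch-all arm is required and the typo below goes unnoticed.
fn letter_from_str(t: &str) -> Option<char> {
    match t {
        "dir" => Some('d'),
        "file" => Some('f'),
        "lnik" => Some('l'), // deliberate typo: "link" now falls through to None
        _ => None,
    }
}

fn main() {
    println!("{}", letter(EntryType::Link));
    println!("{:?}", letter_from_str("link"));
}
```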
Per Rust naming conventions, types, structs, traits, and enum variants use UpperCamelCase.
Here is how you might start the get_args function:
pub fn get_args() -> MyResult<Config> {
    let matches = App::new("findr")
        .version("0.1.0")
        .author("Ken Youens-Clark <[email protected]>")
        .about("Rust find")
        // What goes here?
        .get_matches();

    Ok(Config {
        dirs: ...
        names: ...
        entry_types: ...
    })
}
Perhaps start the run function by printing the config:
pub fn run(config: Config) -> MyResult<()> {
    println!("{:?}", config);
    Ok(())
}
When run with no arguments, the default Config values should look like this:
$ cargo run
Config { dirs: ["."], names: None, entry_types: None }
When given a --type argument of “f,” the entry_types should include the File variant:
$ cargo run -- --type f
Config { dirs: ["."], names: None, entry_types: Some([File]) }
or Dir when the value is “d”:
$ cargo run -- --type d
Config { dirs: ["."], names: None, entry_types: Some([Dir]) }
or Link when the value is “l”:
$ cargo run -- --type l
Config { dirs: ["."], names: None, entry_types: Some([Link]) }
Any other value should be rejected.
You can get clap::Arg to handle this, so read the documentation closely:
$ cargo run -- --type x
error: 'x' isn't a valid value for '--type <TYPE>...'
	[possible values: d, f, l]

USAGE:
    findr --type <TYPE>

For more information try --help
I’ll be using the regex crate to match file and directory names, which means that the --name value must be a valid regular expression.
Regex syntax differs slightly from file glob patterns as shown in Figure 6-1.
For instance, the asterisk (*) in the file glob *.txt means zero or more of any character and the dot has no special meaning, so this will match files that end in .txt. In regex syntax, however, the asterisk means zero or more of the previous character, so I need to write .* where the dot (.) is a metacharacter that means any one character.
[Figure: The asterisk * and dot . have different meanings in file globs versus regular expressions]
This means that the equivalent regex should use a backslash to escape the literal dot, such as .*\.txt, which must be double-escaped on the command line:
$ cargo run -- --name .*\\.txt
Config { dirs: ["."], names: Some([.*\.txt]), entry_types: None }
Alternatively, you can place the dot inside a character class like [.], where it is no longer a metacharacter:
$ cargo run -- --name .*[.]txt
Config { dirs: ["."], names: Some([.*[.]txt]), entry_types: None }
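If you wanted to translate simple globs automatically, a hypothetical glob_to_regex helper might look like the following std-only sketch. It handles only *, ?, and literal characters, escapes regex metacharacters, and wraps the result in ^ and $ (which anchor the pattern to the start and end of the string) because a glob must match the whole filename; real glob syntax, with character classes and braces, is more involved:

```rust
// Hypothetical helper: translate a simple glob into a regex pattern string.
// Supports only `*`, `?`, and literal characters.
fn glob_to_regex(glob: &str) -> String {
    let mut re = String::from("^");
    for c in glob.chars() {
        match c {
            '*' => re.push_str(".*"), // glob *: zero or more of any character
            '?' => re.push('.'),      // glob ?: exactly one character
            // Escape characters that are metacharacters in regex syntax.
            '.' | '+' | '(' | ')' | '|' | '^' | '$' | '{' | '}' | '[' | ']' | '\\' => {
                re.push('\\');
                re.push(c);
            }
            _ => re.push(c),
        }
    }
    re.push('$'); // a glob must match the whole name
    re
}

fn main() {
    for glob in ["*.txt", "a?.csv"] {
        println!("{} -> {}", glob, glob_to_regex(glob));
    }
}
```

Passing the resulting string to Regex::new should give a pattern equivalent to the glob for these simple cases.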
Without anchors, a regular expression can match anywhere in the string. Because .* can match zero characters, this pattern even matches a string that merely starts with .csv:
let re = Regex::new(".*[.]csv").unwrap();
assert!(re.is_match("foo.csv"));
assert!(re.is_match(".csv.foo"));
If I want to insist that the regex matches at the end of the string, I can add $ at the end of the pattern to indicate the end of the string:
let re = Regex::new(".*[.]csv$").unwrap();
assert!(re.is_match("foo.csv"));
assert!(!re.is_match(".csv.foo"));
The converse of using $ to anchor a pattern to the end of a string is to use ^ to indicate the beginning of the string.
If I try to use the same file glob pattern that find expects, it will be rejected:
$ cargo run -- --name \*.txt
Invalid --name "*.txt"
All the Config fields should accept multiple values. For this output, I changed run to pretty-print the config using the {:#?} format:
$ cargo run -- -t f l -n txt mp3 -- tests/inputs/a tests/inputs/d
Config {
    dirs: [
        "tests/inputs/a",
        "tests/inputs/d",
    ],
    names: Some(
        [
            txt,
            mp3,
        ],
    ),
    entry_types: Some(
        [
            File,
            Link,
        ],
    ),
}
It’s important to get this much working before attempting to solve the rest of the program.
Don’t proceed until your program can replicate the preceding output and can pass at least cargo test dies:
running 2 tests
test dies_bad_type ... ok
test dies_bad_name ... ok
Following is my get_args function so that we can regroup on the task at hand:
pub fn get_args() -> MyResult<Config> {
    let matches = App::new("findr")
        .version("0.1.0")
        .author("Ken Youens-Clark <[email protected]>")
        .about("Rust find")
        .arg(
            Arg::with_name("dirs")
                .value_name("DIR")
                .help("Search directory")
                .default_value(".")
                .min_values(1),
        )
        .arg(
            Arg::with_name("names")
                .value_name("NAME")
                .help("Name")
                .short("n")
                .long("name")
                .takes_value(true)
                .multiple(true),
        )
        .arg(
            Arg::with_name("types")
                .value_name("TYPE")
                .help("Entry type")
                .short("t")
                .long("type")
                .possible_values(&["f", "d", "l"])
                .takes_value(true)
                .multiple(true),
        )
        .get_matches();
The dirs argument requires at least one value and defaults to a dot (.).
The names option accepts zero or more values.
The types option accepts zero or more values of f, d, or l.
Next, I handle the possible filenames, transforming them into regular expressions or rejecting invalid patterns:
    let mut names = vec![];
    if let Some(vals) = matches.values_of_lossy("names") {
        for name in vals {
            match Regex::new(&name) {
                Ok(re) => names.push(re),
                _ => {
                    return Err(From::from(format!(
                        "Invalid --name \"{}\"",
                        name
                    )))
                }
            }
        }
    }
Create a mutable vector to hold the regular expressions.
See if the user has provided Some(vals) for the option.
Iterate over the values.
Try to create a new Regex with the name.
Add a valid regex to the list of names.
Return an error indicating that the pattern is not valid.
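As an alternative to pushing onto a mutable vector, you can collect an iterator of Result values directly into a Result<Vec<_>, _>, which returns the first Err it encounters. The sketch below demonstrates the idiom with std only; compile is a hypothetical stand-in for Regex::new that rejects patterns starting with a repetition operator (one reason the regex crate rejects *.txt):

```rust
// Hypothetical stand-in for Regex::new: rejects patterns that begin with a
// repetition operator, as the regex crate does for "*.txt".
fn compile(pattern: &str) -> Result<String, String> {
    if pattern.starts_with(|c: char| c == '*' || c == '+' || c == '?') {
        Err(format!("Invalid --name \"{}\"", pattern))
    } else {
        Ok(pattern.to_string())
    }
}

// Collecting Result values into Result<Vec<_>, _> stops at the first Err
// and returns it; if every item is Ok, it returns all the Ok values.
fn compile_all(names: &[&str]) -> Result<Vec<String>, String> {
    names.iter().map(|name| compile(name)).collect()
}

fn main() {
    println!("{:?}", compile_all(&["txt", "mp3"]));
    println!("{:?}", compile_all(&["txt", "*.txt"]));
}
```

In findr, the equivalent would map each name through Regex::new and turn the error into the Invalid --name message with map_err; whether that reads better than the explicit loop is a matter of taste.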
Next, I interpret the entry types. Even though I used Arg::possible_values to ensure that the user could only supply “f,” “d,” or “l,” Rust still requires a match arm for any other possible string:
    let entry_types = matches.values_of_lossy("types").map(|vals| {
        vals.iter()
            .filter_map(|val| match val.as_str() {
                "d" => Some(Dir),
                "f" => Some(File),
                "l" => Some(Link),
                _ => None,
            })
            .collect()
    });
ArgMatches::values_of_lossy will return an Option<Vec<String>>. Use Option::map to handle Some(vals).
Iterate over each of the values.
Use Iterator::filter_map, which “yields only the values for which the supplied closure returns Some(value).”
If the value is “d,” “f,” or “l,” return the appropriate EntryType.
This arm should never be selected, but return None anyway.
Use Iterator::collect to gather the values into a vector of EntryType values.
I end the function by returning the Config:
    Ok(Config {
        dirs: matches.values_of_lossy("dirs").unwrap(),
        names: if names.is_empty() { None } else { Some(names) },
        entry_types,
    })
}
Now that you have validated the arguments from the user, it’s time to look for the items that match the conditions.
You might start by iterating over config.dirs and trying to find all the files contained in each.
I will use the walkdir crate for this. Following is how I can use some of the example code from the documentation to print all the entries. Be sure to add use walkdir::WalkDir for the following:
pub fn run(config: Config) -> MyResult<()> {
    for dirname in config.dirs {
        for entry in WalkDir::new(dirname) {
            println!("{}", entry?.path().display());
        }
    }
    Ok(())
}
To see if this works, I’ll list the contents of the tests/inputs/a/b directory. Note that this is the order I see on macOS:
$ cargo run -- tests/inputs/a/b
tests/inputs/a/b
tests/inputs/a/b/b.csv
tests/inputs/a/b/c
tests/inputs/a/b/c/c.mp3
On Linux, I see the following output:
$ cargo run -- tests/inputs/a/b
tests/inputs/a/b
tests/inputs/a/b/c
tests/inputs/a/b/c/c.mp3
tests/inputs/a/b/b.csv
On Windows/PowerShell, I see this output:
> cargo run -- tests/inputs/a/b
tests/inputs/a/b
tests/inputs/a/b\b.csv
tests/inputs/a/b\c
tests/inputs/a/b\c\c.mp3
I’ve written the test suite to check the lines of output irrespective of order, and I’ve also included specific output files for Windows to ensure the backslashes are correct.
A quick check with cargo test shows that this simple version of the program already passes several tests.
One problem is that this program fails rather ungracefully when given a nonexistent directory name, stopping as soon as it tries to read the bad directory:
$ cargo run -- blargh tests/inputs/a/b
IO error for operation on blargh: No such file or directory (os error 2)
I recommend you build from this.
First, figure out if the given argument names a directory that can be read. If not, print an error to STDERR and move to the next argument.
Then iterate over the contents of the directory and show files, directories, or links when config.entry_types contains the appropriate EntryType.
Next, filter out entry names that fail to match any of the given regular expressions when they are present.
I would encourage you to look at the mk-outs.sh program I used to generate the expected output files for various executions of the original find command, and then read tests/cli.rs to see how these commands are translated to work with findr.
You got this. I know you can do it.
As suggested, my first step is to weed out anything that isn’t a directory or that can’t be read, perhaps due to permission problems. With the following code (add use std::fs for this), the program passes cargo test skips_bad_dir:
pub fn run(config: Config) -> MyResult<()> {
    for dirname in config.dirs {
        match fs::read_dir(&dirname) {
            Err(e) => eprintln!("{}: {}", dirname, e),
            _ => {
                for entry in WalkDir::new(dirname) {
                    println!("{}", entry?.path().display());
                }
            }
        }
    }
    Ok(())
}
Use fs::read_dir to attempt reading a given directory.
When this fails, print an error message and move on.
Iterate over the directory entries and print their names.
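A single fs::read_dir call covers several cases at once: it fails for a nonexistent path, for a regular file such as a/a.txt, and for a directory you lack permission to read. This std-only sketch (the is_readable_dir name and the temp-directory path are assumptions) demonstrates the first two:

```rust
use std::fs;
use std::path::Path;

// Hypothetical helper: true if `path` names a directory whose entries can be
// listed. read_dir fails for missing paths, regular files, and directories
// without read permission.
fn is_readable_dir(path: &Path) -> bool {
    fs::read_dir(path).is_ok()
}

fn main() -> std::io::Result<()> {
    // Hypothetical scratch location under the system temp directory.
    let dir = std::env::temp_dir().join("findr_read_dir_demo");
    fs::create_dir_all(&dir)?;
    let file = dir.join("a.txt");
    fs::write(&file, "hello")?;

    println!("dir:     {}", is_readable_dir(&dir));
    println!("file:    {}", is_readable_dir(&file));
    println!("missing: {}", is_readable_dir(&dir.join("blargh")));

    fs::remove_dir_all(&dir)?;
    Ok(())
}
```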
Next, if the user has indicated only certain entry types, I should skip those entries that don’t match:
pub fn run(config: Config) -> MyResult<()> {
    for dirname in config.dirs {
        match fs::read_dir(&dirname) {
            Err(e) => eprintln!("{}: {}", dirname, e),
            _ => {
                for entry in WalkDir::new(dirname) {
                    let entry = entry?;
                    if let Some(types) = &config.entry_types {
                        if !types.iter().any(|type_| match type_ {
                            Link => entry.path_is_symlink(),
                            Dir => entry.file_type().is_dir(),
                            File => entry.file_type().is_file(),
                        }) {
                            continue;
                        }
                    }
                    println!("{}", entry.path().display());
                }
            }
        }
    }
    Ok(())
}
Unpack the Result.
See if there are Some(types) to filter the entries.
Use Iterator::any to see if any of the desired types matches the entry’s type.
Skip to the next entry when the condition is not met.
Recall that I used Iterator::all in Chapter 5 to return true if all of the elements in a vector passed some predicate. In the preceding code, I’m using Iterator::any to return true if at least one of the elements proves true for the predicate, which in this case is whether the entry’s type matches one of the desired types.
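The following std-only sketch contrasts the two adapters. Both short-circuit, and on an empty iterator any returns false while all returns true. That is one reason it helps to represent “no filter” as None rather than as an empty vector, since any over an empty list of types would reject every entry. The any_with_count helper is hypothetical:

```rust
// Hypothetical helper: count how many elements `any` inspects before it
// returns, to show that it stops at the first match.
fn any_with_count(items: &[i32], target: i32) -> (bool, usize) {
    let mut checked = 0;
    let found = items.iter().any(|&n| {
        checked += 1;
        n == target
    });
    (found, checked)
}

fn main() {
    let wanted = ["d", "f"];
    // any: true if at least one element satisfies the predicate.
    println!("any: {}", wanted.iter().any(|t| *t == "f"));
    // all: true only if every element satisfies the predicate.
    println!("all: {}", wanted.iter().all(|t| *t == "f"));

    println!("{:?}", any_with_count(&[1, 2, 3, 4], 2));

    // Empty input: any is false, all is (vacuously) true.
    let empty: [&str; 0] = [];
    println!("{} {}", empty.iter().any(|_| true), empty.iter().all(|_| false));
}
```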
When I check the output, it seems to be finding, for instance, all the directories:
$ cargo run -- tests/inputs/ -t d
tests/inputs/
tests/inputs/a
tests/inputs/a/b
tests/inputs/a/b/c
tests/inputs/f
tests/inputs/d
tests/inputs/d/e
I can run cargo test type on Linux and macOS to verify that I’m now passing all of the tests that check for types alone. (Windows will fail because of the aforementioned lack of symbolic links.)
The failures are for a combination of type and name, so next, I need to skip the filenames that don’t match one of the given regular expressions:
pub fn run(config: Config) -> MyResult<()> {
    for dirname in config.dirs {
        match fs::read_dir(&dirname) {
            Err(e) => eprintln!("{}: {}", dirname, e),
            _ => {
                for entry in WalkDir::new(dirname) {
                    let entry = entry?;

                    // Same as before
                    if let Some(types) = &config.entry_types { ... }

                    if let Some(names) = &config.names {
                        if !names.iter().any(|re| {
                            re.is_match(&entry.file_name().to_string_lossy())
                        }) {
                            continue;
                        }
                    }

                    println!("{}", entry.path().display());
                }
            }
        }
    }
    Ok(())
}
I can use this to find, for instance, any regular file matching mp3, and it seems to work:
$ cargo run -- tests/inputs/ -t f -n mp3
tests/inputs/a/b/c/c.mp3
tests/inputs/d/e/e.mp3
If I run cargo test with this version of the program on a Unix-type platform, all tests pass. Huzzah!
I could stop at this point, but I feel my code could be more elegant. I want to refactor this code, which means I want to restructure it without changing the way it works. Specifically, I don’t like how I’m checking the types and names and using continue to skip entries. These are filter operations, so I’d like to use Iterator::filter.
Following is my final run that still passes all the tests. Be sure you add use walkdir::DirEntry to your code for this:
pub fn run(config: Config) -> MyResult<()> {
    let type_filter = |entry: &DirEntry| match &config.entry_types {
        Some(types) => types.iter().any(|t| match t {
            Link => entry.path_is_symlink(),
            Dir => entry.file_type().is_dir(),
            File => entry.file_type().is_file(),
        }),
        _ => true,
    };

    let name_filter = |entry: &DirEntry| match &config.names {
        Some(names) => names
            .iter()
            .any(|re| re.is_match(&entry.file_name().to_string_lossy())),
        _ => true,
    };

    for dirname in &config.dirs {
        match fs::read_dir(&dirname) {
            Err(e) => eprintln!("{}: {}", dirname, e),
            _ => {
                let entries = WalkDir::new(dirname)
                    .into_iter()
                    .filter_map(|e| e.ok())
                    .filter(type_filter)
                    .filter(name_filter)
                    .map(|entry| entry.path().display().to_string())
                    .collect::<Vec<String>>();
                // Print one path per line.
                println!("{}", entries.join("\n"));
            }
        }
    }
    Ok(())
}
Create a closure to filter entries by any of the types.
Create a similar closure to filter entries on any of the regular expressions.
Turn WalkDir into an iterator and use Iterator::filter_map to select Ok values.
Filter out unwanted types.
Filter out unwanted names.
Turn each DirEntry into a string to display.
Use Iterator::collect to create a Vec<String>.
In the preceding code, I create two closures to use with filter operations. I chose to use closures because I wanted to capture values from the config, as I first showed in Chapter 6.
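One detail makes reusing these closures inside the for loop work: each closure captures only shared references into config, and a closure whose captures are all Copy is itself Copy. Passing it by value to filter on every iteration therefore copies it instead of moving it. The following std-only sketch demonstrates this with a hypothetical Config whose min_len setting stands in for entry_types and names:

```rust
// Hypothetical config with a single setting standing in for findr's
// entry_types and names fields.
struct Config {
    dirs: Vec<String>,
    min_len: Option<usize>,
}

const NAMES: [&str; 4] = ["a.txt", "b.csv", "e.mp3", "longer.txt"];

fn run(config: &Config) -> Vec<Vec<&'static str>> {
    // This closure captures only a shared reference, so it is Copy.
    let len_filter = |name: &&str| match config.min_len {
        Some(n) => name.len() >= n,
        None => true,
    };

    let mut results = Vec::new();
    for _dir in &config.dirs {
        // Each call to filter receives a fresh copy of the closure, so using it
        // on every pass does not trigger a "use of moved value" error.
        let kept: Vec<&'static str> = NAMES.iter().copied().filter(len_filter).collect();
        results.push(kept);
    }
    results
}

fn main() {
    let config = Config {
        dirs: vec!["a".to_string(), "d".to_string()],
        min_len: Some(6),
    };
    for (dir, kept) in config.dirs.iter().zip(run(&config)) {
        println!("{}: {:?}", dir, kept);
    }
}
```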
The first closure checks if any of the config.entry_types matches the DirEntry::file_type:
let type_filter = |entry: &DirEntry| match &config.entry_types {
    Some(types) => types.iter().any(|type_| match type_ {
        Link => entry.path_is_symlink(),
        Dir => entry.file_type().is_dir(),
        File => entry.file_type().is_file(),
    }),
    _ => true,
};
Iterate over the config.entry_types to compare to the given entry. Note that type is a reserved word in Rust, so I use type_.
When the type is Link, return whether the entry is a symlink.
When the type is Dir, return whether the entry is a directory.
When the type is File, return whether the entry is a file.
The preceding match takes advantage of the Rust compiler’s ability to ensure that all variants of EntryType have been covered. For instance, comment out one arm like so:
let type_filter = |entry: &DirEntry| match &config.entry_types {
    Some(types) => types.iter().any(|t| match t {
        Link => entry.path_is_symlink(),
        Dir => entry.file_type().is_dir(),
        //File => entry.file_type().is_file(),
    }),
    _ => true,
};
The program will not compile, and the compiler reports an error saying that I’ve missed a case.
You will not get this kind of safety if you use strings to model this. The enum type makes your code far safer and easier to verify and modify:
error[E0004]: non-exhaustive patterns: `&File` not covered
  --> src/lib.rs:95:51
   |
10 | / enum EntryType {
11 | |     Dir,
12 | |     File,
   | |     ---- not covered
13 | |     Link,
14 | | }
   | |_- `EntryType` defined here
...
95 |           Some(types) => types.iter().any(|t| match t {
   |                                                     ^ pattern `&File`
   |                                                       not covered
   |
   = help: ensure that all possible cases are being handled,
     possibly by adding wildcards or more match arms
   = note: the matched value is of type `&EntryType`
The second closure is used to remove filenames that don’t match one of the given regular expressions:
let name_filter = |entry: &DirEntry| match &config.names {
    Some(names) => names
        .iter()
        .any(|re| re.is_match(&entry.file_name().to_string_lossy())),
    _ => true,
};
When there are Some(names), use Iterator::any to check if the DirEntry::file_name matches any one of the regexes.
When there are no regexes, return true.
The last piece I would like to highlight is the multiple operations I can chain together with iterators in the following code. As with reading lines from a file or entries in a directory, each value in the iterator is a Result that might yield a DirEntry value. I use Iterator::filter_map with a closure that keeps only the Ok(DirEntry) values and discards any errors. The DirEntry values are then passed to the two filters for types and names before being shunted to the map operation to transform them into String values.
let entries = WalkDir::new(dirname)
    .into_iter()
    .filter_map(|e| e.ok())
    .filter(type_filter)
    .filter(name_filter)
    .map(|entry| entry.path().display().to_string())
    .collect::<Vec<String>>();
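The same filter_map/filter/map/collect pipeline can be tried without walkdir. In this sketch, fake_walk is a hypothetical stand-in that returns Result<String, io::Error> values in place of DirEntry results, and ends_with stands in for the regex and type checks:

```rust
use std::io;

// Hypothetical stand-in for WalkDir's items: each is a Result that holds
// either a path or an I/O error (e.g., an unreadable directory).
fn fake_walk() -> Vec<Result<String, io::Error>> {
    vec![
        Ok("d/d.txt".to_string()),
        Err(io::Error::new(io::ErrorKind::PermissionDenied, "denied")),
        Ok("d/d.tsv".to_string()),
        Ok("d/e/e.mp3".to_string()),
    ]
}

fn matching(ext: &str) -> Vec<String> {
    fake_walk()
        .into_iter()
        .filter_map(|e| e.ok()) // drop Err values, unwrap Ok values
        .filter(|path| path.ends_with(ext)) // keep only matching names
        .map(|path| format!("./{}", path)) // transform each survivor
        .collect()
}

fn main() {
    for path in matching(".txt") {
        println!("{}", path);
    }
}
```

Note the design trade-off: filter_map(|e| e.ok()) silently drops errors, whereas the earlier versions of run used entry? and stopped at the first one. To report unreadable entries instead, you could match on each Result and print the Err values to STDERR before discarding them.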
That is fairly compact code, and I find it lean and expressive. I appreciate how much these functions are doing for me and how well they fit together. You are free to write code however you like so long as it passes the tests, but I find this to be my preferred solution.
As with all the previous programs, I challenge you to implement all of the other features in find.
For instance, two very useful options of find are -maxdepth and -mindepth, which control how deeply into the directory structure it should search. I notice there are WalkDir::min_depth and WalkDir::max_depth methods you might use.
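With walkdir, something like WalkDir::new(dirname).min_depth(1).max_depth(2) is the library route. To see what that depth bookkeeping involves, here is a std-only recursive sketch; walk and visit are hypothetical names, depth 0 is the starting path (as in walkdir), and symbolic links are listed but not followed:

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

// Hypothetical depth-limited walk. Depth 0 is the starting path itself,
// matching how walkdir counts depth.
fn walk(root: &Path, min_depth: usize, max_depth: usize) -> io::Result<Vec<PathBuf>> {
    let mut found = Vec::new();
    visit(root, 0, min_depth, max_depth, &mut found)?;
    Ok(found)
}

fn visit(
    path: &Path,
    depth: usize,
    min_depth: usize,
    max_depth: usize,
    found: &mut Vec<PathBuf>,
) -> io::Result<()> {
    // Report the entry only once we are at least min_depth deep.
    if depth >= min_depth {
        found.push(path.to_path_buf());
    }
    // symlink_metadata does not follow links, so a link to a directory is
    // listed but never descended into. Stop descending at max_depth.
    if depth < max_depth && fs::symlink_metadata(path)?.is_dir() {
        for entry in fs::read_dir(path)? {
            visit(&entry?.path(), depth + 1, min_depth, max_depth, found)?;
        }
    }
    Ok(())
}

fn main() -> io::Result<()> {
    // Roughly like `find . -mindepth 1 -maxdepth 2`.
    for path in walk(Path::new("."), 1, 2)? {
        println!("{}", path.display());
    }
    Ok(())
}
```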
Next, perhaps try to find files by size. The find program has a particular syntax for indicating files less than, greater than, or exactly equal to sizes:
-size n[ckMGTP]
        True if the file's size, rounded up, in 512-byte blocks is n. If
        n is followed by a c, then the primary is true if the file's size
        is n bytes (characters). Similarly if n is followed by a scale
        indicator then the file's size is compared to n scaled as:

        k       kilobytes (1024 bytes)
        M       megabytes (1024 kilobytes)
        G       gigabytes (1024 megabytes)
        T       terabytes (1024 gigabytes)
        P       petabytes (1024 terabytes)
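As a starting point, here is a std-only sketch of parsing a -size argument. The Cmp and SizeTest types and the parse_size function are hypothetical; the leading + and - for greater than and less than follow find's convention for numeric arguments, and rounding the size up to whole units for every suffix (not just 512-byte blocks) is an assumption, so check your platform's man page, since BSD and GNU find differ in details:

```rust
// Hypothetical comparison and parsed -size argument, e.g. "+2k" or "-100c".
#[derive(Debug, PartialEq)]
enum Cmp {
    Less,
    Equal,
    Greater,
}

#[derive(Debug, PartialEq)]
struct SizeTest {
    cmp: Cmp,
    n: u64,
    unit: u64, // bytes per unit
}

fn parse_size(arg: &str) -> Option<SizeTest> {
    // Optional leading + (greater than) or - (less than).
    let (cmp, rest) = match arg.chars().next()? {
        '+' => (Cmp::Greater, &arg[1..]),
        '-' => (Cmp::Less, &arg[1..]),
        _ => (Cmp::Equal, arg),
    };
    // Optional scale suffix; with none, the unit is a 512-byte block.
    let suffix = rest.chars().last()?;
    let (digits, unit) = if suffix.is_ascii_digit() {
        (rest, 512)
    } else {
        let unit = match suffix {
            'c' => 1,
            'k' => 1024,
            'M' => 1024u64.pow(2),
            'G' => 1024u64.pow(3),
            'T' => 1024u64.pow(4),
            'P' => 1024u64.pow(5),
            _ => return None, // unknown suffix
        };
        (&rest[..rest.len() - 1], unit)
    };
    let n: u64 = digits.parse().ok()?;
    Some(SizeTest { cmp, n, unit })
}

impl SizeTest {
    // Round the byte size up to whole units (an assumption for all
    // suffixes), then compare with n.
    fn matches(&self, size: u64) -> bool {
        let units = size / self.unit + if size % self.unit == 0 { 0 } else { 1 };
        match self.cmp {
            Cmp::Less => units < self.n,
            Cmp::Equal => units == self.n,
            Cmp::Greater => units > self.n,
        }
    }
}

fn main() {
    let test = parse_size("+2k").expect("valid -size argument");
    for size in [0, 2048, 2049] {
        println!("{} bytes: {}", size, test.matches(size));
    }
}
```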
The find program can also take action on the results. For instance, there is a -delete option to remove an entry. This is useful for finding and removing empty files:
$ find . -size 0 -delete
I’ve often thought it would be nice to have a -count option to tell me how many items are found, the way that uniq -c did in the last chapter. I can, of course, pipe this into wc -l (or, even better, wcr), but consider adding such an option to your program.
Finally, I’d recommend you look at the source code for fd, another Rust replacement for find.
I hope you now have an appreciation for how complex real-world programs can become. The find program can combine multiple comparisons to help you find, say, the large files eating up your disk or files that haven’t been modified in a long time and can be removed.
Consider the skills you learned in this chapter:
You can now use Arg::possible_values to constrain argument values to a limited set of strings, saving you time in validating user input.
You can use ^ at the beginning of a regular expression to anchor the pattern to the beginning of the string and $ at the end to anchor it to the end of the string.
You can create an enum type to represent alternate possibilities for a type. This provides far more safety than using strings.
You can use WalkDir to recursively search through a directory structure and evaluate the DirEntry values to find files, directories, and links.
You learned how to chain multiple operations like any, filter, map, and filter_map with iterators.