Chapter 6. Den of Uniquity

There’s only one everything

The Might Be Giants

The uniq program (pronounced unique) will find the distinct strings in lines of text either from a file or STDIN. Among its many uses, it is often employed to count how many times each unique string is found. I think you will find it challenging to create a program that mimics the output from the original tool. Here are some of the skills you will learn:

  • How to write to a file or STDOUT (pronounced standard out)

  • How and why to write a closure to capture a variable

  • About the don’t repeat yourself (DRY) concept

  • About the Write trait and the write! and writeln! macros

  • How and why to use temporary files

  • How to indicate the lifetime of a variable

  • How to write tests that pass arguments on the command line or through STDIN

How uniq Works

Following is part of the manual page for the BSD version of uniq. The challenge program in this chapter will only implement the reading of a file or STDIN, writing to a file or STDOUT, and counting the lines for the -c flag:

UNIQ(1)                   BSD General Commands Manual                  UNIQ(1)

NAME
     uniq -- report or filter out repeated lines in a file

SYNOPSIS
     uniq [-c | -d | -u] [-i] [-f num] [-s chars] [input_file [output_file]]

DESCRIPTION
     The uniq utility reads the specified input_file comparing adjacent lines,
     and writes a copy of each unique input line to the output_file.  If
     input_file is a single dash ('-') or absent, the standard input is read.
     If output_file is absent, standard output is used for output.  The second
     and succeeding copies of identical adjacent input lines are not written.
     Repeated lines in the input will not be detected if they are not adja-
     cent, so it may be necessary to sort the files first.

     The following options are available:

     -c      Precede each output line with the count of the number of times
             the line occurred in the input, followed by a single space.

     -d      Only output lines that are repeated in the input.

     -f num  Ignore the first num fields in each input line when doing compar-
             isons.  A field is a string of non-blank characters separated
             from adjacent fields by blanks.  Field numbers are one based,
             i.e., the first field is field one.

     -s chars
             Ignore the first chars characters in each input line when doing
             comparisons.  If specified in conjunction with the -f option, the
             first chars characters after the first num fields will be
             ignored.  Character numbers are one based, i.e., the first char-
             acter is character one.

     -u      Only output lines that are not repeated in the input.

     -i      Case insensitive comparison of lines.

In the 06_uniq directory of the Git repository, you will find various input files I’ll use for testing. To start, note that uniq will print nothing when given an empty file:

$ uniq tests/inputs/empty.txt

Given a file with just one line, the one line will be printed:

$ uniq tests/inputs/one.txt
a

It will also print the number of times a line occurs before the line when run with the -c option. The count is right-justified in a field 4 characters wide and is followed by a single space and then the line:

$ uniq -c tests/inputs/one.txt
   1 a

Given input that contains two duplicate lines, only one line will be emitted:

$ cat tests/inputs/two.txt
a
a
$ uniq tests/inputs/two.txt
a

The line is shown to have a count of 2:

$ uniq -c tests/inputs/two.txt
   2 a

A longer input file shows that uniq only considers the lines in order and not globally as the line one shows several times in this input file:

$ cat tests/inputs/three.txt
a
a
b
b
a
c
c
c
a
d
d
d
d

When counting, uniq starts over at 1 each time it sees a new string:

$ uniq -c tests/inputs/three.txt
   2 a
   2 b
   1 a
   3 c
   1 a
   4 d

If you wanted actually unique values, you must first sort the input. Note in the following command that uniq will read STDIN by default:

$ sort tests/inputs/three.txt | uniq -c
   4 a
   2 b
   3 c
   4 d

The file tests/inputs/skip.txt contains a blank line in the input acts just like any other value and so it will reset the counter:

$ uniq -c tests/inputs/skip.txt
   1 a
   1
   1 a
   1 b

If you study the usage closely, you’ll see a very subtle indication of how to write the output to a file. Notice how input and output in the following are grouped inside square brackets to indicate that they are optional as a pair. That is, if you provide input, then you may also optionally provide output:

usage: uniq [-c | -d | -u] [-i] [-f fields] [-s chars] [input [output]]

For example, I can count tests/inputs/two.txt and place the output into out:

$ uniq -c tests/inputs/two.txt out
$ cat out
      2 a

With no positional arguments, uniq will read from STDIN by default:

$ cat tests/inputs/two.txt | uniq -c
      2 a

If you want to read from STDIN and indicate the output filename, you must use a dash (“-” for the input filename:

$ cat tests/inputs/two.txt | uniq -c - out
$ cat out
      2 a

The GNU version works basically the same while also providing many more options:

$ uniq --help
Usage: uniq [OPTION]... [INPUT [OUTPUT]]
Filter adjacent matching lines from INPUT (or standard input),
writing to OUTPUT (or standard output).

With no options, matching lines are merged to the first occurrence.

Mandatory arguments to long options are mandatory for short options too.
  -c, --count           prefix lines by the number of occurrences
  -d, --repeated        only print duplicate lines, one for each group
  -D, --all-repeated[=METHOD]  print all duplicate lines
                          groups can be delimited with an empty line
                          METHOD={none(default),prepend,separate}
  -f, --skip-fields=N   avoid comparing the first N fields
      --group[=METHOD]  show all items, separating groups with an empty line
                          METHOD={separate(default),prepend,append,both}
  -i, --ignore-case     ignore differences in case when comparing
  -s, --skip-chars=N    avoid comparing the first N characters
  -u, --unique          only print unique lines
  -z, --zero-terminated  end lines with 0 byte, not newline
  -w, --check-chars=N   compare no more than N characters in lines
      --help     display this help and exit
      --version  output version information and exit

A field is a run of blanks (usually spaces and/or TABs), then non-blank
characters.  Fields are skipped before chars.

Note: 'uniq' does not detect repeated lines unless they are adjacent.
You may want to sort the input first, or use 'sort -u' without 'uniq'.
Also, comparisons honor the rules specified by 'LC_COLLATE'.

As you can see, both the BSD and GNU versions have many more options, but this is as much as the challenge program is expected to implement.

Getting Started

Just so there are no surprises, this program will be called uniqr for a Rust version of uniq. Start by running cargo new uniqr to create a new binary package, then modify your Cargo.toml to add the following dependencies:

[dependencies]
clap = "2.33"

[dev-dependencies]
assert_cmd = "1"
predicates = "1"
tempfile = "3"
rand = "0.8"

Copy my 06_uniqr/tests directory into your projects, and then run cargo test to ensure the program compiles and that the tests run and fail.

Defining the Arguments

If you would like to copy my normal module structure, then modify your src/main.rs to the following:

fn main() {
    if let Err(e) = uniqr::get_args().and_then(uniqr::run) {
        eprintln!("{}", e);
        std::process::exit(1);
    }
}

You can start your src/lib.rs like previous chapters:

use clap::{App, Arg};
use std::error::Error;

type MyResult<T> = Result<T, Box<dyn Error>>;

#[derive(Debug)]
pub struct Config {
    in_file: String, 1
    out_file: Option<String>, 2
    count: bool, 3
}
1

The program will either be read from a given in_file or STDIN.

2

The output will either be written to a given out_file or STDOUT.

3

The count is a Boolean for whether to print the counts of each line.

Here is an outline for get_args:

pub fn get_args() -> MyResult<Config> {
    let matches = App::new("uniq")
        .version("0.1.0")
        .author("Ken Youens-Clark <[email protected]>")
        .about("Rust uniq")
        // What goes here?
        .get_matches();

    Ok(Config {
        in_file: ...
        out_file: ...
        count: ...
    })
}

I suggest you start your run by printing the config:

pub fn run(config: Config) -> MyResult<()> {
    println!("{:?}", config);
    Ok(())
}

Your program should be able to produce the following usage:

$ cargo run -- -h
uniqr 0.1.0
Ken Youens-Clark <[email protected]>
Rust uniq

USAGE:
    uniqr [FLAGS] [ARGS]

FLAGS:
    -c, --count      Show counts 1
    -h, --help       Prints help information
    -V, --version    Prints version information

ARGS:
    <FILE>    Input file [default: -] 2
    <FILE>    Output file 3
1

The -c|--count flag is optional.

2

The input file is the first positional argument and defaults to a dash.

3

The output file is the second positional argument and is optional.

By default the program will read from STDIN which can be represented using a dash:

$ cargo run
Config { in_file: "-", out_file: None, count: false }

The first positional argument should be interpreted as the in_file, the second positional argument as the out_file. Note that clap can handle options either before or after positional arguments:

$ cargo run -- tests/inputs/one.txt out --count
Config { in_file: "tests/inputs/one.txt", out_file: Some("out"), count: true }
Tip

I am trying to mimic the original versions as much as possible, but I do not like optional positional parameters. In my opinion, it would be better to have an -o|--output option that defaults to STDOUT and have only one optional positional argument for the input file which defaults to STDIN.

I assume you are an upright and moral person who figured out how to write that on your own, so I will now share my solution to this:

pub fn get_args() -> MyResult<Config> {
    let matches = App::new("uniq")
        .version("0.1.0")
        .author("Ken Youens-Clark <[email protected]>")
        .about("Rust uniq")
        .arg(
            Arg::with_name("in_file")
                .value_name("FILE")
                .help("Input file")
                .default_value("-"),
        )
        .arg(
            Arg::with_name("out_file")
                .value_name("FILE")
                .help("Output file"),
        )
        .arg(
            Arg::with_name("count")
                .value_name("COUNT")
                .help("Show counts")
                .short("c")
                .long("count")
                .takes_value(false),
        )
        .get_matches();

    Ok(Config {
        in_file: matches.value_of("in_file").map(str::to_string).unwrap(), 1
        out_file: matches.value_of("out_file").map(String::from), 2
        count: matches.is_present("count"), 3
    })
}
1

Convert the in_file argument to a String.

2

Convert the out_file argument to an Option<String>.

3

The count is either present or not, so convert to a bool.

In the preceding code, ArgMatches::value_of method returns Option<&str>. To convert in_file to a String, I’m calling Option::map which would normally take a closure, which I first mentioned in Chapter 5. Instead, I’m using two existing functions that will convert the &str to a String: str::to_string, and String::from, which could also be written as From::from because Rust infers the String type. Remember, functions are first-class objects that can be passed as arguments to other functions.

Because I have defined the in_file argument to have a default value, it should never be None; therefore, I can use Option::map to convert Some(&str) to Some(String) and do nothing with a None. I similarly do this for the out_file but leave it as an Option since it is not required.

Testing the Program

The test suite in tests/cli.rs is fairly large containing 78 tests. I intended to test the program under the following conditions:

  1. Input file as the only positional argument, check STDOUT

  2. Input file as a positional argument with --count option, check STDOUT

  3. Input from STDIN with no positional arguments, check STDOUT

  4. Input from STDIN with --count and no positional arguments, check STDOUT

  5. Input and output files as positional arguments, check output file

  6. Input and output files as positional arguments with --count, check output file

  7. Input from STDIN and output files as positional arguments with --count, check output file

Given how large and complicated the tests became, I incorporated a bit more structure in the code. I encourage you to read the tests which start off like so:

use assert_cmd::Command;
use predicates::prelude::*;
use rand::{distributions::Alphanumeric, Rng};
use std::fs;
use tempfile::NamedTempFile; 1

type TestResult = Result<(), Box<dyn std::error::Error>>;

struct Test<'a> { 2
    input: &'a str,
    out: &'a str,
    out_count: &'a str,
}
1

This is used to create temporary output files.

2

A struct to define the input files and expected output values with and without the counts.

Note the use of 'a to denote the lifetime of the values. I want to define structs with &str values, and the Rust compiler would like to know exactly how long the values are expected to stick around relative to each other. The fact that all the values have the same lifetime 'a means that they are all expected to live as long as each other. If you remove the lifetime annotations and run the tests, you’ll see similar warnings from the compiler as shown in the previous section along with an suggestion of exactly how to fix it:

error[E0106]: missing lifetime specifier
 --> tests/cli.rs:8:12
  |
8 |     input: &str,
  |            ^ expected named lifetime parameter
  |
help: consider introducing a named lifetime parameter
  |
7 | struct Test<'a> {
8 |     input: &'a str,

Next, I define the constant values I need for testing:

const PRG: &str = "uniqr"; 1

const EMPTY: Test = Test {
    input: "tests/inputs/empty.txt", 2
    out: "tests/inputs/empty.txt.out", 3
    out_count: "tests/inputs/empty.txt.c.out", 4
};
1

The name of the program being tested

2

The location of the input file for this test

3

The location of the output file without the counts

4

The location of the output file with the counts

After the declaration of EMPTY, there are many more Test structures followed by several helper functions. The run function will use the Test.input as an input file and will compare STDOUT to the contents of the Test.out file:

fn run(test: &Test) -> TestResult { 1
    let expected = fs::read_to_string(test.out)?; 2
    Command::cargo_bin(PRG)? 3
        .arg(test.input)
        .assert()
        .success()
        .stdout(expected);
    Ok(())
}
1

The function accepts a Test and returns a TestResult.

2

Try to read the expected output file.

3

Try to run the program with the input file as an argument, verify it ran successfully, and compare the STDOUT to the expected value.

The next helper works very similarly but this time tests for the counting:

fn run_count(test: &Test) -> TestResult {
    let expected = fs::read_to_string(test.out_count)?; 1
    Command::cargo_bin(PRG)?
        .args(&[test.input, "-c"]) 2
        .assert()
        .success()
        .stdout(expected);
    Ok(())
}
1

Read the Test.out_count file for the expected output.

2

Pass both the Test.input value and the flag -c to count the lines.

The next function will pass the input as STDIN:

fn run_stdin(test: &Test) -> TestResult {
    let input = fs::read_to_string(test.input)?; 1
    let expected = fs::read_to_string(test.out)?; 2
    Command::cargo_bin(PRG)? 3
        .write_stdin(input)
        .assert()
        .success()
        .stdout(expected);
    Ok(())
}
1

Try to read the Test.input file.

2

Try to read the Test.out file.

3

Pass the input through STDIN and verify that STDOUT is the expected value.

The next function tests both reading from STDIN and counting the lines:

fn run_stdin_count(test: &Test) -> TestResult {
    let input = fs::read_to_string(test.input)?;
    let expected = fs::read_to_string(test.out_count)?;
    Command::cargo_bin(PRG)? 1
        .arg("--count")
        .write_stdin(input)
        .assert()
        .success()
        .stdout(expected);
    Ok(())
}
1

Run the program with the long --count flag and feed the input to STDIN, verify that STDOUT is correct.

The next function checks that the program accepts both the input and output files as positional arguments. This is somewhat more interesting as I needed to use temporary files in the testing because, as you have seen repeatedly, Rust will run the tests in parallel. If I were to use the same dummy filename like blargh to write all the output files, the tests would overwrite each other’s output. To get around this, I use the tempfile::NamedTempFile to get a dynamically generated temporary filename that will automatically be removed when I finish:

fn run_outfile(test: &Test) -> TestResult {
    let expected = fs::read_to_string(test.out)?;
    let outfile = NamedTempFile::new()?; 1
    let outpath = &outfile.path().to_str().unwrap(); 2

    Command::cargo_bin(PRG)? 3
        .args(&[test.input, outpath])
        .assert()
        .success()
        .stdout("");
    let contents = fs::read_to_string(&outpath)?; 4
    assert_eq!(&expected, &contents); 5

    Ok(())
}
1

Try to get a named temporary file.

2

Get the path to the file.

3

Run the program with the input and output filenames as arguments, verify there is nothing in STDOUT.

4

Try to read the output file.

5

Check that the contents of the output file match the expected value.

The next two functions are variations on what I’ve already shown, adding in the --count flag and finally asking the program to read from STDIN when the input filename is a dash. The rest of the module calls these helpers using the various structs to run all the tests. It was tedious to write all this, but it’s very important to test every single aspect of a program with all possible combinations of the arguments. You can perhaps see now why I didn’t want to add any more functionality to the challenge program.

One interesting aspect of implementing open-source programs like uniq is that you can sometimes find the source code for those programs. While looking for the original test suite, I found a Perl program used to test the GNU version. I used several of those examples in the tests/inputs/t[1-6].txt input files, as you can see in 06_uniqr/mk-outs.sh:

$ cat mk-outs.sh
#!/usr/bin/env bash

ROOT="tests/inputs"
OUT_DIR="tests/expected"

[[ ! -d "$OUT_DIR" ]] && mkdir -p "$OUT_DIR"

# Cf https://github.com/coreutils/coreutils/blob/master/tests/misc/uniq.pl
echo -ne "a
a
"    > $ROOT/t1.txt 1
echo -ne "a
a"      > $ROOT/t2.txt 2
echo -ne "a
b"      > $ROOT/t3.txt 3
echo -ne "a
a
b"   > $ROOT/t4.txt 4
echo -ne "b
a
a
" > $ROOT/t5.txt 5
echo -ne "a
b
c
" > $ROOT/t6.txt 6

for FILE in $ROOT/*.txt; do
    BASENAME=$(basename "$FILE")
    uniq      $FILE > ${OUT_DIR}/${BASENAME}.out
    uniq -c   $FILE > ${OUT_DIR}/${BASENAME}.c.out
    uniq    < $FILE > ${OUT_DIR}/${BASENAME}.stdin.out
    uniq -c < $FILE > ${OUT_DIR}/${BASENAME}.stdin.c.out
done
1

Two lines each ending with a newline

2

No trailing newline on last line

3

Two different lines, no trailing newline

4

Two lines the same, last is different with no trailing newline

5

Two different values with newlines on each

6

Three different values with newlines on each

I found it surprisingly challenging to exactly match the output from uniq, but I guess I’m a better person for the effort now. I would suggest start in src/lib.rs by reading the input file, and again it makes sense to use the open function from previous chapters:

fn open(filename: &str) -> MyResult<Box<dyn BufRead>> {
    match filename {
        "-" => Ok(Box::new(BufReader::new(io::stdin()))),
        _ => Ok(Box::new(BufReader::new(File::open(filename)?))),
    }
}

Be sure you expand your imports to include the following:

use clap::{App, Arg};
use std::{ 1
    error::Error,
    fs::File,
    io::{self, BufRead, BufReader},
};
1

This syntax lets me group imports by common prefixes, so all the following come from std.

You can borrow quite a bit of code from Chapter 3 that reads lines of text from an input file or STDIN while preserving the line endings:

pub fn run(config: Config) -> MyResult<()> {
    let mut file = open(&config.in_file)
        .map_err(|e| format!("{}: {}", config.in_file, e))?; 1
    let mut line = String::new(); 2
    loop { 3
        let bytes = file.read_line(&mut line)?; 4
        if bytes == 0 { 5
            break;
        }
        print!("{}", line); 6
        line.clear(); 7
    }
    Ok(())
}
1

Read either STDIN if the input file is a dash or open the given filename. Create an informative error message when this fails.

2

Create a new, empty mutable String buffer to hold each line.

3

Create an infinite loop.

4

Read a line of text while preserving the line endings.

5

If no bytes were read, break out of the loop.

6

Print the line buffer.

7

Clear the line buffer.

Run your program with an input file to ensure it works:

$ cargo run -- tests/inputs/one.txt
a

It should also work for reading STDIN:

$ cargo run -- - < tests/inputs/one.txt
a

Try to get your program to iterate the lines of input and count each unique run of lines, then try to get your program to print the lines with and without the counts. Look at how open works, and try to copy those ideas to use File::create to write output to a file or STDOUT. Remember that you can run just a subset of tests with a command like cargo test empty to run all the tests with the string empty in the name. I know you can do this.

Solution

I’ll step you through how I arrived at a solution. Your version may be different, but it’s fine as long as it passes the test suite. Building on the last version I showed, I decided to create two additional mutable variables to hold the last line of text and the running count. For now, I will always print the count to make sure it’s working correctly:

pub fn run(config: Config) -> MyResult<()> {
    let mut file = open(&config.in_file)
        .map_err(|e| format!("{}: {}", config.in_file, e))?;
    let mut line = String::new();
    let mut last = String::new(); 1
    let mut count: u64 = 0; 2

    loop {
        let bytes = file.read_line(&mut line)?;
        if bytes == 0 {
            break;
        }

        if line.trim_end() != last.trim_end() { 3
            if count > 0 { 4
                print!("{:>4} {}", count, last); 5
            }
            last = line.clone(); 6
            count = 0; 7
        }

        count += 1; 8
        line.clear();
    }

    if count > 0 { 9
        print!("{:>4} {}", count, last);
    }

    Ok(())
}
1

Create a mutable variable to hold the last line of text.

2

Create a mutable variable to hold the count.

3

Compare the current line to the last time, both trimmed of any possible trailing whitespace.

4

Only print the output when count is greater than 0.

5

Print the count right-justified in a column 4 characters wide followed by a space and the line value.

6

Set the last variable to a copy of the current line.

7

Reset the counter to 0.

8

Increment the counter by 1.

9

Handle the last line of the file.

If I run cargo test, this will pass a goodly number of tests. This code is clunky, though. I don’t like having to check if count > 0 twice as it violates the don’t repeat yourself (DRY) principle, so I’m thinking that needs to be put into a function. I’m also always printing the count, but I should only print this when config.count is true—another condition that could go into this function. I decided to write this as a closure inside the run function to close around the config.count value:

let print = |count: &u64, line: &String| { 1
    if count > &0 { 2
        if config.count { 3
            print!("{:>4} {}", &count, &line); 4
        } else {
            print!("{}", &line); 5
        }
    };
};
1

The print closure will accept count and line values.

2

Only print if the count is greater than 0.

3

Check if the config.count value is true.

4

Use the print! macro to print the count and line to STDOUT.

5

Otherwise, print the line to STDOUT.

I can update the rest of the function to use this closure:

loop {
    let bytes = file.read_line(&mut line)?;
    if bytes == 0 {
        break;
    }

    if line.trim_end() != last.trim_end() {
        print(&count, &last);
        last = line.clone();
        count = 0;
    }

    count += 1;
    line.clear();
}

print(&count, &last);

With these changes, my program will pass several more tests. If I look at the failed tests, they all contain outfile because I’m failing to write a named output file. I can add this last feature with two changes. First, I would like to open the output file in the same way as I open the input file by either creating a named output file or by using STDOUT. You’ll need to add use std::io::Write for the following code, which you can place just just after the file variable:

let mut out_file: Box<dyn Write> = match &config.out_file { 1
    Some(out_name) => Box::new(File::create(&out_name)?), 2
    _ => Box::new(io::stdout()), 3
};
1

The mutable out_file will be a boxed value that implements the std::io::Write trait.

2

When config.out_file is Some filename, use File::create to try to create the file.

3

Otherwise, use std::io::stdout.

If you follow the documentation for both File::create and io::stdout, you’ll see both have a Traits section showing the various traits they implement. Both show that they implement Write, and so they satisfy the type requirement Box<dyn Write> which says that the value inside the Box must implement this trait.

The second change I need to make is to use out_file for the output. I will replace the print! macro with write! to write the output to a stream like a filehandle or STDOUT. The first argument to write! must be a mutable value that implements the Write trait. The documentation shows that write! will return a std::io::Result because it might fail. As such, I changed my print closure to return MyResult. Here is the final version of my run function that passes all the tests:

pub fn run(config: Config) -> MyResult<()> {
    let mut file = open(&config.in_file)
        .map_err(|e| format!("{}: {}", config.in_file, e))?; 1
    let mut out_file: Box<dyn Write> = match &config.out_file { 2
        Some(out_name) => Box::new(File::create(&out_name)?),
        _ => Box::new(io::stdout()),
    };

    let mut print = |count: &u64, line: &String| -> MyResult<()> { 3
        if count > &0 {
            if config.count {
                write!(out_file, "{:>4} {}", &count, &line)?;
            } else {
                write!(out_file, "{}", &line)?;
            }
        };
        Ok(())
    };

    let mut line = String::new();
    let mut last = String::new();
    let mut count: u64 = 0;
    loop {
        let bytes = file.read_line(&mut line)?;
        if bytes == 0 {
            break;
        }

        if line.trim_end() != last.trim_end() {
            print(&count, &last)?; 4
            last = line.clone();
            count = 0;
        }

        count += 1;
        line.clear();
    }

    print(&count, &last)?; 5

    Ok(())
}
1

Open either STDIN or the given input filename.

2

Open either STDOUT or the given output filename.

3

Create a mutable print closure to format the output.

4

Use the print closure to possibly print output. Use ? to propagate potential errors.

5

Handle the last line of the file.

I would think it’s highly unlikely that you would independently write something exactly like my solution. As long as your solution passes the tests, it’s perfectly acceptable. Part of what I like about writing with tests is that there is an objective determination when a program meets some level of specifications. Louis Srygley says, “Without requirements or design, programming is the art of adding bugs to an empty text file.” I would say that tests are the requirements made incarnate. Without tests, you simply have no way to know when a change to your program strays from the requirements or breaks the design.

In closing, I would share that I explored a couple of other ways to write this algorithm, one of which read all the lines of the input file into a vector and looked at pairs of lines using Vec::windows. This was interesting but could fail if the size of the input file exceeded the available memory on my machine. The solution presented here will only ever allocate memory for the current and last lines and so should scale to any size file.

Going Further

As usual, the BSD and GNU versions of uniq both have many more features than I chose to include in the challenge. I would encourage you to add all the features you would like to have in your version. Consider reading the original source code and tests to see what you can borrow. Be sure to add tests for each feature, and always run the entire test suite to verify that all previous features still work.

In my mind, uniq is closely tied with sort as I often use them together. Consider implementing your own version of sort, at least to the point of sorting values lexicographically (in dictionary order) or numerically.

Summary

I hope you found this challenge as interesting to write as I did. Let’s review some of the things you learned:

  • You can now open a new file for writing or use STDOUT.

  • The idea of DRY is that you move any duplicated code into a single abstraction like a function or a closure.

  • You’ve seen that a closure can be used to capture values from the enclosing scope.

  • You learned about the Write trait and how values that implement this trait can be used with the write! and writeln! macros.

  • Sometimes you will find you need to write a temporary file. You’ve now used a Rust crate that helps find a unique filename and open a filehandle for you.

  • The Rust compiler may sometimes require you to indicate the lifetime of a variable which is how long it lives in relation to other variables.

1 The name closure refers to this capturing or closing around a lexically scoped binding.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.234.55.154