Finding the largest files in a directory

Suppose you're out of disk space. A solution may be to delete old, large files from a directory. Let's write a D program to perform this task.

How to do it…

Execute the following steps to find the largest files in a directory:

  1. Use the std.file.dirEntries function to get a listing of all files.
  2. Collect the DirEntry values into an array.
  3. Sort the array by size in descending order by using std.algorithm and a lambda function.
  4. Filter out the newer files with std.algorithm.filter.
  5. Delete the top 10 files with the std.file.remove function.

The code is as follows:

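void main() {
    import std.file, std.algorithm, std.datetime, std.range;
    // Gather every entry under the directory into an array so it can be sorted
    DirEntry[] allFiles;
    foreach(DirEntry entry; dirEntries("target_directory", SpanMode.depth))
        allFiles ~= entry;
    // Largest files first
    auto sorted = sort!((a, b) => a.size > b.size)(allFiles);
    // Keep only files last modified more than 14 days ago
    auto filtered = filter!((a) => Clock.currTime() - a.timeLastModified > 14.days)(sorted);
    // Delete the 10 largest of the remaining files
    foreach(file; filtered.take(10))
        remove(file.name);
}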

How it works…

Phobos provides the std.file module for high-level operations on files and directories. With it, we can read and write files, list files in a directory, get file information, and perform common operations such as deleting and copying files.

The dirEntries function returns an object that works with foreach. Depending on the type you request in the loop, it will provide different information. The foreach(string name; dirEntries()) loop gives you just the filenames. The foreach(DirEntry entry; dirEntries()) loop gives details such as size and modification time.
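As a minimal sketch of the two loop forms, the following program lists a scratch directory twice. The countEntries helper and the direntries_demo directory name are hypothetical, introduced only for illustration.

```d
import std.file : dirEntries, DirEntry, SpanMode,
                  mkdirRecurse, rmdirRecurse, tempDir, write;
import std.path : buildPath;
import std.stdio : writeln;

// Hypothetical helper: lists `dir` both ways and returns the entry count.
size_t countEntries(string dir) {
    size_t count;
    // Requesting a string yields only each entry's path.
    foreach (string name; dirEntries(dir, SpanMode.shallow))
        writeln(name);
    // Requesting a DirEntry also gives access to size, timestamps, and so on.
    foreach (DirEntry entry; dirEntries(dir, SpanMode.shallow)) {
        writeln(entry.name, ": ", entry.size, " bytes");
        ++count;
    }
    return count;
}

void main() {
    // Work in a scratch directory so no real files are touched.
    auto dir = buildPath(tempDir(), "direntries_demo");
    mkdirRecurse(dir);
    scope (exit) rmdirRecurse(dir);
    write(buildPath(dir, "a.txt"), "hello");
    countEntries(dir);
}
```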

This is implemented with a function called opApply. D's foreach loop understands four kinds of items: a numeric interval, arrays (or slices), input ranges, and objects with a member function called opApply. These are explained in detail in the following paragraphs.

Numeric intervals are a simple start-to-finish progression of integers, as shown in the following line of code:

foreach(num; 0 .. 10) { /* loops from num = 0 up to, but not including, num = 10 */ }

Input ranges are iterable objects that are used throughout much of Phobos. Indeed, the sort and filter functions from std.algorithm and the take function from std.range, which we use here, both consume and return input ranges. Ranges will be covered in greater depth later in this book.
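To make the idea concrete, here is a small sketch of a hand-written input range. The Countdown type is hypothetical; the only requirement it illustrates is the trio of empty, front, and popFront that foreach and the Phobos algorithms rely on.

```d
import std.algorithm : equal, filter;
import std.range : take;
import std.stdio : writeln;

// Hypothetical input range that counts down from `current` to 1.
struct Countdown {
    int current;
    @property bool empty() const { return current <= 0; } // anything left?
    @property int front() const { return current; }       // the current item
    void popFront() { --current; }                        // move to the next item
}

void main() {
    // foreach drives the range through empty, front, and popFront.
    foreach (n; Countdown(3))
        writeln(n);

    // The same type plugs straight into filter and take.
    auto odds = Countdown(9).filter!(n => n % 2 == 1).take(2);
    assert(odds.equal([9, 7]));
}
```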

While input ranges are useful for a variety of tasks, they aren't ideal for everything. The opApply function is used for these cases. It is a special member function on a struct or a class that takes a delegate. The arguments to the delegate are the foreach iteration variable types, and the body of the delegate is automatically set to be the inner code of the loop. The delegate's return value gives flow control, similar to blocks in Ruby.
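The following sketch shows the shape of an opApply member and how the delegate's return value carries flow control. The Triple type and the sumUntilNegative function are hypothetical names used only for this illustration.

```d
import std.stdio : writeln;

// Hypothetical container that exposes iteration through opApply.
struct Triple {
    int[3] items;

    // foreach hands the loop body to opApply as the delegate `dg`.
    int opApply(scope int delegate(ref int) dg) {
        foreach (ref item; items) {
            // A nonzero result means the body executed break or return,
            // so stop iterating and pass the value back.
            if (auto result = dg(item))
                return result;
        }
        return 0; // the loop ran to completion
    }
}

int sumUntilNegative(Triple t) {
    int sum;
    foreach (int value; t) {
        if (value < 0)
            break; // makes dg return nonzero inside opApply
        sum += value;
    }
    return sum;
}

void main() {
    writeln(sumUntilNegative(Triple([1, 2, -1])));
}
```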

After gathering the data, we use std.algorithm to sort, filter, and limit the size of the results. These functions show the power of input ranges and lambda functions. The syntax (a) => a; is a lambda function. First, there is a parameter list in parentheses. Types are optional here; if excluded, the lambda function is implemented as a template with implicit types from context. Then, the symbol => is the key indicator of a lambda function, and finally you have the return value. The short lambda syntax is only usable for a single expression and cannot return void.
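A short sketch of the lambda syntax in use follows. The doubleAll helper is a hypothetical name; map and array come from std.algorithm and std.array.

```d
import std.algorithm : map;
import std.array : array;

// Hypothetical helper: doubles every element using a short lambda.
int[] doubleAll(int[] values) {
    // No parameter type is given, so the type of `a` is
    // inferred from the element type of `values`.
    return values.map!(a => a * 2).array;
}

void main() {
    assert(doubleAll([1, 2, 3]) == [2, 4, 6]);

    // With an explicit parameter type, the lambda can be stored in a variable.
    auto addOne = (int a) => a + 1;
    assert(addOne(41) == 42);
}
```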

Note

The (a) => a function is actually a function template. The compile-time parameters it needs for its parameter list are determined from context by the compiler.

The (a) => a function, in this context, could alternatively be written as a => a; if a lambda has only one argument, the parentheses are optional. It could also be written as delegate int(int a) { return a; }, function(int a) { return a; }, or even (int a) { return a; }. The delegate and function keywords produce two separate but related types. The difference between them is that a delegate has a context pointer, whereas a function does not. The context pointer gives the delegate access to variables from the surrounding scope. A function can only access global variables, data through its arguments, and static data. If you do not specify either keyword, the compiler gives you what you require: a delegate if the body uses variables from the surrounding scope, and a function otherwise.
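The difference between the two kinds of literal can be sketched as follows. The makeShifter function and the variable names are hypothetical and exist only for this illustration.

```d
// Hypothetical helper: returns a delegate that remembers `offset`.
int delegate(int) makeShifter(int offset) {
    // The body reads `offset` from the enclosing scope,
    // so the literal carries a context pointer.
    return (int a) => a + offset;
}

void main() {
    int offset = 10;

    // Uses nothing from the enclosing scope: a plain function pointer.
    int function(int) identity = function(int a) { return a; };

    // Reads the local `offset`, so it must be a delegate.
    int delegate(int) shift = delegate(int a) { return a + offset; };

    // No keyword given: the compiler picks a delegate here
    // because the body refers to `offset`.
    auto inferred = (int a) { return a + offset; };
    static assert(is(typeof(inferred) == delegate));

    assert(identity(5) == 5);
    assert(shift(5) == 15);
    assert(inferred(5) == 15);
    assert(makeShifter(3)(4) == 7);
}
```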

With that background information, let's look at the following line of code in more detail:

sort!((a, b) => a.size > b.size)(allFiles);

The std.algorithm.sort function takes two arguments: an optional comparison function, given as a compile-time argument for maximum efficiency, and a random access range to sort. A random access range is any iterable object in which you can jump directly to any index. The most common random access range is an array or a slice. This is why we built an array of DirEntry: first, because the dirEntries function uses the opApply iteration, so it isn't a range, and second, because sorting needs the whole list ready at once.
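The same call can be tried without touching the filesystem. In this sketch, FileInfo is a hypothetical stand-in for DirEntry that carries only a name and a size, and largestFirst is a hypothetical helper.

```d
import std.algorithm : sort;

// Hypothetical stand-in for DirEntry with only the fields used here.
struct FileInfo {
    string name;
    ulong size;
}

// Hypothetical helper: orders the array from largest to smallest.
FileInfo[] largestFirst(FileInfo[] files) {
    // The comparison is a compile-time argument; the array is sorted in place.
    sort!((a, b) => a.size > b.size)(files);
    return files;
}

void main() {
    auto files = largestFirst([
        FileInfo("a.log", 10),
        FileInfo("b.iso", 3000),
        FileInfo("c.txt", 200),
    ]);
    assert(files[0].name == "b.iso");
    assert(files[2].name == "a.log");
}
```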


The next line is very similar. Again, we use a function from std.algorithm that takes a range and a function (called a predicate in the std.algorithm documentation). The filter function returns a new range containing only the items of the sorted list that match the predicate. Here, a file passes only if it was last modified more than 14 days before the current time; newer files are left out.
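To see which items a predicate of this kind keeps, consider the following sketch. The Aged struct and the olderThan helper are hypothetical; the sketch stores each file's age directly as a Duration so that the result does not depend on the clock.

```d
import core.time : Duration, days;
import std.algorithm : filter, map;
import std.array : array;

// Hypothetical record: a file name together with its age.
struct Aged {
    string name;
    Duration age;
}

// Hypothetical helper: names of the files whose age exceeds `limit`.
string[] olderThan(Aged[] files, Duration limit) {
    return files
        .filter!(f => f.age > limit) // keep only items matching the predicate
        .map!(f => f.name)
        .array;
}

void main() {
    auto files = [Aged("old.bak", 30.days), Aged("new.txt", 2.days)];
    // Only the file older than 14 days survives the filter.
    assert(olderThan(files, 14.days) == ["old.bak"]);
}
```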

Let's also look at the syntax of 14.days. The days function is a function in the module core.time with the @property Duration days(int a); signature. This uses a D feature called Uniform Function Call Syntax (UFCS). With UFCS, a call to foo.bar may be rewritten as bar(foo). This lets us extend any type, including built-in types such as integers and arrays in our code, adding new pseudomembers and properties. When used properly, this gives extensibility, readability, and can even help encapsulation, allowing you to write extension methods outside the original module, thus limiting access to private data.
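A brief sketch of UFCS in action follows. The squared function is a hypothetical free function; days and hours are the helpers from core.time.

```d
import core.time : days, hours;

// Hypothetical free function that UFCS lets us call like a member of int.
int squared(int n) {
    return n * n;
}

void main() {
    // Both spellings call the same function.
    assert(4.squared == squared(4));
    assert(14.days == days(14));

    // The resulting Duration values compare as expected.
    assert(1.days == 24.hours);
}
```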

Finally, we complete our task by using take(10) (via a UFCS call), which takes the first 10 items off the filtered list, and calling remove from std.file to delete each of those files.
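Because the recipe deletes files, it can be worth previewing the result first. The following is a cautious sketch, not the recipe itself: deletionCandidates is a hypothetical helper that runs the same sort, filter, and take pipeline but returns the paths instead of removing anything. It also assumes that only regular files should be considered, so it skips directories.

```d
import core.time : Duration, days;
import std.algorithm : filter, map, sort;
import std.array : array;
import std.datetime : Clock;
import std.file : dirEntries, DirEntry, SpanMode,
                  mkdirRecurse, rmdirRecurse, tempDir, write;
import std.path : buildPath;
import std.range : take;
import std.stdio : writeln;

// Hypothetical helper: paths of up to `count` of the largest regular
// files under `dir` that are older than `minAge`. Nothing is deleted.
string[] deletionCandidates(string dir, Duration minAge, size_t count) {
    DirEntry[] files;
    foreach (DirEntry entry; dirEntries(dir, SpanMode.depth))
        if (entry.isFile) // assumption: only regular files are candidates
            files ~= entry;

    auto now = Clock.currTime();
    return files
        .sort!((a, b) => a.size > b.size)                // largest first
        .filter!(a => now - a.timeLastModified > minAge) // old enough
        .take(count)                                     // at most `count`
        .map!(a => a.name)
        .array;
}

void main() {
    // Scratch directory with freshly created files.
    auto dir = buildPath(tempDir(), "candidates_demo");
    mkdirRecurse(dir);
    scope (exit) rmdirRecurse(dir);
    write(buildPath(dir, "small.txt"), "1");
    write(buildPath(dir, "big.txt"), "12345");

    // The files were just written, so none is older than 14 days.
    writeln(deletionCandidates(dir, 14.days, 10));
}
```

Once the printed list looks right, each returned path could be passed to std.file.remove, as the recipe does.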
