40. Find files, LINQ style

Before I describe the new example solution, let's recap how Solution 34. Find files, handled this problem. That solution added a GetFiles extension method to the DirectoryInfo class. The method looped through a list of patterns and called DirectoryInfo's own GetFiles method once per pattern to find the matching files. It added the returned files to a dictionary so that it could easily tell whether it had already found a file. After finding the files, the method extracted the dictionary's keys and values, sorted them, and returned the sorted files.

The FindFiles extension method called GetFiles to find files matching its patterns. It then looped through the files, read each with File.ReadAllText, and used the string class's Contains method to see if a particular file contained the target text. It added the files containing the target text to a list and, when it had checked every file, returned the list.
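For comparison, the following is a minimal sketch of the non-LINQ approach recapped above. It is reconstructed from that description, not copied from Solution 34: the class name NonLinqFileSearch is hypothetical, and keying the dictionary on each file's FullName is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical container class; extension methods must live in a static class.
public static class NonLinqFileSearch
{
    // Return files matching any pattern, without duplicates, sorted by full path.
    public static FileInfo[] GetFiles(this DirectoryInfo dirinfo,
        IEnumerable<string> patterns,
        SearchOption option = SearchOption.TopDirectoryOnly)
    {
        // Key on the full path (an assumption) so a file that matches
        // several patterns is stored only once.
        var found = new Dictionary<string, FileInfo>();
        foreach (string pattern in patterns)
            foreach (FileInfo fileinfo in dirinfo.GetFiles(pattern, option))
                found[fileinfo.FullName] = fileinfo;

        // Keys and Values enumerate in matching order, so copy both
        // and sort the FileInfo array using the keys.
        string[] names = new string[found.Count];
        FileInfo[] files = new FileInfo[found.Count];
        found.Keys.CopyTo(names, 0);
        found.Values.CopyTo(files, 0);
        Array.Sort(names, files);
        return files;
    }

    // Return the matching files whose contents include the target text.
    public static List<FileInfo> FindFiles(this DirectoryInfo dirinfo,
        IEnumerable<string> patterns, string target,
        SearchOption option = SearchOption.TopDirectoryOnly)
    {
        var result = new List<FileInfo>();
        foreach (FileInfo fileinfo in dirinfo.GetFiles(patterns, option))
        {
            // Read the whole file and keep it if it contains the target.
            if (File.ReadAllText(fileinfo.FullName).Contains(target))
                result.Add(fileinfo);
        }
        return result;
    }
}
```

Because C# prefers an applicable instance method over an extension method, a call with a single pattern string still reaches DirectoryInfo's built-in GetFiles, while a call with a collection of patterns reaches the extension.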

The following code shows the LINQ version:

using System.Collections.Generic;
using System.IO;
using System.Linq;

// Extension methods must be declared in a static class
// (the class name here is illustrative).
public static class DirectoryInfoExtensions
{
    // Find files that match any of the indicated patterns and that
    // contain the target string.
    // Do not include duplicates and return the files sorted.
    public static FileInfo[] FindFilesLINQ(this DirectoryInfo dirinfo,
        IEnumerable<string> patterns, string target = "",
        SearchOption option = SearchOption.TopDirectoryOnly)
    {
        // Find files that match the patterns. Grouping by full path
        // removes files that match more than one pattern.
        var fileQuery =
            from string pattern in patterns
            from FileInfo fileinfo in dirinfo.GetFiles(pattern, option)
            group fileinfo by fileinfo.FullName
            into namegroup
            select namegroup.First();

        // If target isn't blank, select files that contain it.
        if (!string.IsNullOrEmpty(target))
            fileQuery =
                from FileInfo fileinfo in fileQuery
                where File.ReadAllText(fileinfo.FullName).Contains(target)
                select fileinfo;

        // Sort by file name and return as an array.
        // ToArray is the call that actually executes the queries.
        return fileQuery.OrderBy(x => x.Name).ToArray();
    }
}

This method creates a LINQ query that loops through the file patterns and calls GetFiles to find the files that match each of them. If a file matches more than one pattern, the results may contain the same file multiple times. To handle this, the query groups the FileInfo objects by their full path names (FullName) and then selects the first FileInfo from each group.
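Grouping by FullName, rather than simply calling Distinct, matters because FileInfo does not override Equals. Two FileInfo objects for the same file, returned by separate GetFiles calls, therefore count as different objects. The following sketch, using a hypothetical DuplicateDemo class, contrasts the two approaches.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper class for illustration only.
public static class DuplicateDemo
{
    // Gather matches for every pattern; a file matching two patterns appears twice.
    public static List<FileInfo> AllMatches(DirectoryInfo dir, params string[] patterns) =>
        (from pattern in patterns
         from fileinfo in dir.GetFiles(pattern)
         select fileinfo).ToList();

    // Distinct() falls back on reference equality for FileInfo,
    // so duplicate entries for the same file survive.
    public static int CountWithDistinct(List<FileInfo> files) =>
        files.Distinct().Count();

    // Grouping on the full path keeps exactly one FileInfo per file.
    public static int CountWithGroup(List<FileInfo> files) =>
        files.GroupBy(f => f.FullName).Select(g => g.First()).Count();
}
```

With files a.txt and b.txt and the patterns "*.txt" and "a*", AllMatches returns three entries, Distinct keeps all three, and grouping leaves two. On .NET 6 and later, files.DistinctBy(f => f.FullName) is a shorter way to get the same one-per-path result.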

Next, if the target string is not null or empty, the code creates a second query that loops through the selected files, uses File.ReadAllText to read each one, and uses Contains to check whether the file's text includes the target string.

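After setting up the query, the method calls the LINQ OrderBy method to sort the results by the files' names. It then calls ToArray and returns the resulting array. Because LINQ queries use deferred execution, none of the real work (calling GetFiles or reading files) happens until ToArray enumerates the query.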
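The deferred execution can be seen directly. In a query shaped like fileQuery, GetFiles sits in the second from clause, so it runs inside a SelectMany lambda only when the query is enumerated. (An expression in the first from clause, such as from f in dir.GetFiles("*.txt"), would instead be evaluated as soon as the query is defined.) This sketch, with a hypothetical DeferredDemo class and file name late.txt, defines the query, creates a file afterward, and shows that the file still appears when the query runs.

```csharp
using System.IO;
using System.Linq;

// Hypothetical helper class for illustration only.
public static class DeferredDemo
{
    // Defines the query, then creates a file, then runs the query.
    // Returns how many .txt files the query found when it finally ran.
    public static int CountAfterLateFile(DirectoryInfo dir)
    {
        string[] patterns = { "*.txt" };

        // Nothing is read yet: GetFiles is in the second from clause,
        // which LINQ evaluates only while the query is being enumerated.
        var query =
            from pattern in patterns
            from fileinfo in dir.GetFiles(pattern)
            select fileinfo;

        // Create a file after the query has been defined.
        File.WriteAllText(Path.Combine(dir.FullName, "late.txt"), "created later");

        // ToArray enumerates the query, so GetFiles runs now and sees late.txt.
        return query.ToArray().Length;
    }
}
```

Starting from an empty directory, the method returns 1. A side effect of deferral is that every enumeration repeats the disk work, which is one reason to materialize the results once with ToArray before returning them.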

Again, the version that you should use depends on your preferences. The LINQ version is shorter than the two methods used by the non-LINQ version, but its queries are more complex and can be harder to read and debug.

Note that you can also mix LINQ and non-LINQ operations. For example, you could use the first query without the group clause, use C# code to remove duplicate entries, and then feed the result into the final query.
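Here is a minimal sketch of that mixed approach, assuming a hypothetical FindFilesMixed extension method in a hypothetical MixedFileSearch class. The first query omits the group clause, an ordinary loop with a HashSet<string> keyed on FullName drops duplicates, and a second query filters and sorts the remaining files.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical container class; extension methods must live in a static class.
public static class MixedFileSearch
{
    public static FileInfo[] FindFilesMixed(this DirectoryInfo dirinfo,
        IEnumerable<string> patterns, string target = "",
        SearchOption option = SearchOption.TopDirectoryOnly)
    {
        // First query without the group clause; it may contain duplicates.
        var matches =
            from pattern in patterns
            from fileinfo in dirinfo.GetFiles(pattern, option)
            select fileinfo;

        // Ordinary C# code: keep the first FileInfo seen for each full path.
        var seen = new HashSet<string>();
        var unique = new List<FileInfo>();
        foreach (FileInfo fileinfo in matches)
            if (seen.Add(fileinfo.FullName))
                unique.Add(fileinfo);

        // Final query: keep files containing the target (if any), sorted by name.
        var result =
            from fileinfo in unique
            where string.IsNullOrEmpty(target) ||
                  File.ReadAllText(fileinfo.FullName).Contains(target)
            orderby fileinfo.Name
            select fileinfo;
        return result.ToArray();
    }
}
```

Because the foreach loop enumerates matches, the GetFiles calls run at that point; the filtering and sorting in the final query are still deferred until ToArray.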

As was the case in Solution 39. Directory size, PLINQ style, PLINQ is unlikely to make these queries run faster. The limiting factor is the speed of disk access, so PLINQ's extra overhead is more likely to slow things down than to speed them up.

Download the FindFilesLINQ example solution to see additional details.
