Chapter 8. Filesystem I/O

8.0 Introduction

This chapter deals with a number of filesystem-related subjects, such as directory- or folder-based programming tasks. Some of the more advanced topics in filesystem I/O (input/output) are also touched on, such as:

  • Locking subsections of a file

  • Monitoring for certain filesystem actions

  • Version information in files

  • File compression

Various file and directory I/O techniques are used throughout the recipes to show you how to perform tasks such as creating, opening, deleting, reading, and writing with files and directories. This is fundamental knowledge that will help you understand the file I/O recipes and how to modify them for your purposes.

A number of the recipes have been updated to use the async and await keywords to help hide the latency you’d typically encounter when dealing with the filesystem or network when performing file I/O. Using async and await improves your code’s overall responsiveness: the I/O operations proceed without blocking the calling thread until they complete.

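Unless otherwise specified, you need the following using directives in any program that uses snippets or methods from this chapter. Individual recipes also rely on System.Collections.Generic, System.Linq, System.Text, System.Diagnostics, System.Threading, and System.Threading.Tasks wherever types from those namespaces appear: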

using System;
using System.IO;

8.1 Searching for Directories or Files Using Wildcards

Problem

You are attempting to find one or more specific files or directories that may or may not exist within the current filesystem. You might need to use wildcard characters in order to widen the search—for example, searching for all usermode dump files in a filesystem. These files have a .dmp extension.

Solution

There are several methods of obtaining this information. The first three methods return a string array containing the full path of each item. The next three methods return an array of objects, each of which encapsulates a directory or a file. The remaining six methods are overloads of the first six that accept a search pattern.

The static GetFileSystemEntries method on the Directory class returns a string array containing the names of all files and directories within a single directory, for example:

public static void DisplayFilesAndSubDirectories(string path)
{
    if (string.IsNullOrWhiteSpace(path))
        throw new ArgumentNullException(nameof(path));

    string[] items = Directory.GetFileSystemEntries(path);
    Array.ForEach(items, item =>
    {
        Console.WriteLine(item);
    });
}

The static GetDirectories method on the Directory class returns a string array containing the names of all directories within a single directory. The following method, DisplaySubDirectories, shows how you might use it:

public static void DisplaySubDirectories(string path)
{
    if (string.IsNullOrWhiteSpace(path))
        throw new ArgumentNullException(nameof(path));

    string[] items = Directory.GetDirectories(path);
    Array.ForEach(items, item =>
    {
        Console.WriteLine(item);
    });
}

The static GetFiles method on the Directory class returns a string array containing the names of all files within a single directory. The following method is very similar to DisplaySubDirectories but calls Directory.GetFiles instead of Directory.GetDirectories:

public static void DisplayFiles(string path)
{
    if (string.IsNullOrWhiteSpace(path))
        throw new ArgumentNullException(nameof(path));

    string[] items = Directory.GetFiles(path);
    Array.ForEach(items, item =>
    {
        Console.WriteLine(item);
    });
}

These next three methods return objects instead of simply strings. The GetFileSystemInfos method of the DirectoryInfo object returns a strongly typed array of FileSystemInfo objects (that is, of DirectoryInfo and FileInfo objects) representing the directories and files within a single directory. The following example calls the GetFileSystemInfos method to retrieve an array of FileSystemInfo objects representing all the items in a particular directory and then writes a display string for each FileSystemInfo to the console window. The display string is created by the extension method ToDisplayString on FileSystemInfo, which must be declared in a static class:

public static void DisplayDirectoryContents(string path)
{
    if (string.IsNullOrWhiteSpace(path))
        throw new ArgumentNullException(nameof(path));

    DirectoryInfo mainDir = new DirectoryInfo(path);
    var fileSystemDisplayInfos =
        (from fsi in mainDir.GetFileSystemInfos()
        where fsi is FileInfo || fsi is DirectoryInfo
        select fsi.ToDisplayString()).ToArray();

    Array.ForEach(fileSystemDisplayInfos, s =>
    {
        Console.WriteLine(s);
    });
}

public static string ToDisplayString(this FileSystemInfo fileSystemInfo)
{
    string type = fileSystemInfo.GetType().ToString();
    if (fileSystemInfo is DirectoryInfo)
        type = "DIRECTORY";
    else if (fileSystemInfo is FileInfo)
        type = "FILE";
    return $"{type}: {fileSystemInfo.Name}";
}

The output for this code is shown here:

DIRECTORY: MyNestedTempDir
DIRECTORY: MyNestedTempDirPattern
FILE: MyTempFile.PDB
FILE: MyTempFile.TXT

The GetDirectories instance method of the DirectoryInfo object returns an array of DirectoryInfo objects representing only subdirectories in a single directory. For example, the following code calls the GetDirectories method to retrieve an array of DirectoryInfo objects and then displays the Name property of each object to the console window:

public static void DisplayDirectoriesFromInfo(string path)
{
    if (string.IsNullOrWhiteSpace(path))
        throw new ArgumentNullException(nameof(path));

    DirectoryInfo mainDir = new DirectoryInfo(path);
    DirectoryInfo[] items = mainDir.GetDirectories();
    Array.ForEach(items, item =>
    {
        Console.WriteLine($"DIRECTORY: {item.Name}");
    });
}

The GetFiles instance method of the DirectoryInfo object returns an array of FileInfo objects representing only the files in a single directory. For example, the following code calls the GetFiles method to retrieve an array of FileInfo objects, and then it displays the Name property of each object to the console window:

public static void DisplayFilesFromInfo(string path)
{
    if (string.IsNullOrWhiteSpace(path))
        throw new ArgumentNullException(nameof(path));

    DirectoryInfo mainDir = new DirectoryInfo(path);
    FileInfo[] items = mainDir.GetFiles();
    Array.ForEach(items, item =>
    {
        Console.WriteLine($"FILE: {item.Name}");
    });
}

The static GetFileSystemEntries method on the Directory class returns all files and directories in a single directory that match pattern:

public static void DisplayFilesWithPattern(string path, string pattern)
{
    if (string.IsNullOrWhiteSpace(path))
        throw new ArgumentNullException(nameof(path));
    if (string.IsNullOrWhiteSpace(pattern))
        throw new ArgumentNullException(nameof(pattern));

    string[] items = Directory.GetFileSystemEntries(path, pattern);
    Array.ForEach(items, item =>
    {
        Console.WriteLine(item);
    });
}

The static GetDirectories method on the Directory class returns only those directories in a single directory that match pattern:

public static void DisplayDirectoriesWithPattern(string path, string pattern)
{
    if (string.IsNullOrWhiteSpace(path))
        throw new ArgumentNullException(nameof(path));
    if (string.IsNullOrWhiteSpace(pattern))
        throw new ArgumentNullException(nameof(pattern));

    string[] items = Directory.GetDirectories(path, pattern);
    Array.ForEach(items, item =>
    {
        Console.WriteLine(item);
    });
}

The static GetFiles method on the Directory class returns only those files in a single directory that match pattern:

public static void DisplayFilesWithGetFiles(string path, string pattern)
{
    if (string.IsNullOrWhiteSpace(path))
        throw new ArgumentNullException(nameof(path));
    if (string.IsNullOrWhiteSpace(pattern))
        throw new ArgumentNullException(nameof(pattern));

    string[] items = Directory.GetFiles(path, pattern);
    Array.ForEach(items, item =>
    {
        Console.WriteLine(item);
    });
}

These next three methods return an object instead of simply a string. The first instance method is GetFileSystemInfos, which returns both directories and files in a single directory that match pattern:

public static void DisplayDirectoryContentsWithPattern(string path, 
    string pattern)
{
    if (string.IsNullOrWhiteSpace(path))
        throw new ArgumentNullException(nameof(path));
    if (string.IsNullOrWhiteSpace(pattern))
        throw new ArgumentNullException(nameof(pattern));

    DirectoryInfo mainDir = new DirectoryInfo(path);
    var fileSystemDisplayInfos =
        (from fsi in mainDir.GetFileSystemInfos(pattern)
        where fsi is FileInfo || fsi is DirectoryInfo
        select fsi.ToDisplayString()).ToArray();

    Array.ForEach(fileSystemDisplayInfos, s =>
    {
        Console.WriteLine(s);
    });
}

The GetDirectories instance method returns only directories (contained in the DirectoryInfo object) in a single directory that match pattern:

public static void DisplayDirectoriesWithPatternFromInfo(string path, 
    string pattern)
{
    if (string.IsNullOrWhiteSpace(path))
        throw new ArgumentNullException(nameof(path));
    if (string.IsNullOrWhiteSpace(pattern))
        throw new ArgumentNullException(nameof(pattern));

    DirectoryInfo mainDir = new DirectoryInfo(path);
    DirectoryInfo[] items = mainDir.GetDirectories(pattern);
    Array.ForEach(items, item =>
    {
        Console.WriteLine($"DIRECTORY: {item.Name}");
    });
}

The GetFiles instance method returns only file information (contained in the FileInfo object) in a single directory that matches pattern:

public static void DisplayFilesWithInstanceGetFiles(string path, string pattern)
{
    if (string.IsNullOrWhiteSpace(path))
        throw new ArgumentNullException(nameof(path));
    if (string.IsNullOrWhiteSpace(pattern))
        throw new ArgumentNullException(nameof(pattern));

    DirectoryInfo mainDir = new DirectoryInfo(path);
    FileInfo[] items = mainDir.GetFiles(pattern);
    Array.ForEach(items, item =>
    {
        Console.WriteLine($"FILE: {item.Name}");
    });
}

Discussion

If you need just an array of strings containing paths to both directories and files, you can use the static method Directory.GetFileSystemEntries. The string array returned does not include any information about whether an individual element is a directory or a file. Each string element contains the entire path to either a directory or file contained within the specified path.

To quickly and easily distinguish between directories and files, use the Directory.GetDirectories and Directory.GetFiles static methods. These methods return arrays of strings, and each element contains the full path to a directory or file.

Returning a string is fine if you do not need any other information about the directory or file returned or if you are going to need more information for only one of the files returned. It is more efficient to use the static methods to get the list of filenames and just retrieve the FileInfo for the ones you need than to have all of the FileInfos constructed for the directory, as the instance methods will do. If you need to access attributes, lengths, or times on every one of the files, you should consider using the instance methods that retrieve the FileInfo details.
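The trade-off described above can be sketched as follows. This is a minimal illustration, not code from the recipe: the class name FileInfoOnDemand and the method FindFirstFileByPrefix are hypothetical, and the prefix test stands in for whatever criterion you would apply to the returned paths. Only strings are produced for the whole directory; a FileInfo is constructed for the single file that is actually needed.

```csharp
using System;
using System.IO;
using System.Linq;

public static class FileInfoOnDemand
{
    // Hypothetical helper: returns a FileInfo for the first file (in
    // case-insensitive path order) whose name starts with 'prefix',
    // or null when no file in 'path' matches.
    public static FileInfo FindFirstFileByPrefix(string path, string prefix)
    {
        if (string.IsNullOrWhiteSpace(path))
            throw new ArgumentNullException(nameof(path));
        if (prefix == null)
            throw new ArgumentNullException(nameof(prefix));

        // The static method hands back plain strings; no FileInfo objects
        // are constructed while we search.
        string match = Directory.GetFiles(path)
            .OrderBy(name => name, StringComparer.OrdinalIgnoreCase)
            .FirstOrDefault(name => Path.GetFileName(name)
                .StartsWith(prefix, StringComparison.OrdinalIgnoreCase));

        // Build the richer object only for the one file we care about.
        return match == null ? null : new FileInfo(match);
    }
}
```

A caller can then read properties such as Length or LastWriteTime from the returned FileInfo, after checking the result for null.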

The instance method GetFileSystemInfos returns an array of strongly typed FileSystemInfo objects. (The FileSystemInfo object is the base class to the DirectoryInfo and FileInfo objects.) Therefore, you can test whether the returned type is a DirectoryInfo or FileInfo object using the is or as keyword. Once you know what subclass the object really is, you can cast the object to that type and begin using it.
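As a small sketch of that type test, the following hypothetical helper (the names DirectorySummary and GetTotalFileSize are assumptions made for illustration) uses the as keyword to reach FileInfo.Length, which is not available on the FileSystemInfo base class, and the is keyword to count subdirectories:

```csharp
using System;
using System.IO;

public static class DirectorySummary
{
    // Hypothetical helper: returns the total size in bytes of the files
    // directly inside 'path' and reports the number of immediate
    // subdirectories through 'directoryCount'.
    public static long GetTotalFileSize(string path, out int directoryCount)
    {
        if (string.IsNullOrWhiteSpace(path))
            throw new ArgumentNullException(nameof(path));

        directoryCount = 0;
        long totalBytes = 0;

        foreach (FileSystemInfo fsi in new DirectoryInfo(path).GetFileSystemInfos())
        {
            // 'as' yields null when the item is not a FileInfo.
            FileInfo file = fsi as FileInfo;
            if (file != null)
                totalBytes += file.Length;   // Length exists only on FileInfo
            else if (fsi is DirectoryInfo)
                directoryCount++;
        }

        return totalBytes;
    }
}
```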

To get only DirectoryInfo objects, use the overloaded GetDirectories instance method. To get only FileInfo objects, use the overloaded GetFiles instance method. These methods return an array of DirectoryInfo and FileInfo objects, respectively, each element of which encapsulates a directory or file.

There are certain behaviors to be aware of for the patterns you can provide when filtering the results from GetFiles or GetFileSystemInfos:

  • The pattern cannot contain any of the InvalidPathChars and cannot use the “go back up in the folder structure one level” symbol (..).

  • The order in which the items in the array are returned is not guaranteed, but you can use Sort or order the results in a query.

  • When an extension is exactly three characters, the behavior is different in that the pattern will match on any files with those first three characters in the extension. For example, *.htm returns files having an extension of .htm, .html, .htma, and so on.

  • When an extension has fewer than or more than three characters, the pattern will perform exact matching. For example, *.cs returns only files having an extension of .cs.
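If the three-character extension behavior is not what you want, one possible workaround is to let the pattern narrow the candidates and then compare the extension of each result exactly. The sketch below makes that assumption explicit; the class ExactExtensionSearch and its method are hypothetical names, and the extension argument is expected to include the leading period (for example, .htm). The results are also ordered, since the returned order is not guaranteed.

```csharp
using System;
using System.IO;
using System.Linq;

public static class ExactExtensionSearch
{
    // Hypothetical helper: returns the full paths of files in 'path' whose
    // extension is exactly 'extension' (leading period included),
    // compared case-insensitively and sorted by path.
    public static string[] GetFilesWithExactExtension(string path, string extension)
    {
        if (string.IsNullOrWhiteSpace(path))
            throw new ArgumentNullException(nameof(path));
        if (string.IsNullOrWhiteSpace(extension))
            throw new ArgumentNullException(nameof(extension));

        return Directory.GetFiles(path, "*" + extension)
            // Drop anything the wildcard matched loosely, such as .html
            // when .htm was requested.
            .Where(file => string.Equals(Path.GetExtension(file), extension,
                                         StringComparison.OrdinalIgnoreCase))
            // Impose an explicit order on the results.
            .OrderBy(file => file, StringComparer.OrdinalIgnoreCase)
            .ToArray();
    }
}
```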

See Also

The “DirectoryInfo Class,” “FileInfo Class,” and “FileSystemInfo Class” topics in the MSDN documentation.

8.2 Obtaining the Directory Tree

Problem

You need to get a directory tree, potentially including filenames, extending from any point in the directory hierarchy. In addition, each directory or file returned must be in the form of an object encapsulating that item. This will allow you to perform operations on the returned objects, such as deleting the file, renaming the file, or examining/changing its attributes. Finally, you potentially need the ability to search for a specific subset of these items based on a pattern, such as finding only files with the .pdb extension.

Solution

By calling the GetFileSystemInfos instance method, you can retrieve all of the files and directories down the directory hierarchy from any starting point as an enumerable list:

public static IEnumerable<FileSystemInfo> GetAllFilesAndDirectories(string dir)
{
    if (string.IsNullOrWhiteSpace(dir))
        throw new ArgumentNullException(nameof(dir));

    Stack<FileSystemInfo> stack = new Stack<FileSystemInfo>();
    stack.Push(new DirectoryInfo(dir));

    // Loop only while items remain; calling Pop on an empty stack throws
    // an InvalidOperationException.
    while (stack.Count > 0)
    {
        FileSystemInfo fileSystemInfo = stack.Pop();
        yield return fileSystemInfo;

        // Queue the contents of each directory so they are visited later.
        DirectoryInfo subDirectoryInfo = fileSystemInfo as DirectoryInfo;
        if (subDirectoryInfo != null)
        {
            foreach (FileSystemInfo fsi in subDirectoryInfo.GetFileSystemInfos())
                stack.Push(fsi);
        }
    }
}

To display the results of the file and directory retrieval, use the following query:

public static void DisplayAllFilesAndDirectories(string dir)
{
    if (string.IsNullOrWhiteSpace(dir))
        throw new ArgumentNullException(nameof(dir));

    var strings = (from fileSystemInfo in GetAllFilesAndDirectories(dir)
                    select fileSystemInfo.ToDisplayString()).ToArray();

    Array.ForEach(strings, s => { Console.WriteLine(s); });
}

Since the results are queryable, you don’t have to retrieve information about all files and directories. The following query obtains a listing of all files that have a given extension, such as .pdb, and whose full path contains the string Chapter 1. The extension is compared case-insensitively and must include the leading period, because that is how the Extension property reports it:

public static void DisplayAllFilesWithExtension(string dir, string extension)
{
    if (string.IsNullOrWhiteSpace(dir))
        throw new ArgumentNullException(nameof(dir));
    if (string.IsNullOrWhiteSpace(extension))
        throw new ArgumentNullException(nameof(extension));

    var strings = (from fileSystemInfo in GetAllFilesAndDirectories(dir)
                    where fileSystemInfo is FileInfo &&
                            fileSystemInfo.FullName.Contains("Chapter 1") &&
                            (string.Compare(fileSystemInfo.Extension, extension,
                                        StringComparison.OrdinalIgnoreCase) == 0)
                    select fileSystemInfo.ToDisplayString()).ToArray();

    Array.ForEach(strings, s => { Console.WriteLine(s); });
}

Discussion

To obtain a tree representation of a directory and the files it contains, you could use recursive iterators in a method like this:

public static IEnumerable<FileSystemInfo> GetAllFilesAndDirectoriesWithRecursion(
    string dir)
{
    if (string.IsNullOrWhiteSpace(dir))
        throw new ArgumentNullException(nameof(dir));

    DirectoryInfo dirInfo = new DirectoryInfo(dir);
    FileSystemInfo[] fileSystemInfos = dirInfo.GetFileSystemInfos();
    foreach (FileSystemInfo fileSystemInfo in fileSystemInfos)
    {
        yield return fileSystemInfo;
        if (fileSystemInfo is DirectoryInfo)
        {
            foreach (FileSystemInfo fsi in
                GetAllFilesAndDirectoriesWithRecursion(fileSystemInfo.FullName))
                yield return fsi;
        }
    }
}

public static void DisplayAllFilesAndDirectoriesWithRecursion(string dir)
{
    if (string.IsNullOrWhiteSpace(dir))
        throw new ArgumentNullException(nameof(dir));

    var strings = (from fileSystemInfo in 
                    GetAllFilesAndDirectoriesWithRecursion(dir)
                    select fileSystemInfo.ToDisplayString()).ToArray();

    Array.ForEach(strings, s => { Console.WriteLine(s); });
}

The main difference between this and the Solution code is that this uses recursive iterators, whereas the Solution uses a single iterator that loops over an explicit stack.

Note

Avoid the recursive iterator method. Each item is yielded again by every enclosing iterator on its way back to the caller, so the performance is in fact O(n * d), where n is the number of FileSystemInfos and d is the depth of the directory hierarchy, which is typically log n. See the demonstration code.

You can check the performance with the following code if the Solution methods are renamed to DisplayAllFilesAndDirectoriesWithoutRecursion and DisplayAllFilesWithExtensionWithoutRecursion, respectively:

string dir = Environment.GetFolderPath(Environment.SpecialFolder.ProgramFiles);

// list all of the files without using recursion
Stopwatch watch1 = Stopwatch.StartNew();
DisplayAllFilesAndDirectoriesWithoutRecursion(dir);
watch1.Stop();
Console.WriteLine("*************************");

// list all of the files using recursion
Stopwatch watch2 = Stopwatch.StartNew();
DisplayAllFilesAndDirectoriesWithRecursion(dir);
watch2.Stop();
Console.WriteLine("*************************");
Console.WriteLine(
    $"Non-Recursive method time elapsed {watch1.Elapsed}");
Console.WriteLine($"Recursive method time elapsed {watch2.Elapsed}");

Here are the methods that do not use recursion:

public static void DisplayAllFilesAndDirectoriesWithoutRecursion(string dir)
{
    var strings = from fileSystemInfo in 
                    GetAllFilesAndDirectoriesWithoutRecursion(dir)
                    select fileSystemInfo.ToDisplayString();

    foreach (string s in strings)
        Console.WriteLine(s);
}

public static void DisplayAllFilesWithExtensionWithoutRecursion(string dir,
    string extension)
{
    var strings = from fileSystemInfo in 
                    GetAllFilesAndDirectoriesWithoutRecursion(dir)
                    where fileSystemInfo is FileInfo &&
                        fileSystemInfo.FullName.Contains("Chapter 1") &&
                        (string.Compare(fileSystemInfo.Extension, extension,
                                        StringComparison.OrdinalIgnoreCase) == 0)
                    select fileSystemInfo.ToDisplayString();

    foreach (string s in strings)
        Console.WriteLine(s);
}

public static IEnumerable<FileSystemInfo> 
    GetAllFilesAndDirectoriesWithoutRecursion(
    string dir)
{
    Stack<FileSystemInfo> stack = new Stack<FileSystemInfo>();
    stack.Push(new DirectoryInfo(dir));

    // Loop only while items remain; calling Pop on an empty stack throws
    // an InvalidOperationException.
    while (stack.Count > 0)
    {
        FileSystemInfo fileSystemInfo = stack.Pop();
        yield return fileSystemInfo;

        // Queue the contents of each directory so they are visited later.
        DirectoryInfo subDirectoryInfo = fileSystemInfo as DirectoryInfo;
        if (subDirectoryInfo != null)
        {
            foreach (FileSystemInfo fsi in subDirectoryInfo.GetFileSystemInfos())
                stack.Push(fsi);
        }
    }
}
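For comparison, the framework can also walk the tree itself. The following sketch is an alternative, not the recipe's method: it relies on the EnumerateFileSystemInfos overload of DirectoryInfo that takes a SearchOption, and the wrapper names BuiltInTreeWalk and EnumerateTree are hypothetical. Two differences are worth keeping in mind: the starting directory itself is not part of the results, and an inaccessible subdirectory can end the enumeration with an UnauthorizedAccessException.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class BuiltInTreeWalk
{
    // Hypothetical wrapper: lazily enumerates every file and directory
    // beneath 'dir' (the starting directory itself is not returned).
    public static IEnumerable<FileSystemInfo> EnumerateTree(string dir)
    {
        if (string.IsNullOrWhiteSpace(dir))
            throw new ArgumentNullException(nameof(dir));

        // SearchOption.AllDirectories asks the framework to descend into
        // every subdirectory on our behalf.
        return new DirectoryInfo(dir)
            .EnumerateFileSystemInfos("*", SearchOption.AllDirectories);
    }
}
```

Because the result is an IEnumerable<FileSystemInfo>, it can be consumed by the same kind of query used with the explicit-stack version.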

See Also

The “DirectoryInfo Class,” “FileInfo Class,” and “FileSystemInfo Class” topics in the MSDN documentation.

8.3 Parsing a Path

Problem

You need to separate the constituent parts of a path and place them into separate variables.

Solution

Use the static methods of the Path class:

public static void DisplayPathParts(string path)
{
    if (string.IsNullOrWhiteSpace(path))
        throw new ArgumentNullException(nameof(path));

    string root = Path.GetPathRoot(path);
    string dirName = Path.GetDirectoryName(path);
    string fullFileName = Path.GetFileName(path);
    string fileExt = Path.GetExtension(path);
    string fileNameWithoutExt = Path.GetFileNameWithoutExtension(path);
    StringBuilder format = new StringBuilder();
    format.Append($"ParsePath of {path} breaks up into the following pieces:" +
        $"{Environment.NewLine}");
    format.Append($"\tRoot: {root}{Environment.NewLine}");
    format.Append($"\tDirectory Name: {dirName}{Environment.NewLine}");
    format.Append($"\tFull File Name: {fullFileName}{Environment.NewLine}");
    format.Append($"\tFile Extension: {fileExt}{Environment.NewLine}");
    format.Append($"\tFile Name Without Extension: {fileNameWithoutExt}" +
        $"{Environment.NewLine}");
    Console.WriteLine(format.ToString());
}

If the string C:\test\tempfile.txt is passed to this method, the output looks like this:

ParsePath of C:\test\tempfile.txt breaks up into the following pieces:
        Root: C:\
        Directory Name: C:\test
        Full File Name: tempfile.txt
        File Extension: .txt
        File Name Without Extension: tempfile

Discussion

The Path class contains methods that can be used to parse a given path. Using these methods is much easier and less error-prone than writing path- and filename-parsing code. If you do not use them, you could also introduce security holes into your application if the information gathered from manual parsing routines is used in security decisions. There are five main methods used to parse a path: GetPathRoot, GetDirectoryName, GetFileName, GetExtension, and GetFileNameWithoutExtension. Each has a single parameter, path, which represents the path to be parsed:

GetPathRoot
This method returns the root directory of the path. If no root is provided in the path, such as when a relative path is used, this method returns an empty string, not null.
GetDirectoryName
This method returns the complete path for the directory containing the file.
GetFileName
This method returns the filename, including the file extension. If no filename is provided in the path, this method returns an empty string, not null.
GetExtension
This method returns the file’s extension. If no extension is provided for the file or no file exists in the path, this method returns an empty string, not null.
GetFileNameWithoutExtension
This method returns the root filename without the file extension.

Be aware that these methods do not actually determine whether the drives, directories, or even files exist on the system that runs these methods. These methods are string parsers, and if you pass one of them a string in some strange format (such as \\ZY:\foo), it will try to do what it can with it anyway:

ParsePath of \\ZY:\foo breaks up into the following pieces:
        Root: \\ZY:\foo
        Directory Name:
        Full File Name: foo
        File Extension:
        File Name Without Extension: foo

These methods will, however, throw an exception if illegal characters are found in the path.

To determine whether files or directories exist, use the static Directory.Exists or File.Exists method.
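A minimal sketch of such a check follows. The class PathExistence and its Classify method are hypothetical names chosen for illustration; the only framework calls involved are File.Exists and Directory.Exists, both of which return false rather than throwing when the path does not exist.

```csharp
using System.IO;

public static class PathExistence
{
    // Hypothetical helper: reports what, if anything, exists at 'path'.
    // Returns "FILE", "DIRECTORY", or "MISSING".
    public static string Classify(string path)
    {
        if (File.Exists(path))
            return "FILE";
        if (Directory.Exists(path))
            return "DIRECTORY";
        return "MISSING";
    }
}
```

Keep in mind that the answer is only a snapshot: another process can create or delete the item between the check and whatever you do next, so code that then opens the file should still be prepared to handle an exception.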

See Also

The “Path Class” topic in the MSDN documentation.

8.4 Launching and Interacting with Console Utilities

Problem

You have an application that you need to automate and that takes input only from the standard input stream. You need to drive this application via the commands it will take over the standard input stream.

Solution

Say you need to drive the cmd.exe application to display the current time with the TIME /T command (you could just run this command from the command line, but this way we can demonstrate an alternative method to drive an application that responds to standard input). The way to do this is to launch a process that is looking for input on the standard input stream. This is accomplished via the StartInfo property of the Process class (in the System.Diagnostics namespace), which is an instance of the ProcessStartInfo class. StartInfo has properties that control many details of the environment in which the new process will execute, and the Process.Start method will launch the new process with those options.

First, make sure that the StartInfo.RedirectStandardInput property is set to true. This setting tells the Process class to redirect the standard input of the new process so that your code can write to it. Then, set the StartInfo.UseShellExecute property to false, because if you were to let the shell launch the process for you, it would prevent you from redirecting standard input.

Once this is done, launch the process and write to its standard input stream as shown in Example 8-1.

Example 8-1. RunProcessToReadStandardInput method
public static void RunProcessToReadStandardInput()
{
    Process application = new Process();
    // Run the command shell.
    application.StartInfo.FileName = @"cmd.exe";

    // Turn on command extensions for cmd.exe.
    application.StartInfo.Arguments = "/E:ON";

    application.StartInfo.RedirectStandardInput = true;

    application.StartInfo.UseShellExecute = false;

    application.Start();

    StreamWriter input = application.StandardInput;
    // Run the command to display the time.
    input.WriteLine("TIME /T");

    // Stop the application we launched.
    input.WriteLine("exit");
}

Discussion

Redirecting the input stream for a process allows you to programmatically interact with certain applications and utilities that you would otherwise not be able to automate without additional tools. Once the input has been redirected, you can write into the standard input stream of the process by reading the Process.StandardInput property, which returns a StreamWriter. Once you have that, you can send things to the process via WriteLine calls, as shown earlier.

To use StandardInput, you have to specify true for the StartInfo property’s RedirectStandardInput property. Otherwise, reading the StandardInput property throws an exception.

When UseShellExecute is false, you can use Process only to launch executables. When it is true, you can also use the Process class to perform shell operations on a file, such as printing a Microsoft Word document. Another difference when UseShellExecute is set to false is that the working directory is not used to find the executable, so you must be mindful to pass a full path or have the executable on your PATH environment variable.
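The two settings discussed here can also be gathered into a ProcessStartInfo object before the process is started. The sketch below is one way to arrange that, under the assumption that the target program reads commands line by line from standard input; the class RedirectedProcess and its methods are hypothetical names, not part of the recipe.

```csharp
using System;
using System.Diagnostics;

public static class RedirectedProcess
{
    // Hypothetical helper: builds start-up options for a console program
    // whose standard input will be written by the caller.
    public static ProcessStartInfo CreateStartInfo(string fileName, string arguments)
    {
        if (string.IsNullOrWhiteSpace(fileName))
            throw new ArgumentNullException(nameof(fileName));

        return new ProcessStartInfo(fileName, arguments ?? string.Empty)
        {
            // Required before Process.StandardInput can be used.
            RedirectStandardInput = true,
            // Redirection is not possible when the shell starts the process.
            UseShellExecute = false
        };
    }

    // Hypothetical helper: starts the process, sends each line to its
    // standard input, and waits for it to exit.
    public static void SendLines(ProcessStartInfo startInfo, params string[] lines)
    {
        using (Process process = Process.Start(startInfo))
        {
            foreach (string line in lines)
                process.StandardInput.WriteLine(line);

            // Closing the writer signals end of input to the process.
            process.StandardInput.Close();
            process.WaitForExit();
        }
    }
}
```

On Windows, a call such as RedirectedProcess.SendLines(RedirectedProcess.CreateStartInfo("cmd.exe", "/E:ON"), "TIME /T", "exit") would send the same two commands as Example 8-1.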

See Also

The “Process Class,” “ProcessStartInfo Class,” “RedirectStandardInput Property,” and “UseShellExecute Property” topics in the MSDN documentation.

8.5 Locking Subsections of a File

Problem

You need to read or write data from or to a section of a file, and you want to make sure that no other processes or threads can access, modify, or delete the file until you have finished with it.

Solution

To lock out other processes from accessing your file while you are using it, you use the Lock method of the FileStream class. The following code creates a file from the fileName parameter and writes two lines to it. The entire file is then locked via the Lock method. While the file is locked, the code goes off and does some other processing; when that processing finishes, the file is unlocked in a finally block, a third line is written, and the file is closed:

public static async Task CreateLockedFileAsync(string fileName)
{
    if (string.IsNullOrWhiteSpace(fileName))
        throw new ArgumentNullException(nameof(fileName));

    FileStream fileStream = null;
    try
    {
        fileStream = new FileStream(fileName,
                FileMode.Create,
                FileAccess.ReadWrite,
                FileShare.ReadWrite, 4096, useAsync: true);

        using (StreamWriter writer = new StreamWriter(fileStream))
        {
            await writer.WriteLineAsync("The First Line");
            await writer.WriteLineAsync("The Second Line");
            await writer.FlushAsync();

            try
            {
                // Lock all of the file.
                fileStream.Lock(0, fileStream.Length);

                // Do some lengthy processing here without blocking the thread...
                await Task.Delay(1000);
            }
            finally
            {
                // Make sure we unlock the file.
                // If a process terminates with part of a file locked or closes 
                // a file that has outstanding locks, the behavior is undefined 
                // which is MS speak for bad things....
                fileStream.Unlock(0, fileStream.Length);
            }

            await writer.WriteLineAsync("The Third Line");
            fileStream = null;
        }
    }
    finally
    {
        if (fileStream != null)
            fileStream.Dispose();
    }
}

Note

The CreateLockedFileAsync method uses the async modifier and the await operator. The async modifier marks a method as eligible for suspension, and each await designates a suspension point: the method cannot continue past that point until the awaited asynchronous operation is complete. While it waits, the caller gets control back, so the calling thread is not blocked and can perform other work, yet the statements in the method still run in the order in which they are written.
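To see the suspension points in a smaller setting, consider the following sketch. It is an illustration only: AsyncFileSketch and WriteThenReadAsync are hypothetical names, and the method simply writes some text to a file and reads it back, awaiting each I/O operation.

```csharp
using System.IO;
using System.Text;
using System.Threading.Tasks;

public static class AsyncFileSketch
{
    // Hypothetical helper: writes 'text' to 'fileName' and reads it back.
    // Each await is a point at which the caller regains control while the
    // I/O operation is in flight.
    public static async Task<string> WriteThenReadAsync(string fileName, string text)
    {
        using (FileStream stream = new FileStream(fileName, FileMode.Create,
            FileAccess.Write, FileShare.None, 4096, useAsync: true))
        using (StreamWriter writer = new StreamWriter(stream, Encoding.UTF8))
        {
            await writer.WriteAsync(text);
            await writer.FlushAsync();
        }

        using (FileStream stream = new FileStream(fileName, FileMode.Open,
            FileAccess.Read, FileShare.Read, 4096, useAsync: true))
        using (StreamReader reader = new StreamReader(stream, Encoding.UTF8))
        {
            // The read starts only after the write above has completed,
            // even though no thread was blocked while waiting.
            return await reader.ReadToEndAsync();
        }
    }
}
```

A caller that is itself async would write string contents = await AsyncFileSketch.WriteThenReadAsync(fileName, "The First Line"); and continue with other work whenever the method is suspended.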

Discussion

If a file is opened within your application and the FileShare parameter of the FileStream constructor (or of the File.Open call) is set to FileShare.ReadWrite or FileShare.Write, other code in your application can view or alter the contents of the file while you are using it. To handle file access with more granularity, use the Lock method of the FileStream object to prevent other code from overwriting all or a portion of your file. Once you are done with the locked portion of your file, you can call the Unlock method on the FileStream object to allow other code in your application to write data to that portion of the file.

To lock an entire file, use the following syntax:

fileStream.Lock(0, fileStream.Length);

To lock a portion of a file, use the following syntax:

fileStream.Lock(4, fileStream.Length - 4);

This line of code locks the entire file except for the first four bytes; the arguments to Lock are a byte position and a byte count, not character offsets. Note that you can lock an entire file and still open it multiple times, as well as write to it.
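The following sketch locks only the leading bytes of an existing file while they are rewritten. It is an illustration under stated assumptions: LockedRegionSketch and UpdateHeaderUnderLock are hypothetical names, the file is assumed to exist already, and FileStream.Lock is assumed to be available on the platform in use (it is not supported everywhere). The position and length of the region are captured once so that Unlock releases exactly the range that Lock acquired.

```csharp
using System;
using System.IO;

public static class LockedRegionSketch
{
    // Hypothetical helper: overwrites the first bytes of 'fileName' with
    // 'header' while holding a lock on just that region.
    public static void UpdateHeaderUnderLock(string fileName, byte[] header)
    {
        if (string.IsNullOrWhiteSpace(fileName))
            throw new ArgumentNullException(nameof(fileName));
        if (header == null)
            throw new ArgumentNullException(nameof(header));

        using (FileStream stream = new FileStream(fileName, FileMode.Open,
            FileAccess.ReadWrite, FileShare.ReadWrite))
        {
            // Capture the region once; Lock and Unlock must agree on it.
            const long position = 0;
            long length = header.Length;

            stream.Lock(position, length);
            try
            {
                stream.Position = position;
                stream.Write(header, 0, header.Length);
                // Push the bytes out before the region is released.
                stream.Flush();
            }
            finally
            {
                stream.Unlock(position, length);
            }
        }
    }
}
```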

If another thread is accessing this file, you might see an IOException thrown during the call to one of the WriteAsync, FlushAsync, or Close methods. For example, the following code is prone to such an exception:

public static async Task CreateLockedFileWithExceptionAsync(string fileName)
{

    FileStream fileStream = null;
    try
    {
        fileStream = new FileStream(fileName,
                FileMode.Create,
                FileAccess.ReadWrite,
                FileShare.ReadWrite, 4096, useAsync: true);
        using (StreamWriter streamWriter = new StreamWriter(fileStream))
        {
            await streamWriter.WriteLineAsync("The First Line");
            await streamWriter.WriteLineAsync("The Second Line");
            await streamWriter.FlushAsync();

            // Lock all of the file.
            fileStream.Lock(0, fileStream.Length);

            FileStream writeFileStream = null;
            try
            {
                writeFileStream = new FileStream(fileName,
                                            FileMode.Open,
                                            FileAccess.Write,
                                            FileShare.ReadWrite, 4096, 
                                            useAsync: true);
                using (StreamWriter streamWriter2 = 
                    new StreamWriter(writeFileStream))
                {
                    await streamWriter2.WriteAsync("foo ");
                    try
                    {
                        streamWriter2.Close(); // --> Exception occurs here!
                    }
                    catch
                    {
                        Console.WriteLine(
                        "The streamWriter2.Close call generated an exception.");
                    }
                    streamWriter.WriteLine("The Third Line");
                }
                writeFileStream = null;
            }
            finally
            {
                if (writeFileStream != null)
                    writeFileStream.Dispose();
            }
        }
        fileStream = null;
    }
    finally
    {
        if (fileStream != null)
            fileStream.Dispose();
    }
}

This code produces the following output:

The streamWriter2.Close call generated an exception.

Even though streamWriter2, the second StreamWriter object, writes to a locked file, it is only when the streamWriter2.Close method is executed that the IOException is thrown.

If the code for this recipe were rewritten as follows:

public static async Task CreateLockedFileWithUnlockAsync(string fileName)
{
    FileStream fileStream = null;
    try
    {
        fileStream = new FileStream(fileName,
                                    FileMode.Create,
                                    FileAccess.ReadWrite,
                                    FileShare.ReadWrite, 4096, useAsync: true);
        using (StreamWriter streamWriter = new StreamWriter(fileStream))
        {
            await streamWriter.WriteLineAsync("The First Line");
            await streamWriter.WriteLineAsync("The Second Line");
            await streamWriter.FlushAsync();

            // Lock all of the file.
            fileStream.Lock(0, fileStream.Length);

            // Try to access the locked file...
            FileStream writeFileStream = null;
            try
            {
                writeFileStream = new FileStream(fileName,
                                            FileMode.Open,
                                            FileAccess.Write,
                                            FileShare.ReadWrite, 4096, 
                                            useAsync: true);
                using (StreamWriter streamWriter2 = 
                    new StreamWriter(writeFileStream))
                {
                    await streamWriter2.WriteAsync("foo");
                    fileStream.Unlock(0, fileStream.Length);
                    await streamWriter2.FlushAsync();
                }
                writeFileStream = null;
            }
            finally
            {
                if (writeFileStream != null)
                    writeFileStream.Dispose();
            }
        }
        fileStream = null;
    }
    finally
    {
        if (fileStream != null)
            fileStream.Dispose();
    }
}

no exception is thrown. This is because the code called Unlock on the FileStream object that initially locked the entire file, which released the lock that this FileStream object was holding. In the example, the streamWriter2.WriteAsync("foo") call had written foo to the stream's buffer but had not flushed it, so the string foo was still waiting to be flushed and written to the actual file when the lock was released. Keep this situation in mind when interleaving the opening, locking, and closing of streams. Mistakes of this kind are not always caught during code reviews, unit testing, or formal quality assurance, and the resulting bugs can be difficult to track down, so tread carefully when using file locking.

See Also

The “StreamWriter Class,” “FileStream Class,” and “Asynchronous Programming with Async and Await” topics in the MSDN documentation.

8.6 Waiting for an Action to Occur in the Filesystem

Problem

You need to be notified when a particular event occurs in the filesystem, such as the renaming of a file or directory, the increasing or decreasing of the size of a file, the deletion of a file or directory, the creation of a file or directory, or even the changing of a file’s or directory’s attribute(s). However, this notification must occur synchronously. In other words, the application cannot continue unless a specific action occurs to a file or directory.

Solution

The WaitForChanged method of the FileSystemWatcher class can be called to wait synchronously for an event notification. This is illustrated by the WaitForZipCreation method shown in Example 8-2, which waits for an action (specifically, the creation of the file named by the fileName parameter, such as Backup.zip, in the directory given by the path parameter) before proceeding to the next line of code, which is the WriteLine statement. Before it starts waiting, the method spins off a task to do the actual work of creating the file. Running this work as a Task lets it proceed on a separate thread when one becomes available, so that the FileSystemWatcher can detect the file creation.

Example 8-2. WaitForZipCreation method
// Also requires: using System.Threading; and using System.Threading.Tasks;
public static void WaitForZipCreation(string path, string fileName)
{
    if (string.IsNullOrWhiteSpace(path))
        throw new ArgumentNullException(nameof(path));
    if (string.IsNullOrWhiteSpace(fileName))
        throw new ArgumentNullException(nameof(fileName));

    FileSystemWatcher fsw = null;
    try
    {
        fsw = new FileSystemWatcher();
        string [] data = new string[] {path,fileName};
        fsw.Path = path;
        fsw.Filter = fileName;
        fsw.NotifyFilter = NotifyFilters.LastAccess | NotifyFilters.LastWrite
            | NotifyFilters.FileName | NotifyFilters.DirectoryName;

        // Run the code to generate the file we are looking for
        // Normally you wouldn't do this as another source is creating
        // this file
        Task work = Task.Run(() =>
        {
            try
            {
                // wait a sec...
                Thread.Sleep(1000);
                // create a file in the temp directory
                if (data.Length == 2)
                {
                    string dataPath = data[0];
                    // Path.Combine inserts the directory separator if needed
                    string dataFile = Path.Combine(dataPath, data[1]);
                    Console.WriteLine($"Creating {dataFile} in task...");
                    FileStream fileStream = File.Create(dataFile);
                    fileStream.Close();
                }
            }
            catch (Exception e)
            {
                Console.WriteLine(e.ToString());
            }
        });

        // Don't await the work task; its result is detected
        // through the FileSystemWatcher
        WaitForChangedResult result =
            fsw.WaitForChanged(WatcherChangeTypes.Created);
        Console.WriteLine($"{result.Name} created at {path}.");
    }
    catch(Exception e)
    {
        Console.WriteLine(e.ToString());
    }
    finally
    {
        // clean it up (delete the file from the watched directory)
        File.Delete(Path.Combine(path, fileName));
        fsw?.Dispose();
    }
}

Discussion

The WaitForChanged method returns a WaitForChangedResult structure that contains the properties listed in Table 8-1.

Table 8-1. WaitForChangedResult properties
Property    Description
ChangeType  The type of change that occurred, returned as a WatcherChangeTypes enumeration value. The values of this enumeration can be ORed together.
Name        The name of the file or directory that was changed. If the file or directory was renamed, this property returns the new name. Its value is null if the method call times out.
OldName     The original name of the modified file or directory. If the file or directory was not renamed, this property returns the same value as the Name property. Its value is null if the method call times out.
TimedOut    A Boolean indicating whether the WaitForChanged method timed out (true) or not (false).

The way we are currently making the WaitForChanged call could possibly block indefinitely. To prevent the code from hanging forever on the WaitForChanged call, you can specify a timeout value of three seconds as follows:

WaitForChangedResult result =
         fsw.WaitForChanged(WatcherChangeTypes.Created, 3000);
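
When a timeout is supplied, the caller should check the TimedOut property before using Name, since Name is null after a timeout. The following is a minimal sketch of that check; the class and method names (WaitWithTimeoutSketch, WaitForCreation) are made up for illustration and are not part of Example 8-2:

```csharp
using System;
using System.IO;

public static class WaitWithTimeoutSketch
{
    // Hypothetical helper: returns true if a file matching fileName was
    // created in path before the timeout (in milliseconds) elapsed.
    public static bool WaitForCreation(string path, string fileName, int timeoutMs)
    {
        using (FileSystemWatcher fsw = new FileSystemWatcher(path, fileName))
        {
            WaitForChangedResult result =
                fsw.WaitForChanged(WatcherChangeTypes.Created, timeoutMs);

            if (result.TimedOut)
            {
                // Name and OldName are null in this case, so don't use them.
                Console.WriteLine(
                    $"{fileName} was not created within {timeoutMs} ms.");
                return false;
            }

            Console.WriteLine($"{result.Name} created in {path}.");
            return true;
        }
    }
}
```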

The NotifyFilters enumeration allows you to specify the types of changes to files or folders to watch for, as shown in Table 8-2.

Table 8-2. NotifyFilters enumeration
Enumeration value  Definition
FileName           Name of the file
DirectoryName      Name of the directory
Attributes         The file or folder attributes
Size               The file or folder size
LastWrite          The date the file or folder last had anything written to it
LastAccess         The date the file or folder was last opened
CreationTime       The time the file or folder was created
Security           The security settings of the file or folder

See Also

The “FileSystemWatcher Class,” “NotifyFilters Enumeration,” and “WaitForChangedResult Structure” topics in the MSDN documentation.

8.7 Comparing Version Information of Two Executable Modules

Problem

You need to programmatically compare the version information of two executable modules. An executable module is a file that contains executable code, such as an .exe or .dll file. The ability to compare the version information of two executable modules can be very useful to an application in situations such as:

  • Trying to determine if it has all of the “right” pieces present to execute.

  • Deciding on an assembly to dynamically load through reflection.

  • Looking for the newest version of a file or .dll from many files spread out in the local filesystem or on a network.

Solution

Use the CompareFileVersions method to compare executable module version information. This method accepts two filenames, including their paths, as parameters. The version information of each module is retrieved and compared. This method returns a FileComparison enumeration value, defined as follows:

public enum FileComparison
{
    Error = 0,
    Newer = 1,
    Older = 2,
    Same = 3
}

The code for the CompareFileVersions method is shown in Example 8-3.

Example 8-3. CompareFileVersions method
// Also requires: using System.Diagnostics;
private static FileComparison ComparePart(int p1, int p2) =>
    p1 > p2 ? FileComparison.Newer :
        (p1 < p2 ? FileComparison.Older : FileComparison.Same);

public static FileComparison CompareFileVersions(string file1, string file2)
{
    if (string.IsNullOrWhiteSpace(file1))
        throw new ArgumentNullException(nameof(file1));
    if (string.IsNullOrWhiteSpace(file2))
        throw new ArgumentNullException(nameof(file2));

    FileComparison retValue = FileComparison.Error;
    // get the version information
    FileVersionInfo file1Version = FileVersionInfo.GetVersionInfo(file1);
    FileVersionInfo file2Version = FileVersionInfo.GetVersionInfo(file2);

    retValue = ComparePart(file1Version.FileMajorPart, 
        file2Version.FileMajorPart);
    // Only move on to the next part while all parts compared so far are equal
    if (retValue == FileComparison.Same)
    {
        retValue = ComparePart(file1Version.FileMinorPart,
            file2Version.FileMinorPart);
        if (retValue == FileComparison.Same)
        {
            retValue = ComparePart(file1Version.FileBuildPart,
                           file2Version.FileBuildPart);
            if (retValue == FileComparison.Same)
                retValue = ComparePart(file1Version.FilePrivatePart,
                        file2Version.FilePrivatePart);
        }
    }
    return retValue;
}

Discussion

Not all executable modules have version information. If you load a module with no version information using the FileVersionInfo class, you will not provoke an exception, nor will you get null back for the object reference. Instead, you will get a valid FileVersionInfo object with all data members in their initial state (null for reference types such as strings, and 0 for the numeric version parts).

Assemblies actually have two sets of version information: the version information available in the assembly manifest and the PE (portable executable) file version information. On Windows, FileVersionInfo reads the PE file version information (the file's version resource), not the assembly version stored in the manifest.

The first action this method takes is to verify that the file1 and file2 parameters are not null, empty, or whitespace. The static GetVersionInfo method of the FileVersionInfo class is then called to get version information for the two files; it throws a FileNotFoundException if either file does not exist.

The CompareFileVersions method attempts to compare each portion of the file’s version number using the following properties of the FileVersionInfo object returned by GetVersionInfo:

FileMajorPart
The first two bytes of the version number.
FileMinorPart
The second two bytes of the version number.
FileBuildPart
The third two bytes of the version number.
FilePrivatePart
The final two bytes of the version number.

The full version number is composed of these four parts, making up an 8-byte number representing the file’s version number.

The CompareFileVersions method first compares the FileMajorPart version information of the two files. If these are equal, the FileMinorPart version information of the two files is compared. This continues through the FileBuildPart and finally the FilePrivatePart version information values. If all four parts are equal, the files are considered to have the same version number and the method returns Same. Otherwise, the result is reported from the point of view of file1: Newer means file1 has the higher version number, and Older means file2 does.
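
The same part-by-part ordering can also be obtained by packing the four parts into a System.Version object and letting its CompareTo method do the work. This is an alternative sketch, not the method used in Example 8-3; the class name VersionCompareSketch and the ToVersion helper are hypothetical:

```csharp
using System;
using System.Diagnostics;

public static class VersionCompareSketch
{
    // Builds a System.Version from the four numeric file-version parts.
    public static Version ToVersion(FileVersionInfo info) =>
        new Version(info.FileMajorPart, info.FileMinorPart,
                    info.FileBuildPart, info.FilePrivatePart);

    // Returns a negative number if file1 is older than file2, zero if the
    // versions are the same, and a positive number if file1 is newer.
    public static int CompareFileVersions(string file1, string file2)
    {
        Version v1 = ToVersion(FileVersionInfo.GetVersionInfo(file1));
        Version v2 = ToVersion(FileVersionInfo.GetVersionInfo(file2));
        // Version.CompareTo compares major, minor, build, then revision.
        return v1.CompareTo(v2);
    }
}
```

Version compares each component numerically, so 1.10.0.0 sorts after 1.2.0.0, which a plain string comparison would get wrong.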

See Also

The “FileVersionInfo Class” topic in the MSDN documentation.

8.8 Querying Information for All Drives on a System

Problem

Your application needs to know whether a drive (HDD, CD drive, DVD drive, Blu-ray drive, etc.) is available and ready to be written to and/or read from, and whether it has enough free space.

Solution

Use the various properties in the DriveInfo class as shown here:

public static void DisplayAllDriveInfo()
{
    DriveInfo[] drives = DriveInfo.GetDrives();
    Array.ForEach(drives, drive =>
    {
        if (drive.IsReady)
        {
            Console.WriteLine($"Drive {drive.Name} is ready.");
            Console.WriteLine($"AvailableFreeSpace: {drive.AvailableFreeSpace}");
            Console.WriteLine($"DriveFormat: {drive.DriveFormat}");
            Console.WriteLine($"DriveType: {drive.DriveType}");
            Console.WriteLine($"Name: {drive.Name}");
            Console.WriteLine("RootDirectory.FullName: " +
                $"{drive.RootDirectory.FullName}");
            Console.WriteLine($"TotalFreeSpace: {drive.TotalFreeSpace}");
            Console.WriteLine($"TotalSize: {drive.TotalSize}");
            Console.WriteLine($"VolumeLabel: {drive.VolumeLabel}");
        }
        else
        {
            Console.WriteLine($"Drive {drive.Name} is not ready.");
        }
        Console.WriteLine();
    });
}

This code will display the results in the following format. Because each system is different, the results will vary:

Drive C:\ is ready.
AvailableFreeSpace: 143210795008
DriveFormat: NTFS
DriveType: Fixed
Name: C:\
RootDirectory.FullName: C:\
TotalFreeSpace: 143210795008
TotalSize: 159989886976
VolumeLabel: Vol1

Drive D:\ is ready.
AvailableFreeSpace: 0
DriveFormat: UDF
DriveType: CDRom
Name: D:\
RootDirectory.FullName: D:\
TotalFreeSpace: 0
TotalSize: 3305965568
VolumeLabel: Vol2

Drive E:\ is ready.
AvailableFreeSpace: 4649025536
DriveFormat: UDF
DriveType: CDRom
Name: E:\
RootDirectory.FullName: E:\
TotalFreeSpace: 4649025536
TotalSize: 4691197952
VolumeLabel: Vol3

Drive F:\ is not ready.

Of particular interest are the IsReady and AvailableFreeSpace properties. The IsReady property determines if the drive is ready to be queried, written to, or read from but is not terribly reliable, as this state could quickly change. When using IsReady, be sure to account for the case where the drive becomes not ready as well. The AvailableFreeSpace property returns the free space on that drive in bytes.

Discussion

The DriveInfo class from the .NET Framework allows you to easily query information on one particular drive or on all drives in the system. To query the information from a single drive, use the code in Example 8-4.

Example 8-4. Getting information from a specific drive
DriveInfo drive = new DriveInfo("D");
if (drive.IsReady)
    Console.WriteLine("The space available on the D:\\ drive: " +
                $"{drive.AvailableFreeSpace}");
else
    Console.WriteLine("Drive D:\\ is not ready.");

Notice that only the drive letter is passed in to the DriveInfo constructor. The drive letter can be either uppercase or lowercase; it does not matter. The next thing you will notice with the code in the Solution to this recipe is that the IsReady property is always tested for true before either using the drive or querying its properties. If we did not test this property for true and for some reason the drive was not ready (e.g., a CD was not in the drive at that time), a System.IO.IOException would be thrown stating "The device is not ready." The DriveInfo constructor was not used for the Solution to this recipe. Instead, the static GetDrives method of the DriveInfo class was used to return an array of DriveInfo objects. Each DriveInfo object in this array corresponds to one drive on the current system.
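
Because a drive can stop being ready between the IsReady test and the property access, it may be worth combining the test with exception handling. The following is a sketch under that assumption; the DriveSpaceSketch class and HasFreeSpace method are hypothetical names introduced here for illustration:

```csharp
using System;
using System.IO;

public static class DriveSpaceSketch
{
    // Hypothetical helper: returns true only if the drive is ready and
    // reports at least requiredBytes of available free space.
    public static bool HasFreeSpace(string driveName, long requiredBytes)
    {
        try
        {
            DriveInfo drive = new DriveInfo(driveName);
            return drive.IsReady && drive.AvailableFreeSpace >= requiredBytes;
        }
        catch (IOException)
        {
            // The drive became unavailable after the IsReady test
            // (DriveNotFoundException derives from IOException).
            return false;
        }
        catch (UnauthorizedAccessException)
        {
            // The caller is not allowed to query this drive.
            return false;
        }
    }
}
```

A call such as DriveSpaceSketch.HasFreeSpace("D", 1048576) would then report whether at least 1 MB appears to be available on the D drive at that moment.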

The DriveType property of the DriveInfo class returns an enumeration value from the DriveType enumeration. This enumeration value identifies what type of drive the current DriveInfo object represents. Table 8-3 identifies the various values of the DriveType enumeration.

Table 8-3. DriveType enumeration values
Enum value       Description
CDRom            An optical drive, such as a CD-ROM, CD writer, DVD-ROM, DVD, or Blu-ray writer drive.
Fixed            A fixed drive, such as an HDD. Note that USB HDDs fall into this category.
Network          A network drive.
NoRootDirectory  No root directory was found on this drive.
Ram              A RAM disk.
Removable        A removable storage device.
Unknown          Some other type of drive than those listed here.

In the DriveInfo class there are two very similar properties, AvailableFreeSpace and TotalFreeSpace. Both properties will return the same value in most cases. However, AvailableFreeSpace also takes into account any disk-quota information for a particular drive. You can find disk-quota information by right-clicking a drive in Windows Explorer and selecting the Properties pop-up menu item. This displays the Properties page for the drive. Click the Quota tab on the Properties page to view the quota information for the drive. If the Enable Quota Management checkbox is unchecked, then disk-quota management is disabled, and the AvailableFreeSpace and TotalFreeSpace properties should be equal.

See Also

The “DriveInfo Class” topic in the MSDN documentation.

8.9 Compressing and Decompressing Your Files

Problem

You need a way to compress a file using one of the stream-based classes without being constrained by the 4 GB limit imposed by the framework classes. In addition, you need a way to decompress the file to allow you to read it back in.

Solution

Use the System.IO.Compression.DeflateStream or the System.IO.Compression.GZipStream classes to read and write compressed data to a file using a “chunking” routine. The CompressFileAsync and DecompressFileAsync methods shown in Example 8-5 demonstrate how to use these classes to compress and decompress files on the fly.

Example 8-5. The CompressFileAsync and DecompressFileAsync methods
// Also requires: using System.Diagnostics; using System.IO.Compression;
// and using System.Threading.Tasks;
/// <summary>
/// Compress the source file to the destination file.
/// This is done in 1MB chunks to not overwhelm the memory usage.
/// </summary>
/// <param name="sourceFile">the uncompressed file</param>
/// <param name="destinationFile">the compressed file</param>
/// <param name="compressionType">the type of compression to use</param>
public static async Task CompressFileAsync(string sourceFile,
                                string destinationFile,
                                CompressionType compressionType)
{
    if (string.IsNullOrWhiteSpace(sourceFile))
        throw new ArgumentNullException(nameof(sourceFile));

    if (string.IsNullOrWhiteSpace(destinationFile))
        throw new ArgumentNullException(nameof(destinationFile));

    FileStream streamSource = null;
    FileStream streamDestination = null;
    Stream streamCompressed = null;

    int bufferSize = 4096;
    // FileMode.Open fails fast if the source file is missing
    using (streamSource = new FileStream(sourceFile,
            FileMode.Open, FileAccess.Read, FileShare.None,
            bufferSize, useAsync: true))
    {
        // FileMode.Create truncates any existing destination file
        using (streamDestination = new FileStream(destinationFile,
            FileMode.Create, FileAccess.Write, FileShare.None,
            bufferSize, useAsync: true))
        {
            // read 1MB chunks and compress them
            long fileLength = streamSource.Length;

            // write out the fileLength size
            byte[] size = BitConverter.GetBytes(fileLength);
            await streamDestination.WriteAsync(size, 0, size.Length);

            // 1MB chunks, or the whole file if it is smaller than that
            long chunkSize = Math.Min(1048576, fileLength);
            while (fileLength > 0)
            {
                // read the chunk
                byte[] data = new byte[chunkSize];
                await streamSource.ReadAsync(data, 0, data.Length);

                // compress the chunk
                MemoryStream compressedDataStream =
                    new MemoryStream();

                if (compressionType == CompressionType.Deflate)
                    streamCompressed =
                        new DeflateStream(compressedDataStream,
                            CompressionMode.Compress);
                else
                    streamCompressed =
                        new GZipStream(compressedDataStream,
                            CompressionMode.Compress);

                using (streamCompressed)
                {
                    // write the chunk in the compressed stream
                    await streamCompressed.WriteAsync(data, 0, data.Length);
                }
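                // get the bytes for the compressed chunk; ToArray returns
                // only the bytes written, whereas GetBuffer can include
                // unused bytes at the end of the internal buffer
                byte[] compressedData =
                    compressedDataStream.ToArray();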

                // write out the chunk size
                size = BitConverter.GetBytes(chunkSize);
                await streamDestination.WriteAsync(size, 0, size.Length);

                // write out the compressed size
                size = BitConverter.GetBytes(compressedData.Length);
                await streamDestination.WriteAsync(size, 0, size.Length);

                // write out the compressed chunk
                await streamDestination.WriteAsync(compressedData, 0,
                    compressedData.Length);

                // subtract the chunk size from the file size
                fileLength -= chunkSize;

                // if chunk is less than remaining file use
                // remaining file
                if (fileLength < chunkSize)
                    chunkSize = fileLength;
            }
        }
    }
}

/// <summary>
/// This function will decompress the chunked compressed file
/// created by the CompressFileAsync function.
/// </summary>
/// <param name="sourceFile">the compressed file</param>
/// <param name="destinationFile">the destination file</param>
/// <param name="compressionType">the type of compression to use</param>
public static async Task DecompressFileAsync(string sourceFile,
                                string destinationFile,
                                CompressionType compressionType)
{
    if (string.IsNullOrWhiteSpace(sourceFile))
        throw new ArgumentNullException(nameof(sourceFile));
    if (string.IsNullOrWhiteSpace(destinationFile))
        throw new ArgumentNullException(nameof(destinationFile));

    FileStream streamSource = null;
    FileStream streamDestination = null;
    Stream streamUncompressed = null;

    int bufferSize = 4096;
    // FileMode.Open fails fast if the compressed file is missing
    using (streamSource = new FileStream(sourceFile,
            FileMode.Open, FileAccess.Read, FileShare.None,
            bufferSize, useAsync: true))
    {
        // FileMode.Create truncates any existing destination file
        using (streamDestination = new FileStream(destinationFile,
            FileMode.Create, FileAccess.Write, FileShare.None,
            bufferSize, useAsync: true))
        {
            // read the original (uncompressed) file length
            byte[] size = new byte[sizeof(long)];
            await streamSource.ReadAsync(size, 0, size.Length);
            // convert the size back to a number
            long fileLength = BitConverter.ToInt64(size, 0);
            long chunkSize = 0;
            int storedSize = 0;
            long workingSet = Process.GetCurrentProcess().WorkingSet64;
            while (fileLength > 0)
            {
                // read the chunk size
                size = new byte[sizeof(long)];
                await streamSource.ReadAsync(size, 0, size.Length);
                // convert the size back to a number
                chunkSize = BitConverter.ToInt64(size, 0);
                if (chunkSize > fileLength ||
                    chunkSize > workingSet)
                    throw new InvalidDataException();

                // read the compressed size
                size = new byte[sizeof(int)];
                await streamSource.ReadAsync(size, 0, size.Length);
                // convert the size back to a number
                storedSize = BitConverter.ToInt32(size, 0);
                if (storedSize > fileLength ||
                    storedSize > workingSet)
                    throw new InvalidDataException();

                if (storedSize > chunkSize)
                    throw new InvalidDataException();

                byte[] uncompressedData = new byte[chunkSize];
                byte[] compressedData = new byte[storedSize];
                await streamSource.ReadAsync(compressedData, 0,
                    compressedData.Length);

                // uncompress the chunk
                MemoryStream uncompressedDataStream =
                    new MemoryStream(compressedData);

                if (compressionType == CompressionType.Deflate)
                    streamUncompressed =
                        new DeflateStream(uncompressedDataStream,
                            CompressionMode.Decompress);
                else
                    streamUncompressed =
                        new GZipStream(uncompressedDataStream,
                            CompressionMode.Decompress);

                using (streamUncompressed)
                {
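                    // read the chunk from the compressed stream; a single
                    // ReadAsync call may return fewer bytes than requested,
                    // so keep reading until the chunk is full
                    int offset = 0;
                    int bytesRead;
                    while (offset < uncompressedData.Length &&
                        (bytesRead = await streamUncompressed.ReadAsync(
                            uncompressedData, offset,
                            uncompressedData.Length - offset)) > 0)
                    {
                        offset += bytesRead;
                    }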
                }

                // write out the uncompressed chunk
                await streamDestination.WriteAsync(uncompressedData, 0,
                    uncompressedData.Length);

                // subtract the chunk size from the file size
                fileLength -= chunkSize;

                // if chunk is less than remaining file use remaining file
                if (fileLength < chunkSize)
                    chunkSize = fileLength;
            }
        }
    }
}

The CompressionType enumeration is defined as follows:

public enum CompressionType
{
    Deflate,
    GZip
}

Discussion

The CompressFileAsync method accepts a path to the source file to compress, a path to the destination of the compressed file, and a CompressionType enumeration value indicating which type of compression algorithm to use (Deflate or GZip). This method produces a file containing the compressed data.

The DecompressFileAsync method accepts a path to the source compressed file to decompress, a path to the destination of the decompressed file, and a CompressionType enumeration value indicating which type of decompression algorithm to use (Deflate or GZip).
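
Both methods rely on ReadAsync to fill a buffer, but Stream.ReadAsync is allowed to return fewer bytes than were requested, even before the end of the stream. A helper along the following lines could be used wherever a full chunk is expected. The StreamReadSketch class and ReadFullyAsync method are hypothetical names, offered here as a sketch and not as part of the recipe's code:

```csharp
using System.IO;
using System.Threading.Tasks;

public static class StreamReadSketch
{
    // Hypothetical helper: keeps reading until the buffer is full or the
    // stream ends, and returns the number of bytes actually read.
    public static async Task<int> ReadFullyAsync(Stream stream, byte[] buffer)
    {
        int total = 0;
        while (total < buffer.Length)
        {
            int read = await stream.ReadAsync(buffer, total,
                buffer.Length - total);
            if (read == 0)
                break; // end of stream reached before the buffer was full
            total += read;
        }
        return total;
    }
}
```

The caller can compare the returned count with the buffer length to detect a truncated or corrupt file.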

The TestCompressNewFileAsync method shown in Example 8-6 exercises the CompressFileAsync and DecompressFileAsync methods defined in the Solution section of this recipe.

Example 8-6. Using the CompressFileAsync and DecompressFileAsync methods
public static async Task TestCompressNewFileAsync()
{
    byte[] data = new byte[10000000];
    for (int i = 0; i < 10000000; i++)
        data[i] = (byte)i;


    using(FileStream fs =
        new FileStream(@"C:\NewNormalFile.txt",
            FileMode.OpenOrCreate, FileAccess.ReadWrite, FileShare.None,
            4096, useAsync:true))
    {
        await fs.WriteAsync(data, 0, data.Length);
    }

    await CompressFileAsync(@"C:\NewNormalFile.txt",
        @"C:\NewCompressedFile.txt",
        CompressionType.Deflate);

    await DecompressFileAsync(@"C:\NewCompressedFile.txt",
        @"C:\NewDecompressedFile.txt",
        CompressionType.Deflate);

    await CompressFileAsync(@"C:\NewNormalFile.txt",
        @"C:\NewGZCompressedFile.txt",
        CompressionType.GZip);

    await DecompressFileAsync(@"C:\NewGZCompressedFile.txt",
        @"C:\NewGZDecompressedFile.txt",
        CompressionType.GZip);

    //Normal file size == 10,000,000 bytes
    //GZipped file size == 84,362
    //Deflated file size == 42,145
    //Pre .NET 4.5 GZipped file size == 155,204
    //Pre .NET 4.5 Deflated file size == 155,168

    // 36 bytes are related to the GZip CRC
}

When this test code is run, the three files of interest have different sizes. The first file, NewNormalFile.txt, is 10,000,000 bytes in size. The NewCompressedFile.txt file is 42,145 bytes. The final file, NewGZCompressedFile.txt, is 84,362 bytes. Both compressed files are less than 1 percent of the size of the original. Both compression classes use the same compression/decompression algorithm (i.e., the lossless Deflate algorithm as described in the RFC 1951: Deflate 1.3 specification); GZipStream additionally wraps the compressed data in a gzip header and a CRC trailer.
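
To see how the two classes relate without touching the filesystem, the same data can be compressed in memory with each of them and the results compared. This is a minimal sketch; the CompressionCompareSketch class and its Compress and Decompress helpers are hypothetical and are not the chunked routines from Example 8-5:

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class CompressionCompareSketch
{
    // Compresses data in memory with either GZipStream or DeflateStream.
    public static byte[] Compress(byte[] data, bool useGZip)
    {
        MemoryStream output = new MemoryStream();
        Stream compressor = useGZip
            ? (Stream)new GZipStream(output, CompressionMode.Compress)
            : new DeflateStream(output, CompressionMode.Compress);
        using (compressor)
        {
            compressor.Write(data, 0, data.Length);
        }
        // Disposing the compressor flushes it and closes output;
        // ToArray still works on a closed MemoryStream.
        return output.ToArray();
    }

    // Reverses Compress; useGZip must match the value used to compress.
    public static byte[] Decompress(byte[] compressed, bool useGZip)
    {
        using (MemoryStream input = new MemoryStream(compressed))
        using (MemoryStream output = new MemoryStream())
        {
            Stream decompressor = useGZip
                ? (Stream)new GZipStream(input, CompressionMode.Decompress)
                : new DeflateStream(input, CompressionMode.Decompress);
            using (decompressor)
            {
                // CopyTo loops internally until the stream is exhausted.
                decompressor.CopyTo(output);
            }
            return output.ToArray();
        }
    }
}
```

Compressing one buffer both ways and comparing the two array lengths shows the extra bytes that the gzip wrapper adds around the same Deflate data.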

In .NET 4.5, the GZipStream and DeflateStream classes have been updated to use the zlib library behind the scenes to perform the compression, which has improved the compression ratios. You can see this if you run the older version of the CompressFile and DecompressFile methods on prior versions of the .NET Framework, as shown in Example 8-7.

Example 8-7. Pre–.NET 4.5 version of the CompressFile and DecompressFile methods
/// <summary>
/// Compress the source file to the destination file.
/// This is done in 1MB chunks to not overwhelm the memory usage.
/// </summary>
/// <param name="sourceFile">the uncompressed file</param>
/// <param name="destinationFile">the compressed file</param>
/// <param name="compressionType">the type of compression to use</param>
public static void CompressFile(string sourceFile,
                                string destinationFile,
                                CompressionType compressionType)
{
    if (sourceFile != null)
    {
        FileStream streamSource = null;
        FileStream streamDestination = null;
        Stream streamCompressed = null;

        using (streamSource = File.OpenRead(sourceFile))
        {
            using (streamDestination = File.OpenWrite(destinationFile))
            {
                // read 1MB chunks and compress them
                long fileLength = streamSource.Length;

                // write out the fileLength size
                byte[] size = BitConverter.GetBytes(fileLength);
                streamDestination.Write(size, 0, size.Length);

                // 1MB chunks, or the whole file if it is smaller than that
                long chunkSize = Math.Min(1048576, fileLength);
                while (fileLength > 0)
                {
                    // read the chunk
                    byte[] data = new byte[chunkSize];
                    streamSource.Read(data, 0, data.Length);

                    // compress the chunk
                    MemoryStream compressedDataStream =
                        new MemoryStream();

                    if (compressionType == CompressionType.Deflate)
                        streamCompressed =
                            new DeflateStream(compressedDataStream,
                                CompressionMode.Compress);
                    else
                        streamCompressed =
                            new GZipStream(compressedDataStream,
                                CompressionMode.Compress);

                    using (streamCompressed)
                    {
                        // write the chunk in the compressed stream
                        streamCompressed.Write(data, 0, data.Length);
                    }
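                    // get the bytes for the compressed chunk; ToArray
                    // returns only the bytes written, whereas GetBuffer
                    // also returns the unused tail of the internal buffer
                    byte[] compressedData =
                        compressedDataStream.ToArray();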

                    // write out the chunk size
                    size = BitConverter.GetBytes(chunkSize);
                    streamDestination.Write(size, 0, size.Length);

                    // write out the compressed size
                    size = BitConverter.GetBytes(compressedData.Length);
                    streamDestination.Write(size, 0, size.Length);

                    // write out the compressed chunk
                    streamDestination.Write(compressedData, 0,
                        compressedData.Length);

                    // subtract the chunk size from the file size
                    fileLength -= chunkSize;

                    // if less than a full chunk remains, shrink the
                    // chunk to the remaining length
                    if (fileLength < chunkSize)
                        chunkSize = fileLength;
                }
            }
        }
    }
}

/// <summary>
/// This function will decompress the chunked compressed file
/// created by the CompressFile function.
/// </summary>
/// <param name="sourceFile">the compressed file</param>
/// <param name="destinationFile">the destination file</param>
/// <param name="compressionType">the type of compression to use</param>
public static void DecompressFile(string sourceFile,
                                  string destinationFile,
                                  CompressionType compressionType)
{
    FileStream streamSource = null;
    FileStream streamDestination = null;
    Stream streamUncompressed = null;

    using (streamSource = File.OpenRead(sourceFile))
    {
        using (streamDestination = File.OpenWrite(destinationFile))
        {
            // read the total uncompressed file length
            byte[] size = new byte[sizeof(long)];
            streamSource.Read(size, 0, size.Length);
            // convert the size back to a number
            long fileLength = BitConverter.ToInt64(size, 0);
            long chunkSize = 0;
            int storedSize = 0;
            long workingSet = Process.GetCurrentProcess().WorkingSet64;
            while (fileLength > 0)
            {
                // read the chunk size
                size = new byte[sizeof(long)];
                streamSource.Read(size, 0, size.Length);
                // convert the size back to a number
                chunkSize = BitConverter.ToInt64(size, 0);
                if (chunkSize > fileLength ||
                    chunkSize > workingSet)
                    throw new InvalidDataException();

                // read the compressed size
                size = new byte[sizeof(int)];
                streamSource.Read(size, 0, size.Length);
                // convert the size back to a number
                storedSize = BitConverter.ToInt32(size, 0);
                if (storedSize > fileLength ||
                    storedSize > workingSet)
                    throw new InvalidDataException();

                if (storedSize > chunkSize)
                    throw new InvalidDataException();

                byte[] uncompressedData = new byte[chunkSize];
                byte[] compressedData = new byte[storedSize];
                streamSource.Read(compressedData, 0,
                    compressedData.Length);

                // uncompress the chunk
                MemoryStream uncompressedDataStream =
                    new MemoryStream(compressedData);

                if (compressionType == CompressionType.Deflate)
                    streamUncompressed =
                        new DeflateStream(uncompressedDataStream,
                            CompressionMode.Decompress);
                else
                    streamUncompressed =
                        new GZipStream(uncompressedDataStream,
                            CompressionMode.Decompress);

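                using (streamUncompressed)
                {
                    // read the chunk from the compressed stream, looping
                    // because Read may return fewer bytes than requested
                    int offset = 0;
                    int read;
                    while (offset < uncompressedData.Length &&
                        (read = streamUncompressed.Read(uncompressedData,
                            offset, uncompressedData.Length - offset)) > 0)
                    {
                        offset += read;
                    }
                }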

                // write out the uncompressed chunk
                streamDestination.Write(uncompressedData, 0,
                    uncompressedData.Length);

                // subtract the chunk size from the file size
                fileLength -= chunkSize;

                // if less than a full chunk remains, expect a smaller chunk
                if (fileLength < chunkSize)
                    chunkSize = fileLength;
            }
        }
    }
}

You may be wondering why you would pick one class over the other if they use the same algorithm. One good reason is that the GZipStream class adds a CRC (cyclic redundancy check) to the compressed data to determine if it has been corrupted. If the data has been corrupted, an InvalidDataException is thrown with the statement “The CRC in GZip footer does not match the CRC calculated from the decompressed data.” By catching this exception, you can determine if your data is corrupted.
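
As a rough illustration of catching this exception, the following sketch decompresses a gzip buffer in memory and reports whether it was intact. The GZipIntegritySketch class and its methods are hypothetical names used only here, and the sketch assumes the runtime validates the gzip trailer while reading to the end of the stream, as described above.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper class, used only for this illustration.
public static class GZipIntegritySketch
{
    // Compresses a buffer with GZipStream and returns the gzip bytes.
    public static byte[] GZip(byte[] data)
    {
        using (var output = new MemoryStream())
        {
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(data, 0, data.Length);
            }
            return output.ToArray();
        }
    }

    // Returns true and the decompressed bytes if the gzip buffer is intact;
    // returns false if decompression reports corrupted data.
    public static bool TryGUnzip(byte[] compressed, out byte[] data)
    {
        try
        {
            using (var input = new MemoryStream(compressed))
            using (var gzip = new GZipStream(input, CompressionMode.Decompress))
            using (var output = new MemoryStream())
            {
                // CopyTo reads to the end of the stream, so the trailer
                // is processed before the method returns.
                gzip.CopyTo(output);
                data = output.ToArray();
                return true;
            }
        }
        catch (InvalidDataException)
        {
            data = null;
            return false;
        }
    }
}
```

If you flip a byte in the last eight bytes of the output of GZip (the trailer that holds the CRC and the original length) and pass the result to TryGUnzip, the method should return false rather than hand back silently damaged data. A DeflateStream has no such trailer, so the same kind of damage may go unnoticed.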

In the DecompressFile method, several checks can throw an InvalidDataException:

// read the chunk size
size = new byte[sizeof(long)];
streamSource.Read(size, 0, size.Length);
// convert the size back to a number
chunkSize = BitConverter.ToInt64(size, 0);
if (chunkSize > fileLength || chunkSize > workingSet)
    throw new InvalidDataException();

// read the compressed size
size = new byte[sizeof(int)];
streamSource.Read(size, 0, size.Length);
// convert the size back to a number
storedSize = BitConverter.ToInt32(size, 0);
if (storedSize > fileLength || storedSize > workingSet)
    throw new InvalidDataException();
if (storedSize > chunkSize)
    throw new InvalidDataException();

byte[] uncompressedData = new byte[chunkSize];
byte[] compressedData = new byte[storedSize];

The code is reading from a file that may have been tampered with, so these checks are needed for security as well as stability. Since DecompressFile allocates memory based on numbers read from the file, it needs to be careful about what those numbers turn out to be, and we don't want to unwittingly bring in other code that has been injected into the stream either. The very basic checks being done here are to ensure that:

  • The size of the chunk is not bigger than the file length.

  • The size of the chunk is not bigger than the current program working set.

  • The size of the compressed chunk is not bigger than the file length.

  • The size of the compressed chunk is not bigger than the current program working set.

  • The size of the compressed chunk is not bigger than the actual chunk size.
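
The checks in this list could be gathered into one helper, sketched below under a few assumptions. The ChunkHeaderSketch class, the Validate method, and its parameter names are hypothetical. The first guard, which rejects zero and negative sizes, is an addition that Example 8-7 does not perform: a negative size would make the array allocation fail, and a zero chunk size would keep the loop from ever finishing.

```csharp
using System;
using System.IO;

// Hypothetical helper class, used only for this illustration.
public static class ChunkHeaderSketch
{
    // Throws InvalidDataException when the sizes read from a chunk
    // header cannot be trusted.
    public static void Validate(long chunkSize, int storedSize,
                                long remainingLength, long workingSet)
    {
        // Extra guard (not in Example 8-7): sizes must be positive.
        if (chunkSize <= 0 || storedSize <= 0)
            throw new InvalidDataException("Chunk sizes must be positive.");

        // The chunk must fit in the remaining file and in the working set.
        if (chunkSize > remainingLength || chunkSize > workingSet)
            throw new InvalidDataException("Chunk size is out of range.");

        // The compressed chunk must satisfy the same two limits.
        if (storedSize > remainingLength || storedSize > workingSet)
            throw new InvalidDataException("Compressed size is out of range.");

        // The compressed chunk must not be larger than the chunk itself.
        if (storedSize > chunkSize)
            throw new InvalidDataException(
                "Compressed size exceeds chunk size.");
    }
}
```

Keep in mind that the last check assumes compression never makes a chunk larger. That may not hold for very small chunks or for data that is already compressed, so treat it as a policy choice rather than a guarantee of the format.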

See Also

The “DeflateStream Class” and “GZipStream Class” topics in the MSDN documentation.
