How it works...

We import a number of libraries to assist with argument parsing, creating CSV spreadsheets, hashing files, handling evidence containers and filesystems, and creating progress bars.

from __future__ import print_function
import argparse
import csv
import hashlib
import os
import pytsk3
import pyewf
import sys
from tqdm import tqdm

This recipe's command-line handler takes three positional arguments, EVIDENCE_FILE, TYPE, and HASH_LIST, which represent the evidence file, the type of evidence file, and the newline-delimited list of hashes to search for, respectively. As always, the user can also manually supply the partition type using the -p switch if necessary.

if __name__ == '__main__':
    parser = argparse.ArgumentParser(
        description=__description__,
        epilog="Developed by {} on {}".format(
            ", ".join(__authors__), __date__)
    )
    parser.add_argument("EVIDENCE_FILE", help="Evidence file path")
    parser.add_argument("TYPE", help="Type of Evidence",
                        choices=("raw", "ewf"))
    parser.add_argument("HASH_LIST",
                        help="Filepath to Newline-delimited list of "
                             "hashes (either MD5, SHA1, or SHA-256)")
    parser.add_argument("-p", help="Partition Type",
                        choices=("DOS", "GPT", "MAC", "SUN"))
    parser.add_argument("-t", type=int,
                        help="Total number of files, for the progress bar")
    args = parser.parse_args()

After we parse the inputs, we perform our typical input-validation checks on both the evidence file and the hash list. If those pass, we call the main() function and supply it with the user-supplied inputs.

    if os.path.exists(args.EVIDENCE_FILE) and \
            os.path.isfile(args.EVIDENCE_FILE) and \
            os.path.exists(args.HASH_LIST) and \
            os.path.isfile(args.HASH_LIST):
        main(args.EVIDENCE_FILE, args.TYPE, args.HASH_LIST, args.p, args.t)
    else:
        print("[-] Supplied input file {} does not exist or is not a "
              "file".format(args.EVIDENCE_FILE))
        sys.exit(1)

The main() function, the EWFImgInfo class, and the open_fs() function are nearly identical to those in the previous recipes; refer to those recipes for a more detailed explanation of these functions. One new addition to the main() function is the first line, where we call the read_hashes() function. This function reads the input hash list and returns a list of hashes and the type of hash (that is, MD5, SHA-1, or SHA-256).

Other than that, the main() function proceeds as we are accustomed to seeing it. First, it determines what type of evidence file it is working with in order to create a pytsk3 handle on the image. Then, it uses that handle and attempts to access the image volume. Once this process has completed, the variables are sent to the open_fs() function for further processing.

def main(image, img_type, hashes, part_type, pbar_total=0):
    hash_list, hash_type = read_hashes(hashes)
    volume = None
    print("[+] Opening {}".format(image))
    if img_type == "ewf":
        try:
            filenames = pyewf.glob(image)
        except IOError:
            _, e, _ = sys.exc_info()
            print("[-] Invalid EWF format: {}".format(e))
            sys.exit(2)

        ewf_handle = pyewf.handle()
        ewf_handle.open(filenames)

        # Open PYTSK3 handle on EWF Image
        img_info = EWFImgInfo(ewf_handle)
    else:
        img_info = pytsk3.Img_Info(image)

    try:
        if part_type is not None:
            attr_id = getattr(pytsk3, "TSK_VS_TYPE_" + part_type)
            volume = pytsk3.Volume_Info(img_info, attr_id)
        else:
            volume = pytsk3.Volume_Info(img_info)
    except IOError:
        _, e, _ = sys.exc_info()
        print("[-] Unable to read partition table: {}".format(e))

    open_fs(volume, img_info, hash_list, hash_type, pbar_total)

Let's quickly look at one of the new functions, the read_hashes() method. First, we instantiate the hash_list and hash_type variables as an empty list and None object, respectively. Next, we open and iterate through the input hash list and add each hash to our list. As we do this, if the hash_type variable is still None, we check the length of the line as a means of identifying the type of hash algorithm we should use.

At the end of this process, if for whatever reason the hash_type variable is still None, then the hash list must be made up of hashes we do not support, and so we exit the script after printing the error to the console.

def read_hashes(hashes):
    hash_list = []
    hash_type = None
    with open(hashes) as infile:
        for line in infile:
            if hash_type is None:
                if len(line.strip()) == 32:
                    hash_type = "md5"
                elif len(line.strip()) == 40:
                    hash_type = "sha1"
                elif len(line.strip()) == 64:
                    hash_type = "sha256"
            hash_list.append(line.strip().lower())
    if hash_type is None:
        print("[-] No valid hashes identified in {}".format(hashes))
        sys.exit(3)

    return hash_list, hash_type
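The length-based detection works because each of the three algorithms produces a hex digest of a fixed length, which can be confirmed directly with hashlib:

```python
import hashlib

# read_hashes() relies on these fixed hex-digest lengths to
# guess which algorithm produced the hashes in the input list.
sample = b"evidence"
print(len(hashlib.md5(sample).hexdigest()))     # 32 -> "md5"
print(len(hashlib.sha1(sample).hexdigest()))    # 40 -> "sha1"
print(len(hashlib.sha256(sample).hexdigest()))  # 64 -> "sha256"
```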

The open_fs() function is identical to that of previous recipes. It tries to use two different methods to access both physical and logical filesystems. Once successful, it passes these filesystems onto the recurse_files() method. As with the previous recipe, the magic happens within this function. We are also incorporating a progress bar with tqdm to provide feedback to the user, as it may take a while to hash all of the files within an image.

def open_fs(vol, img, hashes, hash_type, pbar_total=0):
    # Open FS and Recurse
    print("[+] Recursing through and hashing files")
    pbar = tqdm(desc="Hashing", unit=" files",
                unit_scale=True, total=pbar_total)
    if vol is not None:
        for part in vol:
            if part.len > 2048 and "Unallocated" not in part.desc and \
                    "Extended" not in part.desc and \
                    "Primary Table" not in part.desc:
                try:
                    fs = pytsk3.FS_Info(
                        img, offset=part.start * vol.info.block_size)
                except IOError:
                    _, e, _ = sys.exc_info()
                    print("[-] Unable to open FS: {}".format(e))
                root = fs.open_dir(path="/")
                recurse_files(part.addr, fs, root, [], [""], hashes,
                              hash_type, pbar)
    else:
        try:
            fs = pytsk3.FS_Info(img)
        except IOError:
            _, e, _ = sys.exc_info()
            print("[-] Unable to open FS: {}".format(e))
        root = fs.open_dir(path="/")
        recurse_files(1, fs, root, [], [""], hashes, hash_type, pbar)
    pbar.close()

Within the recurse_files() method, we iterate through all subdirectories and hash each file. We skip the . and .. directory entries and check that the fs_object has the correct properties. If so, we build the file path for use in our output.

def recurse_files(part, fs, root_dir, dirs, parent, hashes,
                  hash_type, pbar):
    dirs.append(root_dir.info.fs_file.meta.addr)
    for fs_object in root_dir:
        # Skip ".", ".." or directory entries without a name.
        if not hasattr(fs_object, "info") or \
                not hasattr(fs_object.info, "name") or \
                not hasattr(fs_object.info.name, "name") or \
                fs_object.info.name.name in [".", ".."]:
            continue
        try:
            file_path = "{}/{}".format("/".join(parent),
                                       fs_object.info.name.name)

As we perform each iteration, we determine which objects are files versus directories. For each file discovered, we send it to the hash_file() method along with its path, the list of hashes, and the hash algorithm. The remainder of the recurse_files() function logic is specifically designed to handle directories and makes recursive calls to this function for any sub-directories to ensure the whole tree is walked and files are not missed.

            if getattr(fs_object.info.meta, "type", None) == \
                    pytsk3.TSK_FS_META_TYPE_DIR:
                parent.append(fs_object.info.name.name)
                sub_directory = fs_object.as_directory()
                inode = fs_object.info.meta.addr

                # This ensures that we don't recurse into a directory
                # above the current level and thus avoid circular loops.
                if inode not in dirs:
                    recurse_files(part, fs, sub_directory, dirs,
                                  parent, hashes, hash_type, pbar)
                parent.pop(-1)
            else:
                hash_file(fs_object, file_path, hashes, hash_type, pbar)

        except IOError:
            pass
    dirs.pop(-1)
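The inode bookkeeping that guards the recursion can be illustrated without pytsk3. In this minimal sketch (a toy filesystem built from nested dicts, purely for illustration), object identity stands in for the inode address:

```python
# Toy filesystem: dicts are directories, bytes objects are files.
root = {"docs": {"notes.txt": b"hi"}, "tmp": {}}
root["tmp"]["up"] = root  # a cycle, like a ".." entry or a hard link

def walk(node, visited, parent, results):
    node_id = id(node)       # stands in for fs_object.info.meta.addr
    if node_id in visited:   # same idea as "if inode not in dirs"
        return
    visited.add(node_id)
    for name, child in node.items():
        if isinstance(child, dict):
            walk(child, visited, parent + [name], results)
        else:
            results.append("/".join(parent + [name]))

paths = []
walk(root, set(), [], paths)
# paths lists each file exactly once despite the circular reference
```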

The hash_file() method first checks which type of hash algorithm instance to create based on the hash_type variable. With that decided and an update of the file size to the progress bar, we read the file's data into the hash object using the read_random() method. We read the entire file's contents by starting our read at the first byte and reading the entire file's size. We generate the hash of the file using the hexdigest() function on the hash object and then check whether that hash is in our list of supplied hashes. If it is, we alert the user by printing the file path and hash to the console, using pbar.write() to prevent progress bar display issues.

def hash_file(fs_object, path, hashes, hash_type, pbar):
    if hash_type == "md5":
        hash_obj = hashlib.md5()
    elif hash_type == "sha1":
        hash_obj = hashlib.sha1()
    elif hash_type == "sha256":
        hash_obj = hashlib.sha256()
    f_size = getattr(fs_object.info.meta, "size", 0)
    pbar.set_postfix(File_Size="{:.2f}MB".format(f_size / 1024.0 / 1024))
    hash_obj.update(fs_object.read_random(0, f_size))
    hash_digest = hash_obj.hexdigest()
    pbar.update()

    if hash_digest in hashes:
        pbar.write("[*] MATCH: {} {}".format(path, hash_digest))
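Note that hash_file() reads the entire file into memory with a single read_random() call. For very large files, hashing in chunks keeps memory use flat; here is a minimal sketch of that variant using a generic file-like object (io.BytesIO stands in for the pytsk3 file handle, an assumption for illustration):

```python
import hashlib
import io

def hash_stream(stream, algo="md5", chunk_size=1024 * 1024):
    # Feed the hash object one chunk at a time instead of the
    # whole file at once, so memory use stays bounded.
    hash_obj = hashlib.new(algo)
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        hash_obj.update(chunk)
    return hash_obj.hexdigest()

digest = hash_stream(io.BytesIO(b"A" * (3 * 1024 * 1024)), "sha256")
```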

By running the script, we are presented with a progress bar showing the hashing status and a list of files that match the supplied hashes.
