How it works...

This script imports the libraries required for argument parsing, file and folder iteration, and writing CSV spreadsheets, along with the yara library, which compiles YARA rules and scans files against them.

from __future__ import print_function
from argparse import ArgumentParser, ArgumentDefaultsHelpFormatter
import os
import csv
import yara

This recipe's command-line handler accepts two positional arguments, yara_rules and path_to_scan, which represent the path to the YARA rules and the file or folder to scan, respectively. This recipe also accepts one optional argument, output, which, if supplied, writes the results of the scan to a spreadsheet instead of the console. Lastly, we pass these values to the main() function.

if __name__ == '__main__':
    parser = ArgumentParser(
        description=__description__,
        formatter_class=ArgumentDefaultsHelpFormatter,
        epilog="Developed by {} on {}".format(
            ", ".join(__authors__), __date__)
    )
    parser.add_argument(
        'yara_rules',
        help="Path to Yara rule to scan with. May be file or folder path.")
    parser.add_argument(
        'path_to_scan',
        help="Path to file or folder to scan")
    parser.add_argument(
        '--output',
        help="Path to output a CSV report of scan results")
    args = parser.parse_args()

    main(args.yara_rules, args.path_to_scan, args.output)
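
To see how the handler behaves without running the full script, the parser can be exercised with an explicit argument list. The sketch below rebuilds only the three arguments described above; build_parser() is a hypothetical helper name, and the paths are placeholders.

```python
from argparse import ArgumentParser, ArgumentDefaultsHelpFormatter


def build_parser():
    # Hypothetical helper: mirrors the recipe's three arguments only.
    parser = ArgumentParser(formatter_class=ArgumentDefaultsHelpFormatter)
    parser.add_argument('yara_rules', help="Path to YARA rule file or folder")
    parser.add_argument('path_to_scan', help="Path to file or folder to scan")
    parser.add_argument('--output', help="Path to output a CSV report")
    return parser


# Placeholder paths; nothing is opened at parse time.
console_args = build_parser().parse_args(['rules.yar', 'evidence'])
csv_args = build_parser().parse_args(
    ['rules.yar', 'evidence', '--output', 'report.csv'])
```

When --output is omitted, args.output is None, which is the value main() later checks to decide between console and CSV reporting.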

In the main() function, we accept the path to the YARA rules, the file or folder to scan, and the output file (if any). Since the YARA rules can be a file or a directory, we use the os.path.isdir() method to determine whether to call the compile() method on a whole directory or, if the input is a file, to pass it to the method using the filepath keyword. The compile() method reads the rule file or files and creates an object that we can match against the files we scan.

def main(yara_rules, path_to_scan, output):
    if os.path.isdir(yara_rules):
        yrules = yara.compile(yara_rules)
    else:
        yrules = yara.compile(filepath=yara_rules)
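
If the installed yara-python build does not accept a folder path directly, one alternative is to gather the rule files yourself and hand them to compile() through its filepaths keyword, which takes a dictionary mapping namespace names to rule file paths. The following is a minimal sketch under that assumption; collect_rule_files() and compile_rules() are hypothetical helpers, and the .yar and .yara extensions are a guess at how the rule files are named.

```python
import os


def collect_rule_files(rules_dir, extensions=('.yar', '.yara')):
    """Map a namespace (the relative file path) to each rule file path."""
    rule_files = {}
    for root, _, files in os.walk(rules_dir):
        for name in sorted(files):
            # Assumed naming convention for rule files.
            if name.lower().endswith(extensions):
                full_path = os.path.join(root, name)
                namespace = os.path.relpath(full_path, rules_dir)
                rule_files[namespace] = full_path
    return rule_files


def compile_rules(yara_rules):
    # Imported here so the helper above can be used without yara installed.
    import yara
    if os.path.isdir(yara_rules):
        return yara.compile(filepaths=collect_rule_files(yara_rules))
    return yara.compile(filepath=yara_rules)
```

Using the relative path as the namespace keeps rules with the same name in different files from colliding.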

Once the rules are compiled, we use a similar if-else statement to process the path to scan. If the input to scan is a directory, we pass it to the process_directory() function; otherwise, we use the process_file() function. Both take the compiled YARA rules and the path to scan and return a list of dictionaries containing any matches.

    if os.path.isdir(path_to_scan):
        match_info = process_directory(yrules, path_to_scan)
    else:
        match_info = process_file(yrules, path_to_scan)

As you may guess, we will ultimately convert this list of dictionaries to a CSV report if the output path was specified, using the columns we define in the columns list. However, if the output argument is None, we write this data to the console in a different format instead.

    columns = ['rule_name', 'hit_value', 'hit_offset', 'file_name',
               'rule_string', 'rule_tag']

    if output is None:
        write_stdout(columns, match_info)
    else:
        write_csv(output, columns, match_info)

The process_directory() function iterates through a directory and passes each file to the process_file() function, which avoids duplicating the scanning code. Because process_file() returns a list, we extend the match_info list with each result. Once we have processed each file, we return the complete list of results to the parent function.

def process_directory(yrules, folder_path):
    match_info = []
    for root, _, files in os.walk(folder_path):
        for entry in files:
            file_entry = os.path.join(root, entry)
            match_info += process_file(yrules, file_entry)
    return match_info

The process_file() function uses the match() method of the yrules object. The returned match object is an iterable containing any hits against the rules. From each hit, we can extract the rule name, any tags, the offset in the file, the string value of the rule, and the string value of the hit. This information, plus the file path, will form an entry in the report. Collectively, this information is useful in identifying whether the hit is a false positive or is of significance. It can also be helpful when fine-tuning YARA rules to ensure only relevant results are presented for review.

def process_file(yrules, file_path):
    match = yrules.match(file_path)
    match_info = []
    for rule_set in match:
        for hit in rule_set.strings:
            match_info.append({
                'file_name': file_path,
                'rule_name': rule_set.rule,
                'rule_tag': ",".join(rule_set.tags),
                'hit_offset': hit[0],
                'rule_string': hit[1],
                'hit_value': hit[2]
            })
    return match_info
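
The tuple indexing above relies on each string hit being an (offset, identifier, data) tuple. Newer yara-python releases (4.3.0 and later, as far as we are aware) instead return objects with identifier and instances attributes, so the following sketch normalizes both shapes. iter_hits() is a hypothetical helper, and the attribute names are an assumption that should be checked against the installed version.

```python
def iter_hits(strings):
    """Yield (offset, identifier, data) for each string hit."""
    for hit in strings:
        if isinstance(hit, tuple):
            # Tuple layout used by the recipe: (offset, identifier, data).
            yield hit[0], hit[1], hit[2]
        else:
            # Assumed newer layout: one object per rule string, holding
            # one instance per location where that string matched.
            for instance in hit.instances:
                yield instance.offset, hit.identifier, instance.matched_data
```

With this helper, the inner loop of process_file() could read `for offset, identifier, data in iter_hits(rule_set.strings):` and build the same dictionary from those three values.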

The write_stdout() function reports match information to the console if the user does not specify an output file. We iterate through each entry in the match_info list and print each column name and its value from the entry's dictionary in a colon-delimited, newline-separated format. After each entry, we print 30 equals signs to visually separate the entries from each other.

def write_stdout(columns, match_info):
    for entry in match_info:
        for col in columns:
            print("{}: {}".format(col, entry[col]))
        print("=" * 30)
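
As an illustration of the console format, the sketch below runs the same loop over a single hand-written entry. The rule name, offset, and file name are invented sample values, not results from a real scan, and the example assumes Python 3.

```python
def write_stdout(columns, match_info):
    for entry in match_info:
        for col in columns:
            print("{}: {}".format(col, entry[col]))
        print("=" * 30)


columns = ['rule_name', 'hit_value', 'hit_offset', 'file_name',
           'rule_string', 'rule_tag']

# Invented sample entry for demonstration only.
sample = [{
    'rule_name': 'ExampleRule', 'hit_value': 'import',
    'hit_offset': 0, 'file_name': 'sample.py',
    'rule_string': '$a', 'rule_tag': 'example',
}]

write_stdout(columns, sample)
# Prints one "column: value" line per column, in the order given by
# columns (starting with "rule_name: ExampleRule"), followed by a
# separator line of 30 equals signs.
```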

The write_csv() method follows the standard convention, using the DictWriter class to write the headers and all of the data into the sheet. Notice how this function is adjusted to handle CSV writing in Python 3, using the 'w' mode and newline parameter.

def write_csv(outfile, fieldnames, data):
    with open(outfile, 'w', newline="") as open_outfile:
        csvfile = csv.DictWriter(open_outfile, fieldnames)
        csvfile.writeheader()
        csvfile.writerows(data)
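
To check the report layout without touching the disk, the same DictWriter calls can be pointed at an in-memory buffer. This is a sketch using io.StringIO and an invented sample row; rows_to_csv_text() is a hypothetical helper, and only two of the recipe's columns are used to keep the example short.

```python
import csv
import io


def rows_to_csv_text(fieldnames, data):
    # newline="" mirrors the open() call in write_csv(); it leaves the
    # csv module's own line endings untouched.
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames)
    writer.writeheader()
    writer.writerows(data)
    return buffer.getvalue()


fieldnames = ['rule_name', 'file_name']
# Invented sample row for demonstration only.
report = rows_to_csv_text(
    fieldnames, [{'rule_name': 'ExampleRule', 'file_name': 'sample.py'}])
```

By default, DictWriter raises a ValueError if a row contains a key that is not in fieldnames, which is why the columns list in main() and the dictionary keys in process_file() need to agree.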

Using this code, we can provide the appropriate arguments at the command-line and generate a report of any matches. The following screenshot shows the custom rules for detecting Python files and keyloggers:

These rules are shown in the output CSV report, or console if a report is not specified, as seen here:
