How it works...

We start by importing libraries for argument handling, EML processing, and decoding encoded data. The email library provides the classes and methods needed to read EML files, and we will use its message_from_file() function to parse the provided EML file. The quopri library, new to this book, decodes the quoted-printable (QP) encoded values found in the HTML body and attachments. The base64 library, as one might expect, decodes any base64-encoded data:

from __future__ import print_function
from argparse import ArgumentParser, FileType
from email import message_from_file
import os
import quopri
import base64
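As a quick illustration of what the two decoders do, the following minimal sketch decodes one QP-encoded and one base64-encoded value. The sample values are made up for this example and do not come from the recipe's EML file.

```python
import base64
import quopri

# Hypothetical sample values, not taken from the recipe's EML file.
qp_sample = b"Caf=C3=A9 menu=\n attached"
b64_sample = b"SGVsbG8sIEVNTCE="

# Quoted-printable: "=C3=A9" is the UTF-8 byte pair for "é", and an "="
# at the very end of a line is a soft line break that the decoder removes.
qp_decoded = quopri.decodestring(qp_sample).decode("utf-8")

# base64: decodes back to the original bytes.
b64_decoded = base64.b64decode(b64_sample).decode("ascii")

print(qp_decoded)
print(b64_decoded)
```

Both functions return bytes, which is why a separate decode() call is needed to obtain text.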

This recipe's command-line handler accepts one positional argument, EML_FILE, which represents the path to the EML file we will process. We use the FileType class to handle the opening of the file for us:

if __name__ == '__main__':
    parser = ArgumentParser(
        description=__description__,
        epilog="Developed by {} on {}".format(
            ", ".join(__authors__), __date__)
    )
    parser.add_argument("EML_FILE",
                        help="Path to EML File", type=FileType('r'))
    args = parser.parse_args()

    main(args.EML_FILE)
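To make the effect of FileType concrete, here is a small self-contained sketch. It assumes a throwaway temporary file in place of a real EML file and passes the path to parse_args() as a list rather than reading the real command line.

```python
import os
import tempfile
from argparse import ArgumentParser, FileType

# Write a throwaway file so the example is self-contained (hypothetical data).
with tempfile.NamedTemporaryFile("w", suffix=".eml", delete=False) as tmp:
    tmp.write("Subject: demo\n\nbody\n")
    sample_path = tmp.name

parser = ArgumentParser(description="FileType demonstration")
parser.add_argument("EML_FILE", help="Path to EML File", type=FileType("r"))

# Passing a list to parse_args() stands in for the real command line.
args = parser.parse_args([sample_path])

# args.EML_FILE is an already-open file object, not a string path.
first_line = args.EML_FILE.readline()
opened_name = args.EML_FILE.name

# argparse does not close the file for us, so we do it ourselves.
args.EML_FILE.close()
os.remove(sample_path)

print(first_line.strip())
```

Because the argument arrives as an open file object, the original path is still available through its name attribute, which the recipe relies on later when naming the HTML output.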

In the main() function, we read the file-like object into the email library using the message_from_file() function. We can now use the resulting variable, emlfile, to access the headers, body content, attachments, and other payload information. Reading the email headers is simply a matter of iterating through the list of (name, value) pairs stored in the message object's _headers attribute; the public items() method returns the same pairs. To handle the body content, we check whether the message contains multiple payloads. If so, we pass each one to the designated processing function, process_payload(); otherwise, we pass the message itself:

def main(input_file):
    emlfile = message_from_file(input_file)

    # Start with the headers
    for key, value in emlfile._headers:
        print("{}: {}".format(key, value))

    # Read payload
    print(" Body ")
    if emlfile.is_multipart():
        for part in emlfile.get_payload():
            process_payload(part)
    else:
        # A single-part message is its own payload
        process_payload(emlfile)
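The same header loop can be written against the public items() method rather than the private _headers attribute. The sketch below assumes a made-up single-part message parsed with message_from_string() so that it runs without an EML file on disk.

```python
from email import message_from_string

# Hypothetical minimal message used only for this illustration.
raw = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: Quarterly report\n"
    "\n"
    "See attached.\n"
)
msg = message_from_string(raw)

# items() returns the headers as (name, value) pairs in message order.
headers = msg.items()
for key, value in headers:
    print("{}: {}".format(key, value))

# A single-part message has no sub-parts; its payload is the body text.
single_part = not msg.is_multipart()
body_text = msg.get_payload()
```

For a multipart message, get_payload() would instead return a list of sub-messages, which is what the loop in main() iterates over.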

The process_payload() function begins by extracting the MIME type of the message using the get_content_type() method. We print this value to the console and, on a new line, print a row of "=" characters of the same length to separate it from the remainder of the message.

In one line, we extract the message body content using the get_payload() method and decode the QP encoded data with the quopri.decodestring() function. We then check whether a character set is declared for the data and, if so, call the decode() method on the content with that character set. If the encoding is unknown, we first try UTF-8, which is the default when decode() is called without arguments, and fall back to Windows-1252 if that fails:

def process_payload(payload):
    print(payload.get_content_type() + "\n" + "=" * len(
        payload.get_content_type()))
    body = quopri.decodestring(payload.get_payload())
    # get_content_charset() returns the charset name from the
    # Content-Type header as a string, or None if none is declared
    charset = payload.get_content_charset()
    if charset:
        body = body.decode(charset)
    else:
        try:
            body = body.decode()
        except UnicodeDecodeError:
            body = body.decode('cp1252')
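The fallback order described above can be exercised in isolation. The following sketch uses a hypothetical helper, decode_body(), that is not part of the recipe, along with made-up byte strings.

```python
def decode_body(raw_bytes, charset=None):
    """Decode with a declared charset, else try UTF-8, then Windows-1252.

    Hypothetical helper written for this illustration only.
    """
    if charset:
        return raw_bytes.decode(charset)
    try:
        return raw_bytes.decode("utf-8")
    except UnicodeDecodeError:
        return raw_bytes.decode("cp1252")


# Valid UTF-8: decoded on the first attempt.
utf8_text = decode_body(b"caf\xc3\xa9")

# 0xE9 alone is not valid UTF-8, so the cp1252 fallback is used.
cp1252_text = decode_body(b"caf\xe9")

# A declared charset is used directly.
declared_text = decode_body(b"caf\xe9", "latin-1")

print(utf8_text, cp1252_text, declared_text)
```

Note that the Windows-1252 fallback is a best guess: it produces readable text for many Western European messages but cannot confirm that the sender actually used that encoding.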

With our decoded data, we check the content MIME type to decide how to store each part of the email. In the first condition, HTML content, identified by the text/html MIME type, is written to an HTML document named after the input file and placed in the current working directory. In the second condition, we handle binary data under the application MIME types. This data is conveyed as base64 encoded values, which we decode with the base64.b64decode() function before writing to a file in the current directory. The payload's get_filename() method gives us the attachment's original name, which we use for the output file. Note that the output file must be opened in "w" mode for the first type and "wb" mode for the second. If the MIME type is anything other than what we have covered here, we print the body to the console:

    if payload.get_content_type() == "text/html":
        outfile = os.path.basename(args.EML_FILE.name) + ".html"
        with open(outfile, 'w') as html_file:
            html_file.write(body)
    elif payload.get_content_type().startswith('application'):
        body = base64.b64decode(payload.get_payload())
        with open(payload.get_filename(), 'wb') as outfile:
            outfile.write(body)
        print("Exported: {} ".format(outfile.name))
    else:
        print(body)
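As an alternative to calling quopri and base64 by hand, the email library can reverse the transfer encoding itself when get_payload() is called with decode=True. The sketch below is not the recipe's approach; it assumes a made-up two-part message built in memory, with a hypothetical attachment named data.bin, and collects the decoded bytes in a dictionary instead of writing files.

```python
from email import message_from_string
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Build a hypothetical two-part message so the sketch is self-contained.
outer = MIMEMultipart()
outer["Subject"] = "demo"
outer.attach(MIMEText("<p>Hello</p>", "html", "utf-8"))
attachment = MIMEApplication(b"\x00\x01binary")
attachment.add_header("Content-Disposition", "attachment",
                      filename="data.bin")
outer.attach(attachment)

# Serialize and re-parse to mimic reading the message from an EML file.
msg = message_from_string(outer.as_string())

extracted = {}
for part in msg.walk():
    if part.is_multipart():
        continue  # container parts hold no data of their own
    # decode=True reverses the Content-Transfer-Encoding (base64 or QP)
    # and returns the raw bytes of the part.
    data = part.get_payload(decode=True)
    name = part.get_filename() or part.get_content_type()
    extracted[name] = data

print(sorted(extracted))
```

Using walk() also reaches parts nested inside other multipart containers, which a single loop over get_payload() does not.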

When we execute this code, we see the header information printed to the console first, followed by the various payloads. In this case, we have a text/plain part first, containing a sample message, followed by an application/vnd.ms-excel attachment that we export, and another text/plain block showing the initial message.
