How it works...

We begin by importing argparse for argument handling and os for path operations, followed by the win32com library from pywin32. We also import the pywintypes library to properly catch and handle pywin32 errors:

from __future__ import print_function
from argparse import ArgumentParser
import os
import win32com.client
import pywintypes

This recipe's command-line handler accepts two positional arguments, MSG_FILE and OUTPUT_DIR, which represent the path to the MSG file to process and the desired output folder, respectively. We check if the desired output folder exists and create it if it does not. Afterwards, we pass the two inputs to the main() function:

if __name__ == '__main__':
    parser = ArgumentParser(
        description=__description__,
        epilog="Developed by {} on {}".format(
            ", ".join(__authors__), __date__)
    )
    parser.add_argument("MSG_FILE", help="Path to MSG file")
    parser.add_argument("OUTPUT_DIR", help="Path to output folder")
    args = parser.parse_args()
    out_dir = args.OUTPUT_DIR
    if not os.path.exists(out_dir):
        os.makedirs(out_dir)
    main(args.MSG_FILE, args.OUTPUT_DIR)
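
The handler can be tried without a real command line by handing parse_args() a list of strings. The following is a minimal sketch rather than the recipe's own code: build_parser() and ensure_dir() are hypothetical helper names, the description text is a placeholder, and exist_ok=True assumes Python 3.2 or later.

```python
import os
import tempfile
from argparse import ArgumentParser


def build_parser():
    # Hypothetical helper: mirrors the recipe's two positional arguments.
    parser = ArgumentParser(description="MSG parser (placeholder text)")
    parser.add_argument("MSG_FILE", help="Path to MSG file")
    parser.add_argument("OUTPUT_DIR", help="Path to output folder")
    return parser


def ensure_dir(path):
    # exist_ok=True (Python 3.2+) replaces the separate os.path.exists() check.
    os.makedirs(path, exist_ok=True)
    return path


# Simulate "script.py sample.msg <tmp>/out" without touching sys.argv.
base = tempfile.mkdtemp()
args = build_parser().parse_args(["sample.msg", os.path.join(base, "out")])
ensure_dir(args.OUTPUT_DIR)
print(args.MSG_FILE, os.path.isdir(args.OUTPUT_DIR))
```

Calling ensure_dir() a second time on the same path is harmless, which makes the helper safe to reuse when several recipes share one output folder.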

In the main() function, we use the win32com library to connect to the Outlook API and request its MAPI namespace. Using the resulting mapi variable, we open the MSG file with the OpenSharedItem() method, which returns the object we pass to the other functions in this recipe: display_msg_attribs(), display_msg_recipients(), extract_msg_body(), and extract_attachments(). Let's now turn our attention to each of these functions, in turn, to see how they work:

def main(msg_file, output_dir):
    mapi = win32com.client.Dispatch(
        "Outlook.Application").GetNamespace("MAPI")
    # Use the msg_file parameter rather than the global args object
    msg = mapi.OpenSharedItem(os.path.abspath(msg_file))
    display_msg_attribs(msg)
    display_msg_recipients(msg)
    extract_msg_body(msg, output_dir)
    extract_attachments(msg, output_dir)
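
Because OpenSharedItem() depends on Outlook being installed, it can help to check the path before making the COM call. The sketch below is one possible arrangement, not part of the recipe: open_msg() is a hypothetical wrapper, and FakeNamespace is a stand-in class (not a pywin32 object) that lets the logic run on a machine without Outlook.

```python
import os
import tempfile


def open_msg(namespace, msg_file):
    # Hypothetical wrapper: `namespace` is expected to behave like the MAPI
    # namespace from the recipe, that is, to offer OpenSharedItem(path).
    full_path = os.path.abspath(msg_file)
    if not os.path.isfile(full_path):
        raise IOError("MSG file not found: {}".format(full_path))
    return namespace.OpenSharedItem(full_path)


class FakeNamespace(object):
    # Stand-in for illustration only; it records the path it was given.
    def OpenSharedItem(self, path):
        return {"opened": path}


# Create an empty placeholder file so the existence check passes.
with tempfile.NamedTemporaryFile(suffix=".msg", delete=False) as handle:
    sample_path = handle.name

result = open_msg(FakeNamespace(), sample_path)
print(result["opened"])
```

On a Windows system with Outlook, the mapi object created in main() would take the place of FakeNamespace().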

The display_msg_attribs() function displays the various attributes of a message (subject, to, BCC, size, and so on). Some of these attributes may not be present in the message we are parsing; however, we attempt to export all values regardless. The attribs list shows, in order, the attributes we try to access from the message. As we iterate through each attribute, we call the built-in getattr() function on the msg object, which returns the relevant value if present and "N/A" if not. We then print the attribute and its determined value to the console. As a word of caution, some of these values may be present but only set to a default value, such as the year 4501 for some dates:

def display_msg_attribs(msg):
    # Display Message Attributes
    attribs = [
        'Application', 'AutoForwarded', 'BCC', 'CC', 'Class',
        'ConversationID', 'ConversationTopic', 'CreationTime',
        'ExpiryTime', 'Importance', 'InternetCodePage', 'IsMarkedAsTask',
        'LastModificationTime', 'Links', 'OriginalDeliveryReportRequested',
        'ReadReceiptRequested', 'ReceivedTime', 'ReminderSet',
        'ReminderTime', 'ReplyRecipientNames', 'Saved', 'Sender',
        'SenderEmailAddress', 'SenderEmailType', 'SenderName', 'Sent',
        'SentOn', 'SentOnBehalfOfName', 'Size', 'Subject',
        'TaskCompletedDate', 'TaskDueDate', 'To', 'UnRead'
    ]
    print("\nMessage Attributes")
    print("==================")
    for entry in attribs:
        print("{}: {}".format(entry, getattr(msg, entry, 'N/A')))
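
The getattr() fallback and the year 4501 caveat can both be illustrated without Outlook. This is a rough sketch under stated assumptions: describe_attribs() is a hypothetical helper, the SimpleNamespace object merely stands in for a real message, and treating any date in the year 4501 as "not set" is our own reading of the caution above rather than a documented rule.

```python
import datetime
from types import SimpleNamespace


def describe_attribs(obj, names, unset_year=4501):
    # Return (name, value) pairs; absent attributes become "N/A" and
    # dates in the placeholder year are labelled instead of shown as real.
    rows = []
    for name in names:
        value = getattr(obj, name, "N/A")
        if getattr(value, "year", None) == unset_year:
            value = "Not set (default date)"
        rows.append((name, value))
    return rows


# Stand-in for a message object; a real one would come from Outlook.
fake_msg = SimpleNamespace(
    Subject="Quarterly report",
    ReminderTime=datetime.datetime(4501, 1, 1),
)
rows = describe_attribs(fake_msg, ["Subject", "ReminderTime", "BCC"])
for name, value in rows:
    print("{}: {}".format(name, value))
```

Returning the pairs instead of printing them directly also makes it easy to write the same values to a report file later.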

The display_msg_recipients() function iterates through the message's recipients and displays their details. The msg object provides a Recipients() method, which accepts an integer argument to access recipients by index. Using a while loop, we try to load and display values for each available recipient. For each recipient found, as in the prior function, we use the getattr() function with a list of attributes, called recipient_attrib, to extract and print the relevant values or, if they are not present, the value "N/A". Though most Python iterables use zero as the first index, the Recipients() method starts at 1. For this reason, the variable i starts at 1 and is incremented after each recipient. We continue reading until an index beyond the last recipient raises a pywin32 error, at which point we break out of the loop:

def display_msg_recipients(msg):
    # Display Recipient Information
    recipient_attrib = [
        'Address', 'AutoResponse', 'Name', 'Resolved', 'Sendable'
    ]
    i = 1
    while True:
        try:
            recipient = msg.Recipients(i)
        except pywintypes.com_error:
            break

        print("\nRecipient {}".format(i))
        print("=" * 15)
        for entry in recipient_attrib:
            print("{}: {}".format(entry, getattr(recipient, entry, 'N/A')))
        i += 1
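
An alternative to looping until an error occurs is to read the collection's Count property and walk the indexes from 1 to Count. The sketch below shows the idea with a stand-in: OneBasedCollection and iter_one_based() are hypothetical names, and the stand-in raises IndexError where a real Outlook collection would raise a COM error.

```python
class OneBasedCollection(object):
    # Stand-in that mimics 1-based access and a Count property.
    def __init__(self, items):
        self._items = list(items)

    @property
    def Count(self):
        return len(self._items)

    def __call__(self, index):
        if index < 1 or index > len(self._items):
            raise IndexError("index out of range: {}".format(index))
        return self._items[index - 1]


def iter_one_based(collection):
    # Yield (index, item) pairs for indexes 1..Count inclusive.
    for index in range(1, collection.Count + 1):
        yield index, collection(index)


recipients = OneBasedCollection(["alice@example.com", "bob@example.com"])
pairs = list(iter_one_based(recipients))
print(pairs)
```

The bounded loop avoids using an exception for normal control flow, while the recipe's try/except version has the advantage of not depending on Count being readable.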

The extract_msg_body() function is designed to extract the body content from the message. The msg object exposes the body content in a few different formats; in this recipe, we export the HTML version, using the HTMLBody property, and the plaintext version, using the Body property. Since we write the output files in binary mode, we first encode each string to bytes, which we do with the cp1252 code page. With the encoded content, we open the output files for writing in the user-specified directory, creating the respective *.body.html and *.body.txt files:

def extract_msg_body(msg, out_dir):
    # Extract HTML Data
    html_data = msg.HTMLBody.encode('cp1252')
    # args is the module-level namespace created in the __main__ block
    outfile = os.path.join(out_dir, os.path.basename(args.MSG_FILE))
    with open(outfile + ".body.html", 'wb') as html_file:
        html_file.write(html_data)
    print("Exported: {}".format(outfile + ".body.html"))

    # Extract plain text
    body_data = msg.Body.encode('cp1252')
    with open(outfile + ".body.txt", 'wb') as text_file:
        text_file.write(body_data)
    print("Exported: {}".format(outfile + ".body.txt"))
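
One thing to keep in mind is that cp1252 cannot represent every character, so a strict encode() call raises UnicodeEncodeError on, for example, CJK text. The following sketch shows one possible mitigation, not the recipe's approach: write_body() is a hypothetical helper that passes errors="replace" so unsupported characters are written as "?" instead of aborting the export.

```python
import io
import os
import tempfile


def write_body(path, text, encoding="cp1252"):
    # errors="replace" substitutes "?" for characters the code page lacks,
    # so one unusual character does not abort the export.
    data = text.encode(encoding, errors="replace")
    with io.open(path, "wb") as handle:
        handle.write(data)
    return len(data)


out_dir = tempfile.mkdtemp()
target = os.path.join(out_dir, "sample.msg.body.txt")
written = write_body(target, u"Caf\u00e9 \u4e2d")
print(written)
```

Replacement loses information, so for evidence that must be preserved exactly, passing encoding="utf-8" to the same helper may be the better choice.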

Lastly, the extract_attachments() function exports attachment data from the MSG file to the desired output directory. We again create a list, attachment_attribs, representing a series of attributes about an attachment. Similar to the recipient function, we use a while loop and the msg object's Attachments() method, which accepts an integer as an argument to select an attachment by index, to iterate through each attachment. As we saw before with the Recipients() method, the Attachments() method starts its index at 1. For this reason, the variable i will start at 1 and be incremented until no further attachments are found:

def extract_attachments(msg, out_dir):
    attachment_attribs = [
        'DisplayName', 'FileName', 'PathName', 'Position', 'Size'
    ]
    i = 1  # Attachments start at 1
    while True:
        try:
            attachment = msg.Attachments(i)
        except pywintypes.com_error:
            break

For each attachment, we print its attributes to the console. The attributes we extract and print are defined in the attachment_attribs list at the beginning of this function. After printing the available attachment details, we write the attachment's content using the SaveAsFile() method, supplying it with a string containing the output path and the desired name of the output attachment (which is obtained from the FileName attribute). After this, we are ready to move on to the next attachment, so we increment the variable i and try to access the next attachment:

        print("\nAttachment {}".format(i))
        print("=" * 15)
        for entry in attachment_attribs:
            print('{}: {}'.format(entry, getattr(attachment, entry,
                                                 "N/A")))
        outfile = os.path.join(os.path.abspath(out_dir),
                               os.path.split(args.MSG_FILE)[-1])
        if not os.path.exists(outfile):
            os.makedirs(outfile)
        outfile = os.path.join(outfile, attachment.FileName)
        attachment.SaveAsFile(outfile)
        print("Exported: {}".format(outfile))
        i += 1
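
Since the FileName value originates from the message itself, it may be worth sanitizing before it is joined to the output path. The sketch below is an assumption-laden illustration rather than part of the recipe: safe_attachment_path() and the attachment_N fallback name are hypothetical, and ntpath.basename() is used because it strips directories written with either "\" or "/" on any platform.

```python
import ntpath
import os


def safe_attachment_path(out_dir, file_name, index):
    # ntpath.basename() drops directories written with "\" or "/".
    name = ntpath.basename(file_name or "")
    if name in ("", ".", ".."):
        # Hypothetical fallback name for attachments without a usable name.
        name = "attachment_{}".format(index)
    return os.path.join(os.path.abspath(out_dir), name)


print(safe_attachment_path("out", "..\\..\\evil.exe", 1))
print(safe_attachment_path("out", "", 2))
```

The returned path could then be passed to SaveAsFile() in place of the unmodified FileName join.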

When we execute this code, the attributes of the message, its recipients, and its attachments are displayed in the console window, and several files appear in the output directory. These include the body as text and HTML, along with any discovered attachments.
