Analyzing Your Own Mail Data

The Enron mail data makes for great illustrations in a chapter on mail analysis, but you’ll almost certainly want to take a closer look at your own mail data. Fortunately, many popular mail clients provide an “export to mbox” option, which makes it pretty simple to get your mail data into a format that lends itself to analysis by the techniques described in this chapter. For example, in Apple Mail, you can select some number of messages, pick “Save As…” from the File menu, and then choose “Raw Message Source” as the formatting option to export the messages as an mbox file (see Figure 3-7). A little bit of searching should turn up results for how to do this in most other major clients.

Figure 3-7. Most mail clients provide an option for exporting your mail data to an mbox archive
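
Once you have an exported file, a quick sanity check is to load it and see who the most frequent senders are. The following is a minimal sketch, assuming Python 3 and its standard library mailbox module; the function name top_senders is hypothetical and not part of any library, and the path you pass in is wherever your mail client saved the export.

```python
import mailbox
from collections import Counter
from email.utils import parseaddr


def top_senders(mbox_path, n=10):
    """Return the n most frequent sender addresses in an mbox file.

    top_senders is a hypothetical helper name used for illustration.
    """
    counts = Counter()
    # create=False raises an error instead of silently creating an empty file
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for message in box:
            # parseaddr splits "Name <addr>" into (name, addr)
            _, address = parseaddr(str(message.get("From", "")))
            if address:
                counts[address.lower()] += 1
    finally:
        box.close()
    return counts.most_common(n)
```

If the counts look plausible for your inbox, the export most likely worked and the file is ready for the analyses described in this chapter.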

If you exclusively use an online mail client, you could pull your data down into a desktop mail client and export it, but you might prefer to fully automate the creation of an mbox file by pulling the data directly from the server. Just about any online mail service supports POP3 (Post Office Protocol version 3), most also support IMAP (Internet Message Access Protocol), and Python scripts for pulling down your mail aren’t hard to write. Two modules included in Python’s standard library, poplib and imaplib, provide a solid foundation, so you’re likely to run across lots of useful scripts if you do a bit of searching online. One particularly robust command-line tool that you can use to pull mail data from just about anywhere is getmail, which happens to be written in Python.

getmail is particularly easy to get up and running. To pull down your Gmail inbox data, for example, you just download and install it, then set up a getmailrc file with a few basic options. Example 3-22 demonstrates some settings for a *nix environment. Windows users would need to change the [destination] path and [options] message_log values to valid paths.

Example 3-22. Sample getmail settings for a *nix environment

[retriever]
type = SimpleIMAPSSLRetriever
server = imap.gmail.com
username = ptwobrussell
password = blarty-blar-blar

[destination]
type = Mboxrd
path = /tmp/gmail.mbox

[options]
verbose = 2
message_log = ~/.getmail/gmail.log

With a configuration in place, simply invoking getmail from a terminal does the rest. Once you have a local mbox on hand, you can analyze it using the techniques you’ve learned in this chapter:

$ getmail
getmail version 4.20.0
Copyright (C) 1998-2009 Charles Cazabon.  Licensed under the GNU GPL version 2.
SimpleIMAPSSLRetriever:ptwobrussell@imap.gmail.com:993:
  msg     1/10972 (4227 bytes) from ... delivered to Mboxrd /tmp/gmail.mbox
  msg     2/10972 (3219 bytes) from ... delivered to Mboxrd /tmp/gmail.mbox
  ...
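
As one example of a first analysis on the freshly created mbox, you might tally message volume over time. This is a rough sketch assuming Python 3; messages_per_month is a hypothetical helper name, and messages whose Date header is missing or unparseable are simply skipped rather than repaired.

```python
import mailbox
from collections import Counter
from email.utils import parsedate_to_datetime


def messages_per_month(mbox_path):
    """Map 'YYYY-MM' to the number of messages dated in that month.

    messages_per_month is a hypothetical helper name used for illustration.
    """
    counts = Counter()
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for message in box:
            try:
                sent = parsedate_to_datetime(str(message.get("Date", "")))
            except (TypeError, ValueError):
                continue  # missing or malformed Date header
            counts[sent.strftime("%Y-%m")] += 1
    finally:
        box.close()
    # Return months in chronological order
    return dict(sorted(counts.items()))
```

For the configuration in Example 3-22, the call would be messages_per_month("/tmp/gmail.mbox").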

Tapping into Your Gmail investigates using imaplib to pull down your Gmail data and analyze it as part of the exercises in Chapter 7, which focuses on Google technologies.
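
To give a flavor of what an imaplib-based approach can look like, here is a minimal sketch, assuming Python 3, that copies messages from an IMAP folder into a local mbox file. The function name copy_folder_to_mbox and its parameters are hypothetical, it is not the code from Chapter 7, and it leaves out error recovery and incremental syncing that a tool like getmail handles for you.

```python
import imaplib
import mailbox


def copy_folder_to_mbox(conn, mbox_path, folder="INBOX", limit=None):
    """Append messages from an IMAP folder to a local mbox file.

    conn is an already-authenticated imaplib.IMAP4 or IMAP4_SSL object.
    copy_folder_to_mbox is a hypothetical helper name used for illustration.
    Returns the number of messages written.
    """
    conn.select(folder, readonly=True)  # read-only: do not mark mail as seen
    status, data = conn.search(None, "ALL")
    if status != "OK":
        raise RuntimeError("IMAP search failed")
    ids = data[0].split()
    if limit is not None:
        ids = ids[:limit]

    box = mailbox.mbox(mbox_path)  # created if it does not exist yet
    box.lock()
    written = 0
    try:
        for msg_id in ids:
            status, parts = conn.fetch(msg_id, "(RFC822)")
            if status != "OK" or not parts or not isinstance(parts[0], tuple):
                continue  # skip anything that did not come back as a message
            raw = parts[0][1]
            # IMAP delivers CRLF line endings; normalize them for the mbox
            box.add(raw.replace(b"\r\n", b"\n"))
            written += 1
        box.flush()
    finally:
        box.unlock()
        box.close()
    return written


# Hypothetical usage (needs real credentials and IMAP enabled on the account):
#   conn = imaplib.IMAP4_SSL("imap.gmail.com")
#   conn.login("your_username", "your_password")
#   copy_folder_to_mbox(conn, "/tmp/gmail.mbox", limit=100)
#   conn.logout()
```

Passing the connection in, rather than creating it inside the function, keeps the login details out of the copying logic and makes it easy to try the function against a different server.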

The Graph Your (Gmail) Inbox Chrome Extension

There are several useful toolkits floating around that analyze webmail, and one of the most promising to emerge recently is the Graph Your Inbox Chrome Extension. To use this extension, you just install it, authorize it to access your mail data, run some Gmail queries, and let it take care of the rest. You can search for keywords like “pizza,” time values such as “2010,” or run more advanced queries such as “from:[email protected]” and “label:Strata”. It’s highly likely that this extension is only going to keep getting better, given that it’s new and has been so well received thus far. Figure 3-8 shows a sample screenshot.

Figure 3-8. The Graph Your Inbox Chrome Extension provides a concise summary of your Gmail activity

Tapping into Your Gmail provides an overview of how to use Python’s imaplib module to tap into your Gmail account (or any other mail account that speaks IMAP) and mine the textual information in messages. Be sure to check it out when you’re interested in moving beyond mail header information and ready to dig into text mining.
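
As a small taste of moving beyond headers, the sketch below pulls the plain-text body out of a parsed message so that it can be fed to text-mining code. It assumes Python 3 and a message object from the standard library’s email or mailbox modules; plain_text_body is a hypothetical helper name, and the choice to ignore HTML parts and text attachments is a simplification.

```python
def plain_text_body(message):
    """Return the concatenated text/plain parts of a message as a str.

    plain_text_body is a hypothetical helper name used for illustration.
    """
    chunks = []
    for part in message.walk():  # visits the message and all nested parts
        if part.get_content_type() != "text/plain":
            continue  # skips multipart containers, HTML, images, etc.
        if part.get_filename():
            continue  # skip plain-text attachments
        payload = part.get_payload(decode=True)  # bytes, transfer-decoded
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        try:
            chunks.append(payload.decode(charset, errors="replace"))
        except LookupError:  # charset name Python does not recognize
            chunks.append(payload.decode("utf-8", errors="replace"))
    return "\n".join(chunks)
```

Applied to each message while iterating over a mailbox.mbox object, this yields one string per message.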
