Visualizing Mail “Events” with SIMILE Timeline

There are numerous ways to visualize mail data. You could bucket messages by time and present the data as a bar chart to see the times of day when the most mail transactions happen, create a graph of connections among senders and recipients and filter it by discussion thread, load query results onto a timeline, or use any number of other techniques. This section demonstrates out-of-the-box usage of the SIMILE Timeline, an easy-to-use (yet extremely powerful) tool for visualizing event-centric data. Timeline is particularly useful for exploring mail data because it lets us view the transmission of each individual message as a unique event while also visualizing the larger discussion thread as an extended event that transpires over a longer period of time. We can also easily specify a link for each individual message, so that clicking a message in Timeline brings up its full text in Futon, CouchDB's web-based administration interface.
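As a minimal sketch of the first idea above (bucketing messages by time of day), the following counts messages per hour from RFC 2822 Date headers. The sample headers and the function name messages_per_hour are hypothetical; only the Python standard library is used, and the resulting counts could feed any bar-charting tool.

```python
from collections import Counter
from email.utils import parsedate_to_datetime


def messages_per_hour(date_headers):
    """Count messages by hour of day (0-23), in each header's own local time."""
    counts = Counter()
    for header in date_headers:
        # Returns a timezone-aware datetime that keeps the header's UTC offset,
        # so .hour is the sender's local hour
        sent = parsedate_to_datetime(header)
        counts[sent.hour] += 1
    return counts


# Hypothetical sample Date headers in RFC 2822 form
sample = [
    "Wed, 6 Feb 2002 08:20:49 -0800",
    "Tue, 22 May 2001 16:20:25 -0700",
    "Tue, 22 May 2001 16:45:00 -0700",
]

# Print a crude text "bar chart", one row per hour that has any mail
for hour, n in sorted(messages_per_hour(sample).items()):
    print("%02d:00 %s" % (hour, "#" * n))
```

Bucketing by the sender's local hour answers "when do people write mail"; converting every datetime to UTC first would instead answer "when does the server see the most traffic".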

We’ll stick to pragmatic analysis approaches and avoid building a full-blown web app to visualize the mail data, although with very little additional effort it wouldn’t be difficult to construct something more robust. We’ll simply modify the output format from Example 3-18 so that it emits JSON that’s compatible with Timeline. From there, all that’s required to load up the Timeline is pointing a simple web page at the SIMILE Event Source JSON file on your local filesystem. Sample target output is shown in Example 3-20.

Example 3-20. The data format expected by the SIMILE Timeline (mailboxes__participants_in_conversations_adapted_for_simile.py)

{
    "dateTimeFormat": "iso8601", 
    "events": [
        {    
            "start": "2002-02-06T08:20:49-08:00", 
            "description": "Message involving [email protected]", 
            "link": "http://localhost:5984/_utils/document.html?enron/bb...", 
            "durationEvent": false, 
            "title": "Enron Mentions -- 02/06/02"
        },   
        {    
            "start": "2001-05-22T16:20:25-07:00", 
            "description": "Message involving [email protected], ...", 
            "link": "http://localhost:5984/_utils/document.html?enron/24a...", 
            "durationEvent": false, 
            "title": "RE: Pricing of restriction on Enron stock"
        },   
        ...
    ]
}
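To make the shape of Example 3-20 concrete, here is a small sketch of helper functions that build the two kinds of events used in this section: an instantaneous event per message and a duration event per thread. The helper names, and the document ID in the sample link, are hypothetical; the keys mirror the ones in Example 3-20.

```python
import json


def message_event(title, start, description, link):
    """An instantaneous (non-duration) event, one per mail message."""
    return {
        "title": title,
        "start": start,  # ISO 8601 string, e.g. "2002-02-06T08:20:49-08:00"
        "durationEvent": False,
        "description": description,
        "link": link,
    }


def thread_event(title, start, end, n_messages):
    """A duration event spanning the first and last message of a thread."""
    return {
        "title": title,
        "start": start,
        "end": end,
        "durationEvent": True,
        "description": "%d messages in thread" % n_messages,
    }


def timeline_document(events):
    """Wrap a list of events in the top-level object Timeline loads."""
    return {"dateTimeFormat": "iso8601", "events": events}


# Usage with a hypothetical document ID ("example-id") in the Futon link
doc = timeline_document([
    message_event(
        "Enron Mentions -- 02/06/02",
        "2002-02-06T08:20:49-08:00",
        "Message involving ...",
        "http://localhost:5984/_utils/document.html?enron/example-id",
    ),
])
print(json.dumps(doc, indent=4))
```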

Example 3-21 demonstrates a basic augmentation to Example 3-18 that’s necessary to produce output that can be consumed by the SIMILE Timeline (shown in Figure 3-6). It creates an event for each individual message in addition to an event for each discussion thread.
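A thread’s duration event runs from its earliest to its latest message date. One detail worth checking when computing that span is that ISO 8601 strings carrying different UTC offsets do not sort chronologically as plain strings. This sketch uses two hypothetical dates and Python 3.7+’s datetime.fromisoformat to compare the two orderings.

```python
from datetime import datetime

# Two hypothetical messages from one thread, sent from different time zones
iso_dates = ["2001-05-22T16:20:25-07:00", "2001-05-22T18:00:00-04:00"]

# Sorting the strings compares characters and ignores the UTC offsets
by_string = sorted(iso_dates)

# Sorting timezone-aware datetimes compares the actual instants:
# 16:20:25-07:00 is 23:20:25 UTC, while 18:00:00-04:00 is 22:00:00 UTC
by_instant = sorted(iso_dates, key=datetime.fromisoformat)

thread_start, thread_end = by_instant[0], by_instant[-1]

print("string order :", by_string)
print("instant order:", by_instant)
print("thread span  :", thread_start, "to", thread_end)
```

Keeping dates as datetime objects until the final isoformat() call avoids the problem entirely.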

Example 3-21. Augmenting Example 3-18 to emit output that can be consumed by the SIMILE Timeline

import json
import os
import sys

from dateutil.parser import parse

# DB, threads_of_interest, and full_docs are defined earlier, in Example 3-18

# Finally, with full messages of interest on hand, parse out headers of interest
# and compute output for SIMILE Timeline

# Index the full documents by _id once instead of scanning the list per message
docs_by_id = {d['_id']: d for d in full_docs}

events = []
for thread in threads_of_interest:

    # Process each thread: create an event object for the thread as well as
    # for individual messages involved in the thread

    message_dates = []
    for message_id in thread['message_ids']:
        doc = docs_by_id[message_id]

        # Keep datetime objects (not ISO strings) so that sorting is
        # chronological even when messages carry different UTC offsets
        date = parse(doc['Date'])
        message_dates.append(date)

        # Collect the participants of this message. Some headers may be
        # missing (maybe an X-To or X-Origin header, etc. as opposed to To),
        # so fall back to empty lists instead of raising an exception
        participants = [doc['From']] if doc.get('From') else []
        for header in ('To', 'Cc', 'Bcc'):
            participants.extend(doc.get(header) or [])

        # Append an instantaneous event for each individual message in the thread

        event = {}
        event['title'] = doc['Subject']
        event['start'] = date.isoformat()
        event['durationEvent'] = False
        event['description'] = ('Message involving '
                                + ', '.join(sorted(set(participants))))
        event['link'] = 'http://localhost:5984/_utils/document.html?%s/%s' % (
            DB, doc['_id'])
        events.append(event)

    # For threads with more than one message, add a duration event that spans
    # the earliest and latest message dates

    if len(thread['message_ids']) > 1:
        message_dates.sort()
        event = {}
        event['title'] = doc['Subject']
        event['start'] = message_dates[0].isoformat()
        event['end'] = message_dates[-1].isoformat()
        event['durationEvent'] = True
        event['description'] = '%d messages in thread' % len(thread['message_ids'])
        events.append(event)  # append the thread event

if not os.path.isdir('out'):
    os.mkdir('out')

out_file = os.path.join('out', 'simile_data.json')
with open(out_file, 'w') as f:
    json.dump({'dateTimeFormat': 'iso8601', 'events': events}, f, indent=4)

print('Data file written to: %s' % out_file, file=sys.stderr)

# Point SIMILE to the data file
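Before pointing SIMILE at the data file, a quick sanity check that the JSON has the shape shown in Example 3-20 can save some head-scratching in the browser. The helper below is hypothetical and not part of SIMILE; it checks only the keys used in this section (the top-level dateTimeFormat, a start on every event, and an end on every duration event).

```python
import json
import os


def check_timeline_data(data):
    """Return a list of problems found in a Timeline-style event document."""
    problems = []
    if data.get("dateTimeFormat") != "iso8601":
        problems.append("dateTimeFormat is not 'iso8601'")
    for i, event in enumerate(data.get("events", [])):
        if "start" not in event:
            problems.append("event %d has no start" % i)
        if event.get("durationEvent") and "end" not in event:
            problems.append("duration event %d has no end" % i)
    return problems


def check_timeline_file(path):
    """Load a JSON file from disk and check it with check_timeline_data."""
    with open(path) as f:
        return check_timeline_data(json.load(f))


if __name__ == "__main__":
    # Same output path as Example 3-21; skipped if the file does not exist yet
    path = os.path.join("out", "simile_data.json")
    if os.path.exists(path):
        print(check_timeline_file(path) or "no problems found")
```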

Figure 3-6. Sample results from a query for “Raptor” visualized with SIMILE Timeline: you can scroll “infinitely” in both directions

There are lots of online demonstrations of Timeline, along with ample documentation. This simple example of plotting mail on Timeline shows only the bare minimum needed to get you up and running; it’s just the beginning of what’s possible. The “Getting Started with Timeline” tutorial is a great place to begin. Assuming you have the data to back the queries it issues, the mailboxes__participants_in_conversations_adapted_for_simile.py script is turnkey: it parses the data, dumps it into an HTML template, and automatically opens the result in your web browser. Enjoy!
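The template-and-open step described above can be sketched with the standard library’s string.Template and webbrowser modules. This is only an illustration of the mechanics: PAGE_TEMPLATE, render_page, and write_and_open are hypothetical names, and the template is a placeholder. A real page would also need the SIMILE Timeline script and setup markup from the “Getting Started with Timeline” tutorial, which is not reproduced here.

```python
import webbrowser
from pathlib import Path
from string import Template

# Hypothetical placeholder template: $data_file marks where the JSON path goes.
# A real page would add the SIMILE Timeline script tags and setup code.
PAGE_TEMPLATE = Template("""<html>
  <head><title>Mail Timeline</title></head>
  <body>
    <!-- Timeline events are loaded from: $data_file -->
  </body>
</html>
""")


def render_page(data_file):
    """Fill the template with the path of the Timeline JSON data file."""
    return PAGE_TEMPLATE.substitute(data_file=data_file)


def write_and_open(html, path, open_browser=True):
    """Write the page to disk and optionally open it in the default browser."""
    page = Path(path)
    page.write_text(html)
    if open_browser:
        # as_uri() builds a proper file:// URL on both POSIX and Windows
        webbrowser.open(page.resolve().as_uri())
    return page


# Usage (assuming the JSON file from Example 3-21 was written to out/):
# write_and_open(render_page("simile_data.json"), "out/timeline.html")
```

Writing the page into the same directory as the JSON file keeps the relative path in the template valid.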
