Loading the dataset

First, download the dataset by following the steps in the Technical requirements section. Google Takeout (https://takeout.google.com/settings/takeout) exports Gmail data in mbox format. For this chapter, I loaded my own personal email from Google Mail. For privacy reasons, I cannot share the dataset. However, I will show you different EDA operations that you can perform to analyze several aspects of your own email behavior:

  1. Let's load the required libraries:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
Note that this analysis also uses the mailbox module. It is part of the Python standard library, so it is available in any standard Python installation and does not need to be installed with pip.
  2. When you have loaded the libraries, load the dataset:
import mailbox

mboxfile = "PATH TO DOWNLOADED MBOX FILE"
mbox = mailbox.mbox(mboxfile)
mbox

Note that you must replace the placeholder with the path to your own mbox file.

The output of the preceding code is as follows:

<mailbox.mbox at 0x7f124763f5c0>

The output indicates that the mailbox has been successfully created.
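
One caveat is worth checking here. By default, mailbox.mbox creates a new, empty file when the given path does not exist, so getting an object back does not by itself prove that your messages were read. The following is a minimal sketch of a safer way to open the file, not the method used in the rest of this chapter. The helper name open_mbox, the temporary sample file, and the example.com addresses are all made up for illustration:

```python
import mailbox
import os
import tempfile
from email.message import EmailMessage


def open_mbox(path):
    """Open an existing mbox file (hypothetical helper, not part of the chapter's code)."""
    # create=False makes mailbox.mbox raise NoSuchMailboxError for a missing
    # path instead of silently creating an empty mailbox file.
    return mailbox.mbox(path, create=False)


# Build a tiny throwaway mbox so the sketch runs without real email data.
tmpdir = tempfile.mkdtemp()
sample_path = os.path.join(tmpdir, "sample.mbox")
writer = mailbox.mbox(sample_path)
for i in range(3):
    msg = EmailMessage()
    msg["From"] = "sender@example.com"
    msg["To"] = "me@example.com"
    msg["Subject"] = f"Test message {i}"
    msg.set_content("Hello")
    writer.add(msg)
writer.flush()
writer.close()

# Reopen the file and count the messages as a quick sanity check.
sample = open_mbox(sample_path)
message_count = len(sample)
print(message_count)

# A wrong path is now reported as an error rather than an empty mailbox.
try:
    open_mbox(os.path.join(tmpdir, "missing.mbox"))
    missing_detected = False
except mailbox.NoSuchMailboxError:
    missing_detected = True
print(missing_detected)
```

With your own data, calling len(mbox) right after loading is a quick way to confirm that the number of messages looks plausible.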

  3. Next, let's see the list of available keys:
for key in mbox[0].keys():
    print(key)

The output of the preceding code is as follows:

X-GM-THRID
X-Gmail-Labels
Delivered-To
Received
X-Google-Smtp-Source
X-Received
ARC-Seal
ARC-Message-Signature
ARC-Authentication-Results
Return-Path
Received
Received-SPF
Authentication-Results
DKIM-Signature
DKIM-Signature
Subject
From
To
Reply-To
Date
MIME-Version
Content-Type
X-Mailer
X-Complaints-To
X-Feedback-ID
List-Unsubscribe
Message-ID

The preceding output shows the header keys present in the first message of the mailbox. Other messages may carry a different set of headers.
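
Notice that Received and DKIM-Signature each appear twice in the list: an email can carry several headers with the same name. Indexing a message with a repeated key returns only a single value, whereas get_all returns every occurrence. Here is a minimal sketch of the difference, assuming a small hand-written message (the header values are invented for illustration) rather than a real email from the mailbox:

```python
from collections import Counter
from email import message_from_string

# A made-up message with a repeated Received header.
raw = (
    "Received: by mail.example.com; Mon, 1 Jan 2024 10:00:00 +0000\n"
    "Received: from relay.example.org; Mon, 1 Jan 2024 09:59:58 +0000\n"
    "Subject: Hello\n"
    "From: sender@example.com\n"
    "\n"
    "Body text\n"
)
message = message_from_string(raw)

# Count how often each header name occurs and keep the repeated ones.
key_counts = Counter(message.keys())
repeated = [name for name, count in key_counts.items() if count > 1]
print(repeated)

# Indexing yields one value only; get_all returns a list with every match.
single_value = message["Received"]
all_received = message.get_all("Received")
print(len(all_received))
```

The messages returned by mbox[0] are also email message objects, so the same keys and get_all calls should work on them.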
