Data cleansing

Let's create a CSV file that contains only the fields we need, using the following steps:

  1. Import the csv module:
import csv
  2. Create a CSV file with only the required attributes:
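# newline='' lets the csv module control line endings itself
with open('mailbox.csv', 'w', newline='') as outputfile:
    writer = csv.writer(outputfile)
    # Header row: the six attributes we keep
    writer.writerow(['subject', 'from', 'date', 'to', 'label', 'thread'])

    # One row per message in the previously loaded mbox object
    for message in mbox:
        writer.writerow([
            message['subject'],
            message['from'],
            message['date'],
            message['to'],
            message['X-Gmail-Labels'],
            message['X-GM-THRID']
        ])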
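
The same step can also be wrapped in a small function, which makes the mapping from CSV columns to email headers explicit. This is only a minimal sketch, not the book's code: the names FIELDS and write_mailbox_csv are hypothetical, and it assumes each message supports header lookup with message[name], as the messages in mbox do. A header that is missing from a message comes back as None, which csv.writer writes as an empty field.

```python
import csv

# Hypothetical mapping: CSV column name -> email header to read
FIELDS = {
    'subject': 'subject',
    'from': 'from',
    'date': 'date',
    'to': 'to',
    'label': 'X-Gmail-Labels',
    'thread': 'X-GM-THRID',
}


def write_mailbox_csv(messages, path):
    """Write one CSV row per message; return the number of rows written."""
    count = 0
    with open(path, 'w', newline='') as outputfile:
        writer = csv.writer(outputfile)
        writer.writerow(list(FIELDS))  # header row
        for message in messages:
            # Missing headers are None and end up as empty fields
            writer.writerow([message[header] for header in FIELDS.values()])
            count += 1
    return count
```

With the mbox object from the earlier step, the call would be write_mailbox_csv(mbox, 'mailbox.csv').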

The steps above produce a CSV file named mailbox.csv. From here on, we can load this CSV file instead of the mbox file; it is smaller than the original dataset.
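
As a rough illustration of loading the file back, the sketch below reads it with csv.DictReader from the standard library. The function name load_mailbox_csv is hypothetical, and the text may well use a different loader later on; the only assumption is that the file starts with the header row written above.

```python
import csv


def load_mailbox_csv(path):
    """Return the CSV rows as a list of dicts keyed by the header row."""
    with open(path, newline='') as inputfile:
        return list(csv.DictReader(inputfile))
```

For example, rows = load_mailbox_csv('mailbox.csv') gives one dictionary per message, so rows[0]['subject'] is the subject of the first message.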
