Writing to files

Throughout this book, we have needed to extract rows of data. You may have noticed that most of the examples collected this data in dataSet, a Python list to which each record was appended as a list of fields, as shown in the following code (collected from various examples in this book):

dataSet.append([year,month,day,game_date,team1,team1_score,team2,team2_score,game_status])
...
dataSet.append([title,price,availability,image.replace('../../../..',baseUrl),rating.replace('star-rating ','')])
...
dataSet.append([link, atype, adate, title, excerpt,",".join(categories)])
...
dataSet.append([titleLarge, title, price, stock, image, starRating.replace('star-rating ', ''), url])

With such a dataset available, we can write the information to external files as well as to a database. Before writing the dataset to a file, we need column names that describe its data. Consider the following code, where keys is a separate list of strings that name the columns, in the same order as the items of each list appended to the dataset:

keys = ['year','month','day','game_date','team1', 'team1_score', 'team2', 'team2_score', 'game_status']
......
dataSet.append([year,month,day,game_date,team1,team1_score,team2,team2_score,game_status])

Let's consider the following example, which contains colNames with the column names to be used, and dataSet with the cleaned and formatted data:

import csv
import json

colNames = ['Title','Price','Stock','Rating']
dataSet = [['Rip it Up and ...', 35.02, 'In stock', 5], ['Our Band Could Be ...', 57.25, 'In stock', 4],
           ['How Music Works', 37.32, 'In stock', 2], ['Love Is a Mix ...', 18.03, 'Out of stock', 1],
           ['Please Kill Me: The ...', 31.19, 'In stock', 4], ["Kill 'Em and Leave: ...", 45.0, 'In stock', 5],
           ['Chronicles, Vol. 1', 52.60, 'Out of stock', 2], ['This Is Your Brain ...', 38.4, 'In stock', 1],
           ['Orchestra of Exiles: The ...', 12.36, 'In stock', 3], ['No One Here Gets ...', 20.02, 'In stock', 5],
           ['Life', 31.58, 'In stock', 5], ['Old Records Never Die: ...', 55.66, 'Out of Stock', 2],
           ['Forever Rockers (The Rocker ...', 28.80, 'In stock', 3]]

Now we will write the preceding dataSet to a CSV file. By convention, the first line of a CSV file contains the column names. In this case, we will use colNames for the columns:

fileCsv = open('bookdetails.csv', 'w', newline='', encoding='utf-8')
writer = csv.writer(fileCsv)  # create the csv.writer object

writer.writerow(colNames)  # write the header row from colNames
for data in dataSet:  # iterate through dataSet and write each row to the file
    writer.writerow(data)

fileCsv.close()  # close the file handle

The preceding code will result in the bookdetails.csv file, which has the following content:

Title,Price,Stock,Rating
Rip it Up and ...,35.02,In stock,5
Our Band Could Be ...,57.25,In stock,4
...........
Life,31.58,In stock,5
Old Records Never Die: ...,55.66,Out of Stock,2
Forever Rockers (The Rocker ...,28.8,In stock,3
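
To check the result, the file can be read back with csv.reader. The following is a minimal sketch rather than part of the original example: it uses a shortened, two-row dataSet and writes to a temporary directory (the path variable is an assumption, chosen so that the sketch leaves the working directory untouched). It also uses a with block and writerows() as an alternative to the explicit loop and close() call:

```python
import csv
import os
import tempfile

colNames = ['Title', 'Price', 'Stock', 'Rating']
# Shortened dataset, for illustration only
dataSet = [['Rip it Up and ...', 35.02, 'In stock', 5],
           ['Our Band Could Be ...', 57.25, 'In stock', 4]]

# Hypothetical location: a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), 'bookdetails.csv')

# The with block closes the file automatically, even if an error occurs
with open(path, 'w', newline='', encoding='utf-8') as fileCsv:
    writer = csv.writer(fileCsv)
    writer.writerow(colNames)   # header row
    writer.writerows(dataSet)   # all data rows in one call

# Read the file back: the first row is the header, the rest are records
with open(path, newline='', encoding='utf-8') as fileCsv:
    rows = list(csv.reader(fileCsv))

header, records = rows[0], rows[1:]
print(header)      # ['Title', 'Price', 'Stock', 'Rating']
print(records[0])  # ['Rip it Up and ...', '35.02', 'In stock', '5']
```

Note that csv.reader returns every field as a string, so numeric columns such as Price and Rating need to be converted after reading.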

Similarly, let's create a JSON file from colNames and dataSet. A JSON object is similar to a Python dictionary, where each value is associated with a key; that is, the data exists as key-value pairs:

finalDataSet = list()  # empty list that will hold one dictionary per record

for data in dataSet:
    finalDataSet.append(dict(zip(colNames, data)))

print(finalDataSet)

[{'Price': 35.02, 'Stock': 'In stock', 'Title': 'Rip it Up and ...', 'Rating': 5}, {'Price': 57.25, 'Stock': 'In stock', ..........'Title': 'Old Records Never Die: ...', 'Rating': 2}, {'Price': 28.8, 'Stock': 'In stock', 'Title': 'Forever Rockers (The Rocker ...', 'Rating': 3}]

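As we can see, finalDataSet is built by appending one dictionary for each record in dataSet. The zip() function pairs each element of colNames with the element at the same position in data, and dict() converts the resulting pairs into a Python dictionary. The order of the keys in the printed output can differ between Python versions, because dictionaries preserve insertion order only from Python 3.7 onward. For example, consider the following code: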

# In the first iteration of the preceding loop, dict(zip(colNames, data)) generates:
{'Rating': 5, 'Title': 'Rip it Up and ...', 'Price': 35.02, 'Stock': 'In stock'}
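
The pairing step can be looked at on its own. The following minimal sketch uses the first record from dataSet; the names pairs, record, and short are introduced here for illustration only:

```python
colNames = ['Title', 'Price', 'Stock', 'Rating']
data = ['Rip it Up and ...', 35.02, 'In stock', 5]

# zip() yields one (column name, value) tuple per position
pairs = list(zip(colNames, data))
print(pairs)
# [('Title', 'Rip it Up and ...'), ('Price', 35.02), ('Stock', 'In stock'), ('Rating', 5)]

# dict() turns the pairs into a key-value mapping
record = dict(pairs)
print(record['Price'])  # 35.02

# zip() stops at the shorter input, so a record with a missing
# field silently loses its last column instead of raising an error
short = dict(zip(colNames, data[:3]))
print('Rating' in short)  # False
```

Because zip() truncates silently, it is worth checking that each record has as many fields as colNames before building the dictionaries.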

Now, with finalDataSet available, we can dump the data to a JSON file using the dump() function from the json module:

with open('bookdetails.json', 'w') as jsonfile:
    json.dump(finalDataSet, jsonfile)

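The preceding code will result in the bookdetails.json file. Its content, shown here indented for readability (json.dump() writes everything on a single line unless an indent argument is given), is as follows: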

[
    {
        "Price": 35.02,
        "Stock": "In stock",
        "Title": "Rip it Up and ...",
        "Rating": 5
    },
    ................
    {
        "Price": 28.8,
        "Stock": "In stock",
        "Title": "Forever Rockers (The Rocker ...",
        "Rating": 3
    }
]
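
To have the file itself written in an indented layout, and to confirm that the data survives a round trip, something like the following sketch could be used. It assumes a shortened, two-record finalDataSet and a temporary path (both chosen for illustration); indent is a standard parameter of json.dump():

```python
import json
import os
import tempfile

# Shortened dataset, for illustration only
finalDataSet = [
    {'Title': 'Rip it Up and ...', 'Price': 35.02, 'Stock': 'In stock', 'Rating': 5},
    {'Title': 'Forever Rockers (The Rocker ...', 'Price': 28.8, 'Stock': 'In stock', 'Rating': 3},
]

# Hypothetical location: a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), 'bookdetails.json')

# indent=4 writes one key per line instead of a single long line
with open(path, 'w', encoding='utf-8') as jsonfile:
    json.dump(finalDataSet, jsonfile, indent=4)

# json.load() parses the file back into Python lists and dictionaries
with open(path, encoding='utf-8') as jsonfile:
    loaded = json.load(jsonfile)

print(loaded == finalDataSet)  # True
print(loaded[0]['Title'])      # Rip it Up and ...
```

Unlike the CSV file, the JSON file keeps the distinction between numbers and strings, so Price and Rating come back as a float and an int.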

In this section, we have covered the basic steps for managing raw data. The files we have obtained can be shared and exchanged easily across independent systems, used as input data for machine learning (ML) models, and imported as data sources in applications. Furthermore, we can use Database Management Systems (DBMS) such as MySQL and PostgreSQL to store the data, and query it with Structured Query Language (SQL) through the necessary Python libraries.
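
As a rough illustration of the database route, the following sketch stores a few records and queries them with SQL. It is an assumption-laden stand-in: it uses sqlite3 from the Python standard library with an in-memory database rather than MySQL or PostgreSQL, and the table name books and the column types are hypothetical choices made for this example:

```python
import sqlite3

# Shortened dataset, for illustration only
dataSet = [['Rip it Up and ...', 35.02, 'In stock', 5],
           ['Love Is a Mix ...', 18.03, 'Out of stock', 1],
           ['Life', 31.58, 'In stock', 5]]

# In-memory SQLite database; nothing is written to disk
conn = sqlite3.connect(':memory:')

# Hypothetical schema mirroring colNames: Title, Price, Stock, Rating
conn.execute('CREATE TABLE books (title TEXT, price REAL, stock TEXT, rating INTEGER)')

# Parameterized insert: one set of ? placeholders per record
conn.executemany('INSERT INTO books VALUES (?, ?, ?, ?)', dataSet)
conn.commit()

# Query the stored records with SQL
inStock = conn.execute(
    "SELECT title, price FROM books WHERE stock = 'In stock' ORDER BY price"
).fetchall()
conn.close()

print(inStock)  # [('Life', 31.58), ('Rip it Up and ...', 35.02)]
```

A server-based DBMS would need its own driver library and connection details, but the create, insert, and select steps follow the same pattern.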
