Reading and writing data

Now that the function works, we can put it to work on any address, or on a list of addresses using a loop. We could copy and paste addresses into Jupyter, but that is not a sustainable solution: most of the time, our data is stored in a database or a file. Let's learn how to read addresses from a file and store the results in another file.

CSV is a popular text-based format for tabular data, where each line represents a row and cells are separated by a delimiter: usually a comma, but sometimes a semicolon or a pipe. Cells that contain the delimiter or a newline are usually "escaped" by wrapping them in quotes. The format is not the most efficient, but it is widespread and easy to read in any text editor.
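To see this quoting in action, here is a small sketch using the standard csv module. It writes to an in-memory buffer (io.StringIO) instead of a real file, uses a semicolon as the delimiter, and the row values are made-up placeholders chosen only because they contain the delimiter and a newline.

```python
import csv
import io

# Placeholder cells: one contains the delimiter, one contains a newline
row = ['a;b', 'first line\nsecond line', 'plain']

buffer = io.StringIO()
writer = csv.writer(buffer, delimiter=';')  # semicolon-separated output
writer.writerow(row)
text = buffer.getvalue()
print(repr(text))  # the first two cells are wrapped in double quotes

# Reading it back with the same delimiter restores the original cells
parsed = next(csv.reader(io.StringIO(text), delimiter=';'))
print(parsed == row)  # True
```

Only the cells that need it get quoted; the plain cell is written as-is, which is the csv module's default "minimal" quoting.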

Python has a built-in library for dealing with .csv files, called csv. It offers two ways to parse a file: representing each row as a list or as a dictionary. We'll use the second approach:
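As a quick comparison of the two representations, the sketch below parses the same tiny in-memory sample (two rows borrowed from the cities data used later in this section) with csv.reader and with csv.DictReader.

```python
import csv
import io

sample = 'name,country\nTokyo,Japan\nJakarta,Indonesia\n'

# csv.reader: every row, including the header, is a list of strings
rows_as_lists = list(csv.reader(io.StringIO(sample)))
print(rows_as_lists[0])  # ['name', 'country']
print(rows_as_lists[1])  # ['Tokyo', 'Japan']

# csv.DictReader: the header row becomes the keys of each row's dictionary
rows_as_dicts = list(csv.DictReader(io.StringIO(sample)))
print(rows_as_dicts[0]['country'])  # Japan
```

With dictionaries, we can refer to columns by name rather than by position, which keeps the code readable if the column order ever changes.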

from csv import DictReader, DictWriter
path = './cities.csv'

# newline='' lets the csv module handle line endings (and quoted newlines) itself
with open(path, 'r', newline='') as f:
    data = list(DictReader(f))

Here, DictReader is an iterator that treats the first row of the CSV file as the header (the column names) and uses it to build a dictionary for each row: an OrderedDict in Python 3.6 and 3.7, and a regular dict (which also preserves insertion order) from Python 3.8 onward. It reads lazily from the open file, so we need to either convert it to a list (storing all the data in memory) before the file is closed, or run all our geocoding within the scope of the open file. The second approach can handle a file of any size, since only one row is held in memory at a time, but it is more complex. So, for now, we'll stick with the first, simpler approach, which is sufficient in the vast majority of cases.
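For reference, the second, streaming approach can be sketched as a generator that keeps the file open while rows are consumed. The name iter_csv is hypothetical, and the demo writes a small temporary file as a stand-in for cities.csv.

```python
import os
import tempfile
from csv import DictReader

def iter_csv(path):
    '''Yield one row (a dict) at a time; the file stays open while iterating.'''
    with open(path, 'r', newline='') as f:
        # DictReader pulls lines from f lazily, so only the current row is in memory
        yield from DictReader(f)

# Demo on a small temporary file (a stand-in for cities.csv)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'sample.csv')
    with open(path, 'w', newline='') as f:
        f.write('name,country\nTokyo,Japan\nJakarta,Indonesia\n')

    names = []
    for row in iter_csv(path):
        names.append(row['name'])  # per-row work (e.g. geocoding) would go here
    print(names)  # ['Tokyo', 'Jakarta']
```

The catch is that all the per-row work has to happen inside the loop, while the file is still open, which is exactly the extra complexity mentioned above.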

Let's wrap this code into a function and, since we're working with files anyway, write a second one for writing CSV files:

def read_csv(path):
    '''read csv and return it as a list of dictionaries, one per row'''
    with open(path, 'r', newline='') as f:
        return list(DictReader(f))

def write_csv(data, path, mode='w'):
    '''write data to csv or append to an existing one'''
    # compare against a tuple: `mode not in 'wa'` would also accept '' and 'wa'
    if mode not in ('w', 'a'):
        raise ValueError("mode should be either 'w' or 'a'")

    with open(path, mode, newline='') as f:
        # column order follows the keys of the first row
        writer = DictWriter(f, fieldnames=data[0].keys())
        if mode == 'w':
            writer.writeheader()  # only a new file gets a header row

        for row in data:
            writer.writerow(row)

The preceding write_csv function can either write a new file or append data to an existing one (assuming the existing file has the same columns), without adding the header a second time. With these two functions, we can now read and write data!
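To check the append behavior, here is a self-contained round trip. It restates a version of write_csv so the snippet runs on its own (opening the file with newline='', as the csv documentation recommends, and checking mode against a tuple), and it writes to a temporary directory rather than the working folder; the two one-row lists are sample data.

```python
import os
import tempfile
from csv import DictWriter

def write_csv(data, path, mode='w'):
    '''write data to csv or append to an existing one'''
    if mode not in ('w', 'a'):
        raise ValueError("mode should be either 'w' or 'a'")
    with open(path, mode, newline='') as f:
        writer = DictWriter(f, fieldnames=list(data[0].keys()))
        if mode == 'w':
            writer.writeheader()  # header only when creating a new file
        writer.writerows(data)

first = [{'name': 'Tokyo', 'country': 'Japan'}]
second = [{'name': 'Jakarta', 'country': 'Indonesia'}]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'out.csv')
    write_csv(first, path)             # creates the file, with a header
    write_csv(second, path, mode='a')  # appends rows, no second header
    with open(path, newline='') as f:
        lines = f.read().splitlines()

print(lines)  # ['name,country', 'Tokyo,Japan', 'Jakarta,Indonesia']
```

Note that write_csv takes the column names from data[0], so calling it with an empty list raises an IndexError.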

For testing purposes, we prepared a tiny CSV file called cities.csv, which lists the 10 largest cities in the world according to the ArchDaily website (https://www.archdaily.com/906605/the-20-largest-cities-in-the-world-of-2018), with population in millions. Here is what the header and the first two rows of the data look like:

name,population,country
Tokyo,38.05,Japan
Jakarta,32.27,Indonesia

Before we start geocoding, let's test our reading function on this sample file and check the first element of the resulting list (the one representing the first data row of the CSV file):

>>> cities = read_csv('./cities.csv')
>>> cities[0]
OrderedDict([('name', 'Tokyo'), ('population', '38.05'), ('country', 'Japan')])
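Note that population comes back as the string '38.05': the csv module does no type conversion, so every cell is text. If we need numbers, we have to convert them explicitly. A minimal sketch, assuming the population column always holds a valid decimal number (the sample inlines the rows shown above):

```python
import csv
import io

# The header and first two rows of cities.csv, inlined for a standalone demo
sample = 'name,population,country\nTokyo,38.05,Japan\nJakarta,32.27,Indonesia\n'
cities = list(csv.DictReader(io.StringIO(sample)))
print(type(cities[0]['population']))  # <class 'str'>

# Convert the population column to float; the other columns stay as text
for city in cities:
    city['population'] = float(city['population'])

largest = max(cities, key=lambda city: city['population'])['name']
print(largest)  # Tokyo
```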

Again, for the sake of testing, let's try writing the data into another file:

write_csv(cities, './my_cities.csv')

Once the operation is done, feel free to check the new file. Having read all the addresses into memory, we are now ready to geocode!
