Exporting data to JSON, CSV, and Excel

As producers of data visualizations, we mostly work with other people's data, so importing and reading data are our major activities. Still, we also need to write or export the data we produce or process, whether for our own use or for others, now or in the future.

We will demonstrate how to use the previously mentioned Python modules to export data to various formats such as JSON, CSV, and XLSX.

For demonstration purposes, we are using the pregenerated dataset from the Importing data from fixed-width data files recipe.

Getting ready

For the Excel writing part, we will need to install the xlwt module (inside our virtual environment) by executing the following command:

$ pip install xlwt

How to do it...

We will present one code sample that covers all the formats we want to demonstrate: CSV, JSON, and XLSX. The main part of the program accepts the input and calls the appropriate function to transform the data. We will walk through each section of the code, explaining its purpose, as shown here:

  1. Import the required modules:
    import os
    import sys
    import argparse
    
    # cStringIO is a faster C implementation of StringIO;
    # fall back to the pure Python module if it is missing
    try:
        import cStringIO as StringIO
    except ImportError:
        import StringIO
    import struct
    import json
    import csv
  2. Then, define the appropriate functions for reading and writing data:
    def import_data(import_file):
        '''
        Imports data from import_file.
        Expects to find fixed-width rows.
        Sample row: 161322597 0386544351896 0042
        '''
        mask = '9s14s5s'
        data = []
        with open(import_file, 'r') as f:
            for line in f:
                # unpack line to tuple
                fields = struct.Struct(mask).unpack_from(line)
                # strip any whitespace for each field
                # pack everything in a list and add to full dataset
                data.append(list([f.strip() for f in fields]))
        return data
    
    def write_data(data, export_format):
        '''Dispatches the call to a format-specific
        transformer and returns the exported data
        as a string.
        '''
        if export_format == 'csv':
            return write_csv(data)
        elif export_format == 'json':
            return write_json(data)
        elif export_format == 'xlsx':
            return write_xlsx(data)
        else:
            raise Exception("Illegal format defined")
  3. Next, we specify a separate implementation for each data format (CSV, JSON, and XLSX):
    def write_csv(data):
        '''Transforms data into csv. Returns csv as string.
        '''
        # Using this to simulate file IO,
        # as csv can only write to files.
        f = StringIO.StringIO()
        writer = csv.writer(f)
        for row in data:
            writer.writerow(row)
        # Get the content of the file-like object
        return f.getvalue()
    
    def write_json(data):
        '''Transforms data into json. Very straightforward.
        '''
        j = json.dumps(data)
        return j
    
    def write_xlsx(data):
        '''Writes data into an Excel workbook
        and returns it as a string.
        '''
        from xlwt import Workbook
        book = Workbook()
        sheet1 = book.add_sheet("Sheet 1")
        row = 0
        for line in data:
            col = 0
            for datum in line:
                sheet1.write(row, col, datum)
                col += 1
            row += 1
            # The XLS format has a hard limit of 65536
            # rows (indices 0-65535) per worksheet.
            if row > 65535:
                print >> sys.stderr, "Hit limit of # of rows in one sheet (65536)."
                break
        # xlwt can only save to a file, so we save into
        # a file-like string buffer and return its contents
        f = StringIO.StringIO()
        book.save(f)
        return f.getvalue()
  4. Finally, we have the main code entry point, where we parse the command-line arguments to import the data from the given file and export it to the required format (see the sample invocation after the listing):
    if __name__ == '__main__':
        # parse input arguments
        parser = argparse.ArgumentParser()
        parser.add_argument("import_file", help="Path to a fixed-width data file.")
        parser.add_argument("export_format", help="Export format: json, csv, xlsx.")
        args = parser.parse_args()
    
        if args.import_file is None:
            print >> sys.stderr, "You must specify path to import from."
            sys.exit(1)
    
        if args.export_format not in ('csv','json','xlsx'):
            print >> sys.stderr, "You must provide valid export file format."
            sys.exit(1)
    
        # verify given path is accessible file
        if not os.path.isfile(args.import_file):
            print >> sys.stderr, "Given path is not a file: %s" % args.import_file
            sys.exit(1)
    
        # read from formatted fixed-width file
        data = import_data(args.import_file)
    
        # export data to specified format
        # to make this Unix-like pipe-able
        # we just print to stdout
        print write_data(data, args.export_format)
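
If we save this listing as, say, ch02_export.py (the filename is an arbitrary choice, not prescribed by the recipe), we can run it from the command line and capture the exported data by redirecting stdout, for example:

$ python ch02_export.py data.fixed csv > data.csv
$ python ch02_export.py data.fixed json > data.json
$ python ch02_export.py data.fixed xlsx > data.xlsx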

How it works...

In short, we import the fixed-width dataset (as defined in the Importing data from fixed-width data files recipe) and then export it to stdout, so we can capture it in a file or pipe it as input to another program.

We call the program from the command line, giving it two mandatory arguments: the input filename and the export format (json, csv, or xlsx).

If we successfully parse those arguments, we dispatch the reading of the input file to the import_data() function, which returns a Python data structure (a list of lists) that we can easily manipulate into the appropriate export format.

We route our request through the write_data() function, which just forwards the call to the appropriate implementation (for example, write_csv()).

For CSV, we obtain a csv.writer() instance and use it to write every row of the data we iterate over.

We then return the generated string, as we will redirect this output from our program to another program (or simply into a file).
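
As a minimal interactive illustration (assuming Python 2 with the StringIO module, as in the listing), the writer emits standard comma-separated rows terminated with \r\n, the default of the csv module's excel dialect:

    >>> import csv, StringIO
    >>> f = StringIO.StringIO()
    >>> csv.writer(f).writerow(['161322597', '0386544351896', '0042'])
    >>> f.getvalue()
    '161322597,0386544351896,0042\r\n'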

The JSON export requires very little work of our own, as the json module provides us with the dumps() method, which happily serializes our Python structure. Just as for CSV, we simply return the string and print it to stdout.
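
For example, serializing one parsed row of our sample data (a short interactive sketch) would produce:

    >>> import json
    >>> json.dumps([['161322597', '0386544351896', '0042']])
    '[["161322597", "0386544351896", "0042"]]'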

The Excel export requires more code, as we need to build a model of the Excel workbook and the worksheet(s) that will hold the data. We then follow a similar iterative approach with two loops: the outer one goes over every line in the source dataset, and the inner one iterates over every field in the given line.

After all this, we save the Book instance into a file-like stream, which we return and print to stdout, so the output can be redirected into a file or consumed by another program (such as a web service).
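
If we want to verify the generated workbook without touching the disk, one possible check (a sketch, assuming the xlrd module is installed alongside xlwt; xlrd is not used by the recipe itself) is to read the returned byte string back:

    import xlrd

    # write_data() returns the workbook as a string of bytes
    xls_bytes = write_data(data, 'xlsx')
    book = xlrd.open_workbook(file_contents=xls_bytes)
    sheet = book.sheet_by_index(0)
    # print the first row to confirm the round trip worked
    print(sheet.row_values(0))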

There's more...

This, of course, is just a small set of the possible data formats we could export to. It is fairly easy to modify the behavior; basically, two places need changes: the import and export functions. The import function needs to change if we want to import a new kind of data source, as sketched below.
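
For instance, if the source were a CSV file rather than fixed-width records, a replacement importer might look like the following sketch (import_csv_data is a hypothetical name, not part of the recipe):

    import csv

    def import_csv_data(import_file):
        '''Imports data from a CSV file instead of a
        fixed-width one. Returns a list of lists.
        '''
        data = []
        with open(import_file, 'r') as f:
            for row in csv.reader(f):
                # strip whitespace from each field, as before
                data.append([field.strip() for field in row])
        return data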

If we want to add a new export format, we first need to add a function that returns a stream of formatted data. Then, we need to update the write_data() function, adding a new elif branch that calls our new write_* function, as shown after this paragraph.
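
For example, a tab-separated (TSV) export could reuse the same StringIO trick as write_csv(); write_tsv here is a hypothetical addition, not part of the original listing:

    def write_tsv(data):
        '''Transforms data into tab-separated values.
        Returns the TSV content as a string.
        '''
        f = StringIO.StringIO()
        writer = csv.writer(f, delimiter='\t')
        for row in data:
            writer.writerow(row)
        return f.getvalue()

The dispatcher in write_data() then gains one more branch:

        elif export_format == 'tsv':
            return write_tsv(data)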

We could also make this a Python package, so that we can reuse it across projects. In that case, we would want to make the import more flexible and add some configuration features.
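
A minimal layout for such a package (the module names here are purely illustrative) could look like this:

    exporter/
        __init__.py
        importers.py    # import_data() and any new importers
        exporters.py    # write_csv(), write_json(), write_xlsx()
        cli.py          # the argparse-based entry point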
