Exporting data to JSON, CSV, and Excel

As producers of data visualizations, we mostly work with other people's data, so importing and reading data are our major activities. Still, we also need to write or export the data we produce or process, whether for our own use or for others, now or in the future.

We will demonstrate how to use the previously mentioned Python modules to export data to various formats such as JSON, CSV, and XLSX.

For demonstration purposes, we are using the pregenerated dataset from the Importing data from fixed-width data files recipe.

Getting ready

For the Excel writing part, we will need to install the xlwt module (inside our virtual environment) by executing the following command:

$ pip install xlwt

How to do it...

We will present one code sample that covers all the formats we want to demonstrate: CSV, JSON, and XLSX. The main part of the program accepts the input and calls the appropriate function to transform the data. We will walk through each section of the code, explaining its purpose, as shown here:

  1. Import the required modules:
    import os
    import sys
    import argparse
    
    # cStringIO is a faster C implementation of StringIO;
    # fall back to the pure Python module if it is missing
    try:
        import cStringIO as StringIO
    except ImportError:
        import StringIO
    import struct
    import json
    import csv
  2. Then, define the appropriate functions for reading and writing data:
    def import_data(import_file):
        '''
        Imports data from import_file.
        Expects to find fixed-width rows.
        Sample row: 161322597 0386544351896 0042
        '''
        mask = '9s14s5s'
        data = []
        with open(import_file, 'r') as f:
            for line in f:
                # unpack line to tuple
                fields = struct.Struct(mask).unpack_from(line)
                # strip any whitespace for each field
                # pack everything in a list and add to full dataset
                data.append(list([f.strip() for f in fields]))
        return data
    
    def write_data(data, export_format):
        '''Dispatches the call to a format-specific
        transformer and returns the exported data
        as a string.
        '''
        if export_format == 'csv':
            return write_csv(data)
        elif export_format == 'json':
            return write_json(data)
        elif export_format == 'xlsx':
            return write_xlsx(data)
        else:
            raise Exception("Illegal format defined")
  3. Next, we specify a separate implementation for each data format (CSV, JSON, and XLSX):
    def write_csv(data):
        '''Transforms data into csv. Returns csv as string.
        '''
        # Using this to simulate file IO,
        # as csv can only write to files.
        f = StringIO.StringIO()
        writer = csv.writer(f)
        for row in data:
            writer.writerow(row)
        # Get the content of the file-like object
        return f.getvalue()
    
    def write_json(data):
        '''Transforms data into json. Very straightforward.
        '''
        j = json.dumps(data)
        return j
    
    def write_xlsx(data):
        '''Writes data into an Excel workbook
        and returns it as a string.
        '''
        from xlwt import Workbook
        book = Workbook()
        sheet1 = book.add_sheet("Sheet 1")
        row = 0
        for line in data:
            col = 0
            for datum in line:
                sheet1.write(row, col, datum)
                col += 1
            row += 1
            # The XLS format has a hard limit of 65536
            # rows (indices 0-65535) per worksheet.
            if row > 65535:
                print >> sys.stderr, "Hit limit of # of rows in one sheet (65536)."
                break
        # xlwt can only save to a file, so we save into
        # a file-like string buffer and return its contents
        f = StringIO.StringIO()
        book.save(f)
        return f.getvalue()
  4. Finally, we have the main code entry point, where we parse the command-line arguments to import the data from the given file and export it to the required format (see the sample invocation after the listing):
    if __name__ == '__main__':
        # parse input arguments
        parser = argparse.ArgumentParser()
        parser.add_argument("import_file", help="Path to a fixed-width data file.")
        parser.add_argument("export_format", help="Export format: json, csv, xlsx.")
        args = parser.parse_args()
    
        if args.import_file is None:
            print >> sys.stderr, "You must specify path to import from."
            sys.exit(1)
    
        if args.export_format not in ('csv','json','xlsx'):
            print >> sys.stderr, "You must provide valid export file format."
            sys.exit(1)
    
        # verify given path is accessible file
        if not os.path.isfile(args.import_file):
            print >> sys.stderr, "Given path is not a file: %s" % args.import_file
            sys.exit(1)
    
        # read from formatted fixed-width file
        data = import_data(args.import_file)
    
        # export data to specified format
        # to make this Unix-like pipe-able
        # we just print to stdout
        print write_data(data, args.export_format)
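
If we save this listing as, say, ch02_export.py (the filename is an arbitrary choice, not prescribed by the recipe), we can run it from the command line and capture the exported data by redirecting stdout, for example:

$ python ch02_export.py data.fixed csv > data.csv
$ python ch02_export.py data.fixed json > data.json
$ python ch02_export.py data.fixed xlsx > data.xlsx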

How it works...

In short, we import the fixed-width dataset (as defined in the Importing data from fixed-width data files recipe) and then export it to stdout, so we can capture it in a file or pipe it as input to another program.

We call the program from the command line, giving it two mandatory arguments: the input filename and the export format (json, csv, or xlsx).

If we successfully parse those arguments, we dispatch the reading of the input file to the import_data() function, which returns a Python data structure (a list of lists) that we can easily manipulate into the appropriate export format.

We route our request through the write_data() function, which just forwards the call to the appropriate implementation (for example, write_csv()).

For CSV, we obtain a csv.writer() instance and use it to write every row of the data we iterate over.

We then return the generated string, as we will redirect this output from our program to another program (or simply into a file).
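
As a minimal interactive illustration (assuming Python 2 with the StringIO module, as in the listing), the writer emits standard comma-separated rows terminated with \r\n, the default of the csv module's excel dialect:

    >>> import csv, StringIO
    >>> f = StringIO.StringIO()
    >>> csv.writer(f).writerow(['161322597', '0386544351896', '0042'])
    >>> f.getvalue()
    '161322597,0386544351896,0042\r\n'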

The JSON export requires very little work of our own, as the json module provides us with the dumps() method, which happily serializes our Python structure. Just as for CSV, we simply return the string and print it to stdout.
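
For example, serializing one parsed row of our sample data (a short interactive sketch) would produce:

    >>> import json
    >>> json.dumps([['161322597', '0386544351896', '0042']])
    '[["161322597", "0386544351896", "0042"]]'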

The Excel export requires more code, as we need to build a model of the Excel workbook and the worksheet(s) that will hold the data. We then follow a similar iterative approach with two loops: the outer one goes over every line in the source dataset, and the inner one iterates over every field in the given line.

After all this, we save the Book instance into a file-like stream, which we return and print to stdout, so the output can be redirected into a file or consumed by another program (such as a web service).
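
If we want to verify the generated workbook without touching the disk, one possible check (a sketch, assuming the xlrd module is installed alongside xlwt; xlrd is not used by the recipe itself) is to read the returned byte string back:

    import xlrd

    # write_data() returns the workbook as a string of bytes
    xls_bytes = write_data(data, 'xlsx')
    book = xlrd.open_workbook(file_contents=xls_bytes)
    sheet = book.sheet_by_index(0)
    # print the first row to confirm the round trip worked
    print(sheet.row_values(0))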

There's more...

This, of course, is just a small set of the possible data formats we could export to. It is fairly easy to modify the behavior; basically, two places need changes: the import and export functions. The import function needs to change if we want to import a new kind of data source, as sketched below.
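
For instance, if the source were a CSV file rather than fixed-width records, a replacement importer might look like the following sketch (import_csv_data is a hypothetical name, not part of the recipe):

    import csv

    def import_csv_data(import_file):
        '''Imports data from a CSV file instead of a
        fixed-width one. Returns a list of lists.
        '''
        data = []
        with open(import_file, 'r') as f:
            for row in csv.reader(f):
                # strip whitespace from each field, as before
                data.append([field.strip() for field in row])
        return data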

If we want to add a new export format, we first need to add a function that returns a stream of formatted data. Then, we need to update the write_data() function, adding a new elif branch that calls our new write_* function, as shown after this paragraph.
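
For example, a tab-separated (TSV) export could reuse the same StringIO trick as write_csv(); write_tsv here is a hypothetical addition, not part of the original listing:

    def write_tsv(data):
        '''Transforms data into tab-separated values.
        Returns the TSV content as a string.
        '''
        f = StringIO.StringIO()
        writer = csv.writer(f, delimiter='\t')
        for row in data:
            writer.writerow(row)
        return f.getvalue()

The dispatcher in write_data() then gains one more branch:

        elif export_format == 'tsv':
            return write_tsv(data)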

We could also make this a Python package, so that we can reuse it across projects. In that case, we would want to make the import more flexible and add some configuration features.
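
A minimal layout for such a package (the module names here are purely illustrative) could look like this:

    exporter/
        __init__.py
        importers.py    # import_data() and any new importers
        exporters.py    # write_csv(), write_json(), write_xlsx()
        cli.py          # the argparse-based entry point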
