Chapter 7. Data and Report Generation

The needs of every e-commerce application can vary widely when it comes to reports, metrics, and data exports. Some businesses will want to capture detailed profiles of their customers and what they are purchasing in order to optimize promotions and marketing activities for their particular needs. Others will be interested in making data available internally, to provide the boss updates on how many jars of cranberry preserves sold in December last year versus this year.

In this chapter, we will discuss a toolbox of Python libraries and Django applications to assist with whatever reporting needs may arise. These topics include:

  • Serializing and exposing data
  • Tracking and improving search engine rank using sitemaps
  • Generating charts and graph-based reports
  • Exporting information via RSS and Atom feeds
  • Salesforce integration

We will be using a variety of tools, many built into Django. As in other chapters, however, we will also discuss some third-party libraries. These are all relatively stable and mature, but as with all open source technology, new versions could change their usage at any time.

Exposing data and APIs

One of the biggest elements of the web applications developed in the last decade has been the adoption of so-called Web 2.0 features. These come in a variety of flavors, but one thing that has been persistent amongst them all is a data-centric view of the world. Modern web applications work with data, usually stored in a database, in ways that are more modular and flexible than ever before. As a result, many web-based companies are choosing to share parts of their data with the world in hopes of generating "buzz", or so that interested developers might create a clever "mash-up" (a combination of third-party application software with data exposed via an API or other source).

These mash-ups take a variety of forms. Some simply allow external data to be integrated or imported into a desktop or web-based application, for example, loading Amazon's vast product catalog into a niche website on movie reviews. Others actually deploy software written in web-based languages into their own application. This software is usually provided by the service that is exposing its data, in the form of a code library or web-accessible API.

Larger web services that want to provide users with programmatic access to their data will produce code libraries written in one or more of the popular web-development languages. Increasingly, this includes Python, though not always, and typically also includes PHP, Java, or Perl. Often when an official data library exists in another language, an enterprising developer has ported the code to Python.

Increasingly, however, full-on code libraries are eschewed in favor of open, standards-based, web-accessible APIs. These first appeared on the Web in the form of remote procedure call tools such as XML-RPC. A local application, written in any programming language that supports XML-RPC, could map its function calls to functions on a server that exposed a specific, well-documented interface. XML and network transport protocols were used "under the hood" to make the connection and "call" the function.
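To make the "under the hood" step concrete, here is a minimal sketch using the xmlrpc.client module from Python's standard library (named xmlrpclib in Python 2). The remote function name add and its arguments are hypothetical, and no server is contacted: the call is only encoded to XML and decoded again, as a server would do on receipt.

```python
import xmlrpc.client

# Encode a call to a hypothetical remote function add(2, 3) as an XML-RPC payload.
request_xml = xmlrpc.client.dumps((2, 3), methodname="add")

# A server would parse the same payload back into arguments and a method name.
params, method = xmlrpc.client.loads(request_xml)

print(request_xml)
print(params, method)
```

The printed payload is ordinary XML text, which is what travels over HTTP between the two applications.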

Other similar technologies also achieved a lot of use. For example, many web services provide a Simple Object Access Protocol (SOAP) interface, which is the successor to XML-RPC and is built on a very similar foundation. Other standards, sometimes with proprietary implementations, also exist, but many new web services are now building APIs using a REST-style architecture.

REST stands for Representational State Transfer and is a lightweight and open technique for transmitting data across the Web in both server-to-server and client-to-server situations. It has become extremely popular in the Web 2.0 and open source world due to its ease of use and its reliance on standard web protocols such as HTTP, though it is not limited to any one particular protocol.

A full discussion of REST web services is beyond the scope of this book. Despite their simplicity, many complicated technical details can arise. Our implementation in this chapter will focus on a very straightforward, yet powerful design.

REST focuses on defining our data as a resource that when used with HTTP can map to a URL. Access to data in this scheme is simply a matter of specifying a URL and, if supported, any look-up, filter, or other operational parameters. A fully featured REST web service that uses the HTTP protocol will attempt to define as many operations as possible using the basic HTTP access methods. These include the usual GET and POST methods, but also PUT and DELETE, which can be used for replacement, updating, or deletion of resources.
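The mapping from HTTP methods to operations on a resource can be sketched without any web framework. The following is only an illustration under stated assumptions: the handle function, the in-memory products dictionary, and the sample prices are all hypothetical, and the status codes follow common REST conventions rather than any particular library.

```python
# In-memory stand-in for a database table, keyed by resource ID.
products = {1: {"name": "Cranberry Preserves", "price": "4.50"}}


def handle(method, resource_id=None, data=None):
    """Map an HTTP method onto an operation and return (status, body)."""
    if method == "GET":
        if resource_id is None:
            return 200, list(products.values())  # whole collection
        if resource_id in products:
            return 200, products[resource_id]    # single resource
        return 404, None
    if method == "POST":
        new_id = max(products, default=0) + 1    # create a new resource
        products[new_id] = data
        return 201, {"id": new_id}
    if method == "PUT":
        if resource_id not in products:
            return 404, None
        products[resource_id] = data             # replace an existing resource
        return 200, data
    if method == "DELETE":
        if resource_id not in products:
            return 404, None
        del products[resource_id]
        return 204, None
    return 405, None                             # method not supported
```

A call such as handle("GET", 1) plays the role of a GET request to a URL like /products/1/, while handle("POST", data={...}) corresponds to a POST to the collection URL.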

There is no standard implementation of a REST-based web service, and as such the design and use can vary widely from application to application. Still, REST is lightweight and relies on a well-known set of basic architectures, so a developer can learn a new REST-based web service in a very short period of time. This gives it a degree of advantage over competing SOAP or XML-RPC web services. Of course, there are many people who would dispute this claim. For our purposes, however, REST will work very well and we will begin by implementing a REST-based view of our data using Django.

Writing our own REST service in Django would be very straightforward, partly because URL mapping schemes are very easy to design in the urls.py file. A very quick and dirty data API could be created using the following super-simple URL patterns:

(r'^api/(?P<obj_model>[\w.]+)/$', 'project.views.api'),
(r'^api/(?P<obj_model>[\w.]+)/(?P<obj_id>\d+)/$', 'project.views.api'),

And this view:

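from django.core import serializers
from django.db.models import get_model
from django.http import Http404, HttpResponse

def api(request, obj_model, obj_id=None):
    # obj_model arrives as "app_label.ModelName", e.g. "products.Product"
    model = get_model(*obj_model.split("."))
    if model is None:
        raise Http404
    if obj_id is not None:
        # serialize() expects an iterable, so use filter() rather than get()
        results = model.objects.filter(id=obj_id)
    else:
        results = model.objects.all()
    json_data = serializers.serialize('json', results)
    return HttpResponse(json_data, mimetype='application/json')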

This approach as it is written above is not recommended, but it shows an example of one of the simplest possible data APIs. The API view returns the full set of model objects requested in JSON form. JSON is a simple, lightweight data format that resembles JavaScript syntax. It is quickly becoming the preferred method of data transfer for web applications.

To request a list of all products, for example, we only need to access the following URL path on our site: /api/products.Product/. This uses Django's app.model syntax to refer to the model we want to retrieve. The view uses get_model to obtain a reference to the Product model and then we can work with it as needed. A specific object can be retrieved by including an object ID in the URL path: /api/products.Product/123/ would retrieve the Product whose ID is 123.

After obtaining the results data, it must be encoded to JSON format. Django provides serializers for several data formats, including JSON. These are all located in the django.core.serializers module. In our case, we simply pass the results QuerySet to the serialize function, which returns our JSON data. We can limit the fields to be serialized by including a fields keyword argument in the call to serialize:

json_data = serializers.serialize('json', results, fields=('name','price'))
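For orientation, Django's JSON serializer emits a list of records, each with pk, model, and fields keys. The sketch below imitates that layout with the standard json module so the effect of the fields argument can be seen without a Django project. The serialize_rows helper and the sample row are hypothetical stand-ins, not Django code.

```python
import json


def serialize_rows(model_label, rows, fields=None):
    """Imitate the pk/model/fields layout, optionally keeping only `fields`."""
    records = []
    for row in rows:
        # The primary key is reported separately, not inside "fields".
        values = {k: v for k, v in row.items() if k != "id"}
        if fields is not None:
            values = {k: v for k, v in values.items() if k in fields}
        records.append({"pk": row["id"], "model": model_label, "fields": values})
    return json.dumps(records)


# Hypothetical product row standing in for a QuerySet result.
rows = [{"id": 123, "name": "Cranberry Sauce", "price": "3.25", "photo": "sauce.jpg"}]
print(serialize_rows("products.product", rows, fields=("name", "price")))
```

With fields=('name', 'price'), the photo value is dropped from each record while the pk and model entries remain.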

We can also use the built-in serializers to generate XML. We could modify the above view to include a format flag to allow the generation of JSON or XML:

def api(request, obj_model, obj_id=None, format='json'):
    model = get_model(*obj_model.split("."))
    if model is None:
        raise Http404
    if obj_id is not None:
        # serialize() expects an iterable, so use filter() rather than get()
        results = model.objects.filter(id=obj_id)
    else:
        results = model.objects.all()
    serialized_data = serializers.serialize(format, results)
    return HttpResponse(serialized_data,
                        mimetype='application/' + format)

Format could be passed directly on the URL or better yet, we could define two distinct URL patterns and use Django's keyword dictionary:

(r'^api/(?P<obj_model>[\w.]+)/$', 'project.views.api'),
(r'^api/(?P<obj_model>[\w.]+)/xml/$', 'project.views.api',
    {'format': 'xml'}),
(r'^api/(?P<obj_model>[\w.]+)/yaml/$', 'project.views.api',
    {'format': 'yaml'}),
(r'^api/(?P<obj_model>[\w.]+)/python/$', 'project.views.api',
    {'format': 'python'}),

By default our serializer will generate JSON data, but we have also provided alternative API URLs that support the XML, YAML, and Python formats. These are the four built-in formats supported by Django's serializers module. Note that Django's support for YAML as a serialization format requires installation of the third-party PyYAML module.
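The view above builds its response type by concatenating 'application/' with the format name, which yields values such as application/python that are not registered media types. One possible refinement, sketched here as an assumption rather than anything Django prescribes, is an explicit lookup table. The CONTENT_TYPES mapping, the content_type_for helper, and the particular types chosen for YAML and Python output are all hypothetical design choices.

```python
# Hypothetical lookup table; which types to advertise is a design choice.
CONTENT_TYPES = {
    "json": "application/json",
    "xml": "application/xml",
    "yaml": "application/x-yaml",  # assumption: a commonly used unofficial type
    "python": "text/plain",        # assumption: no media type exists for this
}


def content_type_for(format_name):
    """Return the Content-Type for a serializer format, or raise ValueError."""
    try:
        return CONTENT_TYPES[format_name]
    except KeyError:
        raise ValueError("unsupported format: %r" % format_name)
```

A lookup like this also rejects format names the API never intended to support, instead of passing them straight through to the serializer.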

Building our own API is in some ways both easy and difficult. Clearly we have a good start with the above code, but there are many problems. For example, this is exposing all of our Django model information to the world, including our User objects. This is why we do not recommend this approach. The views could be password protected or require a login (which would make programmatic access from code more difficult) or we could look for another solution.
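One way to narrow the exposure, short of adding authentication, is to accept only model names from an explicit allow-list before any lookup happens. The sketch below is framework-free, and the names EXPOSED_MODELS and parse_model_name are hypothetical. In the view, a None result would translate into raising Http404.

```python
# Hypothetical allow-list: only these "app.Model" names may be served.
EXPOSED_MODELS = {"products.Product"}


def parse_model_name(obj_model, allowed=EXPOSED_MODELS):
    """Return (app_label, model_name) if the model is exposed, else None."""
    if obj_model not in allowed:
        return None
    app_label, model_name = obj_model.split(".", 1)
    return app_label, model_name


print(parse_model_name("products.Product"))
print(parse_model_name("auth.User"))
```

Because the check is an exact match against known names, a request for auth.User or for a malformed name such as products is refused without ever reaching the database.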

Django-piston: A mini-framework for data APIs

One excellent Django community project that has emerged recently is called django-piston. Piston allows Django developers to quickly and easily build data APIs for their web applications using a REST-style interface. It supports all the serialization formats mentioned above and includes sophisticated authentication tools such as OAuth as well as HTTP Basic.

The official repository for django-piston is hosted on Bitbucket at the following URL: http://bitbucket.org/jespern/django-piston/wiki/Home.

Complete documentation on the installation and usage of Piston is available on the Bitbucket site and in the readme file.

Piston supports the full set of HTTP methods: GET, POST, PUT, and DELETE. GET is used for the retrieval of objects, POST is used for creation, PUT is used for updating, and DELETE is used for deletion. Any subset of these operations can be defined on a model-by-model basis. Piston does this by using class-based "handlers" that behave somewhat like class-based generic views.

To define a handler on our Product model, we would write something like this:

from piston.handler import BaseHandler
from coleman.products import Product


class ProductHandler(BaseHandler):
    allowed_methods = ('GET',)
    model = Product

    def read(self, request, slug):
        ...

The ProductHandler defines one operation, the GET, on our Product model. To define the behavior when a GET request is made to a Product object, we write a read method. Method names for other HTTP operations include: create for POST, update for PUT, and delete for DELETE. Each of these methods can be defined on our ProductHandler and added to the allowed_methods class variable and Piston will instantly enable them in our web-based API.

To utilize our ProductHandler, we must create the appropriate URL scheme in our urls.py file:

from piston.resource import Resource
from coleman.api.handlers import ProductHandler


product_resource = Resource(ProductHandler)
(r'^product/(?P<slug>[^/]+)/', product_resource)

Our Product objects and their data are now accessible using the URL above and the Product slug field, as in: /api/product/cranberry-sauce/.

Piston allows us to restrict the returned data by including fields and exclude attributes on our handler class:

class ProductHandler(BaseHandler):
    fields = ('name', 'slug', 'description') 
    exclude = ('id', 'photo')
    ...

Piston also makes it very easy to request our data in a different format. Simply pass the format as a GET parameter to any Piston-enabled URL and set the value to any of the formats Piston supports. For example, to get our Cranberry Sauce product information in YAML format use: /api/product/cranberry-sauce/?format=yaml.

Adding authentication to our handlers is also very simple. Django-piston includes three kinds of authentication handlers in the current release: HTTP BASIC, OAuth, and Django. The Django authentication handler is a simple wrapper around the usual Django auth module. This means users will need cookies enabled and will be required to log in to the site using their Django account before this auth handler will grant API access.

The other two handlers are more suitable for programmatic access from a script or off-site. HTTP BASIC uses the standard, web-server based authentication. In a typical Apache configuration, this involves defining user and password combinations in an htpasswd file using the htpasswd command line utility. See the web server's documentation for more details. It's also possible to configure Apache authentication against Django's auth module to support HTTP BASIC auth against the Django database. This involves adding the django.contrib.auth.handlers.modpython handler to the Apache configuration. See the Django manual for additional details.

To attach BASIC authentication to the handler for our Product model, we will include it in our urls.py file as part of the Resource object definition:

from piston.authentication import HttpBasicAuthentication

basic_auth = HttpBasicAuthentication(realm='Products API')
product_resource = Resource(handler=ProductHandler, auth=basic_auth)

Our Product URLs will now be available only to clients who have passed HTTP BASIC authentication with a user name and password.
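From the client's side, passing HTTP BASIC authentication means sending an Authorization header containing the base64-encoded user name and password. The following sketch uses only the Python standard library and does not contact any server. The host example.com, the credentials, and the build_request helper are hypothetical.

```python
import base64
import urllib.request


def build_request(url, username, password):
    """Build a request carrying an HTTP BASIC Authorization header."""
    credentials = ("%s:%s" % (username, password)).encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    request = urllib.request.Request(url)
    request.add_header("Authorization", "Basic " + token)
    return request


# Hypothetical host and credentials; the path and format parameter follow the text.
request = build_request(
    "http://example.com/api/product/cranberry-sauce/?format=yaml",
    "alice",
    "secret",
)
print(request.get_header("Authorization"))
```

Passing the request to urllib.request.urlopen would perform the actual call. Because base64 is an encoding and not encryption, BASIC credentials should only be sent over HTTPS in practice.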

As we've seen, Piston makes building a REST-based API for our Django projects extremely easy. It also uses some Django design principles we've seen earlier. For example, the authentication tools are designed to be pluggable. We can examine the HttpBasicAuthentication class in piston.authentication as a template to write our own. A custom authentication class can be plugged in to the Resource definition with just a small change to the code. Despite being easily customizable, Piston's default setup includes enough built-in functionality for the majority of data API needs.
