Chapter 12. Advanced Django Deployment

As with Chapter 11, “Advanced Django Programming,” this chapter consists of a handful of mostly unrelated sections on varied topics. Chapter 11 dealt with topics relating to your own application code; here, we go over topics that are a little more tangential and have to do with deploying your applications, updating the environment they run in, or modifying Django itself.

Writing Utility Scripts

Django is a Web framework, but that doesn’t mean you can’t interact with it outside of a browser. In fact, one of the great things about Django being written in Python, as opposed to a Web-specific language such as ColdFusion or PHP, is that Python is a general-purpose language that works just as well from the command line. You may have periodic or ad hoc operations to perform on the data managed by your Django application that don’t justify building a full Web interface.

Some common use cases for Django-powered utility scripts include:

  • Creating cached values or documents that you rebuild every night (or every hour)

  • Importing data into your Django models

  • Sending out scheduled e-mail notifications

  • Generating reports

  • Performing cleanup tasks (for example, deleting stale sessions)

This is an aspect of using Django where solid Python skills are especially valuable. When you write a Django utility script, you’re just writing Python with a small amount of environment setup required by Django.

The following are a few examples of Django utility scripts. We explain each one after showing the code, so you can determine what approach works best for your project.

Cronjobs for Cleanup

In SQLite databases with significant churn (old records being deleted and new ones added), and in some PostgreSQL setups as well, a periodic “vacuum” operation is useful to reclaim unused space. On dpaste.com, for example, most entries stay in the database for a month and are then purged. That works out to roughly 25 percent churn every week.

Without periodic vacuuming, the database would become gigantic. Even though SQLite claims to support database files up to 4GB in size, we’d rather not test that limit. The following is the vacuuming script dpaste.com uses. It runs nightly under the control of cron. (On Windows-based systems, you would schedule it with the Task Scheduler instead.)

import os
import sys
# Tell Django which settings to use before importing anything that reads them
os.environ['DJANGO_SETTINGS_MODULE'] = "dpaste.settings"
from django.conf import settings

def vacuum_db():
    """Run SQLite's VACUUM command to rebuild the file and reclaim free space."""
    from django.db import connection
    cursor = connection.cursor()
    cursor.execute("VACUUM")
    connection.close()

if __name__ == "__main__":
    print("Vacuuming database...")
    # With the SQLite backend, settings.DATABASE_NAME is the path to the file
    before = os.stat(settings.DATABASE_NAME).st_size
    print("Size before: %s bytes" % before)
    vacuum_db()
    after = os.stat(settings.DATABASE_NAME).st_size
    print("Size after: %s bytes" % after)
    print("Reclaimed: %s bytes" % (before - after))

At the top of this script, after the first two imports, we do some manual setup of our environment—specifically, we set the all-important DJANGO_SETTINGS_MODULE environment variable so that Django knows which project we are working with.

This script assumes both Django itself and the parent directory of your project are on your Python path. They can be symlinked from site-packages, installed as Python eggs, or included in a PYTHONPATH environment variable. If you need to set them manually inside your script, add lines such as these after the two initial import statements:

sys.path.append('/YOUR/DJANGO/CODEBASE')
sys.path.append('/YOUR/DJANGO/PROJECTS')

Substitute your own paths, of course—the first one points to the directory where the Django source code lives on our system (like all Django pioneers we are of course running it from a fresh Subversion checkout); the second adds our project directory to sys.path so all our projects can be found by the various import statements that reference them.

The key thing to remember about writing Django-based utility scripts is that in the end it’s just a Python script. As long as Python knows where to find Django and your project, and Django knows where to find your settings file, you’re all set.
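To see what VACUUM actually buys you before wiring it into Django, here is a minimal sketch that uses only Python 3’s built-in sqlite3 module. It builds a throwaway database with heavy churn and then vacuums it. The table name, row counts, and temporary path are arbitrary assumptions for the demonstration, not dpaste.com’s real schema.

```python
import os
import sqlite3
import tempfile


def make_churned_db(path, rows=2000, keep=100):
    """Create a hypothetical table, fill it, then delete most rows (heavy churn)."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE paste (id INTEGER PRIMARY KEY, body TEXT)")
    conn.executemany("INSERT INTO paste (body) VALUES (?)",
                     [("x" * 1000,) for _ in range(rows)])
    # Deleted rows leave free pages behind; the file does not shrink yet.
    conn.execute("DELETE FROM paste WHERE id > ?", (keep,))
    conn.commit()
    conn.close()


def vacuum_sqlite_file(path):
    """Run VACUUM on the SQLite file at `path`; return (before, after) sizes in bytes."""
    before = os.stat(path).st_size
    # Autocommit mode (isolation_level=None), because SQLite refuses
    # to VACUUM inside an open transaction.
    conn = sqlite3.connect(path, isolation_level=None)
    try:
        conn.execute("VACUUM")
    finally:
        conn.close()
    return before, os.stat(path).st_size


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "churn.db")
    make_churned_db(path)
    before, after = vacuum_sqlite_file(path)
    print("Size before: %s bytes" % before)
    print("Size after: %s bytes" % after)
    print("Reclaimed: %s bytes" % (before - after))
```

Inside a Django project you would still go through django.db.connection, as the script in this section does. The standalone version is just handy for experimenting.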

Data Import/Export

The command line is also a good place for tools that are used infrequently and not by end users. For example, if you periodically receive data that needs to be inserted into your database, you can write a Django utility script to handle that task.

Now, if your Django project is sitting on top of a SQL database, you may wonder why you would take the seemingly indirect route of writing a Python/Django script to handle the import when you could just use SQL.

The answer is typically that your data needs to be converted or massaged in some way before being converted to SQL. The fact is, if you’re going to import some foreign data format more than once or twice, it is less work to write a tool that works directly with the provided data format (CSV, XML, JSON, plain text, or what have you) instead of doing one-off search and replace operations in your text editor in an attempt to wrangle that data into a sequence of SQL INSERT statements.

This is an area where the “batteries included” aspect of Python—specifically the fact that it has libraries for parsing an incredibly wide variety of file formats—really pays off. For example, if you were building an e-mail archive and wanted to import a Unix-style “mbox” file, you could leverage Python’s email module in the standard library rather than writing your own clever, but inevitably either labor-intensive or flawed (or both) parser.

The following is a simple model that can be used to store e-mail messages—in fact, it is very much like the model used on purportal.com for the “spammy scam” message archive.

from django.db import models

class Message(models.Model):
    subject = models.CharField(max_length=250)
    date = models.DateField()
    body = models.TextField()

Assuming the presence of such a model and of an mbox file whose path is given in your project’s settings.MAILBOX setting, you can import mail into the model like this:

import os, mailbox, email, datetime
try:
    from email.utils import parsedate  # Python >= 2.5
except ImportError:
    from email.Utils import parsedate  # Python < 2.5

# Django needs to know which settings to use before the imports below
os.environ['DJANGO_SETTINGS_MODULE'] = "YOURPROJECT.settings"
from django.conf import settings
from YOURAPP.models import Message

mbox = open(settings.MAILBOX, 'rb')
# PortableUnixMailbox yields one email.Message object per message in the file
for message in mailbox.PortableUnixMailbox(mbox, email.message_from_file):
    raw_date = message['date']
    parsed = raw_date and parsedate(raw_date)
    if not parsed:
        continue  # skip messages with a missing or unparseable Date header
    msg = Message(
        subject=message['subject'],
        date=datetime.datetime(*parsed[:6]),
        body=message.get_payload(decode=False),
    )
    msg.save()
mbox.close()

print("Archive now contains %s messages" % Message.objects.count())
# Depending on your application, you might clear the mbox now:
# open(settings.MAILBOX, "w").write("")

As mentioned, this is only one small example of how to write scripts concerning your Django projects. Python’s standard library—to say nothing of the collection of third-party libraries available—covers an enormous amount of ground. If you plan to become a serious Django developer, it is definitely worth your time to skim the Python “stdlib” (http://docs.python.org/lib/), so you have an idea of what’s out there.
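The mailbox.PortableUnixMailbox class used in the import script exists only in Python 2. On Python 3 the equivalent is mailbox.mbox, and email.utils.parsedate_to_datetime (Python 3.3 and later) turns a Date header directly into a datetime. The following sketch pulls out the same three fields the Message model stores. It returns plain tuples so it runs without Django; in a real import you would build and save Message objects from them.

```python
import mailbox
from email.utils import parsedate_to_datetime


def read_mbox(path):
    """Yield (subject, date, body) tuples for each message in a Unix mbox file."""
    box = mailbox.mbox(path, create=False)  # fail if the file doesn't exist
    try:
        for message in box:
            raw_date = message["date"]
            if not raw_date:
                continue  # skip messages that have no Date header
            yield (
                message["subject"],
                parsedate_to_datetime(raw_date),
                message.get_payload(decode=False),
            )
    finally:
        box.close()
```

Call it as for subject, date, body in read_mbox("/path/to/archive.mbox"), where the path is a placeholder for your own file. Note that get_payload(decode=False) returns a list rather than a string for multipart messages, which the original script doesn’t handle either.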

Customizing the Django Codebase Itself

Customizing the internal code of Django is a measure of last resort. Not because it’s difficult—it’s Python, after all, and it’s a clean codebase with significant amounts of internal documentation in docstrings and comments. No, the reason we discourage you from leaping into Django internals to “fix” some problem you are having is that it’s often not worth the effort.

Django is a project under active development. Because stability is prized, keeping up with the main line or “trunk” version of Django is a pretty safe prospect. As new features get added and old bugs get fixed, you can follow the reports on code.djangoproject.com and upgrade any time you’re comfortable. However, if you’ve got your own customized version, you are effectively locking yourself out of upgrades to the trunk. Or, at best, you are setting yourself up for a great deal of work as you try to merge the new updates with your old changes. Distributed version control systems can make this easier if you must do it. (See Appendix C, “Tools for Practical Django Development,” for more on that approach.)

Finally, if you find yourself irresistibly drawn to hacking on the Django codebase itself, think about whether the change you are making for your own purposes can be generalized so it would make a useful addition to the framework for others. If you think this is true, be sure to read “Contributing to the Django Project” in Appendix F, “Getting Involved in the Django Project.”

Caching

High-traffic sites are rarely limited in their performance by how fast the Web server can send data. The bottleneck is almost always in the generation of that data; the database is not able to answer queries quickly enough, or the server’s CPU can be bogged down executing the same code over and over for every request. The answer to this problem is caching—saving a copy of the generated data, so the relatively “expensive” database or computation steps don’t have to be performed every time.

For high-traffic sites, caching is a must, no matter what back-end technology you use. Django has fairly extensive support for caching with three different levels of control depending on what works for your site architecture. It also provides a handy template tag that enables you to identify particular sections of rendered pages that should be cached.

A Basic Caching Recipe

Django’s cache framework presents the first-time user with a potentially confusing number of possible configurations. Although the needs of every site (and the capabilities of every server) are different, it’s easier to get a handle on this tool if we begin with a concrete example. As a bonus, that example happens to be a configuration suitable for a large number of sites, so it may be all you need to know about caching in Django.

Get a Baseline

The entire point of caching is improving site performance, so it makes sense to take some measurements beforehand. Every site is different, and the only way to know the effect of caching on your site is to test it.

One widely available tool for basic server benchmarking is ab, the Apache benchmarking tool. If you have Apache installed, you have ab as well. On any POSIX-based system such as Linux or Mac OS X, it should already be on your path. On a Windows-based system, it is in the bin directory of your Apache installation, for example, C:\Program Files\Apache Software Foundation\Apache2.2\bin. (For more usage information, see its manual page at http://httpd.apache.org/docs/2.2/programs/ab.html.)

The way it works is that you give it a URL and a number of requests to make, and it gives you performance statistics. Here’s the output of running ab on our example blog app from Chapter 2, “Django for the Impatient: Building a Blog.” The bottom line here, literally and figuratively, is the relative change in “requests per second.” Don’t read too much into the absolute numbers in our example, because they’re tied to the particular three-year-old laptop we used to run this test. Hopefully your server performance is better!

$ ab -n 1000 http://127.0.0.1:8000/blog/
...
Benchmarking 127.0.0.1 (be patient)
...
Finished 1000 requests
...
Time taken for tests:   27.724 seconds
...
Requests per second:    36.07 [#/sec] (mean)

So, about 36 requests per second. Now let’s turn on caching and see what kind of difference it makes.

Add the Middleware

Django’s caching features happen via a piece of middleware that is not active by default. To use it, open your settings.py file and add django.middleware.cache.CacheMiddleware to your MIDDLEWARE_CLASSES setting. In general, you want to add it at the end because certain other middleware (notably SessionMiddleware and GZipMiddleware) has the potential to interfere with the HTTP Vary header on which the caching framework depends.
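As a sketch, the relevant part of settings.py might look like the following. The other three middleware classes are assumptions standing in for whatever your project already lists; the point is only that CacheMiddleware goes at the end.

```python
# settings.py (excerpt). The first three entries are typical examples;
# keep whatever your project already has.
MIDDLEWARE_CLASSES = (
    'django.middleware.common.CommonMiddleware',
    'django.contrib.sessions.middleware.SessionMiddleware',
    'django.contrib.auth.middleware.AuthenticationMiddleware',
    # Added last, following the advice in this section, so that other
    # middleware can't interfere with the Vary header the cache relies on.
    'django.middleware.cache.CacheMiddleware',
)
```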

Set the Cache Type

The caching framework offers no fewer than four cache data storage mechanisms, or backends. To keep things simple for now, we use Django’s default cache backend, a local-memory cache called locmem. It stores cached data in RAM, which makes retrieval very fast. Though many caching solutions store the cache on disk, an in-memory cache can give great performance benefits. (If you’re skeptical, see the discussion later in this chapter of Memcached, an extremely high-performance cache originally designed to support LiveJournal.com.)

Add this line to your settings.py:

CACHE_BACKEND = "locmem://"

(The peculiar, pseudo-URL style of this setting makes more sense when you’ve seen some of the other backends, which use the URL format to encapsulate configuration arguments. Because it’s the default backend, strictly speaking we don’t need to set it unless we want something different. However, as Python lore says, “Explicit is better than implicit,” and switching backends or adding some of the configuration parameters outlined next is simpler if you have this setting in place.)

Try It Out

That’s all it takes to turn on basic, site-wide caching in Django. Now let’s see how our new, cache-enabled site performs.

$ ab -n 1000 http://127.0.0.1:8000/blog/
...
Benchmarking 127.0.0.1 (be patient)
...
Finished 1000 requests
...
Time taken for tests:   8.750 seconds
...
Requests per second:    114.29 [#/sec] (mean)

That’s more than three times faster, and all it took was two lines of code in our settings.py. Also, keep in mind that our blog app is very lightweight in terms of database queries and business logic; you can generally expect the improvement to be much greater for more complicated apps.
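The arithmetic behind “more than three times faster” is worth spelling out. The requests-per-second figure ab reports is the number of completed requests divided by the total time taken, so the speedup is the ratio of the two throughput numbers (equivalently, the inverse ratio of the two times).

```python
def requests_per_second(requests, seconds):
    """Mean throughput as ab reports it: completed requests / total test time."""
    return requests / float(seconds)


uncached = requests_per_second(1000, 27.724)
cached = requests_per_second(1000, 8.750)
print("uncached: %.2f req/s" % uncached)        # uncached: 36.07 req/s
print("cached:   %.2f req/s" % cached)          # cached:   114.29 req/s
print("speedup:  %.2fx" % (cached / uncached))  # speedup:  3.17x
```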

Caching Strategies

Though the results we got with the simplest possible cache setup are impressive, that setup isn’t suitable for all situations. We haven’t yet addressed how long cached items should live, how to cache content that is not a full Web page (for example, complex sidebars or widgets), what to do about pages that must be exempt from caching (admin pages, for example), or which arguments are available for performance tuning. Let’s look at some of these now.

Site-wide

What we enabled previously is known as the site-wide caching feature. Django simply caches the result of all requests that don’t involve GET or POST arguments. We’ve gone through the simplest possible usage, but there are a few other settings.py settings to help you tune it.

  • CACHE_MIDDLEWARE_SECONDS: The number of seconds that a cached page should be used before being replaced by a fresh copy. The default value for this setting is 600 (ten minutes).

  • CACHE_MIDDLEWARE_KEY_PREFIX: A string that is used to prefix keys to items in the cache. If you are sharing a cache across several Django sites, whether in memory, files, or a database, this ensures no key collisions occur across site boundaries. You can use any unique string for this setting; the site’s domain name or str(settings.SITE_ID) are both sensible choices.

  • CACHE_MIDDLEWARE_ANONYMOUS_ONLY: Simple URL-based caching doesn’t always play nicely with interactive Web applications, where the content at a given URL can change frequently in response to user input. When this setting is True, only requests from anonymous (not logged-in) users are cached. Even if the public side of your site doesn’t involve user-created content, you want to set it to True if you’re using the Django admin app, so that your changes (additions, deletions, edits) are instantly reflected in the pages of the admin site.
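Put together, a settings.py fragment using all three options might look like this. The specific values are illustrative assumptions, and example.com is a placeholder domain.

```python
# settings.py (excerpt): site-wide cache tuning with illustrative values
CACHE_BACKEND = "locmem://"
CACHE_MIDDLEWARE_SECONDS = 15 * 60              # keep cached pages for 15 minutes
CACHE_MIDDLEWARE_KEY_PREFIX = "example.com"     # placeholder; must be unique per site
CACHE_MIDDLEWARE_ANONYMOUS_ONLY = True          # cache only anonymous requests
```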

If Django’s page caching works for your needs, then the previous information is as much as you need to know. However, it’s not suitable for all situations. Let’s see what other, more granular options Django offers for caching and when you can take advantage of them.

The Per-view Cache

The site-wide cache assumes every part of your site should be cached for the same amount of time. However, you may have other ideas. For instance, let’s say you run a news site and track the popularity of individual stories, aggregating those statistics to generate lists of the most popular pages. A “Yesterday’s Top Stories” list can clearly be cached for 24 hours. “Today’s Top Stories,” on the other hand, changes over the course of the day. To strike a balance between keeping the content fresh and keeping the server load reasonable, we might want that page to be cached for only five minutes.

Presuming those two lists are generated by two separate views, turning on caching is as simple as applying a decorator.

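from django.http import HttpResponse
from django.views.decorators.cache import cache_page

@cache_page(24 * 60 * 60)  # 24 hours; the argument is in seconds
def top_stories_yesterday(request):
    # ... retrieve yesterday's stories and render them
    return HttpResponse("Yesterday's top stories")

@cache_page(5 * 60)  # five minutes
def top_stories_today(request):
    # ... retrieve today's stories and render them
    return HttpResponse("Today's top stories")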

The cache_page decorator takes a single argument, the number of seconds that the page should be cached. That’s it; there’s nothing else you have to do to make this work.

The per-view decorators depend on the fact that all Django views accept an HttpRequest object and return an HttpResponse object. They use the former to learn what URL was requested; the cached data is stored key-value style with the URL as the key. They use the latter to set appropriate cache-related headers on the HTTP response.
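To make the request-in, response-out mechanics concrete, here is a deliberately tiny, hypothetical decorator (simple_cache_page) that caches a view’s return value keyed on request.path. It is a sketch of the idea, not Django’s implementation: the real cache_page also sets response headers, respects Vary, and stores data in the configured backend. FakeRequest is a stand-in for HttpRequest, and the clock parameter exists only so expiry can be tested.

```python
import functools
import time

_page_cache = {}  # maps URL path -> (expiry timestamp, response)


def simple_cache_page(seconds, clock=time.time):
    """Cache a view's return value, keyed on request.path, for `seconds`."""
    def decorator(view):
        @functools.wraps(view)
        def wrapper(request, *args, **kwargs):
            key = request.path
            now = clock()
            hit = _page_cache.get(key)
            if hit is not None and hit[0] > now:
                return hit[1]  # still fresh: skip the view entirely
            response = view(request, *args, **kwargs)
            _page_cache[key] = (now + seconds, response)
            return response
        return wrapper
    return decorator


class FakeRequest(object):
    """Stand-in for HttpRequest; only .path is used here."""
    def __init__(self, path):
        self.path = path


call_count = [0]


@simple_cache_page(5 * 60)
def top_stories_today(request):
    call_count[0] += 1
    return "Today's top stories (render #%d)" % call_count[0]


request = FakeRequest("/stories/today/")
print(top_stories_today(request))  # Today's top stories (render #1)
print(top_stories_today(request))  # Today's top stories (render #1), from the cache
```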

Controlling Cache-Related Headers

Up until this point in our coverage of caching, we’ve focused on what you do on your server to determine how often cached content is regenerated. In practice, caching is a conversation between your server and the clients that connect to it (including external cache servers that you might not have control over). This conversation is shaped by special headers, called “cache-control” headers, that you can pass along in your HTTP responses.

The most basic form of additional cache control Django gives you is a “never cache” decorator.

from django.http import HttpResponse
from django.views.decorators.cache import never_cache

@never_cache
def top_stories_this_second(request):
    # ... we don't want anybody caching this
    return HttpResponse("Top stories this second")

This instructs downstream recipients of your page that it is not to be cached. As long as they abide by that standard (RFC 2616), it won’t be. The never_cache decorator is actually a wrapper around a more powerful and flexible caching-related tool that Django offers: django.views.decorators.cache.cache_control.

The cache_control decorator modifies the Cache-control header of your HttpResponse to communicate your caching policies to Web clients and downstream caches. You can pass the decorator any of six boolean settings (public, private, no_cache, no_transform, must_revalidate, proxy_revalidate) and two integer settings (max_age, s_maxage).
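The keyword arguments map directly onto Cache-Control directives: underscores become hyphens, True values appear as bare directives, and integer values are written as name=value. The helper below, build_cache_control, is a hypothetical sketch of that mapping. Django’s own django.utils.cache.patch_cache_control does the real work and also merges with any directives already present on the response.

```python
def build_cache_control(**directives):
    """Return a Cache-Control header value from keyword directives (sketch)."""
    parts = []
    for name in sorted(directives):  # sorted only to make output predictable
        value = directives[name]
        header_name = name.replace("_", "-")  # must_revalidate -> must-revalidate
        if value is True:
            parts.append(header_name)  # boolean directive, e.g. "no-cache"
        elif value is False or value is None:
            continue  # directive turned off: leave it out
        else:
            parts.append("%s=%s" % (header_name, value))  # e.g. "max-age=3600"
    return ", ".join(parts)


print(build_cache_control(must_revalidate=True, max_age=3600))
# max-age=3600, must-revalidate
```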

For example, if you want to force clients and downstream caches to “revalidate” your page (to check whether it has been modified, even if the cached version they are holding has not yet expired), you can decorate your view function like this:

from django.http import HttpResponse
from django.views.decorators.cache import cache_control

@cache_control(must_revalidate=True)
def revalidate_me(request):
    # ... build the page as usual
    return HttpResponse("Fresh content")

Most sites are unlikely to need many, if any, of the fine-grained options provided by the cache_control decorator. But if you do need them, it’s nice to have this functionality available rather than having to manually alter the headers of the HttpResponse object yourself.

Django also gives you control over the Vary HTTP header. Normally, content is cached using just the URL as the key. However, other factors may affect what content is returned for a specific URL: logged-in users, for example, may see a different page than anonymous ones, or the response may be customized based on the user’s browser type or language setting. All those factors are communicated to your server via HTTP headers in the page request. The “Vary” response header lets you specify exactly which of those headers have an effect on the content.

For example, if you are sending different content from the same URL depending on what the request’s Accept-Language header says, you can tell Django’s caching mechanism to consider that header as well.

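from django.http import HttpResponse
from django.views.decorators.vary import vary_on_headers

@vary_on_headers("Accept-Language")
def localized_view(request):
    # ... choose content based on request.META.get("HTTP_ACCEPT_LANGUAGE")
    return HttpResponse("Localized content")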

Because varying on the “Cookie” header is a common case, there’s also a simple vary_on_cookie decorator for convenience.
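Conceptually, varying on a header means folding that header’s value into the cache key, so that English and French responses for the same URL, say, are stored separately. The function below is a hypothetical sketch of that idea; Django’s actual key format is different and more involved. Header names are compared case-insensitively because HTTP treats them that way.

```python
import hashlib


def vary_cache_key(path, request_headers, vary_on):
    """Build a cache key from the URL plus the values of the varying request headers."""
    # Normalize header names to lowercase, since HTTP header names are case-insensitive.
    normalized = dict((name.lower(), value) for name, value in request_headers.items())
    values = [normalized.get(header.lower(), "") for header in vary_on]
    # Hash the header values so the key stays short and free of odd characters.
    digest = hashlib.sha256("|".join(values).encode("utf-8")).hexdigest()
    return "%s:%s" % (path, digest)


english = vary_cache_key("/news/", {"Accept-Language": "en"}, ["Accept-Language"])
french = vary_cache_key("/news/", {"Accept-Language": "fr"}, ["Accept-Language"])
print(english != french)  # True: each language gets its own cache entry
```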

The Object Cache

The previous caching options focus on caching pages—every page on your site in the case of the site-wide cache and individual pages (views) in the case of the per-view cache. These solutions are extremely simple to implement. However, in some situations you can leverage this caching infrastructure directly to store individual chunks of data.

Let’s say you’re running a busy site that shows an information box on many pages, and the contents of that box come from some expensive process, for example, processing a large file that is periodically updated. If your pages are otherwise relatively quick to generate, it makes sense to use object caching for just that expensive piece.

Django’s object cache—really just a simple key/value store in which you can assign expiration times—enables you to save and retrieve arbitrary objects, so you can focus on the ones you know to be resource-intensive. Here’s some code based on our imaginary example with no caching yet.

def stats_from_log(request, stat_name):
    logfile = open("/var/log/imaginary.log")
    stat_list = [line for line in logfile if line.startswith(stat_name)]
    # ... go on to render a template that displays stat_list

Now, although that list comprehension in line 3 might be slick, it’s not going to be particularly speedy on a large log file. What we want is to avoid assembling stat_list on every request. Our primary tools for this are the cache.get and cache.set methods from django.core.cache.

from django.core.cache import cache

def stats_from_log(request, stat_name):
    stat_list = cache.get(stat_name)
    if stat_list is None:
        # Cache miss (or expired entry): do the expensive scan...
        logfile = open("/var/log/imaginary.log")
        stat_list = [line for line in logfile if line.startswith(stat_name)]
        # ...and keep the result for 60 seconds
        cache.set(stat_name, stat_list, 60)
    # ... go on to render a template that displays stat_list

The cache.get call returns the cached value (object) for the given key, or None if there is no such key or its value has expired, at which point the item is also deleted from the cache.

The cache.set method takes a key (a string), a value (any value that Python’s pickle module can handle), and an optional time to expiration (in seconds). If you omit the expiration argument, the timeout value from the CACHE_BACKEND setting is used. See the following for details on CACHE_BACKEND.

There’s also a get_many method, which takes a list of keys and returns a dictionary mapping those keys that are still cached to their values. One final note in case you didn’t notice: the object cache does not depend on Django’s caching middleware. We merely imported cache from django.core.cache; we didn’t ask you to change any settings or add any middleware.
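If it helps to see these semantics in one place, the class below is a minimal in-process sketch of a key/value store with per-key expiry and get, set, and get_many methods. It is not Django’s locmem backend (there is no pickling, locking, or culling), and the clock argument is an assumption added so expiry is easy to test.

```python
import time

_missing = object()  # sentinel, so a cached value of None isn't mistaken for a miss


class SimpleCache(object):
    """Tiny in-process key/value store with per-key expiry (a sketch, not Django's)."""

    def __init__(self, default_timeout=300, clock=time.time):
        self._data = {}  # key -> (expiry timestamp, value)
        self.default_timeout = default_timeout
        self._clock = clock

    def set(self, key, value, timeout=None):
        if timeout is None:
            timeout = self.default_timeout  # like falling back to CACHE_BACKEND's timeout
        self._data[key] = (self._clock() + timeout, value)

    def get(self, key, default=None):
        entry = self._data.get(key)
        if entry is None:
            return default
        expires, value = entry
        if expires <= self._clock():
            del self._data[key]  # expired: drop it and report a miss
            return default
        return value

    def get_many(self, keys):
        """Return a dict containing only the keys that are still cached."""
        result = {}
        for key in keys:
            value = self.get(key, _missing)
            if value is not _missing:
                result[key] = value
        return result


stats_cache = SimpleCache(default_timeout=300)
stats_cache.set("errors", ["line 1", "line 2"], 60)
print(stats_cache.get("errors"))                 # ['line 1', 'line 2']
print(stats_cache.get_many(["errors", "hits"]))  # {'errors': ['line 1', 'line 2']}
```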

The cache Template Tag

Django provides one final caching option: the cache template tag. It provides a way to use the object cache from the template side without having to alter your view code. Although some developers do not like the idea of an optimization artifact such as caching appearing in the presentation layer, others find it expedient.

For the sake of example, let’s say we have a template that displays information on a long list of items and that the process of generating that information is somewhat resource-intensive. Let’s also say the page as a whole, outside this list, changes on every page load, so simple caching of the entire thing is of no benefit, and that the long list only needs to be updated every five minutes at most. Because the “expensiveness” of the list output is a combination of our display loop and the expensive method call inside the loop, there is not a single point of attack in our view or model code where we can solve this. With the cache template tag, though, we can apply caching right where we need it.

{% load cache %}
... Various uncached parts of the page ...
{% cache 300 list_of_stuff %}
    {% for item in really_long_list_of_items %}
        {{ item.do_expensive_rendering_step }}
    {% endfor %}
{% endcache %}
... Other uncached parts ...

The entire output of the for loop is cached. The cache tag takes two arguments: the length of time, in seconds, that the content should be cached, and a cache key for the content.

In certain cases, a static key for the content isn’t sufficient. For example, if your site is localized and the rendered data is specific to the current user’s language preference, you want the cache key to reflect that fact. Luckily, the cache tag has an optional third parameter designed for this sort of situation. This parameter is the name of a template variable to be combined with the static key name (list_of_stuff in the previous example) to create the key.

To accommodate the fact that the contents of list_of_stuff are different for each language, your cache tag can look like this:

{% cache 300 list_of_stuff LANGUAGE_CODE %}

Note

This last example assumes you are passing RequestContext to your templates, which adds extra variables to your template context based on your context processor settings. The django.core.context_processors.i18n context processor is activated by default and provides the LANGUAGE_CODE variable. See Chapter 6, “Templates and Form Processing,” for more on context processors.
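The key-combining idea is easy to sketch outside the template language. The hypothetical cached_fragment function below joins the static fragment name with each varying value (such as a language code) and calls the expensive render step only on a miss. Django’s real tag builds its keys differently and also honors the timeout, which this sketch leaves out.

```python
def cached_fragment(store, fragment_name, vary_on, render):
    """Return a rendered fragment, calling render() only on a cache miss.

    `store` is any dict-like object; expiry is omitted to keep the sketch short.
    """
    # Static name plus each varying value, e.g. "list_of_stuff:en"
    key = ":".join([fragment_name] + [str(v) for v in vary_on])
    if key not in store:
        store[key] = render()
    return store[key]


store = {}
print(cached_fragment(store, "list_of_stuff", ["en"], lambda: "<ul>English</ul>"))
print(cached_fragment(store, "list_of_stuff", ["fr"], lambda: "<ul>French</ul>"))
print(sorted(store))  # ['list_of_stuff:en', 'list_of_stuff:fr']
```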

Caching Backend Types

Earlier in this chapter, you were introduced to the “locmem” cache type. Here is the full list:

  • dummy: For development only; actually performs no caching, but enables you to leave your other cache settings intact, so they work correctly with the cache on your live site (which uses one of the following nondevelopment backend types).

  • locmem: A reliable in-memory cache that is multiprocess safe. This is the default.

  • file: A filesystem cache.

  • db: A database cache (requires creating a special table in your database).

  • memcached: A high performance, distributed, in-memory cache; the most powerful option.

The CACHE_BACKEND setting takes a URL-style argument, beginning with the cache type followed by a colon and two slashes (three in the case of file). The development backends, dummy and locmem, take no further arguments. Configuration of the file, db, and memcached backends is described next.

The CACHE_BACKEND setting also takes three optional arguments.

  • max_entries: The maximum number of unique entries the cache stores; the default is 300. Remember, it’s likely that a relatively small number of items account for the bulk of the load of the server, so the cache doesn’t have to store everything to make an improvement. And because of the way expiry works, the cache tends to be dominated by frequently used items. If you have very little RAM or very large objects in the cache, reduce this value; if you have lots of RAM or are storing tiny objects, you can increase it.

  • cull_percentage: Poorly named, this argument is not a percentage; it specifies what fraction of the entries in the cache is removed when the max_entries limit is reached. It defaults to 3, meaning roughly 1/3 of the cache’s entries are deleted each time the cache becomes full.

  • timeout: The length of time cached content should live, in seconds; the default is 300 (five minutes). This number is used not only in determining when something should be deleted from the cache, but also in creating the various headers that tell Web clients about the cache-ability of the content you are sending.

These arguments are specified URL-argument style, such as this:

CACHE_BACKEND = "locmem://?max_entries=1000&cull_percentage=4&timeout=60"

That tells Django to use the local-memory cache, to keep 1000 entries, remove 1/4 of them when the cache becomes full, and set the expiry of cached items to 60 seconds after their creation time.
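Because these settings use ordinary URL syntax, Python’s standard library can pull them apart. The sketch below (Python 3, using urllib.parse) illustrates the format; it is not the parser Django itself uses. It also works out the “remove 1/4” figure from the example setting.

```python
from urllib.parse import parse_qs, urlsplit


def parse_cache_backend(uri):
    """Split a CACHE_BACKEND-style URI into (scheme, location, params)."""
    parts = urlsplit(uri)
    # parse_qs returns a list of values per argument; keep the first of each
    params = dict((name, values[0]) for name, values in parse_qs(parts.query).items())
    return parts.scheme, parts.netloc + parts.path, params


scheme, location, params = parse_cache_backend(
    "locmem://?max_entries=1000&cull_percentage=4&timeout=60")
max_entries = int(params["max_entries"])
cull = int(params["cull_percentage"])
print(scheme)                                                   # locmem
print("entries removed per cull: %d" % (max_entries // cull))  # entries removed per cull: 250
print("timeout: %s seconds" % params["timeout"])                # timeout: 60 seconds
```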

File

All the file backend requires is a directory that is writable by the Web server process. Remember to use three slashes after the colon: the first two mark the end of the URL’s “scheme” portion, while the third indicates that the path is absolute (that is, it starts at the root of the filesystem). As with other file settings in Django, use forward slashes here, even on Windows.

CACHE_BACKEND = "file:///var/cache/django/mysite"

Of course, on a Windows-based system, it looks more like this:

CACHE_BACKEND = "file:///C:/py/django/mysite/cache"

Database

To use the database cache backend, you need to make sure you have the cache table set up in your database. The command to do this is

$ python manage.py createcachetable cache

The last argument is the table name; we recommend simply calling it cache, as we have here, but you can call it whatever you like. Once you’ve set up the table, your CACHE_BACKEND setting becomes

CACHE_BACKEND = "db://cache/"

This is a very simple table with only three columns: cache_key (the table’s primary key), value (the actual data being cached), and expires (a datetime field; Django sets an index on this column for speed).
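To get a feel for what the database backend does with that table, here is a minimal sketch against an in-memory SQLite database using Python’s sqlite3 and pickle modules. The column names follow the description. Storing the expiry as a Unix timestamp and the value as a pickled BLOB are simplifying assumptions, not Django’s exact schema.

```python
import pickle
import sqlite3
import time


def create_cache_table(conn):
    # Three columns, as described: key (primary key), cached data, expiry.
    conn.execute("CREATE TABLE cache (cache_key TEXT PRIMARY KEY, "
                 "value BLOB NOT NULL, expires REAL NOT NULL)")
    # Index the expiry column so finding stale rows stays fast.
    conn.execute("CREATE INDEX cache_expires ON cache (expires)")


def db_cache_set(conn, key, value, timeout, now=None):
    """Store a pickled value that expires `timeout` seconds from now."""
    now = time.time() if now is None else now
    conn.execute("INSERT OR REPLACE INTO cache (cache_key, value, expires) "
                 "VALUES (?, ?, ?)", (key, pickle.dumps(value), now + timeout))


def db_cache_get(conn, key, now=None):
    """Return the cached value, or None if it is missing or expired."""
    now = time.time() if now is None else now
    row = conn.execute("SELECT value, expires FROM cache WHERE cache_key = ?",
                       (key,)).fetchone()
    if row is None or row[1] <= now:
        return None
    return pickle.loads(row[0])


conn = sqlite3.connect(":memory:")
create_cache_table(conn)
db_cache_set(conn, "stats:errors", ["line 1", "line 2"], 60)
print(db_cache_get(conn, "stats:errors"))  # ['line 1', 'line 2']
```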

Memcached

Memcached is the most powerful caching option that Django provides. Not surprisingly, it is also more complicated to set up than the others. But if you need it, it’s worth it. It was originally created at Livejournal.com to ease the load that 20 million page views per day were putting on their database servers. It has since been adopted at Wikipedia.org, Fotolog.com, Slashdot.org, and other busy sites. Memcached’s home page is located at http://danga.com/memcached.

The major advantage Memcached offers over the other options listed here is easy distribution across multiple servers. Memcached is a “giant hash table in the sky”; you use it like a key-value mapping such as a Python dictionary, but it transparently spreads the data across as many servers as you give it.
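The simplest way a client can spread keys across servers is to hash each key and take the result modulo the number of servers, so that every client using the same server list picks the same server for a given key. The sketch below uses zlib.crc32 for that. Treat it as an illustration only, since the hash function and selection details of real clients such as python-memcached may differ. The two addresses reuse the example IPs from this section.

```python
import zlib

SERVERS = ["10.0.1.1:11211", "10.0.5.5:11211"]  # example addresses from the text


def server_for_key(key, servers=SERVERS):
    """Pick a server for `key` by hashing it (simple modulo scheme)."""
    # Mask to an unsigned 32-bit value so the result is the same on every platform.
    index = (zlib.crc32(key.encode("utf-8")) & 0xffffffff) % len(servers)
    return servers[index]


for key in ["stats:errors", "stats:hits", "user:42"]:
    print(key, "->", server_for_key(key))
```

The weakness of plain modulo hashing is that adding or removing a server remaps most keys, which for a cache means a burst of misses rather than lost data.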

Even though Memcached is much more heavy-duty than the other caching options presented here, it’s still just a cache, and a memory-based one at that. It’s not an object database. One Memcached FAQ answers the questions “How is memcached redundant?”, “How does memcached handle failover?”, and “How can you dump data from or load data into memcached?” with “It’s not, it doesn’t, and you don’t!” Your reliable, persistent store of data is your database; Memcached just makes it fast. (For a great deal of fascinating detail about the creation and architecture of Memcached, see this article at http://www.linuxjournal.com/article/7451.)

You need two things to run Memcached: the software itself and the Python bindings that Django uses to talk to it. You should be able to find a package for your Linux distribution easily, or check MacPorts (formerly DarwinPorts) on Mac OS X. A Windows build of memcached can be found at http://splinedancer.com/memcached-win32.

Next, on the server where your Django app is running, you need to give Python the capability to talk to memcached. You can do this either with the pure-Python client python-memcached (http://tummy.com/Community/software/python-memcached) or a faster version called cmemcache that relies on a C library (http://gijsbert.org/cmemcache/). python-memcached is also available via Easy Install for mindless installation and setup.

Set up your server so it automatically starts the memcached daemon on bootup. The daemon has no configuration file. The following line tells memcached to start up in daemon mode, using 2GB of RAM, listening on IP address 10.0.1.1:

$ memcached -d -m 2048 -l 10.0.1.1

If you’re curious about the full spate of command line options for memcached, check its manual page or other documentation. On POSIX-based systems, you put this command in the operating system startup scripts, while on Windows-based systems, you have to set it up as a service.

Now that you have your memcached daemon running, tell Django to use it via the CACHE_BACKEND setting.

CACHE_BACKEND = "memcached://10.0.1.1:11211"

Django requires us to specify a port; by default, Memcached uses port 11211, and because we didn’t specify a port on our previous command line, that’s the port our Memcached server is listening on. If you’re using multiple servers, separate them by semicolons.

CACHE_BACKEND = "memcached://10.0.1.1:11211;10.0.5.5:11211"
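Django parses CACHE_BACKEND itself, so you never need to do this by hand. The hypothetical `memcached_servers` helper below only mimics the format, to show how the semicolon-separated list breaks down into individual host:port entries; it is a sketch, not Django’s own parsing code.

```python
def memcached_servers(cache_backend):
    """Split a memcached:// CACHE_BACKEND string into 'host:port' entries."""
    scheme, sep, rest = cache_backend.partition("://")
    if scheme != "memcached" or not sep:
        raise ValueError("not a memcached:// backend: %r" % cache_backend)
    # In this sketch, everything after a '/' is simply ignored.
    location = rest.split("/", 1)[0]
    # Multiple servers are separated by semicolons.
    return [server for server in location.split(";") if server]


if __name__ == "__main__":
    print(memcached_servers("memcached://10.0.1.1:11211;10.0.5.5:11211"))
```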

Finally, although Memcached takes a bit more setup than the other backends, it is still just that—a backend—and thus it behaves identically to the rest once it’s installed properly.

Testing Django Applications

It has become an uncontested point that having an automated test suite for your application is a good thing. This is especially true in dynamically typed languages, such as Python, which don’t offer the safety net of compile-time type checking.

Note

This chapter presumes you have already caught the testing religion and focuses on the how rather than the why. If you feel like you could use more convincing, though, please see our additional reading and resources at withdjango.com.

Python is blessed with excellent testing support in the form of two complementary modules—doctest and unittest—as well as a number of popular independent tools. This chapter, like Django itself, focuses on the two built-in systems, but if you are curious about the wider world of Python testing you can learn more at the previous URL.

The bad news is that testing Web applications is hard. They are inherently untidy, with all kinds of real-world interactions such as database connections, HTTP requests and responses, e-mail generation, and so on. The good news is that Django’s testing support makes it relatively easy to incorporate testing into your project. Before getting into the specifics of Django’s testing support, let’s review the Python foundations on which it’s built.

Doctest Basics

A doctest is simply a copy of an interactive Python session included in the docstring of a module, class, or function. We then use the doctest module’s test runner to discover, execute, and verify these tests. For a review of docstrings and their uses, see Chapter 1, “Practical Python for Django.”

For example, here’s a simplistic function we can easily write a test for.

def double(x):
    return x * 2

If we were testing this function manually in the interpreter, we might type something like this:

>>> double(2)
4

We get the expected result and declare that the function has passed. To add the doctest to the function, we copy the literal text of that interactive session into the function’s docstring.

def double(x):
    """
    >>> double(2)
    4
    """
    return x * 2

When this function is tested by the doctest module’s test runner, the command double(2) is executed. If its output is “4,” all is well. If not, doctest reports a failure showing both the expected and the actual output.

The test runner is smart enough to skip over nontest text, too (such as regular old documentation text not prefixed by or immediately following >>>), so we can add a more human-readable introduction.

def double(x):
    """
    This function should double the provided number. We hope.
    >>> double(2)
    4
    """
    return x * 2
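Outside of Django, the standard library can run these examples directly: calling doctest.testmod() from the module’s own main block checks every docstring example in that file. The extra cases below (a negative number, a float, and a string) are additions for illustration. Note that doctest compares printed representations, so a string result has to be written with its quotes.

```python
import doctest


def double(x):
    """
    This function should double the provided number. We hope.

    >>> double(2)
    4
    >>> double(-3)
    -6
    >>> double(2.5)
    5.0
    >>> double("ab")
    'abab'
    """
    return x * 2


if __name__ == "__main__":
    # Silent when every example passes; prints a report for each failure.
    doctest.testmod()
```

Running the file with a -v argument (python double.py -v) makes testmod print each example as it is tried.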

Unittest Basics

The unittest module complements doctest with a different approach. It is an adaptation of the JUnit testing framework for Java, which in turn took its inspiration from the original unit testing work done in Smalltalk. Typical use of plain old unittest tests in Python looks something like this:

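import unittest

class IntegerArithmeticTestCase(unittest.TestCase):
    def testAdd(self):
        # assertEqual is the preferred name; the old assertEquals alias
        # was deprecated and removed in Python 3.12.
        self.assertEqual(1 + 2, 3)

    def testMultiply(self):
        self.assertEqual(5 * 8, 40)

if __name__ == '__main__':
    unittest.main()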

This example is a complete script; when executed on its own, it runs its test suite. This happens via the unittest.main() call, which finds the unittest.TestCase subclasses in the module and runs each of their methods whose names begin with test.
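Two unittest features come up constantly in practice: setUp, which runs before every test method so each test starts from a fresh fixture, and building a suite explicitly instead of relying on unittest.main(). The sketch below shows both; StackTestCase and run_suite are hypothetical names chosen for illustration.

```python
import unittest


class StackTestCase(unittest.TestCase):
    def setUp(self):
        # Runs before *each* test method, so every test gets a fresh list.
        self.stack = [1, 2]

    def test_push(self):
        self.stack.append(3)
        self.assertEqual(self.stack, [1, 2, 3])

    def test_pop(self):
        self.assertEqual(self.stack.pop(), 2)
        self.assertEqual(len(self.stack), 1)


def run_suite():
    # Load the tests explicitly, then run them with a verbose text runner.
    suite = unittest.TestLoader().loadTestsFromTestCase(StackTestCase)
    return unittest.TextTestRunner(verbosity=2).run(suite)


if __name__ == "__main__":
    run_suite()
```

Because setUp rebuilds the list each time, test_pop passes regardless of whether test_push ran first.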

Running Tests

Django tests can be run with the following command:

./manage.py test

Django automatically detects tests (of either kind) in the models.py files of all applications listed in your INSTALLED_APPS setting. You can narrow this down with additional arguments to the test command that specify an individual app or even a single test case within an app, for example, manage.py test blog or manage.py test blog.PostTestCase.

Additionally, the test command looks for tests in a file named tests.py within each app directory (at the same level as your models.py). Therefore, you can keep your tests in either or both locations, whichever suits you best.

Testing Models

Models are typically tested with doctests because Django looks for these in each of your installed apps’ models when you run the manage.py test command. If you have a basic model that consists solely of data fields, you don’t have much to test. Your model in this case is a simple declarative representation of your data with the actual logic being handled by Django’s well-tested internals. As soon as you begin adding model methods, however, you are introducing logic that needs testing.

For example, let’s say you have a Person model that includes a birthdate field, and you have a model method to calculate the person’s age as of a certain date. That code can look something like this:

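from django.db import models

class Person(models.Model):
    firstname = models.CharField(max_length=100)
    lastname = models.CharField(max_length=100)
    city = models.CharField(max_length=100)
    state = models.CharField(max_length=100)
    birthdate = models.DateField()

    def __unicode__(self):
        return "%s %s" % (self.firstname, self.lastname)

    def age_on_date(self, date):
        # Dates before birth count as age 0.
        if date < self.birthdate:
            return 0
        # Whole years, approximated as 365-day blocks (floor division).
        return (date - self.birthdate).days // 365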

Code, such as our age_on_date method, is notorious for susceptibility to “fencepost” errors, where boundary conditions (for example, testing on the person’s birthday) can yield incorrect results. Using doctests, we can guard against these and other errors.
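The days-to-years arithmetic in age_on_date is itself a likely fencepost culprit: dividing elapsed days by 365 ignores leap days, so near a birthday it can report an age one year too high. The sketch below uses standalone functions (hypothetical helpers, not part of the Person model) so it runs without Django. Its doctests include the day before a birthday, exactly the kind of boundary case that exposes the 365-day shortcut.

```python
import doctest
from datetime import date


def age_on_date(birthdate, on_date):
    """
    Return the age in whole years on on_date.

    >>> born = date(1982, 7, 15)
    >>> age_on_date(born, date(2008, 8, 10))
    26
    >>> age_on_date(born, date(2008, 7, 15))  # the birthday itself
    26
    >>> age_on_date(born, date(2008, 7, 14))  # the day before
    25
    >>> age_on_date(born, date(1950, 1, 1))
    0
    """
    if on_date < birthdate:
        return 0
    years = on_date.year - birthdate.year
    # This year's birthday hasn't happened yet: subtract one year.
    if (on_date.month, on_date.day) < (birthdate.month, birthdate.day):
        years -= 1
    return years


def naive_age_on_date(birthdate, on_date):
    """The days // 365 shortcut, kept only for comparison."""
    if on_date < birthdate:
        return 0
    return (on_date - birthdate).days // 365


if __name__ == "__main__":
    doctest.testmod()
```

For a birth date of July 15, 1982, naive_age_on_date returns 26 on July 14, 2008 (9,496 days, which includes seven leap days), while age_on_date correctly returns 25.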

If we were going to manually test our age method, we would start an interactive session (for example, with manage.py shell, so that Django’s settings are loaded), create example objects, and call the method, such as:

>>> from datetime import date
>>> p = Person(firstname="Jeff", lastname="Forcier", city="Jersey City",
... state="NJ", birthdate=date(1982, 7, 15))
>>> p.age_on_date(date(2008, 8, 10))
26
>>> p.age_on_date(date(1950, 1, 1))
0
>>> p.age_on_date(p.birthdate)
0

Of course, as you can surmise from what you already know about doctests, we can simply lift this straight out of the interactive session and place it into the docstring for the age_on_date method, so the method looks like this:

def age_on_date(self, date):
    """
    Returns integer specifying person's age in years on date given.

    >>> from datetime import date
    >>> p = Person(firstname="Jeff", lastname="Forcier",
    ... city="Jersey City", state="NJ", birthdate=date(1982, 7, 15))
    >>> p.age_on_date(date(2008, 8, 10))
    26
    >>> p.age_on_date(date(1950, 1, 1))
    0
    >>> p.age_on_date(p.birthdate)
    0
    """
    if date < self.birthdate:
        return 0
    return (date - self.birthdate).days // 365

Finally, we can use the aforementioned manage.py command to execute our test:

user/opt/code/myproject $ ./manage.py test myapp
Creating test database...
Creating table auth_permission
Creating table auth_group
Creating table auth_user
Creating table auth_message
Creating table django_content_type
Creating table django_session
Creating table django_site
Creating table django_admin_log
Creating table myapp_person
Installing index for auth.Permission model
Installing index for auth.Message model
Installing index for admin.LogEntry model
.
----------------------------------------------------------------------
Ran 1 test in 0.003s

OK
Destroying test database...

A lot of output for one little test, of course, but once your entire model hierarchy is fully tested, you get a nice line or two of periods with the occasional E or F when unexpected errors or test failures occur.

Finally, note that although doctests probably fulfill your needs most of the time, don’t hesitate to set up model-related unit tests when more complex business logic or inter-model relationships come into the picture. If you’re new to the world of testing, it takes time to figure out what to use when—but don’t give up!

Testing Your Entire Web App

Testing your Web application from top to bottom is by no means easy, and because every Web app is different, it cannot be 100 percent automated with one reusable set of test scripts. However, several tools have proven quite useful.

The first one you should check out is built into Django itself and, at the time of writing, was quite new. It’s simply referred to as “the Django test client” and is documented on the official Django Web site at http://www.djangoproject.com/documentation/testing/#testing-tools. The test client offers an easy way to simulate a typical request-response cycle and test certain conditions along the way.

When you find you need more control than the built-in test client gives you, it may be time to move up to an older and more featureful tool called Twill, found at http://twill.idyll.org/. Like Django’s test client, it’s fully command-line based and is designed to be easy to use but still powerful: your typical Pythonic library.

Another test tool, one making waves more recently, is Selenium (see http://selenium.openqa.org/). Unlike the other two, it’s an HTML/JavaScript-based test tool created specifically for testing Web applications from a truly browser-based perspective. It supports most major browsers on most platforms and, because it’s JavaScript-based, can test Ajax functionality as well. Selenium is divided into three related components: Selenium Core, Selenium RC (Remote Control), and Selenium IDE (Integrated Development Environment).

Selenium Core (http://selenium-core.openqa.org/) represents the heart of the (manual and automated) testing of Web applications. Some people refer to it as running Selenium in “bot mode.” It’s the workhorse. The core can also perform browser compatibility tests in addition to your Web app’s system functional testing.

Selenium RC (http://selenium-rc.openqa.org/) gives its users the ability to create full-fledged automated tests in a variety of programming languages. You write your test apps; they are run by Selenium Core—you can think of it as a scripting layer that sits on top of the Core, a “code mode” if you will.

A great tool to get started with Selenium is the IDE (see http://selenium-ide.openqa.org/). It’s written as a Firefox extension and is a full IDE that enables you to record and play back Web sessions as tests. It can also export tests in any of the languages supported by Selenium RC, so you can further enhance or modify them. You can set breakpoints as well as single-step through tests. Because it’s a Firefox extension, one common FAQ is whether it exists for Internet Explorer (IE). The answer is no; however, tests recorded with the IDE can be run on IE via Selenium Core.

Aside from these three tools (the Django test client, Twill, and Selenium), you can find more reading on Web application testing at http://www.awaretek.com/tutorials.html#test and by following the links found there.

Testing the Django Codebase Itself

The Django framework itself has an extensive test suite. Every bugfix is generally accompanied by a regression test that ensures the bug doesn’t resurface unnoticed. New functionality is also typically accompanied by tests that ensure it works as intended.

You can run these tests yourself. This can be especially useful if you are having trouble running Django on a little-used platform or in an unusual configuration. Although it’s always wise to check your own code first, it’s possible you have uncovered a bug that hasn’t been seen before. One or more failing tests in the built-in suite let you file a bug report that is taken much more seriously.

Running Django’s test suite is easy, with one minor hurdle: it needs to be pointed to a settings file so it knows how to create its test database. This can be the settings file of any active project, or you can create a dummy project (that is, one with no apps) and fill out only the DATABASE_* settings in its settings.py file.
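A dummy project’s settings.py can be very small. The sketch below assumes SQLite and the DATABASE_* setting names used by Django at the time of writing; with the sqlite3 backend the test database is created in memory, so the engine setting alone should be enough. The commented-out values for other backends are placeholders, not recommendations.

```python
# mydummyproject/settings.py: minimal settings for tests/runtests.py (sketch).
# With sqlite3, the test database is created in memory.
DATABASE_ENGINE = 'sqlite3'

# For another backend, fill in real connection details instead, e.g.:
# DATABASE_ENGINE = 'postgresql_psycopg2'
# DATABASE_NAME = 'django_tests'      # placeholder
# DATABASE_USER = 'django'            # placeholder
# DATABASE_PASSWORD = 'secret'        # placeholder
```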

The test runner, runtests.py, lives in a directory called tests at the top level of the Django source tree. (This is not to be confused with the test package that is part of the django package itself.) Invoking the command looks like this:

$ tests/runtests.py --settings=mydummyproject.settings

This is a pretty quiet process, because tests are supposed to produce output only if they fail. Because the test suite can take a while to run, you may want more feedback about the tests in progress; the runtests.py command takes a -v verbosity argument. At -v1 the process begins with output such as this:

..................................................E...EE...

The E characters indicate tests that raised an error (an F would mark a failed assertion); this summary is followed by output detailing the nature of those problems, so you can determine whether each is an artifact of your setup or an actual problem in Django.

At verbosity -v2 the output begins with a long list of imports, followed by messages detailing the creation of the test database and its tables (the “...” in the following example represent lines removed from the actual output for brevity).

Importing model basic
Importing model choices
Importing model custom_columns
Importing model custom_managers
Importing model custom_methods
Importing model custom_pk
Importing model empty
...
Creating test database...
Processing contenttypes.ContentType model
Creating table django_content_type
Processing auth.Message model
Creating table auth_message
Processing auth.Group model
Creating table auth_group
Processing auth.User model
Creating table auth_user
...

Seeing an indicated failure when running the test suite doesn’t necessarily mean you have found something wrong with Django—if you’re unsure, a good first step is to post to the Django-users mailing list with your configuration details and failed test output.

Summary

This chapter covered a number of advanced topics, and together with Chapter 11, we hope it’s given you a good overview of the kind of depth you can go into when it comes to Django development. These topics are only a sample of what’s possible, of course: Web application development, like most other computer-based disciplines, is not self-contained but branches out into many other general areas, much like Python itself, which is capable of handling a wide variety of situations and technologies.

At this point, you’re just about done with our book—congratulations! The appendices are all that’s left, although they—like these last two chapters—are still important parts of the book, covering a number of different subjects from command-line usage and installing and deploying Django to a list of external resources and development tools.

Finally, you might find it useful to go back and reread (or at least skim) the earlier parts of the book; now that you’ve seen just about all the topics we’ve wanted to cover, the earlier code examples and explanations can give you additional insight. This is true of any technical book, of course, not just this one.
