How to implement a per-site cache
How to implement a per-view cache
How to manage access to cached pages
One of Django’s best features is that it lets you easily generate web pages on-the-fly. It saves you a lot of development and maintenance time. However, this advantage comes at a price. Each time a request comes in, Django spends many cycles in database queries and page generation. On small to medium websites, this isn’t much of a problem. However, on large sites that receive several requests per second, the overhead adds up quickly.
Django’s caching framework solves this problem nicely. Using the cache framework, you can cache web pages and objects so that subsequent requests for the same data can quickly be drawn from the cache rather than performing the resource-intensive query and processing again.
In this hour, we will discuss configuring caching and the different types of backends that are available. We will then discuss how to implement caching at the site, view, and object levels. Finally, we will cover how to use the response headers to manage caching in upstream caches.
Django’s caching system is easy to implement. The only thing you need to do is define in the settings.py
file which backend you want Django to use for caching. Django includes several backends that you can use depending on your particular needs. To enable a caching backend, set the CACHE_BACKEND
setting in the settings.py
file. For example, to enable local memory caching, you would use the following setting in the settings.py
file:
CACHE_BACKEND = 'locmem:///'
You can also configure how long Django caches data. You can add the following arguments to the CACHE_BACKEND
setting:
timeout
is the amount of time in seconds that the data is cached. The default is 300.
max_entries
is the maximum number of entries allowed in the cache before old values are removed. The default is 300.
cull_percentage
specifies what fraction of the entries is removed from the cache when the max_entries
limit is reached. Despite its name, the value acts as a divisor rather than a percentage: the default is 3, meaning that one in three entries is removed. If you specify a value of 0
, the entire cache is emptied.
For example, the following setting keeps data in the cache for 2 minutes and allows 200 cached entries:
CACHE_BACKEND = 'locmem:///?timeout=120&max_entries=200'
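To make the anatomy of a CACHE_BACKEND value more concrete, here is a minimal sketch that splits one into its backend name, location, and arguments. The helper name split_backend_uri is hypothetical and this is not Django's own parser; it only illustrates the backend://location?arguments layout used in the examples above.

```python
from urllib.parse import parse_qsl


def split_backend_uri(uri):
    """Split a CACHE_BACKEND-style string into (backend, location, arguments).

    Hypothetical helper for illustration only; not part of Django.
    """
    backend, separator, rest = uri.partition("://")
    if not separator:
        raise ValueError("expected a value of the form backend://location")
    # Everything after '?' is a list of name=value arguments.
    location, _, query = rest.partition("?")
    arguments = dict(parse_qsl(query))
    return backend, location, arguments


if __name__ == "__main__":
    print(split_backend_uri("locmem:///?timeout=120&max_entries=200"))
```

Note that the argument values come back as strings, so a real consumer would still need to convert timeout and max_entries to integers.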
The following sections describe each of the available backends you can configure for your website.
The database backend allows you to create a table in the database that can then be used to store and retrieve cached data. An advantage of using the database backend is that the cached data is persistently stored and is available even after a server reboot.
Before you can enable the database backend, you need to create a table in the database to store the cache by running the createcachetable
command of manage.py at the root of your project. The table can be given any valid table name as long as the database doesn’t already have a table with that name. For example, the following command creates a database backend table called mysitecache:
python manage.py createcachetable mysitecache
To enable the database backend in the settings.py
file, set the CACHE_BACKEND
to the db://
backend and provide the name of the cache table. For example, to enable the database backend using the table just listed, you would use the following setting:
CACHE_BACKEND = 'db://mysitecache'
The file system backend allows you to define a directory in which Django stores and retrieves cached data. An advantage of using the file system backend is that the cached data is stored persistently in the file system and is available even after a server reboot.
Cached data is stored as individual files. The contents of the files are in Python pickled format. The file system backend uses the cache key as the filename (escaped for the purpose of security).
Before you can enable the file system backend, you need to create a directory in which to store the cached data.
Django needs read and write permissions to that directory. For example, if your web server is running as the user apache, you need to grant the apache user read and write access to the directory.
To enable the file system backend in the settings.py
file, you should set the CACHE_BACKEND
to the file://
backend and provide the full path to the directory. For example, if you create a directory called /var/temp/mysitecache on a Linux system, you would use the following setting in the settings.py
file:
CACHE_BACKEND = 'file:///var/temp/mysitecache'
As another example, if you create a directory called c:\temp\mysitecache on a Windows system, you would use the following setting in the settings.py
file:
CACHE_BACKEND = 'file://c:/temp/mysitecache'
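The idea of one pickled file per cache key can be sketched in a few lines of plain Python. This is only an illustration under our own assumptions, not Django's file system backend: the function names, the choice of URL-quoting to escape the key, and the (expiry, value) file layout are all hypothetical.

```python
import os
import pickle
import tempfile
import time
from urllib.parse import quote


def _path_for(cache_dir, key):
    # Escape the key so characters such as '/' cannot leave the cache directory.
    return os.path.join(cache_dir, quote(key, safe=""))


def file_cache_set(cache_dir, key, value, timeout=300):
    """Pickle (expiry time, value) into a file named after the escaped key."""
    with open(_path_for(cache_dir, key), "wb") as handle:
        pickle.dump((time.time() + timeout, value), handle)


def file_cache_get(cache_dir, key, default=None):
    """Return the cached value, or default if it is missing or expired."""
    path = _path_for(cache_dir, key)
    try:
        with open(path, "rb") as handle:
            expires_at, value = pickle.load(handle)
    except FileNotFoundError:
        return default
    if time.time() >= expires_at:
        os.remove(path)  # Expired entries are removed when they are read.
        return default
    return value


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as cache_dir:
        file_cache_set(cache_dir, "Blog_List", ["first post"], timeout=60)
        print(file_cache_get(cache_dir, "Blog_List"))
```

Because the entries live on disk, they survive a restart of the process, which is the persistence advantage described above.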
The local memory backend uses the system memory to store and retrieve cached data. An advantage of using the local memory backend is that the cached data is stored in memory, which is extremely quick to access. The local memory backend uses locking to ensure that it is multiprocess and thread-safe.
Cached data that is stored in the local memory backend is lost if the server crashes. You should not rely on items in the local memory cache as any kind of data storage.
To enable the local memory backend in the settings.py
file, set the CACHE_BACKEND
to the locmem:///
backend:
CACHE_BACKEND = 'locmem:///'
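As a rough picture of what an in-memory cache has to do with the timeout, max_entries, and cull arguments described earlier, consider the following minimal sketch. It is not Django's local memory backend: the class name TinyMemoryCache and the parameter name cull_divisor are hypothetical, it does no locking, and the choice of which entries to cull is an assumption made for illustration.

```python
import time


class TinyMemoryCache:
    """Illustrative in-process cache with expiry and culling (not thread-safe)."""

    def __init__(self, timeout=300, max_entries=300, cull_divisor=3,
                 clock=time.monotonic):
        self.timeout = timeout
        self.max_entries = max_entries
        self.cull_divisor = cull_divisor
        self._clock = clock
        self._data = {}  # key -> (expires_at, value)

    def set(self, key, value, timeout=None):
        # Make room before adding a new key to a full cache.
        if key not in self._data and len(self._data) >= self.max_entries:
            self._cull()
        lifetime = self.timeout if timeout is None else timeout
        self._data[key] = (self._clock() + lifetime, value)

    def get(self, key, default=None):
        entry = self._data.get(key)
        if entry is None:
            return default
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._data[key]
            return default
        return value

    def _cull(self):
        if self.cull_divisor == 0:
            self._data.clear()  # A divisor of 0 empties the whole cache.
            return
        # Remove one entry out of every cull_divisor, in insertion order.
        keys = list(self._data)
        for index, key in enumerate(keys):
            if index % self.cull_divisor == 0:
                del self._data[key]
```

Everything here lives in a single Python dictionary, which is why such a cache is fast and also why its contents disappear when the process ends.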
The simple backend caches data in memory for a single process. This is useful when you are developing the website and for testing purposes. However, you should not use it in production.
To enable the simple backend in the settings.py
file, set the CACHE_BACKEND
to the simple:///
backend:
CACHE_BACKEND = 'simple:///'
The dummy backend does not cache any data, but it enables the cache interface. The dummy backend should be used only in development or test websites.
To enable the dummy backend in the settings.py
file, set the CACHE_BACKEND
to the dummy:///
backend:
CACHE_BACKEND = 'dummy:///'
The fastest and most efficient backend available for Django is the Memcached backend. Memcached runs as a daemon that stores and retrieves data in a memory cache.
Memcached itself is not distributed with Django; you must obtain it from www.danga.com/memcached/. Before you can enable Memcached, you must install it, along with the Memcached Python bindings. The Memcached Python bindings are in the Python module, memcache.py
, which you can find at www.djangoproject.com/thirdparty/python-memcached/.
To enable the Memcached backend in the settings.py
file, you should set the CACHE_BACKEND
to the memcached://
backend and provide the IP address and port that the Memcached daemon is running on. For example, if the Memcached daemon is running on the local host (127.0.0.1) using port 12221, you would use the following setting:
CACHE_BACKEND = 'memcached://127.0.0.1:12221'
One of the best features of Memcached is that you can distribute the cache over multiple servers by running the Memcached daemon on multiple machines. Memcached treats the servers as a single cache.
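One simple way for a client to treat several daemons as a single cache is to hash each key and use the result to pick a server, so that the same key always lands on the same machine. The sketch below shows that idea only; the function name pick_server and the server addresses are hypothetical, and real Memcached clients may use a different hashing scheme.

```python
import zlib


def pick_server(key, servers):
    """Map a cache key to one of the servers in a repeatable way."""
    if not servers:
        raise ValueError("at least one server is required")
    # crc32 is stable across runs, unlike Python's built-in hash() for strings.
    index = zlib.crc32(key.encode("utf-8")) % len(servers)
    return servers[index]


if __name__ == "__main__":
    # Hypothetical addresses, used only for this illustration.
    servers = ["10.0.0.1:12221", "10.0.0.2:12221", "10.0.0.3:12221"]
    for key in ("Blog_List", "User", "Date"):
        print(key, "->", pick_server(key, servers))
```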
After you have configured a caching backend, you can implement caching on the website. The easiest way to implement caching is at the site level. Django provides the django.middleware.cache.CacheMiddleware
middleware framework to cache the entire site. Add the following entry to the MIDDLEWARE_CLASSES
setting in the settings.py
file to enable caching for the entire website:
'django.middleware.cache.CacheMiddleware',
The CacheMiddleware
application does not cache pages that have GET
or POST
parameters. When you design your website, make certain that pages that need to be cached do not require URLs that contain query strings.
After you enable the CacheMiddleware
framework, you need to add the following required settings to the settings.py
file:
CACHE_MIDDLEWARE_SECONDS
: Defines the number of seconds that each page should be kept in the cache.
CACHE_MIDDLEWARE_KEY_PREFIX
: If you are using the same cache for multiple websites, you can use a unique string for this setting to designate which site the object is being cached from to prevent collisions. If you are not worried about collisions, you can use an empty string.
You can enable the same cache for multiple sites that reside on the same Django installation. Just add the middleware to the settings.py
file for each site.
The CacheMiddleware
framework also allows you to restrict caching to requests made by anonymous users. If you set CACHE_MIDDLEWARE_ANONYMOUS_ONLY
in the settings.py
file to True
, requests that come from logged-in users are not cached.
If you use the CACHE_MIDDLEWARE_ANONYMOUS_ONLY
option, make certain that AuthenticationMiddleware
is enabled and is listed earlier in the MIDDLEWARE_CLASSES
setting.
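Put together, the site-level settings discussed in this section might look like the following fragment of settings.py. The values shown (a 300-second lifetime and the prefix 'mysite') are assumptions chosen for illustration, and the list shows only the middleware relevant to caching; your project will normally list others as well.

```python
# settings.py fragment (illustrative values only)

MIDDLEWARE_CLASSES = (
    'django.contrib.sessions.middleware.SessionMiddleware',
    # AuthenticationMiddleware is listed before CacheMiddleware, as required
    # when CACHE_MIDDLEWARE_ANONYMOUS_ONLY is used.
    'django.contrib.auth.middleware.AuthenticationMiddleware',
    'django.middleware.cache.CacheMiddleware',
)

# Keep each cached page for five minutes (assumed value).
CACHE_MIDDLEWARE_SECONDS = 300

# Distinguishes this site's entries when several sites share one cache.
CACHE_MIDDLEWARE_KEY_PREFIX = 'mysite'

# Only cache responses to requests from users who are not logged in.
CACHE_MIDDLEWARE_ANONYMOUS_ONLY = True
```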
The CacheMiddleware
framework automatically sets the value of some headers in each HttpResponse
. The Last-Modified
header is set to the current date and time when a fresh version of the page is requested. The Expires
header is set to the current date and time plus the value defined in CACHE_MIDDLEWARE_SECONDS
. The max-age directive of the Cache-Control
header is set to the value defined in CACHE_MIDDLEWARE_SECONDS
.
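To see how these three headers relate to one another, the following sketch computes them from a cache lifetime in seconds. The function name site_cache_headers is hypothetical and this is not the middleware's own code; it simply formats the values in the way the paragraph above describes.

```python
import time
from email.utils import formatdate


def site_cache_headers(cache_seconds, now=None):
    """Return the three caching headers for a freshly generated page."""
    if now is None:
        now = time.time()
    return {
        # HTTP dates are always expressed in GMT.
        "Last-Modified": formatdate(now, usegmt=True),
        "Expires": formatdate(now + cache_seconds, usegmt=True),
        "Cache-Control": "max-age=%d" % cache_seconds,
    }


if __name__ == "__main__":
    for name, value in site_cache_headers(600).items():
        print("%s: %s" % (name, value))
```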
Django’s caching makes it possible to implement the cache at the view level as well. Instead of caching every page in the website, you might want to cache only a few specific views.
Use the django.views.decorators.cache.cache_page
decorator function to implement caching for a specific view. The cache_page
decorator function caches the web page generated by a view function. The cache_page
decorator accepts one argument that specifies how many seconds to keep the web page cached.
The following code shows an example of implementing the cache_page
decorator function to cache the web page generated by myView
for 3 minutes:
@cache_page(180)
def myView(request):
    . . .
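If you are curious what a decorator of this kind does behind the scenes, the following plain-Python sketch captures the general idea: remember the response for each request path and reuse it until it expires. It is not Django's cache_page implementation. The name cache_view_for is hypothetical, the store is a simple dictionary rather than the configured cache backend, and the request is assumed to be any object with a path attribute.

```python
import functools
import time
from types import SimpleNamespace


def cache_view_for(seconds, clock=time.monotonic):
    """Decorator sketch: reuse a view's response per path for `seconds`."""
    def decorator(view):
        store = {}  # request path -> (expires_at, response)

        @functools.wraps(view)
        def wrapper(request):
            now = clock()
            entry = store.get(request.path)
            if entry is not None and now < entry[0]:
                return entry[1]  # Still fresh: skip the view entirely.
            response = view(request)
            store[request.path] = (now + seconds, response)
            return response

        return wrapper
    return decorator


if __name__ == "__main__":
    @cache_view_for(180)
    def my_view(request):
        print("generating page for", request.path)
        return "<html>...</html>"

    request = SimpleNamespace(path="/blog/")
    my_view(request)  # Runs the view.
    my_view(request)  # Served from the stored copy.
```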
Django provides a low-level cache API that allows you to access the cache from your Python code. Instead of caching entire pages, you may want to cache only specific data that will be used to render the display.
The django.core.cache.cache.set(key, value, timeout_seconds)
function allows you to store any Python object that can be pickled in the cache. The set()
function accepts three arguments—key
, value
, and timeout_seconds
. The key
argument is a string used to reference the object. The value
argument is the object to be cached. The timeout_seconds
argument specifies the number of seconds to cache the object.
The following code stores a list of Blog
objects in the cache for 25 seconds:
from django.core.cache import cache

blogs = Blog.objects.all()
cache.set('Blog_List', blogs, 25)
The django.core.cache.cache.get(key)
function accesses the cache and returns the value of the entry in the cache. If the entry is not found, None
is returned. For example, the following code accesses the Blog list stored in the cache using the preceding example:
blogs = cache.get('Blog_List')
The get()
function can also accept a second argument that specifies a value to be returned instead of None
if no entry is found:
blogs = cache.get('Blog_List', [])
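In practice, get() and set() are usually combined: look in the cache first, and only do the expensive work when nothing is found. The following sketch shows that pattern against any object that offers get(key) and set(key, value, timeout_seconds). The helper get_or_compute and the stand-in DictCache class are hypothetical and exist only so that the example runs without a Django project; in a view you would pass django.core.cache.cache instead.

```python
def get_or_compute(cache, key, compute, timeout_seconds):
    """Return the cached value for key, computing and storing it on a miss."""
    value = cache.get(key)
    if value is None:
        # Cache miss (or a stored None): do the expensive work once.
        value = compute()
        cache.set(key, value, timeout_seconds)
    return value


class DictCache:
    """Stand-in with the same get/set shape; it ignores the timeout."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def set(self, key, value, timeout_seconds=None):
        self._data[key] = value


if __name__ == "__main__":
    cache = DictCache()
    blogs = get_or_compute(cache, "Blog_List", lambda: ["post 1", "post 2"], 25)
    print(blogs)
```

One design consequence worth noting: because None signals a miss, a value that is legitimately None is recomputed on every call.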
The django.core.cache.cache.get_many(key_list)
function accesses the cache and returns the values of multiple cache entries. The get_many()
function accepts a list of keys as its only argument. It returns a dictionary containing the keys from the arguments and their corresponding values in the cache. If an entry is not found or is expired, it is not included in the dictionary.
For example, the following code returns a dictionary containing the Date and User entries in the cache:
from datetime import datetime
from django.core.cache import cache

cache.set('User', request.user, 60)
cache.set('Date', datetime.now(), 60)
. . .
entries = cache.get_many(['User', 'Date'])
The cache API is key-based, so you can store an object in one view function and retrieve it in another.
The django.core.cache.cache.delete(key)
function deletes the entry specified by the key
argument in the cache. The delete()
function has no return value and does not raise an error if the key is not found in the cache. The following example deletes the Blog_List
entry from the cache:
cache.delete('Blog_List')
So far in this hour we have discussed how to implement caching on your own website. Web pages are also cached upstream from your website by ISPs, proxies, and even web browsers. Upstream caching provides a major boost to the Internet’s efficiency, but it can also pose a couple of problems and security holes. For example, a home page that contains personal data about a user may be cached. A subsequent request to that home page would display that user’s information in another user’s browser.
The HTTP protocol solves these types of problems using Vary
and Cache-Control
headers. They allow websites to define some behavior and access requirements before cached pages are distributed. The following sections discuss how to implement these headers in your view functions.
The Vary
header allows you to define headers that an upstream cache engine checks when building its cache key. The cached page is then used only if the values of the headers listed in the Vary
header are the same in the new request as in the request that produced the cached page.
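The effect of Vary on an upstream cache can be pictured as part of the cache key. The sketch below builds a key from the URL plus the values of whichever request headers the Vary header names. It is a simplified model under our own assumptions, not the algorithm of any particular proxy, and the function name upstream_cache_key is hypothetical.

```python
import hashlib


def upstream_cache_key(url, vary_header, request_headers):
    """Build a cache key from the URL and the request headers named in Vary."""
    # Header names are case-insensitive, so compare them in lowercase.
    lowered = {name.lower(): value for name, value in request_headers.items()}
    names = sorted(
        name.strip().lower() for name in vary_header.split(",") if name.strip()
    )
    parts = [url] + ["%s=%s" % (name, lowered.get(name, "")) for name in names]
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()


if __name__ == "__main__":
    firefox = upstream_cache_key("/home/", "User-Agent", {"User-Agent": "Firefox"})
    safari = upstream_cache_key("/home/", "User-Agent", {"User-Agent": "Safari"})
    print(firefox == safari)
```

Two requests for the same URL therefore share a cached page only when every header named in Vary carries the same value.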
The Vary
header can be set in several different ways in the view function. The simplest way is to set the header manually in the HttpResponse
object using the following syntax:
def myView(request):
    . . .
    response = HttpResponse()
    response['Vary'] = 'User-Agent'
    return response
Setting the Vary
header manually in this way can potentially overwrite items that are already there. Django provides the django.views.decorators.vary.vary_on_headers()
decorator function so that you can easily add headers to the Vary
header for the view function.
The vary_on_headers()
decorator function adds headers to the Vary
header instead of overwriting headers that are already there. The vary_on_headers()
decorator function can accept multiple headers as arguments. For example, the following code adds both the User-Agent
and Content-Language
headers to the Vary
header:
from django.views.decorators.vary import vary_on_headers

@vary_on_headers('User-Agent', 'Content-Language')
def myView(request):
    . . .
Another useful function to modify the Vary
header is the django.utils.cache.patch_vary_headers(response, [headers])
function. The patch_vary_headers()
function requires a response object as the first argument and a list of headers as the second. All headers listed in the second argument are added to the Vary
header of the response object. For example, the following code adds the User-Agent
and Content-Language
headers to the Vary
header inside the view function:
from django.utils.cache import patch_vary_headers

def myView(request):
    . . .
    response = HttpResponse()
    patch_vary_headers(response, ['User-Agent', 'Content-Language'])
    return response
One of the biggest advantages of using the patch_vary_headers()
function is that you can selectively set which headers to add using code inside the view function. For example, you might want to add the Cookie
header only if your view function actually sets a cookie.
The values that get passed to vary_on_headers()
and patch_vary_headers()
are not case-sensitive. For example, the header user-agent
is the same as User-Agent
.
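The difference between adding to the Vary header and overwriting it, together with the case-insensitive matching just described, can be shown with a short sketch. The function merge_vary is hypothetical and works on plain strings; it is meant to illustrate the behavior, not to reproduce Django's implementation.

```python
def merge_vary(existing_value, new_headers):
    """Append header names to a Vary value without creating duplicates."""
    current = [name.strip() for name in existing_value.split(",") if name.strip()]
    seen = {name.lower() for name in current}
    for header in new_headers:
        # 'user-agent' and 'User-Agent' count as the same header.
        if header.lower() not in seen:
            current.append(header)
            seen.add(header.lower())
    return ", ".join(current)


if __name__ == "__main__":
    print(merge_vary("Cookie", ["User-Agent", "cookie"]))
```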
One of the most common headers that you will want to add to the Vary
header is the Cookie
header. For that reason, Django has added the django.views.decorators.vary.vary_on_cookie()
decorator function to add just the Cookie
header to the Vary
header. The vary_on_cookie()
decorator function does not accept any parameters and simply adds the Cookie
header to Vary
:
from django.views.decorators.vary import vary_on_cookie

@vary_on_cookie
def myView(request):
    . . .
One of the biggest problems with caching is keeping data that should remain private, private. Users basically use two types of caches—the private cache stored in the user’s web browser, and the public cache stored by ISPs or other upstream caches. Private data, such as credit card numbers and account numbers, should only be stored in the private cache.
HTTP handles the issue of keeping data private using the Cache-Control
header. The Cache-Control
header allows you to define directives that caching engines will use to determine if data is public or private and if it should even be cached.
The following are the currently valid directives for the Cache-Control
header, written in the keyword-argument form (with underscores in place of hyphens) that Django accepts:
public=True
private=True
no_cache=True
no_store=True
no_transform=True
must_revalidate=True
proxy_revalidate=True
max_age=num_seconds
s_maxage=num_seconds
Django provides the django.views.decorators.cache.cache_control()
decorator function to configure the directives in the Cache-Control
header. The cache_control()
decorator function accepts any valid Cache-Control
directive as an argument. For example, the following code sets the private
and max_age
directives in the Cache-Control
header for a view function:
from django.views.decorators.cache import cache_control

@cache_control(private=True, max_age=600)
def myView(request):
    . . .
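To connect the keyword arguments with what actually goes over the wire, the following sketch turns them into a Cache-Control header value. The function build_cache_control is hypothetical and is not how Django's decorator is written; it assumes that underscores become hyphens, that True produces a bare directive, and that any other value is written as name=value.

```python
def build_cache_control(**directives):
    """Render keyword arguments as a Cache-Control header value."""
    parts = []
    for name, value in directives.items():
        token = name.replace("_", "-")  # max_age -> max-age
        if value is True:
            parts.append(token)  # Flag directives carry no value.
        elif value is not False and value is not None:
            parts.append("%s=%s" % (token, value))
    return ", ".join(parts)


if __name__ == "__main__":
    print(build_cache_control(private=True, max_age=600))
```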
In this hour, we discussed how to configure caching for your website using different types of backends. You also learned that you can implement caching at the site level using the CacheMiddleware
framework. You can implement caching at the view level using the cache_page()
decorator function. You also can implement caching at the object level using a low-level API that allows you to get, set, and delete items in the cache.
We also discussed how to use the Vary
and Cache-Control
headers to manage how upstream caches cache web pages.
Does Django’s
Yes. The
Where can I go to better understand the
The header definitions can be found at www.w3.org/Protocols/rfc2616/rfc2616-sec14.html.
The workshop consists of a set of questions and answers designed to solidify your understanding of the material covered in this hour. Try answering the questions before looking at the answers.