Designing Sites for Effective Caching

One of the most important considerations when designing a new website or application is how users will eventually access your information. One of the cornerstones of the Web is the use of proxy servers that cache content on behalf of end users; Internet Service Providers (ISPs) commonly run them to improve the user experience of the Web. Web browsers also cache content locally to speed up frequently accessed websites.

Four HTTP headers control how proxy servers and web browsers cache content:

  • Last-Modified
  • ETag
  • Expires
  • Cache-Control

We will go over these headers next, and how to manipulate them to improve the caching of your website or application.

The Last-Modified and ETag Headers

Most web servers nowadays set these two headers automatically. User agents (web browsers and proxy servers) use them to check whether their cached copy of a given URL is still valid. The following is a good example of an HTTP response that includes both headers:

HTTP/1.1 200 OK
Date: Thu, 29 Sep 2005 18:54:26 GMT
Server: Apache/1.3.33 (Unix)
Cache-Control: max-age=3600, must-revalidate
Expires: Sat, 29 Oct 2005 18:54:26 GMT
Last-Modified: Thu, 29 Sep 2005 18:54:26 GMT
ETag: "6e93-130-3852bdde"
Content-Length: 3489
Content-Type: text/html

The Last-Modified header tells the user agent when the content was last changed. The user agent stores this value and re-uses it later, when it needs to check whether the server has a fresher version of the content than the one it has cached locally. It performs this check with another HTTP header, If-Modified-Since, which it sends to the server to help decide whether the local cache can be used. Here’s a simple example:

GET /cnn/images/1.gif HTTP/1.1
Host: i.cnn.net
Accept: image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
If-Modified-Since: Tue, 27 May 2003 19:00:10 GMT
Cache-Control: max-age=0

As you can see above, the 1.gif image was presumably returned earlier with a Last-Modified value of Tue, 27 May 2003 19:00:10 GMT. The web browser is now sending that value back to the server to find out whether it can use its locally cached copy of the image or must request a new one. Here’s the response from the server:

HTTP/1.x 304 Not Modified
Date: Sat, 03 Sep 2005 04:15:42 GMT
Content-Type: image/gif
Last-Modified: Tue, 27 May 2003 19:00:10 GMT
Age: 987
Connection: keep-alive

The 304 response code is what tells the web browser to simply re-use the local cached copy.
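
The server-side half of this exchange can be sketched in a few lines of Python. This is only an illustration of the comparison described above, not how any particular web server implements it; the function name conditional_status is hypothetical, and the sketch assumes the resource's modification time is already known as a timezone-aware datetime.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_status(last_modified, if_modified_since):
    """Return 304 when the client's cached copy is still current, otherwise 200."""
    if not if_modified_since:
        return 200  # unconditional request: send the full body
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable date: ignore the header
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)  # treat a zone-less date as GMT
    # HTTP dates have one-second resolution, so drop microseconds before comparing.
    if last_modified.replace(microsecond=0) <= since:
        return 304
    return 200


# Hypothetical resource whose modification time matches the example above.
last_modified = datetime(2003, 5, 27, 19, 0, 10, tzinfo=timezone.utc)
header_value = format_datetime(last_modified, usegmt=True)
print(header_value)
print(conditional_status(last_modified, header_value))
```

With a 304 result the server sends headers only, and the browser serves the image from its local cache.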

The ETag header is used in a similar way to Last-Modified, but instead of a date it carries a unique identifier, created by the server, for that specific version of the content. Here’s an example HTTP request:

GET /cnn/.element/img/1.3/searchbar/bg.gif HTTP/1.1
Host: i.a.cnn.net
Accept: image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
If-Modified-Since: Wed, 09 Mar 2005 17:32:07 GMT
If-None-Match: "d0b3114c-1b1-9b-0"
Cache-Control: max-age=0

The If-None-Match header value above is what is passed by the user agent to the server, to check whether the local cached copy of this image is still valid or not. If it is, the web server will return a 304 response code as it did for the If-Modified-Since example we described before:

HTTP/1.x 304 Not Modified
Content-Type: image/gif
Last-Modified: Wed, 09 Mar 2005 17:32:07 GMT
ETag: "d0b3114c-1b1-9b-0"
Date: Sat, 03 Sep 2005 04:44:30 GMT
Connection: close

As you can see, the server returns the 304 response code along with the same ETag value, confirming to the user agent that its local copy is still valid and should be used.
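
As a rough sketch of the same check on the server side, the Python below derives an ETag from a hash of the content and compares it with an incoming If-None-Match value. Both function names are hypothetical, and hashing the body is just one possible scheme; real servers often build the identifier from file metadata instead. Splitting on commas is a simplification that assumes the tags themselves contain no commas.

```python
import hashlib


def make_etag(body):
    """Build a quoted ETag from a hash of the body (an assumed scheme)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def _strip_weak(tag):
    """Drop the W/ prefix, since If-None-Match uses weak comparison."""
    return tag[2:] if tag.startswith("W/") else tag


def etag_matches(if_none_match, current_etag):
    """True when If-None-Match lists the current ETag or is the wildcard '*'."""
    if not if_none_match:
        return False
    candidates = [tag.strip() for tag in if_none_match.split(",")]
    if "*" in candidates:
        return True
    return _strip_weak(current_etag) in {_strip_weak(tag) for tag in candidates}


body = b"<html>example page</html>"
etag = make_etag(body)
# A match means the server can answer 304 Not Modified without a body.
print(304 if etag_matches(etag, etag) else 200)
```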

The Expires Header

This header allows you to set an expiration date for a particular piece of content on your website or application. It is most useful for static content, such as images for buttons and logos. Since these things don’t tend to change very often, you can set a very long expiration date on them, and it will improve the performance of your pages since some of their content will be delivered through caches.

The value of this header must be an HTTP date expressed in Greenwich Mean Time (GMT). That alone brings a potential problem: if the clocks on your web server and on the cache are not synchronized, stale content may be treated as fresh, or fresh content as stale. Here’s an example of the use of the Expires header:

HTTP/1.x 200 OK
Date: Sat, 03 Sep 2005 05:06:24 GMT
Server: Apache
Content-Type: text/html
Last-Modified: Sat, 03 Sep 2005 05:06:17 GMT
Cache-Control: max-age=60, private
Expires: Sat, 03 Dec 2005 05:06:24 GMT
Content-Encoding: gzip
Content-Length: 13606
Connection: close

As you can see above, the Expires header is set three months in the future, which is fine for content such as a navigation bar image that changes rarely, if at all. Keep in mind that when a response carries both headers, HTTP/1.1 caches give Cache-Control: max-age precedence over Expires.
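
To make the date format concrete, here is a minimal Python sketch that builds an Expires value a fixed interval after the response date. The helper name expires_header is hypothetical, and the 91-day lifetime is simply chosen to reproduce the three-month gap between the Date and Expires values in the example above.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def expires_header(now, lifetime):
    """Format now + lifetime as an HTTP date in GMT for the Expires header."""
    return format_datetime(now + lifetime, usegmt=True)


# The Date value from the example response above.
response_date = datetime(2005, 9, 3, 5, 6, 24, tzinfo=timezone.utc)
expires_value = expires_header(response_date, timedelta(days=91))
print("Expires: " + expires_value)
```

Computing the value from the server's own clock at response time does not remove the clock-skew problem, because the cache still compares the date against its own clock.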

The Cache-Control Header

This header was introduced with HTTP/1.1 to give finer control over proxy servers and over caching in general. Several values can be used in this response header:

Header Value         Description
-------------------  -----------
max-age=[seconds]    Number of seconds for which the cached content is considered fresh.
s-maxage=[seconds]   Same as max-age, but applies only to shared (proxy) caches.
public               Marks the response as cacheable by any cache, even when the request was authenticated.
no-cache             Requires caches to check with the web server before releasing any cached content. Useful in combination with authenticated requests.
no-store             Prevents caches from storing any content.
must-revalidate      Forces caches to obey your expiration rules: once content is stale, it must be revalidated before it is served.
proxy-revalidate     Similar to must-revalidate, but applies only to proxy caches.

Here’s an example of how to mark a specific URL as being cacheable, but still force caches to check with the web server before releasing any content:

Cache-Control: public, no-cache, must-revalidate
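
To illustrate how a cache might read these values, the following Python sketch parses a Cache-Control header and works out how long a response may be served without contacting the web server. The function names are hypothetical and the parsing is deliberately simplified: it splits on commas and does not handle quoted arguments that themselves contain commas.

```python
def parse_cache_control(value):
    """Split a Cache-Control value into a dict of directive -> argument (or None)."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def may_store(directives):
    """A cache must not keep the response at all when no-store is present."""
    return "no-store" not in directives


def freshness_lifetime(directives, shared_cache=False):
    """Seconds the response can be reused without revalidation, or None if unspecified."""
    if "no-cache" in directives:
        return 0  # always check with the web server first
    if shared_cache and directives.get("s-maxage") is not None:
        return int(directives["s-maxage"])  # proxy-specific lifetime wins
    if directives.get("max-age") is not None:
        return int(directives["max-age"])
    return None  # fall back to Expires or heuristics


example = parse_cache_control("public, no-cache, must-revalidate")
print(example)
print(freshness_lifetime(example))
```

For the header shown above, the sketch reports a lifetime of zero seconds: the content may be stored, but every use requires a check with the web server.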