The Internet may seem to be always on these days but, let’s be honest, it’s not. There are times and places when even the most modern mobile devices are out of range of the network for one reason or another.
Chapter 4 looked at how to store data locally in the browser so that it can be used without network access. However, if the web page that hosts the application is not available, having the data handy will be of no use.
With more and more of the modern application infrastructure moving into the browser, being able to access this software at any time has become critically important. The problem is that the standard web application assumes that many components, including JavaScript sources, HTML, images, CSS, and so forth, will be loaded with the web page. Using those resources when the user does not have access to the Internet requires that copies of those files be stored locally and used by the browser when needed. HTML5 lets a programmer give the browser a listing (known as a manifest) of files that should be loaded and saved. The browser will be able to access these files even when there is no network connection to the server.
The files listed in the manifest will also be loaded from the local disk even if the browser is online, thus giving the end user the experience of the ultimate content delivery network.
As long as the browser is online when a page is loaded, it will check the manifest file with the server. If the manifest file has changed, the browser will attempt to redownload all the files listed for download in the manifest. Once all the files in the manifest have been downloaded, the browser will update the file cache to show the new files.
The ability to access files while offline was one of the features introduced by Google in Gears. The user provided a manifest as a JSON file, which then directed the browser to load other required files offline. When the browser next visited that page, the files would be loaded from the local disk instead of from the network. When the version field of the manifest file was updated, Gears would check all the files in the manifest for updates.
The HTML5 manifest is similar in idea but somewhat different in implementation. One nice thing about it is that you can implement a manifest in an application without using any JavaScript code, which Gears did require. To create a manifest, add the manifest attribute containing the name of the manifest file to the document's <html> tag (see Example 7-1).
Example 7-1. HTML manifest declaration
<!DOCTYPE HTML>
<html manifest="/cache.manifest">
  <body>
    ...
  </body>
</html>
The manifest file must be served with the MIME type text/cache-manifest. This can be done via the web server configuration files. For the Apache web server, add the following line to the config file; for other web servers, consult the server's documentation. The filename does not matter as long as the file is served with the correct MIME type, but cache.manifest seems to be a good default choice, and this line maps any file ending in .manifest to that type:
AddType text/cache-manifest .manifest
The format of the manifest file is in fact pretty simple. The first line must be just the words CACHE MANIFEST. After that comes a list of files, one per line, to include in the manifest (see Example 7-2). Comments are lines that begin with the pound (#) character.
The manifest will cache HTTP GET requests, while POST, PUT, and DELETE requests will still go to the network. If the page has an active manifest file, all GET requests will be directed to local storage. But for some files, offline access does not make sense. These can include various server resources such as Ajax calls, or collections of documents that could get so large as to overflow the cache area. These files can be included in a NETWORK section of the manifest. Any URLs in the NETWORK section will bypass the cache and load directly from the server. The HTML5 manifest specification requires that any files not included in the manifest be explicitly opted out through the NETWORK section; otherwise the browser will not load them at all.
In other cases, you may wish to provide different content depending on whether the user is offline or online. The manifest provides a FALLBACK section for such resources. On each line of the FALLBACK section, the first entry is loaded from the server when a connection is available, and the second entry is a file that is loaded locally in its place when the connection is not available.
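Both the NETWORK and FALLBACK sections list URL prefixes, not specific files, so it is possible to list entire directories or URL paths here. The only wildcard the format supports is a lone * in the NETWORK section, which lets any resource that is not cached load from the network; patterns such as *.jpg are not supported.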
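To make the format concrete, here is a minimal JavaScript sketch of how the CACHE, NETWORK, and FALLBACK sections described above fit together. The function name parseManifest and the shape of its result are illustrative assumptions, not part of any browser API, and the parser covers only the rules discussed here: it does not resolve relative URLs or handle every edge case in the specification.

```javascript
// Minimal sketch of a manifest reader (hypothetical helper, simplified rules).
function parseManifest(text) {
  const lines = text.split(/\r\n|\r|\n/);
  // The first line must be exactly the words CACHE MANIFEST
  if (!/^CACHE MANIFEST([ \t]|$)/.test(lines[0])) {
    throw new Error('First line must be "CACHE MANIFEST"');
  }
  const result = { cache: [], network: [], fallback: [] };
  let section = 'cache'; // entries before any header are cached
  for (const raw of lines.slice(1)) {
    const line = raw.trim();
    if (line === '' || line.startsWith('#')) continue; // blank line or comment
    if (line === 'CACHE:') { section = 'cache'; continue; }
    if (line === 'NETWORK:') { section = 'network'; continue; }
    if (line === 'FALLBACK:') { section = 'fallback'; continue; }
    const tokens = line.split(/\s+/);
    if (section === 'fallback') {
      // URL prefix first, then the local replacement; malformed lines are skipped
      if (tokens.length >= 2) {
        result.fallback.push({ prefix: tokens[0], fallback: tokens[1] });
      }
    } else {
      result[section].push(tokens[0]);
    }
  }
  return result;
}
```

Keeping FALLBACK entries as pairs makes it easy to ask later which local file stands in for a given URL prefix.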
The browser will update the files in the manifest whenever the manifest file itself changes. There are several ways to handle this. It is possible to add a version number in a comment in the file. If the project is making use of a version control system like Subversion, you can use the version number tag for this.
The problem with using a version number from a version control system is that it requires a programmer to remember to update that file every time any file in the system changes. It would be much better to create an automated system that updates the manifest file whenever a file listed in it changes, and run that script as part of a deployment procedure.
For instance, you could write a script that checks all the files in the manifest for changes and then changes the manifest file itself when one of those files changes. A simple way to do this is to loop over all the files in the manifest, compute an MD5 checksum of each one, and then write a final checksum of those values into the manifest file as a comment. This ensures that any change to a listed file also changes the manifest file.
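The checksum approach can also be sketched in JavaScript for Node.js. This is an illustration under assumptions rather than a replacement for the PHP script later in the chapter: manifestHash and stampManifest are hypothetical helper names, and the "# HASH:" comment simply follows the style of Example 7-4.

```javascript
const crypto = require('crypto');
const fs = require('fs');

function md5(data) {
  return crypto.createHash('md5').update(data).digest('hex');
}

// MD5 each file, then MD5 the concatenated checksums into one value.
// readFileSync throws if a listed file is missing, surfacing bad entries early.
function manifestHash(paths) {
  let hashes = '';
  for (const path of paths) {
    hashes += md5(fs.readFileSync(path));
  }
  return md5(hashes);
}

// Replace (or append) the "# HASH:" comment so any file change alters the manifest
function stampManifest(manifestText, paths) {
  const lines = manifestText
    .replace(/\n+$/, '')
    .split('\n')
    .filter((line) => !line.startsWith('# HASH:'));
  return lines.concat('# HASH: ' + manifestHash(paths)).join('\n') + '\n';
}
```

Run from an editor save hook or a check-in script, stampManifest rewrites only the comment line, which is enough to change the manifest's bytes whenever a listed file changes.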
This script is probably too slow to run from the web server, as running it hundreds of times a second would be overkill. However, it can be efficiently run in the development environment. One option would be to have it run from an editor when a file is saved. Another option is to run it as part of the check-in process for a version control system.
In Example 7-3, we parse a data file and build the manifest from it. The program uses the Symfony YAML library to load the list of files to use as a manifest; it takes a data file in the format of Example 7-5. As a bonus, the program first makes sure that no file is included more than once. It also checks that every file exists, because missing files will break the manifest. By adding an MD5 hash of all the files' checksums as a comment at the end of the manifest, the script makes sure that any updated file will cause a manifest change, so that the browser will update its content. Example 7-3 will output a manifest file like Example 7-4.
Example 7-3. Automatically updating a manifest file
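<?php
// Assumes the sfYaml class from the Symfony YAML component is already loaded.
header('Content-Type: text/cache-manifest');
echo "CACHE MANIFEST\n";

// sfYaml::load() returns the data file as an associative array
$manifest = sfYaml::load('manifest.yml');
$hashes = '';

// Remove duplicate entries so each file appears only once
$files = array_unique($manifest['files']);
foreach ($files as $file) {
    if (file_exists($file)) {
        echo $file . "\n";
        $hashes .= md5_file($file);   // checksum of the file's contents
    } else {
        // A missing file would break the whole manifest, so leave it out
        error_log("Manifest entry not found: $file");
    }
}

echo "\nNETWORK:\n";
foreach ($manifest['network'] as $file) {
    echo $file . "\n";
}

echo "\nFALLBACK:\n";
foreach ($manifest['fallback'] as $entry) {
    echo $entry . "\n";
}

// Any change to any cached file changes this hash, and so the manifest
echo "\n# HASH: " . md5($hashes) . "\n";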
Example 7-4. Manifest with MD5 hash
CACHE MANIFEST
index.html
css/style.css
js/jquery.js
js/myscript.js

NETWORK:
network/file

FALLBACK:
/avatars/ /offline-avatars/offline.png

# HASH: 090c7e8fe42c16777fba844f835e839b
Example 7-5. The data for Example 7-3
files:
  - index.html
  - css/style.css
  - js/jquery.js
  - js/myscript.js
network:
  - network/file
fallback:
  - /avatars/ /offline-avatars/offline.png
The manifest is not always very good about updating when you think it should. Even with a new version of a manifest, it can often take some time for the content in the browser to update. Unless you set the cache control headers, the browser will not download the manifest again until several hours after it last downloaded it. Make sure the cache control headers don't tell the browser to fetch the file only, say, once every five years; instead, use an ETag header or, better yet, have the server send a no-cache header. Be sure to test well.
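If the manifest is served by a Node.js process rather than Apache, both requirements (the text/cache-manifest MIME type and a header that stops the browser from holding on to a stale manifest) can be met in a request handler. This is a sketch: manifestHandler and its default URL path are hypothetical, and Cache-Control: no-cache (which lets the browser keep a copy but makes it revalidate with the server before using it) is one reasonable choice rather than the only one.

```javascript
const fs = require('fs');

// Returns a Node.js (req, res) handler that serves the manifest file.
// manifestPath and urlPath are hypothetical; adjust them to your layout.
function manifestHandler(manifestPath, urlPath = '/cache.manifest') {
  return (req, res) => {
    // Ignore any query string, e.g. /cache.manifest?load=1
    const pathname = req.url.split('?')[0];
    if (pathname !== urlPath) {
      res.writeHead(404, { 'Content-Type': 'text/plain' });
      res.end('Not found');
      return;
    }
    let body;
    try {
      body = fs.readFileSync(manifestPath);
    } catch (err) {
      res.writeHead(404, { 'Content-Type': 'text/plain' });
      res.end('Manifest not found');
      return;
    }
    res.writeHead(200, {
      // The MIME type browsers require for a manifest
      'Content-Type': 'text/cache-manifest',
      // Browser must revalidate with the server before reusing its copy
      'Cache-Control': 'no-cache',
    });
    res.end(body);
  };
}
```

With Node's built-in http module, the handler could be mounted as http.createServer(manifestHandler('./cache.manifest')).listen(8080), where the path and port are placeholders.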
When the browser loads a page with a manifest file, it will fire a checking event on the window.applicationCache object. This event will fire whether or not the page has been visited before.
If the cache has not been seen before, the browser will fire a downloading event and start to download the files. This event will also fire if the manifest file has changed. If the manifest has not changed, the browser will fire a noupdate event.
As the browser downloads the files, it fires a series of progress events. These can be used if you wish to provide some form of feedback to the user to let her know that software is downloading.
Once all the files have downloaded, the cached event is fired.
If anything goes wrong, the browser will fire the error event. This can be caused by a problem in the HTML page, a defective manifest, or a failure to download the manifest or any resource listed in it. Normally, if a single file listed in the manifest is missing, the cache won't download any of the files in the manifest. When a manifest changes and ends up including a bad link, the previously cached version of the files will be retained. If there was no existing cache at the time the erroneous manifest is downloaded, the browser will not create an incomplete offline store, but will continue to rely on the network.
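The events above can be wired up in a few lines of JavaScript. This sketch assumes a browser that exposes window.applicationCache; the report callback and the message wording are hypothetical. The loaded and total fields are read from a progress event only when lengthComputable is set, and because the extra fields on the error event vary between browsers, the error handler falls back to the event type.

```javascript
// Sketch: report each application cache event to a callback.
// `cache` is expected to be window.applicationCache (or any EventTarget);
// `report(type, message)` is a hypothetical callback supplied by the page.
function watchAppCache(cache, report) {
  const messages = {
    checking: 'Checking the manifest for changes',
    downloading: 'Downloading the files listed in the manifest',
    noupdate: 'Manifest unchanged; using the cached files',
    cached: 'All files cached; the application can run offline',
  };
  for (const [type, message] of Object.entries(messages)) {
    cache.addEventListener(type, () => report(type, message));
  }
  // Fired as files download; loaded/total count files when lengthComputable is true
  cache.addEventListener('progress', (event) => {
    const detail = event.lengthComputable
      ? `${event.loaded} of ${event.total} files`
      : 'Downloaded a file';
    report('progress', detail);
  });
  // Error event fields differ between browsers, so fall back to the event type
  cache.addEventListener('error', (event) => {
    report('error', 'Application cache error: ' + (event.message || event.type));
  });
}
```

In a page, a call such as watchAppCache(window.applicationCache, (type, msg) => console.log(type, msg)) would log each stage; a real application would more likely update a status element.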
However, not all browsers or browser versions may handle erroneous manifests in exactly the way just described. This can be a very hard error to catch, because there may be very little visible evidence of what went wrong. So it is a good idea to have an automatic test that validates every URL in the manifest, and to listen for the error event in your JavaScript and report it to the user.
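An automatic check for bad links can be as simple as confirming that every CACHE entry maps to a file in the deployed site. The helper below is a hypothetical Node.js sketch that assumes entries are site-relative paths under a document root; it deliberately skips absolute and protocol-relative URLs, which would need a real network request to verify.

```javascript
const fs = require('fs');
const path = require('path');

// Returns the CACHE entries that have no matching file under docRoot.
function findMissing(cacheEntries, docRoot) {
  return cacheEntries.filter((entry) => {
    // Absolute and protocol-relative URLs are not checked here
    if (/^[a-z]+:\/\//i.test(entry) || entry.startsWith('//')) return false;
    // Drop any query string or fragment before mapping to a file path
    const relative = entry.split(/[?#]/)[0];
    return !fs.existsSync(path.join(docRoot, relative));
  });
}
```

Failing the build when findMissing returns a non-empty list catches a bad entry before it reaches users.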
In Google Chrome, the Developer Tools can show a list of files in the manifest (see Figure 7-1). Under the Storage tab, the Application Cache item will show the status of various items.
It is a good idea during development to turn off the manifest file, and enable it only when the project is ready to go live. Using the cache can make it very hard to develop the application if changes don’t appear quickly.
Manifest files provide a particular debugging challenge. They can be the source of several special classes of bugs.
The first and most obvious bug is leaving files out of the manifest. If a file is included in the page but not listed in the manifest, the page will not load it, in the same way a file missing from the server will not be downloaded.
Many Selenium tests will not explicitly test for correct styles and the presence of images, so it is quite possible that an application missing a CSS file or image will still work to the extent that it is normally tested in Selenium. In an application that includes resources from outside web servers, those must also be whitelisted in the manifest file.
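Because a Selenium run may not notice a missing stylesheet or image, it can help to compare the URLs a page actually requests with what the manifest allows. This sketch is hypothetical: it assumes you have already collected the requested URLs (for example, from server logs during a test run) and represented the manifest as { cache, network, fallback }, with fallback entries as { prefix, fallback } pairs. It applies prefix matching to NETWORK and FALLBACK entries and treats a lone * in NETWORK as allowing everything; URLs are compared as plain strings, so both lists must be written in the same form.

```javascript
// Sketch: list requested URLs that the application cache would not serve.
// `manifest` is a hypothetical plain object:
//   { cache: [...], network: [...], fallback: [{ prefix, fallback }] }
function uncoveredResources(requested, manifest) {
  const cache = manifest.cache || [];
  const network = manifest.network || [];
  const fallback = manifest.fallback || [];
  const openNetwork = network.includes('*'); // lone * allows any URL online
  return requested.filter((url) => {
    if (cache.includes(url)) return false; // explicitly cached
    if (openNetwork) return false; // wildcard whitelist
    if (network.some((prefix) => url.startsWith(prefix))) return false; // NETWORK
    if (fallback.some((entry) => url.startsWith(entry.prefix))) return false; // FALLBACK
    return true; // not covered: the page would fail to load it
  });
}
```

Any URL this returns, such as a stylesheet or an image from an outside server, is a candidate for the CACHE or NETWORK section.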
A further complication is that some browsers, including Firefox, make the application cache an opt-in feature. So a Selenium test may not opt into it, which would make the entire test moot. In order to test this in Firefox, it will be necessary to set up a Firefox profile in which the application cache is on by default. To do this:
1. Quit Firefox completely.
2. Start up Firefox from a command line with the -profileManager switch. This will result in a dialog similar to that shown in Figure 7-2. Save the custom profile.
3. Restart Firefox. Go to the Firefox Options menu, select the Advanced tab and under that the Network tab (see Figure 7-3), and turn off the "Tell me when a website asks to store data for offline use" option.
Now, when starting up the Selenium RC server, pass the directory of the saved profile to the -firefoxProfileTemplate option:
java -jar selenium-server.jar -firefoxProfileTemplate <profile-directory>
For full details on Firefox profiles, see http://support.mozilla.com/en-US/kb/Managing+profiles.
A second class of problems can occur when the manifest is updated and the browser does not reflect the update. Normally, it will take a minute or two after loading a page for the browser to update the file cache, and the browser will not check the cache until the page is loaded. So if the server is updated, the browser will not have the new version until the user visits the page. This can cause problems if there has been an update on the server that will cause the application in the browser to fail, such as a change in the format of how data is sent between the client and server.
When the user visits the page (assuming of course that the browser is online), the browser will fetch the manifest file from the server. However, if the manifest file has a cache control header set on it, the browser may not check for a new version of the manifest. For example, if the file has a header that says the browser should check for updates only once a year (as is sometimes common on web servers), the browser will not reload the manifest file. So it is very important to ensure that the manifest file itself is not cached by the browser, or if it is cached it is done only via an ETag.
You can also keep the browser from using a cached copy of the manifest file by referencing it with a query string attached, as in cache.manifest?load=1. If the manifest file is a static text file, the server will ignore the query string, but the browser will not know that; it treats the URL as new and requests a fresh copy from the server.
Different web browsers, and even different versions of a single browser, may update the manifest file somewhat differently. So it is very important to test any application using a manifest file very carefully across different browsers and browser versions.