Basics of caching

Caching is a complex, multi-faceted technique when done correctly. Implementing it in your application should not be the hard part; the thinking beforehand, about what to cache and when, should be. Any web application has many different aspects, layers, and types of caching, as well as combinations of them. This recipe gives a short overview of the different types of caching and how to use them.

You can find the source code of this example in the chapter2/caching-general directory.

Getting ready

First, it is important that you understand where caching can happen: inside and outside of your Play application. So let's start by looking at the caching possibilities of the HTTP protocol. HTTP sometimes looks like a simple protocol, but it is tricky in the details. However, it is one of the most proven protocols on the Internet, so it is always useful to rely on its functionality.

HTTP allows the caching of contents by setting specific headers in the response. There are several headers which can be set:

  • Cache-Control: This header tells the client, and all the proxies in between, whether and for how long a response may be cached. Both must parse and honor it.
  • Last-Modified: This adds a timestamp stating when the requested resource was last changed. On the next request the client may send an If-Modified-Since header with this date, and the server may then return an HTTP 304 code without sending any data back.
  • ETag: An ETag serves the same purpose as a Last-Modified header, but instead of a timestamp it carries a value chosen by the server, typically a hash calculated from the resource behind the requested URL. This means the server can decide when a resource has changed and when it has not. ETags can also be used for a form of optimistic locking.
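
The interplay of these validators can be sketched in plain Java. This is a minimal illustration of the general HTTP rules, not Play's own check; the class and method names are hypothetical, and a null argument stands for a header the client did not send.

```java
import java.time.Instant;

/** Hypothetical helper; the names are illustrative and not part of Play. */
final class ConditionalGet {

    /**
     * Returns 304 when the client's validators show that its copy is still
     * current, otherwise 200. A null argument means the header was not sent.
     */
    static int statusFor(String currentEtag, Instant lastModified,
                         String ifNoneMatch, Instant ifModifiedSince) {
        if (ifNoneMatch != null) {
            // When an ETag is sent, it takes precedence over the date.
            return ifNoneMatch.equals(currentEtag) ? 304 : 200;
        }
        if (ifModifiedSince != null && !lastModified.isAfter(ifModifiedSince)) {
            // The resource has not changed since the client fetched its copy.
            return 304;
        }
        // No usable validator: send the full response.
        return 200;
    }
}
```

A call such as ConditionalGet.statusFor("abc", lastModified, "abc", null) yields 304, while a request without any validator always yields 200.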

So, this is a type of caching over which the requesting client has some influence. There are also other forms of caching that happen purely on the server side. In most other Java web frameworks, the HttpSession object is a classic example of this.

Play has a cache mechanism on the server side. It should be used to store big session data, that is, any data exceeding the 4 KB maximum cookie size. Be aware that there is a semantic difference between a cache and a session: you cannot rely on the data being in the cache, so you need to handle cache misses.
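
Handling a cache miss usually means falling back to the source of truth and re-populating the cache. The following is a minimal sketch of that pattern using only the JDK; ExpiringCache is a hypothetical stand-in written for illustration and is not Play's Cache class.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Hypothetical stand-in for a server-side cache; not Play's Cache class. */
final class ExpiringCache {

    private static final class Entry {
        final Object value;
        final long expiresAtMillis;

        Entry(Object value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry> entries = new ConcurrentHashMap<>();

    void set(String key, Object value, long ttlMillis) {
        entries.put(key, new Entry(value, System.currentTimeMillis() + ttlMillis));
    }

    /** Returns the cached value, or null on a miss or after expiry. */
    Object get(String key) {
        Entry entry = entries.get(key);
        if (entry == null) {
            return null;
        }
        if (entry.expiresAtMillis <= System.currentTimeMillis()) {
            // Drop the expired entry, but only if nobody replaced it meanwhile.
            entries.remove(key, entry);
            return null;
        }
        return entry.value;
    }

    /** Treats a miss as normal: reload the value and cache it again. */
    Object getOrLoad(String key, long ttlMillis, Supplier<Object> loader) {
        Object value = get(key);
        if (value == null) {
            value = loader.get();
            set(key, value, ttlMillis);
        }
        return value;
    }
}
```

The important part is getOrLoad(): the calling code never assumes that a value is present, so an eviction or a restart of the cache only costs one extra load.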

You can use the Cache class in your controller and model code. The great thing about it is that it is an abstraction of a concrete cache implementation. If you only use one node for your application, you can use the built-in Ehcache for caching. As soon as your application needs more than one node, you can configure memcached in your application.conf and there is no need to change any of your code.
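
As a rough sketch, switching to memcached in a Play 1.x application.conf could look like the following fragment. The host and port are assumptions for a memcached instance running locally on its default port; adjust them to your setup.

```
# application.conf (sketch; host and port are assumed values)
memcached=enabled
memcached.host=127.0.0.1:11211
```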

Furthermore, you can also cache snippets of your templates. For example, there is no need to reload the portal page of a user on every request when you can cache it for 10 minutes.

This also leads to a very simple truth. Caching gives you a lot of speed and might even lower your database load in some cases, but it is not free. Caching means you need RAM, lots of RAM in most cases. So make sure the system you are caching on never needs to swap, otherwise you could read the data from disk anyway. This can be a special problem in cloud deployments, as there are often limitations on available RAM.

The following examples show how to utilize the different caching techniques. The accompanying test covers four different use cases of caching:

public class CachingTest extends FunctionalTest {

    @Test
    public void testThatCachingPagePartsWork() {
        Response response = GET("/");
        String cachedTime = getCachedTime(response);
        assertEquals(getUncachedTime(response), cachedTime);

        response = GET("/");
        String newCachedTime = getCachedTime(response);
        assertNotSame(getUncachedTime(response), newCachedTime);
        assertEquals(cachedTime, newCachedTime);
    }

    @Test
    public void testThatCachingWholePageWorks() throws Exception {
        Response response = GET("/cacheFor");
        String content = getContent(response);
        response = GET("/cacheFor");
        assertEquals(content, getContent(response));
        Thread.sleep(6000);
        response = GET("/cacheFor");
        assertNotSame(content, getContent(response));
    }

    @Test
    public void testThatCachingHeadersAreSet() {
        Response response = GET("/proxyCache");
        assertIsOk(response);
        assertHeaderEquals("Cache-Control", "max-age=3600", response);
    }

    @Test
    public void testThatEtagCachingWorks() {
        Response response = GET("/etagCache/123");
        assertIsOk(response);
        assertContentEquals("Learn to use etags, dumbass!", response);

        Request request = newRequest();

        String etag = String.valueOf("123".hashCode());
        Header noneMatchHeader = new Header("if-none-match", etag);
        request.headers.put("if-none-match", noneMatchHeader);

        DateTime ago = new DateTime().minusHours(12);
        String agoStr = Utils.getHttpDateFormatter().format(ago.toDate());
        Header modifiedHeader = new Header("if-modified-since", agoStr);
        request.headers.put("if-modified-since", modifiedHeader);

        response = GET(request, "/etagCache/123");
        assertStatus(304, response);
    }


    private String getUncachedTime(Response response) {
        return getTime(response, 0);
    }

    private String getCachedTime(Response response) {
        return getTime(response, 1);
    }

    private String getTime(Response response, int pos) {
        assertIsOk(response);
        String content = getContent(response);
        // Each date is rendered on its own line of the response body
        return content.split("\n")[pos];
    }
}

The first test checks for a very nice feature. Since Play 1.1, you can cache parts of a page or, more exactly, parts of a template. This test opens a URL whose page returns the current date and the date from such a cached template part, which is cached for 5 seconds. In the first request, when the cache is empty, both dates are equal. If you repeat the request, the first date is the current one while the second date is the cached one.

The second test puts the whole response in the cache for 5 seconds. In order to ensure that expiration works as well, this test waits for six seconds and retries the request.

The third test ensures that the correct headers for proxy-based caching are set.

The fourth test uses an HTTP ETag for caching. If the If-Modified-Since and If-None-Match headers are not supplied, it returns a string. When these headers are sent with the correct ETag (in this case the hashCode of the string 123) and a date from 12 hours before, a 304 Not Modified response should be returned.

How to do it...

Add four simple routes to the configuration as shown in the following code:

GET     /                   Application.index
GET     /cacheFor           Application.indexCacheFor
GET     /proxyCache         Application.proxyCache
GET     /etagCache/{name}   Application.etagCache

The application class features the following controllers:

public class Application extends Controller {

    public static void index() {
        Date date = new Date();
        render(date);
    }

    @CacheFor("5s")
    public static void indexCacheFor() {
        Date date = new Date();
        renderText("Current time is: " + date);
    }

    public static void proxyCache() {
        response.cacheFor("1h");
        renderText("Foo");
    }

    @Inject
    private static EtagCacheCalculator calculator;

    public static void etagCache(String name) {
        Date lastModified = new DateTime().minusDays(1).toDate();
        String etag = calculator.calculate(name);
        if(!request.isModified(etag, lastModified.getTime())) {
            throw new NotModified();
        }
        response.cacheFor(etag, "3h", lastModified.getTime());
        renderText("Learn to use etags, dumbass!");
    }
}

As you can see in the controller, the class to calculate ETags is injected into the controller. This is done on startup with a small job as shown in the following code:

@OnApplicationStart
public class InjectionJob extends Job implements BeanSource {

    private Map<Class, Object> clazzMap = new HashMap<Class, Object>();

    public void doJob() {
        clazzMap.put(EtagCacheCalculator.class, new EtagCacheCalculator());
        Injector.inject(this);
    }

    public <T> T getBeanOfType(Class<T> clazz) {
        return (T) clazzMap.get(clazz);
    }
}

The calculator itself is as simple as possible:

public class EtagCacheCalculator implements ControllerSupport {

    public String calculate(String str) {
        return String.valueOf(str.hashCode());
    }
}

The last piece needed is the template of the index() controller, which looks like this:

Current time is: ${date}
#{cache 'mainPage', for:'5s'}
Current time is: ${date}
#{/cache}

How it works...

Let's check the functionality per controller call. The index() controller has no special treatment inside the controller. The current date is put into the template and that's it. However, the caching logic is in the template here, because only a part of the returned data should be cached rather than the whole, and for that a #{cache} tag is used. The tag requires two arguments. The first parameter defines the key used inside the cache, while the for parameter sets how long the entry stays in the cache before it expires. This allows pretty interesting things. Whenever you are in a page where something is exclusively rendered for a user (like their portal entry page), you could cache it with a key that includes the user name or the session ID, like this:

#{cache 'home-' + connectedUser.email, for:'15min'}
${user.name}
#{/cache}

This kind of caching is completely transparent to the user, as it exclusively happens on the server side. The same applies for the indexCacheFor() controller. Here, the whole page gets cached instead of parts inside the template. This is a pretty good fit for non-personalized, high performance delivery of pages, which often are only a very small portion of your application. However, you have to think about caching before you reach the template. If you do a time-consuming JPA calculation and only then reuse a cached result in the template, you have still wasted CPU cycles and saved only some rendering time.

The third controller call, proxyCache(), is actually the simplest of all. It just sets the proxy expiry header called Cache-Control. Setting this in your code is optional, because Play is configured to set it as well when the http.cacheControl parameter in your application.conf is set. Be aware that this works only in production mode, and not in development mode.
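
The relation between the "1h" passed to response.cacheFor() and the max-age=3600 value asserted in the test can be illustrated with a small conversion. This is a hypothetical sketch for illustration only; Play ships its own duration parsing, which this code does not reproduce, and the set of supported suffixes here is an assumption.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper; it does not reproduce Play's own duration parsing. */
final class CacheControlHeader {

    // Assumed suffixes: seconds, minutes ("mn" or "min"), hours, and days.
    private static final Pattern DURATION = Pattern.compile("(\\d+)(s|mn|min|h|d)");

    /** Converts a duration such as "5s", "15min", "1h", or "2d" to seconds. */
    static int toSeconds(String duration) {
        Matcher matcher = DURATION.matcher(duration);
        if (!matcher.matches()) {
            throw new IllegalArgumentException("Unsupported duration: " + duration);
        }
        int amount = Integer.parseInt(matcher.group(1));
        switch (matcher.group(2)) {
            case "s":
                return amount;
            case "mn":
            case "min":
                return amount * 60;
            case "h":
                return amount * 60 * 60;
            default: // "d" is the only remaining suffix the pattern accepts
                return amount * 60 * 60 * 24;
        }
    }

    /** Builds the Cache-Control value for the given duration. */
    static String maxAge(String duration) {
        return "max-age=" + toSeconds(duration);
    }
}
```

With this sketch, CacheControlHeader.maxAge("1h") produces the string max-age=3600, the same value the third test expects in the response header.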

The most complex controller is the last one. The first action is to find out the last modified date of the data you want to return. In this case it is 24 hours ago. Then the ETag needs to be created somehow. In this case, the calculator gets a String passed. In a real-world application you would more likely pass the entity and the service would extract some properties of it, which are used to calculate the ETag by using a pretty-much collision-safe hash algorithm. After both values have been calculated, you can check in the request whether the client needs to get new data or may use the old data. This is what happens in the request.isModified() method.

If the client either did not send all required headers or an older timestamp was used, real data is returned; in this case, a simple string advising you to use an ETag the next time. Furthermore, the calculated ETag and a maximum expiry time are also added to the response via response.cacheFor().
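
String.hashCode() is convenient for a demo, but it collides easily; for example, "Aa" and "BB" share the same hash code. A more collision-resistant calculation could hash selected entity properties with SHA-256, as in the following sketch. The class name and the choice of properties are assumptions made for illustration; this is not the EtagCacheCalculator of this recipe.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical ETag calculator based on SHA-256 instead of String.hashCode(). */
final class Sha256Etag {

    /** Hashes the given property values, in order, into a hex string. */
    static String calculate(String... properties) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            for (String property : properties) {
                byte[] bytes = property.getBytes(StandardCharsets.UTF_8);
                // Prefix each value with its length so that ("ab", "c") and
                // ("a", "bc") do not produce the same input stream.
                digest.update(Integer.toString(bytes.length).getBytes(StandardCharsets.UTF_8));
                digest.update((byte) ':');
                digest.update(bytes);
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-256.
            throw new IllegalStateException(e);
        }
    }
}
```

In a real application you might pass the entity ID together with a version or modification timestamp, so that any saved change results in a new ETag.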

A last specialty in the etagCache() controller is the use of the EtagCacheCalculator. The implementation does not matter in this case, except that it must implement the ControllerSupport interface. However, the initialization of the injected class is still worth a mention. If you take a look at the InjectionJob class, you will see the creation of the class in the doJob() method on startup, where it is put into a local map. Also, the Injector.inject() call does the magic of injecting the EtagCacheCalculator instance into the controllers. As a result of implementing the BeanSource interface, the getBeanOfType() method tries to get the corresponding class out of the map. The map actually should ensure that only one instance of this class exists.
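
The map-based lookup can be illustrated without any framework classes. The following minimal sketch uses a hypothetical BeanRegistry that mirrors the idea of InjectionJob but does not implement Play's BeanSource interface; it shows how one instance per type is kept and how Class.cast() avoids the unchecked cast.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical registry; it mirrors the idea of InjectionJob without Play. */
final class BeanRegistry {

    // One entry per type, so each type resolves to exactly one instance.
    private final Map<Class<?>, Object> beans = new HashMap<>();

    <T> void register(Class<T> type, T instance) {
        beans.put(type, instance);
    }

    /** Returns the registered instance, or null if none was registered. */
    <T> T getBeanOfType(Class<T> type) {
        // Class.cast performs a checked cast and passes null through.
        return type.cast(beans.get(type));
    }
}
```

Every lookup for the same type returns the same object, which is the property the recipe relies on when the calculator is injected into the controller.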

There's more...

Caching is deeply integrated into the Play framework as it is built with the HTTP protocol in mind. If you want to find out more about it, you will have to examine core classes of the framework.

More information in the ActionInvoker

If you want to know more details about how the @CacheFor annotation works in Play, you should take a look at the ActionInvoker class in the framework source.

Be thoughtful with ETag calculation

ETag calculation is costly, especially if you are calculating more than the last-modified stamp. You should think about performance here. Perhaps it would be useful to calculate the ETag after saving the entity and to store it directly with the entity in the database. If you are using ETags, it is worth running some tests to ensure high performance. In case you want to know more about ETag functionality, you should read RFC 2616.

You can also disable the creation of ETags totally, if you set http.useETag=false in your application.conf.

Use a plugin instead of a job

The job that implements the BeanSource interface is not a very clean solution to the problem of calling Injector.inject() on start up of an application. It would be better to use a plugin in this case.

See also

The cache in Play is quite versatile and should be used as such. We will see more about it in all the recipes in this chapter. However, none of this will be implemented as a module, as it should be. This will be shown in Chapter 6, Practical Module Examples.
