Chapter 12. Caching in Hibernate

Caching is one of the most important features an application can implement for better performance. From an ORM perspective, data retrieved from a database is cached in memory or on disk so that there is no need to make a call to the database for every request. A cache is a local copy of the information from the database that may be used to avoid a database call whenever:

  • The application performs a lookup by identifier.

  • The persistence layer resolves an association or collection lazily.

As Figure 12-1 shows, the first time an application queries for data, Hibernate fetches it from the database; from then on, it serves the data from the cache whenever the same data is requested.


Figure 12-1. How Hibernate caches

Hibernate provides a way to configure caching at the class level, at the collection level, and also at the query result-set level. Cached data is available at three different scopes in an application:

  • Transaction scope: As you've seen in earlier chapters, a transaction is defined as a unit of work. How does caching affect data in a transaction? If data is fetched in one session, and the same query is executed again in that same session before that unit of work is completed, is that data stored in memory/disk? Does this avoid a call to the database? Yes. This data is stored in memory/disk by default, so the call to the database for the second query is avoided. Because a unit of work is for one request, this data isn't shared or accessed concurrently.

  • Process scope: Data is cached across sessions or across units of work. If a query is executed in one session and the same query is executed in a second session after closing the first, this data is cached so that the second call to the database is avoided. Because two requests can access the same data in these two sessions, this data is accessed concurrently. You should use this scope with caution.

  • Cluster scope: Data is shared between multiple processes on the same machine or different machines in a cluster.

A process-scoped cache makes data retrieved from the database in one unit of work visible to another unit of work. This may have some unwanted consequences that you need to avoid. If an application has non-exclusive access to the database (that is, other applications can change the same data concurrently), you generally shouldn't use process scope, because the cached copy can become stale without the application noticing. In that case, use process scope only for data that doesn't change often and that can be refreshed safely when the local cache expires.

Any application that is designed to be scalable should support caching at the cluster level. Process scope doesn't maintain consistency of data between machines, so you should use cluster-level caching.

Hibernate has a two-level cache architecture:

  • The first-level cache is the persistence context cache. This is at the unit-of-work level. It corresponds to one session in Hibernate for a single request and is by default enabled for the Hibernate session.

  • The second-level cache is either at the process scope or the cluster scope. This is the cache of the state of the persistence entities. A cache-concurrency strategy defines the transaction isolation details for a particular item of data, and the cache provider represents the physical cache implementation.

Hibernate also implements caching for query result sets. This requires two additional physical cache regions that hold the cached query result sets and the timestamp when a table was last updated.

This chapter first shows how to use and configure the second-level cache and then looks at the query-level cache in Hibernate.

Using the Second-Level Cache in Hibernate

At the second-level cache, all persistence contexts that have been started from the same SessionFactory share the same cached data. Different kinds of data require different cache policies: for example, how long the cache should be maintained before evicting data, at what scope the data should be cached, whether the data should be cached in memory, and so on. The cache policy involves setting the following:

  • Whether the second-level cache is enabled

  • The Hibernate concurrency strategy

  • Cache expiration policies (such as timeout, least recently used, and memory sensitive)

  • The physical format of the cache (memory, indexed files, or cluster-replicated)

To reiterate, the second-level cache is more useful when there is less data that is often updated. In other words, it's useful when there is more read-only data.

You set up the second-level cache in two steps. First, you have to decide on the concurrency strategy; then, you configure the cache expiration and physical cache attributes using the cache provider.

Concurrency Strategies

There are four built-in concurrency strategies:

  • Transactional: This strategy should be used for data that is read far more often than it is updated. It prevents stale data when transactions run concurrently. It's equivalent to the repeatable-read isolation level.

  • Read-write: This maintains a read-committed isolation level.

  • Non-strict read-write: This doesn't guarantee that you won't read stale data. It doesn't guarantee consistency between cache and database.

  • Read-only: This is appropriate for data that never changes.

Cache Providers

The following built-in providers are available in Hibernate:

  • EHCache: This is an open source standards-based cache that supports process-scope cache. It can cache in memory or to disk, and it supports query cache. EHCache Distributed supports the cluster-scope cache. It has its own configuration file (ehcache.xml) where all the required parameters are set for caching. You can check out the tutorial at http://ehcache.org/.

  • OSCache: This is also an open source caching provider that writes to memory and disk. It also has cluster cache support. The URL www.opensymphony.com/oscache/ has documentation.

  • SwarmCache: This is a simple but effective distributed cache. It uses IP multicast to communicate with any number of hosts in a distributed system. The documentation is at http://swarmcache.sourceforge.net.

  • JBoss Cache: This is a distributed cache system. It replicates data across different nodes in a distributed system. State is always kept in sync with other servers in the system. The documentation is at www.jboss.org/jbosscache/.

Table 12-1 summarizes these cache providers.

Table 12-1. Cache Providers

Cache       | Provider Class                             | Type                    | Cluster Safe | Query Cache Supported
------------|--------------------------------------------|-------------------------|--------------|----------------------
Hashtable   | org.hibernate.cache.HashtableCacheProvider | Memory                  | Yes          |
EHCache     | org.hibernate.cache.EhCacheProvider        | Memory, disk            | Yes          |
OSCache     | org.hibernate.cache.OSCacheProvider        | Memory, disk            | Yes          |
JBoss Cache | org.hibernate.cache.JBossCacheProvider     | Clustered, ip-multicast | Yes          |
SwarmCache  | org.hibernate.cache.SwarmCacheProvider     | Clustered, ip-multicast | Yes          |

You can develop an adapter for your application by implementing org.hibernate.cache.CacheProvider. All of the listed cache providers implement this interface.

Not every cache provider is compatible with every concurrency strategy. The compatibility matrix is given in Table 12-2.

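Table 12-2. Cache Providers' Concurrency Strategies

Cache Provider | Read-Only | Non-Strict Read-Write | Read-Write | Transactional
---------------|-----------|-----------------------|------------|--------------
EHCache        | Yes       | Yes                   | Yes        |
OSCache        | Yes       | Yes                   | Yes        |
SwarmCache     | Yes       | Yes                   | Yes        |
JBoss Cache    | Yes       |                       |            | Yes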

You can configure the cache for a specific class or a collection. The <cache> element is used to configure caching inside a class or collection mapping in the mapping file (in hibernate.cfg.xml, the corresponding elements are <class-cache> and <collection-cache>). The <cache> element is defined as follows:

<cache  usage="transactional|read-write|nonstrict-read-write|read-only"
        region="RegionName" include="all|non-lazy"/>

Table 12-3 defines its attributes.

Table 12-3. <cache> Element Attributes

Attribute | Description
----------|------------
usage     | The caching strategy.
region    | The region name. Region names are references to actual cached data.
include   | Either all (the default) or non-lazy. The value non-lazy specifies that properties of the entity mapped with lazy="true" can't be cached when attribute-level lazy fetching is enabled.

What Are Cache Regions?

Cache regions are handles by which you can reference classes and collections in the cache provider configuration and set the expiration policies applicable to that region. Regions are buckets of data of two types: one type contains disassembled data of entity instances, and the other contains only identifiers of entities that are linked through a collection.

The name of the region is the class name in the case of a class cache or the class name together with the property name in the case of a collection cache.

Caching Query Results

A query's result set can be configured to be cached. By default, query caching is disabled, and every HQL, JPA QL, and Criteria query hits the database. You enable the query cache as follows:

hibernate.cache.use_query_cache = true

In addition to setting this configuration property, you should use the org.hibernate.Query interface:

Query bookQuery = session.createQuery("from Book book where book.name < :name");
bookQuery.setString("name", "HibernateRecipes");
bookQuery.setCacheable(true);

The setCacheable() method enables the result to be cached.

Using the First-Level Cache

Problem

What is the first-level cache, and how is it used in Hibernate?

Solution

The first-level cache is at the transaction level or the unit of work. It's enabled by default in Hibernate. Caching at the first level is associated with a session. If the same query is executed multiple times in the same session, the data associated with the query is cached.

How It Works

The general concept of caching persistent objects is that when an object is first read from external storage, a copy of it is stored in the cache. Subsequent readings of the same object can be retrieved from the cache directly. Because caches are typically stored in memory or on local disk, it's faster to read an object from cache than external storage. If you use it properly, caching can greatly improve the performance of your application.

As a high-performance ORM framework, Hibernate supports the caching of persistent objects at different levels. Suppose you retrieve an object more than once within a session: does Hibernate query the database as many times as the query is invoked?

Session session = factory.openSession();
try {
        Book book1 = (Book) session.get(Book.class, id);
        Book book2 = (Book) session.get(Book.class, id);
} finally {
        session.close();
}

If you inspect the SQL statements executed by Hibernate, you find that only one database query is made. That means Hibernate is caching your objects in the same session. This kind of caching is called first-level caching, and its caching scope is a session.

But how about getting an object with same identifier more than once in two different sessions?

Session session1 = factory.openSession();
try {
        Book book1 = (Book) session1.get(Book.class, id);
} finally {
        session1.close();
}
Session session2 = factory.openSession();
try {
        Book book2 = (Book) session2.get(Book.class, id);
} finally {
        session2.close();
}

Two database queries are made. That means Hibernate isn't caching the persistent objects across different sessions by default. You need to turn on this second-level caching, whose caching scope is a session factory.
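The behavior in the two listings above can be imitated without Hibernate. The following is only a minimal sketch in plain Java, not Hibernate's implementation: BookTable and SketchSession are hypothetical names invented for this illustration, and the "database" is just a counter. It shows why a session-scoped cache saves the second lookup inside one session but not across sessions.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the BOOK table; it counts how often it is queried.
class BookTable {
    int queries = 0;

    String loadName(long id) {
        queries++;
        return "Book #" + id;
    }
}

// Hypothetical session: its cache is created with the session and discarded with it.
class SketchSession {
    private final BookTable table;
    private final Map<Long, String> firstLevel = new HashMap<>();

    SketchSession(BookTable table) {
        this.table = table;
    }

    String get(long id) {
        // Query the table only if this session has not loaded the id before.
        return firstLevel.computeIfAbsent(id, table::loadName);
    }
}

class FirstLevelCacheDemo {
    public static void main(String[] args) {
        BookTable table = new BookTable();

        SketchSession session1 = new SketchSession(table);
        session1.get(1L);
        session1.get(1L);                    // answered by session1's own cache
        System.out.println(table.queries);

        SketchSession session2 = new SketchSession(table);
        session2.get(1L);                    // a new session starts with an empty cache
        System.out.println(table.queries);
    }
}
```

Running the demo prints the query count after each session, which makes the difference between the two scenarios visible: the count stays at one inside the first session and grows once the second session repeats the lookup.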

Configuring the Second-Level Cache

Problem

What is the second-level cache, and how is it configured and used in Hibernate?

Solution

The second-level cache is configured at the process level or the cluster level. In Hibernate, you can configure it for a particular class or for a collection to improve performance. You can use the second-level cache with large and complex object graphs that may be loaded often. It's associated with one SessionFactory and can be reused in multiple Sessions created from the same SessionFactory.

How It Works

To turn on the second-level cache, the first step is to choose a cache provider in the Hibernate configuration file hibernate.cfg.xml. Hibernate supports several cache implementations, such as EHCache, OSCache, SwarmCache and JBoss Cache, as discussed earlier. In a nondistributed environment, you can choose EHCache, which is Hibernate's default cache provider:

<hibernate-configuration>
<session-factory>
...
<property name="cache.provider_class">org.hibernate.cache.EhCacheProvider</property>
...
</session-factory>
</hibernate-configuration>

You can configure EHCache through the configuration file ehcache.xml, located in the source root folder. You can specify different settings for different cache regions, which store different kinds of objects. If the parameter eternal is set to false, elements expire after a period of time. The parameter timeToIdleSeconds is the maximum time, in seconds, that may pass between accesses before an element expires. The parameter timeToLiveSeconds is the maximum time, in seconds, between an element's creation and its expiration. Both parameters are used only if the element isn't eternal:

<ehcache>
        <diskStore path="java.io.tmpdir" />
        <defaultCache maxElementsInMemory="10000" eternal="false"
                timeToIdleSeconds="120" timeToLiveSeconds="120"
                overflowToDisk="true" />
        <cache name="com.metaarchit.bookshop.Book" maxElementsInMemory="10000"
                eternal="false" timeToIdleSeconds="300" timeToLiveSeconds="600"
                overflowToDisk="true" />
</ehcache>
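To make the interaction of eternal, timeToIdleSeconds, and timeToLiveSeconds more concrete, here is a small sketch of an expiry check in plain Java. It is not EHCache code: ExpiringEntry is a hypothetical class, time is passed in as plain seconds, and the exact boundary comparison (strictly greater than) is an assumption made for the illustration.

```java
// Hypothetical cache entry that remembers when it was created and last accessed (in seconds).
class ExpiringEntry {
    private final long createdAt;
    private long lastAccessedAt;

    ExpiringEntry(long now) {
        this.createdAt = now;
        this.lastAccessedAt = now;
    }

    // Record an access; this resets the idle time but not the age.
    void touch(long now) {
        lastAccessedAt = now;
    }

    boolean isExpired(long now, boolean eternal, long timeToIdle, long timeToLive) {
        if (eternal) {
            return false;                                     // eternal entries never expire
        }
        boolean idleTooLong = now - lastAccessedAt > timeToIdle;
        boolean livedTooLong = now - createdAt > timeToLive;
        return idleTooLong || livedTooLong;
    }
}

class ExpirySketchDemo {
    public static void main(String[] args) {
        // Settings of the Book region above: idle limit 300 s, lifetime limit 600 s.
        ExpiringEntry entry = new ExpiringEntry(0);
        entry.touch(250);
        System.out.println(entry.isExpired(500, false, 300, 600));  // idle 250 s, age 500 s
        System.out.println(entry.isExpired(601, false, 300, 600));  // age 601 s
    }
}
```

The point of the sketch is that the two limits are independent: frequent access keeps an element from idling out, but it still expires once its total lifetime is exceeded.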

To monitor Hibernate's caching activities at runtime, add the following line to the log4j configuration file log4j.properties:

log4j.logger.org.hibernate.cache=debug

Now, you need to enable caching for a particular persistent class. Let's look at what goes on behind the scenes when caching is enabled in Hibernate.

When you enable a class with the second-level cache, Hibernate doesn't store the actual instances in cache. Instead it caches the individual properties of that object. In the example, the instance of Book isn't cached—rather, the properties in the book object like name and price are cached. The cached objects are stored in a region that has the same name as the persistent class, such as com.metaarchit.bookshop.Book.
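The idea of caching property values rather than instances can be sketched as follows. This is an illustration only, under the assumption that a region is a simple map from identifier to an array of property values; SketchBook and DisassembledRegion are hypothetical names, and Hibernate's real cache entries carry more information than this.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical entity with the two properties mentioned in the text.
class SketchBook {
    final long id;
    final String name;
    final int price;

    SketchBook(long id, String name, int price) {
        this.id = id;
        this.name = name;
        this.price = price;
    }
}

// Hypothetical region: stores property values per identifier, never the instance itself.
class DisassembledRegion {
    private final Map<Long, Object[]> entries = new HashMap<>();

    void put(SketchBook book) {
        // "Disassemble" the object into its property values (name, price).
        entries.put(book.id, new Object[] { book.name, book.price });
    }

    SketchBook get(long id) {
        Object[] state = entries.get(id);
        if (state == null) {
            return null;                                  // cache miss
        }
        // "Assemble" a fresh instance from the cached values.
        return new SketchBook(id, (String) state[0], (Integer) state[1]);
    }
}

class DisassembledRegionDemo {
    public static void main(String[] args) {
        DisassembledRegion region = new DisassembledRegion();
        region.put(new SketchBook(1L, "Hibernate Recipes", 30));

        SketchBook first = region.get(1L);
        SketchBook second = region.get(1L);
        System.out.println(first == second);              // distinct instances
        System.out.println(first.name.equals(second.name));
    }
}
```

Because each lookup assembles a new object, two sessions reading the same cached book receive separate instances with equal state, which keeps one session's in-memory changes from leaking into another.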

You can choose several cache usages for a persistent class. If the persistent objects are read-only and never modified, the most efficient usage is read-only:

<hibernate-mapping package="com.metaarchit.bookshop">
<class name="Book" table="BOOK">
<cache usage="read-only" />
...
</class>
</hibernate-mapping>

When the Book class is cached, only one SQL query is made, even though the book object is loaded more than once in two different sessions. Because the second call to the database is avoided, the query's performance improves:

Session session1 = factory.openSession();
try {
        Book book1 = (Book) session1.get(Book.class, id);
} finally {
        session1.close();
}
Session session2 = factory.openSession();
try {
        Book book2 = (Book) session2.get(Book.class, id);
} finally {
        session2.close();
}

However, if you modify the book object in one session and flush the changes, an exception will be thrown for updating a read-only object.

Session session1 = factory.openSession();
try {
        Book book1 = (Book) session1.get(Book.class, id);
        book1.setName("New Book");
        session1.save(book1);
        session1.flush();
} finally {
        session1.close();
}

So, for an updatable object, you should choose another cache usage: read-write. Hibernate invalidates the object from cache before it's updated:

<hibernate-mapping package="com.metaarchit.bookshop">
<class name="Book" table="BOOK">
<cache usage="read-write" />
...
</class>
</hibernate-mapping>
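The difference between the read-only and read-write usages described above can be sketched in plain Java. UsageRegion is a hypothetical class written for this illustration; it does not reproduce Hibernate's locking or its actual exception type, and only models the two behaviors the text describes: a read-only region refuses updates, while a read-write region invalidates the cached entry before the update.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical region; the flag mirrors usage="read-only" versus usage="read-write".
class UsageRegion {
    private final boolean readOnly;
    private final Map<Long, String> entries = new HashMap<>();

    UsageRegion(boolean readOnly) {
        this.readOnly = readOnly;
    }

    void putFromLoad(long id, String state) {
        entries.put(id, state);
    }

    boolean contains(long id) {
        return entries.containsKey(id);
    }

    // Called when the entity with this id is about to be updated in the database.
    void beforeUpdate(long id) {
        if (readOnly) {
            throw new UnsupportedOperationException("read-only region, id " + id);
        }
        entries.remove(id);          // invalidate, so the next read reloads fresh state
    }
}

class UsageRegionDemo {
    public static void main(String[] args) {
        UsageRegion readWrite = new UsageRegion(false);
        readWrite.putFromLoad(1L, "Old Book");
        readWrite.beforeUpdate(1L);
        System.out.println(readWrite.contains(1L));

        UsageRegion readOnly = new UsageRegion(true);
        readOnly.putFromLoad(1L, "Old Book");
        try {
            readOnly.beforeUpdate(1L);
        } catch (UnsupportedOperationException e) {
            System.out.println("update rejected: " + e.getMessage());
        }
    }
}
```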

In some cases, such as when the database is updated by other applications, you may want to invalidate the cached objects manually. You can do it through the methods provided by the session factory. You can invalidate either all instances or one instance of a persistent class:

factory.evict(Book.class);
factory.evict(Book.class, id);
factory.evictEntity("com.metaarchit.bookshop.Book");
factory.evictEntity("com.metaarchit.bookshop.Book", id);

After the cached object is evicted from cache, when you need to fetch data from the database, the query is executed and the updated data is retrieved. This avoids having stale data in cache. Note that the eviction from the second-level cache is nontransactional, meaning that the cache region isn't locked during the eviction process.

CacheMode options are provided to control the interaction of Hibernate with the second level cache:

Session session1 = factory.openSession();
// setCacheMode() is an instance method, so call it on the session itself
session1.setCacheMode(CacheMode.IGNORE);
try {
        Book book1 = new Book();
        book1.setName("New Book");
        session1.save(book1);
        session1.flush();
} finally {
        session1.close();
}

CacheMode.IGNORE tells Hibernate not to interact with second level cache for that session. Options available in CacheMode are as follows:

  • CacheMode.NORMAL: The default behavior.

  • CacheMode.IGNORE: Hibernate doesn't interact with the second-level cache. When entities cached in the second-level cache are updated, Hibernate invalidates them.

  • CacheMode.GET: Hibernate only reads and doesn't add items to the second-level cache. When entities cached in the second-level cache are updated, Hibernate invalidates them.

  • CacheMode.PUT: Hibernate only adds items and doesn't read from the second-level cache. When entities cached in the second-level cache are updated, Hibernate invalidates them.

  • CacheMode.REFRESH: Hibernate only adds and doesn't read from the second-level cache. In this mode, the setting of hibernate.cache.use_minimal_puts is bypassed, as the refresh is forced.
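The options above boil down to two questions: may the session read from the second-level cache, and may it add items to it? The following sketch summarizes the list as a plain Java enum. SketchCacheMode is a hypothetical type written for this illustration, not org.hibernate.CacheMode, and it doesn't model the hibernate.cache.use_minimal_puts difference between PUT and REFRESH.

```java
// Hypothetical summary of the cache modes as two flags per mode.
enum SketchCacheMode {
    NORMAL(true, true),      // reads from and adds to the cache
    IGNORE(false, false),    // neither reads nor adds
    GET(true, false),        // reads only
    PUT(false, true),        // adds only
    REFRESH(false, true);    // adds only, forcing a refresh from the database

    final boolean readsFromCache;
    final boolean addsToCache;

    SketchCacheMode(boolean readsFromCache, boolean addsToCache) {
        this.readsFromCache = readsFromCache;
        this.addsToCache = addsToCache;
    }
}

class CacheModeDemo {
    public static void main(String[] args) {
        for (SketchCacheMode mode : SketchCacheMode.values()) {
            System.out.println(mode + ": reads=" + mode.readsFromCache
                    + ", adds=" + mode.addsToCache);
        }
    }
}
```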

Caching Associations

Problem

Can associated objects be cached? How do you configure them?

Solution

Associated objects have a parent-child relationship in the database. How does caching work in this case? If the parent object is cached, is the child object also cached? By default, associated objects aren't cached. If you need to cache these objects, you can configure it explicitly. The primary reason to cache associations is to avoid additional calls to the database.

How It Works

Is a book's associated publisher object cached when its parent book object is cached?

<hibernate-mapping package="com.metaarchit.bookshop">
<class name="Book" table="BOOK">
<cache usage="read-write" />
...
<many-to-one name="publisher" class="Publisher" column="PUBLISHER_ID" />
</class>
</hibernate-mapping>

If you initialize the association in two different sessions, it's loaded as many times as the initialization is made. This is because Hibernate caches only the identifier of publisher in Book's region:

Session session1 = factory.openSession();
try {
        Book book1 = (Book) session1.get(Book.class, id);
        Hibernate.initialize(book1.getPublisher());
} finally {
        session1.close();
}
Session session2 = factory.openSession();
try {
        Book book2 = (Book) session2.get(Book.class, id);
        Hibernate.initialize(book2.getPublisher());
} finally {
        session2.close();
}

To cache the publisher objects in their own region, you need to enable caching for the Publisher class. You can do it the same way as for the Book class. The cached publisher objects are stored in a region with the name com.metaarchit.bookshop.Publisher:

<hibernate-mapping package="com.metaarchit.bookshop">
        <class name="Publisher" table="PUBLISHER">
                <cache usage="read-write" />
...
        </class>
</hibernate-mapping>

When the Publisher class is configured to be cached along with Book, initializing a book's publisher no longer requires a database call if that publisher is already cached; Hibernate reads it from the Publisher region instead of the database.
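Why the publisher is reloaded unless Publisher has its own region can be shown with a small model. The sketch below is an assumption-laden illustration in plain Java, not Hibernate code: AssociationSketch is a hypothetical class, the Book region keeps only the publisher's identifier, and a counter stands in for SQL queries against the PUBLISHER table.

```java
import java.util.HashMap;
import java.util.Map;

class AssociationSketch {
    int publisherQueries = 0;

    // Book region: book id -> publisher id (only the identifier is kept).
    final Map<Long, Long> bookRegion = new HashMap<>();

    // Publisher region; null models "caching not enabled for Publisher".
    final Map<Long, String> publisherRegion;

    AssociationSketch(boolean cachePublisher) {
        this.publisherRegion = cachePublisher ? new HashMap<Long, String>() : null;
    }

    // Assumes the book itself is already in the Book region.
    String initializePublisher(long bookId) {
        long publisherId = bookRegion.get(bookId);
        if (publisherRegion != null && publisherRegion.containsKey(publisherId)) {
            return publisherRegion.get(publisherId);      // served from the cache
        }
        publisherQueries++;                               // simulated query on PUBLISHER
        String name = "Publisher #" + publisherId;
        if (publisherRegion != null) {
            publisherRegion.put(publisherId, name);
        }
        return name;
    }
}

class AssociationSketchDemo {
    public static void main(String[] args) {
        AssociationSketch withoutCache = new AssociationSketch(false);
        withoutCache.bookRegion.put(1L, 7L);
        withoutCache.initializePublisher(1L);
        withoutCache.initializePublisher(1L);
        System.out.println(withoutCache.publisherQueries);

        AssociationSketch withCache = new AssociationSketch(true);
        withCache.bookRegion.put(1L, 7L);
        withCache.initializePublisher(1L);
        withCache.initializePublisher(1L);
        System.out.println(withCache.publisherQueries);
    }
}
```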

Caching Collections

Problem

How do you cache collections?

Solution

Collections also can be cached explicitly in Hibernate. If a persistent object contains associated objects in a collection, the collection can also be cached explicitly. If the collection contains value types, they're stored by their values. If the collection contains objects, the object's identifiers are cached.

How It Works

For the associated chapters of a book to be cached, you enable caching for the Chapter class. The cached chapter objects are stored in a region named com.metaarchit.bookshop.Chapter:

<hibernate-mapping package="com.metaarchit.bookshop">
        <class name="Book" table="BOOK">
                <cache usage="read-write" />
                ...
                <set name="chapters" table="BOOK_CHAPTER">
                <key column="BOOK_ID" />
                <one-to-many class="Chapter" />
                </set>
        </class>
</hibernate-mapping>
<hibernate-mapping package="com.metaarchit.bookshop">
        <class name="Chapter" table="CHAPTER">
                <cache usage="read-write" />
                ...
                <many-to-one name="book" class="Book" column="BOOK_ID" />
        </class>
</hibernate-mapping>

Then, you can try initializing the collection in two different sessions. You might expect that no query is made for the second session:

Session session1 = factory.openSession();
try {
        Book book1 = (Book) session1.get(Book.class, id);
        Hibernate.initialize(book1.getChapters());
} finally {
        session1.close();
}
Session session2 = factory.openSession();
try {
        Book book2 = (Book) session2.get(Book.class, id);
        Hibernate.initialize(book2.getChapters());
} finally {
        session2.close();
}

Inspecting the SQL statements, you see that one query is still made. The reason is that, unlike a many-to-one association, a collection isn't cached by default. You need to turn on caching manually by specifying the cache usage for the collection. A book's chapter collection is stored in a region named com.metaarchit.bookshop.Book.chapters.

How can Hibernate cache a collection? If the collection is storing simple values, the values themselves are cached. If the collection is storing persistent objects, the identifiers of the objects are cached in the collection region, and the persistent objects are cached in their own region:

<hibernate-mapping package="com.metaarchit.bookshop">
        <class name="Book" table="BOOK">
                <cache usage="read-write" />
                ...
                <set name="chapters" table="BOOK_CHAPTER">
                <cache usage="read-write" />
                <key column="BOOK_ID" />
                <one-to-many class="Chapter" />
                </set>
        </class>
</hibernate-mapping>
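The split between a collection region and an entity region can be modeled in a few lines. This is a simplified sketch, assuming each region is a plain map; CollectionCacheSketch and its members are hypothetical names, and a real cache would load a missing chapter by identifier instead of returning null for it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CollectionCacheSketch {
    // Collection region (Book.chapters): book id -> identifiers of its chapters.
    final Map<Long, List<Long>> bookChaptersRegion = new HashMap<>();

    // Entity region (Chapter): chapter id -> chapter title.
    final Map<Long, String> chapterRegion = new HashMap<>();

    // Returns null when the collection isn't cached and a query would be needed.
    List<String> chaptersOf(long bookId) {
        List<Long> ids = bookChaptersRegion.get(bookId);
        if (ids == null) {
            return null;
        }
        List<String> titles = new ArrayList<>();
        for (Long id : ids) {
            titles.add(chapterRegion.get(id));   // elements come from their own region
        }
        return titles;
    }

    void evictCollection(long bookId) {
        bookChaptersRegion.remove(bookId);
    }
}

class CollectionCacheDemo {
    public static void main(String[] args) {
        CollectionCacheSketch cache = new CollectionCacheSketch();
        cache.chapterRegion.put(10L, "Introduction");
        cache.chapterRegion.put(11L, "Mappings");
        cache.bookChaptersRegion.put(1L, Arrays.asList(10L, 11L));

        System.out.println(cache.chaptersOf(1L));
        cache.evictCollection(1L);
        System.out.println(cache.chaptersOf(1L));   // evicted: the caller has to query
    }
}
```

The sketch also shows why evicting the collection leaves the chapters themselves cached: only the list of identifiers is removed, while the Chapter region is untouched.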

To invalidate a particular collection or all the collections in a region, you can use the following methods provided by the session factory:

factory.evictCollection("com.metaarchit.bookshop.Book.chapters");
factory.evictCollection("com.metaarchit.bookshop.Book.chapters", id);

For a bidirectional one-to-many/many-to-one association, you should call the evictCollection() method on the collection end after the single end is updated. That means you need to remove the Book reference from the Chapter and update the Chapter first. Then, call the evictCollection() method:

Session session1 = factory.openSession();
try {
        Book book1 = (Book) session1.get(Book.class, id);
        Chapter chapter = (Chapter) book1.getChapters().iterator().next();
        chapter.setBook(null);
        session1.saveOrUpdate(chapter);
        session1.flush();
        factory.evictCollection("com.metaarchit.bookshop.Book.chapters", id);
} finally {
       session1.close();
}
Session session2 = factory.openSession();
try {
        Book book2 = (Book) session2.get(Book.class, id);
        Hibernate.initialize(book2.getChapters());
} finally {
        session2.close();
}

Caching Queries

Problem

Can queries be cached? How is this achieved in Hibernate?

Solution

Query result sets can be cached. This is useful when you run a particular query often with the same parameters.

How It Works

In addition to caching objects loaded by a session, a query with HQL can be cached. Suppose you're running the same query in two different sessions:

Session session1 = factory.openSession();
try {
        Query query = session1.createQuery("from Book where name like ?");
        query.setString(0, "%Hibernate%");
        List books = query.list();
} finally {
        session1.close();
}
Session session2 = factory.openSession();
try {
        Query query = session2.createQuery("from Book where name like ?");
        query.setString(0, "%Hibernate%");
        List books = query.list();
} finally {
        session2.close();
}

By default, the HQL queries aren't cached. You must first enable the query cache in the Hibernate configuration file:

<hibernate-configuration>
        <session-factory>
        ...
        <property name="cache.use_query_cache">true</property>
        ...
        </session-factory>
</hibernate-configuration>

The setting cache.use_query_cache creates two cache regions: one holding the cached query result sets and the other holding timestamps of the most recent updates to queryable tables.

Enabling this setting isn't enough on its own. You also need to call Query.setCacheable(true) on each query before executing it. This allows the query to look for existing cached results or to add its results to the cache. The query result is cached in a region named org.hibernate.cache.StandardQueryCache by default.

How can a query be cached by Hibernate? If the query returns simple values, the values themselves are cached. If the query returns persistent objects, the identifiers of the objects are cached in the query region, and the persistent objects are cached in their own region:

Session session1 = factory.openSession();
try {
        Query query = session1.createQuery("from Book where name like ?");
        query.setString(0, "%Hibernate%");
        query.setCacheable(true);
        List books = query.list();
} finally {
        session1.close();
}
Session session2 = factory.openSession();
try {
        Query query = session2.createQuery("from Book where name like ?");
        query.setString(0, "%Hibernate%");
        query.setCacheable(true);
        List books = query.list();
} finally {
        session2.close();
}
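How the two query cache regions cooperate can be sketched as follows. This is a hypothetical model in plain Java, not Hibernate's implementation: QueryCacheSketch is an invented class, the cache key is simply the query string plus its single parameter, and a counter replaces real timestamps. A cached result is treated as stale when the table it depends on was updated after the result was cached.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class QueryCacheSketch {
    // A cached result: entity identifiers plus the logical time they were cached.
    private static final class CachedResult {
        final List<Long> ids;
        final long cachedAt;

        CachedResult(List<Long> ids, long cachedAt) {
            this.ids = ids;
            this.cachedAt = cachedAt;
        }
    }

    private final Map<String, CachedResult> queryRegion = new HashMap<>();
    private final Map<String, Long> updateTimestamps = new HashMap<>();
    private long clock = 0;

    void tableUpdated(String table) {
        updateTimestamps.put(table, ++clock);
    }

    void put(String hql, Object param, List<Long> ids) {
        queryRegion.put(hql + "|" + param, new CachedResult(ids, ++clock));
    }

    // Returns null if nothing is cached or the table changed after the result was cached.
    List<Long> get(String hql, Object param, String table) {
        CachedResult hit = queryRegion.get(hql + "|" + param);
        if (hit == null) {
            return null;
        }
        Long lastUpdate = updateTimestamps.get(table);
        if (lastUpdate != null && lastUpdate > hit.cachedAt) {
            return null;                                  // stale result
        }
        return hit.ids;
    }
}

class QueryCacheDemo {
    public static void main(String[] args) {
        QueryCacheSketch cache = new QueryCacheSketch();
        String hql = "from Book where name like ?";

        cache.put(hql, "%Hibernate%", Arrays.asList(1L, 2L));
        System.out.println(cache.get(hql, "%Hibernate%", "BOOK"));

        cache.tableUpdated("BOOK");
        System.out.println(cache.get(hql, "%Hibernate%", "BOOK"));
    }
}
```

Note that only identifiers are stored for entity results, so the query cache is most useful together with a second-level cache for the entities it returns.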

You can also specify the cache region for a query. Doing so lets you separate query results into different regions and reduce the number of cached results in any one region:

...
query.setCacheable(true);
query.setCacheRegion("com.metaarchit.bookshop.BookQuery");

Summary

In this chapter, you've seen what caching is and the different levels of caching in Hibernate. Caching can be enabled at the transaction level, where data is cached for one unit of work; this is the default behavior in Hibernate. Multiple queries in a particular session share the cached data.

You've learned that at the second level, caching is enabled for a process, and the cache is associated with a SessionFactory. Multiple sessions within a SessionFactory share the same data. Configuring the second-level cache is a two-step process: first, you decide on the concurrency strategy, and then you choose the provider that implements the caching strategy.

You've also learned that you can use four concurrency strategies: read-only, read-write, transactional, and nonstrict read-write. There are many open source cache providers, including OSCache, EHCache, SwarmCache, and JBoss Cache. Each of these supports a different set of concurrency strategies.

Query result sets also can be cached. You do so through configuration and by invoking setCacheable(true). Associated objects and collections can also be cached explicitly.
