Caching is one of the most important techniques an application can use to improve performance. From an ORM perspective, data retrieved from a database is cached in memory or on disk so that the application doesn't need to call the database for every request. A cache is a local copy of information from the database that can be used to avoid a database call whenever:
The application performs a lookup by identifier.
The persistence layer resolves an association or collection lazily.
As Figure 12-1 shows, the first time an application queries for data, Hibernate fetches it from the database; from then on, if the same data is requested, Hibernate fetches it from the cache.
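The lookup just described is the classic cache-aside pattern. The sketch below is plain Java, not Hibernate code, and it assumes a hypothetical CountingDatabase class that stands in for the database and only counts how often it's called. Only the first request for a given key reaches it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for the database: it only counts how often it is queried.
class CountingDatabase {
    int hits = 0;

    String load(Long id) {
        hits++;
        return "Book#" + id;
    }
}

// Cache-aside sketch: serve the local copy when present, otherwise load and remember it.
class SimpleCache<K, V> {
    private final Map<K, V> entries = new HashMap<>();
    private final Function<K, V> loader;

    SimpleCache(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        // computeIfAbsent invokes the loader only on a cache miss
        return entries.computeIfAbsent(key, loader);
    }
}
```

For example, with `SimpleCache<Long, String> cache = new SimpleCache<>(db::load)`, calling `cache.get(1L)` twice leaves `db.hits` at 1; a new key causes another load.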
Hibernate provides a way to configure caching at the class level, at the collection level, and also at the query result-set level. Cached data is available at three different scopes in an application:
Transaction scope: As you've seen in earlier chapters, a transaction is defined as a unit of work. How does caching affect data in a transaction? If data is fetched in one session and the same data is requested again in that session before the unit of work is completed, is that data kept in memory? Does this avoid a call to the database? Yes. Within a session, loaded data is kept in memory by default, so the second call to the database is avoided. Because a unit of work serves a single request, this data isn't shared or accessed concurrently.
Process scope: Data is cached across sessions or across units of work. If a query is executed in one session and the same query is executed in a second session after closing the first, this data is cached so that the second call to the database is avoided. Because two requests can access the same data in these two sessions, this data is accessed concurrently. You should use this scope with caution.
Cluster scope: Data is shared between multiple processes on the same machine or different machines in a cluster.
A process-scoped cache makes data retrieved from the database in one unit of work visible to another unit of work. This may have unwanted consequences that you need to avoid. If an application has non-exclusive access to the database (that is, other applications can modify the same data), the cached copy can become stale without the application knowing it. In this case, use process scope only for data that doesn't change often and that can be refreshed safely when the local cache expires.
Any application that is designed to be scalable should support caching at the cluster level. Process scope doesn't maintain consistency of data between machines, so you should use cluster-level caching.
Hibernate has a two-level cache architecture:
The first-level cache is the persistence context cache. This is at the unit-of-work level. It corresponds to one Hibernate session for a single request and is enabled by default.
The second-level cache is either at the process scope or the cluster scope. This is the cache of the state of persistent entities. A cache-concurrency strategy defines the transaction isolation details for a particular item of data, and the cache provider represents the physical cache implementation.
Hibernate also implements caching for query result sets. This requires two additional physical cache regions that hold the cached query result sets and the timestamp when a table was last updated.
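To make the two levels concrete, here is a toy model in plain Java, not Hibernate's implementation. ToySessionFactory, ToySession, and BookTable are hypothetical names: each session checks its own first-level map, then the factory's shared map if the second-level cache is enabled, and only then the database.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical BOOK table that counts SELECT statements.
class BookTable {
    int selects = 0;

    String select(long id) {
        selects++;
        return "Book#" + id;
    }
}

// Toy stand-in for a SessionFactory: owns the optional shared (second-level) cache.
class ToySessionFactory {
    final BookTable table;
    final boolean secondLevelEnabled;
    final Map<Long, String> secondLevel = new HashMap<>();

    ToySessionFactory(BookTable table, boolean secondLevelEnabled) {
        this.table = table;
        this.secondLevelEnabled = secondLevelEnabled;
    }

    ToySession openSession() {
        return new ToySession(this);
    }
}

// Toy stand-in for a Session: owns a private first-level cache.
class ToySession {
    private final ToySessionFactory factory;
    private final Map<Long, String> firstLevel = new HashMap<>();

    ToySession(ToySessionFactory factory) {
        this.factory = factory;
    }

    String get(long id) {
        String book = firstLevel.get(id);                 // 1. this session's cache
        if (book == null && factory.secondLevelEnabled) {
            book = factory.secondLevel.get(id);           // 2. cache shared by all sessions
        }
        if (book == null) {
            book = factory.table.select(id);              // 3. the database
            if (factory.secondLevelEnabled) {
                factory.secondLevel.put(id, book);
            }
        }
        firstLevel.put(id, book);
        return book;
    }
}
```

With secondLevelEnabled set to false, loading the same id in two sessions issues two SELECTs; with it set to true, the second session is served from the shared map.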
This chapter first shows how to use and configure the second-level cache and then looks at the query-level cache in Hibernate.
In the second-level cache, all persistence contexts that have been started from the same SessionFactory share the same cached data. Different kinds of data require different cache policies, which involve settings such as how long data is kept in the cache before it's evicted, the scope at which the data is cached, and whether the data is cached in memory.
To reiterate, the second-level cache is most useful for data that is read often and updated rarely; in other words, it works best with mostly read-only data.
You set up the second-level cache in two steps. First, you have to decide on the concurrency strategy; then, you configure the cache expiration and physical cache attributes using the cache provider.
There are four built-in concurrency strategies:
Transactional: This strategy is suited to read-mostly data where stale data must be prevented. It's equivalent to the repeatable read isolation level.
Read-write: This maintains a read-committed isolation level.
Non-strict read-write: This doesn't guarantee that you won't read stale data. It doesn't guarantee consistency between cache and database.
Read-only: This is appropriate for data that never changes.
The following built-in providers are available in Hibernate:
EHCache: This is an open source, standards-based cache that supports the process-scope cache. It can cache in memory or to disk, and it supports the query cache. EHCache Distributed supports the cluster-scope cache. It has its own configuration file (ehcache.xml) where all the required caching parameters are set. You can check out the tutorial at http://ehcache.org/.
OSCache: This is another open source caching provider that writes to memory and disk. It also has cluster cache support. The URL www.opensymphony.com/oscache/ has documentation.
SwarmCache: This is a simple but effective distributed cache. It uses IP multicast to communicate with any number of hosts in a distributed system. The documentation is at http://swarmcache.sourceforge.net.
JBoss Cache: This is a distributed cache system. It replicates data across the nodes of a distributed system, so state is always kept in sync with the other servers in the system. The documentation is at www.jboss.org/jbosscache/.
Table 12-1 summarizes these cache providers.
Table 12-1. Cache Providers
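Provider | Provider Class | Type | Cluster Safe | Query Cache Supported |
---|---|---|---|---|
Hashtable | org.hibernate.cache.HashtableCacheProvider | Memory | | Yes |
EHCache | org.hibernate.cache.EhCacheProvider | Memory, disk | | Yes |
OSCache | org.hibernate.cache.OSCacheProvider | Memory, disk | | Yes |
JBoss Cache | org.hibernate.cache.TreeCacheProvider | Clustered, ip-multicast | Yes | |
SwarmCache | org.hibernate.cache.SwarmCacheProvider | Clustered, ip-multicast | Yes | |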
You can plug in another cache implementation by writing an adapter that implements org.hibernate.cache.CacheProvider. All of the listed cache providers implement this interface.
Not every cache provider is compatible with every concurrency strategy. The compatibility matrix is given in Table 12-2.
Table 12-2. Cache Providers' Concurrency Strategies
Cache Provider | Read-Only | Non-Strict Read-Write | Read-Write | Transactional |
---|---|---|---|---|
EHCache | Yes | Yes | Yes | |
OSCache | Yes | Yes | Yes | |
SwarmCache | Yes | Yes | Yes | |
JBoss Cache | Yes | Yes |
You can configure the cache for a specific class or a collection. The <cache> element is used in the mapping file, inside a <class> or collection mapping element, to configure caching for that class or collection. The <cache> element is defined as follows:
<cache usage="transactional|read-write|nonstrict-read-write|read-only" region="RegionName" include="all|non-lazy"/>
Table 12-3 defines its attributes.
Cache regions are handles by which you can reference classes and collections in the cache provider configuration and set the expiration policies applicable to that region. Regions are buckets of data of two types: one type contains disassembled data of entity instances, and the other contains only identifiers of entities that are linked through a collection.
The name of the region is the class name in the case of a class cache or the class name together with the property name in the case of a collection cache.
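The naming rule can be expressed as two one-line helpers. RegionNames is a hypothetical utility written for illustration; it isn't part of Hibernate's API.

```java
// Hypothetical helper illustrating the default region-naming rule:
// an entity region is named after the fully qualified class name, and a
// collection region appends the collection property name to the owning class name.
final class RegionNames {
    private RegionNames() {
    }

    static String entityRegion(String entityClassName) {
        return entityClassName;
    }

    static String entityRegion(Class<?> entityClass) {
        return entityClass.getName();
    }

    static String collectionRegion(String ownerClassName, String collectionProperty) {
        return ownerClassName + "." + collectionProperty;
    }
}
```

For example, `RegionNames.collectionRegion("com.metaarchit.bookshop.Book", "chapters")` yields `com.metaarchit.bookshop.Book.chapters`.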
A query's result set can be configured to be cached. By default, query caching is disabled, and every HQL, JPA QL, and Criteria query hits the database. You enable the query cache as follows:
hibernate.cache.use_query_cache = true
In addition to setting this configuration property, you must mark each query that should be cached as cacheable through the org.hibernate.Query interface:
Query bookQuery = session.createQuery("from Book book where book.name = :name");
bookQuery.setString("name", "HibernateRecipes");
bookQuery.setCacheable(true);
The setCacheable() method enables the result to be cached.
The first-level cache is at the transaction level, or the unit of work. It's enabled by default in Hibernate. Caching at the first level is associated with a session: if the same object is requested multiple times in the same session, it's returned from the cache instead of being loaded from the database again.
The general concept of caching persistent objects is that when an object is first read from external storage, a copy of it is stored in the cache. Subsequent reads of the same object can be served from the cache directly. Because caches are typically stored in memory or on local disk, it's faster to read an object from the cache than from external storage. If you use it properly, caching can greatly improve the performance of your application.
As a high-performance ORM framework, Hibernate supports the caching of persistent objects at different levels. Suppose you retrieve an object more than once within a session: does Hibernate query the database as many times as the query is invoked?
Session session = factory.openSession();
try {
    Book book1 = (Book) session.get(Book.class, id);
    Book book2 = (Book) session.get(Book.class, id); // served from the session's first-level cache
} finally {
    session.close();
}
If you inspect the SQL statements executed by Hibernate, you find that only one database query is made. That means Hibernate is caching your objects in the same session. This kind of caching is called first-level caching, and its caching scope is a session.
But what about getting an object with the same identifier more than once in two different sessions?
Session session1 = factory.openSession();
try {
    Book book1 = (Book) session1.get(Book.class, id);
} finally {
    session1.close();
}
Session session2 = factory.openSession();
try {
    Book book2 = (Book) session2.get(Book.class, id);
} finally {
    session2.close();
}
Two database queries are made. That means Hibernate doesn't cache persistent objects across different sessions by default. To do so, you need to turn on second-level caching, whose scope is the session factory.
The second-level cache is configured at the process level or the cluster level. In Hibernate, you can configure it for a particular class or for a collection to improve performance. You can use the second-level cache with large and complex object graphs that may be loaded often. It's associated with one SessionFactory and can be reused by multiple Sessions created from the same SessionFactory.
To turn on the second-level cache, the first step is to choose a cache provider in the Hibernate configuration file hibernate.cfg.xml. Hibernate supports several cache implementations, such as EHCache, OSCache, SwarmCache, and JBoss Cache, as discussed earlier. In a nondistributed environment, you can choose EHCache, which is Hibernate's default cache provider:
<hibernate-configuration>
    <session-factory>
        ...
        <property name="cache.provider_class">org.hibernate.cache.EhCacheProvider</property>
        ...
    </session-factory>
</hibernate-configuration>
You can configure EHCache through its configuration file, ehcache.xml, located in the source root folder. You can specify different settings for different cache regions, which store different kinds of objects. If the parameter eternal is set to false, elements expire after a period of time. The parameter timeToIdleSeconds is how long an element can stay idle before it expires: the maximum amount of time between accesses, used only if the element isn't eternal. The parameter timeToLiveSeconds is the time to live for an element: the maximum time between its creation and its expiration, also used only if the element isn't eternal:
<ehcache>
    <diskStore path="java.io.tmpdir" />
    <defaultCache maxElementsInMemory="10000" eternal="false"
        timeToIdleSeconds="120" timeToLiveSeconds="120" overflowToDisk="true" />
    <cache name="com.metaarchit.bookshop.Book" maxElementsInMemory="10000" eternal="false"
        timeToIdleSeconds="300" timeToLiveSeconds="600" overflowToDisk="true" />
</ehcache>
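The interplay of eternal, timeToIdleSeconds, and timeToLiveSeconds can be modeled with a small expiring map. This is a sketch of the semantics only, not EHCache's implementation; the injected clock and the treatment of 0 as "no limit" are assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

// Sketch of the expiry rules set by eternal, timeToIdleSeconds and timeToLiveSeconds.
// Models the semantics only; this is not EHCache code.
class ExpiringCache<K, V> {
    private static final class Entry<T> {
        final T value;
        final long createdAt;
        long lastAccessedAt;

        Entry(T value, long now) {
            this.value = value;
            this.createdAt = now;
            this.lastAccessedAt = now;
        }
    }

    private final Map<K, Entry<V>> entries = new HashMap<>();
    private final boolean eternal;
    private final long timeToIdleSeconds;    // assumption: 0 means no idle limit
    private final long timeToLiveSeconds;    // assumption: 0 means no lifetime limit
    private final LongSupplier clockSeconds; // injected clock so expiry is testable

    ExpiringCache(boolean eternal, long timeToIdleSeconds, long timeToLiveSeconds,
                  LongSupplier clockSeconds) {
        this.eternal = eternal;
        this.timeToIdleSeconds = timeToIdleSeconds;
        this.timeToLiveSeconds = timeToLiveSeconds;
        this.clockSeconds = clockSeconds;
    }

    void put(K key, V value) {
        entries.put(key, new Entry<>(value, clockSeconds.getAsLong()));
    }

    V get(K key) {
        Entry<V> entry = entries.get(key);
        if (entry == null) {
            return null;
        }
        long now = clockSeconds.getAsLong();
        if (!eternal) {
            boolean idleTooLong = timeToIdleSeconds > 0 && now - entry.lastAccessedAt > timeToIdleSeconds;
            boolean livedTooLong = timeToLiveSeconds > 0 && now - entry.createdAt > timeToLiveSeconds;
            if (idleTooLong || livedTooLong) {
                entries.remove(key); // expired: the caller must reload from the database
                return null;
            }
        }
        entry.lastAccessedAt = now;  // each hit resets the idle timer
        return entry.value;
    }
}
```

With the Book region's settings (300 and 600), an entry that is read every few minutes still expires 600 seconds after it was created, while an entry that nobody touches expires after 300 seconds.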
To monitor Hibernate's caching activities at runtime, add the following line to the log4j configuration file, log4j.properties:
log4j.logger.org.hibernate.cache=debug
Now, you need to enable caching for a particular persistent class. Let's look at what goes on behind the scenes when caching is enabled in Hibernate.
When you enable the second-level cache for a class, Hibernate doesn't store the actual instances in the cache. Instead, it caches the individual property values of each object. In the example, the Book instance itself isn't cached; rather, its properties, such as name and price, are cached. The cached data is stored in a region that has the same name as the persistent class, such as com.metaarchit.bookshop.Book.
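A rough model of this "disassembled" storage keeps each entity's property values in an Object[] keyed by identifier and assembles a new instance on every hit. SimpleBook and BookPropertyRegion are hypothetical names, and Hibernate's real cache entries carry more information than this.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for the chapter's Book entity.
class SimpleBook {
    String name;
    double price;
}

// Sketch of a disassembled entity region: each entry is an Object[] of property
// values keyed by identifier, and a fresh instance is assembled on every hit.
class BookPropertyRegion {
    private final Map<Long, Object[]> region = new HashMap<>();

    void put(long id, SimpleBook book) {
        // copy the property values, not the object reference
        region.put(id, new Object[] {book.name, book.price});
    }

    SimpleBook get(long id) {
        Object[] state = region.get(id);
        if (state == null) {
            return null;
        }
        SimpleBook book = new SimpleBook(); // assemble a new instance from cached values
        book.name = (String) state[0];
        book.price = (Double) state[1];
        return book;
    }
}
```

Because only values are copied, changing the original object later doesn't alter the cached copy, and each caller gets its own instance.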
You can choose from several cache usages for a persistent class. If the persistent objects are read-only and never modified, the most efficient usage is read-only:
<hibernate-mapping package="com.metaarchit.bookshop">
<class name="Book" table="BOOK">
<cache usage="read-only" />
...
</class>
</hibernate-mapping>
When the Book class is cached, only one SQL query is made, even though the book object is loaded more than once in two different sessions. Because the second call to the database is avoided, the query's performance improves:
Session session1 = factory.openSession();
try {
    Book book1 = (Book) session1.get(Book.class, id);
} finally {
    session1.close();
}
Session session2 = factory.openSession();
try {
    Book book2 = (Book) session2.get(Book.class, id);
} finally {
    session2.close();
}
However, if you modify the book object in one session and flush the changes, an exception is thrown for updating a read-only object:
Session session1 = factory.openSession();
try {
    Book book1 = (Book) session1.get(Book.class, id);
    book1.setName("New Book");
    session1.save(book1);
    session1.flush(); // fails: Book is cached with usage="read-only"
} finally {
    session1.close();
}
So, for an updatable object, you should choose another cache usage: read-write. Hibernate invalidates the object in the cache before it's updated:
<hibernate-mapping package="com.metaarchit.bookshop">
<class name="Book" table="BOOK">
<cache usage="read-write" />
...
</class>
</hibernate-mapping>
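The difference between the two usages comes down to what happens just before a cached entity is updated. In this simplified sketch (hypothetical classes, not Hibernate's strategy classes), read-only rejects the update with an exception, and read-write removes the stale entry. A real read-write strategy also has to guard the entry while the update is in progress, which isn't modeled here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical enum mirroring two of the usage values of the <cache> element.
enum CacheUsage { READ_ONLY, READ_WRITE }

// Sketch of how each usage reacts when a cached entity is about to be updated.
class UsageAwareRegion {
    private final CacheUsage usage;
    private final Map<Long, String> entries = new HashMap<>();

    UsageAwareRegion(CacheUsage usage) {
        this.usage = usage;
    }

    void cache(long id, String state) {
        entries.put(id, state);
    }

    String cached(long id) {
        return entries.get(id);
    }

    void beforeUpdate(long id) {
        if (usage == CacheUsage.READ_ONLY) {
            // read-only data must never change; the exception type is an assumption
            throw new UnsupportedOperationException("Cannot update read-only cached entity " + id);
        }
        entries.remove(id); // read-write: invalidate so no session reads the old state
    }
}
```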
In some cases, such as when the database is updated by other applications, you may want to invalidate the cached objects manually. You can do it through the methods provided by the session factory. You can invalidate either one instance or all instances of a persistent class:
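factory.evict(Book.class);                                  // all Book instances
factory.evict(Book.class, id);                              // one Book instance
factory.evictEntity("com.metaarchit.bookshop.Book");        // all instances, by entity name
factory.evictEntity("com.metaarchit.bookshop.Book", id);    // one instance, by entity name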
After the cached object is evicted, the next time it's requested, Hibernate queries the database and retrieves the updated data. This avoids stale data in the cache. Note that eviction from the second-level cache is nontransactional, meaning that the cache region isn't locked during the eviction process.
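Eviction works on regions, so a simple region store shows what evicting one instance versus a whole class means. RegionStore and its methods are hypothetical; like the eviction just described, nothing in it is locked or transactional.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical region store: one map per region name, each mapping identifier -> cached state.
// No locking is performed, mirroring the nontransactional eviction described above.
class RegionStore {
    private final Map<String, Map<Object, Object>> regions = new HashMap<>();

    void put(String region, Object id, Object state) {
        regions.computeIfAbsent(region, name -> new HashMap<>()).put(id, state);
    }

    Object get(String region, Object id) {
        Map<Object, Object> entries = regions.get(region);
        return entries == null ? null : entries.get(id);
    }

    // Counterpart of evictEntity(entityName, id): drops a single instance.
    void evict(String region, Object id) {
        Map<Object, Object> entries = regions.get(region);
        if (entries != null) {
            entries.remove(id);
        }
    }

    // Counterpart of evictEntity(entityName): drops every instance in the region.
    void evictAll(String region) {
        regions.remove(region);
    }
}
```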
CacheMode options are provided to control how Hibernate interacts with the second-level cache:
Session session1 = factory.openSession();
session1.setCacheMode(CacheMode.IGNORE);
try {
    Book book1 = new Book();
    book1.setName("New Book");
    session1.save(book1);
    session1.flush();
} finally {
    session1.close();
}
CacheMode.IGNORE tells Hibernate not to interact with the second-level cache for that session. The options available in CacheMode are as follows:
CacheMode.NORMAL: The default behavior. Hibernate reads items from and adds items to the second-level cache.
CacheMode.IGNORE: Hibernate doesn't interact with the second-level cache, except that when entities cached in the second-level cache are updated, Hibernate invalidates them.
CacheMode.GET: Hibernate only reads items from the second-level cache and doesn't add items to it. When entities cached in the second-level cache are updated, Hibernate invalidates them.
CacheMode.PUT: Hibernate only adds items to the second-level cache and doesn't read from it. When entities cached in the second-level cache are updated, Hibernate invalidates them.
CacheMode.REFRESH: Hibernate only adds items to the second-level cache and doesn't read from it. In this mode, the setting of hibernate.cache.use_minimal_puts is bypassed, as the refresh is forced.
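The CacheMode options boil down to two switches: whether a session may read from the second-level cache and whether it may add to it. The enum below is a stand-in for org.hibernate.CacheMode written for illustration; it models only those two switches, not invalidation on update or the use_minimal_puts behavior of REFRESH.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the CacheMode options (not org.hibernate.CacheMode itself).
enum ToyCacheMode {
    NORMAL(true, true),
    IGNORE(false, false),
    GET(true, false),
    PUT(false, true),
    REFRESH(false, true); // like PUT; the use_minimal_puts bypass isn't modeled

    final boolean readsCache;
    final boolean addsToCache;

    ToyCacheMode(boolean readsCache, boolean addsToCache) {
        this.readsCache = readsCache;
        this.addsToCache = addsToCache;
    }
}

// Loads a Book by id while honoring a ToyCacheMode.
class ModeAwareLoader {
    final Map<Long, String> secondLevel = new HashMap<>();
    int databaseHits = 0;

    String load(long id, ToyCacheMode mode) {
        if (mode.readsCache) {
            String cached = secondLevel.get(id);
            if (cached != null) {
                return cached;
            }
        }
        databaseHits++;
        String fromDatabase = "Book#" + id; // hypothetical row
        if (mode.addsToCache) {
            secondLevel.put(id, fromDatabase);
        }
        return fromDatabase;
    }
}
```

For example, a PUT load followed by a GET load hits the database only once, because GET reads what PUT added.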
Associated objects have a parent-child relationship in the database. How does caching work in this case? If the parent object is cached, is the child object also cached? By default, associated objects aren't cached. If you need to cache these objects, you can configure it explicitly. The primary reason to cache associations is to avoid additional calls to the database.
Is a book's associated publisher object cached when its parent book object is cached?
<hibernate-mapping package="com.metaarchit.bookshop">
    <class name="Book" table="BOOK">
        <cache usage="read-write" />
        ...
        <many-to-one name="publisher" class="Publisher" column="PUBLISHER_ID" />
    </class>
</hibernate-mapping>
If you initialize the association in two different sessions, the publisher is loaded from the database each time. This is because Hibernate caches only the identifier of the publisher in Book's region:
Session session1 = factory.openSession();
try {
    Book book1 = (Book) session1.get(Book.class, id);
    Hibernate.initialize(book1.getPublisher());
} finally {
    session1.close();
}
Session session2 = factory.openSession();
try {
    Book book2 = (Book) session2.get(Book.class, id);
    Hibernate.initialize(book2.getPublisher());
} finally {
    session2.close();
}
To cache the publisher objects in their own region, you need to enable caching for the Publisher class. You can do it the same way as for the Book class. The cached publisher objects are stored in a region named com.metaarchit.bookshop.Publisher:
<hibernate-mapping package="com.metaarchit.bookshop">
<class name="Publisher" table="PUBLISHER">
<cache usage="read-write" />
...
</class>
</hibernate-mapping>
When the Publisher class is configured to be cached along with Book, then whenever a book's publisher association is resolved and the Publisher data is already cached, Hibernate fetches it from the cache rather than from the database.
Collections can also be cached explicitly in Hibernate. If a persistent object holds associated objects in a collection, that collection can be cached. If the collection contains value types, the values themselves are stored; if it contains entities, only their identifiers are cached.
For the associated chapters of a book to be cached, you enable caching for the Chapter class. The cached chapter objects are stored in a region named com.metaarchit.bookshop.Chapter:
<hibernate-mapping package="com.metaarchit.bookshop">
    <class name="Book" table="BOOK">
        <cache usage="read-write" />
        ...
        <set name="chapters" table="BOOK_CHAPTER">
            <key column="BOOK_ID" />
            <one-to-many class="Chapter" />
        </set>
    </class>
</hibernate-mapping>
<hibernate-mapping package="com.metaarchit.bookshop">
<class name="Chapter" table="CHAPTER">
<cache usage="read-write" />
...
<many-to-one name="book" class="Book" column="BOOK_ID" />
</class>
</hibernate-mapping>
Then, you can try initializing the collection in two different sessions. You might expect no query to be made for the second session:
Session session1 = factory.openSession();
try {
    Book book1 = (Book) session1.get(Book.class, id);
    Hibernate.initialize(book1.getChapters());
} finally {
    session1.close();
}
Session session2 = factory.openSession();
try {
    Book book2 = (Book) session2.get(Book.class, id);
    Hibernate.initialize(book2.getChapters());
} finally {
    session2.close();
}
Inspecting the SQL statements, you see that one query is still made. The reason is that, unlike a many-to-one association, a collection isn't cached by default. You need to turn on caching manually by specifying a cache usage for the collection. A book's chapters collection is stored in a region named com.metaarchit.bookshop.Book.chapters.
How can Hibernate cache a collection? If the collection is storing simple values, the values themselves are cached. If the collection is storing persistent objects, the identifiers of the objects are cached in the collection region, and the persistent objects are cached in their own region:
<hibernate-mapping package="com.metaarchit.bookshop">
<class name="Book" table="BOOK">
<cache usage="read-write" />
...
<set name="chapters" table="BOOK_CHAPTER">
<cache usage="read-write" />
<key column="BOOK_ID" />
<one-to-many class="Chapter" />
</set>
</class>
</hibernate-mapping>
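A small model shows how the two regions cooperate. The sketch is plain Java with hypothetical names and an in-memory map standing in for the CHAPTER table: the collection region stores chapter identifiers, the Chapter region stores their state, and a chapter missing from its region costs an extra query.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the Book.chapters collection region and the Chapter entity region.
class ChapterCacheSketch {
    final Map<Long, List<Long>> bookChaptersRegion = new HashMap<>(); // Book id -> Chapter ids
    final Map<Long, String> chapterRegion = new HashMap<>();          // Chapter id -> title
    int sqlQueries = 0;

    private final Map<Long, Map<Long, String>> database;              // Book id -> (Chapter id -> title)

    ChapterCacheSketch(Map<Long, Map<Long, String>> database) {
        this.database = database;
    }

    List<String> initializeChapters(long bookId) {
        Map<Long, String> rows = database.getOrDefault(bookId, Collections.emptyMap());
        List<Long> ids = bookChaptersRegion.get(bookId);
        if (ids == null) {
            sqlQueries++;                              // one query loads the collection
            ids = new ArrayList<>(rows.keySet());
            bookChaptersRegion.put(bookId, ids);       // the collection region keeps identifiers only
            chapterRegion.putAll(rows);                // chapter state goes to its own region
        }
        List<String> titles = new ArrayList<>();
        for (Long chapterId : ids) {
            String title = chapterRegion.get(chapterId);
            if (title == null) {                       // chapter not cached: load it by id
                sqlQueries++;
                title = rows.get(chapterId);
                chapterRegion.put(chapterId, title);
            }
            titles.add(title);
        }
        return titles;
    }
}
```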
To invalidate a particular collection or all the collections in a region, you can use the following methods provided by the session factory:
factory.evictCollection("com.metaarchit.bookshop.Book.chapters");      // every cached chapters collection
factory.evictCollection("com.metaarchit.bookshop.Book.chapters", id);  // the chapters collection of one book
For a bidirectional one-to-many/many-to-one association, you should call the evictCollection() method on the collection end after the single end is updated. That means you first remove the Book reference from the Chapter and update the Chapter, and then call the evictCollection() method:
Session session1 = factory.openSession();
try {
    Book book1 = (Book) session1.get(Book.class, id);
    Chapter chapter = (Chapter) book1.getChapters().iterator().next();
    chapter.setBook(null);               // update the single (many-to-one) end first
    session1.saveOrUpdate(chapter);
    session1.flush();
    factory.evictCollection("com.metaarchit.bookshop.Book.chapters", id); // then evict the collection
} finally {
    session1.close();
}
Session session2 = factory.openSession();
try {
    Book book2 = (Book) session2.get(Book.class, id);
    Hibernate.initialize(book2.getChapters()); // reloaded from the database
} finally {
    session2.close();
}
Query result sets can be cached. This is useful when you run a particular query often with the same parameters.
In addition to caching objects loaded by a session, Hibernate can cache the results of an HQL query. Suppose you're running the same query in two different sessions:
Session session1 = factory.openSession();
try {
    Query query = session1.createQuery("from Book where name like ?");
    query.setString(0, "%Hibernate%");
    List books = query.list();
} finally {
    session1.close();
}
Session session2 = factory.openSession();
try {
    Query query = session2.createQuery("from Book where name like ?");
    query.setString(0, "%Hibernate%");
    List books = query.list();
} finally {
    session2.close();
}
By default, the HQL queries aren't cached. You must first enable the query cache in the Hibernate configuration file:
<hibernate-configuration>
<session-factory>
...
<property name="cache.use_query_cache">true</property>
...
</session-factory>
</hibernate-configuration>
The setting cache.use_query_cache creates two cache regions: one holding the cached query result sets and the other holding the timestamps of the most recent updates to queryable tables.
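The timestamps region is what keeps cached query results from going stale. The sketch below (a hypothetical class, with times passed in explicitly to keep it deterministic) uses a cached result only if none of the tables it reads has been updated since the result was stored.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of a query-result region paired with an update-timestamps region.
class TimestampedQueryCache {
    private static final class CachedResult {
        final List<Long> ids;
        final Set<String> tables;
        final long cachedAt;

        CachedResult(List<Long> ids, Set<String> tables, long cachedAt) {
            this.ids = ids;
            this.tables = tables;
            this.cachedAt = cachedAt;
        }
    }

    private final Map<String, CachedResult> results = new HashMap<>(); // query key -> result
    private final Map<String, Long> lastUpdated = new HashMap<>();     // table -> last update time

    void tableUpdated(String table, long time) {
        lastUpdated.put(table, time);
    }

    void put(String queryKey, List<Long> ids, Set<String> tables, long now) {
        results.put(queryKey, new CachedResult(ids, tables, now));
    }

    List<Long> get(String queryKey) {
        CachedResult result = results.get(queryKey);
        if (result == null) {
            return null;
        }
        for (String table : result.tables) {
            Long updated = lastUpdated.get(table);
            if (updated != null && updated >= result.cachedAt) {
                return null; // a table changed after the result was cached: treat as a miss
            }
        }
        return result.ids;
    }
}
```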
The cache.use_query_cache setting alone doesn't cause any query to be cached. You also need to call Query.setCacheable(true) on each query before executing it; this allows the query to look for existing cached results or add its results to the cache. By default, the query results are cached in a region named org.hibernate.cache.StandardQueryCache.
How can a query be cached by Hibernate? If the query returns simple values, the values themselves are cached. If the query returns persistent objects, the identifiers of the objects are cached in the query region, and the persistent objects are cached in their own region:
Session session1 = factory.openSession();
try {
    Query query = session1.createQuery("from Book where name like ?");
    query.setString(0, "%Hibernate%");
    query.setCacheable(true);
    List books = query.list();
} finally {
    session1.close();
}
Session session2 = factory.openSession();
try {
    Query query = session2.createQuery("from Book where name like ?");
    query.setString(0, "%Hibernate%");
    query.setCacheable(true);
    List books = query.list();
} finally {
    session2.close();
}
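The query-cache layout just described can be sketched in plain Java. QueryCacheSketch is a hypothetical class, the BOOK table is an in-memory map, and the HQL like pattern is approximated with String.contains(). The point is that the query region is keyed by the query string plus its parameter values and stores only identifiers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model: the query region maps (query string, parameter) -> Book ids,
// while each Book's state lives in the Book entity region.
class QueryCacheSketch {
    static final String HQL = "from Book where name like ?";

    private final Map<Long, String> bookTable;                           // id -> name
    private final Map<List<String>, List<Long>> queryRegion = new HashMap<>();
    private final Map<Long, String> bookRegion = new HashMap<>();
    int sqlQueries = 0;

    QueryCacheSketch(Map<Long, String> bookTable) {
        this.bookTable = bookTable;
    }

    List<String> findByNameContaining(String fragment) {
        List<String> key = Arrays.asList(HQL, fragment); // same query + same parameter => same key
        List<Long> ids = queryRegion.get(key);
        if (ids == null) {
            sqlQueries++;
            ids = new ArrayList<>();
            for (Map.Entry<Long, String> row : bookTable.entrySet()) {
                if (row.getValue().contains(fragment)) {
                    ids.add(row.getKey());
                    bookRegion.put(row.getKey(), row.getValue()); // entity state in its own region
                }
            }
            queryRegion.put(key, ids);                            // only identifiers in the query region
        }
        List<String> names = new ArrayList<>();
        for (Long id : ids) {
            names.add(bookRegion.get(id));
        }
        return names;
    }
}
```

Running findByNameContaining("Hibernate") twice issues one simulated SQL query; a different parameter value produces a different key and therefore another query.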
You can also specify the cache region for a query. Doing so lets you separate query caches into different regions and reduce the number of cached entries in any one region:
...
query.setCacheable(true);
query.setCacheRegion("com.metaarchit.bookshop.BookQuery");
In this chapter, you've seen what caching is and the different levels of caching in Hibernate. Caching can be enabled at the transaction level, where data is cached for one unit of work; this is the default behavior in Hibernate. Multiple queries in a particular session share the cached data.
You've learned that at the second level, caching is enabled for a process, and the cache is associated with a SessionFactory. Multiple sessions created from a SessionFactory share the same data. Configuring the second-level cache is a two-step process: first, you decide on the concurrency strategy, and then you choose the provider that implements that strategy.
You've also learned that you can use four concurrency strategies: read-only, read-write, transactional, and nonstrict read-write. There are many open source cache providers, including OSCache, EHCache, SwarmCache, and JBoss Cache, and each supports a different set of concurrency strategies.
Query result sets can also be cached. You do so through configuration and by invoking setCacheable(true). Associated objects and collections can also be cached explicitly.