You use object/relational mapping to move data into the application tier in order to use an object-oriented programming language to process that data. This is a good strategy when implementing a multiuser online transaction-processing application with small to medium size data sets involved in each unit of work.
On the other hand, operations that require massive amounts of data aren’t best suited for the application tier. You should move the operation closer to where the data lives, rather than the other way around. In an SQL system, the DML statements UPDATE and DELETE execute directly in the database and are often sufficient if you have to implement an operation that involves thousands of rows. Operations that are more complex may require additional procedures to run inside the database; therefore, you should consider stored procedures as one possible strategy. You can fall back to JDBC and SQL at any time in Hibernate applications. We discussed some of these options earlier, in chapter 17. In this chapter, we show you how to avoid falling back to JDBC and how to execute bulk and batch operations with Hibernate and JPA.
A major justification for our claim that applications using an object/relational persistence layer outperform applications built using direct JDBC is caching. Although we argue passionately that most applications should be designed so that it’s possible to achieve acceptable performance without the use of a cache, there’s no doubt that for some kinds of applications, especially read-mostly applications or applications that keep significant metadata in the database, caching can have an enormous impact on performance. Furthermore, scaling a highly concurrent application to thousands of online transactions per second usually requires some caching to reduce the load on the database server(s). After discussing bulk and batch operations, we explore Hibernate’s caching system.
First we look at standardized bulk statements in JPQL, such as UPDATE and DELETE, and their equivalent criteria versions. After that, we repeat some of these operations with SQL native statements. Then, you learn how to insert and update a large number of entity instances in batches. Finally, we introduce the special org.hibernate.StatelessSession API.
The Java Persistence Query Language is similar to SQL. The main difference between the two is that JPQL uses class names instead of table names and property names instead of column names. JPQL also understands inheritance—that is, whether you’re querying with a superclass or an interface. The JPA criteria query facility supports the same query constructs as JPQL but in addition offers type-safe and easy programmatic statement creation.
The next statements we show you support updating and deleting data directly in the database without the need to retrieve them into memory. We also provide a statement that can select data and insert it as new entity instances, directly in the database.
JPA offers DML operations that are a little more powerful than plain SQL. Let’s look at the first operation in JPQL: an UPDATE.
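A minimal sketch of such a statement, assuming the Item entity has a boolean active property and a seller association, and that johndoe is a User instance loaded earlier:

```java
int updatedEntities = em.createQuery(
        "update Item i set i.active = true where i.seller = :seller")
    .setParameter("seller", johndoe)
    .executeUpdate();
```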
This JPQL statement looks like an SQL statement, but it uses an entity name (class name) and property names. The aliases are optional, so you can also write update Item set active = true. You use the standard query API to bind named and positional parameters. The executeUpdate call returns the number of updated entity instances, which may be different from the number of updated database rows, depending on the mapping strategy.
This UPDATE statement only affects the database; Hibernate doesn’t update any Item instance you’ve already retrieved into the (current) persistence context. In the previous chapters, we’ve repeated that you should think about state management of entity instances, not how SQL statements are managed. This strategy assumes that the entity instances you’re referring to are available in memory. If you update or delete data directly in the database, what you’ve already loaded into application memory, into the persistence context, isn’t updated or deleted.
A pragmatic solution that avoids this issue is a simple convention: execute any direct DML operations first, in a fresh persistence context. Then, use the EntityManager to load and store entity instances. This convention guarantees that the persistence context is unaffected by any statements executed earlier. Alternatively, you can selectively use the refresh() operation to reload the state of an entity instance in the persistence context from the database, if you know it’s been modified outside of the persistence context.
Executing a DML operation directly on the database automatically clears the optional Hibernate second-level cache. Hibernate parses your JPQL and criteria bulk operations and detects which cache regions are affected. Hibernate then clears the regions in the second-level cache. Note that this is a coarse-grained invalidation: although you may only update or delete a few rows in the ITEM table, Hibernate clears and invalidates all cache regions where it holds Item data.
This is the same operation with the criteria API:
CriteriaUpdate<Item> update = criteriaBuilder.createCriteriaUpdate(Item.class);
Root<Item> i = update.from(Item.class);
update.set(i.get(Item_.active), true);
update.where(
    criteriaBuilder.equal(i.get(Item_.seller), johndoe)
);

int updatedEntities = em.createQuery(update).executeUpdate();
Another benefit is that the JPQL UPDATE statement and a CriteriaUpdate work with inheritance hierarchies. The following statement marks all credit cards as stolen if the owner’s name starts with “J” :
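A sketch of the statement, assuming CreditCard has an owner string property and a stolenOn date property:

```java
int updatedCreditCards = em.createQuery(
        "update CreditCard c set c.stolenOn = :now where c.owner like 'J%'")
    .setParameter("now", new Date(), TemporalType.TIMESTAMP)
    .executeUpdate();
```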
Hibernate knows how to execute this update, even if several SQL statements have to be generated or some data needs to be copied into a temporary table; it updates rows in several base tables (because CreditCard is mapped to several superclass and subclass tables).
JPQL UPDATE statements can reference only a single entity class, and criteria bulk operations may have only one root entity; you can’t write a single statement to update Item and CreditCard data simultaneously, for example. Subqueries are allowed in the WHERE clause, and any joins are allowed only in these subqueries.
You can update values of an embedded type: for example, update User u set u.homeAddress.street = .... You can’t update values of an embeddable type in a collection. This isn’t allowed: update Item i set i.images.title = ....
Direct DML operations, by default, don’t affect any version or timestamp values in the affected entities (as standardized by JPA). But a Hibernate extension lets you increment the version number of directly modified entity instances:
int updatedEntities = em.createQuery(
        "update versioned Item i set i.active = true")
    .executeUpdate();
The version of each updated Item entity instance will now be directly incremented in the database, indicating to any other transaction relying on optimistic concurrency control that you modified the data. (Hibernate doesn’t allow use of the versioned keyword if your version or timestamp property relies on a custom org.hibernate.usertype.UserVersionType.)
With the JPA criteria API, you have to increment the version yourself:
CriteriaUpdate<Item> update = criteriaBuilder.createCriteriaUpdate(Item.class);

Root<Item> i = update.from(Item.class);

update.set(i.get(Item_.active), true);

update.set(
    i.get(Item_.version),
    criteriaBuilder.sum(i.get(Item_.version), 1)
);

int updatedEntities = em.createQuery(update).executeUpdate();
The second bulk operation we introduce is DELETE:
em.createQuery("delete CreditCard c where c.owner like 'J%'")
    .executeUpdate();

CriteriaDelete<CreditCard> delete =
    criteriaBuilder.createCriteriaDelete(CreditCard.class);

Root<CreditCard> c = delete.from(CreditCard.class);

delete.where(
    criteriaBuilder.like(
        c.get(CreditCard_.owner),
        "J%"
    )
);

em.createQuery(delete).executeUpdate();
The same rules that apply to UPDATE statements and CriteriaUpdate apply to DELETE and CriteriaDelete: joins aren’t allowed, only a single entity class may be referenced, aliases are optional, and subqueries are permitted only in the WHERE clause.
Another special JPQL bulk operation lets you create entity instances directly in the database.
Let’s assume that some of your customers’ credit cards have been stolen. You write two bulk operations to mark the day they were stolen (well, the day you discovered the theft) and to remove the compromised credit-card data from your records. Because you work for a responsible company, you have to report the stolen credit cards to the authorities and affected customers. Therefore, before you delete the records, you extract everything stolen and create a few hundred (or thousand) StolenCreditCard records. You write a new mapped entity class just for this purpose:
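A minimal sketch of the entity class; the property names match the columns selected in the following statement, and constructors and accessors are omitted:

```java
@Entity
public class StolenCreditCard {

    @Id // The identifier value is copied from the original CreditCard record
    protected Long id;

    protected String owner;
    protected String cardNumber;
    protected String expMonth;
    protected String expYear;

    protected Long userId;     // Identifier of the card's owning User
    protected String username;

    // Constructors, getters, and setters omitted
}
```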
Hibernate maps this class to the STOLENCREDITCARD table. Next, you need a statement that executes directly in the database, retrieves all compromised credit cards, and creates new StolenCreditCard records. This is possible with the Hibernate-only INSERT ... SELECT statement:
int createdRecords = em.createQuery(
    "insert into" +
    " StolenCreditCard(id, owner, cardNumber, expMonth, expYear, userId, username)" +
    " select c.id, c.owner, c.cardNumber, c.expMonth, c.expYear, u.id, u.username" +
    " from CreditCard c join c.user u where c.owner like 'J%'"
).executeUpdate();
This operation does two things. First, it selects the details of CreditCard records and the respective owner (a User). Second, it inserts the result directly into the table mapped by the StolenCreditCard class.
Note that the INSERT ... SELECT statement was, at the time of writing, not supported by the JPA or Hibernate criteria APIs.
JPQL and criteria bulk operations cover many situations in which you’d usually resort to plain SQL. In some cases, you may want to execute SQL bulk operations without falling back to JDBC.
In the previous section, you saw JPQL UPDATE and DELETE statements. The primary advantage of these statements is that they work with class and property names and that Hibernate knows how to handle inheritance hierarchies and versioning when generating SQL. Because Hibernate parses JPQL, it also knows how to efficiently dirty-check and flush the persistence context before the query and how to invalidate second-level cache regions.
If JPQL doesn’t have the features you need, you can execute native SQL bulk statements:
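For example, the following native statement performs the same update as the earlier JPQL version, this time written directly against the assumed ITEM table and its columns:

```java
em.createNativeQuery(
        "update ITEM set ACTIVE = true where SELLER_ID = :sellerId")
    .setParameter("sellerId", johndoe.getId())
    .executeUpdate();
```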
With JPA native bulk statements, you must be aware of one important issue: Hibernate will not parse your SQL statement to detect the affected tables. This means Hibernate doesn’t know whether a flush of the persistence context is required before the query executes. In the previous example, Hibernate doesn’t know you’re updating rows in the ITEM table. Hibernate has to dirty-check and flush any entity instances in the persistence context when you execute the query; it can’t only dirty-check and flush Item instances in the persistence context.
You must consider another issue if you enable the second-level cache (if you don’t, don’t worry): Hibernate has to keep your second-level cache synchronized to avoid returning stale data, so it will invalidate and clear all second-level cache regions when you execute a native SQL UPDATE or DELETE statement. This means your second-level cache will be empty after this query!
You can get more fine-grained control over dirty checking, flushing, and second-level cache invalidation with the Hibernate API for SQL queries:
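A sketch using the native Hibernate Session, unwrapped from the EntityManager; the table and column names are the same assumptions as before:

```java
Session session = em.unwrap(Session.class);

org.hibernate.SQLQuery query = session.createSQLQuery(
    "update ITEM set ACTIVE = true where SELLER_ID = :sellerId"
);
query.setParameter("sellerId", johndoe.getId());

// Hibernate now knows this statement affects only the table(s) mapped by Item
query.addSynchronizedEntityClass(Item.class);

query.executeUpdate();
```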
With the addSynchronizedEntityClass() method, you can let Hibernate know which tables are affected by your SQL statement, and Hibernate will clear only the relevant cache regions. Hibernate now also knows that it has to flush only modified Item entity instances in the persistence context before the query.
Sometimes you can’t exclude the application tier in a mass data operation. You have to load data into application memory and work with the EntityManager to perform your updates and deletions, which brings us to batch processing.
If you have to create or update a few hundred or thousand entity instances in one transaction and unit of work, you may run out of memory. Furthermore, you have to consider the time it takes for the transaction to complete. Most transaction managers have a low transaction timeout, in the range of seconds or minutes. The Bitronix transaction manager used for the examples in this book has a default transaction timeout of 60 seconds. If your unit of work takes longer to complete, you should first override this timeout for a particular transaction:
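A sketch with the standard JTA API; the JNDI name of the UserTransaction varies by container:

```java
UserTransaction tx = (UserTransaction) new InitialContext()
    .lookup("java:comp/UserTransaction");

tx.setTransactionTimeout(300); // Seconds; affects only transactions begun afterward
tx.begin();

// ... long-running unit of work ...

tx.commit();
```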
This is the UserTransaction API. Only future transactions started on this thread will have the new timeout. You must set the timeout before you begin() the transaction.
Next, let’s insert a few thousand Item instances into the database in a batch.
Every transient entity instance you pass to EntityManager#persist() is added to the persistence context cache, as explained in section 10.2.8. To prevent memory exhaustion, you flush() and clear() the persistence context after a certain number of insertions, effectively batching the inserts.
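A sketch of the insertion loop; the Item constructor shown here is a hypothetical convenience, and the batch size of 100 matches the JDBC setting discussed next:

```java
for (int i = 0; i < 100000; i++) {
    Item item = new Item("Item #" + i);
    em.persist(item);

    if (i % 100 == 0) {
        em.flush(); // Execute the queued INSERTs as a JDBC batch
        em.clear(); // Detach all Item instances, freeing memory
    }
}
```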
You should set the hibernate.jdbc.batch_size property in the persistence unit to the same size as your batch, here 100. With this setting, Hibernate will batch the INSERT statements at the JDBC level, with PreparedStatement#addBatch().
A batch procedure persisting several different entity instances in an interleaved fashion, let’s say an Item, then a User, then another Item, another User, and so on, isn’t efficiently batched at the JDBC level. When flushing, Hibernate generates an insert into ITEM SQL statement, then an insert into USERS statement, then another insert into ITEM statement, and so on. Hibernate can’t execute a larger batch at once, given that each statement is different from the last. If you enable the property hibernate.order_inserts in the persistence unit configuration, Hibernate sorts the operations before trying to build a batch of statements. Hibernate then executes all INSERT statements for the ITEM table and all INSERT statements for the USERS table. Then, Hibernate can batch the statements at the JDBC level.
If you enable the shared second-level cache for the Item entity, you should then bypass the cache for your batch (insertion) procedure; see section 20.2.5.
A serious problem with mass insertions is contention on the identifier generator: every call of EntityManager#persist() must obtain a new identifier value. Typically, the generator is a database sequence, called once for every persisted entity instance. You have to reduce the number of database round trips for an efficient batch procedure.
In section 4.2.5, we recommended the Hibernate-specific enhanced-sequence generator, not least because it supports certain optimizations ideal for batch operations. First, define the generator in the package-info.java metadata:
@org.hibernate.annotations.GenericGenerator(
    name = "ID_GENERATOR_POOLED",
    strategy = "enhanced-sequence",
    parameters = {
        @org.hibernate.annotations.Parameter(
            name = "sequence_name",
            value = "JPWH_SEQUENCE"
        ),
        @org.hibernate.annotations.Parameter(
            name = "increment_size",
            value = "100"
        ),
        @org.hibernate.annotations.Parameter(
            name = "optimizer",
            value = "pooled-lo"
        )
    })
Now use the generator with @GeneratedValue in your mapped entity classes.
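For example, on the identifier property of the Item entity:

```java
@Entity
public class Item {

    @Id
    @GeneratedValue(generator = "ID_GENERATOR_POOLED")
    protected Long id;

    // ...
}
```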
With increment_size set to 100, the sequence produces the “next” values 100, 200, 300, 400, and so on. The pooled-lo optimizer in Hibernate generates intermediate values each time you call persist(), without another round trip to the database. Therefore, if the next value obtained from the sequence is 100, Hibernate will generate the identifier values 101, 102, 103, and so on in the application tier. Once the optimizer’s pool of 100 identifier values is exhausted, the database obtains the next sequence value, and the procedure repeats. This means you only make one round trip to get an identifier value from the database per batch of 100 insertions. Other identifier generator optimizers are available, but the pooled-lo optimizer covers virtually all use cases and is the easiest to understand and configure.
Be aware that an increment size of 100 will leave large gaps between numeric identifiers if another application uses the same sequence but doesn’t apply the same algorithm as Hibernate’s optimizer. This shouldn’t be too much of a concern; instead of being able to generate a new identifier value each millisecond for 300 million years, you might exhaust the number space in 3 million years.
You can use the same batching technique to update a large number of entity instances.
Imagine that you have to manipulate many Item entity instances and that the changes you need to make aren’t as trivial as setting a flag (which you’ve done with a single UPDATE JPQL statement previously). Let’s also assume that you can’t create a database stored procedure, for whatever reason (maybe because your application has to work on database-management systems that don’t support stored procedures). Your only choice is to write the procedure in Java and to retrieve a massive amount of data into memory to run it through the procedure.
This requires working in batches and scrolling through a query result with a database cursor, which is a Hibernate-only query feature. Please review our explanation of scrolling with cursors in section 14.3.3 and make sure database cursors are properly supported by your DBMS and JDBC driver. The following code loads 100 Item entity instances at a time for processing.
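A sketch of the procedure; modifyItem() stands in for whatever nontrivial change you have to make to each instance:

```java
ScrollableResults itemCursor = em.unwrap(Session.class)
    .createQuery("select i from Item i")
    .scroll(ScrollMode.FORWARD_ONLY);

int count = 0;
while (itemCursor.next()) {
    Item item = (Item) itemCursor.get(0);
    modifyItem(item);

    if (++count % 100 == 0) {
        em.flush(); // Execute the queued UPDATEs as a JDBC batch
        em.clear(); // Detach all processed Item instances
    }
}
itemCursor.close();
```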
For the best performance, you should set the size of the property hibernate.jdbc.batch_size in the persistence unit configuration to the same value as your procedure batch: 100. Hibernate batches at the JDBC level all UPDATE statements executed while flushing. By default, Hibernate won’t batch at the JDBC level if you’ve enabled versioning for an entity class—some JDBC drivers have trouble returning the correct updated row count for batch UPDATE statements (Oracle is known to have this issue). If you’re sure your JDBC driver supports this properly, and your Item entity class has an @Version annotation, enable JDBC batching by setting the property hibernate.jdbc.batch_versioned_data to true. If you enable the shared second-level cache for the Item entity, you should then bypass the cache for your batch (update) procedure; see section 20.2.5.
Another option that avoids memory consumption in the persistence context (by effectively disabling it) is the org.hibernate.StatelessSession interface.
The persistence context is an essential feature of the Hibernate engine. Without a persistence context, you can’t manipulate entity state and have Hibernate detect your changes automatically. Many other things also wouldn’t be possible.
Hibernate offers an alternative interface, however, if you prefer to work with your database by executing statements. The statement-oriented interface org.hibernate.StatelessSession feels and works like plain JDBC, except that you get the benefit of mapped persistent classes and Hibernate’s database portability. The most interesting methods in this interface are insert(), update(), and delete(), which all map to the equivalent immediately executed JDBC/SQL operation.
Let’s write the same “update all item entity data” procedure from the earlier example with this interface.
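A sketch of the stateless variant; again, modifyItem() stands in for the actual processing:

```java
StatelessSession statelessSession = em.unwrap(Session.class)
    .getSessionFactory().openStatelessSession();

ScrollableResults itemCursor = statelessSession
    .createQuery("select i from Item i")
    .scroll(ScrollMode.FORWARD_ONLY);

while (itemCursor.next()) {
    Item item = (Item) itemCursor.get(0);
    modifyItem(item);
    statelessSession.update(item); // Executes an SQL UPDATE immediately
}
itemCursor.close();

statelessSession.close();
```

Note that there is no flush() or clear() here: without a persistence context, nothing is queued in memory, and every update() hits the database right away.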
Disabling the persistence context and working with the StatelessSession interface has some other serious consequences and conceptual limitations (at least, if you compare it to a regular EntityManager and org.hibernate.Session):
Good use cases for a StatelessSession are rare; you may prefer it if manual batching with a regular EntityManager becomes cumbersome.
In the next section, we introduce the Hibernate shared caching system. Caching data on the application tier is a complementary optimization that you can utilize in any sophisticated multiuser application.
In this section, we show you how to enable, tune, and manage the shared data caches in Hibernate. The shared data cache is not the persistence context cache, which Hibernate never shares between application threads. For reasons explained in section 10.1.2, this isn’t optional. We call the persistence context a first-level cache. The shared data cache—the second-level cache—is optional, and although JPA standardizes some configuration settings and mapping metadata for shared caching, every vendor has different solutions for optimization. Let’s start with some background information and explore the architecture of Hibernate’s shared cache.
A cache keeps a representation of current database state close to the application, either in memory or on disk of the application server machine. A cache is a local copy of the data and sits between your application and the database. Simplified, to Hibernate a cache looks like a map of key/value pairs. Hibernate can store data in the cache by providing a key and a value, and it can look up a value in the cache with a key.
Hibernate has several types of shared caches available. You may use a cache to avoid a database hit whenever the following take place:

- An entity instance is retrieved by identifier (the entity data cache)
- An entity instance is retrieved by natural identifier lookup (the natural identifier cache)
- A persistent collection of an entity instance is loaded (the collection cache)
- A query result is retrieved (the query result cache)
It’s critically important to understand that the entity data cache is the only type of cache that holds actual entity data values. The other three cache types only hold entity identifier information. Therefore, it doesn’t make sense to enable the natural identifier cache, for example, without also enabling the entity data cache. A lookup in the natural identifier cache will, when a match is found, always involve a lookup in the entity data cache. We’ll further analyze this behavior below with some code examples.
As we hinted earlier, Hibernate has a two-level cache architecture.
Hibernate holds data in the second-level entity cache as a copy in a disassembled format and reassembles it when read from the cache. Copying data is an expensive operation, so, as an optimization, Hibernate allows you to specify that immutable data may be stored as is rather than copied into the second-level cache. This is useful for reference data. Let’s say you have a City entity class with the properties zipcode and name, annotated @Immutable. If you enable the configuration property hibernate.cache.use_reference_entries in your persistence unit, Hibernate will try (though it can’t in some special cases) to store a reference to City directly in the second-level data cache. One caveat is that if you accidentally modify an instance of City in your application, the change will effectively write through to all concurrent users of the (local) cache region, because they all get the same reference.
You can see the various elements of Hibernate’s caching system in figure 20.1. The first-level cache is the persistence context cache, which we discussed in section 10.1.2. Hibernate does not share this cache between threads; each application thread has its own copy of the data in this cache. Hence, there are no issues with transaction isolation and concurrency when accessing this cache.
The second-level cache system in Hibernate may be process-scoped in the JVM or may be a cache system that can work in a cluster of JVMs. Multiple application threads may access the shared second-level caches concurrently. The cache concurrency strategy defines the transaction isolation details for entity data, collection elements, and natural identifier caches. Whenever an entry is stored or loaded in these caches, Hibernate will coordinate access with the configured strategy. Picking the right cache concurrency strategy for entity classes and their collections can be challenging, and we’ll guide you through the process with several examples later on.
The query result cache also has its own, internal strategy for handling concurrent access and keeping the cached results fresh and coordinated with the database. We show you how the query cache works and for which queries it makes sense to enable result caching.
The cache provider implements the physical caches as a pluggable system. For now, Hibernate forces you to choose a single cache provider for the entire persistence unit. The cache provider is responsible for handling physical cache regions—the buckets where the data is held on the application tier (in memory, in indexed files, or even replicated in a cluster). The cache provider controls expiration policies, such as when to remove data from a region by timeout, or keeping only the most-recently used data when the cache is full. The cache provider implementation may be able to communicate with other instances in a cluster of JVMs, to synchronize data in each instance’s buckets. Hibernate itself doesn’t handle any clustering of caches; this is fully delegated to the cache provider engine.
In this section, you set up caching on a single JVM with the Ehcache provider, a simple but very powerful caching engine (originally developed for Hibernate specifically as the easy Hibernate cache). We only cover some of Ehcache’s basic settings; consult its manual for more information.
Frequently, the first question many developers have about the Hibernate caching system is, “Will the cache know when data is modified in the database?” Let’s try to answer this question before you get hands-on with cache configuration and usage.
If an application does not have exclusive access to the database, shared caching should only be used for data that changes rarely and for which a small window of inconsistency is acceptable after an update. When another application updates the database, your cache may contain stale data until it expires. The other application may be a database-triggered stored procedure or even an ON DELETE or ON UPDATE foreign key option. There is no way for Hibernate’s cache system to know when another application or trigger updates the data in the database; the database can’t send your application a message. (You could implement such a messaging system with database triggers and JMS, but doing so isn’t exactly trivial.) Therefore, using caching depends on the type of data and the freshness of the data required by your business case.
Let’s assume for a moment that your application has exclusive access to the database. Even then, you must ask the same questions, because a shared cache makes data retrieved from the database in one transaction visible to another transaction. What transaction isolation guarantees should the shared cache provide? The shared cache will affect the isolation level of your transactions: whether you read only committed data, and whether reads are repeatable. For some data, it may be acceptable that updates by one application thread aren’t immediately visible to other application threads, providing an acceptable window of inconsistency. This would allow a much more efficient and aggressive caching strategy.
Start this design process with a diagram of your domain model, and look at the entity classes. Good candidates for caching are classes that represent

- Data that changes rarely
- Noncritical data (for example, content-management data)
- Data that’s local to the application and not shared
Bad candidates include

- Data that is updated often
- Financial data, where decisions must be based on the latest update
- Data that is shared with and/or written by other applications
These aren’t the only rules we usually apply. Many applications have a number of classes with the following properties:

- A small number of instances
- Each instance referenced by many instances of another class or classes
- Instances that are rarely (or never) updated
We sometimes call this kind of data reference data. Examples of reference data are Zip codes, locations, static text messages, and so on. Reference data is an excellent candidate for shared caching, and any application that uses reference data heavily will benefit greatly from caching that data. You allow the data to be refreshed when the cache timeout period expires, and some small window of inconsistency is acceptable after an update. In fact, some reference data (such as country codes) may have an extremely large window of inconsistency or may be cached eternally if the data is read-only.
You must exercise careful judgment for each class and collection for which you want to enable caching. You have to decide which concurrency strategy to use.
A cache concurrency strategy is a mediator: it’s responsible for storing items of data in the cache and retrieving them from the cache. This important role defines the transaction isolation semantics for that particular item. You’ll have to decide, for each persistent class and collection, which cache concurrency strategy to use if you want to enable the shared cache.
The four built-in Hibernate concurrency strategies represent decreasing levels of strictness in terms of transaction isolation:

- TRANSACTIONAL: Available for fully transactional cache providers in a JTA environment; guarantees repeatable-read transaction isolation for cached data.
- READ_WRITE: Maintains read-committed isolation, using a timestamping mechanism; available only in nonclustered environments.
- NONSTRICT_READ_WRITE: Makes no guarantee of consistency between the cache and the database; there may be a window of inconsistency in which transactions read stale data from the cache.
- READ_ONLY: Suitable for data that never changes; use it for immutable reference data only.
With decreasing strictness come increasing performance and scalability. A clustered asynchronous cache with NONSTRICT_READ_WRITE can handle many more transactions than a synchronous cluster with TRANSACTIONAL. You have to evaluate carefully the performance of a clustered cache with full transaction isolation before using it in production. In many cases, you may be better off not enabling the shared cache for a particular class, if stale data isn’t an option!
You should benchmark your application with the shared cache disabled. Enable it for good candidate classes, one at a time, while continuously testing the scalability of your system and evaluating concurrency strategies. You must have automated tests available to judge the impact of changes to your cache setup. We recommend that you write these tests first, for the performance and scalability hotspots of your application, before you enable the shared cache.
With all this theory under your belt, it’s time to see how caching works in practice. First, you configure the shared cache.
You configure the shared cache in your persistence.xml configuration file.
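A sketch of the relevant settings; the persistence-unit name is hypothetical, and the exact region factory class and Ehcache configuration resource name depend on your Hibernate and Ehcache versions:

```xml
<persistence-unit name="CaveatEmptorPU">
    <shared-cache-mode>ENABLE_SELECTIVE</shared-cache-mode>
    <properties>
        <property name="hibernate.cache.use_second_level_cache" value="true"/>
        <property name="hibernate.cache.region.factory_class"
                  value="org.hibernate.cache.ehcache.EhCacheRegionFactory"/>
        <property name="net.sf.ehcache.configurationResourceName"
                  value="/ehcache.xml"/>
    </properties>
</persistence-unit>
```

The shared-cache-mode of ENABLE_SELECTIVE means only entity classes explicitly annotated @Cacheable are cached.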
The second-level cache system is now ready, and Hibernate will start Ehcache when you build an EntityManagerFactory for this persistence unit. Hibernate won’t cache anything by default, though; you have to enable caching selectively for entity classes and their collections.
We now look at entity classes and collections of the CaveatEmptor domain model and enable caching with the right concurrency strategy. In parallel, you’ll configure the necessary physical cache regions in the Ehcache configuration file.
First the User entity: this data rarely changes, but, of course, a user may change their user name or address from time to time. This isn’t critical data in a financial sense; few people make buying decisions based on a user’s name or address. A small window of inconsistency is acceptable when a user changes name or address information. Let’s say there is no problem if, for a maximum of one minute, the old information is still visible in some transactions. This means you can enable caching with the NONSTRICT_READ_WRITE strategy:
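A sketch of the mapping; the identifier generator name is an assumption carried over from earlier chapters:

```java
@Entity
@Cacheable
@org.hibernate.annotations.Cache(
    usage = org.hibernate.annotations.CacheConcurrencyStrategy.NONSTRICT_READ_WRITE
)
@org.hibernate.annotations.NaturalIdCache
public class User {

    @Id
    @GeneratedValue(generator = "ID_GENERATOR")
    protected Long id;

    @org.hibernate.annotations.NaturalId(mutable = true)
    protected String username;

    // ...
}
```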
The @Cacheable annotation enables the shared cache for this entity class, but a Hibernate annotation is necessary to pick the concurrency strategy. Hibernate stores and loads User entity data in the second-level cache, in a cache region named your.package.name.User. You can override the name with the region attribute of the @Cache annotation. (Alternatively, you can set a global region name prefix with the hibernate.cache.region_prefix property in the persistence unit.)
You also enable the natural identifier cache for the User entity with @org.hibernate.annotations.NaturalIdCache. The natural identifier properties are marked with @org.hibernate.annotations.NaturalId, and you have to tell Hibernate whether the property is mutable. This enables you to look up User instances by username without hitting the database.
Next, configure the cache regions for both the entity data and the natural identifier caches in Ehcache:
<ehcache xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:noNamespaceSchemaLocation="http://ehcache.org/ehcache.xsd">

    <cache name="org.jpwh.model.cache.User"
           maxElementsInMemory="500"
           eternal="false"
           timeToIdleSeconds="30"
           timeToLiveSeconds="60"/>

    <cache name="org.jpwh.model.cache.User##NaturalId"
           maxElementsInMemory="500"
           eternal="false"
           timeToIdleSeconds="30"
           timeToLiveSeconds="60"/>

</ehcache>
You can store a maximum of 500 entries in each cache, and Ehcache won’t keep them eternally. Ehcache will remove an element if it hasn’t been accessed for 30 seconds and will remove even actively accessed entries after 1 minute. This guarantees that your window of inconsistency from cache reads is never more than 1 minute. In other words, the cache region(s) will hold up to the 500 most-recently used user accounts, none older than 1 minute, and shrink automatically.
Let’s move on to the Item entity class. This data changes frequently, although you still have many more reads than writes. If the name or description of an item is changed, concurrent transactions should see this update immediately. Users make financial decisions, whether to buy an item, based on the description of an item. Therefore, READ_WRITE is an appropriate strategy:
@Entity
@Cacheable
@org.hibernate.annotations.Cache(
    usage = org.hibernate.annotations.CacheConcurrencyStrategy.READ_WRITE
)
public class Item {

    // ...
}
Hibernate will coordinate reads and writes when Item changes are made, ensuring that you can always read committed data from the shared cache. If another application is modifying Item data directly in the database, all bets are off! You configure the cache region in Ehcache to expire the most-recently used Item data after one hour, to avoid filling up the cache bucket with stale data:
<cache name="org.jpwh.model.cache.Item"
       maxElementsInMemory="5000"
       eternal="false"
       timeToIdleSeconds="600"
       timeToLiveSeconds="3600"/>
Consider the bids collection of the Item entity class: a particular Bid in the Item#bids collection is immutable, but the collection itself is mutable, and concurrent units of work need to see any addition or removal of a collection element immediately:
public class Item {

    @OneToMany(mappedBy = "item")
    @org.hibernate.annotations.Cache(
        usage = org.hibernate.annotations.CacheConcurrencyStrategy.READ_WRITE
    )
    protected Set<Bid> bids = new HashSet<>();

    // ...
}
You configure the cache region with the same settings as for the entity class owning the collection, because each Item has one bids collection:
<cache name="org.jpwh.model.cache.Item.bids"
       maxElementsInMemory="5000"
       eternal="false"
       timeToIdleSeconds="600"
       timeToLiveSeconds="3600"/>
It’s critical to remember that the collection cache will not contain the actual Bid data. The collection cache only holds a set of Bid identifier values. Therefore, you must enable caching for the Bid entity as well. Otherwise, Hibernate may hit the cache when you start iterating through Item#bids, but then, due to cache misses, load each Bid separately from the database. This is a case where enabling the cache will result in more load on your database server!
We’ve said that Bids are immutable, so you can cache this entity data as READ_ONLY:
@Entity
@org.hibernate.annotations.Immutable
@Cacheable
@org.hibernate.annotations.Cache(
    usage = CacheConcurrencyStrategy.READ_ONLY
)
public class Bid {

    // ...
}
Even though Bids are immutable, you should configure an expiration policy for the cache region, to prevent old bid data from clogging up the cache eternally:
<cache name="org.jpwh.model.cache.Bid"
       maxElementsInMemory="100000"
       eternal="false"
       timeToIdleSeconds="600"
       timeToLiveSeconds="3600"/>
You’re now ready to test the cache and explore Hibernate’s caching behavior.
Hibernate’s transparent caching behavior can be difficult to analyze. The API for loading and storing data is still the EntityManager, with Hibernate automatically writing and reading data in the cache. Of course, you can see actual database access by logging Hibernate’s SQL statements, but you should familiarize yourself with the org.hibernate.stat.Statistics API to obtain more information about a unit of work and see what’s going on behind the scenes. Let’s run through some examples to see how this works.
You enabled the statistics collector earlier in the persistence unit configuration, in section 20.2.2. You access the statistics of the persistence unit on the org.hibernate.SessionFactory:
Statistics stats = JPA.getEntityManagerFactory()
    .unwrap(SessionFactory.class)
    .getStatistics();

SecondLevelCacheStatistics itemCacheStats =
    stats.getSecondLevelCacheStatistics(Item.class.getName());
assertEquals(itemCacheStats.getElementCountInMemory(), 3);
assertEquals(itemCacheStats.getHitCount(), 0);
Here, you also get statistics for the data cache region for Item entities, and you can see that there are several entries already in the cache. This is a warm cache; Hibernate stored data in the cache when the application saved Item entity instances. However, the entities haven’t been read from the cache, and the hit count is zero.
When you now look up an Item instance by identifier, Hibernate attempts to read the data from the cache and avoids executing an SQL SELECT statement:
Item item = em.find(Item.class, ITEM_ID);
assertEquals(itemCacheStats.getHitCount(), 1);
You also have some User entity data in the cache, so initializing the Item#seller proxy hits the cache, too:
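A check along these lines illustrates the cache hit (a sketch; the second-level cache statistics region lookup follows the same pattern as before, and the particular username value is an assumption):

```java
SecondLevelCacheStatistics userCacheStats =
    stats.getSecondLevelCacheStatistics(User.class.getName());

Item item = em.find(Item.class, ITEM_ID);

// Accessing a property of the seller initializes the proxy; the User
// data comes from the second-level cache, not from the database
assertEquals(item.getSeller().getUsername(), "johndoe");
assertEquals(userCacheStats.getHitCount(), 1);
```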
When you iterate through the Item#bids collection, Hibernate uses the cache:
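A sketch of such a test (the element count is an assumption; remember that the collection region only yields Bid identifiers, which are then resolved through the Bid entity cache region):

```java
SecondLevelCacheStatistics bidsCacheStats =
    stats.getSecondLevelCacheStatistics(Item.class.getName() + ".bids");

Item item = em.find(Item.class, ITEM_ID);

// Iterating the collection reads the Bid identifiers from the
// collection cache region and assembles each Bid from the entity cache
assertEquals(item.getBids().size(), 3);
assertEquals(bidsCacheStats.getHitCount(), 1);
```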
The special natural identifier cache for Users is not completely transparent. You need to call a method on the org.hibernate.Session to perform a lookup by natural identifier:
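With the Hibernate API, such a lookup might look like this (a sketch; the username value is an assumption):

```java
// Unwrap the native Hibernate Session from the EntityManager
User user = em.unwrap(Session.class)
    .byNaturalId(User.class)
    .using("username", "johndoe")
    .load();

// If the natural identifier cache is warm, no SQL SELECT is needed
assertNotNull(user);
```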
The statistics API offers much more information than we’ve shown in these simple examples; we encourage you to explore this API further. Hibernate collects information about all its operations, and these statistics are useful for finding hotspots such as the queries taking the longest time and the entities and collections most accessed.
You can analyze Hibernate statistics at runtime through the standard Java Management Extension (JMX) system. Bind the Hibernate Statistics object as an MBean; this is only a few lines of code with a dynamic proxy. We’ve included an example in org.jpwh.test.cache.SecondLevel.
As mentioned at the beginning of this section, Hibernate transparently writes and reads the cached data. For some procedures, you need more control over cache usage, and you may want to bypass the caches explicitly. This is where cache modes come into play.
JPA standardizes control of the shared cache with several cache modes. The following EntityManager#find() operation, for example, doesn’t attempt a cache lookup and hits the database directly:
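Such a call might look like this (a sketch; ITEM_ID stands in for an identifier value from the surrounding test):

```java
Map<String, Object> properties = new HashMap<>();
properties.put("javax.persistence.cache.retrieveMode", CacheRetrieveMode.BYPASS);

// The find() bypasses the shared cache and executes an SQL SELECT
Item item = em.find(Item.class, ITEM_ID, properties);
```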
The default CacheRetrieveMode is USE; here, you override it for one operation with BYPASS.
A more common usage of cache modes is the CacheStoreMode. By default, Hibernate puts entity data in the cache when you call EntityManager#persist(). It also puts data in the cache when you load an entity instance from the database. But if you store or load a large number of entity instances, you may not want to fill up the available cache. This is especially important for batch procedures, as we showed earlier in this chapter.
You can disable storage of data in the shared entity cache for the entire unit of work by setting a CacheStoreMode on the EntityManager:
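For example (a minimal sketch using the standard JPA property name):

```java
// All loads and stores in this unit of work skip the shared entity cache
em.setProperty("javax.persistence.cache.storeMode", CacheStoreMode.BYPASS);
```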
Let’s look at the special cache mode CacheStoreMode.REFRESH. When you load an entity instance from the database with the default CacheStoreMode.USE, Hibernate first asks the cache whether it already has the data of the loaded entity instance. If the cache already contains the data, Hibernate doesn’t put the loaded data into the cache. This avoids a cache write, assuming that cache reads are cheaper. With the REFRESH mode, Hibernate always puts loaded data into the cache without first querying the cache.
In a cluster with synchronous distributed caching, writing to all cache nodes is usually a very expensive operation. In fact, with a distributed cache, you should set the configuration property hibernate.cache.use_minimal_puts to true. This optimizes second-level cache operation to minimize writes, at the cost of more frequent reads. If, however, there is no difference for your cache provider and architecture between reads and writes, you may want to disable the additional read with CacheStoreMode.REFRESH. (Note that some cache providers in Hibernate may set use_minimal_puts for you: for example, with Ehcache this setting is enabled by default.)
Cache modes, as you’ve seen, can be set on the find() operation and for the entire EntityManager. You can also set cache modes on the refresh() operation and on individual Query instances as hints, as discussed in section 14.5. The per-operation and per-query settings override the cache mode of the EntityManager.
The cache mode only influences how Hibernate works with the caches internally. Sometimes you want to control the cache system programmatically: for example, to remove data from the cache.
The standard JPA interface for controlling the caches is the Cache API:
EntityManagerFactory emf = JPA.getEntityManagerFactory();
Cache cache = emf.getCache();

assertTrue(cache.contains(Item.class, ITEM_ID));
cache.evict(Item.class, ITEM_ID);
cache.evict(Item.class);
cache.evictAll();
This is a simple API, and it only allows you to access cache regions of entity data. You need the org.hibernate.Cache API to access the other cache regions, such as the collection and natural identifier cache regions:
org.hibernate.Cache hibernateCache =
    cache.unwrap(org.hibernate.Cache.class);

assertFalse(hibernateCache.containsEntity(Item.class, ITEM_ID));
hibernateCache.evictEntityRegions();
hibernateCache.evictCollectionRegions();
hibernateCache.evictNaturalIdRegions();
hibernateCache.evictQueryRegions();
You’ll rarely need these control mechanisms. Also, note that eviction of the second-level cache is nontransactional: that is, Hibernate doesn’t lock the cache regions during eviction.
Let’s move on to the last part of the Hibernate caching system: the query result cache.
The query result cache is by default disabled, and every JPA, criteria, or native SQL query you write always hits the database first. In this section, we show you why Hibernate disables the query cache by default and then how to enable it for particular queries when needed.
The following procedure executes a JPQL query and stores the result in a special cache region for query results:
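Such a procedure might look like this (a sketch; the JPQL string and parameter value are assumptions):

```java
Query query = em.createQuery("select i from Item i where i.name like :n");
query.setParameter("n", "M%");

// Enable the query result cache for this query only
query.setHint("org.hibernate.cacheable", true);

List<Item> items = query.getResultList();
```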
The org.hibernate.cacheable hint is set on the Query API, so it also works for criteria and native SQL queries. Internally, the cache key is the SQL statement Hibernate uses to access the database, with arguments rendered into the string where you had parameter markers.
The query result cache doesn’t contain the entire result set of the SQL query. In the last example, the SQL result set contained rows from the ITEM table. Hibernate ignores most of the information in this result set; only the ID value of each ITEM record is stored in the query result cache. The property values of each Item are stored in the entity cache region.
Now, when you execute the same query again, with the same argument values for its parameters, Hibernate first accesses the query result cache. It retrieves the identifier values of the ITEM records from the cache region for query results. Then, Hibernate looks up and assembles each Item entity instance by identifier from the entity cache region. If you query for entities and decide to enable caching, make sure you also enable regular data caching for these entities. If you don’t, you may end up with more database hits after enabling the query result cache!
If you cache the result of a query that doesn’t return entity instances but returns only scalar or embeddable values (for example, select i.name from Item i or select u.homeAddress from User u), the values are held in the query result cache region directly.
The query result cache uses two physical cache regions:
<cache name="org.hibernate.cache.internal.StandardQueryCache"
       maxElementsInMemory="500"
       eternal="false"
       timeToIdleSeconds="600"
       timeToLiveSeconds="3600"/>

<cache name="org.hibernate.cache.spi.UpdateTimestampsCache"
       maxElementsInMemory="50"
       eternal="true"/>
The first cache region is where the query results are stored. You should let the cache provider expire the most-recently used result sets over time, such that the cache uses the available space for recently executed queries.
The second region, org.hibernate.cache.spi.UpdateTimestampsCache, is special: Hibernate uses this region to decide whether a cached query result set is stale. When you re-execute a query with caching enabled, Hibernate looks in the timestamp cache region for the timestamp of the most recent insert, update, or delete made to the queried table(s). If the timestamp found is later than the timestamp of the cached query results, Hibernate discards the cached results and issues a new database query. This effectively guarantees that Hibernate won’t use the cached query result if any table that may be involved in the query contains updated data; hence, the cached result may be stale. You should disable expiration of the update timestamp cache so that the cache provider never removes an element from this cache. The maximum number of elements in this cache region depends on the number of tables in your mapped model.
The majority of queries don’t benefit from result caching. This may come as a surprise. After all, it sounds like avoiding a database hit is always a good thing. There are two good reasons this doesn’t always work for arbitrary queries, compared to entity retrieval by identifier or collection initialization.
First, you must ask how often you’re going to execute the same query repeatedly, with the same arguments. Granted, your application may execute a few queries repeatedly with exactly the same arguments bound to parameters and the same automatically generated SQL statement. We consider this a rare case, but when you’re certain you’re executing a query repeatedly, it becomes a good candidate for result set caching.
Second, for applications that perform many queries and few inserts, deletes, or updates, caching query results can improve performance and scalability. On the other hand, if the application performs many writes, Hibernate won’t use the query result cache efficiently. Hibernate expires a cached query result set when there is any insert, update, or delete of any row of a table that appears in the cached query result. This means cached results may have a short lifetime, and even if you execute a query repeatedly, Hibernate won’t use cached results due to concurrent modifications of rows in the tables referenced by the query.
For many queries, the benefit of the query result cache is nonexistent or, at least, doesn’t have the impact you’d expect. But if your query restriction is on a unique natural identifier, such as select u from User u where u.username = ?, you should consider natural identifier caching and lookup as shown earlier in this chapter.