Chapter 9. Best Practices and Advanced Techniques

Throughout this book, you've learned a lot about the workings of the Spring and Hibernate frameworks. In this chapter, you will learn the techniques necessary for building a performant, production-ready application. Although Hibernate and Spring are relatively easy to bootstrap, their default settings are appropriate only for simple applications. If you are building an application with significant load or performance requirements, you will likely need to do some fine-tuning in order to attain peak performance. In most scenarios, you can circumvent performance issues simply by leveraging the appropriate optimization or architectural strategies.

Lazy Loading Issues

Lazy loading has long been regarded as one of Hibernate's most valuable features, especially with respect to improving performance. By declaring a domain object's association or property to be lazy, an application can avoid undue overhead on the underlying database, which can often lead to faster response times and smaller datasets—both favorable qualities.

Without lazy loading, a simple query may be executed over and over again unnecessarily, or worse yet, a query for a single domain entity might force the loading of an entire object graph, as Hibernate attempts to traverse from one association to the next.

The problem is that lazy loading is a double-edged sword. It is vital for maintaining decent loading performance, but is also a significant risk for major performance problems. While lazy loading reduces the amount of data (as well as the potential for table joins) loaded from the database, this laziness can be very problematic for data that might need to be loaded from the database anyway.

This is not to imply that lazy loading is a bad feature or that it should be disabled. Rather, it is a frequently misunderstood feature whose costs and benefits depend heavily on context.

Let's begin by looking at one of the most common and significant issues related to lazy loading that affects persistence-based applications.

The N+1 Selects Problem

Let's examine the way in which lazy loading works in a typical use case. In our sample application, our Category domain object contains a many-to-many association to the ArtEntity domain object. In other words, a Category contains a collection of ArtEntity instances:

@Entity
public class Category implements DomainObject {

    private Long id;
    private String categoryName;
    private String categoryDescription;
    private Set<ArtEntity> artEntities = new HashSet<ArtEntity>();

    . . .

    @ManyToMany
    public Set<ArtEntity> getArtEntities() {
        return artEntities;
    }

    public void setArtEntities(Set<ArtEntity> artEntities){
        this.artEntities = artEntities;
    }

    . . .

}

By default, the java.util.Set of ArtEntity entities is declared lazy. Let's consider what happens under the hood when we attempt to load all the artEntities for a series of categories.

entityManager.createQuery("SELECT c FROM Category c").getResultList();

Assuming there is at least one row in the Category table, the preceding statements will return a list of Category instances. However, because our artEntities association (within the Category class) is declared to be lazy, Hibernate will not perform a SQL join in an attempt to load data from both the Category table and the related rows from the ArtEntity table. Instead of loading these ArtEntity rows from the database, Hibernate populates the artEntities property for each of the returned Category instances with a proxy object.

For collections, Hibernate provides persistent collection implementations that serve as proxies for the collection associations in our domain model. For instance, our artEntities property is declared as a java.util.Set. Hibernate will set this property to an instance of org.hibernate.collection.PersistentSet, a special class designed to intercept attempts to access the referenced collection so that a lazy collection can be initialized.

Hibernate will generate proxies for each domain object within an application, and will use these proxies for single-ended associations that are marked as lazy. For example, we can define our many-to-one association of commentedArt in the Comment domain object to be lazy using the following annotation:

@ManyToOne(fetch=FetchType.LAZY)
public ArtEntity getCommentedArt() {
    return commentedArt;
}

This snippet will prevent a Comment's reference to the associated ArtEntity from being loaded from the database until the property is accessed.

The goal of these proxies is to serve as placeholders of sorts. For data that is not loaded from the database, Hibernate can't simply ignore these properties. Instead, a proxy can be used to defer loading behavior. If no attempt is made to access an uninitialized, lazy property, then nothing will happen. However, if an attempt is made to access one of these proxies, then the proxy will intercept this request and trigger a callback into the database. The end result is that the lazy property is initialized with the relevant data from the database.
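To make the proxy idea concrete, here is a minimal, hypothetical analogue of a persistent collection in plain Java. It is not Hibernate's actual PersistentSet (the class name LazySet and the Supplier-based loader are inventions for illustration); it simply shows the core trick: a Set that defers loading its contents until the first time it is accessed.

```java
import java.util.AbstractSet;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical, greatly simplified analogue of Hibernate's PersistentSet:
// a Set that defers loading its contents until first access.
class LazySet<E> extends AbstractSet<E> {

    private final Supplier<Set<E>> loader; // stands in for the database callback
    private Set<E> delegate;               // null until initialized
    private boolean initialized;

    LazySet(Supplier<Set<E>> loader) {
        this.loader = loader;
    }

    private Set<E> initialize() {
        if (delegate == null) {
            // this is the "callback into the database" the proxy performs
            delegate = new HashSet<>(loader.get());
            initialized = true;
        }
        return delegate;
    }

    boolean isInitialized() {
        return initialized;
    }

    @Override
    public Iterator<E> iterator() {
        return initialize().iterator(); // any traversal triggers initialization
    }

    @Override
    public int size() {
        return initialize().size();     // so does asking for the size
    }
}
```

Until some code touches the collection, nothing is loaded; the first call to iterator() or size() triggers the load, just as iterating a lazy Hibernate collection triggers a SELECT.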

All of this sounds pretty ideal. But let's consider what happens if we have multiple ArtEntity instances associated with each Category. When a given Category instance is first loaded, the artEntities association is set to an instance of org.hibernate.collection.PersistentSet. Now imagine that we want to iterate through all the ArtEntities for all of the Category instances returned in our original query.

for (Category category: categories) {
    for (ArtEntity artEntity: category.getArtEntities()) {
        // implicitly initialize another collection here
        System.out.println("art:" + artEntity.getTitle());
    }
}

Although this code may seem innocuous, there is actually a serious performance issue hiding between the lines. Since the artEntities association is not yet initialized when we first retrieve each Category instance, we are actually initializing each artEntities association within each successive iteration of the loop. Because Hibernate has no way to infer what we are trying to do, it simply initializes each instance as we reference it. The result is a separate SQL query for each item within the collection. So for the preceding loop, we are actually inadvertently making (number of categories) + 1 queries! Suddenly, lazy loading doesn't seem like such an optimization technique anymore.

This disturbingly common scenario is known as the N+1 selects issue, in that a select query is issued N times (one for each item returned by the original query), plus the original query to load the entity containing the collection in the first place.
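The arithmetic can be illustrated with a toy simulation in plain Java. This is deliberately not real Hibernate code (the class and method names are invented for illustration); it simply counts the queries issued by the access pattern described above: one query for the categories, plus one query per category when each lazy collection is touched.

```java
import java.util.List;

// Toy simulation of the N+1 selects pattern: one query to load the
// categories, plus one query per category to initialize its collection.
class NPlusOneDemo {

    int queryCount = 0;

    List<Long> loadCategoryIds() {
        queryCount++; // SELECT c FROM Category c
        return List.of(1L, 2L, 3L);
    }

    List<String> loadArtEntitiesFor(long categoryId) {
        queryCount++; // a separate SELECT, triggered lazily inside the loop
        return List.of("art-" + categoryId);
    }

    int queriesForNaiveIteration() {
        for (Long id : loadCategoryIds()) {
            loadArtEntitiesFor(id); // lazy initialization on each iteration
        }
        return queryCount;
    }
}
```

For three categories, this comes to four queries; with hundreds of categories, the query count grows linearly with the result set.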

A similar predicament occurs for other associations, such as in the many-to-one reference to the ArtEntity domain object from the Comment class. In this scenario, if a list of Comment instances were to be loaded, an additional select query would be initiated each time an attempt was made to access the commentedArt property. Suppose a JSP page iterated through a long list of comments in an attempt to display related information about the comment and its associated art. This has the potential of requiring hundreds of additional round-trips to the database!

Understanding the potential for this problem is the first step, but how do we go about preventing the N+1 selects issue? Unfortunately, there is no single solution. (If there were, it would probably be an implicit part of Hibernate or JPA.) Each situation may require a slightly different approach. Fortunately, several strategies can help mitigate this potentially damaging scenario. The goal, of course, is to limit the number of SQL queries and attempt to load all the necessary data as efficiently as possible.

Less Lazy Mappings

One solution to the N+1 selects problem is to update your mapping configuration for the affected domain classes. The default behavior for collections is to be lazy and to initialize the collection via a SQL SELECT when the association is accessed. This default strategy is known as select fetching, as a second SELECT is issued in order to initialize the lazy association or property. The simplest solution is to override this default behavior, preventing the property from being lazy in the first place.

Let's refactor the mapping configuration affecting the artEntities association on our Category instance, as follows:

@ManyToMany
@Fetch(FetchMode.JOIN)
public Set<ArtEntity> getArtEntities() {
    return artEntities;
}

By adding the @Fetch annotation, specifying a FetchMode of JOIN, we request that Hibernate automatically initialize our artEntities collection by using a left outer join when a particular Category instance is loaded. Hibernate is affected by this @Fetch directive when navigating to a particular Category instance, loading an instance via get() or load(), or when loading Category instances via the Criteria API. Alternatively, you can opt to specify FetchMode.SUBSELECT, which will instead load the artEntities collection by including a SQL subselect as part of the initial query. In either case, the end result is that the artEntities association is no longer lazy, and an additional query is not required to initialize each artEntities association.

So problem solved, right? Not exactly. Remember how we mentioned that lazy loading is actually a pretty important feature, and that without it, you risk inadvertently loading too much of your entire database into memory? In other words, you may not always need the artEntities association, and in those circumstances, you are better off keeping the property as lazy.

So, sometimes it's good to be lazy, like on weekends and on vacation when you're catching up on rest. But other times being lazy can get you into trouble (especially at work). Hibernate is the same way. The best way of solving the N+1 selects problem is to keep your associations declared lazy by default, but override this behavior when you know the association is needed. For example, using JPQL, we could write the following query:

List categories = entityManager.createQuery(
        "SELECT c FROM Category c " +
        "LEFT JOIN FETCH c.artEntities " +
        "WHERE c.id = :id")
    .setParameter("id", categoryId)
    .getResultList();

As part of this JPQL query, we issue a LEFT JOIN FETCH. This forces Hibernate to initialize our artEntities association, overriding the default lazy behavior in the mapping.

Batching for Performance

Another strategy for reducing the number of SQL queries required to load data is to use Hibernate's batching feature, which loads multiple entities or collections in a single query. Batching offers a slightly simpler solution than fine-tuning lazy loading: you grab data in batches up front to prevent that data from being loaded through many more "single queries" later on. The advantage of batching is that it can help improve performance without requiring significant changes to queries or code.

The @BatchSize annotation can be added to a domain entity or to a particular association. Let's update our artEntities association in our Category class again to see how we might be able to use Hibernate's batching feature:

@ManyToMany
@BatchSize(size = 10)
public Set<ArtEntity> getArtEntities() {
    return artEntities;
}

Now, even though our artEntities association is still lazy by default, Hibernate will get ahead of us and attempt to initialize more than just a single artEntities collection at a time. It accomplishes this by using a SQL IN condition, passing in up to ten Category identifiers at once when loading from the ArtEntity table.

In other words, batching works similarly to the default lazy configuration. First a Category is loaded, then its artEntities association is loaded in a separate query (when the artEntities property is accessed, of course). However, with batching enabled, Hibernate will attempt to load more than one artEntities association, querying for the number of associations specified in the size attribute of the @BatchSize annotation.

Keep in mind that @BatchSize doesn't load the items within a single collection in batches; a collection is still initialized in its entirety via a separate select. Rather, @BatchSize loads multiple collections at once, sparing our other Category instances a separate initialization query apiece (to continue our example).
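The grouping of identifiers behind @BatchSize can be sketched in plain Java. This is a hypothetical helper, not Hibernate's implementation: it just shows how a list of parent identifiers is split into chunks, each of which would back one SELECT ... WHERE category_id IN (...) query.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the identifier grouping behind @BatchSize:
// instead of one SELECT per parent, identifiers are grouped into chunks
// of at most `batchSize`, one IN (...) query per chunk.
class BatchingDemo {

    static List<List<Long>> partition(List<Long> categoryIds, int batchSize) {
        List<List<Long>> batches = new ArrayList<>();
        for (int i = 0; i < categoryIds.size(); i += batchSize) {
            int end = Math.min(i + batchSize, categoryIds.size());
            batches.add(categoryIds.subList(i, end)); // one IN (...) query per batch
        }
        return batches;
    }
}
```

With 25 categories and a batch size of 10, this yields three queries instead of twenty-five, which is the essence of the optimization.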

Lazy Initialization Exceptions

Another common issue is the ominous LazyInitializationException. You can probably infer what this exception means by its name: Hibernate is unable to initialize a lazy property. What circumstances account for such a problem?

As we discussed in Chapter 4, a domain object's persistent state is managed through Hibernate's implementation of the EntityManager interface. If a new domain object is instantiated, it is considered transient until it becomes associated with the EntityManager. Similarly, an already persistent domain object transitions to a detached state if its EntityManager is closed. However, changes to this domain object will not be "recorded" until it transitions back to a managed state by being reassociated with another EntityManager.

A domain object that has become disassociated from an EntityManager is called a detached object. Hibernate is able to detect changes made to a detached domain object and propagate these changes to the database once the instance is reassociated. However, there are some things that are difficult to work around when an EntityManager is closed, and lazy properties are one of those things.

As you learned in the previous section, Hibernate implements laziness by referencing uninitialized properties with proxies—either special persistent collection implementations or proxy classes, depending on the type of association or property. These proxies are able to defer the loading of an association until an attempt is made to access them. Once that happens, the proxies will access the EntityManager and attempt to load the necessary data from the database. Obviously, this can't happen if the EntityManager is closed, so a LazyInitializationException is thrown.

The most common cause of a LazyInitializationException stems from failing to initialize a collection or lazy property in a DAO or controller method, instead leaving a JSP or other view-related technology to discover an uninitialized property. The problem is that, by default, the EntityManager is closed whenever a persistence operation completes. In the case of a DAO or service method, the EntityManager is normally closed when the relevant method returns.

The best way to prevent the LazyInitializationException is to ensure that all lazy associations and properties that are required by the view are successfully initialized before the domain objects are passed to the view layer. Fortunately, Spring provides some solutions that help to prevent the occurrence of LazyInitializationExceptions, even when lazy properties are not properly initialized before passing domain objects to the view. There are a couple of variations on the solution, but they both employ the same general strategy: defer the closing of the EntityManager until after the view has finished rendering.

Now Open Late: Keeping EntityManager Open Past Its Bedtime

Deferring the closing of the EntityManager is typically known as the Open EntityManager In View pattern. The simplest approach for applying this strategy is to use a servlet filter, as described in the next section. However, if you are using Spring MVC, an alternative is to use an interceptor.

The interceptor technique essentially opens an EntityManager at the beginning of a servlet request and binds the EntityManager to the current thread, allowing it to be accessed by Spring's Hibernate support classes. Then, at the end of the request, the EntityManager is closed and unbound from the thread. This is a bit of an oversimplification, and the implementation details differ slightly, depending on whether you are using the servlet filter or the controller interceptor. However, the basic concepts are the same: open an EntityManager and associate it with the active thread to be used by persistence-related methods, and then ensure the EntityManager is kept open until the request completes. Because the request doesn't complete until after the view rendering has finished processing, the potential for the LazyInitializationException is significantly reduced.
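The thread-binding idea can be sketched in plain Java. This is a deliberately stripped-down stand-in, not Spring's actual implementation (the class name, the String "session", and the handleRequest method are inventions for illustration): a resource is bound to the current thread at the start of the request and released only after view rendering has completed.

```java
// Hypothetical sketch of the Open EntityManager In View idea: bind a
// "session" to the current thread for the whole request, and release it
// only after view rendering has finished.
class OpenSessionInViewSketch {

    // stands in for the EntityManager bound by Spring's interceptor/filter
    static final ThreadLocal<String> SESSION = new ThreadLocal<>();

    static String handleRequest(Runnable controller, Runnable viewRendering) {
        SESSION.set("open");      // opened at the start of the request
        try {
            controller.run();     // controller/service/DAO work
            viewRendering.run();  // lazy properties can still initialize here
            return SESSION.get(); // still open while the view renders
        } finally {
            SESSION.remove();     // closed only after the view completes
        }
    }
}
```

Because the resource outlives the controller method and is still available while the view renders, code running during rendering can use it, which is exactly why lazy initialization no longer fails at that stage.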

Using the Open EntityManager In View pattern is relatively simple. If you are already using Spring MVC, you can define the OpenEntityManagerInViewInterceptor class as a new bean, adding it to your Spring MVC configuration, like so:

<bean name="openEntityManagerInViewInterceptor"
             class="org.springframework.orm.jpa.support.OpenEntityManagerInViewInterceptor" />

With your OpenEntityManagerInViewInterceptor defined, you then need to add this interceptor to your list of MVC interceptors. The interceptors defined in this list will be invoked (in order) as part of the request-processing flow of each MVC controller. Interceptors provide hooks into the life cycle of request handling, such as preHandle, postHandle, and afterCompletion. Spring 3 provides an easy way to globally define interceptors. Let's take a look at an MVC configuration file.

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:context="http://www.springframework.org/schema/context"
       xmlns:mvc="http://www.springframework.org/schema/mvc"
       xmlns:p="http://www.springframework.org/schema/p"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.springframework.org/schema/beans
            http://www.springframework.org/schema/beans/spring-beans-3.0.xsd
            http://www.springframework.org/schema/context
            http://www.springframework.org/schema/context/spring-context-3.0.xsd
            http://www.springframework.org/schema/mvc
            http://www.springframework.org/schema/mvc/spring-mvc-3.0.xsd">

    <context:component-scan base-package="com.prospringhibernate.gallery"
                            use-default-filters="false">
        <context:include-filter type="annotation"
                                expression="org.springframework.stereotype.Controller"/>
    </context:component-scan>

    <!-- integrates MVC Controllers via @Controller -->
    <mvc:annotation-driven/>

    <!-- specifies those interceptors that will be applied to all handlerMappings -->
    <mvc:interceptors>
        <bean
            class="org.springframework.orm.jpa.support.OpenEntityManagerInViewInterceptor"/>
    </mvc:interceptors>

   . . .

</beans>

In this example, we use the mvc:annotation-driven and component-scan features to enable Spring MVC's annotation-based life-cycle support and to define our controllers via annotation (meaning we can add @Controller to a class and Spring will integrate it as a controller, provided it is in the appropriate package path). Also notice that we added our OpenEntityManagerInViewInterceptor inline within the mvc:interceptors block. Any interceptor beans defined here will have the appropriate methods invoked within the various stages of the request life cycle.

Applying the Open EntityManager Filter

If you aren't using Spring MVC, or just don't want to use an interceptor approach, you can instead add the OpenEntityManagerInViewFilter to your web.xml file. The approach is roughly the same as the interceptor technique, except the hooks for opening and closing the EntityManager occur at the servlet-request level rather than at the controller level.

Here is how you might add the OpenEntityManagerInViewFilter to your application's web.xml file:

<!-- binds a JPA EntityManager to the thread for the entire processing of the request -->
<filter>
    <filter-name>OpenEntityManagerInViewFilter</filter-name>
    <filter-class>org.springframework.orm.jpa.support.OpenEntityManagerInViewFilter</filter-class>
</filter>

<!-- Map the EntityManager Filter to all requests -->
<filter-mapping>
    <filter-name>OpenEntityManagerInViewFilter</filter-name>
    <url-pattern>/*</url-pattern>
</filter-mapping>

This snippet is an excerpt from a web.xml file that references the filter definition and mapping necessary for integrating the OpenEntityManagerInViewFilter. It is important that you set an appropriate url-pattern in the filter-mapping, as this defines which URLs the filter will be applied to.

Caching

So far, we have discussed a few strategies for reducing or optimizing trips to the database. Even better than improving the ways in which data is queried is to preclude the need for accessing the database at all. Obviously, some database access is always needed, but caching can go quite a long way toward minimizing database load and improving application performance.

One of Hibernate's greatest advantages is that it gives developers many features "for free." And one of these free features is implicit caching. If you were to decide to implement a persistence layer using plain JDBC, you would need to explicitly integrate caching within your DAO methods or at some lower level of abstraction. While caching may seem trivial to implement on the surface, you will begin to perceive the complexity when you consider the rules for invalidation (the factors that cause a particular item in the cache to be expired), preventing conflicts, and handling a cached item's time to live (TTL).

So if Hibernate provides all these caching features for free, what is the benefit of understanding the mechanics of caching? Although Hibernate includes some foundational caching features, providing basic optimizations to limit any unnecessary trips to the database, tuning its default caching behavior can significantly improve your application's performance.

To be able to leverage caching for improved application performance, you need to understand the different layers of caching within Hibernate and what can actually be cached. For all domain objects, Hibernate provides two distinct caching levels:

  • The first-level, or L1, cache is provided by the EntityManager, and therefore relates only to the limited scope of a particular user or request. The first-level cache is designed primarily as an optimization, preventing the requerying of domain objects that have already been loaded.

  • The second-level, or L2, cache is scoped to the EntityManagerFactory, and therefore is longer-lived and can provide caching capabilities across multiple users and requests. The second-level cache provides the most utility and flexibility for optimization through caching.

So, the approach is to activate the second-level cache and integrate a cache provider to start caching. Now we need to consider what can be cached.

Hibernate caches domain objects in slightly different ways. Each top-level domain object is cached within a different region. A region is essentially a different section or namespace, intended to partition each entity and prevent the potential for clashes. Each domain object is persisted to a cache using its identifier as the key. So, given a cache region and an identifier, you are able to access the data for a particular domain object. Each domain object is cached by storing the values of its respective properties.

However, a domain object's references and collections are persisted separately from a domain object. In other words, the cached representation of a domain object will reference only the identifiers of its references. For example, many-to-one associations will be persisted as a single ID, while a collection will be persisted as a list of identifiers. Domain object collections are actually persisted within a separate cache region, intended specifically for that particular collection. The key in this case is still the parent domain object's identifier, but the region is specific to the domain object and the collection name. The value, however, is a list of identifiers, where each identifier in the list corresponds to the ID of each entity referenced in the original collection.

Hibernate uses this strategy because it is more efficient to just store the IDs of each entity within a collection, rather than the data of every entity in its entirety. The intention is that having the IDs should be enough, since the full data should be cached elsewhere, within the referenced domain object's own cache region. Furthermore, caching references as identifiers decouples the domain objects to which they relate, ensuring that changes to the referenced domain objects are cached only in a single location. This is obviously far simpler than managing a complex dependency tree—especially when you begin to consider the complexity of invalidating a particular item when it expires or when an update is made to the database.
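The region layout just described can be sketched in plain Java. This is a hypothetical, greatly simplified stand-in for the second-level cache (the class and method names are inventions for illustration): entity data is cached by identifier within the entity's own region, while a collection is cached in a separate region as a bare list of identifiers, keyed by the parent's identifier.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the second-level cache's region layout:
// entity regions map id -> property values; collection regions map
// parent id -> a list of member identifiers (not the members' data).
class SecondLevelCacheSketch {

    final Map<String, Map<Long, Map<String, Object>>> entityRegions = new HashMap<>();
    final Map<String, Map<Long, List<Long>>> collectionRegions = new HashMap<>();

    void cacheEntity(String region, long id, Map<String, Object> properties) {
        entityRegions.computeIfAbsent(region, r -> new HashMap<>()).put(id, properties);
    }

    void cacheCollection(String region, long parentId, List<Long> memberIds) {
        collectionRegions.computeIfAbsent(region, r -> new HashMap<>()).put(parentId, memberIds);
    }
}
```

Resolving a cached collection therefore takes two steps: read the identifier list from the collection region, then look up each identifier in the referenced entity's own region, which is why each piece of entity data lives in exactly one place.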

Integrating a Caching Implementation

Hibernate provides a generic abstraction layer for caching functionality, allowing numerous caching implementations to be easily plugged in to the Hibernate infrastructure. There are a variety of excellent caching solutions, including Ehcache, SwarmCache, JBoss Infinispan, and many more. Each caching implementation differs slightly in the feature set it provides. For instance, some implementations offer clustering capability, allowing multiple nodes within a cluster to share the same caching data (which can reduce the potential for cache conflicts and stale data). Some caching solutions provide specialized features, such as transactional behavior.

Note

The choice of which cache provider to use depends on your requirements. Generally, we recommend Ehcache, a flexible open source caching implementation that provides clustering capability. If your application has requirements for a transactional cache or other specific needs, you should take a look at some of the other cache provider choices.

Let's revisit our persistence.xml configuration and modify it to incorporate Ehcache.

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<persistence xmlns="http://java.sun.com/xml/ns/persistence"
                          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                          xsi:schemaLocation="http://java.sun.com/xml/ns/persistence

http://java.sun.com/xml/ns/persistence/persistence_2_0.xsd"
                          version="2.0">

    <persistence-unit name="galleryPersistenceUnit" transaction-type="RESOURCE_LOCAL">
        <provider>org.hibernate.ejb.HibernatePersistence</provider>
        <properties>
            <property name="hibernate.dialect" value="org.hibernate.dialect.H2Dialect"/>
            <!--
                value='create' to build a new database on each run;
                value='update' to modify an existing database;
                value='create-drop' to create and drop tables on each run;
                value='validate' makes no changes to the database
             -->
            <property name="hibernate.hbm2ddl.auto" value="create"/>
            <property name="hibernate.show_sql" value="true"/>
            <property name="hibernate.cache.use_second_level_cache" value="true"/>
            <property name="hibernate.cache.provider_class"
                      value="net.sf.ehcache.hibernate.SingletonEhCacheProvider"/>
            <property name="hibernate.ejb.naming_strategy"
                      value="org.hibernate.cfg.ImprovedNamingStrategy"/>
        </properties>
    </persistence-unit>

</persistence>

Here, we enable second-level caching by setting the hibernate.cache.use_second_level_cache property on the persistence unit to true. Then we specify the cache implementation, Ehcache, via the hibernate.cache.provider_class property.

Once you've activated the second-level cache and selected a cache provider, you have officially started caching. Next, you need to configure the caching rules.

Determining Caching Rules

To configure the caching rules for your domain model, the simplest approach is to add the @Cache annotation to your domain objects. As an example, let's examine the caching configuration of the Category domain object in our art gallery application:

@Entity
@Cache(region="category", usage = CacheConcurrencyStrategy.READ_WRITE)
public class Category implements DomainObject {

    private Long id;
    private String categoryName;
    private String categoryDescription;
    private Set<ArtEntity> artEntities = new HashSet<ArtEntity>();

    @Id
    @GeneratedValue
    public final Long getId() {
        return id;
    }

    public void setId(Long id) {
        this.id = id;
    }

     . . .

    @ManyToMany
    @Cache(usage=CacheConcurrencyStrategy.READ_WRITE)
    public Set<ArtEntity> getArtEntities() {
        return artEntities;
    }

    public void setArtEntities(Set<ArtEntity> artEntities){
        this.artEntities = artEntities;
    }

    . . .

}

Here, we have added a @Cache annotation in two places: at the top of the entity, which serves as the configuration for caching the domain object itself, and above our many-to-many artEntities association. Therefore, we have defined the caching rules for both the Category domain object itself and the Category domain object's artEntities collection.

In the first instance of the @Cache annotation, we also set the region attribute. This allows us to set the region within which we will be persisting our cached data. We omitted this attribute for the artEntities collection, which will then allow Hibernate to use the default region setting. The region default is the class name (including the package). For collections, the region default is the full class name, followed by .<collectionname>. So in the case of the artEntities collection, the default region name will be com.prospringhibernate.gallery.domain.Category.artEntities. Of course, we could choose to override this instead by specifying a region for the collection.

The @Cache annotation's usage attribute defines the cache strategy to use for the configured entity or collection. When using Ehcache, there are three options:

  • The read-only setting should be used only when the data to be cached will never be updated. A read-only cache strategy will provide the best performance, since cached data will never need to expire or be invalidated.

  • The nonstrict-read-write setting should be used when concurrent access of data is unlikely, as the caching implementation will not attempt to lock the cache to prevent contention or version mismatch.

  • The read-write setting is suitable when concurrent access and updating of data is likely, as this approach provides the semantics of a read-committed isolation level.

Configuring Cache Regions

Next, you need to set up the configuration for the regions into which your data will be persisted. Ehcache employs an XML configuration file that is loaded at application startup. Typically, the file is called ehcache.xml and placed at the root of the classpath. However, you can override this default location by setting the following properties in your persistence.xml file:

<property name="hibernate.cache.region.factory_class"
          value="net.sf.ehcache.hibernate.EhCacheRegionFactory"/>
<property name="net.sf.ehcache.configurationResourceName"
          value="/path/to/ehcache.xml"/>

The default ehcache.xml file that ships with Ehcache includes a default cache configuration that contains the settings that will be used for any region that is not explicitly defined. However, it is usually a good idea to configure each cache region you plan to include in your application. Here is an example of the definition of our cache regions for our Category domain object and the Category.artEntities collection:

<cache name="Category"
       maxElementsInMemory="10000"
       eternal="false"
       timeToIdleSeconds="300"
       timeToLiveSeconds="600"
       overflowToDisk="true"
/>
<cache name="com.prospringhibernate.gallery.domain.Category.artEntities"
       maxElementsInMemory="10000"
       eternal="false"
       timeToIdleSeconds="300"
       timeToLiveSeconds="600"
       overflowToDisk="false"
/>

We have defined two cache regions, as specified by the name attribute. Typically, the name attribute for a domain object includes the fully qualified class name (including package). However, in our earlier caching configuration of the Category domain object (the listing in the previous section), we explicitly changed the default region attribute, using the shorter region name Category instead. We left the default region value for the artEntities collection.

These cache region settings work as follows:

  • maxElementsInMemory specifies the maximum number of cached entities to store in this region. We used a value of 10000 for both cache regions, but it is important to consider this number carefully. Setting it too high can lead to OutOfMemoryError crashes, as well as degrade performance. Because object sizes and access patterns vary so much from application to application, it is a good idea to experiment with these settings and profile your application to determine optimal values.

  • eternal specifies whether entries in a cache region should "live forever," ignoring any time-to-live or time-to-idle settings. Combined with overflowToDisk (and Ehcache's diskPersistent option), this can come in handy when you want to keep your cache prepopulated between restarts, which is especially valuable when populating the cache is expensive.

  • timeToIdleSeconds specifies how long a cached item will stay in the cache when there are no attempts to access it. For instance, if a particular Category instance is stored in the cache but is not read for a while, the benefit of keeping it cached is questionable. It is a good idea to set this value to around half of the timeToLiveSeconds value.

  • timeToLiveSeconds corresponds to an entity's TTL—the amount of time before the cached entity expires and the data is purged from the cache, regardless of last access.

  • overflowToDisk specifies that if maxElementsInMemory is exceeded, Ehcache should begin storing overflow on disk. While this setting sounds useful, keep in mind that persisting data on disk incurs a significant performance penalty compared to memory storage. You already have a database for permanent persistence; data cached on disk will still outperform a database query, but you should consider this setting carefully.

It is very important to carefully consider your TTL values. Setting these values too high increases the potential for stale data and version conflicts. This risk is significantly increased in situations where an application is deployed in a clustered configuration (but the cache for each application server node is not shared). In a typical cluster configuration, updates made to one node will invalidate that node's cache, but these changes won't propagate to the caches of other nodes in the cluster. One solution is to use a lower TTL value for the timeToLiveSeconds attribute, which reduces the likelihood of stale data in the cache. A better solution is to use a clusterable caching solution, which allows all the nodes in the cluster to use a shared cache, significantly reducing the potential for conflicts and stale data. We will discuss clustered caching strategies later in this chapter.

Caching Your Queries

Much like collection caching, query caching stores only the identifiers of the entities returned in a particular query's result. By default, all queries are cached within a single region, but you can override this setting by specifying a region name for a particular query, forcing its results to be cached elsewhere. The key for a cached query is composed of the query itself along with the values of each of the query's parameters, which ensures that the results of each query-parameter combination are cached separately. Invoking the same query with different parameter values will therefore bypass the existing entry and create a new one.

While domain object and collection caching is largely a matter of mapping configuration, query caching requires a few additional steps. First, the second-level cache must be enabled, as described in the previous section. Next, the following property must be set to true in your persistence.xml file:

<property name="hibernate.cache.use_query_cache" value="true"/>
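Enabling this property alone is not enough: each query must also be individually marked as cacheable. Here is a sketch using the standard Hibernate JPA hints; the DAO class, entity, and query string are assumptions based on this chapter's examples:

```java
import javax.persistence.EntityManager;
import javax.persistence.TypedQuery;

// Hypothetical DAO method; Category and the JPQL string are assumptions.
public class CategoryDao {

    private EntityManager entityManager;

    public Category findByName(String name) {
        TypedQuery<Category> query = entityManager.createQuery(
                "select c from Category c where c.name = :name", Category.class);
        query.setParameter("name", name);
        // Opt this query into the query cache; without this hint, the
        // hibernate.cache.use_query_cache setting has no effect on it.
        query.setHint("org.hibernate.cacheable", true);
        // Optionally direct its results to a dedicated cache region:
        // query.setHint("org.hibernate.cacheRegion", "query.Category");
        return query.getSingleResult();
    }
}
```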

Hibernate leverages an additional cache region for powering its query cache implementation: the UpdateTimestampsCache. This cache region should also be configured explicitly in the Ehcache configuration file. Here is a sample configuration:

<cache name="org.hibernate.cache.UpdateTimestampsCache"
       maxElementsInMemory="5000"
       eternal="true"
       overflowToDisk="true"/>

Here, we specified that this cache region should be eternal. This is the recommended setting for the UpdateTimestampsCache, but at the very least, the TTL should be longer than the TTL of any of the query cache regions.

If you decide to use the default cache region for all query caches, you could configure the following in Ehcache for your query cache itself:

<cache name="org.hibernate.cache.StandardQueryCache"
       maxElementsInMemory="500"
       eternal="false"
       timeToLiveSeconds="120"
       overflowToDisk="true"/>

This configuration defines the cache region settings for the queries to be cached.

Caching in a Clustered Configuration

If you are building an application intended to handle a high volume of requests, you will likely need to set up multiple application nodes in a clustered configuration. Although multiple nodes provide more resources for your application, if each node maintains its own cache, you will begin to strain the database: each node added to the cluster increases database load proportionally, so the number of nodes acts as a multiplier on database request volume:

(Num Nodes in Cluster) * (Requests) = Load on Database
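The arithmetic is simple, but it is worth making concrete. The sketch below is purely illustrative (the class, method, and numbers are not from any real deployment); it also shows how a cache hit rate scales the per-node load down without removing the node multiplier:

```java
// Illustrative only: estimates the database queries generated by a cluster
// in which every node maintains its own independent cache.
public class DbLoadEstimator {

    /**
     * @param nodes           number of application nodes in the cluster
     * @param requestsPerNode requests served by each node
     * @param cacheHitRate    fraction of requests answered from the node's cache
     * @return estimated number of queries reaching the database
     */
    public static long estimateLoad(int nodes, long requestsPerNode, double cacheHitRate) {
        double missRate = 1.0 - cacheHitRate;
        return Math.round(nodes * requestsPerNode * missRate);
    }

    public static void main(String[] args) {
        // With no caching, four nodes quadruple the database load.
        System.out.println(estimateLoad(4, 1_000, 0.0)); // 4000
        // A 90% hit rate helps, but load still scales with the node count.
        System.out.println(estimateLoad(4, 1_000, 0.9)); // 400
        System.out.println(estimateLoad(8, 1_000, 0.9)); // 800
    }
}
```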

Additionally, updates made to the database by one node will not be propagated to the caches of the other nodes in the cluster, resulting in stale reads. Caching still matters here—the more effective your caching strategy, the lower the load on the database—but even with an aggressive caching strategy, the remaining load is still multiplied by the number of nodes. In effect, your caching efficacy is diluted as the cluster grows.

When building applications with objects that receive high volumes of writes, the solution is to eliminate the redundancy of maintaining an independent cache on each node, and instead move to a clustered caching configuration. Several caching implementations provide clustering capability, including Ehcache and SwarmCache. For our discussion, we'll continue using Ehcache as our cache provider.

Cluster Caching and Replication Mechanics

Ehcache provides several mechanisms for synchronizing each node's cache data: as data is persisted to one node's cache, the changes are broadcast to the other nodes in the cluster using a particular replication strategy. Ehcache supports replication via JMS, RMI, JGroups, or Terracotta. For all of these strategies, Ehcache does not attempt to use locking to prevent data inconsistencies between nodes in the cluster. This is likely a performance trade-off, so your application must be able to tolerate the potential for stale data.

When used in the basic clustered configuration, Ehcache does not partition the cached data across the nodes in the cluster. Rather, each node holds a complete copy of the cached data. While this increases memory overhead, it improves performance by reducing network overhead. To reduce your application's memory footprint, you should adjust the maximum number of objects stored within each cache region. You should also consider the average size of each entity that might be stored within a particular cache region, as this directly impacts memory utilization. We have seen memory issues creep up in configurations with a low number of cached items, simply because each cached item was large. These factors are rarely given ample consideration, but are often the cause of significant bottlenecks.

Regardless of the replication mechanism, Ehcache provides two different strategies for actually notifying different nodes in the cluster of changes:

  • The default strategy is to send the key of the cached item that was updated, along with the updated value. This strategy is called replicateUpdatesViaCopy, as the updated value is sent to all the other nodes in the cluster. While this approach is usually the fastest way to keep the different nodes in sync, it also carries the overhead of sending the updated value over the network. In cases where the updated value is quite large, this can have performance implications.

  • An alternative is to just send a notification to the other nodes that they should invalidate the data in their respective caches. Then once the particular cache key has been invalidated, it will eventually be reloaded from the database on the next attempt to access that particular entity (or collection) for each of the nodes in the cluster. Obviously, this will incur additional load on the database—when a cache miss occurs on each of the other nodes in the cluster, they will need to requery the database to populate their respective caches. The advantage of this approach is that only the cache key needs to be transmitted to the other nodes.

The default replication behavior is to notify other nodes of changes asynchronously, allowing cache propagation to happen in the background and not affect the response time of the original operation (the notifier). In high-concurrency scenarios in which data coherency is a top priority, Ehcache can perform replication synchronously instead, preventing the cache operation from returning until the other nodes in the cluster have been successfully notified. Since this will have significant performance implications, it should be used only in specialized situations.
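As a point of reference, switching to the invalidation strategy is mostly a matter of flipping replicateUpdatesViaCopy to false on a region's replicator. The following is a hedged sketch; the region name and TTL values are assumptions, not part of the gallery example:

```xml
<cache name="com.prospringhibernate.gallery.domain.Comment"
       maxElementsInMemory="5000"
       eternal="false"
       timeToLiveSeconds="600"
       overflowToDisk="false">

  <!-- Invalidate (rather than copy) on update: only the cache key crosses
       the network, and each node reloads from the database on its next read. -->
  <cacheEventListenerFactory
    class="net.sf.ehcache.distribution.jgroups.JGroupsCacheReplicatorFactory"
    properties="replicateAsynchronously=true,
                replicatePuts=false,
                replicateUpdates=true,
                replicateUpdatesViaCopy=false,
                replicateRemovals=true"/>
</cache>
```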

Configuring Replication

Ehcache's clustering implementation does not require any changes to an application's code or architecture. You just need to modify the Ehcache configuration.

To get rolling with a clustered caching configuration for our example, we need to update our ehcache.xml file. We will select the JGroups replication mechanism. The following snippet is suggested by Ehcache's documentation:

<cacheManagerPeerProviderFactory
        class="net.sf.ehcache.distribution.jgroups.JGroupsCacheManagerPeerProviderFactory"
        properties="connect=UDP(mcast_addr=231.12.21.132;mcast_port=45566;ip_ttl=32;
        mcast_send_buf_size=150000;mcast_recv_buf_size=80000):
        PING(timeout=2000;num_initial_members=6):
        MERGE2(min_interval=5000;max_interval=10000):
        FD_SOCK:VERIFY_SUSPECT(timeout=1500):
        pbcast.NAKACK(gc_lag=10;retransmit_timeout=3000):
        UNICAST(timeout=5000):
        pbcast.STABLE(desired_avg_gossip=20000):
        FRAG:
        pbcast.GMS(join_timeout=5000;join_retry_timeout=2000;
        shun=false;print_local_addr=true)"
        propertySeparator="::"
/>

These details specify the network and communication settings for the JGroups implementation of Ehcache's cacheManagerPeerProviderFactory.

Next, we must add a cacheEventListenerFactory element to each of our cache regions. If we do not want to configure each cache region individually, we can add this element to the default cache configuration instead. Let's configure our ArtEntity cache region as follows:

<cache name="com.prospringhibernate.gallery.domain.ArtEntity"
       maxElementsInMemory="5000"
       eternal="false"
       timeToIdleSeconds="900"
       timeToLiveSeconds="1800"
       overflowToDisk="false">

  <cacheEventListenerFactory
    class="net.sf.ehcache.distribution.jgroups.JGroupsCacheReplicatorFactory"
    properties="replicateAsynchronously=true,
                replicatePuts=true,
                replicateUpdates=true,
                replicateUpdatesViaCopy=true,
                replicateRemovals=true"/>
</cache>

In this configuration, we set replicateAsynchronously to true, ensuring that updates happen asynchronously. Additionally, we set replicateUpdatesViaCopy to true, ensuring that the values of updated cache elements are sent directly to all of the other cluster nodes. Most of the other attributes should be fairly self-explanatory.

Summary

In this chapter, we examined several strategies for evaluating and improving application performance. One of the most common pitfalls for Hibernate developers is the N+1 selects issue. This problem typically stems from a failure to properly tune a domain object's mapping configuration or the queries within the DAO layer. Understanding how this problem can appear, as well as how to detect it, is important in ensuring decent ORM performance. Although tuning really depends on the unique requirements of an application, often the best solution is to consider what data needs to be made available within the service, controller, or view layers, and optimize your queries to load this data as efficiently as possible. You saw that using a fetch-join is often an effective approach for initializing an association without requiring multiple queries. Relying on Hibernate's batching capability can also be a decent strategy, although it isn't always as effective.

Another technique for improving performance is to leverage Hibernate's caching capabilities. Properly tuning the cache can make a dramatic difference for application performance. However, caching can also degrade performance if it is not done correctly. For example, caching too aggressively can trigger OutOfMemoryError crashes. Understanding the different caching configuration options within Hibernate will help you select the appropriate behavior. It is also important to experiment with different TTL settings.

Hibernate provides several different caching layers. The first-level cache is scoped to the EntityManager and rarely requires much tuning. The second-level cache provides the ability to cache domain objects, collections, and queries. Each of these cache types is managed and cached separately. Domain objects are keyed by their identifier, and the values of all of an object's properties are persisted to the cache. Associations and queries, however, persist only collections of identifiers, which are cross-referenced against the entity cache to load the actual domain object data.

Some cache implementations, such as Ehcache, are clusterable, allowing updates to the cache to be propagated to the other nodes in the cluster. Without a way to keep the caches of the other nodes in sync, there is the potential for significant problems caused by version conflicts or stale data. For instance, an important update applied to the database can be inadvertently rolled back. This can happen when a node's cache is never notified of the initial update: a different user on that node then performs a write operation on the same entity, applying his updates against stale data, which effectively rolls back the initial update once the second (stale) write is applied.

When deploying a clustered application, it is important to use a clusterable cache or a centralized cache server that all the nodes in the cluster can share. Ehcache provides a stand-alone server product called Cache Server. Additionally, Ehcache offers several configurable options for tuning its clusterable features. It is important to experiment with various settings to determine the options most suitable for your application's requirements.

In the next chapter, we will continue to investigate advanced strategies for providing specialized features for your persistence tier, improving performance, and utilizing best practices.
