Chapter 12. Fetch plans, strategies, and profiles

In this chapter

  • Lazy and eager loading
  • Fetch plans, strategies, and profiles
  • Optimizing SQL execution

In this chapter, we explore Hibernate’s solution for the fundamental ORM problem of navigation, as mentioned in section 1.2.5. We show you how to retrieve data from the database and how you can optimize this loading.

Hibernate provides the following ways to get data out of the database and into memory:

  • Retrieving an entity instance by identifier is the most convenient method when the unique identifier value of an entity instance is known: for example, entityManager.find(Item.class, 123).
  • You can navigate the entity graph, starting from an already-loaded entity instance, by accessing the associated instances through property accessor methods such as someItem.getSeller().getAddress().getCity(), and so on. Elements of mapped collections are also loaded on demand when you start iterating through a collection. Hibernate automatically loads nodes of the graph if the persistence context is still open. What and how data is loaded when you call accessors and iterate through collections is the focus of this chapter.
  • You can use the Java Persistence Query Language (JPQL), a full object-oriented query language based on strings such as select i from Item i where i.id = ?.
  • The CriteriaQuery interface provides a type-safe and object-oriented way to perform queries without string manipulation.
  • You can write native SQL queries, call stored procedures, and let Hibernate take care of mapping the JDBC result sets to instances of your domain model classes.

In your JPA applications, you’ll use a combination of these techniques, but we won’t discuss each retrieval method in much detail in this chapter. By now you should be familiar with the basic Java Persistence API for retrieval by identifier. We keep our JPQL and CriteriaQuery examples as simple as possible, and you won’t need the SQL query-mapping features. Because these query options are sophisticated, we’ll explore them further in chapters 15 and 17.

Major new features in JPA 2

  • You can manually check the initialization state of an entity or an entity property with the new PersistenceUtil static helper class.
  • You can create standardized declarative fetch plans with the new Entity-Graph API.

This chapter covers what happens behind the scenes when you navigate the graph of your domain model and Hibernate retrieves data on demand. In all the examples, we show you the SQL executed by Hibernate in a comment right immediately after the operation that triggered the SQL execution.

What Hibernate loads depends on the fetch plan: you define the (sub)graph of the network of objects that should be loaded. Then you pick the right fetch strategy, defining how the data should be loaded. You can store your selection of plan and strategy as a fetch profile and reuse it.

Defining fetch plans and what data should be loaded by Hibernate relies on two fundamental techniques: lazy and eager loading of nodes in the network of objects.

12.1. Lazy and eager loading

At some point, you must decide what data should be loaded into memory from the database. When you execute entityManager.find(Item.class, 123), what is available in memory and loaded into the persistence context? What happens if you use EntityManager#getReference() instead?

In your domain-model mapping, you define the global default fetch plan, with the FetchType.LAZY and FetchType.EAGER options on associations and collections. This plan is the default setting for all operations involving your persistent domain model classes. It’s always active when you load an entity instance by identifier and when you navigate the entity graph by following associations and iterating through persistent collections.

Our recommended strategy is a lazy default fetch plan for all entities and collections. If you map all of your associations and collections with FetchType.LAZY, Hibernate will only load the data you’re accessing at this time. While you navigate the graph of your domain model instances, Hibernate will load data on demand, bit by bit. You then override this behavior on a per-case basis when necessary.

To implement lazy loading, Hibernate relies on runtime-generated entity placeholders called proxies and on smart wrappers for collections.

12.1.1. Understanding entity proxies

Consider the getReference() method of the EntityManager API. In section 10.2.3, you had a first look at this operation and how it may return a proxy. Let’s further explore this important feature and find out how proxies work:

Path: /examples/src/test/java/org/jpwh/test/fetching/LazyProxyCollections.java

This code doesn’t execute any SQL against the database. All Hibernate does is create an Item proxy: it looks (and smells) like the real thing, but it’s only a placeholder. In the persistence context, in memory, you now have this proxy available in persistent state, as shown in figure 12.1.

Figure 12.1. The persistence context contains an Item proxy.

The proxy is an instance of a runtime-generated subclass of Item, carrying the identifier value of the entity instance it represents. This is why Hibernate (in line with JPA) requires that entity classes have at least a public or protected no-argument constructor (the class may have other constructors, too). The entity class and its methods must not be final; otherwise, Hibernate can’t produce a proxy. Note that the JPA specification doesn’t mention proxies; it’s up to the JPA provider how lazy loading is implemented.

If you call any method on the proxy that isn’t the “identifier getter,” you trigger initialization of the proxy and hit the database. If you call item.getName(), the SQL SELECT to load the Item will be executed. The previous example called item.getId() without triggering initialization because getId() is the identifier getter method in the given mapping; the getId() method was annotated with @Id. If @Id was on a field, then calling getId(), just like calling any other method, would initialize the proxy! (Remember that we usually prefer mappings and access on fields, because this allows you more freedom when designing accessor methods; see section 3.2.3. It’s up to you whether calling getId() without initializing a proxy is more important.)

With proxies, be careful how you compare classes. Because Hibernate generates the proxy class, it has a funny-looking name, and it is not equal to Item.class:

Path: /examples/src/test/java/org/jpwh/test/fetching/LazyProxyCollections.java

If you really must get the actual type represented by a proxy, use the HibernateProxyHelper.

JPA provides PersistenceUtil, which you can use to check the initialization state of an entity or any of its attributes:

Path: /examples/src/test/java/org/jpwh/test/fetching/LazyProxyCollections.java

The static isLoaded() method also accepts the name of a property of the given entity (proxy) instance, checking its initialization state. Hibernate offers an alternative API with Hibernate.isInitialized(). If you call item.getSeller(), though, the item proxy is initialized first!

Hibernate Feature

Hibernate also offers a utility method for quick-and-dirty initialization of proxies:

Path: /examples/src/test/java/org/jpwh/test/fetching/LazyProxyCollections.java

The first call hits the database and loads the Item data, populating the proxy with the item’s name, price, and so on.

The seller of the Item is an @ManyToOne association mapped with FetchType.LAZY, so Hibernate creates a User proxy when the Item is loaded. You can check the seller proxy state and load it manually, just like the Item. Remember that the JPA default for @ManyToOne is FetchType.EAGER! You usually want to override this to get a lazy default fetch plan, as first shown in section 7.3.1 and again here:

Path: /model/src/main/java/org/jpwh/model/fetching/proxy/Item.java

@Entity
public class Item {

    @ManyToOne(fetch = FetchType.LAZY)
    public User getSeller() {
        return seller;
    }
    // ...
}

With such a lazy fetch plan, you might run into a LazyInitializationException. Consider the following code:

Path: /examples/src/test/java/org/jpwh/test/fetching/LazyProxyCollections.java

  1. An Item entity instance is loaded in the persistence context. Its seller isn’t initialized: it’s a User proxy.
  2. You can manually detach the data from the persistence context or close the persistence context and detach everything.
  3. The static PersistenceUtil helper works without a persistence context. You can check at any time whether the data you want to access has been loaded.
  4. In detached state, you can call the identifier getter method of the User proxy. But -calling any other method on the proxy, such as getUsername(), will throw a Lazy-InitializationException. Data can only be loaded on demand while the persistence context manages the proxy, not in detached state.
How does lazy loading of one-to-one associations work?

Lazy loading for one-to-one entity associations is sometimes confusing for new Hibernate users. If you consider one-to-one associations based on shared primary keys (see section 8.1.1), an association can be proxied only if it’s optional=false. For example, an Address always has a reference to a User. If this association is nullable and optional, Hibernate must first hit the database to find out whether it should apply a proxy or a null—and the purpose of lazy loading is to not hit the database at all. You can enable lazy loading of optional one-to-one associations through bytecode instrumentation and interception, which we discuss later in this chapter.

Hibernate proxies are useful beyond simple lazy loading. For example, you can store a new Bid without loading any data into memory:

The first two calls produce proxies of Item and User, respectively. Then the item and bidder association properties of the transient Bid are set with the proxies. The -persist() call queues one SQL INSERT when the persistence context is flushed, and no SELECT is necessary to create the new row in the BID table. All (foreign) key values are available as identifier values of the Item and User proxy.

Runtime proxy generation as provided by Hibernate is an excellent choice for transparent lazy loading. Your domain model classes don’t have to implement any special (super)type, as some older ORM solutions would require. No code generation or post-processing of bytecode is needed either, simplifying your build procedure. But you should be aware of some potentially negative aspects:

  • Cases where runtime proxies aren’t completely transparent are polymorphic associations that are tested with instanceof, a problem shown in section 6.8.1.
  • With entity proxies, you have to be careful not to access fields directly when writing custom equals() and hashCode() methods, as discussed in section 10.3.2.
  • Proxies can only be used to lazy-load entity associations. They can’t be used to lazy load individual basic properties or embedded components, such as Item#description or User#homeAddress. If you set the @Basic(fetch = FetchType.LAZY) hint on such a property, Hibernate ignores it; the value is eagerly loaded when the owning entity instance is loaded. Although possible with bytecode instrumentation and interception, we consider this kind of optimization to be rarely useful. Optimizing at the level of individual columns selected in SQL is unnecessary if you aren’t working with (a) a significant number of optional/nullable columns or (b) columns containing large values that have to be retrieved on demand because of the physical limitations of your system. Large values are best represented with locator objects (LOBs) instead; they provide lazy loading by definition (see the section “Binary and large value types” in chapter 5).

Proxies enable lazy loading of entity instances. For persistent collections Hibernate has a slightly different approach.

12.1.2. Lazy persistent collections

You map persistent collections with either @ElementCollection for a collection of elements of basic or embeddable type or with @OneToMany and @ManyToMany for many--valued entity associations. These collections are, unlike @ManyToOne, lazy-loaded by default. You don’t have to specify the FetchType.LAZY option on the mapping.

When you load an Item, Hibernate doesn’t load its lazy collection of images right away. The lazy bids one-to-many collection is also only loaded on demand, when accessed and needed:

Path: /examples/src/test/java/org/jpwh/test/fetching/LazyProxyCollections.java

The find() operation loads the Item entity instance into the persistence context, as you can see in figure 12.2. The Item instance has a reference to an uninitialized User proxy: the seller. It also has a reference to an uninitialized Set of bids and an uninitialized List of images.

Figure 12.2. Proxies and collection wrappers represent the boundary of the loaded graph.

Hibernate implements lazy loading (and dirty checking) of collections with its own special implementations called collection wrappers. Although the bids certainly look like a Set, Hibernate replaced the implementation with an org.hibernate.collection.internal.PersistentSet while you weren’t looking. It’s not a HashSet, but it has the same behavior. That’s why it’s so important to program with interfaces in your domain model and only rely on Set and not HashSet. Lists and maps work the same way.

These special collections can detect when you access them and load their data at that time. As soon as you start iterating through the bids, the collection and all bids made for the item are loaded:

Path: /examples/src/test/java/org/jpwh/test/fetching/LazyProxyCollections.java

Bid firstBid = bids.iterator().next();
// select * from BID where ITEM_ID = ?

// Alternative: Hibernate.initialize(bids);

Alternatively, just as for entity proxies, you can call the static utility method Hibernate.initialize() to load a collection. It will be completely loaded; you can’t say “only load the first two bids,” for example. For this, you’d have to write a query.

Hibernate Feature

For convenience, so you don’t have to write many trivial queries, Hibernate offers a proprietary setting on collection mappings:

Path: /model/src/main/java/org/jpwh/model/fetching/proxy/Item.java

@Entity
public class Item {

    @OneToMany(mappedBy = "item")
    @org.hibernate.annotations.LazyCollection(
       org.hibernate.annotations.LazyCollectionOption.EXTRA
    )
    public Set<Bid> getBids() {
        return bids;
    }
    // ...
}

With LazyCollectionOption.EXTRA, the collection supports operations that don’t trigger initialization. For example, you could ask for the collection’s size:

Item item = em.find(Item.class, ITEM_ID);
// select * from ITEM where ID = ?
<enter/>
assertEquals(item.getBids().size(), 3);
// select count(b) from BID b where b.ITEM_ID = ?

The size() operation triggers a SELECT COUNT() SQL query but doesn’t load the bids into memory. On all extra lazy collections, similar queries are executed for the isEmpty() and contains() operations. An extra lazy Set checks for duplicates with a simple query when you call add(). An extra lazy List only loads one element if you call get(index). For Map, extra lazy operations are containsKey() and containsValue().

Hibernate’s proxies and smart collections are one possible implementation of lazy loading, with a good balance of features and cost. An alternative we’ve mentioned before is interception.

12.1.3. Lazy loading with interception

Hibernate Feature

The fundamental problem with lazy loading is that the JPA provider must know when to load the seller of an Item or the collection of bids. Instead of runtime-generated proxies and smart collections, many other JPA providers rely exclusively on interception of method calls. For example, when you call someItem.getSeller(), the JPA provider would intercept this call and load the User instance representing the seller.

This approach requires special code in your Item class to implement the interception: the getSeller() method or the seller field must be wrapped. Because you don’t want to write this code by hand, typically you run a bytecode enhancer (bundled with your JPA provider) after compiling your domain model classes. This enhancer injects the necessary interception code into your compiled classes, manipulating the fields and methods at the bytecode level.

Let’s discuss lazy loading based on interception with a few examples. First, you probably want to disable Hibernate’s proxy generation:

Path: /model/src/main/java/org/jpwh/model/fetching/interception/User.java

@Entity
@org.hibernate.annotations.Proxy(lazy = false)
public class User {
    // ...
}

Hibernate will now no longer generate a proxy for the User entity. If you call entity-Manager.getReference(User.class, USER_ID), a SELECT is executed, just as for find():

Path: /examples/src/test/java/org/jpwh/test/fetching/LazyInterception.java

For entity associations targeting User, such as the seller of an Item, the FetchType.LAZY hint has no effect:

Path: /model/src/main/java/org/jpwh/model/fetching/interception/Item.java

Instead, the proprietary LazyToOneOption.NO_PROXY setting tells Hibernate that the bytecode enhancer must add interception code for the seller property. Without this option, or if you don’t run the bytecode enhancer, this association would be eagerly loaded and the field would be populated right away when the Item is loaded, because proxies for the User entity have been disabled.

If you run the bytecode enhancer, Hibernate intercepts access of the seller field and triggers loading when you touch the field:

Path: /examples/src/test/java/org/jpwh/test/fetching/LazyInterception.java

This is less lazy than proxies. Remember that you could call User#getId() on a proxy without initializing the instance, as explained in the previous sections. With interception, any access of the seller field, and calling getSeller(), will trigger initialization.

For lazy entity associations, proxies are usually a better choice than interception. A more common use case for interception is properties of basic type, such a String or byte[], with potentially large values. We might argue that LOBs (see “Binary and large value types” in chapter 5) should be preferred for large strings or binary data, but you might not want to have the java.sql.Blob or java.sql.Clob type in your domain model. With interception and bytecode enhancement, you can load a simple String or byte[] field on demand:

Path: /model/src/main/java/org/jpwh/model/fetching/interception/Item.java

@Entity
public class Item {

    @Basic(fetch = FetchType.LAZY)
    protected String description;

    // ...
}

The Item#description will be lazy loaded if you run the bytecode enhancer on the compiled class. If you don’t run the bytecode enhancer—for example, during development—the String will be loaded together with the Item instance.

If you rely on interception, be aware that Hibernate will load all lazy fields of an entity or embeddable class, even if only one has to be loaded:

Path: /examples/src/test/java/org/jpwh/test/fetching/LazyInterception.java

When Hibernate loads the description of the Item, it loads the seller and any other intercepted field right away, too. There are no fetch groups in Hibernate at the time of writing: it’s all or nothing.

The downside of interception is the cost of running a bytecode enhancer every time you build your domain model classes, and waiting for the instrumentation to complete. You may decide to skip the instrumentation during development, if the behavior of your application doesn’t depend on an item’s description load state. Then, when building the testing and production package, you can execute the enhancer.

The new Hibernate 5 bytecode enhancer

Unfortunately, we can’t guarantee that the interception examples we show here will work with the latest Hibernate 5 version. The Hibernate 5 bytecode enhancer has been rewritten and now supports more than interception for lazy loading: the byte-code enhancer can inject code into your domain model classes to speed up dirty checking and automatically manage bidirectional entity associations. At the time of writing, however, we couldn’t get this brand-new enhancer to work, and development was ongoing. We refer you to the current Hibernate documentation for more information about the enhancer feature and configuring it in your project using Maven or Gradle plug-ins.

We leave it to you to decide whether you want interception for lazy loading—in our experience, good use cases are rare. Note that we haven’t talked about collection wrappers when discussing interception: although you could enable interception for collection fields, Hibernate would still use its smart collection wrappers. The reason is that these collection wrappers are, unlike entity proxies, needed for other purposes besides lazy loading. For example, Hibernate relies on them to track additions and removals of collection elements when dirty checking. You can’t disable the collection wrappers in your mappings; they’re always on. (Of course, you never have to map a persistent collection; they’re a feature, not a requirement. See our earlier discussion in section 7.1.) Persistent arrays, on the other hand, can only be lazily loaded with field interception—they can’t be wrapped like collections.

You’ve now seen all available options for lazy loading in Hibernate. Next, we look at the opposite of on-demand loading: the eager fetching of data.

12.1.4. Eager loading of associations and collections

We’ve recommended a lazy default fetch plan, with FetchType.LAZY on all your association and collection mappings. Sometimes, although not often, you want the opposite: to specify that a particular entity association or collection should always be loaded. You want the guarantee that this data is available in memory without an additional database hit.

More important, you want a guarantee that, for example, you can access the seller of an Item once the Item instance is in detached state. When the persistence context is closed, lazy loading is no longer available. If seller were an uninitialized proxy, you’d get a LazyInitializationException when you accessed it in detached state. For data to be available in detached state, you need to either load it manually while the persistence context is still open or, if you always want it loaded, change your fetch plan to be eager instead of lazy.

Let’s assume that you always require loading of the seller and the bids of an Item:

Path: /model/src/main/java/org/jpwh/model/fetching/eagerjoin/Item.java

Unlike FetchType.LAZY, which is a hint the JPA provider can ignore, a FetchType.EAGER is a hard requirement. The provider has to guarantee that the data is loaded and available in detached state; it can’t ignore the setting.

Consider the collection mapping: is it really a good idea to say, “Whenever an item is loaded into memory, load the bids of the item right away, too”? Even if you only want to display the item’s name or find out when the auction ends, all bids will be loaded into memory. Always eager-loading collections, with FetchType.EAGER as the default fetch plan in the mapping, usually isn’t a great strategy. You’ll also see the Cartesian product problem appear if you eagerly load several collections, which we discuss later in this chapter. It’s best if you leave collections as the default FetchType.LAZY.

If you now find() an Item (or force the initialization of an Item proxy), both the seller and all the bids are loaded as persistent instances into your persistence context:

Path: /examples/src/test/java/org/jpwh/test/fetching/EagerJoin.java

For the find(), Hibernate executes a single SQL SELECT and JOINs three tables to retrieve the data. You can see the contents of the persistence context in figure 12.3. Note how the boundaries of the loaded graph are represented: the collection of images hasn’t been loaded, and each Bid has a reference to an uninitialized User proxy, the bidder. If you now detach the Item, you can access the loaded seller and bids without causing a LazyInitializationException. If you try to access the images or one of the bidder proxies, you’ll get an exception!

Figure 12.3. The seller and the bids of an Item are loaded.

In the following examples, we assume that your domain model has a lazy default fetch plan. Hibernate will only load the data you explicitly request and the associations and collections you access.

Next, we discuss how data should be loaded when you find an entity instance by identity and when you navigate the network, using the pointers of your mapped associations and collections. We’re interested in what SQL is executed and finding the ideal fetch strategy.

12.2. Selecting a fetch strategy

Hibernate executes SQL SELECT statements to load data into memory. If you load an entity instance, one or more SELECT(s) are executed, depending on the number of tables involved and the fetching strategy you’ve applied. Your goal is to minimize the number of SQL statements and to simplify the SQL statements so that querying can be as efficient as possible.

Consider our recommended fetch plan from earlier in this chapter: every association and collection should be loaded on demand, lazily. This default fetch plan will most likely result in too many SQL statements, each loading only one small piece of data. This will lead to n+1 selects problems, and we discuss this issue first. The alternative fetch plan, using eager loading, will result in fewer SQL statements, because larger chunks of data are loaded into memory with each SQL query. You might then see the Cartesian product problem, as SQL result sets become too large.

You need to find the middle ground between these two extremes: the ideal fetching strategy for each procedure and use case in your application. Like fetch plans, you can set a global fetching strategy in your mappings: the default setting that is always active. Then, for a particular procedure, you might override the default fetching strategy with a custom JPQL, CriteriaQuery, or even SQL query.

First, let’s discuss the fundamental problems you see, starting with the n+1 selects issue.

12.2.1. The n+1 selects problem

This problem is easy to understand with some example code. Let’s assume that you mapped a lazy fetch plan, so everything is loaded on demand. The following example code checks whether the seller of each Item has a username:

Path: /examples/src/test/java/org/jpwh/test/fetching/NPlusOneSelects.java

You see one SQL SELECT to load the Item entity instances. Then, while you iterate through all the items, retrieving each User requires an additional SELECT. This amounts to one query for the Item plus n queries depending on how many items you have and whether a particular User is selling more than one Item. Obviously, this is a very inefficient strategy if you know you’ll access the seller of each Item.

You can see the same issue with lazily loaded collections. The following example checks whether each Item has some bids:

Path: /examples/src/test/java/org/jpwh/test/fetching/NPlusOneSelects.java

Again, if you know you’ll access each bids collection, loading only one at a time is inefficient. If you have 100 items, you’ll execute 101 SQL queries!

With what you know so far, you might be tempted to change the default fetch plan in your mappings and put a FetchType.EAGER on your seller or bids associations. But doing so can lead to our next topic: the Cartesian product problem.

12.2.2. The Cartesian product problem

If you look at your domain and data model and say, “Every time I need an Item, I also need the seller of that Item,” you can map the association with FetchType.EAGER instead of a lazy fetch plan. You want a guarantee that whenever an Item is loaded, the seller will be loaded right away—you want that data to be available when the Item is detached and the persistence context is closed:

Path: /model/src/main/java/org/jpwh/model/fetching/cartesianproduct/Item.java

@Entity
public class Item {
<enter/>
    @ManyToOne(fetch = FetchType.EAGER)
    protected User seller;
<enter/>
    // ...
}

To implement your eager fetch plan, Hibernate uses an SQL JOIN operation to load an Item and a User instance in one SELECT:

 item = em.find(Item.class, ITEM_ID);
// select i.*, u.*
//  from ITEM i
//   left outer join USERS u on u.ID = i.SELLER_ID
//  where i.ID = ?

The result set contains one row with data from the ITEM table combined with data from the USERS table, as shown in figure 12.4.

Figure 12.4. Hibernate joins two tables to eagerly fetch associated rows.
i.ID i.NAME i.SELLER_ID ... u.ID u.USERNAME ...
1 One 2 ... 2 johndoe ...

Eager fetching with the default JOIN strategy isn’t problematic for @ManyToOne and @OneToOne associations. You can eagerly load, with one SQL query and JOINs, an Item, its seller, the User’s Address, the City they live in, and so on. Even if you map all these associations with FetchType.EAGER, the result set will have only one row. Now, Hibernate has to stop following your FetchType.EAGER plan at some point. The number of tables joined depends on the global hibernate.max_fetch_depth configuration property. By default, no limit is set. Reasonable values are small, usually between 1 and 5. You may even disable JOIN fetching of @ManyToOne and @OneToOne associations by setting the property to 0. If Hibernate reaches the limit, it will still eagerly load the data according to your fetch plan, but with additional SELECT statements. (Note that some database dialects may preset this property: for example, MySQL-Dialect sets it to 2.)

Eagerly loading collections with JOINs, on the other hand, can lead to serious performance issues. If you also switched to FetchType.EAGER for the bids and images collections, you’d run into the Cartesian product problem.

This issue appears when you eagerly load two collections with one SQL query and a JOIN operation. First, let’s create such a fetch plan and then look at the SQL problem:

Path: /model/src/main/java/org/jpwh/model/fetching/cartesianproduct/Item.java

@Entity
public class Item {

    @OneToMany(mappedBy = "item", fetch = FetchType.EAGER)
    protected Set<Bid> bids = new HashSet<>();

    @ElementCollection(fetch = FetchType.EAGER)
    @CollectionTable(name = "IMAGE")
    @Column(name = "FILENAME")
    protected Set<String> images = new HashSet<String>();

    // ...
}

It doesn’t matter whether both collections are @OneToMany, @ManyToMany, or @Element-Collection. Eager fetching more than one collection at once with the SQL JOIN operator is the fundamental issue, no matter what the collection content is. If you load an Item, Hibernate executes the problematic SQL statement:

Path: /examples/src/test/java/org/jpwh/test/fetching/CartesianProduct.java

Item item = em.find(Item.class, ITEM_ID);
// select i.*, b.*, img.*
//  from ITEM i
//   left outer join BID b on b.ITEM_ID = i.ID
//   left outer join IMAGE img on img.ITEM_ID = i.ID
//  where i.ID = ?

em.detach(item);

assertEquals(item.getImages().size(), 3);
assertEquals(item.getBids().size(), 3);

As you can see, Hibernate obeyed your eager fetch plan, and you can access the bids and images collections in detached state. The problem is how they were loaded, with an SQL JOIN that results in a product. Look at the result set in figure 12.5.

Figure 12.5. A product is the result of two joins with many rows.

i.ID

i.NAME

...

b.ID

b.AMOUNT

img.FILENAME

1 One ... 1 99.00 foo.jpg
1 One ... 1 99.00 bar.jpg
1 One ... 1 99.00 baz.jpg
1 One ... 2 100.00 foo.jpg
1 One ... 2 100.00 bar.jpg
1 One ... 2 100.00 baz.jp
1 One ... 3 101.00 foo.jpg
1 One ... 3 101.00 bar.jpg
1 One ... 3 101.00 baz.jpg

This result set contains many redundant data items, and only the shaded cells are relevant for Hibernate. The Item has three bids and three images. The size of the product depends on the size of the collections you’re retrieving: three times three is nine rows total. Now imagine that you have an Item with 50 bids and 5 images—you’ll see a result set with possibly 250 rows! You can create even larger SQL products when you write your own queries with JPQL or CriteriaQuery: imagine what happens if you load 500 items and eager-fetch dozens of bids and images with JOINs.

Considerable processing time and memory are required on the database server to create such results, which then must be transferred across the network. If you’re hoping that the JDBC driver will compress the data on the wire somehow, you’re probably expecting too much from database vendors. Hibernate immediately removes all duplicates when it marshals the result set into persistent instances and collections; information in cells that aren’t shaded in figure 12.5 will be ignored. Obviously, you can’t remove these duplicates at the SQL level; the SQL DISTINCT operator doesn’t help here.

Instead of one SQL query with an extremely large result, three separate queries would be faster to retrieve an entity instance and two collections at the same time. Next, we focus on this kind of optimization and how you find and implement the best fetch strategy. We start again with a default lazy fetch plan and try to solve the n+1 selects problem first.

12.2.3. Prefetching data in batches

Hibernate Feature

If Hibernate fetches every entity association and collection only on demand, many additional SQL SELECT statements may be necessary to complete a particular procedure. As before, consider a routine that checks whether the seller of each Item has a username. With lazy loading, this would require one SELECT to get all Item instances and n more SELECTs to initialize the seller proxy of each Item.

Hibernate offers algorithms that can prefetch data. The first algorithm we discuss is batch fetching, and it works as follows: if Hibernate must initialize one User proxy, go ahead and initialize several with the same SELECT. In other words, if you already know that there are several Item instances in the persistence context and that they all have a proxy applied to their seller association, you may as well initialize several proxies instead of just one if you make the round trip to the database.

Let’s see how this works. First, enable batch fetching of User instances with a proprietary Hibernate annotation:

Path: /model/src/main/java/org/jpwh/model/fetching/batch/User.java

@Entity
@org.hibernate.annotations.BatchSize(size = 10)
@Table(name = "USERS")
public class User {
    // ...
}

This setting tells Hibernate that it may load up to 10 User proxies if one has to be loaded, all with the same SELECT. Batch fetching is often called a blind-guess optimization, because you don’t know how many uninitialized User proxies may be in a particular persistence context. You can’t say for sure that 10 is an ideal value—it’s a guess. You know that instead of n+1 SQL queries, you’ll now see n+1/10 queries, a significant reduction. Reasonable values are usually small, because you don’t want to load too much data into memory either, especially if you aren’t sure you’ll need it.

This is the optimized procedure, which checks the username of each seller:

Path: /examples/src/test/java/org/jpwh/test/fetching/Batch.java

List<Item> items = em.createQuery("select i from Item i").getResultList();
// select * from ITEM

for (Item item : items) {
    assertNotNull(item.getSeller().getUsername());
    // select * from USERS where ID in (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
}

Note the SQL query that Hibernate executes while you iterate through the items. When you call item.getSeller().getUserName() for the first time, Hibernate must initialize the first User proxy. Instead of only loading a single row from the USERS table, Hibernate retrieves several rows, and up to 10 User instances are loaded. Once you access the eleventh seller, another 10 are loaded in one batch, and so on, until the persistence context contains no uninitialized User proxies.

FAQ: What is the real batch-fetching algorithm?

Our explanation of batch loading was somewhat simplified, and you may see a slightly different algorithm in practice. As an example, imagine a batch size of 32. At startup time, Hibernate creates several batch loaders internally. Each loader knows how many proxies it can initialize: 32, 16, 10, 9, 8, 7, ..., 1. The goal is to minimize the memory consumption for loader creation and to create enough loaders that every possible batch fetch can be produced. Another goal is to minimize the number of SQL queries, obviously.

To initialize 31 proxies, Hibernate executes 3 batches (you probably expected 1, because 32 > 31). The batch loaders that are applied are 16, 10, and 5, as automatically selected by Hibernate. You can customize this batch-fetching algorithm with the property hibernate.batch_fetch_style in your persistence unit configuration. The default is LEGACY, which builds and selects several batch loaders on startup. Other options are PADDED and DYNAMIC. With PADDED, Hibernate builds only one batch loader SQL query on startup with placeholders for 32 arguments in the IN clause and then repeats bound identifiers if fewer than 32 proxies have to be loaded. With DYNAMIC, Hibernate dynamically builds the batch SQL statement at runtime, when it knows the number of proxies to initialize.

Batch fetching is also available for collections:

Path: /model/src/main/java/org/jpwh/model/fetching/batch/Item.java

@Entity
public class Item {

    @OneToMany(mappedBy = "item")
    @org.hibernate.annotations.BatchSize(size = 5)
    protected Set<Bid> bids = new HashSet<>();

    // ...
}

If you now force the initialization of one bids collection, up to five more Item#bids collections, if they’re uninitialized in the current persistence context, are loaded right away:

Path: /examples/src/test/java/org/jpwh/test/fetching/Batch.java

List<Item> items = em.createQuery("select i from Item i").getResultList();
// select * from ITEM

for (Item item : items) {
    assertTrue(item.getBids().size() > 0);
    // select * from BID where ITEM_ID in (?, ?, ?, ?, ?)
}

When you call item.getBids().size() for the first time while iterating, a whole batch of Bid collections are preloaded for the other Item instances.

Batch fetching is a simple and often smart optimization that can significantly reduce the number of SQL statements that would otherwise be necessary to initialize all your proxies and collections. Although you may prefetch data you won’t need in the end and consume more memory, the reduction in database round trips can make a huge difference. Memory is cheap, but scaling database servers isn’t.

Another prefetching algorithm that isn’t a blind guess uses subselects to initialize many collections with a single statement.

12.2.4. Prefetching collections with subselects

Hibernate Feature

A potentially better strategy for loading all bids of several Item instances is prefetching with a subselect. To enable this optimization, add a Hibernate annotation to your collection mapping:

Path: /model/src/main/java/org/jpwh/model/fetching/subselect/Item.java

@Entity
public class Item {
<enter/>
    @OneToMany(mappedBy = "item")
    @org.hibernate.annotations.Fetch(
       org.hibernate.annotations.FetchMode.SUBSELECT
    )
    protected Set<Bid> bids = new HashSet<>();

    // ...
}

Hibernate now initializes all bids collections for all loaded Item instances as soon as you force the initialization of one bids collection:

Path: /examples/src/test/java/org/jpwh/test/fetching/Subselect.java

List<Item> items = em.createQuery("select i from Item i").getResultList();
// select * from ITEM
<enter/>
for (Item item : items) {
    assertTrue(item.getBids().size() > 0);
    // select * from BID where ITEM_ID in (
    //  select ID from ITEM
    // )
}

Hibernate remembers the original query used to load the items. It then embeds this initial query (slightly modified) in a subselect, retrieving the collection of bids for each Item.

Prefetching using a subselect is a powerful optimization, but at the time of writing, it was only available for lazy collections, not for entity proxies. Also note that the original query that is rerun as a subselect is only remembered by Hibernate for a particular persistence context. If you detach an Item instance without initializing the collection of bids, and then merge it with a new persistence context and start iterating through the collection, no prefetching of other collections occurs.

Batch and subselect prefetching reduce the number of queries necessary for a particular procedure if you stick with a global lazy fetch plan in your mappings, helping mitigate the n+1 selects problem. If instead your global fetch plan has eager loaded associations and collections, you have to avoid the Cartesian product problem—for example, by breaking down a JOIN query into several SELECTs.

12.2.5. Eager fetching with multiple SELECTs

Hibernate Feature

When you’re trying to fetch several collections with one SQL query and JOINs, you run into the Cartesian product problem, as explained earlier. Instead of a JOIN operation, you can tell Hibernate to eagerly load data with additional SELECT queries and hence avoid large results and SQL products with duplicates:

Path: /model/src/main/java/org/jpwh/model/fetching/eagerselect/Item.java

Now, when an Item is loaded, the seller and bids have to be loaded as well:

Path: /examples/src/test/java/org/jpwh/test/fetching/EagerSelect.java

Item item = em.find(Item.class, ITEM_ID);
// select * from ITEM where ID = ?
// select * from USERS where ID = ?
// select * from BID where ITEM_ID = ?
<enter/>
em.detach(item);
<enter/>
assertEquals(item.getBids().size(), 3);
assertNotNull(item.getBids().iterator().next().getAmount());
assertEquals(item.getSeller().getUsername(), "johndoe");

Hibernate uses one SELECT to load a row from the ITEM table. It then immediately executes two more SELECTs: one loading a row from the USERS table (the seller) and the other loading several rows from the BID table (the bids).

The additional SELECT queries aren’t executed lazily; the find() method produces several SQL queries. You can see how Hibernate followed the eager fetch plan: all data is available in detached state.

Still, all of these settings are global; they’re always active. The danger is that adjusting one setting for one problematic case in your application might have negative side effects on some other procedure. Maintaining this balance can be difficult, so our recommendation is to map every entity association and collection as FetchType.LAZY, as mentioned before.

A better approach is to dynamically use eager fetching and JOIN operations only when needed, for a particular procedure.

12.2.6. Dynamic eager fetching

As in the previous sections, let’s say you have to check the username of each Item#seller. With a lazy global fetch plan, load the data you need for this procedure and apply a dynamic eager fetch strategy in a query:

Path: /examples/src/test/java/org/jpwh/test/fetching/EagerQuery.java

The important keywords in this JPQL query are join fetch, telling Hibernate to use a SQL JOIN (an INNER JOIN, actually) to retrieve the seller of each Item in the same query. The same query can be expressed with the CriteriaQuery API instead of a JPQL string:

Path: /examples/src/test/java/org/jpwh/test/fetching/EagerQuery.java

Dynamic eager join fetching also works for collections. Here you load all bids of each Item:

Path: /examples/src/test/java/org/jpwh/test/fetching/EagerQuery.java

Now the same with the CriteriaQuery API:

Path: /examples/src/test/java/org/jpwh/test/fetching/EagerQuery.java

Note that for collection fetching, a LEFT OUTER JOIN is necessary, because you also want rows from the ITEM table if there are no bids. We’ll have much more to say about fetching with JPQL and CriteriaQuery later in this book, in chapter 15. You’ll see many more examples then of inner, outer, left, and right joins, so don’t worry too much about these details now.

Writing queries by hand isn’t the only available option if you want to override the global fetch plan of your domain model dynamically. You can write fetch profiles declaratively.

12.3. Using fetch profiles

Fetch profiles complement the fetching options in the query languages and APIs. They allow you to maintain your profile definitions in either XML or annotation metadata. Early Hibernate versions didn’t have support for special fetch profiles, but today Hibernate supports the following:

  • Fetch profiles— A proprietary API based on declaration of the profile with @org.hibernate.annotations.FetchProfile and execution with Session #enableFetchProfile(). This simple mechanism currently supports overriding lazy-mapped entity associations and collections selectively, enabling a JOIN eager fetching strategy for a particular unit of work.
  • Entity graphs— Specified in JPA 2.1, you can declare a graph of entity attributes and associations with the @EntityGraph annotation. This fetch plan, or a combination of plans, can be enabled as a hint when executing EntityManager #find() or queries (JPQL, criteria). The provided graph controls what should be loaded; unfortunately it doesn’t control how it should be loaded.

It’s fair to say that there is room for improvement here, and we expect future versions of Hibernate and JPA to offer a unified and more powerful API.

Don’t forget that you can externalize JPQL and SQL statements and move them to metadata (see section 14.4). A JPQL query is a declarative (named) fetch profile; what you’re missing is the ability to overlay different plans easily on the same base query. We’ve seen some creative solutions with string manipulation that are best avoided. With criteria queries, on the other hand, you already have the full power of Java available to organize your query-building code. Then the value of entity graphs is being able to reuse fetch plans across any kind of query.

Let’s talk about Hibernate fetch profiles first and how you can override a global lazy fetch plan for a particular unit of work.

12.3.1. Declaring Hibernate fetch profiles

Hibernate Feature

Hibernate fetch profiles are global metadata: they’re declared for the entire persistence unit. Although you could place the @FetchProfile annotation on a class, we prefer it as package-level metadata in a package-info.java:

Path: /model/src/main/java/org/jpwh/model/fetching/profile/package-info.java

  1. Each profile has a name. This is a simple string isolated in a constant.
  2. Each override in a profile names one entity association or collection.
  3. The only supported mode at the time of writing is JOIN. The profiles can now be enabled for a unit of work:

Path: /examples/src/test/java/org/jpwh/test/fetching/Profile.java

  1. The Item#seller is mapped lazy, so the default fetch plan only retrieves the Item instance.
  2. You need the Hibernate API to enable a profile. It’s then active for any operation in that unit of work. The Item#seller is fetched with a join in the same SQL statement whenever an Item is loaded with this EntityManager.
  3. You can overlay another profile on the same unit of work. Now the Item#seller and the Item#bids collection are fetched with a join in the same SQL statement whenever an Item is loaded. Although basic, Hibernate fetch profiles can be an easy solution for fetching optimization in smaller or simpler applications. With JPA 2.1, the introduction of entity graphs enables similar functionality in a standard fashion.

12.3.2. Working with entity graphs

An entity graph is a declaration of entity nodes and attributes, overriding or augmenting the default fetch plan when you execute an EntityManager#find() or with a hint on query operations. This is an example of a retrieval operation using an entity graph:

Path: /examples/src/test/java/org/jpwh/test/fetching/FetchLoadGraph.java

The name of the entity graph you’re using is Item, and the hint for the find() operation indicates it should be the load graph. This means attributes that are specified by attribute nodes of the entity graph are treated as FetchType.EAGER, and attributes that aren’t specified are treated according to their specified or default FetchType in the mapping.

This is the declaration of this graph and the default fetch plan of the entity class:

Path: /model/src/main/java/org/jpwh/model/fetching/fetchloadgraph/Item.java

Entity graphs in metadata have names and are associated with an entity class; they’re usually declared in annotations on top of an entity class. You can put them in XML if you like. If you don’t give an entity graph a name, it gets the simple name of its owning entity class, which here is Item. If you don’t specify any attribute nodes in the graph, like the empty entity graph in the last example, the defaults of the entity class are used. In Item, all associations and collections are mapped lazy; this is the default fetch plan. Hence, what you’ve done so far makes little difference, and the find() operation without any hints will produce the same result: the Item instance is loaded, and the seller, bids, and images aren’t.

Alternatively, you can build an entity graph with an API:

Path: /examples/src/test/java/org/jpwh/test/fetching/FetchLoadGraph.java

EntityGraph<Item> itemGraph = em.createEntityGraph(Item.class);
<enter/>
Map<String, Object> properties = new HashMap<>();
properties.put("javax.persistence.loadgraph", itemGraph);
<enter/>
Item item = em.find(Item.class, ITEM_ID, properties);

This is again an empty entity graph with no attribute nodes, given directly to a retrieval operation.

Let’s say you want to write an entity graph that changes the lazy default of Item#seller to eager fetching, when enabled:

Path: /model/src/main/java/orgjpwh/model/fetching/fetchloadgraph/Item.java

@NamedEntityGraphs({
    @NamedEntityGraph(
        name = "ItemSeller",
        attributeNodes = {
            @NamedAttributeNode("seller")
        }
    )
})
@Entity
public class Item {
<enter/>
    // ...
}

Now enable this graph by name when you want the Item and the seller eagerly loaded:

Path: /examples/src/test/java/org/jpwh/test/fetching/FetchLoadGraph.java

Map<String, Object> properties = new HashMap<>();
properties.put(
    "javax.persistence.loadgraph",
    em.getEntityGraph("ItemSeller")
);
<enter/>
Item item = em.find(Item.class, ITEM_ID, properties);
// select i.*, u.*
//  from ITEM i
//   inner join USERS u on u.ID = i.SELLER_ID
// where i.ID = ?

If you don’t want to hardcode the graph in annotations, build it with the API instead:

Path: /examples/src/test/java/org/jpwh/test/fetching/FetchLoadGraph.java

So far you’ve seen only properties for the find() operation. Entity graphs can also be enabled for queries, as hints:

Path: /examples/src/test/java/org/jpwh/test/fetching/FetchLoadGraph.java

List<Item> items =
    em.createQuery("select i from Item i")
        .setHint("javax.persistence.loadgraph", itemGraph)
        .getResultList();
// select i.*, u.*
//  from ITEM i
//   left outer join USERS u on u.ID = i.SELLER_ID

Entity graphs can be complex. The following declaration shows how to work with reusable subgraph declarations:

Path: /model/src/main/java/org/jpwh/model/fetching/fetchloadgraph/Bid.java

@NamedEntityGraphs({
    @NamedEntityGraph(
        name = "BidBidderItemSellerBids",
        attributeNodes = {
            @NamedAttributeNode(value = "bidder"),
            @NamedAttributeNode(
                value = "item",
                subgraph = "ItemSellerBids"
            )
        },
        subgraphs = {
            @NamedSubgraph(
                name = "ItemSellerBids",
                attributeNodes = {
                    @NamedAttributeNode("seller"),
                    @NamedAttributeNode("bids")
                })
        }
    )
})
@Entity
public class Bid {
    // ...
}

This entity graph, when enabled as a load graph when retrieving Bid instances, also triggers eager fetching of Bid#bidder, the Bid#item, and furthermore the Item #seller and all Item#bids. Although you’re free to name your entity graphs any way you like, we recommend that you develop a convention that everyone in your team can follow, and move the strings to shared constants.

With the entity graph API, the previous plan looks as follows:

Path: /examples/src/test/java/org/jpwh/test/fetching/FetchLoadGraph.java

EntityGraph<Bid> bidGraph = em.createEntityGraph(Bid.class);
bidGraph.addAttributeNodes(Bid_.bidder, Bid_.item);
Subgraph<Item> itemGraph = bidGraph.addSubgraph(Bid_.item);
itemGraph.addAttributeNodes(Item_.seller, Item_.bids);
<enter/>
Map<String, Object> properties = new HashMap<>();
properties.put("javax.persistence.loadgraph", bidGraph);
<enter/>
Bid bid = em.find(Bid.class, BID_ID, properties);

You’ve only seen entity graphs as load graphs so far. There is another option: you can enable an entity graph as a fetch graph with the javax.persistence.fetchgraph hint. If you execute a find() or query operation with a fetch graph, any attributes and collections not in your plan will be made FetchType.LAZY, and any nodes in your plan will be FetchType.EAGER. This effectively ignores all FetchType settings in your entity attribute and collection mappings, whereas the load graph feature was only augmenting.

Two weak points of the JPA entity graph operations are worth mentioning, because you’ll run into them quickly. First, you can only modify fetch plans, not the Hibernate fetch strategy (batch/subselect/join/select). Second, declaring an entity graph in annotations or XML isn’t fully type-safe: the attribute names are strings. The EntityGraph API at least is type-safe.

12.4. Summary

  • A fetch profile combines a fetch plan (what data should be loaded) with a fetch strategy (how the data should be loaded), encapsulated in reusable metadata or code.
  • You created a global fetch plan and defined which associations and collections should be loaded into memory at all times. You defined the fetch plan based on use cases, how to access associated entities and iterate through collections in your application, and which data should be available in detached state.
  • You learned to select the right fetching strategy for your fetch plan. Your goal is to minimize the number of SQL statements and the complexity of each SQL statement that must be executed. You especially want to avoid the n+1 selects and Cartesian product issues we examined in detail, using various optimization strategies.
  • You explored Hibernate fetch profiles and entity graphs, the fetch profiles in JPA.
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.128.226.255