CHAPTER 7


Using Queries

For most enterprise applications, getting data out of the database is at least as important as the ability to put new data in. From searching to sorting, analytics, and business intelligence, efficiently moving data from the database to the application and presenting it to the user is a regular part of enterprise development. Doing so requires the ability to issue bulk queries against the database and interpret the results for the application. Although high-level languages and expression frameworks have in many cases attempted to insulate developers from the task of dealing with database queries at the level of SQL, it’s probably fair to say that most enterprise developers have worked with at least one SQL dialect at some point in their career.

Object-relational mapping adds another level of complexity to this task. Most of the time, the developer will want the results converted to entities so that the query results may be used directly by application logic. Similarly, if the domain model has been abstracted from the physical model via object-relational mapping, it makes sense to also abstract queries away from SQL, which is not only tied to the physical model but also difficult to port between vendors. Fortunately, as you will see, JPA can handle a diverse set of query requirements.

JPA supports two methods for expressing queries to retrieve entities and other persistent data from the database: query languages and the Criteria API. The primary query language is Java Persistence Query Language (JP QL), a database-independent query language that operates on the logical entity model as opposed to the physical data model. Queries may also be expressed in SQL to take advantage of the underlying database. We will explore using SQL queries with JPA in Chapter 11. The Criteria API provides an alternative method for constructing queries based on Java objects instead of query strings. Chapter 9 covers the Criteria API in detail.

We will begin our discussion of queries with an introduction to JP QL, followed by an exploration of the query facilities provided by the EntityManager and Query interfaces.

Java Persistence Query Language

Before discussing JP QL, we should first look to its roots. The Enterprise JavaBeans Query Language (EJB QL) was first introduced in the EJB 2.0 specification to allow developers to write portable finder and select methods for container-managed entity beans. Based on a small subset of SQL, it introduced a way to navigate across entity relationships both to select data and to filter the results. Unfortunately, it placed strict limitations on the structure of the query, limiting results to either a single entity or a persistent field from an entity. Inner joins between entities were possible, but used an odd notation. The initial release didn’t even support sorting.

The EJB 2.1 specification tweaked EJB QL a little bit, adding support for sorting, and introduced basic aggregate functions; but again the limitation of a single result type hampered the use of aggregates. You could filter the data, but there was no equivalent to SQL GROUP BY and HAVING expressions.

JP QL significantly extends EJB QL, eliminating many weaknesses of the previous versions while preserving backward compatibility. The following list describes some of the features available above and beyond EJB QL:

  • Single and multiple value result types
  • Aggregate functions, with sorting and grouping clauses
  • A more natural join syntax, including support for both inner and outer joins
  • Conditional expressions involving subqueries
  • Update and delete queries for bulk data changes
  • Result projection into non-persistent classes

The next few sections provide a quick introduction to JP QL intended for readers familiar with SQL or EJB QL. A complete tutorial and reference for JP QL can be found in Chapter 8.

Getting Started

The simplest JP QL query selects all the instances of a single entity type. Consider the following query:

SELECT e
FROM Employee e

If this looks similar to SQL, it should. JP QL uses SQL syntax where possible in order to give developers experienced with SQL a head start in writing queries. The key difference between SQL and JP QL for this query is that instead of selecting from a table, the query selects an entity from the application domain model. The SELECT clause of the query is also slightly different, listing only the Employee alias e. This indicates that the result type of the query is the Employee entity, so executing this statement will result in a list of zero or more Employee instances.
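Although query execution is covered in detail later in this chapter, the following minimal sketch (assuming an injected EntityManager instance named em) shows how such a query might be run and its results iterated:

List<Employee> employees =
    em.createQuery("SELECT e FROM Employee e", Employee.class)
      .getResultList();
for (Employee emp : employees) {
    // assumes Employee exposes a getName() accessor, as in later listings
    System.out.println(emp.getName());
}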

Starting with an alias, we can navigate across entity relationships using the dot (.) operator. For example, if we want just the names of the employees, the following query will suffice:

SELECT e.name
FROM Employee e

Each part of the expression corresponds to a persistent field of the entity that is a simple or embeddable type, or an association leading to another entity or collection of entities. Because the Employee entity has a persistent field named name of type String, this query will result in a list of zero or more String objects.

We can also select an entity we didn’t even list in the FROM clause. Consider the following example:

SELECT e.department
FROM Employee e

The Employee entity has a many-to-one relationship with the Department entity through the persistent field named department, so the result type of the query is the Department entity.

Filtering Results

Just like SQL, JP QL supports the WHERE clause to set conditions on the data being returned. The majority of operators commonly available in SQL are available in JP QL, including basic comparison operators; IN, LIKE, and BETWEEN expressions; numerous function expressions (such as SUBSTRING and LENGTH); and subqueries. The key difference for JP QL is that entity expressions and not column references are used. The following is an example of filtering using entity expressions in the WHERE clause:

SELECT e
FROM Employee e
WHERE e.department.name = 'NA42' AND
      e.address.state IN ('NY','CA')

Projecting Results

For applications that need to produce reports, a common scenario is selecting large numbers of entity instances, but using only a portion of that data. Depending on how an entity is mapped to the database, this can be an expensive operation if much of the entity data is discarded. It would be useful to return only a subset of the properties from an entity. The following query demonstrates selecting only the name and salary of each Employee instance:

SELECT e.name, e.salary
FROM Employee e

Joins Between Entities

The result type of a select query cannot be a collection; it must be a single valued object such as an entity instance or persistent field type. Expressions such as e.phones are illegal in the SELECT clause because they would result in Collection instances (each occurrence of e.phones is a collection, not an instance). Therefore, just as with SQL and tables, if we want to navigate along a collection association and return elements of that collection, we must join the two entities together. In the following code, a join between Employee and Phone entities is implicitly applied in order to retrieve all the cell phone numbers for a specific department:

SELECT p.number
FROM Employee e, Phone p
WHERE e = p.employee AND
      e.department.name = 'NA42' AND
      p.type = 'Cell'

In JP QL, joins may also be expressed in the FROM clause using the JOIN operator. The advantage of this operator is that the join can be expressed in terms of the association itself, and the query engine will automatically supply the necessary join criteria when it generates the SQL. The previous query can be rewritten to use the JOIN operator. Just as before, the alias p is of type Phone, only this time it refers to each of the phones in the e.phones collection:

SELECT p.number
FROM Employee e JOIN e.phones p
WHERE e.department.name = 'NA42' AND
      p.type = 'Cell'

JP QL supports multiple join types, including inner and outer joins, as well as a technique called fetch joins for eagerly loading data associated to the result type of a query but not directly returned. See the “Joins” section in Chapter 8 for more information.

Aggregate Queries

The syntax for aggregate queries in JP QL is very similar to that of SQL. There are five supported aggregate functions (AVG, COUNT, MIN, MAX, and SUM), and results may be grouped in the GROUP BY clause and filtered using the HAVING clause. Once again, the difference is the use of entity expressions when specifying the data to be aggregated. An aggregate JP QL query can use many of the aggregate functions within the same query:

SELECT d, COUNT(e), MAX(e.salary), AVG(e.salary)
FROM Department d JOIN d.employees e
GROUP BY d
HAVING COUNT(e) >= 5
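Because this query selects several expressions, each row of the result is an Object array, a topic discussed in the "Special Result Types" section later in this chapter. A minimal sketch of executing the query, again assuming an injected EntityManager named em, might look like this:

List results = em.createQuery(
        "SELECT d, COUNT(e), MAX(e.salary), AVG(e.salary) " +
        "FROM Department d JOIN d.employees e " +
        "GROUP BY d " +
        "HAVING COUNT(e) >= 5")
    .getResultList();
for (Object o : results) {
    Object[] row = (Object[]) o;
    Department dept = (Department) row[0];     // the grouping expression
    Long count = (Long) row[1];                // COUNT(e)
    System.out.println(dept.getName() + ": " + count + " employees");
    // row[2] and row[3] hold the MAX(e.salary) and AVG(e.salary) values
}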

Query Parameters

JP QL supports two types of parameter binding syntax. The first is positional binding, where parameters are indicated in the query string by a question mark followed by the parameter number. When the query is executed, the developer specifies the parameter number that should be bound. Positional parameter syntax is similar to the parameter markers used in JDBC, with the addition of an explicit parameter number after each question mark.

SELECT e
FROM Employee e
WHERE e.department = ?1 AND
      e.salary > ?2
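A sketch of binding these positional parameters, given a Department instance dept and a hypothetical salary threshold, might look like the following:

List<Employee> result =
    em.createQuery("SELECT e " +
                   "FROM Employee e " +
                   "WHERE e.department = ?1 AND " +
                   "      e.salary > ?2",
                   Employee.class)
      .setParameter(1, dept)       // parameter number 1
      .setParameter(2, 40000L)     // parameter number 2 (hypothetical value)
      .getResultList();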

Named parameters may also be used and are indicated in the query string by a colon followed by the parameter name. When the query is executed, the developer specifies the parameter name that should be replaced. This type of parameter allows for more descriptive parameter specifiers, like so:

SELECT e
FROM Employee e
WHERE e.department = :dept AND
      e.salary > :base

Defining Queries

JPA provides the Query and TypedQuery interfaces to configure and execute queries. The Query interface is used in cases when the result type is Object or in dynamic queries when the result type may not be known ahead of time. The TypedQuery interface is the preferred one and can be used whenever the result type is known. As TypedQuery extends Query, a strongly typed query can always be treated as an untyped version, though not vice versa. An implementation of the appropriate interface for a given query is obtained through one of the factory methods in the EntityManager interface. The choice of factory method depends on the type of query (JP QL, SQL, or criteria object), whether the query has been predefined, and whether strongly typed results are desired. For now, we will restrict our discussion to JP QL queries. SQL query definition is discussed in Chapter 11, and criteria queries are discussed in Chapter 9.
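The following sketch contrasts the two interfaces using an arbitrary query string; the untyped form returns raw results that must be cast, while the typed form does not:

// Untyped: getResultList() returns a raw List
Query q = em.createQuery("SELECT e FROM Employee e");
List results = q.getResultList();

// Strongly typed: the result type is bound to Employee
TypedQuery<Employee> tq =
    em.createQuery("SELECT e FROM Employee e", Employee.class);
List<Employee> employees = tq.getResultList();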

There are three approaches to defining a JP QL query. A query may be dynamically specified at runtime, configured in persistence unit metadata (annotation or XML) and referenced by name, or dynamically specified and then saved to be referenced later by name. Dynamic JP QL queries are nothing more than strings, and therefore may be defined on the fly as the need arises. Named queries, on the other hand, are static and unchangeable, but are more efficient to execute because the persistence provider can translate the JP QL string to SQL once when the application starts, as opposed to every time the query is executed. Dynamically defining a query and then naming it allows a dynamic query to be reused multiple times throughout the life of the application while incurring the dynamic processing cost only once.

The following sections compare the approaches and discuss when one should be used instead of the others.

Dynamic Query Definition

A query may be defined dynamically by passing the JP QL query string and expected result type to the createQuery() method of the EntityManager interface. The result type may be omitted to create an untyped query. We will discuss this approach in the “Working with Query Results” section. There are no restrictions on the query definition. All JP QL query types are supported, as well as the use of parameters. The ability to build up a string at runtime and use it for a query definition is useful, particularly for applications where the user may specify complex criteria and the exact shape of the query cannot be known ahead of time. As noted earlier, in addition to dynamic string queries, JPA also supports a Criteria API to create dynamic queries using Java objects. We will discuss this approach in Chapter 9.

An issue to consider with string dynamic queries, however, is the cost of translating the JP QL string to SQL for execution. A typical query engine will have to parse the JP QL string into a syntax tree, get the object-relational mapping metadata for each entity in each expression, and then generate the equivalent SQL. For applications that issue many queries, the performance cost of dynamic query processing can become an issue.

Many query engines will cache the translated SQL for later use, but this can easily be defeated if the application does not use parameter binding and concatenates parameter values directly into query strings. This has the effect of generating a new and unique query every time a query that requires parameters is constructed.

Consider the bean method shown in Listing 7-1 that searches for salary information given the name of a department and the name of an employee. There are two problems with this example, one related to performance and one related to security. Because the names are concatenated into the string instead of being bound as parameters, the method effectively creates a new and unique query each time. One hundred calls to this method could potentially generate one hundred different query strings. This not only requires excessive parsing of JP QL but also largely defeats any cache of converted queries that the persistence provider may try to maintain.

Listing 7-1.  Defining a Query Dynamically

@Stateless
public class QueryService {
    @PersistenceContext(unitName="DynamicQueries")
    EntityManager em;

    public long queryEmpSalary(String deptName, String empName) {
        String query = "SELECT e.salary " +
                       "FROM Employee e " +
                       "WHERE e.department.name = '" + deptName +
                       "' AND " +
                       "      e.name = '" + empName + "'";
        return em.createQuery(query, Long.class).getSingleResult();
    }
}

The second problem with this example is that it is vulnerable to injection attacks, where a malicious user could pass in a value that alters the query to his advantage. Consider a case where the department argument was fixed by the application but the user was able to specify the employee name (the manager of the department is querying the salaries of his or her employees, for example). If the name argument were actually the text "_UNKNOWN' OR e.name = 'Roberts", the actual query parsed by the query engine would be as follows:

SELECT e.salary
FROM Employee e
WHERE e.department.name = 'NA65' AND
      e.name = '_UNKNOWN' OR
      e.name = 'Roberts'

By introducing the OR condition, the user has effectively given himself access to the salary value for any employee in the company because the original AND condition has a higher precedence than OR, and the fake employee name is unlikely to belong to a real employee in that department.

This type of problem may sound unlikely, but in practice many web applications take text submitted over a GET or POST request and blindly construct queries of this sort without considering side effects. One or two attempts that result in a parser stack trace displayed to the web page, and the attacker will learn everything he needs to know about how to alter the query to his advantage.

Listing 7-2 shows the same method as in Listing 7-1, except that it uses named parameters instead. This not only reduces the number of unique queries parsed by the query engine, but it also eliminates the chance of the query being altered.

Listing 7-2.  Using Parameters with a Dynamic Query

@Stateless
public class QueryService {
    private static final String QUERY =
        "SELECT e.salary " +
        "FROM Employee e " +
        "WHERE e.department.name = :deptName AND " +
        "      e.name = :empName ";

    @PersistenceContext(unitName="QueriesUnit")
    EntityManager em;

    public long queryEmpSalary(String deptName, String empName) {
        return em.createQuery(QUERY, Long.class)
                 .setParameter("deptName", deptName)
                 .setParameter("empName", empName)
                 .getSingleResult();
    }
}

The parameter binding approach shown in Listing 7-2 defeats the security threat described previously because the original query string is never altered. The parameters are marshaled using the JDBC API and handled directly by the database. The text of a parameter string is effectively quoted by the database, so the malicious attack would actually end up producing the following query:

SELECT e.salary
FROM Employee e
WHERE e.department.name = 'NA65' AND
      e.name = '_UNKNOWN'' OR e.name = ''Roberts'

The single quotes used in the query parameter here have been escaped by prefixing them with an additional single quote. This removes any special meaning from them, and the entire sequence is treated as a single string value.

We recommend statically defined named queries in general, particularly for queries that are executed frequently. If dynamic queries are a necessity, take care to use parameter binding instead of concatenating parameter values into query strings in order to minimize the number of distinct query strings parsed by the query engine.

Named Query Definition

Named queries are a powerful tool for organizing query definitions and improving application performance. A named query is defined using the @NamedQuery annotation, which may be placed on the class definition for any entity. The annotation defines the name of the query, as well as the query text. Listing 7-3 shows how the query string used in Listing 7-2 would be declared as a named query.

Listing 7-3.  Defining a Named Query

@NamedQuery(name="findSalaryForNameAndDepartment",
            query="SELECT e.salary " +
                  "FROM Employee e " +
                  "WHERE e.department.name = :deptName AND " +
                  "      e.name = :empName")

Named queries are typically placed on the entity class that most directly corresponds to the query result, so the Employee entity would be a good location for this named query. Note the use of string concatenation in the annotation definition. Formatting queries this way visually aids the readability of the query definition. The garbage objects normally associated with repeated string concatenation are not a concern here because the annotation is processed only once, at startup time, and the query is executed at runtime in query form.

The name of the query is scoped to the entire persistence unit and must be unique within that scope. This is an important restriction to keep in mind, as commonly used query names such as "findAll" will have to be qualified for each entity. A common practice is to prefix the query name with the entity name. For example, the "findAll" query for the Employee entity would be named "Employee.findAll". It is undefined what should happen if two queries in the same persistence unit have the same name, but it is likely that either deployment of the application will fail or one will overwrite the other, leading to unpredictable results at runtime.

If more than one named query is to be defined on a class, they must be placed inside of a @NamedQueries annotation, which accepts an array of one or more @NamedQuery annotations. Listing 7-4 shows the definition of several queries related to the Employee entity. Queries may also be defined (or redefined) using XML. This technique is discussed in Chapter 13.

Listing 7-4.  Multiple Named Queries for an Entity

@NamedQueries({
    @NamedQuery(name="Employee.findAll",
                query="SELECT e FROM Employee e"),
    @NamedQuery(name="Employee.findByPrimaryKey",
                query="SELECT e FROM Employee e WHERE e.id = :id"),
    @NamedQuery(name="Employee.findByName",
                query="SELECT e FROM Employee e WHERE e.name = :name")
})

Because the query string is defined in the annotation, it cannot be altered by the application at runtime. This contributes to the performance of the application and helps to prevent the kind of security issues discussed in the previous section. Due to the static nature of the query string, any additional criteria required for the query must be specified using query parameters. Listing 7-5 demonstrates using the createNamedQuery() call on the EntityManager interface to create and execute a named query that requires a query parameter.

Listing 7-5.  Executing a Named Query

@Stateless
public class EmployeeService {
    @PersistenceContext(unitName="EmployeeService")
    EntityManager em;

    public Employee findEmployeeByName(String name) {
         return em.createNamedQuery("Employee.findByName",
                                    Employee.class)
                  .setParameter("name", name)
                  .getSingleResult();
    }

    // ...
}

Named parameters are the most practical choice for named queries because they effectively self-document the application code that invokes the queries. Positional parameters are still supported, however, and may be used instead.
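For example, given a hypothetical named query Employee.findByDept defined with the query string SELECT e FROM Employee e WHERE e.department.name = ?1, the positional parameter could be bound as follows:

public List<Employee> findEmployeesInDept(String deptName) {
    return em.createNamedQuery("Employee.findByDept", Employee.class)
             .setParameter(1, deptName)     // positional parameter 1
             .getResultList();
}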

Dynamic Named Queries

A hybrid approach is to dynamically create a query and then save it as a named query in the entity manager factory. At that point it becomes just like any other named query that may have been declared statically in metadata. While this may seem like a good compromise, it turns out to be useful in only a few specific cases. The main advantage it offers is for queries that are not known until runtime but are then reissued repeatedly. Once the dynamic query becomes a named query, it will bear the processing cost only once. It is implementation-specific whether that cost is paid when the query is registered as a named query or deferred until the first time it is executed.

A dynamic query can be turned into a named query by using the EntityManagerFactory addNamedQuery() method. Listing 7-6 shows how this is done.

Listing 7-6.  Dynamically Adding a Named Query

@Stateless
public class QueryService {
    private static final String QUERY =
        "SELECT e.salary " +
        "FROM Employee e " +
        "WHERE e.department.name = :deptName AND " +
        "      e.name = :empName ";

    @PersistenceContext(unitName="QueriesUnit")
    EntityManager em;

    @PersistenceUnit(unitName="QueriesUnit")
    EntityManagerFactory emf;

    @PostConstruct
    public void init() {
        TypedQuery<Long> q = em.createQuery(QUERY, Long.class);
        emf.addNamedQuery("findSalaryForNameAndDepartment", q);
    }

    public long queryEmpSalary(String deptName, String empName) {
        return em.createNamedQuery("findSalaryForNameAndDepartment", Long.class)
                 .setParameter("deptName", deptName)
                 .setParameter("empName", empName)
                 .getSingleResult();
    }

    // ...
}

The bean initialization method, annotated with @PostConstruct, creates the dynamic query and adds it to the set of named queries. As mentioned earlier, named queries are scoped to the entire persistence unit, so it makes sense that they be added at the level of the entity manager factory. This also requires some caution when choosing names, since adding a query with the same name as an existing one will simply cause the existing one to be overwritten. Because the entity manager factory was needed in addition to the entity manager, it was injected into the bean instance using @PersistenceUnit. We could just as easily have obtained it from the entity manager using getEntityManagerFactory().
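A sketch of that alternative, reworking the init() method from Listing 7-6 so that the @PersistenceUnit injection is no longer required, might look like this:

@PostConstruct
public void init() {
    TypedQuery<Long> q = em.createQuery(QUERY, Long.class);
    // Obtain the factory from the entity manager rather than injecting it
    em.getEntityManagerFactory()
      .addNamedQuery("findSalaryForNameAndDepartment", q);
}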

Note  The addNamedQuery() method was added in JPA 2.1. If it occurred to you that the example is a little less than useful, you are beginning to get the reason why using this mixed approach is less commonly needed. If you can put the query description in a static string at development time, it’s more appropriate to just use an annotation to define a static named query. The cases where it might be advantageous to make a named query out of a dynamic query are the following:

  • The application gets access to some criteria at runtime that contributes to a query that is predetermined to be commonly executed.
  • A named query is already defined but because of some aspect of the runtime environment you want to override the named query with a different one without using an additional XML descriptor.
  • There is a preference to define all of the queries in code at startup time.

Tip  When a dynamic query is added as a named query, all of the settings on the query at the time it was added will be saved along with it. Each time the named query is executed, the saved settings will apply unless they are overridden at execution time. However, as with all other named queries, any setting made to the named query (i.e., the query returned from createNamedQuery()) at execution time is temporary and applies only to that individual execution.
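As a hypothetical illustration of this behavior, a dynamic query registered with a maximum result setting retains that setting on every execution of the resulting named query unless it is overridden (the query string, name, and emf reference from Listing 7-6 are only examples):

TypedQuery<Employee> q =
    em.createQuery("SELECT e FROM Employee e ORDER BY e.salary DESC",
                   Employee.class);
q.setMaxResults(10);    // this setting is saved with the named query
emf.addNamedQuery("Employee.topEarners", q);

// Later executions return at most 10 results by default...
List<Employee> top10 =
    em.createNamedQuery("Employee.topEarners", Employee.class)
      .getResultList();

// ...unless the setting is overridden for an individual execution
List<Employee> top25 =
    em.createNamedQuery("Employee.topEarners", Employee.class)
      .setMaxResults(25)
      .getResultList();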

Parameter Types

As mentioned earlier, JPA supports both named and positional parameters for JP QL queries. The query factory methods of the entity manager return an implementation of the Query interface. Parameter values are then set on this object using the setParameter() methods of the Query interface.

There are three variations of this method for both named parameters and positional parameters. The first argument is always the parameter name or number. The second argument is the object to be bound to the parameter. Date and Calendar parameters also require a third argument that specifies whether the value passed to JDBC should be treated as a java.sql.Date, java.sql.Time, or java.sql.Timestamp value.

Consider the following named query definition, which requires two named parameters:

@NamedQuery(name="findEmployeesAboveSal",
            query="SELECT e " +
                  "FROM Employee e " +
                  "WHERE e.department = :dept AND " +
                  "      e.salary > :sal")

This query highlights one of the nice features of JP QL in that entity types may be used as parameters. When the query is translated to SQL, the necessary primary key columns will be inserted into the conditional expression and paired with the primary key values from the parameter. It is not necessary to know how the primary key is mapped in order to write the query. Binding the parameters for this query is a simple case of passing in the required Department entity instance as well as a long representing the minimum salary value for the query. Listing 7-7 demonstrates how to bind the entity and primitive parameters required by this query.

Listing 7-7.  Binding Named Parameters

public List<Employee> findEmployeesAboveSal(Department dept, long minSal) {
    return em.createNamedQuery("findEmployeesAboveSal", Employee.class)
             .setParameter("dept", dept)
             .setParameter("sal", minSal)
             .getResultList();
}

Date and Calendar parameters are a special case because they represent both dates and times. In Chapter 4, we discussed mapping temporal types by using the @Temporal annotation and the TemporalType enumeration. This enumeration indicates whether the persistent field is a date, time, or timestamp. When a query uses a Date or Calendar parameter, it must select the appropriate temporal type for the parameter. Listing 7-8 demonstrates binding parameters where the value should be treated as a date.

Listing 7-8.  Binding Date Parameters

public List<Employee> findEmployeesHiredDuringPeriod(Date start, Date end) {
    return em.createQuery("SELECT e " +
                          "FROM Employee e " +
                          "WHERE e.startDate BETWEEN ?1 AND ?2",
                          Employee.class)
             .setParameter(1, start, TemporalType.DATE)
             .setParameter(2, end, TemporalType.DATE)
             .getResultList();
}

One thing to keep in mind with query parameters is that the same parameter can be used multiple times in the query string yet only needs to be bound once using the setParameter() method. For example, consider the following named query definition, where the "dept" parameter is used twice in the WHERE clause:

@NamedQuery(name="findHighestPaidByDepartment",
            query="SELECT e " +
                  "FROM Employee e " +
                  "WHERE e.department = :dept AND " +
                  "      e.salary = (SELECT MAX(e.salary) " +
                  "                  FROM Employee e " +
                  "                  WHERE e.department = :dept)")

To execute this query, the "dept" parameter needs to be set only once with setParameter(), as in the following example:

public Employee findHighestPaidByDepartment(Department dept) {
    return em.createNamedQuery("findHighestPaidByDepartment",
                               Employee.class)
             .setParameter("dept", dept)
             .getSingleResult();
}

Executing Queries

The Query and TypedQuery interfaces each provide three different ways to execute a query, depending on whether or not the query returns results and how many results should be expected. For queries that return values, the developer may choose to call either getSingleResult() if the query is expected to return a single result or getResultList() if more than one result may be returned. The executeUpdate() method is used to invoke bulk update and delete queries. We will discuss this method later in the “Bulk Update and Delete” section. Note that both of the query interfaces define the same set of methods and differ only in their return types. We will cover this issue in the next section.

The simplest form of query execution is via the getResultList() method. It returns a collection containing the query results. If the query did not return any data, the collection is empty. The return type is specified as a List instead of a Collection in order to support queries that specify a sort order. If the query uses the ORDER BY clause to specify a sort order, the results will be put into the result list in the same order. Listing 7-9 demonstrates how a query might be used to generate a menu for a command-line application that displays the name of each employee working on a project as well as the name of the department that the employee is assigned to. The results are sorted by the name of the employee. Queries are unordered by default.

Listing 7-9.  Iterating over Sorted Results

public void displayProjectEmployees(String projectName) {
    List<Employee> result = em.createQuery(
                          "SELECT e " +
                          "FROM Project p JOIN p.employees e "+
                          "WHERE p.name = ?1 " +
                          "ORDER BY e.name",
                          Employee.class)
                    .setParameter(1, projectName)
                    .getResultList();
    int count = 0;
    for (Employee e : result) {
        System.out.println(++count + ": " + e.getName() + ", " +
                           e.getDepartment().getName());
    }
}

The getSingleResult() method is provided as a convenience for queries that return only a single value. Instead of iterating to the first result in a collection, the object is directly returned. It is important to note, however, that getSingleResult() behaves differently from getResultList() in how it handles unexpected results. Whereas getResultList() returns an empty collection when no results are available, getSingleResult() throws a NoResultException exception. Therefore if there is a chance that the desired result may not be found, then this exception needs to be handled.

If multiple results are available after executing the query instead of the single expected result, getSingleResult() will throw a NonUniqueResultException exception. Again, this can be problematic for application code if the query criteria may result in more than one row being returned in certain circumstances. Although getSingleResult() is convenient to use, be sure that the query and its possible results are well understood; otherwise application code may have to deal with an unexpected runtime exception. Unlike other exceptions thrown by entity manager operations, these exceptions will not cause the provider to roll back the current transaction, if there is one.
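For example, when the result may legitimately be absent, a common approach is to catch the exception and return null, as in the following sketch that reworks the method from Listing 7-5:

public Employee findEmployeeByName(String name) {
    try {
        return em.createNamedQuery("Employee.findByName", Employee.class)
                 .setParameter("name", name)
                 .getSingleResult();
    } catch (NoResultException e) {
        // no employee with that name exists
        return null;
    }
}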

Any SELECT query that returns data via the getResultList() and getSingleResult() methods may also specify locking constraints for the database rows impacted by the query. This facility is exposed through the query interfaces via the setLockMode() method. We will defer discussion of the locking semantics for queries until the full discussion of locking in Chapter 12.

Query and TypedQuery objects may be reused as often as needed so long as the same persistence context that was used to create the query is still active. For transaction-scoped entity managers, this limits the lifetime of the Query or TypedQuery object to the life of the transaction. Other entity manager types may reuse them until the entity manager is closed or removed.

Listing 7-10 demonstrates caching a TypedQuery object instance on the bean class of a stateful session bean that uses an extended persistence context. Whenever the bean needs to find the list of employees who are currently not assigned to any project, it reuses the same unassignedQuery object that was initialized during PostConstruct.

Listing 7-10.  Reusing a Query Object

@Stateful
public class ProjectManager {
    @PersistenceContext(unitName="EmployeeService",
                        type=PersistenceContextType.EXTENDED)
    EntityManager em;

    TypedQuery<Employee> unassignedQuery;

    @PostConstruct
    public void init() {
        unassignedQuery =
            em.createQuery("SELECT e " +
                           "FROM Employee e " +
                           "WHERE e.projects IS EMPTY",
                           Employee.class);
    }

    public List<Employee> findEmployeesWithoutProjects() {
        return unassignedQuery.getResultList();
    }

    // ...
}

Working with Query Results

The result type of a query is determined by the expressions listed in the SELECT clause of the query. If the result type of a query is the Employee entity, then executing getResultList() will result in a collection of zero or more Employee entity instances. There is a wide variety of results possible, depending on the makeup of the query. The following are just some of the types that may result from JP QL queries:

  • Basic types, such as String, the primitive types, and JDBC types
  • Entity types
  • An array of Object
  • User-defined types created from a constructor expression

For developers used to JDBC, the most important thing to remember when using the Query and TypedQuery interfaces is that the results are not encapsulated in a JDBC ResultSet. The collection or single result corresponds directly to the result type of the query.

Whenever an entity instance is returned, it becomes managed by the active persistence context. If that entity instance is modified and the persistence context is part of a transaction, the changes will be persisted to the database. The only exception to this rule is the use of transaction-scoped entity managers outside of a transaction. Any query executed in this situation returns detached entity instances instead of managed entity instances. To make changes on these detached entities, they must first be merged into a persistence context before they can be synchronized with the database.

A consequence of the long-term management of entities with application-managed and extended persistence contexts is that executing large queries will cause the persistence context to grow as it stores all the managed entity instances that are returned. If many of these persistence contexts are holding onto large numbers of managed entities for long periods of time, then memory use may become a concern. The clear() method of the EntityManager interface may be used to clear application-managed and extended persistence contexts, removing unnecessary managed entities.
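A minimal sketch of this technique, processing a large result set in an application-managed or extended persistence context and then releasing the managed instances, might look like the following:

List<Employee> results =
    em.createQuery("SELECT e FROM Employee e", Employee.class)
      .getResultList();
for (Employee emp : results) {
    // ... read-only processing of each employee ...
}
// Detach all managed instances; note that any unflushed changes are lost
em.clear();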

Untyped Results

So far in this chapter we have been demonstrating the strongly typed versions of the query creation methods. We have provided the expected result type and therefore received an instance of TypedQuery that is bound to the expected type. By qualifying the result type in this way, the getResultList() and getSingleResult() methods return the correct types without the need for casting.

In the event that the result type is Object, or the JP QL query selects multiple objects, you may use the untyped versions of the query creation methods. Omitting the result type produces a Query instance instead of a TypedQuery instance, which defines getResultList() to return an unbound List and getSingleResult() to return Object. For an example of using untyped results see the code listings in the “Special Result Types” section.

Optimizing Read-Only Queries

When the result type of a query is an entity and the results will not be modified, queries executed with a transaction-scoped entity manager outside of a transaction can be more efficient than the same queries executed within a transaction. When query results are prepared within a transaction, the persistence provider has to take steps to convert the results into managed entities. This usually entails taking a snapshot of the data for each entity in order to have a baseline to compare against when the transaction is committed. If the managed entities are never modified, the effort of converting the results into managed entities is wasted.

Outside of a transaction, in some circumstances the persistence provider may be able to optimize for the case in which the results will be detached immediately, and therefore avoid the overhead of creating the managed versions. Note that this technique does not work on application-managed or extended entity managers because their persistence context outlives the transaction. Any query result from this type of persistence context may be modified for later synchronization to the database even if there is no transaction.

When encapsulating query operations behind a bean with container-managed transactions, the easiest way to execute nontransactional queries is to use the NOT_SUPPORTED transaction attribute for the session bean method. This will cause any active transaction to be suspended, forcing the query results to be detached and enabling this optimization. Listing 7-11 shows an example of this technique using a stateless session bean.

Listing 7-11.  Executing a Query Outside of a Transaction

@Stateless
public class QueryService {
    @PersistenceContext(unitName="EmployeeService")
    EntityManager em;

    @TransactionAttribute(TransactionAttributeType.NOT_SUPPORTED)
    public List<Department> findAllDepartmentsDetached() {
        return em.createQuery("SELECT d FROM Department d",
                              Department.class)
                 .getResultList();
    }

    // ...
}

Note  This optimization is completely provider-specific. Some providers may instead opt to create a temporary persistence context for the query and just throw it away after extracting the results from it, making this suggested optimization rather extraneous. Check your provider before making a coding decision.

Special Result Types

Whenever a query involves more than one expression in the SELECT clause, the result of the query will be a List of Object arrays. Common examples include projection of entity fields and aggregate queries where grouping expressions or multiple functions are used. Listing 7-12 revisits the menu generator from Listing 7-9 using a projection query instead of returning full Employee entity instances. Each element of the List is cast to an array of Object that is then used to extract the employee and department name information. We use an untyped query because the result has multiple elements in it.

Listing 7-12.  Handling Multiple Result Types

public void displayProjectEmployees(String projectName) {
    List result = em.createQuery(
                            "SELECT e.name, e.department.name " +
                            "FROM Project p JOIN p.employees e " +
                            "WHERE p.name = ?1 " +
                            "ORDER BY e.name")
                    .setParameter(1, projectName)
                    .getResultList();
    int count = 0;
    for (Iterator i = result.iterator(); i.hasNext();) {
        Object[] values = (Object[]) i.next();
        System.out.println(++count + ": " +
                           values[0] + ", " + values[1]);
    }
}

Constructor expressions provide developers with a way to map an array of Object result types to custom objects. Typically this is used to convert the results into JavaBean-style classes that provide getters for the different returned values. This makes the results easier to work with and makes it possible to use the results directly in an environment such as JavaServer Faces without additional translation.

A constructor expression is defined in JP QL using the NEW operator in the SELECT clause. The argument to the NEW operator is the fully qualified name of the class that will be instantiated to hold the results for each row of data returned. The only requirement for this class is that it has a constructor with arguments matching the exact type and order that will be specified in the query. Listing 7-13 shows an EmpMenu class defined in the package example that could be used to hold the results of the query that was executed in Listing 7-12.

Listing 7-13.  Defining a Class for Use in a Constructor Expression

package example;

public class EmpMenu {
    private String employeeName;
    private String departmentName;

    public EmpMenu(String employeeName, String departmentName) {
        this.employeeName = employeeName;
        this.departmentName = departmentName;
    }

    public String getEmployeeName() { return employeeName; }
    public String getDepartmentName() { return departmentName; }
}

Listing 7-14 shows the same example as Listing 7-12 using the fully qualified EmpMenu class name in a constructor expression. Instead of working with array indexes, each result is an instance of the EmpMenu class and is used like a regular Java object. We can also use typed queries again because there is only one expression in the SELECT clause.

Listing 7-14.  Using Constructor Expressions

public void displayProjectEmployees(String projectName) {
    List<EmpMenu> result =
        em.createQuery("SELECT NEW example.EmpMenu(" +
                       "e.name, e.department.name) " +
                       "FROM Project p JOIN p.employees e " +
                       "WHERE p.name = ?1 " +
                       "ORDER BY e.name",
                       EmpMenu.class)
          .setParameter(1, projectName)
          .getResultList();
    int count = 0;
    for (EmpMenu menu : result) {
        System.out.println(++count + ": " +
                           menu.getEmployeeName() + ", " +
                           menu.getDepartmentName());
    }
}

Query Paging

Large result sets from queries are often a problem for many applications. In cases where it would be overwhelming to display the entire result set, or if the application medium makes displaying many rows inefficient (web applications, in particular), applications must be able to display ranges of a result set and provide users with the ability to control the range of data that they are viewing. The most common form of this technique is to present the user with a fixed-size table that acts as a sliding window over the result set. Each increment of results displayed is called a page, and the process of navigating through the results is called pagination.

Efficiently paging through result sets has long been a challenge for both application developers and database vendors. Before support existed at the database level, a common technique was to first retrieve all the primary keys for the result set and then issue separate queries for the full results using ranges of primary key values. Later, database vendors added the concept of logical row number to query results, guaranteeing that as long as the result was ordered, the row number could be relied on to retrieve portions of the result set. More recently, the JDBC specification has taken this even further with the concept of scrollable result sets, which can be navigated forward and backward as required.

The Query and TypedQuery interfaces provide support for pagination via the setFirstResult() and setMaxResults() methods. These methods specify the first result to be received (numbered from zero) and the maximum number of results to return relative to that point. Values set for these methods may be likewise retrieved via the getFirstResult() and getMaxResults() methods. A persistence provider may choose to implement support for this feature in a number of different ways because not all databases benefit from the same approach. It’s a good idea to become familiar with the way your vendor approaches pagination and what level of support exists in the target database platform for your application.

Caution  The setFirstResult() and setMaxResults() methods should not be used with queries that join across collection relationships (one-to-many and many-to-many) because these queries may return duplicate values. The duplicate values in the result set make it impossible to use a logical result position.

To better illustrate pagination support, consider the stateful session bean shown in Listing 7-15. Once created, it is initialized with the name of a query to count the total results and the name of a query to generate the report. When results are requested, it uses the page size and current page number to calculate the correct arguments for the setFirstResult() and setMaxResults() methods. The total number of results possible is calculated by executing the count query. By using the next(), previous(), and getCurrentResults() methods, presentation code can page through the results as required. If this bean were bound into an HTTP session, it could be used directly by a JSP or JavaServer Faces page presenting the results in a data table. The class in Listing 7-15 is a general template for a stateful session bean that holds the intermediate state of an application query so that its results can be processed in segments.

Listing 7-15.  Stateful Report Pager

@Stateful
public class ResultPager {
    @PersistenceContext(unitName="QueryPaging")
    private EntityManager em;

    private String reportQueryName;
    private long currentPage;
    private long maxResults;
    private long pageSize;
    
    public long getPageSize() {
        return pageSize;
    }

    public long getMaxPages() {
        return maxResults / pageSize;
    }

    public void init(long pageSize, String countQueryName,
                     String reportQueryName) {
        this.pageSize = pageSize;
        this.reportQueryName = reportQueryName;
        maxResults = em.createNamedQuery(countQueryName, Long.class)
                              .getSingleResult();
        currentPage = 0;
    }

    public List getCurrentResults() {
        return em.createNamedQuery(reportQueryName)
                 // setFirstResult() and setMaxResults() take int arguments
                 .setFirstResult((int) (currentPage * pageSize))
                 .setMaxResults((int) pageSize)
                 .getResultList();
    }

    public void next() {
        currentPage++;
    }

    public void previous() {
        currentPage--;
        if (currentPage < 0) {
            currentPage = 0;
        }
    }

    public long getCurrentPage() {
        return currentPage;
    }

    public void setCurrentPage(long currentPage) {
        this.currentPage = currentPage;
    }

    @Remove
    public void finished() {}
}

Queries and Uncommitted Changes

Executing queries against entities that have been created or changed in a transaction is a topic that requires special consideration. As we discussed in Chapter 6, the persistence provider will attempt to minimize the number of times the persistence context must be flushed within a transaction. Optimally this will occur only once, when the transaction commits. While the transaction is open and changes are being made, the provider relies on its own internal cache synchronization to ensure that the right version of each entity is used in entity manager operations. At most the provider may have to read new data from the database in order to fulfill a request. All entity operations other than queries can be satisfied without flushing the persistence context to the database.

Queries are a special case because they are executed directly as SQL against the database. Because the database executes the query and not the persistence provider, the active persistence context cannot usually be consulted by the query. As a result, if the persistence context has not been flushed and the database query would be impacted by the changes pending in the persistence context, incorrect data is likely to be retrieved from the query. The entity manager find() operation, on the other hand, queries for a single entity with a given primary key. It can always check the persistence context before going to the database, so incorrect data is not a concern.

The good news is that by default, the persistence provider will ensure that queries are able to incorporate pending transactional changes in the query result. It might accomplish this by flushing the persistence context to the database, or it might leverage its own runtime information to ensure the results are correct.

And yet, there are times when having the persistence provider ensure query integrity is not necessarily the behavior you need. The problem is that it is not always easy for the provider to determine the best strategy to accommodate the integrity needs of a query. There is no practical way the provider can logically determine at a fine-grained level which objects have changed and therefore need to be incorporated into the query results. If the provider solution to ensuring query integrity is to flush the persistence context to the database, then you might have a performance problem if this is a frequent occurrence.

To put this issue in context, consider a message board application, which has modeled conversation topics as Conversation entities. Each Conversation entity refers to one or more messages represented by a Message entity. Periodically, conversations are archived when the last message added to the conversation is more than 30 days old. This is accomplished by changing the status of the Conversation entity from ACTIVE to INACTIVE. The two queries to obtain the list of active conversations and the last message date for a given conversation are shown in Listing 7-16.

Listing 7-16.  Conversation Queries

@NamedQueries({
    @NamedQuery(name="findActiveConversations",
                query="SELECT c " +
                      "FROM Conversation c " +
                      "WHERE c.status = 'ACTIVE'"),
    @NamedQuery(name="findLastMessageDate",
                query="SELECT MAX(m.postingDate) " +
                      "FROM Conversation c JOIN c.messages m " +
                      "WHERE c = :conversation")
})

Listing 7-17 shows the method used to perform this maintenance, accepting a Date argument that specifies the minimum age for messages in order to still be considered an active conversation. In this example, two queries are being executed. The "findActiveConversations" query collects all the active conversations, while the "findLastMessageDate" returns the last date that a message was added to a Conversation entity. As the code iterates over the Conversation entities, it invokes the "findLastMessageDate" query for each one. As these two queries are related, it is reasonable for a persistence provider to assume that the results of the "findLastMessageDate" query will depend on the changes being made to the Conversation entities. If the provider ensures the integrity of the "findLastMessageDate" query by flushing the persistence context, this could become a very expensive operation if hundreds of active conversations are being checked.

Listing 7-17.  Archiving Conversation Entities

public void archiveConversations(Date minAge) {
    List<Conversation> active =
        em.createNamedQuery("findActiveConversations",
                            Conversation.class)
          .getResultList();
    TypedQuery<Date> maxAge =
        em.createNamedQuery("findLastMessageDate", Date.class);
    for (Conversation c : active) {
        maxAge.setParameter("conversation", c);
        Date lastMessageDate = maxAge.getSingleResult();
        if (lastMessageDate.before(minAge)) {
            c.setStatus("INACTIVE");
        }
    }
}

To offer more control over the integrity requirements of queries, the EntityManager and Query interfaces support a setFlushMode() method to set the flush mode, an indicator to the provider how it should handle pending changes and queries. There are two possible flush mode settings, AUTO and COMMIT, which are defined by the FlushModeType enumerated type. The default setting is AUTO, which means that the provider should ensure that pending transactional changes are included in query results. If a query might overlap with changed data in the persistence context, this setting will ensure that the results are correct. The current flush mode setting may be retrieved via the getFlushMode() method.

The COMMIT flush mode tells the provider that queries don’t overlap with changed data in the persistence context, so it does not need to do anything in order to get correct results. Depending on how the provider implements its query integrity support, this might mean that it does not have to flush the persistence context before executing a query because you have indicated that there is no changed data in memory that would affect the results of the database query.

Although the flush mode is set on the entity manager, the flush mode is really a property of the persistence context. For transaction-scoped entity managers, that means the flush mode has to be changed in every transaction. Extended and application-managed entity managers will preserve their flush mode setting across transactions. Flush mode will not apply at all to persistence contexts that are not synchronized with the transaction.

Setting the flush mode on the entity manager applies to all queries, while setting the flush mode for a query limits the setting to that scope. Setting the flush mode on the query overrides the entity manager setting, as you would expect. If the entity manager setting is AUTO and one query has the COMMIT setting, the provider will guarantee query integrity for all the queries other than the one with the COMMIT setting. Likewise, if the entity manager setting is COMMIT and one query has an AUTO setting, only the query with the AUTO setting is guaranteed to incorporate pending changes from the persistence context.
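For example, a brief sketch of the two scopes (the query string is arbitrary) might look like this:

// Applies to every query executed through this entity manager
em.setFlushMode(FlushModeType.COMMIT);

// Applies only to this query, overriding the entity manager setting
TypedQuery<Employee> query =
    em.createQuery("SELECT e FROM Employee e", Employee.class);
query.setFlushMode(FlushModeType.AUTO);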

Generally speaking, if you are going to execute queries in transactions where data is being changed, AUTO is the right answer. If you are concerned about the performance implications of ensuring query integrity, consider changing the flush mode to COMMIT on a per-query basis. Changing the value on the entity manager, while convenient, can lead to problems if more queries are added to the application later and they require AUTO semantics.

Coming back to the example at the start of this section, we can set the flush mode on the TypedQuery object for the "findLastMessageDate" query to COMMIT because it does not need to see the changes being made to the Conversation entities. The following fragment shows how this would be accomplished for the archiveConversations() method shown in Listing 7-17:

public void archiveConversations(Date minAge) {
    // ...
    TypedQuery<Date> maxAge = em.createNamedQuery(
                                   "findLastMessageDate", Date.class);
    maxAge.setFlushMode(FlushModeType.COMMIT);
    // ...
}

Query Timeouts

Generally speaking, when a query executes it will block until the database query returns. In addition to the obvious concern about runaway queries and application responsiveness, it may also be a problem if the query is participating in a transaction and a timeout has been set on the JTA transaction or on the database. The timeout on the transaction or database may cause the query to abort early, but it will also cause the transaction to roll back, preventing any further work in the same transaction.

If an application needs to set a limit on query response time without using a transaction or causing a transaction rollback, the javax.persistence.query.timeout property may be set on the query or as part of the persistence unit. This property defines the number of milliseconds that the query should be allowed to run before it is aborted. Listing 7-18 demonstrates how to set a timeout value for a given query. This example uses the query hint mechanism, which we will discuss in more detail later in the “Query Hints” section. Setting properties on the persistence unit is covered in Chapter 14.

Listing 7-18.  Setting a Query Timeout

public Date getLastUserActivity() {
    TypedQuery<Date> lastActive =
        em.createNamedQuery("findLastUserActivity", Date.class);
    lastActive.setHint("javax.persistence.query.timeout", 5000);
    try {
        return lastActive.getSingleResult();
    } catch (QueryTimeoutException e) {
        return null;
    }
}

Unfortunately, setting a query timeout is not portable behavior. It may not be supported by all database platforms, and persistence providers are not required to support it. Therefore, applications that want to enable query timeouts must be prepared for three scenarios. The first is that the property is silently ignored and has no effect. The second is that the property is enabled and any select, update, or delete operation that runs longer than the specified timeout value is aborted, and a QueryTimeoutException is thrown. This exception may be handled and will not cause any active transaction to be marked for rollback. Listing 7-18 demonstrates one approach to handling this exception. The third scenario is that the property is enabled, but in doing so the database forces a transaction rollback when the timeout is exceeded. In this case, a PersistenceException will be thrown and the transaction marked for rollback. In general, if the timeout is enabled, the application should be written to handle the QueryTimeoutException, but it should not fail if the timeout is exceeded and the exception is not thrown.
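If the application must tolerate all three scenarios, the calling code can be written defensively. The following sketch reuses the query from Listing 7-18 and handles both exception types; whether the PersistenceException path is ever taken depends on the provider and database:

public Date getLastUserActivitySafely() {
    TypedQuery<Date> lastActive =
        em.createNamedQuery("findLastUserActivity", Date.class);
    lastActive.setHint("javax.persistence.query.timeout", 5000);
    try {
        return lastActive.getSingleResult();
    } catch (QueryTimeoutException e) {
        // timeout enforced without a rollback
        return null;
    } catch (PersistenceException e) {
        // timeout caused a rollback; the transaction is now marked rollback-only
        return null;
    }
}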

Bulk Update and Delete

Like their SQL counterparts, JP QL bulk UPDATE and DELETE statements are designed to make changes to large numbers of entities in a single operation without requiring the individual entities to be retrieved and modified using the entity manager. Unlike SQL, which operates on tables, JP QL UPDATE and DELETE statements must take the full range of mappings for the entity into account. These operations are challenging for vendors to implement correctly, and as a result, there are restrictions on the use of these operations that must be well understood by developers.

The full syntax for UPDATE and DELETE statements is described in Chapter 8. The following sections will describe how to use these operations effectively and the issues that may result when used incorrectly.

Using Bulk Update and Delete

Bulk update of entities is accomplished with the UPDATE statement. This statement operates on a single entity type and sets one or more single-valued properties of the entity (either a state field or a single-valued association) subject to the conditions in the WHERE clause. It can also be used on embedded state in one or more embeddable objects referenced by the entity. In terms of syntax, it is nearly identical to the SQL version with the exception of using entity expressions instead of tables and columns. Listing 7-19 demonstrates using a bulk UPDATE statement. Note that the use of the REQUIRES_NEW transaction attribute type is significant and will be discussed following the examples.

Listing 7-19.  Bulk Update of Entities

@Stateless
public class EmployeeService {
    @PersistenceContext(unitName="BulkQueries")
    EntityManager em;

    @TransactionAttribute(TransactionAttributeType.REQUIRES_NEW)
    public void assignManager(Department dept, Employee manager) {
         em.createQuery("UPDATE Employee e " +
                        "SET e.manager = ?1 " +
                        "WHERE e.department = ?2")
           .setParameter(1, manager)
           .setParameter(2, dept)
           .executeUpdate();
    }
}

Bulk removal of entities is accomplished with the DELETE statement. Again, the syntax is the same as the SQL version, except that the target in the FROM clause is an entity instead of a table, and the WHERE clause is composed of entity expressions instead of column expressions. Listing 7-20 demonstrates bulk removal of entities.

Listing 7-20.  Bulk Removal of Entities

@Stateless
public class ProjectService {
    @PersistenceContext(unitName="BulkQueries")
    EntityManager em;

    @TransactionAttribute(TransactionAttributeType.REQUIRES_NEW)
    public void removeEmptyProjects() {
         em.createQuery("DELETE FROM Project p " +
                        "WHERE p.employees IS EMPTY")
           .executeUpdate();
    }
}

The first issue to consider when using these statements is that the persistence context is not updated to reflect the results of the operation. Bulk operations are issued as SQL against the database, bypassing the in-memory structures of the persistence context. Therefore, updating the salary of all the employees will not change the current values for any entities managed in memory as part of a persistence context. The developer can rely only on entities retrieved after the bulk operation completes.
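A short sketch illustrates the point; the entity identifier and the salary change are purely illustrative:

Employee emp = em.find(Employee.class, 42);    // emp is managed; its state is in memory
em.createQuery("UPDATE Employee e SET e.salary = e.salary + 1000")
  .executeUpdate();                            // issued as SQL, bypassing the persistence context
// emp still reports the old salary; refresh it to resynchronize with the database
em.refresh(emp);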

When using transaction-scoped persistence contexts the bulk operation should either execute in a transaction all by itself or be the first operation in the transaction. Running the bulk operation in its own transaction is the preferred approach because it minimizes the chance of accidentally fetching data before the bulk change occurs. Executing the bulk operation and then working with entities after it completes is also safe because then any find() operation or query will go to the database to get current results. The examples in Listing 7-19 and Listing 7-20 used the REQUIRES_NEW transaction attribute to ensure that the bulk operations occurred within their own transactions.

A typical strategy for persistence providers dealing with bulk operations is to invalidate any in-memory cache of data related to the target entity. This forces data to be fetched from the database the next time it is required. How much cached data gets invalidated depends on the sophistication of the persistence provider. If the provider can detect that the update impacts only a small range of entities, those specific entities may be invalidated, leaving other cached data in place. Such optimizations are limited, however, and if the provider cannot be sure of the scope of the change, the entire cache must be invalidated. This can have an impact on the performance of the application if bulk changes are a frequent occurrence.

image Caution  Native SQL update and delete operations should not be executed on tables mapped by an entity. The JP QL operations tell the provider what cached entity state must be invalidated in order to remain consistent with the database. Native SQL operations bypass such checks and can quickly lead to situations where the in-memory cache is out of date with respect to the database.

The danger present in bulk operations and the reason they must occur first in a transaction is that any entity actively managed by a persistence context will remain that way, oblivious to the actual changes occurring at the database level. The active persistence context is separate and distinct from any data cache that the provider may use for optimizations. Consider the following sequence of operations:

  1. A new transaction starts.
  2. Entity A is created by calling persist() to make the entity managed.
  3. Entity B is retrieved from a find() operation and modified.
  4. A bulk remove deletes entity A.
  5. A bulk update changes the same properties on entity B that were modified in step 3.
  6. The transaction commits.

What should happen to entities A and B in this sequence? (Before you answer, recall that bulk operations translate directly to SQL and bypass the persistence context!) In the case of entity A, the provider has to assume that the persistence context is correct and so will still attempt to insert the new entity even though it should have been removed. In the case of entity B, again the provider has to assume that the managed version is the correct version and will attempt to update the version in the database, undoing the bulk update change.

This brings us to the issue of extended persistence contexts. Bulk operations and extended persistence contexts are a particularly dangerous combination because the persistence context survives across transaction boundaries, but the provider will never refresh the persistence context to reflect the changed state of the database after a bulk operation has completed. When the extended persistence context is next associated with a transaction, it will attempt to synchronize its current state with the database. Because the managed entities in the persistence context are now out of date with respect to the database, any changes made since the bulk operation could result in incorrect results being stored. In this situation, the only option is to refresh the entity state or ensure that the data is versioned in such a way that the incorrect change can be detected. Locking strategies and refreshing of entity state are discussed in Chapter 12.

Bulk Delete and Relationships

In our discussion of the remove() operation in the previous chapter, we emphasized that relationship maintenance is always the responsibility of the developer. The only time a cascading remove occurs is when the REMOVE cascade option is set for a relationship. Even then, the persistence provider won’t automatically update the state of any managed entities that refer to the removed entity. As you are about to see, the same requirement holds true when using DELETE statements as well.

A DELETE statement in JP QL corresponds more or less to a DELETE statement in SQL. Writing the statement in JP QL gives you the benefit of working with entities instead of tables, but the semantics are exactly the same. This has implications for how applications must write DELETE statements in order to ensure that they execute correctly and leave the database in a consistent state.

DELETE statements are applied to a set of entities in the database, unlike remove(), which applies to a single entity in the persistence context. A consequence of this is that DELETE statements do not cascade to related entities. Even if the REMOVE cascade option is set on a relationship, it will not be followed. It is your responsibility to ensure that relationships are correctly updated with respect to the entities that have been removed. The persistence provider also has no control over constraints in the database. If you attempt to remove data that is the target of a foreign key relationship in another table, you may get a referential integrity constraint violation in return.

Let’s look at an example that puts these issues in context. Suppose, for example, that a company wishes to reorganize its department structure. It wants to delete a number of departments and then assign the employees to new departments. The first step is to delete the old departments, so the following statement is to be executed:

DELETE FROM Department d
WHERE d.name IN ('CA13', 'CA19', 'NY30')

This is a straightforward operation. We want to remove the department entities that match the given list of names using a DELETE statement instead of querying for the entities and using the remove() operation to dispose of them. But when this query is executed, a PersistenceException exception is thrown, reporting that a foreign key integrity constraint has been violated. Another table has a foreign key reference to one of the rows we are trying to delete. Checking the database, we see that the table mapped by the Employee entity has a foreign key constraint against the table mapped by the Department entity. Because the foreign key value in the Employee table is not NULL, the parent key from the Department table can’t be removed.

We first need to update the Employee entities in question to make sure they do not point to the department we are trying to delete:

UPDATE Employee e
SET e.department = null
WHERE e.department.name IN ('CA13', 'CA19', 'NY30')

With this change the original DELETE statement will work as expected. Now consider what would have happened if the integrity constraint had not been in the database. The DELETE operation would have completed successfully, but the foreign key values would still be sitting in the Employee table. The next time the persistence provider tried to load the Employee entities with dangling foreign keys, it would be unable to resolve the target entity. The outcome of this operation is vendor-specific, but it may lead to a PersistenceException exception being thrown, complaining of the invalid relationship.
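Putting the two statements together, a single service method can perform the relationship cleanup and the removal in one isolated transaction. The following sketch follows the pattern of Listings 7-19 and 7-20; the method name and parameter are illustrative:

@TransactionAttribute(TransactionAttributeType.REQUIRES_NEW)
public void removeDepartments(List<String> deptNames) {
    em.createQuery("UPDATE Employee e SET e.department = NULL " +
                   "WHERE e.department.name IN :names")
      .setParameter("names", deptNames)
      .executeUpdate();
    em.createQuery("DELETE FROM Department d WHERE d.name IN :names")
      .setParameter("names", deptNames)
      .executeUpdate();
}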

Query Hints

Query hints are the JPA extension point for query features. A hint is simply a string name and object value. Hints allow features to be added to JPA without introducing a new API. This includes standard features such as the query timeouts demonstrated earlier, as well as vendor-specific features. Note that when not explicitly covered by the JPA specification, no assumptions can be made about the portability of hints between vendors, even if the names are the same. Every query may be associated with any number of hints, set either in persistence unit metadata as part of the @NamedQuery annotation, or on the Query or TypedQuery interfaces using the setHint() method. The current set of hints enabled for a query may be retrieved with the getHints() method, which returns a map of name and value pairs.

In order to simplify portability between vendors, persistence providers are required to ignore hints that they do not understand. Listing 7-21 demonstrates the "eclipselink.cache-usage" hint supported by the JPA Reference Implementation to indicate that the cache should not be checked when reading an Employee from the database. Unlike the refresh() method of the EntityManager interface, this hint will not cause the query result to overwrite the current cached value.

Listing 7-21.  Using Query Hints

public Employee findEmployeeNoCache(int empId) {
    TypedQuery<Employee> q = em.createQuery(
        "SELECT e FROM Employee e WHERE e.id = :empId", Employee.class);
    // force read from database
    q.setHint("eclipselink.cache-usage", "DoNotCheckCache");
    q.setParameter("empId", empId);
    try {
        return q.getSingleResult();
    } catch (NoResultException e) {
        return null;
    }
}

If this query were to be executed frequently, a named query would be more efficient. The following named query definition incorporates the cache hint used earlier:

@NamedQuery(name="findEmployeeNoCache",
            query="SELECT e FROM Employee e WHERE e.id = :empId",
            hints={@QueryHint(name="eclipselink.cache-usage",
                              value="DoNotCheckCache")})

The hints element accepts an array of @QueryHint annotations, allowing any number of hints to be set for a query. However, a limitation of using annotations for the named query is that hints are restricted to having values that are strings, whereas when using the Query.setHint() method any kind of object can be passed in as a hint value. This may be particularly relevant when using proprietary vendor hints. This also represents another use case that could be added to the list in the “Dynamic Named Queries” section.

Query Best Practices

The typical application using JPA will have many queries defined. It is the nature of enterprise applications that information is constantly being queried from the database for everything from complex reports to drop-down lists in the user interface. Therefore, efficiently using queries can have a major impact on your application’s overall performance and responsiveness. As you carry out the performance testing of your queries, we recommend you consider some of the discussion points in the following sections.

Named Queries

First and foremost, we recommend named queries whenever possible. Persistence providers will often take steps to precompile JP QL named queries to SQL as part of the deployment or initialization phase of an application. This avoids the overhead of continuously parsing JP QL and generating SQL. Even with a cache for converted queries, dynamic query definition will always be less efficient than using named queries.

Named queries also enforce the best practice of using query parameters. Query parameters help to keep the number of distinct SQL strings parsed by the database to a minimum. Because databases typically keep a cache of SQL statements on hand for frequently accessed queries, this is an essential part of ensuring peak database performance.

As we discussed in the “Dynamic Query Definition” section, query parameters also help to avoid security issues caused by concatenating values into query strings. For applications exposed to the Web, security has to be a concern at every level of an application. You can either spend a lot of effort trying to validate input parameters, or you can use query parameters and let the database do the work for you.
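As a simple illustration, the parameterized form below lets the database treat the user-supplied value purely as data; the variable name is illustrative:

TypedQuery<Employee> q = em.createQuery(
    "SELECT e FROM Employee e WHERE e.name = :name", Employee.class);
q.setParameter("name", userSuppliedName);   // bound as a parameter, never concatenated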

When naming queries, decide on a naming strategy early in the application development cycle, with the understanding that the query namespace is global for each persistence unit. Collisions between query names are likely to be a common source of frustration if there is no established naming pattern. We recommend prefixing the name of the query with the name of the entity that is being returned, separated by a dot.
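For example, under that convention a named query returning Employee entities might be declared as follows; the query itself is illustrative:

@NamedQuery(name="Employee.findByDepartment",
            query="SELECT e FROM Employee e WHERE e.department = :dept")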

Using named queries allows for JP QL queries to be overridden with SQL queries or even with vendor-specific languages and expression frameworks. For applications migrating from an existing object-relational mapping solution, it is quite likely that the vendor will provide some support for invoking their existing query solution using the named query facility in JPA. We will discuss SQL named queries in Chapter 11.

The ability to dynamically create named queries in code may be useful if any of the cases we described earlier applies to you, or if you happen to have some other use case that makes dynamic creation relevant. In general, though, it is preferable and safer if you can declare all of your named queries statically.

Report Queries

If you are executing queries that return entities for reporting purposes and have no intention of modifying the results, consider executing queries using a transaction-scoped entity manager but outside of a transaction. The persistence provider may be able to detect the lack of a transaction and optimize the results for detachment, often by skipping some of the steps required to create an interim managed version of the entity results.
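In a stateless session bean, one way to achieve this is to suspend the transaction for the report method, as in the following sketch; the named query is assumed to exist:

@TransactionAttribute(TransactionAttributeType.NOT_SUPPORTED)
public List<Employee> findEmployeesForReport() {
    // no transaction is active, so the provider may optimize the results for detachment
    return em.createNamedQuery("Employee.findAll", Employee.class)
             .getResultList();
}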

Likewise, if an entity is expensive to construct due to eager relationships or a complex table mapping, consider selecting individual entity properties using a projection query instead of retrieving the full entity result. If all you need is the name and office phone number for 500 employees, selecting only those two fields is likely to be far more efficient than fully constructing 500 entity instances.
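A projection query for that case might look like the following sketch; the phone number field is assumed to exist on the Employee entity:

TypedQuery<Object[]> q = em.createQuery(
    "SELECT e.name, e.phone FROM Employee e", Object[].class);
List<Object[]> rows = q.getResultList();   // each row holds only the name and phone number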

Vendor Hints

It is likely that vendors will entice you with a variety of hints to enable different performance optimizations for queries. Query hints may well be an essential tool in meeting your performance expectations. If source code portability to multiple vendors is important, you should resist the urge to embed vendor query hints in your application code. The ideal location for query hints is in an XML mapping file (which we will be describing in Chapter 13) or at the very least as part of a named query definition. Hints are often highly dependent on the target platform and may well have to be changed over time as different aspects of the application impact the overall balance of performance. Keep hints decoupled from your code if at all possible.

Stateless Beans

We have tried to demonstrate many of the examples in the context of a stateless bean because we believe that to be the best way to organize queries in a Java EE application. Using any kind of stateless bean, be it a stateless session bean, a dependent scoped CDI bean, or a prototype scoped Spring bean, has a number of benefits over simply embedding queries all over the place in application code.

  • Clients can execute queries by invoking an appropriately named business method instead of relying on a cryptic query name or multiple copies of the same query string.
  • Bean methods can optimize their transaction usage depending on whether or not the results need to be managed or detached.
  • Using a transaction-scoped persistence context ensures that large numbers of entity instances don’t remain managed long after they are needed.

This is not to say that other components are unsuitable locations for issuing queries, but stateless beans are a well-established best practice for hosting queries in the Java EE environment.
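As a minimal sketch of the pattern, a stateless session bean might expose a query behind a descriptive business method; the named query and persistence unit name are illustrative:

@Stateless
public class DepartmentService {
    @PersistenceContext(unitName="EmployeeService")
    EntityManager em;

    public List<Department> findEmptyDepartments() {
        return em.createNamedQuery("Department.findEmpty", Department.class)
                 .getResultList();
    }
}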

Bulk Update and Delete

If bulk update and delete operations must be used, ensure that they are executed only in an isolated transaction where no other changes are being made. There are many ways in which these queries can negatively impact an active persistence context. Interweaving these queries with other non-bulk operations requires careful management by the application.

Entity versioning and locking require special consideration when bulk update operations are used. Bulk delete operations can have wide-ranging ramifications, depending on how well the persistence provider can react and adjust its entity caching in response. Therefore, we view bulk update and delete operations as highly specialized, to be used with care.

Provider Differences

Take time to become familiar with the SQL that your persistence provider generates for different JP QL queries. Although understanding SQL is not necessary for writing JP QL queries, knowing what happens in response to the various JP QL operations is an essential part of performance tuning. Joins in JP QL are not always explicit, and you may find yourself surprised at the complex SQL generated for a seemingly simple JP QL query.

The benefits of features such as query paging are also dependent on the approach used by your persistence provider. There are a number of different techniques that can be used to accomplish pagination, many of which suffer from performance and scalability issues. Because JPA can’t dictate a particular approach that will work well in all cases, become familiar with the approach used by your provider and whether or not it is configurable.
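The pagination API itself is simple; what varies between providers is the SQL generated to implement it. A minimal sketch, assuming a named query that returns all employees:

TypedQuery<Employee> q = em.createNamedQuery("Employee.findAll", Employee.class);
q.setFirstResult(100);    // skip the first 100 results
q.setMaxResults(50);      // return at most 50 results
List<Employee> page = q.getResultList();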

Finally, understanding the provider strategy for when and how often it flushes the persistence context is necessary before looking at optimizations such as changing the flush mode. Depending on the caching architecture and query optimizations used by a provider, changing the flush mode may or may not make a difference to your application.

Summary

We began this chapter with an introduction to JP QL, the query language defined by JPA. We briefly discussed the origins of JP QL and its role in writing queries that interact with entities. We also provided an overview of major JP QL features for developers already experienced with SQL.

In the discussion on executing queries, we introduced the methods for defining queries both dynamically at runtime and statically as part of persistence unit metadata. We looked at the Query and TypedQuery interfaces and the types of query results possible using JP QL. We also looked at parameter binding, strategies for handling large result sets, and how to ensure that queries in transactions with modified data complete successfully.

In the section on bulk update and delete we explained how to execute these types of queries and how to ensure that they are used safely by the application. We provided details on how persistence providers deal with bulk operations and the impact that they have on the active persistence context.

We ended our discussion of query features with a look at query hints. We showed how to specify hints and provided an example using hints supported by the JPA Reference Implementation.

Finally, we summarized our view of best practices relating to queries, looking at named queries, different strategies for the various query types, as well as the implementation details that need to be understood for different persistence providers.

In the next chapter, we will continue to focus on queries by examining JP QL in detail.

1 As in most examples in the book, the bean could be a session bean, a CDI bean, or any other kind of container bean that supports entity manager injection.

2 Parameter values that may have already been bound using setParameter() are not saved as part of the named query.
