This chapter looks at the performance of Java-driven database applications. Applications that access a database are subject to non-Java performance issues: if a database is I/O bound, or if it is executing SQL queries that require full table scans because an index is missing from a table, no amount of Java tuning or application coding is going to solve the performance issues. When dealing with database technologies, be prepared to learn (from another source) about how to tune and program the database.
This is not to say that the performance of an application that uses a database is insensitive to things under the control of the JVM and the Java technologies that are used. Rather, for good performance it is necessary to ensure that both the database and the application are correctly tuned and executing the best possible code.
This chapter covers database performance from the perspective of JPA—the Java Persistence API, version 2.0. However, JPA uses JDBC under the covers, and many developers still write applications directly to the JDBC APIs—so it is important to look at the most important performance aspects of JDBC also. Even for applications that use JPA (or some other database framework), understanding JDBC performance will help get the best performance out of the framework.
The JDBC driver is the most important factor in the performance of database applications. Databases come with their own set of JDBC drivers, and alternate JDBC drivers are available for most popular databases. Frequently, the justification for these alternate drivers is that they offer better performance.
I can’t adjudicate the performance claims of all database drivers, but here are some things to consider when evaluating different drivers.
Where Work is Performed
JDBC drivers can be written to perform more work within the Java application (the database client) or to perform more work on the database server. The best example of this is the thin and thick drivers for Oracle databases. The thin driver is written to have a fairly small footprint within the Java application—it relies on the database server to do more processing. The thick driver is just the opposite: it off-loads work from the database at the expense of requiring more processing and more memory on the Java client. That kind of trade-off is possible in most databases.
There are competing claims as to which model gives the better performance. The truth is that neither model offers an inherent advantage—the driver that will offer the best performance depends on the specifics of the environment in which it is run. Consider the case where an application host is a small, 2-core machine connecting to a huge, well-tuned database. The CPU of the application host is likely to become saturated well before any significant load is placed on the database. A thin-style driver will give the better performance in that case. Conversely, an enterprise that has 100 departments accessing a single HR database will see the best performance if database resources are preserved and the clients deploy a thick-style driver.
This is a reason to be suspicious of any performance claims when it comes to JDBC drivers: it is quite easy to pick a driver that is well-suited to a particular environment and show that it is superior to some other vendor’s driver that performs badly on the exact same setup. As always, test in your own environment, and make sure that environment mirrors what you will deploy on.
The JDBC Driver Type
JDBC drivers come in four types (1 - 4). The driver types in wide use today are type 2 (which uses native code), and type 4 (which is pure Java).
Type 1 drivers provide a bridge between ODBC and JDBC. If an application must talk to an ODBC database, then it must use this driver. Type 1 drivers generally have quite bad performance; if you have a choice, avoid ODBC.
Type 3 drivers are, like type 4 drivers, written purely in Java, but they are designed for a specific architecture where some piece of middleware (sometimes, though usually not, an application server) provides an intermediary translation. In this architecture, a JDBC client (usually a standalone program, though conceivably an application server) sends JDBC protocol requests to the middleware, which translates the requests into a database-specific protocol and forwards the request to the database (and performs the reverse translation for the response).
There are some situations where this architecture is required: the middleware can sit in the network DMZ and provide some additional security for connections to the database. From a performance standpoint, there are potential advantages and disadvantages. The middleware is free to cache database information, which offloads the database (making it faster) and returns data to the client sooner (decreasing the latency of the request). Without that caching, however, performance will suffer, as two round-trip network requests are now required in order to perform a database operation. In the ideal case, those will balance out (or the caching will be even faster).
As a practical matter, though, this architecture has not really been widely adopted.[69] It is generally easier to put the application server itself in the middle tier (including in the DMZ if needed). The application server can then perform the database operations, but it needn’t provide a JDBC interface to clients: it is better off providing servlet interfaces, webservice interfaces, and so on—isolating the client from any knowledge of the database.
That leaves type 2 and 4 drivers, neither of which has an inherent performance advantage over the other. Type 2 drivers can suffer from JNI overhead, but a well-written type 2 driver can overcome that. Don’t conflate the driver type (2 or 4) with whether the driver is considered “thick” or “thin” as discussed in the previous section. It is true that type 2 drivers tend to be thick and type 4 drivers tend to be thin, but that is not a requirement. In the end, whether a type 2 or type 4 driver is better depends on the environment and the specific drivers in question. There is really no a priori way to know which will perform better.
In most circumstances, code should use a PreparedStatement rather than a Statement for its JDBC calls. The difference is that prepared statements allow the database to reuse information about the SQL that is being executed. That saves work for the database on subsequent executions of the prepared statement.
Reuse is the operative word here: the first use of a prepared statement takes more time for the database to execute, since it must set up and save information. If the statement is used only once, then that work will be wasted; it is better to use a regular statement in that case.
In a batch-oriented program that makes only a few database calls, the Statement interface will let the application finish faster. But even batch-oriented programs may make hundreds or thousands of JDBC calls to the same few SQL statements; later examples in this chapter will use a batch program to load a database with 10,000 stock records. Batch programs that make many JDBC calls—and application servers that will service many requests over their lifetime—are better off using the PreparedStatement interface (and database frameworks like JPA will do that automatically).
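The reason the database can reuse its work is visible in the SQL text itself: with a plain Statement, values are typically concatenated into the SQL, so every execution presents the database with a new string to parse; with a PreparedStatement, the SQL text is constant and only the bound parameters change. A small illustration of that difference, using the stockprice table from this chapter’s examples:

```java
// Contrast of the SQL text produced by Statement-style concatenation
// versus PreparedStatement-style parameter markers. A database can only
// reuse its parsed statement information when the SQL text is identical
// across executions.
public class SqlReuse {
    // Statement style: each symbol yields different SQL text to parse.
    public static String inlineSql(String symbol) {
        return "SELECT * FROM stockprice WHERE symbol = '" + symbol + "'";
    }

    // PreparedStatement style: one constant SQL string; the driver binds
    // the symbol as a parameter at execution time.
    public static String parameterizedSql() {
        return "SELECT * FROM stockprice WHERE symbol = ?";
    }
}
```

Every call to inlineSql() with a different symbol forces the database to treat the statement as brand new, while parameterizedSql() always produces the same text, which is what lets the database reuse its work.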
Prepared statements provide their performance benefit when they are pooled: that is, when the actual PreparedStatement object is reused. For proper pooling, two things must be considered: the JDBC connection pool, and the JDBC driver configuration.[70] These configuration options apply to any program that uses JDBC—whether directly or via JPA.
Prepared statement pools operate on a per-connection basis. If one thread in a program pulls a JDBC connection out of the pool and uses a prepared statement on that connection, the information associated with the statement will be valid only for that connection. A second thread that uses a second connection will end up establishing a second pooled instance of the prepared statement. In the end, each connection object will have its own pool of all the prepared statements used in the application (assuming that they are all used over the lifetime of the application).
This is one reason why a standalone JDBC application should use a connection pool (JPA transparently creates a connection pool for Java SE programs, or uses a connection pool from the application server when used in a Java EE environment). It also means that the size of the connection pool matters (to both JDBC and JPA programs). That is particularly true early in the program’s execution: when a connection that has not yet used a particular prepared statement is used, that first request will be a little slower.
The size of the connection pool also matters because it is caching those prepared statements, which take up heap space (and often a lot of heap space). Object reuse is certainly a good thing in this case, but you must be aware of how much space those reusable objects take up and make sure it isn’t negatively affecting the garbage collection time.
The second thing to consider about the prepared statement pool is what piece of code will actually create and manage the pool. Prepared statement pooling was introduced in JDBC 3.0, which provides a single method (the setMaxStatements() method of the ConnectionPoolDataSource class) to enable or disable statement pooling.[71] That interface specifically does not define where the statement pooling should occur—whether in the JDBC driver, or some other layer such as the application server. And that single interface is insufficient for some JDBC drivers, which require additional configuration.
So, when writing a Java SE application that uses JDBC calls directly, there are two choices: either the JDBC driver must be configured to create and manage the statement pool, or the pool must be created and managed within the application code. Java EE applications have two (slightly different) possibilities: the JDBC driver can create and manage the pool, or the application server can create and manage the pool.
The tricky thing is that there are no standards in this area. Some JDBC drivers do not provide a mechanism to pool statements at all; they expect to be used only within an application server that is doing the statement pooling and want to provide a simpler driver. Some application servers do not provide and manage a pool; they expect the JDBC driver to handle that task and don’t want to complicate their code. Both arguments have merit (though a JDBC driver that does not provide a statement pool puts a burden on you if you are the developer of a standalone application). In the end, you’ll have to sift through this landscape and make sure that the statement pool is created somewhere.
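When neither the driver nor a server provides the pool and the application must own it, the core bookkeeping is small: a per-connection map from SQL text to its prepared statement. The sketch below captures that idea with a generic factory standing in for Connection.prepareStatement(); the class and its names are illustrative, not part of any JDBC API:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Minimal sketch of an application-managed, per-connection statement
// cache. One instance of this cache would be kept per Connection; the
// Function stands in for Connection.prepareStatement(sql).
public class StatementCache<S> {
    private final Map<String, S> cache = new HashMap<>();
    private final Function<String, S> prepare;

    public StatementCache(Function<String, S> prepare) {
        this.prepare = prepare;
    }

    // Return the cached statement for this SQL text, preparing it only
    // on first use; later calls reuse the same statement object.
    public S get(String sql) {
        return cache.computeIfAbsent(sql, prepare);
    }

    public int size() {
        return cache.size();
    }
}
```

A real implementation would also have to close the cached statements when the connection is closed, but the reuse logic is exactly this: the expensive prepare happens once per SQL string per connection.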
Since there are no standards, you may encounter a situation where both the JDBC driver and the application server are capable of managing the prepared statement pool. In that case, it is important that only one of them be configured to do so. From a performance perspective, the better choice will again depend on the exact combination of driver and server. As a general rule, you can expect the JDBC driver to perform better statement pooling. Since the driver is (usually) specific to a particular database, it can be expected to make better optimizations for that database than the more generic application server code.
To enable statement pooling (or caching) for a particular JDBC driver, consult that driver’s documentation. In many cases, you need only set up the driver so that the maxStatements property is set to the desired value (that is, the size of the statement pool). Other drivers may require additional settings (e.g., the Oracle JDBC drivers require that specific properties be set to tell them whether to use implicit or explicit statement caching).
Connections to a database are time-consuming to create, and so JDBC connections are the prototypical object that you should reuse in Java.
In a Java EE environment, all JDBC connections come from the application server’s pool. In a Java SE environment with JPA, most JPA providers will use a connection pool transparently, and you can configure the connection pool within the persistence.xml file. In a standalone Java SE environment, the connections must be managed by the application. To deal with that last case, you can use one of several connection pool libraries that are available from many sources. Often, though, it is easier simply to create a connection and store it in a thread local variable for each thread in a standalone application.
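A minimal sketch of that thread-local pattern follows. The counting factory is a stand-in for DriverManager.getConnection(URL, user, pw); it just makes the create-once-per-thread behavior observable:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the thread-local connection pattern for standalone
// applications: each thread lazily creates and then reuses its own
// "connection". The Object created here is a stand-in for a real
// java.sql.Connection.
public class ThreadLocalConnections {
    static final AtomicInteger created = new AtomicInteger();

    static final ThreadLocal<Object> connection =
        ThreadLocal.withInitial(() -> {
            created.incrementAndGet();  // one "connection" per thread
            return new Object();        // would be DriverManager.getConnection(...)
        });

    public static Object get() {
        return connection.get();
    }
}
```

Within a thread, every call to get() returns the same object, so the expensive connection setup happens once per thread. A real application would also need to close each thread’s connection at shutdown.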
As usual, it is important to strike the right balance here between the memory occupied by the pooled objects, and the amount of extra garbage collection the pooling will trigger. This is particularly true because of the prepared statement caches. The actual connection objects may not be very big, but statement caches (which exist on a per-connection basis) can grow to be quite big.
In this case, striking the correct balance applies to the database as well. Each connection to the database requires resources on the database (in addition to the memory held in the application). As connections are added to the database, the database needs more resources. Each prepared statement used by the JDBC driver also requires memory and other resources on the database. Database performance can be adversely affected if the application server has too many open connections.
The general rule of thumb for connection pools is to have one connection for every thread in the application. In an application server, start by applying the same sizing to the thread pool and the connection pool. In a standalone application, size the connection pool based on the number of threads the application creates. In the typical case, this will offer the best performance: no thread in the program will have to wait for a database connection to be available, and typically there are enough resources on the database to handle the load imposed by the application.
If the database becomes a bottleneck, however, this rule can become counter-productive. Having too many connections to an undersized database is another illustration of the principle that injecting load into a busy system will decrease its performance. Using a connection pool to throttle the amount of work that is sent to an undersized database is the way to improve performance in that situation. Application threads may have to wait for a free connection, but the total throughput of the system will be maximized if the database is not overburdened.
Applications have correctness requirements that ultimately dictate how transactions are handled. A transaction that requires repeatable read semantics will be slower than a transaction that requires only read committed semantics, but knowing that is of little practical benefit for an application that cannot tolerate non-repeatable reads. So while this section discusses how to use the least-intrusive isolation semantics for an application, don’t let the desire for speed overcome the correctness of the application.
Database transactions have two performance penalties. First, it takes time for the database to set up and then commit the transaction. This involves making sure that changes to the database are fully stored on disk, that the database transaction logs are consistent, and so on. Second, during a database transaction, it is quite common for the transaction to obtain a lock for a particular set of data (not always a row, but I’ll use that as the example case here). If two transactions are contending for a lock on the same database row, then the scalability of the application will suffer. From a Java perspective, this is exactly analogous to the discussion in Chapter 9 about contended and uncontended locks.
For optimal performance, consider both of these issues: how to program the transactions so that the transaction itself is efficient, and how to hold locks on the database during a transaction so that the application as a whole can scale.
Transactions are present within both JDBC and JPA applications, but JPA manages transactions differently (those details are discussed later in this chapter). For JDBC, transactions begin and end based on how the Connection object is used.

In basic JDBC usage, connections have a notion of an autocommit mode (set via the setAutoCommit() method). If autocommit is turned on (and for most JDBC drivers, that is the default), then each statement in a JDBC program is its own transaction. In that case, a program need take no action to commit a transaction (in fact, if the commit() method is called, performance will often suffer).

If autocommit is turned off, then a transaction implicitly begins when the first call is made on the connection object (e.g., by calling the executeQuery() method). The transaction continues until the commit() method (or the rollback() method) is called. A new transaction will begin when the connection is used for the next database call.
Transactions are expensive to commit, so one goal is to perform as much work in a transaction as is possible. Unfortunately, that principle is completely at odds with another goal: because transactions can hold locks, they should be as short as possible. There is definitely a balance here, and striking the balance will depend on the application and its locking requirements. The next section on transaction isolation and locking covers that in more detail; first let’s look into the options for optimizing the transaction handling itself.
Consider some sample code that inserts data into a database for use by the stock application. For each day of valid data, one row must be inserted into the STOCKPRICE table, and five rows into the STOCKOPTIONPRICE table. A basic loop to accomplish that looks like this:
try (Connection c = DriverManager.getConnection(URL, user, pw);
     PreparedStatement ps = c.prepareStatement(insertStockSQL);
     PreparedStatement ps2 = c.prepareStatement(insertOptionSQL)) {
    Date curDate = new Date(startDate.getTime());
    while (!curDate.after(endDate)) {
        StockPrice sp = createRandomStock(symbol, curDate);
        if (sp != null) {
            ps.clearParameters();
            ps.setBigDecimal(1, sp.getClosingPrice());
            // Similar set calls for the remaining fields
            ps.executeUpdate();
            for (int j = 0; j < 5; j++) {
                ps2.clearParameters();
                ps2.setBigDecimal(1, ...);
                // Similar set calls for the remaining fields
                ps2.executeUpdate();
            }
        }
        // else curDate is a weekend and it is skipped
        curDate.setTime(curDate.getTime() + msPerDay);
    }
}
If the start and end dates represent the year 2013, this loop will insert 261 rows into the STOCKPRICE table (via the first call to the executeUpdate() method) and 1305 rows into the STOCKOPTIONPRICE table (in the inner for loop). In the default autocommit mode, that means 1566 separate transactions, which will be quite expensive.
Better performance will be achieved if autocommit mode is disabled and an explicit commit is performed at the end of the loop:
Connection c = DriverManager.getConnection(URL, user, pw);
c.setAutoCommit(false);
...
while (!curDate.after(endDate)) {
    ...
}
c.commit();
From a logical point of view, that likely makes sense as well: the database will end up with either an entire year’s worth of data, or no data.
If this loop is repeated for multiple stocks, there is a choice of committing all the data at once or committing a year’s worth of data at a time:
Connection c = DriverManager.getConnection(URL, user, pw);
c.setAutoCommit(false);
for (int i = 0; i < numStocks; i++) {
    curDate = startDate;
    while (!curDate.after(endDate)) {
        ...
    }
    // c.commit(); // To commit a year at a time
}
c.commit(); // To commit all the data at once
Committing all the data at once offers the fastest performance, which is why the other option is commented out. In this example, though, there is a reasonable case where the application semantics might dictate that each year of data be committed individually. Sometimes, other requirements intrude on attempts to get the best performance.
Each time the executeUpdate() method is executed in the above code, a remote call is made to the database and some work must be performed. In addition, some locking will occur when the updates are made (to ensure, at least, that another transaction cannot insert a record for the same symbol and date). The transaction handling can be further optimized in this case by batching the inserts. When inserts are batched, the JDBC driver holds them until the batch is completed; then all statements are transmitted in one remote JDBC call.
Here is how batching is achieved:
for (int i = 0; i < numStocks; i++) {
    while (!curDate.after(endDate)) {
        ...
        ps.addBatch();   // replaces executeUpdate() call
        for (int j = 0; j < 5; j++) {
            ...
            ps2.addBatch();   // replaces executeUpdate() call
        }
    }
}
ps.executeBatch();
ps2.executeBatch();
c.commit();
The code could equally well choose to execute each batch on a per-stock basis (after the while loop). Some JDBC drivers have a limitation on the number of statements they can batch (and the batch does consume memory in the application), so even if the data is committed at the end of the entire operation, the batches may need to be executed more frequently.
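That chunking logic can be sketched independently of JDBC: add to the batch until a size limit is reached, execute the batch, and perform a final execute before committing. The Flusher callback below stands in for PreparedStatement.executeBatch(); the class and its names are illustrative:

```java
// Sketch of flushing JDBC batches in chunks, to respect driver batch
// limits and bound the memory held by pending statements. add() is
// called where executeUpdate() used to be; flush() is called once more
// before commit() to drain any partial chunk.
public class ChunkedBatcher {
    public interface Flusher { void flush(int statements); }

    private final int maxBatchSize;
    private final Flusher flusher;
    private int pending = 0;

    public ChunkedBatcher(int maxBatchSize, Flusher flusher) {
        this.maxBatchSize = maxBatchSize;
        this.flusher = flusher;
    }

    // Stands in for ps.addBatch(): executes the batch when full.
    public void add() {
        if (++pending == maxBatchSize) {
            flush();
        }
    }

    // Stands in for the final ps.executeBatch() before c.commit().
    public void flush() {
        if (pending > 0) {
            flusher.flush(pending);
            pending = 0;
        }
    }
}
```

With a limit of 100, inserting 250 rows produces three executeBatch()-style calls (100, 100, and 50 statements), regardless of when the data is finally committed.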
These optimizations can yield very large performance increases. Table 11-1 shows the time required to insert one year of data for 128 stocks (a total of 200,448 insertions).
Programming Mode | Time Required | DB Calls | DB Commits |
AutoCommit enabled, no batching | 2220.53 seconds | 200,448 | 200,448 |
One Commit for each stock | 174.44 seconds | 200,448 | 128 |
One Commit for all data | 169.34 seconds | 200,448 | 1 |
One Batch/Commit for each stock | 19.32 seconds | 128 | 128 |
One Batch/stock; One commit | 17.33 seconds | 128 | 1 |
One Batch/Commit for all data | 11.55 seconds | 1 | 1 |
One interesting fact about this table is not immediately obvious: the difference between lines 1 and 2 is that autocommit has been turned off and the code is explicitly calling the commit() method at the end of each while loop. The difference between lines 1 and 4 is that statements are being batched—but autocommit is still enabled. A batch is considered one transaction, which is why there is a 1-1 correspondence between database calls and commits. In this example, then, a larger benefit accrued from batching than from explicitly managing the transaction boundaries.
The second factor affecting transaction performance concerns the scalability of the database as data within transactions is locked. Locking protects data integrity; in database terms, it allows one transaction to be isolated from other transactions. JDBC and JPA support the four major transaction isolation modes of databases, though they differ in the way they accomplish that.
Isolation modes are briefly covered here, though since programming to a correct isolation mode isn’t really a Java-specific issue, you are urged to consult a database programming book for more information.
The basic transaction isolation modes (in order from most to least expensive) are:

TRANSACTION_SERIALIZABLE: the most restrictive (and most expensive) mode, in which all data accessed within the transaction is locked for the duration of the transaction.

TRANSACTION_REPEATABLE_READ: all accessed data is locked for the duration of the transaction, but other transactions can insert new rows at any time. This mode can lead to phantom reads, where a transaction that re-issues a query with a WHERE clause may get back different data the second time the query is executed.

TRANSACTION_READ_COMMITTED: rows are locked only while they are being written. This mode can lead to non-repeatable reads: data that is read at one point in the transaction may be different than data that is read at another point in the transaction.

TRANSACTION_READ_UNCOMMITTED: the least restrictive mode, in which one transaction can read data written (but not yet committed) by another transaction. This is known as a dirty read; the problem here arises because the first transaction may roll back (meaning the write never actually happens), and hence the second transaction is operating on incorrect data.
Databases operate in a default mode of transaction isolation: MySQL starts with a default of TRANSACTION_REPEATABLE_READ; Oracle and DB2 start with a default of TRANSACTION_READ_COMMITTED; and so on. There are lots of database-specific permutations here. DB2 calls their default transaction mode CS (for cursor stability) and has different names for the other three JDBC modes. Oracle doesn’t support either TRANSACTION_READ_UNCOMMITTED or TRANSACTION_REPEATABLE_READ.
When a JDBC statement is executed, it uses the database’s default isolation mode. Alternately, the setTransactionIsolation() method on the JDBC connection can be called to have the database supply the necessary transaction isolation level (and if the database doesn’t support the given level, the JDBC driver will either throw an exception, or silently upgrade the isolation level to the next strictest level it supports).
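That upgrade-to-the-next-strictest-level fallback can be made explicit. In the sketch below, the set of supported levels would in real code be built by calling DatabaseMetaData.supportsTransactionIsolationLevel() for each level; the helper class itself and its name are illustrative:

```java
import java.sql.Connection;
import java.util.Set;

// Sketch of the "silently upgrade" behavior: choose the requested
// isolation level if the database supports it, otherwise the next
// strictest supported level.
public class IsolationChooser {
    // JDBC encodes these four levels so that stricter levels have
    // larger integer values.
    private static final int[] ASCENDING = {
        Connection.TRANSACTION_READ_UNCOMMITTED,  // 1
        Connection.TRANSACTION_READ_COMMITTED,    // 2
        Connection.TRANSACTION_REPEATABLE_READ,   // 4
        Connection.TRANSACTION_SERIALIZABLE       // 8
    };

    public static int choose(int requested, Set<Integer> supported) {
        for (int level : ASCENDING) {
            if (level >= requested && supported.contains(level)) {
                return level;
            }
        }
        throw new IllegalArgumentException("no supported level is strict enough");
    }
}
```

For a database like Oracle that supports read committed and serializable but not repeatable read, a request for TRANSACTION_REPEATABLE_READ would be upgraded to TRANSACTION_SERIALIZABLE.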
For simple JDBC programs, this is sufficient. More commonly—and particularly when used with JPA—programs may want to mix isolation levels on data within a transaction. In an application that queries my employee information so as to ultimately give me a large raise, access to my employee record must be protected: that data needs to be treated as TRANSACTION_REPEATABLE_READ. But that transaction is also likely to access data in other tables, such as the table that holds my office ID. There is no real reason to lock that data during the transaction, so access to that row could certainly operate as TRANSACTION_READ_COMMITTED (or possibly even lower).
JPA allows you to specify locking levels on a per-entity basis (and of course an entity is [at least usually] simply a row in the database). Because getting these locking levels correct can be quite difficult, it is easier to use JPA rather than performing the locking in JDBC statements. Still, it is possible to use different locking levels in JDBC applications, employing the same pessimistic and optimistic locking semantics that JPA uses (and if you’re not familiar with those semantics, this example should serve as a good introduction to them).
At a JDBC level, the basic approach is to set the isolation level of the connection to TRANSACTION_READ_UNCOMMITTED, and then to lock explicitly only that data which needs to be locked during the transaction.
Connection c = DriverManager.getConnection();  // Or...get it from a local pool
c.setAutoCommit(false);
c.setTransactionIsolation(TRANSACTION_READ_UNCOMMITTED);
PreparedStatement ps1 = c.prepareStatement(
    "SELECT * FROM employee WHERE e_id = ? FOR UPDATE");
... process info from ps1 ...
PreparedStatement ps2 = c.prepareStatement(
    "SELECT * FROM office WHERE office_id = ?");
... process info from ps2 ...
c.commit();
The ps1 statement establishes an explicit lock on the employee data table—no other transaction will be able to access that row for the duration of this transaction. The SQL syntax to accomplish that is nonstandard. You must consult your database vendor’s documentation to see how to achieve the desired level of locking, but the common syntax is to include the FOR UPDATE clause. This kind of locking is called pessimistic locking. It actively prevents other transactions from accessing the data in question.
Locking performance can often be improved by using optimistic locking—the same way that the java.util.concurrent.atomic package approaches uncontended atomic operations. If the data access is uncontended, this will be a significant performance boost. If the data is even slightly contended, however, the programming becomes more difficult.
In a database, optimistic concurrency is implemented with a version column. When data is selected from a row, the selection must include the desired data plus a version column. To select information about me, I could issue the following SQL:
SELECT first_name, last_name, version FROM employee WHERE e_id = 5058;
This query will return my names (Scott and Oaks) plus whatever the current version number is (say, 1012). When it comes time to complete the transaction, the transaction updates the version column:
UPDATE employee SET version = 1013 WHERE e_id = 5058 AND version = 1012;
If the row in question requires repeatable read or serialization semantics, this update must be performed even if the data was only read during the transaction—those isolation levels require locking read-only data used in a transaction. For read committed semantics, the version column only needs to be updated when other data in the row is also updated.
Under this scheme, if two transactions use my employee record at the same time, each will read a version number of 1012. The first transaction to complete will successfully update the version number to 1013 and continue. The second transaction will not be able to update the employee record—there is no longer any record where the version number is 1012, and so the SQL update statement will fail. That transaction will get an exception and be rolled back.
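The version-column protocol can be sketched without a database at all: treat an UPDATE whose WHERE clause names a stale version as having matched zero rows, which is what signals the losing transaction to roll back. The in-memory map below is a stand-in for the employee table; the class and its names are illustrative:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of database optimistic locking with a version column. The map
// holds the current version for each row (keyed by e_id); update()
// emulates:
//   UPDATE employee SET version = readVersion + 1
//   WHERE e_id = ? AND version = readVersion
// and returns false when that UPDATE would match zero rows.
public class VersionedTable {
    private final Map<Integer, Integer> versions = new HashMap<>();

    public void insert(int id, int version) {
        versions.put(id, version);
    }

    public int readVersion(int id) {
        return versions.get(id);
    }

    // Returns false if another transaction updated the row after this
    // transaction read it; the caller must then roll back.
    public boolean update(int id, int readVersion) {
        if (versions.get(id) != readVersion) {
            return false;  // stale read: zero rows matched
        }
        versions.put(id, readVersion + 1);
        return true;
    }
}
```

Two transactions that both read version 1012 behave exactly as described in the text: the first update succeeds and bumps the version to 1013, and the second fails because no row with version 1012 remains.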
This highlights a major difference between optimistic locking in the database and Java’s atomic primitives: in database programming, when the transaction gets that exception, it is not (and cannot be) transparently retried. If you are programming directly to JDBC, the commit() method will get an SQLException; in JPA, your application will get an OptimisticLockException when the transaction is committed.
Depending on your perspective, this is either a good or a bad thing. When the performance of the atomic utilities (which did transparently retry the operation) was discussed, we observed that performance in highly-contended cases could suffer when there were a lot of retries chewing up a lot of CPU resources. In a database, that situation is far worse, since the code executed in a transaction is far more complicated than simply incrementing the value held in a memory location. Retrying a failed optimistic transaction in the database has a far greater potential to lead to a never-ending spiral of retries. Plus, it is often infeasible to determine automatically what operation(s) to retry.
So not retrying transparently is a good thing (and often the only possible solution), but on the other hand, that does mean the application is now responsible for handling the exception. The application can choose to retry the transaction (maybe only once or twice); it can choose to prompt the user for different data; or it can simply inform the user that the operation has failed. There is no one-size-fits-all answer here.
Optimistic locking works best, then, when there is very little chance of a collision between two sources. Think of a joint checking account: there is a slight chance that my husband and I may be in different parts of the city withdrawing money from our checking account at exactly the same time. That would trigger an optimistic lock exception for one of us. Even if that does happen, asking one of us to try again is not too onerous, and now the chance of an optimistic lock exception is virtually nil (or so I would hope; let’s not address how frequently we make ATM withdrawals). Contrast that scenario to something involving the sample stock application. In the real world, that data is updated so frequently that locking it optimistically would be quite counter-productive.[72]
Typical database applications will operate on a range of data. The stock application, for example, deals with a history of prices for an individual stock. That history is loaded via a single select statement:
SELECT * FROM stockprice WHERE symbol = 'TPKS' AND pricedate >= '2013-01-01' AND pricedate <= '2013-12-31';
That statement returns 261 rows of data. If the option prices for the stock are also required, a similar query would be executed which would retrieve five times that amount of data. The SQL to retrieve all data in the sample database (128 stocks covering 1 year) will retrieve 200,448 rows of data:
SELECT * FROM stockprice s, stockoptionprice o WHERE o.symbol = s.symbol AND s.pricedate >= '2013-01-01' AND s.pricedate <= '2013-12-31';
To use this data, code must scroll through the result set:
PreparedStatement ps = c.prepareStatement(...);
ResultSet rs = ps.executeQuery();
while (rs.next()) {
    ... read the current row ...
}
The question here is where that data for the 200,448 rows lives. If the entire set of data is returned during the executeQuery() call, then the application will have a very large chunk of live data in its heap, possibly causing GC and other issues. Instead, if only one row of data is returned from the call to the next() method, there will be a lot of back-and-forth traffic between the application and the database as the result set is processed.
As usual, there is no correct answer here; in some cases it will be more efficient to keep the bulk of the data on the database and retrieve it as needed, while in other cases it will be more efficient to load all the data at once when the query is executed. To control this, use the setFetchSize() method on the PreparedStatement object to let the JDBC driver know how many rows at a time it should transfer. The default value for this varies by JDBC driver; for example, in Oracle’s JDBC drivers, the default value is ten. When the executeQuery() method in the loop shown above is called, the database will return ten rows of data, which will be buffered internally by the JDBC driver. Each of the first ten calls to the next() method will process one of those buffered rows. The eleventh call will return to the database to retrieve another ten rows, and so on.
Though the value varies, JDBC drivers will typically set the default fetch size to a fairly small number. That approach is reasonable in most circumstances; in particular, it is very unlikely to lead to any memory issues within the application. If the performance of the next() method (or the performance of the first getter method on the result set) is particularly slow every now and then, consider increasing the fetch size.
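The cost of a small fetch size is easy to quantify: a full scan of a result set requires roughly one database round trip per fetch-size rows. This sketch (the helper method is ours, not part of any JDBC API) applies that arithmetic to the 200,448-row query above:

```java
public class FetchSizeMath {
    // Round trips needed to scroll through `rows` results when the driver
    // transfers `fetchSize` rows per trip (ceiling division).
    static int roundTrips(int rows, int fetchSize) {
        return (rows + fetchSize - 1) / fetchSize;
    }

    public static void main(String[] args) {
        // The 200,448-row join from the stock example, with Oracle's
        // default fetch size of ten versus a larger setting:
        System.out.println(roundTrips(200_448, 10));    // prints 20045
        System.out.println(roundTrips(200_448, 1000));  // prints 201
    }
}
```

The trade-off, of course, is that each larger fetch buffers more rows in the JDBC driver, consuming more heap in the application.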
The performance of JPA is directly affected by the performance of the underlying JDBC driver, and most of the performance considerations regarding the JDBC driver apply to JPA. JPA has additional performance considerations.
JPA achieves many of its performance enhancements by altering the bytecodes of the entity classes. In a Java EE environment, this happens transparently. In a Java SE environment, it is very important to make sure that the bytecode processing is set up correctly. Otherwise, JPA application performance will be unpredictable: fields that are expected to be loaded lazily might be loaded eagerly, data saved to the database might be redundant, data that should be in the JPA cache may need to be re-fetched from the database, and so on.
There is no JPA-defined way for the bytecodes to be processed. Typically, this is done as part of compilation—after the entity classes are compiled (and before they are loaded into jar files or run by the JVM), they are passed through an implementation-specific post-processor that “enhances” the bytecodes, producing an altered class file with the desired optimizations.
Some JPA implementations also provide a way to dynamically enhance the bytecodes as the classes are loaded into the JVM. This requires running an agent within the JVM that is notified when classes are loaded; the agent interposes on the class loading and alters the bytes before they are used to define the class. The agent is specified on the command line of the application; e.g., for Eclipselink you include the -javaagent:path_to/eclipselink.jar argument.
JPA can be used within both Java SE and Java EE applications. The platform in use affects the way JPA transactions are handled.
In Java EE, JPA transactions are part of the application server’s Java Transaction API (JTA) implementation. This offers two choices of how transaction boundaries are handled: the application server can take care of the boundaries (using container-managed transactions, CMT), or the transaction boundaries can be explicitly programmed in the application (using user-managed transactions, UMT).
There is no significant difference in the performance between CMT and UMT if they are used equivalently. However, it is not always possible to use them equivalently. In particular, user-managed transactions can have a larger or smaller scope than container-managed transactions, which can have a significant impact on performance.
Consider the following pseudo-code:
@Stateless
public class Calculator {
    @PersistenceContext(unitName = "Calc")
    EntityManager em;

    @TransactionAttribute(REQUIRED)
    public void calculate() {
        Parameters p = em.find(...);
        ... perform expensive calculation ...
        em.persist(... answer ...);
    }
}
The transaction scope here (using CMT) is the entire method. If the method requires repeatable read semantics for the data that is being persisted, then data in the table will be locked during the expensive calculation.
With user-managed transactions, there is more flexibility:
@Stateless
public class Calculator {
    @PersistenceContext(unitName = "Calc")
    EntityManager em;

    public void calculate() {
        UserTransaction ut = ... lookup UT in application server ...;
        ut.begin();
        Parameters p = em.find(...);
        ut.commit();
        ... perform expensive calculation ...
        ut.begin();
        em.persist(... answer ...);
        ut.commit();
    }
}
Splitting the transaction—which can only be done easily using user-managed transactions—limits the effect of the transaction on the scaling of the application. Strictly speaking, this could have been done using container-managed transactions by splitting the work into three different methods, each with a different transaction attribute. In general, the UMT approach is far more convenient.
Similarly, a servlet using user-managed transactions can establish a transaction boundary that spans multiple calls to an EJB. Using container-managed transactions to accomplish the same thing means adding to the EJB interface a new meta-method that calls those other methods within the same transaction.
In a Java SE application, the entity manager is responsible for providing the transaction object, but the application is responsible for demarcating the transaction boundaries on that object. The JPA example to populate the database of stock prices includes the following code:
public void run() {
    for (int i = startStock; i < numStocks; i++) {
        EntityManager em = emf.createEntityManager();
        EntityTransaction txn = em.getTransaction();
        txn.begin();
        while (!curDate.after(endDate)) {
            StockPrice sp = createRandomStock(curDate);
            if (sp != null) {
                em.persist(sp);
                for (int j = 0; j < 5; j++) {
                    StockOptionPriceImpl sop =
                        createRandomOption(sp.getSymbol(), sp.getDate());
                    em.persist(sop);
                }
            }
            curDate.setTime(curDate.getTime() + msPerDay);
        }
        txn.commit();
        em.close();
    }
}
Similar to what was observed with transactions in JDBC, there are efficiency trade-offs here between the frequency of committing the transaction and the natural transaction boundaries. Some timings for this example are included in the next section.
In JDBC, we looked at two critical performance techniques: reusing prepared statements, and performing updates in batches.
It is possible to accomplish both of those optimizations with JPA, but the way it is done depends on the JPA implementation in use; there are no calls within the JPA API to do that. For Java SE, these optimizations typically require setting a particular property in the application’s persistence.xml file.
For example, using the JPA Eclipselink reference implementation, statement reuse is enabled by adding the following property to the persistence.xml file:
<property name="eclipselink.jdbc.cache-statements" value="true" />
Note that this enables statement reuse within the Eclipselink implementation. If the JDBC driver is capable of providing a statement pool, it is usually preferable to enable the statement caching in the driver and to leave this property out of the JPA configuration.
Statement batching in the reference JPA implementation is achieved by adding these properties:
<property name="eclipselink.jdbc.batch-writing" value="JDBC" />
<property name="eclipselink.jdbc.batch-writing.size" value="10000" />
JDBC drivers cannot automatically implement statement batching, so this is a useful property to set in all cases. The batch size can be controlled in two ways: first, the size property can be set as is done in this example. Second, the application can periodically call the flush() method of the entity manager, which will cause all batched statements to be executed immediately.
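The flush() technique can be applied on a simple cadence: persist entities, and flush after every batch-size of them. The sketch below shows the shape of that loop; the Sink interface is a stand-in for the entity manager, since the real persist() and flush() calls require a running JPA provider:

```java
public class PeriodicFlush {
    interface Sink {             // stand-in for EntityManager
        void persist(Object entity);
        void flush();
    }

    // Persist the entities, flushing after every `batchSize` of them so
    // the provider writes them to the database in batches of that size.
    // Returns the number of flushes performed.
    static int persistInBatches(Object[] entities, int batchSize, Sink em) {
        int flushes = 0;
        for (int i = 0; i < entities.length; i++) {
            em.persist(entities[i]);
            if ((i + 1) % batchSize == 0) {
                em.flush();
                flushes++;
            }
        }
        em.flush();              // write any final partial batch
        return flushes + 1;
    }

    public static void main(String[] args) {
        final int[] counts = new int[2];   // [persists, flushes]
        Sink em = new Sink() {
            public void persist(Object e) { counts[0]++; }
            public void flush() { counts[1]++; }
        };
        int flushes = persistInBatches(new Object[25], 10, em);
        System.out.println(counts[0] + " persists, " + flushes + " flushes");
        // prints "25 persists, 3 flushes"
    }
}
```

Either mechanism works; the property-based batch size is simpler, while explicit flushes give the application control over exactly where the batch boundaries fall.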
Table 11-2 shows the effect of statement reuse and batching when creating and writing stock entities into the database.
Programming Mode | Time Required |
No Batching, No Statement Pool | 240 seconds |
No Batching, Statement Pool | 200 seconds |
Batching, No Statement Pool | 23.37 seconds |
Batching, Statement Pool | 21.08 seconds |
Optimizing when and how JPA reads data from the database is much more complicated than it might seem. That’s because JPA will cache data in the hope that it might be used to satisfy a future request. That’s usually a good thing for performance, but it means that the JPA-generated SQL which is used to read that data may seem, on the face of it, sub-optimal. The data retrieval is optimized to serve the needs of the JPA cache, rather than being optimized for whatever particular request is in progress.
The details of the cache are covered in the next section. For now, let’s look at the basic ways to apply database read optimizations to JPA. JPA reads data from the database in three cases: when the find() method of the EntityManager is called, when a JPA query is executed, and when code navigates to a new entity using the relationship of an existing entity. In the stock class, that latter case means calling the getOptions() method on a Stock entity.
Calling the find() method is the most straightforward case here—there is only a single row involved, and (at least) that single row is read from the database. The only thing that can be controlled here is how much data is retrieved. JPA can retrieve only some of the fields in the row, it can retrieve the entire row, or it can pre-fetch other entities that are related to the row being retrieved.
Those optimizations apply to queries as well. There are two possible paths: read less data (because the data won’t be needed), or read more data at a time (because that data will definitely be needed in the future).
To read less data, specify that the field in question is loaded lazily. When an entity is retrieved, the fields with a lazy annotation will be excluded from the SQL used to load the data. If the getter of that field is ever executed, it will mean another trip to the database to retrieve that piece of data.
It is rare to use that annotation for simple columns of basic types, but consider using it if the entity contains large BLOB- or CLOB-based objects.
@Lob
@Column(name = "IMAGEDATA")
@Basic(fetch = FetchType.LAZY)
private byte[] imageData;
In this case, the entity is mapped to a table storing binary image data. The binary data is large, and the example assumes it shouldn’t be loaded unless it is actually needed. Not loading the unneeded data in this case serves two purposes: it makes for faster SQL when the entity is retrieved, and it saves a lot of memory, leading to less GC pressure.
Note also that the lazy annotation is, in the end, only a hint to the JPA implementation. The JPA implementation is free to request that the database eagerly load that data anyway.
On the other hand, perhaps some other data should be preloaded—for example, when one entity is fetched, data for other (related) entities should also be returned. That is known as eager fetching, and it has a similar annotation:
@OneToMany(mappedBy = "stock", fetch = FetchType.EAGER)
private Collection<StockOptionPriceImpl> optionsPrices;
By default, related entities are already fetched eagerly if the relationship type is @OneToOne or @ManyToOne (and so it is possible to apply the opposite optimization to them: mark them as FetchType.LAZY if they are almost never used).
This also is just a hint to the JPA implementation, but it essentially says that any time a stock price is retrieved, make sure also to retrieve all related option prices. Beware here: a common expectation about eager relationship fetching is that it will employ a JOIN in the generated SQL. In typical JPA providers, that is not the case: they will issue a single SQL query to fetch the primary object, and then one or more SQL commands to fetch any additional, related objects. From a simple find() method, there is no control over this: if a JOIN statement is required, you will have to use a query and program the JOIN into the query.
JPQL doesn’t allow you to specify fields of an object to be retrieved. Take the following JPQL query:
Query q = em.createQuery("SELECT s FROM StockPriceImpl s");
That query will always yield this SQL statement:
SELECT <enumerated list of non-LAZY fields> FROM StockPriceTable
If you want to retrieve fewer fields in the generated SQL, you have no option but to mark them as lazy. Similarly, for fields that are marked as lazy, there is no real option for fetching them in a query.
If there are relationships between entities, the entities can be explicitly joined in the JPQL query, which will retrieve the initial entities and their related entities in one shot. For example, in the stock entities, this query can be issued:
Query q = em.createQuery("SELECT s FROM StockOptionImpl s " + "JOIN FETCH s.optionsPrices");
That results in an SQL statement similar to:
SELECT t1.<fields>, t0.<fields> FROM StockOptionPrice t0, StockPrice t1 WHERE ((t0.SYMBOL = t1.SYMBOL) AND (t0.PRICEDATE = t1.PRICEDATE))
The exact SQL will differ among JPA providers (this example is from Eclipselink), but this is the general process.
Join fetching is valid for entity relationships regardless of whether they are annotated as eager or lazy. If the join is issued on a lazy relationship, the lazily-annotated entities that satisfy the query are still retrieved from the database, and if those entities are later used, no additional trip to the database is required.
When all the data returned by a query using a join fetch will be used, then the join fetch often provides a big improvement in performance. However, a join fetch also interacts with the JPA cache in unexpected ways. An example of that is shown in the section on caching; be sure you understand those ramifications before writing custom queries using join fetch.
JPA queries are handled like a JDBC query yielding a result set: the JPA implementation has the option of getting all the results at once, getting the results one at a time as the application iterates over the query results, or getting a few results at a time (analogous to how the fetch size worked for JDBC).
There is no standard way to control this, but JPA vendors have proprietary mechanisms to set the fetch size. In Eclipselink, a hint on the query specifies the fetch size:
q.setHint("eclipselink.JDBC_FETCH_SIZE", "100000");
Hibernate offers a custom @BatchSize annotation instead.
If a very large set of data is being processed, the code may need to page through the list returned by the query. This has a natural relationship to how the data might be displayed to the user on a web page: a subset of data is displayed (say 100 rows), along with “next” and “previous” page links to navigate (page) through the data.
This is accomplished by setting a range on the query:
Query q = em.createNamedQuery("selectAll");
q.setFirstResult(100);   // positions are numbered from 0
q.setMaxResults(100);
List<? extends StockPrice> l = q.getResultList();
This returns a list suitable for displaying on the second page of the web application: items 101 - 200. Retrieving only the range of data needed will be more efficient than retrieving 200 rows and discarding the first 100 of them.
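One detail to watch: the position passed to setFirstResult() is numbered from 0, so the arithmetic for a given page is easy to get wrong by one. A small helper (purely hypothetical, not part of the JPA API) makes the mapping explicit:

```java
public class PageMath {
    // First-result position for a 1-based page number: page 1 starts at
    // position 0, page 2 at position `pageSize`, and so on.
    static int firstResult(int page, int pageSize) {
        return (page - 1) * pageSize;
    }

    public static void main(String[] args) {
        // For 100-row pages, page 2 holds items 101-200, which corresponds
        // to q.setFirstResult(100); q.setMaxResults(100);
        System.out.println(firstResult(2, 100));  // prints 100
    }
}
```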
Note that this example uses a named query (the createNamedQuery() method) rather than an ad hoc query (the createQuery() method). In many JPA implementations, named queries are faster: the JPA implementation will almost always use a prepared statement with bind parameters, utilizing the statement cache pool. There is nothing that prevents JPA implementations from using similar logic for unnamed, ad hoc queries, though implementing that is more difficult, and the JPA implementation may simply default to creating a new statement (i.e., a Statement object) each time.
One of the canonical performance-related use cases for Java is to supply a middle tier that caches data from back-end database resources. The Java tier performs a number of architecturally useful functions (such as preventing clients from directly accessing the database). From a performance perspective, caching frequently-used data in the Java tier can greatly speed up response times for the clients.
JPA is designed with that architecture in mind. There are two kinds of caches in JPA. Each entity manager instance is its own cache: it will locally cache data that it has retrieved during a transaction. It will also locally cache data that is written during a transaction; the data is sent to the database only when the transaction commits. A program may have many different entity manager instances,[73] each executing a different transaction, and each of which has its own local cache.
When an entity manager commits a transaction, all data in the local cache can be merged into a global cache. The global cache is shared among all entity managers in the application. The global cache is also known as the Level 2 (L2) cache or the second-level cache; the cache in the entity manager is known as the Level 1, L1, or first-level cache.
There is little to tune within an entity manager transaction (the L1 cache), and the L1 cache is enabled in all JPA implementations. The L2 cache is different: most JPA implementations provide one, but not all of them enable it by default (e.g., Hibernate does not, but Eclipselink does). Once enabled, the way in which the L2 cache is tuned and used can substantially affect performance.
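In JPA 2.0, eligibility for the L2 cache can also be declared in persistence.xml via the standard shared-cache-mode element. A minimal sketch, reusing the persistence-unit name from the earlier examples:

```xml
<persistence-unit name="Calc">
  <!-- ENABLE_SELECTIVE: only entities annotated @Cacheable(true) go into
       the L2 cache. The other standard values are ALL, NONE,
       DISABLE_SELECTIVE, and UNSPECIFIED (use the provider's default). -->
  <shared-cache-mode>ENABLE_SELECTIVE</shared-cache-mode>
</persistence-unit>
```

Whether the cache is on by default when this element is omitted (the UNSPECIFIED case) is exactly the provider-dependent behavior described above.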
The JPA cache operates only on entities accessed by their primary keys—that is, items retrieved from a call to the find() method, or items retrieved from accessing (or eagerly loading) a related entity. When the entity manager attempts to find an object via either its primary key or a relationship mapping, it can look in the L2 cache and return the object(s) if they are found there—thus saving a trip to the database. Items retrieved via a query are not held in the L2 cache. Some JPA implementations do have a vendor-specific mechanism to cache the results of a query, but those results are only reused if the exact same query is re-executed. Even if the JPA implementation supports query caching, the entities themselves are not stored in the L2 cache and cannot be returned in a subsequent call to the find() method.
There are many ways that the connections between the L2 cache, queries, and the loading of objects affects performance. To examine them, code based on the following loop will be used:
EntityManager em = emf.createEntityManager();
Query q = em.createNamedQuery(queryName);
List<StockPrice> l = q.getResultList();        // SQL Call Site #1
for (StockPrice sp : l) {
    ... process sp ...
    if (processOptions) {
        Collection<? extends StockOptionPrice> options =
            sp.getOptions();                   // SQL Call Site #2
        for (StockOptionPrice sop : options) {
            ... process sop ...
        }
    }
}
em.close();
Because of the L2 cache, this loop will perform one way the first time it is executed, and another (generally faster) way on subsequent executions. The specific performance difference depends on various details of the queries and the entity relationships. The next few subsections will explain the results in detail.
The differences in this example are based in some cases on different JPA configurations, but also on whether the test traverses the relationship between the Stock and StockOptions classes. In the tests without traversal of the relationship, the processOptions value in the loop is false; only the StockPrice objects are actually used.
In the sample code, the stock prices are loaded via a named query. In the default case, this simple query is executed to load the stock data:
@NamedQuery(name = "findAll",
            query = "SELECT s FROM StockPriceImpl s ORDER BY s.id.symbol")
The StockPrice class has a @OneToMany relationship with the StockOptionPrice class using the optionsPrices instance variable:
@OneToMany(mappedBy = "stock")
private Collection<StockOptionPrice> optionsPrices;
@OneToMany relationships are loaded lazily by default. Table 11-3 shows the time to execute this loop.
Test Case | First Execution | Subsequent Executions |
Lazy Relationship | 61.9 seconds (33,409 SQL calls) | 3.2 seconds (1 SQL call) |
Lazy Relationship, No Traversal | 5.6 seconds (1 SQL call) | 2.8 seconds (1 SQL call) |
The first time the sample loop is executed in this scenario (for 128 stocks with one year of data), the JPA code executes 1 SQL statement in the call to the getResultList() method. That statement is executed at SQL Call Site #1 in the code listing.
As the code loops through the stock and visits each collection of option prices, JPA will issue SQL statements to retrieve all the options associated with the particular entity (that is, it retrieves the entire collection for one stock/date combination at once). This occurs at SQL Call Site #2, and it results in 33,408 individual select statements during execution (261 days * 128 stocks).
That example takes almost 62 seconds for the first execution of the loop. The next time that code is executed, it takes only 3.2 seconds. That’s because the second time the loop is executed, the only SQL that is executed is the named query. The entities retrieved via the relationship are still in the L2 cache, so no database calls are needed in that case.[74]
The second line in Table 11-3 represents the code that does not visit each of the options in the relationship (i.e., the processOptions variable is false). In that case, the code is substantially faster: it takes 5.6 seconds for the first iteration of the loop, and 2.8 seconds for subsequent iterations.[75]
In the next two experiments, the relationship between the stock prices and option prices is redefined so that the option prices are loaded eagerly; Table 11-4 shows the results.
Test Case | First Execution | Subsequent Executions |
Eager Relationship | 60.2 seconds (33,409 SQL calls) | 3.1 seconds (1 SQL call) |
Eager Relationship, No Traversal | 60.2 seconds (33,409 SQL calls) | 2.8 seconds (1 SQL call) |
When all the data is used (i.e., the first lines in Table 11-3 and Table 11-4), the performance of the eager and lazy loading cases is essentially the same. But when the relationship data isn’t actually used (the second lines in each table), the lazy relationship case saves some time—particularly on the first execution of the loop. Subsequent executions of the loop don’t save time since the eager loading code isn’t reloading the data in those subsequent iterations; it is loading data from the L2 cache.
As discussed in the previous section, the query could be written to explicitly use a join statement:
@NamedQuery(name = "findAll",
            query = "SELECT s FROM StockPriceEagerLazyImpl s " +
                    "JOIN FETCH s.optionsPrices ORDER BY s.id.symbol")
Using that named query (with full traversal) yields the data in Table 11-5.
Test Case | First Execution | Subsequent Executions |
Default Configuration | 61.9 seconds (33,409 SQL calls) | 3.2 seconds (1 SQL call) |
Join Fetch | 17.9 seconds (1 SQL call) | 11.4 seconds (1 SQL call) |
Join Fetch with Query Cache | 17.9 seconds (1 SQL call) | 1.1 seconds (0 SQL call) |
The first time the loop is executed with a join query, there is a big performance win: it takes only 17.9 seconds. That is the result of issuing only one SQL request, rather than 33,409 of them.
Unfortunately, the next time the code is executed, it still needs that one SQL statement, since query results are not in the L2 cache. Subsequent executions of the example take 11.4 seconds—because the SQL statement that is executed has the JOIN statement and is retrieving over 200,000 rows of data.
If the JPA provider implements query caching, this is clearly a good time to use it. If no SQL statements are required during the second execution of the code, only 1.1 seconds is required on the subsequent executions. Be aware that query caching works only if the parameters used in the query are exactly the same each time the query is executed.
If entities are never retrieved via a query, then after an initial warmup period, all entities can be accessed through the L2 cache. The L2 cache can be warmed up by loading all entities, so slightly modifying the previous example gives this code:
EntityManager em = emf.createEntityManager();
ArrayList<String> allSymbols = ... all valid symbols ...;
ArrayList<Date> allDates = ... all valid dates ...;
for (String symbol : allSymbols) {
    for (Date date : allDates) {
        StockPrice sp = em.find(StockPriceImpl.class,
                                new StockPricePK(symbol, date));
        ... process sp ...
        if (processOptions) {
            Collection<? extends StockOptionPrice> options = sp.getOptions();
            ... process options ...
        }
    }
}
The results of executing this code are given in Table 11-6.
Test Case | First Execution | Subsequent Executions |
Default Configuration | 61.9 seconds (33,409 SQL calls) | 3.2 seconds (1 SQL call) |
No Query | 100.5 seconds (66,816 SQL calls) | 1.19 seconds (0 SQL calls) |
The first execution of this loop requires 66,816 SQL statements: 33,408 for the call to the find() method, and an additional 33,408 for the call to the getOptions() method. Subsequent executions of that code are the fastest possible, since all the entities are in the L2 cache and no SQL statements need be issued.
Recall that in the sample database, there are five option prices for every date and symbol pair, or a total of 167,040 option prices for 128 stocks over one year of data. When the five stock options for a particular symbol and date are accessed via a relationship, they can all be retrieved at once. That’s why only 33,408 SQL statements are required to load all the option price data. Even though multiple rows are returned from those SQL statements, JPA is still able to cache the entities—it is not the same thing as executing a query. If the L2 cache is warmed up by iterating through entities, don’t iterate through related entities individually—do that by simply visiting the relationship.
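The row counts quoted throughout these examples all follow from the same arithmetic, which is easy to sanity-check (the helper method here is purely illustrative):

```java
public class SampleSizes {
    // Stock price rows: one row per stock per trading day.
    static int stockPrices(int stocks, int days) {
        return stocks * days;
    }

    public static void main(String[] args) {
        int prices = stockPrices(128, 261);   // 33,408 stock price rows
        int options = prices * 5;             // 167,040 option price rows
        System.out.println(prices + " + " + options + " = " + (prices + options));
        // prints "33408 + 167040 = 200448"
    }
}
```

The 200,448 total is exactly the number of rows retrieved by the full join query at the start of the chapter.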
As code is optimized, you must take into account the effects of the cache (and particularly the L2 cache). Even if you think you could write better SQL than what JPA generates (and hence should use complex named queries), make sure that code is worthwhile once the cache comes into play. Even if it seems that using a simple named query will be faster to load data, consider what would happen in the long run if those entities were loaded into the L2 cache via a call to the find() method.
Like all cases where objects are reused, the JPA cache has a potential performance downside: if the cache consumes too much memory, it will cause GC pressure. This may require that the cache be tuned to adjust its size, or that you control the mode in which entities remain cached. Unfortunately, these are not standard options, so you must perform these tunings based on which JPA provider you are using.
JPA implementations typically provide an option to set the size of the cache, either globally or on a per-entity basis. The latter case is obviously more flexible, though it also requires more work to determine the optimal size for each entity. An alternative approach is for the JPA implementation to use soft and/or weak references for the L2 cache. Eclipselink, for example, provides five different cache types (plus additional deprecated types) based on different combinations of soft and weak references. That approach, while potentially easier than finding optimal sizes for each entity, still requires some planning—in particular, recall from Chapter 7 that weak references do not really survive any GC operation and are hence a questionable choice for a cache.
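As an illustration of how vendor-specific this tuning is, Eclipselink exposes both the default cache type and the default cache size as persistence.xml properties (the values shown are illustrative choices, not recommendations):

```xml
<!-- SoftWeak: hold the most recently used entries with soft references
     and the rest with weak references; cap each entity's cache at 5,000
     entries. Other Eclipselink types include Full, Weak, Soft, and
     HardWeak. -->
<property name="eclipselink.cache.type.default" value="SoftWeak" />
<property name="eclipselink.cache.size.default" value="5000" />
```

Hibernate, by contrast, delegates the L2 cache to a pluggable cache provider, which carries its own sizing configuration.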
If a cache based on soft or weak references is used, the performance of the application will also depend on what else happens in the heap. The examples of this section all used a large heap so that caching the 200,448 entity objects in the application would not cause issues with the garbage collector. Tuning a heap when there are large JPA L2 caches is quite important for good performance.
The JPA specification does not directly address the notion of a read-only entity, but many JPA providers do. A read-only entity will usually offer much better performance than a (default) read-write entity, because the JPA implementation knows that it does not need to keep track of the state of the entity, nor enroll the entity in transactions, nor lock the entity, and so on. In Java EE containers, read-only entities are often supported regardless of the JPA implementation used. The application server in that case ensures that the entities are accessed using a special, non-transactional JDBC connection. This usually offers a significant performance benefit.
In a Java EE container, transaction support for read-only entities is one area that the JPA specification does address: a business method that is annotated with @TransactionAttribute(TransactionAttributeType.SUPPORTS) can be run outside of a transaction (assuming no transaction is in progress when that method is called).
In that case, the entities accessed in that method must be, essentially, read-only, since they are not part of a transaction. However, if the method is called from another method that is part of a transaction, the entities still become part of the transaction.
Properly tuning JDBC and JPA access to a database is one of the most significant ways to affect the performance of a middle-tier application. Keep in mind these best practices:
[69] As always, things are subject to change. Oracle, for example, supplies a JDBC driver for its Database Resident Connection Pool (DRCP) implementation. Strictly speaking, that is a type 3 driver, though it is the same driver jar file as the usual type 4 JDBC driver, and the type 3/type 4 dichotomy is transparent to end users.
[70] Statement pooling is often called statement caching by database vendors.
[71] Statement pooling is disabled if the value passed to the setMaxStatements() method is 0.
[72] In truth, stock applications would frequently use no locking when possible just because of the volume of changes, although actual trade updates would require some locking.
[73] In particular, the entity managers injected into Java EE applications are distinct instances.
[74] Recall that the L2 cache only works for entities loaded from a relationship or a find operation. So the stock option entities can be found in the L2 cache, but the stock prices—since they were loaded from a query—do not appear in the L2 cache and must be reloaded.
[75] The difference in performance between those two cases is due to the warm-up period of the compiler. Although it wasn’t as noticeable, that warm-up occurred in the first example as well.