Chapter 5. Object-Relational Mapping with JPA

Michael McDonald, the five-time Grammy Award-winning singer, said, "I realized early on that I wouldn't sing for very long if I kept trying to sound like James Brown!"

In the previous chapter, we covered the fundamentals of entity beans, the Plain Old Java Objects (POJOs) that are enhanced through metadata so that they can be persisted to a relational database or other long-term storage.

The Java Persistence API is a very involved specification. It supports the flexibility and features of relational databases and the mapping of Java objects to database tables and columns. In this chapter, we venture much deeper into the JPA specification so that we can further configure how the JPA persistence provider maps persistence objects (entities) to a database.

We shall start by adding some finesse to entity beans.

Adding finesse to entity beans

As our first port of call, let us revisit how to configure field mappings and property accessors in the entity beans.

Field binding

There are two forms of binding in JPA: lazy and eager. The binding is controlled by the enumeration javax.persistence.FetchType. Binding determines when the persistence provider loads state into an entity. The process of loading state from the database is called fetching. In order to preserve performance, for simple entities the JPA provider will load state eagerly (FetchType.EAGER), especially for simple primitive type fields. This also applies to loading state for fields implicitly annotated as @Basic. For collections of objects, the JPA provider will load the state lazily (FetchType.LAZY) in order to avoid the possibility of loading a deeply nested object graph, which could severely hurt your application's performance.

Software engineers generally decide which parts of the entity object graph are set to EAGER or LAZY. The right choice depends on the query, and naïve consideration can lead to the infamous N+1 query problem. Later in this chapter, we cover the issues around excessive queries. Java EE 7 and JPA 2.1 introduce the idea of a fetch plan, which can help balance the performance of queries. See Chapter 11, Advanced Topics in Persistence.
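To make the N+1 query problem concrete, here is a minimal plain-Java sketch. It does not use JPA at all; it only counts simulated queries when a list of N parent records is loaded with one query and each parent's lazy collection is then touched in a loop. The class and method names (NPlusOneDemo, findCustomerIds, findInvoicesFor) are hypothetical and exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java simulation (no JPA involved). All names are hypothetical.
class NPlusOneDemo {
    static int queryCount = 0;

    // Simulates one query that returns the ids of n customers.
    static List<Integer> findCustomerIds(int n) {
        queryCount++;
        List<Integer> ids = new ArrayList<>();
        for (int i = 1; i <= n; i++) {
            ids.add(i);
        }
        return ids;
    }

    // Simulates one query per customer to fetch that customer's invoices.
    static List<String> findInvoicesFor(int customerId) {
        queryCount++;
        List<String> invoices = new ArrayList<>();
        invoices.add("invoice-for-" + customerId);
        return invoices;
    }

    // Loads n customers, then touches each customer's invoices one by one.
    static int countQueries(int n) {
        queryCount = 0;
        for (int id : findCustomerIds(n)) {
            findInvoicesFor(id);
        }
        return queryCount;
    }

    public static void main(String[] args) {
        // One query for the customers plus one per customer: prints 11.
        System.out.println("Queries for 10 customers: " + countQueries(10));
    }
}
```

The count grows linearly with the number of parents, which is why a lazy association that is harmless for a single entity can become expensive when it is traversed inside a loop over a query result.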

Binding eagerly

In JPA 2.1, the value of FetchType.EAGER is the fetch attribute default for many annotations including @Basic, @ManyToOne and @OneToOne. The EAGER value is a specification requirement on the JPA provider to eagerly fetch data relevant to the field or property accessor of the entity.

For direct references, the default value in the specification helps the persistence provider (the JPA vendor) to write a viable implementation that performs as expected. The idea is borrowed from the principle of least astonishment.

Binding lazily

The enumeration value FetchType.LAZY is the default value for the fetch attribute in the collection-valued JPA annotations: @ElementCollection, @ManyToMany, and @OneToMany.

When an entity can load multiple dependent entities, the default value in the specification helps persistence providers to treat collections as lazily loaded entities.

You should always treat the FetchType.LAZY value as a hint to the JPA provider to do the right thing. There is no guarantee that it will obey the advice, and the provider is allowed to analyze the persistence context and only then decide to honor the hint if it is possible, and of course, makes sense.

Let us suppose we have a smart implementation of the JPA specification and we have two entities, the master and the detail. The master is associated with the detail in a one-to-many relationship. By default this would be lazy fetch. What if our smart JPA provider recognizes at runtime that a set of master entities only ever has one detail? In other words, a smart JPA provider may choose to eagerly load the master and detail in one operation because of some internal secret-sauce optimization algorithm.

The trade-off between eager and lazy

Here is some code to illustrate the trade-off between the eager and lazy loading definitions.

The entity Customer has a name, an address, and a set of invoices. For the purposes of the discussion, we do not show the source code for the Address and Invoice entities.

The code now follows:

@Entity
public class Customer implements java.io.Serializable {
  @Id  
  @Column(name="CUST_ID")
  private int id;
  
  @OneToOne
  @JoinColumn(name="CUSTOMER_ADDRESS_REF",
    referencedColumnName="ADDRESS_ID")
  private Address address;
  /* ... */
  
  @OneToMany
  private List<Invoice> invoices;
  
  public List<Invoice> getInvoices() { return invoices; }
  public void setInvoices( List<Invoice> invoices ) { 
    this.invoices = invoices; }
}

The Customer entity has a one-to-one relationship with an Address entity, which is another way of saying every customer has an address. The Customer entity has a list collection of Invoice entities, which represents a one-to-many relationship. A single customer may have ten, a thousand and one, or zero invoices.

Without any further annotations, when a JPA provider loads the state of the Customer entity from the database, it will also load the Address entity eagerly, because there is a direct reference to the other entity. The JPA provider will also eagerly load the Customer state for properties that are primitive and basic JDBC database types.

For the collection properties, when a JPA provider loads the state of the Customer, it will lazily load the associated Invoice entities from the database. This is the default behavior to achieve common-sense performance. The JPA provider will only load the state of the dependent invoice records once the collection is actually accessed, for example when calling code invokes getInvoices() and reads the list.

It is probably not efficient to load a customer with a huge collection of invoices. Such a penalty in time and memory may not make sense for all applications, because retrieving entire sets of records may be unnecessary if the end client is only interested in a few. Specifying FetchType.LAZY implies that the state is loaded on demand.
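The on-demand idea can be sketched in plain Java without any JPA provider. The following is only an illustration of the principle, not how a real provider implements lazy collections (providers typically substitute their own collection proxies). The names LazyInvoicesDemo, loadInvoicesFromDatabase, and newCustomer are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

// Plain-Java illustration of lazy, on-demand loading. All names are hypothetical.
class LazyInvoicesDemo {
    static int loadCount = 0;

    static class Customer {
        private final Supplier<List<String>> loader;
        private List<String> invoices; // stays null until first access

        Customer(Supplier<List<String>> loader) {
            this.loader = loader;
        }

        // The first call triggers the load; later calls reuse the cached list.
        List<String> getInvoices() {
            if (invoices == null) {
                invoices = loader.get();
            }
            return invoices;
        }
    }

    // Stands in for the database round trip that fetches the invoices.
    static List<String> loadInvoicesFromDatabase() {
        loadCount++;
        return Arrays.asList("INV-1", "INV-2");
    }

    static Customer newCustomer() {
        return new Customer(LazyInvoicesDemo::loadInvoicesFromDatabase);
    }

    public static void main(String[] args) {
        Customer customer = newCustomer();
        System.out.println("Loads before access: " + loadCount);
        customer.getInvoices();
        customer.getInvoices();
        System.out.println("Loads after two accesses: " + loadCount);
    }
}
```

Creating the customer costs nothing in terms of invoice loading; the cost is paid once, at the first call to getInvoices().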

Note

The EAGER strategy is a requirement on the persistence provider runtime that data be eagerly fetched. The LAZY strategy is a hint (not a requirement) to the persistence provider runtime that data should be loaded lazily when it is first accessed.

Let's revisit the example code and reverse the fetch style for both properties in the customer entity:

@Entity
public class Customer implements java.io.Serializable {
  @Id  
  @Column(name="CUST_ID")
  private int id;
  
  @OneToOne(fetch=FetchType.LAZY)
  @JoinColumn(name="CUSTOMER_ADDRESS_REF",
    referencedColumnName="ADDRESS_ID")
  private Address address;
  /* ... */
  
  @OneToMany(fetch=FetchType.EAGER)
  private List<Invoice> invoices;
  
  public Address getAddress() { return address; }
  public void setAddress(Address address) {
    this.address = address;  }
    
  public List<Invoice> getInvoices() { return invoices; }
  public void setInvoices( List<Invoice> invoices ) { 
    this.invoices = invoices; }
}

In this version of the Customer entity, the Address entity is loaded on demand, at the behest of the persistence provider. The full list of invoices associated with the Customer record is loaded eagerly whenever the JPA provider decides to load the master entity. The invoices are already loaded for this version of the customer by the time the getInvoices() method is called.

Note

Many engineers tackle the issue of dependency and responsibility by splitting the association into an idiom called master and detail. The master usually is the owner of the detail. If the master record does not exist and the child detail does exist, then the detail makes no semantic sense. A child detail record with no parent master is considered orphaned or a free record. There are some synonyms for master-detail, namely parent-child and master-slave.

If we intend to have a customer entity that is instantiated, persisted, and then detached from the persistence context, then a detached customer entity will have a full set of customer invoices to hand, but will not have ready access to the address entity. Potentially, having the full invoices could be architecturally useful for an application that sends data back to a remote client, like a web client. In this circumstance, there is a clear advantage in pre-binding some associations like we have done with the invoices.

Overriding the invoices' collection association to be eagerly loaded (FetchType.EAGER) could have a consequence: what if the Invoice entity actually contains other associations? If the invoice entity also contains more unseen entities that are eagerly bound and therefore fetched from the database, our little application could see possible performance degradation, because the persistence context is loading unnecessary entities.

Deciding if and when to override the default fetching strategy for entities is a delicate matter of application design. As software designers and architects, we certainly have to think rather carefully about how to improve the efficiency of the JPA applications and we have to avoid creating JPA cascades, a snow storm of load states between eagerly loaded entities, which appear to be out-of-control.

Cascades onto dependent entities

JPA allows related entities to cascade lifecycle operations across references. We have seen how to override the fetching of load state for entities. Fetching, however, has nothing to do with lifecycle management. JPA allows the developer to control how a dependent entity is saved or updated, merged, deleted, refreshed, and detached to and from the database with the parent [master] entity. The cascade behavior can be precisely controlled by configuring the cascade attribute on the entity relationship annotations: @OneToOne, @OneToMany, @ManyToOne, and @ManyToMany.

Cascade operations

The enumeration javax.persistence.CascadeType defines the different levels of cascade operations. Here is the source code for it:

public enum CascadeType {
  ALL,
  PERSIST,
  MERGE,
  REMOVE,
  REFRESH,
  DETACH
}

Suppose we want every lifecycle operation on the customer entity to propagate to its address record as well. We then configure the @OneToOne annotation as follows:

@Entity
public class Customer implements java.io.Serializable {
   /* ... */
  
  @OneToOne(cascade=CascadeType.ALL)
  @JoinColumn(name="CUSTOMER_ADDRESS_REF",
    referencedColumnName="ADDRESS_ID")
  private Address address;
  /* ... */
}

Let's understand the meaning of the cascade operation:

Given a Customer entity, when the EntityManager is called with persist() with this object, this operation will also be invoked on the Address object referenced by the field.

The cascade repeats for other EntityManager operations: remove(), merge(), refresh(), and detach(), because the address is annotated with CascadeType.ALL.
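The propagation described above can be mimicked with a toy model in plain Java. This is only a sketch of the idea and not the JPA API: ToyEntityManager is a hypothetical stand-in for an entity manager, and the boolean cascadePersist flag stands in for a cascade setting such as CascadeType.ALL on the relationship.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of cascading persist. Not the JPA API; all names are hypothetical.
class CascadeDemo {
    static class Address {
        final String city;

        Address(String city) {
            this.city = city;
        }
    }

    static class Customer {
        final String name;
        final Address address;
        final boolean cascadePersist; // stands in for the cascade attribute

        Customer(String name, Address address, boolean cascadePersist) {
            this.name = name;
            this.address = address;
            this.cascadePersist = cascadePersist;
        }
    }

    // Hypothetical stand-in for an entity manager that tracks managed objects.
    static class ToyEntityManager {
        final List<Object> managed = new ArrayList<>();

        void persist(Customer customer) {
            managed.add(customer);
            // With cascading enabled, the operation is repeated on the reference.
            if (customer.cascadePersist && customer.address != null) {
                managed.add(customer.address);
            }
        }
    }

    static int managedCountAfterPersist(boolean cascade) {
        ToyEntityManager em = new ToyEntityManager();
        em.persist(new Customer("Acme", new Address("London"), cascade));
        return em.managed.size();
    }

    public static void main(String[] args) {
        System.out.println("With cascade: " + managedCountAfterPersist(true));
        System.out.println("Without cascade: " + managedCountAfterPersist(false));
    }
}
```

The same pattern would apply to the other operations: each one, when cascaded, is simply re-applied to the referenced entity.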

The following table describes the cascade operation enumerations in detail.

| Operation | Description |
| --- | --- |
| ALL | This is equivalent to cascade={PERSIST, MERGE, REMOVE, REFRESH, DETACH}. |
| PERSIST | Specifies that the entity manager saves or updates the dependent entity to the database when the master entity is also persisted. |
| MERGE | Specifies that the entity manager merges the dependent entity with the existing copy in the database when the master entity is also merged. |
| REMOVE | Specifies that the entity manager removes the dependent entity from the database when the master entity is also removed. |
| REFRESH | Specifies that the entity manager refreshes the dependent entity from the database when the master entity is also refreshed. |
| DETACH | Specifies that the entity manager detaches the dependent entity when the master entity is also detached. (Since JPA 2.0) |

The cascade annotation attribute provides the engineer a flexible way to configure the lifecycle of dependent entities. The control provided can save the developer's time and avoid engineers having to write boilerplate code themselves that cascades database operations.

Removal of orphans in relationships

JPA allows engineers to configure the behavior of orphans in a one-to-one or one-to-many relationship. An orphan is an entity instance that is pending removal because it was removed from the relationship (the collection), or because it was replaced by a new entity (in the collection). The issue here is that when we remove an entity from a collection, we have a dependent detail entity that is no longer referenced by the master. This is reflected inside the target database with a record that is no longer being used. Hence the database row becomes orphaned.

For example, suppose we have a customer record already persisted to the database with its address. If we replace the reference in the Customer entity with a null pointer, what do we want to happen in the database? Clearly we want to remove the ADDRESS row from the database table.

Likewise, as we continue the example, if one of the Invoice entities is removed from the list collection of invoices in the Customer entity, then the Invoice is no longer referenced from the master Customer record. We possibly have an INVOICE row in the database table that is orphaned and no longer used by the application.

JPA allows the orphans to be removed only on @OneToOne and @OneToMany relationships. This attribute is called orphanRemoval and it expects a Boolean value, which by default is set to false.

Here is an example of the use of orphan removal applied only to the invoices in the customer entity:

@Entity
public class Customer implements java.io.Serializable {	
  /* ... */	
  @OneToMany(orphanRemoval=true)
  private List<Invoice> invoices;
  public List<Invoice> getInvoices() { return invoices; }
  public void setInvoices( List<Invoice> invoices ) { 
    this.invoices = invoices; }
}

We apply orphan removal to the dependent Invoice entities in the Customer entity code example. The JPA provider will act should one or more of the Invoice instances be removed from the customer's collection of invoices. The orphan removal takes place when the EntityManager flushes the persistence context with the master Customer entity; it will then automatically remove the orphaned invoice records from the database table.

Finally, the removal of orphans is entirely separate from the CascadeType.REMOVE operation. It happens independently of the cascade operations, if set.
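As a rough illustration of the effect of orphan removal, the following plain-Java sketch models the INVOICE table as a set of row identifiers and the flush as a method call. It is a simplification under stated assumptions: a single customer owns every row, and the names OrphanRemovalDemo, invoiceTable, and flush are hypothetical rather than part of JPA.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model of orphan removal at flush time. All names are hypothetical.
class OrphanRemovalDemo {
    // Rows currently held in the simulated INVOICE table.
    static Set<String> invoiceTable = new HashSet<>();

    static class Customer {
        final List<String> invoices = new ArrayList<>();
    }

    // Simulated flush: with orphan removal enabled, rows that the master
    // no longer references are deleted from the table.
    static void flush(Customer customer, boolean orphanRemoval) {
        if (orphanRemoval) {
            invoiceTable.retainAll(customer.invoices);
        }
    }

    static int rowsAfterRemovingOneInvoice(boolean orphanRemoval) {
        invoiceTable = new HashSet<>();
        Customer customer = new Customer();
        for (String id : new String[] {"INV-1", "INV-2", "INV-3"}) {
            customer.invoices.add(id);
            invoiceTable.add(id);
        }
        customer.invoices.remove("INV-2"); // INV-2 is now unreferenced
        flush(customer, orphanRemoval);
        return invoiceTable.size();
    }

    public static void main(String[] args) {
        System.out.println("Rows with orphanRemoval=true: "
            + rowsAfterRemovingOneInvoice(true));
        System.out.println("Rows with orphanRemoval=false: "
            + rowsAfterRemovingOneInvoice(false));
    }
}
```

With the flag off, the unreferenced row simply stays in the table, which is exactly the orphaned INVOICE row described earlier.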

Let us now look at the last method of finessing entity beans under JPA—how to configure automatic generation of primary key values for entities.

Generated values and primary keys

Many relational databases support the automatic generation of primary keys, which can be extremely useful when inserting new records into a database table. However, the database vendors traditionally provide non-portable ways to achieve auto incrementing integers. Some databases support the creation of database sequence types, some databases have a master incremental table value or view, and some databases have some completely novel schemes to generate unique identifiers.

The JPA specification allows the Java developer to use a strategy in order to create primary keys automatically. The key to defining the strategy is in the annotation, which we have already seen, called @javax.persistence.GeneratedValue. This annotation only supports simple primary keys. The strategy attribute has the following definition:

public enum GenerationType {TABLE, SEQUENCE, IDENTITY, AUTO };

The enumeration javax.persistence.GenerationType has four values, described in the following table:

| Value | Description |
| --- | --- |
| TABLE | Specifies that the persistence provider generates primary keys for the entity from the supplied database table, which is defined by the additional generator attribute. |
| SEQUENCE | Specifies that the persistence provider generates primary keys for the entity from the supplied database sequence, which is defined by the additional generator attribute. WARNING: The sequence strategy is not portable across database vendors. |
| IDENTITY | Specifies that the persistence provider generates primary keys for the entity from a special database identity column. WARNING: The identity strategy is not portable across database vendors. |
| AUTO | Specifies that the persistence provider picks a strategy for the entity that is appropriate to the database vendor in order to generate primary keys. The AUTO strategy may expect a particular sequence, table, or identity, or it may generate one of them. This is the most portable of the strategies. |

Table auto increment

The GenerationType.TABLE enumeration is one of the most portable of the settings. A database table is created or updated by the persistence provider with two columns. The first column is the sequence name that requires the increment and the second column is the current value.

Let us modify the earlier customer entity to use a table identity for its customer ID:

@Entity
public class Customer implements java.io.Serializable {
  @Id
  @GeneratedValue(strategy=GenerationType.TABLE,
      generator="CUSTOMER_SEQ")
  @Column(name="CUST_ID")
  private int id;
  
  /* ... */
}

The Customer entity now uses auto increment through the database table strategy, and it declares a sequence name called CUSTOMER_SEQ. The persistence provider may create a table called SEQUENCE_TABLE with the columns SEQUENCE_NAME and SEQUENCE_VALUE.

Here is what this table will look like:

| SEQUENCE_NAME | SEQUENCE_VALUE |
| --- | --- |
| CUSTOMER_SEQ | 146273 |
| INVOICE_SEQ | 23941580 |
| EMPLOYEE_SEQ | 2081 |

There is a row for each sequence in the table. Every time the persistence provider requires a primary key for a new entity, it will read the current value, increment it, and store it back into SEQUENCE_TABLE. The new current value, which was just incremented, is the one supplied to the entity as a new primary key.

The database table is most likely shared with other entities in the same persistence context, and therefore the database schema.

The TABLE strategy is portable across different database vendors, because it is simply a regular database table generated by the persistence provider, and it gives the engineer control of how sequences are created for the application. The table and how the table is incremented can be configured with, say, pre-allocation, which might be important for data population insertion performance. Pre-allocation is very useful when lots of insertions are happening.
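The benefit of pre-allocation can be sketched in plain Java. The following toy allocator reserves a block of identifiers with a single update of a simulated sequence table and then hands them out from memory. It illustrates the principle only; the exact allocation algorithm of a real persistence provider may differ, and the names TableGeneratorDemo, IdAllocator, and allocate are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of table-based key generation with pre-allocation.
// Not a provider implementation; all names are hypothetical.
class TableGeneratorDemo {
    // Simulated SEQUENCE_TABLE: SEQUENCE_NAME -> SEQUENCE_VALUE.
    static final Map<String, Long> sequenceTable = new HashMap<>();
    static int tableUpdates = 0;

    static class IdAllocator {
        private final String name;
        private final int allocationSize;
        private long next = 0;  // next id to hand out
        private long limit = 0; // first id beyond the reserved block

        IdAllocator(String name, int allocationSize) {
            this.name = name;
            this.allocationSize = allocationSize;
        }

        long nextId() {
            if (next >= limit) {
                // Reserve a whole block with one read-increment-write cycle.
                long current = sequenceTable.getOrDefault(name, 0L);
                sequenceTable.put(name, current + allocationSize);
                tableUpdates++;
                next = current + 1;
                limit = current + allocationSize + 1;
            }
            return next++;
        }
    }

    // Returns { last id handed out, number of table updates }.
    static long[] allocate(int count, int allocationSize) {
        sequenceTable.clear();
        tableUpdates = 0;
        IdAllocator allocator = new IdAllocator("CUSTOMER_SEQ", allocationSize);
        long last = 0;
        for (int i = 0; i < count; i++) {
            last = allocator.nextId();
        }
        return new long[] { last, tableUpdates };
    }

    public static void main(String[] args) {
        long[] blocks = allocate(60, 25);
        long[] single = allocate(60, 1);
        System.out.println("Block size 25: table updates = " + blocks[1]);
        System.out.println("Block size 1:  table updates = " + single[1]);
    }
}
```

Handing out 60 identifiers takes three table updates with a block size of 25, compared with 60 updates when every identifier needs its own round trip.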

There are issues with the TABLE strategy, namely to do with concurrent access. If the database table is shared between two or more JVMs in a clustered environment without some synchronization of access to the underlying database table, an application could cause inconsistency issues and, of course, failures to insert records when generated keys collide.

JPA also provides a special annotation called @javax.persistence.TableGenerator, which can further configure generation of primary keys from the table strategy. This annotation requires a reference name, the generator name, which is referenced by the @GeneratedValue annotation. Using the @TableGenerator annotation, a developer can set the initial value, the pre-allocation size, optionally the database table, catalog, or schema, and also the names of the key and value columns in that table.

Here is a revised example of the customer entity that now uses the @TableGenerator annotation:

@Entity
public class Customer implements java.io.Serializable {
    @Id
    @GeneratedValue(strategy=GenerationType.TABLE,
        generator="CustomerSeq")
    @TableGenerator(
        name="CustomerSeq",
        catalog="APP_KEYS",
        table="APP_IDS_TABLE",
        pkColumnName="SEQ_KEY",
        valueColumnName="SEQ_VALUE",
        pkColumnValue="CUSTOMER_ID",
        initialValue=1000000,
        allocationSize=25)
    private int id;
	
    /* ... */
}

This example specifies that the primary key id of the entity Customer is generated with the table strategy. The generator is named CustomerSeq, and we specify an explicit database catalog called APP_KEYS, which is an optional attribute. The actual database table is called APP_IDS_TABLE, with the primary key column named SEQ_KEY and the value column named SEQ_VALUE. The pkColumnValue attribute, CUSTOMER_ID, is the value stored in the SEQ_KEY column that identifies this generator's row in the table; it does not name the entity's own primary key column. Because there is no @Column annotation here, the entity's primary key column name defaults to the one derived from the field name, id. The @TableGenerator specifies an initial value of 1000000 and a pre-allocation size of 25 (the default value is 50).

Overall, the @TableGenerator annotation allows developers to have more control of the auto-generation of primary keys for this strategy, rather than just declaring an identity with the @GeneratedValue.

Sequence auto increment

The GenerationType.SEQUENCE enumeration specifies that the primary key of the entity is populated according to a database sequence. Database sequences are implemented only by some database vendors, such as Oracle, DB2, Postgres, and Apache Derby. Therefore, using the SEQUENCE strategy is an implementation concern if you choose it for an application.

JPA provides an additional annotation, @javax.persistence.SequenceGenerator, that can give more precise control of how the sequence is generated. The annotation allows the allocation size to be defined, as well as the name of the sequence itself.

Let us modify the earlier customer entity to use a sequence identity for its customer ID:

@Entity
public class Customer implements java.io.Serializable {
  @Id
  @GeneratedValue(strategy=GenerationType.SEQUENCE,
      generator="CUSTOMER_SEQ")
  @SequenceGenerator(name="CUSTOMER_SEQ",
      sequenceName="CUSTOMER_SEQ",
      initialValue=3000000, allocationSize=50)
  @Column(name="CUST_ID")
  private int id;
  
  /* ... */
}

The Customer entity now uses auto increment through the sequence strategy, and it declares a sequence name called CUSTOMER_SEQ. The persistence provider may create a database vendor-specific sequence object called CUSTOMER_SEQ that exposes a single integral value, say NEXT_VAL.

The sequence object will have an increment size, a starting value, and an allocation size. Some database providers that support sequence objects allow them to cycle around, although this feature is not supported by the current JPA 2.1 specification.

Every time the persistence provider requires a new value for the entity, it will query the sequence object for the next value. The database takes care of automatically incrementing the value itself.

Although the sequence strategy is least portable, it has the benefit of being the best able to support concurrency across JVMs; and sequence objects are efficient in pre-allocation of primary key identities.

Identity auto increment

The GenerationType.IDENTITY enumeration specifies that the primary key of the entity is populated according to a database-specific identity column. Database identity columns are implemented only by some database vendors, such as MySQL, SQL Server, Sybase, and Apache Derby.

An identity column is a column that stores numbers that increment by one with each insertion. Identity columns are sometimes called auto-increment columns.

Let us modify the earlier customer entity to use an identity column for its customer ID:

@Entity
public class Customer implements java.io.Serializable {
  @Id
  @GeneratedValue(strategy=GenerationType.IDENTITY)
  @Column(name="CUST_ID")
  private int id;
  
  /* ... */
}

It is very easy to create an identity with the annotation @GeneratedValue. JPA provides no other special annotation for identity columns.

There are drawbacks for identity columns. The first concern is the fact that the next primary key is only available after the record has been inserted into the database. The second concern is that it is not possible to have a pre-allocation of primary key identities, and it could be a performance problem when and if your application produces proportionally more insertions than reads on certain entity beans.
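The first drawback can be illustrated with a small plain-Java model of an identity column. It is a sketch only, with hypothetical names (IdentityColumnDemo, CustomerTable, insert); the point is that the key does not exist until the insert has happened, so it cannot be reserved ahead of time.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of an identity (auto-increment) column. All names are hypothetical.
class IdentityColumnDemo {
    static class Customer {
        long id; // 0 means "no key assigned yet"
        final String name;

        Customer(String name) {
            this.name = name;
        }
    }

    // Stands in for a table whose key column is generated by the database.
    static class CustomerTable {
        private long lastIdentity = 0;
        final List<Customer> rows = new ArrayList<>();

        // The key is produced as a side effect of the insert itself.
        void insert(Customer customer) {
            customer.id = ++lastIdentity;
            rows.add(customer);
        }
    }

    public static void main(String[] args) {
        CustomerTable table = new CustomerTable();
        Customer customer = new Customer("Acme");
        System.out.println("Id before insert: " + customer.id);
        table.insert(customer);
        System.out.println("Id after insert: " + customer.id);
    }
}
```

Because each key is produced by its own insert, there is no block of keys to hand out in advance, which is the second drawback mentioned above.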
