The largest part of an API that persists objects to a relational database ends up being the object-relational mapping (ORM) component. The topic of ORM usually includes everything from how the object state is mapped to the database columns to how to issue queries across the objects. We are focusing this chapter primarily on how to define and map entity state to the database, emphasizing the simple manner in which it can be done.
This chapter introduces the basics of mapping fields to database columns and then goes on to show how to map and automatically generate entity identifiers. We go into some detail about different kinds of relationships and illustrate how they are mapped from the domain model to the data model.
Idiomatic persistence: Lets you write persistence classes as ordinary object-oriented classes
High performance: Offers fetching and locking techniques for tuning data access
Reliability: Gives Jakarta Persistence programmers a stable foundation
Persistence Annotations
We have shown in previous chapters how annotations have been used extensively both in the Enterprise Beans and Jakarta Persistence specifications. We discuss persistence and mapping metadata in significant detail, and because we use annotations to explain the concepts, it is worth reviewing a few things about the annotations before we get started.
Persistence annotations can be applied at three different levels: class, method, and field. To annotate any of these, place the annotation in front of the definition of the artifact being annotated. Sometimes we put the annotation on the same line, just before the class, method, or field; other times we put it on the line above. The choice is purely a matter of preference, and we pick whichever format reads best, which usually depends on how long the annotation is.
The Jakarta Persistence annotations were designed to be readable, easy to specify, and flexible enough to allow different combinations of metadata. Most annotations are specified as siblings instead of being nested inside each other, meaning that multiple annotations can annotate the same class, field, or property instead of having annotations embedded within other annotations. As with all trade-offs, the piper must be paid, however, and the cost of flexibility is that many possible permutations of top-level metadata will be syntactically correct but semantically invalid. The compiler will be of no use, but the provider runtime will often do some basic checking for improper annotation groupings. The nature of annotations, however, is that when they are unexpected, they will often just not get noticed at all. This is worth remembering when attempting to understand behavior that might not match what you thought you specified in the annotations. It could be that one or more of the annotations are being ignored.
The mapping annotations can be categorized as being in one of two categories: logical annotations and physical annotations. The annotations in the logical group are those that describe the entity model from an object modeling view. They are tightly bound to the domain model and are the sort of metadata that you might want to specify in UML or any other object modeling language or framework. The physical annotations relate to the concrete data model in the database. They deal with tables, columns, constraints, and other database-level artifacts that the object model might never be aware of otherwise.
We use both types of annotations throughout the examples and to demonstrate the mapping metadata. Understanding and being able to distinguish between these two levels of metadata will help you make decisions about where to declare metadata, and where to use annotations and XML. As you will see in Chapter 13, there are XML equivalents to all the mapping annotations described in this chapter, giving you the freedom to use the approach that best suits your development needs.
Accessing Entity State
The mapped state of an entity must be accessible to the provider at runtime, so that when it comes time to write the data out, it can be obtained from the entity instance and stored in the database. Similarly, when the state is loaded from the database, the provider runtime must be able to insert it into a new entity instance. The way the state is accessed in the entity is called the access mode.
In Chapter 2, you learned that there are two different ways to specify persistent entity state: you can either annotate the fields or annotate the JavaBean-style properties. The mechanism that you use to designate the persistent state is the same as the access mode that the provider uses to access that state. If you annotate fields, the provider will get and set the fields of the entity using reflection. If the annotations are set on the getter methods of properties, those getter and setter methods will be invoked by the provider to access and set the state.
Field Access
Annotating the fields of the entity will cause the provider to use field access to get and set the state of the entity. Getter and setter methods might or might not be present, but if they are present, they are ignored by the provider. All fields must be declared as protected, package-private, or private. Public fields are disallowed because they would open up the state fields to access by any unprotected class in the VM. Doing so is not just an obviously bad practice but could also defeat the provider implementation. Of course, the other access levels do not prevent classes within the same package or hierarchy from doing the same thing, but there is an obvious trade-off between what should be constrained and what should be recommended. Other classes must use the methods of an entity to access its persistent state, and even the entity class itself should really manipulate the fields directly only during initialization.
Using Field Access
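The provider's side of field access can be sketched in plain Java with reflection. This is a minimal illustration, not how any particular provider is implemented. The Employee class and the readField/writeField helpers are hypothetical, and the mapping annotations are left out so the example runs without a persistence provider. The getter and setter are never called: the setter's trim() is bypassed, and the raw field value comes back.

```java
import java.lang.reflect.Field;

// Hypothetical entity; under field access the @Id and mapping annotations would sit on the fields.
class Employee {
    private long id;
    private String name;

    // These accessors exist, but a provider using field access ignores them.
    public String getName() { return "via getter: " + name; }
    public void setName(String name) { this.name = name.trim(); }
}

class FieldAccessDemo {
    // Reads a field directly, the way a provider using field access might.
    static Object readField(Object entity, String fieldName) {
        try {
            Field f = entity.getClass().getDeclaredField(fieldName);
            f.setAccessible(true); // needed because the field is private
            return f.get(entity);  // primitive values come back boxed
        } catch (ReflectiveOperationException ex) {
            throw new IllegalStateException(ex);
        }
    }

    // Writes a field directly, bypassing any logic in the setter.
    static void writeField(Object entity, String fieldName, Object value) {
        try {
            Field f = entity.getClass().getDeclaredField(fieldName);
            f.setAccessible(true);
            f.set(entity, value); // boxed values are unwrapped for primitive fields such as id
        } catch (ReflectiveOperationException ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        Employee e = new Employee();
        writeField(e, "id", 42L);
        writeField(e, "name", "  John  ");                      // setName's trim() is not applied
        System.out.println(readField(e, "id"));                 // 42
        System.out.println("[" + readField(e, "name") + "]");   // [  John  ]
    }
}
```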
Property Access
When property access mode is used, the same contract as for JavaBeans applies, and there must be getter and setter methods for the persistent properties. The type of the property is determined by the return type of the getter method and must be the same as the type of the single parameter of the setter method. Both methods must have either public or protected visibility. The mapping annotations for a property must be on the getter method.
Using Property Access
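With property access, the provider works through the accessor methods instead. The plain-Java sketch below shows this under some assumptions. The Employee class and the getProperty/setProperty helpers are hypothetical, and the annotations are omitted. The helpers use the getter's return type as the property type to find the matching setter, following the contract described above. Class.getMethod finds only public methods, while the specification also allows protected accessors, so a real provider needs a broader lookup.

```java
import java.lang.reflect.Method;

// Hypothetical entity; under property access the mapping annotations would sit on the getters.
class Employee {
    private String storedName; // the backing field's name is irrelevant to property access

    public String getName() { return storedName; }
    public void setName(String name) { this.storedName = name.trim(); } // runs when the provider sets state
}

class PropertyAccessDemo {
    private static String capitalize(String property) {
        return Character.toUpperCase(property.charAt(0)) + property.substring(1);
    }

    // Invokes the JavaBean getter for the property.
    static Object getProperty(Object entity, String property) {
        try {
            Method getter = entity.getClass().getMethod("get" + capitalize(property));
            return getter.invoke(entity);
        } catch (ReflectiveOperationException ex) {
            throw new IllegalStateException(ex);
        }
    }

    // The getter's return type is the property type, so it selects the matching setter.
    static void setProperty(Object entity, String property, Object value) {
        try {
            Class<?> type = entity.getClass().getMethod("get" + capitalize(property)).getReturnType();
            Method setter = entity.getClass().getMethod("set" + capitalize(property), type);
            setter.invoke(entity, value);
        } catch (ReflectiveOperationException ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        Employee e = new Employee();
        setProperty(e, "name", "  John  ");
        System.out.println("[" + getProperty(e, "name") + "]"); // [John]
    }
}
```

Unlike field access, any logic in the accessors (here, trimming) runs every time the provider reads or writes the state.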
Mixed Access
It is also possible to combine field access with property access within the same entity hierarchy, or even within the same entity. This will not be a very common occurrence, but can be useful, for example, when an entity subclass is added to an existing hierarchy that uses a different access type. Adding an @Access annotation with a specified access mode on the subclass entity will cause the default access type to be overridden for that entity subclass.
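The @Access annotation is also useful when you need to perform a simple transformation on the data when reading from or writing to the database. Typically the entity uses field access, but for the attribute that needs transforming you define a getter/setter method pair that performs the transformation and use property access for that one attribute. In general, there are three essential steps to add a persistent field or property that is accessed differently from the default access mode for that entity. First, annotate the entity class with @Access to set its default access mode explicitly. Second, annotate the field or property that is to be accessed differently with @Access, specifying the other access type. Third, mark the corresponding field or property that would otherwise be mapped by the default access mode as transient, so that the same state is not mapped twice.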
Consider an Employee entity that has a default access mode of FIELD, but the database column stores the area code as part of the phone number, and we only want to store the area code in the entity phoneNum field if it is not a local number. We can add a persistent property that transforms it accordingly on reads and writes.
Using Combined Access
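The Employee example can be sketched as follows. To keep it runnable without a persistence provider, the annotations described in the text appear only as comments. Several details are assumptions made for illustration: the local area code 613, local numbers held in the entity as seven digits, the PHONE column name, and the getPhoneNumberForDb/setPhoneNumberForDb method names.

```java
// Sketch of the combined-access Employee; annotations are shown as comments only.
// @Entity
// @Access(AccessType.FIELD)         // default access mode made explicit
class Employee {
    // Hypothetical local area code; local numbers are held in the entity without it.
    static final String LOCAL_AREA_CODE = "613";

    // @Id
    private long id;

    // @Transient                     // not mapped again through the default FIELD access
    private String phoneNum;

    public String getPhoneNum() { return phoneNum; }
    public void setPhoneNum(String phoneNum) { this.phoneNum = phoneNum; }

    // @Access(AccessType.PROPERTY)   // this one attribute uses property access
    // @Column(name = "PHONE")        // hypothetical column name
    protected String getPhoneNumberForDb() {
        // A 10-digit value already includes a non-local area code.
        if (phoneNum.length() == 10) {
            return phoneNum;
        }
        return LOCAL_AREA_CODE + phoneNum;
    }

    protected void setPhoneNumberForDb(String num) {
        // Keep the area code in the entity only when it is not the local one.
        if (num.startsWith(LOCAL_AREA_CODE)) {
            phoneNum = num.substring(LOCAL_AREA_CODE.length());
        } else {
            phoneNum = num;
        }
    }
}

class CombinedAccessDemo {
    public static void main(String[] args) {
        Employee e = new Employee();
        e.setPhoneNum("5551234");
        System.out.println(e.getPhoneNumberForDb()); // 6135551234
        e.setPhoneNumberForDb("4165559876");
        System.out.println(e.getPhoneNum());         // 4165559876
    }
}
```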
Mapping to a Table
You saw in Chapter 2 that in the simplest case, mapping an entity to a table does not need any mapping annotations at all. Only the @Entity and @Id annotations need to be specified to create and map an entity to a database table.
Overriding the Default Table Name
Default names are not specified to be either uppercase or lowercase. Most databases are not case-sensitive, so it won’t generally matter whether a vendor uses the case of the entity name or converts it to uppercase. In Chapter 10, we discuss how to delimit database identifiers when the database is set to be case-sensitive.
Setting a Schema
When specified, the schema name will be prepended to the table name when the persistence provider goes to the database to access the table. In this case, the HR schema will be prepended to the EMP table each time the table is accessed.
Some vendors might allow the schema to be included in the name element of the table without having to specify the schema element, such as in @Table(name="HR.EMP"). Support for inlining the name of the schema with the table name is nonstandard.
Setting a Catalog
Mapping Simple Types
The following simple Java types can be mapped as persistent state:
Primitive Java types: byte, int, short, long, boolean, char, float, and double
Wrapper classes of primitive Java types: Byte, Integer, Short, Long, Boolean, Character, Float, and Double
Byte and character array types: byte[], Byte[], char[], and Character[]
Large numeric types: java.math.BigInteger and java.math.BigDecimal
Strings: java.lang.String
Java temporal types: java.util.Date and java.util.Calendar
JDBC temporal types: java.sql.Date, java.sql.Time, and java.sql.Timestamp
Enumerated types: Any system or user-defined enumerated type
Serializable objects: Any system or user-defined serializable type
Sometimes the type of the database column being mapped to is not exactly the same as the Java type. In almost all cases, the provider runtime can convert the type returned by JDBC into the correct Java type of the attribute. If the type from the JDBC layer cannot be converted to the Java type of the field or property, an exception will normally be thrown, although it is not guaranteed.
When the persistent type does not match the JDBC type, some providers might choose to take proprietary action or make a best guess to convert between the two. In other cases, the JDBC driver might be performing the conversion on its own.
When persisting a field or property, the provider looks at the type and ensures that it is one of the persistable types listed earlier. If it is on the list, the provider will persist it using the appropriate JDBC type and pass it through to the JDBC driver. If the type is not on the list and is not serializable, the result is unspecified: the provider might throw an exception or just try to pass the object through to JDBC. You will see in Chapter 10 how converters can be used to extend the list of types that can be persisted in Jakarta Persistence.
An optional @Basic annotation can be placed on a field or property to explicitly mark it as being persistent. This annotation is mostly for documentation purposes and is not required for the field or property to be persistent. If it is not there, then it is implicitly assumed in the absence of any other mapping annotation. Because of the annotation, mappings of simple types are called basic mappings, whether the @Basic annotation is actually present or is just being assumed.
Now that you have seen how you can persist either fields or properties and how they are virtually equivalent in terms of persistence, we will just call them attributes. An attribute is a field or property of a class, and we will use the term attribute from now on to avoid having to continually refer to fields or properties in specific terms.
Column Mappings
The @Basic annotation (or assumed basic mapping in its absence) can be thought of as a logical indication that a given attribute is persistent. The physical annotation that is the companion annotation to the basic mapping is the @Column annotation. Specifying @Column on the attribute indicates specific characteristics of the physical database column that the object model is less concerned about. In fact, the object model might never even need to know to which column it is mapped, and the column name and physical mapping metadata can be located in a separate XML file.
Mapping Attributes to Columns
To put these annotations in context, let’s look at the full table mapping represented by this entity. The first thing to notice is that no @Table annotation exists on the class, so the default table name of EMPLOYEE will be applied to it.
Lazy Fetching
On occasion, it will be known ahead of time that certain portions of an entity will be seldom accessed. In these situations, you can optimize the performance when retrieving the entity by fetching only the data that you expect to be frequently accessed; the remainder of the data can be fetched only when or if it is required. There are many names for this kind of feature, including lazy loading, deferred loading, lazy fetching, on-demand fetching, just-in-time reading, indirection, and others. They all mean pretty much the same thing, which is just that some data might not be loaded when the object is initially read from the database, but will be fetched only when referenced or accessed.
Lazy Field Loading
We are assuming in this example that applications will seldom access the comments in an employee record, so we mark it as being lazily fetched. Note that in this case, the @Basic annotation is not only present for documentation purposes but is also required in order to specify the fetch type for the field. Configuring the comments field to be fetched lazily will allow an Employee instance returned from a query to have the comments field empty. The application does not have to do anything special to get it, however. By simply accessing the comments field, it will be transparently read and filled in by the provider if it was not already loaded.
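The behavior the application sees can be imitated in plain Java. This is only a sketch. Providers typically implement lazy basic attributes with bytecode enhancement (weaving) rather than an explicit loader. The Supplier standing in for the deferred database read is a hypothetical device. The point is that comments stays empty until it is first read, and the load happens only once.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for an Employee whose comments attribute is mapped with @Basic(fetch = FetchType.LAZY).
class Employee {
    private final long id;
    private String comments;                        // left empty when the employee is first read
    private boolean commentsLoaded;
    private final Supplier<String> commentsLoader;  // simulates the provider's deferred read

    Employee(long id, Supplier<String> commentsLoader) {
        this.id = id;
        this.commentsLoader = commentsLoader;
    }

    long getId() { return id; }

    // The first access triggers the load; later accesses reuse the loaded value.
    String getComments() {
        if (!commentsLoaded) {
            comments = commentsLoader.get();
            commentsLoaded = true;
        }
        return comments;
    }

    boolean isCommentsLoaded() { return commentsLoaded; }
}

class LazyDemo {
    public static void main(String[] args) {
        Employee e = new Employee(1L, () -> {
            System.out.println("loading comments for employee 1");
            return "Prefers remote work";
        });
        System.out.println(e.isCommentsLoaded()); // false
        System.out.println(e.getComments());      // triggers the simulated load
        System.out.println(e.getComments());      // no second load
    }
}
```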
Before you use this feature, you should be aware of a few pertinent points about lazy attribute fetching. First and foremost, the directive to lazily fetch an attribute is meant only to be a hint to the persistence provider to help the application achieve better performance. The provider is not required to respect the request because the behavior of the entity is not compromised if the provider goes ahead and loads the attribute. The converse is not true, though, because specifying that an attribute be eagerly fetched might be critical to being able to access the entity state once the entity is detached from the persistence context. We discuss detachment more in Chapter 6 and explore the connection between lazy loading and detachment.
Second, although lazy fetching might appear to be a good idea for certain attributes of an entity, in practice it is almost never a good idea to lazily fetch simple types. There is little to be gained in returning only part of a database row unless you are certain that the state will not be accessed in the entity later on. The only times when lazy loading of a basic mapping should be considered are when there are many columns in a table (e.g., dozens or hundreds) or when the columns are large (e.g., very large character strings or byte strings). In those cases it could take significant resources to load the data, and not loading it could save quite a lot of effort, time, and resources. Otherwise, lazily fetching a subset of object attributes will usually end up being more expensive than eagerly fetching them.
Lazy fetching is quite relevant when it comes to relationship mappings, though, so we discuss this topic later in the chapter.
Large Objects
A common database term for a character or byte-based object that can be very large (up to the gigabyte range) is a large object, or LOB for short. Database columns that can store these types of large objects require special JDBC calls to be accessed from Java. To signal to the provider that it should use the LOB methods when passing and retrieving this data to and from the JDBC driver, an additional annotation must be added to the basic mapping. The @Lob annotation acts as the marker annotation to fulfill this purpose and might appear in conjunction with the @Basic annotation, or it might appear when @Basic is absent and implicitly assumed to be on the mapping.
Because the @Lob annotation is really just qualifying the basic mapping, it can also be accompanied by a @Column annotation when the name of the LOB column needs to be overridden from the assumed default name.
LOBs come in two flavors in the database: character large objects, called CLOBs, and binary large objects, or BLOBs. As their names imply, a CLOB column holds a large character sequence, and a BLOB column can store a large byte sequence. The Java types mapped to BLOB columns are byte[], Byte[], and Serializable types, while char[], Character[], and String objects are mapped to CLOB columns. The provider is responsible for making this distinction based on the type of the attribute being mapped.
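The provider's choice between CLOB and BLOB can be sketched as a small classification function. The LobKind enum and lobKindFor method are hypothetical names, and real providers may also take the column definition into account. One detail is worth noticing. String and every Java array type also implement Serializable. For that reason the character types have to be checked before the general Serializable rule, or they would wrongly be treated as BLOBs.

```java
import java.io.Serializable;

class LobTypeDemo {
    // Local names for the two LOB flavors; not part of any API.
    enum LobKind { CLOB, BLOB }

    // Applies the type rules from the text. Character types are checked first because
    // String and all Java arrays are Serializable and would otherwise fall into BLOB.
    static LobKind lobKindFor(Class<?> type) {
        if (type == String.class || type == char[].class || type == Character[].class) {
            return LobKind.CLOB;
        }
        if (type == byte[].class || type == Byte[].class || Serializable.class.isAssignableFrom(type)) {
            return LobKind.BLOB;
        }
        throw new IllegalArgumentException("Not mappable as a LOB: " + type.getName());
    }

    public static void main(String[] args) {
        System.out.println(lobKindFor(String.class));              // CLOB
        System.out.println(lobKindFor(byte[].class));              // BLOB
        System.out.println(lobKindFor(java.util.ArrayList.class)); // BLOB (Serializable)
    }
}
```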
Mapping a BLOB Column
Enumerated Types
Another of the simple types that might be treated specially is the enumerated type. The values of an enumerated type are constants that can be handled differently depending on the application needs.
As with enumerated types in other languages, the values of an enumerated type in Java have an implicit ordinal assignment that is determined by the order in which they were declared. This ordinal cannot be modified at runtime and can be used to represent and store the values of the enumerated type in the database. Interpreting the values as ordinals is the default way that providers will map enumerated types to the database, and the provider will assume that the database column is an integer type.
Mapping an Enumerated Type Using Ordinals
You can see that mapping EmployeeType is trivially easy to the point where you don’t actually have to do anything at all. The defaults are applied, and everything will just work. The type field will get mapped to an integer TYPE column, and all full-time employees will have an ordinal of 0 assigned to them. Similarly, the other employees will have their types stored in the TYPE column accordingly.
If an enumerated type changes, however, then we have a problem. The persisted ordinal data in the database will no longer apply to the correct value. In this example, if the company benefits policy changed and we started giving additional benefits to part-time employees who worked more than 20 hours per week, we would want to differentiate between the two types of part-time employees. By adding a PART_TIME_BENEFITS_EMPLOYEE value after PART_TIME_EMPLOYEE, we would be causing a new ordinal assignment to occur, where our new value would get assigned the ordinal of 2 and CONTRACT_EMPLOYEE would get 3. This would have the effect of causing all the contract employees on record to suddenly become part-time employees with benefits, clearly not the result that we were hoping for.
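The ordinal shift can be reproduced in a few lines of plain Java. The two versions of the enumerated type are named EmployeeTypeV1 and EmployeeTypeV2 only so both can coexist in one file. In the application there would be a single EmployeeType that changes over time. The toColumn/fromColumn helpers are hypothetical stand-ins for what the provider does with an ordinal mapping.

```java
// The enumerated type before and after the benefits policy change described in the text.
enum EmployeeTypeV1 { FULL_TIME_EMPLOYEE, PART_TIME_EMPLOYEE, CONTRACT_EMPLOYEE }
enum EmployeeTypeV2 { FULL_TIME_EMPLOYEE, PART_TIME_EMPLOYEE, PART_TIME_BENEFITS_EMPLOYEE, CONTRACT_EMPLOYEE }

class OrdinalDemo {
    // Ordinal mapping: the integer written to the TYPE column is the declaration position.
    static int toColumn(EmployeeTypeV1 type) {
        return type.ordinal();
    }

    // Reading the same integer back after the new value has been inserted.
    static EmployeeTypeV2 fromColumn(int value) {
        return EmployeeTypeV2.values()[value];
    }

    public static void main(String[] args) {
        int stored = toColumn(EmployeeTypeV1.CONTRACT_EMPLOYEE);   // 2
        System.out.println(stored + " -> " + fromColumn(stored));  // 2 -> PART_TIME_BENEFITS_EMPLOYEE
    }
}
```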
We could go through the database and adjust all the Employee entities to have their correct type, but if the employee type is used elsewhere, then we would need to make sure that they were all fixed as well. This is not a good maintenance situation to be in.
A better solution would be to store the name of the value as a string instead of storing the ordinal. This would isolate us from any changes in declaration and allow us to add new types without having to worry about the existing data. We can do this by adding an @Enumerated annotation on the attribute and specifying a value of STRING.
The @Enumerated annotation actually allows an EnumType to be specified, and the EnumType is itself an enumerated type that defines values of ORDINAL and STRING. While it is somewhat ironic that an enumerated type is being used to indicate how the provider should represent enumerated types, it is wholly appropriate. Because the default value of @Enumerated is ORDINAL, specifying @Enumerated(ORDINAL) is useful only when you want to make this mapping explicit.
Mapping an Enumerated Type Using Strings
Note that using strings will solve the problem of inserting additional values in the middle of the enumerated type, but it will leave the data vulnerable to changes in the names of the values. For instance, if we wanted to change PART_TIME_EMPLOYEE to PT_EMPLOYEE, then we would be in trouble. This is a less likely problem, though, because changing the names of an enumerated type would cause all the code that uses the enumerated type to have to change also. This would be a bigger bother than reassigning values in a database column.
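The trade-off of @Enumerated(EnumType.STRING) can be seen in plain Java. Writing name() and reading back with Enum.valueOf survives a value inserted in the middle. It fails as soon as a stored name is renamed. As before, the versioned enum names are only a device to put several versions in one file. The toColumn/fromColumn helpers are hypothetical.

```java
enum EmployeeTypeV1 { FULL_TIME_EMPLOYEE, PART_TIME_EMPLOYEE, CONTRACT_EMPLOYEE }
// A value inserted in the middle: names of existing values are unchanged.
enum EmployeeTypeV2 { FULL_TIME_EMPLOYEE, PART_TIME_EMPLOYEE, PART_TIME_BENEFITS_EMPLOYEE, CONTRACT_EMPLOYEE }
// A value renamed: PART_TIME_EMPLOYEE becomes PT_EMPLOYEE.
enum EmployeeTypeV3 { FULL_TIME_EMPLOYEE, PT_EMPLOYEE, PART_TIME_BENEFITS_EMPLOYEE, CONTRACT_EMPLOYEE }

class StringEnumDemo {
    // String mapping: the value's name is written to the column.
    static String toColumn(Enum<?> value) {
        return value.name();
    }

    // Reading it back resolves the name; an unknown name raises IllegalArgumentException.
    static <E extends Enum<E>> E fromColumn(Class<E> type, String value) {
        return Enum.valueOf(type, value);
    }

    public static void main(String[] args) {
        String stored = toColumn(EmployeeTypeV1.CONTRACT_EMPLOYEE);
        System.out.println(fromColumn(EmployeeTypeV2.class, stored)); // CONTRACT_EMPLOYEE
        try {
            fromColumn(EmployeeTypeV3.class, toColumn(EmployeeTypeV1.PART_TIME_EMPLOYEE));
        } catch (IllegalArgumentException ex) {
            System.out.println("rename broke existing rows: " + ex.getMessage());
        }
    }
}
```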
In general, storing the ordinal is the best and most efficient way to store enumerated types as long as the likelihood of additional values inserted in the middle is not high. New values could still be added on the end of the type without any negative consequences.
One final note about enumerated types is that they are defined quite flexibly in Java. In fact, it is even possible to have values that contain state. There is currently no support within Jakarta Persistence for mapping state contained within enumerated values. Neither is there support for the compromise position between STRING and ORDINAL of explicitly mapping each enumerated value to a dedicated numeric value different from its compiler-assigned ordinal value. More extensive enumerated support is being considered for future releases.
Temporal Types
Temporal types are the set of time-based types that can be used in persistent state mappings. The list of supported temporal types includes the three java.sql types—java.sql.Date, java.sql.Time, and java.sql.Timestamp—and the two java.util types, java.util.Date and java.util.Calendar.
The java.sql types are completely hassle-free. They act just like any other simple mapping type and do not need any special consideration. The two java.util types need additional metadata, however, to indicate which of the JDBC java.sql types to use when communicating with the JDBC driver. This is done by annotating them with the @Temporal annotation and specifying the JDBC type as a value of the TemporalType enumerated type. Its three values, DATE, TIME, and TIMESTAMP, represent the three java.sql types.
Mapping Temporal Types
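In a mapped entity, a java.util.Date or java.util.Calendar attribute carries @Temporal with TemporalType.DATE, TIME, or TIMESTAMP. The conversion this metadata selects can be sketched in plain Java. The Kind enum and toJdbc methods are hypothetical names that only mirror TemporalType. The sketch copies the millisecond value as-is and does not normalize the unused date or time part, and real providers and drivers may convert differently.

```java
import java.util.Calendar;
import java.util.Date;

class TemporalDemo {
    // Local mirror of the three TemporalType values named in the text (not the Jakarta enum itself).
    enum Kind { DATE, TIME, TIMESTAMP }

    // Converts a java.util.Date attribute to the java.sql type selected by the temporal kind.
    static Date toJdbc(Date value, Kind kind) {
        long millis = value.getTime();
        switch (kind) {
            case DATE:      return new java.sql.Date(millis);
            case TIME:      return new java.sql.Time(millis);
            case TIMESTAMP: return new java.sql.Timestamp(millis);
            default:        throw new IllegalArgumentException("Unknown kind: " + kind);
        }
    }

    // Calendar attributes go through the same conversion via their Date value.
    static Date toJdbc(Calendar value, Kind kind) {
        return toJdbc(value.getTime(), kind);
    }

    public static void main(String[] args) {
        Date now = new Date();
        for (Kind k : Kind.values()) {
            System.out.println(k + " -> " + toJdbc(now, k).getClass().getName());
        }
    }
}
```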
Like the other varieties of basic mappings, the @Column annotation can be used to override the default column name.
Transient State
Attributes that are part of a persistent entity but not intended to be persistent can either be modified with the transient modifier in Java or be annotated with the @Transient annotation. If either is specified, the provider runtime will not apply its default mapping rules to the attribute on which it was specified.
Using a Transient Field
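How the transient modifier takes an attribute out of the default mapping rules can be sketched with reflection. The Employee class, its translatedName field, and the persistentFieldNames helper are hypothetical. A real provider would also skip fields annotated with @Transient. This sketch cannot detect that annotation without the Jakarta Persistence API on the classpath, so it only checks the Java modifier.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical entity with one transient field.
class Employee {
    private long id;
    private String name;
    private transient String translatedName; // excluded from default mapping by the transient modifier
    static int instanceCount;                // static fields are not per-instance state
}

class TransientDemo {
    // Collects the fields a default-mapping pass would consider persistent.
    static Set<String> persistentFieldNames(Class<?> type) {
        Set<String> names = new TreeSet<>(); // sorted, since getDeclaredFields() order is unspecified
        for (Field f : type.getDeclaredFields()) {
            int mods = f.getModifiers();
            if (Modifier.isTransient(mods) || Modifier.isStatic(mods) || f.isSynthetic()) {
                continue;
            }
            names.add(f.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(persistentFieldNames(Employee.class)); // [id, name]
    }
}
```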
Mapping the Primary Key
Every entity that is mapped to a relational database must have a mapping to a primary key in the table. You have already learned the basics of how the @Id annotation indicates the identifier of the entity. In this section, you explore simple identifiers and primary keys in a little more depth and learn how you can let the persistence provider generate unique identifier values.
When an entity identifier is composed of only a single attribute, it's called a simple identifier.
Overriding the Primary Key Column
The same defaulting rules apply to ID mappings as to basic mappings, which is that the name of the column is assumed to be the same as the name of the attribute. Just as with basic mappings, the @Column annotation can be used to override the column name that the ID attribute is mapped to.
Primary keys are assumed to be insertable, but not nullable or updatable. When overriding a primary key column, the nullable and updatable elements should not be overridden. Only in the very specific circumstance of mapping the same column to multiple fields/relationships (as described in Chapter 10) should the insertable element be set to false.
Primary Key Types
A simple primary key is normally one of the following types:
Primitive Java types: byte, int, short, long, and char
Wrapper classes of primitive Java types: Byte, Integer, Short, Long, and Character
String: java.lang.String
Large numeric type: java.math.BigInteger
Temporal types: java.util.Date and java.sql.Date
Floating-point types such as float and double are also permitted, as are the Float and Double wrapper classes and java.math.BigDecimal, but they are discouraged because of rounding error and the untrustworthiness of equality comparisons (== or equals()) when applied to them. Using floating-point types for primary keys is a risky endeavor and is definitely not recommended.
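Both equality problems can be demonstrated with plain Java. An identifier computed as 0.1 + 0.2 does not match one stored as 0.3. BigDecimal.equals compares scale as well as value, so 1.0 and 1.00 are different keys. The map and helper names are hypothetical.

```java
import java.math.BigDecimal;
import java.util.HashMap;
import java.util.Map;

class FloatingKeyDemo {
    // Looks up an entry stored under key 0.3 using a key computed as 0.1 + 0.2.
    static String lookupComputedKey() {
        Map<Double, String> employeesById = new HashMap<>();
        employeesById.put(0.3, "Alice");
        double computedId = 0.1 + 0.2; // actually 0.30000000000000004
        return employeesById.get(computedId);
    }

    // BigDecimal.equals also compares scale, so numerically equal values can be unequal.
    static boolean sameKey(String a, String b) {
        return new BigDecimal(a).equals(new BigDecimal(b));
    }

    public static void main(String[] args) {
        System.out.println(0.1 + 0.2);              // 0.30000000000000004
        System.out.println(lookupComputedKey());    // null
        System.out.println(sameKey("1.0", "1.00")); // false
    }
}
```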
Identifier Generation
Sometimes applications do not want to be bothered with trying to define and ensure uniqueness in some aspect of their domain model and are content to let the identifier values be automatically generated for them. This is called ID generation and is specified by the @GeneratedValue annotation.
When ID generation is enabled, the persistence provider will generate an identifier value for every instance of that entity type. Once the identifier value is obtained, the provider will insert it into the newly persisted entity; however, depending on the way it is generated, it might not actually be present in the object until the entity has been inserted in the database. In other words, the application cannot rely on being able to access the identifier until after either a flush has occurred or the transaction has completed.
Applications can choose one of four ID generation strategies by specifying it in the strategy element. The value can be any one of the AUTO, TABLE, SEQUENCE, or IDENTITY values of the GenerationType enumerated type.
Table and sequence generators can be specifically defined and then reused by multiple entity classes. These generators are named and are globally accessible to all the entities in the persistence unit.
Automatic ID Generation
If an application does not care what kind of generation is used by the provider but wants generation to occur, it can specify a strategy of AUTO. This means that the provider will use whatever strategy it wants to generate identifiers. Listing 4-14 shows an example of using automatic ID generation. This will cause an identifier value to be created by the provider and inserted into the id field of each Employee entity that gets persisted.
It is not explicitly required that the entity identifier field be an integral type, but it is typically the only type that AUTO will create. We recommend that long be used to accommodate the full extent of the generated identifier domain.
Using Auto ID Generation
There is a catch to using AUTO, though. The provider gets to pick its own strategy to store the identifiers, but it needs to have some kind of persistent resource in order to do so. For example, if it chooses a table-based strategy, it needs to create a table; if it chooses a sequence-based strategy, it needs to create a sequence. The provider can’t always rely on the database connection that it obtains from the server to have permissions to create a table in the database. This is normally a privileged operation that is often restricted to the DBA. There will need to be some kind of creation phase or schema generation to cause the resource to be created before the AUTO strategy is able to function.
The AUTO mode is really a generation strategy for development or prototyping. It works well as a means of getting you up and running more quickly when the database schema is being generated. In any other situation, it would be better to use one of the other generation strategies discussed in the later sections.
ID Generation Using a Table
The most flexible and portable way to generate identifiers is to use a database table. Not only will it port to different databases but it also allows for storing multiple different identifier sequences for different entities within the same table.
An ID generation table should have two columns. The first column is a string type used to identify the particular generator sequence. It is the primary key for all the generators in the table. The second column is an integral type that stores the actual ID sequence that is being generated. The value stored in this column is the last identifier that was allocated in the sequence. Each defined generator represents a row in the table.
Because the generation strategy is indicated but no generator has been specified, the provider will assume a table of its own choosing. If schema generation is used, it will be created; if not, the default table assumed by the provider must be known and must exist in the database.
Although we are showing the @TableGenerator annotating the identifier attribute, it can actually be defined on any attribute or class. Regardless of where it is defined, it will be available to the entire persistence unit. A good practice would be to define it locally on the ID attribute if only one class is using it but to define it in XML, as described in Chapter 13, if it will be used for multiple classes.
The name element globally names the generator, allowing us to reference it in the generator element of the @GeneratedValue annotation. This is functionally equivalent to the previous example where we simply said that we wanted to use table generation but did not specify the generator. Now we are specifying the name of the generator but not supplying any of the generator details, leaving them to be defaulted by the provider.
We have included some additional elements after the name of the generator. Following the name are three elements—table, pkColumnName, and valueColumnName—that define the actual table that stores the identifiers for Emp_Gen.
The table element just indicates the name of the table. The pkColumnName element is the name of the primary key column in the table that uniquely identifies the generator, and the valueColumnName element is the name of the column that stores the actual ID sequence value being generated. In this case, the table is named ID_GEN, the name of the primary key column (the column that stores the generator names) is named GEN_NAME, and the column that stores the ID sequence values is named GEN_VAL.
The name of the generator becomes the value stored in the pkColumnName column for that row and is used by the provider to look up the generator to obtain its last allocated value.
Note that the last allocated Employee identifier is 0, which tells us that no identifiers have been generated yet. An initialValue element representing the last allocated identifier can be specified as part of the generator definition, but the default setting of 0 will suffice in almost every case. This setting is used only during schema generation when the table is created. During subsequent executions, the provider will read the contents of the value column to determine the next identifier to give out.
To avoid updating the row for every single identifier that gets requested, an allocation size is used. This will cause the provider to preallocate a block of identifiers and then give out identifiers from memory as requested until the block is used up. Once this block is used up, the next request for an identifier triggers another block of identifiers to be preallocated, and the identifier value is incremented by the allocation size. By default, the allocation size is set to 50. This value can be overridden to be larger or smaller through the use of the allocationSize element when defining the generator.
The provider might allocate identifiers within the same transaction as the entity being persisted or in a separate transaction. This is not specified, so you should check your provider documentation to see how it avoids the risk of deadlock when concurrent threads are creating entities and locking resources.
Using Table ID Generation
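The block allocation described earlier can be simulated in plain Java. This sketches one plausible scheme under stated assumptions, not any provider's algorithm. The ID_GEN table is modeled as a map from GEN_NAME to GEN_VAL, and each reservation counts as one simulated UPDATE. The generator then hands out identifiers last + 1 through last + allocationSize from memory. IdGenTable and TableIdGenerator are hypothetical names. With the default allocation size of 50 and an initial value of 0, the 51st identifier triggers a second reservation.

```java
import java.util.HashMap;
import java.util.Map;

// Simulated ID_GEN table: each row maps a GEN_NAME to GEN_VAL, the last allocated identifier.
class IdGenTable {
    private final Map<String, Long> rows = new HashMap<>();
    int updates; // number of simulated UPDATE statements issued against the table

    IdGenTable addGenerator(String genName, long initialValue) {
        rows.put(genName, initialValue);
        return this;
    }

    // Reserves a block: reads the last allocated value and advances it by allocationSize.
    synchronized long reserveBlock(String genName, int allocationSize) {
        long last = rows.get(genName);
        rows.put(genName, last + allocationSize);
        updates++;
        return last;
    }

    synchronized long currentValue(String genName) {
        return rows.get(genName);
    }
}

// Hands out identifiers from memory and touches the table only when a block runs out.
class TableIdGenerator {
    private final IdGenTable table;
    private final String genName;
    private final int allocationSize;
    private long next = 1;  // next identifier to return
    private long limit = 0; // last identifier in the current block; 0 means no block yet

    TableIdGenerator(IdGenTable table, String genName, int allocationSize) {
        this.table = table;
        this.genName = genName;
        this.allocationSize = allocationSize;
    }

    synchronized long nextId() {
        if (next > limit) {
            long last = table.reserveBlock(genName, allocationSize);
            next = last + 1;
            limit = last + allocationSize;
        }
        return next++;
    }
}

class TableGenDemo {
    public static void main(String[] args) {
        IdGenTable table = new IdGenTable().addGenerator("Emp_Gen", 0L);
        TableIdGenerator gen = new TableIdGenerator(table, "Emp_Gen", 50);
        for (int i = 0; i < 51; i++) {
            gen.nextId();
        }
        // 51 identifiers required two block reservations; GEN_VAL is now 100.
        System.out.println(table.updates + " updates, GEN_VAL = " + table.currentValue("Emp_Gen"));
    }
}
```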
ID Generation Using a Database Sequence
Many databases support an internal mechanism for ID generation called sequences. A database sequence can be used to generate identifiers when the underlying database supports them.
The initial value and allocation size can also be used in sequence generators and would need to be reflected in the SQL to create the sequence. Note that the default allocation size is 50, just as it is with table generators. If schema generation is not being used, and the sequence is being manually created, the INCREMENT BY clause would need to be configured to match the allocationSize element or default allocation size of the corresponding @SequenceGenerator annotation.
ID Generation Using Database Identity
Some databases support a primary key identity column, sometimes referred to as an autonumber column. Whenever a row is inserted into the table, the identity column is assigned a unique identifier. It can be used to generate the identifiers for objects, but once again is available only when the underlying database supports it. Identity is often used because the database does not support sequences or because a legacy schema has already defined the table to use identity columns. Identity columns are generally less efficient for object-relational identifier generation for two reasons: identifiers cannot be allocated in blocks, and an identifier is not available until its row has been inserted, which is often not until commit time.
There is no generator annotation for IDENTITY because it must be defined as part of the database schema definition for the primary key column of the entity. Because each entity primary key column defines its own identity characteristic, IDENTITY generation cannot be shared across multiple entity types.
Another difference, hinted at earlier, between using IDENTITY and other ID generation strategies is that the identifier will not be accessible until after the insert has occurred. Although no guarantee is made about the accessibility of the identifier before the transaction has completed, it is at least possible for other types of generation to eagerly allocate the identifier. But when using identity, it is the action of inserting that causes the identifier to be generated. It would be impossible for the identifier to be available before the entity is inserted into the database, and because insertion of entities is most often deferred until commit time, the identifier would not be available until after the transaction has been committed.
If you use IDENTITY, make sure you are aware of what your persistence provider is doing and that it matches your requirements. Some providers insert entities configured for IDENTITY ID generation eagerly, when the persist method is invoked, instead of waiting until commit time. This makes the ID available immediately, at the expense of premature locking and reduced concurrency. Some providers even have an option that lets you configure which approach is used.
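The two provider behaviors can be contrasted in a toy plain-Java model. It is not the Jakarta Persistence API: IdentityColumnTable stands in for a table whose database assigns the key on INSERT, and flush() stands in for the deferred insert at flush or commit time. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical table whose primary key is an identity column: the database
// assigns the value only when a row is actually inserted.
class IdentityColumnTable {
    private long lastValue = 0;

    long insert() {
        return ++lastValue;
    }
}

class EmployeeRecord {
    Long id; // null until the INSERT has happened
    final String name;

    EmployeeRecord(String name) {
        this.name = name;
    }
}

// Toy persistence context showing eager versus deferred insertion.
class ToyIdentityContext {
    private final IdentityColumnTable table;
    private final boolean insertOnPersist; // true = eager insert, false = defer
    private final List<EmployeeRecord> pending = new ArrayList<>();

    ToyIdentityContext(IdentityColumnTable table, boolean insertOnPersist) {
        this.table = table;
        this.insertOnPersist = insertOnPersist;
    }

    void persist(EmployeeRecord e) {
        if (insertOnPersist) {
            e.id = table.insert(); // ID available now, but the row is written (and locked) early
        } else {
            pending.add(e);        // ID unknown until the deferred insert
        }
    }

    // Stands in for the insert that normally happens at flush or commit time.
    void flush() {
        for (EmployeeRecord e : pending) {
            e.id = table.insert();
        }
        pending.clear();
    }
}
```

In the deferred configuration, the identifier stays null after persist() and appears only after flush(). In the eager configuration, it is set as soon as persist() returns.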
Relationships
If entities contained only simple persistent state, the business of object-relational mapping would be a trivial one, indeed. Most entities need to be able to reference, or have relationships with, other entities. This is what produces the domain model graphs that are common in business applications.
In the following sections, we explore the different kinds of relationships that can exist and show how to define and map them using Jakarta Persistence mapping metadata.
Relationship Concepts
Before we go off and start mapping relationships, let’s take a quick tour through some of the basic relationship concepts and terminology. Having a firm grasp on these concepts will make it easier to understand the remainder of the relationship mapping sections.
Roles
There is an old adage that says every story has three sides: yours, mine, and the truth. Relationships are kind of the same in that there are three different perspectives. The first is the view from one side of the relationship, the second is from the other side, and the third is from a global perspective that knows about both sides. The “sides” are called roles. In every relationship there are two entities that are related to one another, and each entity is said to play a role in the relationship.
Relationships are everywhere, so examples are not hard to come by. An employee has a relationship to the department that he or she works in. The Employee entity plays the role of working in the department, while the Department entity plays the role of having an employee working in it.
Of course, the role a given entity is playing differs according to the relationship, and an entity might be participating in many different relationships with many different entities. We can conclude, therefore, that any entity might be playing a number of different roles in any given model. If we think of an Employee entity, we realize that it does, in fact, play other roles in other relationships, such as the role of working for a manager in its relationship with another Employee entity, working on a project in its relationship with the Project entity, and so forth. Although there are no metadata requirements to declare the role an entity is playing, roles are nevertheless still helpful as a means of understanding the nature and structure of relationships.
Directionality
In order to have relationships at all, there has to be a way to create, remove, and maintain them. The basic way this is done is by an entity having a relationship attribute that refers to its related entity in a way that identifies it as playing the other role of the relationship. It is often the case that the other entity, in turn, has an attribute that points back to the original entity. When each entity points to the other, the relationship is bidirectional. If only one entity has a pointer to the other, the relationship is said to be unidirectional.
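Directionality is visible in plain Java classes even without any mapping metadata. In this sketch, Employee to ParkingSpace is unidirectional, and Employee to Department is bidirectional. Jakarta Persistence does not keep the two ends of a bidirectional relationship in sync in memory; that is the application's job. The sketch therefore updates both sides in one setter. The helper method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Target of a unidirectional relationship: it has no reference back to Employee.
class ParkingSpace {
    final int lot;

    ParkingSpace(int lot) {
        this.lot = lot;
    }
}

// Other end of the bidirectional Employee-Department relationship.
class Department {
    private final List<Employee> employees = new ArrayList<>();

    List<Employee> getEmployees() {
        return Collections.unmodifiableList(employees);
    }

    // Package-private hooks so that Employee can keep both sides in sync.
    void addInternal(Employee e) {
        employees.add(e);
    }

    void removeInternal(Employee e) {
        employees.remove(e);
    }
}

class Employee {
    private Department department;     // bidirectional: Department points back
    private ParkingSpace parkingSpace; // unidirectional: ParkingSpace does not

    Department getDepartment() {
        return department;
    }

    // Updates both ends so the two pointers never disagree.
    void setDepartment(Department newDept) {
        if (department != null) department.removeInternal(this);
        department = newDept;
        if (newDept != null) newDept.addInternal(this);
    }

    ParkingSpace getParkingSpace() {
        return parkingSpace;
    }

    void setParkingSpace(ParkingSpace p) {
        parkingSpace = p;
    }
}
```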
As you will see later in the chapter, although they both share the same concept of directionality, the object and data models each see it a little differently because of the paradigm difference. In some cases, unidirectional relationships in the object model can pose a problem in the database model.
Cardinality
It isn’t very often that a project has only a single employee working on it. We would like to be able to capture how many entities exist on each side of the same relationship instance. This is called the cardinality of the relationship. Each role in a relationship has its own cardinality, which indicates whether there can be only one instance of the entity or many instances. In the relationship between Employee and Project, for example, both roles have a cardinality of many:
Each employee can work on a number of projects.
Many employees can work on the same project.
Each project can have a number of employees working on it.
Many projects can have the same employee working on them.
Implicit in this model is the fact that there can be sharing of Employee and Project instances across multiple relationship instances.
Ordinality
A role can be further specified by determining whether or not it might be present at all. This is called the ordinality, and it serves to show whether the target entity needs to be specified when the source entity is created. Because the ordinality is really just a Boolean value, it is also referred to as the optionality of the relationship.
In cardinality terms, ordinality would be indicated by the cardinality being a range instead of a simple value, and the range would begin with 0 or 1 depending on the ordinality. It is simpler, though, to merely state that the relationship is either optional or mandatory. If optional, the target might not be present; if mandatory, a source entity without a reference to its associated target entity is in an invalid state.
Mappings Overview
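Now that you know enough theory and have the conceptual background to be able to discuss relationships, we can go on to explain and use relationship mappings. Each mapping is named by combining the cardinality of the source role with that of the target role, which gives the following four mapping types: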
Many-to-one
One-to-one
One-to-many
Many-to-many
These mapping names are also the names of the annotations that are used to indicate the relationship types on the attributes that are being mapped. They are the basis for the logical relationship annotations, and they contribute to the object modeling aspects of the entity. Like basic mappings, relationship mappings can be applied to either fields or properties of the entity.
Single-Valued Associations
An association from an entity instance to another entity instance (where the cardinality of the target is “one”) is called a single-valued association. The many-to-one and one-to-one relationship mappings fall into this category because the source entity refers to at most one target entity. We discuss these relationships and some of their variants first.
Many-to-One Mappings
In our cardinality discussion of the Employee and Department relationship (shown in Figure 4-8), we first thought of an employee working in a department, so we just assumed that it was a one-to-one relationship. However, when we realized that more than one employee works in the same department, we changed it to a many-to-one relationship mapping. It turns out that many-to-one is the most common mapping and is the one that is normally used when creating an association to an entity.
Many-to-One Relationship from Employee to Department
We have included only the bits of the class that are relevant to our discussion, but you can see from the previous example that the code was rather anticlimactic. A single annotation was all that was required to map the relationship, and it turned out to be quite dull, really. Of course, when it comes to configuration, dull is beautiful.
The same kinds of attribute flexibility and modifier requirements that were described for basic mappings also apply to relationship mappings. The annotation can be present on either the field or property, depending on the strategy used for the entity.
Using Join Columns
In the database, a relationship mapping means that one table has a reference to another table. The database term for a column that refers to a key (usually the primary key) in another table is a foreign key column. In Jakarta Persistence, such columns are called join columns, and the @JoinColumn annotation is the primary annotation used to configure them.
Later in the chapter, we talk about join columns that are present in other tables called join tables. In Chapter 10, we cover a more advanced case of using a join table for single-valued associations.
In almost every relationship, independent of source and target sides, one of the two sides will have the join column in its table. That side is called the owning side or the owner of the relationship. The side that does not have the join column is called the nonowning or inverse side.
Ownership is important for mapping because the physical annotations that define the mappings to the columns in the database (e.g., @JoinColumn) are always defined on the owning side of the relationship. If they are not there, the values are defaulted from the perspective of the attribute on the owning side.
Although we have described the owning side as being determined by the data schema, the object model must indicate the owning side through the use of the relationship mapping annotations. The absence of the mappedBy element in the mapping annotation implies ownership of the relationship, while the presence of the mappedBy element means the entity is on the inverse side of the relationship. The mappedBy element is described in subsequent sections.
Many-to-one mappings are always on the owning side of a relationship, so if there is a @JoinColumn to be found in the relationship that has a many-to-one side, that is where it will be located. To specify the name of the join column, the name element is used. For example, the @JoinColumn(name="DEPT_ID") annotation means that the DEPT_ID column in the source entity table is the foreign key to the target entity table, whatever the target entity of the relationship happens to be.
If no @JoinColumn annotation accompanies the many-to-one mapping, a default column name will be assumed. The name that is used as the default is formed from a combination of both the source and target entities. It is the name of the relationship attribute in the source entity, which is department in our example, plus an underscore character (_), plus the name of the primary key column of the target entity. So if the Department entity were mapped to a table that had a primary key column named ID, the join column in the EMPLOYEE table would be assumed to be named DEPARTMENT_ID. If this is not actually the name of the column, the @JoinColumn annotation must be defined to override the default.
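The defaulting rule can be written as a small helper. This is a plain-Java sketch, not a Jakarta Persistence API. The class and method names are hypothetical, and the upper-casing only mirrors the examples in this chapter, since actual case handling depends on the provider and database.

```java
import java.util.Locale;

// Sketch of the default join column name for a many-to-one (or one-to-one)
// mapping: <relationship attribute>_<target primary key column>.
class JoinColumnDefaults {
    static String defaultJoinColumn(String sourceAttribute, String targetPkColumn) {
        return (sourceAttribute + "_" + targetPkColumn).toUpperCase(Locale.ROOT);
    }

    // True when the real column differs from the default, i.e. when an explicit
    // @JoinColumn(name = ...) would be needed.
    static boolean needsJoinColumnOverride(String actualColumn,
                                           String sourceAttribute,
                                           String targetPkColumn) {
        return !actualColumn.equalsIgnoreCase(defaultJoinColumn(sourceAttribute, targetPkColumn));
    }
}
```

For the department attribute and a Department primary key column named ID, the default is DEPARTMENT_ID. A table whose foreign key column is actually DEPT_ID therefore needs the override.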
Many-to-One Relationship Overriding the Join Column
Annotations allow us to specify @JoinColumn on either the same line as @ManyToOne or on a separate line, above or below it. By convention, the logical mapping should appear first, followed by the physical mapping. This makes the object model clear because the physical part is less important to the object model.
One-to-One Mappings
We define the mapping much as we define a many-to-one mapping, except that we use the @OneToOne annotation instead of @ManyToOne on the parkingSpace attribute. Just as with a many-to-one mapping, the one-to-one mapping has a join column in the database, and a @JoinColumn annotation must override the column name when the default name does not apply. The default name is composed the same way as for many-to-one mappings, using the name of the source attribute and the target primary key column name.
As it turns out, one-to-one mappings are almost the same as many-to-one mappings except that only one instance of the source entity can refer to the same target entity instance. In other words, the target entity instance is not shared among the source entity instances. In the database, this equates to having a uniqueness constraint on the source foreign key column (i.e., the foreign key column in the source entity table). If there were more than one foreign key value that was the same, it would contravene the rule that no more than one source entity instance can refer to the same target entity instance.
One-to-One Relationship from Employee to ParkingSpace
Bidirectional One-to-One Mappings
You already learned that the entity table that contains the join column determines the entity that is the owner of the relationship. In a bidirectional one-to-one relationship, both the mappings are one-to-one mappings, and either side can be the owner, so the join column might end up being on one side or the other. This would normally be a data modeling decision, not a Java programming decision, and it would likely be decided based on the most frequent direction of traversal.
Inverse Side of a Bidirectional One-to-One Relationship
The mappedBy element in the one-to-one mapping of the employee attribute of ParkingSpace is needed to refer to the parkingSpace attribute in the Employee class. The value of mappedBy is the name of the attribute in the owning entity that points back to the inverse entity.
The @JoinColumn annotation goes on the mapping of the entity that is mapped to the table containing the join column, or the owner of the relationship. This might be on either side of the association.
The mappedBy element should be specified in the @OneToOne annotation in the entity that does not define a join column, or the inverse side of the relationship.
It would not be legal to have a bidirectional association that had mappedBy on both sides, just as it would be incorrect to not have it on either side. The difference is that if it were absent on both sides of the relationship, the provider would treat each side as an independent unidirectional relationship. This would be fine except that it would assume that each side was the owner and that each had a join column.
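These rules amount to a small decision table. The sketch below restates them in plain Java: given whether each side (A and B) of a bidirectional association declares mappedBy, it reports how the pair is interpreted. The enum and method names are hypothetical and are not part of any provider API.

```java
// Possible interpretations of a pair of mappings between two entities.
enum OwnershipOutcome {
    A_OWNS,             // A has the join column; B declares mappedBy
    B_OWNS,             // B has the join column; A declares mappedBy
    TWO_UNIDIRECTIONAL, // neither declares mappedBy: two owners, two join columns
    INVALID             // both declare mappedBy: nobody owns the relationship
}

class OwnershipRules {
    static OwnershipOutcome classify(boolean aHasMappedBy, boolean bHasMappedBy) {
        if (aHasMappedBy && bHasMappedBy) return OwnershipOutcome.INVALID;
        if (!aHasMappedBy && !bHasMappedBy) return OwnershipOutcome.TWO_UNIDIRECTIONAL;
        return aHasMappedBy ? OwnershipOutcome.B_OWNS : OwnershipOutcome.A_OWNS;
    }
}
```

For the Employee and ParkingSpace example, Employee (A) has no mappedBy and ParkingSpace (B) declares mappedBy="parkingSpace", so Employee owns the relationship.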
Bidirectional many-to-one relationships are explained later as part of the discussion of multivalued bidirectional associations.
Collection-Valued Associations
When the source entity references one or more target entity instances, a many-valued association or associated collection is used. Both the one-to-many and many-to-many mappings fit the criteria of having many target entities, and although the one-to-many association is the most frequently used, many-to-many mappings are useful as well when there is sharing in both directions.
One-to-Many Mappings
As mentioned earlier, when a relationship is bidirectional, there are actually two mappings, one for each direction. A bidirectional one-to-many relationship always implies a many-to-one mapping back to the source, so in our Employee and Department example, there is a one-to-many mapping from Department to Employee and a many-to-one mapping from Employee back to Department. We could just as easily say that the relationship is bidirectional many-to-one if we were looking at it from the Employee perspective. They are equivalent because bidirectional many-to-one relationships imply a one-to-many mapping back from the target to source, and vice versa.
When a source entity has an arbitrary number of target entities stored in its collection, there is no scalable way to store those references in the database table that it maps to. How would it store an arbitrary number of foreign keys in a single row? Instead, it must let the tables of the entities in the collection have foreign keys back to the source entity table. This is why the one-to-many association is almost always bidirectional and the “one” side is not normally the owning side.
Furthermore, if the target entity tables have foreign keys that point back to the source entity table, the target entities should have many-to-one associations back to the source entity object. Having a foreign key in a table with no corresponding association in the entity object model is not being true to the data model, although such a mapping can still be configured.
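The storage problem is easy to see when the tables are modeled as plain rows. In this sketch, EmployeeRow is a hypothetical row of the EMPLOYEE table, and the DEPT_ID column name is also an assumption. The department's collection is never stored in the DEPARTMENT row; it is derived from the EMPLOYEE rows whose foreign key points back at that department.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical row of the EMPLOYEE table: the foreign key lives here.
class EmployeeRow {
    final long id;
    final Long deptId; // DEPT_ID foreign key column; null = no department

    EmployeeRow(long id, Long deptId) {
        this.id = id;
        this.deptId = deptId;
    }
}

class OneToManyByForeignKey {
    // The "one" side's collection is the set of "many" rows pointing back at it.
    static List<Long> employeeIdsFor(long deptId, List<EmployeeRow> employeeTable) {
        return employeeTable.stream()
                .filter(r -> r.deptId != null && r.deptId == deptId)
                .map(r -> r.id)
                .collect(Collectors.toList());
    }
}
```

Each row needs only one foreign key column, however many employees a department has. This is why the many-to-one side holds the join column and owns the relationship.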
Let’s look at a concrete example of a one-to-many mapping based on the Employee and Department example shown in Figure 4-15. The tables for this relationship are exactly the same as those shown in Figure 4-11, which showed a many-to-one relationship. The only difference between the many-to-one example and this one is that we are now implementing the inverse side of the relationship. Because Employee has the join column and is the owner of the relationship, the Employee class is unchanged from Listing 4-16.
One-to-Many Relationship
There are a couple of noteworthy points to mention about this class. The first is that a generic type-parameterized Collection is being used to store the Employee entities. This provides the strict typing that guarantees that only objects of type Employee will exist in the Collection. This is quite useful because it not only provides compile-time checking of our code but also saves us from having to perform cast operations when we retrieve the Employee instances from the collection.
Using targetEntity
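There are two points to keep in mind when defining a bidirectional one-to-many (or many-to-one) relationship:
The many-to-one side should be the owning side, so the join column should be defined on that side.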
The one-to-many mapping should be the inverse side, so the mappedBy element should be used.
Failing to specify the mappedBy element in the @OneToMany annotation will cause the provider to treat it as a unidirectional one-to-many relationship that is defined to use a join table (described later). This is an easy mistake to make and should be the first thing you look for if you see a missing table error with a name that has two entity names concatenated together.
Many-to-Many Mappings
Many-to-Many Relationship Between Employee and Project
There are some important differences between this many-to-many relationship and the one-to-many relationship discussed earlier. The first is a mathematical inevitability: when a many-to-many relationship is bidirectional, both sides of the relationship are many-to-many mappings.
The second difference is that there are no join columns on either side of the relationship. You will see in the next section that the only way to implement a many-to-many relationship is with a separate join table. The consequence of not having any join columns in either of the entity tables is that there is no way to determine which side is the owner of the relationship. Because every bidirectional relationship has to have both an owning side and an inverse side, we must pick one of the two entities to be the owner. In this example, we picked Employee to be owner of the relationship, but we could have just as easily picked Project instead. As in every other bidirectional relationship, the inverse side must use the mappedBy element to identify the owning attribute.
Note that no matter which side is designated as the owner, the other side should include the mappedBy element; otherwise, the provider will think that both sides are the owner and that the mappings are separate unidirectional relationships.
Using Join Tables
Because the multiplicity of both sides of a many-to-many relationship is plural, neither of the two entity tables can store an unlimited set of foreign key values in a single entity row. We must use a third table to associate the two entity types. This association table is called a join table, and each many-to-many relationship must have one. Join tables can be used for the other relationship types as well, but because they are not required there, they are less common.
A join table consists simply of two foreign key or join columns to refer to each of the two entity types in the relationship. A collection of entities is then mapped as multiple rows in the table, each of which associates one entity with another. The set of rows that contain a given entity identifier in the source foreign key column represents the collection of entities related to that given entity.
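A join table can be modeled as a list of two-column rows. In this plain-Java sketch, the EMP_PROJ table and its EMP_ID and PROJ_ID columns are hypothetical names. Each row links one employee to one project, and either entity's collection is read from the rows that contain its identifier.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical row of an EMP_PROJ join table: one row per Employee-Project link.
class EmpProjRow {
    final long empId;  // EMP_ID: join column to EMPLOYEE
    final long projId; // PROJ_ID: join column to PROJECT

    EmpProjRow(long empId, long projId) {
        this.empId = empId;
        this.projId = projId;
    }
}

class JoinTableModel {
    // An employee's projects: rows whose EMP_ID matches.
    static List<Long> projectsOf(long empId, List<EmpProjRow> joinTable) {
        return joinTable.stream()
                .filter(r -> r.empId == empId)
                .map(r -> r.projId)
                .collect(Collectors.toList());
    }

    // A project's employees: rows whose PROJ_ID matches.
    static List<Long> employeesOn(long projId, List<EmpProjRow> joinTable) {
        return joinTable.stream()
                .filter(r -> r.projId == projId)
                .map(r -> r.empId)
                .collect(Collectors.toList());
    }
}
```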
Using a Join Table
The @JoinTable annotation is used to configure the join table for the relationship. The two join columns in the join table are distinguished by means of the owning and inverse sides. The join column to the owning side is described in the joinColumns element, while the join column to the inverse side is specified by the inverseJoinColumns element. You can see from Listing 4-23 that the values of these elements are actually @JoinColumn annotations embedded within the @JoinTable annotation. This provides the ability to declare all of the information about the join columns within the table that defines them. The names are plural for times when there might be multiple columns for each foreign key (either the owning entity or the inverse entity has a multipart primary key). This more complicated case is discussed in Chapter 10.
In our example, we fully specified the names of the join table and its columns because this is the most common case. But if we were generating the database schema from the entities, we would not actually need to specify this information. We could have relied on the default values that would be assumed and used when the persistence provider generates the table for us. When no @JoinTable annotation is present on the owning side, then a default join table named <Owner>_<Inverse> is assumed, where <Owner> is the name of the owning entity and <Inverse> is the name of the inverse or nonowning entity. Of course, the owner is basically picked at random by the developer, so these defaults will apply according to the way the relationship is mapped and whichever entity is designated as the owning side.
The join columns will be defaulted according to the join column defaulting rules that were previously described in the section “Using Join Columns.” The default name of the join column that points to the owning entity is the name of the attribute on the inverse entity that points to the owning entity, followed by an underscore and the name of the primary key column of the owning entity table. So in our example, Employee is the owning entity, and Project has an employees attribute that contains the collection of Employee instances. The Employee entity maps to the EMPLOYEE table and has a primary key column of ID, so the defaulted name of the join column to the owning entity would be EMPLOYEES_ID. The inverse join column would likewise default to PROJECTS_ID.
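The three defaults for a bidirectional many-to-many relationship can be collected into a small plain-Java helper. The class and method names are hypothetical. The upper-casing mirrors this chapter's examples and is an assumption, because actual case handling is provider- and database-specific. The example also assumes that the owning attribute on Employee is named projects, which is what the PROJECTS_ID default implies.

```java
import java.util.Locale;

class ManyToManyDefaults {
    // <Owner>_<Inverse>, using the entity names.
    static String joinTable(String owningEntity, String inverseEntity) {
        return upper(owningEntity + "_" + inverseEntity);
    }

    // Column pointing at the owner: <inverse-side attribute>_<owner PK column>.
    static String joinColumn(String inverseAttribute, String owningPkColumn) {
        return upper(inverseAttribute + "_" + owningPkColumn);
    }

    // Column pointing at the inverse entity: <owning-side attribute>_<inverse PK column>.
    static String inverseJoinColumn(String owningAttribute, String inversePkColumn) {
        return upper(owningAttribute + "_" + inversePkColumn);
    }

    private static String upper(String s) {
        return s.toUpperCase(Locale.ROOT);
    }
}
```

Choosing Project as the owner instead would swap the entity order in the join table name.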
It is fairly clear that the defaulted names of a join table and the join columns within it are not likely to match up with an existing table. This is why we mentioned that the defaults are really useful only if the database schema being mapped to was generated by the provider.
Unidirectional Collection Mappings
Similarly, when one side of a many-to-many relationship does not have a mapping to the other, it is a unidirectional relationship. The join table must still be used; the only difference is that only one of the two entity types actually uses the table to load its related entities or updates it to store additional entity associations.
Unidirectional One-to-Many Relationship
Note that when generating the schema, default naming for the join columns is slightly different in the unidirectional case because there is no inverse attribute. The name of the join table would default to EMPLOYEE_PHONE and would have a join column named EMPLOYEE_ID after the name of the Employee entity and its primary key column. The inverse join column would be named PHONES_ID, which is the concatenation of the phones attribute in the Employee entity and the ID primary key column of the PHONE table.
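In the unidirectional case, the only change is to the column that points back at the owner. With no inverse attribute to borrow, the owning entity's name is used instead. The plain-Java helper below is hypothetical, and its upper-casing again only mirrors the chapter's examples.

```java
import java.util.Locale;

class UnidirectionalJoinTableDefaults {
    // <owning entity>_<target entity>
    static String joinTable(String owningEntity, String targetEntity) {
        return (owningEntity + "_" + targetEntity).toUpperCase(Locale.ROOT);
    }

    // No inverse attribute exists, so the owning entity's name is used:
    // <owning entity>_<owning PK column>
    static String joinColumn(String owningEntity, String owningPkColumn) {
        return (owningEntity + "_" + owningPkColumn).toUpperCase(Locale.ROOT);
    }

    // <owning-side collection attribute>_<target PK column>
    static String inverseJoinColumn(String owningAttribute, String targetPkColumn) {
        return (owningAttribute + "_" + targetPkColumn).toUpperCase(Locale.ROOT);
    }
}
```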
Lazy Relationships
Previous sections showed how to configure an attribute to be loaded when it got accessed and not necessarily before. You learned that lazy loading at the attribute level is not normally very beneficial.
At the relationship level, however, lazy loading can be a big boon to performance. It can reduce the amount of SQL that gets executed and speed up queries and object loading considerably.
The fetch mode can be specified on any of the four relationship mapping types. When it is not specified on a single-valued relationship, the related object is guaranteed to be loaded eagerly. Collection-valued relationships default to lazy loading, but because lazy loading is only a hint to the provider, they can be loaded eagerly if the provider decides to do so.
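The default fetch modes can be restated as a lookup table. The enums below are hypothetical. They mirror the four mapping annotations and jakarta.persistence.FetchType but are not those types, so the sketch compiles without a provider on the classpath.

```java
import java.util.EnumMap;
import java.util.Map;

enum RelationshipKind { MANY_TO_ONE, ONE_TO_ONE, ONE_TO_MANY, MANY_TO_MANY }

enum FetchMode { EAGER, LAZY }

class FetchDefaults {
    private static final Map<RelationshipKind, FetchMode> DEFAULTS =
            new EnumMap<>(RelationshipKind.class);

    static {
        DEFAULTS.put(RelationshipKind.MANY_TO_ONE, FetchMode.EAGER);  // single-valued
        DEFAULTS.put(RelationshipKind.ONE_TO_ONE, FetchMode.EAGER);   // single-valued
        DEFAULTS.put(RelationshipKind.ONE_TO_MANY, FetchMode.LAZY);   // collection-valued
        DEFAULTS.put(RelationshipKind.MANY_TO_MANY, FetchMode.LAZY);  // collection-valued
    }

    // Mode used when the mapping does not specify one.
    static FetchMode defaultFor(RelationshipKind kind) {
        return DEFAULTS.get(kind);
    }

    // EAGER is a requirement the provider must honor; LAZY is only a hint.
    static boolean isGuaranteed(FetchMode mode) {
        return mode == FetchMode.EAGER;
    }
}
```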
In bidirectional relationship cases, the fetch mode might be lazy on one side but eager on the other. This kind of configuration is actually quite common because relationships are often accessed in different ways depending on the direction from which navigation occurs.
Changing the Fetch Mode on a Relationship
A relationship that is specified or defaulted to be lazily loaded might or might not cause the related object to be loaded when the getter method is used to access the object. The object might be a proxy, so it might take actually invoking a method on it to cause it to be faulted in.
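The following plain-Java sketch models proxy faulting. Providers typically rely on generated proxies or bytecode enhancement; here, a hand-written DeptProxy holds a Supplier that stands in for the SQL load. The getter hands back the proxy without loading anything, and the first method call on the proxy triggers the load. All class names are hypothetical.

```java
import java.util.function.Supplier;

// Hypothetical view of a related Department as seen through an Employee.
interface Dept {
    String getName();
}

// The "real" object, produced only when the proxy is faulted in.
class LoadedDept implements Dept {
    private final String name;

    LoadedDept(String name) {
        this.name = name;
    }

    @Override
    public String getName() {
        return name;
    }
}

// Stand-in for a provider-generated proxy: it holds a loader but does not call
// it until a method is invoked on the proxy itself.
class DeptProxy implements Dept {
    private final Supplier<Dept> loader; // stands in for the SQL that reads the row
    private Dept target;                 // null until faulted in
    int loadCount = 0;

    DeptProxy(Supplier<Dept> loader) {
        this.loader = loader;
    }

    boolean isLoaded() {
        return target != null;
    }

    @Override
    public String getName() {
        if (target == null) { // first real use triggers the load
            target = loader.get();
            loadCount++;
        }
        return target.getName();
    }
}

class LazyEmployee {
    private final Dept department;

    LazyEmployee(Dept department) {
        this.department = department;
    }

    // Returns the proxy as-is; calling the getter alone loads nothing.
    Dept getDepartment() {
        return department;
    }
}
```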
Embedded Objects
An embedded object is one that is dependent on an entity for its identity. It has no identity of its own, but is merely part of the entity state that has been carved off and stored in a separate Java object hanging off of the entity. In Java, embedded objects appear similar to relationships in that they are referenced by an entity and appear in the Java sense to be the target of an association. In the database, however, the state of the embedded object is stored with the rest of the entity state in the database row, with no distinction between the state in the Java entity and that in its embedded object.
Although embedded objects are referenced by the entities that own them, they are not said to be in relationships with the entities. The term relationship can only be applied when both sides are entities.
If the database row contains all the data for both the entity and its embedded object, why have such an object anyway? Why not just define the fields of the entity to reference all its persistence state instead of splitting it up into one or more subobjects that are second-class persistent objects dependent on the entity for their existence?
This brings us back to the object-relational impedance mismatch we talked about in Chapter 1. Because the database record contains more than one logical type, it makes sense to make that separation explicit in the object model of the application even though the physical representation is different. You could almost say that the embedded object is a more natural representation of the domain concept than a simple collection of attributes on the entity. Furthermore, once you have identified a grouping of entity state that makes up an embedded object, you can share the same embedded object type with other entities that also have the same internal representation.1
With this representation, not only is the address information neatly encapsulated within an object but if another entity such as Company also has address information, it can also have an attribute that points to its own embedded Address object. We describe this scenario in the next section.
Embeddable Address Type
Using an Embedded Object
When the provider persists an instance of Employee, it will access the attributes of the Address object just as if they were present on the entity instance itself. Column mappings on the Address type really pertain to columns on the EMPLOYEE table, even though they are listed in a different type.
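The flattening can be pictured as building a single row. This plain-Java sketch is a model, not provider code. The column names ID, NAME, STREET, CITY, STATE and ZIP_CODE are hypothetical. The entity's own state and the embedded Address state end up in the same EMPLOYEE row, and there is no separate ADDRESS table.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Embeddable: no identifier of its own.
class Address {
    final String street;
    final String city;
    final String state;
    final String zip;

    Address(String street, String city, String state, String zip) {
        this.street = street;
        this.city = city;
        this.state = state;
        this.zip = zip;
    }
}

class AddressedEmployee {
    final long id;
    final String name;
    final Address address; // embedded object, not a relationship

    AddressedEmployee(long id, String name, Address address) {
        this.id = id;
        this.name = name;
        this.address = address;
    }
}

class EmbeddedRowWriter {
    // Builds the single EMPLOYEE row, with entity and embedded columns side by side.
    static Map<String, Object> toEmployeeRow(AddressedEmployee e) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("ID", e.id);
        row.put("NAME", e.name);
        row.put("STREET", e.address.street);
        row.put("CITY", e.address.city);
        row.put("STATE", e.address.state);
        row.put("ZIP_CODE", e.address.zip);
        return row;
    }
}
```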
The decision to use embedded objects or entities depends on whether you think you will ever need to create relationships to them or from them. Embedded objects are not meant to be entities, and as soon as you start to treat them as entities, you should probably make them first-class entities instead of embedded objects if the data model permits it.
It is not portable to define embedded objects as part of inheritance hierarchies. Once they begin to extend one another, the complexity of embedding them increases, and the value-for-cost ratio decreases.
We use an @AttributeOverride annotation for each attribute of the embedded object that we want to override in the entity. We annotate the embedded field or property in the entity and specify in the name element the field or property in the embedded object that we are overriding. The column element allows us to specify the column that the attribute is being mapped to in the entity table. We indicate this in the form of a nested @Column annotation. If we are overriding multiple fields or properties, we can use the plural @AttributeOverrides annotation and nest multiple @AttributeOverride annotations inside of it. Because @AttributeOverride is @Repeatable, however, the @AttributeOverrides container is optional: multiple @AttributeOverride annotations can be placed directly on the same field or property.
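The effect of overrides can be modeled as merging two maps. This plain-Java sketch is hypothetical and not provider code. The embeddable declares default column names per attribute, and each entry in the overrides map plays the role of one @AttributeOverride(name = ..., column = @Column(name = ...)) on the embedding entity. The attribute and column names are illustrative, and rejecting an unknown attribute name is a choice made by this sketch.

```java
import java.util.HashMap;
import java.util.Map;

class AttributeOverrideModel {
    // Column names declared on the embeddable (hypothetical Address mapping).
    static final Map<String, String> ADDRESS_DEFAULTS = Map.of(
            "street", "STREET",
            "city", "CITY",
            "state", "STATE",
            "zip", "ZIP_CODE");

    // Effective column per attribute for one embedding entity.
    static Map<String, String> effectiveColumns(Map<String, String> defaults,
                                                Map<String, String> overrides) {
        Map<String, String> result = new HashMap<>(defaults);
        for (Map.Entry<String, String> o : overrides.entrySet()) {
            if (!result.containsKey(o.getKey())) {
                throw new IllegalArgumentException("No embedded attribute named " + o.getKey());
            }
            result.put(o.getKey(), o.getValue()); // overridden column in the entity table
        }
        return result;
    }
}
```

One entity can keep the embeddable's defaults while another maps state to a PROVINCE column and zip to a POSTAL_CODE column. Both reuse the same Address type.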
Reusing an Embedded Object in Multiple Entities
Summary
Mapping objects to relational databases is of critical importance to persistence applications. Dealing with the impedance mismatch requires a sophisticated suite of metadata. Jakarta Persistence not only provides this metadata but also facilitates easy and convenient development.
In this chapter, we went through the process of mapping entity state that included simple Java types, large objects, enumerated types, and temporal types. We also used the metadata to do meet-in-the-middle mapping to specific table names and columns.
We explained how identifiers are generated and described four different strategies of generation. You saw the different strategies in action and learned how to differentiate them from each other.
We then reviewed some of the relationship concepts and applied them to object-relational mapping metadata. We used join columns and join tables to map single-valued and collection-valued associations and went over some examples. We also discussed special types of objects called embeddables that are mapped but do not have identifiers and can exist only within persistent entities.
The next chapter discusses more of the intricacies of mapping collection-valued relationships, as well as how to map collections of nonentity objects. We delve into the different Collection types and the ways that these types can be used and mapped, and see how they affect the database tables that are being mapped to.