In this chapter
This chapter presents some fundamental mapping options and explains how to map entity classes to SQL tables. We show and discuss how you can handle database identity and primary keys, and how you can use various other metadata settings to customize how Hibernate loads and stores instances of your domain model classes. All mapping examples use JPA annotations. First, though, we define the essential distinction between entities and value types, and explain how you should approach the object/relational mapping of your domain model.
You can globally enable escaping of all names in generated SQL statements with the <delimited-identifiers/> element in the <persistence-unit-defaults> section of an orm.xml mapping file.
When you look at your domain model, you’ll notice a difference between classes: some of the types seem more important, representing first-class business objects (the term object is used here in its natural sense). Examples are the Item, Category, and User classes: these are entities in the real world you’re trying to represent (refer back to figure 3.3 for a view of the example domain model). Other types present in your domain model, such as Address, String, and Integer, seem less important. In this section, we look at what it means to use fine-grained domain models and how to distinguish between entity and value types.
A major objective of Hibernate is support for fine-grained and rich domain models. It’s one reason we work with POJOs. In crude terms, fine-grained means more classes than tables.
For example, a user may have a home address in your domain model. In the database, you may have a single USERS table with the columns HOME_STREET, HOME_CITY, and HOME_ZIPCODE. (Remember the problem of SQL types we discussed in section 1.2.1?)
In the domain model, you could use the same approach, representing the address as three string-valued properties of the User class. But it’s much better to model this using an Address class, where User has a homeAddress property. This domain model achieves improved cohesion and greater code reuse, and it’s more understandable than SQL with inflexible type systems.
JPA emphasizes the usefulness of fine-grained classes for implementing type safety and behavior. For example, many people model an email address as a string-valued property of User. A more sophisticated approach is to define an EmailAddress class, which adds higher-level semantics and behavior—it may provide a prepareMail() method (it shouldn’t have a sendMail() method, because you don’t want your domain model classes to depend on the mail subsystem).
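To make the idea concrete, here is a minimal sketch of such a fine-grained class in plain Java, without any mapping annotations. The validation rule and the text returned by prepareMail() are illustrative assumptions, not something the specification or CaveatEmptor prescribes.

```java
import java.util.Objects;

// Hypothetical value class: the validation rule and the draft format are assumptions.
final class EmailAddress {
    private final String value;

    EmailAddress(String value) {
        Objects.requireNonNull(value, "value");
        int at = value.indexOf('@');
        // Deliberately simplistic check: exactly one '@' with text on both sides.
        if (at <= 0 || at != value.lastIndexOf('@') || at == value.length() - 1) {
            throw new IllegalArgumentException("Not an email address: " + value);
        }
        this.value = value;
    }

    String getValue() {
        return value;
    }

    // Builds a plain-text draft; it doesn't depend on any mail subsystem.
    String prepareMail(String subject) {
        return "To: " + value + "\nSubject: " + subject + "\n";
    }

    // Two instances with the same address are interchangeable.
    @Override
    public boolean equals(Object other) {
        return other instanceof EmailAddress
            && value.equals(((EmailAddress) other).value);
    }

    @Override
    public int hashCode() {
        return value.hashCode();
    }
}
```

A User property of this type can never hold an arbitrary string, which is the type-safety benefit a plain String property doesn't give you.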
This granularity problem leads us to a distinction of central importance in ORM. In Java, all classes are of equal standing—all instances have their own identity and life cycle. When you introduce persistence, some instances may not have their own identity and life cycle but depend on others. Let’s walk through an example.
Two people live in the same house, and they both register user accounts in CaveatEmptor. Let’s call them John and Jane.
An instance of User represents each account. Because you want to load, save, and delete these User instances independently, User is an entity class and not a value type. Finding entity classes is easy.
The User class has a homeAddress property; it’s an association with the Address class. Do both User instances have a runtime reference to the same Address instance, or does each User instance have a reference to its own Address? Does it matter that John and Jane live in the same house?
In figure 4.1, you can see how two User instances share a single Address instance (this is a UML object diagram, not a class diagram). If Address is supposed to support shared runtime references, it’s an entity type. The Address instance has its own life; you can’t delete it when John removes his User account, because Jane might still have a reference to the Address.
Now let’s look at the alternative model where each User has a reference to its own homeAddress instance, as shown in figure 4.2. In this case, you can make an instance of Address dependent on an instance of User: you make it a value type. When John removes his User account, you can safely delete his Address instance. Nobody else will hold a reference.
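The two models can be compared in plain Java. The following sketch uses the hypothetical stand-in classes StreetAddress and Account (not the CaveatEmptor Address and User classes) and only checks whether two accounts point at the same address instance.

```java
// Plain Java sketch; StreetAddress and Account are hypothetical stand-ins.
final class StreetAddress {
    final String street;
    final String city;

    StreetAddress(String street, String city) {
        this.street = street;
        this.city = city;
    }
}

final class Account {
    final String name;
    final StreetAddress homeAddress;

    Account(String name, StreetAddress homeAddress) {
        this.name = name;
        this.homeAddress = homeAddress;
    }
}

class AddressSharingDemo {
    // True when both accounts reference the very same address instance.
    static boolean sharesAddress(Account a, Account b) {
        return a.homeAddress == b.homeAddress;
    }

    public static void main(String[] args) {
        // Shared reference: the address must outlive either account (entity-like).
        StreetAddress house = new StreetAddress("1 Main Street", "Springfield");
        Account john = new Account("John", house);
        Account jane = new Account("Jane", house);
        System.out.println(sharesAddress(john, jane)); // true

        // One instance per account: each address can go away with its owner (value-like).
        Account john2 = new Account("John", new StreetAddress("1 Main Street", "Springfield"));
        Account jane2 = new Account("Jane", new StreetAddress("1 Main Street", "Springfield"));
        System.out.println(sharesAddress(john2, jane2)); // false
    }
}
```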
Hence, we make the following essential distinction:
If you read the JPA specification, you’ll find the same concept. But value types in JPA are called basic property types or embeddable classes. We come back to this in the next chapter; first our focus is on entities.
Identifying entities and value types in your domain model isn’t an ad hoc task but follows a certain procedure.
You may find it helpful to add stereotype (a UML extensibility mechanism) information to your UML class diagrams so you can immediately recognize entities and value types. This practice also forces you to think about this distinction for all your classes, which is a first step to an optimal mapping and well-performing persistence layer. Figure 4.3 shows an example.
The Item and User classes are obvious entities. They each have their own identity, their instances have references from many other instances (shared references), and they have independent lifespans.
Marking the Address as a value type is also easy: a single User instance references a particular Address instance. You know this because the association has been created as a composition, where the User instance has been made fully responsible for the life cycle of the referenced Address instance. Therefore, Address instances can’t be referenced by anyone else and don’t need their own identity.
The Bid class could be a problem. In object-oriented modeling, this is marked as a composition (the association between Item and Bid with the diamond). Thus, an Item is the owner of its Bid instances and holds a collection of references. At first, this seems reasonable, because bids in an auction system are useless when the item they were made for is gone.
But what if a future extension of the domain model requires a User#bids collection, containing all bids made by a particular User? Right now, the association between Bid and User is unidirectional; a Bid has a bidder reference. What if this were bidirectional?
In that case, you have to deal with possible shared references to Bid instances, so the Bid class needs to be an entity. It has a dependent life cycle, but it must have its own identity to support (future) shared references.
You’ll often find this kind of mixed behavior; but your first reaction should be to make everything a value-typed class and promote it to an entity only when absolutely necessary. Try to simplify your associations: persistent collections, for example, frequently add complexity without offering any advantages. Instead of mapping Item#bids and User#bids collections, you can write queries to obtain all the bids for an Item and those made by a particular User. The associations in the UML diagram would point from the Bid to the Item and User, unidirectionally, and not the other way. The stereotype on the Bid class would then be <<Value type>>. We come back to this point again in chapter 7.
Next, take your domain model diagram and implement POJOs for all entities and value types. You’ll have to take care of three things:
We come back to references, associations, and life cycle rules when we discuss more-advanced mappings throughout later chapters in this book. Object identity and identifier properties are our next topic.
Mapping entities with identity requires you to understand Java identity and equality before we can walk through an entity class example and its mapping. After that, we’ll be able to dig in deeper and select a primary key, configure key generators, and finally go through identifier generator strategies. We start with the difference between Java object identity and object equality, and then move on to database identity and the way JPA manages it.
Java developers understand the difference between Java object identity and equality. Object identity (==) is a notion defined by the Java virtual machine. Two references are identical if they point to the same memory location.
On the other hand, object equality is a notion defined by a class’s equals() method, sometimes also referred to as equivalence. Equivalence means two different (non-identical) instances have the same value—the same state. Two different instances of String are equal if they represent the same sequence of characters, even though each has its own location in the memory space of the virtual machine. (If you’re a Java guru, we acknowledge that String is a special case. Assume we used a different class to make the same point.)
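A short plain Java sketch shows both notions side by side. The class and method names are ours, chosen only for illustration.

```java
class IdentityVsEquality {
    // Identity: do both references point at the same instance?
    static boolean identical(Object a, Object b) {
        return a == b;
    }

    // Equality: does the class's equals() method consider them equivalent?
    static boolean equal(Object a, Object b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        // new String(...) forces two distinct instances and sidesteps literal interning.
        String first = new String("CaveatEmptor");
        String second = new String("CaveatEmptor");
        System.out.println(identical(first, second)); // false
        System.out.println(equal(first, second));     // true
    }
}
```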
Persistence complicates this picture. With object/relational persistence, a persistent instance is an in-memory representation of a particular row (or rows) of a database table (or tables). Along with Java identity and equality, we define database identity. You now have three methods for distinguishing references:
We now need to look at how database identity relates to object identity and how to express database identity in the mapping metadata. As an example, you’ll map an entity of a domain model.
We weren’t completely honest in the previous chapter: the @Entity annotation isn’t enough to map a persistent class. You also need an @Id annotation, as shown in the following listing.
This is the most basic entity class, marked as “persistence capable” with the @Entity annotation, and with an @Id mapping for the database identifier property. The class maps by default to a table named ITEM in the database schema.
Every entity class has to have an @Id property; it’s how JPA exposes database identity to the application. We don’t show the identifier property in our diagrams; we assume that each entity class has one. In our examples, we always name the identifier property id. This is a good practice for your own project; use the same identifier property name for all your domain model entity classes. If you specify nothing else, this property maps to a primary key column named ID of the ITEM table in your database schema.
Hibernate will use the field to access the identifier property value when loading and storing items, not getter or setter methods. Because @Id is on a field, Hibernate will now enable every field of the class as a persistent property by default. The rule in JPA is this: if @Id is on a field, the JPA provider will access fields of the class directly and consider all fields part of the persistent state by default. You’ll see how to override this later in this chapter—in our experience, field access is often the best choice, because it gives you more freedom for accessor method design.
Should you have a (public) getter method for the identifier property? Well, the application often uses database identifiers as a convenient handle to a particular instance, even outside the persistence layer. For example, it’s common for web applications to display the results of a search screen to the user as a list of summaries. When the user selects a particular element, the application may need to retrieve the selected item, and it’s common to use a lookup by identifier for this purpose—you’ve probably already used identifiers this way, even in applications that rely on JDBC.
Should you have a setter method? Primary key values never change, so you shouldn’t allow modification of the identifier property value. Hibernate won’t update a primary key column, and you shouldn’t expose a public identifier setter method on an entity.
The Java type of the identifier property, java.lang.Long in the previous example, depends on the primary key column type of the ITEM table and how key values are produced. This brings us to the @GeneratedValue annotation and primary keys in general.
The database identifier of an entity is mapped to some table primary key, so let’s first get some background on primary keys without worrying about mappings. Take a step back and think about how you identify entities.
A candidate key is a column or set of columns that you could use to identify a particular row in a table. To become the primary key, a candidate key must satisfy the following requirements:
The relational model defines that a candidate key must be unique and irreducible (no subset of the key attributes has the uniqueness property). Beyond that, picking a candidate key as the primary key is a matter of taste. But Hibernate expects a candidate key to be immutable when used as the primary key. Hibernate doesn’t support updating primary key values with an API; if you try to work around this requirement, you’ll run into problems with Hibernate’s caching and dirty-checking engine. If your database schema relies on updatable primary keys (and maybe uses ON UPDATE CASCADE foreign key constraints), you must change the schema before it will work with Hibernate.
If a table has only one identifying attribute, it becomes, by definition, the primary key. But several columns or combinations of columns may satisfy these properties for a particular table; you choose between candidate keys to decide the best primary key for the table. You should declare candidate keys not chosen as the primary key as unique keys in the database if their value is indeed unique (but maybe not immutable).
Many legacy SQL data models use natural primary keys. A natural key is a key with business meaning: an attribute or combination of attributes that is unique by virtue of its business semantics. Examples of natural keys are the US Social Security Number and Australian Tax File Number. Distinguishing natural keys is simple: if a candidate key attribute has meaning outside the database context, it’s a natural key, regardless of whether it’s automatically generated. Think about the application users: if they refer to a key attribute when talking about and working with the application, it’s a natural key: “Can you send me the pictures of item #123-abc?”
Experience has shown that natural primary keys usually cause problems in the end. A good primary key must be unique, immutable, and never null. Few entity attributes satisfy these requirements, and some that do can’t be efficiently indexed by SQL databases (although this is an implementation detail and shouldn’t be the deciding factor for or against a particular key). In addition, you should make certain that a candidate key definition never changes throughout the lifetime of the database. Changing the value (or even definition) of a primary key, and all foreign keys that refer to it, is a frustrating task. Expect your database schema to survive decades, even if your application won’t.
Furthermore, you can often only find natural candidate keys by combining several columns in a composite natural key. These composite keys, although certainly appropriate for some schema artifacts (like a link table in a many-to-many relationship), potentially make maintenance, ad hoc queries, and schema evolution much more difficult. We talk about composite keys later in the book, in section 9.2.1.
For these reasons, we strongly recommend that you add synthetic identifiers, also called surrogate keys. Surrogate keys have no business meaning—they have unique values generated by the database or application. Application users ideally don’t see or refer to these key values; they’re part of the system internals. Introducing a surrogate key column is also appropriate in the common situation when there are no candidate keys. In other words, (almost) every table in your schema should have a dedicated surrogate primary key column with only this purpose.
There are a number of well-known approaches to generating surrogate key values. The aforementioned @GeneratedValue annotation is how you configure this.
The @Id annotation is required to mark the identifier property of an entity class. Without the @GeneratedValue next to it, the JPA provider assumes that you’ll take care of creating and assigning an identifier value before you save an instance. We call this an application-assigned identifier. Assigning an entity identifier manually is necessary when you’re dealing with a legacy database and/or natural primary keys. We have more to say about this kind of mapping in a dedicated section, 9.2.1.
Usually you want the system to generate a primary key value when you save an entity instance, so you write the @GeneratedValue annotation next to @Id. JPA standardizes several value-generation strategies with the javax.persistence.GenerationType enum, which you select with @GeneratedValue(strategy = ...):
Although AUTO seems convenient, you need more control, so you usually shouldn’t rely on it and explicitly configure a primary key generation strategy. In addition, most applications work with database sequences, but you may want to customize the name and other settings of the database sequence. Therefore, instead of picking one of the JPA strategies, we recommend a mapping of the identifier with @GeneratedValue(generator = "ID_GENERATOR"), as shown in the previous example.
This is a named identifier generator; you are now free to set up the ID_GENERATOR configuration independently from your entity classes.
JPA has two built-in annotations you can use to configure named generators: @javax.persistence.SequenceGenerator and @javax.persistence.TableGenerator. With these annotations, you can create a named generator with your own sequence and table names. As usual with JPA annotations, you can unfortunately only use them at the top of a (maybe otherwise empty) class, and not in a package-info.java file.
For this reason, and because the JPA annotations don’t give us access to the full Hibernate feature set, we prefer an alternative: the native @org.hibernate.annotations.GenericGenerator annotation. It supports all Hibernate identifier generator strategies and their configuration details. Unlike the rather limited JPA annotations, you can use the Hibernate annotation in a package-info.java file, typically in the same package as your domain model classes. The next listing shows a recommended configuration.
This Hibernate-specific generator configuration has the following advantages:
You can share the same database sequence among all your domain model classes. There is no harm in specifying @GeneratedValue(generator = "ID_GENERATOR") in all your entity classes. It doesn’t matter if primary key values aren’t contiguous for a particular entity, as long as they’re unique within one table. If you’re worried about contention, because the sequence has to be called prior to every INSERT, we discuss a variation of this generator configuration later, in section 20.1.
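The effect of sharing one sequence can be simulated in plain Java. In this sketch an AtomicLong stands in for the database sequence, and two lists stand in for the primary key columns of two tables; the start value of 1000 and all names are assumptions made for the illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

class SharedSequenceDemo {
    // Stand-in for a single database sequence shared by all entity tables.
    private final AtomicLong sequence = new AtomicLong(1000);

    // Stand-ins for the primary key columns of two tables.
    final List<Long> itemIds = new ArrayList<>();
    final List<Long> userIds = new ArrayList<>();

    long nextValue() {
        return sequence.getAndIncrement();
    }

    void insertItem() {
        itemIds.add(nextValue());
    }

    void insertUser() {
        userIds.add(nextValue());
    }

    public static void main(String[] args) {
        SharedSequenceDemo db = new SharedSequenceDemo();
        db.insertItem();
        db.insertUser();
        db.insertItem();
        // Values are unique per table, although the item keys have a gap.
        System.out.println(db.itemIds); // [1000, 1002]
        System.out.println(db.userIds); // [1001]
    }
}
```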
Finally, you use java.lang.Long as the type of the identifier property in the entity class, which maps perfectly to a numeric database sequence generator. You could also use a long primitive. The main difference is what someItem.getId() returns on a new item that hasn’t been stored in the database: either null or 0. If you want to test whether an item is new, a null check is probably easier to understand for someone else reading your code. You shouldn’t use another integral type such as int or short for identifiers. Although they will work for a while (perhaps even years), as your database size grows, you may be limited by their range. If you generated a new identifier each millisecond with no gaps, an Integer would last about 25 days with positive values only (almost two months if you also used the negative range), and a Long would last for about 300 million years.
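You can check these range estimates with a few lines of arithmetic. The sketch below assumes one new identifier per millisecond with no gaps, and it shows the Integer figure both for positive values only and for the full 32-bit range; the class and method names are ours.

```java
class IdentifierRange {
    static final long MILLIS_PER_DAY = 24L * 60 * 60 * 1000;

    // Whole days until `count` identifiers are used up at one identifier per millisecond.
    static long daysToExhaust(long count) {
        return count / MILLIS_PER_DAY;
    }

    public static void main(String[] args) {
        long positiveInts = Integer.MAX_VALUE; // 2^31 - 1 values
        long allInts = 1L << 32;               // every 32-bit pattern, negatives included

        System.out.println(daysToExhaust(positiveInts)); // 24
        System.out.println(daysToExhaust(allInts));      // 49

        // Positive long values only; the result is roughly 292 million years.
        System.out.println(daysToExhaust(Long.MAX_VALUE) / 365);
    }
}
```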
Although recommended for most applications, the enhanced-sequence strategy as shown in listing 4.2 is just one of the strategies built into Hibernate.
Following is a list of all available Hibernate identifier generator strategies, their options, and our usage recommendations. If you don’t want to read the whole list now, enable GenerationType.AUTO and check what Hibernate defaults to for your database dialect. It’s most likely sequence or identity—a good but maybe not the most efficient or portable choice. If you require consistent portable behavior, and identifier values to be available before INSERTs, use enhanced-sequence, as shown in the previous section. This is a portable, flexible, and modern strategy, also offering various optimizers for large datasets.
We also show the relationship between each standard JPA strategy and its native Hibernate equivalent. Hibernate has been growing organically, so there are now two sets of mappings between standard and native strategies; we call them Old and New in the list. You can switch this mapping with the hibernate.id.new_generator_mappings setting in your persistence.xml file. The default is true; hence the New mapping. Software doesn’t age quite as well as wine:
An ORM service tries to optimize SQL INSERTs: for example, by batching several at the JDBC level. Hence, SQL execution occurs as late as possible during a unit of work, not when you call entityManager.persist(someItem). This merely queues the insertion for later execution and, if possible, assigns the identifier value. But if you now call someItem.getId(), you might get null back if the engine wasn’t able to generate an identifier before the INSERT. In general, we prefer pre-insert generation strategies that produce identifier values independently, before INSERT. A common choice is a shared and concurrently accessible database sequence. Autoincremented columns, column default values, or trigger-generated keys are only available after the INSERT.
To summarize, our recommendations on identifier generator strategies are as follows:
We assume from now on that you’ve added identifier properties to the entity classes of your domain model and that after you complete the basic mapping of each entity and its identifier property, you continue to map the value-typed properties of the entities. We talk about value-type mappings in the next chapter. Read on for some special options that can simplify and enhance your class mappings.
You’ve now mapped a persistent class with @Entity, using defaults for all other settings, such as the mapped SQL table name. The following section explores some class-level options and how you control them:
These are options; you can skip this section and come back later when you have to deal with a specific problem.
Let’s first talk about the naming of entity classes and tables. If you only specify @Entity on the persistence-capable class, the default mapped table name is the same as the class name. Note that we write SQL artifact names in UPPERCASE to make them easier to distinguish—SQL is actually case insensitive. So the Java entity class Item maps to the ITEM table. You can override the table name with the JPA @Table annotation, as shown next.
@Entity
@Table(name = "USERS")
public class User implements Serializable {

    // ...
}
The User entity would map to the USER table; this is a reserved keyword in most SQL DBMSs. You can’t have a table with that name, so you instead map it to USERS. The @javax.persistence.Table annotation also has catalog and schema options, if your database layout requires these as naming prefixes.
If you really have to, quoting allows you to use reserved SQL names and even work with case-sensitive names.
From time to time, especially in legacy databases, you’ll encounter identifiers with strange characters or whitespace, or wish to force case sensitivity. Or, as in the previous example, the automatic mapping of a class or property would require a table or column name that is a reserved keyword.
Hibernate 5 knows the reserved keywords of your DBMS through the configured database dialect. Hibernate 5 can automatically put quotes around such strings when generating SQL. You can enable this automatic quoting with hibernate.auto_quote_keyword=true in your persistence unit configuration. If you’re using an older version of Hibernate, or you find that the dialect’s information is incomplete, you must still apply quotes on names manually in your mappings if there is a conflict with a keyword.
If you quote a table or column name in your mapping with backticks, Hibernate always quotes this identifier in the generated SQL. This still works in the latest versions of Hibernate, but JPA 2.0 standardized this functionality as delimited identifiers with double quotes.
This is the Hibernate-only quoting with backticks, modifying the previous example:
@Table(name = "`USER`")
To be JPA-compliant, you also have to escape the quotes in the string:
@Table(name = "\"USER\"")
Either way works fine with Hibernate. It knows the native quote character of your dialect and generates SQL accordingly: [USER] for MS SQL Server, `USER` for MySQL, "USER" for H2, and so on.
If you have to quote all SQL identifiers, create an orm.xml file and add the setting <delimited-identifiers/> to its <persistence-unit-defaults> section, as shown in listing 3.8. Hibernate then enforces quoted identifiers everywhere.
You should consider renaming tables or columns with reserved keyword names whenever possible. Ad hoc SQL queries are difficult to write in an SQL console if you have to quote and escape everything properly by hand.
Next, you’ll see how Hibernate can help when you encounter organizations with strict conventions for database table and column names.
Hibernate provides a feature that allows you to enforce naming standards automatically. Suppose that all table names in CaveatEmptor should follow the pattern CE_<table name>. One solution is to manually specify an @Table annotation on all entity classes. This approach is time-consuming and easily forgotten. Instead, you can implement Hibernate’s PhysicalNamingStrategy interface or override an existing implementation, as in the following listing.
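import org.hibernate.boot.model.naming.Identifier;
import org.hibernate.engine.jdbc.env.spi.JdbcEnvironment;

public class CENamingStrategy extends
    org.hibernate.boot.model.naming.PhysicalNamingStrategyStandardImpl {

    @Override
    public Identifier toPhysicalTableName(Identifier name,
                                          JdbcEnvironment context) {
        // Prepend the CE_ prefix and keep the original quoting flag
        return new Identifier("CE_" + name.getText(), name.isQuoted());
    }
}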
The overridden method toPhysicalTableName() prepends CE_ to all generated table names in your schema. Look at the Javadoc of the PhysicalNamingStrategy interface; it offers methods for custom naming of columns, sequences, and other artifacts.
You have to enable the naming-strategy implementation in persistence.xml:
<persistence-unit name="CaveatEmptorPU">
    ...
    <properties>
        <property name="hibernate.physical_naming_strategy"
                  value="org.jpwh.shared.CENamingStrategy"/>
    </properties>
</persistence-unit>
A second option for naming customization is ImplicitNamingStrategy. Whereas the physical naming strategy acts at the lowest level, when schema artifact names are ultimately produced, the implicit-naming strategy is called before. If you map an entity class and don’t have an @Table annotation with an explicit name, the implicit-naming strategy implementation is asked what the table name should be. This is based on factors such as the entity name and class name. Hibernate ships with several strategies to implement legacy- or JPA-compliant default names. The default strategy is ImplicitNamingStrategyJpaCompliantImpl.
Let’s have a quick look at another related issue, the naming of entities for queries.
By default, all entity names are automatically imported into the namespace of the query engine. In other words, you can use short class names without a package prefix in JPA query strings, which is convenient:
List<Item> result =
    em.createQuery("select i from Item i", Item.class)
      .getResultList();
This only works when you have one Item class in your persistence unit. If you add another Item class in a different package, you should rename one of them for JPA if you want to continue using the short form in queries:
package my.other.model;

@javax.persistence.Entity(name = "AuctionItem")
public class Item {
    // ...
}
The short query form is now select i from AuctionItem i for the Item class in the my.other.model package. Thus you resolve the naming conflict with another Item class in another package. Of course, you can always use fully qualified long names with the package prefix.
This completes our tour of the naming options in Hibernate. Next, we discuss how Hibernate generates the SQL that contains these names.
By default, Hibernate creates SQL statements for each persistent class when the persistence unit is created, on startup. These statements are simple create, read, update, and delete (CRUD) operations for reading a single row, deleting a row, and so on. It’s cheaper to store these in memory up front, instead of generating SQL strings every time such a simple query has to be executed at runtime. In addition, prepared statement caching at the JDBC level is much more efficient if there are fewer statements.
How can Hibernate create an UPDATE statement on startup? After all, the columns to be updated aren’t known at this time. The answer is that the generated SQL statement updates all columns, and if the value of a particular column isn’t modified, the statement sets it to its old value.
In some situations, such as a legacy table with hundreds of columns where the SQL statements will be large for even the simplest operations (say, only one column needs updating), you should disable this startup SQL generation and switch to dynamic statements generated at runtime. An extremely large number of entities can also impact startup time, because Hibernate has to generate all SQL statements for CRUD up front. Memory consumption for this query statement cache will also be high if a dozen statements must be cached for thousands of entities. This can be an issue in virtual environments with memory limitations, or on low-power devices.
To disable generation of INSERT and UPDATE SQL statements on startup, you need native Hibernate annotations:
@Entity
@org.hibernate.annotations.DynamicInsert
@org.hibernate.annotations.DynamicUpdate
public class Item {
    // ...
}
By enabling dynamic insertion and updates, you tell Hibernate to produce the SQL strings when needed, not up front. The UPDATE will only contain columns with updated values, and the INSERT will only contain columns whose values aren’t null.
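The difference between the two kinds of UPDATE statement can be sketched with simple string building. This is an illustration only, not how Hibernate builds its SQL internally; the class name and the PRICE and DESCRIPTION columns are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.StringJoiner;

class UpdateSqlSketch {
    // Startup-style statement: every column is set, whether it changed or not.
    static String staticUpdate(String table, List<String> columns) {
        return buildUpdate(table, columns);
    }

    // Runtime-style statement: only the columns whose values changed.
    static String dynamicUpdate(String table, List<String> columns, Set<String> dirty) {
        List<String> changed = new ArrayList<>();
        for (String column : columns) {
            if (dirty.contains(column)) {
                changed.add(column);
            }
        }
        if (changed.isEmpty()) {
            throw new IllegalArgumentException("Nothing to update");
        }
        return buildUpdate(table, changed);
    }

    private static String buildUpdate(String table, List<String> columns) {
        StringJoiner assignments = new StringJoiner(", ");
        for (String column : columns) {
            assignments.add(column + " = ?");
        }
        return "update " + table + " set " + assignments + " where ID = ?";
    }
}
```

With the columns NAME, PRICE, and DESCRIPTION and only PRICE modified, staticUpdate() yields a statement that sets all three columns, whereas dynamicUpdate() yields update ITEM set PRICE = ? where ID = ?.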
We talk again about SQL generation and customizing SQL in chapter 17. Sometimes you can avoid generating an UPDATE statement altogether, if your entity is immutable.
Instances of a particular class may be immutable. For example, in CaveatEmptor, a Bid made for an item is immutable. Hence, Hibernate never needs to execute UPDATE statements on the BID table. Hibernate can also make a few other optimizations, such as avoiding dirty checking, if you map an immutable class as shown in the next example. Here, the Bid class is immutable and instances are never modified:
@Entity
@org.hibernate.annotations.Immutable
public class Bid {
    // ...
}
A POJO is immutable if no public setter methods for any properties of the class are exposed—all values are set in the constructor. Hibernate should access the fields directly when loading and storing instances. We talked about this earlier in this chapter: if the @Id annotation is on a field, Hibernate will access the fields directly, and you are free to design your getter and setter methods as you see fit. Also, remember that not all frameworks work with POJOs without setter methods; JSF, for example, doesn’t access fields directly to populate an instance.
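As a rough sketch, an immutable class in this sense could look as follows. It is plain Java without mapping annotations; the class name ImmutableBid and its two fields are assumptions for illustration. The protected no-argument constructor is there because JPA requires one for entity classes.

```java
import java.math.BigDecimal;
import java.util.Date;

// Plain Java sketch; the field names are illustrative assumptions.
class ImmutableBid {
    private BigDecimal amount;
    private Date createdOn;

    // JPA requires a no-argument constructor for entity classes.
    protected ImmutableBid() {
    }

    // All values are set here; the class exposes no setter methods.
    ImmutableBid(BigDecimal amount, Date createdOn) {
        this.amount = amount;
        // Defensive copy, because java.util.Date is mutable.
        this.createdOn = new Date(createdOn.getTime());
    }

    BigDecimal getAmount() {
        return amount;
    }

    Date getCreatedOn() {
        // Return a copy so callers can't modify the internal state.
        return new Date(createdOn.getTime());
    }
}
```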
When you can’t create a view in your database schema, you can map an immutable entity class to an SQL SELECT query.
Sometimes your DBA won’t allow you to change the database schema; even adding a new view might not be possible. Let’s say you want to create a view that contains the identifier of an auction Item and the number of bids made for that item.
Using a Hibernate annotation, you can create an application-level view, a read-only entity class mapped to an SQL SELECT:
When an instance of ItemBidSummary is loaded, Hibernate executes your custom SQL SELECT as a subselect:
ItemBidSummary itemBidSummary = em.find(ItemBidSummary.class, ITEM_ID);
// select * from (
//     select i.ID as ITEMID, i.ITEM_NAME as NAME, ...
// ) where ITEMID = ?
You should list all table names referenced in your SELECT in the @org.hibernate.annotations.Synchronize annotation. (At the time of writing, Hibernate has a bug tracked under issue HHH-8430[1] that makes the synchronized table names case sensitive.) Hibernate will then know it has to flush modifications of Item and Bid instances before it executes a query against ItemBidSummary:
Note that Hibernate doesn’t flush automatically before a find() operation, only before a Query is executed, if necessary. Hibernate detects that the modified Item will affect the result of the query, because the ITEM table is synchronized with ItemBidSummary. Hence, a flush and the UPDATE of the ITEM row are necessary to avoid the query returning stale data.