1.2. The paradigm mismatch

The object/relational paradigm mismatch can be broken into several parts, which we'll examine one at a time. Let's start our exploration with a simple example that is problem free. As we build on it, you'll begin to see the mismatch appear.

Suppose you have to design and implement an online e-commerce application. In this application, you need a class to represent information about a user of the system, and another class to represent information about the user's billing details, as shown in figure 1.1.

In this diagram, you can see that a User has many BillingDetails. You can navigate the relationship between the classes in both directions. The classes representing these entities may be extremely simple:

public class User {
    private String username;
    private String name;
    private String address;
    private Set billingDetails;

    // Accessor methods (getter/setter), business methods, etc.
    ...
}
public class BillingDetails {
    private String accountNumber;
    private String accountName;
    private String accountType;
    private User user;

    // Accessor methods (getter/setter), business methods, etc.
    ...
}

Figure 1-1. A simple UML class diagram of the User and BillingDetails entities

Note that we're only interested in the state of the entities with regard to persistence, so we've omitted the implementation of property accessors and business methods (such as getUsername() or billAuction()).

It's easy to come up with a good SQL schema design for this case:

create table USERS (
    USERNAME varchar(15) not null primary key,
    NAME varchar(50) not null,
    ADDRESS varchar(100)
)
create table BILLING_DETAILS (
    ACCOUNT_NUMBER varchar(10) not null primary key,
    ACCOUNT_NAME varchar(50) not null,
    ACCOUNT_TYPE varchar(2) not null,
    USERNAME varchar(15) foreign key references user
)

The relationship between the two entities is represented as the foreign key, USERNAME, in BILLING_DETAILS. For this simple domain model, the object/relational mismatch is barely in evidence; it's straightforward to write JDBC code to insert, update, and delete information about users and billing details.

Now, let's see what happens when we consider something a little more realistic. The paradigm mismatch will be visible when we add more entities and entity relationships to our application.

The most glaringly obvious problem with our current implementation is that we've designed an address as a simple String value. In most systems, it's necessary to store street, city, state, country, and ZIP code information separately. Of course, we could add these properties directly to the User class, but because it's highly likely that other classes in the system will also carry address information, it makes more sense to create a separate Address class. The updated model is shown in figure 1.2.

Should we also add an ADDRESS table? Not necessarily. It's common to keep address information in the USERS table, in individual columns. This design is likely to perform better, because a table join isn't needed if you want to retrieve the user and address in a single query. The nicest solution may even be to create a user-defined SQL datatype to represent addresses, and to use a single column of that new type in the USERS table instead of several new columns.

Basically, we have the choice of adding either several columns or a single column (of a new SQL datatype). This is clearly a problem of granularity.

Figure 1-2. The User has an Address

1.2.1. The problem of granularity

Granularity refers to the relative size of the types you're working with.

Let's return to our example. Adding a new datatype to our database catalog, to store Address Java instances in a single column, sounds like the best approach. A new Address type (class) in Java and a new ADDRESS SQL datatype should guarantee interoperability. However, you'll find various problems if you check the support for user-defined datatypes (UDT) in today's SQL database management systems.

UDT support is one of a number of so-called object-relational extensions to traditional SQL. This term alone is confusing, because it means that the database management system has (or is supposed to support) a sophisticated datatype system—something you take for granted if somebody sells you a system that can handle data in a relational fashion. Unfortunately, UDT support is a somewhat obscure feature of most SQL database management systems and certainly isn't portable between different systems. Furthermore, the SQL standard supports user-defined datatypes, but poorly.

This limitation isn't the fault of the relational data model. You can consider the failure to standardize such an important piece of functionality as fallout from the object-relational database wars between vendors in the mid-1990s. Today, most developers accept that SQL products have limited type systems—no questions asked. However, even with a sophisticated UDT system in our SQL database management system, we would likely still duplicate the type declarations, writing the new type in Java and again in SQL. Attempts to find a solution for the Java space, such as SQLJ, unfortunately, have not had much success.

For these and whatever other reasons, use of UDTs or Java types inside an SQL database isn't common practice in the industry at this time, and it's unlikely that you'll encounter a legacy schema that makes extensive use of UDTs. We therefore can't and won't store instances of our new Address class in a single new column that has the same datatype as the Java layer.

Our pragmatic solution for this problem has several columns of built-in vendor-defined SQL types (such as boolean, numeric, and string datatypes). The USERS table is usually defined as follows:

create table USERS (
    USERNAME varchar(15) not null primary key,
    NAME varchar(50) not null,
    ADDRESS_STREET varchar(50),
    ADDRESS_CITY varchar(15),
    ADDRESS_STATE varchar(15),

    ADDRESS_ZIPCODE varchar(5),
    ADDRESS_COUNTRY varchar(15)
)

Classes in our domain model come in a range of different levels of granularity—from coarse-grained entity classes like User, to finer-grained classes like Address, down to simple String-valued properties such as zipcode. In contrast, just two levels of granularity are visible at the level of the SQL database: tables such as USERS, and columns such as ADDRESS_ZIPCODE.

Many simple persistence mechanisms fail to recognize this mismatch and so end up forcing the less flexible SQL representation upon the object model. We've seen countless User classes with properties named zipcode!

It turns out that the granularity problem isn't especially difficult to solve. We probably wouldn't even discuss it, were it not for the fact that it's visible in so many existing systems. We describe the solution to this problem in chapter 4, section 4.4, "Fine-grained models and mappings."

A much more difficult and interesting problem arises when we consider domain models that rely on inheritance, a feature of object-oriented design we may use to bill the users of our e-commerce application in new and interesting ways.

1.2.2. The problem of subtypes

In Java, you implement type inheritance using superclasses and subclasses. To illustrate why this can present a mismatch problem, let's add to our e-commerce application so that we now can accept not only bank account billing, but also credit and debit cards. The most natural way to reflect this change in the model is to use inheritance for the BillingDetails class.

We may have an abstract BillingDetails superclass, along with several concrete subclasses: CreditCard, BankAccount, and so on. Each of these subclasses defines slightly different data (and completely different functionality that acts on that data). The UML class diagram in figure 1.3 illustrates this model.

SQL should probably include standard support for supertables and subtables. This would effectively allow us to create a table that inherits certain columns from its parent. However, such a feature would be questionable, because it would introduce a new notion: virtual columns in base tables. Traditionally, we expect virtual columns only in virtual tables, which are called views. Furthermore, on a theoretical level, the inheritance we applied in Java is type inheritance. A table isn't a type, so the notion of supertables and subtables is questionable. In any case, we can take the short route here and observe that SQL database products don't generally implement type or table inheritance, and if they do implement it, they don't follow a standard syntax and usually expose you to data integrity problems (limited integrity rules for updatable views).

Figure 1-3. Using inheritance for different billing strategies

In chapter 5, section 5.1, "Mapping class inheritance," we discuss how ORM solutions such as Hibernate solve the problem of persisting a class hierarchy to a database table or tables. This problem is now well understood in the community, and most solutions support approximately the same functionality.

But we aren't finished with inheritance. As soon as we introduce inheritance into the model, we have the possibility of polymorphism.

The User class has an association to the BillingDetails superclass. This is a polymorphic association. At runtime, a User object may reference an instance of any of the subclasses of BillingDetails. Similarly, we want to be able to write polymorphic queries that refer to the BillingDetails class, and have the query return instances of its subclasses.

SQL databases also lack an obvious way (or at least a standardized way) to represent a polymorphic association. A foreign key constraint refers to exactly one target table; it isn't straightforward to define a foreign key that refers to multiple tables. We'd have to write a procedural constraint to enforce this kind of integrity rule.

The result of this mismatch of subtypes is that the inheritance structure in your model must be persisted in an SQL database that doesn't offer an inheritance strategy. Fortunately, three of the inheritance mapping solutions we show in chapter 5 are designed to accommodate the representation of polymorphic associations and the efficient execution of polymorphic queries.

The next aspect of the object/relational mismatch problem is the issue of object identity. You probably noticed that we defined USERNAME as the primary key of our USERS table. Was that a good choice? How do we handle identical objects in Java?

1.2.3. The problem of identity

Although the problem of object identity may not be obvious at first, we'll encounter it often in our growing and expanding e-commerce system, such as when we need to check whether two objects are identical. There are three ways to tackle this problem: two in the Java world and one in our SQL database. As expected, they work together only with some help.

Java objects define two different notions of sameness:

  • Object identity (roughly equivalent to memory location, checked with a==b)

  • Equality as determined by the implementation of the equals() method (also called equality by value)

On the other hand, the identity of a database row is expressed as the primary key value. As you'll see in chapter 9, section 9.2, "Object identity and equality," neither equals() nor == is naturally equivalent to the primary key value. It's common for several nonidentical objects to simultaneously represent the same row of the database, for example, in concurrently running application threads. Furthermore, some subtle difficulties are involved in implementing equals() correctly for a persistent class.

Let's discuss another problem related to database identity with an example. In our table definition for USERS, we used USERNAME as a primary key. Unfortunately, this decision makes it difficult to change a username; we need to update not only the USERNAME column in USERS, but also the foreign key column in BILLING_DETAILS. To solve this problem, later in the book we'll recommend that you use surrogate keys whenever you can't find a good natural key (we'll also discuss what makes a key good). A surrogate key column is a primary key column with no meaning to the user; in other words, a key that isn't presented to the user and is only used for identification of data inside the software system. For example, we may change our table definitions to look like this:

create table USERS (
    USER_ID bigint not null primary key,
    USERNAME varchar(15) not null unique,
    NAME varchar(50) not null,
    ...
)
create table BILLING_DETAILS (
    BILLING_DETAILS_ID bigint not null primary key,
    ACCOUNT_NUMBER VARCHAR(10) not null unique,
    ACCOUNT_NAME VARCHAR(50) not null,
    ACCOUNT_TYPE VARCHAR(2) not null,
    USER_ID bigint foreign key references USER
)

The USER_ID and BILLING_DETAILS_ID columns contain system-generated values. These columns were introduced purely for the benefit of the data model, so how (if at all) should they be represented in the domain model? We discuss this question in chapter 4, section 4.2, "Mapping entities with identity," and we find a solution with ORM.

In the context of persistence, identity is closely related to how the system handles caching and transactions. Different persistence solutions have chosen different strategies, and this has been an area of confusion. We cover all these interesting topics—and show how they're related—in chapters 10 and 13.

So far, the skeleton e-commerce application we've designed has identified the mismatch problems with mapping granularity, subtypes, and object identity. We're almost ready to move on to other parts of the application, but first we need to discuss the important concept of associations: how the relationships between our classes are mapped and handled. Is the foreign key in the database all you need?

1.2.4. Problems relating to associations

In our domain model, associations represent the relationships between entities. The User, Address, and BillingDetails classes are all associated; but unlike Address, BillingDetails stands on its own. BillingDetails instances are stored in their own table. Association mapping and the management of entity associations are central concepts in any object persistence solution.

Object-oriented languages represent associations using object references; but in the relational world, an association is represented as a foreign key column, with copies of key values (and a constraint to guarantee integrity). There are substantial differences between the two representations.

Object references are inherently directional; the association is from one object to the other. They're pointers. If an association between objects should be navigable in both directions, you must define the association twice, once in each of the associated classes. You've already seen this in the domain model classes:

public class User {
    private Set billingDetails;
    ...
}
public class BillingDetails {
    private User user;
    ...
}

On the other hand, foreign key associations aren't by nature directional. Navigation has no meaning for a relational data model because you can create arbitrary data associations with table joins and projection. The challenge is to bridge a completely open data model, which is independent of the application that works with the data, to an application-dependent navigational model, a constrained view of the associations needed by this particular application.

It isn't possible to determine the multiplicity of a unidirectional association by looking only at the Java classes. Java associations can have many-to-many multiplicity. For example, the classes could look like this:

public class User {
    private Set billingDetails;
    ...
}
public class BillingDetails {
    private Set users;
    ...
}

Table associations, on the other hand, are always one-to-many or one-to-one. You can see the multiplicity immediately by looking at the foreign key definition. The following is a foreign key declaration on the BILLING_DETAILS table for a one-to-many association (or, if read in the other direction, a many-to-one association):

USER_ID bigint foreign key references USERS

These are one-to-one associations:

USER_ID bigint unique foreign key references USERS
BILLING_DETAILS_ID bigint primary key foreign key references USERS

If you wish to represent a many-to-many association in a relational database, you must introduce a new table, called a link table. This table doesn't appear anywhere in the domain model. For our example, if we consider the relationship between the user and the billing information to be many-to-many, the link table is defined as follows:

create table USER_BILLING_DETAILS (
    USER_ID bigint foreign key references USERS,
    BILLING_DETAILS_ID bigint foreign key references BILLING_DETAILS,
    PRIMARY KEY (USER_ID, BILLING_DETAILS_ID)
)

We discuss association and collection mappings in great detail in chapters 6 and 7.

So far, the issues we've considered are mainly structural. We can see them by considering a purely static view of the system. Perhaps the most difficult problem in object persistence is a dynamic problem. It concerns associations, and we've already hinted at it when we drew a distinction between object network navigation and table joins in section 1.1.4, "Persistence in object-oriented applications." Let's explore this significant mismatch problem in more depth.

1.2.5. The problem of data navigation

There is a fundamental difference in the way you access data in Java and in a relational database. In Java, when you access a user's billing information, you call aUser.getBillingDetails() .getAccountNumber() or something similar. This is the most natural way to access object-oriented data, and it's often described as walking the object network. You navigate from one object to another, following pointers between instances. Unfortunately, this isn't an efficient way to retrieve data from an SQL database.

The single most important thing you can do to improve the performance of data access code is to minimize the number of requests to the database. The most obvious way to do this is to minimize the number of SQL queries. (Of course, there are other more sophisticated ways that follow as a second step.)

Therefore, efficient access to relational data with SQL usually requires joins between the tables of interest. The number of tables included in the join when retrieving data determines the depth of the object network you can navigate in memory. For example, if you need to retrieve a User and aren't interested in the user's billing information, you can write this simple query:

select * from USERS u where u.USER_ID = 123

On the other hand, if you need to retrieve a User and then subsequently visit each of the associated BillingDetails instances (let's say, to list all the user's credit cards), you write a different query:

select *
    from USERS u
    left outer join BILLING_DETAILS bd on bd.USER_ID = u.USER_ID
    where u.USER_ID = 123

As you can see, to efficiently use joins you need to know what portion of the object network you plan to access when you retrieve the initial User—this is before you start navigating the object network!

On the other hand, any object persistence solution provides functionality for fetching the data of associated objects only when the object is first accessed. However, this piecemeal style of data access is fundamentally inefficient in the context of a relational database, because it requires executing one statement for each node or collection of the object network that is accessed. This is the dreaded n+1 selects problem.

This mismatch in the way you access objects in Java and in a relational database is perhaps the single most common source of performance problems in Java applications. There is a natural tension between too many selects and too big selects, which retrieve unnecessary information into memory. Yet, although we've been blessed with innumerable books and magazine articles advising us to use StringBuffer for string concatenation, it seems impossible to find any advice about strategies for avoiding the n+1 selects problem. Fortunately, Hibernate provides sophisticated features for efficiently and transparently fetching networks of objects from the database to the application accessing them. We discuss these features in chapters 13, 14, and 15.

1.2.6. The cost of the mismatch

We now have quite a list of object/relational mismatch problems, and it will be costly (in time and effort) to find solutions, as you may know from experience. This cost is often underestimated, and we think this is a major reason for many failed software projects. In our experience (regularly confirmed by developers we talk to), the main purpose of up to 30 percent of the Java application code written is to handle the tedious SQL/JDBC and manual bridging of the object/relational paradigm mismatch. Despite all this effort, the end result still doesn't feel quite right. We've seen projects nearly sink due to the complexity and inflexibility of their database abstraction layers. We also see Java developers (and DBAs) quickly lose their confidence when design decisions about the persistence strategy for a project have to be made.

One of the major costs is in the area of modeling. The relational and domain models must both encompass the same business entities, but an object-oriented purist will model these entities in a different way than an experienced relational data modeler would. The usual solution to this problem is to bend and twist the domain model and the implemented classes until they match the SQL database schema. (Which, following the principle of data independence, is certainly a safe long-term choice.)

This can be done successfully, but only at the cost of losing some of the advantages of object orientation. Keep in mind that relational modeling is underpinned by relational theory. Object orientation has no such rigorous mathematical definition or body of theoretical work, so we can't look to mathematics to explain how we should bridge the gap between the two paradigms—there is no elegant transformation waiting to be discovered. (Doing away with Java and SQL, and starting from scratch isn't considered elegant.)

The domain modeling mismatch isn't the only source of the inflexibility and the lost productivity that lead to higher costs. A further cause is the JDBC API itself. JDBC and SQL provide a statement-oriented (that is, command-oriented) approach to moving data to and from an SQL database. If you want to query or manipulate data, the tables and columns involved must be specified at least three times (insert, update, select), adding to the time required for design and implementation. The distinct dialects for every SQL database management system don't improve the situation.

To round out your understanding of object persistence, and before we approach possible solutions, we need to discuss application architecture and the role of a persistence layer in typical application design.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.131.152.166