Chapter 7. Rapidity and the Database

Although there is a perverse sense of pleasure to be had from writing carefully optimized data access code, the repetitive and error-prone nature of the task soon reduces it to a menial chore.

The J2EE platform offers a sophisticated set of services for accessing relational database management systems (RDBMS), yet despite these services, many developers find the task of writing data access code for enterprise systems a time-consuming and laborious process.

This chapter examines the problems relational databases present for the rapid developer. Building on the techniques covered in previous chapters, we look at how the right development tools and code generation techniques can help alleviate the issues identified with writing data access code.

Specifically, we focus on the open source tools Hibernate and Middlegen, two products that can ease the frustrations often associated with implementing a persistence layer, both by simplifying data access technology and by reducing the amount of code we must write.

First, let’s examine the problems relational databases present the enterprise specialist.

The Database Dilemma

Enterprise-scale database servers are highly sophisticated software products capable of storing enormous amounts of data in a format that is fully optimized for blisteringly fast data access and retrieval. Given this level of sophistication, why do databases cause such frustration for the J2EE developer?

A number of factors, both cultural and technical, combine to make life harder for the developer:

  • Enterprise data is a valuable corporate asset, and its access and management is often carefully controlled.

  • Databases use relational rather than object technology, resulting in an object–relational impedance mismatch.

  • Databases are sensitive to change, with database schema modifications having the potential to significantly impact any dependent systems.

To understand why databases present such barriers to producing solutions rapidly and agilely, let’s consider each of these factors in turn.

Enterprise Data Is Valuable

Enterprise data is a valuable company asset that is strategic to an organization’s ability to conduct its core business. Consequently, companies go to great lengths to safeguard the integrity and security of such vital corporate resources. Having enterprise data prized so highly and treated so carefully has its implications for development teams:

  • In many organizations, development teams and database teams operate as separate groups.

  • Enterprise databases are unlikely to be used exclusively by a single application but by other systems and reporting tools as well.

  • As new systems replace old systems, applications must deal with legacy data structures.

  • Access to information may be restricted if the data is commercially sensitive.

It is worth considering these points in more detail, as each has an impact on how a development project is conducted.

Separate Development and Database Teams

Due to the importance of company data, it is common for companies to run a dedicated team of database administrators (DBAs) and data architects charged with safeguarding and administering all enterprise-level data repositories. This enterprise data team usually operates independently of the application development project team but typically advises it on database design issues.

This division between application and data teams means software architects do not have complete freedom to structure the data used by the application as they see fit. Instead, all database designs must typically be approved in some capacity by the data architect, whose role is to ensure the needs of a single development project do not conflict with corporate data standards and policies.

This constraint may prevent the development team from adopting data access technologies that require the data structure to be laid down in a specific format. Moreover, the architects may find themselves working with a database structure that is not to their liking.

Although many architects might feel aggrieved not to have total control over every aspect of an application's design, it is reasonable that someone with specialized database design skills should be involved in the database design. The skill set for designing and maintaining a database is vastly different from that required for designing J2EE applications.

This issue points to a cultural difference between teams using object-oriented methods to develop software and those charged with the integrity of corporate data. In his book Agile Database Techniques [Ambler, 2003], Scott Ambler suggests appointing someone to the role of mediator between the development and database groups. Ambler defines this role as the Agile DBA.

The role of the Agile DBA is to bridge the gap between the J2EE development team working with object-based techniques and the database group whose focus is on data modeling. By mediating between the two groups, the Agile DBA should ensure both teams are working toward the same goal, regardless of paradigm.

Shared Data Between Systems

Data that is truly enterprise-level is unlikely to be the sole preserve of a single system. Such data is usually accessed, and possibly even updated, by other applications within the organization.

Shared access is likely to come from multiple directions. Batch processes running reconciliation or data-fix jobs are common. Most organizations use commercial software tools for accessing data in order to generate reports. Consequently, a J2EE application is likely to share a database with batch processes and commercial reporting tools.

Shared database access between applications has implications for the data architect and the J2EE architect. The data architect must design the schema of the database according to the best practices of database design to ensure efficient use of the database for all systems, not just those using object-oriented technologies. Thus, the data architect is reluctant to violate the principles of good database design to meet the needs of a single application unless the application in question is of significant strategic importance to the business.

For the J2EE architect, sharing a database with other systems presents design issues, especially when considering caching technology, such as that offered by entity beans, to address performance concerns.

Legacy Data Structures

Many new enterprise systems are either replacing older systems or being integrated with existing systems. Corporate data also tends to have a longer life span than software systems, so existing data must be migrated to newer systems as they come online.

Project teams therefore find themselves working with legacy data structures that are a hangover from an older system. In this situation, the team has no control over the structure of the data and must work with the design in place. Given the tendency for software to atrophy over time, legacy data structures often bear the scars of numerous enhancements, design changes, and emergency quick fixes. Such structures can make data access code extremely difficult to write due to the convoluted nature of the data design upon which it must be based.

Data Security and Confidentiality

Where data is considered especially commercially sensitive or of a personal nature, the development team may not be granted access to a copy of the production data. This situation is most likely to arise when an existing database is being built upon.

A project team might be denied access to the very data it is expected to work with for a variety of reasons, including government laws regarding personal data.

This issue is of particular relevance where an organization employs the services of a separate software development company to undertake application development on its behalf. With this scenario, the commercially sensitive nature of the data may prevent the external development team from accessing any data representative of the production version. If this situation arises, additional tasks must be added to the project plan to cover the creation of suitable test data for the development and testing teams.

note

Some companies have policies that mandate that all sensitive customer data, such as names, addresses, and phone numbers, be either stripped or obfuscated before being made available to development teams.

In addition, not being able to work with actual data and realistic data volumes presents some significant risks to the project. These risks relate to performance, since exploratory prototypes cannot be used to validate that the design will meet the performance criteria required of the system. Subtle differences between test data and actual data may also present problems when the system is released into a live environment.

All of the issues discussed so far take time to manage and thus may extend the timeframe of the project.

The Object-Relational Impedance Mismatch

Object-oriented and relational database technologies represent separate and distinct conceptual paradigms. The term object-relational impedance mismatch, or impedance mismatch, was coined in the early 1990s to formalize the problems endemic to moving between the object and the relational worlds.

The impedance mismatch problem occurs because object and relational techniques each work toward different objectives. Databases rely on the mathematical precision of relational algebra to structure business data in an efficient normalized form. Object-oriented design methods go beyond pure data modeling to define business processes as a collection of collaborating business components that have both state and behavior.

Given the impedance mismatch problem, the question arises, Why are object and relational technologies so frequently used together for the development of enterprise systems? An alternative to the relational database does exist in the form of the object database management system (ODBMS). However, ODBMS technology is taking time to mature and has yet to prove itself at the enterprise level. For this reason, almost all enterprise software uses a relational database.

Relational databases are a mature and proven technology and can trace their origins back to 1970, when Dr. E. F. Codd, the father of the relational database, first published his work on the relational model; his famous twelve rules for relational systems followed in 1985.

Contrast this history with object-oriented technologies such as J2EE, which have only emerged into the mainstream in the past decade. Despite the frustrations the impedance mismatch causes object-oriented practitioners, relational databases are likely to be the standard form of database technology for enterprise software for the foreseeable future. Therefore, it is important to understand the constraints imposed by impedance mismatch and why the problem causes such headaches.

To appreciate the problems, consider the ideal behavior a J2EE architect would like to see from a persistence mechanism. Most well-designed object-oriented systems are constructed around a domain model. The domain model describes the various relationships between each object involved in the problem domain. Typically, the objects and the relationships between them are represented using a UML class diagram.

Ideally, the architect would like object instances from this domain model to be transparently persisted to and from the underlying database, although a good design would see a persistence layer residing between the business objects of the domain and the data store for decoupling purposes.

Note

The importance of layers in software architecture is covered in Chapter 4.

The key word here is transparency. True persistence transparency enables objects to be transferred between the database and the application without concern for the intricacies of how the state of an object is persisted to the data store.

Unfortunately, impedance mismatch problems make true transparency difficult to achieve if a relational database is the target. Let’s consider some of the reasons this should be the case.

Mapping Database Types to Java Types

The first problem is relatively straightforward. The properties of a persistent Java class must be mapped to columns in a database table. The Java language and relational databases support subtly different basic types. For persistence to occur, the types must be mapped correctly to ensure no loss of data results from, for example, long Java strings being truncated in VARCHAR(20) columns. The mapping of types is relatively easy to manage. Mapping relationships, however, is considerably more complex.
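
To make this hazard concrete, here is a minimal sketch; the CustomerRecord class and its 20-character limit are hypothetical, standing in for any Java property mapped to a constrained column:

// Hypothetical class whose name property maps to a VARCHAR(20) column.
public class CustomerRecord {

  private String name;

  public void setName(String name) {
    // A Java String is unbounded; without a guard like this, values
    // longer than the column allows may be truncated or rejected at
    // runtime, depending on the database and driver.
    if (name != null && name.length() > 20) {
      throw new IllegalArgumentException("name exceeds VARCHAR(20)");
    }
    this.name = name;
  }

  public String getName() {
    return this.name;
  }
}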

Mapping Relationships

On the surface, the differences between object and relational technologies appear to be only superficial. After all, relational databases enable relationships to be specified between entities, while the object-oriented model defines relationships between classes.

Database designers use entity-relationship (ER) diagrams to describe relationships between database entities. ER diagrams are not part of the UML but are a recognized modeling notation. Relationships between entities in a database are modeled based on cardinality and enforced using foreign keys. Three possible relationship types can be modeled with relational technology:

  • One-to-one

  • One-to-many

  • Many-to-many

note

The many-to-many relationship, although it may be modeled, is not directly supported by relational databases. Common practice in this case is to use a link, or association, table to split the many-to-many relationship into two one-to-many relationships.

The object-oriented designer has a richer set of relationships to draw upon. Information that can be both modeled in the UML class diagram and implemented in the Java code includes:

  • Relationship cardinality

  • Association by both composition and aggregation

  • Inheritance

  • Unidirectional and bidirectional relationships

Coercing these relationships to fit those of the relational database model is not a trivial task, and a direct mapping of object model to database schema can result in a suboptimal database design. This gives rise to the argument as to which technology, object-oriented or relational, should be driving the design of the data model.
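
To see why, consider the inheritance relationship, for which relational schemas have no native construct. The Party and Customer classes below are hypothetical; a mapping must typically choose between one table for the whole hierarchy, a table per subclass, or a table per concrete class, each with different storage and query trade-offs.

// Hypothetical domain classes related by inheritance.
abstract class Party {
  protected Long partyId;
  protected String name;
}

class Customer extends Party {
  // Subclass state must be squeezed into the relational model somehow:
  // as nullable columns in a shared Party table, as a separate Customer
  // table joined to Party, or as a self-contained Customer table.
  private String creditRating;
}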

Data Models Driving the Object Model

For the majority of enterprise systems, the data model drives the design of a system’s object model. The reasons for taking this approach are as follows:

  • Object models tend to translate into inefficient database schemas.

  • Databases are often accessed by other enterprise systems not using object-oriented technology.

  • Database schemas are more rigid and harder to change than the object models, which are in the hands of the development teams.

Despite these reasons, life is considerably simpler for the development team if the object model translates directly into the underlying data architecture. This approach removes many of the headaches associated with mapping between the two paradigms and enables systems to be constructed swiftly.

Nevertheless, for the development of enterprise software, such arguments are likely to prove moot. As we discussed previously, enterprise data is a valuable commodity, and no data architect is likely to accept a data model from an object-oriented designer that does not comply with the best practices of data modeling techniques.

Data Access Options

When it comes to implementing a persistence layer, J2EE application developers seem to be spoiled for choice, with an almost bewildering range of data access technologies available. This section provides an overview of some of the more common mechanisms employed on projects for implementing one of the most important layers in the application:

  • The JDBC API

  • Object/Relational mapping tools

  • Entity beans

  • Java Data Objects

The foundation of all these technologies is the JDBC API, which we discuss first.

Java Database Connectivity (JDBC)

The ability to write code for data access using the JDBC API is a skill most Java developers possess. JDBC enables us to communicate directly with the database, pushing and pulling information between the relational world and the object-oriented world, forming a bridge between the two paradigms.

The strengths of JDBC lie in its popularity, as it is perhaps the most common method of database access for Java applications. By coding to the JDBC API, developers can

  • Establish database connections

  • Execute SQL statements

  • Invoke database-stored procedures

  • Process database results

  • Achieve a degree of portability between different database products

Here are some of the main advantages of using a JDBC driver for database access:

  • Most Java developers are familiar with the JDBC API.

  • Developers skilled in SQL and relational database technology can construct highly optimized SQL statements.

  • JDBC drivers are available for all major DBMS products.

  • JDBC resources are managed by the J2EE server.

JDBC enables the use of SQL as a domain-specific language for simplifying data manipulation with convenient and easily understandable language constructs. With a suitable database driver, the developer has the option of constructing SQL statements within the code and executing them against the database at runtime.
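
As an illustration, here is a minimal sketch of this style of data access; the CustomerDAO class is hypothetical and is written against a Customer table like the one created later in this chapter:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// A minimal sketch of hand-written JDBC data access. The SQL string is
// embedded in the Java source and is only validated when it reaches the
// database at runtime.
public class CustomerDAO {

  public String findCustomerName(Connection conn, int customerId)
      throws SQLException {
    PreparedStatement ps = conn.prepareStatement(
        "SELECT name FROM Customer WHERE customer_id = ?");
    try {
      ps.setInt(1, customerId);
      ResultSet rs = ps.executeQuery();
      return rs.next() ? rs.getString("name") : null;
    } finally {
      ps.close(); // always release JDBC resources
    }
  }
}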

This approach, in addition to being time consuming and error prone, is very sensitive to database change. Changes to table names, column names, or the order of columns within a table require the data access code to be updated. Worse still, differences between data access code and the database schema are only detected at runtime, as embedded SQL statements are not validated at compile time. The careful use of stored procedures is one method of alleviating the impact of the runtime detection issue.

Some of the problems associated with the use of the JDBC API for object persistence include the following:

  • The process of writing code to map objects to a relational database is time consuming and error prone.

  • Embedding SQL within Java code results in a brittle data access layer that is easily broken by database schema changes.

  • The writing of highly optimized SQL statements is a skill not all Java developers possess.

  • Poorly written SQL statements can have a catastrophic effect on database performance.

  • Although the JDBC API is common between DBMS products, SQL syntaxes vary to the degree that JDBC data access code is not portable between databases.

The next sections look at database access technologies that use JDBC as a base but attempt to avoid some of these pitfalls.

Object/Relational Mapping Tools

One technology that has grown in maturity and sophistication in recent years is object/relational (O/R) mapping. Mapping tools offer a method of persisting an object-oriented domain model transparently to the database without the need to write a line of JDBC data access code or SQL.

The task of persisting an object’s state is handled by the O/R mapping product, which generates the JDBC calls on our behalf. All that is required of the developer is to specify, usually via an XML-based configuration document, the mapping rules for a particular class. Based on the rules in the configuration document, the O/R mapping tool undertakes all data access, leaving the developer free to concentrate on implementing critical business functionality within the application instead of writing boilerplate data access code. The mapping rules can be defined by hand but are more commonly generated by the O/R mapping tool.

The benefits of O/R mapping technology include these:

  • Increased developer productivity, as the O/R mapping tool generates the JDBC calls necessary for object persistence

  • A flexible persistence layer, as database changes can be reflected in the application by regenerating all data access code

  • Simpler software architectures, as the persistence concern is handled by the O/R mapping tool

  • Better system performance, as the generated data access code is optimized for the target DBMS

  • The ability to operate both inside and outside the confines of a J2EE server, in both the EJB and Web containers

  • Portability, as the SQL generated by the O/R mapping tool can target a specific DBMS

tip

The ability to operate O/R mapping tools outside of the J2EE server has excellent implications for evolutionary prototyping.

For example, a persistence layer can be used either as part of a prototype built as a Web application, using only the Web container, or as part of a two-tier application developed using Swing for the GUI.

The persistence layer for either prototype scenario can then be evolved to an EJB architecture by having session beans make use of the O/R mapping tool.

During this chapter, we look at an example of an open source O/R mapping tool. First, let’s look at two Java technologies that build on the services of the JDBC API, but with an O/R mapping approach to object persistence.

Entity Beans

Entity beans are a form of O/R mapping technology and are a member of the Enterprise JavaBeans family. Entity beans provide an object-oriented view of business domain model entities held in persistent storage, where persistent storage can be either a database or an existing application. By providing a wrapper around the underlying data, entity beans attempt to simplify the process of data access and retrieval. Entity beans are a core part of the EJB specification and so can be considered the official persistence layer for J2EE applications.

An entity bean is a shareable, in-memory representation of a row in a database table. The combination of being both shareable and in-memory allows entity beans to operate as a caching mechanism for application data. Controlling concurrent access to the cached data in the bean, and refreshing or writing the cache when needed, is the responsibility of the EJB container. The EJB container also safeguards the state of an entity by ensuring it can survive a server crash.

While the EJB container manages the state of an entity bean and ensures support for concurrent access, the mechanism by which the underlying row in the database is accessed and updated is the responsibility of the developer. The decision as to how data access is managed is dictated by two different flavors of entity bean:

  • Bean-Managed Persistence (BMP).

    BMP entity beans rely on the developer implementing framework methods on the bean, such as ejbLoad() and ejbStore(), for managing data access. Typically, data access code takes the form of calls to the JDBC API, as the sketch following this list illustrates.

  • Container-Managed Persistence (CMP).

    CMP is a form of rudimentary O/R mapping that sees the EJB container take on the responsibility for access to the data source. CMP was introduced as part of the EJB 1.1 specification, but unfortunately, the technology was not adequately specified to a workable level until version 2.0 of the EJB specification was released. It is now recommended that entity beans developed for the EJB 2.0 and 2.1 specifications use CMP in preference to BMP.
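
As a flavor of the BMP approach mentioned above, here is a minimal sketch of an ejbLoad() implementation. The CustomerBean class, its fields, and the jdbc/CustomerDS resource reference are all hypothetical, and the remaining EntityBean framework methods are omitted for brevity:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import javax.ejb.EJBException;
import javax.ejb.EntityBean;
import javax.ejb.EntityContext;
import javax.ejb.NoSuchEntityException;
import javax.naming.InitialContext;
import javax.naming.NamingException;
import javax.sql.DataSource;

// Hypothetical BMP entity bean; only ejbLoad() is shown, so the class
// is declared abstract rather than implementing the full EntityBean
// contract.
public abstract class CustomerBean implements EntityBean {

  private EntityContext context;
  private Integer customerId;
  private String name;

  public void setEntityContext(EntityContext ctx) {
    this.context = ctx;
  }

  public void ejbLoad() {
    customerId = (Integer) context.getPrimaryKey();
    try {
      DataSource ds = (DataSource) new InitialContext()
          .lookup("java:comp/env/jdbc/CustomerDS");
      Connection conn = ds.getConnection();
      try {
        PreparedStatement ps = conn.prepareStatement(
            "SELECT name FROM Customer WHERE customer_id = ?");
        ps.setInt(1, customerId.intValue());
        ResultSet rs = ps.executeQuery();
        if (!rs.next()) {
          throw new NoSuchEntityException("Customer " + customerId);
        }
        name = rs.getString("name");
      } finally {
        conn.close();
      }
    } catch (NamingException e) {
      throw new EJBException(e);
    } catch (SQLException e) {
      throw new EJBException(e);
    }
  }
}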

Here is a list of some of the main benefits entity beans offer as an object-persistence mechanism:

  • They are a mandatory part of the EJB specification and are supported by all compliant J2EE servers.

  • Entity beans are well supported by development tools.

  • They provide higher performance through instance pooling.

  • Management of the lifetime of an entity bean instance is the responsibility of the J2EE server.

  • Entity beans can survive a server crash.

  • The J2EE server manages access to shared entity bean instances from multiple threads.

  • As Enterprise JavaBeans, entity beans offer declarative support for security and transactions.

Although the inclusion of entity beans within the EJB specification would seem to make them the persistence technology of choice, the reaction of developers to the merits of entity beans has been less than favorable. Some of the complaints raised regarding entity beans include these:

  • The CMP O/R mapping functionality offered by entity beans is limited when compared to mature O/R mapping tools such as TopLink and CocoBase.

  • It is not possible to model the inheritance relationship.

  • Entity beans, like all enterprise beans, are heavyweight components and require considerable amounts of framework code (boilerplate).

  • Components are more suited to business objects than persistent domain model objects.

  • Entity beans are complex to develop and do not align with the domain model, especially if inheritance is involved.

  • Entity beans cannot be tested outside of the container.

  • Developers have found enterprise beans difficult to work with.

The hostile reaction of the Java community to entity beans has seen other technologies spring to the fore. One such technology is Java Data Objects.

Java Data Objects

Somewhat confusingly, Java also has a second O/R mapping technology that competes with entity beans. Java Data Objects (JDO) is a specification defined under the Java Community Process by JSR-012, Java Data Objects. A second iteration of the JDO technology is currently under review, which is covered by JSR-243.

note

Find the JDO specifications for JSR-012 and JSR-243 at http://jcp.org/en/jsr/detail?id=012 and http://jcp.org/aboutJava/communityprocess/edr/jsr243/index.html respectively.

The JDO specification has a set of objectives similar to O/R mapping products in that the stated aim of the technology is to provide Java developers with a transparent Java-centric view of persistent information residing in a variety of data sources.

The scope of JDO is wider than providing a purely object/relational mapping technology. JDO implementations cover a range of data storage mediums, including object databases and enterprise information systems. In this way, JDO provides an object view of persistent storage regardless of the underlying persistence technology, whether relational, object-based, or otherwise.

Here are some of the main features and benefits JDO technology brings to the Java platform:

  • It provides a standard Java-centric API for true transparent object persistence, as the sketch following this list illustrates.

  • JDO implementations support the inheritance relationship.

  • JDO persistent classes run both inside and outside of a J2EE server.

  • A JDO enhancer can add data access code for persistence to standard Java objects (or plain old Java objects, POJOs) at the bytecode level.

  • JDO implementations offer access to EIS persistence resources via the J2EE Connector Architecture on the J2EE platform.
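
As a flavor of the API, here is a minimal sketch of persisting an object with JDO. The Customer class and the vendor factory class named in the properties are hypothetical; a real application would substitute the class name supplied by its JDO vendor:

import java.util.Properties;
import javax.jdo.JDOHelper;
import javax.jdo.PersistenceManager;
import javax.jdo.PersistenceManagerFactory;
import javax.jdo.Transaction;

// A minimal sketch of transparent persistence through the javax.jdo API.
public class JdoExample {

  // Hypothetical persistence-capable class (enhanced at build time).
  public static class Customer {
    private String name;
    public Customer(String name) { this.name = name; }
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    // The factory class is vendor-specific; this name is hypothetical.
    props.setProperty("javax.jdo.PersistenceManagerFactoryClass",
        "com.example.jdo.PersistenceManagerFactoryImpl");

    PersistenceManagerFactory pmf =
        JDOHelper.getPersistenceManagerFactory(props);
    PersistenceManager pm = pmf.getPersistenceManager();
    Transaction tx = pm.currentTransaction();
    try {
      tx.begin();
      // Customer is an ordinary Java class enhanced for persistence;
      // no SQL or mapping code appears here.
      Customer customer = new Customer("Acme Corp");
      pm.makePersistent(customer);
      tx.commit();
    } finally {
      if (tx.isActive()) {
        tx.rollback();
      }
      pm.close();
    }
  }
}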

Unfortunately, the JDO specification has failed to strike a chord with O/R mapping tool vendors, presumably because products such as CocoBase from Thought Inc. and TopLink from Oracle have already carved themselves a sizeable market niche.

The JDO architecture also faces an uncertain future. The first draft release of the EJB 3.0 specification under JSR-220 announced that the next incarnation of the EJB architecture would provide yet another standard for a Java O/R mapping technology.

Since that time, Sun Microsystems has announced that as both the EJB and JDO specifications are undergoing further revision under the Java Community Process, members of both the JSR-220 and JSR-243 Expert Groups will collaborate to define a single specification for providing transparent Java object persistence. Only time will tell what this move means for the future of JDO and entity beans.

Code Generation and O/R Mapping

The database is a rich source of metadata, making the writing of data access code a prime candidate for code generation techniques. As we learned in Chapter 6, Code Generation, active code generation can help increase developer productivity and promote project agility.

The next sections cover by example how O/R mapping tools can combine with code generation techniques to automate the generation of all data access code for a project. The example covers the use of code generation and O/R mapping tools from the perspective of the data model driving the process, as this is the typical scenario for enterprise software.

Two open source tools are used in the example: Hibernate and Middlegen. Hibernate offers O/R mapping, while Middlegen is a database-driven code generation tool, able to generate mapping files for a variety of O/R mapping technologies, including entity beans, JDO, and Hibernate.

note

Using code generation for producing data access code is not exclusive to O/R mapping products. Entity beans and standard JDBC calls can both benefit from code generation techniques.

The example has a number of steps that demonstrate an end-to-end generation process from entity model to database and then back to an object model:

  1. Create a database.

  2. Define the entities of a data model using a modeling tool.

  3. Use the modeling tool to forward-engineer the script to create a database schema.

  4. Create the database schema.

  5. Use Middlegen to construct Hibernate mapping files by reverse engineering from the database schema.

  6. Use the Hibernate tool, hbm2java, to create plain Java classes from the mapping files.

  7. With the modeling tool, reverse-engineer the generated Java code to view the object model.

  8. Compare the original data model against the generated object model.

First, we look at the two main products used in the example.

Introducing Hibernate

Hibernate is an open source O/R mapping product. Unlike entity beans, which are heavyweight components, Hibernate allows standard Java objects, or POJOs, to be transparently written to and retrieved from the underlying data store.

As with commercial O/R mapping tools, Hibernate supports enterprise-level capabilities, including transactions, connection pooling, a powerful SQL-like query language, and the ability to declaratively define entity relationships. The tool also integrates with J2EE, making it suitable for both J2SE-based and J2EE-based applications.

Hibernate uses configuration files to hold metadata describing mapping settings and entity relationship information. The Hibernate configuration files are one of the tool’s key strengths. They offer extensive configuration options, making it possible to precisely define the behavior of the persistence framework. Such configuration precision ensures Hibernate can be tuned to deliver a high-performance persistence solution.

Hibernate supports the definition of inheritance, association, and composition relationship types, in addition to the specification of relationships based on Java collections. Hibernate supports most of the leading RDBMS products.

Although Hibernate is open source, it is nevertheless a proprietary API. Unlike JDO, it has not gone through the JSR process and consequent community review. Despite this, the Hibernate development team has arguably eclipsed JDO by providing a robust implementation with a highly detailed level of documentation. Consequently, Hibernate has enjoyed good word of mouth regarding the quality of the product, and developers have been quick to embrace Hibernate as the open source mapping tool of choice. The Hibernate team claims the product is the leading O/R mapping toolkit for Java.

The Hibernate binaries, source, documentation, and examples can be downloaded from http://www.hibernate.org. Hibernate is available under the LGPL license, which allows commercial use. As usual, check with the site for full licensing details.

Introducing Middlegen

Over the years, various code generators have emerged that use database metadata to drive the code generation process. Middlegen is one such open source product that has proven popular with developers and complements Hibernate’s persistence capabilities by taking on the chore of generating Hibernate mapping files.

Middlegen uses JDBC to access database metadata and the Velocity engine for code generation. The use of JDBC enables Middlegen to support most of the major RDBMS vendors. Like XDoclet, Middlegen uses an Ant build file for launching the product.

Note

The Velocity template engine and XDoclet are discussed in Chapter 6.

Middlegen uses a plug-in approach for code generation. After Middlegen has determined the database structure, the task of generating code is handed over to those plug-in generators that have been configured with the <middlegen> Ant build task. At the time of writing, plug-ins are available for generating entity beans, JDO classes, and Hibernate mapping files.

The example focuses on generating Hibernate mapping files; however, Middlegen is not constrained to this one persistence technology. The ability to produce code for more than one type of persistence mechanism is vital, as either the architecture or customer requirements may mandate the persistence technology used. The plug-in architecture of Middlegen provides the ability to accommodate such changes.

To download the latest version of Middlegen, along with full documentation and a comprehensive sample application, visit http://boss.bekk.no/boss/middlegen/.

note

As with all successful software technologies, ongoing development work continues to refine and improve the product. This is certainly the case with Middlegen and Hibernate. At the time of writing, I am using versions 2.x of both Middlegen and Hibernate. Middlegen currently has a version 3.0 in the pipeline, while Hibernate is producing a new version of its Middlegen plug-in for version 3.0. The release strategy is to make all of these betas final at the same time.

So as not to be totally out of date, the example uses a preview of the new Hibernate plug-in, which is available for download from the Hibernate site.

With all the necessary software installed, we can move on to generating the persistence layer. However, before we can look at how code generation techniques can combine with O/R mapping tools to save us time and effort, we need an RDBMS installed and an operational database established with a suitable schema.

The next section covers the setup of the RDBMS and creation of the database schema required for the example.

Setting Up the Database

To illustrate the virtues of Hibernate and Middlegen, we first install a suitable RDBMS and create a database for which a persistence layer can be generated.

The first step is to install an RDBMS, and in this case, we use the services of MySQL.

Introducing MySQL

MySQL is a small, efficient database and is freely available for most platforms. MySQL comes in several editions, including MySQL Pro, a commercial version intended for use in production systems, and MySQL Standard, which is available free under the GNU Public License (GPL). Both editions offer identical functionality; the only differences are in the terms and conditions of the license.

The upcoming example uses version 4.0 of MySQL Standard. MySQL binaries are available for download from http://dev.mysql.com/downloads/mysql/4.0.html.

Comprehensive installation instructions, along with the MySQL manual, can be found at http://dev.mysql.com/doc/mysql/en/Installing.html.

Access to a MySQL database from Java requires the MySQL Connector/J JDBC driver. This is a Type 4 driver, and like MySQL Standard, is free to download. The latest version can be pulled down from http://www.mysql.com/products/connector/j.

From this download, we get mysql-connector-java-3.0.11-stable-bin.jar, which must be placed on the classpath of any Java application requiring access to a MySQL database.

For any additional information regarding MySQL, refer to the main product Web site at http://www.mysql.com.

Once the database has been installed, the next step is to build up a suitable database schema for use in the example.

Creating a Database Schema

To generate the persistence layer from the database, we need a suitable schema from which to work. For the example, I have put together an entity-relationship (ER) diagram containing five entities: Customer, Account, Purchase_Order, Item, and Product.

The ER diagram in Figure 7-1 defines the relationships between the entities.


Figure 7-1. Entity-relationship diagram for the Customer schema.

From the ER diagram, the following relationships are apparent:

  • Each Customer entity can have exactly one Account.

  • Each Account can have many Purchase_Order entities.

  • A Purchase_Order comprises many Item entities.

  • A one-to-one relationship has been defined between Item and Product.

To save time creating the schema from the diagram, the data modeling features of a modeling tool were used to generate the data definition language (DDL) statements from the ER diagram. The example uses Borland’s Together ControlCenter to produce the DDL.

note

Unfortunately, the generated scripts weren’t exactly what I was after, making it necessary to tinker with the output. I therefore claim this to be an example of passive, not active, code generation.

The final version of the database script is shown in Listing 7-1.

Example 7-1. Database Script customer-mysql.sql

-- Drop dependent tables first so that foreign key
-- constraints do not block the drops
DROP TABLE IF EXISTS Item;
DROP TABLE IF EXISTS Purchase_Order;
DROP TABLE IF EXISTS Product;
DROP TABLE IF EXISTS Account;
DROP TABLE IF EXISTS Customer;

CREATE TABLE Customer
(
  customer_id mediumint(7) NOT NULL,
  name varchar(127),
  PRIMARY KEY (customer_id)
) TYPE=INNODB;

CREATE TABLE Account
(
  account_id mediumint(7) NOT NULL,
  customer_id mediumint(7) NOT NULL,
  balance mediumint(7),
  invoice_period mediumint(7),
  PRIMARY KEY (account_id),
  INDEX customer_idx(customer_id),
  FOREIGN KEY (customer_id) REFERENCES Customer(customer_id)
) TYPE=INNODB;

CREATE TABLE Purchase_Order
(
  order_id mediumint(7) NOT NULL,
  account_id mediumint(7) NOT NULL,
  delivery_date date,
  PRIMARY KEY (order_id),
  INDEX account_idx(account_id),
  FOREIGN KEY (account_id) REFERENCES Account(account_id)
) TYPE=INNODB;

CREATE TABLE Product
(
  product_id mediumint(7) NOT NULL,
  name varchar(127),
  description text,
  PRIMARY KEY (product_id)
) TYPE=INNODB;

CREATE TABLE Item
(
  item_id mediumint(7) NOT NULL,
  order_id mediumint(7) NOT NULL,
  product_id mediumint(7) NOT NULL,
  quantity mediumint(7),
  unit_price mediumint(7),
  PRIMARY KEY (item_id, order_id),
  INDEX order_idx(order_id),
  INDEX product_idx(product_id),
  FOREIGN KEY (order_id) REFERENCES Purchase_Order(order_id),
  FOREIGN KEY (product_id) REFERENCES Product(product_id)
) TYPE=INNODB;

The changes made were around declaring the type of the tables to be InnoDB by adding the line TYPE=INNODB. Without this specification, MySQL discards the relationship information imposed by the foreign key constraints. An alternative to changing the scripts would have been to configure MySQL to use the InnoDB table type by default.

Looking at the Account table, a relationship is expressed through the customer_id foreign key to the Customer table. MySQL requires that an index exist on a column before it can be used in a foreign key definition. Hence, the script declares INDEX customer_idx(customer_id), creating the index necessary to allow the relationship to be defined between the Account and Customer entities.

Running the Database Script

To create the database schema, we must first create a database and run the script. Start MySQL using the instructions appropriate for your particular platform—for example, mysqld --console.

MySQL comes with a command-line application for interacting with the database engine. Issue mysql from a command shell to start the application, then enter the following commands to create a database called customer and run the script.

mysql> create database customer;
mysql> use customer
mysql> source customer-mysql.sql

Issuing show tables at the mysql prompt lists the newly created tables for the schema.

With the database established and the schema created, we can now put both Hibernate and Middlegen through their paces.

Generating the Persistence Layer

The example takes the Customer data model directly from the database and generates a persistence layer of POJOs. Once the persistence layer has been generated, we reverse-engineer the Java into a UML class diagram and compare the result with the original ER diagram.

The code generation process involves several steps:

  1. Produce an Ant build file for running Middlegen.

  2. Run Middlegen against the database and fine-tune the code generation options using the GUI.

  3. Generate Hibernate mapping documents.

  4. Use the Hibernate tool hbm2java to generate Java classes from the Hibernate documents.

Listing 7-2 shows the relevant extracts from the build.xml file required to run Middlegen.

Example 7-2. Database-Generated Ant Build File

<?xml version="1.0" encoding="UTF-8"?>
  .
  .
  .

  <!-- Use Middlegen to create hibernate files from DB -->
  <target name="generate"
          description="Run Middlegen">

    <taskdef name="middlegen"
             classname="middlegen.MiddlegenTask"
             classpathref="project.class.path"/>

    <middlegen appname="${name}"
               prefsdir="${gen.dir}"
               gui="true"
               databaseurl="${database.url}"
               driver="${database.driver}"
               username="${database.userid}"
               password="${database.password}"
               schema="${database.schema}"
               catalog="${database.catalog}">

      <hibernate destination="${gen.dir}"
                 package="${name}.hibernate"
                 genXDocletTags="true"
                 genIntergratedCompositeKeys="false"/>

    </middlegen>
  </target>

  <!-- Generate Java from mapping files -->
  <target name="hbm2java"
          depends="generate"
          description="Creates Java classes from hbm files">

    <taskdef name="hbm2java"
             classname="net.sf.hibernate.tool.hbm2java.Hbm2JavaTask"
             classpathref="project.class.path" />

    <hbm2java output="${gen.dir}">
      <fileset dir="${gen.dir}">
        <include name="**/*.hbm.xml"/>
      </fileset>
    </hbm2java>
  </target>

</project>

The build file controls the entire code generation process. These next sections cover each step.

Running Middlegen from Ant

Middlegen does the hard work of creating the Hibernate mapping documents for us. The tool is invoked from an Ant build file, and the <middlegen> task is provided for this purpose. From the example build.xml, we have the following:

<middlegen appname="customer"
           prefsdir="gen_src"
           gui="true"
           databaseurl="jdbc:mysql://localhost/customer"
           driver="org.gjt.mm.mysql.Driver"
           username=""
           password=""
           schema=""
           catalog="">

This task provides Middlegen with the information it requires to use the MySQL JDBC driver to connect to the Customer database through the databaseurl and driver attributes. The username, password, schema, and catalog attributes provide further connection detail. These attributes are not needed if MySQL is running locally on a Windows platform.

The use of the gui attribute becomes clear shortly but basically tells Middlegen to open its visual configuration tool.

The next step is to tell Middlegen what to generate. Middlegen uses plug-ins for controlling code generation, and we use the Hibernate plug-in, which requires the nested element <hibernate> for configuration:

<hibernate destination="gen_src"
            package="customer.hibernate"
            genXDocletTags="true"
            genIntergratedCompositeKeys="false"/>

The package attribute provides the Hibernate plug-in with Java package information. The next two attributes require a little more description. The genXDocletTags attribute tells the plug-in to embed XDoclet tags in the XML mapping documents. Hibernate ignores these tags when reading the mapping documents; however, the Java generation tool later in the process picks up this additional metadata.

Where an entity has a composite primary key, Hibernate offers the choice of representing the key either with a separate, external primary key class or with the key fields integrated into the persistent class itself. This choice is specified with the genIntergratedCompositeKeys attribute. We opt for an external primary key class and set this attribute to false.
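
As an illustration, the kind of external primary key class this setting produces might look like the following sketch; the ItemPK name and its field types are hypothetical, modeled on the composite key of item_id and order_id in the Item table:

import java.io.Serializable;

// Hypothetical external composite key class for the Item table.
public class ItemPK implements Serializable {

  private Short itemId;
  private Short orderId;

  public ItemPK(Short itemId, Short orderId) {
    this.itemId = itemId;
    this.orderId = orderId;
  }

  // Composite keys must define value-based equality so the persistence
  // layer can match key instances that identify the same row.
  public boolean equals(Object other) {
    if (!(other instanceof ItemPK)) {
      return false;
    }
    ItemPK pk = (ItemPK) other;
    return itemId.equals(pk.itemId) && orderId.equals(pk.orderId);
  }

  public int hashCode() {
    return itemId.hashCode() * 31 + orderId.hashCode();
  }
}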

With this minimal Middlegen configuration complete, we can now invoke the build file and start the tool.

The Middlegen GUI

By setting the gui attribute to true, we have informed Middlegen we wish to use the GUI tool to configure the output from the plug-ins. Running the example build file brings up the screen shown in Figure 7-2.


Figure 7-2. The Customer schema in the Middlegen GUI.

The GUI allows us to assist Middlegen in correctly determining the relationships between entities. From the metadata available in the database, Middlegen is unable to infer the type of relationships established by a foreign key constraint.

tip

In preference to running the GUI, it is possible to define relationship information within the build script.

With the configuration used in our minimal build file, Middlegen assumes all foreign key constraints are modeling one-to-many relationships. Thus, on first opening the GUI, one-to-many relationships are defined between entities where a one-to-one relationship was expected—for example, Customer and Account, Product and Item. Thankfully, the GUI allows this initial interpretation to be corrected. By pressing the <ctrl> key and clicking with the mouse on the relationship, the cardinality of the association can be corrected.

The GUI also enables the output from the Hibernate plug-in to be further fine-tuned. Selecting either an entity or property in the ER diagram displays a configuration dialog for the element in focus. In this way, we can control exactly what is to be generated for the persistence layer. This dialog is shown in the lower pane of Figure 7-2.

warning

By default, all configuration settings made with the GUI tool are stored in the generated source directory. Cleaning all files from this directory results in these configuration settings being lost.

When satisfied with all settings, pressing the Generate button on the toolbar causes the Hibernate plug-in to generate mapping documents for each database entity. This process is visible in the console from which Middlegen was launched and provides a visual cue when the generation process has completed.

Let’s assess what Middlegen has produced.

Hibernate O/R Mapping Documents

As stated earlier, the output from the <middlegen> task is not Java source but Hibernate XML mapping documents. These documents instruct Hibernate how the mapping between a Java class and a database entity is to be orchestrated. A mapping document is produced for each entity in the Customer schema. Listing 7-3 shows the XML mapping document generated for the Purchase_Order database entity.

Example 7-3. Middlegen-Generated PurchaseOrder.hbm.xml

<?xml version="1.0"?>
<!DOCTYPE hibernate-mapping PUBLIC
    "-//Hibernate/Hibernate Mapping DTD 2.0//EN"
    "http://hibernate.sourceforge.net/hibernate-mapping-2.0.dtd" >

<hibernate-mapping>
<!--
    Created by the Middlegen Hibernate plugin

    http://boss.bekk.no/boss/middlegen/
    http://hibernate.sourceforge.net/
-->

<class
    name="customer.hibernate.PurchaseOrder"
    table="purchase_order"
>
    <meta attribute="class-description" inherit="false">
       @hibernate.class
        table="purchase_order"
    </meta>

    <id
       name="orderId"
       type="short"
       column="order_id"
    >
       <meta attribute="field-description">
          @hibernate.id
           generator-class="assigned"
           type="short"
           column="order_id"

       </meta>
       <generator class="assigned" />
   </id>

   <property
       name="deliveryDate"
       type="java.sql.Date"
       column="delivery_date"
       length="10"
   >
       <meta attribute="field-description">
          @hibernate.property
           column="delivery_date"
           length="10"
       </meta>
   </property>
 <!-- associations -->
 <!-- bi-directional many-to-one association to Account -->
 <many-to-one
     name="account"
     class="customer.hibernate.Account"
     not-null="true"
 >
     <meta attribute="field-description">
        @hibernate.many-to-one
         not-null="true"
        @hibernate.column name="account_id"
     </meta>
     <column name="account_id" />
 </many-to-one>
 <!-- bi-directional one-to-many association to Item -->
 <set
     name="items"
     lazy="true"
     inverse="true"
 >
     <meta attribute="field-description">
        @hibernate.set
         lazy="true"
         inverse="true"

       @hibernate.collection-key
        column="order_id"

         @hibernate.collection-one-to-many
          class="customer.hibernate.Item"
     </meta>
      <key>
          <column name="order_id" />
      </key>
      <one-to-many
          class="customer.hibernate.Item"
      />
   </set>

</class>
</hibernate-mapping>

Listing 7-3 shows how the configuration maps Java types to database types. This ease of configuration is one reason why Hibernate is proving a favorite with developers. Many of the attributes are self-explanatory. For example, the mapping of a class to a table is achieved with the <class> element:

<class name="customer.hibernate.PurchaseOrder"
       table="purchase_order">

Likewise, the <property> element maps properties to columns:

<property
  name="deliveryDate"
  type="java.sql.Date"
  column="delivery_date"
  length="10"/>

The <id> element is similar to that of the <property> element but identifies the property as being a primary key:

<id
  name="orderId"
  type="short"
  column="order_id">
  <generator class="assigned" />
</id>

The <generator> element specifies how the primary key will be produced. A value of assigned indicates that we will supply Hibernate with the key. Alternatively, Hibernate can generate the key on our behalf using one of the variety of unique key-generation algorithms it provides; for example, a value of native delegates key generation to the identity or sequence mechanism of the underlying database.

Not all configuration settings are quite so easily understood, and some elements warrant further description. The <meta> element looks a little out of place: it appears to be using Javadoc notation and repeats configuration detail that is already specified in other tags. For example,

<meta attribute="field-description">
  @hibernate.many-to-one
    not-null="true"
  @hibernate.column name="account_id"
</meta>

These are XDoclet tags and are parsed by the XDoclet Hibernate plug-in when encountered in Java source. These tags exist because the genXDocletTags attribute was set for the nested <hibernate> element in the Ant build file. This information is picked up in the next step when Java source is generated from the Hibernate mapping documents. These XDoclet tags are written out to the Java classes that are mapped by the Hibernate XML mapping documents. Using XDoclet, should we wish, we can regenerate the mapping documents from the Java instead of from the database. XDoclet provides developers with the option to adopt an object-driven rather than data-driven generation approach.

Once the Hibernate mapping documents have been produced, we can move to the next stage of the process: generating Java from the mapping information.

From Mapping Documents to Java

Closing the GUI causes the Ant build file to run to completion. Hibernate comes with the hbm2java tool, used to generate Java from mapping documents, and the build file invokes this tool once Middlegen has completed its part of the process. The relevant section of the build.xml file is the <hbm2java> task:

<hbm2java output="gen_src">
  <fileset dir="gen_src">
    <include name="**/*.hbm.xml"/>
  </fileset>
</hbm2java>

This task generates a Java class for each Hibernate mapping document. Listing 7-4 shows an extract from the PurchaseOrder class produced from the PurchaseOrder.hbm.xml mapping file.

Example 7-4. PurchaseOrder.java Extract Generated from PurchaseOrder.hbm.xml

/**
 *        @hibernate.class
 *         table="purchase_order"
 *
*/
public class PurchaseOrder implements Serializable {

  /** identifier field */
  private Short orderId;

  /** persistent field */
  private Date deliveryDate;

  /** persistent field */
  private customer.hibernate.Account account;

  /** persistent field */
  private Set items;

  /** full constructor */
  public PurchaseOrder(Short orderId,
                       Date deliveryDate,
                       customer.hibernate.Account account,
                       Set items) {
    this.orderId = orderId;
    this.deliveryDate = deliveryDate;
    this.account = account;
    this.items = items;
  }

  /**
    *            @hibernate.many-to-one
    *             not-null="true"
    *            @hibernate.column name="account_id"
    *
    */
  public customer.hibernate.Account getAccount() {
    return this.account;
  }

  public void setAccount(customer.hibernate.Account account) {
     this.account = account;
  }

  /**
    *             @hibernate.set
    *              lazy="true"
    *              inverse="true"
    *           @hibernate.collection-key
    *            column="order_id"
    *             @hibernate.collection-one-to-many
    *              class="customer.hibernate.Item"
    *
    */
  public Set getItems() {
    return this.items;
  }

  public void setItems(Set items) {
    this.items = items;
  }
  .
  .
  .

 }

Comparing the Hibernate mapping documents with the generated Java gives an insight into how O/R mapping is handled by Hibernate. Of note are the embedded XDoclet tags, denoted by the @hibernate namespace. The XDoclet Ant task <hibernatedoclet> can be used to drive the generation of the mapping documents from the Java source if required.

Completing the Round Trip

The generation of the persistence layer was initiated from an ER diagram. Despite a few manual changes to the DDL for producing the database schema, the entire persistence layer was constructed by code generators. The result is a set of lightweight Java objects that provide an object view of the database entities.

Reverse engineering the generated Java code into a UML class diagram enables the object relationships to be compared with the original ER diagram. Figure 7-3 shows the UML class diagram for the persistence layer.


Figure 7-3. UML class diagram mapping to Customer database schema.

note

Adornments have been added to the diagram to show multiplicities between classes that were not detected as part of the reverse-engineering process. However, these are only small refinements that describe what is already expressed in the generated code.

At the conclusion of the code generation process, we now have a transparent, object-based persistence layer with which to work. Best of all, other than an Ant build file, it wasn’t necessary to write a line of code to produce it.
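
As a closing illustration, loading one of the generated objects might look like the following sketch. It assumes Hibernate 2.x, a hibernate.cfg.xml on the classpath, and the generated PurchaseOrder class and mapping document:

import net.sf.hibernate.HibernateException;
import net.sf.hibernate.Session;
import net.sf.hibernate.SessionFactory;
import net.sf.hibernate.Transaction;
import net.sf.hibernate.cfg.Configuration;

import customer.hibernate.PurchaseOrder;

// A minimal sketch of working with the generated persistence layer.
public class OrderLoader {

  public static void main(String[] args) throws HibernateException {
    // Reads hibernate.cfg.xml and the *.hbm.xml mapping documents.
    SessionFactory factory =
        new Configuration().configure().buildSessionFactory();

    Session session = factory.openSession();
    Transaction tx = null;
    try {
      tx = session.beginTransaction();
      // Transparent retrieval: no SQL or JDBC in application code.
      PurchaseOrder order = (PurchaseOrder)
          session.load(PurchaseOrder.class, new Short((short) 1));
      System.out.println(order.getDeliveryDate());
      tx.commit();
    } catch (HibernateException e) {
      if (tx != null) {
        tx.rollback();
      }
      throw e;
    } finally {
      session.close();
    }
  }
}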

Summary

We’ve established that the production of data access code for enterprise-level software can be a laborious, time-consuming task. Thankfully, due to the metadata maintained by the database on the information it holds, data access code is ideally suited to code generation techniques.

To demonstrate this, we examined how the code generation tool Middlegen can be combined with the O/R mapping tool Hibernate to offer a powerful framework for generating an object-based view of a system’s relational database entities.

One area of database development highlighted is the importance that organizations attach to enterprise data. Databases often contain critical and sensitive information, so they are carefully controlled corporate resources. Sensitivities surrounding access to this type of data can be detrimental to rapid development. This issue is a political consideration for the project and one that is best addressed through careful project management.

The next chapter focuses on Model-Driven Architecture (MDA), an approach to systems development that combines design, modeling tools, and code generation. If the protagonists of MDA are to be believed, this technology is the proverbial silver bullet for rapid development.

Additional Information

For more information on entity beans, visit Sun Microsystems’ J2EE site at http://java.sun.com/j2ee.

The latest news and developer information on JDO can be accessed at http://www.jdocentral.com.

For a side-by-side comparison of O/R mapping products, see http://c2.com/cgi-bin/wiki?ObjectRelationalToolComparison.
