9 Object Databases

Together with the advent of object-oriented programming languages came the problem of how to easily store objects persistently in a database system. We briefly review the most important aspects of the object-oriented paradigm that affect the way objects can be stored in a database. On this basis, we discuss the storage of objects in relational databases, describe the steps that object-relational databases made towards object support, and lastly cover purely object-oriented database systems.

9.1 Object Orientation

The peculiarities of object-orientation are usually not entirely covered by conventional database systems. The following object-oriented constructs give rise to difficulties for object storage.

Classes and Objects: A class definition specifies a set of entities, or, more precisely, it specifies a type for a set of entities: that is, a class defines common features of these entities. The concrete, uniquely identifiable entities inside a class are then called objects or instances of the class.

Encapsulation: A class definition contains both attributes (also called variables) and methods. The values stored in the attributes describe the current state of a concrete object (equivalent to the attributes in an Entity-Relationship diagram). Methods describe the behavior that all objects of a certain class have. In other words, an object encapsulates both state and behavior. Method calls are used to let objects communicate with each other and send messages between them: one object calls a method of another object so that it executes the operations defined in the method.

Person

name: String
age: int

marry(Person p)
divorce()

Information Hiding: As we have seen in Section 1.3.2, when modeling objects with UML, attributes and methods can have different scopes of visibility. It is common practice that attributes should not be accessible from the outside: methods should protect attributes from direct access. Another effect of information hiding is that, as long as the external interface remains the same, the internal implementation of an object can change without a need to modify accessing objects.

Person

– name: String
– age: int

# marry(Person p)
# divorce()
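
The two class boxes above can be rendered as a minimal Python sketch. Python does not enforce private or protected visibility, so the example follows the common convention of a leading underscore for hidden attributes; the names are taken from the UML boxes:

```python
class Person:
    def __init__(self, name, age):
        # hidden state: by convention, a leading underscore marks
        # attributes that should not be accessed directly from outside
        self._name = name
        self._age = age
        self._spouse = None

    # the external interface protects the attributes from direct access
    def get_name(self):
        return self._name

    def marry(self, p):
        # a method call sends a "message" to the other Person object
        self._spouse = p
        p._spouse = self

    def divorce(self):
        if self._spouse is not None:
            self._spouse._spouse = None
            self._spouse = None
```

As long as `get_name`, `marry` and `divorce` keep their signatures, the internal representation of the attributes can change without modifying accessing objects.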

Complex data types: Attributes can be either simple or complex. A simple attribute has a system-defined primitive type (such as integer or string) and takes on only a single such primitive value. A complex attribute can contain a collection of primitive values, a reference to another (possibly user-defined) class, or even a collection of references. An object that contains complex attributes is called a complex object. A reference attribute corresponds to a relationship/association between entities. The class definition of the referenced object can be defined anonymously within the referencing class. Or alternatively, the referenced class is an external class with its own identity; then, the reference attribute either contains the object identifier of the referenced object or it contains the memory address of the referenced object.

Specialization: A class can be defined to be a special case of a more general class: The special cases are called “subclasses” and the more general cases are called “superclasses”. Hence, defining a subclass for a class is called specialization; the reverse – defining a superclass for a class – is called generalization. Subclasses inherit all properties of their superclasses; however, subclasses can redefine (override) inherited methods. Subclasses can also extend the superclass definition by their own attributes or methods. In object-oriented programs, objects of a subclass can usually be substituted for objects of a superclass; that is, objects of a subclass can be treated as objects of one of their superclasses. Due to this property of specialization, another important object-oriented feature is dynamic binding: although a method call is specified in the program, the concrete subclass, and hence the appropriate method implementation to be executed, is determined only at runtime.
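A minimal Python sketch of specialization, substitutability and dynamic binding; the Student subclass and its attributes are assumptions for illustration:

```python
class Person:
    def __init__(self, name):
        self.name = name

    def describe(self):
        return f"Person {self.name}"

class Student(Person):
    # the subclass extends the superclass by its own attribute ...
    def __init__(self, name, university):
        super().__init__(name)
        self.university = university

    # ... and redefines (overrides) an inherited method
    def describe(self):
        return f"Student {self.name} at {self.university}"

# substitutability: both objects are treated as Persons; dynamic
# binding picks the overriding implementation at runtime
people = [Person("Alice"), Student("Bob", "Uni")]
descriptions = [p.describe() for p in people]
```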

Abstraction: Abstraction allows for separating the external interface of an object from its internal implementation details. As mentioned previously, classes implementing an interface must implement all methods defined by the interface. A comparison of generalization and abstraction is shown in the UML diagram in Figure 9.1.

All these properties of object-oriented programs require that objects be managed in a totally different manner than tuples in a relational database table. The term object-relational impedance mismatch has been coined to describe the incompatibility of object-oriented and relational paradigms.

9.1.1 Object Identifiers

The most distinctive feature of objects in an object-oriented program is that each object has its own object identifier (OID) which is assigned to the object at the time of creation. The OID is a value that is system-generated, unique to that object, invariant during the lifetime of the object, and independent of the values of the object’s attributes (that is, the object state). As mentioned previously, references between objects can be implemented by assigning the reference attribute the OID of the referenced object. With the given notion of OIDs, a clear distinction can be made with regard to whether two objects are identical or whether they are equal:

image

Fig. 9.1. Generalization (left) versus abstraction (right)

Identity of objects: The definition of identity is simply based on the OID: Two objects are identical if and only if they have the same OID.

Equality of objects: Equality, however, is value-based: Two objects are equal when the values of their attributes coincide (that is, they have the same state independent of their OIDs).

For complex objects, equality can be ambiguous: there is a further distinction between shallow equality and deep equality. Shallow equality means that if the shallowly equal objects reference other objects, all referenced objects have to be identical; in other words, objects that are shallowly equal reference objects with the same identifier. Deep equality is more complicated: if the deeply equal objects reference other objects, all referenced objects have to be deeply equal as well; that is, they have the same values in their attributes, in the referenced objects all attributes have the same values, all attributes in the objects referenced by the referenced objects have the same values, and so on.
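The three notions can be sketched in Python, where built-in object identity (the `is` operator) plays the role of OID comparison; the `Obj` class and its attribute names are made up for illustration:

```python
class Obj:
    def __init__(self, value, ref=None):
        self.value = value
        self.ref = ref  # reference attribute: another Obj or None

def identical(a, b):
    # identity: same OID (in CPython, "is" compares object identity)
    return a is b

def shallow_equal(a, b):
    # same state, and the referenced objects are identical
    return a.value == b.value and a.ref is b.ref

def deep_equal(a, b):
    # same state, and the referenced objects are recursively deep-equal
    if a.value != b.value:
        return False
    if a.ref is None or b.ref is None:
        return a.ref is b.ref
    return deep_equal(a.ref, b.ref)

shared = Obj(1)
x, y = Obj(7, shared), Obj(7, shared)  # shallowly (and deeply) equal
u, v = Obj(7, Obj(1)), Obj(7, Obj(1))  # deeply, but not shallowly, equal
```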

We can now see the following notable difference from the relational model: Identity of tuples in a relational table is value-based instead of ID-based, and hence identity and equality coincide. In other words, two tuples inside one table are identical when they have exactly the same values for each of their attributes. Moreover, there is no means of identifying a tuple other than by the unique values of its primary key.

From a data storage point of view, permanence of OIDs is an important issue: the scope of validity of an OID must be larger for a database system than for a common, short-lived application. The following scopes of OIDs are possible:

Intraprocedure: the OID is valid during execution of a single procedure; the object identified by the OID exists only inside the procedure and hence the OID can be reused once the procedure has finished.

Intraprogram: the OID is valid during the execution of a single application; when used in different applications, the same OID may reference totally different objects in each of them.

Interprogram: the OID can be shared by several applications on the same machine, but the OID of an object might change when an application is restarted or run on a different machine.

Persistent: the OID has a long-term validity and hence is persistent; an object always has the same ID even when accessed by different applications on different machines or in different executions of the same application.

In sum we see that persistent OIDs are needed when storing objects in database systems so that they can be loaded and reused by different applications.

9.1.2 Normalization for Objects

We have reviewed the process of normalization for the relational data model in Section 2.2; its purpose is to obtain a good database schema without anomalies. For object storage models it might be similarly beneficial to distribute attributes among different classes as it might reduce interdependencies between complex objects. Generally speaking, it makes sense to define normalization by object identifiers.

image

Normalization for objects demands that every attribute in an object depends on the object’s OID.

But, taking a closer look, the case of normalization is a bit more difficult for object models than for the conventional relational data model: because object-orientation also has the feature of specialization, object normalization is not only affected by complex objects but also by class hierarchies. Moreover, methods (and method distribution among objects) can play a role in normalization. Normalization for objects has not been as deeply analyzed as relational normalization; indeed, various proposals of object normalization techniques exist. In this section we informally present four object normal forms (ONFs). We illustrate these ONFs with a library example. The unnormalized form in Figure 9.2 consists of a Person class and a separate Reader class.

image

Fig. 9.2. Unnormalized objects

The first ONF (1ONF) wants to avoid repetitions of attributes in a class. To achieve 1ONF, repetitive sequences of attributes (that represent a new type) are replaced by a 1:n-association to a new class representing the new type. Methods operating on the outsourced attributes are also moved to the new class.

image

First Object Normal Form (1ONF): A class is in 1ONF when repetitive sequences of attributes (representing a new type) are extracted to their own class and replaced by a 1:n-association. Methods applicable to the new class will be part of the new class.

In our example, we see that book information is repeated in the Reader class. Hence it makes sense to extract these repeating attributes into a separate Book class as in Figure 9.3. All methods belonging to books are also moved to the new Book class.
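The 1ONF refactoring can be sketched in Python; attribute and method names are assumptions based on the library example:

```python
# before 1ONF: book attributes are repeated inside the Reader class
class ReaderUnnormalized:
    def __init__(self, name):
        self.name = name
        self.book_titles = []   # repetitive sequence of attributes ...
        self.book_authors = []  # ... representing the type "book"

# after 1ONF: the repeating group becomes its own class and the
# Reader holds a 1:n-association to it; book methods move along
class Book:
    def __init__(self, title, author):
        self.title = title
        self.author = author

class Reader:
    def __init__(self, name):
        self.name = name
        self.books = []  # 1:n-association to Book objects

    def borrow(self, book):
        self.books.append(book)
```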

The second ONF (2ONF) extracts information that is shared by objects of different classes (or by different objects of the same class).

image

Second Object Normal Form (2ONF): A class is in 2ONF when it is in 1ONF and information that is shared by multiple objects is extracted to its own class. The classes sharing the information are connected to the new class by appropriate associations. Methods applicable to the new class will be part of the new class.

image

Fig. 9.3. First object normal form

image

Fig. 9.4. Second object normal form

In our example, the information of the due date is shared by reader and book; as we have seen previously, it should be extracted as an association class characterizing the association between readers and books as in Figure 9.4. The method for setting the due date is moved to the new class.

The third ONF (3ONF) is meant for ensuring cohesion: A class should not mix responsibilities but instead should have only a single well-defined task.

image

Third Object Normal Form (3ONF): A class is in 3ONF when it is in 2ONF and when it encapsulates a single well-defined, cohesive task. Other tasks have to be extracted into separate classes (together with the methods for the tasks) and linked to the other class(es) by associations.

In our example we can see in Figure 9.5 that the address information can be extracted into a separate address class. This has the advantage that the Address class can be used for storing other addresses as well (not only addresses of readers) and the internal address format can be changed without modifying the accessing classes (for example, Reader). We can now also easily model that readers have multiple addresses (like home address, office address etc.).

image

Fig. 9.5. Third object normal form

The fourth ONF (4ONF) reduces duplication of attributes and methods by building a class hierarchy: Some classes may be subclasses of other classes effectively inheriting their attributes and methods without a need for duplicating them. If necessary, appropriate superclasses have to be newly created.

image

Fourth Object Normal Form (4ONF): A class is in 4ONF when it is in 3ONF and when duplicated attributes and methods are extracted into a superclass (or an existing class is used as the superclass, respectively). The class is linked to the superclass by an inheritance relation.

Obviously we can make Person the superclass of Reader as in Figure 9.6 and hence avoid the duplicate declaration of the person attributes.

To sum up, normalization can help with the construction of a well-structured software design. A word of warning should however be given: As is the case for relational normalization, object normalization should be based on an assessment of application requirements. A good software design should also take data accesses and data usage into consideration. For example, if whenever a person object is accessed its address information is also needed, then the address information might best be embedded into the person object instead of extracting an address into an external class. From a database perspective, retrieving one larger object (a person with embedded address) is usually more efficient than retrieving two smaller separate objects (a person object and its associated address object).

image

Fig. 9.6. Fourth object normal form

9.1.3 Referential Integrity for Objects

Referential integrity for objects is similar to referential integrity for the relational data model. While referential integrity for the relational data model is based on foreign keys (see Section 2.3), referential integrity for objects is based on object identifiers.

image

Referential Integrity: For each referenced OID in the system there should always be an object present that corresponds to the OID.

This means that any referenced object must exist. In particular, dangling references should be avoided: one should not delete a referenced object without informing the referencing object. Hence, there is a need to maintain a backward (“inverse”) reference for each forward reference. More precisely, for complex objects, inverse attributes can be used to maintain referential integrity. As an example, consider the 1:n-relationship from readers to books. To ensure referential integrity, deletion of an object of class Reader must fail as long as it points to at least one object of class Book. To delete an object of class Reader, first of all, the system has to modify the inverse references to the Reader object from objects of class Book; modification can be setting the references to another Reader object or to a null value.
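The inverse-reference bookkeeping for the Reader/Book example can be sketched in Python; the method names are made up for illustration:

```python
class Book:
    def __init__(self, title):
        self.title = title
        self.reader = None  # forward reference to a Reader

class Reader:
    def __init__(self, name):
        self.name = name
        self.books = []  # inverse references: Books pointing at us

    def lend(self, book):
        # forward and inverse reference are always kept in sync
        book.reader = self
        self.books.append(book)

def return_book(book):
    # clearing the forward reference also maintains the inverse one
    book.reader.books.remove(book)
    book.reader = None

def delete_reader(reader, registry):
    # deletion fails as long as inverse references exist, so no
    # dangling references can arise
    if reader.books:
        raise ValueError("reader is still referenced by books")
    registry.remove(reader)
```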

9.1.4 Object-Oriented Standards and Persistence Patterns

Accompanying the evolution of object-oriented programming languages, several groups formed to promote the idea of object-orientation. A main goal was (and still is) to develop standards for object models and query languages to improve the interoperability between different object-oriented programming languages. The first international group was the Object Management Group (OMG). It was founded in 1989 by several hundred partners including large software companies. Although it is not an official standardization organization, it developed a reference object management architecture that defined an object model and the Object Request Broker (ORB) as a communication architecture for objects. The Common Object Request Broker Architecture (CORBA) is a reference architecture that uses an ORB as well as an interface repository or a stub/skeleton approach to enable message passing between objects that do not reside on the same server. The OMG also adopted the Unified Modeling Language (UML) as the standard for object-oriented software development.

image Web resources:

OMG: http://www.omg.org/


specifications: http://www.omg.org/spec/

The Object Data Management Group (ODMG) was founded by vendors of object database systems. It extended the OMG object model, leading to the ODMG object model; the ODMG also defined an Object Definition Language (ODL) and an Object Query Language (OQL). The ODMG officially ceased to exist in 2001, but some ODMG-compliant object database systems are still available; moreover, the OMG announced the development of a “4th generation” standard for object databases by adopting the ODMG standard (version 3.0). In more detail, the main components of the ODMG specification, as sketched below, are

The object model (OM) describes what an object is. It first of all distinguishes between literals and objects. Literals are just constant values (they do not have an identifier) and hence are immutable. There are atomic literals (like integers, doubles, strings and enumerations), collection literals (set, bag, list, array and dictionary), and structured literals (like date and time). In contrast, objects have an identifier and are mutable (they can change their value); objects include user-defined types and mutable versions of the above-listed collections and structures. The lifetime of an object can be either transient (it is just needed inside one method call or inside one application and is destroyed after termination of the method or application) or persistent (it has to be persistently stored in a database).

The object definition language (ODL) is used for specifying type definitions for interfaces and classes. In particular, with the ODL one can define attributes (that can only have literal values), relationships (that represent one or more referenced objects and hence correspond to reference attributes or collections of references), and method signatures (without the actual definition in the method bodies). ODL definitions can be used to exchange objects between different programming languages.

The object query language (OQL) provides a SQL-like syntax to execute queries on objects in a database. It uses the well-known SELECT FROM WHERE clause, with the difference that the SELECT clause can be used to execute methods on an object, and path expressions can be used to follow relationships (that is, references) inside an object. Flattening must be used to break a collection object into its constituent objects in order to process them further.

Apart from the comprehensive object technologies defined by OMG and ODMG, some more light-weight notations and patterns have become widely adopted. For example, when it comes to including storage components in an object-oriented software design, the design pattern of Data Access Object (DAO) is commonly used. It separates persistence issues from the rest of the application, and hence is an abstraction of any storage details (like database accesses). Roughly, the DAO pattern works as follows: for each application object that should be persisted to the database, there has to be an accompanying DAO; the DAO executes all necessary operations to create, read, update or delete the object in the database. Note however that transactions cannot (and should not) be handled by each DAO individually but have to be managed globally for the application.
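A minimal sketch of the DAO pattern in Python, using the standard-library sqlite3 module as a stand-in for the RDBMS; the class and table names are assumptions:

```python
import sqlite3

class Person:
    def __init__(self, pid, name):
        self.pid, self.name = pid, name

class PersonDAO:
    """Encapsulates all persistence operations for Person objects;
    the rest of the application never sees SQL or the connection."""

    def __init__(self, conn):
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS person (id INTEGER PRIMARY KEY, name TEXT)")

    def create(self, person):
        self.conn.execute("INSERT INTO person VALUES (?, ?)",
                          (person.pid, person.name))

    def read(self, pid):
        row = self.conn.execute(
            "SELECT id, name FROM person WHERE id = ?", (pid,)).fetchone()
        return Person(*row) if row else None

    def update(self, person):
        self.conn.execute("UPDATE person SET name = ? WHERE id = ?",
                          (person.name, person.pid))

    def delete(self, pid):
        self.conn.execute("DELETE FROM person WHERE id = ?", (pid,))

conn = sqlite3.connect(":memory:")
dao = PersonDAO(conn)
dao.create(Person(1, "Alice"))
conn.commit()  # transactions are managed globally, not inside the DAO
```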

image Web resources:

Oracle Core J2EE Patterns – Data Access Object:
http://www.oracle.com/technetwork/java/dataaccessobject-138824.html


IBM developerWorks:
Sean Sullivan: Advanced DAO programming – Learn techniques for building better DAOs: http://www.ibm.com/developerworks/java/library/j-dao/

9.2 Object-Relational Mapping

One approach to persistently store objects out of an object-oriented program is to use a conventional RDBMS as the underlying storage engine. In this case, we have to map each object to (one or more) tuples in (one or more) relational tables. In particular, on the one hand, we have to write code that decomposes the object and stores the attribute values into the appropriate tables; on the other hand, when retrieving the object from storage we have to recombine the tuple values and reconstruct an object out of them. We focus here on the problems that arise with complex objects (in particular, collection attributes and reference attributes) and specialization.

Table. 9.1. Unnormalized representation of collection attributes

image

9.2.1 Mapping Collection Attributes to Relations

We start with the mapping of collection attributes: collections correspond to multi-valued attributes (as introduced in Section 1.3.1). As already briefly described in Section 2.1, multi-valued attributes are not allowed in the conventional relational model; a fact that leads to difficulties when mapping the Entity-Relationship diagram to a relational schema. To illustrate these difficulties further, we elaborate the example of a Person table with an ID attribute as its key and other attributes for Name, Hobby and Child information. Persons can have more than one hobby and more than one child; that is, Hobby as well as Child should be multi-valued attributes. Nevertheless, we try to model these attributes in our relational table by simply duplicating the entries for Name, Hobby and Child in all necessary combinations. We assume the following domains for the (now single-valued) attributes: dom(ID)=dom(Child): Integer, dom(Name)=dom(Hobby): String. When looking at Table 9.1, we notice a lot of redundancy: Name, Hobby and Child have many duplicated entries.

Normalization (see Section 2.2) can come to the rescue: we obtain three tables (see Table 9.2) – Name (N), Hobby (H) and Child (C) – where the ID is used as a key for the Name table and a foreign key for the Hobby and Child table, respectively. We see that we got rid of unnecessary redundancy: each combination of ID and hobby, as well as ID and child, only occurs once. However, redundancy reduction comes at the cost of more complex querying. In fact, SQL queries that have to combine names, hobbies and children need lots of join operations – for example, the query “What are the hobbies of Alice’s grandchildren?”:

SELECT H.Hobby FROM C C1, C C2, N, H WHERE N.Name = ’Alice’
AND N.ID = C1.ID AND C1.Child = C2.ID AND C2.Child = H.ID

Table. 9.2. Normalized representation of collection attributes

image

This query requires joins on all three tables, including a self-join on the child table, and hence is quite costly.

All in all we note that, while it is technically possible to store collections (and hence multi-valued attributes) in a relational table, we forfeit performance of object storage and object retrieval.
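The normalized schema and the grandchildren query can be tried out with Python's built-in sqlite3 module; the sample data below is made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE N (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE H (ID INTEGER, Hobby TEXT);
    CREATE TABLE C (ID INTEGER, Child INTEGER);
    INSERT INTO N VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
    INSERT INTO C VALUES (1, 2), (2, 3);  -- Alice -> Bob -> Carol
    INSERT INTO H VALUES (3, 'Chess');
""")
# the costly three-way join (with a self-join on C) for the hobbies
# of Alice's grandchildren
rows = conn.execute("""
    SELECT H.Hobby FROM C C1, C C2, N, H
    WHERE N.Name = 'Alice'
      AND N.ID = C1.ID AND C1.Child = C2.ID AND C2.Child = H.ID
""").fetchall()
```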

9.2.2 Mapping Reference Attributes to Relations

Another aspect of complex objects is that reference attributes have to be stored in the database. Reference attributes represent relationships/associations between classes; in particular, a reference attribute can point to an association class (see Section 1.3.2) that is used to link two or more objects together possibly with additional attributes. Reference attributes contain the OID of the referenced object. Hence in order to obtain referential integrity for objects (see Section 9.1.3), we have to explicitly store the OID of each object: each table representing a class has a separate column for the OID. The OID column can then serve as a foreign key in the referencing class. Then we have to ensure referential integrity in the relational tables as described in Section 2.3.
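With an explicit OID column, the referential-integrity check can be delegated to the RDBMS; a sketch with SQLite, which enforces foreign keys only after the pragma shown (table names are assumptions based on the library example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch
conn.executescript("""
    CREATE TABLE Reader (OID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Book (OID INTEGER PRIMARY KEY, Title TEXT,
                       ReaderOID INTEGER REFERENCES Reader(OID));
    INSERT INTO Reader VALUES (1, 'Alice');
    INSERT INTO Book VALUES (10, 'Dune', 1);
""")
# a dangling reference (no Reader with OID 99 exists) is rejected
try:
    conn.execute("INSERT INTO Book VALUES (11, 'Emma', 99)")
    dangling_rejected = False
except sqlite3.IntegrityError:
    dangling_rejected = True
```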

9.2.3 Mapping Class Hierarchies to Relations

The feature of specialization of classes implies that classes are effectively organized in a class hierarchy. More and more attributes are implicitly added to objects of subclasses deeper in the hierarchy. An appropriate table structure (that is, a database schema) has to be devised to store all the attribute values belonging to an object of some class in the hierarchy. Here we only consider the case of one level of specialization; with increasing depth of specialization, however, the problem of storing attributes of superclasses is aggravated. As an example for a simple class hierarchy, we consider two subclasses (Student and Employee) of a Person class (see Figure 9.7).

image

Fig. 9.7. Simple class hierarchy

To store these classes in a relational database, we have three options:

Store each class in a separate table: We store the attributes of the superclass in a table separate from the attributes of the subclass. The superclass and the subclass information then have to be linked together by an ID. For our example this means that we store general Person data (attributes Name and Age) in a Person table; Employee data (attribute Company) in an additional Employee table that uses ID as a foreign key to reference the Person table; as well as Student data (attributes University and StudentID) in an additional Student table that also uses ID as a foreign key to reference the Person table. More formally, we have the following relation schemas:

Person({ID, Name, Age},{{ID} →{ID, Name, Age}})

Employee({ID, Company},{{ID} →{ID, Company}})

Student({ID, University, StudentID},{{ID} →{ID, University, StudentID}})

And the database schema:

D={{Person, Employee, Student},{Employee.ID ⊆ Person.ID, Student.ID ⊆ Person.ID}}

With this way of storing a class hierarchy, we lose the distinction between what is the subclass and what is the superclass. These semantics must then be built into the query strings sent by the accessing applications. In particular, applications have to decompose objects to write the data into different tables: correct SQL insertion statements have to be written depending on the subclass. For example, inserting an employee, like

INSERT INTO Person VALUES (1,’Alice’,31)

INSERT INTO Employee VALUES (1,’ACME’)

is different from inserting a student, like

INSERT INTO Person VALUES (2,’Bob’,20)

INSERT INTO Student VALUES (2,’Uni’, 234797).

When retrieving objects from the database, the application programmer has to write SQL queries to join the Person table with the right subclass table. Retrieving all names of employees of ACME requires a join with Employee:

SELECT P.Name FROM Person P, Employee E

WHERE P.ID = E.ID AND E.Company = ’ACME’

Whereas, retrieving all names of Students of Uni requires a join with Student:

SELECT P.Name FROM Person P, Student S

WHERE P.ID = S.ID AND S.University = ’Uni’

Hence, storing each class in a separate table leads to duplication of code and handling the different table names might cause inconsistencies. The advantage is that this storage method reduces redundancy in the tables and maintains the superclass information; for example, retrieving all names of all persons is easy:

SELECT P.Name FROM Person P

Store only the subclasses in tables: We store all the attributes of the subclass together with all inherited attributes in one table. For our example this means that we don’t have a Person table but instead store all employee data (attributes ID, Name, Age, Company) in the Employee table and all student data (attributes ID, Name, Age, University and StudentID) in the Student table. That is, we have the following two relation schemas each with key ID:

Employee({ID, Name, Age, Company},{{ID} →{ID, Name, Age, Company}})

Student({ID, Name, Age, University, StudentID},{{ID} →{ID, Name, Age, University, StudentID}})

And the database schema is simply composed of these two relation schemas:

D={{Employee, Student},{}}

Insertion and selection of values that only affect a single subclass have now become easier, because we just have to access the right subclass. Application logic must construct correct SQL statements depending on the subclass. Inserting an employee is now just a single SQL statement:

INSERT INTO Employee VALUES (1,’Alice’,31,’ACME’)

And the same applies to inserting a student:

INSERT INTO Student VALUES (2,’Bob’,20,’Uni’, 234797)

In the same vein, retrieving all names of employees of ACME does not require a join with another table:

SELECT E.Name FROM Employee E WHERE E.Company = ’ACME’

and neither does retrieving all names of students of Uni:

SELECT S.Name FROM Student S WHERE S.University = ’Uni’.

Choosing the right table for storing an object is still the task of the accessing application. However, we completely lose the information about what is defined by the superclass (in our example, the attributes Name and Age). The semantics of the superclass must be built into application logic: a UNION operation has to be executed on all subclasses to produce the superclass. In our example, to get the names of all persons we have to combine the information from the Employee and the Student table:

SELECT E.Name FROM Employee E UNION SELECT S.Name FROM Student S

Store all classes in a single table: We store all subclasses in one relation and hence combine all attributes of all subclasses in this single relation. Doing this, we cannot differentiate between subclasses anymore: from the table structure it is not clear which attribute belongs to which subclass. That is, again the accessing application has to ensure correct handling of subclasses. An additional attribute Type that contains as its value the class name of the appropriate subclass can be introduced artificially to distinguish the subclasses. In our example, we only have a single Person table with all attributes ID, Name, Age, Company, University and StudentID. The additional type attribute ranges over {Employee, Student} and ID is the key attribute. The relation schema of Person is hence:

Person({ID, Name, Age, Company, University, StudentID, Type},{{ID} →{ID, Name, Age, Company, University, StudentID, Type}})

Again, each application must construct correct SQL statements depending on the subclass. The SQL statements differ depending on whether an employee is inserted:

INSERT INTO Person
VALUES (1,’Alice’,31,’ACME’, NULL, NULL, ’Employee’)

or a student is inserted:

INSERT INTO Person
VALUES (2,’Bob’,20, NULL, ’Uni’, 234797,’Student’)

What we see from this small example is that lots of unnecessary attributes are stored for the objects; that is, the table stores lots of NULL values.

Retrieving data for a subclass hence requires checking the type attribute; for example, all names of employees:

SELECT P.Name FROM Person P WHERE P.Type = ’Employee’

or, all names of Students:

SELECT P.Name FROM Person P WHERE P.Type = ’Student’

As the superclass information is now contained in the single table, retrieving information of the superclass is easy; for example, retrieving the names of all persons:

SELECT P.Name FROM Person P
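The single-table strategy can be tried out end to end with Python's sqlite3 module; the data is the running Alice/Bob example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Person (ID INTEGER PRIMARY KEY, Name TEXT, Age INTEGER,
                         Company TEXT, University TEXT, StudentID INTEGER,
                         Type TEXT);
    INSERT INTO Person VALUES (1, 'Alice', 31, 'ACME', NULL, NULL, 'Employee');
    INSERT INTO Person VALUES (2, 'Bob', 20, NULL, 'Uni', 234797, 'Student');
""")
# subclass access requires checking the artificial Type attribute ...
employees = conn.execute(
    "SELECT Name FROM Person WHERE Type = 'Employee'").fetchall()
# ... while superclass access needs no join or union at all
everyone = conn.execute("SELECT Name FROM Person ORDER BY ID").fetchall()
```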

9.2.4 Two-Level Storage

When looking at storage management for object-relational mapping, the storage can be divided into main memory as the first level and disk storage as the second level. While the OOPL application storage model in main memory is object-oriented (dividing data into objects, variables and the like), the database storage model on disk is relational (handling data in terms of tables, tuples etc.). Data loading and storage is hence more complex due to increased transformation efforts. In particular, there is a separation in the main memory between the database page buffer and the OOPL application cache (sometimes called the local object cache). Hence the following basic steps are required to handle the two-level storage model (compare Section 1.2):

1. The application needs to access some object whose attribute values are stored in an RDBMS; it hence issues a query to access the appropriate database table (or tables, if the object's values are spread over more than one table).

2. The DBMS locates a page containing (some of) the values of the demanded object on disk (possibly using indexes or “scanning” the table).

3. The DBMS copies this page into its page buffer.

4. As the page usually contains more data than currently needed, the DBMS locates the relevant values (for example, certain attributes of a tuple) inside the page.

5. The application (possibly using a specialized database driver library) copies these values into its local object cache (conversions of SQL data types into OOPL data types may be necessary).

6. The application reconstructs the object from the data values by instantiating a new object and assigning the loaded values to its attributes.

7. The application can access and update the object's attribute values in its local object cache.

8. The application (again using the database driver library) transfers modified attribute values from the local object cache to the appropriate page in the DBMS page buffer (conversions of OOPL data types into SQL data types may be necessary).

9. The DBMS eventually writes pages containing modified values back from the page buffer onto disk.

What we see is that not only do the mapping of objects to tables and the reconstruction of objects by reading values from different tables involve some overhead; the storage management itself is also more complex due to the transformations and conversions necessary to handle data in main memory.
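The steps above can be sketched in plain Java. This is a simplified simulation: the Map stands in for a buffered tuple, and the class and method names (TwoLevelDemo, demo) are illustrative assumptions, not part of any real DBMS API.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified simulation of the two-level storage steps: a Map stands in
// for a tuple in the DBMS page buffer; Person is a hypothetical OOPL class.
public class TwoLevelDemo {

    static class Person {
        String name;
        int age;
    }

    static int demo() {
        // steps 2-4: the DBMS has located the relevant tuple in a buffered page
        Map<String, Object> tuple = new HashMap<>();
        tuple.put("Name", "Alice");
        tuple.put("Age", 31);

        // steps 5-6: copy the values into the local object cache and
        // reconstruct the object (with SQL-to-Java type conversion)
        Person p = new Person();
        p.name = (String) tuple.get("Name");
        p.age = (Integer) tuple.get("Age");

        // step 7: the application updates the object in its cache
        p.age = p.age + 1;

        // step 8: transfer the modified value back to the "page buffer"
        tuple.put("Age", p.age);
        return (Integer) tuple.get("Age");
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 32
    }
}
```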

9.3 Object Mapping APIs

As shown in the previous sections, mapping objects to relational tables requires quite some data engineering. In particular, for the process of storing objects, the programmer has to manually connect to the RDBMS, store the object's attribute values by mapping them to tables while ensuring that referential integrity is maintained, and additionally store class definitions (including method bodies). For the process of retrieving objects, the programmer has to connect to the RDBMS, potentially read in the corresponding class definition, retrieve the object's attribute values (potentially by joining several tables) and also reconstruct all referenced objects. In contrast, object-relational mapping (ORM) tools form an automatic bridge between the object-oriented programming language (OOPL) and an RDBMS. In particular, there is no need to write SQL code manually. This makes the source code more readable and also more portable between different RDBMSs – although an RDBMS-specific database driver is usually still required. ORM tools often already include optimizations such as connection pooling and automatic maintenance of referential integrity.

ORM tools and APIs are available for a wide range of OOPLs. Standards for Java object persistence are the Java Persistence API (JPA; see Section 9.3.1) as well as the Java Data Objects (JDO) API which is covered in Section 9.3.2.

9.3.1 Java Persistence API (JPA)

We will have a closer look at the Java Persistence API; this API is defined in the javax.persistence package of the Java language. The Java Persistence Query Language (JPQL) is used to query objects in storage. Additional metadata (for example, to express relationships between classes) are used to map objects into database tables; metadata can be expressed as annotations (markers starting with @ in the source code) or in XML configuration files.

Web resources:

Oracle Java Platform, Enterprise Edition: http://docs.oracle.com/javaee/


The Java EE Tutorial Part VIII Java Persistence API:
http://docs.oracle.com/javaee/7/tutorial/partpersist.htm

The main components of JPA are:

Entity Objects: Entity objects are objects that are to be stored in the database. The class of an entity object has to implement the Serializable interface and has to be marked with the annotation @Entity. This annotation causes the generation of a database table – by default, the table name is the class name; the default table name can be changed with the @Table annotation. The columns of the table correspond to the attributes of the class; a column name can be changed by explicitly specifying the @Column annotation. Attributes that should not be stored can be marked with @Transient. An example of a Person class with three persistent attributes and one transient attribute is as follows:

@Entity @Table(name="PERSON")
public class Person implements Serializable {

@Column(name="FIRST_NAME")
String firstname;

@Column(name="LAST_NAME")
String lastname;

@Column(name="AGE")
int age;

@Transient private boolean hungry;

}

Note that it is also common to define columns by annotating the “getter” method of an attribute instead of the attribute itself. For example, the firstname column might be defined by

@Column(name="FIRST_NAME")
public String getFirstName() {
return firstname;
}

Entity Lifecycle: A persistence context is a set of entity objects at runtime; all objects in a persistence context are mapped to the same database (“persistence unit”). The EntityManager API manages the persistence context; it also supports transactions that must be committed (see the “all or nothing” principle in Section 2.5). An entity object can have different states during its lifecycle:

new: A new entity object has been created but is neither managed (in a persistence context) nor persisted (stored on disk).

persist: An entity object is managed and will be persisted to the database on transaction commit.

remove: An entity object is removed from the persistence context and will be deleted from the database on transaction commit.

refresh: The state of an entity object is (re-)loaded from the database.

detach: When a transaction ends, the persistence context ceases to exist; that is, the connection of the entity object to the database is lost and loading of any referenced objects (“lazy loading”) is no longer possible.

merge: The state of a detached entity object is merged back into a managed entity object.
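These lifecycle transitions roughly correspond to EntityManager calls as sketched below. The persistence unit name "demo" is a hypothetical assumption, and the code needs a configured JPA provider to actually run:

```java
EntityManagerFactory emf = Persistence.createEntityManagerFactory("demo");
EntityManager em = emf.createEntityManager();

em.getTransaction().begin();
Person p = new Person();        // new: neither managed nor persisted
em.persist(p);                  // persist: managed, written on commit
em.getTransaction().commit();
em.close();                     // the context ends: p is now detached

EntityManager em2 = emf.createEntityManager();
em2.getTransaction().begin();
Person managed = em2.merge(p);  // merge: detached state into a managed copy
em2.refresh(managed);           // refresh: (re-)load state from the database
em2.remove(managed);            // remove: row deleted on commit
em2.getTransaction().commit();
```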

Identity Attributes: In each class definition, a persistent identity attribute is required: it is mapped to the primary key of the database table corresponding to the class. There are three ways to specify the identity attribute; it can be either

a single attribute of simple type which is annotated with @Id, or

a system-generated value by annotating the ID attribute with @GeneratedValue, or

spanning multiple attributes by adding a user-defined class for the ID which can be an internal embedded class (annotated with @EmbeddedId), or an external independent class (by annotating the entity with @IdClass).

Note that the identity attribute must be defined in the top-most superclass of a class hierarchy. As an example for an identity attribute that is system-generated and stored in an identity column (when the row belonging to a Person object is inserted in the table), consider the following Person class:

@Entity

public class Person implements Serializable {

@Id @GeneratedValue(strategy=GenerationType.IDENTITY)
protected Long id;

}

Embedding: Embedding allows for the attributes of an embedded class to be stored in the same table as the embedding class. The reference attribute in the embedding class is annotated with @Embedded; while the embedded class definition is annotated with @Embeddable. For example, we could define an Address class that can be stored (in embedded form) in the same table as the remaining person attributes.

@Entity public class Person implements Serializable {

@Id protected Long id;

@Embedded protected Address address;

}

@Embeddable public class Address {…}

Inheritance: JPA offers all three ways of mapping a class hierarchy to relational tables (as described in Section 9.2.3). With the @Inheritance annotation we can define which of the strategies is used.

The first case (“Store each class in a separate table”) is declared with the annotation @Inheritance(strategy=InheritanceType.JOINED): superclasses and subclasses each are mapped to their own table and each table contains only those columns defined by the attributes in the class; as mentioned previously, the ID attribute has to be defined in the top-most superclass (and hence will be a column in the corresponding table) and in all subclass tables the ID is used as a foreign key to link rows belonging to the same object in the separate tables.

The second case (“Store only the subclasses in tables”) is declared with the @Inheritance(strategy=InheritanceType.TABLE_PER_CLASS) annotation: In a table for a subclass, columns will be created for all attributes inherited from any superclass; hence, we have duplication of all superclass attributes on subclass tables. Note however that concrete superclasses (those that are not abstract classes) will get their own table, too: any concrete superclass can be instantiated and its objects will be stored in the appropriate superclass table. If this is not desired, we can annotate a superclass with @MappedSuperclass (instead of annotating it with @Entity): in this case no table for the superclass will be created and no object of the superclass can be stored (only objects of its subclasses).

The third case (“Store all classes in a single table”) is declared with the annotation @Inheritance(strategy=InheritanceType.SINGLE_TABLE): All objects of all superclasses and subclasses are stored together in one table that contains columns for all attributes. As described in Section 9.2.3, we need an additional type column to differentiate between objects of different classes; this type column is called discriminator column in JPA. We can give this discriminator column a name with the @DiscriminatorColumn annotation: for example, for our person hierarchy we can annotate the Person class with @DiscriminatorColumn(name = "PERSON_TYPE"); each subclass will be annotated with a discriminator value: for example, the Employee class can be annotated with its own discriminator value like @DiscriminatorValue("employee") such that any employee in the table will have the value employee in the PERSON_TYPE column.
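For the person hierarchy, the single-table strategy could be declared as follows; this is a sketch with class bodies elided:

```java
@Entity
@Inheritance(strategy = InheritanceType.SINGLE_TABLE)
@DiscriminatorColumn(name = "PERSON_TYPE")
public class Person implements Serializable { /* ... */ }

@Entity
@DiscriminatorValue("employee")
public class Employee extends Person { /* ... */ }
```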

Relationships: JPA supports cardinalities for relationships (like the ones described in Section 1.3.1). Hence, an attribute that references an object of another class can be annotated with either @OneToOne, @OneToMany (or its reverse @ManyToOne), or @ManyToMany. A one-to-many relationship corresponds to a collection attribute (Set, List, Map, …). A relationship can be unidirectional – which means we only have a forward reference – or bidirectional – in this case we also have an inverse reference that helps ensure referential integrity as described in Section 9.1.3. For bidirectional relationships, the forward reference is on the so-called owning side; the backward (inverse) reference is on the owned side. The owning side is responsible for managing the relationship; for example, maintaining the correct foreign key values that participate in the relationship. In order to do so, the owned side has to annotate the backward reference with the information about which attribute on the owning side constitutes the forward reference; this is done with the mappedBy element. In our previous library example we had a one-to-many (1:n) relationship between a reader and his loaned books. When implementing this as a bidirectional relationship with the Book class as the owning side and the Reader class as the owned side, the mappedBy element declares that the attribute borrower in the Book class is responsible for the management of the relationship:

@Entity public class Book implements Serializable {

@Id protected Long BookId;

@ManyToOne protected Reader borrower;

}

@Entity public class Reader implements Serializable {

@Id protected Long ReaderId;

@OneToMany(mappedBy="borrower")
protected Set<Book> booksBorrowed = new HashSet<>();

}

Several other options are available to configure the storage of relationships in JPA. For example, relationships may be loaded either eagerly or lazily. Lazy loading means that loading of a referenced object is deferred until the object is actually accessed for the first time; eager loading means that a referenced object is loaded as soon as the referencing object is loaded. Another issue is cascading of operations to referenced objects; for example, we might configure that whenever an object is stored (“persisted”), referenced objects are persisted, too. These settings can be specified individually for each relationship; they can however also be configured globally in the XML mapping file (for example, persistence by reachability means that all referenced objects are always persisted when the referencing object is persisted).
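For example, the Reader side of the library relationship could combine lazy loading with cascaded persists. The following fragment is a sketch; the chosen settings are illustrative, not the defaults for every annotation:

```java
@OneToMany(mappedBy = "borrower",
           fetch = FetchType.LAZY,
           cascade = CascadeType.PERSIST)
protected Set<Book> booksBorrowed = new HashSet<>();
```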

Java Persistence Query Language: The Java Persistence Query Language (JPQL) is a SQL-like language that offers projection onto some attributes (in its SELECT clause), explicit JOINS, subqueries, grouping (GROUP BY), as well as UPDATE and DELETE operations.

For example, from our Person table with embedded Address information we can retrieve Alice Smith’s hometown and its ZIP code as follows:

SELECT p.address.city, p.address.zipcode FROM Person p
WHERE p.firstname='Alice' AND p.lastname='Smith'
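Assuming the Reader/Book mapping from the relationships example, a JPQL query with an explicit join and grouping could, for instance, count the borrowed books per reader:

```
SELECT r.ReaderId, COUNT(b) FROM Reader r JOIN r.booksBorrowed b
GROUP BY r.ReaderId
```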

Query Objects: Query objects can be created by calling the createQuery() method of the EntityManager API. Queries are processed by executing getResultList(). Named queries can be stored for reuse with different parameters by using the @NamedQuery annotation. Dynamic queries are specified at runtime; their number of parameters can change and they can have named or positional parameters. For example, to get a list of persons for a given ZIP code we can define a query method findByZipcode where the ZIP code is passed as a parameter:

@PersistenceContext EntityManager em;

public List findByZipcode(int zip) {
Query query = em.createQuery("SELECT p FROM Person p" +
" WHERE p.address.zipcode = :zipparameter");
query.setParameter("zipparameter", zip);
return query.getResultList();
}

When using a JPA-compliant ORM tool, several system-specific settings have to be made; these are usually declared in an XML configuration file. For example, for the Hibernate ORM tool, a configuration file can contain properties (like the location of the driver for the underlying RDBMS, the URL for the database connection, and a user name and password for the database connection), as well as mappings (like the names of classes we want to store in the database):

<hibernate-configuration>
<session-factory>
<property name="hibernate.connection.driver_class">
org.postgresql.Driver</property>
<property name="dialect">
org.hibernate.dialect.PostgreSQLDialect</property>
<mapping class="org.dbtech.Person"/>
</session-factory>
</hibernate-configuration>

9.3.2 Apache Java Data Objects (JDO)

The Java Data Objects API is meant to be highly independent of document formats or data models of databases or any database-specific query languages. The main purpose of JDO is to let Java programmers interact with any underlying database (or data format) without using database-specific code.

Web resources:

Apache JDO: http://db.apache.org/jdo


specifications: http://db.apache.org/jdo/specifications.html


Apache SVN repository: http://svn.apache.org/viewvc/db/jdo/

In JDO there are three types of classes:

persistence capable classes that are mapped to the storage layer – they are annotated as @PersistenceCapable;

persistence aware classes that interact with and modify persistence capable classes – they are annotated as @PersistenceAware;

normal classes that are totally unaware of any storage related issues and are not stored themselves.

Persistence capable and persistence aware classes must be declared either in an XML metadata file or by using annotations (for example, @PersistenceCapable). Persistence-related operations are offered by the interface PersistenceManager. Field-level persistence modifiers can be persistent, transactional or none (in which case defaults or persistence by reachability are applied). In general, a transient object that is referenced by a persistent object in a persistent field becomes persistent when the referencing object is stored; the same applies to the closure of its references in the object graph. For the fields of objects there is a so-called default fetch group: depending on the type of a field, some fields are loaded by default whenever an object of a class is loaded (for example, simple types like numbers); arrays and other collection types, however, are not loaded by default.

JDO supports two types of object identifiers: application identity is based on the values of some fields of the object (these fields are then also called the primary key); datastore identity, in contrast, uses an internal identifier that the programmer can neither declare nor influence.
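A class using application identity might be annotated as follows; this is a sketch using the javax.jdo.annotations package, with illustrative field names:

```java
@PersistenceCapable
public class Person {
    @PrimaryKey
    private Long id;          // application identity: value-based primary key

    @Persistent
    private String lastName;
}
```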

An object can hence be in one of several different states during its entity lifecycle:

Transient: An object that is newly created and is not yet (or will never be) persisted.

Persistent New: A newly created object that has been stored in the data store for the first time.

Persistent Dirty: A persistent object that has been modified since it was last stored. Its state in memory is different from its state in the data store.

Persistent Clean: A persistent object that has not been modified since it was last stored and hence has the same state in memory and in the data store.

Persistent Deleted: Any persistent object that is to be removed from the data store.

Hollow: An object that is stored in the data store but not all of its fields have been loaded into memory.

Detached Clean: An in-memory object that is disconnected from its datastore representation and has not been changed since it was detached.

Detached Dirty: An in-memory object that is disconnected from its datastore representation but has been changed since it was detached.

Furthermore, modifications of an object can be part of a transaction or not. Depending on this, even more states of an object are possible by differentiating whether an object is transactional or non-transactional. The persistence manager offers several methods to manage the different states of an object. For example, to store a newly created object, makePersistent is called as follows:

PersistenceManagerFactory pmf = JDOHelper
    .getPersistenceManagerFactory(properties);
PersistenceManager pm = pmf.getPersistenceManager();
Transaction tx = pm.currentTransaction();
try {
  tx.begin();
  Person p = new Person("Alice", "Smith", 31);
  pm.makePersistent(p);
  tx.commit();
}
finally {
  if (tx.isActive()) {
    tx.rollback();
  }
}

An object can be retrieved from storage by using its identifier:

Object obj = pm.getObjectById(identity);

Moreover, the getExtent method returns a collection of all persisted objects of a given class that can be iterated over:

Extent e = pm.getExtent(Person.class, true);
Iterator iter = e.iterator();

As a query language the Java Data Objects Query Language (JDOQL) can be used. A query object is created with the help of the persistence manager; a selection condition can be passed as a parameter of a filter; and then the query can be executed.

Query q = pm.newQuery(Person.class);
q.setFilter("lastName == \"Smith\"");
List results = (List) q.execute();
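JDOQL also supports implicit parameters (written with a leading colon) whose values are passed to execute. Sketched for the Person class, with an assumed age field:

```java
Query q = pm.newQuery(Person.class);
q.setFilter("age > :minAge");
List results = (List) q.execute(30);
```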
